Beijing, China

Baidu (百度), Inc., incorporated on January 18, 2000, is a Chinese web services company headquartered at the Baidu Campus in Beijing's Haidian District. Baidu offers many services, including a Chinese-language search engine for websites, audio files, and images, as well as 57 search and community services such as Baidu Baike and a searchable, keyword-based discussion forum. Baidu was established in 2000 by Robin Li and Eric Xu, both Chinese nationals who studied and worked overseas before returning to China.

In May 2014, Baidu ranked 5th overall in the Alexa Internet rankings. During Q4 2010, an estimated 4.02 billion search queries were made in China, of which Baidu had a market share of 56.6%; by the second quarter of 2011, Baidu's share of China's Internet-search revenue had reached 76%. In December 2007, Baidu became the first Chinese company to be included in the NASDAQ-100 index. In December 2014, Baidu was expected to invest in the company Uber.

Baidu provides an index of over 740 million web pages, 80 million images, and 10 million multimedia files. It offers multimedia content including MP3 music and movies, and was the first in China to offer Wireless Application Protocol (WAP)- and personal digital assistant (PDA)-based mobile search. Baidu Baike is an encyclopedia similar to Wikipedia; unlike Wikipedia, however, only registered users can edit its articles. While access to Wikipedia has been intermittently blocked, or certain articles filtered, in China since June 2004, there is some controversy about the degree to which Baidu cooperates with Chinese government censorship. (Wikipedia)



Disclosed are systems and methods that implement efficient engines for computation-intensive tasks such as neural network deployment. Various embodiments of the invention provide for high-throughput batching that increases throughput of streaming data in high-traffic applications, such as real-time speech transcription. In embodiments, throughput is increased by dynamically assembling into batches and processing together user requests that arrive randomly at unknown times, such that not all the data is present at once at the time of batching. Some embodiments allow for performing streaming classification using pre-processing. The gains in performance allow for more efficient use of a compute engine and drastically reduce the cost of deploying large neural networks at scale, while meeting strict application requirements and adding relatively little computational latency so as to maintain a satisfactory application experience.
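The abstract does not give implementation details, but the core idea of dynamic batching can be sketched as follows: requests that arrive at unknown times are pulled from a queue and dispatched together once a batch fills up or a short timeout elapses, so stragglers add only bounded latency. This is a minimal illustrative sketch in Python; the function and parameter names (`batch_worker`, `max_batch`, `timeout_s`) are hypothetical, not Baidu's implementation.

```python
import queue
import time

def batch_worker(requests, process_batch, max_batch=4, timeout_s=0.01):
    """Collect asynchronously arriving requests into batches.

    Items are pulled from a queue; a batch is dispatched when it reaches
    max_batch items or when timeout_s elapses since its first item, so
    late arrivals add at most timeout_s of latency. The StopIteration
    class is used as an end-of-stream sentinel.
    """
    results = []
    batch = []
    deadline = None
    while True:
        try:
            wait = None if deadline is None else max(0.0, deadline - time.monotonic())
            item = requests.get(timeout=wait)
        except queue.Empty:
            item = None  # timed out waiting for more requests
        if item is not None:
            if item is StopIteration:       # no more requests: flush and stop
                if batch:
                    results.extend(process_batch(batch))
                return results
            if not batch:                   # first item starts the timeout clock
                deadline = time.monotonic() + timeout_s
            batch.append(item)
        if batch and (len(batch) >= max_batch or item is None):
            results.extend(process_batch(batch))  # batch full or timed out
            batch, deadline = [], None
```

In a real deployment `process_batch` would run the neural network on the assembled batch; here any function over a list of items demonstrates the batching behavior.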


Described herein are systems and methods for generating and using attention-based deep learning architectures for the visual question answering (VQA) task, to automatically generate answers for questions about images (still or video). To generate correct answers, it is important for a model's attention to focus on the relevant regions of an image according to the question, because different questions may ask about the attributes of different image regions. In embodiments, such question-guided attention is learned with a configurable convolutional neural network (ABC-CNN). Embodiments of the ABC-CNN models determine the attention maps by convolving the image feature map with configurable convolutional kernels determined by the question's semantics. In embodiments, the question-guided attention maps focus on the question-related regions and filter out noise in the unrelated regions.
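The mechanism described above (a kernel derived from the question, convolved with the image feature map) can be illustrated with a simplified NumPy sketch. This is not the ABC-CNN architecture itself: the projection matrix, the use of a 1x1 kernel, and all names are assumptions made for illustration.

```python
import numpy as np

def question_guided_attention(feat_map, q_embed, proj):
    """Simplified question-configured attention.

    feat_map: (C, H, W) image feature map
    q_embed:  (D,) question embedding
    proj:     (C, D) matrix projecting the question embedding to a
              question-specific 1x1 convolutional kernel over channels

    The question embedding is projected to a per-channel kernel; a 1x1
    convolution (a weighted sum over channels) scores each spatial
    position, and a softmax over positions yields the attention map.
    """
    kernel = proj @ q_embed                          # (C,) question-specific kernel
    scores = np.tensordot(kernel, feat_map, axes=1)  # (H, W) spatial scores
    e = np.exp(scores - scores.max())                # numerically stable softmax
    return e / e.sum()                               # attention map sums to 1
```

Different questions produce different kernels and hence highlight different image regions, which is the essence of question-guided attention.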


Described herein are systems and methods that address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, embodiments are able to efficiently hypothesize the semantic meaning of new words and add them to the model's word dictionary so that they can be used to describe images that contain these novel concepts. In the experiments, it was shown that the tested embodiments effectively learned novel visual concepts from a few examples without disturbing the previously learned concepts.


The embodiments of the present disclosure disclose a vehicular lane line data processing method, apparatus, storage medium, and device. The method includes: acquiring at least two consecutive original images of a vehicular lane line and positioning data of the original images; calculating, using a deep neural network model, a pixel confidence indicating the conformity between a pixel characteristic in the original images and a vehicular lane line characteristic; determining an outline of the vehicular lane line from the original images and using the outline as a candidate vehicular lane line; calculating a vehicular lane line confidence of the candidate vehicular lane line based on the pixel confidences of pixels in the candidate vehicular lane line; filtering the candidate vehicular lane line based on its vehicular lane line confidence; recognizing, for the filtered vehicular lane line, attribute information of the vehicular lane line; and determining map data of the vehicular lane line based on the attribute information and the positioning data captured during shooting of the original images. By means of the vehicular lane line data processing method, apparatus, storage medium, and device provided by the embodiments of the present disclosure, the vehicular lane line data can be efficiently and precisely determined, the labor costs in high-precision map production are greatly reduced, and the mass production of high-precision maps can be achieved.
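The confidence-aggregation and filtering steps above can be sketched in a few lines of Python. The abstract does not specify how pixel confidences are aggregated into a lane-line confidence, so this sketch assumes the mean; the function name, data layout, and threshold are all hypothetical.

```python
def lane_line_confidence(pixel_conf, candidates, threshold=0.5):
    """Aggregate per-pixel confidences into per-candidate lane-line
    confidences and keep candidates above a threshold.

    pixel_conf: dict mapping (row, col) -> confidence in [0, 1],
                as produced by the deep neural network model
    candidates: list of candidate outlines, each a list of (row, col)
    Returns the kept candidates paired with their confidences.
    """
    kept = []
    for outline in candidates:
        confs = [pixel_conf.get(p, 0.0) for p in outline]
        score = sum(confs) / len(confs) if confs else 0.0  # mean is an assumption
        if score >= threshold:
            kept.append((outline, score))
    return kept
```

The surviving candidates would then go on to attribute recognition and, with the positioning data, to map generation.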


Systems and methods for a multi-core optimized Recurrent Neural Network (RNN) architecture are disclosed. The various architectures affect communication and synchronization operations according to the Multi-Bulk-Synchronous-Parallel (MBSP) model for a given processor. The resulting family of network architectures, referred to as MBSP-RNNs, perform similarly to conventional RNNs having the same number of parameters, but are substantially more efficient when mapped onto a modern general purpose processor. Due to the large gain in computational efficiency, for a fixed computational budget, MBSP-RNNs outperform RNNs at applications such as end-to-end speech recognition.


Patent
Baidu | Date: 2017-06-21

Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary nor even the concept of a phoneme is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows a large amount of varied training data to be obtained efficiently. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.


Disclosed is an artificial intelligence based voiceprint login method. The method comprises: S1: receiving a login request from a user and acquiring user information of the user; S2: generating a login string and replacing at least one character in the login string according to character replacement control information corresponding to the user information; S3: providing the user with the replaced login string and receiving voice information of the login string read by the user; and S4: performing login authentication on the user according to the voice information of the login string read by the user. On one hand, the method increases voiceprint password security by combining the voiceprint with user-defined character replacements instead of performing voiceprint authentication on ordinary information; on the other hand, it hides the characters that a user wishes to hide, satisfying the user's psychological need not to directly reveal the entire password, improving the user experience and increasing password security. Also disclosed is an artificial intelligence based voiceprint login device.
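Steps S2-S3 (generate a login string, then substitute characters per the user's replacement settings before displaying it) can be sketched as below. This is an illustrative assumption of how the character replacement control information might be modeled, not the patented scheme; the digit-only alphabet and single-character substitutions are simplifications.

```python
import secrets
import string

def build_login_string(replacement_map, length=8):
    """Generate a random digit login string (S2), then apply the
    user's character replacement control information so the displayed
    string hides the replaced characters (e.g. the user knows that
    '*' stands for '1' and reads it accordingly in S3).

    replacement_map: hypothetical user-defined mapping from original
                     characters to their displayed substitutes.
    Returns (original, displayed): the server keeps the original for
    verification; only the displayed form is shown to the user.
    """
    original = ''.join(secrets.choice(string.digits) for _ in range(length))
    displayed = ''.join(replacement_map.get(ch, ch) for ch in original)
    return original, displayed
```

Authentication (S4) would then match both the spoken content against the original string and the speaker's voiceprint against the enrolled one.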


Disclosed in embodiments of the present invention are a video advertisement filtering method, apparatus, and device. The video advertisement filtering method includes: recognizing a countdown number in the countdown area of an advertisement frame in a video; determining the time difference between the advertisement's broadcast end time and the present time based on the recognized countdown number; and advancing the playing time of the video to the advertisement's broadcast end time based on that time difference. The video advertisement filtering method, apparatus, and device provided in the embodiments of the present invention can filter an inserted video advertisement out of a video program.
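The seek arithmetic described above is simple once the countdown number has been recognized: the number counts the seconds remaining, so the ad's end time is the frame timestamp plus that value. A minimal sketch, assuming a hypothetical `player.seek` API (the countdown recognition itself, typically an OCR step, is omitted):

```python
def ad_end_time(frame_time_s, countdown_value_s):
    """The countdown shown in the ad's timer area gives the seconds
    remaining, so the ad ends at frame timestamp + recognized value."""
    return frame_time_s + countdown_value_s

def advance_player(player, frame_time_s, countdown_value_s):
    """Skip the inserted advertisement by seeking the player to the
    computed end time. `player.seek` is a hypothetical API."""
    player.seek(ad_end_time(frame_time_s, countdown_value_s))
```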


The present application discloses a method and apparatus for obtaining a semantic label of a digital image. An implementation of the method includes: obtaining the digital image; looking up a semantic label model corresponding to the digital image, the semantic label model representing the correlation between digital images and semantic labels, a semantic label being used to literally describe a digital image; and feeding the digital image into the semantic label model to obtain full-image recognition information and local recognition information corresponding to the digital image, then combining the full-image recognition information and the local recognition information to form a semantic label, the full-image recognition information being a summarized description of the digital image, and the local recognition information being a detailed description of the digital image. According to the implementation, the digital image is obtained first, then a semantic label model corresponding to the digital image is looked up, and a semantic label is obtained by using the semantic label model, which may improve the accuracy of the obtained semantic label.
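The final combination step (a summarized full-image description merged with detailed local descriptions) can be sketched as below. The joining format is an assumption for illustration; the application does not specify how the two kinds of recognition information are combined textually.

```python
def combine_semantic_label(full_image_info, local_info):
    """Combine the full-image recognition result (a summarized
    description) with the local recognition results (detailed
    descriptions of image regions) into one semantic label.

    full_image_info: str, e.g. the overall scene description
    local_info: list of str, region-level descriptions
    """
    details = ', '.join(local_info)
    return f"{full_image_info} ({details})" if details else full_image_info
```

For example, a full-image result "a street scene" with local results ["red car", "traffic light"] would yield the label "a street scene (red car, traffic light)".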
