Audience | Date: 2015-03-16
Adaptive system and method for interacting with and navigating 2D and 3D audiovisual content by interpreting the side-to-side and/or up-and-down physical gestures and sounds of a multi-person audience. The system may be used in a movie theatre, stadium, music arena, or other venue where an audience views the same view screen. A camera pointed at the audience captures audience motion and a microphone captures sounds. Optical flow is determined by comparing successive video frames. A Motion Index between -100 and 100 is calculated from the optical flow by comparing more recent optical flow vectors with previous values. Methods are provided to give equal weight to audience members closer to and further from the camera. Motion Index values are used to control and interact with content scenarios.
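The abstract does not give the Motion Index formula, so the following is only a minimal sketch of one way a signed index could be derived. It assumes the index is the mean horizontal flow of recent frames, scaled by the largest magnitude seen among previous values and clamped to the range -100 to 100. The function name `motion_index` and this normalization are hypothetical.

```python
from statistics import mean


def motion_index(recent_flow, previous_flow, eps=1e-9):
    """Map mean horizontal optical flow to a signed index in [-100, 100].

    recent_flow, previous_flow: lists of horizontal flow components
    (for example pixels per frame). The scaling rule is an assumption,
    not the method described in the abstract.
    """
    if not recent_flow:
        return 0.0
    current = mean(recent_flow)
    # Reference scale: the largest magnitude among previous values,
    # the current value, and a small epsilon to avoid division by zero.
    reference = max([abs(v) for v in previous_flow] + [abs(current), eps])
    index = 100.0 * current / reference
    # Clamp to the documented range.
    return max(-100.0, min(100.0, index))
```

Under this convention a positive index would mean the audience is moving toward one side and a negative index toward the other; which side is positive depends on the camera setup.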
Audience | Date: 2015-10-02
Systems and methods for noise suppression using noise subtraction processing are provided. The noise subtraction processing comprises receiving at least a primary and a secondary acoustic signal. A desired signal component may be calculated and subtracted from the secondary acoustic signal to obtain a noise component signal. A determination may be made of a reference energy ratio and a prediction energy ratio. A determination may be made as to whether to adjust the noise component signal based partially on the reference energy ratio and partially on the prediction energy ratio. The noise component signal may be adjusted or frozen based on the determination. The noise component signal may then be removed from the primary acoustic signal to generate a noise subtracted signal which may be outputted.
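The processing order described above can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the claimed method: the abstract does not define the two energy ratios, so the definitions, thresholds, and update rule below are hypothetical, as are the names `noise_subtract`, `sigma`, and `alpha`.

```python
def energy(frame):
    """Sum of squared samples in a frame."""
    return sum(x * x for x in frame)


def noise_subtract(primary, secondary, sigma, alpha,
                   ref_threshold=1.0, pred_threshold=1.0, step=0.5):
    """Process one frame; returns (output, updated_alpha, adapted).

    sigma: assumed coupling of the desired signal into the secondary signal.
    alpha: assumed coupling of the noise component into the primary signal.
    """
    eps = 1e-12
    # 1. Subtract the estimated desired-signal component from the
    #    secondary signal to obtain the noise component.
    noise = [s - sigma * p for p, s in zip(primary, secondary)]
    # 2. Hypothetical ratio definitions (not given in the abstract).
    ref_ratio = energy(primary) / (energy(secondary) + eps)
    pred_ratio = energy([alpha * n for n in noise]) / (energy(primary) + eps)
    # 3. Adjust alpha only when both ratios are below their thresholds;
    #    otherwise the coefficient is frozen for this frame.
    adapted = ref_ratio < ref_threshold and pred_ratio < pred_threshold
    if adapted:
        cross = sum(p * n for p, n in zip(primary, noise))
        target = cross / (energy(noise) + eps)
        alpha += step * (target - alpha)
    # 4. Remove the scaled noise component from the primary signal.
    output = [p - alpha * n for p, n in zip(primary, noise)]
    return output, alpha, adapted
```

Freezing the coefficient when the ratios fail their thresholds keeps the noise estimate from adapting on frames where, under these assumed definitions, the desired signal dominates.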
Audience | Date: 2015-12-08
Systems and methods for a dynamic local automatic speech recognition (ASR) vocabulary are provided. An example method includes defining a user actionable screen content based on user interactions. At least a portion of the user actionable screen content is labeled. A local vocabulary associated with a local ASR engine is created based partially on the labeling. The local vocabulary includes words associated with functions of a mobile device and is limited by resources of the mobile device. The method includes determining whether speech includes a local key phrase or a cloud-based key phrase. Based on the determination, the method includes performing ASR on the speech using the local ASR engine or forwarding the speech to a cloud-based computing engine and performing ASR therewithin based on the cloud-based computing engine's larger vocabulary.
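A minimal sketch of the two steps above, building a size-limited local vocabulary from screen labels and then choosing an engine by key phrase. It works on text rather than audio for simplicity; the function names, the word-count limit standing in for device resources, and the prefix-matching rule are all assumptions made for illustration.

```python
def build_local_vocabulary(screen_labels, max_words):
    """Collect unique lowercase words from labels, capped at max_words.

    max_words stands in for the resource limit of the mobile device.
    """
    vocabulary = []
    for label in screen_labels:
        for word in label.lower().split():
            if len(vocabulary) >= max_words:
                return vocabulary
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary


def choose_engine(utterance, local_key_phrases, cloud_key_phrases):
    """Return 'local', 'cloud', or 'none' based on the leading key phrase."""
    text = utterance.lower()
    if any(text.startswith(p) for p in local_key_phrases):
        return "local"
    if any(text.startswith(p) for p in cloud_key_phrases):
        return "cloud"
    return "none"
```

For example, with the labels "Play Music", "Open Camera", and "Play Video" and a limit of four words, the local vocabulary would hold "play", "music", "open", and "camera"; an utterance starting with "open" would then be handled locally.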
Audience | Date: 2015-09-21
Provided are systems and methods for image enhancement based on combining multiple related images, such as images of the same object taken from different imaging angles. This approach allows simulating images captured from longer distances using telephoto lenses. Initial images may be captured using a simple camera equipped with shorter focal length lenses, typically used on camera phones, tablets, and laptops. The initial images may be taken using a single camera. An object or, more specifically, a center line of the object is identified in each image. The object is typically present in the foreground portion of the initial images. The initial images may be cross-faded along the object center line to yield a combined image. The foreground and background portions of each image may be separated and separately processed, such as by blurring the background portion and sharpening the foreground portion.
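The cross-fade along the object center line can be illustrated on a single row of grayscale pixels. This sketch assumes the two images are already aligned so that the center line falls on the same column in both, and that the blend is linear across a band around that column; neither detail is stated in the abstract, and `cross_fade_row` is a hypothetical name.

```python
def cross_fade_row(row_a, row_b, center, band):
    """Blend two aligned pixel rows around a center column.

    Pixels left of (center - band) come from row_a, pixels right of
    (center + band) come from row_b, and the weight of row_b rises
    linearly from 0 to 1 in between. band must be positive.
    """
    if band <= 0:
        raise ValueError("band must be positive")
    blended = []
    for x, (a, b) in enumerate(zip(row_a, row_b)):
        t = (x - (center - band)) / (2 * band)
        t = max(0.0, min(1.0, t))  # clamp the blend weight to [0, 1]
        blended.append((1 - t) * a + t * b)
    return blended
```

Applying the function to every row of the two images would yield the combined image; a wider band gives a softer transition across the center line.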
Audience | Date: 2015-03-24
Systems and methods for estimating and tracking multiple attributes of multiple objects from multi-sensor data are provided. An exemplary method includes identifying features associated with sensor data. The sensor data represents data captured by at least one of a plurality of acoustic and non-acoustic sensors. Identification of the features associated with the sensor data may be based variously on detected sounds, motions, images, and the like. The exemplary method further includes determining, in parallel, multiple probable objects based at least in part on the identified features. Various embodiments of the method also include forming hypotheses based at least in part on associating identified features with the multiple probable objects and attributing the formed hypotheses to channels. Sequences of the formed hypotheses are constructed. The exemplary system includes a tracking module configured to provide the channels and constructed sequences for use in various signal processing, such as signal enhancement.
Audience | Date: 2015-08-27
Systems and methods for multi-sourced noise suppression are provided. An example system may receive streams of audio data including a voice signal and noise, the voice signal including a spoken word. The streams of audio data are provided by distributed audio devices. The system can assign weights to the audio streams based at least partially on quality of the audio streams. The weights of the audio streams can be determined based on signal-to-noise ratios (SNRs). The system may further process, based on the weights, the audio streams to generate cleaned speech. Each audio device comprises one or more microphones and can be an Internet of Things (IoT) device. The processing can include noise suppression, noise reduction, and echo cancellation. The cleaned speech can be provided to a remote device for further processing, which may include Automatic Speech Recognition (ASR).
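One simple way to weight streams by SNR is sketched below. It assumes the streams are already time-aligned and that weights are proportional to linear SNR, normalized to sum to one; the abstract does not specify the weighting rule, and the function names are hypothetical.

```python
def snr_weights(snrs_db):
    """Convert per-stream SNRs in dB to weights that sum to 1."""
    linear = [10 ** (snr / 10) for snr in snrs_db]
    total = sum(linear)
    return [value / total for value in linear]


def combine_streams(streams, weights):
    """Weighted sample-by-sample sum of time-aligned streams.

    The output is truncated to the length of the shortest stream.
    """
    length = min(len(stream) for stream in streams)
    return [
        sum(w * stream[i] for w, stream in zip(weights, streams))
        for i in range(length)
    ]
```

With SNRs of 10 dB and 0 dB, the first stream would receive a weight of 10/11 and the second 1/11, so the cleaner device dominates the combined signal.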
Audience | Date: 2015-09-10
The present technology provides adaptive noise reduction of an acoustic signal using a sophisticated level of control to balance the tradeoff between speech loss distortion and noise reduction. The energy level of a noise component in a sub-band signal of the acoustic signal is reduced based on an estimated signal-to-noise ratio of the sub-band signal, and further on an estimated threshold level of speech distortion in the sub-band signal. In various embodiments, the energy level of the noise component in the sub-band signal may be reduced to no less than a residual noise target level. Such a target level may be defined as a level at which the noise component ceases to be perceptible.
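The interaction between SNR-driven suppression, a speech distortion limit, and the residual noise target can be sketched as a per-sub-band gain. The Wiener-style gain rule and the `distortion_floor` parameter are assumptions chosen for illustration; the abstract does not state which gain function or distortion estimate is used.

```python
import math


def subband_gain(snr, noise_energy, residual_target, distortion_floor=0.0):
    """Amplitude gain for one sub-band, between 0 and 1.

    snr: estimated linear signal-to-noise ratio of the sub-band.
    noise_energy: estimated noise energy in the sub-band.
    residual_target: noise energy level below which no further
        reduction is applied.
    distortion_floor: hypothetical minimum gain that keeps speech
        distortion under its threshold.
    """
    # Wiener-style suppression: more attenuation at low SNR.
    wiener = snr / (1.0 + snr)
    # Smallest gain that leaves the noise at or above the residual target
    # (gain acts on amplitude, so energy scales with gain squared).
    if noise_energy > 0:
        residual_floor = min(1.0, math.sqrt(residual_target / noise_energy))
    else:
        residual_floor = 1.0
    return max(wiener, residual_floor, distortion_floor)
```

When the noise energy is already below the residual target, the floor becomes 1 and the sub-band passes through unchanged.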
Audience | Date: 2015-12-29
Provided are systems and methods for context-based services based on keyword monitoring. An example method includes monitoring an acoustic signal associated with a user. The acoustic signal is captured by audio devices associated with the user. The method can include detecting at least one keyword in the acoustic signal and determining, based at least partially on the acoustic signal, context data associated with the at least one keyword and the user. The keyword and the context data are analyzed to determine an indication of an intent, a need, or a wish of the user. The indication can be sent to a service provider. The service provider can provide information associated with context-based services to the user. The information may include a reminder, an advertisement, a coupon, a discount offer, or a rebate. The information can be sent when the user is located at a specific location.
Audience | Date: 2015-01-06
Systems and methods for registration of a customer inside a business place are disclosed. A method can include transmitting, by one or more ultrasound speakers associated with the business place, an ultrasound audio signature to a mobile device associated with a customer. In addition, the method can include sending, over one or more networks, a message from the mobile device to a server. An example message includes at least information associated with the customer and the ultrasound audio signature. In response to the message, a notification concerning the customer is sent from the server to the business place, according to various embodiments.
Audience | Date: 2016-01-06
Provided are systems and methods for utilizing digital microphones in low power keyword detection and noise suppression. An example method includes receiving a first acoustic signal representing at least one sound captured by a digital microphone. The first acoustic signal includes buffered data transmitted with a first clock frequency. The digital microphone may provide voice activity detection. The example method also includes receiving at least one second acoustic signal representing the at least one sound captured by a second microphone, the at least one second acoustic signal including real-time data. The first and second acoustic signals are provided to an audio processing system which may include noise suppression and keyword detection. The buffered portion may be sent with a higher, second clock frequency to eliminate a delay of the first acoustic signal relative to the second acoustic signal. Providing the signals may also include delaying the second acoustic signal.
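As a rough worked example of why a higher clock frequency removes the delay, the sketch below estimates how long the buffered signal takes to catch up with real time. It assumes the data rate is proportional to the clock frequency and that new audio keeps arriving at the first clock rate while the backlog drains; the function name and the example frequencies are hypothetical.

```python
def catch_up_time(buffered_seconds, first_clock_hz, second_clock_hz):
    """Seconds needed to drain a backlog when sending at the faster clock.

    The backlog holds buffered_seconds of audio produced at the first
    clock rate. While it is sent at the second clock rate, new audio
    still accumulates at the first rate, so the backlog shrinks at
    (second_clock_hz - first_clock_hz).
    """
    if second_clock_hz <= first_clock_hz:
        raise ValueError("second clock must be faster than the first clock")
    backlog = buffered_seconds * first_clock_hz
    return backlog / (second_clock_hz - first_clock_hz)
```

Under these assumptions, half a second of buffered audio sent at twice the original clock frequency would be caught up after another half second.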