Niwa K.,NTT Media Intelligence Laboratories |
Ochi D.,NTT Media Intelligence Laboratories |
Kameda A.,NTT Media Intelligence Laboratories |
Kamamoto Y.,NTT Communication Science Laboratories |
Moriya T.,NTT Communication Science Laboratories
141st Audio Engineering Society International Convention 2016, AES 2016 | Year: 2016
In virtual reality (VR), 360° video services provided through head mounted displays (HMDs) or smartphones are widely used. In many of these services, however, the sound delivered through headphones stays fixed even though the user's viewpoint changes seamlessly and the image follows the user's head motion. We have been studying acoustic immersion technology that, for example, generates binaural sound corresponding to the user's head motion. Our method consists of angular region-wise source enhancement using array observation signals, multichannel audio encoding based on MPEG-4 Audio Lossless Coding (ALS), and binaural synthesis of the enhanced signals using head-related transfer functions (HRTFs). In this paper, we constructed a smartphone-based real-time system for streaming and viewing 360° video with acoustic immersion and evaluated it through subjective tests.
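The binaural synthesis step described above can be illustrated with a minimal sketch. It assumes each enhanced angular-region signal is filtered with a left/right impulse-response pair chosen from the region's azimuth relative to the current head yaw, and that the results are summed per ear. The function `toy_hrir` is a hypothetical stand-in (a simple level difference plus a one-sample delay at the far ear), not a measured HRTF, and the names `convolve` and `binauralize` are illustrative only; the abstract does not specify the authors' implementation.

```python
import math


def convolve(signal, kernel):
    """Plain full linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def toy_hrir(azimuth_deg):
    """Hypothetical two-tap impulse responses (left, right).

    Positive azimuth means the source is to the listener's right.
    This is an assumption for illustration, not measured HRTF data.
    """
    pan = math.sin(math.radians(azimuth_deg))
    gain_left = (1.0 - pan) / 2.0
    gain_right = (1.0 + pan) / 2.0
    # Delay the ear that is farther from the source by one sample.
    left = [0.0, gain_left] if pan > 0 else [gain_left, 0.0]
    right = [0.0, gain_right] if pan < 0 else [gain_right, 0.0]
    return left, right


def binauralize(region_signals, head_yaw_deg):
    """Mix region signals (dict: azimuth in degrees -> samples) to two ears."""
    length = max(len(s) for s in region_signals.values()) + 1
    left_out = [0.0] * length
    right_out = [0.0] * length
    for azimuth_deg, samples in region_signals.items():
        # Head rotation shifts every source's azimuth relative to the head.
        hrir_left, hrir_right = toy_hrir(azimuth_deg - head_yaw_deg)
        for i, v in enumerate(convolve(samples, hrir_left)):
            left_out[i] += v
        for i, v in enumerate(convolve(samples, hrir_right)):
            right_out[i] += v
    return left_out, right_out
```

Calling `binauralize({90.0: [1.0, 0.5]}, head_yaw_deg=90.0)` places the source straight ahead of the rotated head, so both ears receive the same signal; with `head_yaw_deg=0.0` the same source is heard only on the right in this toy model.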
Ogasawara T.,Device Innovation Center |
Ono K.,Device Innovation Center |
Matsuura N.,Device Innovation Center |
Yamaguchi M.,Tech Lab Group |
And 2 more authors.
NTT Technical Review | Year: 2015
NTT has developed a conductive fabric called hitoe that enables continuous measurement of the biological signals of the person wearing it. Heartbeat variations and electrocardiogram signals detected through hitoe are transmitted wirelessly by a compact dedicated device to a smartphone or tablet, where they can be readily checked using an application. Such technology is expected to lead to the creation of new services in fields such as sports training, health enhancement, security and safety, medical care support, and entertainment. In this article, we introduce some examples of approaches to application development.
Imoto K.,NTT Media Intelligence Laboratories |
Ohishi Y.,NTT Communication Science Laboratories |
Uematsu H.,NTT Media Intelligence Laboratories |
Ohmuro H.,NTT Media Intelligence Laboratories
IEEE International Workshop on Machine Learning for Signal Processing, MLSP | Year: 2013
We propose a model for analyzing acoustic scenes from long-term (more than several seconds) acoustic signals. The model, called the latent acoustic topic and event allocation (LATEA) model, is a probabilistic generative model of an acoustic feature sequence associated with acoustic scenes (e.g., 'cooking') and acoustic events (e.g., 'cutting with a knife,' 'heating a skillet,' or 'running water'). By representing acoustic events and scenes as latent variables, the proposed model allows the analysis of a wide variety of sounds and the capture of abstract acoustic scenes; by representing acoustic features as a mixture of Gaussian components, it can also describe the acoustic similarity and variance between acoustic events. Experiments with real-life sounds indicated that the proposed model exhibited lower perplexity than conventional models and improved the stability of acoustic scene estimation. The experimental results also suggested that the proposed model can better describe the acoustic similarity and variance between acoustic events than conventional models. © 2013 IEEE.
PubMed | Disney Research and NTT Communication Science Laboratories
Type: Journal Article | Journal: IEEE transactions on pattern analysis and machine intelligence | Year: 2013
Discriminative, or (structured) prediction, methods have proved effective for a variety of problems in computer vision; a notable example is 3D monocular pose estimation. All methods to date, however, have relied on the assumption that training (source) and test (target) data come from the same underlying joint distribution. In many real cases, including standard data sets, this assumption is flawed. In the presence of training set bias, learning results in a biased model whose performance degrades on the (target) test set. Under the assumption of covariate shift, we propose an unsupervised domain adaptation approach to address this problem. The approach takes the form of training instance reweighting, where the weights are assigned based on the ratio of training and test marginals evaluated at the samples. Learning with the resulting weighted training samples alleviates the bias in the learned models. We show the efficacy of our approach by proposing weighted variants of kernel regression (KR) and twin Gaussian processes (TGP). We show that our weighted variants outperform their unweighted counterparts and improve on the state-of-the-art performance on the public (HumanEva) data set.
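The idea of training instance reweighting under covariate shift can be sketched as follows. This is a simplified illustration, not the paper's weighted KR or TGP: it assumes the training and test input marginals are known one-dimensional Gaussians (the paper's approach is unsupervised and would have to estimate the ratio from samples), it takes the weight as the test density divided by the training density, and it applies the weights to an ordinary least-squares line fit. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, train_mean, train_std, test_mean, test_std):
    """Weight each training input by p_test(x) / p_train(x).

    Assumption: both marginals are Gaussians with known parameters.
    """
    return [
        gaussian_pdf(x, test_mean, test_std) / gaussian_pdf(x, train_mean, train_std)
        for x in xs
    ]


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = slope * x + intercept."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training inputs cluster near 0, while test inputs are assumed to centre on 1.
train_x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
train_y = [x * x for x in train_x]  # a curved target that a line cannot match
weights = importance_weights(train_x, 0.0, 1.0, 1.0, 1.0)
unweighted_fit = weighted_line_fit(train_x, train_y, [1.0] * len(train_x))
reweighted_fit = weighted_line_fit(train_x, train_y, weights)
```

Because the line is a misspecified model for the curved target, the reweighted fit trades accuracy near the training mean for accuracy in the region where test inputs are assumed to concentrate.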
Nagata M.,NTT Communication Science Laboratories |
Sudoh K.,NTT Communication Science Laboratories |
Suzuki J.,NTT Communication Science Laboratories |
Akiba Y.,Communication Intelligence |
And 2 more authors.
NTT Technical Review | Year: 2013
English and Japanese have very different word orders, which makes them probably one of the most difficult language pairs to translate. We developed a new method of translating English to Japanese that takes advantage of the head-final linguistic nature of Japanese. It first changes the word order in an English sentence into that of a Japanese sentence and then translates the reordered English sentence into Japanese. We found that our method dramatically improved the accuracy of English-to-Japanese translation. We also found that the method is highly effective for Chinese-to-Japanese translation.
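The reordering step can be illustrated with a small sketch. It assumes the English sentence has already been parsed into a tree in which every phrase marks one child as its syntactic head, and it moves that head to the end of the phrase, recursively. The `Node` class, the hand-written tree, and the head annotations are hypothetical; the article does not describe the parser or any additional rules the actual system applies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A parse-tree node: either a word (leaf) or a phrase with children."""
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    head: int = 0  # index of the head child within `children`


def head_final(node: Node) -> List[str]:
    """Return the words of `node` with each phrase's head moved to the end."""
    if node.word is not None:
        return [node.word]
    non_heads = [c for i, c in enumerate(node.children) if i != node.head]
    words: List[str] = []
    for child in non_heads + [node.children[node.head]]:
        words.extend(head_final(child))
    return words


# "John hit a ball": S -> NP VP(head), VP -> V(head) NP, NP -> Det N(head)
sentence = Node(
    children=[
        Node(word="John"),
        Node(
            children=[
                Node(word="hit"),
                Node(children=[Node(word="a"), Node(word="ball")], head=1),
            ],
            head=0,
        ),
    ],
    head=1,
)

reordered = " ".join(head_final(sentence))  # "John a ball hit"
```

The reordered string follows subject-object-verb order, which is closer to Japanese word order, so a subsequent translation step has less long-distance reordering left to do.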
Maeda E.,NTT Communication Science Laboratories
NTT Technical Review | Year: 2013
Research at NTT Communication Science Laboratories draws on both information science and human science with the aim of building a new technical infrastructure that will connect humans and information. These Feature Articles introduce new trends in the fields of speech, language, and hearing, which have a relatively long history of basic research.