ATR Intelligent Robotics and Communication Labs.

Kyoto, Japan


Liu C.,ATR Intelligent Robotics and Communication Labs. | Ishi C.T.,ATR Intelligent Robotics and Communication Labs. | Ishiguro H.,ATR Hiroshi Ishiguro Labs. | Hagita N.,ATR Intelligent Robotics and Communication Labs.
HRI'12 - Proceedings of the 7th Annual ACM/IEEE International Conference on Human-Robot Interaction | Year: 2012

Head motion occurs naturally and in synchrony with speech during human dialogue communication, and may carry paralinguistic information, such as intentions, attitudes and emotions. Therefore, natural-looking head motion by a robot is important for smooth human-robot interaction. Based on rules inferred from analyses of the relationship between head motion and dialogue acts, this paper proposes a model for generating head tilting and nodding, and evaluates the model using three types of humanoid robots (a very human-like android, "Geminoid F"; a typical humanoid robot with fewer facial degrees of freedom, "Robovie R2"; and a robot with a 3-axis rotatable neck and movable lips, "Telenoid R2"). Analysis of subjective scores shows that the proposed model, including head tilting and nodding, can generate head motion with greater naturalness than nodding only or directly mapping people's original motions without gaze information. We also find that an upward motion of a robot's face can be used by robots that do not have a mouth to give the appearance that an utterance is taking place. Finally, we conduct an experiment in which participants act as visitors to an information desk attended by robots, and verify that our generation model performs comparably to directly mapping people's original motions with gaze information in terms of perceived naturalness. © 2012 ACM.
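
To make the rule-based generation concrete, here is a minimal Python sketch of how dialogue-act-driven selection of head tilting and nodding might look. The act labels, angles, and timing values are illustrative assumptions, not the rules reported in the paper.

```python
# Dialogue-act labels, gesture choices, and timing values below are
# illustrative assumptions, not the rules reported in the paper.
GESTURE_RULES = {
    "agreement":   ("nod",  {"depth_deg": 15, "repeats": 2}),
    "affirmation": ("nod",  {"depth_deg": 10, "repeats": 1}),
    "thinking":    ("tilt", {"angle_deg": 12, "hold_s": 0.8}),
    "question":    ("tilt", {"angle_deg": 8,  "hold_s": 0.5}),
}

def head_gesture_for(dialogue_act, phrase_final=False):
    """Return a (gesture, params) command for the robot's neck controller."""
    gesture = GESTURE_RULES.get(dialogue_act)
    if gesture is None and phrase_final:
        # Nods also cluster at phrase-final syllables, so default to a
        # light nod at strong phrase boundaries.
        return ("nod", {"depth_deg": 8, "repeats": 1})
    return gesture or ("none", {})

print(head_gesture_for("agreement"))
print(head_gesture_for("statement", phrase_final=True))
```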


Liu C.,ATR Intelligent Robotics and Communication Labs | Ishi C.T.,ATR Intelligent Robotics and Communication Labs | Ishiguro H.,ATR Hiroshi Ishiguro Labs | Hagita N.,ATR Intelligent Robotics and Communication Labs
International Journal of Humanoid Robotics | Year: 2013

Head motion occurs naturally and in synchrony with speech during human dialogue communication, and may carry paralinguistic information, such as intentions, attitudes and emotions. Therefore, natural-looking head motion by a robot is important for smooth human-robot interaction. Based on rules inferred from analyses of the relationship between head motion and dialogue acts, this paper proposes a model for generating head tilting and nodding, and evaluates the model using three types of humanoid robots (a very human-like android, "Geminoid F"; a typical humanoid robot with fewer facial degrees of freedom, "Robovie R2"; and a robot with a 3-axis rotatable neck and movable lips, "Telenoid R2"). Analysis of subjective scores shows that the proposed model, including head tilting and nodding, can generate head motion with greater naturalness than nodding only or directly mapping people's original motions without gaze information. We also find that an upward motion of a robot's face can be used by robots that do not have a mouth to give the appearance that an utterance is taking place. Finally, we conduct an experiment in which participants act as visitors to an information desk attended by robots, and verify that our generation model performs comparably to directly mapping people's original motions with gaze information in terms of perceived naturalness. © 2013 World Scientific Publishing Company.


Liu C.,ATR Intelligent Robotics and Communication Labs | Ishi C.T.,ATR Intelligent Robotics and Communication Labs | Ishiguro H.,ATR Hiroshi Ishiguro Labs
ACM/IEEE International Conference on Human-Robot Interaction | Year: 2015

In a tele-operated robot system, the reproduction of auditory scenes, conveying 3D spatial information of sound sources in the remote robot environment, is important for transmitting remote presence to the tele-operator. We proposed a tele-presence system that reproduces and manipulates the auditory scene of a remote robot environment, based on the spatial information of human voices around the robot, matched with the operator's head orientation. On the robot side, voice sources are localized and separated using multiple microphone arrays and human tracking technologies, while on the operator side, the operator's head movement is tracked and used to relocate the spatial positions of the separated sources. Interaction experiments with humans in the robot environment indicated that the proposed system achieved significantly higher accuracy rates for the perceived direction of sounds, and higher subjective scores for sense of presence and listenability, than a baseline system using stereo binaural sounds obtained from two microphones located at the humanoid robot's ears. We also proposed three different user interfaces for augmented auditory scene control. Evaluation results indicated higher subjective scores for sense of presence and usability for two of the interfaces (control of voice amplitudes based on virtual robot positioning, and amplification of voices in the frontal direction). © 2015 ACM.
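
The core relocation step can be sketched as follows: separated voice signals with known azimuths in the robot frame are re-rendered relative to the operator's tracked head yaw. The constant-power panning used here is an assumption for illustration; the actual system's spatialization pipeline is more elaborate.

```python
import numpy as np

def render_scene(sources, head_yaw_rad):
    """Re-render separated voices relative to the operator's head yaw.

    sources: list of (mono_signal, azimuth_rad) in the robot frame, with
    positive azimuth taken to the operator's right (assumed convention)."""
    n = max(len(s) for s, _ in sources)
    out = np.zeros((n, 2))  # stereo output: [:, 0] = left, [:, 1] = right
    for signal, azimuth in sources:
        # Direction of the source relative to where the operator faces.
        rel = azimuth - head_yaw_rad
        # Constant-power pan over [-pi/2, pi/2].
        pan = np.clip(rel, -np.pi / 2, np.pi / 2)
        gain_l = np.cos((pan + np.pi / 2) / 2)
        gain_r = np.sin((pan + np.pi / 2) / 2)
        out[:len(signal), 0] += gain_l * signal
        out[:len(signal), 1] += gain_r * signal
    return out

# One voice 45 degrees to the operator's right, operator facing ahead.
fs = 16000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 220 * t)
stereo = render_scene([(voice, np.deg2rad(45))], head_yaw_rad=0.0)
```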


Even J.,ATR Intelligent Robotics and Communication Labs. | Hagita N.,ATR Intelligent Robotics and Communication Labs.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | Year: 2011

This paper presents a novel method for solving the permutation problem inherent to frequency-domain blind signal separation of multiple simultaneous speakers. Like conventional methods, the proposed method exploits the directions of arrival (DOA) of the different speakers to resolve the permutation, but it is designed to also exploit information from pairs of microphones that are usually discarded because of spatial aliasing. The proposed method is based on an explicit expression of the effect of spatial aliasing on DOA estimation. By introducing a vector of integer values into the equation used to estimate the DOA, it becomes possible to compensate for the spatial aliasing by solving the equation with respect to that vector. The proposed method operates sequentially along the frequency bins: first, the spatial aliasing is compensated by an iterative procedure that also detects the permutations; then, the detected permutations are corrected and the DOAs are estimated using all available pairs of microphones. Simulation results demonstrate the effectiveness of the method. © 2011 IEEE.
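
The integer-vector idea can be illustrated directly. The true inter-microphone phase difference is 2*pi*f*d*sin(theta)/c, but the observation is wrapped to (-pi, pi], so the true phase is the observation plus 2*pi*k for some integer k, and each feasible k yields one candidate DOA. The enumeration below shows the ambiguity the paper's iterative procedure must resolve; it is a generic illustration, not the paper's algorithm.

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def candidate_doas(phase_obs, freq_hz, mic_dist_m):
    """All DOAs (radians) consistent with one wrapped phase observation."""
    max_phase = 2 * np.pi * freq_hz * mic_dist_m / C  # |true phase| bound
    k_lo = int(np.ceil((-max_phase - phase_obs) / (2 * np.pi)))
    k_hi = int(np.floor((max_phase - phase_obs) / (2 * np.pi)))
    return [np.arcsin(C * (phase_obs + 2 * np.pi * k)
                      / (2 * np.pi * freq_hz * mic_dist_m))
            for k in range(k_lo, k_hi + 1)]

# Above c/(2d) = 1715 Hz for d = 10 cm, one phase maps to several DOAs.
print(np.degrees(candidate_doas(phase_obs=1.0, freq_hz=4000, mic_dist_m=0.10)))
```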


Hayashi K.,ATR Intelligent Robotics and Communication Labs. | Hayashi K.,Japan Science and Technology Agency | Shiomi M.,ATR Intelligent Robotics and Communication Labs. | Shiomi M.,Japan Science and Technology Agency | And 3 more authors.
Robotics: Science and Systems | Year: 2012

This study addresses encounter interactions in public environments where people and robots walk around. In daily life, security guards, police officers, and sales clerks roam around environments and nonverbally present friendly behavior so that people feel comfortable talking to them. We modeled the behavior of human experts during friendly patrolling, which we defined as a roaming behavior that nonverbally presents a friendly attitude, to encourage people to talk to such professionals. The model was implemented in a humanoid robot, Robovie, and tested in a shopping mall. The experimental results with 39 participants demonstrated that the model worked as intended.


Ishi C.T.,ATR Intelligent Robotics and Communication Labs. | Liu C.,ATR Intelligent Robotics and Communication Labs. | Ishiguro H.,ATR Intelligent Robotics and Communication Labs. | Hagita N.,ATR Intelligent Robotics and Communication Labs.
5th ACM/IEEE International Conference on Human-Robot Interaction, HRI 2010 | Year: 2010

Head motion naturally occurs in synchrony with speech and may carry paralinguistic information, such as intention, attitude and emotion, in dialogue communication. With the aim of verifying the relationship between head motion and the dialogue acts carried by speech, analyses were conducted on motion-captured data from several speakers during natural dialogues. The analysis results first confirmed the trends of our previous work, showing that regardless of the speaker, nods frequently occur during speech utterances, not only to express dialogue acts such as agreement and affirmation, but also at the last syllable of a phrase, at strong phrase boundaries, especially when the speaker is talking confidently or expressing interest in the interlocutor's talk. Inter-speaker variability indicated that the frequency of head motion may vary according to the speaker's age or status, while intra-speaker variability indicated that the frequency of head motion also differs depending on the interpersonal relationship with the interlocutor. A simple model for generating nods based on rules inferred from the analysis results was proposed and evaluated on two types of humanoid robots. Subjective scores showed that the proposed model could generate head motions with naturalness comparable to the original motions. © 2010 IEEE.
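
The kind of aggregation behind such an analysis can be sketched as counting nod co-occurrence per dialogue-act label over annotated utterances. The record format below is a stand-in for motion-annotated dialogue data, not the paper's corpus.

```python
from collections import Counter

# Toy (speaker, dialogue_act, nod_occurred) records, one per utterance.
records = [
    ("A", "agreement", True), ("A", "question", False),
    ("A", "affirmation", True), ("B", "agreement", True),
    ("B", "statement", False), ("B", "agreement", False),
]

totals, nods = Counter(), Counter()
for speaker, act, nodded in records:
    totals[act] += 1
    nods[act] += nodded  # bool counts as 0/1

for act in totals:
    print(f"{act:12s} nod rate: {nods[act] / totals[act]:.2f} "
          f"({nods[act]}/{totals[act]})")
```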


Ishi C.T.,ATR Intelligent Robotics and Communication Labs. | Even J.,ATR Intelligent Robotics and Communication Labs. | Hagita N.,ATR Intelligent Robotics and Communication Labs.
IEEE International Conference on Intelligent Robots and Systems | Year: 2015

We developed a system for detecting the speech activity intervals of multiple speakers by combining multiple microphone arrays with human tracking technologies. We also proposed a method for estimating the face orientation of the detected speakers. The developed system was evaluated in two steps: individual utterances at different positions and orientations, and simultaneous dialogues by multiple speakers. Evaluation results revealed that the proposed system could detect speech activity intervals with more than 90% accuracy, and face orientations with standard deviations within 30 degrees, except in situations where all arrays are in the direction opposite to the speaker's face orientation. © 2015 IEEE.
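
A rough sketch of the fusion idea: each microphone array reports acoustic power by direction, so steering every array at a tracked person's position and pooling the powers gives a per-person speech-activity score, and the array receiving the most power hints at face orientation (speech radiates mostly frontward). The steering and pooling details here are assumptions for illustration, not the paper's method.

```python
import numpy as np

def speech_active(person_xy, arrays, threshold=3.0):
    """Pool per-array power steered at a tracked person's position.

    arrays: list of (array_xy, power_by_angle), where power_by_angle maps
    an azimuth in radians (from that array) to acoustic power."""
    score = 0.0
    for array_xy, power_by_angle in arrays:
        bearing = np.arctan2(person_xy[1] - array_xy[1],
                             person_xy[0] - array_xy[0])
        score += power_by_angle(bearing)
    return score > threshold, score

def face_orientation(person_xy, arrays):
    """Bearing toward the array receiving the most power from the person."""
    def power_at(array):
        array_xy, power_by_angle = array
        return power_by_angle(np.arctan2(person_xy[1] - array_xy[1],
                                         person_xy[0] - array_xy[0]))
    best_xy, _ = max(arrays, key=power_at)
    return np.arctan2(best_xy[1] - person_xy[1], best_xy[0] - person_xy[0])

# Toy setup: speaker at the origin facing array 0; array 1 is behind them.
arr0 = ((2.0, 0.0), lambda az: 5.0)   # strong power toward the speaker
arr1 = ((-2.0, 0.0), lambda az: 0.5)  # weak power (speaker faces away)
print(speech_active((0.0, 0.0), [arr0, arr1]))
print(np.degrees(face_orientation((0.0, 0.0), [arr0, arr1])))
```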


Ishi C.T.,ATR Intelligent Robotics and Communication Labs. | Even J.,ATR Intelligent Robotics and Communication Labs. | Hagita N.,ATR Intelligent Robotics and Communication Labs.
IEEE International Conference on Intelligent Robots and Systems | Year: 2013

We proposed a method for estimating sound source locations in 3D space by integrating the sound directions estimated by multiple microphone arrays and taking advantage of reflection information. Two types of sources with different directivity properties (human speech and loudspeaker speech) were evaluated at different positions and orientations. Experimental results showed that the effectiveness of using reflection information depends on the position and orientation of the sound sources relative to the arrays and walls, and on the source type. The use of reflection information increased the source position detection rates by 10% on average, and by up to 60% in the best case. © 2013 IEEE.
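
The underlying triangulation step can be sketched as follows: each array contributes a bearing line (its position plus a unit direction from its DOA estimate), and the source position is the least-squares point closest to all lines. The reflection handling the paper adds on top is omitted in this sketch.

```python
import numpy as np

def triangulate(positions, directions):
    """Least-squares point closest to all bearing lines.

    positions, directions: (N, 3) arrays; each row of directions is a
    unit vector from the corresponding array toward the source."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, u in zip(positions, directions):
        proj = np.eye(3) - np.outer(u, u)  # projector onto plane normal to u
        A += proj
        b += proj @ p
    return np.linalg.solve(A, b)

# Two arrays on the floor, with exact bearings to a source at (1, 1, 1.5).
src = np.array([1.0, 1.0, 1.5])
pos = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
dirs = np.array([(src - p) / np.linalg.norm(src - p) for p in pos])
print(triangulate(pos, dirs))  # recovers ~[1.0, 1.0, 1.5]
```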


Ishi C.T.,ATR Intelligent Robotics and Communication Labs. | Ishiguro H.,ATR Intelligent Robotics and Communication Labs. | Hagita N.,ATR Intelligent Robotics and Communication Labs.
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Year: 2011

To improve the acoustic characterization of breathy and whispery segments, we proposed a normalized breathiness power measure (NBP) that embeds a mid-frequency voicing measure (F1F3syn) in its formulation. A partial inverse-filtering pre-processing step and a sub-band periodicity-based frequency boundary selection approach were also proposed to improve the performance of the F1F3syn and NBP measures. The proposed NBP measure improves the detection of breathy/whispery segments from 70% to 83% relative to previous methods, at a false detection rate of 10% on modal and rough segments. Copyright © 2011 ISCA.
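
The exact NBP formulation involving F1F3syn is not reproduced here. As a loosely related sketch only: breathy voice adds aperiodic energy in higher bands relative to the harmonic energy in the formant region, so a simple band-power ratio gives a crude breathiness proxy. The band edges and normalization below are placeholder assumptions, not the NBP definition.

```python
import numpy as np

def band_power(frame, fs, lo, hi):
    """Total spectral power of a windowed frame in the band [lo, hi) Hz."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    return spec[(freqs >= lo) & (freqs < hi)].sum()

def breathiness_db(frame, fs=16000):
    """Crude proxy: high-band (aperiodicity-dominated) power relative to
    mid-band (voicing) power, in dB. Band edges are placeholder choices."""
    p_high = band_power(frame, fs, 4000, 7000)
    p_mid = band_power(frame, fs, 300, 3000)  # roughly the F1-F3 region
    return 10.0 * np.log10(p_high / (p_mid + 1e-12) + 1e-12)

# White noise (breathy-like spectrum) scores far higher than a pure tone.
t = np.arange(400) / 16000
tone = np.sin(2 * np.pi * 500 * t)
noise = np.random.default_rng(0).standard_normal(400)
print(breathiness_db(tone), breathiness_db(noise))
```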


Ishi C.T.,ATR Intelligent Robotics and Communication Labs. | Liu C.,Osaka University | Ishiguro H.,Osaka University | Hagita N.,ATR Intelligent Robotics and Communication Labs.
IEEE International Conference on Intelligent Robots and Systems | Year: 2012

Generating natural motion in robots is important for improving human-robot interaction. We developed a tele-operation system in which the lip motion of a remote humanoid robot is automatically controlled by the operator's voice. In the present work, we introduce an improved version of our speech-driven lip motion generation method, in which lip height and width are estimated from vowel formant information. The method requires the calibration of only one parameter for speaker normalization. Lip height control is evaluated on two types of humanoid robots (Telenoid-R2 and Geminoid-F). Subjective evaluation indicated that the proposed audio-based method can generate lip motion with naturalness superior to vision-based and motion-capture-based approaches. Partial lip width control was shown to further improve lip motion naturalness on Geminoid-F, which also has an actuator for stretching the lip corners. Issues regarding online real-time processing are also discussed. © 2012 IEEE.
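
The formant-to-lip mapping idea can be sketched simply: lip opening (height) tracks the first formant F1 and lip spreading (width) tracks the second formant F2, with a single per-speaker calibration factor as the abstract describes. The ranges and the linear mapping below are illustrative assumptions, not the paper's actual formulas.

```python
def lip_params(f1_hz, f2_hz, calib=1.0):
    """Map vowel formants to normalized lip height/width in [0, 1].

    calib is a single per-speaker scaling factor, mirroring the abstract's
    one-parameter normalization."""
    # Rough adult formant ranges (assumed): F1 ~ 250-850 Hz, F2 ~ 600-2500 Hz.
    height = min(max((f1_hz * calib - 250) / (850 - 250), 0.0), 1.0)
    width = min(max((f2_hz * calib - 600) / (2500 - 600), 0.0), 1.0)
    return height, width

# /a/ (open vowel) vs /i/ (spread vowel), using rough textbook formants.
print(lip_params(800, 1200))  # large opening, modest spread
print(lip_params(300, 2300))  # small opening, wide spread
```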
