Heracleous P.,Intelligent Robotics and Communication Laboratories | Heracleous P.,CNRS GIPSA Laboratory | Beautemps D.,CNRS GIPSA Laboratory | Aboutabit N.,CNRS GIPSA Laboratory
Speech Communication | Year: 2010

This article discusses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs). Cued Speech is a visual mode of communication which, by using hand shapes in different positions in combination with the lip patterns of speech, makes all the sounds of a spoken language clearly understandable to deaf people. The aim of Cued Speech is to overcome the problems of lipreading and thus enable deaf children and adults to understand spoken language completely. In the current study, the authors demonstrate that visible gestures are as discriminant as audible orofacial gestures. Phoneme recognition and isolated word recognition experiments have been conducted using data from a normal-hearing cuer. The results obtained were very promising, and the study has been extended by applying the proposed methods to a deaf cuer. The results achieved did not show any significant differences compared to automatic Cued Speech recognition in a normal-hearing subject. Automatic recognition of Cued Speech requires both lip shape and gesture recognition. Moreover, the integration of the two modalities is of great importance. In this study, the lip shape component is fused with the hand component to realize Cued Speech recognition. Using concatenative feature fusion and multi-stream HMM decision fusion, vowel recognition, consonant recognition, and isolated word recognition experiments have been conducted. For vowel recognition, an accuracy of 87.6% was obtained, a 61.3% relative improvement compared to the sole use of lip shape parameters. In the case of consonant recognition, an accuracy of 78.9% was obtained, a 56% relative improvement compared to the use of lip shape only. In addition to vowel and consonant recognition, a complete phoneme recognition experiment using concatenated feature vectors and Gaussian mixture model (GMM) discrimination was conducted, obtaining a 74.4% phoneme accuracy.
Isolated word recognition experiments with both normal-hearing and deaf subjects were also conducted, yielding word accuracies of 94.9% and 89%, respectively. The results were compared with those obtained using the audio signal, and comparable accuracies were observed. © 2010 Elsevier B.V. All rights reserved.
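The decision fusion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each stream (lip shape and hand) has already produced one log-likelihood per candidate class, and that the streams are combined with a weighted sum whose weights add up to one. The function names, the weight value, and the example scores are hypothetical and are not taken from the paper.

```python
def fuse_log_likelihoods(lip_scores, hand_scores, lip_weight=0.5):
    """Weighted sum of per-class log-likelihoods from two streams.

    lip_scores and hand_scores map a class label to a log-likelihood.
    lip_weight is an assumed stream weight; the hand stream gets 1 - lip_weight.
    """
    if not 0.0 <= lip_weight <= 1.0:
        raise ValueError("lip_weight must lie in [0, 1]")
    if lip_scores.keys() != hand_scores.keys():
        raise ValueError("both streams must score the same classes")
    hand_weight = 1.0 - lip_weight
    return {
        label: lip_weight * lip_scores[label] + hand_weight * hand_scores[label]
        for label in lip_scores
    }


def classify(lip_scores, hand_scores, lip_weight=0.5):
    """Return the class with the highest fused log-likelihood."""
    fused = fuse_log_likelihoods(lip_scores, hand_scores, lip_weight)
    return max(fused, key=fused.get)


# Made-up scores for two vowel classes (illustration only).
example_lip = {"a": -10.0, "i": -9.0}
example_hand = {"a": -4.0, "i": -8.0}
```

With these made-up scores, the lip stream alone slightly prefers "i", while the fused score prefers "a" once the hand stream is given equal weight. In practice the stream weight would be tuned on held-out data.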

Lee J.-W.,Korea Advanced Institute of Science and Technology | Lee J.-Y.,Intelligent Robotics and Communication Laboratories | Lee J.-J.,Korea Advanced Institute of Science and Technology
IEEE Wireless Communications Letters | Year: 2013

The Energy-Efficient Coverage (EEC) problem in unstructured Wireless Sensor Networks (WSNs) is an important issue because WSNs have limited energy. In this letter, we propose a novel stochastic optimization algorithm, called the Jenga-Inspired Optimization Algorithm (JOA), which overcomes some of the weaknesses of other optimization algorithms for solving the EEC problem. The JOA was inspired by Jenga, a well-known board game. We also introduce the probabilistic sensor detection model, which leads to a more realistic approach to solving the EEC problem. Simulations are conducted to verify the effectiveness of the JOA for solving the EEC problem in comparison with existing algorithms. © 2013 IEEE.
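As a rough illustration of what a probabilistic sensor detection model can look like, the sketch below uses one commonly cited form: detection is certain close to the sensor, impossible beyond an outer radius, and decays exponentially in between. This is an assumed model chosen for illustration and may differ from the one used in the letter. All parameter names and default values are hypothetical.

```python
import math


def detection_probability(distance, sensing_range, uncertainty,
                          decay=0.5, exponent=1.0):
    """Probability that a sensor detects a target at the given distance.

    Assumed model: probability 1 inside (sensing_range - uncertainty),
    0 beyond (sensing_range + uncertainty), exponential decay in between.
    """
    inner = sensing_range - uncertainty
    outer = sensing_range + uncertainty
    if distance <= inner:
        return 1.0
    if distance >= outer:
        return 0.0
    return math.exp(-decay * (distance - inner) ** exponent)


def joint_detection_probability(point, sensors, sensing_range, uncertainty):
    """Probability that at least one active sensor detects the point.

    Assumes sensors detect independently, so the miss probabilities multiply.
    """
    miss = 1.0
    for sensor_x, sensor_y in sensors:
        distance = math.hypot(point[0] - sensor_x, point[1] - sensor_y)
        miss *= 1.0 - detection_probability(distance, sensing_range, uncertainty)
    return 1.0 - miss
```

Under a model of this kind, a point can count as covered when its joint detection probability exceeds a chosen threshold, which lets an optimizer trade the number of active sensors against coverage.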

Heracleous P.,CNRS GIPSA Laboratory | Heracleous P.,Intelligent Robotics and Communication Laboratories | Tran V.-A.,CNRS GIPSA Laboratory | Nagai T.,Nara Institute of Science and Technology | Shikano K.,Nara Institute of Science and Technology
IEEE Transactions on Audio, Speech and Language Processing | Year: 2010

Non-audible murmur (NAM) is an unvoiced speech signal that can be received through the body tissue with the use of special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. The authors had previously reported experimental results for NAM recognition using a stethoscopic and a silicon NAM microphone. Using a small amount of training data from a single speaker and adaptation approaches, a word accuracy of 93.9% was achieved for a 20k-vocabulary Japanese dictation task. In this paper, further analysis of NAM speech is made using distance measures between hidden Markov models (HMMs). It has been shown that, owing to the reduced spectral space of NAM speech, the HMM distances are also reduced when compared with those of normal speech. In the case of Japanese vowels and fricatives, the distance measures in NAM speech follow the same relative inter-phoneme relationship as that in normal speech without significant differences. However, significant differences have been found in the case of Japanese plosives. More specifically, in NAM speech, the distances between voiced/unvoiced consonant pairs articulated in the same place decreased drastically. As a result, the inter-phoneme relationship changed significantly relative to normal speech, causing a substantial decrease in the recognition accuracy. A speaker-dependent phoneme recognition experiment was conducted, obtaining 81.5% phoneme correct for NAM and showing a relationship between HMM distance measures and phoneme accuracy. In a NAM microphone, body transmission and loss of lip radiation act as a low-pass filter. As a result, higher frequency components are attenuated in a NAM signal. Because of spectral reduction, NAM's unvoiced nature, and the type of articulation, NAM sounds become similar, causing a larger number of confusions when compared with normal speech.
Yet many of those sounds are visually distinct on the face, mouth, and lips, and the integration of visual information increases their discrimination. As a result, recognition accuracy increases as well. In this article, the visual information extracted from the talkers' facial movements is fused with NAM speech. The experimental results reveal an average relative improvement of 10.5% when fused NAM speech and facial information were used, compared with using only NAM speech. © 2010 IEEE.
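The abstract above does not say which HMM distance measure was used. As a simplified stand-in, the sketch below computes a symmetric Kullback-Leibler divergence between two diagonal-covariance Gaussians, such as the output densities of two single-Gaussian HMM states. The choice of this measure and the function names are assumptions made for illustration only.

```python
import math


def kl_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gaussian(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussian(mean_q, var_q, mean_p, var_p))
```

With unit variances, moving two means from a separation of 2 to a separation of 1 lowers this divergence from 4 to 1. That matches the intuition in the abstract: when the spectral space shrinks and phoneme models move closer together, the distances between them shrink and confusions become more likely.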

Mutlu B.,University of Wisconsin - Madison | Kanda T.,Intelligent Robotics and Communication Laboratories | Forlizzi J.,Carnegie Mellon University | Hodgins J.,Carnegie Mellon University | Ishiguro H.,Osaka University
Transactions on Interactive Intelligent Systems | Year: 2012

During conversations, speakers employ a number of verbal and nonverbal mechanisms to establish who participates in the conversation, when, and in what capacity. Gaze cues and mechanisms are particularly instrumental in establishing the participant roles of interlocutors, managing speaker turns, and signaling discourse structure. If humanlike robots are to have fluent conversations with people, they will need to use these gaze mechanisms effectively. The current work investigates people's use of key conversational gaze mechanisms, how they might be designed for and implemented in humanlike robots, and whether these signals effectively shape human-robot conversations. We focus particularly on whether humanlike gaze mechanisms might help robots signal different participant roles, manage turn-exchanges, and shape how interlocutors perceive the robot and the conversation. The evaluation of these mechanisms involved 36 trials of three-party human-robot conversations. In these trials, the robot used gaze mechanisms to signal to its conversational partners their roles: two addressees, an addressee and a bystander, or an addressee and a nonparticipant. Results showed that participants conformed to these intended roles 97% of the time. Their conversational roles affected their rapport with the robot, feelings of groupness with their conversational partners, and attention to the task. © 2012 ACM.

Glas D.F.,Hiroshi Ishiguro Laboratories | Kanda T.,Intelligent Robotics and Communication Laboratories | Ishiguro H.,Osaka University
ACM/IEEE International Conference on Human-Robot Interaction | Year: 2016

Interaction Composer, a visual programming environment designed to enable programmers and non-programmers to collaboratively design social human-robot interactions in the form of state-based flows, has been in use at our laboratory for eight years. The system architecture and the design principles behind the framework have been presented in other work, but in this paper we take a case-study approach, examining several actual examples of the use of this toolkit over an eight-year period. We examine the structure and content of interaction flows, identify common design patterns, and discuss elements of the framework that have proven valuable, features that did not serve their intended purposes, and ways that future systems might better address these issues. It is hoped that the insights gained from this study will contribute to the development of more effective and more usable tools and frameworks for interaction design. © 2016 IEEE.
