Martigny, Switzerland


News Article | December 23, 2016
Site: phys.org

Pedestrians often try to find their way about using their smartphones. The computer scientist Peter Kiefer and the geomatics expert Martin Raubal are working together to make things easier for them. At the GeoGazeLab at ETH Zurich, they are trying to refine smartphone maps so that pedestrians can find their way in any new environment. To this end they are developing special systems in which an eye-tracking module is attached to the user's head. These modules comprise several cameras, variously focussed on the eyes of the user and on the user's field of vision. By means of eye tracking, Kiefer and Raubal can determine which landmarks pedestrians use to orient themselves. Their findings are interesting. "People ignore some elements on the map completely", says Raubal. In order not to confuse people, he suggests that these elements – railway tracks, for example – should be left off such maps altogether.

This is just one of many examples illustrating the remarkable progress made by so-called 'eye tracking' – the process of automatically tracking the direction of a person's gaze. The importance of this technology shouldn't surprise us, because a person's gaze tells us exactly what the object of their attention is, and also how they feel. Many areas of science and business use this technology today, from cognitive research and sociology to the car industry.

Stressed pilots in front of the camera

Kiefer and Raubal are also busy with another, especially ambitious project, this time in the field of air transportation. They are collaborating with the airline Swiss, using eye tracking to monitor the training of pilots in flight simulators. So as not to hinder the pilots, the eye-tracking cameras are not mounted on their heads but installed in the cockpit itself. Raubal and Kiefer want to use the trainees' eye movements to recognise what kinds of situations place them under stress. Swiss hopes that this method will provide new information to help it further refine its flight-training programme.

Eye tracking can also be used to help optimise office space. This is the research area of Mandana Sarey Khanie, a civil engineer at the Interdisciplinary Laboratory of Performance-Integrated Design (LIPID) at EPFL. People who sit for eight hours a day in front of their computer often complain of sore eyes, tiredness and headaches. This can be caused by brightness contrasts in their environment; people usually work more productively in an office with pleasant lighting. Sarey Khanie is investigating how the intelligent use of light can be applied when designing workspaces. Her focus is on offices lit by natural light.

Sarey Khanie's project uses an eye-tracking system comprising three cameras mounted on a person's head. Two are directed at the person's eyes, while a third records the orientation of their head. Together, they serve to determine the person's viewing direction. Eye tracking enables Sarey Khanie to recognise when a person reacts to light in a systematic way. "In one experiment we observed that people like to look out of the window, and only avoid doing so when the incoming sunlight creates stark brightness contrasts", she says. One could instead carry out a survey to find out whether people feel they are being blinded by light at the workplace, but such a method would be too imprecise, explains Sarey Khanie.
Together with Marilyne Andersen, the Director of LIPID, Sarey Khanie wants to develop software tools that enable architects to carry out simulations meeting three requirements of construction planning: maximising the use of daylight and visual contact with the outside world; avoiding the glare of bright light; and keeping energy use low.

Looking at nothing

Eye tracking is also used in pure research. Psychologists in particular are fond of the technology, because it enables them to observe human behaviour in an unbiased manner. "Your eye movements aren't something you can really control", explains Agnes Scholz, a psychologist at the University of Zurich. Scholz uses eye tracking to investigate fundamental thought processes. When people make decisions, they can orient themselves by abstract rules or base their decisions on examples taken from recent memory. Scholz carried out an experiment to see whether she could observe differences between these two approaches. Test subjects were asked to assess several people whose profiles were presented to them on a computer screen. In order to check whether recent memory played a role in their assessment, the test subjects were shown example cases on the monitor before they came to make their own assessment.

When the test subjects were observed by means of eye tracking, a fundamental difference in their direction of gaze emerged. The assessment ran differently when the test subjects remembered the examples they had seen: while making their decision, these subjects looked at specific areas of the monitor – the empty spaces where the example cases had been shown just before. Psychologists call this behavioural phenomenon 'looking at nothing'. The other test subjects – those who based their assessment on abstract rules – did not engage in this 'looking at nothing'. In future, Scholz wants to find out more precisely when this specific viewing behaviour occurs and what role it plays in decision-making.

Scholz used a special camera for her eye tracking. It is directed at the eyes of the test subjects and uses infrared light to measure the geometric characteristics of their pupils. Such systems have been refined considerably in recent years and now function very precisely. However, they often lack flexibility, especially when people move about a lot without keeping anything firmly in their gaze.

At the Idiap Research Institute in Martigny, Kenneth Funes Mora and Jean-Marc Odobez are developing systems that use relatively inexpensive cameras without high resolution. These cameras register both colours and distances. Sophisticated algorithms enable a computer to use the pictures from the cameras to determine the direction of a person's gaze at all times. The varying angles of the head and the eye movements are captured and converted into data describing the changes in the person's direction of vision. The researchers can place these camera systems inconspicuously on a conference table in order to study negotiation techniques. Funes Mora and Odobez patented their new eye-tracking method a while ago. Funes Mora currently holds only a 50% research post at the Institute, because the rest of his time goes into their spin-off company, 'Eyeware'. The two researchers believe that such an eye-tracking system has many different possible areas of application.
Their newly developed camera is especially suited to investigating people's visual attention, and to supporting the interaction between people and computers. It could be used by a robot, for example, to advise customers in a shopping mall. Applications in the medical field would also be possible – such as in diagnosing disorders like autism, which can be recognised by tracking eye movements. And this will hardly be the last of their ideas for applying their eye tracking system. "The eyes simply tell you a lot about people", says Funes Mora.


An apparatus and a method for constructing a multilingual acoustic model, and a computer readable recording medium are provided. The method for constructing a multilingual acoustic model includes dividing an input feature into a common language portion and a distinctive language portion, acquiring a tandem feature by training the divided common language portion and distinctive language portion using a neural network to estimate and remove correlation between phonemes, dividing parameters of an initial acoustic model constructed using the tandem feature into common language parameters and distinctive language parameters, adapting the common language parameters using data of a training language, adapting the distinctive language parameters using data of a target language, and constructing an acoustic model for the target language using the adapted common language parameters and the adapted distinctive language parameters.
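
The abstract above lists the method as a sequence of steps. The following is a minimal sketch of that flow under simplifying assumptions: a toy one-layer network stands in for the MLP, a crude mean-normalisation stands in for the decorrelation step, and all names and dimensions are illustrative rather than taken from the patent.

```python
# Hypothetical sketch of the tandem-feature / parameter-splitting pipeline
# described above. All function and variable names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def split_feature(x, n_common):
    """Split an input feature into a language-common part and a
    language-distinctive part (here simply by index)."""
    return x[:n_common], x[n_common:]

def mlp_posteriors(x, W, b):
    """Toy one-layer 'MLP' producing phoneme posteriors (softmax)."""
    z = x @ W + b
    e = np.exp(z - z.max())
    return e / e.sum()

def tandem_feature(post):
    """Mean-normalised log-posteriors: a crude stand-in for the step that
    estimates and removes correlation between phonemes."""
    logp = np.log(post + 1e-10)
    return logp - logp.mean()

# --- toy dimensions ---
dim, n_common, n_phones = 39, 26, 10
W_common = rng.normal(size=(n_common, n_phones))
W_distinct = rng.normal(size=(dim - n_common, n_phones))
b = np.zeros(n_phones)

x = rng.normal(size=dim)                      # one acoustic frame
x_c, x_d = split_feature(x, n_common)

post_c = mlp_posteriors(x_c, W_common, b)     # common-language stream
post_d = mlp_posteriors(x_d, W_distinct, b)   # distinctive-language stream
tandem = np.concatenate([tandem_feature(post_c), tandem_feature(post_d)])

# An acoustic model built on `tandem` would then keep two parameter sets:
# common parameters adapted with training-language data, and distinctive
# parameters adapted with target-language data (not shown here).
print(tandem.shape)
```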


News Article | October 27, 2016
Site: www.techradar.com

Banksy, you've got competition. Researchers from Goldsmiths, University of London's Department of Computing, together with the Idiap Research Institute in Martigny, Switzerland, have developed Baxter, a robot capable of reproducing graffiti tags with the same fluidity as a human. Baxter can write tags in pen, paint and neon lights, producing its own name in a graffiti style that first emerged in New York in the late 1960s. It's a skill considered incredibly difficult to mimic smoothly, which makes Baxter's output particularly impressive. Previous attempts have proved superficial, with Baxter's predecessors only managing stiff, stilted reproductions lacking the finesse of their human counterparts. By comparison, Baxter's tags appear remarkably fluid, highlighting the advances in both robotics and tracking technology needed to make such artistic output possible. Baxter is in fact a modified version of an industrial factory robot of the same name. Professor Frederic Fol Leymarie and PhD candidate Daniel Berio from Goldsmiths, and Dr Sylvain Calinon from Idiap, have modified the robot with weights to better approximate a human arm, in line with their "research into different aspects of human motor skills, and the perceptual processes and movement dynamics underlying the production of various forms of art." With the weights based on the mechanical properties of Baxter's manipulator, the robot's tracking accuracy and stability are tuned to carefully mimic the brush pressure of a real human artist. More on Baxter can be found in the research paper "Learning dynamic graffiti strokes with a compliant robot".


Berclaz J.,Ecole Polytechnique Federale de Lausanne | Fleuret F.,Idiap Research Institute | Turetken E.,Ecole Polytechnique Federale de Lausanne | Fua P.,Ecole Polytechnique Federale de Lausanne
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2011

Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: If an object is not detected in a frame but is in previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This is usually dealt with by sampling or greedy search based on variants of Dynamic Programming which can easily miss the global optimum. In this paper, we show that reformulating that step as a constrained flow optimization results in a convex problem. We take advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast. This new approach is far simpler formally and algorithmically than existing techniques and lets us demonstrate excellent performance in two very different contexts. © 2011 IEEE.
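
As a rough illustration of the linking step, the sketch below builds a small graph of per-frame detections and extracts the k cheapest source-to-sink paths with networkx. The costs, the entry/exit structure, and the absence of an explicit disjointness constraint are simplifications of the paper's flow formulation, not a reimplementation of it.

```python
# Toy illustration of casting detection linking as a shortest-path problem.
# The real k-shortest-paths tracker uses costs derived from detection
# posteriors and enforces node-disjoint trajectories on a flow network;
# here the two cheapest simple paths happen to be the two tracks.
import itertools
import networkx as nx

# toy detections: frame index -> list of (x, y) positions
detections = {
    0: [(0.0, 0.0), (5.0, 5.0)],
    1: [(0.2, 0.1), (5.1, 4.9)],
    2: [(0.4, 0.3), (5.3, 5.0)],
}

G = nx.DiGraph()
for t, dets in detections.items():
    for i, (x, y) in enumerate(dets):
        G.add_node((t, i), pos=(x, y))

last = max(detections)
for i in range(len(detections[0])):
    G.add_edge("S", (0, i), weight=0.0)        # trajectories start at frame 0
for i in range(len(detections[last])):
    G.add_edge((last, i), "T", weight=0.0)     # and end at the last frame

# transition edges between consecutive frames, weighted by distance
for t in range(last):
    for i, (x1, y1) in enumerate(detections[t]):
        for j, (x2, y2) in enumerate(detections[t + 1]):
            d = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
            G.add_edge((t, i), (t + 1, j), weight=d)

# take the k cheapest source-to-sink paths as candidate trajectories
k = 2
paths = itertools.islice(nx.shortest_simple_paths(G, "S", "T", weight="weight"), k)
for p in paths:
    print([n for n in p if n not in ("S", "T")])
```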


Garner P.N.,Idiap Research Institute
Speech Communication | Year: 2011

Cepstral normalisation in automatic speech recognition is investigated in the context of robustness to additive noise. In this paper, it is argued that such normalisation leads naturally to a speech feature based on signal to noise ratio rather than absolute energy (or power). Explicit calculation of this SNR-cepstrum by means of a noise estimate is shown to have theoretical and practical advantages over the usual (energy based) cepstrum. The relationship between the SNR-cepstrum and the articulation index, known in psycho-acoustics, is discussed. Experiments are presented suggesting that the combination of the SNR-cepstrum with the well known perceptual linear prediction method can be beneficial in noisy environments. © 2011 Elsevier B.V. All rights reserved.
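
A rough sketch of the idea follows, assuming the noise spectrum is estimated from the first few (speech-free) frames and omitting the mel filterbank stage for brevity; it illustrates replacing absolute energy by a per-bin signal-to-noise ratio and is not the exact formulation from the paper.

```python
# Illustrative SNR-based cepstrum: the log of a floored power/noise ratio is
# decorrelated with a DCT, in place of the usual log-energy cepstrum.
import numpy as np
from scipy.fft import dct

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def snr_cepstrum(x, n_ceps=13, n_noise_frames=10):
    frames = frame_signal(x) * np.hamming(400)
    power = np.abs(np.fft.rfft(frames, n=512)) ** 2      # power spectrum
    noise = power[:n_noise_frames].mean(axis=0) + 1e-10  # crude noise estimate
    snr = np.maximum(power / noise, 1e-2)                # per-bin SNR, floored
    return dct(np.log(snr), type=2, norm="ortho", axis=1)[:, :n_ceps]

# toy input: noise followed by a tone burst in noise
rng = np.random.default_rng(0)
sig = rng.normal(scale=0.1, size=16000)
sig[8000:] += np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)
print(snr_cepstrum(sig).shape)
```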


Valente F.,Idiap Research Institute
Speech Communication | Year: 2010

This paper aims at investigating the use of Dempster-Shafer (DS) combination rule for multi-stream automatic speech recognition. The DS combination is based on a generalization of the conventional Bayesian framework. The main motivation for this work is the similarity between the DS combination and findings of Fletcher on human speech recognition. Experiments are based on the combination of several Multi Layer Perceptron (MLP) classifiers trained on different representations of the speech signal. The TANDEM framework is adopted in order to use the MLP outputs in conventional speech recognition systems. We exhaustively investigate several methods for applying the DS combination to multi-stream ASR. Experiments are run on small and large vocabulary speech recognition tasks and aim at comparing the proposed technique with other frame-based combination rules (e.g. inverse entropy). Results reveal that the proposed method outperforms conventional combination rules in both tasks. Furthermore we verify that the performance of the combined feature stream is never inferior to the performance of the best individual feature stream. We conclude the paper discussing other applications of the DS combination and possible extensions. © 2009 Elsevier B.V. All rights reserved.
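
For illustration, the sketch below applies Dempster's rule of combination to two posterior streams, with a fixed share of each stream's mass assigned to total ignorance. The discounting scheme is an assumption made for the example, not the mapping from MLP posteriors to belief masses used in the paper.

```python
# Dempster's rule of combination for two phoneme-posterior streams, where
# the focal elements are the singleton phoneme hypotheses plus the full
# frame of discernment (total ignorance).
import numpy as np

def to_masses(posteriors, discount=0.2):
    """Turn a posterior vector into singleton masses plus an ignorance mass."""
    return (1.0 - discount) * posteriors, discount

def dempster_combine(m1, theta1, m2, theta2):
    """Dempster's rule when focal elements are singletons and Theta."""
    singles = m1 * m2 + m1 * theta2 + theta1 * m2
    theta = theta1 * theta2
    norm = singles.sum() + theta          # equals 1 minus the conflict mass
    return singles / norm, theta / norm

# two MLP streams giving posteriors over 4 phoneme classes for one frame
p_stream1 = np.array([0.60, 0.25, 0.10, 0.05])
p_stream2 = np.array([0.50, 0.10, 0.30, 0.10])

combined, ignorance = dempster_combine(*to_masses(p_stream1), *to_masses(p_stream2))
print(combined, ignorance)   # combined singleton beliefs and residual ignorance
```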


Ba S.O.,LabSTICC | Odobez J.-M.,Idiap Research Institute
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2011

This paper introduces a novel contextual model for the recognition of people's visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from his own head pose, we propose to jointly recognize the participants' visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined and sometimes contradictory impact on the gazing behavior, our model allows us to handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging data set of 12 real meetings (5 hours of data). The results demonstrated that the integration of the presentation and conversation dynamical context using our model can lead to significant performance improvements. © 2006 IEEE.
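
As a toy illustration of the general idea only (not the paper's dynamic model), the sketch below combines a head-pose likelihood over gaze targets with a context-dependent prior driven by the current speaker and the presentation state; all targets and weights are invented for the example.

```python
# Toy contextual VFOA estimate: an ambiguous head-pose likelihood is biased
# towards the current speaker and, during presentations, the slide screen.
import numpy as np

targets = ["person_A", "person_B", "slide_screen", "table"]

def context_prior(speaker, presentation_active):
    prior = np.ones(len(targets))
    if speaker in targets:
        prior[targets.index(speaker)] += 2.0          # people look at the speaker
    if presentation_active:
        prior[targets.index("slide_screen")] += 3.0   # and at the slides
    return prior / prior.sum()

def vfoa_posterior(headpose_likelihood, speaker, presentation_active):
    post = headpose_likelihood * context_prior(speaker, presentation_active)
    return post / post.sum()

# ambiguous head pose: roughly between person_B and the slide screen
likelihood = np.array([0.05, 0.45, 0.40, 0.10])
print(dict(zip(targets, vfoa_posterior(likelihood, "person_B", True).round(3))))
```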


Chen C.,Idiap Research Institute | Odobez J.-M.,Idiap Research Institute
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2012

In this paper, we deal with the estimation of body and head poses (i.e., orientations) in surveillance videos, and we make three main contributions. First, we address this issue as a joint model adaptation problem in a semi-supervised framework. Second, we propose to leverage the adaptation on multiple information sources (external labeled datasets, weak labels provided by the motion direction, data structure manifold), and in particular, on the coupling at the output level of the head and body classifiers, accounting for the restriction in the configurations that the head and body pose can jointly take. Third, we propose a kernel formulation of this principle that can be efficiently solved using a global optimization scheme. The method is applied to body and head features computed from automatically extracted body and head location tracks. Thorough experiments on several datasets demonstrate the validity of our approach, the benefit of the coupled adaptation, and that the method performs similarly or better than a state-of-the-art algorithm. © 2012 IEEE.
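
The sketch below is a toy illustration of the coupling idea only, not the paper's kernel formulation: the body orientation is weakly supervised by the walking direction, and the head orientation is penalised when it strays implausibly far from the body orientation. All angles and weights are invented for the example.

```python
# Toy coupled objective over (head, body) orientations: lower is better.
import numpy as np

def coupled_objective(head_angle, body_angle, walk_dir, max_offset=np.pi / 2):
    """Body should follow the walking direction (weak label) and the head
    should stay within a plausible offset of the body."""
    def ang_diff(a, b):
        return np.abs(np.angle(np.exp(1j * (a - b))))
    weak_label_term = ang_diff(body_angle, walk_dir) ** 2
    coupling_term = max(0.0, ang_diff(head_angle, body_angle) - max_offset) ** 2
    return weak_label_term + 4.0 * coupling_term

# candidate (head, body) orientations for a person walking roughly east (0 rad)
candidates = [(0.2, 0.1), (2.8, 0.1), (0.2, 2.8)]
scores = [coupled_objective(h, b, walk_dir=0.0) for h, b in candidates]
print(scores)   # the physically plausible configuration scores lowest
```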
