University of Le Mans

Esteve Y.,University of Le Mans | Ghannay S.,University of Le Mans | Camelin N.,University of Le Mans
CEUR Workshop Proceedings | Year: 2016

Automatic speech recognition (ASR) offers the ability to access the semantic content of spoken language within audio and video documents. While acoustic models based on deep neural networks have recently and significantly improved the performance of ASR systems, automatic transcriptions still contain errors. These errors hinder the exploitation of ASR outputs by introducing noise into the text. To reduce this noise, ASR error detection can be applied in order to remove recognized words labelled as errors. This paper presents an approach that achieves very good results, better than previous state-of-the-art approaches. This work is based on a neural approach, and more specifically on a study targeting acoustic and linguistic word embeddings, which are representations of words in a continuous space. In comparison to the previous state-of-the-art approach, which was based on Conditional Random Fields, our approach reduces the classification error rate by 7.2%. © 2016, CEUR-WS. All rights reserved.
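As a hypothetical illustration of the idea (none of the data, dimensions, or classifier details below come from the paper), ASR error detection can be framed as binary classification of each recognized word from the continuous embeddings of a context window around it. Here a simple NumPy logistic regression stands in for the neural classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each word is represented by a continuous embedding;
# the detector classifies each recognized word as correct (0) or error (1)
# from the embeddings of a context window centred on it.
dim, window = 8, 3
n = 200
labels = rng.integers(0, 2, size=n)
# Toy embeddings: "error" words are drawn from a shifted distribution.
centre = rng.normal(labels[:, None] * 1.5, 1.0, size=(n, dim))
context = rng.normal(0.0, 1.0, size=(n, (window - 1) * dim))
X = np.hstack([centre, context])          # concatenated window features
X = np.hstack([X, np.ones((n, 1))])       # bias term

# Logistic regression trained by gradient descent (a stand-in for the
# neural classifier used in the paper).
w = np.zeros(X.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - labels) / n

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
accuracy = (pred == labels).mean()
```

On this synthetic data the classifier separates the two classes well; the real systems additionally exploit acoustic embeddings and richer context.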

Tomashenko N.,University of Le Mans | Tomashenko N.,Saint Petersburg State University of Information Technologies, Mechanics and Optics | Vythelingum K.,University of Le Mans | Rousseau A.,University of Le Mans | Esteve Y.,University of Le Mans
2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings | Year: 2016

This paper describes the automatic speech recognition (ASR) systems developed by LIUM in the framework of the 2016 Multi-Genre Broadcast (MGB-2) Challenge for the Arabic language. LIUM participated in the first of the two proposed tasks, namely the speech-to-text transcription of Aljazeera recordings. We present the approaches and details of our systems, as well as our results in the evaluation campaign: the primary LIUM ASR system ranked second. The main aspects are the use of GMM-derived features for training a DNN, combined with the use of time-delay neural networks for acoustic models, the use of two different approaches to automatically phonetize Arabic words, and finally, the training data selection strategy for acoustic and language models. © 2016 IEEE.

Brillouet J.-M.,French National Institute for Agricultural Research | Romieu C.,Montpellier SupAgro | Schoefs B.,University of Le Mans | Solymosi K.,Eötvös Loránd University | And 5 more authors.
Annals of Botany | Year: 2013

Background and Aims: Condensed tannins (also called proanthocyanidins) are widespread polymers of catechins and are essential for the defence mechanisms of vascular plants (Tracheophyta). A large body of evidence argues for the synthesis of monomeric epicatechin on the cytosolic face of the endoplasmic reticulum and its transport to the vacuole, although the site of its polymerization into tannins remains to be elucidated. The aim of this study was to re-examine the cellular framework of tannin polymerization in various representatives of the Tracheophyta. Methods: Epifluorescence light microscopy, confocal microscopy, transmission electron microscopy (TEM), chemical analysis of tannins following cell fractionation, and immunocytochemistry were used as independent methods on tannin-rich samples from various organs of Cycadophyta, Ginkgophyta, Equisetophyta, Pteridophyta, Coniferophyta and Magnoliophyta. Tissues were fixed in a caffeine-glutaraldehyde mixture and examined by TEM. Other fresh samples were incubated with primary antibodies against proteins from both chloroplast envelopes and a thylakoidal chlorophyll-carrying protein; they were also incubated with gelatin-Oregon Green, a fluorescent marker of condensed tannins. Coupled spectral analyses of chlorophyll and tannins were carried out by confocal microscopy on fresh tissues and on tannin-rich accretions obtained through cell fractionation; chemical analyses of tannins and chlorophylls were also performed on the accretions. Key Results and Conclusions: The presence of the three different chloroplast membranes inside the vacuolar accretions that constitute the typical form of tannin storage in vascular plants was established in fresh tissues as well as in purified organelles, using several independent methods. Tannins are polymerized in a new chloroplast-derived organelle, the tannosome. Tannosomes are formed by pearling of the thylakoids into 30 nm spheres, which are then encapsulated in a tannosome shuttle formed by budding from the chloroplast and bound by a membrane resulting from the fusion of both chloroplast envelopes. The shuttle conveys numerous tannosomes through the cytoplasm towards the vacuole, into which it is then incorporated by invagination of the tonoplast. Finally, shuttles bound by a portion of tonoplast aggregate into tannin accretions, which are stored in the vacuole. Polymerization of tannins occurs inside the tannosome regardless of the compartment being crossed. A complete sequence of events, apparently valid in all studied Tracheophyta, is described. © 2013 The Author. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved.

Lecouteux B.,University of Grenoble Alpes | Linares G.,University of Avignon | Esteve Y.,University of Le Mans | Gravier G.,French National Center for Scientific Research
IEEE Transactions on Audio, Speech and Language Processing | Year: 2013

Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of their outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a relative word error rate improvement of 14.5% over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and its relatively low dependency on the search algorithm: applying DDA to both A*-based and beam-search-based decoders yields similar performance. © 2006-2012 IEEE.
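ROVER, the vote-based baseline mentioned above, can be caricatured as word-level majority voting once the competing hypotheses have been aligned. The sketch below is hypothetical and assumes the alignment step (done in real ROVER with word transition networks) has already produced equal-length word sequences:

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Toy ROVER-style combination: aligned_hyps is a list of
    equal-length word sequences; each position is decided by a
    majority vote across systems."""
    return [Counter(words).most_common(1)[0][0] for words in zip(*aligned_hyps)]

# Three hypothetical ASR outputs for the same utterance, pre-aligned.
hyps = [["the", "cat", "sat", "down"],
        ["the", "cat", "sad", "down"],
        ["a",   "cat", "sat", "down"]]
combined = rover_vote(hyps)
```

DDA differs precisely in that it does not vote on finished outputs: the secondary hypotheses guide the primary decoder's search instead.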

Rousseau P.,University of Angers | Rousseau P.,French Institute of Health and Medical Research | Mahe G.,University of Angers | Mahe G.,French Institute of Health and Medical Research | And 7 more authors.
Microvascular Research | Year: 2011

Objective: Both spatial and temporal variability of skin blood flow are high. Laser speckle contrast imagers (LSCI) allow non-contact, real-time recording of cutaneous blood flow over large skin surfaces. The observer can then define different sizes for the region of interest (ROI) in the images to decrease spatial variability, and different durations over which the blood flow values are averaged (time of interest, TOI) to decrease temporal variability. We aimed to evaluate the impact of the choice of ROI and TOI on the analysis of resting blood flow and post-occlusive reactive hyperemia (PORH). Methods: Cutaneous blood flow (CBF) was assessed at rest and during PORH. Three different ROI sizes (1 mm², 10 mm² and 100 mm²) and three different TOIs (CBF averaged over 1 s, 15 s and 30 s at rest, and over 1 s, 5 s and 10 s for the PORH peak) were evaluated. Inter-subject and intra-subject coefficients of variation (inter-CV and intra-CV) were studied. Results: The inter-subject variability of CBF is about 25% at rest and is moderately improved when the size of the ROI increases (inter-CV = 31% for 1 s and 1 mm² versus inter-CV = 23% for 15 s and 100 mm²). However, increasing the TOI does not improve the results. The variability of the PORH peak is lower, with an inter-CV varying between 11.4% (10 s and 100 mm²) and 21.6% (5 s and 1 mm²). The lowest intra-CV for CBF at rest was 7.3% (TOI of 15 s on a ROI of 100 mm²) and was 3.1% for the PORH peak (TOI of 10 s on a ROI of 100 mm²). Conclusion: We suggest that an ROI larger than 10 mm² and a TOI longer than 1 s are required to reduce the variability of CBF measurements, both at rest and during PORH peak evaluations, at the forearm level. Many technical aspects, such as the comparison of laser speckle contrast imaging with laser Doppler imaging, or the effect of skin-to-head distance on the values recorded with LSCI, need to be studied to improve future work with this fascinating clinical tool. © 2011 Elsevier Inc.
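The coefficients of variation reported above follow the standard definition: the standard deviation expressed as a percentage of the mean. A minimal illustration with made-up perfusion readings (the values are not from the study):

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: sample standard deviation as a
    percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical resting CBF readings (arbitrary perfusion units) from
# five subjects, each averaged over a given ROI and TOI.
rest_cbf = [42.0, 55.0, 38.0, 61.0, 47.0]
inter_cv = cv_percent(rest_cbf)   # about 19% for these toy values
```

Averaging over a larger ROI or a longer TOI shrinks the per-subject noise, which is exactly how the study reduces inter-CV and intra-CV.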

Dufour R.,University of Le Mans | Esteve Y.,University of Le Mans | Deleglise P.,University of Le Mans
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Year: 2011

Extracting information from large data collections is a challenging task. In this paper, we investigate the link between speech spontaneity levels and speaker roles, and the relevance of using automatic spontaneous speech characterization as a feature for speaker role identification. Applying this automatic spontaneous speech characterization system to a broadcast news corpus containing ten manually labeled speaker roles allowed us to highlight this relationship. We therefore propose to apply the spontaneous speech characterization approach directly in order to automatically recognize speaker roles. Experimental results show that the characteristics used to detect speech spontaneity can be very useful for recognizing speaker roles, as we reached an overall classification precision of 74.4%. Copyright © 2011 ISCA.

Schwenk H.,University of Le Mans
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Year: 2013

Language models play a very important role in many natural language processing applications, in particular large-vocabulary speech recognition and statistical machine translation. For a long time, back-off n-gram language models were considered the state of the art when large amounts of training data are available. Recently, so-called continuous-space methods, or neural network language models, have been shown to systematically outperform these models, and they are becoming increasingly popular. This article describes an open-source toolkit that implements these models in a very efficient way, including support for GPU cards. The modular architecture makes it very easy to work with different data formats and to support various alternative models. Using data selection, resampling techniques and highly optimized code, training on more than five billion words takes less than 24 hours. The resulting models achieve perplexity reductions of almost 20%. This toolkit has been applied very successfully to various languages for large-vocabulary speech recognition and statistical machine translation. By making this toolkit available, we hope that many more researchers will be able to work on this very promising technique and, by these means, quickly advance the field. Copyright © 2013 ISCA.
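Perplexity, the metric to which the reported ~20% reduction refers, is the exponential of the negative average log-probability the model assigns to each word. A small worked example with made-up per-word probabilities:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities:
    exp(-(1/N) * sum(log p))."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical per-word log-probabilities assigned by a language model
# to a four-word test sentence.
lp = [math.log(0.2), math.log(0.1), math.log(0.25), math.log(0.05)]
ppl = perplexity(lp)          # about 7.95 for these toy values
lower_ppl = 0.8 * ppl         # what a 20% relative reduction would mean
```

Lower perplexity means the model spreads less probability mass over wrong continuations, which is why it correlates with gains in recognition and translation.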

Mahe G.,University of Angers | Mahe G.,French Institute of Health and Medical Research | Rousseau P.,University of Angers | Rousseau P.,French Institute of Health and Medical Research | And 6 more authors.
Microvascular Research | Year: 2011

Cutaneous blood flow (CBF) can be assessed non-invasively with lasers. Unfortunately, movement artefacts in the laser skin signal (LSsk) can sometimes compromise the interpretation of the data. To date, no method is available to remove movement artefacts point by point. Using a laser speckle contrast imager, we simultaneously recorded LSsk and the signal backscattered from an adjacent opaque surface (LSos). A first protocol allowed us to define a simple equation to calculate CBF from movement-artefact-affected traces of LSsk and LSos. We then recorded LSsk and LSos before, during and for 5 min after tourniquet ischemia, both when subjects (n = 8) were immobile and when they were submitted to external passive movements of random intensity throughout the test. The typical post-occlusive reactive hyperemia trace was not identifiable within the LSsk recordings, LSsk being 2 to 3 times higher during movements than in the immobile situation. After the calculation of CBF, traces in the immobile and movement conditions were comparable, with an "r" cross-correlation coefficient of 0.930 ± 0.010. Our method might facilitate future investigations in microvascular physiology and pathophysiology, specifically in subjects who have frequent or continuous involuntary movements. © 2010 Elsevier Inc.
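The "r" coefficient quoted above is the standard Pearson cross-correlation between two traces; a minimal sketch with made-up traces follows (the correction equation itself is not given in the abstract and is not reproduced here):

```python
import statistics

def pearson_r(x, y):
    """Pearson cross-correlation coefficient between two
    equal-length signal traces."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical traces: a reference CBF trace recorded while immobile,
# and a movement-corrected trace that tracks it with small residual noise.
immobile  = [30, 32, 90, 75, 60, 50, 44, 40]   # post-occlusive peak shape
corrected = [31, 30, 88, 77, 58, 52, 43, 41]
r = pearson_r(immobile, corrected)
```

An r close to 1, as in the study's 0.930 ± 0.010, indicates that the corrected trace preserves the shape of the underlying blood-flow response.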

Rauf S.A.,University of Le Mans | Schwenk H.,University of Le Mans
Machine Translation | Year: 2011

A parallel corpus is an essential resource for statistical machine translation (SMT) but is often not available in the required amounts for all domains and languages. An approach is presented here which aims at producing parallel corpora from available comparable corpora. An SMT system is used to translate the source-language part of a comparable corpus, and the translations are used as queries to conduct information retrieval on the target-language side of the comparable corpus. Simple filters are then used to score the SMT output and the IR-returned sentence, with the filter score defining the degree of similarity between the two. Using SMT system output gives the benefit of being able to correct one of the common errors by sentence tail removal. The approach was applied to Arabic-English and French-English systems using comparable news corpora, and considerable improvements were achieved in the BLEU score. We show that our approach is independent of the quality of the SMT system used to make the queries, strengthening the claim that the approach is applicable to languages and domains with limited parallel corpora to start with. We compare our approach with one of the earlier approaches and show that ours is easier to implement and gives equally good improvements. © Springer Science+Business Media B.V. 2011.
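The retrieval-and-filter step can be sketched as follows. The word-overlap similarity below is a deliberately crude stand-in for both the IR engine and the filters used in the paper, and all sentences are invented:

```python
def overlap_score(a, b):
    """Toy similarity: fraction of shared words between two sentences
    (Jaccard overlap on lowercased tokens)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def mine_parallel(smt_outputs, target_corpus, threshold=0.5):
    """For each SMT translation of a source sentence, retrieve the most
    similar target-side sentence and keep the pair only if the score
    exceeds the threshold (the role played by the filters)."""
    pairs = []
    for hyp in smt_outputs:
        best = max(target_corpus, key=lambda t: overlap_score(hyp, t))
        if overlap_score(hyp, best) >= threshold:
            pairs.append((hyp, best))
    return pairs

# Invented example: one translation has a genuine match on the target
# side, the other does not and is filtered out.
smt_outputs = ["the president arrived in paris on monday",
               "weather forecast completely unrelated words"]
target_corpus = ["The president arrived in Paris on Monday evening.",
                 "Stock markets fell sharply yesterday."]
mined = mine_parallel(smt_outputs, target_corpus)
```

The real pipeline uses a proper IR engine and translation-quality filters, but the structure is the same: translate, retrieve, score, keep the pairs that pass.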

Dufour R.,University of Le Mans | Esteve Y.,University of Le Mans | Deleglise P.,University of Le Mans
Speech Communication | Year: 2014

Processing spontaneous speech is one of the many challenges that automatic speech recognition systems have to deal with. The main characteristics of this kind of speech are disfluencies (filled pauses, repetitions, false starts, etc.), and many studies have focused on their detection and correction. Spontaneous speech is defined in opposition to prepared speech, in which utterances contain well-formed sentences close to those found in written documents. Acoustic and linguistic features made available by the use of an automatic speech recognition system are proposed to characterize and detect spontaneous speech segments in large audio databases. To better define this notion of spontaneous speech, the segments of an 11-hour corpus (French broadcast news) were manually labeled according to three classes of spontaneity. We first present a study of these features. We then propose a two-level strategy to automatically assign a class of spontaneity to each speech segment. The proposed system reaches 73.0% precision and 73.5% recall on highly spontaneous speech segments, and 66.8% precision and 69.6% recall on prepared speech segments. A quantitative study shows that the classes of spontaneity are useful information for characterizing speaker roles. This is confirmed by extending the speech spontaneity characterization approach to build an efficient automatic speaker role recognition system. © 2013 Elsevier B.V. All rights reserved.
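The precision and recall figures reported above are computed from true-positive, false-positive and false-negative counts. A minimal helper with hypothetical counts (not the paper's actual confusion matrix):

```python
def precision_recall(tp, fp, fn):
    """Precision = tp/(tp+fp): how many flagged segments were right.
    Recall = tp/(tp+fn): how many true segments were found."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts for a "highly spontaneous" class on a toy test set.
p, r = precision_recall(tp=73, fp=27, fn=27)
```

For a multi-class task like the three spontaneity levels, each class gets its own precision/recall pair computed one-versus-rest.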
