Nizhniy Novgorod, Russia

Savchenko A.V., National Research University Higher School of Economics | Savchenko L.V., Linguistic University of Nizhny Novgorod
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

The problem of recognizing a sequence of objects (e.g., video-based image recognition, phoneme recognition) is explored. A generalization of the fuzzy phonetic decoding method is proposed by assuming that the distribution of the classified object is of exponential type. In the preliminary phase, each model object is associated with a fuzzy set of model classes whose grades of membership are defined as the confusion probabilities estimated with the Kullback-Leibler divergence between model distributions. First, each object (e.g., frame) in a classified sequence is put in correspondence with a fuzzy set whose grades are defined as the posterior probabilities. Next, this fuzzy set is intersected with the fuzzy set corresponding to the nearest neighbor. Finally, the arithmetic mean of these fuzzy intersections is used as the decision for the whole sequence. In this paper we propose not to limit the method to the Kullback-Leibler discrimination, but to estimate the grades of membership of models and query objects based on an arbitrary distance with an appropriate scale factor. Experimental results on the recognition of isolated Russian vowel phonemes and words are presented for state-of-the-art similarity measures. It is shown that the correct choice of the scale parameter can significantly increase the recognition accuracy. © 2014 Springer International Publishing.
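
The steps listed in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that grades of membership are obtained by normalizing exp(-scale * distance), that fuzzy intersection is the elementwise minimum, and that the final label is the class with the largest mean grade. The function names and the precomputed distance matrices are hypothetical.

```python
import math

def memberships(distances, scale):
    """Map distances to grades of membership (assumed form: normalized exp(-scale * d))."""
    smallest = min(distances)  # shift by the minimum for numerical stability
    weights = [math.exp(-scale * (d - smallest)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

def classify_sequence(frame_distances, model_distances, scale):
    """Classify a sequence of frames.

    frame_distances[t][c] is the distance from frame t to model class c.
    model_distances[r][c] is the distance between model classes r and c.
    Both are assumed to be precomputed with an arbitrary distance measure.
    """
    n_classes = len(model_distances)
    # Preliminary phase: one fuzzy set of classes per model.
    model_sets = [memberships(row, scale) for row in model_distances]
    totals = [0.0] * n_classes
    for dists in frame_distances:
        frame_set = memberships(dists, scale)
        nearest = min(range(n_classes), key=lambda c: dists[c])
        # Fuzzy intersection taken as the elementwise minimum (an assumption).
        intersection = [min(a, b) for a, b in zip(frame_set, model_sets[nearest])]
        totals = [t + v for t, v in zip(totals, intersection)]
    mean = [t / len(frame_distances) for t in totals]
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean

model_distances = [[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
frame_distances = [[0.2, 1.0, 3.0], [0.5, 0.6, 2.0], [0.1, 1.5, 4.0]]
label, mean = classify_sequence(frame_distances, model_distances, scale=1.0)
```

In this sketch a larger `scale` concentrates each fuzzy set on the closest classes, while a smaller one spreads the grades more evenly, which gives an intuition for why the abstract stresses the choice of the scale parameter.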

Savchenko A.V., National Research University Higher School of Economics | Savchenko L.V., Linguistic University of Nizhny Novgorod
Pattern Recognition Letters | Year: 2015

The key purpose of this paper is to train a voice control system when only a small amount of user speech data is available, without the need for a general acoustic model, in cases where the latter does not fit the user's voice due to known sources of variability (childhood, voice diseases, non-nativeness, etc.). We explore the possibility of increasing the recognition rate by requiring the speaker to stress all vowels in a command. We propose a novel modification of our fuzzy phonetic decoding method, in which each vowel is put in correspondence with a fuzzy union of sets of available reference signals from this class. First, syllables are detected and phoneme segmentation is performed. Second, the command is extracted from spontaneous speech by thresholding the ratio of the duration of homogeneous segments to the duration of the whole syllable. Finally, each syllable is put in correspondence with the fuzzy set of vowels, and commands are ordered based on similarity with the fuzzy set of the utterance. Experimental results on synthetic and real Russian datasets show that our method achieves better accuracy than known recognition methods. © 2015 Elsevier B.V.
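
Two of the ingredients named in this abstract, the duration-ratio threshold and the fuzzy union over reference signals, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the threshold value or its direction, so the sketch assumes that a syllable is kept when the ratio is at or above the threshold, that the grade of a single reference is exp(-scale * distance), and that fuzzy union is the maximum. All names are hypothetical.

```python
import math

def is_command_syllable(homogeneous_duration, syllable_duration, threshold):
    """Keep a syllable when homogeneous segments fill enough of its duration.

    The direction of the comparison (>=) is an assumption of this sketch.
    """
    if syllable_duration <= 0:
        return False
    return homogeneous_duration / syllable_duration >= threshold

def vowel_grade(distances_to_references, scale):
    """Grade of membership of a syllable in one vowel class.

    Fuzzy union over the class's reference signals is taken as the maximum
    of the per-reference grades exp(-scale * d) (assumed form).
    """
    return max(math.exp(-scale * d) for d in distances_to_references)

# A stressed (long, stable) vowel versus a short unstressed one, durations in seconds.
stressed = is_command_syllable(0.12, 0.20, threshold=0.5)
unstressed = is_command_syllable(0.04, 0.20, threshold=0.5)
grade = vowel_grade([0.0, 2.0], scale=1.0)
```

With the maximum as union, one close reference signal is enough for a high grade, which suits the setting of the abstract where only a few reference signals per vowel are available.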

Savchenko V.V., Linguistic University of Nizhny Novgorod | Savchenko A.V., National Research University Higher School of Economics
Journal of Communications Technology and Electronics | Year: 2016

A method of phonetic decoding of words in automatic speech recognition is considered. The properties of the Kullback–Leibler divergence are used to derive an estimate of the distribution of the divergence between minimum speech units (e.g., single phonemes) inside a single class. It is demonstrated that the minimum variance of the intraphonemic divergence is reached when the phonetic database is tuned to the voice of a single speaker. The estimates are confirmed by experimental results on the recognition of vowel sounds and isolated words of the Russian language. © 2016, Pleiades Publishing, Inc.
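
As a rough illustration of the quantity discussed in this abstract, the sketch below computes the Kullback–Leibler divergence between pairs of samples in one class and the variance of those divergences. It assumes that each sample is a strictly positive discrete distribution (for example, a normalized spectrum), which is a simplification and may differ from the speech model used in the paper. The function names are hypothetical.

```python
import math
from itertools import permutations
from statistics import pvariance

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def intraclass_divergence_variance(samples):
    """Variance of the divergence over all ordered pairs of samples in one class."""
    divergences = [kl_divergence(p, q) for p, q in permutations(samples, 2)]
    return pvariance(divergences)

# Hypothetical normalized spectra of one vowel from a single speaker
# and from several speakers.
single_speaker = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
several_speakers = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]]
var_single = intraclass_divergence_variance(single_speaker)
var_several = intraclass_divergence_variance(several_speakers)
```

The toy data are chosen so that the single-speaker samples are identical, which makes the intraclass variance zero; real recordings of one speaker would only make it smaller, not zero.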
