IFLYTEK Research

Hefei, China

Du J.,Anhui University of Science and Technology | Wang Z.-R.,Anhui University of Science and Technology | Zhai J.-F.,Anhui University of Science and Technology | Hu J.-S.,IFlytek Research
Proceedings - International Conference on Pattern Recognition | Year: 2017

This paper proposes a novel segmentation-free approach using a deep neural network based hidden Markov model (DNN-HMM) for offline handwritten Chinese text recognition. In the general Bayesian framework, three key issues are comprehensively investigated, namely feature extraction, character modeling, and language modeling. First, for feature extraction, gradient-based features are extracted from each frame or sliding window for the DNN-based classifier. Second, the text line is sequentially modeled by HMMs, each representing one character class, while the DNN-based classifier is adopted to calculate the posterior probability of all HMM states. Finally, the character n-gram language model is integrated with the DNN-HMM character model for the Bayesian decision. Experiments on the ICDAR 2013 competition task of the CASIA-HWDB database show that the proposed approach achieves the best published recognition results to our knowledge, yielding a character error rate (CER) of 6.50%, which significantly outperforms the previously best reported oversegmentation approach (with a CER of 9.25%) and the segmentation-free approach using a multidimensional long short-term memory recurrent neural network (MDLSTM-RNN) (with a CER of 10.6%). © 2016 IEEE.
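
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the edit distance between the recognized text and the reference, divided by the reference length. The function names are illustrative, and the exact scoring protocol used in the paper is not stated in the abstract, so this should be read as the usual definition rather than the authors' evaluation code.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two character sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a hypothesis that drops one character from a four-character reference line scores a CER of 0.25.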

Zhang S.,Hefei University of Technology | Liu C.,IFLYTEK Research | Jiang H.,Lassonde | Wei S.,IFLYTEK Research | And 2 more authors.
IEEE/ACM Transactions on Audio Speech and Language Processing | Year: 2017

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependence in time series without using recurrent feedback. The proposed FSMN is a standard fully connected feedforward neural network equipped with some learnable memory blocks in its hidden layers. The memory blocks use a tapped-delay line structure to encode the long context information into a fixed-size representation as a short-term memory mechanism, which is somewhat similar to the layers of time-delay neural networks. We have evaluated FSMNs on several standard benchmark tasks, including speech recognition and language modeling. Experimental results have shown that FSMNs outperform conventional recurrent neural networks (RNN) while being learned much more reliably and faster when modeling sequential signals like speech or language. Moreover, we also propose a compact feedforward sequential memory network (cFSMN) by combining FSMN with low-rank matrix factorization and making a slight modification to the encoding method used in FSMNs in order to further simplify the network architecture. On the speech recognition Switchboard task, the proposed cFSMN structures can reduce the model size by 60% and speed up the learning by more than seven times while the model can still significantly outperform the popular bidirectional LSTMs for both frame-level cross-entropy criterion-based training and MMI-based sequence training. © 2014 IEEE.
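
The tapped-delay line idea can be illustrated with a short sketch. It assumes the simplest variant, in which each past hidden frame is weighted by a single scalar coefficient and frames before the start of the sequence count as zero; the function name, argument names, and this zero-padding choice are assumptions for illustration, not details taken from the abstract.

```python
import numpy as np


def tapped_delay_memory(hidden, taps):
    """Encode past context with a tapped-delay line.

    hidden: array of shape (T, D), one hidden activation vector per frame.
    taps:   sequence of N + 1 scalar coefficients a_0 .. a_N (assumed form).
    Returns an array of shape (T, D) where
        memory[t] = sum_i taps[i] * hidden[t - i]
    and frames with t - i < 0 contribute nothing.
    """
    hidden = np.asarray(hidden, dtype=float)
    num_frames = hidden.shape[0]
    memory = np.zeros_like(hidden)
    for delay, coefficient in enumerate(taps):
        if delay >= num_frames:
            break
        # Frame t receives the hidden vector from `delay` steps earlier.
        memory[delay:] += coefficient * hidden[:num_frames - delay]
    return memory
```

Because the output at each frame is a fixed weighted sum of a bounded window, the whole sequence can be processed without any recurrent feedback, which is the property the abstract emphasizes.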

Chen Y.,Hefei University of Technology | Zhang L.,Anhui University | Li X.,IFlyTek Research | Zong Y.,West Anhui University | And 3 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2017

Merchant recommendation, namely recommending personalized merchants to a specific customer, has become increasingly important during the past few years, especially with the prevalence of Location Based Social Networks (LBSNs). Although many existing methods attempt to address this task, most of them focus on applying conventional recommendation algorithms (e.g. Collaborative Filtering) to merchant recommendation while neglecting the hidden information buried in the users’ reviews. In fact, the information about users’ real preferences on various topics hidden in the reviews is very useful for personalized merchant recommendation. To this end, in this paper, we propose a graphical model that incorporates users’ real preferences on various topics, extracted from user reviews, into the collaborative filtering technique for personalized merchant recommendation. Then, we develop an optimization algorithm based on a Gaussian model to train our merchant recommendation approach. Finally, we conduct extensive experiments on two real-world datasets to demonstrate the efficiency and effectiveness of our model. The experimental results clearly show that our proposed model outperforms the state-of-the-art benchmark approaches. © Springer International Publishing AG 2017.

Chen Y.,Hefei University of Technology | Li X.,iFlyTek Research | Li L.,Wuhan University of Technology | Liu G.,Hefei University of Technology | Xu G.,University of Technology, Sydney
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2016

The pervasive use of Location-based Social Networks calls for precise and personalized Point-of-Interest (POI) recommendation to predict which places the users prefer. Modeling user mobility, as an important component of understanding user preference, plays an essential role in POI recommendation. However, existing methods mainly model user mobility by analyzing the check-in data and formulating a distribution, without considering why a user checks in at a specific place from a psychological perspective. In this paper, we propose a POI recommendation algorithm that models user mobility by considering check-in data and geographical information. Specifically, with check-in data, we propose a novel probabilistic latent factor model to formulate user psychological behavior from the perspective of utility theory, which could help reveal the inner information underlying the comparative choice behaviors of users. The geographical behavior of all the historical check-ins, captured by a power law distribution, is then combined with the probabilistic latent factor model to form the POI recommendation algorithm. Extensive evaluation experiments conducted on two real-world datasets confirm the superiority of our approach over state-of-the-art methods. © Springer International Publishing Switzerland 2016.
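
As an illustration of the geographical component, the sketch below fits a power law of the assumed form p(d) = a * d^b by linear regression in log-log space. The abstract only says that a power law distribution is used; the choice of distance d as the variable, the functional form, and the fitting procedure here are assumptions made for the example.

```python
import numpy as np


def fit_power_law(distances, probabilities):
    """Fit p(d) = a * d**b by least squares on log p = log a + b * log d.

    distances and probabilities must be strictly positive.
    Returns the pair (a, b).
    """
    log_d = np.log(np.asarray(distances, dtype=float))
    log_p = np.log(np.asarray(probabilities, dtype=float))
    slope, intercept = np.polyfit(log_d, log_p, 1)
    return float(np.exp(intercept)), float(slope)
```

A negative exponent b would correspond to check-in likelihood decaying with distance.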

Liu C.,IFlytek Research | Hu Y.,IFlytek Research | Dai L.-R.,Hefei University of Technology | Jiang H.,York University
IEEE Transactions on Audio, Speech and Language Processing | Year: 2011

In this paper, we propose two novel optimization methods for discriminative training (DT) of hidden Markov models (HMMs) in speech recognition based on an efficient global optimization algorithm used to solve the so-called trust region (TR) problem, where a quadratic function is minimized under a spherical constraint. In the first method, maximum mutual information estimation (MMIE) of Gaussian mixture HMMs is formulated as a standard TR problem so that the efficient global optimization method can be used in each iteration to maximize the auxiliary function of discriminative training for speech recognition. In the second method, we propose to construct a new auxiliary function for DT of HMMs by adding a quadratic penalty term. The new auxiliary function is constructed to serve as a first-order approximation as well as a lower bound of the original discriminative objective function within a locality constraint. Due to the lower-bound property, the optimal point found for the new auxiliary function is guaranteed to improve the original discriminative objective function until it converges to a local optimum or stationary point of the objective function. Both TR-based optimization methods have been investigated on two standard large-vocabulary continuous speech recognition tasks, using the WSJ0 and Switchboard databases. Experimental results have shown that the proposed TR methods outperform the conventional EBW method in terms of convergence behavior as well as recognition performance. © 2011 IEEE.
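
To make the trust region problem concrete, here is a small sketch that minimizes 0.5 * x^T A x + b^T x subject to ||x|| <= rho using an eigendecomposition and bisection on the Lagrange multiplier. This is a generic textbook-style solver, not necessarily the algorithm used in the paper; the function name and parameters are illustrative, and the degenerate "hard case" (b orthogonal to the eigenvectors of the smallest eigenvalue) is deliberately not handled.

```python
import numpy as np


def solve_trust_region(A, b, rho, iterations=200):
    """Minimize 0.5 * x^T A x + b^T x subject to ||x|| <= rho (A symmetric)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigenvalues in ascending order
    projected = eigenvectors.T @ b

    def candidate(multiplier):
        # Stationary point of the Lagrangian: x = -(A + multiplier * I)^{-1} b.
        return -eigenvectors @ (projected / (eigenvalues + multiplier))

    # If A is positive definite and the unconstrained minimizer lies inside
    # the ball, that minimizer is the answer.
    if eigenvalues[0] > 0:
        x = candidate(0.0)
        if np.linalg.norm(x) <= rho:
            return x

    # Otherwise the solution lies on the boundary. ||x(multiplier)|| decreases
    # monotonically, so bisect for the multiplier where it equals rho.
    low = max(0.0, -eigenvalues[0])
    high = max(low, np.linalg.norm(b) / rho - eigenvalues[0])
    for _ in range(iterations):
        middle = 0.5 * (low + high)
        if np.linalg.norm(candidate(middle)) > rho:
            low = middle
        else:
            high = middle
    return candidate(high)
```

The solver also works when A is indefinite, which is why the problem can be solved globally even though the quadratic itself is not convex.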

Du J.,Microsoft | Hu Y.,IFlytek Research | Jiang H.,York University
IEEE Transactions on Audio, Speech and Language Processing | Year: 2011

In this paper, we apply the well-known boosted mixture learning (BML) method to learn Gaussian mixture HMMs in speech recognition. BML is an incremental method to learn mixture models for classification problems. In each step of BML, one new mixture component is estimated according to the functional gradient of an objective function to ensure that it is added along the direction that maximizes the objective function. Several techniques have been proposed to extend BML from simple mixture models like the Gaussian mixture model (GMM) to the Gaussian mixture hidden Markov model (HMM), including Viterbi approximation for state segmentation, weight decay and sampling boosting to initialize sample weights to avoid overfitting, a combination of partial updating and global updating to refine model parameters in each BML iteration, and use of the Bayesian Information Criterion (BIC) for parsimonious modeling. Experimental results on two large-vocabulary continuous speech recognition tasks, namely the WSJ-5k and Switchboard tasks, have shown that the proposed BML yields a significant performance gain over the conventional training procedure, especially for small model sizes. © 2006 IEEE.

Ding H.,National School of Technology | Pan J.,IFlytek Research | Shen M.,University of Konstanz | Shen M.,South China University of Technology
2015 IEEE International Conference on Information and Automation, ICIA 2015 - In conjunction with 2015 IEEE International Conference on Automation and Logistics | Year: 2015

Objective measures are favored and widely used by many researchers in evaluating the quality of noise-suppressed speech. A good and reliable objective measure should evaluate speech quality consistently and correlate well with subjective ratings. In this paper, several widely used objective measures are applied to speech signals in Chinese languages, including Mandarin and Cantonese. The correlations between objective measure outputs and perceptual-subjective ratings are reported and analyzed. The experimental results show that the correlations for Mandarin and Cantonese are lower than those for English, and that objective measures behave differently in Mandarin, Cantonese and English. A detailed discussion and conclusions are presented as well. © 2015 IEEE.
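
The kind of agreement described above is often quantified with the Pearson correlation coefficient between objective scores and subjective ratings. The sketch below shows that computation; the abstract does not say which correlation statistic the authors used, so the choice of Pearson correlation and the function name are assumptions.

```python
import math


def pearson_correlation(objective_scores, subjective_ratings):
    """Pearson correlation between two equally long lists of numbers."""
    if len(objective_scores) != len(subjective_ratings):
        raise ValueError("inputs must have the same length")
    n = len(objective_scores)
    mean_x = sum(objective_scores) / n
    mean_y = sum(subjective_ratings) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(objective_scores, subjective_ratings))
    spread_x = sum((x - mean_x) ** 2 for x in objective_scores)
    spread_y = sum((y - mean_y) ** 2 for y in subjective_ratings)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(spread_x * spread_y)
```

A value near 1 means the objective measure ranks utterances much as listeners do; a lower value for one language than another is the kind of difference the abstract reports.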

Ding J.,Hefei University of Technology | Chen Y.,Hefei University of Technology | Li X.,IFlyTek Research | Liu G.,Hefei University of Technology | And 2 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2016

Personalized recommendation has drawn growing attention in academia and industry as it can help people filter out large amounts of useless information. Several existing recommender techniques exploit social connections, i.e., friends or trust relations, as auxiliary information to improve recommendation accuracy. However, opinion leaders in each circle tend to have a greater impact on recommendation than friends with different tastes. We therefore devise two unsupervised methods to identify opinion leaders, which we define as experts. In this paper, we incorporate the influence of experts into circle-based personalized recommendation. Specifically, we first build explicit and implicit social networks by utilizing users’ friendships and similarity respectively. Then we identify experts on both social networks. Further, we propose a circle-based personalized recommendation approach that fuses experts’ influences into the matrix factorization technique. Extensive experiments conducted on two datasets demonstrate that our approach outperforms existing methods, particularly in handling the cold-start problem. © Springer International Publishing Switzerland 2016.

Xia X.-J.,Anhui University of Science and Technology | Ling Z.-H.,Anhui University of Science and Technology | Jiang Y.,IFLYTEK Research | Dai L.-R.,Anhui University of Science and Technology
Speech Communication | Year: 2014

This paper presents a hidden Markov model (HMM) based unit selection speech synthesis method using log likelihood ratios (LLR) derived from perceptual data. The perceptual data is collected by judging the naturalness of each synthetic prosodic word manually. Two acoustic models which represent the natural speech and the unnatural synthetic speech are trained respectively. At synthesis time, the LLRs are derived from the estimated acoustic models and integrated into the unit selection criterion as target cost functions. The experimental results show that our proposed method can synthesize more natural speech than the conventional method using likelihood functions. Due to the inadequacy of the acoustic model estimated for the unnatural synthetic speech, utilizing the LLR-based target cost functions to rescore the pre-selection results or the N-best sequences can achieve better performance than substituting them for the original target cost functions directly. © 2014 Elsevier B.V. All rights reserved.
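
The log likelihood ratio at the heart of this method can be sketched as follows. For brevity the example stands in a single diagonal-covariance Gaussian for each of the two acoustic models, whereas the paper trains HMM-based models; the function names and the (mean, variance) model representation are assumptions for illustration only.

```python
import math


def gaussian_log_likelihood(features, mean, variance):
    """Log density of a feature vector under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(features, mean, variance)
    )


def log_likelihood_ratio(features, natural_model, unnatural_model):
    """LLR = log p(features | natural) - log p(features | unnatural).

    Each model is a (mean, variance) pair of equal-length lists
    (a simplified stand-in for the trained acoustic models).
    """
    return (gaussian_log_likelihood(features, *natural_model)
            - gaussian_log_likelihood(features, *unnatural_model))
```

A positive LLR indicates that a candidate unit looks more like natural speech than like unnatural synthetic speech, so a higher value would translate into a lower target cost.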

Du J.,Anhui University of Science and Technology | Zhai J.-F.,Anhui University of Science and Technology | Hu J.-S.,IFlytek Research | Zhu B.,IFlytek Research | And 2 more authors.
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR | Year: 2015

This paper presents a novel approach to writer adaptation based on a convolutional neural network (CNN) as a feature extractor and improved discriminative linear regression for online handwritten Chinese character recognition. First, the proposed recognizer, consisting of a CNN-based feature extractor and a prototype-based classifier, can achieve performance comparable to the state-of-the-art CNN-based classifier while being more compact and efficient as a practical solution. Second, writer adaptation is performed via a linear transformation of the features extracted by the CNN. The transformation parameters are optimized with a so-called sample separation margin based minimum classification error criterion, which can be further improved by using more synthesized adaptation data and a simple regularization method. Experiments on data collected from user inputs on smartphones with a vocabulary of 20,936 characters demonstrate that our writer adaptation approach can yield significant improvements in recognition accuracy over a high-performance baseline system and also outperform a state-of-the-art approach based on style transfer mapping, especially with increased adaptation data. © 2015 IEEE.
