Lu S., Shanghai Research Institute of China Post | Wei X., East China Normal University | Lu Y., Shanghai Research Institute of China Post | Lu Y., East China Normal University
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR | Year: 2015

To overcome the class imbalance problem in Chinese address recognition, we propose a cost-sensitive learning method for the MQDF classifier. A cost vector is introduced into the discriminative learning process of MQDF, and minimization of the misclassification cost is used as the convergence criterion. The resulting cost-sensitive MQDF classifier (CMQDF) is integrated into a handwritten Chinese address recognition (HCAR) system to validate its effectiveness. The experimental results show that CMQDF is an effective cost-sensitive classifier for the class imbalance problem in the HCAR system. Moreover, it enhances the reliability of the HCAR system. © 2015 IEEE.
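The abstract does not give the CMQDF training details, so the following is only a minimal sketch of the quantity it says is minimized: a misclassification cost weighted by a per-class cost vector. The function name, the integer class labels, and the cost values are all hypothetical.

```python
def misclassification_cost(y_true, y_pred, cost):
    """Sum the cost of every misclassified sample.

    cost maps a true class label to the (assumed) cost of getting it wrong.
    """
    return sum(cost[t] for t, p in zip(y_true, y_pred) if t != p)


# Hypothetical example: class 1 is the rare class and is 5x more costly to miss.
cost_vector = {0: 1.0, 1: 5.0}
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 0]
total_cost = misclassification_cost(y_true, y_pred, cost_vector)
print(total_cost)
```

A training loop could monitor this value instead of the plain error rate and stop when it no longer decreases; whether the paper does exactly that is not stated in the abstract.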


Ai L., East China Normal University | Lu S., East China Normal University | Lu S., Shanghai Research Institute of China Post | Wen Y., East China Normal University | And 2 more authors.
Communications in Computer and Information Science | Year: 2012

In this paper, we present the segmentation of handwritten Chinese strings in the presence of overlapped and touching characters. A contour-tracing-based method is proposed to segment the overlapped characters. To segment touching characters, a corner point analysis method is used to identify the cutting positions. Experimental results on 564 Chinese character strings captured from postal mail pieces show the effectiveness of the proposed methods for segmenting handwritten Chinese character strings. © 2012 Springer-Verlag Berlin Heidelberg.


Lu S., East China Normal University | Lu S., Shanghai Research Institute of China Post | Liu L., East China Normal University | Lu Y., East China Normal University | And 3 more authors.
International Journal of Pattern Recognition and Artificial Intelligence | Year: 2012

Most traditional postcode recognition systems implicitly assume that the distribution of the 10 numerals (0-9) is balanced. However, this assumption is far from reasonable, because the distribution of 0-9 in the postcodes of a country or a city is generally imbalanced: some numerals appear in more postcodes than others. In this paper, we study cost-sensitive neural network classifiers to address the class imbalance problem in postcode recognition. Four methods are considered in training neural networks: cost-sampling, cost-convergence, rate-adapting, and threshold-moving. Cost-sampling adjusts the distribution of the training data so that the costs of classes are conveyed explicitly by the number of their instances. Cost-convergence and rate-adapting are carried out in the training phase by modifying the architecture of the neural network's training algorithms. Threshold-moving increases the probability estimates of expensive classes so that samples with higher costs are less likely to be misclassified. Experiments are conducted on 10,702 postcode images using five cost matrices based on the distribution of numerals in postcodes. The results suggest that cost-sensitive learning is indeed effective for class-imbalanced postcode analysis and recognition. They also reveal that cost-sampling with a proper cost matrix outperforms the other methods in this application. © 2012 World Scientific Publishing Company.
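As a rough illustration of the threshold-moving idea described above, the sketch below scales each class probability by a per-class cost and renormalizes. This is one common formulation, assumed here for illustration; the function name, the three-class setup, and the cost values are hypothetical and are not taken from the paper.

```python
def threshold_moving(probs, costs):
    """Scale class probabilities by per-class costs and renormalize.

    probs: classifier outputs, one probability per class.
    costs: assumed misclassification cost of each class (higher = more expensive).
    """
    weighted = [p * c for p, c in zip(probs, costs)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all weighted probabilities are zero")
    return [w / total for w in weighted]


# Hypothetical outputs: class 0 is the raw argmax, class 1 is twice as costly.
probs = [0.5, 0.3, 0.2]
costs = [1.0, 2.0, 1.0]
adjusted = threshold_moving(probs, costs)
predicted = max(range(len(adjusted)), key=adjusted.__getitem__)
print(adjusted, predicted)
```

The network itself is left unchanged; only the decision is shifted toward expensive classes at prediction time, which distinguishes threshold-moving from cost-sampling, where the training data are resampled instead.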


Lu S., East China Normal University | Lu S., Shanghai Research Institute of China Post | Wei X., East China Normal University | Lu Y., East China Normal University | Lu Y., Shanghai Research Institute of China Post
Proceedings - International Conference on Pattern Recognition | Year: 2014

This paper proposes a cost-sensitive transformation that improves handwritten address recognition performance by converting a general-purpose handwritten Chinese character recognition engine into a special-purpose one. The class probabilities that the character recognition engine produces for assigning a sample to candidate classes are first transformed into expected costs based on Naive Bayes optimal theoretical predictions. The candidate probabilities are then re-estimated from the expected costs. Two general-purpose offline handwritten Chinese character recognition engines, PAIS and HAW, are tested in our experiments by applying them in a handwritten Chinese address recognition system. 1822 live handwritten Chinese address images are tested with multiple cost matrices. Experimental results show that the cost-sensitive transformation improves the recognition performance of general-purpose recognition engines on handwritten Chinese address recognition. © 2014 IEEE.
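The two steps in this abstract can be sketched as follows. The expected-cost step is the standard computation from a cost matrix. The abstract does not say how probabilities are re-estimated, so the inverse-cost normalization below is purely an assumption, as are the function names and the example numbers.

```python
def expected_costs(probs, cost):
    """Expected cost of predicting each class i.

    cost[j][i] is the cost of predicting class i when the true class is j.
    """
    n = len(probs)
    return [sum(probs[j] * cost[j][i] for j in range(n)) for i in range(n)]


def reestimate(probs, cost, eps=1e-9):
    """Assumed re-estimation: scores inversely proportional to expected cost."""
    inv = [1.0 / (ec + eps) for ec in expected_costs(probs, cost)]
    total = sum(inv)
    return [v / total for v in inv]


# Hypothetical two-class case: mistaking true class 1 for class 0 costs 4.
probs = [0.6, 0.4]
cost = [[0.0, 1.0],
        [4.0, 0.0]]
ec = expected_costs(probs, cost)   # class 0: 0.4 * 4, class 1: 0.6 * 1
new_probs = reestimate(probs, cost)
print(ec, new_probs)
```

In this example class 0 has the higher raw probability, but class 1 has the lower expected cost, so the re-estimated scores favor class 1.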


Wei X., East China Normal University | Lu S., Shanghai Research Institute of China Post | Wen Y., East China Normal University | Lu Y., East China Normal University | Lu Y., Shanghai Research Institute of China Post
Pattern Recognition Letters | Year: 2016

Handwritten Chinese address recognition is a challenging task, not only because of the large number of Chinese characters and the unconstrained nature of handwriting, but also because of the irregularity of address formats. Existing techniques generally solve the problem by transforming the address database into a large-scale character-level tree (CLT) and then matching the nodes of the generated CLT against the candidate patterns. However, the CLT is unable to cover all the variations of address formats. This paper proposes a more compact tree, built at the word level, that covers as many variations of address formats as completely as possible. Specifically, the candidate segment patterns are first recognized by a character classifier and then mapped to candidate address words by matching against the proposed word-level tree (WLT) address database. Finally, the address recognition result is obtained in the path matching phase by summing the scores of the candidate address words in each match path. The proposed scheme was tested on real mail address images captured by an automatic letter sorting machine. Experimental results demonstrate that the proposed WLT-based method outperforms the four benchmark methods. © 2016 Elsevier B.V. All rights reserved.
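The path matching phase described above can be illustrated with a small sketch: addresses are stored as a tree of words, and the best path is the one whose candidate word scores sum highest. The nested-dict tree, the function names, the sample addresses, and the scores are all hypothetical, and the sketch assumes exactly one candidate set per tree level, which the real system need not require.

```python
def build_tree(addresses):
    """Build a word-level tree (nested dicts) from lists of address words."""
    root = {}
    for words in addresses:
        node = root
        for word in words:
            node = node.setdefault(word, {})
    return root


def best_path(tree, candidates):
    """Return (score, path) for the highest-scoring path through the tree.

    candidates: one dict per position, mapping a candidate word to its score.
    Returns (float("-inf"), []) when no path matches.
    """
    best = (float("-inf"), [])

    def walk(node, pos, score, path):
        nonlocal best
        if pos == len(candidates):
            if score > best[0]:
                best = (score, path)
            return
        for word, s in candidates[pos].items():
            if word in node:  # only follow words the address database allows
                walk(node[word], pos + 1, score + s, path + [word])

    walk(tree, 0, 0.0, [])
    return best


addresses = [["Shanghai", "Pudong", "Zhangjiang Road"],
             ["Shanghai", "Putuo", "Zhongshan Road"]]
candidates = [{"Shanghai": 0.9},
              {"Putuo": 0.6, "Pudong": 0.5},
              {"Zhangjiang Road": 0.8, "Zhongshan Road": 0.3}]
score, path = best_path(build_tree(addresses), candidates)
print(score, path)
```

Here "Putuo" is the top candidate at the second position, yet the summed path score selects the "Pudong" address, which shows how the tree constrains word-level decisions.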
