Key Laboratory of Symbol Computation and Knowledge Engineering

Changchun, China

Liu H.,Zhejiang Normal University | Liu H.,Key Laboratory of Symbol Computation and Knowledge Engineering | Li M.,Zhejiang Normal University | Zhao J.,Zhejiang Normal University | And 2 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

With the rapid development of information technology, the dimensionality of data in many applications is growing ever higher. However, many features in high-dimensional data are redundant, and their presence can pose serious challenges to traditional learning algorithms. It is therefore necessary to develop effective techniques for removing irrelevant features from data, and much effort has already been devoted to this field. In this paper, we propose a new feature selection method based on conditional mutual information estimated dynamically. Its advantage is that it accurately captures the correlation between features as the selection procedure progresses. Performance evaluations on eight benchmark datasets show that the proposed method achieves performance comparable to other well-established feature selection algorithms in most cases. © 2011 Springer-Verlag.
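
The greedy, conditional-mutual-information-driven selection this abstract describes can be sketched on discrete data. The paper's exact dynamic estimation scheme is not given here, so conditioning each candidate on the most recently selected feature is an illustrative assumption:

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    # Empirical mutual information I(X;Y) in bits
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def cmi(xs, ys, zs):
    # I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z=z)
    n = len(zs)
    total = 0.0
    for z, cz in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        total += cz / n * mi([xs[i] for i in idx], [ys[i] for i in idx])
    return total

def select_features(features, labels, k):
    """Greedy selection: first take the feature with maximal I(f;Y), then
    repeatedly take the candidate with the largest relevance conditioned
    on the last selected feature (a simplification of the paper's
    dynamically estimated CMI)."""
    names = list(features)
    chosen = [max(names, key=lambda f: mi(features[f], labels))]
    while len(chosen) < k:
        rest = [f for f in names if f not in chosen]
        chosen.append(max(rest, key=lambda f: cmi(features[f], labels,
                                                  features[chosen[-1]])))
    return chosen
```

Conditioning on already-selected features is what distinguishes this family of criteria from plain mutual-information ranking: a feature that merely duplicates an earlier pick scores zero once that pick is conditioned on.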


Ma X.,Jilin University | Ma X.,Key Laboratory of Symbol Computation and Knowledge Engineering
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2010

A semiconductor fabrication line dynamic scheduling (SFLDS) model combining a multi-agent system (MAS) with multiple intelligence algorithms is presented in this paper. The proposed model is based on improved generalized partial global planning (GPGP) and combines the advantages of static intelligence algorithms with those of a dynamic MAS. A scheduling process from 'macro-scheduling to micro-scheduling to repeated scheduling' is designed for large-scale complex problems, enabling an effective and widely applicable prototype system for SFLDS. Under this scheme, a set of structural limitations of GPGP is identified and corresponding improvements are proposed. The improved GPGP and its model are simulated using the simulation software eM-Plant. A case study examines the practicability, flexibility, and robustness of the proposed scheduling approach. © 2010 Springer-Verlag.


Peng T.,Jilin University | Peng T.,University of Illinois at Urbana - Champaign | Peng T.,Key Laboratory of Symbol Computation and Knowledge Engineering | Liu L.,Jilin University | And 2 more authors.
International Journal of Software Engineering and Knowledge Engineering | Year: 2015

The ever-growing amount of information on the Web makes it difficult to retrieve domain-specific information, owing to the huge number of data sources and to keywords that carry few features. Anchor texts, which contain a few features of a specific topic, play an important role in domain-specific information retrieval, especially in Web page classification. However, the features contained in anchor texts alone are not informative enough. This paper presents a novel incremental method for Web page classification enhanced by link contexts and clustering. Directly feeding the anchor-text vector to a classifier may not yield good results because of the limited number of features. The link context is therefore used first to obtain contextual information about the anchor text. Then, a hierarchical clustering method is introduced to cluster feature vectors and content units, which increases the length of the feature vector belonging to a specific class. Finally, an incremental SVM is proposed to obtain the final classifier and to increase its accuracy and efficiency. Experimental results show that the proposed method outperforms a conventional topical Web crawler in harvest rate and target recall. © 2015 World Scientific Publishing Company.
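
The link-context step can be illustrated minimally as the anchor text plus a window of surrounding words; the word-level granularity and window size below are assumptions, not the paper's exact definition:

```python
def link_context(page_words, anchor_start, anchor_len, window=5):
    """Return the anchor text plus up to `window` words on each side.
    `page_words` is the tokenized page; `anchor_start`/`anchor_len`
    locate the anchor text within it."""
    lo = max(0, anchor_start - window)
    hi = min(len(page_words), anchor_start + anchor_len + window)
    return page_words[lo:hi]
```

The enlarged word list would then be vectorized in place of the bare anchor text, giving the classifier more features to work with.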


Liu L.,Jilin University | Liu L.,University of Illinois at Urbana - Champaign | Peng T.,Jilin University | Peng T.,University of Illinois at Urbana - Champaign | And 4 more authors.
Ruan Jian Xue Bao/Journal of Software | Year: 2013

Text classification is a key technology in information retrieval. Collecting more reliable negative examples and building effective, efficient classifiers are two important problems in automatic text classification. Existing methods, however, mostly collect only a small number of reliable negative examples, which keeps the classifiers from reaching high accuracy. In this paper, a clustering-based method for automatic PU (positive and unlabeled) text classification enhanced by SVM active learning is proposed. In contrast to traditional methods, this approach builds on a clustering technique that exploits the characteristic that positive and negative examples should share as few words as possible. It finds more reliable negative examples by removing as many probable positive examples from the unlabeled set as possible. When building the classifier, a term weighting scheme, TFIPNDF (term frequency inverse positive-negative document frequency, an improved TFIDF), is adopted. An additional improved Rocchio method, in conjunction with SVM active learning, significantly improves classification performance. Experimental results on three different datasets (RCV1, Reuters-21578, and 20 Newsgroups) show that the proposed clustering-based method extracts more reliable negative examples than the baseline algorithms with very low error rates, and that implementing SVM active learning also improves classification accuracy significantly. © Copyright 2013, Institute of Software, the Chinese Academy of Sciences.
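
The "share as few words as possible" heuristic for extracting reliable negatives can be sketched as a vocabulary-overlap filter; the overlap measure and threshold below are illustrative assumptions standing in for the paper's clustering step:

```python
def reliable_negatives(positives, unlabeled, threshold=0.1):
    """Keep an unlabeled document as a reliable negative only when the
    fraction of its words appearing in the positive-set vocabulary is at
    most `threshold`; documents above it are treated as probable positives
    and removed (threshold value is an assumption)."""
    pos_vocab = set()
    for doc in positives:
        pos_vocab.update(doc.split())
    negatives = []
    for doc in unlabeled:
        words = doc.split()
        overlap = sum(w in pos_vocab for w in words) / max(len(words), 1)
        if overlap <= threshold:
            negatives.append(doc)
    return negatives
```

The resulting negative set would then seed the Rocchio / SVM active-learning stages described in the abstract.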


Liu L.,Jilin University | Liu L.,University of Illinois at Urbana - Champaign | Peng T.,Jilin University | Peng T.,University of Illinois at Urbana - Champaign | Peng T.,Key Laboratory of Symbol Computation and Knowledge Engineering
Journal of Information Science and Engineering | Year: 2014

PU (positive and unlabeled) learning arises frequently in Web page classification and text retrieval applications because users may be interested in information on the same topic. Collecting reliable negative examples is a key step in PU text classification; it addresses a central problem in machine learning when no labeled negative examples are available in the training set or negative examples are difficult to collect. This paper therefore presents a novel clustering-based method for collecting reliable negative examples (CCRNE). Unlike traditional methods, we remove as many probable positive examples from the unlabeled set as possible, so that more reliable negative examples are found. When building the classifier, a novel TFIDF-improved feature weighting approach, which reflects the importance of a term in the positive and negative training examples respectively, is presented to describe documents in the vector space model. We also build a weighted voting classifier by iteratively applying the SVM algorithm, and implement OCS (one-class SVM), PEBL (Positive Example Based Learning), and 1-DNFII (constrained 1-DNF) for comparison. Experimental results on three real-world datasets (Reuters Corpus Volume 1 (RCV1), Reuters-21578, and 20 Newsgroups) show that the proposed CCRNE extracts more reliable negative examples than the baseline algorithms with very low error rates, and that our classifier outperforms other state-of-the-art classification methods in terms of traditional performance metrics.


Zhang J.,Jilin University | Zhang J.,Key Laboratory of Symbol Computation and Knowledge Engineering | Meng W.,Jilin University | Liu Q.,Jilin University | And 3 more authors.
Optik | Year: 2016

The driving knowledge of taxi drivers lies hidden in large amounts of taxi GPS data, and an efficient driving strategy derived from taxi drivers can be offered to private car drivers. Five million taxi GPS records from Nanjing, China are analyzed as follows. First, the data are preprocessed to reduce GPS measurement error by removing static points, drifting points, and relatively isolated points. Next, road intersections are identified through regional extreme points and the road map is restored with three algorithms: a probability-based path selection algorithm, an improved Prim path selection algorithm, and an improved Prim path selection algorithm based on probability. Finally, the SPFA (Shortest Path Faster Algorithm) is applied to the road maps obtained from the three algorithms for optimal path planning over 40 pairs of start and end points, and the resulting road lengths are compared. The experimental comparison shows that the third method, the improved Prim path selection algorithm based on probability, outperforms the other two and produces an efficient driving route more accurately. © 2015 Elsevier GmbH. All rights reserved.
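
The SPFA used in the final planning step is a standard queue-based refinement of Bellman-Ford; a minimal sketch on an adjacency-list graph:

```python
from collections import deque

def spfa(graph, source):
    """Shortest Path Faster Algorithm: relax edges from a work queue,
    re-enqueueing a node only when its distance improves and it is not
    already queued. `graph` maps each node to (neighbour, length) pairs."""
    dist = {node: float('inf') for node in graph}
    dist[source] = 0.0
    queue, in_queue = deque([source]), {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if v not in in_queue:
                    queue.append(v)
                    in_queue.add(v)
    return dist
```

On road networks with non-negative edge lengths, SPFA returns the same distances as Dijkstra's algorithm; its appeal here is simplicity and good average-case behaviour on sparse graphs.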


Wang J.,Jilin University | Wang J.,Key Laboratory of Symbol Computation and Knowledge Engineering | Zuo W.,Jilin University | Zuo W.,Key Laboratory of Symbol Computation and Knowledge Engineering | And 2 more authors.
Chinese Journal of Electronics | Year: 2015

Measuring word semantic similarity is a generic problem with a broad range of applications, such as ontology mapping, computational linguistics, and artificial intelligence. Previous approaches to computing word semantic similarity did not consider concept occurrence frequency or a word's number of senses. This paper introduces the hyponymy graph and, based on it, proposes a novel word semantic similarity model. For two words to be compared, we first retrieve their related concepts; we then produce the lowest-common-ancestor matrix and the distance matrix between concepts; finally, we calculate distance-based similarity and information-based similarity, which are integrated into the final semantic similarity. The main contribution of our method is that both concept occurrence frequency and a word's number of senses are taken into account. This similarity measure fits human ratings more closely and effectively simulates the human thinking process. Experimental results on the benchmark datasets M&C and R&G, with WordNet 2.1 as the platform, demonstrate improvements of roughly 0.9%-1.2% over the best existing approaches.
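
The distance-based half of such a measure can be illustrated on a toy hyponymy tree: find the lowest common ancestor, count edges through it, and map distance to similarity. The tree, and the 1/(1+d) form, are illustrative assumptions; the paper additionally integrates an information-based term:

```python
def ancestors(node, parent):
    # Chain from `node` up to the root, inclusive
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def path_distance(a, b, parent):
    # Edge count from a to b through their lowest common ancestor
    pos_a = {node: i for i, node in enumerate(ancestors(a, parent))}
    steps_b, node = 0, b
    while node not in pos_a:          # climb from b until the chains meet
        node = parent[node]
        steps_b += 1
    return pos_a[node] + steps_b

def similarity(a, b, parent):
    # Distance-based similarity: closer concepts score nearer to 1
    return 1.0 / (1.0 + path_distance(a, b, parent))
```

With a hypothetical fragment where 'dog' and 'wolf' share the parent 'canine' while 'cat' sits under 'feline', siblings come out more similar than cousins, matching the intuition the abstract appeals to.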


Wen Z.,Zhengzhou University | Chen K.,Jilin University | Chen K.,Key Laboratory of Symbol Computation and Knowledge Engineering | Fan Z.,Zhengzhou University
ICCET 2010 - 2010 International Conference on Computer Engineering and Technology, Proceedings | Year: 2010

Research on information extraction from the deep Web is currently very active. Although many researchers have adopted ontologies for data extraction, many problems remain. This paper proposes an ontology-evolution-based method for mining the data area. Not only does this method handle the case where a website contains only one record, it can also identify the meaning of data that has no labels. As the ontology evolves, the extraction of data records becomes more accurate. Experiments indicate that this method can improve the accuracy and efficiency of data extraction. © 2010 IEEE.


Zhang J.,Jilin University | Zhang J.,Key Laboratory of Symbol Computation and Knowledge Engineering | Jia X.,Jilin University | Zhou Z.,Jilin University
Optik | Year: 2015

To tackle the string stability problem of a vehicle platoon, an efficient collision-prevention pre-compensation control algorithm called CPPC is proposed in this paper. The algorithm takes into account acceleration, speed, location, communication delay, and spacing errors, and a safe inter-vehicle distance is maintained to keep driving safe. We evaluate our algorithm through simulation and compare it with a controller that has no string stability control. The simulation results are very encouraging and indicate the effectiveness of the proposed approach. © 2015 Elsevier GmbH. All rights reserved.
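
A toy illustration of the spacing-and-speed feedback idea (not the paper's CPPC law; the double-integrator model and the gains below are assumptions): a follower that accelerates on spacing and speed errors settles to the desired gap behind a constant-speed leader.

```python
def simulate_follower(steps=2000, dt=0.01, desired_gap=10.0,
                      kp=0.5, kv=1.5, v_lead=20.0):
    """Euler-integrated follower: acceleration is a weighted sum of the
    spacing error and the relative speed (gains kp, kv are illustrative).
    Returns the final gap and follower speed."""
    x_lead, x_f, v_f = 50.0, 0.0, 15.0   # leader ahead, follower slower
    for _ in range(steps):
        gap = x_lead - x_f
        a = kp * (gap - desired_gap) + kv * (v_lead - v_f)
        v_f += a * dt
        x_f += v_f * dt
        x_lead += v_lead * dt
    return x_lead - x_f, v_f
```

With these gains the closed-loop spacing error obeys an overdamped second-order equation, so the gap approaches 10 m without oscillation; a full string-stability analysis would additionally require that spacing errors not amplify down a chain of such followers.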


Zhang C.,Northeastern University China | Ouyang D.,Key Laboratory of Symbol Computation and Knowledge Engineering | Ning J.,Northeast Normal University
Expert Systems with Applications | Year: 2010

Clustering is a popular data analysis and data mining technique. In this paper, an artificial bee colony clustering algorithm is presented to optimally partition N objects into K clusters. Deb's rules are used to direct the search direction of each candidate. The algorithm has been tested on several well-known real datasets and compared with other popular heuristic clustering algorithms, such as GA, SA, TS, ACO, and the recently proposed K-NM-PSO algorithm. The computational simulations reveal very encouraging results in terms of solution quality and required processing time. © 2009 Elsevier Ltd. All rights reserved.
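
A stripped-down sketch of artificial bee colony clustering on 1-D data, minimizing within-cluster sum of squared errors: colony size, the abandonment limit, and plain greedy acceptance (in place of Deb's constrained rules) are all illustrative assumptions.

```python
import random

def sse(centers, points):
    # Within-cluster sum of squared errors for 1-D points
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def abc_cluster(points, k, n_bees=10, limit=20, iters=200, seed=0):
    """Each food source is a set of k candidate cluster centres.
    Employed and onlooker bees perturb one centre toward or away from a
    random partner source and keep the change only when SSE improves;
    sources that fail to improve for `limit` trials are replaced by scouts."""
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    sources = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(n_bees)]
    trials = [0] * n_bees

    def try_improve(i):
        cand = list(sources[i])
        j = rng.randrange(k)
        partner = sources[rng.randrange(n_bees)]
        cand[j] += rng.uniform(-1, 1) * (cand[j] - partner[j])
        if sse(cand, points) < sse(sources[i], points):
            sources[i], trials[i] = cand, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_bees):                     # employed bees
            try_improve(i)
        fits = [1.0 / (1.0 + sse(s, points)) for s in sources]
        total = sum(fits)
        for _ in range(n_bees):                     # onlookers: fitness-proportional
            r, acc = rng.uniform(0, total), 0.0
            for i, f in enumerate(fits):
                acc += f
                if acc >= r:
                    try_improve(i)
                    break
        for i in range(n_bees):                     # scouts replace stale sources
            if trials[i] > limit:
                sources[i] = [rng.uniform(lo, hi) for _ in range(k)]
                trials[i] = 0
    return min(sources, key=lambda s: sse(s, points))
```

On two well-separated 1-D groups the search quickly places one centre near each group's mean; the same scheme generalizes to higher dimensions by perturbing one coordinate of one centre at a time.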
