Wang F., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | Wang F., Shanxi University | Liang J., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | Liang J., Shanxi University
Neurocomputing | Year: 2016

Feature selection for large-scale data sets is regarded as a very important data preprocessing step in machine learning. Data sets in real databases are usually hybrid, i.e., categorical and numerical data coexist. In this paper, an efficient feature selection approach for large-scale hybrid data sets is studied, based on the idea of decomposition and fusion. With this approach, an effective feature subset can be obtained in a much shorter time. Experiments on twelve UCI data sets, using two common classifiers as the evaluation function, show that the proposed approach is effective and efficient. © 2016 Elsevier B.V.


Wang S., Shanxi University | Wang S., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | Li D., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | Li D., Shanxi University | And 3 more authors.
Expert Systems with Applications | Year: 2011

Owing to its openness, virtualization and sharing, the Internet has rapidly become a platform for people to express their opinions, attitudes, feelings and emotions. Because subjective texts are often too numerous for people to read through, automatically classifying them into sentiment orientation categories (e.g. positive/negative) has become an important research problem. In this paper, an effective feature selection method based on Fisher's discriminant ratio is proposed for subjective text sentiment classification. To validate it, the method is compared with an Information Gain based method, with a Support Vector Machine as the classifier. Two experiments are conducted by combining different feature selection methods with two kinds of candidate feature sets. On 2739 subjective documents from COAE2008s and 1006 car-related subjective documents, the experimental results indicate that Fisher's discriminant ratio based on word frequency estimation performs best, with accuracies of 86.61% and 82.80% on the two corpora respectively, when the candidate features are the words that appear in both positive and negative texts. © 2011 Elsevier Ltd. All rights reserved.
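
As a rough illustration of how a Fisher-ratio filter scores candidate words, here is a minimal sketch. It assumes the common textbook form of the ratio for one feature, (mean_pos − mean_neg)² / (var_pos + var_neg), computed over per-document word counts; the paper's word-frequency estimate may differ. The candidate set follows the abstract (words occurring in both positive and negative texts). The helper names and the toy documents are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical helpers for illustration: textbook Fisher discriminant ratio,
# not necessarily the exact estimator used in the paper.

def fisher_ratio(pos_counts, neg_counts):
    """(mean_pos - mean_neg)^2 / (var_pos + var_neg) for a single feature."""
    gap = (mean(pos_counts) - mean(neg_counts)) ** 2
    spread = pvariance(pos_counts) + pvariance(neg_counts)
    if spread == 0:
        # Constant within each class: perfectly separating if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def candidate_words(pos_docs, neg_docs):
    """Words that occur in both positive and negative documents."""
    pos_vocab = {w for doc in pos_docs for w in doc}
    neg_vocab = {w for doc in neg_docs for w in doc}
    return pos_vocab & neg_vocab

def rank_words(pos_docs, neg_docs):
    """Rank candidate words by the Fisher ratio of their per-document frequencies."""
    scores = {}
    for word in sorted(candidate_words(pos_docs, neg_docs)):
        pos_counts = [doc.count(word) for doc in pos_docs]
        neg_counts = [doc.count(word) for doc in neg_docs]
        scores[word] = fisher_ratio(pos_counts, neg_counts)
    return sorted(scores, key=scores.get, reverse=True)

# Toy tokenized documents (hypothetical data).
pos_docs = [["good", "car", "good"], ["good", "fast"], ["nice", "car"]]
neg_docs = [["bad", "car"], ["bad", "good", "slow"], ["car", "slow"]]
print(rank_words(pos_docs, neg_docs))  # ['good', 'car']
```

Here "car" has the same mean count in both classes and scores 0, while "good" scores 0.5. In a full pipeline, the top-ranked words would then serve as the feature set for a classifier such as an SVM.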


Meng Y., Shanxi University | Liang J., Shanxi University | Liang J., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | Qian Y., Shanxi University | Qian Y., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education
Knowledge-Based Systems | Year: 2016

Functional data, an important data type, are prevalent in many fields such as economics, biology, finance, and meteorology, and the underlying process is often viewed as a continuous curve. Classifying functional data is a basic data mining task. The common method is a two-stage learning process: first, the functional data series are converted into multivariate data by means of basis functions; second, a machine learning algorithm performs the classification on the new representation. The problem is that most learning algorithms rely on Euclidean distance, whereas the distance between functional samples is the L2 distance. This raises three interesting questions. (1) Is it feasible to treat a functional sample as a point in the corresponding Euclidean space? (2) How should an orthonormal basis be selected for a given type of functional data? (3) With a finite number of basis functions, which is better for the same number of bases: an orthogonal or a non-orthogonal representation? These questions are the main motivation of this study. For the first, theoretical analysis shows that treating a functional sample as a point in the corresponding Euclidean space is feasible under an orthonormal representation. For the second, experimental analysis shows that the Fourier basis is suitable for representing stable functions (especially periodic functions), the wavelet basis is good at distinguishing functions with local differences, and the data-driven functional principal component basis could be the first choice, especially when no prior knowledge of the functional data type is available. For the third, experimental results show that the orthogonal representation outperforms the non-orthogonal representation in classification performance. These results are significant for the study of functional data classification. © 2015 Elsevier B.V.
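
The first finding can be checked numerically. For an orthonormal basis, the L2 distance between two functions in the span of the basis equals the Euclidean distance between their coefficient vectors (Parseval's identity). The following is a minimal sketch. It assumes an orthonormal Fourier basis on [0, 2π) with five harmonics, two hypothetical test curves, and coefficients computed with the rectangle rule on a uniform grid, which is exact for low-degree trigonometric polynomials. The function names are illustrative only.

```python
import numpy as np

# Hypothetical helpers: orthonormal Fourier basis on [0, 2*pi) and projection.

def fourier_basis(t, n_harmonics):
    """Rows are orthonormal basis functions 1/sqrt(2pi), cos(kt)/sqrt(pi), sin(kt)/sqrt(pi)."""
    rows = [np.full_like(t, 1.0 / np.sqrt(2 * np.pi))]
    for k in range(1, n_harmonics + 1):
        rows.append(np.cos(k * t) / np.sqrt(np.pi))
        rows.append(np.sin(k * t) / np.sqrt(np.pi))
    return np.vstack(rows)

def coefficients(values, basis, dt):
    """c_k = integral of f(t) * phi_k(t) dt, approximated by the rectangle rule."""
    return basis @ values * dt

n = 2000
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
dt = 2 * np.pi / n
basis = fourier_basis(t, n_harmonics=5)

# Two sampled curves that lie in the span of the truncated basis.
f = np.sin(t) + 0.5 * np.cos(2 * t)
g = 0.2 + np.sin(3 * t)

cf = coefficients(f, basis, dt)
cg = coefficients(g, basis, dt)

l2_distance = np.sqrt(np.sum((f - g) ** 2) * dt)  # distance between curves
euclidean = np.linalg.norm(cf - cg)               # distance between coefficient vectors
print(l2_distance, euclidean)
```

Analytically, both quantities equal sqrt(2.33π) for these two curves. For curves outside the span of a truncated basis, the coefficient distance only measures the distance between their projections onto that span.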


Kang X., Shanxi University | Kang X., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | Li D., Shanxi University | Li D., Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education | And 4 more authors.
Fuzzy Sets and Systems | Year: 2012

This paper introduces granular computing (GrC) into formal concept analysis (FCA), providing a unified model for concept lattice building and rule extraction on a fuzzy granularity base for different granulations. One strength of GrC is that larger granulations help to hide some specific details, whereas FCA in a GrC context can prevent losses due to concept lattice complexity. However, the number of superfluous rules increases exponentially with the scale of the decision context. To overcome this, we present some inference rules and maximal rules, and prove that the set of all these maximal rules is complete and nonredundant. Thus, users who want to obtain decision rules should generate the maximal rules. Examples demonstrate that the method is valid and practicable. In summary, this approach applies FCA in a GrC context and provides a practical basis for data analysis and processing. © 2012 Elsevier B.V. All rights reserved.
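
Concept lattice building in FCA rests on two derivation operators. A' is the set of attributes shared by all objects in A, and B' is the set of objects having all attributes in B. A formal concept is a pair (A, B) with A' = B and B' = A. The sketch below shows only plain crisp FCA, not the fuzzy granular model the paper proposes. The context and function names are hypothetical, and the brute-force enumeration is exponential in the number of objects, so it suits only toy examples.

```python
from itertools import combinations

# Hypothetical helpers: crisp (non-fuzzy) formal context, brute-force enumeration.

def common_attributes(objects, context, attributes):
    """Derivation A': attributes shared by every object in A (all attributes if A is empty)."""
    return frozenset(a for a in attributes if all(a in context[o] for o in objects))

def common_objects(attrs, context):
    """Derivation B': objects that have every attribute in B."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def formal_concepts(context):
    """All concepts (A'', A') obtained by closing every subset A of objects."""
    attributes = frozenset().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(subset, context, attributes)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts

# Toy context: object -> set of attributes it has (hypothetical data).
context = {"x1": {"a", "b"}, "x2": {"b", "c"}, "x3": {"a", "b", "c"}}
for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

This toy context yields four concepts, e.g. ({x1, x3}, {a, b}) and ({x1, x2, x3}, {b}). Ordering them by extent inclusion gives the concept lattice from which rules can be read off.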
