Zhang J.,Southwest Jiaotong University | Zhang J.,Key Laboratory of Cloud Computing and Intelligent Technology | Yang Y.,Southwest Jiaotong University | Yang Y.,Key Laboratory of Cloud Computing and Intelligent Technology | And 6 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

Recent research on data clustering increasingly focuses on combining multiple data partitions as a way to improve the robustness of clustering solutions. Most of this work has focused on crisp clustering combination. Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. In this paper, we offer a semi-supervised clustering ensemble model based on collaborative training (SCET) and an unsupervised clustering ensemble model based on collaborative training (UCET). In the ensemble step of SCET, semi-supervised learning is introduced, while in UCET the knowledge used in SCET is replaced by information extracted from the base clusterings. Tri-training is then used as the consensus function of the clustering ensemble. Experiments on datasets from the UCI Machine Learning Repository indicate that the model improves the accuracy of clustering. © 2012 Springer-Verlag.


Yang Y.,Southwest Jiaotong University | Yang Y.,Key Laboratory of Cloud Computing and Intelligent Technology | Ni X.,Southwest Jiaotong University | Ni X.,Key Laboratory of Cloud Computing and Intelligent Technology | And 3 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

Hadoop is a distributed system infrastructure for cloud computing. Based on the characteristics of the ant-based clustering algorithm, the paper parallelizes this algorithm using MapReduce on Hadoop. The Map function calculates the average similarity of each object with its neighborhood objects. The Reduce function processes the objects using the Map outputs and updates the related information of both the ants and the objects to prepare for the next job. Results on Hadoop clusters show that the method can significantly improve computational efficiency while maintaining clustering accuracy. © 2012 Springer-Verlag.
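As a rough illustration of the Map step described in the abstract above, the sketch below computes an average neighborhood similarity for each object on a 2-D grid. It assumes the Lumer-Faieta form of the similarity, f(o_i) = max(0, (1/s²) Σ_j (1 − d(o_i, o_j)/α)), which the abstract does not specify. The function names, the `radius` and `alpha` parameters, and the dictionary layout are hypothetical, and plain Python stands in for Hadoop. The Reduce step here only gathers the Map outputs; the update of ant and object state is not reproduced.

```python
import math

def map_average_similarity(obj_id, objects, positions, radius=1, alpha=1.0):
    """Map step (sketch): emit (obj_id, average similarity with grid neighbours).

    objects   : dict id -> feature vector (tuple of floats)
    positions : dict id -> (x, y) cell on the ant grid
    Assumes the Lumer-Faieta similarity; this is not taken from the paper.
    """
    x, y = positions[obj_id]
    side = 2 * radius + 1  # the neighbourhood is a side x side square of cells
    total = 0.0
    for other_id, (ox, oy) in positions.items():
        if other_id == obj_id:
            continue
        if abs(ox - x) <= radius and abs(oy - y) <= radius:
            distance = math.dist(objects[obj_id], objects[other_id])
            total += 1.0 - distance / alpha
    return obj_id, max(0.0, total / (side * side))

def reduce_collect(pairs):
    """Reduce step (simplified): gather the similarities keyed by object id."""
    return dict(pairs)
```

A driver would call `map_average_similarity` once per object and pass the resulting pairs to `reduce_collect`; in the ant-based algorithm, a high similarity would make an ant less likely to pick the object up and more likely to drop it there.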


Luo C.,Southwest Jiaotong University | Luo C.,Key Laboratory of Cloud Computing and Intelligent Technology | Li T.,Southwest Jiaotong University | Li T.,Key Laboratory of Cloud Computing and Intelligent Technology | And 2 more authors.
Information Sciences | Year: 2014

Set-valued information systems are an important type of data table and a generalized model of single-valued information systems. Approximations are the focal point of approaches to knowledge discovery based on rough set theory, and can be used to extract and represent hidden knowledge in the form of decision rules. Attribute generalization refers to dynamic change of the attribute set in an information system in response to the requirements of real-life applications. In this paper, we focus on maintaining approximations dynamically in set-valued ordered decision systems under attribute generalization. Firstly, a matrix-based approach for computing approximations of upward and downward unions of decision classes is constructed by introducing the dominant and dominated matrices with respect to the dominance relation. Then, incremental approaches for updating approximations are proposed, which involve several modifications to the relevant matrices without having to retrain from the start on all accumulated training data. Finally, comparative experiments on data sets from UCI as well as synthetic data sets show that the proposed incremental updating methods are efficient and effective for dynamic attribute generalization. © 2013 Elsevier Inc. All rights reserved.
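To make the matrix view in the abstract above concrete, the following sketch builds a Boolean dominance matrix and derives the lower and upper approximations of an upward union of decision classes. It is a simplification under stated assumptions: criteria are single-valued numbers rather than set-valued, larger values are preferred, and the function names are hypothetical. The last function illustrates only the general idea that adding a criterion can be handled by refining the existing matrix element by element; it is not the paper's incremental algorithm.

```python
def dominance_matrix(table):
    """M[i][j] is True when object i is at least as good as j on every criterion."""
    n = len(table)
    return [[all(a >= b for a, b in zip(table[i], table[j])) for j in range(n)]
            for i in range(n)]

def upward_union_approximations(dom, decisions, t):
    """Lower/upper approximations of the union of classes with decision >= t."""
    n = len(dom)
    in_union = [d >= t for d in decisions]
    # Lower: every object that dominates i belongs to the union.
    lower = {i for i in range(n)
             if all(in_union[j] for j in range(n) if dom[j][i])}
    # Upper: some object dominated by i belongs to the union.
    upper = {i for i in range(n)
             if any(in_union[j] for j in range(n) if dom[i][j])}
    return lower, upper

def refine_with_criterion(dom, column):
    """Dominance matrix after adding one criterion, without a full rebuild."""
    n = len(dom)
    return [[dom[i][j] and column[i] >= column[j] for j in range(n)]
            for i in range(n)]

table = [(1, 1), (2, 2), (3, 3), (2, 3)]
decisions = [1, 2, 2, 1]
dom = dominance_matrix(table)
lower, upper = upward_union_approximations(dom, decisions, t=2)
```

In this toy table, object 3 dominates object 1 but has a worse decision, so object 1 falls outside the lower approximation until an added criterion separates the two.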


Yang Y.,Southwest Jiaotong University | Yang Y.,Key Laboratory of Cloud Computing and Intelligent Technology | Tan W.,Southwest Jiaotong University | Tan W.,Key Laboratory of Cloud Computing and Intelligent Technology | And 3 more authors.
Knowledge-Based Systems | Year: 2012

Data mining processes data from different perspectives into useful knowledge, and has become an important component in designing intelligent decision support systems (IDSS). Clustering is an effective method for discovering natural structures of data objects in data mining. Both clustering ensemble and semi-supervised clustering techniques have emerged to improve the clustering performance of unsupervised clustering algorithms. Cop-Kmeans is a K-means variant that incorporates background knowledge in the form of pairwise constraints. However, Cop-Kmeans suffers from a constraint violation problem. This paper proposes an improved Cop-Kmeans (ICop-Kmeans) algorithm to solve the constraint violation of Cop-Kmeans. The certainty of objects is computed from the weighted co-association to obtain a better assignment order of objects. The paper also proposes a new constrained self-organizing map (SOM) that combines multiple semi-supervised clustering solutions to further enhance the performance of ICop-Kmeans. Experiments validate that the proposed methods effectively improve the clustering results and the quality of complex decisions in IDSS. © 2011 Elsevier B.V. All rights reserved.
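The constraint violation mentioned in the abstract above can be seen in a minimal sketch of the Cop-Kmeans assignment step: each object goes to its nearest cluster that breaks no must-link or cannot-link constraint, and the step fails when no such cluster exists. The function names and the `order` parameter are hypothetical. The certainty-based ordering of ICop-Kmeans is not reproduced here; the caller supplies the order, which is enough to show that the outcome depends on it.

```python
import math

def violates(obj, cluster, assignment, must_link, cannot_link):
    """True if putting obj in cluster breaks a constraint with an assigned object."""
    for a, b in must_link:
        other = b if a == obj else (a if b == obj else None)
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == obj else (a if b == obj else None)
        if other is not None and assignment.get(other) == cluster:
            return True
    return False

def assign_objects(points, centers, must_link, cannot_link, order=None):
    """One assignment pass; returns None when some object has no feasible cluster."""
    assignment = {}
    for obj in (order if order is not None else range(len(points))):
        # Try clusters from nearest to farthest.
        ranked = sorted(range(len(centers)),
                        key=lambda c: math.dist(points[obj], centers[c]))
        for c in ranked:
            if not violates(obj, c, assignment, must_link, cannot_link):
                assignment[obj] = c
                break
        else:
            return None  # constraint violation: every cluster is ruled out
    return assignment

points = [(1.0,), (2.0,), (9.0,)]
centers = [(0.0,), (10.0,)]
cannot_link = [(0, 1), (0, 2)]
failed = assign_objects(points, centers, [], cannot_link, order=[1, 2, 0])
solved = assign_objects(points, centers, [], cannot_link, order=[0, 1, 2])
```

With the order `[1, 2, 0]`, objects 1 and 2 occupy both clusters before object 0 is placed, so the pass fails; placing object 0 first yields a valid assignment for the same constraints.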


Chen H.,Southwest Jiaotong University | Chen H.,Key Laboratory of Cloud Computing and Intelligent Technology | Li T.,Southwest Jiaotong University | Li T.,Key Laboratory of Cloud Computing and Intelligent Technology | And 2 more authors.
Knowledge-Based Systems | Year: 2012

Approximations in rough set theory are important operators for discovering interesting patterns and dependencies in data mining. Both certain and uncertain rules are derived from the different regions partitioned by approximations. In real-life applications, an information system may evolve over time through changes in attributes, objects, and attribute values. Updating approximations efficiently therefore becomes vital in data mining tasks. Dominance-based rough set approaches deal with the problem of ordinal classification with monotonicity constraints in multi-criteria decision analysis. Missing data frequently appear in Incomplete Ordered Decision Systems (IODSs). Rough set approaches based on the extended dominance characteristic relation process the IODS with two cases of missing data, i.e., "lost value" and "do not care". This paper focuses on dynamically updating approximations of upward and downward unions when attribute values are coarsened or refined in the IODS. Under rough sets based on the extended dominance characteristic relation, it presents the principles of dynamically updating approximations w.r.t. the coarsening and refining of attribute values in the IODS, together with algorithms for incrementally updating approximations of an upward union and a downward union of classes. Comparative experiments on UCI datasets show that the proposed method is efficient and effective in maintaining approximations. © 2012 Elsevier Ltd. All rights reserved.


Wang Z.-G.,Southwest Jiaotong University | Wang Z.-G.,Key Laboratory of Cloud Computing and Intelligent Technology | Li T.-R.,Southwest Jiaotong University | Li T.-R.,Key Laboratory of Cloud Computing and Intelligent Technology | And 6 more authors.
Tongxin Xuebao/Journal on Communications | Year: 2011

With the development of high-speed railways, their security has gained more and more attention. Noise data collected by sensors reflect the operating condition of a train and are closely related to its security. The efficiency of data processing has become significant because the volume of data is growing at an unprecedented rate, and processing massive noise data effectively is a challenge. A method for preprocessing massive noise data based on MapReduce is proposed, drawing on the idea of parallel computing. Experiments on the Hadoop platform show that the proposed method can improve the efficiency of preprocessing massive noise data.
