Ding L.,Key Laboratory of Medical Image Computing NEU | Ding L.,Northeastern University | Qiao B.,Key Laboratory of Medical Image Computing NEU | Qiao B.,Northeastern University | And 3 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

Recently, cloud computing has attracted increasing attention as a new computing infrastructure, and improving its data management has become an active research topic. Current cloud computing systems support only key-value insert and lookup operations; lacking efficient index structures, they cannot effectively support complex queries or the management of multi-dimensional data. A scalable and reliable index structure is therefore needed. In this paper, a novel quad-tree based multi-dimensional index structure is proposed for efficient data management and query processing in cloud computing systems. A local quad-tree index is built on each compute node to manage the data residing on that node. The compute nodes are then organized in a Chord-based overlay network. A portion of the local index on each compute node is selected to form a global index, which is published using the overlay routing protocol. The global index has a low maintenance cost and can dramatically improve query processing performance in cloud computing systems. Experiments show that the proposed index structure is scalable, efficient and reliable. © 2011 Springer-Verlag.
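To make the local index concrete, here is a minimal Python sketch of a bucket point quad-tree with a rectangular range query, of the kind each compute node could build over its own data. It assumes 2-D points and a leaf capacity of four; the class and parameter names are illustrative, and neither the Chord overlay nor the selection of index nodes to publish globally is modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    """One node of a bucket point quad-tree covering the box [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4          # points a leaf holds before it splits (assumed value)
    depth: int = 0
    max_depth: int = 16        # stops endless splitting when many points coincide
    points: List[Point] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def insert(self, x: float, y: float) -> bool:
        if not self.contains(x, y):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        # Exactly one child's half-open box contains the point.
        return any(child.insert(x, y) for child in self.children)

    def _split(self) -> None:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        boxes = [(self.x0, self.y0, mx, my), (mx, self.y0, self.x1, my),
                 (self.x0, my, mx, self.y1), (mx, my, self.x1, self.y1)]
        self.children = [QuadNode(*b, capacity=self.capacity, depth=self.depth + 1,
                                  max_depth=self.max_depth) for b in boxes]
        old, self.points = self.points, []
        for x, y in old:               # push the leaf's points down into the quadrants
            for child in self.children:
                if child.insert(x, y):
                    break

    def range_query(self, qx0, qy0, qx1, qy1, out=None) -> List[Point]:
        """Collect the points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        out = [] if out is None else out
        # Prune subtrees whose box does not overlap the query rectangle.
        if qx1 < self.x0 or qx0 >= self.x1 or qy1 < self.y0 or qy0 >= self.y1:
            return out
        out.extend(p for p in self.points
                   if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
        for child in self.children:
            child.range_query(qx0, qy0, qx1, qy1, out)
        return out
```

A node would insert its local records with `root.insert(x, y)` and answer queries with `root.range_query(...)`; the upper levels of such a tree are natural candidates for the compact summary a node publishes, though how the paper chooses them is not specified here.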


Han D.,Northeastern University China | Han D.,Key Laboratory of Medical Image Computing NEU | Giraud-Carrier C.,Brigham Young University | Li S.,Northeastern University China
Applied Intelligence | Year: 2015

Currently available algorithms for data stream classification are mostly designed for precise and complete data. However, data in many real-life applications are inherently uncertain because of instrument inaccuracy, wireless transmission errors, and similar factors. We propose UELM-MapReduce, a parallel ensemble classifier based on the Extreme Learning Machine (ELM) and MapReduce for handling uncertain data streams. An efficient parallel ELM-based ensemble classifier is trained on successive training chunks of the uncertain data stream. The weight of each base classifier in the ensemble is adjusted according to its mean squared error on the most recent test chunk, and the classifier with the lowest accuracy is replaced. UELM-MapReduce classifies uncertain data streams efficiently and accurately while effectively handling concept drift. Experimental results demonstrate that UELM-MapReduce outperforms other methods in both prediction accuracy and computational efficiency. © 2015 Springer Science+Business Media New York
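The ensemble maintenance rule described above (re-weight every base classifier by its error on the newest test chunk and replace the least accurate one) can be sketched as follows. This is a hypothetical illustration, not UELM-MapReduce itself: base classifiers are arbitrary scoring functions rather than ELMs, training and MapReduce parallelism are omitted, the weight is assumed to be the inverse of the mean squared error, and "least accurate" is approximated by "highest MSE".

```python
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable returning a real-valued score whose sign is the
# predicted label (+1 / -1); in the paper the base learners are ELMs.
Model = Callable[[Sequence[float]], float]
Chunk = List[Tuple[Sequence[float], float]]


def mse(model: Model, chunk: Chunk) -> float:
    """Mean squared error of a model's scores against +1/-1 targets."""
    return sum((model(x) - y) ** 2 for x, y in chunk) / len(chunk)


class WeightedEnsemble:
    def __init__(self, max_size: int, eps: float = 1e-9):
        self.max_size = max_size
        self.eps = eps                      # avoids division by zero for a perfect model
        self.models: List[Model] = []
        self.weights: List[float] = []

    def update(self, new_model: Model, test_chunk: Chunk) -> None:
        """Add a model trained on the latest chunk and re-weight on the newest test chunk."""
        if len(self.models) >= self.max_size:
            errors = [mse(m, test_chunk) for m in self.models]
            worst = max(range(len(errors)), key=errors.__getitem__)
            del self.models[worst]          # drop the member with the highest error
        self.models.append(new_model)
        # Assumed weighting rule: inverse MSE on the most recent test chunk.
        self.weights = [1.0 / (mse(m, test_chunk) + self.eps) for m in self.models]

    def predict(self, x: Sequence[float]) -> int:
        score = sum(w * m(x) for w, m in zip(self.weights, self.models))
        return 1 if score >= 0 else -1
```

Inverse-error weighting lets a member that fits the current concept dominate quickly, which is one simple way to follow concept drift; the paper may use a different weighting formula.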


Cao K.,Northeastern University China | Shi L.,PLA Logistical Engineering University | Wang G.,Northeastern University China | Wang G.,Key Laboratory of Medical Image Computing NEU | And 2 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

Outlier detection is a key problem in data mining because it can reveal rare phenomena and behaviors. In this paper, we examine density-based local outlier detection on uncertain data sets in which each object is described by a set of discrete instances. We propose a new density-based local outlier concept for uncertain data. To detect outliers quickly, we propose an algorithm that does not require enumerating all possible worlds. The performance of our method is verified through a number of simulation experiments, and the results show that it is an effective way to solve density-based local outlier detection on uncertain data. © 2014 Springer International Publishing Switzerland.
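For reference, the sketch below computes the classic Local Outlier Factor (LOF) on certain data and then, as a naive baseline, the probability that one uncertain object is a LOF outlier by enumerating every possible world. This enumeration is exactly the cost the paper's algorithm is said to avoid, and the outlier rule used here (LOF above a threshold, objects independent) is an assumption rather than the paper's definition.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Classic certain-data LOF; assumes distinct points and k < len(points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    kdist, neigh = [], []
    for i in range(n):
        kd = sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
        kdist.append(kd)
        # k-distance neighbourhood: every other point within the k-distance (ties included)
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    # Local reachability density = 1 / mean reachability distance to the neighbours.
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


def outlier_probability(objects, target: int, k: int, threshold: float) -> float:
    """Naive baseline: P(LOF of object `target` > threshold) summed over all possible worlds.

    objects: one list of (point, probability) instances per uncertain object.
    """
    prob = 0.0
    for world in itertools.product(*objects):
        p_world = math.prod(p for _, p in world)
        points = [pt for pt, _ in world]
        if lof_scores(points, k)[target] > threshold:
            prob += p_world
    return prob
```

The number of worlds is the product of the instance counts of all objects, which is why an algorithm that avoids unfolding them matters for realistic data.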


Wang G.,Key Laboratory of Medical Image Computing NEU | Wang G.,Northeastern University China | Yuan Y.,Key Laboratory of Medical Image Computing NEU | Yuan Y.,Northeastern University China | And 5 more authors.
World Wide Web | Year: 2010

Managing and retrieving reusable learning materials by content is a major challenge in e-Learning material sharing systems. E-Learning materials are highly heterogeneous: they may take the form of video, audio, images, slides or plain text. Furthermore, learning systems are highly dynamic because the volume of multimedia material is growing rapidly. P2P networks appear to be one of the most promising infrastructures for meeting this challenge in such highly dynamic environments. In this paper we propose a Peer-to-Peer (P2P) infrastructure based on the trie tree and the de Bruijn structure, which efficiently supports query processing in highly dynamic scenarios. We also develop PeerLearning, a P2P e-Learning system that provides two content-based learning material sharing services: a keyword search component for content-based document sharing and a content-based retrieval method for multimedia materials. Extensive experiments verify the superiority of our methods over existing approaches. © 2010 Springer Science+Business Media, LLC.
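The de Bruijn part of the infrastructure can be illustrated with shift-based routing in a binary de Bruijn graph: each m-bit node ID links to the IDs obtained by dropping its first bit and appending 0 or 1, so any node is reachable in at most m hops. The sketch assumes this textbook binary variant with string IDs; how the paper combines it with the trie is not shown.

```python
from typing import List


def neighbors(node: str) -> List[str]:
    """Out-neighbours in a binary de Bruijn graph: shift left, append a bit."""
    return [node[1:] + bit for bit in "01"]


def route(src: str, dst: str) -> List[str]:
    """Shift-based route from src to dst; at most len(src) hops."""
    if len(src) != len(dst):
        raise ValueError("identifiers must have the same length")
    m = len(src)
    # Skip the bits already in place: longest suffix of src that is a prefix of dst.
    overlap = next(k for k in range(m, -1, -1) if src[m - k:] == dst[:k])
    path, cur = [src], src
    for bit in dst[overlap:]:
        cur = cur[1:] + bit             # one hop along a de Bruijn edge
        path.append(cur)
    return path
```

For example, `route("011", "110")` needs only one hop because the suffix `11` of the source is already the prefix of the destination.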


Cao K.,Northeastern University China | Cao K.,Key Laboratory of Medical Image Computing NEU | Han D.,Northeastern University China | Han D.,Key Laboratory of Medical Image Computing NEU | And 6 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Outlier detection plays an important role in fraud detection, sensor networks, computer network management and many other areas. As data increasingly arrive as streams and carry uncertainty, outlier detection on uncertain data streams has become a new research topic. First, we propose a new outlier concept for uncertain data streams based on possible worlds. We then propose an outlier detection method for uncertain data streams that meets the demands of limited storage and real-time processing. Next, a dynamic storage structure is designed for outlier detection on uncertain data streams over a sliding window, again to meet the demands of limited storage and real-time response. Furthermore, an efficient range query method based on the SM-tree (Statistics M-tree) is proposed to avoid redundant computation. Finally, the performance of our method is verified through a large number of simulation experiments. The results show that our method effectively solves outlier detection on uncertain data streams and significantly reduces execution time and storage space. © 2013 Springer-Verlag.
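A minimal sketch of possible-world reasoning over a count-based sliding window follows. It assumes tuple-level uncertainty (each tuple exists independently with its own probability) and a distance-based outlier rule (fewer than k neighbours within a radius); both are illustrative choices, not necessarily the paper's definitions. A dynamic program computes the neighbour-count probability without enumerating possible worlds, and a linear scan stands in for the SM-tree range query.

```python
import math
from collections import deque
from typing import Deque, Sequence, Tuple


def prob_at_least(probs: Sequence[float], k: int) -> float:
    """P(at least k of the independent events occur), by dynamic programming.

    dp[j] holds P(exactly j events so far) for j < k; dp[k] absorbs "k or more".
    This avoids enumerating the 2**len(probs) possible worlds.
    """
    if k <= 0:
        return 1.0
    dp = [1.0] + [0.0] * k
    for p in probs:
        for j in range(k, 0, -1):           # descending, so dp[j - 1] is still the old value
            stay = dp[j] if j == k else dp[j] * (1 - p)
            dp[j] = stay + dp[j - 1] * p
        dp[0] *= 1 - p
    return dp[k]


class UncertainStreamDetector:
    """Count-based sliding window; each tuple exists independently with its own probability."""

    def __init__(self, window: int, radius: float, k: int):
        self.window: Deque[Tuple[Tuple[float, ...], float]] = deque(maxlen=window)
        self.radius = radius
        self.k = k

    def insert(self, point: Tuple[float, ...], p_exist: float) -> float:
        """Add a tuple and return the probability that it is a distance-based outlier."""
        self.window.append((point, p_exist))     # deque(maxlen=...) evicts the oldest tuple
        neighbour_probs = [p for q, p in list(self.window)[:-1]
                           if math.dist(point, q) <= self.radius]
        # Outlier: the tuple exists and has fewer than k neighbours within the radius.
        return p_exist * (1 - prob_at_least(neighbour_probs, self.k))
```

The DP runs in O(n·k) per tuple instead of O(2^n), which is the kind of saving a possible-world definition needs to be usable on a stream.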


Ding G.,Key Laboratory of Medical Image Computing NEU | Ding G.,Northeastern University China | Wang G.,Key Laboratory of Medical Image Computing NEU | Wang G.,Northeastern University China
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

Attribute-level schema matching is a critical step in numerous database applications, such as DataSpaces, Ontology Merging and Schema Integration. Although this topic has been studied extensively, existing work ignores the implicit categorical information that is crucial for finding high-quality matches between schema attributes. In this paper, we discover the categorical semantics implicit in source instances and associate them with the matches to improve the overall quality of schema matching. Our method works in three phases. The first phase is a pre-detection step that uses clustering techniques to detect the possible categories of source instances. In the second phase, we use information entropy to find the attributes whose instances imply categorical semantics. In the third phase, we introduce a new concept, the c-mapping, to represent the associations between matches and categorical semantics, and we employ an adaptive scoring function to evaluate the c-mappings and thereby associate the matches with the semantics. Moreover, we show how to translate matches with semantics into schema mapping expressions, and we use the chase procedure to transform source data into target schemas. An experimental study shows that our approach is effective and performs well. © 2011 Springer-Verlag.
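The entropy test in the second phase can be sketched as follows: compute the Shannon entropy of each attribute's instance values, normalize it by its maximum log2(n), and flag attributes with a low ratio as carrying categorical semantics. The normalization and the 0.5 threshold are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence) -> float:
    """Shannon entropy (bits) of the empirical distribution of the values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def categorical_attributes(table: Dict[str, Sequence], max_ratio: float = 0.5) -> List[str]:
    """Attributes whose normalized entropy H / log2(n) is at most max_ratio.

    log2(n) is the entropy when all n instances are distinct, so a low ratio
    means the values fall into a few repeated categories.
    """
    result = []
    for attr, values in table.items():
        n = len(values)
        if n >= 2 and entropy(values) / math.log2(n) <= max_ratio:
            result.append(attr)
    return result
```

On a table with a unique `id` column and a `type` column holding only "book" and "cd", only `type` is flagged, since the identifier column reaches the maximum entropy.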


Sun Y.,Key Laboratory of Medical Image Computing NEU | Sun Y.,Northeastern University China | Yuan Y.,Key Laboratory of Medical Image Computing NEU | Yuan Y.,Northeastern University China | And 2 more authors.
Neurocomputing | Year: 2011

Although classification in centralized environments has been widely studied in recent years, classification in P2P networks remains an important research problem given the popularity of P2P computing environments. The main goal of classification in P2P networks is to reduce prediction error efficiently with small network overhead. In this paper, we propose an OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network. In the framework, we apply the incremental learning principle of OS-ELM to the hierarchical P2P network to generate an ensemble classifier. The ensemble classifier can be implemented in the P2P network in two ways: one-by-one ensemble classification and parallel ensemble classification. Furthermore, we propose a data space coverage based peer selection approach to reduce the high communication cost and long delay. We also design a two-layer index structure to support peer selection efficiently: each peer creates a local Quad-tree to index its local data, and each super-peer creates a global Quad-tree to summarize the local indexes of its peers. Extensive experimental studies verify the efficiency and effectiveness of the proposed algorithms. © 2011 Elsevier B.V.
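One way to read "data space coverage based peer selection" is as greedy set cover over index cells: each peer is summarized by the grid cells (for example, quad-tree leaves) its data occupies, and peers are chosen one at a time by how many not-yet-covered cells they add. The sketch below follows that reading; the function name, the cell representation and the coverage target are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def select_peers(peer_cells: Dict[str, Set[Hashable]], target: float = 1.0) -> List[str]:
    """Greedy coverage: repeatedly pick the peer adding the most uncovered cells.

    peer_cells maps a peer id to the grid cells its data occupies.
    Stops once `target` (a fraction) of all occupied cells is covered.
    """
    universe = set().union(*peer_cells.values())
    covered: Set[Hashable] = set()
    chosen: List[str] = []
    remaining = dict(peer_cells)
    while remaining and len(covered) < target * len(universe):
        # sorted() makes ties break deterministically by peer id
        best = max(sorted(remaining), key=lambda peer: len(remaining[peer] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break
        covered |= gain
        chosen.append(best)
    return chosen
```

Lowering `target` trades some coverage of the data space for fewer contacted peers, and hence lower communication cost.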


Ding L.,Key Laboratory of Medical Image Computing NEU | Ding L.,Northeastern University China | Xin J.,Key Laboratory of Medical Image Computing NEU | Xin J.,Northeastern University China | And 4 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

As a parallel programming model, MapReduce processes scalable, parallel applications over huge amounts of data on large clusters. In the MapReduce framework, Mappers have no mechanism for communicating with one another, and neither do Reducers. When the final result is much smaller than the original data, time is wasted processing unpromising intermediate data objects. We observe that this waste can be avoided with simple communication mechanisms. In this paper, we propose ComMapReduce, a framework that extends and improves MapReduce for efficient query processing of massive data in the cloud. With efficient, lightweight communication mechanisms, ComMapReduce filters unpromising intermediate data objects in the Map phase and thereby reduces the input to the Reduce phase. Three communication strategies, Lazy, Eager and Hybrid, are proposed to filter the unpromising intermediate results of the Map phase. In addition, two optimization strategies, Prepositive and Postpositive, are presented to further improve query processing performance by filtering more candidate data objects. Extensive experiments on different synthetic datasets demonstrate that ComMapReduce outperforms the original MapReduce framework in all metrics without affecting its existing characteristics. © 2012 Springer-Verlag.
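The filtering idea can be illustrated with a top-k query, simulated sequentially in Python. In this hypothetical sketch, loosely modelled on an eager strategy, each Mapper reports its local top-k scores to a coordinator and immediately receives back the k-th largest score seen so far; any local candidate below that value cannot be in the global top-k and is not emitted. Class and function names are illustrative, and the paper's exact Lazy, Eager and Hybrid protocols may differ.

```python
import heapq
from typing import List, Sequence


class Coordinator:
    """Hypothetical coordinator that keeps a global filter value for a top-k query."""

    def __init__(self, k: int):
        self.k = k
        self.threshold = float("-inf")
        self._top: List[float] = []         # min-heap of the k largest scores reported

    def report(self, scores: Sequence[float]) -> float:
        for s in scores:
            if len(self._top) < self.k:
                heapq.heappush(self._top, s)
            elif s > self._top[0]:
                heapq.heapreplace(self._top, s)
        if len(self._top) == self.k:
            # k-th largest score seen so far: never above the final k-th score
            self.threshold = self._top[0]
        return self.threshold


def map_task(partition: Sequence[float], coord: Coordinator, k: int) -> List[float]:
    local = heapq.nlargest(k, partition)    # only a local top-k item can be globally top-k
    threshold = coord.report(local)         # eager: report at once, receive the filter
    return [s for s in local if s >= threshold]


def reduce_task(candidates: Sequence[float], k: int) -> List[float]:
    return heapq.nlargest(k, candidates)
```

With partitions `[5, 1, 9]`, `[3, 8, 2]` and `[7, 4, 6]` and k = 2, only three candidates reach the Reducer instead of six, and the answer `[9, 8]` is unchanged.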


Han D.-H.,Northeastern University China | Zhang X.,Northeastern University China | Wang G.-R.,Northeastern University China | Wang G.-R.,Key Laboratory of Medical Image Computing NEU
Journal of Computer Science and Technology | Year: 2015

Conventional classification algorithms are not well suited to the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM, and their weights are updated dynamically according to classification results. WE-DELM improves both the efficiency of learning the model and the accuracy of classification. Experimental results show that WE-DELM achieves better performance on different evaluation criteria, including efficiency, accuracy, and speedup. © 2015, Springer Science+Business Media New York.
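The abstract does not spell out the probability world model, so the sketch below shows only one common formulation: an uncertain tuple whose attributes each take one of several values with given probabilities is expanded into its possible certain tuples, and the most probable one can be forwarded to a certain-data learner. Attribute independence and the choice of the most probable world are assumptions.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Instance = Tuple[float, float]              # (attribute value, probability)


def possible_worlds(uncertain: Sequence[Sequence[Instance]]) -> List[Tuple[Tuple[float, ...], float]]:
    """Expand one uncertain tuple into (certain tuple, probability) pairs.

    Assumes attributes are independent, so a world's probability is the
    product of the chosen instances' probabilities.
    """
    worlds = []
    for combo in itertools.product(*uncertain):
        values = tuple(v for v, _ in combo)
        worlds.append((values, math.prod(p for _, p in combo)))
    return worlds


def most_probable_world(uncertain: Sequence[Sequence[Instance]]) -> Tuple[float, ...]:
    """The certain tuple with the highest world probability."""
    return max(possible_worlds(uncertain), key=lambda w: w[1])[0]
```

Forwarding only the most probable world keeps the downstream stream certain and one-pass, at the cost of discarding the remaining probability mass; weighting every world by its probability is the alternative.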


Xin J.,Key Laboratory of Medical Image Computing NEU | Xin J.,Northeastern University China | Wang Z.,Northeastern University China | Chen C.,Key Laboratory of Medical Image Computing NEU | And 7 more authors.
World Wide Web | Year: 2014

Extreme Learning Machine (ELM) has been widely used in many fields, such as text classification, image recognition and bioinformatics, because it provides good generalization performance at an extremely fast learning speed. However, as data volumes in real-world applications grow, traditional centralized ELM cannot learn from such massive data efficiently. In this paper, we therefore propose a novel Distributed Extreme Learning Machine based on the MapReduce framework, named ELM*, which overcomes the inability of traditional ELM to learn from very large datasets. First, an analysis of traditional ELM shows that the most expensive part of computing the Moore-Penrose generalized inverse in the output weight vector calculation is the matrix multiplication. Then, because the matrix multiplication is decomposable, ELM* can be built on MapReduce: it first computes the matrix multiplication efficiently and in parallel with MapReduce, and then computes the corresponding output weight vector centrally. Massive data can thus be learned effectively. Finally, we conduct extensive experiments on synthetic data under various settings to verify the effectiveness and efficiency of ELM* in learning massive data. © 2013, Springer Science+Business Media New York.
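The decomposition ELM* relies on can be checked on a small example. With the hidden-layer output matrix H split into row blocks H_1, …, H_p (and the targets T split the same way), H^T H = Σ H_i^T H_i and H^T T = Σ H_i^T T_i, so each block can be processed by a separate Mapper, the partial products summed by a Reducer, and the small L×L system solved centrally. The sketch below assumes H already holds the hidden-layer outputs and has full column rank, so the Moore-Penrose solution reduces to the normal equations; the optional ridge term and all names are illustrative.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def partial_products(H: Sequence[Sequence[float]], T: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """'Map' step: U_i = H_i^T H_i and V_i = H_i^T T_i for one row block."""
    L, m = len(H[0]), len(T[0])
    U = [[0.0] * L for _ in range(L)]
    V = [[0.0] * m for _ in range(L)]
    for h, t in zip(H, T):
        for i in range(L):
            for j in range(L):
                U[i][j] += h[i] * h[j]
            for j in range(m):
                V[i][j] += h[i] * t[j]
    return U, V


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        if M[col][col] == 0.0:
            raise ValueError("H^T H is singular; add a ridge term")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):          # back substitution
        for j in range(m):
            s = M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X


def output_weights(blocks, ridge: float = 0.0) -> Matrix:
    """beta = (H^T H + ridge*I)^{-1} H^T T, with both products summed over row blocks."""
    partials = [partial_products(H, T) for H, T in blocks]                    # map
    U, V = reduce(lambda a, b: (add(a[0], b[0]), add(a[1], b[1])), partials)  # reduce
    for i in range(len(U)):
        U[i][i] += ridge
    return solve(U, V)                                                        # centralized
```

Only the L×L and L×m partial products travel from Mappers to the Reducer, so communication does not grow with the number of training rows.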
