Pereira C.M.M., Institute of Mathematical and Computer science | De Mello R.F., Institute of Mathematical and Computer science
Proceedings - 2014 Brazilian Conference on Intelligent Systems, BRACIS 2014 | Year: 2014

Clustering is one of the most widely used data mining techniques, while computational topology is a very recent field that bridges abstract mathematics and concrete computational techniques. In this paper, we explore the hypothesis that topologically similar clusters may indicate meaningful relationships. Our approach has an efficient implementation based on computing Minimum Spanning Trees to obtain topological information about each cluster. We then compute a discreteness index and a disconnectedness index, which characterize each cluster and thus allow the retrieval of equivalence classes. We show that, for a real-world high-dimensional network intrusion data set, the topologically similar clusters retrieved by our approach do indeed correspond to meaningful equivalence classes present in the data set. © 2014 IEEE.
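
The abstract above does not define the discreteness or disconnectedness index, so the following is only a minimal sketch of the Minimum Spanning Tree step it mentions, not the authors' implementation. It assumes Euclidean distance and uses Prim's algorithm; the function name `mst_edge_weights` is hypothetical. The sorted MST edge weights are the scales at which connected components merge as a distance threshold grows, which is the kind of connectivity information an MST exposes for a cluster.

```python
import math


def mst_edge_weights(points):
    """Return the sorted edge weights of a Euclidean minimum spanning tree.

    Hypothetical helper (Prim's algorithm, O(n^2)); assumes Euclidean distance.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each point to the tree
    best[0] = 0.0          # start the tree at the first point
    weights = []
    for step in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the starting point contributes no edge
            weights.append(best[u])
        # Update the cheapest link for every point still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(weights)


# Three nearby points and one distant point: the long edge stands out.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(mst_edge_weights(cluster))  # [1.0, 1.0, 8.0]
```

In this toy cluster the single large weight (8.0) marks the scale at which the distant point finally joins the rest, so a summary of these weights could serve as a per-cluster connectivity descriptor.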

Pereira C.M.M., Institute of Mathematical and Computer science | De Mello R.F., Institute of Mathematical and Computer science
Expert Systems with Applications | Year: 2015

Topology is the branch of mathematics that studies how objects relate to one another with respect to their qualitative structural properties, such as connectivity and shape. In this paper, we present an approach to data clustering based on topological features computed over the persistence diagram, which is estimated using the theory of persistent homology. The features indicate topological properties such as Betti numbers, i.e., the number of n-dimensional holes in the discretized data space. The main contribution of our approach is that it enables the clustering of time series with similar recurrent behavior, characterized by their attractors in phase space, and of spatial data with similar scale-invariant spatial distributions. Traditional clustering techniques ignore that information because they rely on point-to-point dissimilarity measures such as Euclidean distance or elastic measures. We present experiments that confirm the usefulness of our approach with time series and spatial data applications in the fields of biology, medicine and ecology. © 2015 Elsevier Ltd.
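
To make the notion of a Betti number concrete, here is a small sketch, not taken from the paper, that computes only the zeroth Betti number (the number of connected components) of a point cloud at a single scale. It assumes Euclidean distance and links two points whenever they lie within `eps` of each other; the function name `betti0` and the parameter `eps` are hypothetical. Persistent homology tracks how such counts change as the scale varies, and higher Betti numbers (loops, voids) need a dedicated library.

```python
import math


def betti0(points, eps):
    """Count connected components when points within eps are linked.

    Hypothetical helper using a union-find structure; assumes Euclidean distance.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:  # merging two distinct components
                    parent[ri] = rj
                    components -= 1
    return components


# Two pairs of nearby points, separated by a gap of 4.
cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
for eps in (0.5, 1.0, 4.0):
    print(eps, betti0(cloud, eps))  # 4, then 2, then 1 components
```

The count drops from 4 to 2 to 1 as `eps` grows; the scales at which components merge are what a persistence diagram records in dimension 0.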

Pereira C.M.M., Institute of Mathematical and Computer science | de Mello R.F., Institute of Mathematical and Computer science
Neurocomputing | Year: 2016

Clustering high-dimensional data streams is an attractive research topic, as several applications generate a large number of attributes, bringing new challenges in terms of partitioning due to the curse of dimensionality. In addition, those applications produce unbounded sequences of data which cannot be stored for later analysis. Despite the importance of this scenario, very few algorithms are available in the literature for this task. Although mathematical topology provides a theoretical foundation for dealing with high-dimensional spaces, none of those algorithms has investigated the problem of finding topologically similar projected clusters in high-dimensional data streams. Among the advantages of topology is the possibility of analyzing data in a coordinate-free and noise-robust manner. In previous research, we showed that topologically similar clusters can be meaningful in real-world data sets. In this paper, we extend those ideas and propose PTS, an algorithm for finding topologically similar clusters in high-dimensional data streams. The algorithm is capable of finding traditional projected clusters and then merging them according to topological features computed using persistent homology. Experiments with synthetic data streams of dimensions d=8, 16, 32, 64 and 128 confirm the ability of PTS to find topologically similar projected clusters. © 2015 Elsevier B.V.
