Kim S.,East Carolina University |
Kim S.,Neural Systems and Dynamics Laboratory |
Scalzo F.,Neural Systems and Dynamics Laboratory |
Telesca D.,University of California at Los Angeles |
Hu X.,Neural Systems and Dynamics Laboratory
International Journal of Data Mining and Bioinformatics | Year: 2015
Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques. Copyright © 2015 Inderscience Enterprises Ltd.
Hu X.,Neural Systems and Dynamics Laboratory |
Xu P.,Neural Systems and Dynamics Laboratory |
Wu S.,Neural Systems and Dynamics Laboratory |
Asgari S.,Neural Systems and Dynamics Laboratory |
Bergsneider M.,Neural Systems and Dynamics Laboratory
Journal of Biomedical Informatics | Year: 2010
Time series estimation techniques are usually employed in biomedical research to derive variables less accessible from a set of related and more accessible variables. These techniques are traditionally built from systems modeling approaches including simulation, blind decovolution, and state estimation. In this work, we define target time series (TTS) and its related time series (RTS) as the output and input of a time series estimation process, respectively. We then propose a novel data mining framework for time series estimation when TTS and RTS represent different sets of observed variables from the same dynamic system. This is made possible by mining a database of instances of TTS, its simultaneously recorded RTS, and the input/output dynamic models between them. The key mining strategy is to formulate a mapping function for each TTS-RTS pair in the database that translates a feature vector extracted from RTS to the dissimilarity between true TTS and its estimate from the dynamic model associated with the same TTS-RTS pair. At run time, a feature vector is extracted from an inquiry RTS and supplied to the mapping function associated with each TTS-RTS pair to calculate a dissimilarity measure. An optimal TTS-RTS pair is then selected by analyzing these dissimilarity measures. The associated input/output model of the selected TTS-RTS pair is then used to simulate the TTS given the inquiry RTS as an input. An exemplary implementation was built to address a biomedical problem of noninvasive intracranial pressure assessment. The performance of the proposed method was superior to that of a simple training-free approach of finding the optimal TTS-RTS pair by a conventional similarity-based search on RTS features. © 2009 Elsevier Inc. All rights reserved.