Nair N.U.,Laboratory for Computational Biology and Bioinformatics |
Kumar S.,Ecole Polytechnique Federale de Lausanne |
Moret B.M.E.,Laboratory for Computational Biology and Bioinformatics |
Moret B.M.E.,Swiss Institute of Bioinformatics |
And 2 more authors.
Bioinformatics | Year: 2014
Motivation: We have witnessed an enormous increase in ChIP-Seq data for histone modifications in the past few years. Discovering significant patterns in these data is an important problem for understanding biological mechanisms. Results: We propose probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. Our methods take into account signal magnitude, shape, strand orientation and shifts. We compare our methods with some current methods and demonstrate significant improvements, especially with sparse data. Besides pattern discovery and classification, probabilistic partitioning can serve other purposes in ChIP-Seq data analysis. Specifically, we exemplify its merits in the context of peak finding and partitioning of nucleosome positioning patterns in human promoters. © The Author 2014. Published by Oxford University Press.
Lin Y.,Laboratory for Computational Biology and Bioinformatics |
Rajan V.,Laboratory for Computational Biology and Bioinformatics |
Moret B.M.E.,Laboratory for Computational Biology and Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics | Year: 2012
Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems ranging from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance. © 2012 IEEE.
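The Robinson-Foulds behavior described in the abstract can be illustrated with a small sketch. It computes the standard unrooted Robinson-Foulds distance (the number of non-trivial bipartitions found in exactly one of the two trees), not the matching-based measure the paper introduces. The nested-tuple tree encoding and the names `caterpillar`, `splits` and `rf_distance` are assumptions made for this illustration and do not come from the paper.

```python
def clusters(tree, out):
    """Return the leaf set below `tree`; record each internal node's leaf set in `out`."""
    if isinstance(tree, str):          # a leaf is a plain label
        return frozenset([tree])
    leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree, each stored as the side
    that does NOT contain a fixed reference leaf (so each split has one form)."""
    out = set()
    all_leaves = clusters(tree, out)
    ref = min(all_leaves)
    result = set()
    for side in out:
        if ref in side:
            side = all_leaves - side
        # Skip trivial splits that separate a single leaf (or nothing).
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def rf_distance(t1, t2):
    """Size of the symmetric difference of the two bipartition sets."""
    return len(splits(t1) ^ splits(t2))


def caterpillar(labels):
    """Build a caterpillar (ladder) tree: (((l0, l1), l2), l3), ..."""
    tree = (labels[0], labels[1])
    for label in labels[2:]:
        tree = (tree, label)
    return tree


labels = list("abcdefgh")
t1 = caterpillar(labels)
# Detach leaf "a" and reattach it at the opposite end of the caterpillar.
t2 = caterpillar(labels[1:] + labels[:1])

n = len(labels)
print(rf_distance(t1, t1))              # identical trees
print(rf_distance(t1, t2), 2 * (n - 3)) # distance vs. maximum for binary trees
```

In this example, moving the single leaf "a" leaves the two trees with no non-trivial bipartition in common, so the distance equals the maximum possible value of 2(n - 3) for unrooted binary trees on n leaves, even though the remaining seven leaves keep the same relative arrangement.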