Stegle O.,Max Planck Institutes Tubingen
PLoS computational biology | Year: 2010
Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors, which in turn yields additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/.
Jegelka S.,Max Planck Institutes Tubingen |
Bilmes J.,University of Washington
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2011
We propose a new family of non-submodular global energy functions that still use submodularity internally to couple edges in a graph cut. We show it is possible to develop an efficient approximation algorithm that, thanks to the internal submodularity, can use standard graph cuts as a subroutine. We demonstrate the advantages of edge coupling in a natural setting, namely image segmentation. In particular, for fine-structured objects and objects with shading variation, our structured edge coupling leads to significant improvements over standard approaches. © 2011 IEEE.
Wong H.M.,University of Cambridge |
Stegle O.,Max Planck Institutes Tubingen |
Rodgers S.,Thaze Ltd. |
Huppert J.L.,University of Cambridge
Journal of Nucleic Acids | Year: 2010
G-quadruplexes are four-stranded nucleic acid structures formed around a core of guanines, arranged in squares with mutual hydrogen bonding. Many of these structures are highly thermally stable, especially in the presence of monovalent cations, such as those found under physiological conditions. Understanding of their physiological roles is expanding rapidly, and they have been implicated in regulating gene transcription and translation among other functions. We have built a community-focused website to act as a repository for the information that is now being developed. At its core, this site has a detailed database (QuadDB) of predicted G-quadruplexes in the human and other genomes, together with the predictive algorithm used to identify them. We also provide a QuadPredict server, which predicts thermal stability and acts as a repository for experimental data from all researchers. There are also a number of other data sources with computational predictions. We anticipate that the wide availability of this information will be of use both to researchers already active in this exciting field and to those who wish to investigate a particular gene hypothesis. Copyright © 2010 Han Min Wong et al.
Gretton A.,MPI for Intelligent Systems |
Gretton A.,Gatsby Computational Neuroscience Unit |
Borgwardt K.M.,Max Planck Institutes Tubingen |
Rasch M.J.,Beijing Normal University |
And 3 more authors.
Journal of Machine Learning Research | Year: 2012
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests. © 2012 Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf and Alexander Smola.
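As an illustration of the quadratic-time statistic mentioned in the abstract above, the following is a minimal sketch of the biased empirical estimate of the squared MMD. It assumes a Gaussian kernel with a hand-picked bandwidth; the function names (`gaussian_kernel`, `mmd2_biased`, `permutation_p_value`) are hypothetical and not taken from the paper or any library. The significance test shown here is a simple permutation test, swapped in for brevity; it is not one of the bound-based or asymptotic tests the abstract describes.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel between two equal-length numeric vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD; cost is quadratic in the sample sizes."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (m * m)
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (n * n)
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def permutation_p_value(xs, ys, num_permutations=200, bandwidth=1.0, seed=0):
    """Permutation test (an assumption of this sketch, not the paper's tests).

    Pools both samples, reshuffles the split, and reports how often the
    shuffled statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        if mmd2_biased(pooled[:m], pooled[m:], bandwidth) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + num_permutations)
```

Each sample is a list of equal-length numeric vectors, for example `mmd2_biased([[0.0], [0.1]], [[5.0], [5.1]])`. In practice the bandwidth strongly affects the test's power, so the default of 1.0 should be read as a placeholder.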
Shervashidze N.,Max Planck Institutes Tubingen |
Schweitzer P.,Max Planck Institute for Informatics |
Van Leeuwen E.J.,University of Bergen |
Mehlhorn K.,Max Planck Institute for Informatics |
Borgwardt K.M.,Max Planck Institutes Tubingen
Journal of Machine Learning Research | Year: 2011
In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. © 2011 Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn and Karsten M. Borgwardt.
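The relabeling scheme outlined in the abstract above can be sketched roughly as follows. This is a simplified illustration of a Weisfeiler-Lehman subtree-style kernel under stated assumptions, not the authors' implementation: graphs are given as adjacency dictionaries, initial node labels are strings, and the function name `wl_subtree_kernel` and its parameters are hypothetical. It also does not attempt the linear-time bookkeeping the abstract refers to.

```python
from collections import Counter


def wl_subtree_kernel(adj1, labels1, adj2, labels2, iterations=2):
    """Count matching node labels across Weisfeiler-Lehman relabeling rounds.

    adj*: dict mapping each node to a list of its neighbours.
    labels*: dict mapping each node to an initial discrete label (strings here).
    """
    graphs = [(adj1, dict(labels1)), (adj2, dict(labels2))]
    # Round 0 features: counts of the original node labels.
    features = [Counter(labels1.values()), Counter(labels2.values())]

    for it in range(iterations):
        # The compression table is shared so both graphs get consistent labels.
        compress = {}
        for idx, (adj, labels) in enumerate(graphs):
            new_labels = {}
            for node in adj:
                # Signature: own label plus the sorted multiset of neighbour labels.
                neighbour_labels = tuple(sorted(labels[nb] for nb in adj[node]))
                signature = (labels[node], neighbour_labels)
                # Tagging with the round number avoids clashes between rounds.
                new_labels[node] = compress.setdefault(
                    signature, ("wl", it, len(compress))
                )
            graphs[idx] = (adj, new_labels)
            features[idx].update(new_labels.values())

    # Kernel value: dot product of the two label-count vectors.
    return sum(count * features[1][label] for label, count in features[0].items())
```

For instance, a triangle and a three-node path with identical initial labels share the round-0 labels and one round-1 label, but nothing after that, so their kernel value is lower than either graph's value with itself.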