Jiang Y.,Center for Computational Medicine | Jiang Y.,Harbin Institute of Technology | Turinsky A.L.,Center for Computational Medicine | Brudno M.,Center for Computational Medicine | Brudno M.,University of Toronto
Nucleic Acids Research | Year: 2015

With the development of High-Throughput Sequencing (HTS), thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the number of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulty in indel detection: insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. © 2015 The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Brudno M.,University of Toronto | Brudno M.,Center for Computational Medicine
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2015

Gene mutations cause not only well-recognized rare diseases such as muscular dystrophy and cystic fibrosis, but also thousands of other rare disorders. While individually rare, these disorders are collectively common, affecting one to three percent of the population. The last several years have seen the identification of hundreds of novel genes responsible for rare disorders, and an even greater number of cases where a known gene was implicated in a new disease. In this talk I will describe the computational approaches that are required to make this identification possible, and describe the tools that we (and others) have developed to enable clinicians to diagnose their patients by analyzing the patient genomes and sharing de-identified patient data. © Springer International Publishing Switzerland 2015.

Rampasek L.,University of Toronto | Arbabi A.,University of Toronto | Brudno M.,University of Toronto | Brudno M.,Center for Computational Medicine
Bioinformatics | Year: 2014

Motivation: The past several years have seen the development of methodologies to identify genomic variation within a fetus through the non-invasive sequencing of maternal blood plasma. These methods are based on the observation that maternal plasma contains a fraction of DNA (typically 5-15%) originating from the fetus, and such methodologies have already been used for the detection of whole-chromosome events (aneuploidies), and to a more limited extent for smaller (typically several megabases long) copy number variants (CNVs). Results: Here we present a probabilistic method for non-invasive analysis of de novo CNVs in the fetal genome based on maternal plasma sequencing. Our novel method combines three types of information within a unified Hidden Markov Model: the imbalance of allelic ratios at SNP positions, the use of parental genotypes to phase nearby SNPs, and depth of coverage to better differentiate between various types of CNVs and improve precision. Our simulation results, based on in silico introduction of novel CNVs into plasma samples with 13% fetal DNA concentration, demonstrate a sensitivity of 90% for CNVs >400 kb (with 13 calls in an unaffected genome), and 40% for 50-400 kb CNVs (with 108 calls in an unaffected genome). © 2014 The Author. Published by Oxford University Press. All rights reserved.
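
To give a sense of how small the allelic-imbalance signal is at the fetal DNA concentrations quoted in this abstract, the following minimal sketch computes the expected fraction of alternate-allele reads at a single SNP in maternal plasma. It is not the authors' Hidden Markov Model; it assumes a simple linear mixture of maternal and fetal DNA, and the function name `expected_alt_fraction` and its parameters are hypothetical, introduced here only for illustration.

```python
def expected_alt_fraction(fetal_fraction, maternal_alt, fetal_alt,
                          fetal_copies=2, maternal_copies=2):
    """Expected alternate-allele read fraction at one SNP in maternal plasma.

    Assumption (not taken from the paper): plasma is a linear mixture in which
    each maternal chromosome copy contributes (1 - f) / 2 of the DNA and each
    fetal copy contributes f / 2, where f is the fetal fraction measured in a
    copy-neutral region. The common factor 1/2 cancels in the ratio.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    if not 0 <= maternal_alt <= maternal_copies:
        raise ValueError("maternal_alt must be between 0 and maternal_copies")
    if not 0 <= fetal_alt <= fetal_copies:
        raise ValueError("fetal_alt must be between 0 and fetal_copies")

    maternal_weight = 1.0 - fetal_fraction
    alt = maternal_weight * maternal_alt + fetal_fraction * fetal_alt
    total = maternal_weight * maternal_copies + fetal_fraction * fetal_copies
    return alt / total


# Heterozygous mother, 13% fetal fraction (the concentration used in the
# abstract's simulations).
f = 0.13
normal_het = expected_alt_fraction(f, maternal_alt=1, fetal_alt=1)
dup_of_alt = expected_alt_fraction(f, maternal_alt=1, fetal_alt=2, fetal_copies=3)
del_of_alt = expected_alt_fraction(f, maternal_alt=1, fetal_alt=0, fetal_copies=1)
print(normal_het, dup_of_alt, del_of_alt)
```

Under these assumptions a fetal duplication or deletion shifts the expected allele fraction by only a few percentage points away from 0.5, which is consistent with the abstract's motivation for combining allelic ratios with phasing and depth of coverage rather than relying on any one signal.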

Jiang Y.,Harbin Institute of Technology | Jiang Y.,University of Toronto | Wang Y.,Harbin Institute of Technology | Brudno M.,University of Toronto | Brudno M.,Center for Computational Medicine
Bioinformatics | Year: 2012

Motivation: The development of high-throughput sequencing technologies has enabled novel methods for detecting structural variants (SVs). Current methods are typically based on depth of coverage or paired-end mapping clusters. However, most of these only report an approximate location for each SV, rather than exact breakpoints. Results: We have developed pair-read informed split mapping (PRISM), a method that identifies SVs and their precise breakpoints from whole-genome resequencing data. PRISM uses a split-alignment approach informed by the mapping of paired-end reads, hence enabling breakpoint identification of multiple SV types, including arbitrary-sized inversions, deletions and tandem duplications. Comparisons to previous datasets and simulation experiments illustrate PRISM's high sensitivity, while PCR validations of PRISM results, including previously uncharacterized variants, indicate an overall precision of ∼90%. © The Author 2012. Published by Oxford University Press. All rights reserved.

Donmez N.,University of Toronto | Brudno M.,University of Toronto | Brudno M.,Center for Computational Medicine
Bioinformatics | Year: 2013

Motivation: Scaffolding is the process of ordering and orienting contigs produced during genome assembly. Accurate scaffolding is essential for finishing draft assemblies, as it facilitates the costly and laborious procedures needed to fill in the gaps between contigs. Conventional formulations of the scaffolding problem are intractable, and most scaffolding programs rely on heuristic or approximate solutions, with potentially exponential running time. Results: We present SCARPA, a novel scaffolder, which combines fixed-parameter tractable and bounded algorithms with Linear Programming to produce near-optimal scaffolds. We test SCARPA on real datasets in addition to a simulated diploid genome and compare its performance with several state-of-the-art scaffolders. We show that SCARPA produces longer or similar length scaffolds that are highly accurate compared with other scaffolders. SCARPA is also capable of detecting misassembled contigs and reports them during scaffolding. Availability: SCARPA is open source and available from http://compbio.cs.toronto.edu/scarpa. © 2013 The Author.
