Cao H., BGI Shenzhen |
Cao H., Copenhagen University |
Wu H., BGI Shenzhen |
Luo R., BGI Shenzhen |
And 62 more authors.
Nature Biotechnology | Year: 2015
The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine. © 2015 Nature America, Inc. All rights reserved.
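The haplotype N50 quoted in the abstract is a length-weighted summary statistic. As a minimal sketch, assuming the conventional definition (the largest length L such that blocks of length L or longer together cover at least half of the total assembled length), it could be computed as below. The function name `n50` and the toy lengths are illustrative only and are not taken from the paper's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Conventional definition (an assumption here, not the authors' code):
    sort lengths in descending order and return the length at which the
    running total first reaches half of the overall total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one length")


# Toy haplotype block lengths in kb (made-up values for illustration).
blocks_kb = [100, 200, 300, 400, 500]
print(n50(blocks_kb))  # 500 + 400 = 900 >= 750, so this prints 400
```

Because the statistic is weighted by length, a few long blocks raise it far more than many short ones, which is why it is commonly preferred to the mean block length when describing assembly contiguity.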
Wang W., University of Hong Kong |
Wang P., University of Hong Kong |
Xu F., University of Hong Kong |
Luo R., HKU BGI Bioinformatics Algorithms and Core Technology Research Laboratory |
And 3 more authors.
Bioinformatics | Year: 2014
Summary: Recent advances in high-throughput sequencing technologies have made it possible to sequence large numbers of cancer samples and gain novel insights into oncogenetic mechanisms. However, intratumoral heterogeneity, normal cell contamination and insufficient sequencing depth together make it challenging to detect somatic mutations. Here we propose FaSD-somatic, a fast and accurate program for detecting somatic single-nucleotide variations (SNVs). The performance of FaSD-somatic is extensively assessed on various types of cancer against several state-of-the-art somatic SNV detection programs. Benchmarked against somatic SNVs from either existing databases or de novo higher-depth sequencing data, FaSD-somatic has the best overall performance. Furthermore, FaSD-somatic is efficient: it finishes somatic SNV calling within 14 h on 50X whole-genome sequencing data from paired samples. © The Author 2014. Published by Oxford University Press. All rights reserved.