Genome Biology Research Unit
Moncunill V.,Barcelona Supercomputing Center |
Gonzalez S.,Barcelona Supercomputing Center |
Bea S.,University of Barcelona |
Andrieux L.O.,Barcelona Supercomputing Center |
And 23 more authors.
Nature Biotechnology | Year: 2014
The development of high-throughput sequencing technologies has advanced our understanding of cancer. However, characterizing somatic structural variants in tumor genomes is still challenging because current strategies depend on the initial alignment of reads to a reference genome. Here, we describe SMUFIN (somatic mutation finder), a single program that directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNV) to large structural variants at base pair resolution. Performance tests on modeled tumor genomes showed average sensitivity of 92% and 74% for SNVs and structural variants, with specificities of 95% and 91%, respectively. Analyses of aggressive forms of solid and hematological tumors revealed that SMUFIN identifies breakpoints associated with chromothripsis and chromoplexy with high specificity. SMUFIN provides an integrated solution for the accurate, fast and comprehensive characterization of somatic sequence variation in cancer. © 2014 Nature America, Inc. All rights reserved.
PubMed | Genome Biology Research Unit, University of Barcelona, Barcelona Supercomputing Center and University of Oviedo
Type: Journal Article | Journal: Nature biotechnology | Year: 2014
Phelan V.V.,University of California at San Diego |
Moree W.J.,University of California at San Diego |
Aguilar J.,University of California at San Diego |
Cornett D.S.,Bruker |
And 6 more authors.
Journal of Bacteriology | Year: 2014
In microbiology, gene disruption and subsequent experiments often center on phenotypic changes caused by one class of specialized metabolites (quorum sensors, virulence factors, or natural products), disregarding global downstream metabolic effects. With the recent development of mass spectrometry-based methods and technologies for microbial metabolomics investigations, it is now possible to visualize global production of diverse classes of microbial specialized metabolites simultaneously. Using imaging mass spectrometry (IMS) applied to the analysis of microbiology experiments, we can observe the effects of mutations, knockouts, insertions, and complementation on the interactive metabolome. In this study, a combination of IMS and liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to visualize the impact on specialized metabolite production of a transposon insertion into a Pseudomonas aeruginosa phenazine biosynthetic gene, phzF2. The disruption of phenazine biosynthesis led to broad changes in specialized metabolite production, including loss of pyoverdine production. This shift in specialized metabolite production significantly alters the metabolic outcome of an interaction with Aspergillus fumigatus by influencing triacetylfusarinine production. © 2014, American Society for Microbiology.
Richter J.,University of Kiel |
Schlesner M.,German Cancer Research Center |
Hoffmann S.,University of Leipzig |
Kreuz M.,University of Leipzig |
And 55 more authors.
Nature Genetics | Year: 2012
Burkitt lymphoma is a mature aggressive B-cell lymphoma derived from germinal center B cells. Its cytogenetic hallmark is the Burkitt translocation t(8;14)(q24;q32) and its variants, which juxtapose the MYC oncogene with one of the three immunoglobulin loci. Consequently, MYC is deregulated, resulting in massive perturbation of gene expression. Nevertheless, MYC deregulation alone seems not to be sufficient to drive Burkitt lymphomagenesis. By whole-genome, whole-exome and transcriptome sequencing of four prototypical Burkitt lymphomas with immunoglobulin gene (IG)-MYC translocation, we identified seven recurrently mutated genes. One of these genes, ID3, mapped to a region of focal homozygous loss in Burkitt lymphoma. In an extended cohort, 36 of 53 molecularly defined Burkitt lymphomas (68%) carried potentially damaging mutations of ID3. These were strongly enriched at somatic hypermutation motifs. Only 6 of 47 other B-cell lymphomas with the IG-MYC translocation (13%) carried ID3 mutations. These findings suggest that cooperation between ID3 inactivation and IG-MYC translocation is a hallmark of Burkitt lymphomagenesis. © 2012 Nature America, Inc. All rights reserved.
Rasche F.,Friedrich Schiller University of Jena |
Scheubert K.,Friedrich Schiller University of Jena |
Hufsky F.,Friedrich Schiller University of Jena |
Hufsky F.,Max Planck Institute for Chemical Ecology |
And 4 more authors.
Analytical Chemistry | Year: 2012
Mass spectrometry allows sensitive, automated, and high-throughput analysis of small molecules. In principle, tandem mass spectrometry allows us to identify "unknown" small molecules not in any database, but the automated interpretation of such data is in its infancy. Fragmentation trees have recently been introduced for the automated analysis of the fragmentation patterns of small molecules. We present a method for the automated comparison of such fragmentation patterns, based on aligning the compounds' fragmentation trees. We cluster compounds based solely on their fragmentation patterns and show a good agreement with known compound classes. Fragmentation pattern similarities are strongly correlated with the chemical similarity of molecules. We present a tool for searching a database for compounds with fragmentation pattern similar to an unknown sample compound. We apply this tool to metabolites from Icelandic poppy. Our method allows fully automated computational identification of small molecules that cannot be found in any database. © 2012 American Chemical Society.
Sipos B.,European Bioinformatics Institute |
Massingham T.,European Bioinformatics Institute |
Stutz A.M.,Genome Biology Research Unit |
Goldman N.,European Bioinformatics Institute
PLoS ONE | Year: 2012
The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning in order to isolate and amplify individual mutant copies, making it hard to scale-up the approach for use in conjunction with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to a NGS workflow. NG-SAM therefore has the potential to be scaled-up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of reconstructing robustly a wide range of potential target sequences of varying lengths and repetitive structures. © 2012 Sipos et al.
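The consensus step described in the abstract above can be illustrated with a minimal sketch. It is not the NG-SAM simulation pipeline: it assumes the mutated copies are already aligned and of equal length, uses substitutions only, and the function names `mutate` and `consensus` are hypothetical.

```python
import random
from collections import Counter


def mutate(seq, rate, rng):
    """Return a copy of seq with random substitutions at the given per-base rate."""
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the three other bases.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(copies):
    """Majority vote per column over equal-length, pre-aligned copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACGT" * 10  # a toy repetitive target sequence
    copies = [mutate(target, 0.1, rng) for _ in range(25)]
    print(consensus(copies) == target)
```

Because each copy carries its own random mutations, any single position is wrong in only a minority of copies, so a per-column vote tends to recover the original base once enough copies are sequenced.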
Waszak S.M.,Weizmann Institute of Science |
Waszak S.M.,Weihenstephan-Triesdorf University of Applied Sciences |
Waszak S.M.,Genome Biology Research Unit |
Hasin Y.,Weizmann Institute of Science |
And 9 more authors.
PLoS Computational Biology | Year: 2010
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95-99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ~15% and ~20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing. © 2010 Waszak et al.
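The basic idea of reading an integer copy number off the depth of coverage can be sketched as follows. This is a deliberately naive illustration, not CopySeq's statistical framework: it assumes that read depth scales linearly with copy number and that a reference depth for diploid (two-copy) regions is known. The function name and parameters are hypothetical.

```python
def copy_number_genotype(locus_depth, diploid_depth, ploidy=2):
    """Estimate an integer copy number from mean read depth.

    Assumes depth is proportional to copy number and that diploid_depth
    is the mean depth observed in regions known to carry `ploidy` copies.
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    estimate = ploidy * locus_depth / diploid_depth
    return max(0, round(estimate))


if __name__ == "__main__":
    # Toy depths against a 30x diploid baseline.
    for depth in (0, 15, 30, 61, 135):
        print(depth, copy_number_genotype(depth, 30))
```

At low coverage the ratio is noisy, so simple rounding of this kind is unreliable on its own, which is one reason a statistical model over the read counts is used in practice.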
Schlattl A.,Genome Biology Research Unit |
Anders S.,Genome Biology Research Unit |
Waszak S.M.,Genome Biology Research Unit |
Huber W.,Genome Biology Research Unit |
And 3 more authors.
Genome Research | Year: 2011
Copy-number variants (CNVs) form an abundant class of genetic variation with a presumed widespread impact on individual traits. While recent advances, such as the population-scale sequencing of human genomes, facilitated the fine-scale mapping of CNVs, the phenotypic impact of most of these CNVs remains unclear. By relating copy-number genotypes to transcriptome sequencing data, we have evaluated the impact of CNVs, mapped at fine scale, on gene expression. Based on data from 129 individuals with ancestry from two populations, we identified CNVs associated with the expression of 110 genes, with 13% of the associations involving complex, multiallelic CNVs. Categorization of CNVs according to variant type, size, and gene overlap enabled us to examine the impact of different CNV classes on expression variation. While many small (<4 kb) CNVs were associated with expression variation, overall we observed an enrichment of large duplications and deletions, including large intergenic CNVs, relative to the entire set of expression-associated CNVs. Furthermore, the copy number of genes intersecting with CNVs typically correlated positively with the genes' expression, and also was more strongly correlated with expression than nearby single nucleotide polymorphisms, suggesting a frequent causal role of CNVs in expression quantitative trait loci (eQTLs). We also elucidated unexpected cases of negative correlations between copy number and expression by assessing the CNVs' effects on the structure and regulation of genes. Finally, we examined dosage compensation of transcript levels. Our results suggest that association studies can gain in resolution and power by including fine-scale CNV information, such as those obtained from population-scale sequencing. © 2011 by Cold Spring Harbor Laboratory Press.
Kasowski M.,Yale University |
Grubert F.,Yale University |
Grubert F.,Stanford University |
Heffelfinger C.,Yale University |
And 19 more authors.
Science | Year: 2010
Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (Pol II) and a key regulator of immune responses, nuclear factor κB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing Pol II binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.
Bens S.,University of Kiel |
Zichner T.,Genome Biology Research Unit |
Stutz A.M.,Genome Biology Research Unit |
Caliebe A.,University of Kiel |
And 5 more authors.
Genes and Immunity | Year: 2014
Periodic fever, aphthous stomatitis, pharyngitis and adenopathy (PFAPA) syndrome is an auto-inflammatory disease for which a genetic basis has been postulated. Nevertheless, in contrast to the other periodic fever syndromes, no candidate genes have yet been identified. By cloning, after long insert size paired-end sequencing, a de novo chromosomal translocation t(10;17)(q11.2;p13) in a patient with typical PFAPA syndrome who lacked mutations in genes associated with other periodic fever syndromes, we identified SPAG7 as a candidate gene for PFAPA. SPAG7 protein is expressed in tissues affected by PFAPA and has been functionally linked to antiviral and inflammatory responses. Haploinsufficiency of SPAG7, due to a microdeletion at the translocation breakpoint that leads to loss of exons 2-7 from one allele, was associated with PFAPA in the index patient. Sequence analyses of SPAG7 in additional patients with PFAPA point to genetic heterogeneity or alternative mechanisms of SPAG7 deregulation, such as somatic or epigenetic changes. Copyright © 2014 Macmillan Publishers Limited.