Institute for Genome science
Shringarpure S.S., Stanford University
Shringarpure S.S., View Inc
Mathias R.A., Johns Hopkins University
Hernandez R.D., Institute for Human Genetics
And 8 more authors.
Bioinformatics | Year: 2017
Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls caused by sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that trains an accurate Random Forest classifier on genotype array data from the sequenced samples themselves, rather than on public resources such as HapMap or dbSNP. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a false positive rate of 5%, our method yields true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, the Real Time Genomics multi-sample variant caller and the GATK UnifiedGenotyper, respectively. Because NGS data may be accompanied by genotype data for the same samples, collected either concurrently with sequencing or in a previous study, our method can be trained on each dataset to provide more accurate computational validation of site calls than generic methods. Moreover, our method allows adjustment based on allele frequency (e.g. different criteria for judging the quality of rare versus common variants) and thereby provides insight into the sequencing characteristics that indicate call quality for variants of different frequencies. © 2016 The Author. Published by Oxford University Press.
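The headline metric above is a true positive rate (TPR) read off at a fixed 5% false positive rate (FPR). The following is a minimal sketch of that calculation, assuming a labeled validation set in which each variant call has a classifier score and a label derived from array genotypes (1 = true site, 0 = false positive call). The function name and inputs are hypothetical; this is not the authors' code and does not train the Random Forest itself.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Highest TPR reachable by a score threshold whose FPR stays <= max_fpr.

    scores: classifier scores, higher = more likely a true variant (hypothetical input)
    labels: 1 for a true variant site, 0 for a false positive call
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both true and false calls to compute TPR and FPR")

    # Sort calls by descending score; lowering the threshold admits them in this order.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # A threshold cannot split tied scores, so admit all calls at this score together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # Record the TPR only for thresholds that respect the FPR budget.
        if fp / neg <= max_fpr:
            best_tpr = max(best_tpr, tp / pos)
    return best_tpr


if __name__ == "__main__":
    # Toy, made-up scores purely to show the call pattern.
    print(tpr_at_fpr([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0]))
```

Sweeping thresholds over the sorted scores is equivalent to walking the ROC curve; libraries such as scikit-learn provide `roc_curve` for the same purpose.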
Agrawal S., Institute for Genome science
Arze C., Institute for Genome science
Adkins R.S., Institute for Genome science
Crabtree J., Institute for Genome science
And 14 more authors.
BMC Genomics | Year: 2017
Background: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated and cloud-enabled analysis pipeline for comparative microbial genomics. Results: CloVR-Comparative runs on annotated complete or draft genome sequences that are either uploaded by the user or selected through a taxonomic-tree-based user interface and downloaded from NCBI. CloVR-Comparative performs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports, detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures) and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or on the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. Conclusions: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise. © 2017 The Author(s).
Jores J., Kenya International Livestock Research Institute
Fischer A., Kenya International Livestock Research Institute
Fischer A., International Center for Insect Physiology and Ecology
Sirand-Pugnet P., French National Institute for Agricultural Research
And 7 more authors.
Systematic and Applied Microbiology | Year: 2013
Five Mycoplasma strains from wild Caprinae were analyzed: four from Alpine ibex (Capra ibex) that died at the Berlin Zoo between 1993 and 1994, and one from a Rocky Mountain goat collected in the USA before 1987. Multilocus sequence typing, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry and DNA-DNA hybridization showed that these five strains represent a population distinct from the populations belonging to the 'Mycoplasma mycoides cluster'. Analysis of the 16S rRNA gene (rrs), in silico (genome-sequence-based) and laboratory DNA-DNA hybridization, and analysis of phenotypic traits, in particular their exceptionally rapid growth, all confirmed that they do not belong to any Mycoplasma species described to date. We therefore suggest that these strains represent a novel species, for which we propose the name Mycoplasma feriruminatoris sp. nov. The type strain is G5847T (=DSM 26019T=NCTC 1362T). © 2013 Elsevier GmbH.
Boisen N., Statens Serum Institute
Boisen N., University of Virginia
Scheutz F., Statens Serum Institute
Persson S., Institute for Genome science
And 10 more authors.
Journal of Infectious Diseases | Year: 2012
Background. Enteroaggregative Escherichia coli (EAEC) is a cause of epidemic and sporadic diarrhea, yet its role as an enteric pathogen is not fully understood. Methods. We characterized 121 EAEC strains isolated in 2008 as part of a case-control study of moderate to severe acute diarrhea among children 0-59 months of age in Bamako, Mali. We applied multiplex polymerase chain reaction and comparative genome hybridization to identify potential virulence factors among the EAEC strains, coupled with classification and regression tree modeling to reveal the combinations of factors most strongly associated with illness. Results. The gene encoding the autotransporter protease SepA, originally described in Shigella species, was most strongly associated with diarrhea among the EAEC strains tested (odds ratio, 5.6 [95% confidence interval, 1.92-16.17]; P = .0006). In addition, we identified 3 gene combinations correlated with diarrhea: (1) a clonal group positive for sepA and a putative hemolysin; (2) a group harboring the EAST-1 enterotoxin and the flagellar type H33 but no other previously identified EAEC virulence factor; and (3) a group carrying several of the typical EAEC virulence genes. Conclusion. Our data suggest that only a subset of EAEC strains are pathogenic in Mali and that sepA may serve as a valuable marker for the most virulent isolates. © The Author 2011. Published by Oxford University Press on behalf of the Infectious.
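The odds ratio and 95% confidence interval quoted above summarize a 2x2 case-control table (cases vs. controls, gene present vs. absent). The abstract does not say how the interval was computed; the sketch below assumes the common Woolf (log-odds) method, and the function name and any counts passed to it are hypothetical, so it is not expected to reproduce the published values.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval from a 2x2 table.

    a: cases carrying the gene      b: cases lacking the gene
    c: controls carrying the gene   d: controls lacking the gene
    z: normal quantile; 1.96 gives an approximate 95% interval
    """
    if min(a, b, c, d) == 0:
        # The Woolf standard error is undefined with an empty cell
        # unless a continuity correction is applied first.
        raise ValueError("zero cell in 2x2 table")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(20, 10, 10, 20))
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is consistent with the shape of the interval reported in the abstract.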
PubMed | The University of Nottingham, University of Nairobi, Kenya International Livestock Research Institute, Kasetsart University and 4 more.
Type: Journal Article | Journal: Genome announcements | Year: 2016
Phytoplasmas are bacterial plant pathogens with a devastating impact on agricultural production worldwide. In eastern Africa, Napier grass stunt disease causes serious economic losses in the smallholder dairy industry. This draft genome sequence of Candidatus Phytoplasma oryzae strain Mbita1 provides insight into its genomic organization and the molecular basis of its pathogenicity.
Khmaladze E., National Center for Disease Control and Public Health
Khmaladze E., Tbilisi State University
Birdsell D.N., Northern Arizona University
Naumann A.A., Northern Arizona University
And 30 more authors.
PLoS ONE | Year: 2014
Sequence analyses and subtyping of Bacillus anthracis strains from Georgia reveal a single distinct lineage (Aust94) that is ecologically established. Phylogeographic analysis and comparison with a global collection reveal a clade that is mostly restricted to Georgia. Within this clade, many groups are found throughout the country; however, at least one subclade is found only in the eastern part. This pattern suggests that dispersal into and out of Georgia has been rare and that, despite historical dispersion within the country, current spread is limited for at least one lineage. © 2014 Khmaladze et al.
Haas B.J., The Broad Institute of MIT and Harvard
Papanicolaou A., CSIRO
Yassour M., The Broad Institute of MIT and Harvard
Yassour M., Hebrew University of Jerusalem
And 22 more authors.
Nature Protocols | Year: 2013
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
Sahl J.W., Institute for Genome science
Johnson J.K., University of Maryland, Baltimore
Harris A.D., Genomic Health
Phillippy A.M., U.S. National Biodefense Analysis and Countermeasures Center
And 4 more authors.
BMC Genomics | Year: 2011
Background: Acinetobacter baumannii has recently emerged as a significant global pathogen, with surprisingly rapid acquisition of antibiotic resistance and spread within hospitals and health care institutions. This study examines the genomic content of three A. baumannii strains isolated from distinct body sites. Isolates from blood, peri-anal and wound sources were examined in an attempt to identify genetic features that could be correlated with each isolation source. Results: Pulsed-field gel electrophoresis, multi-locus sequence typing and antibiotic resistance profiles demonstrated genotypic and phenotypic variation. Each isolate was sequenced to high-quality draft status, which allowed comparative genomic analyses with existing A. baumannii genomes. A high-resolution, whole-genome alignment method detailed the phylogenetic relationships of the sequenced A. baumannii strains and found no correlation between phylogeny and body site of isolation. This method identified genomic regions unique either to isolates found on the skin surface or in wounds (termed colonization isolates) or to isolates identified from body fluids (termed invasive isolates); these regions may play a role in the pathogenesis and spread of this important pathogen. A PCR-based screen of 74 A. baumannii isolates demonstrated that these unique genes are not exclusive to either phenotype or isolation source; however, a conserved genomic region exclusive to all sequenced A. baumannii was identified and verified. Conclusions: The results of the comparative genome analysis and PCR assay show that A. baumannii is a diverse and genomically variable pathogen that appears to have the potential to cause a range of human disease regardless of isolation source. © 2011 Sahl et al; licensee BioMed Central Ltd.