Institute for Bioscience and Biotechnology Research University of Maryland Rockville

Cai B.,University of Washington | Li B.,The Buck Institute for Research on Aging Novato California | Kiga N.,University of Washington | Thusberg J.,The Buck Institute for Research on Aging Novato California | And 36 more authors.
Human Mutation | Year: 2017

The advent of next-generation sequencing has dramatically decreased the cost of whole-genome sequencing and made its application in research and clinical care more viable. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under the receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that predicting individual traits is difficult and depends on strong knowledge of trait frequency in the general population, whereas matching genomes to trait profiles relies heavily on a small number of common traits, including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features. © 2017 Wiley Periodicals, Inc.
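Two of the assessment approaches named above, ROC AUC and significance simulation, can be illustrated with a short sketch. The Python below computes ROC AUC as the probability that a randomly chosen trait carrier receives a higher score than a randomly chosen non-carrier (ties count one half), and estimates significance by repeatedly shuffling the trait labels, which is one common form of significance simulation. It is an illustrative sketch only, not the CAGI assessors' code; the function names and the toy labels and scores are hypothetical.

```python
import random

def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_perm=10000, seed=0):
    """One-sided p-value: share of label shuffles whose AUC >= the observed AUC."""
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if roc_auc(shuffled, scores) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 1 = individual has the trait; scores = predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.3, 0.1, 0.8]
auc, p = permutation_p_value(labels, scores, n_perm=2000)
print(f"AUC={auc:.3f}, permutation p={p:.4f}")
```

The pairwise form is quadratic in the number of individuals, which is fine for challenge-sized test sets; larger sets would normally use a rank-based formula instead.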


Chandonia J.-M.,Lawrence Berkeley National Laboratory | Adhikari A.,University of California at Berkeley | Carraro M.,University of Padua | Chhibber A.,Roche Holding AG | And 16 more authors.
Human Mutation | Year: 2017

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible disease classes and one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient who was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication. © 2017 Wiley Periodicals, Inc.
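The "at least one predictor" and "not diagnosed by any other group" tallies in this abstract can be reproduced from per-group class probabilities with a few set operations. This sketch assumes, as a hypothetical simplification, that a group "correctly identified" a patient when its highest-probability class matches the diagnosed class; the challenge's actual criterion may differ. Group names, patients, and probabilities are invented for illustration. For the reported figures, 36/43 ≈ 83.7% and 39/63 ≈ 61.9%, which round to the quoted 84% and 62%.

```python
def top_class(probs):
    """Index of the highest-probability disease class (first index wins ties)."""
    return max(range(len(probs)), key=lambda k: probs[k])

def coverage(predictions, truth):
    """predictions: {group: {patient: [p_class0, ..., p_classK]}}; truth: {patient: class}.

    Returns (patients correct for at least one group,
             patients correct for exactly one group)."""
    correct_by = {pt: set() for pt in truth}
    for group, per_patient in predictions.items():
        for pt, probs in per_patient.items():
            if top_class(probs) == truth[pt]:
                correct_by[pt].add(group)
    any_correct = {pt for pt, groups in correct_by.items() if groups}
    unique = {pt for pt, groups in correct_by.items() if len(groups) == 1}
    return any_correct, unique

# Hypothetical example: 3 disease classes, 3 patients, 2 prediction groups.
truth = {"p1": 0, "p2": 2, "p3": 1}
predictions = {
    "G1": {"p1": [0.7, 0.2, 0.1], "p2": [0.5, 0.3, 0.2], "p3": [0.1, 0.6, 0.3]},
    "G2": {"p1": [0.3, 0.4, 0.3], "p2": [0.1, 0.2, 0.7], "p3": [0.2, 0.5, 0.3]},
}
any_correct, unique = coverage(predictions, truth)
print(f"{len(any_correct)}/{len(truth)} patients identified by at least one group")
print(f"identified by exactly one group: {sorted(unique)}")
```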


Carraro M.,University of Padua | Minervini G.,University of Padua | Giollo M.,University of Padua | Bromberg Y.,TU Munich | And 27 more authors.
Human Mutation | Year: 2017

Correct phenotypic interpretation of variants of unknown significance in cancer-associated genes is a diagnostic challenge as genetic screening gains popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants of the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite the limited test-set size, the results vary widely, with some methods performing significantly better than others. Methods combining different strategies frequently outperform simpler approaches. The best predictor, from the Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants. © 2017 Wiley Periodicals, Inc.
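Combining several assessment measures into one overall ranking can be done in more than one way. The sketch below uses one simple scheme: rank the predictors separately on each measure (tied values share the mean of their positions), then order predictors by their mean rank across measures. It is not a port of the authors' published R scripts, and the predictor names, measure names, and values are hypothetical.

```python
def average_rank(values, higher_is_better=True):
    """Rank values (1 = best); tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_pos = (i + j) / 2 + 1  # i..j are 0-based positions in the sorted order
        for k in range(i, j + 1):
            ranks[order[k]] = mean_pos
        i = j + 1
    return ranks

def overall_ranking(scores, higher_is_better):
    """scores: {predictor: {measure: value}}; higher_is_better: {measure: bool}.

    Returns predictors sorted by mean rank across measures, plus the mean ranks."""
    predictors = list(scores)
    total = {p: 0.0 for p in predictors}
    for measure, higher in higher_is_better.items():
        ranks = average_rank([scores[p][measure] for p in predictors], higher)
        for p, r in zip(predictors, ranks):
            total[p] += r
    mean_rank = {p: total[p] / len(higher_is_better) for p in predictors}
    return sorted(predictors, key=lambda p: mean_rank[p]), mean_rank

# Hypothetical predictors scored on a correlation, an error, and an AUC measure.
scores = {
    "A": {"pearson": 0.6, "rmse": 0.25, "auc": 0.8},
    "B": {"pearson": 0.4, "rmse": 0.30, "auc": 0.8},
    "C": {"pearson": 0.5, "rmse": 0.20, "auc": 0.6},
}
higher_is_better = {"pearson": True, "rmse": False, "auc": True}
order, mean_rank = overall_ranking(scores, higher_is_better)
print(order, mean_rank)
```

Averaging ranks rather than raw values keeps measures on different scales, such as a correlation and an RMSE, from dominating one another, at the cost of discarding how large the gaps between predictors are.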


Kundu K.,University of Maryland College Park | Pal L.R.,Institute for Bioscience and Biotechnology Research University of Maryland Rockville | Yin Y.,University of Maryland College Park | Moult J.,University of Maryland College Park
Human Mutation | Year: 2017

The use of gene panel sequencing for diagnostic and prognostic testing is now widespread, but so far there are few objective tests of methods to interpret these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data of 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of an alternative disease to the one tested. Many of the potentially causative variants are missense variants with no previous association with disease, and these proved the hardest to classify correctly as pathogenic or not. Post hoc analysis showed that three-dimensional structure data could have helped for up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity for each variant, and there is much work still to be done in this area. © 2017 Wiley Periodicals, Inc.


Pal L.R.,Institute for Bioscience and Biotechnology Research University of Maryland Rockville | Kundu K.,University of Maryland College Park | Yin Y.,University of Maryland College Park | Moult J.,University of Maryland College Park
Human Mutation | Year: 2017

Compared with earlier, more restricted sequencing technologies, whole-genome sequencing can in principle find all causative variants of a rare disease, but data quality issues and an overwhelming level of background variants complicate the analysis. The CAGI4 SickKids clinical genome challenge provided an opportunity to assess the landscape of variants found in a difficult set of 25 unsolved rare disease cases. To address the challenge, we developed a three-stage pipeline: first carefully analyzing data quality, then classifying high-quality gene-specific variants into seven categories, and finally examining each candidate variant for compatibility with the often complex phenotypes of these patients for final prioritization. Variants consistent with the phenotypes were found in 24 of the 25 cases, and in several of these, prioritized variants occur in multiple genes. Data quality analysis suggests that some of the selected variants are likely incorrect calls, complicating interpretation. The data providers followed up on three suggested variants with Sanger sequencing, and in one case, a prioritized variant was confirmed as likely causative by the referring physician, providing a diagnosis in a previously intractable case. © 2017 Wiley Periodicals, Inc.


Pal L.R.,Institute for Bioscience and Biotechnology Research University of Maryland Rockville | Kundu K.,University of Maryland College Park | Yin Y.,University of Maryland College Park | Moult J.,University of Maryland College Park
Human Mutation | Year: 2017

Understanding the basis of complex trait disease is a fundamental problem in human genetics. The CAGI Crohn's Exome challenges are providing insight into the adequacy of current disease models by requiring participants to identify, given exome data, which individuals in a set have been diagnosed with the disease. For the CAGI4 round, we developed a method that used only the genotypes from exome sequencing data to impute the status of genome-wide association study (GWAS) marker SNPs. We then used the imputed genotypes as input to several machine learning methods that had been trained to predict disease status from marker SNP information. We achieved the best performance with Naïve Bayes and with a consensus machine learning method, obtaining an area under the curve of 0.72, higher than that of the other methods used in CAGI4. We also developed a model that incorporated the contribution from rare missense variants in the exome data, but this performed less well. Future progress is expected to come from the use of whole-genome data rather than exomes. © 2017 Wiley Periodicals, Inc.
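The classification step described above, predicting disease status from marker-SNP genotypes, can be sketched with a categorical Naïve Bayes model. This is a minimal illustration, not the authors' trained models: it assumes genotypes have already been imputed and encoded as minor-allele counts (0, 1, 2), applies Laplace smoothing through a hypothetical `alpha` parameter, and runs on invented toy data. The imputation and consensus steps are not shown.

```python
import math
from collections import Counter

def train_naive_bayes(genotypes, labels, alpha=1.0):
    """genotypes: per-individual lists of allele counts (0, 1, 2); labels: 0/1 status."""
    n_snps = len(genotypes[0])
    class_counts = Counter(labels)
    cond = {}  # cond[c][j][g] = smoothed P(genotype g at SNP j | class c)
    for c in class_counts:
        rows = [g for g, y in zip(genotypes, labels) if y == c]
        cond[c] = []
        for j in range(n_snps):
            counts = Counter(row[j] for row in rows)
            denom = len(rows) + 3 * alpha  # three genotype categories
            cond[c].append({g: (counts[g] + alpha) / denom for g in (0, 1, 2)})
    priors = {c: class_counts[c] / len(labels) for c in class_counts}
    return priors, cond

def predict_proba(model, genotype):
    """Posterior P(class = 1 | genotype), computed in log space for numerical stability."""
    priors, cond = model
    log_post = {}
    for c in priors:
        lp = math.log(priors[c])
        for j, g in enumerate(genotype):
            lp += math.log(cond[c][j][g])
        log_post[c] = lp
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post.get(1, float("-inf")) - m) / z

# Hypothetical toy data: 3 marker SNPs; cases (1) carry more risk alleles.
train = [[2, 1, 0], [2, 2, 1], [1, 2, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
status = [1, 1, 1, 0, 0, 0]
model = train_naive_bayes(train, status)
p_high = predict_proba(model, [2, 2, 0])
p_low = predict_proba(model, [0, 0, 0])
print(f"P(disease | high-risk genotype) = {p_high:.2f}")
print(f"P(disease | low-risk genotype)  = {p_low:.2f}")
```

Scoring held-out individuals with `predict_proba` and computing the ROC AUC of those scores yields the kind of figure the abstract reports; the toy numbers here say nothing about real predictive performance.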


Daneshjou R.,Stanford University | Wang Y.,Rutgers University | Bromberg Y.,Rutgers University | Bovo S.,University of Bologna | And 46 more authors.
Human Mutation | Year: 2017

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships. © 2017 Wiley Periodicals, Inc.
