In Silico Solutions

Fairfax, VA, United States

PubMed | In Silico Solutions, U.S. National Institutes of Health and Cornell University
Journal: Nature Communications | Year: 2016

Mammalian chromosome replication starts from distinct sites; however, the principles governing initiation site selection are unclear because proteins essential for DNA replication do not exhibit sequence-specific DNA binding. Here we identify a replication-initiation determinant (RepID) protein that binds a subset of replication-initiation sites. A large fraction of RepID-binding sites share a common G-rich motif and exhibit elevated replication initiation. RepID is required for initiation of DNA replication from RepID-bound replication origins, including the origin at the human beta-globin (HBB) locus. At HBB, RepID is involved in an interaction between the replication origin (Rep-P) and the locus control region. RepID-depleted murine embryonic fibroblasts exhibit abnormal replication fork progression and fewer replication-initiation events. These observations are consistent with a model in which RepID facilitates replication initiation at a distinct group of human replication origins.


Douville C.,Johns Hopkins University | Carter H.,Johns Hopkins University | Kim R.,In Silico Solutions | Niknafs N.,Johns Hopkins University | And 5 more authors.
Bioinformatics | Year: 2013

Summary: Advances in sequencing technology have greatly reduced the costs incurred in collecting raw sequencing data. Academic laboratories and researchers therefore now have access to very large datasets of genomic alterations but limited time and computational resources to analyse their potential biological importance. Here, we provide a web-based application, Cancer-Related Analysis of Variants Toolkit (CRAVAT), designed with an easy-to-use interface to facilitate the high-throughput assessment and prioritization of genes and missense alterations important for cancer tumorigenesis. CRAVAT provides predictive scores for germline variants, somatic mutations and relative gene importance, as well as annotations from published literature and databases. Results are emailed to users as MS Excel spreadsheets and/or tab-separated text files. © The Author 2013. Published by Oxford University Press.


Niknafs N.,Johns Hopkins University | Kim D.,Johns Hopkins University | Kim R.,In Silico Solutions | Diekhans M.,University of California at Santa Cruz | And 4 more authors.
Human Genetics | Year: 2013

Mutation position imaging toolbox (MuPIT) interactive is a browser-based application for single-nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional (3D) protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts either in bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs, including available functional annotations such as binding sites, mutagenesis experiments, and common polymorphisms. Multiple SNVs may be mapped onto each structure, enabling 3D visualization of SNV clusters and their relationship to functionally annotated positions. We illustrate the utility of MuPIT interactive in rationalizing the impact of selected polymorphisms in the PharmGKB database, somatic mutations identified in the Cancer Genome Atlas study of invasive breast carcinomas, and rare variants identified in the exome sequencing project. MuPIT interactive is freely available for non-profit use at http://mupit.icm.jhu.edu. © 2013 Springer-Verlag Berlin Heidelberg.


Wong W.C.,Johns Hopkins University | Kim D.,Johns Hopkins University | Carter H.,Johns Hopkins University | Diekhans M.,University of California at Santa Cruz | And 2 more authors.
Bioinformatics | Year: 2011

Summary: Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. © The Author(s) 2011. Published by Oxford University Press.


PubMed | Johns Hopkins University, In Silico Solutions and University of Cardiff
Type: Journal Article | Journal: Human mutation | Year: 2016

Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new PubMed feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
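
The meta-predictor search described in the abstract above can be illustrated with a small sketch. The abstract does not say how the combinations were enumerated or scored, so everything here is an assumption made for illustration: the classifier names `A` to `D`, the toy calls and labels, and the restriction to a pure AND or a pure OR over each subset of classifiers (mixed expressions such as `(A AND B) OR C` are not covered).

```python
from itertools import combinations


def confusion_rates(predictions, labels):
    """Return (sensitivity, specificity) for boolean calls against boolean labels."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    tn = sum((not p) and (not l) for p, l in zip(predictions, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    sensitivity = tp / positives if positives else float("nan")
    specificity = tn / negatives if negatives else float("nan")
    return sensitivity, specificity


def enumerate_meta_predictors(calls):
    """Yield (description, combined_calls) for every AND and OR over 2+ classifiers.

    `calls` maps a hypothetical classifier name to its per-variant boolean calls
    (True = predicted pathogenic).
    """
    names = sorted(calls)
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            rows = list(zip(*(calls[name] for name in subset)))
            yield " AND ".join(subset), [all(row) for row in rows]
            yield " OR ".join(subset), [any(row) for row in rows]


# Toy data, invented for this example: four variants, two truly pathogenic.
labels = [True, True, False, False]
calls = {
    "A": [True, True, True, False],
    "B": [True, False, False, False],
    "C": [True, True, False, True],
    "D": [True, True, False, False],
}

# Rank combinations by specificity first, then sensitivity.
ranked = sorted(
    ((name, *confusion_rates(combined, labels))
     for name, combined in enumerate_meta_predictors(calls)),
    key=lambda item: (item[2], item[1]),
    reverse=True,
)
for name, sensitivity, specificity in ranked[:3]:
    print(f"{name}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Conjunctions can only lower the false-positive rate relative to their members, while disjunctions can only raise sensitivity, which is why a search over both kinds can find a combination that beats every individual classifier on a chosen criterion.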


PubMed | Johns Hopkins University and In Silico Solutions
Type: Journal Article | Journal: Cancer research | Year: 2016

The impact of somatic missense mutation on cancer etiology and progression is often difficult to interpret. One common approach for assessing the contribution of missense mutations in carcinogenesis is to identify genes mutated with statistically nonrandom frequencies. Even given the large number of sequenced cancer samples currently available, this approach remains underpowered to detect drivers, particularly in less studied cancer types. Alternative statistical and bioinformatic approaches are needed. One approach to increase power is to focus on localized regions of increased missense mutation density or hotspot regions, rather than a whole gene or protein domain. Detecting missense mutation hotspot regions in three-dimensional (3D) protein structure may also be beneficial because linear sequence alone does not fully describe the biologically relevant organization of codons. Here, we present a novel and statistically rigorous algorithm for detecting missense mutation hotspot regions in 3D protein structures. We analyzed approximately 3 × 10^5 mutations from The Cancer Genome Atlas (TCGA) and identified 216 tumor-type-specific hotspot regions. In addition to experimentally determined protein structures, we considered high-quality structural models, which increase genomic coverage from approximately 5,000 to more than 15,000 genes. We provide new evidence that 3D mutation analysis has unique advantages. It enables discovery of hotspot regions in many more genes than previously shown and increases sensitivity to hotspot regions in tumor suppressor genes (TSG). Although hotspot regions have long been known to exist in both TSGs and oncogenes, we provide the first report that they have different characteristic properties in the two types of driver genes. We show how cancer researchers can use our results to link 3D protein structure and the biologic functions of missense mutations in cancer, and to generate testable hypotheses about driver mechanisms.
Our results are included in a new interactive website for visualizing protein structures with TCGA mutations and associated hotspot regions. Users can submit new sequence data, facilitating the visualization of mutations in a biologically relevant context. Cancer Res; 76(13); 3719-31. ©2016 AACR.
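
The idea of a mutation hotspot in 3D structure can be made concrete with a generic permutation test. This is not the authors' algorithm, whose details the abstract does not give. It is a minimal sketch under stated assumptions: one coordinate per residue, a hypothetical 10 Å neighborhood radius, a null model that scatters the same number of mutations uniformly over residues, and no multiple-testing correction.

```python
import math
import random


def neighbor_lists(coords, radius):
    """For each residue, list the residues within `radius` (itself included)."""
    n = len(coords)
    return [
        [j for j in range(n) if math.dist(coords[i], coords[j]) <= radius]
        for i in range(n)
    ]


def local_density(counts, neighbors):
    """Sum the mutation counts over each residue's 3D neighborhood."""
    return [sum(counts[j] for j in nbrs) for nbrs in neighbors]


def hotspot_p_values(coords, counts, radius=10.0, n_permutations=1000, seed=0):
    """Empirical p-value per mutated residue for its observed local density.

    `radius`, `n_permutations` and `seed` are illustrative defaults, not
    values taken from the abstract.
    """
    rng = random.Random(seed)
    n = len(coords)
    neighbors = neighbor_lists(coords, radius)
    observed = local_density(counts, neighbors)
    total_mutations = sum(counts)
    exceed = [0] * n
    for _ in range(n_permutations):
        # Null model: place the same number of mutations uniformly at random.
        shuffled = [0] * n
        for _ in range(total_mutations):
            shuffled[rng.randrange(n)] += 1
        null = local_density(shuffled, neighbors)
        for i in range(n):
            if counts[i] > 0 and null[i] >= observed[i]:
                exceed[i] += 1
    # The +1 terms keep the empirical p-value strictly above zero.
    return {
        i: (exceed[i] + 1) / (n_permutations + 1)
        for i in range(n)
        if counts[i] > 0
    }


# Toy structure, invented for this example: three residues packed together,
# followed by seventeen residues spaced 20 Å apart along a line.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
toy_coords += [(50.0 + 20.0 * k, 0.0, 0.0) for k in range(17)]
toy_counts = [5, 4, 6] + [0] * 16 + [1]

for residue, p in sorted(hotspot_p_values(toy_coords, toy_counts).items()):
    print(f"residue {residue}: p = {p:.4f}")
```

In this toy case the three packed residues share one neighborhood holding 15 of the 16 mutations, so they receive small p-values, while the isolated singly mutated residue does not. A real analysis would also need to correct for the number of residues tested.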


PubMed | University of Connecticut Health Center, University of Washington, In Silico Solutions, U.S. National Institutes of Health and Yeshiva University
Journal: Epigenetics & Chromatin | Year: 2016

Eukaryotic genome duplication starts at discrete sequences (replication origins) that coordinate cell cycle progression, ensure genomic stability and modulate gene expression. Origins share some sequence features, but their activity also responds to changes in transcription and cellular differentiation status. To identify chromatin states and histone modifications that locally mark replication origins, we profiled origin distributions in eight human cell lines representing embryonic and differentiated cell types. Consistent with a role of chromatin structure in determining origin activity, we found that cancer and non-cancer cells of similar lineages exhibited highly similar replication origin distributions. Surprisingly, our study revealed that DNase hypersensitivity, which often correlates with early replication at large-scale chromatin domains, did not emerge as a strong local determinant of origin activity. Instead, we found that two distinct sets of chromatin modifications exhibited strong local associations with two discrete groups of replication origins. The first origin group consisted of about 40,000 regions that actively initiated replication in all cell types and preferentially colocalized with unmethylated CpGs and with the euchromatin markers, H3K4me3 and H3K9Ac. The second group included origins that were consistently active in cells of a single type or lineage and preferentially colocalized with the heterochromatin marker, H3K9me3. Shared origins replicated throughout the S-phase of the cell cycle, whereas cell-type-specific origins preferentially replicated during late S-phase. These observations are in line with the hypothesis that differentiation-associated changes in chromatin and gene expression affect the activation of specific replication origins.


Ryan M.C.,In Silico Solutions | Cleland J.,In Silico Solutions | Kim R.,In Silico Solutions | Wong W.C.,In Silico Solutions | Weinstein J.N.,University of Houston
Bioinformatics | Year: 2012

Summary: SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. © The Author 2012. Published by Oxford University Press. All rights reserved.


Ryan M.,University of Houston | Wong W.C.,In Silico Solutions | Brown R.,In Silico Solutions | Akbani R.,University of Houston | And 4 more authors.
Nucleic Acids Research | Year: 2016

TCGA's RNASeq data represent one of the largest collections of cancer transcriptomes ever assembled. RNASeq technology, combined with computational tools like our SpliceSeq package, provides a comprehensive, detailed view of alternative mRNA splicing. Aberrant splicing patterns in cancers have been implicated in such processes as carcinogenesis, de-differentiation and metastasis. TCGA SpliceSeq (http://bioinformatics.mdanderson.org/TCGASpliceSeq) is a web-based resource that provides a quick, user-friendly, highly visual interface for exploring the alternative splicing patterns of TCGA tumors. Percent Spliced In (PSI) values for splice events on samples from 33 different tumor types, including available adjacent normal samples, have been loaded into TCGA SpliceSeq. Investigators can interrogate genes of interest, search for the genes that show the strongest variation between or among selected tumor types, or explore splicing pattern changes between tumor and adjacent normal samples. The interface presents intuitive graphical representations of splicing patterns, read counts and various statistical summaries, including percent spliced in. Splicing data can also be downloaded for inclusion in integrative analyses. TCGA SpliceSeq is freely available for academic, government or commercial use. © The Author(s) 2015.
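
For readers unfamiliar with Percent Spliced In, the commonly used definition is the fraction of reads supporting inclusion of a splice event out of all reads informative for that event. The abstract does not describe how SpliceSeq derives or normalizes its read counts, so the sketch below assumes the two counts are already available. The function name and the zero-coverage handling are illustrative choices.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Return PSI as a fraction in [0, 1], or None when no reads cover the event."""
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # With no informative reads the ratio is undefined, so no value is reported.
        return None
    return inclusion_reads / total


# Hypothetical counts for one exon-skip event in a tumor and a matched normal sample.
tumor_psi = percent_spliced_in(inclusion_reads=30, exclusion_reads=10)
normal_psi = percent_spliced_in(inclusion_reads=36, exclusion_reads=4)
if tumor_psi is not None and normal_psi is not None:
    print(f"delta PSI (tumor - normal) = {tumor_psi - normal_psi:+.2f}")
```

Returning `None` instead of 0 for uncovered events keeps "never included" distinct from "not measured", which matters when comparing PSI between tumor and adjacent normal samples.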


