Zhan X.,Quantitative Biomedical Research Center |
Zhan X.,University of Texas Southwestern Medical Center |
Hu Y.,A9.com Inc |
Li B.,Vanderbilt University |
And 2 more authors.
Bioinformatics | Year: 2016
Motivation: Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data. © The Author 2016. Published by Oxford University Press.
Wang C.,Genome Institute of Singapore |
Zhan X.,Quantitative Biomedical Research Center |
Liang L.,Boston University |
Abecasis G.R.,University of Michigan |
Lin X.,Boston University
American Journal of Human Genetics | Year: 2015
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. © 2015 The American Society of Human Genetics.
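The abstract above describes mapping sample-specific principal components into a reference ancestry space with a Procrustes analysis. As a rough illustration only, the sketch below implements ordinary (same-dimension) Procrustes alignment with translation, rotation and scaling in NumPy. It is not the LASER 2.0 projection Procrustes algorithm, which maps from a higher-dimensional to a lower-dimensional space; the function names `procrustes_fit` and `procrustes_apply` are hypothetical and chosen for this example.

```python
import numpy as np


def procrustes_fit(X, Y):
    """Fit Y ~ rho * (X - mean_X) @ R + mean_Y with R orthogonal.

    X, Y: (n, k) coordinates of the same n reference individuals in the
    sample-specific and the reference PCA spaces (assumed same dimension k).
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    X0, Y0 = X - mean_x, Y - mean_y
    # The optimal orthogonal map comes from the SVD of the cross-product matrix.
    U, s, Vt = np.linalg.svd(X0.T @ Y0)
    R = U @ Vt
    # The optimal isotropic scaling follows from the singular values.
    rho = s.sum() / (X0 ** 2).sum()
    return R, rho, mean_x, mean_y


def procrustes_apply(Z, R, rho, mean_x, mean_y):
    """Place new coordinates Z (m, k) into the reference space."""
    return rho * (Z - mean_x) @ R + mean_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))                   # simulated sample-space PCs
    Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthogonal matrix
    Y = 2.5 * X @ Q + np.array([1.0, -2.0, 0.5, 3.0])
    params = procrustes_fit(X, Y)
    print(np.abs(procrustes_apply(X, *params) - Y).max())
```

In this simplified setting the fitted transformation would be estimated on the reference individuals and then applied to the study sample's coordinates.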
Allen J.D.,Southwestern Medical Center |
Wang S.,Southern Methodist University |
Chen M.,Quantitative Biomedical Research Center |
Minna J.D.,Center for Therapeutic Oncology |
And 2 more authors.
Briefings in Bioinformatics | Year: 2012
Access to gene expression data has become increasingly common in recent years; however, analysis has become more difficult as it is often desirable to integrate data from different platforms. Probe mapping across microarray platforms is the first and most crucial step for data integration. In this article, we systematically review and compare different approaches to map probes across seven platforms from different vendors: U95A, U133A and U133 Plus 2.0 from Affymetrix, Inc.; HT-12 v1, HT-12 v2 and HT-12 v3 from Illumina, Inc.; and 4112A from Agilent, Inc. We use a unique data set, which contains 56 lung cancer cell line samples, each measured on two different microarray platforms, to evaluate the consistency of expression measurement across platforms using different approaches. Based on the evaluation from the empirical data set, the BLAST alignment of the probe sequences to a recent revision of the transcriptome generated better results than using annotations provided by vendors or from Bioconductor's Annotate package. However, a combination of all three methods (deemed the 'Consensus Annotation') yielded the most consistent expression measurement across platforms. To facilitate data integration across microarray platforms for the research community, we develop a user-friendly web-based tool, an API and an R package to map data across different microarray platforms from Affymetrix, Illumina and Agilent. Information on all three can be found at http://qbrc.swmed.edu/software/probemapper/. © The Author 2011. Published by Oxford University Press.
Zhong R.,Quantitative Biomedical Research Center |
Kim H.S.,Yonsei University |
Kim M.,Quantitative Biomedical Research Center |
Kim M.,Simmons Comprehensive Cancer Center |
And 4 more authors.
Nucleic Acids Research | Year: 2014
A challenge for large-scale siRNA loss-of-function studies is the biological pleiotropy resulting from multiple modes of action of siRNA reagents. A major confounding feature of these reagents is the microRNA-like translational quelling resulting from short regions of oligonucleotide complementarity to many different messenger RNAs. We developed a computational approach, deconvolution analysis of RNAi screening data, for automated quantitation of off-target effects in RNAi screening data sets. Substantial reduction of off-target rates was experimentally validated in five distinct biological screens across different genome-wide siRNA libraries. A public-access graphical user interface has been constructed to facilitate application of this algorithm. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zang X.,Quantitative Biomedical Research Center |
Chen M.,University of Texas at Dallas |
Zhou Y.,Quantitative Biomedical Research Center |
Zhou Y.,University of Texas Southwestern Medical Center |
And 5 more authors.
Cancer Informatics | Year: 2015
Lung cancer is among the major causes of cancer deaths, and the survival rate of lung cancer patients is extremely low. Recent studies have demonstrated that the gene CDKN3 is related to neoplasia, but considerable controversy exists in the literature over whether it promotes cancer progression or, conversely, inhibits tumors. In this study, we investigated the expression of CDKN3 and its association with prognosis in lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC) using datasets in Lung Cancer Explorer (LCE; http://qbrc.swmed.edu/lce/). We found that CDKN3 was up-regulated in ADC and SCC compared to normal tissues. We also found that CDKN3 was expressed at a higher level in SCC than in ADC, which was further validated through meta-analysis (coefficient = 2.09, 95% CI = 1.50–2.67, P < 0.0001). In addition, based on meta-analysis for the prognostic value of CDKN3, we found that higher CDKN3 expression was associated with poorer survival outcomes in ADC (HR = 1.65, 95% CI = 1.39–1.96, P < 0.0001), but not in SCC (HR = 1.10, 95% CI = 0.84–1.44, P = 0.494). Our findings indicate that CDKN3 may be a prognostic marker in ADC, though the detailed mechanism is yet to be revealed. © the authors, publisher and licensee Libertas Academica Limited.
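The pooled estimates quoted in the abstract above can be sanity-checked from the reported confidence intervals. The sketch below assumes a symmetric normal (Wald) interval, taken on the log scale for hazard ratios, and recovers the standard error, z statistic and two-sided P value. This is an assumed back-calculation for illustration, not the authors' meta-analysis procedure, and the helper name `wald_from_ci` is hypothetical. Because the published figures are rounded, the recovered P values only approximate the reported ones.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(estimate, lower, upper, ratio_scale=True):
    """Recover (se, z, p) from a point estimate and its 95% CI.

    ratio_scale=True treats the inputs as ratios (e.g. hazard ratios) and
    works on the log scale; ratio_scale=False treats them as additive
    coefficients.
    """
    if ratio_scale:
        estimate, lower, upper = (math.log(v) for v in (estimate, lower, upper))
    se = (upper - lower) / (2 * Z95)
    z = estimate / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Values as reported in the abstract.
    print("ADC HR:", wald_from_ci(1.65, 1.39, 1.96))
    print("SCC HR:", wald_from_ci(1.10, 0.84, 1.44))
    print("SCC vs ADC coefficient:", wald_from_ci(2.09, 1.50, 2.67, ratio_scale=False))
```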