Lee S.,Sejong University |
Kwon M.-S.,Interdisciplinary Program in Bioinformatics |
Park T.,Interdisciplinary Program in Bioinformatics |
Park T.,Seoul National University
Bioinformatics | Year: 2012
Motivation: For the past few decades, many statistical methods in genome-wide association studies (GWAS) have been developed to identify SNP-SNP interactions for case-control studies. However, there has been less work for prospective cohort studies involving survival time. Recently, Gui et al. (2011) proposed a novel method, called Surv-MDR, for detecting gene-gene interactions associated with survival time. Surv-MDR is an extension of the multifactor dimensionality reduction (MDR) method to the survival phenotype by using the log-rank test for defining a binary attribute. However, the Surv-MDR method has some drawbacks: it requires intensive computation and does not allow for covariate adjustment. In this article, we propose a new approach, called Cox-MDR, which is an extension of the generalized multifactor dimensionality reduction (GMDR) to the survival phenotype by using a martingale residual as a score to classify multi-level genotypes as high- and low-risk groups. The advantages of Cox-MDR over Surv-MDR are that it allows for the effects of discrete and quantitative covariates in the framework of the Cox regression model and requires less computation. Results: Through simulation studies, we compared the power of Cox-MDR with those of Surv-MDR and the Cox regression model for various heritability and minor allele frequency combinations, with and without covariate adjustment. We found that Cox-MDR and the Cox regression model perform better than Surv-MDR for a low minor allele frequency of 0.2, but Surv-MDR has high power for a minor allele frequency of 0.4. However, when the effect of a covariate is adjusted for, Cox-MDR and the Cox regression model perform much better than Surv-MDR. We also compared the performance of Cox-MDR and Surv-MDR on a real dataset of leukemia patients to detect gene-gene interactions associated with survival time. © The Author(s) 2012. Published by Oxford University Press.
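The scoring idea in the abstract above (a martingale residual used to split genotype cells into high- and low-risk groups) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a covariate-free null model, so the residual reduces to the event indicator minus the Nelson-Aalen cumulative hazard at each subject's time, and it assumes a cell is called high-risk when its summed residual is positive. The function names are hypothetical.

```python
import numpy as np

def null_martingale_residuals(time, event):
    """Martingale residuals under a covariate-free model (an assumption).

    residual_i = event_i - H(t_i), where H is the Nelson-Aalen
    cumulative hazard evaluated at subject i's own follow-up time.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])  # distinct event times, ascending
    if event_times.size == 0:
        return np.zeros_like(time)
    # d_j: events at each event time; n_j: subjects still at risk
    d = np.array([np.sum((time == t) & (event == 1)) for t in event_times])
    n = np.array([np.sum(time >= t) for t in event_times])
    cumhaz = np.cumsum(d / n)
    # Number of event times <= t_i picks the cumulative hazard at t_i
    idx = np.searchsorted(event_times, time, side="right")
    hazard_at_t = np.where(idx > 0, cumhaz[np.maximum(idx - 1, 0)], 0.0)
    return event - hazard_at_t

def classify_cells(snp_a, snp_b, residuals):
    """Label each two-locus genotype cell high-risk (True) when its
    summed residual is positive (assumed threshold, GMDR-style)."""
    totals = {}
    for a, b, r in zip(snp_a, snp_b, residuals):
        totals[(a, b)] = totals.get((a, b), 0.0) + float(r)
    return {cell: total > 0 for cell, total in totals.items()}

# Toy data: four subjects, two genotype cells
res = null_martingale_residuals([1, 2, 3, 4], [1, 1, 0, 1])
labels = classify_cells([0, 0, 1, 1], [0, 0, 1, 1], res)
```

To adjust for covariates as the abstract describes, the residuals would instead come from a fitted Cox regression model; the cell classification step would stay the same.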
PubMed | Interdisciplinary Program in Bioinformatics and Seoul National University
Type: Journal Article | Journal: Bioinformatics (Oxford, England) | Year: 2016
To understand the dynamic nature of the biological process, it is crucial to identify perturbed pathways in an altered environment and also to infer regulators that trigger the response. Current time-series analysis methods, however, are not powerful enough to identify perturbed pathways and regulators simultaneously. Widely used methods determine gene sets, such as differentially expressed genes or gene clusters, and these gene sets need to be further interpreted in terms of biological pathways using other tools. Most pathway analysis methods are not designed for time-series data, and they do not consider gene-gene influence on the time dimension. In this article, we propose a novel time-series analysis method, TimeTP, for determining transcription factors (TFs) regulating pathway perturbation, which narrows the focus to perturbed sub-pathways and utilizes the gene regulatory network and protein-protein interaction network to locate TFs triggering the perturbation. TimeTP first identifies perturbed sub-pathways that propagate the expression changes along the time. Starting points of the perturbed sub-pathways are mapped into the network, and the most influential TFs are determined by an influence maximization technique. The analysis result is visually summarized in a TF-pathway map in time clock. TimeTP was applied to a PIK3CA knock-in dataset and found significant sub-pathways and their regulators relevant to the PIP3 signaling pathway. TimeTP is implemented in Python and available at http://biohealth.snu.ac.kr/software/TimeTP/. Supplementary information: Supplementary data are available at Bioinformatics.
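The abstract above does not say which influence maximization formulation TimeTP uses, so the following is only a generic greedy sketch under a simplifying assumption: a TF "influences" a sub-pathway starting point if that gene is reachable from the TF along directed network edges. Real influence maximization usually relies on a stochastic diffusion model; the deterministic reachability proxy, the function names and the toy network are all hypothetical.

```python
from collections import deque

def reachable(graph, seeds):
    """All nodes reachable from any seed by following directed edges (BFS)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def greedy_influential_tfs(graph, candidate_tfs, targets, k):
    """Greedily pick up to k TFs, each time adding the TF that covers
    the most not-yet-covered target genes."""
    targets = set(targets)
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for tf in candidate_tfs:
            if tf in chosen:
                continue
            gain = len((reachable(graph, [tf]) & targets) - covered)
            if gain > best_gain:
                best, best_gain = tf, gain
        if best is None:  # no remaining TF adds coverage
            break
        chosen.append(best)
        covered |= reachable(graph, [best]) & targets
    return chosen, covered

# Toy network: edges point from regulator to regulated gene
network = {"TF1": ["g1", "g2"], "TF2": ["g2"], "TF3": ["g3"], "g1": ["g4"]}
tfs, hit = greedy_influential_tfs(
    network, ["TF1", "TF2", "TF3"], ["g1", "g2", "g3", "g4"], k=2
)
```

The greedy rule is a common choice because coverage-style objectives are submodular, but whether TimeTP follows this rule is not stated in the abstract.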
Jung I.,Interdisciplinary Program in Bioinformatics |
Jung I.,Bioinformatics Institute |
Park J.C.,Seoul National University |
Kim S.,Interdisciplinary Program in Bioinformatics |
And 2 more authors.
Computational Biology and Chemistry | Year: 2014
Piwi-interacting RNAs (piRNAs) are recently discovered, endogenous small non-coding RNAs. piRNAs protect the genome from invasive transposable elements (TE) and sustain integrity of the genome in germ cell lineages. Small RNA-sequencing data can be used to detect piRNA activations in a cell under a specific condition. However, identification of cell-specific piRNA activations requires sophisticated computational methods. As of now, there is only one computational method, proTRAC, to locate activated piRNAs from the sequencing data. proTRAC detects piRNA clusters based on a probabilistic analysis under the assumption of a uniform distribution. Unfortunately, we were not able to locate activated piRNAs from our proprietary sequencing data in chicken germ cells using proTRAC. With a careful investigation of the data sets, we found that a uniform or any other statistical distribution may not be assumed for detecting piRNA clusters. Furthermore, small RNA-seq data contain many different types of RNAs, which were not carefully taken into account in previous studies. To improve piRNA cluster identification, we developed piClust, which uses a density-based clustering approach without assuming any parametric distribution. Previous studies have shown that piRNAs exhibit a strong tendency to form piRNA clusters in syntenic regions of the genome. Thus, the density-based clustering approach is effective and robust to the existence of non-piRNAs or noise in the data. In experiments with piRNA data from human, mouse, rat and chicken, piClust was able to detect piRNA clusters from total small RNA-seq data from germ cell lines, while proTRAC was not successful. piClust outperformed proTRAC in terms of sensitivity and running time (up to 200-fold). © 2014 Elsevier Ltd.
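As an illustration of density-based clustering without a parametric distribution, the sketch below applies a one-dimensional, DBSCAN-style rule to read start positions on a single chromosome. It is a generic example rather than piClust itself: the parameter names `eps` and `min_reads`, their values, and the decision to report only the span of core reads are assumptions made for illustration.

```python
import numpy as np

def density_clusters_1d(positions, eps, min_reads):
    """Return (start, end) intervals of dense read regions.

    A read is a core read if at least `min_reads` reads (itself included)
    lie within `eps` bases of it. Consecutive core reads no more than
    `eps` apart are chained into one cluster. Border reads are ignored.
    """
    pos = np.sort(np.asarray(positions))
    # Count neighbours within eps of each read using the sorted order
    left = np.searchsorted(pos, pos - eps, side="left")
    right = np.searchsorted(pos, pos + eps, side="right")
    core = pos[(right - left) >= min_reads]
    clusters = []
    for p in core:
        if clusters and p - clusters[-1][1] <= eps:
            clusters[-1][1] = int(p)            # extend the open cluster
        else:
            clusters.append([int(p), int(p)])   # start a new cluster
    return [tuple(c) for c in clusters]

# Toy read positions: two dense regions and one isolated read
reads = [100, 105, 110, 112, 5000, 9000, 9004, 9008]
clusters = density_clusters_1d(reads, eps=10, min_reads=3)
```

The isolated read at position 5000 is treated as noise, which mirrors the robustness to non-piRNA reads that the abstract attributes to the density-based approach.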
Kim I.-W.,Seoul National University |
Kim K.I.,Seoul National University |
Chang H.-J.,Seoul National University |
Yeon B.,Interdisciplinary Program in Bioinformatics |
And 6 more authors.
Pharmacogenetics and Genomics | Year: 2012
OBJECTIVE: We examined the differences in allele frequencies for pharmacogenes among the Korean (KOR), Chinese (CHB), Japanese (JPT), Caucasian (CEU), and Nigerian (YRI) populations. METHODS: Fifty-seven pharmacogenes were selected from the imputed Korean Association REsource and HapMap databases. Minor allele frequencies were analyzed using the sample size-modified single nucleotide polymorphism-specific fixation index (FST) and the χ²-test with Bonferroni's correction. Geneset analysis was also carried out to identify pharmacogenes that have significantly different allele frequencies among the various populations tested. RESULTS: The KOR population was the most divergent group from the YRI population (FST: 0.079) but very similar to the CHB and JPT populations (FST: 0.003). VKORC1 showed a large population divergence in the KOR-YRI (0.439) comparison. CYP3A4 was also highly divergent in the KOR-YRI (FST: 0.361) comparison. The calcium signaling pathway gene set was divergent in all pairwise population comparisons. CONCLUSION: In terms of the 57 pharmacogenes studied, there were no significant differences among the KOR, CHB, and JPT populations. However, the YRI and CEU populations were significantly differentiated from the three Eastern Asian groups. Future pharmacogenomics studies can utilize the polymorphisms identified in this study, as these variants may have important implications for the selection of highly informative single nucleotide polymorphisms for future clinical trials. © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins.
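For readers unfamiliar with the fixation index, the sketch below computes a basic SNP-specific FST between two populations from their allele frequencies. It uses the textbook heterozygosity form, FST = (HT - HS) / HT, with equal population weights. The sample size modification mentioned in the abstract is not specified there and is not reproduced here, and the function name and example frequencies are illustrative only.

```python
def fst_two_populations(p1, p2):
    """Basic FST for one biallelic SNP from two allele frequencies.

    HT: expected heterozygosity at the pooled (mean) allele frequency.
    HS: mean expected heterozygosity within the two populations.
    Equal weighting of populations is an assumption of this sketch.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t

# Identical frequencies give no differentiation; fixed opposite
# alleles give complete differentiation.
same = fst_two_populations(0.5, 0.5)
fixed = fst_two_populations(0.0, 1.0)
moderate = fst_two_populations(0.2, 0.4)
```

Values near 0, such as the 0.003 reported among the East Asian populations, indicate nearly identical allele frequencies, while larger values such as 0.439 indicate strong divergence at that gene.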
Ahn T.,Samsung |
Lee E.,Samsung |
Huh N.,Samsung |
Park T.,Interdisciplinary Program in Bioinformatics |
Park T.,Seoul National University
Bioinformatics | Year: 2014
Motivation: Identifying altered pathways in an individual is important for understanding disease mechanisms and for the future application of custom therapeutic decisions. Existing pathway analysis techniques are mainly focused on discovering altered pathways between normal and cancer groups and are not suitable for identifying the pathway aberrance that may occur in an individual sample. A simple way to identify an individual's pathway aberrance is to compare normal and tumor data from the same individual. However, matched normal data from the same individual are often unavailable in clinical situations. Therefore, we suggest a new approach for the personalized identification of altered pathways, making special use of accumulated normal data in cases when a patient's matched normal data are unavailable. The philosophy behind our method is to quantify the aberrance of an individual sample's pathway by comparing it with accumulated normal samples. We propose and examine personalized extensions of pathway statistics, overrepresentation analysis and functional class scoring, to generate an individualized pathway aberrance score. Results: Collected microarray data of normal lung tissue and colon mucosa served as a reference to investigate a number of individuals with lung adenocarcinoma (LUAD) and colon cancer, respectively. Our method concurrently captures known facts of cancer survival pathways and identifies the pathway aberrances that represent cancer differentiation status and survival. It also provides an improved validation rate of survival-related pathways compared with interpreting a single cancer sample in the context of a cancer-only cohort. In addition, our method is useful in classifying unknown samples into cancer or normal groups. In particular, we identified that the 'amino acid synthesis and interconversion' pathway is a good indicator of LUAD (Area Under the Curve (AUC) 0.982 at independent validation).
The clinical importance of the method lies in providing pathway interpretation of a single cancer sample even when its matched normal data are unavailable. © The Author 2014. Published by Oxford University Press. All rights reserved.
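The core idea of comparing one sample against accumulated normal samples can be sketched as follows. This is an assumed, simplified form of a functional-class-scoring extension: each gene in the single sample is standardized against the mean and standard deviation of the normal reference, and the pathway score is the average of those gene-level Z values. The abstract does not give the exact statistics used, so the function names and the averaging rule are hypothetical.

```python
import numpy as np

def gene_z_scores(sample, normals):
    """Standardize one sample against a normal reference.

    sample:  1-D array, one expression value per gene.
    normals: 2-D array, rows are normal samples, columns are genes.
    """
    mu = normals.mean(axis=0)
    sd = normals.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, np.inf)  # constant genes get a Z of 0
    return (sample - mu) / sd

def pathway_scores(z, genes, pathways):
    """Average gene-level Z per pathway (assumed aggregation rule).
    Pathways with no measured genes are skipped."""
    index = {g: i for i, g in enumerate(genes)}
    scores = {}
    for name, members in pathways.items():
        values = [z[index[g]] for g in members if g in index]
        if values:
            scores[name] = float(np.mean(values))
    return scores

# Toy reference of three normal samples and one tumor sample
genes = ["A", "B"]
normals = np.array([[1.0, 10.0], [3.0, 14.0], [2.0, 12.0]])
tumor = np.array([4.0, 12.0])
z = gene_z_scores(tumor, normals)
scores = pathway_scores(z, genes, {"P": ["A", "B"], "Q": ["A"], "R": ["C"]})
```

Because the reference is built only from normal samples, a new patient can be scored without a matched normal sample, which is the scenario the abstract targets.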