Indiana Center for Systems Biology and Personalized Medicine

Indianapolis, IN, United States


Zhang F.,University of North Texas Health Science Center | Wang M.,Indiana Center for Systems Biology and Personalized Medicine | Michael T.,University of North Texas Health Science Center | Drabier R.,University of North Texas Health Science Center
BMC Systems Biology | Year: 2013

Background: In the biopharmaceutical industry, biomarkers define molecular taxonomies of patients and diseases and serve as surrogate endpoints in early-phase drug trials. Molecular biomarkers can be much more sensitive than traditional lab tests. Discriminating disease biomarkers with traditional methods such as DNA microarrays has proved challenging. Alternative splicing isoforms represent a new class of diagnostic biomarkers. Recent evidence demonstrates that differentiating and quantifying individual alternative splicing isoforms could improve insights into disease diagnosis and management. Identifying and characterizing alternative splicing isoforms is essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, traditional methods for determining alternative splicing isoforms have limitations: they operate at the transcriptome level, offer low coverage, and focus poorly on alternative splicing. Results: We therefore present a peptidomics approach to searching for novel alternative splicing isoforms in clinical proteomics. Our results showed that the approach has significant potential in enabling discovery of new types of high-quality alternative splicing isoform biomarkers. Conclusions: We developed a peptidomics approach for the proteomics community to analyze, identify, and characterize alternative splicing isoforms from MS-based proteomics experiments with more coverage and exclusive focus on alternative splicing. The approach can help generate novel hypotheses on molecular risk factors and molecular mechanisms of early-stage cancer, leading to identification of potentially highly specific alternative splicing isoform biomarkers for early detection of cancer. © 2013 Zhang et al.; licensee BioMed Central Ltd.

Zhang F.,University of North Texas Health Science Center | Chen J.Y.,Wenzhou University | Chen J.Y.,Indiana University | Chen J.Y.,Indiana Center for Systems Biology and Personalized Medicine
BMC Genomics | Year: 2016

Background: Clinical proteomics applications aim to solve a specific clinical problem within the context of a clinical study. The field has been growing rapidly in biomarker discovery, especially in cancer diagnostics. Until recently, protein isoforms had not been viewed as a new class of early diagnostic biomarkers for clinical proteomics. A protein isoform is one of several different forms of the same protein. Different forms of a protein may be produced from single-nucleotide polymorphisms (SNPs), alternative splicing, or post-translational modifications (PTMs). Previous studies have shown that protein isoforms play critical roles in tumorigenesis, disease diagnosis, and prognosis. Identifying and characterizing protein isoforms is essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, traditional methods used for protein isoform determination, such as EST sequencing, microarray profiling (exon array, exon-exon junction array), and mRNA next-generation sequencing, have limitations: 1) they do not operate at the protein level, 2) they provide no information about connections between nonadjacent exons, 3) they cover no SNPs or PTMs, and 4) they have low reproducibility. Moreover, clinical proteomics studies face computational challenges: 1) low sensitivity of instruments, 2) high data noise, and 3) high variability and low repeatability, even though a recent advance in clinical proteomics technology, LC-MS/MS proteomics, has been used to identify candidate molecular biomarkers in a diverse range of samples, including cells, tissues, serum/plasma, and other types of body fluids. Results: In this paper, we therefore present a peptidomics method for identifying cancer-related and isoform-specific peptides from LC-MS/MS data for clinical proteomics applications.
First, we built a Peptidomic Database of Human Protein Isoforms, then created a peptidomics approach to perform a large-scale screen of breast cancer-associated alternative splicing isoform markers in clinical proteomics, and lastly performed four kinds of validation: biological validation (explainable index), exon array, statistical validation of independent samples, and extensive pathway analysis. Conclusions: Our results showed that alternative splicing isoform markers can act as independent markers of breast cancer and that the method for identifying cancer-specific protein isoform biomarkers is effective at increasing the number of alternative splicing isoform markers identified in clinical proteomics. © 2016 The Author(s).

Wang M.,Indiana University | Chen J.Y.,Indiana University | Chen J.Y.,Purdue University | Chen J.Y.,Indiana Center for Systems Biology and Personalized Medicine
Artificial Intelligence in Medicine | Year: 2010

Objective: The small sample sizes of functional genomics experiments have made it necessary to integrate DNA microarray experimental data from different sources. However, experimental noise and the biases of different microarray platforms have made integrated data analysis challenging. In this work, we propose an integrative computational framework to identify candidate biomarker genes from publicly available functional genomics studies. Methods: We developed a new framework, Gaussian Mixture Modeling-Coupled Information Gain (GMM-IG). In this framework, we first apply a two-component Gaussian mixture model (GMM) to estimate the conditional probability distributions of gene expression data between two different types of samples, for example, normal versus cancer. An expectation-maximization algorithm is then used to estimate the maximum likelihood parameters of a mixture of two Gaussian models in the feature space and determine the underlying expression levels of genes. Gene expression results from different studies are discretized based on the GMM estimates and then unified. Significantly differentially expressed genes are filtered and assessed with information gain (IG) measures. Results: DNA microarray experimental data for lung cancers from three different prior studies were processed using the new GMM-IG method. Target gene markers from a gene expression panel were selected and compared with several conventional computational biomarker data analysis methods. GMM-IG showed consistently high accuracy for several classification assessments. A high reproducibility of gene selection results was also determined from statistical validations. Our study shows that the GMM-IG framework can overcome the poor reliability of single-study DNA microarray experiments while maintaining high accuracy by combining true signals from multiple studies.
Conclusions: We present a conceptually simple framework that enables reliable integration of true differential gene expression signals from multiple microarray experiments. This novel computational method has been shown to generate interesting biomarker panels for lung cancer studies. It is promising as a general strategy for future panel biomarker development, especially for applications that require integrating experimental results generated by different research centers or with different technology platforms. © 2009 Elsevier B.V.
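
The GMM-IG steps described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single gene, pools the expression values of both sample classes, fits a two-component one-dimensional Gaussian mixture with a hand-written EM loop, discretizes each value to a "low" or "high" state, and scores the gene by information gain against the class labels. The function names, the initialisation at the minimum and maximum value, and the toy expression values are all hypothetical.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a univariate Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def fit_two_component_gmm(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    grand_mean = sum(values) / len(values)
    spread = max(sum((v - grand_mean) ** 2 for v in values) / len(values), 1e-6)
    # Assumed initialisation: one component at each extreme of the data.
    weights, means, variances = [0.5, 0.5], [min(values), max(values)], [spread, spread]
    for _ in range(iterations):
        # E-step: posterior probability of each component for every value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk, 1e-6
            )
    return weights, means, variances


def discretize(values, model):
    """Label each value 1 ('high') or 0 ('low') by its more probable component."""
    weights, means, variances = model
    high = 0 if means[0] > means[1] else 1
    low = 1 - high
    states = []
    for v in values:
        p_high = weights[high] * normal_pdf(v, means[high], variances[high])
        p_low = weights[low] * normal_pdf(v, means[low], variances[low])
        states.append(1 if p_high > p_low else 0)
    return states


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))


def information_gain(states, labels):
    """Reduction in class-label entropy obtained by knowing the discretized state."""
    n = len(labels)
    conditional = 0.0
    for s in set(states):
        subset = [lab for st, lab in zip(states, labels) if st == s]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Toy expression values for one hypothetical gene (not data from the study).
normal = [1.8, 2.1, 2.0, 1.9, 2.2]
cancer = [7.9, 8.1, 8.0, 8.2, 7.8]
expression = normal + cancer
labels = ["normal"] * len(normal) + ["cancer"] * len(cancer)
states = discretize(expression, fit_two_component_gmm(expression))
gain = information_gain(states, labels)
```

Because the discretized states, rather than raw intensities, are what get compared, values from different platforms can in principle be unified after each study is fitted separately, which is the motivation the abstract gives for coupling the mixture model with information gain.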

Hale P.J.,Indiana University – Purdue University Indianapolis | Hale P.J.,Indiana Center for Systems Biology and Personalized Medicine | Lopez-Yunez A.M.,Alevio Medical Center | Chen J.Y.,Indiana University – Purdue University Indianapolis | And 3 more authors.
BMC Systems Biology | Year: 2012

Background: Many genetic studies, including single-gene studies and genome-wide association studies (GWAS), aim to identify risk alleles for genetic diseases such as Type II Diabetes (T2D). However, in T2D studies, a significant amount of the hereditary risk cannot be explained simply by individual risk genes. There is a need to develop systems biology approaches that integrate comprehensive genetic information and provide new insight into T2D biology. Methods: We performed a comprehensive integrative analysis of single nucleotide polymorphisms (SNPs) individually curated from T2D GWAS results and mapped them to T2D candidate risk genes. Using protein-protein interaction data, we constructed a T2D-specific molecular interaction network consisting of T2D genetic risk genes and their interacting gene partners. We then studied the relationship between these T2D genes and curated gene sets. Results: We determined that T2D candidate risk genes are concentrated in certain parts of the genome, specifically in chromosome 20. Using the T2D genetic network, we identified highly interconnected network "hub" genes. By incorporating T2D GWAS results, T2D pathways, and T2D genes' functional category information, we further ranked T2D risk genes, T2D-related pathways, and T2D-related functional categories. We found the highly interconnected T2D disease network "hub" genes most strongly associated with T2D genetic risks to be PI3KR1, ESR1, and ENPP1. The well-characterized TCF7L2, contrary to our expectation, was not among the highest-ranked T2D genes. Many interacting pathways play a role in T2D genetic risks, including the insulin signalling pathway, the type II diabetes pathway, maturity onset diabetes of the young, the adipocytokine signalling pathway, and pathways in cancer.
We also observed significant crosstalk among T2D gene subnetworks, which include insulin secretion, regulation of insulin secretion, response to peptide hormone stimulus, response to insulin stimulus, peptide secretion, glucose homeostasis, and hormone transport. Overview maps involving T2D genes, gene sets, pathways, and their interactions are all reported. Conclusions: Large-scale systems biology meta-analyses of GWAS results can improve interpretations of genetic variations and genetic risk factors. T2D genetic risks can be attributable to the summative genetic effects of many genes involved in a broad range of signalling pathways and functional networks. The framework developed for T2D studies may serve as a guide for studying other complex diseases. © 2012 Hale et al; licensee BioMed Central Ltd.
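
The idea of a highly interconnected "hub" gene in a protein-protein interaction network can be sketched as follows. The abstract does not state how hubs were scored, so this example assumes the simplest measure, node degree (the number of distinct interaction partners). The function name and the placeholder gene labels are hypothetical and do not represent real interactions.

```python
from collections import defaultdict


def rank_hubs(interactions, top_n=3):
    """Rank genes by degree in an undirected interaction network.

    `interactions` is an iterable of (gene_a, gene_b) pairs. Duplicate edges
    and self-interactions do not inflate a gene's degree.
    """
    neighbours = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Sort by descending degree, breaking ties alphabetically for stable output.
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(g, len(neighbours[g])) for g in ranked[:top_n]]


# Placeholder network; the labels stand in for genes and are not real data.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]
top_hubs = rank_hubs(toy_edges, top_n=2)
```

Degree is only one possible centrality measure; a ranking that also incorporates GWAS evidence or pathway membership, as the abstract describes, would need additional weights that are not specified here.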

Zhang F.,University of North Texas Health Science Center | Chen J.Y.,Indiana University | Chen J.Y.,Purdue University | Chen J.Y.,Indiana Center for Systems Biology and Personalized Medicine
BMC Medical Genomics | Year: 2013

Background: Early detection of breast cancer in blood is both appealing clinically and challenging technically due to the disease's elusive nature and heterogeneity. Today, even though the major breast cancer subtypes have been characterized, i.e., luminal A, luminal B, HER2+, and basal-like, little is known about the heterogeneity of breast cancer in blood, knowledge of which could help to discover minimally invasive protein biomarkers with which clinical researchers can detect, classify, and monitor different breast cancer subtypes. Results: In this study, we performed an integrative pathway-assisted clustering analysis of breast cancer subtypes from plasma proteome samples collected from 80 patients diagnosed with breast cancer and 80 healthy women. First, four breast cancer subtypes and an additional unknown subtype (according to existing annotation) were determined based on pathology lab test results in primary tumors of enrolled patients. Next, we developed and applied four distance metrics, i.e., Protein Intensity, Q-Value, Pathway Profile, and Distance Score Function, to measure and characterize these cancer subtypes. Then, we developed a permutation test to evaluate the significant protein level changes in each biological pathway for each breast cancer subtype, using q-values. Lastly, we developed a pathway-protein matrix for each of the four distance methods to estimate the distance between breast cancer subtypes, on which further Pathway Association Network analysis was performed. Conclusions: We found that 1) the luminal group (luminal A and luminal B) clusters together, as does the basal group (basal-like and HER2+), and 2) luminal A and luminal B are closer to each other than basal-like and HER2+ are to each other.
Our results were consistent with a recent independent breast cancer study from the Cancer Genome Atlas Network using genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our results showed that changes between breast cancer subtypes at the pathway level are more profound and less variable than those at the molecular level. Similar subtypes share distinct yet similar pathway activation networks, while dissimilar subtypes also differ at the level of pathway activation networks. The results also showed that the distance or similarity of cancer subtypes based on pathway analysis might provide further insight into the intrinsic relationship of breast cancer subtypes. We believe the integrative pathway-assisted proteomics analysis described here can become a model for reliable clustering or classification of other cancer subtypes. © 2013 Chen; licensee BioMed Central Ltd.

Zhou A.,Indiana University | Zhou A.,Indiana Center for Systems Biology and Personalized Medicine | Zhang F.,Indiana University | Zhang F.,Indiana Center for Systems Biology and Personalized Medicine | And 3 more authors.
BMC Bioinformatics | Year: 2010

Background: Protein isoform generation, which may derive from alternative splicing, genetic polymorphism, and posttranslational modification, is an essential means by which eukaryotic cells achieve molecular diversity. Previous studies have shown that protein isoforms play critical roles in disease diagnosis, risk assessment, sub-typing, prognosis, and treatment outcome predictions. Understanding the types, presence, and abundance of different protein isoforms in different cellular and physiological conditions is a major task in functional proteomics, and may pave the way to molecular biomarker discovery for human diseases. In tandem mass spectrometry (MS/MS) based proteomics analysis, peptide peaks with exact matches to protein sequence records in the proteomics database may be identified with mass spectrometry (MS) search software. However, due to limited annotation and poor coverage of protein isoforms in proteomics databases, high-throughput protein isoform identifications, particularly those arising from alternative splicing and genetic polymorphism, have not been possible. Results: Therefore, we present the PEPtidomics Protein Isoform Database (PEPPI), a comprehensive database of computationally synthesized human peptides that can identify protein isoforms derived from either alternatively spliced mRNA transcripts or SNP variations. We collected genome, pre-mRNA alternative splicing and SNP information from Ensembl. We synthesized in silico isoform transcripts that cover all exons and theoretically possible junctions of exons and introns, as well as all their variations derived from known SNPs. With three case studies, we further demonstrated that the database can help researchers discover and characterize new protein isoform biomarkers from experimental proteomics data. Conclusions: We developed a new tool for the proteomics community to characterize protein isoforms from MS-based proteomics experiments.
By cataloguing each peptide configuration in the PEPPI database, users can study genetic variations and alternative splicing events at the proteome level. They can also batch-download peptide sequences in FASTA format to search against MS/MS spectra derived from human samples. The database can help generate novel hypotheses on molecular risk factors and molecular mechanisms of complex diseases, leading to identification of potentially highly specific protein isoform biomarkers. © 2010 Chen et al; licensee BioMed Central Ltd.
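
To make the idea of in silico junction peptides concrete, here is a minimal sketch under strong simplifying assumptions; it is not the PEPPI pipeline. It enumerates every ordered exon pair, joins the two exons, translates the result with the standard genetic code, and keeps a short peptide window spanning the junction, written out in FASTA format. It assumes each upstream exon is read in frame from its first base, and it ignores introns and SNP variants, which the abstract says the real database covers. The function names, the `flank` parameter, and the toy exon sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[k:k + 3], "X") for k in range(0, len(dna) - 2, 3))


def junction_peptides(exons, flank=2):
    """Return {name: peptide} for every exon pair (i < j) joined directly.

    The peptide is a window of up to `flank` residues on each side of the
    junction. Pairs whose translation stops before crossing the junction
    are skipped.
    """
    peptides = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            protein = translate(exons[i] + exons[j])
            junction = len(exons[i]) // 3  # index of the first residue at/after the junction
            stop = protein.find("*")
            if stop != -1:
                if stop <= junction:
                    continue  # translation terminates before any downstream residue
                protein = protein[:stop]
            peptides[f"exon{i + 1}-exon{j + 1}"] = protein[max(0, junction - flank):junction + flank]
    return peptides


def to_fasta(peptides):
    """Serialise the peptides as FASTA records."""
    return "\n".join(f">{name}\n{seq}" for name, seq in peptides.items())


# Toy exons (hypothetical sequences, not taken from Ensembl).
toy_exons = ["ATGGCT", "GAAAAA", "TGGTAA"]
fasta_text = to_fasta(junction_peptides(toy_exons))
```

A peptide such as the exon1-exon3 window can only be observed if exon 2 is skipped, which is why a database of junction-spanning peptides lets an MS/MS search distinguish splice isoforms that share most of their sequence.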

Zhang F.,Indiana University | Zhang F.,Indiana Center for Systems Biology and Personalized Medicine | Chen J.Y.,Indiana University | Chen J.Y.,Indiana Center for Systems Biology and Personalized Medicine
BMC Genomics | Year: 2010

Background: Worldwide, breast cancer is the second most common type of cancer after lung cancer. Plasma proteome profiling may have a higher chance of identifying protein changes between normal and breast cancer plasma samples. Breast cancer cell lines have long been used by researchers as a model system for identifying protein biomarkers. Comparing the set of proteins that change in plasma with previously published findings from proteomic analysis of human breast cancer cell lines may identify, with higher confidence, a subset of candidate protein biomarkers. Results: In this study, we analyzed a liquid chromatography (LC) coupled tandem mass spectrometry (MS/MS) proteomics dataset from plasma samples of 40 healthy women and 40 women diagnosed with breast cancer. Using two-sample t-statistics and a permutation procedure, we identified 254 statistically significant, differentially expressed proteins, among which 208 are over-expressed and 46 are under-expressed in breast cancer plasma. We validated this result against previously published proteomic results of human breast cancer cell lines and signaling pathways to derive 25 candidate protein biomarkers in a panel. Using pathway analysis, we observed that the 25 "activated" plasma proteins were present in several cancer pathways, including 'Complement and coagulation cascades', 'Regulation of actin cytoskeleton', and 'Focal adhesion', and match well with previously reported studies. Additional gene ontology analysis of the 25 proteins also showed that cellular metabolic process and response to external stimulus (especially proteolysis and acute inflammatory response) were enriched functional annotations of the proteins identified in the breast cancer plasma samples.
By cross-validation using two additional proteomics studies, we obtained 86% and 83% similarities in the pathway-protein matrix between the first study and the two testing studies, which is much better than the similarity we measured with proteins. Conclusions: We presented a 'systems biology' method to identify, characterize, analyze and validate panel biomarkers in breast cancer proteomics data, which includes 1) t-statistics and a permutation process, 2) network, pathway and function annotation analysis, and 3) cross-validation across multiple studies. Our results showed that the systems biology approach is essential to understanding the molecular mechanisms of panel protein biomarkers. © 2010 Zhang and Chen; licensee BioMed Central Ltd.
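
The combination of a two-sample t-statistic with a permutation procedure, named as the first step of this method, can be sketched for a single protein as follows. The abstract does not say which form of the t-statistic was used, so this example assumes Welch's unequal-variance form and a two-sided test; the function names, the number of permutations, the fixed seed, and the toy intensity values are hypothetical.

```python
import math
import random


def t_statistic(a, b):
    """Welch's two-sample t-statistic (an assumed choice of t-statistic)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    if denom == 0.0:
        return 0.0 if mean_a == mean_b else math.inf
    return (mean_a - mean_b) / denom


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided p-value obtained by randomly reassigning the group labels."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if permuted >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy intensities for one hypothetical protein (not data from the study).
healthy = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
cancer = [2.0, 2.3, 1.9, 2.1, 2.2, 2.4]
p_value = permutation_p_value(healthy, cancer)
```

In a proteome-wide screen this test would be repeated for every protein, so the resulting p-values would also need a multiple-testing correction before a list of differentially expressed proteins is reported.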

Naylor S.,Predictive Physiology and Medicine PPM Inc. | Chen J.Y.,Indiana University | Chen J.Y.,Indiana Center for Systems Biology and Personalized Medicine | Chen J.Y.,Purdue University
Personalized Medicine | Year: 2010

We are all perplexed that current medical practice often appears maladroit in curing our individual illnesses or diseases. However, as is often the case, a lack of understanding, tools and technologies is the root cause of such situations. Human individuality is an often-quoted term but, in the context of human biology, it is poorly understood. This is compounded when there is a need to consider the variability of human populations. In the case of the former, it is possible to quantify human complexity as determined by the 35,000 genes of the human genome, the 1-10 million proteins (including antibodies) and the 2000-3000 metabolites of the human metabolome. Human variability is much more difficult to assess, since many of the variables, such as the definition of race, are not even clearly agreed on. In order to accommodate human complexity, variability and its influence on health and disease, it is necessary to undertake a systematic approach. In the past decade, the emergence of analytical platforms and bioinformatics tools has led to the development of systems biology. Such an approach offers enormous potential in defining key pathways and networks involved in optimal human health, as well as disease onset, progression and treatment. The tools and technologies now available in systems biology analyses offer exciting opportunities to exploit the emerging areas of personalized medicine. In this article, we discuss the current status of human complexity, and how systems biology and personalized medicine can have an impact at the individual and population level. © 2010 Future Medicine Ltd.

Li J.,Indiana University – Purdue University Indianapolis | Zhang F.,Indiana University | Zhang F.,Indiana Center for Systems Biology and Personalized Medicine | Chen J.Y.,Indiana University | And 2 more authors.
BMC Systems Biology | Year: 2011

Bone cells can sense physical forces and convert mechanical stimulation into biochemical signals that lead to expression of mechanically sensitive genes and proteins. However, it is still poorly understood how genes and proteins in bone cells are orchestrated to respond to mechanical stimulation. In this research, we applied integrated proteomics, statistical, and network biology techniques to study proteome-level changes in bone tissue cells in response to two different conditions, normal loading and fatigue loading. We harvested ulna midshafts and isolated proteins from control, loaded, and fatigue-loaded rats. Using a label-free liquid chromatography tandem mass spectrometry (LC-MS/MS) experimental proteomics technique, we derived a comprehensive list of 1,058 proteins that are differentially expressed among normal loading, fatigue loading, and controls. By carefully developing protein selection filters and statistical models, we were able to identify 42 proteins representing 21 rat genes that were significantly associated with bone cells' response to quantitative changes between normal loading and fatigue loading conditions. We further applied network biology techniques by building a fatigue loading activated protein-protein interaction subnetwork involving the human homologs of 9 of the 21 rat genes in a large connected network component. Our study shows that the combination of a decreased anti-apoptotic factor, Raf1, and an increased pro-apoptotic factor, PDCD8, results in a significant increase in the number of apoptotic osteocytes following fatigue loading. We believe controlling osteoblast differentiation/proliferation and osteocyte apoptosis could be promising directions for developing future therapeutic solutions for related bone diseases. © 2011 Li et al.

Zhang F.,Indiana University | Zhang F.,Indiana Center for Systems Biology and Personalized Medicine | Chen J.Y.,Indiana University | Chen J.Y.,Purdue University | Chen J.Y.,Indiana Center for Systems Biology and Personalized Medicine
BMC Bioinformatics | Year: 2011

Background: Each organ has a specific function in the body. "Organ-specificity" refers to differential expression of the same gene across different organs. An organ-specific gene/protein is defined as a gene/protein whose expression is significantly elevated in a specific human organ. An "organ-specific marker" is defined as an organ-specific gene/protein that is also implicated in human diseases related to the organ. Previous studies have shown that identifying the organ in which a gene or protein is significantly differentially expressed can lead to discovery of its function. Most currently available resources for organ-specific genes/proteins either allow users to access tissue-specific expression over a limited range of organs, or do not contain disease information such as disease-organ and disease-gene relationships. Results: We designed an integrated Human Organ-specific Molecular Electronic Repository (HOMER), defining human organ-specific genes/proteins, based on five criteria: 1) comprehensive organ coverage; 2) gene/protein to disease association; 3) disease-organ association; 4) quantification of organ-specificity; and 5) cross-linking of multiple available data sources. HOMER is a comprehensive database covering about 22,598 proteins, 52 organs, and 4,290 diseases integrated and filtered from organ-specific protein/gene and disease databases such as dbEST, TiSGeD, HPA, CTD, and Disease Ontology.
The database has a Web-based user interface that allows users to find organ-specific genes/proteins by gene, protein, organ or disease, to explore the histogram of an organ-specific gene/protein, and to identify disease-related organ-specific genes by browsing the disease data online. Moreover, the quality of the database was validated by comparison with other known databases and with two case studies: 1) an association analysis of organ-specific genes with disease and 2) a gene set enrichment analysis of organ-specific gene expression data. Conclusions: HOMER is a new resource for analyzing, identifying, and characterizing organ-specific molecules in association with disease-organ and disease-gene relationships. The statistical method we developed for organ-specific gene identification can be applied to other organisms. The current HOMER database can successfully answer a variety of questions related to organ specificity in human diseases and can help researchers in discovering and characterizing organ-specific genes/proteins with disease relevance. © 2011 Zhang and Chen; licensee BioMed Central Ltd.
