Wang L.,Clemson University | Wang L.,Jc Self Research Institute Of Human Genetics | Huang C.,Clemson University | Yang M.Q.,Purdue University | And 4 more authors.
BMC Systems Biology | Year: 2010

Background: Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA or RNA-binding residues in proteins. Protein sequence features, including the biochemical properties of amino acids and evolutionary information in terms of the position-specific scoring matrix (PSSM), have been used for DNA or RNA-binding site prediction. However, PSSM was designed primarily for PSI-BLAST searches, and it may not contain all the evolutionary information needed for modelling DNA or RNA-binding sites in protein sequences. Results: In the present study, several new descriptors of evolutionary information have been developed and evaluated for sequence-based prediction of DNA and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors and PSSM, suggesting that they captured different aspects of evolutionary information for DNA and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction. Conclusions: Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have thus developed a web-based tool called BindN+ (http://bioinfo.ggc.org/bindn+/) to make the SVM classifiers accessible to the research community. © 2010 Wang et al; licensee BioMed Central Ltd.


Rollins J.D.,Jc Self Research Institute Of Human Genetics | Collins J.S.,Jc Self Research Institute Of Human Genetics | Holden K.R.,Jc Self Research Institute Of Human Genetics | Holden K.R.,Medical University of South Carolina
Journal of Pediatrics | Year: 2010

Objective: To produce a more reliable, continuous set of occipitofrontal head circumference (OFC) growth reference charts for males and females from birth to adulthood in the United States. Study design: After investigating the strengths and shortcomings of previous reports, we combined the most recent statistically reliable reports of OFC growth reference data into a locally weighted regression analysis to estimate percentile curves. We used cross-sectional prospective local pediatric data to validate our results. Results: We present new age- and sex-appropriate US OFC growth charts from birth to adulthood that include 3rd and 97th percentile cutoff values. Our local pediatric data confirm that the assessment of attained OFC growth with the newly proposed charts is comparable with previous references. Conclusions: We have eliminated disagreements between multiple current references by unifying previously reported US OFC data into a single set of smoothed male and female growth reference charts from birth to adulthood. This will reduce confusion or errors in interpretation of normal versus abnormal measurements currently encountered by primary care clinicians and subspecialists when using OFC growth charts for the US pediatric population. © 2010 Mosby, Inc. All rights reserved.


Teng S.,Clemson University | Srivastava A.K.,Clemson University | Srivastava A.K.,Jc Self Research Institute Of Human Genetics | Wang L.,Clemson University | Wang L.,Jc Self Research Institute Of Human Genetics
BMC Genomics | Year: 2010

Background: Protein destabilization is a common mechanism by which amino acid substitutions cause human diseases. Although several machine learning methods have been reported for predicting protein stability changes upon amino acid substitutions, the previous studies did not utilize relevant sequence features representing biological knowledge for classifier construction. Results: In this study, a new machine learning method has been developed for sequence feature-based prediction of protein stability changes upon amino acid substitutions. Support vector machines were trained with data from experimental studies on the free energy change of protein stability upon mutations. To construct accurate classifiers, twenty sequence features were examined for input vector encoding. It was shown that classifier performance varied significantly by using different sequence features. The most accurate classifier in this study was constructed using a combination of six sequence features. This classifier achieved an overall accuracy of 84.59% with 70.29% sensitivity and 90.98% specificity. Conclusions: Relevant sequence features can be used to accurately predict protein stability changes upon amino acid substitutions. Predictive results at this level of accuracy may provide useful information to distinguish between deleterious and tolerant alterations in disease candidate genes. To make the classifier accessible to the genetics research community, we have developed a new web server, called MuStab (http://bioinfo.ggc.org/mustab/). © 2010 Wang et al; licensee BioMed Central Ltd.


Wang L.,Clemson University | Srivastava A.K.,Jc Self Research Institute Of Human Genetics | Schwartz C.E.,Jc Self Research Institute Of Human Genetics
BMC Genomics | Year: 2010

Background: Microarray gene expression data are accumulating in public databases. The expression profiles contain valuable information for understanding human gene expression patterns. However, the effective use of public microarray data requires integrating the expression profiles from heterogeneous sources. Results: In this study, we have compiled a compendium of microarray expression profiles of various human tissue samples. The microarray raw data generated in different research laboratories have been obtained and combined into a single dataset after data normalization and transformation. To demonstrate the usefulness of the integrated microarray data for studying human gene expression patterns, we have analyzed the dataset to identify potential tissue-selective genes. A new method has been proposed for genome-wide identification of tissue-selective gene targets using both microarray intensity values and detection calls. The candidate genes for brain, liver and testis-selective expression have been examined, and the results suggest that our approach can select some interesting gene targets for further experimental studies. Conclusion: A computational approach has been developed in this study for combining microarray expression profiles from heterogeneous sources. The integrated microarray data can be used to investigate tissue-selective expression patterns of human genes. © 2010 Wang et al; licensee BioMed Central Ltd.


Zhang Z.,Clemson University | Teng S.,Clemson University | Wang L.,Clemson University | Wang L.,Jc Self Research Institute Of Human Genetics | And 3 more authors.
Human Mutation | Year: 2010

The Snyder-Robinson syndrome is caused by missense mutations in the spermine synthase gene that encodes a protein (SMS) of 529 amino acids. Here we investigate, in silico, the molecular effect of three missense mutations, c.267G>A (p.G56S), c.496T>G (p.V132G), and c.550T>C (p.I150T) in SMS that were clinically identified to cause the disease. Single-point energy calculations, molecular dynamics simulations, and pKa calculations revealed the effects of these mutations on SMS's stability, flexibility, and interactions. It was predicted that the catalytic residue, Asp276, should be protonated prior to binding the substrates. The pKa calculations indicated that the p.I150T mutation causes pKa changes with respect to the wild-type SMS, which involve titratable residues interacting with the S-methyl-5′-thioadenosine (MTA) substrate. The p.I150T missense mutation was also found to decrease the stability of the C-terminal domain and to induce structural changes in the vicinity of the MTA binding site. The other two missense mutations, p.G56S and p.V132G, are located away from the active site and do not perturb its wild-type properties, but affect the stability of both the monomers and the dimer. Specifically, the p.G56S mutation is predicted to greatly reduce the affinity of monomers to form a dimer, and therefore should have a dramatic effect on SMS function because dimerization is essential for SMS activity. © 2010 Wiley-Liss, Inc.


Lubs H.A.,JC Self Research Institute of Human Genetics | Stevenson R.E.,JC Self Research Institute of Human Genetics | Schwartz C.E.,JC Self Research Institute of Human Genetics
American Journal of Human Genetics | Year: 2012

X-linked intellectual disability (XLID) accounts for 5%-10% of intellectual disability in males. Over 150 syndromes, the most common of which is the fragile X syndrome, have been described. A large number of families with nonsyndromal XLID, 95 of which have been regionally mapped, have been described as well. Mutations in 102 X-linked genes have been associated with 81 of these XLID syndromes and with 35 of the regionally mapped families with nonsyndromal XLID. Identification of these genes has enabled considerable reclassification and better understanding of the biological basis of XLID. At the same time, it has improved the clinical diagnosis of XLID and allowed for carrier detection and prevention strategies through gamete donation, prenatal diagnosis, and genetic counseling. Progress in delineating XLID has far outpaced the efforts to understand the genetic basis for autosomal intellectual disability. In large measure, this has been because of the relative ease of identifying families with XLID and finding the responsible mutations, as well as the determined and interactive efforts of a small group of researchers worldwide. © 2012 The American Society of Human Genetics.


Teng S.,Clemson University | Luo H.,Clemson University | Wang L.,Clemson University | Wang L.,Jc Self Research Institute Of Human Genetics
Amino Acids | Year: 2012

Protein sumoylation is a post-translational modification that plays an important role in a wide range of cellular processes. Small ubiquitin-related modifier (SUMO) can be covalently and reversibly conjugated to the sumoylation sites of target proteins, many of which are implicated in various human genetic disorders. The accurate prediction of protein sumoylation sites may help biomedical researchers to design their experiments and understand the molecular mechanism of protein sumoylation. In this study, a new machine learning approach has been developed for predicting sumoylation sites from protein sequence information. Random forests (RFs) and support vector machines (SVMs) were trained with the data collected from the literature. Domain-specific knowledge in terms of relevant biological features was used for input vector encoding. It was shown that RF classifier performance was affected by the sequence context of sumoylation sites, and 20 residues with the core motif ΨKXE in the middle appeared to provide enough context information for sumoylation site prediction. The RF classifiers were also found to outperform SVM models for predicting protein sumoylation sites from sequence features. The results suggest that the machine learning approach gives rise to more accurate prediction of protein sumoylation sites than other existing methods. The accurate classifiers have been used to develop a new web server, called seeSUMO (http://bioinfo.ggc.org/seesumo/), for sequence-based prediction of protein sumoylation sites. © 2011 Springer-Verlag.


Witham S.,Clemson University | Takano K.,Jc Self Research Institute Of Human Genetics | Schwartz C.,Jc Self Research Institute Of Human Genetics | Schwartz C.,Clemson University | Alexov E.,Clemson University
Proteins: Structure, Function and Bioinformatics | Year: 2011

Large-scale next-generation resequencing of X chromosome genes identified a missense mutation, p.H101Q, in the CLIC2 gene on Xq28 in a male with X-linked intellectual disability (XLID); the mutation was not found in healthy individuals. At the same time, numerous nsSNPs (nonsynonymous SNPs) have been reported in the CLIC2 gene in healthy individuals, indicating that the CLIC2 protein can tolerate amino acid substitutions and remain fully functional. To test the possibility that p.H101Q is a disease-causing mutation, we performed in silico simulations to calculate the effects of the p.H101Q mutation on CLIC2 stability, dynamics, and ionization states while comparing the effects obtained for presumably harmless nsSNPs. It was found that p.H101Q, in contrast with other nsSNPs, (a) lessens the flexibility of the joint loop, which is important for the normal function of CLIC2, (b) makes the overall 3D structure of CLIC2 more stable and thus reduces the possibility of the large conformational change expected to occur when CLIC2 moves from a soluble to a membrane form, and (c) removes the positively charged residue, H101, which may be important for the membrane association of CLIC2. The results of in silico modeling, in conjunction with the polymorphism analysis, suggest that p.H101Q may be a disease-causing mutation, the first one suggested in the CLIC family. © 2011 Wiley-Liss, Inc.


Zhang Z.,Clemson University | Norris J.,Jc Self Research Institute Of Human Genetics | Schwartz C.,Jc Self Research Institute Of Human Genetics | Schwartz C.,Clemson University | Alexov E.,Clemson University
PLoS ONE | Year: 2011

Background: Spermine synthase (SMS) is a key enzyme controlling the concentration of spermidine and spermine in the cell. The importance of SMS is manifested by the fact that single missense mutations were found to cause Snyder-Robinson Syndrome (SRS). At the same time, there are currently no non-synonymous single nucleotide polymorphisms, nsSNPs (harmless mutations), found in SMS, which may imply that SMS does not tolerate amino acid substitutions, i.e., is not mutable. Methodology/Principal Findings: To investigate the mutability of SMS, we carried out in silico analysis and in vitro experiments on the effects of amino acid substitutions at the missense mutation sites (G56, V132 and I150) that have been shown to cause SRS. Our investigation showed that the mutation sites have different degrees of mutability depending on their structural micro-environment and involvement in the function and structural integrity of SMS. It was found that the I150 site does not tolerate any mutation, while V132, despite its key position at the interface of the SMS dimer, is quite mutable. The G56 site is in the middle of the spectrum, but is still quite sensitive to replacement by charged residues. Conclusions/Significance: The performed analysis showed that mutability depends on the details of the structural and functional factors and cannot be predicted based on conservation of wild-type properties alone. Also, harmless nsSNPs can be expected to occur even at sites at which missense mutations were found to cause diseases.


Teng S.,Clemson University | Yang J.Y.,Harvard University | Wang L.,Clemson University | Wang L.,Jc Self Research Institute Of Human Genetics
BMC Medical Genomics | Year: 2013

Background: Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. Accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results: In this study, we have developed a machine learning approach for predicting human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from the UniProt database, and the expression data retrieved from the previously compiled dataset according to these lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied to identify tissue-selective gene targets for different types of tissues. Conclusions: A machine learning approach has been developed for accurately identifying candidate genes for tissue-specific/selective expression. The approach provides an efficient way to select interesting genes for developing new biomedical markers and improving our knowledge of tissue-specific expression. © 2013 Teng et al.; licensee BioMed Central Ltd.
