Ruan J.,Chern Institute of Mathematics
Journal of Theoretical Biology | Year: 2010
In principle, structural information about protein sequences with no detectable homology to a protein of known structure could be obtained by predicting the arrangement of their secondary structural elements. Although some ab initio methods for protein structure prediction have been reported, the long-range interactions required to accurately predict the tertiary structures of β-sheet-containing proteins remain difficult to simulate. To address this problem and facilitate de novo prediction of β-sheet-containing protein structures, we developed a support vector machine (SVM) approach that classifies the parallel or antiparallel orientation of β-strands using information on interstrand amino acid pairing preferences. Based on a second-order statistic of the relative frequencies of each possible interstrand amino acid pair, we defined an average amino acid pairing encoding matrix (APEM) to encode β-strands as input to the prediction model. As a result, a prediction accuracy of 86.89% and a Matthews correlation coefficient of 0.71 were achieved through 7-fold cross-validation on a non-redundant protein dataset from PISCES. Although several issues remain to be studied, the method presented here indicates, to some extent, the important contribution of amino acid pairs to β-strand orientation, and it could be combined with other algorithms to achieve a full 'identification' of β-strands. © 2009 Elsevier Ltd. All rights reserved.
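The abstract does not give the exact APEM formula. The following minimal Python sketch illustrates one plausible form of a second-order pairing statistic and strand encoding, assuming log-odds propensities (observed pair frequency divided by the product of single-residue frequencies) and an encoding that averages each residue's propensity row. The names `pair_propensity` and `encode_strand` and the pseudocount are hypothetical; in a full pipeline one table per orientation could be built and the resulting vectors passed to a classifier such as an SVM.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pair_propensity(pairs, pseudocount=1e-6):
    """Hypothetical log-odds propensity of each interstrand residue pair (a, b).

    pairs: iterable of (a, b) one-letter codes for residues facing each other
    across two beta-strands. Each pair is counted in both directions, so the
    resulting table is symmetric.
    """
    pair_counts = Counter()
    for a, b in pairs:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("no pairs given")
    # Marginal count of each residue over all paired positions.
    single = Counter()
    for (a, _), n in pair_counts.items():
        single[a] += n
    table = {}
    for a, b in product(AMINO_ACIDS, repeat=2):
        observed = pair_counts[(a, b)] / total
        expected = (single[a] / total) * (single[b] / total)
        # Second-order statistic: observed pair frequency relative to the
        # frequency expected if the two residues paired independently.
        table[(a, b)] = math.log((observed + pseudocount) / (expected + pseudocount))
    return table

def encode_strand(strand, table):
    """Average the propensity rows of the strand's residues into a 20-dim vector."""
    rows = [[table[(res, aa)] for aa in AMINO_ACIDS] for res in strand]
    return [sum(column) / len(rows) for column in zip(*rows)]

# Toy usage with three hand-made pairs (illustrative, not real data).
toy_table = pair_propensity([("V", "I"), ("V", "I"), ("T", "T")])
toy_vector = encode_strand("VIT", toy_table)
```

Averaging rows gives every strand a fixed-length feature vector regardless of strand length, which is what a standard SVM input requires.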
Zhang H.,Zhejiang GongShang University |
Zhang H.,Nankai University |
Zhang H.,University of Alberta |
Zhang T.,Nankai University |
And 8 more authors.
Amino Acids | Year: 2012
Proteins fold either through a two-state (TS) process, with no visible intermediates, or through a multi-state (MS) process, via at least one intermediate. We analyze the sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features that combine information about amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are competitive with or better than those of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent accessibility helps discriminate between the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that an increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies. © Springer-Verlag 2010.
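As a rough illustration of the modeling step, the sketch below fits a logistic regression on six-feature vectors by batch gradient descent and scores the predictions with the Matthews correlation coefficient. It is a minimal sketch under stated assumptions: the feature values are made up, the label convention (1 = MS, 0 = TS) and the helper names `train_logistic`, `predict` and `mcc` are hypothetical, and FOKIT's actual features and training procedure are not reproduced here.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (hypothetically MS) if the predicted probability is >= 0.5, else 0 (TS)."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data: six made-up feature values per chain; label 1 = MS, 0 = TS.
X = [
    [0.9, 0.1, 0.3, 0.2, 0.5, 0.4],
    [0.8, 0.2, 0.3, 0.2, 0.5, 0.4],
    [0.1, 0.9, 0.3, 0.2, 0.5, 0.4],
    [0.2, 0.8, 0.3, 0.2, 0.5, 0.4],
]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions, mcc(y, predictions))
```

MCC is used instead of plain accuracy because it stays informative when the TS and MS classes are unbalanced. Here the score is computed on the training data only; an out-of-sample estimate would require held-out chains.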
Zhang T.,Nankai University |
Zhang T.,University of Alberta |
Zhang T.,Indiana University – Purdue University Indianapolis |
Zhang H.,Zhejiang GongShang University |
And 6 more authors.
Current Protein and Peptide Science | Year: 2010
Identification and prediction of RNA-binding residues (RBRs) provide valuable insights into the mechanisms of protein-RNA interactions. We analyzed the contributions of a wide range of factors, including amino acid sequence, evolutionary conservation, secondary structure and solvent accessibility, to the prediction and characterization of RBRs. Five feature sets were designed, and feature selection was performed to find and investigate relevant features. We demonstrate that (1) the negatively charged nucleotides prefer interactions with the positively charged amino acids Arg and Lys; (2) Gly provides flexibility to RNA-binding sites; (3) Glu, with its negatively charged side chain, and several hydrophobic residues such as Leu, Val, Ala and Phe are disfavored in RNA-binding sites; (4) coil residues, especially in long segments, are more flexible than residues in other secondary structures and more likely to interact with RNA; (5) helical residues are more rigid and consequently less likely to bind RNA; and (6) residues partially exposed to the solvent are more likely to form RNA-binding sites. We introduce a novel sequence-based predictor of RBRs, RBRpred, which uses the selected features. RBRpred was comprehensively tested on three datasets with different atom distance cutoffs using both five-fold cross-validation and jackknife tests, and it achieved Matthews correlation coefficients (MCC) of 0.51, 0.48 and 0.42, respectively. Its quality is comparable to or better than that of state-of-the-art predictors that apply the distance-based cutoff definition. We show that the most important factor for RBR prediction is evolutionary conservation, followed by the amino acid sequence, predicted secondary structure and predicted solvent accessibility. We also investigate the impact of using native versus predicted secondary structure and solvent accessibility. The predicted values are sufficient for RBR prediction, and knowledge of the actual solvent accessibility helps predictions at lower distance cutoffs.
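The distance-based cutoff definition mentioned above can be sketched as follows. Here a residue is labeled RNA-binding if any of its atoms lies within a chosen cutoff of any RNA atom. This is a minimal sketch, assuming coordinates in angstroms and hypothetical residue identifiers; the cutoff values in the usage lines are illustrative and are not the ones used for the three datasets.

```python
def label_binding_residues(protein_atoms, rna_atoms, cutoff):
    """Mark a residue as RNA-binding if any of its atoms lies within `cutoff`
    (angstroms) of any RNA atom.

    protein_atoms: dict mapping a residue identifier to a list of (x, y, z)
    atom coordinates; rna_atoms: list of (x, y, z) coordinates.
    """
    cutoff_sq = cutoff ** 2  # compare squared distances to avoid square roots
    labels = {}
    for residue, atoms in protein_atoms.items():
        labels[residue] = any(
            (ax - rx) ** 2 + (ay - ry) ** 2 + (az - rz) ** 2 <= cutoff_sq
            for ax, ay, az in atoms
            for rx, ry, rz in rna_atoms
        )
    return labels

# Hypothetical toy structure: one residue near the RNA atom, one far away.
protein = {"ARG12": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "LEU40": [(10.0, 10.0, 10.0)]}
rna = [(3.0, 0.0, 0.0)]
loose = label_binding_residues(protein, rna, cutoff=3.5)
strict = label_binding_residues(protein, rna, cutoff=1.5)
```

Tightening the cutoff relabels borderline residues (here `ARG12`) as non-binding. This is why results are reported separately for each distance cutoff.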
Danielsson U.H.,Uppsala University |
Lundgren M.,Uppsala University |
Niemi A.J.,Uppsala University |
Niemi A.J.,University of Tours |
Niemi A.J.,Chern Institute of Mathematics
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics | Year: 2010
We combine the principle of gauge invariance with extrinsic string geometry to develop a lattice model that can be used to describe theoretically the properties of chiral, unbranched homopolymers. We find that in its low-temperature phase the model is in the same universality class as the proteins deposited in the Protein Data Bank, in the sense of the compactness index. We apply the model to analyze various statistical aspects of folded proteins. Curiously, we find that it can produce results that match the data in the Protein Data Bank very well. © 2010 The American Physical Society.
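The abstract does not define the compactness index. The sketch below assumes the common polymer-physics convention that it is the exponent $\nu$ in the scaling law $R_g \propto N^{\nu}$ relating radius of gyration to chain length $N$. Under that assumption, the index is estimated as the least-squares slope of $\log R_g$ against $\log N$. The function names and the synthetic data are hypothetical, and the lattice model itself is not reproduced.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the sites (e.g. C-alpha atoms) from their centroid."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    )

def fit_compactness_index(lengths, radii):
    """Least-squares fit of log R_g = log R0 + nu * log N; returns (nu, R0)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    nu = sxy / sxx
    return nu, math.exp(my - nu * mx)

# Synthetic check: radii generated exactly as R_g = 2.2 * N**0.4 (not PDB data).
lengths = [50, 100, 200, 400]
radii = [2.2 * n ** 0.4 for n in lengths]
nu, r0 = fit_compactness_index(lengths, radii)
```

With real structures, `radii` would come from applying `radius_of_gyration` to each chain's coordinates, and the fitted `nu` from different datasets could then be compared.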
Ge M.,Chern Institute of Mathematics |
Hong J.,Fudan University |
Li T.,Fudan University |
Zhang W.,Chern Institute of Mathematics
Frontiers in Differential Geometry, Partial Differential Equations, and Mathematical Physics: In Memory of Gu Chaohao | Year: 2014
This book is a collection of papers in memory of Gu Chaohao on differential geometry, partial differential equations and mathematical physics, subjects to which Gu Chaohao made great contributions throughout his lifetime. All contributors to this book are close friends, colleagues and students of Gu Chaohao. They are all leading experts, 9 of whom are members of the Chinese Academy of Sciences. The book therefore provides important information on the frontiers of these subjects. © 2014 by World Scientific Publishing Co. Pte. Ltd. All rights reserved.