Gordon Life Science Institute

San Diego, CA, United States

Chou K.-C.,Shanghai JiaoTong University | Chou K.-C.,Gordon Life Science Institute | Shen H.-B.,Shanghai JiaoTong University | Shen H.-B.,Gordon Life Science Institute
PLoS ONE | Year: 2010

One of the fundamental goals in proteomics and cell biology is to identify the functions of proteins in various cellular organelles and pathways. Information on the subcellular locations of proteins can provide useful insights for revealing their functions and understanding how they interact with each other in cellular network systems. Most of the existing methods for predicting plant protein subcellular localization can only cover three or four location sites, and none of them can be used to deal with multiplex plant proteins that can simultaneously exist at, or move between, two or more different location sites. Actually, such multiplex proteins might have special biological functions worthy of particular notice. The present study was devoted to improving the existing plant protein subcellular location predictors from the aforementioned two aspects. A new predictor called "Plant-mPLoc" is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify plant proteins among the following 12 location sites: (1) cell membrane, (2) cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6) extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome, (11) plastid, and (12) vacuole. Compared with the existing methods for predicting plant protein subcellular localization, the new predictor is much more powerful and flexible. In particular, it also has the capacity to deal with multiple-location proteins, which is beyond the reach of any existing predictor specialized for identifying plant protein subcellular localization. As a user-friendly web-server, Plant-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/plan-multi/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. It is anticipated that the Plant-mPLoc predictor as presented in this paper will become a very useful tool in plant science as well as all the relevant areas. © 2010 Chou, Shen.


Chen W.,Hebei United University | Chen W.,Gordon Life Science Institute | Feng P.-M.,Hebei United University | Lin H.,University of Electronic Science and Technology of China | Chou K.-C.,Gordon Life Science Institute
Nucleic Acids Research | Year: 2013

Meiotic recombination is an important biological process. As a main driving force of evolution, recombination provides natural new combinations of genetic variations. Rather than occurring randomly across a genome, meiotic recombination takes place in some genomic regions (the so-called 'hotspots') with higher frequencies, and in other regions (the so-called 'coldspots') with lower frequencies. Therefore, information about the hotspots and coldspots would provide useful insights for in-depth study of the mechanism of recombination and of the genome evolution process as well. So far, the recombination regions have been mainly determined by experiments, which are both expensive and time-consuming. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying the recombination regions. In this study, a predictor called 'iRSpot-PseDNC' was developed for identifying the recombination hotspots and coldspots. In the new predictor, the samples of DNA sequences are formulated by a novel feature vector, the so-called 'pseudo dinucleotide composition' (PseDNC), into which six local DNA structural properties, i.e. three angular parameters (twist, tilt and roll) and three translational parameters (shift, slide and rise), are incorporated. The rigorous jackknife test showed that the overall success rate achieved by iRSpot-PseDNC was >82% in identifying recombination spots in Saccharomyces cerevisiae, indicating that the new predictor is promising or at least may become a complementary tool to the existing methods in this area. Although the benchmark data set used to train and test the current method was from S. cerevisiae, the basic approaches can also be extended to deal with all other genomes. In particular, it has not escaped our notice that the PseDNC approach can also be used to study many other DNA-related problems. As a user-friendly web-server, iRSpot-PseDNC is freely accessible at http://lin.uestc.edu.cn/server/iRSpot-PseDNC. © The Author(s) 2013. Published by Oxford University Press.
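
To make the feature-vector idea concrete, here is a minimal Python sketch of the plain dinucleotide composition, i.e. the 16 normalized dinucleotide frequencies that a pseudo dinucleotide composition extends. It is an illustration only: the abstract does not give the PseDNC formula or the structural-property values, so the pseudo components are deliberately left out, and the function name and example sequence are hypothetical.

```python
from itertools import product


def dinucleotide_composition(seq):
    """Return the normalized frequencies of the 16 dinucleotides in seq.

    Illustrative only: this is the conventional dinucleotide composition,
    not the full PseDNC vector (the structural-property terms are omitted).
    """
    seq = seq.upper()
    keys = ["".join(p) for p in product("ACGT", repeat=2)]
    counts = dict.fromkeys(keys, 0)
    # Slide a window of width 2 along the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid dinucleotide")
    return {k: counts[k] / total for k in keys}


# Hypothetical toy sequence, not taken from the benchmark data set.
freqs = dinucleotide_composition("ACGTACGT")
```

For the toy sequence there are seven overlapping dinucleotides, so `freqs["AC"]` is 2/7 and the 16 values sum to 1.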


Lin H.,University of Electronic Science and Technology of China | Lin H.,Gordon Life Science Institute | Deng E.-Z.,University of Electronic Science and Technology of China | Ding H.,University of Electronic Science and Technology of China | And 4 more authors.
Nucleic Acids Research | Year: 2014

The σ54 promoters are unique in prokaryotic genomes and are responsible for transcribing carbon- and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called 'iPro54-PseKNC' was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called 'pseudo k-tuple nucleotide composition', which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC. For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites was governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters. © 2014 The Author(s).
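
As a rough illustration of what describing distances with a gamma distribution involves, the sketch below estimates gamma shape and scale parameters by the method of moments (shape = mean²/variance, scale = variance/mean). This is an assumed, simplified fitting procedure, not necessarily the one used in the paper, and the distance values are hypothetical placeholders rather than real data.

```python
from statistics import mean, variance


def gamma_moments_fit(samples):
    """Estimate gamma (shape, scale) from samples by the method of moments."""
    m = mean(samples)
    v = variance(samples)  # unbiased sample variance
    if m <= 0 or v <= 0:
        raise ValueError("need positive mean and variance for a gamma fit")
    shape = m * m / v
    scale = v / m
    return shape, scale


# Hypothetical distances (in nucleotides), for illustration only.
distances = [12.0, 25.0, 31.0, 18.0, 40.0, 22.0, 27.0, 35.0]
shape, scale = gamma_moments_fit(distances)
```

By construction, `shape * scale` reproduces the sample mean, which is a quick sanity check on the estimates.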


Wang J.-F.,Shanghai JiaoTong University | Wang J.-F.,Shanghai Center for Bioinformation and Technology | Chou K.-C.,Gordon Life Science Institute
PLoS ONE | Year: 2012

Human mitochondrial ornithine transporter-1 is reported to be associated with the hyperornithinemia-hyperammonemia-homocitrullinuria (HHH) syndrome, which is a rare autosomal recessive disorder. For an in-depth understanding of the molecular mechanism of the disease, it is crucially important to acquire the 3D structure of human mitochondrial ornithine transporter-1. Since no such structure is available in the current protein structure database, we have developed it via computational approaches based on the recent NMR structure of human mitochondrial uncoupling protein (Berardi MJ, Chou JJ, et al. Nature 2011, 476:109-113). Subsequently, we docked the ligand L-ornithine into the computational structure to search for the favorable binding mode. It was observed that the most favorable binding mode features six remarkable hydrogen bonds between the receptor and ligand, and that it shares the same ligand-binding site with most of the homologous mitochondrial carriers from different organisms, implying that the ligand-binding sites are quite conserved in the mitochondrial carrier family although their sequence similarity is very low, at about 20%. Moreover, according to our structural analysis, the relationship between the disease-causing mutations of human mitochondrial ornithine transporter-1 and the HHH syndrome can be classified into the following three categories: (i) the mutation occurs in the pseudo-repeat regions so as to change the region of the protein closer to the mitochondrial matrix; (ii) the mutation directly affects the substrate binding pocket so as to reduce the substrate binding affinity; (iii) the mutation is located in the structural region closer to the intermembrane space and can significantly break the salt bridge networks of the protein. These findings may provide useful insights for in-depth understanding of the molecular mechanism of the HHH syndrome and developing effective drugs against the disease. © 2012 Wang, Chou.


Zhou G.-P.,Gordon Life Science Institute | Zhou G.-P.,North Carolina State University
Journal of Theoretical Biology | Year: 2011

The wenxiang diagram is a new two-dimensional representation that characterizes the disposition of hydrophobic and hydrophilic residues in α-helices. In this research, the hydrophobic and hydrophilic residues of two leucine zipper coiled-coil (LZCC) structural proteins, cGKIα 1-59 and MBS CT35, are placed on wenxiang diagrams according to the heptad repeat pattern (abcdefg)n. Their wenxiang diagrams clearly demonstrate that residues with the same repeat letters lie on the same side of the spiral diagrams, where most hydrophobic residues are positioned at a and d, and most hydrophilic residues are located at the polar positions b, c, e, f and g. The wenxiang diagram of a dimeric LZCC can be represented by the combination of two monomeric wenxiang diagrams, and the wenxiang diagrams of the two LZCC (tetramer) complex structures can also be assembled by using two pairs of their wenxiang diagrams. Furthermore, by comparing the wenxiang diagrams of cGKIα 1-59 and MBS CT35, the interaction between cGKIα 1-59 and MBS CT35 is suggested to be weaker. By analyzing the wenxiang diagram of the cGKIα 1-59.MBS CT42 complex structure, the residues of cGKIα 1-59 most affected by the interaction with MBS CT42 are proposed to be at positions d, a, e and g of the LZCC structure. These findings are consistent with our previous NMR results. Combined with NMR spectroscopy, the wenxiang diagrams of LZCC structures may provide novel insights into the interaction mechanisms between dimeric, trimeric, tetrameric coiled-coil structures. © 2011 Elsevier Ltd.
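
The heptad assignment described above can be sketched in a few lines of Python. The example only labels residues with the repeat letters a to g and checks how many a/d positions are hydrophobic; it does not draw the spiral wenxiang diagram itself. The hydrophobic residue set, the function names, and the toy peptide are assumptions made for illustration, not data from the paper.

```python
# Assumption: one commonly used set of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")


def heptad_register(seq, start="a"):
    """Pair each residue with its heptad letter, cycling through abcdefg."""
    letters = "abcdefg"
    offset = letters.index(start)
    return [(res, letters[(offset + i) % 7]) for i, res in enumerate(seq)]


def core_hydrophobic_fraction(seq, start="a"):
    """Fraction of residues at heptad positions a and d that are hydrophobic."""
    core = [res for res, pos in heptad_register(seq, start) if pos in "ad"]
    if not core:
        raise ValueError("sequence has no a/d positions")
    return sum(res in HYDROPHOBIC for res in core) / len(core)


# Hypothetical two-heptad peptide, not cGKIα 1-59 or MBS CT35.
toy_peptide = "LEELKKKLEELKKK"
register = heptad_register(toy_peptide)
fraction = core_hydrophobic_fraction(toy_peptide)
```

In this toy peptide every a and d position holds a leucine, so the hydrophobic fraction at the core positions is 1.0; shifting the register with `start="b"` moves charged residues onto a and d instead.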


Chou K.-C.,Gordon Life Science Institute
Current Drug Metabolism | Year: 2010

Using graphic rules to deal with kinetic systems is an elegant approach that combines graph representation (schematic representation) with rigorous mathematical derivation. It has the following advantages: (1) providing an intuitive picture or illuminative insights; (2) helping grasp the key points from complicated details; (3) greatly simplifying many tedious, laborious, and error-prone calculations; and (4) making it possible to double-check the final results. In this mini review, the non-steady state graphic rule in enzyme-catalyzed kinetics and protein-folding kinetics was extended to cover drug metabolic systems. As a demonstration, a step-by-step illustration is presented showing how to use the graphic rule to derive the concentrations of the parent drug and its metabolites vs. time for the seliciclib, vildagliptin, and cyclin-dependent kinase inhibitor (AG-024322) metabolic systems, respectively. It can be seen from these paradigms that the graphic rule is particularly useful for analyzing complicated drug metabolic systems and ensuring the correctness of the derived results. Meanwhile, the intuitive feature of graphic representation may facilitate analyzing and classifying drug metabolic systems; e.g., according to their directed graphs, the metabolism of seliciclib and the metabolism of vildagliptin can be categorized as 0 → 5 mechanism while that of AG-024322 as 0 → 4 → 3 mechanism. © 2010 Bentham Science Publishers Ltd.


Chou K.-C.,Gordon Life Science Institute | Wu Z.-C.,Jing de Zhen Ceramic Institute | Xiao X.,Gordon Life Science Institute | Xiao X.,Jing de Zhen Ceramic Institute
PLoS ONE | Year: 2011

Predicting protein subcellular localization is an important and difficult problem, particularly when query proteins may have the multiplex character, i.e., simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular location predictors can only be used to deal with the single-location or "singleplex" proteins. Actually, multiple-location or "multiplex" proteins should not be ignored because they usually possess some unique biological functions worthy of our special notice. By introducing the "multi-labeled learning" and "accumulation-layer scale", a new predictor, called iLoc-Euk, has been developed that can be used to deal with the systems containing both singleplex and multiplex proteins. As a demonstration, the jackknife cross-validation was performed with iLoc-Euk on a benchmark dataset of eukaryotic proteins classified into the following 22 location sites: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centriole, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole, where none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset. The overall success rate thus obtained by iLoc-Euk was 79%, which is significantly higher than that by any of the existing predictors that also have the capacity to deal with such a complicated and stringent system. As a user-friendly web-server, iLoc-Euk is freely accessible to the public at the web-site http://icpr.jci.edu.cn/bioinfo/iLoc-Euk. It is anticipated that iLoc-Euk may become a useful bioinformatics tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development. Also, its novel approach will further stimulate the development of predicting other protein attributes. © 2011 Chou et al.
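
The jackknife (leave-one-out) cross-validation mentioned above can be sketched as follows: each sample is held out in turn, the predictor is built from the remaining samples, and the held-out sample is then predicted. The sketch is simplified to single-label data and uses a 1-nearest-neighbor rule as a stand-in classifier; it is not iLoc-Euk's engine, and the feature vectors and labels are hypothetical.

```python
def jackknife_success_rate(samples, labels, predict):
    """Leave-one-out success rate: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(samples)):
        # Train on everything except sample i.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier: return the label of the closest training sample."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]


# Hypothetical 2-D feature vectors forming two well-separated groups.
X = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (1.0, 0.9), (0.9, 1.0), (1.1, 1.0)]
y = ["nucleus", "nucleus", "nucleus", "cytoplasm", "cytoplasm", "cytoplasm"]
rate = jackknife_success_rate(X, y, nearest_neighbor)
```

Because every held-out point still has a neighbor from its own group, the toy data give a success rate of 1.0; real benchmark data with low sequence identity would be far harder.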


Chou K.-C.,Gordon Life Science Institute | Wu Z.-C.,Jing de Zhen Ceramic Institute | Xiao X.,Gordon Life Science Institute | Xiao X.,Jing de Zhen Ceramic Institute
Molecular BioSystems | Year: 2012

Although numerous efforts have been made for predicting the subcellular locations of proteins based on their sequence information, it still remains a challenging problem, particularly when query proteins may have the multiplex character, i.e., they simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing methods were established on the assumption that a protein has one, and only one, subcellular location. Actually, recent evidence has indicated an increasing number of human proteins having multiple subcellular locations. Multiplex proteins of this kind should not be ignored because they may bear some special biological functions worthy of our attention. Based on the accumulation-label scale, a new predictor, called iLoc-Hum, was developed for identifying the subcellular localization of human proteins with both single and multiple location sites. As a demonstration, the jackknife cross-validation was performed with iLoc-Hum on a benchmark dataset of human proteins that covers the following 14 location sites: centrosome, cytoplasm, cytoskeleton, endoplasmic reticulum, endosome, extracellular, Golgi apparatus, lysosome, microsome, mitochondrion, nucleus, peroxisome, plasma membrane, and synapse, where some proteins belong to two, three or four locations but none has 25% or higher pairwise sequence identity to any other in the same subset. For such a complicated and stringent system, the overall success rate achieved by iLoc-Hum was 76%, which is remarkably higher than that by any of the existing predictors that also have the capacity to deal with this kind of system. Further comparisons were also made via two independent datasets; all indicated that the success rates by iLoc-Hum were even more significantly higher than those of its counterparts. As a user-friendly web-server, iLoc-Hum is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Hum or http://www.jci-bioinfo.cn/iLoc-Hum. For the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results by choosing either a straightforward submission or a batch submission, without the need to follow the complicated mathematical equations involved. © 2012 The Royal Society of Chemistry.


Chou K.-C.,Gordon Life Science Institute | Chou K.-C.,King Abdulaziz University
Molecular BioSystems | Year: 2013

Many molecular biosystems and biomedical systems belong to the multi-label systems in which each of their constituent molecules possesses one or more than one function or feature, and hence needs one or more than one label to indicate its attribute(s). With the avalanche of biological sequences generated in the post genomic age, it is highly desirable to develop computational methods to identify their various kinds of attributes in a timely and reliable manner. Compared with the single-label systems, the multi-label systems are much more complicated and difficult to deal with. The current mini review focuses on the recent progress in this area from both conceptual aspects and detailed mathematical formulations. © 2013 The Royal Society of Chemistry.
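
To illustrate why multi-label systems are harder to score than single-label ones, the sketch below represents each molecule's attributes as a set of labels and computes two standard multi-label measures: the exact-match rate and the mean Jaccard overlap between true and predicted label sets. These are generic metrics chosen for illustration and are not claimed to be the formulations given in the review; the label sets are hypothetical.

```python
def exact_match_rate(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    hits = sum(t == p for t, p in zip(true_sets, pred_sets))
    return hits / len(true_sets)


def mean_jaccard(true_sets, pred_sets):
    """Average of |true ∩ pred| / |true ∪ pred| over all samples."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        # Two empty sets agree perfectly, so score them as 1.0.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Hypothetical true and predicted label sets for three proteins.
true_labels = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}]
pred_labels = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion", "nucleus"}]

strict = exact_match_rate(true_labels, pred_labels)
partial = mean_jaccard(true_labels, pred_labels)
```

Here only the first prediction is entirely correct, so the exact-match rate is 1/3, while the Jaccard measure gives partial credit to the other two and comes out at 2/3.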


Chou K.-C.,Gordon Life Science Institute
Journal of Theoretical Biology | Year: 2011

With the accomplishment of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace is much slower in determining their biological attributes. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This unbalanced situation, which has critically limited our ability to utilize the newly discovered proteins for basic research and drug development in a timely manner, has called for developing computational methods or high-throughput automated tools for quickly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. Actually, during the last two decades or so, many methods in this regard have been established in the hope of bridging such a gap. In the course of developing these methods, the following things often needed to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications as well as its recent development, particularly in how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences. © 2010 Elsevier Ltd.
