Chen L.,Shanghai Maritime University | Zeng W.-M.,Shanghai Maritime University | Cai Y.-D.,Shanghai University | Cai Y.-D.,Gordon Life Science Institute | And 3 more authors.
PLoS ONE | Year: 2012

The Anatomical Therapeutic Chemical (ATC) classification system, recommended by the World Health Organization, categorizes drugs into different classes according to their therapeutic and chemical characteristics. For a set of query compounds, how can we identify which ATC-class (or classes) they belong to? It is an important and challenging problem because the information thus obtained would be quite useful for drug development and utilization. By hybridizing the information of chemical-chemical interactions and chemical-chemical similarities, a novel method was developed for this purpose. It was observed by the jackknife test on a benchmark dataset of 3,883 drug compounds that the overall success rate achieved by the prediction method was about 73% in identifying the drugs among the following 14 main ATC-classes: (1) alimentary tract and metabolism; (2) blood and blood forming organs; (3) cardiovascular system; (4) dermatologicals; (5) genitourinary system and sex hormones; (6) systemic hormonal preparations, excluding sex hormones and insulins; (7) anti-infectives for systemic use; (8) antineoplastic and immunomodulating agents; (9) musculoskeletal system; (10) nervous system; (11) antiparasitic products, insecticides and repellents; (12) respiratory system; (13) sensory organs; (14) various. Such a success rate is substantially higher than the 7% expected from a random guess. It has not escaped our notice that the current method can be straightforwardly extended to identify the drugs for their 2nd-level, 3rd-level, 4th-level, and 5th-level ATC-classifications once statistically significant benchmark data are available for these lower levels. © 2012 Chen et al.
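As a rough illustration of the evaluation described in this abstract, the sketch below runs a jackknife (leave-one-out) test with a toy nearest-neighbor predictor and computes the random-guess baseline for 14 equally likely classes. The similarity matrix, the labels, and the function names are hypothetical, and the nearest-neighbor rule is a stand-in, not the authors' hybrid interaction/similarity method.

```python
def jackknife_success_rate(similarity, labels):
    """Leave-one-out test: each sample is predicted from all the others."""
    n = len(labels)
    correct = 0
    for i in range(n):
        # Pick the most similar *other* sample; the held-out sample is excluded.
        best = max((j for j in range(n) if j != i),
                   key=lambda j: similarity[i][j])
        if labels[best] == labels[i]:
            correct += 1
    return correct / n


def random_guess_rate(n_classes):
    """Expected success rate when one of n_classes is picked uniformly."""
    return 1.0 / n_classes


# Hypothetical toy data: four compounds, two classes ("A" and "N").
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
labels = ["A", "A", "N", "N"]

rate = jackknife_success_rate(similarity, labels)
baseline = random_guess_rate(14)  # 1/14, i.e. roughly 7%
print(f"jackknife success rate: {rate:.2f}, random guess: {baseline:.3f}")
```

With 14 classes the uniform baseline is 1/14, which is where the roughly 7% figure quoted in the abstract comes from.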


Huang T.,Shanghai University | Huang T.,CAS Shanghai Institutes for Biological Sciences | Huang T.,Shanghai Center for Bioinformation Technology | Chen L.,Shanghai Maritime University | And 3 more authors.
PLoS ONE | Year: 2011

Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) "Metabolism", (ii) "Genetic Information Processing", (iii) "Environmental Information Processing", (iv) "Cellular Processes", (v) "Organismal Systems", and (vi) "Human Diseases". The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area. © 2011 Huang et al.


Hu L.-L.,Shanghai University | Huang T.,CAS Shanghai Institutes for Biological Sciences | Huang T.,Shanghai Center for Bioinformation Technology | Cai Y.-D.,Shanghai University | And 3 more authors.
PLoS ONE | Year: 2011

Determining the body fluids into which proteins can be secreted is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which kinds of body fluids human proteins can be secreted into. For a newly constructed benchmark dataset that consists of 529 human-secreted proteins, the prediction accuracy for the most likely body fluid location predicted by our method via the jackknife test was 79.02%, significantly higher than the success rate by a random guess (29.36%). The likelihood that the predicted body fluids of the first four orders contain all the true body fluids into which the proteins can be secreted is 62.94%. Our method was further demonstrated with two independent datasets: one contains 57 proteins that can be secreted into blood, while the other contains 61 proteins that can be secreted into plasma/serum and are possible biomarkers associated with various cancers. For the 57 proteins in the first dataset, 55 were correctly predicted as blood-secreted proteins. For the 61 proteins in the second dataset, 58 were predicted to be most likely in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development. © 2011 Hu et al.
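The two figures reported in this abstract measure different things: first-order accuracy asks whether the top-ranked body fluid is a true one, while the "first four orders" figure asks whether the top four ranked fluids contain every true fluid. A minimal sketch of both metrics follows; the rankings, the truth sets, and the function names are hypothetical and are not taken from the paper.

```python
def first_order_accuracy(ranked, truth):
    """Fraction of proteins whose top-ranked fluid is one of their true fluids."""
    hits = sum(1 for r, t in zip(ranked, truth) if r[0] in t)
    return hits / len(truth)


def top_k_coverage(ranked, truth, k):
    """Fraction of proteins whose first k predictions contain ALL true fluids."""
    covered = sum(1 for r, t in zip(ranked, truth) if t <= set(r[:k]))
    return covered / len(truth)


# Hypothetical ranked predictions (most likely first) for three proteins.
ranked = [
    ["blood", "urine", "saliva"],
    ["saliva", "blood", "urine"],
    ["urine", "saliva", "blood"],
]
# Hypothetical true fluid sets; a protein may be secreted into several fluids.
truth = [{"blood"}, {"blood", "urine"}, {"urine", "saliva"}]

acc1 = first_order_accuracy(ranked, truth)
cov2 = top_k_coverage(ranked, truth, k=2)
cov3 = top_k_coverage(ranked, truth, k=3)
print(acc1, cov2, cov3)
```

Because coverage requires every true fluid to appear in the top k, it can be lower than first-order accuracy even for larger k, which is consistent with the abstract reporting 62.94% for four orders against 79.02% for the first.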


Pan B.,CAS Institute of Plant Physiology and Ecology | Sheng J.,Shanghai Center for Bioinformation Technology | Sun W.,CAS Institute of Plant Physiology and Ecology | Zhao Y.,Tarim University | And 2 more authors.
Nucleic Acids Research | Year: 2013

Plants have large, diverse families of small secreted proteins (SSPs) that play critical roles in the processes of development, differentiation, defense, flowering, stress response, symbiosis, etc. Oryza sativa is one of the major crops worldwide and an excellent model for monocotyledonous plants. However, there had not been any effort to systematically analyze rice SSPs. Here, we constructed a comparative platform, OrysPSSP (http://www.genoportal.org/PSSP/index.do), involving >100000 SSPs from rice and 25 plant species. OrysPSSP is composed of a core SSP database and a dynamic web interface that integrates a variety of user tools and resources. The current release (v0530) of the core SSP database contains a total of 101 048 predicted SSPs, which were generated through a rigid computation/curation pipeline. The web interface consists of eight different modules, providing users with rich resources/functions, e.g. browsing SSPs by chromosome, searching and filtering SSPs, validating SSPs with omics data, comparing SSPs among multiple species and querying the core SSP database with BLAST. Some cases of application are discussed to demonstrate the utility of OrysPSSP. OrysPSSP serves as a comprehensive resource to explore SSPs on the genome scale and across the phylogeny of plant species. © The Author(s) 2012.


Grant
Agency: European Commission | Branch: FP7 | Program: CSA-CA | Phase: HEALTH-2007-2.1.2-6 | Award Amount: 1.75M | Year: 2009

The aim of the PSIMEx proposal is to systematically make published molecular interaction data computationally accessible. We plan to further develop the existing standard for molecular interactions developed by the HUPO Proteomics Standards Initiative, and to promote its implementation in the entire chain from experiment planning via data formatting and analysis to data representation in journal publications and public databases. Key aspects will be the dissemination of and user training on minimum requirements for publication of molecular interaction data; the further development of the PSI-MI standard for representation of data fulfilling these minimal requirements; the specification of efficient data deposition tools and data flow from data producers to public repositories as part of the publication process; implementation of international data exchange among databases; training and exchange of curation staff in the participating databases; and the definition of analysis tools for the efficient use of data following the PSI-MI standards.


Wang Y.,Shanghai JiaoTong University | Wei D.-Q.,Shanghai JiaoTong University | Wang J.-F.,Shanghai JiaoTong University | Wang J.-F.,Shanghai Center for Bioinformation Technology
Journal of Chemical Information and Modeling | Year: 2010

T1 lipase is isolated from the palm Geobacillus zalihae strain T1 in Malaysia, functioning as a secreted protein responsible for catalyzing the hydrolysis of long-chain triglycerides into fatty acids and glycerol at high temperatures. In the current study, using 30 ns molecular dynamics simulations at different temperatures, an aqueous activation was detected for T1 lipase. This aqueous activation in T1 lipase was mainly caused by a double-flap movement mechanism. The double flaps were constituted by the hydrophobic helices 6 and 9. Helix 6 employed two major components with the hydrophilic part at the surface and the hydrophobic part inside. In the aqueous solution, the hydrophobic part could provide enough power for helix 6 to move away, driving the protein into an open configuration and exposing the catalytic triad. Our findings could provide structural evidence to support the double-flap movement, revealing the catalytic mechanism for T1 lipase. © 2010 American Chemical Society.


Wang J.-F.,Shanghai JiaoTong University | Wang J.-F.,Shanghai Center for Bioinformation Technology | Wang J.-F.,Gordon Life Science Institute | Chou K.-C.,Shanghai JiaoTong University | Chou K.-C.,Gordon Life Science Institute
Current Drug Metabolism | Year: 2010

The cytochrome P450 family is a large and diverse group of hemoproteins that are found in virtually all types of organism, such as bacteria, eukaryotes and even Archaea. These proteins are found throughout the body; however, the highest concentrations are associated with the liver. With the completion of the Human Genome Project, 57 genes and more than 59 pseudogenes, divided among 18 families and 43 subfamilies of CYP genes, have been detected. In humans, CYPs are the major enzymes involved in drug metabolism and bioactivation, accounting for almost 75% of the total drug metabolism. The variability in drug metabolism that is mainly induced by CYP polymorphisms is reflected in differences in the maximal plasma concentrations, half-lives and clearance of some drugs. It can also lead to adverse drug reactions, which are considered a major factor in drug toxicity. The genotype-activity relationships of the CYP proteins have therefore become a hot topic in recent years. It is important to further understand why a certain genotype influences enzyme activity and how to predict more structure-activity relationships. © 2010 Bentham Science Publishers Ltd.


Qin F.,Shanghai JiaoTong University | Chen Y.,Shanghai JiaoTong University | Wu M.,Shanghai JiaoTong University | Li Y.,Shanghai Center for Bioinformation Technology | And 3 more authors.
RNA | Year: 2010

The hairpin II of U1 snRNA can bind U1A protein with high affinity and specificity. NMR spectra suggest that the loop region of apo-RNA is largely unstructured and undergoes a transition from unstructured to well-folded upon U1A binding. However, the mechanism by which RNA folding is coupled to protein binding is poorly understood. To gain insight into the mechanism, we have performed explicit-solvent molecular dynamics (MD) to study the folding kinetics of bound RNA and apo-RNA. Room-temperature MD simulations suggest that the conformation of bound RNA undergoes significant adjustment and becomes more stable upon U1A binding. Kinetic analysis of high-temperature MD simulations shows that bound RNA and apo-RNA each unfold via a two-state process. Both kinetics and free energy landscape analyses indicate that bound RNA folds in the order of RNA contracting, U1A binding, and tertiary folding. The predicted Φ-values suggest that A8, C10, A11, and G16 are key bases for bound RNA folding. Mutant Arg52Gln analysis shows that electrostatic interaction and hydrogen bonds between RNA and U1A (Arg52Gln) decrease. These results are in qualitative agreement with experiments. Furthermore, this method could be used in other studies about biomolecule folding upon receptor binding. Copyright © 2010 RNA Society.


Zhao X.,Tongji University | Liu Q.,Tongji University | Cai Q.,Tongji University | Li Y.,Fudan University | And 4 more authors.
Nucleic Acids Research | Year: 2012

Viral integration plays an important role in the development of malignant diseases. Viruses differ in preferred integration site and flanking sequence. Viral integration sites (VIS) have been found next to oncogenes and common fragile sites. Understanding the typical DNA features near VIS is useful for the identification of potential oncogenes, prediction of malignant disease development and assessing the probability of malignant transformation in gene therapy. Therefore, we have built a database of human disease-related VIS (Dr.VIS, http://www.scbit.org/dbmi/drvis) to collect and maintain human disease-related VIS data, including characteristics of the malignant disease, chromosome region, genomic position and viral-host junction sequence. The current build of Dr.VIS covers about 600 natural VIS of 5 oncogenic viruses representing 11 diseases. Among them, about 200 VIS have viral-host junction sequence. © The Author(s) 2011.


Li Z.,Shanghai Center for Bioinformation Technology
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium | Year: 2012

Aiming to ease the secondary use of clinical data in clinical research, we introduce a metadata-driven, web-based clinical data management system named ClinData Express. ClinData Express is made up of two parts: 1) m-designer, a standalone software for metadata definition; 2) a web-based data warehouse system for data management. With ClinData Express, all the researchers need to do is define the metadata and data model in the m-designer. The web interface for data collection and the specific database for data storage will be automatically generated. The standards used in the system and the data export module support data reuse. The system has been tested on seven disease-data collections in Chinese and one form from dbGap. The flexibility of the system gives it great potential for use in clinical research. The system is available at http://code.google.com/p/clindataexpress.
