Suratanee A.,King Mongkuts University of Technology Bangkok |
Schaefer M.H.,EMBL CRG Systems Biology Research Unit |
Betts M.J.,University of Heidelberg |
Soons Z.,Leibniz Institute for Natural Product Research and Infection Biology |
And 22 more authors.
PLoS Computational Biology | Year: 2014
Characterizing the activating and inhibiting effect of protein-protein interactions (PPI) is fundamental to gain insight into the complex signaling system of a human cell. A plethora of methods has been suggested to infer PPI from data on a large scale, but none of them is able to characterize the effect of this interaction. Here, we present a novel computational development that employs mitotic phenotypes of a genome-wide RNAi knockdown screen and enables identifying the activating and inhibiting effects of PPIs. Exemplarily, we applied our technique to a knockdown screen of HeLa cells cultivated at standard conditions. Using a machine learning approach, we obtained high accuracy (82% AUC of the receiver operating characteristics) by cross-validation using 6,870 known activating and inhibiting PPIs as gold standard. We predicted de novo unknown activating and inhibiting effects for 1,954 PPIs in HeLa cells covering the ten major signaling pathways of the Kyoto Encyclopedia of Genes and Genomes, and made these predictions publicly available in a database. We finally demonstrate that the predicted effects can be used to cluster knockdown genes of similar biological processes in coherent subgroups. The characterization of the activating or inhibiting effect of individual PPIs opens up new perspectives for the interpretation of large datasets of PPIs and thus considerably increases the value of PPIs as an integrated resource for studying the detailed function of signaling pathways of the cellular system of interest. © 2014 Suratanee et al. Source
Krallinger M.,Spanish National Cancer Research Center |
Vazquez M.,Spanish National Cancer Research Center |
Leitner F.,Spanish National Cancer Research Center |
Salgado D.,Monash University |
And 30 more authors.
BMC Bioinformatics | Year: 2011
Background: Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. Therefore the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts and recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.Results: A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthew's Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods, some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, the best MCC score 0.55. In case of competitive systems with an acceptable recall (above 35%) the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%.Conclusions: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows. © 2011 Krallinger et al. Source
Gueguen L.,French Institute for Research in Computer Science and Automation |
Gaillard S.,CNRS Research Institute on Horticulture and Seeds |
Boussau B.,French Institute for Research in Computer Science and Automation |
Boussau B.,University of California at Berkeley |
And 16 more authors.
Molecular Biology and Evolution | Year: 2013
Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of preexisting, optimized - yet extensible - code, typically distributed as libraries or packages. This motivated the Bio++ project, aiming at developing a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main attractiveness of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computer-efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing reads libraries. More complex models of sequence evolution, such as mixture models and generic n-tuples alphabets, are also included. © The Author 2013. Source
Fehlker M.,Experimental Clinical Research Center |
Huska M.R.,Computational Biology and Data Mining group |
Huska M.R.,Max Planck Institute for Molecular Genetics |
Jons T.,Institute For Integrative Anatomie |
And 2 more authors.
BMC Cancer | Year: 2014
Background: This study aimed at the identification of prognostic gene expression markers in early primary colorectal carcinomas without metastasis at the time point of surgery by analyzing genome-wide gene expression profiles using oligonucleotide microarrays.Methods: Cryo-conserved tumor specimens from 45 patients with early colorectal cancers were examined, with the majority of them being UICC stage II or earlier and with a follow-up time of 41-115 months. Gene expression profiling was performed using Whole Human Genome 4x44K Oligonucleotide Microarrays. Validation of microarray data was performed on five of the genes in a smaller cohort.Results: Using a novel algorithm based on the recursive application of support vector machines (SVMs), we selected a signature of 44 probes that discriminated between patients developing later metastasis and patients with a good prognosis. Interestingly, almost half of the genes was related to the patients' immune response and showed reduced expression in the metastatic cases.Conclusions: Whereas up to now gene signatures containing genes with various biological functions have been described for prediction of metastasis in CRC, in this study metastasis could be well predicted by a set of gene expression markers consisting exclusively of genes related to the MHC class II complex involved in immune response. Thus, our data emphasize that the proper function of a comprehensive network of immune response genes is of vital importance for the survival of colorectal cancer patients. © 2014 Fehlker et al.; licensee BioMed Central Ltd. Source