Interuniversity Institute of Bioinformatics in Brussels

Brussels, Belgium


Dalkas G.A.,Roosevelt University | Dalkas G.A.,Heriot - Watt University | Rooman M.,Roosevelt University | Rooman M.,Interuniversity Institute of Bioinformatics in Brussels
BMC Bioinformatics | Year: 2017

Background: The identification of immunogenic regions on the surface of antigens, which are able to be recognized by antibodies and to trigger an immune response, is a major challenge for the design of new and effective vaccines. The prediction of such regions through computational immunology techniques is a challenging goal, which will ultimately lead to a drastic limitation of the experimental tests required to validate their efficiency. However, current methods are far from being sufficiently reliable and/or applicable on a large scale. Results: We developed SEPIa, a B-cell epitope predictor from the protein sequence, which is sufficiently fast to be applicable on a large scale. The originality of SEPIa lies in the combination of two classifiers, a naïve Bayesian and a random forest classifier, through a voting algorithm that exploits the advantages of both. It is based on 13 sequence-based features, whose values in a 9-residue sequence window are compiled to predict the epitope/non-epitope state of the central residue. The features are related to the type of amino acid, its conservation in homologous proteins, and its tendency to be solvent-exposed, soluble, flexible, and disordered. The highest signal is obtained from statistical amino acid preferences, but all 13 features contribute non-negligibly to the predictor. SEPIa's average prediction accuracy is limited, with an AUC score (area under the receiver operating characteristic curve) that reaches 0.65 both in 10-fold cross-validation and on an independent test set. It is nevertheless slightly higher than that of other methods evaluated on the same test set. Conclusions: SEPIa was applied to a test protein whose epitopes are known, human β2 adrenergic G-protein-coupled receptor, with promising results. Although the actual AUC score is rather low, many of the predicted epitopes cluster together and overlap the experimental epitope region. The reasons underlying the limitations of SEPIa and of all other B-cell epitope predictors are discussed. © 2017 The Author(s).
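
The window-and-vote scheme described in this abstract can be illustrated with a minimal sketch. It is not the SEPIa implementation: the padding symbol, the function names `residue_windows` and `vote`, and the averaging rule are all assumptions made for illustration, since the abstract does not specify how sequence ends are handled or how the two classifiers are combined.

```python
from typing import Iterator, Tuple

PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends


def residue_windows(sequence: str, size: int = 9) -> Iterator[Tuple[int, str]]:
    """Yield (index, window) pairs, one window centred on each residue."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so that it has a central residue")
    half = size // 2
    padded = PAD * half + sequence + PAD * half
    for i in range(len(sequence)):
        # padded[i + half] is sequence[i], the residue being classified
        yield i, padded[i:i + size]


def vote(p_bayes: float, p_forest: float, threshold: float = 0.5) -> bool:
    """Assumed soft vote: average the two class probabilities, then threshold.

    The actual voting algorithm is not given in the abstract.
    """
    return (p_bayes + p_forest) / 2 >= threshold
```

In such a setup, each window would be turned into feature values, scored by both classifiers, and the combined score assigned to the central residue.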

Hou Q.,VU University Amsterdam | De Geest P.F.G.,VU University Amsterdam | Vranken W.F.,Interuniversity Institute of Bioinformatics in Brussels | Vranken W.F.,Vrije Universiteit Brussel | And 3 more authors.
Bioinformatics | Year: 2017

Motivation: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction. Results: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test set. The heteromeric-trained RF interface predictor performs better than existing predictors on an independent heteromeric test set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces. Availability and Implementation: The predictors and test datasets used in our analyses are freely available ( . © The Author 2017.
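
For readers unfamiliar with the area under the ROC curve quoted in these abstracts, a minimal sketch of how it can be computed from per-residue scores follows. It uses the standard pairwise-ranking definition (the fraction of positive/negative pairs in which the positive scores higher, with ties counted as half); the function name `roc_auc` and the toy data are illustrative assumptions, not taken from any of the papers.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive residue (e.g. interface), 0 for a negative one.
    scores: predictor output, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is fine for illustration but would be replaced by a rank-based computation for large datasets.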

Pucci F.,Roosevelt University | Pucci F.,Interuniversity Institute of Bioinformatics in Brussels | Rooman M.,Roosevelt University | Rooman M.,Interuniversity Institute of Bioinformatics in Brussels
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences | Year: 2016

Despite the intense efforts of the last decades to understand the thermal stability of proteins, the mechanisms responsible for its modulation still remain debated. In this investigation, we tackle this issue by showing how a multiscale perspective can yield new insights. With the help of temperature-dependent statistical potentials, we analysed some amino acid interactions at the molecular level, which are suggested to be relevant for the enhancement of thermal resistance. We then investigated the thermal stability at the protein level by quantifying its modification upon amino acid substitutions. Finally, a large-scale analysis of protein stability, at the structurome level, contributed to the clarification of the relation between stability and natural evolution, thereby showing that the mutational profile of proteins differs according to their thermal properties. Some considerations on how the multiscale approach could help in unravelling the protein stability mechanisms are briefly discussed. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'. © 2016 The Author(s) Published by the Royal Society. All rights reserved.

Faust K.,Catholic University of Leuven | Faust K.,Center for the Biology of Disease | Faust K.,Vrije Universiteit Brussel | Lahti L.,Wageningen University | And 8 more authors.
Current Opinion in Microbiology | Year: 2015

The recent increase in the number of microbial time series studies offers new insights into the stability and dynamics of microbial communities, from the world's oceans to human microbiota. Dedicated time series analysis tools allow taking full advantage of these data. Such tools can reveal periodic patterns, help to build predictive models or, on the contrary, quantify irregularities that make community behavior unpredictable. Microbial communities can change abruptly in response to small perturbations, linked to changing conditions or the presence of multiple stable states. With sufficient samples or time points, such alternative states can be detected. In addition, temporal variation of microbial interactions can be captured with time-varying networks. Here, we apply these techniques on multiple longitudinal datasets to illustrate their potential for microbiome research. © 2015 The Authors.

Vranken W.F.,Vrije Universiteit Brussel | Vranken W.F.,Interuniversity Institute of Bioinformatics in Brussels
Progress in Nuclear Magnetic Resonance Spectroscopy | Year: 2014

NMR spectroscopy is a key technique for understanding the behaviour of proteins, especially highly dynamic proteins that adopt multiple conformations in solution. Overall, protein structures determined from NMR spectroscopy data constitute just over 10% of the Protein Data Bank archive. This review covers the validation of these NMR protein structures, but rather than describing currently available methodology, it focuses on concepts that are important for understanding where and how validation is most relevant. First, the inherent characteristics of the protein under study have an influence on quality and quantity of the distinct types of data that can be acquired from NMR experiments. Second, these NMR data are necessarily transformed into a model for use in a structure calculation protocol, and the protein structures that result from this reflect the types of NMR data used as well as the protein characteristics. The validation of NMR protein structures should therefore take account, wherever possible, of the inherent behavioural characteristics of the protein, the types of available NMR data, and the calculation protocol. These concepts are discussed in the context of 'knowledge based' and 'model versus data' validation, with suggestions for questions to ask and different validation categories to consider. The principal aim of this review is to stimulate discussion and to help the reader understand the relationships between the above elements in order to make informed decisions on which validation approaches are the most relevant in particular cases. © 2014 Elsevier B.V. All rights reserved.

Skwark M.J.,University of Stockholm | Skwark M.J.,Aalto University | Raimondi D.,University of Stockholm | Raimondi D.,Interuniversity Institute of Bioinformatics in Brussels | And 2 more authors.
PLoS Computational Biology | Year: 2014

Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently of the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction. © 2014 Skwark et al.

Cilia E.,Free University of Colombia | Cilia E.,Interuniversity Institute of Bioinformatics in Brussels | Pancsa R.,Vrije Universiteit Brussel | Tompa P.,Interuniversity Institute of Bioinformatics in Brussels | And 7 more authors.
Nucleic Acids Research | Year: 2014

Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain. Here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy-to-use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at © 2014 The Author(s).

Cilia E.,Free University of Colombia | Cilia E.,Interuniversity Institute of Bioinformatics in Brussels | Pancsa R.,Vrije Universiteit Brussel | Tompa P.,Interuniversity Institute of Bioinformatics in Brussels | And 6 more authors.
Nature Communications | Year: 2013

Protein function and dynamics are closely related; however, accurate dynamics information is difficult to obtain. Here, based on a carefully assembled data set derived from experimental data for proteins in solution, we quantify backbone dynamics properties on the amino-acid level and develop DynaMine - a fast, high-quality predictor of protein backbone dynamics. DynaMine uses only protein sequence information as input and shows great potential in distinguishing regions of different structural organization, such as folded domains, disordered linkers, molten globules and pre-structured binding motifs of different sizes. It also identifies disordered regions within proteins with an accuracy comparable to the most sophisticated existing predictors, without depending on prior disorder knowledge or three-dimensional structural information. DynaMine provides molecular biologists with an important new method that grasps the dynamical characteristics of any protein of interest, as we show here for human p53 and E1A from human adenovirus 5. © 2013 Macmillan Publishers Limited. All rights reserved.

PubMed | Interuniversity Institute of Bioinformatics in Brussels
Journal: Scientific Reports | Year: 2016

Next Generation Sequencing is dramatically increasing the number of known protein sequences, with related experimentally determined protein structures lagging behind. Structural bioinformatics is attempting to close this gap by developing approaches that predict structure-level characteristics for uncharacterized protein sequences, with most of the developed methods relying heavily on evolutionary information collected from homologous sequences. Here we show that there is a substantial observational selection bias in this approach: the predictions are validated on proteins with known structures from the PDB, but for exactly those proteins significantly more homologs are available than for less studied sequences randomly extracted from Uniprot. Structural bioinformatics methods that were developed this way are thus likely to have over-estimated performances; we demonstrate this for two contact prediction methods, where performances drop by up to 60% when taking into account a more realistic amount of evolutionary information. We provide a bias-free dataset for the validation of contact prediction methods, called NOUMENON.

PubMed | Interuniversity Institute of Bioinformatics in Brussels and Vrije Universiteit Brussel
Type: Journal Article | Journal: Human mutation | Year: 2016

Cysteines are among the rarest amino acids in nature, and are both functionally and structurally very important for proteins. The ability of cysteines to form disulfide bonds is especially relevant, both for constraining the folded state of the protein and for performing enzymatic duties. But how does the variation record of human proteins reflect their functional importance and structural role, especially with regard to deleterious mutations? We created HUMCYS, a manually curated dataset of single amino acid variants that (1) have a known disease/neutral phenotypic outcome and (2) cause the loss of a cysteine, in order to investigate how mutated cysteines relate to structural aspects such as surface accessibility and cysteine oxidation state. We have also developed a sequence-based in silico cysteine oxidation predictor to overcome the scarcity of experimentally derived oxidation annotations, and applied it to extend our analysis to classes of proteins for which the experimental determination of their structure is technically challenging, such as transmembrane proteins. Our investigation shows that we can gain insights into the reason behind the outcome of cysteine losses in otherwise uncharacterized proteins, and we discuss the possible molecular mechanisms leading to deleterious phenotypes, such as the involvement of the mutated cysteine in a structurally or enzymatically relevant disulfide bond.
