Paladin L.,University of Padua | Hirsh L.,University of Padua | Hirsh L.,Catholic University of Peru | Piovesan D.,University of Padua | And 6 more authors.
Nucleic Acids Research | Year: 2017

RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by an extensive manual validation for >60% of the entries. The updated web interface includes a new search engine for complex queries and a fully re-designed entry page for a better overview of structural data. It is now possible to compare unit positions, together with secondary structure, fold information and Pfam domains. Moreover, a new classification level has been introduced on top of the existing scheme as an independent layer for sequence similarity relationships at 40%, 60% and 90% identity. © 2016 The Author(s).


Cornuet J.-M.,French National Institute for Agricultural Research | Pudlo P.,French National Institute for Agricultural Research | Pudlo P.,Montpellier University | Pudlo P.,Institute Of Biologie Computationnelle Ibc | And 13 more authors.
Bioinformatics | Year: 2014

Motivation: DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows (i) the analysis of single nucleotide polymorphism data at a large number of loci, apart from microsatellite and DNA sequence data, (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple OS X. © 2014 The Author.


PubMed | Institute Of Biologie Computationnelle Ibc, Montpellier University, University of Paris Dauphine and French National Institute for Agricultural Research
Type: Journal Article | Journal: Bioinformatics (Oxford, England) | Year: 2016

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests (RF) to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with RF and postponing the approximation of the posterior probability of the selected model for a second stage also relying on RF. Compared with earlier implementations of ABC model choice, the ABC RF approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least 50) and (iv) it includes an approximation of the posterior probability of the selected model. The call to RF will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodology is implemented in the R package abcrf available on the CRAN. Contact: jean-michel.marin@umontpellier.fr. Supplementary data are available at Bioinformatics online.
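For orientation, the standard rejection-style ABC model choice that the abstract contrasts with can be sketched in a few lines. This is not the abcrf method and uses no random forest. The two toy models, their priors, the summary statistics and all function names below are assumptions made purely for illustration.

```python
import random
import statistics


def simulate(model, rng, n=40):
    """Draw a parameter from the (assumed) prior of `model`, then simulate n values."""
    if model == "gaussian":
        sigma = rng.uniform(0.5, 2.0)
        return [rng.gauss(0.0, sigma) for _ in range(n)]
    half_width = rng.uniform(0.5, 4.0)  # toy "uniform" model
    return [rng.uniform(-half_width, half_width) for _ in range(n)]


def summaries(data):
    """Two illustrative summary statistics: spread and mean absolute value."""
    return (statistics.pstdev(data), statistics.fmean([abs(x) for x in data]))


def abc_model_choice(observed, models, n_sim=2000, keep=100, seed=0):
    """Estimate model posterior probabilities by nearest-neighbour rejection ABC."""
    rng = random.Random(seed)
    target = summaries(observed)
    reference = []
    for _ in range(n_sim):
        model = rng.choice(models)  # uniform prior over models
        stats = summaries(simulate(model, rng))
        dist = sum((a - b) ** 2 for a, b in zip(stats, target))
        reference.append((dist, model))
    reference.sort(key=lambda pair: pair[0])
    nearest = [model for _, model in reference[:keep]]
    # Frequency of each model among the closest simulations.
    return {model: nearest.count(model) / keep for model in models}


if __name__ == "__main__":
    data_rng = random.Random(1)
    observed = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
    print(abc_model_choice(observed, ["gaussian", "uniform"]))
```

The frequencies returned here are the kind of posterior probability estimate that, according to the abstract, may be poorly evaluated by standard ABC; the RF approach replaces the nearest-neighbour step with a classifier trained on the same reference table.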


Scornavacca C.,IRD Montpellier | Scornavacca C.,Institute Of Biologie Computationnelle Ibc | Jacox E.,IRD Montpellier | Szollosi G.J.,ELTE MTA Lendulet Biophysics Research Group 1117 Bp
Bioinformatics | Year: 2015

Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods - generally computationally more efficient - require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony based, species tree aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two thirds reduction in the number of apparent transfer events. © The Author 2014. Published by Oxford University Press. All rights reserved.


Swenson K.M.,Montpellier University | Swenson K.M.,Institute Of Biologie Computationnelle Ibc | Simonaitis P.,ENS Lyon | Blanchette M.,McGill University
Algorithms for Molecular Biology | Year: 2016

Background: Traditionally, the merit of a rearrangement scenario between two gene orders has been measured based on a parsimony criterion alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. physical proximity of the DNA segments implicated or repetitive sequences. Results: We propose optimization problems with the objective of maximizing overall likelihood, by weighting the rearrangements. We study a binary weight function suitable to the representation of sets of genome positions that are most likely to have swapped adjacencies. We give a polynomial-time algorithm for the problem of finding a minimum weight double cut and join scenario among all minimum length scenarios. In the process we solve an optimization problem on colored noncrossing partitions, which is a generalization of the Maximum Independent Set problem on circle graphs. Conclusions: We introduce a model for weighting genome rearrangements and show that under simple yet reasonable conditions, a fundamental distance can be computed in polynomial time. This is achieved by solving a generalization of the Maximum Independent Set problem on circle graphs. Several variants of the problem are also mentioned. © 2016 Swenson et al.
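The "minimum length" that the weighted scenarios are restricted to is the unweighted double cut and join distance. A minimal sketch of that baseline follows, assuming genomes made only of circular chromosomes over the same gene set, where the distance is the number of genes minus the number of cycles in the adjacency graph. The genome encoding (lists of signed integers) and the function names are illustrative assumptions; the weighting scheme of the paper is not implemented.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbouring extremity.

    `genome` is a list of circular chromosomes, each a list of signed
    integers; gene g has a tail (g, 't') and a head (g, 'h').
    """
    neighbour = {}
    for chrom in genome:
        for k, gene in enumerate(chrom):
            nxt = chrom[(k + 1) % len(chrom)]  # wrap around: circular
            right = (abs(gene), 'h' if gene > 0 else 't')
            left = (abs(nxt), 't' if nxt > 0 else 'h')
            neighbour[right] = left
            neighbour[left] = right
    return neighbour


def dcj_distance(genome_a, genome_b):
    """Unweighted DCJ distance between two circular genomes: n - cycles."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must contain the same genes")
    n_genes = len(adj_a) // 2
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return n_genes - cycles


if __name__ == "__main__":
    print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))        # one inversion
    print(dcj_distance([[1, 2, 3, 4]], [[1, 2], [3, 4]]))  # one fission
```

Many different scenarios usually reach this minimum length, which is why a secondary criterion such as the weight described in the abstract is needed to choose among them.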


Swenson K.M.,Montpellier University | Swenson K.M.,Institute Of Biologie Computationnelle Ibc | Blanchette M.,McGill University
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2015

Traditionally, the merit of a rearrangement scenario between two genomes has been measured based on a parsimony criterion alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. physical proximity of the DNA segments implicated, or repetitive sequences. Accordingly, we propose optimization problems with the objective of maximizing overall likelihood, by weighting the rearrangements. We study a binary weight function suitable to the representation of sets of genome positions that are most likely to have swapped adjacencies. We give a polynomial-time algorithm for the problem of finding a minimum weight double cut and join (DCJ) scenario among all minimum length scenarios. In the process, we solve an optimization problem on colored noncrossing partitions which is a generalization of the Maximum Independent Set problem on circle graphs. © Springer-Verlag Berlin Heidelberg 2015.


To T.-H.,Montpellier University | To T.-H.,Institute Of Biologie Computationnelle Ibc | Scornavacca C.,Montpellier University | Scornavacca C.,Institute Of Biologie Computationnelle Ibc
BMC Genomics | Year: 2015

Reconciliation methods explain topology differences between a species tree and a gene tree by evolutionary events other than speciations. However, not all phylogenies are trees: hybridization can occur and create new species, and this results in reticulate phylogenies. Here, we consider the problem of reconciling a gene tree with a species network via duplication and loss events. Two variants are proposed and solved with efficient algorithms: the first one finds the best tree in the network with which to reconcile the gene tree, and the second one finds the best reconciliation between the gene tree and the whole network. © 2015 To and Scornavacca et al.
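As background for the network case, the classical tree-versus-tree duplication-loss reconciliation can be sketched with the last common ancestor (LCA) mapping. This is the textbook tree version, not the network algorithms of the paper. Trees are assumed to be binary nested tuples, each gene leaf is assumed to carry the name of its species, and all function names are illustrative.

```python
def index_species_tree(tree):
    """Record the parent and depth of every node of a nested-tuple species tree."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(tree, None, 0)
    return parent, depth


def lca(u, v, parent, depth):
    """Last common ancestor of two species-tree nodes."""
    while u != v:
        if depth[u] >= depth[v]:
            u = parent[u]
        else:
            v = parent[v]
    return u


def reconcile(gene_tree, species_tree):
    """Count duplications and losses under the LCA mapping."""
    parent, depth = index_species_tree(species_tree)
    counts = {"duplications": 0, "losses": 0}

    def place(g):
        if not isinstance(g, tuple):
            return g  # a gene leaf maps to the species it is named after
        left, right = (place(child) for child in g)
        here = lca(left, right, parent, depth)
        # A node mapped to the same species node as a child is a duplication.
        is_dup = here in (left, right)
        if is_dup:
            counts["duplications"] += 1
        for child in (left, right):
            gap = depth[child] - depth[here]
            counts["losses"] += gap if is_dup else gap - 1
        return here

    place(gene_tree)
    return counts


if __name__ == "__main__":
    species = (("A", "B"), "C")
    print(reconcile((("A", "C"), "B"), species))
```

With the species tree ((A,B),C) and the gene tree ((A,C),B), the sketch reports one duplication and three losses, which is the usual worked example for this mapping.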


Chan Y.-B.,Montpellier University | Ranwez V.,Montpellier SupAgro | Ranwez V.,Institute Of Biologie Computationnelle Ibc | Scornavacca C.,Montpellier University | Scornavacca C.,Institute Of Biologie Computationnelle Ibc
BMC Bioinformatics | Year: 2013

Background: Genes located in the same chromosome region share common evolutionary events more often than other genes (e.g. a segmental duplication of this region). Their evolution may also be related if they are involved in the same protein complex or biological process. Identifying co-evolving genes can thus shed light on ancestral genome structures and functional gene interactions. Results: We devise a simple, fast and accurate probability method based on species tree-gene tree reconciliations to detect when two gene families have co-evolved. Our method observes the number and location of predicted macro-evolutionary events, and estimates the probability of having the observed number of common events by chance. Conclusions: Simulation studies confirm that our method effectively identifies co-evolving families. This opens numerous perspectives on genome-scale analysis where this method could be used to pinpoint co-evolving gene families and thus help to unravel ancestral genome arrangements or undocumented gene interactions. © 2013 Chan et al.; licensee BioMed Central Ltd.


PubMed | Institute Of Biologie Computationnelle Ibc, Montpellier University and University of Lyon
Type: | Journal: Bioinformatics (Oxford, England) | Year: 2017

Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However, finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm to solve this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to several other tested methods having similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal. Our algorithm has been integrated into the ecceTERA phylogeny package, available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and which can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera. Contact: celine.scornavacca@umontpellier.fr. Supplementary information: Supplementary data are available at Bioinformatics online.


PubMed | Institute Pasteur in Cambodia, Institute Of Biologie Computationnelle Ibc, Montpellier University, National Center for Parasitology and 3 more.
Type: | Journal: Malaria journal | Year: 2016

Western Cambodia is recognized as the epicentre of emergence of Plasmodium falciparum multi-drug resistance. The emergence of artemisinin resistance has been observed in this area since 2008-2009 and molecular signatures associated with artemisinin resistance have been characterized in the k13 gene. At present, one of the major threats is the possible spread of Asian artemisinin-resistant parasites over the world, threatening millions of people and jeopardizing malaria elimination programme efforts. To anticipate the diffusion of artemisinin resistance, the identification of the P. falciparum population structure and the gene flow among the parasite population in Cambodia are essential. To this end, a mid-throughput PCR-LDR-FMA approach based on LUMINEX technology was developed to screen for genetic barcode in 533 blood samples collected in 2010-2011 from 16 health centres in malaria-endemic areas in Cambodia. Based on successful typing of 282 samples, subpopulations were characterized along the borders of the country. Each 11-loci barcode provides evidence supporting allele distribution gradient related to subpopulations and gene flow. The 11-loci barcode successfully identifies recently emerging parasite subpopulations in western Cambodia that are associated with the C580Y dominant allele for artemisinin resistance in the k13 gene. A subpopulation was identified in northern Cambodia that was associated with artemisinin (R539T resistant allele of the k13 gene) and mefloquine resistance. The gene flow between these subpopulations might have driven the spread of artemisinin resistance over Cambodia.
