Institute Of Biologie Computationnelle

Montpellier, France


Cruaud A.,French National Institute for Agricultural Research | Gautier M.,French National Institute for Agricultural Research | Gautier M.,Institute Of Biologie Computationnelle | Galan M.,French National Institute for Agricultural Research | And 7 more authors.
Molecular Biology and Evolution | Year: 2014

Next-generation sequencing has opened up new possibilities in phylogenetics; however, choosing an appropriate method of sample preparation remains challenging. Here, we demonstrate that restriction-site-associated DNA sequencing (RADseq) generates useful data for phylogenomics. Analysis of our RAD library using current bioinformatic and phylogenetic tools produced 400 times more sites than our Sanger approach (2,262,825 nt/species), fully resolving relationships between 18 species of ground beetles (divergences up to 17 My). This suggests that RADseq is a promising approach for inferring the phylogeny of eukaryotic species, though potential biases need to be evaluated and new methodologies developed to take full advantage of such data. © The Author 2014.

Vitalis R.,Montpellier SupAgro | Vitalis R.,Institute Of Biologie Computationnelle | Gautier M.,Montpellier SupAgro | Gautier M.,Institute Of Biologie Computationnelle | And 2 more authors.
Genetics | Year: 2014

The recent advent of high-throughput sequencing and genotyping technologies makes it possible to produce, easily and cost effectively, large amounts of detailed data on the genotype composition of populations. Detecting locus-specific effects may help identify those genes that have been, or are currently, targeted by natural selection. How best to identify these selected regions, loci, or single nucleotides remains a challenging issue. Here, we introduce a new model-based method, called SelEstim, to distinguish putative selected polymorphisms from the background of neutral (or nearly neutral) ones and to estimate the intensity of selection at the former. The underlying population genetic model is a diffusion approximation for the distribution of allele frequency in a population subdivided into a number of demes that exchange migrants. We use a Markov chain Monte Carlo algorithm for sampling from the joint posterior distribution of the model parameters, in a hierarchical Bayesian framework. We present evidence from stochastic simulations, which demonstrates the good power of SelEstim to identify loci targeted by selection and to estimate the strength of selection acting on these loci, within each deme. We also reanalyze a subset of SNP data from the Stanford HGDP-CEPH Human Genome Diversity Cell Line Panel to illustrate the performance of SelEstim on real data. In agreement with previous studies, our analyses point to a very strong signal of positive selection upstream of the LCT gene, which encodes the enzyme lactase-phlorizin hydrolase and is associated with adult-type hypolactasia. The geographical distribution of the strength of positive selection across the Old World matches the interpolated map of lactase persistence phenotype frequencies, with the strongest selection coefficients in Europe and in the Indus Valley. © 2014 by the Genetics Society of America.

El Baidouri F.,Montpellier University | Diancourt L.,Institute Pasteur Paris | Berry V.,CNRS Montpellier Laboratory of Informatics, Robotics and Microelectronics | Berry V.,Institute Of Biologie Computationnelle | And 5 more authors.
PLoS Neglected Tropical Diseases | Year: 2013

Leishmaniasis is a complex parasitic disease from a taxonomic, clinical and epidemiological point of view. The role of genetic exchanges has been questioned for over twenty years and their recent experimental demonstration along with the identification of interspecific hybrids in natura has revived this debate. Whereas genetic exchanges were once argued to be exceptional and not to contribute to Leishmania evolution, it is currently proposed that interspecific exchanges could be a major driving force for rapid adaptation to new reservoirs and vectors, expansion into new parasitic cycles and adaptation to new life conditions. To assess the existence of gene flows between species during evolution we used an MLSA-based (MultiLocus Sequence Analysis) approach to analyze 222 Leishmania strains from Africa and Eurasia to accurately represent the genetic diversity of this genus. We observed a remarkable congruence of the phylogenetic signal and identified seven genetic clusters that include mainly independent lineages which are accumulating divergences without any sign of recent interspecific recombination. From a taxonomic point of view, the strong genetic structuration of the different species does not question the current classification, except for species that cause visceral forms of leishmaniasis (L. donovani, L. infantum and L. archibaldi). Although these taxa cause specific clinical forms of the disease and are maintained through different parasitic cycles, they are not clearly distinct and form a continuum, in line with the concept of species complex already suggested for this group thirty years ago. These results should have practical consequences concerning the molecular identification of parasites and the subsequent therapeutic management of the disease. © 2013 El Baidouri et al.

Kajava A.V.,French National Center for Scientific Research | Kajava A.V.,Institute Of Biologie Computationnelle | Kajava A.V.,Saint Petersburg State University of Information Technologies, Mechanics and Optics | Klopffleisch K.,University of Cologne | And 2 more authors.
Scientific Reports | Year: 2014

The Rip homotypic interaction motif (RHIM) is a short, non-globular sequence stretch that mediates a key interaction of mammalian necroptosis signaling. In order to understand its unusual oligomerization properties, we set out to trace the evolutionary origins of the RHIM motif by identifying distantly related protein motifs that might employ the same binding mode. The RHIM motif was found to be related to the prion-forming domain of the HET-s protein, which oligomerizes by forming structurally well-characterized fibrils and is involved in fungal heterokaryon incompatibility. This evolutionary relationship explains the recently reported propensity of mammalian RHIM motifs to form amyloid fibrils, but suggests that these fibrils have a different structural architecture than currently assumed. These findings, together with numerous observations of RHIM-like motifs in immunity proteins from a wide range of species, provide insight into the modern innate immunity pathways in animals, plants and fungi.

Ahmed A.B.,French National Center for Scientific Research | Ahmed A.B.,Institute Of Biologie Computationnelle | Kajava A.V.,French National Center for Scientific Research | Kajava A.V.,Institute Of Biologie Computationnelle
FEBS Letters | Year: 2013

Numerous studies have shown that the ability to form amyloid fibrils is an inherent property of the polypeptide chain. This has led to the development of several computational approaches to predict amyloidogenicity from amino acid sequences. Here, we discuss the principles governing these methods, and evaluate them using several datasets. They deliver excellent performance in the tests made using short peptides (∼6 residues). However, there is a general tendency towards a high number of false positives when tested against longer sequences. This shortcoming needs to be addressed as these longer sequences are linked to diseases. Recent structural studies have shown that the core element of the majority of disease-related amyloid fibrils is a β-strand-loop-β-strand motif called β-arch. This insight provides an opportunity to substantially improve the prediction of amyloids produced by natural proteins, ushering in an era of personalized medicine based on genome analysis. © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Di Domenico T.,University of Padua | Potenza E.,University of Padua | Walsh I.,University of Padua | Gonzalo Parra R.,University of Buenos Aires | And 9 more authors.
Nucleic Acids Research | Year: 2014

RepeatsDB is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services. © 2013 The Author(s). Published by Oxford University Press.

Rousset F.,Montpellier University | Rousset F.,Institute Of Biologie Computationnelle | Ferdy J.-B.,CNRS Biological Evolution and Diversity Laboratory
Ecography | Year: 2014

Spatial autocorrelation is a well-recognized concern for observational data in general, and more specifically for spatial data in ecology. Generalized linear mixed models (GLMMs) with spatially autocorrelated random effects are a potential general framework for handling these spatial correlations. However, as the result of statistical and practical issues, such GLMMs have been fitted through the undocumented use of procedures based on penalized quasi-likelihood approximations (PQL), and under restrictive models of spatial correlation. Alternatively, spatial correlations are often neglected in favor of simpler but more questionable approaches. In this work we aim to provide practical and validated means of inference under spatial GLMMs that overcome these limitations. For this purpose, new software is developed to fit spatial GLMMs. We use it to assess the performance of likelihood ratio tests for fixed effects under spatial autocorrelation, based on Laplace or PQL approximations of the likelihood. As expected, the Laplace approximation generally performs slightly better, although a variant of PQL was better in the binary case. We show that a previous implementation of PQL methods in the R language, glmmPQL, is not appropriate for such applications. Finally, we illustrate the efficiency of a bootstrap procedure for correcting the small sample bias of the tests, which applies also to non-spatial models. © 2014 The Authors.

Gautier M.,Montpellier SupAgro | Gautier M.,Institute Of Biologie Computationnelle | Vitalis R.,Montpellier SupAgro | Vitalis R.,Institute Of Biologie Computationnelle | Vitalis R.,Montpellier University
Molecular Biology and Evolution | Year: 2013

The recent development of high-throughput genotyping technologies has revolutionized the collection of data in a wide range of both model and nonmodel species. These data generally contain huge amounts of information about the demographic history of populations. In this study, we introduce a new method to estimate divergence times on a diffusion time scale from large single-nucleotide polymorphism (SNP) data sets, conditionally on a population history that is represented as a tree. We further assume that all the observed polymorphisms originate from the most ancestral (root) population; that is, we neglect mutations that occur after the split of the most ancestral population. This method relies on a hierarchical Bayesian model, based on Kimura's time-dependent diffusion approximation of genetic drift. We implemented a Metropolis-Hastings within Gibbs sampler to estimate the posterior distribution of the parameters of interest in this model, which we refer to as the Kimura model. Evaluating the Kimura model on simulated population histories, we found that it provides accurate estimates of divergence time. Assessing model fit using the deviance information criterion (DIC) proved efficient for retrieving the correct tree topology among a set of competing histories. We show that this procedure is robust to low-to-moderate gene flow, as well as to ascertainment bias, provided that the most distantly related populations are represented in the discovery panel. As an illustrative example, we finally analyzed published human data consisting of genotypes for 452,198 SNPs from individuals belonging to four populations worldwide. Our results suggest that the Kimura model may be helpful to characterize the demographic history of differentiated populations, using genome-wide allele frequency data. © 2012 The Author.
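
The Metropolis-Hastings update named in the abstract above can be illustrated with a minimal Python sketch. This is not the Kimura model and not the authors' sampler: it is a generic random-walk Metropolis-Hastings chain for a single allele frequency, and the binomial likelihood, the uniform prior, the function names, and the tuning values (step size, chain length, seed) are all assumptions made for illustration.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalized log posterior of allele frequency p given k copies of the
    allele among n sampled gene copies (binomial likelihood, uniform prior)."""
    if not 0.0 < p < 1.0:
        return -math.inf  # proposals outside (0, 1) have zero posterior mass
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_hastings(k, n, n_iter=20000, step=0.1, seed=1):
    """Random-walk Metropolis-Hastings chain for p (illustrative settings)."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


if __name__ == "__main__":
    chain = metropolis_hastings(k=30, n=100)
    kept = chain[2000:]  # discard an arbitrary burn-in
    print(sum(kept) / len(kept))
```

Under these assumptions the exact posterior is a Beta(k + 1, n - k + 1) distribution, so the chain average can be compared with the analytical mean (k + 1) / (n + 2). In a Metropolis-Hastings within Gibbs scheme, an update of this kind would be applied to one parameter at a time while the others are held fixed.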

Rousset F.,Montpellier University | Rousset F.,Institute Of Biologie Computationnelle
Genetics | Year: 2013

A canon of population genetics concerns the properties of FST, a descriptor of spatial genetic structure. Interest in FST arose from Wright's early insights linking FST to dispersal parameters as well as to his concept of effective population size (e.g., Wright 1938, 1951). Although there is continued interest in this topic, FST also serves in other applications, such as detecting selected markers in natural populations (Beaumont and Nichols 1996) and more often in routine descriptive works. Remarkably, it is the latter use that seems to attract most discussion. Alternative descriptors have been proposed. Conversely, attempts have been made to draw biological inferences from FST properties that do not depend on biological processes. A reconsideration of its properties under biological scenarios underlines the weaknesses of such approaches. © 2013 by the Genetics Society of America.
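
For readers unfamiliar with the descriptor discussed above, the following Python sketch computes FST for one biallelic locus from its textbook definition as a standardized variance of allele frequencies among demes, FST = Var(p) / (p̄(1 - p̄)). It is an illustration only: it takes the deme allele frequencies as known, weights demes equally, and ignores the sampling corrections used by practical estimators; the function name is hypothetical.

```python
from statistics import mean


def wright_fst(deme_freqs):
    """FST = Var(p) / (p_bar * (1 - p_bar)) for one biallelic locus.

    deme_freqs holds the allele frequency in each deme; demes are weighted
    equally and the frequencies are treated as known (no sampling correction).
    """
    p_bar = mean(deme_freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("locus is monomorphic overall; FST is undefined")
    var_p = mean([(p - p_bar) ** 2 for p in deme_freqs])  # among-deme variance
    return var_p / (p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    print(wright_fst([0.5, 0.5]))  # identical demes: no structure
    print(wright_fst([0.2, 0.8]))  # intermediate differentiation
    print(wright_fst([0.0, 1.0]))  # demes fixed for different alleles
```

The three calls cover the range of the statistic: identical frequencies give 0, demes fixed for alternative alleles give 1, and intermediate cases fall in between.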

Gautier M.,Montpellier SupAgro | Gautier M.,Institute Of Biologie Computationnelle
Molecular Ecology Resources | Year: 2014

The recent democratization of next-generation-sequencing-based approaches towards nonmodel species has made it cost-effective to produce large genotyping data sets for a wider range of species. However, when no detailed genome assembly is available, poor knowledge about the organization of the markers within the genome might hamper the optimal use of this abundant information. At the most basic level of genomic organization, the type of chromosome (autosomes, sex chromosomes, mitochondria or chloroplast in plants) may remain unknown for most markers, which might be limiting or even misleading in some applications, particularly in population genetics. Conversely, the characterization of sex-linked markers allows molecular sexing of the individuals. In this study, we propose a Bayesian model-based classifier named detsex, to assign markers to their chromosome type and/or to perform sexing of individuals based on genotyping data. The performance of detsex is further evaluated by a comprehensive simulation study and by the analysis of real data sets from various origins (microsatellite and SNP data derived from genotyping assay designs and NGS experiments). Irrespective of the origin of the markers or the size of the data set, detsex proved efficient (i) to identify the sex-linked markers, (ii) to perform molecular sexing of the individuals and (iii) to perform basic quality check of the genotyping data sets. The underlying structure of the model also allows each of these potential applications to be considered either separately or jointly. © 2014 John Wiley & Sons Ltd.
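
The link between sex and genotype at sex-linked markers can be illustrated with a deliberately naive Python sketch. It is not detsex and involves no Bayesian model: it assumes an XY system in which males carry a single X copy, assumes the sex of each individual is already known, and simply flags markers at which no male is called heterozygous while at least one female is. The function name, the genotype coding (0, 1, 2 with 1 = heterozygote), and the marker names are hypothetical.

```python
def candidate_x_linked(genotypes, sexes):
    """Flag markers compatible with X linkage under a naive rule.

    genotypes: dict mapping marker name -> list of genotype codes
               (0 or 2 = homozygote, 1 = heterozygote), one per individual.
    sexes:     list of 'M' / 'F' labels in the same individual order.
    """
    flagged = []
    for marker, calls in genotypes.items():
        male_het = sum(1 for g, s in zip(calls, sexes) if s == "M" and g == 1)
        female_het = sum(1 for g, s in zip(calls, sexes) if s == "F" and g == 1)
        # Hemizygous males cannot be heterozygous at an X-linked marker.
        if male_het == 0 and female_het > 0:
            flagged.append(marker)
    return flagged


if __name__ == "__main__":
    sexes = ["M", "M", "F", "F"]
    genotypes = {
        "snp_a": [0, 2, 1, 1],  # heterozygotes only in females
        "snp_b": [1, 0, 1, 2],  # a heterozygous male: autosomal-like
    }
    print(candidate_x_linked(genotypes, sexes))
```

A rule this crude is sensitive to genotyping error and small samples, which is part of the motivation for a model-based classifier of the kind the abstract describes.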
