Kelk S.,Maastricht University |
Fischer M.,Institute For Mathematik Und Informatik |
Moulton V.,University of East Anglia |
Wu T.,University of East Anglia
Theoretical Computer Science | Year: 2016
In phylogenetics, distances are often used to measure the incongruence between a pair of phylogenetic trees that are reconstructed by different methods or using different regions of the genome. Motivated by the maximum parsimony principle in tree inference, we recently introduced the maximum parsimony (MP) distance, which enjoys various attractive properties due to its connection with several other well-known tree distances, such as TBR and SPR. Here we show that computing the MP distance between two trees, an NP-hard problem in general, is fixed-parameter tractable in terms of the TBR distance between the tree pair. Our approach is based on two reduction rules - the chain reduction and the subtree reduction - that are widely used in computing TBR and SPR distances. More precisely, we show that reducing chains to length 4 (but not shorter) preserves the MP distance. In addition, we describe a generalization of the subtree reduction which allows the pendant subtrees to be rooted in different places, and show that this still preserves the MP distance. On a slightly different note, we also show that Monadic Second Order Logic (MSOL), posited over an auxiliary graph structure known as the display graph (obtained by merging the two trees at their leaves), can be used to obtain an alternative proof that computation of MP distance is fixed-parameter tractable in terms of TBR distance. We conclude with an extended discussion in which we focus on similarities and differences between MP distance and TBR distance and present a number of open problems. One particularly intriguing question, emerging from the MSOL formulation, is whether two trees with bounded MP distance induce display graphs of bounded treewidth. © 2016 Elsevier B.V.
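To make the quantity concrete, here is a minimal brute-force sketch for very small trees. It assumes the definition from the earlier work that introduced the MP distance, which this abstract does not restate: the maximum, over all characters on the taxon set, of the absolute difference between the parsimony scores of the character on the two trees. Parsimony scores are computed with Fitch's algorithm on rooted binary trees written as nested tuples; the function names and the tree encoding are illustrative choices, and the exhaustive enumeration is exponential, so this is not the fixed-parameter algorithm of the paper.

```python
from itertools import product


def fitch_score(tree, character):
    """Fitch parsimony score of `character` (dict: leaf -> state) on a rooted
    binary tree given as nested 2-tuples with string leaves."""
    def visit(node):
        if isinstance(node, str):          # leaf: its state set is a singleton
            return {character[node]}, 0
        left, right = node
        lset, lcost = visit(left)
        rset, rcost = visit(right)
        common = lset & rset
        if common:                         # children agree: no extra change
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1  # disagreement costs one change
    return visit(tree)[1]


def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def mp_distance_bruteforce(t1, t2):
    """Assumed definition: max over all characters f of |l_f(T1) - l_f(T2)|.
    Enumerates every assignment of up to n states to n taxa (tiny inputs only)."""
    taxa = sorted(leaves(t1))
    if taxa != sorted(leaves(t2)):
        raise ValueError("trees must have the same leaf set")
    best = 0
    for states in product(range(len(taxa)), repeat=len(taxa)):
        character = dict(zip(taxa, states))
        diff = abs(fitch_score(t1, character) - fitch_score(t2, character))
        best = max(best, diff)
    return best


# Two conflicting quartet trees on the taxa a, b, c, d.
T1 = (("a", "b"), ("c", "d"))
T2 = (("a", "c"), ("b", "d"))
print(mp_distance_bruteforce(T1, T2))
```

For these two quartets the character that groups a with b and c with d needs one change on the first tree and two on the second, which is the kind of witness character the distance is maximizing over.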
Martin F.,Philip Morris Products SA |
Bovet L.,Philip Morris Products SA |
Cordier A.,Philip Morris Products SA |
Stanke M.,Institute For Mathematik Und Informatik |
And 3 more authors.
BMC Genomics | Year: 2012
Background: For decades the tobacco plant has served as a model organism in plant biology to answer fundamental biological questions in the areas of plant development, physiology, and genetics. Due to the lack of sufficient coverage of genomic sequences, however, none of the expressed sequence tag (EST)-based chips developed to date cover gene expression from the whole genome. The availability of Tobacco Genome Initiative (TGI) sequences provides a useful resource to build a whole genome exon array, even if the assembled sequences are highly fragmented. Here, the design of a Tobacco Exon Array is reported and an application to improve the understanding of genes regulated by cadmium (Cd) in tobacco is described. Results: From the analysis and annotation of the 1,271,256 Nicotiana tabacum fasta and quality files from methyl filtered genomic survey sequences (GSS) obtained from the TGI and ~56,000 ESTs available in public databases, an exon array with 272,342 probesets was designed (four probes per exon) and tested on two selected tobacco varieties. Two tobacco varieties out of 45, accumulating low and high cadmium in leaf, were identified based on GGE biplot analysis, that is, analysis of the genotype main effect (G) plus the genotype by environment interaction (GE), of eight field trials (four fields over two years) showing reproducibility across the trials. The selected varieties were grown under greenhouse conditions in two different soils and subjected to exon array analyses using root and leaf tissues to understand the genetic make-up of the Cd accumulation. Conclusions: An Affymetrix Exon Array was developed to cover a large (~90%) proportion of the tobacco gene space. The Tobacco Exon Array will be available for research use through the Affymetrix array catalogue. As a proof of the exon array usability, we have demonstrated that the Tobacco Exon Array is a valuable tool for studying Cd accumulation in tobacco leaves. Data from field and greenhouse experiments supported by gene expression studies strongly suggested that the difference in leaf Cd accumulation between the two specific tobacco cultivars is dependent solely on genetic factors and genetic variability rather than on the environment. © 2012 Martin et al; licensee BioMed Central Ltd.
Unterthiner T.,University of Gottingen |
Schultz A.-K.,University of Gottingen |
Bulla J.,University of Caen Lower Normandy |
Morgenstern B.,University of Gottingen |
And 3 more authors.
BMC Bioinformatics | Year: 2011
Background: Methods of determining whether or not any particular HIV-1 sequence stems - completely or in part - from some unknown HIV-1 subtype are important for the design of vaccines and molecular detection systems, as well as for epidemiological monitoring. Nevertheless, only a single algorithm, the Branching Index (BI), has been developed for this task so far. Moving along the genome of a query sequence in a sliding window, the BI computes a ratio quantifying how closely the query sequence clusters with a subtype clade. In its current version, however, the BI does not provide predicted boundaries of unknown fragments. Results: We have developed Unknown Subtype Finder (USF), an algorithm based on a probabilistic model, which automatically determines which parts of an input sequence originate from a subtype yet unknown. The underlying model is based on a simple profile hidden Markov model (pHMM) for each known subtype and an additional pHMM for an unknown subtype. The emission probabilities of the latter are estimated using the emission frequencies of the known subtypes by means of a (position-wise) probabilistic model for the emergence of new subtypes. We have applied USF to SIV and HIV-1 sequences formerly classified as having emerged from an unknown subtype. Moreover, we have evaluated its performance on artificial HIV-1 recombinants and non-recombinant HIV-1 sequences. The results have been compared with the corresponding results of the BI. Conclusions: Our results demonstrate that USF is suitable for detecting segments in HIV-1 sequences stemming from yet unknown subtypes. Comparing USF with the BI shows that our algorithm performs as well as the BI or better. © 2011 Unterthiner et al; licensee BioMed Central Ltd.
Monti M.,Max Planck Institute for Human Development |
Gigerenzer G.,Max Planck Institute for Human Development |
Gigerenzer G.,Institute For Mathematik Und Informatik |
Martignon L.,Max Planck Institute for Human Development |
Martignon L.,Institute For Mathematik Und Informatik
Sistemi Intelligenti | Year: 2012
This paper aims to uncover the decision processes used by average investors, including their investment goals, the information sets they consider, and the number of factors that influence high-stakes financial decisions. We present new experimental and survey data collected from bank customers at several Italian banks. Most subjects use a strict subset of the information available to them, ignoring variables that standard economic models typically assume drive investors' behavior. Fast and information-frugal heuristics model the information search of many subjects observed in this study, reflecting a noncompensatory-lexicographic hierarchy of features: risk, time horizon and cost, considered in that order. Decision behavior reflects a simple combination of a fast and frugal tree and a tallying rule, predicting 78% of investors' decisions.
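The contrast between a noncompensatory lexicographic rule and a tallying rule can be illustrated with a small sketch. This is not the model fitted in the paper: the cue order (risk, time horizon, cost) comes from the abstract, but the numeric encoding, the assumption that a lower value is preferred on every cue, and the function names are hypothetical choices made only for illustration. The lexicographic rule below is a simple pairwise stand-in for the fast and frugal tree named in the abstract.

```python
# Cue order as stated in the abstract; everything else here is assumed.
CUE_ORDER = ["risk", "time_horizon", "cost"]


def lexicographic_choice(a, b, cues=CUE_ORDER):
    """Decide on the first cue that discriminates; later cues are ignored.
    Assumes (hypothetically) that a lower value is better on every cue."""
    for cue in cues:
        if a[cue] != b[cue]:
            return "a" if a[cue] < b[cue] else "b"
    return None  # no cue discriminates between the options


def tallying_choice(a, b, cues=CUE_ORDER):
    """Count the cues on which each option is better; all cues weigh equally."""
    a_wins = sum(a[cue] < b[cue] for cue in cues)
    b_wins = sum(b[cue] < a[cue] for cue in cues)
    if a_wins == b_wins:
        return None
    return "a" if a_wins > b_wins else "b"


# Hypothetical products: option_a is safer, option_b is shorter and cheaper.
option_a = {"risk": 1, "time_horizon": 10, "cost": 5}
option_b = {"risk": 2, "time_horizon": 1, "cost": 1}

print(lexicographic_choice(option_a, option_b))  # decided by risk alone
print(tallying_choice(option_a, option_b))       # decided by the cue count
```

With these made-up values the two rules disagree: the lexicographic rule stops at risk and never lets the later cues compensate, whereas tallying lets two weaker cues outvote the first.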