Program in Computational Biology and Bioinformatics and.
PubMed | Wellcome Trust Sanger Institute, Program in Computational Biology and Bioinformatics and., Yale University and University of California at Santa Cruz
Type: Comparative Study | Journal: Proceedings of the National Academy of Sciences of the United States of America | Year: 2014
Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.
PubMed | Program in Computational Biology and Bioinformatics and. and Yale University
Type: Journal Article | Journal: Bioinformatics (Oxford, England) | Year: 2015
As next-generation sequencing gains a foothold in clinical genetics, there is a need for annotation tools to characterize increasing amounts of patient variant data for identifying clinically relevant mutations. While existing informatics tools provide efficient bulk variant annotations, they often generate excess information that may limit their scalability. We propose an alternative solution based on description logic inferencing to generate workflows that produce only those annotations that will contribute to the interpretation of each variant. Workflows are dynamically generated using a novel abductive reasoning framework called a basic framework for abductive workflow generation (AbFab). Criteria for identifying disease-causing variants in Mendelian blood disorders were identified and implemented as AbFab services. A web application was built allowing users to run workflows generated from the criteria to analyze genomic variants. Significant variants are flagged and explanations provided for why they match or fail to match the criteria. The Mutadelic web application is available for use at http://email@example.com. Supplementary data are available at Bioinformatics online.
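The idea of generating a workflow that produces only the annotations needed for a given conclusion can be illustrated with a small goal-driven planner. The sketch below is not AbFab and does not use description logic; it is plain backward chaining over hand-written service descriptions, and every service name, annotation name, and the `plan_workflow` function are hypothetical, chosen only for illustration.

```python
from typing import List, NamedTuple, Set


class Service(NamedTuple):
    name: str
    inputs: frozenset   # annotations the service needs before it can run
    outputs: frozenset  # annotations the service produces


# Hypothetical annotation services (not taken from the paper).
SERVICES = [
    Service("locate_gene", frozenset({"variant"}), frozenset({"gene"})),
    Service("predict_effect", frozenset({"variant", "gene"}),
            frozenset({"protein_effect"})),
    Service("population_frequency", frozenset({"variant"}),
            frozenset({"allele_frequency"})),
    Service("conservation_score", frozenset({"variant"}),
            frozenset({"conservation"})),
    Service("flag_significant",
            frozenset({"protein_effect", "allele_frequency"}),
            frozenset({"significance"})),
]


def plan_workflow(goal: str, known: Set[str],
                  services: List[Service]) -> List[str]:
    """Work backward from `goal` and return, in execution order,
    only the services needed to derive it from the `known` facts."""
    plan: List[str] = []
    have = set(known)

    def require(fact: str, visiting: frozenset) -> bool:
        if fact in have:
            return True
        if fact in visiting:
            return False  # circular dependency; give up on this path
        for svc in services:
            if fact not in svc.outputs:
                continue
            # Remember the state so a failed attempt leaves no trace.
            saved_plan, saved_have = list(plan), set(have)
            if all(require(i, visiting | {fact}) for i in sorted(svc.inputs)):
                plan.append(svc.name)
                have.update(svc.outputs)
                return True
            plan[:] = saved_plan
            have.clear()
            have.update(saved_have)
        return False

    if not require(goal, frozenset()):
        raise ValueError(f"no workflow can produce {goal!r}")
    return plan


workflow = plan_workflow("significance", {"variant"}, SERVICES)
print(workflow)
```

In this toy setup the `conservation_score` service is never scheduled, because nothing on the path to `significance` depends on its output. That is the contrast with bulk annotation the abstract draws: only annotations that contribute to the requested conclusion are computed.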