Juan D.,Structural Biology and Bioinformatics Programme | Rodriguez J.M.,National Bioinformatics Institute INB | Frankish A.,Wellcome Trust Sanger Institute | Diekhans M.,University of California at Santa Cruz | And 5 more authors.
Human Molecular Genetics | Year: 2014

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort. © The Author 2014. Published by Oxford University Press. All rights reserved.

Ezkurdia I.,CSIC - National Center for Metallurgical Research | Calvo E.,CSIC - National Center for Metallurgical Research | Del Pozo A.,Hospital Universitario La Paz | Vazquez J.,CSIC - National Center for Metallurgical Research | And 3 more authors.
Expert Review of Proteomics | Year: 2015

The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this, these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease. © 2015 The Author(s). Published by Taylor & Francis.

Abascal F.,Spanish National Cancer Research Center | Rodriguez-Rivas J.,Spanish National Cancer Research Center | Rodriguez J.M.,National Bioinformatics Institute INB | del Pozo A.,Hospital Universitario La Paz | And 3 more authors.
PLoS Computational Biology | Year: 2015

Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved—all the homologous exons we identified evolved over 460 million years ago—and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles. © 2015 Abascal et al.
