Tobar-Tosse F.,Colombian Center for Genomics and Bioinformatics of Extreme Environments Gebix |
Tobar-Tosse F.,PanAmerican Bioinformatics Institute |
Rodriguez A.C.,Colombian Center for Genomics and Bioinformatics of Extreme Environments Gebix |
Velez P.E.,Colombian Center for Genomics and Bioinformatics of Extreme Environments Gebix |
And 6 more authors.
Environment-dependent genomic features have been defined for different metagenomes, whose genes and associated processes are linked to specific environments. Identifying ORFs and assigning their functional categories is the most common way to associate functional features with environmental ones. However, ORF-based analysis misses noncoding sequences, so some regulatory or structural information in the metagenome may be discarded. In this work we analyzed 23 whole metagenomes, including both coding and noncoding sequences, using the following sequence patterns: (G+C) content, codon usage (Cd), trinucleotide usage (Tn), and functional assignment of predicted ORFs. We present evidence that a high proportion of noncoding sequence is discarded by common similarity-based methods in metagenomics, and we describe the kinds of relevant information these sequences carry. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function in metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microbial adaptations and metagenome types. We propose that noncoding sequences carry information relevant to describing metagenomes and could be considered in whole-metagenome analysis to improve metagenome organization, classification protocols, and understanding of their relation to the environment.
© 2013 Tobar-Tosse et al.
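The abstract names the sequence descriptors but not how they were computed. As a rough illustration only, the Python sketch below computes (G+C) content, trinucleotide usage (Tn) over every overlapping window of a sequence, codon usage (Cd) over the in-frame codons of a predicted ORF, and tandem trinucleotide repeats (TRS) found with a simple regular expression. These definitions, the function names, the `min_copies=4` threshold, the per-kilobase density measure, and the example read are all assumptions made for this sketch, not the authors' pipeline.

```python
from __future__ import annotations

import re
from collections import Counter
from itertools import product

# All 64 trinucleotides over the unambiguous DNA alphabet.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_content(seq: str) -> float:
    """(G+C) fraction among unambiguous bases; N and other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def trinucleotide_usage(seq: str) -> dict[str, float]:
    """Tn (assumed definition): relative frequency of every overlapping
    trinucleotide in the whole sequence, coding or noncoding."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRINUCLEOTIDES)
    return {t: (counts[t] / total if total else 0.0) for t in TRINUCLEOTIDES}


def codon_usage(orf: str) -> dict[str, float]:
    """Cd (assumed definition): relative frequency of in-frame codons in a
    predicted ORF; start and stop codons are included."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # drop an incomplete trailing codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[t] for t in TRINUCLEOTIDES)
    return {t: (counts[t] / total if total else 0.0) for t in TRINUCLEOTIDES}


def find_trs(seq: str, min_copies: int = 4) -> list[tuple[int, str, int]]:
    """Tandem trinucleotide repeats as (start, unit, copies).
    Homopolymer runs such as AAAAAA are skipped; min_copies is a
    hypothetical, illustrative threshold."""
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # A 3-base unit followed by at least (min_copies - 1) exact copies of itself.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) > 1:
            hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits


def trs_per_kb(seq: str, min_copies: int = 4) -> float:
    """TRS density as repeats per 1,000 bases (one possible density measure)."""
    return 1000 * len(find_trs(seq, min_copies)) / len(seq) if seq else 0.0


if __name__ == "__main__":
    read = "TTCAGCAGCAGCAGCAGTT"  # hypothetical noncoding fragment
    print(round(gc_content(read), 3))                  # 0.526
    print(find_trs(read))                              # [(2, 'CAG', 5)]
    print(round(trinucleotide_usage(read)["CAG"], 3))  # 0.294
    print(codon_usage("ATGCAGCAGTAA")["CAG"])          # 0.5
```

Because Tn is computed on the whole read, it still captures composition in regions where no ORF is predicted, which is the kind of information the abstract argues ORF-based methods discard. The regex scan is only a convenient stand-in for dedicated tandem-repeat finders, and it reports each repeat in its leftmost phase, so a CAG run preceded by a G may be reported with unit GCA.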