Los Altos Hills, CA, United States

Wheeler T.J.,HHMI Janelia Farm Research Campus | Clements J.,HHMI Janelia Farm Research Campus | Eddy S.R.,HHMI Janelia Farm Research Campus | Hubley R.,Institute for Systems Biology | And 4 more authors.
Nucleic Acids Research | Year: 2013

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross-match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps. © The Author(s) 2012.

Kojima K.K.,Tokyo Institute of Technology | Kojima K.K.,Genetic Information Research Institute
Molecular Biology and Evolution | Year: 2011

Alu is a predominant short interspersed element (SINE) family in the human genome and consists of two monomer units connected by an A-rich linker. At present, dimeric Alu elements are active in humans, but Alu monomers are present as fossilized sequences. A comparative genome analysis of human and chimpanzee genomes revealed eight recent insertions of Alu monomers. One of them was a retroposed product of another Alu monomer with 3′ transduction. Further analysis of 1,404 loci of the Alu monomer in the human genome revealed that some Alu monomers were recently generated by recombination between the internal and 3′ A-rich tracts inside of dimeric Alu elements. The data show that Alu monomers were generated by 1) retroposition of other Alu monomers and 2) recombination between two A-rich tracts. © 2010 The Author.

Lehnert S.,Catholic University of Leuven | Kapitonov V.,Genetic Information Research Institute | Thilakarathne P.J.,Catholic University of Leuven | Schuit F.C.,Catholic University of Leuven
BMC Genomics | Year: 2011

Background: The total number of miRNA genes in a genome, expression of which is responsible for the miRNA repertoire of an organism, is not precisely known. Moreover, the question of how new miRNA genes arise during evolution is incompletely understood. Recent data in humans and opossum indicate that retrotransposons of the class of short interspersed nuclear elements have contributed to the growth of microRNA gene clusters. Method: We studied a large miRNA gene cluster in intron 10 of the mouse Sfmbt2 gene using bioinformatic tools. Results: Mice and rats are unique in harboring a 55-65 Kb DNA sequence in intron 10 of the Sfmbt2 gene. This intronic region is rich in regularly repeated B1 retrotransposons together with inverted self-complementary CA/TG microsatellites. The smallest repeat unit, called MSHORT1 in the mouse, was duplicated 9 times in a tandem head-to-tail array to form 2.5 Kb MLONG1 units. The center of the mouse miRNA gene cluster consists of 13 copies of MLONG1. BLAST analysis of MSHORT1 in the mouse shows that the repeat unit is unique to intron 10 of the Sfmbt2 gene and suggests a dual-phase model for growth of the miRNA gene cluster: arrangement of 10 MSHORT1 units into MLONG1 and further duplication of 13 head-to-tail MLONG1 units in the center of the miRNA gene cluster. Rats have a similar arrangement of repeat units in intron 10 of the Sfmbt2 gene. The discrepancy between 65 miRNA genes in the mouse cluster as compared to only 1 miRNA gene in the corresponding rat repeat cluster is ascribed to sequence differences between MSHORT1 and RSHORT1 that result in lateral-shifted, less-stable miRNA precursor hairpins for RSHORT1. Conclusion: Our data provide new evidence for the emerging concept that lineage-specific retroposons have played an important role in the birth of new miRNA genes during evolution. 
The large difference in the number of miRNA genes in two closely related species (65 versus 1, mice versus rats) indicates that this species-specific evolution can be a rapid process. © 2011 Lehnert et al; licensee BioMed Central Ltd.

Kojima K.K.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
Mobile DNA | Year: 2011

Background: "Domestication" of transposable elements (TEs) led to evolutionary breakthroughs such as the origin of telomerase and the vertebrate adaptive immune system. These breakthroughs were accomplished by the adaptation of molecular functions essential for TEs, such as reverse transcription, DNA cutting and ligation or DNA binding. Cryptons represent a unique class of DNA transposons using tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. Cryptons were originally identified in fungi and later in the sea anemone, sea urchin and insects. Results: Herein we report new Cryptons from animals, fungi, oomycetes and diatom, as well as widely conserved genes derived from ancient Crypton domestication events. Phylogenetic analysis based on the YR sequences supports four deep divisions of Crypton elements. We found that the domain of unknown function 3504 (DUF3504) in eukaryotes is derived from Crypton YR. DUF3504 is similar to YR but lacks most of the residues of the catalytic tetrad (R-H-R-Y). Genes containing the DUF3504 domain are potassium channel tetramerization domain containing 1 (KCTD1), KIAA1958, zinc finger MYM type 2 (ZMYM2), ZMYM3, ZMYM4, glutamine-rich protein 1 (QRICH1) and "without children" (WOC). The DUF3504 genes are highly conserved and are found in almost all jawed vertebrates. The sequence, domain structure, intron positions and synteny blocks support the view that ZMYM2, ZMYM3, ZMYM4, and possibly QRICH1, were derived from WOC through two rounds of genome duplication in early vertebrate evolution. WOC is observed widely among bilaterians. There could be four independent events of Crypton domestication, and one of them, generating WOC/ZMYM, predated the birth of bilaterian animals. This is the third-oldest domestication event known to date, following the domestication generating telomerase reverse transcriptase (TERT) and Prp8. 
Many Crypton-derived genes are transcriptional regulators with additional DNA-binding domains, and the acquisition of the DUF3504 domain could have added new regulatory pathways via protein-DNA or protein-protein interactions. Conclusions: Cryptons have contributed to animal evolution through domestication of their YR sequences. The DUF3504 domains are domesticated YRs of animal Crypton elements. © 2011 Kojima and Jurka; licensee BioMed Central Ltd.

Kojima K.K.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
PLoS ONE | Year: 2013

Target-specific integration of transposable elements for multicopy genes, such as ribosomal RNA and small nuclear RNA (snRNA) genes, is of great interest because of the relatively harmless nature, stable inheritance and possible application for targeted gene delivery of target-specific transposable elements. To date, such strict target specificity has been observed only among non-LTR retrotransposons. We here report a new superfamily of sequence-specific DNA transposons, designated Dada. Dada encodes a DDE-type transposase that shows a distant similarity to transposases encoded by eukaryotic MuDR, hAT, P and Kolobok transposons, as well as the prokaryotic IS256 insertion element. Dada generates 6-7 bp target site duplications upon insertion. One family of Dada DNA transposons targets a specific site inside the U6 snRNA genes and is found in various fish species, water flea, oyster and polychaete worm. Other target sequences of the Dada transposons are U1 snRNA genes and different tRNA genes. The targets are well conserved in multicopy genes, indicating that copy number and sequence conservation are the primary constraints on the target choice of Dada transposons. Dada also opens a new frontier for target-specific gene delivery application. © 2013 Kojima, Jurka.
