Los Altos, CA, United States


Wheeler T.J.,HHMI Janelia Farm Research Campus | Clements J.,HHMI Janelia Farm Research Campus | Eddy S.R.,HHMI Janelia Farm Research Campus | Hubley R.,Institute for Systems Biology | And 4 more authors.
Nucleic Acids Research | Year: 2013

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross-match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps. © The Author(s) 2012.


Kojima K.K.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
Mobile DNA | Year: 2011

Background: "Domestication" of transposable elements (TEs) led to evolutionary breakthroughs such as the origin of telomerase and the vertebrate adaptive immune system. These breakthroughs were accomplished by the adaptation of molecular functions essential for TEs, such as reverse transcription, DNA cutting and ligation or DNA binding. Cryptons represent a unique class of DNA transposons using tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. Cryptons were originally identified in fungi and later in the sea anemone, sea urchin and insects. Results: Herein we report new Cryptons from animals, fungi, oomycetes and diatom, as well as widely conserved genes derived from ancient Crypton domestication events. Phylogenetic analysis based on the YR sequences supports four deep divisions of Crypton elements. We found that the domain of unknown function 3504 (DUF3504) in eukaryotes is derived from Crypton YR. DUF3504 is similar to YR but lacks most of the residues of the catalytic tetrad (R-H-R-Y). Genes containing the DUF3504 domain are potassium channel tetramerization domain containing 1 (KCTD1), KIAA1958, zinc finger MYM type 2 (ZMYM2), ZMYM3, ZMYM4, glutamine-rich protein 1 (QRICH1) and "without children" (WOC). The DUF3504 genes are highly conserved and are found in almost all jawed vertebrates. The sequence, domain structure, intron positions and synteny blocks support the view that ZMYM2, ZMYM3, ZMYM4, and possibly QRICH1, were derived from WOC through two rounds of genome duplication in early vertebrate evolution. WOC is observed widely among bilaterians. There could be four independent events of Crypton domestication, and one of them, generating WOC/ZMYM, predated the birth of bilaterian animals. This is the third-oldest domestication event known to date, following the domestication generating telomerase reverse transcriptase (TERT) and Prp8. 
Many Crypton-derived genes are transcriptional regulators with additional DNA-binding domains, and the acquisition of the DUF3504 domain could have added new regulatory pathways via protein-DNA or protein-protein interactions. Conclusions: Cryptons have contributed to animal evolution through domestication of their YR sequences. The DUF3504 domains are domesticated YRs of animal Crypton elements. © 2011 Kojima and Jurka; licensee BioMed Central Ltd.


Jurka J.,Genetic Information Research Institute | Bao W.,Genetic Information Research Institute | Kojima K.K.,Genetic Information Research Institute
Biology Direct | Year: 2011

Background: Eukaryotic genomes harbor diverse families of repetitive DNA derived from transposable elements (TEs) that are able to replicate and insert into genomic DNA. The biological role of TEs remains unclear, although they have profound mutagenic impact on eukaryotic genomes and the origin of repetitive families often correlates with speciation events. We present a new hypothesis to explain the observed correlations based on classical concepts of population genetics. Presentation of the hypothesis: The main thesis presented in this paper is that the TE-derived repetitive families originate primarily by genetic drift in small populations derived mostly by subdivisions of large populations into subpopulations. We outline the potential impact of the emerging repetitive families on genetic diversification of different subpopulations, and discuss implications of such diversification for the origin of new species. Testing the hypothesis: Several testable predictions of the hypothesis are examined. First, we focus on the prediction that the number of diverse families of TEs fixed in a representative genome of a particular species positively correlates with the cumulative number of subpopulations (demes) in the historical metapopulation from which the species has emerged. Furthermore, we present evidence indicating that human AluYa5 and AluYb8 families might have originated in separate proto-human subpopulations. We also revisit prior evidence linking the origin of repetitive families to mammalian phylogeny and present additional evidence linking repetitive families to speciation based on mammalian taxonomy. 
Finally, we discuss evidence that mammalian orders represented by the largest numbers of species may be subject to relatively recent population subdivisions and speciation events. Implications of the hypothesis: The hypothesis implies that subdivision of a population into small subpopulations is the major step in the origin of new families of TEs as well as of new species. The origin of new subpopulations is likely to be driven by the availability of new biological niches, consistent with the hypothesis of punctuated equilibria. The hypothesis also has implications for the ongoing debate on the role of genetic drift in genome evolution. Reviewers: This article was reviewed by Eugene Koonin, Juergen Brosius and I. King Jordan. © 2011 Jurka et al; licensee BioMed Central Ltd.


Kojima K.K.,Genetic Information Research Institute | Kapitonov V.V.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
Molecular Biology and Evolution | Year: 2011

Autonomous non-long terminal repeat (non-LTR) retrotransposons and their repetitive remnants are ubiquitous components of mammalian genomes. Recently, we identified non-LTR retrotransposon families, Ingi-1-AAl and Ingi-1-EE, in two hedgehog genomes. Here we rename them to Vingi-1-AAl and Vingi-1-EE and report a new clade "Vingi," which is a sister clade of Ingi that lacks the ribonuclease H domain. In the European hedgehog genome, there are 11 non-autonomous families of elements derived from Vingi-1-EE by internal deletions. No retrotransposons related to Vingi elements were found in any of the remaining 33 mammalian genomes nearly completely sequenced to date, but we identified several new families of Vingi and Ingi retrotransposons outside mammals. Our data suggest the horizontal transfer of Vingi elements to hedgehog, although vertical transfer cannot be ruled out. The compact structure and trans-mobilization of non-autonomous derivatives of Vingi can make them useful for an in vivo retrotransposition assay system. © 2010 The Author.


Kojima K.K.,Tokyo Institute of Technology | Kojima K.K.,Genetic Information Research Institute
Molecular Biology and Evolution | Year: 2011

Alu is a predominant short interspersed element (SINE) family in the human genome and consists of two monomer units connected by an A-rich linker. At present, dimeric Alu elements are active in humans, but Alu monomers are present as fossilized sequences. A comparative genome analysis of human and chimpanzee genomes revealed eight recent insertions of Alu monomers. One of them was a retroposed product of another Alu monomer with 3′ transduction. Further analysis of 1,404 loci of the Alu monomer in the human genome revealed that some Alu monomers were recently generated by recombination between the internal and 3′ A-rich tracts inside of dimeric Alu elements. The data show that Alu monomers were generated by 1) retroposition of other Alu monomers and 2) recombination between two A-rich tracts. © 2010 The Author.


Kojima K.K.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
PLoS ONE | Year: 2013

Target-specific integration of transposable elements for multicopy genes, such as ribosomal RNA and small nuclear RNA (snRNA) genes, is of great interest because of the relatively harmless nature, stable inheritance and possible application for targeted gene delivery of target-specific transposable elements. To date, such strict target specificity has been observed only among non-LTR retrotransposons. We here report a new superfamily of sequence-specific DNA transposons, designated Dada. Dada encodes a DDE-type transposase that shows a distant similarity to transposases encoded by eukaryotic MuDR, hAT, P and Kolobok transposons, as well as the prokaryotic IS256 insertion element. Dada generates 6-7 bp target site duplications upon insertion. One family of Dada DNA transposons targets a specific site inside the U6 snRNA genes and is found in various fish species, water flea, oyster and polychaete worm. Other target sequences of the Dada transposons are U1 snRNA genes and different tRNA genes. The targets are well conserved in multicopy genes, indicating that copy number and sequence conservation are the primary constraints on the target choice of Dada transposons. Dada also opens a new frontier for target-specific gene delivery application. © 2013 Kojima, Jurka.
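
As a rough illustration of what a target site duplication (TSD) looks like in sequence data, the sketch below measures the direct repeat shared by the two flanks of an insertion. It is not the method used in the paper: the function name `tsd_length`, the `max_len` parameter, and the example sequences are all hypothetical, and a real analysis would also have to tolerate mismatches and uncertain element boundaries.

```python
def tsd_length(left_flank: str, right_flank: str, max_len: int = 20) -> int:
    """Length of the longest exact direct repeat flanking an insertion.

    left_flank:  genomic sequence immediately 5' of the element (hypothetical input)
    right_flank: genomic sequence immediately 3' of the element (hypothetical input)
    A TSD of length k shows up as the last k bases of the left flank being
    identical to the first k bases of the right flank. Returns 0 if none.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    best = 0
    # Check every candidate length; a longer match can exist even when a
    # shorter suffix/prefix pair does not match, so we do not stop early.
    for k in range(1, min(max_len, len(left), len(right)) + 1):
        if left[-k:] == right[:k]:
            best = k
    return best


# Made-up flanks sharing the 6 bp repeat ACGTAC around an insertion site.
example = tsd_length("GGCCTTACGTAC", "ACGTACTTGGAA")
```

Exact matching is the simplest possible choice here; short repeats can also match by chance, so a result in the 6-7 bp range is only suggestive on its own.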


Bao W.,Genetic Information Research Institute | Kojima K.K.,Genetic Information Research Institute | Kojima K.K.,Tokyo Medical University | Kohany O.,Genetic Information Research Institute
Mobile DNA | Year: 2015

Repbase Update (RU) is a database of representative repeat sequences in eukaryotic genomes. Since its first development as a database of human repetitive sequences in 1992, RU has been serving as a well-curated reference database fundamental for almost all eukaryotic genome sequence analyses. Here, we introduce recent updates of RU, focusing on technical issues concerning the submission and updating of Repbase entries and will give short examples of using RU data. RU sincerely invites a broader submission of repeat sequences from the research community. © 2015 Bao et al.


Bao W.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
Mobile DNA | Year: 2013

Background: Bacterial insertion sequences (IS) of the IS200/IS605 and IS607 families often encode a transposase (TnpA) and a protein of unknown function, TnpB. Results: Here we report two groups of TnpB-like proteins (Fanzor1 and Fanzor2) that are widespread in diverse eukaryotic transposable elements (TEs), and in large double-stranded DNA (dsDNA) viruses infecting eukaryotes. Fanzor and TnpB proteins share the same conserved amino acid motif in their C-terminal half regions: D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD, but are highly variable in their N-terminal regions. Fanzor1 proteins are frequently captured by DNA transposons from different superfamilies including Helitron, Mariner, IS4-like, Sola and MuDr. In contrast, Fanzor2 proteins appear only in some IS607-type elements. We also analyze a new Helitron2 group from the Helitron superfamily, which contains elements with hairpin structures on both ends. Non-autonomous Helitron2 elements (CRe-1, 2, 3) in the genome of green alga Chlamydomonas reinhardtii are flanked by target site duplications (TSDs) of variable length (approximately 7 to 19 bp). Conclusions: The phylogeny and distribution of the TnpB/Fanzor proteins indicate that they may be disseminated among eukaryotic species by viruses. We hypothesize that TnpB/Fanzor proteins may act as methyltransferases. © 2013 Bao and Jurka; licensee BioMed Central Ltd.


Bao W.,Genetic Information Research Institute | Jurka J.,Genetic Information Research Institute
Gene | Year: 2010

LINE-1 (L1) retrotransposons represent the most abundant family of non-LTR retrotransposons in virtually all mammals. The only currently known exception is the platypus, where L1 is found only in low copy numbers. Autonomous L1s encode two proteins, ORF1p and ORF2p, both of which are required for the transposition of L1s. L1 replicative machinery is also involved in the trans-mobilization of non-autonomous retrotransposons, such as diverse short interspersed repetitive elements (SINEs) and processed pseudogenes. Here, we focus on a unique category of "half-L1" elements (HAL1s), which encode ORF1p but not ORF2p. HAL1s are present both in placental mammals and marsupials. We demonstrate that HAL1s originated independently several times during the evolution of mammals. The youngest mammalian HAL1 elements analyzed in this paper were identified in the guinea pig genome. Our analysis strongly suggests that HAL1-encoded ORF1p is essential for the transposition of HAL1s and indicates that the evolution of ORF1p in HAL1s is faster than in L1s. The implications of HAL1 for the evolution of L1 elements and the host genomes are discussed. © 2010 Elsevier B.V.


Kojima K.K.,Genetic Information Research Institute
Genome Biology and Evolution | Year: 2015

Eukaryotic genomes are colonized by various transposons including short interspersed elements (SINEs). The 5′ region (head) of the majority of SINEs is derived from one of the three types of RNA genes - 7SL RNA, transfer RNA (tRNA), or 5S ribosomal RNA (rRNA) - and the internal promoter inside the head promotes the transcription of the entire SINEs. Here I report a new group of SINEs whose heads originate from either the U1 or U2 small nuclear RNA gene. These SINEs, named SINEU, are distributed among crocodilians and classified into three families. The structures of the SINEU-1 subfamilies indicate the recurrent addition of a U1- or U2-derived sequence onto the 5′ end of SINEU-1 elements. SINEU-1 and SINEU-3 are ancient and shared among alligators, crocodiles, and gharials, while SINEU-2 is absent in the alligator genome. SINEU-2 is the only SINE family that was active after the split of crocodiles and gharials. All SINEU families, especially SINEU-3, are preferentially inserted into a family of Mariner DNA transposon, Mariner-N4-AMi. A group of Tx1 non-long terminal repeat retrotransposons designated Tx1-Mar also show target preference for Mariner-N4-AMi, indicating that SINEU was mobilized by Tx1-Mar. © The Author(s) 2015.
