Blake J.A.,The Jackson Laboratory |
Dolan M.,The Jackson Laboratory |
Drabkin H.,The Jackson Laboratory |
Hill D.P.,The Jackson Laboratory |
And 182 more authors.
Nucleic Acids Research | Year: 2013
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself have increased through the use of automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. © The Author(s) 2012.
Blake J.A.,The Jackson Laboratory |
Christie K.R.,The Jackson Laboratory |
Dolan M.E.,The Jackson Laboratory |
Drabkin H.J.,The Jackson Laboratory |
And 209 more authors.
Nucleic Acids Research | Year: 2015
The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. © The Author(s) 2014.
News Article | October 25, 2016
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
‘Homoeologous’ chromosomes are anciently orthologous chromosomes that diverged by speciation but were reunited in the same nucleus by a polyploidization event. They are a special case of paralogues. Homoeologous genes are sometimes called ‘alloalleles’ to emphasize their role as alternate forms of a gene, but since homoeologues are unlinked and assort independently, we do not use this terminology. Similarly, loss of homoeologous genes is sometimes referred to as ‘diploidization’. We prefer the simpler and more descriptive term ‘gene loss’. Note that an allotetraploid such as Xenopus laevis has two related subgenomes, but these subgenomes are each transmitted to progeny via conventional disomic inheritance. Thus, immediately after allotetraploidization, the new species is already genetically diploid. This is clearly the case for X. laevis, since we find no evidence for recombination between homoeologous chromosomes, which would create new sequences with mixed ‘L’ and ‘S’ type transposable elements.
DNA was extracted from the blood of a single female from the inbred J-strain for whole-genome shotgun sequencing. We generated 4.6 billion paired-end Illumina reads from a range of insert sizes and used Sanger dideoxy sequencing to obtain fosmid- and bacterial artificial chromosome (BAC)-end pairs and full BAC sequences. We used meraculous [45] as the primary genome assembler. See the Supplementary Notes for more detailed information.
We identified 798 BACs containing genes of interest distributed across the Xenopus genome and performed fluorescence in situ hybridization (FISH) to assign these BACs to specific chromosomes based on Hoechst 33258-stained late-replication banding patterns (Supplementary Table 1).
Tethered chromatin conformation capture (TCC) [46] and in vitro chromatin conformation capture [47] were performed as previously described, and the resulting data were assembled with HiRise [47].
Sex determination in X. laevis follows a female heterogametic ZZ/ZW system [48]. We fully sequenced BAC clones representing both W and Z haplotypes, and identified both W- and Z-specific sequences (Extended Data Fig. 2a). The existence of the Z-specific sequence was unexpected and was therefore verified by PCR analysis using specific primer sets and DNA from gynogenetic frogs having either W or Z loci.
We made use of extensive previously generated transcriptome data for X. laevis and X. tropicalis, including 697,015 X. laevis EST sequences [49]. In addition, more than 1 billion RNA-seq reads were generated for this project from 14 oocyte/developmental stages and 14 adult tissues from J-strain X. laevis (Supplementary Note 4). These data were combined with homology and ab initio predictions using the Joint Genome Institute’s integrated gene call pipeline (see Supplementary Notes 4 and 8 for more details).
We identified subgenome-specific repeats from RepeatMasker [50] output. The repeats were used to reconstruct full-length subgenome-specific transposon sequences. The specific transposons, Xl-TpL_harb, Xl-TpS_harb and Xl-TpS_mar, were classified on the basis of their target site sequence and terminal inverted repeat (TIR) sequences. The coverage lengths of the transposons on each chromosome were calculated from the results of a BLASTN search (E < 10^−5) using the consensus sequences of the transposons as queries. The chromosomal distribution of Xl-TpS_mar was revealed by FISH analysis (Supplementary Note 7.4).
We used Hymenochirus boettgeri, Pipa carvalhoi and Rana pipiens sequences as outgroups to estimate the evolutionary rate of duplicated genes in X. laevis and their relationship to X. tropicalis. See Supplementary Notes 7 and 8 for more detail.
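The transposon coverage calculation described above can be illustrated with a minimal sketch. It assumes BLASTN hits are already parsed into (chromosome, start, end, E-value) tuples with 1-based inclusive coordinates, and that overlapping hits are counted once. The text does not state how overlaps were handled, so the merging step is an assumption, and the function name, chromosome labels and example hits are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text (E < 10^-5)


def coverage_by_chromosome(hits, max_evalue=E_VALUE_CUTOFF):
    """Sum the bases covered by significant hits on each chromosome.

    hits: iterable of (chromosome, start, end, evalue) tuples, with
    1-based inclusive subject coordinates. Overlapping or adjacent hits
    are merged so each base is counted once (an assumption, see lead-in).
    """
    intervals = defaultdict(list)
    for chrom, start, end, evalue in hits:
        if evalue < max_evalue:
            # Minus-strand hits are reported with start > end; normalise.
            lo, hi = sorted((start, end))
            intervals[chrom].append((lo, hi))

    coverage = {}
    for chrom, spans in intervals.items():
        spans.sort()
        total = 0
        cur_lo, cur_hi = spans[0]
        for lo, hi in spans[1:]:
            if lo <= cur_hi + 1:  # overlapping or directly adjacent
                cur_hi = max(cur_hi, hi)
            else:
                total += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        total += cur_hi - cur_lo + 1
        coverage[chrom] = total
    return coverage


# Hypothetical hits for illustration only.
example_hits = [
    ("chr1L", 100, 199, 1e-30),
    ("chr1L", 150, 249, 1e-12),  # overlaps the previous hit
    ("chr1L", 400, 300, 1e-8),   # minus-strand hit
    ("chr1S", 10, 59, 1e-20),
    ("chr1S", 500, 600, 1e-3),   # fails the E-value cutoff
]
example_coverage = coverage_by_chromosome(example_hits)
```

Comparing the per-chromosome totals for an L-specific and an S-specific transposon would then show which chromosomes belong to which subgenome.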
Pseudogene sequences contain various defects, including premature stop codons, frameshifts, disrupted splicing, and/or partial coding deletions. A total of 985 pseudogenes were identified among 1,531 ‘2-1-2 regions’, with the others deleted or rendered unidentifiable by mutation. Of these 985, 368 could be timed, based on the accumulation of non-synonymous and synonymous substitutions between a pseudogene, its homoeologue and its orthologue in X. tropicalis, providing a time since the loss of constraint for each pseudogene [37]. See Supplementary Note 9 for additional details.
We used several bioinformatic methods and high-throughput datasets to assign functional annotations to Xenopus genes. Protein domains were assigned using InterPro (including PFAM and Panther) [51] and KEGG [52]. Gene Ontology terms were assigned using InterPro2Go [51]. We identified genes that encode mitochondrial proteins by mapping the MitoCarta [53] database from mouse to the most recent X. tropicalis proteome. Xenopus genes associated with germ plasm were manually curated using the extensive Xenopus literature (Supplementary Note 13).
We analysed transcriptome data generated for 14 oocyte/developmental stages and 14 adult tissues, in duplicate except for oocyte stages (see Supplementary Note 4). Expression levels were measured by mapping paired-end RNA-seq reads to predicted full-length cDNAs and reporting transcripts per one million mapped reads (TPM). We consider the limit of detectable expression to be TPM > 0.5. Co-expression modules were defined by weighted gene correlation network analysis (WGCNA) clustering [54] (Supplementary Note 12).
We determined DNA methylation levels (DNAme) by whole-genome bisulfite sequencing and used ChIP–seq to generate profiles of the promoter mark histone H3 lysine 4 trimethylation (H3K4me3), the transcription elongation mark H3K36me3, as well as RNA polymerase II (RNAPII) and the enhancer-associated co-activator p300.
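The expression measure and detection limit described above can be sketched as follows. The text does not give a formula, so this assumes the commonly used TPM definition: read counts are divided by transcript length, then scaled so the values sum to one million. The function names, transcript identifiers, counts and lengths are hypothetical.

```python
DETECTION_LIMIT = 0.5  # threshold quoted in the text (TPM > 0.5)


def tpm(counts, lengths):
    """Convert mapped read counts to TPM.

    counts:  dict of transcript id -> number of mapped reads (or read pairs)
    lengths: dict of transcript id -> transcript length in bases
    Uses the standard definition (an assumption, see lead-in): the
    length-normalised rate of each transcript, rescaled to sum to 1e6.
    """
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}


def detected(tpm_values, limit=DETECTION_LIMIT):
    """Return the set of transcripts whose TPM exceeds the detection limit."""
    return {t for t, value in tpm_values.items() if value > limit}


# Hypothetical counts and lengths for illustration only.
example_counts = {"geneA.L": 500, "geneA.S": 100, "geneB.L": 0}
example_lengths = {"geneA.L": 1000, "geneA.S": 2000, "geneB.L": 1500}
example_tpm = tpm(example_counts, example_lengths)
example_detected = detected(example_tpm)
```

Because TPM values within a sample sum to the same total, L and S homoeologues can be compared on a common scale within each stage or tissue.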
To test which regulatory features contribute most to the L versus S expression differences, we applied a Random Forest machine learning algorithm to analyse differential expression between the L and S homoeologues (see Supplementary Note 14).
The XENLAv9.1 genome assembly and annotation are deposited at NCBI (accession LYTH00000000). The DNA read libraries of X. laevis and X. borealis were deposited at the Sequence Read Archive under accessions SRP071264 and SRP070985, respectively. Datasets of the X. laevis RNA-seq short reads were deposited in NCBI Gene Expression Omnibus (accession number GSE73430 for stages, GSE73419 for tissues). Datasets of the Hymenochirus RNA-seq short reads were deposited in NCBI GEO (accession number GSE76089). The epigenetic data have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE76059 for ChIP–seq. MethylC-seq data are accessible through GEO Series accession number GSE76247. The sequence data from BAC and fosmid clones have been deposited to DDBJ/GenBank/EMBL under the accession numbers: (i) GA131508–GA227532, GA228275–GA244139, GA244852–GA274229, GA274976–GA275712, GA277157–GA344957, GA345673–GA350926 and GA351685–GA393223 for the XLB1 end-sequences; (ii) GA720358–GA756840 for the XLB2 end-sequences; (iii) GA756841–GA867435 for the XLFIC end-sequences and (iv) AP012997–AP013026, AP014660–AP014679, AP017316 and AP017317 for the finished BAC/fosmid sequences.