Ai H.,Jiangxi Agricultural University |
Fang X.,BGI Technology |
Yang B.,Jiangxi Agricultural University |
Huang Z.,BGI Technology |
And 23 more authors.
Nature Genetics | Year: 2015
Domestic pigs have evolved genetic adaptations to their local environmental conditions, such as cold and hot climates. We sequenced the genomes of 69 pigs from 15 geographically divergent locations in China and detected 41 million variants, of which 21 million were absent from the dbSNP database. In a genome-wide scan, we identified a set of loci that likely have a role in regional adaptations to high- and low-latitude environments within China. Intriguingly, we found an exceptionally large (14-Mb) region with a low recombination rate on the X chromosome that appears to have two distinct haplotypes in the high- and low-latitude populations, possibly underlying their adaptation to cold and hot environments, respectively. Surprisingly, the adaptive sweep in the high-latitude regions has acted on DNA that might have been introgressed from an extinct Sus species. Our findings provide new insights into the evolutionary history of pigs and the role of introgression in adaptation. © 2015 Nature America, Inc. All rights reserved.
Xie Y.,South China University of Technology |
Xie Y.,Hong Kong University of Science and Technology |
Wu G.,BGI Shenzhen |
Tang J.,BGI Shenzhen |
And 20 more authors.
Bioinformatics | Year: 2014
Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughput and decrease in cost of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) from next-generation sequencing make de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge. Results: Here, we present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse. Using as our benchmarks the known transcripts from these well-annotated genomes (sequenced a decade ago), we assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled such practical issues as alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution. © 2014 The Author. Published by Oxford University Press. All rights reserved.
Liu G.,Zhejiang University |
Liu G.,Guizhou University |
Li W.,BGI Technology |
Zheng P.,Zhejiang University |
And 5 more authors.
BMC Genomics | Year: 2012
Background: Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy. Results: We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of 'Suli' pear (Pyrus pyrifolia white pear group) using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp), including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1%) unigenes were annotated using public protein databases with an E-value cut-off of 10^-5. We mainly compared gene expression levels at four time-points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR. Conclusions: The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provide a basis for future studies of metabolism during bud dormancy in non-model but economically important perennial species. © 2012 Liu et al.; licensee BioMed Central Ltd.
Yuan Y.,BGI Technology |
Xu H.,BGI Technology |
Leung R.K.-K.,BGI Technology |
Leung R.K.-K.,University of Hong Kong |
Leung R.K.-K.,Chinese University of Hong Kong
BMC Genomics | Year: 2016
Background: Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform has not been carried out. Unlike reads from Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. Combining sequencing data from different platforms can likewise produce reads of varying length. Whether commonly used software handles such data satisfactorily is unknown. Results: Using universal human reference RNA as the starting material, RNaseIII and chemical fragmentation methods in library construction gave similar results for the number of genes and junctions discovered and for the accuracy of expression-level estimates. In contrast, sequencing quality, read length and the choice of software affected the mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27% to genome, 86.46% to transcriptome), though 47.83% of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation as a guide, the mapping rate of TopHat2 increased significantly from 75.79 to 92.09%, especially for long (>150 bp) reads. Sailfish, a k-mer-based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. Conclusion: We provide, for the first time, reference statistics on library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. Chemical fragmentation performed as well as enzyme-based fragmentation. The optimal Ion Proton sequencing options and analysis software have been evaluated. © 2016 The Author(s).
Radhakrishnan P.,University of Nebraska Medical Center |
Dabelsteen S.,Copenhagen University |
Madsen F.B.,Copenhagen University |
Francavilla C.,Copenhagen University |
And 20 more authors.
Proceedings of the National Academy of Sciences of the United States of America | Year: 2014
Aberrant expression of immature truncated O-glycans is a characteristic feature observed on virtually all epithelial cancer cells, and a very high frequency is observed in early epithelial premalignant lesions that precede the development of adenocarcinomas. Expression of the truncated O-glycan structures Tn and sialyl-Tn is strongly associated with poor prognosis and overall low survival. The genetic and biosynthetic mechanisms leading to accumulation of truncated O-glycans are not fully understood and include mutation or dysregulation of glycosyltransferases involved in elongation of O-glycans, as well as relocation of glycosyltransferases controlling initiation of O-glycosylation from Golgi to endoplasmic reticulum. Truncated O-glycans have been proposed to play functional roles for cancer-cell invasiveness, but our understanding of the biological functions of aberrant glycosylation in cancer is still highly limited. Here, we used exome sequencing of most glycosyltransferases in a large series of primary and metastatic pancreatic cancers to rule out somatic mutations as a cause of expression of truncated O-glycans. Instead, we found hypermethylation of core 1 β3-Gal-T-specific molecular chaperone, a key chaperone for O-glycan elongation, as the most prevalent cause. We next used gene editing to produce isogenic cell systems with and without homogenous truncated O-glycans that enabled, to our knowledge, the first polyomic and side-by-side evaluation of the cancer O-glycophenotype in an organotypic tissue model and in xenografts. The results strongly suggest that truncation of O-glycans directly induces oncogenic features of cell growth and invasion. The study provides support for targeting cancer-specific truncated O-glycans with immunotherapeutic measures.
Yang F.,Copenhagen University |
Li W.,BGI Technology |
Jorgensen H.J.L.,Copenhagen University
PLoS ONE | Year: 2013
The disease septoria leaf blotch of wheat, caused by the fungal pathogen Septoria tritici, is of worldwide concern. The fungus exhibits a hemibiotrophic lifestyle, with a long symptomless, biotrophic phase followed by a sudden transition to necrotrophy associated with host necrosis. Little is known about the systematic interaction between fungal pathogenicity and host responses at specific growth stages and the factors triggering the transition. In order to gain some insights into global transcriptome alterations in both host and pathogen during the two phases of the compatible interaction, disease transition was monitored using pathogenesis-related gene markers and H2O2 signature prior to RNA-Seq. Transcriptome analysis revealed that the slow symptomless growth was accompanied by minor metabolic responses and slightly suppressed defences in the host, whereas necrotrophic growth was associated with enhanced host responses involving energy metabolism, transport, signalling, defence and oxidative stress as well as a decrease in photosynthesis. The fungus expresses distinct classes of stage-specific genes encoding potential effectors, probably first suppressing plant defence responses/facilitating the symptomless growth and later triggering lifestyle transition and inducing host necrosis/facilitating the necrotrophic growth. Transport, signalling, anti-oxidative stress mechanisms and apoplastic nutrient acquisition play important roles in the entire infection process of S. tritici. Our findings uncover systematic S. tritici-induced expression profiles of wheat related to specific fungal infection strategies and provide a transcriptome resource for studying both hosts and pathogens in plant-Dothideomycete interactions. © 2013 Yang et al.
Yang F.,Copenhagen University |
Yin Q.,BGI Technology
Proteomics | Year: 2016
Zymoseptoria tritici causes Septoria tritici blotch disease of wheat. To obtain a comprehensive protein dataset of this fungal pathogen, proteomes of Z. tritici growing in nutrient-limiting and rich media and in vivo at a late stage of wheat infection were fractionated by 1D gel or strong cation exchange (SCX) chromatography and analyzed by LC-MS/MS. A total of 5731, 5376 and 3168 Z. tritici proteins were confidently identified from these conditions, respectively. Of these in vitro and in planta proteins, 9 and 11% were predicted to contain signal peptides, respectively. Functional classification analysis revealed that the proteins were involved in various cellular activities. Comparison of the three distinct protein expression profiles demonstrates elevated carbohydrate, lipid and secondary metabolism, transport, protein processing and energy production specifically in the host environment, in contrast to the enhancement of signaling, defense, replication, transcription and cell division in vitro. The data provide useful targets towards a better understanding of the molecular basis of Z. tritici growth, development, stress response and pathogenicity. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Dalal K.,Örebro University |
Lin Z.,BGI Technology |
Gifford M.,Örebro University |
Svanstrom L.,University of Skövde
International Journal of Preventive Medicine | Year: 2013
Background: To estimate the economic loss due to road traffic injuries (RTIs) in World Health Organization (WHO) member countries and to explore the relationship between the economic loss and relevant health system factors. Methods: Data from the World Bank and the WHO were used to build the databases. Disability-adjusted life years (DALYs) and gross domestic product per capita were used to estimate the economic loss relating to RTIs. Regression analysis was used. Data were analyzed with IBM SPSS Statistics, Version 20.0. Results: In 2005, the total economic loss from RTIs was estimated to be 167,752.4 million United States dollars. High income countries (HIC) showed the greatest economic losses. The majority (96%) of the top 25 countries with the greatest DALY losses are low and middle income countries, while 48% of the top 25 countries with the highest economic losses are HIC. The linear regression model indicates an inverse relationship between nurse density in the health system and economic loss due to RTIs. Conclusions: RTIs cause enormous death and DALY losses in low and middle income countries and enormous economic loss in HIC. More road traffic injury prevention programs should be promoted in these areas to reduce both the incidence and the economic burden of RTIs.
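The abstract states that DALYs and GDP per capita were combined to estimate economic loss but does not give the formula. A minimal sketch, assuming the common convention that one DALY is valued at one year of GDP per capita; the function name, country labels and figures below are hypothetical and are not taken from the study:

```python
def economic_loss_usd(dalys: float, gdp_per_capita_usd: float) -> float:
    """Estimate economic loss as DALYs lost times GDP per capita.

    Assumption: one DALY is valued at one year of GDP per capita.
    The study may have used a different weighting.
    """
    if dalys < 0 or gdp_per_capita_usd < 0:
        raise ValueError("inputs must be non-negative")
    return dalys * gdp_per_capita_usd


# Hypothetical inputs: (DALYs lost to RTIs, GDP per capita in USD).
countries = {
    "High-income example": (12_000.0, 45_000.0),
    "Low-income example": (90_000.0, 1_500.0),
}

# Per-country loss and the overall total.
losses = {name: economic_loss_usd(d, g) for name, (d, g) in countries.items()}
total_loss = sum(losses.values())

for name, loss in losses.items():
    print(f"{name}: {loss / 1e6:.1f} million USD")
print(f"Total: {total_loss / 1e6:.1f} million USD")
```

Under this assumption, a country with far fewer DALYs can still carry the larger monetary loss when its GDP per capita is high, which is the pattern the abstract describes for high income countries.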
Homouz D.,Khalifa University |
Chen G.,BGI Technology |
Kudlicki A.S.,University of Texas Medical Branch
Scientific Reports | Year: 2015
We report and model a previously undescribed systematic error causing spurious excess correlations that depend on the distance between probes on Affymetrix® microarrays. The phenomenon affects pairs of features with large chip separations, extending to more than 100 probes apart. The effect may have a significant impact on analysis of correlations in large collections of expression data, where the systematic experimental errors are repeated in many data sets. Examples of such studies include analysis of functions and interactions in groups of genes, as well as global properties of genomes. We find that the average correlations between probes on Affymetrix microarrays are larger for smaller chip distances, which points to a previously undescribed positional artifact. The magnitude of the artifact depends on the design of the chip, and we find it to be especially high for the yeast S98 microarray, where spurious excess correlations reach 0.1 at a distance of 50 probes. We have designed an algorithm to correct this bias and provide new data sets with the corrected expression values. This algorithm was successfully applied to remove the positional artifact from the S98 chip data while preserving the integrity of the data. © 2015, Nature Publishing Group. All rights reserved.
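The diagnostic described above, average correlation as a function of probe distance, might be sketched as follows. This is an illustration only, not the authors' algorithm: it assumes probes are indexed along a single axis (real chips have a two-dimensional layout), and the function names and toy profiles are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mean_correlation_by_distance(profiles, max_distance):
    """Average pairwise correlation for each probe separation 1..max_distance.

    profiles[i] holds the expression values of the probe at position i
    across experiments. Positions are treated as a 1D index (assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(profiles)):
        for d in range(1, max_distance + 1):
            j = i + d
            if j >= len(profiles):
                break
            sums[d] += pearson(profiles[i], profiles[j])
            counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}


# Toy data: four probes measured in three experiments.
toy_profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [1, 2, 3]]
by_distance = mean_correlation_by_distance(toy_profiles, max_distance=3)
print(by_distance)
```

On real data with no positional artifact, the mean correlation between functionally unrelated probes would be expected to be roughly flat across distances; a curve that rises at small separations is the kind of signal the abstract reports.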
PubMed | Northwest Agriculture and Forestry University, Yunnan Academy of Grassland and Animal Science and BGI Technology
Scientific Reports | Year: 2016
Gayal (Bos frontalis) is a semi-wild and endangered bovine species that differs from domestic cattle (Bos taurus and Bos indicus), and its genetic background remains unclear. Here, we performed whole-genome sequencing of one Gayal for the first time, with one Red Angus and one Japanese Black cattle as controls. In total, 97.8 Gb of sequencing reads were generated with an average 11.78-fold depth and >98.44% coverage of the reference sequence (UMD3.1). Numerous variations were identified: 62.24% of the total single nucleotide polymorphisms (SNPs) detected in Gayal were novel, and 16,901 breed-specific nonsynonymous SNPs (BS-nsSNPs) that might be associated with traits of interest in Gayal were further investigated. Moreover, the demographic history of bovine species was analyzed for the first time, and two population expansions and two population bottlenecks were identified. The obvious differences among their population sizes supported the conclusion that Gayal is not B. taurus. The phylogenetic analysis suggested that Gayal is a hybrid descendant of a cross between male wild gaur and female domestic cattle. These discoveries will provide valuable genomic information regarding potential genomic markers that could predict traits of interest for breeding programs of these cattle breeds and may assist relevant departments with future conservation and utilization of Gayal.