Beijing Genomic Institute
Hong H.X.,U.S. Food and Drug Administration |
Zhang W.Q.,Beijing Genomic Institute |
Shen J.,U.S. Food and Drug Administration |
Su Z.Q.,U.S. Food and Drug Administration |
And 5 more authors.
Science China Life Sciences | Year: 2013
Realizing personalized medicine requires integrating diverse data types with bioinformatics. The most vital data are genomic information for individuals that are from advanced next-generation sequencing (NGS) technologies at present. The technologies continue to advance in terms of both decreasing cost and sequencing speed with concomitant increase in the amount and complexity of the data. The prodigious data together with the requisite computational pipelines for data analysis and interpretation are stressors to IT infrastructure and the scientists conducting the work alike. Bioinformatics is increasingly becoming the rate-limiting step with numerous challenges to be overcome for translating NGS data for personalized medicine. We review some key bioinformatics tasks, issues, and challenges in contexts of IT requirements, data quality, analysis tools and pipelines, and validation of biomarkers. © 2013 The Author(s).
PubMed | Qiannan Peoples Hospital, Guiyang Medical College, Beijing Genomic Institute, Pingtang Peoples Hospital and 2 more.
Type: Journal Article | Journal: BMJ open gastroenterology | Year: 2015
A total of 105 patients were identified as accidentally infected with hepatitis C virus genotype 1b (HCV1b) through blood transfusion from a single blood donor. This group provides a unique patient population to study host factors involved in the spontaneous clearance of HCV and disease progression. Clinical markers, HCV RNA and eight single nucleotide polymorphisms (SNPs) of interleukin-28B (IL-28B) were detected. Exome capture and sequencing were analysed for association with HCV clearance. Among the 85 patients with the positive HCV antibody, 27 cases (31.8%) were HCV RNA negative over a period of 9-12 years. Of the 58 patients with positive HCV RNA, 22.4% developed chronic hepatitis, and 5.2% developed cirrhosis. Age was found to be associated with HCV1b clearance. IL-28 rs10853728 CC showed the trend. By exon sequencing, 39 SNPs were found to be significantly different in spontaneous clearance patients (p<0.001). Two SNPs in the tenascin receptor (TNR), five in the transmembrane protease serine 11A (TMPRSS11A), and one in the serine peptidase inhibitor kunitz type 2 (SPINT2) showed the closest associations (p<10^-5). Host genetic analyses of the unique, single-source HCV1b-infected patient population have suggested that age and mutations in the TNR, TMPRSS11A and SPINT2 genes may be factors associated with HCV clearance.
News Article | February 15, 2017
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. We sequenced Chenopodium quinoa Willd. (quinoa) accession PI 614886 (BioSample accession code SAMN04338310; also known as NSL 106399 and QQ74). DNA was extracted from leaf and flower tissue of a single plant, as described in the “Preparing Arabidopsis Genomic DNA for Size-Selected ~20 kb SMRTbell Libraries” protocol (http://www.pacb.com/wp-content/uploads/2015/09/Shared-Protocol-Preparing-Arabidopsis-DNA-for-20-kb-SMRTbell-Libraries.pdf). DNA was purified twice with Beckman Coulter Genomics AMPure XP magnetic beads and assessed by standard agarose gel electrophoresis and Thermo Fisher Scientific Qubit Fluorometry. 100 Single-Molecule Real-Time (SMRT) cells were run on the PacBio RS II system with the P6-C4 chemistry by DNALink (Seoul). De novo assembly was conducted using the smrtmake assembly pipeline (https://github.com/PacificBiosciences/smrtmake) and the Celera Assembler, and the draft assembly was polished using the quiver algorithm. DNA was also sequenced using an Illumina HiSeq 2000 machine. For this, DNA was extracted from leaf tissue of a single soil-grown plant using the Qiagen DNeasy Plant Mini Kit. 500-bp paired-end (PE) libraries were prepared using the NEBNext Ultra DNA Library Prep Kit for Illumina. Sequencing reads were processed with Trimmomatic (v0.33)42, and reads <75 nucleotides in length after trimming were removed from further analysis. The remaining high-quality reads were assembled with Velvet (v1.2.10)43 using a k-mer of 75. High-molecular-weight DNA was isolated and labelled from leaf tissue of three-week old quinoa plants according to standard BioNano protocols, using the single-stranded nicking endonuclease Nt.BspQI. 
Labelled DNA was imaged automatically using the BioNano Irys system and de novo assembled into consensus physical maps using the BioNano IrysView analysis software. The final de novo assembly used only single molecules with a minimum length of 150 kb and eight labels per molecule. PacBio-BioNano hybrid scaffolds were identified using IrysView’s hybrid scaffold alignment subprogram. Using the same DNA prepared for PacBio sequencing, a Chicago library was prepared as described previously10. The library was sequenced on an Illumina HiSeq 2500. Chicago sequence data (in FASTQ format) was used to scaffold the PacBio-BioNano hybrid assembly using HiRise, a software pipeline designed specifically for using Chicago data to assemble genomes10. Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analysed by HiRise to produce a likelihood model, and the resulting likelihood model was used to identify putative mis-joins and score prospective joins. A population was developed by crossing Kurmi (green, sweet) and 0654 (red, bitter). Homozygous high- and low-saponin F lines were identified by planting 12 F seeds derived from each F line, harvesting F seed from these F plants, and then performing foam tests on the F seed. Phenotyping was validated using gas chromatography/mass spectrometry (GC/MS). RNA was extracted from inflorescences containing a mixture of flowers and seeds at various stages of development from the parents and 45 individual F progeny. RNA extraction and Illumina sequencing were performed as described above. Sequencing reads from all lines were trimmed using Trimmomatic and mapped to the reference assembly using TopHat44, and SNPs were called using SAMtools mpileup (v1.1)45. For linkage mapping, markers were assigned to linkage groups on the basis of the grouping by JoinMap v4.1. 
Using the maximum likelihood algorithm of JoinMap, the order of the markers was determined; using this as start order and fixed order, regression mapping in JoinMap was used to determine the cM distances. Genes differentially expressed between bitter and sweet lines and between green and red lines were identified using default parameters of the Cuffdiff function of the Cufflinks program46. A second mapping population was developed by crossing Atlas (sweet) and Carina Red (bitter). Bitter and sweet F lines were identified by performing foam and taste tests on the F seed. DNA sequencing was performed with DNA from the parents and 94 sweet F lines, as described above, and sequencing reads were mapped to the reference assembly using BWA. SNPs were called in the parents and in a merged file containing all combined F lines. Genotype calls were generated for the 94 F genotypes by summing up read counts over a sliding window of 500 variants, at all variant positions for which the parents were homozygous and polymorphic. Over each 500-variant stretch, all reads with Atlas alleles were summed, and all reads with the Carina Red allele were summed. Markers were assigned to linkage groups using JoinMap, with regression mapping used to obtain the genetic maps per linkage group. The Kurmi × 0654 and Atlas × Carina Red maps were integrated with the previously published quinoa linkage map13, with the Kurmi × 0654 map being used as the reference for the positions of anchor markers and scaling. We selected markers from the same scaffold that were in the same 10,000-bp bin in the assembly. The anchor markers on the alternative map received the position of the Kurmi × 0654 map anchor marker in the integrated map. This process was repeated with anchor markers at the 100,000-bp bin level. The assumption is that at the 100,000-bp bin level recombination should essentially be zero. 
On this level, a regression of cM position on both maps yielded R2 values >0.85 and often >0.9, so the regression line can easily be used for interpolating the positions of the alternative map towards the corresponding position on the Kurmi × 0654 map. All Kurmi × 0654 markers went into the integrated map on their original position. Pseudomolecules were assembled by concatenating scaffolds based on their order and orientation as determined from the integrated linkage map. An AGP (‘A Golden Path’) file was made that describes the positions of the scaffold-based assembly in coordinates of the pseudomolecule assembly, with 100 ‘N’s inserted between consecutive scaffolds. Based on these coordinates, custom scripts were used to generate the pseudomolecule assembly and to recoordinate the annotation file. DNA was extracted from C. pallidicaule (PI 478407) and C. suecicum (BYU 1480) and was sent to the Beijing Genomic Institute (BGI, Hong Kong) where one 180-bp PE library and two mate-pair libraries with insert sizes of 3 and 6 kb were prepared and sequenced on the Illumina HiSeq platform to obtain 2 × 100-bp reads for each library. The generated reads were trimmed using the quality-based trimming tool Sickle (https://github.com/najoshi/sickle). The trimmed reads were then assembled using the ALLPATHS-LG assembler47, and GapCloser v1.1248 was used to resolve N spacers and gap lengths produced by the ALLPATHS-LG assembler. Repeat families found in the genome assemblies of quinoa, C. pallidicaule and C. suecicum (see Supplementary Information 3) were first independently identified de novo and classified using the software package RepeatModeler49. RepeatMasker50 was used to discover and identify repeats within the respective genomes. AUGUSTUS51 was used for ab initio gene prediction, using model training based on coding sequences from Amaranthus hypochondriacus, Beta vulgaris, Spinacia oleracea and Arabidopsis thaliana. 
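The pseudomolecule step described above (scaffolds concatenated in map order and orientation, with 100 'N's between consecutive scaffolds, and an AGP file recording the coordinates) can be sketched roughly as follows. This is a minimal illustration, not the authors' custom scripts: the function names, the input tuple layout, and the AGP column values chosen for component type ("W"), gap type ("scaffold") and linkage evidence ("map") are all assumptions.

```python
from typing import List, Tuple

GAP_LEN = 100  # number of 'N's between consecutive scaffolds, as stated in the text

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(
    name: str, scaffolds: List[Tuple[str, str, str]]
) -> Tuple[str, List[tuple]]:
    """Concatenate scaffolds into one pseudomolecule and record AGP-style rows.

    `scaffolds` is a hypothetical input: (scaffold_id, sequence, orientation)
    tuples already sorted by linkage-map position, orientation being "+" or "-".
    """
    parts: List[str] = []
    rows: List[tuple] = []
    pos = 1       # AGP object coordinates are 1-based and inclusive
    part_no = 1
    for i, (scaffold_id, seq, orient) in enumerate(scaffolds):
        if i > 0:
            # Gap row between consecutive scaffolds (column values are assumed).
            parts.append("N" * GAP_LEN)
            rows.append((name, pos, pos + GAP_LEN - 1, part_no,
                         "N", GAP_LEN, "scaffold", "yes", "map"))
            pos += GAP_LEN
            part_no += 1
        # Component row: the scaffold itself, flipped if on the minus strand.
        parts.append(seq if orient == "+" else reverse_complement(seq))
        rows.append((name, pos, pos + len(seq) - 1, part_no,
                     "W", scaffold_id, 1, len(seq), orient))
        pos += len(seq)
        part_no += 1
    return "".join(parts), rows


if __name__ == "__main__":
    sequence, agp = build_pseudomolecule(
        "Chr01", [("scaffold_1", "ACGT", "+"), ("scaffold_2", "AAC", "-")]
    )
    for row in agp:
        print("\t".join(str(field) for field in row))
```

The same row list can be used in reverse to shift annotation coordinates from scaffold space into pseudomolecule space, which is the "recoordinate" step mentioned in the text.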
RNA-seq and isoform sequencing reads generated from RNA of different tissues were mapped onto the reference genome using Bowtie 2 (ref. 52) and GMAP53, respectively. Hints with locations of potential intron–exon boundaries were generated from the alignment files with the software package BAM2hints in the MAKER package54. MAKER with AUGUSTUS (intron–exon boundary hints provided from RNA-seq and isoform sequencing) was then used to predict genes in the repeat-masked reference genome. To help guide the prediction process, peptide sequences from B. vulgaris and the original quinoa full-length transcript (provided as EST evidence) were used by MAKER during the prediction. Genes were characterized for their putative function by performing a BLAST search of the peptide sequences against the UniProt database. PFAM domains and InterProScan ID were added to the gene models using the scripts provided in the MAKER package. The following quinoa accessions were chosen for DNA re-sequencing: 0654, Ollague, Real, Pasankalla (BYU 1202), Kurmi, CICA-17, Regalona (BYU 947), Salcedo INIA, G-205-95DK, Cherry Vanilla (BYU 1439), Chucapaca, Ku-2, PI 634921 (Ames 22157), Atlas and Carina Red. The following accessions of C. berlandieri were sequenced: var. boscianum (BYU 937), var. macrocalycium (BYU 803), var. zschackei (BYU 1314), var. sinuatum (BYU 14108), and subsp. nuttaliae (‘Huauzontle’). Two accessions of C. hircinum (BYU 566 and BYU 1101) were also sequenced. All sequencing was performed with an Illumina HiSeq 2000 machine, using either 125-bp (Atlas and Carina Red) or 100-bp (all other accessions) paired-end libraries. Reads were trimmed using Trimmomatic and mapped to the reference assembly using BWA (v0.7.10)55. Read alignments were manipulated with SAMtools, and the mpileup function of SAMtools was used to call SNPs. Orthologous and paralogous gene clusters were identified using OrthoMCL28. 
Recommended settings were used for all-against-all BLASTP comparisons (Blast+ v2.3.056) and OrthoMCL analyses. Custom Perl scripts were used to process OrthoMCL outputs for visualization with InteractiVenn57. Using OrthoMCL, orthologous gene sets containing two copies in quinoa and one copy each in C. pallidicaule, C. suecicum, and B. vulgaris were identified. In total, 7,433 gene sets were chosen, and their amino acid sequences were aligned individually for each set using MAFFT58. The 7,433 alignments were converted into PHYLIP format files by the seqret command in the EMBOSS package59. Individual gene trees were then constructed using the maximum likelihood method using proml in PHYLIP60. In addition, the genomic variants of all 25 sequenced taxa (Supplementary Data 5) relative to the reference sequence were called based on the mapped Illumina reads in 25 BAM files using SAMtools. To call variants in the reference genome (PI 614886), Illumina sequencing reads were mapped to the reference assembly. Variants were then filtered using VCFtools61 and SAMtools, and the qualified SNPs were combined into a single VCF file which was used as an input into SNPhylo62 to construct the phylogenetic relationship using maximum likelihood and 1,000 bootstrap iterations. To identify FT homologues, the protein sequence from the A. thaliana flowering time gene FT was used as a BLAST query. Filtering for hits with an E value <1 × e−3 and with RNA-seq evidence resulted in the identification of four quinoa proteins. One quinoa protein (AUR62013052) appeared to be comprised of two tandem repeats which were separated for the purposes of phylogenetic analysis. For the construction of the phylogenetic tree, protein sequences from these five quinoa FT homologues were aligned using Clustal Omega63 along with two B. vulgaris (gene models: BvFT1-miuf.t1, BvFT2-eewx.t1) and one A. thaliana (AT1G65480.1) homologue. Phylogenetic analysis was performed with MEGA64 (v6.06). 
The JTT model was selected as the best fitting model. The initial phylogenetic tree was estimated using the neighbour joining method (bootstrap value = 50, Gaps/ Missing Data Treatment = Partial Deletion, Cutoff 95%), and the final tree was estimated using the maximum likelihood method with a bootstrap value of 1,000 replicates. The syntenic relationships between the coding sequences of the chromosomal regions surrounding these FT genes were visualized using the CoGE65 GEvo tool and the Multi-Genome Synteny Viewer66. The alignment of bHLH domains was performed with Clustal Omega63, using sequences from Mertens et al.39. The phylogeny was inferred using the maximum likelihood method based on the JTT matrix-based model67. Initial trees for the heuristic search were obtained automatically by applying Neighbour-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. All positions containing gaps and missing data were eliminated. Trimmed PE Illumina sequencing reads that were used for the de novo assembly of C. suecicum and C. pallidicaule were mapped onto the reference quinoa genome using the default settings of BWA. For every base in the quinoa genome, the depth coverage of properly paired reads from the C. suecicum and C. pallidicaule mapping was calculated using the program GenomeCoverage in the BEDtools package68. A custom Perl script was used to calculate the percentage of each scaffold with more than 5× coverage from both diploids. Scaffolds were assigned to the A or B sub-genome if >65% of the bases were covered by reads from one diploid and <25% of the bases were covered by reads from the other diploid. The relationship between the quinoa sub-genomes and the diploid species C. pallidicaule and C. suecicum was presented in a circle proportional to their sizes using Circos69. 
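The sub-genome assignment rule above (more than 5× depth, >65% of bases covered by one diploid and <25% by the other) can be expressed as a short sketch. It assumes per-base depth values have already been extracted, for example from BEDtools output, into plain lists; the function names are hypothetical, and the mapping of C. pallidicaule to the A sub-genome and C. suecicum to the B sub-genome is an assumption not stated in this passage.

```python
from typing import Sequence

MIN_DEPTH = 5   # a base counts as covered when depth is more than 5x
HIGH = 0.65     # required covered fraction from the matching diploid
LOW = 0.25      # maximum covered fraction from the other diploid


def covered_fraction(depths: Sequence[int], min_depth: int = MIN_DEPTH) -> float:
    """Fraction of bases in a scaffold whose depth exceeds `min_depth`."""
    if not depths:
        return 0.0
    return sum(d > min_depth for d in depths) / len(depths)


def assign_subgenome(
    depths_pallidicaule: Sequence[int], depths_suecicum: Sequence[int]
) -> str:
    """Assign one scaffold to 'A', 'B' or 'unassigned'.

    Assumption: C. pallidicaule reads mark the A sub-genome and
    C. suecicum reads mark the B sub-genome.
    """
    frac_pal = covered_fraction(depths_pallidicaule)
    frac_sue = covered_fraction(depths_suecicum)
    if frac_pal > HIGH and frac_sue < LOW:
        return "A"
    if frac_sue > HIGH and frac_pal < LOW:
        return "B"
    return "unassigned"


if __name__ == "__main__":
    mostly_covered = [10] * 8 + [0] * 2   # 80% of bases above 5x
    mostly_empty = [10] * 2 + [0] * 8     # 20% of bases above 5x
    print(assign_subgenome(mostly_covered, mostly_empty))
    print(assign_subgenome(mostly_empty, mostly_covered))
    print(assign_subgenome(mostly_covered, mostly_covered))
```

Scaffolds covered well by both diploids, or by neither, fall through to "unassigned", which matches the conservative intent of using two thresholds rather than a simple majority.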
Orthologous regions in the three species were identified using BLASTN searches of the quinoa genome against each diploid genome individually. Single top BLASTN hits longer than 8 kb were selected and presented as links between the quinoa genome assembly (arranged in chromosomes, see Supplementary Information 7.3) and the two diploid genome assemblies on the Circos plot (Fig. 2a). Sub-genome synteny was analysed by plotting the positions of homoeologous pairs of A- and B-sub-genome pairs within the context of the 18 chromosomes using Circos. Synteny between the sub-genomes and B. vulgaris was assessed by first creating pseudomolecules by concatenating scaffolds which were known to be ordered and oriented within each of the nine chromosomes. Syntenic regions between these B. vulgaris chromosomes and those of quinoa were then identified using the recommended settings of the CoGe SynMap tool70 and visualized using MCScanX71 and VGSC72. For the purposes of visualization, quinoa chromosomes CqB05, CqA08, CqB11, CqA15 and CqB16 were inverted. Quinoa seeds were embedded in a 2% carboxymethylcellulose solution and frozen above liquid nitrogen. Sections of 50 μm thickness were obtained using a Reichert-Jung Frigocut 2800N, modified to use a Feather C35 blade holder and blades at −20 °C using a modified Kawamoto method73. A 2,5-dihydroxybenzoic acid (Sigma-Aldrich) matrix (40 mg ml−1 in 70% methanol) was applied using a HTX TM-Sprayer (HTX Technologies LLC) with attached LC20-AD HPLC pump (Shimadzu Scientific Instruments). Sections were vacuum dried in a desiccator before analysis. The optical image was generated using an Epson 4400 Flatbed Scanner at 4,800 d.p.i. For mass spectrometric analyses, a Bruker SolariX XR with 7T magnet was used. Images were generated using Bruker Compass FlexImaging 4.1. Data were normalized to the TIC, and brightness optimization was employed to enhance visualization of the distribution of selected compounds. 
Individual spectra were recalibrated using Bruker Compass DataAnalysis 4.4 to internally lock masses of known DHB clusters: C14H9O6 = 273.039364 and C21H13O9 = 409.055408 m/z. Accurate mass measurements for individual saponins and identified compounds were run using continuous accumulation of selected ions (CASI) using mass windows of 50–100 m/z and a 4-megaword transient of 2.93 s, providing a mass resolving power of approximately 390,000 at 400 m/z. Lipids were putatively assigned by searching the LipidMaps database74 (http://www.lipidmaps.org), and lipid class was confirmed by collision-induced dissociation using a 10 m/z window centred around the monoisotopic peak with a collision energy of between 15 and 20 V. Quinoa flowers were marked at anthesis, and seeds were sampled at 12, 16, 20 and 24 days after anthesis. A pool of five seeds from each time point was analysed using GC/MS. Quantification of saponins was performed indirectly by quantifying oleanolic acid (OA) derived from the hydrolysis of saponins extracted from quinoa seeds. Derivatized solution was analysed using a single quadrupole GC/MS system (Agilent 7890 GC/5975C MSD) equipped with an EI source at an ionisation energy of 70 eV. Chromatography separation was performed using a DB-5MS fused silica capillary column (30 m × 0.25 mm I.D., 0.25 μm film thickness; Agilent J&W Scientific), chemically bonded with 5% phenyl 95% methylpolysiloxane cross-linked stationary phase. Helium was used as the carrier gas with a constant flow rate of 1.0 ml min−1. The quantification of OA in each sample was performed using a standard curve based on standards of OA. Specific, individual saponins were identified in quinoa using a preparation of 20 mg of seeds performed according to a modified protocol from Giavalisco et al.75. 
Samples were measured with a Waters ACQUITY Reversed Phase Ultra Performance Liquid Chromatography (RP-UPLC) coupled to a Thermo-Fisher Exactive mass spectrometer, which consists of an electrospray ionisation source and an Orbitrap mass analyser. A C18 column was used for the hydrophilic measurements. Chromatograms were recorded in full-scan MS mode (mass range, 100–1,500). Extraction of the LC/MS data was accomplished with the software REFINER MS 7.5 (GeneData). SwissModel76 was used to produce homology models for the bHLH region of AUR62017204, AUR62017206 and AUR62010677. RaptorX77 was used for prediction of secondary structure and disorder. QUARK78 was used for ab initio modelling of the C-terminal domain, and the DALI server79 was used for 3D homology searches of this region. Models were manually inspected and evaluated using the PyMOL program (http://pymol.org). The genome assemblies and sequence data for C. quinoa, C. pallidicaule and C. suecicum were deposited at NCBI under BioProject codes PRJNA306026, PRJNA326220 and PRJNA326219, respectively. Additional accession numbers for deposited data can be found in Supplementary Data 9. The quinoa genome can also be accessed at http://www.cbrc.kaust.edu.sa/chenopodiumdb/ and on the Phytozome database (http://www.phytozome.net/).
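As a worked check on the two DHB lock masses quoted in the imaging mass spectrometry methods above (273.039364 and 409.055408 m/z), the sketch below computes monoisotopic m/z values for singly charged cations. It assumes the cluster compositions are C14H9O6 and C21H13O9, which is our inference from the quoted masses rather than something this passage spells out; the helper name `cation_mz` is hypothetical, and the isotope masses are standard reference values.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503223, "O": 15.99491461957}
ELECTRON = 0.000548579909


def cation_mz(composition: dict, charge: int = 1) -> float:
    """m/z of a cation with the given elemental composition and positive charge.

    The neutral monoisotopic mass is reduced by one electron mass per charge.
    """
    neutral = sum(MONOISOTOPIC[element] * count
                  for element, count in composition.items())
    return (neutral - charge * ELECTRON) / charge


if __name__ == "__main__":
    # Assumed compositions of the two DHB cluster ions.
    dimer = {"C": 14, "H": 9, "O": 6}
    trimer = {"C": 21, "H": 13, "O": 9}
    print(f"{cation_mz(dimer):.6f}")
    print(f"{cation_mz(trimer):.6f}")
```

Under these assumptions both computed values agree with the quoted lock masses to within 1e-6 m/z, which supports reading the ions as dehydrated dimer and trimer clusters of 2,5-dihydroxybenzoic acid.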
Liaset B.,National Institute of Nutrition And Seafood Research |
Hao Q.,Copenhagen University |
Jorgensen H.,University of Aarhus |
Hallenborg P.,University of Southern Denmark |
And 17 more authors.
Journal of Biological Chemistry | Year: 2011
Bile acids (BAs) are powerful regulators of metabolism, and mice treated orally with cholic acid are protected from diet-induced obesity, hepatic lipid accumulation, and increased plasma triacylglycerol (TAG) and glucose levels. Here, we show that plasma BA concentration in rats was elevated by exchanging the dietary protein source from casein to salmon protein hydrolysate (SPH). Importantly, the SPH-treated rats were resistant to diet-induced obesity. SPH-treated rats had reduced fed state plasma glucose and TAG levels and lower TAG in liver. The elevated plasma BA concentration was associated with induction of genes involved in energy metabolism and uncoupling, Dio2, Pgc-1α, and Ucp1, in interscapular brown adipose tissue. Interestingly, the same transcriptional pattern was found in white adipose tissue depots of both abdominal and subcutaneous origin. Accordingly, rats fed SPH-based diet exhibited increased whole body energy expenditure and heat dissipation. In skeletal muscle, expressions of the peroxisome proliferator-activated receptor β/δ target genes (Cpt-1b, Angptl4, Adrp, and Ucp3) were induced. Pharmacological removal of BAs by inclusion of 0.5 weight % cholestyramine to the high fat SPH diet attenuated the reduction in abdominal obesity, the reduction in liver TAG, and the decrease in nonfasted plasma TAG and glucose levels. Induction of Ucp3 gene expression in muscle by SPH treatment was completely abolished by cholestyramine inclusion. Taken together, our data provide evidence that bile acid metabolism can be modulated by diet and that such modulation may prevent/ameliorate the characteristic features of the metabolic syndrome. © 2011 by The American Society for Biochemistry and Molecular Biology, Inc.
Reyes A.,Mitochondrial Biology Unit |
Melchionda L.,Fondazione IRCCS Instituto Neurologico Carlo Besta |
Nasca A.,Fondazione IRCCS Instituto Neurologico Carlo Besta |
Carrara F.,Fondazione IRCCS Instituto Neurologico Carlo Besta |
And 11 more authors.
American Journal of Human Genetics | Year: 2015
Chronic progressive external ophthalmoplegia (CPEO) is common in mitochondrial disorders and is frequently associated with multiple mtDNA deletions. The onset is typically in adulthood, and affected subjects can also present with general muscle weakness. The underlying genetic defects comprise autosomal-dominant or recessive mutations in several nuclear genes, most of which play a role in mtDNA replication. Next-generation sequencing led to the identification of compound-heterozygous RNASEH1 mutations in two singleton subjects and a homozygous mutation in four siblings. RNASEH1, encoding ribonuclease H1 (RNase H1), is an endonuclease that is present in both the nucleus and mitochondria and digests the RNA component of RNA-DNA hybrids. Unlike mitochondria, the nucleus harbors a second ribonuclease (RNase H2). All affected individuals first presented with CPEO and exercise intolerance in their twenties, and these were followed by muscle weakness, dysphagia, and spino-cerebellar signs with impaired gait coordination, dysmetria, and dysarthria. Ragged-red and cytochrome c oxidase (COX)-negative fibers, together with impaired activity of various mitochondrial respiratory chain complexes, were observed in muscle biopsies of affected subjects. Western blot analysis showed the virtual absence of RNase H1 in total lysate from mutant fibroblasts. By an in vitro assay, we demonstrated that altered RNase H1 has a reduced capability to remove the RNA from RNA-DNA hybrids, confirming their pathogenic role. Given that an increasing amount of evidence indicates the presence of RNA primers during mtDNA replication, this result might also explain the accumulation of mtDNA deletions and underscores the importance of RNase H1 for mtDNA maintenance. © 2015 The American Society of Human Genetics.
PubMed | Mitochondrial Biology Unit, Fondazione IRCCS Instituto Neurologico Carlo Besta, Beijing Genomic Institute and University of Milan
Type: Journal Article | Journal: American journal of human genetics | Year: 2015