Paris, France

News Article | February 3, 2016
Site: www.nature.com

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during outcome assessment. The three study subjects were enrolled into a clinical protocol where treatment was started after we obtained peripheral blood by venepuncture, inguinal lymph nodes by excisional biopsy, and ileum and rectal biopsies through colonoscopy13. Two subjects were antiretroviral drug naive (1727 and 1679) and one subject (1774) had not received any drugs for at least 1 year before enrolment. Subjects 1679 and 1774 received emtricitabine (FTC), tenofovir (TFV; as the disoproxil fumarate (TDF)), and atazanavir with ritonavir (ATV/R). Subject 1727 received FTC, TDF and efavirenz (EFV). In all three subjects, conventional typing methods confirmed that the plasma virus was sensitive to the potent antiretroviral regimen that they received. Subjects 1727 and 1679 were well-suppressed patients. Subject 1774 continued to have measureable amounts of HIV-1 RNA in plasma after 3, but not 6 months of treatment. We obtained peripheral blood, inguinal lymph node, and terminal ileum and rectum biopsies once again at 3 and 6 months after starting treatment. Laboratory procedures for tissue management, in situ hybridization, and analytical pharmacology are described elsewhere13. The Institutional Review Board of the University of Minnesota approved the study. All subjects provided written informed consent. Nucleic acids were extracted from frozen cells obtained from blood or lymphoid tissue using the MasterPure Complete DNA and RNA Purification Kit (Epicentre). Viral RNA was isolated from plasma using the PureLink Viral RNA/DNA Mini Kit (Life Technologies). HIV-1 was quantified using a quantitative reverse transcription PCR assay. 
The relative amount of HIV-1 target-DNA was normalized to the quantification cycle for a concentration calibrator by using an external standard curve of serial tenfold dilutions of reference DNA for the Gag region of HIV-1 derived from the plasmid pNL-43. All reactions were performed in triplicate on the ABI 7900HT sequence detector (Applied Biosystems). Because the number of viral templates in the sampled material is low, we used an amplicon-based deep sequencing strategy. To minimize biased amplification of the target sequence, primer locations in the Gag and Pol regions of HIV-1 were selected on the basis of the alignment positional entropy in the multiple sequences aligned from the Los Alamos National Laboratory HIV-1 sequence database (http://www.hiv.lanl.gov/). The primers were computationally screened for cross-dimer interactions and the concentration of each primer was optimized for amplification. For each sample, blanks were included to screen for contamination. We used designated, physically separated areas within the laboratory to set up PCR, which avoided contact with potentially contaminating amplicons. To generate the long read-length PCR amplicons sequenced in this study (range, 509–587 bp read-lengths per run depending on the gene region being analysed), we amplified the Gag and Pol regions of HIV-1. For the Gag region of HIV-1, we used forward primer gag_632F_EK (5′-GCAGTGGCGCCCGAAC-3′, corresponding to HXB2 nucleic acid sequence numbering positions 632–647) and reverse primer gag_1788R_EK (5′-AATAGTCTTACAATCTGGGTTCGC-3′, positions 1788–1765). For the Pol region of HIV-1 that spanned the genomic region encoding the viral enzymes protease and reverse transcriptase, we used forward primer HIV-Pro1_2137F (5′-CAGAGCAGACCAGAGCCAAC-3′, corresponding to positions 2137–2156) and reverse primer HIV-RT1_3531R (5′-CTGCTATTAAGTCTTTTGATGGGTC-3′, positions 3531–3507). 
PCR amplification was performed using the High Fidelity Platinum Taq DNA Polymerase (Invitrogen) with thermal cycling conditions of 94 °C for 2 min, followed by 35 cycles of 94 °C for 15 s, 54 °C for 15 s, 68 °C for 1 min, with a final extension step at 68 °C for 5 min. We used an integrated sequencing pipeline for library construction, template amplification, and DNA sequencing as in ref. 20. Multiplex Identifiers were included during library preparation for sample barcoding. For the Gag region of HIV-1, we used the forward primer A-Gag_977F_degEK 5′-primer A-GCTACAACCAKCCCTYCAGACAG-3′ (977–1000) and the reverse primer B-Gag_1564R_degEK 5′-primer B-CTACTGGGATAGGTGGATTAYKTG-3′ (1564–1541) to generate a 587 bp amplicon (977–1564). For the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme protease, we used forward primer A-Pol1_2235F 5′-primer-A-ACTGTATCCTTTAGCTTCCCTCA-3′ (2235–2262) and the reverse primer B-Pol1_2744R 5′-primer B-TTTCTTTATGGCAAATACTGGAG-3′ (2744–2721) to generate a 509 bp amplicon (2235–2744). For the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme reverse transcriptase, we used forward primer A-Pol2_2700F 5′-primer-A-GGGCCTGAAAATCCATACAAT-3′ (2700–2721) and the reverse primer B-Pol2_3265R 5′-primer-B-CATTTATCAGGATGGAGTTCATA-3′ (3265–3242) to generate a 565 bp amplicon (2700–3265). PCR was performed as detailed above. DNA amplicon libraries were resolved on a pre-cast 2% agarose gel and purified with QIAquick Gel Extraction kit (Qiagen) and AMPure XP SPRI beads (Beckman Coulter). Libraries were quantified with KAPA Library Quant Kit (Kapa Biosystems) on Agilent 2100 Bioanalyzer High Sensitivity DNA chip (Agilent) for concentration and size distribution. The concentration of the product DNA was normalized before pooling to achieve sequence uniformity across amplicons. Controls were spiked into the reaction to monitor the library construction process and potential index cross-contamination. 
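The amplicon sizes quoted for the three barcoded primer pairs follow directly from the HXB2 coordinates given in the text. A minimal sketch of that arithmetic, assuming (as the text does) that the length is the difference between the outer end coordinate and the outer start coordinate; the dictionary name is illustrative only:

```python
# HXB2 outer coordinates (start, end) as stated in the text for each amplicon.
# The dictionary name and layout are illustrative, not from the source.
AMPLICONS = {
    "Gag": (977, 1564),
    "Pol protease": (2235, 2744),
    "Pol reverse transcriptase": (2700, 3265),
}

def amplicon_length(start, end):
    """Length as reported in the text: end coordinate minus start coordinate."""
    return end - start

lengths = {name: amplicon_length(s, e) for name, (s, e) in AMPLICONS.items()}
for name, bp in lengths.items():
    print(f"{name}: {bp} bp")
```

The three values span the 509–587 bp range quoted earlier for the read lengths.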
The known internal control sequence (clonal sequence of 456 bp) introduced into the reaction was used to calculate the single nucleotide error rate and set the cut-off for the sequence analysis where no control errors could be detected. PCR errors are more common in later cycles of amplification, and being limited to small copy numbers they have little impact on the haplotype distribution. Bidirectional sequencing using the 454 Life Sciences' GS-FLX sequencing platform (Roche) provided independent confirmation of sequence information. The long read-length sequences were sorted based on index sequences, trimmed to remove residual adaptor bases from the ends of the reads, and filtered for length and duplicates before alignment. Read depth and coverage estimation that met predetermined coverage thresholds were performed as in ref. 20. Sequencing quality metrics were calculated for all samples using FastQC and only high-quality sequencing libraries were used in the ensuing analyses. Experimental precision along with deep coverage allows for accurate estimation of the underpinning diversity of the virus population. We binned the sequence reads by multiplex identifier barcodes, and identified and excluded sequencing errors and misaligned regions from the analysis by computational methods. A small number of reads had a disproportionate number of errors that accounted for most of the inaccuracy in the full data set. After quality filtering and trimming to a uniform length, we proceeded to build the different haplotypes present in each of the samples. We began by collapsing the identical sequences into haplotypes using reference-guided assembly to avoid the use of uninformative sequence repeats. Haplotypes at prevalence above the error threshold (defined by the internal control spiked in the sequencing runs), which corresponded to variants present above 0.04% of the total existing variants in the collapsed alignments, were used in the analysis. 
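The collapsing and thresholding step described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes the 0.04% cut-off is applied to each haplotype's share of all reads in a sample, and it omits the reference-guided assembly and error correction. The function name and the toy reads are hypothetical.

```python
from collections import Counter

def collapse_haplotypes(reads, min_fraction=0.0004):
    """Collapse identical reads into haplotypes and keep those whose
    share of all reads exceeds min_fraction (0.04% in the text)."""
    counts = Counter(reads)
    total = sum(counts.values())
    kept = {}
    for seq, n in counts.items():
        freq = n / total
        if freq > min_fraction:
            kept[seq] = freq
    return kept

# Toy data: 10,000 reads, one variant present at only 0.03%.
reads = ["ACGT"] * 9000 + ["ACGA"] * 997 + ["TCGT"] * 3
haplotypes = collapse_haplotypes(reads)
print(haplotypes)
```

In this toy sample the variant seen in 3 of 10,000 reads falls below the threshold and is discarded, while the two more abundant haplotypes are retained.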
For the processing of the temporally and spatially linked deep-sequencing data for studying viral diversity, the viral sequences were first aligned against the HXB2 reference sequence (GenBank accession number K03455) using Segminator II (version 0.1.1). We then generated a consensus viral sequence for each patient as a reference for assembly to improve the alignment quality. A statistical model that utilizes platform error rates in conjunction with patterns within nucleotide frequencies derived from data obtained from related samples (that is, different compartments within the host or temporally linked samples) was used to separate low frequency platform error from true variation. This model is termed ‘probabilistic read error detection across temporally obtained reads’ (PREDATOR) and is implemented within Segminator II. This statistical framework maintains the reading frame and corrects for deep sequencing errors19. We corroborated the number of haplotypes and the frequency of haplotypes that explain the data using a second reconstruction algorithm based on combinations of multinomial distributions to analyse the k-mer frequency spectrum of the sequencing data implemented with QuRe40. We screened the sequence alignments for recombinant sequences using the GARD algorithm implemented in HyPhy41, 42. The high coverage of massively parallel sequencing is necessary to ensure reliable detection of low-frequency viral variants. A simple calculation based on the geometric distribution shows that in order to guarantee that a viral template occurring at frequency f is detected with probability p or better, it is necessary to sequence at least log(1 − p)/log(1 − f) − 1 templates. A variant of frequency 0.01 (that is, 1%), for example, would require 450 sequences to ensure its detection at probability 0.99 (that is, 99%) or better. Conversely, a study using 100 single-genome sequences would detect a variant of frequency 0.01 with probability 0.64. 
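The sampling-depth calculation above can be reproduced in a few lines. This sketch evaluates the expression exactly as written in the text, together with the complementary detection probability 1 − (1 − f)^n for a fixed number of sequences; the function names are illustrative.

```python
import math

def min_templates(f, p):
    """Expression from the text: log(1 - p) / log(1 - f) - 1."""
    return math.log(1 - p) / math.log(1 - f) - 1

def detection_probability(f, n):
    """Probability that a variant at frequency f appears at least once
    among n independently sampled templates."""
    return 1 - (1 - f) ** n

depth = min_templates(0.01, 0.99)
prob = detection_probability(0.01, 100)
print(f"templates needed: {depth:.1f}")
print(f"detection probability with 100 sequences: {prob:.3f}")
```

For f = 0.01 and p = 0.99 the expression evaluates to roughly 457 templates, in line with the approximate figure of 450 quoted above, and 100 sequences give a detection probability of about 0.63.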
A low number of input DNA templates derived from one compartment that catches the spillover from another does not account for the complexities of partial observation and spatial heterogeneity that could lead to measurement error elsewhere16. This complication emphasizes the challenge in trying to extrapolate from single-template sequencing the magnitude and character of the virus population that comprises the viral reservoir. Maximum-likelihood phylogenies were created with PhyML using the general time-reversible model with the proportion of invariant sites and gamma distribution of among-site rate variation (GTR + I + Γ) nucleotide substitution model43, applying an approximate likelihood ratio test for branch support44. We estimated trees on viral sequence sets from which gaps in the alignment were removed and considered as missing data, because maximum-likelihood tree error may increase with inclusion of unreliable sites. We assessed the temporal structure of the trees by performing linear regression on the root-to-tip distances of samples versus the time of sampling and tested the validity of the time-dependency of the evolution rate estimates with the assumption of a strict molecular clock using the program Path-O-Gen v1.4 (http://tree.bio.ed.ac.uk/software/pathogen/). We used the Highlighter sequence visualization tool (http://www.HIV.lanl.gov) to trace commonality between sequences in an alignment based on individual nucleotide changes. We tested for subdivision of viral sequences into sub-populations in the different compartments at each time point. We calculated genetic distances using the Wright’s F_ST and S_nn test statistics45, 46. We used a bootstrap test to determine the confidence of the estimates and performed a permutation test (1,000 iterations) to assess the significance levels of the obtained scores. We used a modification of a random effects branch-site model to detect positive selection and test whether the phylogeny diverged over time47. 
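The root-to-tip regression mentioned above amounts to an ordinary least-squares fit of genetic distance against sampling time, where the slope estimates the substitution rate and the fit quality indicates how clock-like the data are. A minimal sketch under stated assumptions: the distances and sampling times below are invented for illustration, and this is not the Path-O-Gen implementation.

```python
def root_to_tip_regression(times, distances):
    """Least-squares fit of root-to-tip distance against sampling time.
    Returns (slope, intercept, r_squared)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical tips sampled at 0, 3 and 6 months (two per time point).
times = [0, 0, 3, 3, 6, 6]
distances = [0.010, 0.012, 0.016, 0.018, 0.022, 0.024]
slope, intercept, r_squared = root_to_tip_regression(times, distances)
print(f"rate: {slope:.4f} substitutions/site/month, R^2 = {r_squared:.2f}")
```

A positive slope with a high R² is what one would expect from measurably evolving sequences sampled over time; a flat or noisy fit would argue against using a strict clock.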
A likelihood ratio hypothesis test compared the fit of the model using a 3-bin ω distribution (ω1 and ω2 in [0,1], ω3 unrestricted) to describe the evolution of all branches in the tree, to the fit of the model where ω3 is also restricted to be in [0,1]. We tested whether or not a proportion of sites along internal branches of the intra-host viral phylogeny have been subject to episodic selection (ω > 1)24, restricting the test to internal branches to lessen the biasing effects of neutral or deleterious mutations on ω estimates48, 49 and serve as a proxy for population level selection25, 26. To resolve the phyloanatomy, we reconstructed the temporal and spatial dynamics of the viral haplotype lineages with a Bayesian statistical framework using Markov chain Monte Carlo sampling for evolutionary hypothesis testing, as implemented in BEAST version 2.1.2 (ref. 27). This approach was used to sample phylogenies from their joint posterior distribution, in which the viral haplotypes are restricted by their known date of sampling, using a simple substitution model described by Hasegawa–Kishino–Yano (HKY) to avoid over-parameterization50. Models differing in assumptions on mutation rate and effective population size were run for 100 million generations each and compared using the Bayes factor as implemented in Tracer version 1.6. We determined that the best-fit model included a strict molecular clock and assumed a constant population size. We used a symmetric transition model with constant rates over time that considered a discretized diffusion process among the different compartments. This was formalized as a continuous time Markov chain model to reconstruct the spatial dynamics between compartments. All chains were run for sufficient length and convergence of the relevant parameters was assessed using Tracer version 1.6, ignoring 10% of the chain as burn-in. 
We summarized the connections between virus evolution and anatomical compartment history using an annotated MCC phylogenetic tree estimated with BEAST. The model and its parameters were chosen after computing the posterior probability of several models to obtain the discriminatory Bayes factors. Because population structure, whether due to spatial segregation or limitations to gene flow, may affect evolutionary dynamics, we confirmed that the direction of flow was not due to oversampling of a particular environment, by running a two-deme Bayesian inference under a structured coalescent model with a HKY substitution model assuming a strict molecular clock29, which is less susceptible to sampling issues than our trait-based analysis. For completeness, we conducted a search for topologies and divergence times assuming a relaxed molecular clock as well. In the analyses performed, HIV-1 showed a high degree of clock-like evolution and a mean nucleotide substitution rate expected to be within the bounds necessary to obtain meaningful phyloanatomic information from sequence data. Using the location of each of the haplotypes, a discrete trait was included in the inference. We used BEAST to estimate the probabilities of each of the possible states. Standard descriptive statistics were performed with the use of the STATA, GraphPad, or R packages.


News Article | April 6, 2016
Site: www.nature.com

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. Experiments were carried out using the TR146 buccal epithelial squamous cell carcinoma line32 obtained from the European Collection of Authenticated Cell Cultures (ECACC) and grown in Dulbecco’s Modified Eagle’s Medium (DMEM, Sigma-Aldrich) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin. Cells were routinely tested for mycoplasma contamination using mycoplasma-specific primers and were found to be negative. Prior to stimulation, confluent TR146 cells were serum-starved overnight, and all experiments were carried out in serum-free DMEM. C. albicans wild-type strains included the autotrophic strain BWP17 + CIp30 (ref. 33) and the parental strain SC5314 (ref. 34). Other C. albicans strains used and their sources are listed in Extended Data Tables 1 and 2. C. albicans cultures were grown in YPD medium (1% yeast extract, 2% peptone, 2% dextrose) at 30 °C overnight. Cultures were washed in sterile PBS and adjusted to the required cell density. Antibodies to phospho-MKP1 and c-Fos were from Cell Signalling Technologies (New England Biolabs UK), mouse anti-human α-actin was from Millipore (UK), and goat anti-mouse and anti-rabbit horseradish peroxidase (HRP)-conjugated antibodies were from Jackson Immunologicals (Stratech Scientific, UK). Ece1p peptides were synthesized commercially (Proteogenix (France) or Peptide Synthetics (UK). ECE1 deletion was performed as previously described35. Deletion cassettes were generated by PCR36. Primers ECE1-FG and ECE1-RG were used to amplify pFA-HIS1 and pFA-ARG4 -based markers. C. albicans BWP17 (ref. 37), was sequentially transformed38 with the ECE1-HIS1 and ECE1-ARG4 deletion cassettes and then transformed with CIp10 (ref. 39), yielding the ece1∆/Δ deletion strain. 
For complementation, the ECE1 gene plus upstream and downstream intergenic regions were amplified with primers ECE1-RecF3k and ECE1-RecR and cloned into plasmid CIp10 at MluI and SalI sites. This plasmid was transformed into the uridine auxotrophic ece1Δ/Δ strain, yielding the ece1∆/Δ + ECE1 complemented strain. For generation of the ece1Δ/Δ + ECE1 strain, the CIp10-ECE1 was amplified with primers Pep3-F1 and Pep3-R1, digested with ClaI and re-ligated, yielding the CIp10 + ECE1 plasmid. This plasmid was transformed into the uridine auxotrophic ece1Δ/Δ strain, yielding the ece1Δ/Δ + ECE1 strain. All integrations were confirmed by PCR/sequencing and at least two independent isogenic transformants were created to confirm results. KEX1 deletion was performed exactly as the ECE1 deletion but using primers KEX1-FG and KEX1-RG for creating the deletion cassette. Fluorescent strains of ece1Δ/Δ and BWP17 were constructed as previously described40. Briefly, the ece1Δ/Δ and BWP17 strains were transformed with the pENO1-dTom-NATr plasmid. Primers used to clone and construct the ECE1 genes and intragenic regions are listed in Extended Data Table 4. Strains are listed in Extended Data Table 2. ECE1 promoter (primers 5′ECE1prom–NarI / 3′ECE1prom–XhoI) and terminator (5′ECE1term–SacII / 5′ECE1term–SacI) were amplified and cloned into pADH1-GFP. Resulting pSK-pECE1-GFP was verified by sequencing. C. albicans SC5314 was transformed with the pECE1-GFP transformation cassette38. Resistance to nourseothricin was used as selective marker and correct integration of GFP into the ECE1 locus was verified by PCR. Primers for cloning and validation are listed in Extended Data Table 4. Strains are listed in Extended Data Table 2. C. albicans cells grown on TR146 epithelial cells were collected into RNA pure (PeqLab), centrifuged and the pellet resuspended in 400 μl AE buffer (50 mM Na-acetate pH 5.3, 10 mM EDTA, 1% SDS). 
Samples were vortexed (30 s), and an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1) was added and incubated for 5 min (65 °C) before subjected to 2× freeze-thawing. Lysates were clarified by centrifugation and the RNA precipitated with isopropyl alcohol/0.3 M sodium acetate by incubating for 1 h at −20 °C. Precipitated pellets were washed (2× 1 ml 70% ice-cold ethanol), resuspended in DEPC-treated water and stored at −80 °C. RNA integrity and concentration was confirmed using a Bioanalyzer (Agilent). RNA (500 ng) was treated with DNase (Epicentre) and cDNA synthesized using Reverse Transcriptase Superscript III (Invitrogen). cDNA samples were used for qPCR with EVAgreen mix (Bio&Sell). Primers (ACT1-F and ACT1-R for actin, ECE1-F and ECE1-R for ECE1 Extended Data Table 4) were used at a final concentration of 500 nM. qPCR amplifications were performed using a Biorad CFX96 thermocycler. Data was evaluated using Bio-Rad CFX Manager 3.1 (Bio-Rad) with ACT1 as the reference gene and t as the control sample. TR146 cells were lysed using a modified RIPA lysis buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 1% sodium deoxycholate, 0.1% SDS) containing protease (Sigma-Aldrich) and phosphatase (Perbio Science) inhibitors41, left on ice (30 min) and then clarified (10 min) in a refrigerated microfuge. Lysate total protein content was determined using the BCA protein quantitation kit (Perbio Science). 20 μg of total protein was separated on 12% SDS–PAGE gels before transfer to nitrocellulose membranes (GE Healthcare). After probing with primary (1:1,000) and secondary (1:10,000) antibodies, membranes were developed using Immobilon chemiluminescent substrate (Millipore) and exposed to X-ray film (Fuji film). Human α-actin was used as a loading control. DNA binding activity of transcription factors was assessed using the TransAM transcription factor ELISA system (Active Motif) as previously described41, 42. 
Serum-starved TR146 epithelial cells were treated for 3 h before being differentially lysed to recover nuclear proteins using a nuclear protein extraction kit (Active Motif) according to the manufacturer’s protocol. Protein concentration was determined (BCA protein quantitation kit (Perbio Science)) and 5 μg of nuclear extract was assayed in the TransAM system according to the manufacturer’s protocol. Data was expressed as fold-change in absorbance relative to resting cells. Cytokine levels in cell culture supernatants were determined using the Performance magnetic Fluorokine MAP cytokine multiplex kit (Bio-techne) and a Bioplex 200 machine. The data were analysed using Bioplex Manager 6.1 software to determine analyte concentrations. Following incubation, culture supernatant was collected and assayed for lactate dehydrogenase (LDH) activity using the Cytox 96 Non-Radioactive Cytotoxicity Assay kit (Promega) according to the manufacturer’s instructions. Recombinant porcine LDH (Sigma-Aldrich) was used to generate a standard curve. Quantification of C. albicans adherence to TR146 epithelial cells was performed as described previously43. Briefly, TR146 cells were grown to confluence on glass coverslips for 48 h in tissue culture plates in DMEM medium. C. albicans yeast cells (2 × 10^5) were added into 1 ml serum-free DMEM, incubated for 60 min (37 °C/5% CO2) and non-adherent C. albicans cells removed by aspiration. Following washing (3× 1 ml PBS), cells were fixed with 4% paraformaldehyde (Roth) and adherent C. albicans cells stained with Calcofluor White and quantified using fluorescence microscopy. The number of adherent cells was determined by counting 100 high-magnification fields of 200 μm × 200 μm size. Exact total cell numbers were calculated based on the quantified areas and the total size of the cover slip. C. albicans invasion of epithelial cells was determined as described previously43. 
Briefly, TR146 epithelial cells were grown to confluence on glass coverslips for 48 h and then infected with C. albicans yeast cells (1 × 10^5) for 3 h in a humidified incubator (37 °C/5% CO2). Following washing (3× PBS), the cells were fixed with 4% paraformaldehyde. All surface adherent fungal cells were stained for 1 h with a rabbit anti-Candida antibody and subsequently with a goat anti-rabbit-Alexa Fluor 488 antibody. After rinsing with PBS, epithelial cells were permeabilized (0.1% Triton X-100 in PBS for 15 min) and fungal cells (invading and non-invading) were stained with Calcofluor White. Following rinsing with water, coverslips were visualized using fluorescence microscopy. The percentage of invading C. albicans cells was determined by dividing the number of (partially) internalized cells by the total number of adherent cells. At least 100 fungal cells were counted on each coverslip. TR146 cells (10^5 per ml) seeded on glass coverslips in DMEM/10% FBS were infected with C. albicans (2.5 × 10^4 cfu per ml) in DMEM and incubated for 6 h (37 °C/5% CO2). Cells were washed with PBS, fixed overnight (4 °C in 4% paraformaldehyde) and stained with Concanavalin A-Alexa Fluor 647 in PBS (10 μg ml−1) for 45 min at room temperature in the dark with gentle shaking (70 r.p.m.) to stain the fungal cell wall. Epithelial cells were permeabilized with 0.1% Triton X-100 for 15 min at 37 °C in the dark, then washed and stained with 10 μg ml−1 Calcofluor White (0.1 M Tris-HCl pH 9.5) for 20 min at room temperature in the dark with gentle shaking. Cells were rinsed in water and mounted on slides with 6 μl of ProLong Gold anti-fade reagent, before air drying for 2 h in the dark. Fluorescence microscopy was performed on a Zeiss Axio Observer Z1 microscope, and 5 phase images were taken per picture. For scanning electron microscopy (SEM) analysis, TR146 cells were grown to confluence on Transwell inserts (Greiner) and serum starved overnight in serum-free DMEM. After 5 h of C. 
albicans incubation on epithelial cells at an MOI of 0.01, cell media was removed and samples were fixed overnight at 4 °C with 2.5% (v/v) glutaraldehyde in 0.05 M HEPES buffer (pH 7.2) and post-fixed in 1% (w/v) osmium tetroxide for 1 h at room temperature. After washing, samples were dehydrated through a graded ethanol series before being critical point dried (Polaron E3000, Quorum Technologies). Dried samples were mounted using carbon double side sticky discs (TAAB) on aluminium pins (TAAB) and gold coated in an Emitech K550X sputter coater (Quorum Technologies Ltd). Samples were examined and images recorded using a FEI Quanta 200 field emission scanning electron microscope operated at 3.5 kV in high vacuum mode. Zebrafish infections were performed in accordance with NIH guidelines under Institutional Animal Care and Use Committee (IACUC) protocol A2009-11-01 at the University of Maine. To determine sample size, a power calculation was done for all experiments based on two-tailed t-tests in order to detect a minimum effect size of 0.8, with an alpha error probability of 0.05 and a power (1 – beta error probability) of 0.95. This gave a minimum number of 42 fish for each group. The fish selected for the experiments were randomly assigned to the different groups by picking them from a pool without bias and the groups were injected in different orders. No blinding was used to read the results. Ten to twenty zebrafish per group per experiment were maintained at 33 °C in E3 + PTU and used as previously described40. Briefly, 4 days post-fertilization (dpf) larvae were treated with 20 μg ml−1 dexamethasone dissolved in 0.1% DMSO 1 h before infection and thereafter. For tissue damage and neutrophil recruitment, individual AB or mpo:GFP fish (respectively) were injected into the swimbladder with 4 nl of PBS with or without 25–40 C. albicans yeast cells of ece1Δ/Δ-dTomato, ece1Δ/Δ + ECE1 + dTomato, ece1Δ/Δ + ECE1  + dTomato or BWP17-dTomato. 
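The zebrafish sample-size calculation described above (two-tailed t-test, effect size 0.8, alpha 0.05, power 0.95) can be approximated with the standard normal-approximation formula n = 2((z_{1−α/2} + z_{power}) / d)² per group. This is a sketch under that approximation, not the calculation the authors ran; the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.95):
    """Approximate per-group sample size for a two-tailed, two-sample
    comparison, using normal quantiles in place of the t distribution."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n = n_per_group(0.8)
print(n)
```

The normal approximation gives 41 per group. An exact calculation based on the noncentral t distribution is slightly more conservative, which is consistent with the minimum of 42 fish per group reported in the text.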
For tissue damage, 1 nl of Sytox green (0.05 mM in 1% DMSO) was injected at 20 h post-infection into the swimbladder and fish were imaged by confocal microscopy at 24 h post-infection. For neutrophil recruitment, fish were imaged at 24 h post-injection. For synthetic peptide damage, AB or α-catenin:citrine44 fish were injected with 2 nl of peptide (9 ng or 1.25 ng per fish) or vehicle (40% DMSO or 5% DMSO) + SytoxGreen (0.05 mM in 1% DMSO) or SytoxOrange (0.5 mM in 10% DMSO) and the fish imaged by confocal microscopy 4 h later. Numbers of neutrophils and damaged cells observed were counted and tabulated for each fish. Live zebrafish imaging was carried out as previously described40. Briefly, fish were anaesthetized in Tris-buffered Tricaine (200 μg ml−1, Western Chemicals) and further immobilized in a solution of 0.4% low-melting-point agarose (LMA, Lonza) in E3 + Tricaine in a 96-well plate glass-bottom imaging dish (Greiner Bio-On). Confocal imaging was carried out using an Olympus IX-81 inverted microscope with an FV-1000 laser scanning confocal system (Olympus). Images were collected and processed using Fluoview (Olympus) and Photoshop (Adobe Systems). Panels are either a single slice for the differential interference contrast channel (DIC) with maximum projection overlays of fluorescence image channels (red-green), or maximum projection overlays of fluorescence channels. The number of slices for each maximum projection is specified in the legend of individual figures. Murine infections were performed under UK Home Office Project Licence PPL 70/7598 in dedicated animal facilities at King’s College London. No statistical method was used to pre-determine sample size. No method of randomization was used to allocate animals to experimental groups. Mice in the same cage were part of the same treatment. The investigators were not blinded during outcome assessment. 
A previously described murine model of oropharyngeal candidiasis using female BALB/c mice45 was modified to investigate early infection events. Briefly, mice were treated subcutaneously with 3 mg per mouse (in 200 μl PBS with 0.5% Tween 80) of cortisone acetate on days −1 and +1 post-infection. On day 0, mice were sedated for ~75 min with an intra-peritoneal injection of 110 mg per kg ketamine and 8 mg per kg xylazine, and a swab soaked in a 10^7 cfu per ml C. albicans yeast culture in sterile saline was placed sublingually for 75 min. After 2 days, mice were euthanized, the tongue excised and divided longitudinally in half. One half was weighed, homogenized and cultured to derive quantitative Candida counts. The other half was processed for histopathology and immunohistochemistry. C. albicans-infected murine tongues were fixed in 10% (v/v) formal-saline before being embedded and processed in paraffin wax using standard protocols. For each tongue, 5-μm sections were prepared using a Leica RM2055 microtome and silane coated slides. Sections were dewaxed using xylene, before C. albicans and infiltrating inflammatory cells were visualized by staining using Periodic Acid-Schiff (PAS) stain and counterstaining with haematoxylin. Sections were then examined by light microscopy. Histological quantification of infection was undertaken by measuring the area of infected epithelium and expressed as a percentage relative to the entire epithelial area. TR146 epithelial cells were grown in 35-mm Petri dishes (Nunc) for 48 h before recordings at low cell density (10–30% confluence). Cells were superfused with a modified Krebs solution (120 mM NaCl, 3 mM KCl, 2.5 mM CaCl2, 1.2 mM MgCl2, 22.6 mM NaHCO3, 11.1 mM glucose, 5 mM HEPES pH 7.4). Isolated cells were recorded at room temperature (21–23 °C) in whole cell mode using microelectrodes (5–7 MΩ) containing 90 mM potassium acetate, 20 mM KCl, 40 mM HEPES, 3 mM EGTA, 3 mM MgCl2, 1 mM CaCl2 (free Ca2+ 40 nM), pH 7.4. 
Cells were voltage clamped at −60 mV using an Axopatch 200A amplifier (Axon Instruments) and current/voltage curves were generated by 1 s steps between −100 and +50 mV. Treatments were applied to the superfusate to produce the final required concentration, with vehicle controls similarly applied. Data was recorded using Clampex software (PClamp 6, Axon Instruments) and analysed with Clampfit 10. TR146 cells were grown in a 96-well plate overnight until confluent. The medium was removed and 50 μl of a Fura-2 solution (5 μl Fura-2 (Life Technologies) (2.5 mM in 50% Pluronic F-127 (Life Technologies):50% DMSO), 5 μl probenecid (Sigma) in 5 ml saline solution (NaCl (140 mM), KCl (5 mM), MgCl2 (1 mM), CaCl2 (2 mM), glucose (10 mM) and HEPES (10 mM), adjusted to pH 7.4)) was added and the plate incubated for 1 h at 37 °C/5% CO2. The Fura-2 solution was replaced with 50 μl saline solution and baseline fluorescence readings (excitation 340 nm/emission 520 nm) taken for 10 min using a FlexStation 3 (Molecular Devices). Ece1 peptides were added at different concentrations and readings immediately taken for up to 3 h. The data was analysed using Softmax Pro software to determine calcium present in the cell cytosol and expressed as the ratio between excitation and emission spectra. tBLMs with 10% tethering lipids and 90% spacer lipids (T10 slides) were formed using the solvent exchange technique46, 47 according to the manufacturer’s instructions (SDx Tethered Membranes Pty Ltd, Sydney, Australia). Briefly, 8 μl of 3 mM lipid solutions in ethanol were added, incubated for 2 min and then 93.4 μl buffer (100 mM KCl, 5 mM HEPES, pH 7.0) was added. After rinsing 3× with 100 μl buffer the conductance and capacitance of the membranes were measured for 20 min before injection of Ece1 peptides at different concentrations. All experiments were performed at room temperature. Signals were measured using the tethaPod (SDx Tethered Membranes Pty, Sydney, Australia). 
Intercalation of Ece1 peptides into phospholipid liposomes was determined by FRET spectroscopy applied as a probe-dilution assay48. Phospholipids mixed with 1% (mol/mol) each of the donor dye NBD-phosphatidylethanolamine (NBD-PE) and the acceptor dye rhodamine-PE were dissolved in chloroform, dried, solubilized in 1 ml buffer (100 mM KCl, 5 mM HEPES, pH 7.0) by vortexing, sonicated with a titanium tip (30 W, Branson sonifier, cell disruptor B15), and subjected to three cycles of heating to 60 °C and cooling down to 4 °C, each for 30 min. Lipid samples were stored at 4 °C for at least 12 h before use. Ece1 peptide was added to liposomes and intercalation was monitored as the increase of the quotient between the donor fluorescence intensity I_D at 531 nm and the acceptor intensity I_A at 593 nm (FRET signal) independent of time. CD measurements were performed using a Jasco J-720 spectropolarimeter (Japan Spectroscopic Co., Japan), calibrated as described previously49. CD spectra represent the average of four scans obtained by collecting data at 1 nm intervals with a bandwidth of 2 nm. The measurements were performed in 100 mM KCl, 5 mM HEPES, pH 7.0 at 25 °C and 40 °C in a 1.0 mm quartz cuvette. The Ece1-III concentration was 15 μM. Planar lipid bilayers were prepared using the Montal-Mueller technique50 as described previously51. All measurements were performed in 5 mM HEPES, 100 mM KCl, pH 7.0 (specific electrical conductivity 17.2 mS per cm) at 37 °C. Candida strains were cultured for 18 h in hyphae inducing conditions (YNB medium containing 2% sucrose, 75 mM MOPSO buffer pH 7.2, 5 mM N-acetyl-d-glucosamine, 37 °C). Hyphal supernatants were collected by filtering through a 0.2 μm PES filter, and peptides were enriched by solid phase extraction (SPE) using first C4 and subsequently C18 columns on the C4 flowthrough. 
After drying in a vacuum centrifuge, samples were resolubilized in loading solution (0.2% formic acid in 71:27:2 ACN/H2O/DMSO (v/v/v)) and filtered through a 10 kDa MWCO filter. The filtrate was transferred into HPLC vials and injected into the LC-MS/MS system. LC-MS/MS analysis was carried out on an Ultimate 3000 nano RSLC system coupled to a QExactive Plus mass spectrometer (ThermoFisher Scientific). Peptide separation was performed based on a direct injection setup without peptide trapping using an Accucore C4 column as stationary phase and a column oven temperature of 50 °C. The binary mobile phase consisting of A) 0.2% (v/v) formic acid in 95:5 H2O/DMSO (v/v) and B) 0.2% (v/v) formic acid in 85:10:5 ACN/H2O/DMSO (v/v/v) was applied for a 60 min gradient elution: 0–1.5 min at 60% B, 35–45 min at 96% B, 45.1–60 min at 60% B. The Nanospray Flex Ion Source (ThermoFisher Scientific) provided with a stainless steel emitter was used to generate positively charged ions at 2.2 kV spray voltage. Precursor ions were measured in full scan mode within a mass range of m/z 300–1600 at a resolution of 70k FWHM using a maximum injection time of 120 ms and an automatic gain control target of 1e6. For data-dependent acquisition, up to the 10 most abundant precursor ions per scan cycle with an assigned charge state of z = 2–6 were selected in the quadrupole for further fragmentation using an isolation width of m/z 2.0. Fragment ions were generated in the HCD cell at a normalized collision energy of 30 V using nitrogen gas. Dynamic exclusion of precursor ions was set to 20 s. Fragment ions were monitored at a resolution of 17.5k (FWHM) using a maximum injection time of 120 ms and an AGC target of 2e5. Thermo raw files were processed by the Proteome Discoverer (PD) software v1.4.0.288 (Thermo). 
Tandem mass spectra were searched against the Candida Genome Database (http://www.candidagenome.org/download/sequence/C_albicans_SC5314/Assembly22/current/C_albicans_SC5314_A22_current_orf_trans_all.fasta.gz; status: 2015/05/03) using the Sequest HT search algorithm. Mass spectra were searched for both unspecific cleavages (no enzyme) and tryptic peptides with up to 4 missed cleavages. The precursor mass tolerance was set to 10 p.p.m. and the fragment mass tolerance to 0.02 Da. The Target Decoy PSM Validator node and a reverse decoy database were used for q-value validation of the peptide spectral matches (PSMs) using a strict target false discovery rate (FDR) of <1%. Furthermore, we used the score versus charge state function of the Sequest engine to filter out insignificant peptide hits (xcorr of 2.0 for z = 2, 2.25 for z = 3, 2.5 for z = 4, 2.75 for z = 5, 3.0 for z = 6). At least two unique peptides per protein were required for positive protein hits. TransAM and patch clamp data were analysed using a paired t-test while cytokines, LDH and calcium influx data were analysed using one-way ANOVA with all compared groups passing an equal variance test. Murine in vivo data were analysed using the Mann–Whitney test. Zebrafish data were analysed using the Kruskal–Wallis test with Dunn’s multiple comparison correction. In all cases, P < 0.05 was taken to be significant.
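
The charge-dependent Xcorr cut-offs and the two-peptide rule described above can be sketched in a few lines of Python. This is only an illustration, not the Proteome Discoverer workflow itself: the `PSM` record, the function names and the use of `>=` for the cut-off are assumptions, and "unique peptides" is simplified here to distinct peptide sequences per protein.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as listed in the text.
XCORR_CUTOFFS = {2: 2.0, 3: 2.25, 4: 2.5, 5: 2.75, 6: 3.0}
MAX_Q_VALUE = 0.01  # strict target FDR of <1%


@dataclass(frozen=True)
class PSM:
    """Hypothetical record for one peptide spectral match."""
    protein: str
    peptide: str
    charge: int
    xcorr: float
    q_value: float


def passes_filters(psm: PSM) -> bool:
    """Keep a PSM only if it meets the q-value and charge-specific Xcorr cut-offs."""
    cutoff = XCORR_CUTOFFS.get(psm.charge)
    if cutoff is None:  # charge states outside z = 2-6 were not selected
        return False
    # Assumption: the cut-off is inclusive (Xcorr >= threshold).
    return psm.xcorr >= cutoff and psm.q_value < MAX_Q_VALUE


def confident_proteins(psms, min_unique_peptides=2):
    """Return proteins supported by at least `min_unique_peptides` distinct peptides."""
    peptides_by_protein = {}
    for psm in psms:
        if passes_filters(psm):
            peptides_by_protein.setdefault(psm.protein, set()).add(psm.peptide)
    return sorted(
        protein
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_unique_peptides
    )
```

Calling `confident_proteins` on a list of `PSM` records returns the protein names that survive both filters; the protein and peptide identifiers passed in are whatever the search engine reports.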


News Article | November 16, 2016
Site: www.nature.com

AAV vector plasmids were cloned in the pAAV-MCS plasmid (Agilent Technologies) containing inverted terminal repeats from AAV serotype 2. The HBB rAAV6 GFP and tNGFR donors contained a promoter, MaxGFP or tNGFR, and BGH polyA. The left and right homology arms for the GFP and tNGFR HBB donors were 540 bp and 420 bp, respectively. The Glu6Val rAAV6 donor contained 2.2 kb of sequence homologous to the sequence upstream of Glu6Val. The nucleotide changes are depicted in Extended Data Fig. 2. Immediately downstream of the last nucleotide change was 2.2 kb of homologous HBB sequence. The HBB cDNA contained the same homology arms as the GFP and tNGFR donors above, except that the left homology arm was shortened to end at the sickle mutation. The sequence of the full HBB cDNA is depicted in Extended Data Fig. 9b. The sickle corrective donor used in the SCD-derived HSPCs in Fig. 4 had a total of 2.4 kb sequence homology to HBB with the SNPs shown in Extended Data Fig. 8a in the centre. scAAV6 carrying the SFFV promoter driving GFP was provided by H.-P. Kiem. AAV6 vectors were produced as described with a few modifications43. In brief, 293FT cells (Life Technologies) were seeded at 13 × 10^6 cells per dish in ten 15-cm dishes one day before transfection. One 15-cm dish was transfected using standard PEI transfection with 6 μg ITR-containing plasmid and 22 μg pDGM6 (a gift from D. Russell), which contains the AAV6 cap genes, AAV2 rep genes, and adenovirus helper genes. Cells were incubated for 72 h until collection of AAV6 from cells by three freeze–thaw cycles followed by a 45 min incubation with TurboNuclease at 250 U ml−1 (Abnova). AAV vectors were purified on an iodixanol density gradient by ultracentrifugation at 237,000g for 2 h at 18 °C. AAV vectors were extracted at the 60–40% iodixanol interface and dialysed three times in PBS with 5% sorbitol in the last dialysis using a 10K MWCO Slide-A-Lyzer G2 Dialysis Cassette (Thermo Fisher Scientific). 
Pluronic acid was added to the vectors to a final concentration of 0.001%, and the vectors were aliquoted and stored at −80 °C until use. AAV6 vectors were titred using quantitative PCR to measure the number of vector genomes as described previously44. Frozen CD34+ HSPCs derived from bone marrow or mobilized peripheral blood were purchased from AllCells and thawed according to the manufacturer’s instructions. CD34+ HSPCs from cord blood were either purchased frozen from AllCells or acquired from donors under informed consent via the Binns Program for Cord Blood Research at Stanford University and used fresh without freezing. CD34+ HSPCs from patients with SCD were purified within 24 h of the scheduled apheresis. For volume reduction via induced rouleaux formation, whole blood was mixed with 6% Hetastarch in 0.9% sodium chloride injection (Hospira, Inc.) in a proportion of 5:1 (v/v). Following a 60–90-min incubation at room temperature, the top layer, enriched for HSPCs and mature leukocytes, was carefully isolated with minimal disruption of the underlying fraction. Cells were pelleted, combined, and resuspended in a volume of PBS with 2 mM EDTA and 0.5% BGS directly proportional to the fraction of residual erythrocytes (typically 200–400 ml). Mononuclear cells (MNCs) were obtained by density gradient separation using Ficoll and CD34+ HSPCs were purified using the CD34+ Microbead Kit Ultrapure (Miltenyi Biotec) according to the manufacturer’s protocol. Cells were cultured overnight and then stained for CD34 and CD45 using APC anti-human CD34 (clone 561; Biolegend) and BD Horizon V450 anti-human CD45 (clone HI30; BD Biosciences), and a pure population of HSPCs defined as CD34bright/CD45dim was obtained by cell sorting on a FACS Aria II cell sorter (BD Biosciences). All CD34+ HSPCs were cultured in StemSpan SFEM II (StemCell Technologies) supplemented with SCF (100 ng ml−1), TPO (100 ng ml−1), Flt3 ligand (100 ng ml−1), IL-6 (100 ng ml−1), and StemRegenin1 (0.75 mM). 
Cells were cultured at 37 °C, 5% CO2 and 5% O2. The HBB and IL2RG synthetic sgRNAs used were purchased from TriLink BioTechnologies with chemically modified nucleotides at the three terminal positions at both the 5′ and 3′ ends. Modified nucleotides contained 2′-O-methyl-3′-phosphorothioate and the sgRNAs were HPLC-purified. The genomic sgRNA target sequences, with the PAM as the last three nucleotides, are: HBB: 5′-CTTGCCCCACAGGGCAGTAACGG-3′ (refs 45, 46); IL2RG: 5′-TGGTAATGATGGCTTCAACATGG-3′. Cas9 mRNA containing 5-methylcytidine and pseudouridine was purchased from TriLink BioTechnologies. Cas9 protein was purchased from Life Technologies. Cas9 RNP was made by incubating protein with sgRNA at a molar ratio of 1:2.5 at 25 °C for 10 min immediately before electroporation. CD34+ HSPCs were electroporated 1–2 days after thawing or isolation. CD34+ HSPCs were electroporated using the Lonza Nucleofector 2b (program U-014) and the Human T Cell Nucleofection Kit (VPA-1002, Lonza) as we have found this combination to be superior in optimization studies. The following conditions were used: 5 × 10^6 cells ml−1, 300 μg ml−1 Cas9 protein complexed with sgRNA at 1:2.5 molar ratio, or 100 μg ml−1 synthetic chemically modified sgRNA with 150 μg ml−1 Cas9 mRNA (TriLink BioTechnologies, non-HPLC purified). Following electroporation, cells were incubated for 15 min at 37 °C, after which AAV6 donor vectors were added at an MOI (vector genomes/cell) of 50,000–100,000, and the cells were then incubated at 30 °C or 37 °C overnight (if incubated at 30 °C, plates were then transferred to 37 °C). For targeting experiments of freshly sorted HSCs (Extended Data Fig. 5g), cells were electroporated using the Lonza Nucleofector 4D (program EO-100) and the P3 Primary Cell Nucleofection Kit (V4XP-3024). For the electroporation of 80 million CD34+ HSPCs, the Lonza 4D-Nucleofector LV unit (program DZ-100) and P3 Primary Cell Kit were used. 
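
As a rough illustration of the MOI arithmetic above (vector genomes per cell), the following Python sketch converts a cell number and target MOI into the number of vector genomes and the corresponding stock volume. The function names and the stock titre used in the usage note are hypothetical; the source does not report vector titres.

```python
def vector_genomes_needed(n_cells: float, moi: float) -> float:
    """Total vector genomes required for `n_cells` at a given MOI (vg per cell)."""
    if n_cells < 0 or moi < 0:
        raise ValueError("cell number and MOI must be non-negative")
    return n_cells * moi


def vector_volume_ul(n_cells: float, moi: float, titer_vg_per_ul: float) -> float:
    """Volume of vector stock (in microlitres) that delivers the requested MOI.

    `titer_vg_per_ul` is the qPCR-derived titre of the stock; its value is an
    assumption supplied by the caller, not a figure from the source.
    """
    if titer_vg_per_ul <= 0:
        raise ValueError("titre must be positive")
    return vector_genomes_needed(n_cells, moi) / titer_vg_per_ul
```

For example, 1 × 10^6 cells at an MOI of 50,000 require 5 × 10^10 vector genomes, which would correspond to 5 μl of a stock at an assumed titre of 1 × 10^10 vector genomes per μl.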
Subsequently, we have found no benefit to the 30 °C incubation and now perform all of our manufacturing at 37 °C. Rates of targeted integration of GFP and tNGFR donors were measured by flow cytometry at least 18 days after electroporation. Targeted integration of a tNGFR expression cassette was measured by flow cytometry of cells stained with APC-conjugated anti-human CD271 (NGFR) antibody (BioLegend, clone: ME20.4). For sorting of GFPhigh or tNGFRhigh populations, cells were sorted on a FACS Aria II SORP using the LIVE/DEAD Fixable Blue Dead Cell Stain Kit (Life Technologies) to discriminate live and dead cells according to manufacturer’s instructions. Positive selection of targeted HSPCs was performed using the CD271 (tNGFR) Microbead Kit (Miltenyi Biotech), according to the manufacturer’s instructions 72 h after electroporation. In brief, tNGFR+ cells were magnetically labelled with CD271 Microbeads after which the cell suspension was loaded onto an equilibrated MACS column inserted in the magnetic field of a MACS separator. The columns were washed three times, and enriched cells were eluted by removing the column from the magnetic field and eluting with PBS. Enrichment was determined by flow cytometry during culture for 2–3 weeks by FACS analysis every 3 days. Collected wells were stained with LIVE/DEAD Fixable Blue Dead Cell Stain (Life Technologies) and then with anti-human CD34 PE-Cy7 (581, BioLegend), CD38 Alexa Fluor 647 (AT1, Santa Cruz Biotechnologies), CD45RA BV 421 (HI100, BD Biosciences), and CD90 BV605 (5E10, BioLegend) and analysed by flow cytometry. 
For sorting of CD34+ or CD34+ CD38− CD90+ cells, cord-blood-derived CD34+ HSPCs were stained directly after isolation from blood with anti-human CD34 FITC (8G12, BD Biosciences), CD90 PE (5E10, BD Biosciences), CD38 APC (HIT2, BD Biosciences), and cells were sorted on a FACS Aria II (BD Biosciences), cultured overnight, and then electroporated with HBB RNP and transduced with HBB GFP rAAV6 using our optimized parameters. For assessing the allele modification frequencies in samples with targeted integration of the Glu6Val rAAV6 donor, PCR amplicons spanning the targeted region (see Extended Data Fig. 2a) were created using one primer outside the donor homology arm and one inside: HBB_outside 5′-GGTGACAATTTCTGCCAATCAGG-3′ and HBB_inside: 5′-GAATGGTAGCTGGATTGTAGCTGC-3′. The PCR product was gel-purified and re-amplified using a nested primer set (HBB_nested_fw: 5′-GAAGATATGCTTAGAACCGAGG-3′ and HBB_nested_rv: 5′-CCACATGCCCAGTTTCTATTGG-3′) to create a 685-bp PCR amplicon (see Extended Data Fig. 2a) that was gel-purified and cloned into a TOPO plasmid using the Zero Blunt TOPO PCR Cloning Kit (Life Technologies) according to the manufacturer’s protocol. TOPO reactions were transformed into XL-1 Blue competent cells, plated on kanamycin-containing agar plates, and single colonies were sequenced by McLab by rolling circle amplification followed by sequencing using the following primer: 5′-GAAGATATGCTTAGAACCGAGG-3′. For each of the six unique CD34+ donors used in this experiment, 100 colonies were sequenced. Additionally, 100 colonies derived from an AAV-only sample were sequenced, and no integration events were detected. INDEL frequencies were quantified using the TIDE software47 (tracking of indels by decomposition) and sequenced PCR products obtained by PCR of genomic DNA extracted at least 4 days after electroporation as previously described14. 
The CFU assay was performed by FACS sorting of single cells into 96-well plates containing MethoCult Optimum (StemCell Technologies) 4 days after electroporation and transduction. After 12–16 days, colonies were counted and scored based on their morphological appearance in a blinded fashion. DNA was extracted from colonies formed in methylcellulose from FACS sorting of single cells into 96-well plates. In brief, PBS was added to wells with colonies, and the contents were mixed and transferred to a U-bottomed 96-well plate. Cells were pelleted by centrifugation at 300g for 5 min followed by a wash with PBS. Finally, cells were resuspended in 25 μl QuickExtract DNA Extraction Solution (Epicentre) and transferred to PCR plates, which were incubated at 65 °C for 10 min followed by 100 °C for 2 min. Integrated or non-integrated alleles were detected by PCR. For detecting HBB GFP integrations at the 3′ end, two different PCRs were set up to detect integrated (one primer in insert and one primer outside right homology arm) and non-integrated alleles (primer in each homology arm), respectively (see Extended Data Fig. 4a). HBB_int_fw: 5′-GTACCAGCACGCCTTCAAGACC-3′, HBB_int_rv: 5′-GATCCTGAGACTTCCACACTGATGC-3′, HBB_no_int_fw: 5′-GAAGATATGCTTAGAACCGAGG-3′, HBB_no_int_rv: 5′-CCACATGCCCAGTTTCTATTGG-3′. For detecting HBB tNGFR integrations at the 5′ end, a 3-primer PCR methodology was used to detect the integrated and non-integrated allele simultaneously (see Extended Data Fig. 4d). HBB_outside_5′Arm_fw: 5′-GAAGATATGCTTAGAACCGAGG-3′, SFFV_rev: 5′-ACCGCAGATATCCTGTTTGG-3′, HBB_inside_3′Arm_rev: 5′-CCACATGCCCAGTTTCTATTGG-3′. Note that for the primers assessing non-integrated alleles, the Cas9 cut site is at least 90 bp away from the primer-binding sites and since CRISPR/Cas9 generally introduces INDELs of small sizes, the primer-binding sites should only very rarely be disrupted by an INDEL. 
For in vivo studies, 6- to 8-week-old NSG mice were purchased from the Jackson Laboratory (Bar Harbor). The experimental protocol was approved by Stanford University’s Administrative Panel on Laboratory Animal Care. For transplant data in Fig. 3a–c, sample sizes were not chosen to ensure adequate power to detect a pre-specified effect size. Four days after electroporation/transduction or directly after sorting, 500,000 cells (or 100,000–500,000 cells for the GFPhigh group) were administered by tail-vein injection into the mice after sub-lethal irradiation (200 cGy) using an insulin syringe with a 27 gauge × 0.5 inch (12.7 mm) needle. For transplant data in Fig. 3f, g, three days after electroporation, 400,000–700,000 bulk HSPCs or HSPCs enriched for targeting (FACS or bead-enrichment) were transplanted as described above. Mice were randomly assigned to each experimental group and evaluated in a blinded fashion. For secondary transplants, human cells from the RNP plus AAV group were pooled and CD34+ cells were selected using a CD34 bead enrichment kit (MACS CD34 MicroBead Kit UltraPure, human, Miltenyi Biotec), and finally cells were injected into the femurs of female secondary recipients (3 mice total). Because GFPhigh mice had low engraftment, they were not CD34+-selected, but total mononuclear cells were filtered, pooled, and finally injected into the femur of two secondary recipients. At week 16 after transplantation, mice were euthanized, mouse bones (2× femur, 2× tibia, 2× humerus, sternum, 2× pelvis, spine) were collected and crushed using mortar and pestle. MNCs were enriched using Ficoll gradient centrifugation (Ficoll-Paque Plus, GE Healthcare) for 25 min at 2,000g, room temperature. 
Cells were blocked for nonspecific antibody binding (10% v/v, TruStain FcX, BioLegend) and stained (30 min, 4 °C, dark) with monoclonal anti-human CD45 V450 (HI30, BD Biosciences), CD19 APC (HIB19, BD Biosciences), CD33 PE (WM53, BD Biosciences), HLA-ABC APC-Cy7 (W6/32, BioLegend), anti-mouse CD45.1 PE-Cy7 (A20, eBioScience), anti-mouse PE-Cy5 mTer119 (TER-119, eBioscience) antibodies. Normal multi-lineage engraftment was defined by the presence of myeloid cells (CD33+) and B-cells (CD19+) within engrafted human CD45+ HLA-ABC+ cells. Parts of the mouse bone marrow were used for CD34-enrichment (MACS CD34 MicroBead Kit UltraPure, human, Miltenyi Biotec) and the presence of human HSPCs was assessed by staining with anti-human CD34 APC (8G12, BD Biosciences), CD38 PE-Cy7 (HB7, BD Biosciences), CD10 APC-Cy7 (HI10a, BioLegend), and anti-mouse CD45.1 PE-Cy5 (A20, eBioScience) and analysed by flow cytometry. The total number of modified human cells in the bone marrow at week 16 after transplant was estimated by multiplying the percentage engraftment with the percentage GFP+ cells among engrafted cells. This number was multiplied by the total number of MNCs in the bone marrow of an NSG mouse (1.1 × 10^8 per mouse) to give the total number of GFP+ human cells in the total bone marrow of the transplanted mice. The total number of MNCs in the bone marrow of an NSG mouse was calculated by counting the total number of MNCs in one femur in four NSG mice. The total number of MNCs in one mouse was then calculated assuming one femur is 6.1% of the total marrow as found previously48. SCD patient-derived HSPCs were cultured in three phases following targeting at 37 °C and 5% CO2 in SFEM II media according to previously established protocols39, 40. 
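
The bone-marrow estimate described above (percentage engraftment × percentage GFP+ × total MNCs per mouse, with one femur taken as 6.1% of total marrow) can be written out as a short Python sketch. The function names are illustrative, and the percentages in the usage note are hypothetical values rather than results from the study.

```python
MNC_PER_NSG_MOUSE = 1.1e8  # total bone-marrow MNCs per NSG mouse, from the text
FEMUR_FRACTION = 0.061     # one femur assumed to hold 6.1% of total marrow


def mnc_per_mouse_from_femur(mnc_in_one_femur: float) -> float:
    """Scale an MNC count from a single femur up to the whole bone marrow."""
    return mnc_in_one_femur / FEMUR_FRACTION


def gfp_positive_human_cells(
    pct_engraftment: float,
    pct_gfp_among_engrafted: float,
    total_mnc: float = MNC_PER_NSG_MOUSE,
) -> float:
    """Estimate the number of GFP+ human cells in the total bone marrow.

    Both percentages are given on a 0-100 scale.
    """
    for pct in (pct_engraftment, pct_gfp_among_engrafted):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must lie between 0 and 100")
    fraction_modified = (pct_engraftment / 100) * (pct_gfp_among_engrafted / 100)
    return fraction_modified * total_mnc
```

With hypothetical inputs of 10% engraftment and 50% GFP+ cells among engrafted cells, the estimate is 0.10 × 0.50 × 1.1 × 10^8 = 5.5 × 10^6 GFP+ human cells per mouse.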
Media was supplemented with 100 U ml−1 penicillin/streptomycin, 2 mM l-glutamine, 40 μg ml−1 lipids, 100 ng ml−1 SCF, 10 ng ml−1 IL-3 (PeproTech), 0.5 U ml−1 erythropoietin (eBiosciences), and 200 μg ml−1 transferrin (Sigma Aldrich). In the first phase, corresponding to days 0–7 (day 0 being day 4 after electroporation), cells were cultured at 10^5 cells ml−1. In the second phase, corresponding to days 7–11, cells were maintained at 10^5 cells ml−1 and erythropoietin was increased to 3 U ml−1. In the third and final phase, days 11–21, cells were cultured at 10^6 cells ml−1 with 3 U ml−1 of erythropoietin and 1 mg ml−1 of transferrin. Erythrocyte differentiation of edited and non-edited HSPCs was assessed by flow cytometry using the following antibodies: hCD45 V450 (HI30, BD Biosciences), CD34 FITC (8G12, BD Biosciences), CD71 PE-Cy7 (OKT9, Affymetrix), and CD235a PE (GPA) (GA-R2, BD Biosciences). RNA was extracted from 100,000–250,000 differentiated erythrocytes between days 16–21 of erythroid differentiation using the RNeasy Mini Kit (Qiagen) and was DNase-treated with RNase-Free DNase Set (Qiagen). cDNA was made from 100 ng RNA using the iScript Reverse Transcription Supermix for RT-qPCR (Bio-Rad). Levels of HbS, HbA (from corrective SNP donor), and HbA-AS3 (anti-sickling HBB cDNA donor) were quantified by qPCR using the following primers and FAM/ZEN/IBFQ-labelled hydrolysis probes purchased as custom-designed PrimeTime qPCR Assays from IDT: HbS primer (fw): 5′-TCACTAGCAACCTCAAACAGAC-3′, HbS primer (rv): 5′-ATCCACGTTCACCTTGCC-3′, HbS probe: 5′-TAACGGCAGACTTCTCCACAGGAGTCA-3′, HbA primer (fw): 5′-TCACTAGCAACCTCAAACAGAC-3′, HbA primer (rv): 5′-ATCCACGTTCACCTTGCC-3′, HbA probe: 5′-TGACTGCGGATTTTTCCTCAGGAGTCA-3′, HbAS3 primer (fw): 5′-GTGTATCCCTGGACACAAAGAT-3′, HbAS3 primer (rv): 5′-GGGCTTTGACTTTGGGATTTC-3′, HbAS3 probe: 5′-TTCGAAAGCTTCGGCGACCTCA-3′. 
Primers for HbA and HbS are identical, but probes differ by six nucleotides, and therefore it was experimentally confirmed that these two assays do not cross-react with targets. To normalize for RNA input, levels of the reference gene RPLP0 were determined in each sample using the IDT predesigned RPLP0 assay (Hs.PT.58.20222060). qPCR reactions were carried out on a LightCycler 480 II (Roche) using the SsoAdvanced Universal Probes Supermix (BioRad) following the manufacturer’s protocol and PCR conditions of 10 min at 95 °C, 50 cycles of 15 s at 95 °C and 60 s at 58 °C. Relative mRNA levels were determined using the relative standard curve method, in which a standard curve for each gene was made from serial dilutions of the cDNA. The standard curve was used to calculate relative amounts of target mRNA in the samples relative to levels of RPLP0. The authors declare that the data supporting the findings of this study are available within the paper.
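
The relative standard curve method described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the curve is fitted by ordinary least squares of quantification cycle (Cq) against log10 of the relative cDNA input, and any Cq values supplied are the caller's own.

```python
import math


def fit_standard_curve(relative_inputs, cq_values):
    """Least-squares fit of Cq = slope * log10(relative input) + intercept."""
    if len(relative_inputs) != len(cq_values) or len(cq_values) < 2:
        raise ValueError("need at least two matched dilution/Cq pairs")
    xs = [math.log10(value) for value in relative_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("dilutions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, curve):
    """Read a relative quantity off a fitted standard curve."""
    slope, intercept = curve
    return 10 ** ((cq - intercept) / slope)


def relative_expression(target_cq, reference_cq, target_curve, reference_curve):
    """Target quantity normalized to the reference gene (RPLP0 in the text)."""
    return quantity_from_cq(target_cq, target_curve) / quantity_from_cq(
        reference_cq, reference_curve
    )
```

In use, one curve is fitted per gene from its own dilution series, and each sample's target Cq and RPLP0 Cq are then passed to `relative_expression` together with the two fitted curves.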


News Article | November 29, 2016
Site: www.newsmaker.com.au

With a CAGR of 10.1%, global market value for PCR Products/Tools market is anticipated to be worth US$12 billion by 2020. On a global scale, Europe accounts for more than 25% of the market. While USA accounts for the largest share of the global market value on a country basis, Asia-Pacific is the fastest growing region in terms of growth rate anticipated in the near future and leads the world. PCR Machines account for the largest share of the entire market, driving a CAGR of 9.5% during the analysis period 2014-2020. PCR Reagents and PCR Detection Kits/Assays accounts for more than 40% of the market share and fastest growing segment with a CAGR approximately 10.8% and 10.5% by 2020 respectively. The report “Polymerase Chain Reaction (PCR) - Products/Tools - Global Trends, Estimates and Forecasts, 2014-2020” reviews the latest PCR market trends with a perceptive attempt to disclose the near-future growth prospects. An in-depth analysis on a geographic basis provides strategic business intelligence for life science sector investments. The study reveals profitable investment strategies for pharmaceutical manufacturers, biotechnology companies, laboratories, Contract Research  Organizations (CROs) and many more in preferred locations. The report primarily focuses on: Estimates are based on online surveys using customized questionnaires by our research team. Besides information from government databases, company websites, press releases & published research reports are also used for estimates. The analysis primarily deals with major PCR product/tools market. Further, the subdivided categories include: The period considered for the PCR Products/Tools market analysis is 2014-2020. 
The region-wise distribution of the market consists of North America (USA and Canada), Europe (Germany, France, United Kingdom, Italy, Spain and Rest of Europe), Asia-Pacific (Japan, China, India, South Korea and Rest of Asia-Pacific), Latin America (Brazil, Colombia, Argentina and Rest of Latin America) and Rest of the World. The market growth rates in major economies such as the U.S., Japan and China are estimated individually for the upcoming years. More than 435 leading market players are identified and 45 key companies that project improved market activities in the near future are profiled. The report consists of 91 data charts describing the market shares, sales forecasts and growth prospects. Moreover, key strategic activities in the market including mergers/acquisitions, collaborations/partnerships, product launches/developments are discussed. Abbott Laboratories, Affymetrix, Inc., Agilent Technologies, Inc., BD Biosciences, Bio-Rad Laboratories, Inc., Complete Genomics, Inc., Epicentre® Biotechnologies, GE Healthcare (Life Sciences), Illumina, Qiagen, Inc., DNA Landmarks, Inc., Roche Diagnostics, Eppendorf AG, Cytocell Ltd, Shimadzu Biotech, DNAVision SA, Exiqon, Hokkaido System Science Co., Ltd., Ocimum Biosolutions, Ltd., HY Laboratories LTD., PerkinElmer Life Sciences & many more… History Of Polymerase Chain Reaction All About PCR Specimen Preparation 1. Isolating The Target DNA - Denaturation 2. Binding Primers to The DNA Chain - Annealing 3. 
Making A Replica – Extension PCR Variations Of Polymerase Chain Reaction Basic PCR Technique’s Fluctuations Allele-Precise PCR PCA (Polymerase Cycling Assembly) or Assembly PCR Asymmetric PCR Helicase-Reliant Amplification Hot–Start PCR Intersequence-Specific PCR Reverse PCR Ligation Mediated PCR Methylation Specific PCR Miniprimer PCR Multiplex Ligation-Reliant Probe Amplification Multiplex PCR Nested PCR Overlap-Extension PCR Quantitative PCR RT-PCR Solid-Phase PCR Tail-PCR Touchdown PCR Pan-Ac Universal Fast Walking Parameters For Successful PCR I) Metal Ion Cofactors And PCR II) Substrates And Substrate Analogs For PCR III) Buffers And Salts For PCR IV) Cosolvents Theory And Methodology Of Polymerase Chain Reaction Methodology Of Use PCR – Improvised Technique For Testing Nucleic Acids Formatting Step Denaturation Step Annealing Step Extension/Elongation Step PCR Reagents Role Of PCR Reagents GoTaq® PCR Mix Primers PCR Reagents Next Generation PCR Reagents For Clinical Diagnostics PCR Reagent Market Trends Other PCR Software PCR Robotics The Automation And Usage Of Robotics In Amplification Assays Pre-PCR Robotic System The Post-PCR Robotic System Integrated Robotic System For High Sample Throughput Within A DNA Databasing Unit PCR Arrays Need Of PCR Arrays Doctrine Of Assay Corroboration For Nucleic Acid Diagnostic Tests Assay Corroboration – An Introduction 1. Selecting An Assay Fitting Its Intended Purpose Considerations Towards Primitive Assay Developments A) Care And Restraints B) Protections For Avoiding False-Positive Results C) Safeguards For Avoiding Negative Outcomes. D) Standards’ Preparation


News Article | March 1, 2017
Site: www.nature.com

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. E14 mouse ES cells were cultured in high-glucose DMEM (Invitrogen) supplemented with 15% FBS (Millipore), 0.1 mM non-essential amino acids (Invitrogen), 1 mM sodium pyruvate (Invitrogen), 0.1 mM 2-mercaptoethanol, 1500 U ml−1 LIF (Millipore), 25 U ml−1 penicillin, and 25 μg ml−1 streptomycin. The cells were mycoplasma free. Generation of Dnmt3b−/− ES cells was performed using TALEN technology. Cells were transfected with the two TALEN constructs targeting exon 17 of murine Dnmt3b (corresponding to the start of the catalytic domain) and after 16 h were seeded as single cells. After ten days, clones were screened by western blot analysis. Positive clones were analysed by genomic sequencing. For half-life measurements and Pol II elongation inhibition, wild-type and Dnmt3b−/− ES cells were treated with DRB at a concentration of 75 μM for the indicated times. For total cell extracts, cells were resuspended in F-buffer (10 mM Tris-HCl pH 7.0, 50 mM NaCl, 30 mM Na-pyrophosphate, 50 mM NaF, 1% Triton X-100, anti-proteases) and sonicated for three pulses. Extracts were quantified using BCA assay (Pierce) and were run on SDS-polyacrylamide gels at different percentages, transferred to nitrocellulose membranes and incubated with specific primary antibodies overnight. Nuclear protein extractions were performed as described in ref. 41. In brief, cells were harvested in 1× PBS and resuspended in isotonic buffer (20 mM HEPES pH 7.5, 100 mM NaCl, 250 mM sucrose, 5 mM MgCl2, 5 μM ZnCl2). Subsequently, cells were resuspended in isotonic buffer supplemented with 1% NP-40 to isolate nuclei. 
The isolated nuclei were resuspended in digestion buffer (50 mM Tris-HCl pH 8.0, 100 mM NaCl, 250 mM sucrose, 0.5 mM MgCl2, 5 mM CaCl2, 5 μM ZnCl2) and treated with Micrococcal Nuclease (NEB) at 30 °C for 10 min. Nuclear proteins from about 1 × 10^7 cells were incubated with 3 μg of specific antibody overnight at 4 °C. Immunocomplexes were incubated with protein-G-conjugated magnetic beads (DYNAL, Invitrogen) for 2 h at 4 °C. Samples were washed four times with digestion buffer supplemented with 0.1% NP-40 at RT. Proteins were eluted by incubating with 0.4 M NaCl TE buffer for 30 min and were analysed by western blotting. Custom shRNAs against SetD2, Dis3 and Rrp6 were constructed using the TRC hairpin design tool (http://www.broadinstitute.org/rnai/public/seq/search), and designed to target the 3′ UTR. shRNAs with more than 14 consecutive matches to non-target transcripts were avoided. Hairpins were cloned into pLKO.1 vector (Addgene: 10878) and each construct was verified by sequencing. The Dnmt3b construct was obtained by PCR amplification and cloned into pEF6/V5-His vector (Invitrogen). The Dnmt3b mutant constructs (V725G, S277P and VW-RR) were generated by introducing a site-specific mutation in the DNA sequence corresponding to Val725 to mutate it into a glycine, or Ser277 to mutate it into a proline, or Val236Trp237 to mutate it to Arg–Arg, using the QuikChange XL Site-Directed Mutagenesis Kit (Agilent Technologies). Transfections of mouse ES cells were performed using Lipofectamine 2000 Transfection Reagent according to the manufacturer’s protocol using equal amounts of each plasmid in multiple transfections. For SetD2 knockdown, cells were transfected with 5 μg of the specific shRNA construct, and maintained in medium with puromycin selection (1 μg ml−1) for 48 h. 
To investigate the distribution of the endogenous Dnmt3b we tested different antibodies and found one that was able to immunoprecipitate the endogenous Dnmt3b cross-linked to chromatin, which showed no background signal in Dnmt3b−/− (Extended Data Fig. 1g–i). The ChIP-seq data were validated by ChIP–qPCR, using several biological replicates, on target genomic regions and by crosslinked co-immunoprecipitation experiments between Dnmt3b and H3K36me3 in wild-type or Dnmt3b−/− ES cells (Extended Data Fig. 1o, p). For Dnmt3b ChIP-seq, approximately 2 × 10^7 cells were cross-linked by addition of formaldehyde to 1% for 10 min at RT, quenched with 0.125 M glycine for 5 min at RT, and then washed twice with cold PBS. The cells were resuspended in lysis buffer 1 (50 mM HEPES-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100 and protease inhibitor) to disrupt the cell membrane and in lysis buffer 2 (10 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA and protease inhibitor) to isolate nuclei. The isolated nuclei were then resuspended in SDS ChIP Buffer (20 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% SDS and protease inhibitors). Extracts were sonicated using the Bioruptor Twin (Diagenode) for two runs of ten cycles (30 s on, 30 s off) at high-power setting. Cell lysate was centrifuged at 12,000g for 10 min at 4 °C. The supernatant was diluted with ChIP dilution buffer (20 mM Tris-HCl pH 8.0, 150 mM NaCl, 2 mM EDTA, 1% Triton) before the immunoprecipitation step. Magnetic beads (Dynabeads rat anti-mouse IgM for anti-Pol II-phospho-S5, Dynabeads Protein G for all the other ChIPs, Life Technologies) were saturated with PBS/1% BSA and the samples were incubated with 2 μg of antibody overnight at 4 °C on a rotator. The next day, samples were incubated with saturated beads for two hours at 4 °C on a rotator. 
Subsequently, immunoprecipitated complexes were washed five times with RIPA buffer (50 mM HEPES-KOH pH 7.6, 500 mM LiCl, 1 mM EDTA, 1% NP-40, 0.7% Na-Deoxycholate) at 4 °C for 5 min each on a rotator. Other ChIP-seq experiments were performed as described previously (ref. 42). Elution buffer was added and incubated at 65 °C for 15 min. The de-crosslinking was performed at 65 °C overnight. De-crosslinked DNA was purified using the QIAquick PCR Purification Kit (Qiagen) according to the manufacturer’s instructions. MeDIP was performed using the MeDIP kit (Active Motif), according to the manufacturer’s protocol. DNA was analysed by quantitative real-time PCR using the SYBR GreenER kit (Invitrogen). All experimental values were normalized to input. The data shown represent triplicate real-time quantitative PCR measurements of the immunoprecipitated DNA. The data are expressed as a percentage of the DNA inputs. Error bars represent standard deviation determined from triplicate experiments. Oligonucleotide sequences are reported in Supplementary Table 1. Genomic DNA was extracted from cells using the DNeasy Blood and Tissue kit (Qiagen). For dot-blot analysis, extracted genomic DNA was sonicated using the Bioruptor Twin (Diagenode) for two runs of ten cycles (30 s on, 30 s off) at high-power setting, in order to obtain 300-bp fragments, denatured with 0.4 M NaOH and incubated for 10 min at 95 °C before being spotted onto Hybond-N+ (GE Healthcare). Membranes were saturated with 5% milk and incubated with the specific antibodies overnight. Approximately 10 ng of purified ChIP DNA was end-repaired, dA-tailed, and adaptor-ligated using the NEBNext ChIP-seq Library Prep Master Mix Set (NEB), following the manufacturer’s instructions. 
For whole-genome bisulphite-seq library preparation, 2.5 μg of ES cell genomic DNA was spiked with 1 ng of Escherichia coli genomic DNA and sheared using a Bioruptor Twin sonicator (Diagenode) for three runs of ten cycles (30 s on, 30 s off) at high-power setting. Fragmented DNA was then end-repaired, dA-tailed, and ligated to methylated adapters, using the Illumina TruSeq DNA Sample Prep Kit, following the manufacturer’s instructions. DNA was loaded on an E-Gel SizeSelect 2% agarose pre-cast gel (Invitrogen), and a fraction corresponding to fragments ranging from 180 bp to 350 bp was recovered. Purified DNA was then subjected to bisulphite conversion using the EpiTect Bisulphite Kit (Qiagen). Bisulphite-converted DNA was finally enriched by 15 cycles of PCR using Pfu Turbo Cx HotStart Taq (Agilent). Total RNA was extracted as previously described (ref. 43) using TRIzol reagent (Invitrogen). Real-time PCR was performed using the SuperScript III Platinum One-Step Quantitative RT–PCR System (Invitrogen) following the manufacturer’s instructions. Ribo-RNA-seq library preparation was performed as described previously (ref. 44). In brief, 2.5 μg of total RNA was depleted of ribosomal RNA using the RiboMinus Eukaryote System v2 kit (Invitrogen), following the manufacturer’s instructions. Ribo-RNA was resuspended in 17 μl of EFP buffer (Illumina), heated to 94 °C for 8 min, and used as input for first strand synthesis, using the TruSeq RNA Sample Prep kit, following the manufacturer’s instructions. Poly(A) RNA-seq libraries were prepared using the TruSeq RNA Sample Prep kit, following the manufacturer’s instructions. 
For immunoprecipitation of mRNA for CAP-Seq experiments, 30 μg of total RNA was fragmented by alkaline hydrolysis into ~200-nt fragments and incubated with 5 μg of mouse anti-CAP antibody (anti-m3G-cap, m7G-cap, Clone H20, Millipore MABE419) (or IgG) overnight at 4 °C in 0.5 ml of IP buffer (10 mM Tris-HCl pH 7.5; 150 mM NaCl; 0.1% Triton X-100) supplemented with 50 U ml−1 RNaseOUT (Invitrogen), 50 U ml−1 SuperaseIN (Invitrogen), and 50 U ml−1 RNase Inhibitor (Ambion). 25 μl of Dynabeads Protein G (Invitrogen) was saturated overnight at 4 °C in IP buffer supplemented with 150 μg of Sonicated Salmon Sperm DNA (Qiagen). Following incubation, beads were washed two times in IP buffer and incubated with the preformed RNA-antibody complexes at 4 °C. After 3 h, beads were washed four times with IP buffer. Specific elution of recovered fragments was obtained by incubation of beads with 100 μl elution buffer (5 mM Tris pH 7.5; 1 mM EDTA; 0.05% SDS; 0.3 mg ml−1 Proteinase K) for 1.5 h at 50 °C. Fragments were then purified by addition of 1 ml of TRIzol reagent (Invitrogen), and subjected to random-primed reverse transcription using the SuperScript III Reverse Transcriptase (Invitrogen) at 50 °C for 1 h. Resulting cDNAs were then used as input for the TruSeq RNA Sample Prep kit (Illumina), starting from the ‘second strand synthesis’ step, to produce the sequencing library, following the manufacturer’s instructions. To map the transcriptional start sites at single-base resolution we used an enzymatic approach based on the RNA 5′ pyrophosphohydrolase (RppH) enzyme to decap eukaryotic mRNAs (ref. 21). We validated the specificity of this technique in a pilot experiment by comparing RppH-treated RNA versus untreated or T4 polynucleotide kinase (PNK)-treated RNA (Extended Data Fig. 6a–e). When required, the total RNA was depleted of small nuclear RNAs (snRNAs) by using the following protocol. 
5 μg of total RNA was resuspended in snRNA-depletion buffer (20 mM HEPES pH 7.5, 80 mM KCl, 1 mM DTT) with 1 μl RNase inhibitor (Ambion) and 2 μM oligo mix (designed against snRNA sequences; primer sequences in Supplementary Table 1) in a final volume of 50 μl, heated to 70 °C for 5 min and immediately put on ice. Then, 25 μl of 2× snRNA-depletion buffer (40 mM HEPES pH 7.5, 160 mM KCl, 10 mM MgCl2, 2 mM DTT), supplemented with 1 μl RNase inhibitor (Ambion) and 1 μl of RNase H (NEB), was added to a final volume of 100 μl, and the mixture was incubated for 30 min at 37 °C. snRNA-depleted RNA was purified using the RNA Clean & Concentrator kit (Zymo Research) and DNaseI digestion was performed following the manufacturer’s instructions. snRNA-depleted RNA was further depleted of ribosomal RNA by using the RiboMinus Eukaryote System v2 kit (Invitrogen). The RNA obtained from the previous depletions (or poly(A)+ RNA enriched using the NEBNext Poly(A) mRNA Magnetic Isolation Module kit (NEB), following the manufacturer’s instructions) was chemically fragmented by using the first strand buffer of the SuperScript II Reverse Transcriptase (Invitrogen). The fragmented RNA was stripped of natural 5′ and fragmentation-derived 3′ phosphates by using Antarctic Phosphatase (AP, NEB). Dephosphorylated RNA was then treated with RNA 5′ pyrophosphohydrolase (RppH, NEB) in 1 × Thermopol buffer (NEB) (for decapping and pyrophosphate removal from the 5′ end of RNA to leave a 5′ monophosphate RNA). As positive and negative controls, the dephosphorylated RNA was either treated with T4 polynucleotide kinase (PNK, NEB) (for 5′ phosphorylation of all RNA fragments) or processed without adding any enzyme. 5′ RNA adaptor ligation was carried out by using the TruSeq Small RNA Sample Preparation Kit (Illumina). Reverse transcription was performed with SuperScript III enzyme (Invitrogen) and Illumina 3′ Adaptor Rev-Comp Random Hexamers (RC3N6). 
The RNA was size selected on a TBE-Urea 10% PAGE gel and PCR amplification was carried out by using the TruSeq Small RNA Sample Preparation Kit (Illumina). Ribosome profiling was performed using the ARTseq/TruSeq Ribo Profile (Illumina), with minor changes to the manufacturer’s protocol. In brief, around 3 × 10^7 cells were treated with 0.1 μg μl−1 final cycloheximide for 5 min at 37 °C. Cells were then washed twice and harvested with ice-cold PBS (supplemented with 0.1 μg μl−1 final cycloheximide). Cells were lysed in 1 ml of mammalian lysis buffer (supplemented with 0.5% final concentration of NP-40) at 4 °C for 10 min on a rotator. The lysate was then treated with 50 U of ART-seq nuclease for 45 min at 25 °C, with moderate shaking. 400 μl of the digested lysate was then layered on the top of a 2.5 ml sucrose cushion, and centrifuged at 265,000g for 5 h at 4 °C. After completely removing the supernatant, the pellet was resuspended in 100 μl nuclease-free water, and purified on RNA Clean & Concentrator-5 columns (Zymo Research). 5 μg of the recovered monosomal RNA was then subjected to two consecutive rounds of rRNA depletion using the Ribo-Zero Gold Kit (Human/Mouse/Rat, Epicentre), and then run on a 10% TBE-Urea PAGE gel for 25 min at 200 V. A gel slice corresponding to 28–30 nt was then cut, crushed, and RNA was recovered by passive diffusion at 4 °C for 16 h. The eluted RNA fragments were then end-repaired, ligated to the 3′ adaptor, and reverse-transcribed. The cDNA was run on a 10% TBE-Urea PAGE gel for 30 min at 180 V, and a gel slice corresponding to fragments of approximately 70–80 nt was cut, crushed, and cDNA was recovered by passive diffusion at 37 °C for 16 h with vigorous shaking. The eluted cDNA was then subjected to circularization, and the final library was obtained by ten cycles of PCR. The final library was inspected on the Fragment Analyzer (Advanced Analytical), revealing a single sharp peak around 150 bp. 
Samples were sequenced on the HiScanSQ or Next500 platforms (Illumina). All of the analysed datasets were mapped to a recently published variant of the mm9 genome assembly that includes single-nucleotide variants from E14 ES cells (ref. 45). Prior to mapping, sequencing reads were trimmed on the basis of low-quality scores and clipped of the adaptor sequence by using the FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). For RNA-seq data analysis, reads were mapped using TopHat v2.0.6 (ref. 46) and mRNA quantification was performed using Cuffdiff v2.0.2 (ref. 47). For ChIP-seq data analysis, reads were mapped using Bowtie version 0.12.7 (ref. 48), reporting only unique hits with up to two mismatches (parameters: -m 1 -v 2). For bisulphite-seq data analysis, reads were mapped using BSMAP v2.74 (ref. 49). Unmapped reads from the first mapping round were trimmed by 10 nt at their 5′ end, and 15 nt at their 3′ end using the fastx_trimmer tool from the FASTX toolkit, and subjected to a second round of mapping. Reads failing this second mapping round were mapped to the Escherichia coli strain K-12 substrain DH10B genome (NCBI accession: NC_010473), in order to estimate bisulphite conversion efficiency. RNA-seq correlation analyses were performed by using the Pearson correlation coefficient and by plotting RPKM values calculated on the RefFlat gene annotation. Intragenic transcription initiation analysis was performed on a non-redundant gene annotation built starting from the RefFlat annotation, by keeping only the longest isoform for each gene, with at least 1 RPKM of expression and at least 5 exons. RPKM on each exon was calculated by counting reads falling in the exon (normalizing on the exon length in kb and on the total mapped reads of the experiment in millions) using a custom script, and then the ratio was calculated as the log fold-change of second, third and last exon RPKM over the first exon RPKM for each gene. 
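The per-exon RPKM and exon-to-first-exon ratio described above can be sketched as follows. This is a minimal illustration, not the authors' custom script: the function names and the read counts are hypothetical, and base-2 logarithms are an assumption because the text does not state the base.

```python
import math

def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count / ((length_bp / 1_000) * (total_mapped_reads / 1_000_000))

def exon_log_ratio(exon_rpkm, first_exon_rpkm):
    """Log fold change of an exon's RPKM over the first exon's RPKM.
    Base 2 is an assumption; the source only says 'log fold-change'."""
    return math.log2(exon_rpkm / first_exon_rpkm)

# Hypothetical counts for one gene: (reads, exon length in bp)
total_reads = 20_000_000
first_exon = rpkm(400, 200, total_reads)
second_exon = rpkm(300, 300, total_reads)
ratio = exon_log_ratio(second_exon, first_exon)
print(first_exon, second_exon, ratio)
```

A negative ratio here means the downstream exon is covered less densely than the first exon once length and sequencing depth are accounted for.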
For the ratio of intermediate to first exons, averages of the RPKM value of all the other exons (from fourth to penultimate) were used. Alternative promoter analysis was performed on a non-redundant gene annotation built starting from the RefFlat annotation by keeping only the genes that had at least two isoforms transcribed from known different promoters. RPKM of the first exon of the isoforms transcribed from alternative promoters was calculated with a custom script. The log ratio between the first exons transcribed from the first over the second promoter was plotted by using the heatscatter function (in R) and correlation was quantified with Pearson’s coefficient. Alternative promoter usage was calculated on the same reference as above. The log ratio was calculated as the RPKM value of the first exon transcribed from each class of different alternative promoters over the RPKM of the whole transcript, in order to normalize differentially expressed genes in wild-type and Dnmt3b−/− cells. For DECAP-seq only intragenic mapped reads were used for further analysis. We used a RefSeq-based genic reference containing only the annotated longest isoforms and excluding all genes overlapping other genes or ncRNAs on the same strand. Since DECAP-seq is a technique capable of single-base resolution and the first base of the sequenced reads corresponds to the base having the cap signal, only the first position of the mapped read was used to calculate a count per million of mapped reads (RPM). All the analyses were performed on the genes belonging to the third or fourth quartiles of expression. Venn diagram overlap is calculated at single-base resolution. Logo analysis of the sequence enrichment was performed by using WebLogo (http://weblogo.berkeley.edu/). Motif discovery was performed by using HOMER Motif Analysis (http://homer.salk.edu/homer/motif/). For CAPIP-seq only intragenic mapped reads were used for further analysis. 
RPKM of each genomic feature was calculated as described above by using a custom script. Enrichment was calculated as the log fold change of RPKM value from CAP immunoprecipitated samples over the RPKM from input samples for each genomic feature. As for DECAP-seq, the intragenic CAPIP-seq signal ratio between wild-type and Dnmt3b−/− cells was calculated as the fold change of the intragenic enrichment (from 2 kb downstream of the TSS to the TES) in wild-type over Dnmt3b−/− cells. The ratio gene-body to TSS was defined as the log fold change of gene-body enrichment (derived from intronic and intermediate exonic regions) over the enrichment calculated on the first 200 nt of the transcripts. All the analyses were performed on genes belonging to the third or fourth quartiles of expression. Poly(A)+ enriched RNA-seq analyses were performed on RNA derived from DRB-treated wild-type and Dnmt3b−/− ES cells. For half-life calculation, gene quantifications performed with CuffDiff (see above) were normalized on the average of the ten genes showing the lowest degradation rate following DRB treatment and having at least 10 RPKM in ES cells. Degradation rate was defined as the ratio of the RPKM value of the sample at time 0 h of DRB treatment over the average RPKM value of the samples treated for 3, 6 and 12 h with DRB. The top ten genes are Tmsb10, Mt1, Mt2, Rps14, Rplp2, 4930412F15Rik, Rpl38, Rplp1, Tomm7 and Cox6a1. Only genes with an RPKM > 1 were used for further analysis and a constant of 0.1 pseudo-RPKM was introduced to reduce sampling noise. Half-life ($t_{1/2}$) was calculated by using the following formula (ref. 50): $t_{1/2} = \ln 2 / k$, where k is the decay rate constant obtained by fitting data (gene RPKM value for each time point) with an exponential function. Half-life on introns was measured as calculated for mature mRNAs, but gene quantification (RPKM) was performed counting the reads on introns and normalizing for intron length (kb) and for the number of total intragenic mapped reads (millions). 
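The half-life estimate described above, a decay constant k from an exponential fit followed by ln 2 / k, can be sketched as below. The fitting procedure is an assumption: the text only says the data were fitted "with an exponential function", so this sketch uses ordinary least squares on log-transformed RPKM. The function names and the synthetic time course are hypothetical; the 0.1 pseudo-RPKM default follows the text.

```python
import math

def decay_rate(times_h, rpkm_values, pseudo=0.1):
    """Estimate k (per hour) for RPKM(t) = A * exp(-k * t) by least squares
    on ln(RPKM + pseudo) versus time."""
    ys = [math.log(v + pseudo) for v in rpkm_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return -sxy / sxx  # slope of the log-linear fit is -k

def half_life(k):
    """Half-life in hours for a first-order decay constant k."""
    return math.log(2) / k

# Synthetic DRB time course (0, 3, 6, 12 h) built with a true half-life of 4 h
times = [0, 3, 6, 12]
values = [100 * 2 ** (-t / 4) for t in times]
k = decay_rate(times, values, pseudo=0.0)
print(half_life(k))  # close to 4 for this noise-free series
```

With real data the pseudo-count slightly flattens the fitted curve for weakly expressed genes, which is one reason the text restricts the analysis to genes above 1 RPKM.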
For introns and exons quantification, reads were treated as above (see RNA-seq analysis). Analysis of ART-seq experiments were performed as previously described31. Differently from the other sequencing data, for ribosome profiling, only adaptor containing reads were used in order to avoid total RNA contamination. Reads were clipped from adapters and mapped on rRNAs and tRNAs. Only reads not mapping on rRNA/tRNA genes were used for downstream analysis. Quantification (RPKM) of the reads derived from different transcript parts or genomic features was performed as described above. Following mapping, reads with the same start mapping coordinates were collapsed using custom Perl scripts, and peak calling was performed using MACS version 1.4.1 (ref. 51). ChIP-seq signal log enrichment was calculated as previously described10, with some modifications. In brief, the mouse genome was partitioned into 500-bp bins. Bins overlapping with satellite repeats and with an insufficient coverage in WGBS (less than 50% of all CpGs covered at least 10×) were removed resulting in 2,708,724 bins. Signal enrichment was calculated as the log of ChIP-seq over input RPKM. These whole-genome log enrichment values were used for clustering, correlation, box plot and scatter plot analysis by using custom scripts. For genomic binning by H3K36me3, the above bins were divided in ten equal-size groups rank-ordered by their log enrichment for H3K36me3. Heat map representations of ChIP-seq peaks and plots were performed with respect to annotated RefSeq genes, sorted by their expression level, according to RNA-seq data. Plots of Dnmt3b and H3K36me3 distribution on genes clustered in quartiles of expression revealed an almost identical distribution for both features. 
For the analysis of Dnmt3b intragenic binding in Setd2 knockdown ES cells and Dnmt3b-re-expressing Dnmt3b−/− ES cells, a non-redundant gene annotation was built starting from the RefFlat annotation, by keeping only the longest isoform for each gene. After calling H3K36me3 peaks in wild-type ES cells using MACS 1.4.1 (parameters: -p 1e-8 --nolambda), the genes from the RefFlat annotation that overlap an H3K36me3 peak were marked as H3K36me3-positive, while genes lacking any overlap were marked as H3K36me3-negative. For each gene in the two datasets, the normalized Dnmt3b signal (RPKM) in control and treated ES cells was calculated as $\mathrm{RPKM} = \frac{n \times 10^{9}}{(\mathrm{TES} - \mathrm{TSS}) \times N}$, where n is the number of Dnmt3b reads overlapping a gene’s coordinates, TSS and TES are respectively the start and end coordinate of the gene annotation, and N is the total number of mapped reads in the ChIP-seq experiment. P values were calculated using a one-tailed paired Wilcoxon rank-sum test. Methylation calling was performed using the methratio.py script provided with the BSMAP tool and comparative analyses were performed by using only CpGs covered at least 5× in both wild-type and Dnmt3b−/− cells. Heat maps and comparative analysis were performed using custom Perl scripts. Datasets used for comparative analysis were obtained from Gene Expression Omnibus by downloading the following datasets: GSE12241, GSE11172, GSE31039, GSE44642, GSE44566, GSE55660, GSE57413. Antibodies were purchased from Abcam (anti-Dnmt3b; anti-H3K36me3; anti-single-strand DNA; anti-H3 pan; anti-Tbp; anti-TIIb), from Imgenex (anti-Dnmt3a; anti-Dnmt3b; anti-Dnmt1), from Diagenode (anti-5-methylcytidine), from Millipore (anti-H3K27me3; anti-m3G-cap, anti-m7G-cap; anti-Elk1), from Upstate (anti-H3K4me3; anti-H3ac), from Covance (anti-Pol II-phospho-Ser5), and from SantaCruz (anti-pan Pol II, anti-Sp1; anti-Elf1). Anti-Dnmt3l was provided by S. Yamanaka. 
The raw data that support the findings of this study have been deposited at Gene Expression Omnibus under the accession code GSE72856.


No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. Metaphase cells were obtained by treating cells with Karyomax (Gibco) at a final concentration of 0.01 μg ml−1 for 1–3 h. Cells were collected, washed in PBS, and resuspended in 0.075 M KCl for 15–30 min. Carnoy’s fixative (3:1 methanol:glacial acetic acid) was added dropwise to stop the reaction. Cells were washed an additional three times with Carnoy’s fixative, before being dropped onto humidified glass slides for metaphase cell preparations. For ECdetect analyses, DAPI was added to the slides. Images in the main figures were captured with an Olympus FV1000 confocal microscope. All other images were captured at a magnification of 1,000× with an Olympus BX43 microscope equipped with a QiClick cooled camera. FISH was performed by adding the appropriate DNA FISH probe onto the fixed metaphase spreads. A coverslip was added and sealed with rubber cement. DNA denaturation was carried out at 75 °C for 3–5 min and the slides were allowed to hybridize overnight at 37 °C in a humidified chamber. Slides were subsequently washed in 0.4× SSC at 50 °C for 2 min, followed by a final wash in 2× SSC containing 0.05% Tween-20. Metaphase cells and interphase nuclei were counterstained with DAPI, a coverslip was applied and images were captured. The NCI-60 cell line panel (gift from A. Shiau, obtained from NCI) was grown in RPMI-1640 with 10% FBS under standard culture conditions. Cell lines were not authenticated, as they were obtained from the NCI. The PDX cell lines were cultured in DMEM/F-12 medium supplemented with glutamax, B27, EGF, FGF and heparin. Lymphoblastoid cells (gift from B. Ren) were grown in RPMI-1640, supplemented with 2 mM glutamine and 15% FBS. IMR90 and ALS6-Kin4 (gift from J. Ravits and D. Cleveland) cells were grown in DMEM/F-12 supplemented with 20% FBS. 
Normal human astrocytes (NHA) and normal human dermal fibroblasts (NHDF) were obtained from Lonza and cultured according to Lonza-specific recommendations. Cell lines were not tested for mycoplasma contamination. Tissues were obtained from the Moores Cancer Center Biorepository Tissue Shared Resource with IRB approval (#090401). All samples were de-identified and patient consent was obtained. Additional tissue samples that were obtained were approved by the UCSD IRB (#120920). DNA was sonicated to produce 300–500 bp fragments. DNA end repair was performed using End-it (Epicentre), DNA library adapters (Illumina) were ligated and the DNA libraries were amplified. Paired-end next-generation sequencing was performed and samples were run on the Illumina Hi-Seq using 100 cycles. Cells were collected and washed with 1 × cold PBS. Cell pellets were resuspended in buffer 1 (50 mM Tris pH 7.5, 10 mM EDTA, 50 μg ml−1 RNase A), and incubated in buffer 2 (1.2% SDS) for 5 min on ice. DNA was acidified by the addition of buffer 3 (3 M CsCl, 1 M potassium acetate, 0.67 M acetic acid) and incubated for 15 min on ice. Samples were centrifuged at 14,000g for 15 min at 4 °C. The supernatant was added to a Qiagen column and briefly centrifuged. The column was washed (60% ethanol, 10 mM Tris pH 7.5, 50 μM EDTA, 80 mM potassium acetate) and eluted in water. Metaphase cells were dropped onto slides and visualized with DAPI. Coverslips were removed and slides washed in 2 × SSC, and subsequently treated with 2.5% trypsin, and incubated at 25 °C for 3 min. Slides were then washed in 2 × SSC, DNase solution (1 mg ml−1) was applied to the slide and cells were incubated at 37 °C for 3 h. Slides were washed in 2 × SSC and DAPI was again applied to the slide to visualize DNA. In Fig. 2a, b the violin plots represent the distribution of ecDNA counts in different sample types. 
In order to compare the ecDNA counts between the different samples, we use a one-sided Wilcoxon rank-sum test, where the null hypothesis assumes that the mean ecDNA-count ranks of the compared sample types are equal. There is a wide variation in the number of ecDNA across different samples and within metaphases of the same sample. We want to estimate and compare the frequency of samples containing ecDNA for each sample type. We label a sample as being ecDNA positive by using the pathology standard: a sample is deemed to be ecDNA positive if we observe ≥ 2 ecDNA in ≥ 2 out of 20 metaphase images. Therefore, we ensure that every sample contains at least 20 metaphases. We define the indicator variable $X_{ij} = 1$ if metaphase image j in sample i has ≥ 2 ecDNA elements, and $X_{ij} = 0$ otherwise. Let $n_i$ be the number of metaphase images acquired for sample i. We assume that $X_{ij}$ is the outcome of the jth Bernoulli trial, where the probability of success $P_i$ is drawn at random from a beta distribution with parameters determined by $\sum_j X_{ij}$. Formally, $P_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ with $\alpha_i = \sum_j X_{ij}$ and $\beta_i = n_i - \sum_j X_{ij}$. We model the likelihood of observing k successes in n = 20 trials using the binomial density function as $p(k \mid P_i) = \binom{n}{k} P_i^{k} (1 - P_i)^{n-k}$. Finally, the predictive distribution $p_i(k)$ is computed using the product of the binomial likelihood and beta prior, modelled as a ‘beta–binomial distribution’ (ref. 29): $p_i(k) = \binom{n}{k} \frac{B(k + \alpha_i,\, n - k + \beta_i)}{B(\alpha_i, \beta_i)}$, where B is the beta function. We model the probability of sample i being ecDNA positive with the random variable $Y_i$ so that $\Pr(Y_i = 1) = 1 - p_i(0) - p_i(1)$. The expected value of $Y_i$ is $E[Y_i] = 1 - p_i(0) - p_i(1)$. Let T be the set of samples belonging to a certain sample type t, for example, immortalized samples. We estimate the frequency of samples of type t containing ecDNA (bar heights in Fig. 2c, d) as $F_t = \frac{1}{|T|} \sum_{i \in T} E[Y_i]$, assuming independence among samples i ∈ T. For any $\alpha_i$ or $\beta_i$ equal to 0, we assign a sufficiently small ε. For more detail, please see Supplementary Information 1. We construct binary ecDNA-presence distributions, based on the ecDNA counts, such that an image with ≥ 2 ecDNA is represented as a 1, and 0 otherwise. 
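The beta-binomial estimate of ecDNA positivity described above can be sketched numerically. Several details are assumptions rather than the authors' implementation: the beta parameters are taken as the count of positive images and the count of negative images, zero parameters are replaced by a small ε of 1e-6, a sample counts as positive when at least 2 of 20 hypothetical future images are positive, and the sample-type frequency is the mean of the per-sample probabilities. All function names are hypothetical.

```python
import math

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """Probability of k successes in n trials under a beta-binomial model."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def prob_ecdna_positive(indicators, n_trials=20, min_hits=2, eps=1e-6):
    """Predictive probability that at least min_hits of n_trials images are
    positive, given 0/1 indicators for the images actually observed."""
    hits = sum(indicators)
    alpha = max(hits, eps)                    # assumed: number of positive images
    beta = max(len(indicators) - hits, eps)   # assumed: number of negative images
    below = sum(beta_binomial_pmf(k, n_trials, alpha, beta) for k in range(min_hits))
    return 1.0 - below

def type_frequency(samples):
    """Assumed estimator: mean positivity probability across samples of one type."""
    return sum(prob_ecdna_positive(s) for s in samples) / len(samples)

# Hypothetical samples, each with 20 metaphase images (1 = image has >= 2 ecDNA)
sample_a = [1] * 6 + [0] * 14
sample_b = [0] * 20
print(prob_ecdna_positive(sample_a), prob_ecdna_positive(sample_b))
print(type_frequency([sample_a, sample_b]))
```

Working in log space with `math.lgamma` keeps the beta-function ratios stable when ε is very small.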
In order to compare the ecDNA presence between the different samples, we use a one-sided Wilcoxon rank-sum test using the binary ecDNA-presence distributions, where the null hypothesis assumes the mean ranks of the compared sample types are equal. The software applies an initial coarse adaptive thresholding30, 31 on the DAPI images to detect the major components in the image with a window size of 150 × 150 pixels, and T = 10%. Components over 3,000 pixels and 80% of solidity are masked, and small components discarded. Weakly connected components of the remaining binary image are computed to find the separate chromosomal regions. Connected components over a cumulative pixel count of 5,000 are considered as candidate search regions, and their convex hull with a dilation of 100 pixels are added into the ecDNA search region. Following the manual masking and verification of the ecDNA search region, a second finer adaptive thresholding with a window size of 20 × 20 pixels and T = 7% is performed. Components that are greater than 75 pixels are designated as non-ecDNA structures and their 15-pixel neighbourhood is removed from the ecDNA search region. Any component detected with a size less than or equal to 75 pixels and greater than or equal to 3 pixels inside the search region is detected as ecDNA. For more detail, please see Supplementary Information 2. We sequenced 117 tumour samples including 63 cell lines, 19 neurospheres and 35 cancer tissues with coverage ranging from 0.6× to 3.89× and an additional 8 normal tissues as controls. See Extended Data Fig. 4 for the coverage distribution across samples. We mapped the sequencing reads from each sample to the hg19 (GRCh37) human reference genome32 from the UCSC genome browser33 using BWA software version 0.7.9a (ref. 34). We inferred an initial set of copy-number variants (CNVs) from these mapped sequence samples using the ReadDepth CNV software35 version 0.9.8.4 with parameters FDR = 0.05 and overDispersion = 1. 
We downloaded CNV calls for 11,079 paired tumour–normal samples covering 33 different tumour types from TCGA. We applied similar filtering criteria to ReadDepth output and TCGA calls to eliminate false copy number amplification calls from repetitive genomic regions and hotspots for mapping artefacts. We used the filtered set of CNV calls from ReadDepth as input probes for AmpliconArchitect, which revealed the final set of amplified intervals and the architectures of the amplicons. See Supplementary Information 3 for more details. We developed a novel tool, AmpliconArchitect, to automatically identify connected amplified genomic regions and reconstruct plausible amplicon architectures. For each sample, AmpliconArchitect takes as input an initial list of amplified intervals and whole-genome sequencing paired-end reads aligned to the human reference. It implements the following steps to reconstruct one or more architectures for each amplicon present in the sample: (1) use discordant read-pair alignments and coverage information to iteratively visit and extend connected genomic regions with high copy numbers; (2) for each set of connected amplified regions, segment the regions based on depth of coverage using a mean-shift segmentation to detect copy-number changes and discordant read-pair clusters to identify genomic breaks; (3) construct a breakpoint graph connecting segments using discordant read-pair clusters; (4) compute a maximum-likelihood network to estimate copy counts of genomic segments; and (5) report paths and cycles in the graph that identify the dominant linear and circular structures of the amplicon (see also Supplementary Information 3). We compared our sample set against TCGA samples to test the assumption that the genomic intervals amplified in our sample set are broadly representative of a pan-cancer dataset. 
Here, we deal with an abstract notation to represent different datasets and describe a generic procedure to compare amplified regions. Consider a set of K samples. For any k ∈ [1,..., K], let S denote the set of amplified intervals in sample k. Let c be the cancer subtype for sample k. We compare S against TCGA samples with subtype c. Let T denote the set of all genomic regions which are amplified in at least 1% of TCGA samples of subtype c. For each interval t ∈ T, let f denote its frequency in TCGA samples of subtype c. We define a match score The cumulative match score for all samples is defined as: To compute the significance of statistic D, we do a permutation test. We generate N random permutations of the TCGA intervals for subtype c and estimate the distribution of match scores of our sample set against the random permutations. We choose a random assignment of locations of all intervals in T, while retaining their frequencies. For the jth permuted set T , we computed the cumulative match score D relative to our sample set. Thus the significance of overlap between amplified intervals in our sample set and the TCGA set is estimated by the fraction of random permutations with D  > D. Computing 1 million random permutations generated exactly one permutation breaching the TCGA score D, implying a P ≤ 10−6. We compared the rank correlation of the most frequent oncogenes in our sample set with the top oncogenes as reported by TCGA pan-cancer analysis in ref. 20. We identified 14 oncogenes occurring in 2 or more samples of our sample set and compared these to the top 10 oncogenes from the TCGA pan-cancer analysis. We found that 7 out of the top 10 oncogenes were represented in our list of 14 oncogenes. 
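The oncogene overlap reported here (7 of the TCGA top 10 among our 14 recurrent oncogenes) is the kind of result a hypergeometric tail probability quantifies. The sketch below assumes a universe of 490 COSMIC oncogenes, the figure used in the significance calculation that follows; the function name and argument names are hypothetical, and the calculation is a generic hypergeometric tail rather than the authors' code.

```python
from math import comb

def hypergeom_tail(population, marked, draws, min_overlap):
    """P(overlap >= min_overlap) when `draws` items are sampled without
    replacement from `population` items, of which `marked` are of interest."""
    upper = min(marked, draws)
    favourable = sum(
        comb(marked, j) * comb(population - marked, draws - j)
        for j in range(min_overlap, upper + 1)
    )
    return favourable / comb(population, draws)

# 490 oncogenes, 10 of them in the TCGA top list, 14 drawn, at least 7 shared
p_value = hypergeom_tail(population=490, marked=10, draws=14, min_overlap=7)
print(f"{p_value:.3g}")
```

Exact integer binomial coefficients from `math.comb` avoid the rounding problems that factorial-based floating-point formulas run into at this scale.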
Considering 490 oncogenes in the COSMIC database, the significance of observing 7 or more oncogenes in common in the two datasets is given by the hypergeometric probability $P = \sum_{j=7}^{10} \binom{10}{j} \binom{480}{14-j} \big/ \binom{490}{14}$. We found high similarity between amplicon structures of biological replicates (for example, Extended Data Fig. 8). We estimate the probability of common origin between two samples by measuring the pairwise similarity between amplicon structures. In reconstructing the structures (Supplementary Information 3), we identify a set of locations representing change in copy number and we use the locations of change in copy number to estimate the similarity in amplicon structures. Let L be the total length of amplified intervals. These intervals are binned into windows of size r, resulting in $L/r$ bins. We use a segmentation algorithm that determines if there is a change in copy number in any bin, within a resolution of r = 10,000 bp (see meanshift in coverage: Supplementary Information 3.2). Note that this is an overestimate, because with split-reads and high-density sequencing data, we can often get the resolution down to a few base pairs. Let $S_1$ and $S_2$ represent the set of bins with copy-number changes in the two samples, respectively. $S_1$ and $S_2$ are selected from a candidate set of locations N. Under the null hypothesis that $S_1$ is random with respect to $S_2$, we expect $I = S_1 \cap S_2$ to be small. Let m = min{|S1|, |S2|}, and M = max{|S1|, |S2|}. A P value is computed as follows: When looking at GBM39 replicates (Extended Data Fig. 8), we find that all replicates displaying EGFR ecDNA are similar to each other. Comparing replicates in row 1 and row 2 among |N| = 129 bins (1.29 Mb), |S1| = 5 corresponding to row 1 (ecDNA sample), |S2| = 6 corresponding to row 2 (ecDNA sample) and intersection set size |I| = 5, we compute that the P value for observing such structural similarity by random chance is 2.18 × 10^−8, which is the highest P value among all ecDNA replicate pairs. 
In addition, we compare the replicates containing EGFR in ecDNA with the culture containing EGFR in HSR. Among |N | = 129 bins, |S1| = 6 corresponding to row 2 (ecDNA), |S2| = 4 corresponding to row 4 (HSR), the intersection set has size |I| = 4 intervals giving a P value of 1.98 × 10−5, which gives the highest P value among the 3 ecDNA replicates compared to the HSR culture, suggesting a common origin. Consider an initial population of N cells, of which N cells contain a single extra copy of an oncogene. We model the population using a discrete generation Galton–Watson branching process23. In this simplified model, each cell in the current generation containing k amplicons (amplifying an oncogene) either replicates with probability b to create the next generation, or dies with probability 1 − b to create the next generation. We set the selective advantage In other words, cells with k copies of the amplicon stop dividing after reaching a limit of M amplicons. Otherwise, they have a selective advantage for 0 < k ≤ M , where the strength of selection is described by f (k), as follows: Here, s denotes the selection coefficient, and parameters m and α are the ‘mid-point’, and ‘steepness’ parameters of the logistic function, respectively. Initially, f (k) grows linearly, reaching a peak value of f (k) = 1 for k = M . As the viability of cells with large number of amplicons is limited by available nutrition36, f (k) decreases logistically in value for k > M reaching f (k) → 0 for k ≥ M . We model the decrease by a sigmoid function with a single mid-point parameter m so that f (m) = 0.5. The ‘steepness’ parameter α is automatically adjusted to ensure that max{1 – f (M ), f (M )} → 0. The copy-number change is affected by different mechanisms for extrachromosomal (ecDNA) and intrachromosomal (HSR) models. In the ecDNA model, the available k amplicons are on ecDNA elements which replicate and segregate independently. 
We assume complete replication of ecDNA elements so that there are 2k copies, which are partitioned into the two daughter cells via independent segregation. Formally, the daughter cells end up with k1 and k2 amplicons respectively, where k1 ~ Binomial(2k, 1/2) and k2 = 2k − k1. By contrast, in the intrachromosomal model, the change in copy number happens via mitotic recombination, and the daughter cell of a cell with k amplicons will acquire either k + 1 amplicons or k − 1 amplicons, each with probability P. With probability 1 − 2P, the daughter cell retains k amplicons. See Supplementary Information 4 for more details.

AmpliconArchitect is available for use online at: https://github.com/virajbdeshpande/AmpliconArchitect. ECdetect will be available upon request. Whole-genome sequencing data are deposited in the NCBI Sequence Read Archive (SRA) under Bioproject (accession number: PRJNA338012). DAPI and FISH metaphase images are available for download on figshare at https://figshare.com/s/ab6a214738aa43833391.
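The two copy-number update rules of the model described above (independent segregation for ecDNA, single-step gain or loss for HSR) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names are hypothetical, independent segregation is assumed to mean that each of the 2k copies goes to either daughter with probability 1/2, and the clamp at zero copies is an added assumption.

```python
import random


def ecdna_daughters(k: int, rng: random.Random) -> tuple[int, int]:
    """ecDNA model: k copies replicate to 2k, and each copy goes to
    daughter 1 with probability 1/2 (assumed binomial segregation)."""
    k1 = sum(1 for _ in range(2 * k) if rng.random() < 0.5)
    return k1, 2 * k - k1


def hsr_daughter(k: int, p: float, rng: random.Random) -> int:
    """Intrachromosomal model: gain or lose one copy with probability p
    each, otherwise keep k copies."""
    u = rng.random()
    if u < p:
        return k + 1
    if u < 2 * p:
        return max(k - 1, 0)  # assumption: copy number cannot go below zero
    return k


rng = random.Random(1)
print(ecdna_daughters(8, rng))    # two counts that always sum to 16
print(hsr_daughter(8, 0.1, rng))  # one of 7, 8 or 9
```

The contrast between the two functions is the point of the model: a single ecDNA division can change the copy number by many units, whereas the intrachromosomal rule moves by at most one.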


News Article | September 28, 2016
Site: www.nature.com

No statistical methods were used to predetermine sample size. These experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. Between 10 October 2009 and 12 December 2011, 90 samples were collected at 45 locations throughout the world’s oceans (Supplementary Table 1) through the Tara Oceans expedition32. These included samples from the following range of depths: surface, deep chlorophyll maximum, bottom of mixed layer when no deep chlorophyll maximum was observed (stations 123, 124, and 125), and mesopelagic samples. The sampling stations were located in 7 oceans and seas, 4 different biomes and 14 Longhurst oceanographic provinces (Supplementary Table 1). For Tara station 100, two different peaks of chlorophyll were observed, so two samples were taken at the shallow (100_DCM) and deep (100_dDCM) chlorophyll maximum. For each sample, 20 l of seawater was 0.22 μm-filtered and viruses were concentrated from the filtrate using iron chloride flocculation33 followed by storage at 4 °C. After resuspension in ascorbic-EDTA buffer (0.1 M EDTA, 0.2 M Mg, 0.2 M ascorbic acid, pH 6.0), viral particles were concentrated using Amicon Ultra 100 kDa centrifugal devices (Millipore), treated with DNase I (100 U ml−1) followed by the addition of 0.1 M EDTA and 0.1 M EGTA to halt enzyme activity and extracted as previously described34. In brief, viral particle suspensions were treated with Wizard PCR Preps DNA Purification Resin (Promega) at a ratio of 0.5 ml sample to 1 ml resin, and eluted with Tris-EDTA buffer (10 mM Tris, pH 7.5, 1 mM EDTA) using Wizard Minicolumns. Extracted DNA was Covaris-sheared and size-selected to 160–180 bp sequence lengths, followed by amplification and ligation according to standard Illumina protocol. Sequencing was performed with a HiSeq 2000 system (101 bp, paired end reads). 
Temperature, salinity, and oxygen data were collected from each station using an SBE 911plus CTD with Searam recorder and an SBE 43 dissolved oxygen sensor (Sea-Bird Electronics). Nutrient concentrations were determined using segmented flow analysis35 and included nitrite, phosphate, nitrite-plus-nitrate, and silica. Nutrient concentrations below the detection limit (0.02 μmol kg⁻¹) are reported as 0.02 μmol kg⁻¹. All data from the Tara Oceans expedition are available from the European Nucleotide Archive (ENA) (for nucleotide data) and from PANGAEA (for environmental, biogeochemical, taxonomic and morphological data)36, 37, 38.

Thirteen bathypelagic samples and one mesopelagic sample were collected between 19 April 2011 and 11 July 2011 during the Malaspina 2010 global circumnavigation expedition covering the Pacific and the North Atlantic Oceans. All samples were taken at 4,000 m depth with the exception of two samples from stations 81 and 82, which were collected at 3,500 m and 2,150 m, respectively (Supplementary Table 1). Additionally, station M114 was sampled in the oxygen minimum zone (OMZ) at 294 m depth. For each sample, 80 l of seawater was 0.22 μm-filtered and viruses were concentrated from the filtrate using iron chloride flocculation33, followed by storage at 4 °C. More details about sampling and additional variables used in the Malaspina expedition can be found in ref. 39. Further processing was performed as for the Tara Oceans samples other than Illumina sequencing (151 bp, paired end reads).

An overview of the contig-generation process is provided in Supplementary Fig. 8. The first step involved the generation of a set of contigs using as many reads as possible from the 104 oceanic viromes. These viromes included 74 epipelagic and 16 mesopelagic samples from the Tara Oceans expedition5 and 1 mesopelagic and 13 bathypelagic samples from the Malaspina expedition6.
This set of contigs was generated through an iterative cross-assembly12, using MOCAT40 and Idba_ud41, (Supplementary Fig. 8) as follows: (i) high-quality reads were first assembled sample-by-sample with the MOCAT pipeline as described previously18; (ii) all reads not mapping (Bowtie 2 (ref. 42), options: -sensitive, -X 2000, -non-deterministic, other parameters at default) to a MOCAT contig (by which we denote ‘scaftigs’, that is, contigs that were extended and linked using the paired-end information of sequencing read42) were assembled sample-by-sample with Idba_ud (iterative k-mer assembly, with k-mer length increasing from 20 to 100 bp in steps of 20); (iii) all reads that remained unmapped to any contig were then pooled by Longhurst province (that is, unmapped reads from samples corresponding to the same Longhurst province were gathered) and assembled with Idba_ud (with the same parameters as above); and (iv) all remaining reads unmapped from every sample were gathered for a final cross-assembly (using Idba_ud). This resulted in 10,845,515 contigs (Supplementary Fig. 8b). As the contigs assembled from the marine viral metagenomes could still contain redundant sequences derived from the same (or closely related) populations, we set out to merge contigs derived from the same population into clusters representing population genomes. To this end, contig sequences were first clustered at 95% global average nucleotide identity (ANI) with cd-hit-est43 (options: -c 0.95 -G 1 -n 10 -mask NX) (Supplementary Fig. 8b), resulting in 10,578,271 non-redundant genome fragments. Next, we used co-abundance (that is, the correlation between abundance profiles estimated by reads mapping) and nucleotide-usage profiles of the non-redundant contigs to further identify contigs derived from the same populations using Metabat44. 
In brief, Metabat uses Pearson correlation between coverage profiles (determined from the mapping of high-quality reads of each sample to the contigs with Bowtie 2 (ref. 42), options: -sensitive, -X 2000, -non-deterministic, other parameters at default) and tetranucleotide frequencies to identify contigs originating from the same genome (Metabat parameters: 98% minimum correlation, mode ‘sensitive’; see Supplementary Text for more detail about the selection of these parameters). The 8,744 bins generated, including 3,376,683 contigs, were further analysed, alongside 623,665 contigs that were not included in any genome bin but were ≥1.5 kb. In an attempt to better assemble these genome bins, two additional sets of contigs were generated for each genome bin (beyond the set of initial contigs binned by Metabat44). These were based on the de novo assembly of: (i) all reads mapping to the contigs in the genome bin, and (ii) only reads from the sample displaying the highest coverage for the genome bin (both assemblies with Idba_ud41; Supplementary Fig. 8c). The latter assembly might be expected to lead to the ‘cleanest’ genome assembly because it includes the minimum between-sample sequence variation, lowering the probability of generating a chimaeric contig45. The former assembly may be necessary if the virus is locally rare, so that sequences from multiple metagenomes are needed to achieve complete genome coverage. Thus, if the assembly from the single ‘highest-coverage’ sample was improved or equivalent to the initial assembly (that is, the longest contig in the new assembly representing ≥95% of the longest contig in the initial assembly), this set of contigs was selected as the sequence for this bin (n = 6,423). This optimal single-sample assembly was thus privileged compared to a cross-assembly (either based on the initial contigs or on the re-assembly of all sequences aligned to that bin). 
Otherwise, the ‘all samples’ bin re-assembly was selected if it was equivalent to or better than the initial assembly (longest contig representing ≥95% of the longest initial contig, n = 999). The assumption that cross-assembly would be needed for locally rare viruses without a high-coverage sample was confirmed by the comparison between the highest coverage of these two types of bins. On average, bins for which the ‘optimal’ assembly was selected displayed a maximum coverage of 5.47× per Gb of metagenome, while the bins for which the ‘cross-assembly’ was selected displayed a maximum coverage of 1.37× per Gb of metagenome (Supplementary Table 2). Finally, if both re-assemblies yielded a longest contig smaller (<95%) than the one in the initial assembly, the bin was considered to be a false-positive (that is, binning of contigs from multiple genomes, n = 1,356), and contigs from the initial assembly were considered as ‘unbinned’ (263,006 contigs, added to the 623,665 contigs ≥1.5 kb initially retained as ‘unbinned’). Despite efforts to remove cellular DNA completely during sample preparation, the resulting viral metagenomic datasets can only ever be enriched for viruses46. Thus, assembled sequences in the GOV dataset were in silico filtered a posteriori to identify and remove any clearly non-viral signal. In this way, our purification methods should have greatly enriched for viruses, but the in silico decontamination step served as a back-up for problematic samples. Together these two filters mean that virtually no known cellular signal should have been considered in our analyses. For the in silico cleaning step, VirSorter47 was used to identify and remove microbial contigs using the ‘virome decontamination’ mode, with every contig ≥10 kb that was not identified as viral considered to be a microbial contig. 
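The three-way choice between bin assemblies described earlier in this section (single-sample re-assembly, all-samples re-assembly, or rejection of the bin as a false positive) can be sketched as a small decision function. The function name, argument names and return labels are hypothetical; only the 95% longest-contig criterion and the order of preference come from the text.

```python
def select_assembly(initial_longest: int,
                    single_sample_longest: int,
                    all_samples_longest: int,
                    min_ratio: float = 0.95) -> str:
    """Pick the assembly kept for a genome bin from longest-contig lengths (bp)."""
    threshold = min_ratio * initial_longest
    # The single highest-coverage sample is preferred when it is good enough.
    if single_sample_longest >= threshold:
        return "single_sample"
    # Otherwise fall back to the re-assembly of reads from all samples.
    if all_samples_longest >= threshold:
        return "all_samples"
    # Neither re-assembly recovers the initial contig: treat the bin as spurious.
    return "false_positive"


print(select_assembly(40_000, 39_500, 41_000))  # single_sample
print(select_assembly(40_000, 20_000, 39_000))  # all_samples
print(select_assembly(40_000, 20_000, 25_000))  # false_positive
```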
Sequences predicted to be from prophages were manually curated to distinguish actual prophages (that is, viral regions within a microbial contig) from contigs that belonged to a viral genome and were wrongly predicted as a prophage. Contigs originating from a eukaryotic virus were identified based on best BLAST hit affiliation of the contig-predicted genes against NCBI RefseqVirus (see Supplementary Text). The genome bins were affiliated as microbial (if 1 or more contigs were identified as microbial, n = 1,763), eukaryotic virus (if contigs affiliated as eukaryotic virus comprised more than 10 kb or more than 25% of the genome bin total length, n = 962) or viral (that is, archaeal and bacterial viruses, n = 4,341), with the 356 remaining bins that lacked a contig long enough for an accurate affiliation considered as ‘unknown’ (see Supplementary Text).

Viral bins were then refined to evaluate whether they corresponded to a single viral population or to a mix. To that end, the Pearson correlation and Euclidean distance between abundance profiles (that is, the profile of the average coverage depth of a contig across the 104 samples) of bin members and the bin seed (that is, the largest contig) were computed, and a single-copy viral marker gene (terL) was identified in binned contigs (Supplementary Fig. 8e). Thresholds were chosen to maximize the number of bins with exactly one terL gene and minimize the number of bins with multiple terL genes (Supplementary Fig. 8g). For each bin, contigs with a Pearson correlation coefficient to the bin seed of <0.96 or a Euclidean distance to the seed of >1.05 were removed from the bin, and added to the pool of unbinned contigs. Finally, every bin still displaying multiple terL genes after this refinement step was split and all corresponding contigs were added to the pool of ‘unbinned’ contigs (Supplementary Fig. 8e).
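The coverage-based refinement step can be sketched as below. This is an illustrative reading of the rule, not the pipeline's code: the function names and the input layout are hypothetical, and the sketch compares raw coverage profiles, whereas the scaling applied to profiles before computing the Euclidean distance is not stated in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def refine_bin(contigs, min_r=0.96, max_dist=1.05):
    """contigs maps contig name -> (length, coverage profile across samples).

    Returns (kept, removed) contig names; the seed is the longest contig.
    """
    seed = max(contigs, key=lambda name: contigs[name][0])
    seed_profile = contigs[seed][1]
    kept, removed = [], []
    for name, (_, profile) in contigs.items():
        ok = name == seed or (
            pearson(profile, seed_profile) >= min_r
            and euclidean(profile, seed_profile) <= max_dist
        )
        (kept if ok else removed).append(name)
    return kept, removed
```

A contig whose profile tracks the seed closely stays in the bin, while one with an opposite or distant profile is returned in `removed` and would join the unbinned pool.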
The final set of contigs was formed by compiling: (i) all contigs belonging to a viral bin, (ii) ‘unbinned’ viral contigs (that is, contigs affiliated to archaeal and bacterial virus and not part of any genome bin), and (iii) viral contigs identified in microbial or eukaryote virus bins (considered as ‘unbinned’ contigs, Supplementary Fig. 8f). Within this set of contigs, all viral bins were considered as viral populations, as well as every unbinned viral contig of ≥10 kb, leading to a total of 15,222 epipelagic and mesopelagic populations, and 58 bathypelagic populations (Supplementary Fig. 1, Supplementary Table 2 and Supplementary Information). In this study, we focus only on the 15,222 epipelagic and mesopelagic populations, totaling 24,353 contigs. For the detection of AMGs, we added to these populations all short epipelagic and mesopelagic unbinned viral contigs (<10 kb), totalling 298,383 contigs. Genomes of viruses associated with a bacterial or archaeal host were downloaded from NCBI RefSeq (1,680 sequences, v70, 05-26-2015; http://www.ncbi.nlm.nih.gov/refseq/). To complete this dataset of reference genomes, viral genomes and genome fragments available in GenBank (http://www.ncbi.nlm.nih.gov/genbank/) but not in RefSeq were downloaded (July 2015) and manually curated to select only bacterial and archaeal viruses (1,017 sequences). These included viral genomes not yet added to RefSeq, as well as genome fragments from fosmid libraries generated from seawater samples9, 10. Mycophage sequences (available at http://phagesdb.org48) were downloaded in July 2015 and included as well if not already in RefSeq (734 sequences). Finally, 12,498 viral genome fragments from the VirSorter Curated Dataset, identified in publicly available microbial genome sequencing projects, were added to the database8. 
Proteins predicted from 14,650 large GOV contigs (≥10 kb and ≥10 genes) were added to all proteins from the publicly available viral genomes and genome fragments gathered, and compared through all-vs-all blastp, with a threshold of 10⁻⁵ for E-value and 50 for bit score. Protein clusters were then defined using MCL (Markov Cluster Algorithm, using default parameters for clustering of proteins, similarity scores as log-transformed E-value, and 2 for MCL inflation49). We then used vContact (https://bitbucket.org/MAVERICLab/vcontact) to first calculate a similarity score between every pair of genomes and/or contigs based on the number of protein clusters shared between the two sequences (as in refs 7, 8), and then compute an MCL clustering of the genomes/contigs based on these similarity scores (thresholds of 1 for similarity score, MCL inflation of 2). The resulting viral clusters (clusters including ≥2 contigs and/or genomes), consistent with a clustering based on whole-genome BLAST comparison, corresponded approximately to genus-level taxonomy, with rare cases closer to subfamily-level taxonomy (Extended Data Fig. 2 and Supplementary Information). A total of 1,259 viral clusters were obtained, with 867 including at least one GOV sequence. Notably, however, automatically defined viral clusters serve only as a starting point for assigning viral taxonomy. Current ICTV convention for formal taxonomic consideration of these viral clusters would require the manual comparison of genomes and genome fragments to identify signature genes, compare phylogenetic signals and, ideally, observe morphological features of corresponding viruses, although this process is currently being reviewed as advanced computational analytics and genome datasets, such as those presented here, are being developed.

A functional annotation of all GOV-predicted proteins was based on a comparison to the PFAM domain database v.27 (ref. 50) with HmmSearch51 (threshold of 30 for bit score and 10⁻³ for E-value). Additional putative structural proteins were identified through a BLAST comparison to the protein clusters detected in the viral metaproteomics dataset52. This metaproteomics dataset led to the annotation of 13,547 hypothetical proteins lacking a PFAM annotation. A taxonomic annotation of the predicted proteins was performed based on a blastp against proteins from archaeal and bacterial viruses from NCBI RefSeq and GenBank (threshold of 50 for bit score and an E-value of 10⁻³).

Viral clusters were affiliated based on isolate genome members, where available. When multiple isolates were included in the viral cluster, the viral cluster was affiliated to the corresponding subfamily or genus of these isolates (excluding all ‘unclassified’ cases). This was the case for VC_2 (T4 superfamily14, 15), and VC_9 (T7 virus16). When only one, or a handful of, affiliated isolate genomes were included in the viral cluster and lacked genus-level classification, a candidate name was derived from the isolate (if there were several isolates it was derived from the first one isolated). This was the case for VC_5 (Cbaphi381virus; ref. 53), VC_12 (P12024virus; ref. 54), VC_14 (MED4-117virus), VC_19 (HMO-2011virus; ref. 55), VC_31 (RM378virus; ref. 56), VC_36 (GBK2virus; ref. 57), VC_47 (Cbaphi142virus; ref. 53) and VC_277 (vB_RglS_P106Bvirus; ref. 58). Otherwise, viral clusters were considered as ‘new viral clusters’.

All publicly available complete genomes (see above), all complete (circular) and near-complete (extrachromosomal genome fragment >50 kb with a terminase) sequences from the VirSorter Curated Dataset and all complete and near-complete GOV contigs were compared to generate a phage proteomic tree, as previously described9, 59.
In brief, a proteomic similarity score was calculated for each pair of genomes based on an all-versus-all tblastx similarity as the sum of bit scores of significant hits between two genomes (E ≤ 0.001, bit score ≥ 30, identity percentage ≥ 30). To normalize for different genome sizes, each genome was also compared to itself to generate a self-score, and the distance between two different genomes was calculated as a Dice coefficient as previously described9. That is, for two genomes A and B with a proteomic similarity score of AB, the corresponding distance d would be: d = 1 − (2 × AB)/(AA + BB); with AA and BB being the self-score of genomes A and B respectively. For clarity, the tree displayed in Extended Data Fig. 2 includes only non-GOV sequences found in a viral cluster with GOV sequence(s) or within a distance d < 0.5 to a GOV sequence, totalling 1,522 reference sequences. iTOL60, 61 was used to visualize and display the tree.

Detection and estimation of abundance for viral contigs and populations

The presence and relative abundance of a viral contig in a sample was determined based on the mapping of high-quality reads to the contig sequences, computed with Bowtie 2 (options: -sensitive, -X 2000, -non-deterministic, default parameters otherwise62), as previously described4. A contig was considered to be detected in a metagenome if more than 75% of its length was covered by aligned reads derived from the corresponding sample. A normalized coverage for the contig was then computed as the average contig coverage (that is, the number of nucleotides mapped to the contig divided by the contig length) normalized by the total number of base pairs sequenced in this sample. The detection and relative abundance of a viral population was based on the coverage of its contigs; that is, a population was considered as detected in a sample if more than 75% of its cumulated length was covered, and its normalized coverage was computed as the average normalized coverage of its contigs.
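The detection and normalization rules for a viral population can be sketched as follows. The function names and the tuple layout are hypothetical, and the sketch normalizes by the raw number of sequenced base pairs; any further rescaling (for example per Gb of metagenome) would be a separate choice that the text does not spell out here.

```python
def normalized_coverage(mapped_bases: int, contig_length: int,
                        sample_total_bp: int) -> float:
    """Average coverage of a contig divided by the sample's sequenced base pairs."""
    return (mapped_bases / contig_length) / sample_total_bp


def summarize_population(contigs, sample_total_bp, min_fraction=0.75):
    """contigs: list of (length, covered_positions, mapped_bases) tuples.

    Returns (detected, mean normalized coverage) for one sample.
    """
    total_length = sum(length for length, _, _ in contigs)
    covered = sum(cov for _, cov, _ in contigs)
    # Detected only if more than 75% of the cumulated length is covered.
    detected = covered / total_length > min_fraction
    mean_cov = sum(
        normalized_coverage(mapped, length, sample_total_bp)
        for length, _, mapped in contigs
    ) / len(contigs)
    return detected, mean_cov


population = [(10_000, 9_000, 50_000), (5_000, 3_000, 10_000)]
print(summarize_population(population, 1_000_000_000))
```

In this example 12,000 of 15,000 positions are covered (80%), so the population counts as detected, and its normalized coverage is the mean of the two per-contig values.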
The relative abundance of viral clusters was calculated based on the coverage of its members within the 15,222 viral populations identified. If a population included contigs that were all linked to the same viral cluster, or that were linked to a single viral cluster (except for unclustered contigs owing to short length), this population coverage was added to the total of the corresponding viral cluster. In the rare cases where the link between population and viral cluster was ambiguous because different contigs within a population pointed towards different viral clusters (n = 475, that is, 3.1% of the populations), the population coverage was equally split between these viral clusters. Finally, if no contig in the population belonged to any viral cluster (n = 2,605, 17% of the populations), the population coverage was added to the ‘unclustered’ category. Eventually, for each sample, the cumulative coverage of a viral cluster was normalized by the total coverage of all populations to calculate a relative abundance of the viral cluster among viral populations. The selection of abundant viral clusters within a sample was based on the contribution of the viral cluster to the sample diversity as measured by the Simpson index. For each sample, the overall Simpson index was first calculated with all viral clusters. Following this, viral clusters were sorted by decreasing relative abundance and progressively added to a new calculation of the Simpson index. Viral clusters considered as abundant were the ones which, once cumulated, represented 80% of the sample diversity (that is, a Simpson index ≥80% of the sample total Simpson index; Extended Data Fig. 1c). The 38 viral clusters that were identified as abundant in at least 2 different stations were selected as ‘recurrently abundant viral clusters in the GOV dataset’ (Fig. 2 and Extended Data Fig. 3). 
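The selection of abundant viral clusters can be sketched as below. The text does not state which form of the Simpson index was used, so this sketch assumes the Gini–Simpson form (1 − Σp²) with abundances renormalized among the clusters included so far; both choices, and the function names, are assumptions made for illustration.

```python
def gini_simpson(abundances):
    """Gini-Simpson diversity, 1 - sum(p_i^2), over positive abundances."""
    values = [a for a in abundances if a > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return 1.0 - sum((a / total) ** 2 for a in values)


def abundant_clusters(abundance_by_cluster, fraction=0.8):
    """Add clusters by decreasing abundance until the running index reaches
    `fraction` of the index computed on all clusters in the sample."""
    target = fraction * gini_simpson(abundance_by_cluster.values())
    ranked = sorted(abundance_by_cluster.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], []
    for name, abundance in ranked:
        selected.append(name)
        running.append(abundance)
        if gini_simpson(running) >= target:
            break
    return selected


sample = {"VC_A": 50.0, "VC_B": 30.0, "VC_C": 15.0, "VC_D": 5.0}
print(abundant_clusters(sample))  # ['VC_A', 'VC_B', 'VC_C']
```

Applying this per sample and then keeping clusters selected in at least two stations would mirror the final filtering step described above.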
Three different approaches were used to link viral contigs and putative host genomes: blastn similarity, CRISPR spacer similarity and tetranucleotide frequency similarities. An overview of the contig-generation process is provided in Supplementary Fig. 8, and an extended discussion about the efficiency and raw results of these host prediction methods is provided in Supplementary Information, Supplementary Table 4, and ref. 63. A list of all host predictions by viral sequence is available in Supplementary Table 5. A genome database of putative hosts for the epipelagic and mesopelagic GOV viruses was generated, including all archaeal and bacterial genomes annotated as ‘marine’ from NCBI RefSeq and WGS (both times only sequences ≥5 kb, 184,663 sequences from 4,452 genomes, downloaded in August 2015), and all contigs ≥5 kb from the 139 Tara Oceans microbial metagenomes corresponding to the bacterial and archaeal size fraction (791,373 sequences)18. For these microbial metagenomic contigs, a first blastn alignment was computed to compare with all GOV contigs, and exclude from the putative host dataset all metagenomic contigs with a significant similarity to a viral GOV sequence (thresholds of 50 for bit score, 0.001 for E-value, and 70% for identity percentage) on ≥90% of their length, as these are likely to be sequences of viral origin sequenced in the bacteria and archaea size fraction (these represented 2.2% of the contigs in the assembled microbial metagenomes). The taxonomic affiliation of NCBI genomes was taken from the NCBI taxonomy. For Tara Oceans contigs, a last common ancestor (LCA) affiliation was generated for each contig based on genes affiliation18, if three or more genes on the contig were affiliated. 
All GOV viral contigs were compared to all archaeal and bacterial genomes and genome fragments with a blastn (threshold of 50 for bit score and 0.001 for E-value), to identify regions of similarity between a viral contig and a microbial genome, indicative of a prophage integration or horizontal gene transfer63. A host prediction was made when: (i) an NCBI genome displayed a region similar to a GOV viral contig ≥5 kb at ≥70% identity, or (ii) a Tara Oceans microbial metagenomic contig (≥5 kb) displayed a region similar to a GOV viral contig ≥2.5 kb at ≥70% identity.

CRISPR arrays were predicted for all putative host genomes and genome fragments (NCBI microbial genomes and Tara Oceans microbial metagenomic contigs) with MetaCRT64, 65. CRISPR spacers were extracted, and all spacers with ambiguous bases or low complexity (that is, consisting of 4–6 bp repeat motifs) were removed. All remaining spacers were matched to viral contigs with fuzznuc66, with no mismatches allowed, which, although rarely observed, yields highly accurate host predictions63 (Supplementary Table 4).

Bacterial and archaeal viruses tend to have a genome composition close to the genome composition of their host, a signal that can be used to predict viral–host pairs8, 63, 67. Here, canonical tetranucleotide frequencies were observed for all viral and host sequences using Jellyfish68 and the mean absolute error (that is, the average of absolute differences) between tetranucleotide-frequency vectors was computed with in-house Perl and Python scripts for each pair of viral and host sequence as previously reported8.
A GOV viral contig was then assigned to the closest sequence (that is, lowest distance ‘d’) from the pool of NCBI genomes if d < 0.001 (because both the tetranucleotide-frequency signal and the taxonomic affiliation of these complete genomes are more robust than for metagenomic contigs), and otherwise assigned to the closest (that is, lowest distance) Tara Oceans microbial contig if d < 0.001. Overall, 3,675 GOV contigs could be linked to a putative host group among the 24,353 GOV contigs associated with an epipelagic or mesopelagic viral population. To summarize these affiliations at the viral cluster level, a Poisson distribution was used to estimate the number of expected false-positive associations for each viral cluster–host group combination based on: (i) the global probability of obtaining a host prediction across all pairs of viral and host sequences tested and for all methods (5.8 × 10−8), (ii) the number of potential predictions generated for the viral cluster, corresponding to 3 times the number of sequences in the viral cluster (to take into account the three methods) and (iii) the number of sequences from the host group in the database (Supplementary Fig. 2). By comparing the number of links observed between a viral cluster and a host group to this expected value, which takes into account the bias in database (that is, some host groups will be over- or under-represented in our set of archaeal and bacterial genomes and genome fragments) and the bias linked to the variable number of sequences in viral clusters, we can determine if the number of associations observed for any combination of viral cluster and host group is likely to be due to chance alone (and calculate the associated P value). Diversity and richness indices for putative host populations were based on the OTU abundance matrix generated from the analysis of TAGs in Tara Oceans microbial metagenomes18. 
These indexes were computed for each host group at the same taxonomic level as the host prediction (that is, the phylum level, except for Proteobacteria where the class level is used). The R package vegan69 was used to estimate for each group: (i) a global Chao index (that is, including all OTUs from all samples) through the function estaccumR, (ii) a sample-by-sample Chao index with the function estimateR, and (iii) Sorensen indexes between all pairs of samples with the function betadiver. Diversity indices presented in Extended Data Fig. 4 are based solely on epipelagic samples as the 38 viral clusters identified as abundant were mostly retrieved in epipelagic samples. Candidate division OP1 was excluded from this analysis because no OTU affiliated to this phylum was identified. Predicted proteins from all GOV viral contigs were compared to the PFAM domain database (hmmsearch51, threshold of 40 for bit score and 0.001 for E-value), and all PFAM domains detected were classified into 8 categories: ‘structural’, ‘DNA replication, recombination, repair, nucleotide metabolism’, ‘transcription, translation, protein synthesis’, ‘lysis’, ‘membrane transport, membrane-associated’, ‘metabolism’, ‘other’, and ‘unknown’ (as in ref 20). Four AMGs (similar to a domain from the ‘metabolism’ category) were then selected for further study owing to their central role in sulfur (dsrC and soxYZ) or nitrogen (P-II, amoC) cycle, and the fact that these had never been detected in a surface ocean viral genome thus far (dsrC/tusE-like genes have been detected in deep water viruses11, 21). To evaluate if an AMG was ‘known’, a list of PFAM domain detected in NCBI RefSeqVirus and Environmental Phages was computed based on a similar hmmsearch comparison (threshold of 40 for bit score and 0.001 for E-value), and augmented by manual annotation of AMGs from refs 20, 70. 
These corresponded, for the most part, to photosynthesis and carbon metabolism AMGs previously described in cyanophages71, 72, 73, 74, 75. The complete list of PFAM domains detected in GOV viral contigs is available in Supplementary Table 6.

Sequences similar to the four AMGs described in the previous paragraph were recruited from the Tara Oceans microbial metagenomes18, based on a blastp of all predicted proteins from the microbial metagenomes to the viral AMGs identified (threshold of 100 for bit score, 10⁻⁵ for E-value, except for P-II where a threshold of 170 for bit score was used because of the high number of sequences recruited). The viral AMG sequences were also compared to the NCBI nr database (blastp, threshold of 50 for bit score and 10⁻³ for E-value) to recruit relevant reference sequences (up to 20 for each viral AMG sequence). These sets of viral AMGs and related protein sequences were then aligned with Muscle76, the alignment was manually curated to remove poorly aligned positions with Jalview77, and two trees were computed from the same curated alignment: a maximum-likelihood tree with FastTree (v2.7.1, model WAG, other parameters set to default78) and a Bayesian tree with MrBayes (v3.2.5, mixed evolution models, other parameters set to default, 2 MCMC chains were run until the average standard deviation of split frequencies was <0.015, relative burn-in of 25% used to generate the consensus tree79). In all cases except for AmoC, the mixed model used by MrBayes was 100% WAG, confirming that this model was well suited for archaeal and bacterial virus protein trees. Manual inspection revealed only minor differences between each pair of trees, so a Shimodaira–Hasegawa (SH) test was used to determine which tree best fitted the sequence alignment, using the R library phangorn80. iTOL60 was used to visualize and display these trees, in which branches with supports <40% were collapsed.
Annotated interactive trees are available online at http://itol.embl.de/shared/Siroux. Contig map comparisons were generated with Easyfig81, following the same method used for the viral clusters (see Supplementary Information). Conserved motifs were identified on the different AMGs based on the literature: dsrC-conserved motifs were obtained from ref. 24, soxYZ conserved residues were identified from the PFAM domains PF13501 and PF08770, and P-II conserved motifs were identified from PROSITE documentation PDOC00439. A 3D structure could also be predicted for P-II AMGs by I-TASSER82 (default parameters), the quality of these predictions being confirmed with the ProSA web server83. To further confirm the functionality of these genes, selective constraint on these AMGs was evaluated through pN/pS calculation, as previously described84. In brief, synonymous (pS) and non-synonymous (pN) SNPs were observed in each AMG, and compared to the expected ratio of synonymous and non-synonymous SNPs under a neutral evolution model for these genes. The interpretation of pN/pS is similar to that of dN/dS analyses, with the operation of purifying selection leading to pN/pS values <1. Finally, AMG transcripts were searched for in metatranscriptomic datasets generated by the Tara Oceans consortium (ENA Id ERS1092158, ERS488920, and ERS494518). To generate these metatranscriptomes, bacterial rRNA depletion was carried out on 240–500 ng total RNA using the Ribo-Zero Magnetic Kit for Bacteria (Epicentre) for 0.2–1.6 μm and 0.22–3 μm filters. The Ribo-Zero depletion protocol was modified to be adapted to low RNA input amounts85. Depleted RNA was used to synthesize cDNA with the SMARTer Stranded RNA-Seq Kit (Clontech)85. Metatranscriptomic libraries were quantified by quantitative PCR using the KAPA Library Quantification Kit for Illumina Libraries (KapaBiosystems) and library profiles were assessed using the DNA High Sensitivity LabChip kit on an Agilent Bioanalyzer (Agilent Technologies). 
Libraries were sequenced on Illumina HiSeq2000 instrument (Illumina) using 100-base-length read chemistry in a paired-end mode. High-quality reads were then mapped to viral contigs containing dsrC, soxYZ, P-II, or amoC genes with SOAPdenovo242 within MOCAT40 (options ‘screen’ and ‘filter’ with length and identity cutoffs of 45% and 95%, respectively, and paired-end filtering set to ‘yes’), and coverage was defined for each gene as the number of base pairs mapped divided by gene length (including only those reads mapped to the predicted coding strand). The distribution and relative abundance of AMGs was based on the readmapping and normalized coverage of the contig that included the AMG. To get a range of temperature and nutrient concentrations for the widespread AMGs (those detected in >5 stations) that takes into account both the samples in which these AMGs were detected and the differences in normalized coverage, a set of samples was selected through a weighted random selection with replacement, with the weight of each sample corresponding to the normalized coverage of the AMG. This ensured that a range of temperature or nutrient concentration values associated with the distribution and abundance of the AMG could be generated for each AMG and each environmental parameter tested. The number of samples randomly selected for each AMG was the same as the total number of samples for which a value of this parameter was available. Scripts used in this manuscript are available on the Sullivan laboratory bitbucket under project GOV_Ecogenomics (http://bitbucket.org/MAVERICLab/gov_ecogenomics/overview). Scripts used in the assessment of microbial diversity are gathered in the directory Host_diversity, the ones used for host predictions are in Host_prediction, and the scripts used to identify abundant viral clusters are in Virus_clusters_prevalence. 
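The coverage-weighted resampling described above can be illustrated with a minimal sketch. It assumes that each sample carries one value of an environmental parameter and one normalized coverage value for the AMG; the function names, the example numbers and the fixed random seed are hypothetical and are not taken from the GOV_Ecogenomics scripts.

```python
import random


def weighted_resample(values, coverages, seed=0):
    """Draw len(values) samples with replacement, weighted by normalized coverage.

    values    -- parameter value (e.g. temperature) for each sample
    coverages -- normalized coverage of the AMG in the same samples
    seed      -- hypothetical fixed seed, used only to make the sketch repeatable
    """
    rng = random.Random(seed)
    # The number of draws equals the number of samples with a value available.
    return rng.choices(values, weights=coverages, k=len(values))


def value_range(draws):
    """Return the (min, max) range of the resampled values."""
    return min(draws), max(draws)


# Hypothetical temperatures (degrees C) and normalized coverages for one AMG.
temperatures = [4.0, 12.5, 18.0, 25.0]
coverages = [0.0, 1.0, 3.0, 0.5]

draws = weighted_resample(temperatures, coverages)
low, high = value_range(draws)
```

Samples in which the AMG was not detected have zero weight and are never drawn, so the resulting range reflects both where the AMG occurs and how abundant it is there.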
All raw reads are available through ENA (Tara Oceans) or IMG (Malaspina) using the dataset identifiers listed in Supplementary Table 1. Processed data are available through iVirus (http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/iVirus/GOV/), including all sequences from assembled contigs, lists of viral populations and associated annotated sequences as GenBank files, viral clusters composition and characteristics, map comparisons of genomes and contigs of the 38 abundant viral clusters and host predictions for viral contigs.


According to Stratistics MRC, the global polymerase chain reaction (PCR) market accounted for $6.95 billion in 2015 and is expected to reach $12.56 billion by 2022, growing at a CAGR of 8.8% during the forecast period. Increasing investments in gene therapy and government support for R&D are some of the factors fueling market growth. However, the rise of non-validated home-brew tests and reimbursement issues are hampering the market. Real-time PCR instruments are one of the major challenges for the PCR technologies market. Academic and research organizations hold the largest share of the end-user segment. The clinical diagnostic labs and hospitals market is anticipated to grow at a faster pace during the forecast period. North America is the leading PCR market, followed by Europe, owing to rising demand for low-cost diagnosis in healthcare. Some of the key players in the polymerase chain reaction market are Sigma-Aldrich Co. LLC., Thermo Fisher Scientific Inc., GE Healthcare, F. Hoffmann-La Roche Ltd., Becton, Dickinson & Company, QIAGEN, Agilent Technologies Inc., Bio-Rad Laboratories Inc., Beckman Coulter Inc., Affymetrix Inc., Abbott Laboratories, Cytocell Ltd, Shimadzu Biotech, HY LABORATORIES, Eppendorf AG, Exiqon, Dna Landmarks, Roche Diagnostics, Ocimum Biosolutions, BD Biosciences, Illumina, Complete Genomics, Dnavision SA, Epicentre® Biotechnologies and Hokkaido System Science Co. 
Products Covered:
• Reagents and Consumables
  o Buffers
  o Consumable
  o Nuclease Free Water
  o Enzymes
  o Template
  o Primers And Probes
  o DNA
  o Master Mixes
  o dNTP's
  o Others Reagents and Consumables
• Instruments
  o Digital PCR Systems
  o Standard PCR Systems
  o Real time PCR's
• Life Sciences
• Industrial Application
  o Animal husbandry
  o Environment
  o Biomedical research
  o Agricultural
  o Applied testing
  o Other PCR industry applications
• Clinical Diagnostics
  o Infectious
  o Non Infectious
• Others Applications
  o Dentistry
  o Pathogen Detection

End Users Covered:
• Academic and Research Organizations
• Pharmaceutical and Biotechnology Industries
• Clinical Diagnostics Labs and Hospitals
• Other End Users
  o Blood Banks

Regions Covered:
• North America
  o US
  o Canada
  o Mexico
• Europe
  o Germany
  o France
  o Italy
  o UK
  o Spain
  o Rest of Europe
• Asia Pacific
  o Japan
  o China
  o India
  o Australia
  o New Zealand
  o Rest of Asia Pacific
• Rest of the World
  o Middle East
  o Brazil
  o Argentina
  o South Africa
  o Egypt

What our report offers:
- Market share assessments for the regional and country level segments
- Market share analysis of the top industry players
- Strategic recommendations for the new entrants
- Market forecasts for a minimum of 7 years of all the mentioned segments, sub segments and the regional markets
- Market Trends (Drivers, Constraints, Opportunities, Threats, Challenges, Investment Opportunities, and recommendations)
- Strategic recommendations in key business segments based on the market estimations
- Competitive landscaping mapping the key common trends
- Company profiling with detailed strategies, financials, and recent developments
- Supply chain trends mapping the latest technological advancements


News Article | January 6, 2016
Site: www.nature.com

No statistical methods were used to predetermine sample size. Clinical samples GBM1w, GBM2w, GBM3w, GBM4w, GBM5w, GBM6w, GBM7w, AA15m, AA16m, AA17m, OD18m and AA19m were obtained as frozen specimens from the Massachusetts General Hospital Pathology Tissue Bank, or received directly after surgical resection and flash frozen (Extended Data Table 1). All samples were acquired with Institutional Review Board approval, and were de-identified before receipt. GBM1w was obtained at autopsy; the remaining samples were surgical resections. IDH status was determined for all clinical samples by SNaPshot multiplex PCR31. PDGFRA status was confirmed by FISH analysis. Tissue (200–500 μg) was mechanically minced with a sterile razor blade before further processing. Gliomaspheres were maintained in culture as described32, 33. In brief, neurosphere cultures contain Neurobasal media supplemented with 20 ng ml−1 recombinant EGF (R and D Systems), 20 ng ml−1 FGF2 (R and D Systems), 1× B27 supplement (Invitrogen), 0.5× N2 supplement (Invitrogen), 3 mM L-glutamine, and penicillin/streptomycin. Cultures were confirmed to be mycoplasma-free via PCR methods. GSC4 and GSC6 gliomasphere lines were derived from IDH wild-type tumours resected at Massachusetts General Hospital, and have been previously described and characterized32, 33, 34. The BT142 gliomasphere line (IDH1 mutant)35 was obtained from ATCC, and cultured as described above except that 25% conditioned media was carried over at each passage. BT142 G-CIMP status was confirmed by evaluating LINE methylation with the Global DNA Methylation Assay – LINE-1 kit (Active Motif), as described36, and by methylation-sensitive restriction digests. GSC119 was derived from an IDH1 mutant tumour (confirmed by SNaPshot) resected at Massachusetts General Hospital. We confirmed IDH1 mutant status of GSC119 by RNA-seq (82 out of 148 reads overlapping the relevant position in the transcript correspond to the mutant allele). 
The gliomasphere models were derived from tumours of the following types: GSC4 and GSC6: primary glioblastoma; BT142: grade III oligoastrocytoma; GSC119: secondary glioblastoma, G-CIMP. Clinical specimens and models used in this study are detailed in Extended Data Table 1. ChIP-seq was performed as described previously32. In brief, cultured cells or minced tissue was fixed in 1% formaldehyde and snap frozen in liquid nitrogen and stored at −80 °C at least overnight. Sonication of tumour specimens and gliomaspheres was calibrated such that DNA was sheared to between 400 and 2,000 bp. CTCF was immunoprecipitated with a monoclonal rabbit CTCF antibody, clone D31H2 (Cell Signaling 3418). H3K27ac was immunoprecipitated with an antibody from Active Motif (39133). ChIP DNA was used to generate sequencing libraries by end repair (End-It DNA repair kit, Epicentre), 3′ A base overhang addition via Klenow fragment (NEB), and ligation of barcoded sequencing adapters. Barcoded fragments were amplified via PCR. Libraries were sequenced as 38-base paired-end reads on an Illumina NextSeq500 instrument or as 50-base single-end reads on a MiSeq instrument. Sequencing libraries are detailed in Extended Data Table 2. H3K27ac maps for GSC6 were previously deposited to the GEO under accession GSM1306340. Genomic data has been deposited into GEO as GSE70991. For sequence analysis, identical reads were collapsed to a single paired-end read to avoid PCR duplicates. To avoid possible saturation, reads were downsampled to 5% reads collapsed as PCR duplicates, or 5 million fragments. Reads were aligned to hg19 using BWA, and peaks were called using HOMER. ChIP-seq tracks were visualized using Integrative Genomics Viewer (IGV, http://www.broadinstitute.org/igv/). To detect peaks lost in IDH mutants, we called signal over all peaks in a 100-bp window centred on the peaks. To control for copy number changes, we first called copy number profiles from input sequencing data using CNVnator37. 
We then removed all regions where at least one sample had a strong deletion (<0.25), and normalized by copy number. To account for batch effects and difference in ChIP efficiency, we quantile normalized each data set. Peaks were scored as lost or gained if the difference in signal between a given tumour and the average of the five wild-type tumours was at least twofold lower or higher, with a signal of at least 1 in all wild-type or IDH mutant tumours. A Fisher exact test confirmed that the overlap between peaks lost in the IDH mutant tumours is highly significant (P < 10^−100). GC content over CTCF peaks lost (or retained) in the IDH mutant glioma specimens was averaged over 200-bp windows centred on each peak lost in IDH mutant tumours. Methylation levels were quantified over these same regions for 3 IDH mutant and 3 IDH wild-type tumours, using TCGA data generated by whole genome bisulfite sequencing10. In brief, methylation levels (percentage) based on the proportion of reads with protected CpG were averaged over all CpG di-nucleotides in these regions, treating each tumour separately. Occupancy of the CTCF site in the boundary element adjacent to the PDGFRA locus was quantified by ChIP qPCR, using the following primers: PDGFRActcfF: 5′-GTCACAGTAGAACCACAGAT-3′; PDGFRActcfR: 5′-TAAGTATACTGGTCCTCCTC-3′. Equal masses of ChIP or input (WCE) DNA were used as input for PCR, and CTCF occupancy was quantified as a ratio between ChIP and WCE, determined by 2^−ΔCt. CTCF peak intensity was further normalized as a ratio to two invariant peaks, at PSMB1 and SPG11, using the following primers: PSMB1ctcfF: 5′-CCTTCCTAGTCACTCAGTAA-3′; PSMB1ctcfR: 5′-CAGTGTTGACTCATCCAG-3′; SPG11ctcfF: 5′-CAGTACCAGCCTCTCTAG-3′; SPG11ctcfR: 5′-CTAAGCTAGGCCTTCAAG-3′. RNA-seq data for 357 normal brain samples was downloaded from GTEx20. RNA-seq data and copy number profiles for lower grade gliomas were downloaded from TCGA23, 24. 
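As a worked illustration of the ChIP qPCR quantification above, the sketch below computes a ChIP/input ratio as 2 raised to the power −ΔCt and then divides it by the signal at invariant peaks. The text does not state how the two invariant peaks (PSMB1 and SPG11) are combined, so averaging them arithmetically is an assumption here, and the function names and Ct values are hypothetical.

```python
def chip_over_input(ct_chip, ct_input):
    """ChIP/input ratio from quantification cycles: 2 ** -(Ct_ChIP - Ct_input)."""
    delta_ct = ct_chip - ct_input
    return 2.0 ** (-delta_ct)


def normalized_occupancy(target, invariants):
    """Normalize the target ratio to the mean ratio of invariant control peaks.

    target     -- (ct_chip, ct_input) for the peak of interest
    invariants -- list of (ct_chip, ct_input) for the invariant peaks
    Assumption: invariant peaks are combined by arithmetic mean.
    """
    control = sum(chip_over_input(c, i) for c, i in invariants) / len(invariants)
    return chip_over_input(*target) / control


# Hypothetical Ct values: the target peak amplifies two cycles later in ChIP
# than in input, and both invariant peaks amplify one cycle later.
target_ratio = chip_over_input(25.0, 23.0)
relative = normalized_occupancy((25.0, 23.0), [(24.0, 23.0), (24.0, 23.0)])
```

A later Ct in the ChIP sample than in the input gives a ratio below 1, and dividing by the invariant peaks removes differences in overall ChIP efficiency between samples.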
Contact domains of IMR90, GM12878, K562 and NHEK cells were obtained from published HiC data15. Genes were assigned to the inner-most domain in which their transcription start site fell. Gene pairs were considered to be in the same domain if they were assigned to the same domain in both GM12878 and IMR90. Gene pairs were considered to span a boundary if they were assigned to different domains in both GM12878 and IMR90, and separated by a CTCF-binding site in IDH wild-type tumours. Gene pairs that did not fit either criterion were excluded from this analysis. The plot of correlation vs distance for brain GTEx samples is based on Pearson correlations for all relevant pairs, smoothed by locally weighted scatterplot smoothing with weighted linear least squares (LOESS). To assess the bias in correlation differences, we computed the difference of Pearson correlations between wild-type and IDH mutant gliomas for all gene pairs separated by <180 kb. In Fig. 1e, this difference in correlations is plotted against the significance of this difference (estimated by Fisher z-transformation). For each gene pair, we omitted samples with a deletion or amplification of one of the genes at or above the threshold of the minimal arm level deletion or amplification (to avoid copy number bias). To ensure robustness, we also repeated the analysis using boundaries defined from HiC data for K562 and NHEK. This yielded similar results: 84% of pairs gaining correlation cross a boundary versus 71% expected (P < 8 × 10^−3), and 54% of pairs losing correlation are within the same domain versus 29% expected (P < 3 × 10^−8). Repeating the analysis with only the 14,055 genes that are expressed at over 1 transcript per million (TPM) in at least half the samples also yielded similar results (Extended Data Fig. 7): 92% of pairs gaining correlation cross a boundary versus 69% expected (P < 2 × 10^−3), and 73% of pairs losing correlation are within the same domain versus 31% expected (P < 8 × 10^−4). 
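The significance of a difference between two Pearson correlations, estimated by Fisher z-transformation, can be sketched as follows. This is the standard textbook form of the test for two independent groups and is not taken from the authors' code; the two-sided P value, the function name and the example numbers are assumptions.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent Pearson correlations by Fisher z-transformation.

    r1, n1 -- correlation and sample count in group 1 (e.g. IDH mutant)
    r2, n2 -- correlation and sample count in group 2 (e.g. wild type)
    Returns (z, p) with p a two-sided P value under a normal approximation.
    """
    # Fisher transformation: z = atanh(r), approximately normal with
    # variance 1 / (n - 3) for each group.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical gene pair: strongly correlated in one group, weakly in the other.
z_stat, p_value = fisher_z_difference(0.8, 100, 0.2, 100)
```

Applied to every gene pair, the sign of the statistic indicates whether correlation is gained or lost in one group relative to the other, and the P values can then be corrected for multiple testing.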
To detect boundaries deregulated in IDH mutant gliomas, we scanned for gene pairs, separated by <1 Mb, with a significant difference in correlation between wild-type and IDH mutant tumours (Fisher z-transformation, FDR <1%). We omitted amplified or deleted samples as described above. To ensure robustness to noise from lowly expressed genes, we first filtered out 6,476 genes expressed <1 TPM in more than half of the samples (keeping 14,055 genes). We considered all domains and boundaries scored in IMR90 HiC data13. Gene pairs crossing a CTCF peak and an IMR90 boundary (that is, can be assigned to different domains) that were significantly more correlated in IDH mutant tumours were considered to support the loss of that boundary. Gene pairs not crossing a boundary (that is, can be assigned to the same domain) that were significantly less correlated in IDH mutant tumours were considered to support the loss of a flanking boundary. We collated a set of deregulated boundaries, supported by at least one cross-boundary pair gaining correlation and at least one intra-domain pair losing correlation. Each was assigned a P value equal to the product of both supporting pairs (best P value was chosen if there were more supporting pairs). If both boundaries of a domain were deregulated, or if the same pair of gene pairs (one losing and one gaining correlations) were supporting more than one boundary due to overlapping domains, the entries were merged (Supplementary Table 1). This definition allows every gene pair to be considered as potential support for a boundary loss. To quantify CTCF occupancy over these deregulated boundaries, we averaged the signal over all CTCF peaks located within a 1-kb window around the boundary, using copy number and quantile normalized CTCF signals. To quantify DNA methylation over the deregulated boundaries, we averaged DNA methylation signals from TCGA data in 200-bp windows as above. 
Figure 2a depicts the significance of disrupted domains and the fold change of genes in them that are upregulated in IDH mutant tumours (compared to median expression in wild type). In addition to PDGFRA, top-ranking genes include CHD4 (P < 10^−32), a driver of glioblastoma tumour initiation38, L1CAM (P < 10^−8), a regulator of glioma stem cells and tumour growth39, and other candidate regulators (Supplementary Table 1). To ensure robustness to cell-type-specific boundaries, we repeated the analysis with GM12878-, K562- and NHEK-defined boundaries. This yielded very similar results, and again highlighted PDGFRA as an overexpressed gene adjacent to a disrupted boundary. For the correlation of FIP1L1 and PDGFRA expression, RNA-seq data from the TCGA lower grade glioma (LGG) and glioblastoma (GBM) data sets2, 24 were downloaded and segregated by IDH mutation status and subtype. Patients from the proneural subtype were divided by IDH mutation status, while patients from the mesenchymal, classical or neural subtypes (which had no IDH mutations) were classified as ‘other’. For correlation analysis, patients with copy number variation in either gene were excluded from the analysis to control for effects of co-amplification. For outcome analysis, LGG RNA-seq data and corresponding patient survival data was obtained from TCGA. Patients with summed PDGFRA and FIP1L1 expression of at least one-half of one standard deviation above the mean were classified as ‘high PDGFRA and FIP1L1 expression’ (n = 17), while all other patients were classified as ‘low PDGFRA and FIP1L1 expression’ (n = 201). Data were plotted as Kaplan–Meier curves and statistically analysed via log–rank test. HiC data15 were downloaded from GEO. 5-kb resolution intra-chromosomal contact scores for chromosome 4 for the cell lines IMR90, NHEK, KBM7, K562, HUVEC, HMEC and GM12878 were filtered to the region between 53,700 and 55,400 kb. 
The average interaction score at each coordinate pair for all cell lines was calculated and used to determine putative insulator elements as local maxima at the interaction point of two domain boundaries. To determine the interactions of the PDGFRA promoter, the interaction scores of all points in the region with the PDGFRA promoter (chr4: 55,090,000) were plotted as a one-dimensional trace. To view the topological domain structure of the region, HiC interaction scores were visualized using Juicebox (http://www.aidenlab.org/juicebox/)15. Data shown is from the IMR90 cell line at 5-kb resolution, normalized to coverage. DNA methylation was analysed in two ways. For gliomaspheres, genomic DNA was isolated via QiaAmp DNA minikit (Qiagen) and subjected to bisulfite conversion (EZ DNA Methylation Gold Kit, Zymo Research). Bisulfite-converted DNA specific to the CTCF-binding site (defined by JASPAR40) in the boundary adjacent to PDGFRA was amplified using the following primers forward: 5′-GAATTATAGATAATGTAGTTAGATGG-3′, reverse: 5′-AAATATACTAATCCTCCTCTCCCAAA-3′. Amplified DNA was used to prepare a sequencing library, which was sequenced as 38-base paired-end reads on a NextSeq500. For tumours, limiting DNA yields required an alternative strategy for methylation analysis. Tumour genomic DNA was isolated from minced frozen sections of tumours by QiaAmp DNA minikit (Qiagen). Genomic DNA was digested using the methylation-sensitive restriction enzyme Hin6I (Thermo) recognizing the restriction site GCGC, or subjected to mock digestion. Protected DNA was quantified by PCR using the following primer set: PDGFRAinsF: 5′-CGTGAGCTGAATTGTGCCTG-3′, PDGFRAinsR: 5′-TGGGAGGACAGTTTAGGGCT-3′, normalizing to mock digestion. 3C analysis was performed using procedures as described previously41, 42. In brief, ~10 million cell equivalents from minced tumour specimens or gliomasphere cultures were fixed in 1% formaldehyde. 
Fixed samples were lysed in lysis buffer containing 0.2% PMSF using a Dounce pestle. Following lysis, samples were digested with HindIII (NEB) overnight on a thermomixer at 37 °C rotating at 950 r.p.m. Diluted samples were ligated using T4 DNA ligase (NEB) at 16 °C overnight, followed by RNase and proteinase K treatment. DNA was extracted via phenol/chloroform/isoamyl alcohol (Invitrogen). DNA was analysed via TaqMan PCR using ABI master mix. Primers and probe were synthesized by IDT with the following sequences: common PDGFRA promoter: 5′-GGTCGTGCCTTTGTTTT-3′; FIP1L1 control: 5′-CAGGGAAGAGAGGAAGTTT-3′; FIP1L1 enhancer: 5′-TTAAGTAAGCAGGTAAACTACAT-3′; intragenic enhancer: 5′-AGCCTTTGCCTCCTTTT-3′; intragenic control: 5′-CCACAGGGAGAAGGAAAT-3′; intact promoter: 5′-CAAGGAATTCGTAGGGTTC-3′; probe: 5′-/56-FAM/TTGTATGCG/ZEN/AGATAGAAGCCAGGGCAA/3IABkFQ/-3′. For the reciprocal FIP1L1 enhancer interaction interrogation, the following primer sequences were used: common enhancer primer (as FIP1L1 enhancer primer above): 5′-TTAAGTAAGCAGGTAAACTACAT-3′; PDGFRA promoter (as common PDGFRA promoter above): 5′-GGTCGTGCCTTTGTTTT-3′; SCFD2 promoter: 5′-AATACATGGTCATGATGCTC-3′; FIP1L1 promoter: 5′-AGGCATTGCTTAAACATAAC-3′; FIP1L1 control: 5′-TTATTTGTAGTAGAGGTTACTGG-3′; PDGFRA control: 5′-ATGATAACACCACCATTCAG-3′; FIP1L1 enhancer probe: 5′-/56-FAM/TATCCCAAC/ZEN/CAAATACAGGGCTTGG/3IABkFQ/-3′. To normalize primer signals, bacterial artificial chromosome (BAC) clones CTD-2022B5 and RP11-626H4 were obtained from Invitrogen. BAC DNA was purified via the BACMAX DNA Purification kit (Epicentre) and quantified using two primer sets specific to the chloramphenicol resistance gene: 1F: 5′-TTCGTCTCAGCCAATCCCTG-3′; 1R: 5′-TTTGCCCATGGTGAAAACGG-3′; 2F: 5′-GGTTCATCATGCCGTTTGTG-3′; 2R: 5′-CCACTCATCGCAGTACTGTTG-3′. BAC DNA was subjected to a similar 3C protocol, omitting steps related to cell lysis, proteinase or RNase treatment. 
PCR signal from tumour and gliomasphere 3C was normalized to digestion efficiency and BAC primer signal. BT142 cells were cultured in either 5 μM 5-azacytidine or equivalent DMSO (1:10,000) for 8 days, with drug refreshed every 2 days. The following CRISPR sgRNAs were cloned into the LentiCRISPR vector obtained from the Zhang laboratory43: GFP: 5′-GAGCTGGACGGCGACGTAAA-3′; insulator: 5′-GCCACAGATAATGCAGCTAGA-3′. GSC6 gliomaspheres were mechanically dissociated and plated in 5 μg ml−1 EHS laminin (Sigma) and allowed to adhere overnight, and then infected with lentivirus containing either CRISPR vector for 48 h. Cells were then selected in 1 μg ml−1 puromycin for 4 days, with puromycin-containing media refreshed every 2 days. Genomic DNA was isolated and the region of interest was amplified using the PDGFRAins primer set described above. CRISPR-mediated disruption of this amplified DNA was confirmed via Surveyor Assay (Transgenomic), with amplified uninfected GSC6 genomic DNA being added to each annealing reaction as the unmodified control. To quantify the precise CRISPR alterations, genomic DNA from each construct was amplified using a set of primers closer to the putative deletion site as follows: forward: 5′-TTTGCAATGGGACACGGAGA-3′, reverse: 5′-AGAAATGTGTGGATGTGAGCG-3′. PCR product from these primers was used to prepare a library that was sequenced as 38-base paired-end reads on the Illumina NextSeq500. Total RNA was isolated from CRISPR-infected GSC6 gliomaspheres (insulator or control GFP sgRNA) or BT142 gliomaspheres (5-aza-treated or control condition) using the RNeasy minikit (Qiagen) and used to synthesize cDNA with the SuperScriptIII system (Invitrogen). cDNA was analysed using SYBR mastermix (Applied Biosystems) on a 7500 Fast Real Time System (Applied Biosystems). 
PDGFRA expression was determined using the following primers: forward: 5′-GCTCAGCCCTGTGAGAAGAC-3′, reverse: 5′-ATTGCGGAATAACATCGGAG-3′, and was normalized to primers for ribosomal protein, large, P0 (RPLP0), as follows: forward: 5′-TCCCACTTGCTGAAAAGGTCA-3′, reverse: 5′-CCGACTCTTCCTTGGCTTCA-3′. Normalization was also verified by β-actin (ACTB), forward: 5′-AGAAAATCTGGCACCACACC-3′, reverse: 5′-AGAGGCGTACAGGGATAGCA-3′. Cells were incubated with PE-conjugated anti-PDGFRa (CD140a) antibody (Biolegend, clone 16A1) for 30 min at room temperature at the dilution specified in the manufacturer’s protocol. Data were analysed and visualized with FlowJo software. Single live cells were selected for analysis via side and forward scatter, and viable cells were selected by lack of autofluorescence in an unstained channel (APC). For the cell growth assay, 2,500 dissociated viable GSC6 cells expressing CRISPR and either GFP or insulator-targeting sgRNA (see above) were plated in 100 μl of media in an opaque-walled tissue culture 96-well plate, in 1 μM dasatinib, 500 nM crenolanib, or equivalent DMSO (1:10,000) as a vehicle control. Cell growth was analysed at days 3, 5 and 7 for dasatinib, or days 3, 7 and 10 for crenolanib, using CellTiter-Glo reagent (Promega) following the manufacturer’s protocol. Data were normalized across days using an ATP standard curve.


News Article | August 31, 2016
Site: www.nature.com

A devastating 6.2-magnitude earthquake in central Italy on 24 August that killed more than 290 people was the country’s largest since a magnitude-6.3 earthquake in 2009 that hit the town of L’Aquila, about 40 kilometres away. That event killed 308 people and destroyed tens of thousands of homes and a university. Controversially, it also caused six scientists to be put on trial for manslaughter. Central Italy’s complex geological and tectonic make-up creates a notorious quake risk. The Adria micro-plate dives beneath the Apennine mountain range from east to west, creating seismic strain. The mighty Eurasian and African plates also collide here, with the Eurasian plate moving northeast at 24 millimetres per year. The latest quake also injured hundreds and laid waste to historic villages in the Apennine mountains, including Amatrice (see ‘Epicentre of a quake’). It was a result of increased horizontal stress perpendicular to the mountain chain. Seismologists had expected a rupture to occur near the location at any time. Still, Giulio Selvaggi, a research director at the National Institute of Geophysics and Volcanology in Rome, and one of those initially convicted of manslaughter (all six were cleared on appeal), says he was shocked by the death and destruction wreaked by last week’s quake. The mountainous region around Amatrice is sparsely populated, but the final death toll may exceed that of more populated and urbanized L’Aquila. Selvaggi seconds a public outcry over the failure of authorities to prioritize making old buildings more earthquake-resistant and notes that his team supplies earthquake maps to them. “We scientists have made a beautiful, detailed seismic hazard map, showing clearly the areas in greatest need of preventive measures,” he says. “But public authorities don’t take enough action.” The court case over the L’Aquila earthquake came about because a local amateur researcher claimed to have evidence of an imminent, large quake. 
Six scientists and one government official who had publicly dismissed the amateur’s methods were accused of misinforming the public. Following an unprecedented trial, all seven were given six-year jail sentences for manslaughter, but the scientists were cleared on appeal in 2014. Computer scientist Paola Inverardi, who is rector of the university in L’Aquila, says the rebuilding of the university is nearly complete, and that research activities had resumed by 2012. Science in the region has also benefited from supporting initiatives following the quake, she says. One of these is the Gran Sasso Science Institute, an international graduate school founded in 2012 to inject young intellectual life into L’Aquila. It has been so successful that in June it was awarded university status. Unlike the earthquake in L’Aquila, which was preceded by frequent, mostly low-magnitude, tremors in the surrounding area, no seismic activity was recorded before the latest earthquake. “It came out of the blue, without the preceding tremors we experienced in ‘our’ earthquake,” says Inverardi. L’Aquila itself experienced virtually no damage, but, she says, “psychologically we were all pushed back”.
