All mice were bred and maintained under pathogen-free conditions at an American Association for the Accreditation of Laboratory Animal Care accredited animal facility at the University of Pennsylvania or Yale University. Mice were housed in accordance with the procedures outlined in the Guide for the Care and Use of Laboratory Animals under an animal study proposal approved by an institutional Animal Care and Use Committee. Male and female mice between 4 and 12 weeks of age were used for all experiments. Littermate controls were used whenever possible. C57BL/6 (wild type) and B6.SJL-Ptprca Pepcb/Boy (B6.SJL) mice were purchased from The Jackson Laboratory. We generated Morrbid-deficient mice and the in cis and in trans double heterozygous (Morrbid+/−, Bcl2l11+/−) mice using the CRISPR/Cas9 system as previously described (ref. 26). In brief, to generate Morrbid-deficient mice, single guide RNAs (sgRNAs) were designed against regions flanking the first and last exon of the Morrbid locus (Extended Data Fig. 1g). Cas9-mediated double-stranded DNA breaks resolved by non-homologous end joining (NHEJ) ablated the intervening sequences containing Morrbid in C57BL/6N one-cell embryos. The resulting founder mice were Morrbid−/+, which were then bred to wild-type C57BL/6N and then intercrossed to obtain homozygous Morrbid−/− mice. One Morrbid-deficient line was generated. To control for potential off-target effects, mice were crossed for at least 5 generations to wild-type mice and then intercrossed to obtain homozygosity. Littermate controls were used when possible throughout all experiments. To generate the in cis and in trans double heterozygous (Morrbid+/−, Bcl2l11+/−) mice, we first obtained mouse one-cell embryos from a mating between Morrbid−/− female mice and wild-type male mice. As such, the resulting one-cell embryos were heterozygous for Morrbid (Morrbid+/−).
We then micro-injected sgRNAs designed against intronic sequences flanking the second exon of Bcl2l11, which contains the translational start site/codon, into Morrbid−/+ one-cell embryos (Extended Data Fig. 9). Cas9-mediated double-stranded DNA breaks resolved by NHEJ ablated the intervening sequences containing the second exon of Bcl2l11 in Morrbid+/− (C57BL/6N) one-cell embryos, generating founder mice that were heterozygous for both Bcl2l11 and Morrbid (Bcl2l11+/−; Morrbid−/+). Founder heterozygous mice were then bred to wild-type C57BL/6N to interrogate for the segregation of the Morrbid-deficient and Bcl2l11-deficient alleles (Extended Data Fig. 9). Pups in which these alleles segregated were named in trans, and pups in which they did not segregate were labelled in cis. One line each of in cis and in trans double heterozygous mice (Bcl2l11+/−; Morrbid−/+) was generated. To control for potential off-target effects, mice were crossed for at least 5 generations to wild-type (C57BL/6N) mice (for in cis) and to Morrbid−/− mice (for in trans) to maintain heterozygosity. To determine genetic rescue, samples from mice containing different permutations of Morrbid and Bcl2l11 alleles (Fig. 4g–j) were analysed in a blinded manner by a single investigator not involved in the breeding or coding of these samples.
Cells were isolated from the indicated tissues (blood, spleen, bone marrow, peritoneal exudate, adipose tissue). Red blood cells were lysed with ACK. Single-cell suspensions were stained with CD16/32 and with indicated fluorochrome-conjugated antibodies. If run live, cells were stained with 7-AAD (7-amino-actinomycin D) to exclude non-viable cells. Otherwise, before fixation, Live/Dead Fixable Violet Cell Stain Kit (Invitrogen) was used to exclude non-viable cells. Active caspase staining using Z-VAD-FMK (CaspGLOW, eBiosciences) was performed according to the manufacturer's specifications.
Apoptosis staining by annexin V+ (Annexin V Apoptosis Detection kit) was performed according to the manufacturer’s recommendations. BrdU staining was performed using BrdU Staining Kit (eBioscience) according to the manufacturer’s recommendations. For BCL2L11 staining, cells were fixed for 15 min in 2% formaldehyde solution, and permeabilized with flow cytometry buffer supplemented with 0.1% Triton X-100. All flow cytometry analysis and cell-sorting procedures were done at the University of Pennsylvania Flow Cytometry and Cell Sorting Facility using BD LSRII cell analysers and a BD FACSAria II sorter running FACSDiva software (BD Biosciences). FlowJo software (version 10, TreeStar) was used for data analysis and graphic rendering. All fluorochrome-conjugated antibodies used are listed in Supplementary Table 2.
1 × 10^6 wild-type and Morrbid-deficient neutrophils sorted from mouse bone marrow were assayed for BCL2L11 protein expression by western blotting (Bim C34C5 rabbit monoclonal antibody, Cell Signaling), as previously described.
2 × 10^6 wild-type and Morrbid-deficient neutrophils sorted from mouse bone marrow were cross-linked in a 1% formaldehyde solution for 5 min at room temperature while rotating. Crosslinking was stopped by adding glycine (0.2 M in 1× PBS (phosphate buffered saline)) and incubating on ice for 2 min. Samples were spun at 2,500g for 5 min at 4 °C and washed 4 times with 1× PBS. The pellets were flash frozen and stored at −80 °C. Cells were lysed, and nuclei were isolated and sonicated for 8 min using a Covaris S220 (105 Watts, 2% duty cycle, 200 cycles per burst) to obtain approximately 200–500 bp chromatin fragments. Chromatin fragments were pre-cleared with protein G magnetic beads (New England BioLabs) and incubated with pre-bound anti-H3K27me3 (Qiagen), anti-EZH2 (eBiosciences), or mouse IgG1 (Santa Cruz Biotechnology) antibody-protein G magnetic beads overnight at 4 °C.
Beads were washed once in low-salt buffer (20 mM Tris, pH 8.1, 2 mM EDTA, 50 mM NaCl, 1% Triton X-100, 0.1% SDS), twice in high-salt buffer (20 mM Tris, pH 8.1, 2 mM EDTA, 500 mM NaCl, 1% Triton X-100, 0.1% SDS), once in LiCl buffer (10 mM Tris, pH 8.1, 1 mM EDTA, 0.25 mM LiCl, 1% NP-40, 1% deoxycholic acid) and twice in TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA). Washed beads were eluted twice with 100 μl of elution buffer (1% SDS, 0.1 M NaHCO3) and de-crosslinked (0.1 mg ml−1 RNase, 0.3 M NaCl and 0.3 mg ml−1 Proteinase K) overnight at 65 °C. The DNA samples were purified with Qiaquick PCR columns (Qiagen). qPCR was carried out on a ViiA7 Real-Time PCR System (ThermoFisher) using the SYBR Green detection system and indicated primers. Expression values of target loci were directly normalized to the indicated positive control loci, such as MyoD1 for H3K27me3 and EZH2 ChIP analysis, and Actb for Pol II ChIP analysis. ChIP–qPCR primer sequences are listed in Supplementary Table 1.
50,000 wild-type and knockout cells, in triplicate, were spun at 500g for 5 min at 4 °C, washed once with 50 μl of cold 1× PBS and centrifuged under the same conditions. Cells were resuspended in 50 μl of ice-cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630). Cells were immediately spun at 500g for 10 min at 4 °C. Lysis buffer was carefully pipetted away from the pellet, which was then resuspended in 50 μl of the transposition reaction mix (25 μl 2× TD buffer, 2.5 μl Tn5 Transposase (Illumina), 22.5 μl nuclease-free water) and then incubated at 37 °C for 30 min. DNA purification was performed using a Qiagen MinElute kit and eluted in 12 μl of Elution buffer (10 mM Tris buffer, pH 8.0).
To amplify library fragments, 6 μl of the eluted DNA was mixed with NEBnext High-Fidelity 2× PCR Master Mix, 25 μM of customized Nextera PCR primers 1 and 2 (Supplementary Table 1), 100× SYBR Green I and used in PCR as follows: 72 °C for 5 min; 98 °C for 30 s; and thermocycling 4 times at 98 °C for 10 s; 63 °C for 30 s; 72 °C for 1 min. 5 μl of the 5-cycle PCR-amplified DNA was used in a qPCR reaction to estimate the additional number of amplification cycles. Libraries were amplified for a total of 10–11 cycles and were then purified using a Qiagen PCR Cleanup kit and eluted in 30 μl of Elution buffer. The libraries were quantified using qPCR and bioanalyser data, and then normalized and pooled to 2 nM. Each 2 nM pool was then denatured with a 0.1 N NaOH solution in equal parts and then further diluted to form a 20 pM denatured pool. This pool was then further diluted down to 1.8 pM for sequencing using the NextSeq500 machine on V2 chemistry and sequenced on a 1 × 75 bp Illumina NextSeq flow cell. ATAC sequencing was done on an Illumina NextSeq at a sequencing depth of ~40–60 million reads per sample. Libraries were prepared in triplicate. Raw reads were deposited under GSE85073.
2 × 75 bp paired-end reads were mapped to the mouse mm9 genome using the ‘bwa’ algorithm with the ‘mem’ option. Only reads that uniquely mapped to the genome were used in subsequent analysis. Duplicate reads were eliminated to avoid potential PCR amplification artifacts and to eliminate the high numbers of mtDNA duplicates observed in ATAC–seq libraries. Post-alignment filtering resulted in ~26–40 million uniquely aligned singleton reads per library, and the technical replicates were merged into one alignment BAM file to increase the power of open chromatin signal in downstream analysis. Depicted tracks were normalized to total read depth.
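The normalization of tracks to total read depth mentioned above can be illustrated with a minimal sketch. It assumes that coverage has already been summarized as raw read counts per genomic bin and that scaling is to reads per million mapped reads; the function name, the bin counts and the per-million scale are illustrative assumptions, not details given in these methods.

```python
def normalize_to_read_depth(bin_counts, total_mapped_reads, scale=1_000_000):
    """Scale raw per-bin read counts to reads per `scale` mapped reads.

    Assumption: `bin_counts` holds raw counts for fixed genomic bins and
    `total_mapped_reads` is the number of reads kept after filtering.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return [count * scale / total_mapped_reads for count in bin_counts]


# Hypothetical libraries at the two ends of the reported depth range.
wild_type = normalize_to_read_depth([52, 130, 78], total_mapped_reads=26_000_000)
knockout = normalize_to_read_depth([80, 200, 120], total_mapped_reads=40_000_000)
```

In this made-up example the two libraries differ in raw counts only because of sequencing depth, so their normalized tracks coincide.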
ATAC–seq enriched regions (peaks) in each sample were identified using MACS2; the settings are not reproduced here.
10 × 10^6 wild-type and knockout mouse neutrophils were cross-linked in a 1% formaldehyde solution for 10 min at room temperature while rotating. Crosslinking was stopped by adding glycine (0.2 M in 1× PBS) and incubating on ice for 2 min. Samples were spun at 2,500g for 5 min at 4 °C and washed 4 times with 1× PBS. The pellets were flash frozen and stored at −80 °C. Cells were lysed and sonicated (Branson Sonifier 250) for 9 cycles (30% amplitude; time, 20 s on, 1 min off). Lysates were spun at 18,400g for 10 min at 4 °C and resuspended in 3 ml of lysis buffer. A sample of 100 μl was kept aside as input and the rest of the samples were divided by the number of antibodies to test. Chromatin immunoprecipitation was performed with 10 μg of antibody-bound beads (anti-H3K27ac, H3K4me3, H3K4me1, H3K36me3 (Abcam) and anti-rabbit IgG (Santa Cruz), Dynal Protein G magnetic beads (Invitrogen)) and incubated overnight at 4 °C. Bead-bound DNA was washed, reverse cross-linked and eluted overnight at 65 °C, shaking at 950 r.p.m. Beads were removed using a magnetic stand and eluted DNA was treated with RNase A (0.2 μg μl−1) for 1 h at 37 °C shaking at 950 r.p.m., then with proteinase K (0.2 μg μl−1) for 2 h at 55 °C. 30 μg of glycogen (Roche) and 5 M of NaCl were added to the samples. DNA was extracted with 1 volume of phenol:chloroform:isoamyl alcohol and washed out with 100% ethanol. Dried DNA pellets were resuspended in 30 μl of 10 mM Tris HCl, pH 8.0, and DNA concentrations were quantified using Qubit. Starting with 10 ng of DNA, ChIP–seq libraries were prepared using the KAPA Hyper Prep Kit (Kapa Biosystems, Inc.) with 10 cycles of PCR. The libraries were quantified using qPCR and bioanalyser data, and then normalized and pooled to 2 nM. Each 2 nM pool was then denatured with a 0.1 N NaOH solution in equal parts and then further diluted to form a 20 pM denatured pool.
This pool was then further diluted down to 1.8 pM for sequencing using the NextSeq500 machine on V2 chemistry and sequenced on a 1 × 75 bp Illumina NextSeq flow cell. ChIP sequencing was done on an Illumina NextSeq at a sequencing depth of ~30–40 million reads per sample. Raw reads were deposited under GSE85073. 75 bp single-end reads were mapped to the mouse mm9 genome using the ‘bowtie2’ algorithm. Duplicate reads were eliminated to avoid potential PCR amplification artifacts and only reads that uniquely mapped to the genome were used in subsequent analysis. Depicted tracks were normalized to control IgG input sample. ChIP–seq-enriched regions (peaks) in each sample were identified using MACS2; the settings are not reproduced here.
10^7 immortalized BMDMs were collected by trypsinization and resuspended in 2 ml PBS, 2 ml nuclear isolation buffer (1.28 M sucrose; 40 mM Tris-HCl, pH 7.5; 20 mM MgCl2; 4% Triton X-100), and 6 ml water on ice for 20 min (with frequent mixing). Nuclei were pelleted by centrifugation at 2,500g for 15 min. Nuclear pellets were resuspended in 1 ml RNA immunoprecipitation (RIP) buffer (150 mM KCl, 25 mM Tris, pH 7.4, 5 mM EDTA, 0.5 mM DTT, 0.5% NP40; 100 U ml−1 SUPERaseIn, Ambion; complete EDTA-free protease inhibitor, Sigma). Resuspended nuclei were split into two fractions of 500 μl each (for mock and immunoprecipitation) and were mechanically sheared using a dounce homogenizer. Nuclear membrane and debris were pelleted by centrifugation at 15,800g for 10 min. Antibody to EZH2 (Cell Signaling 4905S; 1:30) or normal rabbit IgG (mock immunoprecipitation, Santa Cruz; 10 μg) was added to the supernatant and incubated for 2 hours at 4 °C with gentle rotation. 25 μl of protein G beads (New England BioLabs S1430S) were added and incubated for 1 hour at 4 °C with gentle rotation.
Beads were pelleted by magnetic field, the supernatant was removed, and beads were resuspended in 500 μl RIP buffer; this was repeated for a total of three RIP buffer washes, followed by one wash in PBS. Beads were resuspended in 1 ml of Trizol. Co-precipitated RNAs were isolated, reverse-transcribed to cDNA, and assayed by qPCR for Hprt and Morrbid-isoform1. Primer sequences are listed in Supplementary Table 1.
The EZH2 PAR–CLIP dataset (GSE49435) was analysed as previously described (ref. 22). Adapter sequences were removed from total reads and those longer than 17 bp were kept. The Fastx toolkit was used to remove duplicate sequences, and the resulting reads were mapped using BOWTIE allowing for two mismatches. The four independent replicates were pooled and analysed using PARalyzer, requiring at least two T→C conversions per RNA–protein contact site. lncRNAs were annotated according to Ensembl release 67.
13 × 10^6 wild-type bone marrow derived mouse eosinophils were fixed with 1% formaldehyde for 10 minutes at room temperature, and quenched with 0.2 M glycine on ice. Eosinophils were lysed for 3–4 hours at 4 °C (50 mM Tris, pH 7.4, 150 mM NaCl, 0.5% NP-40, 1% Triton X-100, 1× Roche complete protease inhibitor) and dounce-homogenized. Lysis was monitored by Methyl-green pyronin staining (Sigma). Nuclei were pelleted and resuspended in 500 μl 1.4× NEB3.1 buffer, treated with 0.3% SDS for one hour at 37 °C, and 2% Triton X-100 for another hour at 37 °C. Nuclei were digested with 800 units BglII (NEB) for 22 hours at 37 °C, and treated with 1.6% SDS for 25 minutes at 65 °C to inactivate the enzyme. Digested nuclei were suspended in 6.125 ml of 1.25× ligation buffer (NEB), and were treated with 1% Triton X-100 for one hour at 37 °C. Ligation was performed with 1,000 units T4 DNA ligase (NEB) for 18 hours at 16 °C, and crosslinks were reversed by proteinase K digestion (300 μg) overnight at 65 °C.
The 3C template was treated with RNase A (300 μg), and purified by phenol-chloroform extraction. Digested and undigested DNA were run on a 0.8% agarose gel to confirm digestion. To control for PCR efficiency, two bacterial artificial chromosomes (BACs) spanning the region of interest were combined in equimolar quantities and digested with 500 units BglII at 37 °C overnight. Digested BACs were ligated with 100 units T4 Ligase HC (Promega) in 60 μl overnight at 16 °C. Both BAC and 3C ligation products were amplified by qPCR (Applied Biosystems ViiA7) using SYBR fast master mix (KAPA biosystems). Products were run side by side on a 2% gel, and images were quantified using ImageJ. Intensity of 3C ligation products was normalized to intensity of respective BAC PCR product.
Mice were infected with 30,000 CFUs of Listeria monocytogenes (strain 10403s, obtained as a gift from E. J. Wherry) intravenously (i.v.). Mice were weighed and inspected daily. Mice were analysed at day 4 of infection to determine the CFUs of L. monocytogenes present in the spleen and liver.
Papain was purchased from Sigma Aldrich and resuspended at 1 mg ml−1 in PBS. Mice were intranasally challenged with 5 doses of 20 μg papain in 20 μl of PBS or PBS alone every 24 hours. Mice were killed 12 hours after the last challenge. Bronchoalveolar lavage was collected in two 1 ml lavages of PBS. Cellular lung infiltrates were collected after 1 hour digestion in RPMI supplemented with 5% FCS, 1 mg ml−1 collagenase D (Roche) and 10 μg ml−1 DNase I (Invitrogen) at 37 °C. Homogenates were passed through a cell strainer and infiltrates separated with a 27.5% Optiprep gradient (Axis-Shield) by centrifugation at 1,175g for 20 min. Cells were removed from the interface and treated with ACK lysis buffer.
Congenic C57BL/6 (wild-type) bone marrow expressing CD45.1 and CD45.2 and Morrbid-deficient bone marrow expressing CD45.2 were mixed in a 1:1 ratio and injected into C57BL/6 hosts that express CD45.1 (B6.SJL-Ptprca Pepcb/BoyJ) and had been irradiated twice with 5 Gy 3 hours apart. Mice were analysed between 4 and 9 weeks after injection.
Bone marrow was isolated and cultured as previously described (ref. 9). Briefly, unfractionated bone marrow cells were cultured with 100 ng ml−1 stem cell factor (SCF) and 100 ng ml−1 FLT3-ligand (FLT3-L). At day 4, the media was replaced with media containing 10 ng ml−1 interleukin-5 (IL-5). Mature bone-marrow-derived eosinophils were analysed between days 10 and 14.
Bone marrow cells were isolated and cultured in media containing recombinant mouse M-CSF (10 ng ml−1) for 7–8 days. On day 7–8, cells were re-plated for use in experimental assays. Bone-marrow-derived macrophages were stimulated with LPS (250 ng ml−1) for the indicated periods of time.
Briefly, 40 × 10^7 immortalized bone-marrow-derived macrophages were fixed with 40 ml of 1% glutaraldehyde for 10 min at room temperature. Crosslinking was quenched with 0.125 M glycine for 5 min. Cells were rinsed with PBS, pelleted for 4 min at 2,000g, snap-frozen in liquid nitrogen, and stored at −80 °C. Cell pellets were thawed at room temperature and resuspended in 800 μl of lysis buffer (50 mM Tris-HCl, pH 7.0, 10 mM EDTA, 1% SDS, 1 mM PMSF, complete protease inhibitor (Roche), 0.1 U ml−1 Superase In (Life Technologies)). Cell suspension was sonicated using a Covaris S220 machine (Covaris; 100 W, duty factor 20%, 200 cycles per burst) for 60 minutes until DNA was in the size range of 100–500 bp. After centrifugation for 5 min at 16,100g at 4 °C, the supernatant was aliquoted, snap-frozen in liquid nitrogen, and stored at −80 °C. 1 ml of chromatin was diluted in 2 ml hybridization buffer (750 mM NaCl, 1% SDS, 50 mM Tris HCl, pH 7.0, 1 mM EDTA, 15% formamide) and input RNA and DNA aliquots were removed.
100 pmoles of probes (Supplementary Table 1) were added and mixed by rotation at 37 °C for 4 h. Streptavidin paramagnetic C1 beads (Invitrogen) were equilibrated with lysis buffer. 100 μl washed C1 beads were added, and the entire reaction was mixed for 30 min at 37 °C. Samples were washed five times with 1 ml of washing buffer (SSC 2×, 0.5% SDS and fresh PMSF). 10% of each sample was removed from the last wash for RNA isolation. RNA aliquots were added to 85 μl RNA PK buffer, pH 7.0, (100 mM NaCl, 10 mM Tris-Cl, pH 7.0, 1 mM EDTA, 0.5% SDS, 0.2 U μl−1 proteinase K) and incubated for 45 min with end-to-end shaking. Samples were spun down, and boiled for 10 min at 95 °C. Samples were chilled on ice, added to 500 μl TRIzol, and RNA was extracted according to the manufacturer’s recommendations. Equal volumes of RNA were reverse-transcribed and assayed by qPCR using Hprt and Morrbid-exon1-1 primer sets (Supplementary Table 1). DNA was eluted from the remaining bead fraction twice using 150 μl DNA elution buffer (50 mM NaHCO3, 1% SDS, 200 mM NaCl, 100 μg ml−1 RNase A, 100 U ml−1 RNase H) incubated for 30 min at 37 °C. DNA elutions were combined and treated with 15 μl (20 mg ml−1) Proteinase K for 45 min at 50 °C. DNA was purified using phenol:chloroform:isoamyl alcohol and assayed by qPCR using the indicated primer sequences (Supplementary Table 1).
shRNAs of indicated sequences (Supplementary Table 1) were cloned into pGreen shRNA cloning and expression lentivector. Pseudotyped lentivirus was generated as previously described, and 293T cells were transfected with a packaging plasmid, envelope plasmid, and the generated shRNA vector plasmid using Lipofectamine 2000. Virus was collected 14–16 h and 48 h after transfection, combined, 0.4-μm filtered, and stored at −80 °C. For generation of in vivo BM chimaeras, virus was concentrated 6 times by ultracentrifugation using an Optiprep gradient (Axis-Shield).
For transduced BM-derived eosinophils, cultured BM cells on day 3 of previously described culture conditions were mixed 1:1 with indicated lentivirus and spinfected for 2 h at 260g at 25 °C with 5 μg ml−1 polybrene. Cultures were incubated overnight at 37 °C, and media was exchanged for IL-5-containing media at day 4 of culture as previously described (ref. 9). Cells were sorted for GFP+ cells on day 5 of culture, and then cultured as previously described for eosinophil generation. Cells were assayed on day 11 of culture. For transduced in vivo BM chimaeras, BM cells were cultured at 2.5 × 10^6 cells per ml in mIL-3 (10 ng ml−1), mIL-6 (5 ng ml−1) and mSCF (100 ng ml−1) overnight at 37 °C. Culture was readjusted to 2 ml at 2.5 × 10^6 cells per ml in a 6-well plate, and spinfected for 2 h at 260g at 25 °C with 5 μg ml−1 polybrene. Cells were incubated overnight at 37 °C. On the day before transfer, recipient hosts were irradiated twice with 5 Gy 3 hours apart. Mice were analysed between 4 and 5 weeks following transfer.
Bone marrow-derived macrophages (BMDMs) were transfected with pooled Morrbid or scrambled locked nucleic acid (LNA) antisense oligonucleotides of equivalent total concentrations using Lipofectamine 2000. Morrbid LNA pools contained Morrbid LNA 1-4 sequences at a total of 50 or 100 nM (Supplementary Table 1). After 24 h, the transfection media was replaced. The BMDMs were incubated for an additional 24 h and subsequently stimulated with LPS (250 ng ml−1) for 8–12 h. Eosinophils were derived from mouse BM as previously described. On day 12 of culture, 1 × 10^6 to 2 × 10^6 eosinophils were transfected with 50 nM of Morrbid LNA 3 or scrambled LNA (Supplementary Table 1) using TransIT-oligo according to the manufacturer’s protocol. RNA was extracted 48 h after transfection.
Guide RNAs (gRNAs) targeting the 5’ and 3’ flanking regions of the Morrbid promoter were cloned into Cas9 vectors pSPCas9(BB)-2A-GFP(PX458) (Addgene plasmid 48138) and pSPCas9(BB)-2A-mCherry (a gift from the Stitzel lab, JAX-GM) respectively. gRNA sequences are listed in Supplementary Table 1. The cloned Cas9 plasmids were then transfected into RAW 264.7, a mouse macrophage cell line, using Lipofectamine 2000, according to the manufacturer’s protocol. Forty-eight hours after transfection, the double-positive cells expressing GFP and mCherry and the double-negative cells lacking GFP and mCherry were sorted. The bulk sorted cells were grown in complete media containing 20% FBS and assayed for deletion by PCR, as well as for Morrbid and Bcl2l11 transcript expression by qPCR.
BM-derived eosinophils, or neutrophils or Ly6Chi monocytes sorted from mouse BM, were rested for 4–6 hours at 37 °C in complete media. Cells were subsequently stimulated with IL-3 (10 ng ml−1, Biolegend), IL-5 (10 ng ml−1, Biolegend), GM-CSF (10 ng ml−1, Biolegend), or G-CSF (10 ng ml−1, Biolegend) for 4–6 h. RNA was collected at each time-point using TRIzol (Life Technologies).
Wild-type and Bcl2l11−/− BM-derived eosinophils were generated as previously described (ref. 9). On day 8 of culture, the previously described IL-5 media was supplemented with the indicated concentrations of the EZH2-specific inhibitor GSK126 (Toronto Research Chemicals). Media was exchanged for fresh media containing IL-5 and GSK126 every other day. Cells were assayed for numbers and cell death by flow cytometry every day for 6 days following GSK126 treatment.
Total RNA was extracted from TRIzol (Life Technologies) according to the manufacturer’s instructions. Glycogen (ThermoFisher Scientific) was used as a carrier. Isolated RNA was quantified by spectrophotometry, and RNA concentrations were normalized. cDNA was synthesized using SuperScript II Reverse Transcriptase (ThermoFisher Scientific) according to the manufacturer’s instructions.
Resulting cDNA was analysed by SYBR Green (KAPA SYBR Fast, KAPABiosystems) or TaqMan-based (KAPA Probe Fast, KAPABiosystems) qPCR using indicated primers. Primer sequences are listed in Supplementary Table 1. All reactions were performed in duplicate using a CFX96 Touch instrument (BioRad) or ViiA7 Real-Time PCR instrument (ThermoFisher Scientific).
Reads generated from mouse (Gr1+) granulocytes (previously published GSE53928), human neutrophils (previously published GSE70068), and bovine peripheral blood leukocytes (previously published GSE60265) were filtered, normalized, and aligned to the corresponding host genome. Reads mapping around the Morrbid locus were visualized. For visualization of the high level of Morrbid expression in short-lived myeloid cells, reads from sorted mouse eosinophils (previously published GSE69707) were filtered, aligned to mm9, normalized using RPKM, and gene expression was plotted in descending order. For each human sample corresponding to the indicated stimulation conditions, the number of reads mapping to the human MORRBID locus per total mapped reads was determined.
For conservation across species, the genomic loci and surrounding genomic regions for the species analysed were aligned with mVista and visualized using the rankVista display generated with mouse as the reference sequence. Green highlights annotated mouse exonic regions and corresponding regions in other indicated species.
Single molecule RNA fluorescence in situ hybridization (FISH) was performed as previously described. A pool of 44 oligonucleotides (Biosearch Technologies) was labelled with Atto647N (Atto-Tec). For validation purposes, we also labelled subsets consisting of odd and even numbered oligonucleotides with Atto647N and Atto700, respectively, and looked for colocalization of signal. We designed the oligonucleotides using the online Stellaris probe design software. Probe oligonucleotide sequences are listed in Supplementary Table 1.
Thirty Z-sections with a 0.3-μm spacing were taken for each field of view. We acquired all images using a Nikon Ti-E widefield microscope with a 100× 1.4NA objective and a Pixis 1024BR cooled CCD camera. We counted the mRNA in each cell by using custom image processing scripts written in MATLAB.
For nuclear and cytoplasmic fractionation, 5 × 10^6 BMDMs were stimulated with 250 ng ml−1 LPS for 4 hours. Cells were collected and washed once with cold PBS. Cells were pelleted, resuspended in 100 μl cold NAR A buffer (10 mM HEPES, pH 7.9, 10 mM KCl, 0.1 mM EDTA, 1× complete EDTA-free protease inhibitor, Sigma; 1 mM DTT, 20 mM β-glycerophosphate, 0.1 U μl−1 SUPERaseIn, Life Technologies), and incubated at 4 °C for 20 min. 10 μl 1% NP-40 was added, and cells were incubated for 3 min at room temperature. Cells were vortexed for 30 seconds, and centrifuged at 3,400g for 1.5 min at 4 °C. Supernatant was removed, centrifuged at full speed for 90 min at 4 °C, and remaining supernatant was added to 500 μl Trizol as the cytoplasmic fraction. The original pellet was washed 4 times in 100 μl NAR A with short spins of 6,800g for 1 min. The pellet was resuspended in 50 μl NAR C (20 mM HEPES, pH 7.9, 400 mM NaCl, 1 mM EDTA, 1× complete EDTA-free protease inhibitor, Sigma, 1 mM DTT, 20 mM β-glycerophosphate, 0.1 U μl−1 SUPERaseIn, Life Technologies). Cells were vortexed every 3 min for 10 s for a total of 20 min at 4 °C. The sample was centrifuged at maximum speed for 20 min at room temperature. Remaining supernatant was added to 500 μl Trizol as the nuclear fraction. Equivalent volumes of cytoplasmic and nuclear RNA were converted to cDNA using gene specific primers and Super Script II RT (Life Technologies). Fractionation was assessed by qPCR for Morrbid-exon1-1 and other known cytoplasmic and nuclear transcripts. Primer sequences are listed in Supplementary Table 1.
For cytoplasmic, nuclear, and chromatin fractionation, 5 × 10^6 to 10 × 10^6 immortalized macrophages were activated with 250 ng ml−1 LPS (Sigma) for 6 hours at 37 °C. Cells were washed twice with PBS, and then resuspended in 380 μl ice-cold HLB (50 mM Tris-HCl, pH 7.4, 50 mM NaCl, 3 mM MgCl2, 0.5% NP-40, 10% glycerol), supplemented with 100 U SUPERase In RNase Inhibitor (Life Technologies). Cells were vortexed for 30 s and incubated on ice for 30 min, followed by a final 30 s vortex and centrifugation at 1,000g for 5 min at 4 °C. Supernatant was collected as the cytoplasmic fraction. Nuclear pellets were resuspended by vortexing in 380 μl ice-cold MWS (50 mM Tris-HCl, pH 7.4, 4 mM EDTA, 0.3 M NaCl, 1 M urea, 1% NP-40) supplemented with 100 U SUPERase In RNase Inhibitor. Nuclei were lysed on ice for 10 min, vortexed for 30 s, and incubated on ice for 10 more min to complete lysis. Chromatin was pelleted by centrifugation at 1,000g for 5 min at 4 °C. Supernatant was collected as the nucleoplasmic fraction. RNA was collected as described previously and cleaned up using the RNeasy kit (Qiagen). Equivalent volumes of cytoplasmic, nucleoplasmic, and chromatin-associated RNA were converted to cDNA using random hexamers and Super Script III RT (Life Technologies). Fractionation was assessed by qPCR for Morrbid-exon1-2 and other known cytoplasmic and nuclear transcripts. Primer sequences are listed in Supplementary Table 1.
Morrbid cDNA was cloned into a reference plasmid (pCDNA3.1) containing a T7 promoter. The plasmid was linearized and Morrbid RNA was in vitro transcribed using the MEGAshortscript T7 kit (Life Technologies), according to the manufacturer’s recommendations, and purified using the MEGAclear kit (Life Technologies). RNA was quantified using spectrophotometry and serial dilutions of Morrbid RNA of calculated copy number were spiked into Morrbid-deficient RNA isolated from Morrbid-deficient mouse spleen.
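One way to obtain a "calculated copy number" from a spectrophotometric RNA mass is sketched below. The methods do not state the formula used, so this is only an assumption: it applies the commonly used approximation that a single-stranded RNA of N nucleotides has a molecular weight of about N × 320.5 + 159 g/mol. The function names, the transcript length and the dilution parameters are hypothetical placeholders.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def rna_molecular_weight(length_nt):
    """Approximate molecular weight (g/mol) of a single-stranded RNA.

    Assumption: average-nucleotide approximation, not a value from the paper.
    """
    return length_nt * 320.5 + 159.0


def copies_from_mass(mass_ng, length_nt):
    """Convert an RNA mass in nanograms to a number of transcript copies."""
    grams = mass_ng * 1e-9
    return grams / rna_molecular_weight(length_nt) * AVOGADRO


def serial_dilution(start_copies, fold, steps):
    """Copy numbers for a standard curve made by repeated `fold` dilution."""
    return [start_copies / fold**i for i in range(steps)]


# Hypothetical 1,000-nt transcript; the real transcript length is not given here.
stock_copies = copies_from_mass(mass_ng=1.0, length_nt=1000)
standard_curve = serial_dilution(1e6, fold=10, steps=3)
```

The resulting copy numbers would then serve as the x-axis of a qPCR standard curve against which sample Ct values are read.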
Samples were reverse transcribed in parallel with wild-type-sorted neutrophil RNA and B-cell RNA isolated from known cell number using gene-specific Morrbid primers, and the Morrbid standard curve and wild-type neutrophils and B cells were assayed using qPCR with Morrbid-exon 1 primer sets (Supplementary Table 1).
Cohorts of mice were given a total of 4 mg bromodeoxyuridine (BrdU; Sigma Aldrich) in 2 separate intraperitoneal (i.p.) injections 3 h apart and monitored over the subsequent 5 days, unless otherwise noted. For analysis, cells were stained according to the manufacturer’s protocol (BrdU Staining Kit, eBioscience; anti-BrdU, BioLegend). A one-phase exponential curve was fitted from the peak labelling frequency to 36 h after peak labelling within each genetic background, and the half-life was determined from this curve.
Study subjects were recruited and consented in accordance with the University of Pennsylvania Institutional Review Board. Peripheral blood was separated by Ficoll–Paque density gradient centrifugation, and the mononuclear cell layer and erythrocyte/granulocyte pellet were isolated and stained for fluorescence-activated cell sorting as previously described. The sorted populations were neutrophils (live, CD16+F4/80intCD3−CD14−CD19−), eosinophils (live, CD16−F4/80hiCD3−CD14−CD19−), T cells (live, CD3+CD16−), monocytes (live, CD14+CD3−CD16−CD56−), natural killer (NK) cells (live, CD56+CD3−CD16−CD14−) and B cells (live, CD19+CD3−CD16−CD14−CD56−).
Samples from human subjects were collected on NIAID IRB-approved research protocols to study eosinophilic disorders (NCT00001406) or to provide controls for in vitro research (NCT00090662). All participants gave written informed consent. Eosinophils were purified from peripheral blood by negative selection and frozen at −80 °C in TRIzol (Life Technologies). Purity was >97% as assessed by cytospin. RNA was purified according to the manufacturer’s instructions.
Expression analysis by qPCR was performed in a blinded manner by an individual not involved in sample collection or coding of these samples. Plasma IL-5 levels were measured by suspension array in multiplex (Millipore). The minimum detectable concentration was 0.1 pg ml−1. RAW 264.7 cells were obtained from ATCC and were not authenticated, but were tested for mycoplasma contamination biannually. Immortalized C57BL/6 macrophages were obtained as a generous gift from I. Brodsky. These cells were not authenticated, but were tested for mycoplasma contamination biannually. Sample sizes were estimated based on our preliminary phenotyping of Morrbid-deficient mice. Preliminary cell number analysis of eosinophils, neutrophils, and Ly6Chi monocytes suggested that there were very large differences between wild-type and Morrbid-deficient samples, which would allow statistical interpretation with relatively small numbers. No statistical methods were used to predetermine sample size. No animals were excluded from analysis. All experimental and control mice and human samples were run in parallel to control for experimental variability. The experiments were not randomized. Experiments corresponding to Fig. 3g–i and Fig. 4g–j were performed and analysed in a single-blinded manner. All other experiments were not blinded to allocation during experiments and outcome assessment. Correlation was determined by calculating the Spearman correlation coefficient. Half-life was estimated by calculating the one-phase exponential decay constant from the peak of labelling frequency to 36 h after peak labelling. P values were calculated using a two-way t-test, Mann–Whitney U-test, one-way ANOVA with Tukey post-hoc analysis, Kaplan–Meier Mantel–Cox test, and false discovery rate (FDR) as indicated. FDR was calculated using trimmed mean of M-values (TMM)-normalized read counts and the DiffBind R package as described in Extended Data Fig. 7c, d. 
All error bars indicate mean plus and minus the standard error of mean (s.e.m.).
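The half-life estimate described above (a one-phase exponential decay fitted from the peak labelling frequency to 36 h after the peak) can be illustrated with a minimal sketch. It assumes the decay runs to a plateau of zero and fits the model by linear least squares on log-transformed frequencies; the authors' actual fitting software and settings are not stated here, and the function name `fit_half_life` is hypothetical.

```python
import math

def fit_half_life(times_h, frequencies):
    """Fit y = y0 * exp(-k * t) and return (k, half_life).

    Assumption: plateau of zero, so ln(y) is linear in t and an
    ordinary least-squares line through (t, ln y) gives slope -k.
    times_h are hours after peak labelling; frequencies must be > 0.
    """
    logs = [math.log(y) for y in frequencies]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
    k = -sxy / sxx                 # decay constant per hour
    return k, math.log(2) / k      # half-life = ln(2) / k

# Illustrative (made-up) BrdU+ frequencies at 0, 12, 24 and 36 h after the peak
times = [0, 12, 24, 36]
freqs = [40.0 * math.exp(-0.05 * t) for t in times]
k, half_life = fit_half_life(times, freqs)
```

A nonlinear fit with a free plateau would be the closer analogue if labelling does not decay to zero; the log-linear form is used here only to keep the sketch dependency-free.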

No statistical methods were used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment. One cell line25 was used (subclone C10) in this study. Cell line identity was verified by its responsiveness to tamoxifen, resulting in conversion of cells into macrophages (quantified by surface marker expression of Mac1 and Cd14). Cells were tested for mycoplasma contamination and found to be negative. For the pooled RNAi screen, shRNAs were expressed from the LENC vector, which has been described previously12. For the primary screen validation, timeline experiment and immunofluorescence staining, mouse shRNAs were cloned individually into LENC. For the double knockdown experiment, shRNAs were cloned into LEPC (MSCV-mirE-PGK-Puro-IRES-mCherry). For the reprogramming dynamics experiment and chimaera mouse production, shRNAs were cloned into RT3CEPIN (TRE3G-mCherry-mirE-PGK-Puro-IRES-Neo). For reprogramming experiments with non-transgenic systems, previously published OKSM lentiviral vectors were modified to introduce promoters of different strength, which are described in the main figures. Packaging cells (Platinum-E Retroviral Packaging Cell Line) for producing retroviral particles were cultured in DMEM supplemented with 15% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM) and l-glutamine (4 mM) at 37 °C with 5% CO2. Mouse embryonic fibroblasts (MEF) were cultured in DMEM supplemented with 15% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), l-ascorbic acid (50 μM) at 37 °C with low oxygen (4.5% O2). iPS cells were derived in DMEM supplemented with 15% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), 1,000 U ml−1 LIF, 0.1 mM beta-mercaptoethanol, and 50 μg ml−1 ascorbic acid at 37 °C with 5% CO2 and 4.5% O2. 
For Tet-inducible OKSM expression, doxycycline was added at a concentration of 1 µg ml−1 (unless indicated otherwise). iPS cells for blastocyst injection were cultured on feeders in DMEM supplemented with 13% knockout serum replacement (Gibco), 2% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), l-ascorbic acid (50 μM), 1,000 U ml−1 LIF, beta-mercaptoethanol (50 µM), MEK inhibitor (PD0325901, 1 μM) and GSK3 inhibitor (CHIR99021, 3 μM) at 37 °C with 5% CO2. Conventional reprogramming media consisted of DMEM supplemented with 15% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), 1,000 U ml−1 LIF and 0.1 mM beta-mercaptoethanol, unless otherwise noted. For some experiments, media was supplemented with MEK inhibitor (1 μM), GSK3 inhibitor (3 μM), Dot1l inhibitor (1 μM) or ascorbate (50 μg ml−1). Reprogrammable MEFs containing either one or two copies of the Col1a1::tetOP-OKSM, Oct4–GFP and Rosa26 M2rtTA alleles10 were derived from E13.5 embryos. MEFs were prepared after carefully excluding internal organs, heads, limbs and tails. Tissues were chopped into small clumps using scalpels and trypsin and subsequently expanded in MEF medium at low O2 (4%). MEFs were frozen at passage 0 upon derivation and used at passages 1–3 for all downstream transduction and reprogramming experiments. MEFs were generally cultured at low O2 (4%) and supplemented with ascorbate to prevent replicative senescence before OKSM induction. Reprogramming experiments were initiated at low oxygen levels during doxycycline induction and completed at normal oxygen levels (20%) for experiments using miR-E vectors. MiR-30 assays were performed under normal oxygen levels. 
To generate Col1a1::tetOP-miR30-tRFP-Ren.713 and Col1a1::tetOP-miR30-tRFP-Chaf1a.164 shRNA knock-in MEFs, miR30-based shRNAs targeting Chaf1a.164 or Ren.713 were cloned into a targeting vector as previously described35 except that the GFP reporter was replaced with a turbo RFP reporter. ES cells harbouring the R26-M2rtTA allele were targeted with these constructs and mice were generated by blastocyst injection. MEFs were harvested using standard protocols. HSP cells were isolated from fetal livers of the same mid-gestation reprogrammable transgenic embryos used for MEF derivation, dissociated by vigorous pipetting with a 1 ml tip, filtered using a 35 μm nylon mesh, subjected to red blood cell lysis, cultured in RPMI/FBS media supplemented with stem cell factor (SCF), IL3 and IL6, and transduced as indicated in the schematic (Fig. 4a). Single shRNA clones were picked from the master library at CSHL, arrayed in 12 × 96-well plates and sequence-verified individually using miR-30 backbone primers. An additional 200 unmatched clones were re-picked and sequenced to allow maximum coverage of the library. Reprogrammable MEFs carrying the OKSM inducible cassette and constitutive rtTA (Col1a1::tetOP-OKSM; R26-M2rtTA) were seeded at 10^4 cells per well in 96-well plates in duplicate and infected with the corresponding freshly produced and filtered retroviral particles. At 48 h post-transduction, MEFs from each row of the 96-well plate were trypsinized and transferred to 6-well dishes coated with 0.2% gelatin in standard reprogramming media supplemented with doxycycline (2 µg ml−1) and G418 at 0.2 mg ml−1 for the first 6 days of OKSM expression. Doxycycline was withdrawn at day 12, allowing stable iPS cells to form. iPS cell colonies were then stained for alkaline phosphatase expression using the Vector Red Alkaline Phosphatase Substrate Kit (VectorLabs) according to the supplier’s protocol, and plates were scanned using a Perfection V500 Photo scanner (Epson). 
To determine relative reprogramming efficiencies (Fig. 1c and Supplementary Table 1), automated counting of iPS cell colonies was performed using the image-processing software CL-Quant (Nikon) and a custom algorithm provided by NIKON. Data were normalized to Ren.713 control. A miR-E-based chromatin library comprising 5,049 sequence-verified shRNAs targeting 615 known and predicted chromatin regulators was constructed by subcloning pools of sequence-verified miR-30 shRNAs into pLENC and combining them at equimolar concentrations into one pool12. This pool was transduced into MEFs carrying the Col1a1::tetOP-OKSM and R26-M2rtTA alleles, as well as a Pou5f1–EGFP reporter (termed Oct4–GFP) under conditions predominantly yielding a single retroviral integration in the genome. To generate a large number of independent biological replicates, primary MEFs from 4 triple transgenic embryos were transduced with the entire pool of 5,049 shRNAmiRs in 12 independent replicates at a representation of >100 cells per shRNA, yielding a total of 48 replicates (see Fig. 1b). After 36 h, MEFs were treated with 0.5 mg ml−1 G418 for 3 days and 0.25 mg ml−1 G418 for an additional 3 days. MEFs from each replicate were plated at densities of 500,000 cells per 15 cm dish 3 or 6 days post-transduction, and induced with doxycycline (1 µg ml−1) for 7 days in medium containing serum and LIF, supplemented with ascorbate (50 µg ml−1). After passaging for an additional 4 days in doxycycline-free ES cell media, Oct4–GFP-expressing cells were sorted from each replicate using a FACSAriaIII (BD Bioscience). Genomic DNA from infected MEFs (3d after infection) and sorted Oct4-GFP iPS cells from each replicate was isolated using proteinase K lysis, followed by two rounds of phenol extraction using PhaseLock tubes (5prime) and isopropanol precipitation. 
Templates for deep-sequencing were generated by PCR amplification of shRNA guide strands using primers that tag the product with standard Illumina adapters (p7+loop, CAAGCAGAAGACGGCATACGA[4-nt barcode]TAGTGAAGCCACAGATGT; p5+PGK, AATGATACGGCGACCACCGATGGATGTGGAATGTGTGCGAGG). For each sample, DNA was amplified in 12 parallel 50 μl PCR reactions using Encyclo Polymerase (Evrogen). PCR products were combined for each sample, precipitated and purified on a 2% agarose gel. Samples were analysed on an Illumina HiSeq 2500 and sequenced using a primer that reads in reverse into the guide strand (mirEEcoR1Seqprimer, TAGCCCCTTGAAGTCCGAGGCAGTAGGCA). Sequence processing was performed using a customized Galaxy platform. In all 96 iPS cell samples (48 biological replicates, 3 or 6 days knockdown before OKSM expression) the normalized reads of each shRNA were divided by the normalized reads in MEFs 3 days after viral transduction, and the resulting ratio was used to calculate a score for each shRNA in each replicate (default score = 0; score = 1 if ratio >1, score = 3 if ratio >10). Scores of each shRNA in 48 replicates were added separately for the day 3 and day 6 time point, yielding a sum score to estimate the overall enrichment of each shRNA over all replicates. All shRNA sequences and primary results from the arrayed and the multiplexed screen are provided in Supplementary Tables 1 and 2, respectively. shRNAs are identified by numbers (e.g. Ren.713, Chaf1a.164), defined as the 5′ nucleotide of the guide binding site in the target transcript at the time of shRNA design. Retroviral constructs were introduced into Platinum-E Retroviral Packaging cells using calcium phosphate transfection or lipofection as previously described36. shRNAs were transduced into primary MEFs carrying single copies of the Col1a1::tetOP-OKSM and R26-M2rtTA alleles as well as the Oct4–EGFP reporter. 
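The enrichment scoring used for the pooled screen (score 0 by default, 1 if the iPS/MEF read ratio exceeds 1, 3 if it exceeds 10, summed over replicates for one time point) can be sketched as below. The function names and the example read values are hypothetical, and the sketch assumes every shRNA has non-zero normalized reads in the MEF reference sample.

```python
def shrna_score(ratio):
    """Score one shRNA in one replicate from its iPS/MEF read ratio."""
    if ratio > 10:
        return 3
    if ratio > 1:
        return 1
    return 0  # default score

def sum_score(ips_reads, mef_reads):
    """Sum the per-replicate scores of one shRNA for one time point.

    ips_reads: normalized reads in sorted iPS cells, one value per replicate.
    mef_reads: matched normalized reads in MEFs 3 days after transduction
               (assumed > 0 so the ratio is defined).
    """
    return sum(shrna_score(i / m) for i, m in zip(ips_reads, mef_reads))

# Three illustrative replicates: ratios 5, 0.5 and 30 give scores 1, 0 and 3
total = sum_score([50.0, 5.0, 300.0], [10.0, 10.0, 10.0])
```

With 48 replicates per time point, the sum score under this scheme ranges from 0 to 144.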
For some experiments, Oct4–tomato knock-in MEFs were used; the Oct4–tomato allele was generated equivalently to the Oct4–GFP allele37. For transduction, 180,000 cells were plated per well in a 6-well dish; all vectors were transduced in biological triplicate. After 36 h, transduced cells were selected with 0.5 mg ml−1 G418 for 3 days and 0.25 mg ml−1 G418 for an additional 3 days. Then, 3 days after shRNA transduction, infected cells were washed with PBS (1×) and trypsinized with Trypsin-EDTA (1×), and 20,000 cells were plated into one well of a 6-well dish. OKSM expression was induced for 7 days and cells were cultured in DMEM supplemented with 15% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), 1,000 U ml−1 LIF, 0.1 mM beta-mercaptoethanol, 50 μg ml−1 sodium ascorbate and 1 μg ml−1 doxycycline at 37 °C with low oxygen (4.5% O2) and 5% CO2. After 7 days of OKSM expression, cells were cultured for an additional 4 days without doxycycline to withdraw OKSM transgene expression at 37 °C with 5% CO2 and ambient oxygen. Following trypsinization, cells were analysed for Oct4–GFP expression using a FACS BD LSRFortessa (BD Biosciences); data were analysed using FlowJo. Alkaline phosphatase activity was measured using an enzymatic assay for alkaline phosphatase (VECTOR red alkaline phosphatase (AP) substrate kit) according to the manufacturer’s protocol. Nanog immunohistochemistry of iPS cell colonies was performed as previously described15 using anti-Nanog antibody (ab80892, Abcam) at a dilution of 1:500. Cells were permeabilized with 0.2% Triton-X before blocking and antibody incubation. Triple transgenic MEFs were reprogrammed as described, using a tetOP-inducible shRNAmiR expression vector RT3CEPIN (TRE3G-mCherry-mirE-PGK-Puro-IRES-Neo). Oct4–GFP+ iPS cells generated with experimental shRNAs were sorted on day 7 of OKSM transgene expression. 
iPS cells were plated on feeders and cultured in DMEM supplemented with 13% knockout serum replacement, 2% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), l-ascorbic acid (50 μM), 1,000 U ml−1 LIF, beta-mercaptoethanol, MEK inhibitor (PD0325901, 1 μM) and GSK3 inhibitor (CHIR99021, 3 μM) at 37 °C with 5% CO2. Polyclonally derived iPS cells were microinjected into B6 albino blastocysts to allow identification of chimaeras based on coat colour markers. Male chimaeras were mated to B6 albino females to allow identification of germline transmission based on coat colour. Triple transgenic reprogrammable MEFs were transduced with shRNA expressed from LEPC as previously described and cultured in MEF media. Then, 3 days after retroviral infection, cells were sorted for mCherry expression and 40,000 cells were re-plated per well of a 6-well dish. On the next day, cells were infected with the corresponding second shRNA expressed from LENC. After a further 24 h, cells were cultured in DMEM supplemented with 15% FBS, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, sodium pyruvate (1 mM), l-glutamine (4 mM), 1,000 U ml−1 LIF, 0.1 mM beta-mercaptoethanol, 50 μg ml−1 sodium ascorbate, and 1 μg ml−1 doxycycline at 37 °C with low oxygen (4.5% O2). At 36 h after the second shRNA transduction, cell culture media was supplemented with 0.5 mg ml−1 G418 for 3 days and 0.25 mg ml−1 G418 for an additional 3 days to ensure double infection. After 7 days of OKSM transgene induction, cells were cultured in ES cell medium for an additional 4 days without doxycycline to select for transgene-independent colonies at 37 °C with 5% CO2. Cells were analysed for Oct4–GFP expression using a FACS BD LSRFortessa (BD Biosciences). The effect of double knockdown of targets on iPS cell formation was determined by calculating the ratio of Oct4–GFP+ to Oct4–GFP− cells at day 11 relative to an empty vector control. 
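The double-knockdown readout above is a ratio of ratios: the Oct4–GFP+ to Oct4–GFP− ratio of a sample divided by the same ratio in the empty vector control. A minimal sketch follows, with hypothetical function names and made-up event counts; it assumes both GFP− counts are non-zero.

```python
def gfp_ratio(gfp_pos, gfp_neg):
    """Ratio of Oct4-GFP+ to Oct4-GFP- events in one sample."""
    return gfp_pos / gfp_neg

def relative_effect(sample, control):
    """Sample GFP+/GFP- ratio relative to the empty vector control.

    sample and control are (gfp_pos, gfp_neg) event counts at day 11.
    """
    return gfp_ratio(*sample) / gfp_ratio(*control)

# Illustrative counts only: 30% GFP+ in the knockdown versus 5% in the control
effect = relative_effect((3000, 7000), (500, 9500))
```

Because the ratio uses GFP− cells as the denominator rather than total cells, the value grows faster than the percentage of GFP+ cells at high reprogramming efficiencies.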
Triple transgenic reprogrammable MEFs carrying the Col1a1::tetOP-OKSM and R26-M2rtTA alleles as well as an Oct4–GFP reporter were reprogrammed in replicate wells as previously described. Starting on day 4 of OKSM transgene expression, cells were analysed for Oct4–GFP expression in 24 h intervals using a BD LSRFortessa (BD Biosciences). In addition, 20% of cells were replated and cultured under doxycycline-free ES cell conditions. After 13 days, cells were analysed using a FACS BD LSRFortessa to determine the minimum time required for the establishment of transgene-independent iPS cells. To determine Nanog expression dynamics, triple transgenic reprogrammable MEFs were reprogrammed in independent wells and analysed every 24 h. Starting on day 4 of OKSM transgene expression, cells were trypsinized with trypsin-EDTA (1×), washed with PBS (1×) and fixed with paraformaldehyde (PFA) (4%) for 30 min. Afterwards, cells were washed with PBS (1×) and stored at 4 °C. After 11 days of reprogramming, cells were stained with anti-Nanog antibody (rabbit polyclonal, 1:400, Abcam) and analysed using a FACS BD LSRFortessa (BD Biosciences). Pecam staining of reprogramming intermediates was performed as previously described18. All samples were analysed on a MACSQuant fluorescence cytometer (Miltenyi). sgRNAs targeting the Chaf1a locus were cloned into a lentiviral vector harbouring the wild-type Cas9 coding region, an sgRNA expression cassette, and a Thy1.1 reporter transgene. Successfully transduced cells were purified by FACS using Thy1.1 expression, cultured for 7 days to allow for genome editing to occur and induced with doxycycline for one week before measuring the fraction of Oct4–GFP+ cells at day 11. Single Oct4–GFP+ iPS cells were then plated to generate clonal iPS cell cultures for PCR amplification of CRISPR/Cas9-induced genomic modifications, followed by Sanger sequencing. sgRNAs were PCNA-1: GAAGCGCATTAAGGCAGAAA and PCNA-2: TTGGGAGCCTGCGGAGTCTT. 
Induced neurons were generated as described in the experimental scheme (Fig. 4d). CAF-1 or Renilla RNAi inducible transgenic MEFs were transduced with Ascl1-inducible lentivirus, exposed to doxycycline 24 h post-induction, cultured in MEF media for the first 48 h and switched to serum-free neuronal media (N3B27) supplemented with doxycycline for an additional 11 days. Cultures were fixed and stained for Map2 as previously described24. Pre-B cells (C10 line)25 were cultured in RPMI medium, 10% charcoal-stripped FBS (Invitrogen), 2 mM l-glutamine, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin, 55 μM beta-mercaptoethanol. Pre-B cells were transduced with lentiviral pLKO vectors obtained from the Broad Institute’s RNAi consortium (empty vector ‘null control’ or vector carrying stem-loop shRNAs targeting Chaf1a and Chaf1b subunits). Following selection of transduced cells with puromycin, cells were seeded at 10^6 cells per ml and supplemented with oestradiol (E2) and macrophage cytokines (IL3 and CSF) to induce macrophage transdifferentiation as previously described25. All time points were analysed for Cd14 and Mac1 expression by flow cytometry on the same day. RNA was extracted (Qiagen RNeasy mini kit) and reverse transcribed (GE illustra ready-to-go RT–PCR beads) according to the supplier’s instructions. Quantitative PCR was performed using SybrGreen and a BIO-RAD CFX connect cycler. Primers used were: b-Act-F: GCTGTATTCCCCTCCATCGTG; b-Act-R: CACGGTTGGCCTTAGGGTTCAG; Ube2i-R: GGCAAACTTCTTCGCTTGTGCTCGGAC; Ube2i-F: ATCCTTCTGGCACAGTGTGCCTGTCC; Chaf1b-R: GGCTCCTTGCTGTCATTCATCTTCCAC; Chaf1b-F: CACCGCCGTCAGGATCTGGAAGTTGG; Chaf1a-R: GTGTCTTCCTCAACTTTCTCCTTGG; Chaf1a-F: CGCGGACAGCCGCGGCCGTGGATTGC. Whole-cell lysates from reprogramming intermediates were run on 4–20% gradient SDS-polyacrylamide gels and transferred to nitrocellulose membrane (Bio-Rad) by standard methods. 
Membranes were blocked for 1 h in 5% non-fat dry milk in 1× TBS with 0.05% Tween-20 (TBST), rinsed, and incubated with primary antibody diluted in 3% BSA in TBST overnight at 4 °C. The following primary antibodies were used: anti-Chaf1a (sc-10206, Santa Cruz), anti-Chaf1b (sc-393662, Santa Cruz), anti-TBP (ab818, Abcam), anti-Ube2i (4786, Cell Signaling), anti-PCNA (D3H8P, Cell Signaling), HRP conjugate anti-actin (AC-15, Sigma). Blots were washed in TBST and incubated with HRP-conjugated secondary antibodies for semi-quantitative western blot analysis, or with IRDye 800CW or IRDye 680RD secondary antibodies for quantitative westerns, as indicated. Secondary antibodies for both methods were incubated in 5% milk in TBST for 1 h at room temperature (except for anti-β-actin-peroxidase antibody, which was incubated for 15 min), and blots were washed again. HRP signal was detected by Enhanced ChemiLuminescence (Perkin Elmer). Fluorescent infrared signal was detected using the LI-COR Odyssey imaging system. To generate ATAC-seq libraries, 50,000 cells were used and libraries were constructed as previously described27. Briefly, cells were washed in PBS twice, counted and nuclei were isolated from 100,000 cells using 100 μl hypotonic buffer (10 mM Tris pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP40) to generate two independent transposition reactions. Nuclei were split in half and treated with 2.5 μl Tn5 Transposase (Illumina) for 30 min at 37 °C. DNA from transposed nuclei was then isolated and PCR-amplified using barcoded Nextera primers (Illumina). Library quality control was carried out using a high-sensitivity DNA Bioanalyzer assay and Qubit measurement, and libraries were sequenced using paired-end sequencing (PE50) on the Illumina HiSeq 2500 platform. For all ChIP experiments, 10^7 reprogramming intermediates were collected per library. Chromatin precipitation assays were performed as previously described38 using goat polyclonal anti-Sox2 antibody (AF2018, R&D). 
Briefly, cells were cross-linked on plate in 1% methanol-free formaldehyde and snap-frozen in liquid nitrogen until processed. Nuclei were isolated using 1 ml of cell lysis buffer (20 mM Tris pH 8, 85 mM KCl, 0.5% NP40 and 1× HALT protease inhibitor cocktail), resuspended in nuclear lysis buffer (10 mM Tris-HCl pH 7.5, 1% NP40, 0.5% Na deoxycholate, 0.1% SDS, 1× HALT protease inhibitor cocktail) and sonicated using optimized pulses of a Branson sonifier (1 min ON/OFF pulses for 5 cycles) for ChIP-seq libraries and an S220 Covaris sonicator (settings: duty cycle 5%, intensity 6, cycles/burst 200, pulse length 60 s, 20 cycles, 8 °C) for SONO-seq input preparations. Sonications were verified for both methods using the 2100 Bioanalyzer. Immunoprecipitations were carried out by first adjusting the salt concentration in sheared chromatin to 167 mM NaCl, then adding antibodies (6 μg of Sox2 antibody) and incubating for 3–4 h at 4 °C. For each IP reaction, 50 μl Protein G Dynabeads (Invitrogen) were prepared by washing 2 to 3 times in ChIP dilution buffer (16.7 mM Tris-HCl pH 8.1, 167 mM NaCl, 0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA) and added for one additional hour to pull down bound chromatin. Bead complexes were washed 6 times in RIPA buffer (20 mM Tris-HCl pH 8.1, 1 mM EDTA, 140 mM NaCl, 1% Triton X-100, 0.1% SDS, 0.1% Na deoxycholate), then twice with RIPA buffer with high salt concentration (500 mM), then twice in LiCl buffer (10 mM Tris-HCl pH 8.1, 1 mM EDTA, 1% DOC, 1% NP40, 250 mM LiCl) and twice in TE buffer. Complexes were then eluted and reverse crosslinked in 50 μl ChIP elution buffer (10 mM Tris-HCl pH 8, 5 mM EDTA, 300 mM NaCl, 0.1% SDS) and 8 μl of reverse crosslinking buffer (250 mM Tris-HCl pH 6.5, 1.25 M NaCl, 62.5 mM EDTA, 5 mg ml−1 proteinase K, 62.5 μg ml−1 RNase A) by incubation at 65 °C for 6 h. DNA was isolated using Ampure SPRI beads and yield was quantified using a Qubit fluorometer. 
ChIP-seq libraries were constructed from 10 ng of immunoprecipitated DNA using the NEBNext ChIP-seq library prep reagent set for Illumina (New England Biolabs), following the supplier’s protocol. Briefly, purified DNA was end-repaired and dA-tailed. Following subsequent ligation of sequencing adaptors, ligated DNA was size-selected to isolate fragments in the range of 300–550 bp in length using E-Gels. Adaptor-ligated fragments were enriched in a 14-cycle PCR using Illumina multiplexing primers. Libraries were purified, analysed for correct size distribution using dsDNA High Sensitivity Chips on a 2100 Bioanalyzer (Agilent), pooled and submitted for single-end 50 bp Illumina GAII high-throughput sequencing. The reads were aligned to the mouse genome (mm9) using Bowtie with the unique mapping option39. The smoothed tag density profiles were generated using the get.smoothed.tag.density function of the SPP R package40 with a 100-bp Gaussian kernel, a 50-bp step and library size normalization. The positions of promoters and enhancers in ES cells were obtained from a publicly available data set41. To assess the significance of the difference in the enrichment values between CAF-1 and Renilla knockdown samples, a paired Wilcoxon rank sum test was used. The reads were aligned to mm9 using BWA version 0.7.8 with -q 5 -l 32 -k 2 and the paired option42. Non-primary mapping, failed QC, duplicates and non-paired reads were filtered. If one paired-end was mapped to one chromosome and the other end was mapped to a different chromosome, the read was not included. The reads aligned to chrM were also removed. Only uniquely mapped reads were used. The read density profiles were generated using 150-bp windows with a 20-bp step and were normalized by the library size. For the comparison between Chaf1 shRNA and Renilla shRNA samples, the read density profiles were further normalized using the mean values of all annotated promoters from mm9. 
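The windowed read density described above (150-bp windows, 20-bp step, library-size normalization) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: reads are reduced to single positions on one chromosome, normalization is expressed as reads per million (the exact scaling is not specified in the text), and the function name `read_density` is hypothetical.

```python
def read_density(read_positions, region_start, region_end, library_size,
                 window=150, step=20):
    """Return [(window_start, reads_per_million), ...] over a region.

    read_positions: positions of uniquely mapped reads on one chromosome.
    library_size:   total number of reads retained after filtering.
    Windows are half-open intervals [start, start + window).
    """
    profile = []
    for start in range(region_start, region_end - window + 1, step):
        end = start + window
        count = sum(1 for p in read_positions if start <= p < end)
        profile.append((start, count * 1e6 / library_size))
    return profile

# Three illustrative reads in a 300-bp region, library of one million reads
profile = read_density([100, 110, 260], 0, 300, library_size=1_000_000)
```

The linear scan per window is adequate for a sketch; a genome-scale version would sort positions once and use a sliding pointer or binary search.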
For meta-analysis, the reads from Chaf1a.164 and Chaf1a.2120 knockdown samples were merged. The coordinates of promoters and enhancers in ESCs and MEFs were obtained from a publicly available data set41. The coordinates of the super-enhancers for the meta-gene plot were used from a recently published data set43. Each super-enhancer region (with 5-kb margins) was divided into 101 bins and the tag density signals were averaged in each bin. Significantly enriched regions were detected using Hotspot44 with FDR = 0.01. A one-sided paired Wilcoxon rank sum test was used for the comparison in the enrichment values between CAF-1 and Renilla knockdown samples. To classify the genomic locations of the peaks (promoters, coding exons, introns, intergenic regions, 5′ UTR and 3′ UTR), the annotations for mm9 were downloaded from UCSC (https://genome.ucsc.edu/cgi-bin/hgTables). The differential sites between CAF-1 and Renilla knockdown samples were identified using DiffBind with P = 0.05 for the consensus ATAC-seq peaks after normalization with TMM (trimmed mean normalization method)45. DiffBind uses statistical routines developed in edgeR46. A one-sided paired Wilcoxon rank sum test was used for the comparison in the enrichment values between CAF-1 and Renilla knockdown samples. The reads were aligned and tag density profiles were generated as in SONO-seq analyses. The log2-fold enrichment profiles were generated using get.smoothed.enrichment.mle in the SPP R package. The profiles were normalized by the background-scaling method using non-enriched regions. A paired Wilcoxon rank sum test was used for the comparison in the Sox2 enrichment values between Chaf1a and Renilla knockdown samples. For Sox2 peak comparison between CAF-1 and Renilla knockdown samples, reads were first subsampled to make the sequencing depth the same for each condition (number of peaks called tends to increase for greater sequencing depth). 
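Subsampling to a common sequencing depth, as described above for the Sox2 peak comparison, can be sketched as random sampling without replacement down to the smallest library. The function name, the in-memory list representation of reads and the fixed seed are assumptions made for illustration; the tool the authors used for this step is not stated.

```python
import random

def subsample_to_equal_depth(read_sets, seed=0):
    """Downsample every condition to the depth of the smallest one.

    read_sets: dict mapping condition name -> list of reads (any objects).
    Sampling is without replacement; the seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    depth = min(len(reads) for reads in read_sets.values())
    return {name: rng.sample(reads, depth)
            for name, reads in read_sets.items()}

# Illustrative libraries of 100 and 60 reads; both end up with 60 reads
libraries = {"CAF-1": list(range(100)), "Renilla": list(range(60))}
equalized = subsample_to_equal_depth(libraries, seed=1)
```

Equalizing depth before peak calling removes the depth-dependent increase in peak number, so differences in peak counts can be attributed to the condition rather than to library size.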
The significantly enriched peaks compared to input were detected using the SPP find.binding.positions function with default parameters. The overlapped peaks were compared with a margin of 200 bp. For unique peaks, we first identified the peaks that were present only in one condition (CAF-1 or Renilla knockdown) and compared the enrichment values (input-subtracted tag counts) between CAF-1 and Renilla knockdown. If the ratios between the enrichment values were greater than two-fold, we considered the peaks as ‘unique’ for one of the conditions. We used Sox2 ChIP-seq data in ES cells from publicly available data sets47 and analysed the data in the same way as described above. ChIP-seq data were mapped to the mouse genome (mm9) with Bowtie 0.12.7 (ref. 39) allowing up to 3 mismatches, retaining uniquely mapping reads. To assess H3K9me3 signal distribution genome-wide, we divided the genome into 5-kb intervals, and for each interval, we calculated the ratio of RPM-normalized signal in the IP and input samples. Intervals with fewer than 10 reads in the input samples (~10% of all) were excluded from further analyses due to low coverage. Intervals overlapping specific regions were extracted using the bedtools suite48. RRR annotations were obtained from ref. 29, and signal across all included 5-kb intervals was averaged. For H3K9me3 enrichment over transposable element (TE) bodies, we used the mm10 genome version, as this release contains the most recent TE annotations. We extracted the genomic regions corresponding to TE families annotated in the mm10 RepeatMasker track in the UCSC genome browser (http://genome.ucsc.edu/), and calculated the normalized read counts in IP to input samples for each family. Due to the repetitive nature of TEs, we further validated all results considering reads that map to multiple (up to 10,000) positions in the genome, and scaling read counts by the number of valid alignments. 
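The genome-wide H3K9me3 signal calculation above (5-kb intervals, ratio of RPM-normalized IP to input signal, exclusion of intervals with fewer than 10 input reads) can be illustrated with a short sketch. The function names, the dictionary layout keyed by (chromosome, interval start) and the example counts are hypothetical; read counting per interval is assumed to have been done upstream.

```python
def rpm(count, total_reads):
    """Reads per million for one interval."""
    return count * 1e6 / total_reads

def interval_enrichment(ip_counts, input_counts, ip_total, input_total,
                        min_input_reads=10):
    """IP/input RPM ratio per interval, skipping low-coverage intervals.

    ip_counts, input_counts: dicts keyed by (chrom, interval_start).
    Intervals with fewer than min_input_reads input reads are excluded.
    """
    ratios = {}
    for interval, input_count in input_counts.items():
        if input_count < min_input_reads:
            continue  # low input coverage, excluded from further analysis
        ip_count = ip_counts.get(interval, 0)
        ratios[interval] = rpm(ip_count, ip_total) / rpm(input_count, input_total)
    return ratios

# Two illustrative 5-kb intervals; the second has only 5 input reads
ip = {("chr1", 0): 40, ("chr1", 5000): 3}
inp = {("chr1", 0): 20, ("chr1", 5000): 5}
ratios = interval_enrichment(ip, inp, ip_total=2_000_000, input_total=1_000_000)
```

In the example the first interval has twice the input's raw count in a library twice as deep, so its normalized ratio is 1.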
This threshold for multiple mapping positions was chosen as it was previously shown to approximate results obtained allowing unlimited mapping positions, but at a significantly improved computation speed49. In all analyses, signal estimates based on uniquely mapping reads and based on reads mapping to multiple genomic positions produced similar results. The microarray data were preprocessed using Affymetrix Expression Console version and normalized by the RMA procedure. The limma Bioconductor package was used to select differentially expressed genes with false discovery rate (FDR) ≤ 0.05 and at least two-fold change50. We performed functional analysis with gene set enrichment analysis (GSEA)51 using the limma moderated t-statistic to rank the genes. ATAC-seq peaks were separately called for CAF-1 and Renilla knockdown at days 0, 3 and 6 as described above. To determine which genes from Supplementary Table 5 may be affected by altered ATAC-seq signals, we incorporated long-range interaction data between promoters and enhancers based on ChIA-PET analysis in ES cells52. If there was no matched pair from the ChIA-PET tables, the regions proximal to the TSSs of genes (<4 kb) were taken. The regions were overlapped with the union ATAC-seq peaks of each condition. For the overlapped peaks, the enrichment values (log tag counts) were compared between CAF-1 and Renilla knockdown samples with a two-sided paired Wilcoxon rank sum test. RNA sequencing data were first pre-processed using Reaper53 to remove any Illumina adaptor sequences and computationally depleted of ribosomal RNA sequences (GenBank identifiers: 18S, NR_003278.3; 28S, NR_003279.1; 5S, D14832.1; and 5.8S, K01367.1) using Bowtie 0.12.7 allowing three mismatches39. For protein-coding gene expression analyses, pre-processed data were mapped to the mouse genome (mm10) using Bowtie 0.12.7 (ref. 39) allowing three mismatches, and retaining uniquely mapping reads. 
Mouse transcript annotations were obtained from RefSeq, and reads corresponding to the exonic regions of each gene were calculated using a custom Python script. For overlapping genes, reads corresponding to overlapping regions were divided equally. Gene differential expression was analysed using the DESeq R package54. For TE expression analyses, data were mapped to the mm10 genome with 0 mismatches and considering reads that map to up to 10,000 genomic positions as in ChIP sequencing analyses. We then calculated the number of reads corresponding to TE regions annotated by the UCSC RepeatMasker track, scaling by the number of valid alignments for each read. Scaled reads for each TE family were summed, and normalized as RPM. Heatmaps were generated using the gplots R package, and differential expression analyses were performed using the DESeq R package54. Comparisons of RNA-sequencing results from analyses based on uniquely mapping reads, and based on reads mapping to multiple genomic positions, showed very similar results. Unpaired Student t-test was used for statistical analysis in replicates of cell biology experiments. All error bars represent s.d. of independent biological replicates as indicated. A P value of <0.05 was considered statistically significant. Numbers of replicate experiments (n) are shown in figure legends. All graphs with no error bars represent n = 1. To assess significant differences in signal enrichment at ESC promoters, enhancers or super-enhancers by SONO-seq, ATAC-seq and ChIP-seq analysis upon CAF-1 knockdown or Renilla knockdown, a paired Wilcoxon rank sum test was used, where it is assumed that populations do not follow normal distributions. To identify differential ATAC-seq peaks between CAF-1 and Renilla knockdown samples, negative binomial models were used. 
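The TE quantification described above, in which each alignment of a multi-mapping read is down-weighted by that read's number of valid alignments before summing per family and converting to RPM, can be sketched as follows. The tuple layout, the function name `te_family_rpm` and the use of total mapped reads as the RPM denominator are assumptions for illustration only.

```python
from collections import defaultdict

def te_family_rpm(alignments, total_mapped_reads):
    """Sum alignment-scaled reads per TE family and normalize as RPM.

    alignments: iterable of (read_id, te_family, n_valid_alignments),
                one tuple per alignment that overlaps an annotated TE.
    Each alignment contributes 1 / n_valid_alignments, so a read counts
    as at most one read in total across all of its alignments.
    """
    scaled = defaultdict(float)
    for _read_id, family, n_valid in alignments:
        scaled[family] += 1.0 / n_valid
    return {family: value * 1e6 / total_mapped_reads
            for family, value in scaled.items()}

# r1 maps uniquely to L1; r2 maps twice (L1 and IAP); r3 maps four times,
# with one of its alignments falling in an IAP element
alignments = [("r1", "L1", 1), ("r2", "L1", 2), ("r2", "IAP", 2), ("r3", "IAP", 4)]
rpm_by_family = te_family_rpm(alignments, total_mapped_reads=1_000_000)
```

Fractional weighting keeps highly repetitive families from being inflated simply because their reads align to many copies.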
CiA transgenic MEFs carrying an array of Gal4 binding sites (UAS elements) upstream of the endogenous Oct4 promoter and a GFP reporter in place of the Oct4 coding region30 were immortalized with SV40-large T antigen and subcloned. Two clonal derivatives of these MEFs were infected with retroviral LENC vectors expressing Chaf1a, Chaf1b or Renilla shRNA (Fig. 5 and data not shown). Cells were subsequently transduced with lentiviral vectors expressing either Gal4 alone or Gal4–VP16 in combination with a puromycin resistance cassette. Following drug selection, Oct4–GFP expression was measured by flow cytometry after 10 days.

No statistical methods were used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment. The strains used in this study were derived from the base strains JYL1129 and JYL1130, haploid W303 yeast strains with genotypes MATa, STE5pr-URA3, ade2-1, his3Δ::3xHA, leu2Δ::3xHA, trp1-1, can1::STE2pr-HIS3 STE3pr-LEU2 and MATα STE5pr-URA3 ade2-1 his3Δ::3xHA, leu2Δ::3xHA, trp1-1, can1::STE2pr-HIS3 STE3pr-LEU2 respectively (provided by J.-Y. Leu). Note these strains contain nutrient markers driven by promoters that are specific to haploid cells (STE5pr-URA3) and either mating type a (STE2pr-HIS3) or mating type α (STE3pr-LEU2)31. We identified a likely non-functional open reading frame (YCR043C) as an ideal target for insertion of mating-type-specific drug resistance markers close to the MAT locus. We amplified flanking regions from genomic DNA obtained from the YCR043C deletion mutant of the S. cerevisiae whole-genome deletion collection32 using primers KANampFw and KANampRv (Supplementary Data 2) and integrated this product at the YCR043C locus of JYL1129 to generate strain MJM64. We then amplified the HPHB gene from plasmid pJHK137 (provided by J. Koschwanez) using primers HYGampFw and HYGampRv (Supplementary Data 2) and integrated at the YCR043C locus of JYL1130 to generate strain MJM36. We founded 12 mating type a lines using strain MJM64 and 12 mating type α lines using strain MJM36. Each of our 6 sexual populations consists of one specific pair of these MATa and MATα lines. The other 6 MATa and 6 MATα lines were designated as asexual controls (a total of 12 asexual controls). Between sexual cycles, we propagated these lines at 30 °C in unshaken round bottom 96-well plates containing 128 μl of yeast extract peptone dextrose (YPD) with daily 1:2^10 dilutions using a Biomek FX liquid handling robot (Beckman Coulter).
Pairs of MATa and MATα lines representing a single sexual population were propagated independently in this mitotic phase. As previously described33, this protocol results in approximately ten generations per day and an effective population size of N ≈ 10^5. Aliquots from generation 30 of each 90-generation cycle were mixed with glycerol to 25% and kept at −80 °C for long-term storage. After each 90 generations of asexual propagation, we initiated sexual cycles in the sexual populations. In each sexual cycle, we mixed and mated each pair of MATa and MATα lines, sporulated the resulting six diploid populations, isolated a and α subpopulations, and used these to initiate another 90 generations of mitotic growth (Extended Data Fig. 1). To mate our lines we mixed a and α haploids, spotted onto YPD plates, and then incubated at 30 °C. After 5 h, cells were scraped from the plate, resuspended in PBS buffer solution and then plated on YPD agar containing hygromycin (300 μg ml−1) and G418 (200 μg ml−1) to select for diploids. For sporulation, 10 μl of saturated diploid culture was inoculated into 1 ml of yeast peptone acetate liquid media for incubation on a roller drum at 21 °C. After 12–15 h, cells were pelleted, resuspended in 1 ml of 1 M KOAc and then incubated at room temperature with agitation in a roller drum. After 3 days, the presence of spores was confirmed by microscope. We then pelleted and resuspended cells in Zymolase solution (Zymo Research, 0.4 U μl−1) to digest spore walls and eliminate the majority of unmated diploids. To ensure that only mated and sporulated individuals survived this treatment, the zymolase lysate was divided, with one half plated onto defined amino-acid dropout media CSM (−uracil, −leucine) to select for α haploids, and the other half plated onto CSM (−uracil, −histidine) to select for a haploids.
After 24 h of growth at 30 °C, the lawn of cells was washed from plates and diluted into liquid CSM (−uracil, −leucine) or CSM (−uracil, −histidine) and propagated for 24 h. We used a dilution series to estimate the population size of this lawn, to confirm that this procedure did not lead to a population size bottleneck compared with the effective population size. Cultures were checked for diploids by plating a sample on YPD containing G418 and hygromycin to quantify the number of unsporulated diploids that survive haploid selection. We found that diploid leakage was never more than 0.1% (see Extended Data Table 1 for details). These cultures were diluted into YPD and propagated for 90 generations before the sexual cycle was repeated. Asexual control populations were maintained in the same conditions as sexuals wherever possible, with the exception of sporulation, during which time these populations were kept at 17 °C without dilution or agitation. In principle, sexual and asexual populations could adapt differentially to the conditions specific to the sexual and asexual treatments. To test whether this effect could drive any differences between sexual and asexual lines, we measured the relative fitness of all evolved lines compared with the ancestor in both the sporulation and the 17 °C treatment conditions. Specifically, we acclimatized six replicates of each evolved strain to YPD for 24 h and then mixed each with a fluorescently marked ancestral strain in equal proportions. We subjected three of these replicate populations of each evolved strain to the 17 °C treatment (plates were sealed and incubated at 17 °C for 4 days) and the other three to the sporulation treatment (incubation for 1 day in yeast peptone acetate liquid media, followed by 3 days in 1 M KOAc at room temperature). 
We used flow cytometry (Fortessa, BD Biosciences) to measure the ratio of the two competing types immediately after mixing and again immediately after the 4-day treatment, counting approximately 20,000 cells for each measurement. We found that both sexual and asexual evolved lines performed better than the ancestor in 17 °C treatment and worse in the sporulation treatment (Extended Data Fig. 2). However, the effects of the sporulation and 17 °C treatments did not vary systematically between evolved sexual and asexual populations (two-sided t-test, P = 0.5 and P = 0.8 respectively), and averaged over a 90-generation cycle any differences were small compared with the gains in fitness attained during adaptation to YPD. Thus there is no evidence that adaptation to sporulation or 17 °C played any role in our results. We also tested whether conditions specific to the asexual treatment (4 days at 17 °C without dilution) or the sexual treatment (4 days of sporulation without dilution) caused variation in the number of mutations that occur in sexual and asexual lines. We assayed mutation rate by counting the number of spontaneous 5-fluoroorotic acid (5-FOA) resistant mutants that arose in independent cultures of the ancestral W303 strain. Specifically, we propagated 54 populations in a microwell plate containing 128 μl YPD. After one dilution cycle, we plated 18 of these cultures on agar plates containing SC-uracil supplemented with 1 mg ml−1 5-FOA (Sigma/Aldrich), and we counted the number of 5-FOA-resistant mutants in each culture. Of the remaining 36 cultures, we incubated 18 for 4 days at 17 °C in a microplate, and put 18 through our sporulation cycle (1 day in yeast peptone acetate and 3 days in 1 ml of KOAc). We then plated both sets of cultures on selective media and counted the total number of mutants in each (Extended Data Table 2). We then calculated the number of mutations per culture (m) using the Ma–Sandri–Sarkar maximum likelihood method34. 
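As an illustration of the Ma–Sandri–Sarkar estimate of m mentioned above, the following sketch uses the standard recursion for the Luria–Delbrück distribution, p_0 = e^(−m) and p_r = (m/r) Σ_{i<r} p_i/(r − i + 1), and maximizes the likelihood with a golden-section search. The function names, search bounds and mutant counts are hypothetical; the source does not show the authors' implementation.

```python
import math

def ld_probabilities(m, r_max):
    """P(r mutants) for r = 0..r_max given m mutations per culture."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append((m / r) * sum(p[i] / (r - i + 1) for i in range(r)))
    return p

def log_likelihood(m, counts):
    """Log-likelihood of observed mutant counts across parallel cultures."""
    p = ld_probabilities(m, max(counts))
    return sum(math.log(p[c]) for c in counts)

def mss_estimate(counts, lo=1e-3, hi=50.0, tol=1e-6):
    """Maximum-likelihood m by golden-section search on [lo, hi] (assumed bounds)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2

# Hypothetical 5-FOA-resistant colony counts from ten parallel cultures.
example_counts = [0, 1, 0, 3, 2, 0, 1, 12, 0, 1]
m_hat = mss_estimate(example_counts)
```

Comparing m_hat across the three treatments (direct plating, 17 °C incubation, sporulation) would follow the comparison the text describes.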
We found no difference in the numbers of mutations across all three data sets, suggesting that most mutations occurred primarily during growth in YPD, and not during incubation at 17 °C or during sporulation culture conditions. We note that each sexual population consists of a mating type a and a mating type α subpopulation, while each asexual population consists of a single type a or type α line. Although sexual populations were bottlenecked to the same total size as the asexuals during each sexual cycle, this difference meant there was a potential difference in effective population size between treatments. To test whether this difference could explain the more rapid adaptation in sexual populations, we evolved an alternative set of 6 asexual control populations for 990 generations. Each of these alternative asexual controls consisted of one specific pair of MATa lines (that is, two MATa subpopulations per asexual population). We propagated these subpopulations separately between sexual cycles. Every 90 generations, we mixed the two subpopulations (exactly analogous to the sexual lines but without recombination) and then divided them for another 90 generations of separate propagation. Simultaneously, we evolved 12 additional asexual control lines propagated in the same manner but without mixing every 90 generations. After 990 generations of evolution, we measured the fitness of all evolved populations. We find these mixed and unmixed asexual controls adapt at the same rate (Extended Data Fig. 3, two-sided t-test, P = 0.8). Thus this difference in treatments is not responsible for the faster adaptation in sexual populations. Fitness assays were performed as described previously33. Briefly, fitness was measured by competing test clones or populations against an ancestral reference strain containing an mCitrine fluorescent marker inserted at the HIS3 locus35. 
Because this reference strain would mate with MATα lines, all population fitness assays were performed on MATa subpopulations. After strains had acclimatized to YPD media for 24 h, competing strains were mixed in equal proportions and propagated by diluting 1:2^10 every 24 h. We used flow cytometry (Fortessa, BD Biosciences) to measure the ratio of the two competing types after 1 and 3 days (approximately 10 generations and 30 generations respectively), counting approximately 20,000 cells for each measurement. We confirmed the appropriateness of each t-test conducted using this fitness data with an F-test. Glycerol stocks of populations to be sequenced were defrosted and 10 μl inoculated into 3 ml of YPD and incubated without shaking at 30 °C for 16 h (MATa and MATα subpopulations of each sexual line were sequenced separately). Genomic DNA was prepared from these cultures using a Yeastar Genomic DNA kit (Zymo Research). Library preparations were prepared with a Nextera kit, using a protocol we previously described36. Libraries were sequenced to an approximate depth of 40-fold coverage using an Illumina HiSeq 2500 (Illumina). We aligned Illumina reads from all samples (after trimming Nextera adaptor sequences) to a SNP/indel-corrected W303 reference genome21 using bowtie2 version 2.1.0 (ref. 37). Next, we marked duplicate reads with Picard version 1.44. We generated a list of candidate SNPs and indels by applying GATK’s UnifiedGenotyper version 2.3 to all time points in each population at once38. To find low-frequency variants, we set the minimum phred-scaled confidence threshold for GATK to call a mutation to 4.0. For each candidate mutation, we extracted the allele depth supporting the reference and alternate allele from the resulting VCF file and calculated mutation frequencies for each time point. We excluded potential mutations if there was less than 10× average coverage across all time points or if GATK called two or more alternate alleles at that site.
We required that a mutation be supported by at least ten total reads and that it reach a frequency of 0.1 in two or more time points of the population in which it was called. To refine our list of candidate mutations, we took advantage of our time-course sequencing and multiple replicate lines. The frequency of a real mutation should be correlated across time points, while errors should be uncorrelated. We thus excluded candidate mutations whose frequency trajectories were uncorrelated (lag-1 autocorrelation less than 0.2). Also, it is unlikely that the same base-pair substitution will arise independently in replicate populations. Thus, for each candidate mutation, we estimated the site-specific error rate by calculating the frequency of the alternate allele outside of the population in which the mutation was called. We then excluded candidates with an estimated error rate above 0.05. We also calculated the probability of detecting at least the observed number of alternate alleles in the focal population, assuming a binomial error model (given the observed coverage and estimated error rate). We excluded candidates where this probability exceeded 10^−5. We also detected several mutations that were present in the founding stock and thus in multiple replicate populations. We marked these mutations and excluded them from our counts of de novo mutations and from Fig. 2. After performing this procedure in the MATa and MATα subpopulations of each sexual line separately, we combined called mutations from both subpopulations and averaged the mutation frequencies to generate the whole-population trajectories in Fig. 2; data on each subpopulation separately are available in Supplementary Data 1. We annotated each called mutation using a SNP/indel-corrected GFF file and determined its effect on amino-acid sequence. We also screened for complex mutations: pairs of mutations that were within 1 kb of one another and followed the same trajectory.
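The three trajectory and error-rate filters described above can be sketched as follows. The thresholds (0.2, 0.05 and 10^−5) come from the text; the particular lag-1 autocorrelation estimator, the function names and the example values are assumptions made for illustration only.

```python
import math

def lag1_autocorrelation(freqs):
    """Standard lag-1 autocorrelation estimator (assumed form)."""
    n = len(freqs)
    mean = sum(freqs) / n
    denom = sum((f - mean) ** 2 for f in freqs)
    if denom == 0:
        return 0.0
    num = sum((freqs[i] - mean) * (freqs[i + 1] - mean) for i in range(n - 1))
    return num / denom

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def keep_candidate(freqs, alt_reads, depth, error_rate):
    """Apply the three filters; True means the candidate survives."""
    return (
        lag1_autocorrelation(freqs) >= 0.2            # trajectory is correlated in time
        and error_rate <= 0.05                        # site-specific error rate is low
        and binomial_tail(alt_reads, depth, error_rate) <= 1e-5  # unlikely to be error
    )

# Hypothetical trajectories: a smooth sweep and alternating noise.
sweep = [0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.0]
noise = [0.1, 0.0, 0.1, 0.0, 0.1, 0.0]
```

Here `keep_candidate(sweep, 20, 40, 0.01)` passes all three filters, whereas the alternating trajectory fails the autocorrelation filter.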
We discovered 7 complex mutations, all within 41 bases of one another. We determined the net effect of each complex mutation and considered them to be single mutations in our analysis. We note that it is not possible to determine the fraction of mutations that we detect with our variant-calling method. For example, sequencing depth fundamentally limits our ability to detect rare mutations. We do not attempt to call mutations that never reach ~10% frequency because our 40-fold coverage gives no resolution below that level; our results thus represent only mutations that reach substantial frequency. We are also limited to the set of mutations that can be identified by GATK, mainly SNPs and small indels (but see below for an analysis of larger-scale mutations from clone sequence data). These limitations apply equally to our sexual and asexual populations. Clonal interference is expected to generate correlations between the frequency trajectories of mutations that segregate at the same time. Two mutations in the same genetic background should increase or decrease together, while mutations on different backgrounds will tend to move in opposite directions. For each mutation trajectory, we calculated the change in frequency between each sequenced time point. We then computed the correlation coefficient between changes in the same time interval for every pair of mutations in the same population. We excluded pairs of mutations that did not segregate at the same time (that is, pairs whose frequencies were never between 0.05 and 0.95 in the same time point). Because large positive and large negative correlation coefficients are both evidence of interference effects, we compared the distributions of squared correlation coefficients (R2) in asexual and sexual populations (Fig. 2i–l). The dynamics of natural selection will introduce such correlations even among unlinked mutations by constraining the shapes of frequency trajectories. 
For example, two simultaneous but genetically unlinked selective sweeps will each follow a similar sigmoidal trajectory and thus be strongly correlated with one another. We controlled for this effect by repeating the above calculations with all pairs of mutations segregating in different populations of the same reproductive type. The R2 values from this procedure comprise two empirical null distributions (sexual and asexual) for mutations that are certain to be independent of one another (Fig. 2j–l). Our primary variant-calling pipeline can only detect substitutions, insertions, and deletions affecting ~3 bp or less. To estimate the prevalence of larger-scale mutations in our populations, we implemented an alternative pipeline to detect large deletions and copy-number variants on the basis of coverage depth as a function of genome position. Coverage depth in whole-population samples is difficult to interpret because it convolves individual copy-number with population variation. For example, a fixed duplication and a fourfold amplification present in half the population would generate identical coverage data in a whole-population sample. To avoid this problem, we sequenced eight total clones isolated from the final time points of two sexual and two asexual populations to an average depth per clone of 50–80×. After aligning reads to the reference as described above, we tabulated coverage depth in 100 base-pair windows as the number of mapped reads whose start positions fell within each window. These windows vary naturally in coverage depth owing to pre-existing duplications, PCR artefacts, and properties of the alignment algorithm. Therefore, to generate a baseline expectation, we calculated coverage in the same windows for all of the generation-0 and generation-90 population samples. Added together, these data yielded 564 reads in the median window. 
We thus calculated the expected relative coverage in each window by dividing its total coverage in the generation-0 and generation-90 samples by 564. For each clone, we then multiplied this expected relative coverage by the median coverage per window in that clone to get the expected coverage in each window. We next looked for windows in which the observed coverage depth deviated from its expectation. This is complicated by the fact that random noise is introduced by the sequencing and alignment process. Because the coverage depth is generated by a counting process, the noise variance scales with the expected coverage. We therefore applied a variance-stabilizing Anscombe transform39 to standardize the noise across windows with different expectations. First, we modelled the variance as v(m) ∝ m + m^2/r, where m is the expected coverage in a window, v is the mean squared deviation from that expectation, and r is a parameter fit to the data by a linear regression of v/m by m (we find a best-fit value r ≈ 440). This variance function, which is characteristic of negative-binomial counting noise, leads to an Anscombe transformation A(k) = sinh^−1(√[(k + c)/(r − 2c)]), where k is the observed coverage and c = 3/8 following the recommendation of ref. 39 for negative-binomial data. The transformed data are approximately normally distributed with mean A(m) and constant variance. Deletions and amplifications larger than our 100-bp window size should generate spatially correlated signals in our data, while the variance-stabilized noise will be largely uncorrelated between adjacent windows. To take advantage of this, we performed a ‘wavelet denoising’ procedure, a standard signal-processing method for separating spatially correlated signals from white noise40, which has been used previously41 in similar analyses of biological sequence data. Specifically, we applied a discrete wavelet transform with the Haar basis, using the Python package PyWavelets, to our variance-stabilized and mean-centred data.
We then performed noise reduction by replacing each wavelet coefficient a with a thresholded coefficient a*, according to the formula a* = sign(a) max[0, |a| − t], where the threshold value t was set to three standard deviations of the variance-stabilized data. After noise reduction, we inverted the wavelet and Anscombe transforms to get a smoothed estimate of the ratio of observed to expected coverage as a function of position (Extended Data Fig. 4). By visual inspection, we identified ten regions exhibiting strong signals of amplification or deletion in at least one clone (Extended Data Table 4). Of these, two regions (an rDNA-rich segment of chromosome XII and the segment of chromosome VIII containing CUP1-1 and CUP1-2) seemed to have undergone amplification in multiple independent populations. Both of these regions are known to exhibit copy-number variation across S. cerevisiae strains42,43. Of the remaining regions, five contained Ty elements. To probe their fitness effects, we reconstructed mutations from evolved strains in the mating type a ancestral genetic background, MATa, ura3Δ::NATMX, ade2-1, his3Δ::3xHA, leu2Δ::3xHA, trp1-1, CAN1. First, DNA fragments containing URA3 and HPHB were amplified from plasmid pJHK137 using primers containing 40 nucleotides of homology to sequence on each side of the target nucleotide (see Supplementary Data 2 for primer sequences). The mating type a ancestor was transformed with the resulting PCR product, resulting in hygromycin-resistant URA+ strains. These mutants were in turn transformed with an 80-bp double-stranded oligonucleotide centred on the mutant allele (see Supplementary Data 2). We plated on 5-FOA to select for the replacement of the URA3 genes with the mutant allele, and confirmed replacement by replica plating on YPD + hygromycin. Correct genotypes were confirmed by Sanger sequencing.
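Returning to the coverage-based copy-number pipeline, the variance stabilization and soft-threshold wavelet denoising steps can be sketched in a dependency-free form. The authors used PyWavelets; the hand-written Haar transform below is a stand-in, not their code. The negative-binomial Anscombe form used here, A(k) = asinh(sqrt((k + c)/(r − 2c))), is the standard one and is assumed; leaving the coarsest approximation coefficient unthresholded and requiring a power-of-two input length are further simplifying assumptions.

```python
import math

def anscombe_nb(k, r, c=3.0 / 8.0):
    """Variance-stabilizing transform for negative-binomial counts (assumed form)."""
    return math.asinh(math.sqrt((k + c) / (r - 2.0 * c)))

def haar_forward(x):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns (approx, details) with detail levels ordered coarsest first.
    """
    details = []
    approx = list(x)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward."""
    x = list(approx)
    for level in details:
        nxt = []
        for a, d in zip(x, level):
            nxt.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        x = nxt
    return x

def soft_threshold(a, t):
    """a* = sign(a) * max(0, |a| - t)."""
    return math.copysign(max(0.0, abs(a) - t), a)

def denoise(signal, t):
    """Soft-threshold every detail coefficient, then reconstruct."""
    approx, details = haar_forward(signal)
    details = [[soft_threshold(d, t) for d in level] for level in details]
    return haar_inverse(approx, details)
```

In the procedure described above, `signal` would be the transformed, mean-centred coverage per 100-bp window and `t` three standard deviations of that signal; r ≈ 440 is the fitted value reported in the text.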
We found one example of a mutation in MET2 that had a strong deleterious effect when introduced into the ancestral genetic background, despite fixing in a sexual population. We also found that this mutation had no significant effect in the sequencing fitness assay. To investigate whether epistasis could be responsible for these observations, we sought to measure the effect of this met2 mutation in the evolved background from the sexual population in which it fixed. To use our URA3-HPHB strategy, we first replaced the STE5pr::URA3 locus in the evolved clone with a NATMX marker, resulting in ura3Δ::NATMX (primers in Supplementary Data 2). We confirmed that this manipulation did not affect fitness. We then used this strain as the basis to reintroduce the wild-type MET2 allele. The resulting difference in fitness between the evolved sexual clone and reconstructed wild type was used to calculate the fitness effect for the met2 allele shown in Fig. 3c. To measure the fitness effects of mutations in evolved populations, we sampled a single evolved clone from generation 990 of each of the sequenced asexual populations and from generation 630 of each of the sequenced sexual populations. We backcrossed each of these clones with its corresponding ancestor. This resulted in diploids heterozygous for all mutant sites that were present in each original clone. We bulk sporulated each of these diploids to generate a large number of recombinant haploids with different combinations of wild-type and mutant alleles. Each of these populations of haploids was then propagated in YPD liquid medium in the same conditions used during mitotic propagation in the evolution experiment. We sampled each population after 10, 30, 50, and 70 generations, prepared genomic DNA, and sequenced to measure the frequencies of each mutation over time. We estimated the fitness effect of each mutation (Fig. 
3 and Supplementary Data 1) from the coverage depth supporting the mutant and ancestral alleles as a function of time (binomial regression with a logistic link function, coefficients and standard errors calculated using the glm function in R).
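The binomial regression with a logistic link can be illustrated with a small Newton-Raphson fit. The authors used the glm function in R; this Python version is only a sketch, and it assumes a model logit(p_t) = a + s·t in which the slope s is read as the per-generation fitness effect. The function name and the read counts are hypothetical.

```python
import math

def fit_logistic_binomial(times, alt, ref, n_iter=50):
    """Fit logit(p_t) = a + s*t to (alt, ref) read counts by Newton-Raphson.

    Returns (a, s, se_s), where se_s is the standard error of the slope
    taken from the inverse Fisher information.
    """
    a, s = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, k, r in zip(times, alt, ref):
            n = k + r
            p = 1.0 / (1.0 + math.exp(-(a + s * t)))
            w = n * p * (1.0 - p)      # binomial variance weight
            g0 += k - n * p            # score for the intercept
            g1 += (k - n * p) * t      # score for the slope
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        ds = (h00 * g1 - h01 * g0) / det
        a += da
        s += ds
        if abs(da) + abs(ds) < 1e-12:
            break
    se_s = math.sqrt(h00 / det)
    return a, s, se_s

# Hypothetical read counts at generations 10, 30, 50 and 70.
gens = [10, 30, 50, 70]
alt_reads = [182, 378, 622, 818]
ref_reads = [818, 622, 378, 182]
a_hat, s_hat, se_hat = fit_logistic_binomial(gens, alt_reads, ref_reads)
```

The example counts were generated from a = −2 and s = 0.05 per generation, so the fit should recover values close to those.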

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. Sediment-free, TAOM enrichment cultures were obtained after 1.5 years by semi-continuous incubation of hydrothermal vent sediments from Guaymas Basin with sulfate reducer medium31 and 0.225 MPa CH4 (+0.025 MPa CO2) as the sole energy source at 60 °C, as described in ref. 6. Culture medium was replaced and samples were diluted 1:2 when sulfide concentrations exceeded 12 mM. For the different experiments, subsamples of the main culture (biological replicates) were incubated in parallel. Genomic DNA was extracted as described previously32 from an active TAOM culture. The protocol encompassed three cycles of freezing and thawing, chemical lysis in a high-salt extraction buffer (1.5 M NaCl) by heating of the suspension in the presence of sodium dodecyl sulfate and hexadecyltrimethylammonium bromide, and treatment with proteinase K. To amplify bacterial 16S ribosomal DNA genes the primer pair GM3/GM4 (ref. 33) was used. For archaeal 16S rDNA genes the primers 20F (ref. 34) and Arc1492R (ref. 35) were selected. PCR reactions were performed according to ref. 6. The phylogenetic affiliation was inferred with the ARB software package36 and release 115 of the ARB SILVA database37. Representative 16S rRNA gene sequences are deposited at NCBI with the accession numbers KT152859–KT152885. Cell aliquots were fixed in 2% formaldehyde for 2 h at room temperature and washed with 1× PBS (pH 7.4). Fixed cell suspensions were treated with mild sonication (Sonoplus HD70; Bandelin) and aliquots of 50–250 µl were filtered onto GTTP filters (0.2 µm pore size, 20 mm diameter).
CARD-FISH was performed as described previously38 with the following modifications: for cell wall permeabilization, filters were sequentially incubated in lysozyme solution (10 mg ml−1 lysozyme powder, 0.1 M Tris–HCl, 0.05 M EDTA, pH 8) for 15–30 min at 37 °C and proteinase K solution (0.45 mU ml−1, 0.1 M Tris–HCl, 0.05 M EDTA, pH 8, 0.5 M NaCl) for 2 min at room temperature. Endogenous peroxidases were inactivated by incubating the filters in 0.15% H2O2 in methanol (30 min, room temperature). The oligonucleotide probes ANME-1-350 and HotSeep-1-590 were applied with formamide concentrations according to ref. 6. For dual CARD-FISH, peroxidases of the first hybridization were inactivated by 0.3% H2O2 in methanol (30 min, room temperature). Catalysed reporter deposition was combined with the fluorochromes Alexa Fluor 488 and Alexa Fluor 594. Filters were stained with DAPI (4′,6-diamidino-2-phenylindole). Micrographs were obtained by confocal laser scanning microscopy (LSM 780; Zeiss). All experiments were performed with artificial seawater medium containing 30 mM of carbonate buffer at TAOM cultivation temperature (60 °C), except when specified otherwise. To ensure equilibration of gas phases, samples were agitated on shaking tables. Highly pure gases and chemicals were used as additions to the incubations. Standard TAOM conditions are defined here as 0.2 MPa methane and 28 mM sulfate. To test the TAOM enrichment for substrate-specific sulfide production, triplicate culture aliquots (10 ml in 20 ml Hungate tubes) were supplemented with different substrates (Extended Data Table 2) at concentrations of 20 mM, except methyl sulfide and carbon monoxide (both 0.05 MPa), and hydrogen (0.16 MPa) with and without methane (0.2 MPa). Zero-valent sulfur was prepared according to ref. 39 and was supplied as dissolved species.
For this compound we additionally tested sulfide development via disproportionation in a concentration gradient from 1–12 mM final S^0 concentration (Extended Data Fig. 6a). As positive reference, methane was provided at 0.2 MPa (at 60 °C roughly equivalent to 1.6 mM in solution). Sulfide production in the experiments was repeatedly measured every 3 to 4 days using the copper sulfide assay40 and absorption spectrometry at 480 nm. TAOM rates with methane as the sole energy source (0.2 MPa) reached approximately 0.100 ± 0.030 µM per day, compared to a negative control (nitrogen; <0.001 µM per day). Rates determined for other substrates were compared to those under TAOM conditions. To determine the effect of hydrogen addition on methane oxidation rates, TAOM culture aliquots were supplemented with methane and hydrogen (0.15 MPa and 0.05 MPa, respectively), or only methane as control (0.15 MPa). Cultures were incubated headspace-free at 50 °C for this experiment, because hydrogen was too rapidly consumed at 60 °C for time-course experiments. To determine concentrations of methane and hydrogen, 1 ml of medium was sampled with gas-tight glass syringes, and the sampled medium was concurrently replaced with substrate-free medium to avoid the formation of a headspace. The sampled medium was injected through the septum of a 10 ml Exetainer filled with 1 ml NaOH and concentrations of CH4 and H2 were measured as described below. To determine hydrogen concentrations at TAOM conditions, 20 ml of culture was transferred into 156-ml bottles at 60 °C and gas phases were repeatedly sampled using glass syringes (1 ml) combined with direct measurements on the gas chromatograph. Cultures incubated for 3 or more days reached stable hydrogen concentrations. A comparison to molybdate addition is provided in Extended Data Fig. 6b, c. To quantify molecular hydrogen production in TAOM, 20 ml of culture was supplied with sodium molybdate (10 mM final concentration).
This molybdate concentration assured complete inhibition of hydrogen-dependent sulfate reduction as shown in replicate incubations of TAOM culture (1 to 25 mM molybdate) with hydrogen (0.1 MPa) as the sole electron donor (Extended Data Fig. 6d). Samples were stored at 60 °C on a shaking table and repeatedly sampled by glass syringes. Concentrations of methane and hydrogen were measured via gas chromatography coupled to flame ionization detection (Focus GC, Thermo) and via reducing compound photometry (RCD; Peak Performer 1 RCP; Peak Laboratories). Replicate culture aliquots (n = 5) were incubated in 5-ml Hungate tubes supplemented with methane, sulfate and 14C-labelled inorganic carbon (380 kBq). AOM-independent carbon fixation was determined under N2 atmosphere (n = 5). To determine methane oxidation rates, replicate vials were incubated with 14C-methane (14 kBq). Incubations were performed at 60 °C for 24 h. Samples were blotted onto 0.2-µm mixed cellulose esters membrane filters (Millipore, Merck). Filters were dried and potential residual inorganic carbon was removed by exposing the filters to an HCl atmosphere for 24 h. Radioactivity in liquid aliquots (0.1 ml) and filters was determined by liquid scintillation counting (scintillation mixture; Filtercount; Perkin Elmer; scintillation counter 2900TR LSA; Packard). To isolate the hydrogenotrophic sulfate reducers in the TAOM enrichment, aliquots were transferred to Hungate tubes (20 ml) and diluted 1:10 to 1:10^9 with marine sulfate reducer medium. All vials were amended with 0.2 MPa H2:CO2 (80:20) gas phase, and additionally stored in N2 atmosphere to prevent oxygen flux into the culture vials. Vials were stored at the TAOM temperature optimum (60 °C) and measured for sulfide production using the copper sulfate assay40.
To identify cultivated microorganisms, the 16S rRNA gene of active hydrogenotrophic cultures was directly amplified from freeze-thawed pellets of culture aliquots (primer pair GM3/GM4) and sequenced as described above. The phylogenetic affiliation was inferred with the ARB software package36 and Release 115 of the ARB SILVA database37. Representative sequences are deposited at NCBI with the accession numbers KT152886 and KT152887. Electron acceptor tests. Culture aliquots (1 ml tenfold-diluted in artificial anoxic seawater medium) were supplied with different potential electron acceptors (colloidal sulfur, sulfite or thiosulfate) with and without the addition of hydrogen. Potential growth on alternative carbon sources (that is, acetate, butyrate, peptone and methyl sulfide) was tested. Growth rates. Growth rates were independently determined from the development of sulfide concentrations and cell counts (from DAPI-stained cells for total cell numbers and from fluorescence in situ hybridized cells for specific cell numbers) from replicate cultures (grown from 10% inoculum). Growth efficiencies. Efficiencies were determined in a 14C-DIC radiotracer assay. Replicate cultures were spiked with 14C-DIC (~5.4 MBq) and incubated with H2:CO2 or, as control, with N2:CO2 headspace. Sulfate-dependent hydrogen consumption was determined by the increase of sulfide (colourimetrically40) and by the decrease of sulfate (via ion chromatography) in the medium. Fixed carbon was measured from culture aliquots (5 ml volume) blotted on filters as described above. Concentrations of radioactivity on the filter and the medium were determined via scintillation counting. The total carbon fixation (mmol per ml culture) was calculated as 14C uptake into particulate organic carbon multiplied by total DIC [14C-POC (Bq per ml of culture)/14C-total (Bq per ml of culture) × DIC (mmol per ml culture)], and normalized to reducing equivalent transfer; values are compared with the consumption of sulfide.
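The carbon fixation formula given above can be written out as a short worked example. The function name and the numerical inputs are hypothetical and are not measurements from the study.

```python
def total_carbon_fixation(poc_bq_per_ml, total_bq_per_ml, dic_mmol_per_ml):
    """Fixed carbon (mmol per ml culture) = 14C-POC / 14C-total * DIC."""
    if total_bq_per_ml <= 0:
        raise ValueError("total 14C activity must be positive")
    return poc_bq_per_ml / total_bq_per_ml * dic_mmol_per_ml

# Hypothetical example: 50 Bq/ml recovered on the filter out of 10,000 Bq/ml
# total label, with 0.03 mmol DIC per ml of culture.
fixed = total_carbon_fixation(50.0, 10000.0, 0.03)
```

With these inputs, 0.5% of the label is recovered in particulate organic carbon, which corresponds to 1.5 × 10^−4 mmol of fixed carbon per ml of culture.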
Genomic DNA was extracted from TAOM and HotSeep-1 enrichment cultures (as described above) and prepared for Illumina sequencing using the Nextera mate pair sample preparation kit (Illumina), following the Gel-Plus protocol of the manufacturer’s user guide. DNA fragments with a length of approximately 5–8 kb were extracted from a preparative gel before circularization. Additionally, a paired-end read library with an insert size of 500 bp was constructed for the TAOM enrichment using the TruSeq library preparation kit. Libraries were sequenced on a MiSeq instrument (MiSeq, Illumina) in a 2 × 250 bases paired-end run. Quality-controlled mate pair reads were assembled using the SPAdes genome assembler v.3.5.0 (ref. 41) with default values of k and the -hqmp option. Assembled contigs from the TAOM metagenome were binned based on tetranucleotide frequency using the Metawatt software42. ANME-1- and HotSeep-1-specific bins were extracted for targeted reassembly using the SPAdes genome assembler v.3.5.0 (ref. 41) with mapped mate pair and paired-end read data and default values of k, and were subsequently used as draft genomes. A HotSeep-1 draft genome was also obtained from the assembled contigs of the highly enriched HotSeep-1 culture metagenome (hydrogenotrophic HotSeep-1). Draft genomes were annotated with Prokka43, and the draft genome of HotSeep-1 (obtained from the hydrogenotrophic HotSeep-1 enrichment) was additionally annotated with an in-house pipeline and analysed using GenDB44 and JCoast45. The annotation of reported genes was manually curated. An expectation (E)-value cut-off of 1 × 10−5 was considered for all predictions of putative protein functions.
Identity of the enriched hydrogenotrophic HotSeep-1 and the TAOM partner HotSeep-1 was evaluated by pairwise blast search of the nucleotide sequence of the 16S and 23S rRNA genes, functional and housekeeping genes and the intergenic spacer region (Extended Data Table 2) derived from the draft genome of the TAOM partner HotSeep-1 (query) versus the hydrogenotrophic HotSeep-1 (subject). To verify that the organisms belong to the same species, the average nucleotide identity (ANI) and the tetranucleotide frequency correlation of the two draft genomes were determined using JSpecies46 (v.1.2.1). Analyses resulted in a tetranucleotide frequency correlation of 0.999 and an ANI of >99%. To check for the absence of ANME-1 in the hydrogenotrophic HotSeep-1 culture, metagenomic reads were mapped to the SILVA SSU 119 reference database (bbmap v.35 and phyloFlash v.1.5) for phylogenetic classification at minimum identities of 90%, 95% and 97%, resulting in approximately 3,500, 2,100 and 1,500 classified 16S rRNA gene fragments, respectively, which were screened for hits to ANME-related sequences. To identify potential cytochrome c and type IV pili (T4P) genes in the draft genomes of ANME-1 and HotSeep-1, protein domains were predicted using hmmscan (HMMER 3.0, ref. 47) with the PfamA48 and TIGRFAM49 reference databases. Potential cytochromes were identified by the CXXCH motif and cytochrome c-specific protein domain models. Potential T4P genes were identified using protein models related to T4P. ANME-1 cytochrome and HotSeep-1 cytochrome and pili genes were compared for their best matching hits in the G. sulfurreducens (strain PCA) and G. metallireducens genomes and the NCBI non-redundant protein database using blastp. Cytochrome annotation was based on detected protein domains in PfamA; pili annotation was based on detected protein domains and amino acid sequence. Subcellular localization was predicted with PSORTb50 (v.3.0.2).
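The motif-based part of the cytochrome screen can be illustrated with a short Python sketch. It only counts CXXCH occurrences in an amino acid sequence; it does not reproduce the domain-model search with hmmscan, and the function name and the example sequence are hypothetical.

```python
import re

# C, any two residues, C, H. The lookahead lets overlapping motifs be counted.
HAEM_MOTIF = re.compile(r"(?=C..CH)")


def count_cxxch_motifs(protein_seq):
    """Count CXXCH motifs in a one-letter amino acid sequence."""
    return len(HAEM_MOTIF.findall(protein_seq.upper()))


# Hypothetical peptide with two CXXCH motifs (CAACH and CKLCH).
example = "MKCAACHGGCKLCHAA"
n_sites = count_cxxch_motifs(example)
print(n_sites, "potential haem-binding site(s)")
```

A protein with at least one hit would be a candidate only; in the text, candidates are additionally supported by cytochrome c-specific domain models.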
For cytochromes, the number of potential haem-binding sites was derived from the abundance of the CXXCH motif. For sequence comparison to the NCBI non-redundant protein database and Geobacter spp. and for details on protein domains and subcellular localization prediction, see Supplementary Table 2a, b (ANME-1 cytochromes), Supplementary Table 3a, b (HotSeep-1 cytochromes) and Supplementary Table 4a, b (HotSeep-1 Type IV pili biogenesis). Representative sequences are deposited in GenBank under the accession numbers KT759143–KT759147, KT795302–KT795321, KT795322 and KT795323. The ANME-1 draft genome was searched for genes encoding catalytic subunits of hydrogenases using blastp search against known genes of catalytic subunits of [NiFe] and [FeFe] hydrogenases (mvhA, echA, frhA, vhuA, vhtA, ehaO, hymC). Annotation of genes with hits was evaluated by blastp search against the NCBI non-redundant protein database for best matching reference sequences related to hydrogenases, but none were found. To collect cells for transcriptome analyses, a 3.5-day experiment with replicates of 20 ml culture in 60-ml vials was carried out (Fig. 1). From triplicate TAOM cultures incubated with methane as control, with hydrogen, with methane/hydrogen mixture, or nitrogen as negative control, ~80% of the enrichment medium was removed and RNA was preserved using pre-heated RNAlater (Life Technologies, ThermoFisher Scientific). Total RNA was extracted using the Quick-RNA MiniPrep kit (Zymo Research), treated with DNase I (Roche) and purified using the RNeasy MinElute Cleanup kit (Qiagen) following the manufacturer’s recommendations. Removal of rRNA was omitted and total RNA was prepared for sequencing using the TruSeq stranded mRNA library prep kit (Illumina) following the manufacturer’s guidelines. The cDNA library was sequenced on a MiSeq instrument (MiSeq, Illumina) generating between 2 and 3 million 150-bp single-end reads per library.
Quality-controlled reads were mapped to the draft genomes of HotSeep-1 and ANME-1 using bbmap (v.35) with a minimum mapping identity of 98%. To quantify gene expression, unambiguously mapped reads per gene were counted using bedtools multicov (v.2.24.0). To compare relative expression patterns within each organism across treatments, read counts per feature were converted to transcripts per million (TPM), which is the abundance of a specific gene (i) relative to the abundance and length of all other transcribed genes (j) observed in one million sequenced reads, calculated according to ref. 51: $\mathrm{TPM}_i = \frac{X_i / l_i}{\sum_j X_j / l_j} \times 10^{6}$, where X = counts and l = length (bp) per gene. Relative changes in expression of selected genes were calculated by comparing TPM-normalized expression data of the H2 and H2 + CH4 treatments to those under TAOM (control) conditions. Differential expression (P value, fold change and effect size) between control (TAOM condition) and treatment (H2 or H2 + CH4) was computed with the ALDEx2 R package52 for ANOVA-like differential expression analysis. Raw read numbers, read mapping data and statistical analysis are provided in Supplementary Table 1 (total expression) and Supplementary Table 5 (specific gene expression). For HotSeep-1 transcriptomes, total RNA was extracted from triplicate cultures (50 ml) grown on hydrogen/CO2 following the same procedure as described for TAOM enrichments (see above). Removal of rRNA was omitted and total RNA was prepared for sequencing using the TruSeq stranded mRNA library prep kit (Illumina) following the manufacturer’s guidelines. The cDNA library was sequenced on a MiSeq instrument (MiSeq, Illumina) generating between 6.4 and 6.9 million 75-bp paired-end reads per library. Quality-controlled reads were mapped to the draft genome of HotSeep-1 using bbmap (v.35) with a minimum mapping identity of 98%. To quantify gene expression, unambiguously mapped reads per gene were counted using featureCounts53 (part of Subread, v.1.4.6)
with the -p option to count fragments instead of reads. Fragment counts per gene were converted to transcripts per million (TPM) as described above for TAOM transcriptome analyses. Active cultures of G. sulfurreducens (strain PCA; DSM 12127) and G. metallireducens (strain GS-15; DSM 7210) were mixed in fresh medium (DSM Medium 826) supplied with Na2-fumarate (50 mM) and ethanol (20 mM) according to ref. 10 and cultivated anaerobically at 33 °C. After subsequent transfers (1% inoculum) a well-growing culture consisting of reddish microbial aggregates developed, which was used for thin sectioning and electron microscopy. The cell material was harvested at 2,000 r.p.m. using a Stat Spin Microprep 2 table-top centrifuge. After centrifugation the pellet was fixed by immersion using 2% glutaraldehyde in 0.1 M cacodylate buffer at pH 7.4. Fixation was performed for 60 min at room temperature. The fixed pellet was immobilized with 2% agarose in cacodylate buffer at pH 7.4. The pellet was cubed and the pieces carefully washed with buffer and further fixed in 1% osmium tetroxide. After pre-embedding staining with 1% uranyl acetate, samples were dehydrated and embedded in Agar 100 (Epon 812 equivalent). As an independent complementary method (shown in Extended Data Fig. 5a), samples were placed in aluminium platelets of 150 µm depth containing 1-hexadecene (ref. 54). The platelets were frozen using a Leica EM HPM100 high pressure freezer (Leica Mikrosysteme Vertrieb GmbH). The frozen samples were transferred to an Automatic Freeze Substitution Unit Leica EM AFS2 and substituted at −90 °C in a solution containing anhydrous acetone, 0.1% tannic acid for 24 h and in anhydrous acetone, 2% OsO4, 0.5% anhydrous glutaraldehyde (EMS Electron Microscopical Science) for an additional 8 h. After a further incubation over 20 h at −20 °C, samples were warmed up to +4 °C and subsequently washed with anhydrous acetone.
The samples were embedded at room temperature in Agar 100 at 60 °C over 24 h. Thin sections (30–60 nm) were counterstained with uranyl acetate and lead citrate and examined using a Philips CM 120 transmission electron microscope (Philips Inc.). In total, we recorded more than 200 views of TAOM consortia, 64 views of HotSeep-1 and 90 views of Geobacter consortia. Free energy yields (ΔG) were calculated according to the equation $\Delta G = \Delta G^{0} + RT \ln \frac{\prod a_{P}^{\,n}}{\prod a_{R}^{\,n}}$, including the gas constant R, the temperature T (K) and the measured activities/partial pressures (a) of the respective products P and reactants R in their respective stoichiometric appearance (n) in the reaction. Values consider activities and fugacity of respective compounds. The temperature-corrected standard free energies were determined according to ref. 55.
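The free energy calculation described above is assumed here to take the standard form ΔG = ΔG° + RT ln Q, where Q is the ratio of product to reactant activities, each raised to its stoichiometric coefficient. The following Python sketch illustrates that form only; the function name and all numerical inputs are hypothetical, and no standard free energy values from the study are implied.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def gibbs_free_energy(dg0_kj, temp_k, products, reactants):
    """Return dG (kJ per mol reaction) = dG0 + R*T*ln(Q).

    products / reactants: lists of (activity_or_partial_pressure, n) tuples,
    where n is the stoichiometric coefficient of that compound.
    """
    ln_q = sum(n * math.log(a) for a, n in products)
    ln_q -= sum(n * math.log(a) for a, n in reactants)
    return dg0_kj + R_KJ * temp_k * ln_q


# Hypothetical illustration: one product at activity 0.001, one reactant at
# activity 0.01 with coefficient 4, assumed dG0 of -150 kJ/mol at 333.15 K.
dg = gibbs_free_energy(-150.0, 333.15, [(0.001, 1)], [(0.01, 4)])
print(f"dG = {dg:.1f} kJ per mol")
```

When all activities equal 1 the logarithmic term vanishes and the function returns the temperature-corrected standard value unchanged, which is a convenient sanity check.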

News Article
Site: www.nature.com

The inoculum for the Ca. N. inopinata enrichment culture was sampled from a microbial biofilm that grew on the metal surface of a pipe and was covered by hot water, which was raised from a 1,200 m deep oil exploration well. The water temperature was 56 °C and the pH 7.5. The well was located in Aushiger, North Caucasus, Russia (43°22′45.0′′ N, 43°43′26.1′′ E). The biofilm samples were taken in April 2011. Activated sludge, membrane biofilm, and foam (from a foaming event) samples were taken in August and October 2014 from a pilot-scale membrane bioreactor (MBR) performing nitrogen removal and enhanced biological phosphorus removal (EBPR) at the conventional full-scale WWTP Aalborg West, Aalborg, Denmark (57°02′59.9′′ N, 9°51′55.4′′ E). The influent wastewater for this MBR came from the primary settling tank of the full-scale plant, entering an anoxic/denitrification (2 m3) tank and going to an oxic/nitrification (2 m3) tank. An anaerobic tank (1.8 m3) used for return sludge sidestream hydrolysis provided easily degradable substrate for EBPR and denitrification. Activated sludge was also sampled from an aerated activated sludge basin (tank no. 2) of the full-scale WWTP of the University of Veterinary Medicine, Vienna, Austria (48°15′17.8′′ N, 16°25′45.6′′ E) in January 2015 (WWTP VetMed). The two continuously operated activated sludge tanks of this WWTP have a volume of 254 m3 each. The wastewater composition and nitrogen load vary with the amounts of animal faeces and other sewage. This WWTP was known to host a large diversity of Nitrospira18. Iron sludge samples were taken from groundwater well (GWW) no. 1 of the well field of the Wolfenbüttel waterworks (Wolfenbüttel, Germany) (52°08′55.9′′ N, 10°32′33.9′′ E). The well has a depth of 50 m below ground level (bgl) and a diameter of 600 mm. Groundwater is extracted through two well intake screens at 28 to 38 m bgl and 46 to 48 m bgl. The normal well capacity is 160 m3 h−1.
Before sampling, the well had been out of operation for about three weeks. The well water is a mixture of aerobic and anaerobic groundwater from two different groundwater storeys and is characterized by the following parameters (values from years 2012 to 2014): pH about 7.2, about 10 °C, 5 to 10 mg l−1 dissolved oxygen, 0.13 to 0.17 mg l−1 ammonium, <0.01 mg l−1 nitrite, 12 to 16 mg l−1 nitrate, 0.16 to 0.42 mg l−1 total iron, 0.03 to 0.08 mg l−1 manganese, 0.64 to 0.99 mg l−1 total organic carbon, 0.44 to 0.78 mg l−1 dissolved organic carbon, 71 to 81 mg l−1 dissolved inorganic carbon, 121 to 138 mg l−1 calcium. The drop pipe, through which the extracted water is pumped to ground level, was drawn out of the well on 27 April 2015 and had deposits of pasty iron sludge on the inner surface. A sample was taken from these deposits at several points corresponding to depths between 20 and 10 m bgl. A second sample consisted of suspended iron sludge deposits that had been flushed away from the upper well intake screen and retained on a fleece filter during pumping out of the turbid water on 28 April 2015. The biofilm used as inoculum was suspended and incubated at 46 °C with 0.5 mM NH4Cl in a modified AOM medium51 containing (per litre): 50 mg KH2PO4; 75 mg KCl; 50 mg MgSO4 × 7H2O; 584 mg NaCl; 4 g CaCO3 (mostly undissolved, acting as a solid buffering system and growth surface); 1 ml of specific trace element solution (TES); and 1 ml of selenium-wolfram solution (SWS)52. The composition of TES and SWS is described below. Both solutions were added to the autoclaved medium by sterile filtration using 0.2 μm pore-size cellulose acetate filters (Thermo Scientific). The pH of the medium was around 8.2 after autoclaving and was kept around 7.8 by the CaCO3 buffering system during growth of the enrichment.
TES contained (per litre): 34.4 mg MnSO4 × 1H2O; 50 mg H3BO3; 70 mg ZnCl2; 72.6 mg Na2MoO4 × 2H2O; 20 mg CuCl2 × 2H2O; 24 mg NiCl2 × 6H2O; 80 mg CoCl2 × 6H2O; 1 g FeSO4 × 7H2O. All salts except FeSO4 × 7H2O were dissolved in 997.5 ml Milli-Q water and 2.5 ml of 37% HCl was added before dissolving the FeSO4 × 7H2O salt. SWS contained (per litre): 0.5 g NaOH; 3 mg Na2SeO3 × 5H2O; 4 mg Na2WO4 × 2H2O. The primary ammonium-consuming enrichment was subsequently treated with antibiotics (one treatment with 50 mg l−1 vancomycin, two treatments with 50 mg l−1 bacitracin). The ammonium concentration was increased to 1 mM NH4Cl for these and all further cultivation steps. After these treatments and repeated serial dilutions in AOM medium without antibiotics, enrichment culture ENR4, which was characterized in this study, was obtained. An aliquot of ENR4 was incubated at 50 °C for four weeks and then subjected to serial dilution at 46 °C. Propagation of the most diluted (10−8) ammonia-oxidizing culture was followed by serial dilution in AOM medium containing 1 mM urea instead of ammonium. The most diluted (10−7) urea-consuming (that is, nitrifying) culture was again cultivated in AOM medium with 1 mM NH4Cl and subjected to repeated serial dilutions, which resulted in culture ENR6, which was also characterized in this study. Enrichments ENR4 and ENR6 were further cultivated in 100 ml or 250 ml Schott bottles in AOM medium containing 1 mM NH4Cl. To obtain enough biomass for DNA extraction, enrichment ENR4 was up-scaled in 1 l and 2 l Schott bottles. The composition of enrichment cultures was analysed by phase contrast microscopy, electron microscopy, FISH with rRNA-targeted probes, amoA- and 16S rRNA-specific PCR, and metagenomics (see later for methodological details). To study nitrification by Ca. N. inopinata, an actively nitrifying ENR4 stock culture was harvested by centrifugation (9,300g, 30 min, 10 °C) and the biomass was suspended in AOM medium (see above) without ammonium.
Aliquots (25 ml) of this suspension were distributed to 100 ml Schott bottles (all glassware was rinsed twice in 6 M HCl and three times in Milli-Q water, autoclaved, and dried at 60 °C before use). After addition of NH4Cl to final concentrations of 1 mM, 0.1 mM, or 10 μM, respectively, or of NaNO2 to a final concentration of 0.5 mM, the biomass was incubated at 46 °C for 9 h (10 μM NH4Cl) or 48 h (other experiments) without agitation in the dark. Samples (500 μl) for chemical analyses (see below) were taken directly after ammonium or nitrite addition and during the incubations. The samples were centrifuged (22,000g, 10 min, 4 °C) to remove cells and undissolved CaCO3, and 450 μl of the supernatant was transferred to plastic tubes and stored at −20 °C until analysis. Each incubation condition except 10 μM NH4Cl was performed in parallel with four biological replicates (biological triplicates for 10 μM NH4Cl), two dead biomass controls (cells were killed by autoclaving), and two abiotic controls that contained only medium and substrate, but no biomass. After the experiments, the remaining biomass was harvested by centrifugation (9,300g, 30 min, 10 °C), frozen immediately at −80 °C, and shipped on dry ice for proteome analysis. To quantify growth of Ca. N. inopinata by complete nitrification, culture ENR4 was incubated in mineral NOB medium, which has been used to cultivate nitrite-oxidizing Nitrospira21. In this experiment, the NOB medium was amended with ammonium instead of nitrite. The NOB medium was chosen because it contains less CaCO3, which can affect quantitative PCR (qPCR) efficiency and accuracy. Nitrifying activity of ENR4 in NOB medium was confirmed in preceding tests. Biomass from the supernatant (without undissolved CaCO3) from an ammonia-oxidizing culture was washed once in NOB medium, harvested by centrifugation (9,300g, 30 min, 10 °C), and prepared for incubation as described above.
Following the addition of NH4Cl to a final concentration of 0.6 mM, samples (100 μl) for quantitative PCR were taken immediately and after 4, 5, 7, and 8 days of incubation. Samples for chemical measurements (see below) were taken immediately and after 1, 4, 5, 7, and 8 days of incubation. All samples were stored at −20 °C until analysis. These incubation experiments were performed in biological triplicates. Copy numbers of the Ca. N. inopinata amoA gene were determined by qPCR using the newly designed Ca. Nitrospira inopinata amoA gene-specific primers Nino_amoA_19F (5′-ATAATCAAAGCCGCCAAGTTGC-3′) and Nino_amoA_252R (5′-AACGGCTGACGATAATTGACC-3′). The qPCR reactions were run with three technical replicates in a Bio-Rad C1000 CFX96 Real-Time PCR system, using the Bio-Rad iQ SYBR Green Supermix kit (Bio-Rad). Each qPCR reaction was performed in 20 μl reaction mix containing 10 μl SYBR Green Supermix, 2 μl of the sampled ENR4 cell suspension, 0.1 μl of each primer (50 μM), and 7.9 μl of autoclaved double-distilled ultrapure water. Cells were lysed and DNA was released for 10 min at 95 °C, followed by 43 PCR cycles of 40 s at 94 °C, 40 s at 52 °C, and 45 s at 72 °C. Plasmids carrying the Ca. N. inopinata amoA gene were obtained by PCR-amplifying the gene from the ENR4 culture and cloning the product into the pCR4-TOPO TA vector (Invitrogen). The M13-PCR product from these plasmids containing the amoA gene was used as standard for qPCR (the amoA copy number in the standard was calculated from DNA concentration). Tenfold serial dilutions of the standard were subjected to qPCR in triplicate to generate an external standard curve. The amplification efficiency was 92.6%, and the correlation coefficient (r²) of the standard curve was 0.999. A 1 ml aliquot of the ENR4 culture was transferred to 25 ml modified AOM medium (see above) containing 6 mM sodium acetate.
After three weeks of incubation at 46 °C, a 1 ml aliquot of the betaproteobacterial primary enrichment was transferred into 25 ml of fresh modified AOM medium containing 6 mM sodium acetate. After three more weeks, a 5 ml aliquot of this culture was centrifuged (9,300g, 10 min, 10 °C) and the cells were resuspended in 25 ml NOB medium (see above) containing 1 ml of SWS and 4 mM sodium acetate. Thereafter, 1 ml of the betaproteobacterial enrichment was transferred into fresh NOB medium containing 4 mM sodium acetate every 2 weeks. The fourth transfer was checked for purity by FISH with the betaproteobacterium-specific probe Nmir1009, which showed 100% overlap with the EUB338 probe mix and DAPI signals. No Nitrospira cells were detected by FISH in the culture. To test whether the betaproteobacterium had the capability to nitrify, 20 ml of a dense pure culture of this organism was centrifuged (9,300g, 10 min, 10 °C), washed once in modified AOM medium without solid CaCO3, and resuspended in modified AOM medium without ammonium and solid CaCO3. Aliquots of this suspension were distributed into 100 ml Schott bottles, which had been rinsed twice in 6 M HCl, washed 3 times in Milli-Q water, closed with aluminium caps, autoclaved, and dried at 60 °C before use. Subsequently, the following substrates were added: 1 mM NH4Cl; or 0.5 mM NaNO2 and 0.1 mM NH4Cl; or 4 mM sodium acetate and 0.1 mM NH4Cl (the 0.1 mM NH4Cl was added to the nitrite and acetate incubations to provide the organism with a nitrogen source for assimilation). The biomass was incubated at 46 °C in the dark without agitation. All experiments were performed in parallel with biological triplicates. Samples (700 μl) for qPCR and chemical analyses (see below) were taken immediately after experimental set-up and after 19, 24, 30, 42, and 48 h of incubation. The samples were stored at −20 °C until analysis.
Cell densities of the betaproteobacterium were quantified by qPCR targeting the soxB gene, which encodes the SoxB component of the periplasmic thiosulfate-oxidizing Sox enzyme complex. soxB is a single-copy gene in the genome of the betaproteobacterium. The primers used to quantify the soxB gene were soxB_F1 (5′-GGACCAGACCGCCATCACTTACCC-3′) and soxB_R1 (5′-GCACCATGTCCCCGCCTTGCT-3′). The qPCR protocol and conditions were the same as described above. Ammonium levels were measured photometrically as described previously53, 54 with adjusted volumes of sample and reagents. Standards were prepared in AOM or NOB medium and ranged from 7.25 to 1,000 μM NH4Cl. Nitrite concentrations were determined photometrically by the acidic Griess reaction55. Nitrate was reduced to nitrite by vanadium chloride and measured as NOx by the Griess assay. Nitrate concentrations were calculated from the NOx measurements as described elsewhere56. Standards were prepared in AOM or NOB medium and ranged from 7.8 to 1,000 μM for NOx and from 3.9 to 500 μM for nitrite. The number of replications is detailed in the subsections for each specific experiment, and was mostly determined by the amount of biomass available for the different cultures. In all experiments, a minimum of three biological replications were used. No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. FISH with rRNA-targeted oligonucleotide probes was performed as described elsewhere57 using the EUB338 probe mix58, 59 for the detection of Bacteria, probes Ntspa662 and Ntspa712 specific for Nitrospira10, and probes Nso1225, Nso190, and NEU specific for betaproteobacterial AOB22. The betaproteobacterium in ENR4 and ENR6 was detected by FISH with the specific probe Nmir1009 (5′-CACTCCCCCGTCTCCGGG-3′) with 35% of formamide in the hybridization buffer.
If required, unlabelled competitor oligonucleotides were added in equimolar amounts as probes. Cells were counterstained by incubation for 5 min in a 0.1 μg ml−1 DAPI (4′,6-diamidino-2-phenylindole) solution. Fluorescence micrographs were recorded by using a Leica SP7 confocal laser scanning microscope equipped with a white light laser. To determine the relative abundances of Nitrospira and AOB in WWTP VetMed by quantitative FISH, 20 confocal images of FISH probe signals were taken at random positions in the sample and analysed as described elsewhere60 by using the digital image analysis software daime61. For whole-cell electron microscopy, cells were positively stained with 1% (w/v) uranyl acetate. Electron microscopy of thin sections was carried out as described elsewhere62. To check whether the Ca. N. inopinata enrichments contained known AOB or AOA, PCR tests were performed using primer sets amoA-1F/amoA-2R targeting betaproteobacterial amoA63, CamoA-19f/CamoA-616r targeting thaumarchaeal amoA33, 64, and 771F/957R for thaumarchaeal 16S rRNA genes65 and the respective published reaction conditions. DNA was extracted for these PCR assays by using the PowerSoil DNA Isolation Kit (MoBio) according to the manufacturer’s instructions. Protein extraction from concentrated ENR4 biomass, proteolytic digestion, analysis of peptide lysates by mass spectrometry (MS), processing of MS raw files, and analysis of MS spectra were carried out as described elsewhere20. MS spectra were searched against a database of predicted gene products on the ENR4 metagenome scaffolds containing 12,234 sequence entries and a common Repository of Adventitious Proteins (cRAP) database using the Sequest HT algorithm. The PROPHANE pipeline (http://www.prophane.de/index.php) was used to classify the lowest common phylogenetic ancestor of each protein group and to calculate the normalized spectral abundance factor (NSAF). 
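The normalized spectral abundance factor mentioned above is commonly defined as a protein's spectral count divided by its length, normalized by the sum of that ratio over all proteins in the sample. The Python sketch below follows that common definition; it is not the PROPHANE implementation, and the function name and example data are hypothetical.

```python
def nsaf(spectral_counts, protein_lengths):
    """Return {protein: NSAF} from spectral counts and lengths (amino acids).

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    saf = {p: spectral_counts[p] / protein_lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        # No spectra at all: every protein gets an abundance of zero.
        return {p: 0.0 for p in saf}
    return {p: value / total for p, value in saf.items()}


# Hypothetical protein groups with equal spectral counts but different lengths.
counts = {"groupA": 10, "groupB": 10}
lengths = {"groupA": 100, "groupB": 200}
for name, value in nsaf(counts, lengths).items():
    print(name, round(value, 3))
```

Dividing by length before normalizing keeps long proteins, which yield more peptides, from appearing more abundant than short ones with the same number of spectra.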
Biomass of enrichment ENR4 was collected from three culture bottles (samples ENR4_A, ENR4_E, ENR4_F) by centrifugation and frozen over night at −80 °C before total nucleic acids were extracted by bead beating in the presence of phosphate buffer, 10% (w/v) SDS and phenol as described elsewhere66 (see ref. 67 for full protocol). Bead beating was repeated twice to break remaining intact cells, the supernatants from each step were pooled, and nucleic acids purified by phenol/chloroform/isoamyl alcohol and chloroform/isoamyl alcohol extraction. Nucleic acids were precipitated using 20% (w/v) polyethylene glycol, washed in ice-cold 75% (v/v) ethanol, and resuspended in sterile 10 mM TRIS buffer. RNA was digested with RNase I (Promega) and the purity of DNA assessed by spectrophotometry. The same protocol was used to extract DNA from concentrated biomass of enrichment ENR6 (sample ENR6_N3), with the modification that bead beating was not repeated, and from an activated sludge sample of WWTP VetMed collecting only the supernatants of the second and third bead beating steps (DNA extract Vetmed_23). DNA was extracted from a second aliquot of the WWTP VetMed sample (DNA extract Vetmed_Pskit), and from pasty (sample GWW_HP_F1) or suspended (sample GWW_HP_D) iron sludge from the GWW, by using the PowerSoil DNA Isolation Kit (MoBio). DNA was extracted from all MBR samples by using the FastDNA SPIN Kit for Soil (MP Biomedicals) following the manufacturer’s instructions. Sequencing libraries were prepared using the Nextera or TruSeq PCR free kits (Illumina Inc.) following the manufacturer’s recommendations. For the TruSeq PCR free kits, the 550 bp protocol was used with 1 μg of input DNA. The prepared libraries were sequenced using either an Illumina MiSeq with MiSeq Reagent Kit v3 (2x301 bp; Illumina Inc.) or an Illumina HiSeq2000 using the TruSeq PE Cluster Kit v3-cBot-HS and TruSeq SBS kit v.3-HS sequencing kit (Illumina Inc.). 
Nanopore sequencing was additionally performed to facilitate completion of the Ca. N. inopinata genome sequence. Library preparation was done using the Nanopore Sequencing kit (SQK-MAP005, Oxford Nanopore) following the manufacturer's recommendations (v. MN005_1124_revC_02Mar2015) with shearing in an Eppendorf MiniSpin plus centrifuge at 8,000 rpm and including the optional PreCR treatment step, as well as Ampure XP Bead purification after dA-tailing. The libraries were sequenced using nanopore flow cells (FLO-MAP003, Oxford Nanopore) using the MinION device (Oxford Nanopore) with the MinKNOW software (v. …). Flow cells were primed twice with a mixture of 3 μl Fuel Mix, 75 μl 2 × Running Buffer, and 72 μl nuclease-free water for 10 min. Libraries were prepared for loading onto the flow cell by mixing 75 μl 2 × Running Buffer, 66 μl nuclease-free water, 3 μl Fuel Mix, and 6 μl Library (Pre-sequencing Mix). A sequencing run was started (MAP_48Hr_Sequencing_Run.py) after loading the library. Additional DNA library top-ups and restarts of the run script were carried out to maximize yield by allowing a new selection of active pores. Base calling was carried out using Metrichor and the 2D Basecalling workflow (Rev 1.16). Details for each metagenome can be found in Supplementary Table 1. Paired-end Illumina reads were imported into CLC Genomics Workbench v. 8.0 (CLCBio, Qiagen) and trimmed using a minimum phred score of 20 and a minimum length of 50 bp, allowing no ambiguous nucleotides and trimming off Illumina sequencing adaptors if found. FASTQ files for the Oxford Nanopore 2D reads were obtained using the R package poRe v. 0.6 (ref. 68) and error corrected using Illumina reads through Proovread v. 2.13 (ref. 69). For each environment, all trimmed Illumina reads were co-assembled using CLC's de novo assembly algorithm, using a kmer of 63 and a minimum scaffold length of 1 kbp.
Trimmed reads were mapped to the assembled scaffolds using CLCs map reads to reference algorithm, with a minimum similarity of 95% over 70% of the read length. Open reading frames (ORFs) were predicted in the assembled scaffolds using Prodigal70. A set of 107 hidden Markov models (HMMs) of essential single-copy genes71 were searched against the ORFs using HMMER3 (http://hmmer.janelia.org/) with default settings, except option (-cut_tc) was used. Identified proteins were taxonomically classified using BLASTP against the RefSeq (v. 52) protein database with a maximum e-value cutoff of 10−5. MEGAN72 was used to extract class-level taxonomic assignments from the BLAST output. The script network.pl (http://madsalbertsen.github.io/mmgenome/) was used to extract paired-end read connections between scaffolds. PhyloPythiaS+73 was used to taxonomically classify all scaffolds of selected samples. In addition, selected metagenome assemblies were binned based on ESOM maps74. After training the ESOM using scaffolds >5 kbp and large scaffolds chopped into 5 kbp pieces, all scaffolds were projected back to the ESOM map to retrieve a single coordinate for all scaffolds. Individual genome bins were extracted using the multi-metagenome principles23 implemented in the mmgenome R package (http://madsalbertsen.github.io/mmgenome/). All genome bins are fully reproducible from the raw metagenome assemblies using Rmarkdown files available on http://madsalbertsen.github.io/mmgenome/. The script extract.fastq.reassembly.pl was used to extract paired-end reads from the binned scaffolds, which were used for re-assembly using SPAdes75. For selected samples, error-corrected Oxford Nanopore 2D reads were used for scaffolding using SSPACE-LongRead76. For all genomes, quality was assessed using coverage plots through the mmgenome R package and through the use of QUAST77 and CheckM78. 
Details for each metagenome assembly can be found in Supplementary Table 2, and further details for the reconstructed bacterial genomes (including CheckM results) in Supplementary Tables 3–7. Relative genome sequence coverage was calculated as the fraction of sequence coverage of a reconstructed genome compared to the summed coverage of all genomes in these low-complexity metagenomes. The reconstructed bacterial genomes were uploaded to the MicroScope platform79 for automatic annotation and for manual annotation refinement17 of key pathways of Ca. N. inopinata. To test for the presence of additional organisms capable of nitrification, the raw reads for each enrichment ENR4 and ENR6 were mapped to the amoA, amoB, amoC, hao and nxrB sequences used to generate the trees in Extended Data Figs 5b,d, 8, and 9. Reads were required to align to any one member of a target data set over at least 70% of read length with BLASTN (word size = 7). Reads that mapped with >97% nucleotide identity were automatically classified. Reads with lower identity were placed with the Evolutionary Placement Algorithm (EPA) using RAxML80. Using this procedure, no indication was found for the presence of any nitrifier other than Ca. N. inopinata in these enrichments. For phylogenetic analyses of AMO and HAO, full amino acid data sets were downloaded from the Pfam81 site for bacterial (pfam02461) and archaeal (pfam12942) amoA. Additional amino acid sequences were identified from the NCBI GenBank82 and the Integrated Microbial Genomes databases (IMG-ER and -MER)83 that were returned using the search words ‘ammonia, methane, amo, pmo or monooxygenase’ (GenBank) or had been annotated with one of the target pfams (IMG). A BLASTP84 search was performed using the Ca. Nitrospira inopinata amoA sequence as a query, word size = 2, BLOSUM 45, E = 10 and the top 1,000 returned sequences were downloaded. 
Comparable procedures were performed to generate a comprehensive set of amoB (pfam04744) and amoC (pfam04896) sequences. For construction of the hao (pfam13447) data set, query words were changed to ‘hydroxylamine’ and ‘Hao’. For each gene set, amino acid sequences were filtered using hmmsearch (http://hmmer.janelia.org/) with the respective pfam HMMs, requiring an expect value < 0.001. Amino acid sequences were clustered at 75% identity using USEARCH85 and aligned using Mafft86. Phylogenetic trees were calculated using PhyloBayes87, running 5 independent chains for 21,000 cycles each, using 11,000 cycles for burn-in and sampling every 20 cycles. Sequences that mapped to centroids that clustered within the comammox clade were used for additional phylogenetic calculations along with an outgroup of 27 betaproteobacterial amoA and 29 diverse pmoA sequences. Corresponding nucleotide sequences for this set were aligned according to their amino acid translations using MUSCLE88 and manually corrected for frameshifts. Nucleotide alignments were then used for constructing consensus trees in PhyloBayes, running 5 independent chains for 21,000 cycles each, using 11,000 cycles for burn-in and sampling every 20 cycles. To estimate relative abundances of amoA genes, comammox-type amoA sequences were identified from three publicly available Rifle soil metagenomic data sets (3300002121, 3300002122 and 3300002124) available within IMG. Functional profiles were generated within IMG using pfam12942 (archaeal amoA) and pfam02461 (bacterial amoA/pmoA) against the assembly and unassembled reads. All identified amoA/pmoA nucleotide sequences were downloaded as nucleic acid sequences and added to the existing amoA alignment used to generate Extended Data Fig. 8 with the -add option in Mafft. EPA in RAxML was used to assign downloaded sequences into the reference tree that is the basis for Extended Data Fig. 8. 
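As a quick check on the PhyloBayes sampling scheme given above (5 chains, 21,000 cycles, 11,000 cycles of burn-in, sampling every 20 cycles), the number of trees retained for the consensus can be worked out as follows. The function name is hypothetical, and the exact treatment of the first and last cycle by PhyloBayes is not specified in the text, so the count should be read as approximate.

```python
def retained_samples(total_cycles: int, burn_in: int, sample_every: int, chains: int):
    """Count post-burn-in samples per chain and across all chains.

    Assumes samples are taken at cycles burn_in, burn_in + sample_every, ...
    up to (but not including) total_cycles.
    """
    per_chain = len(range(burn_in, total_cycles, sample_every))
    return per_chain, per_chain * chains


# Values from the text: 21,000 cycles, 11,000 burn-in, every 20 cycles, 5 chains.
per_chain, overall = retained_samples(21_000, 11_000, 20, 5)
print(per_chain, overall)  # 500 samples per chain, 2500 in total
```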
AmoA abundance for each amoA type (comammox, archaeal, betaproteobacterial AOB) was estimated by taking the sum of the estimated copy numbers of each assembled amoA gene of a given type as well as the number of unassembled reads assigned to a given amoA type. Comammox, betaproteobacterial, and archaeal amoA sequences from the metagenomes of WWTP VetMed and the GWW were identified using the same procedure as above. Comammox amoA read abundances were then used to calculate an estimate of the fraction of Nitrospira that are comammox. AmoA was assumed to be a single copy gene in all comammox (as it is in Ca. N. inopinata). Total Nitrospira were enumerated by mapping raw reads from metagenomic samples using the first 700 nucleotides of the predicted ATP-citrate lyase subunit beta (aclB) gene from Ca. N. inopinata. Reads were required to align to Ca. N. inopinata aclB over at least 70% of read length and with >60% alignment identity with BLASTN (word size = 7). AclB was chosen on the basis that this gene has a restricted taxonomic distribution, encodes a key enzyme of the reductive tricarboxylic acid cycle employed by all known Nitrospira for CO2 fixation, and is present in single copy within known Nitrospira genomes. To test its utility, all 150 nt segments (pos 1:150, 2:151…1,051:1,200) of the Ca. N. inopinata aclB gene were used as queries against the nr database (BLAST, word size = 7, 70% read length and 60% alignment identity). Over the first 700 nucleotides of the aclB gene, test fragments mapped only to reference Nitrospira organisms. Downstream of this region, the aclB mapping was less specific, mapping to Nitrospira and Chlorobi with high (>90%) identity. Coverage of each gene was calculated by dividing the number of mapped reads by gene length of the query (843 nt for comammox amoA and 700 nt for Nitrospira aclB). Adjusted coverage was calculated by dividing gene coverage by total number of reads in the metagenomic data set. 
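The coverage arithmetic described above can be written out as a minimal sketch. Only the query lengths (843 nt for comammox amoA, 700 nt for Nitrospira aclB) and the two division steps come from the text; the function names and the read counts in the usage example are hypothetical. The last function divides the adjusted amoA coverage by the adjusted aclB coverage to estimate the comammox fraction of Nitrospira.

```python
AMOA_QUERY_NT = 843  # comammox amoA query length (from the text)
ACLB_QUERY_NT = 700  # Nitrospira aclB query length (from the text)


def gene_coverage(mapped_reads: int, gene_length_nt: int) -> float:
    """Coverage = number of mapped reads divided by query gene length."""
    return mapped_reads / gene_length_nt


def adjusted_coverage(mapped_reads: int, gene_length_nt: int, total_reads: int) -> float:
    """Adjusted coverage = gene coverage divided by total reads in the data set."""
    return gene_coverage(mapped_reads, gene_length_nt) / total_reads


def comammox_fraction(amoa_reads: int, aclb_reads: int, total_reads: int) -> float:
    """Adjusted comammox amoA coverage relative to adjusted Nitrospira aclB coverage."""
    amoa = adjusted_coverage(amoa_reads, AMOA_QUERY_NT, total_reads)
    aclb = adjusted_coverage(aclb_reads, ACLB_QUERY_NT, total_reads)
    return amoa / aclb


# Hypothetical read counts, chosen only to make the arithmetic easy to follow:
# 843 amoA reads give coverage 1.0, 1,400 aclB reads give coverage 2.0.
print(comammox_fraction(amoa_reads=843, aclb_reads=1400, total_reads=1_000_000))  # about 0.5
```

Because both adjusted coverages are divided by the same total read count, that term cancels when the two genes come from the same metagenome; it matters only when comparing values across data sets.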
Ratios discussed in the main text are then the adjusted coverage of comammox (as calculated from comammox amoA) divided by the adjusted coverage for all Nitrospira (as calculated from aclB). For phylogenetic analyses of NXR, the NxrA and nxrB sequences of Ca. N. inopinata were imported into existing NxrA17 and nxrB8 sequence databases using the software ARB89. NxrA sequences were aligned using Mafft; nxrB sequences were manually aligned according to their amino acid translations. Maximum likelihood trees were calculated using RAxML with the GAMMA model of rate heterogeneity using empirical base frequencies and the LG substitution model (NxrA) or with the GAMMA model of rate heterogeneity and the GTR substitution model (nxrB). Bayesian inference trees were calculated using PhyloBayes, running 3 independent chains for 32,200 cycles each, using 6,440 cycles for burn-in (NxrA) or 3 independent chains for 35,500 cycles each, using 7,000 cycles for burn-in (nxrB). Nitrospira 16S rRNA genes from this study were added to an existing Nitrospira 16S rRNA sequence database and aligned in ARB. Phylogenetic trees were calculated using RAxML with the GAMMA model of rate heterogeneity and the GTR substitution model, and using MrBayes90 v.3.2.1, running 4 independent chains for 5 million generations each, with 1.25 million cycles for burn-in and sampling every 100 generations. Pairwise average nucleotide identity (ANI) values were calculated for comammox Nitrospira genomes using BLAST (ANIb) in JSpecies91. Genome-wide tetranucleotide signatures were calculated for the forward and reverse strand for each genome with the oligonucleotideFrequency(width = 4) command using the Biostrings package in R92. Tetranucleotide patterns were also calculated across the length of the genome with a sliding window of 5 kb (step = 1 kb). 
The tetranucleotide pattern for each window was compared to the global tetranucleotide signature by calculating the Pearson correlation (r) of log(1+counts) of each window against the log(1+counts) of the global signature. P values, indicating a significantly low correlation for tetranucleotide signature of a window, were calculated by modelling 1 − r across all windows as a log-normal distribution. Multiple testing was accounted for by using the Benjamini–Hochberg procedure with a false discovery rate of 5%.
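The windowed tetranucleotide comparison can be illustrated with a small standard-library sketch. It is not the original R/Biostrings analysis and rests on several assumptions: only the forward strand is counted, the log-normal model is fitted by taking the mean and standard deviation of log(1 − r), the P value is taken as the upper tail of that fit (large 1 − r means low correlation), and a small `eps` guards against log(0). All function names are hypothetical.

```python
import math
from itertools import product
from statistics import NormalDist, mean, stdev

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides


def tetra_counts(seq):
    """Count overlapping tetranucleotides on the forward strand (simplification)."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    return [counts[k] for k in KMERS]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def window_correlations(genome, window=5000, step=1000):
    """r of log(1 + counts) for each window against the global signature."""
    global_sig = [math.log1p(c) for c in tetra_counts(genome)]
    rs = []
    for start in range(0, len(genome) - window + 1, step):
        win = [math.log1p(c) for c in tetra_counts(genome[start:start + window])]
        rs.append(pearson(win, global_sig))
    return rs


def lognormal_pvalues(rs, eps=1e-12):
    """Upper-tail P values for 1 - r under a fitted log-normal (assumed tail).

    Needs at least two windows whose correlations differ.
    """
    logs = [math.log(max(1.0 - r, eps)) for r in rs]
    dist = NormalDist(mean(logs), stdev(logs))
    return [1.0 - dist.cdf(v) for v in logs]


def benjamini_hochberg(pvals, fdr=0.05):
    """Return a list of booleans marking P values rejected at the given FDR."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= fdr * rank / m:
            cutoff_rank = rank  # largest rank passing the BH threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff_rank
    return rejected
```

A typical call chain would be `benjamini_hochberg(lognormal_pvalues(window_correlations(genome)))`, which flags windows whose tetranucleotide pattern deviates from the genome-wide signature at a 5% false discovery rate.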
