No statistical methods were used to predetermine sample size. Experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment. The recombineering technique21 was adapted to construct all targeting vectors for homologous recombination in ES cells. Retrieval vectors were obtained by combining 5′ miniarm (NotI/SpeI), 3′ miniarm (SpeI/BamHI) and the plasmid PL253 (NotI/BamHI). SW102 cells21 containing a BAC encompassing the carboxy-terminal part of the gene encoding the remodeller, were electroporated with the SpeI-linearized retrieval vector. This allowed the subcloning of genomic fragments of approximately 10 kilobases (kb) comprising the last exon of the gene encoding each remodeller. The next step was the insertion of a TAP-tag into the subcloned DNA, immediately 3′ to the coding sequence. The TAP-tag was (Flag) -TEV-HA for Chd1, Chd2, Chd4, Chd6, Chd8, Ep400, Brg1 and 6His-Flag-HA for Chd9. We first inserted the TAP-tag and an AscI site into the PL452 vector, to clone 5′ homology arms as SalI/AscI fragments into the PL452TAP-tag vector. 46C ES cells were electroporated with NotI-linearized targeting constructs and selected with G418. In all cases, G418-positive clones were screened by Southern blot. Details on the Southern genotyping strategy, as well as sequences of primers and plasmids used in this study are available on request. Correctly targeted ES cell clones were karyotyped, and the expression of each tagged remodeller was controlled by western blot analysis, using antibodies against Flag and haemagglutinin (HA) epitopes (see Extended Data Fig. 6). We also verified by immunofluorescence, using monoclonal antibodies anti-Flag (M2, Sigma F1804) and anti-HA (HA.11, Covance MMS-101P) epitopes, that each tagged remodeller was properly localized in the nucleus of ES cells. ES cell lines expressing a tagged remodeller were all indistinguishable in culture from their mother cell line (46C). 
Pluripotency of tagged ES cell lines was verified by detecting alkaline phosphatase activity on ES cell colonies 5 days after plating, using the Millipore alkaline detection kit, following the manufacturer’s instructions. In addition, we verified by immunofluorescence using an antibody against Oct4 (also known as Pou5f1) (Abcam ab19857, lot 943333) that expression of this pluripotency-associated transcription factor was uniform in each tagged ES cell line. Mouse 46C ES cells have been described previously22. 46C ES cells and their tagged derivatives were cultured at 37 °C, 5% CO2, on mitomycin C-inactivated mouse embryonic fibroblasts, in DMEM (Sigma) with 15% fetal bovine serum (Invitrogen), l-glutamine (Invitrogen), MEM non-essential amino acids (Invitrogen), penicillin/streptomycin (Invitrogen), β-mercaptoethanol (Sigma), and a saturating amount of leukaemia inhibitory factor (LIF), as described previously23. Mouse ES nucleosomal tags were acquired from a published MNase-seq data set7 to make the reference map shown in Fig. 2. Reference nucleosomes were called using MACS 2.0 before assigning the first MNase-resistant nucleosome upstream and downstream of TSSs as −1 and +1, respectively. Because long NFRs may actually contain MNase-sensitive nucleosome-like structures or histone-containing complexes, defining the first downstream MNase-resistant nucleosome as ‘+1’ is problematic, and so we refer to it as the ‘first stable nucleosome’. Regions between the associated −1 and +1 (or first stable) nucleosomes were defined as NFRs. We further defined narrow and wide NFR categories, which have median widths of 28 bp and 808 bp, respectively. HFRs were defined as regions lacking histones, as determined by ChIP-seq. The list of 14,623 genes used in Figs 1 and 2 was obtained by filtering all mm9 RefSeq genes24.
We removed redundancies (that is, genes having the same start and end sites), unmappable genes, blacklisted genomic regions (those with artefact signal regardless of which NGS techniques were used), and genes shorter than 2 kb. The purpose of this last filtering step was to unambiguously distinguish the promoter region from the end of the genes in heat maps. Lists of genes defined as having H3K4me3 and bivalent promoters: we first defined, among the 14,623 RefSeq genes, those with a promoter that was positive for H3K4me3 (accession number: GSM590111). This was accomplished by operating with the seqMINER platform. Tag densities from this data set were collected in a −500/+1,000-bp window around the TSS, and subjected to three successive rounds of k-means clustering, to remove all genes with a promoter that was clustered with low H3K4me3. We next conducted on this series of H3K4me3-positive promoters three successive rounds of k-means clustering, using several published data sets for H3K27me3. The genes with a promoter positive for H3K27me3 in four distinct H3K27me3 data sets (accession numbers: GSM590115, GSM590116, GSM307619 and GSM392046/GSM392047) were considered as bivalent. We eventually obtained a list of 6,481 genes with H3K4me3-only promoters, and a list of 3,411 bivalent genes. A detailed version of this protocol is available on the protocol exchange website: http://dx.doi.org/10.1038/protex.2014.040. In brief, about 400 million ES cells were fixed either with formaldehyde, or with a combination of disuccinimidyl glutarate (DSG) and formaldehyde (Supplementary Table 1), then permeabilized with IGEPAL, and incubated with 2,800 units of micrococcal nuclease (MNase, New England Biolabs) in order to fragment the genome into mononucleosomes (Extended Data Fig. 1). This nucleosome preparation was next incubated with agarose beads coupled with an antibody anti-HA or anti-Flag. Anti-HA-agarose (ref. A2095) and anti-Flag-agarose (ref. 
A2220) beads were purchased from Sigma. After a series of washes, tagged remodeller–nucleosome complexes were eluted, either by TEV protease cleavage or by peptide competition (Supplementary Table 1). The eluted complexes were then subjected to a second immunopurification step, using beads coupled to the antibody specific for the second epitope (HA or Flag). After elution, DNA was extracted from the highly purified mononucleosome fraction, and processed for high-throughput sequencing (see below). As a negative control, chromatin from untagged ES cells was subjected to the same protocol to define background signal. Two biological replicates were used for each tagged and control ES cell line, using independent cell cultures and chromatin preparations. After crosslink reversal, phenol–chloroform extraction and ethanol precipitation, the DNA from remodeller–nucleosome complexes was quantified using the picogreen method (Invitrogen) or by running 1/20 of the ChIP material on a high sensitivity DNA chip on a 2100 Bioanalyzer (Agilent). Approximately 5–10 ng of ChIP DNA was used for library preparation according to the Illumina ChIP-seq protocol (ChIP-seq sample preparation kit). Following end-repair and adaptor ligation, fragments were size-selected on an agarose gel in order to purify nucleosome-sized genomic DNA fragments between 140 and 180 bp. Purified fragments were next amplified (18 cycles) and verified on a 2100 Bioanalyzer before clustering and single-read sequencing on an Illumina Genome Analyzer (GA) or GA II, according to the manufacturer’s instructions. Sequencing characteristics are shown in Supplementary Table 1. Chd1, Chd2, Chd4, Chd6, Chd8, Chd9, Ep400 and Brg1 MNase remodeller ChIP-seq short reads were mapped to the mouse mm9 genome using Bowtie 0.12.7 with the following settings: -a -m 1 --best --strata -v 2 -p 3. Data sets were next converted to BED format files, and data analysis was performed using the seqMINER platform25 (Fig. 1c).
To examine the distribution of remodellers at individual genes, we used WigMaker3 (default settings) to convert BED files into wig files, which were uploaded onto the IGV genome browser (Extended Data Fig. 2). Nucleosome calls were made from MNase remodeller ChIP-seq tags using GeneTrack26 with the following parameters: sigma = 20, exclusion = 146. We then globally shifted tags to the median value of half distances of all nucleosome calls. GRO-seq tags10 sharing the same or opposite orientation with the TSS were assigned as ‘sense’ and ‘divergent’ tags, respectively. The orientation of each NFR was arranged so that sense transcription proceeds to the right. ES nucleosomal tags, globally shifted tags from MNase remodeller ChIP-seq (this current study), tags from DHS regions (Mouse ENCODE), GRO-seq oriented tags from transcriptionally engaged Pol II and CpG islands (UCSC, mm9 build) were then aligned to the midpoint of each NFR. Promoter regions were then sorted by NFR length and visualized by Java TreeView (Fig. 2a, b). CpG island information was retrieved from UCSC (mm9 build) and assigned to the closest TSS by using bedtools. We noticed that promoters with wide NFRs were mostly CpG island (CpGI)-rich, while those with narrow NFRs were globally CpGI-poor, in agreement with a previous report showing that CpGIs induce nucleosome exclusion9 (Fig. 2b). Tags from reference nucleosomes7, remodeller-interacting nucleosomes (this study) and transcriptionally engaged Pol II (GRO-seq)10 were aligned to nucleosome −1 and +1 (or the first stable nucleosome) dyad positions. The direction of each dyad was assigned according to the orientation of its associated TSS, the orientation of which was arranged so that the transcription proceeds to the right. After normalization to the gene count in the two different NFR subclasses, tags were plotted from the NFR midpoint to 500 bp distal to the reference nucleosome. 
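The orientation convention described above (sense versus divergent tags, and flipping each NFR so that sense transcription runs to the right) can be sketched as follows. This is only an illustration under assumed inputs: the function names, the `"+"`/`"-"` strand encoding and the `nfr_start`/`nfr_end` keys are hypothetical and are not taken from the tools used in this study.

```python
def classify_tag(tag_strand: str, tss_strand: str) -> str:
    """Label a GRO-seq tag relative to its TSS.

    Same strand as the TSS is 'sense'; opposite strand is 'divergent'.
    """
    return "sense" if tag_strand == tss_strand else "divergent"


def oriented_offset(tag_pos: int, nfr_midpoint: int, tss_strand: str) -> int:
    """Offset of a tag from the NFR midpoint, flipped for minus-strand genes
    so that sense transcription always points towards positive x."""
    offset = tag_pos - nfr_midpoint
    return offset if tss_strand == "+" else -offset


def sort_by_nfr_length(promoters):
    """Sort promoter records (dicts with 'nfr_start' and 'nfr_end') by NFR length."""
    return sorted(promoters, key=lambda p: p["nfr_end"] - p["nfr_start"])


if __name__ == "__main__":
    print(classify_tag("-", "+"))            # tag opposite to the TSS strand
    print(oriented_offset(1100, 1000, "-"))  # downstream in genome, upstream in gene
```

Flipping the sign of the offset for minus-strand genes is what allows promoters on both strands to be stacked in a single heat map.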
An x axis gap in the NFR was introduced to normalize variations in NFR length inside each class. We used DNaseI-Seq data from the mouse ENCODE consortium (GSM1004653) for the identification of DHS regions in the mouse ES cell genome. DHS regions were defined using MACS 2.0 (ref. 27) (default setting), which resulted in the identification of 139,454 DHS regions. Each of these DHS regions was represented as a 500-bp window (−250 bp/+250 bp) centred on the midpoint of the DHS peak. DHS regions overlapping with the blacklisted (high background signal) genomic areas (mm9) were removed, resulting in a final list of 138,582 DHS regions. Tags from each tested ChIP-seq data set were summed for each DHS region before pair-wise Pearson correlation comparison. The R^2 value from each pair-wise Pearson correlation was then visualized by heat map (Fig. 1a). For Pearson correlation analysis at promoter-like DHS regions, we used the seqMINER platform to retrieve, from the list of 138,582 DHS regions, those positive for H3K4me3, TBP and Pol II S5ph. We obtained 16,300 promoter-like DHS regions meeting these criteria. Pair-wise Pearson correlation was performed and plotted (Fig. 1b) as described for Fig. 1a. We used the pHYPER shRNA vector for remodeller depletion in ES cells, as previously described28. shRNA design was performed using DSIR software (http://biodev.extra.cea.fr/DSIR/DSIR.html). Below are the shRNAs selected for each remodeller. The sense strand sequence is given; the rest of the shRNA sequence is as described previously28.
Chd1 shRNA 1: 5′-GCAAAGACGGCGACTAGAAGA-3′; Chd1 shRNA 2: 5′-GACAGTGCTTAATCAAGATCG-3′; Chd4 shRNA 1: 5′-GGACGACGATTTAGATGTAGA-3′; Chd4 shRNA 2: 5′-GCTGACGTCTTCAAGAATATG-3′; Chd6 shRNA 1: 5′-GTACTATCGTGCTATCCTAGA-3′; Chd6 shRNA 2: 5′-CAGTCAGAACCCACAATAACT-3′; Chd8 shRNA 1: 5′-GCAGTTACACTGACGTCTACA-3′; Chd8 shRNA 2: 5′-GACTTTCTGTACCGCTCAAGA-3′; Chd9 shRNA 1: 5′-TATACCAATTGAACAAGAGCC-3′; Chd9 shRNA 2: 5′-AGTTAAAGTCTACAGATTAGT-3′; Ep400 shRNA 1: 5′-GGTAAAGAGTCCAGATTAAAG-3′; Ep400 shRNA 2: 5′-GGTCCACACTCAACAACGAGC-3′; Smarca4 shRNA 1: 5′-ACTTCTTGATAGAATTCTACC-3′; Smarca4 shRNA 2: 5′-CCTTCGAACAGTGGTTCAATG-3′. Each shRNA was transfected in its corresponding tagged ES cell line, to follow remodeller depletion by western blotting using monoclonal antibodies anti-Flag (M2, Sigma F1804), or anti-HA (H7, Sigma H3663) epitopes (Extended Data Fig. 6), in comparison with the signal obtained with a control antibody anti-Gapdh (Abcam ab9485). The pHYPER shRNA vectors were transfected in ES cell by electroporation, using an Amaxa nucleofector (Lonza). Twenty-four hours after transfection, puromycin (2 μg ml−1) selection was applied for an additional 48 h period, before cell collection and RNA preparation, except for Chd4, for which cells were collected after 30 h of selection. Total RNA was extracted using an RNeasy kit (Qiagen). Total RNA yield was determined using a NanoDrop ND-100 (Labtech). Total RNA profiles were recorded using a Bioanalyzer 2100 (Agilent). For each remodeller, RNA was prepared from three independent transfection experiments, and processed for transcriptome analysis. 46C ES cells were amplified on feeder cells except for the last passage, at which point cells were plated onto 60-mm dishes coated with gelatine, and grown to 70% confluence in D15 medium with LIF. Total RNA was extracted using an RNeasy Kit (Qiagen). The RNA quality was verified on a 2100 Bioanalyzer. 
Library preparation was performed using the Illumina mRNaseq sample preparation kit according to manufacturer’s instructions. Briefly, the total RNA was depleted of ribosomal RNA using the Sera-mag Magnetic Oligo (dT) Beads (Illumina) and after mRNA fragmentation, reverse transcription and second strand cDNA synthesis the Illumina specific adaptors were ligated. The ligation product was then purified and enriched with 15 cycles of PCR to create the final library for single-read sequencing of 75 bp carried out on an Illumina GAIIx. To keep only sequences of good quality, we retained the first 40 bp of each read and discarded all sequences with more than 10% of bases having a quality score below 20, using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Mapping of these sequences onto the mm9 assembly of mouse genome and RPKM computation were then performed using ERANGE v3.1.0 (ref. 29) and bowtie v0.12.0 (ref. 30). In brief, a splice file was created with UCSC known genes and maxBorder = 36. We created an expanded genome containing genomic and splice-spanning sequences using bowtie-build and bowtie was used to map the reads onto this expanded genome. Then the ERANGE runStandardAnalysis.sh script was used to compute RPKM values following steps previously described29, using a consolidation radius of 20 kb. Random-primed reverse transcription was performed at 52 °C in 20 μl using Maxima First strand cDNA synthesis kit (Thermo Scientific) with 1 μg of total RNA isolated from ES cells (Qiagen), quantified with NanoDrop instrument (Thermo Scientific). Reverse transcription products were diluted 40-fold before use. Composition of quantitative PCR assay included 2.5 μl of the diluted RT reaction, 0.2–0.5 mM forward and reverse primers, and 1× Maxima SYBR Green qPCR Master Mix (Thermo Scientific). Reactions were performed in a 10 μl total volume. 
Amplification was performed as follows: 2 min at 95 °C, 40 cycles at 95 °C for 15 s and 60 °C for 60 s in the ABI/Prism 7900HT real-time PCR machine (Applied Biosystems). The real-time fluorescence data from qPCR were analysed with the Sequence Detection System 2.3 (Applied Biosystems). Each qPCR reaction was performed using the set of primer pairs listed in Supplementary Table 2, validated for their specificity and efficiency of amplification. All reactions were performed in triplicate, using RNA prepared from three independent cell transfection experiments. Control reactions without enzyme were verified to be negative. Relative expression was calculated after normalization with three reference genes (Actb, Nmt1 and Ddb1), validated for this study. cRNA was synthesized, amplified and purified using the Illumina TotalPrep RNA Amplification Kit (Life Technologies) following the manufacturer’s instructions. In brief, 200 ng of RNA was used to prepare double-stranded cDNA using a T7 oligo(dT) primer. Second-strand synthesis was followed by in vitro transcription in the presence of biotinylated nucleotides. cRNA samples were hybridized to the Illumina BeadChips Mouse WG-6v2.0 arrays. These BeadChips contain 45,281 unique 50-mer oligonucleotides in total, with hybridization to each probe assessed at 30 different beads on average. A total of 26,822 probes (59%) are targeted at RefSeq transcripts, and the remaining 18,459 (41%) are for other transcripts. BeadChips were scanned on the Illumina iScan scanner using Illumina BeadScan image data acquisition software (version 2.3). Data were then normalized using the ‘normalize quantiles’ function in the GenomeStudio Software (version 1.9.0). Subsequent analyses were performed using Genespring software (version 13.0-GX). For Brg1, we used a previously published transcriptome data set, in which loss of Brg1 function was obtained by genetic ablation18.
All array analyses were undertaken using the Limma package from the R/Bioconductor software (R-Development-Core-Team, 2007). Microarray spot intensities were normalized using the RMA method as implemented in the R affy package. Normalized measures served to compute the log ratios for each gene between the wild-type strain and the Brg1 knockout mutant. Then, to identify genes with a log ratio significantly different between the mutant and wild-type strain, P values were calculated for each gene using a moderated t-test. The moderated t-test applied here was based on an empirical Bayes analysis and was equivalent to shrinkage (or expansion) of the estimated sample variances towards a pooled estimate, resulting in a more stable inference. Finally, adjusted P values were calculated using the false discovery rate (FDR)-controlling procedure of Benjamini and Hochberg. We identified deregulated genes using the thresholds of 0.05 for the P value, and 1.5 for the fold change (FC 1.5). This FC 1.5 threshold was chosen based on a previous study on Brg1 (ref. 18), and also because it was compatible with the analysis of the remodellers more modestly involved in transcriptional control in ES cells such as Chd1, Chd6 and Chd8. Note that seemingly modest fold changes might arise from many sources including a response lag, residual remodelling activity, and relatively high experimental background. Using a FC 2 threshold, we could, however, confirm that Ep400, Chd4 and Brg1 are important transcriptional regulators in ES cells, with 535, 293 and 570 genes deregulated, respectively. This level of deregulation is indicative of a context-specific function of remodellers in transcriptional activation or repression, which is distinct from the function of general transcription factors, whose depletion is expected to affect most genes.
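The Benjamini and Hochberg adjustment and the two selection thresholds described above can be illustrated with a small standard-library sketch. It is not the Limma implementation and does not include the moderated t-test; it takes P values as given. Two points are assumptions made for the example: that the 0.05 cut-off is applied to the adjusted P value, and that log ratios are base 2. The function names are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def deregulated(log2_ratios, p_values, alpha=0.05, fold_change=1.5):
    """Indices of genes passing both thresholds.

    Assumption: alpha is applied to the adjusted P value and the
    fold-change cut-off to the absolute log2 ratio.
    """
    adjusted = benjamini_hochberg(p_values)
    min_abs_log2 = math.log2(fold_change)
    return [
        i
        for i, (ratio, q) in enumerate(zip(log2_ratios, adjusted))
        if q < alpha and abs(ratio) >= min_abs_log2
    ]


if __name__ == "__main__":
    ratios = [1.0, 0.2, -0.8, 0.1]
    pvals = [0.01, 0.04, 0.03, 0.005]
    print(deregulated(ratios, pvals))
```

A fold change of 1.5 corresponds to an absolute log2 ratio of about 0.585, so both up- and downregulated genes are retained by the same test.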
Statistical analysis of the differences in transcriptional activation and repression by remodellers was performed using a two-sample test for equality of proportions with continuity correction. For the generation of GC-content-based lists of promoters, we used the list of promoters defined in figure 3 of ref. 15, which we crossed with the 14,623 promoter list, to obtain a list of 6,317 promoters rank ordered according to GC content. In Fig. 3b, we compared the percentages of genes either down- or upregulated by loss of function of each remodeller in the following two groups: (1) NFR length classes: genes from the narrow and wide NFR classes shown in Fig. 2a were each further divided into two subclasses, which resulted in the following four categories: narrow NFR subclass 1 (NFR < 15 bp), narrow NFR subclass 2 (15–115 bp NFR), wide NFR subclass 1 (116–504 bp) and wide NFR subclass 2 (505–1,500 bp). Genes in these groups were further subdivided into H3K4me3 and bivalent subgroups. (2) GC content classes: genes were divided into four quartiles based on GC content at promoters and further subdivided into H3K4me3 and bivalent subclasses. The number of genes analysed in Fig. 3b is indicated in brackets for the following subgroups. H3K4me3 genes: narrow NFR subclass 1 (739), subclass 2 (1,829), wide NFR subclass 1 (2,613), subclass 2 (1,253), GC content quartile 1 (low GC content) (450), quartile 2 (719), quartile 3 (644), quartile 4 (high GC content) (430). Bivalent genes: narrow NFR subclass 1 (271), subclass 2 (866), wide NFR subclass 1 (2,266), subclass 2 (1,184), GC content quartile 1 (220), quartile 2 (485), quartile 3 (750) and quartile 4 (1,149). FAIRE was performed as described31 with modifications. 46C ES cells were amplified as described above for RNA preparation. Formaldehyde was added directly to the growth media (final concentration 1%), and cells were fixed for 5 min at room temperature.
After quenching with glycine (125 mM) and several washes, cells were collected, resuspended in 500 μl of cold lysis buffer (2% Triton X-100, 1% SDS, 100 mM NaCl, 10 mM Tris-HCl, pH 8.0 and 1 mM EDTA) and disrupted using glass beads for five 1-min sessions with 2-min incubations on ice between disruption sessions. Samples were then sonicated for 16 sessions of 1 min (30 s on/30 s off) using a Bioruptor (Diagenode) at maximum intensity, at 4 °C. After centrifugation, the supernatant was extracted twice with phenol–chloroform. The aqueous fractions were collected and pooled, and a final phenol–chloroform extraction was performed before DNA precipitation. FAIRE experiments were performed in triplicate, using independent ES cell cultures. Before sequencing, FAIRE DNA was analysed and quantified by running 1/25 of the FAIRE material on a high sensitivity DNA chip on a 2100 Bioanalyzer (Agilent, USA). Approximately 20 ng of FAIRE DNA was used for library preparation according to the manufacturer’s instructions using the ChIP-seq sample preparation kit (Illumina). Single-read sequencing (36 bp) was performed on a Genome Analyzer II (Illumina). ES cells were grown and transfected with shRNA vectors as described for RNA analysis. Biological replicates were obtained by performing two independent transfection experiments for each shRNA vector. ATAC-seq libraries were constructed by adapting a published protocol20. In brief, 50,000 cells were collected, washed with cold PBS and resuspended in 50 μl of ES buffer (10 mM Tris, pH 7.4, 10 mM NaCl, 3 mM MgCl2). Permeabilized cells were resuspended in 50 μl transposase reaction (1× tagmentation buffer, 1.0–1.5 μl Tn5 transposase enzyme (Illumina)) and incubated for 30 min at 37 °C. Subsequent steps of the protocol were performed as previously described20. Libraries were purified using a Qiagen MinElute kit and Ampure XP magnetic beads (1:1.6 ratio) to remove remaining adapters.
Libraries were checked on a 2100 Bioanalyzer, and an aliquot of each library was sequenced at low depth on a MiSeq platform to assess the duplicate level and estimate DNA concentration. Each library was then paired-end sequenced (2 × 100 bp) on a HiSeq instrument (Illumina). As ATAC-seq libraries are composed in large part of short genomic DNA fragments, reads were cropped to 50 bp using trimmomatic-0.32 to optimize paired-end alignment. Reads were aligned to the mouse genome (mm9) using Bowtie with the parameters -m 1 --best --strata -X 2000, with two mismatches permitted in the seed (default value). The -X 2000 option allows fragments of up to 2 kb to align, and the -m 1 parameter keeps only uniquely aligning reads. Duplicate reads were removed with picard-tools-1.85. To perform differential analysis, libraries were adjusted to 33 million aligned reads using samtools-1.2 and by making a random permutation of initial input libraries (shuf Linux command). Adjusted BAM data sets were next converted to BED. We used the seqMINER platform with the lists of 6,481 H3K4me3-only and 3,411 bivalent genes described above, to collect tag densities from ATAC-seq data sets, in a window of −2 kb/+2 kb around the TSS. Output tag density files were analysed using R software to establish average ATAC-seq signal profiles shown in Extended Data Fig. 8. ES cells were grown and transfected with shRNA vectors as described above. Biological replicates were obtained by performing two independent transfection experiments for each shRNA vector. For each experiment, 1 million cells were fixed for 10 min in ES cell culture medium containing 1% formaldehyde, quenched with glycine (125 mM), washed with PBS buffer, collected in 175 μl of solution I (15 mM Tris-HCl, pH 7.5, 0.3 M sucrose, 60 mM KCl, 15 mM NaCl, 5 mM MgCl2 and 0.1 mM EGTA), and stored on ice. Cells were permeabilized by adding 175 μl of solution II (solution I with 0.8% Igepal CA-630 (Sigma)) and incubating for 15 min on ice.
We next added 700 μl of MNase digestion buffer (50 mM Tris-HCl, pH 7.5, 0.3 M sucrose, 15 mM KCl, 60 mM NaCl, 4 mM MgCl2 and 2 mM CaCl2), 4 U of MNase, and incubated for 10 min at 37 °C. MNase digestion was stopped by adding 10 mM EDTA (final concentration), and storing on ice. Cells were then disrupted by 15 passages through a 25 G needle, followed by a 10 min centrifugation at 18,000g. The supernatant was collected and incubated for 1 h at 65 °C with 15 μg of RNase A. We next added 10 μg of proteinase K, adjusted each sample to 0.1% SDS (final concentration) and incubated for 2 h at 55 °C. NaCl concentration was then adjusted to 200 mM and the samples were incubated overnight at 65 °C for crosslink reversal. DNA was purified from each sample by phenol–chloroform extraction followed by ethanol precipitation. Purified DNA (20 ng) was used for library preparation according to the manufacturer’s instructions, using Ultralow ovation library system (Nugen). Following end-repair and adaptor ligation, fragments were size-selected on an agarose gel in order to purify genomic DNA fragments between ~60 and 220 bp. Libraries were verified using a 2100 Bioanalyzer before clustering and paired-read sequencing. Sequencing of each sample was performed in a single lane of a HiSeq instrument (Illumina). The midpoint of each paired-end sequencing read was used to represent the dyad location of each nucleosomal tag. We assumed that remodeller depletion has no bulk effect on nucleosome occupancy, hence the total reads of control and remodeller-depleted cells were adjusted to be the same. The adjusted tags were aligned to −1 nucleosome dyads (determined by the first MNase-defined peak upstream of the annotated RefSeq TSS), or the first stable (MNase-defined) nucleosome dyad position downstream of the TSS for different NFR categories. These tags were further normalized to the number of genes involved in each NFR class.
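The three numerical steps just described (fragment midpoint as dyad, equalizing library totals, and dividing by the number of genes in a class) can be sketched as below. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, fragments are assumed to be `(start, end)` integer pairs, and the integer midpoint convention is an arbitrary choice for the example.

```python
from collections import Counter


def dyad_positions(fragments):
    """Midpoint of each paired-end fragment (start, end), used as the dyad."""
    return [(start + end) // 2 for start, end in fragments]


def depth_scale(counts, total_tags, target_total):
    """Rescale per-position tag counts so the library total equals target_total."""
    factor = target_total / total_tags
    return {pos: n * factor for pos, n in counts.items()}


def per_gene_profile(counts, n_genes):
    """Divide an aggregate profile by the number of genes in the NFR class."""
    return {pos: n / n_genes for pos, n in counts.items()}


if __name__ == "__main__":
    dyads = dyad_positions([(100, 247), (102, 245), (200, 346)])
    counts = Counter(dyads)
    scaled = depth_scale(counts, total_tags=len(dyads), target_total=6)
    print(per_gene_profile(scaled, n_genes=2))
```

Scaling both libraries to a common total encodes the stated assumption that depletion has no bulk effect on occupancy, so only the redistribution of tags between positions is compared.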
The normalized tags were then binned (5 bp) and smoothed (10-bin moving average) before plotting (Fig. 3c). Distances (bp) are indicated relative to these reference points. An x axis gap in the NFR was introduced to normalize variations in NFR length inside each class. ES cells were grown and transfected with shRNA vectors as described above. Biological replicates were obtained by performing two independent transfection experiments for each shRNA vector. Following a 10 min fixation with 1% formaldehyde in ES cell culture medium, chromatin was prepared from 5–10 million cells and sonicated as described32. ChIP-exo experiments were carried out essentially as described33. This included an immunoprecipitation step using antibodies against Pol II (sc-899, Santa Cruz Biotechnology) attached to magnetic beads, followed by DNA polishing, A-tailing, Illumina adaptor ligation (ExA2), and lambda and recJ exonuclease digestion on the beads. After elution, a primer was annealed to ExA2 and extended with phi29 DNA polymerase, then A-tailed. A second Illumina adaptor was then ligated, and the products were PCR-amplified and gel-purified. Sequencing was performed on a NextSeq500. Sequence tags were mapped to the mouse genome (mm9) using BWA-MEM (version 0.7.9a-r786)34, and only uniquely aligned tags were used for downstream analysis. The 5′ ends of mapped tags, representing exonuclease stop sites, were consolidated into peak calls (sigma = 5, exclusion = 20) using GeneTrack26, and peak pairs were matched when found on opposite strands and 0–100 bp apart in the 3′ direction. Tags were globally shifted by the median half-distance between all peak pairs. These globally shifted tags were then aligned relative to the annotated RefSeq TSSs, separately for H3K4me3-only and bivalent promoters, before remodeller-affected genes were selected for further analysis.
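The peak-pair matching and global shift described above can be illustrated with a simplified sketch. It does not reproduce GeneTrack's matching logic: here each forward-strand peak is simply paired with the nearest reverse-strand peak lying 0 to 100 bp downstream, which is an assumption made for the example, as are the function names and the `(position, strand)` tag representation.

```python
from bisect import bisect_left
from statistics import median


def match_peak_pairs(forward_peaks, reverse_peaks, max_gap=100):
    """Pair each forward-strand peak with the nearest reverse-strand peak
    located 0..max_gap bp downstream (simplified matching rule)."""
    reverse_sorted = sorted(reverse_peaks)
    pairs = []
    for f in sorted(forward_peaks):
        i = bisect_left(reverse_sorted, f)  # first reverse peak at or after f
        if i < len(reverse_sorted) and reverse_sorted[i] - f <= max_gap:
            pairs.append((f, reverse_sorted[i]))
    return pairs


def global_shift(pairs):
    """Median of the half-distances between paired peaks."""
    return median([(r - f) / 2 for f, r in pairs])


def shift_tags(tags, shift):
    """Move forward-strand tags downstream and reverse-strand tags upstream
    by the same amount, so both strands converge on the bound site."""
    return [
        round(pos + shift) if strand == "+" else round(pos - shift)
        for pos, strand in tags
    ]


if __name__ == "__main__":
    pairs = match_peak_pairs([100, 500, 900], [140, 560, 1200])
    shift = global_shift(pairs)
    print(pairs, shift, shift_tags([(100, "+"), (140, "-")], shift))
```

Using a single global shift, rather than a per-pair shift, keeps the transformation identical for every tag in the data set.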
We assumed that remodeller depletion caused no bulk change in Pol II occupancy, and hence total tags among wild type and all remodeller mutants were normalized to be the same. To make direct comparisons between different gene groups, we further normalized tags to the number of genes within the group. These normalized tags were then smoothed (5 bp binned before 10-bin moving average) before plotting (Extended Data Fig. 9a). To examine Pol II occupancy change in remodeller mutants among different promoter groups, we first calculated total Pol II occupancy by summing up tags from transcript start to end sites (annotated RefSeq TSS and TES, respectively24) for the tested genes. Change in Pol II occupancy was calculated by dividing the total Pol II occupancy of mutant by that of wild type before log transformation and bargraph plotting (Extended Data Fig. 9b). Genes were rank-ordered according to reads per kb of transcript per million mapped reads (RPKM) and divided into four quartiles (highest: Q4, second: Q3, third: Q2 and lowest: Q1). Operating with the k-means clustering function of seqMINER, genes in each quartile were further subdivided into H3K4me3-only and bivalent genes, as described above. Using these lists of genes, tag densities from remodeller ChIP-seq data sets were collected in a window of −2 kb/+2 kb around the TSS, except for Chd2, for which densities were collected from the TSS until +4 kb. Output tag density files were first analysed using R software to establish average binding profiles. Statistical comparisons were performed between remodeller distributions at H3K4me3 promoters, to assess a significant increasing trend among distributions. Differences between successive pairs of quartiles (Q4 − Q3, Q3 − Q2 and Q2 − Q1) were compared against a null distribution using a one-sided t-test. The respective P values are reported for each remodeller: Chd1, Q4 − Q3 P = 1.371138 × 10^−27; Q3 − Q2 P = 1.728126 × 10^−16; Q2 − Q1 P = 7.985217 × 10^−23.
Chd2, Q4 − Q3 P = 7.543473 × 10^−33; Q3 − Q2 P = 1.115223 × 10^−25; Q2 − Q1 P = 3.283427 × 10^−38. Chd4, Q4 − Q3 P = 0.2094255; Q3 − Q2 P = 0.1081455; Q2 − Q1 P = 0.07202865. Chd6, Q4 − Q3 P = 0.4168748; Q3 − Q2 P = 0.1534144; Q2 − Q1 P = 0.01138035. Chd8, Q4 − Q3 P = 4.031959 × 10^−15; Q3 − Q2 P = 1.231527 × 10^−6; Q2 − Q1 P = 1.34455 × 10^−9. Chd9, Q4 − Q3 P = 9.484578 × 10^−44; Q3 − Q2 P = 1.059783 × 10^−14; Q2 − Q1 P = 4.646352 × 10^−28. Ep400, Q4 − Q3 P = 3.046796 × 10^−20; Q3 − Q2 P = 1.215304 × 10^−14; Q2 − Q1 P = 6.462667 × 10^−11. Brg1, Q4 − Q3 P = 3.512021 × 10^−24; Q3 − Q2 P = 2.515217 × 10^−7; Q2 − Q1 P = 0.977422. We concluded from this analysis that Chd1, Chd2, Chd9 and Ep400 binding at promoters is tightly linked to gene expression level. By contrast, Brg1, Chd4 and Chd6 deposition showed little correlation with gene expression level (the test was not significant for at least one comparison for these remodellers). Although statistical analysis of Chd8 distributions indicated significant differences between quartiles, inspection of the distributions in Extended Data Fig. 3 showed that the Chd8 binding profile was intermediate between these two categories.
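The expression-quartile assignment that underlies these comparisons can be sketched as follows. This is an illustration only: the function names and gene labels are hypothetical, ties are broken by input order, and the statistical test itself is not reproduced here.

```python
def assign_quartiles(rpkm_by_gene):
    """Rank genes by RPKM and label them Q1 (lowest) to Q4 (highest).

    Ties are broken by input order; group sizes differ by at most one
    gene when the total is not a multiple of four.
    """
    ranked = sorted(rpkm_by_gene, key=rpkm_by_gene.get)  # lowest RPKM first
    n = len(ranked)
    return {gene: f"Q{rank * 4 // n + 1}" for rank, gene in enumerate(ranked)}


def average_profile(profiles):
    """Position-wise mean of equally long tag-density profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


if __name__ == "__main__":
    rpkm = {"a": 0.5, "b": 10, "c": 3, "d": 100, "e": 1, "f": 50, "g": 7, "h": 20}
    print(assign_quartiles(rpkm))
    print(average_profile([[0, 2, 4], [2, 4, 6]]))
```

Average binding profiles per quartile can then be obtained by applying `average_profile` to the tag-density rows of the genes sharing a label.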
News Article | January 18, 2016
Forget the classic model of a bed, a rectangular mattress and four legs, because it could soon be history. Now imagine sleeping in a bubble-shaped pod. You’re wrapped in smart pajamas that monitor your sleep patterns and automatically adjust temperature, sound, and light to ensure your comfort without disturbing your sleep cycle. Humans spend one third of life sleeping, but some sleep researchers believe gadgets like these could reduce that. We could evolve to need less shuteye by using comfort-boosting technology to optimize our sleep. The key is to get the best, deepest sleep in the least amount of time—a skill humans have been mastering for thousands of years. If you want to know how to cheat sleep in the future, take a look at the past, Duke University sleep researcher David Samson says. On the heels of a study proving humans sleep “more efficiently” than our closest primate relatives, scientists believe it’s possible to shrink your total hours of sleep each night by improving your sleep environment. To explain why, Samson points to a monkey. The groundbreaking study he co-authored examined 21 species of primates and found humans need roughly half as much sleep as most monkeys and apes, about seven hours per night. (Lemurs sleep 14 to 17 hours per night and chimpanzees sleep 11.5 hours per night.) That’s because, over time, humans stopped sleeping in trees. We moved onto the ground, began sleeping near fires and eventually in beds, where we feel cozier and safer with less fear of "predators." We now sleep more deeply, reach REM faster and waste less time on light sleep than primates, according to the study. And we'll likely keep evolving that way with the help of technology, Samson said. “For hundreds of years, we have been manipulating our environments, instead of the other way around. There’s no reason we can’t use science to take that a step further and continue to optimize sleep. 
I absolutely think we'll see light and temperature-controlled sleeping pods become popular in [homes in] the future,” he said, adding they already exist in expensive early-model form. By 2055, most people will get by on only five hours of sleep per night, scientists from the Institute for Ethics and Emerging Technologies and sleep expert Dr. Raj Dasgupta predict. The shift toward less shuteye is already happening. Americans sleep, on average, one hour less per night than they did 40 years ago, explained Dasgupta. “Any doctor or researcher worth his salt will tell you we'll be sleeping less in the future,” Dasgupta said. “Total sleep time has been decreasing for years. In 1970s, we slept 7 to 8 hours, presently it’s 6 to 7. If you do a little math, in next 40 to 50 years it could be 5 to 6 hours.” In the past, humans have evolved, slowly over thousands of years, to reduce how many hours we sleep. But technology could lurch that into fast forward, he warned. From students to soldiers, just about everyone wants to unlock the secret to needing less sleep. But tampering with our biology could have negative health effects, he said. “Just look at the quotes about sleep that our society focuses on. You’ve got, 'You’ll sleep when you’re dead’ and ‘The early bird catches the worm.' Now, millennials are saying, ‘Sleep is a poor substitute for caffeine.' It’s almost subliminally promoting the idea we should sleep less,” Dasgupta said. “It’s sad,” he added. “Sleep is associated with better moods and better lives.” But that’s not stopping firms from dreaming big about cheating sleep. Wearable tech that monitors snoozing patterns to optimize sleep, similar to a fitness tracker, is developing quickly. A night shirt, developed by the sleep diagnostics company Nyx Devices, uses fabrics “embedded with electronics” to “monitor the quality and quantity of the user’s sleep,” according to its website. A small chip determines a user's phase of sleep, including REM, deep and light sleep. 
More advanced models could soon hit the market, tracking breathing, heart rate, blood pressure and skin conductivity while the user dozes. Sleep cycle alarms are another way to squeeze the most out of your shuteye, researchers said. The gadgets monitor breath and movement and wake the user during light sleep, so he or she feels more refreshed. They’ll be used in hotels by 2035, predicted futurist Ian Pearson, who was hired by Travelodge in 2011 to study the future of sleep. “Sleep cycle alarms will monitor the electrical activity in the brain and identify the best time to wake the sleeper so that the individual will wake up fresher than if they had awoken in the middle of a new sleep cycle,” Pearson wrote in the report. The bed itself will change, too, he predicted. Enclosed spherical spaces, similar to napping pods (found in cutting-edge airports and offices like Google and Uber), can also be rigged to automatically control light and temperature and boost sleep efficiency. An early retail model, Tranquility Pod, has already hit the market for $30,000. The covered fiberglass bed is shaped like a huge egg. It controls sound, heat and light "to transport the body, mind, and spirit to a tranquil state of relaxation,” according to its website. It features a “biofeedback system” that monitors the user’s heart rate and pulse and it comes complete with oval-shaped memory foam and water mattresses. The spherical dome enclosure promotes restfulness and provides privacy by eliminating surrounding distractions, according to the firm. A study conducted by the pod-making firm MetroNaps found people who took 20-minute pod naps saw a 30 percent boost in alertness, making them more productive. Sleeping less in the future could free up more time for work and play. But we should be careful when tinkering with human nature, Dasgupta warns. A society full of people who think they are rested could explode into a public health nightmare. 
“Sleep deprivation is associated with poor immunity. It could take something drastic shocking before people realize how important it is,” he said. There’s no denying modern society is obsessed with how to sleep less and still feel good. Where there’s a demand there’s a market, and in this case a tech-centric one. But it’s important for people to listen to their bodies, not just their devices, Dasgupta said. “I don’t think less sleep is what our bodies want,” he said. “But it’s where we’re headed.” You’ll Sleep When You’re Dead is Motherboard’s exploration of the future of sleep.
Orbital neutron spectroscopy is commonly divided into three distinct energy regimes: thermal (low energy), epithermal (intermediate energy) and fast (high energy), each providing complementary information about elemental abundance and distribution (spatial and depth). The process starts with fast neutrons created by cosmic-ray interactions in the lunar regolith. Elastic neutron–proton scattering causes these neutrons to rapidly lose energy, shifting some of them into the epithermal regime. Subsequent moderation and/or capture processes can further modify the flux and spectrum, thereby imprinting details of the intervening material on escaping neutrons. Owing to the efficiency of the neutron energy-loss process, the epithermal regime is an especially sensitive probe of hydrogen1, 2. Epithermal neutron deficits measured from orbit are therefore indicative of enhanced hydrogen abundances. Proper determination of statistical significance is often exchanged for approximate methods that may be simpler or reduce computation requirements. Low signal-to-noise environments require a more rigorous approach. Relevant statistical descriptions are based on particle counts, not rates, and therefore require the use of exposure distributions in addition to observed neutron count rates. Our statistical analysis approach uses a likelihood parameter λ to characterize consistency between acquired neutron data and a hydrogen-poor (null) hypothesis on a pixel-by-pixel basis. This parameter incorporates fundamental observational details as well as the inherent uncertainties associated with counting statistics. The likelihood parameter is governed by a well-known statistical distribution (χ2) and, hence, can be used to exclude features of low or marginal significance. The null hypothesis is rejected, on a pixel-by-pixel basis, if λ exceeds a predetermined critical value. 
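The link between a critical value of λ and a chance probability can be checked numerically. The following is a minimal sketch, not the analysis code of ref. 15: it assumes that λ follows a χ2 distribution with 1 degree of freedom, for which the tail probability reduces to erfc(√(λ/2)), and it uses the critical value λ = 25 adopted in this work. The function names are illustrative.

```python
import math

def chi2_tail_1dof(lam):
    """Chance probability P(chi^2 > lam), assuming 1 degree of freedom."""
    return math.erfc(math.sqrt(lam / 2.0))

def equivalent_sigma(lam):
    """Two-sided Gaussian significance equivalent to lam for 1 degree of freedom."""
    return math.sqrt(lam)

def reject_null(lam, critical=25.0):
    """Pixel-level test: reject the hydrogen-poor hypothesis if lam exceeds the critical value."""
    return lam > critical

if __name__ == "__main__":
    p = chi2_tail_1dof(25.0)
    print(f"lambda = 25: P = {p:.2e} ({equivalent_sigma(25.0):.1f} sigma)")
```

Under these assumptions, λ = 25 gives a chance probability of about 5.7 × 10^−7, the two-sided 5σ Gaussian level.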
For this work the critical value (λ = 25) was chosen to correspond to a chance probability of 5.7 × 10^−7, equivalent to a 5σ Gaussian detection with 1 degree of freedom. Additional details of the statistical analysis framework used here are found in ref. 15. Significance maps are converted to hydrogen abundance distributions following the procedure outlined in ref. 14 (and references therein). Briefly, the statistical significance (for example, λ-statistic) is proportional to the magnitude of neutron count rate deficits, which in turn correlates directly with hydrogen abundance1. The relationship between neutron count rates and hydrogen abundances has been derived using Monte Carlo simulations that assume that the regolith has a composition equivalent to ferroan anorthosite1. Hydrogen abundance distributions for Lunar Prospector (LP), obtained following the likelihood-based analysis protocols described above, are shown in Fig. 1. The Lunar Reconnaissance Orbiter (LRO) Lunar Exploration Neutron Detector (LRO/LEND) includes a combination of collimated and uncollimated 3He sensors3, 31, 32 with one of the four uncollimated sensors configured for epithermal neutron detection. The collimated sensors for epithermal neutrons (CSETN) were designed to provide data with improved spatial resolution over uncollimated sensors, but low count rates and systematic background effects limit their value for confidently inferring hydrogen concentrations with high spatial resolution14, 33, 34, 35, 36. Because of these documented problems, the collimated LEND data are not used in this study. Uncollimated epithermal neutron data from the LEND sensors for epithermal neutrons (SETN) have been shown to have a reasonably good spatial correlation with the uncollimated LP data14, 32. 
The correlations between the two uncollimated data sets, however, are not perfect, and at best only qualitative suggestions have been provided to explain discrepancies that occur in both equatorial and polar regions32. Similar to LP, the spatial distribution of hydrogen derived from LEND-SETN does not match the predicted locations of water ice in the present thermal environment, and shows a broad, asymmetric, slightly off-polar distribution32. However, there are quantitative differences between LEND and LP that are not fully understood or documented. Our confidence in the LP data is well grounded because the LP data were measured with well-characterized sensors on a boom such that backgrounds from nearby materials were both understood and minimized37, and because the data reduction is supported by extensive documentation38 and a well-validated comparison with modelled count rates39. We expect that a more detailed analysis of the LEND data could provide additional insight to the differences between these data sets. However, such an analysis is beyond the scope of this study, and we have therefore focused on the LP-derived parameters. Two surface features are antipodal if they lie on diametrically opposite sides of a planet, such that a line connecting the two points passes through the centre of the planet. If a feature has a latitude and longitude of (θ, φ), then the antipode is located at (−θ, φ + 180°). This type of symmetry can also be referred to as an inversion, central reflection or point reflection. In typical map projection of polar data sets (for example, Fig. 1), antipodal features do not appear simply shifted by 180°. The different handedness between the north and south polar maps results in an additional reflection. This geometry is illustrated schematically in Extended Data Fig. 1a and b, in which two antipodally symmetric features are shown in north and south polar maps, respectively. 
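The antipode rule stated above, (θ, φ) → (−θ, φ + 180°), can be written as a short helper. This is an illustrative sketch only: the function names and the tolerance are assumptions, and longitudes are wrapped to the range 0° to 360° east.

```python
import math

def antipode(lat_deg, lon_deg):
    """Return the antipode (-lat, lon + 180 deg), with longitude wrapped to [0, 360)."""
    return -lat_deg, (lon_deg + 180.0) % 360.0

def unit_vector(lat_deg, lon_deg):
    """Cartesian unit vector for a latitude and east longitude in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def are_antipodal(p, q, tol_deg=1e-3):
    """True if points p and q (lat, lon) are separated by 180 deg within tol_deg."""
    dot = sum(a * b for a, b in zip(unit_vector(*p), unit_vector(*q)))
    separation = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    return abs(separation - 180.0) <= tol_deg

if __name__ == "__main__":
    # Offset pole pair used for the thermal model maps in this work.
    print(antipode(84.5, 138.6))
```

For example, the offset pole pair used for the thermal model maps (84.5° N, 138.6° E and 84.5° S, 318.6° E) satisfies this relation.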
To illustrate the antipodal nature of this feature, we show both north and south features in each plot, with the antipodal feature shown as it would appear if you could view it through the Moon. In this projection, a feature rotated by an angle (α) of 180° about the pole will exactly line up with its antipodal self (Extended Data Fig. 1c, d). The Pearson product-moment correlation coefficient r is used to quantify the strength of the correlations between data sets40. Values of this statistic occur within limits (−1 ≤ r ≤ 1) corresponding to perfect anti-correlation and correlation, respectively. A two-point correlation was implemented to operate on pixelated spatial distributions characterized by latitude and longitude. This coefficient measures similarities in relative amplitude (or shape) only, and is not used to evaluate the physics implications of the absolute neutron rates or thermal parameters. By itself the correlation coefficient is not a good statistic for determining the quality of an observed correlation. However, the significance of differences in correlation coefficients is relevant. The Fisher z-transformation facilitates hypothesis testing by quantifying whether a change in some physical parameter modifies the baseline correlation between two distributions. When applied to the Pearson coefficient it stabilizes the variance41 and can be used to determine significance. Fisher’s transformation takes the form $z(r) = \tfrac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$ (1) and has a standard error of $\sigma_z = 1/\sqrt{N-3}$, where N is the number of measurements in the population. If the baseline correlation (or null hypothesis) is characterized by a coefficient $r_0$ and a second correlation by $r$, then the two-sided significance of the difference between the two measured coefficients ($\Delta z = |z(r_0) - z(r)|$) is $P = \mathrm{erfc}\left(\frac{\Delta z}{\sqrt{2}\,\sigma_z}\right)$ (2), where erfc(x) is the complementary error function41. The Fisher transformation also enables determination of confidence levels. 
Because z can be approximated by a normal distribution with known variance, a 90% confidence interval is given by $z \pm 1.645/\sqrt{N-3}$, and application of the inverse Fisher transform, $r = \tanh(z)$, yields the relevant confidence intervals on the measured correlation coefficient r. The determination of significance assumes that polar map pixels are independent. A globally mapped, equal-area pixel size was selected to match the ~45-km spatial resolution of acquired neutron data14, but total independence cannot be assured. This results in each polar region containing 364 equal-area pixels. Of those, only 248 (123 in the north and 125 in the south) meet the statistical threshold requirement discussed below (λ = 25). The number of pixels (N) used to evaluate significance was independent for each α and includes only those pixels common to the unrotated case (α = 0°); at peak significance N = 236. Using equation (1), the peak in Fig. 2b corresponds to Δz = 0.548 (with correlation coefficients of 0.728 and 0.356). Substituting these values into equation (2), we obtain $-\log_{10}(P) = 16.2$, which corresponds to the peak in Fig. 2c and is equivalent to about 8.3σ. Even if the spatial distributions are oversampled by a factor of two (an extreme exaggeration that reduces the number of pixels to N = 118), the observed antipodal correlation is still significant, with a chance probability <10^−9, which exceeds a 5σ threshold. Random processes (noise) will degrade any observed correlations. Therefore, investigating the dependence of inter-polar correlations on the likelihood parameter λ is instructive because it serves as a proxy for statistical significance of features. North and south polar hydrogen distributions show evidence for a strong near-antipodal relationship. Extended Data Fig. 2 shows peak significance $-\log_{10}(P)$ as a function of λ. A reduction in correlation significance as low-λ (low statistical significance) features are included is evident. 
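The Fisher-transform significance test described above can be sketched in a few lines. The forms used here are assumptions based on the standard definitions, z(r) = atanh(r), a standard error of 1/√(N − 3), and P = erfc(Δz/(√2 σ)); the function names are illustrative and do not come from the original analysis.

```python
import math

def fisher_z(r):
    """Fisher z-transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def z_standard_error(n):
    """Standard error of z for n independent measurements."""
    return 1.0 / math.sqrt(n - 3)

def difference_significance(delta_z, n):
    """Two-sided chance probability for a difference delta_z between two z values."""
    return math.erfc(delta_z / (math.sqrt(2.0) * z_standard_error(n)))

def confidence_interval(r, n, z_crit=1.645):
    """Confidence interval on r (90% for z_crit = 1.645) via the inverse transform tanh."""
    z, half_width = fisher_z(r), z_crit * z_standard_error(n)
    return math.tanh(z - half_width), math.tanh(z + half_width)

if __name__ == "__main__":
    p = difference_significance(0.548, 236)
    sigma = 0.548 / z_standard_error(236)
    print(f"-log10(P) = {-math.log10(p):.1f}, about {sigma:.1f} sigma")
    print("90% interval on r = 0.728:", confidence_interval(0.728, 236))
```

With Δz = 0.548 and N = 236, these assumed forms give −log10(P) ≈ 16.2, which matches the value quoted above. With N = 118 the same Δz still corresponds to more than 5 standard errors.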
Such a trend is expected if the features identified above the critical threshold (λ ≥ 25) are real, and those below are dominated by statistical fluctuations. The central argument for the existence of lunar volatiles is based on thermal modelling42. The forward thermal model presented here (for example, Figs 1c, d, 3 and 4) is an updated version of that presented in refs 5 and 6. The thermal model we use is intended to be the simplest model that can reproduce the major features of the LRO Diviner south polar observations. Updates to the model of ref. 5 include the use of polar meshes and an updated model of the Sun and its ephemeris, as detailed below. Polar meshes were modified to reproduce thermal conditions under the assumption of a past spin by transforming the polar stereographic ‘z’ coordinate to appropriately represent the change in polar position. Our offset figures were created with a map shifted to have the poles at 84.5° N, 138.6° E and 84.5° S, 318.6° E, which correspond to simple averages of the two polar hydrogen maxima. We model the Sun using a triangular mesh consisting of 128 triangles whose radiance decreases with distance from the centre of the Sun according to the solar limb darkening curve5, 6. The location and distance of the Sun relative to the Moon as a function of time is determined using the DE421 JPL Planetary Ephemeris. The full-resolution thermal-model results for ice stability depths in Fig. 1 are presented in Extended Data Fig. 3a–d. A version of Extended Data Fig. 3b has previously been published in ref. 5, but the remaining models are new. These models show where ice would be stable from sublimation at a rate of 1 mm Gyr−1 assuming a regolith cover. These depths have been shown to be consistent with radar43, 44 and neutron-spectrometer-derived depths on Mercury45. Volatiles should collect in the most thermally stable environments—permanently shadowed regions. 
Using a thermal and ice stability model4, 5, 6, the location of possible volatile reservoirs can be identified for different polar axis locations. The model outputs for the current lunar orientation are shown in Fig. 1. To facilitate direct comparisons with the hydrogen distributions the model outputs have been degraded to a spatial resolution of 30 km from the original 0.5 km to approximate the spatial resolution of the LP Neutron Spectrometer instrument in its low-altitude orbit. Water-ice stability depths for the current orientation, the proposed palaeopole orientation and an admixture between the two (at about 30-km resolution) are shown in Extended Data Fig. 4, which repeats parts of Fig. 1 for clarity. Admixtures of the present-day- and palaeo-axis model results (Fig. 1e, f; Extended Data Fig. 4c, f) are better correlated to the neutron data than is the present-day model alone. A given admixture is a reasonable descriptor if the corresponding correlation between it and the hydrogen distribution improves; here, statistical significance is measured relative to the correlation with a pure current spin-axis thermal model. For the north polar region, a present-day-only model (Fig. 1c, Extended Data Fig. 4a) is excluded at the 90% confidence level and the best descriptor is a 57%–43% admixture of current- and palaeo-axis hypotheses, respectively. The south polar region (Fig. 1d, Extended Data Fig. 4d) is consistent with a pure current spin-axis hypothesis, although the optimum north pole admixture (the 57%–43% mixture) is allowed at the 90% confidence level. This strengthens the argument that the identified longitudinal bias is related to topographic and thermal effects on hydrogen. We caution not to over-interpret the thermal analysis because it is an approximation that incorporates only two unique polar-axis locations. A more rigorous analysis must fully account for the TPW path and chronology. 
However, if temperature is the fundamental parameter driving volatile retention, then this approximation provides useful insights and additional support for our hypothesis of a palaeo-axis and TPW migration. Given higher-resolution neutron measurements46 and advances in polar crater chronology47, it may be possible to use comparisons between neutron data and crater age to help constrain the timescale of the suggested lunar TPW. If certain craters did not exist at the time of hydrogen deposition, then they will plausibly remove near-surface hydrogen-rich materials, setting a lower limit on hydrogen age. Conversely, if hydrogen is found to be associated with relatively young craters (about 2–3.5 Gyr), then it will set an upper limit on the age of hydrogen emplacement and constrain many TPW models. Evolution of lunar obliquity can also influence volatile survivability and its spatial distribution and may inform this timeline4, 12, 48, 49. Changes in the spin axis of a planetary body fall into two categories50, 51, 52: changes in obliquity and true polar wander (TPW) (Extended Data Fig. 5). The first category involves changes in the orientation of the spin axis in inertial space (that is, changing the position of the spin axis with respect to the celestial sphere), but not with respect to the surface of the planet (Extended Data Fig. 5b). In other words: the obliquity of the planet (the angle between the planet’s spin axis and the planet’s orbit normal) changes. Changes of this type result from external torques acting on the planet that can alter the planet’s angular momentum (both in magnitude and direction). For planets, the most notable torques are tidal torques from satellites, the Sun and other planets. Precession and nutation are well-known examples of this form of spin evolution for the Earth, as are Cassini state transitions for the Moon and Mercury53, 54. 
Spin evolution of this type can have large influences on the stability of ice at the lunar poles4, 12, 48, 49. In general, near-zero obliquity is required for ice stability at the poles. The second category of changes in planetary spin axes are those that change the orientation of the spin axis with respect to the surface of the planet, but do not change the position of the spin axis in inertial space (Extended Data Fig. 5c). This reorientation of the planet with respect to the spin axis is generally referred to as TPW18, 55, 56. Changes of this type are due to changes in the mass distribution within a planet or its hydrosphere/atmosphere. Redistribution of mass within the planet alters its inertia tensor. In a minimum energy rotation state, the rotation axis will be aligned with the maximum principal axis of inertia. If the mass redistribution changes the direction of the maximum principal axis, then the planet will reorient to keep the maximum principal axis aligned with the spin axis. Thus, to an outside (inertial) observer, the surface of the body appears to reorient with respect to the spin axis and maximum principal axis of inertia—as long as the changes in the mass distribution occur slowly with respect to the free precession period of the planet. If the changes in the inertia tensor are rapid (as might happen in the aftermath of a giant asteroid impact), the planet will enter an ephemeral period of non-principal axis rotation until the planet dissipates enough energy to return to principal axis rotation58. For most non-catastrophic geologic processes (for example, mantle convection and isostatic relaxation of topography), it is generally safe to assume that the planet always remains in principal axis rotation. TPW has been directly measured for the Earth, in the form of periodic TPW (driven by seasonal variations in atmospheric pressure, oceanic currents and ice loading) and secular TPW (driven by post-glacial rebound and mantle convection)50. 
Beyond Earth, TPW has been inferred for a variety of planetary bodies, including the Moon19, 20, 21, 59, 60, 61, 62, 63, 64, Enceladus65, 66, Europa66, 67, 68 and Mars69, 70 (see ref. 18 for a review). Because TPW does not change the orientation of the planet’s spin vector in inertial space, the instantaneous spin pole can remain a volatile cold trap (for sufficiently small obliquities). Our epithermal neutron palaeopole is not the first palaeopole proposed for the Moon. Extended Data Fig. 6 summarizes all previously proposed lunar palaeopoles. Lunar palaeopoles can be subdivided into three distinct categories on the basis of the data set used to identify them: (1) palaeomagnetic poles, (2) fossil-figure poles determined from long-wavelength topography or gravity, and (3) palaeopoles inferred from the distribution of polar volatiles (proposed for the first time here). Here we summarize these methods and the associated difficulties. The first lunar palaeopoles were inferred from orbital surveys of crustal magnetic anomalies from Apollo 15 and 16 sub-satellites59, 60, and have subsequently been measured to higher precision with Lunar Prospector61 and Kaguya19, 63, 64 observations. These magnetic anomalies can be fitted with source models of varying prescription, and a local dipole magnitude and orientation can be determined. Assuming that this local dipole is a frozen remnant from a global, body-centred core dynamo field, the geometry of this local field can be used to infer a palaeomagnetic pole (that is, the surface location where the magnetic dipole intersects the surface). Under the assumption that the dipole is aligned with the spin axis of the Moon, this palaeomagnetic pole is then a record of the spin pole at the time at which the magnetic anomaly formed. There are several difficulties with interpreting palaeomagnetic poles. First, not all magnetic anomalies trace global dynamos. 
Large-scale impacts generate transient magnetic fields that can be different from any core dynamo existing at that time. Many deposits associated with magnetic anomalies (particularly those associated with impact ejecta, or features antipodal to large basins) may have experienced rapid shock-remnant magnetization during these transient fields, and thus may not accurately trace a core dynamo. To determine a true magnetic palaeopole, it is necessary to identify deposits that cooled slowly, well after the dissipation of any transient field (that is, thermoremnant magnetization). Identifying these deposits is difficult, and has been done convincingly only for a few magnetic anomalies61, 64. Although disentangling shock-remnant and thermoremnant anomalies is difficult, it is still curious that many magnetic anomalies cluster into two groups: one near the present-day spin-pole, and one in the far-side mid-latitudes19. The second major difficulty with interpreting palaeomagnetic poles is that they may not accurately trace the spin axis of a planet. This is the case on Earth, where the magnetic pole is misaligned with the spin pole by about 10°. Future work will need to investigate the formation and evolution of the lunar dynamo, in 3D, to determine how large of a misalignment is possible. There have been some attempts to infer palaeomagnetic poles from analysis of remnant magnetism in samples returned from the Apollo missions. Because the original orientation of these samples is unknown, it is not possible to completely describe the field geometry at the time these samples acquired their magnetizations—however, it is possible to infer the palaeolatitude of the samples on the basis of the orientation of the remnant field with respect to the sample’s magnetic fabric, which is used as a proxy for palaeohorizontal. Analysis of multiple samples from multiple Apollo landing sites has been used to infer palaeomagnetic poles62. 
The second type of palaeopole is that inferred from measurements of the Moon’s long-wavelength topographic shape and gravitational field, the so-called ‘fossil figure’. Following the Moon’s formation and differentiation, the Moon was largely molten, and probably possessed a triaxial figure in equilibrium with the tidal and rotational potential of its early orbit. Eventually, the Moon cooled, and developed an elastic lithosphere capable of supporting this primordial, fossil triaxial figure over geologic time. The axis associated with the maximum principal moment of inertia of this figure would represent the palaeopole at the time that the elastic lithosphere formed. This fossil figure was preserved even as the Moon migrated to larger radial distances from Earth and the tidal and rotational potentials decreased. Although it is possible to directly measure the Moon’s present-day figure and its associated pole (quantified by degree-2 gravity and topography, and libration measurements), it is non-trivial to measure the primordial figure. Giant impact basins (particularly the South Pole–Aitken basin) and other large-scale geologic processes alter the Moon’s figure and obscure the true fossil figure. Garrick-Bethell et al.20 and Keane and Matsuyama21 have developed two different methods for isolating this fossil figure. A critical comparison of these two works is beyond the scope of this paper, but both suggest that the fossil figure has reoriented by 15°–30° (although in different directions). Although there is substantial scatter in the lunar palaeopoles reported in the literature (Extended Data Fig. 6), future work might be able to synthesize these data sets into a cohesive history of lunar TPW. Studies of the lunar fossil figure20, 21 should provide the ‘initial’ spin pole of the Moon. Palaeomagnetic poles probably trace the lunar pole during the subsequent 1 Gyr, when the core dynamo was active71. 
Because polar volatiles are stable only during near-zero (roughly <12°) obliquity, polar volatiles probably trace polar wander only after the highly uncertain Cassini-state transition49. Although polar volatiles may not be able to trace the earliest episodes of lunar TPW, they have the distinct advantage of being capable of tracing small amounts of polar wander, late in lunar history. Under the assumption that the epithermal neutron palaeopole (north pole: 84.9° N, 147.9° E; south pole: 84.1° S, 309.4° E) is a former rotational palaeopole, and thus a former maximum principal axis of inertia, we ask the question: what mass anomaly would be required to reorient the Moon from this palaeopole to its present-day spin pole (90° N/S)? Phrased in terms of inertia tensors, this question is equivalent to $\mathbf{I}_{\mathrm{Moon}} = \mathbf{I}_{\mathrm{anomaly}} + \mathbf{I}_{\mathrm{palaeo}}$, in which $\mathbf{I}_{\mathrm{Moon}}$ is the present-day lunar inertia tensor, $\mathbf{I}_{\mathrm{anomaly}}$ is the inertia tensor of some arbitrary mass anomaly and $\mathbf{I}_{\mathrm{palaeo}}$ is an undetermined inertia tensor with the maximum principal axis of inertia aligned with the epithermal neutron palaeopole. The goal here is to find all possible $\mathbf{I}_{\mathrm{anomaly}}$ that satisfy this condition. The present-day lunar inertia tensor $\mathbf{I}_{\mathrm{Moon}}$ can be determined directly from a combination of degree-2 spherical harmonic gravity coefficients, $J_2$ ($-C_{20}$) and $C_{22}$, and libration parameters, β and γ, and is well constrained21, 57, 72, 73. In a principal-axis reference frame, the lunar inertia tensor can be written $\mathbf{I}_{\mathrm{Moon}} = \mathrm{diag}(A, B, C)$, in which A, B and C are the minimum, intermediate and maximum principal moments of inertia. Following ref. 57, it is convenient to define these principal moments in terms of their departures from the mean moment of inertia, $I_0 = (A + B + C)/3$, expressed in units of $MR^2$, in which M and R are the mass and radius of the Moon, respectively. A, B and C can then be written $A = I_0 - \left(\tfrac{1}{3}J_2 + 2C_{22}\right)MR^2$, $B = I_0 - \left(\tfrac{1}{3}J_2 - 2C_{22}\right)MR^2$ and $C = I_0 + \tfrac{2}{3}J_2\,MR^2$. The inertia tensor of an arbitrary mass anomaly $\mathbf{I}_{\mathrm{anomaly}}$ depends strongly on the assumed location, geometry and mass distribution of the perturbing mass anomaly. 
However, if we consider the limiting case of an axisymmetric anomaly centred on the north pole of the planet (such that the symmetry axis of the anomaly and the z axis are aligned), then the inertia tensor of the anomaly is reduced to a single parameter. This is because an axisymmetric anomaly centred on the north pole will contribute only to $J_2$, and to no other degree-2 spherical harmonic gravity coefficients, owing to symmetry. Following ref. 50, we then relate the degree-2 gravity of the mass anomaly located on the pole directly to an inertia tensor $\mathbf{I}'_{\mathrm{anomaly}} = J_2'\,MR^2\,\mathrm{diag}\left(-\tfrac{1}{3}, -\tfrac{1}{3}, \tfrac{2}{3}\right)$, in which $J_2'$ is the degree-2 zonal spherical harmonic coefficient associated with the mass anomaly when centred on the north pole (aligned with the positive z axis). Because we are concerned only with the orientation of the principal axes of inertia (the maximum of which is presumed to be associated with a palaeopole), the mean moment of inertia can be neglected. The mean moment of inertia is spherically symmetric and does not control the orientation of the principal axis of inertia. (Stated another way: the mean moment of inertia affects the eigenvalues of the inertia tensor, but not the eigenvectors.) $\mathbf{I}'_{\mathrm{anomaly}}$ is the inertia tensor for the case in which the mass anomaly is located on the north pole (with the symmetry axis aligned with the positive z axis). To determine the inertia tensor for a mass anomaly located anywhere on the Moon, we rotate the inertia tensor: $\mathbf{I}_{\mathrm{anomaly}} = \mathbf{R}_{\mathrm{rot}}\,\mathbf{I}'_{\mathrm{anomaly}}\,\mathbf{R}_{\mathrm{rot}}^{\mathrm{T}}$, in which $\mathbf{R}_{\mathrm{rot}}$ is a rotation matrix to rotate the mass anomaly from the north pole to an arbitrary latitude and longitude and $\mathbf{R}_{\mathrm{rot}}^{\mathrm{T}}$ is the transpose of $\mathbf{R}_{\mathrm{rot}}$. Ultimately, $\mathbf{I}_{\mathrm{anomaly}}$ is simply a function of $J_2'$ and the position (latitude and longitude) of the anomaly. For simplicity, we define the quantity $Q = -J_2'/J_2$, with $J_2$ the degree-2 zonal gravity harmonic measured by GRAIL74: $J_2 = -C_{20} = 203.2133 \times 10^{-6}$, in unnormalized spherical harmonic coefficients. The negative sign forces Q to be positive for positive mass anomalies and negative for negative mass anomalies. 
To determine the possible locations and magnitudes of perturbing mass anomalies that could be responsible for the observed epithermal neutron palaeopole, we performed a parameter-space survey investigating the effect of placing mass anomalies of various sizes (Q) across the surface of the Moon. For each test case, we determined the palaeo inertia tensor: $\mathbf{I}_{\mathrm{Moon}} - \mathbf{I}_{\mathrm{anomaly}} = \mathbf{I}_{\mathrm{palaeo}}$. We determined the orientations of the principal axes of inertia by evaluating the eigenvalues and eigenvectors of $\mathbf{I}_{\mathrm{palaeo}}$. We then measured the mean angular separation between the maximum principal axis of inertia and the epithermal neutron north and south poles. Extended Data Figure 7a–d shows example slices of this parameter-space search for positive and negative mass anomalies. The regions that can drive the required reorientation to within the measured uncertainty (approximately 1°) are limited. Figure 3a–c and Extended Data Fig. 7e show the acceptable regions in which a mass anomaly of a range of sizes (Q) could produce the required reorientation to within 1°. Lunar impact basins, uncompensated topography and mare basalts can have a substantial contribution to the inertia tensor of the Moon21, 58, 75. To determine if these features were possibly responsible for the reorientation that is required to explain the epithermal neutron palaeopoles, we considered a simple case of a spherical cap of uniform surface density (Extended Data Fig. 8a). Extended Data Figure 8b shows Q for spherical caps as a function of surface density (which, assuming a material density, can be converted into an equivalent, uncompensated material thickness) and cap radius. For the typical sizes of large impact basins (radii of <15°), required mass anomalies (Fig. 3a–c; |Q| > 0.2) would be equivalent to >5 km of uncompensated topography (either a topographic excess or depression, depending on the sign of the surface density). This magnitude of uncompensated topography or mare basalts is not observed in any lunar impact basin. 
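A single test case of the parameter-space survey described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the survey:

- Tensors are expressed in units of MR², with the mean moment removed.
- The present-day tensor is built from the standard relations between the principal moments, J2 and C22.
- The C22 value is an approximate unnormalized figure that does not appear in the text.
- Q is assumed to equal minus the anomaly's J2 divided by the lunar J2.
- A shifted power iteration stands in for a full eigen-decomposition.
- The test anomaly (Q = 0.22 at 27° N, 148° E) is purely illustrative.

```python
import math

J2_MOON = 203.2133e-6   # degree-2 zonal harmonic quoted in the text (unnormalized)
C22_MOON = 22.38e-6     # ASSUMPTION: approximate unnormalized C22, not given in the text

def diag(a, b, c):
    return [[a, 0.0, 0.0], [0.0, b, 0.0], [0.0, 0.0, c]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(p):
    return [list(row) for row in zip(*p)]

def rotation_from_pole(lat_deg, lon_deg):
    """Rotation carrying the +z axis (north pole) to the given latitude and longitude."""
    colat, lon = math.radians(90.0 - lat_deg), math.radians(lon_deg)
    ry = [[math.cos(colat), 0.0, math.sin(colat)],
          [0.0, 1.0, 0.0],
          [-math.sin(colat), 0.0, math.cos(colat)]]
    rz = [[math.cos(lon), -math.sin(lon), 0.0],
          [math.sin(lon), math.cos(lon), 0.0],
          [0.0, 0.0, 1.0]]
    return matmul(rz, ry)

def moon_tensor():
    """Present-day tensor in the principal-axis frame (units of M R^2, mean moment removed)."""
    return diag(-J2_MOON / 3 - 2 * C22_MOON, -J2_MOON / 3 + 2 * C22_MOON, 2 * J2_MOON / 3)

def anomaly_tensor(q, lat_deg, lon_deg):
    """Axisymmetric anomaly of size q, built at the pole and rotated to (lat, lon)."""
    j2_anomaly = -q * J2_MOON   # ASSUMPTION: Q = -J2(anomaly) / J2(Moon)
    polar = diag(-j2_anomaly / 3, -j2_anomaly / 3, 2 * j2_anomaly / 3)
    rot = rotation_from_pole(lat_deg, lon_deg)
    return matmul(matmul(rot, polar), transpose(rot))

def max_principal_axis(m, iterations=2000):
    """Eigenvector of the largest eigenvalue by shifted power iteration (z >= 0 sign)."""
    shift = sum(abs(x) for row in m for x in row)  # m + shift*I has no negative eigenvalues
    v = [0.3, 0.5, 0.8]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(3)) + shift * v[i] for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v if v[2] >= 0 else [-x for x in v]

def palaeopole(q, lat_deg, lon_deg):
    """Latitude and east longitude (deg) of the maximum principal axis of I_Moon - I_anomaly."""
    moon, anomaly = moon_tensor(), anomaly_tensor(q, lat_deg, lon_deg)
    palaeo = [[moon[i][j] - anomaly[i][j] for j in range(3)] for i in range(3)]
    x, y, z = max_principal_axis(palaeo)
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lat, math.degrees(math.atan2(y, x)) % 360.0

if __name__ == "__main__":
    lat, lon = palaeopole(0.22, 27.0, 148.0)   # hypothetical test anomaly
    print(f"implied palaeopole: {lat:.1f} N, {lon:.1f} E")
```

Repeating `palaeopole` over a grid of Q, latitude and longitude, and comparing each result with the epithermal neutron palaeopole, would reproduce the structure of the survey. Any numerical agreement depends on the assumed constants.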
In the following section, we exclude impact basins and mare basalts in a more rigorous manner. Internal mass anomalies, including mantle plumes or lateral variations in composition or density, can also contribute to the Moon’s inertia tensor. For simplicity, we considered a simple spherical mass anomaly, spanning from the outer core to the lunar crust, with an arbitrary density contrast (Extended Data Fig. 8c). In this case, Q for this simple internal anomaly is dependent only on the density contrast, as shown in Extended Data Fig. 8d. The smallest required mass anomalies (Fig. 3a–c; |Q| ≈ 0.2) would be equivalent to density anomalies of only |Δρ| ≈ 10 kg m$^{-3}$. If these density anomalies are thought to arise from temperature variations, this would be equivalent to |ΔT| ≈ 100 K (assuming a 3,300 kg m$^{-3}$ mantle density and a volumetric coefficient of thermal expansion of $3 \times 10^{-5}$ K$^{-1}$)76. Temperature anomalies of this magnitude are easily generated in thermal evolution models of the PKT8, 9, 10, 11. This drives our detailed investigation of the TPW potential of the PKT. To determine whether impact-basin features can produce the mass anomalies required to explain the epithermal neutron palaeopole, we used the method of ref. 21 to isolate the degree-2 gravity field of these features. Extended Data Figure 8f shows the best-fit mass anomaly (Q) associated with each of the 32 largest lunar impact basins. From Extended Data Fig. 8f it is clear that most lunar impact basins have a small contribution to the degree-2 gravity field of the Moon, with the exception of the South Pole–Aitken basin and its associated ejecta blanket. All other impact basins have |Q| < 0.2, which is the smallest possible value of Q that can reorient the Moon enough to explain the epithermal neutron palaeopoles (Fig. 3a–c). The only large impact basin that is located in a place that could potentially reorient the Moon in the necessary direction is Moscoviense (27° N, 148° E).
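The conversion between the density anomaly and the temperature anomaly quoted above follows from the linear thermal-expansion relation |Δρ| = ρ α |ΔT|. The short check below is a sketch using only the values stated in the text; the function name is illustrative.

```python
def density_contrast(rho, alpha, delta_t):
    """Magnitude of the density change (kg m^-3) from thermal expansion: rho * alpha * dT."""
    return rho * alpha * abs(delta_t)

RHO_MANTLE = 3300.0   # kg m^-3, mantle density quoted in the text
ALPHA = 3e-5          # K^-1, volumetric coefficient of thermal expansion quoted in the text

# A 100 K anomaly gives roughly 10 kg m^-3, matching the smallest required |Q| of about 0.2.
print(density_contrast(RHO_MANTLE, ALPHA, 100.0))
```

Heating lowers the density and cooling raises it, so only the magnitude is returned here; the sign follows the sign of the temperature anomaly.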
For it to cause the observed reorientation, Moscoviense would need to be a present-day positive mass anomaly, with Q ≈ +0.22 (Fig. 3a–c). From the inverse modelling of this basin’s gravity field, we find that Moscoviense is a net negative mass anomaly, with Q < 0.1. Thus, even the favourably located Moscoviense impact basin is not capable of causing the required reorientation. Lunar impact basins tend to have a negligible contribution to degree-2, owing to the detailed structure of their gravity fields. Large lunar impact basins frequently possess large, central, positive free-air anomalies (so-called ‘mascons’77), surrounded by a broad, negative free-air anomaly collar resulting from the deposition of ejecta and thickening of the crust78. This alternating positive/negative ‘bull’s-eye’ pattern results in an almost net-zero contribution to the degree-2 gravity field21. It is possible that impact basins had more substantial contributions to degree-2 shortly after they formed, and before the formation of the central mascon, due to viscoelastic relaxation, mantle-flow, and cooling and contraction of the impact melt pool; however, this would be a transient stage lasting less than 30 Myr (ref. 78). It is unlikely that all of the observed hydrogen deposits formed in such a short time-span. Furthermore, if large impact basins were responsible, then we would expect several sets of antipodal epithermal neutron deposits, rather than just one. Although the South Pole–Aitken basin and its associated global ejecta blanket easily produce mass anomalies comparable to those required to explain the epithermal neutron palaeopoles21 (Extended Data Fig. 8f), it is not at the proper location to reorient the Moon in the necessary direction (Fig. 3b). In fact, the location of the South Pole–Aitken basin is incompatible with the observed epithermal neutron palaeopole. 
Extended Data Figure 8e illustrates the range of possible palaeopoles for both the South Pole–Aitken basin and the PKT, for a wide range of mass anomalies centred on each feature (the entire parameter space of Extended Data Fig. 8b). The latitude and longitude of the perturbing mass anomaly immediately restrict the possible locations for a palaeopole. The set of possible palaeopoles for PKT naturally passes through the epithermal neutron palaeopoles, whereas the possible palaeopoles associated with the South Pole–Aitken basin are nearly orthogonal to the observed reorientation. Thus, the South Pole–Aitken basin cannot be responsible for the observed epithermal neutron palaeopoles (although asymmetries in the impact basin and associated ejecta blanket79 may complicate this picture). To determine the reorientation of the Moon due to the thermal evolution of the Procellarum KREEP Terrain (PKT), we used the 3D thermochemical convection models of ref. 9 (see ref. 9 for the details of the model). Here, we focus on how we use these models to determine the TPW history of the Moon. The PKT thermal models consist of a 3D spherical grid, with 20-km radial resolution and 60-km lateral resolution. The radial grid runs from the core–mantle boundary (at a radius of R = 390 km) to the Moon’s surface (R = 1,740 km). At each volume element within the model domain, the density varies owing to thermal expansion/contraction; in the bulk composition, it varies owing to partial melting and subsequent melt migration. We determine the full inertia tensor of the model by summing the contribution of each volume element, $I_{xx} = \sum_i \rho_i V_i (y_i^2 + z_i^2)$ and $I_{xy} = -\sum_i \rho_i V_i x_i y_i$, and similarly for the other components of the inertia tensor ($I_{yy}$, $I_{zz}$, $I_{xz}$, $I_{yz}$). Here, $V_i$ is the volume of the ith grid element at position $(x_i, y_i, z_i)$ and $\rho_i$ is the density, which varies with time. In these calculations, we take PKT to be located along the positive z axis.
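The summation over volume elements described above can be written compactly with arrays. The sketch below assumes each grid element is treated as a point mass at its centre and that positions are already in Cartesian coordinates; the function and argument names are illustrative and are not taken from the models of ref. 9.

```python
import numpy as np

def inertia_tensor(positions, volumes, densities):
    """Inertia tensor of a set of volume elements treated as point masses.

    positions: (N, 3) array of element centres (m)
    volumes:   (N,) array of element volumes (m^3)
    densities: (N,) array of element densities (kg m^-3)
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(volumes, dtype=float) * np.asarray(densities, dtype=float)
    r2 = np.sum(positions ** 2, axis=1)
    # I = sum_i m_i (|r_i|^2 * Identity - r_i r_i^T), which gives
    # I_xx = sum m (y^2 + z^2) and I_xy = -sum m x y, and so on.
    outer = (positions * masses[:, None]).T @ positions
    return np.sum(masses * r2) * np.eye(3) - outer
```

Evaluating this function at each time-step, with the densities updated from the thermal model, gives the time-dependent tensor used in the later steps.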
For TPW, it is not only important to consider density variations within the body, but also surface deformation in response to the temperature evolution at depth. As the mantle heats up, the surface will be uplifted in response to the thermal expansion of the mantle. Depending on the magnitude of this surface compensation, it is possible for PKT to act as either a net negative anomaly (Q < 0; if the thermal anomaly at depth dominates) or a net positive anomaly (Q > 0; if the topographic uplift dominates). Our PKT models do not directly take into account changes in surface topography due to thermal evolution at depth. To address this, we followed the approach used in ref. 9 and calculated the amount of surface uplift a posteriori by determining the amount of topography necessary to balance the thermal expansion/contraction of the mantle at depth. For each radial column within the model domain, we determined the initial integrated mass within that column. As the interior warms owing to the evolution of PKT, this results in an overall decrease in density in the column, which, in an incompressible model without surface flexure, leads to a small decrease in the integrated mass within the column. If we assume that the lithosphere can perfectly compensate for this change in density (which would occur only if the lithosphere were completely strengthless), then we add this missing mass back into the model at the uppermost radial volume element within the column. This added mass is a proxy for the topographic uplift resulting from these interior changes in density. Because real planetary lithospheres are not strengthless, and instead possess some rigidity, we modulated this correction by a factor we term the ‘compensation state’ C. If C = 1, then we add in the complete mass correction corresponding to a strengthless lithosphere.
If C = 0, then we do not add in any mass correction, which would correspond to a completely rigid lithosphere, incapable of deforming in response to the interior thermal expansion. Thus, the total inertia tensor $\mathbf{I}_{\mathrm{PKT}}$ from the thermal model is $\mathbf{I}_{\mathrm{PKT}} = \mathbf{I}_{\mathrm{interior}} + C\,\mathbf{I}_{\mathrm{topo}}$, in which $\mathbf{I}_{\mathrm{interior}}$ is the inertia tensor that results from summing up the contribution of each volume element within the model and $\mathbf{I}_{\mathrm{topo}}$ is the inertia tensor that results from the mass due to this dynamic topography in the upper-most grid cell. For all cases, we normalize the final total inertia tensor to the observed mass and radius of the Moon. From the inertia tensor, it is possible to directly calculate spherical harmonic gravity coefficients50. For the case with the PKT centred on the z axis, the degree-2 gravity field associated with PKT is described primarily by $C_{2,0}$, owing to symmetry. Although there is some power in the other spherical harmonic gravity coefficients, $C_{2,0}$ is the most important. The inertia tensor is uniquely related to degree-2 gravity coefficients (and only degree-2 gravity coefficients). In our parameter-space search for possible perturbing mass anomalies (Fig. 3a–c), we assume that the Moon used to have its spin axis at a different location (possibly at the epithermal neutron palaeopole) and was subsequently reoriented to the present-day spin pole. Phrased differently, we assume that the perturbing mass anomaly is still present, and still contributes to the observed lunar inertia tensor and degree-2 spherical harmonic gravity coefficients. Thus, to determine the relative importance of the PKT, it is more useful to define the change in the mass-anomaly size with respect to its present value: ΔQ = Q(t) − Q(0 Gyr ago). This ΔQ is the relevant quantity for the parameter-space survey in Fig. 3a–c, and determines how much the Moon could have reoriented in the past, with respect to its present-day orientation.
A positive ΔQ indicates the presence of a positive mass anomaly (mass excess) with respect to the present state; a negative ΔQ indicates the presence of a negative mass anomaly (mass deficit) with respect to the present state. Q and ΔQ for two end-member PKT thermal anomalies are shown in Extended Data Figs 9, 10. The labels ‘W’ and ‘B’ are shortened from ‘0LW’ and ‘0LB’ adopted from ref. 9, in which ‘0’ denotes low radiogenic mantle composition and ‘L’ denotes the larger (in diameter) of two test cases, ‘W’ denotes KREEP within the crust and ‘B’ denotes KREEP below the crust. To determine how the Moon would reorient under the thermal evolution of the PKT, it is necessary to first reorient the PKT inertia tensor so that it is properly aligned with the approximate centre of the PKT (18° N, 334° E). This can be done by either directly rotating the inertia tensor or rotating the spherical harmonic gravity coefficients via the spherical harmonic addition theorem57, 80. To determine the TPW path predicted from the thermal evolution of the PKT, it is necessary to determine the location of the maximum axis of inertia as a function of time. The location of this maximum axis of inertia is defined as the palaeopole. To calculate this TPW path for any PKT thermal model, we first assume that the present-day, observed lunar inertia tensor is the sum of the inertia tensor from the final time-step of the thermal evolution models (t = 0 Gyr ago), and some non-hydrostatic component (including other impact basins, mascons, the fossil figure and so on) $\mathbf{I}_{\mathrm{other}}$: $\mathbf{I}_{\mathrm{obs}} = \mathbf{I}_{\mathrm{PKT}}$(0 Gyr ago) $+\ \mathbf{I}_{\mathrm{other}}$. In this calculation, we remove the hydrostatic component of the present-day, observed lunar inertia tensor73. Although much of the geologic history outside of the PKT is buried within $\mathbf{I}_{\mathrm{other}}$, it is important to note that most of the other geologic processes on the Moon (for example, impact basins and mare basalts) have negligible contributions to the lunar inertia tensor21 (Extended Data Fig. 8f).
The most substantial other contributors are the South Pole–Aitken basin and its global ejecta blanket21, and the Moon’s fossil figure (the remnant rotational and tidal bulge, preserved from when the Moon’s lithosphere cooled sufficiently to support long-term deformation). Although the nature of the fossil figure is debated20, 21, and the formation of the South Pole–Aitken basin is still poorly understood79, both of these events would have occurred very early in lunar history, probably predating the initial conditions of our PKT thermal model. Thus, we do not expect $\mathbf{I}_{\mathrm{other}}$ to change appreciably during the course of lunar history, but rather expect only negligible perturbations due to the formation of impact basins with time. Because $\mathbf{I}_{\mathrm{obs}}$ is known and $\mathbf{I}_{\mathrm{PKT}}$(0 Gyr ago) is inferred from our PKT thermal evolution models, we rearrange the above equation ($\mathbf{I}_{\mathrm{obs}} = \mathbf{I}_{\mathrm{PKT}}$(0 Gyr ago) $+\ \mathbf{I}_{\mathrm{other}}$) to determine $\mathbf{I}_{\mathrm{other}}$. By isolating the non-PKT, non-hydrostatic component of the lunar inertia tensor, we then determine the inertia tensor as a function of time from our PKT thermal models: $\mathbf{I}(t) = \mathbf{I}_{\mathrm{PKT}}(t) + \mathbf{I}_{\mathrm{other}}$. The palaeopole can be calculated at any time-step in the model by taking the inertia tensor at that time, evaluating the eigenvalue problem and identifying the orientation of the maximum axis of inertia. Figure 4 and Extended Data Figs 9 and 10 show representative TPW tracks calculated using this method. Supplementary Video 1 shows an example of this TPW for our nominal model (model W; C = 0), as viewed from an outside observer. As a consequence of our definition of $\mathbf{I}_{\mathrm{other}}$, the TPW track will always end at the present-day rotation pole. However, the TPW track is not forced to pass through the epithermal neutron palaeopole, although this happens frequently, owing to the placement of the PKT model at the PKT. The thermal evolution of the PKT is not completely axisymmetric about the centre of the PKT, owing to the 3D nature of the problem.
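The TPW-track procedure described above (isolate the non-PKT component from the present day, add it back at every time-step, then take the maximum principal axis) can be sketched as below. The function names are hypothetical, the input tensors are assumed to be already rotated to the PKT location with the hydrostatic part removed, and the series is assumed to be ordered from oldest to present day.

```python
import numpy as np

def max_axis_latlon(tensor):
    """Latitude and longitude (degrees) of the maximum principal axis of a symmetric tensor."""
    vals, vecs = np.linalg.eigh(tensor)
    axis = vecs[:, np.argmax(vals)]
    if axis[2] < 0:                      # report the northern end of the axis
        axis = -axis
    lat = np.degrees(np.arcsin(np.clip(axis[2], -1.0, 1.0)))
    lon = np.degrees(np.arctan2(axis[1], axis[0])) % 360.0
    return float(lat), float(lon)

def tpw_track(i_obs, i_pkt_series):
    """Palaeopole positions for a time series of PKT inertia tensors.

    i_obs:        observed non-hydrostatic tensor (3x3)
    i_pkt_series: sequence of 3x3 PKT tensors; the last entry is t = 0 Gyr ago
    """
    i_other = i_obs - i_pkt_series[-1]   # non-PKT, non-hydrostatic component
    return [max_axis_latlon(i_t + i_other) for i_t in i_pkt_series]
```

By construction the last entry of the track reproduces the pole of `i_obs`, which mirrors the statement that the track always ends at the present-day rotation pole.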
This results in intermediate and minimum principal axes of inertia that are not quite equal ($I_{xx} \neq I_{yy}$), in addition to small, non-zero off-diagonal terms in the inertia tensor ($I_{xy} \neq I_{xz} \neq I_{yz} \neq 0$). These terms can have a small effect on the orientation of the maximum principal axis of inertia derived using the above method. To account for this variation, we rotate the PKT anomaly about the vector aligned with the PKT, and repeat the analysis for all possible PKT rotation angles. Error bars in our TPW paths (for example, in Fig. 4) indicate the 1σ uncertainty in the palaeopole position that results from this effect. In general, it is negligible. Thus far, we have only considered solutions where the compensation state of the lunar crust is constant with time and independent of the position on the surface of the Moon. More complicated histories of the strength of the lithosphere might be possible, but a full parameter-space survey is beyond the scope of this work. In Extended Data Fig. 10m–o, we present three example TPW tracks for cases for which the compensation state varies monotonically with time. Models that allow for the compensation state to increase with time (Extended Data Fig. 10n, o) produce TPW tracks that would markedly reduce the age of the epithermal neutron palaeopole (to only about 1.5 Gyr in Extended Data Fig. 10o). Although this sort of weakening of the lithosphere with time might not be physical, it is possible that the loading and isostatic adjustment (or non-adjustment) of the PKT mare basalts and other near-surface mass anomalies could replicate the effect of this time-varying compensation state. Thus, further study of the geologic and geophysical history of the PKT could provide insight into the long-term stability of lunar polar ice. Our nominal TPW models suggest a source for the observed off-polar hydrogen (plausibly in the form of water ice) that predates the migration of the lunar spin axis.
Because this hydrogen would have to survive for what could amount to several billion years, it may have experienced temperature conditions warmer than present. Therefore, it is important to consider the long-term stability of polar hydrogen. Water ice will be stable if the temperatures in the first few metres of the Moon’s surface remain below about 145 K. Even near the poles, directly illuminated surfaces will experience maximum temperatures that exceed 145 K, which leads to ground ice being stable only within polar craters or regions with high topographic relief5. Above this temperature, water ice (thicker than a single surface-bound monolayer) will sublime on geologic timescales, with rates exceeding 1 mm Gyr$^{-1}$ (refs 25, 81). Although a single monolayer of water is more stable, it is probably not sufficient to cause the observed hydrogen excess. A typical sample of Apollo lunar regolith has a surface area of about 0.5 m$^2$ g$^{-1}$ (refs 82, 83). An idealized monolayer contains approximately $10^{15}$ molecules per square centimetre, so a monolayer contains approximately $5 \times 10^{18}$ molecules per gram of regolith. This corresponds to a mass of $1.5 \times 10^{-4}$ grams of H$_2$O, or 17 p.p.m. of hydrogen atoms. Although variations in grain size may change the ratio of surface area to volume (and thus the mass fraction of hydrogen), with these assumptions, adsorption of water molecules directly to regolith can contribute only a small fraction of the minimum plausible hydrogen concentration observed at the epithermal neutron palaeopole (Fig. 1a, b). Thus, we assume that the observed hydrogen corresponds to either water ice mixed within regolith (pore ice), or hydrogen bound within mineral grains. Ancient ice must also survive billions of years of impact gardening, a process that will slowly mix ice with the surrounding regolith13, 84.
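The monolayer estimate above can be reproduced with a few lines of arithmetic. The sketch below uses the surface area and monolayer density quoted in the text together with standard molar masses; the function name is illustrative, and the result is interpreted as a mass fraction of hydrogen.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
M_WATER = 18.015              # g mol^-1
M_HYDROGEN = 1.008            # g mol^-1

def monolayer_hydrogen_ppm(area_m2_per_g=0.5, molecules_per_cm2=1e15):
    """Hydrogen mass fraction (p.p.m.) from one adsorbed water monolayer on regolith."""
    area_cm2_per_g = area_m2_per_g * 1e4                 # m^2 to cm^2
    molecules_per_g = molecules_per_cm2 * area_cm2_per_g
    water_g_per_g = molecules_per_g * M_WATER / AVOGADRO
    hydrogen_g_per_g = water_g_per_g * (2 * M_HYDROGEN / M_WATER)
    return molecules_per_g, water_g_per_g, hydrogen_g_per_g * 1e6

molecules, water_mass, h_ppm = monolayer_hydrogen_ppm()
print(molecules, water_mass, h_ppm)   # roughly 5e18, 1.5e-4 g and 17 p.p.m.
```

Changing `area_m2_per_g` shows how sensitive the estimate is to grain size, which is the caveat raised in the text.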
Impact gardening can result in both ice loss, because ice is brought to the warmer near-surface, and preservation of ice, because ice is buried under layers of protective, thermally insulating regolith. However, impact gardening processes will dominate only if the water ice is completely immobile (as would be the case for adsorbed water). Given even short windows of time with temperatures above about 70–90 K, buried water molecules can migrate towards the surface, driven solely by the water vapour concentration gradient between the regolith and the vacuum of space. Water will migrate upward until it hits the predicted ice stability depths (Fig. 1c, d, Extended Data Fig. 4) where it will remain and concentrate, because loss rates to space are slow enough (1 mm Gyr−1) that it will not thermally sublimate over geologic time. The ice may be buried again, mixed into the regolith or lost by an impact related process, but, assuming some small amount of thermal mobility, it will again return to the ice stability depth. Therefore, even accounting for impact gardening, as long as temperatures remain greater than about 70–90 K, but never exceed about 145 K, the predicted depths should be a good proxy for detectable hydrogen. Ice stability models presented here show that ice can be stable both at the current and proposed palaeopole orientations. Extended Data Figure 4e, f shows that there are large areas that are stable in the upper 2.5 m for ice both at the current lunar pole position and at the proposed palaeopole (ice stability depth is assigned as an average of the two models). However, if wander led to a spin pole much further from the palaeopole, ground ice would no longer be stable in these locations. To estimate how far a shadowed crater could move from the pole and still retain large amounts of ground ice, we look at previous studies examining the effects of lunar obliquity on ice stability. 
In such studies, a polar crater (Shackleton) was found to retain stable ice until the Moon tilted by >12° (refs 4, 49). We use this 12° limit as an approximate estimate for the maximum extent of polar wander that can occur with respect to a palaeopole and still allow for the preservation of water ice at the pole. In fact, some wander past the current pole would aid in the migration of buried water to the surface by creating slightly warmer conditions than present. At present, some cold traps are so cold (maximum temperatures < 90 K) that ice is effectively immobile25, leaving it to be slowly buried by impact gardening13. TPW might have caused these areas to experience conditions warm enough that ice buried by impact gardening would migrate towards the surface (~90 K < T < 145 K), driven by the concentration gradient (with the vacuum of space). Although many of our TPW paths remain within this 12° ice-stability limit, suggesting that the hydrogen observed at the epithermal neutron palaeopole is plausibly water ice (Fig. 4a, b, Extended Data Fig. 10a–d, n, o), many do not. In these ‘large wander’ cases, the shadowed regions near the epithermal palaeopole may have experienced temperatures that exceeded the 145-K stability limit for water ice. This would suggest that the epithermal neutrons may be mineralogically trapped or bound hydrogen, rather than pore ice. Most mineralogies will have higher bonding strength than that of water to water, and be more stable to large temperature fluctuations. It may be that pore ice was originally stable at these locations, but has since been partially lost (perhaps via hydrothermally interacting with the surrounding regolith), leaving only the most stable forms of hydrogen behind. However, impact gardening will slowly bury the grains this water is bound to and thus limit the length of time such hydrogen will be in sufficient abundances to be detectable via neutron spectrometry. 
It is also plausible that the observed hydrogen might never have been water. Hydrogen can implant into permanently shadowed regions both from Earth’s magnetotail and by backscattering of solar-wind hydrogen off of nearby irradiated crater walls. However, an explanation of why such a mechanism would result in the observed antipodal ice distribution has not been proposed. Perhaps areas that once harboured water ice are more accepting of solar-wind hydrogen. The time required to build up about 100 p.p.m. of rim-entrapped hydrogen in a permanently shadowed region has been estimated28 to be of the order of 200 Myr. If not continuously resupplied, then hydrogen trapped at defects in grain rims has a chance to escape, with this chance depending primarily on two variables: diffusion activation energy and temperature. A range of realistic activation energies was found85 for which hydrogen would be retained at low lunar temperatures and for billions of years. Regardless of the hydrogen source, defects in weathered grain rims can create a large volume for hydrogen retention, and may be sufficient to explain the observed hydrogen concentrations. The possible retention of hydrogen trapped at defects within the rims of the lunar grains themselves has been calculated28, 86. The maximum concentration is set by the maximum retention of implanted hydrogen in laboratory experiments, and is about $2 \times 10^{17}$ cm$^{-2}$. The thickness of the implantation rim is taken to be 100 nm. For a lunar soil surface area of 0.5 m$^2$ g$^{-1}$, the maximum trapped-hydrogen concentration is about 1,700 p.p.m. (ref. 28). Large-scale defects, such as radiation tracks, can react with water87, 88, 89, 90, 91. Such a situation could occur as once-stable ice deposits begin to sublimate. This reaction was shown to increase the specific surface area and porosity of lunar fines and retain water. However, it is unclear how long water might stay within the grain lattice once it is established there.
We adopt a silicate lattice of 3.3 g cm$^{-3}$ and, hence, an atomic density of about $10^{23}$ cm$^{-3}$; we assume a water density of about $10^{20}$ cm$^{-3}$. By using Fick’s law92, with a typical silicate diffusion coefficient of $D = 10^{-25}$ cm$^2$ s$^{-1}$, and calculating the flux J across a 1-μm lattice layer into the vacuum (so that $10^{20}$ cm$^{-3}$ drops to 0 cm$^{-3}$ in $10^{-4}$ cm), we obtain $J = D\,\Delta n/\Delta x = 10^{-25} \times 10^{20}/10^{-4} = 0.1$ cm$^{-2}$ s$^{-1}$. The number of water molecules in a 1 μm × 1 cm$^2$ volume is $10^{16}$, so the diffusion timescale is $10^{16}/(0.1\ \mathrm{s}^{-1}) = 10^{17}$ s ≈ 3 Gyr. Using values from the literature93, 94, we estimate $D \approx 10^{-28}$ cm$^2$ s$^{-1}$ and $J \approx 10^{-4}$ cm$^{-2}$ s$^{-1}$, which implies that water, having incorporated into lunar materials, will not diffuse from the outer micrometres in several billion years. The mechanisms described above may allow for reasonable long-term (Gyr) storage of hydrogen (either in the form of pore ice or mineralogically bound hydrogen) in the off-polar regions detected by the epithermal neutron data presented here. However, the evidence presented here points to a correlation with preferential water stability along the path of TPW. It is possible that the epithermal neutron distribution marks the surviving hydrogen from an epoch of ice stability or high supply (for example, the late heavy bombardment, a time during which the Moon had a protective magnetic field, or primordial internal water from the Moon’s formation); alternatively, it could trace a history of later-stage addition of hydrogen (for example, outgassing of water from mare volcanism, large volatile-rich impacts or some variation in solar wind). Future orbital missions with high-resolution, high-precision neutron spectrometers might be able to better constrain the extent of the polar hydrogen, and future in situ polar landers or sample return might be able to directly determine the nature of lunar polar hydrogen.
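The Fick's-law estimate above amounts to dividing the number of molecules in the layer by the outward flux. The following sketch repeats that arithmetic with the values quoted in the text; the function name and the year-length constant are the only additions.

```python
SECONDS_PER_GYR = 3.156e16    # about 1e9 years of 3.156e7 s each

def diffusion_timescale(d_cm2_s, n_water_cm3, thickness_cm):
    """Flux (cm^-2 s^-1) and depletion time (s) for a layer losing water to vacuum.

    The concentration is assumed to fall linearly from n_water to zero across the
    layer, so the flux is J = D * n / thickness (Fick's first law).
    """
    flux = d_cm2_s * n_water_cm3 / thickness_cm
    column = n_water_cm3 * thickness_cm        # molecules per cm^2 of layer
    return flux, column / flux

flux, t_s = diffusion_timescale(1e-25, 1e20, 1e-4)
print(flux, t_s, t_s / SECONDS_PER_GYR)        # about 0.1, 1e17 s and 3 Gyr
```

With the literature value of D near 1e-28 cm^2 s^-1 the same function gives a timescale a thousand times longer, consistent with the conclusion that lattice-bound water is retained over several billion years.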
Current evidence, such as the detection of water ice in the LCROSS-impact vapour plume26 (this impact occurred very close to our proposed southern palaeopole), suggests that the observed hydrogen enhancement is due to water, and that the Moon may not have wandered an extreme amount since the deposition of this water.
Somatic mutation data were obtained from four publicly available data sources: TCGA, ICGC, Alexandrov et al.6 and Zheng et al.8. The single base substitution data from the ICGC were obtained from the ICGC data portal (release 16), and data from Alexandrov et al.6 were obtained from ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl. For the XPCwt and XPC−/− skin SCC samples, mutations were obtained from the database of Genotypes and Phenotypes (dbGap) (phs000830). These mutations were used directly for the analysis. For samples obtained from TCGA, mutations were called from BAM files obtained from CGHub30 using Strelka31 with default parameters. In all analyses, only single nucleotide variants (generally referred to as mutations in the manuscript) have been used, as the frequencies of other types of mutations were too low for robust statistical analyses. Enhancer and promoter DHSs were defined using cell-type matched DNase-seq data obtained from various publicly available data sources (Supplementary Table 1). Aligned DHS and histone ChIP-seq data were downloaded and peaks were called using the FindPeaks tool within the Homer package32 using the ‘dnase’ and ‘histone’ modes, respectively. Putative promoter and enhancer DHSs were then called based on the overlap of DHS peaks (which are all 150 bp wide by default), with H3K4me3 and H3K4me1 peaks, respectively. To further increase the confidence of promoter and enhancer DHS annotations, only those that were identified by CAGE data as transcribed p1 promoters and enhancers respectively by FANTOM5 (refs 18, 27) were selected for the analysis. This resulted in a final set of promoter and enhancer DHS of 150 bp in size for each cell type, with each associated with a cancer type as identified in Supplementary Table 1. 
A universal promoter and enhancer DHS data set was also generated by merging all promoter and enhancer DHS regions across all cell types, but retaining only a single representative 150 bp DHS for regions where multiple DHSs overlapped. For DHS flanking regions, a 1 kb region on either side of the 150 bp DHS centre was used. Heterochromatin regions were regions that were identified as such by ChromHMM33 across all cell types. For defining coding regions, intronic regions and TSSs, the canonical UCSC genes data set was obtained from the UCSC table browser. Overlapping coding regions were merged to generate a non-redundant set of coding regions for assessing coding mutation density. To exclude genomic regions where unique short read mapping may be challenging, Duke Uniqueness, Duke Excluded Regions and DAC Blacklisted Regions were obtained from the UCSC table browser and removed from all of the above annotations. A summary of mutation counts for each sample and associated region sizes can be found in Supplementary Table 2. The mutation density of each region was reported as the number of mutations found in a particular genomic region, normalized by region size and the number of cancer samples. It was then converted to mutations per Mb. A bootstrapping analysis was performed to determine the confidence interval threshold for the number of mutations required to identify a 2-fold increase in mutation density at promoters relative to flanking regions. For this analysis, mutations within each sample were randomly shuffled to other genomic locations and the density of mutation within promoter DHS and flanking regions was computed. This was repeated 1,000 times to obtain robust confidence intervals. To assess statistically significant differences between promoter DHSs, enhancer DHSs and flanking regions, the χ2 test with Yates correction was used. 
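The mutation-density normalisation and the Yates-corrected χ2 test described above can be sketched in plain Python. This is an illustration under assumptions rather than the authors' pipeline: the layout of the 2 × 2 table (for example, mutated versus non-mutated positions in promoter DHSs versus flanks) is assumed, and the function names are hypothetical.

```python
import math

def mutations_per_mb(n_mutations, region_bp, n_samples):
    """Mutation density normalised by region size and sample count, in mutations per Mb."""
    return n_mutations / (region_bp * n_samples) * 1e6

def yates_chi2(a, b, c, d):
    """Chi-squared statistic and P value for the 2x2 table [[a, b], [c, d]].

    Applies the Yates continuity correction; the P value uses the exact
    one-degree-of-freedom relation P = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

For example, 30 mutations in 1,000 promoter DHSs of 150 bp across 100 samples give `mutations_per_mb(30, 150 * 1000, 100)`, that is, 2 mutations per Mb.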
For evaluating the significance of increased local promoter DHS density across individual samples of a given tumour type, a paired ratio t-test was used. The results of these tests are summarized in Extended Data Tables 1 and 2. The investigators were not blinded to allocation during experiments and outcome assessment. Trinucleotide mutation frequencies for each cancer sample were counted as previously described6. In brief, the frequency of point mutations was calculated in each of the possible 96 trinucleotide 5′ to 3′ contexts. These were counted either across the genome or within promoter DHSs or promoter DHSs ±1 kb flanking regions. Mutation signatures were defined by the relative frequency of these 96 trinucleotide mutations. Hierarchical clustering was performed using the mutation signatures to distinguish samples based on mutation processes6. Pearson’s correlation was used to determine the association of mutation signatures from ref. 6 to the mutation signature of specific samples. The total numbers of C and G bases, or each of the 32 possible trinucleotides, were counted within each of the respective genomic regions. For normalization by GC content, the C > N and T > N mutation density was normalized by the percentage GC within each region. For normalization by trinucleotide mutation frequencies, each of the 96 possible trinucleotide mutations were normalized by the respective trinucleotide frequency within each region. The normalized mutation rate was then calculated from the sum of the normalized trinucleotide mutation frequencies for each region. The normalized ratio is thus the ratio of the sum of the normalized frequencies in the two regions. To standardise a set of promoters for regression analysis, we defined a set of core promoter regions as −100bp of the TSS of canonical UCSC genes. This TSS −100 bp region was chosen as it reflects the most nucleosome-free region in gene promoters. 
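Counting mutations in the 96 trinucleotide contexts mentioned above can be illustrated as follows. The sketch assumes the common convention of reporting each substitution relative to the pyrimidine (C or T) of the mutated base pair, which is consistent with the C > N and T > N classes in the text; the function names and the key format are illustrative.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def context_key(ref_trinucleotide, alt_base):
    """Pyrimidine-normalised context label, for example 'T[C>T]A'."""
    tri, alt = ref_trinucleotide.upper(), alt_base.upper()
    if tri[1] in "AG":                      # purine reference: switch strands
        tri, alt = revcomp(tri), revcomp(alt)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"

ALL_CONTEXTS = [f"{five}[{ref}>{alt}]{three}"
                for ref in "CT" for alt in "ACGT" if alt != ref
                for five in "ACGT" for three in "ACGT"]

def signature(mutations):
    """Relative frequency of each of the 96 contexts for (trinucleotide, alt) pairs."""
    counts = Counter(context_key(tri, alt) for tri, alt in mutations)
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in ALL_CONTEXTS}
```

The resulting 96-element vectors can then be compared between samples, for instance with Pearson's correlation as described in the text, or divided by the regional trinucleotide frequencies for normalisation.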
For each of these regions, the following were computed: (1) DNase-seq coverage using the respective DNase-seq data set as defined in Supplementary Table 1; (2) gene expression of the associated gene based on the average expression of each gene across all samples of the corresponding cancer type with normalized RNA-seq expression data obtained from the ICGC data portal; (3) replication timing, calculated as the average value within the region from the wavelet-smoothed signal obtained from the ENCODE project34, using the closest matching cell line for each cancer type (melanoma: NHEK, ovarian: HeLa-S3, lung: IMR90); (4) the proportion of rare SNPs, as the ratio of rare SNPs (defined as derived allele frequency <0.5%) to all SNPs from the 1000 Genomes Project35; (5) cancer genes, as only those listed by the Cancer Gene Census36; (6) the conservation of each promoter, as the average GERP score37 across the region; (7) the number of mutations within each promoter for each cancer type analysed; (8) GC content of the region computed as described above; (9) relative frequency of each of the 32 trinucleotide combinations. Regression models were only computed for the mutations from melanoma, ovarian and lung cancers, as only these cancers had sufficient numbers of total promoter mutations to allow for the generation of a reliable regression model and they also constituted the cancers in which a majority of samples exhibited increased local promoter DHS mutation density. For the univariate logistic regression, with the exception of the cancer gene variable, which was defined as a categorical variable, all variables were standardised to a mean of zero and standard deviation of one. The regression was performed using the glm function in R. The odds ratio was calculated by exponentiation of the coefficients and the P values were obtained directly from the regression model. For multivariate logistic regression, all variables were combined in a linear equation.
The matrices used for regression analysis are provided in Supplementary Tables 3–5. To assess statistical significance, we used Poisson regression to evaluate the association of DNase I hypersensitivity or NER with mutation density. For the association between DNase I hypersensitivity and NER, linear regression was used. The regression was performed using the glm function in R. The odds ratio was calculated by exponentiation of the coefficients and the P values were obtained directly from the regression model. The set of TSSs was stratified into quartiles of DNase I hypersensitivity and of expression, using the data generated as described above for regression analysis. For each set of TSSs, mutation profiles were generated by counting the number of mutations for each respective cancer type across a ±5,000 bp window centred on each TSS. TSSs were oriented such that the gene body is on the right of the TSS in the profiles. The mutation counts were normalized to mutations per Mb and plotted in 100 bp (for profiles stratified by DHS) and 5 bp (for profiles stratified by expression) windows. Our logistic regression analysis revealed that the presence of promoter mutations was significantly anti-correlated with average conservation (GERP score)37 of gene promoters in melanoma and lung cancer (Extended Data Table 3). Since transcription factor binding sites are generally more highly conserved than their flanking regions38, we reasoned that the increased conservation of mutated bases may reflect an increased likelihood for somatic mutations to occur within transcription factor binding sites. To test this hypothesis, transcription factor footprinting analysis was performed using melanocyte DGF data from the Human Epigenome Atlas (GSM1024610). Raw reads were obtained from the Sequence Read Archive (SRA) and aligned using BWA39.
Since the transcription pre-initiation complex, comprising TATA-binding protein (TBP) and other general transcription factors, has been shown to be present within a ~50-bp region upstream of the TSSs of actively transcribed genes40, Wellington41 was used to compute default and 50 bp footprints (parameters: ‘-sh 13,17,1 -fp 46,54,1’) across TSS −100 bp regions. The number of melanoma mutations was then counted inside and outside of footprints. To standardise the comparison between mutation counts in transcription factor-bound (footprinted) and unoccupied (non-footprinted) sites, only promoter regions in which a footprint had been detected were used for further analysis. For statistical analysis of significance, a paired t-test was used to compare mutated and non-mutated bases inside and outside of footprints. Mutation and DGF profiles were generated by averaging read coverage centred on all 50 bp footprints. To directly compare the relative mutation density at promoter and enhancer DHSs, a set of active promoters, represented by those within the top 25% of DNase I hypersensitivity, was selected. A corresponding set of enhancers with matched DNase I hypersensitivity was selected such that, for each promoter, an enhancer with DNase-seq read coverage within ±5 of that of the promoter was randomly chosen. The process was repeated 100 times to account for variation in enhancer selection. For the analysis of transcriptionally active versus less active enhancers, enhancer data from the FANTOM5 consortium were used18. FANTOM5 defined a set of ubiquitous enhancers that have been found to strongly promote the transcription of enhancer RNA across all cell lines examined18. The coordinates for ubiquitous enhancers (n = 200) and permissive enhancers (n = 43,011) were obtained from http://enhancer.binf.ku.dk/presets/. To ensure that these sites did not overlap our promoter data set, we subtracted enhancer regions that overlapped with promoter DHSs from any cell type.
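The selection of hypersensitivity-matched enhancers described above can be sketched as follows. This is a minimal illustration, assuming that coverage is available as a dictionary of read counts per region; the function names are hypothetical, and sampling with replacement across promoters (the same enhancer may be matched to more than one promoter) is an assumption, as the procedure above does not specify it.

```python
import random


def match_enhancers(promoter_cov, enhancer_cov, tolerance=5, rng=None):
    """For each promoter, randomly pick one enhancer within +/- tolerance reads."""
    rng = rng or random.Random()
    matched = []
    for promoter, cov in promoter_cov.items():
        candidates = [e for e, c in enhancer_cov.items() if abs(c - cov) <= tolerance]
        # Promoters with no enhancer in range are reported as unmatched (None).
        matched.append((promoter, rng.choice(candidates) if candidates else None))
    return matched


def repeated_matching(promoter_cov, enhancer_cov, repeats=100, tolerance=5, seed=0):
    """Repeat the random matching to capture variation in enhancer selection."""
    rng = random.Random(seed)
    return [match_enhancers(promoter_cov, enhancer_cov, tolerance, rng)
            for _ in range(repeats)]
```

Mutation density would then be computed for each of the repeated enhancer sets and summarized across repeats; the fixed seed in the sketch only serves to make the example reproducible.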
The procedure described above for comparing active promoters with DNase I hypersensitivity-matched enhancers was also used to select DNase I hypersensitivity-matched permissive enhancers. In this case, all ubiquitous enhancers were used as, by definition, they are active in all cell types. Genome-wide NER sequencing (XR-seq) data sets of CPD and 6–4PP from ultraviolet-irradiated normal human skin fibroblast cells, generated previously5, were obtained in SRA format (GSM1659156). The raw reads were trimmed using Trimmomatic42 and aligned using Bowtie43 as described5. De-duplicated aligned reads for the two replicate experiments were then merged and average coverage profiles were generated. For the quantification of repair ratios against DNase I hypersensitivity at gene promoters, a 100 bp window was selected around all promoter DHS peak centres, and flanking regions were defined as ±150 bp. This range was selected as it was the region that best defined the peak and trough in NER near the TSS. As XR-seq reads are ~30 bp in length, the centre of each read was used to define its overlap with the genomic regions, to avoid counting reads that overlap both the DHS centre and flanking regions more than once. The DNase I hypersensitivity of each promoter was quantified as the number of melanocyte DNase-seq reads overlapping the promoter DHS region. To establish the relationship between DNase I hypersensitivity and repair ratio, XR-seq reads for the respective regions were summed within bins of 25 DNase-seq coverage and normalized by region size. The repair ratio was calculated as the ratio of promoter DHS to flank for each bin. For enhancers, the repair ratio between the DHS centre and flank was similarly calculated for FANTOM5 ubiquitous and permissive enhancers.
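The repair ratio calculation can be sketched as follows. This minimal example assigns each XR-seq read to a region by its centre, sums reads within bins of 25 DNase-seq coverage, normalizes by region size and takes the ratio of DHS centre to flank. It assumes that the flanks extend 150 bp on either side beyond the 100 bp central window (one reading of the ±150 bp definition above); the function names and the input layout are hypothetical.

```python
from collections import defaultdict

CENTRE_SIZE = 100   # bp window around the DHS peak centre
FLANK_SIZE = 300    # assumed: 150 bp on each side beyond the central window


def classify_read(read_start, read_end, peak_centre):
    """Assign a read to 'centre', 'flank' or None using the read midpoint only."""
    offset = (read_start + read_end) // 2 - peak_centre
    half = CENTRE_SIZE // 2
    if -half <= offset < half:
        return "centre"
    if -(half + FLANK_SIZE // 2) <= offset < half + FLANK_SIZE // 2:
        return "flank"
    return None


def repair_ratio_by_bin(promoters, bin_width=25):
    """promoters: iterable of (dnase_coverage, centre_reads, flank_reads).

    Returns {bin lower bound: size-normalized centre/flank ratio}.
    """
    centre = defaultdict(int)
    flank = defaultdict(int)
    for dnase_cov, centre_reads, flank_reads in promoters:
        b = int(dnase_cov // bin_width)
        centre[b] += centre_reads
        flank[b] += flank_reads
    ratios = {}
    for b in sorted(centre):
        if flank[b] > 0:  # bins without flank reads have no defined ratio
            ratios[b * bin_width] = (centre[b] / CENTRE_SIZE) / (flank[b] / FLANK_SIZE)
    return ratios
```

Using the read midpoint means that each read contributes to exactly one region, so a ~30 bp read straddling the boundary between the DHS centre and the flank is not counted in both.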
To compare mutation density with CpG methylation status at gene promoters of XPC−/− SCC genomes, processed whole-genome bisulfite sequencing data from normal human epithelial keratinocytes (NHEK) were obtained from the Human Epigenome Atlas44. The average methylation profile was generated using deepTools45. To correlate promoter methylation status with the density of C > T mutations within a [C/T]CpG context, the mutation density and average fraction of methylation were measured within all TSS ±1 kb regions. Scripts and annotation files used for data analysis are available as a zipped file under Supplementary Information.
News Article | May 5, 2015
MERRIAM, Kan.--(BUSINESS WIRE)--IKEA, the world’s leading home furnishings retailer, today announced it had officially plugged in Kansas’ largest rooftop solar array, atop the recently opened IKEA Merriam. The 92,000-square-foot solar array consists of a 730.17-kW DC system, comprising 2,394 panels, and will produce approximately 986,800 kWh of electricity annually for the store, the equivalent of reducing 680 tons of carbon dioxide (CO2) – equal to the emissions of 143 cars or providing electricity for 94 homes yearly (calculating clean energy equivalents at www.epa.gov/cleanenergy/energy-resources/calculator.html). For the development, design and installation of the Kansas City-area store’s customized solar power system, IKEA contracted with Chicago-based SoCore Energy, a wholly owned subsidiary of Fortune 500 company Edison International. With hundreds of designed and installed solar projects, SoCore is one of the largest commercial solar developers in the U.S. “Plugging-in this solar array is an exciting milestone to follow-up on our successful opening last fall,” said Rob Parsons, IKEA Merriam store manager. “IKEA strives to create a sustainable life for communities where we operate, so we are proud IKEA Merriam now has solar power for our electricity besides geothermal technology to heat and cool the building.” This installation will represent the 41st solar project for IKEA in the U.S., contributing to the IKEA solar presence atop nearly 90% of its U.S. locations, and a total generation goal of 40 MW. IKEA owns and operates each of its solar PV energy systems atop its buildings – as opposed to a solar lease or PPA (power purchase agreement) – and globally has allocated $1.8 billion to invest in renewable energy through 2015. This investment reinforces the long-term commitment IKEA has to sustainability and confidence in photovoltaic (PV) technology.
Consistent with the company’s goal of being energy independent by 2020, IKEA has installed more than 700,000 solar panels on buildings across the world and owns approximately 157 wind turbines in Europe and Canada, with 104 others being built in the U.S. Drawing from its Swedish heritage and respect of nature, IKEA strives to minimize its operations’ carbon emissions because reducing its environmental impact makes good business sense. Globally, IKEA evaluates locations regularly for conservation opportunities, integrates innovative materials into product design, works to maintain sustainable resources, and flat-packs goods for efficient distribution. Specific U.S. sustainable efforts include: recycling waste material; incorporating energy-efficient HVAC and lighting systems, recycled construction materials, skylights in warehouse areas, and water-conserving restrooms. Operationally, IKEA eliminated plastic bags from the check-out process, phased-out the sale of incandescent bulbs, facilitates recycling of customers’ compact fluorescent bulbs, and by 2016 will sell only L.E.D. IKEA also has installed EV charging stations at 13 stores, with plans for more locations. The 359,000 square-foot IKEA Merriam, with 1,200 parking spaces, opened September 10, 2014 on 19 acres along the eastern side of Interstate-35 and Johnson Drive, in the city of Merriam, eight miles southwest of Kansas City, Missouri. IKEA Merriam represents the second U.S. store for IKEA with a geothermal component to its heating and cooling system. (Denver-area IKEA Centennial opened with geothermal in 2011.) Incorporating geothermal and solar significantly reduces the energy IKEA Merriam will draw from the power grid. Since its 1943 founding in Sweden, IKEA has offered home furnishings of good design and function at low prices so the majority of people can afford them. There are currently more than 360 IKEA stores in 47 countries, including 40 in the U.S. 
IKEA incorporates sustainability into day-to-day business and supports initiatives that benefit children and the environment. For more information see IKEA-USA.com, @IKEAUSA, @IKEAUSANews, or IKEAUSA on Facebook, Youtube, Instagram and Pinterest.