Hinxton, United Kingdom

Patients with a diagnosis of metastatic breast cancer provided informed consent for de-identified blood collection, as per institutional review board approved protocol (DF/HCC 05-300). Enrolled patients had received multiple courses of therapy, which is typical in advanced ER+ breast cancer; this pilot study did not have sufficient statistical power to detect a significant correlation between the number of therapeutic interventions and the frequency of HER2+ CTCs. Patient-matched primary and metastatic tumour specimens were collected according to institutional review board approved protocol (2002-P-002059), and relevant tumour source data are provided in Supplementary Table 1. Single CTCs were isolated from fresh whole blood by depleting leukocytes using the microfluidic CTC-iChip as previously described3. Briefly, whole blood samples were incubated with biotinylated antibodies against CD45 (R&D Systems, clone 2D1), CD66b (AbD Serotec, clone 80H3) and CD16 (BD, clone 3G8), followed by incubation with Dynabeads MyOne Streptavidin T1 (Invitrogen) to magnetically label white blood cells. This mixture was processed through the CTC-iChip, and the CTCs were stained in solution with Alexa 488-conjugated antibodies against EpCAM (Cell Signaling Technology, clone VU1D9) and HER2 (Cell Signaling Technology, clone 29D8 or Janssen R&D) and identified by imaging flow cytometry (Amnis). For single-cell picking, CTCs were stained as described above, with the addition of a PE-CF594-conjugated antibody against CD45 (BD Biosciences, clone HI30) to stain contaminating leukocytes. CTCs were individually micromanipulated using a 10 μm transfer tip on an Eppendorf TransferMan NK 2 micromanipulator, transferred into PCR tubes containing RNA protective lysis buffer, and flash frozen in liquid nitrogen as previously described26.
Standard CTC enumeration of fixed samples was performed on the BioView high-content imaging system following Megafunnel fixation and staining with a combination of wide-spectrum cytokeratin (Abcam, ab9377), EpCAM (Cell Signaling Technology, clone VU1D9), EGFR (Cell Signaling Technology, clone D38B1) and HER2 (Cell Signaling Technology, clone 29D8) antibodies. For mouse xenograft studies, blood was collected via cardiac puncture and ~1 ml of blood was processed through the microfluidic CTC-iChip. CTCs were enumerated on the BioView imaging system after staining with Alexa 488-conjugated antibodies against EpCAM (Cell Signaling Technology, clone VU1D9), HER2 (Janssen R&D or Cell Signaling Technology, clone 29D8) and GFP (ab13970), followed by secondary antibodies conjugated with Alexa-488 (Invitrogen). For immunohistochemistry, tissues were sectioned, and slides were incubated in 0.3% hydrogen peroxide in methanol for 20 min to block endogenous peroxidase activity. Tissues were permeabilized, and antigen retrieval was performed in 1× citrate buffer (pH 6) for 15 min. Slides were washed and blocked for 30 min with 5% goat serum. Primary HER2 (Cell Signaling, 29D8) or GFP (Living Colours AV 632381) antibodies were diluted 1:75 or 1:250, respectively, in DAKO antibody diluent, and samples were incubated for 1 h at room temperature. Slides were incubated with HRP anti-rabbit antibody (EnVision+, DAKO) for 30 min. After washing with PBS, the peroxidase reaction was developed with 3,3′-diaminobenzidine (DAB; Vector Laboratories) for 10 min. Cells were counterstained with Gill’s #2 haematoxylin for 10–15 s, dehydrated with ethanol and cleared with xylene before mounting. Images represent at least five independent fields from six to eight xenograft tumours per condition. Fluorescence in situ hybridization was performed as described previously27, 28. Briefly, 5-μm sections of formalin-fixed, paraffin-embedded tumour samples were deparaffinized, hydrated and pretreated with 0.1% pepsin for 1–2 h.
Slides were then washed in 2× saline-sodium citrate buffer (SSC), dehydrated, air dried and co-denatured at 80 °C for 5 min with a mixture of CEP17 and HER2 probes, then hybridized at 40 °C overnight using the Hybrite Hybridization System (Abbott). Two-minute post-hybridization washes were performed in 2× SSC/0.3% NP-40 at 72 °C, followed by a 1-min wash in 2× SSC at room temperature. Slides were mounted with Vectashield containing 4′,6-diamidino-2-phenylindole (Vector, Burlingame, California, USA). Entire sections were observed with an Olympus BX61 fluorescence microscope equipped with a charge-coupled device camera and analysed with Cytovision software (Applied Imaging, Santa Clara, California). HER2 and CEP17 signals were quantified in 50 randomly selected, non-overlapping nuclei, and the mean numbers of HER2 and CEP17 copies per nucleus were calculated. HER2 was considered amplified when the HER2:CEP17 ratio was ≥2.0 or the mean number of HER2 signals per nucleus was >6, following the guidelines of the American Society of Clinical Oncology/College of American Pathologists29. The probes used in this study consisted of a centromeric CEP17 probe (17p11.1-q11.1, SpectrumAqua; Abbott Molecular, Des Plaines, Illinois) and a locus-specific identifier probe derived from bacterial artificial chromosome RP11-94L15 (17q12-17q21.1, SpectrumOrange; CHORI, Oakland, California). CTC cultures were grown in suspension in ultra-low attachment plates (Corning) in tumour sphere medium (RPMI-1640, EGF (20 ng/ml), bFGF (20 ng/ml), 1X B27, 1X antibiotic/antimycotic (Life Technologies)) under hypoxic (4% O2) conditions. The breast CTC lines Brx-42, Brx-82 and Brx-142 were derived from CTCs isolated using the CTC-iChip as previously described4. CTC lines were routinely checked for mycoplasma using a mycoplasma detection kit (MycoAlert, Lonza) and were authenticated by RNA-seq, MS and DNA-seq (1,000 gene mutation panel).
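The amplification rule above combines two cut-offs. The following Python sketch applies both of them to per-nucleus signal counts. It is a minimal illustration and not the authors' software, since scoring was done in Cytovision. The function name and return layout are hypothetical. The sketch implements only the two criteria stated here, not the full ASCO/CAP decision algorithm.

```python
from statistics import mean


def her2_fish_call(her2_counts, cep17_counts, ratio_cutoff=2.0, copy_cutoff=6.0):
    """Call HER2 amplification from per-nucleus FISH signal counts.

    her2_counts / cep17_counts: one count per scored nucleus (50 nuclei in the
    protocol above), in the same order. Amplified if mean HER2 / mean CEP17 is
    >= ratio_cutoff or the mean HER2 count per nucleus is > copy_cutoff.
    Returns (ratio, mean HER2 per nucleus, amplified flag).
    """
    if not her2_counts or len(her2_counts) != len(cep17_counts):
        raise ValueError("need paired, non-empty per-nucleus counts")
    mean_her2 = mean(her2_counts)
    mean_cep17 = mean(cep17_counts)
    if mean_cep17 == 0:
        raise ValueError("no CEP17 signals scored")
    ratio = mean_her2 / mean_cep17
    amplified = ratio >= ratio_cutoff or mean_her2 > copy_cutoff
    return ratio, mean_her2, amplified


# 50 nuclei with 8 HER2 and 2 CEP17 signals each: ratio 4.0, amplified.
print(her2_fish_call([8] * 50, [2] * 50))
```

The second criterion matters when chromosome 17 is itself gained. For example, a mean of 7 HER2 and 4 CEP17 signals gives a ratio below 2.0, but the sample is still called amplified.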
Cells were trypsinized into single-cell suspensions, resuspended in Hanks’ balanced salt solution (HBSS) and incubated with Anti-HER2/NEU APC (BD, clone 42 c-erbB-2), Anti-HER2 FITC (Janssen R&D) or Annexin V FITC (BD, clone RUO) antibodies for 20 min at 4 °C. Unbound antibodies were washed from cells using HBSS. For analytical flow cytometry, cells were fixed with 3% paraformaldehyde and analysed using a Laser BD Fortessa instrument. For sterile live-cell flow cytometry, cells were sorted using a Laser BD FACS Aria Fusion Cell Sorter, BSL2+. FACS plots are representative of at least two independent experiments performed within 6 months of culture initiation (Figs 1d and 2a and Extended Data Figs 1f and 3a). Genomic DNA extracted from CTC-derived cell lines was sequenced using a multiplex polymerase chain reaction (PCR) technology called Anchored Multiplex PCR (AMP) to detect single nucleotide variants (SNVs) and insertions/deletions (indels) by next-generation sequencing (NGS), as previously described30. Briefly, genomic DNA was isolated from cell lines and sheared with the Covaris M220 instrument, followed by end repair, adenylation and adaptor ligation. A sequencing library targeting hotspots and exons in 39 commonly mutated, cancer-associated genes was generated using two hemi-nested PCR reactions. Illumina MiSeq 2 × 151 base paired-end sequencing reads were aligned to the hg19 human genome reference using BWA-MEM31. MuTect32 and a laboratory-developed insertion/deletion analysis algorithm were used for SNV and indel detection, respectively. This assay has been validated to detect SNV and indel variants at an allelic frequency of 5% or higher in target regions with sufficient read coverage.
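The sketch below shows the reporting threshold. It assumes that allelic frequency is computed as variant-supporting reads divided by total depth at the position. The function is hypothetical. The text does not define "sufficient read coverage", so the depth cut-off is a required argument rather than a guessed default.

```python
def passes_af_threshold(alt_reads, depth, min_depth, min_af=0.05):
    """Keep a variant only if coverage is adequate and its allelic fraction is >= 5%.

    min_depth is deliberately required: the validated coverage requirement is
    not stated in the text, so no value is assumed here.
    """
    if depth == 0 or depth < min_depth:
        return False  # insufficient coverage: outside the validated range
    return alt_reads / depth >= min_af


print(passes_af_threshold(10, 200, min_depth=100))  # 10/200 = 5%: reported
```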
To produce replication-incompetent lentivirus, 293T cells were co-transfected with either Lenti-Luc-GFP or Notch intracellular domain-pcw107 (Addgene 64621) constructs in combination with REV, VSVG, PDML or pMD2.G and psPAX2 (Addgene) using Lipofectamine Plus reagent (Invitrogen). Growth medium was replenished 24 h later. Viral supernatants were harvested 48 h post-transfection and concentrated with Lenti-X Concentrator (Clontech), and viral pellets were resuspended in 400 μl base medium. CTC cultures were infected overnight with 100 μl lentivirus in 6 μg/ml Polybrene. Transduced cells were selected with puromycin (3 μg/ml) over a period of 7 days. For RNAi knockdown, CTC lines Brx-42, Brx-82 and Brx-142 were reverse transfected in ultra-low attachment six-well plates (Corning) with 25 nM siRNA SMARTpools (Dharmacon). Each pool contained four different siRNA oligonucleotides targeting ERBB2/HER2 (GGACGAAUUCUGCACAAUG; GACGAAUUCUGCACAAUGG; CUACAACACAGACACGUUU; AGACGAAGCAUACGUGAUG), NOTCH1 (GCGACAAGGUGUUGACGUU; GAUGCGAGAUCGACGUCAA; GAACGGGGCUAACAAAGAU; GCAAGGACCACUUCAGCGA), NFE2L2 (GAGAAAGAAUUGCCUGUAA; CCAAAGAGCAGUUCAAUGA; UAAAGUGGCUGCUCAGAAU; UGACAGAAGUUGACAAUUA) or the negative control gene GAPDH. siRNA pools for target genes were deconvolved to demonstrate targeted knockdown efficiency (more than two siRNAs per gene). CTC lines were spun onto poly-L-lysine-functionalized glass slides with Spintrap, fixed with 3% paraformaldehyde and permeabilized with 0.1% Triton X. They were then stained with the nuclear stain 4′,6-diamidino-2-phenylindole (DAPI) and with HER2 (Cell Signaling Technology, clone 29D8), Ki67 (Zymed), cleaved caspase-3 (Cell Signaling Technology, clone D3E9) and/or NOTCH1 (Cell Signaling Technology, clone D1E11) antibodies. Secondary antibodies were conjugated to either Alexa Fluor 488 or Alexa Fluor 594 (Life Technologies), and fluorescence was measured using a Nikon 90-I fluorescence microscope.
Images are representative of at least three independent images per sample. Single HER2+ or HER2− CTCs were flow sorted into 96-well white-walled plates (Corning) using the Laser BD FACS Aria Fusion Cell Sorter, BSL2+. Single-cell, 1-, 3-, 5- to 9-, 10- to 20- and >20-cell clones were analysed for heterogeneity in HER2 expression by staining with antibodies against EpCAM (FITC labelled; Cell Signaling, clone VU1D9) and HER2 (APC labelled; BD, clone 42 c-erbB-2). Imaging and image processing were performed sequentially with a confocal microscope (Zeiss 710 Laser Scanning Confocal) followed by FIJI (ImageJ). Images are representative of at least 20 independent images per colony size. Trimmomatic was used to crop reads to 50 nucleotides and to remove the TruSeq3-PE-2 Illumina adapters. The paired-end reads were then aligned using tophat2 and bowtie1 with the no-novel-juncs argument set. Alignment used human genome version hg19 and the transcriptome defined by the hg19 genes.gtf table from http://genome.ucsc.edu. Reads that did not align or aligned to multiple locations were discarded. The number of reads aligning to each gene was then determined using htseq-count. Samples that had fewer than 10⁵ reads were discarded. The read count for each gene was divided by the total counts assigned to all genes and multiplied by one million to obtain reads per million (RPM). Samples in which expression of the white blood cell marker PTPRC (CD45) was greater than 10 RPM were discarded. Single-cell RNA-seq data have been deposited in the Gene Expression Omnibus under accession number GSE75367. To establish that the distribution of HER2 expression in CTCs is multimodal, we applied Hartigans’ dip test, as implemented in the diptest R package, to the log (RPM + 1) values, with 10 RPM as the threshold to define HER2− versus HER2+ CTCs. To establish that the distribution has two modes and not more, we applied R’s density function with default parameters to the log (RPM + 1) values.
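The QC filters and the HER2 call described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the published pipeline. The read-count filter (taken here as 10^5) is applied to the total of gene-assigned counts, because the text does not say whether raw or assigned reads are meant. HER2 is looked up under its gene symbol ERBB2, and "log" is taken as the natural logarithm. Function names are hypothetical.

```python
import math


def reads_per_million(counts):
    """Divide each gene's read count by the total assigned reads, times one million."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def call_her2(counts, min_reads=10**5, max_ptprc_rpm=10.0, her2_rpm=10.0):
    """Apply the QC filters, then call HER2 status for one CTC.

    counts: gene symbol -> htseq-count read count for a single cell.
    Returns None if the cell is discarded, otherwise (call, log(RPM + 1) of ERBB2).
    """
    if sum(counts.values()) < min_reads:
        return None  # too few reads
    rpm = reads_per_million(counts)
    if rpm.get("PTPRC", 0.0) > max_ptprc_rpm:
        return None  # CD45 signal: probable white blood cell contamination
    her2 = rpm.get("ERBB2", 0.0)
    call = "HER2+" if her2 > her2_rpm else "HER2-"
    return call, math.log(her2 + 1.0)


print(call_her2({"ERBB2": 50, "PTPRC": 0, "OTHER": 999_950}))
```

The log (RPM + 1) value is what the dip test and density estimation operate on. The +1 keeps cells with zero HER2 reads finite.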
On the basis of the bimodality analysis above, we defined HER2+ samples as those in which HER2 expression exceeded 10 RPM and defined the rest as HER2−. For the mass spectrometric data, enrichment of signalling pathways was determined by submitting the average log fold-change in protein abundance between the HER2-high and HER2-low samples to the pre-ranked function of the Broad Institute’s GSEA software. Gene sets came from the Pathway Interaction Database (PID) and KEGG, as curated in version 4 of the Broad Institute’s MSigDB (http://www.broadinstitute.org/gsea/msigdb/). Pathway enrichment for the CTC RNA-seq data was performed in the same way, except that the full RPM matrix for the CTCs and the HER2+ versus HER2− classification were input to the GSEA software instead of log fold-changes. CTC cell pellets were resuspended in lysis buffer containing 75 mM NaCl, 50 mM HEPES (pH 8.5), 10 mM sodium pyrophosphate, 10 mM NaF, 10 mM β-glycerophosphate, 10 mM sodium orthovanadate, 10 mM phenylmethanesulfonylfluoride, Roche Complete Protease Inhibitor EDTA-free tablets and 3% sodium dodecyl sulfate. Cells were lysed by passing them ten times through a 21-gauge needle, and the lysates were prepared for mass spectrometry essentially as described previously5. Briefly, reduction and thiol alkylation were followed by protein purification using MeOH/CHCl3 precipitation. Proteins were digested with Lys-C and trypsin, and peptides were labelled with TMT-10plex reagents (Thermo Scientific)33 and fractionated by basic pH reversed-phase chromatography. Multiplexed quantitative proteomics was performed on an Orbitrap Fusion mass spectrometer (Thermo Scientific) using a simultaneous precursor selection (SPS)-based MS3 method34. MS2 spectra were assigned using a SEQUEST-based proteomics analysis platform35.
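The sketch below computes the running-sum enrichment score that underlies the pre-ranked GSEA step, for one gene set ranked by log fold-change. It assumes the default "weighted" scoring scheme (weight p = 1). It omits the permutation-based normalization and significance estimation that the GSEA software also performs, so it does not replace the Broad implementation. The function name is hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score, as in GSEA.

    ranked: (gene, score) pairs sorted from most positive to most negative score.
    gene_set: collection of gene identifiers.
    Hits step up by |score|**p normalized over all hits; misses step down by
    1 / (number of misses). Returns the running-sum value furthest from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene, _ in ranked]
    n, n_hits = len(ranked), sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_total = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if hit_total == 0:
        raise ValueError("all gene-set members have a score of zero")
    miss_step = 1.0 / (n - n_hits)
    running = best = 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / hit_total if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set concentrated at the top
```

A positive score means the gene set is concentrated among proteins more abundant in the HER2-high samples. A negative score means the opposite.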
On the basis of the target–decoy database search strategy36 and employing linear discriminant analysis and posterior error histogram sorting, peptide and protein assignments were filtered to an FDR of <1% (ref. 35). Peptides with sequences contained in more than one protein sequence from the UniProt database were assigned to the protein with the most matching peptides35. TMT reporter ion intensities were extracted as that of the most intense ion within a 0.03-thomson window around the predicted reporter ion m/z values in the collected MS3 spectra. Only MS3 spectra with an average signal-to-noise value larger than 40 per reporter ion and an isolation specificity5 larger than 0.75 were considered for quantification. A two-step normalization of the protein TMT intensities was performed. First, the intensities of each protein were normalized over all acquired TMT channels on the basis of the median average protein intensity calculated for all proteins. Second, to correct for slight mixing errors of the peptide mixture from each sample, the median of the normalized intensities was calculated from all protein intensities in each TMT channel. Protein intensities were then normalized to the median value of these channel medians. Protein interactions were extracted from the STRING database (high confidence score >0.7)37. Overlapping proteins were assigned to the pathway with the greatest number of proteins, and enriched PID pathways were ranked by log (P value) to the nearest thousandth. Mass spectrometry raw data have been deposited in the MassIVE proteomics data repository under accession number MSV000079419. Drugs were obtained from the MGH Center for Molecular Therapeutics and are listed in Supplementary Table 6. They were chosen because of their common clinical use for the treatment of breast cancer or their unique targeting of epigenetic/stem cell pathways.
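The spectrum filters and the two-step normalization can be sketched as follows. The filter thresholds are those given above. The exact form of step 1 is an assumed reading of the description: each protein is rescaled so that its mean over channels equals the median of all per-protein means. Intensities are assumed to be positive, and function names are hypothetical.

```python
from statistics import mean, median


def keep_scan(reporter_sn, isolation_specificity, min_avg_sn=40.0, min_isolation=0.75):
    """Quantification filter: mean reporter-ion S/N > 40 and isolation specificity > 0.75."""
    return mean(reporter_sn) > min_avg_sn and isolation_specificity > min_isolation


def two_step_normalize(rows):
    """rows: one list of positive TMT intensities per protein (one value per channel)."""
    # Step 1 (assumed reading): rescale each protein so its mean across channels
    # equals the median of all per-protein means.
    target = median(mean(row) for row in rows)
    step1 = [[value * target / mean(row) for value in row] for row in rows]
    # Step 2: correct mixing errors by dividing each channel by its median and
    # multiplying by the median of those channel medians.
    n_channels = len(step1[0])
    channel_medians = [median(row[c] for row in step1) for c in range(n_channels)]
    grand = median(channel_medians)
    return [[row[c] * grand / channel_medians[c] for c in range(n_channels)]
            for row in step1]


# Three proteins whose channels differ only by a 1:2:3 loading error.
print(two_step_normalize([[1, 2, 3], [2, 4, 6], [10, 20, 30]]))
```

In the example, every protein shows the same 1:2:3 pattern across channels, which is what a loading error looks like. The second step removes that pattern.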
One thousand cells were seeded in tumour sphere medium in 384-well ultra-low attachment plates, in triplicate wells on duplicate plates, 24 h before the addition of drugs. Three independent drug concentrations centred on the reported IC50 were used (Supplementary Table 6). Cell viability was assayed 6 days after drug treatment with CellTiter-Glo (Promega) and normalized to the corresponding untreated controls38. Mouse studies complied with ethical regulations and were approved under animal protocol IACUC 2010N000006. Six-week-old female NSG (NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ) mice from Jackson Laboratories were anaesthetized with isoflurane. GFP-LUC-labelled CTCs (200,000, 20,000 and/or limiting dilutions as low as 200 cells) or 50:50 mixed CTCs (GFP-LUC+/HER2+: untagged/HER2−, and the converse) were then injected into the fourth right mammary fat pad. A 90-day release 0.72 mg oestrogen pellet (Innovative Research of America) was implanted subcutaneously behind the neck of each mouse. Tumour growth was monitored weekly by in vivo imaging using the IVIS Lumina II (PerkinElmer) following intraperitoneal injection (150 μl per animal) of D-luciferin substrate (Sigma). For in vivo drug sensitivity testing, paclitaxel (10 mg/kg) was administered weekly by intravenous injection for 4 consecutive weeks. The Notch inhibitors LY-411575 (Notchi2; 10 mg/kg) or RO4929097 (Notchi3; 10 mg/kg) were administered daily (5 days on/2 days off) by oral gavage in 2% solvent (2% sodium carboxymethyl cellulose) for 4 consecutive weeks. No animal randomization or blinding was used for these mouse studies. All animal studies used six to eight mice per condition to ensure sufficient statistical power.
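The normalization to untreated controls is a simple ratio. The sketch below expresses each treated well as a percentage of the mean untreated luminescence and then averages the replicate wells for each concentration. The function names and the plate layout are hypothetical: a dictionary keyed by drug concentration, with 0.0 holding the untreated wells.

```python
from statistics import mean


def percent_viability(treated_wells, untreated_wells):
    """Express each treated well's luminescence as % of the untreated-control mean."""
    control = mean(untreated_wells)
    if control <= 0:
        raise ValueError("untreated control signal must be positive")
    return [100.0 * rlu / control for rlu in treated_wells]


def summarize_plate(plate):
    """plate: concentration -> replicate luminescence readings; key 0.0 = untreated.

    Returns the mean % viability for each drug concentration.
    """
    controls = plate[0.0]
    return {conc: mean(percent_viability(wells, controls))
            for conc, wells in plate.items() if conc != 0.0}


plate = {0.0: [1000, 1000, 1000], 0.1: [800, 800, 800], 1.0: [500, 500, 500]}
print(summarize_plate(plate))
```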


Tripathi S., Norwegian University of Science and Technology | Christie K.R., The Jackson Laboratory | Balakrishnan R., Stanford University | Huntley R., UniProt | And 5 more authors.
Database | Year: 2013

Transcription factors control which information in a genome becomes transcribed to produce RNAs that function in the biological systems of cells and organisms. Reliable and comprehensive information about transcription factors is invaluable for large-scale network-based studies. However, existing transcription factor knowledge bases still lack well-documented functional information. Here, we provide guidelines for a curation strategy, which constitutes a robust framework for using the controlled vocabularies defined by the Gene Ontology Consortium to annotate specific DNA binding transcription factors (DbTFs) based on experimental evidence reported in literature. Our standardized protocol and workflow for annotating specific DNA binding RNA polymerase II transcription factors is designed to document high-quality and decisive evidence from valid experimental methods. Within a collaborative biocuration effort involving the user community, we are now exhaustively annotating the full repertoire of human, mouse and rat proteins that qualify as DbTFs inasmuch as they are experimentally documented in the biomedical literature today. The completion of this task will significantly enrich Gene Ontology-based information resources for the research community. © The Author(s) 2013. Published by Oxford University Press.


No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. From 2009 to 2013, environmental data (Supplementary Table 9) were collected across all major oligotrophic oceanic provinces in the context of the Tara Oceans expeditions20. Sampling stations were selected to represent distinct marine ecosystems at a global scale51. Southern Ocean stations were not examined herein because they were ranked as outliers owing to their exceptional environmental characteristics and biota23, 24. Environmental data were obtained from vertical profiles of a sampling package48, 49. It consisted of conductivity and temperature sensors, chlorophyll and CDOM fluorometers, a light transmissometer (WetLabs C-star 25 cm), a backscatter sensor (WetLabs ECO BB), a nitrate sensor (SATLANTIC ISUS) and an underwater vision profiler (Hydroptics UVP5; ref. 52). Nitrate concentrations, fluorescence-derived chlorophyll concentrations and salinity were calibrated against water samples collected with Niskin bottles48. Net primary production (NPP) data were extracted from 8-day composites of the vertically generalized production model (VGPM)53 for the week of sampling50. Carbon fluxes and carbon export (the carbon flux at 150 m) were estimated from particle concentrations and size distributions obtained from the UVP49, as detailed below. Previous research has shown that the particle size distribution follows a power law over the micrometre to millimetre size range3, 54, 55. This Junge-type distribution translates into the following equation, whose parameters can be retrieved from UVP images:

$$n(d) = a\, d^{k} \qquad (1)$$

where d is the particle diameter, a is a constant, and the exponent k is defined as the slope of the number spectrum when equation (1) is log transformed, that is, $\log n = \log a + k \log d$. This slope is commonly used as a descriptor of the shape of the aggregate size distribution.
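Because k is defined as the slope of the log-transformed spectrum, it can be estimated by linear regression of log n on log d. The sketch below uses ordinary least squares on natural logarithms. The fitting procedure, the skipping of empty size classes and the units (for example, d in mm) are assumptions, because the text does not specify how the UVP spectra were fitted. The function name is hypothetical.

```python
import math


def fit_junge(diameters, abundances):
    """Fit n(d) = a * d**k by ordinary least squares on log n versus log d.

    Returns (a, k). Size classes with zero abundance are skipped because
    their logarithm is undefined.
    """
    points = [(math.log(d), math.log(n))
              for d, n in zip(diameters, abundances) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty size classes")
    xs, ys = zip(*points)
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("size classes must have distinct diameters")
    k = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    a = math.exp(y_bar - k * x_bar)
    return a, k


d = [0.25, 0.5, 1.0, 1.5]
print(fit_junge(d, [2.0 * di ** -3.5 for di in d]))  # recovers a = 2, k = -3.5
```

Steeper (more negative) slopes indicate that small particles dominate the spectrum. Shallower slopes indicate relatively more large aggregates.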
The carbon-based particle size approach relies on the assumption that the total carbon flux of particles (F) corresponds to the flux spectrum integrated over all particle sizes:

$$F = \int_{0}^{\infty} n(d)\, m(d)\, w(d)\, \mathrm{d}d \qquad (2)$$

where n(d) is the particle size spectrum, that is, equation (1), and m(d) is the mass (here carbon content) of a spherical particle described as:

$$m(d) = \frac{\pi}{6}\, \bar{\rho}\, d^{3}$$

where $\bar{\rho}$ is the average density of the particle, and w(d) is the settling rate calculated using Stokes’ law:

$$w(d) = \frac{g}{18\, \nu}\, \frac{\bar{\rho} - \rho_{w}}{\rho_{w}}\, d^{2}$$

where g is the gravitational acceleration, $\rho_{w}$ the fluid density and $\nu$ the kinematic viscosity. In addition, the mass and settling rate of particles, m(d) and w(d), respectively, are often described as power-law functions of diameter obtained by fitting observed data, $m(d) = \alpha d^{\beta}$ and $w(d) = \gamma d^{\eta}$. Their product is then $m(d)\, w(d) = A\, d^{B}$, with $A = \alpha\gamma$ and $B = \beta + \eta$. The particle carbon flux can then be estimated by approximating equation (2) over a finite number (x) of small logarithmic intervals of diameter d spanning 250 μm to 1.5 mm. Particles <250 μm and >1.5 mm are not considered, consistent with the method presented in ref. 56. The approximation is:

$$F = \sum_{i=1}^{x} n_i\, A\, d_i^{B}\, \Delta d_i$$

where $n_i$ is the value of the size spectrum n(d) and $d_i$ the mean diameter in interval i, and $\Delta d_i$ is the interval width. A = 12.5 ± 3.40 and B = 3.81 ± 0.70 have been estimated using a global data set that compared particle fluxes in sediment traps with particle size distributions from UVP images. For the sake of consistency between all available data sets from the Tara Oceans expeditions, we considered subsets of the data recently published in Science23, 24, 25. In brief, one sample corresponds to data collected at one station and one depth. That depth is either the surface (SRF) or the deep chlorophyll maximum (DCM), the latter determined from the chlorophyll fluorescence profile. To study the eukaryotic community in the current manuscript, we selected stations for which environmental data, carbon export estimated at 150 m with the UVP and all size fractions were available. Consequently, a subset of 33 stations (corresponding to 56 samples) was created, compared with the 47 stations analysed in ref. 24. A similar procedure was applied to the prokaryotic and viral data sets, reducing the prokaryotic data set from ref.
23 to a subset of 104 samples from 62 stations and the viral data set from ref. 25 to a subset of 37 samples from 22 stations (see Supplementary Table 10). In addition, a detailed table summarizing which samples (depth and station) are available for each domain is provided (Supplementary Table 11). Photic-zone eukaryotic plankton diversity was investigated through millions of environmental Illumina reads. Sequences of the 18S ribosomal RNA gene V9 region were obtained by PCR amplification, and a stringent quality-check pipeline was applied to remove potential chimaeras and rare sequences (details on data cleaning in ref. 24). For 47 stations, and at two depths (SRF and DCM) where possible, eukaryotic communities were sampled in the piconano- (0.8–5 μm), micro- (20–180 μm) and mesoplankton (180–2,000 μm) fractions (a detailed list of these samples is given in Supplementary Table 12). For the carbon export study, sequences from all size fractions were pooled to obtain the most accurate and statistically reliable data set of the eukaryotic community. The 2.3 million eukaryotic ribotypes were assigned to known eukaryotic taxonomic entities by global alignment to a curated database24. To obtain the most accurate picture of the eukaryotic community, sequences showing less than 97% identity with reference sequences were excluded. The final eukaryotic relative abundance matrix used in our analyses included 1,750 lineages in 56 samples from 33 stations. Taxonomic assignment was performed using a last common ancestor approach, down to species level where possible. The pooled abundance (number of V9 sequences) of each lineage was normalized by the total number of sequences in each sample. To investigate prokaryotic lineages, communities were sampled in the picoplankton size fraction.
Two filter sizes were used along the Tara Oceans transect. Up to station #52, prokaryotic fractions correspond to the 0.22–1.6 μm size fraction; from station #56 onwards, they correspond to the 0.22–3 μm size fraction. Prokaryotic taxonomic profiling was performed using 16S rRNA gene tags identified directly in Illumina-sequenced metagenomes (miTAGs), as described in ref. 57. The 16S tags were mapped to cluster centroids of taxonomically annotated 16S reference sequences from the SILVA database58 (release 115: SSU Ref NR 99), which had been clustered at 97% sequence identity using USEARCH v6.0.307 (ref. 59). 16S tag counts were normalized by the total read count in each sample (further details in ref. 23). The photic-zone prokaryotic relative abundance matrix used in our analyses included 3,253,962 tags corresponding to 1,328 genera in 104 samples from 62 stations. For each prokaryotic sample, gene relative abundance profiles were generated by mapping reads to the Ocean Microbial Reference Gene Catalog (OM-RGC) using the MOCAT pipeline60. The relative abundance of each reference gene was calculated as gene-length-normalized base counts. Functional abundances were calculated as the sum of the relative abundances of the reference genes annotated to each orthologous group (OG). In our analyses, we used the subset of the OM-RGC annotated to Bacteria or Archaea (24.4 million genes). Using a gene count table rarefied to 33 million inserts, an OG was considered part of the ocean microbial core if at least one insert from each sample was mapped to a gene annotated to that OG. For further details on the prokaryotic profiling, please refer to ref. 23. The final prokaryotic functional relative abundance matrix used in our analyses included 37,832 OGs (functions) in 104 samples from 62 stations. Genes from functions of the FNET1 and FNET2 subnetworks were taxonomically annotated using a modified dual BLAST-based last common ancestor (2bLCA) approach61.
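The gene-level normalization, OG aggregation and core definition described above can be sketched as follows. Function names are hypothetical. Rescaling gene-length-normalized counts to sum to one within a sample is an assumption about what "relative abundance" means here. The rarefaction to 33 million inserts is assumed to have been done beforehand.

```python
def relative_abundances(base_counts, gene_lengths):
    """Gene-length-normalized base counts, rescaled to sum to 1 within one sample."""
    per_base = {gene: base_counts[gene] / gene_lengths[gene] for gene in base_counts}
    total = sum(per_base.values())
    return {gene: value / total for gene, value in per_base.items()}


def og_abundances(gene_abundance, gene_to_og):
    """Sum the relative abundances of reference genes annotated to the same OG."""
    totals = {}
    for gene, value in gene_abundance.items():
        og = gene_to_og.get(gene)
        if og is not None:  # genes without an OG annotation are ignored
            totals[og] = totals.get(og, 0.0) + value
    return totals


def core_ogs(rarefied_insert_counts):
    """OGs with at least one mapped insert in every sample.

    rarefied_insert_counts: one dict (OG -> insert count) per sample.
    """
    core = None
    for sample in rarefied_insert_counts:
        present = {og for og, count in sample.items() if count > 0}
        core = present if core is None else core & present
    return core or set()


print(core_ogs([{"OG1": 3, "OG2": 0}, {"OG1": 1, "OG2": 5}]))
```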
We used RAPsearch2 (ref. 62) rather than BLAST to process the large data volume efficiently. Searches were run against a database of non-redundant protein sequences from UniProt (version: UniRef_2013_07) plus eukaryotic transcriptome data not represented in UniRef (see Supplementary Tables 5 and 6 for full annotations). For prokaryote enumeration by flow cytometry, three 1-ml aliquots of seawater (pre-filtered through a 200-μm mesh) were collected at both SRF and DCM. The samples were fixed immediately with cold 25% glutaraldehyde (final concentration 0.125%), left in the dark for 10 min at room temperature, flash-frozen and kept in liquid nitrogen on board, and then stored at −80 °C on land. Two subsamples were taken to count heterotrophic prokaryotes (not shown herein) and phototrophic picoplankton separately. For heterotrophic prokaryote determination, 400 μl of sample was added to a diluted SYTO-13 (Molecular Probes Inc.) stock (10:1) at 2.5 μmol l−1 final concentration, left for about 10 min in the dark to complete staining and run in the flow cytometer. We used a FACSCalibur (Becton Dickinson) flow cytometer equipped with a 15 mW argon-ion laser (488 nm emission). At least 30,000 events (usually 100,000) were acquired for each subsample. Fluorescent beads (1 μm, Fluoresbrite carboxylate microspheres, Polysciences Inc.) were added at a known density as internal standards. The bead standard concentration was determined by epifluorescence microscopy. For phototrophic picoplankton, we used the same procedure as for heterotrophic prokaryotes, but without the addition of SYTO-13. Data analysis was performed with FlowJo software (Tree Star, Inc.). To associate viruses with carbon export, we used viral populations as defined in ref. 25 from a set of 43 Tara Oceans viromes. In brief, viral populations were defined from large contigs (>10 predicted genes and >10 kb) identified as most likely originating from bacterial or archaeal viruses.
The resulting 6,322 contigs were then clustered into populations if they shared more than 80% of their genes at >95% nucleotide identity. This resulted in 5,477 ‘populations’ from the 6,322 contigs, with as many as 12 contigs per population. For each population, the longest contig was chosen as the ‘seed’ representative sequence and considered the best estimate of that population’s origin. The relative abundance of each population was computed by mapping all quality-controlled reads to the set of 5,477 non-redundant populations with Bowtie2 (ref. 63), considering only mapping quality scores greater than 1. The relative abundance of a population in a sample was computed as the number of base pairs recruited to the contig, normalized by the total number of base pairs available in the virome and by the contig length. This value was used only if more than 75% of the reference sequence was covered by virome reads, and was set to 0 otherwise (see ref. 25 for further details). The final viral population abundance matrix used in our analyses included 5,291 viral population contigs in 37 samples from 22 stations. The seed sequences were used to assess the taxonomic affiliation of each viral population. Genes were compared with reference genomes from RefSeq Virus by BLASTP, with thresholds of 50 for bit score and 1 × 10−5 for e-value. An affiliation to a reference virus was considered confident when >50% of the genes were affiliated to that specific reference genome with at least 75% identity at the protein sequence level. The host group of each viral population was then estimated from these confident affiliations (see Supplementary Table 13 for the host affiliation of viral population contigs associated with carbon export). Viral protein clusters (PCs) correspond to ORFs initially mapped to existing clusters (POV, GOS and phage genomes).
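The two per-population rules above are the coverage-gated abundance and the >50% gene-affiliation criterion. The following sketch illustrates both and is not the code of ref. 25. Function names are hypothetical. best_hits is assumed to hold each gene's best BLASTP hit against RefSeq Virus. n_genes counts all predicted genes on the seed contig, including those without hits. The bit-score and identity thresholds are treated as inclusive.

```python
def population_abundance(recruited_bp, covered_fraction, contig_length, virome_bp,
                         min_covered=0.75):
    """Relative abundance of one viral population in one virome.

    Base pairs recruited to the seed contig are normalized by the virome size
    and the contig length; the value is 0 unless more than 75% of the contig
    is covered by reads.
    """
    if covered_fraction <= min_covered:
        return 0.0
    return recruited_bp / (virome_bp * contig_length)


def confident_affiliation(best_hits, n_genes, min_bitscore=50.0, max_evalue=1e-5,
                          min_identity=75.0):
    """Return the reference genome hit by more than half of a contig's genes, or None.

    best_hits: gene -> (reference_genome, bit_score, e_value, percent_identity).
    """
    votes = {}
    for reference, bit_score, e_value, identity in best_hits.values():
        if bit_score >= min_bitscore and e_value <= max_evalue and identity >= min_identity:
            votes[reference] = votes.get(reference, 0) + 1
    for reference, count in votes.items():
        if count > 0.5 * n_genes:  # at most one reference can pass this test
            return reference
    return None
```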
The remaining, unmapped ORFs were self-clustered using cd-hit as described in ref. 25. Only PCs with more than two ORFs were considered bona fide and used for subsequent analyses. To compute PC relative abundance for statistical analyses, reads were mapped back to predicted ORFs in the contig data set using Mosaik as described in ref. 25. Read counts to PCs were normalized by the sequencing depth of each virome. Importantly, we restricted our analyses to the 4,294 PCs associated with the 277 viral population contigs significantly associated with carbon export in 37 samples from 22 stations. To associate eukaryotic lineages directly with carbon export and other environmental traits (Fig. 1b), we used sparse partial least squares (sPLS)64 as implemented in the R package mixOmics29. We applied sPLS in regression mode, which models a causal relationship between the lineages and the environmental traits; that is, PLS predicts environmental traits (for example, carbon export) from lineage abundances. This approach enabled us to identify high correlations between certain lineages and carbon export (see Supplementary Table 1), but without taking into account the global structure of the planktonic community. Weighted correlation network analysis (WGCNA) was performed to delineate feature subnetworks (lineages, viral populations, PCs or functions) based on their relative abundance65, 66. A signed adjacency measure for each pair of features was calculated by raising the absolute value of their Pearson correlation coefficient to the power of a parameter p. The default value p = 6 was used for each global network, except for the prokaryotic functional network, where p had to be lowered to 4 to optimize the scale-free topology fit. This power allows the weighted correlation network to display a scale-free topology, in which key nodes are highly connected with others.
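The adjacency definition above can be written out directly. The sketch follows the text literally, raising the absolute Pearson correlation to the power p. In the WGCNA package, this construction is the "unsigned" adjacency; the package's "signed" adjacency instead uses ((1 + r)/2)^p. Function names are hypothetical. The pure-Python loops are for illustration only and are not suited to networks with thousands of features.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, power=6):
    """Weighted adjacency a_ij = |cor(x_i, x_j)| ** power, as described in the text.

    profiles: one abundance vector per feature, samples in the same order.
    The diagonal is set to 1, following the WGCNA convention.
    """
    m = len(profiles)
    adj = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            value = abs(pearson(profiles[i], profiles[j])) ** power
            adj[i][j] = adj[j][i] = value
    return adj


adj = adjacency([[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1]])
print(adj[0][1], adj[0][2])  # r = 0.8 is shrunk to 0.8**6; |r| = 1 stays 1
```

Raising correlations to a power shrinks weak and moderate correlations much more than strong ones. With p = 6, r = 0.8 becomes about 0.26, while perfectly anti-correlated profiles keep an adjacency of 1.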
The resulting adjacency matrix was then used to calculate the topological overlap measure (TOM), which, for each pair of features, takes into account both their weighted pairwise correlation (direct relationship) and their weighted correlations with other features in the network (indirect relationships). To identify subnetworks, hierarchical clustering was performed using a distance based on the TOM. This resulted in the definition of several subnetworks, each represented by its first principal component. These characteristic components play a key role in weighted correlation network analysis. On the one hand, the closeness of each feature to its subnetwork, referred to as subnetwork membership, is measured by correlating its relative abundance with the first principal component of the subnetwork. On the other hand, the association between a subnetwork and a given trait is measured by the Pearson correlation coefficient between the environmental trait and the subnetwork’s principal component. A similar protocol was applied to the eukaryotic relative abundance matrix, the prokaryotic relative abundance matrix, the prokaryotic functions relative abundance matrix and the viral population and PC relative abundance matrices. All procedures were applied to Hellinger-transformed log-scaled abundances. Notably, the protocol is not sensitive to copy number variation, as observed across different eukaryotic species, because the association between two species relies on a correlation score between relative abundance measurements. Computations were carried out using the R package WGCNA33. Given the nature of the eukaryotic data set (three distinct size fractions), the sampling process may lead to the loss of size fractions. In particular, samples 1, 3, 17, 37, 39, 43, 48, 53, 54, 55 and 66 are potentially biased by such a loss (Supplementary Table 12).
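A plain-Python sketch of the unsigned topological overlap computed from an adjacency matrix is given below, using the standard definition TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij = Σ_u a_iu·a_uj and k_i = Σ_u a_iu, with sums over u ≠ i, j. It illustrates the measure only and is not the WGCNA package code; 1 − TOM is the dissimilarity that would then be passed to hierarchical clustering.

```python
from typing import List


def topological_overlap(adj: List[List[float]]) -> List[List[float]]:
    """Unsigned topological overlap matrix from a symmetric adjacency matrix."""
    n = len(adj)
    # Zero the diagonal so that the sums below exclude self-connections.
    a = [[0.0 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    k = [sum(row) for row in a]  # connectivity of each node
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a[i][u] * a[u][j] for u in range(n))  # indirect links via u
            value = (shared + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


def tom_dissimilarity(tom: List[List[float]]) -> List[List[float]]:
    """Distance used for hierarchical clustering of features."""
    return [[1.0 - v for v in row] for row in tom]
```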
A complementary WGCNA analysis including these samples was performed to evaluate the robustness of our protocol to missing size fractions. The composition of the eukaryotic subnetwork built with the extended data set (that is, 67 samples from 37 stations, with missing size fractions in 11 samples) was compared with that of the subnetwork presented above (that is, 56 samples from 33 stations). The two subnetworks share 75% of their lineages, and four of the top five VIP lineages obtained with the extended data set (see Extended Data Fig. 5 for details) are found among the top six VIP lineages of the subnetwork above (Supplementary Table 2), indicating highly similar results and low sensitivity to size fraction loss. For each subnetwork (called a module in WGCNA) extracted from each global network, Pearson correlation coefficients between the subnetwork principal component and the carbon export estimates were computed, together with the corresponding P values corrected for multiple testing using the Benjamini–Hochberg FDR procedure. The subnetworks with the highest correlation scores were investigated further. One subnetwork (49 nodes) was significant within the eukaryotic network; one subnetwork (109 nodes) within the prokaryotic network; one subnetwork (277 nodes) within the viral network; two subnetworks (441 and 220 nodes) within the prokaryotic functional network; and two subnetworks (1,879 and 2,147 nodes) within the viral PC network. In addition to the network analyses, we asked whether the identified subnetworks could be used as predictors of the carbon export estimates. To answer this question, we used partial least squares (PLS) regression, a dimensionality-reduction method that identifies combinations of predictors with maximum covariance with the response variable.
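The Benjamini–Hochberg adjustment applied to the module–trait correlation P values can be sketched as follows. It mirrors the "BH" method of R's p.adjust, but the function is an illustrative stand-in, not the code used in the study.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted P values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```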
The identified combinations, called latent variables, are used to predict the response variable. The predictive power of the model is assessed by correlating the predicted values with the measured values. The significance of the predictive power was evaluated by permuting the data 10,000 times. For each permutation, a PLS model was built to predict the randomized response variable, and a Pearson correlation was calculated between the permuted response variable and the values predicted by leave-one-out cross-validation (LOOCV). The 10,000 random correlations were compared with the performance of the PLS model used to predict the true response variable. In addition, the predictors were ranked according to their variable importance in projection (VIP)67. The VIP of a predictor estimates its contribution to the PLS regression; predictors with high VIP values are assumed to be important for the PLS prediction of the response variable. The VIP values of the prokaryotic functional subnetworks are provided in Supplementary Tables 5 and 6. For the sake of illustration, only lineages or functions with VIP > 1 (ref. 67) are discussed and pictured in Figs 2 and 4. Our computations were carried out using the R package pls68. All programs are available under the GPL licence. Nodes of the subnetworks represent either lineages (eukaryotic, prokaryotic or viral) or functions (prokaryotic or viral). Subnetworks related to carbon export are represented in two distinct formats. Scatter plots represent each node by its Pearson correlation to carbon export and its node centrality within the subnetwork. Node centrality was recomputed using significant Spearman correlations above 0.3 (>0.9 for viral PCs) as edges; this was done for visualization purposes only, because WGCNA subnetworks (based on the topological overlap measure (TOM) between nodes) are hyper-connected. Node sizes are proportional to the VIP score after PLS.
The hive plots depict the same subnetworks by focusing on two main features: the x and y axes depict subnetwork nodes ranked by their VIP scores and by their Pearson correlation to carbon export, respectively.
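To make the VIP ranking concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and computes VIP_j = sqrt(p · Σ_a SS_a·w_ja² / Σ_a SS_a), where w_a is the unit-norm weight vector of component a and SS_a = q_a²·t_aᵀt_a is the response variance it explains. It is a plain-Python illustration under these standard definitions, not the R pls package, and it omits the permutation and LOOCV steps. A useful check is that the squared VIP values always sum to the number of predictors, which is why VIP > 1 is read as above-average importance.

```python
import math
from typing import List, Sequence


def pls1_vip(X: List[Sequence[float]], y: Sequence[float], n_components: int) -> List[float]:
    """VIP score of each predictor from a single-response PLS model fitted by NIPALS.

    X is n samples x p predictors (list of rows); y holds the n response values.
    """
    n, p = len(X), len(X[0])
    # Mean-centre predictors (E) and response (f).
    means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    weights: List[List[float]] = []
    explained: List[float] = []
    for _ in range(n_components):
        # Weight vector proportional to E^T f, scaled to unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt  # response loading
        # Deflate X and y by the component just extracted.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # SS_a: response variance explained
    total = sum(explained)
    return [math.sqrt(p * sum(explained[a] * weights[a][j] ** 2
                              for a in range(n_components)) / total)
            for j in range(p)]
```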


No statistical methods were used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment. Salmonella enterica serovar Typhimurium strain SL1344 constitutively expressing GFP from a chromosomal locus (strain JVS-3858) was previously described51 and is referred to as wild type throughout this study. The complete list of bacterial strains used in this study is provided in Supplementary Table 1. Routinely, bacteria were grown in Lennox broth (LB) medium at 37 °C with shaking at 220 r.p.m. When appropriate, 100 μg ml−1 ampicillin (Amp), 50 μg ml−1 kanamycin (Kan) or 20 μg ml−1 chloramphenicol (Cm) (final concentrations) were added to the liquid medium or agar plates. Chromosomal mutagenesis of Salmonella SL1344 was performed as previously described52. To construct a non-polar pinT mutant strain (YCS-034, GFP−; or JVS-10038, GFP+), the first ~60 nt of the gene were removed and replaced by a resistance cassette, while keeping the Rho-independent terminator intact. The resistance cassette was then eliminated using the FLP helper plasmid pCP20 at 42 °C (ref. 52). All mutations were transduced into the wild-type background using P22 phage53. For plasmid transformation, the respective Salmonella strains were electroporated with ~10 ng of DNA. The following cell lines were used in this study: human cervix carcinoma cells (HeLa-S3; ATCC CCL-2.2), human epithelial colorectal adenocarcinoma cells (CaCo-2; ATCC HTB-37), human epithelial colorectal adenocarcinoma cells (HT29; DSMZ No.
ACC-299), human stomach adenocarcinoma cells (AGS; ATCC CRL-1739), human epithelial colon metastatic cells (LoVo; ATCC CCL-229), human embryonic kidney 293 cells (HEK293; ATCC CRL-1573), human monocytic cells (THP-1; ATCC TIB-202), murine fibroblast cells (L929; ATCC CCL-1), murine embryonic fibroblast cells (MEF; ATCC SCRC-1040), mouse leukaemic monocyte/macrophage cells (RAW264.7; ATCC TIB-71), porcine intestinal epithelial cells (IPEC-J2)54 and porcine macrophage-like cells (3D4/31)55. HeLa-S3, CaCo-2, THP-1, HEK293, RAW264.7 and MEF cells were obtained from the group of Thomas Rudel (Biocentre, Würzburg). AGS cells were provided by Cynthia Sharma (Research Center for Infectious Diseases, Würzburg). L929 cells were obtained from Thomas Meyer (Max Planck Institute for Infection Biology, Berlin). HT29, LoVo, IPEC-J2 and 3D4/31 cells were provided by Karsten Tedin (Centre for Infection Medicine, Berlin). Cell lines were not authenticated in our laboratory, but were routinely tested for mycoplasma contamination (MycoAlert Mycoplasma Detection Kit, Lonza). HeLa-S3 cells were cultured according to the guidelines provided by the ENCODE consortium (http://genome.ucsc.edu/encode/protocols/cell/human/Stam_15_protocols.pdf). Briefly, cells were grown in DMEM (Gibco) supplemented with 10% fetal calf serum (FCS; Biochrom), 2 mM l-glutamine (Gibco) and 1 mM sodium pyruvate (Gibco) in T-75 flasks (Corning) at 37 °C in a humidified 5% CO₂ atmosphere. The other cell lines used in this study (THP-1, CaCo-2, AGS, HT29, LoVo, HEK293, MEF, L929, RAW264.7, IPEC-J2 and 3D4/31) were cultured in RPMI (Gibco) supplemented with 10% FCS, 2 mM l-glutamine, 1 mM sodium pyruvate and 0.5% β-mercaptoethanol (Gibco) at 37 °C in a humidified 5% CO₂ atmosphere.
To differentiate THP-1 monocytes, seeded cells (1 × 10⁶ cells per well; six-well format) were treated with 50 ng ml−1 (final concentration) of phorbol 12-myristate 13-acetate (PMA) (Sigma) for 72 h (fresh PMA at the same concentration was added after 48 h). For the differentiation of murine bone-marrow-derived macrophages (BMDMs), the marrow of femur and tibia was isolated from 8–12-week-old female C57BL/6 wild-type mice and stored in RPMI supplemented with 10% FCS. The cell suspension was centrifuged for 5 min at 250g and the leukocyte pellet was resuspended in differentiation medium consisting of X-vivo-15 medium (Lonza) supplemented with 10% FCS and 10% L929-conditioned DMEM medium (same composition as above). Cells were cultured at 3 × 10⁶ cells per 10 ml in a T-75 flask. On day 3, another 3 ml of differentiation medium was added and cells were further cultured until day 5. Successful macrophage differentiation was validated by microscopy before the cells were detached using a rubber scraper (Sarstedt) and seeded into six-well plates at 10⁵ cells per well in fresh differentiation medium. Infection was carried out on day 7 as described below. In vitro infection of HeLa-S3 cells was carried out following a previously published protocol56 with slight modifications. Two days before infection, 2 × 10⁵ HeLa-S3 cells were seeded in 2 ml complete DMEM (six-well format). Overnight cultures of Salmonella were diluted 1:100 in fresh LB medium and grown aerobically to an OD of 2.0. Bacterial cells were harvested by centrifugation (2 min at 12,000 r.p.m., room temperature) and resuspended in DMEM. Infection of HeLa-S3 cells was carried out by adding the bacterial suspension directly to each well. Unless mentioned otherwise, infections were performed at a multiplicity of infection (m.o.i.) of 5.
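The inoculum for a given multiplicity of infection follows from simple arithmetic: bacteria needed = host cells × m.o.i., and the volume then depends on the bacterial density of the suspension. The sketch below assumes that both the number of host cells at the time of infection and the bacterial density (for example from plating or an OD-to-c.f.u. calibration) are known; neither value is given in the text, and the numbers in the example are hypothetical.

```python
def inoculum_volume_ul(host_cells: float, moi: float, bacteria_per_ml: float) -> float:
    """Volume (in µl) of bacterial suspension that delivers `moi` bacteria per host cell."""
    bacteria_needed = host_cells * moi
    return bacteria_needed / bacteria_per_ml * 1000.0  # ml -> µl


# Hypothetical example: 1e6 host cells per well, m.o.i. 5, suspension at 1e9 bacteria per ml.
print(round(inoculum_volume_ul(1e6, 5, 1e9), 3))  # 5.0
```

The same function covers the higher m.o.i. values used for the other cell types (10 or 20) by changing `moi`.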
Immediately after addition of bacteria, the plates were centrifuged for 10 min at 250g at room temperature, followed by 30 min incubation at 37 °C in a humidified 5% CO₂ atmosphere. The medium was then replaced with gentamicin-containing DMEM (final concentration: 50 μg ml−1) to kill extracellular bacteria. After a further 30 min incubation, the medium was again replaced by fresh DMEM containing 10 μg ml−1 of gentamicin for the remainder of the experiment. Time point 0 was defined as the time when gentamicin was first added to the cells. Other cell types were infected as described for HeLa-S3 cells, except that infection was carried out in RPMI medium at an m.o.i. of 10 (THP-1, CaCo-2, HT29, AGS, HEK293, MEF, L929 and RAW264.7) or 20 (IPEC-J2, 3D4/31). Infection of BMDMs was carried out at an m.o.i. of 10 in X-vivo-15 medium (10% fetal calf serum, 10% L929-conditioned medium). For confocal microscopy, infection was carried out as described above, except that HeLa-S3 cells had been seeded onto coverslips (24-well format). At the respective time point, coverslips with infected HeLa-S3 cells were washed twice with PBS (Gibco) and fixed in 4% paraformaldehyde (PFA) for 15 min in a wet chamber. After two additional PBS washing steps, cells were stained with Hoechst 33342 (Invitrogen; diluted 1:5,000 in PBS) for 15 min in a wet chamber and again washed twice with PBS. After the coverslips had been air-dried, they were embedded in Vectashield Mounting Medium (Biozol) and analysed using a Leica SP5 confocal microscope (Leica) and the LAS AF Lite software (Leica). To stain human mitochondria, MitoTracker Orange CMTMRos (Life Technologies; kindly provided by V. Kozjak-Pavlovic, Biocentre, Würzburg) was used. The dye was added in the dark to a final concentration of 200 nM directly into the medium of the infected cells in the 37 °C incubator, 30 min before harvest.
After the 30 min incubation with the dye, the plates were covered with aluminium foil to prevent bleaching during the following steps. The supernatant was aspirated and the cells were washed with PBS and fixed with 4% PFA at 4 °C overnight. Hoechst staining and sample preparation were performed as described above. For flow cytometry-based analyses, infected cultures were washed twice with PBS, detached from the bottom of the plate by trypsinization and resuspended in complete DMEM. After pelleting (5 min at 250g, room temperature), cells were resuspended in PBS and analysed by flow cytometry using a FACSCalibur instrument (BD Biosciences) and the Cyflogic (CyFlo Ltd; version 1.2.1) or Flowing (Cell Imaging Core, Turku Centre for Biotechnology, Finland; version 2.5.0) software. Intact HeLa-S3 cells were selected by gating on cell diameter (forward scatter) and granularity (side scatter) (linear scale). Within this population, infected (GFP-positive) and non-infected (GFP-negative) sub-fractions were defined based on GFP signal intensity (FITC channel) versus autofluorescence (PE channel) (logarithmic scale). For cell sorting, RNAlater-fixed cells (see below) were first passed through MACS Pre-Separation Filters (30 μm exclusion size; Miltenyi Biotec) and then analysed and sorted using a FACSAria III device (BD Biosciences) at 4 °C (cooling both the input tube holder and the collection tube rack) and at a medium flow rate, using the same gating strategy as described above, except that the gates for the GFP-positive and GFP-negative fractions were set conservatively to prevent cross-contamination (as exemplified in Extended Data Fig. 1d). Typically, ~2 × 10⁵ cells of each fraction were collected for RNA isolation. To detect apoptotic cells, HeLa-S3 cells were washed twice with PBS and resuspended in 1× binding buffer (BD Pharmingen) to a concentration of 10⁶ cells per ml.
100 μl of this cell suspension were mixed with 5 μl of APC-labelled annexin V (BD Pharmingen) and 1 μl of 500 mg ml−1 propidium iodide (PI; lyophilized stock from Sigma). After incubation for 15 min at room temperature, protected from light, cells were analysed by flow cytometry using the MACSQuant Analyzer (Miltenyi Biotec). After gating on intact cells based on cell diameter (forward scatter) and granularity (side scatter), the annexin-positive/PI-negative sub-population was identified by comparison with the appropriate single-stained controls in the APC versus PerCP channels, and quantified. Necrosis was evaluated by quantifying released lactate dehydrogenase (LDH) with the Cytotox96 assay (Promega) according to the manufacturer’s instructions. The absorbance at 490 nm was measured using a Multiskan Ascent instrument (Thermo Fisher). To convert the measured absorbance values into the relative proportion of dead cells, the maximal absorbance was determined using 1× lysis solution (Promega) following the manufacturer’s instructions and defined as 100% cytotoxicity. For both apoptosis and cytotoxicity measurements, each biological replicate comprised three technical replicates. To quantify bacterial intracellular replication (Extended Data Fig. 1b), infected host cells were analysed by flow cytometry as described above, except that the increase in GFP intensity (geometric mean) was measured in the GFP-positive sub-population over time and normalized to that of the non-infected population in the same sample (example in Extended Data Fig. 1c). Alternatively, infected HeLa-S3 cultures were solubilized with PBS containing 0.1% Triton X-100 (Gibco) at the respective time points. Cell lysates were serially diluted in PBS, plated onto LB plates and incubated at 37 °C overnight. The number of colony-forming units (c.f.u.) recovered was compared with that obtained from the bacterial input solution used for infection.
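Both read-outs reduce to simple ratios. The sketch below expresses LDH release relative to the maximal-lysis control (set to 100%), with an optional background term whose use is an assumption (the text does not mention one), and computes the GFP-based replication read-out as the geometric mean GFP intensity of GFP-positive events divided by that of the non-infected events in the same sample. The inputs (per-well absorbances, per-event intensities) are hypothetical.

```python
import math
from typing import Sequence


def percent_cytotoxicity(a490_sample: float, a490_max: float,
                         a490_background: float = 0.0) -> float:
    """LDH release as % of the maximal-lysis control (optional background subtraction)."""
    return 100.0 * (a490_sample - a490_background) / (a490_max - a490_background)


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive fluorescence intensities."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gfp_replication_index(gfp_positive: Sequence[float],
                          gfp_negative: Sequence[float]) -> float:
    """Geometric mean GFP of infected cells normalized to non-infected cells in the same sample."""
    return geometric_mean(gfp_positive) / geometric_mean(gfp_negative)
```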
In all cases, each biological replicate comprised three technical replicates. Infected cells were washed twice with PBS, trypsinized and pelleted. For ethanol fixation, cell pellets were resuspended in 0.1 volume of ice-cold PBS, and 0.9 volume of ice-cold ethanol (either 70% or 100%, as indicated) was then added dropwise during shaking (400 r.p.m., 4 °C) to avoid cell clumping. Fixation using stop solution (95% EtOH/5% water-saturated phenol)57 was performed by resuspending the cell pellet in PBS before adding 0.2 volume of stop solution and mixing. When PFA was used, the pellet was resuspended in the respective PFA concentration (0.5% or 4% PFA, pH 7.4, with or without 4% sucrose) and shaken for 15 min at 400 r.p.m. at room temperature. PFA-induced crosslinks were reversed by an additional heating step for 15 min at 70 °C (refs 58, 59). For fixation with RNAlater (Qiagen), cell pellets were directly resuspended in RNAlater (1 ml per 5 × 10⁶ cells). For systematic evaluation of different fixation protocols (Extended Data Fig. 1e–g), fixed cells were not sorted but were either analysed directly after fixation (30 min) or stored overnight at −20 °C (ethanol-based fixatives) or 4 °C (other fixatives). To prepare RNAlater-fixed samples for sorting, tubes containing ~5 × 10⁶ fixed cells were filled up with 10 ml of ice-cold PBS and centrifuged (5 min, 500g, 4 °C), and the cell pellets were resuspended in 2 ml of cold PBS. This cell suspension was filtered and sorted (as described above). In the dual RNA-seq experiments, a non-infected, mock-treated control was included as a reference for gene expression changes in host cells upon infection. The bacterial reference samples were derived from Salmonella grown in LB to an OD of 2.0, which were either shifted to DMEM for 15 min, pelleted and fixed in RNAlater (see above) or fixed directly (that is, without a medium exchange step), as indicated.
Fixed Salmonella cells were pelleted and lysed using the lysis/binding buffer of the mirVana kit (Ambion). To maintain the approximate ratio of bacterial to host transcripts during RNA isolation, Salmonella lysates were mixed with host cell lysate such that the calculated proportion of individual Salmonella cells per infected host cell at the latest time point (see Extended Data Fig. 1h) was matched. The resulting mixture was then processed together. RNA was extracted from cells using the mirVana kit (Ambion) following the manufacturer’s instructions for total RNA isolation. To remove contaminating genomic DNA, samples were treated with 0.25 U of DNase I (Fermentas) per 1 μg of RNA for 45 min at 37 °C. If applicable, RNA quality was checked on the Agilent 2100 Bioanalyzer (Agilent Technologies). For qRT–PCR experiments, total RNA was isolated using the TRIzol LS reagent (Invitrogen) according to the manufacturer’s recommendations and treated with DNase I (Fermentas) as described above. qRT–PCR was performed with the Power SYBR Green RNA-to-CT 1-Step kit (Applied Biosystems) according to the manufacturer’s instructions. Fold changes were determined using the 2^(−ΔΔCt) method60. Primer sequences are given in Supplementary Table 1, and their specificity was confirmed using Primer-BLAST (NCBI). For the estimation of Salmonella RNA within infection samples (Extended Data Fig. 1h), a dilution series of separately isolated Salmonella and HeLa-S3 total RNA was set up and in each case the ratio of rfaH/ACTB mRNAs was determined. The same was done for biological samples from infected cells as well as for the Salmonella reference controls. From the resulting trend-line equation, the approximate proportion of the Salmonella transcriptome within mixed prokaryotic and eukaryotic total RNA samples could be deduced.
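The 2^(−ΔΔCt) calculation (Livak and Schmittgen) can be written out as below. The sketch assumes one target and one reference gene measured in a test and a control sample; which reference gene was used for each assay is not stated in the text, and the Ct values in the checks are hypothetical.

```python
def ddct_fold_change(ct_target_test: float, ct_ref_test: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of the target in the test sample versus the control sample."""
    delta_ct_test = ct_target_test - ct_ref_test  # normalize to the reference gene
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_test - delta_ct_control
    return 2.0 ** (-delta_delta_ct)  # assumes ~100% amplification efficiency
```

A target that reaches threshold two cycles earlier in the test sample (after reference normalization) thus shows a fourfold induction.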
Where indicated (Supplementary Table 1), Salmonella and eukaryotic host rRNA were removed using the Ribo-Zero Magnetic Gold Kit (Epidemiology) purchased from Epicentre/Illumina. Following the manufacturer’s instructions, ~500 ng of total, DNase-I-treated RNA from infection samples was used as an input to the ribosomal transcript removal procedure. rRNA-depleted RNA was precipitated in ethanol for 3 h at −20 °C. cDNA libraries for Illumina sequencing were generated by Vertis Biotechnologie AG, Freising-Weihenstephan, Germany. For dual RNA-seq of total RNA, at least 100 ng RNA were used for cDNA library preparation. DNase-I-treated total RNA samples were first sheared via ultra-sound sonication (4 pulses of 30 s at 4 °C each) to generate ~200–400 bp (average) fragmentation products. Fragments <20 nt were removed using the Agencourt RNAClean XP kit (Beckman Coulter Genomics). As an internal quality control for the pilot experiment (shown in Fig. 1), spike-in RNA (5′-AAAUCCGUUCGUACGGGCCC-3′; 5′-monophosphorylated and gel-purified) was added to a final concentration of 0.5%. The samples were poly(A)-tailed using poly(A) polymerase and the 5′ triphosphate (or eukaryotic 5′ cap) structures were removed using tobacco acid pyrophosphatase (TAP). Afterwards, an RNA adaptor was ligated to the 5′ monophosphate of the RNA fragments. First-strand cDNA synthesis was performed using an oligo(dT)-adaptor primer and the M-MLV reverse transcriptase (NEB). The resulting cDNA was PCR-amplified to about 20–30 ng μl−1 using a high fidelity DNA polymerase (barcode sequences for multiplexing were part of the 3′ primers). The cDNA library was purified using the Agencourt AMPure XP kit (Beckman Coulter Genomics) and analysed by capillary electrophoresis (Shimadzu MultiNA microchip electrophoresis system). cDNA libraries for dual RNA-seq on rRNA-depleted samples were constructed as described above, except for the following modifications. 
Upon RNA fragmentation, dephosphorylation with Antarctic Phosphatase (AP, NEB) and re-phosphorylation with T4 Polynucleotide Kinase (PNK, NEB) were performed. Oligonucleotide adapters were ligated to both the 5′ and 3′ ends of the RNA samples. First-strand cDNA synthesis was performed using M-MLV reverse transcriptase and the 3′ adaptor as primer. cDNA libraries from Salmonella-only samples were generated by fragmenting 5 μg of total RNA using ultrasound and RNAs <20 nt were removed using the Agencourt RNAClean XP kit (Beckman Coulter Genomics) as above. The RNA samples were poly(A)-tailed and 5′ppp structures were removed as before. RNA adapters were ligated to the 5′ monophosphate of the RNA and first-strand cDNA synthesis was performed using an oligo(dT)-adaptor primer and the M-MLV reverse transcriptase. The resulting cDNAs were PCR-amplified, purified using the Agencourt AMPure XP kit (Beckman Coulter Genomics) and analysed by capillary electrophoresis (Shimadzu MultiNA microchip). Generally, for sequencing cDNA samples were pooled in approximately equimolar amounts. The cDNA pool was size-fractionated in the size range of 150–600 bp using a differential clean-up with the Agencourt AMPure kit. For the dual RNA-seq pilot experiment (Fig. 1), single-end sequencing (100 cycles) was performed on an Illumina HiSeq 2000 machine at the Max Planck Genome Centre Cologne, Cologne, Germany. For dual RNA-seq on rRNA-free samples as well as for conventional RNA-seq of Salmonella-only samples, single-end sequencing (75 cycles) was performed on a NextSeq500 platform at Vertis Biotechnologie AG, Freising-Weihenstephan, Germany. All RNA-seq data discussed in this publication have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE60144. For the accession numbers of individual experiments, see Supplementary Table 1. 
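Pooling libraries in approximately equimolar amounts requires converting mass concentrations to molarity. The sketch below uses the common approximation of ~660 g mol⁻¹ per base pair of double-stranded DNA and the mean fragment length from capillary electrophoresis; the target amount per library (in fmol) is a hypothetical parameter, not a value from the text.

```python
from typing import Dict, Tuple


def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Molarity (nM) of a dsDNA library, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def pooling_volumes(libraries: Dict[str, Tuple[float, float]],
                    fmol_each: float) -> Dict[str, float]:
    """Volume (µl) of each library contributing `fmol_each` femtomoles (1 nM = 1 fmol/µl).

    `libraries` maps a library name to (concentration in ng/µl, mean fragment length in bp).
    """
    return {name: fmol_each / library_nM(conc, length)
            for name, (conc, length) in libraries.items()}
```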
Total RNA prepared with TRIzol LS reagent (Invitrogen) was separated in 6% (vol/vol) polyacrylamide-8.3 M urea gels and blotted as described11. We loaded per lane either 5–10 μg of RNA from pure bacterial samples (Extended Data Figs 3d and 9a), 2 μg total RNA from sorted cell samples (Extended Data Fig. 8b), or 50 μg total RNA from unsorted infection samples (Fig. 2b). Hybond XL membranes (Amersham) were hybridized at 42 °C with gene-specific [³²P] end-labelled DNA oligonucleotides (see Supplementary Table 1 for sequences) in Hybri-Quick buffer (Carl Roth AG). The pinT promoter region was amplified by PCR using primers JVO-7036/-7037 and inserted via the AatII and NheI sites into the backbone of plasmid pAS093, resulting in plasmid pYC65. To identify the PhoP binding sites in a minimal fragment, the pinT promoter region was truncated by amplifying pYC65 using Phusion polymerase (NEB) with JVO-9393/-7387. The critical residues in the PhoP binding motif (T T ) were mutated to adenines by site-directed mutagenesis with JVO-12461/-12462 and Phusion polymerase (NEB). For pulse-expression of PinT in in vitro grown Salmonella, we used arabinose-induced overexpression of PinT from a pBAD plasmid previously described10, 51, 61 with minor modifications. Briefly, wild-type Salmonella carrying either a pKP8-35 (pBAD control), pYC5-34 (pBAD-PinT) or pYC60 (pBAD-PinT*) plasmid were grown overnight in LB and, the next day, the cultures were diluted 1:100 and further grown in LB to an OD of 2.0. l-arabinose (Sigma) was added to a final concentration of 0.2%; 5 min later, RNA was extracted using TRIzol LS reagent (Invitrogen) and analysed by RNA-seq (~3–5 million reads/library). For the same experiment under SPI-2-inducing conditions, overnight cultures of the three strains were washed 2× with PBS and 1× with SPI-2 medium28, diluted 1:50 in SPI-2 medium and grown to an OD of 0.3 before PinT expression was induced as above.
For the pulse-expression of PinT inside host cells (Extended Data Fig. 6d, e), HeLa-S3 cells were infected with the same three strains as above and, 4 h after infection, 0.2% l-arabinose was added directly to the DMEM medium. Activation of inducible sRNA expression in intracellular bacteria was confirmed by qRT–PCR over a time course of 20 min (Extended Data Fig. 6d), which showed that full induction was already reached at 5 min. Thus, for Extended Data Fig. 6e, the host cells were lysed 5 min after induction with ice-cold 0.1% Triton X-100/PBS and further incubated for 30 min on ice, with occasional pipetting up and down to improve host cell lysis. The intact bacterial cells were then pelleted by centrifugation for 2 min at 16,100g (4 °C) and resuspended in RNAlater (Qiagen). The fixed bacterial cells were further enriched against the host background by cell sorting (FACSAria III, BD Biosciences), gating selectively for GFP+ bacterial cells released from their hosts. Total RNA was isolated from these cells and analysed by RNA-seq as above, except that libraries were sequenced to a depth of ~20 million reads each, as necessitated by remaining host-derived RNA fragments. Immunoblotting of Salmonella proteins was done as previously described62. Briefly, samples from Salmonella in vitro cultures corresponding to 0.4 OD were taken, centrifuged for 4 min at 16,100g at 4 °C, and the pellets resuspended in sample loading buffer to a final concentration of 0.01 OD per μl. After denaturation for 5 min at 95 °C, 0.05-OD equivalents of each sample were separated by SDS–PAGE. Gel-fractionated proteins were blotted for 90 min (0.2 mA per cm²; 4 °C) in a semi-dry blotter (Peqlab) onto a PVDF membrane (Perkin Elmer) in transfer buffer (25 mM Tris base, 190 mM glycine, 20% methanol). Blocking was for 1 h at room temperature in 10% dry milk/TBST20.
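The OD-equivalent bookkeeping in the immunoblotting protocol is simple arithmetic. The sketch below derives the culture volume to harvest for 0.4 OD units, the resuspension volume for 0.01 OD per µl and the loading volume for 0.05 OD equivalents. The culture density passed in is a hypothetical input, and "OD units" is taken to mean optical density × ml.

```python
from typing import Tuple


def od_equivalent_plan(culture_od: float, harvest_od_units: float = 0.4,
                       final_od_per_ul: float = 0.01,
                       load_od_units: float = 0.05) -> Tuple[float, float, float]:
    """Return (culture ml to pellet, µl loading buffer to resuspend in, µl to load per lane)."""
    harvest_ml = harvest_od_units / culture_od         # OD units = OD x ml
    resuspend_ul = harvest_od_units / final_od_per_ul  # 0.4 / 0.01 = 40 µl
    load_ul = load_od_units / final_od_per_ul          # 0.05 / 0.01 = 5 µl
    return harvest_ml, resuspend_ul, load_ul
```

For a culture at OD 2.0, this corresponds to pelleting 0.2 ml, resuspending in 40 µl and loading 5 µl per lane.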
Appropriate primary antibodies (see Supplementary Table 1) were hybridized at 4 °C overnight and, after 3 × 10 min washes in TBST20, secondary antibodies (Supplementary Table 1) for 1 h at room temperature. For western blotting of human proteins, infected cells were harvested in sample loading buffer (500 μl per well; six-well format), transferred to 1.5 ml reaction tubes and boiled for 5 min at 95 °C, and 20 μl per lane were loaded onto a 10% PAA gel for SDS–PAGE as above. After blotting and blocking (as above), the membrane was probed with the respective primary antibody at 4 °C overnight and, after washing (as above), with the secondary antibody for 1 h at room temperature (a full list of all antibodies and sera used is given in Supplementary Table 1). After three additional 10 min washes in TBST20, blots were developed using Western Lightning solution (Perkin Elmer) in a Fuji LAS-4000. In Fig. 3e, intensities of protein bands were quantified using the AIDA software (Raytest, Germany) and normalized to GroEL levels. To mimic the early stages of host cell infection in vitro, the indicated Salmonella strains were grown in LB overnight, diluted 1:100 in LB and grown to an OD of 2.0 (that is, a condition under which SPI-1 is highly induced4, 11), washed twice with PBS and once with SPI-2 medium28 at room temperature, diluted 1:50 in pre-warmed SPI-2 medium (defined as t0) and grown further in Erlenmeyer flasks at 37 °C for the indicated time periods. At the respective time points, samples were taken for RNA-seq, western blotting and GFP fluorescence measurements. To measure the GFP intensity of reporter strains, bacteria were grown in LB in the presence of Amp and Cm until an OD of 2.0 was reached. Salmonella cells corresponding to 1 OD were pelleted and fixed with 4% PFA. GFP fluorescence intensity was quantified for 100,000 events per sample by flow cytometry with the FACSCalibur instrument (BD Biosciences).
Data were analysed using the Cyflogic software (CyFlo). To monitor SPI-2 activation in real time, a transcriptional gfp reporter was constructed by inserting the SPI-2-dependent ssaG promoter into plasmid pAS0093 via AatII/NheI sites as previously described8. The resulting plasmid pYC104 was co-transformed with either the pBAD-ctrl. or pBAD-PinT plasmid into the indicated strain backgrounds. The resulting strains were grown overnight in LB (+Amp +Cm), then diluted 1:100 and further grown in the same medium to an OD of 2.0. A volume of 1 ml of the culture was pelleted and the collected cells were shifted to SPI-2 medium28 (defined as t0) as described above, except that the growth experiment was conducted in 96-well plates (Nunc Microwell 96F, Thermo Scientific). After measuring the OD and GFP intensity at t0, l-arabinose was added to each well to a final concentration of 0.2% for sRNA induction, and bacteria were grown for 20 h at 37 °C (with shaking), with OD and GFP fluorescence measured at 10 min intervals using the Infinite F200 PRO plate reader (Tecan). HeLa-S3 cells were infected with wild-type Salmonella, ΔpinT or pinT+ mutant strains at an m.o.i. of 5 as described above. Culture supernatant samples were taken at 20 h p.i. and analysed using the ELISA kit for human CXCL8/IL-8 (R&D Systems). Code availability. To document the details and parameters of the (dual) RNA-seq data analyses and to make the computational approaches reproducible for others, we implemented the workflows as Unix shell scripts. These scripts are deposited at Zenodo (DOI: 10.5281/zenodo.34695, https://zenodo.org/record/34695). Please refer to Supplementary Table 1 for descriptions of the analyses. For all RNA-seq experiments listed in Supplementary Table 1, Illumina reads in FASTQ format were trimmed with a Phred quality score cut-off of 20 using the program fastq_quality_trimmer from the FASTX toolkit version 0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/).
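The 3′ quality-trimming step can be illustrated with a short sketch that removes trailing bases whose Phred score is below 20 and discards reads that end up shorter than 20 nt (the minimum length that the workflow applies after adaptor and poly(A) trimming). It assumes Sanger/Illumina 1.8+ quality encoding (ASCII offset 33) and is meant to mirror the idea behind fastq_quality_trimmer, not to reproduce its exact implementation or command-line options.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, cutoff: int = 20,
                min_length: int = 20) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from the 3' end; return None if the read becomes too short."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < cutoff:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], quality[:end]
```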
Reads shorter than 20 nt after adaptor and poly(A) trimming were discarded before mapping. The reads were aligned in parallel to the Salmonella enterica SL1344 genome (NCBI RefSeq accession numbers: NC_016810.1, NC_017718.1, NC_017719.1, NC_017720.1) and, where applicable, to the human (hg19/GRCh37; retrieved from the 1000 Genomes Project63), mouse (GENCODE M2, GRCm38.p2) or porcine (ENSEMBL, Sscrofa10.2) genome sequence. Mapping was performed using the READemption pipeline (version 0.3.5)64 and the short read mapper segemehl with its remapper lack (version 0.2.0)65, allowing for split reads66. Mapped reads with an alignment accuracy <90% as well as cross-mapped reads, that is, reads that could be aligned equally well to both host and Salmonella reference sequences, were discarded. The resulting data were used for visualization (see, for example, Fig. 1b and Extended Data Fig. 2b). Reads of the high-resolution time-course experiment (cDNA library numbers 27–77 in Supplementary Table 1) that were detected as cross-mapped by READemption (see above) were further inspected: their median percentage over the entire time course was 0.25%, with increased fractions at the later time points, implying that these reads are mainly contributed by Salmonella cells. We observed that the majority of the cross-mapped reads aligned to Salmonella rRNA or tRNA loci, whereas no gene class preference was observed on the human side (data not shown). For dual RNA-seq experiments (cDNA libraries 1–184 and 215–256 in Supplementary Table 1), differential expression analysis after mapping was carried out separately for the host and the pathogen. Strand-specific gene-wise quantifications for each data subset were performed by READemption64. Host transcript expression analyses are based on annotations from GENCODE (version 19)67, NONCODE (version 4)68 and miRBase (version 20)69 after removal of redundant entries.
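The two read filters above (alignment accuracy <90% and cross-mapping between host and pathogen) can be expressed as a per-read decision. The sketch below is illustrative only: the `Hit` record and its fields are hypothetical, and "aligned equally well" is assumed to mean an identical best alignment score in both organisms, which may differ from READemption's exact criterion.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    organism: str    # hypothetical label, e.g. "salmonella" or "human"
    score: int       # hypothetical alignment score reported by the mapper
    accuracy: float  # alignment accuracy in percent

def classify_read(hits, min_accuracy=90.0):
    """Return the organism a read is assigned to, or None if it is discarded."""
    good = [h for h in hits if h.accuracy >= min_accuracy]
    if not good:
        return None  # unmapped, or every alignment is below 90% accuracy
    best = max(h.score for h in good)
    organisms = {h.organism for h in good if h.score == best}
    if len(organisms) > 1:
        return None  # cross-mapped: equally good in host and pathogen
    return organisms.pop()

print(classify_read([Hit("salmonella", 50, 98.0), Hit("human", 45, 95.0)]))
```

Multi-mapping within a single organism is not resolved here; in the workflow described, multi-mapped reads are removed later, before expression analysis.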
The annotation for Salmonella genes was retrieved from NCBI (under the above-mentioned accession numbers) and manually extended with small RNA annotations4, 70. In both organisms, multi-mapped reads were removed and only uniquely mapped reads were considered for the expression analysis. Differential gene expression analyses were performed with the edgeR package (version 3.10.2)71 using upper-quartile normalization and a prior count of 1. Where needed (that is, to correct for batch effects in the comparisons between wild-type and mutant infections displayed in Figs 3 and 4 and Extended Data Figs 5 and 7–9), sequencing data were further normalized using the RUVs correction method72 with k = 3. For this purpose, samples were treated time-point-wise to remove unwanted nuisance factors. At each time point, the covariate of interest was the pinT status of the infecting bacterium; this covariate is constant within replicate blocks, which were used for the RUVs correction. Host or bacterial genes with at least 10 uniquely mapped reads in three replicates were considered detected. Genes with an adjusted P value < 0.05 were considered differentially expressed. Differential expression analysis for conventional (bacteria-only) RNA-seq experiments (cDNA library numbers 185–214 in Supplementary Table 1) was done similarly, except that a cut-off of ≥50 uniquely mapped reads was used as the detection threshold. From the resulting BAM files, strand-specific coverage files in wiggle format were generated by READemption64 and split by organism. In each case, coverage files are based on uniquely mapped reads and normalized by the total number of uniquely aligned reads per organism. For Fig. 4e, wiggle files were visualized using the Integrated Genome Browser (version 8.4.4)73.
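The detection threshold and upper-quartile normalization can be sketched as follows. This is a simplified re-implementation of the idea behind edgeR's upper-quartile method (75th percentile of count/library-size ratios after dropping all-zero genes, rescaled to a geometric mean of 1), not edgeR's code; the prior count and model fitting are omitted. Reading "at least 10 reads in three replicates" as "in at least three libraries" is an assumption.

```python
import math

def quantile7(values, p):
    """Sample quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def upper_quartile_factors(counts):
    """counts: dict gene -> list of counts, one entry per library."""
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n_libs)]
    kept = [c for c in counts.values() if any(c)]  # drop all-zero genes
    raw = [quantile7([c[j] / lib_sizes[j] for c in kept], 0.75)
           for j in range(n_libs)]
    geo_mean = math.exp(sum(math.log(f) for f in raw) / n_libs)
    return [f / geo_mean for f in raw]

def detected(gene_counts, min_reads=10, min_libraries=3):
    """Gene is 'detected' if >= min_reads in at least min_libraries libraries."""
    return sum(c >= min_reads for c in gene_counts) >= min_libraries

print(detected([12, 15, 10, 3]), detected([12, 9, 10, 3]))
```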
A database of pathways, regulons and genomic islands was constructed using information obtained from the KEGG database74 (organism code sey), the SL1344 genome annotation70 and relevant literature sources (see Supplementary Table 1). Pearson correlation coefficients between changes in PinT expression and changes in the expression of each gene within each regulon over the time course of wild-type Salmonella infection (cDNA library numbers 27, 30, 33, 36, 39, 42, 44, 47, 50, 53, 56, 59, 61, 64, 67, 70, 73, 76 in Supplementary Table 1) were plotted in Fig. 2d. To assess enrichment of differentially expressed transcripts in pathways in the comparative infection experiments (cDNA library numbers 27–77 and 152–184 in Supplementary Table 1) and the in vitro assay (cDNA library numbers 185–202 in Supplementary Table 1), gene set enrichment analysis (GSEA; version 2.1.0) was run on the log fold changes reported by edgeR. GSEA was performed in ranked-list mode with the classic enrichment statistic, and gene sets containing fewer than 15 or more than 100 entries were excluded. Extended Data Fig. 5a reports all pathways significant at an FDR-corrected P value ≤ 0.05 in at least one time point. Host pathway enrichment analyses were performed in the same way as the bacterial analyses, using GSEA on human pathways available in the KEGG database (downloaded January 22, 2014) with the settings described above. Pathways with an adjusted P value ≤ 0.05 were considered significantly modulated. The visualization in Extended Data Fig. 8a was produced using the Bioconductor package Pathview75. Genes displayed in Fig. 1d, that is, genes whose transcription is known or predicted to be regulated by binding of nuclear factor κB (NF-κB) to their promoters, or genes whose products have been shown to promote an NF-κB response, were retrieved from the GeneCards76 and Boston University Biology (http://www.bu.edu/nf-kb/gene-resources/target-gene) databases or from refs 77, 78.
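The "classic" GSEA statistic is an unweighted running sum: walking down the list of genes ranked by log fold change, the sum increases by 1/N_hit at each gene-set member and decreases by 1/(N − N_hit) otherwise, and the enrichment score is the maximum deviation from zero. The sketch below illustrates that statistic and the 15–100 size filter (applied to genes present in the ranked list); it is not the GSEA software, and permutation-based significance and FDR estimation are omitted.

```python
def classic_enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.
    ranked_genes: gene IDs sorted by decreasing log fold change."""
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranked list")
    step_up, step_down = 1.0 / n_hit, 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += step_up if is_hit else -step_down
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero (signed)
    return best

def size_ok(gene_set, ranked_genes, min_size=15, max_size=100):
    """Size filter based on the members that occur in the ranked list."""
    overlap = len(set(gene_set) & set(ranked_genes))
    return min_size <= overlap <= max_size

ranked = [f"g{i}" for i in range(1, 11)]
print(classic_enrichment_score(ranked, {"g1", "g2"}))
```

A positive score indicates that the set is concentrated among the most up-regulated genes, a negative score among the most down-regulated ones.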
STAT3 target genes denoted in Fig. 4b were retrieved from ref. 79. We used Cufflinks/Cuffdiff (version 2.2.1)80, 81 to test for differentially expressed isoforms in the high-resolution, comparative dual RNA-seq time-course data set (cDNA library numbers 27–77 in Supplementary Table 1). First, Cufflinks was used to quantify transcript isoforms in the mapped read data; all transcript annotations were then merged using Cuffmerge, and differentially expressed isoforms were called using Cuffdiff. To identify bacterial and human genes with similar expression kinetics across the time course of HeLa-S3 cell infection (cDNA library numbers 27–77 in Supplementary Table 1), we used RUVs-corrected, abundance-filtered and normalized read counts (see above). Absolute counts were then transformed into standard z-scores for each gene over all considered samples: for each gene, the z-score was calculated as the absolute read count minus the mean read count over all samples, divided by the standard deviation of all counts over all samples. Genes with a standard deviation <2 were excluded from further analysis. Pearson correlation coefficients were calculated between all remaining bacterial genes and all remaining human genes, and P values were calculated using the R function cor.test. To account for a possible delay between Salmonella expression changes and their effects in the host cell, a time shift was allowed: the expression of Salmonella genes at each time point was compared with host expression at the subsequent time point. Human genes were considered to be correlated with a bacterial gene if they had a P value of less than 10^−4 and a Pearson's r greater than 0.65.
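The z-scoring and time-shifted correlation described above can be sketched as below. Assumptions: each gene is represented by one value per time point, the sample standard deviation is used, and the P < 10^−4 criterion (computed with cor.test in the original analysis) is omitted, so only the r > 0.65 threshold is applied. Gene names in the usage line are placeholders.

```python
import statistics

def z_scores(counts):
    """Per-gene z-scores over all samples (sample standard deviation assumed)."""
    mu, sd = statistics.mean(counts), statistics.stdev(counts)
    return [(c - mu) / sd for c in counts]

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treat as none
    return sxy / (sxx * syy) ** 0.5

def _filtered_z(data, sd_min):
    """Drop genes whose raw counts have a standard deviation below sd_min."""
    return {g: z_scores(v) for g, v in data.items() if statistics.stdev(v) >= sd_min}

def shifted_pairs(bacterial, host, r_min=0.65, sd_min=2.0):
    """Compare bacterial expression at t_i with host expression at t_(i+1)."""
    bac, hst = _filtered_z(bacterial, sd_min), _filtered_z(host, sd_min)
    pairs = []
    for b, bv in bac.items():
        for h, hv in hst.items():
            r = pearson(bv[:-1], hv[1:])  # one-step time shift
            if r > r_min:
                pairs.append((b, h, r))
    return pairs

print(shifted_pairs({"sseC": [0, 10, 20, 30, 40]},
                    {"hostA": [5, 5, 15, 25, 35], "hostB": [1, 1, 1, 1, 1]}))
```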
This resulted in a total of 751 clusters of human genes showing correlation in expression with a bacterial gene, approximately half of which (see Supplementary Table 1) had at least one enriched GO term associated with them (adjusted P value < 0.05) as tested using the software tool Ontologizer 2.0 (build: 20100310-351)82 with the gene ontology definition obtained from the Gene Ontology Consortium (data-version: releases/2015-09-26) and the Universal Protein Resource (UniProt) gene annotation (generated: 2015-09-14). To account for the possibility that multiple bacterial genes might be associated with a human gene cluster, an all-against-all correlation analysis of bacterial genes was performed as described above, except that no time shift was allowed. For this, we focused on seventeen gene clusters that were built on bacterial genes encoding secretion-associated gene products (according to UniProt; see Supplementary Table 1). Detailed inspection of these clusters revealed the cluster depicted in Fig. 4b (centred on the bacterial SPI-2 gene sseC), which contained many further (bacterial and human) genes with pronounced PinT-dependent expression changes, that is, genes that were differentially expressed between wild-type and ΔpinT infection at several time points p.i. In all RNA-seq-based analyses, transcripts whose expression changes had an adjusted P value < 0.05 (as reported by edgeR) were considered significantly differentially expressed. For Fig. 3b, a Monte Carlo permutation test was performed on the median fold change of genes in the SPI-2 regulon, using 10^5 randomly selected gene sets of the same size. This test indicated significant de-repression (P < 0.05) of the SPI-2 regulon in the absence of PinT at 2 and 8 h after infection of HeLa cells, at 2, 6 and 16 h after infection of 3D4/31 cells, and in the in vitro assay. Increased host cell death in Extended Data Fig. 1a was evaluated using a one-tailed Student's t-test; P values ≤ 0.05 (*) were considered significant and P values ≤ 0.001 (***) very significant. The significance of gene activation in the qRT–PCR results in Fig. 4c and Extended Data Figs 5b, c and 7c, d, and in the ELISA assay in Extended Data Fig. 7e, was assessed using a one-tailed Mann–Whitney U-test. The significance of differences in intracellular replication between the ΔpinT strain and wild-type Salmonella (Extended Data Fig. 4d) was evaluated using a two-tailed Mann–Whitney U-test.
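The Monte Carlo permutation test on the SPI-2 regulon compares the median fold change of the regulon with the medians of many random gene sets of the same size. The sketch below is one-sided (testing for de-repression, i.e. higher median fold change) and uses the common add-one correction, (k + 1)/(n + 1); both choices, and the default of 100,000 iterations (our reading of the number of random sets), are assumptions rather than details confirmed by the text.

```python
import random
import statistics

def permutation_p_value(fold_changes, regulon_genes, n_iter=100_000, seed=0):
    """One-sided Monte Carlo test: is the regulon's median fold change higher
    than that of random gene sets of the same size?"""
    rng = random.Random(seed)
    genes = list(fold_changes)
    observed = statistics.median(fold_changes[g] for g in regulon_genes)
    k = len(regulon_genes)
    as_extreme = 0
    for _ in range(n_iter):
        sample = rng.sample(genes, k)  # random set of the same size
        if statistics.median(fold_changes[g] for g in sample) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_iter + 1)

# Toy data: 200 unchanged genes and a 10-gene "regulon" with log2FC = 3.
fc = {f"g{i}": 0.0 for i in range(200)}
fc.update({f"ssa{i}": 3.0 for i in range(10)})
print(permutation_p_value(fc, [f"ssa{i}" for i in range(10)], n_iter=2000))
```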


cDNAs encoding K63-Super-UIM (wild type and mutant) and the Vps27-based K63 binder18 containing C-terminal His tags were produced as synthetic genes (Eurofins) and inserted into pDONR221 by BP reactions (Invitrogen). By means of LR reactions (Invitrogen), the inserts were then transferred to the Champion pET104 BioEase Gateway Biotinylation System (Invitrogen) for recombinant protein production or to pcDNA-DEST53 (Invitrogen) for constitutive expression of GFP-tagged proteins in mammalian cells. For inducible expression of GFP-tagged K63-Super-UIM, the GFP-K63-Super-UIM complementary DNA was inserted into pcDNA4/TO (Invitrogen). Plasmids encoding HA-tagged wild-type and catalytically inactive (CI) RNF8 (C403S), wild-type and CI (C16S/C19S) forms of RNF168, and UBC13, as well as chimaeras between RNF8 and different E2 enzymes, were described previously1, 4, 16. The *FHA mutation (R42A) in HA–RNF8ΔR–UBC13 was generated by site-directed mutagenesis. RNF8 constructs were made resistant to RNF8 siRNA by introducing three silent mutations into the siRNA target sequence in the plasmids by site-directed mutagenesis (mutated sequence 5′-TGCGGAGTATGAGTACGAG-3′; the mutations are at positions 13, 16 and 19). The RNF168 UDM1 (amino acids 110–201) and UDM2 (amino acids 419–487) fragments were amplified by PCR and inserted into either pTriEx-5 (Novagen) for Strep- and His-tagged expression in Escherichia coli and mammalian cells, or pEGFP-C1 (Clontech) for expression of GFP-tagged versions. The Strep–RNF168 UDM1 mutants used in this study (*UMI (L149A) and *MIU1 (A179G)) were generated using the QuikChange site-directed mutagenesis kit (Stratagene). Constructs encoding GFP–H1 isoforms were cloned by inserting the respective cDNAs into the BglII and BamHI sites of pEGFP-C1 (Clontech). A plasmid encoding HMGB1–GFP was provided by M. Bianchi. A Flag–HMGB1 expression construct was generated by inserting the HMGB1 open reading frame (ORF) into pFlag–CMV2 (Sigma). All constructs were verified by sequencing.
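Whether the mutations in the siRNA-resistant RNF8 sequence are silent can be checked by aligning it with the siRNA target (the RNF8 siRNA sequence listed in the transfection paragraph, with U written as T) and translating both in each reading frame. The sketch below builds the standard genetic code and reports the frames in which both sequences encode the same stop-free peptide; the reading frame is therefore inferred rather than assumed.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: first base selects a block of 16, second of 4, third of 1.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna, frame):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    usable = dna[frame:]
    usable = usable[: len(usable) - len(usable) % 3]
    return "".join(CODON_TABLE[usable[i:i + 3]] for i in range(0, len(usable), 3))

def mismatch_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

def silent_frames(original, mutant):
    """Frames in which both sequences encode the same stop-free peptide."""
    frames = []
    for f in range(3):
        p1, p2 = translate(original, f), translate(mutant, f)
        if p1 == p2 and "*" not in p1:
            frames.append(f)
    return frames

target = "TGCGGAGTATGAATATGAA"  # RNF8 siRNA target (DNA form)
rescue = "TGCGGAGTATGAGTACGAG"  # siRNA-resistant sequence
print(mismatch_positions(target, rescue))  # -> [13, 16, 19]
print(silent_frames(target, rescue))       # -> [1]
```

In frame 1 both sequences translate to AEYEYE, so each change falls on a third (wobble) codon position.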
Plasmid transfections were done with FuGene 6 (Promega) or GeneJuice (Novagen), and siRNA transfections with Lipofectamine RNAiMAX (Invitrogen), according to the manufacturers' instructions. siRNA sequences used in this study were as follows. Non-targeting control (CTRL), 5′-GGGAUACCUAGACGUUCUATT-3′; UBC13, 5′-GAGCAUGGACUAGGCUAUATT-3′; RNF8, 5′-UGCGGAGUAUGAAUAUGAATT-3′; RNF168, 5′-GUGGAACUGUGGACGAUAATT-3′ or 5′-GGCGAAGAGCGAUGGAAGATT-3′; histone H1(#1), 5′-GCUACGACGUGGAGAAGAATT-3′; H1(#2), 5′-GCUCCUUUAAACUCAACAATT-3′; H1(#3), 5′-GAAGCCAAGCCCAAGGUUATT-3′; H1(#4), 5′-CCUUUAAACUCAACAAGAATT-3′; H1(#5), 5′-CCUUCAAACUCAACAAGAATT-3′; H1(#6), 5′-UCAAGAGCCUGGUGAGCAATT-3′; H1(#7), 5′-GGACCAAGAAAGUGGCCAATT-3′; H1(#8), 5′-GCAUCAAGCUGGGUCUCAATT-3′; H1(#9), 5′-CAGUGAAACCCAAAGCAAATT-3′; H1(#10) (specific for H1x), 5′-CCUUCAAGCUCAACCGCAATT-3′; 53BP1, 5′-GAACGAGGAGACGGUAAUATT-3′; USP7, 5′-GGCGAAGUUUUAAAUGUAUTT-3′; and USP9x, 5′-GCAGUGAGUGGCUGGAAGUTT-3′. Human U2OS, HCT116 and RPE1 cells were obtained from ATCC. U2OS and HCT116 cells were cultured in DMEM containing 10% FBS and 1× penicillin–streptomycin, while RPE1 cells were grown in a 1:1 mixture of Ham's F12 and DMEM supplemented with 10% FBS and 1× penicillin–streptomycin. Serum starvation of RPE1 cells was done by incubating cells for 24 h in medium supplemented with 0.25% FBS. An HCT116 UBC13-knockout cell line was generated using CRISPR–Cas9 technology14, 15. A donor plasmid bearing a splice acceptor site and a puromycin resistance marker, flanked by homology arms, was co-transfected with pX300 (ref. 14) targeting the GGCGCGCGGGAATCGCGGCG sequence within the first intron of the UBC13 gene. To generate cell lines capable of doxycycline-induced expression of GFP-tagged K63-Super-UIM, U2OS cells were transfected with GFP–K63-Super-UIM plasmid and pcDNA6/TR, and positive clones were selected with Zeocin (Invitrogen) and Blasticidin S (Invitrogen).
Stable U2OS cell lines expressing RNF8 or RNF168 shRNA in a doxycycline-inducible manner, or expressing Strep–HA–ubiquitin, were described previously1, 4, 31. All cell lines were regularly tested for mycoplasma infection. Unless otherwise indicated, cells were exposed to DSBs using IR (4 Gy for microscopy experiments and 10 Gy for biochemical analyses) or laser micro-irradiation (as described previously32), and collected 1 h later. Purified biotinylated K63-Super-UIM wild-type and mutant proteins containing an N-terminal biotinylated BioEase tag and a C-terminal His-tag were obtained by expressing the proteins in an E. coli strain that expresses the BirA biotin ligase. Bacteria were grown in LB medium containing 0.5 mM biotin, induced with 0.25 mM isopropyl-β-d-thiogalactoside (IPTG) for 3 h at 30 °C and then lysed by French press. The K63-Super-UIM constructs were purified using immobilized metal affinity chromatography (IMAC) followed by size-exclusion chromatography (SEC). Purity and complete biotinylation of the proteins were verified by mass spectrometry. Recombinant Strep–His–RNF168 UDM-1/2 was produced in Rosetta2(DE3)pLacI (Novagen) bacteria induced with 0.5 mM IPTG for 3 h at 30 °C and lysed using BugBuster (Novagen) supplemented with Protease Inhibitor Cocktail without EDTA (Roche). The proteins were purified on Ni2+-NTA-agarose (Qiagen). Recombinant human UBA1, UBCH5c, UBC13, MMS2, RNF8 and ubiquitin used for in vitro ubiquitylation assays were purified as described8.
Antibodies used in this study included: UBC13 (#4919, Cell Signaling), MCM6 (sc-9843, Santa Cruz), 53BP1 (sc-22760, Santa Cruz), γ-H2A.X (05-636, Millipore; or 2577, Cell Signaling), H2A.X (2595, Cell Signaling), MDC1 (ab11171, Abcam), conjugated ubiquitin (FK2) (BML-PW8810-0500, Enzo Life Sciences), HA (11867423991, Roche; and sc-7392, Santa Cruz), Myc (sc-40, Santa Cruz), His (631212, Clontech), GFP (sc-9996, Santa Cruz; 11814460001, Roche), ubiquitin (sc-8017, Santa Cruz), histone H1.2 (ab17677, Abcam), histone H1x (A304-604A, Bethyl Labs), histone H1 (pan, #AE-4 clone) (ab7789, Abcam), histone H2A (07-146, Millipore), histone H2B (ab1790, Abcam), histone H3 (ab1791, Abcam), histone H4 (ab7311, Abcam), cyclin A (sc-751, Santa Cruz), actin (MAB1501, Millipore), BRCA1 (sc-6954, Santa Cruz) and RNF168 for immunofluorescence (06-1130, Millipore). The antibody to RNF168 used for immunoblots (ref. 5) was a gift from D. Durocher. The antibody to RNF8 has been described previously1. For pull-down of K63-ubiquitylated proteins, cells were lysed in high-stringency buffer (50 mM Tris, pH 7.5; 500 mM NaCl; 5 mM EDTA; 1% NP40; 1 mM dithiothreitol (DTT); 0.1% SDS) containing 1.25 mg ml−1 N-ethylmaleimide, 50 μM DUB inhibitor PR619 (LifeSensors) and protease inhibitor cocktail (Roche). Recombinant biotinylated K63-Super-UIM (25 μg ml−1) was added immediately upon lysis, followed by sonication and centrifugation. Streptavidin M-280 Dynabeads (Invitrogen) were added to immobilize the K63-Super-UIM, and bound material was washed extensively in high-stringency buffer. A Benzonase (Sigma) and MNase (NEB) treatment step was included to remove any contaminating nucleic acids. Proteins were resolved by SDS–PAGE and analysed by immunoblotting.
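N-ethylmaleimide is specified here as a mass concentration (1.25 mg ml−1) but elsewhere in these Methods as a molar concentration (5 mM in the modified RIPA buffer). Converting between the two only requires the molar mass; the sketch assumes 125.13 g mol−1 for NEM (C6H7NO2).

```python
NEM_MOLAR_MASS = 125.13  # g/mol, N-ethylmaleimide (C6H7NO2)

def mg_per_ml_to_mM(mg_per_ml, molar_mass_g_per_mol):
    # 1 mg/ml equals 1 g/l; dividing by g/mol gives mol/l, and x1000 gives mM.
    return mg_per_ml / molar_mass_g_per_mol * 1000

def mM_to_mg_per_ml(mM, molar_mass_g_per_mol):
    return mM * molar_mass_g_per_mol / 1000

print(round(mg_per_ml_to_mM(1.25, NEM_MOLAR_MASS), 2))  # -> 9.99
print(round(mM_to_mg_per_ml(5, NEM_MOLAR_MASS), 3))     # -> 0.626
```

The pull-down buffers therefore contain roughly 10 mM NEM, about twice the 5 mM used for the SILAC lysis buffer.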
Where indicated, bound complexes were subjected to deubiquitylation by incubation with USP2cc (1 μM, Boston Biochem) in DUB buffer (50 mM HEPES, pH 7.5; 100 mM NaCl; 1 mM MnCl2; 0.01% Brij-35; 2 mM DTT) overnight at 30 °C before boiling in Laemmli sample buffer. Immunoblotting, Strep-Tactin pull-downs and chromatin enrichment were done essentially as described32. Briefly, Strep–RNF168 UDM pull-down experiments from cells were performed after lysing cells in EBC buffer (50 mM Tris, pH 7.4; 150 mM NaCl; 0.5% NP-40; 1 mM EDTA) containing 1.25 mg ml−1 NEM, 50 μM PR619 (LifeSensors) and protease inhibitor cocktail (Roche). The soluble fraction was subsequently used for precipitation with Strep-Tactin sepharose (IBA). After washing in EBC buffer, proteins were eluted and analysed by immunoblotting. To isolate Strep–HA–ubiquitin-conjugated proteins, cells were lysed in denaturing buffer (20 mM Tris, pH 7.5; 50 mM NaCl; 1 mM EDTA; 1 mM DTT; 0.5% NP-40; 0.5% sodium deoxycholate; 0.5% SDS) containing 1.25 mg ml−1 NEM, 50 μM PR619 (LifeSensors) and protease inhibitor cocktail (Roche). After sonication and centrifugation, Strep–HA–ubiquitin-conjugated proteins were immobilized on Strep-Tactin sepharose (IBA). After extensive washing in denaturing buffer, proteins were eluted and analysed by immunoblotting. For chromatin fractionation, cells were first lysed in buffer 1 (100 mM NaCl; 300 mM sucrose; 3 mM MgCl2; 10 mM PIPES, pH 6.8; 1 mM EGTA; 0.2% Triton X-100) containing protease, phosphatase and DUB inhibitors and incubated on ice for 5 min. After centrifugation, the soluble proteins were removed and the pellet was resuspended in buffer 2 (50 mM Tris-HCl, pH 7.5; 150 mM NaCl; 5 mM EDTA; 1% Triton X-100; 0.1% SDS) containing protease, phosphatase and DUB inhibitors. Lysates were then incubated for 10 min on ice and sonicated, and solubilized chromatin-enriched fractions were collected after centrifugation.
For immunofluorescence staining, cells were fixed in 4% paraformaldehyde for 15 min, permeabilized with PBS containing 0.2% Triton X-100 for 5 min, and incubated with primary antibodies diluted in DMEM for 1 h at room temperature. After staining with secondary antibodies (Alexa Fluor; Life Technologies) for 1 h, coverslips were mounted in Vectashield mounting medium (Vector Laboratories) containing the nuclear stain DAPI. Unless otherwise stated, all images of GFP–K63-Super-UIM were obtained from a stable cell line in which GFP–K63-Super-UIM expression was induced by incubation with 1 μg ml−1 doxycycline for approximately 24 h. Images were acquired with an LSM 780 confocal microscope (Carl Zeiss Microimaging) mounted on a Zeiss Axiovert 100M and equipped with a Plan-Apochromat 40×/1.3 oil immersion objective, using standard settings. Image acquisition and analysis were carried out with ZEN2010 software. For ImageJ-based image analysis, images were acquired with an AF6000 wide-field microscope (Leica Microsystems) equipped with a Plan-Apochromat 40×/0.85 CORR objective, using the same microscope settings. Fluorescence intensities of the micro-irradiated region (demarcated by γ-H2AX positivity) and the nucleus were first corrected for the general image background. Using these values, relative recruitment to DNA damage sites (relative fluorescence units (RFUs)) was calculated by normalizing the nuclear-background-corrected signal at the micro-irradiated region to that of the nuclear background. Finally, the RFU of the protein of interest was normalized to the RFU of the γ-H2AX signal and plotted as the average of biological triplicates. Fluorescence recovery after photobleaching (FRAP) was performed essentially as described33. Briefly, U2OS cells stably expressing GFP–H1 were grown in glass-bottom dishes (LabTek) in CO2-independent medium.
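The recruitment quantification (RFU) described above can be written as a short calculation. This is our reading of the text, not the authors' ImageJ procedure: both intensities are first background-subtracted, the RFU is taken as (region − nucleus)/nucleus, and the protein RFU is then divided by the γ-H2AX RFU of the same cell. The intensity values in the example are hypothetical.

```python
def relative_recruitment(region, nucleus, image_background):
    """RFU at the micro-irradiated region relative to the nuclear background."""
    region_c = region - image_background    # background-corrected stripe signal
    nucleus_c = nucleus - image_background  # background-corrected nuclear signal
    return (region_c - nucleus_c) / nucleus_c

def normalized_rfu(protein, gamma_h2ax):
    """Each argument: (region, nucleus, image_background) mean intensities."""
    return relative_recruitment(*protein) / relative_recruitment(*gamma_h2ax)

def mean_of_replicates(values):
    """Average over biological replicates (triplicates in the text)."""
    return sum(values) / len(values)

print(normalized_rfu((300, 150, 50), (500, 200, 50)))  # -> 0.75
```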
A 2-μm-wide rectangular strip spanning the entire width of the cell was bleached by excitation with the maximal intensity of a 488 nm laser line, after which 95 frames of the bleached region were acquired at 4 s intervals. Mean fluorescence intensities were processed, normalized and analysed as described33. Binding of K63-Super-UIM to di-ubiquitin (Ub2) linkages (Boston Biochem) was assessed by incubating 100 ng Ub2 with 2.5 μg K63-Super-UIM immobilized on Streptavidin M-280 Dynabeads (Invitrogen) in buffer A (50 mM Tris, pH 7.5; 10% glycerol; 400 mM NaCl; 0.5% NP40; 2 mM DTT; 0.1 mg ml−1 BSA). After extensive washing, bound complexes were resolved by SDS–PAGE and analysed by immunoblotting. Binding of RNF168 UDM1/2 to di-ubiquitin (Ub2) linkages was analysed by incubating 100 ng Ub2 with 5 μg Strep–RNF168–UDM1/2 immobilized on Strep-Tactin sepharose (IBA BioTAGnology) in buffer B (50 mM Tris, pH 8; 5% glycerol; 0.5% NP40; 2 mM DTT; 0.1 mg ml−1 BSA; 2 mM MgCl2; supplemented with 250 mM KCl for UDM1 binding and 100 mM KCl for UDM2 binding). After extensive washing, bound complexes were resolved by SDS–PAGE and analysed by immunoblotting. Where indicated, UDM1/2 binding to K63-linked Ub2 was analysed in the presence of increasing KCl concentrations (75 mM, 150 mM and 250 mM). To analyse binding of RNF168 UDM1/2 to recombinant histones, purified Strep–RNF168 UDM1/2 (10 μg) was pre-bound to Strep-Tactin sepharose in buffer C (for binding to H1.0; 50 mM Tris, pH 8; 5% glycerol; 150 mM KCl; 0.5% NP40; 2 mM DTT; 0.1 mg ml−1 BSA) or buffer D (for binding to H2A; 50 mM Tris, pH 8; 5% glycerol; 75 mM KCl; 0.05% NP40; 2 mM DTT; 0.1 mg ml−1 BSA) and incubated with 500 ng recombinant histone H1.0 or H2A (New England Biolabs). Bound complexes were washed and analysed by immunoblotting.
To analyse binding of LRM1 and LRM2 peptides to histone H1.0 or H2A, magnetic Streptavidin beads were incubated in buffer E (25 mM Tris, pH 8.5; 5% glycerol; 50 mM KCl; 0.5% TX-100; 1 mM DTT; 0.1 mg ml−1 BSA) in the absence (control) or presence of 1.5 μg purified, biotinylated RNF168 LRM1 (amino acids 110–133) or LRM2 (amino acids 463–485) peptide. Samples were then incubated with 250 ng recombinant H2A or H1.0 for 2 h at 4 °C, and immobilized complexes were washed and analysed by SDS–PAGE and Colloidal Blue staining (Invitrogen). For in vitro ubiquitylation assays, histone-H1-containing oligonucleosomes (10 μM) were purified in the presence of 55 mM iodoacetamide, essentially as described previously34, except that micrococcal nuclease digestion was stopped with 20 mM EGTA and dialysis was started right after the second homogenization, in buffer containing 50 mM Tris, pH 7.5; 150 mM NaCl; 1 mM TCEP; and 340 mM sucrose. Dialysed samples were then incubated with the DUB inhibitor ubiquitin-PA (ref. 35; 20 μM) for 20 min at room temperature. Nucleosomes were incubated with 0.5 μM human UBA1, 5 μM UBCH5c, 1 μM UBC13–MMS2 complex, 5 μM RNF8 fragment (purified as described previously8) and 75 μM ubiquitin in reaction buffer (50 mM Tris, pH 7.5; 100 mM NaCl; 3 mM ATP; 3 mM MgCl2; 1 mM TCEP) at 31 °C. Samples were analysed by immunoblotting. For SILAC experiments, U2OS or HCT116 cells were grown in medium containing unlabelled l-arginine and l-lysine (Arg0/Lys0) as the light condition, or isotope-labelled variants of l-arginine and l-lysine (Arg6/Lys4 or Arg10/Lys8) as the heavy condition36. SILAC-labelled HCT116 wild-type and UBC13-knockout cells were lysed in modified RIPA buffer (50 mM Tris-HCl, pH 7.5; 150 mM NaCl; 1% Nonidet P-40; 0.1% sodium deoxycholate; 1 mM EDTA) supplemented with protease inhibitors (complete protease inhibitor mixture tablets, Roche Diagnostics) and N-ethylmaleimide (5 mM).
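In a SILAC experiment, light and heavy forms of a peptide differ by the summed mass shifts of the labelled arginine and lysine residues it contains. The sketch below derives the shifts for the labels named in the text from isotope mass differences; the isotope compositions (13C6-Arg for Arg6, 13C6 15N4-Arg for Arg10, D4-Lys for Lys4, 13C6 15N2-Lys for Lys8) are the commonly used ones and are assumed here, not stated in the text.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da).
C13_MINUS_C12 = 1.0033548
N15_MINUS_N14 = 0.9970349
D_MINUS_H = 1.0062767

LABELS = {
    "Arg6": 6 * C13_MINUS_C12,                       # 13C6-arginine
    "Arg10": 6 * C13_MINUS_C12 + 4 * N15_MINUS_N14,  # 13C6 15N4-arginine
    "Lys4": 4 * D_MINUS_H,                           # 4,4,5,5-D4-lysine
    "Lys8": 6 * C13_MINUS_C12 + 2 * N15_MINUS_N14,   # 13C6 15N2-lysine
}

def silac_shift(peptide, arg_label, lys_label, charge=1):
    """m/z offset of the heavy peptide relative to its light (Arg0/Lys0) form."""
    delta = (peptide.count("R") * LABELS[arg_label]
             + peptide.count("K") * LABELS[lys_label])
    return delta / charge

for name, shift in LABELS.items():
    print(f"{name}: +{shift:.4f} Da")
```

Because Lys-C and trypsin cleave after lysine and arginine, most peptides carry exactly one labelled residue, which keeps heavy/light pairs at predictable spacings.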
Lysates were incubated for 10 min on ice and cleared by centrifugation at 16,000g. Equal amounts of protein from the two SILAC states were mixed and precipitated by adding five volumes of acetone and incubating at −20 °C overnight. Precipitated proteins were dissolved in denaturing buffer (6 M urea; 2 M thiourea; 10 mM HEPES, pH 8.0), reduced with DTT (1 mM) and alkylated with chloroacetamide (5.5 mM). Proteins were digested with lysyl endoproteinase C (Lys-C) for 6 h, diluted fourfold with water and digested overnight with trypsin. The digestion was stopped by addition of trifluoroacetic acid (0.5% final concentration), and samples were incubated at 4 °C for 2 h and centrifuged for 15 min at 4,000g. Peptides from the cleared solution were purified on reversed-phase Sep-Pak C18 cartridges (Waters Corporation). Diglycine-lysine modified peptides were enriched using the Ubiquitin Remnant Motif Kit (Cell Signaling Technology) according to the manufacturer's instructions. Briefly, peptides were eluted from the Sep-Pak C18 cartridges with 50% acetonitrile, which was subsequently removed by centrifugal evaporation. Peptides were incubated with 40 μl of anti-di-glycine-lysine antibody resin in immunoaffinity purification (IAP) buffer for 4 h at 4 °C. Beads were washed three times with IAP buffer and twice with water, and peptides were eluted with 0.15% trifluoroacetic acid. Eluted peptides were fractionated by microcolumn-based strong cation exchange chromatography (SCX) and cleaned on reversed-phase C18 stage-tips. SILAC-labelled cells were lysed in high-stringency RIPA buffer (50 mM Tris-HCl, pH 7.5; 500 mM NaCl; 1% Nonidet P-40; 0.1% sodium deoxycholate; 1 mM EDTA) containing 1.25 mg ml−1 N-ethylmaleimide, 50 μM DUB inhibitor PR619 (LifeSensors) and protease inhibitor cocktail (Roche). Lysates from different SILAC states were separately incubated for 10 min on ice and cleared by centrifugation at 16,000g.
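After tryptic digestion, a ubiquitylated lysine retains a di-glycine remnant (C4H6N2O2) that is recognized by the remnant antibody and adds a fixed mass to the peptide. Because trypsin does not cleave after the modified lysine, a genuine site lies inside the peptide rather than at its C terminus, which is the rationale for requiring internal di-glycine-lysines in the database search. The sketch computes the remnant mass from standard monoisotopic atomic masses and applies that positional rule; the function names are illustrative.

```python
# Monoisotopic atomic masses (Da).
ATOMIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Di-glycine remnant left on lysine after trypsin cleavage: C4H6N2O2.
GG_REMNANT = 4 * ATOMIC["C"] + 6 * ATOMIC["H"] + 2 * ATOMIC["N"] + 2 * ATOMIC["O"]

def internal_digly_sites(peptide, modified_positions):
    """Keep di-Gly lysine positions (0-based) that are not the C-terminal residue."""
    sites = []
    for pos in modified_positions:
        if peptide[pos] != "K":
            raise ValueError(f"position {pos} is not a lysine")
        if pos < len(peptide) - 1:
            sites.append(pos)
    return sites

print(f"{GG_REMNANT:.4f}")                          # -> 114.0429
print(internal_digly_sites("AGKLLEGK", [2, 7]))     # -> [2]
```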
Extracts (5 mg) were incubated for 4 h at 4 °C with K63-Super-UIM immobilized on Streptavidin beads (approximately 5 μg K63-Super-UIM per experiment). Beads were washed three times with high-stringency RIPA buffer, beads from the different SILAC conditions were mixed, and proteins were eluted with SDS sample buffer, reduced with DTT (10 mM) for 10 min at 70 °C and alkylated with chloroacetamide (5.5 mM) for 60 min at 25 °C. Proteins were separated by SDS–PAGE on a 4–12% gradient gel and visualized with colloidal blue stain. Gel lanes were sliced into six pieces, and proteins were digested in-gel using standard methods37. Peptides were analysed on a quadrupole Orbitrap mass spectrometer (Q Exactive, Thermo Scientific) equipped with a nanoflow HPLC system (Thermo Scientific). Peptide samples were loaded onto C18 reversed-phase columns and eluted with a linear gradient (1–2 h for in-gel samples and 3–4 h for di-glycine-lysine enriched samples) from 8 to 40% acetonitrile containing 0.5% acetic acid. The mass spectrometer was operated in data-dependent mode, automatically switching between MS and MS/MS. Survey full-scan MS spectra (m/z 300–1200) were acquired in the Orbitrap mass analyser. The 10 most intense ions were sequentially isolated and fragmented by higher-energy C-trap dissociation (HCD). Peptides with unassigned charge states, as well as peptides with a charge state below +2 (in-gel samples) or +3 (di-glycine-lysine enriched samples), were excluded from fragmentation. Fragment spectra were acquired in the Orbitrap mass analyser. Raw MS data were analysed using MaxQuant software (version 1.3.9.21). Parent ion and tandem mass spectra were searched against protein sequences from the UniProt knowledge database using the Andromeda search engine. Spectra were searched with a mass tolerance of 6 ppm in MS mode and 20 ppm in MS/MS mode, strict trypsin specificity and up to two missed cleavage sites.
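The search space defined by "strict trypsin specificity and up to two missed cleavage sites" can be illustrated with an in-silico digest. This sketch applies the classic trypsin rule (cleave after K or R, but not before P), which is an assumption about how the search engine interprets the setting; the default minimum peptide length of 7 is likewise an assumed value.

```python
import re

def tryptic_peptides(protein, max_missed=2, min_length=7):
    """In-silico digest allowing up to `max_missed` missed cleavages."""
    # Zero-width split after K/R unless the next residue is P (Python 3.7+).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Join up to max_missed + 1 consecutive fragments.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides

print(sorted(tryptic_peptides("MAGKLLRPEEKVD", max_missed=0, min_length=1)))
```

Here the R in "LLRPEEK" is not a cleavage site because it is followed by proline.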
Cysteine carbamidomethylation was searched as a fixed modification, whereas amino-terminal protein acetylation, methionine oxidation, N-ethylmaleimide modification of cysteines and di-glycine-lysine were searched as variable modifications. Di-glycine-lysines were required to be located internally in the peptide sequence. Site localization probabilities were determined using MaxQuant (PTM scoring algorithm) as described previously38. A false discovery rate of less than 1% was achieved using the target-decoy search strategy39 and a posterior error probability filter. Information about previously known protein–protein interactions among putative UBC13-dependent K63-Super-UIM interacting proteins was extracted from the HIPPIE database40 (version 1.6), and interactions were visualized in Cytoscape41. The Gene Ontology (GO) biological process term analysis for UBC13-dependent K63-Super-UIM interacting proteins was restricted to categories annotated with at least 20 and at most 300 genes. Redundant GO terms (those with less than 30% unique positive-scoring genes compared with a more significant GO term) were removed, and the five most significant remaining GO term categories (Fisher's exact test) are depicted. To assess the variation in the quantification of ubiquitin linkage types, an F-test was performed and the P values were adjusted using the Bonferroni method. Significant differences in variance were detected between K48 and K11 and between K48 and K6 ubiquitin linkages. To test the significance of differences between the SILAC ratios measured for ubiquitin linkage types, the Welch two-sample t-test was performed and the resulting P values were adjusted using the Bonferroni method.
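The linkage-type statistics combine a variance comparison (F-test), a Welch two-sample t-test and Bonferroni adjustment. The sketch computes the F ratio, the Welch t statistic with Welch–Satterthwaite degrees of freedom, and Bonferroni-adjusted P values using only the standard library; converting statistics to P values needs the F and t distributions, for example via scipy.stats.ttest_ind(a, b, equal_var=False) for the Welch test, which is not reproduced here.

```python
import statistics

def f_ratio(a, b):
    """Variance ratio used by the two-sample F-test."""
    return statistics.variance(a) / statistics.variance(b)

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical SILAC log-ratios for two linkage types.
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(bonferroni([0.01, 0.2, 0.6]))
```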
