American Type Culture Collection
News Article | May 17, 2017
Mice were maintained and animal experiments performed according to practices prescribed by the National Institutes of Health at Stanford’s Research Animal Facility (protocol 13565) and by the Institutional Animal Care and Use Committee at OncoMed Pharmaceuticals. Additional accreditation of Stanford and OncoMed Pharmaceuticals animal research facilities was provided by the Association for Assessment and Accreditation of Laboratory Animal Care. Animal experiments were performed unblinded except for allograft and patient-derived xenograft tumour growth measurements, which were performed blinded. Immunostaining of sections from animal experiments was performed blinded. The TKO SCLC mouse model bearing deletions in p53, Rb, and p130 has been described10. Mice were maintained on a mixed genetic background composed of C57BL/6, 129/SvJ and 129/SvOla. Endogenous Notch activity in TKO tumours was assessed through a GFP reporter expressed from the endogenous Hes1 promoter (Hes1GFP/+ allele11). We also bred the Rosa26lox-stop-lox-tdTomato (ref. 30) and Rosa26lox-stop-lox-luciferase (refs 31, 32) Cre-reporter alleles into the TKO model to label tumour cells with tdTomato and luciferase, respectively. SCLC tumours were induced in 7- to 10-week-old mice (with no discrimination by sex of mice) by intratracheal instillation of 4 × 10⁷ plaque-forming units of Adeno-CMV-Cre (Baylor College of Medicine, Houston, Texas, USA) or Adeno-CGRP-Cre (University of Iowa). Tumours were collected for analysis after around 5–7 months for Ad-CMV-Cre or 7–8 months for Ad-CGRP-Cre, unless otherwise stated. In accordance with our animal protocol, mice were euthanized when they showed difficulty breathing, regardless of time point. TKO Hes1GFP/+ mice were treated with the γ-secretase inhibitor DBZ (Selleckchem, S2711) as previously described33. 
Mice were randomized and injected intraperitoneally once per day with 30 μmol per kg (body weight) of DBZ (or DMSO control) for 5 days, and tumours were collected on day 6 for flow cytometry or fixed for histological analyses. TKO or TKO Hes1GFP/+ mice bearing tumours were randomized for treatment. To assess acute responses, mice were treated with cisplatin (7.5 mg per kg (body weight), Teva) on day 1, and a combination of cisplatin and etoposide (15 mg per kg (body weight), Novaplus) on days 2 and 4. Lungs were fixed for histological analyses a few hours after the last injection. For longer-term chemotherapy experiments, because we observed high toxicity with etoposide administration, TKO Rosa26LSL-luciferase mice were treated weekly for 3 weeks with saline or 5 mg per kg (body weight) cisplatin only. For subcutaneous tumour growth of GFPneg or GFPhigh cells, 2,000 cells were FACS-sorted and implanted subcutaneously on the lower left and right quadrants of 8- to 10-week-old immunocompromised NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice (no selection for sex of mice). Mice were euthanized and tumours were collected after approximately 2 months. The tumours did not exceed the 1.75 cm diameter limit permitted by our animal protocol. For the human patient-derived xenograft and TKO allograft tumour growth models, NOD.CB17-Prkdcscid/NcrCrl (NOD/SCID, Charles River Laboratories) mice were maintained under pathogen-free conditions and provided with sterile food and water ad libitum. Patient-derived xenograft models were established from patient biopsies provided by Molecular Response (San Diego, California, USA). OMP-LU66 was established at OncoMed Pharmaceuticals. 
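The weight-normalized doses above translate into per-animal amounts by simple scaling. A minimal sketch, assuming a hypothetical 25 g mouse (body weights are not reported in the text); the helper name `dose_per_mouse` and the `REGIMEN` table are illustrative only:

```python
# Hypothetical helper: scale a per-kg dose to a single animal.
# The 25 g body weight used below is an illustrative assumption, not a value from the text.

def dose_per_mouse(dose_per_kg: float, body_weight_g: float) -> float:
    """Return the amount to inject, in the same unit as dose_per_kg (mg or umol)."""
    return dose_per_kg * body_weight_g / 1000.0  # grams -> kilograms


# Per-kg doses taken from the treatment protocols described above.
REGIMEN = {
    "DBZ (umol)": 30.0,
    "cisplatin, acute (mg)": 7.5,
    "etoposide (mg)": 15.0,
    "cisplatin, weekly (mg)": 5.0,
}

weight_g = 25.0  # assumed body weight
for drug, per_kg in REGIMEN.items():
    print(f"{drug}: {dose_per_mouse(per_kg, weight_g):.4f}")
```

For a 25 g mouse this gives 0.75 μmol DBZ or 0.1875 mg cisplatin per acute injection; the injection volume would then depend on stock concentrations, which the text does not state.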
For the subcutaneous xenograft studies, 100,000 OMP-LU66 cells in 100 μl 50% Matrigel (BD Biosciences)/50% Hank’s balanced salt solution supplemented with 2% heat-inactivated fetal bovine serum and 20 mM HEPES (Life Technologies) were implanted into the left flank region of 7- to 8-week-old NOD/SCID mice (no selection for sex of mice) with a 25-gauge needle. Functional anti-Notch antibodies were discovered using a human Fab phage display library (HuCAL GOLD, MorphoSys AG34) by selection against the recombinant Notch2 extracellular domain (EGF1-12), which contains the ligand-binding site. NOD/SCID mice implanted with OMP-LU66 or TKO allografts were randomized and treated with a control antibody or tarextumab (OMP-59R5, 40 mg per kg (body weight), once every 2 weeks) as a single agent or in combination with the chemotherapy agents carboplatin (25 mg per kg (body weight), once-weekly, Teva) and irinotecan (25 mg per kg (body weight), once-weekly, Pfizer). We used carboplatin and irinotecan (instead of cisplatin and etoposide) for these longer-term studies because they are less toxic and better tolerated by the mice, and have been shown to have efficacy similar to that of cisplatin and etoposide35, 36. To avoid the side effects of total Notch pathway inhibition in vivo37, 38, we sought to reduce Notch signalling with the Notch2/3 antagonist tarextumab. After approximately four cycles, chemotherapy was discontinued and tarextumab dosing was continued until study completion. Mice with tumour volumes at or exceeding the 2,500 mm³ limit permitted by the Institutional Animal Care and Use Committee were euthanized regardless of time point. Tumours were dissected from the lungs of TKO Hes1GFP/+ mice approximately 5–7 months after tumour induction and digested as previously described39. 
The antibodies used were CD45-PE-Cy7 (eBioscience, clone 30-F11, 1:100), CD31-PE-Cy7 (eBioscience, clone 390, 1:100), TER-119-PE-Cy7 (eBioscience, clone TER-119, 1:100), CD24-APC (eBioscience, clone M1/69, 1:200), Ncam1 (Cedarlane, clone H28-123-16, 1:100), anti-rat-IgG2a-PE (eBioscience, clone r2a-21B2, 1:200), EpCam (eBioscience, clone G8.8, 1:100), and CD44-APC-Cy7 (BioLegend, clone IM7, 1:100). 7-Aminoactinomycin D (1 μg ml−1; Invitrogen) or DAPI was used to label dead cells. FACS was performed using a 100 μm nozzle on a BD FACSAria II using FACSDiva software. The sequential gating strategy is outlined in Extended Data Fig. 1d. Fluorophore compensation was performed for each experiment using either unstained cells or BD CompBeads (BD Biosciences) stained with individual fluorophore-conjugated antibodies, and compensation was calculated by FACSDiva. Data were analysed using FlowJo software and gates were set on the basis of unstained samples. TKO Hes1GFP/+ mice were injected intraperitoneally with 100 mg per kg (body weight) EdU (5-ethynyl-2′-deoxyuridine; Life Technologies) 8 h before euthanasia. GFPneg and GFPhigh tumour cells were sorted by FACS before being fixed and subjected to EdU staining using the Click-iT Plus EdU Pacific Blue flow cytometry assay kit (Life Technologies). Propidium iodide was used to stain total DNA content, and the percentage EdU incorporation of GFPneg and GFPhigh cells was analysed using a BD FACSAria II. The extracellular domain of rat Dll4 containing the affinity-enhancing mutations G28S, F107L, L206P, N118I, I143F, H194Y, and K215E (referred to as Dll4 in the manuscript) was cloned into the pAcGp67A vector and modified with a carboxy (C)-terminal 8× His tag19. Dll4 was expressed using baculovirus by infecting 1 l of Hi-Five cells (Invitrogen) from Trichoplusia ni at a density of 2 × 10⁶ cells per millilitre and harvesting cultures after 72 h. 
The cultures were centrifuged to remove the cells, and proteins were purified from supernatants by nickel and size-exclusion chromatography. The MigR1-ires-GFP (Ctrl) and MigR1-N1ICD-ires-GFP retroviral vectors were gifts from W. S. Pear (University of Pennsylvania, Philadelphia). For doxycycline-inducible expression, we cloned N1ICD into the pLIX-403 vector (a gift from D. Root, Addgene 41395). For Rest overexpression, we cloned the Nrsf(Rest) fragment from pHR′-NRSF-CITE-GFP (a gift from J. Nadeau, Addgene 21310 (ref. 40)) into the MigR1-ires-GFP or pLIX-403 vectors. Ascl1 (1: CTCCAACGACTTGAACTCTAT; 2: CCACGGTCTTTGCTTCTGTTT) and Rest (1: GTGTAATCTACAATACCATTT; 2: CCCAAGACAAAGACAAGTAAA) short hairpin RNAs (shRNAs) were obtained from the MISSION shRNA library (Sigma-Aldrich). Guide RNA (sgRNA) against Rest (CATCATCTGCACGTACACGA) was designed using the sgRNA Designer (Broad Institute) and cloned into the lentiCRISPR v2 backbone (a gift from F. Zhang, Addgene 52961 (ref. 41)). Except for 293T cells that were grown in DMEM, all cell lines were grown in RPMI-1640 medium supplemented with 10% bovine growth serum (BGS) (Fisher Scientific) and penicillin–streptomycin–glutamine (Gibco). Mouse KP1, KP2, and KP3 and human NJH29 SCLC cell lines were generated in the laboratory and have been described8, 32, 42. GFPneg and GFPhigh cell lines were isolated by FACS from individual mice. Human NCI-H82 and NCI-H889 cells were purchased from the American Type Culture Collection and authenticated by STR analysis. All cell lines tested negative for mycoplasma. Transfections and viral infections were performed as previously described39. For acute analysis of gene expression changes, RNA was isolated from GFPhigh cells FACS-sorted 48 h after transfection with MigR1-N1ICD or Rest-IRES-GFP or the empty vector control. Viral transductions of N1ICD or Rest were used to generate adherent non-NE cells from NE cells, a process taking about 1–2 weeks. 
The cells were then expanded and collected for immunoblot analyses. For isolation of Rest knockout clones, sgRNA-infected cells were selected with puromycin (2 μg ml−1) for 4 days and single cells were sorted into individual wells in 96-well plates by FACS. After 2 weeks, clones were picked and those with biallelic frameshift mutations resulting in premature truncation of the translated protein were verified by TOPO PCR cloning (Thermo Fisher Scientific) and Sanger sequencing. Tissue culture plates were coated overnight with 200 nM of purified Dll4 in PBS at 4 °C, then washed twice with PBS to remove any unbound ligand before seeding of cells. GFPhigh cell lines were maintained on Dll4-coated dishes. To assay acute responses to the loss of Notch activation, cells were kept on Dll4-coated plates or seeded on plates without Dll4 and collected 72 h later for analyses. GFPneg cell lines were maintained on non-Dll4-coated dishes unless otherwise indicated. To test for Notch ligands expressed by NE SCLC cells, mCherry-labelled NE (KP1 and KP3) cells were co-cultured with GFPhigh cell lines at a 3:1 ratio with 10 μM DBZ or DMSO control without exogenous Dll4. This ratio was based on the average proportions of GFPneg and GFPhigh cells in TKO Hes1GFP/+ tumours (27.7% GFPhigh cells ≈ 3:1 ratio). Median GFP fluorescence intensity of mCherry-negative, GFPhigh cells was quantified by flow cytometry after 72 h. For Dll4 stimulation of human cell lines (suspension), plates were coated overnight with 400 nM Dll4 in PBS at 4 °C. Plates were washed twice with PBS, coated with 0.01% poly-d-lysine (Sigma-Aldrich) for an hour at 37 °C and then washed twice with PBS before seeding of cells. For GFPneg ex vivo assays, DBZ was added at a concentration of 10 μM and tarextumab at 100 μg ml−1. Cells were analysed after 2 weeks by flow cytometry for the generation of GFPhigh cells. 
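The 3:1 figure is a rounded value: with 27.7% GFPhigh cells, the remaining 72.3% are GFPneg, giving 72.3/27.7 ≈ 2.6 GFPneg cells per GFPhigh cell. The arithmetic as a short Python sketch (the function name `ne_to_non_ne_ratio` is hypothetical):

```python
# Converts the observed fraction of GFP-high (non-NE) cells into a GFPneg:GFPhigh seeding ratio.

def ne_to_non_ne_ratio(gfp_high_fraction: float) -> float:
    """Ratio of GFP-negative to GFP-high cells for a given GFP-high fraction (0-1)."""
    if not 0.0 < gfp_high_fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return (1.0 - gfp_high_fraction) / gfp_high_fraction


ratio = ne_to_non_ne_ratio(0.277)  # 27.7% GFP-high cells, as reported in the text
print(f"GFPneg:GFPhigh = {ratio:.2f}:1, rounded to {round(ratio)}:1")

# The co-culture seeding numbers given later (12,000 NE + 4,000 GFP-high) use the same 3:1 design.
print(12_000 / 4_000)
```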
Fifty thousand GFPneg, GFPhigh or bulk tumour cells (mixture of GFPneg and GFPhigh) were sorted from TKO Hes1GFP/+ tumours, resuspended in 100 μl of modified DMEM/F12 medium containing 50% Matrigel as previously described43 and then layered with 200 μl of medium. Overall survival was assayed 1 week later by incubating with AlamarBlue (Thermo Fisher Scientific) for 4 h. Supernatant was removed and fluorescence of the Matrigel layer was read by a fluorescence plate reader (excitation 560 nm, emission 590 nm). For immunostaining, the Matrigel layer was fixed overnight with 10% formalin in PBS then washed twice with PBS before being embedded in histogel and subjected to processing for paraffin embedding. For co-culture cell growth assays, NE mouse SCLC cells (KP1, KP2) were labelled with firefly luciferase and enhanced GFP by lentiviral infection. These cells were then mixed with GFPhigh cells at a 3:1 ratio (12,000 NE cells + 4,000 GFPhigh cells) in 96-well white bottom plates. Luciferase activity was assayed 72 h later by the Steady-Glo luciferase assay system (Promega) according to the manufacturer’s protocol. For conditioned medium assays, 0.5 × 10⁶ GFPhigh cells were seeded overnight in 6-cm dishes. The medium was then changed and conditioned medium collected after 24 h. Twelve thousand NE cells per well of a 96-well plate were resuspended in conditioned medium and luciferase activity was assayed 72 h later. Conditioned medium from NE cells was used as the control, although in preliminary experiments we did not notice any difference in luciferase activity between NE-conditioned medium and regular medium. For co-culture EdU assays, unlabelled KP1 and KP2 were co-cultured with GFPhigh cells at a 3:1 ratio (150,000 NE cells + 50,000 GFPhigh cells) in 12-well plates for 72 h and then incubated with 10 μM EdU (Life Technologies) for 3 h. 
Both floating and adherent populations were collected and subject to EdU staining using a Click-iT Plus EdU Pacific Blue flow cytometry assay kit (Life Technologies). Twenty thousand NE or 4,000 GFPhigh cells were seeded per well of a 96-well plate in RPMI medium with 2% BGS. One microlitre of drug solution was added per well the next day at the appropriate concentration and cell viability was assayed 48 h later by the MTT assay (Roche). Twenty thousand NE cells were seeded per well of a 96-well plate in RPMI medium with 2% BGS in the presence of the recombinant proteins. Cell viability was assayed after 72 h by the AlamarBlue assay. The following recombinant proteins were used: Midkine (OriGene TP723299, 50 ng ml−1), Betacellulin (BioLegend 551302, 5 ng ml−1), Gdf15 (MyBioSource MBS205834, 25 ng ml−1), Bmp4 (BioLegend 595301, 50 ng ml−1), Ephrin A1 (BioLegend 755002, 50 ng ml−1), SCF (BioLegend 579702, 50 ng ml−1), and Fstl1 (R&D Systems 1738-FN-050, 200 ng ml−1). One and a half million NE cells or 0.5 × 10⁶ GFPhigh cells were seeded per well of a 12-well plate in RPMI medium with 2% BGS. Supernatant was collected after 24 h, centrifuged at 1,500 r.p.m. for 10 min and assessed for the presence of midkine by an ELISA (LifeSpan Biosciences, LS-F5765) according to the manufacturer’s instructions. Data were analysed using http://www.elisaanalysis.com/. Tissues were fixed overnight with 10% formalin in PBS before processing for paraffin embedding. For IHC, paraffin sections were stained as previously described8. In brief, a citrate-based solution (Vector Laboratories) was used for antigen retrieval. DAB (Vector Laboratories) and haematoxylin were used for staining development and counterstaining, respectively. The primary antibodies used were Hes1 (CST 11988, 1:200), Notch2 (CST 5732, 1:200), GFP (Invitrogen A-11122, 1:400), cleaved caspase-3 (CST 9664, 1:200), Ki-67 (BD Biosciences 550609, 1:200), and Ascl1/Mash1 (BD Biosciences, 556604, 1:200). 
For staining of allograft and xenograft models treated with tarextumab, tissue sections were stained on a Ventana Discovery Ultra instrument (Roche) using Ventana reagents. Sections were treated with Cell Conditioning 1 before addition of antibodies. Antibodies were detected with UltraMap HRP kit and ChromoMAP DAB, then counterstained with haematoxylin. Antibodies used were the same as listed above except Ascl1 (eBioscience 1405794) and Ki67 (Abcam ab16667). For immunofluorescence, paraffin sections were deparaffinized, rehydrated, and unmasked by boiling in Trilogy (Cell Marque 920P-10) for 15 min, then blocked and stained with primary antibodies overnight, or subjected to EdU staining (Life Technologies) before blocking and antibody staining. Nuclei were stained with DAPI (Sigma). The following primary antibodies were used: GFP (Rockland 600-101-215, 1:500), Uchl1 (Sigma HPA005993, 1:500), CGRP (Sigma C8198, 1:2,000), synaptophysin (Syp, Neuromics MO20000, 1:100), RFP/Tomato (Rockland 600-401-379, 1:500), phospho-histone H3 (EMD Millipore 06-570, 1:500), and cleaved caspase-3 (CST 9664, 1:100). Quantification of all immunostaining was performed blinded. Hes1pos cells in TKO lung or liver sections or in human tissue microarrays were scored on the basis of the frequency and intensity of Hes1 staining and assigned scores of 0 (no staining), 1 (staining in 1–20% of cells), 2 (staining in 20–60% of cells or strong, intense staining in <20% of cells), or 3 (>60% staining). Human SCLC tissue microarrays were purchased from US Biomax (LC245, LC802a, LC818), containing a total of 172 cores from 139 patients. H scores were calculated as the summation of (1 + i)p, where i is the intensity score and p is the percentage of cells with that intensity. The frequency of Hes1pos cells in TKO sections after chemotherapy was quantified from IHC staining using the ImageJ plugin, ImmunoRatio44. 
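The two scoring schemes above can be written out explicitly. This is a sketch, not the authors' code: the handling of values falling exactly on the 20% and 60% boundaries is an assumption, because the text does not specify it, and the function names are hypothetical. `h_score` follows the formula as stated, summing (1 + i) × p over whichever intensity categories are supplied.

```python
def hes1_category(percent_positive, strong=False):
    """Ordinal 0-3 Hes1 score from the frequency (and, below 20%, intensity) of staining.

    Boundary handling at exactly 20% and 60% is an assumption.
    """
    if percent_positive <= 0:
        return 0  # no staining
    if percent_positive > 60:
        return 3  # >60% of cells stained
    if percent_positive >= 20 or strong:
        return 2  # 20-60% of cells, or strong staining in <20% of cells
    return 1      # 1-20% of cells stained


def h_score(percent_by_intensity):
    """H score as stated in the text: sum of (1 + i) * p over intensity scores i."""
    return sum((1 + i) * p for i, p in percent_by_intensity.items())


print(hes1_category(10), hes1_category(10, strong=True), hes1_category(75))
print(h_score({1: 30.0, 2: 20.0, 3: 10.0}))
```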
The percentage of CC3pos cells in GFPneg or GFPhigh cells after acute chemotherapy of TKO Hes1GFP/+ mice was quantified from immunofluorescence images by ImageJ. For studies with human patient-derived xenograft and allograft tumour models performed at OncoMed Pharmaceuticals, slides were scanned using an Aperio AT scanner, then analysed using Definiens Tissue Studio image analysis software. Positively stained cells within tumours were identified and quantitated for staining intensity and frequency. For quantification in Extended Data Fig. 10f–m, some samples were excluded because the paraffin blocks did not have any tissue samples left to be cut (since the tumours were harvested at or close to minimum residual disease, the amount of tissue obtained was small). This exclusion due to unforeseen experimental limitations was not pre-established. The study was approved by the institutional review board of the East Paris University Hospitals Tumour Bio-bank, AP-HP, Tenon Hospital, Paris, France (AP-HP – GH-HUEP Tumorothèque Bio-bank platform). Seventy-three patients diagnosed with SCLC at Hôpital Tenon, Assistance Publique-Hôpitaux de Paris, France, from January 2010 to January 2013 were first identified. Tumour samples were obtained after obtaining written informed consent. We performed HES1 IHC for 68 of the patients from whom formaldehyde-fixed and paraffin-embedded tumour tissue was available. The tumour samples were first reviewed by at least two independent expert pathologists and the diagnosis of SCLC was histomorphologically confirmed by haematoxylin and eosin staining and IHC for chromogranin A, synaptophysin, NCAM and TTF1. Clinical and biological characteristics of the patients are provided in the Supplementary Methods. For survival analysis, the patients were separated into two groups on the basis of the absence (Hes1-negative) or presence (Hes1-positive) of HES1 immunostaining in their tumours. 
Human plasma samples from cancer-free normal donors were purchased from BioreclamationIVT. SCLC donor plasma was sourced from Conversant Biologics (Conversant Bio). The samples were collected, processed, and distributed in accordance with institutional review board approval following informed patient consent. Plasma samples were assayed following the Luminex assay protocol adapted to the DropArray system (Curiox Biosystems; Luminex, Austin, Texas, USA). In brief, wells in the DropArray assay plate were blocked with 10 μl 1% BSA/PBS for 30 min at room temperature. Standards were prepared according to the manufacturer’s instructions. Bead mix (5 μl) was added to all wells. Five-microlitre standards or diluted samples were then added to the plate; all standard and human plasma samples were tested in duplicate wells. The plate was shaken for 10 s at 1,000 r.p.m., then placed on a magnetic stand in a humidified chamber and shaken overnight at 4 °C. The plate was washed three times with a DropArray LT washing station MX96 (Curiox Biosystems). The detection antibody was added at 5 μl per well and the plate was incubated for 60 min. Streptavidin-PE substrate (5 μl) was then added to each well and incubated for 30 min with shaking. The plate was washed three times before reading on a Luminex 200 instrument. Data were analysed using EMD Millipore’s Milliplex Analyst software. The standard curve readings were back-calculated and evaluated for accuracy (80–120%) and precision (percentage coefficient of variation of duplicates <30%). Cells were lysed in a modified RIPA buffer (1% NP40, 0.3% SDS, 50 mM Tris-HCl pH 8.0, 150 mM NaCl, 2 mM EDTA, 1% sodium deoxycholate, 30 mM NaF, 20 mM Na₄P₂O₇, 1 mM Na₃VO₄, 1 mM DTT, 60 mM β-glycerophosphate) supplemented with protease inhibitors aprotinin (10 μg ml−1), leupeptin (10 μg ml−1), and PMSF (1 mM). Protein concentration was measured with a Pierce BCA protein assay kit (Thermo Scientific). 
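The acceptance criteria for the standard curve (80–120% accuracy, duplicate %CV below 30%) can be expressed as a small check. A sketch under stated assumptions: accuracy is taken here as the mean back-calculated concentration of the duplicates divided by the nominal concentration, and %CV uses the sample standard deviation; Milliplex Analyst may compute these slightly differently, and the function names and example readings are hypothetical.

```python
import statistics


def cv_percent(replicates):
    """Percentage coefficient of variation of replicate measurements (sample s.d.)."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def accuracy_percent(back_calculated, nominal):
    """Back-calculated concentration as a percentage of the nominal concentration."""
    return 100.0 * back_calculated / nominal


def standard_passes(nominal, duplicates, accuracy_range=(80.0, 120.0), max_cv=30.0):
    """True if a standard meets both acceptance criteria stated in the text."""
    accuracy = accuracy_percent(statistics.mean(duplicates), nominal)
    low, high = accuracy_range
    return low <= accuracy <= high and cv_percent(duplicates) < max_cv


# Hypothetical back-calculated duplicates for three standards (nominal -> readings).
standards = {1000.0: [980.0, 1050.0], 250.0: [160.0, 175.0], 62.5: [40.0, 90.0]}
for nominal, readings in standards.items():
    print(nominal, standard_passes(nominal, readings))
```

The second example standard fails on accuracy and the third on precision, which illustrates why both criteria are checked.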
The antibodies used were Notch1 (Cell Signaling Technology (CST) 4380), cleaved Notch1 (CST 4147), Notch2 (CST 5732), Hes1 (CST 11988), GFP (Invitrogen A-11122), Rest (Abcam 21635), alpha-tubulin (Sigma T9026), and HSP90 (CST 4877). For analysis of primary tumour cells, cells were sorted from pooled tumours from individual TKO Hes1GFP/+ mice by FACS. DNA and RNA were isolated using a Qiagen Allprep DNA/RNA micro kit or an RNeasy mini kit according to the manufacturer’s protocol. qRT–PCR analysis was performed on an Applied Biosystems 7900HT Fast Real-Time PCR System using PerfeCTa SYBR Green FastMix (Quanta BioSciences 95073). Genes with Ct values that were high (>34) or undetermined (for example, Notch4) were removed from the graphical analyses. Data were normalized to Rplp0 as a housekeeping gene, unless otherwise stated. Primer sequences are available in Supplementary Methods. RNA from cells isolated by FACS from three TKO Hes1GFP/+ mice (independent of the samples used for qRT–PCR) was subjected to quality assessment and microarray analysis by the Stanford Protein and Nucleic Acid (PAN) facility as previously described8. The microarray was performed using a GeneChip Mouse Gene 2.0 ST Array (Affymetrix), and the Robust Multichip Average (RMA) Express 1.1.0 program was used for background adjustment and quantile RMA normalization of the 41,345 probe sets encoding mouse genome transcripts. Linear models for microarray data (Limma) was used to compare GFPneg and GFPhigh cells on RMA normalized signal intensities. The command prcomp in R was used for principal component analysis. Probe identifiers were annotated with gene symbols from the mouse gene 2.0 ST transcript cluster database (mogene20sttranscriptcluster.db). Of the 41,345 probe sets, 25,349 were annotated to genes, which were then used for gene set enrichment analysis45, 46. 
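The text states that Ct values above 34 or undetermined were dropped and that data were normalized to Rplp0, but not the exact normalization formula. A common choice is the 2^−ΔCt method; the sketch below assumes it, and the Ct values in the example are invented for illustration only.

```python
CT_CUTOFF = 34.0  # Ct values above this (or undetermined) were dropped, per the text


def relative_expression(ct_gene, ct_reference):
    """2^-(Ct_gene - Ct_reference), or None when the gene fails the Ct filter.

    ct_gene is None for an undetermined Ct. The 2^-dCt formula is an assumed,
    commonly used convention; the text states only that Rplp0 was the reference.
    """
    if ct_gene is None or ct_gene > CT_CUTOFF:
        return None
    return 2.0 ** -(ct_gene - ct_reference)


# Invented Ct values for one sample, normalized to a hypothetical Rplp0 Ct of 18.
rplp0_ct = 18.0
for gene, ct in {"Hes1": 24.0, "Ascl1": 27.0, "Notch4": None}.items():
    print(gene, relative_expression(ct, rplp0_ct))
```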
Default parameters were used except that we performed gene set permutation instead of phenotype permutation because there were fewer than seven samples per phenotype. Probes with an adjusted P value of 0.05 or less were considered as significantly differentially expressed. Seven thousand and ninety-six probes annotated to 5,437 genes (5,289 unique) were significant, and a heatmap for these genes was generated using the heatmap.2 function in R. Significantly differentially expressed genes were also analysed by Enrichr47, 48. To identify candidate transcription factors that might mediate the NE to non-NE switch, we used genes significantly downregulated in GFPhigh cells to search for enriched ENCODE and ChEA consensus transcription factors from the ChIP-X database. To identify a list of secreted factors, we first looked at genes that were classified in the ‘extracellular space’ gene signature and, by literature search, picked out the genes known to be secreted. We also input all significant genes into the ontology search tool in the BIOBASE Knowledge Library49, 50, and the output ontologies and gene descriptions were manually screened for secreted factors. We do not exclude the possibility that we might have missed some secreted factors that are not yet well curated in public databases. Candidates for testing in an NE cell growth assay were selected on the basis of expression fold changes and known biology. Single cells were sorted into individual wells in a 96-well PCR plate containing 5 μl of 2× reaction mix (CellsDirect One-Step qRT–PCR kit, Invitrogen) with two units of SUPERase In RNase Inhibitor (Thermo Fisher Scientific). Primers were designed and purchased from Fluidigm through the D3 assay design system. 
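The text does not name the multiple-testing correction behind the adjusted P values; Benjamini–Hochberg is the default adjustment in limma's `topTable`, so the sketch below assumes it. It reproduces the BH step-up adjustment in plain Python and then applies the "0.05 or less" cut-off to a few made-up P values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k])  # indices, smallest P first
    adjusted = [0.0] * n
    running_min = 1.0
    # Step down from the largest P value so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # made-up raw P values
adj = benjamini_hochberg(raw)
significant = [k for k, q in enumerate(adj) if q <= 0.05]  # "adjusted P value of 0.05 or less"
print(adj, significant)
```

Note how the raw P value of 0.04 does not survive adjustment: with four tests its adjusted value rises to about 0.053.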
Primers were pooled, and reverse transcription and pre-amplification were performed at a final concentration of 50 nM for each primer pair using the following PCR protocol: 15 min at 50 °C, 2 min at 95 °C, 20 cycles of 15 s at 95 °C and 4 min at 60 °C, then 15 min at 4 °C. The complementary DNA (cDNA) products were treated with Exonuclease I (New England Biolabs) to remove unincorporated primers and then diluted fivefold for the final reaction. cDNA (2.25 μl), 2.5 μl 2× SsoFast EvaGreen Supermix with low ROX (Bio-Rad 172-5211) and 0.25 μl 20× DNA Binding Dye sample loading reagent (Fluidigm 100-3738) were mixed and loaded into a 48.48 or 96.96 Dynamic Array integrated fluidic circuit chip. Of each 100 μM primer pair, 0.25 μl was mixed with 2.5 μl 2× Assay Loading reagent (Fluidigm 85000736) and 2.25 μl TE buffer with low EDTA (Affymetrix 75793) and loaded into the integrated fluidic circuit. The chip was run on a Biomark machine according to the manufacturer’s protocol for EvaGreen probes. As established before the experiment, cells with high or undetectable Ct values (that is, low expression) for the housekeeping genes (Gapdh, Hsp90ab1, Actb) were excluded from the heatmaps. One nanogram of DNA was used for each multiplex PCR reaction to detect the unrecombined (floxed) and recombined (delta, Δ) Rb, p53, and p130 alleles. A Rb/p53/p130 (TKO) knockout cell line was a positive control for recombined alleles; DNA isolated from a mouse tail was a negative control. The reverse was true for the unrecombined alleles. Primer sequences are provided in the Supplementary Methods. Cells were fixed and ChIP was performed as previously described51. In brief, doxycycline-inducible cells were fixed after 48 h of doxycycline treatment. For N1ICD ChIP, KP1-pLIX-N1ICD cells were induced with 0.125 μg ml−1 of doxycycline and fixed with 2 mM disuccinimidyl glutarate (Thermo Scientific) in PBS for 30 min before formaldehyde fixation. 
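The loading recipes imply the working concentrations in each inlet mix via C1V1 = C2V2. A quick check of that arithmetic (the helper name is hypothetical; this covers only the inlet mixes, not the further mixing that occurs inside the chip):

```python
def diluted_concentration(stock, stock_volume, total_volume):
    """C1*V1 = C2*V2, solved for C2 (units follow the inputs)."""
    return stock * stock_volume / total_volume


# Assay inlet: 0.25 ul of 100 uM primer pair + 2.5 ul 2x Assay Loading reagent + 2.25 ul TE.
assay_volume = 0.25 + 2.5 + 2.25
primer_uM = diluted_concentration(100.0, 0.25, assay_volume)
loading_reagent_x = diluted_concentration(2.0, 2.5, assay_volume)

# Sample inlet: 2.25 ul cDNA + 2.5 ul 2x SsoFast EvaGreen Supermix + 0.25 ul 20x DNA Binding Dye.
sample_volume = 2.25 + 2.5 + 0.25
supermix_x = diluted_concentration(2.0, 2.5, sample_volume)
dye_x = diluted_concentration(20.0, 0.25, sample_volume)

print(assay_volume, primer_uM, loading_reagent_x)  # 5 ul total, 5 uM primers, 1x reagent
print(sample_volume, supermix_x, dye_x)            # 5 ul total, 1x supermix, 1x dye
```

Both inlet mixes total 5 μl, and every 2× or 20× reagent ends up at 1×, which is consistent with the volumes given.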
For Rest ChIP, KP1-pLIX-Rest cells were induced with 0.5 μg ml−1 of doxycycline. The antibodies used were Notch1 (CST 3608), rabbit IgG (CST 2729), and Rest (Millipore 17-641). Primer sequences are provided in the Supplementary Methods. Sample sizes were chosen on the basis of our experience with similar experiments (a minimum of three to five mice for animal studies, or two to four biological replicates for in vitro/ex vivo assays, usually ensured statistical significance if the phenotypes were robust). Statistical significance was assessed by Student’s t-test with GraphPad Prism (two-tailed unpaired or paired t-test, depending on the experiment). *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant. Variance was examined by an F-test. Data are represented as mean ± s.d. unless otherwise stated. For analysis of patient survival data, we used a weighted log-rank test in the OASIS web-based tool52 with greater emphasis on late time-point differences (rho: 0; gamma: 1). Microarray data are available at the NCBI Gene Expression Omnibus under accession number GSE81170. Normalized values for significantly differentially expressed genes are provided in Supplementary Table 1; gene set enrichment analyses are in Supplementary Tables 2–4. HES1 immunostaining and survival data of patients with SCLC are provided in Supplementary Table 5. For immunoblot Source Data, see Supplementary Fig. 1. Source Data are provided for Figs 1b, d and Extended Data Figs 4c, 5j, 6c, 8s, 9c–d, f and 10c, f–m, o. All other data are available from the corresponding author upon reasonable request.
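The weighted log-rank test with rho = 0 and gamma = 1 belongs to the Fleming–Harrington G(rho, gamma) family, whose weight at each event time is S(t−)^rho × (1 − S(t−))^gamma for the pooled Kaplan–Meier estimate S; with rho = 0 and gamma = 1 this is 1 − S(t−), so early events carry little weight and late differences dominate. The sketch below implements the two-group version with a normal approximation for the P value, together with the asterisk convention used in the figures. It is an illustrative reimplementation, not OASIS's code; tie handling and variance details may differ from that tool, and the function names are hypothetical.

```python
import math


def significance_stars(p: float) -> str:
    """Map a P value to the asterisk notation used in the figures."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return "NS"


def fh_logrank(times, events, groups, rho=0.0, gamma=1.0):
    """Two-group Fleming-Harrington G(rho, gamma) weighted log-rank test.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: 0 or 1 (for example HES1-negative vs HES1-positive).
    Returns (z, two-sided P value) using a normal approximation.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time, S(t-)
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)              # at risk, both groups
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)   # deaths at t
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        weight = surv ** rho * (1.0 - surv) ** gamma
        expected1 = d * n1 / n
        hypergeom_var = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        numerator += weight * (d1 - expected1)
        variance += weight ** 2 * hypergeom_var
        surv *= 1.0 - d / n  # update S after the events at time t
    if variance == 0.0:
        return 0.0, 1.0
    z = numerator / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


z, p = fh_logrank([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [0, 0, 0, 0, 1, 1, 1, 1])
print(round(z, 2), significance_stars(p))
```

Setting gamma = 0 recovers the ordinary unweighted log-rank test, which is a useful sanity check when comparing against other software.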
News Article | May 24, 2017
No statistical methods were used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment. Stable cell lines expressing affinity-tagged bait proteins were created according to protocols described previously in detail4. In brief, C-terminally HA–Flag-tagged clones targeting human bait proteins were constructed from clones included in version 8.1 of the human ORFeome (http://horfdb.dfci.harvard.edu)14. All expression clones used in this study are available from the Dana-Farber/Harvard Cancer Center DNA Resource Core Facility (http://dnaseq.med.harvard.edu/). After sequence validation, clones were introduced into HEK293T, HCT116, or MCF10A cells (all from the American Type Culture Collection) via lentiviral transduction. Cells were expanded under puromycin selection to obtain five 10-cm dishes per cell line before AP–MS. Bait proteins were selected from the ORFeome for high-throughput AP–MS analysis in batches corresponding to individual 96-well plates. Plates were selected for processing in random order. For AP–MS experiments in MCF10A cells, 1.15 × 10⁶ cells per 15-cm dish were collected after 3 days (sub-confluent) or after 14 days in culture (contact-inhibited, allowing expulsion of YAP1 from the nucleus and Hippo pathway activation). MCF10A cells were grown in DMEM/F12 media supplemented with 5% horse serum, 20 ng ml−1 EGF, 10 μg ml−1 insulin, 0.5 μg ml−1 hydrocortisone, 100 ng ml−1 cholera toxin, 50 U ml−1 penicillin, and 50 μg ml−1 streptomycin. All cell lines were found to be free of mycoplasma using Mycoplasma Plus PCR assay kit (Agilent). Karyotyping (GTG-banded karyotype) of HeLa, HCT116, and HEK293T cells for cell line validation was performed by Brigham and Women’s Hospital Cytogenomics Core Laboratory. All AP–MS experiments were performed as presented previously in full4. 
In brief, cell pellets were lysed in the presence of 50 mM Tris-HCl pH 7.5, 300 mM NaCl, 0.5% (v/v) NP40, followed by centrifugation and filtration to remove debris. Immunoprecipitation was achieved using immobilized and pre-washed mouse monoclonal anti-HA agarose resin (Sigma-Aldrich, clone HA-7) that was incubated with clarified lysate for 4 h at 4 °C before removal of supernatant and four washes with lysis buffer followed by two washes with PBS (pH 7.2). Complexes were eluted in two steps using HA peptide in PBS at 37 °C and subsequently underwent TCA precipitation. Baits were processed in batches corresponding to 96-well plates in the ORFeome collection; plates were processed in random order. In preparation for LC–MS analysis, protein samples were reduced and digested with sequencing-grade trypsin (Promega). Peptides were then de-salted using homemade StageTips30 and approximately 1 μg of peptides were loaded onto C18 reversed-phase microcapillary columns and analysed on Thermo Fisher Q-Exactive mass spectrometers. Data acquisition methods were approximately 70 min long, including sample loading, gradient, and column re-equilibration. Tandem mass spectrometry (MS/MS) spectra were acquired in data-dependent fashion targeting the top 20 precursors for MS2 analysis. Unless noted otherwise, a single biological replicate of each bait was subjected to affinity purification followed by technical duplicate LC–MS analysis. For a complete description of data acquisition parameters, see ref. 4. A brief synopsis of our methods for identifying peptides and proteins from LC–MS data and distinguishing bona fide interacting proteins from background is provided here. For full details, refer to ref. 4. The BioPlex 2.0 network was generated by reanalysing Sequest search results from the BioPlex 1.0 dataset, combined with additional new AP–MS datasets. 
Sequest31 was used to match MS/MS spectra with peptide sequences from the Uniprot20 human protein database supplemented with sequences of green fluorescent protein (GFP) (our negative control), our Flag–HA affinity tag, and common contaminant proteins. This version of the UniProt database includes both Swiss-Prot and TrEMBL entries and was current in 2013, at the outset of this project when the first AP–MS data were collected and searched. All protein sequences were included in forward and reversed orientations. Only fully tryptic peptides with two or fewer missed cleavages were considered, and precursor and product ion mass tolerances were set to 50 p.p.m. and 0.05 Da, respectively. The sole variable modification considered was oxidation of methionine (+15.9949 Da). Target-decoy filtering32 was applied to control FDRs, using a linear discriminant function for peptide filtering and probabilistic scoring at the protein level33. Linear discriminant analysis considered Xcorr, ΔCn, peptide length, charge state, fractions of ions matched, and precursor mass error to distinguish correct from incorrect identifications. Peptide–spectrum matches from each run were filtered to a 1% protein-level FDR with additional entropy-based filtering4 to reduce the final dataset protein-level FDR to well under 1%. Protein identifications supported by only a single peptide were discarded as well. These additional post-search filters further reduced the dataset-level FDR by over 100-fold. Scoring to identify HCIPs was performed in multiple stages after combining technical duplicate analyses of each AP–MS experiment and mapping all protein identifiers to Entrez Gene identifiers to minimize technical issues due to protein isoforms. Protein abundances in each immunoprecipitation were quantified using spectral counts averaged across technical replicates. 
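Target-decoy filtering estimates the false discovery rate at a score cut-off as the number of decoy (reversed-sequence) hits passing the cut-off divided by the number of target hits passing it. The simplified sketch below applies that estimate to a single score per peptide–spectrum match; the actual pipeline used a linear discriminant score plus protein-level probabilistic scoring, which this does not reproduce, and `fdr_threshold` is a hypothetical name.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Lowest score cut-off whose estimated target-decoy FDR is <= max_fdr.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    FDR at a cut-off is estimated as (#decoys passing) / (#targets passing).
    Returns None if no cut-off satisfies the limit.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy data: 99 strong target hits, then a mix of decoys and targets at lower scores.
toy = [(10.0, False)] * 99 + [(9.0, True), (8.0, False), (7.0, True), (6.0, True)]
print(fdr_threshold(toy))
```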
The CompPASS algorithm34, 35 compared the abundance of each protein detected in an immunoprecipitation with its average level across all other immunoprecipitations. It returned a z score quantifying the extent to which that abundance exceeds the protein's average level across the dataset, as well as an empirical NWD score that accounts for a protein's abundance, frequency of detection, and consistency across duplicate analyses. Subsequent filtering based on PSM counts, entropy scoring, and each protein's frequency of detection within each batch of samples minimized false positives, liquid chromatography carryover, and technical artefacts. Putative bait–prey interactions were further filtered using CompPASS-Plus4, a naive Bayes classifier that learns to distinguish true interacting proteins from non-specific background and false-positive identifications on the basis of CompPASS scores and several other metrics described previously. The algorithm modelled true interactions using examples from the STRING36 and GeneMania37 databases. False-positive protein identifications were modelled using decoy identifications that had survived the previous filters. All remaining data were used to model background. Cross-validation was applied by batch: each 96-well plate of immunoprecipitations was scored using a model trained on ~57 other plates. Bait–prey interactions were then assembled across immunoprecipitations to produce a single network, combining the scores of reciprocal interactions to increase their weight. BioPlex 2.0 was obtained by pruning this network to retain only those interactions that scored above 0.75, as described previously4. See Supplementary Table 1 for a list of baits as well as a complete list of interactions. BioPlex 2.0 interaction data were compared with data from the BioGRID38, CORUM15, STRING36, GeneMania37, and MINT39 databases as described previously4.
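As a rough sketch of the z score component only, the function below compares a prey's averaged spectral count in one immunoprecipitation with its mean and standard deviation across all other immunoprecipitations. Treating undetected preys as zero counts, using the population standard deviation, and the handling of zero variance are our assumptions; this is not the CompPASS or CompPASS-Plus implementation, and the NWD score and naive Bayes stages are omitted.

```python
import math
from statistics import mean, pstdev
from typing import Dict


def comppass_like_z(counts: Dict[str, Dict[str, float]], ip: str, prey: str) -> float:
    """z score of `prey` in immunoprecipitation `ip` against all other IPs.

    `counts[ip][prey]` holds spectral counts averaged over technical replicates;
    a prey absent from an IP is treated as 0 (assumption). Needs >= 2 IPs.
    """
    x = counts[ip].get(prey, 0.0)
    others = [c.get(prey, 0.0) for name, c in counts.items() if name != ip]
    mu, sd = mean(others), pstdev(others)
    if sd == 0:
        # No variation elsewhere: report +/-inf if x differs from that constant level.
        return 0.0 if x == mu else math.copysign(math.inf, x - mu)
    return (x - mu) / sd


counts = {"ip1": {"A": 10.0, "B": 5.0}, "ip2": {"A": 2.0}, "ip3": {}, "ip4": {"A": 4.0}}
print(round(comppass_like_z(counts, "ip1", "A"), 3))  # 4.899
print(comppass_like_z(counts, "ip1", "B"))            # inf (B seen only in ip1)
```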
Because the BioPlex 2.0 dataset incorporates the contents of BioPlex 1.0, and because data from this project have been deposited directly into BioGRID, released to the scientific community via the project website (http://bioplex.hms.harvard.edu), and otherwise distributed40 at intervals throughout the project, we used snapshots of these databases predating public disclosure of any BioPlex data to ensure that no BioPlex-derived interactions were included in the comparison. In Extended Data Fig. 1a, several data sources were used to determine the fractions of various protein families included as baits or preys in BioPlex 1.0 or 2.0. The list of human kinases was downloaded from kinase.com (http://kinase.com/web/current/human/; December 2007 update). Mitochondrial proteins were taken from MitoCarta 2.0 (ref. 41). Lists of transcription factors and chromatin-remodelling factors were drawn from http://www.bioguo.org. Drug target lists were taken from http://www.drugbank.ca. Cancer genes were taken from ref. 42. Disease genes were extracted from the curated set of disease–gene associations in the DisGeNET database25. ‘Essential’ genes were taken from recent papers describing clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 screens to identify human genes that confer a fitness advantage6, 7. In each case, protein identifiers were converted to Entrez Gene identifiers if necessary and compared against the gene products included in either interaction network. Each of the following analyses was performed exactly as described previously4; brief summaries are given below. Subcellular localization predictions relied on localization information provided for a subset of proteins by the UniProt website (http://www.uniprot.org) in March 2016.
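Family coverage of the kind shown in Extended Data Fig. 1a reduces to set overlap once identifiers are mapped to Entrez Gene IDs. A minimal sketch, assuming each family list and the bait and prey sets are already expressed as Entrez Gene IDs; the function name and the choice to report bait-only and bait-or-prey fractions are ours.

```python
from typing import Iterable, Set, Tuple


def family_coverage(family: Iterable[int], baits: Set[int], preys: Set[int]) -> Tuple[float, float]:
    """Fraction of a protein family (Entrez Gene IDs) seen as baits, and as baits or preys."""
    members = set(family)
    if not members:
        raise ValueError("family list is empty")
    as_bait = len(members & baits) / len(members)
    in_network = len(members & (baits | preys)) / len(members)
    return as_bait, in_network


# Hypothetical IDs: 1 of 4 family members was a bait; 3 of 4 appear in the network.
print(family_coverage([1, 2, 3, 4], baits={1, 9}, preys={2, 3, 10}))  # (0.25, 0.75)
```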
These localization terms were manually condensed into 13 core localizations: nucleus, cytoplasm, cytoskeleton, endosome, endoplasmic reticulum, extracellular, Golgi, lysosome, mitochondrion, peroxisome, plasma membrane, vesicle, and cell projection. Fisher's exact test was used to calculate the enrichment of each term among each protein's primary and secondary neighbours, with multiple testing correction43. Predictions were made when enrichments were significant at an adjusted FDR of 1%. Localization predictions are provided in Supplementary Table 3. Domain–domain associations were identified by mapping PFAM domains onto the 56,553 protein–protein interactions in the BioPlex 2.0 network. After counting the interactions involving each domain individually and the interactions in which the two domains were brought together on separate interacting proteins, Fisher's exact test was used to evaluate significance, with subsequent correction for multiple hypothesis testing. Domains were considered significantly associated at an adjusted P value below 0.01. Significant domain–domain associations are summarized in Supplementary Table 4. The enrichment of GO44 terms and PFAM22 domains was determined among each protein's immediate neighbours and for each network community using Fisher's exact test with multiple testing correction43. GO and PFAM data were downloaded from the UniProt website (http://www.uniprot.org) in March 2016. Only terms occurring at least twice were considered. Enrichments of GO terms and PFAM domains among each protein's neighbours are summarized in Supplementary Table 5. The MCL algorithm5 was used to partition the BioPlex 2.0 network into communities of tightly interconnected proteins, using an implementation provided by the algorithm's creator, S. van Dongen, at http://micans.org/mcl/. The option --force-connected=y was used to ensure that each final cluster forms a connected subgraph of the network.
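For readers unfamiliar with MCL, the toy sketch below shows its core loop: random-walk probabilities are spread by matrix squaring (expansion), then sharpened by raising entries to the inflation power and renormalizing (inflation), until the flow settles on attractor nodes. It is a dense, pure-Python illustration for small graphs under our own assumptions (self-loops added, a fixed pruning threshold, overlapping attractor sets assigned to the first cluster found); it does not implement `--force-connected=y` and is not a substitute for the `mcl` program used in this study. The `min_size` argument mirrors the discarding of clusters with fewer than three proteins.

```python
from typing import Dict, List, Set

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mcl(adj: Dict[str, Set[str]], inflation: float = 2.0, min_size: int = 1,
        max_iter: int = 100, prune: float = 1e-5, tol: float = 1e-8) -> List[Set[str]]:
    """Toy Markov clustering of an undirected graph given as an adjacency dict."""
    nodes = sorted(set(adj) | {v for vs in adj.values() for v in vs})
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for u, vs in adj.items():
        for v in vs:
            m[idx[u]][idx[v]] = m[idx[v]][idx[u]] = 1.0
    for i in range(n):
        m[i][i] = 1.0  # self-loops, a common MCL convention
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random-walk probabilities
        inflated = _normalize_columns([[x ** inflation for x in row] for row in expanded])
        # Prune tiny entries so that converged clusters separate cleanly.
        inflated = _normalize_columns([[x if x > prune else 0.0 for x in row] for row in inflated])
        delta = max(abs(a - b) for ra, rb in zip(m, inflated) for a, b in zip(ra, rb))
        m = inflated
        if delta < tol:
            break
    # Rows with non-zero entries are attractors; their non-zero columns form a cluster.
    clusters: List[Set[str]] = []
    assigned: Set[str] = set()
    for row in m:
        members = {nodes[j] for j, x in enumerate(row) if x > 0} - assigned
        if members:
            clusters.append(members)
            assigned |= members
    return [c for c in clusters if len(c) >= min_size]


# Two triangles joined by a single C-D edge, plus an isolated G-H pair.
graph = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}, "D": {"E", "F"}, "E": {"F"}, "G": {"H"}}
print(sorted(sorted(c) for c in mcl(graph, inflation=2.0, min_size=3)))
```

Raising `inflation` sharpens the flow faster and tends to give smaller, more numerous clusters, which is why the inflation value controls cluster granularity.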
The MCL algorithm requires specification of a single parameter, the inflation parameter, which controls the granularity of the resulting clusters. Clustering of BioPlex 2.0 was repeated for several values of the inflation parameter between 1.5 and 2.5. After comparing experimentally derived clusters with known protein complexes, an inflation parameter of 2.0 was selected for final clustering. Clusters containing fewer than three proteins were discarded, producing a final list of 1,320 protein communities. Each cluster and its members are summarized in Supplementary Table 6; GO terms and PFAM domains enriched in each community are provided in Supplementary Table 7. An important question was the extent to which each cluster observed in BioPlex 2.0 is also visible in BioPlex 1.0. To address this question, we mapped each cluster detected in BioPlex 2.0 onto the BioPlex 1.0 network. If a given cluster was also reflected in BioPlex 1.0, we would expect to see an enrichment of interactions among its proteins; conversely, if interactions among the relevant set of proteins were not enriched above background, there would be no evidence in BioPlex 1.0 to support the cluster. After mapping each cluster from BioPlex 2.0 onto the BioPlex 1.0 network, we used a binomial test to evaluate the enrichment of BioPlex 1.0 interactions among the matching proteins. The probability of interaction was estimated as the fraction of all possible interactions in the BioPlex 1.0 network that were actually detected (8.08 × 10⁻⁴); the number of trials was taken to be the maximum number of interactions possible among those cluster proteins that were part of the BioPlex 1.0 network; and the number of interactions actually observed among them in BioPlex 1.0 was taken as the number of successes. A one-sided binomial test was performed and a correction for multiple testing was applied43.
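The binomial test described above can be written in a few lines. In this sketch (function names hypothetical), trials are all protein pairs among cluster members present in the older network, successes are the pairs actually connected there, and the default interaction probability is the 8.08 × 10⁻⁴ quoted for BioPlex 1.0. The upper tail is summed in log space so that large clusters do not overflow; multiple testing correction is not included.

```python
from itertools import combinations
from math import exp, lgamma, log, log1p
from typing import FrozenSet, Set


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided P value P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    def log_pmf(i: int) -> float:
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))

    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


def cluster_support_p(cluster: Set[str], edges: Set[FrozenSet[str]], p: float = 8.08e-4) -> float:
    """Binomial P value for interactions among a cluster's proteins in an older network."""
    network_nodes = {v for e in edges for v in e}
    present = sorted(cluster & network_nodes)        # cluster proteins in the old network
    trials = len(present) * (len(present) - 1) // 2  # all possible pairs among them
    successes = sum(frozenset(pair) in edges for pair in combinations(present, 2))
    return binomial_upper_tail(successes, trials, p)


edges = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D")]}
# Z is absent from the old network, so 3 pairs are tested and 2 are observed.
print(round(cluster_support_p({"A", "B", "C", "Z"}, edges, p=0.1), 6))  # 0.028
```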
Overall, 45% of complexes detected in BioPlex 2.0 showed no enrichment for protein interactions in BioPlex 1.0, suggesting that these were macromolecular complexes not covered by the first interaction network. Moreover, although the remaining 55% of complexes were at least partly reflected in BioPlex 1.0, the density of their coverage consistently increased with the incorporation of additional AP–MS data into the BioPlex 2.0 network. In addition to using MCL clustering to partition the BioPlex 2.0 network into individual clusters of tightly interconnected proteins, we also wanted to explore patterns of interconnection that relate these clusters to each other. For this purpose, we searched for pairs of clusters that were connected to each other through interactions among their constituent proteins more often than would be expected by chance. First, the full set of 56,553 interactions was trimmed to include only those interactions connecting one cluster with another, and the set of all cluster pairs connected by one or more interactions was identified. For each of these cluster pairs, the number of interactions connecting the pair was determined, as were the numbers of interactions involving each cluster individually. Fisher's exact test was used to identify pairs of clusters enriched for interactions between them, followed by multiple testing correction43. The 929 cluster–cluster associations accepted at a 1% FDR are displayed in Fig. 3a and Extended Data Fig. 9 and provided in Supplementary Table 6. GO and PFAM enrichments for each community are summarized in Supplementary Table 7. The first step in examining the network properties of fitness proteins was to combine the lists of proteins associated with increased cellular fitness from refs 6, 7 into a single composite list. For our purposes, we used the union of both lists to define the set of fitness proteins.
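The cluster–cluster test can be sketched as a one-sided Fisher's exact test, which is equivalent to a hypergeometric upper-tail probability and therefore also illustrates the hypergeometric enrichment tests used for fitness and disease proteins. The 2x2 layout (interactions touching cluster A versus those touching cluster B, out of all inter-cluster interactions) is our assumption about how the counts were arranged, and we assume a Benjamini–Hochberg correction purely for illustration, since the procedure of ref. 43 is not restated in this section. All function names are hypothetical.

```python
from math import comb
from typing import List


def fisher_upper_p(a: int, row1: int, col1: int, total: int) -> float:
    """One-sided Fisher's exact P value for enrichment of the top-left cell.

    The 2x2 table has `a` in the top-left cell, first-row sum `row1`,
    first-column sum `col1`, and grand total `total`; the result equals the
    hypergeometric tail P(X >= a).
    """
    upper = min(row1, col1)
    hits = sum(comb(row1, i) * comb(total - row1, col1 - i) for i in range(a, upper + 1))
    return hits / comb(total, col1)


def cluster_pair_p(links: int, deg_a: int, deg_b: int, total: int) -> float:
    """Assumed layout: rows = touches cluster A, columns = touches cluster B."""
    return fisher_upper_p(links, deg_a, deg_b, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


print(fisher_upper_p(5, 5, 5, 10) == 1 / 252)  # True: only one table is this extreme
```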
Entrez Gene identifiers were associated with proteins on this list and mapped onto the BioPlex 2.0 network. To assess the network properties of fitness proteins, the composite list was superimposed onto the BioPlex network, effectively dividing all proteins in the network into two groups: fitness and non-fitness proteins. Vertex degrees, local clustering coefficients, and eigenvector centralities were then computed and averaged across all fitness proteins. To evaluate whether these values differed for fitness proteins compared with randomly selected protein subsets of equivalent size, fitness and non-fitness labels were scrambled across the network and a new average was calculated for the randomized set of fitness proteins. This process was repeated 10,000 times to define null distributions for each statistic. Because these null distributions were approximately normal, a Gaussian distribution was fitted to each and used to assign z scores and P values to each statistic for the true set of fitness proteins. To evaluate graph assortativity, the BioPlex network was partitioned into fitness and non-fitness proteins and the assortativity of the partitioned graph was calculated. This process was repeated 10,000 times with randomized fitness and non-fitness labels, and the resulting distribution was fitted to a Gaussian distribution and used to determine a z score and P value for the true assortativity. A second goal was to identify clusters enriched with fitness proteins. For this purpose, a one-sided hypergeometric test was used to evaluate the enrichment of fitness proteins, taking into account the size of the cluster, the size of the BioPlex network, and the fraction of network proteins associated with increased cellular fitness. Only clusters containing two or more fitness proteins were considered for this analysis.
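A minimal sketch of the label-scrambling procedure for a per-protein statistic such as degree: random protein sets of the same size as the fitness set are drawn, their means form the null distribution, and a Gaussian fitted to that null yields a z score. The two-sided normal P value, the fixed random seed, and the function name are our assumptions; the same recipe would apply to assortativity by recomputing it on each relabelled graph.

```python
import random
from math import erfc, sqrt
from statistics import mean, stdev
from typing import Dict, Set, Tuple


def label_permutation_z(values: Dict[str, float], labelled: Set[str],
                        n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """z score and two-sided P value for the mean of `values` over `labelled` proteins.

    The null is built by scrambling labels, i.e. drawing random protein sets of
    the same size and recomputing the mean; a Gaussian is fitted to that null.
    """
    proteins = sorted(values)
    chosen = [p for p in proteins if p in labelled]
    observed = mean(values[p] for p in chosen)
    rng = random.Random(seed)
    null = [mean(values[p] for p in rng.sample(proteins, len(chosen))) for _ in range(n_perm)]
    mu, sd = mean(null), stdev(null)  # Gaussian fit to the null distribution
    z = (observed - mu) / sd
    return z, erfc(abs(z) / sqrt(2))  # two-sided normal tail probability


# Hypothetical degrees: the "fitness" proteins are the ten highest-degree nodes.
degrees = {f"p{i:02d}": float(i) for i in range(100)}
fitness = {f"p{i:02d}" for i in range(90, 100)}
z, p = label_permutation_z(degrees, fitness, n_perm=2000)
print(z > 3, p < 1e-3)  # True True
```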
After multiple testing correction43, 53 communities were found to be enriched with fitness proteins at a 1% FDR. These clusters are summarized in Extended Data Fig. 9. Levels of enrichment for communities containing two or more cellular fitness proteins are summarized in Supplementary Table 8. To assess whether clusters containing, or enriched for, fitness proteins tend to be centrally located within the cluster–cluster association network (Fig. 3a), all clusters were sorted according to their eigenvector centralities. The Kolmogorov–Smirnov test was used to compare the distributions of clusters enriched and not enriched with fitness proteins within the ranked list of all clusters. This process was repeated to compare the distribution of clusters containing multiple fitness proteins with that of clusters containing 0 or 1 fitness proteins, as shown in Fig. 3d. Our study of protein complexes and disease was based on the DisGeNET database of disease–gene associations25. For this analysis, we used the full database, which relates over 16,000 genes to 13,000 partly redundant disease classifications. Each disease state and its associated proteins were then mapped onto each BioPlex 2.0 complex and evaluated for enrichment using a hypergeometric test, taking into account the size of the complex, the number of disease proteins in the complex, the number of disease proteins within the network, and the total network size. This process was repeated for each community and for each disease state. After multiple testing correction43, complexes enriched at a 1% FDR with proteins involved in a given disease were deemed associated with that disease. The resulting disease–complex associations were assembled into a network in which clusters and disease states are both represented as nodes, with edges connecting clusters to significantly associated disease states; this network is depicted in full in Fig. 4a.
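The Kolmogorov–Smirnov comparison operates on positions in the ranked list of clusters. The sketch below computes the two-sample statistic D directly and attaches a common asymptotic P value approximation (the Kolmogorov series with a small-sample correction to its argument). Both functions are illustrative and hypothetical; in practice a library routine such as `scipy.stats.ks_2samp` would normally be preferred.

```python
from math import exp, sqrt
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:  # advance both CDFs past x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d: float, n1: int, n2: int, terms: int = 100) -> float:
    """Approximate two-sided P value from the asymptotic Kolmogorov distribution."""
    ne = n1 * n2 / (n1 + n2)
    lam = (sqrt(ne) + 0.12 + 0.11 / sqrt(ne)) * d
    if lam < 0.05:
        return 1.0  # the series converges poorly here and the P value is ~1
    total = sum(2 * (-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, total))


# Hypothetical ranks: enriched clusters occupy the five top positions.
enriched = [1, 2, 3, 4, 5]
others = list(range(6, 21))
d = ks_statistic(enriched, others)
print(d, ks_pvalue(d, len(enriched), len(others)) < 1e-3)  # 1.0 True
```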
All significant disease–cluster associations are provided in Supplementary Table 8. The eigenvector centralities assigned to disease states within the composite disease–community network were used to compare a range of disease states. Disease classifications were taken from the DisGeNET database as reported in its SQLite download. All disease states in the network were ranked by increasing eigenvector centrality. For each disease classification (for example, ‘neoplasms’), a Kolmogorov–Smirnov test was used to compare the distributions of matching and non-matching disease states within the entire ranked list. After multiple testing correction, disease classifications whose disease states were differentially distributed with respect to eigenvector centrality at a 1% FDR were identified and are highlighted in Fig. 4b. HEK293T cells were transfected with Flag–HA–GFP control plasmid, C13orf18–GFP, GFP–BECN1, or RUFY1–Flag–HA plasmids and, after 48 h, were collected on ice in lysis buffer (50 mM Tris pH 7.5, 150 mM NaCl, 1% NP-40) containing protease and phosphatase inhibitors (Roche). Lysates were cleared by centrifugation and subjected to affinity purification with anti-GFP antibodies (Chromotek, GFP-Trap, GTMA-20) or anti-Flag magnetic beads (Sigma-Aldrich, A2220) for 2 h at 4 °C. Beads were washed four times with lysis buffer and then subjected to SDS–PAGE and immunoblotting with the following antibodies: BECN1 (Cell Signaling, clone D40C5), GFP (Roche, mouse IgG clones 7.1 and 13.1), C13orf18 (Proteintech, 21183-1-AP), and HA (BioLegend, clone HA.11). For validation of Hippo pathway interactions within BioPlex 2.0, we performed AP–MS experiments in MCF10A cells.
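Eigenvector centralities, used here to rank disease states (and earlier to rank clusters and fitness proteins), can be obtained by power iteration. The sketch below iterates on A + I rather than on the adjacency matrix A; this is our choice, because the disease–community network is bipartite and plain power iteration on a bipartite adjacency matrix can oscillate instead of converging. Adding the identity shifts every eigenvalue by 1 without changing the eigenvectors. Node names and the stopping tolerance are illustrative.

```python
from math import sqrt
from typing import Dict, Set


def eigenvector_centrality(adj: Dict[str, Set[str]], max_iter: int = 1000,
                           tol: float = 1e-12) -> Dict[str, float]:
    """Leading-eigenvector centrality of an undirected graph by power iteration on (A + I)."""
    nodes = sorted(set(adj) | {v for vs in adj.values() for v in vs})
    nbrs: Dict[str, Set[str]] = {v: set() for v in nodes}
    for u, vs in adj.items():
        for v in vs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    x = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # (A + I) x: each node keeps its own score and adds its neighbours' scores.
        new = {v: x[v] + sum(x[u] for u in nbrs[v]) for v in nodes}
        norm = sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x


# A star is bipartite: one hub linked to three leaves.
star = {"hub": {"a", "b", "c"}}
cent = eigenvector_centrality(star)
ranking = sorted(cent, key=cent.get)  # increasing centrality, as in the ranked list
print(ranking[-1])  # hub
```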
Unlike HEK293T cells, MCF10A cells undergo contact inhibition and activate the Hippo signalling pathway; we therefore used cells under both sub-confluent and confluent conditions, and the expected exclusion of YAP1 from the nucleus was verified by immunofluorescence (see section on ‘Clone construction and cell culture’). Affinity purification was performed essentially as described previously34, but eluted anti-HA immune complexes (Sigma-Aldrich, clone HA-7) were analysed in two ways. First, immune complexes for PDLIM7, MAGI1, YAP1, WWC1, NF2, and MPP5 (replicate 1) were subjected to LC–MS/MS analysis on an LTQ-Velos instrument, and HCIPs were identified using CompPASS34 in combination with a false-positive background dataset derived in MCF10A cells45. The second replicate set for PDLIM7, MAGI1, YAP1, WWC1, NF2, and MPP5, as well as both replicates for PTPN14 and INADL, were processed identically to the first set, except that the HA-eluted proteins were reduced and alkylated with DTT and iodoacetamide before trypsin digestion, and the digested peptides from one sub-confluent and one confluent anti-HA immunoprecipitation were labelled heavy and light, respectively, by reductive dimethylation46. The sub-confluent and confluent sample pairs for each bait were mixed so that the amount of bait in the heavy and light fractions was normalized to 1:1, and were analysed on an Orbitrap Elite Hybrid Ion Trap-Orbitrap Mass Spectrometer (ThermoFisher). Complexes from each growth condition were deconvolved using linear discriminant analysis parameters that filtered for either heavy-only or light-only labelled peptides. The heavy- or light-specific search results were subsequently imported into CompPASS for protein interaction analysis. Spectral count and CompPASS score data for the MCF10A dataset are provided in Supplementary Table 10. Anti-PTPN14 antibodies were from Sigma-Aldrich (GW21498A).
We used CRISPR–Cas9 gene editing to knock out KIAA0196 using the gRNA sequence GTCTAAGCCATTTAGACCAA, as described47. The KIAA0196 ORF (a gift from C. Clemen, University of Cologne) was cloned into pLenti-NTAP-IRES-Puro and expressed in KIAA0196−/− cells after selection with puromycin (1 μg ml⁻¹). Immunoprecipitation with anti-Flag antibodies (Sigma-Aldrich, M2), trypsinization, tandem mass tag labelling, mass spectrometry analysis, and quantification were performed as described previously4. Parallel immune complexes or whole-cell lysates were subjected to immunoblotting with anti-WASH1 (Sigma-Aldrich, SAB4200373), anti-KIAA0196 (Santa Cruz Biotechnology, sc-87442), anti-KIAA1033 (Bethyl Labs, A304-919A), anti-CCDC53 (Proteintech, 24445-1-AP), anti-PCNA (Santa Cruz Biotechnology, sc-56), or anti-actin (Santa Cruz Biotechnology, sc-69879), and immunoblot signals were quantified using Protein Simple M in biological triplicate. HeLa cells (American Type Culture Collection) were plated on glass coverslips (Zeiss) and transiently transduced with lentiviral vectors expressing C-terminally Flag–HA-tagged baits. At 48 h after infection, cells were fixed with 4% paraformaldehyde for 15 min at room temperature. Cells were washed in PBS and then blocked for 1 h with 5% normal goat serum (Cell Signaling Technology) in PBS containing 0.3% Triton X-100 (Sigma-Aldrich). Coverslips were incubated with anti-HA antibodies (mouse monoclonal, clone HA.11, BioLegend) or anti-HA plus anti-TOMM20 (rabbit polyclonal mitochondrial marker, Santa Cruz Biotechnology, clone FL-145, catalogue number 11415) for 2 h at room temperature in a humidified chamber. Cells were washed three times with PBS and then incubated for 1 h with appropriate Alexa Fluor-conjugated secondary antibodies (ThermoFisher). Nuclei were stained with Hoechst, and cells were washed three times with PBS and mounted on slides using ProLong Gold mounting medium (ThermoFisher).
All images were collected with a Yokogawa CSU-X1 spinning disk confocal scanner with Spectral Applied Research Aurora Borealis modification on a Nikon Ti-E inverted microscope, using a 100× Plan Apo objective lens with a numerical aperture of 1.4 (Nikon Imaging Center, Harvard Medical School). Confocal images were acquired with a Hamamatsu ORCA-AG cooled CCD (charge-coupled device) camera controlled by MetaMorph 7 software (Molecular Devices). Fluorophores were excited using a Spectral Applied Research LMM-5 laser merge module with acousto-optic tuneable filter (AOTF)-controlled solid-state lasers (488 nm and 561 nm). A Lumencor SOLA fluorescence light source was used for imaging Hoechst staining. z-series optical sections were collected with a step size of 0.2 μm using the internal Nikon Ti-E focus motor and stacked in MetaMorph to construct maximum intensity projections. We performed three major validation experiments: (1) analysis of a dozen bait proteins in both HCT116 colon cancer cells and HEK293T cells to examine overlap in interaction partners, (2) reciprocal AP–MS experiments using a set of 14-3-3 proteins as baits to test their interacting proteins, and (3) analysis of the PDLIM7–PTPN14–YAP1 adhesion network in MCF10A cells. For the first of these, we selected 12 largely unstudied proteins with between 1 and 25 interaction partners in HEK293T cells and performed AP–MS in HCT116 cells, a cell line of distinct tissue origin from HEK293T cells. After identifying HCIPs for these proteins in HCT116 cells, we determined which interactions were shared with HEK293T cells (Extended Data Fig. 1b–m). Across the 12 bait proteins, 30–100% of the interactions seen for individual baits in HEK293T cells were validated. Cumulatively, this reflected an overall validation rate of roughly 60% (92 of 147 interactions seen in HEK293T cells were also seen in HCT116 cells).
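Per-bait and cumulative validation rates are computed differently, which is why a 30–100% per-bait range can coexist with a single overall figure: the pooled rate weights each bait by its number of HEK293T interactions. A minimal sketch with hypothetical bait and prey names, assuming an interaction counts as validated when the same bait–prey pair appears among the HCIPs in both cell lines:

```python
from typing import Dict, Set, Tuple


def validation_rates(reference: Dict[str, Set[str]],
                     other: Dict[str, Set[str]]) -> Tuple[Dict[str, float], float]:
    """Per-bait and pooled fractions of reference interactions recovered in another cell line."""
    per_bait: Dict[str, float] = {}
    recovered = total = 0
    for bait, preys in reference.items():
        if not preys:
            continue
        shared = len(preys & other.get(bait, set()))
        per_bait[bait] = shared / len(preys)
        recovered += shared
        total += len(preys)
    return per_bait, (recovered / total if total else float("nan"))


hek = {"bait1": {"x", "y"}, "bait2": {"p", "q", "r", "s"}}
hct = {"bait1": {"x", "y", "z"}, "bait2": {"p"}}
print(validation_rates(hek, hct))  # ({'bait1': 1.0, 'bait2': 0.25}, 0.5)
```

Here the mean of the per-bait rates would be 0.625, whereas the pooled rate is 0.5, because the bait with more interactions contributes more to the pooled figure.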
This rate of validation is comparable to that seen in focused studies examining F-box protein interactors in these two cell lines (51%)48. Thus, a substantial fraction of interactions seen in HEK293T cells are recapitulated in HCT116 cells. The 14-3-3 proteins are a well-studied group of seven proteins (YWHAB, YWHAE, YWHAZ, YWHAH, YWHAQ, YWHAG, and SFN) that typically associate with phosphorylated proteins. Thirty-nine baits in BioPlex 2.0 were found to interact with one or more of these 14-3-3 proteins, with YWHAZ detected most frequently (35 baits) and SFN least frequently (4 baits) (Extended Data Fig. 2). Seventeen of these baits are not known to interact with 14-3-3 proteins according to BioGRID. Because only the atypical 14-3-3 protein SFN had been targeted as a bait in BioPlex 2.0, the remaining six 14-3-3 proteins were submitted to our standard AP–MS pipeline using ORFeome 8.1 clones; the clone for YWHAE failed at the sequence validation stage, but the remaining five 14-3-3 proteins were processed successfully, identifying 130–360 HCIPs (Supplementary Table 2). Although eight of the 39 BioPlex 2.0 baits observed to interact with one or more 14-3-3 proteins were not detected in HEK293T cells, and thus may be impossible to detect in reciprocal immunoprecipitations, 63% of interactions eligible for reciprocal detection were confirmed (Extended Data Fig. 2a–c). This demonstrates that BioPlex 2.0 can reliably reveal novel, reciprocally detectable interaction partners even for proteins as well studied as the 14-3-3 proteins. PTPN14 is a protein phosphatase that has recently been found to associate with several proteins of the Hippo pathway, which involves the transcription factor YAP1. The Hippo pathway is regulated by contact inhibition and promotes YAP1 sequestration in the cytoplasm49. BioPlex 2.0 contains a highly connected group of proteins centred on PTPN14, MAGI1, MPP5, LIN7A/C, and INADL (Extended Data Fig. 2d).
This network contained several interactions not found in BioGRID. To validate these interactions, we performed AP–MS or immunoprecipitation–western blot analysis of PTPN14, MAGI1, MPP5, PDLIM7, INADL, WWC1, NF2, and YAP1 after stable expression in MCF10A cells in both sub-confluent and confluent states. This series of experiments strongly validated the interactions seen in HEK293T cells (Extended Data Fig. 2d, f), with 65% of eligible interactions seen in both cell lines, further supporting our method and the ability of BioPlex 2.0 to identify interactions robustly. Furthermore, 63% of the interactions identified in both BioPlex 2.0 and MCF10A cells were novel, not having been described in several earlier interaction profiling experiments (Extended Data Fig. 2g). Overall, these three lines of study indicate that BioPlex 2.0 identifies interactions that can be validated reciprocally or in other cell lines. The BioPlex 2.0 network and its underlying data are available in several formats. First, all interactions in the BioPlex network have been deposited in the BioGRID protein interaction database. Second, we have created a website devoted to the project (http://bioplex.hms.harvard.edu) that provides (1) downloads of the interactions that make up BioPlex 1.0 and 2.0, (2) a customized viewer for browsing either network to examine the interactions of specific proteins, (3) an interface for downloading nearly 12,000 individual RAW files containing mass spectrometry data from individual AP–MS experiments, and (4) an R package and web-based tool for performing CompPASS analyses. Third, the BioPlex 2.0 network, as bait–prey pairs, has been incorporated into NDEx40, a web-based platform for biological Network Data Exchange. Fourth, our RAW files have been submitted for inclusion in ProteomicsDB50. Finally, all RAW files (3 TB) from this study will be provided to investigators upon request on investigator-provided hard drives.
In addition, a table in .tsv format containing all proteins and spectral count information for all 5,891 AP–MS experiments reported here is available for download from the BioPlex website. All other data are available from the corresponding authors upon reasonable request.