Chen J.-H., University of Cambridge
Chen J.-H., National Health Research Institute
Segni M., University of Rome La Sapienza
Payne F., Wellcome Trust Sanger Institute
And 37 more authors.
Journal of Molecular Endocrinology | Year: 2015
We describe a female proband with primordial dwarfism, skeletal dysplasia, facial dysmorphism, extreme dyslipidaemic insulin resistance and fatty liver associated with a novel homozygous frameshift mutation in POC1A, predicted to affect two of the three protein products of the gene. POC1A encodes a protein associated with centrioles throughout the cell cycle and implicated in both mitotic spindle and primary ciliary function. Three homozygous mutations affecting all isoforms of POC1A have recently been implicated in a similar syndrome of primordial dwarfism, although no detailed metabolic phenotypes were described. Primary cells from the proband we describe exhibited increased centrosome amplification and multipolar spindle formation during mitosis, but showed normal DNA content, arguing against mitotic skipping, cleavage failure or cell fusion. Despite evidence of increased DNA damage in cells with supernumerary centrosomes, no aneuploidy was detected. Extensive centrosome clustering both at mitotic spindles and in primary cilia mitigated the consequences of centrosome amplification, and primary ciliary formation was normal. Although further metabolic studies of patients with POC1A mutations are warranted, we suggest that POC1A may be added to ALMS1 and PCNT as examples of centrosomal or pericentriolar proteins whose dysfunction leads to extreme dyslipidaemic insulin resistance. Further investigation of links between these molecular defects and adipose tissue dysfunction is likely to yield insights into mechanisms of adipose tissue maintenance and regeneration that are critical to metabolic health. © 2015 The authors.
The clinical and biological characteristics of the 506 patients are shown in Extended Data Table 1. Among these patients, 452 were diagnosed with CLL and 54 with MBL. Cases were defined as IGHV-MUT when the identity of the rearranged IGHV gene to its closest germline sequence was less than 98%. Tumour samples were obtained before any treatment was administered. All patients gave informed consent for their participation in the study following the guidelines of the International Cancer Genome Consortium (ICGC) and the ICGC Ethics and Policy committee19. Tumour samples were obtained from fresh or cryopreserved mononuclear cells. To purify the CLL or MBL fraction, samples were incubated with a cocktail of magnetically labelled antibodies directed against T cells, natural killer cells, monocytes and granulocytes (CD2, CD3, CD11b, CD14, CD15 and CD56), adjusted to the percentage of each contaminating population (AutoMACS, Miltenyi Biotec). Contamination of the CLL fraction by non-CLL cells was assessed by immunophenotyping and flow cytometry. DNA was extracted from purified samples using a Qiagen kit; its quality was assessed by SYBR Green staining on agarose gels, and it was quantified using a Nanodrop ND-100 spectrophotometer. The tumour DNA and RNA samples used for genomic analysis contained ≥95% neoplastic cells, and contamination of the normal DNA by neoplastic cells was <2%. For WGS, 2 μg of genomic DNA from each sample was used to construct two short-insert paired-end sequencing libraries. One library was prepared using a standard TruSeq DNA Sample Preparation Kit v2 (Illumina Inc.) with some modifications. In short, after fragmentation (Covaris E220), the libraries were size-selected on an agarose gel and processed through end repair, adenylation and indexed adaptor ligation. The gel eluate was amplified directly with 10 PCR cycles.
The second library was prepared following the same protocol, except that before adaptor ligation it was heated to 72 °C and then rapidly cooled to 4 °C. This increased the proportion of high-GC reads and partly counterbalanced the GC bias of Illumina's PCR-based sample preparation, thus improving coverage of GC-rich regions of the genome. Both types of libraries were sequenced in paired-end mode on an Illumina GAIIx (2 × 151 bp) using Sequencing Kit v4, or on an Illumina HiSeq2000 (2 × 101 bp) using TruSeq SBS Kit v3 (Illumina Inc.). For other samples (Supplementary Table 1), the library preparation procedure was modified to remove the PCR step during short-insert paired-end library preparation. The TruSeq DNA Sample Preparation Kit v2 (Illumina Inc.) and the KAPA Library Preparation kit (Kapa Biosystems) were used. In brief, 2 μg of genomic DNA was sheared on a Covaris E220, then size-selected and concentrated using AMPure XP beads (Agencourt, Beckman Coulter) to reach a fragment size of 220–480 bp. Fragmented DNA was end-repaired, adenylated and ligated to Illumina-specific indexed paired-end adaptors. All libraries were quantified with the Library Quantification Kit (Kapa Biosystems). Each library was sequenced using TruSeq SBS Kit v3-HS (Illumina Inc.) in paired-end mode (2 × 101 bp) in three lanes of a HiSeq2000 flowcell v3 (Illumina Inc.), according to standard Illumina operating procedures, with a minimal yield of 85 Gb per sample. Primary data analysis was carried out with the standard Illumina software Real Time Analysis (RTA 1.13.48), followed by generation of FASTQ files. For WES, 3 μg of genomic DNA from each sample was sheared and used to construct a paired-end sequencing library as described in the paired-end sequencing sample preparation protocol provided by Illumina41.
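As a rough check of what the 85-Gb minimum yield implies, the raw mean depth is the number of sequenced bases divided by the genome size. The sketch below assumes a GRCh37 size of about 3.1 Gb and ignores duplicates, unmapped reads and uncallable regions, so the effective depth after processing would be lower; the function name is hypothetical.

```python
def expected_mean_depth(yield_bases: float, genome_size: float = 3.1e9) -> float:
    """Raw mean depth: total sequenced bases divided by genome size.

    No correction is made for duplicates, unmapped reads or uncallable regions.
    The default genome size (~3.1 Gb for GRCh37) is an assumption.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return yield_bases / genome_size


if __name__ == "__main__":
    # 85 Gb is the minimum per-sample yield stated in the text
    print(f"Expected raw mean depth: {expected_mean_depth(85e9):.1f}x")
```

Under these assumptions the 85-Gb minimum corresponds to roughly 27× raw mean depth per sample.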
Exonic sequences were then enriched for each library using either the SureSelect Human All Exon 50 Mb or All Exon+UTRs v4 kit (Supplementary Table 1), following the manufacturer's instructions (Agilent Technologies). Exon-enriched DNA was pulled down with streptavidin-coated magnetic beads (Invitrogen), followed by washing, elution and 18 additional cycles of amplification of the captured library. Enriched libraries were sequenced (2 × 76 bp) in one lane of an Illumina GAIIx sequencer, or in two lanes of a HiSeq2000 when pools of eight samples were used. RNA quantity and quality were assessed using the Qubit RNA HS Assay (Life Technologies) and the RNA 6000 Nano Assay on a Bioanalyzer 2100. RNA-seq libraries were prepared from total RNA using the TruSeq RNA Sample Prep Kit v2 (Illumina Inc.) with minor modifications. In brief, 0.5 μg of total RNA was used as input for poly-A-based messenger RNA enrichment with oligo-dT magnetic beads. Selected mRNA was fragmented (resulting fragment size 80–250 nucleotides, with the major peak at 130 nucleotides). After first- and second-strand cDNA synthesis, the double-stranded complementary DNA was end-repaired and 3′ adenylated, and the 3′ 'T' nucleotide overhang of the adaptor was used for ligation of the Illumina indexed adaptors. The ligation product was enriched by 10 cycles of PCR. Each library was sequenced using TruSeq SBS Kit v3-HS in paired-end mode with a read length of 2 × 76 bp. More than 20 million paired-end reads were generated for each sample in a fraction of a HiSeq2000 (Illumina Inc.) sequencing lane, following the manufacturer's protocol. Image analysis, base calling and quality scoring of the run were performed with the manufacturer's software Real Time Analysis (RTA 1.13.48), followed by generation of FASTQ sequence files. For WGS and WES, reads from each library were mapped to the human reference genome (GRCh37) using BWA42 with the same options, and a BAM file was generated using SAMtools43.
Reads from the same paired-end libraries were merged, and optical or PCR duplicates were flagged using Picard (http://picard.sourceforge.net/index.shtml). To identify somatic substitutions and indels, we used the Sidrón algorithm9, 44, adapted to detect subclonal mutations in which the mutant allele fraction is low but supported by at least three reads. Visual inspection of recurrent mutational hotspots allowed the inclusion of some somatic mutations that had originally been discarded because of an excess of mutant reads in the non-tumour sample or because of low coverage, especially in NOTCH1, whose exon 34 has a high GC content that usually resulted in very low WES coverage. In samples in which NOTCH1 coverage was too low to make a call, mutations were analysed by Sanger sequencing. Comparing Sidrón calls with Sanger sequencing of some of the most frequently mutated genes in CLL (SF3B1, TP53, MYD88) showed more than 97% specificity and at least 90% sensitivity. Mutational signatures were extracted using the WTSI Mutational Signature Framework45. To estimate the presence of subclonal mutations in recurrently mutated genes, the fraction of reads supporting the mutant allele was calculated for mutations with a depth of coverage of at least 20 reads. Flow cytometry analysis confirmed that the percentage of tumour cells was at least 98%. A case was considered to carry a clonal mutation when at least 80% of cells were estimated to contain the mutation and the mutant allelic fraction was within the 95% confidence interval. To identify CNAs, tumour and normal DNA from 505 CLL patients were analysed using Affymetrix SNP6.0 microarrays (Affymetrix) as previously described46. SNP array experiments were carried out at CeGen (http://www.cegen.org).
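One way to turn allele fractions into a clonality call is to convert the mutant allele fraction into a cell fraction and attach a binomial confidence interval. The sketch below is one possible reading of the criteria above, not the study's code. It assumes heterozygous mutations in diploid regions (so a clonal mutation in a sample of purity p has an expected allele fraction of p/2), uses the Wilson score interval as the 95% interval, and calls a mutation clonal when the cell fraction implied by the upper interval bound reaches 80%. The 20-read minimum depth comes from the text; all function names are hypothetical.

```python
import math


def wilson_interval(alt: int, depth: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (about 95% for z = 1.96) for the allele fraction alt/depth."""
    p = alt / depth
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def cell_fraction(vaf: float, purity: float) -> float:
    """Cell fraction for a heterozygous mutation in a diploid region (clonal VAF = purity / 2)."""
    return min(1.0, 2 * vaf / purity)


def classify_mutation(alt: int, depth: int, purity: float,
                      min_depth: int = 20, clonal_cutoff: float = 0.8) -> str:
    """Label a mutation 'clonal', 'subclonal' or 'not assessed' (too few reads)."""
    if depth < min_depth:
        return "not assessed"
    _, upper = wilson_interval(alt, depth)
    # clonal if the cell fraction implied by the upper interval bound reaches the cut-off
    return "clonal" if cell_fraction(upper, purity) >= clonal_cutoff else "subclonal"


if __name__ == "__main__":
    purity = 0.98  # tumour cell fraction by flow cytometry (at least 98% in the text)
    for alt, depth in [(45, 100), (10, 100), (5, 10)]:
        print(alt, depth, classify_mutation(alt, depth, purity))
```

Copy-number changes or loss of heterozygosity at the locus would change the expected allele fraction, so a fuller implementation would take local copy number into account.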
Additionally, array-comparative genomic hybridization was performed for 230 cases using the SurePrint G3 Human aCGH Microarray 1M (Agilent Technologies) at qGenomics (http://www.qgenomics.com). Nexus 6.0 Discovery Edition software (Biodiscovery) was used for global analysis and visualization. Copy-number-neutral loss of heterozygosity (LOH) was called only when the altered region was larger than 5 Mb. Acquired copy-number-neutral LOH was observed in 28 regions, 16 of which affected known driver genes that already contained mutations, resulting in homozygous deletion of mir-15a/mir-16 at 13q14 or in inactivation of ATM and TP53 (Supplementary Table 4). Following criteria from the literature, chromothripsis was considered present when at least seven switches between two or more copy number states were detected on an individual chromosome in which LOH was retained, and chromoplexy was defined as at least three chained chromosomal rearrangements detected in a tumour27, 47. In one case for which genotyping data were not available, we used exome2cnv48 to identify CNAs from WES data. To identify structural-variant breakpoints in WGS data, we used SMUFIN24, a program that directly compares sequence reads from normal and tumour samples to identify chromosomal breakpoints corresponding to large structural variants at base-pair resolution. We analysed 150 tumour/normal whole-genome pairs, setting the cross-sample contamination filter to 5%. Two WGS tumours (019 and 029) showed an abnormal number of breakpoints because some of their sequencing lanes had high error rates that interfered with SMUFIN; these tumours were excluded from this analysis. All predicted breakpoints that could not be confirmed by manual inspection of the BAM file were discarded.
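The switch-counting part of the chromothripsis criterion can be sketched as follows. Segments are assumed to be `(chromosome, start, end, copy_number)` tuples with integer copy-number states, and the LOH condition is assumed to be evaluated beforehand and passed in as a set of eligible chromosomes; this is an illustration of the counting rule, not the pipeline used in the study, and the names are hypothetical.

```python
from collections import defaultdict

# (chromosome, start, end, integer copy-number state)
Segment = tuple[str, int, int, int]


def copy_number_switches(segments: list[Segment]) -> dict[str, int]:
    """Count copy-number state changes between consecutive segments on each chromosome."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, _end, cn in segments:
        by_chrom[chrom].append((start, cn))
    switches: dict[str, int] = {}
    for chrom, segs in by_chrom.items():
        segs.sort()  # order segments by start position
        states = [cn for _, cn in segs]
        switches[chrom] = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return switches


def chromothripsis_candidates(segments: list[Segment], eligible: set[str],
                              min_switches: int = 7) -> list[str]:
    """Chromosomes with at least `min_switches` state changes.

    Any switch implies at least two distinct states, so the 'two or more states'
    condition is satisfied automatically. `eligible` holds the chromosomes that pass
    the LOH criterion, which is assessed outside this function.
    """
    switches = copy_number_switches(segments)
    return sorted(c for c, n in switches.items() if n >= min_switches and c in eligible)


if __name__ == "__main__":
    # hypothetical segmentation: chromosome 3 oscillates between two states
    segs = [("3", i * 1_000_000, (i + 1) * 1_000_000, 2 if i % 2 == 0 else 1) for i in range(8)]
    segs += [("5", 0, 10, 2), ("5", 10, 20, 1), ("5", 20, 30, 2)]
    print(copy_number_switches(segs), chromothripsis_candidates(segs, {"3", "5"}))
```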
Overall, 48 of 53 (91%) selected predicted breakpoints were verified by PCR amplification followed by Sanger sequencing (Supplementary Table 5), a verification rate similar to that observed in our initial description of the method24. In addition, custom scripts were used to identify potential translocations involving immunoglobulin genes in WGS or WES data. This identified ten cases (5 WGS and 5 WES) with putative translocations involving the BCL2 locus (nine with t(14;18)(q32;q21) and one with t(2;18)(p11;q21)), all of which were confirmed by fluorescence in situ hybridization (FISH), cytogenetics or PCR (Extended Data Fig. 3). Conventional cytogenetics was performed on Giemsa-banded chromosomes (G-banding) obtained after a 72-h culture with tetradecanoyl-phorbol-acetate stimulation. At least 20 G-banded metaphases per sample were analysed. Results were described according to the International System for Human Cytogenetic Nomenclature. FISH analyses on fixed cells were performed using probes for 11q23/ATM, 13q14.3 and 17p13/TP53 deletions and for trisomy 12 (Abbott Molecular). Two hundred nuclei were examined for each probe. The LSI IGH/BCL2 dual-colour fusion probe for t(14;18)(q32;q21) (Abbott Molecular) was used to confirm BCL2 rearrangements detected by WGS and WES. Additionally, in case 853, whole-chromosome painting of chromosomes 8, 11 and X was performed to determine the complex karyotype (with four derivative chromosomes) and to verify rearrangements predicted by the SMUFIN algorithm. DNA methylation was analysed using the 450k Human Methylation Array (Illumina). We used the EZ DNA Methylation Kit (Zymo Research) for bisulphite conversion of 500 ng of genomic DNA, and the Infinium methylation assay was carried out as described by the manufacturer49, 50. These array experiments were performed at CeGen (http://www.cegen.org).
Data from the 450k Human Methylation Array were analysed in R using the minfi package (version 1.6.0)51, available through Bioconductor, applying several custom filters. Unsupervised analyses were performed by principal component analysis, and differential methylation between individual CLL/MBL samples and controls was detected using a threshold of 0.25 on the absolute methylation difference. We studied the gene expression profiles of 468 cases using highly purified leukaemic CLL cells. Total RNA was extracted with TRIzol reagent following the manufacturer's recommendations (Invitrogen Life Technologies). RNA integrity was examined with the Agilent 2100 Bioanalyzer (Agilent Technologies), and only high-quality RNA samples were hybridized to Affymetrix Human Genome U219 Array Plates according to Affymetrix standard protocols. Summarized expression values were computed using the robust multichip average approach implemented in the Expression Console Software (Affymetrix Inc.). cDNA was synthesized from 500 ng of total RNA using the High Capacity RNA-to-cDNA kit (Life Technologies) following the manufacturer's instructions. Amplification was performed on 50 ng of DNA using the Qiagen Multiplex PCR Kit (Qiagen); each 25-μl reaction contained 1× Qiagen Multiplex PCR Master Mix (12.5 μl), primer mix (0.4 μM of each primer) and RNase-free water. For NOTCH1 within-intron splicing, the primers used were forward 5′-CCTAACAGGCAGGTGATGCT-3′ and reverse 5′-TACTCCTCGCCTGTGGACAA-3′. PCR amplification was also performed for the NOTCH1 3′ UTR (forward 5′-CCTAACAGGCAGGTGATGCT-3′, reverse 5′-ATCTGGCCCCAGGTAGAAAC-3′), the first PAX5 enhancer region (forward 5′-TAGATTGTGCCGAATGCTGA-3′, reverse 5′-ACAAGCTCTCCTCCCAGGAA-3′) and the second PAX5 enhancer region (forward 5′-AGGATGAGAACGGGCAAAC-3′, reverse 5′-GGAGCTTCCAGCTGAACTGA-3′).
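The differential methylation rule can be sketched on β-values (the fraction of methylated signal at each CpG). Computing the difference on β-values, comparing each sample with the mean of the controls, and treating exactly 0.25 as a hit (`>=`) are all assumptions; the text does not say how the controls were summarised. CpG identifiers and the function name are hypothetical.

```python
from statistics import mean


def differential_cpgs(sample: dict[str, float],
                      controls: list[dict[str, float]],
                      threshold: float = 0.25) -> dict[str, float]:
    """Return {cpg: delta} for CpGs whose beta differs from the control mean by >= threshold.

    Positive deltas indicate hypermethylation in the sample, negative ones hypomethylation.
    CpGs missing from every control are skipped.
    """
    hits: dict[str, float] = {}
    for cpg, beta in sample.items():
        ctrl = [c[cpg] for c in controls if cpg in c]
        if not ctrl:
            continue
        delta = beta - mean(ctrl)
        if abs(delta) >= threshold:
            hits[cpg] = delta
    return hits


if __name__ == "__main__":
    sample = {"cg_a": 0.9, "cg_b": 0.5, "cg_c": 0.1}  # hypothetical beta-values
    controls = [{"cg_a": 0.2, "cg_b": 0.4, "cg_c": 0.6},
                {"cg_a": 0.3, "cg_b": 0.4, "cg_c": 0.7}]
    print(differential_cpgs(sample, controls))
```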
All PCR products were run on a capillary electrophoresis gel (QIAxcel Advanced System, Qiagen) with the QIAxcel DNA Screening Kit (Qiagen). For western blot analysis, tumour cells were lysed for 30 min in Triton buffer (1% Triton X-100, 50 mM Tris–HCl, pH 7.6, 150 mM NaCl, 1 mM EDTA) supplemented with protease and phosphatase inhibitors (1 mM PMSF, 2 mM sodium pyrophosphate, 2 mM sodium β-glycerophosphate, 1 mM NaF, 1 mM sodium orthovanadate, 10 μg ml−1 leupeptin and 10 μg ml−1 aprotinin). Lysates were cleared by centrifugation at 15,000g at 4 °C for 15 min, and protein concentrations were determined using the Bradford method. Thirty micrograms of protein were separated by SDS–PAGE and transferred onto Immobilon-P membranes. Membranes were blocked with 2.5% phosphoBlocker (Cell Biolabs) in TBS-Tween 20. For protein immunodetection, the following primary antibodies were used: anti-cleaved NOTCH1 (Val1744) (D3B8; Cell Signaling Technology) and anti-β-actin (Sigma). Horseradish peroxidase-labelled anti-rabbit and anti-mouse IgG (Sigma) were used as secondary antibodies. Chemiluminescence was detected using ECL substrate (Pierce) on a mini-LAS4000 Fujifilm device (GE Healthcare). NOTCH1 immunohistochemical staining was performed on a Leica Bond system using formalin-fixed paraffin-embedded tissue sections52. Samples were pre-treated by heat-mediated antigen retrieval with EDTA buffer (pH 9.0; epitope retrieval solution 2, HIER2) for 30 min. Sections were then incubated with anti-cleaved NOTCH1 rabbit monoclonal antibody (clone D3B8, catalogue number 4147, Cell Signaling Technology) at a final concentration of 8.5 μg ml−1 for 60 min at room temperature, and binding was detected using a horseradish peroxidase (HRP)-conjugated compact polymer system. DAB was used as the chromogen. Sections were then counterstained with haematoxylin and mounted with DPX.
PCR products were treated with ExoSAP-IT (USB Corporation) and sequenced with ABI Prism BigDye Terminator v3.1 (Applied Biosystems) and 5 pmol of each primer. Sequencing reactions were run on an ABI-3730 automated sequencer (Applied Biosystems). All sequences were examined with the Mutation Surveyor DNA Variant Analysis Software (Softgenetics). ChIP-seq was performed on normal B-cell subpopulations and on cells (>90% tumour cell content) from a CLL patient with mutated IGHV; DNase-seq was performed only on the latter. Both followed standard protocols developed within the Blueprint Consortium. In brief, cells for ChIP-seq were fixed for 8–16 min in 1% formaldehyde at 4 °C, and chromatin was sonicated for 15 min with a Bioruptor (Diagenode). Chromatin fragments of 50–500 bp were selected, and immunoprecipitation was carried out with Diagenode antibodies against H3K4me3 (pAb-003-050 lot: A5051-001P), H3K4me1 (pAb-194-050 lot: A1863-001P) and H3K27ac (pAb-196-050 lot: A1723-0041D), using approximately 500,000 cells per antibody. DNase I digestion was performed using 60 units of the enzyme (Sigma) and 2.5 million cells. ChIP-seq and DNase-seq libraries were constructed using the Kapa Hyper Prep Kit (Kapa Biosystems). For each experiment, 25 to 50 million reads were sequenced on an Illumina HiSeq2000 sequencer. Detailed protocols can be obtained from the Blueprint Consortium (http://www.blueprint-epigenome.eu/index.cfm?p=7BF8A4B6-F4FE-861A-2AD57A08D63D0B58). 4C-seq template generation and amplification were performed as previously described53, 54. In brief, 1 × 10⁷ cells from each of two CLL patients were crosslinked with 2% formaldehyde (Merck), and chromatin was digested with DpnII (New England Biolabs) and then ligated with T4 ligase (Roche). Next, chromatin was decrosslinked, and DNA was digested with Csp6I (NEB) and re-ligated.
PCR amplification of viewpoint regions and their ligated fragments was performed using primers 5′-TGCCACACCTCCTTTTGATC-3′ and 5′-CCTTGTGGAAAGAGTCTCAC-3′ (PAX5 putative enhancer, viewpoint fragment-end chr9:37,370,916-37,371,635) or 5′-CCGAGCTGGGGTAGCTGATC-3′ and 5′-TTGTGTCCAAAAGTTGTTTG-3′ (PAX5 promoter, viewpoint fragment-end chr9:37,033,553-37,034,192). Samples were sequenced on a MiSeq instrument (Illumina) with 50-bp single-end reads, adding 5% PhiX control DNA. Data analysis was performed using 4Cseqpipe version 0.7 (May 2012; downloaded from http://compgenomics.weizmann.ac.il/tanay/). Before the interacting regions were mapped to the genome, reads resulting from undigested templates or from self-ligation of the viewpoint fragment were removed. The human PAX5 enhancer was deleted or mutated in Ramos cells and in an Epstein–Barr virus (EBV)-transformed lymphoblastoid B-cell line using CRISPR/Cas9 genome editing. Guide RNAs (gRNAs) were designed using the E-CRISP tool (http://www.e-crisp.org/E-CRISP/index.html)55. For the deletions, four gRNAs were designed flanking the PAX5 enhancer, two on each side (L1/L2 and R1/R2), to be used in combination (L1+R1, L1+R2, L2+R1 and L2+R2). In addition, two gRNAs were designed to target sites of mutations found in CLL (M1/M2). The gRNA sequences are: L1, 5′-GGGAACCAGGGCGTGGGAGC-3′; L2, 5′-GTGAGGCAGAAACACCACAG-3′; R1, 5′-GGCAGCATGCGGGCGTCATG-3′; R2, 5′-GCCAGGACCTGCTCTCCCAA-3′; M1, 5′-GTGAAAATTTACTCATGCTG-3′; and M2, 5′-GGTGGTACTCAGAGGCTGGG-3′. The gRNA oligonucleotides were cloned into the pL-CRISPR.EFS.GFP vector (Addgene plasmid 57818)56, and lentiviral particles were produced in HEK293T cells by cotransfection with Gag-Pol- and vesicular stomatitis virus G (VSV-G)-expressing vectors using JetPEI transfection reagent (Polyplus). Viral supernatants were collected after 48 h and used to infect Ramos and EBV-transformed lymphoblastoid B cells by spinoculation.
After infection, green fluorescent protein (GFP)-positive cells were sorted (BD Influx, BD Biosciences) and grown for 1 week. Total RNA was extracted with TRIzol (Invitrogen) and converted into cDNA with the SuperScript First-Strand Synthesis System (Invitrogen). Human PAX5 expression was then determined by quantitative real-time PCR (FastStart Universal SYBR Green Master Mix, Roche) on a 7500 Real-Time PCR system (Applied Biosystems), with GAPDH as the normalization control. The following primers were used: PAX5 forward, 5′-GAGCGGGTGTGTGACAATGA-3′; PAX5 reverse, 5′-GCACCGGAGACTCCTGAATAC-3′; GAPDH forward, 5′-GAAGGTGAAGGTCGGAGT-3′; and GAPDH reverse, 5′-GAAGATGGTGATGGGATTTC-3′. To analyse the efficiency of the CRISPR/Cas9-induced deletions, DNA was extracted and the PAX5 enhancer was PCR-amplified using HotStarTaq DNA Polymerase (Qiagen) and PAX5 enhancer-flanking oligonucleotides (forward) 5′-GTTGTCTTGGAGGACTTTCAG-3′ and (reverse) 5′-GTGTTATTGTGTATGTGGCAG-3′. To detect CRISPR/Cas9-induced mutations, we performed heteroduplex cleavage assays using the Guide-it Mutation Detection Kit (Clontech) with primers (forward) 5′-AGGATGAGAACGGGCAAAC-3′ and (reverse) 5′-GGAGCTTCCAGCTGAACTGA-3′. Fisher's exact test or non-parametric tests were used to compare clinical and biological variables between MBL and CLL and according to the presence or absence of each of the drivers analysed here. We evaluated the clinical effect (time to treatment (TTT) and overall survival) of all mutated driver genes and of chromosomal regions with recurrent CNAs present in 5 (1%) or more patients. TTT was evaluated only in patients with Binet stage A or B. TTT and overall survival curves from the date of sampling were plotted by the Kaplan–Meier method and compared using the log-rank test57. We examined separately the prognostic impact of point mutations in driver genes (substitutions or small indels) and of CNAs.
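The log-rank comparison of survival curves can be illustrated with a minimal two-group test in pure Python. This is a textbook implementation sketched for illustration, not the SPSS or R routine used in the study. Inputs are follow-up times with an event flag (1 = event such as treatment or death, 0 = censored), and the function name and example data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P value)."""
    data = [(t, bool(e), 0) for t, e in zip(times_a, events_a)]
    data += [(t, bool(e), 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        n = at_risk_a + at_risk_b
        # observed minus expected events in group A at this time point
        obs_minus_exp += deaths_a - deaths * at_risk_a / n
        if n > 1:
            # hypergeometric variance of the number of events in group A
            variance += deaths * (at_risk_a / n) * (at_risk_b / n) * (n - deaths) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # hypothetical follow-up (months) for carriers vs non-carriers of a driver
    carriers = ([5, 8, 12, 20, 30], [1, 1, 1, 0, 1])
    non_carriers = ([15, 40, 55, 60, 72], [1, 0, 1, 0, 0])
    print(logrank_test(*carriers, *non_carriers))
```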
The clinical impact (TTT) of TP53, ATM and BIRC3 mutations was broadly similar to that of loss of the respective chromosomal region, that is, del(17p) (TP53) and del(11q) (ATM and BIRC3) (Extended Data Fig. 8). Therefore, to evaluate the prognostic impact of each gene/region, both types of alteration were combined. Although the clinical effect of deletions and mutations was somewhat different for del(6q15)/ZNF292 (Extended Data Fig. 8), we also combined these two alterations because most point mutations in ZNF292 were truncating. Finally, the number of cases with mutations or CNAs in 6p21/NFKBIE, 10q24/NFKB2 and 15q15/MGA was too small for separate analyses, so both types of alteration were also combined for these regions. Multivariate Cox regression analysis was used to assess the prognostic impact of each driver on patient outcome independently of Binet stage and IGHV mutational status. The proportional hazards assumption was checked using Schoenfeld's test. All P values were adjusted for multiple comparisons using the Benjamini–Hochberg correction. All statistical tests were two-sided, and results were considered significant at an adjusted P ≤ 0.05. All analyses were performed using SPSS 20 (http://www.ibm.com) or R v3.1.3. Recurrently mutated genes in CLL were defined considering the number and type of mutations, gene size and coverage, and the local density of mutations derived from the 150 CLL/MBL WGS studies.
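The Benjamini–Hochberg step can be sketched as the standard step-up procedure: sort the P values, scale each by m/rank, and enforce monotonicity from the largest P downwards. It should give the same values as R's `p.adjust(method = "BH")`; the input list below is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest P down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-driver P values
    adj = benjamini_hochberg(raw)
    print([round(a, 3) for a in adj], [a <= 0.05 for a in adj])
```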
To test whether a gene was mutated more frequently than expected by chance, we calculated the basal probability for each gene to suffer a non-synonymous mutation ($P_g$) as: In this equation, $n_{ns}$ is the total number of possible non-synonymous mutations for this gene, $n_{s}$ the total number of possible synonymous mutations, $L$ is the effective length of the gene open reading frame (ORF), defined as the sum of the number of bases of the ORF for that gene that are callable at 10× coverage in all exomes or whole genomes analysed, and $E$ is the effective length of all coding regions analysed, defined as the sum of the total lengths of the coding regions that are callable at 10× coverage in all exomes or whole genomes. Finally, $\delta$ is the local density of mutations for this locus, which is determined by dividing the number of somatic mutations identified in the 150 WGS studies analysed in a 0.5-Mb region centred on the gene of interest. Thus, the probability $P$ of finding $M$ or more non-synonymous mutations in a given gene from a total of $N$ somatic mutations in all patients is:

$$P = \sum_{k=M}^{N} \binom{N}{k} P_g^{k} (1 - P_g)^{N-k}$$

A score is computed by taking the base-10 logarithm of this probability ($P$). Genes for which more than 10% of somatic mutations caused a synonymous change were removed. Finally, 1,000 Monte Carlo simulations were performed to estimate the false discovery rate (FDR) based on the total number of mutations observed ($N$) and the local mutational density of each gene. To identify genes that might be recurrently mutated in an IGHV subgroup, the same analysis was performed using only tumours of that subgroup (IGHV-MUT or IGHV-UNMUT), adjusting the local density of mutations for each subgroup according to the mutations obtained from WGS data. Genes were classified into three tiers (Extended Data Table 2). Tier 1 corresponds to genes identified as significantly mutated in CLL as described above.
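Assuming the probability of observing M or more non-synonymous mutations among N somatic mutations is a binomial upper tail with per-mutation probability P_g, the test and its log10 score can be sketched as follows. The basal probability P_g is taken as an input rather than computed, the Monte Carlo FDR step is not included, and the function names are hypothetical. Summing in log space avoids overflow when N is in the thousands.

```python
import math


def binomial_upper_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if m <= 0:
        return 1.0
    if m > n:
        return 0.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(m, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)  # negligible terms underflow harmlessly to 0.0
    return min(1.0, total)


def recurrence_score(m: int, n: int, p_basal: float) -> float:
    """Base-10 logarithm of the tail probability (more negative = stronger recurrence)."""
    # floor the probability so that log10 never receives 0
    return math.log10(max(binomial_upper_tail(m, n, p_basal), 1e-300))


if __name__ == "__main__":
    # hypothetical gene: 6 non-synonymous hits among 5,000 mutations, P_g = 1e-4
    print(binomial_upper_tail(6, 5000, 1e-4), recurrence_score(6, 5000, 1e-4))
```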
Tier 2 includes genes that were not significantly mutated in the analysis of all CLL cases but were significant when only one subgroup (IGHV-MUT or IGHV-UNMUT) was considered. Genes with either recurrent mutations affecting the same residue or predominantly loss-of-function mutations were also included in tier 2. Finally, tier 3 comprises genes that were not in tiers 1 or 2 but contained somatic mutations previously described as driver mutations in the literature. A sample size of at least 500 tumours was chosen during the ICGC study design, as this provides enough power to detect driver genes mutated in at least 3% of tumours19. Drugs with potential therapeutic interactions with the protein products of oncogenic drivers were retrieved as described37.
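The tier rules can be expressed as a small decision function, checked in order from tier 1 to tier 3. The field names are hypothetical, and the 50% cut-off used for "predominantly loss-of-function" is an assumption, since the text gives no threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEvidence:
    """Hypothetical per-gene summary of the evidence used for tiering."""
    significant_all_cll: bool      # recurrence test significant in the whole cohort
    significant_in_subgroup: bool  # significant only in IGHV-MUT or IGHV-UNMUT
    recurrent_residue: bool        # recurrent mutations affecting the same residue
    lof_fraction: float            # fraction of mutations predicted loss-of-function
    known_driver_mutation: bool    # carries mutations reported as drivers in the literature


def assign_tier(gene: GeneEvidence, lof_majority: float = 0.5) -> Optional[int]:
    """Return 1, 2 or 3 following the tier rules, or None if no rule applies."""
    if gene.significant_all_cll:
        return 1
    if (gene.significant_in_subgroup or gene.recurrent_residue
            or gene.lof_fraction > lof_majority):
        return 2
    if gene.known_driver_mutation:
        return 3
    return None


if __name__ == "__main__":
    print(assign_tier(GeneEvidence(False, True, False, 0.2, False)))
```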