Copy Number Variation in Human Health, Disease, and Evolution
Dr Melody Caramins
Acting Director, Genetic Laboratory Services
South Eastern Area Laboratory Services
Prince of Wales and Sydney Children’s Hospitals

Genetic variation
• Enormous amount of genetic variation – both inter-population and inter-individual.
• Many international collaborative efforts to catalogue genetic variation – HapMap, HVP, 1000 Genomes, etc.
• A common classification system for human variation is based on size
– Watson-Crick base-pair changes (single base changes): missense, nonsense, silent
– Larger variations, which in turn can be balanced or unbalanced

Variation and phenotype
• Spectrum of variation from no change in discernible phenotype, to alternative phenotypes which are of no medical consequence (“normal”), to medical consequences of varying severity (“disease susceptibility/pathogenic”)
• Effects of multiple variants may be additive or epistatic (see Girirajan et al. Nature Genetics 2010 and editorial in same issue by Veltman and Brunner), and may help account for variable expressivity.

Structural variants
• Umbrella term to encompass a group of microscopic or submicroscopic genomic alterations involving segments of DNA
• May be
– quantitative (copy number variants comprising deletions, insertions and duplications)
– and/or positional (translocations)
– and/or orientational (inversions).
• The gain or loss of genomic material is recognised by comparison of reference and sample genomes through hybridisation or sequence analysis, and is described in relation to the reference

Copy number variation (CNV)
• CNVs have been defined as “a segment of DNA that is 1 kb or larger and is present at a variable copy number in comparison with a reference genome” Feuk L et al. 2006 Nat. Rev. Genet.
• 1 kb is completely arbitrary; one could argue that a 2 bp del/dup is a CNV based on a chemical definition (a SNP changes only the base in the DNA, whereas the sugar-phosphate backbone needs to be disrupted/altered to make a CNV)

CNV and phenotype
• Large, microscopic genomic dosage effects were amongst the first “pathologies” detected in genetics (e.g. trisomy 21, 1958, Lejeune)
• Submicroscopic genomic duplications and deletions causing gene CNV (copy number variation) were shown to cause Mendelian traits such as α thalassaemia ~4 years after the advent of Southern blotting (1975).
• These were postulated at the time to have occurred via non-allelic homologous recombination (NAHR)

If it’s not new, then why the big deal now?
• In 1991, the first disease-associated submicroscopic duplications were identified (17p12, leading to CMT1A)
• The development and application of genomic techniques, e.g. array comparative genomic hybridisation (aCGH), in the last six years has enabled identification of genomic submicroscopic CNV
• Analysis of aCGH and NGS data since has enabled identification of CNV as a significant source of genetic variation

CNVs - characterisation
• Early descriptions and characterisation of CNV in normal individuals in 2004
– Many CNV contained functional elements/genes, and not “junk” DNA
– ~50% recurrent
• As of last update of DGV (Nov 2010)
– CNVs: 66741 (up from 38406 in 3/09)
– Inversions: 953
– InDels (100 bp-1 kb): 34229
– Total CNV loci: 15963
• Any individual on average carries 1000 CNV ranging from 443 bp to 1.28 Mb, with a median size of 2.9 kb (Conrad et al, 2010).

CNV characterisation (cont’d) - Li et al PLoS ONE 2009
• Affymetrix 500K Array
• Discover/characterise CNVs and study differences between Caucasians (n=1000) and Chinese (n=700).
• Identified ~3000 CNV
• CNV account for ~8% of genome in each ethnic group (CNVs included in DGV database reported to cover 29.7% of genome)
• Only 15% of CNV overlapped

Figure: Direct comparison of two CNV surveys using the same SNP array platform and CNV calling algorithms. Pinto D et al. Hum. Mol. Genet. 2007;16:R168-R173

CNV by NGS
• Due to the limit of read length, many larger CNV are not well covered by NGS
• Paired-end reads and paired-end mapping enable detection of kb-sized structural variants
• NGS can also detect structural rearrangements (e.g. inversions) not identified by CGH
• Continuous size distribution of SVs in the human genome; smaller = most frequent

CNV by NGS
• In principle it should be possible to use new NGS technologies to identify all forms of SV by combining paired-end read analyses with read-depth analyses.
• In reality, this is still quite an analytical challenge, and ?not robust enough, particularly for diagnostic use.

CNV contribution to phenotype
• Many well known disease phenotypes, with more being identified, in addition to identification of “susceptibility” loci
• Preliminary data indicate some contribution of CNV to gene expression, but not as much as SNP (8.75%-17.7% vs.
83.6%-92.5%)
• More structural variations expected to be uncovered as sequencing gaps in the human genome are closed and multicenter population genetics efforts further our knowledge

Mechanisms of CNV formation
• Four major mechanisms:
– Non-allelic homologous recombination (NAHR)
– Non-homologous end joining (NHEJ)
– Fork stalling and template switching (FoSTeS)
– L1-mediated retrotransposition

NAHR
• Longest known common mechanism, therefore best characterised
• Caused by misalignment in meiosis and mitosis between two low-copy repeats (LCRs), Alu repeats, or pseudogenes, followed by crossing over
• NAHR between repeats on different chromosomes can lead to chromosomal translocation
Figures adapted from Vogel & Motulsky

NAHR cont’d
• NAHR in germ line cells (meiosis) leads to constitutional genomic rearrangements that can be manifest as genomic disorders. Genomic disorders can be either inherited (e.g. HNPP) or sporadic (e.g. Smith-Magenis syndrome).
• NAHR in somatic cells (mitosis) can result in mosaic populations of somatic cells carrying genomic rearrangements, and is well documented in many cancers and mosaic genomic disorders (e.g. somatic NF1 deletions and segmental neurofibromatosis)

NAHR cont’d
• For NAHR to take place, there must be segments of a minimal length sharing extremely high similarity or identity between the LCRs, named minimal efficient processing segments (MEPS)
– 300-500 bp in humans
• Reciprocal deletions and duplications do not occur at the same frequencies. Some studies suggest two deletions versus one duplication (Turner et al 2008 Nat Genet, Bayes et al 2003 Am J Hum Genet) but further confirmation is necessary

NAHR cont’d
• There seem to be differences in NAHR frequency between male and female gametogenesis in some instances
• Several genomic disorders show different percentage parental origins, e.g.
>95% of CMT1A duplications and 85% of spinal muscular atrophy deletions originate in spermatogenesis; 80% of NF1 deletions are of maternal origin; some (e.g. SMS) show no significant parental origin differences
• ?intrinsic differences in NAHR in germ lines
• ?selection bias against the rearranged allele
• combination of both

Non Homologous End Joining (NHEJ)
• NHEJ is one of two major mechanisms used by eukaryotic cells to repair double-strand breaks (DSBs)
• Described in organisms from bacteria to mammals
• NHEJ is routinely utilized by human cells to repair both 'physiological' DSBs (V(D)J recombination) and 'pathological' DSBs (ionizing radiation, ROS damage).

NHEJ (cont’d)
• Inherited defects in NHEJ account for about 15% of human severe combined immunodeficiency (SCID)
• NHEJ is also currently considered to be the major mechanism rejoining translocated chromosomes in cancer
• Unlike NAHR, NHEJ does not need an obligatory substrate, such as LCRs
• However, breakpoints of NHEJ-mediated rearrangements often fall within repetitive elements (LINE, Alu) with TTTAAA motifs in proximity

NHEJ (cont’d)
• NHEJ proceeds in four steps
– detection of DSB;
– molecular bridging of both broken DNA ends;
– modification of the ends to make them compatible and ligatable;
– final ligation step
• This process determines the two important characteristics of NHEJ:
1. neither LCRs nor MEPS are obligatorily required for NHEJ;
2. NHEJ leaves an 'information scar' at the rejoining site, as the pre-rejoining editing of the ends includes cleavage or addition of several nucleotides from or to the ends

NHEJ – deletions in DMD
• Two studies sequenced the breakpoints of 19 patients with muscular dystrophy due to non-recurrent deletions in introns 47 and 48 of the DMD gene.
(Nobile et al 2002, Toffolatti et al 2002)
• Deletions were not flanked by LCRs and junctions showed:
– microhomology (2 to 4 nucleotides) in 7/19 cases
– short insertions (1 to 5 nucleotides) in three cases
– short duplications of surrounding fragments up to 25 bp in three cases.
– Other junctions either contained short sequences of unknown origin or no microhomology, ?due to the editing process in NHEJ.

Fork Stalling and Template Switching
• New mechanism proposed by Lee et al in 2007 after observing complicated rearrangements not compatible with NHEJ or NAHR when studying the PLP1 region by dense custom aCGH (2 probes/kb)
• Observed duplications interrupted by triplicated or deleted fragments, or fragments with normal copy numbers.

FoSTeS
• According to this model, during DNA replication,
– the replication fork stalls at one position,
– the lagging strand disengages from the original template,
– transfers and then anneals, by virtue of microhomology (2-5 bp) at the 3' end, to another replication fork in physical proximity (not necessarily adjacent in primary sequence)
– 'primes', and restarts the DNA synthesis

FoSTeS
• Breakpoint sequencing data has shown that 22% of CNV breakpoints are highly complex and consistent with FoSTeS events
• ?
Mechanism of some del/dups in DMD, as well as CNV at LIS1 and MECP2 loci
• A generalised form of this model has been proposed to underlie structural variations in genomes from all domains of life, leading not only to rearrangements but also to the creation of LCRs that provide the homology for NAHR and predispose to more genomic rearrangements

L1 retrotransposition and CNV
• ~500,000 long interspersed element-1 (L1) copies are present in the human genome and comprise ~17% of it
• However, only ~0.02% are intact and encode proteins (ORF1, RNA-binding protein; ORF2, endonuclease and reverse transcriptase activity)
• Retrotransposition involves:
– Transcription of L1 DNA to L1 RNA → reverse transcription to L1 cDNA → integration into a new genomic site

L1 retrotransposition and CNV
• L1 retrotransposition thought to be a major contributor to SV at the haemophilia A locus, and a mechanism underlying exon shuffling
• Large amount of inter-individual variation in mobility of L1 elements; 0-390% variation in mobilisation capacity for a reference L1 (Kazazian et al PNAS 2006)
• Retrotransposition in germ cells thought to be uncommon, with most events occurring in embryonic development, and contributing to somatic mosaicism/gene expression differences
– e.g.
neuronal gene expression in learning/behaviour/memory
• Existing data suggest de novo rates for CNV may be orders of magnitude greater than for SNPs; 1 × 10⁻⁶ vs 1 × 10⁻⁸ per locus/generation
• Variability of CNV mutation rate across genomic loci likely reflects differences in genomic architecture and therefore CNV mechanism
• CNV have been found to be responsible for sporadic traits, Mendelian traits, and complex traits

Molecular mechanism of disease causation
• gene dosage
• gene interruption
• gene fusion
• position effects
• unmasking of recessive alleles
• potential transvection
• Clinical findings associated with CNV imbalance are archived in DECIPHER (~60 syndromes included)
– http://www.decipher.sanger.ac.uk
• Other databases include
– ECARUCA (4700 cases, 6700 aberrations, searchable by aberration, feature, institution)
– CHOP (http://www.cnv.chop.edu)
– DGV (http://projects.tcag.ca/variation/)

CNV in disease
• There is a great level of complexity between the presence of CNV and the resulting phenotypes, which is not only a direct consequence of specific altered gene dosage.
• Analyses of genome-wide functional impact of these structural variants showed that CNV changes not only cause alterations in expression levels of genes within them but also influence the expression of genes in their vicinity
• Moreover, a recent study demonstrated that the presence of structural changes associated with CNV is enough to cause a phenotype independent of gene dosage

CNV in Disease - Recurrent Genomic Rearrangements
• Several well known microdeletion/reciprocal microduplication syndromes
– CMT1A/HNPP
– 22q11.2 microduplication and DiGeorge/VCFS
– SMS/PTLS
• Five recurrent disease-associated CNV were actually predicted/found based on the knowledge of genome architecture and the “rules” for the mechanism of NAHR (1q21.1, 15q13, 15q24, 17q12, and 17q21.31) Sharp et al Am J Hum Genet 2005

RAI1: A Dosage Sensitive Gene Related to Neurobehavioral Alterations Including Autistic Behavior
• Smith-Magenis (SMS) and Potocki-Lupski (PTLS) syndromes are associated with a reciprocal microdel/dup at 17p11.2.
• The dosage sensitive gene responsible for most phenotypes in SMS has been identified as the Retinoic Acid Induced 1 (RAI1) gene
• Birth prevalence estimated at 1/20,000

CNV in disease – Non Recurrent Genomic Rearrangements
• Some non-recurrent rearrangements can often be as common as recurrent rearrangements mediated by LCR/NAHR
• Suggests a predisposing role for genomic architectural features
• Different sized (0.2–2.6 Mb) microduplication CNVs involving MECP2 appear to be the most common non-recurrent pathogenic subtelomeric microduplication
• Deletion CNVs/loss-of-function mutations of MECP2 →
– Rett syndrome, neurodevelopmental disorder affecting ~1:10,000 girls.
– Lethal in males.
• Dups →
– DD/MR with hypotonia, absent speech, recurrent infections in males
– Behavioural/psychiatric symptoms in female carriers

IR
• Almost 3 years
• Developmental delay
• Recurrent pneumonia
• Asthma
• Strabismus
• Referred due to abnormality on array CGH (arranged by paediatrician)

CGH array – del 14q24.3-14q31.1
minimum size 7.355 Mb
10 OMIM listed genes
• OMIM gene or syndrome: ALDH6A1, MMSDH
  Disorder: Methylmalonate semialdehyde dehydrogenase deficiency (3)
  OMIM Database 603178: Aldehyde dehydrogenase 6 family, member A1 (methylmalonate semialdehyde dehydrogenase)
• OMIM gene or syndrome: CHX10, HOX10, MCOP2, MCOPCB3
  Disorder: Microphthalmia, isolated 2, 610093 (3)
  Disorder: Microphthalmia, isolated, with coloboma 3, 610092 (3)
  OMIM Database 142993: C. elegans ceh-10 homeo domain-containing homolog
• OMIM gene or syndrome: NPC2, HE1
  Disorder: Niemann-Pick disease, type C2, 607625 (3)
  OMIM Database 601015: Epididymal secretory protein HE1
• OMIM gene or syndrome: EIF2B2
  Disorder: Leukoencephalopathy with vanishing white matter, 603896 (3)
  Disorder: Ovarioleukodystrophy, 603896 (3)
  OMIM Database 606454: Eukaryotic translation initiation factor 2B, subunit 2
• OMIM gene or syndrome: MLH3, HNPCC7
  Disorder: Colon cancer, hereditary nonpolyposis, type 7 (3)
  Disorder: Colorectal cancer, somatic, 114500 (3)
  Disorder: Endometrial cancer, 608089 (3)
  OMIM Database 604395: Mismatch repair gene MLH3
• OMIM gene or syndrome: TGFB3
  Disorder: Arrhythmogenic right ventricular dysplasia 1, 107970 (3)
  OMIM Database 190230: Transforming growth factor, beta-3
• OMIM gene or syndrome: ESRRB, ESRL2, DFNB35
  Disorder: Deafness, autosomal recessive 35, 608565 (3)
  OMIM Database 602167: Estrogen-related receptor beta
• OMIM gene or syndrome: POMT2
  Disorder: Walker-Warburg syndrome, 236670 (3)
  OMIM Database 607439: Putative protein O-mannosyltransferase 2
• OMIM gene or syndrome: GSTZ1, MAAI
  Disorder: Tyrosinemia, type Ib (1)
  OMIM Database 603758: Glutathione S-transferase, zeta-1 (maleylacetoacetate isomerase)
• OMIM gene or syndrome: TSHR, CHNG1
  Disorder: Hyperthyroidism, familial gestational, 603373 (3)
  Disorder: Hyperthyroidism, nonautoimmune, 609152 (3)
  Disorder: Hypothyroidism, congenital, nongoitrous, 1, 275200 (3)
  Disorder: Thyroid adenoma, hyperfunctioning, somatic (3)
  Disorder: Thyroid carcinoma with thyrotoxicosis (3)
  OMIM Database 603372: Thyroid-stimulating hormone receptor

Clinical relevance
Dominant conditions:
• TSHR – mutations associated with both hyper- and hypothyroidism and with thyroid adenoma and thyroid carcinoma.
• MLH3 – encodes a DNA mismatch repair protein that interacts with MLH1. Mutations in MLH3 associated with colorectal, endometrial and oesophageal cancer – reduced penetrance, ?low-risk gene.
• TGFB3 – activating mutations associated with arrhythmogenic right ventricular dysplasia type 1.
Recessive conditions:
• ALDH6A1 - Methylmalonate semialdehyde dehydrogenase deficiency
• CHX10 - Microphthalmia/anophthalmia +/- iris coloboma or other iris abnormalities
• NPC2/HE1 - Niemann-Pick disease, type C2
• EIF2B2 - Leukoencephalopathy with vanishing white matter
• ESRRB - Deafness, autosomal recessive 35 (non-syndromic)
• POMT2 – Muscular dystrophy-dystroglycanopathy (MDDG) - 3 subtypes: Walker-Warburg syndrome (type A2 or MEB), a less severe congenital form with mental retardation (type B2) and a milder limb-girdle form (type C2).
• MAAI - MAAI deficiency (clinically indistinguishable from Tyrosinemia, type Ib)

Other patients with distal interstitial deletion of 14q
• Developmental delay
• Impaired language
• Growth retardation
• Hypotonia
• Microcephaly
• Subtle dysmorphism
• Congenital heart defect
• Recurrent general infection – 1 patient

Clinical Consequences Of Copy-number Variations – Complex traits
• CNV also implicated in many complex neurological and psychiatric phenotypes
– Autism Spectrum Disorder
– Schizophrenia
– nuanced connection to various CNVs emerging as factors that are significantly associated but not independently causative for the phenotype
• The overall detection rate for genomic rearrangements in children with DD/MR +/- multiple congenital anomalies is ~12–18%
– 3–5% detected by banded karyotype,
– 9–15% detected by aCGH

CNV in cancer
• Cancer is caused by dysregulation of the expression and activity of genes, often mediated by germline or somatic mutations in oncogenes and tumour suppressor genes controlling cell growth and differentiation.
• Germline and somatic CNVs are now recognized as frequent contributors to the spectrum of mutations leading to cancer development

CNV in cancer
• Genome-wide analyses using SNP arrays have started to define the extent of somatic CNVs in cancer genomes.
• In ALL, analysis of leukaemia cells from 242 paediatric ALL patients showed structural rearrangements in genes encoding principal regulators of B-lymphocyte development and differentiation in 40% of cases
• In these patients, 54 recurrent somatic regions of deletion were identified that were not present in matched germline samples.
• Many of these deletions created fusion proteins in known oncogenes or led to other pathogenic mutations.
• Copy number changes in PAX5, a gene on the B-cell development pathway, were found in 57 of 192 B-progenitor ALL cases

CNV and evolution
• CNVs are preferentially located outside of genes and ultraconserved elements
• A significantly lower proportion of deletions than duplications overlaps with disease-related genes and RefSeq genes.
• Suggests del CNV are subject to purifying selection

CNV and evolution
• Gene duplication has been thought to be a central mechanism driving evolutionary changes
• 27.4% of the examined human genes represent CNVs in one or more of the 10 primate species
• Gains outnumber losses (gains/losses = 2.34)
• Suggests dup CNV are subject to positive selection

CNV and evolution
• Lineage-specific amplification of certain domains (DUF1220) – unknown function
• DUF1220 domains are approximately 65 amino acids in length and are encoded by a two-exon doublet, mainly on chr1 (1q21.1, also at 1p36, 1p13.3, and 1p12)
• Highly expanded in humans, reduced in African great apes, further reduced in orangutan and Old World monkeys, only single-copy in nonprimate mammals, and absent in nonmammalian species
• Expressed in the brain, in the hippocampus and within neocortical neurons
• Suggests expansion of DUF1220 in the human lineage is critical to higher cognitive functions
• High correlation of increasing copy number with increase in brain size

How are CNVs changing laboratory practice?
• Many more diagnoses being made
• Distinction between classical cytogenetics and molecular diagnostics disappearing
• Diagnostic arrays → less subjective interpretation, enhanced resolution and competitive costs
• Increasing demand for follow-up diagnostic assays, need for nuanced interpretive skills

How are CNVs changing clinical practice?
• Opened up analytical potential for clinical cases that previously eluded diagnosis
• Gives many patients and families the satisfaction of an explanation for their observed challenges
• More complex counselling
• Gaining insight into syndromology, with some improvement in explaining the spectrum of variation and the degree of consistency or inconsistency among phenotypic features

Considerations for lab and clinic
• The relevance of the results will differ if a CNV is uncovered:
1. in a known disease gene,
2. in a high-risk setting (such as during prenatal complications),
3. through a targeted list (such as an individual with a family history of a disease or as confirmation of an existing clinical diagnosis),
4. in a universal population screen

Challenges ahead
• Shift in mindset from genetic-based to genomic-based diseases
• Candidate gene searches within CNV will need to progress to analyses of added dimensions (gene and protein pathways/networks); bioinformatics tools will be essential
• Challenge to move beyond the obvious benefit of CNVs for diagnosing various phenotypes to their utility in prediction and prognosis
• May be burdened for some time with the ‘variant of unknown significance’
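
Worked sketch: read-depth copy number
The "CNV by NGS" slides mention read-depth analysis without showing how a copy number is derived from it. The following is a minimal illustration only, not a method taken from these slides or from any particular CNV caller. It assumes reads have already been counted in the same fixed windows for a sample and for a diploid reference; the function names, the window counts and the ±0.5 calling tolerance are all hypothetical choices made for the example.

```python
def estimate_copy_number(sample_counts, reference_counts, ploidy=2):
    """Estimate copy number per window from read depth (illustrative only).

    Hypothetical helper: both lists hold read counts for the same genomic
    windows. Counts are normalised by total library size, so the sketch
    assumes most windows are copy-neutral.
    """
    if len(sample_counts) != len(reference_counts):
        raise ValueError("count lists must cover the same windows")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("each library needs at least one read")

    estimates = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            # No reference coverage in this window: no estimate possible.
            estimates.append(None)
            continue
        # Depth ratio after library-size normalisation.
        ratio = (s * reference_total) / (r * sample_total)
        estimates.append(ploidy * ratio)
    return estimates


def classify(copy_number, ploidy=2, tolerance=0.5):
    """Label a window as loss, gain or normal (tolerance is an assumption)."""
    if copy_number is None:
        return "no call"
    if copy_number <= ploidy - tolerance:
        return "loss"
    if copy_number >= ploidy + tolerance:
        return "gain"
    return "normal"


# Made-up counts for five windows: the third looks like a heterozygous
# deletion, the fifth like a duplication.
sample = [100, 100, 50, 100, 150]
reference = [100, 100, 100, 100, 100]

estimates = estimate_copy_number(sample, reference)
calls = [classify(cn) for cn in estimates]
print(list(zip(estimates, calls)))
```

Real data would also need correction for GC content and mappability, and segmentation across neighbouring windows, which is part of why the slides describe this as still an analytical challenge for diagnostic use.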