Copy Number Variation in Human Health, Disease, and Evolution

Dr Melody Caramins
Acting Director, Genetic Laboratory Services
South Eastern Area Laboratory Services
Prince of Wales and Sydney Children’s Hospitals
Genetic variation
• Enormous amount of genetic variation – both
inter-population and inter-individual.
• Many international collaborative efforts to
catalogue genetic variation – HapMap, the Human
Variome Project (HVP), the 1000 Genomes Project, etc.
• A common classification system for human
variation is based on size:
– Watson-Crick base-pair changes (single-base
changes): missense, nonsense, silent
– Larger variations, which in turn can be balanced or
unbalanced
Variation and phenotype
• Spectrum of variation from no change in
discernible phenotype, to alternative
phenotypes which are of no medical
consequence (“normal”), to medical
consequences of varying severity (“disease
susceptibility/pathogenic”)
• Effects of multiple variants may be additive or
epistatic (see Girirajan et al., Nature Genetics 2010, and the editorial
in the same issue by Veltman and Brunner), and may help
account for variable expressivity.
Structural variants
• umbrella term to encompass a group of
microscopic or submicroscopic genomic
alterations involving segments of DNA
• may be
– quantitative (copy number variants comprising
deletions, insertions and duplications)
– and/or positional (translocations)
– and/or orientational (inversions).
• The gain or loss of genomic material is
recognised by comparison of reference and
sample genomes through hybridisation or
sequence analysis, and is described in
relation to the reference
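A minimal sketch of how a gain or loss is described relative to the reference on an array: each probe reports a log2(sample/reference) signal, so against a diploid reference a one-copy loss ideally gives -1 and a one-copy gain about 0.58. The ±0.3 calling thresholds and the function names are illustrative assumptions, not parameters of any particular platform; real arrays show compressed, noisy ratios.

```python
import math


def expected_log2_ratio(sample_copies: int, reference_copies: int = 2) -> float:
    """Idealised log2(sample/reference) signal for a single probe.

    Real arrays compress these values (noise, cross-hybridisation),
    so observed ratios are usually smaller in magnitude.
    """
    if sample_copies == 0:
        return float("-inf")  # homozygous deletion: no sample signal
    return math.log2(sample_copies / reference_copies)


def call_probe(log2_ratio: float, gain_threshold: float = 0.3,
               loss_threshold: float = -0.3) -> str:
    """Classify a probe relative to the reference (hypothetical thresholds)."""
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "normal"


for copies in (0, 1, 2, 3, 4):
    ratio = expected_log2_ratio(copies)
    print(copies, round(ratio, 3), call_probe(ratio))
```

Note that a duplication (3 vs 2 copies) gives a smaller shift than a heterozygous deletion (1 vs 2 copies), which is one reason gains are harder to call than losses.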
Copy number variation (CNV)
• CNVs have been defined as “a segment of
DNA that is 1 kb or larger and is present at a
variable copy number in comparison with a
reference genome” Feuk L et al. 2006 Nat. Rev. Genet.
• The 1 kb threshold is completely arbitrary; one could
argue that even a 2 bp deletion/duplication is a CNV on a
chemical definition (a SNP changes only the base in
the DNA, whereas the sugar-phosphate backbone must be
disrupted/altered to create a CNV)
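Because the 1 kb cut-off is a convention rather than a biological boundary, a size-based classifier is best written with the threshold as a parameter. The sketch below (hypothetical function name and labels) contrasts the Feuk et al. size definition with the chemical view: a substitution leaves the sugar-phosphate backbone intact, whereas a deletion or duplication alters it.

```python
def classify_variant(length_bp: int, alters_backbone: bool,
                     cnv_min_bp: int = 1000) -> str:
    """Label a variant by size; default threshold is the 1 kb Feuk et al. (2006) cut-off.

    alters_backbone=False: a base substitution (SNP-like), only the base changes.
    alters_backbone=True: a deletion/duplication, the backbone changes.
    """
    if length_bp < 1:
        raise ValueError("length_bp must be at least 1")
    if not alters_backbone:
        return "substitution"
    if length_bp >= cnv_min_bp:
        return "CNV"
    return "indel below CNV threshold"


print(classify_variant(1, alters_backbone=False))               # SNP
print(classify_variant(2, alters_backbone=True))                # 2 bp dup, 1 kb rule
print(classify_variant(2, alters_backbone=True, cnv_min_bp=2))  # "chemical" definition
print(classify_variant(2900, alters_backbone=True))             # median-sized CNV (2.9 kb)
```

Changing only `cnv_min_bp` turns the same 2 bp duplication from a sub-threshold indel into a CNV, which is the arbitrariness the slide points out.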
CNV and phenotype
• Large, microscopic genomic dosage effects
were amongst the first “pathologies” detected
in genetics (e.g. trisomy 21, 1958, Lejeune)
• Submicroscopic genomic duplications and
deletions causing gene CNV were shown to cause Mendelian
traits such as α-thalassaemia ~4 years after the
advent of Southern blotting (1975).
• At the time, these were postulated to have occurred via
non-allelic homologous recombination (NAHR)
If it’s not new, then why the big
deal now?
• In 1991, the first disease-associated
submicroscopic duplications were identified
(17p12, leading to CMT1A)
• The development and application of genomic
techniques, e.g. array comparative genomic
hybridisation (aCGH), in the last six years has
enabled genome-wide identification of submicroscopic CNV
• Subsequent analysis of aCGH and NGS data has
established CNV as a significant
source of genetic variation
CNVs - characterisation
• Early descriptions and characterisation of
CNV in normal individuals appeared in 2004
– Many CNVs contained functional elements/genes,
not “junk” DNA
– ~50% recurrent
• As of the last update of DGV (Nov 2010):
– CNVs: 66741 (up from 38406 in 3/09)
– Inversions: 953
– InDels (100 bp–1 kb): 34229
– Total CNV loci: 15963
• On average, an individual carries 1000
CNVs ranging from 443 bp to 1.28 Mb, with a
median size of 2.9 kb (Conrad et al., 2010).
CNV characterisation (cont’d) - Li et al.
PLoS ONE 2009
• Affymetrix 500K array
• Discover/characterise
CNVs & study differences
between Caucasians
(n=1000) and Chinese
(n=700).
• Identified ~3000 CNVs
• CNVs account for ~8% of the
genome in each ethnic
group (CNVs included in the
DGV database are reported to
cover 29.7% of the genome)
• Only 15% of CNVs
overlapped
Direct comparison of two CNV surveys using the same SNP array platform and CNV calling
algorithms.
Pinto D et al. Hum. Mol. Genet. 2007;16:R168-R173
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions,
please email: journals.permissions@oxfordjournals.org
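Concordance figures for CNV surveys depend heavily on how two calls are judged to be "the same". A widely used convention, assumed here and not necessarily the criterion used by Pinto et al., is 50% reciprocal overlap. The sketch below computes the fraction of one call set matched by another; the call sets are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the length of the longer interval.

    This equals the smaller of the two per-interval overlap fractions, so
    requiring >= 0.5 means each call covers at least half of the other.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def fraction_shared(calls_a: List[Interval], calls_b: List[Interval],
                    min_ro: float = 0.5) -> float:
    """Fraction of calls in calls_a matched by any call in calls_b at >= min_ro."""
    if not calls_a:
        return 0.0
    shared = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)
    )
    return shared / len(calls_a)


# Hypothetical call sets, for illustration only
survey_1 = [("chr1", 1_000, 5_000), ("chr2", 0, 2_000), ("chr3", 100, 200)]
survey_2 = [("chr1", 2_000, 6_000), ("chr2", 1_500, 9_000)]
print(fraction_shared(survey_1, survey_2))
```

Here the chr2 calls overlap physically but fail the 50% rule, so only one of three calls counts as shared; lowering `min_ro` would raise the apparent concordance.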
CNV by NGS
• Because of limited read length,
many larger CNVs are not well
covered by NGS
• Paired-end reads and
paired-end mapping enable
detection of kb-sized
structural variants
• NGS can also detect
structural rearrangements
(e.g. inversions) not identified
by CGH
• Continuous size distribution
of SVs in the human
genome, with smaller SVs the most frequent
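The paired-end mapping idea can be sketched as follows: mates come from fragments of roughly known size, so a pair whose mapped span on the reference is much larger than expected suggests a deletion in the sample, a much smaller span suggests an insertion, and an unexpected orientation suggests an inversion. The `ReadPair` structure, the 3-SD cut-off and the example library (400 ± 30 bp) are hypothetical; real callers cluster many discordant pairs and handle further signatures (e.g. tandem duplications) not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom: str
    start: int         # leftmost reference coordinate covered by the pair
    end: int           # rightmost reference coordinate covered by the pair
    same_strand: bool  # True if both mates map to the same strand


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  k: float = 3.0) -> str:
    """Toy paired-end mapping signature for a single read pair."""
    if pair.same_strand:
        return "inversion-like"   # mates in unexpected orientation
    span = pair.end - pair.start
    if span > mean_insert + k * sd_insert:
        return "deletion-like"    # reference has sequence missing from the sample
    if span < mean_insert - k * sd_insert:
        return "insertion-like"   # sample has sequence absent from the reference
    return "concordant"


for p in [ReadPair("chr1", 10_000, 10_410, False),
          ReadPair("chr1", 20_000, 22_400, False),
          ReadPair("chr1", 30_000, 30_200, False),
          ReadPair("chr1", 40_000, 40_400, True)]:
    print(p.start, classify_pair(p, mean_insert=400, sd_insert=30))
```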
CNV by NGS (cont’d)
• In principle it should be possible to use new
NGS technologies to identify all forms of SV
by combining paired end read analyses with
read-depth analyses.
• In reality, this is still quite an analytical
challenge, and ?not robust enough,
particularly for diagnostic use.
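The read-depth half of that combination rests on a simple proportionality: after normalisation, the number of reads in a genomic window scales with copy number. A minimal sketch, assuming expected per-window counts that are already corrected for GC content and mappability (for example from a reference panel) and a genome that is mostly diploid; both are assumptions of this example, and real data are far noisier.

```python
from statistics import median
from typing import List


def estimate_copy_number(sample_counts: List[int],
                         expected_counts: List[float],
                         ploidy: int = 2) -> List[int]:
    """Integer copy-number estimate per window from read depth.

    expected_counts: reads expected per window for a normal diploid genome
    (assumed pre-corrected and non-zero).
    """
    if len(sample_counts) != len(expected_counts):
        raise ValueError("one expected count is needed per window")
    ratios = [obs / exp for obs, exp in zip(sample_counts, expected_counts)]
    # Normalise for overall sequencing depth, assuming most windows are
    # diploid, so the median ratio corresponds to `ploidy` copies.
    scale = median(ratios)
    return [round(ploidy * r / scale) for r in ratios]


sample = [200, 205, 95, 300, 198, 0]   # hypothetical counts at ~2x baseline depth
expected = [100.0] * 6
print(estimate_copy_number(sample, expected))  # [2, 2, 1, 3, 2, 0]
```

Rounding to integers hides mosaicism and noise, which is part of why read depth alone is not yet robust enough for diagnostic calling.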
CNV contribution to phenotype
• CNVs underlie many well-known disease phenotypes,
with more being identified, and have also been
identified as “susceptibility” loci
• Preliminary data indicate some contribution
of CNV to gene expression, but not as much
as SNPs (8.75%–17.7% vs. 83.6%–92.5% of the
detected genetic variation in gene expression)
• More structural variations expected to be
uncovered as sequencing gaps in human
genome are closed and multicenter
population genetics efforts further our
knowledge
Mechanisms of CNV formation
• Four major mechanisms:
– Non-allelic homologous recombination (NAHR)
– Non-homologous end joining (NHEJ)
– Fork stalling and template switching (FoSTeS)
– L1-mediated retrotransposition
NAHR
• The longest-known common mechanism, and therefore the best
characterised
• Caused by misalignment, in meiosis or mitosis, between
two low-copy repeats (LCRs), Alu repeats or pseudogenes,
followed by crossing over
• NAHR between repeats on different chromosomes can
lead to chromosomal translocation
Figures adapted from
Vogel & Motulsky
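The reciprocal outcome of NAHR between directly oriented LCRs can be shown with a toy model in which a chromosome is an ordered list of segments (segment names and the function are hypothetical). Pairing the second LCR of one chromatid with the first LCR of the other, followed by crossing over, yields one product with two copies of the intervening segment B and one product lacking it; allelic pairing yields no change. LCR orientation is assumed direct; inverted LCRs, which would give inversions, are not modelled.

```python
from typing import List, Tuple

Chromosome = List[str]  # ordered segments, e.g. ["A", "LCR", "B", "LCR", "C"]


def nahr_crossover(chrom1: Chromosome, chrom2: Chromosome,
                   i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Crossover between segment i of chrom1 and segment j of chrom2.

    When i and j are different LCR copies (misalignment), the products are a
    reciprocal duplication and deletion of the sequence between the LCRs.
    """
    if chrom1[i] != "LCR" or chrom2[j] != "LCR":
        raise ValueError("crossover must occur within paralogous LCR copies")
    return chrom1[:i] + chrom2[j:], chrom2[:j] + chrom1[i:]


chrom = ["A", "LCR", "B", "LCR", "C"]
dup, dele = nahr_crossover(chrom, chrom, i=3, j=1)  # misaligned pairing
print(dup)   # ['A', 'LCR', 'B', 'LCR', 'B', 'LCR', 'C']
print(dele)  # ['A', 'LCR', 'C']
print(nahr_crossover(chrom, chrom, i=1, j=1))       # allelic pairing: no change
```

The model makes the reciprocity explicit; the slides note that, in practice, the deletion and duplication products are not observed at equal frequencies.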
NAHR cont’d
• NAHR in germline cells (meiosis) leads to
constitutional genomic rearrangements that
can manifest as genomic disorders.
Genomic disorders can be either inherited (e.g.
HNPP) or sporadic (e.g. Smith-Magenis
syndrome).
• NAHR in somatic cells (mitosis) can result in
mosaic populations of somatic cells carrying
genomic rearrangements, and is well
documented in many cancers and mosaic
genomic disorders (e.g. somatic NF1 deletions
and segmental neurofibromatosis)
NAHR cont’d
• For NAHR to take place, there must be
segments of a minimal length sharing
extremely high similarity or identity between
the LCRs, termed minimal efficient
processing segments (MEPS), 300-500 bp
in humans
• Reciprocal deletions and duplications do not
occur at the same frequencies. Some studies
suggest roughly two deletions for every duplication
(Turner et al. 2008 Nat Genet; Bayes et al. 2003 Am J Hum Genet), but
further confirmation is necessary
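The MEPS requirement can be expressed as a simple check: given two LCR copies already aligned without gaps (an assumption of this sketch), find the longest stretch of perfect identity and compare it with a MEPS threshold. The 300 bp default is the lower end of the 300-500 bp range quoted above; function names and sequences are hypothetical. The second example shows why overall similarity is not enough: a pair can be 99.5% identical yet lack a MEPS.

```python
def longest_identical_run(lcr_a: str, lcr_b: str) -> int:
    """Longest stretch of identical bases in two pre-aligned, gap-free sequences."""
    if len(lcr_a) != len(lcr_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    best = current = 0
    for x, y in zip(lcr_a.upper(), lcr_b.upper()):
        current = current + 1 if x == y else 0
        best = max(best, current)
    return best


def meets_meps(lcr_a: str, lcr_b: str, meps_bp: int = 300) -> bool:
    """True if the LCR pair shares an uninterrupted identical segment of >= meps_bp."""
    return longest_identical_run(lcr_a, lcr_b) >= meps_bp


# Hypothetical 600 bp LCR pairs
lcr_1 = "A" * 350 + "C" + "G" * 249
lcr_2 = "A" * 350 + "T" + "G" * 249  # one mismatch: longest identical run is 350 bp
lcr_3 = ("A" * 199 + "T") * 3        # 99.5% identical to lcr_4, but no run exceeds 199 bp
lcr_4 = "A" * 600
print(meets_meps(lcr_1, lcr_2))  # True
print(meets_meps(lcr_3, lcr_4))  # False
```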
NAHR cont’d
• There seem to be differences in NAHR
frequency between male and female
gametogenesis in some instances
• Several genomic disorders show differing
parental-origin percentages, e.g. >95% of
CMT1A duplications and 85% of spinal
muscular atrophy deletions originate in
spermatogenesis; 80% of NF1 deletions are of
maternal origin; some (e.g. SMS) show no
significant parental-origin differences
• Possible explanations:
– ?intrinsic differences in NAHR between the male and female germlines
– ?selection bias against the rearranged allele
– ?a combination of both
Non Homologous End Joining
(NHEJ)
• NHEJ is one of two major mechanisms used by
eukaryotic cells to repair double-strand breaks (DSBs)
• Described in organisms from bacteria to
mammals
• NHEJ is routinely used by human cells to
repair both 'physiological' DSBs (e.g. V(D)J
recombination) and 'pathological' DSBs
(e.g. from ionizing radiation or ROS damage)
NHEJ (cont’d)
• Inherited defects in NHEJ account for about
15% of human severe combined
immunodeficiency (SCID)
• NHEJ is also currently considered to be the
major mechanism rejoining translocated
chromosomes in cancer
• Unlike NAHR, NHEJ does not require a
homologous substrate such as LCRs
• However, breakpoints of NHEJ-mediated
rearrangements often fall within repetitive
elements (LINEs, Alus), with TTTAAA motifs
nearby
NHEJ (cont’d)
• NHEJ proceeds in four
steps:
– detection of the DSB;
– molecular bridging of both
broken DNA ends;
– modification of the ends to
make them compatible
and ligatable;
– final ligation
• This process determines two important
characteristics of NHEJ:
1. neither LCRs nor MEPS are required
for NHEJ;
2. NHEJ leaves an 'information scar' at the
rejoining site, because the pre-rejoining editing of the
ends involves removal or addition of several
nucleotides
NHEJ – deletions in DMD
• Two studies sequenced the breakpoints in
19 patients with muscular dystrophy due to
non-recurrent deletions in introns 47 and 48
of the DMD gene (Nobile et al. 2002, Toffolatti et al. 2002)
• Deletions were not flanked by LCRs, and
junctions showed:
– microhomology (2 to 4 nucleotides) in 7/19 cases
– short insertions (1 to 5 nucleotides) in three cases
– short duplications of surrounding fragments, up to
25 bp, in three cases
– Other junctions contained either short sequences
of unknown origin or no microhomology, ?due to
the editing process in NHEJ.
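Junction classification of this kind starts by measuring breakpoint microhomology: the stretch of sequence present at both ends of a deletion, which makes the exact junction position ambiguous. A minimal sketch for a simple deletion with no inserted bases (an assumption; insertions and duplications at the junction are not handled), using made-up sequences and a hypothetical function name:

```python
def junction_microhomology(ref: str, start: int, end: int) -> str:
    """Microhomology at the breakpoints of a deletion of ref[start:end] (0-based, half-open).

    Bases shared by both breakpoints can be placed on either side of the
    junction; their length is the microhomology length.
    """
    if not 0 < start < end < len(ref):
        raise ValueError("deletion must lie strictly inside the reference")
    # Extend leftwards: bases just before the deletion that match the end of the deleted segment.
    left = 0
    while (start - left - 1 >= 0 and end - left - 1 >= start
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1
    # Extend rightwards: start of the deleted segment matching bases just after it.
    right = 0
    while (start + right < end and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    return ref[start - left:start + right]


ref = "TTTTT" + "AG" + "C" * 8 + "AG" + "GGGGG"
print(repr(junction_microhomology(ref, 5, 15)))    # 'AG': 2 nt, within the 2-4 nt range
blunt = "TTTTT" + "CCCCC" + "GGGGG"
print(repr(junction_microhomology(blunt, 5, 10)))  # '': no microhomology
```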
Fork Stalling and Template
Switching
• New mechanism proposed by Lee et al. in
2007 after observing complex
rearrangements incompatible with NHEJ or
NAHR when studying the PLP1 region by dense
custom aCGH (2 probes/kb)
• Observed duplications interrupted by
triplicated or deleted fragments, or by fragments
with normal copy number.
FoSTeS
• According to this model, during DNA replication,
– the replication fork stalls at one position,
– the lagging strand disengages from the original
template,
– transfers and then anneals, by virtue of microhomology
(2-5 bp) at the 3' end, to another replication fork in
physical proximity (not necessarily adjacent in primary
sequence)
– 'primes' and restarts DNA synthesis
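The template-switching steps above can be mimicked with a toy string model (hypothetical function; strand polarity, lagging-strand geometry and fork dynamics are abstracted away): copy template A up to the stall point, look in a second template for a site matching the 3' end of the nascent strand (2-5 bp microhomology), then resume copying from that site. Preferring the longest available microhomology is an assumption of this sketch.

```python
from typing import Optional, Tuple


def fostes_switch(template_a: str, stall_pos: int, template_b: str,
                  min_mh: int = 2, max_mh: int = 5) -> Optional[Tuple[str, str]]:
    """Simulate one template switch; return (product, microhomology) or None."""
    nascent = template_a[:stall_pos]         # synthesis up to the stalled fork
    for k in range(max_mh, min_mh - 1, -1):  # try the longest microhomology first
        if k > len(nascent):
            continue
        mh = nascent[-k:]                    # 3' end of the nascent strand
        site = template_b.find(mh)           # first matching site on the new template
        if site != -1:
            # anneal at the site, then prime and restart synthesis after it
            return nascent + template_b[site + k:], mh
    return None


print(fostes_switch("GATTACAGCT", 7, "CCCCTACATTTT"))  # ('GATTACATTTT', 'TACA')
print(fostes_switch("GATTACAGCT", 7, "GGGGGGGG"))      # None
```

Chaining several such switches between templates in physical proximity is what would generate the interrupted duplication patterns described for PLP1.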
FoSTeS
• Breakpoint sequencing data have shown that
22% of CNV breakpoints are highly complex and
consistent with FoSTeS events
• ?Mechanism of some deletions/duplications in DMD, as
well as CNV at the LIS1 and MECP2 loci
• A generalised form of this model has been
proposed to underlie structural variation in
genomes from all domains of life; it not only
produces rearrangements but can also create the LCRs that provide
the homology for NAHR, and so predispose to
further genomic rearrangements
L1 retrotransposition and CNV
• ~500,000 long interspersed element-1 (L1) copies are
present in the human genome, comprising
~17% of its sequence
• However, only ~0.02% are intact and encode
proteins (ORF1, RNA-binding protein; ORF2,
endonuclease and reverse transcriptase
activity)
• Retrotransposition involves:
– transcription of L1 DNA to L1 RNA → reverse
transcription to L1 cDNA → integration into a new
genomic site
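The L1 figures above can be cross-checked with back-of-envelope arithmetic. The haploid genome size (~3.1 Gb) is an assumption not given on the slide, and the function is hypothetical. The result, ~100 intact copies and an average L1 copy of only ~1 kb, is consistent with most L1 copies being truncated relative to the ~6 kb full-length element.

```python
def l1_summary(total_copies: int = 500_000, intact_percent: float = 0.02,
               genome_fraction: float = 0.17, genome_bp: float = 3.1e9) -> dict:
    """Back-of-envelope L1 figures; genome_bp (haploid size) is an assumed value."""
    intact_copies = total_copies * intact_percent / 100
    l1_bp = genome_fraction * genome_bp
    return {
        "intact_copies": intact_copies,        # ~100 potentially active L1s
        "l1_megabases": l1_bp / 1e6,           # ~527 Mb of L1-derived sequence
        "mean_copy_bp": l1_bp / total_copies,  # ~1 kb per copy on average
    }


print(l1_summary())
```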
L1 retrotransposition and CNV
• L1 retrotransposition is thought to be a major
contributor to SV at the haemophilia A locus,
and a mechanism underlying exon shuffling
• Large inter-individual variation in the
mobility of L1 elements: 0-390% variation in
mobilisation capacity relative to a reference L1
(Kazazian et al., PNAS 2006)
• Retrotransposition in germ cells is thought to be
uncommon, with most events occurring in
embryonic development and contributing to
somatic mosaicism/gene expression
differences, e.g. neuronal gene expression
in learning/behaviour/memory
• Existing data suggest that de novo rates for CNV
may be two orders of magnitude greater than for
SNPs: 1 × 10⁻⁶ vs 1 × 10⁻⁸ per locus per generation
• Variability of the CNV mutation rate across
genomic loci likely reflects differences in
genomic architecture and therefore in CNV
mechanism
• CNVs have been found to be responsible for
sporadic traits, Mendelian traits, and complex
traits
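To make the rate comparison concrete, the two rates can be compared directly and scaled to a cohort. A minimal arithmetic sketch; treating the rate as applying to each transmitted genome, and the cohort of one million transmissions, are illustrative assumptions.

```python
import math


def expected_new_mutations(rate_per_generation: float, transmissions: int) -> float:
    """Expected de novo events at one locus across `transmissions` parent-to-child transmissions."""
    return rate_per_generation * transmissions


CNV_RATE = 1e-6  # per locus per generation (figure quoted above)
SNP_RATE = 1e-8  # per locus per generation

fold = CNV_RATE / SNP_RATE
print(round(fold), round(math.log10(fold)))         # 100-fold, i.e. 2 orders of magnitude
print(expected_new_mutations(CNV_RATE, 1_000_000))  # ~1 new CNV per million transmissions
print(expected_new_mutations(SNP_RATE, 1_000_000))  # ~0.01 new SNPs
```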
Molecular mechanism of disease causation
• gene dosage
• gene interruption
• gene fusion
• position effects
• unmasking of recessive alleles
• potential transvection
• Clinical findings associated with CNV imbalance
are archived in DECIPHER (~60 syndromes
included)
– http://www.decipher.sanger.ac.uk
• Other databases include
– ECARUCA (4700 cases, 6700 aberrations;
searchable by aberration, feature, institution)
– CHOP (http://www.cnv.chop.edu)
– DGV (http://projects.tcag.ca/variation/)
CNV in disease
• The relationship between the presence of a CNV and
the resulting phenotype is highly complex, and is not
solely a direct consequence of altered dosage of
specific genes.
• Analyses of genome-wide functional impact of these
structural variants showed that CNV changes not
only cause alterations in expression levels of genes
within them but also influence the expression of
genes in their vicinity
• Moreover, a recent study demonstrated that the
presence of structural changes associated with CNV
is enough to cause a phenotype independent of
gene dosage
CNV in Disease- Recurrent Genomic
Rearrangements
• Several well-known microdeletion/reciprocal
microduplication syndromes:
– CMT1A/HNPP
– 22q11.2 microduplication and DiGeorge/VCFS
– SMS/PTLS
• Five recurrent disease-associated CNVs were
predicted and then found based on
knowledge of genome architecture and the
“rules” of the NAHR mechanism (1q21.1,
15q13, 15q24, 17q12, and 17q21.31) (Sharp et al.,
Am J Hum Genet 2005)
RAI1: A Dosage-Sensitive Gene Related to
Neurobehavioral Alterations Including Autistic
Behavior
• Smith-Magenis (SMS) and Potocki-Lupski
(PTLS) syndromes are associated with a
reciprocal microdeletion/microduplication at 17p11.2.
• The dosage-sensitive gene responsible for
most phenotypes in SMS has been identified
as the Retinoic Acid Induced 1 (RAI1) gene
• Birth prevalence estimated at 1/20,000
CNV in disease – Non Recurrent Genomic
Rearrangements
• Some non-recurrent rearrangements can
be as common as recurrent
rearrangements mediated by LCRs/NAHR
• Suggests a predisposing role for genomic
architectural features
• Microduplication CNVs of different sizes (0.2–2.6 Mb)
involving MECP2 appear to be the
most common non-recurrent pathogenic
subtelomeric microduplication
• Deletion CNVs/loss-of-function mutations of
MECP2 →
– Rett syndrome, a neurodevelopmental disorder
affecting ~1 in 10,000 girls
– Lethal in males
• Duplications →
– DD/MR with hypotonia, absent speech and recurrent
infections in males
– Behavioural/psychiatric symptoms in female
carriers
IR
• Almost 3 years
• Developmental delay
• Recurrent pneumonia
• Asthma
• Strabismus
• Referred due to abnormality on array CGH
(arranged by paediatrician)
CGH array – del 14q24.3-14q31.1
• Minimum size 7.355 Mb
• 10 OMIM-listed genes
• OMIM gene or syndrome: ALDH6A1, MMSDH
Disorder: Methylmalonate semialdehyde dehydrogenase deficiency (3)
OMIM Database 603178: Aldehyde dehydrogenase 6 family, member A1 (methylmalonate semialdehyde dehydrogenase)
• OMIM gene or syndrome: CHX10, HOX10, MCOP2, MCOPCB3
Disorder: Microphthalmia, isolated 2, 610093 (3)
Disorder: Microphthalmia, isolated, with coloboma 3, 610092 (3)
OMIM Database 142993: C. elegans ceh-10 homeo domain-containing homolog
• OMIM gene or syndrome: NPC2, HE1
Disorder: Niemann-Pick disease, type C2, 607625 (3)
OMIM Database 601015: Epididymal secretory protein HE1
• OMIM gene or syndrome: EIF2B2
Disorder: Leukoencephalopathy with vanishing white matter, 603896 (3)
Disorder: Ovarioleukodystrophy, 603896 (3)
OMIM Database 606454: Eukaryotic translation initiation factor 2B, subunit 2
• OMIM gene or syndrome: MLH3, HNPCC7
Disorder: Colon cancer, hereditary nonpolyposis, type 7 (3)
Disorder: Colorectal cancer, somatic, 114500 (3)
Disorder: Endometrial cancer, 608089 (3)
OMIM Database 604395: Mismatch repair gene MLH3
• OMIM gene or syndrome: TGFB3
Disorder: Arrhythmogenic right ventricular dysplasia 1, 107970 (3)
OMIM Database 190230: Transforming growth factor, beta-3
• OMIM gene or syndrome: ESRRB, ESRL2, DFNB35
Disorder: Deafness, autosomal recessive 35, 608565 (3)
OMIM Database 602167: Estrogen-related receptor beta
• OMIM gene or syndrome: POMT2
Disorder: Walker-Warburg syndrome, 236670 (3)
OMIM Database 607439: Putative protein O-mannosyltransferase 2
• OMIM gene or syndrome: GSTZ1, MAAI
Disorder: Tyrosinemia, type Ib (1)
OMIM Database 603758: Glutathione S-transferase, zeta-1 (maleylacetoacetate isomerase)
• OMIM gene or syndrome: TSHR, CHNG1
Disorder: Hyperthyroidism, familial gestational, 603373 (3)
Disorder: Hyperthyroidism, nonautoimmune, 609152 (3)
Disorder: Hypothyroidism, congenital, nongoitrous, 1, 275200 (3)
Disorder: Thyroid adenoma, hyperfunctioning, somatic (3)
Disorder: Thyroid carcinoma with thyrotoxicosis (3)
OMIM Database 603372: Thyroid-stimulating hormone receptor
Clinical relevance
Dominant conditions:
• TSHR – mutations associated with both hyper- and hypothyroidism and with thyroid
adenoma and thyroid carcinoma.
• MLH3 – encodes a DNA mismatch repair protein that interacts with MLH1. Mutations in
MLH3 are associated with colorectal, endometrial and oesophageal cancer – reduced
penetrance, ?low-risk gene.
• TGFB3 – activating mutations associated with arrhythmogenic right ventricular
dysplasia type 1.
Recessive conditions:
• ALDH6A1 – Methylmalonate semialdehyde dehydrogenase deficiency
• CHX10 – Microphthalmia/anophthalmia +/- iris coloboma or other iris abnormalities
• NPC2/HE1 – Niemann-Pick disease, type C2
• EIF2B2 – Leukoencephalopathy with vanishing white matter
• ESRRB – Deafness, autosomal recessive 35 (non-syndromic)
• POMT2 – Muscular dystrophy-dystroglycanopathy (MDDG), 3 subtypes: Walker-Warburg syndrome (type A2 or MEB), a less severe congenital form with mental
retardation (type B2) and a milder limb-girdle form (type C2).
• MAAI – MAAI deficiency (clinically indistinguishable from tyrosinemia, type Ib)
Other patients with distal interstitial
deletion of 14q
• Developmental delay
• Impaired language
• Growth retardation
• Hypotonia
• Microcephaly
• Subtle dysmorphism
• Congenital heart defect
• Recurrent general infection – 1 patient
Clinical Consequences Of Copy-number
Variations – Complex traits
• CNV is also implicated in many complex
neurological and psychiatric phenotypes
– Autism spectrum disorder
– Schizophrenia
– a nuanced connection is emerging, with various CNVs acting
as factors that are significantly associated with, but not
independently causative for, the phenotype
• The overall detection rate for genomic
rearrangements in children with DD/MR +/- multiple congenital anomalies is ~12–18%
– 3–5% detected by banded karyotype,
– 9–15% detected by aCGH
CNV in cancer
• Cancer is caused by dysregulation of the
expression and activity of genes, often
mediated by germline or somatic mutations in
oncogenes and tumour suppressor genes
controlling cell growth and differentiation.
• Germline and somatic CNVs are now
recognized as frequent contributors to the
spectrum of mutations leading to cancer
development
CNV in cancer (cont’d)
• Genome-wide analyses using SNP arrays have started to
define the extent of somatic CNVs in cancer
genomes.
• In acute lymphoblastic leukaemia (ALL), analysis of leukaemia cells from 242 paediatric
ALL patients showed structural rearrangements in
genes encoding principal regulators of B-lymphocyte
development and differentiation in 40% of cases
• In these patients, 54 recurrent somatic regions of
deletion were identified that were not present in
matched germline samples.
• Many of these deletions created fusion proteins in
known oncogenes or led to other pathogenic
mutations.
• Copy number changes in PAX5, a gene in the B-cell
development pathway, were found in 57 of 192 B-progenitor
ALL cases
CNV and evolution
• CNVs are preferentially located outside
genes and ultraconserved elements
• A significantly lower proportion of deletions
than duplications overlaps disease-related genes and RefSeq genes.
• Suggests that deletion CNVs are subject to purifying
selection
CNV and evolution (cont’d)
• Gene duplication has been thought to be a
central mechanism driving evolutionary
change
• 27.4% of the examined human genes
show CNV in one or more of the 10
primate species studied
• Gains outnumber losses (gains/losses =
2.34)
• Suggests that duplication CNVs are subject to positive
selection
CNV and evolution (cont’d)
• Lineage-specific amplification of certain
domains (DUF1220), of unknown function
• DUF1220 domains are approximately 65
amino acids in length and are encoded by a
two-exon doublet, mainly on chr1 (1q21.1,
also at 1p36, 1p13.3, and 1p12)
• Highly expanded in humans, reduced in
African great apes, further reduced in
orangutan and Old World monkeys, only
single-copy in nonprimate mammals, and
absent in nonmammalian species
• Expressed in the brain, in the hippocampus and
in neocortical neurons
• Suggests that expansion of DUF1220 in the
human lineage may be critical to higher cognitive
functions
• Increasing copy number correlates highly
with increasing brain size
How are CNVs changing laboratory
practice?
• Many more diagnoses are being made
• Distinction between classical cytogenetics
and molecular diagnostics disappearing
• Diagnostic arrays → less subjective
interpretation, enhanced resolution and
competitive costs
• Increasing demand for follow-up diagnostic
assays; need for nuanced interpretive skills
How are CNVs changing clinical
practice?
• Opened up analytical potential for clinical
cases that previously eluded diagnosis
• Gives many patients and families the satisfaction of
an explanation for their observed challenges
• More complex counselling
• Gaining insight into syndromology, with some
improvement in explaining the spectrum of
variation and the degree of consistency or
inconsistency among phenotypic features
Considerations for lab and clinic
• The relevance of the results will differ depending on whether a
CNV is uncovered:
1. in a known disease gene,
2. in a high-risk setting (such as during prenatal
complications),
3. through a targeted list (such as an individual with
a family history of a disease or as confirmation
of an existing clinical diagnosis), or
4. in a universal population screen
Challenges ahead
• Shift in mindset from genetic-based to
genomic-based diseases
• Candidate gene searches within CNVs will
need to progress to analyses of added
dimensions (gene and protein
pathways/networks), for which bioinformatics tools will
be essential
• Challenge to move beyond the obvious
benefit of CNVs for diagnosing various
phenotypes to their utility in prediction and
prognosis
• We may be burdened for some time with
‘variants of unknown significance’