Severe loss-of-function variants in the genomes of healthy humans
James Harraway, Genetic Pathologist

Sullivan Nicolaides Pathology/Mater Pathology
• Loss of Function mutations and Mendelian disease
• Severe LOF mutations in normal human genomes
• Implications of these findings
Loss of Function Mutations and Mendelian Disease
What is a LOF Variant?
‘Classical’ definition of a gene: a DNA sequence that encodes RNA, which is translated into a protein product
“Loss of Function” (LOF) variants are heritable changes in DNA sequence that result in reduced or abrogated function of the protein product encoded by the gene
• The definition of a ‘gene’ is changing rapidly: the majority of transcribed RNA does not encode protein, and gene function also depends on DNA regulatory elements (cis and trans, short and long range), the epigenetic ‘histone code’ regulating transcription, RNA editing, etc. → the range of possible LOF variants will increase in future
‘Severe’ LOF Variants
MacArthur and Tyler-Smith, Hum Mol Gen 2010
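The ‘severe’ category can be made concrete as a classification over variant annotations. The sketch below assumes consequences are labelled with Sequence Ontology terms joined by ‘&’, as in Ensembl VEP output, and treats stop-gained, frameshift, canonical splice-site and whole-transcript-deletion terms as severe LOF. That term set is an assumption for illustration, not the definition used by MacArthur and Tyler-Smith.

```python
# Sequence Ontology (SO) consequence terms treated here as 'severe' LOF.
# This term set is an illustrative assumption, not a published standard.
SEVERE_LOF_TERMS = frozenset({
    "stop_gained",              # nonsense SNV
    "frameshift_variant",       # frameshifting indel
    "splice_donor_variant",     # disrupts a canonical splice donor site
    "splice_acceptor_variant",  # disrupts a canonical splice acceptor site
    "transcript_ablation",      # deletion removing the whole transcript
})


def classify(consequences: str) -> str:
    """Classify an '&'-separated list of SO terms (VEP-style) for one variant."""
    terms = set(consequences.split("&"))
    if terms & SEVERE_LOF_TERMS:
        return "severe_lof"
    if "missense_variant" in terms:
        return "putative_lof"  # missense: functional effect uncertain
    return "other"


for c in ["stop_gained", "missense_variant", "synonymous_variant",
          "splice_acceptor_variant&intron_variant"]:
    print(f"{c} -> {classify(c)}")
```

A variant annotated with several terms is called severe if any of them is in the severe set, which mirrors how a single disrupted splice site can abolish function regardless of other annotations.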
Mendelian disease: mutation(s) in one or both alleles of a single gene result in a clinically abnormal phenotype, with high penetrance and/or expressivity
• The current focus of most diagnostic laboratories that investigate inherited disease; this may change with increasing understanding of oligogenic/polygenic contributors to the risk of ‘complex’ diseases, pharmacogenetics, etc.
Most mutations in known Mendelian diseases are LOF mutations
• There are more ways for random mutation to reduce or abrogate the function of a gene than to increase or change it (severe LOF mutations, missense (non-synonymous) mutations, regulatory region mutations, etc.)
LOF mutations can lead to recessively inherited disease when the loss of function of the protein product of one mutant allele can be ‘compensated’ by the normal protein product of the other allele
• At least up to a ‘threshold’ at which no clinical phenotype is evident
LOF mutations may also lead to dominantly inherited disease via a number of mechanisms
• Haploinsufficiency; subunit imbalance and dominant-negative effects (for specific, often missense, LOF mutations)
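The recessive/dominant distinction above can be illustrated with a toy dosage model. It assumes each normal allele contributes 50% of protein activity and uses hypothetical thresholds; the dominant-negative function assumes normal and mutant subunits are made in equal amounts, assemble at random, and that a single mutant subunit inactivates a complex. None of these numbers come from the talk.

```python
NORMAL, NULL = 0.5, 0.0  # activity contributed by a normal allele / a null (LOF) allele


def affected(allele_activities, threshold):
    """Toy model: phenotype appears when total activity falls below a threshold."""
    return sum(allele_activities) < threshold


def functional_fraction(n_subunits: int) -> float:
    """Dominant-negative toy model for a heterozygote: fraction of complexes built
    only from normal subunits, assuming mutant and normal subunits assemble at
    random in equal amounts and one mutant subunit poisons a complex."""
    return 0.5 ** n_subunits


# Recessive gene (threshold assumed at 0.3): a carrier keeps 50% activity.
print(affected([NORMAL, NULL], threshold=0.3))  # heterozygous carrier
print(affected([NULL, NULL], threshold=0.3))    # homozygous null
# Haploinsufficient gene (threshold assumed at 0.7): 50% is not enough.
print(affected([NORMAL, NULL], threshold=0.7))
# Dominant negative: a homodimer retains 25% activity, a tetramer 6.25%.
print(functional_fraction(2), functional_fraction(4))
```

The same heterozygous genotype is unaffected or affected depending only on where the gene's threshold lies, which is the essence of recessive versus haploinsufficient inheritance; the dominant-negative case shows why a heterozygote can fall well below 50% activity.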
Severe Loss of Function Mutations in Healthy Human Genomes
Severe LOF Mutations: More Common Than Previously Thought…
Previously, the ‘mutational load’ of LOF variants in the population and in each individual could only be estimated, not directly measured
• e.g. based on the observed incidence of severe recessive diseases in the general population, or on perinatal mortality/severe disease resulting from consanguineous breeding
Most severe LOF mutations have been detected in the context of clinical diagnosis of Mendelian disorders → implicit assumption that they are likely to be associated with disease
• ‘Benign’ LOF variants have been assumed to be in the minority, e.g. the O allele of the ABO blood group
Increasing amounts of genome and exome data (from groups and individuals) → the number of severe LOF variants per individual in different populations can now be directly measured:
Yngvadottir et al, Am J Hum Gen 2009: Sequenced 805 previously reported nonsense SNVs in 1151 people from 56 populations
• 169/805 were variable in the study group
• Each individual carried on average 32 nonsense SNVs (14 of them homozygously)
Conrad et al, Nat Gen 2010: Array-based testing of CNVs > 1 kb in 450 HapMap individuals
• 213 different complete gene deletions and 34 deletions of exons predicted to lead to frameshifts, out of 3811 deletions in total
Ng et al, Nature 2009: Exome sequencing of 12 HapMap individuals
• 100 stop SNVs, frameshifting indels or splice-disrupting SNVs per genome
• 30 on average in the homozygous state
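Per-genome summaries like these (total severe LOF variants carried, and how many are homozygous) come down to counting genotypes at the annotated sites. A minimal sketch, assuming diploid autosomal biallelic sites with VCF-style genotype strings; the variant IDs are hypothetical:

```python
from collections import Counter


def tally_genotypes(calls):
    """Count LOF variants carried heterozygously and homozygously by one individual.

    `calls` is a list of (variant_id, genotype) pairs using VCF-style genotypes
    for a biallelic diploid site ('0/1', '1|1', ...). Missing calls are skipped.
    """
    counts = Counter()
    for _variant_id, gt in calls:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # uncalled genotype: neither counted nor assumed reference
        n_alt = sum(a != "0" for a in alleles)
        if n_alt == 1:
            counts["het"] += 1
        elif n_alt == 2:
            counts["hom"] += 1
    return {"carried": counts["het"] + counts["hom"],
            "het": counts["het"], "hom": counts["hom"]}


# Hypothetical calls at five severe LOF sites for one individual.
calls = [("v1", "0/1"), ("v2", "1/1"), ("v3", "0/0"), ("v4", "0|1"), ("v5", "./.")]
print(tally_genotypes(calls))
```

Skipping uncalled sites rather than treating them as reference keeps low-coverage regions from silently deflating the counts, one of the accuracy issues these studies had to contend with.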
Lupski et al, NEJM 2010: Whole genome sequence of Jim Lupski, who has Charcot-Marie-Tooth (CMT) neuropathy whose causative gene was previously unknown
• 121 stop SNVs and 112 splice-disrupting SNVs (indels were harder to identify with the short-read SOLiD technology)
• Heterozygous for 16 HGMD recessive disease SNVs
• Homozygous for 4 HGMD recessive disease SNVs
• Hemizygous for 1 X-linked disease SNV (ABCD1; X-linked adrenoleukodystrophy)
• These were attributed to database error (or reduced penetrance?)
Moore et al, Genet Med 2011: Survey of 10 published genomes for ‘clinically relevant’ variants
• Found on average 10 000 nonsynonymous SNVs in coding sequence per individual, of which approximately 50 were nonsense SNVs
• Also found that, on average, each individual was heterozygous for 65 and homozygous for 42 alleles recorded as disease-associated in OMIM (although no individual was homozygous for a severe LOF allele in an OMIM gene)
Caveats
There are potential caveats to the accuracy of estimates of severe LOF mutations in healthy individuals derived from high-throughput sequencing (HTS) data:
• Sequencing call errors, mapping errors and sequence-mediated errors; accumulating more data leads to more ‘false positive’ errors, and het → hom errors are relatively common
• And of course, some level of ‘false negative’ errors is unavoidable, exacerbated by low read depth and by sequence capture effects in exome approaches
• Annotation issues for ‘genes’ (paralogs? pseudogenes?) and alternative isoforms
• Errors/gaps in the reference sequence (unalignable reads, problems with assigning SVs)
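The het → hom and false-negative caveats follow partly from read sampling. Under the simplifying assumption that each read at a true heterozygous site comes from either allele with probability 0.5, independently and with no sequencing error or capture bias, the chance that all d reads show the same allele is 2 × 0.5^d:

```python
def p_single_allele(depth: int) -> float:
    """Probability that all reads at a true heterozygous site come from one allele.

    Assumes each read samples either allele with probability 0.5, independently,
    with no sequencing error and no capture or mapping bias.
    """
    return 2 * 0.5 ** depth


for d in (4, 8, 15, 30):
    # Half of this probability looks homozygous-alt (a het -> hom error),
    # half looks homozygous-reference (a false negative).
    print(f"depth {d:>2}: {p_single_allele(d):.2e}")
```

Real callers usually want more than one supporting read, and exome capture can favour one allele, so this idealised figure likely understates error rates at low depth.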
Despite these caveats, it is clear that healthy humans carry more severe LOF variants than might have been expected
• Somewhere between 100 and 300 severe LOF variants (nonsense SNVs, SVs causing gene deletions, and others) per healthy genome, with each genome homozygous for between 10 and 50 of these
Beyond severe LOF variants, there is a (much) larger number of putative LOF variants: nonsynonymous/missense SNVs, regulatory region variants, etc.
• These are beyond the scope of this talk; it is harder to assign functional relevance to them, and the high number of nonsynonymous SNVs per genome is perhaps less surprising than the number of severe LOF variants
• However, they are likely to be deleterious as a class: nonsynonymous SNVs make up the majority of variants known to be responsible for Mendelian diseases, and are overrepresented among rare alleles at the population level, indicating selection pressure (e.g. Li et al., Nat Gen, 2010)
Implications for clinical diagnostics
A common approach in molecular diagnostic laboratories is to report severe LOF mutations in genes sequenced for Mendelian disease as “likely/definitely pathogenic”
This is reasonable if there is a high a priori probability that the gene is causative for the disorder, e.g. it is a ‘known’ disease gene for an identifiable Mendelian disorder
• This includes severe LOF mutations discovered during a survey of a relatively small ‘shortlist’ of candidate genes
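The role of the a priori probability can be shown with simple arithmetic. Assuming roughly 100 severe LOF variants per genome (the low end of the range given earlier) spread evenly over about 20,000 protein-coding genes, the number expected to fall in the tested genes purely by chance scales with the number of genes examined. The even spread is a simplification; in practice LOF variants are not distributed uniformly across genes, and the panel sizes below are hypothetical.

```python
def expected_incidental_lof(lof_per_genome: float, genes_tested: int,
                            genes_total: int = 20_000) -> float:
    """Expected number of severe LOF variants landing in the tested genes by chance.

    Assumes variants are spread uniformly across ~20,000 protein-coding genes,
    which is an approximation.
    """
    return lof_per_genome * genes_tested / genes_total


for label, n_genes in [("10-gene shortlist", 10), ("500-gene panel", 500),
                       ("exome", 20_000)]:
    print(label, expected_incidental_lof(100, n_genes))
```

A severe LOF hit in a short candidate list is unlikely to be incidental, whereas exome-wide data guarantee dozens of such hits in every healthy genome, so the same finding carries far less diagnostic weight.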
However, we are moving toward an era of ‘genotype-first’ discovery of the genetic basis of some Mendelian disorders (Mefford, Genet Med, 2009) → exome- or genome-sized data sets
Exome (or genome) sequencing is becoming commonplace in the literature for discovering causative mutations in Mendelian disorders where there is no clear candidate-gene shortlist, or in small families where linkage studies are not possible
• Over 40 such studies have been published to date
Array-based testing for LOF structural variants has become the standard of care in patients with developmental delay
• Exome, and eventually whole genome, sequencing will likely become the standard of care for this purpose, detecting small-scale variants as well as SVs (e.g. O’Roak et al, Nat Gen 2011)
Moving beyond simple Mendelian disorders, HTS is being applied in GWAS and family studies
• To discover risk alleles for non-Mendelian (oligogenic or polygenic) diseases with significant heritability
• Pharmacogenomic applications
How can genome-wide data be used in clinical practice for inherited disease?
Sequencing the exome or whole genome of each patient will yield multiple severe LOF mutations per genome, and even more putative LOF mutations (e.g. missense variants)
Studies published to date have managed to ‘pinpoint’ the clinically significant variant(s) for Mendelian disorders, but this may reflect publication bias
The time and expense that have gone into such publications may be difficult to replicate in a high-throughput clinical laboratory with a relatively rapid turnaround time (TAT); how can this process become more streamlined?
Improved Sequencing Technology
Improved sequencing technologies (and calling/mapping/filtering algorithms) should:
• Reduce sequencing cost
• Increase the read depth and accuracy of sequence data, and the speed/utility of variant filtering
• Possibly allow de novo assembly (reducing reliance on the reference sequence, which will itself be further refined as data accumulate)
• Allow rapid validation of variants of interest on different platforms
Annotation of the genome will become more accurate with improved sequence data from more individuals
• Both at the DNA level (genes vs paralogs/pseudogenes) and through RNA sequencing (better understanding of multiple transcripts/isoforms, and their relation to function in different tissues)
• Clinical labs will need to reach a consensus on which annotation set to use, e.g. GENCODE
Improved software to predict clinical significance
Software for predicting the clinical significance of novel severe LOF mutations will (hopefully) improve
• Algorithms to predict the likelihood of haploinsufficiency based on gene characteristics (e.g. Huang et al, PLOS Genetics, 2010)
• These combine gene characteristics (size, conservation, expression during development, tissue specificity) with interaction network models that estimate whether a gene/protein is a network ‘node’
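One simple proxy for whether a gene is a network ‘node’ (a hub) is its degree, the number of distinct interaction partners. The sketch below computes degree from an edge list; it is a generic graph measure, not the model of Huang et al., and the gene names and interactions are hypothetical.

```python
from collections import defaultdict


def degree(edges):
    """Number of distinct interaction partners per gene in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {gene: len(partners) for gene, partners in neighbours.items()}


# Hypothetical toy interaction network (gene names are placeholders).
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
         ("GENE_B", "GENE_C"), ("GENE_E", "GENE_F")]
deg = degree(edges)
hubs = sorted(deg, key=deg.get, reverse=True)
print(deg)
print("most connected:", hubs[0])
```

Using sets means a duplicated interaction in the input is counted once, which matters when edge lists are merged from several interaction databases.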
Software used to predict the functional effect of less severe LOF mutations will also be refined
• Algorithms based on evolutionary conservation should improve with more complete alignments, as more non-human species are sequenced
• Algorithms based on protein structure/function should become more effective, combining the effects of substitutions on structure (folding/protein stability/free energy) with information on known ‘vital’ domains (in particular, domains where other mutations → clinical phenotype)
Improved Databases – Normal Variants
Large-scale projects such as the 1000 Genomes Project → a broader picture of the variability in ‘normal’ genomes
• Which LOF mutations can be tolerated without severe disease, and which genes have common LOF mutations and might be seen as ‘non-essential’ (with the caveat that phenotypic data are not available)
• This will help inform filtering algorithms for clinically significant variants
One goal of the 1000 Genomes Project is to catalogue, validate and annotate all putative coding LOF mutations present at over 0.1% frequency in the populations genotyped
• Eventually, it will be useful to study the phenotypic attributes of ‘normal’ individuals with severe LOF mutations in various genes, as these mutations do appear to reduce fitness on an evolutionary scale, given their population frequencies
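A catalogue of ‘normal’ variation feeds into filtering by excluding variants too common in healthy populations to plausibly cause a rare Mendelian disorder. This sketch assumes each variant carries an allele frequency from such a catalogue (None if absent); the Variant class, the placeholder gene names and the default cut-off are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str                  # placeholder gene names in the example below
    consequence: str           # e.g. an SO term such as 'stop_gained'
    pop_freq: Optional[float]  # allele frequency in a 'normal' catalogue; None if not listed


def rare_variants(variants, max_freq=0.001):
    """Drop variants seen above max_freq in the reference catalogue.

    The 0.1% default echoes the 1000 Genomes cataloguing threshold; the right
    cut-off for a given disorder and inheritance model is an assumption to tune.
    """
    return [v for v in variants if v.pop_freq is None or v.pop_freq <= max_freq]


variants = [
    Variant("GENE_A", "stop_gained", None),         # novel: not in catalogue
    Variant("GENE_B", "frameshift_variant", 0.12),  # common in 'normal' genomes
    Variant("GENE_C", "stop_gained", 0.0004),       # rare
]
print([v.gene for v in rare_variants(variants)])
```

For recessive disorders a higher cut-off may be needed, since carriers of pathogenic alleles can be relatively frequent; the threshold is a design choice, not a fixed rule.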
Improved Databases – Clinically Significant Variants
The clinical diagnostic community needs to contribute by cataloguing LOF variants that do cause or contribute to disease
• Through publication, and contribution of SNVs/small indels to locus-specific databases → HGMD (which, as noted previously, contains misattributed ‘pathogenic’ alleles and needs continued curation and improvement)
• Through publication, and contribution of structural variants to databases such as DECIPHER and the ISCA database → dbVar
Ultimately, through a central curated repository of all disease-causing and risk-associated variants, e.g. the Human Variome Project
Patient-Specific, Clinical Approaches
Broad family studies will often be vital
• Segregation of variants with disease will need to be established whenever possible, especially when a number of possible disease-causing variant alleles remain after filtering, validation, database searches and in silico predictions of pathogenicity
• Sequencing multiple family members simultaneously will likely become the norm (e.g. parent-child trios for children with developmental delay, which will detect de novo as well as inherited mutations)
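Trio sequencing lets each candidate variant in the child be labelled by its inheritance pattern. The sketch below assumes autosomal, biallelic sites with VCF-style genotypes and handles only the simplest patterns; the labels are hypothetical, and apparent de novo calls still need validation, since a call error in the child or a missed parental heterozygote can mimic one.

```python
def n_alt(gt: str) -> int:
    """Number of non-reference alleles in a called VCF-style genotype, e.g. '0/1' -> 1."""
    return sum(a != "0" for a in gt.replace("|", "/").split("/"))


def trio_pattern(child: str, mother: str, father: str) -> str:
    """Label a child's autosomal variant by its relation to the parental genotypes."""
    if any("." in gt for gt in (child, mother, father)):
        return "uncalled"
    c, m, f = n_alt(child), n_alt(mother), n_alt(father)
    if c == 0:
        return "not carried"
    if m == 0 and f == 0:
        return "candidate de novo"  # validate: may be a call error
    if c == 2 and (m == 0 or f == 0):
        return "inconsistent with Mendelian inheritance"
    if c == 2:
        return "homozygous, both parents carriers (recessive model)"
    return "inherited"


print(trio_pattern("0/1", "0/0", "0/0"))
print(trio_pattern("1/1", "0/1", "0/1"))
print(trio_pattern("0/1", "0/1", "0/0"))
```

Returning "uncalled" when any member lacks a genotype avoids mislabelling a variant as de novo simply because a parent had no coverage at that site.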
Close liaison between clinicians and the clinical lab
• Detailed phenotypic information will optimise selection of the clinically significant gene(s) from all of those containing LOF variants, based on known gene function/linked phenotypes
• The lab should give feedback to the clinician when new data on variants become available that refine the assignment of pathogenicity
Summary
Loss-of-function variants are important causes of Mendelian disease (and contribute to complex disease)
A number of recent studies have established that even severe LOF variants are relatively common in each human genome, and many are rare/novel
This has important implications for clinical diagnostic laboratories as we move toward exome/genome NGS, with increasing challenges in assigning pathogenicity to discovered variants
Hopefully these challenges can be met by a combination of technological and software improvements, improved databases of sequence from both normal and clinically affected individuals, and an optimal lab-clinical interface
Questions?