Human non-synonymous SNPs: molecular function, evolution and disease

Shamil Sunyaev
Genetics Division, Brigham & Women’s Hospital
Harvard Medical School
Harvard-M.I.T. Division of HST
[Diagram: effect on molecular function (structural biology, biochemistry); natural selection (evolutionary genetics); phenotype (medical genetics)]
Predicting the effect of mutations in proteins
Why is this useful?
Understanding variation in molecular function and structure
Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection
[Figure: linkage analysis for rare variants; classical association studies (disease vs. control) for common variants]
Why is this useful?
Rare human developmental disorders / mouse mutagenesis screens: linkage studies are impossible
Genetics of complex disease: SNP prioritization
Genetics of complex disease: rare variants
Technically, polymorphism should not exist!
[Figure: Mendelists vs. biometricians; quantitative trait]
Forces to maintain variation: selection and mutation
Common disease / common variant
Trade-off (antagonistic pleiotropy)
Balancing selection
Recent positive selection
Reversal in the direction of selection
Examples
APOE      Alzheimer’s disease
AGT       Hypertension
CYP3A     Hypertension
CAPN10    Type 2 diabetes
Each individual human genome is a target for deleterious mutations!
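The frequency of a deleterious variant is directly proportional to the mutation rate and inversely proportional to the selection coefficient: $q = \mu/s$, where $\mu$ is the mutation rate and $s$ the selection coefficient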
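A minimal numerical check of $q = \mu/s$, assuming a haploid model in which selection acts before mutation in each generation; with that ordering the equilibrium of the recursion is exactly $\mu/s$. For diploids with heterozygous effect $hs$ the analogous approximation is $\mu/(hs)$, and for fully recessive alleles it is $\sqrt{\mu/s}$. The function name and parameter values are illustrative only.

```python
def mutation_selection_equilibrium(mu, s, generations=10_000):
    """Deleterious-allele frequency after iterating a haploid mutation-selection model from q = 0."""
    q = 0.0
    for _ in range(generations):
        q = q * (1 - s) / (1 - s * q)  # selection: carriers of the allele have fitness 1 - s
        q = q + mu * (1 - q)           # mutation: wild-type alleles mutate at rate mu
    return q


for s in (0.1, 0.01, 0.001):
    # Simulated equilibrium next to the analytical value mu / s
    print(s, mutation_selection_equilibrium(1e-6, s), 1e-6 / s)
```

Each tenfold decrease in $s$ raises the equilibrium frequency tenfold, which is why weakly selected alleles can persist at appreciable frequencies.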
~40% of human Mendelian diseases are
due to hypermutable sites
Multiple, mostly rare variants
Many deleterious alleles in mutation-selection balance
Examples: plasma level of HDL-C; plasma level of LDL-C; colorectal adenomas
What about late onset phenotypes?
Harmful mutations
Function: damaging
Evolution: deleterious
Phenotype: detrimental
Examples placed on the diagram: advantageous pseudogenization (Zhang et al. 2006); gain-of-function disease mutations; sickle cell anemia
[Figure: protein multiple alignment and the position-specific profile derived from it, giving a score for each amino acid (Ala, Arg, Asn, Asp, Cys, Gln, ...) at each alignment position]
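PolyPhen's profile scores come from PSIC, which adds sequence weighting and other refinements not shown here. The sketch below is a simplified stand-in, assuming raw residue counts, a pseudocount of 1, and a uniform background over the 20 amino acids; the function names are hypothetical. It computes a log-odds profile for a single alignment column and the score difference between the wild-type and the variant residue, so that a large positive difference flags a substitution to a residue rarely seen at that position.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)


def column_profile(column, pseudocount=1.0):
    """Log-odds score for every amino acid at one alignment column.

    Simplifications (assumptions, not PSIC): raw counts without sequence
    weighting, a flat pseudocount, and a uniform background of 1/20.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # gaps ('-') ignored
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    return {aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS}


def profile_score_difference(column, wild_type, variant):
    """Profile score of the wild-type residue minus that of the variant residue."""
    profile = column_profile(column)
    return profile[wild_type] - profile[variant]


# Hypothetical column in which all 15 sequences carry Cys: C -> Y scores log(16), about 2.77
print(profile_score_difference("C" * 15, "C", "Y"))
# A column split evenly between Cys and Tyr gives no signal for the same change
print(profile_score_difference("CCCCCYYYYY", "C", "Y"))
```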
PolyPhen
Prediction rate of damaging substitutions
                     Possibly damaging   Probably damaging
Disease mutations    82%                 57%
Divergence            9%                  3%
Polymorphism         27%                 15%
10% of PolyPhen false positives are due to compensatory substitutions
Williamson et al., PNAS 2005
Estimate of selection coefficient
Phylogenetic measures
  PAM-120       -5.32     -8.35*    -12.76*
  BLOSUM-45     -8.41*    -3.96     -13.39*
  BLOSUM-62     -8.41*    -4.09     -12.75*
  BLOSUM-80     -8.46*    -4.49     -13.52*
Site-specific structural/phylogenetic measures
  PolyPhen      -6.072*   -11.732*  -23.602*
De novo mutation effect spectrum
The effect of a new mutation may range from lethal, to neutral, to slightly beneficial.
[Figure: mutation effect spectra, from one implying NO DELETERIOUS POLYMORPHISM to one implying LOTS OF DELETERIOUS POLYMORPHISM; the actual spectrum is marked "?"]
Neutral mutation model
Human        ACCTTGCAAAT
Chimpanzee   ACCTTACAAAT
Baboon       ACCTTACAAAT
Prob(TAC → TGC) vs. Prob(TGC → TAC)
$\mathrm{Prob}(XY_1Z \to XY_2Z)$: a 64 × 3 matrix (64 trinucleotide contexts, 3 alternative central bases)
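The counting step behind such a context-dependent neutral model can be sketched as below. Assumptions, none of them taken from the slide: the alignment has no gaps, a trinucleotide on which chimpanzee and baboon agree is taken as the ancestral state, and only changes at the central base with unchanged flanks are counted. The result fills the 64 × 3 table indexed by ancestral trinucleotide XYZ and derived central base. The function names are hypothetical, and this is not necessarily how the original model was estimated.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def context_substitutions(human, chimp, baboon):
    """Count human-lineage substitutions by ancestral trinucleotide context.

    Assumptions: sequences are aligned without gaps, and a trinucleotide on which
    chimpanzee and baboon agree is taken as the ancestral state.
    """
    if not len(human) == len(chimp) == len(baboon):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()         # (ancestral triplet XYZ, derived central base) -> count
    opportunities = Counter()  # ancestral triplet XYZ -> number of times it was observed
    for i in range(1, len(human) - 1):
        ancestral = chimp[i - 1:i + 2]
        if ancestral != baboon[i - 1:i + 2] or any(b not in BASES for b in ancestral):
            continue  # ancestral state not resolved by the outgroups
        opportunities[ancestral] += 1
        flanks_unchanged = human[i - 1] == ancestral[0] and human[i + 1] == ancestral[2]
        if flanks_unchanged and human[i] in BASES and human[i] != ancestral[1]:
            counts[(ancestral, human[i])] += 1
    return counts, opportunities


def rate_matrix(counts, opportunities):
    """64 x 3 table: rate[XYZ][b] = observed XYZ -> XbZ changes per occurrence of XYZ."""
    matrix = {}
    for x, y, z in product(BASES, repeat=3):
        triplet = x + y + z
        n = opportunities[triplet]
        matrix[triplet] = {b: (counts[(triplet, b)] / n if n else 0.0)
                           for b in BASES if b != y}
    return matrix


counts, opportunities = context_substitutions("ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT")
print(counts)  # one TAC -> TGC change on the human lineage
```

On the three sequences shown, the only human-lineage change is the A → G at the centre of TAC; a real estimate would pool counts over a large amount of presumably neutral sequence.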
[Figure: new mutations classified as strongly detrimental, mildly deleterious, or effectively neutral]
Mildly deleterious mutations
Data sets:
54 genes, 757 individuals
236 genes (inflammatory response), 46-47 individuals
518 genes (DNA repair and cell cycle pathways), 90-95 individuals
Frequency itself is a reliable predictor of function!
Set             Sequenced individuals   Deleterious SNPs among missense singletons
McPherson set   757                     70%
NIEHS-EGP       90-95                   63%
SeattleSNPs     46-47                   54%
The majority of missense mutations observed at a frequency below 1% are deleterious
Fitness and selection coefficient
Wild type: $N_1 = 4$ offspring, fitness 1
New mutation: $N_2 = 3$ offspring, fitness $N_2/N_1 = 1 - s$
$s$ is the selection coefficient; here $s = 1 - 3/4 = 0.25$
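The slide's example in code, assuming a haploid model with non-overlapping generations in which the wild type has fitness 1 and the mutant $1 - s$; the function names are illustrative.

```python
def selection_coefficient(mutant_offspring, wild_type_offspring):
    """s = 1 - w, where w = N2 / N1 is the mutant's fitness relative to wild type."""
    return 1 - mutant_offspring / wild_type_offspring


def frequency_after_selection(p, s):
    """Mutant frequency after one generation of selection (haploid, fitnesses 1 - s and 1)."""
    return p * (1 - s) / (1 - s * p)


s = selection_coefficient(3, 4)           # N2 = 3, N1 = 4, as on the slide
print(s)                                  # 0.25
print(frequency_after_selection(0.5, s))  # 3/7, about 0.4286
```

Starting from equal frequencies, one generation with $s = 0.25$ lowers the mutant from 0.5 to 3/7.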
[Figure: fraction of detectable polymorphism]
Estimation of selection coefficient: simulation
[Figure: simulated allele histories from past to present under the human effective population size; plot of the probability that a SNP is observed as a singleton, $F_{\mathrm{singl}}(s)$, or at MAF > 25%, $F_{\mathrm{MAF}>25\%}(s)$ (y axis, 0 to 1), against the selection coefficient on a $-\log(s)$ scale (x axis, 0 to 6)]
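A minimal Wright-Fisher sketch of how curves like $F_{\mathrm{singl}}(s)$ and $F_{\mathrm{MAF}>25\%}(s)$ might be estimated. Assumptions not taken from the slide: a constant population size instead of a model of human demographic history, genic selection against the derived allele, a fixed number of new mutations per generation, binomial sampling of chromosomes, and F defined here as the fraction of SNPs segregating in the sample that are singletons (or have sample MAF > 25%). Parameter values are kept small so the sketch runs quickly.

```python
import numpy as np


def simulate_sample_spectrum(s, pop_size=500, new_per_gen=50, burn_in=2000,
                             n_sample=50, snapshots=5, gap=100, seed=0):
    """Return (F_singl, F_MAF>25%) for mutations with selection coefficient s.

    Sketch assumptions: constant population of pop_size diploids (2N gene copies),
    genic selection (each copy of the derived allele has fitness 1 - s), and a
    fixed number of new single-copy mutations per generation.
    """
    rng = np.random.default_rng(seed)
    two_n = 2 * pop_size
    counts = np.empty(0, dtype=np.int64)  # derived-allele copy numbers of segregating mutations
    singletons = common = segregating = 0
    for gen in range(1, burn_in + snapshots * gap + 1):
        counts = np.concatenate([counts, np.ones(new_per_gen, dtype=np.int64)])
        p = counts / two_n
        p_selected = p * (1 - s) / (1 - s * p)           # deterministic selection step
        counts = rng.binomial(two_n, p_selected)          # genetic drift (Wright-Fisher)
        counts = counts[(counts > 0) & (counts < two_n)]  # discard lost or fixed mutations
        if gen > burn_in and (gen - burn_in) % gap == 0:
            # Sample n_sample chromosomes (binomial approximation to sampling without replacement).
            k = rng.binomial(n_sample, counts / two_n)
            k = k[(k > 0) & (k < n_sample)]               # SNPs polymorphic in the sample
            minor = np.minimum(k, n_sample - k)
            segregating += k.size
            singletons += int(np.sum(minor == 1))
            common += int(np.sum(minor > 0.25 * n_sample))
    if segregating == 0:
        raise RuntimeError("no SNPs were segregating in the sample; increase new_per_gen")
    return singletons / segregating, common / segregating


for s in (0.0, 1e-3, 1e-2, 5e-2):
    print(s, simulate_sample_spectrum(s))
```

Stronger selection shifts observed SNPs toward singletons and away from common frequencies, which is what lets the observed frequency distribution inform an estimate of $s$.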
Classical association studies
[Figure: common variants compared between disease and control groups]
“Mutation enrichment” association studies
[Figure: rare variants compared between disease and control groups]
Rare missense variants in the NPC1L1 gene contribute to variability in cholesterol absorption and plasma levels of low-density lipoproteins (LDLs)
Cohen J et al., PNAS 2006 in press
Nonsynonymous sequence variants in the ABCA1 gene were significantly more common in individuals with low HDL-C (<5th percentile) than in those with high HDL-C (>95th percentile).
Cohen J et al., Science 2004
Multiple rare variants in different genes account for multifactorial inherited susceptibility to colorectal adenomas
Fearnhead NS et al., PNAS 2004
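One simple way to test for mutation enrichment is to compare the number of carriers of rare nonsynonymous variants between the two phenotypic extremes with Fisher's exact test. The sketch below assumes that framing; it is not necessarily the test used in the studies cited above, and the function name is hypothetical. It relies only on the standard library (`math.comb`, Python 3.8+).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: the two phenotypic extremes (e.g. low vs. high HDL-C);
    columns: carriers vs. non-carriers of rare nonsynonymous variants.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one (small tolerance for rounding).
    return sum(p for p in map(table_prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


# Hypothetical counts: 20 of 128 low-HDL-C individuals carry a rare variant vs. 4 of 128 high-HDL-C
print(fisher_exact_two_sided(20, 108, 4, 124))
```

Pooling all rare variants in a gene into one carrier count is what gives this design power when each individual variant is too rare to test on its own.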
Cholesterol
[Figure adapted from Brewer et al., 2003]
Effect of rare nsSNPs on HDL-C
What about common alleles of smaller effect?
Population of 3,500 individuals with known plasma levels of HDL-C
Population includes both genders and three ethnic groups
839 SNPs genotyped
Independent population of 800 individuals for validation
Fit a linear model (ANCOVA) of HDL-C
Add SNPs to the linear model one at a time
Include a SNP only if the likelihood ratio test supports it
Prioritizing SNPs based on conservation did not help
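The model-building steps above can be sketched as greedy forward selection with a likelihood ratio test. Assumptions of this sketch, not details from the study: Gaussian ordinary least squares, additive genotype coding (0/1/2), covariates such as gender and dummy-coded ethnic group, a fixed threshold of 3.841 (the 95th percentile of χ² with 1 degree of freedom), and no multiple-testing control. All names and the simulated data are hypothetical.

```python
import numpy as np

CHI2_1DF_95 = 3.841  # 95th percentile of the chi-square distribution with 1 df


def gaussian_loglik(X, y):
    """Maximized log-likelihood of an ordinary least-squares fit with Gaussian errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)


def forward_select(covariates, genotypes, y, threshold=CHI2_1DF_95):
    """Greedily add SNP columns while the likelihood ratio statistic exceeds threshold.

    covariates: (n, k) array, e.g. gender and dummy-coded ethnic group (ANCOVA terms)
    genotypes:  (n, m) array of allele counts 0/1/2
    Returns the indices of selected SNPs in the order they were added.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), covariates])  # intercept + covariates
    current = gaussian_loglik(X, y)
    selected, remaining = [], list(range(genotypes.shape[1]))
    while remaining:
        candidates = [(gaussian_loglik(np.column_stack([X, genotypes[:, j]]), y), j)
                      for j in remaining]
        best_loglik, best_j = max(candidates)
        if 2 * (best_loglik - current) < threshold:  # LRT statistic, 1 extra parameter
            break
        X = np.column_stack([X, genotypes[:, best_j]])
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_loglik
    return selected


rng = np.random.default_rng(1)
n = 500
sex = rng.integers(0, 2, n)
group = rng.integers(0, 3, n)  # three hypothetical ethnic groups
covariates = np.column_stack([sex, group == 1, group == 2]).astype(float)
genotypes = rng.binomial(2, 0.3, size=(n, 10)).astype(float)
hdl = 0.5 * sex + 1.0 * genotypes[:, 3] + rng.normal(0, 0.5, n)  # SNP 3 is causal here
print(forward_select(covariates, genotypes, hdl))  # SNP 3 should be selected first
```

Because each step adds only the single best SNP, a group of SNPs in strong linkage disequilibrium tends to be represented by one of its members; checking the selected set in an independent population guards against SNPs admitted by chance.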
Effect of common SNPs on HDL-C
[Figure: HDL-C]
And in a different population…
[Figure: HDL-C]
Acknowledgements
The lab: Gregory Kryukov, Steffen Schmidt, Saurabh Asthana, Victor Spirin, Ivan Adzhubey
Bioinformatics: Vasily Ramensky
Human genetics: Jonathan Cohen