Human non-synonymous SNP: molecular function, evolution and disease

Shamil Sunyaev
Genetics Division, Brigham & Women’s Hospital
Harvard Medical School
Harvard-M.I.T. Division of HST

[Diagram labels: Effect on molecular function; Structural Biology; Biochemistry; Evolutionary Genetics; Medical Genetics; Phenotype; Natural selection]

Predicting the effect of mutations in proteins

Why is this useful?
  Understanding variation in molecular function and structure
  Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection

[Diagram labels: Linkage analysis (rare); Classical association studies (common); Disease; Control]

Why is this useful?
  Rare human developmental disorders / mouse mutagenesis screens: linkage studies are impossible
  Genetics of complex disease: SNP prioritization
  Genetics of complex disease: rare variants

Technically, polymorphism should not exist!
  [Diagram labels: Mendelists; Biometricians; Quantitative trait]

Forces to maintain variation:
  Selection
  Mutation

Common disease / Common variant
  Trade-off (antagonistic pleiotropy)
  Balancing selection
  Recent positive selection
  Reversal in the direction of selection

Examples
  Gene      Disease
  APOE      Alzheimer’s disease
  AGT       Hypertension
  CYP3A     Hypertension
  CAPN10    Type 2 diabetes

Individual human genome is a target for deleterious mutations!
  Frequency of deleterious variants is directly proportional to mutation rate (q = m/s)
  ~40% of human Mendelian diseases are due to hypermutable sites

Multiple mostly rare variants
  Many deleterious alleles in mutation-selection balance

Examples
  Plasma level of HDL-C
  Plasma level of LDL-C
  Colorectal adenomas

What about late onset phenotypes?

Harmful mutations
  Function: damaging
  Evolution: deleterious
  Advantageous pseudogenization (Zhang et al. 2006)
  Gain of function disease mutations
  Phenotype: detrimental
  Sickle Cell Anemia

[Figure: protein multiple alignment]

Profile
     Ala     Arg     Asn     Asp     Cys     Gln     ...
    -1.2     0.6    -1.1    -0.9     0.4     ...     ...
     1.1    -0.3    -0.5    -0.3    -0.5     ...     ...
    -0.6    -0.3    -0.5    -0.3     0.6     ...     ...
    -0.8    -0.5    -0.7    -0.5     0.8     ...     ...
     0.3     0.6     0.4     0.6    -0.3     ...     ...
     ...     ...     ...     ...     ...     ...     ...
     ...     ...     ...     ...     ...     ...     ...

PolyPhen
Prediction rate of damaging substitutions

                       possibly    probably
  Disease mutations    82%         57%
  Divergence           9%          3%
  Polymorphism         27%         15%

10% of PolyPhen false-positives are due to compensatory substitutions

Estimate of selection coefficient (Williamson et al., PNAS 2005)

  Phylogenetic measures
    PAM-120      -5.32      -8.35*     -12.76*
    BLOSUM-45    -8.41*     -3.96      -13.39*
    BLOSUM-62    -8.41*     -4.09      -12.75*
    BLOSUM-80    -8.46*     -4.49      -13.52*
  Site-specific structural/phylogenetic measures
    PolyPhen     -6.072*    -11.732*   -23.602*

De novo mutation effect spectrum
  Effect of a new mutation may range from lethal, to neutral, to slightly beneficial
  [Figure labels: no deleterious polymorphism; lots of deleterious polymorphism]

Mutation effect spectrum?
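As a rough illustration of how the strength of selection shapes the amount of deleterious polymorphism, the sketch below evaluates the mutation-selection balance relation q = m/s quoted earlier in the talk for a few selection coefficients. The function name and the mutation rate of 1e-8 are illustrative assumptions, not values from the slides, and the relation is only a useful approximation when s is well above m.

```python
def equilibrium_frequency(m: float, s: float) -> float:
    """Equilibrium variant frequency q = m / s under mutation-selection balance.

    m: mutation rate per generation (an assumed value supplied by the caller)
    s: selection coefficient against the variant, with 0 < s <= 1
    """
    if not 0 < s <= 1:
        raise ValueError("s must be in (0, 1]")
    if m < 0:
        raise ValueError("m must be non-negative")
    # q is a frequency, so cap it at 1 where the approximation breaks down.
    return min(m / s, 1.0)


if __name__ == "__main__":
    m = 1e-8  # hypothetical mutation rate, chosen only for illustration
    for s in (1.0, 1e-2, 1e-4):
        print(f"s = {s:g}: q = {equilibrium_frequency(m, s):.1e}")
```

Under this relation a lethal variant (s = 1) sits at a frequency equal to the mutation rate, and every hundredfold decrease in s raises the expected frequency a hundredfold.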
Neutral mutation model

  Human         ACCTTGCAAAT
  Chimpanzee    ACCTTACAAAT
  Baboon        ACCTTACAAAT

  Prob(TAC->TGC), Prob(TGC->TAC)
  Prob(XY1Z->XY2Z): 64x3 matrix

[Figure labels: Strongly detrimental mutations; Effectively neutral mutations; Mildly deleterious mutations]

Mildly deleterious mutations
  54 genes, 757 individuals
  Inflammatory response: 236 genes, 46-47 individuals
  DNA repair and cell cycle pathways: 518 genes, 90-95 individuals

Frequency itself is a reliable predictor of function!

  Set              Number of sequenced individuals    Percent of deleterious SNPs among missense “singlets”
  McPherson set    757                                70%
  NIEHS-EGP        90-95                              63%
  SeattleSNPs      46-47                              54%

The majority of missense mutations observed at frequency below 1% are deleterious

Fitness and selection coefficient
  Wild type: N1 = 4, fitness 1
  New mutation: N2 = 3, fitness N2/N1 = 1 - s, where s is the selection coefficient (here s = 1 - 3/4 = 0.25)

Mildly deleterious mutations
  [Figure: fraction of detectable polymorphism]

Estimation of selection coefficient - simulation
  [Figure: human effective population size, past to present]

Estimation of selection coefficient - simulation
  [Figure: SNP probability to be observed (0 to 1) against selection coefficient, -log(s) (0 to 6); curves Fsingl(s) and FMAF>25%(s); human effective population size, past to present]

Classical association studies
  [Diagram labels: Common; Disease; Control]

“Mutation enrichment” association studies
  [Diagram labels: Rare; Disease; Control]

“Mutation enrichment” association studies
  Rare missense variants in the NPC1L1 gene contribute to variability in cholesterol absorption and plasma levels of low-density lipoproteins (LDLs). Cohen J et al., PNAS 2006, in press
  Nonsynonymous sequence variants in the ABCA1 gene were significantly more common in individuals with low HDL-C (<5th percentile) than in those with high HDL-C (>95th percentile). Cohen J et al., Science 2004
  Multiple rare variants in different genes account for multifactorial inherited susceptibility to colorectal adenomas. Fearnhead NS et al., PNAS 2004

Cholesterol
  [Figure adapted from Brewer et al., 2003]

Effect of rare nsSNPs on HDL-C

What about common alleles of smaller effect?
  Population of 3500 individuals with known plasma levels of HDL-C
  Population includes both genders and three ethnic groups
  839 SNPs genotyped
  Independent population of 800 individuals for validation

What about common alleles of smaller effect?
  Introduce a linear model (ANCOVA)
  Subsequently add SNPs to the linear model
  Include SNPs based on the likelihood ratio test
  Prioritizing SNPs based on conservation did not help

Effect of common SNPs on HDL-C
  [Figure: HDL]

And a different population…
  [Figure: HDL]

Acknowledgements
  The lab: Gregory Kryukov, Steffen Schmidt, Saurabh Asthana, Victor Spirin, Ivan Adzhubey
  Bioinformatics: Vasily Ramensky
  Human genetics: Jonathan Cohen