PolyPhen and SIFT: Tools for predicting functional effects of SNPs
Epi 244, Spring 2009
Sam S. Oh

Human genome variation
• 3.2 billion base pairs (bp)
• 99.9% similarity across individuals
  – i.e., about 3.2 million bp differ
• ~11 million SNPs
  – Coding vs. non-coding (intron and intergenic regions)
  – Most are synonymous
Frazer et al. Nat Rev Genet, 2009;10:241-251

DNA → RNA → Protein

Example: sickle-cell anemia
• An A-to-T SNP in the beta-globin gene results in a glutamate (hydrophilic) to valine (hydrophobic) substitution

Example: MTHFR
• Folate metabolism

Finding MTHFR SNPs
• Highlight all refSNP numbers (use the scroll bar) and copy them
• Note the Build number (currently Build 130)

SIFT
• Sorting Intolerant From Tolerant
• Predicts whether an amino acid (AA) substitution (i.e., a non-synonymous SNP) is tolerated, based on
  – Sequence homology
  – Physical properties of amino acids
• Can be applied to naturally occurring non-synonymous polymorphisms and laboratory-induced missense mutations

Running SIFT on SNP IDs
• Compare Build numbers
• Paste all copied SNP IDs into SIFT and choose "Submit Query"

Getting more info for rs2274974
• Enter "rs2274974"
• The report shows:
  – Flanking sequence, IUPAC code, allele information
  – Build number
  – mRNA name, protein name, contig name
  – Position of the SNP in the mRNA, protein, and contig
• Scroll down and select the protein
  – Note AA1, AA2, and the position
  – Copy the FASTA-formatted protein sequence

Checking a substitution in SIFT
• Paste the FASTA-formatted protein sequence
• Enter the AA substitution as [Letter1-position-Letter2]
  – Here the substitution occurs at AA 566 (G566E)
• Scroll down and check the tolerance of AA substitutions
• Tolerance of the specified substitution: "Substitution at pos 566 from G to E is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01."

PolyPhen: Polymorphism Phenotyping
• Tool for predicting the possible impact of an amino acid substitution (i.e., a non-synonymous SNP) on protein structure and function, based on:
  – Amino acid sequence
    • In what part of the protein did the SNP occur?
      (e.g., active site, binding site, transmembrane region)
  – Multiple alignments with homologous proteins and mammalian orthologues
    • How compatible is the substitution with the residues seen in proteins of comparable sequence?
  – 3D structural properties of the protein with the substituted amino acid
    • What is the substitution's effect on the protein's physicochemistry? (e.g., hydrophobicity, electrostatic interactions, ligand binding)

[Figure: PolyPhen data flow]

Four potential predictions
• Probably damaging
  – Predicted with high confidence to affect protein function or structure
• Possibly damaging
  – Predicted to affect protein function or structure
• Benign
  – Most likely lacking any phenotypic effect
• Unknown
  – Lack of data does not allow PolyPhen to make a prediction

Running PolyPhen
• Copy the FASTA-formatted protein sequence
• Enter the AA position, ancestral AA, and substituted AA
  – In dbSNP Build 129, the SNP corresponds to protein NP_005948.3
• Alternatively, enter the SNP rs#

Query vs. SNP Collection
Source           Prediction          PSIC    dbSNP Build#
Query            Probably damaging   2.093   N/A
SNP Collection   Probably damaging   2.172   126

References
• NCBI dbSNP – http://www.ncbi.nlm.nih.gov/sites/entrez
• SIFT – http://sift.jcvi.org/
• PolyPhen – http://genetics.bwh.harvard.edu/pph/index.html
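The sickle-cell example in these notes can be made concrete at the codon level. The sketch below is a minimal illustration, not a full translator. It includes only the two codons involved: GAG (glutamate) and GTG (valine), which differ by an A-to-T change at the middle base. The helper names `translate_codon` and `apply_snp` are hypothetical.

```python
# Minimal sketch: only the two codons needed for the sickle-cell example,
# not the full genetic code. GAG -> glutamate (Glu), GTG -> valine (Val).
CODON_TABLE = {
    "GAG": ("Glu", "hydrophilic"),
    "GTG": ("Val", "hydrophobic"),
}


def translate_codon(codon: str) -> tuple:
    """Return (amino acid, side-chain character) for a codon in the table."""
    return CODON_TABLE[codon.upper()]


def apply_snp(codon: str, pos: int, new_base: str) -> str:
    """Replace the base at 0-based position `pos` of a codon with `new_base`."""
    return codon[:pos] + new_base + codon[pos + 1:]


wild_type = "GAG"
mutant = apply_snp(wild_type, 1, "T")  # A -> T at the middle base
print(wild_type, translate_codon(wild_type))
print(mutant, translate_codon(mutant))
```

A single base change in the codon is enough to swap a charged, hydrophilic residue for a hydrophobic one. SIFT and PolyPhen are built to assess exactly this kind of change.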
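The SIFT steps above involve three inputs: a FASTA-formatted protein sequence, a substitution written as Letter1-position-Letter2 (e.g., G566E), and a tolerance score. The sketch below shows how those inputs fit together. It is a hedged illustration, not SIFT's own code. The function names are hypothetical, and the cutoff assumes that scores below 0.05 are read as "affect protein function", which is the cutoff commonly used with SIFT. The toy sequence is made up and is not the MTHFR protein.

```python
import re

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Letter1-position-Letter2, e.g. "G566E" (Gly at position 566 replaced by Glu).
SUB_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")

# Assumption: scores below 0.05 are treated as "affects protein function",
# the cutoff commonly used with SIFT.
SIFT_CUTOFF = 0.05


def read_fasta_sequence(fasta_text: str) -> str:
    """Join the sequence lines of a single-record FASTA string, skipping the '>' header."""
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def parse_substitution(text: str) -> tuple:
    """Split 'G566E' into ('G', 566, 'E')."""
    match = SUB_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"expected Letter1-position-Letter2, got {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def reference_matches(sequence: str, ref: str, pos: int) -> bool:
    """Check that the residue at 1-based position `pos` is the expected reference AA."""
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == ref


def sift_call(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Label a SIFT score using the assumed cutoff."""
    return "AFFECT PROTEIN FUNCTION" if score < cutoff else "TOLERATED"


# Usage with a made-up toy sequence (hypothetical, not a real protein).
toy_fasta = ">toy_protein hypothetical\nMKTG\nLLE\n"
toy_seq = read_fasta_sequence(toy_fasta)
ref, pos, alt = parse_substitution("G4E")
print(toy_seq, reference_matches(toy_seq, ref, pos))
print(parse_substitution("G566E"), sift_call(0.01))
```

Checking the reference residue against the pasted sequence catches a common mistake: using a substitution position from one protein isoform or Build with a sequence taken from another.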
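PolyPhen's four prediction labels can be illustrated as a mapping from the PSIC score difference. This is a hypothetical, simplified rule that looks only at the PSIC value. The cutoffs (2.0 and 1.5) are illustrative assumptions. PolyPhen's actual prediction also uses sequence annotations and 3D structure, so its output can differ from this sketch. The two PSIC values come from the Query vs. SNP Collection comparison in these notes.

```python
from typing import Optional

# Hypothetical, simplified rule based only on the PSIC score difference.
# Cutoffs are illustrative assumptions; real PolyPhen predictions also
# use sequence annotations and 3D structure.
PROBABLY_DAMAGING_MIN = 2.0
POSSIBLY_DAMAGING_MIN = 1.5


def classify_psic(psic_diff: Optional[float]) -> str:
    """Map a PSIC score difference to one of PolyPhen's four labels."""
    if psic_diff is None:  # no alignment data, so no prediction
        return "unknown"
    if psic_diff >= PROBABLY_DAMAGING_MIN:
        return "probably damaging"
    if psic_diff >= POSSIBLY_DAMAGING_MIN:
        return "possibly damaging"
    return "benign"


# PSIC values from the Query vs. SNP Collection comparison.
results = {"Query": 2.093, "SNP Collection": 2.172}
for source, psic in results.items():
    print(f"{source}: PSIC {psic} -> {classify_psic(psic)}")
```

Under this simplified rule, both entries fall in the "probably damaging" range. This matches the predictions PolyPhen reported for the query and for the SNP collection.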