IAN WINTERS

GENE 210: Personalized Genomics and Medicine
Spring 2013 Final Exam
Due Thursday, June 13, 2013 at midnight.

Stanford University Honor Code

The Honor Code is the University’s statement on academic integrity written by students in 1921. It articulates University expectations of students and faculty in establishing and maintaining the highest standards in academic work:

• The Honor Code is an undertaking of the students, individually and collectively:
  – that they will not give or receive aid in examinations; that they will not give or receive unpermitted aid in class work, in the preparation of reports, or in any other work that is to be used by the instructor as the basis of grading;
  – that they will do their share and take an active part in seeing to it that others as well as themselves uphold the spirit and letter of the Honor Code.
• The faculty on its part manifests its confidence in the honor of its students by refraining from proctoring examinations and from taking unusual and unreasonable precautions to prevent the forms of dishonesty mentioned above. The faculty will also avoid, as far as practicable, academic procedures that create temptations to violate the Honor Code.
• While the faculty alone has the right and obligation to set academic requirements, the students and faculty will work together to establish optimal conditions for honorable academic work.

Signature

I attest that I have not given or received aid in this examination, and that I have done my share and taken an active part in seeing to it that others as well as myself uphold the spirit and letter of the Stanford University Honor Code.

Name: Ian Winters
Signature:
SUNet ID: 05800144

Ian Winters - Gene210 Final Exam - Spring 2013

Some questions may have multiple reasonable answers: if you are unsure, provide a justification based in genetics and cite your sources (SNPedia is fine, journals are better); as long as the justification is sound, you will receive full credit.
If you are unsure which SNP(s) are associated with a trait, you may consult any reference you like.

A family of 3 (mother/father/daughter) has come to you to find out what they can learn from their genotypes. The parents were both adopted, so they do not know any of their family history. You have sent their DNA to LabCorp, which ran their genotypes on a custom 1M OmniQuad array, and they’ve returned the results at: http://stanford.edu/class/gene210/restricted/final/

(X points)

1. A mislabeling in the lab has caused the samples to be shuffled around and they are simply labeled: ‘patient1.txt,’ ‘patient2.txt,’ and ‘patient3.txt.’ Determine which sample is the mother’s, the father’s and the daughter’s. (15 points)

Patient 3 must be the father since he has only a single allele for each X-chromosome SNP, unlike the other two patients. Also, patients 1 and 2 have no Y-chromosome alleles. Furthermore, patients 1 and 2 have identical mitochondrial SNPs, and since mitochondrial DNA is inherited strictly through the maternal lineage, these two patients must be mother and daughter.

At rs9651273 the genotypes are: patient 1, AG; patient 2, GG; and father, GG. These genotypes are consistent only if patient 2 is the daughter. Patient 1 cannot be the daughter since her A allele could not have come from either parent (a genotyping error could produce this pattern, but the same relationships hold for the other SNPs I checked as well). So we are left with:

Patient 1 = Mother
Patient 2 = Daughter
Patient 3 = Father

2. What can you tell about the ancestry of the parents? (15 points)

Using the ancestry feature of Genotation, I plotted the global ancestry of the parents by PCA. I first used the HGDP:World reference and a resolution of up to 100,000 SNPs.
The plot is below:

Since the parents seem to be localized to Europe, I then ran the same analysis with the more specific HGDP:European reference instead. The detailed regional plot is below:

This plot indicates that the mother is of Northern Italian or Tuscan ancestry, and that the father is French.

3. The parents are concerned about their daughter’s chance for getting breast cancer. You investigate the genomes of the father, mother and the daughter and provide genetic counseling for the family. (15 points total)

A. What is the lifetime risk for breast cancer for the overall population of Europeans?

Data published in 2012 by the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) Program suggest that the lifetime risk of breast cancer for people of European ancestry is 6.53% for both sexes combined (0.13% for males, 12.73% for females) [1]. Although these data were collected from a population of Europeans living within the US, a large number of people were sampled. Thus, the 6.53% lifetime risk of breast cancer likely holds for the overall population of Europeans.

1. Lifetime Risk (Percent) of Being Diagnosed with Cancer by Site and Race/Ethnicity: Males, 18 SEER Areas, 2007-2009 (Table 1.15) and Females, 18 SEER Areas, 2007-2009 (Table 1.16). 2012. Accessed at http://seer.cancer.gov/csr/1975_2009_pops09/results_merged/topic_lifetime_risk_diagnosis.pdf on May 24, 2013.

B. Does the genotype of the mother or daughter (at rs77944974) alter their risk of breast cancer? Explain briefly, providing data on the most important risk alleles and their effect on risk for breast cancer.

The rs77944974 SNP is one of three BRCA1 mutations used by 23andMe as a marker for breast cancer risk. It is more commonly known as the 185delAG BRCA1 mutation.
A 2007 paper published in the Journal of Clinical Oncology by Chen and Parmigiani indicates that a single (heterozygous) 185delAG mutation in the BRCA1 gene is associated with a 57% lifetime risk (cumulative risk to age 70) of developing breast cancer for women. This is a vast increase in breast cancer susceptibility, since women without the deletion allele have only about a 13% chance of developing the disease by the age of 70.

1. http://www.ncbi.nlm.nih.gov/pubmed?cmd=Search&term=17416853

C. Briefly outline what advice you would give to the mother about her risk for breast cancer, based on your analysis?

Unfortunately, the mother has the 185delAG mutation. It would be necessary to explain to the mother that, on average, women with this particular mutation have a 57% chance of developing breast cancer by the age of 70. However, the mother is of southern European ancestry. Data published in 2002 by Tyczynski et al. for the European Network of Cancer Registries noted that breast cancer rates were about 60% higher in Western and Northern Europe than in Southern and Eastern Europe [1]. It is therefore possible that the mother has a slightly lower risk of developing breast cancer than the average European carrier, given her genetic background.

1. http://www.encr.com.fr/DownloadFiles/breast-factsheets.pdf

D. Briefly outline what advice you would give to the daughter about her risk for breast cancer, based on your analysis?

Before the genetic test was conducted, it would be important to advise the family that the daughter has a 50% chance of having the BRCA1 mutation, since her mother is heterozygous (DI) for the mutation and the father is negative (II). This would require genetic counseling for the daughter and family prior to testing. Fortunately, genetic testing results show that the daughter is negative for the 185delAG mutation.
However, she still has a risk of developing breast cancer, since the BRCA1 mutation accounts for only a small percentage of cases each year. Since only 5-10% of breast cancer cases are known to be hereditary, the safest conclusion is that the daughter has the same risk for breast cancer as the population as a whole, which is about 13%.

4. Weeks later, the father (a 42 year old, 185 cm in height, 80 kg in weight, not taking any other medication) is rushed to the hospital with a stroke. What dose of warfarin would be given from a clinic that does not perform genetic testing? What dose of warfarin would be given from a clinic that does perform genetic testing? Explain the genetic basis for modifying the warfarin dose of the father given his genotype. (5 points)

Using the Genotation Pharmacogenomics of Warfarin feature, we find that the father would receive a warfarin dose of ~39.4 mg/week from a clinic that does not perform genetic testing. This is in stark contrast to the ~24.5 mg/week dose that would be prescribed by a clinic that does perform genetic testing.

A 2002 study published in JAMA by Higashi et al. found that the CYP2C9 *2 allele was associated with a higher risk of overanticoagulation and bleeding events following warfarin usage as compared to the CYP2C9 *1 allele [1]. Since the father has the *1/*2 CYP2C9 genotype, his warfarin dose should be reduced slightly as compared to an individual with the *1/*1 CYP2C9 genotype.

Data from a 2005 paper published in the New England Journal of Medicine by Rieder et al. suggest that the T allele of rs9923231 (a SNP located in VKORC1) leads to hypersensitivity to warfarin as compared to the G allele [2]. Patients with two copies of the T allele, such as the father, were found to require less warfarin for treatment of venous thromboembolism. Furthermore, patients with the T allele taking warfarin were at greater risk of serious bleeding than those with the G allele.
These data suggest that the father should receive a reduced warfarin dose given that he has the TT genotype at this SNP of the VKORC1 gene.

1. http://www.ncbi.nlm.nih.gov/pubmed/11926893
2. http://www.ncbi.nlm.nih.gov/pubmed/15930419

5. In her next visit, you observe that the mother has high cholesterol. Would you prescribe simvastatin (Zocor) to the mother? Why or why not? (5 points)

I ran the mother’s genome using the Genotation Pharmacogenetics feature and acquired the following result related to her ability to metabolize simvastatin:

rs4149056   CC   SLCO1B1
A person with this genotype may have a higher risk of simvastatin-related myopathy than does a person with genotype CT or TT.

A paper published in 2006 in the European Journal of Clinical Pharmacology by Pasanen et al. found that the C allele of rs4149056 reduces the activity of the transporter OATP1B1 (encoded by the SLCO1B1 gene), which carries simvastatin acid from the blood into the liver [1]. Since the mother is homozygous for the C allele, she clears simvastatin from her plasma more slowly than people with the CT or TT genotypes, and is at a 17-fold increased risk of simvastatin-related myopathy [2]. Thus, she should not be treated with this drug for her high cholesterol unless it is absolutely necessary.

1. http://www.ncbi.nlm.nih.gov/pubmed/16758257
2. http://snpedia.com/index.php/Rs4149056

6. You counsel the family about the risk for type 2 diabetes for their daughter. You analyze the daughter’s genome on genotation.com. You need to explain the results to the family, and how this influences the daughter’s risk for Type 2 diabetes. (15 points total)

A. What is the likelihood of type 2 diabetes prior to genetic testing?

Narayan et al. published a paper in 2003 in JAMA on the lifetime risk for diabetes mellitus. The study found that the lifetime risk of type 2 diabetes at birth for females of European ancestry was 31.2%.
Before genetic testing, it seems reasonable to assume the daughter has a similar likelihood of developing this disease. However, the daughter’s risk of developing the disease may also be lower; Genotation analysis indicates that she has a 23.7% prior risk of type 2 diabetes (not accounting for sex).

1. http://jama.jamanetwork.com/article.aspx?articleid=197439

B. What is the likelihood of type 2 diabetes following analysis of the daughter’s genotype using Genotation?

Based on the limited genetic susceptibility data available for diabetes mellitus, Genotation analysis indicates that the daughter’s adjusted probability of developing the disease is 44.2%.

C. How many SNPs were used to assess the risk for type 2 diabetes?

Fifteen SNPs were used to assess the risk for type 2 diabetes, although three of these were imputed from other SNPs using haplotype data.

D. How were the SNPs combined to give the overall score? Which SNP had the greatest influence on diabetes risk? Explain briefly.

First, the daughter’s initial probability of developing type 2 diabetes was established based on her ancestry. Data from each SNP were then used to determine an “adjusted probability” of developing diabetes. The SNPs were ordered by the size of the study in which they were discovered, and the likelihood ratio (LR) of each SNP (for its effect on diabetes) was multiplied, one SNP at a time, into a “running LR.” In simpler terms, the LR of the daughter’s ancestry was multiplied by the LR for the SNP from the study with the largest sample size, then by the LR of the SNP with the next largest sample size, and so on until all SNPs were taken into account. In this particular case, rs9465871 was the SNP with the greatest influence on diabetes risk, since it had the largest LR of any SNP and also shifted the daughter’s probability of developing the disease by ~10%, more than any other SNP.

E. What advice can you provide to the family to help mitigate the chance of their daughter developing type 2 diabetes?

Only about 20% of the risk of developing diabetes is thought to be heritable. This means that a large portion of the disease risk comes from external factors. There are several ways in which the parents can help reduce the risk of their daughter developing diabetes. These include encouraging her to stay physically active, eat a healthy and balanced diet, avoid smoking, maintain a healthy weight, and manage her blood pressure, cholesterol and blood glucose levels.

7. The following two SNPs were shown to be associated with risk for type 2 diabetes in two GWAS studies. (15 points total)

snp          odds ratio   p-value         cases    controls
rs4402960    1.14         8.9 x 10^-16    14586    17968
rs7754840    1.28         3.5 x 10^-7     1921     1622

A. Which SNP has a larger effect size on risk for type 2 diabetes? Explain your answer.

Odds ratios are a measure of effect size. In this case they describe the strength of the association between the allele carried at a particular SNP and disease risk. Thus, since rs7754840 has the larger odds ratio (1.28 versus 1.14), this SNP has the larger effect size on risk for type 2 diabetes.

B. Which SNP is most statistically significant for risk for type 2 diabetes; i.e. which SNP is most likely to have a true association?

The statistical significance of a SNP for risk of disease is represented by a p-value. This value answers the question: if there were no true association between SNP and disease, what is the probability of observing an association at least this strong by chance alone? A smaller p-value therefore represents greater statistical significance. So rs4402960 is the most statistically significant since it has the smallest p-value.

C. Is the SNP with the biggest effect size on risk for type 2 diabetes always going to be the SNP that is most statistically significant? Why or why not?
The SNP with the biggest effect size on risk for type 2 diabetes is not always going to be the SNP that is most statistically significant. Effect size does not depend on sample size, whereas the p-value does. Thus, even if the effect size of a SNP is estimated accurately, its significance may be low when the sample size is small.

D. rs7754840 is a SNP that lies within the CDKAL1 gene. This SNP was identified because it was contained on the Illumina Chip used for genotyping in the GWAS study. Does this result indicate that rs7754840 is the causal mutation? Does this result indicate that CDKAL1 is involved in type 2 diabetes? Explain why or why not.

This result does not necessarily mean that rs7754840 is the causal mutation or even that it is involved in type 2 diabetes. This SNP could simply be associated with diabetes but be a sort of “passenger” mutation that has no actual function. If this SNP is not causal, the significant association between this SNP and risk of diabetes can be explained in several ways. Perhaps this SNP is simply a necessary precursor to the causal diabetes mutation (if such a thing exists). This SNP may also simply lie on the same haplotype as the true causal mutation, having nothing to do with diabetes itself and yet still being significantly associated with the risk of developing the disease.

8. The two parents are considering having another child. You analyze their genomes and then counsel them on their chance of having a child with one of the following diseases: hemochromatosis (rs1800562), Alzheimer’s disease (specifically, look for APOE4 status), breast cancer (BRCA1 status; rs77944974), cystic fibrosis (rs113993960) and sickle cell anemia (rs334). For each of these five diseases, what is the chance that the child will have that disease? Briefly explain your answer.
(15 points total)

Hemochromatosis (rs1800562)

The A allele of the SNP rs1800562 is associated with hemochromatosis in males and post-menopausal women who are homozygous for the allele. The father is GG and the mother is AG at this locus, so the child cannot be homozygous for the A allele. The child is therefore not at high risk of developing this disease (<1%). This is especially true since both parents are homozygous for the non-risk allele at rs1799945, a SNP known to interact synergistically with the heterozygous rs1800562 genotype and slightly increase the risk of mild hemochromatosis [1].

1. http://snpedia.com/index.php/Rs1800562

Alzheimer’s disease (specifically, look for APOE4 status)

SNPs located in the APOE gene have the strongest known associations with risk of developing Alzheimer’s disease. The APOE rs429358 and rs7412 SNPs define three variants that have known associations with Alzheimer’s, as shown in the following table from the 23andMe website:

Variant   rs429358   rs7412
ε2        T          T
ε3        T          C
ε4        C          C

The ε4 variant is the risk allele. A single copy of this variant leads to a 2-fold increase in susceptibility to the disease, while two copies are associated with an 11-fold increase, as compared to the ε3/ε3 genotype (according to 23andMe).

The father’s genotypes at these locations are as follows: rs429358, CC and rs7412, CC (that is, ε4/ε4). The mother’s genotypes are as follows: rs429358, CT and rs7412, CC (that is, ε3/ε4). Given these genotypes, the child will have a 50% chance of being ε3/ε4 and a 50% chance of being ε4/ε4. According to 23andMe, an individual of European ancestry with the ε3/ε3 genotype has a ~5% chance of developing Alzheimer’s during their lifetime. This means the child will have, with equal probability, either a 10% (2-fold increase) or a 55% (11-fold increase) chance of developing this devastating disease during his or her lifetime.
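The Alzheimer’s calculation above can be checked with a short script. This is only a minimal sketch under the assumptions already stated in the answer: the variant table, the ~5% baseline for ε3/ε3 and the 2-fold and 11-fold increases are the figures quoted from 23andMe, and the function names (parental_haplotypes, child_distribution) are hypothetical helpers introduced for illustration. It also assumes that at most one of the two SNPs is heterozygous in each parent, so that phase is unambiguous.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# APOE variants keyed by (rs429358 allele, rs7412 allele), as in the table above.
APOE_VARIANT = {("T", "T"): "e2", ("T", "C"): "e3", ("C", "C"): "e4"}


def parental_haplotypes(rs429358, rs7412):
    """Return the two APOE variants a parent can transmit (hypothetical helper).

    Assumes at most one SNP is heterozygous, so the phase is unambiguous.
    """
    if rs429358[0] != rs429358[1] and rs7412[0] != rs7412[1]:
        raise ValueError("phase is ambiguous when both SNPs are heterozygous")
    return [APOE_VARIANT[(a, b)] for a, b in zip(rs429358, rs7412)]


def child_distribution(father, mother):
    """Probability of each child genotype from one paternal and one maternal variant."""
    counts = Counter("/".join(sorted(pair)) for pair in product(father, mother))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Genotypes reported in the answer above.
father = parental_haplotypes("CC", "CC")  # e4/e4
mother = parental_haplotypes("CT", "CC")  # e3/e4
dist = child_distribution(father, mother)

# Figures quoted in the answer (attributed there to 23andMe).
BASELINE_E3_E3 = Fraction(5, 100)
FOLD_INCREASE = {"e3/e4": 2, "e4/e4": 11}
risk = {genotype: BASELINE_E3_E3 * FOLD_INCREASE[genotype] for genotype in dist}

for genotype in sorted(dist):
    print(genotype, "probability", float(dist[genotype]), "lifetime risk", float(risk[genotype]))
```

Exact fractions are used so that the 50/50 split and the 10% and 55% figures come out without floating-point rounding.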
Breast cancer (BRCA1 status; rs77944974)

The child has a 50% chance of having the BRCA1 mutation since the mother is heterozygous (DI) for the mutation and the father is negative (II). Therefore, there is an equal probability that the child is at 13% versus 57% risk of developing the disease.

Cystic fibrosis (rs113993960)

A deletion at rs113993960 is present in 70% of patients with cystic fibrosis (CF) [1]. This mutation is recessive, so patients with CF are homozygous for this deletion. Since both parents are carriers of the deletion (DI), the child will develop cystic fibrosis with a probability of 25%.

1. http://snpedia.com/index.php/Rs113993960

Sickle cell anemia (rs334)

The A allele at rs334 allows for the production of normal hemoglobin (Hb A). This is in contrast to the T allele, which leads to the production of irregular hemoglobin and sickled blood cells. Since both parents are homozygous for the A allele, the child is not at risk of developing sickle cell anemia.

1. http://snpedia.com/index.php/Rs334

9. Prenatal genetic diagnosis (15 points total)

A) A pregnant woman seeks non-invasive prenatal genetic testing and provides a sample of plasma. You isolate the cell-free DNA (cfDNA) from the maternal plasma and determine that 10% of it is derived from the fetus. You perform whole genome sequencing on genomic DNA samples from the mother and father. Next you perform whole genome sequencing on the cfDNA isolated from maternal plasma. For each of the sites below, you obtain 100X coverage (i.e., 100 reads for each site). Fill in the expected read counts in the tables below. Use the parental genotypes below and the observed allele counts for the cfDNA sequencing to infer the genotype of the fetus at each of three sites and fill them in the table.

The mother contributes 90% of the total reads, so for each of her two alleles we expect 45 reads from the mother. We expect 10 reads from the fetus, 5 for each allele.
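The read-count arithmetic above can be written out as a small calculation. The sketch below is illustrative only: it assumes the mother is heterozygous at the site and the father is homozygous for the allele being counted (the two hypotheses considered in this answer imply this, but the parental genotype tables are not reproduced here), and it calls the fetal genotype by simply picking the hypothesis whose expected count is nearest the observed count, which is a simplification of a proper likelihood-based call. The function names are hypothetical.

```python
def expected_reads(coverage, fetal_fraction, fetal_copies):
    """Expected number of reads carrying the counted allele.

    Assumes a heterozygous mother (half of the maternal reads carry the allele)
    and a fetus carrying `fetal_copies` (0, 1 or 2) copies of it.
    """
    maternal = coverage * (1 - fetal_fraction) / 2
    fetal = coverage * fetal_fraction * fetal_copies / 2
    return maternal + fetal


def call_fetal_copies(observed, coverage=100, fetal_fraction=0.10):
    """Return 1 or 2: the fetal copy number whose expected count is closest.

    Assumes the father is homozygous for the counted allele, so the fetus
    inherits at least one copy and only 1 or 2 copies are possible.
    """
    candidates = {
        copies: expected_reads(coverage, fetal_fraction, copies)
        for copies in (1, 2)
    }
    return min(candidates, key=lambda copies: abs(candidates[copies] - observed))


# Observed counts of the counted allele at sites 1, 2 and 3 in this answer.
observed_counts = {"site 1": 59, "site 2": 52, "site 3": 49}
calls = {site: call_fetal_copies(n) for site, n in observed_counts.items()}
for site, copies in calls.items():
    label = "homozygous" if copies == 2 else "heterozygous"
    print(site, "->", label)
```

With 10% fetal DNA at 100X coverage the two hypotheses differ by only 5 reads, which is why a count such as 52 sits close to the boundary between them.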
The expected and observed read counts are:

Site 1              If mother transmits A    If mother transmits G
A reads observed    59                       59
A reads expected    45 + 10 = 55             45 + 5 = 50

Site 2              If mother transmits A    If mother transmits G
A reads observed    52                       52
A reads expected    45 + 10 = 55             45 + 5 = 50

Site 3              If mother transmits T    If mother transmits C
T reads observed    49                       49
T reads expected    45 + 10 = 55             45 + 5 = 50

Infer fetal genotype:

Site 1   A/A
Site 2   A/G
Site 3   T/C

B) You worry that your call at site 3 might not be accurate. In order to improve the accuracy of your fetal genotyping, you use parental haplotype blocks. Reevaluate your fetal genotype inference based on the maternal haplotypes below.

The maternal contribution to the previously inferred fetal genotype was A,G for sites 1 and 2. Since the maternal haplotype at these two sites is A,A, there must have been a crossing-over event between site 1 and site 2, leading to an A,G,T maternal contribution. Thus the newly inferred genotype is:

Re-evaluated fetal genotype inference:

Site 1   A/A
Site 2   A/G
Site 3   T/T

It makes more sense if “you worry that your call at site 2 might not be accurate” (instead of 3), since the site 2 read count is right on the border between the expected counts for the two genotypes. If this is a typo, then the site 2 allele inferred from the maternal haplotype is A. This leaves us with the following re-evaluated fetal genotype inference:

Re-evaluated fetal genotype inference (if actually site 2):

Site 1   A/A
Site 2   A/A
Site 3   C/T

10. Neurodegenerative disease genetics (15 points total)

A) Mutations in several genes connected to production of amyloid-β peptides are associated with early onset Alzheimer disease. These include presenilin 1 (PSN1) and presenilin 2 (PSN2). PSN1 and PSN2 are components of gamma-secretase, the enzymatic complex [...] Alzheimer disease-linked [...] Syndrome (trisomy 21), since the APP gene is located on chromosome 21.
Thus, [...]

Researchers from the company deCODE Genetics in Iceland analyzed whole-genome sequence data from 1,795 elderly Icelanders and identified a coding mutation (Ala673Thr) in APP that protects against Alzheimer disease and cognitive decline in the elderly without Alzheimer disease. They found that the protective Ala673Thr variant was significantly more common in a group of over-85-year-olds without Alzheimer disease (the incidence was 0.62%), and even more so in cognitively intact over-85-year-olds (0.79%), than in patients with Alzheimer's disease (0.13%). Based on what you know about Alzheimer disease genetics:

A) In one or two sentences, propose a mechanism by which this mutation might protect against Alzheimer disease.

One mechanism by which the Ala673Thr mutation may protect against Alzheimer’s disease is that it reduces the ability of gamma-secretase complex proteins such as PSN1 and PSN2 to cleave the APP protein into Aβ peptides. This would have a protective effect against Alzheimer’s since Aβ peptides would never accumulate, or at least would not accumulate as rapidly as in patients with the disease.

B) In one or two sentences, suggest an experiment to test your hypothesis.

A primitive but potentially informative experiment to test this hypothesis would be an in vitro enzymatic assay using APP protein and PSN1 and PSN2 (and perhaps other members of the gamma-secretase complex). This assay would measure the ability of isolated PSN1 or PSN2 protein to cleave APP protein with and without the Ala673Thr mutation, to see if the mutation inhibits or reduces APP cleavage.

11. Extra credit question available in the May 23 slot at http://stanford.edu/class/gene210/web/html/schedule.html (13 pts).
34827615

Some of the traits do not match between given phenotypes and genetic data, but the following matches were made using ancestry data and phenotype data:

Person 1: E
Person 2: H
Person 3: A
Person 4: D
Person 5: G
Person 6: F
Person 7: C
Person 8: B