
IAN WINTERS
GENE 210: Personalized Genomics and Medicine
Spring 2013 Final Exam
Due Thursday, June 13, 2013 at midnight.
Stanford University Honor Code
The Honor Code is the University’s statement on academic integrity written by students
in 1921. It articulates University expectations of students and faculty in establishing and
maintaining the highest standards in academic work:
• The Honor Code is an undertaking of the students, individually and collectively:
– that they will not give or receive aid in examinations; that they will not give or
receive unpermitted aid in class work, in the preparation of reports, or in any
other work that is to be used by the instructor as the basis of grading;
– that they will do their share and take an active part in seeing to it that others as
well as themselves uphold the spirit and letter of the Honor Code.
• The faculty on its part manifests its confidence in the honor of its students by refraining
from proctoring examinations and from taking unusual and unreasonable precautions to
prevent the forms of dishonesty mentioned above. The faculty will also avoid, as far as
practicable, academic procedures that create temptations to violate the Honor Code.
• While the faculty alone has the right and obligation to set academic requirements, the
students and faculty will work together to establish optimal conditions for honorable
academic work.
Signature
I attest that I have not given or received aid in this examination, and that I have done my
share and taken an active part in seeing to it that others as well as myself uphold the
spirit and letter of the Stanford University Honor Code.
Name: Ian Winters
Signature:
SUNet ID: 05800144
Ian Winters - Gene210 Final Exam - Spring 2013
Some questions may have multiple reasonable answers: if you are unsure, provide a
justification based in genetics and cite your sources (SNPedia is fine, journals are
better); as long as the justification is sound, you will receive full credit.
If you are unsure which SNP(s) are associated with a trait, you may consult any
reference you like.
A family of 3 (mother/father/daughter) has come to you to find out what they can learn
from their genotypes. The parents were both adopted, so they do not know any of their
family history. You have sent their DNA to LabCorp, which ran their genotypes on a
custom 1M OmniQuad array, and they’ve returned the results at:
http://stanford.edu/class/gene210/restricted/final/ (X points)
1. A mislabeling in the lab has caused the samples to be shuffled around and they
are simply labeled: ‘patient1.txt,’ ‘patient2.txt,’ and ‘patient3.txt.’ Determine which
sample is the mother’s, the father’s and the daughter’s. (15 points)
Patient 3 must be the father: he has only a single allele at each X-chromosome SNP,
unlike the other two patients, and patients 1 and 2 have no Y-chromosome alleles.
Furthermore, patients 1 and 2 have identical mitochondrial SNPs, and since
mitochondrial DNA is inherited strictly through the maternal lineage, these two patients
must be mother and daughter. At rs9651273 the genotypes are: patient1, AG;
patient2, GG; and father (patient3), GG. These genotypes are only consistent if
patient2 is the daughter. Patient1 cannot be the daughter, since her A allele could not
have come from either a GG mother or a GG father (a genotyping error could have
caused this, but the same relationships hold for the other SNPs I checked as well). So we
are left with...
Patient1 = Mother
Patient2 = Daughter
Patient3 = Father
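The rs9651273 argument can be checked mechanically. The following is a minimal sketch, using only the three genotypes quoted above; the function name child_possible is a hypothetical helper, not part of Genotation.

```python
from itertools import product

def child_possible(child, parent_a, parent_b):
    """Return True if the child's genotype can be built from one allele of each parent."""
    return any(sorted(a + b) == sorted(child)
               for a, b in product(parent_a, parent_b))

# Genotypes at rs9651273 as reported in the answer above.
genotypes = {"patient1": "AG", "patient2": "GG", "patient3": "GG"}
father = "patient3"  # already identified from the X- and Y-chromosome calls

for candidate in ("patient1", "patient2"):
    mother = "patient2" if candidate == "patient1" else "patient1"
    ok = child_possible(genotypes[candidate], genotypes[mother], genotypes[father])
    verdict = "consistent" if ok else "Mendelian error"
    print(f"{candidate} as daughter: {verdict}")
```

Running the same check over many SNPs, rather than one, would guard against a single genotyping error driving the assignment.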
2. What can you tell about the ancestry of the parents? (15 points)
Using the ancestry feature of Genotation, I plotted the global ancestry of the parents by
PCA. I first used the HGDP:World reference and a resolution of up to 100,000 SNPs.
The plot is below:
Since the parents seem to be localized to Europe, I then ran the same analysis but with
the more specific HGDP:European reference instead. The detailed regional plot is
below:
This plot indicates that the mother is of Northern Italian or Tuscan ancestry, and that the
father is French.
3. The parents are concerned about their daughter’s chance for getting breast
cancer. You investigate the genomes of the father, mother and the daughter and
provide genetic counseling for the family. (15 points total)
A. What is the lifetime risk for breast cancer for the overall population of
Europeans?
Data published in 2012 by the National Cancer Institute’s Surveillance,
Epidemiology and End Results (SEER) Program suggest that the lifetime risk of
breast cancer for people of European ancestry is 6.53% for both sexes combined (0.13% for
males, 12.73% for females)1. Although this data was collected from a population
of Europeans living within the US, a large number of people were sampled. Thus,
the 6.53% lifetime risk of breast cancer likely holds for the overall population of
Europeans.
1. Lifetime Risk (Percent) of Being Diagnosed with Cancer by Site and Race/Ethnicity: Males, 18
SEER Areas, 2007-2009 (Table 1.15) and Females, 18 SEER Areas, 2007-2009 (Table 1.16).
2012. Accessed at
http://seer.cancer.gov/csr/1975_2009_pops09/results_merged/topic_lifetime_risk_diagnosis.pdf
on May 24, 2013.
B. Does the genotype of the mother or daughter (at rs77944974) alter their risk
of breast cancer? Explain briefly, providing data on the most important
risk alleles and their effect on risk for breast cancer.
The rs77944974 SNP is one of three BRCA1 mutations used by 23andMe as a
marker for breast cancer risk. It is more commonly known as the 185delAG
BRCA1 mutation. A 2007 meta-analysis published in the Journal of Clinical Oncology by
Chen and Parmigiani indicates that a heterozygous BRCA1 mutation such as 185delAG
is associated with a 57% cumulative risk of developing breast cancer by age 70
for women1. This is a vast increase in breast
cancer susceptibility since women without the deletion allele only have about a
13% chance of developing the disease by the age of 70.
1. http://www.ncbi.nlm.nih.gov/pubmed?cmd=Search&term=17416853
C. Briefly outline what advice you would give to the mother about her risk for
breast cancer, based on your analysis?
Unfortunately, the mother has the 185delAG mutation. It would be necessary to
explain to the mother that on average, women with this particular mutation have
a 57% chance of developing breast cancer by the age of 70. However, the
mother is of southern European ancestry. A 2002 European Network of Cancer
Registries fact sheet by Tyczynski et al. noted that breast cancer rates were
about 60% higher in Western and Northern Europe than in Southern and Eastern
Europe1. It is therefore possible that the mother has a slightly lower risk of
developing breast cancer than the average European carrier of the mutation, given her
genetic background.
1. http://www.encr.com.fr/DownloadFiles/breast-factsheets.pdf
D. Briefly outline what advice you would give to the daughter about her risk
for breast cancer, based on your analysis?
Before the genetic test was conducted, it would be important to advise the family
that the daughter has a 50% chance of having the BRCA1 mutation since her
mother is heterozygous (DI) for the mutation and the father is negative (II). This
would require genetic counseling for the daughter and family prior to testing.
Fortunately, genetic testing results show that the daughter is negative for the
185delAG mutation. However, she still has a risk of developing breast cancer
since the BRCA1 mutation accounts for only a small percent of cases each year.
Since only 5-10% of breast cancer cases are known to be hereditary, the safest
conclusion is that the daughter has the same risk for breast cancer as the
population as a whole, which is about 13%.
4. Weeks later, the father (a 42 year old, 185 cm in height, 80 kg in weight, not
taking any other medication) is rushed to the hospital with a stroke. What dose of
warfarin would be given from a clinic that does not perform genetic testing?
What dose of warfarin would be given from a clinic that does perform genetic
testing? Explain the genetic basis for modifying the warfarin dose of the father
given his genotype. (5 points)
Using the Genotation Pharmacogenomics of Warfarin feature, we find that the father
would receive a weekly warfarin dose of ~39.4 mg/week from a clinic that does not
perform genetic testing. This is in stark contrast to the warfarin dose prescribed by a
clinic that does provide genetic testing of ~24.5 mg/week.
A 2002 study published in JAMA by Higashi et al. found that the CYP2C9 *2 allele was
associated with a higher risk for overanticoagulation and bleeding events following
warfarin usage as compared to the CYP2C9 *1 allele 1. Since the father has the *1/*2
CYP2C9 genotype, his warfarin dose should be reduced slightly as compared to an
individual of the *1/*1 CYP2C9 genotype.
Data from a 2005 paper published in the New England Journal of Medicine by Rieder et
al. suggest that the T allele of rs9923231 (a SNP located in VKORC1) leads to
hypersensitivity to warfarin as compared to the G allele.2 Patients with two copies of the
T allele, such as the father, were found to require less warfarin for treatment of venous
thromboembolism. Furthermore, patients with the T allele who take warfarin are at
greater risk of serious bleeding than those with the G allele. This data suggests that the
father should receive a reduced warfarin dose given that he has the TT genotype at this
SNP of the VKORC1 gene.
1. http://www.ncbi.nlm.nih.gov/pubmed/11926893
2. http://www.ncbi.nlm.nih.gov/pubmed/15930419
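The two doses can be roughly reproduced by hand. The sketch below assumes that Genotation implements the International Warfarin Pharmacogenetics Consortium (IWPC) dosing equations; that is an assumption, and the coefficients are quoted from memory of the published IWPC tables, so they should be verified before any real use. Only the terms relevant to the father are included (no race adjustment, no enzyme inducers, no amiodarone), and the function names are hypothetical.

```python
def iwpc_clinical_dose(age_years, height_cm, weight_kg):
    """Weekly dose (mg) from the clinical-only equation (assumed IWPC coefficients)."""
    decades = age_years // 10  # age enters the equation in whole decades
    root = 4.0376 - 0.2546 * decades + 0.0118 * height_cm + 0.0134 * weight_kg
    return root ** 2  # the equation predicts the square root of the weekly dose

def iwpc_pharmacogenetic_dose(age_years, height_cm, weight_kg, vkorc1, cyp2c9):
    """Weekly dose (mg) from the genotype-aware equation (assumed IWPC coefficients)."""
    # rs9923231 genotype, written with the T allele as in the answer above.
    vkorc1_effect = {"GG": 0.0, "GT": -0.8677, "TT": -1.6974}
    cyp2c9_effect = {"*1/*1": 0.0, "*1/*2": -0.5211, "*1/*3": -0.9357,
                     "*2/*2": -1.0616, "*2/*3": -1.9206, "*3/*3": -2.3312}
    decades = age_years // 10
    root = (5.6044 - 0.2614 * decades + 0.0087 * height_cm + 0.0128 * weight_kg
            + vkorc1_effect[vkorc1] + cyp2c9_effect[cyp2c9])
    return root ** 2

# The father: 42 years old, 185 cm, 80 kg, VKORC1 TT, CYP2C9 *1/*2.
print(f"clinical only:   {iwpc_clinical_dose(42, 185, 80):.1f} mg/week")
print(f"with genotypes:  {iwpc_pharmacogenetic_dose(42, 185, 80, 'TT', '*1/*2'):.1f} mg/week")
```

Under these assumptions the clinical equation gives about 39.4 mg/week, matching the Genotation figure, and the genotype-aware equation gives about 24.7 mg/week, close to the ~24.5 mg/week reported above. Both genotype terms are negative, which is the quantitative form of the argument that each variant calls for a lower dose.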
5. In her next visit, you observe that the mother has high cholesterol. Would you
prescribe simvastatin (Zocor) to the mother? Why or why not? (5 points)
I ran the mother’s genome using the Genotation Pharmacogenetics feature and acquired
the following results related to her response to simvastatin:
SNP: rs4149056
Genotype: CC
Gene: SLCO1B1
Interpretation: A person with this genotype may have a higher risk of simvastatin-related
myopathy than does a person with genotype CT or TT.
A paper published in 2006 in the European Journal of Clinical Pharmacology by
Pasanen et al. found that the C allele of rs4149056 reduces the activity of the hepatic
uptake transporter OATP1B1 (encoded by the SLCO1B1 gene), which carries simvastatin
acid into the liver 1. Since the mother is homozygous for the C allele, she clears
simvastatin from the blood more slowly than people with the CT or TT genotypes, and is
at a 17-fold increased risk of simvastatin-related myopathy2. Thus, she should not be
treated with this drug for her high cholesterol unless it is absolutely necessary.
1. http://www.ncbi.nlm.nih.gov/pubmed/16758257
2. http://snpedia.com/index.php/Rs4149056
6. You counsel the family about the risk for type 2 diabetes for their daughter.
You analyze the daughter’s genome on genotation.com. You need to explain the
results to the family, and how this influences the daughter’s risk for Type 2
diabetes. (15 points total)
A. What is the likelihood of type 2 diabetes prior to genetic testing?
Narayan et al. published a paper in 2003 in JAMA on the Lifetime Risk for
diabetes mellitus. The study found that the lifetime risk of type 2 diabetes at birth
for females of European ancestry was 31.2%. Before genetic testing, it seems
reasonable to assume the daughter has a similar likelihood of developing this
disease1. However, the daughter’s risk of developing the disease may also be
lower; Genotation analysis indicates that she has a 23.7% risk of type 2
diabetes (not accounting for sex).
1. http://jama.jamanetwork.com/article.aspx?articleid=197439
B. What is the likelihood of type 2 diabetes following analysis of the
daughter’s genotype using Genotation?
Based on the limited genetic susceptibility data available for diabetes mellitus,
Genotation analysis indicates that the daughter’s adjusted probability of
developing the disease is 44.2%.
C. How many SNPs were used to assess the risk for type 2 diabetes?
Fifteen SNPs were used to assess the risk for type 2 diabetes, although three of
these were imputed from other SNPs using haplotype data.
D. How were the SNPs combined to give the overall score? Which SNP had
the greatest influence on diabetes risk? Explain briefly.
Firstly, the daughter’s initial probability of developing type 2 diabetes was
established based on her ancestry. Data from each SNP was then used to
determine an “adjusted probability” of developing diabetes. This was done by
ordering the SNPs by the size of the study from which they were discovered,
followed by multiplying the likelihood ratio (LR) of one SNP at a time (for its effect on
diabetes) by the “running LR.” In simpler terms, the LR of the daughter’s
ancestry was multiplied by the LR for the SNP from the study with the largest
sample size followed by the LR of the SNP with the next largest sample size, and
so on until all SNPs were taken into account.
In this particular case, rs9465871 was the SNP with the greatest influence on
diabetes risk since it had the largest LR of any SNP and also shifted the
daughter’s probability of developing the disease ~10%, more than any other
SNP.
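The running-LR procedure described above amounts to converting the starting probability to odds, multiplying by each SNP's likelihood ratio, and converting back. The sketch below shows that arithmetic; the per-SNP LRs in the example list are hypothetical placeholders (the real values come from the Genotation report), and the only figures taken from this exam are the 23.7% starting probability and the 44.2% adjusted probability.

```python
def prob_to_odds(p):
    return p / (1.0 - p)

def odds_to_prob(odds):
    return odds / (1.0 + odds)

def adjusted_probability(prior, likelihood_ratios):
    """Multiply the prior odds by each likelihood ratio in turn."""
    odds = prob_to_odds(prior)
    for lr in likelihood_ratios:
        odds *= lr  # the "running LR" update, one SNP at a time
    return odds_to_prob(odds)

prior, posterior = 0.237, 0.442  # values reported by Genotation above

# Combined LR across all SNPs that is implied by the two reported probabilities.
combined_lr = prob_to_odds(posterior) / prob_to_odds(prior)
print(f"implied combined LR: {combined_lr:.2f}")

# Hypothetical per-SNP LRs, only to show the mechanics of the update.
example_lrs = [1.20, 0.90, 1.10]
print(f"example adjusted probability: {adjusted_probability(prior, example_lrs):.3f}")
```

Because multiplication is commutative, the order in which SNPs are applied changes the intermediate values but not the final adjusted probability.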
E. What advice can you provide to the family to help mitigate the chance of
their daughter developing type 2 diabetes?
Only about 20% of the risk of developing diabetes is thought to be heritable.
This means that a large portion of the disease risk comes from external factors.
There are several ways in which the parents can help reduce the risk of their
daughter developing diabetes. These include encouraging her to stay physically
active, eat a healthy and balanced diet, avoid smoking, stay at a healthy weight,
and manage her blood pressure, cholesterol and blood glucose levels.
7. The following two SNPs were shown to be associated with risk for type 2
diabetes in two GWAS studies. (15 points total)
snp          odds ratio   p-value         cases    controls
rs4402960    1.14         8.9 x 10^-16    14586    17968
rs7754840    1.28         3.5 x 10^-7     1921     1622
A. Which SNP has a larger effect size on risk for type 2 diabetes? Explain
your answer.
Odds ratios are a measure of effect size. In this case they essentially describe
the strength of the association between changes in alleles at a particular SNP
and changes in disease risk. Thus, since rs7754840 has a larger odds ratio, this
SNP has a larger effect size on risk for type 2 diabetes.
B. Which SNP is most statistically significant for risk for type 2 diabetes; i.e.
which SNP is most likely to have a true association?
The statistical significance of a SNP for risk for disease is represented by a p-value.
This value answers the question: if there were no true association between SNP and
disease, how likely is it that an association at least this strong would appear simply by
chance? It is therefore natural that a smaller p-value is
representative of a greater statistical significance. So rs4402960 is most
statistically significant since it has the smallest p-value.
C. Is the SNP with the biggest effect size on risk for type 2 diabetes always
going to be the SNP that is most statistically significant? Why or why not?
The SNP with the biggest effect size on risk for type 2 diabetes is not always
going to be the SNP that is most statistically significant. Effect size is not a
product of sample size, while p-value is. Thus, even if the effect size of a SNP is
accurate, the significance may be low with a small sample size.
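The dependence of the p-value on sample size can be illustrated with a small calculation. This is a sketch using a Wald test on the log odds ratio of a 2x2 allele-count table; the counts are hypothetical and are not taken from either GWAS, and the function name is an assumption.

```python
import math

def odds_ratio_test(case_risk, case_ref, ctrl_risk, ctrl_ref):
    """Odds ratio and two-sided Wald p-value from a 2x2 table of allele counts."""
    odds_ratio = (case_risk * ctrl_ref) / (case_ref * ctrl_risk)
    # Standard error of log(OR) shrinks as the counts grow.
    se = math.sqrt(1 / case_risk + 1 / case_ref + 1 / ctrl_risk + 1 / ctrl_ref)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return odds_ratio, p_value

# Hypothetical small study, and the same allele frequencies with 50x the sample.
small = (120, 80, 100, 100)
large = tuple(50 * n for n in small)

for label, table in (("small study", small), ("large study", large)):
    odds_ratio, p_value = odds_ratio_test(*table)
    print(f"{label}: OR = {odds_ratio:.2f}, p = {p_value:.3g}")
```

Both tables have the same odds ratio, but the larger study yields a far smaller p-value. This is the pattern in the table for question 7, where the SNP with the smaller odds ratio came from the much larger study and has the smaller p-value.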
D. rs7754840 is a SNP that lies within the CDKAL1 gene. This SNP was
identified because it was contained on the Illumina Chip used for
genotyping in the GWAS study. Does this result indicate that rs7754840 is
the causal mutation? Does this result indicate that CDKAL1 is involved in
type 2 diabetes? Explain why or why not.
This result does not necessarily mean that rs7754840 is the causal mutation, or even
that CDKAL1 is involved in type 2 diabetes. The SNP was tested only because it
happened to be on the genotyping chip, so it could simply be associated with
diabetes as a sort of “passenger” mutation that has no actual function. If this
SNP is not causal, the significant association between this SNP and risk of
diabetes can be explained in several ways. Perhaps this SNP is simply a
necessary precursor to the causal diabetes mutation (if such a thing exists). This
SNP may also simply be in linkage disequilibrium with the true causal mutation (that
is, inherited on the same haplotype block), have nothing to do with diabetes itself,
and still be significantly associated with the risk of developing the disease. That
causal mutation could lie in CDKAL1 or in a neighboring gene or regulatory region, so
the result points to the region around CDKAL1 rather than proving that the gene
itself is involved.
8. The two parents are considering having another child. You analyze their
genomes and then counsel them on their chance of having a child with one of the
following diseases: hemochromatosis (rs1800562), Alzheimer’s disease
(specifically, look for APOE4 status), breast cancer (BRCA1 status; rs77944974),
cystic fibrosis (rs113993960) and sickle cell anemia (rs334).
For each of these five diseases, what is the chance that the child will have that
disease? Briefly explain your answer. (15 points total)
Hemochromatosis (rs1800562)
The A allele of the SNP rs1800562 is associated with hemochromatosis in males and
post-menopausal women who are homozygous for the allele. The father is GG and the
mother is AG at this locus, so the child cannot inherit two copies of the A allele and is
therefore not at high risk of developing this disease (<1%). This is especially true since
both parents are homozygous for the non-risk allele at rs1799945, a SNP known to
synergistically interact with the heterozygous rs1800562 genotype and slightly increase
the risk of mild hemochromatosis1.
1. http://snpedia.com/index.php/Rs1800562
Alzheimer’s disease (specifically, look for APOE4 status)
SNPs located in the APOE gene have the strongest known associations with risk of
developing Alzheimer’s disease. The APOE rs429358 and rs7412 SNPs define three
variants that have known associations with Alzheimer’s as shown in the following table
from the 23andMe website:
Variant    rs429358    rs7412
ε2         T           T
ε3         T           C
ε4         C           C
The ε4 variant is the risk allele. A single copy of this variant leads to a 2-fold increase in
susceptibility to the disease while two copies is associated with an 11-fold increase as
compared to the ε3/ε3 variant (according to 23andMe).
The father’s genotypes at these locations are as follows: rs429358, CC and rs7412,
CC. The mother’s genotypes are as follows: rs429358, CT and rs7412, CC. Given these
genotypes, the child will have a 50% chance of being ε3/ε4 and a 50% chance of being
ε4/ε4. According to 23andMe, an individual of European ancestry with the ε3/ε3 variant
has a ~5% chance of developing Alzheimer’s during their lifetime. This means the child
will have with equal probability a 10% (2-fold increase) versus a 55% (11-fold increase)
chance of developing this devastating disease during his or her lifetime.
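The APOE reasoning above can be written out as a lookup followed by a cross. This is a minimal sketch: the lookup table follows the 23andMe variant definitions quoted above and assumes the rare ε1 haplotype is absent, the 5% baseline and the 2-fold and 11-fold figures are the ones cited in this answer, and multiplying the baseline by the fold increase is the same simplification used in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# (rs429358 genotype, rs7412 genotype) -> APOE diplotype.
# Assumes the rare ε1 haplotype is absent, so CT/CT is read as ε2/ε4.
APOE_DIPLOTYPE = {
    ("TT", "TT"): ("ε2", "ε2"),
    ("TT", "CT"): ("ε2", "ε3"),
    ("TT", "CC"): ("ε3", "ε3"),
    ("CT", "CT"): ("ε2", "ε4"),
    ("CT", "CC"): ("ε3", "ε4"),
    ("CC", "CC"): ("ε4", "ε4"),
}

father = APOE_DIPLOTYPE[("CC", "CC")]  # ε4/ε4
mother = APOE_DIPLOTYPE[("CT", "CC")]  # ε3/ε4

# Each parent passes on one of their two variants with equal probability.
counts = Counter(tuple(sorted(pair)) for pair in product(father, mother))
total = sum(counts.values())
child = {diplotype: Fraction(n, total) for diplotype, n in counts.items()}

baseline = 0.05                                       # ε3/ε3 lifetime risk cited above
fold = {("ε3", "ε4"): 2, ("ε4", "ε4"): 11}           # fold increases cited above

for diplotype, prob in sorted(child.items()):
    risk = baseline * fold[diplotype]
    print(f"{'/'.join(diplotype)}: probability {prob}, lifetime risk ~{risk:.0%}")
```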
Breast cancer (BRCA1 status; rs77944974)
The child has a 50% chance of having the BRCA1 mutation since the mother is
heterozygous (DI) for the mutation and the father is negative (II). Therefore, there is an
equal probability that the child is at 13% versus 57% risk of developing the disease.
Cystic fibrosis (rs113993960)
The deletion allele of rs113993960 is present in 70% of patients with cystic fibrosis
(CF)1. This mutation is recessive, so patients with CF caused by it are homozygous for
the deletion. Since both parents are heterozygous carriers of the deletion (DI), the child
will develop cystic fibrosis with a probability of 25%.
1. http://snpedia.com/index.php/Rs113993960
Sickle cell anemia (rs334)
The A allele at rs334 allows for the production of normal hemoglobin (Hb A). This is in
contrast to the T allele, which leads to the production of irregular hemoglobin and
sickled blood cells. Since both parents are homozygous for the A allele, the child is not
at risk of developing sickle cell anemia.
1. http://snpedia.com/index.php/Rs334
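The single-SNP probabilities in this question all come from the same Punnett-square calculation. The sketch below enumerates the four equally likely allele combinations for each cross, using the parental genotypes stated in the answers above; the helper name cross is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent_a, parent_b):
    """Genotype probabilities for a child of two parents at one biallelic site."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# (father, mother, genotype of interest), taken from the answers above.
crosses = {
    "hemochromatosis rs1800562 (AA)": ("GG", "AG", "AA"),
    "BRCA1 rs77944974 (DI carrier)":  ("II", "DI", "DI"),
    "cystic fibrosis rs113993960 (DD)": ("DI", "DI", "DD"),
    "sickle cell rs334 (TT)":          ("AA", "AA", "TT"),
}

for label, (father, mother, target) in crosses.items():
    prob = cross(father, mother).get(target, Fraction(0))
    print(f"{label}: {prob}")
```

For BRCA1 the value is the chance of inheriting the mutation, not of developing breast cancer; the disease risk is then 57% or 13% depending on the outcome, as described above.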
9. Prenatal genetic diagnosis (15 points total)
A) A pregnant woman seeks non-invasive prenatal genetic testing and provides a
sample of plasma. You isolate the cell-free DNA (cfDNA) from the maternal
plasma and determine that 10% of it is derived from the fetus. You perform whole
genome sequencing on genomic DNA samples from the mother and father. Next
you perform whole genome sequencing on the cfDNA isolated from maternal
plasma. For each of the sites below, you obtain 100X coverage (i.e., 100 reads for
each site). Fill in the expected read counts in the tables below. Use the parental
genotypes below and the observed allele counts for the cfDNA sequencing to
infer the genotype of the fetus at each of three sites and fill them in the table.
The mother contributes 90% of the total reads, so for each of her two alleles we expect
45 reads from the mother. We expect 10 reads from the fetus, 5 for each allele. So…
Site 1              If mother transmits A    If mother transmits G
A reads observed    59                       59
A reads expected    45 + 10 = 55             45 + 5 = 50

Site 2              If mother transmits A    If mother transmits G
A reads observed    52                       52
A reads expected    45 + 10 = 55             45 + 5 = 50
Site 3              If mother transmits T    If mother transmits C
T reads observed    49                       49
T reads expected    45 + 10 = 55             45 + 5 = 50

Infer fetal genotype:
Site 1: A/A
Site 2: A/G
Site 3: T/C
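The calls above follow a nearest-expected-count rule, which can be sketched as follows. The code assumes what the expected counts in the tables imply: the mother is heterozygous at each site and the father contributes the first-listed allele (A, A and T). The parental genotype tables themselves are not reproduced in this document, so that is an assumption, and the function names are hypothetical.

```python
def expected_reads(depth=100, fetal_fraction=0.10):
    """Expected reads of the paternal allele for each possible fetal genotype."""
    maternal_per_allele = depth * (1 - fetal_fraction) / 2   # 45 at 100X
    fetal_per_allele = depth * fetal_fraction / 2            # 5 at 100X
    return {
        "homozygous": maternal_per_allele + 2 * fetal_per_allele,   # 55
        "heterozygous": maternal_per_allele + fetal_per_allele,     # 50
    }

def call_fetal_genotype(observed, paternal_allele, other_allele):
    """Pick the fetal genotype whose expected count is closest to the observed count."""
    expected = expected_reads()
    best = min(expected, key=lambda state: abs(observed - expected[state]))
    if best == "homozygous":
        return f"{paternal_allele}/{paternal_allele}"
    return f"{paternal_allele}/{other_allele}"

# (observed reads of the paternal allele, paternal allele, other maternal allele)
sites = {"Site 1": (59, "A", "G"), "Site 2": (52, "A", "G"), "Site 3": (49, "T", "C")}
for site, (observed, paternal, other) in sites.items():
    print(site, call_fetal_genotype(observed, paternal, other))
```

The two expected counts differ by only 5 reads at 100X coverage, so an observation such as 52 sits close to the decision boundary; this is why haplotype information is useful for refining the calls.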
B) You worry that your call at site 3 might not be accurate. In order to improve the
accuracy of your fetal genotyping, you use parental haplotype blocks. Re-evaluate your
fetal genotype inference based on the maternal haplotypes below.
The maternal contribution to the previously inferred fetal genotype was A,G for sites 1
and 2. Since the maternal haplotype at these two sites is A,A, there must have been a
crossing over event between site 1 and site 2 leading to an A,G,T haplotype. Thus the
newly inferred genotype is…
Re-evaluated fetal genotype inference:
Site 1: A/A
Site 2: A/G
Site 3: T/T
It makes more sense if “you worry that your call at site 2 might not be accurate” (instead
of 3) since the middle genotype is right on the border between expected genotypes. If
this is a typo, then the site 2 genotype inferred from the maternal haplotype is A. This
leaves us with the following re-evaluated fetal genotype inference…
Re-evaluated fetal genotype inference (if actually site 2):
Site 1: A/A
Site 2: A/A
Site 3: C/T
10. Neurodegenerative disease genetics (15 points total)
A) Mutations in several genes connected to production of amyloid peptides are associated
with early onset Alzheimer disease. These include
presenilin 1 (PSN1) and
presenilin
PSN1 and PSN2 are components of gamma-secretase, the enzymatic complex
Alzheimer disease-linked
Syndrome (trisomy 21), since the APP gene is located on chromosome 21. Thus,
Researchers from the company deCODE Genetics in Iceland analyzed whole-genome
sequence data from 1,795 elderly Icelanders and identified a coding
mutation (Ala673Thr) in APP that protects against Alzheimer disease and
cognitive decline in the elderly without Alzheimer disease. They found that the
protective Ala673Thr variant was significantly more common in a group of
over-85-year-olds without Alzheimer disease (the incidence was 0.62%), and even
more so in cognitively intact over-85-year-olds (0.79%), than in patients with
Alzheimer's disease (0.13%). Based on what you know about Alzheimer disease
genetics:
A) In one or two sentences, propose a mechanism by which this mutation might
protect against Alzheimer disease.
One mechanism by which the Ala673Thr mutation may protect against Alzheimer’s
disease is that it inhibits or reduces the ability of the gamma-secretase
enzymatic complex (which includes PSN1 and PSN2) to cleave the APP protein
into Aβ peptides. This would have a protective effect against Alzheimer’s since Aβ
peptides would never accumulate, or at least would not accumulate as rapidly as in
patients with the disease.
B) In one or two sentences, suggest an experiment to test your hypothesis.
A primitive but potentially informative experiment to test this hypothesis would be an in
vitro enzymatic assay using APP protein and PSN1 and PSN2 (and perhaps other
members of the gamma-secretase complex). This assay would measure the ability
of isolated PSN1 or PSN2 protein to cleave APP protein with and without the
Ala673Thr mutation, to see if the mutation inhibits or reduces APP cleavage.
11. Extra credit question available in the May 23 slot at
http://stanford.edu/class/gene210/web/html/schedule.html (13 pts). 34827615
Some of the traits do not match between given phenotypes and genetic data, but the
following matches were made using ancestry data and phenotype data:
Person 1: E
Person 2: H
Person 3: A
Person 4: D
Person 5: G
Person 6: F
Person 7: C
Person 8: B