UCSD Genetics Training Program

BIOM262/BGGN237
Quantitative Methods in Genetics
Winter, 2014
Final Exam
Due Wednesday, March 19, by noon (PDT) to bah@ucsd.edu
Instructions:
Each question relates to a specific module of the course and was designed by
the faculty member in charge of that module. If the meaning of an exam question
is unclear, you should first contact the faculty member in charge of the relevant
module for clarification.
You may (and should) consult class notes, handouts and assigned readings for
the relevant module. You may consult other sources, including books, published
research articles, and web resources. You may NOT consult any person, known
or anonymous, whether by email, chat rooms, social media, discussion boards
or any other medium. Violations of this restriction will be pursued as academic
dishonesty per University policy. If there are questions regarding this policy,
please address them to bah@ucsd.edu before going further.
Course grades will be based on the best eight of nine modules. Modules 5 and 9
will be graded on the project assigned by Dr. Yeo. The exam includes questions
from seven modules. The grading basis will be the average of the top three
student scores for their eight best modules, prior to awarding extra credit. This
average will define a normalized score of 100. If you complete the project and all
exam questions, half the value of your lowest score will be awarded as extra
credit. Some exam questions offer additional opportunities for extra credit,
though these may be more challenging than the primary exam questions. A final
score of 100 or above earns an A+ (limited to the top five scores); 90 and above
earns an A, 80 and above a B, and so on.
You can do well in the course even if you drop a module completely, but it is
quantitatively in your favor to attempt each module.
Extra credit questions are pure bonus; there is no penalty for skipping them.
Module 1 (Raffi Aroian)
Question 1.1: A microscope company claims its new confocal microscope can scan an
entire 18 × 18 mm coverslip at 400X magnification in 60 seconds with a standard
deviation of 30 seconds. You decide to demo the system. You scan 10 coverslips
at 400X magnification and find it took 750 seconds to collect the data from all 10
coverslips. Is this enough evidence to refute the company’s claim?
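One way to frame this calculation, sketched in Python (the question names no language): treat the company's 30 s as a known population standard deviation, assume the ten scans are independent, and use a one-sided z-test on the mean scan time, since only slower-than-claimed scanning would refute the claim. These modelling choices are assumptions, and other framings can be defended.

```python
import math


def one_sided_z_test(claimed_mean, claimed_sd, n, total_time):
    """Upper-tailed z-test of H0: mean scan time = claimed_mean.

    Assumes independent scans and a known population SD (the company's figure),
    so the sample mean is approximately normal with SE = sd / sqrt(n).
    """
    observed_mean = total_time / n
    standard_error = claimed_sd / math.sqrt(n)
    z = (observed_mean - claimed_mean) / standard_error
    # P(Z >= z) for a standard normal variable
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return observed_mean, standard_error, z, p_upper


if __name__ == "__main__":
    mean, se, z, p = one_sided_z_test(claimed_mean=60, claimed_sd=30, n=10, total_time=750)
    print(f"mean = {mean:.1f} s, SE = {se:.2f} s, z = {z:.3f}, one-sided p = {p:.4f}")
```

With these inputs the mean scan time is 75 s and z is about 1.58, giving a one-sided p of roughly 0.057; whether that refutes the claim depends on the significance level you commit to in advance.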
Question 1.2: It has been hypothesized that changes in microbiota can correlate
with changes in learning capabilities. Germfree mice were reconstituted with
microbiota from C57BL/6 mice or microbiota from Swiss Webster mice (the two
strains have different baseline microbiota). The mice were then subjected to a
learning test, scored on a scale of 1-200, where 200 indicates high performance
and 1 indicates low performance. Does microbiota influence learning? Be sure
to justify the test you decide to use.
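Learning scores, mice reconstituted with C57BL/6 microbiota (n = 17):
154, 109, 137, 115, 152, 140, 154, 178, 101, 103, 126, 137, 165, 165, 129, 200, 148
Learning scores, mice reconstituted with Swiss Webster microbiota (n = 18):
108, 140, 114, 91, 180, 115, 126, 92, 169, 146, 109, 132, 75, 86, 70, 115, 187, 104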
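The scores are bounded (1 to 200) and the two groups are independent and differ in size (17 and 18 mice), so a rank-based test is one defensible choice. The Python sketch below (the language is an assumption) implements a two-sided Mann-Whitney U test using a normal approximation with a tie correction but no continuity correction, so its p-value may differ slightly from R's `wilcox.test`, which applies a continuity correction by default.

```python
import math
from collections import Counter

C57BL6 = [154, 109, 137, 115, 152, 140, 154, 178, 101, 103, 126, 137, 165, 165, 129, 200, 148]
SWISS_WEBSTER = [108, 140, 114, 91, 180, 115, 126, 92, 169, 146, 109, 132, 75, 86, 70, 115, 187, 104]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test: normal approximation with tie correction,
    no continuity correction. Returns (U for x, z, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided


if __name__ == "__main__":
    u, z, p = mann_whitney_u(C57BL6, SWISS_WEBSTER)
    print(f"U = {u}, z = {z:.3f}, two-sided p = {p:.4f}")
```

A Welch two-sample t-test is a reasonable alternative if you judge the scores close enough to normal; choosing between them, and justifying the choice, is the point of the question.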
Module 2 (Bruce Hamilton)
Postdoc Peter Prettygood has picked a project he can pilot with PCR. To test the
hypothesis that the marvelous mutation modifies the amount of Mfr2 mRNA in
maxillary glands, he performs quantitative RT-PCR and calculates the relative
quantity compared to the geometric mean of a robust set of reference genes.
Here are Peter’s data:
Sample     Genotype  Mfr2     Mfr1
mutant1    AA        0.42016  1.07500
mutant2    AA        0.47185  1.02880
mutant3    AA        0.53506  0.94277
mutant4    AA        0.49547  0.78694
mutant5    AA        0.42986  0.94166
mutant6    AA        0.56094  1.23203
mutant7    AA        1.45536  1.04192
mutant8    AA        0.89592  0.76590
mutant9    AA        0.39923  0.79212
mutant10   AA        0.35265  0.79979
mutant11   AA        0.46934  1.00656
control1   BB        0.44401  1.44315
control2   BB        0.64210  1.02843
control3   BB        0.47597  0.79809
control4   BB        0.67194  0.83985
control5   BB        0.51120  0.97134
control6   BB        0.61028  1.37147
control7   BB        1.60358  1.14948
control8   BB        1.12220  0.89417
control9   BB        0.41608  0.93955
control10  BB        0.54020  0.93507
control11  BB        0.42419  1.25809
Assuming these samples are unpaired, how would you analyze Peter’s data?
Justify your choice of test, including tails. Perform the calculation in R. Under
your test, what is the probability Peter would have obtained a difference this large
by chance?
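Relative quantities are ratios, so analyzing them on a log2 scale is a common choice (an assumption here), and the hypothesis that the mutation "modifies" Mfr2 states no direction, which argues for two tails. As a stdlib-only cross-check of whatever R test you choose, the Python sketch below runs an exact two-sample permutation test on the difference in mean log2 relative quantity; the permutation test is a swapped-in technique, not necessarily the intended answer.

```python
import math
from itertools import combinations

# Peter's Mfr2 relative quantities, in sample order (mutant1..11, control1..11)
MUTANT_MFR2 = [0.42016, 0.47185, 0.53506, 0.49547, 0.42986, 0.56094,
               1.45536, 0.89592, 0.39923, 0.35265, 0.46934]
CONTROL_MFR2 = [0.44401, 0.64210, 0.47597, 0.67194, 0.51120, 0.61028,
                1.60358, 1.12220, 0.41608, 0.54020, 0.42419]


def exact_permutation_test(x, y):
    """Two-sided exact permutation test on mean(log2 x) - mean(log2 y).

    Every split of the pooled log2 values into groups of the original sizes is
    enumerated (C(22, 11) = 705,432 splits for these data), so this takes a
    second or two in pure Python.
    """
    pooled = [math.log2(v) for v in list(x) + list(y)]
    n, n_x = len(pooled), len(x)
    total = sum(pooled)

    def mean_difference(sum_x):
        # The difference in group means is fixed by the sum of group x
        return sum_x / n_x - (total - sum_x) / (n - n_x)

    observed = mean_difference(sum(pooled[:n_x]))
    threshold = abs(observed) - 1e-12  # tolerance for floating-point ties
    extreme = splits = 0
    for idx in combinations(range(n), n_x):
        splits += 1
        if abs(mean_difference(sum(pooled[i] for i in idx))) >= threshold:
            extreme += 1
    return observed, extreme / splits
```

Call `exact_permutation_test(MUTANT_MFR2, CONTROL_MFR2)` to get the observed log2 difference and its two-sided p-value. In R, a parametric analogue would be `t.test(log2(mutant), log2(control))`, where the vector names are hypothetical.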
How much would your answer change (what is the relevant test and p-value) if
the samples were paired (mutant1 with control1, etc.)?
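If the samples are paired, the natural unit becomes the within-pair log2 ratio. A stdlib-only sketch of the paired counterpart, again on the log2 scale by assumption, is an exact sign-flip permutation test: under the null hypothesis each pair's log ratio is equally likely to be positive or negative, and with 11 pairs all 2,048 sign patterns can be enumerated.

```python
import math
from itertools import product

# Peter's Mfr2 relative quantities; position i of each list forms pair i
MUTANT_MFR2 = [0.42016, 0.47185, 0.53506, 0.49547, 0.42986, 0.56094,
               1.45536, 0.89592, 0.39923, 0.35265, 0.46934]
CONTROL_MFR2 = [0.44401, 0.64210, 0.47597, 0.67194, 0.51120, 0.61028,
                1.60358, 1.12220, 0.41608, 0.54020, 0.42419]


def exact_sign_flip_test(x, y):
    """Two-sided exact paired permutation (sign-flip) test on log2(x / y).

    Returns the mean within-pair log2 ratio and the exact p-value over all
    2**n sign patterns.
    """
    diffs = [math.log2(a / b) for a, b in zip(x, y)]
    threshold = abs(sum(diffs)) - 1e-12  # tolerance for floating-point ties
    extreme = patterns = 0
    for signs in product((1, -1), repeat=len(diffs)):
        patterns += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= threshold:
            extreme += 1
    return sum(diffs) / len(diffs), extreme / patterns
```

In R, a parametric analogue would be `t.test(log2(mutant), log2(control), paired = TRUE)`, with hypothetical vector names. Comparing this p-value with the unpaired one shows how much pairing changes the answer.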
Extra credit:
If Peter’s PI, Betsy Bayes, had previously thought the likelihood of Peter’s
hypothesis was 1,000,000 to 1 against, based on prior evidence in the field, how
should she change her view in light of this data? What should her new view be?
If Betsy Bayes had instead considered Peter’s hypothesis likely, based on prior
data for Mfr1 provided in the table and on a known likelihood that these genes
are co-regulated, should this matter for how she views the data on Mfr2?
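The updating rule itself is posterior odds = prior odds × Bayes factor. A p-value is not a Bayes factor, but one published calibration (Sellke, Bayarri and Berger, 2001) bounds the Bayes factor in favour of the alternative by 1 / (-e * p * ln p) for p < 1/e. The Python sketch below applies that bound to Betsy's 1,000,000-to-1 prior; the p-value in the usage block is a placeholder, not Peter's result.

```python
import math


def bayes_factor_bound(p_value):
    """Upper bound on the Bayes factor for H1 over H0 implied by a p-value.

    Uses the -e * p * ln(p) calibration, which is valid only for 0 < p < 1/e.
    """
    if not 0 < p_value < 1 / math.e:
        raise ValueError("the bound is defined only for 0 < p < 1/e")
    return 1.0 / (-math.e * p_value * math.log(p_value))


def posterior_odds(prior_odds, bayes_factor):
    """Posterior odds = prior odds x Bayes factor."""
    return prior_odds * bayes_factor


if __name__ == "__main__":
    prior = 1 / 1_000_000  # 1,000,000 to 1 against
    p_example = 0.03       # placeholder p-value, NOT Peter's result
    best_case = posterior_odds(prior, bayes_factor_bound(p_example))
    print(f"posterior odds at most {best_case:.2e} (about 1 in {1 / best_case:,.0f})")
```

Because the bound is a best case for the alternative, it shows how far a single p-value can move a very skeptical prior at most.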
Module 3 (Bruce Hamilton)
A friend asks you to help analyze a large set of genotyping data from a rat
genetic linkage cross. The experimental design is an intercross set up to detect
modifiers of a nasty mutation. As part of your initial analysis you notice that some
markers have unusual distributions of genotypes, unusual enough to have
significant chi-square tests.
3.1. Provide three plausible explanations for deviations from expected genotype
distribution that you might find in genome-wide data.
3.2. Which of these explanations would you favor if you had densely spaced
markers and the significant chi-square tests were for consecutive markers?
Which explanations would you favor if the significant chi-square tests were for
non-adjacent markers?
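For reference, the per-marker test here is a chi-square goodness-of-fit to the 1:2:1 intercross expectation; with three genotype classes it has 2 degrees of freedom, and the chi-square upper tail with 2 df is exactly exp(-x / 2). The Python sketch below computes that test and, as a bookkeeping aid for 3.2, groups markers (listed in map order) with p below an uncorrected alpha into runs of adjacent markers. The helper names and example counts are hypothetical, and with genome-wide data you would also need to account for multiple testing. Interpreting long runs versus isolated hits is the substance of 3.2; the code only does the bookkeeping.

```python
import math


def chisq_121(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit to a 1:2:1 ratio; returns (statistic, p) with 2 df."""
    n = n_aa + n_ab + n_bb
    observed = (n_aa, n_ab, n_bb)
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.exp(-stat / 2)  # chi-square(2) upper tail


def significant_runs(p_values, alpha=0.05):
    """Group indices of markers with p < alpha into runs of consecutive markers."""
    runs, current = [], []
    for i, p in enumerate(p_values):
        if p < alpha:
            current.append(i)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```

For example, hypothetical counts of 30 AA, 100 AB and 70 BB give a statistic of 16 and p = exp(-8).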
3.3. Using the qtl package in R and the n12_262.csv intercross data set we
explored in class, plot the nonparametric linkage evidence (LOD scores) for the
class trait (phenotype 2) as we did in class. (Be sure to follow the map
estimation steps we used in class to generate a smooth curve of imputed values.)
What LOD value would define significant linkage according to the Lander and
Kruglyak (1995) guidelines (making no assumptions about inheritance mode,
therefore 2 degrees of freedom)? Draw a red line at this threshold. How many
linkage peaks are significant by this standard? Paste your plot below
(screenshots are fine).
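A LOD score is a log10 likelihood ratio, so 2 × ln(10) × LOD is the likelihood-ratio statistic, asymptotically chi-square with the stated degrees of freedom. With 2 df the pointwise p-value simplifies to exactly 10^(-LOD). The Python sketch below makes that conversion explicit for 1 or 2 df. It gives pointwise significance only; it does not compute the Lander and Kruglyak genome-wide threshold, and in R the red line itself can be drawn with base R's `abline(h = threshold, col = "red")` after plotting.

```python
import math


def lod_to_pointwise_p(lod, df):
    """Pointwise p-value for a LOD score.

    2 * ln(10) * LOD is the likelihood-ratio statistic, treated here as
    asymptotically chi-square with df degrees of freedom (df = 1 or 2 only).
    """
    if lod < 0:
        raise ValueError("LOD must be non-negative")
    x = 2 * math.log(10) * lod
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # chi-square(1) upper tail
    if df == 2:
        return math.exp(-x / 2)  # chi-square(2) upper tail, equal to 10 ** -lod
    raise ValueError("only df = 1 or 2 are handled in this sketch")
```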
Extra credit: For the plot above, add a solid, salmon-colored line for the 95%
Bayes credible interval.
Module 4 (Elizabeth Winzeler)
You have heard rumors that a family that you know at the University may have
contributed DNA for the CEU trio whole genome sequencing effort. In this family
the mother has blue eyes, the father brown, and the daughter blue. Based on
your reading of the literature, you know that the rs12913832 SNP in HERC2, located
at the center of the chr15:28,365,602-28,365,634 interval of the hg19 assembly, is
associated with eye color, with the GG genotype associated with blue
eyes. Could your acquaintances be the donors? Explain your logic.
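One way to make the logic explicit, assuming you have already looked up the trio's genotypes at rs12913832 (however you choose to retrieve them): check that the daughter's genotype is Mendelian given her parents' genotypes, and that each person's eye color fits their genotype. The Python sketch below does both; its genotype-to-color rule (GG blue, any A allele brown) is a deliberate simplification, because the real association is strong but not deterministic, so a mismatch is evidence against the family rather than proof.

```python
from itertools import product


def possible_child_genotypes(mother, father):
    """Genotypes a child could inherit; genotypes are 2-letter strings such as 'AG'."""
    return {"".join(sorted(m + f)) for m, f in product(mother, father)}


def predicted_color(genotype):
    # Simplifying assumption: GG at rs12913832 -> blue, any A allele -> brown
    return "blue" if genotype == "GG" else "brown"


def trio_is_consistent(genotypes, colors):
    """genotypes and colors are (mother, father, daughter) tuples.

    True if the daughter's genotype is Mendelian given the parents' genotypes
    and every observed eye color matches the simplified rule above.
    """
    mother, father, daughter = genotypes
    mendelian = "".join(sorted(daughter)) in possible_child_genotypes(mother, father)
    colors_match = all(predicted_color("".join(sorted(g))) == c
                       for g, c in zip(genotypes, colors))
    return mendelian and colors_match
```

For example, a GG mother, an AG father and a GG daughter are consistent with blue, brown and blue eyes, whereas an AA father could not have a GG daughter.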
Extra credit history questions (in place of Module 6)
(pure opportunity; skipping them will not hurt you)
6.1 Which of the following statisticians may be considered Bayesians?
a. Thomas Bayes
b. Pierre-Simon Laplace
c. David Blackwell
d. Ronald Aylmer Fisher
e. Nate Silver
6.2 Who is credited with coining the term “Bayesian”?
6.3 Densitometry of x-ray film after exposure to ³²P-labeled probes revolutionized
quantitative assessment of DNA and RNA in blotting experiments. Which of the
following are significant limitations of this anachronistic quantitative method?
a. Decay of ³²P is non-linear.
b. Background may be non-uniform and difficult to subtract precisely.
c. Hybridization kinetics depend on salt concentration and reaction
temperature relative to the Tm.
d. The clicking noise made by a standard Geiger counter sounds scary.
e. Conversion of silver grains in film is non-linear.
f. Performing this properly is labor-intensive and therefore low-throughput.
Module 7 (Scott Rifkin)
Under genetic drift, an allele at a frequency of 0.4 will persist in a population for
an average of 2.7N generations where N is the population size. Determine the
standard deviation of the persistence time for such an allele in terms of N using
simulation. Include plots from your simulation to support your answer.
What is the standard deviation if the allele starts at a frequency of 0.15?
As a reminder, you can import the simulation programs into R via:
source("http://labs.biology.ucsd.edu/rifkin/courses/BIOM262/allProgs.R")
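For readers working outside R, the Python sketch below is a minimal Wright-Fisher simulation; it does not use or reproduce the course's allProgs.R functions. It assumes N diploid individuals (2N gene copies, binomial resampling each generation) and takes persistence time as generations until the allele is lost or fixed. That reading matches the quoted figure, since the diffusion approximation -4N[p ln p + (1 - p) ln(1 - p)] gives about 2.69N at p = 0.4. The population size, replicate count and seed are illustrative, and plots are left to you.

```python
import random
import statistics


def persistence_time(p0, n, rng):
    """Generations until an allele starting at frequency p0 is lost or fixed
    in a Wright-Fisher population of n diploids (2n gene copies)."""
    copies = 2 * n
    count = round(p0 * copies)
    generations = 0
    while 0 < count < copies:
        freq = count / copies
        # Binomial resampling of 2n gene copies for the next generation
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        generations += 1
    return generations


def persistence_summary(p0, n, reps, seed=1):
    """Mean and standard deviation of persistence time, both in units of n."""
    rng = random.Random(seed)
    times = [persistence_time(p0, n, rng) for _ in range(reps)]
    return statistics.mean(times) / n, statistics.stdev(times) / n


if __name__ == "__main__":
    for p0 in (0.4, 0.15):
        mean_n, sd_n = persistence_summary(p0, n=50, reps=300)
        print(f"p0 = {p0}: mean = {mean_n:.2f}N, sd = {sd_n:.2f}N")
```

Rerunning with larger n and more replicates is a quick way to confirm that the mean and standard deviation, expressed in units of N, have stabilized.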
Module 8 (Nik Schork)
Suppose you want to identify a genomic locus harboring variants that influence a
particular phenotype. You have access to a very large family, with some
members of the family exhibiting the phenotype and others not. Explain how a
meiotic mapping approach that leverages linkage analysis to identify the locus
would work. Include how concepts such as recombination, genetic maps,
polymorphic markers, linkage disequilibrium, haplotypes, transmission functions,
penetrance functions, and the Elston-Stewart algorithm fit into this strategy.
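The core linkage quantity is the LOD score for a recombination fraction θ: the log10 ratio of the likelihood of the observed meioses under linkage at θ to the likelihood under free recombination (θ = 0.5). The Python sketch below covers only the simplest case, phase-known and fully informative meioses with complete penetrance (assumptions). The Elston-Stewart algorithm extends this likelihood to whole pedigrees with unknown phase, penetrance functions and transmission functions, which this sketch does not attempt.

```python
import math


def two_point_lod(recombinants, nonrecombinants, theta):
    """LOD(theta) = log10[ theta^R (1 - theta)^NR / 0.5^(R + NR) ]
    for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + nonrecombinants
    log_l_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants, nonrecombinants, steps=500):
    """Grid search over theta in (0, 0.5]; returns (max LOD, theta at the max)."""
    grid = [0.5 * k / steps for k in range(1, steps + 1)]
    return max((two_point_lod(recombinants, nonrecombinants, t), t) for t in grid)
```

For example, 2 recombinants out of 20 meioses give a maximum LOD at θ = 0.1, the recombinant fraction.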
Module 10 (Jonathan Sebat)
You are a first-year bioinformatics student, and you are doing a rotation in the
laboratory of Professor Gertrude Gogglestein. Gerty has just performed whole
genome sequencing on a boy with Duchenne muscular dystrophy (DMD). Paired-end
sequencing was performed to 30X coverage using a library of 500 bp fragments.
DMD is a recessive X-linked muscle disease that affects almost exclusively boys.
DMD is caused by mutations in the dystrophin gene. Partial and full deletions and
duplications of the dystrophin gene make up approximately 60% of the
disease-causing mutations.
Dr. Gogglestein naturally suspects that the boy could carry a deletion of
dystrophin, and she asks you to analyze the sequence data for the presence of
such a deletion. You are provided with 2 pieces of data:
• Table A lists the depth of coverage in 100 bp intervals across the X
chromosome
• Table B lists the estimated fragment (insert) sizes and mapping positions for all
read pairs on the X chromosome.
You are welcome to use schematic diagrams to illustrate.
1. Describe how you would use Table A to identify a mutation in dystrophin.
2. Describe how you would use Table B to identify a mutation in dystrophin.
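Because the patient is male and therefore hemizygous for the X chromosome, a deletion should appear in Table A as near-zero depth (not the roughly half depth of a heterozygous autosomal deletion), and a duplication as roughly double depth. In Table B, read pairs that span a deletion map farther apart than the 500 bp library allows. The Python sketch below illustrates both ideas under hypothetical table layouts (per-bin depths in chromosome order starting at position 0; `(left_read_pos, right_read_pos, insert_size)` tuples). Its cutoffs (depth ratios of 0.2 and 1.7, at least three bins, an insert-size SD of 50 bp, a 4-SD threshold) are illustrative assumptions, not values from the exam, and real callers also correct for GC bias and mappability.

```python
import statistics


def depth_segments(depths, bin_size=100, low=0.2, high=1.7, min_bins=3):
    """Runs of consecutive bins whose depth departs from the chromosome median.

    Returns (start, end, label) tuples, label being "deletion" or "duplication".
    Assumes the median depth is greater than zero.
    """
    median_depth = statistics.median(depths)

    def label(d):
        ratio = d / median_depth
        if ratio < low:
            return "deletion"
        if ratio > high:
            return "duplication"
        return None

    segments, start, current = [], 0, None
    # A median-depth sentinel closes any run still open at the last bin
    for i, d in enumerate(list(depths) + [median_depth]):
        lab = label(d)
        if lab != current:
            if current is not None and i - start >= min_bins:
                segments.append((start * bin_size, i * bin_size, current))
            start, current = i, lab
    return segments


def discordant_pairs(pairs, expected_insert=500, insert_sd=50, k=4):
    """Read pairs whose estimated insert size exceeds expected + k * sd."""
    cutoff = expected_insert + k * insert_sd
    return [p for p in pairs if p[2] > cutoff]


def deletion_from_pairs(discordant, expected_insert=500):
    """Rough call from a non-empty cluster of discordant pairs: breakpoints lie
    between the rightmost left read and the leftmost right read, and the
    deletion length is roughly the median excess insert size."""
    left = max(p[0] for p in discordant)
    right = min(p[1] for p in discordant)
    excess = statistics.median([p[2] for p in discordant]) - expected_insert
    return left, right, excess
```

The two approaches are complementary: read depth shows which bins are missing or duplicated, while discordant pairs narrow down where the breakpoints fall.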