STAT/GEN 537 – Statistics for Molecular Genetics

Final Exam
December, 2002
Instructions
1. There are three problems worth 35, 35, and 30 points.
2. You may use your notes, slides, handouts, material posted to the course website,
as well as your own returned assignments to help you answer the questions.
3. You may not use books (including Sham) to help you answer the questions.
4. Below are several critical values from chi-squared distributions to help you
answer the questions.

                                                         α = 0.05         α = 0.10
Description                                  Notation   1 df    2 df     1 df    2 df
Central chi-squared for hypothesis testing   χ²         3.84    5.99     2.71    4.61
Noncentral chi-squared for 90% power
  calculation (β = 0.1)                                 10.51   15.15    8.56    13.02
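
Several of the tabulated values can be checked with the Python standard library alone. The following is a minimal sketch: the function names are illustrative, and the noncentrality formula is the usual normal approximation for 1 df, which ignores the far rejection tail. The 2 df noncentrality values require a noncentral chi-squared routine and are not reproduced here.

```python
from math import log
from statistics import NormalDist


def central_crit_1df(alpha):
    # A 1 df chi-squared variable is the square of a standard normal,
    # so its upper-alpha critical value is the squared two-sided z value.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


def central_crit_2df(alpha):
    # A 2 df chi-squared variable is exponential with mean 2,
    # so P(X > x) = exp(-x / 2) and the critical value is -2 ln(alpha).
    return -2 * log(alpha)


def noncentrality_1df(alpha, power):
    # Normal approximation: lambda = (z_{1 - alpha/2} + z_{power})^2.
    std = NormalDist()
    return (std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)) ** 2


for alpha in (0.05, 0.10):
    print(
        alpha,
        round(central_crit_1df(alpha), 2),
        round(central_crit_2df(alpha), 2),
        round(noncentrality_1df(alpha, 0.90), 2),
    )
```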
 
Problem 1 (35 points)
You are studying a disease allele D where heterozygous (Dd) individuals are affected and
homozygous (DD) individuals are affected and also infertile. You have a dataset
consisting of offspring from several families where one parent was affected and the other
was not. Because the data was originally collected with another purpose in mind,
families were only included in the dataset if they had at least 2 affected offspring. All
affected members were ascertained (complete ascertainment, π=1). You plan to use the
dataset to look for segregation distortion at the disease locus. You think the segregation
ratio is much higher than the expected p=0.5 due to phenocopies or epistasis. If the data
set contains 63 families with 2 offspring and 48 families with 3 offspring, please
determine whether you have 90% power to detect a segregation ratio of p=0.7. You must
account for the ascertainment bias introduced by the sampling of families with at least 2
affected members.
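
One way to set up the calculation is sketched below, under stated assumptions: the number of affected offspring in a family of size s is binomial(s, p) truncated to at least 2 affected, and the noncentrality parameter is approximated by the expected likelihood ratio statistic under p=0.7 against the null p=0.5. The function names are illustrative, and this is one possible approach rather than a required one.

```python
from math import comb, log


def truncated_binomial(s, p, min_affected=2):
    # Distribution of the number of affected offspring r in a family of
    # size s, conditional on r >= min_affected (the ascertainment rule).
    raw = {
        r: comb(s, r) * p**r * (1 - p) ** (s - r)
        for r in range(min_affected, s + 1)
    }
    total = sum(raw.values())
    return {r: value / total for r, value in raw.items()}


def expected_lrt(s, p_alt, p_null=0.5):
    # Expected value of 2 * ln(likelihood ratio) for one family of size s
    # when the true segregation ratio is p_alt.
    alt = truncated_binomial(s, p_alt)
    null = truncated_binomial(s, p_null)
    return 2 * sum(alt[r] * log(alt[r] / null[r]) for r in alt)


# Family counts from the problem statement: {family size: number of families}.
families = {2: 63, 3: 48}
noncentrality = sum(n * expected_lrt(s, 0.7) for s, n in families.items())

print("per-family contribution, s=2:", expected_lrt(2, 0.7))
print("per-family contribution, s=3:", expected_lrt(3, 0.7))
print("total noncentrality:", noncentrality)
```

The total can then be compared with the tabulated 1 df noncentrality value for the chosen significance level. Under this setup, families with 2 offspring contribute nothing, because every ascertained family of size 2 has exactly 2 affected offspring whatever the value of p.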
Problem 2 (35 points)
Suppose you have data on two biallelic markers (marker A has alleles A and a, and marker
B has alleles B and b) resulting from an F2 cross where some parents were in coupling
phase (AB/ab) and others were in repulsion phase (Ab/aB). Let q be the proportion of
parents in coupling phase. Describe an EM algorithm for estimating both q and the
recombination fraction θ. Please derive the necessary conditional probabilities and write
out the steps in the algorithm. Then test the following data for linkage; the EM-obtained
maximum likelihood estimators are provided below the table.
Genotype  AABB  AaBB  aaBB  AABb  AaBb  aaBb  AAbb  Aabb  aabb
Count        3     2     2     2    12     4     2     0     3

θ̂ = 0.16, q̂ = 0.61
Problem 3 (30 points)
You are studying a disease thought to be caused by one or more genetic loci. Consider the
following table of affected sib pair marker data with known parental marker genotypes.
Each outlined box provides the observed counts of all possible combinations of affected
sib pair genotypes for a particular parental mating type. For example, there are 10
heterozygous (Aa) sib pairs born to parents with genotypes Aa and aa (i.e. mating type:
Aaxaa). Please determine whether this biallelic marker is linked to a disease-causing
locus.
Each box is a 3 x 3 table whose rows and columns are the sib genotypes AA, Aa, and aa.
Counts in each row are listed from left to right.

Mating type   Row AA      Row Aa      Row aa
AAxAA         1, 0        0
AAxAa         4, 9        3
AAxaa                     9
AaxAa         7, 0, 18    0, 0        9
Aaxaa         0, 0        10, 18      5
aaxaa                                 7
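
As a hedged illustration of one ingredient of such a test, the sketch below handles a single mating type, Aaxaa. It assumes the counts for that box read Aa-Aa = 10, Aa-aa = 18, and aa-aa = 5. Sib pairs with the same genotype received the same allele from the heterozygous parent, and pairs with different genotypes received different alleles. With no linkage the two outcomes are equally likely. The function name is illustrative, and a full answer would also need to combine the other informative mating types.

```python
def ibd_chi_square(concordant, discordant):
    # 1 df goodness-of-fit statistic for a 1:1 split between sib pairs that
    # share the heterozygous parent's allele and pairs that do not.
    n = concordant + discordant
    expected = n / 2
    return ((concordant - expected) ** 2 + (discordant - expected) ** 2) / expected


# Assumed reading of the Aaxaa box: Aa-Aa = 10, Aa-aa = 18, aa-aa = 5.
concordant_pairs = 10 + 5   # Aa-Aa and aa-aa pairs
discordant_pairs = 18       # Aa-aa pairs
statistic = ibd_chi_square(concordant_pairs, discordant_pairs)
print(round(statistic, 4))
```

Linkage predicts an excess of concordant pairs, so the direction of the deviation matters as well as the size of the statistic.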