STAT/GEN 537 – Statistics for Molecular Genetics
Final Exam
December, 2002

Instructions

1. There are three problems worth 35, 35, and 30 points.
2. You may use your notes, slides, handouts, material posted to the course website, and your own returned assignments to help you answer the questions.
3. You may not use books (including Sham) to help you answer the questions.
4. Below are several critical values from chi-squared distributions to help you answer the questions.

                                                         α = 0.05         α = 0.10
Description                                  Notation    1 df    2 df     1 df    2 df
Central chi-squared for hypothesis testing   χ²          3.84    5.99     2.71    4.61
Noncentral chi-squared for 90% power         χ′²         10.51   15.15    8.56    13.02
calculation (β = 0.1)

Problem 1 (35 points)

You are studying a disease allele D where heterozygous (Dd) individuals are affected and homozygous (DD) individuals are affected and also infertile. You have a dataset consisting of offspring from several families where one parent was affected and the other was not. Because the data were originally collected with another purpose in mind, families were only included in the dataset if they had at least 2 affected offspring. All affected members were ascertained (complete ascertainment, π = 1).

You plan to use the dataset to look for segregation distortion at the disease locus. You think the segregation ratio is much higher than the expected p = 0.5 due to phenocopies or epistasis. If the dataset contains 63 families with 2 offspring and 48 families with 3 offspring, please determine whether you have 90% power to detect a segregation ratio of p = 0.7. You must account for the ascertainment bias introduced by sampling only families with at least 2 affected members.

Problem 2 (35 points)

Suppose you have data on two biallelic markers (marker A has alleles A and a, and marker B has alleles B and b) resulting from an F2 cross where some parents were in coupling phase (AB/ab) and others were in repulsion phase (Ab/aB). Let q be the proportion of parents in coupling phase.
Describe an EM algorithm for estimating both q and the recombination fraction θ. Please derive the necessary conditional probabilities and write out the steps in the algorithm. Then test the following data for linkage; the EM-obtained maximum likelihood estimators are provided below the table.

Genotype   AABB  AaBB  aaBB  AABb  AaBb  aaBb  AAbb  Aabb  aabb
Count         3     2     2     2    12     4     2     0     3

θ̂ = 0.16, q̂ = 0.61

Problem 3 (30 points)

You are studying a disease thought to be caused by one or more genetic loci. Consider the following table of affected sib pair marker data with known parental marker genotypes. Each outlined box provides the observed counts of all possible combinations of affected sib pair genotypes for a particular parental mating type. For example, there are 10 heterozygous (Aa) sib pairs born to parents with genotypes Aa and aa (i.e. mating type: Aaxaa). Please determine whether this biallelic marker is linked to a disease-causing locus.

Aa aa AAxAa AA Aa aa AAxaa AA Aa aa AAxAA AA AA 1 0 AA 4 9 AA Aa 0 Aa 3 Aa 9 aa aa aa Aa aa Aa aa Aa aa AaxAa AA Aaxaa AA aaxaa AA AA 7 0 18 AA 0 0 AA Aa 0 0 Aa 10 18 Aa aa 9 aa 5 aa 7
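
The critical values listed in the instructions can be sanity-checked with a few lines of Python. The sketch below is not part of the exam and uses only the standard library; the function names `central_chi2_critical` and `power_1df` are hypothetical helpers introduced here for illustration. It reproduces the central chi-squared critical values for 1 and 2 df, and checks the power implied by the 1 df noncentrality values only. It assumes the noncentral entries are noncentrality parameters, and it does not cover the 2 df noncentral case.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def central_chi2_critical(alpha, df):
    """Upper-tail critical value of a central chi-squared with 1 or 2 df."""
    if df == 1:
        # A chi-squared with 1 df is the square of a standard normal.
        z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
        return z * z
    if df == 2:
        # A chi-squared with 2 df is exponential with mean 2: P(X > c) = exp(-c / 2).
        return -2 * log(alpha)
    raise ValueError("this sketch only handles 1 or 2 df")


def power_1df(ncp, alpha):
    """Power of a 1 df chi-squared test given noncentrality parameter ncp."""
    root_c = sqrt(central_chi2_critical(alpha, 1))
    shift = sqrt(ncp)
    # (Z + shift)^2 > c  iff  Z > root_c - shift  or  Z < -root_c - shift
    upper = 1 - _STD_NORMAL.cdf(root_c - shift)
    lower = _STD_NORMAL.cdf(-root_c - shift)
    return upper + lower


if __name__ == "__main__":
    for alpha in (0.05, 0.10):
        print(alpha,
              round(central_chi2_critical(alpha, 1), 2),
              round(central_chi2_critical(alpha, 2), 2))
    print(round(power_1df(10.51, 0.05), 3), round(power_1df(8.56, 0.10), 3))
```

Running the script prints the rounded central critical values for each α, followed by the power implied by the two 1 df noncentrality values, which can be compared with the 90% target stated in the table.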