Practice Exam (weeks 1 – 7) – Sample Solutions

Attempt all questions. You must support all answers with reasons; correct answers with incorrect or missing reasons will receive NO CREDIT.

1. Scientists have long wondered what role, if any, fever plays in defending the body against infection. In order to determine whether fever is a beneficial response to infection, researchers assigned laboratory mice at random to two groups. Both groups were infected with a fever-causing virus. Fevers in the first group were brought within normal limits with carefully monitored doses of aspirin. The second group was given nothing. The researchers found that the death rate for the first group was significantly higher than that of the second group.
a. Is this study observational or experimental?
Experimental; the researchers assigned the treatments to the individuals (rather than the individuals choosing the treatment).
b. What is the treatment?
Aspirin.
Is there a control group? If so, which group?
Yes, the mice not receiving aspirin.
c. Is there likely to be a placebo effect in this study? Why or why not?
No, there is no placebo (and it seems unlikely that mice would believe they should get better if receiving something).
d. Can the researchers conclude that the increased death rate in the first group was due to the absence of fever? Give reasons for your answer.
No, aspirin may have effects other than reducing fever.

2. These questions refer to the following output from R for tumor thickness (in mm):

[Figure: "Histogram of tumor"; x-axis: tumor (ticks at 0, 5, 10, 15); y-axis: Density (ticks from 0.00 to 0.25)]

    > library(boot)
    > data(melanoma)
    > tumor<-melanoma[,"thickness"]
    > length(tumor)
    [1] 205
    > quantile(tumor)
       0%   25%   50%   75%  100%
     0.10  0.97  1.94  3.56 17.42

a. How many observations are there for the variable tumor?
205
b. What was the largest value of tumor?
17.42 mm
c. Approximately how many tumors have thickness between 4 mm and 6 mm?
The density between 4 mm and 6 mm is about .05, so the proportion is about .05*(6 - 4) = 10% of the 205 tumors, or around 20.
d. Which is larger, the mean or median tumor thickness?
The mean: the histogram has a long right tail, so the mean is bigger than the median.
e. Approximately what proportion of values are larger than 2 mm?
The median (1.94 mm) is very close to 2 mm, so about half of the values are larger than 2 mm.
f. Sketch a boxplot for tumor.
(below, left)

[Figure: left panel, boxplot of tumor (axis ticks at 0, 5, 10, 15); right panel, "Normal Q-Q Plot" with x-axis Theoretical Quantiles (-3 to 3) and y-axis Sample Quantiles (0 to 15)]

g. Sketch what you think the shape of a QQ normal plot for tumor would look like (don’t worry about the scales), and explain why it should look like this.
(above, right)

3. A machine produces pins (printing tips) for use in microarray experiments. If the machine is correctly adjusted, the rate of unacceptable pins is 5%. If it is not adjusted correctly, the rate of unacceptable pins is 50%. From past company records, the machine is known to be correctly adjusted 90% of the time. A quality control inspector randomly selects one pin from those recently produced and discovers that it is defective. What is the probability that the machine is incorrectly adjusted?
Let U = the machine is incorrectly adjusted and D = the selected pin is defective. Use Bayes’ rule:
P(U|D) = P(D|U)P(U)/[P(D|U)P(U) + P(D|not U)P(not U)] = .5*.1/[.5*.1 + .05*.9] = 10/19, or about .53

4. You have two scales for measuring weight in a lab. Both scales give answers that vary a bit in repeated weighings of the same item. If the true weight of a compound is 2 grams, the first scale produces readings Z that have mean 2.000 g and SD 0.002 g. The second scale’s readings Y have mean 2.001 g and SD 0.001 g. Assume that the readings of Z and Y are independent.
a. Which scale is biased? How much bias is there?
The second scale; bias = 2.001 - 2.000 = .001 g
b. Give the MSE (mean square error) for each scale.
MSE = var + bias^2. For the first scale, MSE = .002^2 + 0^2 = .000004; for the second scale, MSE = .001^2 + .001^2 = .000002
c. Which scale is less variable?
The second scale has a lower SD (or variance), so it is less variable.
d. What are the mean and SD of the difference between the readings, X = Z – Y?
E(X) = E(Z - Y) = EZ - EY = 2.000 - 2.001 = -.001
SD(X) = sqrt(Var(Z) + Var(Y)) = sqrt(.002^2 + .001^2) = .0022
e. You measure once with each scale and average the readings. Your result is W = (Z+Y)/2. What are the mean and SD of W? Is the average W more variable or less variable than the reading Y of the less variable scale?
E(W) = E[(Z+Y)/2] = (EZ + EY)/2 = (2.000 + 2.001)/2 = 2.0005
SD(W) = sqrt( (Var(Z) + Var(Y))/4 ) = sqrt((.002^2 + .001^2)/4) = .0011
Since .0011 > .001 = SD(Y), the average W is slightly more variable than Y.

5. You have to take a statistics exam which consists of true/false questions. That is, each question has 2 possible answers and you have to choose the correct one. You studied very much for this test, but unfortunately you woke up with ‘statistics amnesia’ and don’t remember anything at all, so you will need to guess the answer to every question. Fortunately, you have in your pocket a fair coin, which you will use to help you answer the questions. You plan to flip the coin once for each question, and answer ‘true’ if the coin lands heads and ‘false’ if the coin lands tails. You need to answer at least 80% of the questions correctly to pass the exam.
a. Suppose the exam has 10 questions. What is the distribution of your score on the test? Be specific, and include the values of any parameters of the distribution.
Binomial (n = 10, p = .5): there are two possible outcomes (H or T) on each flip, the flips are independent, the number of flips is fixed (10 here), and the probability of heads is the same on each flip.
b. What is your expected score? What is the SD of your score?
Let X = number correct out of 10. EX = np = 10*.5 = 5; SD(X) = sqrt(np(1 - p)) = sqrt(10*.5*.5) = 1.58
c. What is your chance of passing?
P(X ≥ 8) = C(10,8)(.5^10) + C(10,9)(.5^10) + C(10,10)(.5^10) = (45 + 10 + 1)(.5^10) = .055
d. Suppose the test has 100 questions. What is the (exact) distribution of your score on the test? Again, be specific.
Binomial (n = 100, p = .5)
e. What is your expected score? What is the SD of your score?
EX = np = 100*.5 = 50; SD(X) = sqrt(np(1 - p)) = sqrt(100*.5*.5) = 5
f. What is your chance of passing?
Using the normal approximation, P(X ≥ 80) ≈ P(Z ≥ (80 - 50)/5) = P(Z ≥ 6) ≈ 0

6. The number of bacteria colonies of a certain type in samples of polluted water has a Poisson distribution with a mean of 1 per cubic cm (cc). What is the chance that a 1 cc sample will contain two or more bacteria colonies?
P(X ≥ 2) = 1 - P(X = 0) - P(X = 1) = 1 - e^(-1) - e^(-1) = .26

7. Suppose that whether or not it rains tomorrow depends on previous weather conditions only through whether or not it is raining today. Suppose also that, if it is raining today, then it will rain tomorrow with probability 0.6, and, if it is not raining today, then it will rain tomorrow with probability 0.3. Say the system is in state 0 when it rains and state 1 when it does not.
a. Why is the sequence of 0s and 1s (for whether it is raining or not over a number of days) a Markov chain?
Because the problem says that the weather tomorrow depends only on the weather today, and not on the sequence of rain/no rain leading up to today.
b. Write down the transition matrix.
Rows give today's state, columns give tomorrow's state:

    State          0 (rain)   1 (no rain)
    0 (rain)         0.6         0.4
    1 (no rain)      0.3         0.7

c. Find the stationary distribution.
We need to solve the equations .6*π0 + .3*π1 = π0 and .4*π0 + .7*π1 = π1, together with π0 + π1 = 1; we get π0 = 3/7, π1 = 4/7
d. Write down an expression for the probability that it does not rain Monday – Friday, then rains on the weekend (i.e. the sequence 1, 1, 1, 1, 1, 0, 0).
(4/7)*.7*.7*.7*.7*.3*.6

8. In an experiment with rats, a behavioral scientist used an auditory signal to indicate that food was available through an open door in the cage. The scientist counted the number of trials needed by each of 25 rats to learn to recognize the signal. The mean number of trials was 15, and the SD was 2.5.
a. What is the population parameter of interest?
The mean number of trials needed to learn to recognize the signal.
b. What assumptions do you need to make to be able to create a CI (confidence interval) for the population parameter?
We need the value of the population parameter to be unknown, a random sample from the population, and the sample size to be large enough that the CLT holds.
c. Suppose that the assumptions hold. Give an approximate 95% CI for the population parameter.
15 +/- 2*2.5/sqrt(25), or 15 +/- 1
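
The numerical answers to questions 3, 5, 6, 7 and 8 can be checked in R. The following is only a sketch: the variable names are my own and are not part of the exam, and the exact binomial and t-based calculations are extra checks that the solutions above do not use.

```r
# Q3: Bayes' rule for P(machine incorrectly adjusted | pin defective)
p_bad <- 0.10             # machine incorrectly adjusted
p_def_given_bad <- 0.50   # defect rate when incorrectly adjusted
p_def_given_ok <- 0.05    # defect rate when correctly adjusted
posterior <- p_def_given_bad * p_bad /
  (p_def_given_bad * p_bad + p_def_given_ok * (1 - p_bad))   # 10/19

# Q5c: exact chance of at least 8 correct out of 10 guesses
p_pass_10 <- pbinom(7, size = 10, prob = 0.5, lower.tail = FALSE)   # 56/1024

# Q5f: normal approximation P(Z >= 6), plus the exact binomial tail
# (the exact tail is an extra check, not part of the solution above)
p_pass_100_normal <- pnorm(6, lower.tail = FALSE)
p_pass_100_exact <- pbinom(79, size = 100, prob = 0.5, lower.tail = FALSE)

# Q6: P(X >= 2) for a Poisson count with mean 1
p_two_or_more <- ppois(1, lambda = 1, lower.tail = FALSE)   # 1 - 2/e

# Q7c: stationary distribution, solving pi %*% P = pi with sum(pi) = 1
P <- matrix(c(0.6, 0.4,
              0.3, 0.7), nrow = 2, byrow = TRUE)
A <- t(P) - diag(2)
A[2, ] <- 1               # replace one balance equation by pi0 + pi1 = 1
pi_stat <- solve(A, c(0, 1))   # 3/7 and 4/7

# Q7d: probability of the sequence 1, 1, 1, 1, 1, 0, 0,
# starting from the stationary distribution
p_seq <- pi_stat[2] * P[2, 2]^4 * P[2, 1] * P[1, 1]

# Q8c: approximate 95% CI with multiplier 2, as in the solution
ci_approx <- 15 + c(-1, 1) * 2 * 2.5 / sqrt(25)   # 14 and 16
# Alternative (not used in the solution): t multiplier with 24 df,
# which gives a slightly wider interval
ci_t <- 15 + c(-1, 1) * qt(0.975, df = 24) * 2.5 / sqrt(25)
```

Printing `posterior`, `p_pass_10`, `p_two_or_more`, `pi_stat` and `ci_approx` should reproduce the values .53, .055, .26, (3/7, 4/7) and 15 +/- 1 given in the solutions.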