Stat 101L – Exam 2
March 7, 2008
Name: ________________________

INSTRUCTIONS: Read the questions carefully and completely. Answer each question and show work in the space provided. Partial credit will not be given if work is not shown. When asked to explain, describe, or comment, do so within the context of the problem. Be sure to include units when dealing with quantitative variables.

1. [15 pts] Short answer.
a) [2] Statistics is about … _______________. (Fill in the blank with one word.)
b) [5] The correlation between x and y is $r = -0.90$. Additionally, $\bar{x} = 10$, $\bar{y} = 20$, $s_x = 3.0$ and $s_y = 6.0$. What are the values of the estimated slope, $b_1$, and the estimated y-intercept, $b_0$, for the line of best fit? Also give the equation of the line of best fit.
c) [2] What does a correlation coefficient of $r = 0$ indicate about the relationship between two numerical variables?
d) [2] A numerical summary of a sample is called a ______________, while a numerical summary of a population is called a _______________.
e) [2] When participants do not know which treatment group they are in, the experiment is said to be _________________.
f) [2] What is replication within an experiment?

2. [37 pts] An experiment is done with people with chronic back pain. The experimenters wish to see the relationship between the dose of a pain medication and the number of hours of relief from pain. Sixteen individuals with chronic back pain agree to participate in the experiment. The individuals are randomly assigned to treatment groups: 0.5 mg, 1 mg, 1.5 mg and 2 mg of a pain medication, so that there are 4 individuals in each treatment group.
a) [2] Answer the question Who? for this problem.
b) [4] Answer the question What? for this problem. Be sure to identify the type of variable (categorical or numerical) and include units where appropriate.
c) [3] Why is this an experiment and not an observational study?
d) [3] Is there a placebo? Explain briefly.
e) [2] There is a very important outside variable associated with the participants that cannot be controlled in this experiment. What is that variable?
f) [3] Below is a plot of hours of relief versus amount of medication.

[Scatterplot: Relief (hours), axis from 0 to 15, versus Dose (mg), axis from 0 to 2]

Describe the relationship in terms of direction, form, strength, and indicate any unusual points.

Below is partial JMP output for the least squares regression line, i.e., the line of best fit.

Linear Fit
Predicted Relief (hours) = 1.39 + 3.34*Dose (mg)

Summary of Fit
RSquare                      0.454301
RSquare Adj                  0.415323
Root Mean Square Error       2.187619
Mean of Response             5.5625
Observations (or Sum Wgts)   16

g) [5] Give an interpretation of the slope within the context of the problem.
h) [2] Give an interpretation of the y-intercept within the context of the problem.
i) [2] Use the least squares regression line to predict the hours of relief for a dose of 2 mg.
j) [2] One of the participants in the experiment who was given 2 mg of the pain reliever experienced 9.2 hours of relief. What is the residual for this participant?
k) [3] Graph the least squares regression line on the plot in f). In order to get full credit it must be obvious to me that you are using the equation to draw the line.
l) [2] How much of the variability in the hours of relief can be explained by the linear relationship with dose?
m) [4] Below is a plot of residuals. Describe what you see in the plot and what this tells you about the least squares regression line for predicting the hours of relief for a given dosage of the pain reliever.

[Residual plot: Residual, axis from –5 to 5, versus Dose (mg), axis from 0 to 2]

3. [8 pts] One of the goals of re-expression is to make the scatter in a scatter plot more even across all levels of the explanatory variable. With this in mind, the response in the pain relief experiment (problem 2) was re-expressed using logarithms (power = 0) and negative reciprocals (power = –1).
Refer to the JMP output "Re-expression of Hours of Relief."
a) [2] Describe the plot of residuals for the linear fit for log(Relief).
b) [1] Has the log re-expression achieved the goal of making the scatter more even across all levels of the explanatory variable?
c) [2] Describe the plot of residuals for the linear fit for –1/Relief.
d) [1] Has the negative reciprocal re-expression achieved the goal of making the scatter more even across all levels of the explanatory variable?
e) [2] Is the linear model for –1/Relief adequate? Explain briefly.

4. [10 pts] An article in the Des Moines Register on January 14, 2007 reported on the relationship between folate in the diet and Alzheimer’s disease. An earlier study in 2005, using information gathered from the Baltimore Longitudinal Study of Aging, reported that diets high in folate might help reduce the risk of Alzheimer's disease.
Source: www.cbsnews.com
In that study the researchers analyzed the diets of 579 volunteers (359 men, 220 women) aged 60 and older without Alzheimer's disease and followed them for nine years. The researchers looked at what percentage of participants’ diets contained antioxidant vitamins (E, C, carotenoids) and B vitamins (folate, B-6, and B-12). During the follow-up period 57 participants developed Alzheimer's disease. The researchers then compared the nutrient intake of those who developed Alzheimer's disease with that of those who did not develop the disease. They showed that those with a higher dietary intake of folate had an almost 60 percent lower rate of the disease.
a) [3] Why was this an observational study and not an experiment? Explain briefly.
b) [3] Was this a prospective or a retrospective study? Explain briefly.
c) [2] What was the explanatory variable? Is it numerical or categorical?
d) [2] What was the response variable? Is it numerical or categorical?

5. [5 pts] In a game of chance you pay $5 to roll one fair ten-sided die (like the ones used in lab).
The number that you roll indicates how much money you win: if you roll a 9 you win $9, if you roll a 1 you win $1, etc. Use row 3 in the Table of Random Digits below to simulate the outcomes of 20 games. How much are your simulated winnings? Based on this simulation, do you think you can make money playing this game?

I think I scored _________ out of 75 points on this exam.

Formulas

$\bar{x} = \dfrac{\sum x}{n}$
$\bar{y} = \dfrac{\sum y}{n}$
$s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
$s_y = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
$z_x = \dfrac{x - \bar{x}}{s_x}$
$z_y = \dfrac{y - \bar{y}}{s_y}$
$r = \dfrac{\sum z_x z_y}{n - 1}$
$b_1 = r \dfrac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
$\hat{y} = b_0 + b_1 x$
residual $= y - \hat{y}$

Table of Random Digits

Row
1    96299 07196 98642 20639 23185 69929 56282 14125 38872 94168
2    71622 35940 81807 59225 18192 80777 08710 84395 69563 86280
3    03272 41230 81739 74797 70406 69273 18564 72532 78340 36699
4    46376 58596 14365 63685 56555 72944 42974 96463 63533 24152
5    47352 42853 42903 97504 56655 88606 70355 61406 38757 70657
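
The formula sheet above can also be followed step by step in code. The following is a minimal Python sketch and is not part of the original exam: the function names and the four data points are hypothetical, chosen only to illustrate how the formulas for the mean, standard deviation, correlation, slope, intercept, and residual fit together.

```python
from math import sqrt


def mean(values):
    """Sample mean: sum of the values divided by n."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    b1 = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


# Hypothetical illustrative data (NOT taken from the exam).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

b0, b1 = least_squares_line(xs, ys)

# Residual for each point: observed y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"Predicted y = {b0:.2f} + {b1:.2f} * x")
print("Residuals:", [round(e, 2) for e in residuals])
```

One property worth checking with any data set is that the residuals from a least squares line sum to zero, up to rounding error.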