Answers to practice problems for midterm 1

PRACTICE PROBLEMS FOR STAT 103 MIDTERM 1

This is longer than the actual midterm, but it gives you an idea of the types of questions that will be asked. Topics not covered in these problems may still be covered on exams.

Name (Please print clearly) ________________________

Lab Time (circle one): 9:10 – 10:00; 10:30 – 11:20; 11:50 – 12:40; 1:10 – 2:00

Directions:
1) Print clearly on this exam. Only correct solutions that can be read will be given credit.
2) You may use a calculator and one page (both sides) as a crib sheet.
3) Show your method of solution on problems requiring calculations. Only answers with supporting work will be given credit.
4) Carry out all calculations to 2 decimal places of accuracy. You can leave answers in fractions.
5) As a rough guideline, allot yourself about 9 minutes per page. If you get stuck, move on.

The data for the problems on pages 2 – 6 of this exam pertain to a randomized experiment that assessed the effect of intensive childcare for children with low birth weights. The study is described on the next page.

Description of the study (READ THIS SO YOU KNOW WHAT THE STUDY IS ABOUT.)

Low birth weight infants have elevated risks of cognitive impairment and academic failures later in life (Hill et al. 2003). One approach to reduce these risks is to provide extraordinary support for the families of low birth weight infants, for example intensive childcare education and access to trained specialists for the parents. To assess the effectiveness of such interventions, in 1985 researchers designed the Infant Health Development Program (IHDP). The IHDP involved randomizing 985 low birth weight infants to one of two groups: 1) a treated group assigned to receive weekly visits from specialists and to attend daily childcare at childhood development centers, and 2) a control group that did not have access to the weekly visits or childcare centers.
There were 377 infants randomly assigned to the treated group and 608 randomly assigned to the control group. The outcome variable is the infant’s score on the Peabody Picture Vocabulary Test Revised administered at age 3. Infants took the test after completing the time period of the study.

Questions begin here. For questions 1 – 17, circle the right answer.

Below is a histogram of the Peabody test scores for the control group.

[Figure: histogram of Peabody test scores for the control group; the horizontal axis runs from 30 to 130.]

1) The median score is closest to: 85

2) The SD is closest to: 20

3) The percentage of scores between 75 and 94.9 is closest to: 50%

4) The percentage of infants scoring 100 or higher is closest to: 20%

5) There are 119 control infants whose Peabody scores are missing. Suppose through extra field work you obtain the 119 missing scores. You find that the mean and SD of these 119 scores are 85 and 10, respectively, and these 119 scores follow a normal curve. Which sentence best describes the average and SD for all 608 kids combined?
b) The SD for the 608 is less than the SD from Question 2.

Below are box plots of the distributions of mothers’ age, Peabody scores, and birth weights in the treated and control groups:

[Figure: three box plots (mom age, Peabody, birth weight), each comparing the Control and Treated groups.]

6) We want our comparisons of the treated and control groups to be free of the effects of confounding variables. Which statement must be true for the comparisons to be free of confounding?
b) The distributions of mothers’ age and birth weights should be similar in the treated and control groups.

7) True: The treated and control groups have similar distributions of mothers’ age.

8) The SD of the Peabody scores for the treated group is most likely (circle one of the following three choices)
(i) about equal to the SD of the Peabody scores for the control group.

9) What percentage of the control group was born weighing less than 2000 grams?
65%

10) Sketch a rough histogram of birth weights for the control group.
The histogram should have a slight left skew.

11) Another variable measured in the study was the number of days the infant had to spend in the hospital after being born. The average of this variable for all 985 kids is 25.4 days and the SD is 23.9 days. Which of the following box plots looks most like the distribution of number of days spent in the hospital?
b) Box plot drawn with median less than 24 and a long right tail.

12) Consider the average days spent in the hospital for the treated and control groups. Which statement is most likely true?
(ii) The means in the treated and control groups are close to each other.

Below are scatter plots of Peabody scores versus mothers’ age and versus birth weight for the control group. Shown in the plots are the regression lines.

[Figure: two scatter plots for the control group, Peabody versus mom age and Peabody versus birth weight, each with its regression line.]

13) Estimate the slope and intercept of the regression line using mom age as the predictor.
Slope around 0.7, intercept around 65.

14) The correlation in the plot involving birth weights is closest to: 0

15) The correlation in the plot involving age is closest to: 0.20

16) If we measured mother’s age in terms of months and birth weight in terms of pounds, which statement is true:
d) Neither correlation would change.

Answer this one question about the IHDP study design.

17) A psychologist reads the article by Hill et al. (2003) and claims that the results are completely worthless, because the outcome variable is the score AFTER the treatment period ended rather than a change score (i.e., a post-study score minus a pre-study score). The psychologist says that there’s no way to tell if the childcare is more effective than the control because we cannot tell which group had the higher average increase in scores. Is the psychologist correct?
Attack or defend the psychologist’s statement, using what you know about the study design as described on page 2. Don’t just say “right” or “wrong”; explain why you think the psychologist is right or wrong. WRITE ONLY UP TO 4 SENTENCES. (THAT’S MORE THAN YOU NEED.)

Not true. Because the study was randomized, the average “pre-intervention” scores in the control and treated groups should be similar. Hence, any difference in the sample post-intervention average scores should be attributable to the effect of the child care.

18) You cannot make this conclusion. People who own guns are likely to differ from people who do not own guns in ways that affect homicide rates. For example, people who own guns may live in more dangerous areas or have riskier lifestyles. Because of these differences in background characteristics, we must conclude that the association found in the data does not prove that owning guns causes people to have higher likelihoods of experiencing a homicide.

You might suggest that the sample may not have been representative of the larger population. The researchers claim that it is, but it is good to be skeptical of that claim. This answer earns partial credit.

You might suggest that the researchers need to look at all homicides, not just the ones where there is a gun in the home. Some possible reasons for looking only at homicides in homes include: 1) the researchers may not have the information on gun ownership for people shot outside their homes, and 2) the researchers want to focus on a particular type of homicide. Even if the researchers did use all the homicide data, they still would have to deal with possible lack of balance in the background characteristics. This answer, without any mention of potential imbalance in background characteristics, earns partial credit.

b)
1. Was the survey mailed to a random sample of scientists or not?
2. Was there any nonresponse bias?
3. How was the list of scientists used to pick the sample compiled?
Answers that focus on the questionnaire wording do not count. It is important to consider this issue, but the problem specifies not to worry about questionnaire wording.

19) Answer the following questions. Show your work to maximize your chances for partial credit. You can leave answers in fractions.

A fair die has six distinct faces, each with a 1/6 probability of facing up. You are going to roll one die three times. Assume the outcome of each roll is random and independent of other rolls.

a) Calculate the probability that all three faces will be the same number.
6(1/6)(1/6)(1/6) = 1/36

b) Calculate the probability that the sum of the three dice will be less than 4, given that the sum is less than 6.
1/10. Obtain this by counting the number of ways to get a sum less than 4 (one) out of the number of ways to get a sum less than 6 (ten).

c) Suppose the three rolls are all 6’s. You are going to roll the die a fourth time. You don’t have to do any work for this problem; just circle true or false.
FALSE: Because you rolled three sixes in a row, the results have to balance out, so that you have less than a 1/6 chance of rolling a 6 on the fourth roll.

d) This is a totally different problem, having nothing to do with dice. Body temperature measurements follow a normal curve with average equal to 98.6 degrees and standard deviation equal to 0.75 degrees. If we measured the body temperatures of all people in the U.S., approximately what percentage of temperatures would be over 99 degrees?
29.7% (99 degrees is $(99 - 98.6)/0.75 \approx 0.53$ SDs above the average.)

20) You determine that a random variable X follows the probability distribution: $f(x) = 20x^3(1 - x)$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere.

a) What is the chance that X > 0.5?
1 - .1875 = .8125

b) Calculate the expected value of X.
.666667

c) Calculate the variance of X.
.0317

21) The uniform distribution is defined on the interval from some constant a to another constant b. It has a probability density function: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ elsewhere.

a) Verify that f(x) is a true probability density function. (Show the two required conditions hold.)
It is never negative, and the integral over the entire sample space equals one.

b) Show that the expected value of a uniform random variable X equals $\frac{a + b}{2}$.
Compute $\int_a^b x \, \frac{1}{b - a} \, dx$.

c) Show that the variance of a uniform random variable X equals $\frac{(b - a)^2}{12}$.
Compute $\int_a^b (x - E(X))^2 \, \frac{1}{b - a} \, dx$.

22) In the dice game Yahtzee, you can score points equal to the sum of five dice. Suppose that you’ve thrown four dice and their sum equals 24. You have one more die to throw, and your score will be the sum of the five dice.

a) Write down the probability distribution of the sum.

x     Pr(X=x)
25    1/6
26    1/6
27    1/6
28    1/6
29    1/6
30    1/6

b) What sum do you expect to get?
27.5

c) What is the standard deviation of the sum?
1.71

23)
a) .9623
b) .032
c) 2.25
d) 0.58 (Remember the SD is the square root of the variance.)

24)
a) Pr(Z=-3) = .064. Pr(Z=-1) = .288. Pr(Z=1) = .432. Pr(Z=3) = .216.
b) Pr(Z=0) = 0.
c) E(Z) = 0.6. Var(Z) = 2.88

25) Let X be the random variable for score, and let Y be the random variable for hours studied.

a) $\Pr(Y > 11) = \Pr(Y = 15) + \Pr(Y = 20) = .29 + .20 = .49$

b) $E(X) = 60(.06) + 70(.22) + 80(.35) + 90(.31) + 100(.06) = 80.9$
$\mathrm{Var}(X) = E(X^2) - E(X)^2 = 60^2(.06) + 70^2(.22) + 80^2(.35) + 90^2(.31) + 100^2(.06) - 80.9^2 = 100.19$
So, the SD(X) is 10.0095.

c) $E(Y) = 0(.07) + 5(.19) + 10(.25) + 15(.29) + 20(.20) = 11.8$
$\mathrm{Var}(Y) = E(Y^2) - E(Y)^2 = 0^2(.07) + 5^2(.19) + 10^2(.25) + 15^2(.29) + 20^2(.20) - 11.8^2 = 35.76$
So, the SD(Y) is 5.98.

d) $E(X \mid Y = 10) = 60(0) + 70(.28) + 80(.40) + 90(.32) + 100(0) = 80.4$
$\mathrm{Var}(X \mid Y = 10) = 60^2(0) + 70^2(.28) + 80^2(.40) + 90^2(.32) + 100^2(0) - 80.4^2 = 59.84$
So, the SD(X|Y=10) = 7.74.

e) $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(XY) - (80.9)(11.8)$.
$E(XY) = (60)(5)(.01) + (70)(5)(.1) + (70)(10)(.07) + (70)(15)(.03) + (80)(5)(.05) + (80)(10)(.1) + (80)(15)(.15) + (80)(20)(.05) + (90)(5)(.03) + (90)(10)(.08) + (90)(15)(.1) + (90)(20)(.1) + (100)(15)(.01) + (100)(20)(.05) = 994$
Putting it all together, we get $\mathrm{Cov}(X, Y) = 994 - (80.9)(11.8) = 39.38$.

26) You roll a fair six-sided die six times. Assume the outcomes of the rolls are independent. (3 points each)

a) What is the probability that the first time the die lands on 3 is on an even toss (i.e., on the 2nd, 4th, or 6th toss)?
$(5/6)(1/6) + (5/6)^3(1/6) + (5/6)^5(1/6)$
Note that we only toss the die six times, not an infinite number of times.

b) A friend tells you that she rolled the die six times, and that the first time the die landed on 3 was in fact on an even toss. Given that information, what is the probability that the first time the die landed on 3 was on the second toss?
We want Pr(second toss | even toss) = Pr(second toss and even toss) / Pr(even toss).
Pr(second toss and even toss) = Pr(second toss), because the second toss is itself an even toss: the first 3 landing on the second toss already implies it landed on an even toss.
Hence, Pr(second toss | even toss) = Pr(second toss) / Pr(even toss) = $\frac{(5/6)(1/6)}{(5/6)(1/6) + (5/6)^3(1/6) + (5/6)^5(1/6)}$

27) A standard deck of cards has 52 cards. The cards have thirteen different values (2 through 10, Jack, Queen, King, Ace) and four suits (hearts, diamonds, spades, clubs). Your friend offers to play the following game with you. She will deal you five cards from a shuffled deck, drawn without replacement. If you get at least one card that is an Ace, King, or Queen in your five cards, she will pay you $5. If you do not get any Aces, Kings, or Queens in your five cards, you pay her $15.

a) Write out the probability mass function for your net earnings from this game.
Let X = net earnings.
$\Pr(X = 5) = 1 - \frac{(40)(39)(38)(37)(36)}{(52)(51)(50)(49)(48)} = .747$.
$\Pr(X = -15) = \frac{(40)(39)(38)(37)(36)}{(52)(51)(50)(49)(48)} = .253$.
Note that you have to write out the probability mass function for X, not just Pr(win) and Pr(lose). Those probabilities do not completely define the probability mass function. Writing Pr(X=5) and Pr(X=-15) does.

b) What are your average net earnings from this game?
E(X) = 5(.747) - 15(.253) = -.06

c) What is the standard deviation of the net earnings from this game?
$SD(X) = \sqrt{(5 - (-0.06))^2(.747) + (-15 - (-0.06))^2(.253)} = 8.69$

28) When the U.S. Internal Revenue Service receives tax forms, it puts them through a computer to flag forms that need to be investigated further. The computer looks for mistakes in the forms, for example addition mistakes or incorrect deduction amounts. The computer correctly flags 80% of all returns that have mistakes. It also flags 5% of returns that have no mistakes. Suppose that 15% of all tax returns have errors.

a) A tax return is flagged by the computer. What is the chance that it contains mistakes, given that the computer flagged it?
Let F = form is flagged. Let M = form has a mistake. We want Pr(M|F).
We know that Pr(M) = .15, that Pr(F|M) = .80, and that $\Pr(F \mid M^c) = .05$.
Pr(M|F) = Pr(M, F) / Pr(F).
Using a tree, we find that Pr(M, F) = (.15)(.8) = .12. And, we find that Pr(F) = (.15)(.8) + (.85)(.05) = .1625.
So, Pr(M|F) = .12/.1625 = .738.

b) A tax return is not flagged by the computer. What is the chance that it contains mistakes, given that the computer did not flag it?
We want $\Pr(M \mid F^c) = \Pr(M, F^c) / \Pr(F^c)$.
Using the same tree as in part a, we find that $\Pr(M, F^c)$ = (.15)(.2) = .03.
$\Pr(F^c)$ = (.15)(.2) + (.85)(.95) = .8375
So, $\Pr(M \mid F^c)$ = .03/.8375 = .0358.

29) Of the following three joint probability distributions, which one has covariance closest to zero? Circle the title of the correct distribution. (Hint: You don’t need to calculate all three covariances!)
Distribution A
       y=1    y=2
x=1    .20    .60
x=2    .05    .15

Distribution B
       y=1    y=2
x=1    .20    .30
x=2    .30    .20

Distribution C
       y=1    y=2
x=1    .20    .40
x=2    .40    .00

Distribution A. This is because X is independent of Y in distribution A, and the covariance of two independent variables equals zero. You can verify the independence by showing that, for example,
$\Pr(Y = 1 \mid X = 1) = .20 / .80 = 1/4$.
$\Pr(Y = 1 \mid X = 2) = .05 / .20 = 1/4$.
This problem also could be solved by computing the covariance for all three distributions. Correct calculations obtain zero for Distribution A and non-zero values for the others.

30) Circle true or false for each statement.

a) T. If X is a continuous random variable with an exponential distribution, then Pr(X=5) = 0.
All continuous probability density functions have zero probability at any one point.

b) F. If you compute a correlation by hand and get -1.26, there is a strong negative relationship between the two variables.
There must be a mistake. Correlations are bounded between -1 and 1.

c) T. For any random variable X, whether continuous or discrete, $\mathrm{Var}(X) = E(X^2) - E(X)^2$.

d) F. In a randomized comparative experiment with a large number of subjects, there is a good chance that the background characteristics of the two treatment groups will be different.
Randomization balances background characteristics.

e) T. In an observational study with a large number of subjects, there is a good chance that the background characteristics of the two treatment groups will be different.
This is not the case for a randomized experiment with many subjects, where randomization tends to balance background characteristics across groups.

f) F. For any two events A and B, Pr(A and B) = Pr(A)Pr(B).
The product rule holds only when A and B are independent.
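The counting argument in problem 19, parts a) and b), can be double-checked by listing all 216 outcomes of three rolls. The following is a minimal Python sketch and is not part of the original answer key; the helper name `prob` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 6**3 = 216 equally likely outcomes of three independent fair rolls.
rolls = list(product(range(1, 7), repeat=3))

def prob(event, given=lambda r: True):
    """Exact Pr(event | given), found by counting equally likely outcomes."""
    conditioned = [r for r in rolls if given(r)]
    favorable = [r for r in conditioned if event(r)]
    return Fraction(len(favorable), len(conditioned))

# Problem 19 a): all three faces show the same number.
p_all_same = prob(lambda r: len(set(r)) == 1)

# Problem 19 b): sum less than 4, given that the sum is less than 6.
p_sum_lt4_given_lt6 = prob(lambda r: sum(r) < 4, given=lambda r: sum(r) < 6)

print(p_all_same, p_sum_lt4_given_lt6)
```

The two printed fractions should agree with the answers 1/36 and 1/10 given for problem 19.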
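The answers to problem 20 come from integrating the density $f(x) = 20x^3(1 - x)$. As a hedged check, the sketch below integrates the polynomial term by term with exact fractions. The names `COEFFS` and `moment_integral` are hypothetical, and the code assumes the density is supported on the interval from 0 to 1 as stated in the problem.

```python
from fractions import Fraction

# Density from problem 20: f(x) = 20x^3(1 - x) = 20x^3 - 20x^4 on [0, 1].
COEFFS = {3: Fraction(20), 4: Fraction(-20)}  # power -> coefficient

def moment_integral(k, lo, hi):
    """Exact integral of x**k * f(x) over [lo, hi], term by term."""
    lo, hi = Fraction(lo), Fraction(hi)
    total = Fraction(0)
    for power, coef in COEFFS.items():
        n = power + k + 1  # exponent after integrating x**(power + k)
        total += coef * (hi**n - lo**n) / n
    return total

total_mass = moment_integral(0, 0, 1)                  # should be 1 for a density
p_above_half = moment_integral(0, Fraction(1, 2), 1)   # Pr(X > 0.5)
mean = moment_integral(1, 0, 1)                        # E(X)
variance = moment_integral(2, 0, 1) - mean**2          # E(X^2) - E(X)^2

print(total_mass, p_above_half, mean, variance)
print(float(p_above_half), float(mean), float(variance))
```

The exact values are 13/16, 2/3, and 2/63, which match the decimals .8125, .666667, and .0317 in the answer key.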
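The tree calculation in problem 28 is an application of Bayes' rule, and it can be reproduced in a few lines. This is only an illustrative sketch: the function name `posterior` and its parameter names are assumptions, while the three input probabilities are the ones stated in the problem.

```python
def posterior(prior, p_flag_given_mistake, p_flag_given_ok):
    """Return (Pr(M|F), Pr(M|F^c)) using Bayes' rule.

    prior                 -- Pr(M), chance a return has a mistake
    p_flag_given_mistake  -- Pr(F|M)
    p_flag_given_ok       -- Pr(F|M^c)
    """
    # Total probability of a flag: sum over both branches of the tree.
    p_flag = prior * p_flag_given_mistake + (1 - prior) * p_flag_given_ok
    p_m_given_f = prior * p_flag_given_mistake / p_flag
    p_m_given_not_f = prior * (1 - p_flag_given_mistake) / (1 - p_flag)
    return p_m_given_f, p_m_given_not_f

# Inputs from problem 28: Pr(M) = .15, Pr(F|M) = .80, Pr(F|M^c) = .05.
p_m_given_f, p_m_given_not_f = posterior(0.15, 0.80, 0.05)
print(round(p_m_given_f, 3), round(p_m_given_not_f, 4))
```

The rounded results should agree with the answers .738 and .0358 in parts a) and b). Changing the prior shows how strongly the answer to part a) depends on how common mistakes are.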
