PRACTICE PROBLEMS FOR STAT 103 MIDTERM 1

This is longer than the actual midterm, but it gives you an idea of the types of questions that will be asked. Topics not covered in these problems may still be covered on exams.

Name (Please print clearly) ________________________

Lab Time (circle one): 9:10 – 10:00; 10:30 – 11:20; 11:50 – 12:40; 1:10 – 2:00

Directions:
1) Print clearly on this exam. Only correct solutions that can be read will be given credit.
2) You may use a calculator and one page (both sides) as a crib sheet.
3) Show your method of solution on problems requiring calculations. Only answers with supporting work will be given credit.
4) Carry out all calculations to 2 decimal places of accuracy. You can leave answers in fractions.
5) As a rough guideline, allot yourself about 9 minutes per page. If you get stuck, move on.

The data for problems 1 – 17 of this exam pertain to a randomized experiment that assessed the effect of intensive childcare for children with low birth weights. The study is described below.

Description of the study (READ THIS SO YOU KNOW WHAT THE STUDY IS ABOUT.)

Low birth weight infants have elevated risks of cognitive impairment and academic failures later in life (Hill et al. 2003). One approach to reduce these risks is to provide extraordinary support for the families of low birth weight infants, for example intensive childcare education and access to trained specialists for the parents. To assess the effectiveness of such interventions, in 1985 researchers designed the Infant Health Development Program (IHDP). The IHDP involved randomizing 985 low birth weight infants to one of two groups: 1) a treated group assigned to receive weekly visits from specialists and to attend daily childcare at childhood development centers, and 2) a control group that did not have access to the weekly visits or childcare centers.
There were 377 infants randomly assigned to the treated group and 608 randomly assigned to the control group. The outcome variable is the infant’s score on the Peabody Picture Vocabulary Test Revised administered at age 3. Infants took the test after completing the time period of the study.

Questions begin here. For questions 1 – 17, circle the right answer.

Below is a histogram of the Peabody test scores for the control group.

[Histogram of Peabody test scores for the control group; horizontal axis runs from 30 to 130.]

1) The median score is closest to: 75 80 85 90

2) The SD is closest to: 5 20 35 50 65

3) The percentage of scores between 75 and 94.9 is closest to: 25% 50% 75%

4) The percentage of infants scoring 100 or higher is closest to: 10% 20% 30% 40%

5) There are 119 control infants whose Peabody scores are missing. Suppose through extra field work you obtain the 119 missing scores. You find that the mean and SD of these 119 scores are 85 and 10, respectively, and these 119 scores follow a normal curve. Which sentence best describes the average and SD for all 608 kids combined? Circle the letter of the sentence.
a) The SD for the 608 is greater than the SD from Question 2.
b) The SD for the 608 is less than the SD from Question 2.
c) The SD for the 608 is equal to the SD from Question 2.

Below are box plots of the distributions of mothers’ age, Peabody scores, and birth weights in the treated and control groups:

[Three box plots, each comparing the Control and Treated groups: mom age (axis ticks from 20 to 40), Peabody score (axis from 30 to 130), and birth weight (axis ticks at 1000 and 2000).]

6) We want our comparisons of the treated and control groups to be free of the effects of confounding variables. Which statement must be true for the comparisons to be free of confounding?
a) The distributions of mothers’ age, birth weights, and Peabody scores should be similar in the treated and control groups.
b) The distributions of mothers’ age and birth weights should be similar in the treated and control groups.
c) The distributions of Peabody scores should be similar in the treated and control groups.

7) True or False: The treated and control groups have similar distributions of mothers’ age.

8) The SD of the Peabody scores for the treated group is most likely (circle one of the following three choices) (i) about equal to, (ii) about 50% less than, (iii) about 50% greater than the SD of the Peabody scores for the control group.

9) What percentage of the control group was born weighing less than 2000 grams? 35% 50% 65% 80%

10) Sketch a rough histogram of birth weights for the control group.

11) Another variable measured in the study was the number of days the infant had to spend in the hospital after being born. The average of this variable for all 985 kids is 25.4 days and the SD is 23.9 days. Which of the following box plots looks most like the distribution of number of days spent in the hospital?
a) box plot drawn symmetric around 25.4
b) box plot drawn with median less than 25.4 and long right tail
c) box plot drawn with median greater than 25.4 and long left tail

12) Consider the average days spent in the hospital for the treated and control groups. Which statement is most likely true?
(i) The means in the treated and control groups are far apart.
(ii) The means in the treated and control groups are close to each other.
(iii) There’s no way to tell from the information we have whether (i) or (ii) is more likely to be true.

Below are scatter plots of Peabody scores versus mothers’ age and versus birth weight for the control group. The regression lines are shown in the plots. The questions for these plots follow.

[Two scatter plots with regression lines: Peabody score (vertical axis, 30 to 130) versus mom age (horizontal axis, 15 to 45), and Peabody score (vertical axis, 30 to 130) versus birth weight (horizontal axis ticks at 1000 and 2000).]

THESE QUESTIONS REFER TO THE SCATTER PLOTS ABOVE.

13) Estimate the slope and intercept of the regression line using mom age as the predictor (watch out for where the axes start).
______________

14) The correlation in the plot involving birth weights is closest to: -0.50 -0.25 0 0.25 0.50

15) The correlation in the plot involving age is closest to: -0.70 -0.20 0.20 0.70

16) If we measured mother’s age in terms of months and birth weight in terms of pounds, which statement is true?
a) The correlation between age and Peabody scores would go up, but the correlation between weight and Peabody scores would go down.
b) The correlation between age and Peabody scores would go down, but the correlation between weight and Peabody scores would go up.
c) The correlation between age and Peabody scores would go up, and the correlation between weight and Peabody scores would go up.
d) Neither correlation would change.

Answer this one question about the IHDP study design.

17) A psychologist reads the article by Hill et al. (2003) and claims that the results are completely worthless, because the outcome variable is the score AFTER the treatment period ended rather than a change score (i.e., a post-study score minus a pre-study score). The psychologist says that there’s no way to tell if the childcare is more effective than the control because we cannot tell which group had the higher average increase in scores. Is the psychologist correct? Attack or defend the psychologist’s statement, using what you know about the study design from the description of the study. Don’t just say “right” or “wrong”; explain why you think the psychologist is right or wrong. WRITE ONLY UP TO 4 SENTENCES. (THAT’S MORE THAN YOU NEED.)

18) Write at most four sentences for your answer (graders may not read beyond the first four sentences).

a) Here is a story from The Washington Post (1993). “Challenging the common assumption that guns protect owners, a multi-state study of hundreds of homicides has found that keeping a gun at home nearly triples the likelihood that someone in the household will be slain there.
The study, published in New England Journal of Medicine, studied the records of three populous counties surrounding Seattle, Washington, Cleveland, Ohio, and Memphis, Tennessee. The counties offered a sample representative of the entire nation because of the mix of urban, suburban, and rural communities. Although 1860 homicides occurred during the study period, the team looked only at those that occurred in the homes of the victims, about 400 deaths. The researchers found that members of households with guns were 2.7 times more likely to experience a homicide than those in households without guns.” From this study, can you conclude that owning a gun causes people to have higher likelihoods of experiencing a homicide than not owning a gun? Defend your answer. Be specific in your defense, referring to strengths or deficiencies of the study design.

b) According to a poll of scientists reported in Science (Mervis, 1998), 82% of scientists “strongly or somewhat agree” with the assertion, “The U.S. public is gullible and believes in miracle solutions.” Another 82% agreed with the assertion, “The media do not understand statistics well enough to explain new findings.” The survey was mailed to 1400 scientists. What further information about the data collection would you like to have to decide whether or not to trust the results of the survey? List at least two, and no more than three, pieces of information (answers with at least two correct pieces of information will receive full credit). Assume the wording of the questions is not a problem.

19) Answer the following questions. Show your work to maximize your chances for partial credit. You can leave answers in fractions. A fair die has six distinct faces, each with a 1/6 probability of facing up. You are going to roll one die three times. Assume the outcome of each roll is random and independent of other rolls.
a) Calculate the probability that all three faces will be the same number.
b) Calculate the probability that the sum of the three rolls will be less than 4, given that the sum is less than 6.
c) Suppose the three rolls are all 6’s. You are going to roll the die a fourth time. TRUE or FALSE: Because you rolled three sixes in a row, the results have to balance out, so that you have less than a 1/6 chance of rolling a 6 on the fourth roll. You don’t have to do any work for this problem; just circle true or false.
d) This is a totally different problem, having nothing to do with dice. Body temperature measurements follow a normal curve with average equal to 98.6 degrees and standard deviation equal to 0.75 degrees. If we measured the body temperatures of all people in the U.S., approximately what percentage of temperatures would be over 99 degrees?

20) You determine that a random variable X follows the probability distribution: $f(x) = 20x^3(1 - x)$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere.
a) What is the chance that X > 0.5?
b) Calculate the expected value of X.
c) Calculate the variance of X.

21) The uniform distribution is defined on the interval from some constant a to another constant b. It has a probability density function: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ elsewhere.
a) Verify that f(x) is a true probability density function. (Show the two required conditions hold.)
b) Show that the expected value of a uniform random variable X equals $\frac{a + b}{2}$.
c) Show that the variance of a uniform random variable X equals $\frac{(b - a)^2}{12}$.

22) In the dice game Yahtzee, you can score points equal to the sum of five dice. Suppose that you’ve thrown four dice and their sum equals 24. You have one more die to throw, and your score will be the sum of the five dice.
a) Write down the probability distribution of the sum.
b) What sum do you expect to get?
c) What is the standard deviation of the sum?
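
For checking hand calculations on discrete problems like the ones above, a short script can compute the mean and standard deviation of any probability distribution written as a table. The sketch below is only an illustration: the helper names `mean` and `variance` are made up here, and the example is a single roll of a fair die rather than any specific exam problem.

```python
from fractions import Fraction
from math import sqrt


def mean(pmf):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())


def variance(pmf):
    """Variance from the definition: the average squared distance from the mean."""
    mu = mean(pmf)
    return sum((value - mu) ** 2 * prob for value, prob in pmf.items())


# Illustration only: one roll of a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid probability distribution must sum to 1.
assert sum(die.values()) == 1

print("mean:", mean(die))
print("variance:", variance(die))
print("SD:", sqrt(variance(die)))
```

Using `Fraction` keeps the answers exact, which matches the direction that answers may be left as fractions; replace `die` with your own table of values and probabilities to check a different problem.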
23) For persons infected with a certain form of malaria, the length of time Y (in years) spent in remission is described by a continuous probability density function, $f(y) = \frac{1}{9} y^2$ for $0 \le y \le 3$.
a) What is the probability that a patient’s remission lasts more than one year?
b) Given that the patient’s remission lasts more than six months, what is the probability that the remission lasts less than one year?
c) What is the average time spent in remission?
d) What is the standard deviation of the time spent in remission?

24) An unfair coin with 60% chance of landing heads is flipped three times. Let Z equal the number of heads minus the number of tails. For example, if there are two heads and one tail, Z = 1.
a) Write the probability distribution of Z.
b) What is the chance that Z = 0?
c) What are the expected value and variance of Z?

25) Based on experience, the joint distribution for number of hours studied and grade on the midterm is given in the following table (these are made up data):

                          Score
Hours studied     60     70     80     90    100
      0          0.05   0.01   0.00   0.00   0.00
      5          0.02   0.10   0.07   0.03   0.00
     10          0.00   0.05   0.10   0.15   0.05
     15          0.00   0.03   0.08   0.10   0.10
     20          0.00   0.00   0.00   0.01   0.05

a) Find the probability that a randomly selected student studies more than 11 hours.
b) Find the expected value and standard deviation of score.
c) Find the expected value and standard deviation of hours studied.
d) For a person who studied 10 hours, what are the expected value and standard deviation of her or his score?
e) Find the covariance between hours studied and score.

26) You roll a fair six-sided die six times. Assume the outcomes of the rolls are independent.
a) What is the probability that the first time the die lands on 3 is on an even toss (i.e., on the 2nd, 4th, or 6th toss)?
b) A friend tells you that she rolled the die six times, and that the first time the die landed on 3 was in fact on an even toss.
Given that information, what is the probability that the first time the die landed on 3 was on the second toss?

27) A standard deck of cards has 52 cards. The cards have thirteen different values (2 through 10, Jack, Queen, King, Ace) and four suits (hearts, diamonds, spades, clubs). Your friend offers to play the following game with you. She will deal you five cards from a shuffled deck, drawn without replacement. If you get at least one card that is an Ace, King, or Queen in your five cards, she will pay you $5. If you do not get any Aces, Kings, or Queens in your five cards, you pay her $15.
a) Write out the probability mass function for your net earnings from this game.
b) What are your average net earnings from this game?
c) What is the standard deviation of the net earnings from this game?

28) When the U.S. Internal Revenue Service receives tax forms, it puts them through a computer to flag forms that need to be investigated further. The computer looks for mistakes in the forms, for example addition mistakes or incorrect deduction amounts. The computer correctly flags 80% of all returns that have mistakes. It also flags 5% of returns that have no mistakes. Suppose that 15% of all tax returns have mistakes.
a) A tax return is flagged by the computer. What is the chance that it contains mistakes, given that the computer flagged it?
b) A tax return is not flagged by the computer. What is the chance that it contains mistakes, given that the computer did not flag it?

29) Of the following three joint probability distributions, which one has covariance closest to zero? Circle the title of the correct distribution. (Hint: You don’t need to calculate all three covariances!)

Distribution A
         y=1    y=2
x=1      .20    .60
x=2      .05    .15

Distribution B
         y=1    y=2
x=1      .20    .30
x=2      .30    .20

Distribution C
         y=1    y=2
x=1      .20    .40
x=2      .40    .00

30) Circle true or false for each statement.
a) T F If X is a continuous random variable with an exponential distribution, then Pr(X = 5) = 0.
b) T F If you compute a correlation by hand and get -1.26, there is a strong negative relationship between the two variables.
c) T F For any random variable X, whether continuous or discrete, $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$.
d) T F In a randomized comparative experiment with a large number of subjects, there is a good chance that the background characteristics of the two treatment groups will be different.
e) T F In an observational study with a large number of subjects, there is a good chance that the background characteristics of the two treatment groups will be different.
f) T F For any two events A and B, Pr(A and B) = Pr(A)Pr(B).
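
Several problems above ask for the covariance of a joint distribution given as a table. As a way to check that kind of hand calculation, the sketch below computes Cov(X, Y) = E(XY) − E(X)E(Y) from a table of joint probabilities. The function name `covariance` and the example table are made up for illustration; the table is not one of the distributions in this exam.

```python
from fractions import Fraction


def covariance(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y) for a joint pmf given as {(x, y): probability}."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y


# Made-up joint distribution for illustration; not taken from the problems above.
example = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}

# Joint probabilities must sum to 1.
assert sum(example.values()) == 1

print("covariance:", covariance(example))
```

To check one of your own answers, enter each cell of the table as a `(x, y)` key with its probability and compare the printed value with your hand calculation.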