Answers to practice problems for midterm 1

PRACTICE PROBLEMS FOR STAT 103 MIDTERM 1.
This is longer than the actual midterm, but it gives you an idea of the types
of questions that will be asked. Topics not covered in these problems still
may be covered on exams.
Name (Please print clearly) ________________________
Lab Time (circle one): 9:10 – 10:00; 10:30 – 11:20; 11:50 – 12:40; 1:10 – 2:00
Directions:
1) Print clearly on this exam. Only correct solutions that can be read will be given credit.
2) You may use a calculator and one page of notes (both sides) as a crib sheet.
3) Show your method of solution on problems requiring calculations. Only answers with supporting work will be given credit.
4) Carry out all calculations to 2 decimal places of accuracy. You can leave answers in fractions.
5) As a rough guideline, allot yourself about 9 minutes per page. If you get stuck, move on.
The data for the problems on pages 2 – 6 of this exam pertain to a randomized experiment that assessed the
effect of intensive childcare for children with low birth weights. The study is described on the next page.
Description of the study (READ THIS SO YOU KNOW WHAT THE STUDY IS ABOUT.)
Low birth weight infants have elevated risks of cognitive impairment and academic failures later in life (Hill et al. 2003). One
approach to reduce these risks is to provide extraordinary support for the families of low birth weight infants, for example intensive
childcare education and access to trained specialists for the parents.
To assess the effectiveness of such interventions, in 1985 researchers designed the Infant Health Development Program (IHDP). The
IHDP involved randomizing 985 low birth weight infants to one of two groups: 1) a treated group assigned to receive weekly visits
from specialists and to attend daily childcare at childhood development centers, and 2) a control group that did not have access to the
weekly visits or childcare centers. There were 377 infants randomly assigned to the treated group and 608 randomly assigned to the
control group. The outcome variable is the infant’s score on the Peabody Picture Vocabulary Test Revised administered at age 3.
Infants took the test after completing the time period of the study.
Questions begin here. For questions 1 – 17, circle the right answer.
Below is a histogram of the Peabody test scores for the control group.
[Histogram of Peabody scores for the control group; the horizontal axis runs from 30 to 130.]
1) The median score is closest to: 85
2) The SD is closest to: 20
3) The percentage of scores between 75 and 94.9 is closest to: 50%
4) The percentage of infants scoring 100 or higher is closest to: 20%
5) There are 119 control infants whose Peabody scores are missing. Suppose through
extra field work you obtain the 119 missing scores. You find that the mean and SD of
these 119 scores are 85 and 10, respectively, and these 119 scores follow a normal
curve. Which sentence best describes the average and SD for all 608 kids combined?
b) The SD for the 608 is less than the SD from Question 2.
Below are box plots of the distributions of mothers’ age, Peabody scores, and birth weights in the treated and control groups:
[Box plots of mom age, Peabody score, and birth weight, each shown by treatment group (Control, Treated).]
6) We want our comparisons of the treated and control groups to be free of the effects of confounding variables. Which statement
must be true for the comparisons to be free of confounding?
b) The distributions of mothers’ age and birth weights should be similar in the treated and control groups.
7) True. The treated and control groups have similar distributions of mothers’ age.
8) The SD of the Peabody scores for the treated group is most likely (circle one of the following three choices)
(i) about equal to the SD of the Peabody scores for the control group.
9) What percentage of the control group was born weighing less than 2000 grams?
65%
10) Sketch a rough histogram of birth weights for the control group.
The histogram should have a slight left skew.
11) Another variable measured in the study was the number of days the infant had to spend in the hospital after being born. The
average of this variable for all 985 kids is 25.4 days and the SD is 23.9 days. Which of the following box plots looks most like the
distribution of number of days spent in the hospital?
b) Box plot drawn with median less than 24 and a long right tail.
12) Consider the average days spent in the hospital for the treated and control groups. Which statement is most likely true?
(ii) The means in the treated and control groups are close to each other.
Below are scatter plots of Peabody scores versus mothers’ age and versus birth weight for the control group. The regression lines are shown in the plots.
[Scatter plots for the control group: Peabody score versus mom age, and Peabody score versus birth weight, each with its regression line.]
13) Estimate the slope and intercept of the regression line using mom age as the predictor. Slope around 0.7, intercept around 65.
14) The correlation in the plot involving birth weights is closest to: 0
15) The correlation in the plot involving age is closest to: 0.20
16) If we measured mother’s age in terms of months and birth weight in terms of pounds, which statement is true:
d) Neither correlation would change.
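The answer to question 16 rests on the fact that a correlation does not change when a variable is multiplied by a positive constant. The sketch below illustrates this with a hand-written Pearson correlation. The ages and scores are made-up illustrative values, not data from the IHDP study, and the function name `correlation` is our own.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, NOT data from the IHDP study.
age_years = [18, 22, 25, 31, 36]
scores = [70, 85, 80, 95, 90]

# Change of units: years -> months (multiply by a positive constant).
age_months = [12 * a for a in age_years]

r_years = correlation(age_years, scores)
r_months = correlation(age_months, scores)
print(math.isclose(r_years, r_months))  # True
```

The same reasoning applies to converting grams to pounds, since that is also multiplication by a positive constant.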
Answer this one question about the IHDP study design.
17) A psychologist reads the article by Hill et al. (2003) and claims that the results are completely worthless, because the outcome
variable is the score AFTER the treatment period ended rather than a change score (i.e., a post-study score minus a pre-study score).
The psychologist says that there’s no way to tell if the childcare is more effective than the control because we cannot tell which group
had the higher average increase in scores. Is the psychologist correct? Attack or defend the psychologist’s statement, using what
you know about the study design as described on page 2. Don’t just say “right” or “wrong”; explain why you think the psychologist is
right or wrong. WRITE ONLY UP TO 4 SENTENCES. (THAT’S MORE THAN YOU NEED.)
Not true. Because the study was randomized, the average “pre-intervention” scores in the control and treated groups should be
similar. Hence, any difference in the sample post-intervention average scores should be attributable to the effect of the child care.
18) You cannot make this conclusion. People who own guns are likely to differ from people who do not own guns in ways that affect
homicide rates. For example, people who own guns may live in more dangerous areas or have riskier lifestyles. Because of these
differences in background characteristics, we must conclude that the association found in the data does not prove that owning guns
causes people to have higher likelihoods of experiencing a homicide.
You might suggest that the sample may not have been representative of the larger population. The researchers claim that it is, but it is
good to be skeptical of that claim. This answer earns partial credit.
You might suggest that the researchers need to look at all homicides, not just the ones where there is a gun in the home. Some
possible reasons for looking only at homicides in homes include: 1) the researchers may not have the information on gun ownership
for people shot outside their homes, and 2) the researchers want to focus on a particular type of homicide. Even if the researchers did
use all the homicide data, they still would have to deal with possible lack of balance in the background characteristics. This answer,
without any mention of potential imbalance in background characteristics, earns partial credit.
b)
1. Was the survey mailed to a random sample of scientists or not?
2. Was there any nonresponse bias?
3. How was the list of scientists used to pick the sample compiled?
Answers that focus on the questionnaire wording do not count. It is important to consider this issue, but the problem specifies not to
worry about questionnaire wording.
19) Answer the following questions. Show your work to maximize your chances for partial credit. You can leave answers in
fractions.
A fair die has six distinct faces, each with a 1/6 probability of facing up. You are going to roll one die three times. Assume the
outcome of each roll is random and independent of other rolls.
a) Calculate the probability that all three faces will be the same number.
6(1/6)(1/6)(1/6) = 1/36
b) Calculate the probability that the sum of the three dice will be less than 4, given that the sum is less than 6.
1/10. Obtain this by counting the number of ways to get a number less than 4 (one) out of the number of ways to get a number less
than 6 (ten).
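The counts in parts a and b can be checked by listing all 216 equally likely outcomes of three rolls. This is a minimal sketch; the variable names are our own.

```python
from fractions import Fraction
from itertools import product

# All 6^3 = 216 equally likely outcomes of three rolls of a fair die.
outcomes = list(product(range(1, 7), repeat=3))

# Part a: all three faces show the same number.
n_same = sum(1 for r in outcomes if r[0] == r[1] == r[2])
p_all_same = Fraction(n_same, len(outcomes))

# Part b: Pr(sum < 4 | sum < 6) = (# outcomes with sum < 4) / (# outcomes with sum < 6).
n_less_4 = sum(1 for r in outcomes if sum(r) < 4)
n_less_6 = sum(1 for r in outcomes if sum(r) < 6)
p_cond = Fraction(n_less_4, n_less_6)

print(p_all_same)  # 1/36
print(p_cond)      # 1/10
```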
c) Suppose the three rolls are all 6’s. You are going to roll the die a fourth time.
FALSE: Because you rolled three sixes in a row, the results have to balance out, so that you have less than a 1/6 chance of rolling
a 6 on the fourth roll. You don’t have to do any work for this problem; just circle true or false.
d) This is a totally different problem, having nothing to do with dice. Body temperature measurements follow a normal curve with
average equal to 98.6 degrees and standard deviation equal to 0.75 degrees. If we measured the body temperatures of all people in the
U.S., approximately what percentage of temperatures would be over 99 degrees?
29.7%
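On the exam this percentage comes from a normal table after converting 99 degrees to standard units, z = (99 - 98.6)/0.75, which is about 0.53. The sketch below computes the same tail area with the complementary error function from Python's standard library; the helper name `normal_upper_tail` is our own.

```python
import math

def normal_upper_tail(x, mean, sd):
    """Pr(X > x) for a normal curve, via the complementary error function."""
    z = (x - mean) / sd  # convert to standard units
    return 0.5 * math.erfc(z / math.sqrt(2))

p_over_99 = normal_upper_tail(99, mean=98.6, sd=0.75)
print(round(100 * p_over_99, 1))  # about 29.7
```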
20) You determine that a random variable X follows the probability distribution:
f(x) = 20x^3(1 - x) for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere.
a) What is the chance that X > 0.5? 1 - .1875 = .8125
b) Calculate the expected value of X. .666667
c) Calculate the variance of X. .0317
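The three answers to question 20 come from integrating the density, which is a polynomial: 20x^3(1 - x) = 20x^3 - 20x^4. The sketch below does the integrals exactly with fractions. The helpers `integrate` and `times_x_power` are our own, written only for this check.

```python
from fractions import Fraction

# Density f(x) = 20x^3 - 20x^4 on [0, 1], stored as {power: coefficient}.
density = {3: Fraction(20), 4: Fraction(-20)}

def integrate(poly, lo, hi):
    """Exact integral of a polynomial {power: coeff} from lo to hi."""
    def antiderivative(x):
        return sum(c * x ** (p + 1) / (p + 1) for p, c in poly.items())
    return antiderivative(hi) - antiderivative(lo)

def times_x_power(poly, k):
    """Multiply a polynomial by x^k."""
    return {p + k: c for p, c in poly.items()}

zero, half, one = Fraction(0), Fraction(1, 2), Fraction(1)

p_above_half = integrate(density, half, one)                 # Pr(X > 0.5)
mean = integrate(times_x_power(density, 1), zero, one)       # E(X)
second_moment = integrate(times_x_power(density, 2), zero, one)  # E(X^2)
variance = second_moment - mean ** 2                          # Var(X)

print(p_above_half, mean, variance)  # 13/16 2/3 2/63
```

In decimals these are .8125, .6667, and .0317, matching the answers above.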
21) The uniform distribution is defined on the interval from some constant a to another constant b. It has a probability density function:
f(x) = 1/(b - a) for a ≤ x ≤ b, and f(x) = 0 elsewhere.
a) Verify that f(x) is a true probability density function. (Show the two required conditions hold).
It is never negative, and the integral over the entire sample space equals one.
b) Show that the expected value of a uniform random variable X equals (a + b)/2.
Compute the integral from a to b of x · 1/(b - a) dx.
c) Show that the variance of a uniform random variable X equals (b - a)^2/12.
Compute the integral from a to b of (x - E(X))^2 · 1/(b - a) dx.
22) In the dice game Yahtzee, you can score points equal to the sum of five dice. Suppose that you’ve thrown four dice and their sum equals 24. You have one more die to throw, and your score will be the sum of the five dice.
a) Write down the probability distribution of the sum.
x    Pr(X=x)
25   1/6
26   1/6
27   1/6
28   1/6
29   1/6
30   1/6
b) What sum do you expect to get? 27.5
c) What is the standard deviation of the sum? 1.71
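The expected value and SD in parts b and c follow directly from the probability distribution in part a. A minimal sketch of that calculation, with variable names of our own choosing:

```python
import math
from fractions import Fraction

# Four dice already sum to 24; the fifth die adds 1..6 with probability 1/6 each.
pmf = {24 + face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)

print(mean)          # 55/2, i.e. 27.5
print(variance)      # 35/12
print(round(sd, 2))  # 1.71
```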
23a) .9623
b) .032
c) 2.25
d) 0.58 (Remember the SD is the square root of the variance.)
24a) Pr(Z=-3) = .064. Pr(Z = -1) = .288. Pr(Z=1) = .432. Pr(Z=3) = .216.
b) Pr(Z=0)=0.
c) E(Z) = 0.6. Var(Z) = 2.88
25) Let X be the random variable for score, and let Y be the random variable for hours studied.
a) Pr(Y > 11) = Pr(Y = 15) + Pr(Y = 20) = .29 + .20 = .49
b) E(X) = 60(.06) + 70(.22) + 80(.35) + 90(.31) + 100(.06) = 80.9
Var(X) = E(X^2) - E(X)^2 = 60^2(.06) + 70^2(.22) + 80^2(.35) + 90^2(.31) + 100^2(.06) - 80.9^2 = 100.19. So, the SD(X) is 10.0095.
c) E(Y) = 0(.07) + 5(.19) + 10(.25) + 15(.29) + 20(.20) = 11.8
Var(Y) = E(Y^2) - E(Y)^2 = 0^2(.07) + 5^2(.19) + 10^2(.25) + 15^2(.29) + 20^2(.20) - 11.8^2 = 35.76. So, the SD(Y) is 5.98.
d) E(X | Y = 10) = 60(0) + 70(.28) + 80(.40) + 90(.32) + 100(0) = 80.4.
Var(X | Y = 10) = 60^2(0) + 70^2(.28) + 80^2(.40) + 90^2(.32) + 100^2(0) - 80.4^2 = 59.84
So, the SD(X|Y=10) = 7.74.
e) Cov(X, Y) = E(XY) - E(X)E(Y) = E(XY) - (80.9)(11.8).
E(XY) = (60)(5)(.01) + (70)(5)(.1) + (70)(10)(.07) + (70)(15)(.03)
+ (80)(5)(.05) + (80)(10)(.1) + (80)(15)(.15) + (80)(20)(.05)
+ (90)(5)(.03) + (90)(10)(.08) + (90)(15)(.1) + (90)(20)(.1)
+ (100)(15)(.01) + (100)(20)(.05) = 994
Putting it all together, we get Cov(X, Y) = 994 - 954.62 = 39.38.
26) You roll a fair, six-sided die six times. Assume the outcomes of the rolls are independent. (3 points each)
a) What is the probability that the first time the die lands on 3 is on an even toss (i.e., on the 2nd, 4th, or 6th toss)?
(5/6)(1/6) + (5/6)^3(1/6) + (5/6)^5(1/6)
Note that we only toss the die six times, not an infinite number of times.
b) A friend tells you that she rolled the die six times, and that the first time the die landed on 3 was in fact on an even toss. Given that information, what is the probability that the first time the die landed on 3 was on the second toss?
We want Pr(second toss | even toss) = Pr(second toss and even toss) / Pr(even toss).
Pr(second toss and even toss) = Pr(second toss), because the only way the first 3 can come on the second toss and on an even toss is for the die to land on 3 on the second toss.
Hence, Pr(second toss | even toss) = Pr(second toss) / Pr(even toss) = (5/6)(1/6) / [(5/6)(1/6) + (5/6)^3(1/6) + (5/6)^5(1/6)].
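The expressions in parts a and b are left unevaluated in the key. A minimal sketch that evaluates them exactly is given below; the helper name `first_three_on` is our own.

```python
from fractions import Fraction

p_three = Fraction(1, 6)   # chance a single toss lands on 3
p_other = Fraction(5, 6)   # chance it does not

def first_three_on(toss):
    """Pr(the first 3 appears on the given toss)."""
    return p_other ** (toss - 1) * p_three

# Part a: only six tosses are made, so the even tosses are 2, 4, and 6.
p_even = sum(first_three_on(t) for t in (2, 4, 6))

# Part b: Pr(second toss | even toss) = Pr(second toss) / Pr(even toss).
p_second_given_even = first_three_on(2) / p_even

print(p_even, float(p_even))
print(p_second_given_even, float(p_second_given_even))
```

Run as written, part a comes to 14105/46656 (about .302) and part b to 1296/2821 (about .459).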
27) A standard deck of cards has 52 cards. The cards have thirteen different values (2 through 10, Jack, Queen, King, Ace) and four
suits (hearts, diamonds, spades, clubs). Your friend offers to play the following game with you. She will deal you five cards from a
shuffled deck, drawn without replacement. If you get at least one card that is an Ace, King, or Queen in your five cards, she will pay
you $5. If you do not get any Aces, Kings, or Queens in your five cards, you pay her $15.
a) Write out the probability mass function for your net earnings from this game.
Let X = net earnings.
Pr(X = 5) = 1 - (40)(39)(38)(37)(36) / [(52)(51)(50)(49)(48)] = .747.
Pr(X = -15) = (40)(39)(38)(37)(36) / [(52)(51)(50)(49)(48)] = .253.
Note that you have to write out the probability mass function for X, not just Pr(win) and Pr(lose). Those probabilities do not
completely define the probability mass function. Writing Pr(X=5) and Pr(X=-15) does.
b) What are your average net earnings from this game?
E(X) = 5(.747) - 15(.253) = -.06
c) What is the standard deviation of the net earnings from this game?
SD(X) = sqrt[(5 - (-0.06))^2(.747) + (-15 - (-0.06))^2(.253)] = 8.69
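The probability mass function, mean, and SD for the card game can be checked without rounding the intermediate values. The sketch below uses `math.comb`, which gives the same ratio as the product of five fractions in part a, because the 5! cancels. It assumes Python 3.8 or later.

```python
import math
from fractions import Fraction

# Pr(no Ace, King, or Queen in five cards): 40 of the 52 cards are not A, K, or Q.
p_lose = Fraction(math.comb(40, 5), math.comb(52, 5))
p_win = 1 - p_lose

# Probability mass function of net earnings X (in dollars).
pmf = {5: p_win, -15: p_lose}

mean = sum(x * p for x, p in pmf.items())
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in pmf.items()))

print(round(float(p_win), 3), round(float(p_lose), 3))  # 0.747 0.253
print(round(float(mean), 2))                            # -0.06
print(sd)
```

With unrounded probabilities the SD comes out near 8.70; the 8.69 in the key results from plugging in the rounded values .747, .253, and -.06.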
28) When the U.S. Internal Revenue Service receives tax forms, it puts them through a computer to flag forms that need to be
investigated further. The computer looks for mistakes in the forms, for example addition mistakes or incorrect deduction amounts.
The computer correctly flags 80% of all returns that have mistakes. It also flags 5% of returns that have no mistakes. Suppose that
15% of all tax returns have errors.
a) A tax return is flagged by the computer. What is the chance that it contains mistakes, given that the computer flagged it?
Let F = form is flagged.
Let M = form has a mistake.
We want Pr(M|F).
We know that Pr(M) = .15, that Pr(F|M) = .80, and that Pr(F|M^c) = .05.
Pr(M|F) = Pr(M, F) / Pr(F). Using a tree, we find that Pr(M, F) = (.15)(.8) = .12. And, we find that Pr(F) = (.15)(.8) + (.85)(.05) = .1625.
So, Pr(M|F) = .12/.1625 = .738.
b) A tax return is not flagged by the computer. What is the chance that it contains mistakes, given that the computer did not flag it?
We want Pr(M|F^c) = Pr(M, F^c) / Pr(F^c).
Using the same tree as in part a, we find that Pr(M, F^c) = (.15)(.2) = .03. Pr(F^c) = (.15)(.2) + (.85)(.95) = .8375.
So, Pr(M|F^c) = .03/.8375 = .0358.
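The tree calculation in parts a and b is an application of Bayes' rule. The sketch below reproduces it step by step using the three probabilities given in the problem; the variable names are our own.

```python
from fractions import Fraction

p_m = Fraction(15, 100)             # Pr(M): return has a mistake
p_f_given_m = Fraction(80, 100)     # Pr(F | M): flagged, given a mistake
p_f_given_not_m = Fraction(5, 100)  # Pr(F | no mistake)

# Part a: Pr(M | F) = Pr(M, F) / Pr(F).
p_f = p_m * p_f_given_m + (1 - p_m) * p_f_given_not_m
p_m_given_f = p_m * p_f_given_m / p_f

# Part b: Pr(M | not F) = Pr(M, not F) / Pr(not F).
p_not_f = 1 - p_f
p_m_given_not_f = p_m * (1 - p_f_given_m) / p_not_f

print(float(p_f), round(float(p_m_given_f), 3))          # 0.1625 0.738
print(float(p_not_f), round(float(p_m_given_not_f), 4))  # 0.8375 0.0358
```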
29) Of the following three joint probability distributions, which one has covariance closest to zero? Circle the title of the correct
distribution. (Hint: You don’t need to calculate all three covariances!)
Distribution A
      y=1  y=2
x=1   .20  .60
x=2   .05  .15
Distribution B
      y=1  y=2
x=1   .20  .30
x=2   .30  .20
Distribution C
      y=1  y=2
x=1   .20  .40
x=2   .40  .00
Distribution A. This is because X is independent of Y in distribution A, and the covariance of two independent variables equals zero.
You can verify the independence by showing that, for example,
Pr(Y = 1 | X = 1) = .20/.80 = 1/4.
Pr(Y = 1 | X = 2) = .05/.20 = 1/4.
This problem also could be solved by computing the covariance for all three distributions. Correct calculations obtain zero for
Distribution A and non-zero values for the others.
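For readers who want to see the brute-force route, the sketch below computes Cov(X, Y) = E(XY) - E(X)E(Y) for all three joint distributions. The function name `covariance` and the dictionary layout are our own.

```python
def covariance(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y) for a joint pmf {(x, y): probability}."""
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y

distributions = {
    "A": {(1, 1): 0.20, (1, 2): 0.60, (2, 1): 0.05, (2, 2): 0.15},
    "B": {(1, 1): 0.20, (1, 2): 0.30, (2, 1): 0.30, (2, 2): 0.20},
    "C": {(1, 1): 0.20, (1, 2): 0.40, (2, 1): 0.40, (2, 2): 0.00},
}

covs = {name: round(covariance(j), 4) for name, j in distributions.items()}
print(covs)
```

Up to floating-point rounding, the covariances are 0 for A, -.05 for B, and -.16 for C.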
30) Circle true or false for each statement.
a) T If X is a continuous random variable with an exponential distribution, then Pr(X=5) = 0.
All continuous probability density functions have zero probability at any one point.
b) F If you compute a correlation by hand and get -1.26, there is a strong negative relationship between the two variables.
There must be a mistake. Correlations are bounded between -1 and 1.
c) T For any random variable, X, whether continuous or discrete, Var(X) = E(X^2) - E(X)^2.
d) F In a randomized comparative experiment with a large number of subjects, there is a good chance that the background
characteristics of the two treatment groups will be different.
Randomization balances background characteristics.
e) T In an observational study with a large number of subjects, there is a good chance that the background characteristics of the two
treatment groups will be different. This is not the case for a randomized experiment, which ensures balance across groups.
f) F For any two events A and B, Pr(A and B) = Pr(A)Pr(B).
This holds only when A and B are independent.