Practice problems for final exam

Statistics 103:
Practice Problems for Final Exam
This sheet contains practice problems for the final exam. It is longer than the
actual final so that you have extra problems. Other material in the text and
from lectures may appear on the final exam.
Questions 1 – 18 refer to a random sample of 500 heads of households. The data
are sampled from the households collected in the March 2000 Current Population
Survey. For this problem, we’ll assume that the data are a simple random sample
of 500 households from the entire U.S. population.
People who own their homes have to pay taxes on their property. Below is a
histogram that shows property taxes for all 500 households in the sample.
[Figure: histogram titled "Property tax for all households"; horizontal axis from 0 to 9000.]
1. True or False: More than 45% of these households have property taxes greater
than the mean.
2. Choose the value that you think is closest to the standard deviation of these
property taxes: 10, 100, 1000, 10000.
3. True or False: If we remove all the houses that have property taxes equal to
zero, the average of the remaining taxes would be larger than $1,000.
4. True or False: A normal curve can be used to determine the percentage of
houses paying above $500 with very good accuracy.
5. Estimate the percentage of households that pay more than $2000 in property
taxes. _____
Below is a box plot of property tax by marital status. The marital status 1 is
for a married householder; 5 is for a divorced householder; and, 7 is for a
single householder. There are 219 married people, 84 divorced people, and 108
single people. The other people in the sample who have other marital statuses
are not displayed in this graph.
[Figure: box plots from a one-way analysis of property tax (vertical axis, 0 to 9000) by marital status (1 = married, 5 = divorced, 7 = single).]
6. Order the three groups by median property tax, going from largest to
smallest. _________
7. True or false: The standard deviation of property taxes for these married
people is closer in value to the standard deviation for these single people than
it is to the standard deviation for these divorced people.
8. True or false: A larger percentage of these married people have property
taxes below 500 than do these divorced people.
9. Which of these three groups has the largest percentage of people not paying
any property tax?
10. True or False: The standard error of the sample average property tax for
divorced people is larger than the standard error for the sample average
property tax for married people.
Below is a box plot of property taxes for male and female household heads. Men
are coded with a 1, and women are coded with a 2.
[Figure: box plots from a one-way analysis of property tax (vertical axis, 0 to 9000) by sex (men = 1, women = 2).]
Means and Std Deviations
Level   Number   Mean      Std Dev
1       257      996.086   1507.10
2       242      754.942   1162.05
11. Are the assumptions for using confidence intervals or hypothesis tests
involving sample average of property taxes likely to hold? Explain your
reasoning.
12. Give a 95% confidence interval for the population difference in average
property taxes for male and female household heads. Assume the degrees of
freedom is 400.
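As a rough sketch of the arithmetic behind question 12, the snippet below builds an unpooled interval for the difference in means from the summary table. The function name `diff_ci` is hypothetical, and the multiplier 1.966 is an assumed table value for the 97.5th percentile of a t distribution with 400 degrees of freedom.

```python
import math

def diff_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Confidence interval for mean1 - mean2 using the unpooled standard error."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff, se, (diff - t_star * se, diff + t_star * se)

# Summary statistics from the table (level 1 = men, level 2 = women).
# t_star = 1.966 is an assumed table value for 95% confidence with df = 400.
diff, se, (lower, upper) = diff_ci(
    996.086, 1507.10, 257,
    754.942, 1162.05, 242,
    t_star=1.966,
)
print(f"difference = {diff:.3f}, SE = {se:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The same standard error can be reused for the test statistic in question 15, since both rely on the difference in sample averages divided by its SE.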
13. Based on the interval, do you think there is overwhelming evidence that the
population average property tax amounts differ?
14. Check all of the following that are true:
___ There is a 95% chance that the population difference in average property
taxes is between the two values you determined in 12.
___ If we pick two households at random so that one is headed by a male and the
other by a female, the difference in their property taxes will fall within the
upper and lower limits 95% of the time.
___ If we took another random sample of 500, then another, then another, and so
on, we’d expect 95% of the formed confidence intervals to contain the population
difference in average property taxes.
15. Test the null hypothesis that there is no difference in population average
property taxes between male and female household heads. State your null and
alternative hypotheses, the test statistic, the p-value, and your conclusions.
Consider a p-value near 0.05 to be small.
16. Check all of the following that are true.
____ The probability that the null hypothesis is true equals the p-value from
the previous part.
____ It may be the case that the results are due to chance, and our conclusion
from the hypothesis test is wrong.
____ The chance of getting a value of the test statistic as or more extreme
than what was observed, assuming the null hypothesis is true, equals the p-value.
Below are scatter plots of property tax, household income, number of people in
the household, and age of the household head.
[Figure: scatterplot matrix of property tax, household income, number of people in house, and age of household head.]
Questions 17 – 18 refer to the plot above.
17. The value of the correlation between household income and age of household
head is closest to which of the following values:
-.8, -.4, 0, .4, .8.
18. If you fit a regression between property tax (outcome) and income
(predictor), which of the following statements would be true. You can choose
more than one.
___ The slope of the line would be positive.
___ The intercept of the line would be greater than 1000.
___ The root MSE of the regression would be larger than the SD of property tax.
19. Does taking additional vitamin C help prevent the
common cold? Nobel Laureate Linus Pauling (1901 - 1994) performed a
randomized experiment to address this question and reported his
results in the Proceedings of the National Academy of Sciences.
Pauling randomly assigned 279 French skiers to be in one of two
groups: a group that took vitamin C supplements or a group that took a
placebo (a sugar pill). The numbers of people for each category are summarized
below:
                      Vitamin C   Placebo
Got a cold                17         31
Did not get a cold       122        109
Pretend that you are the consulting statistician for Linus Pauling (a
lofty honor indeed!)
i) Pauling seeks to know if there is evidence that the population incidence
rate of colds for people who take Vitamin C is less than the population
incidence rate of colds for people who take the placebo. What do
you tell him? State clearly and justify your null and alternative
hypotheses, the test statistic, the p-value, and conclusions.
ii) Asking people to take the sugar pills is expensive because you
have to buy sugar pills and distribute them to the skiers. Pauling requests
that the next experiment--Nobel Laureates always try to replicate
results--avoid the sugar pills to save resources. Instead, he suggests
the control group be randomly assigned to take nothing. Are you
willing to comply with Pauling's request? Explain why or why not.
iii) Suppose you could replicate this experiment with 1000 skiers—500 in each
treatment group. Approximate the standard error that you’d use in a 99%
confidence interval.
iv) Discuss the types of conclusions that you can draw from these data. That
is, what do the results suggest about the ability of Vitamin C to prevent colds?
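One possible way to set up the calculation for part i) is a two-proportion z-test with a pooled standard error. The sketch below assumes the normal approximation is adequate and uses the counts from the table; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Counts from the table: number with colds and group size.
colds_c, n_c = 17, 17 + 122   # vitamin C group
colds_p, n_p = 31, 31 + 109   # placebo group

p_c = colds_c / n_c
p_p = colds_p / n_p

# Pooled proportion under the null hypothesis of equal incidence rates.
p_pool = (colds_c + colds_p) / (n_c + n_p)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_p))

z = (p_c - p_p) / se_pooled
# One-sided p-value for the alternative "vitamin C rate < placebo rate".
p_value = NormalDist().cdf(z)

print(f"p_c = {p_c:.3f}, p_p = {p_p:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

For a confidence interval, as in part iii), the usual choice is an unpooled standard error built from each group's own sample proportion rather than `p_pool`.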
20. True or False
For each statement, if you think the statement is always true, just say it is
true. If you think the statement is always false or sometimes false, say it is
false and explain why or when it is false in two or fewer sentences.
i) Your research colleague calculates by hand a value of the correlation of
-0.95. He says this shows that there is a strong, negative linear association
between the two variables.
ii) When data are randomly sampled from the same population, a 95% confidence
interval constructed from a sample of 100 units should be narrower than a 95%
confidence interval constructed from a sample of 200 units.
iii) In a regression analysis, a non-random pattern in the plot of residuals
versus fitted values is consistent with the assumptions of the regression model.
iv) You perform a hypothesis test with a sample size of four units, and you do
not reject the null hypothesis. This statistical test provides conclusive
evidence against the alternative hypothesis.
v) A group of teachers attend a summer program designed to improve their foreign
language skills. The teachers take a foreign language test at the start of the
summer before the program begins. After the program ends, the teachers take
another language test of similar difficulty. Based on a matched pairs
hypothesis test, the average increase in scores is significantly greater than
zero (p-value = .002). These data demonstrate that the summer training program
improved the foreign language skills of the teachers.
vi) A professor is considering which of two exams to give. Scores on the first
exam follow a normal distribution with mean of 75 and standard deviation of 5.
Scores on the second exam follow a normal distribution with mean of 75 and
standard deviation of 10. She wants to pick the exam likely to result in a
relatively small number of people scoring below 60. She should pick the second
exam.
vii) A certain company employs 10,000 people. Fifty percent of these employees
are women. One hundred of these employees are in management positions. Of
these 100 managers, 35 are women. In a court case against the company, a
defense attorney argues that his client does not discriminate on the basis of
sex when hiring managers. As evidence, he says, "The chance that a randomly
selected employee is in management and a woman equals Pr(is a woman) * Pr(is in
management) = .50 * .01 = .005. The chance that a randomly selected employee is
in management and a man equals Pr(is a man) * Pr(is in management) = .50 * .01 =
.005. These are equal probabilities, so that there is no evidence of
discrimination." The defense attorney's calculations are valid.
21. A random sample of 100 is taken from a population with 10% minority and 90%
non-minority members.
i) True or False: The number of minorities in the sample will be around 10 with
an SE of around 3.
ii) True or False: There is about a 68% chance that the number of minorities in
the sample will be between 9% and 11%.
iii) True or false: The population has about 10% minority members with an SE
of around .3%.
iv) True or false: In a particular sample of 100, it would be nearly impossible
to see more than 16 minority members.
22. In economics, a standard national accounting identity for total production
is:
Y = C + I + G + X,
where Y = the total production, C = consumption, I = investment, G = government
expenditures, and X = net exports. Consider each of these random variables.
i) A colleague suggests that to find the expected total production, you should
add the expected values of consumption, investment, government expenditures, and
net exports. Is your colleague correct?
ii) The same colleague suggests that to find the variance of total production,
you should add the variances of consumption, investment, government
expenditures, and net exports. Is your colleague correct?
23. You have to decide whether or not to study hard for your Stats final. The
professor tells you that, in the past, 75% of the people who got As studied
hard, whereas 20% of the people who did not get As studied hard. Furthermore,
experience shows that about 40% of people get As on the final.
i) What is the probability of getting an A, given that you study hard?
ii) What is the probability of getting an A, given that you do not study hard?
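A short sketch of how Bayes' rule applies to question 23, using only the three probabilities stated in the problem. The variable names are illustrative.

```python
# Probabilities given in the problem.
p_a = 0.40                   # P(A)
p_study_given_a = 0.75       # P(study hard | A)
p_study_given_not_a = 0.20   # P(study hard | not A)

# Law of total probability: P(study hard).
p_study = p_study_given_a * p_a + p_study_given_not_a * (1 - p_a)

# Bayes' rule for each conditioning event.
p_a_given_study = p_study_given_a * p_a / p_study
p_a_given_no_study = (1 - p_study_given_a) * p_a / (1 - p_study)

print(f"P(A | study hard)        = {p_a_given_study:.3f}")
print(f"P(A | do not study hard) = {p_a_given_no_study:.3f}")
```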
24. The time (in hours) it takes to complete a three hour final exam follows
the following continuous probability density function:
\[ f(y) = \frac{1}{\ln(3)} \cdot \frac{1}{4 - y} \]
for \( 1 \le y \le 3 \), where ln(3) is the natural log of 3.
i) Find the probability that a person takes less than two hours to complete the
exam.
ii) Find the probability that a person takes between 2 and 2.5 hours.
iii) Given that a person takes more than two hours, what is the chance that he
or she will take less than 2.5 hours?
iv) Write the expression for the expected time it takes to complete the exam.
You don’t have to evaluate the expression.
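The probabilities in parts i) through iii) can be checked numerically. The sketch below assumes the density is f(y) = 1 / (ln(3) (4 - y)) on 1 <= y <= 3, integrates it in closed form to get a CDF, and compares against a simple midpoint-rule integral. The helper names `cdf` and `midpoint_integral` are hypothetical.

```python
import math

LN3 = math.log(3)

def density(y):
    """Assumed density f(y) = 1 / (ln(3) * (4 - y)) on 1 <= y <= 3."""
    return 1 / (LN3 * (4 - y))

def cdf(y):
    """P(Y <= y), from integrating the density from 1 to y."""
    return (LN3 - math.log(4 - y)) / LN3

def midpoint_integral(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

p_less_2 = cdf(2)                      # part i
p_2_to_25 = cdf(2.5) - cdf(2)          # part ii
p_cond = p_2_to_25 / (1 - cdf(2))      # part iii: P(Y < 2.5 | Y > 2)

# Sanity checks: the density integrates to 1, and the closed-form CDF
# agrees with direct numerical integration.
total = midpoint_integral(density, 1, 3)
check = midpoint_integral(density, 1, 2)
print(p_less_2, p_2_to_25, p_cond)
```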
25. Suppose that the joint distribution of the number of servings of vegetables
(Y) and the number of servings of fruit (X) that the typical Duke student gets
per day is described by the following joint distribution.
                  X
           x=0    x=1    x=2
Y   y=0    0.10   0.15   0.05
    y=1    0.30   0.20   0
    y=2    0.20   0      0
i) What is the probability that a randomly selected Duke student eats at least
one serving of fruit per day?
ii) What is the expected value of the number of servings of vegetables per day
for Duke students?
iii) Among Duke students who eat one serving of fruit per day, how many
servings of vegetables are they expected to get?
iv) What is the standard deviation of the number of vegetables?
v) What is the covariance between X and Y?
vi) Just for practice, suppose you make the function:
T = 1.5Y - .2X.
What are the expected value and variance of T?
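The quantities in question 25 all come from sums over the joint table. The sketch below assumes the rows of the table are y = 0, 1, 2 and the columns are x = 0, 1, 2; the helper `expect` is a hypothetical name.

```python
# Joint probabilities keyed by (x, y): x = fruit servings, y = vegetable servings.
joint = {
    (0, 0): 0.10, (1, 0): 0.15, (2, 0): 0.05,
    (0, 1): 0.30, (1, 1): 0.20, (2, 1): 0.00,
    (0, 2): 0.20, (1, 2): 0.00, (2, 2): 0.00,
}

def expect(g):
    """Expected value of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x**2) - e_x**2
var_y = expect(lambda x, y: y**2) - e_y**2
cov_xy = expect(lambda x, y: x * y) - e_x * e_y

p_fruit = expect(lambda x, y: 1 if x >= 1 else 0)   # P(X >= 1)
p_x1 = expect(lambda x, y: 1 if x == 1 else 0)      # P(X = 1)
e_y_given_x1 = expect(lambda x, y: y if x == 1 else 0) / p_x1

# T = 1.5Y - 0.2X, using the rules for linear combinations.
e_t = 1.5 * e_y - 0.2 * e_x
var_t = 1.5**2 * var_y + 0.2**2 * var_x - 2 * 1.5 * 0.2 * cov_xy
print(e_y, var_y**0.5, cov_xy, e_t, var_t)
```

Note the minus sign on the covariance term: it comes from the negative coefficient on X in T.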
26. You are offered the following game. You roll a fair, six-sided die. For
each roll less than three (i.e., one or two), you get $10. For each roll more
than two (i.e., three through six), you have to pay $2 times the number of rolls
(e.g., if you get a 3-6 on the first roll, you pay $2. If you get a 3-6 on the
second roll, you pay $4. If you get a 3-6 on the third roll, you pay $6.)
i) Suppose you roll the die three times. What is the probability distribution
of your net earnings?
ii) What are the expected value and variance of your earnings?
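One way to organize question 26 is to enumerate every win/lose sequence. The sketch below assumes a particular reading of the rules: a win on any roll pays $10, and a loss on roll k costs $2 times k. The function name `earnings_distribution` is hypothetical, and exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 6)    # roll a 1 or 2
P_LOSE = Fraction(4, 6)   # roll a 3, 4, 5, or 6

def earnings_distribution(n_rolls):
    """Map each possible net earning to its probability.

    Assumed reading of the rules: a win on any roll pays $10, and a loss
    on roll k costs $2 * k.
    """
    dist = defaultdict(Fraction)
    for outcome in product([True, False], repeat=n_rolls):
        prob = Fraction(1)
        net = 0
        for k, win in enumerate(outcome, start=1):
            prob *= P_WIN if win else P_LOSE
            net += 10 if win else -2 * k
        dist[net] += prob
    return dict(dist)

dist = earnings_distribution(3)
mean = sum(v * p for v, p in dist.items())
variance = sum((v - mean) ** 2 * p for v, p in dist.items())

for value in sorted(dist):
    print(f"net = {value:>4}: probability {dist[value]}")
print(f"mean = {mean}, variance = {variance}")
```

Because the rolls are independent, the same mean and variance can also be found by adding the per-roll means and variances.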
27. Circle true or false. You don’t need to explain your choice.
(i) You take a random sample of 100 people and find out the average gratuity
they give when dining at restaurants (i.e., the tip) is 16% of their bill.
True False: The standard error for making a confidence interval for the
average tipping percentage in the population is \( \sqrt{.16(.84)/100} \).
(ii) You pick a random sample of 5 M&Ms from an enormous jar that has 30% blue
M&Ms.
True False: Since \( z = \frac{.4 - .3}{\sqrt{.3(.7)/5}} = .488 \), the chance of getting less than 40%
blue M&Ms is about 68.7% (this is the area under the normal curve to the left of
.488).
28. Invest your money with me.
Yeah, I’ll take care of it.
The probability distribution for the monthly return rate (expressed as a
decimal, so that a -10% return equals -0.10) for a certain stock (call it “JR
stock”) can be well described by the following probability density function:
\[ f(x) = \frac{9x^2}{6} \]
for -1 < x < 1.
a) What is the probability that JR stock returns more than 20%?
b) What is the expected rate of return of JR stock?
c) What is the variance of the rate of return for JR stock?
d) The correlation between JR stock and the IBM stock is 0.45. IBM stock has
an expected rate of return of 7% and a standard deviation of 10%. What is the
standard deviation of a portfolio that has 75% IBM stock and 25% JR stock?
e) If you assume the return rates for all months are independent, what is the
chance that JR stock will have a positive monthly return in at least one of the
next twelve months?
f) If you assume the return rates for all months are independent, what is the
chance that JR stock will have a positive monthly return in at least 70 of the
next 120 months?
29. Weather predictions
In preparation for tenting next year, you buy a weather predictor that has the
following properties:
• On days when it is sunny, there is an 80% chance that the predictor says
sunny.
• On days when it is sunny, there is a 15% chance that the predictor says
cloudy.
• On days when it is sunny, there is a 5% chance that the predictor says
rain.
• On days when it rains, there is a 10% chance that the predictor says
sunny.
• On days when it rains, there is a 40% chance that the predictor says
cloudy.
• On days when it rains, there is a 50% chance that the predictor says rain.
• On days when it is cloudy, there is a 33% chance that the predictor says
sunny.
• On days when it is cloudy, there is a 34% chance the predictor says
cloudy.
• On days when it is cloudy, there is a 33% chance the predictor says rain.
In January in Durham, 40% of days are sunny, 30% are cloudy, and 30% have rain.
a) The predictor says sunny. What is the probability that the day will be
sunny?
b) The predictor says rain. What is the probability that the day will be cloudy
or sunny?
30. Show that \( \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) \).
31. Suppose that the variance of GRE scores for individuals who graduate from
private schools equals \( \sigma_1^2 \), and that the variance of GRE scores for individuals
who graduate from public schools equals \( \sigma_2^2 \). You are going to take a random
sample of \( n_1 \) individuals from private schools and \( n_2 \) individuals from public
schools. Show that \( \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma_1^2 / n_1 + \sigma_2^2 / n_2 \).
32. In the problem above, suppose that the population means in the two groups
both equal \( \mu \). You want to estimate \( \mu \). Two estimators are proposed:
(i) \( W = \bar{X}_1 \) and (ii) \( V = (\bar{X}_1 + \bar{X}_2) / 2 \).
a) Show that both W and V are unbiased estimators of \( \mu \).
b) Suppose that \( \sigma_1^2 = 1000 \) and \( \sigma_2^2 = 10000 \), and that the sample size in the first
group is 25. What is the smallest sample size needed in the second group so
that Var(V) < Var(W)?
33. Drinking and Driving
In many states a motorist is legally drunk or driving under the influence (DUI)
if his or her blood alcohol concentration is .10% or higher. When a suspected
DUI offender is pulled over, police often request a sobriety test called a
breathalyzer, in which the suspected offender breathes into a machine that
reports a blood alcohol level. Although the breathalyzers are remarkably
precise, they do exhibit some measurement error. Because of that variability,
the possibility exists that a driver's true blood alcohol concentration may be
under .10% even though the breathalyzer gives a reading over .10%.
Experience has shown that repeated breathalyzer measurements taken on the same
person produce a distribution of responses that can be described by a normal
distribution with mean equal to the person's true blood alcohol concentration
and standard deviation equal to .004%.
i) Suppose that a driver is stopped on his way home from a party. He has a true
blood alcohol concentration of .095%, barely below the legal limit. If he takes
the breathalyzer test, what are the chances that he will be incorrectly booked
on a DUI charge (i.e., his result will be above .10%)?
ii) Suppose a different driver is stopped on her way home and is wicked drunk
with a blood alcohol level of .15%. What are the chances that the breathalyzer
will indicate that she is not guilty of DUI?
iii) In one bad night, 9 people from the same party are pulled over by the
police and given breathalyzer tests. Let's assume that all of them have a blood
alcohol content of .10%. What is the probability that at least one of the people
will be booked on a DUI charge?
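The three breathalyzer calculations in question 33 are all normal-curve areas. The sketch below uses Python's standard-library `statistics.NormalDist`; it assumes a driver is booked exactly when the reading exceeds .10% and, for part iii), that the nine readings are independent. The helper name `reading_dist` is hypothetical.

```python
from statistics import NormalDist

LIMIT = 0.10   # legal limit, in percent
SD = 0.004     # measurement standard deviation, in percent

def reading_dist(true_level):
    """Distribution of breathalyzer readings for a given true level."""
    return NormalDist(mu=true_level, sigma=SD)

# Part i: true level .095%, reading comes out above the limit.
p_false_dui = 1 - reading_dist(0.095).cdf(LIMIT)

# Part ii: true level .15%, reading comes out below the limit.
p_missed_dui = reading_dist(0.15).cdf(LIMIT)

# Part iii: each of 9 drivers has true level .10%; readings assumed independent.
p_single = 1 - reading_dist(0.10).cdf(LIMIT)
p_at_least_one = 1 - (1 - p_single) ** 9

print(f"i) {p_false_dui:.4f}  ii) {p_missed_dui:.3g}  iii) {p_at_least_one:.4f}")
```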
34. Sex of children
Do certain families have a tendency to have babies of the same sex? Let's
assume that the sexes of babies are independent (e.g., the sex of the first-born
baby does not affect the probability the second-born baby is female), and that
the probability that a baby is born female is 0.5014, which is approximately the
current percentage. Assume that this probability is the same regardless of
family size (which is an assumption that is not necessarily true).
i) What is the chance that a family with 6 children has them born in alternating
sex order, with the oldest being a male?
ii) What is the chance that a family has 6 girls in a row?
iii) Given that the first three children are boys, what is the chance that the
next child will be a girl?
An article that analyzes the question of sex tendencies is in the winter 2001
issue of the magazine Chance.
35. You propose to use a Poisson distribution for the number of siblings that
a randomly selected person has. The probability distribution for the Poisson
is:
\[ \Pr(Y = y) = \frac{\lambda^y e^{-\lambda}}{y!}, \quad y = 0, 1, 2, 3, \ldots \]
You sample four people and obtain sibling counts of 0, 2, 1, and 4. Find the
maximum likelihood estimate of \( \lambda \).
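As a check on the calculus for question 35, the sketch below writes out the Poisson log-likelihood for the four sibling counts and confirms on a grid that it peaks at the sample mean. The grid of candidate values is an arbitrary choice made for illustration.

```python
import math

counts = [0, 2, 1, 4]   # sibling counts from the sample

def log_likelihood(lam, data):
    """Poisson log-likelihood: sum of y*ln(lam) - lam - ln(y!)."""
    return sum(y * math.log(lam) - lam - math.log(math.factorial(y)) for y in data)

# Setting the derivative sum(y)/lam - n to zero gives lam_hat = sample mean.
lam_hat = sum(counts) / len(counts)

# Sanity check on a grid of candidate values 0.01, 0.02, ..., 10.00.
grid = [i / 100 for i in range(1, 1001)]
best_on_grid = max(grid, key=lambda lam: log_likelihood(lam, counts))

print(f"lam_hat = {lam_hat}, best grid value = {best_on_grid}")
```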
36. Lotteries
The Pennsylvania Daily Number is a lottery game in which the state constructs a
three digit number by drawing a digit from 0 to 9 from each of three different
containers. For example, if the digits were drawn in the order 3, 6, 3, then
the winning number would be 363. In this problem, we focus only on the first
container. If the numbers are truly randomly selected, each value between 0 and
9 should be equally likely to occur.
To test if the digits are randomly selected, the frequencies of each digit in
the first container were collected for the 500 days between July 19, 1999, and
November 29, 2000. The frequencies are given below:
Digit       0    1    2    3    4    5    6    7    8    9
Frequency   47   50   55   46   53   39   55   55   44   56
To see if the digits are randomly selected, two analyses are proposed:
(i) Form the null hypothesis that the population average of the frequencies for
these draws equals 50, and the alternative hypothesis that the sample average
does not equal 50. To do this, a t-test is performed using the test statistic:
t = (sample average frequency – 50)/SE, where SE is the standard error of the 10
frequencies above.
(ii) Form the null hypothesis that the population percentages of each digit equal
0.1, and use a chi-squared goodness of fit test with the test statistic:
\( \chi^2 = \sum (\text{observed} - \text{expected})^2 / \text{expected} \)
a) Which of these two analyses would you use? Explain why you chose that analysis,
and what, if anything, is wrong with the other analysis.
b) For the t-test, the p-value equals 0.999. For the chi-squared test, the
p-value equals 0.733. Using the method you selected in part a, what does the
p-value tell you about whether the numbers are randomly selected?
c) I want you to change the frequencies for 8 and 9 above so that the p-value
for your analysis is much smaller than the one in part b. Write the frequencies
for 8 and 9 that you’d use to make this happen.
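To see how the two proposed test statistics in question 36 behave on these data, the sketch below computes both from the ten frequencies. It only computes the statistics; turning the chi-squared value into a p-value would need a chi-squared table with 9 degrees of freedom or a statistics package, which is not shown here.

```python
import math
from statistics import mean, stdev

observed = [47, 50, 55, 46, 53, 39, 55, 55, 44, 56]   # digits 0 through 9
n_draws = sum(observed)
expected = n_draws / len(observed)   # 50 per digit if all digits are equally likely

# Analysis (ii): chi-squared goodness-of-fit statistic, df = 10 - 1 = 9.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

# Analysis (i): t statistic built from the 10 frequencies.
# The frequencies sum to 500 no matter how the draws fall, so their
# average is 50 by construction.
se = stdev(observed) / math.sqrt(len(observed))
t_stat = (mean(observed) - 50) / se

print(f"chi-squared = {chi_sq:.2f}, t = {t_stat:.2f}")
```

Changing two of the frequencies while keeping the total at 500, as part c) asks, changes `chi_sq` but leaves the numerator of `t_stat` at zero.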
37. More chi-squared problems
a) Here’s a conceptual question on chi-squared tests. Fill in the following
table so that you get a p-value from the chi-squared test of independence near
1. Your table must have 40 men and 80 women. You must have at least one person
in each of the six cells.
              Favorite color
          Red    Blue   Green   Total
Male      ___    ___    ___     40
Female    ___    ___    ___     80
b) Following up on part a, fill in the following table so that you get a
p-value from the chi-squared test of independence that is near zero. Your table
must have 40 men and 80 women. You must have at least one person in each of the
six cells.
              Favorite color
          Red    Blue   Green   Total
Male      ___    ___    ___     40
Female    ___    ___    ___     80