Homework 10 Key

Stat 4220 homework
1) Santa noticed that a lot of people put candy canes on their trees. He wants to evaluate whether
the color of a candy cane is related to the type of Christmas tree. After randomly selecting 3054
houses Santa collected the following data. Answer all three questions at the bottom of the
paper.
Observations (rows: Type of Christmas Tree, columns: Color of Candy Cane)

Tree type    Red/White  Blue/White  Solid Red  Brown  Black with Green dots
Classic            298         201        132     97                     19
Flocked            301         210        162    101                     15
Artificial         300         203        148    112                     13
Aluminum           293         212        132    103                      2

Expected Counts (rows: Type of Christmas Tree, columns: Color of Candy Cane)

Tree type    Red/White  Blue/White  Solid Red  Brown  Black with Green dots
Classic            292         202        140    101                     12
Flocked            308         213        148    107                     13
Artificial         303           A        146    105                     12
Aluminum           290         201        139    100                     12

Partial Chi-Squared (rows: Type of Christmas Tree, columns: Color of Candy Cane)

Tree type    Red/White  Blue/White  Solid Red  Brown  Black with Green dots
Classic          0.142       0.005      0.502  0.160                      B
Flocked          0.157       0.054      1.267  0.304                  0.433
Artificial       0.027    Censored      0.032  0.475                  0.024
Aluminum         0.040       0.638      0.399  0.070                  8.241

Chi-Squared Value (meaning the sum of all the values in the completed table above): 17.30
What are the missing values for A and B?
A=209.88 to 210 depending on how you round
B=4.106
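As a check on A and B, here is a minimal Python sketch that recomputes them from the observed counts. It assumes the usual formulas (expected = row total * column total / grand total, partial chi-squared = (observed - expected)^2 / expected); the variable and function names are illustrative only.

```python
# Observed counts; rows are tree types, columns are candy cane colors.
colors = ["Red/White", "Blue/White", "Solid Red", "Brown", "Black with Green dots"]
observed = {
    "Classic":    [298, 201, 132,  97, 19],
    "Flocked":    [301, 210, 162, 101, 15],
    "Artificial": [300, 203, 148, 112, 13],
    "Aluminum":   [293, 212, 132, 103,  2],
}

row_totals = {tree: sum(counts) for tree, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(len(colors))]
grand_total = sum(row_totals.values())  # 3054 houses


def expected(tree, color):
    """Expected count for one cell if tree type and color are independent."""
    return row_totals[tree] * col_totals[colors.index(color)] / grand_total


def partial_chi2(tree, color):
    """Contribution of one cell to the chi-squared statistic."""
    e = expected(tree, color)
    o = observed[tree][colors.index(color)]
    return (o - e) ** 2 / e


A = expected("Artificial", "Blue/White")
B = partial_chi2("Classic", "Black with Green dots")
print(f"A = {A:.2f}, B = {B:.3f}")
```

Note that B uses the unrounded expected count (about 11.985), not the rounded 12 shown in the table.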
2) A journalist wants to estimate the proportion of students who are in debt. The current best
guess is that the proportion should be about 75%. She wants to get a 60% confidence interval
for the true proportion with a margin of error that is less than 0.013. How many students
should she survey?
.013=.84*sqrt(.75*(1-.75)/n)
n=783 (solving gives n=782.8, rounded up)
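The same calculation can be sketched in Python. This assumes the standard margin-of-error formula for a proportion, solved for n; `sample_size` is an illustrative helper name, and `statistics.NormalDist` (standard library) is used only to show where the table value 0.84 comes from.

```python
import math
from statistics import NormalDist

confidence = 0.60
p_guess = 0.75
margin = 0.013

# Two-sided critical value: 20% in each tail for a 60% interval.
z_exact = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
z_table = 0.84  # rounded table value used in the key


def sample_size(z, p, m):
    """Smallest whole n with z * sqrt(p * (1 - p) / n) <= m."""
    return math.ceil(z ** 2 * p * (1 - p) / m ** 2)


print(sample_size(z_table, p_guess, margin))
print(sample_size(z_exact, p_guess, margin))
```

With the unrounded critical value the answer comes out a few students larger, so the result is sensitive to how z is rounded.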
3) The Daily Stat Fact reports that over 10% of engineers get a job that is not engineering related. I
think the report is way off. I randomly sample 625 engineers and find that 49 of them got a job
that was not engineering related. Test whether the percentage reported really is too high.
H0:p≥.1
Ha:p<.1
α=0.05
z=(49/625-.1)/sqrt(.1*.9/625)=-1.8
p-value=.036
Reject
Our data does show the proportion of 10% is too high
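A short Python sketch of this one-proportion z test, assuming the standard error is computed from the null value p0 as in the key. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 625     # engineers sampled
x = 49      # engineers with a non-engineering job
p0 = 0.10   # proportion claimed by the report

p_hat = x / n
standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses p0, as in the key
z = (p_hat - p0) / standard_error

# Lower-tail p-value because Ha is p < 0.1.
p_value = NormalDist().cdf(z)

alpha = 0.05
reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```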
4) Harry Potter believes that he can tell if a person is a bad guy by listening to the background
music when they come near. To find out if this is the case, Harry records what type of music he
hears around 114 random people. Then Harry performs the Cruciatus curse to determine if the
person is a good guy or bad guy. Based on the following data, determine if the type of
background music is related to the person’s allegiance.
Observed counts (columns: Allegiance)

Background Music   Good Guys   Bad Guys
Ominous Music             45         13
Happy Music               38         18

Show all the steps of the hypothesis test, specifically using a χ2 test of independence!
H0: Allegiance is independent of music
Ha: Allegiance depends on music
α=0.05
Expected counts (columns: Allegiance)

Background Music   Good Guys   Bad Guys
Ominous Music           42.2       15.8
Happy Music             40.8       15.2

Partial chi-squared values (columns: Allegiance)

Background Music   Good Guys   Bad Guys
Ominous Music         .18196     .48717
Happy Music           .18845     .50457

Df=1
Chisquared = .18196+.48717+.18845+.50457 = 1.362
.20<p-value<.25
Fail to Reject
Our data does not show you can tell who is the bad guy based on the music
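The steps above can be sketched in Python using only the standard library. The statistic is the sum of the four partial values; the p-value line relies on a shortcut that holds for 1 degree of freedom only, so this is a sketch for 2x2 tables, not a general routine.

```python
import math

observed = [[45, 13],   # Ominous music: good guys, bad guys
            [38, 18]]   # Happy music:   good guys, bad guys

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: the chi-squared upper tail equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
```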
5) Katelyn has discovered that salt-licks from the Great Salt Lake contain trace amounts of arsenic, and that
the amount of arsenic is normally distributed. She asks four of her friends to buy a salt-lick and measure the
amount of arsenic. Here are their results:
Raul: 28 cc
Blaine: 44 cc
Madison: 32 cc
Leanne: 20 cc
Using their data find a 98% CI for the amount of arsenic in a salt-lick
Xbar = (28+44+32+20)/4=31
S=sqrt(((28-31)^2+(44-31)^2+(32-31)^2+(20-31)^2)/(4-1))=10
31 ± 4.541*10/sqrt(4) = (8.295, 53.705)
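A minimal Python sketch of the same interval. The critical value 4.541 (t table, 98% confidence, 3 degrees of freedom) is copied from the key and not computed here, since the standard library has no t distribution.

```python
import math
import statistics

arsenic_cc = [28, 44, 32, 20]   # Raul, Blaine, Madison, Leanne

n = len(arsenic_cc)
x_bar = statistics.mean(arsenic_cc)
s = statistics.stdev(arsenic_cc)   # sample standard deviation (n - 1 divisor)

# Critical value taken from the key, not computed.
t_star = 4.541

margin = t_star * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"mean = {x_bar}, s = {s}, 98% CI = ({lower:.3f}, {upper:.3f})")
```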
6) Suppose you are testing whether green runts cause cancer. You have a large group of people
who regularly eat runts, and a large group that never eat runts, you will mark which ones
develop cancer before they die. The Willy Wonka Candy Company is worried that if a link is
found to cancer that it would be devastating. They ask you to be extra cautious not to hurt the
company’s image unless you’re absolutely certain about the results.
Choose an α level besides 0.05 and explain why.
H0: p1=p2 (runts do not cause cancer)
Ha: p1 ≠ p2 (runts do cause cancer)
Type 1: We say runts do cause cancer, but they do not
Type 2: We say runts don’t cause cancer when in fact they do
We were asked to avoid type 1 errors, so we should lower alpha
(A small alpha is the expected answer, but I suppose students could argue that we want to guard against cancer
and let the Willy Wonka company be torqued.)
7) Some buildings in Laramie have been having problems with insects nesting inside the walls. A
supervisor has suggested that it could be based on whether the building has iron supports or
steel supports. Based on the data below, use any method you like to test whether that could be
true.
        Insect problems   No insect problems   Total
Iron                120                  250     370
Steel               140                  230     370
Total               260                  480     740

H0: the metal type is independent of the insect problem
HA: the metal type is dependent on the insect problem
Alpha=0.05
If you do a proportions test the z-score should be ±1.54
If you do an independence test the chi-squared should be 2.37
P-value=.1236 (proportions test) or .1<p-value<.15 (chi-squared table)
Fail to Reject
Our data does not show that the metal type is related to the insect problem.
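Both routes can be checked with a short Python sketch. It assumes a pooled two-proportion z test and an uncorrected chi-squared statistic, which for a 2x2 table is simply z squared; the dictionary names are illustrative.

```python
import math
from statistics import NormalDist

insects = {"Iron": 120, "Steel": 140}     # buildings with insect problems
buildings = {"Iron": 370, "Steel": 370}   # buildings of each support type

p_iron = insects["Iron"] / buildings["Iron"]
p_steel = insects["Steel"] / buildings["Steel"]
p_pooled = sum(insects.values()) / sum(buildings.values())

standard_error = math.sqrt(
    p_pooled * (1 - p_pooled) * (1 / buildings["Iron"] + 1 / buildings["Steel"])
)
z = (p_iron - p_steel) / standard_error

# Two-sided p-value, since Ha only says the two support types differ.
p_value = 2 * NormalDist().cdf(-abs(z))

# For a 2x2 table, the chi-squared statistic (no continuity correction) is z squared.
chi2 = z ** 2
print(f"z = {z:.2f}, chi2 = {chi2:.2f}, p-value = {p_value:.4f}")
```

The sign of z depends only on which group is listed first, which is why the key gives ±1.54.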
8) Donald Trump just finished studying 96 businesses and has classified them according to the
amount of risk the companies take (high, medium, or low), and what type of company (large,
small, personal, or not-for-profit). His final conclusion is that the amount of risk a company
takes does not depend on the type of company.
Bill Gates says that is so not true. He says different types of companies have different types of
risk levels. To keep the two from arguing you decide to compute the χ2 Test of Independence.
When you hand the paper to Donald and Bill, they fight over it and tear the corner of the report
(see the picture below).
Determine statistically who you would say the data supports.
As a hint, the partial χ2 values that you can see add up to 13.19, and the assumptions are met for
the test.
H0: company risk level is independent of size
Ha: company risk level depends on size
Alpha = 0.05
High risk non-profit chi-squared value is (8-5)^2/5=1.8
High risk large corporation expected value is (10+6+1)*30/(32+34+30)=5.3125
High risk large corporation chi-squared value is (5.3125-1)^2/5.3125=3.5
Chi-squared = 13.19+1.8+3.5=18.49
.015<p-value<.02
Reject
We can say the business risk level depends on the size (Bill Gates was right)
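The arithmetic for the torn corner can be retraced in Python. All numbers below are taken from the key's own formulas; the variable names (and the reading of 10+6+1 as the large-corporation total and 30 as the high-risk total) are assumptions, since the report picture is not reproduced here.

```python
visible_partial_sum = 13.19   # hint given in the problem

# High risk, not-for-profit: observed 8, expected 5.
non_profit_partial = (8 - 5) ** 2 / 5

# High risk, large corporation: observed 1, expected count from the margins.
large_total = 10 + 6 + 1      # assumed: large corporations across all risk levels
high_risk_total = 30          # assumed: all high-risk companies
grand_total = 32 + 34 + 30    # 96 businesses
large_expected = large_total * high_risk_total / grand_total
large_partial = (1 - large_expected) ** 2 / large_expected

chi2 = visible_partial_sum + non_profit_partial + large_partial
print(f"expected = {large_expected}, partials = {non_profit_partial:.1f} and "
      f"{large_partial:.2f}, chi2 = {chi2:.2f}")
```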
9) Doctor Ann randomly selects 40 people to crack their knuckles daily, and 40 people to never
crack their knuckles. Doctor Bob selects 40 pairs of twins and one twin will crack their knuckles
daily and the other not. After 10 years they measure the amount of arthritis. Who will have a
more powerful test?
a) Dr. Ann’s test is more powerful because Doctor Bob’s 80 subjects are only 40 pairs of twins
so his results will be similar to having a smaller sample size.
b) Dr. Bob’s test is more powerful because taking the difference between twins will take out
variability due to the genetics of each subject
c) Dr. Bob’s test is more powerful because it is very unlikely that two different sets of twins will
be related to each other which increases the chance that they were selected randomly
d) Dr. Ann’s test is more powerful because the people who do not crack their knuckles will act
as a control group in the experiment where they are not twins
e) Dr. Ann’s test is more powerful because the subjects do know which treatment they are
getting beforehand and it will reduce the risk of a placebo effect
10) Dr. Carl asks 1000 people to rate whether they “crack their knuckles frequently”, “crack their
knuckles sometimes”, and “almost never crack their knuckles”. Then he evaluates if they have
arthritis in their hands. What kind of test should Dr. Carl run to analyze this data assuming the
conditions are met?
A) 2 proportions z test
B) One mean t-test
C) Regression
D) Matched Pairs
E) Chi-squared
11) A genetics test is attempting to see if there is a relationship between nose type (Long,
Medium, and Flat) and diet (Poor, Somewhat Healthy, and Healthy). Below is the data
and output from a computerized Χ2 program.
OBS    Poor   Some   Healthy
Long     10     12        15
Med      15     16         9
Flat      8      2         4

EXP    Poor   Some   Healthy
Long   13.4   12.2      11.4
Med    14.5   13.2      12.3
Flat    5.1    4.6       4.3

χ2     Poor   Some   Healthy
Long   0.87  0.003      1.15
Med    0.02   0.60      0.89
Flat   1.68   1.48      0.02

Test whether there is a relationship between nose type and diet.
Two cells (Flat/Some and Flat/Healthy) have expected counts below 5 (4.6 and 4.3), so the assumptions are not met and this
cannot be done
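A minimal Python sketch of that assumption check, recomputing the expected counts from the OBS table and listing every cell that falls below 5. The cutoff of 5 is the rule this key applies; the names are illustrative.

```python
observed = {
    "Long": {"Poor": 10, "Some": 12, "Healthy": 15},
    "Med":  {"Poor": 15, "Some": 16, "Healthy": 9},
    "Flat": {"Poor": 8,  "Some": 2,  "Healthy": 4},
}

row_totals = {nose: sum(cells.values()) for nose, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for diet, count in cells.items():
        col_totals[diet] = col_totals.get(diet, 0) + count
n = sum(row_totals.values())

# Cells whose expected count under independence is below 5.
too_small = [
    (nose, diet, round(row_totals[nose] * col_totals[diet] / n, 1))
    for nose in observed
    for diet in col_totals
    if row_totals[nose] * col_totals[diet] / n < 5
]
print(too_small)
```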
12) A test to determine if major is related to social skills looks at 4 different majors and whether the
student has social skills. The test has a p-value of 0.55. What is the conclusion?
A) Because the number of majors is less than 5, no conclusions can be drawn.
B) The p-value is less than α, so there is evidence to suggest a link between major and social skills.
C) The p-value is greater than α so there is not evidence to suggest a link between major and social skills.
D) The p-value is greater than α, so there is evidence to suggest a link between major and social skills.
E) The p-value cannot be greater than ½, so an error was made
13) The NYTimes did a study on the proportion of football players that have sustained a head injury.
Their 95% confidence interval based on 109 random NFL players was (0.571, 0.629).
Check which of the following (if any) are true.
X
X
There is a 95% probability that the proportion is between 0.571 and 0.629
95% of the time the true proportion will be between 0.571 and 0.629
This sample was not large enough to be able to use the normal distribution by the Central Limit Theorem
95% of all confidence intervals from 109 NFL players will correctly contain the true proportion
The true proportion is between 0.571 and 0.629 with 95% confidence
For a new CI there is a 95% probability of the sample proportion being between 0.571 and 0.629
14) A sociologist wants to show that the food you eat actually changes your perception of how other
people are feeling. She gathered 1000 volunteers, and randomly selected what food they would
eat. Then she asked them to look at a photograph (of a person showing no emotion) and asked
them to mark what emotion they thought the person was experiencing. The data is shown below.
Test at the 1% significance level (with all 7 steps of a hypothesis test) if the food they ate is related to
the emotion chosen.
            Chocolate  Oranges  Breadstick  Salad  Steak  Total
Happy              22       25          33     31     11    122
Angry              16       32          29     46      8    131
Sad                30       48          32     65      7    182
Surprised           9        8          11     21      6     55
Sleepy             39       66          51     98      8    262
Scared             44       65          25    102     12    248
Total             160      244         181    363     52   1000

The expected count for the surprised steak group is 52*55/1000 = 2.86, which is less than 5, so this
problem cannot be done.
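Since expected counts depend only on the margins, the check can be sketched in Python from the row and column totals alone. The dictionary names are illustrative.

```python
food_totals = {"Chocolate": 160, "Oranges": 244, "Breadstick": 181,
               "Salad": 363, "Steak": 52}
emotion_totals = {"Happy": 122, "Angry": 131, "Sad": 182,
                  "Surprised": 55, "Sleepy": 262, "Scared": 248}
n = 1000

# Expected count for every (emotion, food) cell under independence.
expected = {
    (emotion, food): e_total * f_total / n
    for emotion, e_total in emotion_totals.items()
    for food, f_total in food_totals.items()
}

smallest_cell = min(expected, key=expected.get)
cells_below_5 = [cell for cell, e in expected.items() if e < 5]
print(smallest_cell, expected[smallest_cell], cells_below_5)
```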
15) Google wants to know if the type of browser you use determines what you do on the internet.
They installed spyware on 400 random computers and got the following data
              Firefox   IE   Chrome   Total
Social Media       31   32       18      81
Games              49   57       15     121
Work               80   70       48     198
Total             160  159       81     400

Test whether what you do on the computer is related to the type of browser you use.
H0: Browser is independent of internet use
HA: Browser use is dependent on internet use
α=0.05
Expected counts

              Firefox       IE   Chrome   Total
Social Media     32.4   32.197   16.402      81
Games            48.4   48.097   24.502     121
Work             79.2    78.70    40.09     198
Total             160      159       81     400

Partial chi-squared values

               Firefox         IE     Chrome
Social Media  0.060494   0.001211   0.155586
Games         0.007438   1.647788   3.685236
Work          0.008081   0.962798   1.558524

Chisq=8.09
0.05 <p-value < 0.10
Fail to Reject
We cannot say internet use depends on the browser
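The whole test can be reproduced with a short Python sketch. It uses only the standard library; the p-value line relies on a closed form that holds for 4 degrees of freedom only, so it is a sketch for a 3x3 table, not a general chi-squared routine.

```python
import math

browsers = ["Firefox", "IE", "Chrome"]
observed = {
    "Social Media": [31, 32, 18],
    "Games":        [49, 57, 15],
    "Work":         [80, 70, 48],
}

row_totals = {use: sum(counts) for use, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(len(browsers))]
n = sum(row_totals.values())

chi2 = 0.0
for use, counts in observed.items():
    for j, o in enumerate(counts):
        e = row_totals[use] * col_totals[j] / n   # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(browsers) - 1)

# For df = 4 only: the chi-squared upper tail is exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
```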
16) Nick knows the UW football team is better than CSU, but he wants to compare their average
rushing yards. He is fairly certain that the rushing yards are normally distributed with the same
variance for both teams.
He randomly selects 11 UW games, and the average rushing yards were 110.
He randomly selects 7 CSU games, and the average rushing yards were 93.
Can Nick say with 99% confidence that UW has more rushing yards than CSU?
Standard deviation of one game for UW: 16 yards
Standard deviation of one game for CSU: 13 yards
Pooled standard deviation for one game: 15 yards
Matched Pairs standard deviation for one game: 7.5 yards
Average standard deviation for both teams: 14.5 yards
H0: μUW ≤ μCSU
HA: μUW > μCSU
α = .01
The rushing yards are normally distributed
There are (at least) 3 different ways of doing the next steps, but you must use the pooled standard
deviation for all of them because it said the variance was the same for both teams
METHOD I (hypothesis test)
t16 = (110 - 93) / sqrt(15^2/11 + 15^2/7) = 2.344
.01< p-value <.02
Since p-value> α fail to reject the null
METHOD II (confidence interval)
99% CI for μUW - μCSU : (110-93) ± 2.921*sqrt(15^2/11 + 15^2/7)
= (-4.184, 38.184)
Since 0 is in the confidence interval, we fail to reject the null
METHOD III (pair of confidence intervals)
For UW : 110 ± 2.921*sqrt(15^2/11) = (96.789, 123.211)
For CSU : 93 ± 2.921*sqrt(15^2/7) = (76.439, 109.561)
Since the confidence intervals overlap, we fail to reject the null
Conclude that the claim is not supported: there is not enough evidence to suggest that UW has more rushing
yards on average than CSU.
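Methods I and II can be retraced with a minimal Python sketch. The pooled standard deviation of 15 comes from the problem, and the critical value 2.921 (t table, 16 degrees of freedom) is copied from the key and not computed, since the standard library has no t distribution.

```python
import math

n_uw, mean_uw = 11, 110
n_csu, mean_csu = 7, 93
s_pooled = 15   # pooled standard deviation given in the problem

df = n_uw + n_csu - 2
standard_error = s_pooled * math.sqrt(1 / n_uw + 1 / n_csu)

# Method I: test statistic for H0: mu_UW <= mu_CSU.
t = (mean_uw - mean_csu) / standard_error

# Method II: interval for the difference, using the key's critical value.
t_star = 2.921
margin = t_star * standard_error
ci = (mean_uw - mean_csu - margin, mean_uw - mean_csu + margin)

print(f"df = {df}, t = {t:.3f}, 99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Writing the standard error as s_pooled * sqrt(1/n1 + 1/n2) is the same quantity as sqrt(15^2/11 + 15^2/7) in the key.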