Stat 4220 homework

1) Santa noticed that a lot of people put candy canes on their trees. He wants to evaluate whether the color of a candy cane is related to the type of Christmas tree. After randomly selecting 3054 houses Santa collected the following data. Answer all three questions at the bottom of the paper.

Observations
                                  Type of Christmas Tree
Color of Candy Cane       Classic   Flocked   Artificial   Aluminum
Red/White                   298       301        300          293
Blue/White                  201       210        203          212
Solid Red                   132       162        148          132
Brown                        97       101        112          103
Black with Green dots        19        15         13            2

Expected Counts
                                  Type of Christmas Tree
Color of Candy Cane       Classic   Flocked   Artificial   Aluminum
Red/White                   292       308        303          290
Blue/White                  202       213         A           201
Solid Red                   140       148        146          139
Brown                       101       107        105          100
Black with Green dots        12        13         12           12

Partial Chi-Squared
                                  Type of Christmas Tree
Color of Candy Cane       Classic   Flocked   Artificial   Aluminum
Red/White                  0.142     0.157      0.027        0.040
Blue/White                 0.005     0.054     Censored      0.638
Solid Red                  0.502     1.267      0.032        0.399
Brown                      0.160     0.304      0.475        0.070
Black with Green dots        B       0.433      0.024        8.241

Chi-Squared Value (meaning the sum of all the values in the completed table above): 17.30

What are the missing values for A and B?
A = (row total)*(column total)/n = 826*776/3054 = 209.88 (209.88 to 210 depending on how you round)
B = (19-11.985)^2/11.985 = 4.106, where 11.985 = 49*747/3054 is the unrounded expected count

2) A journalist wants to estimate the proportion of students who are in debt. The current best guess is that the proportion should be about 75%. She wants to get a 60% confidence interval for the true proportion with a margin of error that is less than 0.013. How many students should she survey?
0.013 = 0.84*sqrt(0.75*(1-0.75)/n)
n = 783

3) The Daily Stat Fact reports that over 10% of engineers get a job that is not engineering related. I think the report is way off. I randomly sample 625 engineers and find that 49 of them got a job that was not engineering related.
Test whether the percentage reported really is too high.
H0: p ≥ 0.1
Ha: p < 0.1
α = 0.05
z = (49/625 - 0.1)/sqrt(0.1*0.9/625) = -1.8
p-value = 0.036
Reject
Our data does show the proportion of 10% is too high.

4) Harry Potter believes that he can tell if a person is a bad guy by listening to the background music when they come near. To find out if this is the case, Harry records what type of music he hears around 114 random people. Then Harry performs the Cruciatus curse to determine if the person is a good guy or bad guy. Based on the following data, determine if the type of background music is related to the person’s allegiance.

Observed
                          Allegiance
Background Music     Good Guys   Bad Guys
Ominous Music           45          13
Happy Music             38          18

Show all the steps of the hypothesis test, using specifically a Χ2 test of independence!
H0: Allegiance is independent of music
Ha: Allegiance depends on music
α = 0.05

Expected
                          Allegiance
Background Music     Good Guys   Bad Guys
Ominous Music          42.2        15.8
Happy Music            40.8        15.2

Partial Chi-Squared
                          Allegiance
Background Music     Good Guys   Bad Guys
Ominous Music         .18196      .48717
Happy Music           .18845      .50457

Df = 1
Chi-squared = .18196 + .48717 + .18845 + .50457 = 1.362 (the equivalent z-score is sqrt(1.362) = 1.167)
.20 < p-value < .25
Fail to Reject
Our data does not show you can tell who is the bad guy based on the music.

5) Katelyn has discovered that salt-licks from the Great Salt Lake are normally distributed, but they contain trace amounts of arsenic. She asks four of her friends to buy a salt-lick and measure the amount of arsenic. Here are their results:
Raul: 28 cc
Blaine: 44 cc
Madison: 32 cc
Leanne: 20 cc
Using their data, find a 98% CI for the amount of arsenic in a salt-lick.
Xbar = (28+44+32+20)/4 = 31
S = sqrt(((28-31)^2+(44-31)^2+(32-31)^2+(20-31)^2)/(4-1)) = 10
31 ± 4.541*10/sqrt(4) = (8.295, 53.705)

6) Suppose you are testing whether green runts cause cancer. You have a large group of people who regularly eat runts, and a large group that never eat runts, and you will mark which ones develop cancer before they die. The Willy Wonka Candy Company is worried that if a link to cancer is found it would be devastating.
They ask you to be extra cautious not to hurt the company’s image unless you’re absolutely certain about the results. Choose an α level besides 0.05 and explain why.
H0: p1 = p2 (runts do not cause cancer)
Ha: p1 ≠ p2 (runts do cause cancer)
Type 1: We say runts do cause cancer, but they do not
Type 2: We say runts don’t cause cancer when in fact they do
We were asked to avoid type 1 errors, so we should lower alpha (small alpha, but I suppose students could say we want to guard against cancer and the Willy Wonka company be torqued).

7) Some buildings in Laramie have been having problems with insects nesting inside the walls. A supervisor has suggested that it could be based on whether the building has iron supports or steel supports. Based on the data below, use any method you like to test whether that could be true.

           Insect problems   No insect problems   Total
Iron            120                 250            370
Steel           140                 230            370
Total           260                 480            740

H0: the metal type is independent of the insect problem
HA: the metal type is dependent on the insect problem
Alpha = 0.05
If you do a proportions test the z-score should be ±1.54
If you do an independence test the chi-squared should be 2.37
P-value = .1236 or .1 < p-value < .15
Fail to Reject
Our data does not show that the metal type is related to the insect problem.

8) Donald Trump just finished studying 96 businesses, and has classified them according to the amount of risk the companies take (high, medium, or low), and what type of company (large, small, personal, or not-for-profit). His final conclusion is that the amount of risk a company takes does not depend on the type of company. Bill Gates says that is so not true. He says different types of companies have different types of risk levels. To keep the two from arguing you decide to compute the χ2 Test of Independence. When you hand the paper to Donald and Bill, they fight over it and tear the corner of the report (see the picture below). Determine statistically who you would say the data supports.
As a hint, the partial χ2 values that you can see add up to 13.19, and the assumptions are met for the test.
H0: company risk level is independent of size
Ha: company risk level depends on size
Alpha = 0.05
High risk non-profit chi-squared value is (8-5)^2/5 = 1.8
High risk large corporation expected value is (10+6+1)*30/(32+34+30) = 5.3125
High risk large corporation chi-squared value is (5.3125-1)^2/5.3125 = 3.5
Chi-squared = 13.19 + 1.8 + 3.5 = 18.49
.015 < p-value < .02
Reject
We can say the business risk level depends on the size (Bill Gates was right).

9) Doctor Ann randomly selects 40 people to crack their knuckles daily, and 40 people to never crack their knuckles. Doctor Bob selects 40 pairs of twins and one twin will crack their knuckles daily and the other not. After 10 years they measure the amount of arthritis. Who will have a more powerful test?
a) Dr. Ann’s test is more powerful because Doctor Bob’s 80 subjects are only 40 pairs of twins so his results will be similar to having a smaller sample size.
b) Dr. Bob’s test is more powerful because taking the difference between twins will take out variability due to the genetics of each subject
c) Dr. Bob’s test is more powerful because it is very unlikely that two different sets of twins will be related to each other which increases the chance that they were selected randomly
d) Dr. Ann’s test is more powerful because the people who do not crack their knuckles will act as a control group in the experiment where they are not twins
e) Dr. Ann’s test is more powerful because the subjects do know which treatment they are getting beforehand and it will reduce the risk of a placebo effect

10) Dr. Carl asks 1000 people to rate whether they “crack their knuckles frequently”, “crack their knuckles sometimes”, or “almost never crack their knuckles”. Then he evaluates if they have arthritis in their hands. What kind of test should Dr. Carl run to analyze this data assuming the conditions are met?
A) 2 proportions z test
B) One mean t-test
C) Regression
D) Matched Pairs
E) Chi-squared

11) A genetics test is attempting to see if there is a relationship between nose type (Long, Medium, and Flat) and diet (Poor, Somewhat Healthy, and Healthy). Below is the data and output from a computerized Χ2 program.

OBS      Poor    Some    Healthy
Long      10      12       15
Med       15      16        9
Flat       8       2        4

EXP      Poor    Some    Healthy
Long     13.4    12.2     11.4
Med      14.5    13.2     12.3
Flat      5.1     4.6      4.3

Χ2       Poor    Some    Healthy
Long     0.87    0.003    1.15
Med      0.02    0.60     0.89
Flat     1.68    1.48     0.02

Test whether there is a relationship between nose type and diet.
There are two categories (Flat Some and Flat Healthy) which have expected counts below 5, so this cannot be done.

12) A test to determine if major is related to social skills looks at 4 different majors and whether the student has social skills. The test has a p-value of 0.55. What is the conclusion?
A) Because the number of majors is less than 5, no conclusions can be drawn.
B) The p-value is less than α, so there is evidence to suggest a link between major and social skills.
C) The p-value is greater than α, so there is not evidence to suggest a link between major and social skills.
D) The p-value is greater than α, so there is evidence to suggest a link between major and social skills.
E) The p-value cannot be greater than ½, so an error was made.

13) The NYTimes did a study on the proportion of football players that have sustained a head injury. Their 95% confidence interval based on 109 random NFL players was (0.571, 0.629). Check which of the following (if any) are true.
X X
There is a 95% probability that the proportion is between 0.571 and 0.629
95% of the time the true proportion will be between 0.571 and 0.629
This sample was not large enough to be able to use the normal distribution by the Central Limit Theorem
95% of all confidence intervals from 109 NFL players will correctly contain the true proportion
The true proportion is between 0.571 and 0.629 with 95% confidence
For a new CI there is a 95% probability of the sample proportion being between 0.571 and 0.629

14) A sociologist wants to show that the food you eat actually changes your perception of how other people are feeling. She gathered 1000 volunteers, and randomly selected what food they would eat. Then she asked them to look at a photograph (of a person showing no emotion) and asked them to mark what emotion they thought the person was experiencing. The data is shown below. Test at the 1% significance level (with all 7 steps of a hypothesis test) if the food they ate is related to the emotion chosen.

             Chocolate   Oranges   Breadstick   Salad   Steak   Total
Happy            22         25         33         31      11     122
Angry            16         32         29         46       8     131
Sad              30         48         32         65       7     182
Surprised         9          8         11         21       6      55
Sleepy           39         66         51         98       8     262
Scared           44         65         25        102      12     248
Total           160        244        181        363      52    1000

The expected value for the surprised steak group is 52*55/1000 = 2.86, which is not greater than 5, so this problem cannot be done.

15) Google wants to know if the type of browser you use determines what you do on the internet. They installed spyware on 400 random computers and got the following data.

                Firefox    IE    Chrome   Total
Social Media       31      32      18       81
Games              49      57      15      121
Work               80      70      48      198
Total             160     159      81      400

Test whether what you do on the computer is related to the type of browser you use.
H0: Browser is independent of internet use
HA: Browser use is dependent on internet use
α = 0.05

Expected
            Social Media    Games     Work     Total
Firefox        32.4         48.4      79.2      160
IE             32.197       48.097    78.70     159
Chrome         16.402       24.502    40.09      81
Total          81           121       198       400

Partial Chi-Squared
                 Firefox        IE            Chrome
Social Media   0.060493827   0.001211468   0.15558642
Games          0.007438      1.647788      3.685236
Work           0.008081      0.962798      1.558524

df = (3-1)*(3-1) = 4
Chisq = 8.09
0.05 < p-value < 0.10
Fail to Reject
We cannot say internet use depends on the browser.

16) Nick knows the UW football team is better than CSU, but he wants to compare their average rushing yards. He is fairly certain that the rushing yards are normally distributed with the same variance for both teams. He randomly selects 11 UW games, and the average rushing yards were 110. He randomly selects 7 CSU games, and the average rushing yards were 93. Can Nick say with 99% confidence that UW has more rushing yards than CSU?
Standard deviation of one game for UW: 16 yards
Standard deviation of one game for CSU: 13 yards
Pooled standard deviation for one game: 15 yards
Matched Pairs standard deviation for one game: 7.5 yards
Average standard deviation for both teams: 14.5 yards

H0: μUW ≤ μCSU
HA: μUW > μCSU
α = .01
The rushing yards are normally distributed.
There are (at least) 3 different ways of doing the next steps, but you must use the pooled standard deviation for all of them because the problem says the variance is the same for both teams.

METHOD I (hypothesis test)
t16 = (110 - 93)/sqrt(15^2/11 + 15^2/7) = 2.344
.01 < p-value < .02
Since p-value > α, fail to reject the null.

METHOD II (confidence interval)
99% CI for μUW - μCSU: (110-93) ± 2.921*sqrt(15^2/11 + 15^2/7) = (-4.184, 38.184)
Since 0 is in the confidence interval, we fail to reject the null.

METHOD III (pair of confidence intervals)
For UW: 110 ± 2.921*sqrt(15^2/11) = (96.789, 123.211)
For CSU: 93 ± 2.921*sqrt(15^2/7) = (76.439, 109.561)
Since the confidence intervals overlap, we fail to reject the null.

Conclude that the claim is not supported: there is not enough evidence to suggest that UW has more rushing yards on average than CSU.
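
The pooled-standard-deviation arithmetic in problem 16 (Methods I and II) can be checked with a short script. This is only a sketch: the function name pooled_two_sample is made up for illustration, the critical value 2.921 is taken from the answer key rather than computed, and only the Python standard library is used.

```python
import math


def pooled_two_sample(mean1, n1, mean2, n2, pooled_sd, t_crit):
    """Return the pooled t statistic and a CI for mean1 - mean2.

    Hypothetical helper for illustration; t_crit is supplied by the
    caller (here the table value 2.921 quoted in the answer key).
    """
    # Standard error of the difference, using one pooled SD for both groups
    se = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    diff = mean1 - mean2
    t_stat = diff / se
    margin = t_crit * se
    return t_stat, (diff - margin, diff + margin)


# Values from problem 16: UW (mean 110, n = 11), CSU (mean 93, n = 7),
# pooled SD 15, degrees of freedom 11 + 7 - 2 = 16
t_stat, ci = pooled_two_sample(110, 11, 93, 7, 15, 2.921)
print(round(t_stat, 3))
print(tuple(round(bound, 3) for bound in ci))
```

Run as written, it should reproduce the t statistic of about 2.344 from Method I and the interval from Method II.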