AP Statistics
Tuesday, 09 February 2016
• OBJECTIVE TSW explore Hypothesis Testing.
• Student to Ms. Havens: “Is either yesterday’s test or the previous test graded yet?”
• Ms. Havens to student: “No.”
• Student to Ms. Havens: “Do you know when they will be graded?”
• Ms. Havens to student: “No.”
• Ms. Havens is sorry not to be able to give the student a good answer, but she will try to have them graded by the end of the week.

Hypothesis Tests
One-Sample Means

Example: A government agency has received numerous complaints that a particular restaurant has been selling underweight hamburgers. The restaurant advertises that its patties are “a quarter pound” (4 ounces).
How can I tell if they really are underweight? Hypothesis testing will help me decide!
Take a sample & find x̄. But how do I know if this x̄ is one that I expect to happen or one that is unlikely to happen?

What are hypothesis tests?
Calculations that tell us whether a value occurs by random chance or not.
If it is statistically significant, is it ...
– a random occurrence due to variation?
– a biased occurrence due to some other reason?

Nature of hypothesis tests
• First, begin by supposing the “effect” is NOT present
• Next, see if the data provide evidence against the supposition
Example: murder trial
How does a murder trial work?
First – assume that the person is innocent
Then – must have sufficient evidence to prove guilty
Hmmmmm … Hypothesis tests use the same process!

Steps:
1) Assumptions
2) Hypothesis statements & define parameters
3) Calculations
4) Conclusion, in context
Notice the steps are the same except we add hypothesis statements – which you will learn today

Assumptions for z-test (t-test):
• Have an SRS of context
• Distribution is (approximately) normal
  – Given
  – Large sample size
  – Graph data
• σ is known (unknown)
YEA – These are the same assumptions as confidence intervals!!

Example 1: Bottles of a popular cola are supposed to contain 300 mL of cola. There is some variation from bottle to bottle. An inspector, who suspects that the bottler is under-filling, measures the contents of six randomly selected bottles. Are the assumptions met?
299.4 297.7 298.9 300.2 297 301
• Have an SRS of bottles
• Sampling distribution is approximately normal because the boxplot is symmetrical
• σ is unknown

Writing Hypothesis statements:
• Null hypothesis – is the statement being tested; this is a statement of “no effect” or “no difference”
  H0:
• Alternative hypothesis – is the statement that we suspect is true
  Ha:

The form:
Null hypothesis
  H0: parameter = hypothesized value
Alternative hypothesis
  Ha: parameter > hypothesized value
  or Ha: parameter < hypothesized value
  or Ha: parameter ≠ hypothesized value

Example 2: A government agency has received numerous complaints that a particular restaurant has been selling underweight hamburgers. The restaurant advertises that its patties are “a quarter pound” (4 ounces). State the hypotheses:
H0: μ = 4
Ha: μ < 4
Where μ is the true mean weight of hamburger patties
You MUST indicate what μ represents!

Example 3: A car dealer advertises that his new subcompact models get 47 mpg. You suspect the mileage might be overrated. State the hypotheses:
H0: μ = 47
Ha: μ < 47
Where μ is the true mean mpg

AP Statistics
Wednesday, 10 February 2016
• OBJECTIVE TSW explore Hypothesis Testing.
• TEST: Sampling Distributions is graded.
  – You will get it back after lunch.

Example 4: Many older homes have electrical systems that use fuses rather than circuit breakers. A manufacturer of 40-A fuses wants to make sure that the mean amperage at which its fuses burn out is in fact 40. If the mean amperage is lower than 40, customers will complain because the fuses require replacement too often. If the amperage is higher than 40, the manufacturer might be liable for damage to an electrical system due to fuse malfunction.
State the hypotheses:
H0: μ = 40
Ha: μ ≠ 40
Where μ is the true mean amperage of the fuses

Facts to remember about hypotheses:
• ALWAYS refer to populations (parameters)
• The null hypothesis for the “difference” between populations is usually equal to zero
  H0: μx−y = 0
• The null hypothesis for the correlation (rho) of two events is usually equal to zero.
  H0: ρ = 0

Activity: For each pair of hypotheses, indicate which are not legitimate & explain why:
a) H0: μ  15; Ha: μ  15
   Must NOT be equal!
b) H0: x̄  123; Ha: x̄  123
   Must use parameter (population); x̄ is a statistic (sample)
c) H0:  0.1; Ha:  0.1
   is the population proportion!
d) H0: μ  0.4; Ha: μ  0.6
   Must use same number as H0!
e) H0: ρ  0; Ha: ρ  0
   ρ is the parameter for the population correlation coefficient – but H0 MUST be “=”!

P-values
• The probability, assuming H0 is true, that the test statistic would have a value as extreme as or more extreme than what is actually observed
In other words . . . is it far out in the tails of the distribution?

Level of significance
• Is the amount of evidence necessary before we begin to doubt that the null hypothesis is true
• Is the probability that we will reject the null hypothesis, assuming that it is true
• Denoted by α
  – Can be any value
  – Usual values: 0.1, 0.05, 0.01
  – Most common is 0.05

Statistically significant –
• The p-value is as small or smaller than the level of significance (α)
• If p > α, “fail to reject” the null hypothesis at the α level.
• If p ≤ α, “reject” the null hypothesis at the α level.

Facts about p-values:
• ALWAYS make a decision about the null hypothesis!
• Large p-values show support for the null hypothesis, but never that it is true!
• Small p-values show support that the null is not true.
• Double the one-tail p-value for two-tail (≠) tests
• Never accept the null hypothesis!
Never “accept” the null hypothesis!

At an α level of 0.05, would you reject or fail to reject H0 for the given p-values?
a) 0.03   Reject
b) 0.15   Fail to reject
c) 0.45   Fail to reject
d) 0.023  Reject

Calculating p-values
• For z-test statistic –
  – Use normalcdf(lb, ub) [using standard normal curve]
• For t-test statistic –
  – Use tcdf(lb, ub, df)

Draw & shade a curve & calculate the p-value:
1) right-tail test; t = 1.6; n = 20; p = 0.06305
2) left-tail test; z = -2.4; n = 15; p = 0.008198
3) two-tail test; t = 2.3; n = 25; p = 0.03045

Assignment
• WS Hypothesis Testing #1
  – Due on Tuesday, 16 February 2016.

Writing Conclusions:
1) A statement of the decision being made (reject or fail to reject H0) & why (linkage)
AND
2) A statement of the results in context. (state in terms of Ha)
“Since the p-value < (>) α, I reject (fail to reject) the H0. There is (is not) sufficient evidence to suggest that Ha.”
Be sure to write Ha in context (words)!

Example 5: Drinking water is considered unsafe if the mean concentration of lead is 15 ppb (parts per billion) or greater. Suppose a community randomly selects 25 water samples and computes a t-test statistic of 2.1. Assume that lead concentrations are normally distributed. Write the hypotheses, calculate the p-value & write the appropriate conclusion for α = 0.05.
H0: μ = 15
Ha: μ > 15
Where μ is the true mean concentration of lead in drinking water
t = 2.1
P-value = tcdf(2.1, 10^99, 24) = 0.0232
Since the p-value < α, I reject H0. There is sufficient evidence to suggest that the mean concentration of lead in drinking water is greater than 15 ppb.

Example 6: A certain type of frozen dinner states that the dinner contains 240 calories. A random sample of 12 of these frozen dinners was selected from production to see if the caloric content was greater than stated on the box. The t-test statistic was calculated to be 1.9. (Assume calories vary normally.) Write the hypotheses, calculate the p-value & write the appropriate conclusion for α = 0.05.
H0: μ = 240 calories
Ha: μ > 240 calories
Where μ is the true mean caloric content of the frozen dinners
t = 1.9
P-value = tcdf(1.9, 10^99, 11) = 0.0420
Since the p-value < α, I reject H0. There is sufficient evidence to suggest that the true mean caloric content of these frozen dinners is greater than 240 calories.

ASSUMPTIONS
• SRS (given)
• Normal distribution (given)
• σ unknown
If α is not given, include it!
H0: μ = 240 calories
Ha: μ > 240 calories, where μ is the true mean caloric content of frozen dinners
p-value = tcdf(1.9, ∞, 11) = 0.04197 < α = 0.05
Since p ≤ α, we reject H0. There is evidence to suggest that the true mean caloric content of frozen dinners is greater than 240 calories.

Formulas:
σ known:
\[ \text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}} \qquad z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Formulas:
σ unknown:
\[ \text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}} \qquad t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

AP Statistics
Friday, 12 February 2016
• OBJECTIVE TSW explore the aspects of hypothesis testing.
• ASSIGNMENTS DUE TUESDAY
  – WS Hypothesis Testing #1
  – WS Hypothesis Testing #2
  – WS Hypothesis Testing #3
  – WS Matched Pairs due 02/19/16
• LOOKING AHEAD
  – Tuesday, 02/16/2016: QUIZ: Hypothesis Testing
  – Wednesday, 02/17/2016: ASSESSMENT: Hypothesis Testing; REVIEW: Hypothesis Testing
  – Friday, 02/19/2016: TEST: Hypothesis Testing

Hypothesis Testing WS #1
1a) No, H0 must be =
b) No, must use parameter μ
c)
2a)
b) H0: μ = 30 ppm
   Ha: μ > 30 ppm
   Where μ is the true mean nitrate concentration in the water
c)
d) H0: μ = $42,500
   Ha: μ > $42,500
   Where μ is the true mean household income of mall shoppers
e)
3a) Yes, the normal probability (quantile) plot is approximately linear so the distribution is approximately normal.
b)

Hypothesis Testing WS #2
1)
2) reject H0
3) fail to reject H0
4)
5)
6) p-value = .0256
7) p-value = .092896
8)
9) p-value = tcdf(2.056, 1E99, 14)(2) = .02946(2) = .05892
Since the p-value > α, I fail to reject the null hypothesis.
There is not sufficient evidence to suggest that the true mean diameter of the catheters is not equal to 2.00 mm.

Example 7: The Fritzi Cheese Company buys milk from several suppliers as the essential raw material for its cheese. Fritzi suspects that some producers are adding water to their milk to increase their profits. Excess water can be detected by determining the freezing point of milk. The freezing temperature of natural milk varies normally, with a mean of -0.545 degrees and a standard deviation of 0.008. Added water raises the freezing temperature toward 0 degrees, the freezing point of water (in Celsius). The laboratory manager measures the freezing temperature of five randomly selected lots of milk from one producer with a mean of -0.538 degrees. Is there sufficient evidence to suggest that this producer is adding water to his milk? (Full write-up.)

Assumptions: SRS? Normal? How do you know? Do you know σ?
• I have an SRS of milk from one producer
• The freezing temperature of milk is normally distributed. (given)
• σ is known

What are your hypothesis statements? Is there a key word?
H0: μ = -0.545
Ha: μ > -0.545
where μ is the true mean freezing temperature of milk

Plug values into formula.
\[ z = \frac{-0.538 - (-0.545)}{0.008 / \sqrt{5}} = 1.9566 \]

Use normalcdf to calculate p-value.
p-value = normalcdf(1.9566, 1E99) = 0.0252

Compare your p-value to α & make decision.
α = .05

Conclusion: Write conclusion in context in terms of Ha.
Since p-value < α, I reject the null hypothesis. There is sufficient evidence to suggest that the true mean freezing temperature is greater than -0.545. This suggests that the producer is adding water to the milk.

Example 8: The Degree of Reading Power (DRP) is a test of the reading ability of children. Here are DRP scores for a random sample of 44 third-grade students in a suburban district: (data on note page) At α = 0.1, is there sufficient evidence to suggest that this district’s third graders’ reading ability is different from the national mean of 34?
(Full write-up.)

Assumptions: SRS? Normal? How do you know? Do you know σ?
• I have an SRS of third-graders
• Since the sample size is large, the sampling distribution is approximately normally distributed
• σ is unknown

What are your hypothesis statements? Is there a key word?
H0: μ = 34
Ha: μ ≠ 34
where μ is the true mean reading ability of the district’s third-graders

Plug values into formula.
\[ t = \frac{35.091 - 34}{11.189 / \sqrt{44}} = 0.6467 \]

Use tcdf to calculate p-value.
p-value = 2*tcdf(0.6467, 1E99, 43) = 2*0.2606 = 0.5212

Compare your p-value to α & make decision.
α = 0.1

Conclusion: Write conclusion in context in terms of Ha.
Since p-value > α, I fail to reject the null hypothesis. There is not sufficient evidence to suggest that the true mean reading ability of the district’s third-graders is different from the national mean of 34.

Example 9: The Wall Street Journal (January 27, 1994) reported that based on sales in a chain of Midwestern grocery stores, President’s Choice Chocolate Chip Cookies were selling at a mean rate of $1323 per week. Suppose a random sample of 30 weeks in 1995 in the same stores showed that the cookies were selling at the average rate of $1208 with standard deviation of $275. Does this indicate that the sales of the cookies are different from the earlier figure? (Include assumptions.)

Assumptions
• Have an SRS of weeks
• Distribution of sales is approximately normal due to large sample size
• σ unknown

H0: μ = 1323
Ha: μ ≠ 1323
where μ is the true mean cookie sales per week

\[ t = \frac{1208 - 1323}{275 / \sqrt{30}} = -2.29 \]
p-value = 0.0295

Since p-value < α of 0.05, I reject the null hypothesis. There is sufficient evidence to suggest that the sales of cookies are different from the earlier figure.

Example 9 (Continued): President’s Choice Chocolate Chip Cookies were selling at a mean rate of $1323 per week. Suppose a random sample of 30 weeks in 1995 in the same stores showed that the cookies were selling at the average rate of $1208 with standard deviation of $275.
Compute a 95% confidence interval for the mean weekly sales rate. (Just compute the interval.)
CI = ($1105.30, $1310.70)
Based on this interval, is the mean weekly sales rate statistically different from the reported $1323?
The sales rate is statistically different, since the reported mean of $1323 is not in the interval.

Assignment
• WS Hypothesis Testing #3
  – Due on Tuesday, 16 February 2016.
• WS Matched Pairs
  – Due on Friday, 19 February 2016.

Matched Pairs Test
A special type of t-inference

Matched Pairs – Two Forms
Form #1
• Pair individuals by certain characteristics
• Randomly select treatment for individual A
• Individual B is assigned to other treatment
• Assignment of B is dependent on assignment of A
Form #2
• Individual persons or items receive both treatments
• Order of treatments is randomly assigned; or, before & after measurements are taken
• The two measures are dependent on the individual

Is this an example of matched pairs?
1) A college wants to see if there’s a difference in time it took last year’s class to find a job after graduation and the time it took the class from five years ago to find work after graduation. Researchers take a random sample from both classes and measure the number of days between graduation and first day of employment.
No, there is no pairing of individuals; you have two independent samples.

Is this an example of matched pairs?
2) In a taste test, a researcher asks people in a random sample to taste a certain brand of spring water and rate it. Another random sample of people is asked to taste a different brand of water and rate it. The researcher wants to compare these samples.
No, there is no pairing of individuals; you have two independent samples. If you had the same people taste both brands in random order, then it would be an example of matched pairs.

Is this an example of matched pairs?
3) A pharmaceutical company wants to test its new weight-loss drug.
Before giving the drug to a random sample, company researchers take a weight measurement on each person. After a month of using the drug, each person’s weight is measured again.
Yes, you have two measurements that are dependent on each individual.

A whale-watching company noticed that many customers wanted to know whether it was better to book an excursion in the morning or the afternoon. To test this question, the company collected the following data on 15 randomly selected days over the past month. (Note: days were not consecutive.)

Day         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Morning     8   9   7   9  10  13  10   8   2   5   7   7   6   8   7
Afternoon   8  10   9   8   9  11   8  10   4   7   8   9   6   6   9

Since you have two values for each day, they are dependent on the day – making this data matched pairs.
You may subtract either way – just be careful when writing Ha.

First, you must find the differences for each day.
I subtracted: Morning – afternoon. You could subtract the other way!

Day          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Morning      8   9   7   9  10  13  10   8   2   5   7   7   6   8   7
Afternoon    8  10   9   8   9  11   8  10   4   7   8   9   6   6   9
Differences  0  -1  -2   1   1   2   2  -2  -2  -2  -1  -2   0   2  -2

Assumptions:
You need to state assumptions using the differences!
• Have an SRS of days for whale-watching
• σ unknown
• Since the normal probability plot is approximately linear, the distribution of differences is approximately normal.
Notice that, even with the granularity in this plot, it still displays a nice linear relationship!

Is there sufficient evidence that more whales are sighted in the afternoon?
H0: μD = 0
Ha: μD < 0
Where μD is the true mean difference in whale sightings from morning minus afternoon
Notice we used μD for differences & it equals 0 since the null should be that there is NO difference.
Be careful writing your Ha! Think about how you subtracted: M-A. If afternoon is more, should the differences be + or -? Don’t look at numbers!!!!
If you subtract afternoon – morning, then Ha: μD > 0

Finishing the hypothesis test:
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{-0.4 - 0}{1.639 / \sqrt{15}} = -0.945 \]
p = 0.1803
df = 14
α = 0.05
In your calculator, perform a t-test using the differences (L3).
Notice that if you subtracted A-M, then your test statistic would be t = +0.945, but the p-value would be the same.
Since p-value > α, I fail to reject H0. There is insufficient evidence to suggest that more whales are sighted in the afternoon than in the morning.

Assignment
• WS Matched Pairs
  – Due on Friday, 19 February 2016 (TEST day).
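
For anyone who wants to check the whale-watching matched pairs test away from the calculator, here is a minimal Python sketch. It is not part of the class method (the slides use the calculator's t-test and tcdf). The helper names `t_pdf`, `t_cdf`, and `matched_pairs_t` are made up for this illustration, the two data lists are read off the table in the slides, and `t_cdf` only approximates the calculator's tcdf by numerically integrating the t density with Simpson's rule.

```python
import math
import statistics

# Whale sightings for days 1-15, read off the table in the slides.
morning = [8, 9, 7, 9, 10, 13, 10, 8, 2, 5, 7, 7, 6, 8, 7]
afternoon = [8, 10, 9, 8, 9, 11, 8, 10, 4, 7, 8, 9, 6, 6, 9]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def matched_pairs_t(first, second):
    """Return (mean difference, s, t, df) for H0: mu_D = 0, using first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = d_bar / (s / math.sqrt(n))
    return d_bar, s, t, n - 1


# Morning - afternoon, matching the subtraction order used in the slides.
d_bar, s, t, df = matched_pairs_t(morning, afternoon)
p_value = t_cdf(t, df)  # left tail, because Ha: mu_D < 0

print(f"mean diff = {d_bar:.1f}, s = {s:.3f}, t = {t:.3f}, df = {df}")
print(f"p-value = {p_value:.4f}")
```

The results should agree with the slide values (mean difference -0.4, s = 1.639, t = -0.945, df = 14), with a p-value near 0.1803. Swapping the arguments to `matched_pairs_t(afternoon, morning)` flips the sign of t; in that case the right-tail area `1 - t_cdf(t, df)` is the matching p-value, which is the same number.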