** STA 1020 - Part 3 (08/Dec/13) **
MATERIAL FOR EXAM #3

STA 1020, Fall 2013, Section 09, MWF 10:40-11:35, 0035 State
Instructor: Dr. J.L. Menaldi
Textbook - Statistics: Concepts and Controversies, by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed]
Class Link: http://www.math.wayne.edu/~menaldi/teach/13f1020.htm

Contents
Exam 3 of 3: Chance and Inference
Quizzes every chapter and then Third Partial Exam
Chapter 17 - Thinking about Chance
Chapter 18 - Probability Models
Chapter 19 - Simulation - mostly skipped!
Chapter 20 - The House Edge: Expected Values
Chapter 21 - What is a Confidence Interval?
Chapter 22 - What is a Test of Significance?
Chapter 23 - Use and Abuse of Statistical Inference - skipped!
Chapter 24 - Two-way Tables and Chi-Square Test

“Statistics” is the science of collecting, describing and interpreting data...
It is said that “Probability” is the vehicle of Statistics, i.e., if it were not for the laws of probability, the theory of statistics would not be possible.

Part 3: Probability. The Theory of Statistics
Chapter 17

Two Concepts of Probability
Personal-Probability Interpretation
* The degree to which a given individual believes the event in question will happen
* Personal belief (or personal ignorance about something?)
Relative-Frequency Interpretation
* The proportion of time the event in question occurs over the long run
* Long-run relative frequency
Two ways to determine the Relative-Frequency Probabilities
* Physical assumptions (theoretical mathematical model)

Thought Questions. . .
Here are two very different probability questions:
* If you roll a 6-sided die and do it fairly, what is the probability that it will land with “3” showing?
* What is the probability that in your lifetime you will travel to a foreign country other than one you have already visited?
For which question was it easier to provide a precise answer? Why?
For which one could we all agree?
What is wrong with the following partial answer: The probability that I will eventually travel to another foreign country (or of any other particular event happening) is 1/2, because either it will happen or it won’t?

The second way to determine relative-frequency probabilities:
* Repeated observations (empirical results), i.e., by experience with many samples or by simulation

Ex1 Coin tossing
Figure 17.1 Toss a coin many times. The proportion of heads changes as we make more tosses but eventually gets very close to 0.5. This is what we mean when we say, “The probability of a head is one-half.”

Ex2 Some coin tossers
* The French naturalist Count Buffon (1707-88) tossed a coin 4040 times. Result: 2048 heads, a proportion of 2048/4040 = 0.5069 for heads
* Around 1900, the English statistician Karl Pearson heroically tossed a coin 24,000 times. Result: 12,012 heads, a proportion of 0.5005
* While imprisoned (WW2), the South African mathematician John Kerrich tossed a coin 10,000 times. Result: 5067 heads, a proportion of 0.5067

What is called a random phenomenon?
The probability of any outcome of a random phenomenon is a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions.

Ex3 Cannot predict
The National Center for Health Statistics says that the proportion of men aged 20 to 24 years who die in any one year is 0.0014. This is taken as the probability that a young man will die next year. For women that age, the probability of death is about 0.0005. If an insurance company sells many policies to people aged 20 to 24, it knows (or believes?)
that it will have to pay off next year on about 0.14% (0.05%) of the policies sold on men’s (women’s) lives. Logically, it will charge more to insure a man because the probability of having to pay is higher. However, we cannot predict whether a particular person will die in the next year. . .
Probability answers the question “What would happen if we did this many times?”
The idea of probability is that randomness is regular in the long run.

Ex5, 6, 7 and 8
* If a basketball player makes several consecutive shots, both the fans and his teammates believe that he has a “hot hand” and is more likely to make the next shot. . .
* If a person wins the lotto today, that same person has less chance of winning again next week. . . , winning the lottery twice?
* Cancer is a common disease, accounting for more than 23% of all deaths in the US. That cancer cases sometimes occur in clusters in the same neighborhood is not surprising: there are bound to be clusters somewhere simply by chance (or not?)
* When a shooter in the dice game craps rolls several winners in a row, some gamblers think she/he has a “hot hand” and bet that she/he will keep on winning. Others say that “the law of averages” means that she/he must now lose so that wins and losses will balance out . . .
* Ex8: We want a boy, the law of averages affirms that . . .
* If we toss a coin 6 times, which of these outcomes is more probable (or looks random): “HTHTTH”, “HTHTHT” or “TTTHHH” (pattern?)

Law of averages
Law of large numbers: in a large number of “independent” repetitions of a random phenomenon (such as coin tossing), averages or proportions are likely to become stable as the number of trials increases, contrary to sums or counts . . .
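The contrast between proportions (which settle down) and counts (which do not) can be seen in a small simulation. This is only a sketch under stated assumptions: the function name running_proportions and the fixed seed are illustrative choices, not part of the course material, and Python's pseudo-random generator stands in for a real coin.

```python
import random

def running_proportions(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times.

    Returns a list whose k-th entry is the proportion of heads
    after the first k+1 tosses.
    """
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        if rng.random() < 0.5:           # "heads" with probability 1/2
            heads += 1
        proportions.append(heads / toss)
    return proportions

props = running_proportions(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    heads_so_far = round(props[n - 1] * n)
    # The proportion drifts toward 0.5, while the count of heads
    # is free to wander away from exactly n/2.
    print(f"{n:>7} tosses: proportion of heads = {props[n - 1]:.4f}, "
          f"heads minus n/2 = {heads_so_far - n / 2:+.1f}")
```

Changing the seed gives a different path, but the proportion should still end up close to 0.5 for a long enough run.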
Relative-Frequency Probabilities
* Can be applied when the situation can be repeated numerous times (conceptually) and the outcome can be observed each time
* Relative frequency (proportion of occurrences) of an outcome settles down to one value over the long run. That one value (between 0 and 1) is then defined to be the probability of that outcome
* The probability cannot be used to determine whether or not the outcome will occur on a single occasion, or in a single sample (it is a long-run phenomenon)
A Personal Probability of an outcome is always a number between 0 and 1 that expresses an individual’s judgment of how likely the outcome is (the outcome may not be repeated!)
Two ways: “personal judgment of how likely” and “what happens in many repetitions”

Figure 17.3 Toss a coin many times. The difference between the observed number of heads and exactly one-half the number of tosses becomes more variable as the number of tosses increases.

Ex9 Risk
High exposure to asbestos is dangerous. Low exposure, such as that experienced by teachers and students in schools where asbestos is present in the insulation around pipes, is not very risky. The probability that a teacher who works for 30 years in a school with typical asbestos levels will get cancer from the asbestos is around 15/1,000,000. The risk of dying in a car accident during a lifetime is about 15,000/1,000,000, i.e., 1000 times more risky, but . . .

Risk and Relative Risk (Case Study)
The following table gives results for whether or not subjects were still smoking when given a nicotine patch or a placebo:

Still smoking?   Nicotine       Placebo
No               56 (46.7%)     24 (20%)
Yes              64 (53.3%)     96 (80%)
Total            120 (100%)     120 (100%)

Relative Risk
Risk of continuing to smoke:
* Nicotine: 0.533 (just the proportion from the table)
* Placebo: 0.800
The relative risk of continuing to smoke when using the placebo patch compared with when using the nicotine patch is 1.5 (0.800/0.533 = 1.5).
The risk of continuing to smoke when using the placebo patch is 1.5 times the risk when using the nicotine patch.

Cautions about Risk
* What if the baseline risk is missing? The relative risk means “relative” to what?
* The reported risk is not necessarily your risk. Are the subjects and the setting of the study representative of you and your situation?

Exercise Ch17
17.12 Marital status. The probability that a randomly chosen 50-year-old woman is divorced is about 0.18. This probability is a long-run proportion based on all the millions of women aged 50. Let’s suppose that the proportion stays at 0.18 for the next 30 years. Bridget is now 20 years old and is not married.
(a) Bridget thinks her own chances of being divorced at age 50 are about 5%. Explain why this is a personal probability.
(b) Give some good reasons why Bridget’s personal probability might differ from the proportion of all women aged 50 who are divorced.
(c) You are a government official charged with looking into the impact of the Social Security system on middle-aged divorced women. You care only about the probability 0.18, not about anyone’s personal probability. Why?

Exercise (answer) Ch17
**Answers
(a) This is based on a personal judgment of her likelihood to get divorced; it is not based on data on repeated trials of an experiment.
(b) For example, Bridget might have strong religious or moral beliefs that make her less inclined to consider divorce.
(c) For the overall impact of divorce, we are concerned with the percentage of all 50-year-old women who are divorced. The probability 0.18 is supported by data, and is known to apply to the whole group.

Multiple choice Ch17
If I toss a fair coin 5,000 times
(a) the number of heads will be close to 2,500.
(b) the proportion of heads will be close to 0.5.
(c) the proportion of heads in these tosses is a parameter.
(d) the proportion of heads will be exactly 50%.
Answer: (b)
.......................................................................
There are 2,598,960 possible 5-card hands that can be dealt from an ordinary 52-card deck. Of these, 5,148 have all five cards of the same suit (in poker such hands are called flushes). The probability of being dealt such a hand (assuming randomness) is closest to
(a) 1/4. (b) 1/100. (c) 1/500. (d) 1/1000.
Answer: (c)

Ch18 - Probability Models
Chapter 18

Ex1 Marital Status
Choose a woman aged 25 to 29 at random and record her marital status, i.e., an SRS of size n=1. The probability of any marital status is just the proportion of all women aged 25 to 29 who have that status; if we choose many women, we get

Marital status:   Never married   Married   Widowed
Probability:      0.503           0.452     0.003

Avoid Being Inconsistent
Sketching. . .
For instance, the probability of “married with children” must not be greater than the probability that the couple is married.

Ex1 Marital Status (cont)
Divorced: 0.042
Because the probabilities are proportions, to find P(not married) we add P(never married), P(widowed) and P(divorced), i.e., 0.503 + 0.003 + 0.042 = 0.548.
Adding P(not married) and P(married) should give 1, so P(not married) is also equal to 1 − 0.452 = 0.548.

A probability model for a random phenomenon describes all the possible outcomes and says how to assign probabilities to any collection of outcomes. We sometimes call a collection of outcomes an event.

Probability Rules A-B
“These rules tell us only what probability models make sense!”
*A* Any probability is a number between 0 and 1
A probability can be interpreted as the proportion of times that a certain event can be expected to occur.
If the probability of an event were more than 1, the event would occur more than 100% of the time (impossible!)
*B* All possible outcomes together must have probability 1
Because some outcome must occur on every trial, the sum of the probabilities for all possible outcomes must be exactly one.
If the sum of all of the probabilities is less than one or greater than one, then the resulting probability model will be incoherent.
*C* The probability that an event does not occur is 1 minus the probability that the event does occur
If the probability that a flight will be on time is 0.70, then the probability it will be late is 0.30.
*D* If two events have no outcomes in common, they are said to be “mutually exclusive”. The probability that one or the other of two mutually exclusive events occurs is the sum of their individual probabilities
Example: Age of woman at first child birth.
Given (a) under 20: 25% and (b) 20-24: 33%, find (1) 24 or younger: Rule D says 25% + 33% = 58%, and (2) 25+: Rule C says 100% − 58% = 42%.

Probability Rules C-D
As a jury member, you assess the probability that the defendant is guilty to be 0.80. Thus you must also believe the probability the defendant is not guilty is 0.20 in order to be coherent (consistent with yourself).

Ex2 Rolling two dice
Figure 18.1 There are 6 possible outcomes for each die, so 36 for two dice.
Assume carefully made dice, resulting in fair dice (i.e., each outcome is equally likely). The event “roll a 5” contains four outcomes, “1+4”, “2+3”, “3+2”, “4+1”, so that P(roll a 5) = 1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 0.111.
Now it’s your turn: How about the events “roll a 7” and “roll an 11”?

Ex3 A Sampling distribution
Figure 18.2 The sampling distribution of a sample proportion p̂ from SRSs of size 2527 drawn from a population in which 50% of the members would give positive answers. The histogram shows the distribution from 1000 samples.
The Normal curve is the ideal pattern that describes the results of a very large number of samples, in this case, with mean x̄ = 0.5 and standard deviation s = 0.010.
So, the ‘95’ part of the 68-95-99.7 rule says that 95% of all samples will give a p̂ between 0.48 = 0.50 − 0.02 and 0.52 = 0.50 + 0.02.

Ex4 & Ex5 Gambling
An opinion poll asks an SRS of 501 teens, “Generally speaking, do you approve or disapprove of legal gambling or betting?” Suppose exactly 50% of all teens would say ‘yes’ (i.e., the parameter p = 0.5), and that the sampling distribution follows approximately a normal curve with mean x̄ = 0.5 and standard deviation s = 0.022.
Figure 18.3 The Normal sampling distribution. Because 0.478 is one standard deviation below the mean, the area under the curve to the left of 0.478 is 0.16.
Figure 18.4 The Normal sampling distribution. The outcome 0.52 has standard score 0.9, so Table B tells us that the area under the curve to the left of 0.52 is 0.8159.

Sampling distribution
The sampling distribution of a statistic tells us what values the statistic takes in repeated samples from the same population and how often it takes those values.
We think of a sampling distribution as assigning probabilities to the values the statistic can take. Because there are usually many possible values, sampling distributions are often described by a density curve such as a normal curve.
A sampling distribution
* Tells what values a statistic (calculated sample value) takes and how often it takes those values in repeated sampling
* Assigns probabilities to the values a statistic can take. These probabilities must satisfy Rules A-D
* Probabilities are often assigned to intervals of outcomes by using areas under density curves
* Often this density curve is a normal curve
* Can use the “68-95-99.7 rule” or get probabilities from Table B
* Sample proportions (i.e., p̂) follow a normal curve
Check Case Study Evaluated

Who Voted?
[ World Almanac and Book of Facts (1995), Famighetti, R. (editor), Mahwah, N.J.: Funk and Wagnalls ]
* 56% of registered voters actually voted in the 1992 presidential election.
* In a random sample of 1600 voters, the proportion who claimed to have voted was 0.58.
* Such sample proportions (p̂) from repeated sampling would have a normal distribution with a mean of 0.56 and a standard deviation of 0.012.
* What is the probability of observing a sample proportion (p̂) as large or larger than 58%?
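The question above can also be checked numerically. The following is a minimal sketch, assuming the Normal model stated in the slide (mean 0.56, standard deviation 0.012); the helper name normal_cdf is hypothetical and is built only on math.erf from the Python standard library. It also shows, as an assumption about how a printed table would be used, the same calculation with the standard score rounded to one decimal, which gives a slightly different tail area.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Area under the Normal curve (given mean and sd) to the left of x."""
    z = (x - mean) / sd
    return 0.5 * (1 + erf(z / sqrt(2)))

mean, sd = 0.56, 0.012      # sampling distribution of p-hat, from the slide
observed = 0.58             # sample proportion who claimed to have voted

# Standardized score and the area to the right of the observed value (Rule C)
z = (observed - mean) / sd
upper_tail = 1 - normal_cdf(observed, mean, sd)

# Same calculation with z rounded to one decimal, as a lookup in a
# one-decimal table would do (assumption about the table's resolution).
z_rounded = round(z, 1)
upper_tail_rounded = 1 - normal_cdf(z_rounded, 0, 1)

print(f"z = {z:.2f}, P(p-hat >= 0.58) = {upper_tail:.4f}")
print(f"z rounded to {z_rounded}: P(p-hat >= 0.58) = {upper_tail_rounded:.4f}")
```

Either way the probability is a little under 5%, so a sample proportion this large would be fairly unusual if people reported their voting accurately.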
Who Voted? (cont)
If we convert the observed value of 0.58 to a standardized score, we get standardized score z = (x − x̄)/s, i.e., (0.58 − 0.56)/0.012 = 1.67.
From Table B (using the nearest entry, standard score 1.7), this is the 95.54 percentile, so the probability of observing a value as small as or smaller than 0.58 is 0.9554.
By Rule C (or B), the probability of observing a value as large or larger than 0.58 is 1 − 0.9554 = 0.0446.

Independent Events (Ch19!)
If two events do not influence each other, and if knowledge about one does not help with the knowledge of the probability of the other, the events are said to be independent of each other.
If two events are independent, the probability that they both happen is found by multiplying their individual probabilities.
Example: Suppose that about 20% of incoming male freshmen smoke. Suppose that these freshmen are randomly assigned in pairs to dorm rooms. Then . . . the probability of a match (both smokers or both non-smokers):
Solution:
* both are smokers: 0.04 = (0.20)(0.20)
* neither is a smoker: 0.64 = (0.80)(0.80)
* both are or neither is a smoker: 0.04 + 0.64 = 0.68
* only one is a smoker: Rule C, (1 − 0.68), i.e., 32%

Exercise (answer) Ch18, for Exercise 18.8
**Answers
(a) The sum is 1, as we expect, because all possible outcomes are listed.
(b) 1 − 0.44 = 0.56.
(c) 0.44 + 0.26 = 0.70.

Exercise Ch18
18.8 High school academic rank. Select a first-year college student at random and ask what his or her academic rank was in high school. Here are the probabilities, based on proportions from a large sample survey of first-year students:

Rank:          Top 20%   Second 20%   Third 20%   Fourth 20%   Lowest 20%
Probability:   0.44      0.26         0.23        0.06         0.01

(a) What is the sum of these probabilities? Why do you expect the sum to have this value?
(b) What is the probability that a randomly chosen first-year college student was not in the top 20% of his or her high school class?
(c) What is the probability that a first-year student was in the top 40% in high school?

Multiple choice Ch18
Choose an American household at random and ask how many computers that household owns. Here are the probabilities as of 2003:

Number of computers:   0      1      2
Probability:           0.38   0.44   0.18

1. This is a legitimate assignment of probabilities because it satisfies these rules:
(a) all the probabilities are between 0 and 1.
(b) all the probabilities are between 0% and 100%.
(c) the sum of all the probabilities is exactly 1.
(d) both (a) and (c).
Answer: (d)
2. What is the probability that a randomly chosen household owns more than one computer?
(a) 0.56. (b) 0.18. (c) 0.44. (d) 0.62.
Answer: (b)

Ch20 - The House Edge: Expected Values
Chapter 20
Thought Questions. . .

Long-Term Gains, Losses and Expectations
Expected Value is what you logically expect in the long run . . .

Raffle tickets
Tickets to a sorority fund-raiser sell for $1. One ticket will be randomly chosen, and the ticket owner receives a $200 gift card.
They expect to sell 1000 tickets. Your ticket has a 1/1000 = 0.001 probability of winning (and a 0.999 probability of losing).
Your expected gain (expected value) is ($199)(0.001) + (−$1)(0.999) = −$0.80.

(expected value) = a_1 p_1 + a_2 p_2 + · · · + a_n p_n, where a_i is the value (e.g., amount of money) that you expect if outcome i happens, and p_i is the probability (chance) that outcome i occurs, for i = 1, 2, . . . , n.

.......................................................................
** Suppose that a sorority pledge class is selling raffle tickets to raise money. The grand prize is a $200 gift certificate to the campus bookstore, and the pledges must sell all 1000 raffle tickets that were printed. How much would you be willing to pay for a single ticket? Explain your answer.
.......................................................................

The Main Point...
While we cannot predict individual outcomes, we can “estimate” what happens (on average, i.e., repeating this over and over) in the long run.

Daily Numbers
‘Straight’ wager ($0.50):

Outcome:       lose    win
Value:         $0      $250
Probability:   0.999   0.001

You may choose to make a $1 ‘Straight-Box’ (6-way) wager. You again choose a three-digit number, but now you have two ways to win. You win $292 if you exactly match the winning number, and you win $42 if your number has the same digits as the winning number, but in any order.

Outcome:       lose    order   exact
Value:         $0      $42     $292
Probability:   0.994   0.005   0.001

The expected value is ($0)(0.994) + ($42)(0.005) + ($292)(0.001) = $0.502
Which one is better in the long run?
Ans: If you keep playing
* ‘Straight’ you will lose $0.50 − $0.25 = $0.25 per play, i.e., 50% of the wager, and
* ‘Straight-Box’ you will lose $1.000 − $0.502 = $0.498 per play, i.e., 49.8% of the wager

Vehicles
What is the average number of motor vehicles in American households? The Census Bureau tells us that the distribution of vehicles per household (2000 census) is as follows (number of vehicles and proportion of households):
Number of vehicles:   0      1      2      3      4      5
Proportion:           0.10   0.34   0.39   0.13   0.03   0.01

The expected value is (0)(0.10) + (1)(0.34) + · · · + (5)(0.01) = 1.68.

Raffle tickets (cont)
Two outcomes: (a) You win $200, net gain is $199 (chance: 0.001) or (b) You do not win, net ‘gain’ is −$1 (chance: 0.999)
Long term, you lose an average of $0.80 each time (conceptually) you enter such a contest (Hey, the sorority needs to make a profit!).

Daily Numbers
A simple lottery wager, the ‘Straight’ from the Pick 3 game of the Tri-State Daily Numbers. You pay $0.50 and choose a three-digit number, and the state chooses a three-digit winning number at random and pays you $250 if your number is chosen.
There are n = 2 outcomes, each with a value (a_i) and a probability (p_i).
The average or expected value is ($0)(0.999) + ($250)(0.001) = $0.25

...............................................................
Deal or No Deal?
(1) You choose one of four sealed cases; one contains $1,000, and the others are empty. If you open your case, you have a 25% chance to win $1,000 and a 75% chance of getting nothing (winning $0).
Or, (2) you can sell your unopened case for $240, giving you a 100% chance of winning $240.
* First option (open your case): EV = ($1000)(0.25) + ($0)(0.75) = $250
* Second option (sell your case): EV = $240, no variation.
** Make a Decision: Will you open or sell your case?

Deal or No Deal? (cont)
Summary:
Option 1 - a 25% chance to win $1,000 and a 75% chance of getting nothing, EV = $250
Option 2 - a gift of $240, guaranteed, EV = $240

Deal or No Deal? (variation)
Now, a variation: (1) You have a case containing $740 of your money. If you give away your case, you have a 100% chance of losing $740.
Or, (2) you can keep your case and play a game in which you have a 75% chance to lose $1,000 and a 25% chance to lose nothing ($0).
(1) Give away your case: expected loss EV = $740, no variation, a sure loss of $740
(2) Play the game: expected loss EV = ($1000)(0.75) + ($0)(0.25) = $750, a 75% chance to lose $1,000 and a 25% chance to lose nothing
Make a Decision: Will you play the game or not?

Analysis (variation)
If choosing for ONE trial
* Option (2) may lose nothing ($0) but also has the largest potential loss ($1000)
* Option (1) guarantees a loss of $740
If choosing for MANY trials
* Option (1) will minimize expected loss (will lose less money in the long run)
* How many trials are necessary for the ‘long run’? 500?

Analysis (original game: open or sell your case)
If choosing for ONE trial
* Option (1) has the largest potential gain ($1000) but may also gain nothing ($0)
* Option (2) guarantees a gain ($240)
If choosing for MANY trials
* Option (1) will maximize expected gain (will make more money in the long run)
* How many trials are necessary for the ‘long run’? 500?

The Law of Large Numbers
The actual average (mean) outcome of many independent trials gets closer to the expected value as more trials are made.
* the higher the variability of the trials, the larger the number of trials needed

A couple plan to have children until they have a girl or until they have three children. What is the probability that they will have a girl among their children?
1 The probability model is like that for coin tossing: (a) Each child has probability 0.49 of being a girl and 0.51 of being a boy (yes, more boys than girls are born; boys have higher infant mortality, so the sexes even out soon). (b) The sexes of successive children are independent.
2 Assigning digits is also easy. Two digits simulate the sex of one child.
We assign 49 of the 100 pairs to “girl” and the remaining 51 to “boy”, i.e., 00, 01, 02, . . ., 48 means girl, and 49, 50, 51, . . ., 99 means boy.
3 To simulate one repetition of this childbearing strategy, read pairs of digits from Table A until the couple have either a girl or three children. The number of pairs needed to simulate one repetition depends on how quickly the couple get a girl. Here are 10 repetitions, simulated using line 130 of Table A. To interpret the pairs of digits, we have written G for girl and B for boy under them, and have added space to separate repetitions.
4 In these 10 repetitions, a girl was born 9 times. Our estimate of the probability that this strategy will produce a girl is therefore 9/10 = 0.9. Some mathematics shows that, if our probability model is correct, the true probability of having a girl is 0.867. Our simulated answer came quite close. Unless the couple are unlucky, they will succeed in having a girl.
(This is Ex4, We want a girl, from Ch19!)

The Law of Large Numbers (cont)
* expected values can be calculated by simulating many repetitions and finding the average of all of the outcomes
* The “house” in a gambling operation is not gambling at all
* the games are defined so that the gambler has a negative expected gain per play
* each play is independent of previous plays, so the law of large numbers guarantees that the average winnings of a large number of players will be close to the (negative) expected value
* State lottos have extremely variable outcomes; they also use a pari-mutuel system for (fixed) payoffs; too many trials are necessary . . .

Controversies
Gambling? Voluntary tax?

Ex3 We want a girl
Sometimes, expected values may be too difficult to compute and “simulation” is used.
A couple plan to have children until they have a girl or until they have three children, whichever comes first.
We find the expected value by simulation, using the table of random digits. The probability model says that the sexes of successive children are independent and that each child has probability 0.49 of being a girl. Thus, a pair of digits simulates one child, with 00 to 48 standing for a girl (e.g., begin at line 130):

Digits:     6905   16   48   17   8717   40   9517   845340   648987   20
Sexes:      BG     G    G    G    BG     G    BG     BBG      BBB      G
Children:   2      1    1    1    2      1    2      3        3        1

Mean number of children: x̄ = (2 + 1 + · · · + 3 + 1)/10 = 1.7
This simulation is too short to be trustworthy (only 10 repetitions or trials). A deeper math analysis shows that the actual expected value is 1.77.

Controversies (cont)
Arguments for & against
C H E C K: “Exploring the Web” box at the end of this chapter.

Exercise Ch20
20.10 Keno. Keno is a popular game in casinos. Balls numbered 1 to 80 are tumbled in a machine as the bets are placed, then 20 of the balls are chosen at random. Players select numbers by marking a card. Here are two of the simpler Keno bets. Give the expected winnings for each.
(a) A $1 bet on “Mark 1 number” pays $3 if the single number you mark is one of the 20 chosen; otherwise, you lose your dollar.
(b) A $1 bet on “Mark 2 numbers” pays $12 if both your numbers are among the 20 chosen. The probability of this is about 0.06. Is Mark 2 a more or a less favorable bet than Mark 1?

Exercise (answer) Ch20
**Answers
(a) The expected payoff for a Mark 1 bet is ($3)(20/80) + ($0)(60/80) = $0.75.
(b) The expected payoff for a Mark 2 bet is approximately ($12)(0.06) = $0.72, slightly less favorable than a Mark 1 bet.
Note: The exact probability of winning a Mark 2 bet is (20/80)(19/79) = 0.06013; with this value, the expected payoff is about $0.7215.
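The Keno answers, and the value 1.77 quoted for the "we want a girl" strategy, all come from the same rule: multiply each value by its probability and add. Below is a minimal sketch of that rule; the function name expected_value and the variable names are illustrative, and the children distribution is worked out here from the stated model (girl with probability 0.49, independent births, stop at the first girl or at three children).

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs.

    The probabilities are required to add up to 1 (Rule B).
    """
    total_prob = sum(p for _, p in outcomes)
    assert abs(total_prob - 1) < 1e-9, "probabilities must sum to 1"
    return sum(value * p for value, p in outcomes)

# Keno "Mark 1 number": pays $3 with probability 20/80, otherwise $0
mark1 = expected_value([(3, 20 / 80), (0, 60 / 80)])

# Keno "Mark 2 numbers": pays $12 if both marked numbers are chosen
p_mark2 = (20 / 80) * (19 / 79)
mark2 = expected_value([(12, p_mark2), (0, 1 - p_mark2)])

# Number of children: 1 (girl first), 2 (boy then girl),
# 3 (two boys first, whatever the third child is)
p_girl, p_boy = 0.49, 0.51
children = expected_value([
    (1, p_girl),
    (2, p_boy * p_girl),
    (3, p_boy ** 2),
])

print(f"Mark 1 expected payoff:   ${mark1:.4f}")
print(f"Mark 2 expected payoff:   ${mark2:.4f}")
print(f"Expected number of children: {children:.4f}")
```

The payoffs are per $1 bet, so an expected payoff below $1 means the house keeps the difference on average.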
Multiple choice Ch20
A basketball player makes 65% of her shots from the field during the season. You want to estimate the expected number of shots made in 10 shots. You simulate 10 shots 25 times and get the following numbers of hits:
7 9 7 6 3 7 5 6 5 6 7 5 6 6 8 5 6 3 9 6 7 7 8 7 9
Your estimate is:
(a) 6 out of 10 shots. (b) 6.5 out of 10 shots. (c) 5.6 out of 10 shots. (d) 5.2 out of 10 shots.
Answer: (d)
.......................................................................
In government data, a family consists of two or more persons who live together and are related by blood or marriage. Choose an American family at random and count the number of people it contains. Here is the assignment of probabilities for your outcome:

Number of persons:   2      3      4      5      6      7
Probability:         0.42   0.23   0.21   0.09   0.03   0.02

Using the probabilities above, what is the expected size of the family you draw?
(a) 2 people. (b) 3 people. (c) 3.14 people. (d) 3.50 people.
Answer: (c)

Ch21 - What is a Confidence Interval?
Suppose that 40% of a certain population favor the use of nuclear power for energy.
(a) If you randomly sample 10 people from this population, will exactly four (40%) of them be in favor of the use of nuclear power? Would you be surprised if only two (20%) of them are in favor?
(b) Now suppose you randomly sample 1000 people from this population. Will exactly 400 (40%) of them be in favor of the use of nuclear power? Would you be surprised if only 200 (20%) of them are in favor?
(c) In both cases (a) and (b), how about if none of the sample are in favor?

Part 4: Inference - To draw a conclusion from evidence (Chapter 21)
Statistical inference draws conclusions about a population on the basis of data from a sample. Typical questions are “what is the opinion of people about a particular issue?”, “what is the mean survival time for patients with this type of cancer?”, or “how are people going to vote in the coming election?”. These questions are about a number (the mean or, in particular, a percentage) that describes the population, and they are answered on the basis of a sample. That is, we estimate a parameter on the basis of a statistic, as defined in earlier chapters.

A level C confidence interval (e.g., C = 95%) for a parameter has two parts:
* An interval calculated from the data
* A confidence level (or coefficient) C, which gives the probability that the method produces an interval containing the true parameter value

Recall
A 95% confidence interval is an interval calculated from sample data by a process that captures the true population parameter in 95% of all samples.

Thought Questions. . . (cont)
* What does it mean to say that the interval from 0.07 to 0.11 represents a 95% confidence interval for the proportion of adults in the US who have diabetes?
* Would a 99% confidence interval for the above proportion be wider or narrower than the 95% interval given?
What does common sense tell you? Explain.
* In a May 2006 Zogby America poll of 1000 adults, 70% said that past efforts to enforce immigration laws have been inadequate. Based on this poll, a 95% confidence interval for the proportion in the population who feel this way is about 67% to 73%. If this poll had been based on 5000 adults instead, would the 95% confidence interval be wider or narrower than the interval given? Explain.

Recall from previous chapters:
* Parameter: fixed, unknown number that describes the population
* Statistic: known value calculated from a sample; a statistic is used to estimate a parameter
* Sampling Variability: different samples from the same population may yield different values of the sample statistic; estimates from samples will be closer to the true values in the population if the samples are larger
* Margin of Error: in Chapter 3, a quick estimate was given by $1/\sqrt{n}$, where n is the sample size

More key words
* The amount by which the proportion obtained from the sample (p̂) will differ from the true population proportion (p) rarely exceeds the margin of error
* Sampling Distribution: tells what values a statistic takes and how often it takes those values in repeated sampling
* Sample proportions (p̂) from repeated sampling would have a normal distribution with a certain mean and standard deviation

Figure 21.1 Repeat many times the process of selecting an SRS of size n from a population in which the proportion p are successes. The values of the sample proportion of successes p̂ have this Normal sampling distribution.

Rule Conditions and Illustration
Take an SRS of size n from a large population that contains proportion p of successes. Let p̂ be the sample proportion of successes, [ i.e., p̂ = (count of successes in the sample)/n ].
If the sample size n is large enough, then the sampling distribution of p̂ is approximately normal:
* the mean of the sampling distribution is p
* the standard deviation of the sampling distribution is $\sqrt{p(1-p)/n}$

Ex1 & Ex2 Binge drinking: The sample proportion is p̂ = 279/2166 = 0.129. If we assume that p = 0.13 then sd = $\sqrt{(0.13)(0.87)/2166}$ = 0.0072. The 68-95-99.7 rule says that 95% of all samples of that size will yield a p̂ within the interval from p − 2 sd = 0.13 − 0.0144 = 0.1156 to p + 2 sd = 0.13 + 0.0144 = 0.1444.
* Problem: We do not actually know the true proportion p . . .

Figure 21.3 Repeated samples from the same population give different 95% confidence intervals, but 95% of these intervals capture the true population proportion p.

For n sufficiently large, p̂ is close to p.
* Sample proportion plus or minus two standard deviations of the sample proportion: $\hat{p} \pm 2\sqrt{p(1-p)/n}$
* Since we do not know the population proportion p (needed to calculate the standard deviation), we use the sample proportion p̂ in its place: $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$
* The margin of error is $2\sqrt{\hat{p}(1-\hat{p})/n} \le 1/\sqrt{n}$, the quick method of Chapter 3
* The formula for a “C-level (%) Confidence Interval” for the population proportion is $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z^{*}$ is the critical value of the standard normal distribution for confidence level C

Confidence Level C   Critical Value z*
50%                  0.67
60%                  0.84
68.3%*               1*
70%                  1.04
80%                  1.28
90%                  1.64
95%*                 1.96
95.4%                2*
99%                  2.58
99.7%*               3
99.9%                3.29

Figure 21.5 Critical values z* of the Normal distributions. In any Normal distribution, there is area (probability) C under the curve between -z* and z* standard deviations away from the mean.
Check table z-score
Confidence Interval: $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$

Empirical Rule
Figure 21.2 Repeat many times the process of selecting an SRS of size 2166 from a population in which the proportion p = 0.13 are successes. The middle 95% of the values of the sample proportion p̂ will lie between 0.1156 and 0.1444.

Formula for a 95% Confidence Interval for the Population Proportion. Margin of Error
Figure 21.4 Twenty-five samples from the same population give these 95% confidence intervals. In the long run, 95% of all such intervals cover the true population proportion, marked by the vertical line.

Binge drinking Ex5
A 99% confidence interval: The SRS of size n = 2166 yields p̂ = 279/2166 = 0.129, z* = 2.58 and $\sqrt{\hat{p}(1-\hat{p})/n}$ = 0.0072, so 0.129 ± (2.58)(0.0072) = 0.129 ± 0.0186, i.e., from 11.04% to 14.76%.

The Rule for Sample Means (Distribution of the mean)
The proportion is a particular “mean”: if a positive answer is valued 1 and a negative answer is valued 0, then the average value (or mean) is indeed the proportion, p for the whole population and p̂ for the SRS. For instance, we may phrase the question as “How strongly do you feel about this particular issue?” and take the answer as a percent, say from 0% to 100%. Analogously, we may ask about “something” that is measured in some natural unit, so that the answers are numerical values; these are called numeric random variables.
As n becomes large, the law of large numbers says that the average x̄ of an SRS approximates the mean µ of the whole population.
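The remark that a proportion is a particular mean, and that the sample mean settles near the population mean as n grows, can be illustrated by simulation. This is a minimal sketch and not part of the course material: the population proportion p = 0.13 is borrowed from the binge-drinking example, while the seed, the sample sizes, and the function name `sample_proportion` are arbitrary assumptions.

```python
import random

def sample_proportion(n, p, rng):
    """Mean of n answers coded 1 (yes, with probability p) or 0 (no)."""
    answers = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum(answers) / n

rng = random.Random(2013)  # fixed seed so the run is repeatable
p = 0.13                   # assumed population proportion
for n in (10, 100, 10_000, 100_000):
    print(n, round(sample_proportion(n, p, rng), 4))
```

Typically the printed values drift toward 0.13 as n increases, although the small samples may land noticeably away from it.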
The central limit theorem says that the sampling distribution of x̄ follows approximately (if n is large) a normal distribution, with mean µ and standard deviation sd = $\sigma/\sqrt{n}$, where σ is the standard deviation of the whole population.

Figure 21.6 The sampling distribution of the sample mean x̄ of 10 observations compared with the distribution of individual observations.

Some Simulations
Figure 21.7 The distribution of a sample mean x̄ becomes more Normal as the size of the sample increases. The distribution of individual observations (n = 1) is far from Normal. The distributions of means of 2, 10, and finally 25 observations move closer to the Normal shape.

Margin of error for the mean
The C-level (%) confidence interval for the population mean µ is given by
either $\bar{x} \pm z^{*}\,\sigma/\sqrt{n}$ or $\bar{x} \pm z^{*}\,s/\sqrt{n}$,
where $z^{*}$ is the critical value of the standard normal distribution for confidence level C. If the population standard deviation σ is unknown, then the sample standard deviation s is used.
* “We are 95% confident that the mean resting pulse rate for the population of all exercisers is between 62.8 and 69.2 bpm” (We feel that plausible values for the mean resting pulse rate of the population of exercisers are between 62.8 and 69.2.)
* “This does not mean that 95% of all people who exercise regularly will have resting pulse rates between 62.8 and 69.2 bpm”

What is the meaning of Confidence?
* A C-level (%) confidence means that the interval (calculated as above) comes from a method that captures the true (population) parameter (either the proportion p or the mean µ) in C% of all samples.
* In other words, e.g., take C = 68%: if you take 100 samples and with each of them you use the above formula to get a “confidence interval”, then approximately 68 of those samples will give confidence intervals containing the (true) parameter (either proportion or mean, of the whole population), i.e., about 68% of the “confidence intervals” contain the (true) population proportion or mean.
* In short, C is the chance (probability), judged before the data are seen, that the one sample (we took!) yields a confidence interval (calculated as above) containing the parameter.
* The C-level (%) confidence interval is calculated from (sample) data with the formula
either $p \pm z^{*}\sqrt{p(1-p)/n}$ or $\mu \pm z^{*}\,\sigma/\sqrt{n}$,
where either the population proportion p is replaced by the sample proportion p̂, or the population mean µ and standard deviation σ are replaced by the sample mean x̄ and standard deviation s (if necessary), and $z^{*}$ is the critical value of the standard normal distribution for confidence level C.
* Statistically (resting pulse rate example): 95% of all samples of size n = 29 from the population of exercisers should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the confidence intervals should contain the true population mean.

Inference (Ch23!)
The design of the data production matters. “Where do the data come from?” remains the first question to ask in any statistical study. Any inference method is intended for use in a specific setting. For our confidence interval and test for a proportion p:
1 The data must be a simple random sample (SRS) from the population of interest. When you use these methods, you are acting as if the data are an SRS. In practice, it is often not possible to actually choose an SRS from the population. Your conclusions may then be open to challenge.
2 These methods are not correct for sample designs more complex than an SRS, such as stratified samples.
There are other methods that fit these settings.
3 There is no correct method for inference from data haphazardly collected with bias of unknown size. Fancy formulas cannot rescue badly produced data.
4 Other sources of error, such as dropouts and nonresponse, are important. Remember that confidence intervals and tests use the data you give them and ignore these practical difficulties.

Inference (Ch23!)(cont)
Know how confidence intervals behave. A confidence interval estimates the unknown value of a parameter and also tells us how uncertain the estimate is. All confidence intervals share these behaviors:
1 The confidence level says how often the method catches the true parameter in very many uses. We never know whether this specific data set gives us an interval that contains the true parameter. All we can say is that “we got this result from a method that works 95% of the time.” This data set might be one of the 5% that produce an interval that misses the parameter. If that risk bothers you, use a 99% confidence interval.
2 High confidence is not free. A 99% confidence interval will be wider than a 95% confidence interval based on the same data. There is a trade-off between how closely we can pin down the parameter and how confident we are that we have caught the parameter.

Extra. . .
INFO: If the population standard deviation σ is unknown and the sample size n is small (e.g., n ≤ 30), then the critical values should be obtained from the “Student t distribution” instead of the normal distribution; such a value is called a Critical t* Value, and the number df = n − 1 is the Degrees of Freedom. This is generally ignored when estimating population proportions (as in this course). The following Table may be needed. . .
(http://www.math.wayne.edu/˜menaldi/teach/others/Sta1020/Table 21 1.pdf)

3 Larger samples give narrower intervals. If we want high confidence and a narrow interval, we must take a larger sample. The length of our confidence interval for p goes down in proportion to the square root of the sample size. To cut the interval in half, we must take four times as many observations. This is typical of many types of confidence interval.

Exercise Ch21
21.18 The quick method. The quick method of Chapter 3 (pages 42–43) uses $\hat{p} \pm 1/\sqrt{n}$ as a rough recipe for a 95% confidence interval for a population proportion. The margin of error from the quick method is a bit larger than needed. It differs most from the more accurate method of this chapter when p̂ is close to 0 or 1. An SRS of 500 motorcycle registrations finds that 68 of the motorcycles are Harley-Davidsons. Give a 95% confidence interval for the proportion of all motorcycles that are Harleys by the quick method and then by the method of this chapter. How much larger is the quick-method margin of error?

Extra. . . (cont)
Comment: As mentioned earlier, it is better to take an SRS with size n as large as possible. Now, what seems better to do with the data of an SRS of size n = 10,000: (a) consider this as what it is, a simple random sample of size n = 10,000, and calculate a 95%-level confidence interval, or (b) re-evaluate and consider your data as 10 SRS of size n = 1000, calculate 95%-level confidence intervals for each of your 10 SRS and then average those confidence intervals to get a final answer?
Discussion: The difference between (a) and (b) is not in collecting a different kind of data; the data are the same, simply arranged in two alternative ways, and comparable calculations are performed.
Questions: What basic argument (theory) is behind each procedure (a) and (b)?
When could “(b) be better than (a)” or “(a) be better than (b)”? How about 100 SRS of size n = 100 or 1000 SRS of size n = 10?

Exercise (answer) Ch21
**Answers
The quick method. By the quick method, the margin of error is $1/\sqrt{n}$, i.e., $1/\sqrt{500}$ = 0.0447. Because p̂ = 68/500 = 0.136, the margin of error from the method of this chapter is $z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, i.e., $2\sqrt{(0.136)(0.864)/500}$ = 0.0307 or $1.96\sqrt{(0.136)(0.864)/500}$ = 0.0300. The quick method margin of error is nearly 1.5 times larger than necessary.

Multiple choice Ch21
A recent Gallup Poll asked, “Do you consider the amount of federal income tax you have to pay as too high, about right, or too low?” 52% of the sample answered “Too high.” Gallup says that: “For results based on the sample of national adults (n=1,021) surveyed April 6-9, 2008, the margin of sampling error is 3 percentage points.”
1 The poll was carried out by telephone, so people without phones are always excluded from the sample. Any errors in the final result due to excluding people without phones
(a) are included in the announced margin of error.
(b) are in addition to the announced margin of error.
(c) can be ignored, because these people are not part of the population.
(d) can be ignored, because this is a nonsampling error.
Answer: (b)
2 If Gallup had used an SRS of size n = 1021 and obtained the sample proportion p̂ = 0.52, you can calculate that the margin of error for 95% confidence would be
(a) ±1.6 percentage points.
(b) ±0.05 percentage points.
(c) ±3.0 percentage points.
(d) ±3.1 percentage points.
Answer: (d)

Ch22 - What is a Test of Significance?
Previously . . . Ex1 Is the coffee fresh?
Chapter 22

Matched pairs Experiment: . . . Each of the 50 subjects tastes two unmarked cups of coffee and says which he or she prefers. One cup in each pair contains instant coffee and the other, fresh-brewed coffee. We find that 36 of our 50 subjects choose the fresh coffee, i.e., p̂ = 36/50 = 0.72.

The formula for a “C-level (%) Confidence Interval” for the population proportion is $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z^{*}$ is the critical value of the standard normal distribution for confidence level C, see Table 21.1.
At the 99% level we find z* = 2.58 and the Margin of Error is $\pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n} = \pm(2.58)\sqrt{(0.72)(1-0.72)/50}$ = ±0.164, so the Confidence Interval is from 0.72 − 0.16 = 0.56 to 0.72 + 0.16 = 0.88 (at 95% we find z* = 1.96, so MoE = ±0.124 and CI = [0.60, 0.84]).
• What is the rational argument for accepting or rejecting the claim that the population proportion is p = 0.5?
• What is the probability that the confidence interval [0.56, 0.88] captures the true population proportion p?

Ex1 Sampling distribution (Figure 22.2)

Ex1 Is the coffee fresh?
Matched pairs Experiment (as above): 36 of our 50 subjects choose the fresh coffee, i.e., p̂ = 36/50 = 0.72.
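The 99% and 95% intervals quoted for the coffee data can be reproduced numerically. The following is a hedged sketch, not the course's own procedure: the function name `proportion_ci` is hypothetical, and it takes z* from Python's `statistics.NormalDist` instead of the rounded entries of Table 21.1, so later decimals may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """Large-sample interval p_hat ± z* sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    # z* leaves area `level` between -z* and z* under the standard Normal curve
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for level in (0.95, 0.99):
    low, high = proportion_ci(36, 50, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

Rounded to two decimals, the endpoints match [0.60, 0.84] at 95% and [0.56, 0.88] at 99%.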
The claim. The skeptic claims that coffee drinkers cannot tell fresh from instant, so that only half will choose fresh-brewed coffee, i.e., the population proportion p is only 0.5. If this claim is true, the sampling distribution of p̂ is approx. normal with mean p = 0.5 and sd = $\sqrt{p(1-p)/n} = \sqrt{(0.5)(0.5)/50}$ = 0.0707.
The data. In our SRS we got p̂ = 0.72, i.e., 72%, but in another SRS we could find p̂ = 0.56, i.e., 56%, or any other value! Do we have evidence against the claim?
The Probability. We can measure the strength of the evidence against the claim by a probability, i.e., “What is the probability that a sample gives p̂ this large or larger if the truth about the population is that p = 0.5?”

Figure 22.2 The sampling distribution of the proportion of 50 coffee drinkers who prefer fresh-brewed coffee if the truth about all coffee drinkers is that 50% prefer fresh coffee. The shaded area is the probability that the sample proportion is 56% or greater.

Ex1 Is the coffee fresh? (cont)
Our sample actually gave p̂ = 0.72, and the probability of getting a sample outcome this large (or larger) is only 0.001, i.e., this may happen just by chance about 1 time out of 1000. We may declare this good evidence that the claim is false. If our p̂ were equal to 0.56 then our probability would be 0.2, i.e., 2 out of 10 times, which is not really evidence to reject the claim. Be sure to understand why the first result is convincing evidence.
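The two probabilities mentioned for the coffee example (about 0.001 for p̂ = 0.72 and about 0.2 for p̂ = 0.56) can be checked with the Normal approximation described above. This is an illustrative sketch under that approximation; the function name `upper_tail_p_value` is an assumption introduced here, not a name from the course.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(p_hat, p0, n):
    """P(sample proportion >= p_hat) when p = p0, by the Normal approximation."""
    sd = sqrt(p0 * (1 - p0) / n)   # standard deviation under the claim
    z = (p_hat - p0) / sd          # standard score of the observed proportion
    return z, 1 - NormalDist().cdf(z)

for p_hat in (0.72, 0.56):
    z, p_value = upper_tail_p_value(p_hat, 0.5, 50)
    print(p_hat, round(z, 2), round(p_value, 3))
```

A sample proportion of 0.72 sits a little over 3 standard deviations above 0.5, which is why its tail probability is so small, while 0.56 is less than 1 standard deviation away.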
There are two possible explanations of the fact that 72% of our subjects prefer fresh to instant coffee:
* The skeptic is correct (p = 0.5), and by bad luck a very unlikely outcome occurred
* In fact, the population proportion is greater than 0.5 (p > 0.5), so that the outcome is about what would be expected

Thought Questions. . .
The defendant in a court case is either guilty or innocent. Which of these is assumed to be true when the case begins? The jury looks at the evidence presented and makes a decision about which of these two options appears more plausible. Depending on this decision, what are the two types of errors that could be made by the jury? Which is more serious?
.......................................................................
Suppose 60% (0.60) of the population are in favor of new tax legislation. A random sample of 265 people results in 175, or 66%, who are in favor. From the Rule for Sample Proportions, we know the potential sample proportions in this situation follow an approximately normal distribution, with a mean of 0.60 and a standard deviation of 0.03. Find the standard score for the observed value of 0.66; then find the probability of observing a standard score at least that large or larger.

Thought Questions. . . (cont)
Sampling distribution of p̂: mean = 0.60, standard deviation = 0.03
z = (0.66 − 0.60)/0.03 = 2.0
P-value = 1 − 0.9773 = 0.0227
(Figure: Normal curve centered at p = 0.60, with the observed p̂ = 0.66 marked.)

Suppose that in the previous question we do not know for sure that the proportion of the population who favor the new tax legislation is 60%. Instead, this is just the claim of a politician.
From the data collected, we have discovered that if the claim is true, then the sample proportion observed falls at the 97.73rd percentile (about the 98th percentile) of possible sample proportions for that sample size. Should we believe the claim and conclude that we just observed strange data, or should we reject the claim? What if the result fell at the 85th percentile? At the 99.99th percentile?

Hypotheses and P-values
* A test of significance begins by supposing that the effect we seek is not present. Then we look for “statistical” evidence against this supposition and in favor of the effect we hope to find.
* The claim being tested in a statistical test is called the null hypothesis H0. The test is designed to assess the strength of the evidence against the null hypothesis. Usually, the null hypothesis is a statement of “no effect” or “no difference”, which is translated into a statement about the proportion p (or the mean µ, or the standard deviation σ) of an entire population.
* What we hope or suspect is true instead of H0 is called the alternative hypothesis Ha.
* The probability, computed assuming that H0 is true, that the SRS outcome would be as extreme or more extreme than the actual observed outcome is called the P-value of the test. The smaller the P-value is, the stronger is the evidence against H0.
* Typical examples are (H0 : p = p0) with one of (Ha : p ≠ p0), (Ha : p > p0) or (Ha : p < p0) as the alternative hypothesis. For instance, in Ex1, we used (H0 : p = 0.5) with (Ha : p > 0.5), because we have discarded the possibility (Ha : p < 0.5) a priori.

Ex2 Count Buffon’s coin
In Ex2, the French naturalist Count Buffon tossed a coin 4040 times and got 2048 heads, i.e., the sample proportion is p̂ = 2048/4040 = 0.507. We ask: “Is this evidence that Buffon’s coin was not balanced?”
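One way to get a feeling for Buffon's question is to let a computer repeat his experiment with a coin known to be balanced and count how often the result lands at least as far from 50% heads as 2048 out of 4040. The sketch below does this; the function name `simulate_p_value`, the seed, and the 5000 repetitions are assumptions chosen for illustration, and counting deviations in both directions reflects the reading that "not balanced" could mean too many or too few heads.

```python
import random

def simulate_p_value(n_tosses, observed_heads, reps, rng):
    """Share of balanced-coin experiments at least as far from n/2 as observed."""
    center = n_tosses / 2
    gap = abs(observed_heads - center)
    extreme = 0
    for _ in range(reps):
        # each bit of a random n_tosses-bit integer plays the role of one fair toss
        heads = bin(rng.getrandbits(n_tosses)).count("1")
        if abs(heads - center) >= gap:
            extreme += 1
    return extreme / reps

rng = random.Random(4040)  # arbitrary seed for repeatability
print(simulate_p_value(4040, 2048, 5000, rng))
```

The estimate varies a little from run to run and with the seed, but it should come out as a sizeable fraction, far from the small values that would count as evidence against a balanced coin.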
We translate this into a null hypothesis (H0 : p = 0.5) and the alternative hypothesis (Ha : p ≠ 0.5). If the null hypothesis is true then p = 0.5 and the standard deviation of the sample proportion is sd = $\sqrt{p(1-p)/n} = \sqrt{(0.5)(0.5)/4040}$ = 0.00787.
** Now, for p̂ = 0.507 we get a P-value of 0.37, i.e., a truly balanced coin would give a result this far or farther from 0.5 in 37% of all repetitions of Buffon’s trial. This test gives no reason to think that his coin was not balanced.

Count Buffon’s coin (cont)
Figure 22.3 The sampling distribution of the proportion of heads in 4040 tosses of a balanced coin. Count Buffon’s result, proportion 0.507 heads, is marked.

* The P-value or observed significance level of a test of hypotheses is the smallest value of α (the significance level) for which H0 (the null hypothesis) can be rejected. The P-value measures the strength of evidence against H0.
* “Significant” in the statistical sense does not mean “important”; it means not likely to happen just by chance.
* If the P-value is as small as or smaller than α, we say that the data are statistically significant at the level α.

Ex3 Testing Coffee
Use a table (and your logic) to find the P-value. For Ex3 (Testing Coffee) the null hypothesis is (H0 : p = 0.5) and the alternative hypothesis is (Ha : p > 0.5). If the null hypothesis is true then p̂ follows (approx.) a Normal distribution with mean 0.5 and standard deviation 0.0707. The data yield p̂ = 0.72, which gives a standard score z = (0.72 − 0.5)/0.0707 = 3.1, and the table (check table here!) gives a P-value of 0.001. Since the P-value is small, these data provide very strong evidence that a majority of the population prefers fresh coffee.

Figure 22.4 The P-value for testing whether Count Buffon’s coin was balanced.
This is the probability, calculated assuming a balanced coin, of a sample proportion as far or farther from 0.5 as Buffon’s result of 0.507.

P-values
* When the alternative hypothesis includes a greater than symbol (Ha : p > p0), the P-value is the probability of getting a value as large or larger than the observed test statistic (z) value: look up the percentile for the value of z in the standard normal table (Table B); the P-value is 1 minus this probability.
* When the alternative hypothesis includes a less than symbol (Ha : p < p0), the P-value is the probability of getting a value as small or smaller than the observed test statistic (z) value: look up the percentile for the value of z in the standard normal table (Table B); the P-value is this probability.
* When the alternative hypothesis includes a not equal to symbol (Ha : p ≠ p0), the P-value is found as follows: make the value of the observed test statistic (z) positive (absolute value), look up the percentile for this positive value of z in the standard normal table (Table B), find 1 minus this probability, and double the answer to get the P-value.

P-values (alt)
Alternative Method for P-value
1. Make the value of the observed test statistic (z) negative
2. Look up the percentile for this negative value of z in the standard normal table (Table B)

Ex4 Checkbook
The National Assessment of Adult Literacy (NAAL) survey indicates that a score of 289 or higher on its quantitative test reflects skills that include those needed to balance a checkbook. An SRS of size n = 2001 of young men (aged 19 to 24) had mean score x̄ = 279, with a standard deviation s = 103.
The pessimist’s claim is that the mean NAAL score is less than 289. That is our alternative hypothesis (why not the H0?), the statement we seek evidence for.
Thus (H0 : µ = 289) and (Ha : µ < 289). Now
* If the null hypothesis is true, µ = 289, then the sample mean x̄ follows (approx.) a Normal distribution with mean µ = 289 and standard deviation $\sigma/\sqrt{n}$; approximating the unknown σ with s, we find $s/\sqrt{n} = 103/\sqrt{2001}$ = 2.3.
* The data gave x̄ = 279, which yields a standard score z = (279 − 289)/(2.3) = −4.35, and so the P-value is equal to 0.0000068, very small.
* Hence, our conclusion is to reject the null hypothesis, i.e., these data give strong evidence that the mean score for all young men (aged 19 to 24) is below the level that includes the skills necessary to balance a checkbook.

Checkbook (cont)
Figure 22.5 The P-value is 0.0000068, for a one-sided test when the standard score for the sample mean is −4.35.

P-values (alt), continued
* If the alternative hypothesis includes a greater than (Ha : p > p0) or less than (Ha : p < p0) symbol, the P-value is the probability found as “percentile” in step 2
* If the alternative hypothesis includes a not equal to (Ha : p ≠ p0) symbol, double the probability found as “percentile” in step 2 to get the P-value
* There are other ways: use your logic and the fact that the total area under any distribution must be equal to 1

Ex5: Executives’ blood pressures: n = 72, x̄ = 126.1, s = 15.2, (H0 : µ = 128) with (Ha : µ ≠ 128), $s/\sqrt{n} = 15.2/\sqrt{72}$ = 1.79

Procedure
The Five Steps of Hypothesis Testing
1 Determining the Two Hypotheses
2 Computing the Sampling Distribution
3 Collecting and Summarizing the Data (calculating the observed test statistic)
4 Determining How Unlikely the Test Statistic is if the Null Hypothesis is True (calculating the P-value)
5 Making a Decision/Conclusion (based on the P-value, is the result statistically significant?)
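The steps listed above can be sketched in code for a test about a mean, using the numbers of Ex4 (checkbook) and Ex5 (executives' blood pressures). This is a minimal illustration under the Normal approximation with s standing in for σ; the function name `mean_z_test` and the string labels for the alternative are assumptions made here. Because the code does not round s/√n to 2.3, its standard score for Ex4 differs slightly from the −4.35 obtained by hand.

```python
from math import sqrt
from statistics import NormalDist

def mean_z_test(x_bar, s, n, mu0, alternative):
    """Standard score and P-value for H0: mu = mu0, with s in place of sigma."""
    z = (x_bar - mu0) / (s / sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "less":        # Ha: mu < mu0
        p_value = cdf(z)
    elif alternative == "greater":   # Ha: mu > mu0
        p_value = 1 - cdf(z)
    else:                            # Ha: mu != mu0 (two-sided)
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Ex4 (checkbook): Ha: mu < 289
print(mean_z_test(279, 103, 2001, 289, "less"))
# Ex5 (executives' blood pressures): Ha: mu != 128
print(mean_z_test(126.1, 15.2, 72, 128, "two-sided"))
```

The final step, the decision, is then a comparison of the returned P-value with whatever cut-off has been chosen for the test.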
Ex5 (cont): The P-value is 0.289, for a two-sided test when the standard score for the sample mean is (126.1 − 128)/(1.79) = −1.06.

Procedure (cont)
Possible Null Hypothesis H0: population parameter equals some value, status quo, no relationship, no change, no difference in two groups, etc.
The logical Alternative Hypothesis Ha is “NOT H0”.
Null: (H0 : p = p0)
Alternative, one-sided: (Ha : p > p0) or (Ha : p < p0), when one of these possibilities is discarded beforehand
Alternative, two-sided: (Ha : p ≠ p0)
Sampling Distribution for Proportions: If numerous simple random samples of size n are taken, the sample proportions p̂ from the various samples will have an approximately normal distribution with mean equal to p (the population proportion) and standard deviation equal to sd = $\sqrt{p(1-p)/n}$. Since we assume the null hypothesis is true, we replace p with p0 to complete the test.
To determine if the observed proportion is unlikely to have occurred under the assumption that H0 is true, we must first convert the observed value to a standard score z = (p̂ − p0)/sd.
Now it’s your turn . . . Read Case Study evaluated

Decision
We find the P-value associated with the standard score obtained from the data.
* If we think the P-value is too low to believe the observed test statistic is obtained by chance only, then we would reject chance (reject the null hypothesis) and conclude that a statistically significant relationship exists (accept the alternative hypothesis)
* Otherwise, we fail to reject chance and do not reject the null hypothesis of no relationship (result not statistically significant)
Commonly, P-values less than 0.05 are considered to be small enough to reject chance (reject the null hypothesis). However, some researchers use 0.10 or 0.01 as the cut-off instead of 0.05.
The “cut-off” value is typically referred to as the significance level α of the test.

The P-value is not the probability that the null hypothesis is true: it is the probability, computed assuming the null hypothesis is true, of data at least as extreme as those observed. Because our objective is to reject the null hypothesis (i.e., to disprove it), small P-values are desired.

A Survey

Parental Discipline: Nationwide random telephone survey of 1,250 adults, of whom 474 respondents had children under 18 living at home. The results on behavior are based on the smaller sample; the reported margins of error were “3% for the full sample” and “5% for the smaller sample”.

“The 1994 survey marks the first time a majority of parents reported not having physically disciplined their children in the previous year. Figures over the past six years show a steady decline in physical punishment, from a peak of 64 percent in 1988”.

The 1994 sample proportion who did not spank or hit was 51%.

Question: Is this evidence that a majority of the population did not spank or hit?

Null: The proportion of parents who physically disciplined their children in the previous year is the same as the proportion p of parents who did not physically discipline their children, i.e., (H0: p = 0.5)

Alt: A majority of parents did not physically discipline their children in the previous year, i.e., (Ha: p > 0.5)

A Survey (cont)

Based on the sample:
* Sample size n = 474 (large, so proportions follow a normal distribution)
* No physical discipline: 51%, i.e., p̂ = 0.51
* s.d. of p̂ is √((0.50)(1 − 0.50)/474) = 0.023 (recall we assume H0: p = 0.5 is true)
* Standard score z = (0.51 − 0.50)/0.023 = 0.43
* Table B (nearest entry, 0.4) ↦ (65.54%), so the P-value is 1 − 0.6554 = 0.3446

Since the P-value (0.3446) is not small, we cannot reject chance as the reason for the difference between the observed proportion (0.51) and the (null) hypothesized proportion (0.50). We do not find the result to be statistically significant at α = 0.01 (or even 0.05 or 0.10). We fail to reject the null hypothesis. It is plausible that there was not a majority (over 50%) of parents who refrained from using physical discipline.

Errors

Hypothesis Testing: Significance level (α) and Power (1 − β)

Decisions          Reject H0            Accept H0
H0 is correct      Type I Error (α)     Correct (1 − α)
H0 is incorrect    Correct (1 − β)      Type II Error (β)

Type I: If we decide there is a relationship in the population (reject the null hypothesis)
* This is an incorrect decision only if the null hypothesis is true
* The probability of this incorrect decision is equal to the cut-off α for the P-value

Type II: If we decide not to reject chance and thus allow for the plausibility of the null hypothesis (complicated to estimate!)
* This is an incorrect decision only if the alternative hypothesis is true
* The probability of this incorrect decision depends on (a) the magnitude of the true relationship, (b) the sample size, (c) the cut-off for the P-value.

Mean

The population proportion p could be replaced by a population mean µ when setting up the two hypotheses.

* Null: (H0: µ = µ0)
* Alternative, one-sided: (Ha: µ > µ0) or (Ha: µ < µ0), where one of these two possibilities is ruled out beforehand as a fact
* Alternative, two-sided: (Ha: µ ≠ µ0)

As before, if numerous simple random samples of size n are taken, the sample means from the various samples will have an approximately normal distribution with mean equal to µ (the population mean) and standard deviation equal to sd = σ/√n. Here we approximate the population standard deviation σ with the sample standard deviation s (note the factor 1/√n between s and the standard deviation sd of the sampling distribution of the sample means).

Tomato plants

A study showed that the difference in sample means for the heights of tomato plants when using a nutrient-rich potting soil versus using ordinary top soil was 6.82 inches. The corresponding standard deviation (of the sampling distribution of the mean difference) was 3.10 inches. Suppose the means are actually equal, so that the mean difference in heights for the populations is actually zero.
* What is the standard score (z) corresponding to the observed difference of 6.82 inches?
* How often would you expect to see a standardized score that large or larger?
.......................................................................
[standard score]: [(sample mean diff.) − (population mean diff.)] divided by [standard deviation of the mean difference], i.e., z = (6.82 − 0)/3.10 = 2.2

This is the 98.61 percentile for a standard normal curve, so the probability of seeing a z-value this large or larger is 1.39% (i.e., 0.0139).

Bacteria

One of the conclusions made by researchers from a study comparing the amount of bacteria in carpeted and uncarpeted rooms was, “The average difference [in mean bacteria colonies per cubic foot] was 3.48 colonies [95% Confidence Interval: between (−2.72) and (9.68), and P-value: (0.29)].”
* What are the null and alternative hypotheses being tested here?
* Is there a statistically significant difference between the means of the two groups?
......................................................................
H0: The mean number of bacteria for carpeted rooms is equal to the mean number of bacteria for uncarpeted rooms.
Ha: The mean number of bacteria for carpeted rooms is different from the mean number of bacteria for uncarpeted rooms.

The P-value is large (> 0.05), so there is not a significant difference (fail to reject the null hypothesis). (Note that the confidence interval for the difference contains 0.)

Inference (Ch23!)

Know what statistical significance says. Many statistical studies hope to show that some claim is true. A clinical trial compares a new drug with a standard drug because the doctors hope that patients given the new drug will do better. A psychologist studying gender differences suspects that women will do better than men (on the average) on a test that measures social-networking skills. The purpose of significance tests is to weigh the evidence that the data give in favor of such claims. That is, a test helps us know if we found what we were looking for.

To do this, we ask what would happen if the claim were not true. That’s the null hypothesis (no difference between the two drugs, no difference between women and men). A significance test answers only one question: “How strong is the evidence that the null hypothesis is not true?” A test answers this question by giving a P-value. The P-value tells us how unlikely data as or more extreme than ours (in the sense of providing evidence against the null hypothesis) would be if the null hypothesis were true. Data that are very unlikely are good evidence that the null hypothesis is not true. We usually don’t know whether the hypothesis is true for this specific population.
All we can say is that “data as or more extreme than these would occur only 5% of the time if the hypothesis were true.” This kind of indirect evidence against the null hypothesis (and for the effect we hope to find) is less straightforward than a confidence interval.

Know what your methods require. Significance tests and confidence intervals for a proportion p require that the population be much larger than the sample. They also require that the sample itself be reasonably large so that the sampling distribution of the sample proportion p̂ is close to Normal. We have said little about the specifics of these requirements because the reasoning of inference is more important. Just as there are inference methods that fit stratified samples, there are methods that fit small samples and small populations.

INFO: Sometimes, the alternative hypothesis Ha is denoted by H1.

Example: Finding Sample Size Required to Achieve 80% Power. Here is a statement similar to the one in an article from the Journal of the American Medical Association: “The trial design assumed that with a 0.05 significance level, 153 randomly selected subjects would be needed to achieve 80% power to detect a reduction in the coronary heart disease rate from 0.5 to 0.4.”

Before conducting the experiment, the researchers selected a significance level of 0.05 and a power of at least 80%. They also decided that a reduction in the proportion of coronary heart disease from 0.5 to 0.4 is an important difference that they want to detect (by correctly rejecting the false null hypothesis). Using a significance level of 0.05, power 0.80, and the alternative proportion of 0.4, we deduce that the required minimum sample size is 153.

Related to the power of a test: check Wikipedia.

Exercise Ch22

22.26 Do chemists have more girls?
Some people think that chemists are more likely than other parents to have female children. (Perhaps chemists are exposed to something in their laboratories that affects the sex of their children.) The Washington State Department of Health lists the parents’ occupations on birth certificates. Between 1980 and 1990, 555 children were born to fathers who were chemists. Of these births, 273 were girls. During this period, 48.8% of all births in Washington State were girls. Is there evidence that the proportion of girls born to chemists is higher than the state proportion?

Exercise (answer) Ch22

**Answers: Do chemists have more girls? Our hypotheses are H0: p = 0.488 and Ha: p > 0.488, where p is the proportion of girls among children born to chemists. If the null hypothesis is true, then the proportion of girls in an SRS of n = 555 chemists’ children would have (approximately) a normal distribution with mean p = p0 = 0.488 and standard deviation

√(p(1 − p)/n) = √((0.488)(0.512)/555) = 0.02122

Our sample had p̂ = 273/555 = 0.4919, for which the standard score is

z = (p̂ − p0)/0.02122 = (0.4919 − 0.488)/0.02122 = 0.18

From Table B [> find (0.2) ↦ (57.93%) and (0.1) ↦ (53.98%), so take percentile 57.93% and 1 − 0.5793 = 0.4207 <], we estimate the P-value to be about 0.42 (calculator/better table output gives P = 0.4272). Thus, we cannot reject the null hypothesis (not enough evidence!)

Multiple choice Ch22

If the value of the standard test statistic z is 2.5 then
(a) we should use a different null hypothesis.
(b) we reject the null hypothesis at the 5% significance level.
(c) we fail to reject the null hypothesis at the 5% significance level.
(d) we reject the alternative hypothesis at the 5% significance level.
Answer: (b)
.......................................................................
If a significance test gives a P-value of 0.50 then
(a) the margin of error is 0.50.
(b) the null hypothesis is very likely to be true.
(c) we do not have good evidence against the null hypothesis.
(d) we do have good evidence against the null hypothesis.
Answer: (c)

Chapter 24 - Two-way Tables and the Chi-Square Test

Ex1 & Ex2 Two-way Tables

A university offers only two degree programs, one in electrical engineering and one in English.
* Admission Status is the row variable
* Gender is the column variable

          Male   Female   Total
Admit      35      20       55
Deny       45      40       85
Total      80      60      140

* (% male) = 80/140 = 0.57, i.e., 57%
* (% female) = 60/140 = 0.43, i.e., 43%

Discrimination in admission? Because there are only two categories of admission status, we can see the relation between gender and admission status by comparing
* (percentage male applicants admitted) = 35/80 = 0.44, i.e., 44%
* (percentage female applicants admitted) = 20/60 = 0.33, i.e., 33%

Thought Questions. . .

A random sample of registered voters were asked whether they preferred balancing the budget or cutting taxes. Each was then categorized as being either a Democrat or a Republican.
Of the 30 Democrats, 12 preferred cutting taxes, while of the 40 Republicans, 24 preferred cutting taxes. How would you display the data in a table?
....................................................................................

                      Democrats   Republican   Total
Prefer Tax Cutting        12          24         36
Do not Prefer TxC         18          16         34
Total                     30          40         70

To describe relationships among categorical variables, calculate appropriate percentages from the counts given.

When there are two categorical variables, the data are summarized in a two-way table:
* each row represents a value of the row variable
* each column represents a value of the column variable
* The number of observations falling into each combination of categories is entered into each cell of the table.
* Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table (this prevents misleading comparisons due to unequal sample sizes for different groups).

Ex3 Treating cocaine addiction

A three-year study compared an antidepressant (desipramine) with lithium and a placebo. 72 subjects were randomly divided into 3 groups (each having 24 subjects), one group for each treatment.

Group   Treatment     Subjects   Successes   Percent
1       Desipramine      24         14        58.3%
2       Lithium          24          6        25.0%
3       Placebo          24          4        16.7%

Are these data good evidence that there is a relationship between treatment and outcome in the population of all cocaine addicts?
To answer this question we begin with a two-way table.

               Success   Failure   Total
Desipramine      14        10       24
Lithium           6        18       24
Placebo           4        20       24

Ex3 Bar graph . . .

Figure 24.1 Bar graph comparing the success rates of three treatments for cocaine addiction

Chi-square test

Our null hypothesis takes the form

H0: There is no association between treatment and success in the population of all cocaine addicts

In a two-way table, when H0 is true we compute

(expected count) = (row total) × (column total) / (table total)

e.g., the expected count of successes in the desipramine group is (24)(24)/72 = 8; namely, if the null hypothesis of no treatment differences is true, then we expect 8 of the 24 desipramine subjects to succeed.

The chi-square statistic, denoted by χ², is a measure of how far the observed counts in a two-way table are from the expected counts:

χ² = Σ [(observed count) − (expected count)]² / (expected count)

where Σ means “sum over all cells in the table”.

Ex4 Cocaine addiction (cont)

Here are the observed and expected counts.

               Observed             Expected
               Success   Failure    Success   Failure
Desipramine      14        10          8        16
Lithium           6        18          8        16
Placebo           4        20          8        16

Finding the chi-square statistic means adding 6 terms for the 6 cells in the two-way table (note that all “failure” values are obtainable from the “success” values):

χ² = (14 − 8)²/8 + (10 − 16)²/16 + ··· + (4 − 8)²/8 + (20 − 16)²/16
   = 4.50 + 2.25 + ··· + 2.00 + 1.00 = 10.50

Now it’s your turn: Smoking and survival . . .

Chi-square distribution

The chi-square statistic is a measure of the distance of the observed counts from the expected counts. It
* is always zero or positive, with a distribution skewed to the right
* is only zero when the observed counts are exactly equal to the expected counts
* large values of χ² are evidence against H0 because these would show that the observed counts are far from what would be expected if H0 were true

** In a two-way table the chi-square test is one-sided (any violation of H0 produces a large value of χ²)

Chi-square table

Figure 24.2 The density curves for three members of the chi-square family of distributions. The sampling distributions of chi-square statistics belong to this family.

A specific χ² distribution is determined by its degrees of freedom (df for short), computed as df = (r − 1)(c − 1) for a two-way table with r rows and c columns.

** There are r × c terms in the Σ, where r = (number of rows), c = (number of columns)

Ex5 Using chi-square test

Back to Ex3, the two-way table has 3 treatments and 2 outcomes, i.e., it has r = 3 rows and c = 2 columns. Thus, the chi-square statistic has (3 − 1)(2 − 1) = 2 degrees of freedom.

From the Σ we found χ² = 10.5, so we look in Table 24.1 for df = 2 to find the critical value 9.21 required for significance at the α = 0.01 level, and 13.82 for α = 0.001.

Hence, the cocaine study shows a significant relationship (P < 0.01) between treatment and success.

Conclusion: We found strong evidence of some association between treatment and success, and by looking at the two-way table, we see that desipramine works better than the other treatments.

NOTE: You can safely use the chi-square test when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater.

Ex6 . . . heart disease?

People who get angry easily tend to have more heart disease. . . . 8474 people . . . coronary heart disease (CHD)

Anger Score    Low     Moderate   High
Sample size    3110    4731       633
CHD count      53      110        27
CHD percent    1.7%    2.3%       4.3%
....................................................................................
The first step is to write the data as a two-way table, by adding the count of subjects who did not suffer from heart disease.

             Low     Moderate   High    Total
CHD count      53      110       27      190
No CHD       3057     4621      606     8284
Total        3110     4731      633     8474
....................................................................................
The chi-square method tests these hypotheses:
H0: no relationship between anger and CHD
Ha: some relationship between anger and CHD

There are r = 2 rows and c = 3 columns, so df = (2 − 1)(3 − 1) = 2

Ex6 . . . heart disease? (cont)

Find the expected cell counts, e.g., the expected count of high-anger people with CHD is

(expected count) = (row 1 total) × (column 3 total) / (table total) = (190)(633)/8474 = 14.19

             Observed                     Expected
             Low     Moderate   High      Low        Moderate   High
CHD count      53      110       27         69.73     106.08     14.19
No CHD       3057     4621      606       3040.27    4624.92    618.81

** It is safe to apply the chi-square test since all expected cell counts are greater than 5, so

χ² = (53 − 69.73)²/69.73 + (110 − 106.08)²/106.08 + ··· + (4621 − 4624.92)²/4624.92 + (606 − 618.81)²/618.81
   = 4.014 + 0.145 + ··· + 0.003 + 0.264 = 16.083

** For df = 2 in Table 24.1, χ² = 16.083 is larger than the critical value 13.82 for α = 0.001. We have highly significant evidence (P < 0.001) that anger and heart disease are related. Statistical software can give the actual P-value of P = 0.0003.

Ex7 Discrimination in admissions?

** The effects of lurking variables can change and even reverse the relationship between two variables

Ex7: Discrimination in admissions? Go back to Ex1. We suspect discrimination against women. From the two-way table we found
* (percentage male applicants admitted) = 35/80 = 0.44, i.e., 44%
* (percentage female applicants admitted) = 20/60 = 0.33, i.e., 33%

In its defense, the University produces a three-way table.

            Engineering        English            Combined
            Male    Female     Male    Female     Male    Female
Admit        30       10         5       10        35       20
Deny         30       10        15       30        45       40
Total        60       20        20       40        80       60
% Admit      50%      50%       25%      25%       44%      33%

Simpson’s paradox

** Simpson’s paradox: An association or comparison that holds for all of several groups can disappear or even reverse direction when the data are combined to form a single group. This is just an extreme form of the fact that observed associations can be misleading when there are lurking variables . . .

** Summary: Make a two-way table to display the relationship between two categorical variables

Exercise Ch24

24.5 Smoking by students and their families. How are the smoking habits of students related to the smoking habits of their close family members?
Here is a two-way table from a survey of male students in six secondary schools in Malaysia:

                          At least one close       No close family
                          family member smokes     member smokes
Student smokes                    115                    25
Student does not smoke            207                    75

Write a brief answer to the question posed, including a comparison of selected percentages. Conclude by using the P-value (critical value) of the chi-square statistic.

* Read Ex9: Discrimination in mortgage lending?
* Case Study Evaluated
* Chi-square Table and Tables 21.1 & 24.1

Exercise (answer) Ch24

**Answers: The table below shows the percent of male students who smoke within each category of family member smoking status.

At least one close family member smokes    115/322 = 35.7%
No close family member smokes              25/100 = 25%

In our sample, male students with at least one close family member who smokes are more likely to smoke than are male students with no close family member who smokes.

Multiple choice Ch24

Which of these is an example of Simpson’s paradox?
(a) Teachers’ salaries and sales of alcoholic beverages have risen together over time, but paying teachers more does not cause higher alcohol sales.
(b) Alaska Air has a lower percent of late flights than America West at every airport, but America West has a lower percent when we combine all airports.
(c) The percent of surgery patients given Anesthetic A who die is higher than the percent for Anesthetic B, but this is because A is used in more serious surgeries.
(d) States in which a smaller percent of students take the SAT exam have higher median scores on the SAT.
Answer: (b)
....................................................................................
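For Exercise 24.5 above, the chi-square part of the answer can be checked with a short Python sketch. It is an illustration only: the function name chi_square is hypothetical, and the code simply applies the formulas (expected count) = (row total) × (column total)/(table total) and χ² = Σ (observed − expected)²/expected to a table of counts given as a list of rows.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts)
    for a two-way table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)

    # expected count = (row total) x (column total) / (table total)
    expected = [[rt * ct / table_total for ct in col_totals] for rt in row_totals]

    # sum over all r x c cells of (observed - expected)^2 / expected
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Exercise 24.5: rows = student smokes / does not smoke,
# columns = at least one close family member smokes / none smokes
smoking = [[115, 25], [207, 75]]
stat_smoking, df_smoking, _ = chi_square(smoking)   # about 3.95 with df = 1

# Ex3/Ex4 check: rows = desipramine, lithium, placebo; columns = success, failure
cocaine = [[14, 10], [6, 18], [4, 20]]
stat_cocaine, df_cocaine, exp_cocaine = chi_square(cocaine)  # 10.5 with df = 2

print(round(stat_smoking, 2), df_smoking, round(stat_cocaine, 2), df_cocaine)
```

With df = 1, the usual critical value at α = 0.05 is 3.84, so a statistic of about 3.95 is just significant at the 5% level (but not at 1%); this should be confirmed against Table 24.1.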
If surgical procedure A has a higher success rate than surgical procedure B in every hospital where they are used and yet procedure B has a higher overall success rate, then we suspect that:
(a) this is an example of Simpson’s paradox.
(b) it must be easier to achieve success at some hospitals than at others, whatever procedure is used.
(c) procedure B must be used predominantly in hospitals where it is easier to achieve success, while procedure A must be used predominantly where it is harder to achieve success.
(d) All of (a), (b), and (c) are true.
Answer: (d)
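The effect of a lurking variable can be seen directly in the Ex7 admissions counts. The Python sketch below is an illustration only (the names ADMISSIONS, rate_by_program and combined_rate are made up here): it computes the admission rate of each gender within each program and then after pooling the two programs.

```python
# Ex7 counts as (admitted, applicants) for each program and gender.
ADMISSIONS = {
    "Engineering": {"Male": (30, 60), "Female": (10, 20)},
    "English": {"Male": (5, 20), "Female": (10, 40)},
}


def rate_by_program(data, gender):
    """Admission rate of one gender within each program."""
    return {
        program: groups[gender][0] / groups[gender][1]
        for program, groups in data.items()
    }


def combined_rate(data, gender):
    """Admission rate of one gender after pooling all programs."""
    admitted = sum(groups[gender][0] for groups in data.values())
    applicants = sum(groups[gender][1] for groups in data.values())
    return admitted / applicants


male_rates = rate_by_program(ADMISSIONS, "Male")
female_rates = rate_by_program(ADMISSIONS, "Female")
male_all = combined_rate(ADMISSIONS, "Male")      # 35/80
female_all = combined_rate(ADMISSIONS, "Female")  # 20/60

print(male_rates, female_rates)
print(round(male_all, 2), round(female_all, 2))
```

Within each program the two rates coincide (50% in Engineering, 25% in English), yet the pooled rates differ (44% against 33%), because 40 of the 60 women applied to English, the program that is harder to get into. The program applied to is the lurking variable.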