Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 5 – Hypothesis Testing

Objectives of Statistical Analysis
[Figure: collage of example plots (pie chart of Percent vs Type; scatterplot and marginal plot of Listing vs IncomePC; histogram, boxplot, empirical CDF and probability plot of Listing)]
Estimation
  How long do hard drives last?
  What is the median income among the 99%ers?
Inference – hypothesis testing
  Did minorities pay higher mortgage rates during the housing boom?
  Is there a link between environmental factors and breast cancer on eastern Long Island?

General Frameworks
Parametric Tests: features of specific distributions such as the mean of a Bernoulli or normal distribution.
Specification Tests (Semiparametric):
  Do the data arrive from a Poisson process?
  Are the data normally distributed?
Nonparametric Tests: Are two discrete processes independent?

Hypotheses
Hypotheses - labels
  State 0 of Nature – Null Hypothesis
  State 1 – Alternative Hypothesis
Exclusive: Prob(H0 ∩ H1) = 0
Exhaustive: Prob(H0) + Prob(H1) = 1
Symmetric: Neither is intrinsically “preferred” – the objective of the study is only to support one or the other. (Rare?)

Testing Strategy
Before the investigation begins
  Prior beliefs: Prob(H0), Prob(H1)
  Prior odds:
\[
\frac{\text{Prob}(H_0)}{\text{Prob}(H_1)}
\]
Results of the investigation: likelihood of the observed data assuming H0 or H1
  Likelihood ratio:
\[
\frac{\text{Prob}(\text{data} \mid H_0)}{\text{Prob}(\text{data} \mid H_1)}
\]

Posterior (to the Evidence) Odds
\[
\text{Prob}(H_0 \mid \text{data}) = \frac{\text{Prob}(\text{data} \mid H_0)\,\text{Prob}(H_0)}{\text{Prob}(\text{data})},
\qquad
\text{Prob}(H_1 \mid \text{data}) = \frac{\text{Prob}(\text{data} \mid H_1)\,\text{Prob}(H_1)}{\text{Prob}(\text{data})}
\]
\[
\text{Posterior odds} = \frac{\text{Prob}(H_0 \mid \text{data})}{\text{Prob}(H_1 \mid \text{data})}
= \frac{\text{Prob}(\text{data} \mid H_0)\,\text{Prob}(H_0)}{\text{Prob}(\text{data} \mid H_1)\,\text{Prob}(H_1)}
= \frac{\text{Prob}(H_0)}{\text{Prob}(H_1)} \times \frac{\text{Prob}(\text{data} \mid H_0)}{\text{Prob}(\text{data} \mid H_1)}
= \text{Prior odds} \times \text{Likelihood ratio}
\]

Does the New Drug Work?
Hypotheses: H0: θ = .50, H1: θ = .75
Priors: P0 = .40, P1 = .60
Clinical Trial: N = 50, 31 patients “respond”, p = .62
Likelihoods:
  L0(31 | θ = .50) = Binomial(50, 31, .50) = .0270059
  L1(31 | θ = .75) = Binomial(50, 31, .75) = .0148156
Posterior odds in favor of H0 = (.4/.6)(.0270059/.0148156) = 1.2152 > 1
Priors favored H1 1.5 to 1, but the posterior odds favor H0, 1.2152 to 1. The evidence discredits H1 even though the ‘data’ seem more consistent with prior P1.

Decision Strategy
Prefer the hypothesis with the higher posterior odds
A gap in the theory: How does the investigator do the cost benefit test?
  Starting a new business venture or entering a new market: Priors and market research
  FDA approving a new drug or medical device: Priors and clinical trials
Statistical Decision Theory adds the costs and benefits of decisions and errors.

An Alternative Strategy
Recognize the asymmetry of null and alternative hypotheses.
Eliminate the prior odds (which are rarely formed or available).

http://query.nytimes.com/gst/fullpage.html?res=9C00E4DF113BF935A3575BC0A9649C8B63
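
The posterior-odds arithmetic in the drug example (N = 50, 31 responders, priors .40 and .60) can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `binomial_pmf` and `posterior_odds` are illustrative choices, and "Binomial(50, 31, θ)" is read here as the binomial probability of exactly 31 successes in 50 trials.

```python
from math import comb


def binomial_pmf(n: int, k: int, theta: float) -> float:
    """Probability of exactly k successes in n Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def posterior_odds(prior0: float, prior1: float, like0: float, like1: float) -> float:
    """Posterior odds in favor of H0 = prior odds x likelihood ratio."""
    return (prior0 / prior1) * (like0 / like1)


# Values from the clinical-trial example: N = 50, 31 responders.
like_h0 = binomial_pmf(50, 31, 0.50)  # likelihood under H0: theta = .50
like_h1 = binomial_pmf(50, 31, 0.75)  # likelihood under H1: theta = .75
odds = posterior_odds(0.40, 0.60, like_h0, like_h1)

print(f"L0 = {like_h0:.7f}, L1 = {like_h1:.7f}, posterior odds = {odds:.4f}")
```

Keeping the prior odds and the likelihood ratio as separate factors makes it easy to see how a different prior would move the conclusion while the evidence stays fixed.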

Classical Hypothesis Testing
The scientific method applied to statistical hypothesis testing
  Hypothesis: The world works according to my hypothesis
  Testing or supporting the hypothesis
    Data gathering
    Rejection of the hypothesis if the data are inconsistent with it
    Retention and exposure to further investigation if the data are consistent with the hypothesis
  Failure to reject is not equivalent to acceptance.

Asymmetric Hypotheses
Null Hypothesis: The proposed state of nature
Alternative hypothesis: The state of nature that is believed to prevail if the null is rejected.

Hypothesis Testing Strategy
Formulate the null hypothesis
Gather the evidence
Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
  Very unlikely: Reject the hypothesis
  Not unlikely: Do not reject. (Retain the hypothesis for continued scrutiny.)
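
The question "if my null hypothesis were true, how likely is this evidence?" can be made concrete with a small calculation. The sketch below reuses the clinical-trial numbers from the drug example (N = 50, 31 responders, null θ = .50). Two things are assumptions of this illustration rather than statements from the slides: evidence "at least this extreme" is taken to mean 31 or more responders, and the helper name `upper_tail_probability` is hypothetical.

```python
from math import comb


def upper_tail_probability(n: int, k: int, theta0: float) -> float:
    """P(X >= k) when X ~ Binomial(n, theta0), i.e. computed under the null."""
    return sum(
        comb(n, x) * theta0**x * (1.0 - theta0) ** (n - x)
        for x in range(k, n + 1)
    )


# How likely are 31 or more responders out of 50 if theta really is .50?
tail = upper_tail_probability(50, 31, 0.50)
print(f"P(31 or more responders | theta = .50) = {tail:.4f}")
```

A small value would count as "very unlikely" evidence against the null; where to draw that line is the subject of the slides on the rejection region.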

Some Terms of Art
Type I error: Incorrectly rejecting a true null
Type II error: Failure to reject a false null
Power of a test: Probability a test will correctly reject a false null
Alpha level: Probability that a test will incorrectly reject a true null. This is sometimes called the size of the test.
Significance Level: Probability that a test will retain a true null = 1 – alpha.
Rejection Region: Evidence that will lead to rejection of the null
Test statistic: Specific sample evidence used to test the hypothesis
Distribution of the test statistic under the null hypothesis: Probability model used to compute probability of rejecting the null. (Crucial to the testing strategy – how does the analyst assess the evidence?)
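
Several of these terms can be computed directly once a rejection region is fixed. The sketch below is an illustration under stated assumptions: it borrows the two simple hypotheses from the drug example (θ = .50 versus θ = .75, N = 50), and the rejection region "32 or more responders" is a hypothetical choice made here only to have something to evaluate. The function names are likewise illustrative.

```python
from math import comb


def binomial_tail(n: int, c: int, theta: float) -> float:
    """P(X >= c) for X ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1.0 - theta) ** (n - x)
        for x in range(c, n + 1)
    )


def size_and_power(n: int, c: int, theta0: float, theta1: float):
    """Return (alpha, power, type II error probability) for the region X >= c."""
    alpha = binomial_tail(n, c, theta0)  # P(reject | null true): Type I error
    power = binomial_tail(n, c, theta1)  # P(reject | alternative true)
    return alpha, power, 1.0 - power     # last entry: P(do not reject | null false)


# Hypothetical rejection region: reject H0 when 32 or more of 50 patients respond.
alpha, power, beta = size_and_power(50, 32, 0.50, 0.75)
print(f"alpha = {alpha:.4f}, power = {power:.4f}, Type II error prob = {beta:.4f}")
```

Moving the cutoff `c` up lowers alpha and lowers power at the same time, which is the trade-off between the two error types.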

Possible Errors in Testing

                                 Hypothesis is True    Hypothesis is False
I Do Not Reject the Hypothesis   Correct Decision      Type II Error
I Reject the Hypothesis          Type I Error          Correct Decision

A Legal Analogy: The Null Hypothesis is INNOCENT

                              Null Hypothesis: Not Guilty      Alternative Hypothesis: Guilty
Finding: Verdict Not Guilty   Correct Decision                 Type II Error:
                                                               Guilty defendant goes free
Finding: Verdict Guilty       Type I Error:                    Correct Decision
                              Innocent defendant is convicted

The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II in this setting.

(Jerzy) Neyman – (Egon) Pearson Methodology
“Statistical” testing
Methodology
  Formulate the “null” hypothesis
  Decide (in advance) what kinds of “evidence” (data) will lead to rejection of the null hypothesis, i.e., define the rejection region
  Gather the data
  Mechanically carry out the test.

Formulating the Null Hypothesis
Stating the hypothesis: A belief about the “state of nature”
  A parameter takes a particular value
  There is a relationship between variables
  And so on…
The null vs. the alternative
  By induction: If we wish to find evidence of something, first assume it is not true.
  Look for evidence that leads to rejection of the assumed hypothesis.
  Evidence that rejects the null hypothesis is significant.

Example: Credit Scoring Rule
Investigation: I believe that Fair Isaacs relies on home ownership in deciding whether to “accept” an application.
  Null hypothesis: There is no relationship
  Alternative hypothesis: They do use homeownership data.
What decision rule should I use?

Some Evidence

            Renters    Homeowners
Accepted    5469       5030
Rejected    1845       1100

Hypothesis Test
Acceptance rate for homeowners = 5030/(5030+1100) = .82055
Acceptance rate for renters is .74774
H0: Acceptance rate for renters is not less than for owners.
  H0: p(renters) ≥ .82055
  H1: p(renters) < .82055

The Rejection Region
What is the “rejection region”?
  Data (evidence) that are inconsistent with my hypothesis
Evidence is divided into two types:
  Data that are inconsistent with my hypothesis (the rejection region)
  Everything else

My Testing Procedure
I will reject H0 if the sample proportion p(renters) < .815 (chosen arbitrarily).
Rejection region is sample values of p(renters) < 0.815.

Distribution of the Test Statistic Under the Null Hypothesis
Test statistic: p(renters) = (1/N) Σi Accept_i, where Accept_i = 1 or 0
Use the central limit theorem:
  Assumed mean = .82055
  Implied standard deviation = √(.82055 × .17945 / 7314) = .00459
  Using the CLT, p(renters) is normally distributed. (N is very large.)
  Use z = (p(renters) − .82055) / .00459

Alpha Level and Rejection Region
Prob(Reject H0 | H0 true) = Prob[p < .815 | H0 is true]
  = Prob[(p − .82055)/.00459 < (.815 − .82055)/.00459]
  = Prob[z < −1.209] = .11333
This is the probability of a Type I error, which is the alpha level for this test.

Distribution of the Test Statistic and the Rejection Region
[Figure: distribution of the test statistic with the rejection region marked; Area = .11333]
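
The alpha-level arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the variable names are illustrative, and the standard deviation .00459 is taken as reported on the slide rather than recomputed.

```python
from statistics import NormalDist

# Values taken from the slides.
p0 = 0.82055      # proportion assumed under H0
se = 0.00459      # standard deviation of the sample proportion, as reported on the slide
cutoff = 0.815    # reject H0 when the sample proportion falls below this value

# Standardize the cutoff, then evaluate the standard normal CDF at that point.
z_cut = (cutoff - p0) / se
alpha = NormalDist().cdf(z_cut)

print(f"z at the cutoff = {z_cut:.3f}")
print(f"alpha = Prob[z < {z_cut:.3f}] = {alpha:.4f}")
```

The printed alpha should agree with the slide's .11333 to about three decimal places; small differences come from rounding z to −1.209 on the slide.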

The Test
The observed proportion is 5469/(5469+1845) = 5469/7314 = .74774
The null hypothesis is rejected at the 11.333% significance level (by the design of the test).

Power of the test
Power = probability that the test will reject a false null
Power depends on the alternative hypothesis
For this test, for a specific π_renter,
Power = Prob[p_renter < .815 | π_renter = the value]
  = Prob[ (p_renter − π_renter)/√(π_renter(1 − π_renter)/7314) < (.815 − π_renter)/√(π_renter(1 − π_renter)/7314) ]
  = Prob[ z < (.815 − π_renter)/√(π_renter(1 − π_renter)/7314) ]   using the normal distribution
  = 1 − β(π_renter)
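
As an illustration of the power formula, the following Python sketch evaluates it at a few alternative values of the true proportion. The function name `power` and the grid of alternatives are assumptions made for this example; N = 7314 and the cutoff .815 come from the slides. Because the standard deviation is recomputed from N at each alternative, the value at the null (roughly .108) differs slightly from the slides' .11333, which rests on the reported standard deviation of .00459.

```python
from math import sqrt
from statistics import NormalDist

N = 7314          # sample size used in the power formula on the slide
CUTOFF = 0.815    # reject H0 when the sample proportion is below this value

def power(pi_renter: float) -> float:
    """Prob[p < CUTOFF] when the true proportion is pi_renter (normal approximation)."""
    se = sqrt(pi_renter * (1.0 - pi_renter) / N)
    return NormalDist().cdf((CUTOFF - pi_renter) / se)

# Hypothetical grid of alternatives, starting at the null value .82055.
for pi in (0.82055, 0.815, 0.81, 0.80, 0.79):
    print(f"pi = {pi:.5f}   power = {power(pi):.4f}")
```

Power rises toward 1 as the true proportion moves below the null value, and it equals exactly one half when the true proportion sits on the cutoff itself.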

Power Function for the Test
[Figure: power function for the test]
(Power = size when alternative = the null.)

Application: Breast Cancer On Long Island
Null Hypothesis: There is no link between the high cancer rate on Long Island and the use of pesticides and toxic chemicals in dry cleaning, farming, etc.
Neyman-Pearson Procedure
  Examine the physical and statistical evidence
  If there is convincing covariation, reject the null hypothesis
  What is the rejection region?
The NCI study:
  Working null hypothesis: There is a link: We will find the evidence.
  How do you reject this hypothesis?

Formulating the Testing Procedure
Usually: What kind of data will lead me to reject the hypothesis?
Thinking scientifically: If you want to “prove” a hypothesis is true (or you want to support one), begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.

Hypothesis About a Mean
I believe that the average income of individuals in a population is $30,000.
  H0: μ = $30,000 (The null)
  H1: μ ≠ $30,000 (The alternative)
I will draw the sample and examine the data.
The rejection region is data for which the sample mean is far from $30,000.
How far is far? That is the test.

Application
The mean of a population takes a specific value:
Null hypothesis: H0: μ = $30,000
  H1: μ ≠ $30,000
Test: Sample mean close to hypothesized population mean?
Rejection region: Sample means that are far from $30,000

Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis.
Choose the region, for example:
  Rejection: below 29,500   |   30,000   |   Rejection: above 30,500
The probability that the mean falls in the rejection region even though the hypothesis is true (should not be rejected) is the probability of a type I error. Even if the true mean really is $30,000, the sample mean could fall in the rejection region.

Reduce the Probability of a Type I Error by Making the (non)Rejection Region Wider
Reduce the probability of a type I error by moving the boundaries of the rejection region farther out.
  29,500 to 30,500: Probability outside this interval is large.
  28,500 to 31,500: Probability outside this interval is much smaller.
You can make a type I error impossible by making the rejection region very far from the null. Then you would never make a type I error because you would never reject H0.

Setting the α Level
“α” is the probability of a type I error.
Choose the width of the interval by choosing the desired probability of a type I error, based on the t or normal distribution. (How confident do I want to be?)
Multiply the z or t value by the standard error of the mean.
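
The trade-off between the width of the non-rejection interval and the probability of a type I error can be made concrete with a small Python sketch. The slides do not give σ or N for the income example, so the values below (σ = $5,000, N = 100, giving a standard error of $500) are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 30_000.0          # hypothesized mean from the slides
SIGMA = 5_000.0         # hypothetical population standard deviation (not in the slides)
N = 100                 # hypothetical sample size (not in the slides)
SE = SIGMA / sqrt(N)    # standard error of the mean: 500 under these assumptions

STD_NORMAL = NormalDist()

def type1_prob(half_width: float) -> float:
    """Prob[sample mean falls outside MU0 +/- half_width] when H0 is true."""
    return 2.0 * STD_NORMAL.cdf(-half_width / SE)

def half_width(alpha: float) -> float:
    """Half-width z * SE giving a two-sided test with type I error probability alpha."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) * SE

# The two intervals drawn on the slide: 29,500 to 30,500 and 28,500 to 31,500.
for hw in (500.0, 1_500.0):
    print(f"{MU0 - hw:,.0f} to {MU0 + hw:,.0f}: Prob(type I error) = {type1_prob(hw):.4f}")

# Going the other way: choose alpha first, then get the interval.
for alpha in (0.05, 0.01):
    hw = half_width(alpha)
    print(f"alpha = {alpha}: do not reject between {MU0 - hw:,.0f} and {MU0 + hw:,.0f}")
```

Under these assumed values the narrow interval is one standard error wide on each side and the wide interval is three, which is why the probability outside drops so sharply.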

Testing Procedure
The rejection region will be the range of values greater than μ0 + zσ/√N or less than μ0 − zσ/√N
  Use z = 1.96 for 1 − α = 95%
  Use z = 2.576 for 1 − α = 99%
  Use the t table if the sample is small, the variance is estimated, and sampling is from a normal distribution.

Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis.
Choose the region, say:
  Rejection: below $30,000 − 1.96σ/√N   |   Rejection: above $30,000 + 1.96σ/√N
If the hypothesis is true, I am 95% certain that I will not commit a type I error (reject the hypothesis in error). (I cannot be 100% certain.)

The Testing Procedure (For a Mean)
Reject if x̄ < μ0 − 1.96σ/√N or x̄ > μ0 + 1.96σ/√N
Equivalently, reject if x̄ − μ0 < −1.96σ/√N or x̄ − μ0 > 1.96σ/√N
Equivalently, reject if (x̄ − μ0)/(σ/√N) < −1.96 or (x̄ − μ0)/(σ/√N) > 1.96, that is, z < −1.96 or z > 1.96
For the income example: reject if |x̄ − 30,000|/(σ/√N) > 1.96

The Test Procedure
Choosing z = 1.96 makes the probability of a Type I error 0.05.
Choosing z = 2.576 would reduce the probability of a Type I error to 0.01.
Reducing the probability of a Type I error reduces the power of the test because it reduces the probability that the null hypothesis will be rejected.
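
The two-sided procedure for a mean can be sketched in Python as follows. The function `z_test_mean` and the sample figures (x̄ = $31,200, σ = $5,000, N = 100) are hypothetical, introduced only to show the mechanics; σ is treated as known, so the normal distribution is used rather than the t distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-sided z test of H0: mu = mu0 with known sigma.

    Returns (z, critical value, reject?).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # 1.96 for alpha = .05
    return z, z_crit, abs(z) > z_crit

# Hypothetical sample: mean 31,200 from N = 100 observations, sigma = 5,000.
z, crit05, reject05 = z_test_mean(31_200.0, 30_000.0, 5_000.0, 100, alpha=0.05)
_, crit01, reject01 = z_test_mean(31_200.0, 30_000.0, 5_000.0, 100, alpha=0.01)

print(f"z = {z:.2f}")
print(f"alpha = .05: critical value {crit05:.3f}, reject H0? {reject05}")
print(f"alpha = .01: critical value {crit01:.3f}, reject H0? {reject01}")
```

With these assumed numbers the same sample rejects H0 at α = .05 but not at α = .01, which is the loss of power from a smaller α that the slide describes.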

P Value
Probability of observing sample evidence at least as extreme as what was observed, assuming the null hypothesis is true.
Null hypothesis is rejected if P value < α

P value < α
Prob[p(renter) < .74774] = Prob[z < (.74774 − .82055)/.00459] = Φ(−15.86) = .59946942854362260 × 10^−56 (effectively impossible under H0)
α = .11333
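
The P value for the renters example can be reproduced with the sketch below. It is an illustration under stated assumptions: the helper `normal_lower_tail` is not from the slides, and it uses the complementary error function because a tail this small would round to zero if computed as a plain CDF in double precision. The standard deviation .00459 and α = .11333 are taken from the slides as reported.

```python
from math import erfc, sqrt

def normal_lower_tail(z: float) -> float:
    """Prob[Z < z] for a standard normal, via erfc so tiny tails do not round to 0."""
    return 0.5 * erfc(-z / sqrt(2.0))

p_obs = 5469 / 7314   # observed proportion from the slides, about .74774
p0 = 0.82055          # proportion assumed under H0
se = 0.00459          # standard deviation reported on the slides
alpha = 0.11333       # alpha level of the test

z = (p_obs - p0) / se
p_value = normal_lower_tail(z)

print(f"z = {z:.2f}")
print(f"P value = {p_value:.3e}")
print(f"Reject H0 (P value < alpha)? {p_value < alpha}")
```

The result is on the order of 10^−57, in line with the slide's figure, so the null hypothesis is rejected at any conventional α.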

Confidence Intervals
For a two sided test about a parameter, a confidence interval is the complement of the rejection region. (Proof in text, p. 338)

Confidence Interval
If the sample mean is far from $30,000, reject the hypothesis.
Choose the region, say:
  Rejection   |   $30,000 − 1.96σ/√N   |   Confidence   |   $30,000 + 1.96σ/√N   |   Rejection
I am 95% certain that the confidence interval contains the true mean of the distribution of incomes. (I cannot be 100% certain.)
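
The link between the two-sided test and the confidence interval can be checked numerically: a hypothesized mean μ0 lies inside the 95% interval built around x̄ exactly when the 5% test does not reject it. The sketch below uses hypothetical values (x̄ = $31,200, σ = $5,000, N = 100) and illustrative names; it is a demonstration, not the proof referenced in the text.

```python
from math import sqrt
from statistics import NormalDist

XBAR = 31_200.0    # hypothetical sample mean (not in the slides)
SIGMA = 5_000.0    # hypothetical known standard deviation
N = 100            # hypothetical sample size
SE = SIGMA / sqrt(N)
Z = NormalDist().inv_cdf(0.975)   # about 1.96

# 95% confidence interval centered on the sample mean.
ci_low, ci_high = XBAR - Z * SE, XBAR + Z * SE

def rejects(mu0: float) -> bool:
    """Two-sided 5% z test of H0: mu = mu0 given the sample above."""
    return abs(XBAR - mu0) / SE > Z

print(f"95% CI: {ci_low:,.0f} to {ci_high:,.0f}")
for mu0 in (30_000.0, 30_500.0, 31_000.0, 32_000.0, 32_500.0):
    inside = ci_low <= mu0 <= ci_high
    print(f"mu0 = {mu0:,.0f}: inside CI? {inside}   test rejects? {rejects(mu0)}")
```

For every hypothesized value in the loop, "inside CI" and "test rejects" come out as opposites.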
46/100 One Sided Tests
\(H_0: \mu = \mu_0\), \(H_1: \mu \ne \mu_0\). Rejection region is sample mean far from \(\mu_0\) in either direction.
\(H_0: \mu = \mu_0\), \(H_1: \mu > \mu_0\). Sample means less than \(\mu_0\) cannot be in the rejection region. Entire rejection region is above \(\mu_0\).
Reformulate: \(H_0: \mu \le \mu_0\), \(H_1: \mu > \mu_0\). Rejection region is \(\bar{x} > \$30{,}000 + 1.645\,\sigma/\sqrt{N}\).

47/100 Likelihood Ratio Tests
General format:
\[ \text{Likelihood Ratio} = \frac{\text{Likelihood}(\theta_0 \mid \text{data})}{\text{Likelihood}(\theta_1 \mid \text{data})} \]
Resembles posterior odds with equal priors for a simple null and a simple alternative. (E.g., Poisson mean = 2 or 1.1)
Practical format: simple null vs. composite alternative
\[ \lambda = \text{Likelihood Ratio} = \frac{\text{Likelihood}(\theta_0 \mid \text{Information in } H_0)}{\text{Likelihood}(\hat{\theta} \mid \text{All sample information})} \]
Small value of \(\lambda\) weighs against \(H_0\). In standard format, reject \(H_0\) if \(\lambda < c^*\) for a given \(\alpha\).
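The remark that the simple likelihood ratio resembles posterior odds with equal priors can be made concrete. This is a sketch only: the five counts are hypothetical, and treating mean 2 as the null and mean 1.1 as the alternative is an assumption made for illustration.

```python
from math import exp, lgamma, log

def poisson_loglik(theta, data):
    """Poisson log likelihood: sum of -theta + y*log(theta) - log(y!)."""
    return sum(-theta + y * log(theta) - lgamma(y + 1) for y in data)

counts = [2, 0, 3, 1, 2]      # hypothetical sample, not from the slides
theta0, theta1 = 2.0, 1.1     # simple null and simple alternative (assumed roles)

# Likelihood ratio L(theta0 | data) / L(theta1 | data), computed on the log scale.
ratio = exp(poisson_loglik(theta0, counts) - poisson_loglik(theta1, counts))

# With equal prior probabilities the posterior odds equal the likelihood ratio.
posterior_h0 = ratio / (1.0 + ratio)
print(f"likelihood ratio = {ratio:.4f}, posterior P(theta0) = {posterior_h0:.4f}")
```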
48/100 Carrying Out the LR Test
In most cases, the exact distribution of the statistic is unknown.
Use \(-2\log\lambda \rightarrow\) chi squared[1].
For a test about 1 parameter, the threshold value is 3.84 (5%) or 6.63 (1%).
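Critical values for chi squared with 1 degree of freedom can be recovered from the standard normal, since a chi squared[1] variable is the square of a standard normal. A minimal sketch using only the Python standard library (the function names are assumptions):

```python
from math import erfc, sqrt
from statistics import NormalDist

def chi2_1_critical(alpha):
    """Upper critical value of chi squared[1]: the square of the two sided z value."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

def chi2_1_p_value(stat):
    """P[chi squared[1] > stat] = P[|Z| > sqrt(stat)] = erfc(sqrt(stat / 2))."""
    return erfc(sqrt(stat / 2.0))

c05 = chi2_1_critical(0.05)
c01 = chi2_1_critical(0.01)
print(f"5% critical value: {c05:.3f}")
print(f"1% critical value: {c01:.3f}")
```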
49/100 Poisson Likelihood Ratio Test
y = 5, 0, 1, 1, 0, 3, 2, 3, 4, 1
\[ f(y \mid \theta) = \frac{\exp(-\theta)\,\theta^{y}}{y!} \quad \text{(Poisson)} \]
\[ \text{Likelihood} = \frac{\exp(-\theta)\theta^{5}}{5!} \cdot \frac{\exp(-\theta)\theta^{0}}{0!} \cdot \frac{\exp(-\theta)\theta^{1}}{1!} \cdots = \frac{\theta^{20}\exp(-10\theta)}{207{,}360} \]
Log likelihood = \(-10\theta + 20\log\theta - 12.242\)
Maximum occurs at \(\hat{\theta} = 2\), LogL = -18.379
Null hypothesis \(\theta = 1.10\): LogL = -21.335
Chi squared = \(-2(-21.335 - (-18.379)) = 5.912\) (Reject \(H_0\) at the 5% level, since 5.912 > 3.84)
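The Poisson likelihood ratio example can be reproduced directly from the ten observations. The sketch below assumes the statistic is twice the gap between the unrestricted and restricted log likelihoods; the variable names are illustrative.

```python
from math import lgamma, log

y = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]   # data from the slide

def poisson_loglik(theta, data):
    """Poisson log likelihood: sum of -theta + y*log(theta) - log(y!)."""
    return sum(-theta + yi * log(theta) - lgamma(yi + 1) for yi in data)

theta_hat = sum(y) / len(y)                   # MLE is the sample mean
ll_unrestricted = poisson_loglik(theta_hat, y)
ll_restricted = poisson_loglik(1.10, y)       # value under the null hypothesis
lr = 2.0 * (ll_unrestricted - ll_restricted)

print(f"theta_hat = {theta_hat}")
print(f"LogL at MLE  = {ll_unrestricted:.3f}")
print(f"LogL at null = {ll_restricted:.3f}")
print(f"LR statistic = {lr:.3f}")
```

Working from unrounded log likelihoods, the statistic differs from the slide's figure only in the third decimal place.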
50/100 Generalities About LR Test
Compare log L | less sample information with log L | more sample information.
More sample information will mean estimating more parameters.
Test cannot be used for a simple null vs. a simple alternative.
\(-2\log\lambda\) = chi squared with degrees of freedom equal to the difference in the number of parameters.

51/100 Gamma Application
Income data: N = 27,326
Gamma model:
\[ f(\text{income}) = \frac{\lambda^{P}}{\Gamma(P)}\, e^{-\lambda\,\text{Income}}\, \text{Income}^{P-1} \]
\[ \text{LogL} = NP\log\lambda - N\log\Gamma(P) - \lambda\sum_{i=1}^{N}\text{Income}_i + (P-1)\sum_{i=1}^{N}\log \text{Income}_i \]
Maximum likelihood estimates are \(\hat{\lambda} = 2.55971\), \(\hat{P} = 4.55320\), LogL = 12574.88
Fix P at 3.5. Estimate of \(\lambda\) = 2.29664, LogL = 12106.56
Chi squared = 936.64, much larger than 3.84. Reject \(H_0\).

52/100 Specification Tests
Generally a test about a distribution where the alternative is “some other distribution.” Test is generally based on a feature of the distribution that is true under the null but not true under the alternative.
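For the gamma application, the log likelihood and the resulting chi squared statistic can be sketched as follows. The income data themselves are not reproduced in the slides, so the example only recomputes the statistic from the two reported log likelihood values; the function names are assumptions.

```python
from math import lgamma, log

def gamma_loglik(lam, P, incomes):
    """N*P*log(lam) - N*log Gamma(P) - lam*sum(x) + (P - 1)*sum(log x)."""
    n = len(incomes)
    return (n * P * log(lam) - n * lgamma(P)
            - lam * sum(incomes)
            + (P - 1.0) * sum(log(x) for x in incomes))

def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Twice the difference between unrestricted and restricted log likelihoods."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

# Log likelihood values as reported on the slide.
lr = lr_statistic(12574.88, 12106.56)
print(f"Chi squared = {lr:.2f}")
```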
53/100 Poisson Specification Tests
3820 observations on doctor visits. Poisson distribution?
54/100 Deviance Test
Poisson distribution: \(p(x) = \exp(-\theta)\,\theta^{x}/x!\)
\(H_0\): Everyone has the same Poisson distribution
\(H_1\): Everyone has their own Poisson distribution
Under \(H_0\), observations will tend to be near the mean. Under \(H_1\), there will be much more variation.
Likelihood ratio statistic (Text, p. 348)

55/100 Deviance Test
\[ -2\log\lambda = 2\sum_{i=1}^{N} x_i \log\left(\frac{x_i}{\bar{x}}\right), \quad (0\log 0 = 0) \]
Chi squared has N-1 degrees of freedom.
Sample value 17,862 with 3,820 degrees of freedom.
Treat as normal: \((17{,}862 - 3820)/\sqrt{2(3820)} = 160.65\). REJECT null hypothesis of Poisson.

56/100 Dispersion Test
Poisson distribution: \(p(x) = \exp(-\theta)\,\theta^{x}/x!\)
\(H_0\): The distribution is Poisson
\(H_1\): The distribution is something else
Under \(H_0\), the mean will be (almost) the same as the variance.
Approximate likelihood ratio statistic (Text, p. 348) = N * Variance / Mean
For the doctor visit data, this is 22,348.6 vs. chi squared with 1 degree of freedom. \(H_0\) is rejected.
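The deviance and dispersion statistics are simple to compute once the counts are available. The doctor visit data are not reproduced in the slides, so this sketch applies both formulas to the ten Poisson counts from the earlier likelihood ratio example, and then repeats the normal approximation with the reported doctor visit figures. Dividing by N in the variance is an assumption; the slide does not say which divisor is used.

```python
from math import log, sqrt

def poisson_deviance(x):
    """2 * sum of x_i * log(x_i / xbar), using the convention 0 * log(0) = 0."""
    xbar = sum(x) / len(x)
    return 2.0 * sum(xi * log(xi / xbar) for xi in x if xi > 0)

def dispersion_statistic(x):
    """N * variance / mean, with the variance computed using divisor N (assumed)."""
    n = len(x)
    xbar = sum(x) / n
    variance = sum((xi - xbar) ** 2 for xi in x) / n
    return n * variance / xbar

def normal_approximation(stat, df):
    """Standardize a chi squared value: (stat - df) / sqrt(2 * df)."""
    return (stat - df) / sqrt(2.0 * df)

y = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]      # counts from the Poisson LR example
dev = poisson_deviance(y)
disp = dispersion_statistic(y)
z_doctor = normal_approximation(17862.0, 3820)   # figures reported for doctor visits

print(f"deviance = {dev:.4f}, dispersion = {disp:.1f}")
print(f"normal approximation for doctor visits = {z_doctor:.2f}")
```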
58/100 Specification Test - Normality
Compare observed 3rd and 4th moments to what would be expected from a normal distribution.
Normal distribution is symmetric and has kurtosis = 3.
59/100 Symmetric and Skewed Distributions

60/100 Kurtosis: t[5] vs. Normal
Kurtosis of normal(0,1) = 3
Kurtosis of t[k] = 3 + 6/(k-4); for t[5] = 3 + 6/(5-4) = 9.

61/100 Bowman and Shenton Test for Normality
\[ \text{Chi squared}[2] = N\left[\frac{\left(m_3/s^{3}\right)^{2}}{6} + \frac{\left(m_4/s^{4} - 3\right)^{2}}{24}\right], \qquad m_j = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^{j}}{N}, \quad s = \sqrt{m_2} \]
For the income data, chi squared = 1709.62 vs. 5.99.
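A sketch of the Bowman and Shenton statistic follows, using central moments with divisor N and s taken as the square root of the second moment (an assumption about the garbled definition of s). The five data points are hypothetical. For chi squared with 2 degrees of freedom the upper tail probability is exp(-x/2), so the 5% critical value is -2 log(0.05).

```python
from math import exp, log, sqrt

def central_moment(x, j):
    """m_j = sum of (x_i - xbar)^j divided by N."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** j for xi in x) / len(x)

def bowman_shenton(x):
    """N * [skewness^2 / 6 + (kurtosis - 3)^2 / 24]."""
    n = len(x)
    s = sqrt(central_moment(x, 2))
    skewness = central_moment(x, 3) / s ** 3
    kurtosis = central_moment(x, 4) / s ** 4
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)

def chi2_2_p_value(stat):
    """Upper tail probability of chi squared with 2 degrees of freedom."""
    return exp(-stat / 2.0)

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical symmetric sample
stat = bowman_shenton(sample)
critical_5pct = -2.0 * log(0.05)         # about 5.99
print(f"statistic = {stat:.4f}, 5% critical value = {critical_5pct:.2f}")
```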
62/100 Testing for a Distribution
\(H_0\): The distribution is as assumed
\(H_1\): The assumed distribution is incorrect
Strategy: Do the features of the sample resemble what we would observe if \(H_0\) were correct?
Continuous: Does the CDF of the data resemble the CDF of the assumed distribution?
Discrete: Do the sample cell probabilities resemble the predictions from the assumed distribution?

63/100 Probability Plot for Normality
\(x_1, x_2, \ldots, x_N\) hypothesized to be a sample from \(N[\mu, \sigma]\).
Sample mean = \(\bar{x}\), sample standard deviation = \(s\)
Sort the data into \(x_{(1)} \le x_{(2)} \le \ldots \le x_{(N)}\).
Theoretical quantiles of the normal with mean \(\bar{x}\) and standard deviation \(s\) are
\[ \hat{x}_{(k)} = \bar{x} + s\,\Phi^{-1}\left(\frac{k}{N+1}\right). \]
Plot of \(x_{(k)}\) vs. \(\hat{x}_{(k)}\) should lie on a diagonal line.
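The plotting positions for the normal probability plot can be generated as below. This is a sketch under the assumption that the sample standard deviation uses divisor N-1 (as `statistics.stdev` does); the function name and the three data points are illustrative only.

```python
from statistics import NormalDist, mean, stdev

def normal_probability_points(data):
    """Pair each order statistic x_(k) with xbar + s * Phi^{-1}(k / (N + 1))."""
    xs = sorted(data)
    n = len(xs)
    xbar, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    return [(x, xbar + s * std_normal.inv_cdf(k / (n + 1)))
            for k, x in enumerate(xs, start=1)]

points = normal_probability_points([3.0, 1.0, 2.0])   # hypothetical data
for observed, theoretical in points:
    print(f"{observed:.3f}  {theoretical:.3f}")
```

If the data are close to normal, plotting the first coordinate against the second gives points near the 45 degree line.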
64/100 Normal (log)Income?
65/100 Random Sample from Normal
66/100 Normality Tests
Bowman and Shenton: based on 3rd and 4th moments
Kolmogorov-Smirnov: based on CDF
\[ D_N = \sup_x \left|\,\text{Theoretical cdf} - \text{Sample cdf}\,\right| \]
Test statistic is \(\sqrt{N}\,D_N\)
Rough critical values (95%): N=20, .294; N=25, .27; N=30, .24; N=35, .23
N>35: use \(1.36/\sqrt{N}\) for 95%, use \(1.63/\sqrt{N}\) for 99%
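A minimal sketch of the Kolmogorov-Smirnov distance against a normal CDF is given below. It uses the common step function convention for the sample CDF (jumps of 1/N, checked on both sides of each jump), which is an assumption and may differ slightly from the convention used in the course. The three data points and N = 100 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ks_statistic(data, mu, sigma):
    """Largest gap between the normal(mu, sigma) CDF and the sample CDF."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mu, sigma)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The sample CDF jumps from (k-1)/n to k/n at x; check both sides.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d

def ks_critical(n, level=0.95):
    """Large sample critical value for D_N: 1.36/sqrt(N) or 1.63/sqrt(N)."""
    coefficient = {0.95: 1.36, 0.99: 1.63}[level]
    return coefficient / sqrt(n)

d = ks_statistic([-1.0, 0.0, 1.0], mu=0.0, sigma=1.0)   # hypothetical data
print(f"D_N = {d:.4f}")
print(f"95% critical value for N = 100: {ks_critical(100):.3f}")
```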
Kolmogorov-Smirnov Test

Sorted data: min $= x_{(1)} \le x_{(2)} \le \dots \le x_{(k)} \le \dots \le x_{(N)} =$ max
Empirical CDF: $\frac{1}{N+1},\ \frac{2}{N+1},\ \dots,\ \frac{k}{N+1},\ \dots,\ \frac{N}{N+1}$
Theoretical CDF: $\Phi\!\left(\frac{x_{(1)}-\bar{x}}{s}\right),\ \Phi\!\left(\frac{x_{(2)}-\bar{x}}{s}\right),\ \dots,\ \Phi\!\left(\frac{x_{(k)}-\bar{x}}{s}\right),\ \dots,\ \Phi\!\left(\frac{x_{(N)}-\bar{x}}{s}\right)$

For log income data, K-S = 0.1181.
Critical values with N = 3820 are 0.0219 and 0.0262.
The hypothesis of normality would be rejected.

Chi Squared Test for a Discrete Distribution

Outcomes = $A_1, A_2, \dots, A_M$
Predicted probabilities based on a theoretical distribution = $E_1(\theta), E_2(\theta), \dots, E_M(\theta)$
Sample cell frequencies = $O_1, \dots, O_M$

Test Statistics

Deviance $= 2 \sum_{m=1}^{M} O_m \log \frac{O_m}{E_m(\theta)}$, with $0 \log 0 = 0$

Pearson chi squared $= N \sum_{m=1}^{M} \frac{[O_m - E_m(\theta)]^2}{E_m(\theta)}$

Both are distributed chi squared with degrees of freedom M - 1 - (number of parameters in $\theta$) under the null hypothesis.
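The two statistics above can be computed directly from cell counts. The following Python sketch assumes observed and expected values are given as counts, in which case the Pearson statistic is the sum of (O - E)^2 / E; this equals N times the same sum written in proportions. The function names and the die-rolling data are hypothetical, chosen only for illustration.

```python
import math

def deviance(observed, expected):
    """2 * sum of O * log(O / E) over cells, treating 0 * log(0) as 0."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def pearson(observed, expected):
    """Sum of (O - E)^2 / E over cells, with O and E as counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(num_cells, num_estimated_params):
    """M - 1 - (number of parameters estimated under the null)."""
    return num_cells - 1 - num_estimated_params

# Hypothetical example: 120 rolls of a die, null hypothesis of a fair die.
observed = [18, 22, 21, 17, 25, 17]
expected = [20.0] * 6            # 120 * (1/6) per face, nothing estimated

dev = deviance(observed, expected)
chi2 = pearson(observed, expected)
df = degrees_of_freedom(len(observed), 0)
```

The two statistics are usually close when no cell is badly fit; either one is compared with a chi squared critical value for `df` degrees of freedom.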
V2 Rocket Hits

Adapted from Richard Isaac, The Pleasures of Probability, Springer Verlag, 1995, pp. 99-101.

576 areas of 0.25 km^2 each in South London, in a grid (24 by 24)
N = 535 rockets were fired randomly into the grid
P(a rocket hits a particular grid area) = 1/576 = 0.001736 = θ
Expected number of rocket hits in a particular area = 535/576 = 0.92882
How many rockets will hit any particular area? 0, 1, 2, …; it could be anything up to 535.
The 0.9288 is the λ for a Poisson distribution:

$P(\#\text{hits}) = \frac{\exp(-\lambda)\,\lambda^{\#\text{hits}}}{\#\text{hits}!}, \quad \#\text{hits} = 0, 1, 2, \dots$
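As an illustration of the Poisson formula above, the following Python sketch evaluates the probabilities for the rocket example and converts them to expected numbers of grid areas. The variable and function names are hypothetical; only the counts 535 and 576 come from the text.

```python
import math

def poisson_pmf(k, lam):
    """P(# hits = k) = exp(-lam) * lam**k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n_rockets, n_areas = 535, 576
lam = n_rockets / n_areas                       # expected hits per area

# Probabilities of 0, 1, 2, 3, 4 hits in a given area.
probs = [poisson_pmf(k, lam) for k in range(5)]

# Expected number of areas (out of 576) with each hit count.
expected_areas = [n_areas * p for p in probs]

for k, (p, e) in enumerate(zip(probs, expected_areas)):
    print(f"{k} hits: probability {p:.4f}, expected areas {e:.1f}")
```

The expected counts are what an observed tally of areas by hit count would be compared against.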
[Three figure slides: a grid with rows and columns numbered 1 to 13]

Poisson Process

θ = 1/169
N = 144
λ = 144 * 1/169 = 0.852
Probabilities:
  P(X=0) = .4266
  P(X=1) = .3634
  P(X=2) = .1548
  P(X=3) = .0437
  P(X=4) = .0094
  P(X>4) = .0021
Interpreting The Process

λ = 0.852
Probabilities:
  P(X=0) = .4266
  P(X=1) = .3634
  P(X=2) = .1548
  P(X=3) = .0437
  P(X=4) = .0094
  P(X>4) = .0021

There are 169 squares.
There are 144 “trials”.
Expect .4266*169 = 72.1 squares to have 0 hits/square.
Expect .3634*169 = 61.4 squares to have 1 hit/square.
Etc.
Expect the average number of hits/square to equal .852.

Does the Theory Work?

          Theoretical Outcomes               Sample Outcomes
Outcome   Probability   Number of cells      Proportion   Number of cells
                        (169*Prob(Outcome))               (observed frequencies)
0         .4266         72                   .4733        80
1         .3634         61                   .2899        49
2         .1548         26                   .1539        26
3         .0437          7                   .0769        13
4         .0094          2                   .0059         1
>4        .0021          1                   .0000         0
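The comparison in the table above can be summarized with a Pearson chi squared statistic. The Python sketch below works from the tabulated proportions and treats the sample-size multiplier as a parameter, because the choice matters: 144 is the number of trials, while 169 is the number of cells the proportions are computed from. The function name is hypothetical, and the degrees of freedom (six cells, minus one, minus one estimated parameter) and the 5% critical value of 9.49 are stated here as assumptions of the sketch.

```python
theoretical = [.4266, .3634, .1548, .0437, .0094, .0021]
sample      = [.4733, .2899, .1539, .0769, .0059, .0000]

def pearson_from_proportions(sample_props, model_probs, n):
    """n * sum of (p - q)^2 / q, with p sample and q model proportions."""
    return n * sum((p - q) ** 2 / q
                   for p, q in zip(sample_props, model_probs))

stat_144 = pearson_from_proportions(sample, theoretical, 144)
stat_169 = pearson_from_proportions(sample, theoretical, 169)

# Assumed: 6 cells - 1 - 1 estimated parameter = 4 degrees of freedom,
# for which the 5% chi squared critical value is about 9.49.
CRITICAL_5PCT_4DF = 9.49
reject_144 = stat_144 > CRITICAL_5PCT_4DF
reject_169 = stat_169 > CRITICAL_5PCT_4DF
```

With either multiplier the statistic stays below the critical value, so the Poisson model is not rejected under these assumptions.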
Chi Squared for the Bombing Run

$\chi^2 = 144 \left[ \frac{(.4733-.4266)^2}{.4266} + \frac{(.2899-.3634)^2}{.3634} + \frac{(.1539-.1548)^2}{.1548} + \frac{(.0769-.0437)^2}{.0437} + \frac{(.0059-.0094)^2}{.0094} + \frac{(.0000-.0021)^2}{.0021} \right] = 6.99976$

Degrees of freedom = 6 - 1 - 1 = 4.
Critical chi squared = 9.49
Poisson is not rejected.

Difference in Means of Two Populations

Two Independent Normal Populations
Common known variance
Common unknown variance
Different variances
One and two sided tests
Paired Samples
Means of paired observations
Treatments and Controls – Diff-in-Diff
SAT
Nonparametric – Mann/Whitney
Two Bernoulli Populations
Comparing Two Normal Populations

$X \sim N[\mu_x, \sigma^2]$, $Y \sim N[\mu_y, \sigma^2]$, independent
$H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$
Equivalent: $H_0: \mu_x - \mu_y = 0$
Common known $\sigma^2$
Samples of size $N_x$ and $N_y$
Base the test on $\bar{x} - \bar{y}$.
Given independence and normality,

$\bar{x} - \bar{y} \sim N\!\left[\mu_x - \mu_y,\ \sigma^2\left(\frac{1}{N_x} + \frac{1}{N_y}\right)\right]$, or $N\!\left[0,\ \sigma^2\left(\frac{1}{N_x} + \frac{1}{N_y}\right)\right]$ assuming $H_0$

Confidence interval: $\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\sigma^2\left(\frac{1}{N_x} + \frac{1}{N_y}\right)}$
The region outside the confidence interval is the rejection region.
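The known-variance test above translates into a short calculation. The following Python sketch uses only the standard library; the function name and the numerical inputs are hypothetical, and the two-sided form of the test is assumed.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar, ybar, sigma2, nx, ny, alpha=0.05):
    """Two-sided z test of equal means with a common known variance."""
    se = math.sqrt(sigma2 * (1.0 / nx + 1.0 / ny))
    diff = xbar - ybar
    z = diff / se
    zcrit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci = (diff - zcrit * se, diff + zcrit * se)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Reject H0 when zero falls outside the confidence interval.
    reject = abs(z) > zcrit
    return z, ci, p_value, reject

# Hypothetical inputs: sample means 10.4 and 10.0, variance 1, 50 per group.
z, ci, p_value, reject = two_sample_z(10.4, 10.0, 1.0, 50, 50)
```

Rejecting when |z| exceeds the critical value is the same decision as checking whether zero lies outside the confidence interval for the difference.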
Unknown Common Variance

$s^2 = \frac{\sum_{i=1}^{N_x}(x_i-\bar{x})^2 + \sum_{i=1}^{N_y}(y_i-\bar{y})^2}{N_x + N_y - 2} = \frac{(N_x-1)s_x^2 + (N_y-1)s_y^2}{(N_x-1) + (N_y-1)} = w s_x^2 + (1-w) s_y^2, \quad w = \frac{N_x-1}{(N_x-1) + (N_y-1)}$

Test is based on the t distribution with $N_x + N_y - 2$ degrees of freedom.
Confidence interval: $\bar{x} - \bar{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{N_x} + \frac{1}{N_y}}$
The region outside the confidence interval is the rejection region.
If $N_x + N_y - 2 > 50$, the t distribution will be indistinguishable from the normal distribution.
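The pooled variance and the resulting t statistic can be sketched as follows. This is a minimal Python illustration with hypothetical data and function names; the standard library has no t quantile function, so the critical value is assumed to come from a t table.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Return (t statistic, degrees of freedom, pooled variance)."""
    nx, ny = len(x), len(y)
    sx2, sy2 = variance(x), variance(y)        # divisors N - 1
    w = (nx - 1) / ((nx - 1) + (ny - 1))       # weight on s_x^2
    s2 = w * sx2 + (1.0 - w) * sy2             # pooled variance
    se = math.sqrt(s2 * (1.0 / nx + 1.0 / ny))
    t = (mean(x) - mean(y)) / se
    return t, nx + ny - 2, s2

# Hypothetical samples.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8]
t_stat, df, s2 = pooled_t(x, y)
```

The absolute value of `t_stat` is then compared with the tabulated t critical value for `df` degrees of freedom.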
Household Incomes, Equal Variances

------------------------------------------------------
t test of equal means    INCOME by MARRIED
------------------------------------------------------
MARRIED = 0    Nx =  817
MARRIED = 1    Ny = 3057
t [ 3872] = 3.7238    P value = .0002
------------------------------------------------------
                 Mean    Std.Dev.   Std.Error
INCOME
----------------------------------------------
MARRIED = 0    .27982     .12939     .00453
MARRIED = 1    .30145     .15194     .00275
------------------------------------------------------

Unknown Different Variances

$s_x^2 = \frac{\sum_{i=1}^{N_x}(x_i-\bar{x})^2}{N_x - 1}, \quad s_y^2 = \frac{\sum_{i=1}^{N_y}(y_i-\bar{y})^2}{N_y - 1}$

Small samples: use the t distribution with

degrees of freedom $= \text{int}\left[ \frac{\left(s_x^2/N_x + s_y^2/N_y\right)^2}{\frac{(s_x^2/N_x)^2}{N_x - 1} + \frac{(s_y^2/N_y)^2}{N_y - 1}} \right]$

Test is usually based on the (asymptotic) normal distribution with
Confidence interval: $\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_x^2}{N_x} + \frac{s_y^2}{N_y}}$
The region outside the confidence interval is the rejection region.
If $N_x + N_y > 50$, the t distribution will be indistinguishable from the normal distribution.
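As a rough check on the unequal-variance formulas above, the following Python sketch applies them to the summary statistics in the household income table. The function name is hypothetical, and the result is not expected to match the equal-variance t value reported in the table, since the standard error is formed differently.

```python
import math

def unequal_variance_test(mean_x, sd_x, nx, mean_y, sd_y, ny):
    """Return (test statistic, integer degrees of freedom)."""
    vx = sd_x ** 2 / nx                 # s_x^2 / N_x
    vy = sd_y ** 2 / ny                 # s_y^2 / N_y
    stat = (mean_x - mean_y) / math.sqrt(vx + vy)
    df = int((vx + vy) ** 2 /
             (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1)))
    return stat, df

# Summary statistics from the INCOME by MARRIED table
# (x: MARRIED = 0, y: MARRIED = 1).
stat, df = unequal_variance_test(.27982, .12939, 817,
                                 .30145, .15194, 3057)
```

The statistic is negative here only because the unmarried group is listed first; its absolute value is what is compared with the normal (or t) critical value. The degrees of freedom are large enough that the normal approximation applies.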
2 Proportions

Two Bernoulli Populations:
  $X_i \sim$ Bernoulli with Prob($x_i = 1$) $= \theta_x$
  $Y_i \sim$ Bernoulli with Prob($y_i = 1$) $= \theta_y$
$H_0: \theta_x = \theta_y$
The sample proportions are $p_x = (1/N_x)\sum_i x_i$ and $p_y = (1/N_y)\sum_i y_i$.
Sample variances are $p_x(1-p_x)$ and $p_y(1-p_y)$.
Use the Central Limit Theorem to form the test statistic.

z Test for Equality of Proportions

$z = \frac{p_x - p_y}{\sqrt{\frac{p_x(1-p_x)}{N_x} + \frac{p_y(1-p_y)}{N_y}}}$

Application: Take up of public health insurance.

------------------------------------------------------
t test of equal means    PUBLIC by FEMALE
------------------------------------------------------
FEMALE = 0    Nx = 1812
FEMALE = 1    Ny = 1565
t [ 3375] = 5.8627    P value = .0000
------------------------------------------------------
                Mean    Std.Dev.   Std.Error
PUBLIC
----------------------------------------------
FEMALE = 0    .84713     .35996     .00846
FEMALE = 1    .91310     .28178     .00712

Paired Sample t and z Test
Observations are pairs $(X_i, Y_i)$, $i = 1, \dots, N$.
Hypothesis: $\mu_x = \mu_y$.
Both normal distributions. May be correlated.
Medical trials: smoking vs. nonsmoking (separate individuals, probably independent)
SAT repeat tests, before and after (definitely correlated)
Test is based on $D_i = X_i - Y_i$. Same as earlier with $H_0: \mu_D = 0$.

Treatment Effects

SAT Do Overs
  Experiment: $X_1, X_2, \dots, X_N$ = first SAT score, $Y_1, Y_2, \dots, Y_N$ = second
  Treatment: $T_1, \dots, T_N$ = whether or not the student took a Kaplan (or similar) prep course
  Hypothesis: $\mu_y > \mu_x$.
Placebo
  In medical trials, N1 subjects receive a drug (treatment), N2 receive a placebo.
  Hypothesis: Effect is greater in the treatment group than in the control (placebo) group.

Measuring Treatment Effects

Measuring SAT test scores: Difference in Differences
Use $D = (\bar{y} - \bar{x})|_{T=1} - (\bar{y} - \bar{x})|_{T=0}$
Hypothesis is $\mu_D = 0$.
Major complication: Nonrandom treatment assignment. Individuals choose the test prep course themselves. Choosers believe the difference will be positive.
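The difference-in-differences measure above is simple to compute once scores and treatment indicators are lined up. The Python sketch below uses hypothetical scores and a hypothetical function name; it only computes D and does nothing about the nonrandom treatment assignment noted above.

```python
from statistics import mean

def diff_in_diff(first, second, treated):
    """Mean gain among the treated minus mean gain among the untreated."""
    gains_treated = [y - x for x, y, t in zip(first, second, treated) if t == 1]
    gains_control = [y - x for x, y, t in zip(first, second, treated) if t == 0]
    return mean(gains_treated) - mean(gains_control)

# Hypothetical data: first score, second score, prep course indicator.
first   = [500, 520, 480, 610, 550, 530]
second  = [540, 555, 500, 620, 565, 560]
treated = [1, 1, 1, 0, 0, 0]

d = diff_in_diff(first, second, treated)
```

A formal test would compare `d` with its standard error, for example by applying a two-sample test to the two sets of gains.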
Treatment Effects in Clinical Trials

Does Phenogyrabluthefentanoel (Zorgrab) work?
Investigate: Carry out a clinical trial.

                   Placebo    Drug Treatment
No Effect          N00        N0T
Positive Effect    N+0        N+T

N+0 = “The placebo effect”
N+T – N+0 = “The treatment effect”
The hypothesis is that the difference in differences has mean zero.
A Test of Independence

In the credit card example, are Own/Rent and Accept/Reject independent?
Hypothesis: Ownership and acceptance are independent.
Formal hypothesis, based only on the laws of probability: Prob(Own,Accept) = Prob(Own)Prob(Accept) (and likewise for the other three possibilities).
Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.
Contingency Table Analysis

The Data: Frequencies

          Reject    Accept     Total
Rent       1,845     5,469     7,314
Own        1,100     5,030     6,130
Total      2,945    10,499    13,444

Step 1: Convert to Actual Proportions

          Reject    Accept     Total
Rent     0.13724   0.40680   0.54404
Own      0.08182   0.37414   0.45596
Total    0.21906   0.78094   1.00000

Independence Test

Step 2: Expected proportions assuming independence: If the factors are independent, then the joint proportions should equal the product of the marginal proportions.

[Rent,Reject]   0.54404 x 0.21906 = 0.11918
[Rent,Accept]   0.54404 x 0.78094 = 0.42486
[Own,Reject]    0.45596 x 0.21906 = 0.09988
[Own,Accept]    0.45596 x 0.78094 = 0.35608

Comparing Actual to Expected

The statistic is N times the sum over the four cells:

$$\chi^2 = N \times \sum_{\text{Rows}} \sum_{\text{Columns}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

If this is large (because the observed proportions don't look like the expected ones), then reject the hypothesis.
$$\chi^2 = 13{,}444 \left[ \frac{(0.13724 - 0.11918)^2}{0.11918} + \frac{(0.40680 - 0.42486)^2}{0.42486} + \frac{(0.08182 - 0.09988)^2}{0.09988} + \frac{(0.37414 - 0.35608)^2}{0.35608} \right] = 103.33013$$

When is the Chi Squared Large?

Critical values from chi squared table

D.F.     .05      .01
 1      3.84     6.63
 2      5.99     9.21
 3      7.81    11.34
 4      9.49    13.28
 5     11.07    15.09
 6     12.59    16.81
 7     14.07    18.48
 8     15.51    20.09
 9     16.92    21.67
10     18.31    23.21

Degrees of freedom = (R-1)(C-1).
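The two steps and the statistic for the credit card table can be traced in a few lines of Python. This is only a sketch: the variable names are illustrative assumptions, and the .05 critical value for 1 degree of freedom is copied from the table of critical values rather than computed.

```python
# Observed counts from the credit card table.
# Rows: Rent, Own. Columns: Reject, Accept.
observed = [[1845, 5469],
            [1100, 5030]]

n = sum(sum(row) for row in observed)

# Step 1: actual (joint) proportions and their marginal proportions.
actual = [[count / n for count in row] for row in observed]
row_marginals = [sum(row) for row in actual]
col_marginals = [sum(col) for col in zip(*actual)]

# Step 2: expected proportions if the two factors are independent.
expected = [[r * c for c in col_marginals] for r in row_marginals]

# Statistic: N times the sum over cells of (Observed - Expected)^2 / Expected.
chi2 = n * sum(
    (actual[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_05 = 3.84  # tabled .05 critical value for 1 degree of freedom
reject_independence = chi2 > critical_05

print(f"N = {n}, chi squared = {chi2:.2f}, d.f. = {df}, reject = {reject_independence}")
```

Because the code carries the proportions at full precision instead of five decimals, its statistic differs from the hand-computed 103.33 in the second decimal place. The conclusion is the same either way.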
Analyzing Default

Do renters default more often (at a different rate) than owners?
To investigate, we study the cardholders (only).

Rows: OWNRENT   Columns: DEFAULT   (cell contents: count, percent of total)

OWNRENT         0         1       All
0            4854       615      5469
            46.23      5.86     52.09
1            4649       381      5030
            44.28      3.63     47.91
All          9503       996     10499
            90.51      9.49    100.00

Hypothesis Test

[Figure not recovered.]
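One way to examine the renter versus owner default question is a two-sample test of proportions. This is a sketch, not necessarily the test shown on the slide. It assumes that OWNRENT = 0 marks renters and OWNRENT = 1 marks owners, which is inferred from the row totals matching the accepted renters and owners in the credit card table. The variable names are illustrative.

```python
import math

# Counts from the OWNRENT x DEFAULT table (cardholders only).
# Assumption: OWNRENT = 0 is renters, OWNRENT = 1 is owners.
renter_defaults, renters = 615, 5469
owner_defaults, owners = 381, 5030

p_rent = renter_defaults / renters
p_own = owner_defaults / owners

# Under H0 (equal default rates) both groups share one pooled rate.
pooled = (renter_defaults + owner_defaults) / (renters + owners)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / renters + 1 / owners))

z = (p_rent - p_own) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"renters {p_rent:.4f}, owners {p_own:.4f}, z = {z:.2f}, p = {p_value:.2g}")
```

For a 2 x 2 table, the square of this z statistic equals the chi squared statistic for independence with 1 degree of freedom, so the two approaches lead to the same decision.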
Multiple Choices: Travel Mode

210 travelers between Sydney and Melbourne.
4 available modes: air, train, bus, car.
Among the observed variables is income.
Does income help to explain mode choice?
Hypothesis: Mode choice and income are independent.
Travel Mode Choices

[Figure not recovered.]

Travel Mode Choices and Income

+----------------------------------------------------------+
|                     Travel MODE Data                     |
+--------+-------------------------------------------------+
|INCOME  |      AIR    TRAIN      BUS      CAR ||    Total |
+--------+-------------------------------------++----------+
|LOW     |       10       36        9        8 ||       63 |
|        |  0.04761  0.17143  0.04286  0.03810 ||  0.30000 |
|----------------------------------------------++----------+
|MEDIUM  |       19       20       13       24 ||       76 |
|        |  0.09048  0.09524  0.06190  0.11429 ||  0.36190 |
|----------------------------------------------++----------+
|HIGH    |       29        7        8       27 ||       71 |
|        |  0.13810  0.03333  0.03810  0.12857 ||  0.33810 |
|==============================================++==========+
|Total   |       58       63       30       59 ||      210 |
|        |  0.27619  0.30000  0.14286  0.28095 ||  1.00000 |
+--------+-------------------------------------+-----------+

Contingency Table

Each cell shows the count, the actual proportion, and the expected proportion.

+----------------------------------------------------------+
|                     Travel MODE Data                     |
+--------+-------------------------------------------------+
|INCOME  |      AIR    TRAIN      BUS      CAR ||    Total |
+--------+-------------------------------------++----------+
|        |       10       36        9        8 ||       63 |
|LOW     |  0.04761  0.17143  0.04286  0.03810 ||  0.30000 |
|        |  0.08286  0.09000  0.04286  0.08429 ||          |
|----------------------------------------------++----------+
|        |       19       20       13       24 ||       76 |
|MEDIUM  |  0.09048  0.09524  0.06190  0.11429 ||  0.36190 |
|        |  0.09995  0.10857  0.05170  0.10168 ||          |
|----------------------------------------------++----------+
|        |       29        7        8       27 ||       71 |
|HIGH    |  0.13810  0.03333  0.03810  0.12857 ||  0.33810 |
|        |  0.09338  0.10143  0.04830  0.09499 ||          |
|==============================================++==========+
|Total   |       58       63       30       59 ||      210 |
|        |  0.27619  0.30000  0.14286  0.28095 ||  1.00000 |
+--------+-------------------------------------+-----------+

Assuming independence, P(Income,Mode) = P(Income) x P(Mode).

Computing Chi Squared

$$\chi^2 = N \times \sum_{\text{Rows}} \sum_{\text{Columns}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 42.26158$$

For our transport mode problem, R = 3, C = 4, so DF = 2 x 3 = 6. The critical value is 12.59. The hypothesis of independence is rejected.
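The travel mode calculation can be reproduced with a small general routine for an R x C table of counts. This is a minimal sketch: the function name `chi_squared_independence` and the dictionary `CRITICAL_05` are illustrative assumptions, and the critical values are the tabled .05 values rather than computed quantiles. The routine works with counts, where the sum of (Observed - Expected)^2 / Expected is algebraically the same as N times the corresponding sum over proportions.

```python
# Tabled .05 critical values of chi squared, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07,
               6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31}


def chi_squared_independence(counts):
    """Return (chi squared, degrees of freedom) for an R x C table of counts."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]

    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Expected count under independence: N * P(row) * P(column).
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: LOW, MEDIUM, HIGH income. Columns: AIR, TRAIN, BUS, CAR.
travel = [[10, 36, 9, 8],
          [19, 20, 13, 24],
          [29, 7, 8, 27]]

chi2, df = chi_squared_independence(travel)
reject_independence = chi2 > CRITICAL_05[df]
print(f"chi squared = {chi2:.3f}, d.f. = {df}, reject = {reject_independence}")
```

Passing the 2 x 2 credit card counts to the same function would give the 1 degree of freedom case.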