Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 4 – Statistical Inference
4.1 – The Normal Family of Distributions

Normal

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < +\infty$$

Mean = $\mu$, Standard deviation = $\sigma$

Standard Normal

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right), \quad -\infty < x < +\infty$$

$\mu = 0$, $\sigma = 1$
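As a minimal sketch of the two densities above, the following Python evaluates the normal density directly from the formula and compares it with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the values of mu, sigma and x are illustrative assumptions, not part of the course notes.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density exp(-0.5*((x-mu)/sigma)**2) / (sigma*sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative (hypothetical) parameter values.
mu, sigma, x = 2.0, 3.0, 4.5

direct = normal_pdf(x, mu, sigma)          # formula from the Normal slide
library = NormalDist(mu, sigma).pdf(x)     # standard-library evaluation

# Standardizing: the N[mu, sigma^2] density at x equals the standard normal
# density at z = (x - mu)/sigma, divided by sigma.
standardized = normal_pdf((x - mu) / sigma) / sigma

print(direct, library, standardized)
```

The three printed values should agree up to floating-point rounding, which is the sense in which every normal distribution is a relocated and rescaled standard normal.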
Chi Squared[1] = Square of N(0,1)

$Z \sim N[0,1]$
$X = Z^2 \sim$ chi squared with 1 degree of freedom $= \chi^2[1]$

$$f(x) = \frac{(.5)^{.5}\, x^{.5-1}\, e^{-.5x}}{\Gamma(.5)}, \quad x \ge 0 \quad = \text{Gamma}\left(\tfrac{1}{2}, \tfrac{1}{2}\right)$$

$$E[X] = E[Z^2] = \sigma^2 + \mu^2 = 1 + 0 = 1$$

$$\text{Var}[X] = E[X^2] - E^2[X] = E[Z^4] - E^2[Z^2] = 3 - 1 = 2$$

If $Z \sim N[\mu, \sigma^2]$ then $\left(\frac{Z-\mu}{\sigma}\right)^2 = (N[0,1])^2 \sim$ Chi Squared(1) $= \chi^2[1]$

[Figure: Density of Chi Squared[1]; horizontal axis Z_SQ]

Limit Result for Square of N(0,1)

Suppose $Z_N \xrightarrow{d} \text{Normal}(0,1)$ (as a consequence of a central limit theorem).
Then $Z_N^2 \xrightarrow{d} \chi^2[1]$.

Sum of Two Independent Chi Squared(1) Variables

$$\chi^2_{(1)}[1] + \chi^2_{(2)}[1] \sim \chi^2[2]$$

where the subscripts (1) and (2) label the two independent variables.

Chi squared with 2 degrees of freedom is Gamma$\left(\tfrac{1}{2}, \tfrac{2}{2}\right)$:

$$f(x) = \frac{\left(\tfrac{1}{2}\right)^{2/2} e^{-(1/2)x}\, x^{(2/2)-1}}{\Gamma(2/2)}$$

Expected Value = 2
Variance = 2*2 = 4

[Figure: Density of Chi Squared[2]; horizontal axis X2]

Sum of N Independent Chi Squareds

$X_1, \dots, X_N$ all independent, all chi squared(1)

$$X = \sum_{i=1}^{N} X_i \sim \text{Chi Squared}(N) = \text{Gamma}\left(\tfrac{1}{2}, \tfrac{N}{2}\right)$$

$$f(x) = \frac{\left(\tfrac{1}{2}\right)^{N/2} e^{-(1/2)x}\, x^{N/2-1}}{\Gamma(N/2)}$$

Mean = N, Variance = 2N. (Prove by sum of independent variables each with mean 1 and variance 2.)
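The mean and variance results for chi squared[N] can be checked by simulation. The sketch below builds each draw as a sum of N squared standard normal draws, following the construction above. The helper name `chi_squared_draw`, the seed, N = 5 and the number of replications are arbitrary illustrative choices.

```python
import random
import statistics

def chi_squared_draw(n, rng):
    """One draw from chi squared[n], built as a sum of n squared N(0,1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(12345)   # fixed seed, chosen arbitrarily
N = 5                        # degrees of freedom (illustrative)
draws = [chi_squared_draw(N, rng) for _ in range(40000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(f"mean {sample_mean:.3f} (theory {N}), "
      f"variance {sample_var:.3f} (theory {2 * N})")
```

With this many replications the sample mean and variance should land close to N and 2N, though not exactly on them.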
Limit Result for Square of Normal

Suppose $\frac{Z_N}{\sigma} \xrightarrow{d} \text{Normal}(0,1)$ so $\left(\frac{Z_N}{\sigma}\right)^2 \xrightarrow{d} \chi^2[1]$.
Suppose $s_N \xrightarrow{p} \sigma$.
Then $\frac{Z_N}{s_N} \xrightarrow{d} \text{Normal}(0,1)$ and $\left(\frac{Z_N}{s_N}\right)^2 \xrightarrow{d} \chi^2[1]$.

Noncentral Chi Squared

Suppose $Z \sim N[\mu, 1]$, $\mu \ne 0$. Then $Z^2 \sim$ noncentral chi squared, with a noncentrality parameter that depends on $\mu^2$.
Suppose $Z_k \sim N[\mu_k, 1]$, $\mu_k \ne 0$, independent, $k = 1, \dots, K$. Then $\sum_{k=1}^{K} Z_k^2 \sim$ noncentral chi squared[K], with a noncentrality parameter that depends on $\sum_{k=1}^{K} \mu_k^2$.

[Figure: Central and Noncentral Chi Squared; plotted variables CENTRAL and NONCNTRL]

t distribution

$$t[v] = \frac{N[0,1]}{\sqrt{\chi^2[v]/v}}$$

If v=1, t=N[0,1]/N[0,1] = Cauchy. No finite moments.
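The definition of t[v] translates directly into a simulation. The sketch below assumes independent numerator and denominator draws. For v = 1 it checks the Cauchy property that about half of the draws fall in [-1, 1], since the quartiles of the standard Cauchy are -1 and +1. For v = 30 it checks the variance against v/(v-2), the known variance of t[v] when v > 2. The function name `t_draw`, the seed and the replication count are illustrative assumptions.

```python
import math
import random
import statistics

def t_draw(v, rng):
    """One draw of t[v] = N[0,1] / sqrt(chi squared[v] / v), independent parts."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))
    return z / math.sqrt(chi2 / v)

rng = random.Random(2024)   # arbitrary fixed seed
reps = 20000

# v = 1: the Cauchy case. Moments do not exist, so look at a quantile-based
# summary instead of a sample mean or variance.
t1 = [t_draw(1, rng) for _ in range(reps)]
share_inside = sum(abs(t) <= 1.0 for t in t1) / reps

# v = 30: moments exist and the variance is v/(v-2).
t30 = [t_draw(30, rng) for _ in range(reps)]
var30 = statistics.variance(t30)

print(f"v=1: share in [-1, 1] = {share_inside:.3f} (theory 0.5)")
print(f"v=30: variance = {var30:.3f} (theory {30 / 28:.3f})")
```

A sample mean of the v = 1 draws would not settle down as the number of replications grows, which is the practical meaning of "no finite moments".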
Limiting Form of t

$$t_v = \frac{N[0,1]}{\sqrt{\chi^2[v]/v}}$$

$\frac{\chi^2[v]}{v}$ has mean $\frac{1}{v}\,v = 1$ and variance $\frac{1}{v^2}\,2v = \frac{2}{v}$.
As $v \to \infty$, random variable in denominator converges in mean square to 1.
Implication: $t_v \xrightarrow{d} N[0,1]$

F Distribution

$x_n$ = numerator chi squared variable
$x_d$ = denominator chi squared variable
Independent

$$F_{n,d} = \frac{x_n/n}{x_d/d} = \frac{\chi^2[n]/n}{\chi^2[d]/d}$$

Limiting Form of F

Limit form of F relates to denominator degrees of freedom, den.
$x_{num}$ = numerator chi squared variable
$x_{den}$ = denominator chi squared variable
Independent

$$F_{num,den} = \frac{x_{num}/num}{x_{den}/den} = \frac{\chi^2[num]/num}{\chi^2[den]/den}$$

As $den \to \infty$, $\chi^2[den]/den \to 1$ and $num \cdot F \xrightarrow{d} \chi^2[num]$.
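The limiting form of F can be illustrated numerically. The sketch below draws F[num, den] with a large denominator degrees of freedom and checks that num times F has roughly the mean (num) and variance (2 num) of chi squared[num]. It relies on the Gamma representation used earlier: chi squared[k] is a Gamma variable with shape k/2 and rate 1/2, which is scale 2 in the (shape, scale) convention of `random.gammavariate`. The helper names, the seed and the values num = 3 and den = 2000 are illustrative assumptions.

```python
import random
import statistics

def chi_squared_draw(df, rng):
    """chi squared[df] as a Gamma draw with shape df/2 and scale 2 (rate 1/2)."""
    return rng.gammavariate(df / 2.0, 2.0)

def f_draw(num, den, rng):
    """F[num, den] = (chi squared[num]/num) / (chi squared[den]/den)."""
    return (chi_squared_draw(num, rng) / num) / (chi_squared_draw(den, rng) / den)

rng = random.Random(7)            # arbitrary fixed seed
num, den, reps = 3, 2000, 40000   # large den approximates the limit

scaled = [num * f_draw(num, den, rng) for _ in range(reps)]
mean_scaled = statistics.fmean(scaled)
var_scaled = statistics.variance(scaled)

print(f"mean of num*F = {mean_scaled:.3f} (chi squared[{num}] mean {num})")
print(f"variance of num*F = {var_scaled:.3f} (chi squared[{num}] variance {2 * num})")
```

Rerunning with a small value of den (say 5) shows a visibly larger variance, which is why the approximation is described as a limit in the denominator degrees of freedom.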
[Table: 95% critical values for chi squared]
[Table: 95% critical values for limiting F distribution]
Multiply the value in the last row of the F table by the (numerator) degrees of freedom. The result equals the critical value for chi squared.

Special Case of F

$$F(1,k) = \frac{\chi^2[1]/1}{\chi^2[k]/k}$$

$$t_k = \frac{N[0,1]}{\sqrt{\chi^2[k]/k}}$$

$$t_k^2 = \frac{(N[0,1])^2}{\chi^2[k]/k} = F(1,k)$$

Independence of Sample Mean and Variance in Normal Sampling

$X = (X_1, X_2, \dots, X_N)$, N independent Normal$[\mu, \sigma^2]$

Sample mean $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$

Sample variance $= s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2$

Main result: $\bar{X}$ and $s^2$ are independent.
Long elemental proof in text pp 195-197
Brief proof:
(1) $\bar{X}$ = sum of normals, $\bar{X} \sim \text{Normal}\left[\mu, \frac{\sigma^2}{N}\right]$
(2) $X_i - \bar{X}$ = linear function of normals, each is $\sim N\left[0, \sigma^2\,\frac{N-1}{N}\right]$
(3) $\text{Cov}[\bar{X}, X_i - \bar{X}] = 0$
(4) In multivariate normals, zero covariance ==> independence
(5) $\bar{X}$ and $s^2$ are functions of independent variables

Useful Result

$$\frac{(N-1)s^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{N} (X_i - \bar{X})^2 = \sum_{i=1}^{N} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2[N-1]$$

Note, N-1 degrees of freedom, not N. (Terms are not independent.) Proof in text.
Limiting form: $\text{Cov}[X_i - \bar{X}, X_j - \bar{X}] = -\frac{\sigma^2}{N} \to 0$
Implication: $E[s^2/\sigma^2] = 1$, $\text{Var}[s^2/\sigma^2] = 2/(N-1)$
$\frac{s^2}{\sigma^2} \xrightarrow{p} 1$ (converges in mean square)

Distribution of the t statistic

$$\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}}{\sqrt{\dfrac{(N-1)s^2}{\sigma^2} \Big/ (N-1)}} = \frac{\bar{X} - \mu}{s/\sqrt{N}} \sim t[N-1]$$

4.2 – Interval Estimation

Estimation

Point Estimator: Provides a single estimate of the feature in question based on prior and sample information.
Interval Estimator: Provides a range of values that incorporates both the point estimator and the uncertainty about the ability of the point estimator to find the population feature exactly.

Obtaining a Confidence Interval

Pivotal quantity: f(estimator, parameters) that has a known distribution free of parameters and data.
A probability statement can be made about the pivotal quantity.
Manipulate the interval to describe the parameter.
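The idea that a pivotal quantity has a distribution free of the parameters can be illustrated by simulation. The sketch below computes the t statistic sqrt(N)(xbar - mu)/s under two quite different (mu, sigma) pairs and reports how often it falls within +/- 2.069, the tabulated 95% two-sided critical value for t with 23 degrees of freedom. If the statistic is pivotal, both shares should be near 0.95. The function names, the seed, the two parameter pairs and the replication count are illustrative assumptions.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """sqrt(N) * (xbar - mu) / s, the pivotal quantity for a normal mean."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)            # uses the N - 1 divisor
    return math.sqrt(n) * (xbar - mu) / s

def share_within(mu, sigma, n, t_star, reps, rng):
    """Fraction of simulated samples with the t statistic in [-t_star, t_star]."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        hits += abs(t_statistic(sample, mu)) <= t_star
    return hits / reps

rng = random.Random(2718)               # arbitrary fixed seed
N, T_STAR, REPS = 24, 2.069, 10000      # N - 1 = 23 degrees of freedom

share_a = share_within(mu=0.0, sigma=1.0, n=N, t_star=T_STAR, reps=REPS, rng=rng)
share_b = share_within(mu=50.0, sigma=10.0, n=N, t_star=T_STAR, reps=REPS, rng=rng)
print(f"share within +/- t*: {share_a:.3f} and {share_b:.3f} (theory 0.95)")
```

The same probability statement holds whatever the values of mu and sigma, which is what allows it to be rearranged into an interval for the unknown parameter.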
Example – Normal Mean

In random sampling from the normal distribution with mean $\mu$ and variance $\sigma^2$,

$$\frac{\sqrt{N}\,(\bar{x} - \mu)}{s} \sim t[N-1]$$

This distribution is free of the parameters and of the data x.

$$\text{Prob}\left[\left|\frac{\sqrt{N}\,(\bar{x} - \mu)}{s}\right| \le t^*\right] = (1 - \alpha)$$

Therefore,

$$\text{Prob}\left[\,|\bar{x} - \mu| \le t^* \frac{s}{\sqrt{N}}\right] = (1 - \alpha)$$

$$\text{Prob}\left[\bar{x} - t^* \frac{s}{\sqrt{N}} \le \mu \le \bar{x} + t^* \frac{s}{\sqrt{N}}\right] = (1 - \alpha)$$

t distribution – values of t*

[Table: values of t* for the t distribution]
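The interval for the normal mean can be computed in a few lines. The sketch below takes the critical value t* as an input, to be read from a t table for N - 1 degrees of freedom, because the Python standard library has no t quantile function. The function name `mean_confidence_interval` and the five data values are hypothetical. The value 2.776 is the tabulated 95% two-sided critical value for t with 4 degrees of freedom.

```python
import math
import statistics

def mean_confidence_interval(sample, t_star):
    """Interval xbar -/+ t* s/sqrt(N); t_star must be the t[N-1] critical value."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)            # uses the N - 1 divisor
    half_width = t_star * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data with N = 5, so N - 1 = 4 degrees of freedom.
data = [4.1, 5.3, 3.8, 4.9, 5.4]
lower, upper = mean_confidence_interval(data, t_star=2.776)
print(f"95% interval for the mean: ({lower:.4f}, {upper:.4f})")
```

The interval is symmetric around the sample mean because the t distribution is symmetric around zero.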
Normal Variance

In random sampling from the normal distribution,

$$\frac{(N-1)s^2}{\sigma^2} \sim \chi^2[N-1]$$

Let $\chi^2_{p}$ denote the point of $\chi^2[N-1]$ with probability p to its left. Therefore,

$$\text{Prob}\left[\chi^2_{\alpha/2} \le \frac{(N-1)s^2}{\sigma^2} \le \chi^2_{1-(\alpha/2)}\right] = (1 - \alpha)$$

$$\text{Prob}\left[\frac{1}{\chi^2_{1-(\alpha/2)}} \le \frac{\sigma^2}{(N-1)s^2} \le \frac{1}{\chi^2_{\alpha/2}}\right] = (1 - \alpha)$$

$$\text{Prob}\left[\frac{(N-1)s^2}{\chi^2_{1-(\alpha/2)}} \le \sigma^2 \le \frac{(N-1)s^2}{\chi^2_{\alpha/2}}\right] = (1 - \alpha)$$
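A simulation gives a rough check on the variance interval. The sketch below draws repeated normal samples of size N = 24, forms the interval from (N-1)s^2 divided by the two chi squared critical values, and counts how often the true variance is covered. The critical values 11.69 and 38.08 are the tabulated 2.5% and 97.5% points of chi squared with 23 degrees of freedom. The function name, the seed, and the values of mu, sigma and the replication count are illustrative assumptions.

```python
import random
import statistics

# 2.5% and 97.5% points of chi squared[23] (table values, rounded).
CHI2_LOWER, CHI2_UPPER = 11.69, 38.08

def variance_confidence_interval(sample, chi2_lower, chi2_upper):
    """Interval from (N-1)s^2/chi2_upper to (N-1)s^2/chi2_lower for sigma^2."""
    n = len(sample)
    s2 = statistics.variance(sample)        # uses the N - 1 divisor
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

rng = random.Random(314)                    # arbitrary fixed seed
mu, sigma, N, reps = 0.0, 1.5, 24, 10000    # illustrative values; N - 1 = 23

covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(N)]
    lo, hi = variance_confidence_interval(sample, CHI2_LOWER, CHI2_UPPER)
    covered += lo <= sigma ** 2 <= hi

coverage = covered / reps
print(f"share of intervals containing sigma^2: {coverage:.3f} (theory 0.95)")
```

Note that the larger critical value produces the lower limit, and that the interval is not centered on s^2 because the chi squared distribution is skewed.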
Part 4 – Statistical Inference 28/34
GSOEP Income Data
Descriptive Statistics for 1 variables
--------+---------------------------------------------------------------------
Variable|      Mean    Std.Dev.    Minimum    Maximum    Cases   Missing
--------+---------------------------------------------------------------------
  HHNINC|   .353343    .157058     .035000   1.500000       24         0
--------+---------------------------------------------------------------------
For the mean, t* for 24-1 = 23 degrees of freedom = 2.069.
Confidence interval for the mean is .353343 +/- 2.069 * (.15708/sqrt(24)) = .353343 +/- 2.069 * .032064 = .353343 +/- .066340, or .287003 to .419683.
Confidence interval for the variance: critical values from chi squared[23] are 11.69 and 38.08.
Confidence interval for σ² is (24-1)(.15708²)/38.08 to (24-1)(.15708²)/11.69 = .014903 to .048546.
Confidence interval for σ is .122078 to .220332.
Notice that these intervals are not symmetric around s² or s.
Part 4 – Statistical Inference 29/34
Large Sample Results
There are almost no other cases in which there exists an exact pivotal quantity.
Most estimators rely on large sample results based on central limit theorems:
$$\frac{\text{estimator} - \text{parameter}}{\text{standard error of estimator}} \xrightarrow{d} N(0,1)$$
Part 4 – Statistical Inference 30/34
Confidence Intervals Relying on the Central Limit Theorem
$$\frac{\hat{\theta}-\theta}{\sqrt{\text{EstimatedVar}[\hat{\theta}]}} \xrightarrow{d} N[0,1]$$
$$\text{Prob}\left[-z^* \le \frac{\hat{\theta}-\theta}{\sqrt{\text{EstimatedVar}[\hat{\theta}]}} \le z^*\right] \approx 1-\alpha$$
Therefore, we use
$$\text{Prob}\left[\hat{\theta} - z^*\sqrt{\text{EstimatedVar}[\hat{\theta}]} \le \theta \le \hat{\theta} + z^*\sqrt{\text{EstimatedVar}[\hat{\theta}]}\right] \approx 1-\alpha.$$
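As an illustration of the large-sample recipe, the sketch below builds the interval from an estimate and its standard error, then applies it to a sample proportion, for which the usual estimated variance is p(1-p)/N evaluated at the estimate. The function names and the counts in the usage lines are hypothetical; z* comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def clt_interval(estimate, std_error, alpha=0.05):
    """Return estimate +/- z* times std_error, with z* from N(0,1)."""
    z_star = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_star * std_error
    return estimate - half_width, estimate + half_width

def proportion_interval(successes, n, alpha=0.05):
    """Large-sample interval for a proportion from a 0/1 sample."""
    p_hat = successes / n
    std_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return clt_interval(p_hat, std_error, alpha)

# Hypothetical data: 60 ones among 100 observations.
lo, hi = proportion_interval(60, 100)
print(f"approximate 95% interval: {lo:.4f} to {hi:.4f}")
```

Unlike the t and chi-squared intervals, the coverage here is only approximate and improves as N grows.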
Part 4 – Statistical Inference 31/34
Interpretation of the Interval
Not a statement about probabilities that θ will lie in specific intervals.
In repeated sampling, 100(1-α) percent of the time, the interval will contain the true parameter.
Part 4 – Statistical Inference 32/34
Application: Credit Modeling
1992 American Express analysis of
Application process: Acceptance or rejection; X = 0 (reject) or 1 (accept).
Cardholder behavior
• Loan default (D = 0 or 1).
• Average monthly expenditure (E = $/month)
• General credit usage/behavior (Y = number of charges)
13,444 applications in November, 1992
Part 4 – Statistical Inference 33/34
X̄ (the sample proportion accepted) in 100 samples with N = 144 in each sample
0.7809 is the true proportion in the population of 13,444 we are sampling from.
Part 4 – Statistical Inference 34/34
Estimates plus and minus 1 and 2 standard errors
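The repeated-sampling interpretation can be checked with a small simulation. The sketch below is not the original experiment: the 13,444 application records are not available here, so it assumes independent 0/1 draws with acceptance probability 0.7809 in place of sampling from the actual population. It draws 100 samples of N = 144 and counts how often the estimate plus and minus 1 and 2 standard errors covers the true proportion. All names and the seed are hypothetical.

```python
import math
import random

TRUE_P = 0.7809   # true proportion quoted in the text
N = 144           # observations per sample
SAMPLES = 100     # number of repeated samples

def simulate(seed=0):
    """Return a list of (p_hat, standard_error), one pair per sample."""
    rng = random.Random(seed)
    results = []
    for _ in range(SAMPLES):
        accepted = sum(rng.random() < TRUE_P for _ in range(N))
        p_hat = accepted / N
        std_error = math.sqrt(p_hat * (1.0 - p_hat) / N)
        results.append((p_hat, std_error))
    return results

def coverage(results, k):
    """Share of samples whose p_hat +/- k standard errors contains TRUE_P."""
    hits = sum(abs(p_hat - TRUE_P) <= k * se for p_hat, se in results)
    return hits / len(results)

results = simulate()
print("covered within 1 standard error: ", coverage(results, 1))
print("covered within 2 standard errors:", coverage(results, 2))
```

Under the normal approximation one would expect roughly two thirds of the 1 standard error bands and roughly 95 percent of the 2 standard error bands to contain 0.7809, with some variation from run to run.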