Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 3 – Estimation Theory

Immediate Reaction to the WHR Health System Performance Report
New York Times, June 21, 2000

A Model of the Best a Country Could Do vs. What They Actually Do

The following was taken from http://www.msnbc.msn.com/id/27339545/
An msnbc.com guide to presidential polls
Why results, samples and methodology vary from survey to survey
WASHINGTON - A poll is a small sample of some larger number, an estimate of something about that larger number. For instance, what percentage of people reports that they will cast their ballots for a particular candidate in an election? A sample reflects the larger number from which it is drawn. Let’s say you had a perfectly mixed barrel of 1,000 tennis balls, of which 700 are white and 300 orange. You do your sample by scooping up just 50 of those tennis balls.
If your barrel was perfectly mixed, you wouldn’t need to count all 1,000 tennis balls — your sample would tell you that 30 percent of the balls were orange.

Use random samples and basic descriptive statistics.
What is the ‘breach rate’ in a pool of tens of thousands of mortgages? (‘Breach’ = improperly underwritten or serviced or otherwise faulty mortgage.)
The forensic analysis was an examination of statistics from a random sample of 1,500 loans.
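The tennis-ball poll and the mortgage audit do the same thing: each estimates a population proportion from a simple random sample. In practice, a perfectly mixed barrel only gives about 30 percent orange in a scoop of 50. The standard error is roughly six percentage points, so any single scoop can miss by a few points.

The sketch below simulates that scoop. The helper names and the seed are illustrative choices and do not come from the slides. The same `standard_error` formula would apply to a breach rate estimated from 1,500 sampled loans.

```python
import math
import random


def sample_proportion(population, n, rng):
    """Draw n items without replacement and return the share coded 1."""
    draw = rng.sample(population, n)
    return sum(draw) / n


def standard_error(p_hat, n, pop_size=None):
    """Estimated standard error of a sample proportion.

    When pop_size is given, the finite population correction for sampling
    without replacement is applied.
    """
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    if pop_size is not None:
        se *= math.sqrt((pop_size - n) / (pop_size - 1))
    return se


# The barrel from the quoted example: 300 orange balls (1), 700 white (0).
barrel = [1] * 300 + [0] * 700
rng = random.Random(2008)  # arbitrary seed, for reproducibility only

p_hat = sample_proportion(barrel, 50, rng)
se = standard_error(p_hat, 50, pop_size=len(barrel))
print(f"estimated share orange: {p_hat:.2f}, standard error about {se:.3f}")
```

If you rerun the code with different seeds, the estimate scatters around 0.30 instead of landing on it exactly.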
Estimation
Nonparametric population features
  Mean – income
  Correlation – disease incidence and smoking
  Ratio – income per household member
  Proportion – proportion of ASCAP music played that is produced by Dave Matthews
  Distribution – histogram and density estimation
Parameters
  Fitting distributions – mean and variance of lognormal distribution of income
  Parametric models of populations – relationship of loan rates to attributes of minorities and others in Bank of America settlement on mortgage bias
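The list above mentions fitting a lognormal distribution to income. The slide does not say how. One standard approach is shown below.

If log(income) is normal with mean mu and variance sigma², the maximum likelihood estimates are the mean and the divisor-N variance of the logged data. The implied mean and variance of income are exp(mu + sigma²/2) and (exp(sigma²) - 1) exp(2mu + sigma²).

The "incomes" in this sketch are synthetic draws. The parameters mu = 10 and sigma = 0.5 are hypothetical and are not taken from any data in the course.

```python
import math
import random
import statistics


def fit_lognormal(incomes):
    """Estimate (mu, sigma^2) of a lognormal from positive data.

    Uses the mean and the divisor-N variance of log(income), which are the
    maximum likelihood estimators under the lognormal model.
    """
    logs = [math.log(y) for y in incomes]
    return statistics.fmean(logs), statistics.pvariance(logs)


def lognormal_mean_variance(mu, sigma2):
    """Mean and variance of income implied by lognormal parameters."""
    mean = math.exp(mu + sigma2 / 2.0)
    variance = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, variance


# Synthetic incomes with hypothetical parameters mu = 10, sigma = 0.5.
rng = random.Random(1)
incomes = [rng.lognormvariate(10.0, 0.5) for _ in range(5000)]

mu_hat, sigma2_hat = fit_lognormal(incomes)
mean_hat, var_hat = lognormal_mean_variance(mu_hat, sigma2_hat)
print(f"mu = {mu_hat:.3f}, sigma^2 = {sigma2_hat:.3f}")
print(f"implied mean income = {mean_hat:,.0f}, implied sd = {math.sqrt(var_hat):,.0f}")
```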
Measurements as Observations
[Diagram labels: Measurement, Theory, Characteristics, Behavior Patterns, Choices]
The theory argues that there are meaningful quantities to be statistically analyzed.
Application – Health and Income
German Health Care Usage Data, 7,293 Households, Observed 1984-1995
Data downloaded from Journal of Applied Econometrics Archive.
Some variables in the file are
DOCVIS = number of visits to the doctor in the observation period
HOSPVIS = number of visits to a hospital in the observation period
HHNINC = household nominal monthly net income in German marks / 10000.
(4 observations with income = 0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
PUBLIC = decision to buy public health insurance
HSAT = self-assessed health status (0,1,…,10)

Observed Data
Inference about Population
[Diagram labels: Population, Measurement, Characteristics, Behavior Patterns, Choices]
Classical Inference
The population is all 40 million German households (or all households in the entire world). The sample is the 7,293 German households in 1984-1995.
[Diagram labels: Population, Measurement, Sample, Characteristics, Behavior Patterns, Choices]
Imprecise inference about the entire population – sampling theory and asymptotics

Bayesian Inference
[Diagram labels: Population, Measurement, Sample, Characteristics, Behavior Patterns, Choices]
Sharp, ‘exact’ inference about only the sample – the ‘posterior’ density is posterior to the data.

Estimation of Population Features
Estimators and Estimates
  Estimator = strategy for use of the data
  Estimate = outcome of that strategy
Sampling Distribution
  Qualities of the estimator
  Uncertainty due to random sampling
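The estimator/estimate distinction can be made concrete in a few lines:

- The estimator is a rule, written here as a function.
- An estimate is what the rule returns for one particular sample.

In this sketch the rule is the sample mean, and the population is a hypothetical set of normal draws. Applying the same estimator to two random samples gives two different estimates, which is the "uncertainty due to random sampling" listed above.

```python
import random
import statistics


def estimator(sample):
    """The estimator: a fixed rule that maps any sample to a number.

    Here the rule is 'use the sample mean'; a different rule (median,
    trimmed mean, ...) would define a different estimator.
    """
    return statistics.fmean(sample)


# A hypothetical population, used only to illustrate the idea.
rng = random.Random(7)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# One estimator, two random samples, two different estimates.
sample_a = rng.sample(population, 25)
sample_b = rng.sample(population, 25)
estimate_a = estimator(sample_a)
estimate_b = estimator(sample_b)
print(f"estimate from sample A: {estimate_a:.2f}")
print(f"estimate from sample B: {estimate_b:.2f}")
```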
Estimation
Point Estimator: Provides a single estimate of the feature in question based on prior and sample information.
Interval Estimator: Provides a range of values that incorporates both the point estimator and the uncertainty about the ability of the point estimator to find the population feature exactly.
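A point estimator and an interval estimator can be computed side by side. The sketch below uses the sample mean as the point estimate. It builds a 95% interval as mean ± z × (standard error), with z ≈ 1.96 taken from the standard normal.

This is the large-sample normal approximation, which is an assumption of the sketch. For small samples a t critical value would usually be preferred. The data are synthetic.

```python
import math
import random
import statistics
from statistics import NormalDist


def point_and_interval(sample, level=0.95):
    """Sample mean plus a large-sample (normal approximation) interval.

    Assumes the sample is large enough that the sample mean is roughly
    normal; for small samples a t critical value is more appropriate.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return mean, (mean - z * se, mean + z * se)


rng = random.Random(16)
data = [rng.gauss(11.5, 1.5) for _ in range(200)]  # synthetic data
est, (lo, hi) = point_and_interval(data)
print(f"point estimate {est:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

The interval is centered on the point estimate, and its width reflects the sampling uncertainty of that estimate.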
‘Repeated Sampling’ – A Sampling Distribution
[Figure: histogram of 1,000 sample means]
This is a histogram for 1,000 means of samples of 20 observations from Normal[500, 100²]. The true mean is 500. Sample means vary around 500, some quite far off. The sample mean has a sampling mean and a sampling variance. The sample mean also has a probability distribution. Looks like a normal distribution.

Application: Credit Modeling
1992 American Express analysis of
Application process: Acceptance or rejection; X = 0 (reject) or 1 (accept).
Cardholder behavior
• Loan default (D = 0 or 1).
• Average monthly expenditure (E = $/month)
• General credit usage/behavior (Y = number of charges)
13,444 applications in November, 1992

X in 100 samples with N = 144 in each sample
0.7809 is the true proportion in the population of 13,444 we are sampling from.
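The repeated-sampling histogram of 1,000 means from Normal[500, 100²] can be reproduced by simulation. Theory says the sample mean of 20 draws has a sampling mean of 500 and a sampling standard deviation of 100/√20 ≈ 22.4. The simulated means should land close to both values. The seed is arbitrary, so the exact numbers will differ from the slide's histogram.

```python
import math
import random
import statistics

rng = random.Random(17)  # arbitrary seed
MU, SIGMA, N, REPS = 500.0, 100.0, 20, 1000  # Normal[500, 100^2], samples of 20

# Each entry is the mean of one sample of N normal draws.
means = [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
         for _ in range(REPS)]

print(f"mean of the 1,000 sample means: {statistics.fmean(means):.1f}")
print(f"sd of the sample means: {statistics.stdev(means):.2f} "
      f"(theory: {SIGMA / math.sqrt(N):.2f})")
```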
Estimation Concepts
Random Sampling
  Finite populations
  i.i.d. sample from an infinite population
Information
  Prior
  Sample
$\mathbf{X} = (X_1, X_2, \ldots, X_N)$ = a random sample
$\Pi$ = a set of 'outside', nonsample information about the population or the feature of interest
$\theta$ = a feature of the population of interest
$\hat{\theta}_N(X_1, X_2, \ldots, X_N \mid \Pi)$ = an estimator of $\theta$

Properties of Estimators
$\hat{\theta}_N$ is a function of a random sample, so it is a random variable.
Unbiasedness: $E[\hat{\theta}_N] = \theta$
Asymptotic Unbiasedness: $\lim_{N \to \infty} E[\hat{\theta}_N] = \theta$ (Usually not useful)
Consistency: $\operatorname{plim} \hat{\theta}_N = \theta$. (Convergence in mean square is usually sufficient.)
Efficiency: The 'best' use of the data when there is more than one alternative estimator available.
Sampling Distribution: Properties of the estimator used for constructing statistical inference.

Unbiasedness
The sample mean of the 100 sample estimates is 0.7844. The population mean (true proportion) is 0.7809.
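The unbiasedness experiment can be imitated on a synthetic stand-in population. The real American Express file is not reproduced here.

The stand-in has 13,444 records, of which 10,498 are coded X = 1, so its true proportion is 10,498/13,444 ≈ 0.7809. The code draws 100 samples of N = 144 without replacement and averages the 100 estimates. Unbiasedness says that average should sit near the true value. Like the slide's 0.7844, it will not hit it exactly in any one run.

```python
import random
import statistics

# Synthetic stand-in for the 13,444 applications (NOT the real data):
# 10,498 accepted (X = 1), so the true proportion is about 0.7809.
POP_SIZE, ACCEPTED = 13_444, 10_498
population = [1] * ACCEPTED + [0] * (POP_SIZE - ACCEPTED)
true_p = ACCEPTED / POP_SIZE

rng = random.Random(1992)  # arbitrary seed
# 100 samples of N = 144, each reduced to its sample proportion.
estimates = [statistics.fmean(rng.sample(population, 144)) for _ in range(100)]

print(f"true proportion: {true_p:.4f}")
print(f"mean of 100 estimates: {statistics.fmean(estimates):.4f}")
```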
Consistency
[Figure: sampling distributions of the estimated proportion for N = 144, N = 1024 and N = 4900, each plotted on the same axis from .7 to .88]
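The three panels plotted on one fixed axis make the consistency point visually: the histograms tighten as N grows. The sketch below quantifies that by simulation. It assumes i.i.d. Bernoulli draws with p = 0.7809, which means sampling with replacement and no finite-population correction. The slides do not say how their samples were drawn.

The simulated standard deviation of the sample proportion should track √(p(1 − p)/N), so it shrinks like 1/√N.

```python
import math
import random
import statistics

P = 0.7809  # true proportion from the slides
REPS = 500  # replications per sample size (arbitrary choice)
rng = random.Random(3302)  # arbitrary seed


def simulated_sd(n):
    """Std. dev. of the sample proportion across REPS i.i.d. samples of size n."""
    estimates = [sum(rng.random() < P for _ in range(n)) / n for _ in range(REPS)]
    return statistics.stdev(estimates)


results = {}
for n in (144, 1024, 4900):
    results[n] = simulated_sd(n)
    theory = math.sqrt(P * (1 - P) / n)
    print(f"N = {n:5d}: simulated sd {results[n]:.4f}, theory {theory:.4f}")
```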
30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Part 3 – Estimation Theory 24/98 Competing Estimators of a Parameter Di s tri b u ti o n o f Co s t o f 5 0 0 Ba n k s (i n 5 y e a rs ) 232 F req u en cy 174 116 58 0 8. 538 9. 681 10. 824 11. 967 13. 110 14. 253 15. 395 16. 538 C Bank costs are normally distributed with mean . Which is a better estimator of , the mean (11.46) or the median (11.27)? 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 24 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type 
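A small Monte Carlo experiment shows why the sample mean is usually preferred when the data really are normal. The Python sketch below rests on assumptions of our own:

- standard normal data (location and scale do not change the comparison);
- samples of 500, to match the bank example;
- 1,000 replications;
- a hypothetical helper, `sampling_variances`.

It estimates the sampling variance of each estimator. For large samples, theory puts the ratio Var(median)/Var(mean) near π/2 ≈ 1.57, so under normality the mean is the more efficient of the two.

```python
import random
import statistics

def sampling_variances(n, reps, mu=0.0, sigma=1.0, seed=2024):
    """Monte Carlo sampling variances of the sample mean and the sample
    median for normal samples of size n (illustrative settings only)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

v_mean, v_median = sampling_variances(n=500, reps=1000)
# for large n the ratio should be close to pi/2
print(v_mean, v_median, v_median / v_mean)
```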
Interval estimates of the acceptance rate
Based on the 100 samples of 144 observations
[Figure: interval estimates of the acceptance rate from the 100 samples]

Methods of Estimation
Information about the source population
Approaches:
Method of Moments
Maximum Likelihood
Bayesian
The Method of Moments
Estimating Parameters of Distributions Using Moment Equations
Population moment: $\mu_k = E[x^k] = f_k(\theta_1, \theta_2, \ldots, \theta_K)$
Sample moment: $m_k = \frac{1}{N}\sum_{i=1}^{N} x_i^k$. $m_k$ may also be $\frac{1}{N}\sum_{i=1}^{N} h_k(x_i)$; the functions need not be powers.
Law of large numbers: $\operatorname{plim} m_k = \mu_k = f_k(\theta_1, \theta_2, \ldots, \theta_K)$
'Moment equation' (k = 1,...,K): the sample analog of the population moment. Equate the sample moment to the function of the population parameters:
$m_k = \frac{1}{N}\sum_{i=1}^{N} x_i^k = f_k(\theta_1, \theta_2, \ldots, \theta_K)$
Method of moments estimator: invert the moment equations:
$\hat{\theta}_k = g_k(m_1, \ldots, m_K), \quad k = 1, \ldots, K$

Estimating a Parameter
Mean of Poisson
$p(y) = \exp(-\lambda)\lambda^y / y!, \quad y = 0, 1, \ldots; \; \lambda > 0$
$E[y] = \lambda$, so $E[\frac{1}{N}\sum_i y_i] = \lambda$. The sample mean is the estimator of λ.
Mean of Exponential
$f(y) = \theta\exp(-\theta y), \quad y > 0; \; \theta > 0$
$E[y] = 1/\theta$, so $E[\frac{1}{N}\sum_i y_i] = 1/\theta$. $1/\{\frac{1}{N}\sum_i y_i\}$ is the estimator of θ.

Mean and Variance of a Normal Distribution
$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right]$
Population moments: $E[y] = \mu, \quad E[y^2] = \sigma^2 + \mu^2$
Moment equations: $\frac{1}{N}\sum_{i=1}^{N} y_i = \mu, \quad \frac{1}{N}\sum_{i=1}^{N} y_i^2 = \sigma^2 + \mu^2$
Method of moments estimators: $\hat{\mu} = \bar{y}, \quad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} y_i^2 - \bar{y}^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2$
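A minimal Python sketch of these three estimators on simulated data. Several pieces are illustrative assumptions:

- the parameter values and the sample size;
- the helper names `poisson_draw`, `mom_poisson`, `mom_exponential` and `mom_normal`;
- the Poisson sampler, which is Knuth's multiplication method, written out because the standard `random` module has no Poisson generator.

Note that `random.expovariate` takes the rate θ, so its draws have mean 1/θ.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mom_poisson(y):
    # E[y] = lambda, so the sample mean estimates lambda
    return statistics.fmean(y)

def mom_exponential(y):
    # E[y] = 1/theta, so theta is estimated by 1 / (sample mean)
    return 1.0 / statistics.fmean(y)

def mom_normal(y):
    # E[y] = mu and E[y^2] = sigma^2 + mu^2; solve for (mu, sigma^2)
    m1 = statistics.fmean(y)
    m2 = statistics.fmean([v * v for v in y])
    return m1, m2 - m1 * m1      # divisor N, not N - 1

rng = random.Random(12345)
N = 20000
pois = [poisson_draw(3.0, rng) for _ in range(N)]
expo = [rng.expovariate(0.5) for _ in range(N)]    # rate theta = 0.5
norm = [rng.gauss(10.0, 2.0) for _ in range(N)]    # mu = 10, sigma = 2
print(mom_poisson(pois), mom_exponential(expo), mom_normal(norm))
```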
Proportion for Bernoulli
In the AmEx data, the true population acceptance rate is 0.7809 = θ.
Y = 1 if the application is accepted, 0 if not.
$E[y] = E[\frac{1}{N}\sum_i y_i] = p_{accept} = \theta$. The sample proportion is the estimator.

Gamma Distribution
$f(y) = \frac{\lambda^P \exp(-\lambda y)\, y^{P-1}}{\Gamma(P)}$
$E[y] = P/\lambda$
$E[y^2] = P(P+1)/\lambda^2$
$E[1/y] = \lambda/(P-1)$
$E[\log y] = \psi(P) - \log\lambda, \quad \psi(P) = d\ln\Gamma(P)/dP$
Any pair of moments can be used to estimate λ and P, and each pair gives a different answer. Is there a 'best' pair? Yes: the ones that are 'sufficient' statistics, E[y] and E[log y]. (Later.)

Method of Moments: Gamma Distribution Parameters
$p(y_i) = \frac{\lambda^P \exp(-\lambda y_i)\, y_i^{P-1}}{\Gamma(P)}$
Population moments: $E[y_i] = P/\lambda, \quad E[\log y_i] = \psi(P) - \log\lambda$
Moment equations:
$m_1(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} y_i\right] = P/\lambda$
$m_{\log}(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} \log y_i\right] = \psi(P) - \log\lambda$
$\psi(P) = \Gamma'(P)/\Gamma(P) = d\log\Gamma(P)/dP$
[Figure: Plot of Psi(P) Function; PSI against P]
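Because m2 - m1² estimates P/λ², the (m1, m2) pair can be inverted in closed form: λ̂ = m1/(m2 - m1²) and P̂ = m1²/(m2 - m1²). The Python sketch below applies this to simulated gamma data. The true values P = 2.5 and λ = 0.08, the sample size and the helper name `gamma_mom_mean_square` are assumptions for illustration, not the course data. `random.gammavariate(alpha, beta)` is parameterized by shape and scale, so the scale passed in is 1/λ.

```python
import random
import statistics

def gamma_mom_mean_square(y):
    """Closed-form MoM for the gamma (P, lam) from the pair (m1, m2).

    m1 = P/lam and m2 = P(P+1)/lam^2 imply m2 - m1^2 = P/lam^2, so
    lam = m1 / (m2 - m1^2) and P = m1^2 / (m2 - m1^2).
    """
    m1 = statistics.fmean(y)
    m2 = statistics.fmean([v * v for v in y])
    s2 = m2 - m1 * m1
    return m1 * m1 / s2, m1 / s2      # (P_hat, lam_hat)

rng = random.Random(31)
P_true, lam_true = 2.5, 0.08
# gammavariate(alpha, beta): shape alpha, SCALE beta, so beta = 1/lam
y = [rng.gammavariate(P_true, 1.0 / lam_true) for _ in range(50000)]
print(gamma_mom_mean_square(y))
```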
Estimate One Parameter
Assume λ is known to be 0.1. Estimate P.
$E[y] = P/\lambda = P/0.1 = 10P$
$m_1$ = mean of y = 31.278
The estimate of P is 31.278/10 = 3.1278. One equation in one unknown.

Application
Moment equations:
$m_1(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} y_i\right] = P/\lambda$
$m_2(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} y_i^2\right] = P(P+1)/\lambda^2$
$m_{\log}(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} \log y_i\right] = \psi(P) - \log\lambda$
Solving the moment equations using 'least squares' (a convenient approach):
Minimize $\{m_1 - E[m_1]\}^2 + \{m_{\log} - E[m_{\log}]\}^2 = (m_1 - P/\lambda)^2 + (m_{\log} - (\psi(P) - \log\lambda))^2$
$m_1 = 31.278, \quad m_{\log} = 3.221387$

Method of Moments Solutions
create ; y1=y ; y2=log(y) ; ysq=y*y $
calc ; m1=xbr(y1) ; mlog=xbr(y2) ; m2=xbr(ysq) $
Minimize; start = 2.0, .06 ; labels = p,l
; fcn= (m1 - p/l)^2 + (mlog - (psi(p)-log(l)))^2 $
--------+------------------------------------------
P| 2.41074
L| .07707
--------+------------------------------------------
Minimize; start = 2.0, .06 ; labels = p,l
; fcn= (m1 - p/l)^2 + (m2 - p*(p+1)/l^2 )^2 $
--------+------------------------------------------
P| 2.06182
L| .06589
--------+------------------------------------------
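With two equations in two unknowns, the least-squares criterion reaches zero at an exact solution, so the (m1, mlog) system can also be solved directly. The sketch below takes that route in Python rather than reproducing NLOGIT's Minimize. Substituting λ = P/m1 leaves the single equation ψ(P) - log P = mlog - log m1. Its left side increases in P, so bisection finds the root.

The hand-written digamma stands in for NLOGIT's psi(): it applies the recurrence until x ≥ 6, then the asymptotic series. The digamma code and the helper names are our own assumptions. From the slide, only m1 = 31.278 and mlog = 3.221387 are used.

```python
import math

def digamma(x):
    """psi(x) = d ln Gamma(x)/dx for x > 0: recurrence until x >= 6,
    then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result

def gamma_mom_mean_log(m1, mlog, lo=1e-3, hi=1e3, iters=200):
    """Solve m1 = P/lam and mlog = psi(P) - log(lam) exactly.
    With lam = P/m1 the system reduces to psi(P) - log(P) = mlog - log(m1),
    whose left side increases in P, so bisection brackets the root."""
    target = mlog - math.log(m1)
    if target >= 0:
        raise ValueError("need mlog < log(m1) (Jensen's inequality)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if digamma(mid) - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, p / m1

p_hat, lam_hat = gamma_mom_mean_log(31.278, 3.221387)
print(round(p_hat, 4), round(lam_hat, 5))
```

With these inputs the root should agree with the first Minimize result above (P ≈ 2.41, λ ≈ 0.077). A root always exists for this system. ψ(P) - log P runs over (-∞, 0), and Jensen's inequality gives mlog < log m1 for any nonconstant positive sample, so the target always falls in that range.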
Properties of MoM estimator
Unbiased? Sometimes, e.g., normal, Bernoulli and Poisson means.
Consistent? Yes, by virtue of the Slutsky theorem.
Efficient? Maybe; remains to be seen. (Which pair of moments should be used for the gamma distribution?)
Sampling distribution? Generally normal, by virtue of the Lindeberg-Levy central limit theorem and the Slutsky theorem.
Assumes parameters can vary continuously.
Assumes moment functions are continuous and smooth.

Estimating Sampling Variance
Exact sampling results: Poisson mean, normal mean and variance
Approximation based on linearization
Bootstrapping: discussed later with the maximum likelihood estimator
Exact Variance of MoM
Estimate the normal or Poisson mean.
The estimator is the sample mean, $\frac{1}{N}\sum_i Y_i$.
The exact variance of the sample mean is 1/N times the population variance.
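A quick Monte Carlo check of the exact result Var[ȳ] = σ²/N, written in Python. The normal population with σ = 2, N = 50, the 4,000 replications and the helper name `mc_variance_of_mean` are illustrative assumptions. A Poisson population would work the same way, with σ² replaced by λ.

```python
import random
import statistics

def mc_variance_of_mean(n, reps, sigma, seed=7):
    """Monte Carlo variance of the mean of n draws from normal(0, sigma^2)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pvariance(means)

n, sigma = 50, 2.0
simulated = mc_variance_of_mean(n, reps=4000, sigma=sigma)
exact = sigma ** 2 / n          # Var[ybar] = population variance / N
print(simulated, exact)
```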
Linearization Approach – 1 Parameter
THEORY: Variance of the Method of Moments Estimator
The distribution is a function of a parameter θ.
Moment equation: $E\left[\frac{1}{N}\sum_{i=1}^{N} g(x_i)\right] = f(\theta)$, with sample moment $m = \frac{1}{N}\sum_{i=1}^{N} g(x_i)$.
Write the equation in the form $m(\theta) = m - f(\theta)$.
Linearization: $m(\hat\theta) \approx m(\theta) + m'(\theta)(\hat\theta - \theta)$
But $m(\hat\theta) = 0$, so $\hat\theta - \theta \approx -\frac{m(\theta)}{m'(\theta)}$.
$\operatorname{Var}[\hat\theta] \approx \left[\frac{1}{m'(\theta)}\right]^2 \operatorname{Var}[m(\theta)]$
Note: $m'(\theta) = -\frac{df(\theta)}{d\theta}$.

Linearization Approach – 1 Parameter
APPLICATION: Exponential Distribution
$f(x|\theta) = \theta\exp(-\theta x), \quad \theta > 0, \; x \ge 0$
$E[x] = 1/\theta, \quad \operatorname{Var}[x] = 1/\theta^2$
$\hat\theta = 1/\bar{x}$
$m(\theta) = \bar{x} - 1/\theta, \quad m'(\theta) = 1/\theta^2$
$[1/m'(\theta)]^2 = \theta^4$
$\operatorname{Var}[m(\theta)] = \operatorname{Var}[\bar{x}] = \frac{\operatorname{Var}[x]}{N} = \frac{1}{N\theta^2}$
$\operatorname{Var}[\hat\theta] \approx \theta^4 \cdot \frac{1}{N\theta^2} = \frac{\theta^2}{N}$

Linearization Approach - General
The estimator is derived from moment equations:
$m_a - f_a(\theta_1, \theta_2, \ldots, \theta_K) = 0$
$m_b - f_b(\theta_1, \theta_2, \ldots, \theta_K) = 0$
...
There are the same number of moment equations as there are parameters.
Moments are sample means, $m_j = \frac{1}{N}\sum_{i=1}^{N} g_j(y_i), \; j = 1, \ldots, K$. The functions may be powers of x, log of x, 1/x, or other functions.
Solutions are $\hat\theta_1 = Q_1(m_a, m_b, \ldots, m_K)$, etc.
Two items are needed to compute the sampling variance:
$V_{jk} = \operatorname{Cov}[m_j, m_k]$ for all pairs, arranged in a matrix $\mathbf{V}$
$\mathbf{J} = \partial f_a(\theta_1, \theta_2, \ldots, \theta_K)/\partial(\theta_1, \theta_2, \ldots, \theta_K)$ (Jacobian), the K × K matrix of derivatives
The variance is then $\mathbf{J}^{-1}\mathbf{V}(\mathbf{J}^{-1})'$. (Requires matrix algebra. Later in the course.)

Exercise: Gamma Parameters
$m_1 = \frac{1}{N}\sum_i y_i \Rightarrow P/\lambda$
$m_2 = \frac{1}{N}\sum_i y_i^2 \Rightarrow P(P+1)/\lambda^2$
1. What is the Jacobian? (Derivatives)
2. How do we compute the variance of m1, the variance of m2 and the covariance of m1 and m2? (The variance of m1 is 1/N times the variance of y; the variance of m2 is 1/N times the variance of y²; the covariance is 1/N times the covariance of y and y².)
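One possible way to work the exercise numerically, sketched in Python under our own assumptions: simulated gamma data with P = 2, λ = 0.5 and N = 5,000, and a hypothetical helper, `gamma_mom_linearized`. The sketch proceeds in four steps:

1. The Jacobian of (P/λ, P(P+1)/λ²) with respect to (P, λ) is [[1/λ, -P/λ²], [(2P+1)/λ², -2P(P+1)/λ³]].
2. V is built from the sample variances and covariance of y and y², each divided by N, as the exercise describes.
3. The 2 × 2 inverse of J is written out by hand.
4. The linearized covariance J⁻¹V(J⁻¹)' is evaluated at the estimates.

```python
import math
import random

def gamma_mom_linearized(y):
    """MoM estimates of (P, lam) from (m1, m2) with the linearized
    covariance Ji V Ji', where Ji = J^{-1}, evaluated at the estimates."""
    n = len(y)
    y2 = [v * v for v in y]
    m1, m2 = sum(y) / n, sum(y2) / n
    s2 = m2 - m1 * m1                      # estimates P / lam^2
    p, lam = m1 * m1 / s2, m1 / s2
    # V: Cov of (m1, m2) = (1/N) * sample Cov of (y, y^2)
    c11 = sum((a - m1) ** 2 for a in y) / n
    c22 = sum((b - m2) ** 2 for b in y2) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(y, y2)) / n
    V = [[c11 / n, c12 / n], [c12 / n, c22 / n]]
    # J: derivatives of (P/lam, P(P+1)/lam^2) with respect to (P, lam)
    J = [[1 / lam, -p / lam ** 2],
         [(2 * p + 1) / lam ** 2, -2 * p * (p + 1) / lam ** 3]]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    Ji = [[J[1][1] / det, -J[0][1] / det],
          [-J[1][0] / det, J[0][0] / det]]
    # A = Ji V, then cov = A Ji'
    A = [[sum(Ji[i][k] * V[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    cov = [[sum(A[i][k] * Ji[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return p, lam, cov

rng = random.Random(43)
# shape P = 2 and scale 2, so lam = 1/scale = 0.5
y = [rng.gammavariate(2.0, 2.0) for _ in range(5000)]
p_hat, lam_hat, cov = gamma_mom_linearized(y)
print(p_hat, lam_hat, math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
```

The square roots of the diagonal of `cov` are approximate standard errors for P̂ and λ̂.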
Sufficient Statistics
Moment equations:
$m_1(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} y_i\right] = P/\lambda$
$m_2(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} y_i^2\right] = P(P+1)/\lambda^2$
$m_{\log}(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} \log y_i\right] = \psi(P) - \log\lambda$
$m_{-1}(\lambda, P) = E\left[\frac{1}{N}\sum_{i=1}^{N} (1/y_i)\right] = \lambda/(P-1)$
Any pair can be used to estimate P and λ. Is there a best choice? If 'sufficient statistics' exist, the estimator that is a function of them will have a smaller variance than one that is not.
Sufficient Statistic
x has density $f(x\mid\theta)$, where θ is a parameter. The joint density of a random sample is $f(x_1, x_2, \ldots, x_N \mid \theta)$. $T(x_1, x_2, \ldots, x_N)$ is a statistic formed from the N observations. If the conditional distribution $f(x_1, x_2, \ldots, x_N \mid T(\ldots), \theta)$ is not a function of θ, then $T(\ldots)$ is a sufficient statistic for θ.
Sufficient Statistic
N Bernoulli trials, X = 0 or 1, with X = 1 with probability θ. $\sum_i X_i = t$ successes.
$\text{Prob}(X_1=x_1,\ldots,X_N=x_N) = \theta^t(1-\theta)^{N-t}$
$\text{Prob}(X_1=x_1,\ldots,X_N=x_N \text{ and } \sum_i X_i = t) = \theta^t(1-\theta)^{N-t}$
$\text{Prob}(\sum_i X_i = t) = \binom{N}{t}\theta^t(1-\theta)^{N-t}$
$\text{Prob}(X_1=x_1,\ldots,X_N=x_N \mid \sum_i X_i = t) = \frac{\theta^t(1-\theta)^{N-t}}{\binom{N}{t}\theta^t(1-\theta)^{N-t}} = \frac{1}{\binom{N}{t}}$
This does not depend on θ, so $t = \sum_i X_i$ is a sufficient statistic for θ.
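The algebra above can be checked by brute force. This Python sketch (our own illustration; the function names and the choice N = 5, t = 2 are arbitrary) enumerates every 0/1 sequence, computes the conditional probability of each sequence given the number of successes using exact rational arithmetic, and shows that it equals $1/\binom{N}{t}$ whatever the value of θ.

```python
from fractions import Fraction
from itertools import product
from math import comb

def seq_prob(x, theta):
    """Prob(X_1 = x_1, ..., X_N = x_N) for independent Bernoulli(theta) trials."""
    p = Fraction(1)
    for xi in x:
        p *= theta if xi == 1 else 1 - theta
    return p

def conditional_given_t(N, t, theta):
    """Prob(sequence | sum of X = t) for every 0/1 sequence with t successes."""
    seqs = [x for x in product((0, 1), repeat=N) if sum(x) == t]
    p_t = sum(seq_prob(x, theta) for x in seqs)        # Prob(sum of X = t)
    return {x: seq_prob(x, theta) / p_t for x in seqs}

N, t = 5, 2
for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    cond = conditional_given_t(N, t, theta)
    # every conditional probability is 1/C(N, t), whatever theta is
    print(theta, set(cond.values()), Fraction(1, comb(N, t)))
```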
Sufficient Statistics
Sufficient statistics of fixed dimension (not growing with N) exist only for 'exponential families' of distributions, at least among densities whose range does not depend on θ. $f(x_i\mid\theta)$ is an exponential family if and only if
$\log f(x_i\mid\theta) = c(\theta)T(x_i) + d(\theta) + S(x_i)$.
Then $\sum_{i=1}^N T(x_i)$ is the sufficient statistic.
When there are K parameters,
$\log f(x_i\mid\theta_1,\ldots,\theta_K) = \sum_{k=1}^K c_k(\theta_1,\ldots,\theta_K)T_k(x_i) + d(\theta_1,\ldots,\theta_K) + S(x_i)$.
Bernoulli: $f(x\mid\theta) = \theta^x(1-\theta)^{1-x}$; $c(\theta) = \log[\theta/(1-\theta)]$, $T(x_i) = x_i$, $d(\theta) = \log(1-\theta)$, $S(x) = 0$. The sufficient statistic is $\sum_{i=1}^N T(x_i) = \sum_{i=1}^N x_i$.
Poisson: $\log f(x\mid\lambda) = x\log\lambda - \lambda - \log x!$. The sufficient statistic is $\sum_{i=1}^N x_i$.

Gamma Density
$f(x\mid\lambda,P) = \frac{\lambda^P x^{P-1}e^{-\lambda x}}{\Gamma(P)}$
$\log f(x_i\mid\lambda,P) = P\log\lambda + (P-1)\log x_i - \lambda x_i - \log\Gamma(P)$
The sufficient statistics are $m_1 = \sum_{i=1}^N x_i$ and $m_{\log} = \sum_{i=1}^N \log x_i$.

Rao Blackwell Theorem
The mean squared error of an estimator based on sufficient statistics is no larger than (and usually smaller than) that of one not based on sufficient statistics. We deal in consistent estimators, so a large sample (approximate) version of the theorem is that estimators based on sufficient statistics are more efficient than those that are not.
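A small simulation makes the Rao-Blackwell idea concrete for Bernoulli data. The crude estimator $X_1$ is unbiased for θ but ignores the sufficient statistic $T = \sum_i X_i$; conditioning on T gives $E[X_1 \mid T] = T/N$, the sample proportion. This Python sketch (θ = 0.3, N = 20, and the replication count are illustrative choices) estimates both mean squared errors; theory gives θ(1 - θ) = 0.21 for the crude estimator and θ(1 - θ)/N = 0.0105 for the conditioned one.

```python
import random

def simulate_mse(theta, n, reps, seed=0):
    rng = random.Random(seed)
    se_crude = se_rb = 0.0
    for _ in range(reps):
        x = [1 if rng.random() < theta else 0 for _ in range(n)]
        crude = x[0]            # unbiased, but ignores T = sum(x)
        rb = sum(x) / n         # E[x[0] | T] = T/n, the Rao-Blackwellized estimator
        se_crude += (crude - theta) ** 2
        se_rb += (rb - theta) ** 2
    return se_crude / reps, se_rb / reps

mse_crude, mse_rb = simulate_mse(theta=0.3, n=20, reps=20000)
print(mse_crude, mse_rb)   # theory: 0.21 and 0.0105
```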
Maximum Likelihood Estimation
Criterion
Comparable to method of moments
Several virtues: broadly, uses all the sample and nonsample information available; efficient (better than MoM in many cases)

Setting Up the MLE
The distribution of the observed random variable is written as a function of the parameter(s) to be estimated.
$P(y_i\mid\theta)$ = probability density of data | parameters.
$L(\theta\mid y_i)$ = likelihood of parameter | data.
The likelihood function is constructed from the density.
Construction: the joint probability density function of the observed sample of data, generally the product of the individual densities when the data are a random sample.
The estimator is chosen to maximize the likelihood of the data (essentially the probability of observing the sample in hand).
Regularity Conditions
Why? A regular MLE has known, good properties. Nonregular estimators usually do not have known properties (good or bad). An MLE exists for nonregular densities (see text), but its statistical properties are questionable.
What they are:
1. log f(.) has three continuous derivatives with respect to the parameters.
2. Conditions needed to obtain expectations of derivatives are met. (E.g., the range of the variable is not a function of the parameters.)
3. The third derivative has finite expectation.
What they mean: moment conditions and convergence. We need to obtain expectations of derivatives, we need to be able to truncate Taylor series, and we will use central limit theorems.

Regular Exponential Density
Exponential density: $f(y_i\mid\theta) = (1/\theta)\exp(-y_i/\theta)$.
Average time until failure, θ, of light bulbs. $y_i$ = observed life until failure.
Regularity:
(1) The range of y is 0 to ∞, free of θ.
(2) $\log f(y_i\mid\theta) = -\log\theta - y_i/\theta$; $\partial\log f(y_i\mid\theta)/\partial\theta = -1/\theta + y_i/\theta^2$. $E[y_i] = \theta$, so $E[\partial\log f(y_i\mid\theta)/\partial\theta] = 0$.
(3) $\partial^2\log f(y_i\mid\theta)/\partial\theta^2 = 1/\theta^2 - 2y_i/\theta^3$, with finite expectation $-1/\theta^2$.
(4) $\partial^3\log f(y_i\mid\theta)/\partial\theta^3 = -2/\theta^3 + 6y_i/\theta^4$, with finite expectation $4/\theta^3$.
(5) All derivatives are continuous functions of θ.

Likelihood Function
$L(\theta) = \prod_i f(y_i\mid\theta)$
MLE = the value of θ that maximizes the likelihood function.
It is generally easier to maximize the log of L.
The same θ maximizes log L. In random sampling, $\log L = \sum_i \log f(y_i\mid\theta)$.

Poisson Likelihood
y = 5, 0, 1, 1, 0, 3, 2, 3, 4, 1
$f(y\mid\theta) = \frac{\exp(-\theta)\theta^y}{y!}$ = Poisson
Likelihood $= \frac{\exp(-\theta)\theta^5}{5!}\cdot\frac{\exp(-\theta)\theta^0}{0!}\cdot\frac{\exp(-\theta)\theta^1}{1!}\cdots = \frac{\theta^{20}\exp(-10\theta)}{207{,}360}$
Log likelihood $= -10\theta + 20\log\theta - 12.242$
The maximum occurs at θ = 2.
log and ln both mean natural log throughout this course.

The MLE
The log-likelihood function: $\log L(\theta\mid\text{data}) = \sum_i \log f(y_i\mid\theta)$.
The likelihood equation(s) = first derivative: the first derivatives of log L equal zero at the MLE.
$\partial[\sum_i \log f(y_i\mid\theta)]/\partial\theta_{MLE} = 0$.
(Interchange summation and differentiation)
$\sum_i [\partial\log f(y_i\mid\theta)/\partial\theta_{MLE}] = 0$.
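The Poisson likelihood numbers can be reproduced directly. This Python sketch (helper names are ours) evaluates the log-likelihood with `math.lgamma(y + 1)` standing in for log y!, confirms the constant log(207,360) ≈ 12.242, locates the maximum by a simple grid search over θ, and checks that the likelihood equation (the score) is zero there. The grid search is only for transparency; in practice one solves the likelihood equation directly.

```python
import math

y = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]

def poisson_loglik(theta, data):
    # sum_i [ -theta + y_i * log(theta) - log(y_i!) ], with lgamma(y + 1) = log(y!)
    return sum(-theta + yi * math.log(theta) - math.lgamma(yi + 1) for yi in data)

def poisson_score(theta, data):
    # derivative of the log-likelihood: sum_i (-1 + y_i / theta)
    return sum(-1.0 + yi / theta for yi in data)

const = sum(math.lgamma(yi + 1) for yi in y)     # log(207,360)
grid = [k / 1000 for k in range(1, 10001)]       # theta in (0, 10]
theta_hat = max(grid, key=lambda th: poisson_loglik(th, y))
print(round(const, 3), theta_hat, poisson_score(theta_hat, y))
```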
Applications
Bernoulli
Exponential
Poisson
Normal
Gamma
Bernoulli
$f(y\mid\theta) = (1-\theta)^{1-y}\theta^y$
$\log f = (1-y)\log(1-\theta) + y\log\theta$
log likelihood: $\log L = \sum_{i=1}^N [(1-y_i)\log(1-\theta) + y_i\log\theta]$
likelihood equation: $\frac{\partial\log L}{\partial\theta} = \sum_{i=1}^N\left[\frac{-(1-y_i)}{1-\theta} + \frac{y_i}{\theta}\right] = 0$
so $\theta\sum_{i=1}^N(1-y_i) = (1-\theta)\sum_{i=1}^N y_i$, i.e., $N\theta - \theta\sum_{i=1}^N y_i = \sum_{i=1}^N y_i - \theta\sum_{i=1}^N y_i$
solution: $\hat\theta_{MLE} = \frac{\sum_{i=1}^N y_i}{N}$
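The closed-form Bernoulli answer can be checked against a direct numerical maximization of log L. This Python sketch uses a golden-section search (our choice of optimizer, valid here because the Bernoulli log-likelihood is concave in θ) on a small hypothetical sample with 4 successes in 10 trials; the numerical maximizer should agree with the sample proportion 0.4 to within the search tolerance.

```python
import math

def bernoulli_loglik(theta, y):
    return sum((1 - yi) * math.log(1 - theta) + yi * math.log(theta) for yi in y)

def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2                 # about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):                        # maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                                  # maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]             # hypothetical sample: 4 successes in 10
theta_hat = golden_max(lambda t: bernoulli_loglik(t, y), 1e-9, 1 - 1e-9)
print(theta_hat, sum(y) / len(y))
```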
Exponential
Estimating the average time until failure, θ, of light bulbs. $y_i$ = observed life until failure.
$f(y_i\mid\theta) = (1/\theta)\exp(-y_i/\theta)$
$L(\theta) = \prod_i f(y_i\mid\theta) = \theta^{-N}\exp(-\sum_i y_i/\theta)$
$\log L(\theta) = -N\log\theta - \sum_i y_i/\theta$
Likelihood equation: $\partial\log L(\theta)/\partial\theta = -N/\theta + \sum_i y_i/\theta^2 = 0$
Solution (multiply both sides of the equation by $\theta^2$): $\hat\theta = \sum_i y_i/N$ (the sample average estimates the population average)

Poisson Distribution
$P(y) = \frac{\exp(-\lambda)\lambda^y}{y!} = \frac{\exp(-\lambda)\lambda^y}{\Gamma(y+1)}$
$\log L = -N\lambda + \sum_{i=1}^N y_i\log\lambda - \sum_{i=1}^N \log\Gamma(y_i+1)$
Likelihood equation: $\frac{\partial\log L}{\partial\lambda} = -N + \frac{\sum_{i=1}^N y_i}{\lambda} = 0$
Solution: multiply the equation by λ; $\hat\lambda = \frac{\sum_{i=1}^N y_i}{N} = \bar y$
y = 5, 0, 1, 1, 0, 3, 2, 3, 4, 1; $\bar y = 2$
The sample mean estimates the population mean.

Normal Distribution
$f(y_i\mid\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(y_i-\mu)^2}{2\sigma^2}\right]$. Let θ = σ²:
$f(y_i\mid\mu,\theta) = \frac{1}{\sqrt{2\pi\theta}}\exp\left[-\frac{(y_i-\mu)^2}{2\theta}\right]$
$\log L = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log\theta - \frac{1}{2}\frac{1}{\theta}\sum_{i=1}^N (y_i-\mu)^2$
$\frac{\partial\log L}{\partial\mu} = -\frac{1}{2}\frac{1}{\theta}\sum_{i=1}^N 2(y_i-\mu)(-1) = \frac{1}{\theta}\sum_{i=1}^N (y_i-\mu) = 0$
Multiply both sides by θ and solve: $\hat\mu = \bar y$.
$\frac{\partial\log L}{\partial\theta} = -\frac{N}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^N (y_i-\bar y)^2 = 0$ (using the solution for μ)
Multiply both sides by $2\theta^2$ and solve: $\hat\theta = \frac{\sum_{i=1}^N (y_i-\bar y)^2}{N}$
Reclaim the name: $\hat\sigma^2 = \frac{\sum_{i=1}^N (y_i-\bar y)^2}{N}$, so $\hat\sigma = \sqrt{\frac{\sum_{i=1}^N (y_i-\bar y)^2}{N}}$

Gamma Distribution
$f(x\mid\lambda,P) = \frac{\lambda^P x^{P-1}e^{-\lambda x}}{\Gamma(P)}$
$\log f(x_i\mid\lambda,P) = P\log\lambda + (P-1)\log x_i - \lambda x_i - \log\Gamma(P)$
$\log L = NP\log\lambda + (P-1)\sum_{i=1}^N \log x_i - \lambda\sum_{i=1}^N x_i - N\log\Gamma(P)$
$\frac{\partial\log L}{\partial\lambda} = \frac{NP}{\lambda} - \sum_{i=1}^N x_i$
$\frac{\partial\log L}{\partial P} = N\log\lambda + \sum_{i=1}^N \log x_i - N\frac{d\log\Gamma(P)}{dP}$
These must be solved iteratively. There is no explicit solution.
[Figure: Plot of Psi(P) Function; vertical axis PSI, horizontal axis P]
$\Psi(P) = \Gamma'(P)/\Gamma(P) = d\log\Gamma(P)/dP$

Gamma Application
Gamma (Loglinear) Regression Model
Dependent variable Y
Log likelihood function -85.37567
--------+-------------------------------------------------------------
        |                 Standard            Prob.      95% Confidence
       Y|  Coefficient      Error       z    |z|>Z*         Interval
--------+-------------------------------------------------------------
        |Parameters in conditional mean function
  LAMBDA|     .07707***     .02544     3.03   .0024      .02722   .12692
        |Scale parameter for gamma model
 P_scale|    2.41074***     .71584     3.37   .0008     1.00757  3.81363
--------+-------------------------------------------------------------
SAME SOLUTION AS METHOD OF MOMENTS USING M1 and Mlog
create ; y1=y ; y2=log(y) $
calc ; m1=xbr(y1) ; mlog=xbr(y2) $
Minimize ; start = 2.0, .06 ; labels = p,l
 ; fcn= (m1 - p/l)^2 + (mlog - (psi(p)-log(l)))^2 $
-------------------------------------------------------------
       P|    2.41074
       L|     .07707
--------+----------------------------------------------------
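The gamma likelihood equations can also be solved outside NLOGIT. This Python sketch is our own construction, not the routine NLOGIT uses: setting $\partial\log L/\partial\lambda = 0$ gives $\lambda = P/\bar x$, and substituting into $\partial\log L/\partial P = 0$ leaves one equation in P, $\log P - \psi(P) = \log m_1 - m_{\log}$ (with $m_1$, $m_{\log}$ as sample means), solved here by bisection. Those two equations are exactly the moment equations in $m_1$ and $m_{\log}$, which is why the MLE and that method-of-moments estimator coincide. The digamma approximation (recurrence plus asymptotic series) and the simulated data (P = 2.5, λ = 0.1) are illustrative assumptions; the course data set is not reproduced.

```python
import math
import random

def digamma(x):
    """psi(x) = d log Gamma(x) / dx for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x            # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result

def gamma_mle(x, lo=1e-6, hi=1e6, iters=200):
    """Solve the gamma likelihood equations for (P, lambda).

    dlogL/dlambda = 0 gives lambda = P / m1. Substituting into dlogL/dP = 0
    leaves log P - psi(P) = log(m1) - mlog, whose left side falls from
    +infinity toward 0 as P grows, so bisection on P finds the root.
    """
    n = len(x)
    m1 = sum(x) / n
    mlog = sum(math.log(v) for v in x) / n
    s = math.log(m1) - mlog          # positive unless all x are equal
    for _ in range(iters):
        mid = math.sqrt(lo * hi)     # bisect on a log scale
        if math.log(mid) - digamma(mid) > s:
            lo = mid                 # root lies at a larger P
        else:
            hi = mid
    P = math.sqrt(lo * hi)
    return P, P / m1

random.seed(3)
# hypothetical data: P = 2.5, lambda = 0.1 (gammavariate takes scale = 1/lambda)
x = [random.gammavariate(2.5, 10.0) for _ in range(10000)]
P_hat, lam_hat = gamma_mle(x)
print(P_hat, lam_hat)
```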
Properties of the MLE Estimator
- Regularity
- Finite sample vs. asymptotic properties
- Properties of the estimator
- Information used in estimation

Properties of the MLE
- Sometimes unbiased, usually not
- Always consistent (under regularity)
- Large sample normal distribution
- Efficient
- Invariant
- Sufficient (uses sufficient statistics when they exist)
Unbiasedness
- Usually only when the parameter being estimated is the mean of the random variable:
  - Normal mean
  - Poisson mean
  - Bernoulli probability (which is the mean)
- The MLE does not make degrees of freedom corrections.
- Almost no other cases.
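The degrees-of-freedom point can be made concrete with a small Monte Carlo sketch in Python. The sample size, number of replications, parameter values and seed are arbitrary illustrative choices. For normal data, the MLE of the mean, $\bar y$, is unbiased. The MLE of the variance divides by $N$ rather than $N-1$, so its expectation is $(N-1)\sigma^2/N$.

```python
import random


def average_normal_mles(n=5, reps=100_000, mu=1.0, sigma=2.0, seed=7):
    """Average the normal MLEs of mu and sigma^2 over many simulated samples."""
    rng = random.Random(seed)
    total_mean = total_var = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(y) / n                                 # MLE of mu
        total_mean += ybar
        total_var += sum((v - ybar) ** 2 for v in y) / n  # MLE of sigma^2: divides by n
    return total_mean / reps, total_var / reps


avg_mean, avg_var = average_normal_mles()
# Theory: E[ybar] = mu = 1.0 and E[sigma2_hat] = (n - 1)/n * sigma^2 = 0.8 * 4 = 3.2
```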
Consistency
- Under regularity, the MLE is consistent.
- Without regularity, it may be consistent, but this usually cannot be proved.
- In almost all cases, the MLE is mean square consistent:
  - its expectation converges to the parameter, and
  - its variance converges to zero.
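Mean square consistency can be checked exactly for the normal-variance MLE. Because $\sum_i (y_i - \bar y)^2/\sigma^2 \sim \chi^2_{N-1}$, the estimator $\hat\sigma^2$ has mean $(N-1)\sigma^2/N$ and variance $2(N-1)\sigma^4/N^2$. The short Python sketch below tabulates both, with $\sigma^2 = 4$ as an arbitrary illustrative value. The bias and the variance, and therefore the mean squared error, go to zero as $N$ grows, even though the estimator is biased for every finite $N$.

```python
def sigma2_mle_moments(n, sigma2=4.0):
    """Exact mean, variance and MSE of the normal MLE of sigma^2 for sample size n."""
    mean = (n - 1) / n * sigma2
    var = 2 * (n - 1) / n**2 * sigma2**2
    mse = (mean - sigma2) ** 2 + var
    return mean, var, mse


for n in (5, 50, 500, 5000):
    mean, var, mse = sigma2_mle_moments(n)
    print(f"N={n:5d}  E={mean:.4f}  Var={var:.6f}  MSE={mse:.6f}")
```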
(The proof of consistency is sketched in the Rice text, pp. 275-276.)

Large Sample Distribution (sketch of a proof for one parameter)
At the MLE, the first derivative of the log likelihood is zero:
$$ g(\hat\theta) = \frac{\partial \log L}{\partial \hat\theta} = \sum_{i=1}^N g_i(\hat\theta) = 0. $$
Linearize around $\theta$:
$$ g(\hat\theta) \approx g(\theta) + H(\theta)(\hat\theta - \theta) \quad [+ \ldots \to 0], $$
where
$$ g(\theta) = \frac{\partial \log L}{\partial \theta} = \sum_{i=1}^N g_i(\theta), \qquad H(\theta) = \frac{\partial^2 \log L}{\partial \theta^2} = \sum_{i=1}^N H_i(\theta). $$
Set the linearization equal to zero, solve for $(\hat\theta - \theta)$, and cleverly multiply by $\sqrt{N}$:
$$ \sqrt{N}(\hat\theta - \theta) = \frac{-g(\theta)/\sqrt{N}}{H(\theta)/N} = \frac{-\sqrt{N}\,\frac{1}{N}\sum_{i=1}^N g_i(\theta)}{\frac{1}{N}\sum_{i=1}^N H_i(\theta)}. $$
The denominator converges to $E[H_i(\theta)] < 0$. Apply a central limit theorem to the numerator, whose terms $g_i(\theta)$ have mean zero.
Conclude:
$$ \sqrt{N}(\hat\theta - \theta) = \frac{-\sqrt{N}\,\frac{1}{N}\sum_{i=1}^N g_i(\theta)}{\frac{1}{N}\sum_{i=1}^N H_i(\theta)} \xrightarrow{d} N\left[0, \frac{\mathrm{Var}[g_i(\theta)]}{\{E[H_i(\theta)]\}^2}\right]. $$

The Information Equality
A useful result is Fisher's information equality:
$$ \mathrm{Var}\left[\frac{\partial \log f(y_i|\theta)}{\partial \theta}\right] = -E\left[\frac{\partial^2 \log f(y_i|\theta)}{\partial \theta^2}\right], \quad \text{i.e.,} \quad \mathrm{Var}[g_i(\theta)] = -E[H_i(\theta)]. $$
The variance of the first derivative equals the negative of the expected value of the second derivative.
Example: $f(y_i|\theta) = \exp(-y_i/\theta)/\theta$, with $E[y_i] = \theta$ and $\mathrm{Var}[y_i] = \theta^2$.
$\log f(y_i|\theta) = -\log\theta - y_i/\theta$.
$g_i(\theta) = -1/\theta + y_i/\theta^2$, so $E[g_i(\theta)] = 0$.
$\mathrm{Var}[g_i(\theta)] = (1/\theta^4)\mathrm{Var}[y_i] = 1/\theta^2$.
$H_i(\theta) = 1/\theta^2 - 2y_i/\theta^3$.
$-E[H_i(\theta)] = -[1/\theta^2 - 2\theta/\theta^3] = 1/\theta^2$.

Deduce the Variance of the MLE
$$ \sqrt{N}(\hat\theta - \theta) = \frac{-g(\theta)/\sqrt{N}}{H(\theta)/N} \xrightarrow{d} N\left[0, \frac{\mathrm{Var}[g_i(\theta)]}{\{E[H_i(\theta)]\}^2}\right]. $$
We found $\mathrm{Var}[g_i(\theta)] = -E[H_i(\theta)]$.
Substitute:
$$ \sqrt{N}(\hat\theta - \theta) \xrightarrow{d} N\left[0, \frac{-E[H_i(\theta)]}{\{E[H_i(\theta)]\}^2}\right] = N\left[0, \frac{1}{-E[H_i(\theta)]}\right]. $$
Solve for $\hat\theta$:
$$ \hat\theta \overset{a}{\sim} N\left[\theta, \frac{1}{N\{-E[H_i(\theta)]\}}\right] = N\left[\theta, \frac{1}{N\,I(\theta)}\right], $$
where $I(\theta) = -E[H_i(\theta)]$ is the 'information number'.

Computing the Variance of the MLE
Asymptotic $\mathrm{Var}(\hat\theta) = \dfrac{-1}{N\,E[H_i(\theta)]}$. There are four ways to estimate it:
1. Use the formula for the expected second derivative: compute $E[H_i(\theta)]$ at $\hat\theta$, then use $-1/\{N\,E[H_i(\hat\theta)]\}$.
2. Just plug into the actual second derivatives: since $\operatorname{plim} \frac{1}{N}\sum_{i=1}^N H_i(\theta) = E[H_i(\theta)]$, use $-1/\sum_{i=1}^N H_i(\hat\theta)$.
3. Use the mean square of the first derivatives: since $-E[H_i(\theta)] = \mathrm{Var}[g_i(\theta)]$, estimate it with $\frac{1}{N}\sum_{i=1}^N [g_i(\hat\theta)]^2$ and use $1/\sum_{i=1}^N [g_i(\hat\theta)]^2$.
4. Bootstrapping.

Application: GSOEP Income
Descriptive Statistics for 1 variables
--------+------------------------------------------------------------------
Variable|      Mean      Std.Dev.     Minimum     Maximum    Cases  Missing
--------+------------------------------------------------------------------
  HHNINC|   .355564     .166561      .030000       2.0       2698        0
--------+------------------------------------------------------------------

Variance of MLE
Example: $f(y|\theta) = \exp(-y/\theta)/\theta$, with $E[y] = \theta$ and $\mathrm{Var}[y] = \theta^2$.
$g(\theta) = -1/\theta + y/\theta^2$, so $E[g(\theta)] = 0$.
$\mathrm{Var}[g(\theta)] = (1/\theta^4)\mathrm{Var}[y] = 1/\theta^2$.
$H(\theta) = 1/\theta^2 - 2y/\theta^3$.
$\hat\theta = \bar y = .355564$
$-E[H(\theta)] = -[1/\theta^2 - 2\theta/\theta^3] = 1/\theta^2$
$-1/(N\,E[H(\hat\theta)]) = \hat\theta^2/N$, standard error = .0068542
$-1/\sum_{i=1}^N H_i(\hat\theta) = -1/\sum_{i=1}^N (1/\hat\theta^2 - 2y_i/\hat\theta^3)$, standard error = .0068542
$1/\sum_{i=1}^N [g_i(\hat\theta)]^2 = 1/\sum_{i=1}^N (-1/\hat\theta + y_i/\hat\theta^2)^2$, standard error = .01461582

Bootstrapping
Given the sample, i = 1,…,N:
- Sample N observations with replacement; some get picked more than once, and some do not get picked.
- Recompute the estimate of θ.
- Repeat R times, obtaining R new estimates of θ.
- Estimate the variance with the sample variance of the R new estimates.

Bootstrap Results
Estimated standard error = .003112.
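Here is a compact Python sketch of the four approaches listed above for the exponential model. It assumes the data are a plain list of positive values. The function name, the number of replications R and the seed are illustrative choices. With $\hat\theta = \bar y$, methods 1 and 2 coincide exactly, because $\sum_i H_i(\hat\theta) = N/\hat\theta^2 - 2N\bar y/\hat\theta^3 = -N/\hat\theta^2$. Method 3 relies on the information equality, so it agrees with them only if the exponential model is right ($\mathrm{Var}[y] = \theta^2$). The bootstrap makes no use of the exponential assumption.

```python
import random


def exponential_mle_variances(y, R=1000, seed=1):
    """Four estimates of Var(theta_hat) for f(y|theta) = exp(-y/theta)/theta."""
    n = len(y)
    theta = sum(y) / n                                    # MLE: theta_hat = ybar
    # 1. Expected second derivative: -E[H_i] = 1/theta^2
    v_expected = theta**2 / n
    # 2. Actual second derivatives: H_i = 1/theta^2 - 2 y_i / theta^3
    v_hessian = -1.0 / sum(1 / theta**2 - 2 * v / theta**3 for v in y)
    # 3. Mean square of first derivatives: g_i = -1/theta + y_i / theta^2
    v_outer = 1.0 / sum((-1 / theta + v / theta**2) ** 2 for v in y)
    # 4. Bootstrap: draw n observations with replacement, R times
    rng = random.Random(seed)
    boot = []
    for _ in range(R):
        sample = rng.choices(y, k=n)
        boot.append(sum(sample) / n)                      # recompute theta_hat
    mb = sum(boot) / R
    v_boot = sum((b - mb) ** 2 for b in boot) / (R - 1)   # sample variance of R estimates
    return {"expected": v_expected, "hessian": v_hessian,
            "outer_product": v_outer, "bootstrap": v_boot}
```

Square roots of these values give standard errors. When the data are less dispersed than the exponential model implies, as the GSOEP summary suggests (standard deviation .167 against a mean of .356), method 3 will tend to exceed methods 1 and 2, and the bootstrap will tend to fall below them.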
Sufficiency
If sufficient statistics exist, the MLE will be a function of them. Therefore, the MLE satisfies the Rao-Blackwell theorem (in large samples).
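For the exponential model the log likelihood is $-N\log\theta - \sum_i y_i/\theta$. The data therefore enter only through $N$ and $\sum_i y_i$, and the MLE $\hat\theta = \bar y$ is a function of that sufficient statistic. The tiny Python check below uses two made-up samples with the same $N$ and the same sum. They produce identical log likelihoods at every $\theta$ on a grid, and hence the same maximizer.

```python
import math


def exp_loglik(theta, y):
    """Exponential log likelihood: sum over i of -log(theta) - y_i/theta."""
    return sum(-math.log(theta) - v / theta for v in y)


y_a = [0.2, 0.5, 0.8]   # hypothetical samples: same N = 3, same sum = 1.5
y_b = [0.1, 0.1, 1.3]

grid = [0.1 * k for k in range(1, 31)]
same = all(math.isclose(exp_loglik(t, y_a), exp_loglik(t, y_b)) for t in grid)
mle_a = max(grid, key=lambda t: exp_loglik(t, y_a))   # grid maximizer, equals ybar here
```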
Efficiency
Cramér-Rao lower bound: the variance of a consistent, asymptotically normally distributed estimator is ≥ -1/{N E[H_i(θ)]}. The MLE achieves the C-R lower bound, so it is efficient.
Implication: for normal sampling, the mean is better than the median.
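The implication can be checked by simulation. The minimal Python sketch below uses arbitrary settings: N = 101, 5,000 replications and standard normal data. The sampling variance of the median should exceed that of the mean by a factor close to π/2 ≈ 1.57, which is the large-sample relative efficiency of the median under normal sampling.

```python
import random
import statistics


def mean_median_variances(n=101, reps=5000, seed=3):
    """Monte Carlo sampling variances of the sample mean and median for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(y) / n)
        medians.append(statistics.median(y))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = mean_median_variances()
ratio = var_median / var_mean   # should be near pi/2 for large n
```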
Invariance
The MLE of a function of $\theta$ is that function of the MLE.
In the exponential model, $f = \frac{1}{\theta}\exp\left(\frac{-y}{\theta}\right)$ and $\hat\theta = \bar y$.
If the model is instead written $f = \lambda\exp(-\lambda y)$, with $\lambda = 1/\theta$, the MLE of $\lambda$ is $1/\bar y$.

Bayesian Estimation
- Philosophical underpinnings
- How to combine information contained in the sample
“Estimation”
Assembling information:
- Prior information = out of sample: literally prior or outside information.
- Sample information is embodied in the likelihood.
- Result of the analysis: “posterior belief” = blend of prior and likelihood.
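Here is a minimal sketch of "posterior = blend of prior and likelihood". It assumes a Bernoulli probability p, a Beta(a, b) prior, and k successes in n trials, and all of these numbers are hypothetical. The posterior is computed on a grid as prior × likelihood and then normalized. Because the Beta prior is conjugate, the exact posterior mean (a + k)/(a + b + n) is available as a check. That posterior mean lies between the prior mean a/(a + b) and the sample proportion k/n.

```python
def grid_posterior_mean(a, b, k, n, points=20_000):
    """Posterior mean of p on a grid: posterior is proportional to prior(p) * likelihood(p)."""
    grid = [(j + 0.5) / points for j in range(points)]          # midpoints in (0, 1)
    prior = [p ** (a - 1) * (1 - p) ** (b - 1) for p in grid]    # Beta(a, b) kernel
    like = [p ** k * (1 - p) ** (n - k) for p in grid]           # binomial likelihood kernel
    post = [pr * lk for pr, lk in zip(prior, like)]
    total = sum(post)                                            # normalizing constant
    return sum(p * w for p, w in zip(grid, post)) / total


a, b, k, n = 2.0, 2.0, 7, 10
prior_mean = a / (a + b)               # 0.5
sample_prop = k / n                    # 0.7
post_mean = grid_posterior_mean(a, b, k, n)
exact = (a + k) / (a + b + n)          # mean of the conjugate Beta(a + k, b + n - k) posterior
```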
Using Conditional Probabilities: Bayes Theorem
Typical application: we know P(B|A), and we want P(A|B).
In drug testing:
- We know P(find evidence of drug use | usage) < 1.
- We need P(usage | find evidence of drug use).
- The problem is false positives: P(find evidence of drug use | not usage) > 0.
- This implies that P(usage | find evidence of drug use) < 1.

Bayes Theorem
Target: $P(A|B)$.
$$ \begin{aligned}
P(A|B) &= \frac{P(A,B)}{P(B)} && \text{(definition)} \\
&= \frac{P(B|A)P(A)}{P(B)} && \text{(theorem)} \\
&= \frac{P(B|A)P(A)}{P(A,B) + P(\text{not}A,B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{not}A)P(\text{not}A)} && \text{(computation)}
\end{aligned} $$

Disease Testing
Notation: + = test indicates disease, – = test indicates no disease; D = presence of disease, N = absence of disease.
Known data:
- P(Disease) = P(D) = .005 (fairly rare) (incidence)
- P(Test correctly indicates disease) = P(+|D) = .98 (sensitivity: correct detection of the disease)
- P(Test correctly indicates absence) = P(–|N) = .95 (specificity: correct failure to detect the disease)
Objectives: deduce these probabilities:
- P(D|+) (probability the disease really is present | test positive)
- P(N|–) (probability the disease really is absent | test negative)
Note: P(D|+) is the probability that a patient actually has the disease when the test says they do.
More Information
- Deduce: since P(+|D) = .98, we know P(–|D) = .02 because P(–|D) + P(+|D) = 1. [P(–|D) is the probability of a false negative.]
- Deduce: since P(–|N) = .95, we know P(+|N) = .05 because P(–|N) + P(+|N) = 1. [P(+|N) is the probability of a false positive.]
- Deduce: since P(D) = .005, we know P(N) = .995 because P(D) + P(N) = 1.

Now, Use Bayes Theorem
We have P(+|D) = .98, the probability that the test shows the disease given that it is present. What is P(D|+), the probability that the disease is present given that the test says it is?
$$ P(D|+) = \frac{P(D \text{ and } +)}{P(+)} = \frac{P(+|D)P(D)}{P(+)} \quad \text{(by Bayes theorem)} $$
$$ P(+) = P(D \text{ and } +) + P(N \text{ and } +) = P(+|D)P(D) + P(+|N)P(N), $$
so
$$ P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|N)P(N)} = \frac{.98(.005)}{.98(.005) + .05(.995)} = 0.08966 \quad \text{(Yikes!!)} $$
Using the same approach, P(N|–) = 0.999894.

Bayesian Investigation
No fixed “parameters”: θ is a random variable.
Data are realizations of random variables. There is a marginal distribution p(data).
Parameters are part of the random state of nature; p(θ) is the distribution of θ independently of (prior to) the data.
Investigation combines sample information with prior information.
The outcome is a revision of the prior based on the observed information (data).
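The diagnostic-test example is a small instance of this revision: a positive or negative test result updates the prior P(D) = .005. A minimal Python sketch of that calculation follows. The function name `posterior_disease` is hypothetical, and the inputs are the rates given in the test example.

```python
def posterior_disease(p_pos_given_d, p_neg_given_n, prior_d):
    """Return (P(D|+), P(N|-)) for a binary diagnostic test via Bayes theorem."""
    p_neg_given_d = 1.0 - p_pos_given_d  # false negative rate
    p_pos_given_n = 1.0 - p_neg_given_n  # false positive rate
    prior_n = 1.0 - prior_d
    # Total probability of each test outcome
    p_pos = p_pos_given_d * prior_d + p_pos_given_n * prior_n
    p_neg = p_neg_given_d * prior_d + p_neg_given_n * prior_n
    # Bayes theorem: joint probability divided by the marginal
    return p_pos_given_d * prior_d / p_pos, p_neg_given_n * prior_n / p_neg


p_d_pos, p_n_neg = posterior_disease(0.98, 0.95, 0.005)
print(f"P(D|+) = {p_d_pos:.5f}")  # 0.08966
print(f"P(N|-) = {p_n_neg:.6f}")  # 0.999894
```

The small P(D|+) is driven by the small prior. Most positives come from the much larger healthy group, since .05(.995) = .04975 far exceeds .98(.005) = .0049.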
Symmetrical Treatment
Likelihood is p(data|θ).
Prior distribution summarizes the nonsample information about θ in p(θ).
Joint distribution is p(data,θ):
p(data,θ) = p(data|θ)p(θ) = Likelihood × Prior.
Use Bayes theorem to get p(θ|data) = posterior distribution.

The Posterior Distribution
Sample information: L(data|θ)
Prior information: p(θ)
Joint density for θ and data: p(θ,data) = L(data|θ)p(θ)
Conditional density for θ given the data:
$$p(\theta|\text{data}) = \frac{p(\theta,\text{data})}{p(\text{data})} = \frac{L(\text{data}|\theta)\,p(\theta)}{\int_\theta L(\text{data}|\theta)\,p(\theta)\,d\theta} = \text{posterior density}$$
Information obtained from the investigation:
E[θ|data] = posterior mean = the Bayesian "estimate"
Var[θ|data] = posterior variance, used to form interval estimates
Quantiles of θ|data, such as the median, or the 2.5th and 97.5th percentiles

Priors – Where do they come from?
$$p(\theta|\text{data}) = \frac{L(\text{data}|\theta)\,p(\theta)}{\int_\theta L(\text{data}|\theta)\,p(\theta)\,d\theta}$$
What does the prior contain?
Informative priors – real prior information
Noninformative priors
Mathematical complications
Diffuse
• Uniform
• Normal with huge variance
Improper priors
Conjugate priors

Application
Consider estimation of the probability that a production process will produce a defective product. In case 1, suppose the sampling design is to choose N = 25 items from the production line and count the number of defectives. If the probability that any item is defective is a constant θ between zero and one, then the likelihood for the sample of data is $L(\theta|\text{data}) = \theta^D(1-\theta)^{25-D}$, where D is the number of defectives, say, 8.
The maximum likelihood estimator of θ will be q = D/25 = 0.32, and the asymptotic variance of the maximum likelihood estimator is estimated by q(1 − q)/25 = 0.008704.

Application: Posterior Density
The posterior density is
$$p(\theta|\text{data}) = \frac{\theta^D(1-\theta)^{N-D}\,p(\theta)}{\int_0^1 \theta^D(1-\theta)^{N-D}\,p(\theta)\,d\theta}.$$
Noninformative prior: all allowable values of θ are equally likely. (An informative conjugate prior is pursued in the homework.) This implies a uniform distribution over [0,1]. Thus, p(θ) = 1, 0 ≤ θ ≤ 1.
$$\int_0^1 \theta^D(1-\theta)^{N-D}\cdot 1\,d\theta = \frac{\Gamma(D+1)\,\Gamma(N-D+1)}{\Gamma(D+1+N-D+1)},$$
a beta integral with a = D+1 and b = N−D+1. The posterior density is
$$p(\theta|\text{data}) = \frac{\theta^D(1-\theta)^{N-D}}{\Gamma(D+1)\Gamma(N-D+1)/\Gamma(D+1+N-D+1)} = \frac{\Gamma(N+2)}{\Gamma(D+1)\,\Gamma(N-D+1)}\,\theta^D(1-\theta)^{N-D}.$$

Posterior Moments
Posterior density with uniform noninformative prior (the data are N and D):
$$p(\theta|\text{data}) = \frac{\Gamma(N+2)}{\Gamma(D+1)\,\Gamma(N-D+1)}\,\theta^D(1-\theta)^{N-D}$$
Posterior mean:
$$E[\theta|\text{data}] = \int_0^1 \theta\,\frac{\Gamma(N+2)}{\Gamma(D+1)\,\Gamma(N-D+1)}\,\theta^D(1-\theta)^{N-D}\,d\theta$$
This is a beta integral. The posterior is a beta density with α = D+1, β = N−D+1. The mean of a beta variable is α/(α+β) = (D+1)/(N+2) = 9/27 = .3333. This is the posterior mean. The prior mean was .5000. The MLE was 8/25 = .3200.
The posterior variance is
$$\text{Var}[\theta|\text{data}] = \frac{(D+1)(N-D+1)}{(N+3)(N+2)^2} = 0.007936.$$
The prior variance is 1/12 = .08333 and the variance of the MLE is .008704.

Mixing Prior and Sample Information
A typical result (exact for sampling from the normal distribution with known variance):
Posterior mean = w × Prior mean + (1 − w) × MLE = w(Prior mean − MLE) + MLE
$$w = \frac{\text{Posterior mean} - \text{MLE}}{\text{Prior mean} - \text{MLE}} = \frac{.3333 - .32}{.5 - .32} = .073889$$
Approximate result:
$$\text{Posterior mean} \approx \frac{\left(\frac{1}{\text{Prior variance}}\right)\text{Prior mean} + \left(\frac{1}{\text{Asymptotic variance}}\right)\text{MLE}}{\frac{1}{\text{Prior variance}} + \frac{1}{\text{Asymptotic variance}}}$$
$$w = \frac{1/\text{Prior variance}}{1/\text{Prior variance} + 1/\text{Asymptotic variance}} = \frac{1/(1/12)}{1/(1/12) + 1/(.008704)} = .09457$$
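The arithmetic of the defect example can be reproduced in a few lines. This Python sketch recomputes the MLE, its variance, the Beta(D+1, N−D+1) posterior moments under the uniform prior, and both versions of the mixing weight w. The variable names are illustrative only.

```python
N, D = 25, 8  # items sampled and defectives observed in the example

# Maximum likelihood estimate and its estimated asymptotic variance
q = D / N
var_mle = q * (1 - q) / N

# A uniform prior on [0, 1] is Beta(1, 1); the posterior is Beta(D + 1, N - D + 1)
a, b = D + 1, N - D + 1
post_mean = a / (a + b)
post_var = a * b / ((a + b) ** 2 * (a + b + 1))

# Moments of the uniform prior
prior_mean, prior_var = 0.5, 1 / 12

# Weight implied exactly by: posterior mean = w * prior mean + (1 - w) * MLE
w_exact = (post_mean - q) / (prior_mean - q)
# Precision-weighted (approximate) version of the same weight
w_approx = (1 / prior_var) / (1 / prior_var + 1 / var_mle)

print(f"MLE = {q:.4f}, estimated Var(MLE) = {var_mle:.6f}")
print(f"posterior mean = {post_mean:.4f}, posterior variance = {post_var:.6f}")
print(f"w (exact) = {w_exact:.6f}, w (approximate) = {w_approx:.5f}")
```

With the unrounded posterior mean 1/3, the exact weight is about .0741. The .073889 figure comes from using the rounded value .3333.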
Modern Bayesian Analysis
Posterior mean = $\int_\theta \theta\,p(\theta|\text{data})\,d\theta$
The integral is often complicated, or has no closed form.
Alternative strategy: Draw a random sample from the posterior distribution and examine moments, quantiles, etc.
Example: Our posterior is Beta(9,18).
Based on a random sample of 5,000 draws from this population:
Bayesian Estimate of Theta
Observations = 5000
Mean = .334017 (Posterior mean was .333333)
Standard Deviation = .086336
Posterior Variance = .007936
Sample variance = .007454
Skewness = .248077
Kurtosis-3 (excess) = -.161478
Minimum = .066214
Maximum = .653625
.025 Percentile = .177090
.975 Percentile = .510028
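The simulation strategy can be sketched with Python's standard library, assuming `random.betavariate` as the Beta(9,18) sampler and `statistics.quantiles` for the percentiles. The seed is arbitrary, so the figures will differ slightly from the 5,000-draw results above. They should still land close to the exact posterior mean 1/3 and variance .007936.

```python
import random
import statistics

random.seed(12345)  # arbitrary seed, chosen only to make the run reproducible
draws = [random.betavariate(9, 18) for _ in range(5000)]

cuts = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
summary = {
    "mean": statistics.mean(draws),
    "std_dev": statistics.stdev(draws),
    "variance": statistics.variance(draws),
    "min": min(draws),
    "max": max(draws),
    "p025": cuts[0],   # .025 percentile
    "p975": cuts[-1],  # .975 percentile
}
for name, value in summary.items():
    print(f"{name:>9} = {value:.6f}")
```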
Multiple parameter settings
Derivation of the exact form of expectations and variances for p(θ1,θ2,…,θK|data) is hopelessly complicated even if the density is tractable.
Strategy: Sample joint observations (θ1,θ2,…,θK) from the posterior population and use marginal means, variances, quantiles, etc.
How to sample the joint observations??? (Still hopelessly complicated.)

Magic: The Gibbs Sampler
Objective: Sample joint observations on θ1,θ2,…,θK from p(θ1,θ2,…,θK|data). (Let K = 3.)
Strategy: Gibbs sampling. Derive
p(θ1|θ2,θ3,data)
p(θ2|θ1,θ3,data)
p(θ3|θ1,θ2,data)
Gibbs cycles produce joint observations:
0. Start θ1,θ2,θ3 at some reasonable values.
1. Sample a draw from p(θ1|θ2,θ3,data) using the draws of θ2,θ3 in hand.
2. Sample a draw from p(θ2|θ1,θ3,data) using the draw at step 1 for θ1.
3. Sample a draw from p(θ3|θ1,θ2,data) using the draws at steps 1 and 2.
4. Return to step 1.
After a burn-in period (a few thousand cycles), start collecting the draws. The set of draws ultimately gives a sample from the joint distribution.
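The cycle above can be sketched in Python for a toy target whose full conditionals are known in closed form. The target is an assumption made for illustration: a trivariate standard normal "posterior" with common correlation rho (K = 3). For this target, each θk given the other two is normal with mean rho/(1+rho) times the sum of the other two and variance 1 − 2rho²/(1+rho). The function name, seed, and burn-in length are hypothetical choices.

```python
import random
import statistics


def gibbs_trivariate_normal(rho, n_keep, burn_in, seed=0):
    """Gibbs sampler for a toy target: trivariate standard normal with common correlation rho.

    Full conditional of each theta_k given the other two:
    N(rho / (1 + rho) * (sum of other two), 1 - 2 * rho**2 / (1 + rho)).
    Requires -0.5 < rho < 1 so the covariance matrix is positive definite.
    """
    rng = random.Random(seed)
    coef = rho / (1.0 + rho)
    cond_sd = (1.0 - 2.0 * rho ** 2 / (1.0 + rho)) ** 0.5
    theta = [0.0, 0.0, 0.0]  # step 0: start at some reasonable values
    kept = []
    for cycle in range(burn_in + n_keep):  # one Gibbs cycle per pass; looping back is step 4
        for k in range(3):  # steps 1-3: draw theta_1, theta_2, theta_3 in turn
            others = sum(theta) - theta[k]  # uses the most recent draws of the other two
            theta[k] = rng.gauss(coef * others, cond_sd)
        if cycle >= burn_in:  # discard burn-in cycles, then collect
            kept.append(tuple(theta))
    return kept


def corr(xs, ys):
    """Sample correlation of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_trivariate_normal(rho=0.5, n_keep=20000, burn_in=2000)
theta1 = [d[0] for d in draws]
theta2 = [d[1] for d in draws]
print("marginal mean of theta1:", round(statistics.mean(theta1), 3))          # target 0
print("marginal variance of theta1:", round(statistics.variance(theta1), 3))  # target 1
print("corr(theta1, theta2):", round(corr(theta1, theta2), 3))                # target 0.5
```

In a real application the three conditionals come from the model's likelihood and prior rather than from a known joint normal. Checking that the collected draws reproduce a known target, as here, is a simple way to test a sampler.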
Methodological Issues
Priors: Schizophrenia?
Uninformative priors are disingenuous.
Informative priors are not objective.
Using existing information?
Bernstein-von Mises and likelihood estimation:
In large samples, the likelihood dominates.
The posterior mean will be essentially the same as the MLE; the difference vanishes as the sample grows.
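The large-sample claim can be illustrated with the defect example's uniform-prior posterior mean, (D+1)/(N+2). The sketch assumes, hypothetically, that the sample defect rate stays at 0.32 as N grows. The gap between the posterior mean and the MLE then works out to (1 − 2(0.32))/(N+2) = 0.36/(N+2), which shrinks toward zero.

```python
def posterior_mean_uniform_prior(d, n):
    """Posterior mean of theta under a uniform prior: the mean of Beta(d + 1, n - d + 1)."""
    return (d + 1) / (n + 2)


rows = []
for n in (25, 250, 2500, 25000):
    d = round(0.32 * n)  # hypothetical: hold the sample defect rate at the example's 0.32
    mle = d / n
    pm = posterior_mean_uniform_prior(d, n)
    rows.append((n, mle, pm, pm - mle))
    print(f"N = {n:>6}  MLE = {mle:.5f}  posterior mean = {pm:.5f}  gap = {pm - mle:.6f}")
```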