MathStat

Statistical Inference
and Regression
Analysis: GB.3302.30
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Inference and Regression
Part 9 – Linear Model Topics
Agenda
Variable Selection – Stepwise Regression
Partial Regression – The Meaning of Multiple Regression
Panel Data
Test of Regression Stability
Generalized Regression
  - Robust inference for OLS regression
  - Heteroscedasticity and weighted least squares
  - Autocorrelation and generalized least squares
Stepwise Regression
Start with (a) no model, or (b) the specific variables that are designated to be forced into whatever model is ultimately chosen.
(A: Forward step) Add a variable: “Significant?” Include the most “significant” variable not already included.
(B: Backward step) Are variables already included in the equation now adversely affected by collinearity? If any variables become “insignificant,” remove the least significant variable.
Return to (A).
This can cycle back and forth for a while. Usually it does not.
Ultimately, the procedure selects only variables that appear to be “significant.”
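The forward and backward steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine any statistics package uses: the function names `ols_pvalues` and `stepwise` are hypothetical, the data are simulated, and the p-values use a normal approximation to the t distribution to keep the example dependent on NumPy alone.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS coefficients and two-sided p-values (normal approximation to t)."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)                      # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    t = b / se
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return b, p

def stepwise(X, y, forced=(), alpha=0.10, max_steps=100):
    """Return the column indices kept by forward/backward stepwise selection."""
    n = X.shape[0]
    const = np.ones((n, 1))
    selected = list(forced)                           # variables forced into the model
    for _ in range(max_steps):
        changed = False
        # (A) Forward step: add the most significant excluded variable, if p < alpha.
        best_j, best_p = None, alpha
        for j in range(X.shape[1]):
            if j in selected:
                continue
            _, p = ols_pvalues(np.hstack([const, X[:, selected + [j]]]), y)
            if p[-1] < best_p:
                best_j, best_p = j, p[-1]
        if best_j is not None:
            selected.append(best_j)
            changed = True
        # (B) Backward step: drop the least significant included variable, if p > alpha.
        removable = [j for j in selected if j not in forced]
        if removable:
            _, p = ols_pvalues(np.hstack([const, X[:, selected]]), y)
            pvals = {j: p[1 + selected.index(j)] for j in removable}
            worst = max(pvals, key=pvals.get)
            if pvals[worst] > alpha:
                selected.remove(worst)
                changed = True
        if not changed:                               # nothing added or removed: stop
            break
    return sorted(selected)

# Simulated data: 10 candidate predictors, only columns 0 and 3 matter.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 10))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(size=n)
chosen = stepwise(X, y, alpha=0.10)
print("selected columns:", chosen)
```

The two relevant columns should be selected; with a 0.10 cutoff, one or more pure-noise columns may also slip in.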
Stepwise Regression Feature
Specify Procedure
All 10 predictors
Subset of predictors that must appear in the final model chosen (optional)
No need to change Methods or Options. I changed the P value for inclusion to .10.
Used 0.10 as the cutoff “p-value” for inclusion or removal. All P values will be less than or equal to .10.
Stepwise Regression
What’s Right with It?
Automatic: push button.
Simple to use. Not much thinking involved.
Relates in some way to the connection of the variables to each other (significance), not just $R^2$.
What’s Wrong with It?
No reason to assume that the resulting model will make any sense.
Test statistics are completely invalid and cannot be used for statistical inference. (They cannot be t ratios if you know in advance that they will be larger than 2.)
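The point that test statistics from a selected model are invalid can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are pure noise, the helper `max_abs_t` is hypothetical, and 1.645 is used as an approximate two-sided 10% critical value. Screening 10 noise predictors and keeping the best one produces a "significant" t ratio far more often than 10% of the time.

```python
import numpy as np

def max_abs_t(rng, n=50, k=10):
    """Largest |t| among k simple regressions of noise y on noise x_j."""
    y = rng.normal(size=n)
    X = rng.normal(size=(n, k))
    best = 0.0
    for j in range(k):
        Z = np.column_stack([np.ones(n), X[:, j]])   # constant and one predictor
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        e = y - Z @ b
        s2 = e @ e / (n - 2)
        se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
        best = max(best, abs(b[1] / se))
    return best

rng = np.random.default_rng(1)
reps = 500
# Share of noise-only samples in which the selected predictor looks significant.
share = np.mean([max_abs_t(rng) > 1.645 for _ in range(reps)])
print(f"share with a 'significant' selected predictor: {share:.2f}")
```

If the t ratio of the selected variable were a valid test statistic, this share would be near 0.10.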
Multiple Regression
The Frisch-Waugh Theorem
Partialing out the effect of a variable
U.S. Gasoline Market, 1953-2004
Multiple Regression of logG on logPG and logY
Two Side Regressions
Regress logG on a constant and logY and compute residuals RESLOGG.
Regress logPg on a constant and logY and compute residuals RESLOGPG.
Interesting Plots
Original regression of logG on a constant and logPg. The line slopes the wrong way.
New regression of ReslogG on a constant and ReslogPg. The line slopes the right way.
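The two side regressions and the sign reversal can be reproduced with simulated data. The sketch below does not use the actual gasoline series: the arrays `log_y`, `log_pg`, and `log_g` are made-up stand-ins with an income trend, a price that moves with income, and a true price elasticity of -0.3, all of which are assumptions chosen for illustration.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

rng = np.random.default_rng(0)
n = 52                                                # one observation per year
log_y = np.linspace(0.0, 2.0, n) + rng.normal(scale=0.02, size=n)
log_pg = 0.8 * log_y + rng.normal(scale=0.10, size=n)
log_g = 1.0 * log_y - 0.3 * log_pg + rng.normal(scale=0.02, size=n)

ones = np.ones(n)

# Simple regression of logG on a constant and logPg: slope has the "wrong" sign.
simple_slope = ols(np.column_stack([ones, log_pg]), log_g)[1]

# Multiple regression of logG on a constant, logPg and logY.
b_multiple = ols(np.column_stack([ones, log_pg, log_y]), log_g)

# Side regressions on a constant and logY, keeping the residuals.
Z = np.column_stack([ones, log_y])
res_log_g = log_g - Z @ ols(Z, log_g)
res_log_pg = log_pg - Z @ ols(Z, log_pg)

# Regression of residuals on residuals.
partial_slope = ols(np.column_stack([ones, res_log_pg]), res_log_g)[1]

print("simple slope on logPg:        ", simple_slope)
print("multiple regression on logPg: ", b_multiple[1])
print("residual-on-residual slope:   ", partial_slope)
```

The last two numbers agree, while the simple slope has the opposite sign because it also picks up the income trend.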
Regression of Residuals on Residuals
NLOGIT
Minitab
Use CALC to compute logg=loge(g), logpg=loge(pg), logy=loge(pcincome).
Regression of logg on logpg and logy. To save residuals, use Storage as
above. Residuals are saved as RESI1 and RESI2 in data area.
Frisch-Waugh (1933) Theorem
Context: Model contains two sets of variables:
$X = [\,[1, \text{time}] \mid [\text{other variables}]\,] = [X_1 \;\; X_2]$
Regression model:
$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ (population)
$y = X_1 b_1 + X_2 b_2 + e$ (sample)
Problem: Algebraic expression for the second set of least squares coefficients, $b_2$
Frisch-Waugh Result
“We get the same result whether we (1) detrend the other
variables by using the residuals from a regression of
them on a constant and a time trend and use the
detrended data in the regression or (2) just include a
constant and a time trend in the regression and not
detrend the data”
“Detrend the data” means compute the residuals from the
regressions of the variables on a constant and a time
trend.
Partitioned Solution
Method of solution (Why did F&W care? In 1933, matrix computation was not trivial!)
Direct manipulation of the normal equations produces
$(X'X)b = X'y$
$X = [X_1, X_2]$, so
$X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}$ and $X'y = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}$
$(X'X)b = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}$
$X_1'X_1 b_1 + X_1'X_2 b_2 = X_1'y$
$X_2'X_1 b_1 + X_2'X_2 b_2 = X_2'y \;\Longrightarrow\; X_2'X_2 b_2 = X_2'y - X_2'X_1 b_1 = X_2'(y - X_1 b_1)$
Partitioned Solution
Direct manipulation of the normal equations produces
$b_2 = (X_2'X_2)^{-1}X_2'(y - X_1 b_1)$
What is this? A regression of $(y - X_1 b_1)$ on $X_2$.
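The partitioned expression for b2 can be checked numerically. The sketch below uses simulated data with X1 = [constant, time] and two other variables in X2; every name and number in it is an assumption for illustration. It takes b1 from the full least squares fit and confirms that regressing y - X1 b1 on X2 returns the same b2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
time = np.arange(n, dtype=float)

X1 = np.column_stack([np.ones(n), time])              # constant and time trend
X2 = rng.normal(size=(n, 2)) + 0.01 * time[:, None]   # two trending variables
y = X1 @ np.array([1.0, 0.05]) + X2 @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Full least squares solution from the normal equations (X'X) b = X'y.
X = np.hstack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ y)
b1, b2 = b[:2], b[2:]

# Second block of the normal equations: X2'X2 b2 = X2'(y - X1 b1).
b2_partitioned = np.linalg.solve(X2.T @ X2, X2.T @ (y - X1 @ b1))

print("b2 from the full regression:  ", b2)
print("b2 from the partitioned form: ", b2_partitioned)
```

The check is circular in one sense, since b1 comes from the full fit; it verifies the algebra rather than offering a shortcut.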
The Frisch-Waugh Result
Continuing the algebraic manipulation, the solution for $b_2$ is
$b_2 = [(X_2'M_1)(M_1X_2)]^{-1}[(X_2'M_1)(M_1y)]$
where $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$, and $M_1X_2$ and $M_1y$ are the residuals from regressions of $X_2$ and $y$ on $X_1$. Because $M_1$ is symmetric and idempotent, this is the same as $b_2 = (X_2'M_1X_2)^{-1}X_2'M_1y$.
This is Frisch and Waugh’s famous result, the “double residual regression.”
How do we interpret this? A regression of residuals on residuals.
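The matrix form of the result can be verified directly by building M1. The following is a sketch with simulated data, where X1 holds a constant and a time trend; the variable names and coefficients are assumptions. It checks that M1 is symmetric and idempotent, and that the residual-on-residual regression reproduces the b2 from the full regression.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
t = np.arange(n, dtype=float)

X1 = np.column_stack([np.ones(n), t])                 # constant and time trend
X2 = rng.normal(size=(n, 2)) + 0.05 * t[:, None]      # trending regressors
y = 1.0 + 0.1 * t + X2 @ np.array([0.5, -2.0]) + rng.normal(size=n)

# Residual maker: M1 = I - X1 (X1'X1)^{-1} X1'
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

X2_star = M1 @ X2                                     # detrended X2
y_star = M1 @ y                                       # detrended y

# Double residual regression: b2 = [(M1 X2)'(M1 X2)]^{-1} (M1 X2)'(M1 y)
b2_fw = np.linalg.solve(X2_star.T @ X2_star, X2_star.T @ y_star)

# Full regression on [X1, X2] for comparison.
b_full = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]

print("b2 from detrended data:   ", b2_fw)
print("b2 with trend in the model:", b_full[2:])
```

Forming the n-by-n matrix M1 is only practical for small samples; in applied work the residuals would come from the side regressions instead.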
Partial Regression
Important terms in this context:
Partialing out the effect of X1.
Netting out the effect …
“Partial regression coefficients.”
To continue belaboring the point: Note the interpretation of partial regression as “net of the effect of …”
Now, follow this through for the case in which X1 is just a constant term, a column of ones.
What are the residuals in a regression on a constant? What is M1? (The residuals are deviations from the sample mean, and $M_1 = I - \frac{1}{n}ii'$, where $i$ is the column of ones.)
Note that this produces the result that we can do linear regression on data in mean deviation form.
“Partial regression coefficients” are the same as “multiple regression coefficients.” It follows from the Frisch-Waugh theorem.
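The special case in which X1 is only a column of ones can be checked in a few lines. This sketch uses simulated data and made-up names; it shows that applying the residual maker for a constant is the same as subtracting the mean, and that the slopes from mean-deviation data match the slopes from a regression that includes a constant.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x = rng.normal(loc=5.0, size=(n, 2))
y = 3.0 + x @ np.array([1.5, -0.5]) + rng.normal(size=n)

i = np.ones((n, 1))
M0 = np.eye(n) - (i @ i.T) / n                        # residual maker for a constant

# Regression with a constant term included.
with_const = np.linalg.lstsq(np.hstack([i, x]), y, rcond=None)[0]

# Regression on data in mean deviation form, with no constant.
x_dev = x - x.mean(axis=0)
y_dev = y - y.mean()
dev_slopes = np.linalg.lstsq(x_dev, y_dev, rcond=None)[0]

print("slopes with a constant:      ", with_const[1:])
print("slopes from mean deviations: ", dev_slopes)
```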
Understanding Multiple Regression
In a multiple regression, the coefficient on an x is interpreted to give the effect of a change in x on the change in y, holding everything else constant.
That is, “net of the effect of everything else.”
How can this be interpreted in $y = a + b_1\,\text{Educ} + b_2\,\text{Age} + e$?
Each year of education means aging by 1 year.
How is it possible to hold age constant and increase education by 1 year?
Application – Health and Income
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. There are altogether 27,326 observations. The number of observations ranges from 1 to 7 per family. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
Variables in the file are
DOCVIS = number of visits to the doctor in the observation period (the dependent variable of interest)
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income=0 were dropped)
HHKIDS = 1 if there are children under age 16 in the household; otherwise = 0
EDUC = years of schooling
AGE = age in years
27/95
Multiple Regression
Inference and Regression
Panel Data
29/95
30/95
31/95
THE Application of Frisch-Waugh
The Fixed Effects Model
A regression model with a dummy variable for each individual in the sample, each observed $T_i$ times.
\[ y_i = X_i\beta + d_i\alpha_i + \varepsilon_i, \quad \text{for each individual} \]
\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
=
\begin{pmatrix}
X_1 & d_1 & 0 & \cdots & 0 \\
X_2 & 0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_N & 0 & 0 & \cdots & d_N
\end{pmatrix}
\begin{pmatrix} \beta \\ \alpha \end{pmatrix}
+ \varepsilon
\]
The dummy variable block has N columns. N may be thousands. I.e., the regression has thousands of variables (coefficients).
32/95
Application – Health and Income
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods (the data set described above: DOCVIS is the dependent variable; HHNINC, HHKIDS, EDUC, and AGE are the regressors).
We desire also to include a separate family effect (7293 of them) for each family.
This requires 7293 dummy variables in addition to the four regressors.
33/95
‘Within’ Transformations
\[ X'M_DX = \sum_{i=1}^{N} X_i'M_D^iX_i, \qquad \left(X_i'M_D^iX_i\right)_{k,l} = \sum_{t=1}^{T_i}(x_{it,k}-\bar{x}_{i.,k})(x_{it,l}-\bar{x}_{i.,l}) \]
\[ X'M_Dy = \sum_{i=1}^{N} X_i'M_D^iy_i, \qquad \left(X_i'M_D^iy_i\right)_{k} = \sum_{t=1}^{T_i}(x_{it,k}-\bar{x}_{i.,k})(y_{it}-\bar{y}_{i.}) \]
34/95
Least Squares Dummy Variable Estimator
b is obtained by ‘within’ groups least squares (group mean deviations)
Normal equations for a are D’Xb + D’Da = D’y
\[ a = (D'D)^{-1}D'(y - Xb) \]
\[ a_i = (1/T_i)\sum_{t=1}^{T_i}(y_{it} - x_{it}'b) = \bar{e}_i \]
35/95
Estimating the Fixed Effects Model
The FEM is a linear regression model but with many independent variables
\[ \begin{pmatrix} b \\ a \end{pmatrix} = \begin{pmatrix} X'X & X'D \\ D'X & D'D \end{pmatrix}^{-1} \begin{pmatrix} X'y \\ D'y \end{pmatrix} \]
Using the Frisch-Waugh theorem
\[ b = [X'M_DX]^{-1}\,X'M_Dy \]
36/95
Fixed Effects Estimator (cont.)
\[
M_D = \begin{pmatrix}
M_D^1 & 0 & \cdots & 0 \\
0 & M_D^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & M_D^N
\end{pmatrix}
\quad \text{(The dummy variables are orthogonal)}
\]
\[
M_D^i = I_{T_i} - d_i(d_i'd_i)^{-1}d_i' = I_{T_i} - (1/T_i)\,d_id_i'
= \begin{pmatrix}
1-\frac{1}{T_i} & -\frac{1}{T_i} & \cdots & -\frac{1}{T_i} \\
-\frac{1}{T_i} & 1-\frac{1}{T_i} & \cdots & -\frac{1}{T_i} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{T_i} & -\frac{1}{T_i} & \cdots & 1-\frac{1}{T_i}
\end{pmatrix}
\]
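The matrix I - (1/T_i) d d' for one group simply converts a vector into deviations from its mean. A small sketch, with T = 4 and a made-up data vector chosen only for illustration:

```python
import numpy as np

T = 4
d = np.ones((T, 1))              # dummy variable column for one group: all ones
M = np.eye(T) - (d @ d.T) / T    # I - (1/T) d d'

y = np.array([3.0, 5.0, 4.0, 8.0])  # hypothetical observations for this group
print(M @ y)                     # deviations of y from its group mean
```

Because M annihilates the column of ones (M d = 0) and is idempotent (M M = M), applying it group by group removes every individual effect from the regression.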
Inference and Regression
Chow Test
38/95
Equal Regressions
• Setting: Two groups of observations (men/women, countries, two different periods, firms, etc.)
• Regression Model: y = α + β1x1 + β2x2 + … + ε
• Hypothesis: The same model applies to both groups
• Rejection region: Large values of F
39/95
Application
Health satisfaction depends on many factors: Age, Income, Children, Education, Marital Status
Do these factors figure differently in a model for women compared to men?
Investigation: Multiple regression
Hypothesis: The regressions are the same.
Rejection Region: Estimated regressions that are very different.
40/95
Test of Structural Stability
Two groups, cleverly labeled Group 1 and Group 2.
Regression model applies to the two groups: $y_j = X_j\beta_j + \varepsilon_j$
Null hypothesis: $\beta_1 = \beta_2$
Test using an F statistic.
41/95
Testing Strategy: Setup
Fit separate regressions for the two groups.
Separate coefficient vectors b1 and b2
Each coefficient vector is $b_j = (X_j'X_j)^{-1}X_j'y_j$
Sums of squares $e_1'e_1 = (y_1 - X_1b_1)'(y_1 - X_1b_1)$ and $e_2'e_2 = (y_2 - X_2b_2)'(y_2 - X_2b_2)$
Total separate sum of squares = $SS_{12} = e_1'e_1 + e_2'e_2$
Pooled regression
\[ b = (X_1'X_1 + X_2'X_2)^{-1}(X_1'y_1 + X_2'y_2) \]
Pooled sum of squares
\[ SS_{pooled} = e'e = (y_1 - X_1b)'(y_1 - X_1b) + (y_2 - X_2b)'(y_2 - X_2b) \]
$SS_{pooled}$ must be at least as large as $SS_{12}$, and is strictly larger unless b1 = b2.
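The setup above can be traced in a few lines. This sketch uses two small simulated groups (sample sizes and coefficients are illustrative assumptions) to confirm that the pooled formula gives the same b as least squares on the stacked data, and that the pooled sum of squares is not smaller than the separate total.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical groups with different true coefficients.
X1 = np.column_stack([np.ones(30), rng.normal(size=30)])
X2 = np.column_stack([np.ones(40), rng.normal(size=40)])
y1 = X1 @ np.array([1.0, 2.0]) + rng.normal(size=30)
y2 = X2 @ np.array([0.0, 3.0]) + rng.normal(size=40)

# Separate regressions: b_j = (X_j'X_j)^{-1} X_j'y_j.
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)

# Pooled regression: b = (X1'X1 + X2'X2)^{-1} (X1'y1 + X2'y2).
b = np.linalg.solve(X1.T @ X1 + X2.T @ X2, X1.T @ y1 + X2.T @ y2)

def ss(X, y, coef):
    """Sum of squared residuals of y around X @ coef."""
    e = y - X @ coef
    return e @ e

ss12 = ss(X1, y1, b1) + ss(X2, y2, b2)      # total separate sum of squares
ss_pooled = ss(X1, y1, b) + ss(X2, y2, b)   # pooled sum of squares

# Same pooled coefficients obtained by stacking the two groups.
b_stacked = np.linalg.lstsq(np.vstack([X1, X2]), np.concatenate([y1, y2]), rcond=None)[0]

print(b, b_stacked)
print(ss12, ss_pooled)
```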
42/95
Testing Strategy
Rejection Regions
(1) b1 is very different from b2
(2) SSpooled is much larger than SS12
These are the same.
\[ F[K+1,\ N_1+N_2-2K-2] = \frac{(SS_{pooled} - SS_{12})/(K+1)}{SS_{12}/(N_1+N_2-2K-2)} \]
43/95
Procedure: Equal Regressions
There are N1 observations in Group 1 and N2 in Group 2.
There are K variables and the constant term in the model.
This test requires you to compute three regressions and retain the sum of squared residuals from each:
SS1 = sum of squares from N1 observations in group 1
SS2 = sum of squares from N2 observations in group 2
SSALL = sum of squares from NALL=N1+N2 observations when the two groups are pooled.
\[ F = \frac{(SS_{ALL} - SS_1 - SS_2)/(K+1)}{(SS_1 + SS_2)/(N_1+N_2-2K-2)} \]
The hypothesis of equal regressions is rejected if F is larger than the critical value from the F table (K+1 numerator and NALL-2K-2 denominator degrees of freedom)
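The three-regression procedure can be wrapped in a small function. This is a sketch under stated assumptions, not the course's software: the helper names ssr, chow_f, and make_group are hypothetical, the data are simulated, and each X matrix is assumed to already contain the constant column, so its column count equals K+1.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from least squares of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def chow_f(X1, y1, X2, y2):
    """F statistic for equal regressions, with its degrees of freedom."""
    n1, k1 = X1.shape            # k1 = K + 1 (K variables plus the constant)
    n2 = X2.shape[0]
    ss1, ss2 = ssr(X1, y1), ssr(X2, y2)
    ss_all = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    df_num, df_den = k1, n1 + n2 - 2 * k1
    f = ((ss_all - ss1 - ss2) / df_num) / ((ss1 + ss2) / df_den)
    return f, df_num, df_den

rng = np.random.default_rng(2)

def make_group(n, beta):
    """Simulate one group: constant plus two regressors, unit-variance noise."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    return X, X @ beta + rng.normal(size=n)

X1, y1 = make_group(200, np.array([1.0, 0.5, -0.3]))
X2, y2 = make_group(250, np.array([1.0, 0.5, -0.3]))   # same coefficients as group 1
X3, y3 = make_group(250, np.array([2.0, 1.5, -0.3]))   # different coefficients

f_same, df_num, df_den = chow_f(X1, y1, X2, y2)
f_diff, _, _ = chow_f(X1, y1, X3, y3)
print(f_same, f_diff, df_num, df_den)
```

When the two groups share the same coefficients the statistic stays small; when they differ it grows, which is the "large values of F" rejection region.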
44/95
Health Satisfaction Models: Men vs. Women
German survey data over 7 years, 1984 to 1991 (with a gap). 27,326 observations on Health Satisfaction and several covariates.
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |   T    |P value | Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Women===|=[NW = 13083]================================================
Constant|    7.05393353        .16608124   42.473    .0000  1.0000000
AGE     |    -.03902304        .00205786  -18.963    .0000 44.4759612
EDUC    |     .09171404        .01004869    9.127    .0000 10.8763811
HHNINC  |     .57391631        .11685639    4.911    .0000  .34449514
HHKIDS  |     .12048802        .04732176    2.546    .0109  .39157686
MARRIED |     .09769266        .04961634    1.969    .0490  .75150959
Men=====|=[NM = 14243]================================================
Constant|    7.75524549        .12282189   63.142    .0000  1.0000000
AGE     |    -.04825978        .00186912  -25.820    .0000 42.6528119
EDUC    |     .07298478        .00785826    9.288    .0000 11.7286996
HHNINC  |     .73218094        .11046623    6.628    .0000  .35905406
HHKIDS  |     .14868970        .04313251    3.447    .0006  .41297479
MARRIED |     .06171039        .05134870    1.202    .2294  .76514779
Both====|=[NALL = 27326]==============================================
Constant|    7.43623310        .09821909   75.711    .0000  1.0000000
AGE     |    -.04440130        .00134963  -32.899    .0000 43.5256898
EDUC    |     .08405505        .00609020   13.802    .0000 11.3206310
HHNINC  |     .64217661        .08004124    8.023    .0000  .35208362
HHKIDS  |     .12315329        .03153428    3.905    .0001  .40273000
MARRIED |     .07220008        .03511670    2.056    .0398  .75861817
45/95
Computing the F Statistic
+--------------------------------------------------------------------------------+
|                                           Women           Men           All    |
| LHS=HEALTH Mean                =       6.634172      6.924362      6.785662    |
|            Standard deviation  =       2.329513      2.251479      2.293725    |
|            Number of observs.  =          13083         14243         27326    |
| Model size Parameters          =              6             6             6    |
|            Degrees of freedom  =          13077         14237         27320    |
| Residuals  Sum of squares      =       66677.66      66705.75      133585.3    |
|            Standard error of e =       2.258063      2.164574      2.211256    |
| Fit        R-squared           =       0.060762      0.076033       .070786    |
| Model test F (P value)         =   169.20(.000)  234.31(.000)416.24 (.0000)    |
+--------------------------------------------------------------------------------+
\[ F = \frac{[133{,}585.3 - (66{,}677.66 + 66{,}705.75)]\,/\,6}{(66{,}677.66 + 66{,}705.75)\,/\,(27{,}326 - 6 - 6)} = 6.8904 \]
The critical value for F[6, 27314] is 2.0989.
Even though the regressions look similar, the hypothesis of equal regressions is rejected.
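The arithmetic can be reproduced directly from the sums of squares and sample sizes reported in the table. The variable names below are illustrative; the critical value is the one quoted on the slide, not recomputed here.

```python
# Values taken from the regression summary table above.
ss_women, ss_men, ss_all = 66677.66, 66705.75, 133585.3
n_women, n_men = 13083, 14243
k_plus_1 = 6                      # five regressors plus the constant

num_df = k_plus_1
den_df = n_women + n_men - 2 * k_plus_1     # 27,314

f_stat = ((ss_all - ss_women - ss_men) / num_df) / ((ss_women + ss_men) / den_df)

critical_value = 2.0989           # as quoted on the slide
print(round(f_stat, 4), f_stat > critical_value)
```

The pooled sum of squares exceeds the separate total by only about 202, yet with more than 27,000 denominator degrees of freedom that difference is large enough to reject equality.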
Inference and Regression
Generalized Regression
47/95
Generalized Regression
“General” in that the main assumptions of the regression model are not met.
• Heteroscedasticity
• Serial correlation (autocorrelation)
48/95
Conditional Homoscedasticity and
Nonautocorrelation
Disturbances provide no information about each other.
• $\mathrm{Var}[\varepsilon_i \mid X] = \sigma^2$
• $\mathrm{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0$

$$
\mathrm{Var}[\varepsilon \mid X] =
\begin{bmatrix}
\mathrm{Var}(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \mathrm{Cov}(\varepsilon_1,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_2,\varepsilon_1) & \mathrm{Var}(\varepsilon_2) & \mathrm{Cov}(\varepsilon_2,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_3,\varepsilon_1) & \mathrm{Cov}(\varepsilon_3,\varepsilon_2) & \mathrm{Var}(\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_3,\varepsilon_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\varepsilon_N,\varepsilon_1) & \mathrm{Cov}(\varepsilon_N,\varepsilon_2) & \mathrm{Cov}(\varepsilon_N,\varepsilon_3) & \cdots & \mathrm{Var}(\varepsilon_N)
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 I
$$
49/95
Data on 18 OECD countries, 19 years, gasoline consumption
$\log(\text{Gas}/\text{Car}) = a + b_1 \log(\text{Income}) + b_2 \log P + b_3 \log(\text{Cars}/\text{Capita}) + e$
50/95
Heteroscedasticity
Countries are ordered by the standard deviation of their 19 residuals.
Regression of log of per capita gasoline use on log of per capita income,
gasoline price and number of cars per capita for 18 OECD countries for 19
years. The standard deviation varies by country.
51/95
Heteroscedasticity: Regression results for 6 (of 18) countries
52/95
Autocorrelation
$\log G = \beta_1 + \beta_2 \log P_g + \beta_3 \log Y + \beta_4 \log P_{nc} + \beta_5 \log P_{uc} + \varepsilon$
53/95
Autocorrelation Results from an Incomplete Model
54/95
Heteroscedasticity
Disturbances still provide no information about each other.
• $\mathrm{Var}[\varepsilon_i \mid X] = \sigma_i^2$
• $\mathrm{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0$
But data are of unequal variation.

$$
\mathrm{Var}[\varepsilon \mid X] =
\begin{bmatrix}
\mathrm{Var}(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \mathrm{Cov}(\varepsilon_1,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_2,\varepsilon_1) & \mathrm{Var}(\varepsilon_2) & \mathrm{Cov}(\varepsilon_2,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_3,\varepsilon_1) & \mathrm{Cov}(\varepsilon_3,\varepsilon_2) & \mathrm{Var}(\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_3,\varepsilon_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\varepsilon_N,\varepsilon_1) & \mathrm{Cov}(\varepsilon_N,\varepsilon_2) & \mathrm{Cov}(\varepsilon_N,\varepsilon_3) & \cdots & \mathrm{Var}(\varepsilon_N)
\end{bmatrix}
=
\begin{bmatrix}
\sigma_1^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_3^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_N^2
\end{bmatrix}
$$
55/95
Response to Heteroscedasticity
Regression model (so far) assumes homoscedasticity.
• Any implications for use of least squares?
• Any adjustment needed for regression computations?

56/95
Robust Regression Inference
Case 1. Heteroscedasticity is of unknown type.
• Least squares is OK: still unbiased and consistent.
• Least squares standard errors are not OK.
• Earlier result, $\mathrm{Var}[b \mid X] = \sigma^2 (X'X)^{-1}$, is no longer correct.
• It is possible to adjust the covariance matrix.
Robust Covariance Matrix: Gives appropriate standard errors whether or not there is heteroscedasticity.
57/95
The (Hal) White Estimator


Compute a covariance matrix for least squares that will not be distorted by heteroscedasticity, but will also be OK if there is no heteroscedasticity.
Steps:
• Compute LS, b = … as usual, and compute residuals, $e_i$
• Compute $A = X'X = \sum_i 1 \cdot x_i x_i'$
• Compute $B = X'[e^2]X = \sum_i e_i^2 \, x_i x_i'$
• Compute estimator of $\mathrm{Var}[b \mid X]$ as $A^{-1} B A^{-1}$
This is called a sandwich estimator.
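The White steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the course software: it assumes a model with a constant and a single regressor so that every matrix is 2x2, and the helper names (`weighted_xtx`, `inv2`, `matmul2`, `ols`, `white_cov`) and the data are made up for the example.

```python
def weighted_xtx(rows, weights):
    """Return sum_i weights[i] * x_i x_i' as a 2x2 nested list."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for row, w in zip(rows, weights):
        for j in range(2):
            for k in range(2):
                m[j][k] += w * row[j] * row[k]
    return m


def inv2(m):
    """Invert a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ols(rows, y):
    """Least squares coefficients b and residuals e."""
    a_inv = inv2(weighted_xtx(rows, [1.0] * len(rows)))
    xty = [sum(row[j] * yi for row, yi in zip(rows, y)) for j in range(2)]
    b = [a_inv[j][0] * xty[0] + a_inv[j][1] * xty[1] for j in range(2)]
    e = [yi - b[0] * row[0] - b[1] * row[1] for row, yi in zip(rows, y)]
    return b, e


def white_cov(rows, e):
    """Sandwich estimator A^-1 B A^-1 of Var[b|X]."""
    a_inv = inv2(weighted_xtx(rows, [1.0] * len(rows)))   # A = X'X
    b_mat = weighted_xtx(rows, [ei * ei for ei in e])     # B = sum_i e_i^2 x_i x_i'
    return matmul2(matmul2(a_inv, b_mat), a_inv)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.4, 3.6, 5.9, 5.2]
rows = [(1.0, xi) for xi in x]          # x_i' = (1, x_i): constant and one regressor
b, e = ols(rows, y)
v = white_cov(rows, e)
robust_se = [v[0][0] ** 0.5, v[1][1] ** 0.5]
print("b =", b)
print("robust standard errors =", robust_se)
```

If every squared residual were equal to the same constant c, B would equal cA and the sandwich would collapse to c times the inverse of A, which has the form of the familiar homoscedastic result.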
58/95
Heteroscedasticity Robust Standard Errors
REGRESS;Lhs=lgaspcar ; Rhs=one,lincomep,lrpmg,lcarpcap ; Heteroscedasticity $
(Minitab does not know how to do this.)
59/95
Heteroscedasticity
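Case 2. Simple function of a variable

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad \mathrm{Var}[\varepsilon_i] = \sigma^2 z_i.$$

Transform to a model that looks like the familiar regression:

$$\frac{y_i}{\sqrt{z_i}} = \alpha \frac{1}{\sqrt{z_i}} + \beta \frac{x_i}{\sqrt{z_i}} + \frac{\varepsilon_i}{\sqrt{z_i}}$$

$$y_i^* = \alpha w_i^* + \beta x_i^* + \varepsilon_i^*, \quad \mathrm{Var}[\varepsilon_i^*] = \sigma^2 z_i \left(\frac{1}{\sqrt{z_i}}\right)^2 = \sigma^2$$

Implication:
Regression of y* on w* and x* (without a constant)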
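A minimal sketch of this transformation, assuming the variance is proportional to a known, positive variable z (so each row is divided by the square root of z). The function names and the data are hypothetical and serve only to illustrate the mechanics.

```python
import math


def ls_no_constant(w, x, y):
    """Least squares of y on the two regressors w and x, with no added constant."""
    sww = sum(wi * wi for wi in w)
    sxx = sum(xi * xi for xi in x)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = sww * sxx - swx * swx            # determinant of the 2x2 normal equations
    alpha = (sxx * swy - swx * sxy) / det
    beta = (sww * sxy - swx * swy) / det
    return alpha, beta


def wls_by_transformation(x, y, z):
    """Divide y, 1, and x by sqrt(z_i), then run least squares without a constant."""
    root = [math.sqrt(zi) for zi in z]
    y_star = [yi / r for yi, r in zip(y, root)]
    w_star = [1.0 / r for r in root]       # transformed "constant"
    x_star = [xi / r for xi, r in zip(x, root)]
    return ls_no_constant(w_star, x_star, y_star)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [5.2, 7.9, 11.5, 13.0, 19.0]
alpha_hat, beta_hat = wls_by_transformation(x, y, z)
print(alpha_hat, beta_hat)
```

The coefficient on w* estimates the original intercept and the coefficient on x* estimates the original slope. When every z equals 1 the procedure reduces to ordinary least squares.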
60/95
61/95
Number of Observations
62/95
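Weighted LS Using Means

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad \mathrm{Var}[\varepsilon_i] = \sigma^2 \left(\frac{1}{N_i}\right).$$

Transform to a model that looks like the familiar regression:

$$y_i \sqrt{N_i} = \alpha \sqrt{N_i} + \beta \sqrt{N_i}\, x_i + \varepsilon_i \sqrt{N_i}$$

$$y_i^* = \alpha w_i^* + \beta x_i^* + \varepsilon_i^*, \quad \mathrm{Var}[\varepsilon_i^*] = \sigma^2 \left(\frac{1}{N_i}\right) \left(\sqrt{N_i}\right)^2 = \sigma^2$$

Implication:
Regression of y* on w* and x* (without a constant)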
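One way to see what the weighting does: when each observation is a group mean and x is constant within the group, weighting by the group size reproduces ordinary least squares on the underlying individual observations. The sketch below checks this on made-up data; the function name `weighted_line` and all numbers are assumptions for illustration.

```python
def weighted_line(x, y, weights):
    """Minimize sum_i weights[i] * (y_i - a - b*x_i)^2 and return (a, b)."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up micro data: x is constant within each group.
groups = [
    (1.0, [2.1, 1.9, 2.3, 2.0]),
    (2.0, [3.2, 2.7]),
    (4.0, [5.1, 4.8, 5.4]),
]
x_group = [xg for xg, _ in groups]
y_mean = [sum(ys) / len(ys) for _, ys in groups]
n_group = [float(len(ys)) for _, ys in groups]

# Weighting squared errors by N_i is the same as multiplying each row by sqrt(N_i).
a_wls, b_wls = weighted_line(x_group, y_mean, n_group)

# Ordinary least squares on the underlying observations, for comparison.
x_micro = [xg for xg, ys in groups for _ in ys]
y_micro = [yv for _, ys in groups for yv in ys]
a_ols, b_ols = weighted_line(x_micro, y_micro, [1.0] * len(y_micro))
print(a_wls, b_wls)
print(a_ols, b_ols)
```

The two printed pairs agree up to rounding error, because the weighted sum of squares over group means differs from the micro-data sum of squares only by a within-group term that does not involve the coefficients.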
63/95
Weighted Least Squares
64/95
Groupwise Heteroscedasticity
65/95
Strategy for Groupwise
$\mathrm{Var}[\varepsilon_{country,year}] = \sigma^2_{country}$
Strategy:
(1) Linear regression pooling the data
(2) For each country, use $\hat{\sigma}^2_{country} = \dfrac{\sum_{years} e^2_{country,year}}{T_{country}}$
(3) Weighted least squares
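The three-step strategy can be sketched as follows for a model with a constant and one regressor. This is a hedged illustration only: the function names, the two hypothetical groups "A" and "B", and the data are invented, and the weights are taken to be the reciprocals of the estimated group variances.

```python
def weighted_line(x, y, weights):
    """Minimize sum_i weights[i] * (y_i - a - b*x_i)^2 and return (a, b)."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def groupwise_wls(x, y, group):
    # Step 1: pooled least squares and its residuals.
    a0, b0 = weighted_line(x, y, [1.0] * len(y))
    resid = [yi - a0 - b0 * xi for xi, yi in zip(x, y)]
    # Step 2: group variance estimate = sum of squared residuals / T_group.
    ss, count = {}, {}
    for g, e in zip(group, resid):
        ss[g] = ss.get(g, 0.0) + e * e
        count[g] = count.get(g, 0) + 1
    sigma2 = {g: ss[g] / count[g] for g in ss}
    # Step 3: weighted least squares with weights 1 / sigma2 of the row's group.
    weights = [1.0 / sigma2[g] for g in group]
    a1, b1 = weighted_line(x, y, weights)
    return (a0, b0), sigma2, (a1, b1)


# Made-up data: group "A" is tight around its line, group "B" is noisy.
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.1, 8.9, 5.0, 3.0, 9.0, 7.0]
group = ["A"] * 4 + ["B"] * 4
pooled, sigma2, fgls = groupwise_wls(x, y, group)
print("pooled:", pooled)
print("group variances:", sigma2)
print("weighted:", fgls)
```

In this toy example the weighted slope moves away from the pooled slope toward the slope of the low-variance group, since that group's observations receive the larger weights.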
66/95
Groupwise Weighted Least Squares
67/95
Combining Heteroscedastic Estimates
Data on 18 OECD countries, 19 years, gasoline consumption
$\log(\text{Gas}/\text{Car}) = a + b_1 \log(\text{Income}) + b_2 \log P + b_3 \log(\text{Cars}/\text{Capita}) + e$
18 estimates of $(\beta_1, \beta_2, \beta_3, \beta_4)$
68/95
Combining Heteroscedastic Estimates
1. Simple average
2. Weighted average based on $s^2$
3. Pool the data
4. Weighted average based on all information
5. Generalized least squares (same as 4.)
69/95
Simple Average
Each least squares estimate $b_k$ has variance
$A_k = \sigma_k^2 (X_k'X_k)^{-1}$
Simple average, $b = (1/N) b_1 + (1/N) b_2 + \dots$
$b = \sum_k w_k b_k$ where $w_k = 1/N$
$w_k$ is the same for all $k$
Each is unbiased, so the average is unbiased
Variance is $(1/N)^2 A_1 + (1/N)^2 A_2 + \dots$
70/95
Weighted Average Based on $\sigma_k^2$



Variances range from $.012^2$ to $.075^2$, a difference of 36 times
Forming weights, a smaller variance should get greater weight.
Use $w_k = (1/\sigma_k^2) / [(1/\sigma_1^2) + (1/\sigma_2^2) + \dots + (1/\sigma_N^2)]$
Weights are positive and sum to 1.
Weighted average $b = \sum_k w_k b_k$ is still unbiased
Variance now $w_1^2 A_1 + w_2^2 A_2 + \dots + w_N^2 A_N$
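A small numerical sketch of the inverse-variance weights may help. To keep it scalar, it assumes each $A_k$ is just $\sigma_k^2$ (that is, $X_k'X_k = 1$) and that the $b_k$ are independent; the three variances and the function names are illustrative, not values from the data.

```python
# Hypothetical disturbance variances for N = 3 groups (illustrative values only).
sigma2 = [0.012 ** 2, 0.030 ** 2, 0.075 ** 2]

def inverse_variance_weights(variances):
    """Return w_k = (1/sigma_k^2) / sum_j (1/sigma_j^2)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]

def average_variance(weights, variances):
    """Variance of sum_k w_k b_k for independent scalar estimates b_k."""
    return sum(w ** 2 * v for w, v in zip(weights, variances))

weights = inverse_variance_weights(sigma2)
equal = [1.0 / len(sigma2)] * len(sigma2)   # simple average, w_k = 1/N

var_weighted = average_variance(weights, sigma2)
var_simple = average_variance(equal, sigma2)
print("weights:", weights)
print("variance, weighted vs simple:", var_weighted, var_simple)
```

In this scalar setting the weighted average cannot have a larger variance than the simple average, since inverse-variance weights minimize $\sum_k w_k^2 \sigma_k^2$ subject to $\sum_k w_k = 1$.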
71/95
Pooling the Data
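$b = [X_1'X_1 + X_2'X_2 + \dots]^{-1} [X_1'y_1 + X_2'y_2 + \dots]$
Use a trick: $X_k'y_k = X_k'X_k b_k$ for each group $k$
After some matrix algebra,
$b = \sum_k W_k b_k$ = a matrix weighted average, with $W_k = [\sum_j X_j'X_j]^{-1} X_k'X_k$
$\sum_k W_k = I$
The variance is a messy matrix weighted sum
The average accounts for the data ($X_k'X_k$) but not $\sigma_k^2$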
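The identity between the pooled estimator and the matrix weighted average can be checked numerically. The sketch below uses simulated data; the coefficient vector, group sizes, and standard deviations are made-up values, and the weights are taken to be $W_k = [\sum_j X_j'X_j]^{-1} X_k'X_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])      # hypothetical common coefficient vector
sigmas = [0.5, 1.0, 2.0]         # hypothetical disturbance s.d. for each group

groups = []
for s in sigmas:
    X = np.column_stack([np.ones(40), rng.normal(size=40)])
    y = X @ beta + s * rng.normal(size=40)
    groups.append((X, y))

# Group-by-group least squares estimates b_k.
b_k = [np.linalg.solve(X.T @ X, X.T @ y) for X, y in groups]

# Pooled estimator: sum the moment matrices over groups, then solve once.
XtX_sum = sum(X.T @ X for X, _ in groups)
Xty_sum = sum(X.T @ y for X, y in groups)
b_pooled = np.linalg.solve(XtX_sum, Xty_sum)

# Matrix weights W_k = [sum_j X_j'X_j]^{-1} X_k'X_k, which sum to I.
W = [np.linalg.solve(XtX_sum, X.T @ X) for X, _ in groups]
b_weighted = sum(Wk @ bk for Wk, bk in zip(W, b_k))

print(b_pooled, b_weighted)
```

The weights depend only on $X_k'X_k$, which is why the noisy third group gets as much say as the precise first one.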
72/95
Best Weighted Average
Account for both $X_k'X_k$ and $\sigma_k^2$
In matrix weighting,
$W_k = [\sum_j A_j^{-1}]^{-1} A_k^{-1}$
$A_k = \sigma_k^2 (X_k'X_k)^{-1}$
This is equivalent to the weighted least squares we did before.
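The equivalence with weighted least squares can be illustrated on simulated data. This sketch assumes the weights are built from the inverse variance matrices, $W_k = [\sum_j A_j^{-1}]^{-1} A_k^{-1}$ with $A_k^{-1} = X_k'X_k / \sigma_k^2$, and treats the $\sigma_k^2$ as known; all numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])      # hypothetical common coefficient vector
sigma2 = [0.25, 1.0, 4.0]        # hypothetical known group variances

groups = []
for s2 in sigma2:
    X = np.column_stack([np.ones(30), rng.normal(size=30)])
    y = X @ beta + np.sqrt(s2) * rng.normal(size=30)
    groups.append((X, y))

b_k = [np.linalg.solve(X.T @ X, X.T @ y) for X, y in groups]

# A_k^{-1} = X_k'X_k / sigma_k^2 is the precision matrix of b_k.
A_inv = [X.T @ X / s2 for (X, _), s2 in zip(groups, sigma2)]
A_inv_sum = sum(A_inv)
W = [np.linalg.solve(A_inv_sum, Ai) for Ai in A_inv]
b_best = sum(Wk @ bk for Wk, bk in zip(W, b_k))

# Weighted least squares on the stacked data, weight 1/sigma_k^2 per observation.
X_all = np.vstack([X for X, _ in groups])
y_all = np.concatenate([y for _, y in groups])
w_all = np.concatenate(
    [np.full(len(y), 1.0 / s2) for (_, y), s2 in zip(groups, sigma2)]
)
b_wls = np.linalg.solve(X_all.T @ (w_all[:, None] * X_all),
                        X_all.T @ (w_all * y_all))

print(b_best, b_wls)
```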
73/95
Comparison Worst and Best
74/95
GLS for AR1
75/95
U.S. Gasoline Market, 1953-2004
76/95
Autocorrelation
How to adjust the estimator to deal with serial correlation?
Robust estimator to patch least squares?
77/95
Robust Estimation for b
Assume $\mathrm{Cov}[e_t, e_{t-s}]$ fades as the time separation $s$ increases.
$\log G_t = \alpha + \beta_1 \log Pg_t + \beta_2 \log Income_t + \gamma_1 \log Pnc_t + \gamma_2 \log Puc_t + \gamma_3 \log Ppt_t + \varepsilon_t$
Correlations of least squares residuals with past values for 13 years.
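One common way to use the fading-covariance assumption is a Newey-West style covariance estimate with Bartlett weights $1 - l/(L+1)$. The sketch below is a minimal version on simulated data, not the gasoline data; the lag length, sample size, and function name `newey_west_cov` are choices made for illustration.

```python
import numpy as np

def newey_west_cov(X, e, max_lag):
    """Newey-West (Bartlett weights) covariance estimate for OLS coefficients."""
    Xe = X * e[:, None]                  # row t holds e_t * x_t'
    S = Xe.T @ Xe                        # lag-0 term: sum_t e_t^2 x_t x_t'
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)  # weight shrinks as the lag grows
        G = Xe[lag:].T @ Xe[:-lag]       # sum_t e_t e_{t-lag} x_t x_{t-lag}'
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# Simulated regression with AR(1) disturbances (all values hypothetical).
rng = np.random.default_rng(2)
T, rho = 200, 0.7
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.empty(T)
eps[0] = u[0]
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]

X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.5 * x + eps
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

V_nw = newey_west_cov(X, e, max_lag=4)
V_white = newey_west_cov(X, e, max_lag=0)   # no lags: heteroscedasticity-only form
print(np.sqrt(np.diag(V_nw)), np.sqrt(np.diag(V_white)))
```

The coefficient estimates $b$ are unchanged; only the estimated standard errors are adjusted.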
78/95
Newey-West Estimator
79/95
Model for Autocorrelation
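Based on a specific assumption
$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$, $u_t \sim$ mean 0, variance $\sigma^2$
$0 < \rho < 1$, $\rho$ is the autocorrelation coefficient.
Results (not derived here),
$\mathrm{Var}[\varepsilon_t] = \sigma^2 / (1 - \rho^2)$
$\mathrm{Correlation}[\varepsilon_t, \varepsilon_{t-1}] = \rho$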
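The two results can be checked by simulation. The sketch below assumes normal shocks and illustrative values $\rho = 0.6$ and shock standard deviation 1, neither of which comes from the slides; it compares the sample variance and lag-one correlation of a long simulated series with $\sigma^2/(1-\rho^2)$ and $\rho$.

```python
import random

def simulate_ar1(rho, sigma_u, n, seed=0):
    """Generate eps_t = rho * eps_{t-1} + u_t with normal shocks, starting at 0."""
    rng = random.Random(seed)
    eps = 0.0
    out = []
    for _ in range(n):
        eps = rho * eps + rng.gauss(0.0, sigma_u)
        out.append(eps)
    return out

def sample_variance(z):
    m = sum(z) / len(z)
    return sum((v - m) ** 2 for v in z) / len(z)

def lag1_correlation(z):
    m = sum(z) / len(z)
    num = sum((z[t] - m) * (z[t - 1] - m) for t in range(1, len(z)))
    den = sum((v - m) ** 2 for v in z)
    return num / den

rho, sigma_u = 0.6, 1.0            # hypothetical parameter values
eps = simulate_ar1(rho, sigma_u, n=200_000)
var_theory = sigma_u ** 2 / (1.0 - rho ** 2)

print("variance:", sample_variance(eps), "theory:", var_theory)
print("lag-1 correlation:", lag1_correlation(eps), "theory:", rho)
```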
80/95
How To Do Generalized Least Squares
Model: $y_t = \alpha + \beta x_t + \varepsilon_t$, $\varepsilon_t = \rho \varepsilon_{t-1} + u_t$
A transformation:
$y_t = \alpha + \beta x_t + \varepsilon_t$
$\rho y_{t-1} = \rho\alpha + \rho\beta x_{t-1} + \rho\varepsilon_{t-1}$
Subtract:
$(y_t - \rho y_{t-1}) = (1 - \rho)\alpha + \beta (x_t - \rho x_{t-1}) + (\varepsilon_t - \rho\varepsilon_{t-1})$
$y_t^* = \alpha^* + \beta x_t^* + u_t$
Now compute the regression. Two loose ends
(1) Need to know $\rho$. First just use least squares, then $r = \sum_{t=2}^{T} e_t e_{t-1} / \sum_{t=1}^{T} e_t^2$.
(2) What happens to the first observation? Lost? No.
Use $y_1^* = y_1 \sqrt{1 - \rho^2}$ and $x_1^* = x_1 \sqrt{1 - \rho^2}$
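The two-step procedure can be sketched as follows: estimate $\rho$ from the least squares residuals, quasi-difference observations $2, \dots, T$, scale the first observation by $\sqrt{1 - r^2}$, and rerun least squares. The function names and simulated data are hypothetical. One design choice to note: the constant column is transformed along with $x$, so the first coefficient estimates $\alpha$ directly rather than $\alpha^* = (1-\rho)\alpha$.

```python
import numpy as np

def estimate_rho(e):
    """r = sum_{t=2}^T e_t e_{t-1} / sum_{t=1}^T e_t^2."""
    return float(e[1:] @ e[:-1] / (e @ e))

def prais_winsten_transform(z, rho):
    """Quasi-difference z; keep observation 1, scaled by sqrt(1 - rho^2)."""
    z = np.asarray(z, dtype=float)
    z_star = np.empty_like(z)
    z_star[0] = np.sqrt(1.0 - rho ** 2) * z[0]
    z_star[1:] = z[1:] - rho * z[:-1]
    return z_star

def feasible_gls_ar1(X, y):
    """Two-step estimator: OLS, then OLS on the transformed data."""
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    r = estimate_rho(y - X @ b_ols)
    X_star = prais_winsten_transform(X, r)   # constant column transformed too
    y_star = prais_winsten_transform(y, r)
    b_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
    return b_gls, r

# Simulated example (all parameter values hypothetical).
rng = np.random.default_rng(5)
T, rho_true = 500, 0.7
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1.0 - rho_true ** 2)
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]

X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.5 * x + eps
b_gls, r = feasible_gls_ar1(X, y)
print("r =", r, "b_gls =", b_gls)
```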
81/95
AR(1) for U.S. Gasoline
Inference and Regression
Generalized Regression
(Optional)
83/95
Generalized Regression Model
Setting: The classical linear model assumes
that $E[\varepsilon\varepsilon'] = \mathrm{Var}[\varepsilon] = \sigma^2 I$. That is,
observations are uncorrelated and all are
drawn from a distribution with the same
variance. The generalized regression (GR)
model allows the variances to differ across
observations and allows correlation across
observations.
84/95
Implications
The assumption that $\mathrm{Var}[\varepsilon] = \sigma^2 I$ is used to derive the
result $\mathrm{Var}[b] = \sigma^2 (X'X)^{-1}$. If it is not true, then the use
of $s^2 (X'X)^{-1}$ to estimate $\mathrm{Var}[b]$ is inappropriate.
The assumption was used to derive most of our test
statistics, so they must be revised as well.
Least squares gives each observation a weight of $1/n$.
But, if the variances are not equal, then some
observations are more informative than others.
Least squares is based on simple sums, so the
information that one observation might provide about
another is never used.
85/95
GR Model
The generalized regression model:
$y = X\beta + \varepsilon$,
$E[\varepsilon|X] = 0$, $\mathrm{Var}[\varepsilon|X] = \sigma^2 \Omega$.
Leading Cases
Simple heteroscedasticity
Autocorrelation
Panel data and heterogeneity more generally.
86/95
Least Squares


Still unbiased. (Proof did not rely on $\Omega$)
For consistency, we need the true variance of b,
$\mathrm{Var}[b|X] = E[(b-\beta)(b-\beta)'|X]$
$= (X'X)^{-1} E[X'\varepsilon\varepsilon'X|X] (X'X)^{-1}$
$= \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$.
Divide all 4 terms by n. If the middle one converges to a finite
matrix of constants, we have the result, so we need to
examine
$(1/n) X'\Omega X = (1/n) \sum_i \sum_j \omega_{ij} x_i x_j'$.
This will be another assumption of the model
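To see how far the true variance can sit from the classical formula, the sketch below evaluates both for a made-up heteroscedastic $\Omega$ in which the variance is proportional to $x_i^2$. The design, $\Omega$, and function names are assumptions for illustration only.

```python
import numpy as np

def ols_true_variance(X, Omega, sigma2=1.0):
    """sigma^2 (X'X)^{-1} X' Omega X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return sigma2 * XtX_inv @ (X.T @ Omega @ X) @ XtX_inv

def ols_classical_variance(X, sigma2=1.0):
    """sigma^2 (X'X)^{-1}, valid only when Omega = I."""
    return sigma2 * np.linalg.inv(X.T @ X)

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
Omega = np.diag(x ** 2)           # hypothetical: variance proportional to x_i^2

V_true = ols_true_variance(X, Omega)
V_classical = ols_classical_variance(X)
print(np.diag(V_true), np.diag(V_classical))
```

When $\Omega = I$ the middle term collapses to $X'X$ and the two expressions coincide.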
87/95
Generalized Least Squares
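A transformation of the model:
$P = \Omega^{-1/2}$. $P'P = \Omega^{-1}$
$Py = PX\beta + P\varepsilon$ or
$y^* = X^*\beta + \varepsilon^*$.
Why?
$E[\varepsilon^*\varepsilon^{*\prime}|X^*] = P E[\varepsilon\varepsilon'|X^*] P'$
$= P E[\varepsilon\varepsilon'|X] P'$
$= \sigma^2 P \Omega P' = \sigma^2 \Omega^{-1/2} \Omega \Omega^{-1/2} = \sigma^2 \Omega^0$
$= \sigma^2 I$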
88/95
Generalized Least Squares
Aitken theorem. The Generalized Least Squares
estimator, GLS.
$Py = PX\beta + P\varepsilon$ or
$y^* = X^*\beta + \varepsilon^*$.
$E[\varepsilon^*\varepsilon^{*\prime}|X^*] = \sigma^2 I$
Use ordinary least squares in the transformed
model. Satisfies the Gauss-Markov theorem.
$b^* = (X^{*\prime}X^*)^{-1} X^{*\prime} y^*$
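A short numerical check of the transformation, assuming $\Omega$ is known. The sketch builds a symmetric $P = \Omega^{-1/2}$ from the eigendecomposition (one of several valid choices of $P$), confirms $P\Omega P' = I$, and compares least squares on the transformed data with the direct formula $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$. The AR(1)-type $\Omega$ and the data are invented for the example.

```python
import numpy as np

def inverse_sqrt(Omega):
    """Symmetric P = Omega^{-1/2} via the eigendecomposition of Omega."""
    vals, vecs = np.linalg.eigh(Omega)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

rng = np.random.default_rng(4)
n, rho = 30, 0.6
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])   # hypothetical AR(1)-type Omega

X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Disturbances with covariance Omega, generated through the Cholesky factor.
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Omega) @ rng.normal(size=n)

P = inverse_sqrt(Omega)
X_star, y_star = P @ X, P @ y
b_star = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

Omega_inv = np.linalg.inv(Omega)
b_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(b_star, b_direct)
```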
89/95
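Unbiasedness
$\hat\beta = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} y$
$= \beta + (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} \varepsilon$
$E[\hat\beta|X] = \beta + (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} E[\varepsilon|X]$
$= \beta$ if $E[\varepsilon|X] = 0$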
90/95
Consistency
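Use Mean Square
$\mathrm{Var}[\hat\beta|X] = \frac{\sigma^2}{n} \left( \frac{X'\Omega^{-1}X}{n} \right)^{-1} \to 0$?
Requires $\frac{X'\Omega^{-1}X}{n}$ to be "well behaved"
Either converge to a constant matrix or diverge.
Heteroscedasticity case:
$\frac{X'\Omega^{-1}X}{n} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\omega_{ii}} x_i x_i'$
Autocorrelation case:
$\frac{X'\Omega^{-1}X}{n} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \omega^{ij} x_i x_j'$, where $\omega^{ij}$ is the $(i,j)$ element of $\Omega^{-1}$. $n^2$ terms. Convergence is unclear.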
91/95
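Generalized (Weighted) Least Squares
Heteroscedasticity Case
$\mathrm{Var}[\varepsilon] = \sigma^2 \Omega = \sigma^2 \begin{pmatrix} \omega_1 & 0 & \dots & 0 \\ 0 & \omega_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \omega_n \end{pmatrix}$
$\Omega^{-1/2} = \begin{pmatrix} 1/\sqrt{\omega_1} & 0 & \dots & 0 \\ 0 & 1/\sqrt{\omega_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1/\sqrt{\omega_n} \end{pmatrix}$
$\hat\beta = (X'\Omega^{-1}X)^{-1} (X'\Omega^{-1}y) = \left[ \sum_{i=1}^{n} \frac{1}{\omega_i} x_i x_i' \right]^{-1} \left[ \sum_{i=1}^{n} \frac{1}{\omega_i} x_i y_i \right]$
$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \left( \frac{y_i - x_i'\hat\beta}{\sqrt{\omega_i}} \right)^2}{n - K}$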
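The weighted least squares formulas translate almost line for line into code. This sketch assumes the $\omega_i$ are known and uses simulated data with variance proportional to $x_i^2$; the function name and all numbers are hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, omega):
    """GLS with diagonal Omega: weight observation i by 1/omega_i."""
    w = 1.0 / omega
    XtWX = X.T @ (w[:, None] * X)        # sum_i (1/omega_i) x_i x_i'
    XtWy = X.T @ (w * y)                 # sum_i (1/omega_i) x_i y_i
    beta_hat = np.linalg.solve(XtWX, XtWy)
    resid = y - X @ beta_hat
    n, K = X.shape
    sigma2_hat = np.sum(resid ** 2 / omega) / (n - K)
    return beta_hat, sigma2_hat

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(1.0, 5.0, size=n)
omega = x ** 2                            # hypothetical variance pattern
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.sqrt(omega) * rng.normal(size=n)

beta_hat, sigma2_hat = weighted_least_squares(X, y, omega)
print(beta_hat, sigma2_hat)
```

Dividing each row of $X$ and $y$ by $\sqrt{\omega_i}$ and running ordinary least squares gives the same coefficients, which is the $P = \Omega^{-1/2}$ transformation in the diagonal case.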
92/95
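Autocorrelation
\[
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t
\]
('First order autocorrelation.' How does this come about?)
Assume \(-1 < \rho < 1\). Why?
\(u_t\) = 'nonautocorrelated white noise'
\[
\begin{aligned}
\varepsilon_t &= \rho\,\varepsilon_{t-1} + u_t \quad \text{(the autoregressive form)} \\
&= \rho(\rho\,\varepsilon_{t-2} + u_{t-1}) + u_t \\
&= \dots \quad \text{(continue to substitute)} \\
&= u_t + \rho\,u_{t-1} + \rho^2 u_{t-2} + \rho^3 u_{t-3} + \dots \quad \text{(the moving average form)}
\end{aligned}
\]
(Some observations about modeling time series.)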
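The substitution argument on this slide can be checked numerically. The minimal sketch below builds the same disturbance series twice, once from the autoregressive recursion and once from the moving average sum, and the two should coincide. The value rho = 0.6, the sample length, and the starting condition eps_0 = u_0 are assumptions chosen for the illustration.

```python
import numpy as np

rho = 0.6                      # assumed value with |rho| < 1
T = 50
rng = np.random.default_rng(1)
u = rng.normal(size=T)         # nonautocorrelated white noise

# Autoregressive form: eps_t = rho * eps_{t-1} + u_t, started at eps_0 = u_0
eps_ar = np.empty(T)
eps_ar[0] = u[0]
for t in range(1, T):
    eps_ar[t] = rho * eps_ar[t - 1] + u[t]

# Moving average form: eps_t = sum_{i=0}^{t} rho^i * u_{t-i}
eps_ma = np.array(
    [sum(rho**i * u[t - i] for i in range(t + 1)) for t in range(T)]
)
```

With |rho| < 1 the weights rho^i shrink geometrically, so distant shocks fade out; with |rho| >= 1 they would not, which is one way to see why the restriction is imposed.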
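Autocorrelation
\[
\begin{aligned}
\operatorname{Var}[\varepsilon_t] &= \operatorname{Var}[u_t + \rho\,u_{t-1} + \rho^2 u_{t-2} + \dots] \\
&= \operatorname{Var}\left[\sum_{i=0}^{\infty} \rho^i u_{t-i}\right] \\
&= \sum_{i=0}^{\infty} \rho^{2i} \sigma_u^2 = \frac{\sigma_u^2}{1-\rho^2}
\end{aligned}
\]
An easier way: Since \(\operatorname{Var}[\varepsilon_t] = \operatorname{Var}[\varepsilon_{t-1}]\) and \(\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t\),
\[
\begin{aligned}
\operatorname{Var}[\varepsilon_t] &= \rho^2 \operatorname{Var}[\varepsilon_{t-1}] + \operatorname{Var}[u_t] + 2\rho\operatorname{Cov}[\varepsilon_{t-1}, u_t] \\
&= \rho^2 \operatorname{Var}[\varepsilon_t] + \sigma_u^2 \quad \text{(the covariance term is zero)}
\end{aligned}
\]
so that
\[
\operatorname{Var}[\varepsilon_t] = \frac{\sigma_u^2}{1-\rho^2}
\]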
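Both routes to the variance can be reproduced with a few lines of arithmetic. The sketch below sums a truncated version of the geometric series and, separately, iterates the "easier way" relation v = rho^2 v + sigma_u^2 until it settles; each should approach sigma_u^2 / (1 - rho^2). The values rho = 0.8 and sigma_u^2 = 2.0, and the truncation lengths, are assumptions for the example.

```python
import numpy as np

rho, sigma_u2 = 0.8, 2.0                 # assumed parameter values

# Route 1: truncated geometric series sum_i rho^(2i) * sigma_u^2
i = np.arange(200)
var_series = sigma_u2 * np.sum(rho ** (2 * i))

# Route 2: iterate v = rho^2 * v + sigma_u^2 to its fixed point
v = 0.0
for _ in range(500):
    v = rho**2 * v + sigma_u2

# Closed form from the slide
var_closed = sigma_u2 / (1.0 - rho**2)
```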
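Autocovariances
Continuing...
\[
\begin{aligned}
\operatorname{Cov}[\varepsilon_t, \varepsilon_{t-1}] &= \operatorname{Cov}[\rho\,\varepsilon_{t-1} + u_t,\ \varepsilon_{t-1}] \\
&= \rho\operatorname{Cov}[\varepsilon_{t-1}, \varepsilon_{t-1}] + \operatorname{Cov}[u_t, \varepsilon_{t-1}] \\
&= \rho\operatorname{Var}[\varepsilon_{t-1}] = \rho\operatorname{Var}[\varepsilon_t] \\
&= \frac{\rho\,\sigma_u^2}{1-\rho^2}
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{Cov}[\varepsilon_t, \varepsilon_{t-2}] &= \operatorname{Cov}[\rho\,\varepsilon_{t-1} + u_t,\ \varepsilon_{t-2}] \\
&= \rho\operatorname{Cov}[\varepsilon_{t-1}, \varepsilon_{t-2}] + \operatorname{Cov}[u_t, \varepsilon_{t-2}] \\
&= \rho\operatorname{Cov}[\varepsilon_t, \varepsilon_{t-1}] \\
&= \frac{\rho^2 \sigma_u^2}{1-\rho^2}
\end{aligned}
\]
and so on.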
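The pattern "and so on" suggests that the covariance at lag s is rho^s sigma_u^2 / (1 - rho^2). A small sketch, under assumed values rho = 0.5 and sigma_u^2 = 1.0, compares that closed form with a direct calculation from the moving average weights. The function names are hypothetical and the infinite sum is truncated at 500 terms.

```python
import numpy as np

def ar1_autocov(rho, sigma_u2, s):
    """Closed form: Cov[eps_t, eps_{t-s}] = rho^s * sigma_u^2 / (1 - rho^2)."""
    return rho**s * sigma_u2 / (1.0 - rho**2)

def autocov_from_ma(rho, sigma_u2, s, terms=500):
    """Same covariance from the moving average weights.

    eps_t     = sum_i rho^i u_{t-i}
    eps_{t-s} = sum_j rho^j u_{t-s-j}
    Only shocks common to both sums contribute, giving
    sigma_u^2 * sum_j rho^j * rho^(j+s), truncated here at `terms`.
    """
    j = np.arange(terms)
    return sigma_u2 * np.sum(rho**j * rho ** (j + s))

rho, sigma_u2 = 0.5, 1.0                 # assumed parameter values
gammas = [ar1_autocov(rho, sigma_u2, s) for s in range(4)]
gammas_ma = [autocov_from_ma(rho, sigma_u2, s) for s in range(4)]
```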
Autocorrelation Matrix
\[
\sigma^2 \boldsymbol{\Omega} = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{bmatrix}
\]
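Because the (i, j) element of this matrix depends only on the lag |i - j|, it can be built in one step. The sketch below is one possible construction in NumPy; the function name `ar1_covariance` and the values rho = 0.5, sigma_u^2 = 1.0, T = 4 are assumptions for the illustration.

```python
import numpy as np

def ar1_covariance(rho, sigma_u2, T):
    """T x T covariance matrix of an AR(1) disturbance.

    Element (i, j) is sigma_u^2 / (1 - rho^2) * rho^|i - j|.
    """
    idx = np.arange(T)
    lags = np.abs(np.subtract.outer(idx, idx))   # matrix of |i - j|
    return sigma_u2 / (1.0 - rho**2) * rho**lags

Sigma = ar1_covariance(0.5, 1.0, 4)
```

The corner element `Sigma[0, 3]` corresponds to the rho^(T-1) entry in the top row of the matrix.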
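Generalized Least Squares
\[
\boldsymbol{\Omega}^{-1/2} =
\begin{bmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\
-\rho & 1 & 0 & \cdots & 0 \\
0 & -\rho & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\rho & 1
\end{bmatrix}
\]
\[
\boldsymbol{\Omega}^{-1/2}\mathbf{y} =
\begin{bmatrix}
\sqrt{1-\rho^2}\,y_1 \\
y_2 - \rho\,y_1 \\
y_3 - \rho\,y_2 \\
\vdots \\
y_T - \rho\,y_{T-1}
\end{bmatrix}
\]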
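To see what this transformation does, the sketch below builds the matrix for a small T, applies it to a data vector, and checks two things: that the transformed disturbances have covariance sigma_u^2 I, and that ordinary least squares on the transformed data reproduces the generalized least squares formula. The function name `ar1_transform`, the parameter values, and the simulated regressors are assumptions for the example, and rho is treated as known.

```python
import numpy as np

def ar1_transform(rho, T):
    """Matrix with sqrt(1 - rho^2) in the first diagonal position,
    ones on the rest of the diagonal, and -rho on the subdiagonal."""
    P = np.eye(T)
    P[0, 0] = np.sqrt(1.0 - rho**2)
    P[np.arange(1, T), np.arange(T - 1)] = -rho
    return P

rho, sigma_u2, T = 0.7, 1.5, 6           # assumed values; rho taken as known
P = ar1_transform(rho, T)

# Covariance matrix of the AR(1) disturbances
idx = np.arange(T)
Sigma = sigma_u2 / (1.0 - rho**2) * rho ** np.abs(np.subtract.outer(idx, idx))

# Hypothetical data for the regression comparison
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = rng.normal(size=T)

# Transformed data: first row scaled, remaining rows quasi-differenced
y_star = P @ y
X_star = P @ X

# OLS on transformed data versus the GLS formula
beta_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
Sinv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
```

In practice rho is unknown and has to be estimated first, which this sketch does not attempt.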