MathStat

Statistical Inference and Regression Analysis: GB.3302.30
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Statistics and Data Analysis
Part 7 – Regression Model-1
Regression Diagnostics
5/97
Using the Residuals
How do you know the model is “good”?
• Various diagnostics will be developed over the semester.
• But the first place to look is at the residuals.
6/97
Residuals Can Signal a Flawed Model
Standard application: cost function for the output of a production process.
• Compare a linear equation to a quadratic model (in logs)
• (123 American electric utilities)
7/97
Electricity Cost Function
8/97
Candidate Model for Cost
$\log c = a + b \log q + e$
[Figure: Scatterplot of logCost vs. logOutput with the fitted regression line. Annotations mark two regions where most of the points are above the regression line and one region where most of the points are below it.]
9/97
A Better Model?
$\log \text{Cost} = \alpha + \beta_1 \log \text{Output} + \beta_2 [\log \text{Output}]^2 + \varepsilon$
10/97
Candidate Models for Cost
The quadratic equation is the appropriate model.
$\log c = a + b_1 \log q + b_2 \log^2 q + e$
11/97
Missing Variable Included
[Figure: Residuals versus logOutput (response is logCost), two panels: residuals from the quadratic cost model and residuals from the linear cost model. The residual axis spans about -1.0 to 2.0 in one panel and -0.50 to 0.50 in the other.]
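A minimal sketch of this comparison in Python, on synthetic data. The 123 utility observations are not reproduced here, so the coefficients 1.0, 0.2, and 0.05 and the noise level are assumptions chosen only to produce a curved relationship, and the helper names are hypothetical. The sketch fits the linear and the quadratic log-cost equations by least squares, then averages the residuals over the low, middle, and high thirds of logOutput.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols_residuals(X, y):
    """Least squares via the normal equations X'X b = X'y; returns the residuals."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]

def mean_by_third(q, e):
    """Average residual in the low, middle, and high thirds of log output."""
    pairs = sorted(zip(q, e))
    n = len(pairs) // 3
    groups = [pairs[:n], pairs[n:2 * n], pairs[2 * n:]]
    return [sum(res for _, res in g) / len(g) for g in groups]

random.seed(0)
# Synthetic stand-in for the utility data (assumed values, not the real sample).
log_q = [random.uniform(0.0, 12.0) for _ in range(123)]
log_c = [1.0 + 0.2 * q + 0.05 * q * q + random.gauss(0.0, 0.15) for q in log_q]

e_lin = ols_residuals([[1.0, q] for q in log_q], log_c)
e_quad = ols_residuals([[1.0, q, q * q] for q in log_q], log_c)

lin_means = mean_by_third(log_q, e_lin)
quad_means = mean_by_third(log_q, e_quad)
print("linear model, mean residual by third:   ", [round(m, 3) for m in lin_means])
print("quadratic model, mean residual by third:", [round(m, 3) for m in quad_means])
```

When the true relationship is curved, the linear fit leaves residuals that are positive at both ends and negative in the middle, while the quadratic fit leaves no such pattern. That systematic sign pattern is the signal of a missing variable.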
12/97
Unusual Data Points
Outliers have (what appear to be) very large disturbances, ε
[Figure: Scatterplot of Weight vs. TLength (wolf weight vs. tail length).]
[Figure: Regression of Foreign Box Office on Domestic for the 500 most successful movies. Fitted line: Overseas = 6.693 + 1.051 Domestic; S = 73.0041, R-Sq = 52.2%, R-Sq(adj) = 52.1%.]
13/97
Outliers
About 99.7% of observations will lie within mean ± 3 standard deviations (if they are normally distributed). We show (a + bx) ± 3se below.
[Figure: Regression of Foreign Box Office on Domestic with bands at (a + bx) ± 3se. Fitted line: Overseas = 6.693 + 1.051 Domestic; S = 73.0041, R-Sq = 52.2%, R-Sq(adj) = 52.1%.]
Titanic is 8.1 standard deviations from the regression!
These observations might deserve a close look.
Only 0.86% of the 466 observations lie outside the bounds. (We will refine this later.)
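The (a + bx) ± 3se screen can be sketched in a few lines of Python. The data below are synthetic, with one planted extreme point playing the role of a Titanic-like film. The variable names, sample size, and noise level are assumptions for illustration, not the box office data.

```python
import math
import random

def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def flag_outliers(x, y, width=3.0):
    """Return (index, residual / se) for points outside (a + bx) +/- width * se."""
    a, b = fit_line(x, y)
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(ei ** 2 for ei in e) / (len(x) - 2))
    return [(i, ei / s_e) for i, ei in enumerate(e) if abs(ei) > width * s_e]

random.seed(1)
# Hypothetical data: 200 ordinary films plus one planted extreme observation.
domestic = [random.uniform(20.0, 400.0) for _ in range(200)]
overseas = [5.0 + 1.0 * d + random.gauss(0.0, 40.0) for d in domestic]
domestic.append(600.0)
overseas.append(1200.0)

flagged = flag_outliers(domestic, overseas)
for index, z in flagged:
    print(f"observation {index}: {z:.1f} standard errors from the line")
```

Note that the planted point is included when the line and se are estimated, so it inflates se and pulls the line toward itself. This is one reason the simple 3se screen is only a first pass.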
14/97
Prices paid at auction for Monet paintings vs. surface area (in logs)
logPrice = a + b logArea + e
Not an outlier: Monet chose to paint a small painting.
Possibly an outlier: Why was the price so low?
15/97
What to Do About Outliers
(1) Examine the data.
(2) Are they due to measurement error or obvious “coding errors”? Delete the observations.
(3) Are they just unusual observations? Do nothing.
(4) Generally, resist the temptation to remove outliers, especially if the sample is large. (500 movies is large. 10 wolves is not.)
(5) Question why you think it is an outlier. Is it really?
16/97
Regression Options
17/97
Diagnostics
Residuals
$e = y - (a + bx)$
Standardized Residual
$e^* = \dfrac{e}{s_e \sqrt{1 - \dfrac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}}}$
$e^*$ has mean 0 and standard deviation 1.
Influential observations have very large values of $|x_i - \bar{x}| / \sum_j (x_j - \bar{x})^2$. (How large depends on the number of variables in the model.)
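A minimal sketch of these diagnostics in Python. It uses the common textbook leverage $h_i = 1/N + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$ and divides each residual by $s_e\sqrt{1 - h_i}$. The $1/N$ term is an assumption beyond what the slide shows, and the ten data points are made-up numbers in the spirit of the wolf example, not the actual sample.

```python
import math

def standardized_residuals(x, y):
    """Return (standardized residuals, leverages) for the simple regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
    # Leverage: grows with the squared distance of x_i from the mean of x.
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    e_star = [ei / (s_e * math.sqrt(1.0 - hi)) for ei, hi in zip(e, leverage)]
    return e_star, leverage

# Hypothetical tail lengths and weights; the last animal has an unusually long tail.
tail_length = [16.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 26.0, 40.0]
weight = [62.0, 75.0, 80.0, 88.0, 90.0, 101.0, 104.0, 112.0, 125.0, 150.0]

e_star, leverage = standardized_residuals(tail_length, weight)
for xi, es, hi in zip(tail_length, e_star, leverage):
    print(f"x = {xi:5.1f}   e* = {es:6.2f}   leverage = {hi:.3f}")
```

The observation with x far from the mean gets by far the largest leverage, which is what makes it influential regardless of how large its residual is.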
18/97
On Removing Outliers
Be careful about singling out particular observations this way.
The resulting model might be a product of your opinions, not the real relationship in the data.
Removing outliers might create new outliers that were not outliers before.
Statistical inferences from the model will be incorrect.
Statistics and Data Analysis
Part 7 – Regression Model-2
Statistical Inference
20/97
b As a Statistical Estimator
What is the interest in b?
• $\beta = dE[y|x]/dx$
• Effect of a policy variable on the expectation of a variable of interest.
• Effect of medication dosage on disease response
• … many others
21/97
Application: Health Care Data
German Health Care Usage Data: altogether 27,326 observations on German households, 1984-1994.
DOCTOR   = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT     = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS   = number of doctor visits in last three months
HOSPVIS  = number of hospital visits in last calendar year
PUBLIC   = insured in public health insurance = 1; otherwise = 0
ADDON    = insured by add-on insurance = 1; otherwise = 0
INCOME   = household nominal monthly net income in German marks / 10000
HHKIDS   = children under age 16 in the household = 1; otherwise = 0
EDUC     = years of schooling
AGE      = age in years
MARRIED  = marital status
EDUC     = years of education
22/97
Regression?
Population relationship
• $\text{Income} = \alpha + \beta\,\text{Health} + \varepsilon$
• (For this population, $\text{Income} = .31237 + .00585\,\text{Health} + \varepsilon$, so $E[\text{Income} \mid \text{Health}] = .31237 + .00585\,\text{Health}$.)
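As a small worked example, the conditional mean E[Income | Health] = .31237 + .00585 Health can be evaluated at each health satisfaction score from 0 to 10. The function name is hypothetical. The conversion to marks assumes the INCOME scaling stated in the variable list (German marks / 10000).

```python
ALPHA = 0.31237  # intercept of the population regression quoted on the slide
BETA = 0.00585   # slope: change in expected Income per one-point change in Health

def expected_income(health):
    """E[Income | Health] for the population line, in units of 10,000 marks."""
    return ALPHA + BETA * health

for h in range(0, 11):
    scaled = expected_income(h)
    print(f"Health = {h:2d}   E[Income | Health] = {scaled:.5f}   (about {scaled * 10000:.1f} marks)")
```

Each additional point of Health raises expected Income by the same amount, BETA, which is the sense in which the slope is dE[y|x]/dx.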
23/97
Distribution of Health
24/97
Distribution of Income
25/97
Average Income | Health
[Figure: average Income at each value of Health, with group sizes N_j. For Health = 0, 1, ..., 10 the N_j are 447, 255, 642, 1173, 1390, 4233, 2570, 4191, 6172, 3061, 3192 (total 27,326).]
26/97
b is a statistic
• Random because it is a sum of the $\varepsilon_i$'s.
• It has a distribution, like any sample statistic.
27/97
Sampling Experiment
• 500 samples of N=52 drawn from the 27,326 (using a random number generator to simulate N observation numbers from 1 to 27,326)
• Compute b with each sample
• Histogram of 500 values
28/97
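The sampling experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original 27,326 observations are not reproduced here, so a synthetic population is generated instead, and the values of `ALPHA`, `BETA`, and `SIGMA` and the helper names are hypothetical.

```python
import random
import statistics

def ols_slope(x, y):
    """Least squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(12345)

# Synthetic stand-in for the 27,326-observation population (assumed values).
POP_SIZE, ALPHA, BETA, SIGMA = 27326, 1.0, 0.5, 2.0
pop_x = [rng.uniform(0.0, 10.0) for _ in range(POP_SIZE)]
pop_y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in pop_x]

def draw_slopes(n_samples=500, n=52):
    """Draw n_samples samples of size n and return the slope from each."""
    slopes = []
    for _ in range(n_samples):
        # Simulate n observation numbers (0-based here) from the population.
        idx = [rng.randrange(POP_SIZE) for _ in range(n)]
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    return slopes

slopes = draw_slopes()
print("mean of b:   ", statistics.fmean(slopes))
print("std dev of b:", statistics.stdev(slopes))

# Crude text histogram of the 500 slope estimates (one star per 4 samples).
lo, hi, bins = min(slopes), max(slopes), 10
width = (hi - lo) / bins
counts = [0] * bins
for b in slopes:
    counts[min(int((b - lo) / width), bins - 1)] += 1
for k, c in enumerate(counts):
    print(f"{lo + k * width:7.3f} | {'*' * (c // 4)}")
```

With a population built this way, the 500 slopes should scatter around the assumed `BETA`, and the text histogram should look roughly bell shaped.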
29/97
Conclusions
• Sampling variability
• Seems to center on $\beta$
• Appears to be normally distributed
30/97
Distribution of slope estimator, b

Assumptions:
• (Model) Regression: $y_i = \alpha + \beta x_i + \varepsilon_i$
• (Crucial) Exogenous data: data $x$ and noise $\varepsilon$ are independent; $E[\varepsilon \mid x] = 0$ or $\mathrm{Cov}(\varepsilon, x) = 0$
• (Temporary) $\mathrm{Var}[\varepsilon \mid x] = \sigma^2$, not a function of $x$ (Homoscedastic)

Results: What are the properties of b?
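31/97
$$b = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}, \qquad y_i = \alpha + \beta x_i + \varepsilon_i$$
Treat the sample data on x as given, then
(Insert expr. for $y_i$)
$$b = \frac{\sum_{i=1}^{N}(x_i-\bar{x})\left[(\alpha + \beta x_i + \varepsilon_i) - (\alpha + \beta\bar{x} + \bar{\varepsilon})\right]}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
(Expand)
$$b = \frac{\sum_{i=1}^{N}(x_i-\bar{x})\left[(\alpha-\alpha) + \beta(x_i-\bar{x}) + (\varepsilon_i-\bar{\varepsilon})\right]}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
(Expand more)
$$b = \frac{\sum_{i=1}^{N}(x_i-\bar{x})\,\beta\,(x_i-\bar{x}) + \sum_{i=1}^{N}(x_i-\bar{x})(\varepsilon_i-\bar{\varepsilon})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
(Simplify)
$$b = \beta\,\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2} + \frac{\sum_{i=1}^{N}(x_i-\bar{x})(\varepsilon_i-\bar{\varepsilon})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
(First term simplifies)
$$b = \beta + \frac{\sum_{i=1}^{N}(x_i-\bar{x})(\varepsilon_i-\bar{\varepsilon})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$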
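The last line of the derivation can be checked numerically. The sketch below is an illustration only: it simulates data with assumed values for $\alpha$, $\beta$, and $\sigma$ (the names `ALPHA`, `BETA`, `SIGMA` are hypothetical), computes b from its definition, and compares it with $\beta$ plus the noise term.

```python
import random

rng = random.Random(7)
ALPHA, BETA, SIGMA, N = 2.0, 1.5, 1.0, 25   # assumed illustrative values

# Simulate the model y_i = alpha + beta * x_i + eps_i.
x = [rng.uniform(0.0, 5.0) for _ in range(N)]
eps = [rng.gauss(0.0, SIGMA) for _ in range(N)]
y = [ALPHA + BETA * xi + ei for xi, ei in zip(x, eps)]

xbar = sum(x) / N
ybar = sum(y) / N
ebar = sum(eps) / N
sxx = sum((xi - xbar) ** 2 for xi in x)

# Slope computed from the definition.
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# Slope written as beta plus a weighted sum of the disturbances.
noise_term = sum((xi - xbar) * (ei - ebar) for xi, ei in zip(x, eps)) / sxx
b_decomposed = BETA + noise_term

print("b from definition:   ", b)
print("beta + noise term:   ", b_decomposed)
```

The two numbers agree up to floating point rounding, which is the content of the final line: b differs from $\beta$ only through the disturbances.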
32/97
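(1) b is unbiased and linear in $\varepsilon$
$$b = \beta + \frac{\sum_{i=1}^{N}(x_i-\bar{x})(\varepsilon_i-\bar{\varepsilon})}{\sum_{i=1}^{N}(x_i-\bar{x})^2} = \beta + \frac{\sum_{i=1}^{N}(x_i-\bar{x})\,\varepsilon_i}{\sum_{i=1}^{N}(x_i-\bar{x})^2} - \frac{\bar{\varepsilon}\sum_{i=1}^{N}(x_i-\bar{x})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
The last term is zero because $\sum_{i=1}^{N}(x_i-\bar{x}) = 0$, so
$$b = \beta + \sum_{i=1}^{N} w_i \varepsilon_i, \qquad w_i = \frac{x_i-\bar{x}}{\sum_{i=1}^{N}(x_i-\bar{x})^2} \qquad \text{(Linearity in } \varepsilon_i\text{)}$$
$$E[b \mid x_1,\ldots,x_N] = \beta + E\left[\sum_{i=1}^{N} w_i \varepsilon_i \,\Big|\, x_i\right] = \beta + \sum_{i=1}^{N} w_i\, E[\varepsilon_i \mid x_i] = \beta \qquad \text{(Unbiasedness)}$$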
33/97
(2) b is efficient
• Variance of b is smallest among linear unbiased estimators.
• Gauss-Markov Theorem: Like Rao-Blackwell. (Proof in Greene)
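A small numeric comparison can make the efficiency claim concrete. The sketch below is illustrative, not a proof: it compares the least squares slope with one other linear unbiased estimator, the "endpoint" slope $(y_N - y_1)/(x_N - x_1)$, chosen here only as an example. It relies on two standard facts: a linear estimator $\sum c_i y_i$ is unbiased for $\beta$ when $\sum c_i = 0$ and $\sum c_i x_i = 1$, and its variance under homoscedastic, uncorrelated disturbances is $\sigma^2 \sum c_i^2$. The `x` values and `sigma2` are hypothetical.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical regressor values
sigma2 = 1.0                          # assumed disturbance variance
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Least squares weights: b = sum(w_i * y_i) with w_i = (x_i - xbar) / Sxx.
w_ols = [(xi - xbar) / sxx for xi in x]

# Endpoint estimator (y_N - y_1) / (x_N - x_1) written as linear weights.
w_end = [0.0] * n
w_end[0] = -1.0 / (x[-1] - x[0])
w_end[-1] = 1.0 / (x[-1] - x[0])

def is_unbiased(weights, tol=1e-12):
    """Unbiased for beta when the weights sum to 0 and sum(c_i * x_i) is 1."""
    return (abs(sum(weights)) < tol
            and abs(sum(c * xi for c, xi in zip(weights, x)) - 1.0) < tol)

def variance(weights):
    """Variance of a linear estimator: sigma^2 times the sum of squared weights."""
    return sigma2 * sum(c ** 2 for c in weights)

print("OLS unbiased:     ", is_unbiased(w_ols), " variance:", variance(w_ols))
print("Endpoint unbiased:", is_unbiased(w_end), " variance:", variance(w_end))
```

Both estimators are unbiased, but the least squares variance $\sigma^2 / \sum (x_i - \bar{x})^2$ is the smaller of the two, as the theorem says it must be for any competing linear unbiased estimator.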
34/97
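(3) b is consistent
$$b = \beta + \sum_{i=1}^{N} w_i \varepsilon_i, \qquad w_i = \frac{x_i-\bar{x}}{\sum_{i=1}^{N}(x_i-\bar{x})^2} \qquad \text{(Linearity in } \varepsilon_i\text{)}$$
$$\mathrm{Var}[b \mid \text{data}] = \sum_{i=1}^{N} w_i^2 \sigma^2 = \sigma^2 \sum_{i=1}^{N}\left[\frac{x_i-\bar{x}}{\sum_{i=1}^{N}(x_i-\bar{x})^2}\right]^2 = \frac{\sigma^2 \sum_{i=1}^{N}(x_i-\bar{x})^2}{\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]^2} = \frac{\sigma^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2} \to 0$$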
35/97
Consistency: N=52 vs. N=520
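The comparison of N=52 with N=520 can be imitated with simulated data. This is a sketch under assumptions: the data generating values (`alpha`, `beta`, `sigma`, and the uniform range for x) are hypothetical and are not the data behind the slide.

```python
import random
import statistics

def simulate_slopes(n, reps, rng, alpha=1.0, beta=0.5, sigma=2.0):
    """Return `reps` least squares slopes, each from a fresh sample of size n."""
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        xbar = sum(x) / n
        ybar = sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes

rng = random.Random(2024)
sd_52 = statistics.stdev(simulate_slopes(52, 500, rng))
sd_520 = statistics.stdev(simulate_slopes(520, 500, rng))

print("std dev of b, N=52: ", sd_52)
print("std dev of b, N=520:", sd_520)
print("ratio:", sd_52 / sd_520)
```

Because Var[b] is $\sigma^2 / \sum (x_i - \bar{x})^2$ and the denominator grows roughly in proportion to N, the spread of b should shrink by a factor of about $\sqrt{10}$ when the sample is ten times larger.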
36/97
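a is unbiased and consistent
$$a = \bar{y} - b\bar{x} = (\alpha + \beta\bar{x} + \bar{\varepsilon}) - \left(\beta + \sum_{i=1}^{N} w_i \varepsilon_i\right)\bar{x} = \alpha + \sum_{i=1}^{N}\left(\tfrac{1}{N} - \bar{x} w_i\right)\varepsilon_i$$
$$E[a \mid \text{data}] = \alpha + \sum_{i=1}^{N}\left(\tfrac{1}{N} - \bar{x} w_i\right)E[\varepsilon_i \mid \text{data}] = \alpha$$
$$\mathrm{Var}[a \mid \text{data}] = \sigma^2 \sum_{i=1}^{N}\left[\frac{1}{N^2} + \frac{\bar{x}^2 (x_i-\bar{x})^2}{\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]^2} - \frac{2\bar{x}(x_i-\bar{x})}{N\sum_{i=1}^{N}(x_i-\bar{x})^2}\right] = \sigma^2\left[\frac{1}{N} + \frac{\bar{x}^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}\right] \to 0$$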
37/97
Covariance of a and b
$$a = \alpha + \sum_{i=1}^{N}\left(\tfrac{1}{N} - \bar{x} w_i\right)\varepsilon_i, \qquad b = \beta + \sum_{i=1}^{N} w_i \varepsilon_i$$
$$\mathrm{Cov}[a, b] = \sum_{i=1}^{N} \mathrm{Cov}\left[\left(\tfrac{1}{N} - \bar{x} w_i\right)\varepsilon_i,\; w_i \varepsilon_i\right]$$
(cross terms are all zero because of independence)
$$= \sigma^2 \sum_{i=1}^{N}\left(\tfrac{1}{N} - \bar{x} w_i\right) w_i = \sigma^2 \frac{1}{N}\sum_{i=1}^{N} w_i - \sigma^2 \sum_{i=1}^{N} \bar{x} w_i^2 \qquad \text{(first term is zero)}$$
$$= -\sigma^2 \bar{x} \sum_{i=1}^{N} w_i^2 = \frac{-\sigma^2 \bar{x}}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
(We will need this result later)
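The variance and covariance formulas for a and b follow from their weights on the disturbances, so they can be verified directly from those weights. The sketch below does this for a made-up set of x values; `x` and `sigma2` are hypothetical, and the weight names `w` and `v` are introduced only for this illustration.

```python
x = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7]   # hypothetical regressor values
sigma2 = 4.0                          # assumed disturbance variance
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# b = beta + sum(w_i * eps_i) and a = alpha + sum(v_i * eps_i).
w = [(xi - xbar) / sxx for xi in x]
v = [1.0 / n - xbar * wi for wi in w]

# Moments computed from the weights (disturbances uncorrelated, variance sigma2).
var_b = sigma2 * sum(wi ** 2 for wi in w)
var_a = sigma2 * sum(vi ** 2 for vi in v)
cov_ab = sigma2 * sum(vi * wi for vi, wi in zip(v, w))

# Closed forms from the derivations.
var_b_closed = sigma2 / sxx
var_a_closed = sigma2 * (1.0 / n + xbar ** 2 / sxx)
cov_ab_closed = -sigma2 * xbar / sxx

print("Var[b]:  ", var_b, var_b_closed)
print("Var[a]:  ", var_a, var_a_closed)
print("Cov[a,b]:", cov_ab, cov_ab_closed)
```

Each pair of numbers agrees up to rounding. The covariance is negative whenever $\bar{x}$ is positive: overestimating the slope tends to go with underestimating the intercept.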
38/97
Inference about $\beta$
• Have derived expected value and variance of b.
• b is a 'point' estimator.
• Looking for a way to form a confidence interval.
• Need a distribution and a pivotal statistic to use.
39/97
Normality
Additional assumption about the data: $\varepsilon_i$ comes from a normal population with mean 0 and variance $\sigma^2$.
$$b = \beta + \sum_{i=1}^{N} w_i \varepsilon_i$$
$$E[b \mid \text{data}] = \beta, \qquad \mathrm{Var}[b \mid \text{data}] = \frac{\sigma^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2} = \sigma_b^2$$
Now, we also have that b has a normal distribution with this mean and variance.
40/97
Confidence Interval
$$\frac{b-\beta}{\sigma_b} \sim N[0,1]$$
$$\mathrm{Prob}\left[\left|\frac{b-\beta}{\sigma_b}\right| > z\right] = 2(1-\Phi(z))$$
E.g., if z = 1.96, Prob[b is more than 1.96 standard deviations from $\beta$] is $2(1-\Phi(1.96)) = 2(1-.975) = 2(.025) = .05$
The usual confidence interval calculation, applied to b:
$$\mathrm{Prob}[\,b - z\sigma_b < \beta < b + z\sigma_b\,] = 1 - 2(1-\Phi(z))$$
********* We need to know $\sigma$ to use this. ***********
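The normal probabilities in this calculation can be reproduced with the standard library. The sketch below writes $\Phi$ in terms of `math.erf`; the function names `phi`, `two_sided_tail`, and `coverage` are introduced only for this illustration.

```python
import math

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_tail(z):
    """Prob[|Z| > z] for a standard normal Z: 2 * (1 - Phi(z))."""
    return 2.0 * (1.0 - phi(z))

def coverage(z):
    """Prob[-z < Z < z], the coverage of the interval b +/- z * sigma_b."""
    return 1.0 - two_sided_tail(z)

for z in (1.645, 1.96, 2.576):
    print(f"z = {z:5.3f}  tail = {two_sided_tail(z):.4f}  coverage = {coverage(z):.4f}")
```

For z = 1.96 the tail probability is about .05 and the coverage about .95, matching the worked example.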
41/97
Estimating sigma squared
$$s_e^2 = \frac{\sum_{i=1}^{N} e_i^2}{N-2} = \frac{\sum_{i=1}^{N}(y_i - a - b x_i)^2}{N-2}$$
Is an unbiased estimator of $\sigma^2$. (Proof later)
Use
$$s_b^2 = \frac{s_e^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2} \quad \text{as an estimator of} \quad \sigma_b^2 = \frac{\sigma^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
$$\frac{(N-2)\,s_e^2}{\sigma^2}$$
has a chi squared distribution with N-2 degrees of freedom.
42/97
Usable Confidence Interval
• Use $s_b$ instead of $\sigma_b$.
• Use t distribution instead of normal.
• Critical t depends on degrees of freedom.
• $b - t s_b < \beta < b + t s_b$
43/97
Slope Estimator
Results
$b = 72.7181$
$\sum_{i=1}^{N} e_i^2 = 10751.6$
$N = 62$
$\sum_{i=1}^{N} (\text{cntwait3}_i - \overline{\text{cntwait3}})^2 = 1.49654$
$s_e^2 = 13.38627^2$
$s_b = 10.94249$
$t^* = 2.0003$
Confidence interval = $72.7181 \pm 2.0003(10.94249)$
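The arithmetic behind these results can be retraced from the reported sums. The sketch below uses only the numbers shown on the slide and takes the critical value t* = 2.0003 as given rather than computing it; the variable names are chosen for this illustration.

```python
import math

# Quantities reported on the slide.
b = 72.7181        # slope estimate
sse = 10751.6      # sum of squared residuals
n = 62             # number of observations
sxx = 1.49654      # sum of squared deviations of cntwait3 from its mean
t_star = 2.0003    # critical t for 60 degrees of freedom, as given

s_e = math.sqrt(sse / (n - 2))    # standard error of the regression
s_b = s_e / math.sqrt(sxx)        # standard error of the slope
half_width = t_star * s_b
lower, upper = b - half_width, b + half_width

print(f"s_e = {s_e:.5f}")
print(f"s_b = {s_b:.5f}")
print(f"interval: {lower:.4f} to {upper:.4f}")
```

The computed s_e and s_b match the slide values to rounding. The interval comes out near 50.83 to 94.61, slightly wider than the 51.2712 to 94.1650 printed in the regression output; the printed interval appears consistent with a multiplier of about 1.96 rather than 2.0003.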
44/97
Regression Results
-----------------------------------------------------------------------------
Ordinary     least squares regression ............
LHS=BOX      Mean                 =       20.72065
             Standard deviation   =       17.49244
----------   No. of observations  =             62  DegFreedom   Mean square
Regression   Sum of Squares       =        7913.58           1    7913.57745
Residual     Sum of Squares       =        10751.5          60     179.19235
Total        Sum of Squares       =        18665.1          61     305.98555
----------   Standard error of e  =       13.38627  Root MSE        13.16860
Fit          R-squared            =         .42398  R-bar squared     .41438
Model test   F[  1,    60]        =       44.16247  Prob F > F*       .00000
--------+--------------------------------------------------------------------
        |                  Standard            Prob.       95% Confidence
     BOX|  Coefficient       Error        t    |t|>T*         Interval
--------+--------------------------------------------------------------------
Constant|   -14.3600**      5.54587    -2.59   .0121    -25.2297    -3.4903
CNTWAIT3|    72.7181***    10.94249     6.65   .0000     51.2712    94.1650
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
45/97
Hypothesis Test about $\beta$
• Outside the confidence interval is the rejection region for hypothesis tests about $\beta$.
• For the internet buzz regression, the confidence interval is 51.2712 to 94.1650.
• The hypothesis that $\beta$ equals zero is rejected.
Statistics and Data Analysis
Part 7-3 – Prediction
47/97
Predicting y Using the Regression
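Actual $y_0$ is $\alpha + \beta x_0 + \varepsilon_0$
Prediction is $\hat{y}_0 = a + b x_0 + 0$
Error is $y_0 - \hat{y}_0 = (\alpha - a) + (\beta - b)x_0 + \varepsilon_0$
Variance of the error is
$\mathrm{Var}[a] + x_0^2\,\mathrm{Var}[b] + 2x_0\,\mathrm{Cov}[a,b] + \mathrm{Var}[\varepsilon_0]$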
48/97
Prediction Variance
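$$\mathrm{Var}[a] + x_0^2\,\mathrm{Var}[b] + 2x_0\,\mathrm{Cov}[a,b] + \mathrm{Var}[\varepsilon_0]$$
$$= \sigma^2\left[\frac{1}{N} + \frac{\bar{x}^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}\right] + x_0^2\,\frac{\sigma^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2} + 2x_0\left[\frac{-\bar{x}\,\sigma^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}\right] + \sigma^2$$
Collect the terms:
$$\mathrm{Var}[\text{prediction error}] = \sigma^2\left[1 + \frac{1}{N} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}\right] = \sigma^2\left[1+\frac{1}{N}\right] + (x_0-\bar{x})^2\,s_b^2$$
Here $s_b^2$ stands for $\mathrm{Var}[b] = \sigma^2/\sum_{i=1}^{N}(x_i-\bar{x})^2$.
To form a prediction interval, use $s^2$ to estimate $\sigma^2$ and use the t distribution.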
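As a quick numerical check of the algebra, the sketch below evaluates the variance both term by term and in collected form. The regressor values and the value of sigma squared are hypothetical, chosen only for illustration.

```python
def prediction_error_variance(x, x0, sigma2):
    """Return the prediction error variance term by term and in collected form."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)

    var_a = sigma2 * (1.0 / n + xbar ** 2 / sxx)   # Var[a]
    var_b = sigma2 / sxx                           # Var[b]
    cov_ab = -xbar * sigma2 / sxx                  # Cov[a, b]

    term_by_term = var_a + x0 ** 2 * var_b + 2 * x0 * cov_ab + sigma2
    collected = sigma2 * (1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)
    return term_by_term, collected


x = [0.1, 0.3, 0.4, 0.6, 0.9]   # hypothetical regressor values
results = []
for x0 in (0.46, 0.76, 1.5):     # hypothetical prediction points
    t, c = prediction_error_variance(x, x0, sigma2=4.0)
    results.append((x0, t, c))
    print(x0, round(t, 6), round(c, 6))
```

The two columns agree for every prediction point, and the variance grows as the point moves away from the sample mean of x.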
49/97
Quantum of Solace

Actual Box = $67.528882M
a = -14.36, b = 72.7181, N = 62, sb = 10.94249, s = 13.38632
buzz = 0.76, prediction = 40.906
Mean buzz = 0.4824194
(buzz - mean)² = 1.49654
s_forecast = 13.8314252
Prediction interval = 40.906 ± 2.003(13.831425)
= 13.239 to 68.527
(Note: The interval contains the actual value.)
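The arithmetic on this slide can be retraced from the listed inputs. The sketch below is a recomputation, not the author's code. It treats 13.38632 as the regression standard error s. Recomputing gives a squared deviation of about 0.077 rather than the printed 1.49654, yet reproduces the forecast standard error of about 13.83, so the printed squared deviation may be a slip. The recomputed interval endpoints differ slightly from the printed ones.

```python
import math

# Values listed on the slide.
a, b = -14.36, 72.7181    # intercept and slope
n = 62                    # sample size
sb = 10.94249             # standard error of the slope
s = 13.38632              # regression standard error (assumed reading of the slide)
buzz = 0.76               # buzz value for the film being predicted
mean_buzz = 0.4824194     # sample mean of buzz
t_crit = 2.003            # critical value used on the slide
actual = 67.528882        # actual box office, $ millions

prediction = a + b * buzz
dev2 = (buzz - mean_buzz) ** 2
s_forecast = math.sqrt(s ** 2 * (1 + 1 / n) + dev2 * sb ** 2)
lower = prediction - t_crit * s_forecast
upper = prediction + t_crit * s_forecast

print(f"prediction        = {prediction:.3f}")
print(f"(buzz - mean)^2   = {dev2:.5f}")
print(f"s_forecast        = {s_forecast:.4f}")
print(f"interval          = {lower:.3f} to {upper:.3f}")
print(f"contains actual?  = {lower <= actual <= upper}")
```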
50/97
Forecasting Out of Sample
Regression Analysis: G versus Income
The regression equation is
G = 1.93 + 0.000179 Income

Predictor   Coef         SE Coef      T       P
Constant    1.9280       0.1651       11.68   0.000
Income      0.00017897   0.00000934   19.17   0.000

S = 0.370241   R-Sq = 88.0%   R-Sq(adj) = 87.8%

[Figure: Fitted line plot with 95% PI, G = 1.928 + 0.000179 Income. Per Capita Gasoline Consumption vs. Per Capita Income, 1953-2004.]

How to predict G for 2012? You would need first to predict Income for 2012.
How should we do that?
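One way to make the dependence on the income forecast concrete is to evaluate the fitted line under a few income scenarios. The coefficients come from the regression output above; the scenario incomes are purely hypothetical and lie beyond the plotted sample range, so each result is an extrapolation.

```python
# Coefficients from the fitted regression of G on Income.
A, B = 1.9280, 0.00017897


def predict_g(income):
    """Point prediction of per capita gasoline consumption at a given income."""
    return A + B * income


# Hypothetical 2012 income scenarios; none of these is a forecast from the source.
income_scenarios = {"low": 28000.0, "middle": 30000.0, "high": 32000.0}
for label, income in income_scenarios.items():
    print(f"{label:>6}: income = {income:.0f} -> predicted G = {predict_g(income):.3f}")
```

The spread of predicted G across scenarios shows how uncertainty about Income carries straight through to the forecast of G, before any regression uncertainty is added.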
51/97
The Extrapolation Penalty
The interval is narrowest at x* = x̄, the center of our experience.
The interval widens as we move away from the center of our experience to reflect the greater uncertainty:
(1) Uncertainty about the prediction of x
(2) Uncertainty that the linear relationship will continue to exist as we move farther from the center.

$$\text{Prediction} \pm 1.96\sqrt{s_e^2\left[1+\frac{1}{N}\right] + (x^*-\bar{x})^2\,(\mathrm{SE}(b))^2}$$

[Figure: Fitted line plot with 95% PI, Weight = 40.00 + 3.000 TLength; S = 22.3607, R-Sq = 36.0%, R-Sq(adj) = 28.0%.]
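The widening can be seen by evaluating the half-width of the interval at points progressively farther from the mean. The sketch below borrows the buzz regression values reported elsewhere in these slides as inputs; the grid of x* values is hypothetical.

```python
import math

# Inputs borrowed from the buzz regression (assumed to apply here for illustration).
s_e = 13.38632       # regression standard error
n = 62               # sample size
se_b = 10.94249      # standard error of the slope
x_bar = 0.4824194    # sample mean of the regressor


def half_width(x_star, z=1.96):
    """Half-width of the prediction interval at x_star."""
    variance = s_e ** 2 * (1 + 1 / n) + (x_star - x_bar) ** 2 * se_b ** 2
    return z * math.sqrt(variance)


for x_star in (x_bar, 0.76, 1.0, 2.0, 4.0):   # hypothetical prediction points
    print(f"x* = {x_star:.4f}  half-width = {half_width(x_star):.2f}")
```

The half-width is smallest at the sample mean and grows with the distance from it. This captures only the statistical part of the penalty; the risk that the linear relationship breaks down far from the center is not reflected in the formula.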
52/97
Normality
Necessary for t statistics and confidence intervals
Do the residuals reveal whether the disturbances are normal?
Standard tests and devices
53/97
Normally Distributed Residuals?
--------------------------------------
Kolmogorov-Smirnov test of F(E) vs. Normal[.00000, 13.27610^2]
******* K-S test statistic = .1810063
******* 95% critical value = .1727202
******* 99% critical value = .2070102
Normality hyp. should be rejected.
--------------------------------------
The statistic exceeds the 95% critical value but not the 99% critical value, so normality is rejected at the 5% level but not at the 1% level.
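The critical values in this output can be reproduced with the usual large-sample approximations 1.36/√N and 1.63/√N. The sketch below assumes N = 62, the sample size of the movie regression; that the test was run on those 62 residuals is an assumption, though it matches the printed critical values.

```python
import math

n = 62                  # assumed number of residuals
ks_stat = 0.1810063     # reported K-S statistic

# Large-sample approximations to the Kolmogorov-Smirnov critical values.
crit_95 = 1.36 / math.sqrt(n)
crit_99 = 1.63 / math.sqrt(n)

reject_5pct = ks_stat > crit_95
reject_1pct = ks_stat > crit_99

print(f"95% critical value = {crit_95:.7f}, reject: {reject_5pct}")
print(f"99% critical value = {crit_99:.7f}, reject: {reject_1pct}")
```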
54/97
Nonnormal Disturbances
Appeal to the central limit theorem
Use standard normal instead of t
t is essentially normal if N > 50.
Statistics and Data Analysis
Part 7-4 – Multiple Regression
56/97
Box Office and Movie Buzz
57/97
Box Office and Budget
58/97
Budget and Buzz Effects
59/97
An Enduring Art Mystery
Graphics show relative
sizes of the two works.
The Persistence
of Statistics.
Rice, 2007
Why do larger
paintings command
higher prices?
The Persistence of
Memory. Salvador
Dali, 1931
60/97
The Data
[Figures: Histogram of ln (US$); Histogram of ln (SurfaceArea)]
Note: Using logs in this context. This is common when analyzing financial
measurements (e.g., price) and when percentage changes are more interesting
than unit changes. (E.g., what is the % premium when the painting is 10% larger?)
61/97
Monet in Large and Small
Sale prices of 328 signed Monet paintings
[Figure: Fitted line plot of ln (US$) vs ln (SurfaceArea)]
ln (US$) = 2.825 + 1.725 ln (SurfaceArea)
S = 1.00645   R-Sq = 20.0%   R-Sq(adj) = 19.8%
The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.
Log of $price = a + b log surface area + e
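Because both variables are in logs, the slope acts as an elasticity. The sketch below uses the fitted coefficients from this slide to work out the percentage premium for a painting that is 10% larger. The surface area of 1000 is a hypothetical value, and the back-transformed price ignores any retransformation correction.

```python
import math

a, b = 2.825, 1.725   # fitted intercept and slope of the log-log regression


def predicted_price(surface_area):
    """Naive back-transformed price prediction, exp(a + b ln area)."""
    return math.exp(a + b * math.log(surface_area))


def premium(pct_larger):
    """Proportional price change implied by a pct_larger percent increase in area."""
    return (1 + pct_larger / 100) ** b - 1


area = 1000.0   # hypothetical surface area
ratio = predicted_price(1.10 * area) / predicted_price(area) - 1

print(f"premium for 10% larger painting: {premium(10):.1%}")
print(f"same figure from predicted prices: {ratio:.1%}")
```

The premium does not depend on the starting area, which is what a constant elasticity means.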
62/97
Monet Regression: There seems to
be a regression. Is there a theory?
63/97
How much for the signature?

The sample also contains 102 unsigned paintings.

Average Sale Price
Signed       $3,364,248
Not signed   $1,832,712

The average price of signed Monets is almost twice that of unsigned ones.
64/97
Can we separate the two effects?
Average Prices
            Small      Large
Unsigned    346,845    5,795,000
Signed      689,422    5,556,490
What do the data suggest?
(1) The size effect is huge
(2) The signature effect is confined to the
small paintings.
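The two conclusions can be read off as ratios of the cell averages. The sketch below uses only the four averages in the table; the dictionary layout and names are illustrative.

```python
# Average prices by signature status and size, from the table.
avg_price = {
    ("unsigned", "small"): 346_845,
    ("unsigned", "large"): 5_795_000,
    ("signed", "small"): 689_422,
    ("signed", "large"): 5_556_490,
}

# Size effect: large relative to small, within each signature group.
size_effect = {
    s: avg_price[(s, "large")] / avg_price[(s, "small")]
    for s in ("unsigned", "signed")
}

# Signature effect: signed relative to unsigned, within each size group.
signature_effect = {
    z: avg_price[("signed", z)] / avg_price[("unsigned", z)]
    for z in ("small", "large")
}

for s, r in size_effect.items():
    print(f"size effect ({s}): large / small = {r:.2f}")
for z, r in signature_effect.items():
    print(f"signature effect ({z}): signed / unsigned = {r:.2f}")
```

Large paintings sell for many times the price of small ones in both groups, while the signed-to-unsigned ratio is near two for small paintings and just below one for large ones.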
65/97
A Multiple Regression
[Figure: Scatterplot of ln (US$) vs ln (SurfaceArea), grouped by Signed (0, 1), with b2 marked on the plot]
Ln Price = a + b1 ln Area + b2 (0 if unsigned, 1 if signed) + e
66/97
Monet Multiple Regression
Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed
The regression equation is
ln (US$) = 4.12 + 1.35 ln (SurfaceArea) + 1.26 Signed

Predictor          Coef     SE Coef   T       P
Constant           4.1222   0.5585    7.38    0.000
ln (SurfaceArea)   1.3458   0.08151   16.51   0.000
Signed             1.2618   0.1249    10.11   0.000

S = 0.992509   R-Sq = 46.2%   R-Sq(adj) = 46.0%

Interpretation (to be explored as we develop the topic):
(1) Elasticity of price with respect to surface area is 1.3458 – very large
(2) The signature multiplies the price by exp(1.2618) (about 3.5), for any
given size.
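The second interpretation can be checked directly from the fitted equation: the ratio of predicted prices for a signed and an unsigned painting of the same size is exp(b2), whatever the size. The sketch below uses the reported coefficients; the three values of ln area are hypothetical points within the plotted range.

```python
import math

a, b1, b2 = 4.1222, 1.3458, 1.2618   # reported coefficients


def ln_price(ln_area, signed):
    """Fitted log price; signed is 1 for a signed painting, 0 otherwise."""
    return a + b1 * ln_area + b2 * signed


signature_multiplier = math.exp(b2)
print(f"signature multiplier exp(b2) = {signature_multiplier:.3f}")

ratios = []
for ln_area in (6.0, 6.8, 7.6):      # hypothetical sizes
    ratio = math.exp(ln_price(ln_area, 1) - ln_price(ln_area, 0))
    ratios.append(ratio)
    print(f"ln area = {ln_area}: signed / unsigned = {ratio:.3f}")
```

Graphically, the two groups share the slope b1 and differ only by a vertical shift of b2 in log price.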
67/97
Ceteris Paribus in Theory

Demand for gasoline:
G = f(price,income)

Demand (price) elasticity:
eP = %change in G given %change in P
holding income constant.

How do you do that in the real world?

The “percentage changes”
How to change price and hold income
constant?
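With observational data, one answer is to let multiple regression do the holding constant. The sketch below is an illustration on synthetic data, not the gasoline data: the series, the trend coefficients, and the "true" elasticities (1.0 for income, -0.3 for price) are all made up. Price and income both trend upward, so a regression of log G on log price alone picks up the income trend and gets the wrong sign, while including log income recovers the assumed price elasticity.

```python
import math


def ols(columns, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k, n = len(columns), len(y)
    # Augmented matrix [X'X | X'y].
    m = []
    for i in range(k):
        row = [sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(k)]
        row.append(sum(columns[i][t] * y[t] for t in range(n)))
        m.append(row)
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(m[r][i]))   # partial pivoting
        m[i], m[p] = m[p], m[i]
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(k):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return [row[k] for row in m]


# Synthetic annual series (hypothetical, no noise): both regressors trend upward.
years = range(52)
ones = [1.0] * 52
log_income = [9.0 + 0.03 * t for t in years]
log_price = [3.0 + 0.05 * t + 0.3 * math.sin(t) for t in years]
log_g = [1.0 + 1.0 * li - 0.3 * lp for li, lp in zip(log_income, log_price)]

simple = ols([ones, log_price], log_g)                 # price only
multiple = ols([ones, log_income, log_price], log_g)   # price, holding income constant

print(f"price coefficient, simple regression:   {simple[1]:.3f}")
print(f"price coefficient, multiple regression: {multiple[2]:.3f}")
```

The coefficient on log price in the multiple regression is the estimated percentage change in G for a one percent change in price with income held fixed.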
68/97
The Real World Data
69/97
U.S. Gasoline Market, 1953-2004
[Figure: Time Series Plot of logG, logIncome, logPg against Year, 1953-2004.]
70/97
Shouldn’t Demand Curves Slope Downward?
[Figure: Scatterplot of GasPrice vs G.]
71/97
A Thought Experiment
[Figure: Scatterplot of g vs Income.]
The main driver of gasoline consumption is income, not price.
Income is growing over time.
We are not holding income constant when we change price!
How do we do that?
72/97
How to Hold Income Constant?
Multiple Regression Using Price and Income
Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor   Coef          SE Coef      T       P
Constant    0.13449       0.02081      6.46    0.000
GasPrice    -0.0016281    0.0004152    -3.92   0.000
Income      0.00002634    0.00000231   11.43   0.000

It looks like the theory works.
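
The sign reversal between the scatterplot and the multiple regression can be reproduced with a small simulation. The sketch below does not use the gasoline data; it assumes a hypothetical data-generating process (invented coefficients, trend, and noise levels) in which income trends upward, price drifts up with income, and consumption falls with price once income is held fixed. The helper `ols` is an illustrative name, not part of the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 52  # one observation per year, 1953-2004 (synthetic values, not the real data)

# Hypothetical process: income trends upward, price drifts up with income,
# and consumption rises with income but falls with price.
income = np.linspace(10000, 27500, n) + rng.normal(0, 300, n)
price = 20 + 0.004 * (income - 10000) + rng.normal(0, 8, n)
g = 0.13 - 0.0016 * price + 0.000026 * income + rng.normal(0, 0.01, n)

def ols(y, *columns):
    """Least squares coefficients for y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

simple = ols(g, price)            # price only: income is not held constant
multiple = ols(g, price, income)  # price and income together

print("price slope, simple regression:  ", simple[1])
print("price slope, multiple regression:", multiple[1])
```

Under these assumptions the price slope is positive when income is left out, because price is standing in for the income trend, and negative once income enters the regression.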
Statistics and Data Analysis
Linear Multiple
Regression Model
74/97
Classical Linear Regression
Model

The model is y = f(x1,x2,…,xK,β1,β2,…,βK) + ε
= a multiple regression model.
Important examples:
 Marginal cost in a multiple output setting
 Separate age and education effects in an earnings equation.
 Denote (x1,x2,…,xK) as x. Boldface symbol = vector.
‘Dependent’ and ‘independent’ variables.
 Independent of what? Think in terms of autonomous variation.
 Can y just ‘change?’ What ‘causes’ the change?
Form of the model – E[y|x] = a linear function of x.
75/97
Model Assumptions: Generalities

Linearity means linear in the parameters. We’ll return to this issue shortly.
Identifiability. It is not possible in the context of the model for two different sets of parameters to produce the same value of E[y|x] for all x vectors. (It is possible for some x.)
Conditional expected value of the deviation of an observation from the conditional mean function is zero.
Form of the variance of the random variable around the conditional mean is specified.
Nature of the process by which x is observed.
Assumptions about the specific probability distribution.
76/97
Linearity of the Model



E[y|x] = β1*1 + β2*x2 + … + βK*xK.
(β1*1 = the intercept term).
Boldface letter indicates a column vector. “x” denotes a
variable, a function of a variable, or a function of a set of
variables.
There are K “variables” on the right hand side of the
conditional mean “function.”
The first “variable” is usually a constant term. (Wisdom:
Models should have a constant term unless the theory says
they should not.)
f(x1,x2,…,xK,β1,β2,…,βK) = x1β1 + x2β2 + … + xKβK
Notation: x1β1 + x2β2 + … + xKβK = x'β.
77/97
Linearity


Linearity means linear in the parameters, not in the variables.
E[y|x] = β1 f1(…) + β2 f2(…) + … + βK fK(…).
fk() may be any function of data.
Examples:
Logs and levels in economics
Time trends, and time trends in loglinear models – rates of growth
Dummy variables
Quadratics, power functions, log-quadratic, trig functions, interactions and so on.
78/97
Linearity
Simple linear model: E[y|x] = x'β
Quadratic model: E[y|x] = α + β1x + β2x²
Loglinear model: E[ln y|x] = α + Σk βk ln xk
Semilog: E[y|x] = α + Σk βk ln xk
All are “linear.” An infinite number of variations.
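
As an illustration of "linear in the parameters," the quadratic model can be fit with ordinary least squares by treating 1, x, and x² as three columns of data. The sketch below uses synthetic data with assumed values α = 1, β1 = 2, β2 = -0.5; none of these numbers come from the course.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 200)

# Hypothetical quadratic conditional mean: alpha = 1, beta1 = 2, beta2 = -0.5
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.1, x.size)

# Each column is a function of the data: f1 = 1, f2 = x, f3 = x^2.
# The model is nonlinear in x but linear in (alpha, beta1, beta2).
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimates of (alpha, beta1, beta2):", b)
```

The same pattern covers the loglinear and semilog forms: transform the data first, then fit a linear regression on the transformed columns.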
79/97
Matrix Notation
x_ik = x_(row,column) = x_(observation,variable)
Observation 1: y1 = β1 x11 + β2 x12 + ... + βK x1K + ε1
Observation 2: y2 = β1 x21 + β2 x22 + ... + βK x2K + ε2
Observation 3: y3 = β1 x31 + β2 x32 + ... + βK x3K + ε3
...
Observation i: yi = β1 xi1 + β2 xi2 + ... + βK xiK + εi
...
Observation N: yN = β1 xN1 + β2 xN2 + ... + βK xNK + εN
80/97
Notation
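Define column vectors of N observations on y and the K x variables.
\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1K} \\
x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \vdots &        & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NK}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}
\]
y = Xβ + ε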
The assumption means that the rank of the matrix X is K.
No linear dependencies => FULL COLUMN RANK of the matrix X.
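
A quick numerical check of the full column rank condition is to compare the rank of X with its number of columns. The sketch below uses a small made-up X (random columns, N = 6, K = 3) and then appends a column that is an exact linear combination of two others; `numpy.linalg.matrix_rank` is the only library call involved.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 6, 3

# Hypothetical data matrix: a constant plus two unrelated columns
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
rank_ok = np.linalg.matrix_rank(X)

# Append a column that is an exact linear combination of existing columns
X_bad = np.column_stack([X, 2.0 * X[:, 1] - X[:, 2]])
rank_bad = np.linalg.matrix_rank(X_bad)

print("columns:", X.shape[1], "rank:", rank_ok)          # full column rank
print("columns:", X_bad.shape[1], "rank:", rank_bad)     # rank falls short of columns
```

When the rank is smaller than the number of columns, at least one column adds no independent information and the corresponding coefficients cannot be separated.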
81/97
Uniqueness of the
Conditional Mean
The conditional mean relationship must hold for any set of N observations, i = 1,…,N. Assume that N ≥ K (justified later).
E[y1|x] = x1'β
E[y2|x] = x2'β
…
E[yN|x] = xN'β
All N observations at once: E[y|X] = Xβ = E_β.
82/97
Uniqueness of E[y|X]
Now, suppose there is a γ ≠ β that produces the same expected value,
E[y|X] = Xγ = E_γ.
Let δ = β - γ. Then,
Xδ = Xβ - Xγ = E_β - E_γ = 0.
Is this possible? X is an N×K matrix (N rows, K columns). What does Xδ = 0 mean? We assume this is not possible. This is the ‘full rank’ assumption. Ultimately, it will imply that we can ‘estimate’ β. This requires N ≥ K.
83/97
An Unidentified (But Valid)
Theory of Art Appreciation
Enhanced Monet Area Effect Model:
Height and Width Effects
Log(Price) = β1 + β2 log Area +
β3 log Aspect Ratio +
β4 log Height +
β5 Signature +
ε
(Aspect Ratio = Height/Width)
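
Why this model is unidentified can be seen numerically. Since log Area = log Height + log Width and log Aspect Ratio = log Height - log Width, it follows that log Area + log Aspect Ratio - 2 log Height = 0 for every painting. The sketch below builds X from hypothetical dimensions (random heights and widths, an alternating signature indicator, not the Monet data) and checks that a nonzero vector, here called `delta`, satisfies X·delta = 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50

# Hypothetical painting dimensions and signature indicator (not the Monet data)
height = rng.uniform(20, 40, n)
width = rng.uniform(20, 40, n)
signed = np.tile([0.0, 1.0], n // 2)

log_area = np.log(height * width)
log_aspect = np.log(height / width)
log_height = np.log(height)

# Columns: constant, log Area, log Aspect Ratio, log Height, Signature
X = np.column_stack([np.ones(n), log_area, log_aspect, log_height, signed])

# log Area + log Aspect Ratio - 2 log Height = 0 for every observation
delta = np.array([0.0, 1.0, 1.0, -2.0, 0.0])

rank = np.linalg.matrix_rank(X)
max_abs = np.max(np.abs(X @ delta))

print("columns:", X.shape[1], "rank:", rank)
print("largest |X delta|:", max_abs)
```

Any β and β + c·delta give the same conditional mean, so the data cannot distinguish between them, even though the theory itself may be perfectly sensible.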
84/97
Conditional Homoscedasticity and
Nonautocorrelation
Disturbances provide no information about each other.
Var[εi | X] = σ²
Cov[εi, εj | X] = 0
\[
\begin{bmatrix}
\operatorname{Var}(\varepsilon_1) & \operatorname{Cov}(\varepsilon_1,\varepsilon_2) & \operatorname{Cov}(\varepsilon_1,\varepsilon_3) & \cdots & \operatorname{Cov}(\varepsilon_1,\varepsilon_N) \\
\operatorname{Cov}(\varepsilon_2,\varepsilon_1) & \operatorname{Var}(\varepsilon_2) & \operatorname{Cov}(\varepsilon_2,\varepsilon_3) & \cdots & \operatorname{Cov}(\varepsilon_2,\varepsilon_N) \\
\operatorname{Cov}(\varepsilon_3,\varepsilon_1) & \operatorname{Cov}(\varepsilon_3,\varepsilon_2) & \operatorname{Var}(\varepsilon_3) & \cdots & \operatorname{Cov}(\varepsilon_3,\varepsilon_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(\varepsilon_N,\varepsilon_1) & \operatorname{Cov}(\varepsilon_N,\varepsilon_2) & \operatorname{Cov}(\varepsilon_N,\varepsilon_3) & \cdots & \operatorname{Var}(\varepsilon_N)
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 \mathbf{I}
\]
85/97
Heteroscedasticity
Countries are ordered by the standard deviation of their 19 residuals.
Regression of log of per capita gasoline use on log of per capita income,
gasoline price and number of cars per capita for 18 OECD countries for 19
years. The standard deviation varies by country.
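
The ordering described above can be sketched as follows. This is not the OECD gasoline data: it assumes a synthetic panel of 18 groups observed for 19 periods, three made-up regressors, and a different disturbance standard deviation for each group. A pooled regression is fit, and the residual standard deviation is then computed group by group and sorted.

```python
import numpy as np

rng = np.random.default_rng(4)
n_groups, n_years = 18, 19

# Hypothetical setup: each group has its own disturbance standard deviation
true_sd = np.linspace(0.02, 0.20, n_groups)
group = np.repeat(np.arange(n_groups), n_years)

x = rng.normal(size=(n_groups * n_years, 3))      # three made-up regressors
X = np.column_stack([np.ones(len(x)), x])
beta = np.array([1.0, 0.5, -0.3, 0.2])            # assumed coefficients
y = X @ beta + rng.normal(size=len(x)) * true_sd[group]

# Pooled least squares fit and residuals
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Residual standard deviation within each group, then ordered smallest to largest
sd_by_group = np.array([resid[group == g].std(ddof=1) for g in range(n_groups)])
order = np.argsort(sd_by_group)

print(sd_by_group[order])
```

Under homoscedasticity these group standard deviations would differ only by sampling noise; a wide, systematic spread is the pattern the slide is pointing at.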
86/97
Autocorrelation
logG=β1 + β2logPg + β3logY + β4logPnc + β5logPuc + ε
87/97
Autocorrelation Results from
an Incomplete Model
88/97
Normal Distribution of ε

Used to facilitate finite sample derivations of certain test
statistics.

Observations are independent

Assumption will be unnecessary – we will use the central limit
theorem for the statistical results we need.
89/97
The Linear Model
Regression: E[y|X] = Xβ
y = Xβ + ε, N observations, K columns in X,
(usually including a column of ones for the intercept).
 Standard assumptions about X
 Standard assumptions about ε|X
 E[ε|X]=0, E[ε]=0 and Cov[ε,x]=0
Statistics and Data Analysis
Least Squares
91/97
Vocabulary

Some terms to be used in the discussion.
• Population characteristics and entities vs. sample quantities and analogs
• Residuals and disturbances
• Population regression line and sample regression
Objective: Learn about the conditional mean function. Estimate β and σ²
First step: Mechanics of fitting a line to a set of data
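The distinction between population entities and their sample analogs can be illustrated with simulated data. The following is a minimal sketch, not part of the original slides: the coefficient values, the disturbance standard deviation, and the sample size are hypothetical choices made only for the illustration. The disturbances ε are known here only because the data are simulated; with real data only the residuals e are observable.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200
beta = np.array([1.0, 2.0])   # hypothetical population coefficients
sigma = 0.5                   # hypothetical disturbance standard deviation

x = rng.uniform(0.0, 10.0, size=N)
X = np.column_stack([np.ones(N), x])   # column of ones for the intercept
eps = rng.normal(0.0, sigma, size=N)   # disturbances: population side, unobserved in practice
y = X @ beta + eps                     # population regression plus disturbance

# Sample analogs: least squares coefficients b and residuals e
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

print("beta (population):", beta)
print("b    (sample)    :", b)
print("sum of squared disturbances:", eps @ eps)
print("sum of squared residuals   :", e @ e)
```

The residuals are not the disturbances: b is chosen to make e'e as small as possible, so e'e cannot exceed the sum of squared disturbances computed at the population β.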
92/97
Least Squares
\[
\sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})
\]
A digression on multivariate calculus.
Matrix and vector derivatives.
Derivative of a scalar with respect to a vector
Derivative of a column vector wrt a row vector
Other derivatives
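For reference, the derivative rules most often needed for least squares are collected below. These are standard matrix calculus results rather than text from the slide; a is a constant K × 1 vector and A is a constant matrix, both introduced here only for the illustration.

```latex
\begin{aligned}
\frac{\partial\, \mathbf{a}'\mathbf{b}}{\partial \mathbf{b}} &= \mathbf{a}
  && \text{(scalar with respect to a column vector: a column vector)} \\
\frac{\partial\, \mathbf{b}'\mathbf{A}\mathbf{b}}{\partial \mathbf{b}} &= (\mathbf{A} + \mathbf{A}')\,\mathbf{b}
  = 2\mathbf{A}\mathbf{b} \ \text{ if } \mathbf{A} = \mathbf{A}'
  && \text{(quadratic form)} \\
\frac{\partial\, \mathbf{A}\mathbf{b}}{\partial \mathbf{b}'} &= \mathbf{A}
  && \text{(column vector with respect to a row vector: a matrix)}
\end{aligned}
```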
93/97
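Matrix Results
\[
\mathbf{X}'\mathbf{X} =
\begin{bmatrix}
\sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{iK} \\
\sum_{i=1}^{n} x_{i2}x_{i1} & \sum_{i=1}^{n} x_{i2}^2 & \cdots & \sum_{i=1}^{n} x_{i2}x_{iK} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n} x_{iK}x_{i1} & \sum_{i=1}^{n} x_{iK}x_{i2} & \cdots & \sum_{i=1}^{n} x_{iK}^2
\end{bmatrix}
\]
\[
\mathbf{X}'\mathbf{y} =
\begin{bmatrix}
\sum_{i=1}^{n} x_{i1}y_i \\
\sum_{i=1}^{n} x_{i2}y_i \\
\vdots \\
\sum_{i=1}^{n} x_{iK}y_i
\end{bmatrix}
\]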
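The element-by-element sums in X'X and X'y are exactly what the matrix products compute. The sketch below checks this on a small made-up data set; the numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical data: 4 observations, K = 3 columns (the first is the constant).
X = np.array([[1.0, 2.0, 5.0],
              [1.0, 3.0, 1.0],
              [1.0, 0.0, 4.0],
              [1.0, 6.0, 2.0]])
y = np.array([3.0, 1.0, 4.0, 2.0])
N, K = X.shape

# Build each entry as a sum over observations, as in the matrices above.
XtX_sums = np.zeros((K, K))
Xty_sums = np.zeros(K)
for i in range(N):
    for k in range(K):
        Xty_sums[k] += X[i, k] * y[i]
        for l in range(K):
            XtX_sums[k, l] += X[i, k] * X[i, l]

# The same quantities as matrix products.
XtX = X.T @ X
Xty = X.T @ y

print(np.allclose(XtX_sums, XtX), np.allclose(Xty_sums, Xty))
```

With a column of ones in the first position, the top-left entry of X'X is the number of observations, and X'X is symmetric because x_ik x_il = x_il x_ik.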
94/97
95/97
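Moment Matrices
\[
\mathbf{X}'\mathbf{X} = [\mathbf{1}\ \ \mathbf{Pg}\ \ \mathbf{Y}]'\,[\mathbf{1}\ \ \mathbf{Pg}\ \ \mathbf{Y}] =
\begin{bmatrix}
1^2 + \cdots + 1^2 & .925(1) + \cdots + 3.789(1) & 6036(1) + \cdots + 11934(1) \\
.925(1) + \cdots + 3.789(1) & .925^2 + \cdots + 3.789^2 & 6036(.925) + \cdots + 11934(3.789) \\
6036(1) + \cdots + 11934(1) & 6036(.925) + \cdots + 11934(3.789) & 6036^2 + \cdots + 11934^2
\end{bmatrix}
\]
\[
\mathbf{X}'\mathbf{y} = [\mathbf{1}\ \ \mathbf{Pg}\ \ \mathbf{Y}]'\,\mathbf{G} =
\begin{bmatrix}
1(129.7) + \cdots + 1(297.8) \\
.925(129.7) + \cdots + 3.789(297.8) \\
6036(129.7) + \cdots + 11934(297.8)
\end{bmatrix}
\]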
96/97
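Least Squares Normal Equations
\[
\frac{\partial (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{\partial \mathbf{b}} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = \mathbf{0}
\]
Dimensions: (1 × 1) / (K × 1) on the left; (-2)(N × K)'(N × 1) = (-2)(K × N)(N × 1) = K × 1 on the right.
Note: Derivative of 1 × 1 wrt K × 1 is a K × 1 vector.
Solution - Least squares normal equations: X'y = X'Xb
Assuming it exists: b = (X'X)⁻¹X'y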
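The normal equations can be solved numerically as a linear system. This is a minimal sketch on simulated data (the sample size, number of columns, and random seed are arbitrary assumptions). It uses `numpy.linalg.solve` on X'Xb = X'y instead of forming the inverse explicitly, which is the usual numerical practice, and then checks that the first-order condition holds at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # constant plus 2 regressors
y = rng.normal(size=N)

# Normal equations: X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# First-order condition: -2 X'(y - Xb) should be a K x 1 vector of zeros
e = y - X @ b
gradient = -2.0 * X.T @ e

print("b =", b)
print("largest |gradient| element:", np.max(np.abs(gradient)))
```

Because the gradient is -2X'e, the condition says the residuals are orthogonal to every column of X; with a constant in X, this includes the residuals summing to zero.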
97/97
Second Order Conditions
\[
\frac{\partial (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{\partial \mathbf{b}} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b})
\]
\[
\frac{\partial^2 (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{\partial \mathbf{b}\,\partial \mathbf{b}'}
= \frac{\partial \left[ \dfrac{\partial (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{\partial \mathbf{b}} \right]}{\partial \mathbf{b}'}
= \frac{\partial\,(\text{column vector})}{\partial\,(\text{row vector})}
= \frac{\partial \left[ -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) \right]}{\partial \mathbf{b}'}
= 2\mathbf{X}'\mathbf{X}
\]
98/97
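Does b Minimize e’e?
\[
\frac{\partial^2 \mathbf{e}'\mathbf{e}}{\partial \mathbf{b}\,\partial \mathbf{b}'} = 2\mathbf{X}'\mathbf{X} = 2
\begin{bmatrix}
\sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{iK} \\
\sum_{i=1}^{n} x_{i2}x_{i1} & \sum_{i=1}^{n} x_{i2}^2 & \cdots & \sum_{i=1}^{n} x_{i2}x_{iK} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n} x_{iK}x_{i1} & \sum_{i=1}^{n} x_{iK}x_{i2} & \cdots & \sum_{i=1}^{n} x_{iK}^2
\end{bmatrix}
\]
If there were a single b, we would require this to be positive, which it would be: 2x'x = 2 Σᵢ xᵢ² > 0 (sum over i = 1, ..., n).
The matrix counterpart of a positive number is a positive definite matrix.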
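The result that the second-derivative matrix equals 2X'X can be checked numerically by differencing the gradient. This is a rough sketch on simulated data; the dimensions, the evaluation point b0, and the step size h are arbitrary assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 30, 3
X = rng.normal(size=(N, K))
y = rng.normal(size=N)

def gradient(b):
    """First derivative of (y - Xb)'(y - Xb) with respect to b."""
    return -2.0 * X.T @ (y - X @ b)

b0 = rng.normal(size=K)   # any point works: the gradient is linear in b
h = 1e-6                  # finite-difference step
hessian_fd = np.empty((K, K))
for j in range(K):
    step = np.zeros(K)
    step[j] = h
    # Column j: change in the gradient per unit change in b_j (central difference)
    hessian_fd[:, j] = (gradient(b0 + step) - gradient(b0 - step)) / (2.0 * h)

print(np.allclose(hessian_fd, 2.0 * X.T @ X, atol=1e-5))
```

The second-derivative matrix does not depend on b, which is why the evaluation point b0 is irrelevant here.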
99/97
Positive Definite Matrix
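Matrix C is positive definite if a'Ca > 0 for every nonzero vector a.
Generally hard to check. Requires a look at characteristic roots.
For some matrices, it is easy to verify. X'X is one of these.
\[
\mathbf{a}'(\mathbf{X}'\mathbf{X})\mathbf{a} = (\mathbf{a}'\mathbf{X}')(\mathbf{X}\mathbf{a}) = (\mathbf{X}\mathbf{a})'(\mathbf{X}\mathbf{a}) = \mathbf{v}'\mathbf{v} = \sum_{i=1}^{N} v_i^2 \ge 0
\]
where v = Xa is an N × 1 vector.
Could v = 0? v = 0 means Xa = 0. Is this possible? Not when the columns of X are linearly independent (X has full column rank K): then Xa = 0 only for a = 0, so a'(X'X)a > 0 for every nonzero a.
Conclusion: b = (X'X)⁻¹X'y does indeed minimize e'e.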
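The question "Could Xa = 0?" can be explored numerically through the characteristic roots (eigenvalues) of X'X. The sketch below uses simulated data and a deliberately constructed collinear column; both design matrices are hypothetical examples, not data from the course.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 40
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)

# Full column rank: constant, x1, x2
X_full = np.column_stack([np.ones(N), x1, x2])
# Rank deficient: the third column is exactly twice the second
X_deficient = np.column_stack([np.ones(N), x1, 2.0 * x1])

# Characteristic roots of the symmetric matrix X'X, in ascending order
eig_full = np.linalg.eigvalsh(X_full.T @ X_full)
eig_def = np.linalg.eigvalsh(X_deficient.T @ X_deficient)

# A nonzero a with X_deficient @ a = 2*x1 - 2*x1 = 0
a = np.array([0.0, 2.0, -1.0])
v = X_deficient @ a

print("smallest root, full rank     :", eig_full[0])
print("smallest root, rank deficient:", eig_def[0])
print("v'v for the collinear case   :", v @ v)
```

In the full rank case every root is strictly positive, so X'X is positive definite. In the collinear case one root is zero up to rounding error, X'X is only positive semidefinite, and (X'X)⁻¹ does not exist.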