Regression

Applied Regression Analysis
BUSI 6220
Adapted from notes by:
Dr. K.N.Thompson, Dept. of Marketing, University of North Texas, 1999
Dr. S. Kulkarni, ITDS Dept., University of North Texas, 2004
Dr. N. Evangelopoulos, ITDS Dept., University of North Texas, 2012
Welcome to the UNT PhD in
Business Program! Get Socialized!
For a person to become a member of a scientific community typically involves not only the
cognitive or intellectual work required in a Ph.D. program, but also the socialization into the
ways of being a scientist in the given scientific discipline or specialty, where the socializing
typically begins with the Ph.D. student’s being a research assistant (i.e., apprentice) to a
senior professor, continues with experiences in gaining or not gaining acceptance at
conferences and journals, and eventually comes to include the tacit knowledge with which
the members of his or her particular scientific specialty are able, without conscious
deliberation, to know and agree that a particular instance of theoretical or empirical
research is valid and significant (or not). In his historical and sociological studies of natural
scientists, Kuhn (1996) has argued convincingly that a scientific theory does not exist
independently of the social forces of the particular scientific community that has developed,
championed, and refined it, but can be understood only in the social and historical context
of that particular scientific community.
Mårtensson and Lee, “Dialogical Action Research at
Omega Corporation,” MIS Quarterly Vol. 28 No. 3
(September 2004), pp. 507-536.
BusinessWeek report: Are you a top performer?

In a confirmation of the proverbial “Lies, Damn Lies and Statistics”¹, the question “Are you one of the top 10% performers in your company?” yielded some surprising (or, perhaps, not so surprising) results.

Moral of the story: Cognitive biases force people to lie, even to themselves. Because of that, statistics is often poorly understood by the general public.

¹ Attributed to Benjamin Disraeli, British Prime Minister, 1874-1880; popularized by Mark Twain.
More quotations about Statistics

“Not everything that can be counted counts, and not everything that counts can be counted.” -George Gallup

“I’ve come loaded with statistics, for I’ve noticed that man can’t prove anything without statistics.” -Mark Twain

“If we knew what it was we were doing, it would not be called research, would it?” -Albert Einstein

Taken from The RESEARCH DIGEST Web site, http://researchexpert.wordpress.com/wise-words/
Statistics and the big facts of life
Statistics show that there
are more women in the
world than anything else.
Except insects.
Glenn Ford in
Gilda (1946)
© Columbia 1946
Statistics in National Security
Yes, well, I've worked out a few
statistics of my own. 15 billion
dollars in gold bullion weighs
10,500 tons. Sixty men would
take twelve days to load it onto
200 trucks.
Now, at the most, you're going
to have two hours before the
Army, Navy, Air Force, and
Marines move in and make
you put it back!…
Sean Connery in
Goldfinger (1964)
© United Artists 1964
How best decisions are made
We have no way of knowing
what lies ahead for us in the
future… All we can do is use
the information at hand to
make the best decision
possible!
Christopher Walken
in Wedding
Crashers (2005)
© WireImage 2003
Regression: A Definition

What is Regression Analysis?
A very “robust” statistical methodology that traditionally has used existing relationships between variables to allow prediction of the values of one variable from one or more others.

Examples:
• Sales can be predicted using advertising expenditure
• Performance on aptitude tests can be used to predict job performance
• GPA after the first year in a PhD program can be predicted from GMAT score
On the news: Regression line helps catch teachers who cheat

See course Web site, “school test erasure scandal”
A historical note: “Regression to Mediocrity”

In the late 1800s, Sir Francis Galton observed that heights of children of both short and tall parents appeared to come closer to the mean of the group: extraordinary parents gave birth to more “ordinary” children. Galton considered this to be a “regression to mediocrity.”

Today we understand that this effect is due to the presence of other height predictors: children with parents of extraordinary height may be ordinary in other height determinants, such as nutrition.
Functional Relationship

[Figure: plot of $ Sales (0 to 300) against Units Sold (0 to 140); every point lies exactly on the line Y = 2X]

Y = 2X
• Y is the dependent or criterion variable
• X is the independent or predictor variable
• The value of Y is exactly predicted by X
• No ‘error’ of prediction exists -- a perfect relationship between X and Y
Statistical Relationship

[Figure: scatter diagram of Actual GPA against Entrance Test Score (3.9 to 6.5), with fitted line y = 0.8399x − 1.6996, R² = 0.6538]

• Y = GPA at end of first year (response variable, criterion variable, dependent variable)
• X = Entrance exam score (predictor variable, independent variable, explanatory variable)
• Each ‘dot’ is a ‘case’ or ‘trial’
• Scatter diagram shows the relationship between two variables, Score & GPA
• Y = -1.6996 + .8399X
• X does not perfectly predict Y: your GPA at the end of the first year cannot be exactly predicted by your score on an entrance exam
• The relationship between Score and GPA appears to be linear
Regression as a General Data Analytic System

• Ability to ‘partial’ out the effects of specific predictor variables on the criterion in situations in which predictors are not orthogonal to one another
  • Can establish the unique contribution of each predictor to variance in the criterion
  • Allows identification of ‘spurious’ relationships
• Study systems of causal relationships Y = f(C, D, E, etc.)
  • Experimental & non-experimental designs
  • Causal modeling, covariance structure modeling
(C&C pp. 1-10)
Regression as a General Data Analytic System

• The form of the data need not be quantitative (interval or ratio)
  • Data can range from nominal to ratio
  • Nominally scaled predictors
    • Traditionally assessed within the context of ANOVA, ANCOVA (by “grouping” the Y values)
    • Can be incorporated into regression models with the help of dummy variables
• The shape of the relationship need not be linear
  • Predictors may enter linearly or non-linearly
  • Transformations of non-linear data can produce the linearity required for the regression model (both ideas are sketched in code below)
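As a minimal illustration of those last two points, here is a Python sketch with made-up data and hypothetical variable names (nothing below comes from the course example), showing a nominal predictor coded as a 0/1 dummy and a skewed predictor log-transformed toward linearity before an ordinary least squares fit:

```python
import numpy as np

# Illustrative (made-up) data: a nominal group label and a skewed predictor.
x_group = np.array(["B", "A", "B", "A", "A", "B"])
x_size = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y = np.array([2.1, 3.0, 3.9, 5.2, 6.1, 7.3])

d = (x_group == "A").astype(float)  # dummy variable: 1 if group A, else 0
log_size = np.log(x_size)           # transformation toward linearity

# Design matrix with an intercept column; fit by ordinary least squares.
X = np.column_stack([np.ones_like(y), d, log_size])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # [intercept, dummy effect, slope on log(size)]
```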
Curvilinear Relationship

[Figure: scatter diagram of d-s Score (vertical axis, about 10 to 22) against Motivation (horizontal axis, 0 to 6)]

• Scatter diagram showing the relationship between two variables, Motivation & d-s Score
• Motivation (X) does not perfectly predict d-s Score (Y)
• The relationship between Motivation and d-s Score appears to be curvilinear
Regression as a General Data Analytic System

• Investigate ‘conditional relationships’
  • Interactions between predictor variables or groups of variables
  • Extends ANOVA, ANCOVA
    • Not limited to interactions between nominally scaled variables
    • Can assess interactions between predictors measured at virtually any level
  • Extremely common in behavioral sciences…
Basic Regression Model

Population Regression Function:

Yi = β0 + β1Xi + εi,   i = 1, …, n

where
• Yi = value of the observed response on the ith trial
• Xi = value of the predictor on the ith trial (a known constant)
• β0 and β1 are parameters
• εi is a random error term

The regression model is:
• Simple (one predictor)
• Linear in the parameters
• Linear in the predictor variable
• First order

Assumptions about the error terms:
• E{εi} = 0 (expected value of error terms is zero)
• σ²{εi} = σ² (variance of error terms is constant)
• σ{εi, εj} = 0 for all i, j; i ≠ j (error terms do not covary, i.e., are not correlated)

Each Yi consists of two parts: (1) a constant term predicted by the regression equation; and (2) a random error term unique to Yi. The error term makes Yi a random variable.

[Figure: the observed value Yi splits into an explained part Ŷi = β0 + β1Xi on the regression line and an unexplained part εi above or below it]

β1 = the change in the mean of the probability distribution for Y for each unit increase in X.
β0 = Y-intercept; the mean of the probability distribution for Y when X = 0. Assumes the scope of the model includes X = 0.
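To make the model and its error assumptions concrete, the following Python sketch simulates observations from Yi = β0 + β1Xi + εi. The slope and intercept are borrowed from the GPA example later in these notes; σ is an assumed, illustrative value:

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = -1.70, 0.0084, 0.44  # sigma is an assumed value
n = 200
X = rng.uniform(390, 630, size=n)      # predictor values (known constants)
eps = rng.normal(0.0, sigma, size=n)   # iid errors: mean 0, constant variance

# Each Yi is an explained part (beta0 + beta1*Xi) plus a random error term,
# which is what makes Yi a random variable.
Y = beta0 + beta1 * X + eps
print(Y[:5])
```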
Regression Function

E{Y} = β0 + β1X

The regression function predicts the expected value of Yi for a given Xi. Values of Yi come from a probability distribution with mean E{Yi} = β0 + β1Xi.

[Figure: regression line with a probability distribution of Y values centered at E{Yi} for a given Xi]
Example

Prediction of GPA at end of first year based on GMAT score.

GMAT Score (Xi)   GPA (Yi)
550               3.10
480               2.30
470               3.00
390               1.90
450               2.50
620               3.70
600               3.40
520               2.60
470               2.80
430               1.60
490               2.00
540               2.90
500               2.30
630               3.20
460               1.80
430               1.40
500               2.00
590               3.80
410               2.20
470               1.50

[Figure: scatter plot of First Year GPA against GMAT Score (350 to 650), with the fitted regression line]

E{Y} = -1.6996 + .0084X

β0 = -1.6996. Value of GPA assuming that a GMAT score of 0 is possible.
β1 = .0084. GPA increases by .0084 for each unit increase in GMAT score.
E{GPA} = 3.34 when GMAT = 600.
Estimating the Regression Function

• The regression function specifies the relationship between predictor and response variables in a population.
• Values of the regression parameters (β0 and β1) are estimated from sample data drawn from the population. Data are obtained via:
  • Observation
  • Experimentation
  • Survey
Method of Least Squares

• Technique employed to produce estimates b0 and b1 for β0 and β1, respectively.
• Find those values of b0 and b1 that minimize the sum of all squared error terms (Σεi²):

Q = Σi=1…n (Yi − β0 − β1Xi)²

• The estimators of β0 and β1 are the values of b0 and b1 that minimize Q for a set of sample observations.
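A minimal Python sketch of this criterion, using the GMAT/GPA data from the example that follows; the two candidate lines evaluated at the end are the ones compared on the next slides:

```python
import numpy as np

# GMAT scores and first-year GPAs from the course example.
X = np.array([550, 480, 470, 390, 450, 620, 600, 520, 470, 430,
              490, 540, 500, 630, 460, 430, 500, 590, 410, 470], dtype=float)
Y = np.array([3.10, 2.30, 3.00, 1.90, 2.50, 3.70, 3.40, 2.60, 2.80, 1.60,
              2.00, 2.90, 2.30, 3.20, 1.80, 1.40, 2.00, 3.80, 2.20, 1.50])

def Q(b0, b1):
    """Sum of squared deviations of Yi from the candidate line b0 + b1*Xi."""
    return float(np.sum((Y - (b0 + b1 * X)) ** 2))

print(Q(-2.5, 0.01))     # ~3.64: a reasonable trial solution
print(Q(-1.70, 0.0084))  # ~3.41: the least squares solution
```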
Example

Assume from the GPA example that b0 = -2.5 and b1 = 0.01:

GMAT (Xi)   GPA (Yi)   Predicted GPA (Ŷi)   Error (εi)   εi²
550         3.10        3.00                  0.10        0.0100
480         2.30        2.30                  0.00        0.0000
470         3.00        2.20                  0.80        0.6400
390         1.90        1.40                  0.50        0.2500
450         2.50        2.00                  0.50        0.2500
620         3.70        3.70                  0.00        0.0000
600         3.40        3.50                 -0.10        0.0100
520         2.60        2.70                 -0.10        0.0100
470         2.80        2.20                  0.60        0.3600
430         1.60        1.80                 -0.20        0.0400
490         2.00        2.40                 -0.40        0.1600
540         2.90        2.90                  0.00        0.0000
500         2.30        2.50                 -0.20        0.0400
630         3.20        3.80                 -0.60        0.3600
460         1.80        2.10                 -0.30        0.0900
430         1.40        1.80                 -0.40        0.1600
500         2.00        2.50                 -0.50        0.2500
590         3.80        3.40                  0.40        0.1600
410         2.20        1.60                  0.60        0.3600
470         1.50        2.20                 -0.70        0.4900
                                              Sum =       3.6400

Q = Σi=1…n (Yi − β0 − β1Xi)², estimated from the sample by Q = Σi=1…n (Yi − b0 − b1Xi)² = 3.64
[Figure: GPA vs. GMAT scatter with the trial line b0 = -2.5, b1 = .01]

b0 = -2.5; b1 = .01; Q = 3.64. Looks pretty good! Seems quite reasonable, but… are there other values of b0 and b1 that provide smaller Q’s for the sample data?
[Figure: GPA vs. GMAT scatter with the fitted line b0 = -1.70, b1 = .0084]

b0 = -1.70; b1 = .0084; Q = 3.41. Looks even better! This is the least squares solution that minimizes Q. No other values of b0 and b1 will provide a smaller value of Q.
GMAT Score (Xi)   GPA (Yi)   Predicted GPA (Ŷi)   Error Terms   Squared Error Terms
550               3.10       2.92                   0.18         0.0324
480               2.30       2.33                  -0.03         0.0010
470               3.00       2.25                   0.75         0.5655
390               1.90       1.58                   0.32         0.1049
450               2.50       2.08                   0.42         0.1764
620               3.70       3.51                   0.19         0.0369
600               3.40       3.34                   0.06         0.0036
520               2.60       2.67                  -0.07         0.0046
470               2.80       2.25                   0.55         0.3047
430               1.60       1.91                  -0.31         0.0974
490               2.00       2.42                  -0.42         0.1730
540               2.90       2.84                   0.06         0.0041
500               2.30       2.50                  -0.20         0.0400
630               3.20       3.59                  -0.39         0.1535
460               1.80       2.16                  -0.36         0.1325
430               1.40       1.91                  -0.51         0.2622
500               2.00       2.50                  -0.50         0.2500
590               3.80       3.26                   0.54         0.2961
410               2.20       1.74                   0.46         0.2079
470               1.50       2.25                  -0.75         0.5595
                                                    Q =          3.41

The solution that minimizes Q:
b0 = -1.69955
b1 = 0.008399
Finding the Least Squares Estimators

• Numerical search procedures
• Analytic procedures

Numerical Search

• Unconstrained optimization algorithms
  • Systematically search for values of b0 and b1 that minimize Q for a given set of data
  • Spreadsheet solution possible
Excel Example Using GMAT data

Starting values: b0 = -25; b1 = 0.05
Ŷi = b0 + b1Xi;   ei = Yi − Ŷi;   ei² = (Yi − Ŷi)²;   Q = Σei²

GMAT Score (Xi)   GPA (Yi)   Predicted GPA (Ŷi)   Error Terms   Squared Error Terms
550               3.10        2.50                  0.60           0.3600
480               2.30       -1.00                  3.30          10.8900
470               3.00       -1.50                  4.50          20.2500
390               1.90       -5.50                  7.40          54.7600
450               2.50       -2.50                  5.00          25.0000
620               3.70        6.00                 -2.30           5.2900
600               3.40        5.00                 -1.60           2.5600
520               2.60        1.00                  1.60           2.5600
470               2.80       -1.50                  4.30          18.4900
430               1.60       -3.50                  5.10          26.0100
490               2.00       -0.50                  2.50           6.2500
540               2.90        2.00                  0.90           0.8100
500               2.30        0.00                  2.30           5.2900
630               3.20        6.50                 -3.30          10.8900
460               1.80       -2.00                  3.80          14.4400
430               1.40       -3.50                  4.90          24.0100
500               2.00        0.00                  2.00           4.0000
590               3.80        4.50                 -0.70           0.4900
410               2.20       -4.50                  6.70          44.8900
470               1.50       -1.50                  3.00           9.0000
                                                    Q =          286.24

Run Excel example (first, verify using the analytic procedure).
Analytic Procedures

• Direct solution for the values b0 and b1 (the estimates of β0 and β1) that minimize Q
• Using calculus, we can derive a set of simultaneous equations, the “normal equations”
• The normal equations for b0 and b1 are:

ΣYi = n·b0 + b1·ΣXi
ΣXiYi = b0·ΣXi + b1·ΣXi²

• Solving the normal equations (see HW1), we obtain the values of b0 and b1 that minimize Q:

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

b0 = (1/n)(ΣYi − b1ΣXi) = Ȳ − b1X̄
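The closed-form solution is two lines of Python; on the GMAT data it reproduces b1 = 766/91,200 ≈ .0084 and b0 ≈ −1.70 (the component sums are worked out on the next slide):

```python
import numpy as np

X = np.array([550, 480, 470, 390, 450, 620, 600, 520, 470, 430,
              490, 540, 500, 630, 460, 430, 500, 590, 410, 470], dtype=float)
Y = np.array([3.10, 2.30, 3.00, 1.90, 2.50, 3.70, 3.40, 2.60, 2.80, 1.60,
              2.00, 2.90, 2.30, 3.20, 1.80, 1.40, 2.00, 3.80, 2.20, 1.50])

# Normal-equation solution: b1 = S_xy / S_xx, b0 = Ybar - b1 * Xbar.
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(b1)  # 766 / 91200, approximately 0.008399
print(b0)  # approximately -1.69955
```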
Example computations of b0 and b1 Using GMAT data

GMAT
Score Xi   GPA Yi   Xi − X̄    Yi − Ȳ   (Xi − X̄)(Yi − Ȳ)   (Xi − X̄)²   (Yi − Ȳ)²
550        3.10      50.00     0.60      30.0000             2500.00      0.36
480        2.30     -20.00    -0.20       4.0000              400.00      0.04
470        3.00     -30.00     0.50     -15.0000              900.00      0.25
390        1.90    -110.00    -0.60      66.0000            12100.00      0.36
450        2.50     -50.00     0.00       0.0000             2500.00      0.00
620        3.70     120.00     1.20     144.0000            14400.00      1.44
600        3.40     100.00     0.90      90.0000            10000.00      0.81
520        2.60      20.00     0.10       2.0000              400.00      0.01
470        2.80     -30.00     0.30      -9.0000              900.00      0.09
430        1.60     -70.00    -0.90      63.0000             4900.00      0.81
490        2.00     -10.00    -0.50       5.0000              100.00      0.25
540        2.90      40.00     0.40      16.0000             1600.00      0.16
500        2.30       0.00    -0.20       0.0000                0.00      0.04
630        3.20     130.00     0.70      91.0000            16900.00      0.49
460        1.80     -40.00    -0.70      28.0000             1600.00      0.49
430        1.40     -70.00    -1.10      77.0000             4900.00      1.21
500        2.00       0.00    -0.50       0.0000                0.00      0.25
590        3.80      90.00     1.30     117.0000             8100.00      1.69
410        2.20     -90.00    -0.30      27.0000             8100.00      0.09
470        1.50     -30.00    -1.00      30.0000              900.00      1.00
Total: 10000   50                        766                 91200        9.84
Mean:  500     2.5

X̄ = 500;  Ȳ = 2.5
Σ(Xi − X̄)(Yi − Ȳ) = 766
Σ(Xi − X̄)² = 91,200

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = 766 / 91,200 = .0084

b0 = Ȳ − b1X̄ = 2.5 − .0084(500) = −1.70
Point Estimates of Mean Response

Point estimates are obtained from:

Ŷ = b0 + b1X

• Ŷ is the estimate of E{Y}, the ‘mean response’, when the level of the predictor is X.
• b0 and b1 are estimates of β0 and β1, respectively.
• Ŷi is the fitted value for the ith case (i.e., when X = Xi).

GPA Example: b0 = -1.70; b1 = .0084

Ŷi = -1.70 + .0084Xi
Ŷ600 = -1.70 + .0084(600) = 3.34
Ŷi = b0 + b1Xi

GMAT Score (Xi)   GPA (Yi)   Predicted GPA (Ŷi)
390               1.90       1.58
410               2.20       1.74
430               1.60       1.91
430               1.40       1.91
450               2.50       2.08
460               1.80       2.16
470               3.00       2.25
470               2.80       2.25
470               1.50       2.25
480               2.30       2.33
490               2.00       2.42
500               2.30       2.50
500               2.00       2.50
520               2.60       2.67
540               2.90       2.84
550               3.10       2.92
590               3.80       3.26
600               3.40       3.34
620               3.70       3.51
630               3.20       3.59

Note the difference between observed values and fitted values.
GMAT Score (Xi)   GPA (Yi)   Predicted GPA (Ŷi)   Error Terms   Squared Error Terms
390               1.90       1.58                   0.32         0.1049
410               2.20       1.74                   0.46         0.2079
430               1.60       1.91                  -0.31         0.0974
430               1.40       1.91                  -0.51         0.2622
450               2.50       2.08                   0.42         0.1764
460               1.80       2.16                  -0.36         0.1325
470               3.00       2.25                   0.75         0.5655
470               2.80       2.25                   0.55         0.3047
470               1.50       2.25                  -0.75         0.5595
480               2.30       2.33                  -0.03         0.0010
490               2.00       2.42                  -0.42         0.1730
500               2.30       2.50                  -0.20         0.0400
500               2.00       2.50                  -0.50         0.2500
520               2.60       2.67                  -0.07         0.0046
540               2.90       2.84                   0.06         0.0041
550               3.10       2.92                   0.18         0.0324
590               3.80       3.26                   0.54         0.2961
600               3.40       3.34                   0.06         0.0036
620               3.70       3.51                   0.19         0.0369
630               3.20       3.59                  -0.39         0.1535

ei = Yi − Ŷi;   ei² = (Yi − Ŷi)²
Difference between ei and εi

Y600 = 3.4;  Ŷ600 = 3.34;  e600 = 3.4 − 3.34 = .06

[Figure: observed point Y600 lies above the fitted line at X600; the gap is the residual e600]

• ei (residual) is the known deviation between the observed value and the fitted value.
• εi (model error term) is the deviation between the observed value and the unknown true regression line. ei is an estimate of εi.
Estimate of Error Terms Variance: σ²

The unbiased estimator of σ² is MSE (Mean Square Error):

MSE = SSE / df = Σ(Yi − Ŷi)² / (n − 2) = Σei² / (n − 2)

where SSE is the Error Sum of Squares (or Residual Sum of Squares) and df is its degrees of freedom. df = n − 2 because β0 and β1 had to be estimated from the sample of size n; thus, only n − 2 sources of variability are left to estimate MSE.
GPA Example:

MSE = Σei² / (n − 2) = 3.4063 / 18 = .189
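The same computation in Python, plugging in SSE and n from this example:

```python
# SSE (residual sum of squares) and n from the GPA example; two degrees of
# freedom were spent estimating b0 and b1, so MSE divides by n - 2.
sse, n = 3.4063, 20
mse = sse / (n - 2)
print(mse)  # 0.1892..., i.e. approximately .189
```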
Maximum Likelihood Estimation

• Requires the functional form of the probability distribution of the random error terms.
• Provides estimates of the required parameters that are most consistent with the sample data.
• In the case of simple linear regression, the MLE estimators for β0 and β1 are BLUE (Best Linear Unbiased Estimators: their expected values equal the true parameter values β0 and β1, and they have minimum variance among linear unbiased estimators).
• The MLE estimator for σ² is biased, but works out OK when the sample size is large.
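A short numeric sketch of that last point, reusing SSE and n from the GPA example: the MLE divides SSE by n and so runs low, while the unbiased MSE divides by n − 2; the gap shrinks as n grows:

```python
# SSE and sample size from the GPA example fit.
sse, n = 3.4063, 20
print(sse / n)        # MLE estimate of sigma^2: about 0.170, biased low
print(sse / (n - 2))  # MSE: about 0.189, unbiased
```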