# STK310LectureNotes2022SectionA

```Department of Statistics
Statistics 310
Lecture Notes
SECTION A: Simple & Multiple Regression Analysis
Compiled by Dr Paul J van Staden
A1. SIMPLE REGRESSION – THE NATURE OF REGRESSION ANALYSIS
    THE MODERN INTERPRETATION OF REGRESSION
    STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS
    REGRESSION VERSUS CAUSATION
    REGRESSION VERSUS CORRELATION
    TYPES OF DATA
        Time series data
        Cross-sectional data
        Pooled data
A2. SIMPLE REGRESSION – TWO-VARIABLE REGRESSION MODEL: BASIC IDEAS
    A HYPOTHETICAL EXAMPLE
    THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
    THE MEANING OF THE TERM LINEAR
    STOCHASTIC SPECIFICATION OF PRF
    THE SAMPLE REGRESSION FUNCTION (SRF)
A3. SIMPLE REGRESSION – TWO-VARIABLE REGRESSION MODEL: ESTIMATION
    METHOD OF ORDINARY LEAST SQUARES (OLS)
    CLASSICAL LINEAR REGRESSION MODEL: ASSUMPTIONS UNDERLYING METHOD OF LEAST SQUARES
    STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS: GAUSS-MARKOV THEOREM
    STANDARD ERRORS OF LEAST SQUARES ESTIMATORS
    COEFFICIENT OF DETERMINATION ($R^2$): A MEASURE OF GOODNESS OF FIT
A4. SIMPLE REGRESSION – THE NORMALITY ASSUMPTION
    NORMALITY ASSUMPTION FOR STOCHASTIC ERROR TERM
    STATISTICAL PROPERTIES OF OLS ESTIMATORS UNDER THE NORMALITY ASSUMPTION
    THE METHOD OF MAXIMUM LIKELIHOOD (ML)
A5. SIMPLE REGRESSION – TWO-VARIABLE REGRESSION MODEL: INFERENCE
    THEOREMS FOR PROBABILITY DISTRIBUTIONS
    CONFIDENCE INTERVALS
        Confidence interval for $\beta_2$
        Confidence interval for $\beta_1$
        Confidence interval for $\sigma^2$
    HYPOTHESIS TESTING
        Hypothesis testing for $\beta_2$
        Hypothesis testing for $\beta_1$
    ANALYSIS OF VARIANCE (ANOVA)
    PREDICTION
        Mean prediction
        Individual prediction
A6. SIMPLE REGRESSION – EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
    REGRESSION THROUGH ORIGIN
    LOG-LINEAR MODEL
    LOG-LIN MODEL
    LIN-LOG MODEL
    RECIPROCAL MODEL
A7. MULTIPLE REGRESSION – ESTIMATION
    THREE-VARIABLE REGRESSION MODEL
    MEANING OF PARTIAL REGRESSION COEFFICIENTS
    OLS & ML ESTIMATION
    $R^2$ & ADJUSTED $R^2$
    PARTIAL CORRELATION COEFFICIENTS
    POLYNOMIAL REGRESSION
A8. MULTIPLE REGRESSION – INFERENCE
    NORMALITY ASSUMPTION
    HYPOTHESIS TESTING
    PREDICTION
A9. MULTIPLE REGRESSION – MATRIX APPROACH TO REGRESSION ANALYSIS
    TWO-VARIABLE REGRESSION MODEL
    THREE-VARIABLE REGRESSION MODEL
    K-VARIABLE REGRESSION MODEL
    OLS ESTIMATION
    INFERENCE
REFERENCES
STK310
A1. SIMPLE REGRESSION – THE NATURE OF REGRESSION ANALYSIS
THE MODERN INTERPRETATION OF REGRESSION
Regression analysis is concerned with the study of the dependence of one variable, the so-called
dependent variable, on one or more other variables, referred to as the explanatory variables.
With regression analysis the population mean value of the dependent variable is estimated and/or
predicted in terms of the known values of the explanatory variables.
Suppose, for example, we want to find out how the average height of sons changes, given their
fathers’ height (Gujarati & Porter, 2009). In other words, we want to predict the average height of a
son given that we know the height of the father.
[Figure: Scatter diagram of the hypothetical distribution of sons' heights corresponding to given heights of their fathers. Horizontal axis: Father's height (cm); vertical axis: Son's height (cm).]
Given a certain height for the fathers, we have a range (distribution) of heights for their sons.
Furthermore, the average height of the sons increases as the height of the fathers increases. This is
indicated by the regression line, which will be discussed in detail later.
STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS
We will deal with variables that are stochastic or random, that is, variables that have probability
distributions. We will consider the statistical relationships between these variables.
Consider the example on the heights of fathers and sons. The dependence of the height of a son on
his father’s height is statistical in nature, since we will not be able to predict the son’s height
exactly. This may be because of errors in the measurement of the variables as well as the absence of
other variables that affect height but are not used in the analysis. There is therefore some random
variability in the height that cannot be fully explained.
__________________________________________________________________________________
Dr Paul J van Staden, University of Pretoria, Department of Statistics
Statistics 310
With deterministic dependency, the variables are not random or stochastic. An example is Ohm’s
law, which states: For metallic conductors over a limited range of temperature the current, C, is
proportional to the voltage, V:
$$C = \frac{1}{k} V$$
REGRESSION VERSUS CAUSATION
Regression analysis deals with the dependence of one variable on other variables. However, this
does not necessarily imply causation. Consider the following example (data taken from Steyn et al.,
1999):
[Figure: Scatter diagram of the annual number of road accidents in South Africa (thousands) against the annual production of eggs in South Africa (million dozen). Horizontal axis: Production of eggs (million dozen).]
As the annual production of eggs increases, the average annual number of road accidents increases.
There is therefore a statistical relationship between the two variables. However, we cannot logically
explain this dependence. For causality we need a priori or theoretical considerations.
REGRESSION VERSUS CORRELATION
Regression analysis and correlation analysis are closely related, but have some fundamental
differences.
We have already defined regression analysis as the study of the dependence (linear or nonlinear) of
the dependent variable on one or more explanatory variables. We assume that the dependent
variable is stochastic (random) with a probability distribution. The explanatory variables are
assumed to have fixed values (in repeated sampling).
With correlation analysis the strength of the linear association between two variables is measured.
We assume that both variables are stochastic. There is furthermore no distinction between
dependent and explanatory variables. This means that the correlation between the height of a son
and the height of his father is the same as the correlation between the height of a father and the
height of his son.
TYPES OF DATA
Time series data
A time series is a set of values that are observed sequentially over time. The data is typically
collected at regular intervals – daily (e.g. stock prices), monthly (e.g. the consumer price index),
quarterly (e.g. private consumption expenditure) and annually. Consider as an example the annual
production of eggs in South Africa (Steyn et al., 1999).
[Figure: Time plot of the annual production of eggs in South Africa (million dozen), 1959 to 1992. Horizontal axis: Year; vertical axis: Production of eggs (million dozen).]
Cross-sectional data
Cross-sectional data are observations of a variable that are collected at the same point in time.
Consider as an example the production of eggs in the 50 states of the USA in 1990 (Gujarati &
Porter, 2009). The histogram is an appropriate graphical representation for cross-sectional data.
[Figure: Histogram of the production of eggs in the 50 states of the USA in 1990. Horizontal axis: Production of eggs (millions); vertical axis: Frequency.]
Pooled data
Pooled data is a combination of time series and cross-sectional data. For example, if the production
of eggs is observed for each of the 50 states in the USA for a number of years, then the data is
referred to as pooled data. Panel data is a special type of pooled data where the same cross-sectional
unit is observed over time. The above-mentioned example is therefore an example of panel data,
since we have observations for the same 50 states at each time point.
A2. SIMPLE REGRESSION – TWO-VARIABLE REGRESSION MODEL: BASIC IDEAS
A HYPOTHETICAL EXAMPLE
Suppose we have a country with a total population of 60 families and that we want to study the
relationship between weekly family consumption expenditure, Y, and weekly income, X (Gujarati &
Porter, 2009, p. 34). Assume that we want to predict the expected level (mean level) of
consumption expenditure given that we know the income. Consider the following hypothetical
dataset:
Weekly family consumption expenditure, Y, for each level of weekly family income, X

X:          80   100   120   140   160   180   200   220   240   260
----------------------------------------------------------------------
Y:          55    65    79    80   102   110   120   135   137   150
            60    70    84    93   107   115   136   137   145   152
            65    74    90    95   110   120   140   140   155   175
            70    80    94   103   116   130   144   152   165   178
            75    85    98   108   118   135   145   157   175   180
             –    88     –   113   125   140     –   160   189   185
             –     –     –   115     –     –     –   162     –   191
----------------------------------------------------------------------
Σ(Y | X)   325   462   445   707   678   750   685  1043   966  1211
E(Y | X)    65    77    89   101   113   125   137   149   161   173
Corresponding to a given income level, we have a range of values for the consumption expenditure.
In other words, we have a distribution of values for Y for a fixed value of X. Since Y is conditional
upon the given values for X, this distribution is referred to as a conditional distribution. The
expected value of Y is
$$E(Y) = \frac{1}{n} \sum Y = \frac{1}{60} \times 7\,272 = 121.2 ,$$
so the mean level of weekly consumption expenditure is \$121.20. This is an unconditional mean,
since the level of income is not taken into account. For each level of income, a conditional expected
value of consumption expenditure can be calculated. For example,
$$E(Y \mid X = 80) = \frac{1}{5} \sum (Y \mid X = 80) = \frac{1}{5} \times 325 = 65 .$$
Thus, given a weekly income level of \$80, the expected weekly consumption expenditure is \$65. In
general the conditional expected value is calculated with the formula
$$E(Y \mid X = X_i) = \sum \left[ Y \times p(Y \mid X = X_i) \right] ,$$
where $p(Y \mid X = X_i)$ is the conditional probability. For instance, there are 5 families with a weekly
income level of \$80. Assuming that one of these families is randomly selected, the probability of
selecting any particular family is $\frac{1}{5}$. Therefore
$$E(Y \mid X = 80) = \sum \left[ Y \times p(Y \mid X = 80) \right] = \frac{1}{5} \times 325 = 65 .$$
Consider a scatter diagram of the weekly consumption expenditure against the weekly income:
[Figure: Conditional distribution of consumption expenditure for various levels of income. Horizontal axis: Weekly income (\$); vertical axis: Weekly consumption expenditure (\$).]
Note again that for a given level of income, there is a range of values for consumption expenditure.
Furthermore, on average the consumption expenditure increases as the income increases. In other
words, the conditional expected values of Y increase as the values of X increase. The conditional
expected values lie on a straight line with a positive slope. This line is called the population
regression line and it represents the regression of Y on X. In general we will have a population
regression curve (the population regression line is a special case).
THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
The conditional expected value, $E(Y \mid X_i)$, is a function of $X_i$,
$$E(Y \mid X_i) = f(X_i) ,$$
and is called the population regression function (PRF). The functional form of the PRF can be
linear or nonlinear. For simplicity we will start with a linear form. In effect, we will assume that the
PRF is a linear function of $X_i$,
$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i .$$
$\beta_1$ and $\beta_2$ are unknown (but fixed) parameters called regression coefficients. $\beta_1$ is the intercept,
while $\beta_2$ is the slope. With regression analysis we want to estimate the PRF. This means that we
want to estimate $\beta_1$ and $\beta_2$.
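As an illustrative check, the linear form can be verified against the conditional means tabulated in the hypothetical example (this calculation uses only those tabulated values). The conditional mean rises from 65 at X = 80 to 173 at X = 260, increasing by 12 for every increase of 20 in X:
$$\beta_2 = \frac{77 - 65}{100 - 80} = 0.6 , \qquad \beta_1 = 65 - 0.6 \times 80 = 17 ,$$
so that, for this population,
$$E(Y \mid X_i) = 17 + 0.6 X_i .$$
For example, at $X_i = 200$ this gives $17 + 0.6 \times 200 = 137$, which agrees with the tabulated conditional mean. The parameters can be read off here only because the whole hypothetical population is known; in practice they have to be estimated from a sample.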
THE MEANING OF THE TERM LINEAR
The population regression line,
$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i ,$$
is a linear function of $X_i$. We say that we have linearity in variables. In contrast, the PRF
$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$$
is nonlinear with respect to the variable X. For both examples we have so-called linearity in
parameters. This means that the PRF is linear with respect to the parameters $\beta_1$ and $\beta_2$. Linearity
in parameters requires that the parameters are raised to the first power only. Furthermore, a
parameter may not be multiplied or divided by any other parameter. An example of a PRF that is
nonlinear with respect to the parameters is
$$E(Y \mid X_i) = \beta_1 + \sqrt{\beta_2}\, X_i .$$
We will consider linear regression models. A model is a linear regression model if it is linear in
parameters. The model may or may not be linear in variables.
STOCHASTIC SPECIFICATION OF PRF
Returning to the example on consumption expenditure and income, we see that given a certain level
of income, the consumption expenditure of some families is more than the conditional expected
value, while for other families it is less. In effect, given a level of income, the consumption
expenditure of the different families fluctuates around the conditional expected value. The deviation
of an individual $Y_i$ around its expected value is
$$u_i = Y_i - E(Y \mid X_i) .$$
The stochastic specification of the PRF is then
$$Y_i = E(Y \mid X_i) + u_i .$$
That is, $Y_i$ is expressed as the sum of two components:
$E(Y \mid X_i)$, the conditional expected value of Y for a given value of X, is the deterministic
component.
$u_i$, the so-called stochastic error (disturbance) term, is a random component.
$u_i$ is a surrogate for all the explanatory variables that are omitted from the model, but that may
affect the dependent variable. It can be shown that $E(u_i \mid X_i) = 0$.
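The zero conditional mean of the error term follows in one step. A short sketch of the argument: take the conditional expectation of both sides of the stochastic specification, using the fact that $E(Y \mid X_i)$ is a constant once $X_i$ is given:
$$E(Y_i \mid X_i) = E\left[ E(Y \mid X_i) \mid X_i \right] + E(u_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i) .$$
Since $E(Y_i \mid X_i)$ and $E(Y \mid X_i)$ are the same quantity, the two sides can only be equal if $E(u_i \mid X_i) = 0$.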
The stochastic specification of the population regression line is
$$Y_i = E(Y \mid X_i) + u_i = \beta_1 + \beta_2 X_i + u_i .$$
THE SAMPLE REGRESSION FUNCTION (SRF)
With regression analysis we want to estimate the PRF on the basis of a sample regression function
(SRF). Returning to the example on consumption expenditure and income, we considered a
population of Y values corresponding to fixed X values. Generally the population values are not all
known. We only have a sample of Y values available. Consider the following two random samples
of Y values:
     First sample         Second sample
      Y       X             Y       X
     70      80            55      80
     65     100            88     100
     90     120            90     120
     95     140            80     140
    110     160           118     160
    115     180           120     180
    120     200           145     200
    140     220           135     220
    155     240           145     240
    150     260           175     260
For these two random samples, a scatter diagram of the weekly consumption expenditure against
weekly income is given below. The pairs of (X, Y) values plotted with × and the solid line
correspond to the first sample, while the pairs of (X, Y) values plotted with ● and the dashed line
correspond to the second sample.
[Figure: Regression lines based on two different samples. Horizontal axis: Weekly income (\$); vertical axis: Weekly consumption expenditure (\$).]
The drawn lines are known as sample regression lines. The sample regression line is the sample
counterpart of the population regression line and is written as
$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i .$$
$\hat{Y}_i$ is the estimator of the conditional expected value, $E(Y \mid X_i)$, while $\hat{\beta}_1$ and $\hat{\beta}_2$ are the
estimators of $\beta_1$ and $\beta_2$, the unknown population parameters. Note that the value obtained for an
estimator is referred to as an estimate. Also note that, whereas the PRF and its parameters are
unknown but fixed, the SRF and its estimators will differ for different random samples.
The stochastic specification of the SRF is
$$Y_i = \hat{Y}_i + \hat{u}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i ,$$
where
$$\hat{u}_i = Y_i - \hat{Y}_i ,$$
the difference between the observed and estimated values of Y, is known as the residual.
A3. SIMPLE REGRESSION – TWO-VARIABLE REGRESSION MODEL: ESTIMATION
METHOD OF ORDINARY LEAST SQUARES (OLS)
Recall that the PRF is given by
$$Y_i = E(Y \mid X_i) + u_i = \beta_1 + \beta_2 X_i + u_i .$$
Since the PRF is not observable, we estimate it from the SRF
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i = \hat{Y}_i + \hat{u}_i .$$
Given a sample of n observations of Y and X, we would like to determine the SRF so that it is as
close as possible to the PRF, in effect, so that the $\hat{Y}_i$’s are as close as possible to the $Y_i$’s. This is
equivalent to making the residuals, the $\hat{u}_i$’s, as small as possible.
How is this done?
First consider the sum of the residuals, $\sum \hat{u}_i$. Minimizing $\sum \hat{u}_i$ is not a good criterion, since the
positive residuals will be cancelled out by the negative residuals. It is therefore possible for $\sum \hat{u}_i$ to be
small (even zero) although the individual residuals are large. We need a criterion that ignores the
sign of the residuals. The sum of the absolute residuals, $\sum |\hat{u}_i|$, can be used, but minimizing this is
mathematically demanding. We therefore use the least-squares criterion.
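Theorem
The OLS estimators for $\beta_2$ and $\beta_1$ are respectively
$$\hat{\beta}_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left( \sum X_i \right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
and
$$\hat{\beta}_1 = \frac{1}{n} \sum Y_i - \frac{1}{n} \hat{\beta}_2 \sum X_i = \bar{Y} - \hat{\beta}_2 \bar{X} .$$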
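Proof
With the least-squares method, the sum of the squared residuals,
$$Q = \sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2 ,$$
is minimized. The ordinary least squares (OLS) estimators, $\hat{\beta}_1$ and $\hat{\beta}_2$, are derived as follows:
1. Determine the partial derivatives with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$:
$$\frac{\partial Q}{\partial \hat{\beta}_1} = -2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)$$   ①
$$\frac{\partial Q}{\partial \hat{\beta}_2} = -2 \sum X_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)$$   ②
2. Set ① and ② equal to zero and simplify to obtain the so-called normal equations:
① = 0:
$$-2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$$
$$\sum Y_i - \sum \hat{\beta}_1 - \sum \hat{\beta}_2 X_i = 0$$
$$\sum Y_i = n \hat{\beta}_1 + \hat{\beta}_2 \sum X_i$$   ③
② = 0:
$$-2 \sum X_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$$
$$\sum X_i Y_i - \sum \hat{\beta}_1 X_i - \sum \hat{\beta}_2 X_i^2 = 0$$
$$\sum X_i Y_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2$$   ④
3. Solve ③ and ④ simultaneously:
③ × $\sum X_i$:
$$\sum X_i \sum Y_i = n \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \left( \sum X_i \right)^2$$   ⑤
④ × n:
$$n \sum X_i Y_i = n \hat{\beta}_1 \sum X_i + n \hat{\beta}_2 \sum X_i^2$$   ⑥
⑥ – ⑤:
$$n \sum X_i Y_i - \sum X_i \sum Y_i = n \hat{\beta}_2 \sum X_i^2 - \hat{\beta}_2 \left( \sum X_i \right)^2 = \hat{\beta}_2 \left[ n \sum X_i^2 - \left( \sum X_i \right)^2 \right]$$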
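Hence
$$\hat{\beta}_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left( \sum X_i \right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} .$$
From ③:
$$\hat{\beta}_1 = \frac{1}{n} \sum Y_i - \frac{1}{n} \hat{\beta}_2 \sum X_i = \bar{Y} - \hat{\beta}_2 \bar{X}$$
■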
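As a worked illustration of these formulas, consider the two samples of n = 10 observations given in the discussion of the SRF. The sums below were computed by hand from those samples, and the estimates are rounded to four decimals. Both samples share the same X values, so $\sum X_i = 1\,700$ and $\sum X_i^2 = 322\,000$ in each case.
For the first sample, $\sum Y_i = 1\,110$ and $\sum X_i Y_i = 205\,500$, so that
$$\hat{\beta}_2 = \frac{10 \times 205\,500 - 1\,700 \times 1\,110}{10 \times 322\,000 - 1\,700^2} = \frac{168\,000}{330\,000} = 0.5091 , \qquad \hat{\beta}_1 = 111 - 0.5091 \times 170 = 24.4545 .$$
For the second sample, $\sum Y_i = 1\,151$ and $\sum X_i Y_i = 214\,680$, so that
$$\hat{\beta}_2 = \frac{10 \times 214\,680 - 1\,700 \times 1\,151}{10 \times 322\,000 - 1\,700^2} = \frac{190\,100}{330\,000} = 0.5761 , \qquad \hat{\beta}_1 = 115.1 - 0.5761 \times 170 = 17.1697 .$$
The two samples therefore give different sample regression lines, $\hat{Y}_i = 24.4545 + 0.5091 X_i$ and $\hat{Y}_i = 17.1697 + 0.5761 X_i$, even though they are drawn from the same population.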
The OLS estimates are interpreted as follows:
$\hat{\beta}_1$ gives the mean value of Y given that $X = 0$.
$\hat{\beta}_2$ gives the effect of a unit change in X on the mean value of Y.
The OLS estimators have the following numerical properties:
I. The OLS estimators are expressed in terms of observed Y and X values and are therefore easily
calculated.
II. The OLS estimators are point estimators.
III. The SRF is determined using the OLS estimates and the observed X.
The deviation form of the SRF is
$$y_i = \hat{y}_i + \hat{u}_i = \hat{\beta}_2 x_i + \hat{u}_i ,$$
where $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$. The properties of the SRF are:
1. The SRF passes through the point of sample means, $(\bar{X}, \bar{Y})$.
2. The mean of the fitted values equals the mean of the observed values, $\bar{\hat{Y}} = \bar{Y}$.
3. $\sum \hat{u}_i = 0$ and hence $\bar{\hat{u}} = 0$.
4. $\hat{u}_i$ is uncorrelated with $\hat{Y}_i$ since $\sum \hat{Y}_i \hat{u}_i = \sum \hat{y}_i \hat{u}_i = 0$.
5. $\hat{u}_i$ is uncorrelated with $X_i$ since $\sum X_i \hat{u}_i = 0$.
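These properties can be checked numerically. As a sketch, take the first sample from the discussion of the SRF, for which the OLS estimates work out to $\hat{\beta}_1 = 269/11 \approx 24.4545$ and $\hat{\beta}_2 = 28/55 \approx 0.5091$ (hand-computed values). The residuals $\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$, rounded to four decimals and listed in order of increasing X, are
4.8182, −10.3636, 4.4545, −0.7273, 4.0909, −1.0909, −6.2727, 3.5455, 8.3636, −6.8182.
Working with the exact fractions (each residual is a whole number divided by 11) gives
$$\sum \hat{u}_i = \frac{53 - 114 + 49 - 8 + 45 - 12 - 69 + 39 + 92 - 75}{11} = 0 , \qquad \sum X_i \hat{u}_i = 0 , \qquad \sum \hat{u}_i^2 = \frac{40\,810}{121} = 337.2727 .$$
The first two results illustrate properties 3 and 5; with the rounded residuals the sums are zero only up to rounding error.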
CLASSICAL LINEAR REGRESSION MODEL: ASSUMPTIONS UNDERLYING METHOD OF LEAST SQUARES
Assumption 1
Linear regression model (linear in parameters):
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
Assumption 2
X values fixed (nonstochastic) in repeated sampling or sampled randomly such that $\mathrm{cov}(u_i, X_i) = 0$.
Assumption 3
Zero mean value for $u_i$:
$$E(u_i \mid X_i) = 0$$
Assumption 4
Homoscedasticity or equal variance of $u_i$:
$$\mathrm{var}(u_i \mid X_i) = \sigma^2$$
Assumption 5
No autocorrelation between $u_i$’s:
$$\mathrm{cov}(u_i, u_j \mid X_i, X_j) = 0 , \quad i \neq j$$
Assumption 6
Number of observations (n) > number of parameters (k).
Assumption 7
Variability in X values:
$$\mathrm{var}(X_i) > 0$$
STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS: GAUSS-MARKOV THEOREM
An estimator is said to be the best linear unbiased estimator (BLUE) if
1. it is a linear function of a random variable,
2. it is unbiased in that the expected value of the estimator is equal to the (unknown) true
parameter value,
3. and it is efficient since it has the minimum variance in the class of all linear unbiased
estimators.
Theorem
The Gauss-Markov Theorem states that, given the assumptions of the classical linear regression
model, the OLS estimators are BLUE.
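Proof
This theorem will be proven for $\hat{\beta}_2$. Note that this estimator can be expressed as
$$\hat{\beta}_2 = \frac{\sum x_i Y_i}{\sum x_i^2} = \sum k_i Y_i ,$$
where $k_i = \frac{x_i}{\sum x_i^2}$ and $x_i = X_i - \bar{X}$.
The properties of $k_i$ are:
Assuming that $X_i$ is nonstochastic (fixed), $k_i$ is also nonstochastic.
$$\sum k_i = \sum \left( \frac{x_i}{\sum x_i^2} \right) = \frac{\sum x_i}{\sum x_i^2} = 0 \quad \text{since } \sum x_i = 0 .$$
$$\sum k_i^2 = \sum \left( \frac{x_i}{\sum x_i^2} \right)^2 = \frac{\sum x_i^2}{\left( \sum x_i^2 \right)^2} = \frac{1}{\sum x_i^2} .$$
$$\sum k_i x_i = \sum \left( \frac{x_i}{\sum x_i^2} \right) x_i = \frac{\sum x_i^2}{\sum x_i^2} = 1 .$$
1. Linearity
$\hat{\beta}_2 = \sum k_i Y_i$ is a linear function of Y and thus a linear estimator.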
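2. Unbiasedness
Using the population regression line and the properties of $k_i$, $\hat{\beta}_2$ can be written as
$$\begin{aligned}
\hat{\beta}_2 &= \sum k_i Y_i \\
&= \sum k_i (\beta_1 + \beta_2 X_i + u_i) \\
&= \beta_1 \sum k_i + \beta_2 \sum k_i X_i + \sum k_i u_i \\
&= \beta_1 \sum k_i + \beta_2 \sum k_i (x_i + \bar{X}) + \sum k_i u_i \\
&= \beta_1 \sum k_i + \beta_2 \sum k_i x_i + \beta_2 \bar{X} \sum k_i + \sum k_i u_i \\
&= \beta_2 + \sum k_i u_i .
\end{aligned}$$
The expected value of $\hat{\beta}_2$ is then
$$E(\hat{\beta}_2) = E\left( \beta_2 + \sum k_i u_i \right) = \beta_2 + \sum k_i E(u_i) = \beta_2 ,$$
and therefore $\hat{\beta}_2$ is an unbiased estimator for $\beta_2$.
3. Efficiency
Since $\mathrm{var}(Y_i) = \mathrm{var}(u_i) = \sigma^2$ and $\mathrm{cov}(Y_i, Y_j) = \mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$,
$$\mathrm{var}(\hat{\beta}_2) = \mathrm{var}\left( \sum k_i Y_i \right) = \sum k_i^2 \, \mathrm{var}(Y_i) = \frac{\sigma^2}{\sum x_i^2} .$$
Let $\beta_2^* = \sum w_i Y_i$ be an alternative linear estimator of $\beta_2$. Then
$$\begin{aligned}
\beta_2^* &= \sum w_i Y_i \\
&= \sum w_i (\beta_1 + \beta_2 X_i + u_i) \\
&= \beta_1 \sum w_i + \beta_2 \sum w_i X_i + \sum w_i u_i \\
&= \beta_1 \sum w_i + \beta_2 \sum w_i (x_i + \bar{X}) + \sum w_i u_i \\
&= \beta_1 \sum w_i + \beta_2 \sum w_i x_i + \beta_2 \bar{X} \sum w_i + \sum w_i u_i .
\end{aligned}$$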
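The expected value of $\beta_2^*$ is
$$\begin{aligned}
E(\beta_2^*) &= E\left( \beta_1 \sum w_i + \beta_2 \sum w_i x_i + \beta_2 \bar{X} \sum w_i + \sum w_i u_i \right) \\
&= \beta_1 \sum w_i + \beta_2 \sum w_i x_i + \beta_2 \bar{X} \sum w_i + \sum w_i E(u_i) .
\end{aligned}$$
To be unbiased, $\sum w_i = 0$ and $\sum w_i x_i = 1$ so that $E(\beta_2^*) = \beta_2$.
The variance of $\beta_2^*$ is
$$\begin{aligned}
\mathrm{var}(\beta_2^*) &= \mathrm{var}\left( \sum w_i Y_i \right) \\
&= \sigma^2 \sum w_i^2 \\
&= \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} + \frac{x_i}{\sum x_i^2} \right)^2 \\
&= \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right)^2 + \sigma^2 \sum \left( \frac{x_i}{\sum x_i^2} \right)^2 + 2 \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) \left( \frac{x_i}{\sum x_i^2} \right) \\
&= \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right)^2 + \frac{\sigma^2}{\sum x_i^2} ,
\end{aligned}$$
since the cross-product term is zero, because
$$\frac{\sum w_i x_i}{\sum x_i^2} = \frac{1}{\sum x_i^2} \quad \text{and} \quad \frac{\sum x_i^2}{\left( \sum x_i^2 \right)^2} = \frac{1}{\sum x_i^2} .$$
If $w_i = \frac{x_i}{\sum x_i^2} = k_i$, then
$$\mathrm{var}(\beta_2^*) = \frac{\sigma^2}{\sum x_i^2} = \mathrm{var}(\hat{\beta}_2) .$$
In effect, the minimum variance obtained for any linear unbiased estimator of $\beta_2$ is the variance of
$\hat{\beta}_2$, so $\hat{\beta}_2$ is an efficient estimator.
■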
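To illustrate the theorem, consider a hypothetical alternative estimator that uses only the two extreme observations, $\beta_2^* = (Y_n - Y_1)/(X_n - X_1)$, together with the ten X values 80, 100, …, 260 used in the samples earlier. This estimator is introduced here purely as an example. It is linear in Y with weights $w_1 = -\frac{1}{180}$, $w_n = \frac{1}{180}$ and $w_i = 0$ otherwise. With $\bar{X} = 170$, $x_1 = -90$ and $x_n = 90$,
$$\sum w_i = 0 , \qquad \sum w_i x_i = \frac{90}{180} + \frac{90}{180} = 1 ,$$
so the estimator is unbiased. Its variance is
$$\mathrm{var}(\beta_2^*) = \sigma^2 \sum w_i^2 = \frac{2 \sigma^2}{180^2} = \frac{\sigma^2}{16\,200} ,$$
whereas, with $\sum x_i^2 = 33\,000$ for these X values,
$$\mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} = \frac{\sigma^2}{33\,000} .$$
The OLS estimator has roughly half the variance of this alternative, in line with the theorem.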
STANDARD ERRORS OF LEAST SQUARES ESTIMATORS
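The expected value, variance and standard error of $\hat{\beta}_1$ are respectively
$$E(\hat{\beta}_1) = \beta_1 ,$$
$$\mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2}$$
and
$$\mathrm{se}(\hat{\beta}_1) = \sigma \sqrt{\frac{\sum X_i^2}{n \sum x_i^2}} .$$
For $\hat{\beta}_2$ the expected value, variance and standard error are
$$E(\hat{\beta}_2) = \beta_2 ,$$
$$\mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2}$$
and
$$\mathrm{se}(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum x_i^2}} .$$
The covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$ is
$$\mathrm{cov}(\hat{\beta}_1, \hat{\beta}_2) = -\bar{X} \, \mathrm{var}(\hat{\beta}_2) = -\bar{X} \frac{\sigma^2}{\sum x_i^2} .$$
The mean square error (MSE) is
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} .$$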
COEFFICIENT OF DETERMINATION (R2): A MEASURE OF GOODNESS OF FIT
The total deviation for the dependent variable can be divided into two components,
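$$\begin{aligned}
y_i &= Y_i - \bar Y \\
&= Y_i - \bar Y + \hat Y_i - \hat Y_i \\
&= (\hat Y_i - \bar Y) + (Y_i - \hat Y_i),
\end{aligned}$$
where
$$\hat Y_i - \bar Y = (\hat\beta_1 + \hat\beta_2 X_i) - (\hat\beta_1 + \hat\beta_2\bar X) = \hat\beta_2(X_i - \bar X) = \hat\beta_2 x_i$$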
is the explained deviation and Yi − Yˆi = uˆi is the unexplained deviation.
The total sum of squares (total variation) is given by
$$\begin{aligned}
TSS &= \sum y_i^2 = \sum (Y_i - \bar Y)^2 \\
&= \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2 \\
&= \hat\beta_2^2\sum x_i^2 + \sum\hat u_i^2 \\
&= ESS + RSS,
\end{aligned}$$
where ESS is the explained sum of squares (explained variation) and RSS is the residual sum of
squares (unexplained variation).
The coefficient of determination is then defined as
$$R^2 = \frac{ESS}{TSS} = \frac{\sum(\hat Y_i - \bar Y)^2}{\sum(Y_i - \bar Y)^2} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum\hat u_i^2}{\sum y_i^2}.$$
R 2 measures the percentage of total variation in Y explained by the regression model. If Yˆi = Yi so
that uˆi = 0 for all i, then R 2 = 1 and all the variation in Y is explained by the regression model. In
contrast, if Yˆi = Y for all i, then none of the variation in Y is explained by the regression model and
R 2 = 0 . So it follows that 0 ≤ R 2 ≤ 1 .
The sample correlation coefficient is given by
$$r = \pm\sqrt{R^2} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2\sum y_i^2}}.$$
The properties of r are:
1. The sign of r is the same as the sign of βˆ2 .
2. − 1 ≤ r ≤ 1
3. rXY = rYX
4. r is independent of the units and scale in which X and Y are measured.
5. If X and Y are independent, then r = 0 . However, if r = 0 , then X and Y are not necessarily
independent.
6. r is only a measure of linear dependence.
7. r cannot be used to determine the cause-and-effect relationship between two variables.
Example: Wages versus education (Gujarati & Porter, 2009, p. 78)
In this example education, measured by the number of years of schooling, is used to explain the
mean hourly wage (in \$). The dataset is:
Years of schooling: X    Hourly wage: Y
 6                        4.4567
 7                        5.7700
 8                        5.9787
 9                        7.3317
10                        7.3182
11                        6.5844
12                        7.8182
13                        7.8351
14                       11.0223
15                       10.6738
16                       10.8361
17                       13.6150
18                       13.5310
[Figure: Scatter diagram of hourly wage (\$) against years of schooling with sample regression line. Vertical axis: Hourly wage: Y. Horizontal axis: Years of schooling: X. Fitted line shown on the plot: y = 0.7241x - 0.0145, R² = 0.9078.]
The estimated regression line is Yˆi = −0.0145 + 0.7241 X i .
There is no practical interpretation for the OLS estimate of the intercept, $\hat\beta_1 = -0.0145$, since wages
cannot be negative. The OLS estimate for the slope, $\hat\beta_2 = 0.7241$, indicates that with each
additional year of schooling, hourly wages on average increase by 72 cents.
Based on the coefficient of determination, R 2 = 0.9078 , 90.78% of the variation in hourly wages is
explained by the number of years of schooling. Hourly wages and education are highly positively
correlated in that r = 0.9528 .
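As a hand check of these estimates, the deviation-form OLS formulas can be applied to the thirteen observations of the dataset. The sums below were computed from the data table for this illustration (they are not quoted from Gujarati & Porter) and are rounded to four decimals.
$$n = 13,\quad \sum X_i = 156,\quad \sum Y_i = 112.7712,\quad \sum X_iY_i = 1485.0400,\quad \sum X_i^2 = 2054$$
$$\bar X = 156/13 = 12,\qquad \bar Y = 112.7712/13 \approx 8.6747$$
$$\sum x_i^2 = \sum X_i^2 - n\bar X^2 = 2054 - 13(144) = 182$$
$$\sum x_iy_i = \sum X_iY_i - \bar X\sum Y_i = 1485.0400 - 12(112.7712) = 131.7856$$
$$\hat\beta_2 = \frac{\sum x_iy_i}{\sum x_i^2} = \frac{131.7856}{182} \approx 0.7241,\qquad \hat\beta_1 = \bar Y - \hat\beta_2\bar X \approx 8.6747 - 0.7241(12) \approx -0.0145$$
Both values agree with the estimated regression line, and $\sqrt{0.9078} \approx 0.9528$ reproduces the reported correlation coefficient.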
A4. SIMPLE REGRESSION – THE NORMALITY ASSUMPTION
NORMALITY ASSUMPTION FOR STOCHASTIC ERROR TERM
E (ui ) = 0
var(ui ) = σ 2
cov(ui , u j ) = 0 for i ≠ j
No assumption has been made yet about the probability distribution of ui .
The classical normal linear regression model assumes that each ui is normally distributed with
mean zero and variance σ 2 , denoted ui ~ N (0, σ 2 ) . If two normally distributed variables have zero
correlation, they are independent. Therefore ui ~ NID(0, σ 2 ) , where NID indicates normally and
independently distributed.
Why the normal distribution?
1. Following the central limit theorem (CLT), given a very large number of independent and
identically distributed random variables, denoted IID, the distribution of the sum of these
variables will be approximately normal.
2. Following a variant of the CLT, the above still holds if the number of variables is not very large
or if they are not strictly independent.
3. Any linear function of normally distributed variables is also normally distributed. Therefore,
under the normality assumption of ui , the OLS estimators β̂1 and β̂ 2 will also be normally
distributed.
4. The normal distribution is fully described by only two parameters, its mean and variance.
5. The normality assumption allows the use of t, F and χ 2 hypothesis tests in regression
modeling.
STATISTICAL PROPERTIES OF OLS ESTIMATORS UNDER THE NORMALITY ASSUMPTION
Given the assumption of normality, the OLS estimators, β̂1 , β̂ 2 and σ̂ 2 , have the following
statistical properties:
1. They are unbiased estimators.
2. They have minimum variance and are therefore efficient estimators.
3. They are consistent estimators in that, as the sample size tends to infinity, the estimates
converge to the true parameter values.
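4. $\hat\beta_1 \sim N(\beta_1, \sigma^2_{\hat\beta_1})$ with $\sigma^2_{\hat\beta_1} = \dfrac{\sigma^2\sum X_i^2}{n\sum x_i^2}$, so that $Z = \dfrac{\hat\beta_1 - \beta_1}{\sigma_{\hat\beta_1}} \sim N(0, 1)$.
5. $\hat\beta_2 \sim N(\beta_2, \sigma^2_{\hat\beta_2})$ with $\sigma^2_{\hat\beta_2} = \dfrac{\sigma^2}{\sum x_i^2}$, so that $Z = \dfrac{\hat\beta_2 - \beta_2}{\sigma_{\hat\beta_2}} \sim N(0, 1)$.
6. $\dfrac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2(n-2)$.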
7. β̂1 and β̂ 2 are distributed independently of σ̂ 2 .
8. β̂1 and β̂ 2 have the minimum variance of all unbiased estimators (linear and nonlinear) and are
therefore the best unbiased estimators (BUE).
THE METHOD OF MAXIMUM LIKELIHOOD (ML)
Consider the two-variable regression model,
Yi = β1 + β 2 X i + ui .
Assume that the Yi s are normally and independently distributed, in effect,
Yi ~ NID( β1 + β 2 X i ,σ 2 ) .
Theorem
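The ML estimators for $\beta_2$, $\beta_1$ and $\sigma^2$ are
$$\tilde\beta_2 = \hat\beta_2 = \frac{n\sum X_iY_i - \sum X_i\sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2},$$
$$\tilde\beta_1 = \hat\beta_1 = \frac{1}{n}\sum Y_i - \frac{1}{n}\hat\beta_2\sum X_i = \bar Y - \hat\beta_2\bar X,$$
and
$$\tilde\sigma^2 = \frac{1}{n}\sum\hat u_i^2.$$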
Proof
The probability density function of $Y_i$ is
$$f(Y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(Y_i - \beta_1 - \beta_2X_i)^2}{2\sigma^2}\right\}.$$
Due to their independence, the joint probability density function of $Y_1, Y_2, \ldots, Y_n$ is obtained by
$$f(Y_1, Y_2, \ldots, Y_n) = f(Y_1)\times f(Y_2)\times\cdots\times f(Y_n).$$
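The likelihood function is then given by
$$\begin{aligned}
LF(\beta_1, \beta_2, \sigma^2) &= f(Y_1, Y_2, \ldots, Y_n) \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2X_i)^2}{\sigma^2}\right\} \\
&= \frac{1}{(2\pi\theta)^{n/2}}\exp\left\{-\frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2X_i)^2}{\theta}\right\},
\end{aligned}$$
where $\theta = \sigma^2$.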
With the method of maximum likelihood, the unknown parameters are estimated in such a way that
the probability of observing the given Yi s is as high as possible. This is done by partial
differentiation to obtain normal equations that are solved simultaneously.
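To simplify the differentiation, the log-likelihood function,
$$\ln LF = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\theta - \frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2X_i)^2}{\theta},$$
is used. The partial derivatives are:
$$\frac{\partial\ln LF}{\partial\tilde\beta_1} = \frac{1}{\tilde\theta}\sum(Y_i - \tilde\beta_1 - \tilde\beta_2X_i), \qquad ⓐ$$
$$\frac{\partial\ln LF}{\partial\tilde\beta_2} = \frac{1}{\tilde\theta}\sum X_i(Y_i - \tilde\beta_1 - \tilde\beta_2X_i), \qquad ⓑ$$
and
$$\frac{\partial\ln LF}{\partial\tilde\theta} = -\frac{n}{2\tilde\theta} + \frac{1}{2\tilde\theta^2}\sum(Y_i - \tilde\beta_1 - \tilde\beta_2X_i)^2. \qquad ⓒ$$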
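The normal equations are obtained by setting ⓐ and ⓑ equal to zero and simplifying:
ⓐ = 0:
$$\frac{1}{\tilde\theta}\sum(Y_i - \tilde\beta_1 - \tilde\beta_2X_i) = 0$$
$$\sum Y_i - \sum\tilde\beta_1 - \sum\tilde\beta_2X_i = 0$$
$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2\sum X_i \qquad ⓓ$$
ⓑ = 0:
$$\frac{1}{\tilde\theta}\sum X_i(Y_i - \tilde\beta_1 - \tilde\beta_2X_i) = 0$$
$$\sum X_iY_i = \tilde\beta_1\sum X_i + \tilde\beta_2\sum X_i^2 \qquad ⓔ$$
The normal equations of the ML estimators and the OLS estimators for $\beta_1$ and $\beta_2$ are the same and
thus $\tilde\beta_1 = \hat\beta_1$ and $\tilde\beta_2 = \hat\beta_2$.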
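To obtain the ML estimator of $\sigma^2$, set ⓒ equal to zero and simplify:
$$-\frac{n}{2\tilde\theta} + \frac{1}{2\tilde\theta^2}\sum(Y_i - \tilde\beta_1 - \tilde\beta_2X_i)^2 = 0$$
$$\tilde\theta = \tilde\sigma^2 = \frac{1}{n}\sum(Y_i - \hat\beta_1 - \hat\beta_2X_i)^2 = \frac{1}{n}\sum\hat u_i^2$$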
■
Note that the ML estimator of $\sigma^2$ is biased, since
$$E(\tilde\sigma^2) = \frac{n-2}{n}\,\sigma^2 \neq \sigma^2.$$
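To give a sense of the size of this bias, the two estimators can be compared on the wages versus education data, taking the residual sum of squares $\sum\hat u_i^2 = 9.69281$ and $n = 13$ from the ANOVA example in Section A5:
$$\tilde\sigma^2 = \frac{9.69281}{13} \approx 0.7456,\qquad \hat\sigma^2 = \frac{9.69281}{11} \approx 0.8812,\qquad \frac{\tilde\sigma^2}{\hat\sigma^2} = \frac{11}{13} \approx 0.846.$$
The ML estimate is smaller by exactly the factor $(n-2)/n$, so the difference between the two estimators vanishes as $n$ grows.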
A5. SIMPLE REGRESSION – TWO-VARIABLE REGRESSION MODEL: INFERENCE
THEOREMS FOR PROBABILITY DISTRIBUTIONS
The following listed theorems will be of importance for statistical inference.
Theorem 1
Let $Z_1, Z_2, \ldots, Z_n$ be normally and independently distributed random variables, $Z_i \sim N(\mu_i, \sigma_i^2)$. Then
$$\sum k_iZ_i \sim N\left(\sum k_i\mu_i,\ \sum k_i^2\sigma_i^2\right),$$
where the $k_i$'s are constants.
Theorem 2
Let $Z_1, Z_2, \ldots, Z_n$ be normally distributed random variables which are not independent,
$Z_i \sim N(\mu_i, \sigma_i^2)$. Then
$$\sum k_iZ_i \sim N\left(\sum k_i\mu_i,\ \sum k_i^2\sigma_i^2 + 2\sum_{i<j}k_ik_j\,\mathrm{cov}(Z_i, Z_j)\right).$$
Theorem 3
Let Z1 , Z 2 ,..., Z n be standardized normal random variables that are independent, Z i ~ N (0, 1) . Then
Zi2 ~ χ 2 (1) and
∑ Zi2 ~ χ 2 (n) .
Theorem 4
Let Z1 , Z 2 ,..., Z n be independently distributed random variables, each following a χ 2 distribution,
Z i ~ χ 2 (ki ) . Then
∑ Zi ~ χ 2 (∑ ki ).
Theorem 5
If $Z_1 \sim N(0, 1)$ and $Z_2 \sim \chi^2(k)$ are independent random variables, then
$$t = \frac{Z_1}{\sqrt{Z_2/k}} \sim t(k).$$
Theorem 6
If $Z_1 \sim \chi^2(k_1)$ and $Z_2 \sim \chi^2(k_2)$ are independent random variables, then
$$F = \frac{Z_1/k_1}{Z_2/k_2} \sim F(k_1, k_2).$$
Theorem 7
Let t ~ t (k ) . Then F = t 2 ~ F (1, k ) .
CONFIDENCE INTERVALS
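Confidence interval for β2
OLS estimator: $\hat\beta_2 \sim N(\beta_2, \sigma^2_{\hat\beta_2})$ with $\sigma^2_{\hat\beta_2} = \dfrac{\sigma^2}{\sum x_i^2}$
Standardization:
$$Z = \frac{\hat\beta_2 - \beta_2}{\sigma_{\hat\beta_2}} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\sigma} \sim N(0, 1)$$
But $\sigma^2$ is unknown.
OLS estimator for $\sigma^2$: $\hat\sigma^2 = \dfrac{1}{n-2}\sum\hat u_i^2$
$\sigma^2_{\hat\beta_2} = \dfrac{\sigma^2}{\sum x_i^2}$ is estimated by $\hat\sigma^2_{\hat\beta_2} = \dfrac{\hat\sigma^2}{\sum x_i^2}$
$$W = (n-2)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2(n-2)$$
$\hat\beta_2$ and $\hat\sigma^2$ independent, hence $Z$ and $W$ independent:
$$t = \frac{Z}{\sqrt{W/(n-2)}} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\hat\sigma} = \frac{\hat\beta_2 - \beta_2}{\hat\sigma_{\hat\beta_2}} \sim t(n-2)$$
$100(1-\alpha)\%$ confidence interval for $\beta_2$:
$$P(-t_{\alpha/2} \le t \le t_{\alpha/2}) = 1 - \alpha$$
$$P\left(-t_{\alpha/2} \le \frac{\hat\beta_2 - \beta_2}{\hat\sigma_{\hat\beta_2}} \le t_{\alpha/2}\right) = 1 - \alpha$$
$$P(-t_{\alpha/2}\hat\sigma_{\hat\beta_2} \le \hat\beta_2 - \beta_2 \le t_{\alpha/2}\hat\sigma_{\hat\beta_2}) = 1 - \alpha$$
$$P(-\hat\beta_2 - t_{\alpha/2}\hat\sigma_{\hat\beta_2} \le -\beta_2 \le -\hat\beta_2 + t_{\alpha/2}\hat\sigma_{\hat\beta_2}) = 1 - \alpha$$
$$P(\hat\beta_2 - t_{\alpha/2}\hat\sigma_{\hat\beta_2} \le \beta_2 \le \hat\beta_2 + t_{\alpha/2}\hat\sigma_{\hat\beta_2}) = 1 - \alpha$$
$$\hat\beta_2 \pm t_{\alpha/2}\hat\sigma_{\hat\beta_2}$$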
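Confidence interval for β1
OLS estimator: $\hat\beta_1 \sim N(\beta_1, \sigma^2_{\hat\beta_1})$ with $\sigma^2_{\hat\beta_1} = \dfrac{\sigma^2\sum X_i^2}{n\sum x_i^2}$
$100(1-\alpha)\%$ confidence interval for $\beta_1$: $\hat\beta_1 \pm t_{\alpha/2}\hat\sigma_{\hat\beta_1}$ with $\hat\sigma^2_{\hat\beta_1} = \dfrac{\hat\sigma^2\sum X_i^2}{n\sum x_i^2}$
Confidence interval for σ2
OLS estimator: $\hat\sigma^2 = \dfrac{1}{n-2}\sum\hat u_i^2$ with $W = (n-2)\dfrac{\hat\sigma^2}{\sigma^2} \sim \chi^2(n-2)$
$100(1-\alpha)\%$ confidence interval for $\sigma^2$:
$$\left((n-2)\frac{\hat\sigma^2}{\chi^2_{\alpha/2}},\ (n-2)\frac{\hat\sigma^2}{\chi^2_{1-\alpha/2}}\right)$$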
Example: Wages versus education (Gujarati & Porter, 2009, p. 78)
Y: Hourly wage
X: Years of schooling
Estimated regression line: $\hat Y_i = -0.01445 + 0.7241X_i$
95% confidence interval for $\beta_1$: (−1.93949, 1.91058)
95% confidence interval for $\beta_2$: (0.57095, 0.87724)
Mean square error (MSE): $\hat\sigma^2 = 0.88116$
95% confidence interval for $\sigma^2$: (0.442187, 2.540199)
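These intervals can be reproduced by hand as a rough check. The standard errors 0.87462 and 0.06958 and the critical value $t_{0.025}(11) = 2.201$ are the values quoted in the hypothesis-testing example later in this section; the $\chi^2(11)$ quantiles 21.920 and 3.816 are standard table values assumed for this illustration and do not appear in the notes.
$$\hat\beta_1 \pm t_{0.025}\,\hat\sigma_{\hat\beta_1} = -0.01445 \pm 2.201(0.87462) \approx -0.01445 \pm 1.92504 \approx (-1.9395,\ 1.9106)$$
$$\hat\beta_2 \pm t_{0.025}\,\hat\sigma_{\hat\beta_2} = 0.7241 \pm 2.201(0.06958) \approx 0.7241 \pm 0.15315 \approx (0.5710,\ 0.8772)$$
$$\left(\frac{11(0.88116)}{21.920},\ \frac{11(0.88116)}{3.816}\right) \approx (0.442,\ 2.540)$$
The small differences in the last decimals relative to the reported intervals come from rounding of the inputs.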
HYPOTHESIS TESTING
Hypothesis testing for β2
1. Specify null and alternative hypotheses:
H 0 : β 2 = β 2∗
H1 : β 2 ≠ β 2∗
where β 2∗ is parameter value under H 0
2. Select significance level α and define test statistic:
$$t = \frac{\hat\beta_2 - \beta_2^*}{\hat\sigma_{\hat\beta_2}} \sim t(n-2) \text{ under } H_0$$
3. Give decision rule:
Confidence interval: reject $H_0$ if $\beta_2^*$ falls outside 100(1 − α)% confidence interval
Critical value vs test statistic value: reject $H_0$ if $|t| \ge t_{\alpha/2}$
p-value vs significance level: reject $H_0$ if p-value ≤ α
4. Calculate:
Confidence interval
Test statistic value
p-value
5. Draw conclusion
Hypothesis testing for β1
1. $H_0: \beta_1 = \beta_1^*$
$H_1: \beta_1 \neq \beta_1^*$
2. $t = \dfrac{\hat\beta_1 - \beta_1^*}{\hat\sigma_{\hat\beta_1}} \sim t(n-2)$ under $H_0$
3. Decision rule
4. Calculate confidence interval / test statistic value / p-value
5. Conclusion
Example: Wages versus education (Gujarati & Porter, 2009, p. 78)
Y: Hourly wage
X: Years of schooling
Estimated regression line: $\hat Y_i = -0.01445 + 0.7241X_i$
Is intercept parameter significant?
1. $H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
2. α = 0.05
$t = \dfrac{\hat\beta_1}{\hat\sigma_{\hat\beta_1}} \sim t(11)$ under $H_0$
3. Reject $H_0$ if:
0 falls outside 95% confidence interval
$|t| \ge t_{0.025} = 2.201$
p-value < 0.05
4. Calculations using SAS:
95% confidence interval: (−1.93949, 1.91058)
$t = \dfrac{\hat\beta_1}{\hat\sigma_{\hat\beta_1}} = \dfrac{-0.01445}{0.87462} = -0.02$
p-value = 0.9871
5. $H_0$ not rejected
intercept parameter not significant
Is slope parameter significant?
1. $H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
2. α = 0.05
$t = \dfrac{\hat\beta_2}{\hat\sigma_{\hat\beta_2}} \sim t(11)$ under $H_0$
3. Reject $H_0$ if:
0 falls outside 95% confidence interval
$|t| \ge t_{0.025} = 2.201$
p-value < 0.05
4. Calculations using SAS:
95% confidence interval: (0.57095, 0.87724)
$t = \dfrac{\hat\beta_2}{\hat\sigma_{\hat\beta_2}} = \dfrac{0.7241}{0.06958} = 10.41$
p-value < .0001
5. $H_0$ rejected
slope parameter significant
For each additional year of schooling, does hourly wage on average increase by 50 cents?
1. $H_0: \beta_2 = 0.5$
$H_1: \beta_2 \neq 0.5$
2. α = 0.05
$t = \dfrac{\hat\beta_2 - 0.5}{\hat\sigma_{\hat\beta_2}} \sim t(11)$ under $H_0$
3. Reject $H_0$ if:
0.5 falls outside 95% confidence interval
$|t| \ge t_{0.025} = 2.201$
p-value < 0.05
4. Calculations using SAS:
95% confidence interval: (0.57095, 0.87724)
$t = \dfrac{\hat\beta_2 - 0.5}{\hat\sigma_{\hat\beta_2}} = \dfrac{0.7241 - 0.5}{0.06958} = 3.22$
p-value = 0.008
5. $H_0$ rejected
For each additional year of schooling, hourly wage does not increase on
average by 50 cents
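The confidence-interval rule and the critical-value rule always lead to the same decision for a two-sided test, which a short chain of equivalences makes explicit (a sketch that uses only quantities already defined in this section):
$$|t| \ge t_{\alpha/2} \iff \left|\hat\beta_2 - \beta_2^*\right| \ge t_{\alpha/2}\,\hat\sigma_{\hat\beta_2} \iff \beta_2^* \notin \left(\hat\beta_2 - t_{\alpha/2}\hat\sigma_{\hat\beta_2},\ \hat\beta_2 + t_{\alpha/2}\hat\sigma_{\hat\beta_2}\right)$$
For the test of $H_0: \beta_2 = 0.5$, for example,
$$|0.7241 - 0.5| = 0.2241 \ge 2.201(0.06958) \approx 0.1531,$$
which is the same statement as 0.5 lying below the lower confidence limit 0.57095.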
ANALYSIS OF VARIANCE (ANOVA)
Total sum of squares divided into explained sum of squares and residual sum of squares: TSS = ESS + RSS
Study of these components: ANOVA
ANOVA table:
Source    Degrees of freedom    Sum of squares                  Mean square                     F
ESS       1                     $\hat\beta_2^2\sum x_i^2$       $\hat\beta_2^2\sum x_i^2$       $\hat\beta_2^2\sum x_i^2 / \hat\sigma^2$
RSS       n − 2                 $\sum\hat u_i^2$                $\hat\sigma^2$
TSS       n − 1                 $\sum y_i^2$
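What is the distribution of F?
$$Z = \frac{\hat\beta_2 - \beta_2}{\sigma_{\hat\beta_2}} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\sigma} \sim N(0, 1), \quad\text{so } Z^2 \sim \chi^2(1)$$
$$W = (n-2)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2(n-2)$$
Z & W independent:
$$F = \frac{Z^2/1}{W/(n-2)} = \frac{(\hat\beta_2 - \beta_2)^2\sum x_i^2}{\hat\sigma^2} = \frac{\hat\beta_2^2\sum x_i^2}{\hat\sigma^2} \quad\text{under } H_0: \beta_2 = 0$$
$$F = \frac{\hat\beta_2^2\sum x_i^2}{\hat\sigma^2} \sim F(1, n-2)$$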
Example: Wages versus education (Gujarati & Porter, 2009, p. 78)
Y: Hourly wage
X: Years of schooling
ANOVA table:
Source    Degrees of freedom    Sum of squares    Mean square    F
ESS       1                     95.42552          95.42552       108.29
RSS       11                    9.69281           0.88116
TSS       12                    105.11833
Is slope parameter significant?
1. $H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
$F = \dfrac{\hat\beta_2^2\sum x_i^2}{\hat\sigma^2} \sim F(1, 11)$ under $H_0: \beta_2 = 0$
2. α = 0.01
3. Reject $H_0$ if p-value < 0.01
4. Calculations using SAS:
F = 108.29
p-value < .0001
5. $H_0$ rejected
slope parameter significant
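The entries of the ANOVA table tie together with the earlier results for this example, as the following hand check suggests (figures taken from the table and from the t-test of the slope; agreement is up to rounding):
$$ESS + RSS = 95.42552 + 9.69281 = 105.11833 = TSS$$
$$R^2 = \frac{ESS}{TSS} = \frac{95.42552}{105.11833} \approx 0.9078$$
$$F = \frac{ESS/1}{RSS/11} = \frac{95.42552}{0.88116} \approx 108.3$$
$$t^2 = (10.41)^2 \approx 108.4 \approx F$$
The last line illustrates Theorem 7: in the two-variable model the F-test of $H_0: \beta_2 = 0$ and the two-sided t-test of the same hypothesis are equivalent.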
PREDICTION
Mean prediction
Prediction of conditional mean value of Y corresponding to given value of X, say $X_0$
Point estimator for $E(Y \mid X_0)$: $\hat Y_0 = \hat\beta_1 + \hat\beta_2X_0$
$$\hat Y_0 \sim N(\beta_1 + \beta_2X_0,\ \sigma^2_{\hat Y_0}) \quad\text{where}\quad \sigma^2_{\hat Y_0} = \sigma^2\left[\frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right]$$
$$t = \frac{\hat Y_0 - (\beta_1 + \beta_2X_0)}{\hat\sigma_{\hat Y_0}} \sim t(n-2)$$
$100(1-\alpha)\%$ confidence interval for $E(Y \mid X_0)$: $\hat Y_0 \pm t_{\alpha/2}\hat\sigma_{\hat Y_0}$
Individual prediction
Prediction of individual value of Y, say $Y_0$, corresponding to given value of X, say $X_0$
Point estimator for $Y_0$: $\hat Y_0 = \hat\beta_1 + \hat\beta_2X_0$
Prediction error: $Y_0 - \hat Y_0$
$$(Y_0 - \hat Y_0) \sim N(0,\ \sigma^2_{(Y_0 - \hat Y_0)}) \quad\text{where}\quad \sigma^2_{(Y_0 - \hat Y_0)} = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right]$$
$$t = \frac{Y_0 - \hat Y_0}{\hat\sigma_{(Y_0 - \hat Y_0)}} \sim t(n-2)$$
$100(1-\alpha)\%$ confidence interval for $Y_0$: $\hat Y_0 \pm t_{\alpha/2}\hat\sigma_{(Y_0 - \hat Y_0)}$
Example: Wages versus education (Gujarati & Porter, 2009, p. 78)
Y: Hourly wage
X: Years of schooling
Estimated regression line: $\hat Y_i = -0.01445 + 0.7241X_i$
Predict the mean hourly wage for 6 years of schooling and calculate a 95% confidence interval.
$\hat Y_0 = -0.01445 + 0.7241 \times 6 = 4.3301$
95% confidence interval for $E(Y \mid 6)$: (3.2472, 5.413)
Nosnow Cannotski has 6 years of schooling. Predict his hourly wage and calculate a 95%
confidence interval.
$\hat Y_0 = -0.01445 + 0.7241 \times 6 = 4.3301$
95% confidence interval for $Y_0$: (1.9975, 6.6628)
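Both intervals can be reconstructed approximately by hand. The values $\bar X = 12$ and $\sum x_i^2 = 182$ below were computed from the schooling data (6 to 18 years, $n = 13$) for this illustration; $\hat\sigma^2 = 0.88116$ and $t_{0.025}(11) = 2.201$ are taken from earlier in this section.
$$\hat\sigma^2_{\hat Y_0} = 0.88116\left[\frac{1}{13} + \frac{(6 - 12)^2}{182}\right] \approx 0.88116(0.27473) \approx 0.2421,\qquad \hat\sigma_{\hat Y_0} \approx 0.4920$$
$$4.3301 \pm 2.201(0.4920) \approx 4.3301 \pm 1.0829 \approx (3.25,\ 5.41)$$
$$\hat\sigma^2_{(Y_0 - \hat Y_0)} = 0.88116\left[1 + \frac{1}{13} + \frac{36}{182}\right] \approx 0.88116(1.27473) \approx 1.1232,\qquad \hat\sigma_{(Y_0 - \hat Y_0)} \approx 1.0598$$
$$4.3301 \pm 2.201(1.0598) \approx 4.3301 \pm 2.333 \approx (2.00,\ 6.66)$$
The interval for an individual wage is more than twice as wide as the interval for the mean wage because of the additional 1 inside the bracket, which accounts for the error term of the new observation.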
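A6. SIMPLE REGRESSION – EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
REGRESSION THROUGH ORIGIN
PRF: $Y_i = \beta_2X_i + u_i$
SRF: $Y_i = \hat\beta_2X_i + \hat u_i$
OLS estimator for $\beta_2$: $\hat\beta_2 = \dfrac{\sum X_iY_i}{\sum X_i^2}$
OLS estimator for $\sigma^2$: $\hat\sigma^2 = \dfrac{1}{n-1}\sum\hat u_i^2$
$$E(\hat\beta_2) = \beta_2 \quad\text{and}\quad \mathrm{var}(\hat\beta_2) = \frac{\sigma^2}{\sum X_i^2}$$
$$\text{raw } R^2 = \frac{\left(\sum X_iY_i\right)^2}{\sum X_i^2\sum Y_i^2}$$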
Example: Wages versus education (Gujarati & Porter, 2009, p. 78)
Y: Hourly wage
X: Years of schooling
OLS regression: $\hat Y_i = -0.0145 + 0.7241X_i$
OLS regression through origin: $\hat Y_i = 0.723X_i$
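The through-origin slope uses raw sums rather than deviations from the means. As a hand check, with $\sum X_iY_i = 1485.0400$ and $\sum X_i^2 = 2054$ (both computed from the wages data table for this illustration, not quoted from the source):
$$\hat\beta_2 = \frac{\sum X_iY_i}{\sum X_i^2} = \frac{1485.0400}{2054} \approx 0.7230$$
The result is very close to the slope 0.7241 of the model with intercept, which is consistent with the estimated intercept of −0.0145 being negligible.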
LOG-LINEAR MODEL
Exponential regression model: $Y_i = \beta_1X_i^{\beta_2}e^{u_i}$
Apply double-log transformation: $\ln Y_i = \ln\beta_1 + \beta_2\ln X_i + u_i$
Let $Y_i^* = \ln Y_i$, $X_i^* = \ln X_i$ and $\alpha = \ln\beta_1$
Log-linear / log-log / double-log model: $Y_i^* = \alpha + \beta_2X_i^* + u_i$
Absolute, relative and percentage change:
Absolute change: $X_i - X_{i-1}$ and $Y_i - Y_{i-1}$
Relative change: $\dfrac{X_i - X_{i-1}}{X_{i-1}}$ and $\dfrac{Y_i - Y_{i-1}}{Y_{i-1}}$
Percentage (%) change: $100 \times \dfrac{X_i - X_{i-1}}{X_{i-1}}$ and $100 \times \dfrac{Y_i - Y_{i-1}}{Y_{i-1}}$
$\beta_2$ measures absolute change in $Y^*$ for given absolute change in $X^*$
$\beta_2$ measures % change in Y for given % change in X: elasticity
Example: Coffee consumption (Gujarati & Porter, 2009, p. 204)
Y: USA coffee consumption (cups of coffee per person per day)
X: Retail price in dollars per pound
Estimated linear model: $\hat Y_i = 2.69112 - 0.47953X_i$
If price of coffee increases by \$1 per pound, demand of coffee decreases on average by ½
cup per day
What is price elasticity of coffee demand?
Estimated log-log model: $\hat Y_i^* = 0.77742 - 0.25305X_i^*$
If price of coffee per pound increases by 1%, demand of coffee decreases on average by
0.25%
Estimated exponential regression model:
$\hat\beta_1 = e^{\hat\alpha} = e^{0.77742} = 2.17585$
$\hat Y_i = 2.17585X_i^{-0.25305}$
LOG-LIN MODEL
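$Y_i = \beta_1\beta_2^{X_i}e^{u_i}$
Apply semilog transformation: $\ln Y_i = \ln\beta_1 + X_i\ln\beta_2 + u_i$
Let $Y_i^* = \ln Y_i$, $\alpha_1 = \ln\beta_1$ and $\alpha_2 = \ln\beta_2$
Log-lin / semilog model: $Y_i^* = \alpha_1 + \alpha_2X_i + u_i$
$\alpha_2$ measures relative change in Y for given absolute change in X, so $100 \times \alpha_2$ gives the % change in Y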
Example: Growth rate of fast food company
Growth rate of fast food company must be measured in terms of its number of pizzerias in
operation by using growth rate formula
$$Y_t = Y_0(1 + r)^t$$
$Y_t$: number of pizzerias for years t = 1, 2, ..., 15
$Y_0$: initial number of pizzerias when company started
$r$: compound rate of growth
t     Yt
1     7
2     13
3     20
4     33
5     40
6     53
7     60
8     47
9     54
10    41
11    48
12    61
13    68
14    55
15    75
Apply suitable transformation to growth rate formula to obtain linear regression model.
Growth rate formula: $Y_t = Y_0(1 + r)^t$
Apply semilog transformation: $\ln Y_t = \ln Y_0 + t\ln(1 + r)$
Let $Y_t^* = \ln Y_t$, $\alpha_1 = \ln Y_0$ and $\alpha_2 = \ln(1 + r)$:
$Y_t^* = \alpha_1 + \alpha_2t$
Add error term, $u_t$, to obtain linear regression model:
$Y_t^* = \alpha_1 + \alpha_2t + u_t$
Estimate instantaneous rate of growth and compound rate of growth of company.
Estimated linear model: $\hat Y_t^* = 2.69115 + 0.12057t$
Estimated instantaneous rate of growth: $100 \times 0.12057 = 12.057\%$
Estimated compound rate of growth: $100 \times (e^{0.12057} - 1) = 12.814\%$
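The two growth rates follow from inverting the definitions $\alpha_1 = \ln Y_0$ and $\alpha_2 = \ln(1 + r)$. A short sketch of the back-transformation, with values rounded for illustration:
$$\hat r = e^{\hat\alpha_2} - 1 = e^{0.12057} - 1 \approx 0.1281,\qquad \hat Y_0 = e^{\hat\alpha_1} = e^{2.69115} \approx 14.75$$
$$e^{x} - 1 = x + \frac{x^2}{2} + \cdots \approx x \quad\text{for small } x$$
The series expansion shows why the instantaneous rate (12.057%) and the compound rate (12.814%) are close, with the compound rate always the larger of the two when growth is positive.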
LIN-LOG MODEL
Yi = β1 + β 2 ln X i + ui
Let X i∗ = ln X i
Yi = β1 + β 2 X i∗ + ui
β 2 measures absolute change in Y for given % change in X
RECIPROCAL MODEL
$Y_i = \beta_1 + \beta_2\dfrac{1}{X_i} + u_i$
Let $X_i^* = \dfrac{1}{X_i}$
$Y_i = \beta_1 + \beta_2X_i^* + u_i$
As X increases, Y approaches $\beta_1$: limiting / asymptotic value
A7. MULTIPLE REGRESSION – ESTIMATION
THREE-VARIABLE REGRESSION MODEL
PRF: $Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i$
$\beta_1$: intercept parameter
$\beta_2$ and $\beta_3$: partial regression coefficients
Assumptions:
Linear regression model
X values fixed (nonstochastic) in repeated sampling
OR
X values sampled randomly such that covariance between error term and each X variable is
zero
Zero mean value for $u_i$: $E(u_i) = 0$
Homoscedasticity for $u_i$: $\mathrm{var}(u_i) = \sigma^2$
No autocorrelation between $u_i$'s: $\mathrm{cov}(u_i, u_j) = 0$, $i \neq j$
Number of observations (n) > number of parameters (k)
Variability in X values
No perfect collinearity (linear relationship) between X variables
No specification bias
Example: Semester test
Y: mark obtained
$X_2$: number of hours studied per day
$X_3$: number of hours studied per week
Since $X_3 = 7X_2$:
$$\begin{aligned}
Y_i &= \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i \\
&= \beta_1 + \beta_2X_{2i} + 7\beta_3X_{2i} + u_i \\
&= \beta_1 + (\beta_2 + 7\beta_3)X_{2i} + u_i \\
&= \beta_1 + \alpha X_{2i} + u_i
\end{aligned}$$
MEANING OF PARTIAL REGRESSION COEFFICIENTS
$\beta_2$ measures change in mean value of Y for unit change in $X_2$, holding $X_3$ constant
$\beta_3$ measures change in mean value of Y for unit change in $X_3$, holding $X_2$ constant
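OLS & ML ESTIMATION
PRF: $Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i$
SRF: $Y_i = \hat Y_i + \hat u_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \hat u_i$
OLS estimators:
$$\hat\beta_1 = \bar Y - \hat\beta_2\bar X_2 - \hat\beta_3\bar X_3$$
$$\hat\beta_2 = \frac{\left(\sum y_ix_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_ix_{3i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2}$$
$$\hat\beta_3 = \frac{\left(\sum y_ix_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_ix_{2i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2}$$
$$\hat\sigma^2 = \frac{\sum\hat u_i^2}{n-3}$$
ML estimators:
$$\tilde\beta_1 = \hat\beta_1,\qquad \tilde\beta_2 = \hat\beta_2,\qquad \tilde\beta_3 = \hat\beta_3,\qquad \tilde\sigma^2 = \frac{\sum\hat u_i^2}{n}$$
Coefficient of determination and adjusted coefficient of determination:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum\hat u_i^2}{\sum y_i^2}$$
$$\bar R^2 = 1 - \frac{\sum\hat u_i^2/(n-k)}{\sum y_i^2/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k}$$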
PARTIAL CORRELATION COEFFICIENTS
Correlation coefficients:
$r_{12}$: correlation coefficient between Y and $X_2$
$r_{13}$: correlation coefficient between Y and $X_3$
$r_{23}$: correlation coefficient between $X_2$ and $X_3$
Partial correlation coefficients:
$r_{12.3}$: correlation coefficient between Y and $X_2$ holding $X_3$ constant
$r_{13.2}$: correlation coefficient between Y and $X_3$ holding $X_2$ constant
$r_{23.1}$: correlation coefficient between $X_2$ and $X_3$ holding Y constant
POLYNOMIAL REGRESSION
PRF
Yi = β 0 + β1 X i + β 2 Zi + ui
Suppose Zi = X i2
Yi = β 0 + β1 X i + β 2 X i2 + ui
One explanatory variable
β1 and β 2 cannot be interpreted individually
A8. MULTIPLE REGRESSION – INFERENCE
NORMALITY ASSUMPTION
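Assumption: $u_i \sim N(0, \sigma^2)$
$$t = \frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \sim t(n-k) \quad\text{for } j = 1, 2, \ldots, k$$
$$W = (n-k)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2(n-k)$$
$$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)} \sim F(k-1,\ n-k)$$
$$F = \frac{(RSS_R - RSS_{UR})/m}{RSS_{UR}/(n-k)} = \frac{(R^2_{UR} - R^2_R)/m}{(1 - R^2_{UR})/(n-k)} \sim F(m,\ n-k)$$
Here the subscripts UR and R refer to the unrestricted and restricted models and m is the number of restrictions imposed under $H_0$.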
HYPOTHESIS TESTING
Example: Demand for chicken (Gujarati & Porter, 2009, p. 220)
Demand function for chicken:
$$Y_t = \alpha X_{2t}^{\beta_2}X_{3t}^{\beta_3}X_{4t}^{\beta_4}X_{5t}^{\beta_5}e^{u_t}$$
Y: per capita consumption of chickens in pounds
$X_2$: real disposable income per capita in dollars
$X_3$: real retail price of chicken per pound in cents
$X_4$: real retail price of pork per pound in cents
$X_5$: real retail price of beef per pound in cents
Regression model in linear form:
$$\ln Y_t = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + \beta_4\ln X_{4t} + \beta_5\ln X_{5t} + u_t$$
Fitted regression model (standard errors in parentheses below the estimates):
$$\ln Y_t = \underset{(0.15571)}{2.18979} + \underset{(0.08327)}{0.34256}\ln X_{2t} - \underset{(0.11089)}{0.50459}\ln X_{3t} + \underset{(0.09967)}{0.14855}\ln X_{4t} + \underset{(0.10072)}{0.09110}\ln X_{5t} + \hat u_t$$
Are the individual regression coefficients significant?
1. $H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
2. $t = \dfrac{\hat\beta_2}{\hat\sigma_{\hat\beta_2}} \sim t(18)$ under $H_0$
3. Reject $H_0$ if p-value ≤ α
4. $t = \dfrac{0.34256}{0.08327} = 4.11$
p-value = 0.0007
5. $H_0: \beta_2 = 0$ rejected at 1% significance level
$\beta_2$ significant
Similarly it follows that $\beta_1$ and $\beta_3$ are significant, but $\beta_4$ and $\beta_5$ are not.
Is the overall regression model significant?
1. $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_1$: At least one $\beta_j \neq 0$, j = 2, 3, 4, 5
2. $F = \dfrac{ESS/4}{RSS/18} = \dfrac{R^2/4}{(1 - R^2)/18} \sim F(4, 18)$ under $H_0$
3. Reject $H_0$ if p-value ≤ α
4. $F = \dfrac{0.76105/4}{0.0137/18} = \dfrac{0.9823/4}{(1 - 0.9823)/18} = 249.93$
p-value < 0.0001
5. $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ rejected at 1% significance level
The overall regression model is significant.
Are β4 and β5 both zero?
1. $H_0: \beta_4 = \beta_5 = 0$
$H_1$: At least one of $\beta_4$ and $\beta_5$ not zero
Unrestricted model:
$\ln Y_t = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + \beta_4\ln X_{4t} + \beta_5\ln X_{5t} + u_t$
Restricted model under H0:
$\ln Y_t = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + u_t$
2. $F = \dfrac{(RSS_R - RSS_{UR})/2}{RSS_{UR}/18} = \dfrac{(R^2_{UR} - R^2_R)/2}{(1 - R^2_{UR})/18} \sim F(2, 18)$ under $H_0$
3. Reject $H_0$ if p-value ≤ α
4. $F = \dfrac{(0.01544 - 0.0137)/2}{0.0137/18} = \dfrac{(0.9823 - 0.9801)/2}{(1 - 0.9823)/18} = 1.14$
p-value = 0.3421
5. $H_0: \beta_4 = \beta_5 = 0$ not rejected
$\beta_4$ and $\beta_5$ are not significantly different from zero
Is β4 equal to β5?
1. $H_0: \beta_4 = \beta_5$
$H_1: \beta_4 \neq \beta_5$
Unrestricted model:
$\ln Y_t = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + \beta_4\ln X_{4t} + \beta_5\ln X_{5t} + u_t$
Restricted model under H0:
$\ln Y_t = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + \beta_4\ln X_{4t} + \beta_4\ln X_{5t} + u_t$
$\phantom{\ln Y_t} = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + \beta_4(\ln X_{4t} + \ln X_{5t}) + u_t$
2. $F = \dfrac{(RSS_R - RSS_{UR})/1}{RSS_{UR}/18} = \dfrac{(R^2_{UR} - R^2_R)/1}{(1 - R^2_{UR})/18} \sim F(1, 18)$ under $H_0$
3. Reject $H_0$ if p-value ≤ α
4. $F = \dfrac{(0.01394 - 0.0137)/1}{0.0137/18} = \dfrac{(0.9823 - 0.9820)/1}{(1 - 0.9823)/18} = 0.31$
p-value = 0.5864
5. $H_0: \beta_4 = \beta_5$ not rejected
$\beta_4$ is not significantly different from $\beta_5$
If the real retail price of chicken increases by 1% while the real disposable income per
capita and the real retail prices of pork and beef remain constant, will the per capita
consumption of chickens decrease by 0.75%?
1. $H_0: \beta_3 = -0.75$
$H_1: \beta_3 \neq -0.75$
2. t-test: $t = \dfrac{\hat\beta_3 - (-0.75)}{\hat\sigma_{\hat\beta_3}} \sim t(18)$ under $H_0$
F-test: $F = \dfrac{(RSS_R - RSS_{UR})/1}{RSS_{UR}/18} = \dfrac{(R^2_{UR} - R^2_R)/1}{(1 - R^2_{UR})/18} \sim F(1, 18)$ under $H_0$
3. t-test: Reject $H_0$ if –0.75 falls outside the 95% confidence interval for $\beta_3$
F-test: Reject $H_0$ if p-value ≤ α
4. t-test: 95% confidence interval for $\beta_3$: (−0.73757, −0.27161)
F-test: F = 4.90, p-value = 0.0401
5. $H_0: \beta_3 = -0.75$ rejected at 5% significance level
The per capita consumption of chickens will not decrease by 0.75% if the real retail price of
chicken increases by 1% while the real disposable income per capita and the real retail prices
of pork and beef remain constant.
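The t-test and the F-test of this single restriction are two views of the same calculation, as a hand check suggests. The estimate −0.50459 and standard error 0.11089 are taken from the fitted five-variable model; the critical value $t_{0.025}(18) = 2.101$ is a standard table value assumed for this illustration.
$$t = \frac{-0.50459 - (-0.75)}{0.11089} = \frac{0.24541}{0.11089} \approx 2.213,\qquad t^2 \approx 4.90 = F$$
$$\hat\beta_3 \pm t_{0.025}\,\hat\sigma_{\hat\beta_3} = -0.50459 \pm 2.101(0.11089) \approx -0.50459 \pm 0.23298 = (-0.73757,\ -0.27161)$$
Since $|t| \approx 2.213 > 2.101$, the value −0.75 lies just outside the interval, in line with the p-value of 0.0401.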
PREDICTION
Example: Demand for chicken (Gujarati & Porter, 2009, p. 220)
Consider following demand function for chicken:
$$Y_t = \alpha X_{2t}^{\beta_2}X_{3t}^{\beta_3}e^{u_t}$$
Regression model in linear form:
$$\ln Y_t = \beta_1 + \beta_2\ln X_{2t} + \beta_3\ln X_{3t} + u_t$$
Fitted regression model (standard errors in parentheses below the estimates):
$$\ln Y_t = \underset{(0.11618)}{2.03282} + \underset{(0.02469)}{0.45153}\ln X_{2t} - \underset{(0.06347)}{0.37221}\ln X_{3t} + \hat u_t$$
Suppose for 1983 we have:
Real disposable income is \$2 700 per capita
Real retail price of chicken is 70 cents per pound
Calculate a 95% confidence interval for the mean consumption of chicken when the real
disposable income is \$2 700 per capita and the real retail price of chicken is 70 cents per pound:
Predicted value: $e^{4.019} = 55.64543$
95% confidence interval: $(e^{3.9919}, e^{4.0461}) = (54.15769, 57.17404)$
Calculate a 95% confidence interval for the consumption of chicken in 1983:
Predicted value: $e^{4.019} = 55.64543$
95% confidence interval: $(e^{3.955}, e^{4.083}) = (52.19569, 59.32317)$
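The exponent 4.019 is the fitted value on the log scale. A hand check, assuming natural logarithms and substituting $X_2 = 2700$ and $X_3 = 70$ into the fitted three-variable model (logarithms rounded to four decimals):
$$\ln 2700 \approx 7.9010,\qquad \ln 70 \approx 4.2485$$
$$\widehat{\ln Y_0} = 2.03282 + 0.45153(7.9010) - 0.37221(4.2485) \approx 2.03282 + 3.56754 - 1.58133 \approx 4.019$$
The interval limits are computed on the log scale first and then exponentiated, which is why both intervals are not symmetric around the predicted value 55.64543.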
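A9. MULTIPLE REGRESSION – MATRIX APPROACH TO REGRESSION ANALYSIS
TWO-VARIABLE REGRESSION MODEL
PRF: $Y_i = \beta_1 + \beta_2X_i + u_i$
SRF: $Y_i = \hat\beta_1 + \hat\beta_2X_i + \hat u_i$
$$\hat\beta_1 = \bar Y - \hat\beta_2\bar X$$
$$\hat\beta_2 = \frac{n\sum X_iY_i - \sum X_i\sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum x_iy_i}{\sum x_i^2}$$
$$\hat\sigma^2 = \frac{\sum\hat u_i^2}{n-2}$$
THREE-VARIABLE REGRESSION MODEL
PRF: $Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i$
SRF: $Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \hat u_i$
$$\hat\beta_1 = \bar Y - \hat\beta_2\bar X_2 - \hat\beta_3\bar X_3$$
$$\hat\beta_2 = \frac{\left(\sum y_ix_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_ix_{3i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2}$$
$$\hat\beta_3 = \frac{\left(\sum y_ix_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_ix_{2i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2}$$
$$\hat\sigma^2 = \frac{\sum\hat u_i^2}{n-3}$$
k-VARIABLE REGRESSION MODEL
PRF: $Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \ldots + \beta_kX_{ki} + u_i$
SRF: $Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \ldots + \hat\beta_kX_{ki} + \hat u_i$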
How do we now obtain expressions for the OLS estimators?
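Consider system of n equations in PRF:
$$\begin{aligned}
Y_1 &= \beta_1 + \beta_2X_{21} + \beta_3X_{31} + \ldots + \beta_kX_{k1} + u_1 \\
Y_2 &= \beta_1 + \beta_2X_{22} + \beta_3X_{32} + \ldots + \beta_kX_{k2} + u_2 \\
&\ \ \vdots \\
Y_n &= \beta_1 + \beta_2X_{2n} + \beta_3X_{3n} + \ldots + \beta_kX_{kn} + u_n
\end{aligned}$$
Write this system in matrix notation:
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_{21} & \cdots & X_{k1} \\ 1 & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{2n} & \cdots & X_{kn} \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$
$$\underset{n\times 1}{\mathbf{y}} = \underset{n\times k}{\mathbf{X}}\ \underset{k\times 1}{\boldsymbol\beta} + \underset{n\times 1}{\mathbf{u}}$$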
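Theorem
$\mathbf{u} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$
Proof
$$E(\mathbf{u}) = E\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_n) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \mathbf{0}$$
$$E(\mathbf{uu}') = E\left[\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}\begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix}\right] = E\begin{pmatrix} u_1^2 & u_1u_2 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & \cdots & u_2u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_nu_1 & u_nu_2 & \cdots & u_n^2 \end{pmatrix}$$
$$= \begin{pmatrix} E(u_1^2) & E(u_1u_2) & \cdots & E(u_1u_n) \\ E(u_2u_1) & E(u_2^2) & \cdots & E(u_2u_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(u_nu_1) & E(u_nu_2) & \cdots & E(u_n^2) \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$
$$= \sigma^2\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = \sigma^2\mathbf{I}$$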
■
OLS ESTIMATION
SRF in matrix notation:
\[ y = X\hat{\beta} + \hat{u} \]
Theorem
The OLS estimator for $\beta$ is
\[ \hat{\beta} = (X'X)^{-1} X'y \]
Proof
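The OLS estimators are obtained by minimizing $\sum \hat{u}_i^2$. In matrix notation, minimize $\hat{u}'\hat{u}$:
\[ \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) \]
\[ = y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \]
\[ = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta} \]
The last step holds because $y'X\hat{\beta}$ is a scalar and therefore equals its transpose $\hat{\beta}'X'y$.
Take the derivative with respect to $\hat{\beta}$, set it equal to zero and solve:
\[ \frac{\partial \hat{u}'\hat{u}}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0 \]
\[ \therefore \; X'X\hat{\beta} = X'y \]
\[ \therefore \; \hat{\beta} = (X'X)^{-1} X'y \quad \text{provided } X'X \text{ is invertible} \]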
■
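A small numerical illustration of the theorem above. The data are hypothetical, chosen only to keep the arithmetic short, and are not taken from the notes: n = 4, k = 2, with X_i = 1, 2, 3, 4 and Y_i = 2, 3, 5, 6.
\[ X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \quad y = \begin{pmatrix} 2 \\ 3 \\ 5 \\ 6 \end{pmatrix} \]
\[ X'X = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}, \quad X'y = \begin{pmatrix} 16 \\ 47 \end{pmatrix} \]
\[ (X'X)^{-1} = \frac{1}{(4)(30) - (10)^2} \begin{pmatrix} 30 & -10 \\ -10 & 4 \end{pmatrix} = \frac{1}{20} \begin{pmatrix} 30 & -10 \\ -10 & 4 \end{pmatrix} \]
\[ \hat{\beta} = (X'X)^{-1} X'y = \frac{1}{20} \begin{pmatrix} 30(16) - 10(47) \\ -10(16) + 4(47) \end{pmatrix} = \frac{1}{20} \begin{pmatrix} 10 \\ 28 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 1.4 \end{pmatrix} \]
The two-variable formulas give the same values:
\[ \hat{\beta}_2 = \frac{4(47) - (10)(16)}{4(30) - (10)^2} = \frac{28}{20} = 1.4, \quad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 4 - 1.4(2.5) = 0.5 \]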
Theorem
\[ \hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1}) \]
Proof
\[ \hat{\beta} = (X'X)^{-1} X'y \]
\[ = (X'X)^{-1} X'(X\beta + u) \]
\[ = (X'X)^{-1} X'X\beta + (X'X)^{-1} X'u \]
\[ = \beta + (X'X)^{-1} X'u \]
\[ E(\hat{\beta}) = E\left( \beta + (X'X)^{-1} X'u \right) \]
\[ = E(\beta) + E\left( (X'X)^{-1} X'u \right) \]
\[ = \beta + (X'X)^{-1} X' E(u) \]
\[ = \beta \quad \text{since } E(u) = 0 \]
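\[ \text{var-cov}(\hat{\beta}) = E\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)' \right] \]
\[ = E\left[ \left( (X'X)^{-1} X'u \right) \left( (X'X)^{-1} X'u \right)' \right] \]
\[ = E\left[ \left( (X'X)^{-1} X'u \right) \left( u'X (X'X)^{-1} \right) \right] \]
\[ = E\left[ (X'X)^{-1} X'uu'X (X'X)^{-1} \right] \]
\[ = (X'X)^{-1} X' E(uu') X (X'X)^{-1} \]
\[ = (X'X)^{-1} X' \sigma^2 I X (X'X)^{-1} \]
\[ = \sigma^2 (X'X)^{-1} \]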
■
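As a check on the variance-covariance result, consider a worked special case. This is a sketch that assumes k = 2 and writes X_{2i} as X_i, so that X consists of a column of ones and the column of X_i values, with x_i = X_i - \bar{X} denoting deviations from the mean.
\[ X'X = \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix}, \quad |X'X| = n \sum X_i^2 - (\sum X_i)^2 = n \sum x_i^2 \]
\[ \sigma^2 (X'X)^{-1} = \frac{\sigma^2}{n \sum x_i^2} \begin{pmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{pmatrix} \]
Reading off the elements of this matrix:
\[ \text{var}(\hat{\beta}_1) = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2}, \quad \text{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2}, \quad \text{cov}(\hat{\beta}_1, \hat{\beta}_2) = -\frac{\bar{X} \sigma^2}{\sum x_i^2} \]
These agree with the standard variance and covariance expressions for the two-variable model.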
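INFERENCE
\[ \hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{n - k} \]
\[ R^2 = \frac{\hat{\beta}'X'y - n\bar{Y}^2}{y'y - n\bar{Y}^2} \]
\[ F = \frac{(\hat{\beta}'X'y - n\bar{Y}^2) / (k - 1)}{(y'y - \hat{\beta}'X'y) / (n - k)} = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)} \sim F(k - 1, n - k) \]
\[ t = \frac{\hat{\beta}_j - \beta_j}{\hat{\sigma}_{\hat{\beta}_j}} \sim t(n - k) \]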
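A small numerical illustration of these quantities. The data are hypothetical and not taken from the notes: n = 4, k = 2, with X_i = 1, 2, 3, 4 and Y_i = 2, 3, 5, 6. For these data X'y = (16, 47)', (X'X)^{-1} = \frac{1}{20} \begin{pmatrix} 30 & -10 \\ -10 & 4 \end{pmatrix} and \hat{\beta} = (0.5, 1.4)'.
\[ y'y = 4 + 9 + 25 + 36 = 74, \quad \hat{\beta}'X'y = 0.5(16) + 1.4(47) = 73.8, \quad n\bar{Y}^2 = 4(4)^2 = 64 \]
\[ \hat{u}'\hat{u} = y'y - \hat{\beta}'X'y = 0.2, \quad \hat{\sigma}^2 = \frac{0.2}{4 - 2} = 0.1 \]
\[ R^2 = \frac{73.8 - 64}{74 - 64} = 0.98 \]
\[ F = \frac{(73.8 - 64) / 1}{0.2 / 2} = \frac{0.98 / 1}{0.02 / 2} = 98 \]
For the t statistic, the standard error of \hat{\beta}_j is taken as the square root of \hat{\sigma}^2 times the j-th diagonal element of (X'X)^{-1}. Under the hypothesis \beta_2 = 0:
\[ \hat{\sigma}_{\hat{\beta}_2} = \sqrt{0.1 (4/20)} = \sqrt{0.02} \approx 0.1414, \quad t = \frac{1.4 - 0}{0.1414} \approx 9.90 \]
With a single slope coefficient, t^2 = 1.96 / 0.02 = 98 = F, as expected.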
REFERENCES
Gujarati, D.N. & Porter, D.C. 2009. Basic Econometrics, Fifth edition, McGraw-Hill.
Steyn, A.G.W., Smit, C.F., du Toit, S.H.C. & Strasheim, C. 1999. Modern Statistics in Practice, J.L. van Schaik Publishers.