Ordinary Least Squares Regression

PO7001: Quantitative Methods I
Kenneth Benoit
24 November 2010
Independent and Dependent variables
- A dependent variable represents the quantity we wish to explain variation in, or the thing we are trying to explain
- Typical examples of a dependent variable in political science:
  - votes received by a governing party
  - support for a referendum result like Lisbon
  - party support for European integration
- An independent variable represents a quantity whose variation will be used to explain variation in the dependent variable
- Typical examples of independent variables in political science:
  - demographic: gender, national background, age
  - economic: socioeconomic status, income, national wealth
  - political: party affiliation of one's parents
  - institutional: district magnitude, electoral system, presidential v. parliamentary
  - behavioural: campaign spending levels
- Using this language implies causality: X → Y
The importance of variation
- Variation in outcomes (of the dependent variable) is what we seek to explain in social and political research
- We seek to explain these outcomes using (independent) variables
- The very language, here the term "variable", suggests that the quantity so named has to vary
- Conversely, a quantity that does not vary is impossible to study in this way. This also applies to samples that do not vary: these will not help us in research.
- Typically when we collect data, we wish to have as much variation in our sample as possible
- Example: in the returns-to-schooling example, it would be best to have as much variation in schooling as possible, to maximize the leverage of our research
Different functional relationships
Linear: A linear relationship, also known as a straight-line relationship, exists if a line drawn through the central tendency of the points is a straight line
Curvilinear: Exists if the relationship between the variables is not a straight-line function, but is instead curved
  Example: television viewing (Y) as a function of age (X)
  (More on this in Quant 2)
Correlations
- The central idea behind correlation is that two variables have a systematic relationship between their values
- This relationship can be positive or negative
- This relationship varies in strength
- Bivariate associations are usually depicted graphically using a scatterplot
- We can also summarize (as in an earlier week) the correlation using Pearson's r, which shows the direction and strength of the bivariate relationship
Pearson’s r revisited
r = Σ(Xi − X̄)(Yi − Ȳ) / √[ Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ]
  = Sum of Products / √(Sum of Squares_x · Sum of Squares_y)
  = SP / √(SSx · SSy)
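
As a minimal sketch, Pearson's r can be computed directly from SP, SSx, and SSy and compared with R's built-in cor(); the x and y vectors here are the ones used in the cor.test() example later in these slides.

# sketch: Pearson's r from sums of products and sums of squares
x <- c(12,10,6,16,8,9,12,11)
y <- c(12,8,12,11,10,8,16,15)
SP  <- sum((x - mean(x)) * (y - mean(y)))   # sum of products
SSx <- sum((x - mean(x))^2)                 # sum of squares of x
SSy <- sum((y - mean(y))^2)                 # sum of squares of y
SP / sqrt(SSx * SSy)                        # should match cor(x, y)
cor(x, y)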
Testing the significance of Pearson’s r
- Pearson's r measures correlation in the sample, but the association we are interested in exists in the population
- Question: what is the probability that any correlation we measure in a sample really exists in the population, and is not merely due to sampling error?
- H0: ρ_x,y = 0 (no correlation exists in the population)
- HA: ρ_x,y ≠ 0
- Significance can be tested by the t-ratio:

  t = r√(N − 2) / √(1 − r²)
Pearson’s r significance testing: Example
- Let's assume we have data such that r = .24, N = 8
- The computation of t is then:

  t = .24√(8 − 2) / √(1 − (.24)²)
    = .24(2.45) / √(1 − .0576)
    = .59 / √.9424
    = .59 / .97
    = .61

- From R (using qt()), we know that the critical value for t with df = 6, α = .05 is 2.447, so we do not reject H0
- In R there is a built-in test for this: cor.test(x, y)
Pearson’s r significance testing using R
> x <- c(12,10,6,16,8,9,12,11)
> y <- c(12,8,12,11,10,8,16,15)
> cor(x,y)
[1] 0.2420615
> # compute the empirical t-value
> (t.calc <- (.24*sqrt(8-2) / sqrt(1-.24^2)))
[1] 0.6055768
> # compute the critical t-value
> (t.crit <- qt(1-.05/2, 6))
[1] 2.446912
> # compare
> t.calc > t.crit
[1] FALSE
> # using R’s built-in correlation significance test
> cor.test(x,y)
Pearson’s product-moment correlation
data: x and y
t = 0.6111, df = 6, p-value = 0.5636
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5577490 0.8087778
sample estimates:
cor
0.2420615
Regression analysis
- Recall the basic linear model:

  yi = a + bXi

- Here the relationship is determined by two parameters:
  a  the intercept: the value of Y when X is zero
  b  the slope: the rate of change in Y for a one-unit change in X; also known as the regression coefficient
- Note, however, that this implies a perfect straight-line relationship. Because this is never the case in real research, we also add an error term or residual εi:

  Yi = β0 + β1 xi + εi

  and we use the β and ε terminology instead of the high-school quantities a and b
Example
par(mar=c(4,4,1,1))
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
plot(x, y, xlab="Number of prior convictions (X)",
     ylab="Sentence length (Y)", pch=19)
abline(h=c(10,20,30,40), col="grey70")

[Scatterplot of sentence length (Y) against number of prior convictions (X)]
Least squares formulas
For the three parameters (simple regression):
- the regression coefficient:

  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

- the intercept:

  β̂0 = ȳ − β̂1 x̄

- and the residual variance σ²:

  σ̂² = (1/(n − 2)) Σ [yi − (β̂0 + β̂1 xi)]²
Least squares formulas continued
Things to note:
- the prediction line is ŷ = β̂0 + β̂1 x
- the value ŷi = β̂0 + β̂1 xi is the predicted value for xi
- the residual is ei = yi − ŷi
- the residual sum of squares (RSS) = Σ ei²
- the estimate for σ² is the same as σ̂² = RSS/(n − 2)
Example to show formulas in R
> x <- c(0,3,1,0,6,5,3,4,10,8)
> y <- c(12,13,15,19,26,27,29,31,40,48)
> (data <- data.frame(x, y, xdev=(x-mean(x)), ydev=(y-mean(y)),
+                     xdevydev=((x-mean(x))*(y-mean(y))),
+                     xdev2=(x-mean(x))^2,
+                     ydev2=(y-mean(y))^2))
    x  y xdev ydev xdevydev xdev2 ydev2
1   0 12   -4  -14       56    16   196
2   3 13   -1  -13       13     1   169
3   1 15   -3  -11       33     9   121
4   0 19   -4   -7       28    16    49
5   6 26    2    0        0     4     0
6   5 27    1    1        1     1     1
7   3 29   -1    3       -3     1     9
8   4 31    0    5        0     0    25
9  10 40    6   14       84    36   196
10  8 48    4   22       88    16   484
> (SP <- sum(data$xdevydev))
[1] 300
> (SSx <- sum(data$xdev2))
[1] 100
> (SSy <- sum(data$ydev2))
[1] 1250
> (b1 <- SP / SSx)
[1] 3
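
Continuing this example, a short sketch (reusing the same x, y, and the b1 just computed; the names b0, yhat, and sigma2.hat are illustrative) computes the intercept and the residual variance by the formulas above.

# sketch: intercept and residual variance for the same example
b0 <- mean(y) - b1 * mean(x)          # should give 14
yhat <- b0 + b1 * x                   # predicted values
RSS <- sum((y - yhat)^2)              # residual sum of squares
sigma2.hat <- RSS / (length(y) - 2)   # estimate of the residual variance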
From observed to "predicted" relationship
- In the above example, β̂0 = 14, β̂1 = 3
- This linear equation forms the regression line
- The regression line always passes through two points:
  - the point (x = 0, y = β̂0)
  - the point (x̄, ȳ) (the average X predicts the average Y)
- The residual sum of squares (RSS) = Σ ei²
- The regression line is that which minimizes the RSS
Plot of regression example
[Scatterplot of sentence length (Y) against number of prior convictions (X), showing the regression line Ŷ = 14 + 3X and the Y-intercept]
Requirements for regression
1. Both variables should be measured at the interval level
2. The relationship (in the population) between X and Y is linear
   - A transformation may fix non-linearity in some cases
   - Outliers or influential points may need special treatment
3. The sample is randomly chosen (necessary for inference)
4. Both variables must be normally distributed, unless we have a very large sample
Regression "terminology"
- y is the dependent variable
  - referred to also (by Greene) as a regressand
- X are the independent variables
  - also known as explanatory variables
  - also known as regressors
- y is regressed on X
- The error term is sometimes called a disturbance
Notation
- Independent variables: X
- Dependent variable: Y
- Yi is a random variable (not directly observed)
- yi is a realised value of Yi (observed)
- So "dependent variable" sometimes refers to a set of numbers in your dataset (y) and sometimes to a random variable at each i (Yi)
Interpreting the regression results
- The Y-intercept corresponds to the expected value of Y when X = 0 (which may or may not be meaningful)
- The regression coefficient β1 refers to the expected change in Y resulting from a one-unit change in X
- Typically we are more interested in β1 than β0, but β0 forms a vital part of any regression estimation
- We can make a prediction ŷi for any i (and xi, although we should be careful to choose reasonable values of xi)

  Example: for xi = 13,
  ŷi = β̂0 + β̂1(13)
     = 14 + 3(13)
     = 14 + 39
     = 53
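
The same prediction can be obtained in R; a minimal sketch using the sentencing data above, with fit as an illustrative model name.

# sketch: prediction for a new value of x
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
fit <- lm(y ~ x)                             # yields intercept 14, slope 3
predict(fit, newdata = data.frame(x = 13))   # should give 53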
Sums of squares (ANOVA)
TSS  Total sum of squares: Σ(yi − ȳ)²
ESS  Estimation (or Regression) sum of squares: Σ(ŷi − ȳ)²
RSS  Residual sum of squares: Σ ei² = Σ(ŷi − yi)²
The key to remember is that TSS = ESS + RSS
R²
How much of the variance did we explain?

  R² = 1 − RSS/TSS = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²

Can be interpreted as the proportion of total variance explained by the model.
R² and Pearson's r
For simple linear regression (i.e. one independent variable), R² is the same as the correlation coefficient, Pearson's r, squared.
R²
- Note that computing the regression line uses the same sums of squares (SP, SSx, SSy) used in computing the correlation r
- The squared correlation R² is an important quantity in regression (sometimes called the coefficient of determination)
- R² is the proportion of variance in Y determined by variation in X
- 0 ≤ R² ≤ 1.0
- But remember that R² is not a regression parameter
- Computation:

  r = SP / √(SSx · SSy)
R² computation example

  r = SP / √(SSx · SSy)
    = 300 / √((100)(1250))
    = 300 / √125000
    = 300 / 353.55
    = .85

r² is then (.85)² = .72
This means that 72% of the variance in sentence length is explained by the number of prior convictions
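
A quick check of this in R, sketched with the sentencing data (cor() and lm() are base R):

# sketch: R^2 equals the squared correlation in simple regression
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
cor(x, y)^2                    # roughly .72
summary(lm(y ~ x))$r.squared   # the same value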
R²
- A much over-used statistic: it may not be what we are interested in at all
- Interpretation: the proportion of the variation in y that is explained linearly by the independent variables
- Defined in terms of sums of squares:

  R² = ESS/TSS
     = 1 − RSS/TSS
     = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²

- Alternatively, R² is the squared correlation coefficient between y and ŷ
R² continued
- When a model has no intercept, it is possible for R² to lie outside the interval (0, 1)
- R² rises with the addition of more explanatory variables. For this reason we often report "adjusted R²": 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the total number of regressors in the linear model (excluding the constant)
- Whether R² is high or not depends a lot on the overall variance in Y
- Two R² values from different Y samples cannot be compared
R² continued
- Solid arrow: variation in y when X is unknown (TSS, Total Sum of Squares, Σ(yi − ȳ)²)
- Dashed arrow: variation in y when X is known (ESS, Estimation Sum of Squares, Σ(ŷi − ȳ)²)
R² decomposed

  y = ŷ + e
  Var(y) = Var(ŷ) + Var(e) + 2Cov(ŷ, e)
  Var(y) = Var(ŷ) + Var(e) + 0
  Σ(yi − ȳ)²/N = Σ(ŷi − ȳ)²/N + Σ(ei − ē)²/N
  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(ei − ē)²
  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ ei²
  TSS = ESS + RSS
  TSS/TSS = ESS/TSS + RSS/TSS
  1 = R² + unexplained variance
Example from height-weight data
Of course, there is a direct way to compute regression statistics in R:
> # regression models in R
> regmodel <- lm(y ~ x)
> summary(regmodel)
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-5.2121 -2.5682 -0.6515  1.5303  8.2424

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.0909    21.6202   2.271   0.0636 .
x             0.7576     0.3992   1.898   0.1065
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.587 on 6 degrees of freedom
Multiple R-squared: 0.375, Adjusted R-squared: 0.2709
F-statistic: 3.601 on 1 and 6 DF, p-value: 0.1065
Illustration using the Anscombe dataset
[show R analysis here]
## Illustration using Anscombe dataset
data(anscombe)
attach(anscombe)
round(coef(lm(y1~x1)), 2)   # same b0, b1
round(coef(lm(y2~x2)), 2)
round(coef(lm(y3~x3)), 2)
round(coef(lm(y4~x4)), 2)
round(summary(lm(y1~x1))$r.squared, 2)   # same R^2
round(summary(lm(y2~x2))$r.squared, 2)
round(summary(lm(y3~x3))$r.squared, 2)
round(summary(lm(y4~x4))$r.squared, 2)
# plot the four x-y pairs
par(mfrow=c(2,2), mar=c(4, 4, 1, 1)+0.1) # 4 plots in one graphic window
plot(x1,y1)
abline(lm(y1~x1), col="red", lty="dashed")
plot(x2,y2)
abline(lm(y2~x2), col="red", lty="dashed")
plot(x3,y3)
abline(lm(y3~x3), col="red", lty="dashed")
plot(x4,y4)
abline(lm(y4~x4), col="red", lty="dashed")
detach(anscombe)
Anscombe plots
[Four scatterplots: y1 vs x1, y2 vs x2, y3 vs x3, and y4 vs x4, each with its fitted regression line]
Extensions of the regression model
- The regression model can be generalized to multiple regression, which involves "regressing Y" on several independent variables X1, X2, etc.
- Regression allows us to isolate the linear contribution of each unit of each Xk on Y, by "holding everything else constant"
- This is the most common and most powerful basic technique in social science statistics, and something that is used in virtually every analysis that attempts to establish any sort of cause and effect
- The additional X variables are typically considered to be control variables
- Some variables may also be categorical, especially when they are "dummy variables" (0 or 1)
Linear model
The linear model can be written as:

  Yi = Xi β + εi,   εi ∼ N(0, σ²)

Or, alternatively, as:

  Yi ∼ N(µi, σ²)
  µi = Xi β

Linear model
Two components of the model:

  Yi ∼ N(µi, σ²)   (stochastic)
  µi = Xi β        (systematic)

Generalised version:

  Yi ∼ f(θi, α)    (stochastic)
  θi = g(Xi, β)    (systematic)
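
To make the stochastic/systematic distinction concrete, here is a minimal simulation sketch; the sample size, coefficient values, and object names are illustrative, not from the slides.

# sketch: simulate from the linear model's systematic and stochastic parts
set.seed(1)
n     <- 100
x     <- runif(n, 0, 10)                   # an illustrative regressor
beta0 <- 2; beta1 <- 3; sigma <- 1.5
mu    <- beta0 + beta1 * x                 # systematic component: mu_i = X_i beta
y     <- rnorm(n, mean = mu, sd = sigma)   # stochastic component: Y_i ~ N(mu_i, sigma^2)
coef(lm(y ~ x))                            # estimates should be close to beta0 and beta1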
Model

  Yi ∼ f(θi, α)    (stochastic)
  θi = g(Xi, β)    (systematic)

Stochastic component: varies over repeated (hypothetical) observations on the same unit.
Systematic component: varies across units, but is constant given X.

Model

  Yi ∼ f(θi, α)    (stochastic)
  θi = g(Xi, β)    (systematic)

Two types of uncertainty:
Estimation uncertainty: lack of knowledge about α and β; can be reduced by increasing N.
Fundamental uncertainty: represented by the stochastic component and exists independent of the researcher.
Inference from regression
In linear regression, the sampling distribution of the coefficient estimates is normal, and is approximated by a t distribution because σ is approximated by s.
Thus we can calculate a confidence interval for each estimated coefficient.
Or perform a hypothesis test along the lines of:

  H0: β1 = 0
  H1: β1 ≠ 0
Inference from regression
To calculate the confidence interval, we need to calculate the standard error of the coefficient.
Rule of thumb for the 95% confidence interval:

  β̂ − 2SE < β < β̂ + 2SE

Thus if β̂ is positive, we are 95% certain that β is different from zero when β̂ − 2SE > 0. (Or when the t value is greater than 2 or less than −2.)
In R, we get the standard errors by using the summary() command on the model output.
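
A sketch of both routes in R, assuming a fitted lm object named m (as in the example that follows); confint() and coef(summary()) are base R.

# sketch: confidence intervals for regression coefficients
est <- coef(summary(m))                        # estimates and standard errors
est[, "Estimate"] - 2 * est[, "Std. Error"]    # rough lower bounds (rule of thumb)
est[, "Estimate"] + 2 * est[, "Std. Error"]    # rough upper bounds
confint(m, level = 0.95)                       # exact t-based intervals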
Simple regression model: example
Does infant mortality decrease as GDP per capita (measured in 1976) increases?
> m <- lm(INFMORT ~ LEVEL, data=aclp,
subset=(YEAR == 1976 & INFMORT >= 0))
> summary(m)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.3028647  2.7222356  12.234 9.44e-13 ***
LEVEL       -0.0018998  0.0003038  -6.253 9.29e-07 ***
F-test
In simple linear regression, we can do an F-test:

  H0: β1 = 0
  H1: β1 ≠ 0

  F = (ESS/1) / (RSS/(n − 2)) = ESS / σ̂²

with 1 and n − 2 degrees of freedom.
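
As a sketch, the F statistic can be computed by hand from the sums of squares and compared with the value reported by summary(); fit is an illustrative name for a fitted simple regression object.

# sketch: F statistic from the ANOVA decomposition
y    <- model.response(model.frame(fit))   # the dependent variable used in the fit
yhat <- fitted(fit)
ESS  <- sum((yhat - mean(y))^2)
RSS  <- sum(residuals(fit)^2)
n    <- length(y)
ESS / (RSS / (n - 2))                      # F with 1 and n-2 df
summary(fit)$fstatistic                    # value, numdf, dendf reported by R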
Example
> require(foreign)
> dail <- read.dta("dailcorrected.dta")
> summary(lm(votes1st ~ spend_total + incumb + electorate + minister, data=dail))
Call:
lm(formula = votes1st ~ spend_total + incumb + electorate + minister,
data = dail)
Residuals:
    Min      1Q  Median      3Q     Max
-4934.1 -1038.8  -347.6  1054.0  6900.3

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      7.966e+02  4.172e+02   1.909   0.0569 .
spend_total      1.737e-01  1.095e-02  15.862   <2e-16 ***
incumbIncumbent  2.522e+03  2.207e+02  11.424   <2e-16 ***
electorate      -4.827e-04  5.404e-03  -0.089   0.9289
minister        -1.303e+02  3.965e+02  -0.329   0.7425
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1847 on 458 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.6478, Adjusted R-squared: 0.6447
F-statistic: 210.6 on 4 and 458 DF, p-value: < 2.2e-16
CLRM: Basic Assumptions
1. Specification:
   - Relationship between X and Y in the population is linear: E(Y) = Xβ
   - No extraneous variables in X
   - No omitted independent variables
   - Parameters (β) are constant
2. E(ε) = 0
3. Error terms:
   - Var(ε) = σ², or homoskedastic errors
   - E(εi εj) = 0 for i ≠ j, or no auto-correlation
CLRM: Basic Assumptions (cont.)
4. X is non-stochastic, meaning observations on the independent variables are fixed in repeated samples
   - implies no measurement error in X
   - implies no serial correlation where a lagged value of Y would be used as an independent variable
   - no simultaneity or endogenous X variables
5. N > k, or the number of observations is greater than the number of independent variables (in matrix terms: rank(X) = k), and no exact linear relationships exist in X
6. Normally distributed errors: ε|X ∼ N(0, σ²). Technically, however, this is a convenience rather than a strict assumption
Ordinary Least Squares (OLS)
- Objective: minimize Σ ei² = Σ(Yi − Ŷi)², where
  - Ŷi = b0 + b1 Xi
  - the error is ei = (Yi − Ŷi)
- The slope is:

  b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
     = Σ xi yi / Σ xi²   (lower-case letters denoting deviations from the means)

- The intercept is: b0 = Ȳ − b1 X̄
OLS rationale
- Formulas are very simple
- Closely related to ANOVA (sums of squares decomposition)
- Predicted Y is the sample mean when Pr(Y|X) = Pr(Y)
  - In the special case where Y has no relation to X, b1 = 0, and the OLS fit is simply Ŷ = b0
  - Why? Because b0 = Ȳ − b1 X̄, so Ŷ = Ȳ
  - The prediction is then the sample mean when X is unrelated to Y
- Since OLS is then an extension of the sample mean, it has the same attractive properties (efficiency and lack of bias)
- Alternatives exist, but OLS generally has the best properties when its assumptions are met
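
A small sketch of the "sample mean" special case: regressing y on a constant alone recovers the mean of y (the data are the sentencing example from earlier).

# sketch: intercept-only regression returns the sample mean
y <- c(12,13,15,19,26,27,29,31,40,48)
coef(lm(y ~ 1))   # intercept equals mean(y) = 26
mean(y)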
OLS in matrix notation
- Formula for the coefficient vector β:

  Y = Xβ + ε
  X′Y = X′Xβ + X′ε
  X′Y = X′Xβ + 0
  (X′X)⁻¹X′Y = β + 0
  β̂ = (X′X)⁻¹X′Y

- Formula for the variance-covariance matrix: σ²(X′X)⁻¹
  - In the simple case where y = β0 + β1 x, this gives σ²/Σ(xi − x̄)² for the variance of β1
  - Note how increasing the variation in X will reduce the variance of β1
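
A sketch of these formulas in R, using the sentencing data (object names are illustrative); solve(), t(), and vcov() are base R.

# sketch: OLS coefficients and variance-covariance matrix via matrix algebra
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
X <- cbind(1, x)                        # design matrix with a constant
b <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'Y; should give 14 and 3
e <- y - X %*% b                        # residuals
s2 <- sum(e^2) / (length(y) - 2)        # estimate of sigma^2
s2 * solve(t(X) %*% X)                  # variance-covariance matrix
vcov(lm(y ~ x))                         # same thing from lm()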
The "hat" matrix
- The hat matrix H is defined by:

  β̂ = (X′X)⁻¹X′y
  Xβ̂ = X(X′X)⁻¹X′y
  ŷ = Hy

- H = X(X′X)⁻¹X′ is called the hat matrix
- Other important quantities, such as ŷ and Σ ei² (RSS), can be expressed as functions of H
- Corrections for heteroskedastic errors ("robust" standard errors) involve manipulating H
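
A sketch continuing the matrix example above (reusing X and y): the fitted values from Hy match those from lm(), and RSS can be written through I − H.

# sketch: the hat matrix in action
H <- X %*% solve(t(X) %*% X) %*% t(X)           # hat matrix
yhat <- H %*% y                                 # fitted values, same as fitted(lm(y ~ x))
RSS <- t(y) %*% (diag(length(y)) - H) %*% y     # RSS via the residual-maker I - H
sum((y - yhat)^2)                               # check: the same RSS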
Some important OLS properties to understand
Applies to y = α + βx + ε
- If β = 0 and the only regressor is the intercept, then this is the same as regressing y on a column of ones, and hence α = ȳ, the mean of the observations
- If α = 0, so that there is no intercept and one explanatory variable x, then β = Σ xy / Σ x²
- If there is an intercept and one explanatory variable, then

  β = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
    = Σ(xi − x̄)yi / Σ(xi − x̄)²
Some important OLS properties (cont.)
- If the observations are expressed as deviations from their means, y* = y − ȳ and x* = x − x̄, then β = Σ x*y* / Σ x*²
- The intercept can be estimated as ȳ − βx̄. This implies that the intercept is estimated by the value that causes the sum of the OLS residuals to equal zero.
- The mean of the ŷ values equals the mean of the y values; together with the previous properties, this implies that the OLS regression line passes through the overall mean of the data points
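
These properties are easy to verify numerically; a sketch using the sentencing data (fit is an illustrative name):

# sketch: checking two OLS properties
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
fit <- lm(y ~ x)
sum(residuals(fit))                               # essentially zero
predict(fit, newdata = data.frame(x = mean(x)))   # equals mean(y) = 26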
Normally distributed errors
OLS in R
> dail <- read.dta("dail2002.dta")
> mdl <- lm(votes1st ~ spend_total*incumb + minister, data=dail)
> summary(mdl)
Call:
lm(formula = votes1st ~ spend_total * incumb + minister, data = dail)
Residuals:
    Min      1Q  Median      3Q     Max
-5555.8  -979.2  -262.4   877.2  6816.5

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)          469.37438  161.54635   2.906  0.00384 **
spend_total            0.20336    0.01148  17.713  < 2e-16 ***
incumb              5150.75818  536.36856   9.603  < 2e-16 ***
minister            1260.00137  474.96610   2.653  0.00826 **
spend_total:incumb    -0.14904    0.02746  -5.428 9.28e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1796 on 457 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.6672, Adjusted R-squared: 0.6643
F-statistic: 229 on 4 and 457 DF, p-value: < 2.2e-16
OLS in Stata
. use dail2002
(Ireland 2002 Dail Election - Candidate Spending Data)
. gen spendXinc = spend_total * incumb
(2 missing values generated)
. reg votes1st spend_total incumb minister spendXinc
      Source |       SS       df       MS              Number of obs =     462
-------------+------------------------------           F(  4,   457) =  229.05
       Model |  2.9549e+09     4   738728297           Prob > F      =  0.0000
    Residual |  1.4739e+09   457  3225201.58           R-squared     =  0.6672
-------------+------------------------------           Adj R-squared =  0.6643
       Total |  4.4288e+09   461  9607007.17           Root MSE      =  1795.9

------------------------------------------------------------------------------
    votes1st |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 spend_total |   .2033637   .0114807    17.71   0.000     .1808021    .2259252
      incumb |   5150.758   536.3686     9.60   0.000     4096.704    6204.813
    minister |   1260.001   474.9661     2.65   0.008      326.613     2193.39
   spendXinc |  -.1490399   .0274584    -5.43   0.000    -.2030003   -.0950794
       _cons |   469.3744   161.5464     2.91   0.004     151.9086    786.8402
------------------------------------------------------------------------------
Sums of squares (ANOVA)
TSS  Total sum of squares: Σ(yi − ȳ)²
ESS  Estimation (or Regression) sum of squares: Σ(ŷi − ȳ)²
RSS  Residual sum of squares: Σ ei² = Σ(ŷi − yi)²
The key to remember is that TSS = ESS + RSS
Examining the sums of squares
> yhat <- mdl$fitted.values   # uses the lm object mdl from previous
> ybar <- mean(mdl$model[,1])
> y <- mdl$model[,1]          # can’t use dail$votes1st since diff N
> TSS <- sum((y-ybar)^2)
> ESS <- sum((yhat-ybar)^2)
> RSS <- sum((yhat-y)^2)
> RSS
[1] 1473917120
> sum(mdl$residuals^2)
[1] 1473917120
> (r2 <- ESS/TSS)
[1] 0.6671995
> (adjr2 <- (1 - (1-r2)*(462-1)/(462-4-1)))
[1] 0.6642865
> summary(mdl)$r.squared      # note the call to summary()
[1] 0.6671995
> RSS/457
[1] 3225202
> sqrt(RSS/457)
[1] 1795.885
> summary(mdl)$sigma
[1] 1795.885
Regression model return values
Here we will talk about the quantities returned with the lm()
command and lm class objects.
Next (last) week
- ANOVA: analysis of variance
- Regression with categorical independent variables
- Regression with interaction terms (Brambor, Clark, and Golder)
- Preview of Quant 2 with non-linear/normal models