Linear regression with one independent variable

Linear regression assumes a linear relationship between the dependent and independent variables.
Yi = b0+b1X1+i, i = 1, …, n
Yi – dependent variable
b0 - intercept
b1X1 – slope times independent variable
i – error term
In a regression with one independent variable (X), the slope coefficient equals Cov(Y,X) / Var(X).
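
As a quick sketch of this formula in Python (the x and y arrays here are made-up illustration data, not the class example):

import numpy as np

# Made-up illustration data
x = np.array([0.012, -0.021, 0.034, 0.005, -0.010, 0.018])
y = np.array([0.025, -0.040, 0.070, 0.004, -0.030, 0.041])

# Slope: b1 = Cov(Y, X) / Var(X)
b1 = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)

# Intercept: the fitted line passes through the sample means
b0 = y.mean() - b1 * x.mean()

print(b0, b1)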
Assumptions of linear regression:
1. The relationship between the dependent variable, Y, and the independent variable, X, is linear in
the parameters b0 and b1. The requirement does not exclude X from being raised to a power other than
1.
2. The independent variable, X, is not random.
3. The expected value of the error term equals 0.
4. The variance of the error term is the same for all observations (the homoskedasticity assumption).
5. The error term is uncorrelated across observations.
6. The error term is normally distributed.
Regression analysis uses two principal types of data:
- cross-sectional: data that involve many observations on X and Y for the same time period
- time series: data that use many observations from different time periods for the same company,
asset class, investment fund, person, country, etc.
Linear regression, also known as linear least squares, computes a line that best fits the observations; it
chooses values for the intercept, b0, and slope, b1, that minimize the sum of the squared vertical
distances between the observations and the regression line.
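
One way to run this fit in Python is scipy.stats.linregress, which returns the same b0 and b1 as the Cov/Var formula above (a sketch, reusing the made-up x and y arrays):

from scipy import stats

# Least-squares fit: minimizes the sum of squared vertical distances
result = stats.linregress(x, y)

print(result.intercept, result.slope)   # b0 and b1
print(result.rvalue)                    # correlation between X and Y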
Understanding Excel’s linear regression output
(example of estimating Intel’s beta from the class)
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7283
R Square            0.5304
Adjusted R Square   0.5223
Standard Error      0.0927
Observations        60

ANOVA
             df    SS        MS        F          Significance F
Regression    1    0.5627    0.5627    65.5155    0.0000
Residual     58    0.4982    0.0086
Total        59    1.0609

                                   Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept                          -0.0029        0.0120           -0.2438   0.8083    -0.0269     0.0210
X Variable 1 (beta of the stock)   2.2516         0.2782           8.0942    0.0000    1.6948      2.8084
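
The same kind of summary can be produced outside Excel; a minimal sketch with Python's statsmodels, using simulated return series (the actual Intel data from class are not reproduced here):

import numpy as np
import statsmodels.api as sm

# Simulated monthly returns standing in for the 60 class observations
rng = np.random.default_rng(42)
market = rng.normal(0.01, 0.04, 60)               # market (X) returns
stock = 2.25 * market + rng.normal(0, 0.09, 60)   # stock (Y) returns

# Add a constant column so the model estimates an intercept (b0)
X = sm.add_constant(market)
model = sm.OLS(stock, X).fit()

# Prints R-squared, the ANOVA F-statistic, coefficients, standard errors,
# t-stats, p-values, and 95% confidence intervals, like Excel's output
print(model.summary())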
Multiple R = The correlation between the actual and forecasted values of the dependent variable in
the regression
Coefficient of determination (R Square) = Explained variation / Total variation = (Total variation –
Unexplained variation) / Total variation – a measure of the goodness of fit of an estimated regression
to the data. Here, 53% of the total variation in the dependent variable is explained by the model (the
higher this number, the better the model is specified).
Adjusted coefficient of determination (Adjusted R Square) – a measure of the goodness of fit of an
estimated regression to the data, adjusted for degrees of freedom; it is mainly used in multiple regression.
Standard error of estimate – measures the standard deviation of the residual term in the regression,
i.e., how precise the regression is (the lower the number, the more precise the regression).
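
All four statistics above can be recomputed from the ANOVA table; a Python sketch using the reported (rounded) values:

import math

n = 60            # observations
ss_reg = 0.5627   # explained (regression) sum of squares
ss_res = 0.4982   # unexplained (residual) sum of squares
ss_tot = 1.0609   # total sum of squares

r_squared = ss_reg / ss_tot                              # 0.5304
multiple_r = math.sqrt(r_squared)                        # 0.7283
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)  # 0.5223
std_error = math.sqrt(ss_res / (n - 2))                  # 0.0927

print(r_squared, multiple_r, adj_r_squared, std_error)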
ANOVA = analysis of variance
df = degrees of freedom; the number of independent observations used.
SS = sum of squares; the Regression row is the explained sum of squares, the Residual row is the
sum of squared errors (SSE), and Total is their sum.
MS = mean sum of squares, which is SS/df.
F = F-statistic; measures how well the regression equation explains the variation in the dependent
variable. If the independent variable explains little of the variation in the dependent variable, the
value of the F-statistic will be very small. With one independent variable, F is the squared value of
the t-statistic for the slope (X Variable 1): 8.0942² ≈ 65.516, which matches the reported F up to
rounding.
Significance F = the overall significance of the regression (the p-value of the F-test). The lower the
number, the more significant the regression; see the sketch below.
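
A sketch verifying the ANOVA arithmetic and recomputing Significance F from the F-distribution (small differences come from the rounded table values):

from scipy import stats

df_reg, df_res = 1, 58
ms_reg = 0.5627 / df_reg   # MS = SS/df for the regression row
ms_res = 0.4982 / df_res   # MS = SS/df for the residual row (~0.0086)

f_stat = ms_reg / ms_res                    # ~65.5, as reported
sig_f = stats.f.sf(f_stat, df_reg, df_res)  # Significance F, ~0.0000

print(ms_reg, ms_res, f_stat, sig_f)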
t-stat = value of the t-statistic, used in hypothesis testing (null hypothesis against alternative
hypothesis). It should be compared with critical values from a table of Student's t-distribution
(found at the end of most math or statistics textbooks). The “rule of thumb” is to consider the
critical value equal to +/- 2.
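
A sketch recomputing the slope's t-statistic, the exact critical value behind the rule of thumb, and the two-tailed p-value:

from scipy import stats

b1, se_b1, df = 2.2516, 0.2782, 58

t_stat = b1 / se_b1                         # ~8.094, as reported
t_crit = stats.t.ppf(0.975, df)             # ~2.002: the "+/- 2" rule of thumb
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value, ~0.0000

print(t_stat, t_crit, p_value)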
p-value = the smallest level of significance at which the null hypothesis can be rejected (the lower
the p-value, the stronger the regression results).
Lower 95% / Upper 95% = the lower and upper bounds for a 95% confidence interval. Confidence
interval = a range that has a given probability of containing the population parameter it is
intended to estimate. For instance, with 95% confidence we can say that the intercept lies
between -0.0269 and 0.0210.
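
A sketch recomputing both 95% confidence intervals as coefficient +/- t_crit * standard error:

from scipy import stats

t_crit = stats.t.ppf(0.975, 58)   # ~2.002 for 58 degrees of freedom

for name, coef, se in [("Intercept", -0.0029, 0.0120),
                       ("X Variable 1", 2.2516, 0.2782)]:
    lower = coef - t_crit * se
    upper = coef + t_crit * se
    print(name, round(lower, 4), round(upper, 4))
# Output matches the table's bounds up to rounding of the inputs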