Uploaded by cbot360 cgi

Formula Sheet 17-11-2020

FORMULA SHEET - SIMPLE LINEAR REGRESSION
1. Prediction Equation
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

2. Sample Slope
$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = r \cdot \frac{S_y}{S_x}$
where $SS_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and $SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$

3. Sample Y Intercept
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

4. Coefficient of Determination
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

5. Standard Error of Estimate
$S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - k - 1}}$
k = Number of explanatory variables (k = 1 in simple regression, so the denominator is n - 2)

6. Standard Error of β0 and β1
$S(\hat{\beta}_0) = S_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n - 1) S_x^2}} = \frac{S_e \sqrt{\sum x^2}}{\sqrt{n\, SS_{xx}}}$
$S(\hat{\beta}_1) = \frac{S_e}{\sqrt{SS_{xx}}} = \frac{1}{\sqrt{n - 1}} \cdot \frac{S_e}{S_x}$

7. Test statistic for $\hat{\beta}_1$
$t_{(n-2)} = \frac{\text{Estimate} - \text{Parameter}}{\text{Est. std. error of estimate}} = \frac{\hat{\beta}_1 - \beta_1}{S(\hat{\beta}_1)}$

8. Confidence Interval for β0 and β1
$\hat{\beta}_0 \pm t_{(\alpha/2,\, n-2)} \times S(\hat{\beta}_0)$
$\hat{\beta}_1 \pm t_{(\alpha/2,\, n-2)} \times S(\hat{\beta}_1)$

9. Confidence Interval for Mean value of Y given x
A (1 - α)100% confidence interval for E(Y|X):
$\hat{Y}_i \pm t_{(\alpha/2,\, n-2)}\, S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X}}$
Here $\hat{Y}$ is the estimate of E(Y|X).

10. Prediction Interval for a random value of Y given x
A (1 - α)100% prediction interval for Y is:
$\hat{Y}_i \pm t_{(\alpha/2,\, n-2)}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X}}$
where
$X_i$ is the observed value of the independent variable,
$\hat{Y}_i$ is the estimate of Y,
n is the sample size (number of observations),
$S_e$ is the standard error of estimate, and
$SS_X = (n - 1) S_x^2$ (the same quantity as $SS_{xx}$)

11. Coefficient of Correlation (for simple regression)
$r = \pm\sqrt{R^2} = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$, with r taking the sign of $\hat{\beta}_1$

12. Adjusted $R^2$
$R_A^2 = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)} = 1 - (1 - R^2) \times \frac{n - 1}{n - (k + 1)}$
$R_A^2$ = the adjusted coefficient of determination
$R^2$ = the unadjusted coefficient of determination

13. Variance Inflation Factor
$VIF(X_j) = \frac{1}{1 - R_j^2}$
$R_j^2$ is the coefficient of determination for the regression of $X_j$ (as the dependent variable) on all other $X_i$ (as independent variables).
If VIF > 10, multicollinearity is suspected.

14. Tolerance Factor
$1 - R_j^2 = \frac{1}{VIF}$

15. Beta weights (Standardized Beta)
$Beta = \hat{\beta}_i \times \frac{S_x}{S_y}$
$S_x$ = Standard deviation of X
$S_y$ = Standard deviation of Y

Forward Regression: $F_{in} > 3.84$, $P_{in} < 0.05$
Backward Regression: $F_{out} < 2.71$, $P_{out} > 0.10$
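As a numerical check of the simple-regression formulas above, here is a minimal standard-library Python sketch. It assumes the data are two equal-length lists and that the caller supplies the critical value $t_{(\alpha/2,\, n-2)}$ as `t_crit`, since Python's standard library has no t quantile function. The helper names (`fit_slr`, `t_stat_b1`, `interval_at`, `standardized_beta`, `vif`) and the five-point data set are hypothetical, chosen only for illustration.

```python
import math
import statistics


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x with the sheet's standard errors (k = 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)  # SST = SS_yy

    b1 = ss_xy / ss_xx                 # 2. sample slope
    b0 = y_bar - b1 * x_bar            # 3. sample intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))      # 5. standard error of estimate, n - k - 1 = n - 2
    return {
        "n": n, "x_bar": x_bar, "ss_xx": ss_xx, "b0": b0, "b1": b1,
        "r2": 1 - sse / sst,                               # 4. R^2 = 1 - SSE/SST
        "r": ss_xy / math.sqrt(ss_xx * sst),               # 11. r, carries the sign of b1
        "se": se,
        "se_b0": se * math.sqrt(1 / n + x_bar ** 2 / ss_xx),  # 6. S(b0)
        "se_b1": se / math.sqrt(ss_xx),                      # 6. S(b1)
    }


def t_stat_b1(fit, beta1_h0=0.0):
    """7. t = (estimate - hypothesised value) / standard error, with n - 2 df."""
    return (fit["b1"] - beta1_h0) / fit["se_b1"]


def interval_at(fit, x0, t_crit, prediction=False):
    """9. CI for E(Y | x0), or 10. PI for a new Y at x0 when prediction=True.

    t_crit must be t_{alpha/2, n-2}, looked up by the caller.
    """
    y_hat = fit["b0"] + fit["b1"] * x0
    extra = 1.0 if prediction else 0.0  # the PI adds the "1 +" term under the root
    half = t_crit * fit["se"] * math.sqrt(
        extra + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["ss_xx"])
    return y_hat - half, y_hat + half


def standardized_beta(b, s_x, s_y):
    """15. Beta weight = b * Sx / Sy."""
    return b * s_x / s_y


def vif(r2_j):
    """13. VIF = 1 / (1 - Rj^2); its reciprocal is the tolerance (14.)."""
    return 1 / (1 - r2_j)


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_slr(x, y)
print(fit["b1"], fit["b0"], fit["r2"])
# 8. 95% CI for beta1 with t_{0.025, 3} = 3.182
print(fit["b1"] - 3.182 * fit["se_b1"], fit["b1"] + 3.182 * fit["se_b1"])
print(interval_at(fit, 3, t_crit=3.182))                   # 9. CI for E(Y | x = 3)
print(interval_at(fit, 3, t_crit=3.182, prediction=True))  # 10. PI for Y at x = 3
print(standardized_beta(fit["b1"], statistics.stdev(x), statistics.stdev(y)))
```

With this toy data set the slope is 0.6, the intercept 2.2 and $R^2 = 0.6$, up to floating-point rounding; 3.182 is the tabulated $t_{(0.025,\, 3)}$ for n = 5. The standardized beta printed last equals r, because with a single predictor $\hat{\beta}_1 \cdot S_x / S_y = r$.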
ANOVA TABLE
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Statistic
Regression | SSR | k | MSR = SSR/k | F(k, n-k-1) = MSR/MSE
Error | SSE | n - (k + 1) | MSE = SSE/(n - (k + 1)) |
Total | SST | n - 1 | |
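The ANOVA table above can be rebuilt from SSE and SST alone, using SST = SSR + SSE. This is a minimal sketch assuming n observations and k explanatory variables; `anova_table` and `show` are hypothetical helper names and the input values are illustrative only.

```python
def anova_table(sse, sst, n, k):
    """Build the regression ANOVA rows from SSE and SST (SSR = SST - SSE)."""
    ssr = sst - sse
    df_reg, df_err, df_tot = k, n - (k + 1), n - 1
    msr = ssr / df_reg
    mse = sse / df_err
    return [
        ("Regression", ssr, df_reg, msr, msr / mse),  # F = MSR / MSE
        ("Error", sse, df_err, mse, None),
        ("Total", sst, df_tot, None, None),
    ]


def show(table):
    """Print the rows as a fixed-width table; blank cells for undefined entries."""
    print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>10}")
    for source, ss, df, ms, f in table:
        ms_txt = "" if ms is None else f"{ms:.4f}"
        f_txt = "" if f is None else f"{f:.4f}"
        print(f"{source:<12}{ss:>10.4f}{df:>5}{ms_txt:>10}{f_txt:>10}")


# Illustrative values: SSE = 2.4, SST = 6, n = 5, k = 1.
show(anova_table(sse=2.4, sst=6.0, n=5, k=1))
```

With these inputs SSR = 3.6, MSR = 3.6, MSE = 0.8 and F = 4.5 on (1, 3) degrees of freedom.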
16. Partial F Test
$F_{r,\, n-(k+1)} = \frac{(SSE_R - SSE_F)/r}{MSE_F} = \frac{(R_F^2 - R_R^2)/r}{(1 - R_F^2)/(n - k - 1)}$
$SSE_R$ = Sum of squared errors for the reduced model
$SSE_F$ = Sum of squared errors for the full model
r = Number of variables dropped from the full model / added to the reduced model
k = Number of explanatory variables in the full model

17. F Test (Overall significance of the model)
$F_{k,\, n-(k+1)} = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n - (k + 1))} = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))}$
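Items 16 and 17 can be written as small functions to check that the SSE form and the $R^2$ form give the same statistic. This sketch assumes both models are fitted to the same data (so they share SST); the function names and all numbers are hypothetical.

```python
def partial_f(sse_reduced, sse_full, r, n, k):
    """16. Partial F with (r, n - (k + 1)) df; k counts predictors in the full model."""
    mse_full = sse_full / (n - (k + 1))
    return ((sse_reduced - sse_full) / r) / mse_full


def partial_f_from_r2(r2_full, r2_reduced, r, n, k):
    """16. The same statistic written with R^2 of the full and reduced models."""
    return ((r2_full - r2_reduced) / r) / ((1 - r2_full) / (n - k - 1))


def overall_f_from_r2(r2, n, k):
    """17. Overall F with (k, n - (k + 1)) df."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


# Hypothetical example: n = 30, full model with k = 4 predictors, r = 2 of them
# dropped; SST = 100 for both models, so SSE_F = 40 (R_F^2 = 0.6), SSE_R = 50 (R_R^2 = 0.5).
print(partial_f(sse_reduced=50.0, sse_full=40.0, r=2, n=30, k=4))
print(partial_f_from_r2(r2_full=0.6, r2_reduced=0.5, r=2, n=30, k=4))
print(overall_f_from_r2(r2=0.6, n=30, k=4))
```

The two partial F values agree (3.125) because SSE = (1 - R²) × SST for each model when SST is shared.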
MULTIPLE LINEAR REGRESSION
18. Prediction Interval
A (1 - α)100% PI (Prediction Interval) for the value of a randomly chosen Y, given values of $X_i$:
$\hat{y} \pm t_{(\alpha/2,\, n-(k+1))} \sqrt{s^2(\hat{y}) + MSE}$
19. Confidence Interval
A (1 - α)100% CI (Confidence Interval) for the conditional mean of Y, given values of $X_i$:
$\hat{y} \pm t_{(\alpha/2,\, n-(k+1))}\, S[\hat{E}(Y)]$
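Items 18 and 19 differ only in whether MSE is added under the square root. Computing $s(\hat{y})$ itself needs the matrix form of the regression, so this sketch assumes $s(\hat{y}) = S[\hat{E}(Y)]$ and MSE are read from regression software output and that the caller supplies $t_{(\alpha/2,\, n-(k+1))}$; `mlr_intervals` and the numbers are hypothetical.

```python
import math


def mlr_intervals(y_hat, s_yhat, mse, t_crit):
    """Return (CI, PI) at one set of X values in multiple regression.

    s_yhat : estimated standard error of y_hat, i.e. S[E^(Y)] (assumed given).
    mse    : mean squared error of the fitted model.
    t_crit : t_{alpha/2, n-(k+1)}, looked up by the caller.
    """
    ci_half = t_crit * s_yhat                        # 19. conditional mean of Y
    pi_half = t_crit * math.sqrt(s_yhat ** 2 + mse)  # 18. a randomly chosen Y
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Illustrative numbers only.
ci, pi = mlr_intervals(y_hat=10.0, s_yhat=0.3, mse=0.16, t_crit=2.0)
print(ci, pi)
```

The prediction interval is always the wider of the two, since it adds the scatter of an individual Y (MSE) to the uncertainty in the estimated mean.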
20. Partial correlation
Correlation between y and $x_1$, when the influence of $x_2$ is removed from both y and $x_1$:
$pr_{y1,2} = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{1 - r_{y2}^2}\,\sqrt{1 - r_{12}^2}}$
21. Semi-partial correlation
(Part correlation)
Correlation between y and $x_1$, when the influence of $x_2$ is removed from $x_1$ (but not from y):
$sr_{y1,2} = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{1 - r_{12}^2}}$
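Items 20 and 21 share the same numerator and differ only in the denominator, so the partial correlation is never smaller in magnitude than the part correlation. A minimal sketch, assuming the three pairwise correlations are already known; the function names and the values 0.6, 0.5, 0.4 are hypothetical.

```python
import math


def partial_corr(r_y1, r_y2, r_12):
    """20. Partial correlation pr_{y1,2}: x2 removed from both y and x1."""
    return (r_y1 - r_y2 * r_12) / (math.sqrt(1 - r_y2 ** 2) * math.sqrt(1 - r_12 ** 2))


def semipartial_corr(r_y1, r_y2, r_12):
    """21. Semi-partial (part) correlation sr_{y1,2}: x2 removed from x1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)


# Illustrative pairwise correlations r_y1 = 0.6, r_y2 = 0.5, r_12 = 0.4.
pr = partial_corr(0.6, 0.5, 0.4)
sr = semipartial_corr(0.6, 0.5, 0.4)
print(round(pr, 4), round(sr, 4))
```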
The square of the part correlation of an explanatory variable equals its unique contribution to $R^2$, i.e. the change in $R^2$ when this variable is added:
$sr^2 = \text{Change in } R^2 = R_{new}^2 - R_{old}^2$
$pr^2 = \frac{\text{Change in } R^2}{1 - R_{old}^2}$
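For two predictors these relations can be checked directly from correlations. The sketch below uses the standard identity $R^2_{y.12} = \frac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}$ (not on this sheet) for the model with both predictors, takes the model with $x_2$ alone as the "old" model, and treats $x_1$ as the added variable; function names and inputs are hypothetical.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of y on x1 and x2 from the pairwise correlations (standard identity)."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def sr2_pr2_adding_x1(r_y1, r_y2, r_12):
    """Squared part and partial correlation of x1 via the change in R^2."""
    r2_old = r_y2 ** 2                            # model with x2 only
    r2_new = r2_two_predictors(r_y1, r_y2, r_12)  # model with x1 and x2
    change = r2_new - r2_old
    return change, change / (1 - r2_old)          # (sr^2, pr^2)


sr2, pr2 = sr2_pr2_adding_x1(0.6, 0.5, 0.4)
print(sr2, pr2)
```

The results match the squares of the correlation formulas in items 20 and 21 for the same inputs.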
22. Omitted variable bias
Actual relationship: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Fitted model: $Y = \alpha_0 + \alpha_1 X_1$
Then $\alpha_1 = \beta_1 + \beta_2 \times \frac{\mathrm{Cov}(X_1, X_2)}{\mathrm{Var}(X_1)}$
JD Sir Formulae
Ordinary Least Squares Estimators:
b1 = coefficient, Se(b1) = corresponding standard error
Covariance:
Correlation Coefficient:
Multiple R = sqrt(R^2) = correlation when it's SLR
t test (Significance of regression):
Hypothesis test for beta1 = 0
Hypothesis test for beta1 <= a
Eg: for alpha = 0.05 and n = 10
P value of coefficients: =T.DIST.2T(|tStat|, n-k-1), where k is the number of independent variables
(If p < 0.05, reject H0, where H0 is coefficient = 0)
Confidence Interval for beta1
Confidence Interval for E[Ŷ|X]:
Prediction Interval for a specific Ŷ:
Standard Error of Coefficients (SLR):
Actual T Value using VIF
Omitted Variable Bias:
Summary Output (given equations):
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
Standard Error, Se = sqrt(MSE)
MSR = SSR/k
MSE = SSE/(n-k-1)
SST = SSR + SSE
F test (Significance of overall model): F = MSR/MSE = t^2 (for SLR, the t statistic of beta1)
The p-values of the F test and of the t test for beta1 are identical for SLR.
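Excel's `=T.DIST.2T(|tStat|, n-k-1)` returns the two-tailed p-value of a t statistic. Outside Excel one would usually call a statistics library (e.g. SciPy's `scipy.stats.t.sf`, doubled); the sketch below avoids that dependency by integrating the Student t density with Simpson's rule, which is an assumption of this example rather than how Excel computes it. `t_pdf`, `p_two_tailed` and the example tStat are hypothetical.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)


def p_two_tailed(t_stat, df, steps=2000):
    """Two-tailed p-value, the quantity =T.DIST.2T(|tStat|, df) returns.

    Simpson's rule on [0, |t|]: p = 1 - 2 * (integral of the pdf over [0, |t|]).
    steps must be even.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights 4, 2, 4, ...
    return 1 - 2 * (h / 3) * total


# Hypothetical SLR output: tStat = 2.5 with n = 10 observations and k = 1.
p = p_two_tailed(2.5, df=10 - 1 - 1)
print(p, "reject H0: beta1 = 0" if p < 0.05 else "do not reject H0")
```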