Regression Analysis – Assignment #2
2017150407 전한림
1.
(a)
> x <- c(1,2,3,4,5,6,7)
> y <- c(2,5,6,9,9,11,14)
> X=cbind(1,x)
> b=solve(t(X)%*%X)%*%t(X)%*%y
> b
       [,1]
  0.7142857
x 1.8214286
> model.1 <- lm(y~x)
> summary(model.1)
Call:
lm(formula = y ~ x)
Residuals:
      1       2       3       4       5       6       7
-0.5357  0.6429 -0.1786  1.0000 -0.8214 -0.6429  0.5357

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7143     0.6662   1.072    0.333
x             1.8214     0.1490  12.226 6.47e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7883 on 5 degrees of freedom
Multiple R-squared:  0.9676,  Adjusted R-squared:  0.9612
F-statistic: 149.5 on 1 and 5 DF,  p-value: 6.474e-05
LSE is
> b
       [,1]
  0.7142857
x 1.8214286
and fitted regression equation is then
ŷ = 0.7143 + 1.8214x
(b)
> fm=lm(y~x)
> rm=lm(y~1)
> anova(rm,fm)
Analysis of Variance Table

Model 1: y ~ 1
Model 2: y ~ x
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1      6 96.000
2      5  3.107  1    92.893 149.48 6.474e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
F = 149.48 and Pr(>F) is less than 0.05, so under α = 0.05 we can reject H0.
(c)
t = (1.8214 - 2)/0.1490 = -1.1987
t(5, 0.005) = 4.032
Since |t| = 1.1987 < 4.032 = t(5, 0.005), we cannot reject H0.
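As a cross-check, the same test can be run directly from the stored summary table (a minimal sketch; it assumes model.1 from (a) is still in the workspace and that H0: b1 = 2 is tested two-sided):
> b1 <- coef(summary(model.1))["x", "Estimate"]        # approx 1.8214
> se1 <- coef(summary(model.1))["x", "Std. Error"]     # approx 0.1490
> (b1 - 2)/se1                                         # t-statistic, approx -1.199
> qt(0.005, df = 5, lower.tail = FALSE)                # critical value, approx 4.032
> 2*pt(abs((b1 - 2)/se1), df = 5, lower.tail = FALSE)  # two-sided p-value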
(d)
The 95% confidence interval for b0 is
(0.714 ± t(5,0.025) × 0.666) = (-0.998,2.426).
Since this interval contains 1, we cannot reject the null H0 : b0 = 1.
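The same interval can be read off confint() (a quick check using model.1 from (a)):
> confint(model.1, "(Intercept)", level = 0.95)   # approx (-0.998, 2.426)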
(e)
Multiple R-squared: 0.9676 (R^2)
> cor(x,y)
[1] 0.9836839
sample correlation coefficient = 0.984
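In simple linear regression, R^2 is the square of the sample correlation, which can be verified directly:
> cor(x, y)^2   # approx 0.9676, matching Multiple R-squared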
(f)
> predict(model.1, newdata = data.frame(x=3),interval="confidence")
       fit      lwr      upr
1 6.178571 5.322257 7.034885
Therefore, the 95% confidence interval for b0 + 3b1 is (5.322, 7.035).
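Equivalently, the interval can be computed by hand from x0 = (1, 3), using the design matrix X = cbind(1, x) and the LSE b from (a) (a sketch of the same matrix calculation used in 2(f)):
> x0 <- c(1, 3)
> mu0 <- t(x0) %*% b                                            # b0 + 3*b1, approx 6.179
> se.mu0 <- summary(model.1)$sigma * sqrt(t(x0) %*% solve(t(X) %*% X) %*% x0)
> c(mu0 - qt(0.025, df = 5, lower.tail = FALSE)*se.mu0,
+   mu0 + qt(0.025, df = 5, lower.tail = FALSE)*se.mu0)         # approx (5.322, 7.035)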
2.
(a)
> concrete<- read.table("concrete.txt",header=T,sep=",")
> attach(concrete)
> summary(fit<-lm(FLOW.cm.~Slag+Fly.ash+Water))
Call:
lm(formula = FLOW.cm. ~ Slag + Fly.ash + Water)
Residuals:
    Min      1Q  Median      3Q     Max
-32.621 -10.941   1.834   9.145  23.894

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -49.771697  13.966001  -3.564 0.000564 ***
Slag         -0.090818   0.022051  -4.119 7.91e-05 ***
Fly.ash      -0.001257   0.016078  -0.078 0.937853
Water         0.540915   0.064349   8.406 3.21e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.66 on 99 degrees of freedom
Multiple R-squared:  0.4958,  Adjusted R-squared:  0.4806
F-statistic: 32.46 on 3 and 99 DF,  p-value: 1.076e-14
The fitted linear equation is
FLOW.cm. = -49.7717 - 0.0908 × Slag - 0.0013 × Fly.ash + 0.5409 × Water
(b)
When Water increases by 1 unit, while Slag and Fly.ash are held fixed, the
estimated FLOW.cm. increases by 0.5409.
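This can be illustrated by predicting two hypothetical observations that differ only by 1 unit of Water (the other predictor values below are arbitrary and chosen only for illustration):
> new <- data.frame(Slag = 100, Fly.ash = 164, Water = c(196, 197))
> diff(predict(fit, newdata = new))   # difference of fitted values, approx 0.5409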
(c)
> FM=lm(FLOW.cm.~Slag+Fly.ash+Water)
> RM=lm(FLOW.cm.~1)
> anova(RM,FM)
Analysis of Variance Table

Model 1: FLOW.cm. ~ 1
Model 2: FLOW.cm. ~ Slag + Fly.ash + Water
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    102 31483
2     99 15872  3     15611 32.456 1.076e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Since the F-statistic is large (the p-value is small), we can reject the null
hypothesis H0 : b1 = b2 = b3 = 0; at least one of the predictors is needed to
explain FLOW.cm.
(d)
See the regression results in 2(a). According to the individual t-statistics,
the effects of Slag and Water are strongly significant on FLOW.cm.. The
Fly.ash effect seems not statistically significant and may be removed from
the analysis.
(e)
> summary(fit.e <- lm(FLOW.cm.~Slag+Water))
Call:
lm(formula = FLOW.cm. ~ Slag + Water)
Residuals:
    Min      1Q  Median      3Q     Max
-32.687 -10.746   2.010   9.224  23.927

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -50.26656   12.38669  -4.058 9.83e-05 ***
Slag         -0.09023    0.02064  -4.372 3.02e-05 ***
Water         0.54224    0.06175   8.781 4.62e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.6 on 100 degrees of freedom
Multiple R-squared:  0.4958,  Adjusted R-squared:  0.4857
F-statistic: 49.17 on 2 and 100 DF,  p-value: 1.347e-15
The linear regression is refitted with Fly.ash removed, and the fitted linear
equation is
FLOW.cm. = -50.2666 - 0.0902 × Slag + 0.5422 × Water
(f)
> median(Slag)
[1] 100
> median(Fly.ash)
[1] 164
> median(Water)
[1] 196
Keeping in mind the order of predictors, let x0 = (1, 100, 164, 196) denote the new
observation in the question. We can use the following R code:
> x0=c(intercept=1,Slag=100,Fly.ash=164,Water=196)
> X=cbind(1,Slag,Fly.ash,Water)
> b.hat=fit$coef # LSE
> b.hat
  (Intercept)          Slag       Fly.ash         Water
-49.771697457  -0.090817519  -0.001256759   0.540914648
> sig.hat=summary(fit)$sigma # sigma hat
> mu0=t(x0)%*%b.hat # mean response
> mu0
         [,1]
[1,] 46.95971
> se.mu0=sig.hat*sqrt(t(x0)%*%solve(t(X)%*%X)%*%x0)
> se.mu0
         [,1]
[1,] 1.384796
> cl.mu0=c(mu0-qt(0.025,lower.tail=F,df=99)*se.mu0, mu0+qt(0.025,lower.tail=F,df=99)*se.mu0)
> cl.mu0
[1] 44.21198 49.70745
So, the 95% confidence interval for the mean response is (44.21198, 49.70745).
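The same interval can be obtained with predict() (a quick cross-check, using fit from 2(a)):
> predict(fit, newdata = data.frame(Slag = 100, Fly.ash = 164, Water = 196),
+         interval = "confidence", level = 0.95)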
(g)
> y0=t(x0)%*%b.hat # predicted value
> se.y0=sig.hat*sqrt(1+t(x0)%*%solve(t(X)%*%X)%*%x0)
> pl.mu0=c(mu0-qt(0.005,lower.tail=F,df=99)*se.y0, mu0+qt(0.005,lower.tail=F,df=99)*se.y0)
> pl.mu0
[1] 13.50600 80.41343
So, the 99% prediction interval is (13.50600, 80.41343).
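Again, predict() can be used as a cross-check:
> predict(fit, newdata = data.frame(Slag = 100, Fly.ash = 164, Water = 196),
+         interval = "prediction", level = 0.99)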
3.
It is assumed that a simple linear regression (SLR) model was fitted to 103
observations, so (i) n = 103, (e), (m) df = 103 - 2 = 101, (a) df = 1, and (b) = 3371.2.
Since the t-statistic equals Coef/SE, (g) = 57.02/2.69 = 21.1970, and since
Coef = SE × t, (h) = 0.027*(-3.48) = -0.09396.
In the SLR setting, F = t1^2 and thus (c) F = (-3.48)^2 = 12.1104.
Since (c) = (b)/(f), (f) = (b)/(c), which is 3371.2/12.1104 = 278.3723.
Also, since (f) = (d)/(e), (d) = (f)*(e) = 278.3723*101 = 28115.6.
It follows from the ANOVA table, SSR = 3371.2 and SSE = (d) = 28115.6.
Then, (j) R^2 = SSR/SST = SSR/(SSR+SSE) = 3371.2/31486.8 = 0.1071.
(k) Ra^2 = 1-{SSE/(n-2)}/{SST/(n-1)}=1-(28115.6/101)/(31486.8/102) = 0.0982.
(l) The residual standard error is sqrt(MSE) = sqrt((f)) = sqrt(278.3723) = 16.6845.
(a) = 1
(b) = 3371.2
(c) = 12.1104
(d) = 28115.6
(e) = 101
(f) = 278.3723
(g) = 21.1970
(h) = -0.09396
(i) = 103
(j) = 0.1071
(k) = 0.0982
(l) = 16.6845
(m) = 101
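All of the blanks above can be reproduced in R from the quantities given in the problem (n = 103, SSR = 3371.2, slope SE = 0.027, slope t = -3.48, intercept estimate 57.02 with SE 2.69); a short sketch:
> n <- 103; SSR <- 3371.2
> t.slope <- -3.48; se.slope <- 0.027
> 57.02/2.69                                  # (g) intercept t value
> se.slope*t.slope                            # (h) slope estimate
> F.stat <- t.slope^2; F.stat                 # (c) F = t^2 in SLR
> MSE <- SSR/F.stat; MSE                      # (f) mean squared error
> SSE <- MSE*(n - 2); SSE                     # (d) residual sum of squares
> SSR/(SSR + SSE)                             # (j) R-squared
> 1 - (SSE/(n - 2))/((SSR + SSE)/(n - 1))     # (k) adjusted R-squared
> sqrt(MSE)                                   # (l) residual standard error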