CA 9a Tutorial Su22

R Tutorial for STAT 350 for Computer Assignment 9a
Author: Leonore Findsen, Chunyan Sun, Sarah H. Sellke
All of the tutorials for Computer Assignment 9 use the same data set.
Example: (Data Set: loc.txt)
Job Stress and Locus of Control. Many factors, such as the type of job, education
level, and job experience, can affect the stress felt by workers on the job. Locus of
control (LOC) is a term in psychology that describes the extent to which a person
believes he or she is in control of the events that influence his or her life. Is feeling “more
in control” associated with less job stress? A recent study examined the relationship
between LOC and several work-related behavioral measures among certified public
accountants in Taiwan. LOC was assessed using a questionnaire that asked
respondents to select one of two options for each of 23 items. Scores ranged from 0 to
23. Individuals with low LOC believe that their own behavior and attributes determine
their rewards in life. Those with high LOC believe that these rewards are beyond their
control. We will consider a random sample of 100 accountants.
a) Make a scatterplot of the data (including the least-squares regression line) with LOC
on the x-axis and Stress on the y-axis. Briefly describe the relationship between
Stress and LOC.
b) Compute the correlation coefficient between Stress and LOC.
c) Find the equation of the least-squares regression line for predicting Stress from
LOC.
d) What are MSE and R^2 for these data?
e) Using the ANOVA table for linear regression, confirm the values of MSE and R^2.
f) Calculate the correlation coefficient from R^2 and compare the value with the value
calculated in part b).
Solution:
job <- read.table(file = "loc.txt", header = TRUE)
#
# a) Scatterplot of the data
#
library(ggplot2)
ggplot(job, aes(x = LOC, y = STRESS)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE) +
  ggtitle("Relationship between Stress and LOC") +
  xlab("Locus") +
  ylab("Stress")
#
# b) Correlation
#
cor(job$LOC, job$STRESS)
#
# c), d) Calculate linear regression and get results
#
job.lm <- lm(STRESS ~ LOC, data = job)
summary(job.lm)
#
# e) ANOVA table
#
anova(job.lm)
STAT 350: Introduction to Statistics
Department of Statistics, Purdue University, West Lafayette, IN 47907
a) Make a scatterplot of the data (including the least-squares regression line) with LOC
on the x-axis and Stress on the y-axis. Briefly describe the relationship between Stress
and LOC.
Solution:
The plot looks linear with a positive direction. I am not sure about the strength because
the scale on the y-axis is so small. I do not see any x- or y-outliers.
b) Compute the correlation coefficient between Stress and LOC.
Solution:
> cor(job$LOC, job$STRESS)
[1] 0.3122765
The correlation coefficient between Stress and LOC is 0.3122765.
This indicates a weak but non-negligible positive association between Stress and LOC.
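As a cross-check on cor(), the coefficient can also be computed from its definition, r = cov(x, y) / (s_x * s_y). The sketch below uses R's built-in cars data set purely as a stand-in (an assumption, since loc.txt is not reproduced here); with the tutorial's data, x would be job$LOC and y would be job$STRESS.

```r
# Stand-in data (assumption): built-in 'cars' replaces loc.txt for illustration.
x <- cars$speed
y <- cars$dist

# Pearson correlation from its definition: covariance over the product of SDs.
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in function, as used in part b).
r_builtin <- cor(x, y)

# The two values agree up to floating-point error.
all.equal(r_manual, r_builtin)
```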
c) Find the equation of the least-squares regression line for predicting Stress from LOC.
Solution:
> summary(job.lm)

Call:
lm(formula = STRESS ~ LOC, data = job)

Residuals:
     Min       1Q   Median       3Q      Max
-1.04704 -0.33806  0.02169  0.30798  1.06715

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.25550    0.14691  15.353  < 2e-16 ***
LOC          0.03991    0.01226   3.254  0.00156 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4513 on 98 degrees of freedom
Multiple R-squared:  0.09752, Adjusted R-squared:  0.08831
F-statistic: 10.59 on 1 and 98 DF,  p-value: 0.001562
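The answer to this part comes from the Estimate column of the Coefficients table in
the output above (highlighted in yellow in the original handout).
$\widehat{\text{Stress}} = 2.25550 + 0.03991\,\text{LOC}$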
Be sure to always report the equation, not just the values of b0 and b1.
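To illustrate how the reported equation is used, the following minimal sketch evaluates the fitted line at a single LOC score. The coefficients are copied from the output above; the function name predict_stress and the LOC score of 10 are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary() output in part c).
b0 <- 2.25550   # intercept
b1 <- 0.03991   # slope for LOC

# Fitted line: predicted Stress for a given LOC score.
# 'predict_stress' is a hypothetical helper, not part of the tutorial.
predict_stress <- function(loc) b0 + b1 * loc

# Hypothetical LOC score of 10 (illustrative only, not taken from the data set).
predict_stress(10)   # 2.25550 + 0.03991 * 10 = 2.6546
```

With the fitted model available, predict(job.lm, newdata = data.frame(LOC = 10)) should give essentially the same value, differing only by the rounding of the coefficients.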
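d) What are MSE and R^2 for these data?
Solution:
The answer to this part comes from the "Residual standard error" and "Multiple
R-squared" lines of the output above (highlighted in green in the original handout).
R^2 = 0.09752
For simple linear regression, do not use the adjusted R-squared value.
This value is low: LOC explains only about 9.8% of the variation in Stress.
MSE = 0.4513^2 = 0.2037
The residual standard error reported by R is the square root of MSE (the estimated
standard deviation of the residuals), so it must be squared to obtain MSE.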
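Both quantities can also be pulled from the model object rather than read off the printout. The sketch below shows one way to do this, using R's built-in cars data set as a stand-in (an assumption, since loc.txt is not reproduced here); with the tutorial's data, fit would be job.lm.

```r
# Stand-in model (assumption): built-in 'cars' replaces loc.txt for illustration.
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

# MSE is the square of the residual standard error.
mse <- s$sigma^2

# Same quantity from its definition: SSE divided by the error degrees of freedom.
mse_check <- sum(resid(fit)^2) / df.residual(fit)

# Unadjusted R-squared, the value to report for simple linear regression.
r2 <- s$r.squared

c(MSE = mse, MSE_check = mse_check, R2 = r2)
```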
e) Using the ANOVA table for linear regression, confirm the values of MSE and R^2.
Solution:
> anova(job.lm)
Analysis of Variance Table

Response: STRESS
          Df  Sum Sq Mean Sq F value   Pr(>F)
LOC        1  2.1565 2.15651  10.589 0.001562 **
Residuals 98 19.9578 0.20365
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
MSE = 0.20365
$R^2 = \dfrac{SSR}{SST} = \dfrac{SSR}{SSR + SSE} = \dfrac{2.1565}{2.1565 + 19.9578} = \dfrac{2.1565}{22.1143} = 0.09752$
These numbers match the values from the previous output.
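The arithmetic in this part can be reproduced in a few lines. This is a minimal sketch that copies the sums of squares and degrees of freedom from the ANOVA table by hand; the variable names are illustrative.

```r
# Values copied from the anova(job.lm) table above.
ssr  <- 2.1565    # Sum Sq for LOC (regression)
sse  <- 19.9578   # Sum Sq for Residuals (error)
df_e <- 98        # residual degrees of freedom

mse <- sse / df_e          # mean square error = SSE / DFE
r2  <- ssr / (ssr + sse)   # R^2 = SSR / SST, with SST = SSR + SSE
f   <- (ssr / 1) / mse     # F statistic = MSR / MSE, regression df = 1

round(c(MSE = mse, R2 = r2, F = f), 5)
```

Rounded to the precision shown in the table, these agree with the Mean Sq entry for Residuals (0.20365), the Multiple R-squared (0.09752), and the F value (10.589).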
f) Calculate the correlation coefficient from R^2 and compare the value with the value
calculated in part b).
Solution:
For simple linear regression, $r = \pm\sqrt{R^2}$. The sign of r is the same as the sign of the
slope.
In this case, the slope is positive; therefore,
$r = +\sqrt{R^2} = \sqrt{0.09752} = 0.312282$
In part b), we obtained a value of 0.3122765. The two values agree to four decimal
places. They are not identical because the value of R^2 used here is rounded.
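The same calculation can be sketched in R, taking the sign from the slope so that the result is correct for negative relationships as well. The inputs are copied from the earlier output; the variable names are illustrative.

```r
# Values copied from the summary() output in part c).
r2    <- 0.09752   # Multiple R-squared (rounded)
slope <- 0.03991   # estimated coefficient for LOC

# r takes the sign of the slope; its magnitude is sqrt(R^2).
r_from_r2 <- sign(slope) * sqrt(r2)

# Value from cor() in part b), for comparison.
r_cor <- 0.3122765

round(r_from_r2, 6)
abs(r_from_r2 - r_cor)   # small gap caused by rounding R^2
```

Using the unrounded value, summary(job.lm)$r.squared, in place of 0.09752 should remove most of the gap.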