Statistics Assignment 3

Name: Maanvendra Singh Songira
Roll. No: 2021H1540814P
Practical Group: P1
MBA, Business Analytics (2021-2023)
Exercise 3: Analysis of Wooldridge Dataset
Date: 28th September 2021
Data Source: ceosal1 dataset from the wooldridge R package
Problem/Questions: To analyze the ceosal1 dataset from the wooldridge package in R and
compare the R-squared values of the candidate regressors to find the most suitable model
for OLS regression.
Theory:
OLS Regression line is given by the equation:
y = β0 + β1 * x + u
Where:
β0 = ȳ - β1 * x̄ (ȳ and x̄ are the sample means of y and x)
β1 = cov(x, y) / var(x)
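As a quick check of the two formulas above, here is a minimal sketch on a small made-up dataset (the x and y values are hypothetical and chosen only for illustration). It computes the slope and intercept by hand and compares them with what lm() reports.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope: sample covariance of x and y divided by the sample variance of x
b1 <- cov(x, y) / var(x)

# Intercept: mean of y minus the slope times the mean of x
b0 <- mean(y) - b1 * mean(x)

# Cross-check against the coefficients estimated by lm()
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both print statements should show the same pair of numbers, since lm() with a single regressor solves exactly these two equations.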
Code:
library(wooldridge)
data("ceosal1")
b1<-cov(ceosal1$lsalary,ceosal1$salary)/var(ceosal1$lsalary)
print(paste("The value of b1 is:",b1))
## [1] "The value of b1 is: 1924.39557859903"
b0<-mean(ceosal1$salary)-b1*mean(ceosal1$lsalary)
print(paste("The value of b0 is:",b0))
## [1] "The value of b0 is: -12094.1726760315"
ceosal<-lm(ceosal1$salary~ceosal1$lsalary)
ceosal
##
## Call:
## lm(formula = ceosal1$salary ~ ceosal1$lsalary)
##
## Coefficients:
##     (Intercept)  ceosal1$lsalary
##          -12094             1924
plot(ceosal1$lsalary,ceosal1$salary,ylim=c(0,3000));abline(ceosal)
for (i in colnames(ceosal1)) { # Loop over every column of the dataset
  print(i)
  val <- lm(ceosal1$salary ~ ceosal1[[i]])
  print(summary(val)[8]) # The 8th element of summary(val) is r.squared
}
## [1] "salary" # Ignore this value: the R-squared of salary regressed on itself is 1
## $r.squared
## [1] 1
##
## [1] "pcsalary"
## $r.squared
## [1] 7.520697e-05
##
## [1] "sales"
## $r.squared
## [1] 0.01436869
##
## [1] "roe"
## $r.squared
## [1] 0.01318862
##
## [1] "pcroe"
## $r.squared
## [1] 0.0008242895
##
## [1] "ros"
## $r.squared
## [1] 0.00113447
##
## [1] "indus"
## $r.squared
## [1] 0.005059995
##
## [1] "finance"
## $r.squared
## [1] 0.0006127428
##
## [1] "consprod"
## $r.squared
## [1] 0.04183918
##
## [1] "utility"
## $r.squared
## [1] 0.0339699
##
## [1] "lsalary"
## $r.squared
## [1] 0.6307666
##
## [1] "lsales"
## $r.squared
## [1] 0.03767174
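The column-by-column comparison above can also be collected into a single named vector and sorted, which makes the ranking easier to read. The sketch below is one possible way to do that, not the method used in this assignment. It assumes the built-in mtcars dataset as a stand-in so that it runs without the wooldridge package; dat, response and predictors are illustrative names.

```r
# Built-in dataset used as a stand-in; swap in ceosal1 and "salary" as needed
dat <- mtcars
response <- "mpg"
predictors <- setdiff(colnames(dat), response)  # exclude the response itself

# Fit one simple regression per predictor and keep only its R-squared
r2 <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = response), data = dat)
  summary(fit)$r.squared
})

# Highest R-squared first
print(sort(r2, decreasing = TRUE))
```

Excluding the response from the predictor list avoids the trivial R-squared of 1 that appears when a variable is regressed on itself.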
Detail solution with Inference:
R-squared is the proportion of the variation in the dependent variable that the model
explains; the higher the R-squared, the better the regression line fits the data.
Comparing the R-squared values of all the variables above (and ignoring salary itself,
whose value is trivially 1), we find that “lsalary” has the highest R-squared (0.6307666),
so we choose this variable to formulate our OLS regression model.
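To make "proportion of explained variation" concrete, the following sketch computes R-squared by hand as 1 - SSR/SST and compares it with the value reported by summary(). It uses the built-in mtcars dataset purely as an illustration; the variable names ssr, sst and r2_manual are assumptions of this example.

```r
# Simple regression on built-in data (illustration only)
fit <- lm(mpg ~ wt, data = mtcars)

# Residual (unexplained) sum of squares
ssr <- sum(residuals(fit)^2)

# Total sum of squares of the dependent variable around its mean
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Share of the total variation that the model explains
r2_manual <- 1 - ssr / sst

print(r2_manual)
print(summary(fit)$r.squared)
```

The two printed values should agree, which shows that the figure extracted from summary() is the same quantity as the hand-computed ratio.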
Regression Equation:
From above code:
β0 = -12094.1726760315
β1 = 1924.39557859903
y : salary
x : lsalary
Putting these into the equation y = β0 + β1 * x + u:
Therefore,
salary = -12094.173 + 1924.396 * lsalary + u
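As a small worked example, the fitted equation can be used to compute a predicted salary for a given lsalary. The sketch below plugs in the coefficients reported above; the input value lsalary = 7 is hypothetical and chosen only for illustration. The error term u is unobserved, so only the fitted part of the equation is evaluated.

```r
# Coefficients taken from the output reported above
b0 <- -12094.1726760315
b1 <- 1924.39557859903

# Hypothetical input value, for illustration only
lsalary_new <- 7

# Fitted value from the regression line (the error term u is not included)
salary_hat <- b0 + b1 * lsalary_new
print(round(salary_hat, 3))
```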