Name: Maanvendra Singh Songira
Roll No.: 2021H1540814P
Practical Group: P1
MBA, Business Analytics (2021-2023)

Exercise 3: Analysis of Wooldridge Dataset
Date: 28th September 2021

Data Source: wooldridge library, dataset ceosal1

Problem/Questions:
To analyze the ceosal1 dataset from the wooldridge package in R and compare the R-squared values of simple regressions of salary on each variable, in order to find the most suitable model for OLS regression.

Theory:
The OLS regression line is given by the equation:
y = β0 + β1 * x + u
Where:
β0 = ȳ - β1 * x̄
β1 = cov(x, y) / var(x)
Here ȳ and x̄ are the sample means of y and x.

Code:
library(wooldridge)
data("ceosal1")
b1<-cov(ceosal1$lsalary,ceosal1$salary)/var(ceosal1$lsalary)
print(paste("The value of b1 is:",b1))
## [1] "The value of b1 is: 1924.39557859903"
b0<-mean(ceosal1$salary)-b1*mean(ceosal1$lsalary)
print(paste("The value of b0 is:",b0))
## [1] "The value of b0 is: -12094.1726760315"
ceosal<-lm(ceosal1$salary~ceosal1$lsalary)
ceosal
##
## Call:
## lm(formula = ceosal1$salary ~ ceosal1$lsalary)
##
## Coefficients:
##     (Intercept)  ceosal1$lsalary
##          -12094             1924
plot(ceosal1$lsalary,ceosal1$salary,ylim=c(0,3000))
abline(ceosal)
for (i in colnames(ceosal1)){ #Running for loop over all the columns
  print(i)
  val<-lm(ceosal1$salary~ceosal1[[i]])
  print(summary(val)[8]) #8th element of summary(val) is the R-squared value
}
## [1] "salary"   #Ignore this value: the R-squared of salary regressed on itself is 1
## $r.squared
## [1] 1
##
## [1] "pcsalary"
## $r.squared
## [1] 7.520697e-05
##
## [1] "sales"
## $r.squared
## [1] 0.01436869
##
## [1] "roe"
## $r.squared
## [1] 0.01318862
##
## [1] "pcroe"
## $r.squared
## [1] 0.0008242895
##
## [1] "ros"
## $r.squared
## [1] 0.00113447
##
## [1] "indus"
## $r.squared
## [1] 0.005059995
##
## [1] "finance"
## $r.squared
## [1] 0.0006127428
##
## [1] "consprod"
## $r.squared
## [1] 0.04183918
##
## [1] "utility"
## $r.squared
## [1] 0.0339699
##
## [1] "lsalary"
## $r.squared
## [1] 0.6307666
##
## [1] "lsales"
## $r.squared
## [1] 0.03767174

Detailed solution with Inference:
R-squared is the proportion of the sample variation in y that is explained by x; among regressions with the same dependent variable, a higher R-squared means the regression line fits the data better. Comparing the R-squared values for all the variables above (ignoring salary itself), we find that "lsalary" has the highest value (0.6308), so we choose this variable to formulate our OLS regression model.

Regression Equation:
From the code above:
β0 = -12094.1726760315
β1 = 1924.39557859903
y : Salary
x : Lsalary (natural log of salary)
Substituting into the equation y = β0 + β1 * x + u:
Therefore, Salary = -12094.173 + 1924.396 * Lsalary + u
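
The comparison above can also be written more compactly. The following is a minimal sketch, not the code used in this exercise: the helper names `ols_closed_form` and `r_squared_by_column` are hypothetical and introduced only for illustration. It assumes every column other than the response is a valid regressor, and it skips the response itself so the trivial R-squared of 1 does not appear.

```r
# Hypothetical helper: closed-form simple OLS estimates,
# b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x).
ols_closed_form <- function(x, y) {
  b1 <- cov(x, y) / var(x)
  b0 <- mean(y) - b1 * mean(x)
  c(b0 = b0, b1 = b1)
}

# Hypothetical helper: regress `response` on each other column in turn
# and return the R-squared values, sorted from highest to lowest.
r_squared_by_column <- function(data, response) {
  predictors <- setdiff(colnames(data), response)
  r2 <- sapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    summary(fit)$r.squared
  })
  sort(r2, decreasing = TRUE)
}

# Apply to ceosal1 only if the wooldridge package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("ceosal1", package = "wooldridge")
  print(ols_closed_form(ceosal1$lsalary, ceosal1$salary))
  print(r_squared_by_column(ceosal1, "salary"))
}
```

Using `summary(fit)$r.squared` instead of position 8 keeps the loop readable and does not depend on the order of elements in the summary object.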