Linear Regression Analysis in R

# Given data
Car_Age <- c(4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12)
Price <- c(6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200)

# Step 1: Construct a table with required calculations
X <- Car_Age
Y <- Price
XY <- X * Y
X_squared <- X^2
Y_squared <- Y^2
Y_hat <- predict(lm(Y ~ X))
Y_minus_Yhat <- Y - Y_hat
Y_minus_Yhat_squared <- (Y - Y_hat)^2
X_minus_mean_X_squared <- (X - mean(X))^2

# Combine everything into a data frame
data_table <- data.frame(X, Y, XY, X_squared, Y_squared, Y_hat,
                         Y_minus_Yhat, Y_minus_Yhat_squared,
                         X_minus_mean_X_squared)

# Calculate sum and mean for each column
sum_and_mean <- data.frame(Sum = colSums(data_table), Mean = colMeans(data_table))
print(sum_and_mean)

##                                 Sum         Mean
## X                      8.200000e+01 7.454545e+00
## Y                      4.500000e+04 4.090909e+03
## XY                     2.959000e+05 2.690000e+04
## X_squared              6.900000e+02 6.272727e+01
## Y_squared              2.058800e+08 1.871636e+07
## Y_hat                  4.500000e+04 4.090909e+03
## Y_minus_Yhat           2.728484e-12 2.480440e-13
## Y_minus_Yhat_squared   1.915901e+06 1.741728e+05
## X_minus_mean_X_squared 7.872727e+01 7.157025e+00

# Step 2: Make a scatter plot
plot(X, Y, main = "Scatter Plot of Car Age vs. Price",
     xlab = "Car Age (years)", ylab = "Price ($)")

# Step 3: Coefficient of Correlation
correlation <- cor(X, Y)
print(paste("Coefficient of Correlation:", correlation))

## [1] "Coefficient of Correlation: -0.955023898011197"

# Interpretation: The correlation coefficient measures the strength and
# direction of the linear relationship between car age and price. A value
# close to 1 indicates a strong positive linear relationship, a value close
# to -1 indicates a strong negative linear relationship, and a value close
# to 0 suggests little to no linear relationship. Here r is about -0.955,
# so price falls strongly and nearly linearly as car age increases.
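The column sums tabulated in Step 1 are enough to reproduce the correlation coefficient by hand. The following is a minimal sketch using the same car age and price data; the names `n`, `numerator`, `denominator`, and `r_manual` are illustrative and do not appear in the original script.

```r
# Same data as in the worked example
X <- c(4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12)
Y <- c(6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200)
n <- length(X)

# Computational formula for Pearson's r, built only from column sums:
# r = (n*Sum(XY) - Sum(X)*Sum(Y)) /
#     sqrt((n*Sum(X^2) - Sum(X)^2) * (n*Sum(Y^2) - Sum(Y)^2))
numerator   <- n * sum(X * Y) - sum(X) * sum(Y)
denominator <- sqrt((n * sum(X^2) - sum(X)^2) * (n * sum(Y^2) - sum(Y)^2))
r_manual    <- numerator / denominator

print(r_manual)                         # hand computation
print(all.equal(r_manual, cor(X, Y)))   # compare against the built-in cor()
```

Comparing the hand computation with `cor()` is a quick way to check that the table columns were built correctly.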
# Step 4: Test the significance of the correlation coefficient
cor.test(X, Y)

##
##  Pearson's product-moment correlation
##
## data:  X and Y
## t = -9.662, df = 9, p-value = 4.76e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9885586 -0.8315259
## sample estimates:
##        cor
## -0.9550239

# Interpretation: The p-value indicates whether the correlation is
# statistically significant. If it is less than the chosen significance
# level (e.g., 0.05), the correlation coefficient is significantly different
# from zero. Here the p-value is 4.76e-06, well below 0.05, so the
# correlation is significant.

# Step 5: Regression equation
model <- lm(Y ~ X)
summary(model)

##
## Call:
## lm(formula = Y ~ X)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -824.1 -166.9  180.7  329.5  473.4
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   7836.3      411.8  19.027 1.41e-08 ***
## X             -502.4       52.0  -9.662 4.76e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 461.4 on 9 degrees of freedom
## Multiple R-squared:  0.9121, Adjusted R-squared:  0.9023
## F-statistic: 93.35 on 1 and 9 DF,  p-value: 4.76e-06

# Interpretation: The Estimate column gives the intercept and the slope, so
# the fitted regression equation is
#   Price = 7836.3 - 502.4 * Car_Age
# Each additional year of age lowers the predicted price by about $502.

# Step 6: Test the significance of the predictor variable
summary(model)$coefficients

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 7836.2587  411.84220 19.027333 1.408756e-08
## X           -502.4249   51.99992 -9.662034 4.760226e-06

# Interpretation: This table gives the coefficients, standard errors,
# t-values, and p-values. If the p-value for X (car age) is less than the
# chosen significance level (e.g., 0.05), X is a significant predictor of Y.
# Here the p-value for X is 4.76e-06, so car age is a significant predictor
# of price.
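The slope, intercept, and t statistic reported in Steps 4 to 6 can also be derived directly from the data, which shows where the numbers come from. This is a sketch under the usual simple linear regression formulas; the names `b0`, `b1`, `t_stat`, and `p_value` are illustrative and not part of the original script.

```r
# Same data as in the worked example
X <- c(4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12)
Y <- c(6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200)
n <- length(X)

# Least-squares slope and intercept from the column sums and means
b1 <- (n * sum(X * Y) - sum(X) * sum(Y)) / (n * sum(X^2) - sum(X)^2)
b0 <- mean(Y) - b1 * mean(X)

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
r       <- cor(X, Y)
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

print(c(intercept = b0, slope = b1))
print(c(t = t_stat, p = p_value))
```

In simple linear regression the t statistic for the correlation and the t statistic for the slope are the same quantity, which is why `cor.test()` and `summary(model)` report the same t value and p-value for X.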
# Step 7: Predict the price if the car ages are 6 and 10.5 years
new_data <- data.frame(X = c(6, 10.5))
predictions <- predict(model, newdata = new_data)
print(predictions)

##        1        2
## 4821.709 2560.797

# Interpretation: The model predicts a price of approximately $4,822 for a
# 6-year-old car and approximately $2,561 for a 10.5-year-old car.

# Step 8: Construct the 90% confidence interval of the average car price
conf_int <- predict(model, interval = "confidence", level = 0.90)
print(conf_int)

##         fit      lwr      upr
## 1  5826.559 5410.068 6243.049
## 2  5826.559 5410.068 6243.049
## 3  5324.134 4978.052 5670.216
## 4  5324.134 4978.052 5670.216
## 5  4319.284 4060.619 4577.949
## 6  4319.284 4060.619 4577.949
## 7  3816.859 3556.602 4077.116
## 8  3314.434 3019.931 3608.937
## 9  2812.009 2460.010 3164.008
## 10 2309.584 1886.209 2732.959
## 11 1807.159 1304.405 2309.914

# Interpretation: Because no newdata is supplied, each row gives the fitted
# value and the 90% confidence interval for the average price of cars at the
# age of that observation. For example, for 4-year-old cars (rows 1 and 2)
# we are 90% confident that the true average price lies between
# approximately $5,410 and $6,243.
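To get a 90% confidence interval for the average price at ages that are not in the data, such as 6 and 10.5 years, `predict()` can be given `newdata` together with `interval = "confidence"`. The sketch below also rebuilds the interval from the standard formula so the two can be compared; the names `x0`, `s`, `Sxx`, `se_mean`, `t_crit`, `ci_manual`, and `ci_builtin` are illustrative and not part of the original script.

```r
# Same data and model as in the worked example
X <- c(4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12)
Y <- c(6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200)
model <- lm(Y ~ X)
n  <- length(X)
x0 <- c(6, 10.5)                                # car ages of interest

# Pieces of the confidence interval for the mean response at x0
s       <- sqrt(sum(residuals(model)^2) / (n - 2))   # residual standard error
Sxx     <- sum((X - mean(X))^2)                      # sum of squares of X
fit     <- unname(coef(model)[1] + coef(model)[2] * x0)
se_mean <- s * sqrt(1 / n + (x0 - mean(X))^2 / Sxx)
t_crit  <- qt(0.95, df = n - 2)                      # two-sided 90% level

ci_manual <- cbind(fit = fit,
                   lwr = fit - t_crit * se_mean,
                   upr = fit + t_crit * se_mean)

# The same interval from predict()
ci_builtin <- predict(model, newdata = data.frame(X = x0),
                      interval = "confidence", level = 0.90)

print(ci_manual)
print(ci_builtin)
```

Using `interval = "prediction"` instead would give a wider interval for the price of a single car rather than for the average price at that age.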
