Simple Linear Regression: Concepts and Application

Simple Linear Regression
REIGNER JAY B. ESCARTIN, RMT
College of Medical and Biological Sciences, University of the Immaculate Conception
10/14/2025

Linear Regression

Linear regression is the next step up after correlation. It is used when we want to predict the value of one variable from the value of another. The variable we want to predict is called the dependent variable (or sometimes the outcome variable). The variable we use to make the prediction is called the independent variable (or sometimes the predictor variable).

EXAMPLE: You could use linear regression to understand whether exam performance can be predicted from revision time, whether cigarette consumption can be predicted from smoking duration, and so forth. If you have two or more independent variables rather than just one, you need to use multiple regression.

Linear Regression & Correlation

❏ Linear regression is closely related to correlation.
❏ Both involve relationships between two variables.
❏ Both use paired scores taken from the same or matched subjects.
❏ Correlation is concerned with the magnitude and direction of the relationship.
❏ Regression focuses on using the relationship for prediction.
❏ If the relationship is perfect, the prediction is also perfect.

Slope-intercept form of a line

y = mx + b

where
x = independent variable
m = slope of the line (rise/run)
b = y-intercept (where the line crosses the y-axis)

The Regression Statistical Model

Population regression model: y = β0 + β1x + ϵ ("Data = fit + residual")

where
β0 = y-intercept (population parameter)
β1 = slope (population parameter)
ϵ = error term: the "noise" that prevents x and y from forming a perfectly straight line, i.e. the unexplained variation in y (also called residuals)

Best Fitting Line Equation

❏ The intercept β0, slope β1, and standard deviation σ of y are the unknown parameters of the regression model and must be estimated from the sample data.
❏ The least squares regression line obtained from the sample data is the best estimate of the true population regression line (E(y) = β0 + β1x): ŷ = b0 + b1x, where b0 and b1 are the sample estimates of β0 and β1.
❏ The value of ŷ from the least squares regression line is a prediction of the mean value of y for a given value of x.

General Linear Regression Lines

❏ If E(y) = β0 + 0(x), the slope is 0.
❏ If E(y) = β0 + β1(x), the slope is positive.
❏ If E(y) = β0 - β1(x), the slope is negative.

Linear Regression

The regression line allows us to predict a subject's score from their hemoglobin level.

y = b + mx
y = -16 + 8x
b = -16 (intercept)
m = 8 (slope, the unstandardized regression coefficient)

For x = 11.5 g/dL: y = -16 + 8(11.5) = 76
For x = 14.5 g/dL: y = -16 + 8(14.5) = 100

This example assumes a perfect relationship; in reality, relationships are never perfect.

Prediction and Imperfect Relationships

❏ In imperfect relationships, the task is to determine a single line that best describes the data.
❏ That line is the one that minimizes the errors of prediction.
❏ The least squares regression line is the prediction line that minimizes the errors of prediction.

Constructing the least squares regression line: Regression of y on x

Prediction and Imperfect Relationships

For each data point, the regression line gives a predicted value Y’, which is compared with the observed value Y:

Error (residual) = Observed Y - Predicted Y’ = Y - Y’

Sum of Squares: Basis of evaluating the regression model

❏ Most regression analyses will produce the best model available, but how good is that model, and how much error does it contain?
❏ This can be determined by looking at the "goodness of fit" using the sums of squares, which measure how close the actual data points are to the modelled regression line.

The vertical differences between the data points and the predicted regression line are known as the residuals. These values are squared to remove the negative signs and then summed to give the residual sum of squares (SSR). This is effectively the error of the model: the smaller the value, the less error in the model and the better the fit.

The vertical differences between the data points and the mean of the outcome variable can also be calculated. These values are squared to remove the negative signs and then summed to give the total sum of squares (SST). This shows how good the mean value is as a model of the outcome scores.

Finally, the vertical differences between the mean of the outcome variable and the predicted regression line are determined. Again, these values are squared to remove the negative signs and then summed to give the model sum of squares (SSM). This indicates how much better the model is than simply using the mean of the outcome variable.

So, the larger the SSM, the better the model is at predicting the outcome compared to the mean value alone. If this is accompanied by a small SSR, the model also has a small error. R² is similar to the coefficient of determination in correlation in that it shows how much of the variation in the outcome variable can be predicted by the predictor variable(s).
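The three sums of squares described above can be illustrated with a short Python sketch. The data below are hypothetical and chosen only for easy arithmetic; they do not come from the slides. The function names are likewise illustrative. The slope and intercept use the standard least squares formulas (slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope · x̄), which is an assumption about what the original slides showed.

```python
# Hypothetical paired scores (illustrative only; not data from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x about their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sums_of_squares(x, y):
    """Return (SSR, SST, SSM) for the least squares line of y on x."""
    intercept, slope = least_squares(x, y)
    mean_y = sum(y) / len(y)
    predicted = [intercept + slope * xi for xi in x]
    # SSR: squared distances from the data points to the regression line.
    ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    # SST: squared distances from the data points to the mean of y.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # SSM: squared distances from the regression line to the mean of y.
    ssm = sum((pi - mean_y) ** 2 for pi in predicted)
    return ssr, sst, ssm


intercept, slope = least_squares(x, y)
ssr, sst, ssm = sums_of_squares(x, y)
r_squared = ssm / sst  # proportion of variation in y explained by the model

print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
print(f"SSR = {ssr:.2f}, SST = {sst:.2f}, SSM = {ssm:.2f}, R^2 = {r_squared:.2f}")
```

For a least squares line with an intercept, SST = SSM + SSR, so R² = SSM / SST can equivalently be written as 1 - SSR / SST.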
In regression, the model is assessed by the F statistic, which is based on the improvement in prediction due to the model (SSM) relative to the residual error (SSR). The larger the F value, the better the model.

Reporting the Results

The output shows that the correlation (R) between the two variables is high (0.784). The R² value of 0.614 tells us that right leg strength accounts for 61.4% of the variance in kick distance. The Durbin-Watson statistic checks for correlations between residuals, which can invalidate the test. It should be above 1 and below 3, and ideally around 2.

R² provides information on how much variance is explained by the model using the predictors provided.

The ANOVA table shows all the sums of squares mentioned earlier, with Regression being the model and Residual being the error. The F statistic is significant (p = .002). This tells us that the model is a significantly better predictor of kicking distance than the mean distance. Report as F(1, 11) = 17.53, p = .002.

The F statistic provides information on how good the model is.

The coefficients table gives the unstandardized coefficients that can be put into the linear equation. The unstandardized (b) value provides a constant that reflects the strength of the relationship between the predictor(s) and the outcome variable.

y = mx + b or y = b + mx

where
y = estimated score on the dependent (outcome) variable
m = regression coefficient (R_Strength) (slope)
b = constant (intercept)
x = a score on the independent (predictor) variable

Distance = 57.105 + (6.452 * 60) = 444.2 m

A simple linear regression was used to predict rugby kicking distance from right leg strength. Leg strength was shown to explain a significant amount of the variance in kicking distance: F(1, 11) = 17.53, p = .002, R² = 0.614.
The regression coefficient (b = 6.452) allows the kicking distance to be predicted using the following regression equation:

Distance = 57.105 + (6.452 * Right leg strength)
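As a minimal sketch, the reported regression equation can be wrapped in a small Python function to generate predictions. The intercept (57.105), slope (6.452), R² (0.614), and degrees of freedom (1, 11) are taken from the reported results; the function and variable names are illustrative assumptions. The second function uses the general identity F = (R² / df_model) / ((1 - R²) / df_residual) to show how the F statistic relates to R²; it is not necessarily how the original software computed it.

```python
# Coefficients as reported in the regression output.
INTERCEPT = 57.105  # b, the constant
SLOPE = 6.452       # m, the unstandardized coefficient for right leg strength


def predict_distance(right_leg_strength):
    """Predicted kicking distance for a given right leg strength score."""
    return INTERCEPT + SLOPE * right_leg_strength


def f_from_r_squared(r_squared, df_model, df_residual):
    """F statistic implied by R^2 and the model/residual degrees of freedom."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


predicted = predict_distance(60)
f_value = f_from_r_squared(0.614, 1, 11)

print(f"Predicted distance at strength 60: {predicted:.1f}")
print(f"F implied by R^2 = 0.614 with df (1, 11): {f_value:.2f}")
```

The F value recovered this way is close to, but not exactly, the reported 17.53, presumably because R² is rounded to three decimal places in the output.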
