MATH 1115 Lecture Notes 9
MATH 1115 Semester I (2022/2023) Group 3 Ms. E. Sonron

Topic: Linear Regression

Two variables, $x$ and $y$, are said to be linearly related if $y$ changes at a roughly constant rate as $x$ changes. The relationship can run in either direction:

- When $x$ increases, $y$ increases.
- When $x$ increases, $y$ decreases.

Determining the Line of Best Fit

Given $n$ data points of the form $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, the line of best fit through these points is given by the regression line

$$y = a_0 + a_1 x$$

where

$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1\bar{x}$$

$$\bar{x} = \frac{\sum x_i}{n}, \qquad \bar{y} = \frac{\sum y_i}{n}$$

The sums in these formulas are shorthand for the following:

- Sum of the products of corresponding $x$ and $y$ values: $\sum x_i y_i = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$
- Sum of all $x$ values: $\sum x_i = \sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
- Sum of all $y$ values: $\sum y_i = \sum_{i=1}^{n} y_i = y_1 + y_2 + \dots + y_n$
- Sum of all squared $x$ values: $\sum x_i^2 = \sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \dots + x_n^2$
- Sum of all squared $y$ values: $\sum y_i^2 = \sum_{i=1}^{n} y_i^2 = y_1^2 + y_2^2 + \dots + y_n^2$
- Square of the sum of all $x$ values: $\left(\sum x_i\right)^2$

Please note: $\sum x_i^2 \ne \left(\sum x_i\right)^2$.

The point $(\bar{x}, \bar{y})$ always lies on the line of best fit, thus $\bar{y} = a_0 + a_1\bar{x}$ (i.e. $a_0 = \bar{y} - a_1\bar{x}$).

Correlation Coefficient

The correlation coefficient, $r$, indicates the strength of the linear relationship between $x$ and $y$ (or the degree of linear scatter among the data points). It can be calculated with either of two equivalent formulas, and it always satisfies $-1 \le r \le 1$ and $0 \le r^2 \le 1$.

Formula 1:

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum x_i^2 - n\bar{x}^2}\,\sqrt{\sum y_i^2 - n\bar{y}^2}}$$

Formula 2:

$$r = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}\,\sqrt{\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}}}$$

- $r = 1$ indicates perfect positive correlation (the line has positive slope).
- $r = -1$ indicates perfect negative correlation (the line has negative slope).
- $r = 0$ indicates that there is no linear correlation.

Worked Example

Two variables, $x$ and $y$, are linearly related. From experiment:

x    1.1    2.2    2.9    3.4    5.4
y    4.9    6.0    6.9    7.5    9.6

(a) Determine the equation of the line of best fit (give answers to 2 d.p.).
(b) Using the equation of the line of best fit, find
    (i) the value of $y$ when $x = 4.3$;
    (ii) the value of $x$ when $y = 5.3$.
(c) Compute the correlation coefficient, $r$.

We must first set up and complete the table as shown below:

          x_i     y_i     x_i y_i    x_i^2    y_i^2
          1.1     4.9       5.39      1.21    24.01
          2.2     6.0      13.20      4.84    36.00
          2.9     6.9      20.01      8.41    47.61
          3.4     7.5      25.50     11.56    56.25
          5.4     9.6      51.84     29.16    92.16
Total    15.0    34.9     115.94     55.18   256.03

(a) Determine the equation of the line of best fit (give answers to 2 d.p.).

$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{5(115.94) - (15)(34.9)}{5(55.18) - (15)^2} = \frac{56.2}{50.9} = 1.10 \text{ (to 2 d.p.)}$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{15}{5} = 3, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{34.9}{5} = 6.98$$

$$a_0 = \bar{y} - a_1\bar{x} = 6.98 - \left(\frac{56.2}{50.9} \times 3\right) = 3.67 \text{ (to 2 d.p.)}$$

The equation of the line of best fit is $y = 3.67 + 1.10x$.

(b) (i) The value of $y$ when $x = 4.3$ (interpolation, since $x = 4.3$ lies within the observed range $1.1 \le x \le 5.4$):

$$y = 3.67 + 1.10(4.3) = 8.4$$

(ii) The value of $x$ when $y = 5.3$ (also interpolation, since $y = 5.3$ lies within the observed range $4.9 \le y \le 9.6$):

$$5.3 = 3.67 + 1.10x \quad\Rightarrow\quad x = \frac{5.3 - 3.67}{1.10} = 1.482$$

(c) Compute the correlation coefficient, $r$.

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum x_i^2 - n\bar{x}^2}\,\sqrt{\sum y_i^2 - n\bar{y}^2}} = \frac{115.94 - 5(3)(6.98)}{\sqrt{55.18 - 5(3)^2}\,\sqrt{256.03 - 5(6.98)^2}} = \frac{11.24}{\sqrt{10.18}\,\sqrt{12.428}} = 0.999$$

The value of $r$ is very close to 1, which means that there is a near-perfect positive linear correlation between $x$ and $y$.

Graphical Presentation in Excel

[Excel chart not reproduced]
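The hand calculation in the worked example can also be cross-checked numerically. The following is a minimal Python sketch that applies the formula for $a_1$, the relation $a_0 = \bar{y} - a_1\bar{x}$, and Formula 1 for $r$ to the experimental data. The function name `linear_regression` and all variable names are illustrative assumptions, not part of the course material.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Return (a0, a1, r) for the least-squares line y = a0 + a1*x.

    Illustrative helper: uses the summation formulas from these notes.
    Assumes the x values are not all equal and the y values are not all
    equal (otherwise a denominator would be zero).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    # The five sums that make up the columns of the working table
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    x_bar = sum_x / n
    y_bar = sum_y / n

    # Slope and intercept of the regression line
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = y_bar - a1 * x_bar

    # Correlation coefficient (Formula 1)
    r = (sum_xy - n * x_bar * y_bar) / (
        sqrt(sum_x2 - n * x_bar ** 2) * sqrt(sum_y2 - n * y_bar ** 2)
    )
    return a0, a1, r


# Data from the worked example
xs = [1.1, 2.2, 2.9, 3.4, 5.4]
ys = [4.9, 6.0, 6.9, 7.5, 9.6]

a0, a1, r = linear_regression(xs, ys)
print(f"y = {a0:.2f} + {a1:.2f}x, r = {r:.3f}")
```

The sketch keeps the unrounded coefficients internally and rounds only when printing, so predictions made with it may differ in the second decimal place from those obtained with the rounded equation $y = 3.67 + 1.10x$.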