Uploaded by Mikkel Charles

9. Linear Regression

MATH 1115 Lecture Notes 9
Topic: Linear Regression
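Two variables, 𝑥 and 𝑦, are said to be linearly related if the relationship between them can be described by a straight line. The relationship may be:
Positive: when 𝑥 increases, 𝑦 increases.
Negative: when 𝑥 increases, 𝑦 decreases.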
Determining the Line of Best Fit
Given 𝑛 data points of the form $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the line of best fit
through these points is given by the regression line:

$$y = a_0 + a_1 x$$

where

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$

$$\bar{x} = \frac{\sum x_i}{n}, \qquad \bar{y} = \frac{\sum y_i}{n}$$
Sum of the products of all 𝒙 and 𝒚 values:

$$\sum x_i y_i = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

Sum of all 𝒙 values:

$$\sum x_i = \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

Sum of all 𝒚 values:

$$\sum y_i = \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$$
MATH 1115 Semester I (2022/2023)
Group 3
Ms. E. Sonron
Sum of all squared 𝒙 values:

$$\sum (x_i^2) = \sum_{i=1}^{n} (x_i^2) = x_1^2 + x_2^2 + \cdots + x_n^2$$

Sum of all squared 𝒚 values:

$$\sum (y_i^2) = \sum_{i=1}^{n} (y_i^2) = y_1^2 + y_2^2 + \cdots + y_n^2$$

Square of the sum of all 𝒙 values:

$$\left(\sum x_i\right)^2$$

Please note: $\sum (x_i^2) \neq \left(\sum x_i\right)^2$
The point $(\bar{x}, \bar{y})$ always lies on the line of best fit; thus $\bar{y} = a_0 + a_1 \bar{x}$
(i.e. $a_0 = \bar{y} - a_1 \bar{x}$).
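The formulas for $a_1$ and $a_0$ translate directly into a few lines of code. The following is a minimal sketch in Python (the notes themselves do not use code); the function name `fit_line` and the sample data are illustrative assumptions, not part of the course material.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)                                # sum of all x values
    sum_y = sum(ys)                                # sum of all y values
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # sum of the products x*y
    sum_x2 = sum(x * x for x in xs)                # sum of the squared x values

    # Denominator uses BOTH the sum of squares and the square of the sum.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a1 = (n * sum_xy - sum_x * sum_y) / denom      # slope
    a0 = sum_y / n - a1 * (sum_x / n)              # intercept: y-bar - a1 * x-bar
    return a0, a1


# Illustrative data (not from the notes): points lying exactly on y = 1 + 2x.
a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a0, a1)  # → 1.0 2.0
```

Because the intercept is computed as $\bar{y} - a_1 \bar{x}$, the fitted line passes through $(\bar{x}, \bar{y})$ by construction.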
Correlation Coefficient
The correlation coefficient, 𝑟, indicates the strength of the linear relationship
between 𝑥 and 𝑦 (or the linear degree of scatter among data points).
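To calculate the correlation coefficient, 𝑟, use either of the following equivalent formulas.
Formula 1:

$$r = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sqrt{\sum x_i^2 - n \bar{x}^2} \, \sqrt{\sum y_i^2 - n \bar{y}^2}}, \qquad -1 \le r \le 1, \quad 0 \le r^2 \le 1$$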
Formula 2:

$$r = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}} \, \sqrt{\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}}}, \qquad -1 \le r \le 1, \quad 0 \le r^2 \le 1$$
𝑟 = 1 indicates perfect positive correlation (the line has a positive slope).
𝑟 = −1 indicates perfect negative correlation (the line has a negative slope).
𝑟 = 0 indicates that there is no linear correlation between 𝑥 and 𝑦.
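Formula 1 and Formula 2 are the same quantity written two ways, since $n\bar{x}\bar{y} = \frac{\sum x_i \sum y_i}{n}$ and $n\bar{x}^2 = \frac{(\sum x_i)^2}{n}$. A short Python sketch can confirm this numerically; the function names and the sample data below are illustrative assumptions rather than anything taken from the notes.

```python
import math


def correlation_formula1(xs, ys):
    """r computed from Formula 1 (uses the means x-bar and y-bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * x_bar * y_bar
    denominator = (math.sqrt(sum_x2 - n * x_bar ** 2)
                   * math.sqrt(sum_y2 - n * y_bar ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no spread")
    return numerator / denominator


def correlation_formula2(xs, ys):
    """r computed from Formula 2 (uses the raw sums only)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = (math.sqrt(sum_x2 - sum_x ** 2 / n)
                   * math.sqrt(sum_y2 - sum_y ** 2 / n))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no spread")
    return numerator / denominator


# Illustrative data (not from the notes).
xs = [1, 2, 3, 4]
rising = [3, 5, 7, 9]     # exactly on a line with positive slope
falling = [9, 7, 5, 3]    # exactly on a line with negative slope

# Rounded because floating-point square roots introduce tiny errors.
print(round(correlation_formula1(xs, rising), 6))   # perfect positive correlation
print(round(correlation_formula2(xs, falling), 6))  # perfect negative correlation
```

Both functions should agree to within floating-point rounding for any data set in which 𝑥 and 𝑦 each take at least two distinct values.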
Worked Example:
Two variables, x and y, are linearly related. An experiment gave the following data:

x     1.1    2.2    2.9    3.4    5.4
y     4.9    6.0    6.9    7.5    9.6

Determine the equation of the line of best fit (place answers to 2 d.p.)
Using the equation of the line of best fit, find:
The value of 𝑦 when 𝑥 = 4.3
The value of 𝑥 when 𝑦 = 5.3
Compute the correlation coefficient, 𝑟.
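We must first set up and complete the table below. The last row gives the column totals used in the formulas:

           x        y        xy       x²       y²
           1.1      4.9      5.39     1.21     24.01
           2.2      6.0      13.20    4.84     36.00
           2.9      6.9      20.01    8.41     47.61
           3.4      7.5      25.50    11.56    56.25
           5.4      9.6      51.84    29.16    92.16
Total      15.0     34.9     115.94   55.18    256.03
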
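The columns 𝑥𝑦, 𝑥² and 𝑦² and their totals can also be produced with a short script, which is a useful check on hand arithmetic. This is a sketch in Python under the assumption that the table has the five columns x, y, xy, x², y² (the columns the formulas require); the variable names are illustrative.

```python
# Data from the worked example.
xs = [1.1, 2.2, 2.9, 3.4, 5.4]
ys = [4.9, 6.0, 6.9, 7.5, 9.6]

# One row per data point: x, y, x*y, x squared, y squared.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]

# zip(*rows) regroups the rows into columns so each column can be summed.
totals = [sum(column) for column in zip(*rows)]

header = ("x", "y", "xy", "x^2", "y^2")
print("".join(f"{name:>10}" for name in header))
for row in rows:
    print("".join(f"{value:>10.2f}" for value in row))
print("".join(f"{total:>10.2f}" for total in totals))
```

The final printed row holds the five totals that are substituted into the formulas for the slope, the intercept and 𝑟.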
Determine the equation of the line of best fit (place answers to 2 d.p.)
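$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{5(115.94) - (15)(34.9)}{5(55.18) - (15)^2} = \frac{56.2}{50.9} = 1.10 \text{ (to 2 d.p.)}$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{15}{5} = 3$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{34.9}{5} = 6.98$$

$$a_0 = \bar{y} - a_1 \bar{x} = 6.98 - \left(\frac{56.2}{50.9} \times 3\right) = 3.67 \text{ (to 2 d.p.)}$$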
Equation of the line of best fit is 𝒚 = 𝟑.𝟔𝟕 + 𝟏.𝟏𝟎𝒙
The value of 𝑦 when 𝑥 = 4.3 (Interpolation)
𝑦 = 3.67 + 1.10(4.3) = 8.4
The value of 𝑥 when 𝑦 = 5.3 (Interpolation, since the result lies within the observed range of 𝑥 values)
$$5.3 = 3.67 + 1.10x$$

$$x = \frac{5.3 - 3.67}{1.10} = 1.482$$
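Both predictions use the same fitted equation, once solved for 𝑦 and once for 𝑥. The sketch below, in Python, assumes the rounded coefficients 3.67 and 1.10 from the worked example; the helper names `predict_y`, `solve_for_x` and `within_observed_range` are hypothetical and chosen only for illustration.

```python
A0 = 3.67  # intercept of the line of best fit (to 2 d.p.)
A1 = 1.10  # slope of the line of best fit (to 2 d.p.)

X_OBSERVED = [1.1, 2.2, 2.9, 3.4, 5.4]  # x values from the experiment


def predict_y(x, a0=A0, a1=A1):
    """Evaluate y = a0 + a1*x."""
    return a0 + a1 * x


def solve_for_x(y, a0=A0, a1=A1):
    """Rearrange y = a0 + a1*x to give x = (y - a0) / a1."""
    if a1 == 0:
        raise ValueError("a horizontal line cannot be solved for x")
    return (y - a0) / a1


def within_observed_range(value, observed):
    """True when value lies between the smallest and largest observation."""
    return min(observed) <= value <= max(observed)


y_at_4_3 = predict_y(4.3)
x_at_5_3 = solve_for_x(5.3)

print(f"y at x = 4.3: {y_at_4_3:.2f}")   # y at x = 4.3: 8.40
print(f"x at y = 5.3: {x_at_5_3:.3f}")   # x at y = 5.3: 1.482

# An estimate is an interpolation when x lies inside the observed x range.
print(within_observed_range(4.3, X_OBSERVED))
print(within_observed_range(x_at_5_3, X_OBSERVED))
```

Estimates made at 𝑥 values outside the observed range rely on the linear trend continuing beyond the data, so they are generally less reliable.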
Compute the correlation coefficient, r.
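$$r = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sqrt{\sum x_i^2 - n \bar{x}^2} \, \sqrt{\sum y_i^2 - n \bar{y}^2}} = \frac{115.94 - 5(3)(6.98)}{\sqrt{55.18 - (5)(3)^2} \, \sqrt{256.03 - (5)(6.98)^2}} = \frac{11.24}{\sqrt{10.18} \, \sqrt{12.428}} = 0.999$$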
The value of 𝑟 is very close to 1, which can be interpreted to mean that there is
near perfect positive linear correlation between 𝑥 and 𝑦.
Graphical Presentation in Excel