living with the lab - Louisiana Tech University

living with the lab
introduction to linear regression
linear regression provides a predictable way to quantify the relationship between
two variables, even when significant uncertainty and measurement error exist
[images: environmental data, medical data, process parameters (photo sources: http://mrg.bz/sWnKWI, http://mrg.bz/jR4UEX, http://mrg.bz/sMmqHk)]
© 2011 David Hall and the LWTL faculty team
The Living with the Lab label, the Louisiana Tech Logo, and this copyright notice should not be removed when any part of this work is
used by others. This work may not be used for commercial purposes. Inquiries should be addressed to dhall@latech.edu. This
presentation on linear regression is based partially on class notes created by Dr. Mark Barker at Louisiana Tech University.
DISCLAIMER
The content of this presentation is for informational purposes only and is intended only for students
attending Louisiana Tech University.
The author of this information does not make any claims as to the validity or accuracy of the information
or methods presented.
The procedures demonstrated here are potentially dangerous and could result in injury or damage.
Louisiana Tech University and the State of Louisiana, their officers, employees, agents or volunteers, are
not liable or responsible for any injuries, illness, damage or losses which may result from your using the
materials or ideas, or from your performing the experiments or procedures depicted in this presentation.
If you do not agree, then do not view this content.
collect some data to see how linear regression works
• we know that our heart rate increases as we begin to exercise
• heart rate is usually expressed in beats per minute (bpm)
• we can record our pulse over a short period of time to estimate heart rate . . . we’ll collect over a 10-second period
• $\text{bpm} = (\text{beats over a 10-second period}) \cdot 6$
• the variation of heart rate during exercise is complex and depends on many factors (fitness, the level of exertion, the duration of exercise, what you’ve been eating/drinking, . . .)
• we will assume that heart rate is initially linear with the duration of exercise just to collect some data . . . this could serve as a starting point for a systematic study of heart rate during exercise
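The bpm conversion above is simple arithmetic; a minimal Python sketch of it is below. The function name `bpm_from_10s_count` and the sample counts are hypothetical, chosen only for illustration, not class data.

```python
def bpm_from_10s_count(beats_in_10_s: int) -> int:
    """Convert a pulse count taken over 10 seconds into beats per minute (bpm)."""
    if beats_in_10_s < 0:
        raise ValueError("a pulse count cannot be negative")
    # One minute contains six 10-second periods.
    return beats_in_10_s * 6


# Hypothetical counts for one student at 0, 10, 20, 30 and 40 s of exercise.
counts = [11, 13, 14, 16, 20]
rates = [bpm_from_10s_count(c) for c in counts]
print(rates)  # [66, 78, 84, 96, 120]
```

Because the count is a whole number of beats, every bpm value moves in steps of 6 bpm, which is one source of measurement error in this activity.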
collect pulse after doing jumping jacks
1. measure pulse for 10 seconds (have a partner write down the number of beats)
2. do jumping jacks for 10 seconds (10 seconds of total exercise)
3. measure pulse for 10 seconds
4. do jumping jacks for 10 seconds (20 seconds of total exercise)
5. measure pulse for 10 seconds
6. do jumping jacks for 10 seconds (30 seconds of total exercise)
7. measure pulse for 10 seconds
8. do jumping jacks for 10 seconds (40 seconds of total exercise)
9. measure pulse for 10 seconds
[timeline: alternating 10-second STOP (measure pulse) and jump intervals over 90 seconds of total time; heart rate is collected five times, at 0, 10, 20, 30 and 40 seconds of jumping time; axes: jumping time (s), total time (s)]
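The timing of the procedure can be tabulated with a short Python sketch. It assumes the idealized schedule (no delay between stopping and starting a count), and the function name `pulse_windows` is hypothetical.

```python
def pulse_windows(n_jump_intervals: int = 4, interval_s: int = 10):
    """Return (start_s, end_s, cumulative_exercise_s) for each pulse count.

    The procedure starts with a pulse count and then alternates one jumping
    interval with one pulse count, so the k-th count (k = 0, 1, ...) begins
    after k jumping intervals and k pulse counts have already elapsed.
    """
    windows = []
    for k in range(n_jump_intervals + 1):
        start = 2 * k * interval_s
        windows.append((start, start + interval_s, k * interval_s))
    return windows


for start, end, exercised in pulse_windows():
    print(f"count pulse from {start} s to {end} s of total time "
          f"(after {exercised} s of jumping)")
```

The third value, cumulative jumping time, is the x value used in the rest of this activity; total elapsed time is twice as long and is not used in the fit.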
logistics
• choose one or two people per table to do jumping jacks; this is voluntary . . . don’t do the jumping jacks if there is any reason why this activity could be harmful to you
• the people who are jumping should move away from tripping hazards and other people (clear a space around your table and keep yourself under control while exercising)
• your instructor will keep track of time and tell you when to jump and when to collect heart rate; a cell phone, watch or online stopwatch (e.g., www.onlinestopwatch.com) can be used
• we need about 7 to 10 sets of data from the entire class . . . not everybody will get to exercise
• we’ll analyze and plot this data using Excel
• the heart rate collected will include some error
  o collect pulse as soon as you stop jumping
  o after 10 seconds, call out the number of pulses collected over 10 seconds to your partner(s) and start jumping again
• just be as accurate as possible
enter heart rate data into Excel
time (s) | student 1 (bpm) | student 2 (bpm) | student 3 (bpm) | student 4 (bpm) | student 5 (bpm) | student 6 (bpm) | student 7 (bpm) | student 8 (bpm)
0 | | | | | | | |
10 | | | | | | | |
20 | | | | | | | |
30 | | | | | | | |
40 | | | | | | | |
• please multiply the number of pulses collected over 10 seconds by 6 to get beats
per minute (bpm)
• report bpm to your instructor
• build a spreadsheet on your computer along with the instructor
plot data for the entire class in Excel
• make a scatter plot using symbols only – no lines
• time is the independent variable and is plotted as the x-axis
• heart rate is the dependent variable and is plotted as the y-axis
• the title of the plot is always listed as “y versus x” . . . which is “heart rate versus
exercise time” for this problem
[figure: scatter plot “heart rate versus exercise time”; y-axis: heart rate (bpm), 50 to 150; x-axis: cumulative exercise time (s), 0 to 50]
make a hand plot for one data set
• your instructor will select one student’s data that is typical of the data for the
entire class; we will analyze this data
• make a hand plot using your own paper as shown below (use proper format!!)
• draw a “best fit” line through the data; just use your judgment
[hand plot: “heart rate versus exercise time” with a “best fit line” drawn by eye through the data]

cumulative exercise time (s) | heart rate (bpm)
0 | 67
10 | 82
20 | 86
30 | 96
40 | 120

use data from class . . . not this data
find an equation to fit the data
• assume the data is linear
• pick two points from your data (or make up two points by picking them from the line)
• compute the slope as $m = \frac{\text{rise}}{\text{run}}$ or $m = \frac{\Delta y}{\Delta x}$
• write the equation in slope-intercept form as $y = m \cdot x + b$
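example (use data from your class)
find the slope:
$m = \frac{\Delta y}{\Delta x} = \frac{120 - 82}{40 - 10} = \frac{38}{30} \approx 1.27$
find the y-intercept by plugging in one of the data points, here $(40, 120)$:
$b = y - m \cdot x = 120 - \frac{38}{30} \cdot 40 \approx 69.3$ (using the unrounded slope; plugging in the rounded value 1.27 gives 69.2)
write the equation:
$\text{heart rate} = 1.27 \cdot \text{time} + 69.3$
. . . where heart rate is in bpm and time is in seconds.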
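The two-point method can be checked exactly with Python's `fractions` module. This sketch (the helper name `line_through` is hypothetical) reproduces the example above and then runs through every pair of the five example points, showing that different pairs give different lines.

```python
from fractions import Fraction
from itertools import combinations

# Example data from the slides: (cumulative exercise time in s, heart rate in bpm).
data = [(0, 67), (10, 82), (20, 86), (30, 96), (40, 120)]


def line_through(p1, p2):
    """Return slope m and intercept b of the line y = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)   # rise over run, kept exact
    b = y2 - m * x2                  # solve y = m*x + b for b at one of the points
    return m, b


m, b = line_through((10, 82), (40, 120))
print(f"m = {float(m):.2f}, b = {float(b):.1f}")   # m = 1.27, b = 69.3

# Every other choice of two points gives a different "best fit" line.
for p1, p2 in combinations(data, 2):
    m_i, b_i = line_through(p1, p2)
    print(p1, p2, f"m = {float(m_i):.2f}, b = {float(b_i):.1f}")
```

Keeping the slope as a fraction avoids the rounding issue noted above: b comes out as exactly 208/3, about 69.3.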
analysis of our equations
• compare your answer with others in the class
• if you chose the same two points to define your “best fit” line, then your
equations should be the same
• choosing different points causes us to get different equations
• linear regression, which can be derived using calculus, gives us the same
equation every time
• linear regression takes the guesswork out of finding best fit lines
[image: http://earthobservatory.nasa.gov/IOTD/view.php?id=46145]
understanding linear regression
[figure: a data point i at $(x_i, y_i)$ and the best fit line $y = m \cdot x + b$; the error for that point is the vertical distance $y_i - y_i^{fit}$, where $y_i^{fit} = m \cdot x_i + b$]
• linear regression generates the best line by minimizing the sum of the squares of the errors
• minimize $\sum_{i=1}^{n} \left(y_i - y_i^{fit}\right)^2$ over all data points to find the optimum values of m and b
• we call this least squares linear regression
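The slides note that linear regression can be derived using calculus. A sketch of that derivation: substitute $y_i^{fit} = m x_i + b$ into the sum of squared errors $S$, set both partial derivatives to zero, and solve the resulting pair of equations (substituting the expression for b into the second one). The result matches the formulas for m and b given on the next slide.

```latex
\begin{aligned}
S(m,b) &= \sum_{i=1}^{n} \left(y_i - m x_i - b\right)^2 \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{n}\left(y_i - m x_i - b\right) = 0
  \;\Rightarrow\; b = \frac{\sum y_i - m\sum x_i}{n} \\
\frac{\partial S}{\partial m} &= -2\sum_{i=1}^{n} x_i\left(y_i - m x_i - b\right) = 0
  \;\Rightarrow\; \sum x_i y_i = m\sum x_i^2 + b\sum x_i \\
&\Rightarrow\; m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}
\end{aligned}
```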
finding m and b
$m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$
$b = \frac{\sum y_i - m \sum x_i}{n}$
$y = m \cdot x + b$

x: cumulative exercise time (s) | y: heart rate (bpm) | x·y | x²
0 | 67 | 0 | 0
10 | 82 | 820 | 100
20 | 86 | 1720 | 400
30 | 96 | 2880 | 900
40 | 120 | 4800 | 1600
$\sum x_i = 100$ | $\sum y_i = 451$ | $\sum x_i y_i = 10220$ | $\sum x_i^2 = 3000$

$m = \frac{5 \cdot 10220 - 100 \cdot 451}{5 \cdot 3000 - 100^2} = 1.2$
$b = \frac{451 - 1.2 \cdot 100}{5} = 66.2$
$\text{heart rate} = 1.2 \cdot \text{time} + 66.2$
Repeat the above procedure for the data set selected in your class. Compare the m and b
that you get with your classmates. Doing this by hand is good practice for the exam.
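To check hand calculations, the summation formulas can be sketched in Python. The function names are hypothetical; the sketch also evaluates the sum of squared errors for both the regression line and the two-point line from the earlier example, showing that least squares gives the smaller value.

```python
def least_squares_line(xs, ys):
    """Fit y = m*x + b by least squares using the summation formulas.

    m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2),  b = (Sy - m*Sx) / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


def sum_squared_errors(xs, ys, m, b):
    """Sum of (y_i - y_i_fit)^2 for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


x = [0, 10, 20, 30, 40]       # cumulative exercise time (s)
y = [67, 82, 86, 96, 120]     # heart rate (bpm), example data from the slides

m, b = least_squares_line(x, y)
sse_regression = sum_squared_errors(x, y, m, b)

# Two-point line through (10, 82) and (40, 120), for comparison.
m_2pt = 38 / 30
b_2pt = 120 - m_2pt * 40
sse_two_point = sum_squared_errors(x, y, m_2pt, b_2pt)

print(m, b)                                                 # 1.2 66.2
print(round(sse_regression, 1), round(sse_two_point, 1))    # 104.8 209.0
```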
repeat for all of the class data
[spreadsheet layout: single columns x (cumulative exercise time, s), y (heart rate, bpm), x·y and x²; five rows per student, with x = 0, 10, 20, 30, 40 for each of students 1 through 4; a final row holds the sums $\sum x_i$ (400 for four students), $\sum y_i$, $\sum x_i y_i$ and $\sum x_i^2$]
• reformat your spreadsheet to have single x and y columns as shown (5 lines for each student’s heart rate data)
• find the sums and plug them into the equations for m and b to find the best fit line; try to do these calculations in Excel . . . it’s tricky due to fixed cell references and the placement of parentheses
• create a plot of all data in Excel
• plot the best fit line as a line without symbols, over the data points
• see the next page for an example
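The reshaping step can be sketched in Python: each student's five readings are stacked under one x column and one y column, and then the four sums are taken. The data are the four students from the Excel example on the next page; assigning the labels "student 1" through "student 4" in that order is an assumption about the original layout.

```python
# Heart rate (bpm) for each student at 0, 10, 20, 30 and 40 s of exercise.
times = [0, 10, 20, 30, 40]
heart_rates = {
    "student 1": [67, 82, 86, 96, 120],
    "student 2": [50, 60, 70, 80, 90],
    "student 3": [82, 92, 105, 110, 115],
    "student 4": [80, 91, 105, 118, 118],
}

# Stack the columns: one (x, y) row per measurement, five rows per student.
x, y = [], []
for rates in heart_rates.values():
    if len(rates) != len(times):
        raise ValueError("each student needs one reading per time")
    x.extend(times)
    y.extend(rates)

sums = {
    "sum_x": sum(x),
    "sum_y": sum(y),
    "sum_xy": sum(xi * yi for xi, yi in zip(x, y)),
    "sum_x2": sum(xi * xi for xi in x),
}
print(len(x), sums)
# 20 {'sum_x': 400, 'sum_y': 1817, 'sum_xy': 40410, 'sum_x2': 12000}
```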
details of solving previous problem in Excel
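don’t look at these tips unless you get stuck!!

student | x: cumulative exercise time (s) | y: heart rate (bpm) | x·y | x²
student 1 | 0 | 67 | 0 | 0
student 1 | 10 | 82 | 820 | 100
student 1 | 20 | 86 | 1720 | 400
student 1 | 30 | 96 | 2880 | 900
student 1 | 40 | 120 | 4800 | 1600
student 2 | 0 | 50 | 0 | 0
student 2 | 10 | 60 | 600 | 100
student 2 | 20 | 70 | 1400 | 400
student 2 | 30 | 80 | 2400 | 900
student 2 | 40 | 90 | 3600 | 1600
student 3 | 0 | 82 | 0 | 0
student 3 | 10 | 92 | 920 | 100
student 3 | 20 | 105 | 2100 | 400
student 3 | 30 | 110 | 3300 | 900
student 3 | 40 | 115 | 4600 | 1600
student 4 | 0 | 80 | 0 | 0
student 4 | 10 | 91 | 910 | 100
student 4 | 20 | 105 | 2100 | 400
student 4 | 30 | 118 | 3540 | 900
student 4 | 40 | 118 | 4720 | 1600
sums (row 26) | 400 | 1817 | 40410 | 12000

(the data occupy spreadsheet rows 5 to 24, with x in column B, y in column C, x·y in column D and x² in column E; the sums are in row 26)

m = 1.0175, in cell C28: =(COUNT(B5:B24)*D26-B26*C26)/(COUNT(B5:B24)*E26-B26^2)
b = 70.5, in cell C29

yfit, starting from =C$28*B5+C$29 and copied down:
x (s) | yfit (bpm)
0 | 70.5
10 | 80.7
20 | 90.9
30 | 101.0
40 | 111.2
use these data points to plot the best-fit line

[figure: “heart rate versus exercise time” showing the class data and the best-fit line; y-axis: heart rate (bpm), 50 to 150; x-axis: cumulative exercise time (s), 0 to 50]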
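The spreadsheet calculation can be mirrored in Python as a check on the Excel results. The cell references in the comments follow the formulas on this page; the Excel formula for b is not shown, so the b line is simply the summation formula from the "finding m and b" slide.

```python
x = [0, 10, 20, 30, 40] * 4           # column B, rows 5 to 24
y = [67, 82, 86, 96, 120,             # column C, rows 5 to 24
     50, 60, 70, 80, 90,
     82, 92, 105, 110, 115,
     80, 91, 105, 118, 118]

n = len(x)                                    # COUNT(B5:B24)
sum_x, sum_y = sum(x), sum(y)                 # B26, C26
sum_xy = sum(xi * yi for xi, yi in zip(x, y)) # D26
sum_x2 = sum(xi * xi for xi in x)             # E26

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # C28
b = (sum_y - m * sum_x) / n                                     # C29

# Same role as =C$28*B5+C$29 copied down: fitted heart rate at each time.
y_fit = [m * t + b for t in [0, 10, 20, 30, 40]]
print(m, b, [round(v, 2) for v in y_fit])
```

The `$` in `C$28` and `C$29` fixes the row, so when the yfit formula is copied down the column, `B5` advances to `B6`, `B7`, and so on while m and b stay locked to rows 28 and 29.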