SoTL Brown Bag - Juniata College

Gerald Kruse, Ph.D. & Cathy Stenson, Ph.D.
Juniata College
Mathematics Department
CityMPG = EPA's estimated miles per gallon for city driving
Weight = Weight of the car (in pounds)
FuelCapacity = Size of the gas tank (in gallons)
QtrMile = Time (in seconds) to go 1/4 mile from a standing start
Acc060 = Time (in seconds) to accelerate from zero to 60 mph
PageNum = Page number on which the car appears in the buying guide
Place the letter for each pair on the chart below to indicate your
guess as to the direction (negative, neutral, or positive) and
strength of the association between the two variables.
(a) Weight vs. CityMPG
(b) Weight vs. FuelCapacity
(c) PageNum vs. FuelCapacity
(d) Weight vs. QtrMile
(e) Acc060 vs. QtrMile
(f) CityMPG vs. QtrMile
Strong Negative | Moderate Negative | Weak Negative | No Association | Weak Positive | Moderate Positive | Strong Positive
Matrix Plot - Car Data
[Figure: scatterplot matrix of CityMPG, Weight, FuelCap, QtrMile, Acc060 and PageNum]
Placement of the letters on the chart (Strong Negative | Moderate Negative | Weak Negative | No Association | Weak Positive | Moderate Positive | Strong Positive), reading from the negative end to the positive end:
(a), (d), (c), (f), (b) and (e)
Measure of Correlation
Definition: The correlation, r, measures the strength of linear association between two quantitative variables.
\[
r = \frac{1}{n-1} \sum \left( \frac{X - \bar{X}}{S_X} \right) \left( \frac{Y - \bar{Y}}{S_Y} \right)
\]
\(\bar{X}\) = mean of X values
\(\bar{Y}\) = mean of Y values
\(S_X\) = Std Dev of X values
\(S_Y\) = Std Dev of Y values
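As a rough illustration of the definition of r, the sketch below computes it directly from the formula in Python. The function name `correlation` and the five (weight, mpg) pairs are made up for illustration; they are not taken from the 1999 car data.

```python
import math

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Standardize each value, multiply pairwise, sum, and divide by n - 1
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical values, for illustration only (not the car data)
weights = [2400, 2800, 3100, 3400, 3600]
mpgs = [28, 25, 23, 21, 20]
r = correlation(weights, mpgs)
print(round(r, 3))
```

Heavier cars paired with lower mileage should give a value of r close to -1 here, in the same direction as pair (a).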
Sample Correlations in 1999 Car Data
          CityMPG   Weight   FuelCap   QtrMile   Acc060
Weight     -0.907
FuelCap    -0.793    0.894
QtrMile     0.510   -0.450    -0.469
Acc060      0.506   -0.454    -0.465     0.994
PageNum     0.283   -0.237    -0.081     0.196    0.205
Association scale with approximate ranges of r, and the six pairs placed by their sample correlations:
Association         r                           Pairs
Strong Negative     “between” -1.0 and -0.8     (a) = -0.907
Moderate Negative   “between” -0.8 and -0.5
Weak Negative       “between” -0.5 and 0        (d) = -0.450
No Association      “around” 0                  (c) = -0.081
Weak Positive       “between” 0 and 0.5
Moderate Positive   “between” 0.5 and 0.8       (f) = 0.510
Strong Positive     “between” 0.8 and 1.0       (b) = 0.894, (e) = 0.994
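The rough ranges on the chart can be turned into a small lookup, sketched below in Python. The function name `describe_association`, the `near_zero` cutoff of 0.1 for "around 0", and the handling of values that fall exactly on a boundary are all assumptions made for this sketch; the chart itself only gives approximate ranges.

```python
def describe_association(r, near_zero=0.1):
    """Map r to one of the chart's labels.

    `near_zero` is an assumed cutoff for "around 0"; the chart does not
    give a precise value.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < near_zero:
        return "No Association"
    if size < 0.5:
        strength = "Weak"
    elif size < 0.8:
        strength = "Moderate"
    else:
        strength = "Strong"
    direction = "Positive" if r > 0 else "Negative"
    return f"{strength} {direction}"

# The six sample correlations from the car data
pairs = {"a": -0.907, "b": 0.894, "c": -0.081,
         "d": -0.450, "e": 0.994, "f": 0.510}
for letter, r in pairs.items():
    print(f"({letter}) r = {r:+.3f}: {describe_association(r)}")
```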
1) -1 ≤ r ≤ 1
2) The sign indicates the direction of association
   positive association: r > 0
   negative association: r < 0
   no linear association: r ≈ 0
3) The closer r is to ±1, the stronger the linear association
4) r has no units and does not depend on the units of measurement
5) The correlation between X and Y is the same as the correlation between Y and X
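Properties 4 and 5 can be checked numerically. The sketch below uses NumPy's `corrcoef` on a few made-up values (not the car data): it swaps X and Y, then converts weight from pounds to kilograms, and compares the resulting correlations.

```python
import numpy as np

# Hypothetical values, for illustration only (not the car data)
weight_lb = np.array([2400.0, 2800.0, 3100.0, 3400.0, 3600.0])
city_mpg = np.array([28.0, 25.0, 23.0, 21.0, 20.0])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r_xy = np.corrcoef(weight_lb, city_mpg)[0, 1]

# Property 5: swapping X and Y leaves r unchanged
r_yx = np.corrcoef(city_mpg, weight_lb)[0, 1]

# Property 4: changing units (pounds to kilograms) leaves r unchanged
weight_kg = weight_lb * 0.45359237
r_kg = np.corrcoef(weight_kg, city_mpg)[0, 1]

print(np.isclose(r_xy, r_yx), np.isclose(r_xy, r_kg))
```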
(0) Go to faculty.juniata.edu/kruse
(1) Open the Excel file: ConsumerReportsCarData1999.xlsx
(2) Highlight column C, City MPG
(3) CTRL-click to also highlight column F, Weight
(4) Insert -> Scatter -> Scatterplot
(5) Remove legend
(6) “Zoom” on axes
(7) Add axes titles
(8) Modify plot title, “City MPG vs. Weight”
(9) Add trendline
We were given that the r-value for this data is -0.907.
Excel calculated R² as 0.8225.
Taking the square root gives 0.906918. Rounding and attaching the negative sign of the trendline's slope gives -0.907, which is what we would expect.
We could also calculate the r-value:
(1) using the Data Analysis Add-In in Excel
(2) by “hand,” in Excel
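The square-root step can also be checked outside Excel. The sketch below first repeats the arithmetic on the reported R² of 0.8225, then confirms on a few made-up points (not the car data) that, for a straight-line fit with one predictor, the square root of R² carrying the sign of the slope matches r computed directly. The variable names are illustrative.

```python
import math
import numpy as np

# Arithmetic from the slide: R^2 = 0.8225 with a negative trendline slope
r_from_slide = -math.sqrt(0.8225)
print(round(r_from_slide, 3))  # -0.907

# Hypothetical values, for illustration only (not the car data)
x = np.array([2400.0, 2800.0, 3100.0, 3400.0, 3600.0])
y = np.array([28.0, 25.0, 23.0, 21.0, 20.0])

# Least-squares line, like Excel's linear trendline
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Attach the sign of the slope to the square root of R^2
r_from_fit = math.copysign(math.sqrt(r_squared), slope)
r_direct = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_from_fit, r_direct))
```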
A correlation near zero does not (necessarily) mean that the two variables are unrelated.
EXAMPLE: A circus performer (the Human Cannonball) is interested in how the distance
downrange (Y) that a projectile shot from a cannon will travel depends on the angle of elevation (X)
of the cannon.
Suppose that we designed an experiment to examine this relationship by test firing (dummies) at
various angles ranging from X = 0° to X = 90°. Sketch a typical scatterplot that you might expect to see
from such an experiment.
Would you say that there is likely to be a strong relationship between angle X and distance
downrange Y? Estimate the correlation between the X and Y variables from your scatterplot.
Remember: Correlation measures the strength of linear association between two variables.
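[Figure: blank axes for the sketch, with distance downrange Y on the vertical axis and angle X from 0 deg to 90 deg on the horizontal axis]
http://stat.duke.edu/courses/Fall12/sta101.002/Sec2-145.pdf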
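The cannon example can be simulated as a rough check. The sketch below assumes idealized projectile motion (no air resistance, launch and landing at the same height), where the distance is v² sin(2X) / g; the launch speed of 30 m/s and the 5-degree spacing of the angles are arbitrary choices for illustration.

```python
import numpy as np

g = 9.81      # gravitational acceleration, m/s^2
speed = 30.0  # hypothetical launch speed, m/s

# Test-fire at angles from 0 to 90 degrees in 5-degree steps
angles_deg = np.arange(0, 91, 5)

# Idealized range formula (assumption: no air resistance, level ground)
distance = speed ** 2 * np.sin(2 * np.radians(angles_deg)) / g

r = np.corrcoef(angles_deg, distance)[0, 1]
best_angle = angles_deg[np.argmax(distance)]
print(f"r = {r:.3f}, longest shot at {best_angle} degrees")
```

Under these assumptions distance depends entirely on angle, yet the points rise to a peak at 45 degrees and fall symmetrically, so the linear measure r comes out essentially zero.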
A strong correlation does not (necessarily) imply a cause/effect relationship.
[Figure: scatterplot "Life Expectancy vs. People/TV", with Life Expectancy (years) on the horizontal axis and People per TV on the vertical axis. Trendline: y = -5.5887x + 413.83, R² = 0.6461, R = -0.8038]
Would you agree that there is a fairly strong negative association between these two variables?
Given this association, would it be reasonable to set a foreign policy goal of sending lots of TVs to
the countries with the lowest life expectancies, thus decreasing the number of people per TV and
thereby helping the inhabitants to live longer lives?
http://www.public.iastate.edu/~pcaragea/S226S09/Notes/student.notes.section2.4.pdf
A strong correlation does not (necessarily) imply a cause/effect relationship.
http://www.nbcnews.com/id/41479869/ns/healthdiet_and_nutrition/t/daily-diet-soda-tied-higher-risk-strokeheart-attack/
The following web page has a Java applet that can be used to
construct scatterplots and calculate Pearson’s correlation
coefficient.
http://illuminations.nctm.org/LessonDetail.aspx?ID=L456
1) The coefficient of correlation lies between -1 and +1.
2) The coefficient of correlation is independent of change of origin and scale.
3) The coefficient of correlation is symmetric: the correlation between X and Y equals the correlation between Y and X.
4) The coefficient of correlation measures only the linear association between X and Y.
5) If two variables X and Y are independent, the coefficient of correlation between them is zero; the converse need not hold.