Gerald Kruse, Ph.D. & Cathy Stenson, Ph.D.
Juniata College Mathematics Department

CityMPG = EPA's estimated miles per gallon for city driving
Weight = Weight of the car (in pounds)
FuelCapacity = Size of the gas tank (in gallons)
QtrMile = Time (in seconds) to go 1/4 mile from a standing start
Acc060 = Time (in seconds) to accelerate from zero to 60 mph
PageNum = Page number on which the car appears in the buying guide

Place the letter for each pair on the chart below to indicate your guess as to the direction (negative, neutral, or positive) and strength of the association between the two variables.

(a) Weight vs. CityMPG
(b) Weight vs. FuelCapacity
(c) PageNum vs. FuelCapacity
(d) Weight vs. QtrMile
(e) Acc060 vs. QtrMile
(f) CityMPG vs. QtrMile

Strong Negative | Moderate Negative | Weak Negative | No Association | Weak Positive | Moderate Positive | Strong Positive

[Figure: Matrix Plot - Car Data. Scatterplot matrix of CityMPG, Weight, FuelCap, QtrMile, Acc060, and PageNum.]

Guesses placed on the chart after viewing the scatterplot matrix: (a) and (d) on the negative side, (c) at No Association, and (f), (b), and (e) on the positive side.

Measure of Correlation
Definition: The correlation, r, measures the strength of linear association between two quantitative variables.
\[ r = \frac{1}{n-1} \sum \left( \frac{X - \bar{X}}{S_X} \right) \left( \frac{Y - \bar{Y}}{S_Y} \right) \]

\(\bar{X}\) = mean of X values
\(\bar{Y}\) = mean of Y values
\(S_X\) = Std Dev of X values
\(S_Y\) = Std Dev of Y values

Sample Correlations in 1999 Car Data

            CityMPG   Weight   FuelCap   QtrMile   Acc060
Weight       -0.907
FuelCap      -0.793    0.894
QtrMile       0.510   -0.450    -0.469
Acc060        0.506   -0.454    -0.465     0.994
PageNum       0.283   -0.237    -0.081     0.196    0.205

The chart from the exercise, with the range of r for each category and the sample correlation for each pair:

Strong Negative (r between -1.0 and -0.8): (a) Weight vs. CityMPG = -0.907
Moderate Negative (r between -0.8 and -0.5): none
Weak Negative (r between -0.5 and 0): (d) Weight vs. QtrMile = -0.450
No Association (r around 0): (c) PageNum vs. FuelCapacity = -0.081
Weak Positive (r between 0 and 0.5): none
Moderate Positive (r between 0.5 and 0.8): (f) CityMPG vs. QtrMile = 0.510
Strong Positive (r between 0.8 and 1.0): (b) Weight vs. FuelCapacity = 0.894, (e) Acc060 vs. QtrMile = 0.994

1) -1 ≤ r ≤ 1
2) The sign indicates the direction of association:
   positive association: r > 0
   negative association: r < 0
   no linear association: r ≈ 0
3) The closer r is to ±1, the stronger the linear association
4) r has no units and does not depend on the units of measurement
5) The correlation between X and Y is the same as the correlation between Y and X

(0) faculty.juniata.edu/kruse
(1) Open the Excel file: ConsumerReportsCarData1999.xlsx
(2) Highlight column C, City MPG
(3) CTRL-click and highlight column F, Weight
(4) Insert -> Scatter -> Scatterplot
(5) Remove legend
(6) “Zoom” on axes
(7) Add axes titles
(8) Modify plot title, “City MPG vs. Weight”
(9) Add trendline

We were given that the r-value for this data is -0.907. Excel calculated R² as 0.8225. Taking the square root gives 0.906918. Rounding to three decimal places and attaching the negative sign of the trendline's slope gives -0.907, which is what we would expect.
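The square-root check above can also be sketched in Python. This is a minimal sketch, not the authors' procedure: it computes r from the z-score formula, fits a least-squares trendline, and recovers r as the square root of R² carrying the sign of the slope. The weights and MPG values are made-up illustrative numbers, not the Consumer Reports data, and the function names `correlation` and `trendline` are hypothetical.

```python
import math


def correlation(xs, ys):
    """Sample correlation: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = (
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)


def trendline(xs, ys):
    """Least-squares line: returns (slope, intercept, R squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# ASSUMPTION: made-up weights (pounds) and city MPG values for illustration.
weights = [2400, 2700, 3000, 3200, 3500, 3600]
city_mpg = [27, 25, 23, 22, 20, 21]

r = correlation(weights, city_mpg)
slope, intercept, r_squared = trendline(weights, city_mpg)

# The square root of R squared loses the sign; take it from the slope.
r_from_r_squared = math.copysign(math.sqrt(r_squared), slope)

print(f"r from the formula:        {r:.6f}")
print(f"R squared from trendline:  {r_squared:.6f}")
print(f"signed sqrt of R squared:  {r_from_r_squared:.6f}")

# The same arithmetic applied to the R squared value reported by Excel.
print(f"sqrt(0.8225) = {math.sqrt(0.8225):.6f}")
```

The two routes to r agree because, for a straight-line fit with one predictor, R² is the square of the correlation.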
We could also calculate the r-value:
(1) using the Data Analysis Add-In in Excel
(2) by “hand,” in Excel

A correlation near zero does not (necessarily) mean that the two variables are unrelated.

EXAMPLE: A circus performer (the Human Cannonball) is interested in how the distance downrange (Y) that a projectile shot from a cannon will travel depends on the angle of elevation (X) of the cannon. Suppose that we designed an experiment to examine this relationship by test firing (dummies) at various angles ranging from X = 0° to X = 90°.

Sketch a typical scatterplot that you might expect to see from such an experiment. Would you say that there is likely to be a strong relationship between angle X and distance downrange Y? Estimate the correlation between the X and Y variables from your scatterplot.

Remember: Correlation measures the strength of linear association between two variables.

[Figure: blank axes for the sketch, distance downrange Y against angle X from 0 deg to 90 deg.]
http://stat.duke.edu/courses/Fall12/sta101.002/Sec2-145.pdf

A strong correlation does not (necessarily) imply a cause/effect relationship.

[Figure: Life Expectancy vs. People/TV. Scatterplot of People per TV against Life Expectancy (years), with trendline y = -5.5887x + 413.83, R² = 0.6461, R = -0.8038.]

Would you agree that there is a fairly strong negative association between these two variables? Given this association, would it be reasonable to set a foreign policy goal to send lots of TVs to the countries with the lowest life expectancies, thus decreasing the number of people per TV and thereby helping the inhabitants to live longer lives?
http://www.public.iastate.edu/~pcaragea/S226S09/Notes/student.notes.section2.4.pdf

A strong correlation does not (necessarily) imply a cause/effect relationship.
http://www.nbcnews.com/id/41479869/ns/healthdiet_and_nutrition/t/daily-diet-soda-tied-higher-risk-strokeheart-attack/

The following web page has a Java applet which can be used to construct scatterplots and calculate Pearson’s Correlation Coefficient:
http://illuminations.nctm.org/LessonDetail.aspx?ID=L456

1) The coefficient of correlation lies between -1 and +1.
2) The coefficient of correlation is independent of change of origin and scale.
3) The coefficient of correlation is symmetric: the correlation between X and Y equals the correlation between Y and X.
4) The coefficient of correlation measures only linear correlation between X and Y.
5) If two variables X and Y are independent, the coefficient of correlation between them will be zero. The converse does not hold: a correlation of zero does not by itself show that X and Y are independent.
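Properties 2 through 4 can be checked numerically with the Human Cannonball setting. The sketch below is illustrative only and rests on assumptions that are not in the slides: the ideal no-drag range formula distance = v² sin(2·angle) / g, a made-up muzzle speed, and the hypothetical function name `correlation`.

```python
import math


def correlation(xs, ys):
    """Sample correlation: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = (
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)


G = 9.81      # gravitational acceleration in m/s^2
SPEED = 30.0  # ASSUMPTION: made-up muzzle speed in m/s

# ASSUMPTION: ideal projectile motion with no air resistance.
angles = list(range(0, 91, 5))  # degrees, 0 through 90
distances = [SPEED ** 2 * math.sin(math.radians(2 * a)) / G for a in angles]

# Property 4: distance is completely determined by angle, yet the
# relationship is symmetric about 45 degrees, so r is essentially zero.
r_cannon = correlation(angles, distances)

# Use only the rising half (0 to 45 degrees) for the remaining checks,
# where the association is strongly positive.
rising_angles = angles[:10]
rising_distances = distances[:10]
r_rising = correlation(rising_angles, rising_distances)

# Property 2: change the scale (degrees to radians, meters to feet) and
# the origin (add 100 feet). Both scale factors are positive; a negative
# scale factor would flip the sign of r.
angles_rad = [math.radians(a) for a in rising_angles]
distances_ft = [d * 3.28084 + 100.0 for d in rising_distances]
r_rescaled = correlation(angles_rad, distances_ft)

# Property 3: swapping the roles of X and Y leaves r unchanged.
r_swapped = correlation(rising_distances, rising_angles)

print(f"r over 0 to 90 degrees: {r_cannon:.6f}")
print(f"r over 0 to 45 degrees: {r_rising:.6f}")
print(f"r after rescaling:      {r_rescaled:.6f}")
print(f"r with X and Y swapped: {r_swapped:.6f}")
```

The full 0 to 90 degree range is the case to remember: the variables are tightly related, but not linearly, so r alone would miss the relationship.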