Stat203 Fall2011 – Week 9, Lecture 1

Topics for Today
- Scatterplots
- Relationship between 2 Continuous Variables
- Pearson’s Correlation
- Facts and Myths
- Correlation as a Statistic

Two Continuous Variables

Using the 2-sample Chi-square test, we were able to investigate the relationship between two discrete variables. Eg:
- Radio format and age
- Weather and city

Now we will examine the relationship between two continuous variables. The first tool we will discuss is called correlation.

… but even before that …

Scatter Plots

A scatter plot shows the relationship between 2 continuous variables measured on the same individuals. Values of one variable (X) are plotted on the horizontal axis and values of the other variable (Y) are plotted on the vertical axis. Each individual appears as a single point.

Let’s look at this in SPSS …

Let’s look at a dataset called Detroit that has information from the city for the years 1961 to 1973. It contains 6 variables:
- year
- homicide rate (per 100,000 population)
- # of police (per 100,000 population)
- unemployment rate (%)
- # registered handguns (per 10,000 population)
- average weekly income ($)

Let’s create a scatterplot of two of these variables.

[Figure: a scatterplot of the # of registered handguns and the # of police officers]

Let’s look at the first row of the data table, and then identify that point (circle it) in the scatterplot above. Each row in the data table corresponds to exactly one point in the scatter plot.

What sort of relationship between the # of registered handguns and the # of police officers does this scatterplot show?
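The lecture builds its scatterplots in SPSS. For anyone following along without it, the idea that each row of a data table becomes exactly one point can be sketched in plain Python. This is only an illustration under stated assumptions: the rows below are made-up values (not the Detroit data), and text_scatter is a hypothetical helper, not part of any library.

```python
# Hypothetical (x, y) rows, one per individual. NOT the Detroit dataset.
rows = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 6.0)]


def text_scatter(points, width=21, height=11):
    """Draw a rough text scatterplot: X runs left to right, Y runs bottom to top.

    Assumes the x values are not all equal and the y values are not all equal.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    # Start with an empty grid of spaces.
    grid = [[" "] * width for _ in range(height)]

    # Each row of the data table becomes one "*" on the grid.
    for x, y in points:
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"

    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


plot = text_scatter(rows)
print(plot)
```

With these made-up rows the stars drift upward from left to right, which is the kind of pattern the question above asks you to describe.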
Correlation

The term correlation is often used in common language and has a general interpretation as implying a relationship between two events …

… including two discrete events:
“Autism is correlated with vaccination”

… or things that can’t really be measured:
“there’s a correlation between my mood and my partner’s behavior”

However, in statistics the term correlation means something specific.

Statistical Correlation

Correlation measures the strength and direction of a linear relationship between two continuous variables (X and Y).

Pearson’s correlation is the most commonly used:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} $$

Note:
- this measures ONLY a linear relationship
- there are many types of relationships that are not linear

I only give you the formula for completeness; we will not be calculating it by hand (it is extremely tedious). In this class, as every time you analyze data in the future, we will make the software calculate the correlation.

However, it is important that you understand that it’s just another statistic calculated from the data, just like the mean, the standard deviation, or the odds ratio.

Some Facts about Correlation

1. Correlation can only be used when both variables are interval or ratio level.
2. Correlation does not change when we change the units of measurement of X and Y. Height in cm or in inches will give the same correlation with weight in kg or in lbs.
3. Positive correlation indicates positive association between the variables, and negative correlation indicates negative association.
4. Correlation is always between -1 and 1.
   - Values near 0 indicate a very weak linear relationship
   - -1 or 1 will occur only if the points fall exactly on a straight line

Examples

The following are scatter plots of two variables, with the correlation between the two listed above each plot.

Pearson Correlation of 1

As in the definition, correlation is the strength of the linear relationship. All of these figures have the same correlation!

Important note! The strength of the correlation doesn’t depend on the slope of the line, just how tightly clustered the points are to a straight line … any straight line!

Examples of a relationship with Pearson Correlation of 0

Facts in a video
http://www.youtube.com/watch?v=Ypgo4qUBt5o

Let’s do some examples: Correlation guessing

Q15, pg 370: correlation between poverty and rates of teen pregnancy in 8 US states.
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)

Q16, pg 370 (edited): hours studied and exam grade
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)

Q19, pg 371: hours watching TV vs # books read
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)

An Example

In which of these two scatter plots is the correlation higher?

[Figure: two scatterplots of y against x, drawn with different axis ranges]

The correlation of the x and y in the two figures is the same; only the scale of the axes is different!

Don’t trust your eye, always calculate the correlation.
… but don’t trust the correlation … always check by eye.

Myths about Correlation

1.
   Correlation implies causation.
   There could be a third, unknown variable which influences both X and Y.
2. A correlation coefficient of zero implies no relationship between two variables.
   WRONG! It only implies no LINEAR relationship! Remember the funky shaped figures!

Myths explained in video
http://www.youtube.com/watch?v=MTbZoKEOkUg
http://www.youtube.com/watch?v=VW1IEqKuf6s (Only to 2:48)

Correlation as a statistic

As with the mean, the odds ratio and the other statistics we have looked at, a correlation is a characteristic of a population that we estimate with our sample:

               Population (Parameter)   Sample (Statistic)
Mean           µ                        X̄
Proportion     p                        p̂
Odds Ratio                              OR
Correlation    ρ                        r

The r tells part of the story

Remember, the correlation (r) we calculate from a sample is only one of the many possible correlations we could have obtained, from one of many possible samples.

It’s possible that the true population correlation, ρ, has another value … say 0, or ρ0.

So … there is some variability in our estimate r: its standard error.

$$ \widehat{se}(r) = \sqrt{\frac{1 - r^2}{n - 2}} $$

Hypotheses for Associations between Continuous Variables

H0: there is no linear relationship between X and Y
Ha: there is a linear relationship between X and Y

Is the same as:

H0: ρ = 0
Ha: ρ ≠ 0

And as in our other hypothesis tests, we will use a statistic (r) to approximate a parameter (ρ).

Testing for Correlation = 0

Recall that for our hypothesis test of µ = 0, we used a t-test:

$$ t = \frac{\bar{x} - 0}{se(\bar{x})} = \frac{\bar{x}}{s / \sqrt{n}} $$

If both X and Y are normally distributed, the test for H0: ρ = 0 is very similar:

$$ t = \frac{r - 0}{se(r)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

and we look up our t value in the appropriate table (the t distribution with n - 2 degrees of freedom) to find the p-value!
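The slides leave the arithmetic to software (SPSS in this class). As a rough sketch of the same calculation in Python, the block below computes Pearson’s r from the formula, then the standard error and t statistic for H0: ρ = 0. The data values and the function names pearson_r and t_statistic are made up for illustration; they do not come from the textbook questions or the Detroit dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation, computed directly from the definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r / se(r), where se(r) = sqrt((1 - r^2) / (n - 2))."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se


# Hypothetical data: hours studied and exam grade for 6 students.
hours = [1, 2, 3, 4, 5, 6]
grade = [52, 60, 57, 68, 72, 75]

r = pearson_r(hours, grade)
t = t_statistic(r, len(hours))
print("r =", round(r, 3), " t =", round(t, 3), " df =", len(hours) - 2)

# Myth 2: a perfect but non-linear relationship (y = x squared) can have r = 0.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)
print("r for the curved data =", r_curve)
```

The p-value would then be read from a t table with n - 2 degrees of freedom, as on the slide. The second dataset lies exactly on a curve, yet its r is 0, which is why a zero correlation only rules out a LINEAR relationship.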
New Topics Covered Today

Pearson’s Correlation
- Most commonly calculated correlation statistic
- No definition of response or predictor
- Always between -1 and 1

Hypothesis testing for Correlation
- Does a correlation exist?
- Reject null = a non-zero correlation

Reading: Chapter 10 up to page 360