9.1 Correlation

• Key Concepts:
  – Scatter Plots
  – Correlation
  – Sample Correlation Coefficient, r
  – Hypothesis Testing for the Population Correlation Coefficient, ρ

• What exactly do we mean by correlation?
  – If two variables are correlated, it means a relationship exists between them.
  – Examples of correlated variables:
    • Job Satisfaction and Job Attendance
    • Number of Cows per Square Mile and Crime Rate
    • Height and Weight
    • High School GPA and College GPA
    • Square Footage and Price (of a house)

• Two questions we need to answer:
  1. Does a linear (or straight line) correlation exist between the two variables?
  2. If the variables appear linearly correlated, how strong is the correlation?
  – We can answer (1) using a scatter plot
    • The independent (explanatory) variable is x
    • The dependent (response) variable is y
  – Example: How well does High School GPA, x, “explain” College GPA, y?
  – See section 2.2 for a review of scatter plots

• Once the scatter plot is complete, we should be able to see if a linear relationship exists between the two variables.
  – See p. 470 for what we mean by Negative Linear Correlation, Positive Linear Correlation, No Correlation, and Nonlinear Correlation.
• Next, we need a way to quantify or measure the strength of the linear relationship between the two variables.

• The Correlation Coefficient measures the strength and the direction of the linear relationship between two variables. The sample correlation coefficient, r, is defined as:

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$

  where n is the number of pairs of data

• Things we need to know about the sample correlation coefficient, r:
  – r will always lie between -1 and 1, inclusive: -1 ≤ r ≤ 1
  – If r = -1, we say there is a perfect negative linear correlation between the two variables.
  – If r = 1, there is a perfect positive linear correlation between the two variables.
  – The strength of the linear relationship between the variables is determined by r’s proximity to 1 or -1. In other words, the closer r is to 1 or -1, the stronger the linear relationship. The closer r is to 0, the weaker the linear relationship.
• Practice: #22 p. 482 (Age and Vocabulary)

• Once we have the sample linear correlation coefficient, r, we can use it in a t-Test to make an inference about the population linear correlation coefficient, ρ (Greek letter “rho”).
  – Why bother?
    • Remember we found r using a limited set of data. What about the rest of the population? Do we have enough evidence from the sample data to claim that a significant linear correlation exists between our two variables?
  – Example: If we have analyzed the High School GPA and College GPA of 25 students, is there enough evidence to claim that a significant linear correlation exists between the High School GPA and College GPA of all students?

• t-Test for the Population Correlation Coefficient
  – We will use the two-tailed version of this test:
      H0: ρ = 0 (no significant correlation exists)
      Ha: ρ ≠ 0 (a significant correlation exists)
  – The test statistic is r and the standardized test statistic is given by:

$$ t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

    Note: t follows a t-distribution with n – 2 degrees of freedom

• Practice using the t-Test: #32 p. 484 (Braking Distances: Wet Surface)
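
• A minimal Python sketch of the two calculations in this section: the sample correlation coefficient r (from the sums of x, y, xy, x², and y²) and the standardized test statistic t with n – 2 degrees of freedom.
  – The function names `sample_correlation` and `t_statistic` are hypothetical, chosen only for this illustration.
  – The GPA values are made-up numbers, not data from the textbook.
  – The sketch stops at t; the decision about H0 would still come from comparing t with the critical values in a t-table.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r, computed from the sums of x, y, xy, x^2, y^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)  # number of (x, y) pairs
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 is positive")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if every x (or every y) is identical;
    # r is undefined in that case and this sketch lets the error propagate.
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    # Undefined (division by zero) when r is exactly 1 or -1.
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Hypothetical illustration data (not from the textbook):
hs_gpa = [2.5, 3.0, 3.2, 3.6, 3.9]       # x: High School GPA
college_gpa = [2.3, 2.9, 3.0, 3.4, 3.7]  # y: College GPA

r = sample_correlation(hs_gpa, college_gpa)
t = t_statistic(r, len(hs_gpa))
print(f"r = {r:.4f}, t = {t:.4f}, degrees of freedom = {len(hs_gpa) - 2}")
```

  – To finish the two-tailed test, reject H0 when |t| exceeds the critical value for the chosen level of significance and n – 2 degrees of freedom.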