Section 12.1
Scatter Plots and Correlation
With the quality added value you’ve come to
expect from D.R.S., University of Cordele
HAWKES LEARNING SYSTEMS
Students Matter. Success Counts.
Copyright © 2013 by Hawkes Learning
Systems/Quant Systems, Inc.
All rights reserved.
HAWKES LEARNING SYSTEMS
Regression, Inference, and Model Building
math courseware specialists
12.1 Scatter Plots and Correlation
Types of Relationships:
Plot (x, y) data points and think about
whether x and y are somehow related.
[Figure: four scatter plot panels: Strong Linear Relationship, Weak Linear Relationship, No Relationship, Non-Linear Relationship]
Example 12.3: Determining Whether a Scatter Plot Would Have a
Positive Slope, Negative Slope, or Not Follow a Straight-Line Pattern
Determine whether the points in a scatter plot for the
two variables are likely to have a positive slope,
negative slope, or not follow a straight-line pattern.
a. The number of hours you study for an exam and the
score you make on that exam _________________
b. The price of a used car and the number of miles on
the odometer _____________________________
c. The pressure on a gas pedal and the speed of the
car _____________________________________
d. Shoe size and IQ for adults ___________________
Scatter Plots and Correlation
The Pearson correlation coefficient, ρ, is the
parameter that measures the strength of a linear
relationship between two quantitative variables in a
population. ρ is the Greek letter “rho”. Practice writing
the rho character here:
The correlation coefficient for a sample is denoted by r.
It always takes a value between −1 and 1, inclusive.
−1 ≤ r ≤ 1
Population parameter 𝜌, Sample statistic 𝑟
𝜌 (Greek letter rho) is the population parameter for
the Correlation Coefficient.
𝑟 is the sample statistic for the Correlation Coefficient.
r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}
We use our sample 𝑟 to estimate the population’s 𝜌,
just as in other experiments we used our sample mean x̄ to
estimate the population’s mean, 𝜇.
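For anyone who wants to check the calculator, here is a minimal Python sketch of the computational formula for r. The function name pearson_r and the five toy data pairs are assumptions made up for illustration; they do not come from the textbook or the baseball handout.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r via the computational formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical toy data, made up for illustration only.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

r = pearson_r(toy_x, toy_y)
print(f"r = {r:.4f}")
```

Each sum in the code matches one Σ term in the formula, so it can be compared line by line with a by-hand calculation.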
• –1 ≤ r ≤ 1
• Close to –1 means a strong negative correlation.
• Close to 0 means little or no linear correlation.
• Close to 1 means a strong positive correlation.
Some of these examples are based on
the data set of 2015 Major League
Baseball statistics as listed on another
handout. The easiest way to get this
data is to have the five lists of data
loaded onto your calculator from
another source. The hardest way is
to type in the data yourself: 5 lists x
30 teams = 150 data values.
Is there a correlation between
Team Payroll and Games Won?
• Scatter Plot
2ND STAT PLOT (on Y= key, top left)
L1 is payroll in $millions
L2 is games won
ZOOM 9:ZStat. Do the dots seem to line up in a straight-line pattern? { Yes No }
Is there a correlation between
Team Payroll and Games Won?
• Scatter Plot
2ND STAT PLOT (on Y= key, top left)
L1 is payroll in $millions
L2 is games won
ZOOM 9:ZStat. Do the dots seem to line up in a straight-line pattern? { Yes No }
• Correlation Coefficient
STAT, TESTS, ALPHA F for LinRegTTest (ALPHA E on TI-83/Plus)
Again: L1 is payroll in $millions, L2 is games won
VARS, Y-VARS, 1, 1 to put Y1 into RegEQ
The correlation coefficient is r = _________, which seems { strong, weak }
Is there a correlation between Games Won
and Attendance?
• Scatter Plot
2ND STAT PLOT (on Y= key, top left)
Which two lists do you use?
L___ is Games Won
L___ is Attendance
ZOOM 9:ZStat. Do the dots seem to line up in a straight-line pattern? { Yes No }
• Correlation Coefficient
STAT, TESTS, ALPHA F for LinRegTTest (ALPHA E on TI-83/Plus)
VARS, Y-VARS, 1, 1 to put Y1 into RegEQ
The correlation coefficient is r = _________, which seems { strong, weak }
LinRegTTest Inputs
• Here are the inputs:
• β & ρ: ≠0
– This is the Alternative
Hypothesis. Always ≠0
for M2205 Ch. 12.
• RegEq: VARS, right
arrow to Y-VARS, 1, 1
• Xlist and Ylist – the two
data lists of interest
• Freq: 1 (unless…)
– Just put it in
– It will be used later
• Highlight “Calculate”
• Press ENTER
LinRegTTest Outputs, first screen
• t= the t test statistic
value for this test (the
formula is coming soon)
• p = the p-value for this t
test statistic
• 𝑑𝑓 = 𝑛 – 2 in this kind
of a test
• 𝑎 later – for regression
LinRegTTest Outputs, second screen
• b later, for Regression
• s much later, for
advanced Regression
• r² = the proportion of the
variation in the output
variable (weight) that is
explained by the input
variable (girth)
• r = the correlation
coefficient for the
sample
– Close to +1? strong
positive relationship
– Or −1? strong negative
Correlation does not imply Causation!
If there seems to be a Correlation, it doesn’t necessarily
mean that changes in one variable cause changes in the
other variable.
1. There might be a lurking variable that affects both.
2. Or the two might be completely unrelated. The
mathematical indication of a strong correlation is
merely coincidental.
Extreme examples can be seen at the Spurious
Correlations web site (www.tylervigen.com)
Testing the Correlation Coefficient for
Significance Using Hypothesis Testing
Testing Linear Relationships for Significance
Significant Linear Relationship (Two-Tailed Test)
(This is the one we use the most.)
H0: ρ = 0 (Implies that there is no significant linear relationship)
Ha: ρ ≠ 0 (Implies that there is a significant linear relationship)
Testing Linear Relationships for Significance (cont.)
Significant Negative Linear Relationship (Left-Tailed Test)
(Be aware that this one exists.)
H0: ρ ≥ 0 (Implies that there is no significant negative linear relationship)
Ha: ρ < 0 (Implies that there is a significant negative linear relationship)
Testing Linear Relationships for Significance (cont.)
Significant Positive Linear Relationship (Right-Tailed Test)
(Be aware that this one exists.)
H0: ρ ≤ 0 (Implies that there is no significant positive linear relationship)
Ha: ρ > 0 (Implies that there is a significant positive linear relationship)
(Now they’re getting into the Hypothesis Testing we saw a brief
preview of earlier in this set of slides.)
Testing the Correlation Coefficient for
Significance Using Hypothesis Testing
Test Statistic for a Hypothesis Test for a Correlation
Coefficient
The test statistic for testing the significance of the
correlation coefficient is given by
t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}
where r is the sample correlation coefficient and
n is the number of data pairs in the sample.
The number of degrees of freedom for the t-distribution
of the test statistic is given by n − 2.
TI-84: LinRegTTest will calculate this value for us.
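To see the arithmetic behind the value LinRegTTest reports, here is a small Python sketch of the test statistic formula. The function name t_statistic and the inputs r = 0.5, n = 27 are hypothetical, chosen only to make the numbers easy to follow.

```python
import math

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)) for sample correlation r and n data pairs."""
    if n <= 2:
        raise ValueError("need more than two data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly -1 or 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical inputs, for illustration only.
n = 27
t = t_statistic(0.5, n)
df = n - 2  # degrees of freedom for the t-distribution

print(f"t = {t:.4f}, df = {df}")
```

Notice that the same r gives a larger t when n is larger, which is why a moderate correlation can be significant in a big sample but not in a small one.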
Testing the Correlation Coefficient for
Significance Using Hypothesis Testing
Rejection Regions for Testing Linear Relationships
Significant Linear Relationship (Two-Tailed Test)
Reject the null hypothesis, H0, if |t| ≥ t_{α/2}.
Significant Negative Linear Relationship (Left-Tailed Test)
Reject the null hypothesis, H0, if t ≤ −t_α.
Significant Positive Linear Relationship (Right-Tailed Test)
Reject the null hypothesis, H0, if t ≥ t_α.
But we will use the p-value method
because LinRegTTest gives us a p-value
and the experiment specifies the α (alpha)
Hypothesis Test for significant 𝑟
Null Hypothesis: 𝜌 = 0
“No relationship”
Alternative: 𝜌 ≠ 0 “There is a significant relationship!”
There’s some 𝛼 level of significance specified in
advance, like 𝛼 = .01 or 𝛼 = .05
A 𝑡 value is calculated. Then “what is the 𝑝-value of
this 𝑡?” (Area beyond 𝑡, is it a small probability?)
And if 𝑝-value < 𝛼, reject the null hypothesis
– If so, then we say “Yes, significant relationship!”
Disregard most of the by-hand detail that is in the online Help.
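The p-value step can be sketched in Python as well. This is only an approximation under stated assumptions: it finds the two-tailed area beyond |t| by numerically integrating the t-distribution density with Simpson's rule, which is not necessarily how the calculator does it. The function names and the sample inputs t = 2.5, df = 10 are hypothetical.

```python
import math

def t_pdf(u, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=2000):
    """Approximate area in both tails beyond |t| (Simpson's rule on [0, |t|])."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # area between 0 and |t|
    return max(0.0, 1 - 2 * central_area)

def decide(p_value, alpha):
    """p-value method: reject H0 only when p-value < alpha."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"

# Hypothetical test statistic and degrees of freedom, for illustration only.
p = two_tailed_p_value(2.5, 10)
print(f"p-value = {p:.4f} -> {decide(p, 0.05)}")
```

The comparison in decide is the whole decision rule: the t value and its p-value come from the data, while α is fixed in advance.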
Example 12.7: Performing a Hypothesis Test to Determine if
the Linear Relationship between Two Variables Is Significant
Use a hypothesis test to determine if the linear
relationship between the number of parking tickets a
student receives during a semester and his or her GPA
during the same semester is statistically significant at
the 0.05 level of significance. Refer to the data
presented in the following table.
GPA and Number of Parking Tickets
Number of Tickets:  0    0    0    0    1    1    1    2    2    2    3    3    5    7    8
GPA:                3.6  3.9  2.4  3.1  3.5  4.0  3.6  2.8  3.0  2.2  3.9  3.1  2.1  2.8  1.7
Example 12.7: Performing a Hypothesis Test to Determine if the
Linear Relationship between Two Variables Is Significant (cont.)
Solution
Step 1: State the null and alternative hypotheses.
We wish to test the claim that a significant linear
relationship exists between the number of parking
tickets a student receives during a semester and his or
her GPA during the same semester. Thus, the
hypotheses are stated as follows.
H0: ρ = 0 (Population Correl. Coeff. = 0: No correlation.)
Ha: ρ ≠ 0 (Population Correl. Coeff. ≠ 0: Yes, correlation.)
Example 12.7: Performing a Hypothesis Test to Determine if the
Linear Relationship between Two Variables Is Significant (cont.)
Step 2: Determine which distribution to use for the test
statistic, and state the level of significance.
We will use the t-test statistic presented previously in
this section along with a significance level of α = 0.05 to
perform this hypothesis test.
Step 3: Gather data and calculate the necessary sample
statistics. (Do LinRegTTest)
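As a cross-check on the calculator, the same quantities can be computed in Python from the ticket and GPA table. This is a sketch, not a replacement for LinRegTTest: the function names are hypothetical, and the p-value is approximated by numerical integration of the t density, so the last digits may differ slightly from the calculator screen.

```python
import math

# Data from the Example 12.7 table.
tickets = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 5, 7, 8]
gpa = [3.6, 3.9, 2.4, 3.1, 3.5, 4.0, 3.6, 2.8, 3.0, 2.2, 3.9, 3.1, 2.1, 2.8, 1.7]

def sample_r(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def t_pdf(u, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed area beyond |t| (Simpson's rule, even step count)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * (total * h / 3))

n = len(tickets)
df = n - 2
r = sample_r(tickets, gpa)
t = r / math.sqrt((1 - r ** 2) / df)
p = two_tailed_p(t, df)
alpha = 0.05

print(f"n = {n}, df = {df}")
print(f"r = {r:.4f}, t = {t:.4f}, p = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

Run it and compare the printed r, t, and p with the LinRegTTest output before filling in the conclusion.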
Example 12.7 Hypothesis Test, concluded
Compare p = _____ vs. α = ______
Decision: { Reject / Fail to Reject } the Null Hypothesis.
Conclusion about Significant Linear Relationship:
Conclusion in Plain English:
Example 12.8: Performing a Hypothesis Test to Determine if
the Linear Relationship between Two Variables Is Significant
An online retailer wants to research the effectiveness of
its mail-out catalogs. The company collects data from
its eight largest markets with respect to the number of
catalogs (in thousands) that were mailed out one fiscal
year versus sales (in thousands of dollars) for that year.
The results are as follows.
Number of Catalogs Mailed and Sales
Number of Catalogs (in Thousands):  2     3    3     3     4     4     5     6
Sales (in Thousands):               $126  $98  $255  $394  $107  $122  $334  $403
Example 12.8: Performing a Hypothesis Test to Determine if the
Linear Relationship between Two Variables Is Significant (cont.)
Use a hypothesis test to determine if the linear
relationship between the number of catalogs mailed
out and sales is statistically significant at the 0.01 level
of significance.
Step 1: Hypotheses:
H0: ___________ meaning _____________________.
Ha: ___________ meaning _____________________.
Step 2: Decision to use the t distribution
and level of significance _____ = 0.01
Example 12.8: Performing a Hypothesis Test to Determine if the
Linear Relationship between Two Variables Is Significant (cont.)
Step 3: Gather data and calculate the necessary sample
statistics. Using a TI-83/84 Plus calculator, enter the
values for the numbers of catalogs mailed (x) in L1 and
the sales values (y) in L2. Run LinRegTTest.
Step 4: Conclusion:
{ Reject / Fail to Reject } the Null Hypothesis.
Interpretation:
Coefficient of Determination
The coefficient of determination, r², is a measure of
the proportion of the variation in the response variable
(y) that can be associated with the variation in the
explanatory variable (x).
This too is reported to you in the
LinRegTTest outputs.
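To make "proportion of the variation" concrete, here is a Python sketch using the catalog and sales data from Example 12.8. It computes r² two ways: by squaring r, and as 1 minus the share of variation in y left unexplained by a least-squares line. The least-squares line belongs to the next lesson, so treat that part as a preview; the variable names are hypothetical.

```python
import math

# Data from the Example 12.8 table.
catalogs = [2, 3, 3, 3, 4, 4, 5, 6]               # thousands of catalogs mailed
sales = [126, 98, 255, 394, 107, 122, 334, 403]   # thousands of dollars

n = len(catalogs)
x_bar = sum(catalogs) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in catalogs)
syy = sum((y - y_bar) ** 2 for y in sales)        # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(catalogs, sales))

# Least-squares line y = a + b*x (the form LinRegTTest reports).
b = sxy / sxx
a = y_bar - b * x_bar

# Variation in y that the line does NOT account for.
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(catalogs, sales))

r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2
r_squared_from_variation = 1 - unexplained / syy

print(f"r = {r:.4f}")
print(f"r^2 (squaring r)           = {r_squared_from_r:.4f}")
print(f"r^2 (explained proportion) = {r_squared_from_variation:.4f}")
```

The two values agree, which is why squaring r is enough to get the coefficient of determination.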
Example 12.9: Calculating and Interpreting the
Coefficient of Determination
If the correlation coefficient for the relationship
between the numbers of rooms in houses and their
prices is r = 0.65, how much of the variation in house
prices can be associated with the variation in the
numbers of rooms in the houses?
Solution
Recall that the coefficient of determination tells us the
amount of variation in the response variable (house
price) that is associated with the variation in the
explanatory variable (number of rooms).
Example 12.9: Calculating and Interpreting the
Coefficient of Determination (cont.)
Thus, the coefficient of determination for the
relationship between the numbers of rooms in houses
and their prices will tell us the proportion or
percentage of the variation in house prices that can be
associated with the variation in the numbers of rooms
in the houses. Also, recall that the coefficient of
determination is equal to the square of the correlation
coefficient.
Example 12.9: Calculating and Interpreting the
Coefficient of Determination (cont.)
Since we know that the correlation coefficient for these
data is r = 0.65, we can calculate the coefficient of
determination as r² = _____
Thus, approximately _____% of the variation in house
prices can be associated with the variation in the
numbers of rooms in the houses.
Correlation Coefficient in Excel
More with Excel
That’s about all that can be done with basic Excel.
There is an advanced feature on the Data tab:
the Data Analysis add-in.
It gets into the Regression topic in the next lesson.