The Statistical Imagination

• Chapter 15. Correlation and Regression
Part 2: Hypothesis Testing and
Aspects of a Relationship
When to Test a Hypothesis Using
Correlation and Regression
1) There is one representative sample from a
single population
2) There are two interval/ratio or interval-like
ordinal variables
3) There are no restrictions on sample size, but
generally, the larger the n, the better
4) A scatterplot of the coordinates of the two
variables fits a linear pattern
Test Preparation
• Before proceeding with the hypothesis test,
check the scatterplot for a linear pattern
• Calculate the Pearson’s r correlation coefficient
and the regression coefficient, b
• Compute the means of X and Y and use them
and b to compute a, the Y-intercept
• Specify the regression equation, insert values of
X, solve for Ŷ, and plot the line on the
scatterplot
• Provide a conceptual diagram
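The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration using the usual least-squares formulas, not a procedure taken from the text; the function name fit_line and the five data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from math import sqrt

def fit_line(x, y):
    """Return Pearson's r, the slope b, and the Y-intercept a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    b = sxy / sxx               # regression coefficient (slope)
    a = mean_y - b * mean_x     # intercept, from the two means and b
    return r, b, a

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, b, a = fit_line(x, y)
print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
print(f"Regression equation: Y-hat = {a:.3f} + {b:.3f} * X")
```

The sketch does not check the linearity assumption; the scatterplot should still be inspected before the line is trusted.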
Features of the Hypothesis Test
• Step 1. Stat. H: ρ = 0
• That is, there is no relationship between
X and Y
• The Greek letter rho (ρ) is the correlation
coefficient obtained if Pearson’s correlation
coefficient were computed for the population
• A ρ of zero asserts that there is no correlation in
the population and that the regression line has
no slope
• Step 2. The sampling distribution is the
t-distribution with df = n - 2
• When the Stat. H is true, sample Pearson’s r’s
will center around zero
• This test does not require a direct calculation of
a standard error
• Step 4. The test effect is the value of
Pearson’s r
• The test statistic is t_r
• The p-value is estimated from the t-distribution
table, Statistical Table C in Appendix B
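As a rough sketch of how the test statistic can be obtained without a separate standard error, the commonly used formula t = r * sqrt((n - 2) / (1 - r^2)) is shown below. The source does not print the formula, so treat it as an assumption about the intended computation; the function name t_for_r and the values r = 0.5 and n = 27 are hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """Return the t statistic for Pearson's r and its degrees of freedom."""
    df = n - 2
    # Undefined when r is exactly +1 or -1 (division by zero)
    t = r * sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical sample result, for illustration only
t_r, df = t_for_r(0.5, 27)
print(f"t_r = {t_r:.3f} with df = {df}")
```

The resulting t_r is then compared against the t-distribution table at the computed df to estimate the p-value.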
Four Aspects of a Relationship
• With correlation and regression analysis,
because both variables are of interval/ratio
level, the analysis is mathematically rich
• All four aspects of a relationship apply
Existence of a Relationship
• Test the Stat. H that ρ = 0, that there is no
relationship between X and Y
• If the Stat. H is rejected, a relationship exists
Direction of a Relationship
• Direction is indicated by the sign of r and b,
and by observing the slope of the pattern of
coordinates in a scatterplot
• A positive relationship is revealed with an
upward slope, and r and b will be positive
• A negative relationship is revealed with a
downward slope, and r and b will be negative
Strength of a Relationship
• Strength is determined by the proportion of the
total variation in Y explained by X
• This proportion is quickly obtained by squaring
Pearson’s r correlation coefficient
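To make the "proportion of variation explained" idea concrete, the sketch below computes that proportion directly from the regression line, as 1 minus unexplained variation over total variation, and compares it with the squared correlation. The function name and the data are hypothetical, used for illustration only.

```python
def proportion_explained(x, y):
    """Share of the total variation in Y accounted for by the regression line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Total variation: squared deviations of Y from its mean
    total = sum((yi - mean_y) ** 2 for yi in y)
    # Unexplained variation: squared deviations of Y from the predicted values
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - unexplained / total

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
explained = proportion_explained(x, y)
print(f"Proportion of variation in Y explained by X: {explained:.2f}")
```

For these data Pearson's r is about 0.775, and squaring it gives the same 0.60 that the function returns.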
Nature of a Relationship
1) Interpret the regression coefficient, b, the
slope of the regression line. State the effect on
Y of a one-unit change in X
2) Provide best estimates using the regression
line equation. Insert chosen values of X,
compute Ŷ's and interpret them in everyday
language
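Both points above can be illustrated with a short sketch. The coefficients a = 50 and b = 5 are hypothetical and do not come from the text; they stand in for whatever values a real analysis produces.

```python
# Hypothetical coefficients, chosen only for illustration
a = 50.0  # Y-intercept: predicted Y when X = 0
b = 5.0   # slope: change in predicted Y for a one-unit increase in X

def predict(x_value):
    """Best estimate of Y (Y-hat) for a chosen value of X."""
    return a + b * x_value

for x_value in (2, 4, 6):
    print(f"X = {x_value}: predicted Y = {predict(x_value):.1f}")
```

Read in everyday terms, each one-unit increase in X is associated with a 5-unit increase in the predicted Y, and each printed value is the best estimate of Y for subjects with that X-score.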
Careful Interpretation of
• A correlation applies to a population, not to an
individual
• E.g., predictions of Y for a value of X provide
the best estimate of the mean of Y for all
subjects with that X-score
• A statistical relationship may exist but not mean
much. It is important to distinguish statistical
significance (i.e., the existence of a
relationship) from practical significance (i.e.,
the strength of the relationship)
Spurious Correlation
• A spurious correlation is one that is
conceptually false, nonsensical, or theoretically
meaningless
• E.g., in the 1990s there was a positive correlation
between the amount of carbon dioxide released
into the atmosphere and the level of the Dow
Jones stock index