Statistics - Healey Chapter 13-14

Week 12
Chapter 13: Association Between Variables
Measured at the Ordinal Level
&
Chapter 14: Association Between Variables
Measured at the Interval-Ratio Level
Chapter 13
Association Between Variables
Measured at the Ordinal Level
This Presentation
 Two Types of Ordinal Variables
 Gamma
 Spearman’s Rho
 Hypothesis Tests for Gamma and Rho
Two Types of Ordinal Variables
1. Continuous ordinal variables:
 Have many possible scores
 Resemble interval-ratio level variables
 Use Spearman’s Rho: rs
 Example: a scale measuring attitudes toward handgun
control with scores ranging from 0 to 20
2. Collapsed ordinal variables:
 Have just a few values or scores
 Use Gamma: G
 Can also use Somers’ d and Kendall’s tau-b (see text website)
 Example: social class measured as lower, middle, upper
Gamma
 Gamma is used to measure the strength and direction of
the relationship between two ordinal level variables that
have been arrayed in a bivariate table
 Before computing and interpreting Gamma, it will always
be useful to find and interpret the column percentages
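As a rough illustration of what Gamma summarizes, the sketch below counts the pairs of cases ranked in the same order on both variables (Ns) and the pairs ranked in opposite order (Nd) in a bivariate table, then computes G = (Ns - Nd) / (Ns + Nd). The function names and the 3 x 3 table of frequencies are hypothetical, and the table is assumed to list categories from low to high along both rows and columns.

```python
def concordant_discordant(table):
    """Return (Ns, Nd) for a table whose rows and columns run low to high."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare each cell only with cells in later rows,
            # so every pair of cases is counted once.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables: same order
                        ns += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        nd += table[i][j] * table[k][m]
    return ns, nd


def gamma(table):
    """Gamma = (Ns - Nd) / (Ns + Nd)."""
    ns, nd = concordant_discordant(table)
    return (ns - nd) / (ns + nd)


# Hypothetical frequencies, not taken from the slides.
example = [
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 20],
]
ns, nd = concordant_discordant(example)
g = gamma(example)
print(ns, nd, round(g, 2))
```

Pairs tied on either variable (same row or same column) are ignored, which is why Gamma tends to be larger than measures that take ties into account.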
Gamma
Interpretation:
 Use the table below as a guide to interpret the strength of
gamma in overall terms
Gamma
 In addition to strength, gamma also identifies the
direction of the relationship
 In a negative relationship, the variables change in
different directions
 Example: As age increases, income decreases
(or, as age decreases, income increases)
 In a positive relationship, the variables change in the
same direction
 Example: As education increases, income
increases (or, as education decreases, income
decreases)
Gamma
In addition to strength and direction, a hypothesis test of
Gamma can also indicate if the two variables share a
relationship in the population, or if the two variables are
significantly related
Hypothesis Test of Gamma:
 Step 1: Make Assumptions and Meet Test
Requirements
 Random sampling
 Ordinal level of measurement
 Normal sampling distribution
Gamma
Hypothesis Test of Gamma:
 Step 2: State the Null Hypothesis

 Ho: γ = 0
 No relationship exists between the variables
in the population
 H1: γ ≠ 0
 A relationship exists between the variables in
the population
Gamma
Hypothesis Test of Gamma:
 Step 3: Select the Sampling Distribution and
Establish the Critical Region
 Sampling distribution = Z distribution
 Set alpha (two-tailed)
 Look up Z(critical) in Appendix A
Gamma
Hypothesis Test of Gamma:
 Step 4: Compute the Test Statistic
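$$Z(\text{obtained}) = G \sqrt{\frac{N_s + N_d}{N(1 - G^2)}}$$
where
$$G = \frac{N_s - N_d}{N_s + N_d}$$
and N_s is the number of pairs of cases ranked in the same order on both
variables, N_d is the number of pairs ranked in different order, and N is
the sample size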
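Steps 3 and 4 can be sketched in a few lines of Python. The counts Ns = 2000, Nd = 575 and N = 105 are hypothetical, the function name is made up for this example, and Z(critical) is taken from the standard library's normal distribution instead of Appendix A.

```python
from math import sqrt
from statistics import NormalDist


def gamma_z(ns, nd, n):
    """Return (G, Z obtained) from concordant pairs, discordant pairs and N."""
    g = (ns - nd) / (ns + nd)
    # Undefined when G is exactly +1 or -1 (division by zero).
    z = g * sqrt((ns + nd) / (n * (1 - g ** 2)))
    return g, z


alpha = 0.05  # two-tailed
g, z_obtained = gamma_z(ns=2000, nd=575, n=105)  # hypothetical counts
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z_obtained) > z_critical
print(round(g, 2), round(z_obtained, 2), round(z_critical, 2), reject_null)
```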
Gamma
Hypothesis Test of Gamma:
 Step 5: Make a Decision and Interpret the
Results
 Compare Z(obtained) to Z(critical)
 If Z(obtained) falls in the critical region,
reject Ho
 If Z(obtained) does not fall in the critical
region, fail to reject Ho
 Interpret results
Spearman’s Rho (rs)

Measure of association for ordinal-level variables
with a broad range of different scores and few ties
between cases on either variable
 Computing Spearman’s Rho
1.
Rank cases from high to low on each variable
2.
Use ranks, not the scores, to calculate Rho
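The two steps above can be sketched as follows, assuming the usual computing formula for Rho, rs = 1 - 6ΣD² / (N(N² - 1)), where D is the difference between a case's two ranks. The function names and the scores are hypothetical; tied scores are given the average of the ranks they would have occupied.

```python
def rank_high_to_low(scores):
    """Rank 1 is the highest score; ties share the average of their ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first = ordered.index(value) + 1  # best rank held by this value
        count = ordered.count(value)      # number of tied cases
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[s] for s in scores]


def spearman_rho(x, y):
    """Spearman's Rho computed from ranks, not from the original scores."""
    rank_x, rank_y = rank_high_to_low(x), rank_high_to_low(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for five cases, not the data from the slides.
x_scores = [10, 8, 6, 4, 2]
y_scores = [9, 10, 5, 6, 1]
rho = spearman_rho(x_scores, y_scores)
print(round(rho, 2))
```

This computing formula is only approximate when there are many ties, which is one reason Rho is recommended for variables with a broad range of scores and few ties.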
Spearman’s Rho (rs)





 Rho is positive, therefore jogging and self-image
share a positive relationship: as jogging rank
increases, self-image rank also increases
 On its own, Rho does not have a good strength
interpretation
 But Rho² is a PRE measure
 For this example, Rho² = (0.86)² = 0.74
 Therefore, we would make 74% fewer errors if we
used the rank on jogging to predict the rank on
self-image than if we ignored the rank on jogging
Spearman’s Rho (rs)
In addition to strength and direction, a hypothesis test of Rho can
also indicate if the two variables share a relationship in the
population, or if the two variables are significantly related
Hypothesis Test of Spearman’s Rho:
 Step 1: Make Assumptions and Meet Test
Requirements
 Random sampling
 Ordinal level of measurement
 Normal sampling distribution
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
 Step 2: State the Null Hypothesis

 Ho: ρs = 0
 No relationship exists between the variables
in the population
 H1: ρs ≠ 0
 A relationship exists between the variables in
the population
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
 Step 3: Select the Sampling Distribution and
Establish the Critical Region
 Sampling distribution = Student’s t
 Alpha = 0.05 (two-tailed)
 Degrees of freedom = N-2 = 8
 t(critical) = ±2.306
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
 Step 4: Compute the Test Statistic
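A worked version of this step, assuming the standard t formula for a rank correlation and using the values given in this example (rs = 0.86, and N = 10 because N - 2 = 8). It reproduces the t(obtained) = 4.77 reported in Step 5.

```latex
t(\text{obtained}) = r_s \sqrt{\frac{N - 2}{1 - r_s^2}}
                   = 0.86 \sqrt{\frac{8}{1 - 0.7396}}
                   = 0.86 \sqrt{30.72}
                   \approx 4.77
```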
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
 Step 5: Make a Decision and Interpret the
Results
 t(obtained) = 4.77
 t(critical) = ±2.306
 t(obtained) falls in the critical region, so
reject Ho
 Jogging and self-image are related in the
population from which the sample was
drawn
Chapter 14
Association Between Variables
Measured at the Interval-Ratio Level
This Presentation
 Scattergrams
   Graphs that display relationships between two interval-ratio
variables
 Regression Coefficients and the Regression Line
   Regression line summarizes the linear relationship between
X and Y
   Regression coefficients predict scores on Y from scores on
X
 Pearson’s r
   Preferred measure of association for two interval-ratio
variables
 Coefficient of determination: r²
 Correlation matrix
Scattergrams
 Scattergrams have two dimensions:
 The X (independent) variable is arrayed along the
horizontal axis
 The Y (dependent) variable is arrayed along the
vertical axis
 Each dot on a scattergram is a case
 The dot is placed at the intersection of the case’s
scores on X and Y

Scattergrams
 A regression line, which summarizes the linear
relationship between X and Y, is added to the graph
 “Eyeball” a straight line that connects all of the
dots or comes as close as possible to connecting
all of the dots
 To be more precise: calculate the conditional
mean of Y for each value of X, plot those values,
and connect the dots
 Inspection of a scattergram should always be the
first step in assessing the relationship between two
interval-ratio level variables
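The two ways of drawing the line can be made concrete with a short sketch: the conditional means of Y for each value of X, and the least-squares line Y = a + bX, which is assumed here to be the regression line these slides refer to. The data and function names are hypothetical.

```python
from collections import defaultdict


def conditional_means(x, y):
    """Mean of Y for each distinct value of X, in ascending order of X."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    return {xi: sum(ys) / len(ys) for xi, ys in sorted(groups.items())}


def least_squares_line(x, y):
    """Return (a, b): the Y intercept and slope of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    b = covariation / variation_x
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: two cases at each value of X.
x = [1, 1, 2, 2, 3, 3]
y = [1, 3, 3, 5, 5, 7]
means = conditional_means(x, y)
a, b = least_squares_line(x, y)
print(means, a, b)
```

In this made-up data the conditional means fall exactly on the least-squares line; with real data they usually scatter around it.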
Scattergrams
Linearity
 A key assumption of scattergrams and regression
analysis is that X and Y share a linear relationship
 In a linear relationship the dots of a scattergram form a
straight line pattern
Linear Relationship: Example
Scattergrams
Linearity
 In a nonlinear relationship the dots do not form a straight
line pattern
Scattergrams
Three Questions
 Does a relationship exist?
 A relationship exists if the conditional
means of Y change across values of X
 As long as the regression line lies at an
angle to the X axis (and is not parallel to
the X axis), we can conclude that a
relationship exists between the two
variables
Scattergrams
Three Questions
 How strong is the relationship?
 Strength of the relationship is determined by the
spread of the dots around the regression line
 In a perfect association, all dots fall on the
regression line
 In a stronger association, the dots fall close
(are clustered tightly around) the regression
line
 In a weaker association, the dots are spread
out relatively far from the regression line
Scattergrams
Three Questions
 What is the direction of the relationship? (Direction of
association is determined by the angle of the
regression line)
What is the Direction of the Relationship?
Scattergrams
Based on this scattergram for percent college educated (X) and voter
turnout (Y) on election day for 50 states:
Does a relationship exist? How strong is the relationship?
What is the direction of the relationship? Is the relationship linear?
Scattergrams
 Does a relationship exist?

The regression line falls at an angle to the X axis (it is not
parallel), therefore we can conclude that an association
exists between voter turnout and college education
Scattergrams
 How strong is the relationship?


The greater the extent to which dots are clustered around
the regression line, the stronger the relationship
This relationship is weak to moderate in strength
Scattergrams
 What is the direction of the relationship?



Positive: Regression line rises from lower-left to upper-right
Negative: Regression line falls from upper-left to lower-right
This is a positive relationship: As percent college educated
increases, voter turnout increases
Scattergrams
 Is the relationship linear?
 The conditional means on Y form a straight line, as
demonstrated by the regression line
 Therefore, the relationship is linear
Pearson’s r
 Pearson’s r is a measure of association for
interval-ratio level variables
 Pearson’s r can indicate the direction of association,
but it does not have an acceptable strength
interpretation
 But, by squaring r, we obtain a PRE measure called
the coefficient of determination
 The coefficient of determination indicates the
percentage of the variation in Y that is explained by X
Pearson’s r
 Calculate r
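A minimal sketch of the calculation, assuming the definitional formula for Pearson's r: the covariation of X and Y divided by the square root of the product of their variations. The data and the function name are hypothetical, not the example from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    variation_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical scores for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination
print(round(r, 2), round(r_squared, 2))
```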
Pearson’s r
 r = 0.50
 r is positive, therefore the relationship between X and Y is
positive
 As the number of children in dual-career families increases,
husbands’ hours of housework per week also increases
 r² = (0.50)² = 0.25
 r² is 0.25, therefore the number of children in dual-career
families explains 25% of the variation in husbands’ hours of
housework per week
Pearson’s r
Hypothesis Test of Pearson’s r
 Step 1: Make Assumptions and Meet Test
Requirements






 Random sample
 Interval-ratio level measurement
 Bivariate normal distributions
 Linear relationship
 Homoscedasticity
 Normal sampling distribution
Pearson’s r
Hypothesis Test of Pearson’s r
 Step 2: State the Null Hypothesis


Ho: ρ = 0
H1: ρ ≠ 0
 Step 3: Select the Sampling Distribution and Establish the
Critical Region




Sampling distribution = Student’s t
Alpha = 0.05 (two-tailed)
Degrees of freedom = N-2 = 10
t(critical) = ±2.228
Pearson’s r
Hypothesis Test of Pearson’s r
 Step 4: Compute Test Statistic
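A worked version of this step, assuming the standard t formula for Pearson's r and using the values given in this example (r = 0.50, and N = 12 because N - 2 = 10). It reproduces the t(obtained) = 1.83 reported in Step 5.

```latex
t(\text{obtained}) = r \sqrt{\frac{N - 2}{1 - r^2}}
                   = 0.50 \sqrt{\frac{10}{1 - 0.25}}
                   = 0.50 \sqrt{13.33}
                   \approx 1.83
```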
Pearson’s r
Hypothesis Test of Pearson’s r
 Step 5: Make a Decision and Interpret the Results
 t(critical) = ±2.228
 t(obtained) = 1.83
 t(obtained) does not fall in the critical region, so we
fail to reject Ho
 We cannot conclude that the two variables are related in the population
Correlation Matrix
 A correlation matrix is a table that shows the relationships
between all possible pairs of variables
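A small sketch of how such a table can be built, assuming Pearson's r as the measure in each cell. The variable names A, B and C and their scores are hypothetical placeholders, not the variables in the matrix used on the next slide.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    variation_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariation / sqrt(variation_x * variation_y)


def correlation_matrix(data):
    """Nested dict holding r for every pair of variables in `data`."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}


# Hypothetical variables measured on the same five cases.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 1, 4, 3, 5],
    "C": [5, 4, 3, 2, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

The matrix is symmetric and its diagonal is 1.00, because every variable is perfectly correlated with itself. Only the cells above or below the diagonal need to be read.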
Correlation Matrix
 Using the matrix below:
 What is the correlation between GDP and inequality?
 Of all the variables correlated with Inequality, which has the
strongest relationship? The weakest?