response variable

advertisement
Examining Relationships
Prob. And Stat. CH.2.1
Scatterplots
• The relationships between two variables can be
strongly influenced by other variables that are
lurking in the background.
• Because variation is everywhere, statistical
relationships are overall tendencies, not ironclad
rules. They allow individual exceptions.
• To study a relationship between two variables, we
measure both variables on the same individual.
• We often think that one of the variables explains
or influences the other.
Response Variable, Explanatory Variable
A response variable measures an outcome of a
study.
An explanatory variable explains or influences
changes in a response variable.
•A response variable measures an outcome of a
study. (Independent Variables)
•An explanatory variable explains or influences
changes in a response variable. (Dependent
Variables)
•Because of the multiple meanings that the words
“independent”, and “dependent”, have in statistics,
we should try to avoid using them.
•It is easier to identify explanatory and response
variables when we actually set values of one
variable in order to see how it affects another
variable.
When we don’t set the values of either variable
but just observe both variables, there may or may
not be explanatory and response variables.
Whether there are depends on how we plan to
use the data.
EXAMPLE 2.1
The effects of alcohol
Alcohol has many effects on the body. One effect is
a drop in body temperature. To study this effect,
researchers give several different amounts of
alcohol to mice, then measure the change in each
mouse’s body temperature in the 15 minutes after
taking the alcohol. Amount of alcohol is the
explanatory variable, and change in body
temperature is the response variable.
EXAMPLE 2.2
SAT scores
Jim wants to know the average SAT math and verbal
scores in the 51 states (including the District of
Columbia) are related to each other. He doesn’t
think that either score explains or causes the other.
Jim has two related variables, and neither is an
explanatory variable.
Julie looks at the same data. She asks, “Can I predict
a state’s SAT math score if I know its SAT verbal
score?” Julie is treating the verbal score as the
explanatory variable and the math score as the
response variable.
• Alcohol actually causes a change in body
temperature.
• There is no cause-and –effect relationship
between SAT math and verbal scores.
• Because the scores are closely related, we can
use a state’s SAT verbal score to predict its math
score.
• In section 2.3 we will look at predictions, that
require us to identify an explanatory variable and
a response variable
• Most statistical studies examine data on more
than one variable.
• If we call one variable explanatory and the other
response doesn’t mean that changes in one cause
changes in the other.
Statistical Principles
• First plot the data, then add numerical
summaries.
• Look for overall patterns and deviations from
those patterns.
• When the overall pattern is quite regular, use a
compact mathematical model to describe it.
•The most common way to display the relation
between two quantitative variables is a scatterplot.
•Always plot the explanatory variable, if there is
one, on the horizontal axis (the x axis) of a
scatterplot.
•Usually we call the explanatory variable x and the
response variable y.
SCATTERPLOT
A scatterplot shows the relationship between two
quantitative variables measured on the same
individuals. The values of one variable appear on
the horizontal axis. Each individual in the data
appears as the point in the plot fixed by the
values of both variables for that individual.
EXAMINING A SCATTERPLOT
In any graph of data, look for the overall pattern
and for striking deviations from that pattern.
You can describe the overall pattern of a
scatterplot by the form, direction, and strength of
the relationship.
An important kind of deviation is an outlier, an
individual value that falls outside the overall
pattern of the relationship.
Analysis of SAT/ACT Scatterplots
Figure 2.1 shows a clear form: there are two distinct
clusters of states with a gap between them. In the
cluster at the right of the plot, 45% or more of high
school graduates take the SAT, and the average
scores are low. The states in the cluster at the left
have higher SAT scores and lower percents of
graduates taking the test. There are no clear
outliers. That is, no points fall clearly outside the
clusters.
What explains the clusters? There are two widely
used college entrance exams, the SAT and the
America College Testing (ACT) exam. Each state
favors one or the other. The left cluster in figure 2.1
contains the ACT states, and the SAT states make up
the right cluster. In ACT states, most students who
take the SAT are applying to a selective college that
requires SAT scores. This select group of students
has a higher average score than the much larger
group of students who take the SAT in SAT states.
The relationship in Figure 2.1 also has a clear
direction: states in which a higher percent of
students take the SAT tend to have lower average
scores. This is a negative association between the
two variables.
POSITIVE ASSOCIATION, NEGATIVE ASSOCIATION
Two variables are positively associated when
above-average values of one tend to accompany
above-average values of the other and belowaverage values also tend to occur together.
Two variables are negatively associated when
above-average values of one tend to accompany
below-average values of the other, and vice versa.
The strength of a relationship in a scatterplot is
determined by how closely the points follow a clear
form. The overall relationship in Figure 2.1 is not
strong-states with similar percents taking the SAT
show quite a bit of scatter in their average scores.
Here is an example of a stronger relationship with a
clearer form.
•Not all relationships are linear in form. They can
be nonlinear.
•Not all relationships have a clear direction that we
can describe as a positive association or a negative
association.
Adding categorical variables to scatterplots
Looking at SAT scores, we can add categorical
variables for Southern and Non-southern states,
adds a third variable to the scatterplot. Figure 2.3
uses two different colors to distinguish the
difference.
Use different colors or symbols to plot points
when you want to add a categorical variable to
a scatterplot.
We can use different symbols if two or more
values occupy the same point or represent more
than one individual.
Example 2.5
Is the South different?
Figure 2.3 enhances the scatterplot in Figure 2.1 by
plotting the southern states in purple. (We took the
South to be the states in the East South Central and
South Atlantic regions.) Most of the southern states
blend in with the rest of the country. Several
southern states do lie at the lower edges of their
clusters, along with the District of Columbia, which is
a city rather than a state. Georgia, South Carolina,
and West Virginia have lower SAT scores than we
would expect from the percent of their high school
graduates who take the examination.
A scatterplot displays the form, direction, and
strength of the relationship between two quantitative
variables.
Download