Unit 4 Bivariate Data

Scatter Plots and Regression
What is Bivariate Data?
• Bivariate data are paired measurements of two quantitative variables.
Example: The percentage of people who would vote for a woman president over the last century
Displaying Bivariate Data
• Bivariate data are typically displayed with a scatterplot.
• Scatterplots may be the most common
and most effective display for data.
• Scatterplots are the best way to start
observing the relationship and the ideal
way to picture associations between two
quantitative variables.
• X axis – the explanatory (or predictor)
variable.
• Y axis – the response variable.
Examples: For the following, state the explanatory variable and the response variable.
• Does the amount of homework assigned affect how well students learn?
• Does the weight of a car affect the
miles per gallon that the car gets?
• Are SAT math scores and GPA
related?
• Is there an association between a
person’s speed and the amount of
weight they can squat?
What should you look for in a scatterplot?
• Direction – which way are the
points going?
positive, negative, neither.
• Form - linear, quadratic,
exponential, logarithmic
• Strength – how much scatter
is there in the plot?
Weak, moderate, strong
Example: Peak Period Freeway Speed and cost per person
What is the direction, form, and strength?
[Two further scatterplots: what is the direction, form, and strength of each?]
Outliers
• You can describe the overall
pattern of a scatterplot by
Direction, Form, and Strength
of the relationship.
• An important kind of deviation
is an outlier, an individual
value that falls outside the
overall pattern of the
relationship.
A scatter plot is a picture of the relationship between two quantitative variables. If a linear relationship exists between two variables, the scatter plot will appear as a swarm of points stretched out in a generally consistent manner.
If the relationship
isn’t straight, we
can find ways to
make it more
nearly straight.
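One common way to straighten a curved pattern is to re-express a variable, for example by taking logarithms. The slide does not name a method, so the log re-expression here is an assumption, and the data are made up for illustration. A minimal sketch:

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data with an exponential form: y doubles each time x goes up by 1
xs = list(range(1, 9))
ys = [2 ** x for x in xs]

r_raw = correlation(xs, ys)                          # curved relationship
r_log = correlation(xs, [math.log(y) for y in ys])   # straightened by log(y)

print(f"r for (x, y):      {r_raw:.3f}")
print(f"r for (x, log y):  {r_log:.3f}")
```

After the re-expression the points fall on a straight line, so r moves much closer to 1 than it was for the raw values.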
Correlation Calculation
The correlation coefficient (r) gives us a numerical
measurement of the strength of the linear relationship
between the explanatory and response variables.
$$r = \frac{\sum z_x z_y}{n - 1}$$
The calculator will do the work.
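For readers who want to see what the calculator is doing, the sketch below computes r as the sum of the products of the z-scores of x and y, divided by n - 1. The function name `correlation` and the homework data are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data: hours of homework (explanatory) and quiz score (response)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
print(round(r, 3))  # 0.775
```

Because each value is converted to a z-score first, r has no units and does not change if either variable is rescaled.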
Correlation Conditions
• Correlation measures the strength of the linear
association between two quantitative variables.
Before you use correlation, you must check several
conditions:
– Quantitative Variables Condition – Correlation is only used for quantitative variables.
– Straight Enough Condition – Correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear.
– Outlier Condition – Outliers can distort the correlation. When you see an outlier, it’s often a good idea to report the correlation with and without the point.
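The Outlier Condition can be illustrated by computing r twice, once with the unusual point and once without it. The data below are hypothetical: five points on a perfect line plus one point far from the pattern.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: the last point (6, 0) falls outside the overall pattern
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 0]

r_with = correlation(xs, ys)               # all six points
r_without = correlation(xs[:-1], ys[:-1])  # outlier dropped

print(f"with outlier:    r = {r_with:.3f}")
print(f"without outlier: r = {r_without:.3f}")
```

In this made-up example a single point pulls r from 1 down to about 0.14, which is why reporting both values is useful.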
Correlation
• Correlation describes the
direction and strength of a
linear relationship
• -1 ≤ r ≤ 1
• r > 0 positive linear association
• r = 0 no linear association
• r < 0 negative linear
association
Strength of the relationship
• |𝑟| > 0.8 strong
• 0.5 < |𝑟| < 0.8 moderate
• |𝑟| < 0.5 weak
• Examples
-0.35
0.98
0.67
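The three example values can be described with a small helper. This sketch assumes the cutoffs apply to the absolute value of r (otherwise -0.35 could not be classified), and it assumes that a value exactly at 0.5 or 0.8 counts as moderate, since the slide does not say. The function name `describe` is hypothetical.

```python
def describe(r):
    """Describe direction and strength of a correlation using rule-of-thumb cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")

    # Direction comes from the sign of r
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        return "no linear association"

    # Strength comes from the size of r, ignoring the sign
    size = abs(r)
    if size > 0.8:
        strength = "strong"
    elif size >= 0.5:  # assumption: boundary values count as moderate
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

for r in (-0.35, 0.98, 0.67):
    print(r, "->", describe(r))
```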
Correlation and Causation
• We need to be careful about interpreting correlation coefficients. Just because two variables are highly correlated does not mean that one causes the other.
• In statistical terms we say that correlation does not imply causation.
• Examples:
• The number of ice cream sales and the number of
shark attacks on swimmers are highly correlated.
• The increase in stock prices and the length of
women’s skirts are highly correlated.
• The number of cavities in elementary school
children and vocabulary size have a strong positive
correlation.
Three relationships which can be taken (or mistaken) for causation
• Causation – Changes in X cause changes in Y
• Common response – Both X and Y
respond to some unobserved
variable
• Confounding – The effect of X on Y
is hopelessly mixed up with the
effects of other variables on Y.
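Common response can be sketched with the ice cream and shark attack example. The numbers below are invented for illustration, and daily temperature is assumed to be the unobserved variable that both respond to.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: both X and Y rise on hotter days
temperature = [60, 65, 70, 75, 80, 85, 90]         # assumed lurking variable
ice_cream_sales = [120, 135, 160, 170, 200, 210, 240]  # X
shark_attacks = [1, 1, 2, 3, 3, 4, 5]                  # Y

r_xy = correlation(ice_cream_sales, shark_attacks)
r_tx = correlation(temperature, ice_cream_sales)
r_ty = correlation(temperature, shark_attacks)

print(f"ice cream vs shark attacks:   r = {r_xy:.2f}")
print(f"temperature vs ice cream:     r = {r_tx:.2f}")
print(f"temperature vs shark attacks: r = {r_ty:.2f}")
```

X and Y are strongly correlated here even though neither causes the other; both are responding to temperature.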
Correlation and Causation
• Association, relationship and correlation do not imply causation
• Causation does imply
association, relationship
and correlation.