Describing location in distributions - JuabMath

AP Statistics
Unit 3: Examining Relationships
3.1A Scatterplots & Correlation
Name_________________________
Date____________
Hour____________
1. What is the difference between a response variable and an explanatory variable?
The explanatory variable is your x variable. It is the variable that you are using to predict your y or
response variable.
2. How are response and explanatory variables related to dependent and independent variables?
The response variable is the dependent variable and the explanatory variable is the independent
variable: your prediction for the response depends on the value of the explanatory variable.
3. When is it okay to use a scatterplot to display data?
When you have two quantitative variables measured on the same individuals.
4. You can describe the overall pattern of a scatterplot by the ….
Strength, Direction, and Form
5. Which variable always appears on the horizontal axis of a scatterplot?
Explanatory variable
6. Explain the difference between a positive association and a negative association.
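A positive association has a slope that is positive: as the explanatory variable increases, so does the
response variable. A negative association has a slope that is negative: as the explanatory variable
increases, the response variable decreases.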
7. How can quantitative data which belongs to different categories be differentiated on a scatterplot?
Use different colors or symbols for the points.
8. What does correlation measure?
Correlation measures the direction and strength of the linear relationship between two quantitative
variables, that is, how well the data can be modeled by a line.
9. Explain why two variables must both be quantitative in order to find the correlation between
them.
Because the formula requires a standard deviation and mean for both variables, and categorical data
does not have a measure of spread and center.
10. What is true about the relationship between two variables if the correlation is:
a) Near 0? Little or no linear relationship
b) Near 1? The relationship is positive and could almost perfectly be modeled by a line.
c) Near -1? The relationship is negative and could almost perfectly be modeled by a line.
d) Exactly 1? All the points would fall exactly on a line with positive slope
e) Exactly -1? All the points would fall exactly on a line with negative slope
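The cases in parts d) and e) can be checked numerically. The sketch below is only an illustration: the helper name `correlation` and the two sets of points are made up for this example, and the helper uses the standardized-value formula for r. Points that fall exactly on a line with positive slope should give r equal to 1, and points exactly on a line with negative slope should give r equal to -1, up to floating-point rounding.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation: average product of standardized values, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: every point lies exactly on a line.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # line with positive slope
r_down = correlation(xs, [10 - 3 * x for x in xs])  # line with negative slope

print(round(r_up, 6), round(r_down, 6))
```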
11. Is correlation resistant to extreme observations?
No.
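A small numerical sketch shows why. The data below are invented for illustration only (they are not from this worksheet), and `correlation` is a hypothetical helper based on the standardized-value formula for r. Five points that lie close to a line have r near 1, and adding a single extreme observation pulls r down sharply.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation: average product of standardized values, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: five points close to a line with positive slope.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
r_without = correlation(xs, ys)

# Same data plus one extreme observation far below the pattern.
r_with = correlation(xs + [6], ys + [0.0])

print("without outlier:", round(r_without, 3))
print("with outlier:   ", round(r_with, 3))
```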
12. What does it mean if two variables have high correlation?
The explanatory variable does a very good job of predicting what the response variable would be. The
relationship follows a linear model very well, so we can use a linear equation to predict the
response variable from a given value of the explanatory variable.
13. What does it mean if two variables have weak correlation?
A linear model does not do a very good job of explaining the relationship between the two variables.
14. What does it mean if two variables have no correlation?
A linear model would be useless for describing the relationship.
15. Are hot dogs that are high in calories also high in salt? The scatterplot below shows
calories and salt content (measured as milligrams of sodium) in 17 brands of meat hot
dogs.
a) Roughly what are the highest and lowest calorie counts among these brands? Roughly what is
the sodium level in the brands with the fewest and with the most calories?
b) Does the scatterplot show a clear positive or negative association? Say in words what this
association means about calories and salt in hot dogs.
c) Are there any outliers? Is the relationship (ignoring any outliers) roughly linear in form? Still
ignoring any outliers, how strong would you say the relationship between calories and sodium
is?
(a) The lowest calorie count is about 107 calories
and the sodium level for this brand is about 145
mg. The highest calorie count is about 195
calories, and the sodium level for this brand is
about 510 mg. (b) The scatterplot shows positive
association; high-calorie hot dogs tend to be high
in salt, and low-calorie hot dogs tend to have low
sodium. (c) The lower left point is an outlier.
Ignoring this point, the relationship is linear and
moderately strong.
16. The figure below is a scatterplot of school grade point average versus IQ score for 78
seventh-grade students.
a) Is the correlation r for these data near -1, clearly negative but not near -1, near 0, clearly positive
but not near 1, or near 1? Explain your answer.
b) Refer to the figure in the last problem. Is the correlation in the scatterplot below closer to 1 than
that for the scatterplot in the last problem, or closer to 0? Explain your answer.
c) Both scatterplots contain outliers. Removing the outliers will increase the correlation r in one
figure and decrease r in the other figure. What happens in each figure and why?
(a) The correlation r is clearly positive but not near 1.
The scatterplot shows that students with high IQs tend
to have high grade point averages, but there is more
variation in the grade point averages for students with
moderate IQs. (b) The correlation r for the data in
Figure 3.8 would be closer to one. The overall positive
relationship between calories and sodium is stronger
than the positive association between IQs and GPAs.
(c) The outliers with moderate IQ scores in Figure 3.4
weaken the positive relationship between IQ and GPA,
so removing them would increase r. The outlier in the lower left corner of Figure
3.8 strengthens the positive, linear relationship between calories and sodium, so
removing this outlier would decrease r.
17. The gas mileage of an automobile first increases and then decreases as the speed increases.
Suppose that this relationship is very regular, as shown by the following data on speed
(miles per hour) and mileage (miles per gallon):
Speed (mph):    20  30  40  50  60
Mileage (mpg):  24  28  30  28  24
a) Make a scatterplot of mileage versus speed.
b) Show that the correlation between speed and mileage is r = 0. Use the formula
$$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
Explain why the correlation is 0 even though there is a strong relationship between speed and
mileage.
(b) The speeds have a mean of 40 and a
standard deviation of 15.81. The mileages
have a mean of 26.8 mpg and a standard
deviation of 2.68 mpg. The table below
shows the standardized values (labeled
zspeed and zmpg) obtained by subtracting the
mean and dividing by the standard deviation. The column labeled “product”
contains the product (zspeed×zmpg) of the standardized measurements. The sum of
the products is 0.0, so the correlation coefficient is also 0.0. The correlation
coefficient r measures the strength of linear association between two quantitative
variables; this plot shows a nonlinear relationship between speed and mileage.
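The calculation in (b) can be reproduced with a short script. This is a minimal sketch using the speed and mileage data from the problem; the variable names are chosen to match the labels zspeed and zmpg above, and the printed column layout is only one possible arrangement. It standardizes each value, multiplies the pairs, and divides the sum of the products by n - 1.

```python
import statistics

speed = [20, 30, 40, 50, 60]    # miles per hour
mileage = [24, 28, 30, 28, 24]  # miles per gallon

# Mean and sample standard deviation of each variable.
x_bar, s_x = statistics.mean(speed), statistics.stdev(speed)
y_bar, s_y = statistics.mean(mileage), statistics.stdev(mileage)

# Standardized values and their products.
z_speed = [(x - x_bar) / s_x for x in speed]
z_mpg = [(y - y_bar) / s_y for y in mileage]
products = [zx * zy for zx, zy in zip(z_speed, z_mpg)]

# r is the sum of the products divided by n - 1.
r = sum(products) / (len(speed) - 1)

print("speed mileage   zspeed     zmpg  product")
for row in zip(speed, mileage, z_speed, z_mpg, products):
    print("{:>5} {:>7} {:>8.3f} {:>8.3f} {:>8.3f}".format(*row))
print("r is zero up to rounding:", abs(r) < 1e-9)
```

The positive products from the two fastest-rising points are cancelled by the negative products on the other side of the peak, which is why r is 0 even though the curved pattern is very regular.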