Math 2B/3A
z-scores, Outliers and Two-Way Tables
Name:________________________________
5.22.14
Suppose that a college admissions office needs to compare scores of students who take the
Scholastic Aptitude Test (SAT) with those who take the American College Test (ACT). Suppose
that among the college’s applicants who take the SAT, scores have a mean of 1440 and a
standard deviation of 261. Further suppose that among the college’s applicants who take the
ACT, scores have a mean of 20.6 and a standard deviation of 5.2.
1. If applicant Bobby scored 1620 on the SAT, how many points above the SAT mean did he
score?
2. If applicant Kathy scored 28 on the ACT, how many points above the ACT mean did she
score?
3. Is it sensible to conclude that since your answer to (1) is greater than your answer to (2),
Bobby outperformed Kathy on the admissions test? Explain.
4. Determine how many standard deviations above the mean Bobby scored by dividing your
answer to (1) by the standard deviation of the SAT scores.
5. Determine how many standard deviations above the mean Kathy scored by dividing your
answer to (2) by the standard deviation of the ACT scores.
This activity illustrates the use of standard deviation to make comparisons of individual values
from different distributions. One calculates a z-score, or standardized score, by subtracting the
mean from the value of interest and then dividing by the standard deviation. These z-scores
indicate how many standard deviations above (or below) the mean a particular value falls. One
should use z-scores only when working with mound-shaped distributions, however.
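The rule in the paragraph above translates directly into a few lines of Python. This is a minimal sketch. The helper name `z_score` and the constant names are our own. The only inputs are the means and standard deviations stated at the start of the activity.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Summary statistics given in the worksheet for this college's applicants.
SAT_MEAN, SAT_SD = 1440, 261
ACT_MEAN, ACT_SD = 20.6, 5.2

bobby = z_score(1620, SAT_MEAN, SAT_SD)  # Bobby's SAT score
kathy = z_score(28, ACT_MEAN, ACT_SD)    # Kathy's ACT score

print(f"Bobby: z = {bobby:.2f}")
print(f"Kathy: z = {kathy:.2f}")
```

Running it prints Bobby's z-score as about 0.69 and Kathy's as about 1.42. Once each test's spread is taken into account, Kathy's score sits further above her test's mean.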
6. Which applicant has the higher z-score for his or her admissions test score?
7. Explain in your own words which applicant performed better on his or her admissions test.
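Calculating the z-score lets you compare values that are measured on different scales. It expresses each value's distance from its own mean in units of that scale's standard deviation.

$$\text{z-score} = \frac{x - \bar{x}}{s}$$

where $x$ is the value of interest, $\bar{x}$ is the mean and $s$ is the standard deviation.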
8. Calculate the z-score for applicant Peter, who scored 1110 on the SAT, and for applicant
Susan, who scored 19 on the ACT.
9. Which of Peter and Susan has the higher z-score?
10. Under what conditions does a z-score turn out to be negative?
11. We collected data about the hours of sleep that students had the night before. The table
below gives each number of hours and how many people slept that many hours.
Calculate the z-scores for 3, 6 and 10 hours of sleep.

Hours of sleep    Number of people
3                 1
5                 2
6                 7
7                 5
8                 12
9                 2
10                1

12. Make a dot plot of the data and then, above it, draw the box-and-whisker plot of the data.
Primarily from Workshop Statistics by Rossman, Chance and Von Oehsen
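For question 11, the frequency table has to be expanded to one value per person before computing the mean and standard deviation. The sketch below assumes the worksheet's s is the sample standard deviation, which divides by n − 1 as `statistics.stdev` does. If your class uses the population standard deviation, swap in `statistics.pstdev`.

```python
import statistics

# Sleep data from question 11: hours of sleep -> number of people.
sleep_counts = {3: 1, 5: 2, 6: 7, 7: 5, 8: 12, 9: 2, 10: 1}

# Expand the frequency table into one value per person (30 values in all).
hours = [h for h, count in sleep_counts.items() for _ in range(count)]

mean = statistics.mean(hours)
sd = statistics.stdev(hours)  # sample standard deviation (n - 1 in the denominator)

for x in (3, 6, 10):
    z = (x - mean) / sd
    print(f"{x:>2} hours: z = {z:+.2f}")
```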
To determine whether a data value is an outlier, we first calculate the interquartile range
(IQR). This is the range from Q1 (the median of the first half of the data) to Q3 (the median of
the second half of the data). Visually, it is the length of the box in your box-and-whisker plot.
For the sleep data, IQR = Q3 – Q1 = 8 – 6 = 2, so the length of your box should be 2. An outlier
is any point that lies more than (1.5)(IQR) beyond the edges of the box; in this case
(1.5)(2) = 3. On the right-hand side, any point more than 3 above Q3 (greater than 11) is an
outlier. On the left-hand side, any point more than 3 below Q1 (less than 3) is an outlier. The
minimum data point of 3 just barely avoids being an outlier.
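The quartile and fence steps above can be sketched in Python. The sketch uses the "median of each half" rule described in the text. For an odd number of values, it leaves the overall median out of both halves. That convention is an assumption, since some textbooks and calculators handle odd counts differently. The function names `quartiles` and `outlier_fences` are hypothetical helpers, not part of any library.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.

    Assumed convention: when the number of values is odd, the overall median
    is excluded from both halves. Needs at least two values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[len(values) - half:]
    return median(lower), median(upper)

def outlier_fences(data, k=1.5):
    """Return (low, high); values below low or above high count as outliers."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Sleep data: hours of sleep -> number of people, expanded to one value each.
sleep_counts = {3: 1, 5: 2, 6: 7, 7: 5, 8: 12, 9: 2, 10: 1}
hours = [h for h, count in sleep_counts.items() for _ in range(count)]

q1, q3 = quartiles(hours)
low, high = outlier_fences(hours)
outliers = [x for x in hours if x < low or x > high]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
print(f"Outliers are below {low} or above {high}: {outliers}")
```

For the sleep data this reproduces the values in the text: Q1 = 6, Q3 = 8, fences at 3 and 11, and no outliers.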
13. For the signature-length data below, determine which lengths would be considered outliers.
Are any of the values in the table outliers?

Signature length (mm)    Count
20                       1
30                       1
34                       1
36                       1
38                       1
40                       3
45                       2
46                       1
47                       1
50                       2
51                       1
65                       1
69                       1
75                       1
80                       1
85                       1

When a data point is an outlier, we must consider why that data point may have come about. For
example, when we compare the weights of the rowers in an eight-man rowing crew with the weight
of the coxswain, we can see that the coxswain's weight is an outlier. We know why he is so light:
the coxswain steers the boat rather than rowing, so crews prefer a light coxswain.
14. Calculate how much the coxswain would have to weigh to not be considered an outlier.
Name          Weight (lb)    Event
Brown         214            eight
Burden        195            eight
Collins, P    195            eight
Honebein      200            eight
Kaehler       210            eight
Koven         200            eight
Murphy        220            eight
Segaloff      121            coxswain
Smith         207            eight
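Question 14 can be checked in two steps. First compute the lower fence for the crew. Then confirm it by recomputing the quartiles with a trial coxswain weight, since changing his weight could in principle move Q1. This sketch assumes the weights are in pounds and uses the same median-of-halves convention, with the overall median excluded for an odd count. `coxswain_is_outlier` is a hypothetical helper name.

```python
from statistics import median

def quartiles(data):
    """Medians of the lower and upper halves (overall median left out when n is odd)."""
    values = sorted(data)
    half = len(values) // 2
    return median(values[:half]), median(values[len(values) - half:])

def coxswain_is_outlier(rowers, cox_weight):
    """Recompute the quartiles with this coxswain weight and apply the 1.5 IQR rule."""
    q1, q3 = quartiles(rowers + [cox_weight])
    iqr = q3 - q1
    return cox_weight < q1 - 1.5 * iqr or cox_weight > q3 + 1.5 * iqr

rowers = [214, 195, 195, 200, 210, 200, 220, 207]  # the eight rowers
cox = 121                                         # Segaloff, the coxswain

q1, q3 = quartiles(rowers + [cox])
lower_fence = q1 - 1.5 * (q3 - q1)
print(f"Q1 = {q1}, Q3 = {q3}, lower fence = {lower_fence}")

# Any coxswain weight up to 195 is still the smallest value (or tied for it),
# so Q1 and Q3 do not change; recomputing confirms where the cutoff falls.
print(coxswain_is_outlier(rowers, 169.5), coxswain_is_outlier(rowers, 169))
```

Under these assumptions, the lower fence is 195 − 1.5(212 − 195) = 169.5. Any coxswain weight up to 195 leaves Q1 and Q3 unchanged, so he would need to weigh at least 169.5 lb to avoid being an outlier.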
Usually we leave an outlier out of our analysis when, as with the coxswain, we can explain why it is an anomaly that does not belong with the rest of the data.
Two-Way Tables
15. In a national survey of adult Americans in 1998, people were asked to indicate their age and
to classify their interest in politics as very much, somewhat, or not much. While age is
typically a quantitative variable, it was categorized into three groups for this analysis:
18–35; 36–55; and 56–94 (the oldest subject in the survey). The results are summarized in
the following frequency table; notice that the row and column totals are also provided:
Age       Not much    Somewhat    Very much    Total
18–35     146         192         47           385
36–55     146         260         125          531
56–94     89          154         106          349
Total     381         606         278          1265
a. What proportion of the survey respondents were between the ages of 18 and 35?
b. What proportion of the survey respondents were between the ages of 36 and 55?
c. What proportion of the survey respondents were over the age of 55?
You have just calculated the marginal distribution of the age variable. When analyzing two-way
tables, one typically starts by considering the marginal distribution of each variable by itself
before moving on to explore possible relationships between the two variables.
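The marginal distribution of age divides each row total by the grand total. Below is a minimal Python sketch. Storing the table as a nested dictionary is our own choice of representation, and the age labels use ASCII hyphens.

```python
# Counts from the 1998 survey table: age group -> interest level -> count.
counts = {
    "18-35": {"Not much": 146, "Somewhat": 192, "Very much": 47},
    "36-55": {"Not much": 146, "Somewhat": 260, "Very much": 125},
    "56-94": {"Not much": 89, "Somewhat": 154, "Very much": 106},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of age: each row total divided by the grand total.
age_marginal = {age: sum(row.values()) / grand_total for age, row in counts.items()}

for age, p in age_marginal.items():
    print(f"{age}: {p:.3f}")
```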
d. Calculate the marginal distribution of the interest variable.
To study possible relationships between two categorical variables, one examines conditional
distributions, i.e. distributions of one variable for given categories of the other variable.
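A conditional distribution divides each count by the total of the category being conditioned on. This sketch conditions on age. It divides every count in a row by that row's total, so each age group's proportions add up to 1. The helper name `conditional_given_row` is hypothetical.

```python
# Counts from the 1998 survey table: age group -> interest level -> count.
counts = {
    "18-35": {"Not much": 146, "Somewhat": 192, "Very much": 47},
    "36-55": {"Not much": 146, "Somewhat": 260, "Very much": 125},
    "56-94": {"Not much": 89, "Somewhat": 154, "Very much": 106},
}

def conditional_given_row(table):
    """Divide each count by its row total, giving one distribution per row."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: n / row_total for col, n in row.items()}
    return result

interest_given_age = conditional_given_row(counts)
for age, dist in interest_given_age.items():
    print(age, {level: round(p, 3) for level, p in dist.items()})
```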
e. Restrict your attention (for the moment) to just the respondents aged 18–35 (the
condition of being a young respondent). What proportion of these young
respondents classify themselves as having not much interest in politics?
f. What proportion of the young respondents classify themselves as somewhat
interested in politics?
g. What proportion of the young respondents classify themselves as very much
interested in politics?
Conditional distributions can be represented visually with segmented bar graphs. The rectangles
in a segmented bar graph all have a height of 100%, but they contain segments whose lengths
correspond to the conditional proportions.
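A text-only sketch can show the idea of a segmented bar graph. Each age group is drawn as a 50-character bar, standing for 100%, split into segments proportional to its conditional distribution. This is an illustrative stand-in, not the worksheet's graph: the bars run horizontally, and the width and letter symbols are arbitrary choices. Rounding the cumulative boundaries, rather than each segment separately, keeps every bar exactly 50 characters long.

```python
# Counts from the 1998 survey table: age group -> interest level -> count.
counts = {
    "18-35": {"Not much": 146, "Somewhat": 192, "Very much": 47},
    "36-55": {"Not much": 146, "Somewhat": 260, "Very much": 125},
    "56-94": {"Not much": 89, "Somewhat": 154, "Very much": 106},
}

WIDTH = 50  # characters representing 100%
SYMBOLS = {"Not much": "N", "Somewhat": "S", "Very much": "V"}

def segmented_bar(row, width=WIDTH):
    """Text bar of exactly `width` characters, segmented by conditional proportion."""
    total = sum(row.values())
    bar = ""
    cumulative = 0
    for level, n in row.items():
        cumulative += n
        # Round the running boundary so the segment lengths always sum to `width`.
        end = round(width * cumulative / total)
        bar += SYMBOLS[level] * (end - len(bar))
    return bar

for age, row in counts.items():
    print(f"{age} |{segmented_bar(row)}|")
print("Key: N = not much, S = somewhat, V = very much")
```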
h. Using the percentages that you found in (e)–(g), shade the 18–35 bar of the segmented bar
graph below to show the conditional distribution of political interest among those aged
18–35.
i. Write a few sentences commenting on whether there seems to be any relationship
between age and political interest. In other words, does the distribution of political
interest seem to differ among the three age groups?
In dealing with conditional proportions, it is very important to keep straight which category is the
one being conditioned on. For example, the proportion of American males who are U.S.
Senators is very small (most men are not senators) but the proportion of U.S. Senators who are
American males is very large (most senators are men).
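To see how much the choice of conditioning category matters, the sketch below divides the same cell by three different totals. The cell is respondents aged 36–55 with not much interest, 146 people. It is divided by the row total, the column total and the grand total. The variable names are our own.

```python
# Counts from the 1998 survey table: age group -> interest level -> count.
counts = {
    "18-35": {"Not much": 146, "Somewhat": 192, "Very much": 47},
    "36-55": {"Not much": 146, "Somewhat": 260, "Very much": 125},
    "56-94": {"Not much": 89, "Somewhat": 154, "Very much": 106},
}

cell = counts["36-55"]["Not much"]                             # 146
row_total = sum(counts["36-55"].values())                      # 531
col_total = sum(row["Not much"] for row in counts.values())    # 381
grand_total = sum(sum(row.values()) for row in counts.values())  # 1265

print(f"P(not much | 36-55)   = {cell}/{row_total} = {cell / row_total:.3f}")
print(f"P(36-55 | not much)   = {cell}/{col_total} = {cell / col_total:.3f}")
print(f"P(36-55 and not much) = {cell}/{grand_total} = {cell / grand_total:.3f}")
```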
Refer to the original table of data in Question 15 to answer the following:
j. What proportion of respondents aged 36–55 classified themselves as not much
interested in politics?
k. What proportion of those with not much interest in politics are aged 36–55?
l. What proportion of the people surveyed identified themselves as being both
aged 36–55 and having not much political interest?
m. Now make a segmented bar graph that conditions on interest level, with one bar for each
interest category showing the age distribution within it (don't forget to make a key for the
meaning of the different shadings):
n. What differences do you notice between this segmented bar graph and the other
segmented bar graph that you finished above?