Math 2B/3A
z-scores, Outliers and Two-Way Tables
Name:________________________________
5.22.14

Suppose that a college admissions office needs to compare scores of students who take the Scholastic Aptitude Test (SAT) with those who take the American College Test (ACT). Suppose that among the college’s applicants who take the SAT, scores have a mean of 1440 and a standard deviation of 261. Further suppose that among the college’s applicants who take the ACT, scores have a mean of 20.6 and a standard deviation of 5.2.

1. If applicant Bobby scored 1620 on the SAT, how many points above the SAT mean did he score?
2. If applicant Kathy scored 28 on the ACT, how many points above the ACT mean did she score?
3. Is it sensible to conclude that since your answer to (1) is greater than your answer to (2), Bobby outperformed Kathy on the admissions test? Explain.
4. Determine how many standard deviations above the mean Bobby scored by dividing your answer to (1) by the standard deviation of the SAT scores.
5. Determine how many standard deviations above the mean Kathy scored by dividing your answer to (2) by the standard deviation of the ACT scores.

This activity illustrates the use of standard deviation to make comparisons of individual values from different distributions. One calculates a z-score, or standardized score, by subtracting the mean from the value of interest and then dividing by the standard deviation. These z-scores indicate how many standard deviations above (or below) the mean a particular value falls. One should use z-scores only when working with mound-shaped distributions, however.

6. Which applicant has the higher z-score for his or her admissions test score?
7. Explain in your own words which applicant performed better on his or her admissions test.
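The standardization described above can be sketched in Python. This is a minimal illustration, not part of the original worksheet: the function name `z_score` and the two applicant scores (1701 on the SAT, 25.8 on the ACT) are hypothetical values chosen for the example, while the means and standard deviations are the ones given for the college's applicants.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies above (+) or below (-) `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# Means and standard deviations from the worksheet's scenario.
SAT_MEAN, SAT_SD = 1440, 261
ACT_MEAN, ACT_SD = 20.6, 5.2

# Hypothetical applicant scores (assumed for illustration, not from the worksheet).
sat_z = z_score(1701, SAT_MEAN, SAT_SD)
act_z = z_score(25.8, ACT_MEAN, ACT_SD)

print(f"SAT 1701 is {1701 - SAT_MEAN} points above the mean, z = {sat_z:.2f}")
print(f"ACT 25.8 is {25.8 - ACT_MEAN:.1f} points above the mean, z = {act_z:.2f}")
```

Both hypothetical scores sit about one standard deviation above their means even though the raw gaps (261 points versus 5.2 points) look very different, which is why raw point differences are not comparable across the two tests.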
Calculating the z-score allows you to compare numbers that are measured on different scales by measuring how far each one is from its mean, relative to the other data measured on that scale.

$$z\text{-score} = \frac{x - \bar{x}}{s}$$

8. Calculate the z-score for applicant Peter, who scored 1110 on the SAT, and for applicant Susan, who scored 19 on the ACT.
9. Which of Peter and Susan has the higher z-score?
10. Under what conditions does a z-score turn out to be negative?
11. We collected data about hours of sleep that students had the night before. (The numbers in parentheses are the number of people who slept that many hours.) Calculate the z-scores for 3, 6 and 10 hours of sleep.

    Hours of sleep
    (1)   3
    (2)   5
    (7)   6
    (5)   7
    (12)  8
    (2)   9
    (1)  10

12. Make a dot plot of the data and then above it, put the box-and-whisker plot of the data.

Primarily from Workshop Statistics by Rossman, Chance and Von Oehsen

To determine whether a data value is an outlier, first we need to calculate the interquartile range (IQR). This is the range from Q1 (the median of the first half of the data) to Q3 (the median of the second half of the data). Visually, it is the length of the box in your box-and-whisker plot. For the sleep data, the IQR = Q3 - Q1 = 8 - 6 = 2. The length of your box should be 2. An outlier is any point more than (1.5)(IQR) past the edges of the box; in this case (1.5)(2) = 3. On the right-hand side, any point that is more than 3 past Q3 (greater than 11) is an outlier. On the left-hand side, any point that is more than 3 before Q1 (less than 3) is an outlier. The minimum data point of 3 just barely avoids being an outlier.

13. For the data below, determine what lengths would be considered outliers for a signature. Are any numbers in the table outliers?

    Name (mm)
    (1)  20
    (1)  30
    (1)  34
    (1)  36
    (1)  38
    (3)  40
    (2)  45
    (1)  46
    (1)  47
    (2)  50
    (1)  51
    (1)  65
    (1)  69
    (1)  75
    (1)  80
    (1)  85

When a data point is an outlier, we must consider why that data point may have come about. For example, when we consider the weights of the rowers in the eight man scull and the weight of the coxswain, we can see that the coxswain’s weight is an outlier. We know why he is so light.

14. Calculate how much the coxswain would have to weigh to not be considered an outlier.

    name         weight   event
    Brown        214      eight
    Burden       195      eight
    Collins, P   195      eight
    Honebein     200      eight
    Kaehler      210      eight
    Koven        200      eight
    Murphy       220      eight
    Segaloff     121      coxswain
    Smith        207      eight

Usually we don’t include outliers in our analysis because we consider them anomalies.

Two-Way Tables

15. In a national survey of adult Americans in 1998, people were asked to indicate their age and to classify their interest in politics as very much, somewhat, or not much. While age is typically a quantitative variable, it was categorized into three groups for this analysis: 18–35; 36–55; and 56–94 (94 being the age of the oldest subject in the survey). The results are summarized in the following frequency table; notice that the row and column totals are also provided:

             Not much   Somewhat   Very much   Total
    18–35       146        192         47        385
    36–55       146        260        125        531
    56–94        89        154        106        349
    Total       381        606        278       1265

a. What proportion of the survey respondents were between the ages of 18 and 35?
b. What proportion of the survey respondents were between the ages of 36 and 55?
c. What proportion of the survey respondents were over the age of 55?

You have just calculated the marginal distribution of the age variable. When analyzing two-way tables, one typically starts by considering the marginal distribution of each variable by itself before moving on to explore possible relationships between the two variables.

d. Calculate the marginal distribution of the interest variable.

To study possible relationships between two categorical variables, one examines conditional distributions, i.e.
distributions of one variable for given categories of the other variable.

e. Restrict your attention (for the moment) to just the respondents aged 18–35 (the condition of being a young respondent). What proportion of these young respondents classify themselves as having not much interest in politics?
f. What proportion of the young respondents classify themselves as somewhat interested in politics?
g. What proportion of the young respondents classify themselves as very much interested in politics?

Conditional distributions can be represented visually with segmented bar graphs. The rectangles in a segmented bar graph all have a height of 100%, but they contain segments whose lengths correspond to the conditional proportions.

h. Complete the segmented bar graph below by using the percentages that you found above to shade the 18–35 category, constructing the conditional distribution of political interest among those aged 18–35.
i. Write a few sentences commenting on whether there seems to be any relationship between age and political interest. In other words, does the distribution of political interest seem to differ among the three age groups?

In dealing with conditional proportions, it is very important to keep straight which category is the one being conditioned on. For example, the proportion of American males who are U.S. Senators is very small (most men are not senators) but the proportion of U.S. Senators who are American males is very large (most senators are men). Refer to the original table of data in question 15 to answer the following:

j. What proportion of respondents aged 36–55 classified themselves as not much interested in politics?
k. What proportion of those with not much interest in politics are of age 36–55?
l. What proportion of the people surveyed identified themselves as being both between the ages of 36 and 55 and having not much political interest?
m. Now, make a segmented bar chart to represent the data based on the interest condition (don’t forget to make a key for the meaning of the different shadings):
n. What differences do you notice between this segmented bar graph and the other segmented bar graph that you finished above?
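The marginal and conditional distributions explored in question 15 can also be computed programmatically. The sketch below is an illustration, not part of the original worksheet: it stores the survey counts from the table in a plain Python dictionary, and the helper names `marginal_distribution`, `conditional_distribution` and `transpose` are hypothetical. It computes proportions within an age group and within an interest level separately, to show why the direction of conditioning matters.

```python
# Survey counts from the worksheet's two-way table: age group -> interest level -> count.
counts = {
    "18-35": {"Not much": 146, "Somewhat": 192, "Very much": 47},
    "36-55": {"Not much": 146, "Somewhat": 260, "Very much": 125},
    "56-94": {"Not much": 89, "Somewhat": 154, "Very much": 106},
}


def marginal_distribution(table):
    """Proportion of all respondents falling in each row category of `table`."""
    total = sum(sum(row.values()) for row in table.values())
    return {label: sum(row.values()) / total for label, row in table.items()}


def conditional_distribution(table, given):
    """Distribution of the column variable among respondents in row `given`."""
    row = table[given]
    row_total = sum(row.values())
    return {label: n / row_total for label, n in row.items()}


def transpose(table):
    """Swap rows and columns so the other variable can be conditioned on."""
    flipped = {}
    for row_label, row in table.items():
        for col_label, n in row.items():
            flipped.setdefault(col_label, {})[row_label] = n
    return flipped


age_marginal = marginal_distribution(counts)
interest_marginal = marginal_distribution(transpose(counts))

# Conditioning on age: interest levels among respondents aged 36-55.
interest_given_age = conditional_distribution(counts, "36-55")
# Conditioning on interest: age groups among the "Not much" respondents.
age_given_interest = conditional_distribution(transpose(counts), "Not much")

for name, dist in [
    ("Age (marginal)", age_marginal),
    ("Interest (marginal)", interest_marginal),
    ("Interest given age 36-55", interest_given_age),
    ("Age given 'Not much' interest", age_given_interest),
]:
    print(name, {k: round(v, 3) for k, v in dist.items()})
```

The two conditional results that involve the same cell of the table differ because one divides by a row total and the other by a column total, which is the distinction the senator example makes.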