AP Statistics – Chapter 4 Topics

Textbook Assignments:
a. 1, 3, 6, 9, 11, 12
b. 17, 19, 22, 23, 26, 28
c. 29, 32, 33, 35
d. 37, 39, 42, 43, 44, 50 (38 in class)

Objectives for chapter 4:
1. Find measures of central tendency.
2. Combine means from multiple samples to get the mean for the combined sample.
3. Find the trimmed mean.
4. Know whether the mean or median is the better measure of center for a given situation.
5. Know what resistance and sensitivity mean.
6. Find the standard deviation and the variance and interpret these values.
7. Determine how “spread out” a distribution is using the mean and standard deviation.
8. Empirical Rule – Use it to find percentiles and percentages between values of a normal distribution.
9. Chebyshev’s Formula – Use it to estimate percentages between values of any distribution.
10. Know the requirements for using the Empirical Rule.
11. Know what values are considered unusually high or unusually low for normally distributed data.
12. Create a boxplot and a modified boxplot.
13. Know what the numbers in the 5 number summary mean and interpret them.
14. Draw comparative boxplots and write comparisons between two sets of data.
15. Know the IQR rule for identifying outliers and use it.
16. Know which plots are good for numerical univariate data vs. categorical univariate data vs. numerical bivariate data.
17. Know the relationship between quartiles, percentiles, and the Empirical Rule.
18. Given a set of data, find approximate percentiles.
19. Don’t forget C.U.S.S.

Exploratory Data Analysis: Using Descriptive Statistics and Graphs to Draw Conclusions About Data and Make Comparisons Between Multiple Data Sets

Measures of Central Tendency
Mean – sample mean vs. population mean (x̄ vs. μ)
Median – sample median vs. population median
Mode
Midrange and Trimmed Mean

Measures of Spread
Standard deviation is the square root of the variance. (s and s² vs. σ and σ²)
Range

Measures of Position
Quartiles
Minimum and Maximum
Z-scores

City Temperatures
Average monthly temperatures for Raleigh and San Francisco

               Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Raleigh         39   42   50   59   67   74   78   77   71   60   51   43
San Francisco   49   52   53   56   58   62   63   64   65   61   55   49

                     Raleigh    San Francisco
Mean                 _______    _______
Median               _______    _______
Standard Deviation   _______    _______
Minimum              _______    _______
Q1                   _______    _______
Q3                   _______    _______
Maximum              _______    _______

Which descriptive statistic tells the most about how these two sets of data compare? What conclusions could be drawn?

Five Number Summaries and the Box-and-Whisker Plot:
A quick way to compare center, spread, and distribution between multiple sets of data, the Five Number Summary consists of:
•Minimum (Min)
•Quartile 1 (Q1)
•Median (Med) “Quartile 2”
•Quartile 3 (Q3)
•Maximum (Max)

Behold! The Box-and-Whisker Plot
[Figure: box-and-whisker plot on an axis from 30 to 70, labeled Min, Q1, Med, Q3, and Max, with each of the four sections marked 25%. Values printed with the plot: 30, 38, 40, 42, 45, 54, 62.]

Word Lengths
(Adapted from Workshop Statistics: Discovery with Data; Rossman and Chance)
The first sentence of the first chapter of one of my favorite books reads, “You are starting the study of one of the most interesting and useful branches of mathematics.” The following table lists the lengths of the 16 words used in that sentence.

3  3  3  4  8  11  3  3  5  6  2  8  3  2  2  11

Compare the distribution of word lengths in the first sentence of my favorite book with the distribution of word lengths in a sentence or two from one of your favorite writers. Write a paragraph describing your findings. Use appropriate descriptive statistics and graphical displays.

Z-scores

z = (x − x̄) / s

A z-score measures the number of standard deviations between a value in a set of data and the mean. A positive z-score indicates the value is above the mean; a negative z-score indicates it is below.
In the z-score formula:
x = your score
x̄ = sample mean score
s = sample standard deviation

For San Francisco, the average temperature for June, 62, has a z-score of

z = (62 − 57.25) / 5.75 ≈ 0.826

This indicates that the average temperature for June was above the average temperature for the year by 0.826 standard deviations.

Empirical Rule
A data set is approximately normal if its distribution is roughly bell-shaped and symmetrical, or if its boxplot is roughly symmetrical with relatively short whiskers in comparison to the size of the box (long whiskers in relation to the box indicate outliers). If a data set is approximately normal, about 95% of the data will be within 2 standard deviations of the mean. In this case, z-scores greater than 2 or less than -2 are associated with values of x that are considered unusual.

Commuting Times
(Adapted from Introduction to Statistics and Data Analysis; Peck, Olsen, Devore)
A teacher, who lives a few miles outside of Mustang, records the time he takes to drive to the school each morning. Given below are the commute times for 20 mornings.

7.92   8.42   8      7.75   8.08   7.42   8.42   6.75   8.75   7.42
8.08   8.5    9.75   8.67   8.33   10.17  7.83   7.83   7.92   8.58

a) Give the five number summary for this data.
b) Draw the boxplot for this data set. Is it roughly symmetric? Are any outliers indicated?
c) The three unusual observations can be explained. The low time was a day the seniors were not on campus (less traffic), and the two high times reflect days when the highway was being resurfaced. Remove these values and recalculate the mean and standard deviation.
d) Based on this sample, would a commute time of 8.5 minutes be considered unusual for a typical day? What about a commute of 7 minutes?

Who did better?
Z-scores allow us to compare relative position within a distribution between two or more sets of data. However, this only works well if both sets of data are at least approximately normally distributed.

Exam Day!
(1997 College Board AP Statistics exam question)
At a college the scores on the chemistry final exam are approximately normally distributed, with a mean of 75 and a standard deviation of 12. The scores on the calculus final are also approximately normally distributed, with a mean of 80 and a standard deviation of 8. A student scored 81 on the chemistry final and 84 on the calculus final. Relative to the students in each respective class, in which subject did this student do better?

DIRECTIONS: For the multiple choice questions, circle the best response. For the Free Response questions, be explicit in your answers.

A survey was conducted to gather ratings of the quality of service at local restaurants at a nearby mall. Respondents were to rate overall service using values between 0 (terrible) and 100 (excellent). The following stem plot represents the data.

Stem | Leaves
   3 | 3 4
   4 | 0 1 3 4 7 7 7 9
   5 | 0 1 1 3 4 4 5
   6 | 2 4 5 5 8
   7 | 3 7
   8 |
   9 | 5

1. What percent of the respondents rated quality as moderate to very poor (rating of 60 to 0)?
a) 32%  b) 50%  c) 68%  d) 75%  e) none of these

2. The median response was
a) 49  b) 50  c) 51  d) 52  e) 53

3. The mean of these data is
a) equal to the median.  b) less than the median.  c) greater than the median.  d) an integer.  e) cannot be determined.

4. The value of 33 is
a) the minimum but not an outlier.  b) the minimum and an outlier.  c) one of three outliers.  d) not a data value.  e) none of these.

5. Sketch and label the five number summary on a box plot of these data.

6. In skewed-left distributions, what is most frequently the relationship of the mean, median, and mode?
a) mean > mode > median
b) median > mode > mean
c) mode > mean > median
d) mode > median > mean
e) mean > median > mode

7. If the mean of 60 values is 52.6 pounds and the mean of 40 values is 48.4 pounds, find the mean of all 100 values.

8.
Of 1000 high school students whose mean height is 67.8 inches, 350 were girls. If the mean height of the girls was 63.0 inches, what is the mean height of the boys?

9. Which grade is better: a 78 on a test whose mean is 70 and standard deviation is 8.5, or an 83 on a test whose mean is 77 and standard deviation is 6.4? Justify your answer.

Use the table of test grades for two classes A and B to answer the following questions.

A    B      A    B      A    B      A    B
97   100    75   78     91   74     95   75
86   65     82   98     65   82     45   98
74   75     81   65     62   83

10. Calculate the five-number summary and the mean and standard deviation for each set of grades.

11. Describe the benefits of each set of summary statistics for these data.

12. Construct parallel box plots for the data from the two classes. Which class did better based on your plots and calculations? Defend your opinion.

Match each histogram with its boxplot by writing the letter of the boxplot in the space provided.
[Figures: histograms 1 to 5 and boxplots A to E]

1. _______    A.
2. _______    B.
3. _______    C.
4. _______    D.
5. _______    E.
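
Checking hand calculations
The z-score formula and the five number summary from this chapter can be checked with a short program. The sketch below is one possible way to do that using Python's standard statistics module and the San Francisco temperatures from the City Temperatures table. The function names are made up for illustration. Two assumptions are built in: quartiles are taken as the medians of the lower and upper halves of the sorted data, and outlier fences are placed 1.5 × IQR beyond Q1 and Q3. A calculator or textbook that uses a different quartile rule may give slightly different values for Q1 and Q3.

```python
from statistics import mean, median, stdev

# Average monthly temperatures for San Francisco (City Temperatures table).
san_francisco = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]


def z_score(x, data):
    """Number of sample standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean(data)) / stdev(data)


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list with at least two values.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data, leaving out the middle value when the count is odd.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[len(values) - half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def iqr_fences(data):
    """Return the (low, high) fences of the 1.5 x IQR outlier rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)


june_z = z_score(62, san_francisco)
summary = five_number_summary(san_francisco)
low, high = iqr_fences(san_francisco)
outliers = [t for t in san_francisco if t < low or t > high]

print(f"z-score for June: {june_z:.3f}")
print("five number summary:", summary)
print("outlier fences:", (low, high), "outliers:", outliers)
```

The June z-score comes out at about 0.825, which agrees with the .826 in the worked example once s is rounded to 5.75. The same functions can be pointed at the Raleigh row, or at any other list of values, to check work done by hand.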