AP Statistics Project Part I – Exploring and Understanding Data (MODEL ANSWERS)

This project explores the numerical and visual display and analysis of data and assesses students’ ability to use a graphing calculator to:
  - Enter and manipulate data;
  - Complete basic numerical computations (5-number summaries, mean and standard deviation);
  - Create properly formatted visual displays of data (tables, bar graphs, histograms, box plots, stem-and-leaf plots and normal distribution plots);
  - Analyze and interpret numerical summaries and visual displays of data.

Scenario and Data

Scores (in percentages) for the first test of the year for two AP Statistics classes are provided below. The class “First” meets first period of the day, every day of the school year. The second class “Last” meets last period of the day, every day of the school year. There are 21 students in the First class and 30 students in the Last class:

First Scores: 98, 52, 92, 92, 60, 66, 90, 86, 86, 70, 72, 84, 82, 82, 82, 82, 74, 74, 76, 80, and 80 (checksum: 1,660).

Last Scores: 96, 95, 92, 54, 57, 58, 86, 85, 82, 82, 82, 60, 66, 66, 66, 68, 80, 80, 77, 76, 76, 75, 74, 74, 74, 73, 72, 72, 71, and 70 (checksum: 2,239).

The AP Statistics teacher would like to conduct an analysis that compares the grades of the classes to determine if there is a difference in skill level between the students in the two classes.

Section 1: Graphing Calculator (round all decimals to the nearest tenth)

a) Enter the data sets into two columns in the list processor of your graphing calculator. Order the data from highest to lowest.

b) Use the checksum numbers above to ensure that you entered your data correctly. The sum of each data set should equal its checksum.

c) Use your calculator to find the mean and standard deviation of each class. Insert your answers here:

                      First    Last
Mean                  79.0%    74.6%
Standard Deviation    10.8     10.3

Which standard deviation did you record above, s or σ?
Explain why:
Students should select σ because each data set is the full population of a class, not a sample drawn from it. If they selected s, their answers will be FIRST: 11.0 and LAST: 10.5. Their answers should include a sentence or two explaining the difference between a population standard deviation and a sample standard deviation.

d) Create a 5-number summary for each class, round to 1 decimal place and record your results here:

         Max      Q3       Median   Q1       Min
First    98.0%    86.0%    82.0%    73.0%    52.0%
Last     96.0%    82.0%    74.0%    68.0%    54.0%

e) Identify any outlier(s) in these data sets. Explain why they’re outliers.
Using the IQR*1.5 method, 52% from FIRST is the only outlier in either data set. For FIRST, IQR = 86 - 73 = 13, so the fences are 73 - 19.5 = 53.5 and 86 + 19.5 = 105.5, and 52 falls below the lower fence. For LAST, IQR = 82 - 68 = 14, so the fences are 47 and 103, and no score falls outside them.

f) Create modified box plots of the data for each class. Sketch the box plots, side-by-side, here:
See EXCEL output.

How is a “modified” box plot different than a regular box plot?
In a modified box plot, outliers are plotted as individual points (‘Xs’) beyond the whiskers, and each whisker ends at the most extreme value that is not an outlier rather than at the overall min or max.

Explain in a sentence or two the usefulness of a box plot when analyzing data.
A box plot is particularly effective when comparing the centers and spreads of two sets of data.

g) Create (by hand) back-to-back stem-and-leaf plots for the data sets; split your stems if necessary.
See EXCEL output.

Explain in a sentence or two the particular usefulness of a stem-and-leaf plot in analyzing data.
A stem-and-leaf plot is particularly effective because it shows the shape of the distribution while keeping every individual data point visible.

h) Make a frequency table of As (90-100), Bs (80-89), Cs (70-79), Ds (60-69) and Fs (<60) for each class, then create a histogram for each class.

Grades    A (90 +)   B (80-89)   C (70-79)   D (60-69)   F (< 60)
FIRST        4           9           5           2           1
LAST         3           7          12           5           3

See EXCEL output for histograms.

Explain in a sentence or two the particular usefulness of frequency tables and histograms when analyzing data.
Frequency tables tabulate the counts in each class interval, which are needed to build a histogram. Histograms are particularly effective when analyzing data because they give an excellent visual of the shape of the distribution.
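
For teachers who want to check the calculator work in items (b) through (e) independently, the computations can be sketched in Python. This is only a cross-check under stated assumptions, not the calculator procedure the project asks for: the function names are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd). Other tools use different quartile conventions, so their Q1 and Q3 can differ by a fraction of a point.

```python
from statistics import mean, median, pstdev, stdev

first = [98, 52, 92, 92, 60, 66, 90, 86, 86, 70, 72,
         84, 82, 82, 82, 82, 74, 74, 76, 80, 80]
last = [96, 95, 92, 54, 57, 58, 86, 85, 82, 82, 82, 60, 66, 66, 66,
        68, 80, 80, 77, 76, 76, 75, 74, 74, 74, 73, 72, 72, 71, 70]

# Item (b): verify data entry against the checksums given in the scenario.
assert sum(first) == 1660 and sum(last) == 2239


def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, excluding the overall median when n is odd.
    """
    data = sorted(scores)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outliers(scores):
    """Return the scores that lie beyond the 1.5 * IQR fences."""
    _, q1, _, q3, _ = five_number_summary(scores)
    fence = 1.5 * (q3 - q1)
    return [x for x in scores if x < q1 - fence or x > q3 + fence]


for name, scores in (("First", first), ("Last", last)):
    print(name)
    print("  mean:", round(mean(scores), 1))
    # pstdev divides by n (sigma); stdev divides by n - 1 (s).
    print("  sigma (population SD):", round(pstdev(scores), 1))
    print("  s (sample SD):", round(stdev(scores), 1))
    print("  min, Q1, median, Q3, max:", five_number_summary(scores))
    print("  outliers:", outliers(scores))
```

Printing both `pstdev` and `stdev` side by side makes the σ versus s distinction from item (c) concrete: the only difference is the divisor, n versus n - 1.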
i) Create a table that displays marginal grade summaries and marginal distributions.

Grades       FIRST Frequency   LAST Frequency   Totals   Marginal Distribution (%)
A (90 +)            4                 3             7              14%
B (80-89)           9                 7            16              31%
C (70-79)           5                12            17              33%
D (60-69)           2                 5             7              14%
F (< 60)            1                 3             4               8%
Totals             21                30            51             100%

In a sentence or two, comment on the marginal distribution percentages that you calculated.
The marginal distributions reveal that 14% of the students overall received As, 31% received Bs, 33% received Cs, 14% received Ds and 8% failed.

j) Compare the conditional probabilities of the grades for FIRST and LAST (e.g. what is the conditional probability that a student will get a grade of B or better given that s/he is in the FIRST PERIOD class). Complete the table and write a few sentences explaining your findings.

Conditional Distribution Table

Grades        First Period Conditional   Last Period Conditional
A (90 +)               19%                        10%
B (80-89)              43%                        23%
C (70-79)              24%                        40%
D (60-69)              10%                        17%
F (< 60)                5%                        10%
A or B                 62%                        33%
A, B or C              86%                        73%
C, D or F              38%                        67%
D or F                 14%                        27%

The conditional probabilities show that FIRST PERIOD students were more likely to receive As (19% vs. 10%) or Bs (43% vs. 23%). They were also more likely to receive an A or B (62% vs. 33%) or an A, B or C (86% vs. 73%). LAST PERIOD students were more likely to receive Cs (40% vs. 24%), Ds (17% vs. 10%) and Fs (10% vs. 5%). They were also more likely to receive a D or F (27% vs. 14%) and a C, D or F (67% vs. 38%).

Section 2 - Summary Questions: Using the data, tables, summaries and visual displays you created, answer the following questions. Your answers should be typed on a separate sheet of paper.

(1) Describe the distribution of each data set (shape, center, and spread).

FIRST PERIOD data: The shape is unimodal and skewed left. The center is around 80 (mean of 79.0%, median of 82%). The spread is fairly wide, with scores from 52% to 98% (a range of 46 percentage points). There is one low outlier (52%).
LAST PERIOD data: The shape is unimodal and roughly symmetric; the distribution appears to be close to normal. The center is around 74 (mean of 74.6%, median of 74%). The spread is fairly wide, with scores from 54% to 96% (a range of 42 percentage points). There are no outliers.

(2) Discuss your numerical findings in general, comparing the data of these two classes. What conclusions can you make?

The test scores from FIRST PERIOD are generally higher than those from LAST PERIOD. Two measures of center (mean and median) are higher in FIRST PERIOD than in LAST PERIOD (mean: 79.0% vs. 74.6%; median: 82% vs. 74%). The third measure of center (mode) does not separate the classes: the mode in FIRST PERIOD is 82%, and 82% is also one of three tied modes in LAST PERIOD (66%, 74% and 82%). Given the data (test scores), this measure may not mean much and may be partially due to grading (e.g. individual question values, etc.). FIRST PERIOD has a noticeably larger share of As and Bs as a percentage of the total number of tests in the class than LAST PERIOD (FIRST: 62% vs. LAST: 33%). It appears that there is more ability in the FIRST PERIOD class than in the LAST PERIOD class.
****OTHER CONJECTURES WILL VARY****

(3) Should the AP Statistics teacher conclude that there is a difference in the level of abilities between the students in the two classes? Support your answers in 3-5 sentences using your data.

****Answers here will vary; however, it is reasonable for students to conclude that FIRST PERIOD has a stronger group of students than LAST PERIOD.****

(4) Are there factors besides student ability that might be affecting this data? Using your experiences as a student, identify some possible factors and support your arguments in 3-5 sentences.

****Answers here will vary, but students SHOULD point out that there are more students in LAST PERIOD (30) than in FIRST PERIOD (21) and that students and teachers are probably less effective at the end of the day than at the beginning of the day.****

(5) What recommendations would you make to the AP Statistics teacher regarding these two classes?
Be specific in your recommendations and support your answers.

****Answers here will vary; most reasonable conclusions are acceptable and should be praised.****
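
The frequency, marginal and conditional tables from Section 1, items (h) through (j), can also be cross-checked in Python. The sketch below is one possible approach rather than part of the project: the names `BANDS`, `grade`, `grade_counts` and `percent` are hypothetical, and percentages are assumed to be rounded to the nearest whole number, as in the tables above.

```python
first = [98, 52, 92, 92, 60, 66, 90, 86, 86, 70, 72,
         84, 82, 82, 82, 82, 74, 74, 76, 80, 80]
last = [96, 95, 92, 54, 57, 58, 86, 85, 82, 82, 82, 60, 66, 66, 66,
        68, 80, 80, 77, 76, 76, 75, 74, 74, 74, 73, 72, 72, 71, 70]

# Grade bands from item (h): (label, lowest score in the band), highest first.
BANDS = [("A", 90), ("B", 80), ("C", 70), ("D", 60), ("F", 0)]


def grade(score):
    """Return the letter grade for a percentage score."""
    for label, floor in BANDS:
        if score >= floor:
            return label
    raise ValueError(f"score out of range: {score}")


def grade_counts(scores):
    """Count how many scores fall in each grade band."""
    counts = {label: 0 for label, _ in BANDS}
    for score in scores:
        counts[grade(score)] += 1
    return counts


def percent(part, whole):
    """Return part / whole as a whole-number percentage."""
    return round(100 * part / whole)


first_counts = grade_counts(first)
last_counts = grade_counts(last)
total = len(first) + len(last)

# Columns: counts per class, row total, marginal %, then the
# conditional % within each class (count divided by class size).
print("Grade  First  Last  Total  Marginal  First%  Last%")
for label, _ in BANDS:
    n_first, n_last = first_counts[label], last_counts[label]
    print(f"{label:<7}{n_first:>5}{n_last:>6}{n_first + n_last:>7}"
          f"{percent(n_first + n_last, total):>9}%"
          f"{percent(n_first, len(first)):>7}%"
          f"{percent(n_last, len(last)):>6}%")

# Conditional probability of "B or better" given the class period.
for name, counts, size in (("First", first_counts, len(first)),
                           ("Last", last_counts, len(last))):
    share = percent(counts["A"] + counts["B"], size)
    print(f"P(A or B | {name}) = {share}%")
```

The marginal column divides by all 51 students, while each conditional column divides by its own class size (21 or 30), which is the distinction items (i) and (j) are meant to draw out.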