Statistical Inference - edventure-ga

AP Statistics Summer Institute
Summer, 2013
Paul L. Myers, Consultant
The Paideia School, Atlanta, GA 30307
myers.paul@paideiaschool.org
www.edventure-ga.com

Day 1
- Logistics & Introductions
- The AP Program & The AP Statistics Syllabus
- Appropriate Use of Technology – Graphing Calculators
- Topic I - Exploring Data
  - Constructing and interpreting graphical displays of univariate data
  - Summarizing distributions of univariate data
  - Comparing distributions of univariate data
  - Exploring bivariate data
  - Exploring categorical data

Day 2
- Materials & Resources
- Appropriate Use of Technology – Computer Software
- Topic II - Sampling & Experimentation
  - Overview of methods of data collection
  - Planning and conducting surveys
  - Planning and conducting experiments
  - Generalizability of results and conclusions that can be drawn from observational studies, experiments, and surveys

Day 3
- The AP Audit Syllabus & Timeline
- Appropriate Use of Hands-On Activities
- Topic III - Anticipating Patterns
  - Probability
  - Combining independent random variables
  - The normal distribution
  - Sampling distributions

Day 4
- AP Exam Review Tips
- Grading the AP Exam
- Topic IV - Statistical Inference
  - Estimation
  - Tests of Significance

Random Introductions
On average, how many draws will it take to introduce N people? Each person is assigned a random integer from 1 to N. Random integers from 1 to N are then selected with replacement.
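One way to explore this question is by simulation. The Python sketch below assumes that each of the N people holds a distinct integer from 1 to N and that drawing continues until every integer has come up at least once. The function names, the seed, and the choice N = 20 are illustrative assumptions, not part of the original activity. It compares the simulated average with the sum N/N + N/(N-1) + ... + N/1, which is the expected number of draws under those assumptions.

```python
import random


def draws_to_introduce_everyone(n, rng):
    """Draw integers 1..n with replacement until every value has appeared."""
    introduced = set()
    draws = 0
    while len(introduced) < n:
        introduced.add(rng.randint(1, n))  # randint is inclusive on both ends
        draws += 1
    return draws


def expected_draws(n):
    """Theoretical mean under the stated assumptions: n/n + n/(n-1) + ... + n/1."""
    return sum(n / k for k in range(1, n + 1))


def simulate_mean_draws(n, trials=10_000, seed=2013):
    """Average number of draws over many simulated classes of n people."""
    rng = random.Random(seed)
    return sum(draws_to_introduce_everyone(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n = 20  # hypothetical class size
    print(f"simulated mean draws: {simulate_mean_draws(n):.1f}")
    print(f"theoretical mean:     {expected_draws(n):.1f}")
```

Each term n/k is the expected wait for the next new person when k people are still waiting to be introduced, which lines up with the P(Introduced) and E[Introduced] columns of the worksheet table.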
A short TI-8X home-screen program:
- Initialize a counter: 0 → C
- Update the counter: C + 1 → C
- Select a random integer: randInt(1, N)

0 → C [Enter]
C + 1 → C : {C, randInt(1, N)} [Enter]

Person Number | P(Introduced) | E[Introduced]
1 | |
2 | |
3 | |
4 | |
5 | |
N | |

AP Statistics Topic Outline

Topic | Exam Percentage
Exploring Data | 20%-30%
Sampling & Experimentation | 10%-15%
Anticipating Patterns | 20%-30%
Statistical Inference | 30%-40%

Exploring Data

Topic I – Exploring Data
Describing patterns and departures from patterns (20%-30%)
Exploratory analysis of data makes use of graphical and numerical techniques to study patterns and departures from patterns. Emphasis should be placed on interpreting information from graphical and numerical displays and summaries.

A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram, cumulative frequency plot)
   1. Center and spread
   2. Clusters and gaps
   3. Outliers and other unusual features
   4. Shape
B. Summarizing distributions of univariate data
   1. Measuring center: median, mean
   2. Measuring spread: range, interquartile range, standard deviation
   3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
   4. Using boxplots
   5. The effect of changing units on summary measures
C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)
   1. Comparing center and spread: within group, between group variation
   2. Comparing clusters and gaps
   3. Comparing outliers and other unusual features
   4. Comparing shapes
D. Exploring bivariate data
   1. Analyzing patterns in scatterplots
   2. Correlation and linearity
   3. Least-squares regression line
   4. Residual plots, outliers, and influential points
   5. Transformations to achieve linearity: logarithmic and power transformations
E. Exploring categorical data
   1. Frequency tables and bar charts
   2. Marginal and joint frequencies for two-way tables
   3. Conditional relative frequencies and association
   4. Comparing distributions using bar charts

Activity 1: How Do The Colors Vary?
1. Find the weight (in grams) of your bag of M&Ms. (We will use it in a later activity!) ______________
2. Open your bag of M&Ms and count the number and the percentage of each color and the total number of M&Ms in the bag.

Color | Brown | Yellow | Red | Blue | Orange | Green | Total
Number | | | | | | |
Percentage | | | | | | |

3. Using your M&Ms, construct a pie chart of colors.

Pie Chart

Matching Displays to Variables I
Consider the following list of variables and data displays:
A. Scores on a fairly easy examination in statistics
B. Number of months required to achieve pregnancy for a sample of women who attempted to get pregnant
C. Age at death of a sample of 34 persons
D. Heights of a group of college students
E. The last digit in the social security number of each of 40 students
F. Number of medals won by medal-winning countries in the 2000 Sydney Olympics
G. SAT scores for a group of college students

Matching Displays to Variables II
Consider the following group of histograms and summary statistics. Each of the variables corresponds to one of the histograms.

Variable | Mean | Median | Standard Deviation
1 | 60 | 50 | 10
2 | 50 | 50 | 15
3 | 50 | 50 | 10
4 | 53 | 50 | 20
5 | 47 | 50 | 10
6 | 50 | 50 | 5

Matching Displays to Variables III
Consider the following group of histograms and box plots. Each box plot corresponds to one of the histograms. Match the box plots to the histograms and explain how you made your choices.

United States Presidents

No. | President | Date of Birth | Date of Inauguration | Date of Death | Age at Inauguration | Age at Death
1 | George Washington | 2/22/1732 | 4/30/1789 | 12/14/1799 | 57 | 67
2 | John Adams | 10/30/1735 | 3/4/1797 | 7/4/1826 | 61 | 90
3 | Thomas Jefferson | 4/13/1743 | 3/4/1801 | 7/4/1826 | 57 | 83
4 | James Madison | 3/16/1751 | 3/4/1809 | 6/28/1836 | 57 | 85
5 | James Monroe | 4/28/1758 | 3/4/1817 | 7/4/1831 | 58 | 73
6 | John Quincy Adams | 7/11/1767 | 3/4/1825 | 2/23/1848 | 57 | 80
7 | Andrew Jackson | 3/15/1767 | 3/4/1829 | 6/8/1845 | 61 | 78
8 | Martin Van Buren | 12/5/1782 | 3/4/1837 | 7/24/1862 | 54 | 79
9 | William Henry Harrison | 2/9/1773 | 3/4/1841 | 4/4/1841 | 68 | 68
10 | John Tyler | 3/29/1790 | 4/6/1841 | 1/18/1862 | 51 | 71
11 | James Polk | 11/2/1795 | 3/4/1845 | 6/15/1849 | 49 | 53
12 | Zachary Taylor | 11/24/1784 | 3/5/1849 | 7/9/1850 | 64 | 65
13 | Millard Fillmore | 1/7/1800 | 7/9/1850 | 3/8/1874 | 50 | 74
14 | Franklin Pierce | 11/23/1804 | 3/4/1853 | 10/8/1869 | 48 | 64
15 | James Buchanan | 4/23/1791 | 3/4/1857 | 6/1/1868 | 65 | 77
16 | Abraham Lincoln | 2/12/1809 | 3/4/1861 | 4/15/1865 | 52 | 56
17 | Andrew Johnson | 12/29/1808 | 4/15/1865 | 7/31/1875 | 56 | 66
18 | Ulysses S. Grant | 4/27/1822 | 3/4/1869 | 7/23/1885 | 46 | 63
19 | Rutherford B. Hayes | 10/4/1822 | 3/4/1877 | 1/17/1893 | 54 | 70
20 | James A. Garfield | 11/19/1831 | 3/4/1881 | 9/19/1881 | 49 | 49
21 | Chester A. Arthur | 10/5/1830 | 9/19/1881 | 11/18/1886 | 50 | 56
22 | Grover Cleveland | 3/18/1837 | 3/4/1885 | 6/24/1908 | 47 | 71
23 | Benjamin Harrison | 8/20/1833 | 3/4/1889 | 3/13/1901 | 55 | 67
24 | Grover Cleveland | 3/18/1837 | 3/4/1893 | 6/24/1908 | 55 | 71
25 | William McKinley | 1/29/1843 | 3/4/1897 | 9/14/1901 | 54 | 58
26 | Theodore Roosevelt | 10/27/1858 | 9/14/1901 | 1/6/1919 | 42 | 60
27 | William Howard Taft | 9/15/1857 | 3/4/1909 | 3/8/1930 | 51 | 72
28 | Thomas Woodrow Wilson | 12/28/1856 | 3/4/1913 | 2/3/1924 | 56 | 67
29 | Warren G. Harding | 11/2/1865 | 3/4/1921 | 8/2/1923 | 55 | 57
30 | John Calvin Coolidge | 7/4/1872 | 8/3/1923 | 1/5/1933 | 51 | 60
31 | Herbert Clark Hoover | 8/10/1874 | 3/4/1929 | 10/20/1964 | 54 | 90
32 | Franklin D. Roosevelt | 1/30/1882 | 3/4/1933 | 4/12/1945 | 51 | 63
33 | Harry S Truman | 5/8/1884 | 4/12/1945 | 12/26/1972 | 60 | 88
34 | Dwight D. Eisenhower | 10/14/1890 | 1/20/1953 | 3/28/1969 | 62 | 78
35 | John F. Kennedy | 5/29/1917 | 1/20/1961 | 11/22/1963 | 43 | 46
36 | Lyndon B. Johnson | 8/27/1908 | 11/22/1963 | 1/22/1973 | 55 | 64
37 | Richard M. Nixon | 1/9/1913 | 1/20/1969 | 4/22/1994 | 56 | 81
38 | Gerald R. Ford | 7/14/1913 | 8/9/1974 | 12/26/2006 | 61 | 93
39 | James E. Carter | 10/1/1924 | 1/20/1977 | | 52 |
40 | Ronald W. Reagan | 2/6/1911 | 1/20/1981 | 6/5/2004 | 69 | 93
41 | George H. W. Bush | 6/12/1924 | 1/20/1989 | | 64 |
42 | William J. Clinton | 8/19/1946 | 1/20/1993 | | 46 |
43 | George W. Bush | 7/6/1946 | 1/20/2001 | | 55 |
44 | Barack Obama | 8/4/1961 | 1/20/2009 | | 47 |

1. What do you think the shape of Inauguration Ages will be?
2. Draw a stem-and-leaf plot of Inauguration Ages.
   4 |
   * |
   5 |
   * |
   6 |
   * |
3. Find the 5-Number Summary of Inauguration Ages.
   Min = ____  Q1 = ____  Med = ____  Q3 = ____  Max = ____
4. Draw a boxplot of Inauguration Ages.
5. Are there any outliers? Who are they and what are their ages?
6. Draw a histogram of Inauguration Ages.
7. Find the mean and standard deviation of Inauguration Ages.

1. What do you think the shape of Death Ages will be?
2. Draw a stem-and-leaf plot of Death Ages.
   4 |
   * |
   5 |
   * |
   6 |
   * |
   7 |
   * |
   8 |
   * |
   9 |
   * |
3. Find the 5-Number Summary of Death Ages.
   Min = ____  Q1 = ____  Med = ____  Q3 = ____  Max = ____
4. Draw a boxplot of Death Ages.
5. Are there any outliers? Who are they and what are their ages?
6. Draw a histogram of Death Ages.
7. Find the mean and standard deviation of Death Ages.

AP Statistics One-Variable Statistics Project
Your report must be done in Word and your displays must be done in Fathom and pasted into your Word document. Your report must be titled and the title centered and in bold type. All narratives must be in complete sentences and written in paragraph form with correct punctuation and spelling. Graphical displays must be complete (titled, axes labeled and scaled, sized appropriately) and arranged in a logical format on the page. Your project will be graded on content and presentation.

You will use internet resources to find a set of data concerning a topic of interest to you. Your data set must have a minimum of 50 individuals. Information about one quantitative variable and at least one categorical variable should be collected for these individuals. You will then write a report about the data set that follows the guidelines below:

Part I: The Introduction
Describe your data set, give its web address, and describe what you are intending to investigate. Be sure to include some background information about your data so that your project becomes interesting to the reader. You may include photos from the web site in this portion of your project.
Part II: The Analysis
Display your data graphically and numerically.
(a) Create at least one display which is appropriate for your quantitative variable.
(b) Calculate the appropriate measures of center and spread and mark these values on your display.
(c) Create an appropriate summary table of your numerical measures.
(d) Create at least one display that compares the quantitative variable split by category.
(e) Calculate the appropriate measures of center and spread split by category and mark these values on your display.
(f) Create an appropriate summary table of your numerical measures split by category.

Part III: The Conclusion
Summarize the findings of your investigation. Issues that you should address include but are not limited to:
1. Shape. Is the shape clear or does one type of graph more clearly show a shape than another? In the display, does the scaling make any difference in the overall shape?
2. Unusual Features. Are there any outliers? How do you know? If so, identify them and either justify eliminating them or explain their presence.
3. Spread. What percentage of your data points are within one standard deviation of the mean?
4. Comparisons. When your data are split by category, compare and contrast the distributions.

AP Statistics One-Variable Statistics Project Grading Rubric

Introduction (1 point)
Describe your data set, give its web address, and describe what you are intending to investigate. Be sure to include some background information about your data so that your project becomes interesting to the reader. You may include photos from the web site in this portion of your project.

Quantitative Displays (2 points)
(a) Create at least one display which is appropriate for your quantitative variable.
(b) Calculate the appropriate measures of center and spread and mark these values on your display.
(c) Create an appropriate summary table of your numerical measures.

Quantitative Displays split by Category (2 points)
(d) Create at least one display that compares the quantitative variable split by category.
(e) Calculate the appropriate measures of center and spread split by category and mark these values on your display.
(f) Create an appropriate summary table of your numerical measures split by category.

Conclusion (3 points)
Summarize the findings of your investigation. Issues that you should address include but are not limited to:
1. Shape. Is the shape clear or does one type of graph more clearly show a shape than another? In the display, does the scaling make any difference in the overall shape?
2. Unusual Features. Are there any outliers? How do you know? If so, identify them and either justify eliminating them or explain their presence.
3. Spread. What percentage of your data points are within one standard deviation of the mean?
4. Comparisons. When your data are split by category, compare and contrast the distributions.

Presentation (2 points)
Your report must be done in Word and your displays must be done in Fathom and pasted into your Word document. Your report must be titled and the title centered and in bold type. All narratives must be in complete sentences and written in paragraph form with correct punctuation and spelling. Graphical displays must be complete (titled, axes labeled and scaled, sized appropriately) and arranged in a logical format on the page.

TOTAL: 10 points

Multiple Choice Practice
1.
2.
3.
4.

Free Response Practice
2006 FR#1 – The Catapults
Two parents have each built a toy catapult for use in a game at an elementary school fair. To play the game, the students will attempt to launch Ping-Pong balls from the catapults so that the balls land within a 5-centimeter band. A target line will be drawn through the middle of the band, as shown in the figure below. All points on the target line are equidistant from the launching location.
If a ball lands within the shaded band, the student will win a prize. The parents have constructed the two catapults according to slightly different plans. They want to test these catapults before building additional ones. Under identical conditions, the parents launch 40 Ping-Pong balls from each catapult and measure the distance that the ball travels before landing. Distances to the nearest centimeter are graphed in the dotplot below.
(a) Comment on any similarities and any differences in the two distributions of distances traveled by balls launched from catapult A and catapult B.
(b) If the parents want to maximize the probability of having the Ping-Pong balls land within the band, which one of the catapults, A or B, would be better to use than the other? Justify your choice.
(c) Using the catapult that you chose in part (b), how many centimeters from the target line should this catapult be placed? Explain why you chose this distance.

Free Response Practice
2008 FR#1 – Breakfast Cereal

Is There an Association Between Spaces From Go and Property Cost?

Property | Spaces from GO | Cost
Mediterranean Avenue | 1 | 60
Baltic Avenue | 3 | 60
Reading Railroad | 5 | 200
Oriental Avenue | 6 | 100
Vermont Avenue | 8 | 100
Connecticut Avenue | 9 | 120
St. Charles Place | 11 | 140
Electric Company | 12 | 150
States Avenue | 13 | 140
Virginia Avenue | 14 | 160
Penn Railroad | 15 | 200
St. James Place | 16 | 180
Tennessee Avenue | 18 | 180
New York Avenue | 19 | 200
Kentucky Avenue | 21 | 220
Indiana Avenue | 23 | 220
Illinois Avenue | 24 | 240
B & O Railroad | 25 | 200
Atlantic Avenue | 26 | 260
Ventnor Avenue | 27 | 260
Water Works | 28 | 150
Marvin Gardens | 29 | 280
Pacific Avenue | 31 | 300
North Carolina Avenue | 32 | 300
Pennsylvania Avenue | 34 | 320
Short Line Railroad | 35 | 200
Park Place | 37 | 350
Boardwalk | 39 | 400

1. Draw a scatterplot of (Spaces From Go, Cost).
2. Does there appear to be an association between the two variables?
3. Find a "Line of Best Fit": y = mx + b.
4. What does b represent? What does m represent?
5. Are there any unusual points? What are they?
6. Predict the cost of a "new" property that is 50 spaces from Go.

Measure The Correlation
For each of the following scatterplots:
- Draw a symmetrical ellipse that characterizes the data.
- Draw the major and minor axes of the ellipse.
- Calculate $r \approx \pm\left(1 - \dfrac{\text{length of the minor axis}}{\text{length of the major axis}}\right)$

Matching Descriptions to Scatter Plots I
Match each of the five scatterplots to the description of its regression line and correlation coefficient. The scales on the axes of the scatterplots are the same.
(Each ? marks an operator sign that is missing from this copy; signs on r may be missing as well.)
H. $r = 0.83$, $\hat{y} = 2.1 \;?\; 1.4x$
I. $r = 0.31$, $\hat{y} = 7.8 \;?\; 0.5x$
J. $r = 0.96$, $\hat{y} = 2.1 \;?\; 1.4x$
K. $r = 0.83$, $\hat{y} = 11.8 \;?\; 1.4x$
L. $r = 0.41$, $\hat{y} = 1.4 \;?\; 1.4x$

Matching Descriptions to Scatter Plots II
For the nine points on the following scatterplot, $r = 0.71$, $r^2 = 0.5$, and the equation of the least-squares regression line is $\hat{y} = 4.00 + 1.00x$. A tenth point is added to the original nine. Match each of the following points with the correlation coefficient that would result if that point were added. Do not calculate the new correlation coefficient but rather reason out which r must go with each point.

Points: (a) (3,7)  (b) (2,6)  (c) (10,0)  (d) (10,6)  (e) (10,14)  (f) (100,0)  (g) (100,6)
Correlation coefficients: I. -0.84  II. -0.70  III. 0.02  IV. 0.22  V. 0.71  VI. 0.73  VII. 0.96

Matching Descriptions to Scatter Plots III
Analyze the four data sets (from Anscombe, 1973) in the following table.
- What do they have in common?
- Why are they of interest?
- What do they illustrate?

Data Set 1 | | Data Set 2 | | Data Set 3 | | Data Set 4 |
x | y | x | y | x | y | x | y
10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58
8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76
13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71
9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84
11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47
14 | 9.96 | 14 | 8.10 | 14 | 8.84 | 8 | 7.04
6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25
4 | 4.26 | 4 | 3.10 | 4 | 5.39 | 19 | 12.50
12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56
7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91
5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89

What Is R-Squared (Really!)?
Our goal is to find a model to predict y.
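The comparison this activity builds toward can also be checked numerically. The sketch below is a minimal Python illustration that uses the first Anscombe data set from the table above as the (x, y) data; the function names are hypothetical. It measures the squared variability around the mean of y, the squared variability around the least-squares line, and the fraction of the former that the line explains.

```python
# First Anscombe (1973) data set, copied from the table above.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def mean(values):
    return sum(values) / len(values)


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """Fraction of the one-variable model's squared variability explained by the line."""
    y_bar = mean(y)
    intercept, slope = least_squares_line(x, y)
    # One-variable model: predict every y with y-bar.
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    # Two-variable model: predict each y with the regression line.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return (total_ss - residual_ss) / total_ss


if __name__ == "__main__":
    b0, b1 = least_squares_line(x, y)
    print(f"LSRL: y-hat = {b0:.2f} + {b1:.3f}x")
    print(f"r-squared = {r_squared(x, y):.3f}")
```

Swapping in any of the other three data sets is a quick way to see what the four sets have in common.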
Plot the data set (x,y).

One-Variable Model
1. Calculate $\bar{y}$ (the mean of y).
2. Draw the $\bar{y}$ line.
   This is our One-Variable Model.
3. Draw the vertical segments representing $y - \bar{y}$ (the deviations from the mean).
4. Draw squares representing $(y - \bar{y})^2$ (the squared deviations from the mean).
5. Shade the squares and calculate $\sum (y - \bar{y})^2$ (the total squared variability in this model).
   We will use this as our measure of the One-Variable Model variability.

Two-Variable Model
1. Calculate $\hat{y}$ (the least-squares regression line of (x,y)).
2. Draw the $\hat{y}$ line.
   This is our Two-Variable Model.
3. Draw the vertical segments representing $y - \hat{y}$ (the residuals from the LSRL).
4. Draw squares representing $(y - \hat{y})^2$ (the squared residuals).
5. Shade the squares and calculate $\sum (y - \hat{y})^2$ (the total squared variability in this model).
   We will use this as our measure of the Two-Variable Model variability.

Measuring the Quality of the Model
1. Calculate $\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2$ (the amount of variability in the One-Variable Model that is explained by the Two-Variable Model).
2. Calculate $\dfrac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$ (the percentage of variability in the One-Variable Model that is explained by the Two-Variable Model).
   This is $r^2$.

AP Statistics - Two Variable Statistics Project
Find a data set with at least 25 paired data points that represent some variables that might realistically have an association. You will write a report, using Word, including, but not limited to, the following.

Discuss your data and the source of the data. Include why you find this data interesting and what type of relationship (or association) you expect to find BEFORE you ever plot the data and why you expect this relationship.

Using Fathom, plot the data and find:
- the regression equation
- the correlation coefficient
- the coefficient of determination
- a residual plot

- Be sure each graph is appropriately titled and labeled.
- Discuss the meaning of all parts of the regression equation as they relate to your data.
- Interpret both the correlation coefficient and the coefficient of determination as they relate to your data.
- What information does the residual plot give you?
- Explain why you would or would not use the regression equation to predict values of the response variable.
- Are there any influential points? If not, how do you know? If so, how would you deal with those points?
- If there are any categorical variables present, indicate them on your graph and discuss what significance, if any, they have to the association.
- Which of the data values have the smallest and largest residual? Interpret the meaning of these residuals.
- Was your expectation about the relationship accurate or not? If it was accurate, why do you think you accurately predicted the nature of the relationship? If not, why not?
- Remember to incorporate your graphs into the discussion portion of your project. Treat this as you would a paper for English. Presentation will count.

AP Statistics - Two Variable Statistics Project Rubric

Presentation: Your report is done in Word and your displays are done in Fathom and pasted into your Word document. Graphical displays are to be complete (titled, axes labeled and scaled, sized appropriately) and arranged in a logical format on the page. Your report is titled. All narratives are in complete sentences and written in paragraph form with correct punctuation and spelling.

Fathom: Plot the data. Find the regression equation. Find the correlation coefficient. Find the coefficient of determination. Find a residual plot.

Discussion: Discuss your data and the source of the data. Include why you find this data interesting and what type of relationship (or association) you expect to find BEFORE you ever plot the data and why you expect this relationship. Was your expectation about the relationship accurate or not? If it was accurate, why do you think you accurately predicted the nature of the relationship? If not, why not?

Interpretation:
- Discuss the meaning of all parts of the regression equation as they relate to your data (slope and intercept).
- Interpret both the correlation coefficient and the coefficient of determination as they relate to your data.
- What information does the residual plot give you?
- Explain why you would or would not use the regression equation to predict values of the response variable.
- Are there any influential points? If not, how do you know? If so, how would you deal with those points?
- If there are any categorical variables present, indicate them on your graph and discuss what significance, if any, they have to the association.
- Which of the data values have the smallest and largest residual? Interpret the meaning of these residuals.

Points, in the order printed on the rubric: ½, ½, ½, ½ (following the Fathom items); ½, 2, 2, ½, 1, 1, ½, ½ (following the Interpretation items)
TOTAL: 10 points

M & M Statistics
Exponential Decay
1. Count the number of M&M Minis in your tube. Place that number in Trial Number 0.
2. (a) Place the M&M Minis in the tube.
   (b) Shake.
   (c) Pour on the table.
   (d) Remove the M&M Minis with no M facing up.
   (e) Count the remaining M&M Minis.
   (f) Place that number in Trial Number 1.
3. Continue step 2 (increasing the trial number by 1) until there is only 1 M&M left.

Trial Number | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number of Minis Left | | | | | | | | | |

4. Draw a scatterplot of: (Trial Number, Number of M&M Minis)
5. Transform the data and draw a scatterplot of: (Trial Number, log(Number of M&M Minis))
6. Find the LSRL of the transformed data.
   LSRL _______________________________  r2 ____________
7. Undo the transformation and find the model ________________________________

Interspecies Scaling for Mammals

Mammal | Weight (kg) | Heart Rate (beats per minute)
Mouse | 0.03 | 580
Rat | 0.32 | 320
Rabbit | 3.97 | 170
Monkey | 6.55 | 150
Dog | 16 | 120
Human | 68 |
Elephant | 2,500 | 25

Find an appropriate model and predict the heart rate of a human.

Multiple Choice Practice
1.
2.
3.
4.

Free Response Practice
Lunchtime
Does how long children spend at the lunch table help predict how much food they eat? Data were collected on 20 toddlers who were observed over several months at a nursery school. Both time (in minutes) spent at the lunch table and calories consumed were recorded. A computer printout of the linear regression is shown below.
(a) What is the equation of the least-squares regression line (in context)?
(b) Find the value of and explain the meaning of the slope in the context of the problem.
(c) Find the value of and explain the meaning of the y-intercept in the context of the problem.
(d) Find the value of and explain the meaning of the correlation in the context of the problem.
(e) Predict the number of calories consumed by a toddler who spends 30 minutes at the lunch table.

Free Response Practice
2006 FR#2 – Soapsuds
A manufacturer of dish detergent believes the height of soapsuds in the dishpan depends on the amount of detergent used. A study of the suds' height for a new dish detergent was conducted. Seven pans of water were prepared. All pans were of the same size and type and contained the same amount of water. The temperature of the water was the same for each pan. An amount of dish detergent was assigned at random to each pan, and that amount of detergent was added to that pan. Then the water in the dishpan was agitated for a set amount of time, and the height of the resulting suds was measured. A plot of the data and the computer printout from fitting a least-squares regression line to the data are shown below.
(a) Write the equation of the fitted regression line.
Define any variables used in this equation.
(b) Note that s = 1.99821 in the computer output. Interpret this value in the context of the study.
(c) Identify and interpret the standard error of the slope.

2012 FR#1 – Sewing Machines
1. The scatterplot below displays the price in dollars and quality rating for 14 different sewing machines.
a) Describe the nature of the association between price and quality rating for the sewing machines.
b) One of the 14 sewing machines substantially affects the appropriateness of using a linear regression model to predict quality rating based on price. Report the approximate price and quality rating of that machine and explain your choice.
c) Chris is interested in buying one of the 14 sewing machines. He will consider buying only those machines for which there is no other machine that has both higher quality and lower price. On the scatterplot reproduced below, circle all data points corresponding to machines that Chris will consider buying.

Sampling & Experimentation

Triple Blind Study
- Participant doesn't know what he is taking
- Physician doesn't know what the participant is taking
- Statistician doesn't know what he is doing

Topic II – Sampling and Experimentation
Planning and conducting a study (10%-15%)
Data must be collected according to a well-developed plan if valid information on a conjecture is to be obtained. This includes clarifying the question and deciding upon a method of data collection and analysis.

A. Overview of methods of data collection
   1. Census
   2. Sample survey
   3. Experiment
   4. Observational study
B. Planning and conducting surveys
   1. Characteristics of a well-designed and well-conducted survey
   2. Populations, samples, and random selection
   3. Sources of bias in sampling and surveys
   4. Sampling methods, including simple random sampling, stratified random sampling, and cluster sampling
C. Planning and conducting experiments
   1. Characteristics of a well-designed and well-conducted experiment
   2. Treatments, control groups, experimental units, random assignments, and replication
   3. Sources of bias and confounding, including placebo effect and blinding
   4. Randomized block design, including matched pairs design
D. Generalizability of results and types of conclusions that can be drawn from observational studies, experiments, and surveys

Newspaper/Magazine/Web Articles
Answer the following questions:
1. Was this an experiment or an observational study?
2. What were the explanatory and response variables?
3. What was the sample size?
4. If it is an observational study, is there a confounding effect lurking in the background? If so, what is it?
5. If it was an experiment,
   (a) Was there a control group? Was there a placebo?
   (b) Was it run blind?
   (c) Was it run double blind?
   (d) What factors were used and at what levels were they used?
6. What was the conclusion of the study? Do you believe it? Why or why not?

An Exercise in Sampling: Rolling Down the River
A farmer has just cleared a new field for corn. It is a unique plot of land in that a river runs along one side. The corn looks good in some areas of the field but not others. The farmer is not sure that harvesting the field is worth the expense. He has decided to harvest 10 plots and use this information to estimate the total yield. Based on this estimate, he will decide whether to harvest the remaining plots.

Part I.
A. Method Number One: Convenience Sample
The farmer began by choosing 10 plots that would be easy to harvest. They are marked on the grid below:

Since then, the farmer has had second thoughts about this selection and has decided to come to you (knowing that you are an AP statistics student, somewhat knowledgeable, but far cheaper than a professional statistician) to determine the approximate yield of the field. You will still be allowed to pick 10 plots to harvest early.
Your job is to determine which of the following methods is the best one to use and to decide if this is an improvement over the farmer's original plan.

B. Method Number Two: Simple Random Sample
Use your calculator or a random number table to choose 10 plots to harvest. Mark them on the grid below, and describe your method of selection.

C. Method Number Three: Stratified Sample
Consider the field as grouped in vertical columns (called strata). Using your calculator or a random number table, randomly choose one plot from each vertical column and mark these plots on the grid.

D. Method Number Four: Stratified Sample
Consider the field as grouped in horizontal rows (also called strata). Using your calculator or a random number table, randomly choose one plot from each horizontal row and mark these plots on the grid.

OK, the crop is ready. Below is a grid with the yield per plot. Estimate the average yield per plot based on each of the four sampling techniques.

Observations:
1) You have looked at four different methods of choosing plots. Is there a reason, other than convenience, to choose one method over another?
2) How did your estimates vary according to the different sampling methods you used?
3) Compare your results to someone else in the class. Were your results similar?
4) Pool the results of all students for the mean yields from the simple random samples and make a class boxplot. Repeat for means from vertical strata and from horizontal strata. Compare the class boxplots for each sampling method. What do you see?
5) Which sampling method should you use? Why do you think this method is best?
6) What was the actual yield of the farmer's field? How did the boxplots relate to this actual value?

Part II:
The farmer was very impressed with the results of your study and seeks to improve the yield of that part of the field the following year. Believing that irrigation is the answer, a new system was installed.
The following year's yield was:

Redo your sampling using an SRS, vertical stratification, and horizontal stratification. Be certain to mark on the grids the plots you choose.
A. Simple Random Sample:
B. Stratified Sample (vertically):
C. Stratified Sample (horizontally):

Observations:
1) Compare the class boxplots of the sample means obtained from the SRS and the two methods of stratified sampling.
2) Based on the results of both activities, under what conditions is it more useful to use stratified sampling?
3) Based on the results of both activities, under what conditions is it more useful to use a simple random sample?

Teacher Notes for Rolling Down The River
The purpose of this exercise is to allow students to see the effects of different methods of sampling in different situations.

Part I.
A. Convenience Sample
This is rarely a good choice. Although it is attractive to the farmer to harvest the plots as easily as possible, it often leads to large bias in the result.

B. Simple Random Sample
With simple random sampling, all possible sets of 10 plots have an equal chance of being selected. By using this impartial selection method, higher yield plots should be balanced out by lower yield plots. However, since every combination of plots is possible, it may happen that all of the selected plots have a high yield or that all of the selected plots have a low yield. Thus there is large variability in the sample statistic.

C/D. Stratification
When there is some factor that can influence or affect the response (in this case the river has an effect on the yield), using a stratified sample should reduce some of the variability in the means of repeated samples. However, it is necessary to choose the strata correctly. Strata should be constructed so that within each stratum the data are very similar (homogeneous), while the strata differ from one another as much as possible (heterogeneous).
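The effect described here can be seen in a small simulation. The Python sketch below uses an entirely hypothetical 10 x 10 field (not the activity's actual yields) in which yield depends mostly on the column, standing in for distance from the river. It then compares how much the sample mean varies across repeated samples for an SRS of 10 plots, one plot per vertical column, and one plot per horizontal row. All names, the yield formula, and the seed are assumptions made for illustration.

```python
import random
import statistics

ROWS, COLS = 10, 10  # hypothetical 10 x 10 grid of plots


def make_field(rng):
    """Hypothetical yields that depend mostly on the column (distance from the river)."""
    return [[10 * col + rng.randint(1, 9) for col in range(COLS)] for _ in range(ROWS)]


def srs_mean(field, rng):
    """Mean yield of a simple random sample of 10 plots."""
    plots = [(r, c) for r in range(ROWS) for c in range(COLS)]
    return statistics.mean([field[r][c] for r, c in rng.sample(plots, 10)])


def column_strata_mean(field, rng):
    """Mean yield when one plot is chosen at random from each vertical column."""
    return statistics.mean([field[rng.randrange(ROWS)][c] for c in range(COLS)])


def row_strata_mean(field, rng):
    """Mean yield when one plot is chosen at random from each horizontal row."""
    return statistics.mean([field[r][rng.randrange(COLS)] for r in range(ROWS)])


def spread_of_sample_means(sampler, field, rng, repeats=2000):
    """Standard deviation of the sample mean over many repeated samples."""
    return statistics.stdev([sampler(field, rng) for _ in range(repeats)])


if __name__ == "__main__":
    rng = random.Random(2013)
    field = make_field(rng)
    samplers = [("SRS", srs_mean),
                ("column strata", column_strata_mean),
                ("row strata", row_strata_mean)]
    for name, sampler in samplers:
        sd = spread_of_sample_means(sampler, field, rng)
        print(f"{name:>14}: SD of sample means = {sd:.2f}")
```

In this made-up field the column strata are homogeneous, so their sample means should vary far less than those from the SRS, while the row strata, which each span every column, should offer no such gain.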
The farmer should be consulted as to the direction of the strata. His experience would determine the best approach.

Note: The data were purposely set up so that the effects of proper stratification would be startling. This does not mean to suggest that a crop grown near a river would necessarily result in such a large difference in yields.

Observations / Answer Key:
1) One needs to choose a method that will give the best estimate of the yield. This can be affected by factors that cannot be controlled, e.g., the placement of the river. That's why one shouldn't choose the ten plots chosen by the farmer.
2) The student will see that the farmer's sample yields a very low estimate compared to the other methods used.
3) Comparing results with a peer helps the student verify that the sampling was done correctly. This does not mean the students will have the same sample, but each student should use the same process of drawing a sample for a given method. Some methods will produce highly variable results while others are much more consistent.
4) The variability of the means of the sample yields, as shown by the length of the boxplot and the width of the middle 50%, will reduce drastically once the student has stratified appropriately. Thus the strata that are effective are the vertical ones, in which the values in each stratum are similar. This stratification reduces the variation in the sample means since the values chosen for a particular stratum vary little from sample to sample relative to the variability in the population.
5) Vertical stratification should be used since the sample would then include higher yielding plots as well as lower yielding ones.
6) The actual yield is 5004. The class boxplot for the means resulting from the vertical stratification should be centered near 5004/100 or about 50.

Part II.
Observations / Answer Key:
1) Since the river effect has been cancelled out by the irrigation process, there is no discernible pattern in the yield (in effect, the data are randomly distributed). Therefore, there should be no improvement in using a stratified random sample over an SRS. The boxplots should be centered near the total yield divided by 100 (7603/100 or approximately 76).
2) It is more useful to stratify when one suspects that there is some outside factor affecting the response variable.
3) It is more useful to use an SRS when there is no reason to stratify; that is, when there is no reason to expect that an outside factor is affecting the response variable. It certainly is easier and is often less expensive to use an SRS.

Helicopters

Question: Do long-rotor paper helicopters take a different length of time to fall, on average, than short-rotor paper helicopters?

Materials:
1. Paper to construct helicopters – both long rotor and short rotor
2. Scissors
3. Stapler
4. Paper clip
5. Stopwatch

Procedure:
1. Construct one long-rotor helicopter and one short-rotor helicopter (see diagram and instructions).
2. Each type of helicopter will be dropped 10 times from the ceiling to the floor.

Team:
Trial Number    Type of Rotor (Long or Short)    Time (sec.)    Dropper    Timer
(one row for each of trials 1 through 20)

Helicopter Construction Instructions
1. Cut out the rectangular shape of the helicopter on the solid lines.
2. Cut one-third of the way in from each side of the helicopter to the vertical dashed lines on the solid line.
3. Fold both sides toward the center, creating the base. The base can be stapled at the top and bottom. Try to be consistent about where the staples are placed. Use a paper clip to add some weight to the body.
4. For long-rotor helicopters, cut down from the top along the solid center line to the horizontal dashed line.
5. For short-rotor helicopters, proceed as in step 4, but cut the rotors off along the horizontal line marked.
6.
Fold the rotors in opposite directions.

I. Graphical Analysis
1. Using the long- and short-rotor helicopter descent times, construct an appropriate display to compare the flight times.
2. Calculate appropriate summary statistics for each of the two sets of data.
3. Using the graphical and numerical information, compare the shape, center, and spread of the distributions, including outliers and any other unusual features.

II. Inferential Statistics
1. What is the population about which inference can be made?
2. What is the appropriate inference procedure for comparing the mean descent times for the two helicopters? (e.g., paired t-test, two-independent-sample t-test, z-test, etc.)
3. State and justify the assumptions necessary to apply this inference procedure.
4. Construct a 95% confidence interval for the difference in the mean descent times of the two helicopters. Discuss and interpret the meaning of the confidence interval.
5. Use an appropriate significance test to determine whether or not there is a difference in the true mean descent times of the two helicopters.
6. How do the results of the test of significance relate to the observations you made using your confidence interval?

Multiple Choice Practice (questions 1-3)

Free Response Practice
2010 FR#1 – Bird Deterrent
1997 FR#2 – Fish Tanks
2008 FR#2 – School Board Survey

Anticipating Patterns

Topic III – Anticipating Patterns
Exploring random phenomena using probability and simulation (20%-30%)

Probability is the tool used for anticipating what the distribution of data should look like under a given model.

A. Probability
1. Interpreting probability, including long-run relative frequency interpretation
2. "Law of Large Numbers" concept
3. Addition rule, multiplication rule, conditional probability, and independence
4. Discrete random variables and their probability distributions, including binomial and geometric
5.
Simulation of random behavior and probability distributions
6. Mean (expected value) and standard deviation of a random variable and linear transformation of a random variable

B. Combining independent random variables
1. Notion of independence versus dependence
2. Mean and standard deviation for sums and differences of independent random variables

C. The normal distribution
1. Properties of the normal distribution
2. Using tables of the normal distribution
3. The normal distribution as a model for measurements

D. Sampling distributions
1. Sampling distribution of a sample proportion
2. Sampling distribution of a sample mean
3. Central Limit Theorem
4. Sampling distribution of a difference between two independent sample proportions
5. Sampling distribution of a difference between two independent sample means
6. Simulation of sampling distributions
7. t-distribution
8. Chi-square distribution

Why Do We Really Buy Cereal?

1. Inside each box of Rice Krispies is a toy guitar, and there are six colors of guitars available. Second-grader Emma Grace really needs to get the red one. How many boxes does Emma Grace's Grampa have to buy in order for Emma Grace to get her wish?
2. Assuming Kellogg's is not trying to hoard any specific color (we'll assume the colors are randomly distributed), we may simulate this situation by randomly generating six colors.
3. Assign each color an integer from one to six. (Let's say red is four.)
4. Using a die, a random number table, or the TI-84, simulate this situation by generating random integers from one to six. When you get a four, circle it.
5. On the average, how many boxes did it take to get the red one?
6. Now Emma Grace has decided that what she really needs is to get all the guitars. How many boxes does Emma Grace's Grampa have to buy in order for Emma Grace to get her latest wish?

Weird Dice
1999 AP Statistics FR#5

Die A has four 9's and two 0's on its faces. Die B has four 3's and two 11's on its faces.
When either of these dice is rolled, each face has an equal chance of landing on top. Two players are going to play a game. The first player selects a die and rolls it. The second player rolls the remaining die. The winner is the player whose die has the higher number on top.

Suppose you are the first player and you want to win the game. Which die would you select? Justify your answer.

Simulate the game by creating and rolling the "Weird" dice.

Winner    Tally
Die A
Die B

Simulation with the TI-83+

• Die A will be stored in L1.
  {0,0,9,9,9,9} → L1
• Die B will be stored in L2.
  {3,3,3,3,11,11} → L2
• Roll Die A (select a random value from L1).
  L1(randInt(1,6))
  The result will be a random value from L1, either 0 or 9.
• Roll Die B (select a random value from L2).
  L2(randInt(1,6))
  The result will be a random value from L2, either 3 or 11.
• Compare Die A with Die B. (Does Die A beat Die B?)
  L1(randInt(1,6)) > L2(randInt(1,6))
  The result will be true (1) or false (0).
• Simulate the comparison n times (use 100 as an example).
  seq(L1(randInt(1,6)) > L2(randInt(1,6)), X, 1, 100, 1)
  The result will be a list of 100 1's and 0's representing A beating B and B beating A.
• Determine the number of times Die A beats Die B.
  sum(seq(L1(randInt(1,6)) > L2(randInt(1,6)), X, 1, 100, 1))
  The result will be the number of times Die A beats Die B.
• Estimate the probability of Die A beating Die B.
  sum(seq(L1(randInt(1,6)) > L2(randInt(1,6)), X, 1, 100, 1))/100

Law of Large Numbers

Flip a thumbtack 25 times. Record the results in the following table. (Tip Up – 1, Tip Down – 0)

Trial Number    Tip Up [1], Tip Down [0]
(one row for each of trials 1 through 25)

Using your calculator:
• Assign the flip numbers to List 1 (L1).
• Record your results in List 2 (L2). (Enter the data.)
• Assign the cumulative sum to List 3 (L3).
• Assign the probability to List 4 (L4).
• Make a lineplot of (trial number, probability).
seq(X, X, 1, 25, 1) → L1
cumSum(L2) → L3
L3 / L1 → L4

The Central Limit Theorem

• Store 0 in rand.
  0 → rand
  [This seeds the random number generator and allows everyone to have the same population.]
• Create a population.
• Determine the characteristics of your population.
  Shape (Histogram)
  Center – Mean μ
  Spread – Standard Deviation σ
• Run PRGM CLT
  o Population is in L1
  o Number of Samples is ___________.
  o Sample Size is ___________.
  o Your population is stored in LPOP, samples are temporarily stored in LTEMP, and sample means x̄ are stored in LXBAR.
• Determine the characteristics of your sample means.
  Shape (Histogram)
  Center – Mean μ_x̄
  Spread – Standard Deviation σ_x̄
• Is there a relationship between the characteristics of the population and the characteristics of the sample means?
• Determine the characteristics of σ_x̄.
  o Add your σ_x̄ in the row corresponding to your sample size.

    Sample Size (n)    Standard Deviation of Sample Means (σ_x̄)

  o Draw a scatterplot of (n, σ_x̄).
  o What relationship seems to exist?
  o What type of transformation will make this relationship linear?

Multiple Choice Practice (questions 1-5)

Shirt Sizes
2003 AP STATISTICS FR QUESTION #3

Men's shirt sizes are determined by their neck sizes. Suppose that men's neck sizes are approximately normally distributed with mean 15.7 inches and standard deviation 0.7 inch. A retailer sells men's shirts in sizes S, M, L, and XL, where the shirt sizes are defined in the table below.

Shirt Size    Neck Size
S             14 ≤ neck size < 15
M             15 ≤ neck size < 16
L             16 ≤ neck size < 17
XL            17 ≤ neck size < 18

(a) Because the retailer only stocks the sizes listed above, what proportion of customers will find that the retailer does not carry any shirts in their sizes?
(b) Find the proportion of men whose shirt size is M.
(c) Of 12 randomly selected customers, what is the probability that exactly 4 will request size M?
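For readers without the calculator program, the Central Limit Theorem activity earlier in this section can be sketched in Python. This is a minimal sketch under assumptions, not a port of PRGM CLT: the right-skewed population, the sample sizes, and the function names are all hypothetical, and samples are drawn with replacement.

```python
import math
import random
import statistics

def make_population(size=200, seed=0):
    """HYPOTHETICAL right-skewed population (not the calculator's LPOP)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(size)]

def sample_means(population, n, num_samples, rng):
    """Means of num_samples random samples of size n, drawn with replacement."""
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(num_samples)]

def sd_of_sample_means(population, n, num_samples=2000, seed=1):
    """Simulated standard deviation of the sample mean for sample size n."""
    rng = random.Random(seed)
    return statistics.pstdev(sample_means(population, n, num_samples, rng))

population = make_population()
sigma = statistics.pstdev(population)  # population standard deviation
for n in (4, 16, 64):
    simulated = sd_of_sample_means(population, n)
    print(f"n={n:3d}  simulated SD={simulated:.3f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}")
```

Printing the simulated standard deviation next to sigma/sqrt(n) for several sample sizes gives the same kind of (n, standard deviation of sample means) pairs the activity asks you to plot.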
Skunk Spinner
2003 AP STATISTICS FR #5 (B)

[Figure: wheel with four equal sectors: a skunk, 100, 200, and 500]

Contestants on a game show spin a wheel like the one shown in the figure above. Each of the four outcomes on this wheel is equally likely and outcomes are independent from one spin to the next.
• The contestant spins the wheel.
• If the result is a skunk, no money is won and the contestant's turn is finished.
• If the result is a number, the corresponding amount in dollars is won. The contestant can then stop with those winnings or can choose to spin again, and his or her turn continues.
• If the contestant spins again and the result is a skunk, all of the money earned on that turn is lost and the turn ends.
• The contestant may continue adding to his or her winnings until he or she chooses to stop or until a spin results in a skunk.

(a) What is the probability that the result will be a number on all of the first three spins of the wheel?
(b) Suppose a contestant has earned $800 on his or her first three spins and chooses to spin the wheel again. What is the expected value of his or her total winnings for the four spins?
(c) A contestant who lost at this game alleges that the wheel is not fair. In order to check on the fairness of the wheel, the data in the table below were collected for 100 spins of this wheel.

Result       Skunk    $100    $200    $500
Frequency    33       21      20      26

Based on these data, can you conclude that the four outcomes on this wheel are not equally likely? Give appropriate statistical evidence to support your answer.

Free Response Practice
2012 FR#2 – Spin The Pointer

2. A charity fundraiser has a Spin the Pointer game that uses a spinner like the one illustrated in the figure below. A donation of $2 is required to play the game. For each $2 donation, a player spins the pointer once and receives the amount of money indicated in the sector where the pointer lands on the wheel. The spinner has an equal probability of landing in each of the 10 sectors.
(a) Let X represent the net contribution to the charity when one person plays the game once. Complete the table for the probability distribution of X.

x       $2    $1    -$8
P(x)

(b) What is the expected value of the net contribution to the charity for one play of the game?
(c) The charity would like to receive a net contribution of $500 from this game. What is the fewest number of times the game must be played for the expected value of the net contribution to be at least $500?
(d) Based on last year's event, the charity anticipates that the Spin the Pointer game will be played 1,000 times. The charity would like to know the probability of obtaining a net contribution of at least $500 in 1,000 plays of the game. The mean and standard deviation of the net contribution to the charity in 1,000 plays of the game are $700 and $92.79, respectively. Use the normal distribution to approximate the probability that the charity would obtain a net contribution of at least $500 in 1,000 plays of the game.

2010 FR#2 – Radio Station Programming

A local radio station plays 40 rock-and-roll songs during each 4-hour show. The program director at the station needs to know the total amount of airtime for the 40 songs so that time can also be programmed during the show for news and advertisements. The distribution of the lengths of rock-and-roll songs, in minutes, is roughly symmetric with a mean of 3.9 minutes and a standard deviation of 1.1 minutes.

(a) Describe the sampling distribution of the sample mean song length for random samples of 40 rock-and-roll songs.
(b) If the program manager schedules 80 minutes of news and advertisements for the 4-hour (240-minute) show, only 160 minutes are available for music. Approximately what is the probability that the total amount of time needed to play 40 randomly selected rock-and-roll songs exceeds the available airtime?
1998 FR#6 – Pearls

The manager of a cultured pearl farm has received a special order for two pearls between 7 millimeters and 9 millimeters in diameter. From past experience, the manager knows that the pearls found in his oyster bed have diameters that are normally distributed with a mean of 8 millimeters and a standard deviation of 2 millimeters. Assume that every oyster contains one pearl.

The manager wants to know how many oysters he should expect to open to find two pearls of the appropriate size for this special order. Complete the following parts to design a simulation to answer the manager's question.

(a) Determine the probability of finding a pearl of the appropriate size in an oyster selected at random. (Express this probability as a number between 0 and 1. Round this probability to the nearest tenth.)
(b) Describe how you would use a table of random digits to carry out a simulation to determine the number of oysters needed to find two pearls of the appropriate size. Include a description of what each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 will represent in your simulation.
(c) Perform your simulation 3 times. (That is, run 3 trials of your simulation.) Start at the upper leftmost digit in the first row of the table and move across. Make your procedure clear so that someone can follow what you did. You must do this by marking directly on or above the table.

48747 51269 58249 22684 77576 97842 76595 87073 80993 02409 74872 48327 32588 73694 52010 37565 57431 37976 38392 97751 88856 52457 29251 81333 84422 17857 23882 01257 77848 10264 80016 52352 73613 40615 98037 37890 21392 57648 63910 81230 71950 22930 47051 09596 38561 22494 43776 63016 10241 69580 00369 10503 73572 03413 06181

(d) The results of two 100-trial simulations, one searching for two pearls between 7 millimeters and 9 millimeters and the other searching for two pearls between 4 millimeters and 6.5 millimeters, are shown below.
Identify which distribution, A or B, represents the search for two pearls between 7 millimeters and 9 millimeters. Explain your reasoning.
(e) Use the appropriate distribution in part (d) to compute an estimate of the expected number of oysters opened to find two pearls between 7 millimeters and 9 millimeters in diameter.

Statistical Inference

Topic IV – Statistical Inference
Estimating population parameters and testing hypotheses (30%-40%)

Statistical inference guides the selection of appropriate models.

A. Estimation (point estimators and confidence intervals)
1. Estimating population parameters and margins of error
2. Properties of point estimators, including unbiasedness and variability
3. Logic of confidence intervals, meaning of confidence level and intervals, and properties of confidence intervals
4. Large sample confidence interval for a proportion
5. Large sample confidence interval for the difference between two proportions
6. Confidence interval for a mean
7. Confidence interval for the difference between two means (unpaired and paired)
8. Confidence interval for the slope of a least-squares regression line

B. Tests of Significance
1. Logic of significance testing, null and alternative hypotheses; p-values; one- and two-sided tests; concepts of Type I and Type II errors; concept of power
2. Large sample test for a proportion
3. Large sample test for a difference between two proportions
4. Test for a mean
5. Test for a difference between two means (unpaired and paired)
6. Chi-square test for goodness of fit, homogeneity of proportions, and independence (one- and two-way tables)
7. Test for the slope of a least-squares regression line

What Percentage of the Earth's Surface is Water?

Make a small dot on your thumb. Toss the globe beach ball from person to person, and each time you catch it, note whether your dot lands on water or land.
Null Hypothesis:
Alternative Hypothesis:
Test Statistic:
Conditions:
Decision Rule:
Sample Data:
Water    Land

Female Mathematicians

A company has 11 mathematicians on its staff, 3 of whom are women. The president of the company is concerned about the small number of women mathematicians. The president learns that about 40 percent of the mathematicians in the United States are women, and asks you to investigate whether or not the number of women mathematicians in the company is consistent with the national pool.

"ALL ANIMALS ARE EQUAL, BUT SOME ANIMALS ARE MORE EQUAL THAN OTHERS."
Statistical Inference with Barnum's Animal Crackers

Questions:
• How many types of animals are there?
• How many animals are in a box?

Null Hypothesis:
Alternative Hypothesis:
Test Statistic:
Conditions:

Inference With Animal Crackers

Animal                 Number
Bear (on all fours)
Bear (standing)
Bison
Camel
Cougar
Elephant
Giraffe
Gorilla
Hippopotamus
Hyena
Kangaroo
Koala
Lion
Monkey
Rhinoceros
Seal
Sheep
Tiger
Zebra

Inference with Animal Crackers

Column headings: Bear (on all fours), Bear (standing), Bison, Camel, Cougar, Elephant, Giraffe, Gorilla, Hippo, Hyena, Kangaroo, Koala, Lion, Monkey, Rhino, Seal, Sheep, Tiger, Zebra

Distracted Driving

Are drivers more distracted when using a cell phone than when talking to a passenger in the car? Researchers wanted to find out, so they designed an experiment. Here are the details.

In a study involving 48 people, 24 people were randomly assigned to drive in a driving simulator while using a cell phone. The remaining 24 were assigned to drive in the driving simulator while talking to a passenger in the simulator. Part of the driving simulation for both groups involved asking drivers to exit the freeway at a particular exit. In the study, 7 of the 24 cell phone users missed the exit, while 2 of the 24 talking to a passenger missed the exit. (from the 2007 AP Statistics exam, question 5)

• Let's start by summarizing the data from this study.
Each of the 48 people in the experiment can be classified into one of the four cells in the table below based on the experimental condition to which they were assigned and whether they missed the designated exit. Use information from the previous paragraph to complete the table.

                        Distraction
Missed exit?    Cell phone    Passenger
Yes
No

To analyze data, we begin by making one or more graphs.

[Figure: two Excel bar graphs, each titled "Which is more distracting?". One plots frequency and the other plots percent, by type of distraction (cell phone, passenger), with the categories "Missed exit" and "Didn't miss exit".]

• Two types of Excel bar graphs are shown above. Explain the difference in what the two graphs display. Then tell which one you prefer and why.

Next, we add numerical summaries. We might be interested in comparing the counts, percents, or proportions of people in the two groups who missed the freeway exit.

• Fill in the missing entries in the table below for the passenger group.

Missed exit         Number    Proportion    Percent
Cell phone group    7         0.292         29.2
Passenger group

In the distracted driving experiment, 29.2% of the 24 drivers talking on cell phones missed the freeway exit, compared with only 8.3% of the 24 drivers who were talking to passengers. This seems like a pretty large difference: almost 21 percentage points higher for the drivers who used cell phones. Researchers might be tempted to conclude that the different experimental conditions (talking on a cell phone and talking to a passenger) actually caused the observed difference in the percent of drivers who missed the freeway exit.

There is another possibility, however. Suppose that the two experimental conditions (talking on a cell phone and talking to a passenger) actually have the same effect on drivers' distraction. In that case, the 9 people in this experiment who missed the freeway exit would have done so no matter which group they were assigned to.
Likewise, the 39 people who did not miss the exit would have had the same result whether they talked on a cell phone or to a passenger. This leads us to the other possibility: if the two experimental conditions actually have the same effect on drivers' distraction, then the difference in the percents that missed the exit in the two groups could simply have been due to chance. That is, the difference could be a result of which 24 people just happened to be assigned to each group. In the next activity, you will examine whether this second possibility seems plausible.

Activity: Could the observed difference be due to the chance assignment of people to groups?

Materials: Standard deck of playing cards for each group of 3-4 students

What would happen if we reassigned the 48 people in this experiment to the cell phone and passenger groups many times, assuming that the group assignment had no effect on whether each driver missed the exit? Let's try it and see.

1. Get a standard deck of playing cards from your teacher. Make sure that your deck has 52 cards, not including jokers.
2. We need 48 cards to represent the 48 drivers in this study. In the original experiment, 9 people missed the exit and 39 people didn't miss the exit. If the group assignment had no effect on drivers' distraction, these results wouldn't change if we reassigned 24 people to each group at random. For a physical simulation of these reassignments, we need 9 cards to represent the people who will miss the exit and 39 cards to represent the people who won't miss the exit. With your group members, discuss which cards should represent which outcomes. When you have settled on a plan, designate one member of your group to share your plan with the class.
3. After each group presents its plan, the class as a whole will decide which plan to use. Record the details here.
4. Now you're ready to simulate the process of reassigning people to groups.
"Shuffle up and deal" two piles of 24 cards, the first pile representing the cell phone group and the second pile representing the passenger group. Record the number of drivers who missed the exit in each group.
5. Repeat this process 9 more times so that you have a total of 10 trials. Record your results in the table provided.

Trial    Number who missed exit in cell phone group    Number who missed exit in passenger group
(one row for each of trials 1 through 10)

In the original experiment, 7 of the 24 drivers using cell phones missed the freeway exit, compared to only 2 of the 24 drivers who were talking to a passenger. How surprising would it be to get a difference this large or larger simply due to chance if the effects of the two experimental conditions on drivers' distraction were actually the same? You can estimate the chance of this happening with the results of your simulation.

6. In how many of your 10 simulation trials did 7 or more drivers in the cell phone group miss the exit? Why don't you need to consider the number of people in the "talking to a passenger" group who missed the exit?
7. Combine results with your classmates. In what percent of the class's simulation trials did 7 or more people in the cell phone group miss the freeway exit?
8. Based on the class's simulation results, do you think it's possible that cell phones and passengers are equally distracting to drivers, and that the difference observed in the original experiment could have been due to the chance assignment of people to the two groups? Why or why not?

Here are the results of 1000 trials of a computer simulation, like the one you did with the playing cards, showing the number of drivers who missed the exit in the cell phone group.

9. In the computer simulation, how often did 7 or more drivers in the cell phone group miss the exit when there is no difference in the effects of the experimental conditions?
Do you think the results of the original experiment could be due to chance and not to a difference in the effects of cell phone use and talking to a passenger on driver distraction? Explain your reasoning.

Inference and Hypothesis Testing

                          The Truth
Decision                  H0 True    H0 False
Fail to Reject H0
Reject H0

M & M Statistics

Is the color distribution of M&Ms homogeneous across different types?

Type              Brown    Yellow    Red    Blue    Orange    Green
Milk Chocolate
Peanut
Peanut Butter

Null Hypothesis:
Alternative Hypothesis:
Test Statistic:
Conditions:
Decision Rule:

Multiple Choice Practice (questions 1-5)

Free Response Practice

2012 FR#4 – Television Commercials

A survey organization conducted telephone interviews in December 2008 in which 1,009 randomly selected adults in the United States responded to the following question.

At the present time, do you think television commercials are an effective way to promote a new product?

Of the 1,009 adults surveyed, 676 responded "yes." In December 2007, 622 of 1,020 randomly selected adults in the United States had responded "yes" to the same question. Do the data provide convincing evidence that the proportion of adults in the United States who would respond "yes" to the question changed from December 2007 to December 2008?

2010 FR#5 – Fish Lengths
2009 FR#5 – Heart Attack Response
2006B FR#4 – Manual Dexterity
2006B FR#6 – Manual Dexterity
2010 FR#6 – Hurricane Damage
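The card-shuffling simulation from the Distracted Driving activity can also be sketched in Python. This is a minimal sketch rather than the handout's own computer simulation: the counts (9 misses, 48 drivers, 24 per group, 7 observed) come from the study description, while the function names, the seed, and the number of trials are assumptions made for illustration. The exact hypergeometric calculation is added here only as a cross-check on the simulation.

```python
import math
import random

MISSED, TOTAL, GROUP = 9, 48, 24   # counts taken from the study description

def simulate_cell_phone_misses(rng):
    """Shuffle the 48 'cards' and count misses among the first 24 dealt."""
    cards = [1] * MISSED + [0] * (TOTAL - MISSED)   # 1 = missed the exit
    rng.shuffle(cards)
    return sum(cards[:GROUP])                        # cell phone pile

def estimated_chance(trials=10000, observed=7, seed=2013):
    """Fraction of reshuffles with `observed` or more misses in the cell phone pile."""
    rng = random.Random(seed)
    hits = sum(simulate_cell_phone_misses(rng) >= observed
               for _ in range(trials))
    return hits / trials

def exact_chance(observed=7):
    """Exact hypergeometric probability of `observed` or more misses."""
    total = math.comb(TOTAL, GROUP)
    favorable = sum(math.comb(MISSED, k) * math.comb(TOTAL - MISSED, GROUP - k)
                    for k in range(observed, MISSED + 1))
    return favorable / total

print(f"simulated: {estimated_chance():.3f}   exact: {exact_chance():.3f}")
```

Only the cell phone pile is counted because the passenger pile is determined by it: the two piles always contain 9 misses in total.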
