Math 311 Minitab Lab 2, Winter 2003

In this lab we will investigate the following question: does the age at which the first word is spoken predict later performance on aptitude tests?

Let’s begin with some data. A study of the development of young children recorded the age in months at which each of 21 children spoke their first word and their performance much later in life on the Gesell Adaptive Test. The results are in the table below and can also be found on the course webpage http://www.cwu.edu/~englundt/Data.htm

Child   Age (months)   Gesell Score
  1         15              95
  2         26              71
  3         10              83
  4          9              91
  5         15             102
  6         20              87
  7         18              93
  8         11             100
  9          8             104
 10         20              94
 11          7             113
 12          9              96
 13         10              83
 14         11              84
 15         11             102
 16         10             100
 17         12             105
 18         42              57
 19         17             121
 20         11              86
 21         10             100

Question 1: Make a scatterplot of this data. To do so, select Graph>Plot. Remember, we are using the age at which the first word is spoken to predict later performance on the Gesell exam. So enter the response variable (you have to determine which is the response variable) in the Y column and the explanatory variable in the X column. Lastly, select OK. Looking only at the scatterplot and before performing any further statistical analysis, does it seem to you that the variables have a strong association? Record your impressions in your report. Specify exactly what kind of association you think these variables have (direction, form, and strength). Of course, you should include the graph in your report. We will make your impressions more precise throughout this worksheet.

Question 2: Compute the correlation (r) of the data. To get MINITAB to compute the correlation, do the following: select Stat>Basic Statistics>Correlation from the menu. A dialogue box will then open. Click on the Variables box and then double click on AGE and SCORE in the window on the left. Then select OK. The “Pearson correlation of Age and Score” value is r. Record this value of r in your report. How does the value of r correspond with your impressions in Question 1?
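If you would like to cross-check the correlation MINITAB reports in Question 2, the following is a minimal sketch in Python, which is not part of this lab and is offered only as an optional check. It applies the usual definition of the Pearson correlation to the 21 data points listed above. The names age, score, and pearson_r are made up for this illustration.

```python
from math import sqrt

# Data copied from the lab's table, children 1 through 21 in order.
age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7,
       9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
score = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113,
         96, 83, 84, 102, 100, 105, 57, 121, 86, 100]


def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy) (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


r = pearson_r(age, score)
print(f"Pearson correlation of Age and Score = {r:.3f}")
```

The printed value should agree with MINITAB's to the displayed precision. Note that pearson_r(score, age) gives the same number, since correlation does not distinguish explanatory from response variables.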
Question 3: Now produce a regression plot. To get MINITAB to do this, select Stat>Regression>Fitted Line Plot from the menu. Since we are trying to predict a youngster’s score on the Gesell test later in life from the age at which he or she first speaks, choose AGE as the explanatory variable and SCORE as the response variable. Of course, you should include this graph in your report. Next, use the equation given by MINITAB to predict the Gesell score of a kid who spoke her first words at 30 months by plugging 30 in for AGE and solving for SCORE. Include your answer in your report.

Question 4: To hammer home the point that it matters very much which variable we call explanatory, repeat Question 3, only this time make SCORE the explanatory variable and AGE the response variable. Again, substitute 30 for AGE and solve for SCORE in this new equation. How different is this answer from the one obtained in Question 3?

Influential observations: Now we’re going to investigate the impact that the outliers in the data have on the regression line. Notice that the data for Child 18 and Child 19 are not like the others. Child 18 didn’t speak until a much later age than did the other kids. Child 19 scored much higher than her peers on the Gesell. To investigate the impact of these points, we’ll delete them (in an orderly fashion please) from our data set.

Child 18: Copy the data from the Age and Score columns on the MINITAB worksheet and paste it into columns C4 and C5 under the headings Age_1 and Score_1. Now delete Child 18’s data from these columns and have MINITAB plot a regression line for this new set of data. What do you notice? Record your observations. Pay attention to the value of r (and, consequently, r²).

Child 19: Copy the data from the Age and Score columns on the MINITAB worksheet and paste it into columns C7 and C8 under the headings Age_2 and Score_2. Now delete Child 19’s data from these columns and have MINITAB plot a regression line for this new set of data. What do you notice?
Record your observations.

Question 5: Which child’s data seems to have been more influential? That is, which child’s data, when deleted, resulted in the bigger change in the regression line? Refer to the values of r² for each of the three regression lines in your conclusion. Explain why you think this kid’s data is more influential than the other’s.

Question 7: Now that you’ve examined the data, both with and without the outlying data included, do you feel that the age at which a kid speaks his or her first word is an accurate predictor of the kid’s performance later in life on aptitude tests? What bits of data or analysis results would make you feel even more comfortable with your assertion? Make a clear, convincing, and statistically sound argument. Do not simply say “I think so” or “I think not.” Use the concepts and vocabulary learned in class to defend your conclusion.
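The three regression lines compared in the influential-observations section (all 21 children, Child 18 removed, Child 19 removed) can also be checked outside MINITAB. The following is a minimal sketch in Python, offered only as an optional cross-check and not as part of the lab. It uses the standard least-squares formulas with AGE as the explanatory variable and SCORE as the response. The helper names fit_line and drop_child are made up for this illustration.

```python
# Data copied from the lab's table, children 1 through 21 in order.
age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7,
       9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
score = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113,
         96, 83, 84, 102, 100, 105, 57, 121, 86, 100]


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def drop_child(values, child):
    """Return a copy of values without the given child (numbered from 1)."""
    return values[:child - 1] + values[child:]


# One fit per data set, mirroring the Age/Score, Age_1/Score_1,
# and Age_2/Score_2 columns in the worksheet.
fits = {
    "all 21 children": fit_line(age, score),
    "without child 18": fit_line(drop_child(age, 18), drop_child(score, 18)),
    "without child 19": fit_line(drop_child(age, 19), drop_child(score, 19)),
}

for label, (slope, intercept, r_squared) in fits.items():
    print(f"{label}: SCORE = {intercept:.2f} + ({slope:.3f}) * AGE, "
          f"r^2 = {r_squared:.3f}")
```

Comparing how far the slope and r² move in each of the two deletions is one way to quantify "more influential" for Question 5. The interpretation, and the argument asked for in Question 7, is still yours to make.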