24 JULY 2016
MATH 311
LAB 2 – SCATTERPLOTS, REGRESSION, GENERATING DATA & NORMALITY TEST
DUE: TUESDAY, 8 APRIL AT 3:00 P.M.

Scatterplots and regression:

Let’s begin with some data. A study of the development of young children recorded the age in months at which each of 21 children spoke their first word and their performance later in life on the Gesell Adaptive Test (it’s kinda like an IQ test for kids). The results are in the table below and are also available at the class data page:

Child  Age (months)  Gesell Score      Child  Age (months)  Gesell Score
  1         15             95            12         9             96
  2         26             71            13        10             83
  3         10             83            14        11             84
  4          9             91            15        11            102
  5         15            102            16        10            100
  6         20             87            17        12            105
  7         18             93            18        42             57
  8         11            100            19        17            121
  9          8            104            20        11             86
 10         20             94            21        10            100
 11          7            113

Question 1: Make a scatterplot of these data. To do so, select Graph>Scatterplot>Simple. Remember, we are using the age at which the first word is spoken to predict later performance on the Gesell test. So enter the response variable (you have to determine which is the response variable) in the Y variable box and the explanatory variable in the X variable box. Lastly, select OK. Include this graph in your report.

Question 2: Looking only at the scatterplot and before performing any further statistical analysis, describe the relationship these two variables share. Use the correct statistical language! We will make your impressions more precise throughout this worksheet.

Question 3: Compute the correlation coefficient (r) of these data. To have MINITAB compute the correlation coefficient, select Stat>Basic Statistics>Correlation from the menu. A dialogue box will then open. Click on the Variables box and then double click on AGE and SCORE in the window on the left. Then select OK. The “Pearson correlation of Age and Score” value is r. Record this value of r in your report. How does the value of r correspond with your impressions in Question 2? MINITAB also reports a p-value; disregard this for now.

Question 4: Now produce a regression plot.
To get MINITAB to do this, select Stat>Regression>Fitted Line Plot from the menu. Remember, we are trying to predict the youngster’s score on the Gesell test later in life from the age at which they first speak, so make the appropriate choices for the explanatory and response variables. Of course you should also include this graph in your report.

Question 5: Use the equation given by MINITAB to predict the Gesell score of a kid who spoke her first words at 30 months by plugging 30 in for AGE and solving for SCORE. Include your answer in your report.

The rest of this lab is devoted to learning valuable skills which are not to be included in your write-up. Consequently, you may wish to close the current Minitab worksheet and open a nice new (and clean) one.

Generating Data

Minitab lives to generate data: you pick the type of distribution the data should have and Minitab does all the work. For example, let’s suppose you want to generate 10,000 data points of normally distributed data with a mean of 6 and a standard deviation of 1.8. To do so, use the following commands:

Calc>Random Data>Normal
Enter 10000 in the Number of rows of data to generate: box
Enter C1 in the Store in column(s): box
Enter 6 in the Mean: box
Enter 1.8 in the Standard deviation: box
Select OK

Pretty cool, huh? Make a histogram of your data to check it.

Repeat the above with uniformly (instead of normally) distributed data (min of –3, max of 7). Do this by selecting Calc>Random Data>Uniform. Make a histogram of the data to check the distribution.

You can generate data more than one column at a time. If you wanted to generate, say, 17 columns of normally distributed data, you would enter C1-C17 in the Store in column(s): box.

How Normal Is Normal?

Now let’s suppose you’re given data to analyze. Like any good junior statistician, you wonder about the shape, center, and spread of these data. Towards that end, you construct a histogram of the data (how else would you be able to describe the shape?)
and notice that the histogram is symmetric and single-peaked. You think it might be normal, but how can you tell for sure? Being bell-shaped isn’t enough. There are plenty of non-normal bell-shaped distributions out there. We need a test!

How do we test data for normality? One option would be to compute the mean and standard deviation of the data, determine how much of the data lies within 1 standard deviation of the mean, 2 standard deviations, 3 standard deviations, etc., and then compare this to the percentages listed in Table A. OR we let Minitab do all the work!

Have Minitab generate 10,000 rows of normally distributed data (you just learned how to do this above!). Then use the following commands:

Graph>Probability Plot
Select Single
In the Graph Variables: window, enter the column in which the data you’re testing lives (probably C1).
Select OK

See how the red dots pretty much follow the blue lines? That tells you that the data are normally distributed.

Now repeat the above procedure with 10,000 rows of uniformly distributed data (Calc>Random Data>Uniform). The red dots won’t follow the blue lines towards the ends. That’s how you know that the data are not normally distributed.

Note: the red dots are close to the line in the center. This is the case for nearly any data set, so it tells you nothing about normality. You need to look at the ends, not the center, of the line.

Try it again with any distribution besides Normal or Uniform from the Random Data menu. See the pattern?

Red dots on the line at the ends = normal
Red dots not on the line at the ends = not normal
Red dots kinda on the line at the ends = kinda normal

Note: even when the distribution is normal, the dots rarely lie EXACTLY on the line at the extreme ends.

Also notice the box to the right of the graph. Of particular interest is the P-Value. Typically, if the P-Value > 0.05, then we can assume that the data are normally distributed; at least, we don’t have evidence that they’re not normally distributed.
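If you’re curious what the first option (counting how much data lies within 1, 2, and 3 standard deviations of the mean) looks like in practice, here is a minimal sketch in Python. It is not part of the Minitab workflow. It assumes Python 3.8 or later and uses only the standard library. The seed, the variable names, and the helper fraction_within are made up for illustration, and it assumes Table A is a standard normal table, so the expected fractions come from the standard normal curve.

```python
import random
from statistics import NormalDist, fmean, stdev


def fraction_within(data, k):
    """Fraction of data within k sample standard deviations of the sample mean."""
    center = fmean(data)
    spread = stdev(data)
    return sum(abs(x - center) <= k * spread for x in data) / len(data)


# Arbitrary fixed seed (an assumption) so the run is repeatable.
rng = random.Random(311)

# Same settings as the Minitab examples: 10,000 rows each.
normal_data = [rng.gauss(6, 1.8) for _ in range(10_000)]    # mean 6, sd 1.8
uniform_data = [rng.uniform(-3, 7) for _ in range(10_000)]  # min -3, max 7

ks = (1, 2, 3)

# Fractions an exactly normal distribution puts within k standard deviations.
std_normal = NormalDist()
expected = [std_normal.cdf(k) - std_normal.cdf(-k) for k in ks]

normal_fracs = [fraction_within(normal_data, k) for k in ks]
uniform_fracs = [fraction_within(uniform_data, k) for k in ks]

for k, e, n, u in zip(ks, expected, normal_fracs, uniform_fracs):
    print(f"within {k} sd: normal curve {e:.4f}, "
          f"normal sample {n:.4f}, uniform sample {u:.4f}")
```

The normal sample should land close to the normal-curve fractions in every row. The uniform sample should fall well short within 1 standard deviation and have nothing at all beyond 2, which is the same misbehavior at the ends that the probability plot shows. This counting check does not produce a P-Value; for that you still need Minitab’s probability plot.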
More on this later!