24 JULY 2016
MATH 311
LAB 2 – SCATTERPLOTS, REGRESSION, GENERATING DATA & NORMALITY TEST
DUE: TUESDAY, 8 APRIL AT 3:00 P.M.
Scatterplots and regression:
Let’s begin with some data. A study of the development of young children recorded the age in months at
which each of 21 children spoke their first word and their performance later in life on the Gesell Adaptive
Test (it’s kinda like an IQ test for kids). The results are shown in the table below and are also available on the
class data page:
Child   Age (months)   Gesell Score
1       15             95
2       26             71
3       10             83
4       9              91
5       15             102
6       20             87
7       18             93
8       11             100
9       8              104
10      20             94
11      7              113
12      9              96
13      10             83
14      11             84
15      11             102
16      10             100
17      12             105
18      42             57
19      17             121
20      11             86
21      10             100
Question 1: Make a scatterplot of this data. To do so, select
Graph>Scatterplot>Simple.
Remember, we are using the age at which the first word is spoken to predict later performance on the
Gesell Exam. So enter the response variable (you have to determine which is the response variable) under
Y variables and the explanatory variable under X variables. Lastly, select OK.
Include this graph in your report.
Question 2: Looking only at the scatterplot and before performing any further statistical analysis,
describe the relationship these two variables share. Use the correct statistical language!
We will make your impressions more precise throughout this worksheet.
Question 3: Compute the correlation coefficient (r) of these data. To have MINITAB compute the
correlation coefficient, do the following: Select
Stat>Basic Statistics>Correlation
from the menu. A dialogue box will then open. Click on the Variables box and then double click on
AGE and SCORE in the window on the left. Then select OK. The “Pearson correlation of
Age and Score” value is r. Record this value of r in your report. How does the value of r
correspond with your impressions in Question 2?
MINITAB also reports a p-value – disregard this for now.
Question 4: Now produce a regression plot. To get MINITAB to do this, select
Stat>Regression>Fitted Line Plot
from the menu. Remember, we are trying to predict a youngster’s score on the Gesell test later in life
from the age at which they first speak, so make the appropriate choices for the explanatory and response
variables. Of course you should also include this graph in your report.
Question 5: Use the equation given by MINITAB to predict the Gesell score of a kid who spoke her first
words at 30 months by plugging 30 in for AGE and solving for SCORE. Include your answer in your
report.
The rest of this lab is devoted to learning valuable skills which are not to be included in your write-up.
Consequently, you may wish to close the current Minitab worksheet and open a nice new (and clean) one.
Generating Data
Minitab lives to generate data: you pick the type of distribution the data should have and Minitab does
all the work. For example, let’s suppose you want to generate 10,000 normally distributed data points
with a mean of 6 and a standard deviation of 1.8. To do so, use the following commands:
Calc>Random Data>Normal
Enter 10000 in the Number of rows of data to generate box
Enter C1 in the Store in column(s): box
Enter 6 in the Mean: box
Enter 1.8 in the Standard deviation box
Select OK
Pretty cool, huh? Make a histogram of your data to check it.
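Outside Minitab, the same kind of sample takes a few lines. A minimal Python sketch using the standard `random` module; the seed 311 is arbitrary, and the text histogram is a crude stand-in for Minitab's histogram, not a Minitab feature.

```python
import random
import statistics

# Analogue of Calc>Random Data>Normal: 10,000 values with mean 6 and SD 1.8.
rng = random.Random(311)                         # arbitrary seed, for reproducible runs
c1 = [rng.gauss(6, 1.8) for _ in range(10_000)]

# Quick check: the sample mean and SD should land close to 6 and 1.8.
print(f"mean = {statistics.mean(c1):.3f}, sd = {statistics.stdev(c1):.3f}")

def text_histogram(data, bins=12, width=50):
    """Print a crude sideways histogram: one row of '#' per equal-width bin."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        counts[min(int((v - lo) / step), bins - 1)] += 1   # the maximum goes in the last bin
    top = max(counts)
    for k, c in enumerate(counts):
        print(f"{lo + k * step:7.2f} | " + "#" * round(width * c / top))

text_histogram(c1)
```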
Repeat the above with Uniformly (instead of normally) distributed data (min of –3, max of 7). Do this by
selecting Calc>Random Data>Uniform. Make a histogram of the data to check the distribution.
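A matching Python sketch for the uniform case, using `random.uniform(-3, 7)` and a 10-bin tally in place of a histogram (the seed is arbitrary). Roughly equal counts in every bin, a flat histogram, is what uniform data should produce.

```python
import random

rng = random.Random(2016)                        # arbitrary seed
c2 = [rng.uniform(-3, 7) for _ in range(10_000)]

# Tally the values into 10 bins of width 1 covering [-3, 7]; a uniform sample
# should put roughly 1,000 values in each bin.
counts = [0] * 10
for v in c2:
    counts[min(int(v + 3), 9)] += 1              # v + 3 lies in [0, 10]; 10 joins the last bin
for k, c in enumerate(counts):
    print(f"[{k - 3:2d}, {k - 2:2d})  {c:5d}  " + "#" * (c // 25))
```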
You can generate more than one column of data at a time. If you wanted to generate, say, 17 columns of
normally distributed data, you would enter C1-C17 in the Store in column(s): box.
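In Python, a worksheet with columns C1 through C17 can be mimicked with a dictionary of lists. A minimal sketch; the lab does not say which mean and SD to use, so reusing 6 and 1.8 from the earlier example is an assumption, and the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(17)                          # arbitrary seed
# Hypothetical worksheet: columns C1-C17, 10,000 rows each, all drawn from a
# normal distribution with mean 6 and SD 1.8 (assumed; the lab does not specify).
worksheet = {f"C{j}": [rng.gauss(6, 1.8) for _ in range(10_000)] for j in range(1, 18)}

for name, column in worksheet.items():
    print(f"{name:>3s}: n = {len(column)}, mean = {statistics.mean(column):.3f}")
```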
How Normal Is Normal?
Now let’s suppose you’re given data to analyze. Like any good junior statistician, you wonder about the
shape, center, and spread of these data. Towards that end, you construct a histogram of the data (how else
would you be able to describe the shape?) and notice that the histogram is symmetric and single-peaked.
You think it might be normal, but how can you tell for sure? Being bell-shaped isn’t enough. There are
plenty of non-normal bell-shaped distributions out there. We need a test!
How do we test data for normality?
One option would be to compute the mean and standard deviation of the data and then determine how
much of the data lies within 1 standard deviation of the mean, 2 standard deviations, 3 standard
deviations, etc. and then compare this to the percentages a normal distribution gives from Table A
(about 68%, 95%, and 99.7%).
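That first option can be sketched in Python. This minimal version assumes Table A is the standard normal table, so the reference percentages come from `statistics.NormalDist().cdf`; the generated sample is only a stand-in for whatever data you are checking, and the seed is arbitrary.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(68)                                 # arbitrary seed
data = [rng.gauss(6, 1.8) for _ in range(10_000)]      # stand-in for the data being checked

m, s = statistics.mean(data), statistics.stdev(data)
std_normal = NormalDist()                               # mean 0, SD 1
results = {}
for k in (1, 2, 3):
    observed = sum(abs(v - m) <= k * s for v in data) / len(data)
    expected = std_normal.cdf(k) - std_normal.cdf(-k)   # about 0.6827, 0.9545, 0.9973
    results[k] = (observed, expected)
    print(f"within {k} SD: observed {observed:.4f}   normal model {expected:.4f}")
```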
OR
we let Minitab do all the work!
Have Minitab generate 10,000 rows of normally distributed data (you just learned how to do this above!).
Then use the following commands:
Graph>Probability Plot
Select Single
In the Graph Variables: window, enter the column in which the data you’re testing lives (probably C1).
Select OK
See how the red dots are pretty much following the blue lines – that tells you that the data are normally
distributed.
Now repeat the above procedure with 10,000 uniformly distributed data points (Calc>Random Data>Uniform).
The red dots won’t follow the blue lines toward the ends; that’s how you know that the data are
not normally distributed. Note: the red dots are close to the line in the center. This is usually the case
even for data that are not normal, so don’t pay much attention to it. You need to look at the ends, not the center of the line.
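To see numerically why the ends matter more than the center, the sketch below measures how far the smallest, middle, and largest data values sit from the fitted normal line, in standard deviations, for a normal sample and a uniform sample. It is a minimal Python sketch; `end_and_center_gaps` is a hypothetical helper, the plotting position (i − 0.5)/n is an assumption, and the seed is arbitrary. For the uniform sample, the center gap should be small while both end gaps are large.

```python
import random
import statistics
from statistics import NormalDist

def end_and_center_gaps(data):
    """Distance, in SDs, between the data and the fitted normal line at the
    smallest value, the middle value, and the largest value."""
    xs = sorted(data)
    n = len(xs)
    m, s = statistics.mean(xs), statistics.stdev(xs)
    line = NormalDist(m, s)
    def gap(i):
        # i is a 1-based rank; plotting position (i - 0.5) / n is an assumption.
        return abs(xs[i - 1] - line.inv_cdf((i - 0.5) / n)) / s
    return gap(1), gap(n // 2), gap(n)

rng = random.Random(145)                         # arbitrary seed
samples = {
    "normal": [rng.gauss(6, 1.8) for _ in range(10_000)],
    "uniform": [rng.uniform(-3, 7) for _ in range(10_000)],
}
gaps = {}
for name, values in samples.items():
    gaps[name] = end_and_center_gaps(values)
    low, mid, high = gaps[name]
    print(f"{name:8s} low end {low:.2f}   center {mid:.2f}   high end {high:.2f}")
```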
Try it again with any distribution besides Normal or Uniform from the Random Data menu.
See the pattern? Red dots on the line at the ends = normal … red dots off the line at the ends = not normal … red
dots kinda on the line at the ends = kinda normal. Note: even when the distribution is normal, the dots
rarely lie EXACTLY on the line at the extreme ends.
Also notice the box to the right of the graph. Of particular interest is the P-Value.
Typically, if the P-Value > 0.05, then we can treat the data as normally distributed; strictly speaking, we just
don’t have evidence that they are not normally distributed. More on this later!
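Minitab computes that P-Value for you. The statistic listed next to it in the box is labeled AD, for the Anderson-Darling test, in the Minitab versions we have seen; the sketch below assumes that is the test in use. It is a minimal Python version: it computes the Anderson-Darling statistic with the mean and SD estimated from the data, then converts it to a p-value with the D'Agostino and Stephens (1986) approximation, so its numbers may not match Minitab's exactly. `anderson_darling_normal` is a hypothetical helper, not a Minitab or library function, and the exponential sample stands in for "any distribution besides Normal or Uniform".

```python
import math
import random
import statistics

def anderson_darling_normal(data):
    """Anderson-Darling test of normality, mean and SD estimated from the data.

    Returns (A2, p). The p-value uses the D'Agostino and Stephens (1986)
    approximation; Minitab's own computation may differ in detail.
    """
    xs = sorted(data)
    n = len(xs)
    m, s = statistics.mean(xs), statistics.stdev(xs)
    z = [(x - m) / s for x in xs]
    # log Phi(z) and log(1 - Phi(z)) via erfc, which stays accurate far into the tails.
    log_cdf = [math.log(0.5 * math.erfc(-zi / math.sqrt(2))) for zi in z]
    log_sf = [math.log(0.5 * math.erfc(zi / math.sqrt(2))) for zi in z]
    total = sum((2 * i - 1) * (log_cdf[i - 1] + log_sf[n - i]) for i in range(1, n + 1))
    a2 = -n - total / n
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)   # small-sample adjustment
    if a >= 153.467:
        p = 0.0   # the quadratic below turns upward past its vertex; p is effectively 0 here
    elif a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    elif a >= 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    return a2, p

rng = random.Random(152)                         # arbitrary seed
samples = {
    "normal": [rng.gauss(6, 1.8) for _ in range(10_000)],
    "uniform": [rng.uniform(-3, 7) for _ in range(10_000)],
    "exponential": [rng.expovariate(1.0) for _ in range(10_000)],
}
ad_results = {}
for name, values in samples.items():
    a2, p = anderson_darling_normal(values)
    ad_results[name] = (a2, p)
    verdict = "no evidence against normality" if p > 0.05 else "evidence of non-normality"
    print(f"{name:12s} AD = {a2:9.3f}   p = {p:.4f}   {verdict}")
```

A p-value above 0.05 here only says the sample gives no evidence against normality, matching the caution above.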