Uploaded by phana4444hout

AgainstAllOdds FacultyGuide Unit22-Sampling-Distributions

advertisement
Unit 22: Sampling
Distributions
Prerequisites
This unit serves as the transition from descriptive statistics to inferential statistics. The
concepts in this unit are more sophisticated than in earlier units. Students should have
considerable exposure to data before viewing this unit. They need familiarity with the mean
and standard deviation (Unit 4, Measures of Center, and Unit 6, Standard Deviation). Most
importantly, students need background on the normal distribution (Units 7 – 9).
Additional Topic Coverage
Additional coverage of sampling distributions can be found in The Basic Practice of Statistics,
Chapter 11, Sampling Distributions. More in-depth information on x charts can be found in
the Companion Chapters (CD insert), Chapter 17, Statistics for Quality: Control and Capability.
Unit 23, Control Charts, continues the discussion of control charts. There are a number of
applets that allow students to conduct simulations that give them insight into the Central Limit
Theorem. For example, take a look at the Central Limit Theorem applet at www.causeweb.org/
repository/statjava/
Activity Description
This activity supports Learning Objectives for this unit A, B, and C. In this activity, students
draw random samples by hand from an approximate normal distribution. After calculating the
sample means, students compare a histogram of individual observations to a histogram of the
sample means.
Drawing samples by hand makes the process of random sampling more concrete to students.
Since drawing 100 samples by hand is time-consuming, after several samples have been
selected, students can be working on other material until it is their turn to draw one or more
samples. If students have access to statistical software, samples can be generated using the
Unit 22: Sampling Distributions | Faculty Guide | Page 1
software’s normal random number generator (see Extension for Minitab instructions). If you
decide to use technology to create the samples, it is still a good idea to begin the activity by
having students draw several samples by hand.
Materials
Prepare 100 identical slips of stiff poster board.
Write the numbers as described in Table T22.1 on the slips.
Write each of
these numbers
50
49, 51
48, 52
47, 53
46, 54
45, 55
44, 56
43, 57
42, 58
41, 59
40, 60
On this
many slips
10
9
9
8
6
5
3
2
1
1
1
Table T22.1. Distribution table for approximate normal data.
These 100 numbered slips form a population. The distribution of the numbers in this population
is roughly normal with mean µ = 50 and standard deviation σ = 4. Before students begin
sampling, they should make a graphic display of the population distribution (question 1).
Before students can answer question 2, they need to collect the data and the instructor needs
to provide the instructions. The numbered slips should be put into a container and mixed. A
student should be asked to draw a sample of size nine for Sample 1 by drawing a slip of paper,
recording its number, and returning it to the container for mixing before drawing the second
number, and so on until nine numbers have been drawn. The values should be recorded
either in a copy of Table T22.2 or in an Excel spreadsheet. This process should be repeated
as many times as convenient – around 100 times if possible. The data should be distributed
to students so that they can complete the activity. Students should save the data in Table
T22.2 for use in the activity for Unit 24, Confidence Intervals.
Unit 22: Sampling Distributions | Faculty Guide | Page 2
Extension
Students use technology (such as Excel or Minitab) to generate 100 (or more) samples of size
9 from a uniform distribution on the interval from 0 to 1. A graphic display for this population’s
distribution is box shaped and centered at 0.5. However, the sampling distribution of x
appears roughly normal in shape but remains centered at 0.5.
Using Excel to Generate the Samples
• Label the first row of columns A – K with the following headings: Sample, X1, X2, X3, X4,
X5, X6, X7, X8, X9, and Mean.
• In the column Sample, generate the numbers 1 – 100 as follows: enter 1 in cell A2. In cell
A3, enter the formula = a2+1 and then press Enter. Click cell A2. Then click the bottom
right corner of the cell and drag down to row 101.
• In cell B2, enter the formula =rand() and press Enter. Click cell B2 and then click the
bottom right corner of the cell and drag across the row to cell J2 to form the first sample.
• With the first sample still highlighted, click the bottom right corner and drag down to form
the other 99 samples.
• To find the mean of the first sample, click cell K2 (the first entry in the column labeled
Mean). Enter the formula =AVERAGE(B2:J2) and press Enter.
• To calculate the means for the remaining 99 samples, click cell K2. Then click the bottom
right corner and drag down to cell K101.
Using Minitab to Generate the Samples
• Label columns C1 – C11 as Sample, X1, X2, X3, X4, X5, X6, X7, X8, X9, and Mean.
• Select Calc > Make Patterned Data > Simple Set of Numbers. Then complete the dialog
box as follows:
Store patterned data in C1
From first value 1
To last value 100
In steps of 1
Click OK
Unit 22: Sampling Distributions | Faculty Guide | Page 3
You should see consecutive integers 1 – 100 under Sample.
• Generate 100 samples of size 9: Calc > Random Data > Uniform. Then complete the dialog
box as follows:
Number of rows of data to generate 100 (or more)
Store in columns: Select C2 – C10 for X1 – X9
Click OK
• To calculate the sample means for each sample: Calc > Row Statistics and then complete
as follows:
Select Mean for the statistic
Select C2 – C10 for the input variables
Store results in: C11
Click OK.
Note: The Minitab instructions above can be adapted to replace drawing by hand from a
normal population. To do so, adapt the second bullet above as follows:
• Generate 100 samples of size 9: Calc > Random Data > Normal and then complete the
dialog box as follows:
Number of rows of data to generate 100 (or more)
Store in columns: Select C2 – C10 for X1 – X9
Mean: 50
Standard deviation: 4
Click OK
Unit 22: Sampling Distributions | Faculty Guide | Page 4
Samples of Size 9, page 1
Sample
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
X1
X2
X3
X4
X5
X6
X7
Unit 22: Sampling Distributions | Faculty Guide | Page 5
X8
X9
Mean
Samples of Size 9, page 2
Sample
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
X1
X2
X3
X4
X5
X6
Unit 22: Sampling Distributions | Faculty Guide | Page 6
X7
X8
X9
Mean
Samples of Size 9, page 3
Sample
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
X1
X2
X3
X4
X5
X6
X7
Table T22.2. Data table for samples of size 9.
Unit 22: Sampling Distributions | Faculty Guide | Page 7
X8
X9
Mean
Table T22.3 contains sample data that you might use if you don’t want to have students draw
all 100 samples. The sample solution will be based on these data.
X1
40
53
57
44
48
58
51
50
59
52
51
52
47
52
54
51
51
45
52
53
50
51
56
50
48
48
54
45
49
45
50
49
47
52
53
54
50
X2
55
55
48
50
59
41
46
47
54
44
45
48
50
54
44
43
58
51
56
47
40
46
41
50
48
50
53
51
56
52
53
52
51
51
49
48
47
X3
45
50
46
55
51
46
51
51
49
46
53
58
53
50
51
49
48
45
51
52
57
47
51
49
54
51
55
53
53
44
52
44
50
51
40
57
55
X4
53
53
50
47
49
56
40
52
49
57
54
57
58
49
43
51
45
46
52
50
51
47
52
56
49
52
46
52
48
57
49
40
56
53
49
54
46
X5
41
51
46
44
51
46
49
51
49
43
46
56
47
53
46
46
51
52
53
55
52
52
48
60
53
51
48
46
50
48
50
43
47
42
49
48
49
X6
50
43
48
44
47
55
47
51
41
46
48
44
44
43
52
49
50
48
53
54
52
55
44
42
49
50
48
40
54
41
53
50
54
47
49
55
52
X7
54
54
53
47
47
49
48
60
56
50
49
52
48
53
53
48
59
52
43
46
48
46
51
52
51
47
52
49
44
49
53
51
55
47
48
46
45
X8
60
53
51
59
45
50
60
53
46
50
46
50
47
59
58
46
53
48
47
49
46
51
48
45
49
50
52
57
52
51
52
50
50
54
52
52
53
X9
56
50
47
53
55
51
46
48
56
47
41
49
48
49
47
48
55
54
48
55
49
60
47
57
47
50
47
43
50
46
55
55
46
45
48
55
46
Unit 22: Sampling Distributions | Faculty Guide | Page 8
Sample Mean
48
52
46
56
46
54
50
55
47
46
56
41
51
59
55
52
46
53
49
46
46
56
50
52
49
47
45
53
49
51
48
44
52
60
50
52
51
50
51
48
49
51
56
55
50
46
50
52
49
43
55
54
47
46
54
52
50
56
60
51
47
46
41
46
56
47
50
46
54
51
44
51
40
51
52
49
46
52
50
49
51
52
50
53
52
58
52
53
47
51
47
42
48
43
46
46
50
45
46
48
57
52
44
52
52
45
44
48
48
46
53
52
51
54
45
45
44
54
46
53
45
53
48
45
45
54
54
54
51
46
44
44
54
50
48
46
48
54
52
45
46
51
43
49
49
45
47
53
50
51
43
52
49
53
47
46
54
53
56
46
53
54
50
49
49
53
50
52
50
50
48
43
40
53
58
49
48
48
46
51
58
47
49
51
48
53
54
58
48
48
55
46
51
53
53
45
48
53
53
54
46
57
51
48
47
50
53
52
44
56
54
51
47
47
48
53
49
46
51
49
50
45
52
45
49
48
52
58
52
50
57
56
50
44
49
43
60
51
43
47
50
47
51
52
59
48
52
41
48
51
49
52
58
51
49
52
55
48
41
52
54
49
53
59
51
57
51
55
53
48
53
48
47
51
55
58
59
47
41
51
57
47
55
54
49
50
45
47
42
52
52
55
53
51
55
52
49
48
50
56
51
44
46
59
53
51
43
55
46
52
49
46
55
54
48
54
43
41
50
46
53
44
52
55
52
53
46
50
53
54
57
54
47
50
45
40
49
48
46
51
49
47
43
56
47
60
48
55
54
47
42
Unit 22: Sampling Distributions | Faculty Guide | Page 9
49
46
52
46
51
52
40
48
49
52
51
49
59
46
49
44
48
49
53
51
56
53
54
42
50
47
46
52
50
48
48
53
58
46
50
48
53
53
44
51
47
56
52
51
51
48
53
57
49
48
51
54
54
50
44
55
53
49
53
52
49
51
47
56
44
45
48
52
55
51
54
44
54
52
50
45
47
49
48
51
47
47
50
48
45
54
48
53
48
51
47
50
53
53
54
50
52
52
60
45
47
50
47
48
58
51
52
60
46
60
48
52
48
51
45
47
52
49
44
43
48
48
50
60
45
51
49
52
50
48
52
54
45
54
54
47
51
54
53
44
53
44
48
59
53
49
47
56
56
44
60
51
53
49
49
49
50
50
51
45
52
53
51
48
49
51
49
51
51
48
56
50
51
49
47
51
47
44
54
45
42
57
50
48
52
46
48
50
49
55
49
45
50
55
52
50
54
52
47
41
52
47
43
50
51
54
58
51
53
48
50
49
48
48
51
55
Table T22.3. Sample Data
Unit 22: Sampling Distributions | Faculty Guide | Page 10
The Video Solutions
1. Parameters describe an entire population and are generally unknown. Statistics are
computed from samples.
2. No, it might be too expensive to inspect them all. In addition, it is too costly to wait until the
end to examine the finished product. If somewhere in the process things are out of control, it
doesn’t make sense to put in additional money to finish a defective product.
3. No, there will be variability in the mean scores. Sometimes the mean score will be above
100 and sometimes below 100.
4. The distribution of x will be normal with mean 100 and standard deviation 4
5.
5. The sample mean is less variable than individual observations. That’s because when
averaging, high observations will balance out low observations, which makes the mean
less variable.
6. The sampling distribution of the sample mean is approximately normally distributed if the
sample size is sufficiently large.
Unit 22: Sampling Distributions | Faculty Guide | Page 11
Activity Solutions
1. The shape is roughly symmetric and mound-shaped. Except for the fact that the population
consists of whole numbers so that the distribution can’t be normal, it is approximately
normally distributed.
10
10
9
9
8
8
Frequency
9
8
6
6
9
6
5
5
4
3
2
2
1
0
3
1
2
1
42
1
45
48
X
51
54
57
1
1
60
2. a. Sample answer (based on sample data):
50.44 51.33 49.56 49.22 50.22 50.22 48.67 51.44 51.00 48.33
48.11 51.78 49.11 51.33 49.78 47.89 52.22 49.00 50.56 51.22
49.44 50.56 48.67 51.22 49.78 49.89 50.56 48.44 50.67 48.11
51.89 48.22 50.67 49.11 48.56 52.11 49.22 50.44 49.89 52.89
52.44 49.33 48.22 50.11 50.89 50.33 50.33 51.67 50.56 50.44
47.22 48.67 49.22 48.78 50.56 52.89 50.44 49.44 50.78 49.11
47.78 48.33 49.44 49.22 50.11 47.00 50.44 50.56 51.11 51.33
50.33 51.67 51.00 51.56 48.33 47.22 50.67 49.44 51.56 50.89
50.56 49.44 47.78 50.00 51.89 48.11 50.44 50.56 48.89 53.22
49.89 49.67 49.22 50.33 49.67 49.11 51.78 50.22 50.67 49.56
Unit 22: Sampling Distributions | Faculty Guide | Page 12
b.
30
Frequency
25
20
15
10
5
0
42
45
48
51
Sample Mean
54
57
60
Both the population distribution and sampling distribution of x are fairly symmetric and
mound-shaped, and appear approximately normal in shape. Both are centered at around 50.
However, the sampling distribution is more concentrated about its center, and not as spread
out as the population distribution.
Extension
3. c. The distribution will be roughly symmetric and mound-shaped – or approximately normally
distributed – similar to the one that follows.
25
Frequency
20
15
10
5
0
-0.00
0.15
0.30
0.45
0.60
Sample Mean
0.75
0.90
Unit 22: Sampling Distributions | Faculty Guide | Page 13
The histogram should be centered close to 0.5, which is the center of the population
distribution curve shown in Figure 22.10. However, the sampling distribution will be less spread
out than the population distribution, which is evenly spread between 0 and 1. Instead it will be
concentrated around 0.5 and trail off on either side of 0.5.
Unit 22: Sampling Distributions | Faculty Guide | Page 14
Exercise Solutions
1. a. The mean of the three measurements has standard deviation
σ
n
=
0.08
3
≈ 0.046
b. The mean of three measurements is less variable than a single measurement. That is why
it is good practice to repeat a laboratory measurement several times independently and report
the average.
2. This exercise contrasts the distribution of one score with that of the average of 55 scores.
a. We need to calculate the probability that a normal random variable x has value 21 or
greater. We first convert to z-scores so that we can use standard normal tables:
⎛ x − 18.6 21− 18.6 ⎞
P(x ≥ 21) = P ⎜
≥
= P(z ≥ 0.41) = 1− 0.6591= 0.3409
5.9 ⎟⎠
⎝ 5.9
Note: We could also use software such as Minitab or a graphing calculator to compute the
probability directly, as shown below. This gives a more accurate answer.
Normal, Mean=18.6, StDev=5.9
0.07
0.06
Density
0.05
0.04
0.03
0.02
0.3421
0.01
0.00
18.6 21
X
Unit 22: Sampling Distributions | Faculty Guide | Page 15
b. The sampling distribution of x for samples of size 25 is normally distributed with mean
18.6 and standard deviation 5.9 55 ≈ 0.7956 . Below are the calculations for computing the
probability using a standard normal table:
⎛ x − 18.6 21− 18.6 ⎞
P(x ≥ 21) = P ⎜
≥
= P(z ≥ 3.02) = 1− 0.9987 = 0.0013
0.7956 ⎟⎠
⎝ 0.7956
We can calculate this probability directly using software:
Normal, Mean=18.6, StDev=0.7956
0.5
Density
0.4
0.3
0.2
0.1
0.001278
0.0
18.6
X
21
The mean of many scores is less likely to be far away from the population mean than is a
single score.
3. a. The distribution is approximately normal with mean 2.2 and standard deviation
1.4 52 ≈ 0.194 .
b. Again, transforming in order to use the standard normal table gives:
⎛ x − 2.2 2 − 2.2 ⎞
P(x < 2) = P ⎜
<
= P(z < −1.03) = 0.1515
0.194 ⎟⎠
⎝ 0.194
c. To say that the total is less than 100 is exactly the same as saying that the mean per week
is less than 100/52 or 1.923. Now, we can proceed to find P(x < 1.923) , which we find directly
using software:
Unit 22: Sampling Distributions | Faculty Guide | Page 16
Normal, Mean=2.2, StDev=0.194
2.0
Density
1.5
1.0
0.5
0.07667
0.0
1.923
2.2
X
4. a. The lower and upper control limits are µ ± 3σ n . In this case, the limits are
6 ± (3)(0.9 3) or LCL ≈ 4.44 and UCL ≈ 7.56 (rounded to two decimals).
b. Samples collected over a 24-hour time period appear in Table 22.3. The sample means
appear below.
Sample
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
5.8
6.4
5.8
5.7
6.5
5.2
5.1
5.8
4.9
6.4
6.9
7.2
6.9
5.3
6.5
6.4
6.5
6.9
6.2
5.5
pH level
6.2
6.9
5.2
6.4
5.7
5.2
5.2
6.0
5.7
6.3
5.2
6.2
7.4
6.8
6.6
6.1
6.7
6.8
7.1
6.7
6.0
5.3
5.5
5.0
6.7
5.8
5.6
6.2
5.6
4.4
6.2
6.7
6.1
6.2
4.9
7.0
5.4
6.7
4.7
6.7
Sample mean
6.0
6.2
5.5
5.7
6.3
5.4
5.3
6.0
5.4
5.7
6.1
6.7
6.8
6.1
6.0
6.5
6.2
6.8
6.0
6.3
Unit 22: Sampling Distributions | Faculty Guide | Page 17
21
22
23
24
6.6
6.4
6.4
7.0
5.2
6.0
4.6
6.3
6.8
5.9
6.7
7.4
6.2
6.1
5.9
6.9
c.
8
UCL: 7.56
Sample Mean
7
6
Mean: 6
5
LCL: 4.44
4
0
5
10
Sample Number
15
20
d. No, none of the sample means is above the upper control limit or below the lower
control limit.
e. Although the sample means lie within the control limits, the process appears to be changing.
Initially, the sample means were fairly close to the upper control limit. Over time, the sample
means tended to decrease. If this pattern continues, the sample means will begin to fall below
the lower control limit. So, this process does not appear to be stable.
Unit 22: Sampling Distributions | Faculty Guide | Page 18
Review Questions Solutions
1. a. Using the 68-95-99.7 rule, 95% of the caps will have diameters within two standard
deviations of the mean – hence, between 0.497 and 0.503 inches. Thus, around 5% of the
bottles will have diameters outside of the chemical manufacturer’s specification limits.
b. x will have a normal distribution with mean 0.500 inch and standard deviation 0.0015
0.0005 inch.
9 =
c. The endpoints of the acceptance interval can be written as 0.500 ± 0.001, which is
equivalent to 0.005 ± 2(0.0005). Hence, 95% of the samples will have means within this
interval. The production process will be stopped 5% of the time (or a proportion of 0.05).
2. a. No. The number of people in a car must be a whole number. A normal random variable
can take any value, not just whole number values. (Normal distributions are
continuous distributions.)
b. The sample mean x has an approximately normal distribution with mean 1.5 and standard
deviation 0.75 700 ≈ 0.02835
c. The total in 700 cars exceeds 1075 people exactly when the mean x exceeds
1075/700 ≈1.5357 persons per car. So, the probability we want is:
⎛ x − 1.5 1.5357 − 1.5 ⎞
P(x > 1.5357) = P ⎜
>
= P(z > 1.26) = 1− 0.8962 = 0.1038
0.02835 ⎟⎠
⎝ 0.02835
We can compute this directly using software (see chart next page):
Unit 22: Sampling Distributions | Faculty Guide | Page 19
Normal, Mean=1.5, StDev=0.02835
14
12
Density
10
8
6
4
2
0
0.1040
1.5357
1.5
X
3. a. µ x = 90 seconds; σ x = 120 10 ≈ 37.9 seconds. The shape will be less skewed to the
right than the original distribution. Given that the sample size is relatively small, you can’t say
much more than that.
b. µ x = 90 seconds; σ x = 120 100 = 12 seconds. Since the sample size is large, by the
Central Limit Theorem we can say that the shape of the distribution will be approximately
normal.
c. First, we convert 2 minutes into 120 seconds. We need to calculate P ( x > 120) , which we
determine using software, which gives P ( x > 120) ≈ 0.0062 as shown below.
Distribution Plot
Normal, Mean=90, StDev=12
0.035
0.030
Density
0.025
0.020
0.015
0.010
0.005
0.000
0.006210
90
X
120
Unit 22: Sampling Distributions | Faculty Guide | Page 20
Download