Ch11. Chi-sq, T- and F

Chapter 11. Chi-square, Student’s T and Fisher’s F-distribution
Problem PS246
1. Graph three Chi-square densities with respectively 4, 8 and 12 dof (degrees of
freedom). Work as follows: the values 0, 0.5, 1, 1.5, …, 25 in the range A3:A53, the dof
4, 8 and 12 in cells B2:D2, in cell B3 the Chi-square density for 4 dof at the point in
cell A3: =CHISQ.DIST($A3; B$2; 0). Drag cell B3 to cell D3 and cells B3:D3 to cells
B53:D53.
Graph the three densities using Scatter with smooth lines. Notice that the density
tends to a normal density as the number of dof increases;
2. When X is Chi-square distributed with k dof (X ~ Chi(k)), it can be shown that
E(X) = k, var(X) = 2k and mode(X) = k − 2 for k > 2. The median of X can be
computed in Excel using =CHISQ.INV(0.5; k) for k dof.
Construct vertical line segments at the value of the mode, the median and the
expected value. To add a vertical line for the mean in the graph for dof = 10, input
the value 10 in cells I27 and I28, the value 0 in cell J27 and
=CHISQ.DIST(I28; 10; 0) in cell J28. Add the series with Series Y-values J27:J28 and
Series X-values I27:I28. Work in the same way for the mode and the median.
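The three reference points can also be checked outside Excel. The following is a minimal Python sketch using only the standard library; the function names chi2_pdf, chi2_cdf and chi2_median are illustrative and not part of the workbook. The density is coded from its closed form, and the median is found by bisection on a numerically integrated distribution function, which only approximates what CHISQ.INV returns.

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density with k dof for x > 0 (role of CHISQ.DIST(x; k; 0))."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def chi2_cdf(x: float, k: int, steps: int = 2000) -> float:
    """Distribution function by the midpoint rule; adequate for k >= 2."""
    h = x / steps
    return h * sum(chi2_pdf((i + 0.5) * h, k) for i in range(steps))


def chi2_median(k: int) -> float:
    """Median by bisection on the CDF (role of CHISQ.INV(0.5; k))."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(2.0 * k)
    for _ in range(40):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k = 10
for name, x in (("mode", k - 2), ("median", chi2_median(k)), ("mean", k)):
    # Each vertical segment runs from (x, 0) up to (x, density at x).
    print(f"{name:6s} x = {x:7.4f}  height = {chi2_pdf(x, k):.5f}")
```

For a right-skewed density such as this one the three points appear in the order mode, median, mean.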
Assignment PA246
Graph the densities of three chi-square random variables with 10, 20 and 40 dof. Graph three normal
densities on the same graph, where each normal variable has the expected value and the
variance of the corresponding chi-square random variable. Note how the normal approximation of the
chi-square density improves as the number of dof increases.
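The quality of the normal approximation can be summarised in one number per dof value. The sketch below (plain Python, with the hypothetical helper max_gap) measures the largest absolute difference between the chi-square density with k dof and the normal density with mean k and variance 2k on a grid from 0 to 3k; the grid and its step size are arbitrary choices made for this illustration.

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density with k dof for x > 0."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Normal density with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def max_gap(k: int, step: float = 0.05) -> float:
    """Largest absolute density difference on the grid step, 2*step, ..., 3k."""
    n = int(3 * k / step)
    return max(
        abs(chi2_pdf(i * step, k) - normal_pdf(i * step, k, 2 * k))
        for i in range(1, n + 1)
    )


for k in (10, 20, 40):
    print(f"dof = {k:2d}  largest gap = {max_gap(k):.4f}")
```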
Problem PS247
When Y is a normal random variable with expected value μ and standard deviation σ, then
X = ((Y − μ)/σ)² is a Chi-square random variable with 1 degree of freedom. We simulate
this result in the following steps:
1. Generate randomly 250 normal values with expected value 10 and standard
deviation 2 in the range A2:A251. Compute the corresponding values of X in the
range B2:B251. Example for cell B2: =POWER((A2-10)/2; 2) and likewise for
cells B3:B251. Use bin values 0, 0.2, 0.4, …, 4.2 in cells E3:E24. Compute the relative
frequencies of the sample values in column B using the array instruction
=FREQUENCY(B2:B251; E3:E24)/250 in cells F3:F24. Input the midpoints of the
bins 0.1, 0.3, …, 4.1 in cells G4:G24. Compute the probability of the bin intervals for a
chi-square random variable with 1 dof in cells H4:H24. Example for cell H4:
=CHISQ.DIST(E4; 1; 1)-CHISQ.DIST(E3; 1; 1) and likewise for cells H5:H24.
Graph the relative frequencies and the chi-square probabilities in one graph with the
relative frequencies as rectangles with gap 0 and the probabilities as a Scatter with
smooth lines. Notice the close correspondence between relative frequencies and
probabilities;
2. Compute the sample mean, the sample variance and the sample median of the
sample in column B and compare to the expected value, the variance and the median
of a chi-square random variable with 1 dof. The median is computed using
=CHISQ.INV(0.5; 1).
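The two steps above can be mirrored in a short Python sketch (standard library only; simulate_squares, chi2_1_cdf and bin_table are names chosen for this illustration). It relies on the identity P(Z² ≤ x) = erf(√(x/2)) for a standard normal Z, which gives the distribution function of a chi-square variable with 1 dof without any special library.

```python
import math
import random
import statistics


def simulate_squares(n=250, mu=10.0, sigma=2.0, seed=1):
    """Draw n normal values and return the squared standardised values."""
    rng = random.Random(seed)
    return [((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n)]


def chi2_1_cdf(x):
    """Distribution function of a chi-square variable with 1 dof."""
    return math.erf(math.sqrt(x / 2)) if x > 0 else 0.0


def bin_table(sample, edges):
    """Rows (lower, upper, relative frequency, probability) per interval."""
    rows = []
    for a, b in zip(edges, edges[1:]):
        rel = sum(a < x <= b for x in sample) / len(sample)
        rows.append((a, b, rel, chi2_1_cdf(b) - chi2_1_cdf(a)))
    return rows


edges = [0.2 * i for i in range(22)]  # bin values 0, 0.2, ..., 4.2
xs = simulate_squares()
for a, b, rel, prob in bin_table(xs, edges)[:5]:
    print(f"({a:.1f}, {b:.1f}]  rel. freq. = {rel:.3f}  prob. = {prob:.3f}")
print("mean    ", round(statistics.mean(xs), 3), "(theory 1)")
print("variance", round(statistics.variance(xs), 3), "(theory 2)")
```

With only 250 values the relative frequencies fluctuate noticeably from seed to seed; increasing n tightens the agreement.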
Assignment PA247
Illustrate the property that the sum of independent chi-square random variables is chi-square
distributed with dof equal to the sum of the dof of the independent variables. Work
as follows: generate randomly chi-square random values in cells A2:D2 with respectively 2,
3, 2 and 4 dof. Sum the values in cell E2. Drag cells A2:E2 over 299 more rows.
Compute sample means, medians and variances for the samples in the five columns.
Graph the histogram of the sample in column E and compare to a chi-square density with 11
dof as in Step 1 above.
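A possible Python counterpart of this assignment is sketched below. It assumes that a chi-square value with k dof can be drawn as a gamma variate with shape k/2 and scale 2, which is how random.gammavariate is used here; the helper names are illustrative.

```python
import random
import statistics


def chi2_draw(rng, k):
    """One chi-square value with k dof (gamma with shape k/2 and scale 2)."""
    return rng.gammavariate(k / 2, 2.0)


def simulate_sums(dofs=(2, 3, 2, 4), rows=300, seed=1):
    """Row sums of independent chi-square values with the given dof."""
    rng = random.Random(seed)
    return [sum(chi2_draw(rng, k) for k in dofs) for _ in range(rows)]


sums = simulate_sums()
total_dof = 2 + 3 + 2 + 4
print("sample mean    ", round(statistics.mean(sums), 3), "theory", total_dof)
print("sample variance", round(statistics.variance(sums), 3), "theory", 2 * total_dof)
print("sample median  ", round(statistics.median(sums), 3))
```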
Problem PS248
For a random sample X₁, X₂, …, Xₙ from a normally distributed population with expected
value μ and standard deviation σ it holds that (n − 1) * S²/σ² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² is Chi-square
distributed with n − 1 dof (X̄ is the sample mean, S² is the sample variance). We
simulate this result in the following steps:
1. Report the expected value μ = 10 in cell A1 and the standard deviation σ = 2.5 in
cell B1. Generate a random sample in cells A3:A7:
=NORM.INV(RAND(); $A$1; $B$1). Compute the sample variance in cell C3:
=VAR.S(A3:A7) and the value of (n − 1) * S²/σ² in cell D3: =4*C3/(B1*B1). Use
the TABLE-option to generate the values in cells C3:D3 1000 times;
2. Use bin values 1, 2, …, 14, 15 in cells F4:F18 and compute the relative frequencies of
the Chi-square values in column D in cells G4:G18. Compute the probabilities of the
chi-square density with 4 dof in cells H4:H18. Example for cell H5:
=CHISQ.DIST(F5; 4; 1)-CHISQ.DIST(F4; 4; 1) and likewise for the remaining cells;
3. Construct a histogram of the relative frequencies in cells G4:G18 and the chi-square
probabilities in cells H4:H18. Change the columns of the density to a Scatter with
smooth lines. Notice how the histogram of relative frequencies approximates the
smooth chi-square density;
4. Compute the average and the variance of the 1000 values in column D and compare
these values to their expected values of n − 1 = 4 and 2 * (n − 1) = 8;
5. Change the value of μ in cell A1 to 20, then to 5. Change the value of the standard
deviation in cell B1 to 10, then to 1 and check the graphs.
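The simulation can be sketched in Python as follows (standard library only; scaled_variance and simulate are hypothetical names). The loop over reps plays the role of the TABLE-option, and changing mu or sigma corresponds to Step 5.

```python
import random
import statistics


def scaled_variance(rng, n=5, mu=10.0, sigma=2.5):
    """(n - 1) * S^2 / sigma^2 for one normal sample of size n."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)  # equals (n - 1) * S^2
    return sum_sq / sigma ** 2


def simulate(reps=1000, seed=1, **params):
    """Repeat the experiment reps times with one seeded generator."""
    rng = random.Random(seed)
    return [scaled_variance(rng, **params) for _ in range(reps)]


for mu, sigma in ((10.0, 2.5), (20.0, 10.0), (5.0, 1.0)):
    vals = simulate(mu=mu, sigma=sigma)
    print(
        f"mu = {mu:4.1f} sigma = {sigma:4.1f}  "
        f"mean = {statistics.mean(vals):.3f} (theory 4)  "
        f"variance = {statistics.variance(vals):.3f} (theory 8)"
    )
```

The averages and variances should stay close to 4 and 8 whatever values of mu and sigma are used, since the statistic is standardised by sigma.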
Assignment PA248
In the problem above, replace the value of Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² by Σᵢ₌₁ⁿ (Xᵢ − μ)²/σ², which is Chi-square
distributed with n dof. Simulate this result following the four steps above.
Problem PS249
Graph the standard normal pdf and three Student T densities with respectively 3, 8 and 25
dof (degrees of freedom). Work as follows: the values -4, -3.8, -3.6, …, 3.6, 3.8, 4 in the range
A3:A43, the dof 3, 8 and 25 in cells C2:E2, in cell B3 the standard normal density value at
the point in cell A3: =NORM.S.DIST($A3; 0). In cell C3 the T-density with 3 dof at the
point in cell A3: =T.DIST($A3; C$2; 0). Drag cell C3 to cell E3 and cells B3:E3 to cells
B43:E43.
Graph the four densities using Scatter with smooth lines.
Notice that the T-density tends to a standard normal density as the number of degrees of
freedom increases.
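The convergence can also be tabulated numerically. The sketch below codes the Student T density from its closed form in plain Python (t_pdf and std_normal_pdf are illustrative names) and prints a few grid points next to the standard normal density.

```python
import math


def t_pdf(x: float, k: int) -> float:
    """Student T density with k dof (role of T.DIST(x; k; 0))."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)


def std_normal_pdf(x: float) -> float:
    """Standard normal density (role of NORM.S.DIST(x; 0))."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


print("    x   normal     t(3)     t(8)    t(25)")
for x in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    row = [std_normal_pdf(x)] + [t_pdf(x, k) for k in (3, 8, 25)]
    print(f"{x:5.1f} " + " ".join(f"{v:8.5f}" for v in row))
```

The T densities are lower than the normal density at the centre and higher in the tails, and both differences shrink as the dof grow.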
Assignment PA249
Graph a standard normal pdf and a Student t-density with 5 dof as above. Compute the
0.025 quantile of each distribution (the value of the inverse distribution function at
probability 0.025) using T.INV and NORM.S.INV, say x_t
and x_norm. Construct vertical line segments from the horizontal axis to the value of the pdf
at x_t and x_norm.
Repeat this assignment for a T distribution with 25 dof.
Problem PS250
When Z is standard normally distributed and U is Chi-square distributed with k dof,
independent of Z, then X = Z/√(U/k) is Student's T-distributed with k dof. We simulate this
result as follows:
1. Generate randomly 1000 standard normal values in the range A3:A1002:
=NORM.S.INV(RAND()). Generate randomly 1000 Chi-square values with 5 dof
in the range B3:B1002 (the dof 5 in cell B2). Compute the above ratio in column C.
Example for cell C3: =A3/SQRT(B3/B$2) and likewise for cells C4:C1002;
2. Compute relative frequencies of the values in column C ranging from -4.4 to 4.4 in
steps of 0.4 (cells E3:E25 and F3:F25). Compute the probability of the bin intervals
for a T-distribution with 5 dof in cells G3:G25. Example for cell G4:
=T.DIST(E4; B$2; 1)-T.DIST(E3; B$2; 1);
3. Show the relative frequencies (as rectangles) and the probabilities (as a smooth
curve) in the same graph.
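As a rough cross-check of Step 1, the ratio can be simulated in Python with the standard library. The chi-square values are again drawn as gamma variates with shape k/2 and scale 2, and t_ratio_sample is a name chosen for this sketch. Instead of bin probabilities it compares the sample median and variance with the theoretical values 0 and k/(k − 2).

```python
import math
import random
import statistics


def t_ratio_sample(k=5, n=1000, seed=1):
    """n values of Z / sqrt(U / k), Z standard normal, U chi-square(k)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u = rng.gammavariate(k / 2, 2.0)  # chi-square with k dof
        values.append(z / math.sqrt(u / k))
    return values


k = 5
xs = t_ratio_sample(k=k)
print("median  ", round(statistics.median(xs), 3), "(theory 0)")
print("variance", round(statistics.variance(xs), 3), f"(theory {k / (k - 2):.3f})")
```

Because the T-distribution with 5 dof has heavy tails, the sample variance converges slowly and can differ visibly from 5/3 in a run of 1000 values.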
Assignment PA250
Generate 500 random numbers in column A. Generate randomly 500 standard normal
values in column B using the corresponding random numbers in column A. Generate
randomly 500 T-values (6 dof) in column C again using the random numbers in column A.
1. Compute the mean of the values generated in columns B and C and compare to their
expected values of 0;
2. Compute the median of the values generated in columns B and C and compare to their
theoretical values of 0;
3. Compute the variance of the values generated in columns B and C and compare to their
theoretical values of 1 and 6/4 = 1.5.
Problem PS251
Let X̄ be the sample mean and S the sample standard deviation of a random sample of size n
from a normal population. It is known that (X̄ − μ) * √n/S is T-distributed with n − 1 dof.
We simulate this result as follows:
1. Generate a random sample of 16 observations from a normal population with
expected value 10 and standard deviation 2 in cells A3:A18:
=NORM.INV(RAND(); 10; 2). Compute the sample mean in cell C3, the sample
standard deviation in cell D3 and the ratio (X̄ − μ) * √n/S in cell E3. Use the TABLE-option
to generate those values 1000 times in columns C, D and E;
2. Compute the average of the 1000 ratios in column E in cell H3 and compare with its
theoretical value 0. Compute the standard deviation of the 1000 ratios in column E in
cell H6 and compare with its theoretical value √(15/13) = 1.0742;
3. Compute the relative frequencies of the ratios in column E between -4 and +4.4
using interval widths of 0.4. Compute the probabilities of these intervals assuming a
T distribution with 15 dof. Compare the relative frequencies and the probabilities,
also graphically using smooth curves (similar to Steps 2 and 3 in PS250).
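Steps 1 and 2 can be sketched in Python as follows (standard library only; t_statistic and simulate are illustrative names, and the repetition loop stands in for the TABLE-option).

```python
import math
import random
import statistics


def t_statistic(rng, n=16, mu=10.0, sigma=2.0):
    """(sample mean - mu) * sqrt(n) / S for one normal sample of size n."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu) * math.sqrt(n) / s


def simulate(reps=1000, seed=1, **params):
    """Repeat the experiment reps times with one seeded generator."""
    rng = random.Random(seed)
    return [t_statistic(rng, **params) for _ in range(reps)]


ratios = simulate()
print("average", round(statistics.mean(ratios), 4), "(theory 0)")
print(
    "st. dev.",
    round(statistics.stdev(ratios), 4),
    f"(theory {math.sqrt(15 / 13):.4f})",
)
```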
Assignment PA251
Consider the problem PS251 above. In Step 1 add a column F where the ratio
(X̄ − μ) * √n/σ is computed. In Step 2 also compute the average and the standard deviation
of the ratio values in column F and compare with their theoretical values.
Problem PS252
Consider Problem PS251 but the sample data are generated from an exponential density
with parameter 1.
1. Generate 64 random exponential values in the range A3:A66:
=GAMMA.INV(RAND(); 1; 1);
2. First consider the first 9 sample points (A3:A11) and compute the sample mean in
cell C3, the sample standard deviation in cell D3 and the ratio (X̄ − μ) * √n/S in cell
E3. Next consider the full sample of 64 sample points (A3:A66) and compute the
sample mean in cell G3, the sample standard deviation in cell H3 and the ratio
(X̄ − μ) * √n/S in cell I3. Use the TABLE-option to generate those values 1000 times
in columns C to I;
3. Compute the average of the 1000 ratios in column E and compare with the value 0.
Notice the significant difference. Compute the standard deviation of the 1000 ratios
in column E and compare with the value √(8/6) = 1.1547. Notice again the significant
difference. Repeat the computations for the sample of 64 sample points and notice
that the differences have been much reduced (theoretical standard deviation is
√(63/61) = 1.0163);
4. Compare graphically relative frequencies and probabilities as in Steps 2 and 3 in
Problem PS250 for both sample sizes.
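The effect of the skewed population can be reproduced with the Python sketch below (standard library only; t_ratio and simulate are hypothetical names). It differs from the worksheet in one respect: the samples of size 9 and 64 are drawn independently instead of taking the first 9 points of the 64.

```python
import math
import random
import statistics


def t_ratio(rng, n, mu=1.0):
    """(sample mean - mu) * sqrt(n) / S for an exponential(1) sample."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu) * math.sqrt(n) / s


def simulate(n, reps=1000, seed=1):
    """reps ratios for samples of size n, from one seeded generator."""
    rng = random.Random(seed)
    return [t_ratio(rng, n) for _ in range(reps)]


for n in (9, 64):
    ratios = simulate(n)
    t_sd = math.sqrt((n - 1) / (n - 3))  # st. dev. of a T variable, n - 1 dof
    print(
        f"n = {n:2d}  average = {statistics.mean(ratios):7.3f} (T: 0)  "
        f"st. dev. = {statistics.stdev(ratios):.3f} (T: {t_sd:.4f})"
    )
```

For a right-skewed population the sample mean and the sample standard deviation are positively correlated, so small means tend to come with small standard deviations and the average ratio is pulled below 0.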
Assignment PA252
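Consider the problem above but generate the sample data from a gamma density with
shape parameter 4 and rate parameter 2 (expected value is 4/2 = 2; in GAMMA.INV, whose third
argument is a scale parameter, this corresponds to alpha = 4 and beta = 0.5). Notice the robustness of the T-approximation for
the ratio even for small sample sizes when the population is not too skewed.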
Problem PS253
1. Graph two Fisher F-densities: the first has 10 and 5 dof, the second has 5 and 10 dof.
Work as follows: the parameters 10 and 5 in cells A2 and B2, the values .01, .02, …, 1,
1.2, 1.4, …, 6 in cells A4:A43. In cells B4:B43 the values of an F-density with 10 and 5
dof. Example for cell B4: =F.DIST(A4; A$2; B$2; 0) and likewise for cells B5:B43.
Report the values of the F-density with 5 and 10 dof in the range C4:C43. Graph both
densities using Scatter with smooth lines;
2. When U is Chi-square distributed with k dof and V is Chi-square distributed with l
dof, independent of U, then (U/k)/(V/l) is F-distributed with k and l dof. We simulate this
result: report the values of the parameters k = 10 and l = 5 in cells A2 and B2, generate
1000 chi-square random values with parameter 10 in the range A4:A1003 and 1000
chi-square random values with parameter 5 in the range B4:B1003. Example for cell
A4: =CHISQ.INV(RAND(); A$2). Compute the ratio in column C. Example for cell
C4: =(A4/A$2)/(B4/B$2) and likewise for cells C5:C1003;
3. Compute relative frequencies of the ratios using bin values 0.2, 0.4, …, 5.8, 6: the bin
values in cells E3:E32, the relative frequencies in cells F3:F32, the midpoints of the
intervals 0.1, 0.3, …, 5.9 in cells G3:G32. Compute the probability of the intervals
assuming an F-distribution with 10 and 5 dof in cells H3:H32. Example for cell H4:
=F.DIST(E4; A$2; B$2; 1)-F.DIST(E3; A$2; B$2; 1). Show the relative frequencies
(as rectangles) and the probabilities (as a smooth curve) in the same graph;
4. Compute the expected value of an F-distributed random variable with 10 and 5 dof
(=B$2/(B$2-2)) and compare the sample mean of the 1000 ratios to this value. Do
likewise for the variance:
=2*B2*B2*(A2+B2-2)/(A2*POWER(B2-2; 2)*(B2-4)) and the sample
variance.
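Steps 2 and 4 can be sketched in Python as follows. The code uses only the standard library, draws chi-square values as gamma variates with shape dof/2 and scale 2, and uses the illustrative names f_mean, f_var and f_ratio_sample; the two formulas are the ones given in Step 4.

```python
import random
import statistics


def f_mean(l: int) -> float:
    """Expected value of an F variable with k and l dof (needs l > 2)."""
    return l / (l - 2)


def f_var(k: int, l: int) -> float:
    """Variance of an F variable with k and l dof (needs l > 4)."""
    return 2 * l * l * (k + l - 2) / (k * (l - 2) ** 2 * (l - 4))


def f_ratio_sample(k=10, l=5, n=1000, seed=1):
    """n values of (U / k) / (V / l), U and V independent chi-squares."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        u = rng.gammavariate(k / 2, 2.0)  # chi-square with k dof
        v = rng.gammavariate(l / 2, 2.0)  # chi-square with l dof
        values.append((u / k) / (v / l))
    return values


ratios = f_ratio_sample()
print("mean    ", round(statistics.mean(ratios), 3), f"(theory {f_mean(5):.3f})")
print("variance", round(statistics.variance(ratios), 3), f"(theory {f_var(10, 5):.3f})")
```

With only 5 dof in the denominator the F-distribution has a very heavy right tail, so the sample variance in particular can be far from its theoretical value in a single run of 1000 ratios.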
Assignment PA253
Generate 1000 random values of an F-random variable with parameters 10 and 5 dof using
the function F.INV. Compute the sample mean and sample variance and compare with the
exact values of those parameters.
Problem PS254
We illustrate the property that when X is an F-distributed random variable with k and l dof,
then 1/X is F-distributed with l and k dof.
1. Report the values k = 5 and l = 10 in cells A2 and B2. Generate random values of an
F-distributed r.v. X in cells A4:A1003 (see Problem PS253). Compute the values of
1/X in column B;
2. Work as in Steps 3 and 4 of Problem PS253 using the values of 1/X.
Assignment PA254
Work as above to illustrate the following property: when X is T-distributed with k dof, then
X² is F-distributed with 1 and k dof.
Problem PS255
Given two normally distributed populations. Population 1 has expected value μ₁ and
variance σ₁². Population 2 has expected value μ₂ and variance σ₂². Take two independent
samples of sizes n₁ and n₂, one sample from each population. Let S₁² and S₂² be the sample
variances. It holds that (S₁²/σ₁²)/(S₂²/σ₂²) is F-distributed with n₁ − 1 and n₂ − 1 dof. We
simulate this result in the following steps:
1. Report the expected value μ₁ = 10 and the variance σ₁² = 9 in cells C1 and C2, the
expected value μ₂ = 8 and the variance σ₂² = 4 in cells G1 and G2. Take a random
sample of size n₁ = 8 from population 1 in cells A6:A13 and a sample of size n₂ = 6 from
population 2 in cells B6:B11. Compute the sample variance s₁² in cell D6, the sample variance s₂² in
cell E6. Compute the ratio (s₁²/σ₁²)/(s₂²/σ₂²) in cell F6. Use the TABLE-option to
generate cells D6:F6 1000 times;
2. Use bin values 0.25, 0.5, …, 5.25 in cells I6:I26 and compute the relative frequencies
of the ratios in column F in cells J6:J26. Compute the expected relative frequencies in
cells K6:K26. Example for cell K7: =F.DIST(I7; 7; 5; 1)-F.DIST(I6; 7; 5; 1) and
likewise for the remaining cells;
3. Construct a histogram of the relative frequencies in cells J6:J25 and the probabilities
in cells K6:K25 (neglect the tail relative frequencies and probabilities). Change the
columns of the probabilities to a Scatter with smooth lines. Notice how the histogram
of relative frequencies approximates the smooth F-density;
4. Compute the average of the 1000 ratios in column F and compare it to its expected
value (n₂ − 1)/(n₂ − 3) = 5/3 = 1.667. Do likewise for the variance of the 1000
ratios, where the theoretical value equals
2 * power(n₂ − 1; 2) * (n₁ + n₂ − 4)/((n₁ − 1) * power(n₂ − 3; 2) * (n₂ − 5)) =
7.9365.
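A Python sketch of Steps 1 and 4 is given below (standard library only; sample_variance, variance_ratio, simulate and theory are names invented for this illustration). The theoretical mean and variance are coded directly from the expressions in Step 4.

```python
import math
import random
import statistics


def sample_variance(xs):
    """Sample variance with divisor n - 1 (role of VAR.S)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio(rng, n1=8, n2=6, mu1=10.0, var1=9.0, mu2=8.0, var2=4.0):
    """(s1^2 / var1) / (s2^2 / var2) for two independent normal samples."""
    s1 = sample_variance([rng.gauss(mu1, math.sqrt(var1)) for _ in range(n1)])
    s2 = sample_variance([rng.gauss(mu2, math.sqrt(var2)) for _ in range(n2)])
    return (s1 / var1) / (s2 / var2)


def simulate(reps=1000, seed=1):
    """Repeat the two-sample experiment reps times."""
    rng = random.Random(seed)
    return [variance_ratio(rng) for _ in range(reps)]


def theory(n1=8, n2=6):
    """Mean and variance of an F variable with n1 - 1 and n2 - 1 dof."""
    mean = (n2 - 1) / (n2 - 3)
    var = 2 * (n2 - 1) ** 2 * (n1 + n2 - 4) / ((n1 - 1) * (n2 - 3) ** 2 * (n2 - 5))
    return mean, var


ratios = simulate()
mean_th, var_th = theory()
print("average ", round(statistics.mean(ratios), 3), f"(theory {mean_th:.3f})")
print("variance", round(statistics.variance(ratios), 3), f"(theory {var_th:.4f})")
```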
Assignment PA255
Repeat the analysis above for exponential populations with expected values of 2 and 1.
Verify that the results above do not hold for non-normal data.