Ch11. Chi-sq, T- and F

Chapter 11. Chi-square, Student’s T and Fisher’s F-distribution

Problem PS246
1. Graph three chi-square densities with 4, 8 and 12 dof (degrees of freedom), respectively. Work as follows: enter the values 0, 0.5, 1, 1.5, …, 25 in the range A3:A53 and the dof 4, 8 and 12 in cells B2:D2. In cell B3, compute the chi-square density with 4 dof at the point in cell A3: =CHISQ.DIST($A3;B$2;0). Drag cell B3 to cell D3, and cells B3:D3 down to cells B53:D53. Graph the three densities using Scatter with smooth lines. Notice that the density tends to a normal density as the number of dof increases;
2. When X is chi-square distributed with k dof (\(X \sim \chi^2(k)\)), it can be shown that \(E(X) = k\), \(\mathrm{var}(X) = 2k\) and \(\mathrm{mode}(X) = k - 2\) for \(k > 2\). The median of X with k dof can be computed in Excel using =CHISQ.INV(0.5;k). Construct vertical line segments at the values of the mode, the median and the expected value. For example, to add a vertical line for the mean to a graph of the chi-square density with dof = 10, input the value 10 in cells I27 and I28, the value 0 in cell J27 and =CHISQ.DIST(I28;10;0) in cell J28. Add the series with Series Y-values J27:J28 and Series X-values I27:I28. Work in the same way for the mode and the median.

Assignment PA246
Graph the densities of three chi-square random variables with 10, 20 and 40 dof. On the same graph, plot three normal densities whose expected values and variances equal those of the chi-square random variables. Note how the normal approximation improves as the number of dof increases.

Problem PS247
When Y is a normal random variable with expected value \(\mu\) and standard deviation \(\sigma\), then \(X = ((Y - \mu)/\sigma)^2\) is a chi-square random variable with 1 degree of freedom. We simulate this result in the following steps:
1. Randomly generate 250 normal values with expected value 10 and standard deviation 2 in the range A2:A251 (e.g. =NORM.INV(RAND();10;2)). Compute the corresponding values of X in the range B2:B251. Example for cell B2: =POWER((A2-10)/2;2), and likewise for cells B3:B251. Use bin values 0, 0.2, 0.4, …, 4.2 in cells E3:E24. Compute the relative frequencies of the sample values in column B using the array formula =FREQUENCY(B2:B251;E3:E24)/250 in cells F3:F24. Input the midpoints of the bins 0.1, 0.3, …, 4.1 in cells G4:G24. Compute the probabilities of the bin intervals for a chi-square random variable with 1 dof in cells H4:H24. Example for cell H4: =CHISQ.DIST(E4;1;1)-CHISQ.DIST(E3;1;1), and likewise for cells H5:H24. Graph the relative frequencies and the chi-square probabilities in one graph. Show the relative frequencies as rectangles with gap width 0 and the probabilities as a Scatter with smooth lines. Notice the close correspondence between relative frequencies and probabilities;
2. Compute the sample mean, the sample variance and the sample median of the sample in column B. Compare them to the expected value, the variance and the median of a chi-square random variable with 1 dof. The median is computed using =CHISQ.INV(0.5;1).

Assignment PA247
Illustrate the property that the sum of independent chi-square random variables is chi-square distributed, with dof equal to the sum of the dof of the individual variables. Work as follows: randomly generate chi-square values in cells A2:D2 with 2, 3, 2 and 4 dof, respectively (e.g. =CHISQ.INV(RAND();2) in cell A2). Sum the values in cell E2. Drag cells A2:E2 down over 299 more rows. Compute the sample means, medians and variances of the samples in the five columns. Graph the histogram of the sample in column E and compare it to a chi-square density with 11 dof, as in Step 1 of Problem PS247.

Problem PS248
Take a random sample \(X_1, X_2, \dots, X_n\) from a normally distributed population with expected value \(\mu\) and standard deviation \(\sigma\). Then \((n-1)S^2/\sigma^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/\sigma^2\) is chi-square distributed with \(n - 1\) dof (\(\bar{X}\) is the sample mean, \(S^2\) is the sample variance). We simulate this result in the following steps:
1. Report the expected value \(\mu = 10\) in cell A1 and the standard deviation \(\sigma = 2.5\) in cell B1. Generate a random sample of size n = 5 in cells A3:A7: =NORM.INV(RAND();$A$1;$B$1). Compute the sample variance in cell C3: =VAR.S(A3:A7). Compute the value of \((n-1)S^2/\sigma^2\) in cell D3: =4*C3/(B1*B1). Use the TABLE option to generate the values in cells C3:D3 1000 times;
2. Use bin values 1, 2, …, 14, 15 in cells F4:F18 and compute the relative frequencies of the chi-square values in column D in cells G4:G18. Compute the bin probabilities of the chi-square distribution with 4 dof in cells H4:H18. Example for cell H5: =CHISQ.DIST(F5;4;1)-CHISQ.DIST(F4;4;1), and likewise for the remaining cells;
3. Construct a histogram of the relative frequencies in cells G4:G18 and the chi-square probabilities in cells H4:H18. Change the columns of the probabilities to a Scatter with smooth lines. Notice how the histogram of relative frequencies approximates the smooth chi-square density;
4. Compute the average and the variance of the 1000 values in column D. Compare them to their theoretical values \(n - 1 = 4\) and \(2(n-1) = 8\);
5. Change the value of \(\mu\) in cell A1 to 20, then to 5. Change the value of the standard deviation in cell B1 to 10, then to 1, and check the graphs.

Assignment PA248
In the problem above, replace \(\sum_{i=1}^{n}(X_i - \bar{X})^2/\sigma^2\) by \(\sum_{i=1}^{n}(X_i - \mu)^2/\sigma^2\), which is chi-square distributed with n dof. Simulate this result following Steps 1 to 4 above.

Problem PS249
Graph the standard normal pdf and three Student T densities with 3, 8 and 25 dof, respectively. Work as follows: enter the values -4, -3.8, -3.6, …, 3.6, 3.8, 4 in the range A3:A43 and the dof 3, 8 and 25 in cells C2:E2. In cell B3, compute the standard normal density at the point in cell A3: =NORM.S.DIST($A3;0). In cell C3, compute the T-density with 3 dof at the point in cell A3: =T.DIST($A3;C$2;0). Drag cell C3 to cell E3, and cells B3:E3 down to cells B43:E43. Graph the four densities using Scatter with smooth lines. Notice that the T-density tends to the standard normal density as the number of degrees of freedom increases.

Assignment PA249
Graph a standard normal pdf and a Student T-density with 5 dof as above. Using T.INV and NORM.S.INV, compute the 0.025 quantiles, that is, the points where the distribution function equals 0.025. Call them \(x_t\) and \(x_{norm}\). Construct vertical line segments from the horizontal axis to the value of the pdf at \(x_t\) and \(x_{norm}\). Repeat this assignment for a T-distribution with 25 dof.

Problem PS250
When Z is standard normally distributed and U is chi-square distributed with k dof, independent of Z, then \(X = Z/\sqrt{U/k}\) is Student T-distributed with k dof. We simulate this result as follows:
1. Randomly generate 1000 standard normal values in the range A3:A1002: =NORM.S.INV(RAND()). Randomly generate 1000 chi-square values with 5 dof in the range B3:B1002, with the dof 5 in cell B2 (e.g. =CHISQ.INV(RAND();B$2)). Compute the above ratio in column C. Example for cell C3: =A3/SQRT(B3/B$2), and likewise for cells C4:C1002;
2. Compute the relative frequencies of the values in column C, using bin values from -4.4 to 4.4 in steps of 0.4 in cells E3:E25 and the relative frequencies in cells F3:F25. Compute the probabilities of the bin intervals for a T-distribution with 5 dof in cells G3:G25. Example for cell G4: =T.DIST(E4;B$2;1)-T.DIST(E3;B$2;1);
3. Show the relative frequencies (as rectangles) and the probabilities (as a smooth curve) in the same graph.

Assignment PA250
Generate 500 random numbers in column A. Using the corresponding random numbers in column A, randomly generate 500 standard normal values in column B. Using the same random numbers in column A, randomly generate 500 T-values (6 dof) in column C.
1. Compute the mean of the values in columns B and C and compare them to their expected value 0;
2. Compute the median of the values in columns B and C and compare them to their theoretical value 0;
3. Compute the variance of the values in columns B and C and compare them to their theoretical values 1 and 6/4 = 1.5.
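The recipe in Problems PS247, PS248 and PS250 can be cross-checked outside Excel: draw random values, then compare sample moments with theory. Below is a minimal sketch in plain Python, assuming only the standard library. The function names are hypothetical. random.gauss stands in for =NORM.INV(RAND();…). random.gammavariate(k/2, 2) stands in for =CHISQ.INV(RAND();k), which relies on the fact that a chi-square variable with k dof is a gamma variable with shape k/2 and scale 2.

```python
import math
import random
import statistics


def chi2_from_squared_normal(n, mu=10.0, sigma=2.0, rng=random):
    """PS247: X = ((Y - mu)/sigma)^2 with Y ~ N(mu, sigma); X should be chi-square(1)."""
    return [((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n)]


def chi2_from_sample_variance(reps, n=5, mu=10.0, sigma=2.5, rng=random):
    """PS248: (n - 1) * S^2 / sigma^2 for normal samples of size n; chi-square(n - 1)."""
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # statistics.variance uses the n - 1 denominator, like Excel's VAR.S
        out.append((n - 1) * statistics.variance(sample) / sigma ** 2)
    return out


def t_from_ratio(reps, k=5, rng=random):
    """PS250: Z / sqrt(U / k) with independent Z ~ N(0, 1), U ~ chi-square(k); T(k)."""
    out = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        u = rng.gammavariate(k / 2.0, 2.0)  # chi-square(k) = gamma(shape k/2, scale 2)
        out.append(z / math.sqrt(u / k))
    return out


if __name__ == "__main__":
    random.seed(1)
    x1 = chi2_from_squared_normal(250)
    print("PS247 mean, var:", statistics.mean(x1), statistics.variance(x1), "(theory 1, 2)")
    x4 = chi2_from_sample_variance(1000)
    print("PS248 mean, var:", statistics.mean(x4), statistics.variance(x4), "(theory 4, 8)")
    t5 = t_from_ratio(1000)
    print("PS250 mean, var:", statistics.mean(t5), statistics.variance(t5), "(theory 0, 5/3)")
```

Rerunning with a different seed plays the role of recalculating the sheet (F9 in Excel). The sample moments should stay near the theoretical values quoted in the problems.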
Problem PS251
Let \(\bar{X}\) be the sample mean and S the sample standard deviation of a random sample of size n from a normal population with expected value \(\mu\). It is known that \((\bar{X} - \mu)\sqrt{n}/S\) is T-distributed with \(n - 1\) dof. We simulate this result as follows:
1. Generate a random sample of 16 observations from a normal population with expected value 10 and standard deviation 2 in cells A3:A18: =NORM.INV(RAND();10;2). Compute the sample mean in cell C3, the sample standard deviation in cell D3 and the ratio \((\bar{X} - \mu)\sqrt{n}/S\) in cell E3. Use the TABLE option to generate these values 1000 times in columns C, D and E;
2. In cell H3, compute the average of the 1000 ratios in column E and compare it with its theoretical value 0. In cell H6, compute the standard deviation of the 1000 ratios in column E and compare it with its theoretical value \(\sqrt{15/13} = 1.0742\);
3. Compute the relative frequencies of the ratios in column E between -4 and +4.4, using interval widths of 0.4. Compute the probabilities of these intervals assuming a T-distribution with 15 dof. Compare the relative frequencies and the probabilities, also graphically using smooth curves (similar to Steps 2 and 3 in PS250).

Assignment PA251
Consider Problem PS251 above. In Step 1, add a column F where the ratio \((\bar{X} - \mu)\sqrt{n}/\sigma\) is computed. In Step 2, also compute the average and the standard deviation of the ratio values in column F and compare them with their theoretical values.

Problem PS252
Consider Problem PS251, but now generate the sample data from an exponential density with parameter 1 (so \(\mu = 1\)).
1. Generate 64 random exponential values in the range A3:A66: =GAMMA.INV(RAND();1;1);
2. First consider the first 9 sample points (A3:A11). Compute the sample mean in cell C3, the sample standard deviation in cell D3 and the ratio \((\bar{X} - \mu)\sqrt{n}/S\) in cell E3. Next consider the full sample of 64 sample points (A3:A66). Compute the sample mean in cell G3, the sample standard deviation in cell H3 and the ratio \((\bar{X} - \mu)\sqrt{n}/S\) in cell I3. Use the TABLE option to generate these values 1000 times in columns C to I;
3. Compute the average of the 1000 ratios in column E and compare it with the value 0. Notice the significant difference. Compute the standard deviation of the 1000 ratios in column E and compare it with the value \(\sqrt{8/6} = 1.1547\). Notice again the significant difference. Repeat the computations for the sample of 64 sample points and notice that the differences are much smaller (the theoretical standard deviation is \(\sqrt{63/61} = 1.0163\));
4. For both sample sizes, compare the relative frequencies and probabilities graphically, as in Steps 2 and 3 of Problem PS250.

Assignment PA252
Consider the problem above, but generate the sample data from a gamma density with shape parameter 4 and rate parameter 2, so the expected value is 2. In Excel, the third argument of GAMMA.INV is a scale parameter, so use =GAMMA.INV(RAND();4;0.5). Notice the robustness of the T-approximation for the ratio, even for small sample sizes, when the population is not too skewed.

Problem PS253
1. Graph two Fisher F-densities: the first with 10 and 5 dof, the second with 5 and 10 dof. Work as follows: enter the parameters 10 and 5 in cells A2 and B2, and the values .01, .02, …, 1, 1.2, 1.4, …, 6 in cells A4:A43. In cells B4:B43, compute the values of an F-density with 10 and 5 dof. Example for cell B4: =F.DIST(A4;A$2;B$2;0), and likewise for cells B5:B43. Report the values of the F-density with 5 and 10 dof in the range C4:C43. Graph both densities using Scatter with smooth lines;
2. When U is chi-square distributed with k dof and V is chi-square distributed with l dof, independent of U, then \((U/k)/(V/l)\) is F-distributed with k and l dof. We simulate this result. Report the values of the parameters k = 10 and l = 5 in cells A2 and B2. Generate 1000 chi-square random values with 10 dof in the range A4:A1003 and 1000 chi-square random values with 5 dof in the range B4:B1003. Example for cell A4: =CHISQ.INV(RAND();A$2). Compute the ratio in column C. Example for cell C4: =(A4/A$2)/(B4/B$2), and likewise for cells C5:C1003;
3. Compute the relative frequencies of the ratios using bin values 0.2, 0.4, …, 5.8, 6. Put the bin values in cells E3:E32, the relative frequencies in cells F3:F32, and the interval midpoints 0.1, 0.3, …, 5.9 in cells G3:G32. In cells H3:H32, compute the probabilities of the intervals assuming an F-distribution with 10 and 5 dof. Example for cell H4: =F.DIST(E4;A$2;B$2;1)-F.DIST(E3;A$2;B$2;1). Show the relative frequencies (as rectangles) and the probabilities (as a smooth curve) in the same graph;
4. Compute the expected value of an F-distributed random variable with 10 and 5 dof, \(l/(l-2) = 5/3 \approx 1.667\) (=B$2/(B$2-2)), and compare the sample mean of the 1000 ratios to this value. Do likewise for the variance and the sample variance. The variance is \(2l^2(k+l-2)/(k(l-2)^2(l-4)) \approx 7.222\): =2*B2*B2*(A2+B2-2)/(A2*POWER(B2-2;2)*(B2-4)).

Assignment PA253
Generate 1000 random values of an F-distributed random variable with 10 and 5 dof using the function F.INV. Compute the sample mean and sample variance and compare them with the exact values of those parameters.

Problem PS254
We illustrate the property that when X is an F-distributed random variable with k and l dof, then 1/X is F-distributed with l and k dof.
1. Report the values k = 5 and l = 10 in cells A2 and B2. Generate random values of an F-distributed random variable X in cells A4:A1003 (see Problem PS253). Compute the values of 1/X in column B;
2. Work as in Steps 3 and 4 of Problem PS253, using the values of 1/X.

Assignment PA254
Work as above to illustrate the following property: when X is T-distributed with k dof, then \(X^2\) is F-distributed with 1 and k dof.

Problem PS255
Consider two normally distributed populations. Population 1 has expected value \(\mu_1\) and variance \(\sigma_1^2\); population 2 has expected value \(\mu_2\) and variance \(\sigma_2^2\). Take two independent samples of sizes \(n_1\) and \(n_2\), one from each population, and let \(S_1^2\) and \(S_2^2\) be the sample variances. Then \((S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)\) is F-distributed with \(n_1 - 1\) and \(n_2 - 1\) dof. We simulate this result in the following steps:
1. Report the expected value \(\mu_1 = 10\) and the variance \(\sigma_1^2 = 9\) in cells C1 and C2. Report the expected value \(\mu_2 = 8\) and the variance \(\sigma_2^2 = 4\) in cells G1 and G2. Take a random sample of size \(n_1 = 8\) from population 1 in cells A6:A13 and a sample of size \(n_2 = 6\) from population 2 in cells B6:B11. Compute the sample variance \(s_1^2\) in cell D6 and the sample variance \(s_2^2\) in cell E6. Compute the ratio \((s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)\) in cell F6. Use the TABLE option to generate cells D6:F6 1000 times;
2. Use bin values 0.25, 0.5, …, 5.25 in cells I6:I26 and compute the relative frequencies of the ratios in column F in cells J6:J26. Compute the expected relative frequencies in cells K6:K26. Example for cell K7: =F.DIST(I7;7;5;1)-F.DIST(I6;7;5;1), and likewise for the remaining cells;
3. Construct a histogram of the relative frequencies in cells J6:J25 and the probabilities in cells K6:K25, neglecting the tail relative frequencies and probabilities. Change the columns of the probabilities to a Scatter with smooth lines. Notice how the histogram of relative frequencies approximates the smooth F-density;
4. Compute the average of the 1000 ratios in column F and compare it to its expected value \((n_2 - 1)/(n_2 - 3) = 5/3 = 1.667\). Do likewise for the variance of the 1000 ratios. Its theoretical value is \(2(n_2-1)^2(n_1+n_2-4)/((n_1-1)(n_2-3)^2(n_2-5)) = 7.9365\).

Assignment PA255
Repeat the analysis above for exponential populations with expected values 2 and 1. Verify that the results above do not hold for non-normal data.
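A companion sketch for Problems PS252, PS253 and PS255 follows, again assuming plain Python with only the standard library. The function names are hypothetical. random.expovariate(1.0) stands in for =GAMMA.INV(RAND();1;1). f_theory simply transcribes the mean and variance formulas quoted in Step 4 of PS253 and PS255.

```python
import math
import random
import statistics


def chi2(k, rng=random):
    """One chi-square draw with k dof, as a gamma variable (shape k/2, scale 2)."""
    return rng.gammavariate(k / 2.0, 2.0)


def f_from_chi2(reps, k=10, l=5, rng=random):
    """PS253: (U/k)/(V/l) with independent U ~ chi2(k), V ~ chi2(l); should follow F(k, l)."""
    return [(chi2(k, rng) / k) / (chi2(l, rng) / l) for _ in range(reps)]


def f_theory(k, l):
    """Mean (needs l > 2) and variance (needs l > 4) of F(k, l)."""
    mean = l / (l - 2)
    var = 2 * l**2 * (k + l - 2) / (k * (l - 2) ** 2 * (l - 4))
    return mean, var


def variance_ratio(reps, n1=8, n2=6, mu1=10.0, var1=9.0, mu2=8.0, var2=4.0, rng=random):
    """PS255: (S1^2/sigma1^2)/(S2^2/sigma2^2) for two normal samples; F(n1 - 1, n2 - 1)."""
    out = []
    for _ in range(reps):
        s1 = statistics.variance([rng.gauss(mu1, math.sqrt(var1)) for _ in range(n1)])
        s2 = statistics.variance([rng.gauss(mu2, math.sqrt(var2)) for _ in range(n2)])
        out.append((s1 / var1) / (s2 / var2))
    return out


def t_ratio_exponential(reps, n, rng=random):
    """PS252: (Xbar - mu) * sqrt(n) / S for exponential samples with mean mu = 1."""
    out = []
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        out.append((statistics.mean(x) - 1.0) * math.sqrt(n) / statistics.stdev(x))
    return out


if __name__ == "__main__":
    random.seed(1)
    f = f_from_chi2(1000)
    print("PS253 mean:", statistics.mean(f), "theory:", f_theory(10, 5)[0])
    r = variance_ratio(1000)
    print("PS255 mean:", statistics.mean(r), "theory:", f_theory(7, 5)[0])
    for n in (9, 64):
        t = t_ratio_exponential(1000, n)
        print(f"PS252 n={n}: mean {statistics.mean(t):.3f}, sd {statistics.stdev(t):.3f}")
```

Only the simulated means are compared with theory here. With l = 5 denominator dof, the F(k, 5) distribution has no finite fourth moment. The sample variance of simulated ratios (Step 4 of PS253 and PS255) can therefore be expected to fluctuate strongly from run to run.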
