Chapter 11. Chi-square, Student’s T and Fisher’s F-distribution

Problem PS246
1. Graph three Chi-square densities with respectively 4, 8 and 12 dof (degrees of freedom). Work as follows: enter the values 0, 0.5, 1, 1.5, …, 25 in the range A3:A53 and the dof 4, 8 and 12 in cells B2:D2. In cell B3, compute the Chi-square density for 4 dof at the point in cell A3: =CHISQ.DIST($A3; B$2; 0). Drag cell B3 to cell D3 and cells B3:D3 to cells B53:D53. Graph the three densities using Scatter with smooth lines. Notice that the density tends to a normal density as the number of dof increases;
2. When X is Chi-square distributed with k dof (\(X \sim \mathrm{Chi}(k)\)), it can be shown that \(E(X) = k\), \(\mathrm{var}(X) = 2k\) and \(\mathrm{mode}(X) = k - 2\) for \(k > 2\). The median of X for k dof can be computed in Excel using =CHISQ.INV(0.5; k). Construct vertical line segments at the value of the mode, the median and the expected value. To add a vertical line for the mean in the graph for dof = 10, input the value 10 in cells I27 and I28, the value 0 in cell J27 and =CHISQ.DIST(I28; 10; 0) in cell J28. Add the series with Series Y-values J27:J28 and Series X-values I27:I28. Work in the same way for the mode and the median.

Assignment PA246
Graph three chi-square densities with 10, 20 and 40 dof. On the same graph, add three normal densities that have the expected value and the variance of the corresponding chi-square random variables. Note how the normal approximation improves as the number of dof increases.

Problem PS247
When Y is a normal random variable with expected value \(\mu\) and standard deviation \(\sigma\), then \(X = ((Y - \mu)/\sigma)^2\) is a Chi-square random variable with 1 degree of freedom. We simulate this result in the following steps:
1. Generate randomly 250 normal values with expected value 10 and standard deviation 2 in the range A2:A251. Compute the corresponding values of X in the range B2:B251. Example for cell B2: =POWER((A2 - 10)/2; 2) and likewise for cells B3:B251.
Use bin values 0, 0.2, 0.4, …, 4.2 in cells E3:E24. Compute the relative frequencies of the sample values in column B using the array formula =FREQUENCY(B2:B251; E3:E24)/250 in cells F3:F24. Input the midpoints of the bins 0.1, 0.3, …, 4.1 in cells G4:G24. Compute the probability of the bin intervals for a chi-square random variable with 1 dof in cells H4:H24. Example for cell H4: =CHISQ.DIST(E4; 1; 1) - CHISQ.DIST(E3; 1; 1) and likewise for cells H5:H24. Graph the relative frequencies and the chi-square probabilities in one graph, with the relative frequencies as rectangles with gap 0 and the probabilities as a Scatter with smooth lines. Notice the close correspondence between relative frequencies and probabilities;
2. Compute the sample mean, the sample variance and the sample median of the sample in column B and compare to the expected value, the variance and the median of a chi-square random variable with 1 dof. The median is computed using =CHISQ.INV(0.5; 1).

Assignment PA247
Illustrate the property that the sum of independent chi-square random variables is chi-square distributed with dof equal to the sum of the dof of the independent variables. Work as follows: generate randomly chi-square random values in cells A2:D2 with respectively 2, 3, 2 and 4 dof. Sum the values in cell E2. Drag cells A2:E2 over 299 more rows. Compute sample means, medians and variances for the samples in the five columns. Graph the histogram of the sample in column E and compare to a chi-square density with 11 dof as in Step 1 above.

Problem PS248
For a random sample \(X_1, X_2, \ldots, X_n\) from a normally distributed population with expected value \(\mu\) and standard deviation \(\sigma\) it holds that \((n - 1)\,s^2/\sigma^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/\sigma^2\) is Chi-square distributed with \(n - 1\) dof (\(\bar{X}\) is the sample mean, \(s^2\) is the sample variance). We simulate this result in the following steps:
1.
Report the expected value \(\mu = 10\) in cell A1 and the standard deviation \(\sigma = 2.5\) in cell B1. Generate a random sample in cells A3:A7: =NORM.INV(RAND(); $A$1; $B$1). Compute the sample variance in cell C3: =VAR.S(A3:A7) and the value of \((n - 1)\,s^2/\sigma^2\) in cell D3: =4*C3/(B1*B1). Use the TABLE-option to generate the values in cells C3:D3 1000 times;
2. Use bin values 1, 2, …, 14, 15 in cells F4:F18 and compute the relative frequencies of the Chi-square values in column D in cells G4:G18. Compute the probabilities of the bin intervals for a chi-square distribution with 4 dof in cells H4:H18. Example for cell H5: =CHISQ.DIST(F5; 4; 1) - CHISQ.DIST(F4; 4; 1) and likewise for the remaining cells;
3. Construct a histogram of the relative frequencies in cells G4:G18 and the chi-square probabilities in cells H4:H18. Change the columns of the probabilities to a Scatter with smooth lines. Notice how the histogram of relative frequencies approximates the smooth chi-square density;
4. Compute the average and the variance of the 1000 values in column D and compare these values to their expected values of \(n - 1 = 4\) and \(2(n - 1) = 8\);
5. Change the value of \(\mu\) in cell A1 to 20, then to 5. Change the value of the standard deviation in cell B1 to 10, then to 1 and check the graphs.

Assignment PA248
In the problem above, replace the value of \(\sum_{i=1}^{n}(X_i - \bar{X})^2/\sigma^2\) by \(\sum_{i=1}^{n}(X_i - \mu)^2/\sigma^2\), which is Chi-square distributed with n dof. Simulate this result following the four steps above.

Problem PS249
Graph the standard normal pdf and three Student T densities with respectively 3, 8 and 25 dof (degrees of freedom). Work as follows: enter the values -4, -3.8, -3.6, …, 3.6, 3.8, 4 in the range A3:A43 and the dof 3, 8 and 25 in cells C2:E2. In cell B3, compute the standard normal density at the point in cell A3: =NORM.S.DIST($A3; 0). In cell C3, compute the T-density with 3 dof at the point in cell A3: =T.DIST($A3; C$2; 0). Drag cell C3 to cell E3 and cells B3:E3 to cells B43:E43.
Graph the four densities using Scatter with smooth lines. Notice that the T-density tends to a standard normal density as the number of degrees of freedom increases.

Assignment PA249
Graph a standard normal pdf and a Student t-density with 5 dof as above. Compute the quantiles (values of the inverse distribution function) for a probability of 0.025 using T.INV and NORM.S.INV, say \(x_t\) and \(x_{norm}\). Construct vertical line segments from the horizontal axis to the value of the pdf at \(x_t\) and \(x_{norm}\). Repeat this assignment for a T distribution with 25 dof.

Problem PS250
When Z is standard normally distributed and U is Chi-square distributed with k dof, independent of Z, then \(T = Z/\sqrt{U/k}\) is Student’s T-distributed with k dof. We simulate this result as follows:
1. Generate randomly 1000 standard normal values in the range A3:A1002: =NORM.S.INV(RAND()). Generate randomly 1000 Chi-square values with 5 dof in the range B3:B1002 (the dof 5 in cell B2). Compute the above ratio in column C. Example for cell C3: =A3/SQRT(B3/B$2) and likewise for cells C4:C1002;
2. Compute relative frequencies of the values in column C for bin values ranging from -4.4 to 4.4 in steps of 0.4 (cells E3:E25 and F3:F25). Compute the probability of the bin intervals for a T-distribution with 5 dof in cells G3:G25. Example for cell G4: =T.DIST(E4; B$2; 1) - T.DIST(E3; B$2; 1);
3. Show the relative frequencies (as rectangles) and the probabilities (as a smooth curve) in the same graph.

Assignment PA250
Generate 500 random numbers in column A. Generate randomly 500 standard normal values in column B using the corresponding random numbers in column A. Generate randomly 500 T-values (6 dof) in column C, again using the random numbers in column A.
1. Compute the mean of the values generated in columns B and C and compare to their expected values of 0;
2. Compute the median of the values generated in columns B and C and compare to their theoretical values of 0;
3.
Compute the variance of the values generated in columns B and C and compare to their theoretical values of 1 and 6/4 = 1.5.

Problem PS251
Let \(\bar{X}\) be the sample mean and S the sample standard deviation of a random sample of size n from a normal population. It is known that \((\bar{X} - \mu)\sqrt{n}/S\) is T-distributed with \(n - 1\) dof. We simulate this result as follows:
1. Generate a random sample of 16 observations from a normal population with expected value 10 and standard deviation 2 in cells A3:A18: =NORM.INV(RAND(); 10; 2). Compute the sample mean in cell C3, the sample standard deviation in cell D3 and the ratio \((\bar{X} - \mu)\sqrt{n}/S\) in cell E3. Use the TABLE-option to generate those values 1000 times in columns C, D and E;
2. Compute the average of the 1000 ratios in column E in cell H3 and compare with its theoretical value 0. Compute the standard deviation of the 1000 ratios in column E in cell H6 and compare with its theoretical value \(\sqrt{15/13} = 1.0742\);
3. Compute the relative frequencies of the ratios in column E between -4 and +4.4 using interval widths of 0.4. Compute the probabilities of these intervals assuming a T distribution with 15 dof. Compare the relative frequencies and the probabilities, also graphically using smooth curves (similar to Steps 2 and 3 in PS250).

Assignment PA251
Consider the problem PS251 above. In Step 1 add a column F where the ratio \((\bar{X} - \mu)\sqrt{n}/\sigma\) is computed. In Step 2 also compute the average and the standard deviation of the ratio values in column F and compare with their theoretical values.

Problem PS252
Consider Problem PS251, but now the sample data are generated from an exponential density with parameter 1.
1. Generate 64 random exponential values in the range A3:A66: =GAMMA.INV(RAND(); 1; 1);
2. First consider the first 9 sample points (A3:A11) and compute the sample mean in cell C3, the sample standard deviation in cell D3 and the ratio \((\bar{X} - \mu)\sqrt{n}/S\) in cell E3.
Next consider the full sample of 64 sample points (A3:A66) and compute the sample mean in cell G3, the sample standard deviation in cell H3 and the ratio \((\bar{X} - \mu)\sqrt{n}/S\) in cell I3. Use the TABLE-option to generate those values 1000 times in columns C to I;
3. Compute the average of the 1000 ratios in column E and compare with the value 0. Notice the significant difference. Compute the standard deviation of the 1000 ratios in column E and compare with the value \(\sqrt{8/6} = 1.1547\). Notice again the significant difference. Repeat the computations for the sample of 64 sample points and notice that the differences have been much reduced (the theoretical standard deviation is \(\sqrt{63/61} = 1.0163\));
4. Compare graphically relative frequencies and probabilities as in Steps 2 and 3 in Problem PS250 for both sample sizes.

Assignment PA252
Consider the problem above but generate the sample data from a gamma density with parameters 4 and 2 (expected value is 2). Notice the robustness of the T-approximation for the ratio even for small sample sizes when the population is not too skewed.

Problem PS253
1. Graph two Fisher F-densities: the first has 10 and 5 dof, the second has 5 and 10 dof. Work as follows: enter the parameters 10 and 5 in cells A2 and B2 and the values 0.01, 0.02, …, 1, 1.2, 1.4, …, 6 in cells A4:A43. In cells B4:B43, compute the values of an F-density with 10 and 5 dof. Example for cell B4: =F.DIST(A4; A$2; B$2; 0) and likewise for cells B5:B43. Report the values of the F-density with 5 and 10 dof in the range C4:C43. Graph both densities using Scatter with smooth lines;
2. When U is Chi-square distributed with k dof and V is Chi-square distributed with l dof, independent of U, then \((U/k)/(V/l)\) is F-distributed with k and l dof. We simulate this result: report values of the parameters \(k = 10\) and \(l = 5\) in cells A2 and B2, generate 1000 chi-square random values with parameter 10 in the range A4:A1003 and 1000 chi-square random values with parameter 5 in the range B4:B1003.
Example for cell A4: =CHISQ.INV(RAND(); A$2). Compute the ratio in column C. Example for cell C4: =(A4/A$2)/(B4/B$2) and likewise for cells C5:C1003;
3. Compute relative frequencies of the ratios using bin values 0.2, 0.4, …, 5.8, 6: the bin values in cells E3:E32, the relative frequencies in cells F3:F32, the midpoints of the intervals 0.1, 0.3, …, 5.9 in cells G3:G32. Compute the probability of the intervals assuming an F-distribution with 10 and 5 dof in cells H3:H32. Example for cell H4: =F.DIST(E4; A$2; B$2; 1) - F.DIST(E3; A$2; B$2; 1). Show the relative frequencies (as rectangles) and the probabilities (as a smooth curve) in the same graph;
4. Compute the expected value of an F-distributed random variable with 10 and 5 dof (=B$2/(B$2 - 2)) and compare the sample mean of the 1000 ratios to this value. Do likewise for the variance, =2*B2*B2*(A2 + B2 - 2)/(A2*POWER(B2 - 2; 2)*(B2 - 4)), and the sample variance.

Assignment PA253
Generate 1000 random values of an F-random variable with parameters 10 and 5 dof using the function F.INV. Compute the sample mean and sample variance and compare with the exact values of those parameters.

Problem PS254
We illustrate the property that when X is an F-distributed random variable with k and l dof, then \(1/X\) is F-distributed with l and k dof.
1. Report the values \(k = 5\) and \(l = 10\) in cells A2 and B2. Generate random values of an F-distributed random variable X in cells A4:A1003 (see Problem PS253). Compute the values of \(1/X\) in column B;
2. Work as in Steps 3 and 4 of Problem PS253 using the values of \(1/X\).

Assignment PA254
Work as above to illustrate the following property: when X is T-distributed with k dof, then \(X^2\) is F-distributed with 1 and k dof.

Problem PS255
Consider two normally distributed populations. Population 1 has expected value \(\mu_1\) and variance \(\sigma_1^2\). Population 2 has expected value \(\mu_2\) and variance \(\sigma_2^2\).
Take two independent samples of sizes \(n_1\) and \(n_2\), one sample from each population. Let \(S_1^2\) and \(S_2^2\) be the sample variances. It holds that \((S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)\) is F-distributed with \(n_1 - 1\) and \(n_2 - 1\) dof. We simulate this result in the following steps:
1. Report the expected value \(\mu_1 = 10\) and the variance \(\sigma_1^2 = 9\) in cells C1 and C2, the expected value \(\mu_2 = 8\) and the variance \(\sigma_2^2 = 4\) in cells G1 and G2. Take a random sample of size \(n_1 = 8\) from population 1 in cells A6:A13 and a random sample of size \(n_2 = 6\) from population 2 in cells B6:B11. Compute the sample variance \(s_1^2\) in cell D6 and the sample variance \(s_2^2\) in cell E6. Compute the ratio \((s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)\) in cell F6. Use the TABLE-option to generate cells D6:F6 1000 times;
2. Use bin values 0.25, 0.5, …, 5.25 in cells I6:I26 and compute the relative frequencies of the ratios in column F in cells J6:J26. Compute the expected relative frequencies in cells K6:K26. Example for cell K7: =F.DIST(I7; 7; 5; 1) - F.DIST(I6; 7; 5; 1) and likewise for the remaining cells;
3. Construct a histogram of the relative frequencies in cells J6:J25 and the probabilities in cells K6:K25 (neglect the tail relative frequencies and probabilities). Change the columns of the probabilities to a Scatter with smooth lines. Notice how the histogram of relative frequencies approximates the smooth F-density;
4. Compute the average of the 1000 ratios in column F and compare it to its expected value \((n_2 - 1)/(n_2 - 3) = 5/3 = 1.667\). Do likewise for the variance of the 1000 ratios, where the theoretical value equals \(2(n_2 - 1)^2(n_1 + n_2 - 4)/((n_1 - 1)(n_2 - 3)^2(n_2 - 5)) = 7.9365\).

Assignment PA255
Repeat the analysis above for exponential populations with expected values of 2 and 1. Verify that the results above do not hold for non-normal data.
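
For readers who want to cross-check the spreadsheet results of Problem PS255 outside Excel, the simulation can be sketched in Python as follows. This is only an illustrative sketch, not part of the original exercise: the function names (simulate_variance_ratios, f_mean, f_variance), the fixed seed and the use of the standard library modules random and statistics are assumptions made here. It uses the same settings as Step 1 (sample sizes 8 and 6, variances 9 and 4, 1000 replications) and the mean and variance formulas of Step 4.

```python
import random
import statistics


def simulate_variance_ratios(n1=8, n2=6, mu1=10.0, var1=9.0,
                             mu2=8.0, var2=4.0, reps=1000, seed=12345):
    """Return `reps` simulated values of (s1^2/var1) / (s2^2/var2)."""
    rng = random.Random(seed)  # seed is an arbitrary choice, for reproducibility
    sd1, sd2 = var1 ** 0.5, var2 ** 0.5
    ratios = []
    for _ in range(reps):
        sample1 = [rng.gauss(mu1, sd1) for _ in range(n1)]
        sample2 = [rng.gauss(mu2, sd2) for _ in range(n2)]
        s1_sq = statistics.variance(sample1)  # sample variance, divisor n - 1
        s2_sq = statistics.variance(sample2)
        ratios.append((s1_sq / var1) / (s2_sq / var2))
    return ratios


def f_mean(d2):
    """Expected value of an F(d1, d2) variable, valid for d2 > 2."""
    return d2 / (d2 - 2)


def f_variance(d1, d2):
    """Variance of an F(d1, d2) variable, valid for d2 > 4."""
    return 2 * d2 ** 2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))


ratios = simulate_variance_ratios()
print("sample mean         :", round(statistics.mean(ratios), 4))
print("theoretical mean    :", round(f_mean(5), 4))
print("sample variance     :", round(statistics.variance(ratios), 4))
print("theoretical variance:", round(f_variance(7, 5), 4))
```

The two theoretical values reproduce 5/3 = 1.667 and 7.9365 from Step 4. The sample variance of the ratios can differ noticeably from 7.9365 from run to run, because an F-distribution with only 5 denominator dof has a heavy right tail; the sample mean and median are more stable.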