Chapter 16. Testing a correlation

Problem PS300
Two-sided test of the correlation coefficient of bivariate normal data.
1. Use Step 3 of Problem PP177 to generate a sample of 50 values of (X, Y) in the range A2:B51 (with the correlation -0.5 in cell C1);
2. Compute the sample correlation in cell D2: =CORREL(A2:A51; B2:B51) and the sample statistic =1/2*LN((1+D2)/(1-D2)) in cell D3. Compute the value =1/2*LN((1+C1)/(1-C1)) in cell D4;
3. Compute the sample statistic (z-value) =(D3-D4)/SQRT(1/47) in cell D6 and the p-value of the two-sided test in cell D7: =2*(1-NORM.S.DIST(ABS(D6); 1));
4. Repeat the test several times (key F9) and notice that in most cases the hypothesis ρ = -0.5 will be accepted (as it should be);
5. Derive a 95% C.I. for the correlation in the population. Enter the confidence coefficient 0.95 in cell I2 and compute the corresponding z-value in cell J2: =NORM.S.INV((1+I2)/2). Compute the value =SQRT(1/47) in cell K2. Derive the lower bound of the C.I. in cell I4: =((1+D2)*EXP(-2*J2*K2)-(1-D2))/((1+D2)*EXP(-2*J2*K2)+(1-D2)) and the upper bound in cell I5: =((1+D2)*EXP(2*J2*K2)-(1-D2))/((1+D2)*EXP(2*J2*K2)+(1-D2)). Notice that the true value of the correlation (-0.5) is usually within the interval when the simulation is repeated;
6. Change the confidence coefficient in cell I2 to 0.9, then to 0.99, and repeat the simulation a few times;
7. Change the population correlation in cell C1 to 0 and to 0.8 and check the results.

Assignment PA300
Generate a sample of 100 values of the standard normal variable Z. Generate a second sample of values of the random variable Y by squaring the first sample values. Apply a two-sided test for the hypothesis ρ_ZY = corr(Z, Y) = 0 and derive a 90% confidence interval. Notice that the population correlation was derived as 0 in Problem PP176.
A fairly large sample is needed as the distribution of (Z, Y) is not bivariate normal.

Problem PS301
An exact two-sided test of the correlation coefficient of bivariate normal data based on the T-distribution.
1. Repeat Steps 1 to 3 of Problem PS300 but use a correlation of 0 (cell C1) between both variables;
2. To apply the exact T-test, compute the value of the T-statistic in cell D9: =ABS(D2*SQRT(48)/SQRT(1-D2*D2)) and the p-value in cell D10: =2*(1-T.DIST(D9; 48; 1));
3. Compare the p-values of the exact test in cell D10 and the approximate test in cell D7.

Assignment PA301
Generate a sample of 100 values of the standard normal variable Z. Generate a second sample of values of the random variable Y by squaring the first sample values. Apply a two-sided T-test for the hypothesis ρ_ZY = corr(Z, Y) = 0. Notice that the population correlation was derived as 0 in Problem PP176. A fairly large sample is needed as the distribution of (Z, Y) is not bivariate normal.

Problem PS302
Consider the data set ‘Baguette’.
1. Test the hypothesis that there is no correlation between weight and price. Use a one-sided test, as we expect that there might be a positive correlation. Also note that the sample is fairly large, so that we can use both a T-test and a Z-test. To start with the T-test: compute the sample correlation in cell H2: =CORREL(A2:A75; C2:C75). Compute the T-statistic in cell H3: =ABS(H2*SQRT(72)/SQRT(1-H2*H2)). Derive the p-value in cell H4: =1-T.DIST(H3; 72; 1). Answer: sample correlation = 0.2157, T-statistic = 1.8742, p-value = 0.0325. The hypothesis of no correlation can be rejected at the 5% significance level (but not at the 1% level);
2. To apply the Z-test, compute the correlation in cell M2, the sample statistic =0.5*LN((1+M2)/(1-M2)) in cell M3, the Z-statistic in cell M4: =ABS(M3/SQRT(1/71)) and the p-value in cell M5: =1-NORM.S.DIST(M4; 1).
Notice that the p-value is almost identical to the p-value based on the T-test and that the same conclusion is reached.

Assignment PA302
Consider the data set ‘Baguette’. Compute the values of price/100g in column F. Apply a one-sided test of the hypothesis that there is no correlation between the variables ‘weight’ and ‘price/100g’. Use both the T-test and the Z-test.

Problem PS303
Consider the data set ‘Decathlon2011’.
1. Compute the sample correlation between the variables ‘100m’ and ‘total points’ in cell Q2. Repeat the computation for the remaining 9 disciplines and ‘total points’ in cells R2 to Z2. Do the signs of the correlations correspond to what we expect?
2. Use one-sided T-tests to test the hypothesis that there is no correlation between the results in the various disciplines and the total number of points. Compute the T-statistics in row 3 and the p-values in row 4 (see Problems PS301 or PS302 for the necessary Excel instructions).

Assignment PA303
Consider the data set ‘Decathlon2011’. Compute the sample correlations between the variable ‘100m’ and the 9 remaining disciplines. Do the signs of the correlations correspond to what we expect? Use one-sided T-tests to test the hypothesis that there is no correlation between the result for the 100m and the results for the other disciplines.
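
The one-sided T-test and Z-test used in Problems PS302 and PS303 can also be checked outside Excel. The following is a minimal Python sketch and not part of the original exercises. The function names are hypothetical, and the Student t CDF is approximated by Simpson integration of the density as a stand-in for Excel's T.DIST. The inputs r = 0.2157 and n = 74 are taken from the answer and the cell ranges of Problem PS302.

```python
import math
from statistics import NormalDist


def student_t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom.

    Assumption: Simpson's rule on the density over [0, t] replaces
    Excel's T.DIST(t; df; 1); `steps` must be even.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so the CDF is 0.5 plus the integral from 0 to t.
    return 0.5 + total * h / 3


def t_test_correlation(r, n):
    """One-sided exact T-test of rho = 0; requires |r| < 1.

    Mirrors =ABS(r*SQRT(n-2)/SQRT(1-r*r)) and =1-T.DIST(t; n-2; 1).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, 1.0 - student_t_cdf(t, df)


def z_test_correlation(r, n, rho0=0.0):
    """One-sided approximate Z-test based on 1/2*LN((1+r)/(1-r)).

    math.atanh(r) equals 1/2*ln((1+r)/(1-r)); the standard error
    is sqrt(1/(n-3)), as in the Excel steps.
    """
    z = abs(math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    return z, 1.0 - NormalDist().cdf(z)


# Inputs from Problem PS302 (74 rows in A2:A75, sample correlation 0.2157).
t_stat, p_t = t_test_correlation(0.2157, 74)
z_stat, p_z = z_test_correlation(0.2157, 74)
print(f"T-statistic = {t_stat:.4f}, p-value = {p_t:.4f}")
print(f"Z-statistic = {z_stat:.4f}, p-value = {p_z:.4f}")
```

With these inputs the two p-values should agree closely, in line with the remark in Problem PS302. The same two functions can be applied to each discipline in Problem PS303 by passing the corresponding sample correlation and sample size.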