Chapter 16. Testing a correlation
Problem PS300
Two-sided test of the correlation coefficient of bivariate normal data.
1. Use Step 3 of Problem PP177 to generate a sample of 50 values of (X, Y) in the range
A2:B51, with the population correlation -0.5 in cell C1;
2. Compute the sample correlation in cell D2: =CORREL(A2:A51;B2:B51) and the
sample statistic (the Fisher z-transform of the sample correlation) =1/2*LN((1+D2)/(1-D2)) in cell D3.
Compute the transformed hypothesized value =1/2*LN((1+C1)/(1-C1)) in cell D4;
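3. Compute the sample statistic (z-value) =(D3-D4)/SQRT(1/47) in cell D6, where 1/47 = 1/(n − 3)
with n = 50 is the approximate variance of the transformed statistic, and the p-value of the
two-sided test in cell D7: =2*(1-NORM.S.DIST(ABS(D6);1)). The ABS is needed because D6 is
negative whenever the sample correlation is below -0.5, and without it the p-value would exceed 1;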
4. Repeat the test several times (key F9) and notice that in most cases the hypothesis
ρ = −0.5 is not rejected, as it should be: at the 5% level it is wrongly rejected in only
about 5% of the repetitions;
5. Derive a 95% C.I. for the correlation in the population. Enter the confidence
coefficient 0.95 in cell I2 and the corresponding z-value in cell J2:
=NORM.S.INV((1+I2)/2). Compute the standard error =SQRT(1/47) in cell K2. Derive the
lower bound of the C.I. in cell I4:
=((1+D2)*EXP(-2*J2*K2)-(1-D2))/((1+D2)*EXP(-2*J2*K2)+(1-D2))
and the upper bound in cell I5:
=((1+D2)*EXP(2*J2*K2)-(1-D2))/((1+D2)*EXP(2*J2*K2)+(1-D2))
These bounds are the interval D3 ± J2*K2 for the transformed statistic, transformed back to
the correlation scale. Notice that the true value of the correlation (-0.5) lies within the
interval in most repetitions of the simulation (in about 95% of them for a 95% C.I.);
6. Change the confidence coefficient in cell I2 to 0.9, then to 0.99, and repeat the
simulation a few times;
7. Change the population correlation in cell C1 to 0 and to 0.8 and check the results.
Assignment PA300
Generate a sample of 100 values of the standard normal variable Z. Generate a second
sample of values of the random variable Y by squaring the first sample values. Apply a two-sided test for the hypothesis ρ_ZY = corr(Z, Y) = 0 and derive a 90% confidence interval.
Notice that the population correlation was derived as 0 in Problem PP176. A fairly large
sample is needed as the distribution of (Z, Y) is not bivariate normal.
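One way to check the zero population correlation quoted for this assignment is a short moment calculation. It uses E[Z] = E[Z³] = 0 (odd moments of the standard normal distribution vanish by symmetry), E[Z²] = 1 and E[Z⁴] = 3:

```latex
\begin{aligned}
\operatorname{Cov}(Z, Y) &= \operatorname{Cov}(Z, Z^2) = E[Z^3] - E[Z]\,E[Z^2] = 0 - 0 \cdot 1 = 0,\\
\operatorname{Var}(Y) &= E[Z^4] - \left(E[Z^2]\right)^2 = 3 - 1 = 2 > 0,\\
\rho_{ZY} &= \frac{\operatorname{Cov}(Z, Y)}{\sqrt{\operatorname{Var}(Z)\,\operatorname{Var}(Y)}}
          = \frac{0}{\sqrt{1 \cdot 2}} = 0.
\end{aligned}
```

Y is nevertheless completely determined by Z, so the two variables are uncorrelated but not independent. For a bivariate normal pair, zero correlation would imply independence, which is another way of seeing that (Z, Y) is not bivariate normal.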
Problem PS301
An exact two-sided test of zero correlation (ρ = 0) for bivariate normal data, based on the
T-distribution.
1. Repeat Steps 1 to 3 of Problem PS300, but use a population correlation of 0 (cell C1)
between the two variables;
2. To apply the exact T-test, compute the value of the T-statistic in cell D9:
=ABS(D2*SQRT(48)/SQRT(1-D2*D2)) and the p-value in cell D10:
=2*(1-T.DIST(D9;48;1)), using 48 = n − 2 degrees of freedom;
3. Compare the p-values of the exact test in cell D10 and the approximate test in cell
D7.
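Python's standard library has no Student t distribution, so the minimal sketch below evaluates the two-sided tail with the classical finite series for an integer number of degrees of freedom. This series is a substitute for Excel's T.DIST, not a reproduction of it, and the function names are illustrative. The sketch compares the exact T-test of Step 2 with the approximate Fisher Z-test of Problem PS300 (with C1 = 0) for a given sample correlation r.

```python
import math
from statistics import NormalDist


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with an integer df >= 1.

    Classical finite series in theta = atan(|t| / sqrt(df)); it should agree
    with Excel's =2*(1-T.DIST(ABS(t);df;1)).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # Even df: P(|T| < t) = s * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...)
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        inside = s * total
    else:
        # Odd df: P(|T| < t) = (2/pi) * (theta + s * (c + (2/3)c^3 + ...))
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c * c
                total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)  # guard against tiny negative rounding


def t_test_zero_corr(r, n):
    """Exact T-test of rho = 0: cells D9 and D10 (n - 2 degrees of freedom)."""
    t = abs(r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r))
    return t, t_two_sided_p(t, n - 2)


def z_test_zero_corr(r, n):
    """Approximate Fisher Z-test of rho = 0: cells D6 and D7 with C1 = 0."""
    z = 0.5 * math.log((1.0 + r) / (1.0 - r)) / math.sqrt(1.0 / (n - 3))
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    n = 50
    for r in (0.05, 0.2, 0.3):  # illustrative sample correlations
        print(r, "exact p =", t_test_zero_corr(r, n)[1],
              "approx p =", z_test_zero_corr(r, n)[1])
```

For n = 50 the two p-values typically differ only in the third decimal, which is what Step 3 asks you to observe in cells D7 and D10.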
Assignment PA301
Generate a sample of 100 values of the standard normal variable Z. Generate a second
sample of values of the random variable Y by squaring the first sample values. Apply a two-sided T-test for the hypothesis ρ_ZY = corr(Z, Y) = 0. Notice that the population correlation
was derived as 0 in Problem PP176. A fairly large sample is needed as the distribution of
(Z, Y) is not bivariate normal.
Problem PS302
Consider the data set ‘Baguette’.
1. Test the hypothesis that there is no correlation between weight and price.
Use a one-sided test, as we expect that there might be a positive correlation. Also
note that the sample is fairly large (74 observations), so that we can use both a T-test
and a Z-test. To start with the T-test: compute the sample correlation in cell H2:
=CORREL(A2:A75;C2:C75). Compute the T-statistic in cell H3:
=ABS(H2*SQRT(72)/SQRT(1-H2*H2)), where 72 = n − 2. Derive the p-value in cell H4:
=1-T.DIST(H3;72;1).
Answer: sample correlation = 0.2157, T-statistic = 1.8742, p-value = 0.0325. The hypothesis
of no correlation is rejected at the 5% significance level, although not at the 1% level;
2. To apply the Z-test, compute the correlation in cell M2 (as in cell H2), the sample statistic
=0.5*LN((1+M2)/(1-M2)) in cell M3, the Z-statistic in cell M4: =ABS(M3/SQRT(1/71)), where
71 = n − 3, and the p-value in cell M5: =1-NORM.S.DIST(M4;1).
Notice that the p-value is almost identical to the p-value based on the T-test and that
the same conclusion is reached.
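As a check on the reported answer, the minimal sketch below recomputes both one-sided p-values from the rounded sample correlation r = 0.2157 and n = 74 (rows 2 to 75). It repeats the standard-library t-tail series, which is an assumption standing in for T.DIST because Python's standard library has no t distribution; function names are illustrative. Because r is rounded to four decimals, the T-statistic comes out close to, but not exactly, 1.8742.

```python
import math
from statistics import NormalDist


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with an integer df >= 1 (finite series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        inside = s * total
    else:
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c * c
                total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)


def one_sided_t(r, n):
    """Cells H3 and H4: T-statistic and =1-T.DIST(H3;n-2;1)."""
    t = abs(r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r))
    # For t >= 0 the one-sided tail is half of the two-sided tail.
    return t, t_two_sided_p(t, n - 2) / 2.0


def one_sided_z(r, n):
    """Cells M3 to M5: Fisher statistic, Z-statistic and =1-NORM.S.DIST(M4;1)."""
    z = abs(0.5 * math.log((1.0 + r) / (1.0 - r)) / math.sqrt(1.0 / (n - 3)))
    return z, 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    r, n = 0.2157, 74  # reported sample correlation; 74 observations
    print("T-test:", one_sided_t(r, n))
    print("Z-test:", one_sided_z(r, n))
```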
Assignment PA302
Consider the data set ‘Baguette’. Compute the values of price/100g in column F. Perform a one-sided test of the hypothesis that there is no correlation between the variables ‘weight’
and ‘price/100g’. Use both the T-test and the Z-test.
Problem PS303
Consider the data set ‘Decathlon2011’.
1. Compute the sample correlation between the variables ‘100m’ and ‘total points’ in
cell Q2. Repeat the computation for the remaining 9 disciplines and ‘total points’ in
cells R2 to Z2. Do the signs of the correlations correspond to what we expect?
2. Use one-sided T-tests to test the hypothesis that there is no correlation
between the results in the various disciplines and the total number of points.
Compute the T-statistics in row 3 and the p-values in row 4 (see Problems
PS301 or PS302 for the necessary Excel instructions).
Assignment PA303
Consider the data set ‘Decathlon2011’.
Compute the sample correlations between the variable ‘100m’ and the 9 remaining
disciplines. Do the signs of the correlations correspond to what we expect?
Use one-sided T-tests to test the hypothesis that there is no correlation between
the results for the 100m and the results for the other disciplines.