CORRELATIONRAN

The macro is designed to test the significance of the correlation between two variables.

RUNNING THE MACRO

Calling statement
correlationran c1 c2 ;
nran k1 (999) ;
corrs c1.

Input
C1 First variable
C2 Second variable
C1 and C2 must be columns of the same length, containing only numerical values.

Subcommands
nran Number of randomizations used.
corrs Specify a column in which to store the correlation coefficients for the randomization samples.

Output
Number of observations, and means for each variable
Observed correlation coefficient
Number of randomizations
Randomization p-values

Speed of macro : FAST
Missing values : Allowed.

ALTERNATIVE PROCEDURES

Standard procedures
Correlation C2 C1.
This finds the correlation between the data in c1 and the data in c2, and gives the p-value for this correlation.

TECHNICAL DETAILS

Null hypothesis : The two variables are uncorrelated, i.e. ρ = 0.
Test statistic : The Pearson correlation coefficient.
Randomization : We randomize the allocation of the values of the second variable to the values of the first variable, since under the null hypothesis the two variables are independent and every pairing of their values is equally likely.
Note : This macro operates in exactly the same way as the simple linear regression macro, REGRANSIMPLE. The output is substantially different, reflecting the different emphasis of correlation as opposed to regression.

REFERENCES

MANLY, B.F.J. (1997) Randomization, bootstrap and Monte Carlo methods in biology, Chapman and Hall, London (Chapter 8).

WORKED EXAMPLE FOR CORRELATIONRAN

Name of dataset
HEXOKINASE

Description
The data is taken from part of a study by McKechnie, concerning electrophoretic frequencies of the butterfly Euphydryas editha. For each of 18 units (corresponding either to colonies or to sets of colonies), the reciprocal of altitude (altitude originally measured in feet × 10^3) is recorded, together with the percentage frequency of hexokinase 1.00 mobility genes from electrophoresis of samples of Euphydryas editha.
We choose to label these variables "invalt" and "hk" respectively.

Our source
MANLY, B.F.J. (1997) Randomization, bootstrap and Monte Carlo methods in biology, Chapman and Hall, London.

Original source
McKECHNIE, S.W., EHRLICH, P.R. & WHITE, R.R. (1975) Population genetics of Euphydryas butterflies. I. Genetic variation and the neutrality hypothesis, Genetics, 81, pp. 571-594.

Data
Number of observations = 18
Number of variables = 2
For each observation, HK (top) and INVALT (bottom) are given.

HK      98.00 36.00 72.00 67.00 82.00 72.00 65.00  1.00 40.00 39.00  9.00
INVALT   2.00  1.25  1.75  1.82  2.63  1.08  2.08  1.59  0.67  0.57  0.50

HK      19.00 42.00 37.00 16.00  4.00  1.00  4.00
INVALT   0.24  0.40  0.50  0.15  0.13  0.11  0.10

Plot
[Scatter plot of hk (vertical axis) against invalt (horizontal axis).]

Minitab worksheet
C1 HK measurements
C2 INVALT measurements

Aims of analysis
To investigate whether HK and INVALT measurements are correlated.

Standard procedure
Welcome to Minitab, press F1 for help.
MTB > Retrieve "N:\resampling\Examples\Hexokinase.MTW".
Retrieving worksheet from file: N:\resampling\Examples\Hexokinase.MTW
# Worksheet was saved on 06/07/01 14:15:38

Results for: Hexokinase.MTW

Correlation c1 c2.

Correlations: hk, invalt
Pearson correlation of hk and invalt = 0.770
P-Value = 0.000

Resampling procedure
MTB > % N:\resampling\library\correlationran c1 c2 ;
SUBC> nran 499 ;
SUBC> corrs c4.
Executing from file: N:\resampling\library\correlationran.MAC

Data Display (WRITE)
Number of observations                                18
Mean of first variable                                39.11
Mean of second variable                               0.98
Correlation coefficient                               0.770
Number of randomizations                              499
One sided randomization p-value, H1: -ve correlation  1.0000
One sided randomization p-value, H1: +ve correlation  0.0020
Two sided randomization p-value                       0.0040

Modified worksheet
C4 A column containing 499 correlation coefficients, one for each randomized dataset

Discussion
There is clearly a strong positive correlation between the variables.
The standard p-value is reported as 0.000, whilst the two-sided randomization p-value is 0.004, the smallest possible two-sided value for 499 randomizations.
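
For readers without access to the macro, the randomization described under TECHNICAL DETAILS can be sketched in Python using the 18 data values listed in the worked example. This is only an illustrative sketch, not the macro's source. The function names pearson and correlation_randomization and the seed argument are hypothetical. The p-value conventions are assumptions: the observed dataset is counted as one member of the reference set, and the two-sided value is taken as twice the smaller one-sided value, which is consistent with the figures reported above but is not confirmed by the macro documentation.

```python
import math
import random

# Hexokinase data as listed in the worked example (18 observations).
hk = [98, 36, 72, 67, 82, 72, 65, 1, 40, 39, 9, 19, 42, 37, 16, 4, 1, 4]
invalt = [2.00, 1.25, 1.75, 1.82, 2.63, 1.08, 2.08, 1.59, 0.67,
          0.57, 0.50, 0.24, 0.40, 0.50, 0.15, 0.13, 0.11, 0.10]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_randomization(x, y, nran=499, seed=0):
    """Randomization test for correlation (hypothetical helper).

    Shuffles the second variable against the first nran times and
    compares each shuffled correlation with the observed one.
    """
    rng = random.Random(seed)  # seed is only for reproducibility
    r_obs = pearson(x, y)
    shuffled = list(y)
    corrs = []
    for _ in range(nran):
        rng.shuffle(shuffled)  # random re-pairing of y values with x values
        corrs.append(pearson(x, shuffled))

    # Assumption: the observed dataset counts as one of nran + 1 datasets.
    total = nran + 1
    p_pos = (1 + sum(r >= r_obs for r in corrs)) / total  # H1: +ve correlation
    p_neg = (1 + sum(r <= r_obs for r in corrs)) / total  # H1: -ve correlation
    # Assumption: two-sided value is twice the smaller one-sided value.
    p_two = min(1.0, 2 * min(p_pos, p_neg))
    return r_obs, p_neg, p_pos, p_two, corrs


if __name__ == "__main__":
    r_obs, p_neg, p_pos, p_two, corrs = correlation_randomization(hk, invalt)
    print(f"Correlation coefficient {r_obs:.3f}")
    print(f"One sided p-value, H1: -ve correlation {p_neg:.4f}")
    print(f"One sided p-value, H1: +ve correlation {p_pos:.4f}")
    print(f"Two sided p-value {p_two:.4f}")
```

The list corrs plays the role of the column named in the corrs subcommand. Because the shuffles are random, the p-values will generally vary slightly from run to run and from those produced by the macro unless the same random sequence is used.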