W.R. Wilcox, Clarkson University, 7 June 2005

7. Comparison of means

Imagine that we have two populations and that we wish to compare their unknown means by using samples taken from those populations.¹ It is assumed that these populations are normally distributed.²

First, we must realize that we can never say with 100% certainty by how much the two population means differ from one another. What we can give is the probability that one mean is less than the other. If this probability is near 100%, we can be reasonably sure that the two populations have different means. If it is near 50%, the samples give no indication of which mean is larger, and there is a good chance that the two population means are the same or at least nearly the same.

The following MATLAB code defines a function, compare, designed to compare means. Copy it into MATLAB's editor and save it in your working directory as compare.m.

```matlab
function compare(X1,X2)
% Comparison of means of two populations using samples X1 and X2,
% assuming that both populations are normally distributed.
% W.R. Wilcox, Clarkson University, 7 June 2005
% x_1 and x_2: means of the two sets of samples, X1 and X2
% d_ = x_1 - x_2: difference of means
% n1 and n2: numbers of members of X1 and X2
% s1 and s2: standard deviations of X1 and X2
% nu1 = n1-1, nu2 = n2-1: degrees of freedom for s1^2 and s2^2
% nu: degrees of freedom for calculation of probability
% t: Student's t for nu and alpha2
% alpha2: one-tailed area of Student's t distribution (nu d.o.f.) beyond t

n1 = length(X1); n2 = length(X2);
x_1 = mean(X1); x_2 = mean(X2);
nu1 = n1 - 1; nu2 = n2 - 1;
fprintf('\nThe mean of the first set of data is %g.\n',x_1);
fprintf('The mean of the second set of data is %g.\n',x_2);
if n1==n2
    % Equal sample sizes: pair X1(i) with X2(i) and apply Student's t
    % to the differences; (:) makes both column vectors so that a row
    % and a column input are not expanded into a matrix.
    d = X1(:) - X2(:);
    d_ = mean(d);
    nu = nu1;
    sd = std(d);
    t = abs(d_)*sqrt(n1)/sd;
else
    % Unequal sample sizes: use the separate variances of the two
    % sample means and an approximate number of degrees of freedom.
    d_ = x_1 - x_2;
    s1 = std(X1); s2 = std(X2);
    sx12 = s1^2/n1; sx22 = s2^2/n2;   % variances of the two sample means
    nu = 1/(sx12^2/(sx12+sx22)^2/nu1 + sx22^2/(sx12+sx22)^2/nu2);
    t = abs(d_)/sqrt(sx12+sx22);
end
% betainc(nu/(nu+t^2),nu/2,1/2) is the two-tailed area beyond +/-t,
% so half of it is the one-tailed area alpha2.
alpha2 = betainc(nu/(nu+t^2),nu/2,1/2)/2;
prob = 100*(1-alpha2);
fprintf('\nThe probability that the mean of the first population\n');
if d_ >= 0
    fprintf('is greater than the mean of the second is %4.1f%%.\n\n',prob);
else
    fprintf('is less than the mean of the second is %4.1f%%.\n\n',prob);
end
if prob < 80
    fprintf('Only if the probability is near 100%% can we be reasonably sure\n');
    fprintf('that the two populations have different means.\n');
    fprintf('To increase the certainty, increase the sample size and\n');
    fprintf('make both sample sizes equal.\n');
end
end
```

Test this function by saving the MATLAB data file Xn.mat in your working directory and then loading it into MATLAB using File, Import Data (or by typing load Xn in the Command Window). This places the "data" X1, X2, X3 and X4 in your workspace. Compare the population means for these by executing the following:

```
>> compare(X1,X2);
>> compare(X1,X3);
>> compare(X1,X4);
>> compare(X2,X3);
>> compare(X2,X4);
>> compare(X3,X4);
```

What can you conclude from these results?³

¹ The two methods used here are based on section 5.5 of C.A. Bennett and N.L. Franklin, "Statistical Analysis in Chemistry and the Chemical Industry," Wiley, New York (1954).
² See section 6, "Descriptive statistics for measurements of a single variable," for tests for normality.
³ These "data" were created using MATLAB's randn function with the following values of the number of samples n, the mean μ, and the square root of the variance, σ: for X1, n = 15, μ = 10 and σ = 3; for X2, n = 16, μ = 11 and σ = 2; for X3, n = 15, μ = 11 and σ = 2; for X4, n = 15, μ = 10.5 and σ = 2.
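For reference, the quantities that compare evaluates can be written out as equations. The symbols mirror the variable names in the code (d̄ for d_, s_d for sd, s²_x̄1 and s²_x̄2 for sx12 and sx22, α₂ for alpha2). The labels "paired t" and "Welch–Satterthwaite" are the usual textbook names for these two formulas and are an assumption here, not taken from the source.

```latex
% Equal sample sizes (n_1 = n_2 = n), paired t on d_i = X1_i - X2_i
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
t = \frac{|\bar{d}|\,\sqrt{n}}{s_d}, \qquad \nu = n - 1
% Unequal sample sizes: variances of the two sample means
s_{\bar{x}_1}^2 = \frac{s_1^2}{n_1}, \qquad
s_{\bar{x}_2}^2 = \frac{s_2^2}{n_2}, \qquad
t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2}}
% Approximate degrees of freedom (Welch-Satterthwaite form)
\nu = \frac{\left(s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2\right)^2}
           {\dfrac{\left(s_{\bar{x}_1}^2\right)^2}{\nu_1} + \dfrac{\left(s_{\bar{x}_2}^2\right)^2}{\nu_2}},
\qquad \nu_1 = n_1 - 1, \quad \nu_2 = n_2 - 1
% One-tailed area beyond t via the regularized incomplete beta function I_x(a,b)
\alpha_2 = P(T_\nu > t) = \tfrac{1}{2}\, I_{\nu/(\nu + t^2)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right),
\qquad \mathrm{prob} = 100\,(1 - \alpha_2)\ \%
```

The expression for ν is the code's nu line with the common factor (s²_x̄1 + s²_x̄2)² moved into the numerator. MATLAB's betainc(X,Z,W) returns the regularized incomplete beta function I_X(Z,W), which for X = ν/(ν + t²), Z = ν/2 and W = 1/2 is the two-tailed area of Student's t distribution beyond ±t; halving it gives α₂.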