Comparison of means - Clarkson University

W.R. Wilcox, Clarkson University, 7 June 2005
7. Comparison of means
Imagine that we have two populations and that we wish to compare their unknown means by
using samples taken from those populations.1 It is assumed that these populations are normally
distributed.2 First, we must realize that we can never say with 100% certainty by how much the
two population means differ from one another. We can give a probability that one is less than
the other. If this probability is near 100%, then we can be reasonably sure that the two
populations have different means. If the probability is near 50%, then there's a good chance the
two population means are the same or at least nearly the same.
The following MATLAB code gives a function designed to compare means. Copy it into
MATLAB's editor and save it in your working directory.
function compare(X1,X2)
% Comparison of means of two populations using samples X1 and X2
% assuming that both populations are normally distributed.
% W.R. Wilcox, Clarkson University, 7 June 2005
% x_1 and x_2: means of the two sets of samples, X1 and X2
% d_ = x_1 - x_2: difference of means
% n1 and n2: numbers of members of X1 and X2
% s1 and s2: standard deviations of X1 and X2
% nu1 = n1-1, nu2 = n2-1: degrees of freedom for s1^2 and s2^2
% nu: degrees of freedom for calculation of probability
% t: Student's t for nu and alpha2
% alpha2: one-tailed probability that Student's t exceeds the observed t
n1 = length(X1); n2 = length(X2);
x_1 = mean(X1); x_2 = mean(X2);
nu1 = n1 - 1; nu2 = n2 - 1;
fprintf('\nThe mean of the first set of data is %g.\n',x_1);
fprintf('The mean of the second set of data is %g.\n',x_2);
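if n1==n2
    % Equal sample sizes: paired comparison of element-by-element differences
    d = X1 - X2; d_ = mean(d); nu = nu1; sd = std(d);
    t = abs(d_)*sqrt(n1)/sd;
else
    % Unequal sample sizes: unequal-variance t with approximate degrees of freedom
    d_ = x_1 - x_2;
    s1 = std(X1); s2 = std(X2);
    sx12 = s1^2/n1; sx22 = s2^2/n2;
    nu = 1/(sx12^2/(sx12+sx22)^2/nu1 + sx22^2/(sx12+sx22)^2/nu2);
    t = abs(d_)/sqrt(sx12+sx22);
end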
alpha2 = betainc(nu/(nu+t^2),nu/2,1/2)/2;
prob = 100*(1-alpha2);
fprintf('\nThe probability that the mean of the first population\n');
if d_ >= 0
    fprintf('is greater than the mean of the second is %4.1f%%.\n\n',prob);
else
    fprintf('is less than the mean of the second is %4.1f%%.\n\n',prob);
end
if prob < 80
    fprintf('Only if the probability is near 100%% can we be reasonably sure\n');
    fprintf('that the two populations have different means.\n');
    fprintf('To increase the certainty increase the sample size and\n');
    fprintf('make both sample sizes equal.\n');
end
end
1
The two methods used here are based on section 5.5 of "Statistical Analysis in Chemistry and the Chemical
Industry," by C.A. Bennett and N.L. Franklin, Wiley, NY (1954).
2
See 6. Descriptive statistics for measurements of a single variable for tests for normality.
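The line in compare that computes alpha2 relies on the fact that the tail area of Student's t distribution can be written with the regularized incomplete beta function. The short script below is a sketch that checks this numerically for one case. The values of nu and t are arbitrary illustrative choices, and tpdf_fun is a helper name introduced here (the t probability density written out by hand); neither comes from the original handout.

```matlab
% Check: betainc(nu/(nu+t^2), nu/2, 1/2)/2 equals the one-tailed area of
% Student's t distribution beyond t. nu and t are arbitrary test values.
nu = 7;
t = 1.9;

% Expression used in compare
alpha2 = betainc(nu/(nu + t^2), nu/2, 1/2)/2;

% Probability density of Student's t with nu degrees of freedom,
% written out explicitly so that no toolbox is required
tpdf_fun = @(x) gamma((nu + 1)/2) ./ (sqrt(nu*pi)*gamma(nu/2)) ...
    .* (1 + x.^2/nu).^(-(nu + 1)/2);

% One-tailed area from t to infinity by numerical integration
tail = integral(tpdf_fun, t, Inf);

fprintf('betainc expression:   %.6f\n', alpha2);
fprintf('numerical integration: %.6f\n', tail);
```

The two printed numbers should agree to within the integration tolerance, which is why compare can report 100*(1-alpha2) as a one-sided probability without calling a statistics toolbox.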
Test this function by saving the MATLAB data file Xn.mat in your working directory, then load it
into the workspace using File, Import Data. This places the "data" X1, X2, X3 and X4 in
your workspace. Compare the population means for these by executing the following:
>> compare(X1,X2);
>> compare(X1,X3);
>> compare(X1,X4);
>> compare(X2,X3);
>> compare(X2,X4);
>> compare(X3,X4);
What can you conclude from these results?3
3
These "data" were created using MATLAB's randn function and the following values of number of samples n,
mean μ, and square root of variance σ: For X1, n = 15, μ = 10 and σ = 3. For X2, n = 16, μ = 11 and σ = 2. For
X3, n = 15, μ = 11 and σ = 2. For X4, n = 15, μ = 10.5 and σ = 2.
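If Xn.mat is not at hand, comparable "data" can be generated from the values listed in this footnote. The script below is a sketch of one way to do that. It assumes the samples are stored as column vectors, which is a choice made here, and because randn draws new values on every run, the numbers will not match those in the original file.

```matlab
% Generate four normally distributed samples with the sizes, means and
% standard deviations given in the footnote: X = mu + sigma*randn(n,1).
% Column vectors are an assumption; the original file layout is not known.
X1 = 10   + 3*randn(15,1);   % n = 15, mean 10,   standard deviation 3
X2 = 11   + 2*randn(16,1);   % n = 16, mean 11,   standard deviation 2
X3 = 11   + 2*randn(15,1);   % n = 15, mean 11,   standard deviation 2
X4 = 10.5 + 2*randn(15,1);   % n = 15, mean 10.5, standard deviation 2

% Run one equal-size and one unequal-size comparison, but only if
% compare.m has already been saved somewhere on the MATLAB path.
if exist('compare','file') == 2
    compare(X1,X3);   % n1 == n2, so the paired branch is used
    compare(X1,X2);   % n1 ~= n2, so the unequal-variance branch is used
end
```

Running the script several times shows how much the reported probability varies from one set of 15 or 16 samples to the next, even though the underlying population means are fixed.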