A New Approach for Comparing Means of Two Populations
By Brad Moss
Sponsored by Dr. Chand Chauhan

The Problem
• The research being presented addresses a current problem in statistics.
• The problem: how do you compare the means of two different populations when the variance of each population is unknown and the variances are unequal?
• There are many ways to deal with this problem. We will compare a new method to one of the other methods during this presentation.

The Idea!
• After consulting with Dr. Chauhan and reading “A Note on the Ratio of Two Normally Distributed Variables,” we decided to try to get around the unknown/unequal variance problem by taking a ratio of the estimated means, which has not been done before.
• We would approximate the distribution of this ratio as normal and, from that, decide whether or not the means are different.

A Context for the Idea
• Let’s say we are employees of “Get Better Drug Company” and that we are in the process of developing a drug for weight loss.
• Our scientists have developed two drugs, A and B.
• The company can only mass produce one of these drugs.

Things We Will Want to Know
• Overall, what is the mean weight loss for people taking each drug?
• Is one drug more effective than the other?

How To Answer
• First we estimate the two population means with the sample averages, denoted $\bar{Y}$ and $\bar{X}$.
• Then we compare the means by taking either the ratio of the averages or the difference of the averages.
• Let’s consider the ratio of the averages, which is what my research is about.

How this Works
• If the two means are equal, the ratio (otherwise known as a fraction) of the two means should be 1 or close to 1.
• How close to 1 is close enough?
• To answer this, we create what is known as a confidence interval using the estimated ratio and an estimate of its variance.
• If 1 falls into this interval, we will say that the means are statistically the same.

The Ratio Can Be Approximated by a Normal Distribution.
• According to the paper mentioned on slide 3, a ratio of two normally distributed random variables can be approximated by a normal distribution.
• This happens when the standard deviation of one of the random variables is significantly smaller than the other.
• The ratio of the standard deviations should exceed 19 to 9 for this to work, assuming the means are equal.

How Can We Use This
• The ratio of two normally distributed random variables is approximately normal.
• The average values of normally distributed variables are normally distributed as well (see Theorem 8, page 187 in “An Introduction to Probability and Statistical Inference”).
• That is, if $Y$ and $X$ are normally distributed, then $\bar{Y}$ and $\bar{X}$ are normally distributed, so the ratio $\bar{Y}/\bar{X}$ is approximately normal as well.

How Can We Use This
• The last fact is important: we don’t know the true means, so we approximate them with the averages.
• According to the paper, when $Y$ and $X$ are independent, the mean of $Y/X$ is the following:
  \[ E\!\left(\frac{Y}{X}\right) = \frac{\mu_y}{\mu_x}\left(1 + CV_x^2\right) \]
• $CV_x$ is the ratio of the standard deviation of $X$ to its mean, $CV_x = \sigma_x/\mu_x$. The $\mu$’s stand for the true means.

How Can We Use This
• The variance of $Y/X$ is:
  \[ \mathrm{Var}\!\left(\frac{Y}{X}\right) = \frac{\mu_y^2}{\mu_x^2}\left(CV_x^2 + CV_y^2\right) \]
• Once again we are assuming $Y$ and $X$ are independent.
• So, now we want to replace $Y/X$ with the ratio of our approximated means, $\bar{Y}/\bar{X}$.

How Does that Change Things
• From standard results in statistics, we see that
  \[ E(\bar{Y}) = \mu_y, \quad \mathrm{Var}(\bar{Y}) = \frac{\sigma_y^2}{n_y}, \qquad E(\bar{X}) = \mu_x, \quad \mathrm{Var}(\bar{X}) = \frac{\sigma_x^2}{n_x} \]
• Here $E(\cdot)$ stands for the expected value of (…) and $\mathrm{Var}(\cdot)$ stands for the variance of (…).
• The $n$’s stand for sample sizes, with the subscript indicating which sample they belong to.

Now Substituting
• \[ E\!\left(\frac{\bar{Y}}{\bar{X}}\right) = \frac{\mu_y}{\mu_x}\left(1 + \frac{CV_x^2}{n_x}\right) \]
• \[ \mathrm{Var}\!\left(\frac{\bar{Y}}{\bar{X}}\right) = \frac{\mu_y^2}{\mu_x^2}\left(\frac{CV_x^2}{n_x} + \frac{CV_y^2}{n_y}\right) \]
• Since the standard deviation of $X$ is small, $CV_x$ is small, and $CV_x^2$ is further divided by the larger number $n_x$.
• Therefore we can ignore $CV_x^2/n_x$ in the approximate mean. Thus
  \[ E\!\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_y}{\mu_x} \]

The Confidence Interval!
• Now we need to determine how close to 1 we have to be in order to say that the means are the same.
• So we will create an interval around our approximate mean.
• In statistics, we convert our normal values back to standard normal by subtracting the mean and then dividing by the standard deviation.

The Confidence Interval!
• So
  \[ \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \sim N(0,1) \]
• Now we want to choose two numbers “a” and “b” on the real line such that
  \[ P\!\left(a \le \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \le b\right) = 95\% \]

The Confidence Interval!
• Since we are working with a standard normal distribution, we will choose “a” and “b” to be -1.96 and 1.96, because in a standard normal, $P(-1.96 \le Z \le 1.96) = 0.95$.
• So,
  \[ P\!\left(-1.96 \le \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \le 1.96\right) \approx 95\% \]

Doing Some Algebra
• After some algebra (taking $\mu_y/\mu_x > 0$ and $1.96\sqrt{CV_x^2/n_x + CV_y^2/n_y} < 1$) we get:
  \[ P\!\left(\frac{\bar{Y}/\bar{X}}{1 - 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \ge \frac{\mu_y}{\mu_x} \ge \frac{\bar{Y}/\bar{X}}{1 + 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}}\right) \approx 0.95 \]
• Thus the endpoints of the interval are:
  \[ \frac{\bar{Y}/\bar{X}}{1 + 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \quad \text{and} \quad \frac{\bar{Y}/\bar{X}}{1 - 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \]

Results
• So, given those two endpoints, if 1 is between them, then the means are statistically the same.
• Otherwise the means are different.

Results
• Using a program called Minitab, I ran 5000 simulations for several cases, and the results are on the following slides.
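
A simulation of this kind could be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the Minitab procedure that produced the reported results. The function names `ratio_ci` and `simulate_coverage` and the random seed are hypothetical. The sketch also assumes that each coefficient of variation is estimated as the sample standard deviation divided by the sample average, and that N(100, 10) denotes mean 100 and standard deviation 10.

```python
import numpy as np

Z_95 = 1.96  # standard normal cutoff for a 95% interval


def ratio_ci(y, x, z=Z_95):
    """Return (lower, upper) endpoints of the ratio interval for mu_y / mu_x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    ratio = y.mean() / x.mean()
    # Assumption: estimate each CV as sample sd / sample average.
    cv_y = y.std(ddof=1) / y.mean()
    cv_x = x.std(ddof=1) / x.mean()
    half = z * np.sqrt(cv_x**2 / x.size + cv_y**2 / y.size)
    # The endpoints are only meaningful when half < 1 and the ratio is positive.
    return ratio / (1 + half), ratio / (1 - half)


def simulate_coverage(mu_y, sd_y, n_y, mu_x, sd_x, n_x, reps=5000, seed=0):
    """Estimate coverage and mean interval length over repeated normal samples."""
    rng = np.random.default_rng(seed)
    true_ratio = mu_y / mu_x
    hits = 0
    total_length = 0.0
    for _ in range(reps):
        y = rng.normal(mu_y, sd_y, n_y)  # second argument is the sd (assumed)
        x = rng.normal(mu_x, sd_x, n_x)
        lower, upper = ratio_ci(y, x)
        hits += int(lower <= true_ratio <= upper)
        total_length += upper - lower
    return hits / reps, total_length / reps


if __name__ == "__main__":
    coverage, mean_length = simulate_coverage(100, 10, 30, 100, 0.5, 30)
    print(f"coverage = {coverage:.4f}, mean length = {mean_length:.4f}")
```

The call in the `__main__` block mirrors the first equal-sample-size configuration. Exact figures will differ from the Minitab runs because the random draws differ.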

Case 1: Equal Sample Sizes
• Y ~ N(100, 10), X ~ N(100, 0.5), sample size 30
  • Confidence interval coverage: 94.04%
  • Mean length: 0.0711821
• Y ~ N(100, 10), X ~ N(100, 2), sample size 30
  • Confidence interval coverage: 94.42%
  • Mean length: 0.0726839
• Y ~ N(100, 10), X ~ N(100, 5), sample size 30
  • Confidence interval coverage: 94.54%
  • Mean length: 0.0795576

Case 2: Y has smaller sample size
• Y ~ N(100, 10) sample size 15, X ~ N(100, 0.5) sample size 20
  • Confidence interval coverage: 93.34%
  • Mean length: 0.100127
• Y ~ N(100, 10) sample size 15, X ~ N(100, 2) sample size 20
  • Confidence interval coverage: 92.88%
  • Mean length: 0.101451
• Y ~ N(100, 10) sample size 15, X ~ N(100, 5) sample size 20
  • Confidence interval coverage: 93.68%
  • Mean length: 0.108929

Case 3: X has smaller sample size
• Y ~ N(100, 10) sample size 20, X ~ N(100, 0.5) sample size 15
  • Confidence interval coverage: 93.8%
  • Mean length: 0.0869802
• Y ~ N(100, 10) sample size 20, X ~ N(100, 2) sample size 15
  • Confidence interval coverage: 93.56%
  • Mean length: 0.0891369
• Y ~ N(100, 10) sample size 20, X ~ N(100, 5) sample size 15
  • Confidence interval coverage: 93.84%
  • Mean length: 0.100611

Sources
• Hayya, Jack, Donald Armstrong, and Nicolas Gressis. “A Note on the Ratio of Two Normally Distributed Variables.” Management Science 21.11 (1975). 14 Jan 2011 <http://www.jstor.org/stable/2629897>
• Roussas, George. An Introduction to Probability and Statistical Inference. San Diego: Elsevier Science, 2003.