A New Approach for Comparing Means of Two Populations By Brad Moss

Sponsored by Dr. Chand Chauhan

The Problem
• The research presented here addresses a current problem in statistics.
• The problem: how do you compare the means of two different populations when the variance of each population is unknown and the two variances are unequal?
• There are many ways to deal with this problem. We will compare a new method to one of the other methods during this presentation.

The Idea!
• After consulting with Dr. Chauhan and reading “A Note on the Ratio of Two Normally Distributed Variables,” we decided to try to get around the unknown/unequal variance problem by taking a ratio of the estimated means, which has not been done before.
• We approximate this ratio with a normal distribution and, from that, decide whether or not the means are different.

A Context for the Idea
• Let’s say we are employees of “Get Better Drug Company” and that we are in the process of developing a drug for weight loss.
• Our scientists have developed two drugs, A and B.
• The company can only mass-produce one of these drugs.

Things We Will Want to Know
• Overall, what is the mean weight loss for people taking each drug?
• Is one drug more effective than the other?

How To Answer
• First we estimate the two population means with the sample averages $\bar{Y}$ and $\bar{X}$.
• Then we compare the means by taking either the ratio of the averages or the difference of the averages.
• Let’s consider the ratio of the averages, which is what my research is about.

How this Works
• If the two means are equal, the ratio (otherwise known as a fraction) of the two means should be one or close to one.
• How close to 1 is close enough?
• For this, we create what is known as a confidence interval, using the estimated ratio and an estimate of the variance.
• If 1 falls into this interval, we will say that the means are statistically the same.

The Ratio Can Be Approximated by a Normal Distribution
• According to the paper mentioned on slide 3, a ratio of two normally distributed random variables can be approximated by a normal distribution.
• This happens when the standard deviation of one of the random variables is significantly smaller than that of the other.
• The ratio of the standard deviations should exceed 19 to 9 for this to work, assuming the means are equal.

How Can We Use This
• The ratio of two normally distributed random variables is approximately normal.
• The averages of normally distributed variables are normally distributed as well (see Theorem 8, page 187 in “An Introduction to Probability and Statistical Inference”).
• That is, if Y and X are normally distributed, then $\bar{Y}$ and $\bar{X}$ are normally distributed.

How Can We Use This
• The last fact is important because we don’t know the true means, so we approximate them with the averages.
• According to the paper, when Y and X are independent, the mean of $Y/X$ is the following:
$$\frac{\mu_y}{\mu_x}\left(1 + CV_x^2\right)$$
• $CV_x$ is the ratio of the standard deviation of X to its mean. The $\mu$’s stand for the true means.

How Can We Use This
• The variance of $Y/X$ is:
$$\frac{\mu_y^2}{\mu_x^2}\left(CV_x^2 + CV_y^2\right)$$
• Once again we are assuming Y and X are independent.
• So, now we want to replace $Y/X$ with the ratio of our approximated means, $\bar{Y}/\bar{X}$.

How Does that Change Things
• From basic statistics, we have
$$E(\bar{Y}) = \mu_y, \qquad E(\bar{X}) = \mu_x, \qquad Var(\bar{Y}) = \frac{\sigma_y^2}{n_y}, \qquad Var(\bar{X}) = \frac{\sigma_x^2}{n_x}$$
• Here E() stands for the expected value of (…) and Var() stands for the variance of (…).
• The n’s stand for sample sizes, with the subscript indicating which sample they belong to.

Now Substituting
• $$E\left(\frac{\bar{Y}}{\bar{X}}\right) = \frac{\mu_y}{\mu_x}\left(1 + \frac{CV_x^2}{n_x}\right)$$
• $$Var\left(\frac{\bar{Y}}{\bar{X}}\right) = \frac{\mu_y^2}{\mu_x^2}\left(\frac{CV_x^2}{n_x} + \frac{CV_y^2}{n_y}\right)$$
• Since the standard deviation of X is small, $CV_x$ is small, and its square is being divided by a bigger number, the sample size $n_x$.
• Therefore we can ignore $\frac{CV_x^2}{n_x}$ in the approximate mean. Thus
$$E\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_y}{\mu_x}$$

The Confidence Interval!
• Now we need to determine how close to 1 we have to be in order to say that the means are the same.
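• So we will create an interval around our approximate mean.
• In statistics, we convert normal values to standard normal values by subtracting the mean and then dividing by the standard deviation.

The Confidence Interval!
• So, approximately,
$$\frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \sim N(0,1)$$
• Now we want to choose two numbers “a” and “b” on the real line such that the probability of
$$a \le \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \le b$$
is 95%.

The Confidence Interval!
• Since we are working with a standard normal distribution, we will choose “a” and “b” to be -1.96 and 1.96, because for a standard normal variable Z, $P(-1.96 \le Z \le 1.96) = 0.95$.
• So,
$$P\left(-1.96 \le \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \le 1.96\right) \approx 95\%$$

Doing Some Algebra
• After some algebra (multiplying through by the denominator and solving for $\mu_y/\mu_x$, which assumes the means are positive and $1.96\sqrt{\frac{CV_x^2}{n_x} + \frac{CV_y^2}{n_y}} < 1$) we get:
$$P\left(\frac{\bar{Y}/\bar{X}}{1 - 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \ge \frac{\mu_y}{\mu_x} \ge \frac{\bar{Y}/\bar{X}}{1 + 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}}\right) \approx 0.95$$
• Thus the endpoints of the interval are:
$$\frac{\bar{Y}/\bar{X}}{1 + 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \quad \text{and} \quad \frac{\bar{Y}/\bar{X}}{1 - 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}}$$

Results
• So, given those two endpoints, if 1 is between them, then the means are statistically the same.
• Otherwise the means are different.

Results
• Using a program called Minitab, I ran 5000 simulations for several cases, and the results are on the following slides.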
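The interval and the simulation study described above can be sketched in a few lines of Python. This is only an illustration, not the Minitab procedure used for the reported results. It rests on several assumptions that are not stated in the slides: the unknown $CV_x$ and $CV_y$ are estimated by sample standard deviation divided by sample mean, the second parameter in notation such as N(100, 10) is a standard deviation, and both means are positive. The function names `ratio_interval` and `simulate` are hypothetical.

```python
import math
import random
import statistics


def ratio_interval(y, x, z=1.96):
    """Return (lower, upper) endpoints for mu_y / mu_x from two independent samples."""
    ybar, xbar = statistics.fmean(y), statistics.fmean(x)
    # Assumption: each coefficient of variation is estimated by
    # sample standard deviation / sample mean.
    cv_y = statistics.stdev(y) / ybar
    cv_x = statistics.stdev(x) / xbar
    half = z * math.sqrt(cv_x ** 2 / len(x) + cv_y ** 2 / len(y))
    if half >= 1:
        # The algebra behind the endpoints needs 1 - z*sqrt(...) > 0.
        raise ValueError("interval undefined: z * sqrt(...) must be below 1")
    ratio = ybar / xbar  # assumed positive, as in the weight-loss example
    return ratio / (1 + half), ratio / (1 - half)


def simulate(mu_y, sd_y, n_y, mu_x, sd_x, n_x, reps=5000, seed=0):
    """Estimate coverage and mean length of the interval by simulation."""
    rng = random.Random(seed)
    true_ratio = mu_y / mu_x
    hits, total_length = 0, 0.0
    for _ in range(reps):
        y = [rng.gauss(mu_y, sd_y) for _ in range(n_y)]
        x = [rng.gauss(mu_x, sd_x) for _ in range(n_x)]
        lower, upper = ratio_interval(y, x)
        hits += lower <= true_ratio <= upper  # True counts as 1
        total_length += upper - lower
    return hits / reps, total_length / reps


if __name__ == "__main__":
    # One of the equal-sample-size settings: Y~N(100,10), X~N(100,0.5), n = 30.
    coverage, mean_length = simulate(100, 10, 30, 100, 0.5, 30)
    print(f"coverage = {coverage:.4f}, mean length = {mean_length:.4f}")
```

Because the random number generator and seed differ from Minitab's, the output should not be expected to match the reported figures digit for digit. The sketch is meant only to show how coverage and mean interval length can be computed.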
Case 1: Equal Sample Sizes
• Y~N(100,10), X~N(100,0.5), Sample Size 30
  • Coverage of the 95% confidence interval: 94.04%
  • Mean Length: 0.0711821
• Y~N(100,10), X~N(100,2), Sample Size 30
  • Coverage of the 95% confidence interval: 94.42%
  • Mean Length: 0.0726839
• Y~N(100,10), X~N(100,5), Sample Size 30
  • Coverage of the 95% confidence interval: 94.54%
  • Mean Length: 0.0795576

Case 2: Y has smaller sample size
• Y~N(100,10) Sample Size 15, X~N(100,0.5) Sample Size 20
  • Coverage of the 95% confidence interval: 93.34%
  • Mean Length: 0.100127
• Y~N(100,10) Sample Size 15, X~N(100,2) Sample Size 20
  • Coverage of the 95% confidence interval: 92.88%
  • Mean Length: 0.101451
• Y~N(100,10) Sample Size 15, X~N(100,5) Sample Size 20
  • Coverage of the 95% confidence interval: 93.68%
  • Mean Length: 0.108929

Case 3: X has smaller sample size
• Y~N(100,10) Sample Size 20, X~N(100,0.5) Sample Size 15
  • Coverage of the 95% confidence interval: 93.8%
  • Mean Length: 0.0869802
• Y~N(100,10) Sample Size 20, X~N(100,2) Sample Size 15
  • Coverage of the 95% confidence interval: 93.56%
  • Mean Length: 0.0891369
• Y~N(100,10) Sample Size 20, X~N(100,5) Sample Size 15
  • Coverage of the 95% confidence interval: 93.84%
  • Mean Length: 0.100611

Sources
• Hayya, Jack, Donald Armstrong, and Nicolas Gressis. “A Note on the Ratio of Two Normally Distributed Variables.” Management Science 21.11 (1975). 14 Jan 2011 <http://www.jstor.org/stable/2629897>.
• Roussas, George. An Introduction to Probability and Statistical Inference. San Diego: Elsevier Science, 2003.
