A New Approach for Comparing Means of Two Populations By Brad Moss

And Sponsored by Dr. Chand Chauhan
The Problem
• This research addresses a current problem in statistics.
• The problem: how do we compare the means of two populations when the variance of each population is unknown and the variances are unequal?
• There are many ways to deal with this problem. In this presentation we compare a new method with one of the existing methods.
The Idea!
• After consulting with Dr. Chauhan and reading “A Note on the Ratio of Two Normally Distributed Variables”, we decided to try to get around the unknown/unequal variance problem by taking the ratio of the estimated means, which has not been done before.
• We approximate the distribution of this ratio by a normal distribution and use it to decide whether or not the means are different.
A Context for the Idea
• Let’s say we are employees of “Get Better
Drug Company” and that we are in the
process of developing a drug for weight loss.
• Our scientists have developed two drugs, A
and B.
• The company can only mass-produce one of these drugs.
Things We Will Want to Know
• Overall, what is the mean weight loss for
people taking each drug?
• Is one drug more effective than the other?
How To Answer
• First we estimate the means of the two populations with the sample averages $\bar{Y}$ and $\bar{X}$.
• Then we compare the means either by taking the ratio of the averages or by taking their difference.
• Let’s consider the ratio of the averages, which is what my research is about.
How this Works
• If the two means are equal, their ratio should be one or close to one.
• How close to 1 is close enough?
• To decide, we build a confidence interval from the estimated ratio and an estimate of its variance.
• If 1 falls inside this interval, we say that the means are not significantly different.
The Ratio Can Be Approximated by a Normal Distribution
• According to Hayya, Armstrong, and Gressis (1975), a ratio of two normally distributed random variables can be approximated by a normal distribution.
• This happens when the standard deviation of one of the random variables (here, the denominator X) is significantly smaller than that of the other.
• Assuming the means are equal, the ratio of the standard deviations should exceed 19 to 9 for this to work.
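The claim can be checked with a quick Monte Carlo sketch. This is not part of the original study: it uses only the Python standard library, and the function name `ratio_skewness`, the seed, and the number of draws are illustrative assumptions. It compares the skewness of Y/X when the denominator's standard deviation is much smaller than the numerator's (0.5 vs 10) with the case where both are equal (10 vs 10).

```python
import math
import random
import statistics


def ratio_skewness(mu_y, sd_y, mu_x, sd_x, draws=20_000, seed=1):
    """Simulate Y/X for independent normals and return its sample skewness.

    random.gauss takes (mean, standard deviation). A skewness near 0 is one
    quick sign that the distribution is close to normal.
    """
    rng = random.Random(seed)
    ratios = [rng.gauss(mu_y, sd_y) / rng.gauss(mu_x, sd_x) for _ in range(draws)]
    m = statistics.fmean(ratios)
    s = math.sqrt(math.fsum((r - m) ** 2 for r in ratios) / draws)
    return math.fsum(((r - m) / s) ** 3 for r in ratios) / draws


# Denominator much less variable than the numerator (sd 0.5 vs 10).
print(ratio_skewness(100, 10, 100, 0.5))
# Denominator as variable as the numerator (sd 10 vs 10).
print(ratio_skewness(100, 10, 100, 10))
```

With the less variable variable in the denominator the skewness should come out close to 0; when both are equally variable it is noticeably positive, a sign the normal approximation is weaker.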
How Can We Use This
• The ratio of two normally distributed random variables is approximately normal (under the condition above).
• The average of normally distributed variables is also normally distributed (see Theorem 8, page 187, in “An Introduction to Probability and Statistical Inference”).
• That is, if Y and X are normally distributed, then $\bar{Y}$ and $\bar{X}$ are normally distributed.
How Can We Use This
• The last fact is important because we don’t know the true means, so we approximate them with the averages.
• According to the paper, when Y and X are independent, the mean of $Y/X$ is approximately:
$$E\left(\frac{Y}{X}\right) \approx \frac{\mu_y}{\mu_x}\left(1 + CV_x^2\right)$$
• $CV_x = \sigma_x/\mu_x$ is the coefficient of variation of X, the ratio of its standard deviation to its mean. The $\mu$’s stand for the true means.
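As a worked illustration, plug in the parameters used later in the simulations, Y ~ N(100, 10) and X ~ N(100, 5), reading the second parameter as the standard deviation (an assumption about the notation):

```latex
CV_x = \frac{\sigma_x}{\mu_x} = \frac{5}{100} = 0.05,
\qquad
E\left(\frac{Y}{X}\right) \approx \frac{100}{100}\left(1 + 0.05^2\right) = 1.0025
```

So even with equal means, the approximate mean of the ratio sits slightly above 1.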
How Can We Use This
• The variance of $Y/X$ is approximately:
$$Var\left(\frac{Y}{X}\right) \approx \frac{\mu_y^2}{\mu_x^2}\left(CV_x^2 + CV_y^2\right)$$
• Once again we are assuming Y and X are independent.
• So now we want to replace $Y/X$ with the ratio of our approximated means, $\bar{Y}/\bar{X}$.
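Before moving to the averages, the two approximations for $E(Y/X)$ and $Var(Y/X)$ can be compared against simulation. This is a minimal sketch, not the author's code; the function names, seed, and draw count are hypothetical choices, and `random.gauss` is parameterized by mean and standard deviation.

```python
import math
import random
import statistics


def approx_ratio_moments(mu_y, sd_y, mu_x, sd_x):
    """Approximate E(Y/X) and Var(Y/X) for independent Y and X (formulas above)."""
    cv_x = sd_x / mu_x
    cv_y = sd_y / mu_y
    mean = (mu_y / mu_x) * (1 + cv_x ** 2)
    var = (mu_y ** 2 / mu_x ** 2) * (cv_x ** 2 + cv_y ** 2)
    return mean, var


def simulated_ratio_moments(mu_y, sd_y, mu_x, sd_x, draws=100_000, seed=2):
    """Monte Carlo estimates of E(Y/X) and Var(Y/X) from simulated draws."""
    rng = random.Random(seed)
    ratios = [rng.gauss(mu_y, sd_y) / rng.gauss(mu_x, sd_x) for _ in range(draws)]
    m = statistics.fmean(ratios)
    var = math.fsum((r - m) ** 2 for r in ratios) / (draws - 1)
    return m, var


print(approx_ratio_moments(100, 10, 100, 5))     # approximation
print(simulated_ratio_moments(100, 10, 100, 5))  # simulation
```

For these parameters the approximate mean is 1.0025 and the approximate variance is 0.0125; the simulated values should land close to both, with the variance slightly larger because the approximation drops higher-order terms.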
How Does that Change Things
• From standard results on sample averages, $E(\bar{Y}) = \mu_y$, $Var(\bar{Y}) = \sigma_y^2/n_y$, $E(\bar{X}) = \mu_x$, and $Var(\bar{X}) = \sigma_x^2/n_x$.
• Here $E(\cdot)$ stands for the expected value and $Var(\cdot)$ for the variance.
• The n’s stand for sample sizes, with the subscript indicating which sample they belong to.
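The rule $Var(\bar{X}) = \sigma_x^2/n_x$ can be checked by simulation. The sketch below is illustrative only: `variance_of_sample_mean`, the seed, and the repetition count are hypothetical, and the parameters (σx = 5, nx = 30) are borrowed from the simulation cases later in the talk. The same check applies to $\bar{Y}$ with σy and ny.

```python
import math
import random
import statistics


def variance_of_sample_mean(mu, sd, n, reps=20_000, seed=3):
    """Draw `reps` samples of size n from N(mu, sd); return the variance of their averages."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sd) for _ in range(n)) for _ in range(reps)]
    m = statistics.fmean(means)
    return math.fsum((v - m) ** 2 for v in means) / (reps - 1)


sd_x, n_x = 5, 30
print(variance_of_sample_mean(100, sd_x, n_x))  # simulated Var(X-bar)
print(sd_x ** 2 / n_x)                          # theory: sigma_x^2 / n_x
```

The two printed values should agree to within a few percent.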
Now Substituting
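• Since $Var(\bar{X}) = \sigma_x^2/n_x$, the coefficient of variation of $\bar{X}$ is $CV_x/\sqrt{n_x}$, and likewise the coefficient of variation of $\bar{Y}$ is $CV_y/\sqrt{n_y}$. Substituting these into the two formulas above gives:
$$E\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_y}{\mu_x}\left(1 + \frac{CV_x^2}{n_x}\right)$$
$$Var\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_y^2}{\mu_x^2}\left(\frac{CV_x^2}{n_x} + \frac{CV_y^2}{n_y}\right)$$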
• Since the standard deviation of X is small, $CV_x$ is small, and its square is further divided by the sample size $n_x$.
• Therefore we can ignore $CV_x^2/n_x$ in the approximate mean. Thus
$$E\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_y}{\mu_x}$$
The Confidence Interval!
• Now we need to determine how close to 1 we have to be in order to conclude that the means are not significantly different.
• So we create an interval around our approximate mean.
• In statistics, we convert a normal random variable to a standard normal one by subtracting its mean and then dividing by its standard deviation.
The Confidence Interval!
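• So, approximately,
$$\frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \sim N(0,1)$$
• Now we want to choose two numbers $a$ and $b$ on the real line such that
$$P\left(a \le \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \le b\right) = 0.95$$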
The Confidence Interval!
• Since we are working with a standard normal distribution, we choose $a = -1.96$ and $b = 1.96$, because for a standard normal variable $Z$, $P(-1.96 \le Z \le 1.96) = 0.95$.
• So
$$P\left(-1.96 \le \frac{\dfrac{\bar{Y}}{\bar{X}} - \dfrac{\mu_y}{\mu_x}}{\sqrt{\dfrac{\mu_y^2}{\mu_x^2}\left(\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}\right)}} \le 1.96\right) \approx 0.95$$
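The value 1.96 can be recovered with the standard library's `statistics.NormalDist` (Python 3.8+). This is just a numerical check of the quantile, not part of the original analysis.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1
z = std_normal.inv_cdf(0.975)      # upper 2.5% point of N(0, 1)
print(round(z, 2))                 # 1.96
coverage = std_normal.cdf(z) - std_normal.cdf(-z)
print(round(coverage, 4))          # 0.95
```

Changing 0.975 to 0.995 would give the multiplier for a 99% interval instead.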
Doing Some Algebra
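• After some algebra we get:
$$P\left(\frac{\bar{Y}/\bar{X}}{1 - 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \ge \frac{\mu_y}{\mu_x} \ge \frac{\bar{Y}/\bar{X}}{1 + 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}}\right) \approx 0.95$$
• Thus the endpoints of the interval are:
$$\frac{\bar{Y}/\bar{X}}{1 + 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}} \quad\text{and}\quad \frac{\bar{Y}/\bar{X}}{1 - 1.96\sqrt{\dfrac{CV_x^2}{n_x} + \dfrac{CV_y^2}{n_y}}}$$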
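The “some algebra” step can be spelled out. Write $R = \bar{Y}/\bar{X}$, $\theta = \mu_y/\mu_x$, and $s = \sqrt{CV_x^2/n_x + CV_y^2/n_y}$; these symbols are shorthand introduced here only for brevity. Assuming $\theta > 0$ (so the standardizing denominator equals $\theta s$) and $1 - 1.96s > 0$ (so no inequality flips when dividing):

```latex
\begin{aligned}
-1.96 \le \frac{R - \theta}{\theta s} \le 1.96
  &\iff -1.96\,\theta s \le R - \theta \le 1.96\,\theta s \\
  &\iff \theta\,(1 - 1.96 s) \le R \le \theta\,(1 + 1.96 s) \\
  &\iff \frac{R}{1 + 1.96 s} \le \theta \le \frac{R}{1 - 1.96 s}
\end{aligned}
```

The last line is the interval for $\mu_y/\mu_x$ stated on the slide.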
Results
• Given those two endpoints, if 1 lies between them, we conclude that the means are not significantly different.
• Otherwise, we conclude that the means are different.
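The interval and the decision rule can be sketched in a few lines of Python. The true $CV_x$ and $CV_y$ are unknown in practice, so this sketch assumes they are replaced by sample estimates (sample standard deviation with the n − 1 divisor, divided by the sample mean); that plug-in choice, the function names `ratio_ci` and `means_differ`, and the data below are all hypothetical illustrations, not the author's implementation.

```python
import math
import statistics


def ratio_ci(y, x, z=1.96):
    """Interval for mu_y / mu_x from the slides' endpoints, using plug-in sample CVs.

    Assumes independent samples, positive means, and 1 - z*s > 0.
    """
    y_bar, x_bar = statistics.fmean(y), statistics.fmean(x)
    cv_y = statistics.stdev(y) / y_bar  # estimated CV_y (plug-in assumption)
    cv_x = statistics.stdev(x) / x_bar  # estimated CV_x (plug-in assumption)
    s = math.sqrt(cv_x ** 2 / len(x) + cv_y ** 2 / len(y))
    if z * s >= 1:
        raise ValueError("1 - z*s must be positive for the interval to exist")
    r = y_bar / x_bar
    return r / (1 + z * s), r / (1 - z * s)


def means_differ(y, x, z=1.96):
    """Decision rule: the means differ if 1 lies outside the interval."""
    lower, upper = ratio_ci(y, x, z)
    return not (lower <= 1 <= upper)


# Hypothetical weight-loss data: Y is the more variable sample.
y = [98, 103, 101, 95, 104, 99]
x = [100.2, 99.8, 100.1, 99.9, 100.0, 100.0]
x_shifted = [90.2, 89.8, 90.1, 89.9, 90.0, 90.0]

print(ratio_ci(y, x))
print(means_differ(y, x))          # False: equal sample means
print(means_differ(y, x_shifted))  # True: ratio of averages is about 1.11
```

Both samples in the first comparison average exactly 100, so 1 sits in the middle of the interval; shifting X down to an average of 90 pushes the whole interval above 1.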
Results
• Using a program called Minitab, I ran 5000
simulations for several cases and the results
are on the following slides.
Case 1: Equal Sample Sizes
• Y~N(100,10), X~N(100,0.5), sample size 30 each
• Confidence interval coverage: 94.04%
• Mean interval length: 0.0711821
• Y~N(100,10), X~N(100,2), sample size 30 each
• Confidence interval coverage: 94.42%
• Mean interval length: 0.0726839
• Y~N(100,10), X~N(100,5), sample size 30 each
• Confidence interval coverage: 94.54%
• Mean interval length: 0.0795576
Case 2: Y has smaller sample size
• Y~N(100,10) with sample size 15, X~N(100,0.5) with sample size 20
• Confidence interval coverage: 93.34%
• Mean interval length: 0.100127
• Y~N(100,10) with sample size 15, X~N(100,2) with sample size 20
• Confidence interval coverage: 92.88%
• Mean interval length: 0.101451
• Y~N(100,10) with sample size 15, X~N(100,5) with sample size 20
• Confidence interval coverage: 93.68%
• Mean interval length: 0.108929
Case 3: X has smaller sample size
• Y~N(100,10) with sample size 20, X~N(100,0.5) with sample size 15
• Confidence interval coverage: 93.8%
• Mean interval length: 0.0869802
• Y~N(100,10) with sample size 20, X~N(100,2) with sample size 15
• Confidence interval coverage: 93.56%
• Mean interval length: 0.0891369
• Y~N(100,10) with sample size 20, X~N(100,5) with sample size 15
• Confidence interval coverage: 93.84%
• Mean interval length: 0.100611
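The simulations above were run in Minitab with 5000 repetitions. The following is a rough Python re-creation for one setting, not the author's code: it assumes N(100, 10) means mean 100 and standard deviation 10 (a reading consistent with the reported mean lengths), plugs in sample CVs for the unknown true CVs, and uses a hypothetical function name, seed, and a smaller repetition count (2000) to keep it fast. Its numbers will differ from the slides by simulation noise.

```python
import math
import random
import statistics


def coverage(mu_y, sd_y, n_y, mu_x, sd_x, n_x, reps=2000, z=1.96, seed=4):
    """Return (fraction of intervals containing mu_y/mu_x, mean interval length).

    Each repetition draws fresh samples, builds the ratio interval with plug-in
    sample CVs, and records whether it covers the true ratio.
    """
    rng = random.Random(seed)
    theta = mu_y / mu_x
    hits, total_length = 0, 0.0
    for _ in range(reps):
        y = [rng.gauss(mu_y, sd_y) for _ in range(n_y)]
        x = [rng.gauss(mu_x, sd_x) for _ in range(n_x)]
        y_bar, x_bar = statistics.fmean(y), statistics.fmean(x)
        s = math.sqrt((statistics.stdev(x) / x_bar) ** 2 / n_x
                      + (statistics.stdev(y) / y_bar) ** 2 / n_y)
        r = y_bar / x_bar
        lower, upper = r / (1 + z * s), r / (1 - z * s)
        if lower <= theta <= upper:
            hits += 1
        total_length += upper - lower
    return hits / reps, total_length / reps


# Case 1, first setting: Y ~ N(100, 10), X ~ N(100, 0.5), sample size 30 each.
print(coverage(100, 10, 30, 100, 0.5, 30))
```

Other rows of the tables can be reproduced by changing the standard deviation of X and the two sample sizes.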
Sources
• Hayya, Jack, Donald Armstrong, and Nicolas Gressis. “A Note on the Ratio of Two Normally Distributed Variables.” Management Science 21.11 (1975). Accessed 14 Jan 2011. <http://www.jstor.org/stable/2629897>
• Roussas, George. An Introduction to Probability and Statistical Inference. San Diego: Elsevier Science, 2003.