An Alternative Method of Comparing the Ratio of Two

An alternative method of comparing the means of two populations

Chand Chauhan, Ph.D.
Associate Professor
Dept. of Mathematical Sciences
Indiana University-Purdue University
2101 E. Coliseum Blvd, Fort Wayne, Indiana, 46835
Phone: 260-481-6227
Fax: 260-481-0155
chauhan@ipfw.edu

Brad Moss, Student
Dept. of Mathematical Sciences
Northern Illinois University
Dekalb, IL, 60115
Phone: 574-453-6120
b.rammoss@gmail.com

Results on the ratio of the means of two normally distributed variables.

Abstract

The equality of the means of two populations is tested under three different situations: a) population standard deviations are known, b) population standard deviations are unknown and equal, and c) population standard deviations are unknown and unequal. In most elementary courses, situations a and b are discussed in detail. However, situation c is either neglected or briefly discussed with a reference to the formula for the degrees of freedom of the t value used in the test. It is not easy to provide a logical explanation for the complicated formula for the degrees of freedom of the t distribution. In this paper we discuss an alternative method of comparing two population means under case c. We propose a confidence interval for the ratio of the population means, with the conclusion of equal means if the interval includes the value of one. The proposed method is valid for both independent and dependent populations. However, in this paper the focus is on the independent case. The conditions under which the proposed formula works well are discussed. Simulation results are presented to illustrate the validity of the proposed interval. A numerical example is provided to illustrate the application of the proposed method.

Introduction:

Consider two sets of positive values, X and Y, such that $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Further suppose $\rho$ represents the correlation coefficient of X and Y.
Let $\bar{X}$ and $\bar{Y}$ represent the corresponding sample means of n pairs $(X_i, Y_i)$. In most practical situations, $\mu_X$ and $\mu_Y$ are unknown and we may be interested in estimating the ratio $\mu_Y/\mu_X$ and computing its confidence interval. For this objective, one needs to investigate the distribution of $\bar{Y}/\bar{X}$, denoted $\bar{R}$, which is a reasonable estimate of $\mu_Y/\mu_X$. While a linear combination of $\bar{X}$ and $\bar{Y}$ has a normal distribution, the same is not true for the ratio. In this paper, the results of Hayya, Armstrong, and Gressis (1975) have been modified to compute such an interval.

Theoretical Results:

Two specific results, one on the approximate distribution of $R = Y/X$ and another on the approximate distribution of a function of R, given in Hayya, Armstrong, and Gressis (1975), are as follows. (Note that in the present discussion $\approx$ refers to an approximately equal value.)

(1) $E\left(\frac{Y}{X}\right) \approx \frac{\mu_Y}{\mu_X} + \frac{\sigma_X^2 \mu_Y}{\mu_X^3} - \frac{\rho \sigma_X \sigma_Y}{\mu_X^2}$

(2) $Var\left(\frac{Y}{X}\right) \approx \frac{\sigma_X^2 \mu_Y^2}{\mu_X^4} + \frac{\sigma_Y^2}{\mu_X^2} - \frac{2 \rho \sigma_X \sigma_Y \mu_Y}{\mu_X^3}$

(3) $Y/X$ has an approximately normal distribution under certain conditions.

(4) $Z = \frac{R \mu_X - \mu_Y}{\sqrt{\sigma_Y^2 - 2 \rho R \sigma_X \sigma_Y + R^2 \sigma_X^2}}$, known as the Geary-Hinkley transformation, also has an approximate standard normal distribution under certain conditions.

Hayya et al. provided conditions for (3) and (4) at the 5% level of significance. Both conditions are in terms of the values of $\rho$ and the coefficients of variation of X and Y. The rationale for the conditions is that for a small enough standard deviation of X (and consequently a small enough coefficient of variation of X), $Y/X$ may be regarded as Y times a constant, whose distribution is normal. Hayya et al. used these results to derive a confidence interval of $Y/X$, assuming that the values of $\mu_X$, $\sigma_X$, $\mu_Y$, $\sigma_Y$, and $\rho$ are all known. From a practical point of view, it is highly unlikely that these values are known. Moreover, from an applications point of view, a confidence interval of $\mu_Y/\mu_X$ provides more useful information regarding the ratio of two attributes, on an average basis, than that of $Y/X$.

Main Results:

It can be shown that:

(5) The correlation coefficient between $\bar{X}$ and $\bar{Y}$, denoted $\bar{\rho}$, is equal to $\rho$. This result follows directly from the fact that $Cov(\bar{X}, \bar{Y}) = Cov(X, Y)/n$, while the standard deviations of $\bar{X}$ and $\bar{Y}$ are $\sigma_X/\sqrt{n}$ and $\sigma_Y/\sqrt{n}$.

Following (5), results (1) to (4) are modified as follows:

(6) $E\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_Y}{\mu_X} + \frac{1}{n} \frac{\sigma_X^2 \mu_Y}{\mu_X^3} - \frac{\rho \sigma_X \sigma_Y}{n \mu_X^2}$

(7) $Var\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{1}{n} \frac{\sigma_X^2 \mu_Y^2}{\mu_X^4} + \frac{1}{n} \frac{\sigma_Y^2}{\mu_X^2} - \frac{2 \rho \sigma_X \sigma_Y \mu_Y}{n \mu_X^3}$

(8) $\bar{Y}/\bar{X}$ has an approximately normal distribution if the standard deviation of $\bar{X}$ is much smaller than the standard deviation of $\bar{Y}$. The same rationale applies as for (3).

(9) $Z = \frac{\sqrt{n} \, (\bar{R} \mu_X - \mu_Y)}{\sqrt{\sigma_Y^2 - 2 \rho \bar{R} \sigma_X \sigma_Y + \bar{R}^2 \sigma_X^2}}$ also has an approximate standard normal distribution under certain conditions.

Conditions of approximate normality:

Modifying the simulation results of Hayya et al.: if $\rho = 0$, then $\bar{Y}/\bar{X} = \bar{R}$ has an approximately normal distribution at the 5% level of significance if [...].

Since the coefficients of variation of X and Y are $CV_X = \sigma_X/\mu_X$ and $CV_Y = \sigma_Y/\mu_Y$, equations (6) and (7) can be approximated as follows:

(10) $E\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_Y}{\mu_X} \left(1 + \frac{CV_X^2}{n} - \frac{\rho \, CV_X CV_Y}{n}\right) \approx \frac{\mu_Y}{\mu_X} \left(1 + \frac{CV_X^2}{n}\right)$

(11) $Var\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_Y^2}{\mu_X^2} \left(\frac{1}{n} CV_X^2 + \frac{1}{n} CV_Y^2 - \frac{2 \rho \, CV_X CV_Y}{n}\right) \approx \frac{\mu_Y^2}{\mu_X^2} \left(\frac{1}{n} CV_X^2 + \frac{1}{n} CV_Y^2\right)$

The last approximations in equations (10) and (11) are justified when n is large and $\rho$ is either 0 or very small. The ratio $\rho/n$ is insignificant even for a moderately small value of $\rho$ and a moderately large value of n. Recall that for the normal approximation we do require $\rho \le 0.5$. For further ease of algebra, one may cautiously drop the term $CV_X^2/n$ for sufficiently large n and sufficiently small $CV_X$. In that case $\bar{Y}/\bar{X}$ is an estimate of $\mu_Y/\mu_X$ with negligible bias.

Applications:

If $\rho = 0$, the approximate confidence interval of $\mu_Y/\mu_X$ provides a useful alternative approach to the Behrens-Fisher problem, in which two populations have unequal standard deviations. The inclusion of the value 1 in the confidence interval leads to the conclusion of the equality of means.

In many business-related settings, a hypothesis test (or a confidence interval) for the equality of the means of two independent populations is conducted. Comparing the mean profits of two stores of varying sizes, or the mean number of transactions for two different credit cards, are examples of situations where such a confidence interval may be useful. Well-known methods to compute such an interval are available as long as the population variances are either known, or unknown but equal. However, no simple solution exists when the population variances are unknown and unequal. In this paper we have utilized results (5) to (7) for such a purpose. In our approach we compute a confidence interval of $\mu_Y/\mu_X$, and conclude that the population means are equal if the interval contains 1. This differs from the traditional approach, in which the means are compared by computing an interval based on $\bar{Y} - \bar{X}$ and concluding the equality of the means if zero is within the interval.

Formula for the confidence interval of $\mu_Y/\mu_X$

Assumptions: We assume that the two populations are independent and normally distributed, and that the samples are drawn randomly from each population. Further, we assume that the standard deviation of one population, say X, is much smaller than that of Y (although this condition may be relaxed depending on the sample sizes).

Since $\rho = 0$, results (10) and (11) simplify as follows, where n and m are the sample sizes for X and Y respectively:

(10) $E\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_Y}{\mu_X} \left(1 + \frac{CV_X^2}{n}\right)$

(11) $Var\left(\frac{\bar{Y}}{\bar{X}}\right) \approx \frac{\mu_Y^2}{\mu_X^2} \left(\frac{1}{n} CV_X^2 + \frac{1}{m} CV_Y^2\right)$

Moreover, the bias is even smaller for a reasonably large value of n.
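As a rough numerical check of the approximations (10) and (11) for $\rho = 0$, the following Python sketch compares them with a Monte Carlo estimate of the mean and variance of $\bar{Y}/\bar{X}$. The function names, the seed, the number of runs, and the chosen population values (means of 100, standard deviations of 5 and 10, samples of 30 each) are illustrative assumptions, not part of the paper's own computations.

```python
import math
import random
import statistics


def approx_mean_var(mu_x, sd_x, n, mu_y, sd_y, m):
    """Approximate mean and variance of Ybar/Xbar from (10) and (11), rho = 0."""
    cv_x = sd_x / mu_x
    cv_y = sd_y / mu_y
    ratio = mu_y / mu_x
    mean = ratio * (1 + cv_x**2 / n)
    var = ratio**2 * (cv_x**2 / n + cv_y**2 / m)
    return mean, var


def simulate_ratio(mu_x, sd_x, n, mu_y, sd_y, m, runs=50_000, seed=1):
    """Monte Carlo mean and variance of Ybar/Xbar for independent normal populations.

    Sample means of normal data are themselves normal, so Xbar and Ybar are
    drawn directly as N(mu, sd / sqrt(size)) instead of averaging raw samples.
    """
    rng = random.Random(seed)
    ratios = [
        rng.gauss(mu_y, sd_y / math.sqrt(m)) / rng.gauss(mu_x, sd_x / math.sqrt(n))
        for _ in range(runs)
    ]
    return statistics.fmean(ratios), statistics.variance(ratios)


if __name__ == "__main__":
    # Hypothetical setting: X ~ N(100, sd 5), Y ~ N(100, sd 10), n = m = 30.
    args = (100, 5, 30, 100, 10, 30)
    print("approximation (mean, variance):", approx_mean_var(*args))
    print("simulation    (mean, variance):", simulate_ratio(*args))
```

With these values the two pairs of numbers should agree closely, which is consistent with the claim that the bias term $CV_X^2/n$ is small when $CV_X$ is small and n is reasonably large.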
The Central Limit Theorem and some algebraic manipulations lead to the following 95% confidence interval. Treating $\bar{Y}/\bar{X}$ as approximately normal with mean $\mu_Y/\mu_X$ and variance (11), and solving $\left|\bar{Y}/\bar{X} - \mu_Y/\mu_X\right| \le 1.96 \, (\mu_Y/\mu_X) \sqrt{\frac{1}{n} CV_X^2 + \frac{1}{m} CV_Y^2}$ for $\mu_Y/\mu_X$ gives:

(12) $\frac{\bar{Y}/\bar{X}}{1 + 1.96 \sqrt{\frac{1}{n} CV_X^2 + \frac{1}{m} CV_Y^2}} \le \frac{\mu_Y}{\mu_X} \le \frac{\bar{Y}/\bar{X}}{1 - 1.96 \sqrt{\frac{1}{n} CV_X^2 + \frac{1}{m} CV_Y^2}}$

Intervals of different confidence levels may be computed by selecting appropriate values of Z. The values of the coefficients of variation are estimated from the samples.

Simulation results:

Nine simulations of 4000 runs each were run in Minitab to compute 95% confidence intervals. In each case both population means were equal. The sample sizes and the standard deviations of X and Y were varied, keeping the standard deviation of X smaller than that of Y. The actual confidence level and the mean length of the interval were computed in each case. Simulation 1, for example, is based on two normally distributed populations with means of 100 each and standard deviations of 10 and 0.5 respectively; the actual confidence level is 94.04% and the mean length of the interval is 0.0711821. In the list below, N(100, 10) denotes a normal population with mean 100 and standard deviation 10. The results follow:

1. Y ~ N(100, 10), X ~ N(100, 0.5), sample size 30 each. Actual confidence level: 94.04%. Mean length: 0.0711821
2. Y ~ N(100, 10), X ~ N(100, 2), sample size 30 each. Actual confidence level: 94.42%. Mean length: 0.0726839
3. Y ~ N(100, 10), X ~ N(100, 5), sample size 30 each. Actual confidence level: 94.54%. Mean length: 0.0795576
4. Y ~ N(100, 10), sample size 15; X ~ N(100, 0.5), sample size 20. Actual confidence level: 93.34%. Mean length: 0.100127
5. Y ~ N(100, 10), sample size 15; X ~ N(100, 2), sample size 20. Actual confidence level: 92.88%. Mean length: 0.101451
6. Y ~ N(100, 10), sample size 15; X ~ N(100, 5), sample size 20. Actual confidence level: 93.68%. Mean length: 0.108929
7. Y ~ N(100, 10), sample size 20; X ~ N(100, 0.5), sample size 15. Actual confidence level: 93.8%. Mean length: 0.0869802
8. Y ~ N(100, 10), sample size 20; X ~ N(100, 2), sample size 15. Actual confidence level: 93.56%. Mean length: 0.0891369
9. Y ~ N(100, 10), sample size 20; X ~ N(100, 5), sample size 15. Actual confidence level: 93.84%. Mean length: 0.100611

Interesting observations:

Notice that in each of the cases the confidence level is below 95%. This is due to the fact that in the simulation we use the estimates of the coefficients of variation of both X and Y, keeping the practical issue in mind. In terms of confidence level, the best results were obtained when the sample sizes were equal. The worst results were obtained when the sample was smaller for Y than for X. A logical explanation for this occurrence is that one must take a larger sample from a population with larger variability. The length of the interval gets shorter as the ratio of the standard deviation of X to that of Y gets smaller. The length also increases as the sample sizes decrease.

Restrictions on standard deviations of X and Y

Hayya et al. (1975) recommended the following conservative rule of thumb for $Y/X$ to have an approximately normal distribution: coefficient of variation of X <= .09 and coefficient of variation of Y > .19. In our proposed result, the normal assumption will depend on the ratio $\sigma_Y/\sigma_X$ as well as the values of n and m. For example, a simulation of 4000 runs showed that even with the ratio $\sigma_Y/\sigma_X = 1$, $\bar{Y}/\bar{X}$ has a normal distribution when n = 15 and m = 9. Therefore our result has more flexibility. More investigation is underway on this topic.

Numerical Example:

Suppose the following information is obtained from two independent business transactions:

Transaction 1: n = 15, $\bar{X}$ = $188.00, sample standard deviation = $18.00
Transaction 2: m = 9, $\bar{Y}$ = $196.00, sample standard deviation = $28.00

Based on (12), a 95% confidence interval of $\mu_Y/\mu_X$ is as follows:

$.96791 \le \frac{\mu_Y}{\mu_X} \le 1.1297$

The interval (.96791, 1.1297) provides possible values for the ratio $\mu_Y/\mu_X$. Moreover, we also conclude that the two population means are not statistically different (note that the value of 1 is included in the interval).

Suggestions to improve the confidence level of the proposed interval:

As noted from the simulation study, the actual confidence level for the interval is below 95%. This is particularly true when the sample sizes are 20 or less. It is believed that the confidence level will increase to the desired level for samples of more than 30. To increase the confidence level for all sample sizes, a value of t may be used in formula (12) in place of z. However, formula (12) is very appealing to practitioners for its simplicity. More work is underway to determine the degrees of freedom if a t value is used in the above formula.

References

Hayya, Jack, Donald Armstrong, and Nicolas Gressis. "A Note on the Ratio of Two Normally Distributed Variables." Management Science, Volume 21, No. 11, Theory Series, 1975, pp. 1338-1341.
Roussas, George. An Introduction to Probability and Statistical Inference. San Diego: Elsevier Science, 2003.
Montgomery, Douglas. Design and Analysis of Experiments, 7th Edition. New Jersey: John Wiley & Sons, 2009.
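The numerical example and the simulation study can be checked by evaluating formula (12) directly. The Python sketch below is one possible implementation; the function names, seed, and run count are assumptions for illustration, and it does not reproduce the original Minitab work. With the inputs as printed (sample standard deviation of $28.00 for Y), formula (12) evaluates to roughly (0.943, 1.165). The printed interval (.96791, 1.1297) is reproduced when the standard deviation of Y is taken as $18.00, so one of the printed figures may contain a typographical error. Either interval contains 1, so the conclusion of the example is unchanged.

```python
import math
import random
import statistics


def ratio_ci(xbar, s_x, n, ybar, s_y, m, z=1.96):
    """Interval (12) for mu_y / mu_x, with sample coefficients of variation plugged in."""
    half = z * math.sqrt((s_x / xbar) ** 2 / n + (s_y / ybar) ** 2 / m)
    if half >= 1:
        raise ValueError("z * sqrt(...) must be below 1 for a finite upper limit")
    r = ybar / xbar
    return r / (1 + half), r / (1 - half)


def coverage(mu_x, sd_x, n, mu_y, sd_y, m, runs=4000, seed=1):
    """Fraction of simulated intervals that contain the true ratio mu_y / mu_x."""
    rng = random.Random(seed)
    true_ratio = mu_y / mu_x
    hits = 0
    for _ in range(runs):
        xs = [rng.gauss(mu_x, sd_x) for _ in range(n)]
        ys = [rng.gauss(mu_y, sd_y) for _ in range(m)]
        lo, hi = ratio_ci(
            statistics.fmean(xs), statistics.stdev(xs), n,
            statistics.fmean(ys), statistics.stdev(ys), m,
        )
        hits += lo <= true_ratio <= hi
    return hits / runs


if __name__ == "__main__":
    # Numerical example with the inputs as printed (s_y = 28).
    print(ratio_ci(188.0, 18.0, 15, 196.0, 28.0, 9))
    # Same example with s_y = 18, which matches the printed interval.
    print(ratio_ci(188.0, 18.0, 15, 196.0, 18.0, 9))
    # Setting of simulation 1: equal means, sd 0.5 and 10, samples of 30 each.
    print(coverage(100, 0.5, 30, 100, 10, 30))
```

The coverage estimate depends on the random seed and will not match the Minitab figures digit for digit, but under the stated settings it should fall slightly below the nominal 95%, in line with the simulation results.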
