ESTIMATION OF PARAMETERS
Rajender Parsad
I.A.S.R.I., Library Avenue, New Delhi-110 012, India

1. Introduction
Statistics is a science which deals with the collection, presentation, analysis and interpretation of data. The procedures involved can be classified into two broad categories:

1. Descriptive Statistics: the methods of collecting, presenting and describing a set of data so as to yield meaningful information.
2. Statistical Inference: the methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. It may also be described as the art of evaluating information to draw reliable inferences about the true value of the phenomenon under study.

The main purposes of statistical inference are
1. to estimate parameters (the quantities that represent particular characteristics of a population) on the basis of sample observations through a statistic (a function of sample values which does not involve any parameter); this is the theory of estimation; and
2. to compare these parameters among themselves on the basis of observations and their estimates; this is testing of hypothesis.

To distinguish clearly between the theory of estimation and testing of hypothesis, consider the following examples.

Example 1.1: A storage bin contains 1,00,000 seeds. These seeds after germination will produce either red or white coloured flowers. We want to know the percentage of seeds that will produce red coloured flowers. This is a problem of estimation.

Example 1.2: It is known that chir pine trees can yield an average of 4 kg of resin per blaze per season. On some trees the healed-up channels were treated with a chemical. We want to know whether the chemical treatment of healed-up channels enhances the yield of resin. This is a problem of testing of hypothesis.

In many cases like the above, it may not be possible to determine or test the value of a population parameter by analysing the entire set of population values. The process of determining the value of the parameter may destroy the population units, or it may simply be too expensive in money and/or time to analyse all the units. Therefore, for making statistical inferences, the experimenter has to draw one or more samples of observations on one or more variables. These observations are required to satisfy certain assumptions, viz. that they belong to a population having some specified probability distribution and that they are independent. In Example 1.1 it would not be practical to compute the percentage after germinating all the seeds, as we may not be willing to use all the seeds at one time, or there may be a lack of resources for maintaining the whole bulk at a time. In Example 1.2 it is not possible to apply the chemical treatment to the healed-up channels of all chir pine trees, and hence we have to test our hypothesis on the basis of a sample of chir pine trees. More specifically, statistical inference is the process of selecting and using a sample statistic (a function of sample observations) to draw inferences about population parameter(s) (functions of population values). Before proceeding further, it is worth describing the meaning of parameter and statistic.

Parameter: Any value describing a characteristic of the population is called a parameter.
For example, consider the following set of data representing the number of errors made by a secretary on 10 different pages of a document: 1, 0, 1, 2, 3, 1, 1, 4, 0 and 2. Let us assume that the document contains exactly 10 pages, so that the data constitute a small finite population. A quick study of this population leads to a number of conclusions. For instance, we could state that the largest number of typing errors on any single page is 4, or that the arithmetic mean of the 10 numbers is 1.5. The 4 and the 1.5 are descriptive properties of the population and are called parameters. Customarily, parameters are represented by Greek letters; thus the population mean of the typing errors is µ = 1.5. It may be noted that a parameter is a constant value describing the population.

Statistic: Any numerical value describing a characteristic of a sample is called a statistic. For example, suppose the data representing the number of typing errors constitute a sample obtained by counting the number of errors on 10 pages randomly selected from a large manuscript. The population is now a much larger set of data about which we have only the partial information provided by the sample. The numbers 4 and 1.5 are now descriptive measures of the sample and are called statistics. A statistic is usually represented by an ordinary letter of the English alphabet. If the statistic happens to be the sample mean, we denote it by x̄; for our random sample of typing errors, x̄ = 1.5. Since many random samples are possible from the same population, we would expect the statistic to vary from sample to sample.

Now coming back to the problem of statistical inference, let x₁, x₂, ..., xₙ be a random sample from a population whose distribution has a completely known form except that it contains some unknown parameters, the probability density function (pdf) or probability mass function (pmf) of the population being f(X, θ). In this situation the distribution is not known completely until we know the values of the unknown parameters. For simplicity, take the case of a single unknown parameter. The unknown parameter θ has some admissible values which lie on the real line in the case of a single parameter, in a plane for two parameters, in three-dimensional space for three parameters, and so on. The set of all possible values of the parameter(s) θ is called the parametric space and is denoted by Θ. If Θ is the parameter space, then the set {f(X, θ) : θ ∈ Θ} is called the family of pdf's of X if X is continuous and the family of pmf's of X if X is discrete. To be clearer, consider the following examples.

Example 1.3: Let X ~ B(n, p), with p unknown. Then Θ = {p : 0 < p < 1}, and {B(n, p) : 0 < p < 1} is the family of pmf's of X.

Example 1.4: Let X ~ N(µ, σ²). If both µ and σ² are unknown, then Θ = {(µ, σ²) : −∞ < µ < ∞, σ² > 0}; if µ = µ₀, say, and σ² is unknown, then Θ = {(µ₀, σ²) : σ² > 0}.

On the basis of a random sample x₁, x₂, ..., xₙ from a population, our aim is to estimate the unknown parameter θ. Henceforth, we shall discuss only the theory of estimation. The estimation of unknown population parameter(s) through sample values can be done in two ways:
1. Point Estimation
2. Interval Estimation
In the first case we are required to determine a number which can be taken as the value of θ, whereas in the second we are required to determine an interval (a, b) in which the unknown parameter θ is expected to lie. For example, if the population is normal, a possible point estimate of the population mean is the sample mean, and a possible interval estimate of the mean is (x̄ − 3s, x̄ + 3s), where x̄ = (1/n) Σᵢ xᵢ is the sample mean and s² = (1/(n − 1)) Σᵢ (xᵢ − x̄)² is the sample variance.
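As a minimal illustration, the sketch below computes both kinds of estimate for a small sample assumed to come from a normal population; the data values are hypothetical.

```python
import statistics

# Hypothetical sample, assumed drawn from a normal population
sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7]

x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation (divisor n - 1)

# Illustrative interval estimate (x_bar - 3s, x_bar + 3s) from the text
print(f"point estimate   : {x_bar:.3f}")
print(f"interval estimate: ({x_bar - 3*s:.3f}, {x_bar + 3*s:.3f})")
```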
Estimator: An estimator is a function of the sample observations whose value at a given realization of the observations gives the estimate of the population parameter. An estimate, on the other hand, is the numerical value of the estimator for a given sample. Thus, an estimator is a random variable calculated from the sample data that supplies either interval estimates or point estimates for population parameters. It is essential to distinguish between an estimator and an estimate. The distinction is the same as that between a function f, regarded as defined for a range of a variable X, and the particular value f(a) which the function assumes for a specified value X = a. For instance, if the sample mean x̄ is used to estimate a population mean µ, and the sample mean is 15, the estimator used is the sample mean, whereas the estimate is 15. Thus the statistic which is used to estimate a parameter is an estimator, whereas the numerical value of the estimator is called an estimate.

2. Point Estimation
A point estimator is a random variable calculated from the sample data that supplies point estimates for population parameters. Let x₁, x₂, ..., xₙ be a random sample from a population with pdf or pmf f(X, θ), θ ∈ Θ, where θ is unknown. We want to estimate θ, or a function τ(θ). Then tₙ = t(x₁, x₂, ..., xₙ) is said to be a point estimator of θ or τ(θ) if tₙ is close to θ or τ(θ). In general there can be several alternative procedures for obtaining a point estimate of a population parameter; for instance, we may compute the arithmetic mean, the median or the geometric mean to estimate the population mean. As we never know the true value of the parameter, it does not make sense to ask whether an estimate is exactly correct. Therefore, if there is more than one estimator, the question arises which among them is better. This means we must stipulate criteria which can be applied to decide whether one estimator is better than another: although an estimator is not expected to estimate the population parameter without error, we do not expect it to be very far off. The criteria which a good estimator should satisfy are:
1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency

Unbiasedness: An estimator T is said to be unbiased if the expected value of the estimator is equal to the population parameter θ being estimated. In other words, if the same estimator is computed for all possible samples and these values are averaged, we would expect the average to equal the true value of the parameter. For instance, if the sample mean x̄ is an unbiased estimator of the population mean µ, then E(x̄) = µ. Similarly, the sample variance s² = (1/(n − 1)) Σᵢ (xᵢ − x̄)² is an unbiased estimate of the population variance σ², because E(s²) = σ².

Steps to check whether a given estimator is unbiased or not:
1. Draw all possible samples of a given size from the population.
2. Calculate the value of the given estimator for each of these samples separately.
3. Take the average of all the values obtained in Step 2.
If this average is equal to the population parameter, the estimator is unbiased; if this average is more than the population parameter, the estimator is said to be positively biased; and if it is less than the population parameter, it is said to be negatively biased.
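These steps can be carried out mechanically for a small population. The sketch below uses a hypothetical three-unit population and all samples of size 2 drawn with replacement; it verifies that the sample mean and s² (divisor n − 1) are unbiased, while the divisor-n version of the variance underestimates σ² on average.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical three-unit population; samples of size 2 drawn with replacement
population = [2, 4, 6]
n = 2

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance (divisor N)

samples = list(product(population, repeat=n))   # all N**n equally likely samples

print(f"E(x_bar) = {mean(mean(s) for s in samples):.4f}, mu = {mu:.4f}")
print(f"E(s^2)   = {mean(variance(s) for s in samples):.4f}, "
      f"sigma^2 = {sigma2:.4f}")                    # divisor n-1: unbiased
print(f"E(S^2)   = {mean(pvariance(s) for s in samples):.4f}")  # divisor n: too small
```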
Consistency: An estimator T is said to be a consistent estimator of the parameter θ if, as the sample size n increases, T converges to θ in probability. This too is an intuitively appealing characteristic for an estimator to possess, for it says that as the sample size increases (which should mean, in most reasonable circumstances, that more information becomes available), the estimate becomes better in the sense indicated.

Steps to check whether a given estimator is consistent or not:
1. Show that T is an unbiased estimator of θ.
2. Obtain the variance of T, i.e., the variance of the values of T obtained from all possible samples of a particular size.
3. Increase the sample size and repeat the above two steps.
4. If the variance of T decreases as the sample size n increases and approaches zero as n becomes infinitely large, then T is a consistent estimator of θ.

Efficiency: It is sometimes possible to find more than one estimator which is unbiased and consistent for a population parameter. For instance, in the case of a normal population N(µ, σ²), the sample mean x̄ and the sample median x_md are both unbiased and consistent estimators of the population mean µ. However, it can easily be seen that Var(x_md) ≈ πσ²/(2n) > Var(x̄) = σ²/n, since π/2 > 1. As the variance of a random variable measures its variability about its expected value, it intuitively appeals that an unbiased estimator with smaller variance is preferable to one with larger variance. Therefore, in the above example, the sample mean is preferable to the sample median. Thus there is a need for some further criterion to enable us to choose the best estimator. Such a criterion, based on the concept of variance, is known as efficiency.

Minimum Variance Unbiased Estimator (MVUE): An estimator T is said to be an MVUE of a population parameter θ if T is unbiased and has the smallest variance among all unbiased estimators of θ. It is also called the most efficient estimator of θ. The ratio of the variance of an MVUE to the variance of a given unbiased estimator is termed the efficiency of the given unbiased estimator. There exist general techniques, viz. the Cramer-Rao inequality, the Rao-Blackwell theorem and the Lehmann-Scheffe theorem, for finding minimum variance unbiased estimators.

Best Linear Unbiased Estimator: An estimator T is said to be the best linear unbiased estimator (BLUE) of θ if
1. T is unbiased for θ;
2. T is a linear function of the sample observations; and
3. T has the minimum variance among all unbiased estimators of θ which are linear functions of the sample observations.
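Returning to the mean-versus-median efficiency comparison above, a quick Monte Carlo sketch (hypothetical settings) illustrates it for normal samples; the simulated variances should fall close to the theoretical values σ²/n and πσ²/(2n).

```python
import math
import random
import statistics

# Hypothetical settings for a Monte Carlo comparison of mean vs. median
random.seed(1)
mu, sigma, n, reps = 7.0, 2.0, 25, 20_000

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

print(f"Var(mean)   = {statistics.pvariance(means):.4f} "
      f"(theory {sigma**2 / n:.4f})")
print(f"Var(median) = {statistics.pvariance(medians):.4f} "
      f"(theory ~ {math.pi * sigma**2 / (2 * n):.4f})")
```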
Example 2.1: It is claimed that a particular chemical, when applied to ornamental plants, rapidly increases the height of the plant within a period of one week. The increases in height of 5 plants to which this chemical was applied are given below:

Plant                   :  1  2  3  4  5
Increase in height (cm) :  5  7  8  9  6

Assuming that the distribution of increases in height is normal, draw all possible samples of size 3 with replacement and show that the sample mean x̄ and the sample median x_md are unbiased estimators of the population mean.

Solution:
Step 1: Obtain the population mean: µ = (5 + 7 + 8 + 9 + 6)/5 = 35/5 = 7 cm.
Step 2: Draw all possible samples of size three with replacement and obtain the sample mean and sample median of each. Their frequency distributions are:

Sample mean : 15/3 16/3 17/3 18/3 19/3 20/3 21/3 22/3 23/3 24/3 25/3 26/3 27/3
Frequency   :   1    3    6   10   15   18   19   18   15   10    6    3    1

Sample median :  5   6   7   8   9
Frequency     : 13  31  37  31  13

Step 3: Obtain the mean of the sample means: 7 cm. The variance of the sample means is 0.667 cm².
Step 4: Obtain the mean of the sample medians: 7 cm. The variance of the sample medians is 1.328 cm². The population median is also 7 cm.
Therefore, both the sample mean and the sample median are unbiased estimators of the population mean in the case of a normal population, although the sample median has the larger variance.
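The enumeration in this solution can be reproduced directly. The sketch below lists all 5³ = 125 samples of size 3 drawn with replacement and recovers the means and variances quoted in Steps 3 and 4.

```python
from itertools import product
from statistics import mean, median, pvariance

population = [5, 7, 8, 9, 6]   # increases in height (cm) from Example 2.1

# All 5**3 = 125 equally likely samples of size 3 drawn with replacement
samples = list(product(population, repeat=3))
sample_means = [mean(s) for s in samples]
sample_medians = [median(s) for s in samples]

print(f"population mean        : {mean(population)}")               # 7
print(f"mean of sample means   : {mean(sample_means):.3f}")         # 7.000, unbiased
print(f"var of sample means    : {pvariance(sample_means):.3f}")    # 0.667
print(f"mean of sample medians : {mean(sample_medians):.3f}")       # 7.000, unbiased
print(f"var of sample medians  : {pvariance(sample_medians):.3f}")  # 1.328
```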
Sufficiency: An estimator T is said to be sufficient for the parameter θ if it contains all the information in the sample regarding the parameter. This criterion has practical importance in the sense that, after the data are collected from a sample survey or a designed experiment, the job of the statistician is to draw statistically valid conclusions about the population under investigation. The raw data by themselves, besides being costly to store, are not suitable for this purpose. The statistician would therefore like to condense the data by computing some statistic from them and to base the analysis on this statistic, provided there is no loss of information in doing so. In many problems of statistical inference a function of the observations contains as much information about the unknown parameter as do all the observed values. To make this clearer, consider the following example.

Example 2.2: Suppose you wish to play a coin-tossing game against an adversary who supplies the coin. If the coin is fair, and you win a dollar if you predict the outcome of a toss correctly and lose a dollar otherwise, then your net expected gain is zero. Since your adversary supplies the coin, you may want to check whether the coin is fair before you start playing the game, i.e., to test H₀: p = 0.5 against H₁: p ≠ 0.5. You toss the coin n times; should you record the outcome of each trial, or is it enough to know the total number of heads in n tosses to test H₀? Intuitively it seems clear that the number of heads in n trials contains all the information about the unknown parameter p, and precisely this is the information which we have used so far in problems of inference concerning p. Writing xᵢ = 1 if the i-th toss results in a head and 0 otherwise, set T = T(x₁, ..., xₙ) = Σᵢ xᵢ; we note that T is the number of heads in n trials. Clearly, there is a substantial reduction in data storage if we record the value T = t rather than the observation vector (x₁, ..., xₙ), because t can take only n + 1 values whereas the vector can take 2ⁿ values. Therefore, whatever decision we make about H₀ should depend on the value of t. It can easily be seen that the trivial statistic T(x₁, ..., xₙ) = (x₁, ..., xₙ) is always sufficient but provides no reduction of the data, and hence is not preferable, since our aim is to condense the data while retaining all the information about the parameter contained in the sample. A sufficient statistic which reduces the data most is called a minimal sufficient statistic. One way to check whether a given statistic is sufficient is to verify that the conditional distribution of x₁, ..., xₙ given T (the proposed sufficient statistic) is independent of the population parameter.

Until now we have discussed several properties of good estimators (unbiasedness, consistency, efficiency and sufficiency) that seem desirable in the context of point estimation, and we would like to check whether a proposed estimator satisfies all or some of these criteria. However, when faced with a point estimation problem, the question arises where we can start to look for the estimator. It is therefore convenient to have one or several intuitively reasonable methods of generating possibly good estimators. The principal methods of obtaining point estimators are:
1. Method of moments
2. Method of minimum chi-square
3. Method of least squares
4. Method of maximum likelihood
The application of these methods in particular cases leads to estimators which may differ and hence possess different attributes of goodness. The most important method of point estimation is the method of maximum likelihood, which provides estimators with desirable properties.

Method of Maximum Likelihood: To introduce the method of maximum likelihood, consider a very simple estimation problem. Suppose an urn contains a number of black and a number of white balls, and suppose it is known that the ratio of the numbers is 3:1, but not whether the black or the white balls are more numerous; i.e., the probability of drawing a black ball is either 1/4 or 3/4. If n balls are drawn with replacement from the urn, the distribution of X, the number of black balls, is binomial with probability mass function
f(X; p) = C(n, X) p^X q^(n−X) for X = 0, 1, ..., n,
where q = 1 − p and p is the probability of drawing a black ball, here p = 1/4 or 3/4. We draw a sample of three balls, i.e., n = 3, with replacement and attempt to estimate the unknown parameter p of the distribution. The estimation problem is particularly simple in this case because we have only to choose between the two numbers 1/4 and 3/4. The possible outcomes of the sample and their probabilities are given below:

Outcome X :   0     1     2     3
f(X; 3/4) :  1/64  9/64 27/64 27/64
f(X; 1/4) : 27/64 27/64  9/64  1/64

If we found that X = 0, the estimate 1/4 for p would be preferred over 3/4 because the probability 27/64 is greater than 1/64, i.e., because a sample with X = 0 is more likely (in the sense of having larger probability) to arise from a population with p = 1/4 than from one with p = 3/4. In general, we estimate p by 1/4 when X = 0 or 1 and by 3/4 when X = 2 or 3. The estimator may thus be defined as
p̂ = p̂(X) = 1/4 for X = 0 or 1, and 3/4 for X = 2 or 3.
The estimator thus selects, for every possible value of X, the value p̂ such that f(X; p̂) > f(X; p′), where p′ is the other admissible value of p.
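The maximum likelihood choice in this urn example is easy to verify computationally. The sketch below reproduces the table and picks, for each outcome X, the candidate value of p with the larger likelihood.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability mass function."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 3
candidates = (0.25, 0.75)   # the only admissible values of p in this example

for x in range(n + 1):
    # maximum likelihood: pick the candidate giving the observed x more probability
    p_hat = max(candidates, key=lambda p: binom_pmf(x, n, p))
    print(f"X = {x}: f(X;1/4) = {binom_pmf(x, n, 0.25):.4f}, "
          f"f(X;3/4) = {binom_pmf(x, n, 0.75):.4f}, p_hat = {p_hat}")
```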
Let us consider another experimental situation. A lion has turned man-eater. The lion has three possible states of activity each night: "very active" (denoted by θ₁), "moderately active" (denoted by θ₂) and "lethargic" (denoted by θ₃). The lion eats i people with probability P(i|θ), θ ∈ Θ = {θ₁, θ₂, θ₃}. The numerical values are given in the table below:

Lion's appetite distribution
i       :   0    1    2    3    4
P(i|θ₁) : .00  .05  .05  .80  .10
P(i|θ₂) : .05  .05  .80  .10  .00
P(i|θ₃) : .90  .08  .02  .00  .00

Suppose we are told that X = x₀ people were eaten last night and are asked to estimate the lion's activity state θ₁, θ₂ or θ₃. One seemingly reasonable method is to estimate θ by that θ ∈ Θ which provides the largest probability of observing what we did observe. It can easily be seen that θ̂(0) = θ₃, θ̂(1) = θ₃, θ̂(2) = θ₂, θ̂(3) = θ₁ and θ̂(4) = θ₁. Thus the maximum likelihood estimator θ̂ of a population parameter is that value of θ which maximizes the likelihood function, i.e., the joint pdf/pmf of the sample observations taken as a function of θ.

MLE for the population mean: The MLE of the population mean µ, based on a random sample of size n, is the sample mean x̄, and if the variance of the population units Xᵢ is σ², then the variance of x̄ is σ²/n. It can easily be seen that x̄ is an unbiased, consistent, sufficient and efficient estimator of µ.

MLE for a proportion: The MLE of the proportion p in a binomial experiment is p̂ = x/n, where x represents the number of successes in n trials. The variance of p̂ is p(1 − p)/n, and since E(p̂) = p, the sample proportion p̂ is an unbiased, consistent and sufficient estimator of p.

MLE for the population variance: In the case of large samples from any population, or small samples from a normal population, the MLE of the population variance σ² when the population mean is unknown is
S² = (1/n) Σᵢ (xᵢ − x̄)²,
where x₁, ..., xₙ are the sample observations and x̄ is the sample mean. Here E(S²) = (1 − 1/n)σ² and Var(S²) = (2σ⁴/n)(1 − 1/n). It can easily be seen that, as n → ∞, S² is a consistent estimator of the population variance; it can also be proved that it is an asymptotically unbiased and asymptotically efficient estimator of the population variance. However, an exactly unbiased estimator of the population variance is s² = (1/(n − 1)) Σᵢ (xᵢ − x̄)². It can therefore be inferred that MLEs are not in general unbiased. Quite often, however, the bias may be removed by multiplying by an appropriate constant, as in the above case: multiplying S² by n/(n − 1) gives s², an unbiased estimator of σ².
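A small simulation (hypothetical settings) illustrates this bias: the average of S² settles near (1 − 1/n)σ², while the average of s² settles near σ² itself.

```python
import random
import statistics

# Hypothetical settings: normal samples of size n with variance sigma^2
random.seed(2)
mu, sigma2, n, reps = 0.0, 4.0, 5, 50_000

S2_vals, s2_vals = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma2**0.5) for _ in range(n)]
    S2_vals.append(statistics.pvariance(x))  # divisor n: the MLE
    s2_vals.append(statistics.variance(x))   # divisor n-1: unbiased

print(f"average S^2 = {statistics.mean(S2_vals):.3f} "
      f"(theory {(1 - 1/n) * sigma2:.3f})")
print(f"average s^2 = {statistics.mean(s2_vals):.3f} (theory {sigma2:.3f})")
```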
Point estimators, however, are limited in the sense that even an MVUE is unlikely to estimate the population parameter exactly. It is true that accuracy increases with large samples, but there is still no reason to expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. Point estimators fail to throw light on how close we can expect the estimate to be to the population parameter, so we cannot associate a probability statement with them. It is therefore desirable to determine an interval within which we expect to find the value of the parameter, with some probability statement associated with it. This is done through interval estimation.

3. Interval Estimation
An interval estimator is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter. Let x₁, x₂, ..., xₙ be a sample from a population with pdf or pmf f(x, θ), θ ∈ Θ. Our aim is to find two estimators T₁ = T₁(x₁, ..., xₙ) and T₂ = T₂(x₁, ..., xₙ) such that P{T₁ ≤ θ ≤ T₂} = 1 − α. The interval (T₁, T₂) is then called a 100(1 − α)% confidence interval (CI), with 100(1 − α)% as the confidence coefficient. The confidence coefficient is the probability that an interval estimator encloses the population parameter if the estimator is used repeatedly a large number of times. T₁ and T₂ are the lower and upper bounds of the CI; for a particular application we substitute the appropriate numerical values for the confidence coefficient and the bounds. This statement reflects our confidence in the process rather than in the particular interval formed: we know that 100(1 − α)% of the resulting intervals will contain the population parameter, but there is usually no way to determine whether a particular interval is one of those which contain the parameter or one that does not. However, unlike point estimators, confidence intervals have a measure of reliability, the confidence coefficient, associated with them, and for that reason they are preferred to point estimators. Thus if α = 0.05 we have a 95% confidence interval, and if α = 0.01 we obtain a wider 99% confidence interval. The wider the confidence interval, the more confident we can be that it contains the unknown parameter. Of course, it is better to be 95% confident that the average life of a machine is between 12 and 15 years than to be 99% confident that it is between 8 and 18 years. Ideally, we prefer a short interval with a high degree of confidence. Sometimes restrictions on the size of the sample prevent us from achieving short intervals without sacrificing some degree of confidence.

Confidence Interval for the population mean
Suppose a sample has been selected from a normal population or, failing this, that n is sufficiently large. Let the population mean be µ and the population variance σ².

Confidence Interval for µ, σ known: If x̄ is the mean of a random sample of size n from a population with known variance σ², a 100(1 − α)% confidence interval for µ is
x̄ − Zα/2 σ/√n < µ < x̄ + Zα/2 σ/√n,
where Zα/2 is the Z-value with an area α/2 to its right. The interval provides an estimate of the accuracy of the point estimate: if x̄ is used as an estimate of µ, we can be 100(1 − α)% confident that the error will not exceed Zα/2 σ/√n.

Frequently we wish to know how large a sample is necessary to ensure that the error in estimating the population mean µ will not exceed a specified amount e. By the above, we must choose n such that Zα/2 σ/√n = e.

Sample size for estimating µ: If x̄ is used as an estimate of µ, we can be 100(1 − α)% confident that the error will not exceed e (equivalently, that the width of the interval will not exceed W = 2e) when the sample size is
n = (Zα/2 σ / e)², or equivalently n = 4 (Zα/2 σ / W)².
When solving for the sample size n, all fractional values are rounded up to the next whole number. When the value of σ is unknown and the sample size is large, σ may be replaced by the sample standard deviation S, where S² = (1/n) Σᵢ (xᵢ − x̄)², and the above formulae can still be used.
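A minimal sketch of this interval and the sample-size rule is given below; the numerical inputs are hypothetical, and the table of Z-values covers only the common confidence levels.

```python
import math

Z = {0.10: 1.645, 0.05: 1.960, 0.01: 2.575}   # common two-sided Z values

def z_interval(x_bar: float, sigma: float, n: int, alpha: float = 0.05):
    """100(1-alpha)% CI for mu when sigma is known (or n is large)."""
    e = Z[alpha] * sigma / math.sqrt(n)
    return x_bar - e, x_bar + e

def sample_size(sigma: float, e: float, alpha: float = 0.05) -> int:
    """Smallest n for which the error in estimating mu does not exceed e."""
    return math.ceil((Z[alpha] * sigma / e) ** 2)

# Hypothetical inputs
print(z_interval(x_bar=50.0, sigma=6.0, n=36))   # about (48.04, 51.96)
print(sample_size(sigma=6.0, e=1.0))             # 139
```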
Example 3.1: Unoccupied seats on flights cause airlines to lose revenue. Suppose a large airline wants to estimate its average number of unoccupied seats per flight over the past year. To accomplish this, the records of 225 flights are randomly selected and the number of unoccupied seats is noted for each sampled flight. The sample mean and standard deviation are x̄ = 11.6 seats and S = 4.1 seats. Estimate µ, the mean number of unoccupied seats per flight during the past year, using a 90% confidence interval.

Solution: For a 90% confidence interval, α = 0.10. The general form of a large-sample 90% confidence interval for a population mean is
x̄ ± Zα/2 S/√n = 11.6 ± 1.645 × 4.1/√225 = 11.6 ± 0.45 = (11.15, 12.05).
That is, the airline can be 90% confident that the mean number of unoccupied seats per flight was between 11.15 and 12.05 during the sampled year. In this example, we are 90% confident that the sample mean x̄ differs from the true mean by no more than 0.45.

If, in the above example, we want to know the sample size so that our estimate of µ is not off by more than 0.05 seats, we solve 0.05 = 1.645 × 4.1/√n, which implies n = (1.645 × 4.1/0.05)² = 18195.31, i.e., 18196 flights. However, if we can tolerate an error margin of 1 seat, then a sample size of n = (1.645 × 4.1/1)² = 45.49 ≈ 46 is enough.

Exercise 1: The mean and standard deviation of the quality grade-point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Obtain 95% and 99% confidence intervals for the mean of the entire senior class. (Z₀.₀₂₅ = 1.96 and Z₀.₀₀₅ = 2.575.)

Small-sample confidence interval for µ, σ unknown: If x̄ and s are the mean and standard deviation of a random sample of size n < 30 from an approximately normal population with unknown variance σ², a 100(1 − α)% confidence interval for µ is
x̄ − tα/2 s/√n < µ < x̄ + tα/2 s/√n,
where tα/2 is the t-value with n − 1 degrees of freedom leaving an area of α/2 to the right.
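A sketch of this t-interval, assuming SciPy is available for the t-value and using hypothetical data from an approximately normal population:

```python
import math
import statistics
from scipy import stats   # SciPy assumed available for the t-value

def t_interval(sample, alpha=0.05):
    """100(1-alpha)% small-sample CI for mu, sigma unknown, normal population."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)                 # divisor n - 1
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)     # area alpha/2 to the right
    e = t * s / math.sqrt(n)
    return x_bar - e, x_bar + e

# Hypothetical data from an approximately normal population
print(t_interval([14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.6]))
```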
Estimating the difference between two population means

Confidence Interval for µ₁ − µ₂, σ₁² and σ₂² known: If x̄₁ and x̄₂ are the means of independent random samples of sizes n₁ and n₂ from populations with known variances σ₁² and σ₂² respectively, a 100(1 − α)% confidence interval for µ₁ − µ₂ is given by
(x̄₁ − x̄₂) ± Zα/2 √(σ₁²/n₁ + σ₂²/n₂).
The above CI is applicable if σ₁² and σ₂² are known or can be estimated from large samples. If the sample sizes n₁ and n₂ are small (< 30) and σ₁² and σ₂² are unknown, the above interval will not be reliable.

Small-sample confidence interval for µ₁ − µ₂; σ₁² = σ₂² = σ² unknown: If x̄₁ and x̄₂ are the means of small independent random samples of sizes n₁ and n₂ respectively, from approximately normal populations with unknown but equal variances, a 100(1 − α)% CI for µ₁ − µ₂ is given by
(x̄₁ − x̄₂) ± tα/2 sₚ √(1/n₁ + 1/n₂),
where sₚ is the pooled estimate of the population standard deviation,
sₚ² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2),
and tα/2 is the t-value with n₁ + n₂ − 2 degrees of freedom, leaving an area of α/2 to the right.

Small-sample confidence interval for µ₁ − µ₂; σ₁² ≠ σ₂² unknown: If x̄₁, s₁² and x̄₂, s₂² are the means and variances of small independent samples of sizes n₁ and n₂ respectively, from approximately normal distributions with unknown and unequal variances, an approximate 100(1 − α)% confidence interval for µ₁ − µ₂ is given by
(x̄₁ − x̄₂) ± tα/2 √(s₁²/n₁ + s₂²/n₂),
where tα/2 is the t-value with
v = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
degrees of freedom, leaving an area of α/2 to the right.

Confidence Interval for µ_D = µ₁ − µ₂ for paired observations: If d̄ and s_d are the mean and standard deviation of the differences of n random pairs of measurements, a 100(1 − α)% confidence interval for µ_D = µ₁ − µ₂ is
d̄ − tα/2 s_d/√n < µ_D < d̄ + tα/2 s_d/√n,
where tα/2 is the t-value with n − 1 degrees of freedom, leaving an area of α/2 to the right.
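Both the pooled and the unequal-variance intervals for µ₁ − µ₂ can be sketched as below, assuming SciPy is available for the t-values; the summary statistics passed in at the bottom are hypothetical.

```python
import math
from scipy import stats   # SciPy assumed available for the t-values

def pooled_t_interval(x1, s1, n1, x2, s2, n2, alpha=0.05):
    """CI for mu1 - mu2 assuming equal unknown variances (pooled)."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
    e = t * sp * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - e, (x1 - x2) + e

def welch_t_interval(x1, s1, n1, x2, s2, n2, alpha=0.05):
    """CI for mu1 - mu2 with unknown, unequal variances (approximate df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t = stats.t.ppf(1 - alpha / 2, df=df)
    e = t * math.sqrt(v1 + v2)
    return (x1 - x2) - e, (x1 - x2) + e

# Hypothetical summary statistics
print(pooled_t_interval(82, 8, 25, 78, 7, 16))
print(welch_t_interval(82, 8, 25, 78, 7, 16))
```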
Example 3.2: A random sample of 30 apples was taken from an apple orchard. The distribution of the weights of the apples is given below:

Weight (gm) : 125 150 175 200 225 250 275 300 325 350
Frequency   :   1   4   3   5   4   7   4   1   1   0

Construct a 95% confidence interval for the population mean, i.e., the average weight of apples, if (i) the population variance is given to be 46.875 gm², and (ii) the population variance is unknown.

Solution:
(i) Step 1: Obtain the sample mean x̄ = Σfᵢxᵢ / Σfᵢ = 220.833.
Step 2: As α = 0.05, Zα/2 = Z₀.₀₂₅ = 1.96.
Step 3: Obtain the interval (x̄ − Zα/2 σ/√n, x̄ + Zα/2 σ/√n) = (218.38, 223.28).
(ii) Step 1: Obtain the sample variance s² = (1/(n − 1)) Σᵢ fᵢ(xᵢ − x̄)² = 2503.592.
Step 2: Look up t₂₉(0.025) = 2.045.
Step 3: Obtain the confidence interval (x̄ − t_{n−1, α/2} s/√n, x̄ + t_{n−1, α/2} s/√n) = (202.152, 239.512).

Large-sample Confidence Interval for p: If p̂ is the proportion of successes in a random sample of size n, and q̂ = 1 − p̂, an approximate 100(1 − α)% confidence interval for the binomial parameter p is given by
p̂ − Zα/2 √(p̂q̂/n) < p < p̂ + Zα/2 √(p̂q̂/n),
where Zα/2 is the Z-value leaving an area of α/2 to the right. The method of finding a confidence interval for the binomial parameter p is also applicable when the binomial distribution is being used to approximate the hypergeometric distribution, that is, when n is small relative to the population size N.

Error in Estimating p: If p̂ is used as an estimate of p, we can be 100(1 − α)% confident that the error will not exceed Zα/2 √(p̂q̂/n).

Sample Size for Estimating p: If p̂ is used as an estimate of p, we can be 100(1 − α)% confident that the error will not exceed a specified amount e when the sample size is
n = Z²α/2 p̂q̂ / e².
This result is somewhat misleading in that we must use p̂ to determine the sample size n, but p̂ is computed from the sample. If a crude estimate of p can be made without taking a sample, we can use this value for p̂ and then determine n. Lacking such an estimate, we can take a preliminary sample of size n ≥ 30 to provide an estimate of p; then, using the above result, we can determine approximately how many observations are needed to provide the desired degree of accuracy. Once again, all fractional values of n are rounded up to the next whole number. Alternatively, we may substitute p̂ = 1/2 into the formula for n. When, in fact, p differs from 1/2, n will turn out to be larger than necessary for the specified degree of confidence, and as a result our degree of confidence will increase. Thus, if p̂ = 1/2 is used, we can be at least 100(1 − α)% confident that the error will not exceed a specified amount e when the sample size is
n = Z²α/2 / (4e²).

Large-sample Confidence Interval for p₁ − p₂: If p̂₁ and p̂₂ are the proportions of successes in random samples of sizes n₁ and n₂ respectively, and q̂₁ = 1 − p̂₁, q̂₂ = 1 − p̂₂, an approximate 100(1 − α)% confidence interval for the difference of two binomial parameters, p₁ − p₂, is given by
(p̂₁ − p̂₂) ± Zα/2 √(p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂),
where Zα/2 is the Z-value leaving an area of α/2 to the right.
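A sketch of the proportion interval and the two sample-size rules; the counts used at the bottom are hypothetical.

```python
import math

def prop_interval(x: int, n: int, z: float = 1.96):
    """Large-sample CI for a binomial proportion p (default 95%)."""
    p = x / n
    e = z * math.sqrt(p * (1 - p) / n)
    return p - e, p + e

def sample_size_p(e: float, z: float = 1.96, p: float = None) -> int:
    """n so that the error in estimating p is at most e; p=None uses the
    conservative value p = 1/2, giving n = z^2 / (4 e^2)."""
    p = 0.5 if p is None else p
    return math.ceil(z**2 * p * (1 - p) / e**2)

# Hypothetical counts: 340 successes in 500 trials
print(prop_interval(340, 500))   # about (0.639, 0.721)
print(sample_size_p(e=0.03))     # 1068 with the conservative p = 1/2
```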
Confidence Interval for σ²: If s² is the variance of a random sample of size n from a normal population, a 100(1 − α)% confidence interval for σ² is given by
(n − 1)s² / χ²α/2 < σ² < (n − 1)s² / χ²₁₋α/2,
where χ²α/2 and χ²₁₋α/2 are χ² values with n − 1 degrees of freedom leaving areas of α/2 and 1 − α/2, respectively, to the right. A 100(1 − α)% confidence interval for σ is obtained by taking the square root of each endpoint of the interval for σ².

Example 3.3: The following are the volumes, in decilitres, of 10 cans of peaches distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2 and 46.0. Find a 95% confidence interval for the variance of all such cans of peaches distributed by this company, assuming volume to be a normally distributed variable.

Solution:
Step 1: Find the sample variance s² = 0.286.
Step 2: For a 95% confidence interval, α = 0.05; with 9 degrees of freedom, χ²₀.₀₂₅ = 19.023 and χ²₀.₉₇₅ = 2.700.
Step 3: Substituting these values into the formula, the 95% confidence interval is
(9)(0.286)/19.023 < σ² < (9)(0.286)/2.700, or simply 0.135 < σ² < 0.953.

Confidence Interval for σ₁²/σ₂²: If s₁² and s₂² are the variances of independent samples of sizes n₁ and n₂, respectively, from normal populations, then a 100(1 − α)% confidence interval for σ₁²/σ₂² is
(s₁²/s₂²) (1/Fα/2(v₁, v₂)) < σ₁²/σ₂² < (s₁²/s₂²) Fα/2(v₂, v₁),
where Fα/2(v₁, v₂) is the F-value with v₁ = n₁ − 1 and v₂ = n₂ − 1 degrees of freedom leaving an area of α/2 to the right, and Fα/2(v₂, v₁) is the similar F-value with v₂ = n₂ − 1 and v₁ = n₁ − 1 degrees of freedom.

Example 3.4: A standardized placement test in mathematics was given to 25 boys and 16 girls. The boys made an average grade of 82 with a standard deviation of 8, while the girls made an average grade of 78 with a standard deviation of 7. Find 98% confidence intervals for σ₁²/σ₂² and σ₁/σ₂, where σ₁² and σ₂² are the variances of the populations of grades of all boys and girls, respectively, who at some time have taken or will take this test. Assume the populations to be normally distributed.

Solution: We have n₁ = 25, n₂ = 16, s₁ = 8 and s₂ = 7.
Step 1: For a 98% confidence interval, α = 0.02, F₀.₀₁(24, 15) = 3.29 and F₀.₀₁(15, 24) = 2.89.
Step 2: Substituting in the formula, we obtain the 98% confidence interval
(64/49)(1/3.29) < σ₁²/σ₂² < (64/49)(2.89), which simplifies to 0.397 < σ₁²/σ₂² < 3.775.
Step 3: Taking square roots of the confidence limits, a 98% confidence interval for σ₁/σ₂ is 0.630 < σ₁/σ₂ < 1.943.
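Both intervals can be checked numerically, assuming SciPy is available for the chi-square and F values; the calls below reproduce the limits found in Examples 3.3 and 3.4.

```python
from scipy import stats   # SciPy assumed available for chi-square and F values

def var_interval(s2: float, n: int, alpha: float = 0.05):
    """100(1-alpha)% CI for sigma^2 from a normal sample."""
    lo = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    hi = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
    return lo, hi

def var_ratio_interval(s1_sq, n1, s2_sq, n2, alpha=0.02):
    """100(1-alpha)% CI for sigma1^2/sigma2^2 from two normal samples."""
    r = s1_sq / s2_sq
    lo = r / stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
    hi = r * stats.f.ppf(1 - alpha / 2, n2 - 1, n1 - 1)
    return lo, hi

print(var_interval(0.286, 10))              # about (0.135, 0.953), Example 3.3
print(var_ratio_interval(64, 25, 49, 16))   # about (0.397, 3.775), Example 3.4
```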