Distribution for Sample Mean

Proposition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then
1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2/n$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

In words, the expected value of the sample mean equals the population mean (this is the unbiasedness property), and the variance of the sample mean equals $1/n$ times the population variance.

Proposition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$, and define $T_0 = X_1 + X_2 + \cdots + X_n$. Then
$$E(T_0) = n\mu, \qquad V(T_0) = n\sigma^2, \qquad \sigma_{T_0} = \sqrt{n}\,\sigma.$$

Proposition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean value $\mu$ and standard deviation $\sigma$. Then for any $n$, $\bar{X}$ is normally distributed (with mean value $\mu$ and standard deviation $\sigma/\sqrt{n}$), as is $T_0$ (with mean value $n\mu$ and standard deviation $\sqrt{n}\,\sigma$).

The Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then if $n$ is sufficiently large, $\bar{X}$ has approximately a normal distribution with mean value $\mu$ and standard deviation $\sigma/\sqrt{n}$, and $T_0$ also has approximately a normal distribution with mean value $n\mu$ and standard deviation $\sqrt{n}\,\sigma$. The larger the value of $n$, the better the approximation.

Distribution for Linear Combinations

Proposition
Let $X_1, X_2, \ldots, X_n$ have mean values $\mu_1, \mu_2, \ldots, \mu_n$, respectively, and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively.
1. Whether or not the $X_i$'s are independent,
$$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n) = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n.$$
2. If $X_1, X_2, \ldots, X_n$ are independent,
$$V(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_n^2 V(X_n) = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2.$$
3. More generally, for any $X_1, X_2, \ldots, X_n$,
$$V(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j).$$
We call $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ a linear combination of the $X_i$'s.
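These propositions lend themselves to a quick numerical check. The following is a minimal simulation sketch (assuming Python with NumPy; the exponential populations, the sample size $n = 30$, and the coefficients are illustrative choices, not values from the lecture).

```python
import numpy as np

# Illustrative check of E(X̄) = µ and SD(X̄) = σ/√n, plus the CLT.
# The exponential(µ = 2) population and n = 30 are arbitrary choices.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 2.0, 30, 100_000   # exponential: mean = sd = 2

samples = rng.exponential(scale=mu, size=(reps, n))
xbar = samples.mean(axis=1)                  # one sample mean per replication

print("mean of X̄ :", xbar.mean(), " (theory:", mu, ")")
print("sd of X̄   :", xbar.std(ddof=1), " (theory:", sigma / np.sqrt(n), ")")

# CLT: the standardized sample means should be roughly standard normal,
# so about 95% of them should fall within ±1.96.
z = (xbar - mu) / (sigma / np.sqrt(n))
print("P(|Z| <= 1.96) ≈", np.mean(np.abs(z) <= 1.96))

# Linear combination check: for independent X ~ Exp(scale=2), Y ~ Exp(scale=3),
# V(2X - 3Y) should be 2²·V(X) + 3²·V(Y) = 4·4 + 9·9 = 97.
x = rng.exponential(scale=2.0, size=reps)
y = rng.exponential(scale=3.0, size=reps)
print("V(2X - 3Y) ≈", np.var(2 * x - 3 * y), " (theory: 97)")
```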
Example (Problem 64)
Suppose your waiting time for a bus in the morning is uniformly distributed on $[0, 8]$, whereas your waiting time in the evening is uniformly distributed on $[0, 10]$, independent of the morning waiting time.
a. If you take the bus each morning and evening for a week, what is your total expected waiting time?
b. What is the variance of your total waiting time?
c. What are the expected value and variance of the difference between morning and evening waiting times on a given day?
d. What are the expected value and variance of the difference between total morning waiting time and total evening waiting time for a particular week?
(A numerical check of parts a–d appears in the sketch following Problem 62 below.)

Corollary
$E(X_1 - X_2) = E(X_1) - E(X_2)$ and, if $X_1$ and $X_2$ are independent, $V(X_1 - X_2) = V(X_1) + V(X_2)$.

Proposition
If $X_1, X_2, \ldots, X_n$ are independent, normally distributed rv's (with possibly different means and/or variances), then any linear combination of the $X_i$'s also has a normal distribution. In particular, the difference $X_1 - X_2$ between two independent, normally distributed variables is itself normally distributed.

Example (Problem 62)
Manufacture of a certain component requires three different machining operations. Machining time for each operation has a normal distribution, and the three times are independent of one another. The mean values are 15, 30, and 20 min, respectively, and the standard deviations are 1, 2, and 1.5 min, respectively. What is the probability that it takes at most 1 hour of machining time to produce a randomly selected component?
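Both examples can be worked directly from the propositions above. The following is a sketch (assuming Python with NumPy and SciPy); note that treating "a week" as 5 days of commuting in Problem 64 is an assumption the problem statement does not spell out.

```python
import numpy as np
from scipy.stats import norm

# --- Problem 64: uniform waiting times --------------------------------
# Morning ~ Uniform[0, 8], Evening ~ Uniform[0, 10], independent.
# A Uniform[0, b] variable has mean b/2 and variance b²/12.
mean_m, var_m = 8 / 2, 8**2 / 12
mean_e, var_e = 10 / 2, 10**2 / 12
days = 5   # assumption: a 5-day week (not fixed by the problem statement)

print("a. E(total) =", days * (mean_m + mean_e))
print("b. V(total) =", days * (var_m + var_e))
print("c. E(M - E) =", mean_m - mean_e, ", V(M - E) =", var_m + var_e)
print("d. E(ΣM - ΣE) =", days * (mean_m - mean_e),
      ", V(ΣM - ΣE) =", days * (var_m + var_e))

# --- Problem 62: sum of three independent normal machining times ------
means = np.array([15.0, 30.0, 20.0])
sds   = np.array([1.0, 2.0, 1.5])
total_mean = means.sum()                  # 65 min
total_sd   = np.sqrt((sds**2).sum())      # sqrt(7.25) min
print("P(total time <= 60 min) =", norm.cdf(60, loc=total_mean, scale=total_sd))
```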
Point Estimation

Example (a variant of Problem 62, Ch. 5)
Manufacture of a certain component requires three different machining operations. The total time for manufacturing one such component is known to have a normal distribution. However, the mean $\mu$ and variance $\sigma^2$ of that normal distribution are unknown. Suppose we perform an experiment in which we manufacture 10 components and record the operation times, obtaining the following sample:

component   1     2     3     4     5     6     7     8     9     10
time        63.8  60.5  65.3  65.7  61.9  68.2  68.1  64.8  65.8  65.4

What can we say about the population mean $\mu$ and the population variance $\sigma^2$?

Example (a variant of Problem 64, Ch. 5)
Suppose the waiting time for a certain bus in the morning is uniformly distributed on $[0, \theta]$, where $\theta$ is unknown. We record 10 waiting times as follows:

observation  1    2    3    4    5    6    7    8    9    10
time         7.6  1.8  4.8  3.9  7.1  6.1  3.6  0.1  6.5  3.5

What can we say about the parameter $\theta$?

Definition
A point estimate of a parameter $\theta$ is a single number that can be regarded as a sensible value for $\theta$. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of $\theta$.

e.g. $\bar{X} = \sum_{i=1}^{10} X_i / 10$ is a point estimator of $\mu$ in the normal distribution example, and the largest observation $X_{(10)} = \max_i X_i$ is a point estimator of $\theta$ in the uniform distribution example.

Problem: when there is more than one point estimator of a parameter $\theta$, which one should we use? There are a few criteria for selecting the best point estimator: unbiasedness, minimum variance, and mean square error.

Definition
A point estimator $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$ for every possible value of $\theta$. If $\hat{\theta}$ is not unbiased, the difference $E(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$.

Principle of Unbiased Estimation
When choosing among several different estimators of $\theta$, select one that is unbiased.
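A small simulation illustrates why bias matters for the uniform waiting-time example. This is a sketch (assuming NumPy; the true value $\theta = 5$ and $n = 10$ are illustrative choices, not from the lecture): the largest observation systematically underestimates $\theta$, since $E(X_{(n)}) = n\theta/(n+1)$, while the rescaled estimator $(n+1)X_{(n)}/n$ is unbiased.

```python
import numpy as np

# Sketch: bias of two estimators of θ for a Uniform[0, θ] sample.
# θ = 5 and n = 10 are illustrative choices, not values from the text.
rng = np.random.default_rng(1)
theta, n, reps = 5.0, 10, 200_000

samples = rng.uniform(0.0, theta, size=(reps, n))
max_est = samples.max(axis=1)              # θ̂₁ = X₍ₙ₎, the largest observation
adj_est = (n + 1) / n * max_est            # θ̂₂ = (n+1)/n · X₍ₙ₎

print("E(θ̂₁) ≈", max_est.mean(), " bias ≈", max_est.mean() - theta)  # ≈ nθ/(n+1) - θ < 0
print("E(θ̂₂) ≈", adj_est.mean(), " bias ≈", adj_est.mean() - theta)  # ≈ 0 (unbiased)
```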
Proposition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then the estimators
$$\hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \text{and} \qquad \hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
are unbiased estimators of $\mu$ and $\sigma^2$, respectively. If in addition the distribution is continuous and symmetric, then the sample median $\tilde{X}$ and any trimmed mean are also unbiased estimators of $\mu$.

Principle of Minimum Variance Unbiased Estimation
Among all estimators of $\theta$ that are unbiased, choose the one that has minimum variance. The resulting $\hat{\theta}$ is called the minimum variance unbiased estimator (MVUE) of $\theta$.

Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the estimator $\hat{\mu} = \bar{X}$ is the MVUE for $\mu$.

Definition
Let $\hat{\theta}$ be a point estimator of a parameter $\theta$. Then the quantity $E[(\hat{\theta} - \theta)^2]$ is called the mean square error (MSE) of $\hat{\theta}$.

Proposition
$$\mathrm{MSE} = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2$$

Definition
The standard error of an estimator $\hat{\theta}$ is its standard deviation $\sigma_{\hat{\theta}} = \sqrt{V(\hat{\theta})}$. If the standard error itself involves unknown parameters whose values can be estimated, substituting these estimates into $\sigma_{\hat{\theta}}$ yields the estimated standard error (estimated standard deviation) of the estimator. The estimated standard error can be denoted either by $\hat{\sigma}_{\hat{\theta}}$ or by $s_{\hat{\theta}}$.

Methods of Point Estimation

Definition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with pmf or pdf $f(x)$. For $k = 1, 2, 3, \ldots$, the $k$th population moment, or $k$th moment of the distribution $f(x)$, is $E(X^k)$. The $k$th sample moment is $\frac{1}{n}\sum_{i=1}^{n} X_i^k$.

Definition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with pmf or pdf $f(x; \theta_1, \ldots, \theta_m)$, where $\theta_1, \ldots, \theta_m$ are parameters whose values are unknown. Then the moment estimators $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are obtained by equating the first $m$ sample moments to the corresponding first $m$ population moments and solving for $\theta_1, \ldots, \theta_m$. For instance, in the uniform waiting-time example $E(X) = \theta/2$, so equating $\theta/2 = \bar{X}$ gives the moment estimator $\hat{\theta} = 2\bar{X}$.

Example
Suppose that a coin is biased, and it is known that the average proportion of heads is one of the three values $p = .2$, $.3$, or $.8$. An experiment consists of tossing the coin twice and observing the number of heads. This can be modeled as a random sample $X_1, X_2$ of size $n = 2$ from a Bernoulli distribution, $X_i \sim \mathrm{BER}(p)$, where the parameter is one of $.2$, $.3$, $.8$. Consider the joint pmf of the random sample
$$f(x_1, x_2; p) = p^{x_1 + x_2}(1-p)^{2 - x_1 - x_2} \quad \text{for } x_i = 0 \text{ or } 1.$$
The values of $f(x_1, x_2; p)$ are provided in the following table:

p     (0,0)  (0,1)  (1,0)  (1,1)
.2    .64    .16    .16    .04
.3    .49    .21    .21    .09
.8    .04    .16    .16    .64

The estimate that maximizes the "likelihood" for an observed pair $(x_1, x_2)$ is
$$\hat{p} = \begin{cases} .2 & \text{if } (x_1, x_2) = (0, 0) \\ .3 & \text{if } (x_1, x_2) = (0, 1) \text{ or } (1, 0) \\ .8 & \text{if } (x_1, x_2) = (1, 1) \end{cases}$$
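The coin example is small enough to reproduce by brute force. The sketch below (plain Python, no values beyond those in the example) rebuilds the table of $f(x_1, x_2; p)$ and, for each observed pair, reports the $p$ with the largest likelihood.

```python
from itertools import product

# Sketch: reproduce the likelihood table for the biased-coin example and
# pick, for each observed pair (x1, x2), the p that maximizes f(x1, x2; p).
p_values = [0.2, 0.3, 0.8]

def joint_pmf(x1, x2, p):
    """Joint Bernoulli pmf f(x1, x2; p) = p^(x1+x2) (1-p)^(2-x1-x2)."""
    return p ** (x1 + x2) * (1 - p) ** (2 - x1 - x2)

for x1, x2 in product([0, 1], repeat=2):
    likelihoods = {p: joint_pmf(x1, x2, p) for p in p_values}
    p_hat = max(likelihoods, key=likelihoods.get)
    print(f"(x1, x2) = ({x1}, {x2}):",
          {p: round(v, 2) for p, v in likelihoods.items()},
          "-> p̂ =", p_hat)
```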
Definition
Let $X_1, X_2, \ldots, X_n$ have joint pmf or pdf $f(x_1, x_2, \ldots, x_n; \theta)$, where the parameter $\theta$ is unknown. When $x_1, \ldots, x_n$ are the observed sample values and this function is regarded as a function of $\theta$, it is called the likelihood function and is often denoted by $L(\theta)$. The maximum likelihood estimate (mle) $\hat{\theta}$ is the value of $\theta$ that maximizes the likelihood function, so that
$$f(x_1, x_2, \ldots, x_n; \hat{\theta}) \geq f(x_1, x_2, \ldots, x_n; \theta) \quad \text{for all } \theta.$$
When the $X_i$'s are substituted in place of the $x_i$'s, the maximum likelihood estimator results.

The Invariance Principle
Let $\hat{\theta}$ be the mle of the parameter $\theta$. Then the mle of any function $h(\theta)$ of this parameter is the function $h(\hat{\theta})$.

Proposition
Under very general conditions on the joint distribution of the sample, when the sample size $n$ is large, the maximum likelihood estimator of any parameter $\theta$ is approximately unbiased [$E(\hat{\theta}) \approx \theta$] and has variance that is nearly as small as can be achieved by any estimator. Stated another way, the mle $\hat{\theta}$ is approximately the MVUE of $\theta$.
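As a closing sketch (assuming Python with NumPy and SciPy), the two data sets from the point-estimation examples can be used to compute maximum likelihood estimates. For the normal sample the mle's are $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \sum (x_i - \bar{x})^2 / n$ (divisor $n$, not $n-1$); for the Uniform$[0, \theta]$ sample the mle is $\hat{\theta} = \max_i x_i$; and by the invariance principle the mle of $\sigma$ is $\sqrt{\hat{\sigma}^2}$. The numerical optimization is only a cross-check of the closed forms.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Data from the two point-estimation examples earlier in the section.
machining = np.array([63.8, 60.5, 65.3, 65.7, 61.9, 68.2, 68.1, 64.8, 65.8, 65.4])
waiting   = np.array([7.6, 1.8, 4.8, 3.9, 7.1, 6.1, 3.6, 0.1, 6.5, 3.5])

# Normal sample: closed-form mle's are µ̂ = x̄ and σ̂² = Σ(xᵢ - x̄)²/n.
mu_hat = machining.mean()
sigma2_hat = machining.var(ddof=0)          # divisor n, not n - 1
print("closed form:   µ̂ =", mu_hat, " σ̂² =", sigma2_hat)

# Numerical cross-check: maximize the normal log-likelihood directly
# (σ is parameterized on the log scale to keep it positive).
def neg_log_lik(params, x=machining):
    mu, log_sigma = params
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_lik, x0=[60.0, 0.0], method="Nelder-Mead")
print("numerical mle: µ̂ =", res.x[0], " σ̂ =", np.exp(res.x[1]))

# Uniform[0, θ] sample: L(θ) = θ^(-n) for θ ≥ max(xᵢ), so θ̂ = max(xᵢ).
print("uniform mle:   θ̂ =", waiting.max())

# Invariance principle: the mle of σ = h(σ²) = √σ² is √σ̂².
print("mle of σ via invariance:", np.sqrt(sigma2_hat))
```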