Statistics II
Xavier Vilà
Universitat Autònoma de Barcelona
Year 2020-2021

1. Introduction to Inferential Statistics and Estimation

Statistical inference is a collection of techniques by means of which we can draw conclusions regarding a reality from the study of a sample of that reality.

It is important to understand that
• statistical inference is based on probabilistic techniques;
• any statistical conclusion drawn from a sample will not be true for sure when applied to the whole reality, but only with a certain probability.

Example 1 When an electoral survey is conducted, it is clear that its results do not exactly coincide with the results of the final election. Nevertheless, if the survey is "well done", that is, if the sample (which in this case is the set of people interviewed) closely represents the whole reality (which in this case is the whole population that has the right to vote), then the survey result will be close to the final result with high probability.

1.1 Inferential Statistics: Definition and Inference Methods

Statistical inference is mainly built upon four main concepts, which will be defined and described below.

Population
The set of elements that are the object of study. The goal will be to draw some conclusion regarding some specific feature of this population.

Example 2 All the apples in the world. The feature at study is whether an apple falls down or not.
Example 3 The labor force in the European Union. The feature at study is whether a worker is unemployed or not.
Example 4 The production of Intel chips in a given day. The feature at study is whether a chip is faulty or not.

Sample
A subset of the population used to draw conclusions about the population.

Example 5 50 apples in Newton's garden.
Example 6 The unemployment statistics of the European Union.
Example 7 25 Intel chips manufactured in a given day.
Parameter
The feature of the population that we want to know something about. This feature has to be a numerical one and, obviously, its true value must be unknown.

Example 8 The proportion of falling apples.
Example 9 The unemployment rate in the European Union.
Example 10 The proportion of faulty chips among those produced in a given day.

Statistic
A computation made using the elements in the sample and used to get an approximation to the true value of the parameter.

It is important to notice that this value will be known (since we will compute it) and will be used to draw conclusions about the true value of the parameter, which is unknown and is what is of interest to us.

Example 11 The proportion of falling apples among the 50 sampled apples in Newton's garden.
Example 12 The unemployment rate among the workers interviewed for the unemployment statistics in the European Union.
Example 13 The proportion of faulty chips among the 25 selected chips produced in a given day.

From these four main concepts, the process of statistical inference works as follows:
1. Using sampling techniques that will be explained below, a sample is selected from the population that is going to be studied.
2. From this sample, the proper computations are done in order to obtain a statistic.
3. From this statistic, using some statistical inference technique that we will see in other chapters, some conclusions are drawn regarding the unknown population parameter that represents the feature of the population that is to be studied.
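The three steps above can be sketched in a few lines of code. This is a toy illustration only: the population values, and hence the true parameter, are made up here, whereas in practice the parameter is unknown.

```python
import random

# Toy illustration of the three-step inference process (hypothetical data):
# in practice the population, and hence the parameter, would be unknown.
population = [1, 2, 2, 4, 3, 5, 2, 4, 1, 3]      # the whole reality
parameter = sum(population) / len(population)     # true mean (unknown in practice)

random.seed(0)
sample = random.sample(population, 4)             # step 1: draw a random sample
statistic = sum(sample) / len(sample)             # step 2: compute the statistic

# step 3: use the statistic as an approximation to the parameter;
# it is close to the true value only with some probability
print(f"estimate = {statistic}, true parameter = {parameter}")
```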
This process can be represented as in Figure 1.

[Figure 1: The process of Statistical Inference. Diagram: Population → (Sampling) → Sample; Sample → Statistic (known); Statistic → (Statistical Inference) → Parameter (unknown).]

We can now provide a definition for Statistics (or Statistical Inference, to be more precise) which is more formal than the one offered in the introduction.

Definition 14 Statistical Inference is a subject whose main objective is to draw conclusions regarding a population through the study of one sample by means of probabilistic techniques.

1.2 Definition, Characteristics and Distribution of the Main Sample Statistics: Mean, Variance and Proportion

Once the sample is obtained (we will always assume using a SRS), the process of working with it and drawing conclusions starts. In this sense, the main task now is to obtain a statistic, one of the main elements in statistical inference. We will use it to produce conclusions regarding the unknown population parameter that is of interest to us.

The definition that follows will remind us what a statistic is. Then, the concept of estimate is defined. Although these two concepts are very similar and closely related, it is very important to notice that they are not the same thing.

Definition 15 A statistic (or estimator) is a formula that uses the values in the sample at hand (observations) in order to produce an approximation to the true value of an unknown population parameter.

Definition 16 An estimate (or estimation) is the particular value of an estimator that is obtained from a particular sample of data and normally used to indicate the value of an unknown population parameter.

Hence,
• a statistic is not a number but a formula;
• an estimate is the number that is obtained when the formula (the estimator) is applied to the observations of the specific sample that we have at hand.
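The statistic/estimate distinction maps naturally onto code: the estimator is a function, an estimate is the number it returns for one specific sample. A small sketch (the samples are hypothetical):

```python
# A statistic (estimator) is a formula, here a function; an estimate is the
# number obtained by applying it to one specific sample.
def sample_mean(observations):       # the estimator: a rule, not a number
    return sum(observations) / len(observations)

sample_a = [1, 2, 3, 4]              # hypothetical samples
sample_b = [2, 2, 5, 3]

# Different samples fed to the same estimator yield different estimates.
estimate_a = sample_mean(sample_a)   # 2.5
estimate_b = sample_mean(sample_b)   # 3.0
```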
Important
Given that the sample is obtained by means of a random technique, the statistic is a random variable. The statistic will produce different estimates with different probabilities (depending on the specific sample that is finally "selected" at random). In this sense, an estimate is a specific realization of this random variable. The following example aims to clarify this idea.

Example 17 We want to know the average number of cars per family in a given population. To keep it simple, we will assume that the population is very small, only 4 families,

Population = {A, B, C, D}

Let us now assume that
• family A owns 1 car,
• families B and C have 2 cars each, and
• family D has 4.

For the study, we
• want to obtain a random sample of size 2,
• compute the average number of cars in the sample, and
• use it to infer some conclusion regarding the true average in the population.

The sample mean (or just mean, for short) will play the role of the statistic in this example. We will use it to draw conclusions about the true population parameter that is of interest to us: the average number of cars per family in the whole population, that is, the population mean.

The following table summarizes:
1. the 6 possible samples that can be the result of a sampling process on this population,
2. the probability of each being selected (all of them have the same probability, as we are assuming SRS), and
3. the estimate that would result from applying the sample-average formula to the corresponding sample.

              Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
Elements      {A, B}    {A, C}    {A, D}    {B, C}    {B, D}    {C, D}
Probability   1/6       1/6       1/6       1/6       1/6       1/6
Estimate      1.5       1.5       2.5       2         3         3

In this example we can see how the statistic at use (the sample mean) can take 4 different values, depending on which of the six possible samples is selected by the SRS.
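The table above can be reproduced by enumerating every size-2 sample; a short sketch:

```python
from itertools import combinations
from fractions import Fraction

# Reproducing the table of Example 17: all size-2 samples from the four
# families and the estimate (sample mean) each one produces.
cars = {"A": 1, "B": 2, "C": 2, "D": 4}

samples = list(combinations(cars, 2))        # the 6 possible samples
prob = Fraction(1, len(samples))             # each has probability 1/6 under SRS

distribution = {}                            # statistic value -> probability
for s in samples:
    estimate = sum(cars[f] for f in s) / 2   # sample mean of the pair
    distribution[estimate] = distribution.get(estimate, 0) + prob

# distribution == {1.5: 1/3, 2.0: 1/6, 2.5: 1/6, 3.0: 1/3}
```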
It is easy to see that
• the value 1.5 corresponds to two possible samples (Sample 1 and Sample 2);
• each sample has the same probability of being selected (1/6); thus, the probability that the statistic takes the value 1.5 is:

P(statistic = 1.5) = P(Sample 1) + P(Sample 2) = 1/6 + 1/6 = 1/3

We summarize the possible values the statistic can take and the probability associated to each of them:

statistic value = 1.5  with  p = 1/3
statistic value = 2    with  p = 1/6
statistic value = 2.5  with  p = 1/6
statistic value = 3    with  p = 1/3

In this example, we have seen how the statistic can take different values (4 in this case) with different probabilities. Hence, the statistic is a random variable.

It will therefore be necessary to know the main properties and, especially, the probability distributions of the statistics that are most frequently used. The main statistics (or estimators) that are studied are
• the sample mean,
• the sample variance, and
• the sample proportion.

In all cases, we will assume that a sample of size n has been obtained by means of a SRS. The elements of the sample will be denoted by {x_1, x_2, · · · , x_n}.

Also, we will assume that the sample has been selected from a population that follows a given distribution. This distribution is very important, as it will influence the sampling result and, hence, the possible values of the statistic, as we have seen in the previous example. In that example we have seen that the population is distributed so that there is
• 1 element with 1 car,
• 2 elements with 2 cars, and
• 1 element with 4 cars.
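Closing the loop on Example 17: the expected value of the statistic's distribution found above coincides with the population mean (9/4 = 2.25 cars per family). A quick exact check:

```python
from fractions import Fraction

# The mean of the statistic's distribution in Example 17 equals the
# population mean, computed directly from the four families' car counts.
cars = {"A": 1, "B": 2, "C": 2, "D": 4}
population_mean = Fraction(sum(cars.values()), len(cars))   # 9/4 = 2.25

statistic_dist = {Fraction(3, 2): Fraction(1, 3),
                  Fraction(2, 1): Fraction(1, 6),
                  Fraction(5, 2): Fraction(1, 6),
                  Fraction(3, 1): Fraction(1, 3)}

expected_statistic = sum(v * p for v, p in statistic_dist.items())
assert expected_statistic == population_mean                # both equal 9/4
```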
Therefore, if we pick the sample element x_i at random from this population, we will have that:

p(x_i = a) = 1/4  if a = 1
             1/2  if a = 2
             1/4  if a = 4
             0    otherwise

This is, in this case, the distribution of the population.

[Graphically: bar chart of the population distribution p(x_i = a).]

In general, we will assume that the sample has been obtained by means of a SRS from a population distributed according to a Normal distribution with some population mean µ and some population variance σ². What does this mean? Easy: it means that for any element x_i in our sample and for any two numbers a and b, we have that

p(a ≤ x_i ≤ b) = p(a − µ ≤ x_i − µ ≤ b − µ) = p( (a − µ)/σ ≤ (x_i − µ)/σ ≤ (b − µ)/σ ) = p( (a − µ)/σ ≤ Z ≤ (b − µ)/σ )

where Z represents the Standard Normal distribution, usually denoted by N(0, 1), whose associated probabilities are found in tables.

[Graphically: the density of the N(0, 1) distribution.]

We turn next to the study of the distributions of the three main statistics. These, as we have discussed above, will depend on the distribution of the population from which we obtain the sample:

1.2.1 The Sample Mean
1.2.2 The Sample Variance
1.2.3 The Sample Proportion

For each case, we will be interested in knowing the distribution, the expectation and the variance of these statistics.

1.2.1 Sample Mean

The sample mean, denoted by X̄, is the statistic that is obtained from the sample using the formula:

X̄ = (1/n) Σ_{i=1}^{n} x_i

It is normally used to infer conclusions regarding the true value of the population mean µ.

Distribution
Its distribution depends on the characteristics of both the population and the sample.

1. If the population is Normal, that is, X_i ∼ N(µ, σ²) ∀i, then we have that

X̄ ∼ N(µ, σ²/n)

2. If the population is not Normal but the sample is big enough, then:

(X̄ − µ) / √(σ²/n) ∼ N(0, 1)  (approx.)

3. If the population is not Normal and the sample is small, then the distribution of the sample mean X̄ is unknown in general.

4. If the population variance σ² is unknown and the population is Normal, then

(X̄ − µ) / √(S²/n) ∼ t_{n−1}

where S² is the sample variance (that we will see next) and t_{n−1} is the t-Student distribution with n − 1 degrees of freedom, which is very similar to the N(0, 1) distribution and whose values can also be found in tables.

We turn next to the study of the expectation and variance of this statistic. To do so, we will use the mathematical properties of the expectation and variance of a random variable. We will assume that the sample has been obtained from a population with population mean µ and population variance σ². That is, E(x_i) = µ and V(x_i) = σ² for any element x_i in the sample.

Expectation

E(X̄) = E( Σ_{i=1}^{n} x_i / n ) = Σ_{i=1}^{n} E(x_i / n) = (1/n) Σ_{i=1}^{n} E(x_i) = Σ_{i=1}^{n} µ/n = µ

Variance (using the independence of the x_i)

V(X̄) = V( Σ_{i=1}^{n} x_i / n ) = Σ_{i=1}^{n} V(x_i / n) = (1/n²) Σ_{i=1}^{n} V(x_i) = Σ_{i=1}^{n} σ²/n² = σ²/n

Therefore, for the case of the sample mean X̄ we have that

E(X̄) = µ        V(X̄) = σ²/n

1.2.2 Sample Variance

The sample variance, denoted by S², is the statistic that is obtained from the sample using the formula:

S² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − X̄)²

It is normally used to infer conclusions regarding the true value of the population variance σ².

Distribution
Its distribution depends on the characteristics of the population.

1. If the population is Normal (X_i ∼ N(µ, σ²) ∀i), then:

(n − 1)S² / σ² ∼ χ²_{n−1}

where χ²_{n−1} is the chi-square distribution with n − 1 degrees of freedom, whose values are also found in tables.

2. If the population is not Normal, then the distribution of the sample variance is unknown in general, even for large samples.
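A Monte Carlo simulation can illustrate (without proving) the results above for a Normal population: E(X̄) = µ, V(X̄) = σ²/n, and (n − 1)S²/σ² behaving like a chi-square with n − 1 degrees of freedom, whose mean is n − 1. A sketch using only the standard library; µ = 10, σ² = 4 and n = 25 are arbitrary choices:

```python
import random
from statistics import mean, variance

# Simulate many samples of size n from N(µ, σ²) and look at the resulting
# sample means and (scaled) sample variances.
random.seed(42)
mu, sigma2, n, reps = 10.0, 4.0, 25, 20_000

xbars, scaled_s2 = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbars.append(mean(sample))
    s2 = variance(sample)                    # sample variance (n-1 divisor)
    scaled_s2.append((n - 1) * s2 / sigma2)  # should average about n-1

print(mean(xbars))      # close to µ = 10
print(variance(xbars))  # close to σ²/n = 0.16
print(mean(scaled_s2))  # close to n-1 = 24
```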
Since we only know the distribution of the sample variance when the population is Normal, we will use the fact that in that case its distribution is χ²_{n−1} to find the expectation and variance easily. In this sense, we know that for any χ²_{n−1} variable we have that
• E(χ²_{n−1}) = n − 1
• V(χ²_{n−1}) = 2(n − 1)

Hence, we will assume that the sample has been obtained from a Normal population with population mean µ and population variance σ². That is, x_i ∼ N(µ, σ²) for any element x_i in the sample. Hence:

(n − 1)S² / σ² ∼ χ²_{n−1}

Expectation

E( (n − 1)S² / σ² ) = n − 1  ⇒  ((n − 1)/σ²) E(S²) = n − 1  ⇒  E(S²) = σ²

Variance

V( (n − 1)S² / σ² ) = 2(n − 1)  ⇒  ((n − 1)²/σ⁴) V(S²) = 2(n − 1)  ⇒  V(S²) = 2σ⁴/(n − 1)

Therefore, for the case of the sample variance S² we have that

E(S²) = σ²        V(S²) = 2σ⁴/(n − 1)

1.2.3 Sample Proportion

The sample proportion is used when we are interested in knowing the true proportion of elements in a population that have a given characteristic. For instance, it might be of interest to know the proportion of smokers among the second-year students in this school (in this case, the characteristic of interest is "whether a student smokes or not").

The sample proportion, denoted by π̂, is the statistic that is obtained using the formula:

π̂ = (1/n) Σ_{i=1}^{n} x_i

where x_i = 1 if the i-th element in the sample has the characteristic that we are studying and x_i = 0 if it does not. The sample proportion π̂ is normally used to infer conclusions regarding the true population proportion π.

Distribution
In this case, the population is never Normal, since each observation x_i comes from a Bernoulli random variable.

Let us assume that we are looking at a population of 100 individuals out of which 45 are smokers.
That is, the true population proportion is 45%, or π = 0.45. From this population we want to obtain a sample of size 10. It is clear that for any element x_i of the sample we will have that:

p(x_i = 1) = 45/100 = 0.45
p(x_i = 0) = 55/100 = 0.55

Hence, we see that each element x_i in the sample follows a Bernoulli distribution with parameter π (where π is the true and unknown population proportion).

It can be shown then that the sum Σ_{i=1}^{n} x_i is a Binomial random variable, and π̂ is that sum divided by n. Also, given that when samples are large a Binomial distribution can be approximated by a Normal distribution, we can conclude that, in general:

1. If the sample is large enough (nπ(1 − π) > 5), then (approx.):

π̂ ∼ N(π, π(1 − π)/n)

This approximation is better the closer π is to 0.5 and the larger the sample is.

2. If the sample is not large, then the approximation is very bad.

Expectation
E(π̂) = π

Variance
V(π̂) = π(1 − π)/n

Therefore, for the case of the sample proportion π̂ we have that

E(π̂) = π        V(π̂) = π(1 − π)/n

1.3 Point and Interval Estimation

• Statistical estimation is the simplest inference technique. It allows for a quick approximation to the true value of the parameter of interest.
• Its objective is to produce a first approximated measure of the parameter we want to study. This measure will be improved later on by means of more elaborate statistical inference techniques.
• We will learn how to use the statistics studied previously to produce conclusions (very preliminary at this point) regarding the true population parameters. Point estimation and confidence intervals will be the techniques that we will use.
Later in this section we will investigate the main properties of these estimators, as well as other more advanced topics like maximum likelihood estimation and the method of moments, which will allow us to design good estimators for the case in which we do not know which one to use.

1.3.1 Point Estimation

• A point estimation is the simplest method to produce estimations for a population parameter, that is, approximations to its true value.
• To obtain a point estimation or estimate we just need to apply our estimator to the specific sample at hand.

Example 18 Imagine that we want to obtain an approximation to the true value of the population mean µ of a given population. We know that the sample mean X̄ is a good estimator of µ. Hence, this will be the estimator we use. Imagine that the sample we have is

Sample = {1, 2, 3, 4}

Then

X̄ = (1 + 2 + 3 + 4)/4 = 2.5

Hence, in this case the point estimation (or estimate) we get for µ is 2.5.

• A point estimation has the advantage of being an easy and quick method of estimation.
• On the other hand, it does not provide much information about the parameter, and it is not very accurate either.

In the example above:
• the value of X̄ that we have found suggests that the true value of the population mean µ will be around 2.5;
• we do not know, though, whether it will be larger or smaller;
• we do not know whether it will be near 2.5 or not;
• we do not know anything about the accuracy of our estimation.

Such lack of precision can be somehow fixed with the next method of estimation.

1.3.2 Interval Estimation

We will now use the knowledge we have about the probability distribution of the sample statistics to supplement the point estimation with additional information. In this way, we will produce an interval that will contain, with some probability, the true value of the unknown population parameter.
That is, we will now be able to "measure" the accuracy of our estimation. In this sense, the outcome of an interval estimation will be something similar to (for the case of the mean):

µ ∈ [2.25, 2.75] with probability 95%

• The intervals obtained using this method are called confidence intervals.
• The probability that the interval contains the population parameter is the confidence level, usually denoted by 1 − α.

We will study how to construct confidence intervals for:
1.3.2.1 The population mean µ
1.3.2.2 The population variance σ²
1.3.2.3 The population proportion π

1.3.2.1 Confidence Interval for the Mean

We will see next how to build the confidence interval for the case when we need to produce an estimation for the population mean µ.

Case I: Normal population (or large sample) and σ² known

We know that in this case,

(X̄ − µ) / √(σ²/n) ∼ N(0, 1)

hence

p( −z_{1−α/2} ≤ (X̄ − µ)/√(σ²/n) ≤ z_{1−α/2} ) = 1 − α

where z_{1−α/2} is the value that corresponds to a N(0, 1) whose left tail contains an area of 1 − α/2. That is,

P(Z ≤ z_{1−α/2}) = 1 − α/2

where Z represents a N(0, 1), and this value can be found in tables.

[Graphically: the N(0, 1) density, with an area of α/2 to the left of −z_{1−α/2} and an area of α/2 to the right of z_{1−α/2}.]

Doing some algebra inside the inequalities we get

p( −X̄ − z_{1−α/2} √(σ²/n) ≤ −µ ≤ −X̄ + z_{1−α/2} √(σ²/n) ) = 1 − α

multiplying by −1 we reverse the "direction" of the inequalities, and hence

p( X̄ + z_{1−α/2} √(σ²/n) ≥ µ ≥ X̄ − z_{1−α/2} √(σ²/n) ) = 1 − α

and at the end we get the interval we were looking for,

µ ∈ [ X̄ − z_{1−α/2} √(σ²/n) , X̄ + z_{1−α/2} √(σ²/n) ]  with probability 1 − α

Example 19 Let {x_1, x_2, · · · , x_100} be a random sample of size 100 drawn from a Normal population with unknown mean and variance σ² = 1,000,000. Construct a confidence interval with a confidence level of 95% for the population mean µ if we know that the sample mean is X̄ = 26,000.
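The interval formula just derived can be wrapped in a small reusable function and checked against the worked examples. A sketch (the function name is our own choice, and we assume Python 3.8+ for `statistics.NormalDist`, which replaces the tables):

```python
from statistics import NormalDist
from math import sqrt

# Confidence interval for µ with known σ² (Normal population or large sample).
def ci_mean_known_var(xbar, sigma2, n, level=0.95):
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-α/2}, e.g. 1.96 for 95%
    half = z * sqrt(sigma2 / n)
    return xbar - half, xbar + half

# Numbers from Example 19: n = 100, σ² = 1,000,000, X̄ = 26,000.
low, high = ci_mean_known_var(26_000, 1_000_000, 100)
print(round(low), round(high))   # about 25804 and 26196
```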
If the confidence level is 95%, we have that 1 − α = 0.95. Hence, α = 0.05 and α/2 = 0.025. Therefore,

1 − α/2 = 0.975

The interval will be of the form

[ X̄ − z_{1−α/2} √(σ²/n) , X̄ + z_{1−α/2} √(σ²/n) ]

where all the values are known except for the value z that corresponds to a Normal distribution. In this case we have to look up in the tables the value

z_{1−α/2} = z_{0.975}

That is, the value of a N(0, 1) that has to its left a probability of 0.975. In the tables we find

z_{0.975} = 1.96

Thus,

[ X̄ − z_{1−α/2} √(σ²/n) , X̄ + z_{1−α/2} √(σ²/n) ] = [ 26,000 − 1.96 √(1,000,000/100) , 26,000 + 1.96 √(1,000,000/100) ]

Doing the computations, we finally get

µ ∈ [25,804, 26,196] with a probability of 95%

Case II: Normal population (or large sample) and σ² unknown

In the previous case we needed to know the true value of the population variance σ² in order to compute the interval. This is highly unusual. To overcome this problem we can replace σ² by its unbiased estimator S². The only difference is that now we cannot use the N(0, 1), but the t-Student with n − 1 degrees of freedom:

µ ∈ [ X̄ − t_{1−α/2} √(S²/n) , X̄ + t_{1−α/2} √(S²/n) ]  with probability 1 − α

where t_{1−α/2} is the value that corresponds to a t-Student whose left tail contains an area of 1 − α/2 and that can be found in tables as well (when n is large, t_{1−α/2} is approximately equal to z_{1−α/2}).

Example 20 Let {x_1, x_2, · · · , x_100} be a random sample of size 100 drawn from a Normal population with unknown mean and variance. Construct a confidence interval with a confidence level of 95% for the population mean µ if we know that the sample mean is X̄ = 26,000 and the sample variance is S² = 980,000.

If the confidence level is 95%, we have that 1 − α = 0.95. Hence, α = 0.05 and α/2 = 0.025.
Therefore,

1 − α/2 = 0.975

The interval will be of the form

[ X̄ − t_{1−α/2} √(S²/n) , X̄ + t_{1−α/2} √(S²/n) ]

where all the values are known except for the value t that corresponds to a t-Student with n − 1 = 99 degrees of freedom. In this case we have to look up in the tables the value

t_{1−α/2} = t_{0.975}

That is, the value of a t-Student with 99 degrees of freedom that has to its left a probability of 0.975. In the tables we find (since 99 degrees of freedom does not appear in the tables, we take the nearest value, 100 degrees of freedom)

t_{0.975}(99) ≈ 1.984

Thus

[ X̄ − t_{1−α/2} √(S²/n) , X̄ + t_{1−α/2} √(S²/n) ] = [ 26,000 − 1.984 √(980,000/100) , 26,000 + 1.984 √(980,000/100) ]

Doing the computations, we finally get

µ ∈ [25,803.59, 26,196.41] with a probability of 95%

1.3.2.2 Confidence Interval for the Variance

• In a similar manner, we can also construct a confidence interval for the case of the population variance.
• We must remember, though, that in this case the population must follow a Normal distribution.

We know then that

(n − 1)S² / σ² ∼ χ²_{n−1}

and hence

p( χ²_{α/2} ≤ (n − 1)S²/σ² ≤ χ²_{1−α/2} ) = 1 − α

where χ²_{α/2} is the value of a χ²_{n−1} whose left tail contains an area of α/2 and that can be found in tables. Similarly, χ²_{1−α/2} is the value of a χ²_{n−1} whose left tail contains an area of 1 − α/2.

[Graphically: the χ²_{n−1} density, with an area of α/2 to the left of χ²_{α/2} and an area of α/2 to the right of χ²_{1−α/2}.]

As before, we can work the inequalities out to obtain

p( 1/χ²_{α/2} ≥ σ²/((n − 1)S²) ≥ 1/χ²_{1−α/2} ) = 1 − α

p( (n − 1)S²/χ²_{α/2} ≥ σ² ≥ (n − 1)S²/χ²_{1−α/2} ) = 1 − α

that is,

σ² ∈ [ (n − 1)S²/χ²_{1−α/2} , (n − 1)S²/χ²_{α/2} ]  with probability 1 − α

Example 21 Let {x_1, x_2, · · · , x_100} be a random sample of size 100 drawn from a Normal population with unknown mean and variance.
Construct a confidence interval with a confidence level of 95% for the population variance σ² if we know that the sample variance is S² = 4,800.

If the confidence level is 95%, we have that 1 − α = 0.95. Hence, α = 0.05 and α/2 = 0.025. Therefore,

1 − α/2 = 0.975

The interval will be of the form

[ (n − 1)S²/χ²_{1−α/2} , (n − 1)S²/χ²_{α/2} ]

where all the values are known except for the values χ² that correspond to a chi-square with n − 1 = 99 degrees of freedom. In this case we have to look up in the tables the values

χ²_{1−α/2} = χ²_{0.975}  and  χ²_{α/2} = χ²_{0.025}

That is, the values of a chi-square with 99 degrees of freedom that have to their left a probability of 0.975 and 0.025, respectively. In the tables we find

χ²_{0.975} = 129.561  and  χ²_{0.025} = 74.222

Thus,

[ (n − 1)S²/χ²_{1−α/2} , (n − 1)S²/χ²_{α/2} ] = [ 99 · 4,800 / 129.561 , 99 · 4,800 / 74.222 ]

Doing the computations, we finally get

σ² ∈ [3,667.77, 6,402.41] with a probability of 95%

1.3.2.3 Confidence Interval for the Proportion

The case of the proportion is special because, as said before, the approximation to the Normal requires a large sample (nπ(1 − π) > 5). Then we will have

π̂ ∼ N(π, π(1 − π)/n)

and, similarly as in the case of the confidence interval for the mean, we get:

π ∈ [ π̂ − z_{1−α/2} √(π̂(1 − π̂)/n) , π̂ + z_{1−α/2} √(π̂(1 − π̂)/n) ]  with probability 1 − α

Example 22 In a random sample of 1,000 people, 450 declare that they smoke on a regular basis. Construct a confidence interval with a confidence level of 95% for the proportion of smokers, π, in the population from which the sample has been obtained.
If the confidence level is 95%, we have that 1 − α = 0.95. Hence, α = 0.05 and α/2 = 0.025. Therefore,

1 − α/2 = 0.975

Let us first compute the sample proportion, that is, the proportion of smokers in the sample. In this case

π̂ = 450/1,000 = 0.45

The interval will be of the form

[ π̂ − z_{1−α/2} √(π̂(1 − π̂)/n) , π̂ + z_{1−α/2} √(π̂(1 − π̂)/n) ]

where all the values are known except for the value z that corresponds to a Normal distribution. In this case we have to look up in the tables the value

z_{1−α/2} = z_{0.975}

That is, the value of a N(0, 1) that has to its left a probability of 0.975. In the tables we find

z_{0.975} = 1.96

Thus,

[ π̂ − z_{1−α/2} √(π̂(1 − π̂)/n) , π̂ + z_{1−α/2} √(π̂(1 − π̂)/n) ] = [ 0.45 − 1.96 √(0.45(1 − 0.45)/1,000) , 0.45 + 1.96 √(0.45(1 − 0.45)/1,000) ]

Doing the computations, we finally get

π ∈ [0.4191, 0.4808] with a probability of 95%

1.4 Properties of Estimators: Bias, Efficiency and Consistency

• Once the main statistics and their probabilistic features (i.e. probability distribution, expectation and variance) are known, we focus in this chapter on the "good" properties that we would like estimators to have in order for them to provide good approximations to the parameters of interest.
• In this sense, an estimator might, among others, satisfy the properties of being unbiased, efficient, and consistent, which we will see next.

1.4.1 Bias

Definition 23 Let θ̂ be an estimator of the population parameter θ. The bias of θ̂ is defined as the difference between the expected value of the estimator and the true value of the population parameter:

B(θ̂) = E(θ̂) − θ

Definition 24 An estimator θ̂ is said to be an unbiased estimator of the population parameter θ if its bias is zero:

B(θ̂) = 0,  or  E(θ̂) = θ

Example 25 Let {x_1, x_2, . . . , x_n} be a random sample drawn from a population with population mean µ. Then, for the sample mean X̄ we have E(X̄) = µ. Thus, X̄ is an unbiased estimator of µ.

Example 26 Let {x_1, x_2, . . . , x_n} be a random sample drawn from a population with population variance σ². Then, for the sample variance S² we have E(S²) = σ². Thus, S² is an unbiased estimator of σ².

Example 27 Let {x_1, x_2, . . . , x_n} be a random sample drawn from a population with population proportion π. Then, for the sample proportion π̂ we have E(π̂) = π. Thus, π̂ is an unbiased estimator of π.

Interpretation of the unbiasedness property
• We know that an estimator is a random variable, that is, it takes different values with different probabilities.
• Hence, it is clear that it is highly unlikely that the specific value (estimate) that we get once we apply the estimator to the sample exactly coincides with the true parameter value.
• What the unbiasedness property means is that the above is true "in the sense of expectation": although the estimate obtained from the specific sample at hand will not coincide (in general) with the true value of the parameter, if we had 100 different samples to apply to the estimator, then the average of the 100 different estimates produced would be very close to the true parameter value.

We can compare an estimator with a "shooter" whose target is the true value of the parameter.
• A good "shooter" (unbiased) always aims at the center of the target, although there is always a small probability that the shot slightly deviates from the center.
• A bad "shooter" (biased) never aims at the center of the target.

1.4.2 Efficiency

The efficiency criterion for an estimator, which we will see next, has two different versions depending on whether the estimator is biased or unbiased.

1.4.2.1 Unbiased estimators

Definition 28 Let θ̂_1 and θ̂_2 be two unbiased estimators of θ. Then, the more efficient estimator is the one with the smaller variance.

1.4.2.2 Biased estimators

Definition 29 Let θ̂_1 and θ̂_2 be any two estimators of θ.
Then, the more efficient estimator is the one with the smaller Mean Quadratic Error (MQE), where:

MQE(θ̂) = E(θ̂ − θ)² = V(θ̂) + B(θ̂)²

The second "version" contains the first one as a special case. Indeed, if an estimator has zero bias, then its MQE and its variance coincide.

Example 30 Let us consider the following alternative estimators of the population mean µ, which will be applied to a sample obtained from a population with population mean µ and population variance σ²:

µ̂_1 = (x_1 + x_2 + x_3)/3
µ̂_2 = (x_1 + x_2)/2

Let us check first the bias of each of these estimators:

B(µ̂_1) = E(µ̂_1) − µ = E( (x_1 + x_2 + x_3)/3 ) − µ = (1/3)(E(x_1) + E(x_2) + E(x_3)) − µ = (1/3)·3µ − µ = µ − µ = 0

B(µ̂_2) = E(µ̂_2) − µ = E( (x_1 + x_2)/2 ) − µ = (1/2)(E(x_1) + E(x_2)) − µ = (1/2)·2µ − µ = µ − µ = 0

Hence, both estimators are unbiased. Let us now check which one has the smaller variance:

V(µ̂_1) = V( (x_1 + x_2 + x_3)/3 ) = (1/9)(V(x_1) + V(x_2) + V(x_3)) = (1/9)·3σ² = σ²/3

V(µ̂_2) = V( (x_1 + x_2)/2 ) = (1/4)(V(x_1) + V(x_2)) = (1/4)·2σ² = σ²/2

Therefore, µ̂_1 is more efficient, as it has the smaller variance (σ²/3 < σ²/2).

Interpretation of the efficiency property
If we compare an unbiased estimator with a "good shooter" (as we have done before) that always aims at the center of the target, then an estimator is more efficient than another one if it "trembles" less. In other words, the more efficient estimator is the one whose values are more concentrated around its mean.

1.4.3 Consistency

• Very often it becomes very difficult to find efficient estimators for a specific parameter.
• In this case we look at the so-called asymptotic properties, that is, the properties that the estimators have when the sample is as large as needed.
• In this sense, we will introduce the asymptotic bias and the asymptotic efficiency, or consistency.

1.4.3.1 Asymptotically Unbiased Estimators

Definition 31 An estimator θ̂ of the population parameter θ is said to be asymptotically unbiased if its bias vanishes as the sample size goes to infinity. Formally, θ̂ is an asymptotically unbiased estimator of θ if

lim (n→∞) B(θ̂) = 0

Example 32 Let us consider the following estimator of the population variance (σ²):

S̃² = Σᵢ₌₁ⁿ (xi − X̄)² / n

It is easy to check that if

S² = Σᵢ₌₁ⁿ (xi − X̄)² / (n − 1)

then

S̃² = ((n − 1)/n) S²

and hence

E(S̃²) = E(((n − 1)/n) S²) = ((n − 1)/n) E(S²) = ((n − 1)/n) σ²

Therefore,

B(S̃²) = E(S̃²) − σ² = ((n − 1)/n) σ² − σ² = −σ²/n

That is, S̃² is a biased estimator of σ², since E(S̃²) ≠ σ².

Nevertheless, S̃² is an asymptotically unbiased estimator of σ², for its bias vanishes as the sample grows. Indeed,

lim (n→∞) B(S̃²) = lim (n→∞) −σ²/n = 0

1.4.3.2 Consistent Estimators

The property of consistency not only considers the behavior of the bias as the sample grows large, but also looks at the variance. That is, consistency refers to the behavior of the MQE of the estimator as the sample size goes to infinity.

Definition 33 An estimator θ̂ of the population parameter θ is said to be consistent if its Mean Quadratic Error vanishes as the size of the sample goes to infinity. Formally, θ̂ is a consistent estimator of θ if

lim (n→∞) MQE(θ̂) = 0

Example 34 Let us consider the estimator of σ² that we have seen before, S̃². We already know that it is a biased estimator of σ² and that its bias is B(S̃²) = −σ²/n.
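Both the bias of S̃² just stated and the efficiency comparison of Example 30 can be checked by simulation. Below is a minimal sketch in Python, not part of the original notes; the population values (µ = 5, σ = 2), sample sizes, seed, and variable names are illustrative choices of my own:

```python
import random
import statistics

random.seed(0)
mu, sigma = 5.0, 2.0          # assumed population mean and standard deviation
sigma2 = sigma ** 2           # true population variance (= 4)

# Example 30 revisited: both estimators of mu are unbiased, but
# mu_hat_1 = (x1 + x2 + x3)/3 has the smaller variance (sigma^2/3 < sigma^2/2).
est1, est2 = [], []
for _ in range(100_000):
    x1, x2, x3 = (random.gauss(mu, sigma) for _ in range(3))
    est1.append((x1 + x2 + x3) / 3)
    est2.append((x1 + x2) / 2)
print(round(statistics.mean(est1), 2), round(statistics.mean(est2), 2))
print(round(statistics.variance(est1), 2), round(statistics.variance(est2), 2))

# Bias of S~2 = sum((xi - xbar)^2)/n: close to -sigma^2/n, vanishing as n grows.
biases = {}
for n in (5, 20, 100):
    s2_tilde = []
    for _ in range(20_000):
        xs = [random.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2_tilde.append(sum((x - xbar) ** 2 for x in xs) / n)
    biases[n] = statistics.mean(s2_tilde) - sigma2
    print(n, round(biases[n], 2), -sigma2 / n)
```

With these settings the empirical means of both estimators come out near µ = 5, the empirical variances near σ²/3 ≈ 1.33 and σ²/2 = 2, and the simulated bias of S̃² tracks the theoretical values −σ²/n (−0.8, −0.2, −0.04).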
We will now compute its variance in order to study the behavior of its MQE as the sample size goes to infinity, using the fact that (for a Normal population) V(S²) = 2(σ²)²/(n − 1):

V(S̃²) = V(((n − 1)/n) S²) = ((n − 1)/n)² V(S²) = ((n − 1)²/n²) · 2(σ²)²/(n − 1) = 2(n − 1)σ⁴/n²

Hence,

MQE(S̃²) = V(S̃²) + B(S̃²)² = 2(n − 1)σ⁴/n² + (−σ²/n)² = (2n − 1)σ⁴/n²

and then

lim (n→∞) MQE(S̃²) = lim (n→∞) (2n − 1)σ⁴/n² = 0

Therefore, S̃² is a consistent estimator of σ².

1.5 Methods of Point Estimation: Maximum Likelihood and Method of Moments

• When we need to produce estimates for population parameters that are "standard" (µ, σ², π), there are good estimators at hand (X̄, S², π̂).

• When we need to estimate a different population parameter (for instance, the median or the kurtosis), we do not have a "candidate" estimator.

• The Maximum Likelihood method and the Method of Moments provide techniques to build good estimators of a given population parameter.

1.5.1 Maximum Likelihood Estimation

The intuition of the method is as follows:

• After performing a totally random sampling (SRS), we obtain a specific sample, and there must be a reason for it (since we could have obtained a different one).

• Well, probably we have obtained this specific sample because the value of the parameter we want to estimate is such that the sample we have obtained is the one with the highest probability of being selected.

• In this sense, the maximum likelihood method finds the value of the parameter that maximizes the probability of obtaining the sample at hand.

The process takes three steps, starting with the sample we have, {x1, x2, · · · xn}, the probability density function of the population, f(x; θ), and the parameter θ that we want to estimate.

We will first introduce the general method, and later we offer an example to clarify it.
Suppose that we want to estimate the parameter θ of a population with a distribution given by f(x; θ), using the sample that we have obtained, {x1, x2, · · · xn}.

Step 1 Build the Likelihood function

The Likelihood function is the "formula" that computes the probability of having obtained the sample we have, conditional on the population parameter we want to estimate:

L(x1, x2, · · · xn; θ) = P(X1 = x1, X2 = x2, · · · Xn = xn; θ)

Since the sample has been obtained from a population with a probability distribution given by f(x; θ), and the elements in the sample are independent from each other, the joint probability P(X1 = x1, X2 = x2, · · · Xn = xn; θ) can be computed as

P(X1 = x1, X2 = x2, · · · Xn = xn; θ) = f(x1; θ) · f(x2; θ) · . . . · f(xn; θ)

hence,

L(x1, x2, · · · xn; θ) = f(x1; θ) · f(x2; θ) · . . . · f(xn; θ) = ∏ᵢ₌₁ⁿ f(xi; θ)

Step 2 Apply logarithms

The functional form of the likelihood function is often involved (a product of functions). Using logarithms we can simplify the function so that it becomes easier to deal with. Therefore, in this step we simply apply ln and then use the properties of logarithms in order to simplify the form of the likelihood function:

ln L(x1, x2, · · · xn; θ) = ln ∏ᵢ₌₁ⁿ f(xi; θ) = Σᵢ₌₁ⁿ ln f(xi; θ)

Step 3 Maximize

The last step is to maximize the likelihood function, that is, to find the value of θ that maximizes the function L (the probability of having obtained the sample we have). We must compute the derivative of the (logarithm of the) likelihood function, ln L, with respect to the parameter θ and set it equal to zero to find the value of θ that maximizes it:

∂ ln L(x1, · · · xn; θ)/∂θ = 0

From here we find the value of θ that solves the above equation. The solution will be the
maximum likelihood estimator of θ, usually denoted by θ̂ML.

Example 35 Let {x1, x2, · · · xn} be an (independent) sample obtained from a Normal population with population mean µ and population variance σ². Find the maximum likelihood estimator of µ.

First, let us remember the probability density function corresponding to a N(µ, σ²):

f(x; µ, σ²) = (1/(σ√(2π))) e^(−(1/2)((x − µ)/σ)²)

Step 1 Likelihood function

L(x1, x2, · · · xn) = ∏ᵢ₌₁ⁿ (1/(σ√(2π))) e^(−(1/2)((xi − µ)/σ)²) = (1/(σ√(2π)))ⁿ · e^(−(1/2) Σᵢ₌₁ⁿ ((xi − µ)/σ)²)

This would be hard to work with! That is why we need to use logarithms.

Step 2 Logarithms

ln L(x1, · · · xn) = ln [ (1/(σ√(2π)))ⁿ · e^(−(1/2) Σᵢ₌₁ⁿ ((xi − µ)/σ)²) ]

It still looks hard, but after using some of the properties of logarithms (the logarithm of a product is the sum of the logarithms, ln eˣ = x, etc.) the simplification is substantial:

ln L(x1, · · · xn) = n ln(1/(σ√(2π))) − (1/2) Σᵢ₌₁ⁿ ((xi − µ)/σ)²

Step 3 Maximize

We have to compute the derivative of ln L(x1, · · · , xn) with respect to µ and equate it to zero:

∂ ln L(x1, · · · xn)/∂µ = ∂/∂µ [ n ln(1/(σ√(2π))) − (1/2) Σᵢ₌₁ⁿ ((xi − µ)/σ)² ]
= 0 − (1/2) Σᵢ₌₁ⁿ ∂/∂µ ((xi − µ)/σ)² = −(1/2) Σᵢ₌₁ⁿ 2((xi − µ)/σ)(−1/σ) = Σᵢ₌₁ⁿ (xi − µ)/σ²

Hence,

∂ ln L(x1, · · · xn)/∂µ = 0 ⇒ Σᵢ₌₁ⁿ (xi − µ)/σ² = 0 ⇒ (1/σ²)(Σᵢ₌₁ⁿ xi − nµ) = 0 ⇒ Σᵢ₌₁ⁿ xi = nµ

and finally,

µ = (Σᵢ₌₁ⁿ xi)/n

That is, the maximum likelihood estimator of the population mean µ is the sample mean X̄:

µ̂ML = (Σᵢ₌₁ⁿ xi)/n = X̄

1.5.2 Method of Moments

Consider a population distributed according to the density function f(x, θ), where θ is the unknown population parameter that we want to study. The method of moments proceeds in 3 simple steps.

Step 1 Compute the expectation of X according to the density function above and make it equal to the mean µ of the population:

µ = E(X) = ∫ x f(x, θ) dx

The result of this integral will be a function of the parameter θ. Hence we should obtain something of the form

µ = g(θ)

Step 2 Since we know that X̄ is a good estimator of µ, we just set µ = X̄, that is:

X̄ = g(θ)

Step 3 Finally, just by inverting the function g we can express θ as a function of X̄, and we are done! We have found an estimator of θ, called the method of moments estimator θ̂MM:

θ̂MM = g⁻¹(X̄)

Example 36 Consider a population distributed according to the density function

f(x, θ) = (θ + 1)x^θ   if 0 ≤ x ≤ 1
          0            otherwise

Find the method of moments estimator of θ.

Step 1 Expectation

µ = E(X) = ∫₀¹ x(θ + 1)x^θ dx = (θ + 1) ∫₀¹ x^(θ+1) dx = (θ + 1) [x^(θ+2)/(θ + 2)]₀¹ = (θ + 1)/(θ + 2)

Hence, we can write

µ = (θ + 1)/(θ + 2)

Step 2 Use X̄ as the estimator of µ

X̄ = (θ + 1)/(θ + 2)

Step 3 Solve for θ

From X̄(θ + 2) = θ + 1 we get θ(X̄ − 1) = 1 − 2X̄, hence

θ̂MM = (1 − 2X̄)/(X̄ − 1)
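Both worked examples can be checked numerically. Below is a minimal sketch in Python, not part of the original notes; the population values, seed, and helper names (log_lik, grid, theta_mm) are illustrative choices of my own. For Example 36, sampling from f(x, θ) = (θ + 1)x^θ uses the inverse-CDF trick: the CDF is x^(θ+1) on [0, 1], so U^(1/(θ+1)) with U ~ Uniform(0, 1) has the desired density.

```python
import math
import random

random.seed(7)

# Example 35: the ML estimator of mu for a Normal sample is xbar.
# Grid-search the log-likelihood ln L(mu) and check that the maximizer
# coincides (up to the grid resolution) with the sample mean.
mu_true, sigma = 3.0, 1.5          # assumed population values
xs = [random.gauss(mu_true, sigma) for _ in range(500)]
xbar = sum(xs) / len(xs)

def log_lik(mu):
    # n * ln(1/(sigma*sqrt(2*pi))) - (1/2) * sum(((xi - mu)/sigma)^2)
    return (len(xs) * math.log(1.0 / (sigma * math.sqrt(2.0 * math.pi)))
            - 0.5 * sum(((x - mu) / sigma) ** 2 for x in xs))

grid = [i / 1000.0 for i in range(2000, 4001)]   # candidate mu values in [2, 4]
mu_ml = max(grid, key=log_lik)
print(abs(mu_ml - xbar) < 0.001)                 # maximizer ~ xbar: True

# Example 36: method of moments for f(x, theta) = (theta + 1) x^theta on [0, 1].
theta_true = 2.0
ys = [random.random() ** (1.0 / (theta_true + 1)) for _ in range(100_000)]
ybar = sum(ys) / len(ys)                         # should be near 3/4
theta_mm = (1 - 2 * ybar) / (ybar - 1)           # Step 3: theta_hat_MM
print(round(theta_mm, 1))                        # close to theta_true = 2.0
```

Since the Normal log-likelihood is a concave quadratic in µ, the grid maximizer is simply the grid point nearest to X̄, which is exactly what the closed-form derivation above predicts.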