Lecture 4: Review of statistics – sampling and estimation
BUEC 333, Professor David Jacks

The goals of inference: learning about a population
Remember: our objective as econometricians is inference, learning about a population of interest. The population can be almost any group of people, firms, etc. that we are interested in (e.g., all Canadian adults, the 30 largest firms on the TSX). Question: what is the relationship between CAPE for firms on the TSX and future price growth?

Sampling
In principle, we can measure a parameter we care about using the whole population, but we almost never do because of cost or data constraints. StatCan comes close in the Census of Population (4/5 short forms; 1/5 long forms), but it does so only every 5 years because of the expense. Basic idea: a cheaper alternative is to contact a small, representative group of individuals.

Populations and samples
Again, inference about a population is almost always based on a sample in econometrics. But how do we choose which population members to sample? Econometricians do it randomly. Example: "If there were an election today, which of these parties would you vote for?" Population: every eligible voter nationwide. Sample: the group randomly selected.

Random sampling
How do we ensure an appropriate sample? The easiest way is a simple random sample (SRS): randomly choose n members of the population, each equally likely to be selected (e.g., draw names). But most surveys are actually not SRS. Example: in a SRS of 1,000 Canadians, it is very unlikely that we would select anyone from PEI; consequently, real surveys typically depart from pure SRS (e.g., by over-sampling small provinces).

Samples as sets of RVs
Suppose we are interested in a RV X; we select a sample of individuals and measure their X. The observed measurements of X that comprise our sample are called observations; the set of these observations is our data. Denote the n observations in the sample as X1, X2, ..., Xn.
Because we randomly select objects into the sample, the values of X1, X2, ..., Xn are random. That is, we do not know what values of X we will get in advance. And if we had chosen different members of the population, their values of X would be different. Thus, under random sampling, not only is each observation Xi a RV, so is any statistic we compute from the sample.

iid sampling
Here, we will assume a convenient kind of sample whereby the distribution of the RV X is the same for all members of the population. Because each Xi (where i = 1, 2, ..., n) comes from the same population distribution, each Xi has the same marginal distribution, f(X). This is why we can use the sample to learn about the population.
Going one step further: suppose the observations are drawn independently of one another. That is, knowing X1 provides no information about X2, ..., Xn. Then we say X1, X2, ..., Xn are independently and identically distributed, or iid. The "convenience" of assuming an iid sample is that it greatly simplifies the math of inference, as we will see below.

Some old skool (BUEC 232) statistics
Suppose we draw an iid sample of n observations, X1, X2, ..., Xn, from a population.
The sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, a "good" estimate of μ.
The sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, a "good" estimate of σ².
Likewise, the sample standard deviation is $s = \sqrt{s^2}$.
Finally, the sample covariance is $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$, a "good" estimate of σXY. And the sample correlation is $r_{XY} = s_{XY} / (s_X s_Y)$.
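To make these formulas concrete, here is a minimal sketch in Python (the language, the simulated data, and all variable names are my own illustration, not part of the lecture) that draws an iid sample on two RVs and computes each of the statistics above by hand; note the n − 1 divisor in the variance and covariance, matching the formulas.

```python
import numpy as np

rng = np.random.default_rng(seed=333)  # arbitrary seed, for reproducibility

# Draw an iid sample of n observations on two related RVs, X and Y
n = 200
X = rng.normal(loc=10.0, scale=2.0, size=n)        # X ~ N(10, 4), an assumption
Y = 3.0 + 0.5 * X + rng.normal(scale=1.0, size=n)  # Y depends on X plus noise

# Sample mean: X-bar = (1/n) * sum of X_i
x_bar = X.sum() / n

# Sample variance: s^2 = (1/(n-1)) * sum of (X_i - X-bar)^2
s2 = ((X - x_bar) ** 2).sum() / (n - 1)

# Sample standard deviation: s = sqrt(s^2)
s = s2 ** 0.5

# Sample covariance: s_XY = (1/(n-1)) * sum of (X_i - X-bar)(Y_i - Y-bar)
s_xy = ((X - X.mean()) * (Y - Y.mean())).sum() / (n - 1)

# Sample correlation: r_XY = s_XY / (s_X * s_Y)
r_xy = s_xy / (X.std(ddof=1) * Y.std(ddof=1))

print(f"mean = {x_bar:.3f}, variance = {s2:.3f}, std dev = {s:.3f}")
print(f"cov(X, Y) = {s_xy:.3f}, corr(X, Y) = {r_xy:.3f}")
# These hand-rolled values agree with np.mean, np.var(ddof=1), np.cov, np.corrcoef
```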
Statistics and sampling distributions
A statistic is simply any function of the sample data and, critically, statistics are RVs (since the sample data they are computed from are RVs). And we know that all RVs have a probability distribution, so all statistics have one too. The probability distribution of a statistic is known as its sampling distribution.
Every statistic has a sampling distribution: a different sample contains different observations, different observations take different values, and so the value of the statistic would be different. The sampling distribution represents uncertainty about the population value of the statistic because it is based on a sample. Like any probability distribution, the sampling distribution tells us which values of the statistic are likely and which are unlikely.

What the sampling distribution tells us
For instance, the mean of the sampling distribution tells us the expected value of the statistic, a measure which tells us where the statistic's probability distribution is centered. The variance of the sampling distribution tells us how spread out the statistic's distribution is. Generally, it is a function of the sample size.
Time for a demonstration: consider the last digit of your student ID numbers. These should be randomly assigned. I can calculate the average value of this last digit for the whole class, namely 4.24. I can also calculate the variance of this last digit for the whole class, namely 8.64. Imagine that these are our population parameters of interest. Note, however, you will never be able to calculate these parameters because you will never have access to the full set of data.
Now, your job is to break up into groups of four and calculate the average within groups. Each of your groups should be randomly assigned and will constitute a sample. Each of your averages should be different and will constitute a sample statistic.
If the sampling variance is large, then it is likely that the statistic takes a value far from the mean of the sampling distribution. If the sampling variance is small, then it is unlikely that the statistic takes a value far from the mean of the sampling distribution. Usually, the sampling variance gets smaller as the sample size grows.

Estimation
The sample statistics seen before are estimators (they are used to estimate population parameters). That is, we care about population parameters like μ, but we do not observe them directly and cannot measure their values in the population. So we draw a sample from the population and estimate μ using that sample. We said X-bar is a "good" estimate of μ, but what constitutes "good"?

Estimators and their properties: bias
There are tons of available estimators, but they are not created equal: X-bar is an estimator of μ, but so is X1 (or X2). This is where the estimator's sampling distribution is useful: suppose we are interested in a population parameter Q and q is a sample statistic used to estimate Q. We say q is an unbiased estimator of Q if Q is the mean of the sampling distribution of q, i.e., if E(q) = Q.

Estimators and their properties: efficiency
Unbiasedness is nice but "weak": there are many unbiased estimators of a given population parameter. Example: how to estimate μ; in an iid sample, the sample mean is an unbiased estimator of μ: $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$, as the sample is iid and E(Xi) = μ for all observations.
How do we then choose between unbiased estimators? Suppose we have two unbiased estimators of Q, q1 and q2: q1 is more efficient than q2 if Var(q1) < Var(q2). We prefer the unbiased estimator with the smaller sampling variance, i.e., q1. Why? The more efficient estimator is more likely to take a value close to the true Q.
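A quick simulation makes both properties concrete. The sketch below echoes the classroom exercise but uses an idealized population of last digits 0–9 (so μ = 4.5 and σ² = 8.25, rather than the class's observed 4.24 and 8.64); the group size of four matches the exercise, while everything else is my own illustration. Both X-bar and the single observation X1 come out unbiased, but X-bar has the much smaller sampling variance and is therefore more efficient.

```python
import numpy as np

rng = np.random.default_rng(seed=4)  # arbitrary seed

# Idealized population: last digits 0 through 9, equally likely
# (population mean mu = 4.5, population variance sigma^2 = 8.25)
n = 4            # group size, as in the classroom exercise
reps = 100_000   # number of samples drawn

samples = rng.integers(0, 10, size=(reps, n))
x_bar = samples.mean(axis=1)   # estimator 1: the sample mean
x_1 = samples[:, 0]            # estimator 2: just the first observation

# Both estimators are unbiased: their sampling distributions center on mu
print("mean of X-bar:", x_bar.mean())   # ~ 4.5
print("mean of X_1:  ", x_1.mean())     # ~ 4.5

# But X-bar is more efficient: its sampling variance is sigma^2 / n
print("var of X-bar:", x_bar.var())     # ~ 8.25 / 4 = 2.06
print("var of X_1:  ", x_1.var())       # ~ 8.25
```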
Sampling distribution of the sample mean
Suppose X1, X2, ..., Xn are an iid random sample from a population with mean μ and variance σ². We have already shown that the sample mean is unbiased; the variance of the sampling distribution of the sample mean (or the sampling variance of the sample mean) is σ²/n: $\mathrm{Var}(\bar{X}) = \sigma^2/n$ (refer to the last slide in Lecture 3).
In fact, if X1, X2, ..., Xn are iid draws from the N(μ, σ²) distribution, then $\bar{X} \sim N(\mu, \sigma^2/n)$. But why? We already know the values of the mean and variance of the sampling distribution. We also know the sample mean is just a linear combination of a bunch of N(μ, σ²) RVs. Finally, we know that linear combinations of normal RVs are themselves normally distributed.

Ways to characterize the sampling distribution
1.) The easiest way to characterize a statistic's sampling distribution is to calculate some of its features, like its mean and variance. Examples: an estimator's bias depends on the mean of its sampling distribution; efficiency involves comparing sampling variances. The standard deviation of the sampling distribution of a statistic has a special name: the standard error.
2.) Given knowledge of (or assumptions about) the exact probability distribution of the population, we can derive the statistic's exact sampling distribution. Example: when sampling from a normal population, the sample mean is normally distributed.
3.) If unwilling or unable to do 2.), we can rely on asymptotic theory to derive an approximate sampling distribution.

The law of large numbers and the central limit theorem
Thankfully, there already exist some powerful theorems describing the behavior of a sample mean as the sample size tends to infinity. But why do we care about sample means? Because most statistics of interest can be written as sample means of something. Therefore, we can use these theorems to describe an approximate sampling distribution for many statistics.
The law of large numbers (LLN): as the sample size n approaches infinity, the sample mean will be close to the population mean with very high probability. If q → Q as n → ∞, we say q is a consistent estimator of Q. Thus, the LLN says the sample mean is a consistent estimator of the population mean.
The central limit theorem (CLT): as the sample size approaches infinity, the sampling distribution of the sample mean is approximately normal with mean μ and variance σ²/n. CLT: as $n \to \infty$, $\bar{X} \approx N(\mu, \sigma^2/n)$.
In words: the sum (and hence, the mean) of a number of independent, identically distributed random variables will tend to be normally distributed, regardless of their underlying distribution, if the number of RVs is large enough.

Demonstrations of the CLT
Consider (yet again) the case of a six-sided die: there are 6 possible outcomes {1, 2, 3, 4, 5, 6}, each with an associated probability of 1/6. The pdf looks like the following…
[Figure: the uniform pdf of a fair die, p(x) = 1/6 for x = 1, ..., 6]
Simulation: using a computer to "throw the dice" many times (N). We can then look at the sampling distribution of the average and consider what happens as N increases.
[Simulation output: the average of N throws differs from run to run]
Give me a billion! Now let's plot the histogram…
[Figure: histogram of one billion simulated throws]
We know the population mean is equal to 3.5… so pretty close, but how can we get closer?
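The lecture's own simulation code is not reproduced here, but a minimal sketch along the same lines (Python and all numbers below are my own illustration) shows both the run-to-run variation and the pull toward 3.5 as N grows, exactly as the LLN promises. The answer to "how can we get closer?" is simply a larger N.

```python
import numpy as np

rng = np.random.default_rng()  # no seed: each run, like each classroom demo, differs

# "Throw the dice" N times and average, for increasingly large N
for N in [10, 100, 10_000, 1_000_000]:
    throws = rng.integers(1, 7, size=N)   # outcomes 1..6, each with probability 1/6
    print(f"N = {N:>9,}: average = {throws.mean():.4f}")

# The averages drift toward the population mean of 3.5 as N grows (LLN);
# re-running the script ("run it another time") gives different small-N averages.
```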
Demonstrations of the CLT
CLT:
[Figure: histogram of simulated dice averages for N = 100]
[Figure: histogram of simulated dice averages for N = 1000]

Recap: the importance of sampling distributions
The point of statistical inference is to use the observed sample to learn things about the population, like its mean and variance. But we do not observe population parameters, only the sample; we then estimate the population parameters using sample statistics. Then, the general goal is to test hypotheses about these population parameters.
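As a closing illustration of why sampling distributions matter, here is a sketch of the N = 100 versus N = 1000 comparison shown in the figures above (my own reconstruction, not the lecture's code): many samples of each size are drawn, and the variance of the resulting sample means shrinks like σ²/N, just as the CLT says.

```python
import numpy as np

rng = np.random.default_rng(seed=35)  # arbitrary seed
sigma2 = 35.0 / 12.0                  # population variance of one die roll (~2.917)

for N in [100, 1000]:
    # Draw 20,000 samples of size N and compute each sample's mean
    means = rng.integers(1, 7, size=(20_000, N)).mean(axis=1)
    print(f"N = {N}: mean of sample means = {means.mean():.3f} (mu = 3.5), "
          f"variance = {means.var():.5f} vs sigma^2/N = {sigma2 / N:.5f}")

# A histogram of `means` (e.g., with matplotlib) looks bell-shaped and narrows
# by a factor of 10 in variance as N goes from 100 to 1000: the CLT at work.
```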