Lecture 4:
Review of statistics – sampling and estimation
BUEC 333
Professor David Jacks
Remember: our objective as econometricians is
inference, learning about a population of interest.
The population can be almost any group of people,
firms, etc. that we are interested in (e.g., all
Canadian adults, the 30 largest firms in the TSX).
Question: what is the relationship between CAPE (the cyclically adjusted price-to-earnings ratio) for firms in the TSX and future price growth?
The goals of inference: learning about a population
In principle, we can measure a parameter we care
about using the whole population, but we almost
never do because of the cost or data constraints.
StatCan almost does this in the Census of Population (4/5 short forms, 1/5 long forms) but only does so every 5 years because of its expense.
Basic idea: a cheaper alternative is to contact a
small, representative group of individuals.
Sampling
Again, inference about a population is almost
always based on a sample in econometrics.
But how to choose which population members to
sample? Econometricians do it randomly.
Example: “If there were an election today, which of these parties would you vote for?”
Population: every eligible voter nationwide
Sample: the group randomly selected
Populations and samples
How to ensure an appropriate sample?
Easiest way is a simple random sample (SRS): randomly choose n members of the population such that each member is equally likely to be selected (e.g., draw names from a hat).
But most surveys are actually not SRS.
Example: in an SRS of 1000 Canadians, it is very unlikely we would select anyone from PEI… consequently, most surveys stratify the population (e.g., by province) to guarantee representation.
Random sampling
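To make this concrete, here is a minimal Python sketch of drawing an SRS; the population of names here is invented purely for illustration:

```python
import random

# Invented population standing in for, e.g., all eligible voters.
population = ["Ava", "Ben", "Chen", "Dana", "Eli",
              "Fay", "Gus", "Hana", "Ivan", "Jo"]

# random.sample() draws n members without replacement, with every
# subset of size n equally likely -- the defining property of an SRS.
srs = random.sample(population, 3)
print(srs)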
Suppose we are interested in a RV X; select a
sample of individuals and measure their X.
The observed measurements of X that comprise
our sample are called observations; the set of
these observations are our data.
Denote the n observations in the sample as
X1, X2, ... , Xn.
Samples as sets of RVs
Because we randomly select objects into the
sample, the values of X1, X2, ... , Xn are random.
That is, we do not know what values of X we will
get in advance.
And if we had chosen different members of the
population, their values of X would be different.
Thus, under random sampling, not only are the individual observations X1, X2, ... , Xn random, but so is anything we compute from them.
Samples as sets of RVs
Here, we will assume a convenient kind of sample
whereby the distribution of RV X is the same for
all members of the population.
Because each Xi (where i = 1, 2,..., n) comes from
the same population distribution, each Xi has the
same marginal distribution: f(X).
This is why we can use the sample to learn about the population.
iid sampling
Going one step further: suppose the observations
are drawn independently of one another.
That is, knowing X1 provides no information about
X2, …, Xn.
Then we say X1, X2, ... , Xn are independently and
identically distributed, or iid.
The “convenience” of assuming an iid sample is that the joint distribution of the observations is simply the product of their identical marginals, f(X).
iid sampling
Suppose we draw an iid sample of n observations,
X1, X2,..., Xn, from a population.
The sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, a “good” estimate of μ.
The sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$, a “good” estimate of σ².
Likewise, the sample standard deviation is $s = \sqrt{s^2}$, a “good” estimate of σ.
Some old skool (BUEC 232) statistics
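As a quick check on these formulas, a small Python sketch (the data values are invented) that computes the three statistics directly:

```python
# Invented sample of n = 5 observations.
data = [4.0, 7.0, 3.0, 8.0, 6.0]
n = len(data)

x_bar = sum(data) / n                               # sample mean
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # sample variance
s = s2 ** 0.5                                       # sample standard deviation

print(x_bar, s2, s)  # 5.6, 4.3, ~2.07
```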
Finally, the sample covariance is $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$, a “good” estimate of $\sigma_{XY}$.
And the sample correlation is $r_{XY} = s_{XY} / (s_X s_Y)$.
Some old skool (BUEC 232) statistics
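Extending the sketch above to paired data (again invented), sample covariance and correlation follow the same pattern:

```python
# Invented paired observations (X, Y).
x = [4.0, 7.0, 3.0, 8.0, 6.0]
y = [2.0, 6.0, 1.0, 5.0, 4.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance: average cross-product of deviations, with n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations, then correlation r = s_xy / (s_x * s_y).
s_x = (sum((xi - x_bar) ** 2 for xi in x) / (n - 1)) ** 0.5
s_y = (sum((yi - y_bar) ** 2 for yi in y) / (n - 1)) ** 0.5
r_xy = s_xy / (s_x * s_y)

print(s_xy, r_xy)
```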
A statistic is simply any function of the sample data; critically, statistics are RVs (since the sample data they are computed from are RVs).
And we know that all RVs have a probability
distribution, so all statistics have one too.
The probability distribution of a statistic is known
as the sampling distribution.
Statistics and sampling distributions
Every statistic has a sampling distribution:
a different sample contains different observations,
different observations take different values,
and so the value of the statistic would be different.
The sampling distribution represents the uncertainty in a statistic that arises because it is computed from a sample rather than the whole population.
Like any probability distribution, the sampling distribution tells us what values of the statistic are likely and unlikely to occur.
Statistics and sampling distributions
For instance, the mean of the sampling
distribution tells us the expected value of the
statistic, a measure which tells us where the
statistic’s probability distribution is centered.
The variance of the sampling distribution tells
us how spread out the statistic’s distribution is.
Generally, it is a function of the sample size.
What the sampling distribution tells us
Time for a demonstration: consider the last digit of
your student ID numbers.
These should be randomly assigned.
What the sampling distribution tells us
I can calculate the average value of this last digit
for the whole class, namely 4.24.
I can also calculate the variance of this last digit
for the whole class, namely 8.64.
Imagine that these are our population parameters
of interest.
Note, however, you will never be able to calculate
these parameters because you will never have
access to the full set of data.
What the sampling distribution tells us
Now, your job is to break up into groups of four and calculate the average within groups.
Each of your groups should be randomly assigned
and will constitute a sample.
Each of your averages should be different and will
constitute a sample statistic.
What the sampling distribution tells us
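A Python sketch of this classroom exercise (the class size of 80 and the seed are arbitrary choices): simulate last digits, split into groups of four, and compare the group averages.

```python
import random

random.seed(333)  # arbitrary seed so the sketch is reproducible

# Simulate the last ID digit for a hypothetical class of 80 students.
digits = [random.randint(0, 9) for _ in range(80)]

# Treat the whole class as the "population" of interest.
class_mean = sum(digits) / len(digits)

# Each group of four is a sample; each group average is one draw
# from the sampling distribution of the sample mean with n = 4.
group_means = [sum(digits[i:i + 4]) / 4 for i in range(0, len(digits), 4)]

print(f"class mean: {class_mean:.2f}")
print("group means:", group_means)
```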
If the sampling variance is large, then it is likely
that the statistic takes a value far from the mean of
the sampling distribution.
If the sampling variance is small, then it is
unlikely that the statistic takes a value far from the
mean of the sampling distribution.
Usually, the sampling variance gets smaller as the sample size n gets larger.
What the sampling distribution tells us
The sample statistics seen before are estimators
(they are used to estimate population parameters).
That is, we care about population parameters like
μ, but do not observe them directly and cannot
measure their values in the population.
So…draw a sample from the population and
estimate μ using that sample.
We said X-bar is a “good” estimate of μ, but what constitutes “good”?
Estimation
Tons of available estimators, but not created equal:
X-bar is an estimator of μ, but so is X1 (or X2).
Usefulness of estimator’s sampling distribution:
suppose we are interested in population parameter
Q and q is a sample statistic used to estimate Q.
We say q is an unbiased estimator of Q if Q is the mean of q's sampling distribution, i.e., E(q) = Q.
Estimators and their properties: bias
Unbiasedness is nice but “weak”: many unbiased
estimators of a given population parameter.
Example: how to estimate μ; in an iid sample, the sample mean is an unbiased estimator of μ:
$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu,$
as the sample is iid and E(Xi) = μ for all observations.
Estimators and their properties: efficiency
How do we then choose between unbiased
estimators?
Suppose two unbiased estimators of Q, q1 and q2:
q1 is more efficient than q2 if Var(q1) < Var(q2).
We prefer the unbiased estimator with the smaller
sampling variance, i.e., q1.
Why? The more efficient estimator is more likely to produce an estimate close to the true value of Q.
Estimators and their properties: efficiency
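A Monte Carlo sketch of this comparison (the normal population, sample size, and replication count are arbitrary choices): both X-bar and X1 should come out unbiased for μ, but X-bar should show a far smaller sampling variance.

```python
import random
import statistics

random.seed(333)  # arbitrary seed for reproducibility
mu, sigma, n, reps = 5.0, 2.0, 25, 10_000

xbar_draws, x1_draws = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar_draws.append(sum(sample) / n)  # estimator q1: the sample mean
    x1_draws.append(sample[0])          # estimator q2: the first observation

# Both means should sit near mu = 5 (unbiasedness), but the variance of
# X-bar should be near sigma^2 / n = 0.16 versus sigma^2 = 4 for X1.
print(statistics.mean(xbar_draws), statistics.variance(xbar_draws))
print(statistics.mean(x1_draws), statistics.variance(x1_draws))
```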
Suppose X1, X2, ... ,Xn are an iid random sample
from a population with mean μ and variance σ2.
Already shown that the sample mean is unbiased; the variance of the sampling distribution of the sample mean (or the sampling variance of the sample mean) is σ²/n:
$Var(\bar{X}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n},$
where the second equality uses independence (refer to last slide in lecture 3).
Sampling distribution of the sample mean
In fact, if X1, X2, ... , Xn are iid draws from the N(μ, σ²) distribution, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
But why? We already know the values of the mean
and variance of the sampling distribution…
We also know the sample mean is just a linear
combination of a bunch of N(μ, σ2) RVs.
Finally, we know that linear combinations of normal RVs are themselves normally distributed.
Sampling distribution of the sample mean
1.) The easiest way to characterize a statistic’s
sampling distribution is to calculate some of its
features, like its mean and variance.
Examples: an estimator’s bias depends on the
mean of the sampling distribution; efficiency
involves comparing its sampling variance.
The standard deviation of the sampling distribution of a statistic has a special name: the standard error.
Ways to characterize the sampling distribution
2.) Given knowledge of (or assumptions about) the
exact probability distribution of the population, we
can derive the statistic’s exact sampling
distribution.
Example: when sampling from a normal
population, sample mean is normally distributed.
3.) If unwilling or unable to do 2.), we can rely on asymptotic theory to derive an approximate sampling distribution.
Ways to characterize the sampling distribution
Thankfully, there already exist some powerful
theorems to describe the behavior of a sample
mean as the sample size tends to infinity.
But why do we care about sample means?
Because most statistics of interest can be written
as sample means of something.
Therefore, we can use these theorems to describe an approximate sampling distribution for many statistics of interest.
The law of large numbers and the central limit theorem
The law of large numbers (LLN):
as the sample size n approaches infinity, the
sample mean will be close to the population mean
with very high probability.
If q → Q (in probability) as n → ∞, we say q is a consistent estimator of Q.
Thus, the LLN says the sample mean is a consistent estimator of the population mean μ.
The law of large numbers and the central limit theorem
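A short sketch of the LLN with a fair die (the seed and sample sizes are arbitrary): the sample mean settles toward the population mean of 3.5 as n grows.

```python
import random

random.seed(333)  # arbitrary seed for reproducibility

# Population: throws of a fair six-sided die, with population mean 3.5.
for n in (10, 100, 10_000, 1_000_000):
    throws = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(throws) / n)  # sample mean drifts toward 3.5 as n grows
```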
The central limit theorem (CLT):
as the sample size approaches infinity, the
sampling distribution of the sample mean is
approximately normal with mean μ and variance
σ2/n.
CLT: as $n \to \infty$, $\bar{X} \sim N(\mu, \sigma^2/n)$ (approximately).
The law of large numbers and the central limit theorem
CLT: the sum (and hence, the mean) of a number
of independent, identically distributed random
variables will tend to be normally distributed,
regardless of their underlying distribution, provided the number of RVs being summed is large enough.
Consider (yet again) the case of a six-sided dice:
There are 6 possible outcomes {1, 2, 3, 4, 5, 6}
Each with an associated probability of 1/6
The pdf looks like the following:
[figure: pdf of a fair six-sided die — uniform at height 1/6 over {1, 2, 3, 4, 5, 6}]
Demonstrations of the CLT
Simulation: using a computer to “throw the dice”
many times (N).
We can then look at the sampling distribution of
the average and consider what happens as N
increases.
Demonstrations of the CLT
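A runnable Python version of this simulation (the seed, the number of replications, and the values of N are arbitrary; the plotting step is omitted):

```python
import random
import statistics

random.seed(333)  # arbitrary seed for reproducibility

def average_of_throws(n):
    """Throw a fair six-sided die n times and return the average."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# For each N, draw many sample averages (each one a draw from the
# sampling distribution) and summarize that distribution.
for n in (1, 10, 100, 1000):
    draws = [average_of_throws(n) for _ in range(5_000)]
    print(n, statistics.mean(draws), statistics.variance(draws))
    # The mean stays near 3.5, the variance shrinks like sigma^2 / n,
    # and (per the CLT) a histogram of the draws looks ever more normal.
```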
Run it one time; run it another time; OK, one more time: each run produces a different value of the sample average.
[figures: averages from three separate simulation runs]
Demonstrations of the CLT
Give me a billion! Now let's plot the histogram…
[figure: histogram of the simulation results]
Demonstrations of the CLT
We know the population mean is equal to 3.5… so pretty close, but how can we get closer? CLT: as N increases, the sampling distribution of the average collapses around μ = 3.5 with variance σ²/N.
Demonstrations of the CLT
For N = 100… [figure: sampling distribution of the average for N = 100]
For N = 1000… [figure: sampling distribution of the average for N = 1000]
Demonstrations of the CLT
The point of statistical inference is to use the
observed sample to learn things about the
population like its mean and variance.
But we do not observe population parameters,
only the sample…we then estimate the population
parameters using sample statistics.
Then, the general goal is to test hypotheses about those population parameters.
Recap: the importance of sampling distributions