Lecture 5: Review of statistics – useful probability distributions

BUEC 333
Professor David Jacks
Throughout the course, there are four probability
distributions that we encounter again and again:
the normal, chi-square, (Student’s) t, and F.
These are important for our purposes because
most theory regarding the classical linear
regression model (CLRM) was developed in the
context of the normal distribution.
Doing so allows us to arrive at exact results
Useful probability distributions
But when we get away from the exact
distributional assumptions of the CLRM, we have
to use large sample approximations.
From the central limit theorem, many statistics
have an approximately normal distribution as the
sample size gets large (add up enough RVs from non-normal
distributions and their sum or average looks approximately normal).
Consequently, test statistics that we care about
turn out to have chi-square, t, or F
The CLT in action
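The CLT can be checked with a small simulation. The sketch below is only an illustration: the choice of Uniform(0, 1) draws, the sample size n = 30, the number of replications, and the function names are all assumptions made here, not part of the lecture. It averages non-normal draws many times and asks whether roughly 95% of the averages fall within 1.96 standard deviations of their mean, as they would for a normal RV.

```python
import random
import statistics


def sample_means(n, reps, seed=0):
    """Return `reps` averages, each taken over `n` independent Uniform(0, 1) draws."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]


def share_within_1_96_sd(values):
    """Fraction of values lying within 1.96 sample standard deviations of their mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) <= 1.96 * s for v in values) / len(values)


# Illustrative settings (assumptions): n = 30 draws per average, 20,000 averages
means = sample_means(n=30, reps=20000)
center = statistics.fmean(means)
share = share_within_1_96_sd(means)

print(round(center, 3))  # close to 0.5, the mean of a Uniform(0, 1) RV
print(round(share, 3))   # close to 0.95 if the averages are roughly normal
```

The individual draws are flat (uniform), yet their averages behave much like a normal RV, which is the point of the large-sample approximation.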
A continuous RV with a normal distribution has a
well-behaved, bell-shaped pdf.
It is completely characterized by just two parameters,
its mean (μ) and variance (σ²), with about 95% of its
probability mass lying within μ ± 1.96σ.
We write X ~ N(μ, σ²).
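The 95% statement can be verified numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the values μ = 50 and σ = 2 are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration
mu, sigma = 50.0, 2.0
X = NormalDist(mu=mu, sigma=sigma)

# Probability that X falls within mu ± 1.96 * sigma
lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma
prob_inside = X.cdf(upper) - X.cdf(lower)
print(round(prob_inside, 4))  # → 0.95

# Going the other way: the 97.5th percentile of N(0, 1) recovers the 1.96 cutoff
z_cutoff = NormalDist().inv_cdf(0.975)
print(round(z_cutoff, 2))  # → 1.96
```

The answer does not depend on the particular μ and σ used; any normal distribution puts about 95% of its probability within 1.96 standard deviations of the mean.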
[Figure: pdf of a normal distribution; vertical axis (pdf) from 0.00 to 0.20, horizontal axis from 0 to 100]
The normal distribution
If X ~ N(μ, σ²), we can standardize X by
subtracting off the mean and dividing by the
standard deviation: Z = (X - μ)/σ and Z ~ N(0,1).
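Standardizing is what lets a single N(0,1) table serve every normal distribution. The sketch below, with hypothetical values μ = 10, σ = 3, and x = 14.5 (assumptions for illustration only), computes P(X ≤ x) directly and again through the standardized value Z.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Return z = (x - mu) / sigma, the standardized value of x."""
    return (x - mu) / sigma


# Hypothetical values, chosen only for illustration
mu, sigma = 10.0, 3.0
x = 14.5

z = standardize(x, mu, sigma)
print(z)  # → 1.5

# P(X <= x) computed from N(mu, sigma^2) and from N(0, 1) evaluated at z
p_direct = NormalDist(mu, sigma).cdf(x)
p_via_z = NormalDist().cdf(z)
print(abs(p_direct - p_via_z) < 1e-12)  # → True
```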
Two further useful results related to the normal
distribution:
1.) If X ~ N(μ, σ²) then a + bX ~ N(a + bμ, b²σ²)
2.) If X_1, X_2, ..., X_n are normally distributed RVs
Many important test statistics have a chi-square
distribution.
It is defined by a single parameter: the degrees of
freedom, denoted v.
Not symmetric but rather has a very long "tail" in
the positive direction (very large positive values
can occur, although not too often).
The chi-square distribution
Standard notation: χ²_v
Its definition is based on the normal distribution:
if Z ~ N(0,1), then Z² ~ χ²_1.
Furthermore, if X_1 and X_2 are independent χ²_1 RVs,
then X_1 + X_2 ~ χ²_2.
Likewise, if we add n independent χ²_1 RVs, the sum is distributed χ²_n.
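The construction of a chi-square RV as a sum of squared independent standard normals can be simulated directly. The sketch below is illustrative only: the function name, v = 4, and the number of replications are assumptions made here. It also uses two standard facts not stated above, namely that a chi-square RV with v degrees of freedom has mean v and variance 2v.

```python
import random
import statistics


def chi_square_draws(v, reps, seed=0):
    """Simulate `reps` chi-square(v) draws as sums of v squared N(0, 1) draws."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v)) for _ in range(reps)]


# Illustrative settings (assumptions): v = 4 degrees of freedom, 50,000 draws
draws = chi_square_draws(v=4, reps=50000)
mean_draws = statistics.fmean(draws)
var_draws = statistics.variance(draws)
median_draws = statistics.median(draws)

print(round(mean_draws, 2))       # close to v = 4
print(round(var_draws, 2))        # close to 2v = 8
print(min(draws) >= 0)            # → True, since squares cannot be negative
print(mean_draws > median_draws)  # mean above median reflects the long right tail
```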
A very important test statistic—called the t
statistic (not a coincidence)—has a probability
distribution called the (Student’s) t distribution.
It is defined by a single parameter, the degrees of
freedom v, and is denoted t_v.
The distribution is very similar to the normal even for
small v, but with slightly thicker tails.
Based on the normal and chi-square: if Z ~ N(0,1),
X ~ χ²_v, and they are independent, then Z / sqrt(X/v) ~ t_v.
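A t RV can be simulated from its building blocks. This sketch assumes the standard construction T = Z / sqrt(X/v), with Z ~ N(0,1) and an independent chi-square X with v degrees of freedom; the function name, v = 5, and the number of replications are illustrative assumptions. It then measures how often |T| exceeds 1.96, which happens 5% of the time for a normal RV.

```python
import math
import random


def t_draws(v, reps, seed=0):
    """Simulate `reps` t(v) draws as Z / sqrt(X / v), with Z and X independent."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    out = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)                              # Z ~ N(0, 1)
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))  # X ~ chi-square(v)
        out.append(z / math.sqrt(x / v))
    return out


# Illustrative settings (assumptions): v = 5 degrees of freedom, 50,000 draws
draws = t_draws(v=5, reps=50000)
tail_share = sum(abs(t) > 1.96 for t in draws) / len(draws)

print(round(tail_share, 3))  # above 0.05: the t has thicker tails than the normal
```

Rerunning with a larger v should move the tail share toward 0.05, since the t distribution approaches the standard normal as v grows.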
Student’s t distribution
Another derived distribution that is very important
for inference: the F test statistic has an F distribution.
Like chi-square, RVs with an F distribution take
positive values only and are positively skewed.
It is defined by two degrees of freedom parameters, v_1
and v_2, and denoted F_{v_1,v_2}.
Based on the chi-square: if X_1 and X_2 are
independent chi-square RVs with v_1 and v_2 degrees
of freedom, respectively, then (X_1/v_1) / (X_2/v_2) ~ F_{v_1,v_2}.
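An F RV can likewise be simulated from two independent chi-square RVs. The sketch assumes the standard ratio (X_1/v_1) / (X_2/v_2); the function names, v_1 = 3, v_2 = 10, and the number of replications are illustrative assumptions. It also relies on a standard fact not stated above: for v_2 > 2, the mean of the F distribution is v_2 / (v_2 - 2).

```python
import random
import statistics


def chi_square(rng, v):
    """One chi-square(v) draw: the sum of v squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))


def f_draws(v1, v2, reps, seed=0):
    """Simulate `reps` F(v1, v2) draws as (X1 / v1) / (X2 / v2), X1 and X2 independent."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [(chi_square(rng, v1) / v1) / (chi_square(rng, v2) / v2) for _ in range(reps)]


# Illustrative settings (assumptions): v1 = 3, v2 = 10, 50,000 draws
draws = f_draws(v1=3, v2=10, reps=50000)
mean_draws = statistics.fmean(draws)
median_draws = statistics.median(draws)

print(round(mean_draws, 2))       # close to v2 / (v2 - 2) = 1.25
print(min(draws) > 0)             # → True, a ratio of positive quantities
print(mean_draws > median_draws)  # mean above median reflects positive skew
```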
F distribution