
Supplement: Review of Basic Probability

Review of basic probability
concepts
REFERENCES:
• Wooldridge, Appendix B
• Danao, Rolando. Introduction to Statistics and Econometrics. Diliman: The University of the Philippines Press
Review of Probability Concepts
I. Probability
II. Random Variables, Probability Distributions, Features of Probability Distributions
III. Special Probability Distributions
Experiment
 An experiment is any procedure that can, at least in
theory, be infinitely repeated and has a well-defined
set of outcomes, e.g. tossing a coin and counting the
number of times a head turns up.
Sample space, event
 The sample space of a random experiment is the set of all
possible outcomes of the experiment. Each outcome is
called a sample point. A subset of a sample space is called
an event.
 Experiment 1: Rolling a single die
Sample space: S = {1,2,3,4,5,6}
Event of an “odd outcome”: A = {1,3,5}
Event of an “even outcome”: B = {?}
 Experiment 2: tossing a coin until a tail appears.
S = {T, HT, HHT, …}
Event tail occurs on the second toss: A = {HT}
Events
 Given two events A and B,
A U B = “union of A and B” = either A or B occurs
A ∩ B = “intersection of A and B” = both A and B occur
If A is an event, the statement “A = ∅” means A is impossible.
 Let A be an event in sample space S. The complementary event of A is defined as
Ac = {x ∈ S | x ∉ A}
Thus
A ∪ Ac = S
and
A ∩ Ac = ∅
Events
 Two events A and B are said to be mutually exclusive if
and only if they have no elements in common,
i.e. AB=Ф
 Or, the occurrence of A implies the non-occurrence of B
and vice versa
Probability
 The probability of an event is the chance or likelihood of
this event occurring, measured by a number between 0 and
1
 Equivalent statements:
Probability of an event is 3/4.
= An event occurs with a 75 percent probability.
= The odds against the event occurring are 25 percent to
75 percent, or 1 to 3.
Classical definition of probability
 Let S be a sample space consisting of N mutually exclusive
and equally likely outcomes. The probability of a single
outcome is defined as 1/N.
The probability of an event is the sum of the
probabilities of the sample points in the event. The
probability of an event is denoted as P(A) or Pr(A).
 This is the classical definition of probability. It requires
a finite number of mutually exclusive, equally likely
outcomes.
Example
Experiment = Toss a coin twice
S = {HH, HT, TH, TT}
N=4
A = event that a head will turn up in both tosses
P(A)= 1/4 = 25 percent
B = event that at least one head will occur
P (B) = ?
Experiment = Roll a fair die once
S = {1,2,3,4,5,6}
N=6
A = event that an odd number turns up={1,3,5}
P(A) = P(1)+P(3)+P(5)=(1/6) + (1/6) + (1/6) = 3/6
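As a quick illustration of the classical definition, here is a small Python sketch (Python and the names below are illustrative, not part of the references) that enumerates an equally likely sample space and counts sample points:

from itertools import product

# Experiment: toss a fair coin twice; the four outcomes are equally likely.
sample_space = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
N = len(sample_space)                          # N = 4

A = [s for s in sample_space if s == ("H", "H")]   # a head on both tosses
B = [s for s in sample_space if "H" in s]          # at least one head

print("P(A) =", len(A) / N)   # 0.25
print("P(B) =", len(B) / N)   # 0.75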
Probability set function
Let S be a sample space and A an event.
i. 0 ≤ P(A) ≤ 1
ii. P(S) = 1
iii. For any mutually exclusive events A1 and A2,
P(A1 ∪ A2) = P(A1) + P(A2)
A real-valued function P that satisfies the preceding three
axioms on a sample space S is a probability set function.
Theorems
(i) P(Ac) = 1 – P(A)
(ii) P(∅) = 0
(iii) If A ⊆ B, then P(A) ≤ P(B).
(iv) If A and B are any two events, then
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Conditional Probability
 Let A and B be events in a sample space S. The
conditional probability of event A given that B has
occurred or is certain to occur is defined as
P(A | B) = P(A ∩ B) / P(B),  P(B) > 0
Example 3: rolling a die. Let A be event of rolling a “3” and B
the event “odd outcome”. Then A = {3}, B = {1,3,5} and P
(A|B) = (1/6)/(3/6) = 1/3
Example: Data from a survey of 2000 individuals selected
at random (Danao)
Sex of Worker:
                 Male (M)   Female (F)   Total
Employed (E)        850        600        1450
Unemployed (U)      300        250         550
Total              1150        850        2000
P(E) = 1450/2000
P(U) = 550/2000
P(M) = 1150/2000
P(F) = 850/2000
The probability that a male worker is employed is:
 P(E|M) = P(E ∩ M)/P(M) = (850/2000)/(1150/2000) = 850/1150 ≈ 0.74
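A short Python sketch of the same computation (the dictionary layout is just one convenient way to hold the survey counts):

# Survey counts from the table above (Danao example), N = 2000.
counts = {("E", "M"): 850, ("E", "F"): 600,
          ("U", "M"): 300, ("U", "F"): 250}
N = sum(counts.values())

P_M   = (counts[("E", "M")] + counts[("U", "M")]) / N   # P(M) = 1150/2000
P_E_M = counts[("E", "M")] / N                          # P(E and M) = 850/2000

print("P(E|M) =", P_E_M / P_M)   # 850/1150, about 0.739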
Independence
 Two events A and B are said to be independent
(statistically independent, stochastically independent)
when the occurrence of one event does not affect the
probability of the occurrence of the other event. That is, iff
P(A|B)=P(A) and P(B|A)=P(B)
 Two events are independent if and only if
P(A ∩ B) = P(A)P(B)
 Events A, B and C are independent iff they are pairwise
independent and P(A ∩ B ∩ C) = P(A)P(B)P(C)
Example 5: card drawn from deck of 52 cards
 Let A be the event that a queen is drawn
 B be the event that a spade is drawn.
n=52
A={QH, QD, QS, QC}
B={2S,…,JS, QS, KS, AS}
P(A) = 4/52 = 1/13
P(B) = 13/52 = ¼
What is the probability that the queen of spades is drawn?
P(A ∩ B) = 1/52 = (1/13)*(1/4) = P(A)P(B)
Therefore, A and B are independent.
II. Random Variables,
Probability Distributions
Random Variables
 It is more convenient to use numbers to represent
elements of a sample space
 For outcomes that are not numbers, this is
accomplished by associating a number with each
outcome, e.g. instead of S= {H,T}, use {1,0}.
 This assignment of numbers is called a random
variable.
Formally,
 A random variable is a real-valued function defined on a sample space. If X is a random variable defined on a sample space S, then the range of X, denoted by X(S), is the set
X(S) = {x ∈ R | x = X(s), s ∈ S}
where R is the set of real numbers.
(Diagram: X maps each sample point s in S to a real number x = X(s) in R.)
Let X: S → R be a random variable. If x ∈ R, the symbol “X = x” means “X takes the value x”. That is, there is an s ∈ S such that X(s) = x.
Example: tossing a fair coin twice
S = {HH, TT, HT, TH}
Let a random variable X = number of heads that turn
up. So,
Sample point:   HH   TT   HT   TH
X:               2    0    1    1
The event “X = 1” is the set {s ∈ S | X(s) = 1} = {HT, TH}
Discrete random variable
A discrete random variable can take only a finite or
“countably infinite” number of values (values can be put in
a one-to-one correspondence with the positive integers).
We will concentrate on the former.
Example: Lottery
first prize: P100,000
second prize: P5000
third prize: P500.75
Prize money is a discrete random variable since it has only
four (a finite number) possible outcomes:
P0.00; P500.75; P5000.00; P100,000.00
Simplest example: Bernoulli random variable
A random variable that can only take on the values of
zero and one is called a Bernoulli random variable.
 Event X = 1 is a “success”; Event X = 0 is a “failure”
 To completely describe it, we need only know P(X=1).
 P(X=1) = θ reads “the probability that X equals one is θ”.
Notation: X ~ Bernoulli (θ) read as “X has a Bernoulli
distribution with probability of success equal to θ”
More generally,
Any discrete random variable is completely described by
listing its possible values and the associated probability that
it takes on each value.
A list of all of the possible values taken by a discrete random
variable along with their chances of occurring is called a
probability function or probability density function (pdf).
die          x    f(x)
one dot      1    1/6
two dots     2    1/6
three dots   3    1/6
four dots    4    1/6
five dots    5    1/6
six dots     6    1/6
Same experiment: tossing a fair coin twice
 S = {HH, TT, TH, HT}
P(HH) = P(TT) = P(TH) = P(HT) = 1/4

s          TT    HT, TH    HH
x           0       1       2
P(X=x)     1/4     1/2     1/4
X is the number of times heads turn up
-> P(X=0) = P(TT) = ¼
-> P(X=1) = P(HT U TH) = ¼ + ¼ = ½
-> P(X=2) = P(HH) = 1/4
Probability density functions
A discrete random variable X has a pdf, f(x), which is the
probability that X takes on the value x. Let X take on k
possible values x1, …, xk, with probabilities defined by
pj = P(X = xj), j = 1, 2, …, k
The probability density function (pdf) of X is:
f(xj) = pj, j = 1, 2, …, k,
= 0, for any x not equal to xj for some j,
with the following properties:
(i) f(x) ≥ 0
(ii) Σx f(x) = 1
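To make the definition concrete, here is a minimal Python sketch (assuming the two-coin-toss experiment above; names are illustrative) that builds f(x) by adding up the probabilities of the sample points mapped to each value x:

from itertools import product
from collections import Counter

sample_space = list(product("HT", repeat=2))   # {HH, HT, TH, TT}
p_outcome = 1 / len(sample_space)              # each sample point has probability 1/4

def X(s):
    return s.count("H")                        # X = number of heads

pdf = Counter()
for s in sample_space:
    pdf[X(s)] += p_outcome                     # f(x) = P(X = x)

print(dict(pdf))                               # {2: 0.25, 1: 0.5, 0: 0.25}
print(sum(pdf.values()))                       # property (ii): the f(x) sum to 1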
Relationship of X, P and f
(Diagram: X maps sample points in S to values x in R; P assigns probabilities between 0 and 1 to events such as {X = x}; the pdf f assigns to each value x the probability P(X = x).)
Continuous random variable
A variable X is a continuous random variable if it takes on any
particular real value with zero probability: P[X = a] = 0 for every number a
(note: X can take on so many possible values that we cannot
count them or match them up with positive integers. So logical
consistency dictates that X can take on each value with
probability zero)
Random variables that take on numerous values are best treated as
continuous. Examples: Gross national product (GNP), money
supply, interest rates, price of eggs, household income,
expenditure on clothing
For continuous rv
• We will use a pdf of a continuous rv only to compute events
involving a range of values
 If a and b are constants, where a < b, the probability that X
lies between the numbers a and b, P (a < X < b ) is the area
under the pdf between points a and b.
For continuous rv, it is easiest to work with the
cumulative distribution function (cdf)
 If X is any random variable, then its cdf is defined for any real number x by
F(x) ≡ P(X ≤ x)
 For a discrete rv, this is obtained by summing the pdf over all values xj such that xj ≤ x: F(x) = Σ_{xj ≤ x} f(xj)
 For a continuous rv, F(x) is the area under the pdf to the left of the point x, or
F(x) = P(X ≤ x) = ∫_(−∞)^x f(u) du
Continuous rv: the cumulative distribution is the area under the pdf at or below x.
The total area under the curve is 1.
P(X ≤ a) = area under the curve to the left of a = F(a)
P(X ≤ b) = area under the curve to the left of b = F(b)
P(a ≤ X ≤ b) = F(b) − F(a)
(Figure: pdf f(x) with points a and b marked on the X axis.)
Remarks
F(x) is simply a probability; it is always between 0 and 1.
 If x1 ≤ x2, then P(X ≤ x1) ≤ P(X ≤ x2), that is, F(x1) ≤ F(x2). The cdf is a non-decreasing function of x.
Important properties:
 For any number c, P(X > c) = 1 − F(c)
 For any numbers a < b, P(a < X ≤ b) = F(b) − F(a)
 For a continuous rv, it does not matter whether the inequalities are strict or not. That is, P(X ≥ c) = P(X > c) and
P(a < X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b)
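A small Python sketch of these cdf rules, assuming for concreteness that X is standard normal (the cdf is written with math.erf rather than read from a table):

import math

def F(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, b = -0.5, 1.0
print("P(X <= a)     =", F(a))          # about 0.3085
print("P(a < X <= b) =", F(b) - F(a))   # about 0.5328
print("P(X > b)      =", 1 - F(b))      # about 0.1587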
Joint distributions

Let X and Y be discrete random variables. Then (X,Y) have
a joint distribution, which is fully described by the joint
probability density function of (X,Y):
fX,Y(x, y) = P(X = x, Y = y), if x ∈ X(S) and y ∈ Y(S)
= 0, elsewhere
The joint pdf of X and Y (discrete RVs) satisfies:
(1) f x,y (x,y)  0
(2) ΣxΣy f(x,y)=1
Example

A population of N individuals consists of employed males (em),
employed females (ef), unemployed males (um) and
unemployed females (uf).

Experiment = draw a person at random

S={s1,s2,…,sN}

 Define the following random variables on S:
 X(s) = 1 if s is female, 0 if s is male
 Y(s)=1 if s is employed, 0 if s is unemployed
e.g. Event {X=0} has em + um elements.
Event {Y=1} has em + ef elements…. And so forth.
P(X=0) = (em+ um)/N
P(Y=1) = ?
Similarly, joint probabilities:
 P(X=0, Y=1) =em/N
 P(X=1, Y=1) =ef/N
 P(X=0, Y=0) =um/N
 P(X=1, Y=0) =uf/N
The joint pdf of X and Y is defined as:
fX,Y(0, 1) = em/N
fX,Y(0, 0) = um/N
fX,Y(1, 1) = ef/N
fX,Y(1, 0) = uf/N
fX,Y(x, y) = 0, elsewhere
Example (cont’d)

Note that
ΣxΣy fX,Y(x, y) = f(0,0) + f(0,1) + f(1,0) + f(1,1)
= um/N + em/N + uf/N + ef/N
= 1
Independence
Random variables X and Y are said to be independent iff
f X,Y (x,y) = fX (x) fY(y)
for all x, y, where fX (x) is the pdf of X and fY(y) is the pdf of
Y. This must hold for all pairs of x, y. If random variables
are not independent, they are said to be dependent
The pdfs fX and fY are called the marginal probability density
functions (to distinguish them from the joint pdf fX,Y).
If X and Y are independent, then knowing the outcome of X
does not change the probabilities of the possible outcomes
of Y, and vice versa.
Note
The marginal probability density functions, fX(x) and
fY(y), for discrete random variables, can be obtained
by summing over the f(x,y) with respect to the values
of Y to obtain fX(x) and with respect to the values of
X to obtain fY(y),
fX(x) = Σy f(x,y)
fY(y)= Σx f(x,y)
Example (see Danao 2.3.3 for continuous case)
Recall the earlier example. The
joint pdf of X and Y is:
f X,Y(0, 1) =em/N
f X,Y(0, 0) =um/N
f X,Y (1, 1) =ef/N
f X,Y(1, 0) =uf/N
f X,Y(x,y) = 0, elsewhere
The marginal pdf of X is:
fX(0)=(em/N) + (um/N)
fX(1)=(ef/N) + (uf/N)
fX(x) = 0, elsewhere
The marginal pdf of Y is:
fY(0)=(um/N)+(uf/N)
fY(1)=(em/N)+(ef/N)
fY(y) = 0, elsewhere
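A short Python sketch of this “summing out” step; for concreteness it plugs in the counts from the earlier survey table (em = 850, ef = 600, um = 300, uf = 250):

em, ef, um, uf = 850, 600, 300, 250
N = em + ef + um + uf

# Joint pdf f_{X,Y}(x, y): X = 1 if female, Y = 1 if employed.
joint = {(0, 1): em / N, (1, 1): ef / N,
         (0, 0): um / N, (1, 0): uf / N}

# Marginals: sum the joint pdf over the other variable.
fX = {x: sum(p for (xv, yv), p in joint.items() if xv == x) for x in (0, 1)}
fY = {y: sum(p for (xv, yv), p in joint.items() if yv == y) for y in (0, 1)}

print("fX:", fX)   # {0: 0.575, 1: 0.425}
print("fY:", fY)   # {0: 0.275, 1: 0.725}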
Conditional Distributions
We often want to know how one random variable, Y, relates to one or more other variables.
 Let X and Y be random variables with joint pdf f(x,y). The conditional pdf of Y given X = x is
fY|X(y|x) = fX,Y(x, y) / fX(x)
The interpretation is most easily seen in the discrete case:
fY|X(y|x) = P(Y = y | X = x),
“the probability that Y = y given that X = x”.
From the last example:
 X(s) = 1 if s is female, 0 if s is male
 Y(s) = 1 if s is employed, 0 if s is unemployed
Summarizing the joint pdf and marginal pdfs:

        y=1           y=0           fX
x=0     em/N          um/N          (em+um)/N
x=1     ef/N          uf/N          (ef+uf)/N
fY      (em+ef)/N     (um+uf)/N

What is the probability of drawing a female given that the individual is employed?
fX|Y(1|1) = fX,Y(1,1) / fY(1) = (ef/N) / [(em+ef)/N] = ef/(em+ef)
Conditional distributions and independence
Important feature of conditional distributions: if X and Y
are independent rv, then knowledge of the value taken
on by X tells us nothing about the probability that Y
takes on various values (and vice versa).
fY|X(y|x) = fY(y), and fX|Y(x|y) = fX(x)
In the last example, are X and Y independent? Check whether fX,Y(0,0) = fX(0) fY(0):
fX,Y(0,0) = um/N
fX(0) = (em + um)/N
fY(0) = (um + uf)/N
Example: Basketball player shooting 2 free
throws.
Let X, Bernoulli random variable that she makes first
free throw.
Y, Bernoulli random variable that she makes second
free throw.
Suppose she is an 80% free throw shooter. And X and
Y are independent. What is probability of player
making both free throws?
X ~ Bernoulli (.8), Y ~ Bernoulli (.8)
P(X=1, Y=1) =?
Now assume that the conditional density is:
fY|X(1|1) = .85, fY|X(0|1) = .15
fY|X(1|0) = .70, fY|X(0|0) = .30
Thus, the probability of making the 2nd free throw depends on
whether the 1st was made. If the 1st was made, then the chance of
making the 2nd is .85. If the 1st was missed, the chance of
making the 2nd is .70.
Are X and Y independent?
Compute P(X=1, Y=1), assuming P(X=1) = .8.
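One way to work this out, as a short Python sketch (the names are illustrative):

# Marginal for the first free throw and conditional pdf of the second given the first.
P_X1 = 0.8                                    # P(X = 1)
f_Y_given_X = {(1, 1): 0.85, (0, 1): 0.15,    # keys are (y, x)
               (1, 0): 0.70, (0, 0): 0.30}

# P(X = 1, Y = 1) = P(X = 1) * f_{Y|X}(1|1)
print("P(X=1, Y=1) =", P_X1 * f_Y_given_X[(1, 1)])   # 0.68

# If X and Y were independent Bernoulli(0.8), the answer would be 0.8 * 0.8 = 0.64,
# so with this conditional pdf X and Y are not independent.
print("under independence:", 0.8 * 0.8)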
Some features of probability
distributions
MEASURES OF CENTRAL TENDENCY
MEASURES OF VARIABILITY
MEASURES OF ASSOCIATION BETWEEN TWO
RANDOM VARIABLES
Measure of central tendency: Expected value
 Most familiar measure of central tendency.
 Employs all available data in the computation.
 Strongly influenced by extreme values.
 Expected value of X can be a number that is not a
possible outcome of X
Expected value
If X is a random variable, the expected value of X (or
expectation) of X, denoted by E(X) (and sometimes μX or
just μ) is a weighted average of all possible values of X,
where weights are determined by the pdf. Sometimes called
the population mean, when we want to emphasize that X
represents some variable in the population.
Simplest in the case that X is a discrete rv:
E(X) = x1 f(x1) + x2 f(x2) + … + xk f(xk) = Σ_{j=1}^{k} xj f(xj)
 If X is a continuous rv, then
E(X) = ∫_(−∞)^(∞) x f(x) dx
 Given a random variable X and a function g(•), we can create a
new random variable g(X). The expected value of g(X) is:
E[g(X)] = Σ_{j=1}^{k} g(xj) fX(xj)  (discrete case)
E[g(X)] = ∫_(−∞)^(∞) g(x) fX(x) dx  (continuous case)
 Note: [E(X)]² ≠ E(X²) in general. That is, for a nonlinear
function g(X), E[g(X)] ≠ g[E(X)].
 If X and Y are random variables taking on values (x1,
…, xk) and (y1, …, ym), then g(X,Y) is a random variable
for any function g, and its expectation is:
E[g(X,Y)] = Σ_{h=1}^{k} Σ_{j=1}^{m} g(xh, yj) fX,Y(xh, yj)
Properties of expected values
E.1 For any constant c, E(c ) = c
E.2 For any constants a and b,
E(aX + b) = a E(X)+b
E.3 If {a1, a2, …, an} are constants and {X1, X2, …, Xn}
are random variables, then
E(Σ_{i=1}^{n} ai Xi) = Σ_{i=1}^{n} ai E(Xi)
 As a special case, with each ai = 1,
E(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} E(Xi)
Median
 Value that divides an ordered data set (array) into
two equal parts.
 A value below which half of the data fall.
 Characteristics:
 A positional measure
 Not influenced by extreme values
 May not be an actual value in the data set.
Finding the median (Med(X) or Md)
 If X is discrete: arrange the data in an array (ordered
values). Let X(i) be the ith observation in the array, i = 1, 2,
…, n. If n is odd, the median is X((n+1)/2). If n is even, the
median is the mean of the two middle values in the array.
 If X continuous: median is the value such that ½ of the area
under the pdf is to the left of Md and ½ is to the right.
 If X has a symmetric distribution about the value μ, then μ
is both the expected value and the median.
Measures of variability: variance and standard
deviation
 Measures of dispersion indicate the extent to which
individual items in a series are scattered about an average
 Used as a measure of reliability of the average value.
 Measures of absolute dispersion (variance, standard
deviation) - used to describe the variability of a data set

Also: measure of relative dispersion - used to compare
two or more data sets with different means and different
units of measurement (e.g. coefficient of variation)
Variance and standard deviation
 For a random variable X, let μ = E(X). We need a number that
tells us how far X is from μ, on average. One such number is
the variance, which tells us the expected squared distance from X
to its mean:
Var(X) ≡ E[(X – μ)²], denoted by σx² or just σ²
Note that σ² = E(X²) − μ²
 The standard deviation of a random variable is simply the
positive square root of the variance:
sd(X) ≡ +√Var(X)
 The standard deviation is often referred to as a measure of
“volatility”.
Variance and standard deviation
 If there is a large amount of variation in the data set,
the data values will be far from the mean. In this
case, the standard deviation will be large.
 If there is only a small amount of variation in the data
set, the data values will be close to the mean.
Hence, the standard deviation will be small.
Characteristics of the standard deviation
 It is affected by the value of every observation
 It may be distorted by a few extreme values
 It is never negative.
Properties of the variance
VAR.1. Var (X)=0 iff there is a constant c, such that P(X=c) =
1, in which case E(X) = c. [Variance of a constant is zero]
SD.1 For any constant c, sd(c) = 0.
VAR.2. For any constants a and b,
Var(aX + b) = a² Var(X)
[Adding a constant to a rv does not change the variance;
multiplying by a constant changes the variance by a factor
equal to the square of the constant]
SD.2 sd(aX + b) = |a| sd(X).
If a > 0, then sd(aX) = a·sd(X)
Standardizing a random variable
Suppose that given a random variable X, we define a new
random variable by subtracting off its mean and dividing by
its standard deviation:
Z = (X − μ) / σ
which we can write as Z = aX + b, where a ≡ 1/σ and b ≡ −(μ/σ).
Then,
E(Z) = a E(X) + b = 0
and,
Var(Z) = a2 Var (X) = 1
Z has mean zero and variance of 1.
This procedure is known as standardizing the random
variable X and Z is called a standardized random variable.
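A small Python sketch of standardization, again using the fair-die pdf (μ = 3.5, σ² = 35/12) to check that Z has mean 0 and variance 1:

import math

pdf = {x: 1 / 6 for x in range(1, 7)}
mu = sum(x * p for x, p in pdf.items())
var = sum((x - mu) ** 2 * p for x, p in pdf.items())
sigma = math.sqrt(var)

# pdf of Z = (X - mu) / sigma
z_pdf = {(x - mu) / sigma: p for x, p in pdf.items()}

EZ = sum(z * p for z, p in z_pdf.items())
VarZ = sum((z - EZ) ** 2 * p for z, p in z_pdf.items())
print(round(EZ, 12), round(VarZ, 12))          # approximately 0 and 1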
Coefficient of variation
Commonly used measure of relative dispersion.
 The coefficient of variation utilizes two measures: the mean
and the standard deviation.
 It is expressed as a percentage, removing the unit of
measurement, thus, allowing comparison of two or more
data sets.
Unit-less. Used to compare the scatter of one distribution
with the scatter of another distribution.
Coefficient of variation
 The formula of the coefficient of variation is given as,
CV = σ/ μ * 100%
Measures of association; features of Joint and
Conditional distributions
Summary measures of how, on average, two random
variables vary with one another.
 The covariance between two random variables X and Y,
sometimes called the population covariance, is defined as the
expected value of the product (X – μX)(Y – μY):
Cov(X, Y) ≡ E[(X – μX)(Y – μY)], denoted as σXY
This measures the amount of linear dependence between two
random variables. If positive, the two random variables move in the
same direction (on average, when X is above its mean, Y is also).
If negative, they move in opposite directions (when X is above its
mean, Y is below). Interpreting the magnitude of a covariance
can be tricky.
Note (show!)
Cov (X,Y)
= E[(X – μX) (Y – μY)]
= E[(X– μX) Y]
= E[X(Y – μY)]
= E (XY) - μX μY
If E(X) or E(Y) = 0, then Cov (X,Y) = E(XY)
Some Properties
COV.1 If X and Y are independent, then Cov(X,Y)=0.
[follows from E(XY) = E(X)E(Y)].
Converse is not true.
COV.2. For any constants a1, b1, a2, b2:
Cov (a1X + b1, a2Y + b2) = a1a2 Cov(X,Y)
This implies covariance between 2 random variables can
be altered simply by multiplying one or both by a
constant. Depends on units of measurement.
COV.3 |Cov(X,Y)| ≤ sd(X) sd(Y)
Correlation Coefficient (ρXY)
Corr(X,Y) = Cov(X,Y)/[sd(X)·sd(Y)] = σXY/(σX σY)
Sometimes denoted ρXY. This scales the covariance into a
unitless number.
Again, a measure of the strength of the linear
relationship existing between X and Y.
Always same sign as Cov (X,Y). Magnitude is easier to
interpret than size of covariance because of CORR.1
Properties
CORR. 1
–1 ≤ Corr(X,Y) ≤ 1
A ρxy close to 1 or –1 indicates a strong linear relationship but
it does not necessarily imply that X causes Y or Y causes
X.
If ρxy = 1, perfect positive linear relationship, and we can
write Y = a + bX, for some constant a and constant b>0.
If ρxy = 0, then there is no linear relationship between X and
Y and they are said to be uncorrelated random variables.
A value of 0 however does not mean lack of association.
Properties
CORR.2
For any constants a1, b1, a2, b2:
If a1a2 > 0, then Corr(a1X + b1, a2Y + b2) = Corr(X,Y)
If a1a2 < 0, then Corr(a1X + b1, a2Y + b2) = −Corr(X,Y)
That is, the correlation is invariant to the units of
measurement of either X or Y
More properties of the variance
VAR.3 For constants a and b,
Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X,Y)
If X and Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y)
Var(X − Y) = ?
VAR.4 Extended to more than two random variables: If {X1,
X2, …, Xn} are pairwise uncorrelated random
variables and {ai, i = 1, …, n} are constants, then
Var(a1X1 + … + anXn) = a1²Var(X1) + … + an²Var(Xn)
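A short Python sketch that computes Cov(X,Y) and Corr(X,Y) and checks VAR.3, using the female/employed joint pdf from the earlier example (the shares 0.425, 0.300, 0.150, 0.125 are the survey counts divided by N):

import math

# f_{X,Y}(x, y): X = 1 if female, Y = 1 if employed (from the survey example).
joint = {(0, 1): 0.425, (1, 1): 0.300,
         (0, 0): 0.150, (1, 0): 0.125}

def E(g):
    """Expected value of g(X, Y) under the joint pdf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: (x - EX) ** 2)
VarY = E(lambda x, y: (y - EY) ** 2)
CovXY = E(lambda x, y: (x - EX) * (y - EY))
CorrXY = CovXY / math.sqrt(VarX * VarY)
print("Cov =", CovXY, " Corr =", round(CorrXY, 4))

# VAR.3 with a = 2, b = -1: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
a, b = 2, -1
lhs = E(lambda x, y: (a * x + b * y - (a * EX + b * EY)) ** 2)
rhs = a ** 2 * VarX + b ** 2 * VarY + 2 * a * b * CovXY
print(round(lhs, 12) == round(rhs, 12))        # True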
Conditional expectation or mean- when Y is
related to X in a nonlinear fashion
 Call Y the explained variable (say, hourly wage) and X the
explanatory variable (say, years of formal education).
 How does the distribution of wages change with education
level?
 Summarize relationship by looking at the conditional
expectation of Y given X, also called the conditional mean.
The expected value of Y given that we know an outcome of
X.
Conditional expectation: E(Y|X=x) or E(Y|x)
 When Y is a discrete rv taking values {y1, …, ym},
E(Y|x) = Σ_{j=1}^{m} yj fY|X(yj|x)
 Weighted average of possible values of Y, but now
weights reflect the fact that X has taken on a specific
value.
 E(Y|x) is a function of x, which tells us how the
expected value of Y varies with x.
example
 Let (X,Y) represent population of all working individuals,
where X is years of education and Y is hourly wage.
 Then E(Y|X=10) is average hourly wage for all people in the
population with 10 years of education (with HS education).
 Expected value of hourly wage can be found at each level of
education. In econometrics, we can specify simple
functions that capture this relationship, e.g. linear
function: E(WAGE|EDUC) = 15.65 + 3.50 EDUC
(note: conditional expectations can also be nonlinear
functions)
Useful properties of conditional mean
CE.1 E[c(X)|X] = c(X), for any function c(X)
CE.2 E[a(X)Y + b(X)|X] = a(X) E(Y|X) + b(X)
CE.3 If X and Y are independent, E(Y|X) = E(Y)
 If X and Y are independent, then the expected value of Y given X
does not depend on X (e.g. if wages were independent of
education, then average wages of HS and college grads would be
the same)
 If U and X are independent and E(U) = 0, then E(U|X) = 0.
CE.4 E[E(Y|X)] = E(Y)
 If we first get E(Y|X) as a function of X, and then take the expected
value of this, we end up with E(Y).
CE.5 If E(Y|X) = E(Y), then Cov(X,Y) = 0 (and so Corr(X,Y) = 0). In
fact, every function of X is uncorrelated with Y.
 If knowledge of X does not change the expected value of Y, then
X and Y must be uncorrelated (implying that if X and Y are
correlated, then E(Y|X) must depend on X).
 Converse is not true. If X and Y are uncorrelated, E(Y|X) could
still depend on X.
From CE.4 and CE.5: If U and X are random variables such that E(U|X) = 0,
then E(U) = 0 and U and X are uncorrelated.
Conditional variance
Given random variables X and Y, the variance of Y
conditional on X=x is the variance associated with the
conditional distribution of Y, given X=x:
Var(Y|X=x) = E{[Y − E(Y|x)]² | x}
 Var(Y|X=x) = E(Y²|x) – [E(Y|x)]² (why?)
CV.1 If X and Y are independent, then Var (Y|X) = Var (Y)
III. Special Probability
Distributions
Special Probability Distributions

The probability that a random variable takes values in an
interval can be computed from each of these distributions.

Examples of widely used distributions for continuous
variables:
 Normal distribution
 t distribution
 F distribution
 Chi-square distribution
The Normal Distribution

A random variable X is said to be normal if its pdf
is of the form
f(x) = [1/(σ√(2π))] exp[−(1/2)((x − μ)/σ)²],  x ∈ R
X has a normal distribution with expected value μ and variance σ².
Remarks on the Normal Distribution
 The normal distribution is completely characterized by two
parameters, μ (mean) and σ (standard deviation).
 The graph of the normal pdf is the familiar bell curve.
 Symmetric with respect to μ. μ is also the median of X.
 The maximum value of the pdf is attained at x = μ.
 The larger σ is, the flatter f(x) is at x = μ.
 To indicate that X is a normal random variable with mean μ
and variance σ²:
X ~ N(μ, σ²)
Standard Normal Distribution
If a random variable Z ~ N(0, 1), then we say it has a standard
normal distribution (SND).
The pdf of the standard normal is denoted φ(z); its cdf, Φ(z), is the area under
φ to the left of z.
We can use the standard normal cdf to compute the probability of an
event involving a standard normal rv:
P(Z > z) = 1 − Φ(z)
P(Z < −z) = P(Z > z)
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
P(|Z| > c) = P(Z > c) + P(Z < −c) = 2 P(Z > c) = 2[1 − Φ(c)]
Example
To get P(x1 < X < x2) for X ~ N(μ, σ²), we write
P(x1 < X < x2) = P(x1 < σZ + μ < x2)
= P((x1 − μ)/σ < Z < (x2 − μ)/σ)
Let X ~ N(5, 4). What is P(6 < X < 8)?
P(6 < X < 8) = P((6 − 5)/2 < Z < (8 − 5)/2)
= P(0.5 < Z < 1.5) = 0.9332 − 0.6915 = 0.2417
This is the area under the standard normal curve
between 0.5 and 1.5.
Example

Assume that monthly family incomes in urban Philippines
are normally distributed with μ = 16,000 and σ=2,000.
What is the probability that a family picked at random will
have an income between 15,000 and 18,000?
P(15,000 < X < 18,000) = ?
(Figure: normal curve centered at 16,000, with the area between 15,000 and 18,000 shaded.)
Example

Compute Z values
Z1=(15,000-16,000)/2,000 = -0.5
Z2=(18,000-16,000)/2,000=1

Find the area between Z1=-0.5 and Z2=1
Use Table of Z values to find:
P(-0.5<Z<1)
= .8413 - .3085
Or
= P(0<Z<1)+P(-0.5<Z<0)
= 0.3413 + 0.1915
= 0.5328 or 53.28 %
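The same answer can be obtained without a table by evaluating the standard normal cdf directly; a brief Python sketch (the helper name Phi is illustrative):

import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 16_000, 2_000
z1 = (15_000 - mu) / sigma   # -0.5
z2 = (18_000 - mu) / sigma   #  1.0
print("P(15,000 < X < 18,000) =", Phi(z2) - Phi(z1))   # about 0.5328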
NOR.2 Let X ~ N(μ, σ²) and let Y = a + bX. Then Y ~ N(a + bμ, b²σ²).
NOR.3 If X and Y are jointly normally distributed, then they
are independent iff Cov(X,Y) = 0.
NOR.4 Any linear combination of independent normal rvs has
a normal distribution. In particular, let X1, X2, …, Xn be
independent with Xi ~ N(μi, σi²). Then for real
numbers c1, c2, …, cn,
c1X1 + c2X2 + … + cnXn ~ N(Σ ci μi, Σ ci² σi²)
Corollary
Let X1, X2, ..., Xn be mutually independent normal
random variables, identically distributed as N(μ, σ²).
The distribution of the sample average X̄ is N(μ, σ²/n).
(That is, the average of independent normally
distributed random variables has a normal
distribution.)
Example
Consider Zi ~ N(0,1), i = 1, …, 25. What is P(Z̄ < 0.2)?
Z̄ = (1/n) Σ_{i=1}^{25} Zi ~ N(0, 1/25).
So Z = (Z̄ − 0)/√(1/25) = 5 Z̄ ~ N(0, 1).
Hence, P(Z̄ < 0.2) = P(Z/5 < 0.2) = P(Z < 1) = 0.8413.
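A quick simulation check of this result (this uses numpy, which is not part of the slides; the seed and number of replications are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
# 100,000 samples, each the mean of 25 independent N(0,1) draws.
zbar = rng.standard_normal((100_000, 25)).mean(axis=1)
print("simulated P(Zbar < 0.2):", (zbar < 0.2).mean())   # close to 0.8413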
Other features of normal distribution
 Zero skewness (3rd moment: degree of asymmetry, or departure from
symmetry, of a distribution):
E(Z³) = E[(X − μ)³]/σ³
 Kurtosis (4th moment) distinguishes between symmetric distributions; it equals 3 for a
normal distribution:
E(Z⁴) = E[(X − μ)⁴]/σ⁴
In Excel (and in some other textbooks), the measure of kurtosis is
measured as (α4 – 3), referred to as excess kurtosis. If > 0, the
distribution has fatter tails than the normal distribution, such as with
the t distribution. If <0, then it has thinner tails (rarer situation)
Positive vs. negative skewness
Positive (to the right)
 distribution tapers more to
the right than to the left
 longer tail to the right
 more concentration of values
below than above the mean
Negative (to the left)
 distribution tapers more to the
left than to the right
 longer tail to the left
 more concentration of values
above than below the mean
Note: rarely do we find data
characteristically skewed to
the left.
Chi-square
 Obtained directly from independent, standard normal
random variables.
 Let Zi, i = 1, …, n, be independent random variables, each
distributed as standard normal. Define a new random
variable as the sum of the squares of the Zi:
X = Σ_{i=1}^{n} Zi²
Then X has a chi-square distribution with n degrees of
freedom (df), denoted X ~ χ²n (or χ²(n)).
Notes: Chi-square
 non-negative (has positive values only for x>0)
 not symmetric about any point. For small values of df, it is
skewed to the right but as df increases, the curve
approaches the normal curve.
 The expected value of X is n, and its variance is 2n.
 Relation to the normal distribution: the square of a standard
normal random variable is ~ χ²(1)
t distribution
 Workhorse in classical stat and multiple regression analysis. Obtained
from standard normal and chi-square random variable.
 Let Z have SND and X have chi-square with n df. Assume Z and X are
independent. Then the random variable
T = Z/√(X/n)
has a t distribution with n df, denoted T ~ tn or t(n).
 Shape similar to the SND, except more spread out, so thicker tails.
 The expected value of a t-distributed random variable is 0, and its variance is
n/(n−2) for n > 2 (the variance does not exist for n ≤ 2 because the distribution is so
spread out).
 As df increases, variance approaches 1, so at large df, t distribution can
be approximated by the SND
F distribution
 Will be used for testing hypotheses in the context of multiple regression
analysis.
 Let X1 ~ χ²(k1) and X2 ~ χ²(k2), and assume they are independent.
Then the random variable
F = (X1/k1) / (X2/k2)
has an F distribution with (k1, k2) degrees of freedom, denoted F ~ F(k1, k2).
 Characterized by two parameters: k1, the numerator degrees of
freedom; k2 , the denominator degrees of freedom. Skewed to the right.
 The square of a t distribution is an F distribution: if X ~ t(n), then X² ~ F(1, n).
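A simulation sketch of this t/F relationship (numpy is assumed; the degrees of freedom and replication count are arbitrary): draw T ~ t(n) by its definition and F ~ F(1, n) by its definition, and compare quantiles of T² and F.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

# T ~ t(n): standard normal divided by sqrt(chi-square(n)/n), independent draws.
Z = rng.standard_normal(reps)
X = rng.chisquare(df=n, size=reps)
T = Z / np.sqrt(X / n)

# F ~ F(1, n): ratio of independent chi-squares, each divided by its df.
X1 = rng.chisquare(df=1, size=reps)
X2 = rng.chisquare(df=n, size=reps)
F = (X1 / 1) / (X2 / n)

# Quantiles of T^2 and F should be close if T^2 ~ F(1, n).
for q in (0.5, 0.9, 0.95):
    print(q, round(float(np.quantile(T ** 2, q)), 3), round(float(np.quantile(F, q)), 3))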