
Joint Distributions: Statistics 2A Lecture Notes

Chapter 3
Statistics 2A
STA02A2
CHAPTER 3: Joint Distributions
Lecture notes sections (with corresponding textbook sections):
1. Introduction (3.1)
2. Discrete random variables (3.2)
3. Continuous random variables (3.3)
4. Independent random variables (3.4)
5. Conditional distributions (3.5)
1. Introduction
In statistical analysis we are often interested in several random variables that are related to one another. For
example, in ecological studies the counts of different species could be related to the predator count in the habitat,
or in marketing, the sales revenue, marketing expenditure, and consumer behaviour and preference are all related
measurements. In such cases we not only look at the distribution of each variable individually (univariate analysis), but also at the joint distribution of two or more variables (bivariate/multivariate analysis).
The joint behaviour of two random variables X and Y, discrete or continuous, is determined by the joint cdf:
F(x, y) = P(X ≤ x ∩ Y ≤ y) = P(X ≤ x, Y ≤ y)
The cdf gives the probability that the point (X, Y) belongs to a semi-infinite rectangle in the plane (figure on the
left) and the probability that the point (X, Y) belongs to a given rectangle (figure on the right) is:
P ( x1  X  x2 , y1  Y  y2 ) = F ( x2 , y2 ) − F ( x2 , y1 ) − F ( x1 , y2 ) + F ( x1 , y1 )
[Figures: the semi-infinite rectangle {(u, v): u ≤ x, v ≤ y} (left) and the rectangle (x1, x2] × (y1, y2] (right)]

In general, if X1, X2, ..., Xn are jointly distributed random variables, their joint cdf is:

F(x1, x2, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn)
2. Discrete Random Variables
2.1 Joint probability mass function
Let X and Y be discrete random variables defined on the same sample space, where X can take on the values x1, x2, ... and Y can take on the values y1, y2, .... The joint probability mass function (joint pmf) is defined as:

p(xi, yj) = P(X = xi, Y = yj)
Consider two discrete random variables X and Y with any valid range −∞ < X < ∞ and −∞ < Y < ∞ with no further restrictions. The joint probability for a range of X and Y values is found through double summation:

P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = Σ(x = x1 to x2) Σ(y = y1 to y2) p(x, y) = Σ(y = y1 to y2) Σ(x = x1 to x2) p(x, y)
Example
Consider two discrete random variables X and Y with joint pmf given in the following table, and presented in the bar graph below:

         Y = 0    Y = 1    Y = 2
X = 0    1/6      1/4      1/8
X = 1    1/8      1/6      1/6
[Bar graph of the joint pmf p(x, y) for x = 0, 1 and y = 0, 1, 2]
Note that Σ(i = 0 to 1) Σ(j = 0 to 2) p(xi, yj) = 1/6 + 1/4 + 1/8 + 1/8 + 1/6 + 1/6 = (4 + 6 + 3 + 3 + 4 + 4)/24 = 24/24 = 1.
2.2 Marginal probability mass function
The joint pmf contains all information about the distributions of both X and Y. It is therefore possible to derive the marginal distribution of X (marginal pmf of X) and the marginal distribution of Y (marginal pmf of Y) from the joint pmf using the law of total probability, namely:
➢ Marginal pmf of X: pX(x) = Σ(for all j) p(x, yj)
➢ Marginal pmf of Y: pY(y) = Σ(for all i) p(xi, y)
Example
In the example above, the marginal pmf of X and the marginal pmf of Y, with both shown in one graph, are:
➢ Marginal pmf of X:
o For X = 0: pX(0) = Σ(for all j) p(0, yj) = 1/6 + 1/4 + 1/8 = (4 + 6 + 3)/24 = 13/24
o For X = 1: pX(1) = Σ(for all j) p(1, yj) = 1/8 + 1/6 + 1/6 = (3 + 4 + 4)/24 = 11/24
➢ Marginal pmf of Y:
o For Y = 0: pY(0) = Σ(for all i) p(xi, 0) = 1/6 + 1/8 = (4 + 3)/24 = 7/24
o For Y = 1: pY(1) = Σ(for all i) p(xi, 1) = 1/4 + 1/6 = (6 + 4)/24 = 10/24
o For Y = 2: pY(2) = Σ(for all i) p(xi, 2) = 1/8 + 1/6 = (3 + 4)/24 = 7/24
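The marginal computations above can be reproduced mechanically. A short Python sketch (exact arithmetic via the standard `fractions` module; not part of the notes) sums the joint pmf over the other variable:

```python
from fractions import Fraction as Fr

# Joint pmf from the table, keyed by (x, y).
joint = {
    (0, 0): Fr(1, 6), (0, 1): Fr(1, 4), (0, 2): Fr(1, 8),
    (1, 0): Fr(1, 8), (1, 1): Fr(1, 6), (1, 2): Fr(1, 6),
}

# Marginals: sum out the other variable (law of total probability).
p_X = {x: sum(p for (xi, y), p in joint.items() if xi == x) for x in (0, 1)}
p_Y = {y: sum(p for (x, yj), p in joint.items() if yj == y) for y in (0, 1, 2)}

print(p_X[0], p_X[1])          # 13/24 11/24
print(p_Y[0], p_Y[1], p_Y[2])  # 7/24 5/12 7/24  (5/12 = 10/24)
```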
[Bar graph of the marginal pmfs: pX ≈ (0.54, 0.46) for x = 0, 1 and pY ≈ (0.29, 0.42, 0.29) for y = 0, 1, 2]
2.3 Multivariate joint and marginal probability mass functions
In the case of several discrete random variables X1, X2, ..., Xn, the joint and marginal functions are:
➢ Joint pmf of X1, X2, ..., Xn: p(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn)
➢ Marginal pmf of, say, X1: pX1(x1) = Σ(x2) Σ(x3) ... Σ(xn) p(x1, x2, ..., xn)
➢ Two-dimensional marginal pmf of, say, X1 and X2: pX1X2(x1, x2) = Σ(x3) Σ(x4) ... Σ(xn) p(x1, x2, ..., xn)
2.4 Calculating probabilities
We can use the joint (and marginal) mass functions to calculate joint (and marginal) probabilities. The first step
is to identify the valid values of X and Y for the required probability.
Example
Calculate P(X = 1, Y ≠ 0) from the joint pmf:

         Y = 0    Y = 1    Y = 2
X = 0    1/6      1/4      1/8
X = 1    1/8      1/6      1/6

P(X = 1, Y ≠ 0) = P(X = 1, Y = 1) + P(X = 1, Y = 2) = 1/6 + 1/6 = 2/6 = 1/3
3. Continuous Random Variables
3.1 Joint probability density function
Let X and Y be continuous random variables. The joint pdf f(x, y) is a piecewise continuous function of the variables X and Y that is nonnegative and satisfies:

∫(−∞ to ∞) ∫(−∞ to ∞) f(x, y) dx dy = ∫(−∞ to ∞) ∫(−∞ to ∞) f(x, y) dy dx = 1

Consider two continuous random variables X and Y with any valid range −∞ < X < ∞ and −∞ < Y < ∞ with no further restrictions. The joint probability for a range of X and Y values is found through double integration:

P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = ∫(x1 to x2) ∫(y1 to y2) f(x, y) dy dx = ∫(y1 to y2) ∫(x1 to x2) f(x, y) dx dy
Example
Show that the following function f(x, y) is a valid joint pdf of random variables X and Y.

f(x, y) = (12/7)(x² + xy) for 0 ≤ x, y ≤ 1

∫(−∞ to ∞) ∫(−∞ to ∞) f(x, y) dy dx = ∫(0 to 1) ∫(0 to 1) (12/7)(x² + xy) dy dx
= (12/7) ∫(0 to 1) [x²y + (1/2)xy²] (from y = 0 to y = 1) dx
= (12/7) ∫(0 to 1) (x² + (1/2)x) dx
= (12/7) [(1/3)x³ + (1/4)x²] (from x = 0 to x = 1)
= (12/7)(1/3 + 1/4) = (12/7)(7/12) = 1

Since f(x, y) is nonnegative and integrates to 1 over both variables, it is a valid pdf.

[Surface plot of f(x, y)]
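A numerical cross-check (midpoint Riemann sum; a sketch, not part of the notes) confirms the integral:

```python
# Midpoint Riemann sum of f(x, y) = (12/7)(x^2 + xy) over the unit square.

def f(x, y):
    return 12 / 7 * (x**2 + x * y)

n = 400
h = 1.0 / n
total = sum(
    f((i + 0.5) * h, (j + 0.5) * h) * h * h
    for i in range(n) for j in range(n)
)
print(round(total, 4))  # 1.0
```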
3.2 Joint cumulative distribution function
The joint cdf of X and Y is defined as:

F(x, y) = P(X ≤ x, Y ≤ y) = ∫(−∞ to x) ∫(−∞ to y) f(u, v) dv du = ∫(−∞ to y) ∫(−∞ to x) f(u, v) du dv

From the fundamental theorem of multivariable calculus it follows that f(x, y) = ∂²F(x, y)/∂x∂y.
Example
1) Derive F(x, y) from f(x, y) = (12/7)(x² + xy) for 0 ≤ x, y ≤ 1.

F(x, y) = ∫(0 to x) ∫(0 to y) (12/7)(u² + uv) dv du
= (12/7) ∫(0 to x) [u²v + (1/2)uv²] (from v = 0 to v = y) du
= (12/7) ∫(0 to x) (u²y + (1/2)uy²) du
= (12/7) [(1/3)u³y + (1/4)u²y²] (from u = 0 to u = x)
= (12/7)((1/3)x³y + (1/4)x²y²) for 0 ≤ x, y ≤ 1

2) Use the answer in (1) to derive f(x, y).

f(x, y) = ∂²F(x, y)/∂y∂x = ∂/∂y [(12/7)(x²y + (1/2)xy²)] = (12/7)(x² + xy) for 0 ≤ x, y ≤ 1
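The closed-form cdf can be sanity-checked numerically; a midpoint-rule integration of the pdf over [0, x] × [0, y] (a sketch, not part of the notes) should reproduce it at any point:

```python
# Compare the closed-form cdf F(x, y) = (12/7)((1/3)x^3 y + (1/4)x^2 y^2)
# against a direct midpoint-rule integration of the pdf.

def f(u, v):
    return 12 / 7 * (u**2 + u * v)

def F(x, y):
    return 12 / 7 * (x**3 * y / 3 + x**2 * y**2 / 4)

x, y, n = 0.6, 0.8, 500
hu, hv = x / n, y / n
numeric = sum(
    f((i + 0.5) * hu, (j + 0.5) * hv) * hu * hv
    for i in range(n) for j in range(n)
)
print(abs(numeric - F(x, y)) < 1e-5)  # True
```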
3.3 Marginal cumulative distribution and probability density functions
The marginal cdf of a random variable X, FX(x), is:

FX(x) = P(X ≤ x) = ∫(−∞ to x) ∫(−∞ to ∞) f(u, y) dy du

The marginal pdf of a random variable X, fX(x), is:

fX(x) = (d/dx) FX(x) = ∫(−∞ to ∞) f(x, y) dy
Example
Derive fX(x) and fY(y) from f(x, y) = (12/7)(x² + xy) for 0 ≤ x, y ≤ 1.

1) fX(x) = ∫(0 to 1) (12/7)(x² + xy) dy = (12/7) [x²y + (1/2)xy²] (from y = 0 to y = 1) = (12/7)(x² + (1/2)x) for 0 ≤ x ≤ 1

2) fY(y) = ∫(0 to 1) (12/7)(x² + xy) dx = (12/7) [(1/3)x³ + (1/2)x²y] (from x = 0 to x = 1) = (12/7)(1/3 + (1/2)y) for 0 ≤ y ≤ 1
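Each marginal density must itself integrate to 1 over (0, 1) (this also pins down fX(x) = (12/7)(x² + x/2), since ∫x² dy = x²y). A quick midpoint-rule sketch, not part of the notes:

```python
# f_X(x) = (12/7)(x^2 + x/2) and f_Y(y) = (12/7)(1/3 + y/2) on (0, 1).

def f_X(x):
    return 12 / 7 * (x**2 + x / 2)

def f_Y(y):
    return 12 / 7 * (1 / 3 + y / 2)

n = 1000
h = 1.0 / n
int_X = sum(f_X((i + 0.5) * h) * h for i in range(n))
int_Y = sum(f_Y((i + 0.5) * h) * h for i in range(n))
print(round(int_X, 4), round(int_Y, 4))  # 1.0 1.0
```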
3.4 Multivariate joint and marginal probability density functions
In the case of several continuous random variables X1, X2, ..., Xn, the marginal functions are:
➢ Marginal pdf of, say, X1: fX1(x1) = ∫(−∞ to ∞) ∫(−∞ to ∞) ... ∫(−∞ to ∞) f(x1, x2, ..., xn) dx2 dx3 ... dxn
➢ Two-dimensional marginal pdf of, say, X1 and X2: fX1X2(x1, x2) = ∫(−∞ to ∞) ... ∫(−∞ to ∞) f(x1, x2, ..., xn) dx3 dx4 ... dxn
3.5 Calculating probabilities
To calculate probabilities from a joint pdf of two variables, i.e., a bivariate density, we use the double integral. It
is important that we incorporate any restrictions on the valid values of the two variables in the double integral.
To do this, we must determine the range of the sample space, and the range of the required joint probability. The
first step is to draw the valid region for the sample space. The second step is to add the valid region for the event
space of the required probability. This determines the successive integral ranges for X and Y, and also the order
in which we will use the double integral.
Exercise 1
Let X and Y be continuous random variables with joint pdf f(x, y), where 0 ≤ x ≤ y ≤ 1.
1) Draw the valid range of the sample space.
2) Draw the valid range of the event space to calculate P(X ≤ 0.5).
We can find P(X ≤ 0.5) by either (1) finding the marginal distribution of X (i.e., integrating out over y) and then calculating the probability, or (2) using the double integral in a single step. For option (2) we need to decide whether we first integrate over y and then over x, or over x and then over y. To determine this, we must check how the two variables vary together.
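Exercise 1 leaves f(x, y) unspecified; purely as an illustration, assume the uniform density f(x, y) = 2 on the triangle 0 ≤ x ≤ y ≤ 1 (an assumption, not part of the exercise). Under it, P(X ≤ 0.5) = ∫(0 to 0.5) ∫(x to 1) 2 dy dx = 0.75, which a Monte Carlo sketch reproduces:

```python
import random

# Hypothetical density for illustration: f(x, y) = 2 on 0 <= x <= y <= 1
# (uniform on the triangle). Estimate P(X <= 0.5) by rejection sampling.
random.seed(1)
n = 200_000
inside = hits = 0
while inside < n:
    x, y = random.random(), random.random()
    if x <= y:            # keep only points in the triangle
        inside += 1
        if x <= 0.5:
            hits += 1
print(hits / n)  # close to the exact value 0.75
```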
3) Draw the valid range of the event space to calculate P(Y ≤ 0.5).
We must check how the two variables vary together.
4) Draw the valid range of the event space to calculate P(X ≤ 0.5Y).
Since we now have both variables in the event space, this adds an additional boundary line to the graph.
3.6 Copulas
In all the preceding joint distribution examples, the joint pmf or pdf was given. We can ask how that joint distribution was created. If two variables X and Y are, say, univariate normal variables, does it necessarily follow that they are jointly bivariate normal? The answer is no, as the underlying dependence structure has an impact on the joint behaviour of the random variables. The converse, however, is true: if X and Y follow a bivariate normal distribution, then both variables are univariate normally distributed.
If we want to model the relationship between two variables, an easy way to do so is through linear regression models. However, this requires the assumption of linearity, which does not always hold. If a bivariate relationship between X and Y is such that they are strongly correlated for small values of X and Y, and weakly correlated for large values of X and Y, the dependence is not constant.
One approach to model this type of dependence and find a joint probability distribution is through the use of
copulas, which can estimate more complex relationships. Copulas are based on Sklar’s theorem, developed in
1959. The theorem states that any multivariate joint distribution can be written in terms of the univariate marginal
distribution functions and a copula, which describes the dependence structure between the variables. To estimate
a copula, we need to determine which copula to use and then find the parameter which best fits the data. Common
copulas are the Gaussian, Clayton, Farlie-Morgenstern, Gumbel and Frank copulas.
Formally defined, a copula is a joint cdf of random variables that have uniform marginal distributions. A copula C(u, v) is nondecreasing in each variable because it is a cdf. The copula density is defined as:

c(u, v) = ∂²C(u, v)/∂u∂v ≥ 0
If X and Y are continuous random variables with cdf's FX(x) and FY(y), then U = FX(X) and V = FY(Y) are uniform random variables (by the probability integral transform). For a copula C(u, v), the joint cdf FXY(x, y) and joint pdf fXY(x, y) are:
➢ FXY(x, y) = C(FX(x), FY(y))
➢ fXY(x, y) = c(FX(x), FY(y)) fX(x) fY(y)
Marginal distributions alone do not determine the joint distribution. A bivariate joint distribution can be
constructed from two marginal distributions and any copula, which captures the dependence between X and Y.
Farlie-Morgenstern family
If FX(x) and GY(y) are univariate cdf's, then the function H(x, y) is a bivariate cdf for any |α| ≤ 1:

H(x, y) = F(x)G(y)[1 + α(1 − F(x))(1 − G(y))]

Because lim(x → ∞) FX(x) = lim(y → ∞) GY(y) = 1, the marginal distributions of H(x, y) are FX(x) and GY(y). We can construct an infinite number of different bivariate distributions using H(x, y).
For example, use H(x, y) to construct a bivariate distribution with uniform(0, 1) marginals, where α = −1:
➢ FX(x) = x, 0 ≤ x ≤ 1 and FY(y) = y, 0 ≤ y ≤ 1
➢ H(x, y) = xy[1 − (1 − x)(1 − y)] = x²y + xy² − x²y²
➢ h(x, y) = ∂²/∂x∂y (x²y + xy² − x²y²) = ∂/∂x (x² + 2xy − 2x²y) = 2x + 2y − 4xy
For example, use H(x, y) to construct a bivariate distribution with uniform(0, 1) marginals, where α = +1:
➢ FX(x) = x, 0 ≤ x ≤ 1 and FY(y) = y, 0 ≤ y ≤ 1
➢ H(x, y) = xy[1 + (1 − x)(1 − y)] = 2xy − x²y − xy² + x²y²
➢ h(x, y) = ∂²/∂x∂y (2xy − x²y − xy² + x²y²) = ∂/∂x (2x − x² − 2xy + 2x²y) = 2 − 2x − 2y + 4xy
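These densities are easy to check numerically. Note that 2x + 2y − 4xy can be rewritten as 1 − (1 − 2x)(1 − 2y), i.e. the FGM copula density c(u, v) = 1 + α(1 − 2u)(1 − 2v) with α = −1. A sketch (not part of the notes) verifying nonnegativity, total mass 1, and uniform marginals:

```python
# FGM copula density c(u, v) = 1 + alpha*(1 - 2u)(1 - 2v); alpha = -1 gives
# h(x, y) = 2x + 2y - 4xy from the example above.

def h(x, y, alpha=-1):
    return 1 + alpha * (1 - 2 * x) * (1 - 2 * y)

n = 200
grid = [(i + 0.5) / n for i in range(n)]              # midpoints of [0, 1]

assert all(h(x, y) >= 0 for x in grid for y in grid)  # valid density
total = sum(h(x, y) / n**2 for x in grid for y in grid)
f_X_quarter = sum(h(0.25, y) / n for y in grid)       # marginal density at x = 0.25
print(round(total, 6), round(f_X_quarter, 6))  # both approximately 1
```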
Exercise 2
1) Consider the following function of a random variable X, and find the value of k that will ensure that p(x) is a valid pmf.

p(x) = k(x³ + 1) for x = 0, 1, 2, 3; 0 otherwise
2) Consider the following joint density function f(x, y). Draw the graph, with coordinates, of the valid region of X and Y, as well as the event space needed to calculate P(X ≤ Y).

f(x, y) = (x + y)/2 for x ≥ 0, y ≥ 0, and 3x + y ≤ 3; 0 otherwise
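Before tackling the probability, it is worth confirming that this f(x, y) really is a density; a midpoint-rule sketch (not part of the notes) over the bounding box [0, 1] × [0, 3]:

```python
# f(x, y) = (x + y)/2 on the triangle x >= 0, y >= 0, 3x + y <= 3; 0 elsewhere.

def f(x, y):
    return (x + y) / 2 if x >= 0 and y >= 0 and 3 * x + y <= 3 else 0.0

n = 600
total = sum(
    f((i + 0.5) / n, 3 * (j + 0.5) / n) * (1 / n) * (3 / n)
    for i in range(n) for j in range(n)
)
print(round(total, 2))  # approximately 1.0
```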
3) Calculate P(X ≤ Y).
4. Independent Random Variables
In bivariate analysis, we not only look at the joint distribution of two random variables, but we must also evaluate
the dependence between the two variables. The concept of independent random variables is similar to the concept
of statistical independence in probability analysis, which states that two events A and B are statistically
independent if P(A ∩ B) = P(A)P(B). In other words, knowing the outcome of one event does not change or
influence the outcome of the other event. The same concept applies to random variables.
In general, if two random variables X and Y are independent, then:
P ( X  A, Y  B ) = P ( X  A) P (Y  B ) for all sets A and B
This means that n random variables X1, X2, ..., Xn are independent if their joint pmf/pdf factors into the product of their marginal pmf/pdf's, and their joint cdf factors into the product of their marginal cdf's.
➢ For Xi independent discrete random variables, it follows that:
o p(x1, x2, ..., xn) = pX1(x1) pX2(x2) ... pXn(xn)
o F(x1, x2, ..., xn) = FX1(x1) FX2(x2) ... FXn(xn)
➢ For Xi independent continuous random variables, it follows that:
o f(x1, x2, ..., xn) = fX1(x1) fX2(x2) ... fXn(xn)
o F(x1, x2, ..., xn) = FX1(x1) FX2(x2) ... FXn(xn)
Consider the case of two jointly continuous random variables X and Y. If they are independent, then:
F(x, y) = FX(x) FY(y)
Proof/Derivation:
F(x, y) = ∫(−∞ to x) ∫(−∞ to y) fXY(u, v) dv du
= ∫(−∞ to x) ∫(−∞ to y) fX(u) fY(v) dv du
= [∫(−∞ to x) fX(u) du] [∫(−∞ to y) fY(v) dv]
= FX(x) FY(y)
Consider the case of two independent random variables X and Y. Then any functions Z = g(X) and W = h(Y) are also independent.
Proof/Derivation:
Let A(z) be the set of x such that g(x) ≤ z, and let B(w) be the set of y such that h(y) ≤ w. Then:
P(Z ≤ z, W ≤ w) = P(X ∈ A(z), Y ∈ B(w))
= P(X ∈ A(z)) P(Y ∈ B(w))
= P(Z ≤ z) P(W ≤ w)
In conclusion:
➢ If we know two variables are independent, we find the joint distribution as the product of the marginal
distributions.
➢ If we want to test if two variables are independent, we check if the joint distribution is equal to the product
of the marginal distributions.
Example
Consider two discrete random variables X and Y with joint pmf:
         Y = 0    Y = 1    Y = 2
X = 0    1/6      1/4      1/8
X = 1    1/8      1/6      1/6
Use the marginal pmf's of X and Y to check if X and Y are independent.

p(x) = 13/24 for x = 0; 11/24 for x = 1; 0 otherwise
p(y) = 7/24 for y = 0; 10/24 for y = 1; 7/24 for y = 2; 0 otherwise
To check independence, we need to see if P(X = xi, Y = yj) = P(X = xi)P(Y = yj) for all possible values of X and Y. For example, consider pXY(1, 0):

P(X = 1, Y = 0) = 1/8 = 0.125 ≠ P(X = 1)P(Y = 0) = (11/24)(7/24) ≈ 0.134

Since this joint probability does not equal the product of the marginals, X and Y are not independent.
5. Conditional Distributions
A conditional distribution is the distribution of values of one random variable when we fix the value of another random variable. For example, suppose we have observed the value of a random variable Y and we need to define the distribution of another random variable X, given what we have observed for Y. We then use the conditional distribution of X given Y.
Consider two random variables X and Y.
➢ If X and Y are jointly distributed discrete random variables, the conditional probability that X = x (some value x) given that Y = y (some value y), where pY(y) > 0, is:

pX|Y(x | y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = pXY(x, y)/pY(y)

➢ If X and Y are jointly distributed continuous random variables, the conditional density of X given Y, where fY(y) > 0, is:

fX|Y(x | y) = fXY(x, y)/fY(y)
The joint pmf/pdf can be expressed in terms of the marginal and conditional pmf/pdf's, which leads to an extremely useful application of the law of total probability.
➢ X and Y discrete:
pXY(x, y) = pX|Y(x | y) pY(y)  ⇒  pX(x) = Σ(y) pXY(x, y) = Σ(y) pX|Y(x | y) pY(y)
pXY(x, y) = pY|X(y | x) pX(x)  ⇒  pY(y) = Σ(x) pXY(x, y) = Σ(x) pY|X(y | x) pX(x)
➢ X and Y continuous:
fXY(x, y) = fX|Y(x | y) fY(y)  ⇒  fX(x) = ∫(−∞ to ∞) fXY(x, y) dy = ∫(−∞ to ∞) fX|Y(x | y) fY(y) dy
fXY(x, y) = fY|X(y | x) fX(x)  ⇒  fY(y) = ∫(−∞ to ∞) fXY(x, y) dx = ∫(−∞ to ∞) fY|X(y | x) fX(x) dx
Example
Consider two discrete random variables X and Y with joint pmf:
         Y = 0    Y = 1    Y = 2
X = 0    1/6      1/4      1/8
X = 1    1/8      1/6      1/6
Given the marginal pmf p(x), calculate pY|X(2 | 0) = P(Y = 2 | X = 0).

p(x) = 13/24 for x = 0; 11/24 for x = 1; 0 otherwise

pY|X(y | x) = pXY(x, y)/pX(x)

⇒ pY|X(2 | 0) = pXY(0, 2)/pX(0) = (1/8)/(13/24) = 3/13
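The same computation as a short sketch (not part of the notes), which also confirms that the conditional pmf sums to 1 over y:

```python
from fractions import Fraction as Fr

# Conditional pmf of Y given X = 0, computed cell-by-cell as p(x, y) / p_X(x).
joint = {
    (0, 0): Fr(1, 6), (0, 1): Fr(1, 4), (0, 2): Fr(1, 8),
    (1, 0): Fr(1, 8), (1, 1): Fr(1, 6), (1, 2): Fr(1, 6),
}
p_X0 = sum(p for (x, _), p in joint.items() if x == 0)   # 13/24

cond = {y: joint[0, y] / p_X0 for y in (0, 1, 2)}
print(cond[2], sum(cond.values()))  # 3/13 1
```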
Exercise 3
Consider two continuous random variables X and Y with joint pdf f ( x, y ) .
0.5
f ( x, y ) = 
0
for 0  x  y  2
otherwise
1) Identify the valid range, with coordinates, of the sample space.
2) Find the marginal distribution, with valid range, of X.
3) Find the marginal distribution, with valid range, of Y.
4) Determine whether X and Y are independent.
5) Find the conditional distribution, with valid range, of fY|X(y | x).
6) Calculate fY|X(y | 1.5) and specify the valid range.
7) Calculate P(Y ≤ 1.7 | X = 1.5).