STAT355 - Probability & Statistics
Chapter 5: Joint Probability Distributions and Random Samples
Fall 2011
Chap 5 - Joint Probability Distributions and Random Samples
1. 5.1 Jointly Distributed Random Variables
2. 5.2 Expected Values, Covariance, and Correlation
3. 5.3 Statistics and Their Distributions
4. 5.4 The Distribution of the Sample Mean
5. 5.5 The Distribution of a Linear Combination
Two Discrete Random Variables
- The probability mass function (pmf) of a single discrete rv X specifies how much probability mass is placed on each possible X value.
- The joint pmf of two discrete rvs X and Y describes how much probability mass is placed on each possible pair of values (x, y).
Two Discrete Random Variables
Definition
- Let X and Y be two discrete rvs defined on the sample space S of an experiment. The joint probability mass function p(x, y) is defined for each pair of numbers (x, y) by

    p(x, y) = P(X = x and Y = y)

- The marginal probability mass function of X, denoted by pX(x), is given by

    pX(x) = Σ_y p(x, y)   for each possible value x

- Similarly, the marginal probability mass function of Y is

    pY(y) = Σ_x p(x, y)   for each possible value y.
Two Discrete Random Variables - Remarks
- It must be the case that p(x, y) ≥ 0 and Σ_x Σ_y p(x, y) = 1.
- Let A be any set consisting of pairs of (x, y) values (e.g., A = {(x, y) : x + y = 5} or {(x, y) : max(x, y) ≤ 3}). Then the probability P[(X, Y) ∈ A] is obtained by summing the joint pmf over pairs in A:

    P[(X, Y) ∈ A] = Σ_{(x, y) ∈ A} p(x, y)
Two Discrete Random Variables - Examples
A large insurance agency services a number of customers who have
purchased both a homeowner’s policy and an automobile policy from the
agency. For each type of policy, a deductible amount must be specified.
For an automobile policy, the choices are $100 and $250, whereas for a
homeowner’s policy, the choices are 0, $100, and $200.
Suppose an individual with both types of policy is selected at random from
the agency’s files. Let
X = the deductible amount on the auto policy, and
Y = the deductible amount on the homeowner’s policy.
Possible (X , Y ) pairs are then
(100, 0), (100, 100), (100, 200), (250, 0), (250, 100), and (250, 200); the
joint pmf specifies the probability associated with each one of these pairs,
with any other pair having probability zero.
Two Discrete Random Variables - Examples
Suppose the joint pmf is given in the accompanying joint probability table:

                       y
    p(x, y)        0      100    200
    x     100    0.20   0.10   0.20
          250    0.05   0.15   0.30

Then

    p(100, 100) = P(X = 100 and Y = 100)
                = P($100 deductible on both policies)
                = .10.

The probability P(Y ≥ 100) is computed by summing probabilities of all (x, y) pairs for which y ≥ 100:

    P(Y ≥ 100) = p(100, 100) + p(250, 100) + p(100, 200) + p(250, 200) = 0.75
Two Discrete Random Variables - Examples
The possible X values are x = 100 and x = 250, so computing row totals in the joint probability table yields

    pX(100) = p(100, 0) + p(100, 100) + p(100, 200) = .50

and

    pX(250) = p(250, 0) + p(250, 100) + p(250, 200) = .50

The marginal pmf of X is then

    pX(x) = 0.5   if x = 100, 250
            0     otherwise.

And the marginal pmf of Y is

    pY(y) = 0.25   if y = 0, 100
            0.50   if y = 200
            0      otherwise.
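These table computations are easy to check mechanically. A minimal Python sketch, with the joint pmf stored as a dict keyed by (x, y) exactly as in the table above:

```python
# Joint pmf from the insurance table, keyed by (x, y).
pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

# Marginal pmfs: sum the joint pmf over the other variable.
p_X, p_Y = {}, {}
for (x, y), p in pmf.items():
    p_X[x] = p_X.get(x, 0.0) + p
    p_Y[y] = p_Y.get(y, 0.0) + p

# P(Y >= 100): sum p(x, y) over all pairs with y >= 100.
prob = sum(p for (x, y), p in pmf.items() if y >= 100)

print({x: round(p, 2) for x, p in p_X.items()})  # {100: 0.5, 250: 0.5}
print({y: round(p, 2) for y, p in p_Y.items()})  # {0: 0.25, 100: 0.25, 200: 0.5}
print(round(prob, 2))                            # 0.75
```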
Two Continuous Random Variables
- The probability that the observed value of a continuous rv X lies in a one-dimensional set A (such as an interval) is obtained by integrating the pdf f(x) over the set A.
- Similarly, the probability that the pair (X, Y) of continuous rv's falls in a two-dimensional set A (such as a rectangle) is obtained by integrating a function called the joint density function.
Two Continuous Random Variables
Definition
Let X and Y be continuous rv's. A joint probability density function f(x, y) for these two variables is a function satisfying f(x, y) ≥ 0 and

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

Then for any two-dimensional set A

    P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy

In particular, if A is the two-dimensional rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, then

    P[(X, Y) ∈ A] = P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx
Two Continuous Random Variables
Definition
The marginal probability density functions of X and Y, denoted by fX(x) and fY(y), respectively, are given by

    fX(x) = ∫_{−∞}^{∞} f(x, y) dy   for −∞ < x < ∞        (1)

    fY(y) = ∫_{−∞}^{∞} f(x, y) dx   for −∞ < y < ∞        (2)
Two Continuous Random Variables - Examples
A bank operates both a drive-up facility and a walk-up window. On a
randomly selected day, let
X = the proportion of time that the drive-up facility is in use
(at least one customer is being served or waiting to be served) and
Y = the proportion of time that the walk-up window is in use.
Then the set of possible values for (X , Y ) is the rectangle
D = {(x, y ) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}.
Suppose the joint pdf of (X, Y) is given by

    f(x, y) = (6/5)(x + y²)   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
              0               otherwise.
Two Continuous Random Variables - Examples
Suppose the joint pdf of (X, Y) is given by

    f(x, y) = (6/5)(x + y²)   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
              0               otherwise.

Verify that this is a legitimate pdf:
1. f(x, y) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
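Condition 1 holds because x and y² are nonnegative on the unit square. Condition 2 can be checked numerically; as a sketch, a midpoint Riemann sum over the unit square (the analytic value is (6/5)(1/2 + 1/3) = 1):

```python
# Midpoint Riemann sum of f(x, y) = (6/5)(x + y^2) over [0,1] x [0,1].
def f(x, y):
    return 1.2 * (x + y * y)

n = 400          # grid resolution per axis
h = 1.0 / n
total = sum(
    f((i + 0.5) * h, (j + 0.5) * h) * h * h
    for i in range(n)
    for j in range(n)
)
print(round(total, 4))  # 1.0: the density integrates to 1
```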
Independent Random Variables
Definition
Two random variables X and Y are said to be independent if for every pair of x and y values

    p(x, y) = pX(x) pY(y)   when X and Y are discrete
    f(x, y) = fX(x) fY(y)   when X and Y are continuous        (3)

If (3) is not satisfied for all (x, y), then X and Y are said to be dependent.
Independent Random Variables - Examples
In the insurance situation,

    p(100, 100) = .10 ≠ (.5)(.25) = pX(100) pY(100)

so X and Y are not independent.
Independence of two random variables is most useful when the description of the experiment under study suggests that X and Y have no effect on one another.
Then once the marginal pmfs or pdfs have been specified, the joint pmf or pdf is simply the product of the two marginal functions. It follows that

    P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) P(c ≤ Y ≤ d)
Conditional Distributions
Definition
Let X and Y be two continuous rvs with joint pdf f(x, y) and marginal X pdf fX(x). Then for any x value for which fX(x) > 0, the conditional probability density function of Y given that X = x is

    fY|X(y | x) = f(x, y) / fX(x),   −∞ < y < ∞
Exercise (5.1) 13
You have two lightbulbs for a particular lamp. Let X = the lifetime of the
first bulb and Y = the lifetime of the second bulb (both in 1000s of
hours). Suppose that X and Y are independent and that each has an
exponential distribution with parameter λ = 1.
1. What is the joint pdf of X and Y?
2. What is the probability that each bulb lasts at most 1000 hours (i.e., X ≤ 1 and Y ≤ 1)?
3. What is the probability that the total lifetime of the two bulbs is at most 2? [Hint: Draw a picture of the region A = {(x, y) : x ≥ 0, y ≥ 0, x + y ≤ 2} before integrating.]
4. What is the probability that the total lifetime is between 1 and 2?
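As a sketch of parts 2-4: independence makes the joint pdf e^{−x} e^{−y}, part 2 factors into a product of exponential cdfs, and for parts 3-4 the sum X + Y of two independent Exponential(1) rvs has a gamma (Erlang) cdf P(X + Y ≤ t) = 1 − e^{−t}(1 + t) (the helper name below is ours, not from the exercise):

```python
import math

# X, Y independent Exponential(1): joint pdf f(x, y) = e^{-(x+y)}, x, y >= 0.

# (2) P(X <= 1 and Y <= 1): independence lets the double integral
# factor into the product of two exponential cdfs.
p2 = (1 - math.exp(-1)) ** 2

# (3), (4): X + Y is Erlang with shape 2, so P(X + Y <= t) = 1 - e^{-t}(1 + t).
def cdf_sum(t):
    return 1 - math.exp(-t) * (1 + t)

p3 = cdf_sum(2)               # P(total lifetime <= 2)
p4 = cdf_sum(2) - cdf_sum(1)  # P(1 <= total lifetime <= 2)

print(round(p2, 4), round(p3, 4), round(p4, 4))
```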
Expected Values
Proposition
Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or µ_h(X,Y), is given by

    E[h(X, Y)] = Σ_x Σ_y h(x, y) p(x, y)                          if X and Y are discrete
                 ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy      if X and Y are continuous
Expected Values - Example
The joint pdf of the amount X of almonds and amount Y of cashews in a 1-lb can of nuts was

    f(x, y) = 24xy   0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1
              0      otherwise

If 1 lb of almonds costs the company $1.00, 1 lb of cashews costs $1.50, and 1 lb of peanuts costs $0.50, then the cost of the contents of a can is

    h(X, Y) = (1)X + (1.5)Y + (0.5)(1 − X − Y) = 0.5 + 0.5X + Y

The expected total cost is

    E[h(X, Y)] = ∫∫ h(x, y) f(x, y) dx dy
               = ∫_0^1 ∫_0^{1−x} (0.5 + 0.5x + y) 24xy dy dx
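This double integral can be evaluated numerically as a sketch; working it out analytically gives E[h(X, Y)] = $1.10, which a midpoint Riemann sum over the triangle reproduces:

```python
# Midpoint Riemann sum of h(x, y) f(x, y) over the triangle
# x >= 0, y >= 0, x + y <= 1, where the joint pdf 24xy is nonzero.
def integrand(x, y):
    return (0.5 + 0.5 * x + y) * 24 * x * y

n = 800
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        if x + y <= 1.0:
            total += integrand(x, y) * h * h

print(total)  # close to the analytic value 1.10
```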
Covariance
- When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another.

Definition
The covariance between two rv's X and Y is

    Cov(X, Y) = E[(X − µX)(Y − µY)]
              = Σ_x Σ_y (x − µX)(y − µY) p(x, y)                          X, Y discrete
                ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µX)(y − µY) f(x, y) dx dy     X, Y cont.

The following shortcut formula for Cov(X, Y) simplifies the computations.

Proposition
    Cov(X, Y) = E(XY) − µX µY
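As a sketch, both the definition and the shortcut formula can be applied to the insurance joint table from earlier, and they agree:

```python
# Insurance joint pmf from the earlier table, keyed by (x, y).
pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

mu_x = sum(x * p for (x, y), p in pmf.items())  # E(X)
mu_y = sum(y * p for (x, y), p in pmf.items())  # E(Y)

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())

# Shortcut: E(XY) - mu_X mu_Y
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov_short = e_xy - mu_x * mu_y

print(round(mu_x), round(mu_y))          # 175 125
print(round(cov_def), round(cov_short))  # 1875 1875
```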
Covariance
- Since X − µX and Y − µY are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations.

Remarks:
1. Cov(X, X) = E[(X − µX)²] = V(X).
2. If X and Y have a strong positive relationship to one another, then Cov(X, Y) should be quite positive.
3. For a strong negative relationship, Cov(X, Y) should be quite negative.
4. If X and Y are not strongly related, Cov(X, Y) is near 0.
Correlation
Definition
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρX,Y, or just ρ, is defined by

    ρX,Y = Cov(X, Y) / (σX σY)

where σX and σY are the standard deviations of X and Y.

Proposition
If a and c are either both positive or both negative,

    Corr(aX + b, cY + d) = Corr(X, Y)

For any two rv's X and Y,

    −1 ≤ Corr(X, Y) ≤ 1.
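A sketch of the definition on the insurance table: the covariance and both standard deviations come straight from the joint pmf, and ρ lands comfortably inside [−1, 1]:

```python
import math

# Insurance joint pmf from the earlier table, keyed by (x, y).
pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

mu_x = sum(x * p for (x, y), p in pmf.items())
mu_y = sum(y * p for (x, y), p in pmf.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())

var_x = sum((x - mu_x) ** 2 * p for (x, y), p in pmf.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in pmf.items())

rho = cov / (math.sqrt(var_x) * math.sqrt(var_y))
print(round(rho, 2))  # ≈ 0.30: positive but far from perfectly linear
```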
Correlation
Proposition
1. If X and Y are independent, then ρX,Y = 0, but ρ = 0 does not imply independence.
2. ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.

- This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be.
- A ρ less than 1 in absolute value indicates only that the relationship is not completely linear; there may still be a very strong nonlinear relation.
Exercise (5.2) 27
Annie and Alvie have agreed to meet for lunch between noon (12:00 pm) and 1:00 pm. Denote Annie's arrival time by X, Alvie's by Y, and suppose X and Y are independent with pdf's

    fX(x) = 3x²   0 ≤ x ≤ 1
            0     otherwise

    fY(y) = 2y   0 ≤ y ≤ 1
            0    otherwise

What is the expected amount of time that the one who arrives first must wait for the other person? [Hint: h(X, Y) = |X − Y|]
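As a sketch: by independence the joint pdf factors into fX(x) fY(y), so E|X − Y| is a double integral over the unit square, which a midpoint Riemann sum evaluates (working it out analytically gives 1/4 hour, i.e. 15 minutes):

```python
# E|X - Y| = integral of |x - y| fX(x) fY(y) over [0,1] x [0,1].
def joint(x, y):
    return (3 * x ** 2) * (2 * y)  # fX(x) fY(y), by independence

n = 500
h = 1.0 / n
expected_wait = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        expected_wait += abs(x - y) * joint(x, y) * h * h

print(round(expected_wait, 3))  # analytically 1/4 hour = 15 minutes
```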
Exercise (5.2) 35
1. Use the rules of expected value to show that Cov(aX + b, cY + d) = ac Cov(X, Y).
2. Use part 1 along with the rules of variance and standard deviation to show that Corr(aX + b, cY + d) = Corr(X, Y) when a and c have the same sign.
3. What happens if a and c have opposite signs?
Random Samples
Definition
A statistic is any quantity whose value can be calculated from sample data.

- A statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

Definition
The rv's X1, X2, ..., Xn are said to form a (simple) random sample of size n if
1. The Xi's are independent rvs.
2. Every Xi has the same probability distribution.

A random sample Xi, i = 1, ..., n is sometimes referred to as iid (independent and identically distributed).
Exercise (5.3) 39
It is known that 80% of all brand A zip drives work in a satisfactory manner throughout the warranty period (are "successes"). Suppose that n = 10 drives are randomly selected. Let X = the number of successes in the sample. The statistic X/n is the sample proportion (fraction) of successes. Obtain the sampling distribution of this statistic. [Hint: One possible value of X/n is 0.3. What is the probability of this value (what kind of random variable is X)?]
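Since X is binomial(n = 10, p = 0.8), the sampling distribution of X/n simply places the binomial probabilities on the values 0, 0.1, ..., 1.0. A sketch:

```python
from math import comb

# X ~ binomial(n = 10, p = 0.8); X/n takes values 0, 0.1, ..., 1.0
# with the corresponding binomial probabilities.
n, p = 10, 0.8
sampling_dist = {
    x / n: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)
}

# e.g. P(X/n = 0.3) = C(10, 3) (0.8)^3 (0.2)^7
print(round(sampling_dist[0.3], 6))  # 0.000786
```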
The Distribution of the Sample Mean
Notation: Let X1, ..., Xn be iid rv's. The sample mean is denoted by

    X̄ = (1/n) Σ_{i=1}^{n} Xi

Proposition
Let X1, X2, ..., Xn be a random sample from a distribution with mean value µ and standard deviation σ. Then
1. E(X̄) = µ_X̄ = µ
2. V(X̄) = σ²_X̄ = σ²/n and σ_X̄ = σ/√n

In addition, with T0 = X1 + ... + Xn, E(T0) = nµ.
The Distribution of the Sample Mean
The Central Limit Theorem (CLT)
Theorem
Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with mean µ_X̄ = µ and variance σ²_X̄ = σ²/n, and T0 also has approximately a normal distribution with mean µ_T0 = nµ and variance σ²_T0 = nσ².
Remark: The larger the value of n, the better the approximation.
Rule of Thumb: If n > 30, the Central Limit Theorem can be used.
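The CLT is easy to see in a small simulation. As a sketch (the exponential distribution here is just one convenient skewed choice with mean 1 and σ = 1): with n = 40 > 30, the sample means cluster around µ with spread near σ/√n:

```python
import random
import statistics

# Draw many samples of size n = 40 from Exponential(1) (mean 1, sd 1)
# and examine the distribution of the sample means: their average
# should be near mu = 1 and their spread near sigma/sqrt(n).
random.seed(0)
n, reps = 40, 5000
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

print(round(statistics.fmean(means), 2))  # near mu = 1.0
print(round(statistics.stdev(means), 2))  # near 1/sqrt(40) ≈ 0.16
```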
CLT - Example
The CLT can be used to justify the normal approximation to the binomial
distribution discussed earlier. We know that a binomial variable X is the
number of successes in a binomial experiment consisting of n independent
success/failure trials with p = P(S) for any particular trial. Define a new
rv X1 by
    X1 = 1 if the first trial results in a success
         0 if the first trial results in a failure

and define X2, X3, ..., Xn analogously for the other n − 1 trials. Each Xi indicates whether or not there is a success on the corresponding trial.
Because the trials are independent and P(S) is constant from trial to trial, the Xi's are iid (a random sample from a Bernoulli distribution). The CLT then implies that if n is sufficiently large, both the sum and the average of the Xi's have approximately normal distributions.
Exercise (5.4) 55
The number of parking tickets issued in a certain city on any given
weekday has a Poisson distribution with parameter µ = 50. What is the
approximate probability that
1. between 35 and 70 tickets are given out on a particular day? [Hint: When µ is large, a Poisson rv has approximately a normal distribution.]
2. the total number of tickets given out during a 5-day week is between 175 and 225?
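A sketch of the normal approximation (a Poisson rv with large µ is approximately N(µ, µ); a continuity correction of 0.5 is applied, and the helper cdf is built from the error function):

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (1) One day: mu = 50, sigma = sqrt(50), with continuity correction.
mu, sigma = 50, math.sqrt(50)
p1 = phi((70.5 - mu) / sigma) - phi((34.5 - mu) / sigma)

# (2) Five days: the total is Poisson with mu = 250, sigma = sqrt(250).
mu5, sigma5 = 250, math.sqrt(250)
p2 = phi((225.5 - mu5) / sigma5) - phi((174.5 - mu5) / sigma5)

print(round(p1, 4), round(p2, 4))
```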
The Distribution of a Linear Combination
Definition
Given a collection of n random variables X1, ..., Xn and n numerical constants a1, ..., an, the rv

    Y = a1 X1 + ... + an Xn

is called a linear combination of the Xi's.
The Distribution of a Linear Combination
Proposition
Let X1, X2, ..., Xn have mean values µ1, ..., µn, respectively, and variances σ1², ..., σn², respectively.
1. Whether or not the Xi's are independent,

       E(a1 X1 + ... + an Xn) = a1 E(X1) + ... + an E(Xn) = a1 µ1 + ... + an µn

2. If X1, ..., Xn are independent,

       V(a1 X1 + ... + an Xn) = a1² V(X1) + ... + an² V(Xn) = a1² σ1² + ... + an² σn²

3. For any X1, ..., Xn,

       V(a1 X1 + ... + an Xn) = Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj Cov(Xi, Xj)
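A sketch of rule 3 on a small concrete case with assumed numbers (not from the slides): take V(X1) = V(X2) = 2, Cov(X1, X2) = 1, and coefficients a1 = 2, a2 = −1. The double sum reproduces the expanded two-variable formula:

```python
import itertools

# Assumed covariance matrix: diagonal entries are the variances.
a = [2.0, -1.0]
cov = [[2.0, 1.0],
       [1.0, 2.0]]  # Cov(Xi, Xj)

# Rule 3: V(Y) = sum_i sum_j ai aj Cov(Xi, Xj)
var_y = sum(
    a[i] * a[j] * cov[i][j]
    for i, j in itertools.product(range(2), repeat=2)
)

# Expanded two-variable form:
#   a1^2 V(X1) + a2^2 V(X2) + 2 a1 a2 Cov(X1, X2) = 8 + 2 - 4 = 6
print(var_y)  # 6.0
```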
Exercise (5.5) 73
Suppose the expected tensile strength of type-A steel is 105 ksi and the
standard deviation of tensile strength is 8 ksi. For type-B steel, suppose
the expected tensile strength and standard deviation of tensile strength are
100 ksi and 6 ksi, respectively. Let X̄ = the sample average tensile
strength of a random sample of 40 type-A specimens, and let Ȳ = the
sample average tensile strength of a random sample of 35 type-B
specimens.
1. What is the approximate distribution of X̄? Of Ȳ?
2. What is the approximate distribution of X̄ − Ȳ? Justify your answer.
3. Calculate (approximately) P(−1 ≤ X̄ − Ȳ ≤ 1).
4. Calculate P(X̄ − Ȳ ≥ 10). If you actually observed X̄ − Ȳ ≥ 10, would you doubt that µ1 − µ2 = 5?
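As a sketch: by the CLT, X̄ is approximately N(105, 8²/40) and Ȳ approximately N(100, 6²/35); the difference of independent (approximately) normal rvs is normal with mean 105 − 100 = 5 and variance 64/40 + 36/35, so parts 3 and 4 reduce to standard normal probabilities:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu_d = 105 - 100                          # mean of Xbar - Ybar
sd_d = math.sqrt(8**2 / 40 + 6**2 / 35)   # sd of Xbar - Ybar

# (3) P(-1 <= Xbar - Ybar <= 1)
p3 = phi((1 - mu_d) / sd_d) - phi((-1 - mu_d) / sd_d)

# (4) P(Xbar - Ybar >= 10): so small that observing it would cast
# serious doubt on mu1 - mu2 = 5.
p4 = 1 - phi((10 - mu_d) / sd_d)

print(round(sd_d, 4), round(p3, 4), round(p4, 4))
```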