Tsinghua Math Camp 2015

Probability & Statistics
Professor Wei Zhu
July 24th
(I)
The sum of two normal random variables may not be
normal.
Counterexample: Let Z ~ N(0,1), and let the value of the random variable W depend on a fair coin flip:
W = Z if the coin lands heads, and W = −Z if it lands tails.
Please:
(a) Find the distribution of W.
(b) Is the joint distribution of Z and W a bivariate normal?
(c) Is W+Z normal?
Answer:
(a)
F_W(w) = P(W ≤ w)
       = P(W ≤ w | H)P(H) + P(W ≤ w | T)P(T)
       = P(Z ≤ w)P(H) + P(−Z ≤ w)P(T)
       = F_Z(w) × 1/2 + P(Z ≥ −w) × 1/2
       = F_Z(w) × 1/2 + P(Z ≤ w) × 1/2      (by the symmetry of Z about 0)
       = F_Z(w) × 1/2 + F_Z(w) × 1/2
       = F_Z(w).
So W ~ N(0,1).
(b)
The joint distribution of Z and W is not bivariate normal. This can be shown by deriving the joint mgf of Z and W and comparing it with the joint mgf of the bivariate normal (which we already know from the previous lecture).
M(t1, t2) = E(e^{t1 Z + t2 W})
          = E( E(e^{t1 Z + t2 W} | outcome of coin flip) )
          = E(e^{t1 Z + t2 W} | H)P(H) + E(e^{t1 Z + t2 W} | T)P(T)
          = E(e^{(t1 + t2)Z}) × 1/2 + E(e^{(t1 − t2)Z}) × 1/2
          = 1/2 ( e^{(t1 + t2)²/2} + e^{(t1 − t2)²/2} ).
This is not the joint mgf of a bivariate normal. Therefore the joint distribution of W and Z is not bivariate normal.
(c)
W + Z is not normal. Given heads, W + Z = 2Z; given tails, W + Z = 0. Hence
P(W + Z = 0) = P(W + Z = 0 | H)P(H) + P(W + Z = 0 | T)P(T) = 0 × 1/2 + 1 × 1/2 = 1/2.
A normal random variable assigns probability 0 to any single point, so W + Z is not a normal random variable. (This also re-proves (b), since every linear combination of a bivariate normal vector is normal.)
(II)
Other Common Univariate Distributions
Dear students, besides the Normal, Bernoulli and Binomial distributions, the
following distributions are also very important in our studies.
1. Discrete Distributions
1.1. Geometric distribution
This is a discrete waiting time distribution. Suppose a sequence of independent
Bernoulli trials is performed and let X be the number of failures preceding the
first success. Then 𝑋~πΊπ‘’π‘œπ‘š(𝑝), with the pdf
𝑓(π‘₯) = 𝑃(𝑋 = π‘₯) = 𝑝(1 − 𝑝)π‘₯ , π‘₯ = 0,1,2, …
1.2. Negative Binomial distribution
Suppose a sequence of independent Bernoulli trials is conducted. If X is the
number of failures preceding the nth success, then X has a negative binomial
distribution.
Probability density function:
f(x) = P(X = x) = C(n + x − 1, n − 1) p^n (1 − p)^x,   x = 0, 1, 2, …
where C(n + x − 1, n − 1) counts the ways of allocating the failures and successes.
E(X) = n(1 − p)/p,
Var(X) = n(1 − p)/p²,
M_X(t) = [ p / (1 − e^t (1 − p)) ]^n.
1. If n = 1, we obtain the geometric distribution.
2. The negative binomial also arises as the sum of n independent geometric random variables (see the sketch below).
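A simulation sketch of note 2 (my addition, with arbitrary n and p): summing n independent geometric failure counts reproduces the negative binomial moments and pmf given above.

import numpy as np
from math import comb

rng = np.random.default_rng(2)
n, p, reps = 5, 0.4, 200_000

# sum of n independent Geometric(p) "failures before success" counts
x = (rng.geometric(p, size=(reps, n)) - 1).sum(axis=1)

print(x.mean(), n * (1 - p) / p)             # E(X) = n(1-p)/p
print(x.var(),  n * (1 - p) / p**2)          # Var(X) = n(1-p)/p^2

# compare a few empirical probabilities with the pmf above
for k in range(4):
    exact = comb(n + k - 1, n - 1) * p**n * (1 - p)**k
    print(k, round(np.mean(x == k), 4), round(exact, 4))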
1.3. Poisson distribution
Parameter: rate πœ† > 0
MGF: M_X(t) = e^{λ(e^t − 1)}
Probability density function:
f(x) = P(X = x) = e^{−λ} λ^x / x!,   x = 0, 1, 2, …
1. The Poisson distribution arises as the distribution for the number of “point
events” observed from a Poisson process.
Examples: see Figure 1 (Poisson Example).
2. The Poisson distribution also arises as the limiting form of the binomial distribution: n → ∞, p → 0, with np → λ (see the numeric check at the end of this subsection).
The derivation of the Poisson distribution (via the binomial) is underpinned by a
Poisson process i.e., a point process on [0, ∞); see Figure 1.
AXIOMS for a Poisson process of rate λ > 0 are:
(A) The numbers of occurrences in disjoint intervals are independent.
(B) The probability of 1 or more occurrences in any sub-interval [t, t + h) is λh + o(h) as h → 0 (approximately, the probability equals λ times the length of the interval).
(C) The probability of more than one occurrence in [t, t + h) is o(h) as h → 0 (i.e., this probability is negligibly small).
Note: o(h), pronounced "small order h", is standard notation for any function r(h) with the property
lim_{h→0} r(h)/h = 0.
1.4. Hypergeometric distribution
Consider an urn containing M black and N white balls. Suppose n balls are
sampled randomly without replacement and let X be the number of black balls
chosen. Then X has a hypergeometric distribution.
Parameters:
𝑀, 𝑁 > 0, 0 < 𝑛 ≤ 𝑀 + 𝑁
Possible values: π‘šπ‘Žπ‘₯ (0, 𝑛 − 𝑁) ≤ π‘₯ ≤ π‘šπ‘–π‘› (𝑛, 𝑀)
Probability density function:
f(x) = P(X = x) = C(M, x) C(N, n − x) / C(M + N, n),
E(X) = nM/(M + N),   Var(X) = [(M + N − n)/(M + N − 1)] · nMN/(M + N)².
The mgf exists, but there is no useful expression available.
1. The hypergeometric distribution is simply
(# samples with x black balls) / (# possible samples) = C(M, x) C(N, n − x) / C(M + N, n).
2. To see how the limits arise, observe that we must have x ≤ n (there cannot be more black balls in the sample than the sample size) and x ≤ M, i.e., x ≤ min(n, M). Similarly, we must have x ≥ 0 (there cannot be a negative number of black balls in the sample) and n − x ≤ N (there cannot be more white balls in the sample than in the urn), i.e., x ≥ n − N; hence x ≥ max(0, n − N).
3. If we sample with replacement, we would get X ~ B(n, p = M/(M + N)). It is interesting to compare moments:
Hypergeometric:  E(X) = np,   Var(X) = [(M + N − n)/(M + N − 1)] · np(1 − p)
Binomial:        E(X) = np,   Var(X) = np(1 − p)
The factor (M + N − n)/(M + N − 1) is the finite population correction; when we sample all the balls in the urn (n = M + N), Var(X) = 0.
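The finite population correction can be seen numerically with the following sketch (my addition; the counts M, N, n are arbitrary). numpy's hypergeometric sampler takes the number of black balls, the number of white balls, and the sample size.

import numpy as np

rng = np.random.default_rng(3)
M, N, n = 30, 70, 25                   # 30 black, 70 white, draw 25 without replacement
p = M / (M + N)

x = rng.hypergeometric(M, N, n, size=200_000)

fpc = (M + N - n) / (M + N - 1)        # finite population correction factor
print(x.mean(), n * p)                           # both ~ np
print(x.var(),  fpc * n * p * (1 - p))           # hypergeometric variance
print(n * p * (1 - p))                           # binomial variance (with replacement) is larger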
2. Continuous Distributions
2.1 Uniform Distribution
PDF: f(x) = 1/(b − a) for a < x < b (and 0 otherwise).
CDF, for a < x < b:
F(x) = ∫_{−∞}^{x} f(u) du = ∫_{a}^{x} 1/(b − a) du = (x − a)/(b − a),
that is,
F(x) = 0 for x ≤ a,   F(x) = (x − a)/(b − a) for a < x < b,   F(x) = 1 for b ≤ x.
M_X(t) = (e^{tb} − e^{ta}) / (t(b − a)).
A special case is the U(0,1) distribution:
f(x) = 1 for 0 < x < 1, and 0 otherwise,
F(x) = x for 0 < x < 1,
E(X) = 1/2,   Var(X) = 1/12,   M(t) = (e^t − 1)/t.
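A quick check of the U(0,1) facts above (my addition): the sample mean, sample variance, and an empirically estimated mgf match 1/2, 1/12, and (e^t − 1)/t.

import numpy as np

rng = np.random.default_rng(4)
u = rng.random(1_000_000)                    # U(0, 1) draws

print(u.mean(), 1 / 2)                       # E(X) = 1/2
print(u.var(), 1 / 12)                       # Var(X) = 1/12

t = 0.7
print(np.exp(t * u).mean(), (np.exp(t) - 1) / t)   # M(t) = (e^t - 1)/t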
2.2 Exponential Distribution
PDF: f(x) = λe^{−λx},  x ≥ 0
CDF: F(x) = 1 − e^{−λx},  x ≥ 0
MGF: M_X(t) = λ/(λ − t),  t < λ,   where λ > 0.
This is the distribution for the waiting time until the first occurrence in a Poisson
process with rate parameter πœ† > 0.
Exercise: Show that in a Poisson process with rate λ, the inter-arrival times (times between two adjacent events) are i.i.d. Exponential(λ).
1. If X ~ Exp(λ), then
P(X ≥ t + x | X ≥ t) = P(X ≥ x)   (memoryless property; see the sketch after this list).
2. The exponential can be obtained as the limiting form of the geometric distribution.
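Here is the memoryless-property sketch referred to in note 1 (my addition; the values of λ, t and x are arbitrary choices). It estimates both sides of the identity by simulation.

import numpy as np

rng = np.random.default_rng(5)
lam, t, x = 0.1, 5.0, 8.0
samples = rng.exponential(scale=1 / lam, size=1_000_000)   # numpy uses scale = 1/lambda

lhs = np.mean(samples[samples >= t] >= t + x)   # P(X >= t + x | X >= t)
rhs = np.mean(samples >= x)                     # P(X >= x)
print(lhs, rhs, np.exp(-lam * x))               # all three should agree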
2.3 Gamma distribution
f(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx},   α > 0, λ > 0, x ≥ 0,
with gamma function Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt,
mgf: M_X(t) = (λ/(λ − t))^α,  t < λ.
Suppose Y1 , . . . , Y𝐾 are independent Exp(λ) random variables and let
X = Y1 + β‹― + Y𝐾 .
Then 𝑋 ∼ Gamma(𝐾, πœ†), for 𝐾 integer. In general, 𝑋 ∼ Gamma(α, λ), α > 0.
1. α is the shape parameter and λ is the rate (inverse scale) parameter.
Note: if Y ~ Gamma(α, 1) and X = Y/λ, then X ~ Gamma(α, λ); that is, 1/λ acts as the scale parameter.
Figure 2: Gamma Distribution
2. The Gamma(p/2, 1/2) distribution is also called the χ²_p (chi-square with p df) distribution if p is an integer; in particular, χ²_2 (2 df) is the Exponential(1/2) distribution.
3. The Gamma(K, λ) distribution can be interpreted as the waiting time until the K-th occurrence in a Poisson process of rate λ.
2.4 Beta density function
Suppose Y1 ∼ Gamma (𝛼, πœ†), π‘Œ2 ∼ Gamma (β, λ) independently, then,
π‘Œ1
X=
~𝐡(𝛼, 𝛽), 0 ≤ π‘₯ ≤ 1.
π‘Œ1 + π‘Œ2
Remark: the derivation will be given soon!
2.5 Standard Cauchy distribution
Possible values: π‘₯ ∈ 𝑅
PDF: f(x) = 1/(π(1 + x²))   (location parameter θ = 0)
CDF: F(x) = 1/2 + (1/π) arctan x
𝐸(𝑋), π‘‰π‘Žπ‘Ÿ(𝑋), 𝑀𝑋 (𝑑) do not exist.
The Cauchy is a bell-shaped distribution symmetric about zero for which no
moments are defined.
If Z1 ~ N(0, 1) and Z2 ~ N(0, 1) independently, then X = Z1/Z2 follows the Cauchy distribution.
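A short illustration of the ratio construction and of the non-existent moments (my addition): the sample quartiles match the Cauchy CDF above, while the running sample mean never settles down.

import numpy as np

rng = np.random.default_rng(7)
n = 100_000

z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = z1 / z2                                  # standard Cauchy

# the quartiles match F(x) = 1/2 + arctan(x)/pi: roughly -1, 0, 1
print(np.quantile(x, [0.25, 0.5, 0.75]))

# running means do not converge (E(X) does not exist)
running_mean = np.cumsum(x) / np.arange(1, n + 1)
print(running_mean[[999, 9_999, 99_999]])    # keeps jumping around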
(III) Let’s Have Some Fun!
The Post Office
A post office has two clerks, Jack and Jill. It is known that their service times are two independent exponential random variables with the same parameter λ, and that Jack spends on average 10 minutes with each customer. Suppose Jack and Jill are each serving one customer at the moment.
(a) What is the distribution of the waiting time for the next customer
in line?
(b) What is the probability that the next customer in line will be the
last customer (among the two customers being served and
him/herself) to leave the post office?
Solution:
(a) Let X1 and X2 denote the service time for Jack and Jill, respectively. Then the
waiting time for the next customer in line is:
Y=min (X1,X2), where Xi ~ iid exp(λ), i=1,2.
Furthermore, E(X1) = 1/λ = 10, so λ = 1/10. From the definition of the exponential R.V.,
f_{Xi}(x) = λe^{−λx}, x > 0, i = 1, 2, and P(Xi > x) = ∫_x^∞ f_{Xi}(u) du = e^{−λx}, x > 0, i = 1, 2.
F_Y(y) = P(Y ≤ y) = 1 − P(Y > y) = 1 − P(X1 > y, X2 > y) = 1 − P(X1 > y)·P(X2 > y) = 1 − (e^{−λy})² = 1 − e^{−2λy}.
Therefore f_Y(y) = d/dy F_Y(y) = (2λ) e^{−(2λ)y}, y > 0. Thus Y ~ exp(2λ), where λ = 1/10.
Note: Because of the memoryless property of the exponential distribution, it does not matter how long the current customers have already been served by Jack or Jill.
(b) One of the two customers being served right now will leave first. Between the customer who is still being served and the next customer in line, the remaining service times follow the same exponential distribution because of the memoryless property of the exponential random variable. Therefore, each has the same chance to finish first or last. That is, the next customer in line will be the last customer to leave with probability 0.5.
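A Monte Carlo sketch of both parts (my addition), with λ = 1/10 as in the problem:

import numpy as np

rng = np.random.default_rng(8)
lam, reps = 1 / 10, 500_000

x1 = rng.exponential(scale=1 / lam, size=reps)   # Jack's remaining service time
x2 = rng.exponential(scale=1 / lam, size=reps)   # Jill's remaining service time

# (a) waiting time of the next customer in line: Y = min(X1, X2) ~ Exp(2*lambda)
y = np.minimum(x1, x2)
print(y.mean(), 1 / (2 * lam))                   # both ~ 5 minutes

# (b) the next customer starts service at time Y and finishes at Y + X3;
# the remaining original customer finishes at max(X1, X2)
x3 = rng.exponential(scale=1 / lam, size=reps)   # next customer's own service time
print(np.mean(y + x3 > np.maximum(x1, x2)))      # ~ 0.5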
Exercise.
1.
Let X be a random variable following the Bernoulli distribution with success
probability p. Please
(1) Write down the probability density function of X,
(2) Derive its moment generating function,
(3) Derive the moment generating function of a B(n,p) (Binomial with
parameters n and p) random variable.
(4) Derive that the sum of n i.i.d. Bernoulli(p) random variables follows B(n, p).
2.
Let X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²), where X and Y are independent of each other. Please derive the distribution of
(1) X+Y
(2) X-Y
(3) 2X-3Y+5
Solutions:
Q1.
(1) f(x) = p^x (1 − p)^{1−x},   x = 0 or 1.
(2) M_X(t) = E(e^{tX}) = e^{0·t} P(X = 0) + e^{1·t} P(X = 1) = 1 − p + pe^t
(3) Assume Y ~ B(n, p). Then
M_Y(t) = E(e^{tY}) = Σ_{k=0}^{n} e^{tk} P(Y = k) = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n−k}
       = Σ_{k=0}^{n} C(n, k) (pe^t)^k (1 − p)^{n−k} = (1 − p + pe^t)^n
(4) Assume X1, X2, …, Xn ~ Bernoulli(p) i.i.d. and let Y = Σ_{i=1}^{n} Xi. Then
M_Y(t) = E(e^{tY}) = E(e^{t Σ_{i=1}^{n} Xi}) = Π_{i=1}^{n} E(e^{t Xi}) = [E(e^{t X1})]^n   (since the Xi are i.i.d.)
       = (1 − p + pe^t)^n   (from the conclusion of part (2))
Since a probability distribution is identified by its mgf, comparing the results of parts (3) and (4) shows that the sum of n i.i.d. Bernoulli(p) random variables follows B(n, p).
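As a numeric companion to Q1 (my addition), the empirical mgf of a sum of i.i.d. Bernoulli(p) indicators matches the B(n, p) mgf (1 − p + pe^t)^n; the values of n, p and the evaluation points t are arbitrary.

import numpy as np

rng = np.random.default_rng(9)
n, p, reps = 8, 0.35, 300_000

x = rng.random((reps, n)) < p          # i.i.d. Bernoulli(p) indicators
y = x.sum(axis=1)                      # Y = X_1 + ... + X_n

for t in (-0.5, 0.2, 1.0):
    empirical_mgf = np.exp(t * y).mean()
    binomial_mgf = (1 - p + p * np.exp(t)) ** n
    print(t, round(empirical_mgf, 4), round(binomial_mgf, 4))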
Q2
If Z ~ N(μ, σ²), then
M_Z(t) = E(e^{tZ}) = ∫_{−∞}^{∞} e^{tz} · (1/(√(2π) σ)) e^{−(z − μ)²/(2σ²)} dz
       = ∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−[z² − (2μ + 2σ²t)z + μ²]/(2σ²)} dz
       = exp(μt + σ²t²/2) ∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−[z − (μ + σ²t)]²/(2σ²)} dz
       = exp(μt + σ²t²/2) · 1 = exp(μt + σ²t²/2),
where f(z) = (1/(√(2π) σ)) e^{−[z − (μ + σ²t)]²/(2σ²)} is the pdf of N(μ + σ²t, σ²), so the last integral equals 1.
Then we have
M_X(t) = exp(μ1 t + σ1² t²/2),
M_Y(t) = exp(μ2 t + σ2² t²/2).
Since X and Y are independent, by theorem, we have
(1) M_{X+Y}(t) = M_X(t) M_Y(t) = exp[(μ1 + μ2)t + (σ1² + σ2²)t²/2]
    ⇒ X + Y ~ N(μ1 + μ2, σ1² + σ2²)
(2) M_{X−Y}(t) = M_X(t) M_Y(−t) = exp[(μ1 − μ2)t + (σ1² + σ2²)t²/2]
    ⇒ X − Y ~ N(μ1 − μ2, σ1² + σ2²)
(3) M_{2X−3Y+5}(t) = M_X(2t) M_Y(−3t) · e^{5t} = exp[(2μ1 − 3μ2 + 5)t + (4σ1² + 9σ2²)t²/2]
    ⇒ 2X − 3Y + 5 ~ N(2μ1 − 3μ2 + 5, 4σ1² + 9σ2²)
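A simulation check of Q2 (my addition), with arbitrary parameter values; the sample means and variances of X+Y, X−Y, and 2X−3Y+5 match the normal distributions derived above.

import numpy as np

rng = np.random.default_rng(10)
mu1, s1, mu2, s2, reps = 1.0, 2.0, -3.0, 0.5, 1_000_000

x = rng.normal(mu1, s1, size=reps)
y = rng.normal(mu2, s2, size=reps)

for label, v, mean, var in [
    ("X+Y",      x + y,           mu1 + mu2,          s1**2 + s2**2),
    ("X-Y",      x - y,           mu1 - mu2,          s1**2 + s2**2),
    ("2X-3Y+5",  2*x - 3*y + 5,   2*mu1 - 3*mu2 + 5,  4*s1**2 + 9*s2**2),
]:
    # empirical mean/variance vs. the values predicted by the mgf argument
    print(label, round(v.mean(), 3), mean, round(v.var(), 3), var)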