Lecture notes: Distributions

Distributions of random variables
A random variable (r.v.) is a real function
X: S → R on the sample space S (a quantitative
aspect of the random experiment).
The range is S_X = { X(s) | s ∈ S }.
We distinguish discrete and continuous r.v.’s:
X is discrete iff (= if and only if) S_X is countable,
that is: S_X = {x_1, x_2, …, x_n} or S_X = {x_1, x_2, …}.
If X is continuous, then S_X is usually an interval in R.
(A definition follows later.)
Discrete r.v. X: the event X = x is {s ∈ S | X(s) = x}.
The probability (mass) function of X:
P: x → P(X=x)
Requirements for a probability function P:
1. P(X = x) ≥ 0 for every x ∈ S_X
2. Σ_{x∈S_X} P(X = x) = 1

The distribution of a discrete r.v. X consists of a
table or formula for P(X = x) for all x ∈ S_X.
The measure for the centre of the distribution is the
expectation or expected value of X:

E(X) = Σ_{x∈S_X} x·P(X = x),

provided that the sum is absolutely convergent.
Notation: E(X) = EX = µ_X = µ.
Interpretation: “weighted average”.
Properties of E(X):
1. If P(X = x) is symmetric about x = c, then E(X) = c.
2. E g(X) = Σ_{x∈S_X} g(x)·P(X = x)
3. E(aX + b) = a·E(X) + b
4. E[a·g(X) + b·h(X)] = a·E g(X) + b·E h(X)

Note that in general E(X²) ≠ (E(X))²!
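As a small illustration of property 2 and of the note above, here is a minimal Python sketch; the fair six-sided die (X = the outcome of one roll) is an assumed example, not taken from the notes.

# Fair die (assumed example): S_X = {1, ..., 6}, P(X = x) = 1/6.
pmf = {x: 1/6 for x in range(1, 7)}

EX  = sum(x * p for x, p in pmf.items())       # E(X) = 3.5
EX2 = sum(x**2 * p for x, p in pmf.items())    # E g(X) with g(x) = x^2, equals 91/6
print(EX, EX2, EX**2)                          # 3.5  15.166...  12.25  ->  E(X^2) != (E(X))^2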
Functions of X and their expectation:
E(X^k) is the k-th moment of X.
var(X) = E(X − µ_X)² is the variance of X.
Notation: var(X) = σ_X² = σ².
σ_X = √var(X) is the standard deviation of the r.v. X.
var(X) and σ_X are both measures of spread for
the distribution.
Properties of var(X) and σ_X:
1. var(X) ≥ 0 and σ_X ≥ 0
2. var(X) = E(X²) − µ_X² (formula for computations)
3. var(X) > 0 => E(X²) > µ_X²
   var(X) = 0 => P(X = µ_X) = 1
4. var(aX + b) = a²·var(X) and σ_{aX+b} = |a|·σ_X
Chebyshev’s inequality:
P(|X − µ_X| ≥ c) ≤ var(X)/c²  for all c > 0
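Continuing the assumed fair-die example, a minimal Python check of the computation formula for var(X) and of Chebyshev’s inequality:

pmf = {x: 1/6 for x in range(1, 7)}
mu   = sum(x * p for x, p in pmf.items())                  # E(X) = 3.5
var  = sum((x - mu)**2 * p for x, p in pmf.items())        # E(X - mu)^2 = 35/12
var2 = sum(x**2 * p for x, p in pmf.items()) - mu**2       # E(X^2) - mu^2
assert abs(var - var2) < 1e-12                             # both variance formulas agree

c = 2.0
lhs = sum(p for x, p in pmf.items() if abs(x - mu) >= c)   # P(|X - mu| >= c) = 1/3
print(lhs, var / c**2)                                     # 0.333... <= 0.729..., as Chebyshev claims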
Discrete distributions and characteristics:

Binomial B(n, p):
  P(X = k) = C(n, k)·p^k·(1 − p)^(n−k),  k = 0, 1, …, n
  E(X) = np,  var(X) = np(1 − p)

Poisson(µ):
  P(X = k) = (µ^k / k!)·e^(−µ),  k = 0, 1, …
  E(X) = µ,  var(X) = µ

Geometric(p):
  P(X = k) = (1 − p)^(k−1)·p,  k = 1, 2, …
  E(X) = 1/p,  var(X) = (1 − p)/p²

Hypergeometric(N, R, n), with p = R/N:
  P(X = k) = C(R, k)·C(N − R, n − k) / C(N, n),  k = 0, 1, …, n
  E(X) = np,  var(X) = np(1 − p)·(N − n)/(N − 1)

(Here C(n, k) denotes the binomial coefficient “n choose k”.)
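A hedged Python sketch checking this table with scipy.stats; the parameter values are arbitrary illustrations, not taken from the notes. Note that scipy’s hypergeom uses the argument order M = population size, n = number of “successes”, N = sample size.

from scipy.stats import binom, poisson, geom, hypergeom

n, p, mu = 10, 0.3, 2.5
print(binom.pmf(3, n, p), binom.mean(n, p), binom.var(n, p))   # np = 3.0, np(1-p) = 2.1
print(poisson.pmf(3, mu), poisson.mean(mu), poisson.var(mu))   # mu, mu
print(geom.pmf(3, p), geom.mean(p), geom.var(p))               # 1/p, (1-p)/p^2

N, R, m = 20, 12, 5                                            # population N, R of one kind, sample size m
print(hypergeom.pmf(3, N, R, m), hypergeom.mean(N, R, m))      # E(X) = m*R/N = 3.0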
Properties (linking the distributions):
1. If the parts of the population (R and N − R) are large
compared to the sample size (> 5n²), the hypergeometric
probabilities can be approximated by binomial probabilities.
2. If X ~ B(n, p) for large n and small p (so that
µ = np is of moderate size), X is approximately
Poisson distributed with µ = np.
(A numerical illustration of both approximations follows below.)
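A minimal Python sketch of both approximations; the numbers are assumed for illustration only.

from scipy.stats import binom, hypergeom, poisson

# Hypergeometric vs. binomial: population much larger than the sample.
N, R, n = 10_000, 3_000, 10                                # p = R/N = 0.3
print(hypergeom.pmf(4, N, R, n), binom.pmf(4, n, R / N))   # nearly equal

# Binomial vs. Poisson: large n, small p, mu = n*p.
n, p = 1_000, 0.004
print(binom.pmf(2, n, p), poisson.pmf(2, n * p))           # nearly equal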
When to use these common discrete
distributions?
Binomial
“the number of successes in n Bernoulli trials”
Ex: X = “# of sixes in 25 rolls of a die”
X ~ B(25, 1/6)
Geometric
“the trial number of the first success when
performing Bernoulli trials”
Ex: X = “# of the first roll of a die that results
in a six”
Property: P(X > k) = (1 − p)^k,  k = 0, 1, 2, …
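A one-line Python check of this tail property, assuming the die example (p = 1/6):

from scipy.stats import geom

p, k = 1/6, 4
print(geom.sf(k, p), (1 - p)**k)   # P(X > 4): both ≈ 0.482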
Hypergeometric
“the number of red balls selected when n
balls are selected at random without
replacement from an urn that contains R red and
N − R white balls”
Ex: X = “# of girls when 5 persons are
selected at random from a group of 8 boys and
12 girls”
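As a sketch, the girls example above in Python; scipy’s hypergeom takes population size, number of girls, sample size, in that order.

from scipy.stats import hypergeom

X = hypergeom(20, 12, 5)                          # N = 20 persons, R = 12 girls, n = 5 selected
print([round(X.pmf(k), 3) for k in range(6)])     # P(X = 0), ..., P(X = 5)
print(X.mean())                                   # n*R/N = 3.0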
Poisson
“the number of rare events in a period of time
and/or a region of space”
Ex: X = “# of traffic accidents on a busy road
on a day”.
Continuous random variables
X is a continuous random variable if there exists
a non-negative function f(x), defined for all real x,
such that for every (measurable) set B:

P(X ∈ B) = ∫_B f(x) dx
f(x) or f_X(x) is the probability density function.
Requirements: 1. f(x) ≥ 0
              2. ∫_{−∞}^{∞} f(x) dx = 1
Note: f(x) is not a probability, but for
small dx > 0 we have P(x < X ≤ x + dx) ≈ f(x)·dx.
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du is the cumulative
distribution function (c.d.f.).
Notation: F_X(x) = F(x)
Properties of F(x) for every random variable X:
1. a < b => F(a) ≤ F(b)  (F is non-decreasing)
2. lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0
3. lim_{x↓a} F(x) = F(a)  (F is right-continuous)
4. P(X > x) = 1 − F(x)
5. P(X = x) = F(x) − lim_{u↑x} F(u)
Properties of the density function f and the c.d.f. F of a
continuous r.v. X:
1. F(x) is a continuous function
2. f(x) = d/dx F(x)
3. P(X = x) = 0
4. P(a < X < b) = P(a ≤ X ≤ b)
   = F(b) − F(a) = ∫_a^b f(x) dx
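A minimal Python check of property 4, using an exponential density as an assumed example (λ = 2, a = 0.5, b = 1.5; scipy parameterizes the exponential by scale = 1/λ):

import math
from scipy.integrate import quad
from scipy.stats import expon

lam, a, b = 2.0, 0.5, 1.5
f = lambda x: lam * math.exp(-lam * x)     # density f(x) for x >= 0
X = expon(scale=1/lam)

area, _ = quad(f, a, b)                    # integral of f over (a, b)
print(area, X.cdf(b) - X.cdf(a))           # both ≈ 0.318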
The expectation of a continuous r.v. X:

E(X) = ∫_{−∞}^{∞} x·f(x) dx

(provided that the integral is absolutely convergent).
A function Y = g(X) of a continuous r.v. X, when the
density function f_X is known:

The density function f_Y(y) can be determined in 3 steps:
1. Express F_Y(y) = P(g(X) ≤ y) in terms of F_X.
2. Determine f_Y(y) = d/dy F_Y(y).
3. Use the known density f_X.

E(Y) = E g(X) = ∫_{−∞}^{∞} g(x)·f(x) dx

Especially: E(X²) = ∫_{−∞}^{∞} x²·f(x) dx

All properties of E(X) and var(X) hold for
continuous random variables as well,
e.g. var(X) = E(X²) − µ_X².
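A minimal Python check of E g(X) = ∫ g(x) f(x) dx and of the variance formula, again for an assumed Exponential(λ = 2) density, for which E(X) = 1/λ and var(X) = 1/λ²:

import math
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)             # density on [0, infinity)

EX,  _ = quad(lambda x: x * f(x),    0, math.inf)  # E(X)
EX2, _ = quad(lambda x: x**2 * f(x), 0, math.inf)  # E(X^2)
print(EX, EX2 - EX**2)                             # ≈ 0.5 and 0.25, i.e. 1/lam and 1/lam^2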
Properties of f_X(x):
1. If f_X(x) is symmetric about x = c, then E(X) = c.
2. Linear transformation Y = aX + b (with f_X known):
   f_Y(y) = (1/|a|)·f_X((y − b)/a)
   (a short derivation via the three steps above is sketched below)
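A sketch of that derivation, applying the three-step method, first for a > 0:

F_Y(y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = F_X((y − b)/a)
f_Y(y) = d/dy F_X((y − b)/a) = (1/a)·f_X((y − b)/a)

For a < 0 the inequality reverses: F_Y(y) = P(X ≥ (y − b)/a) = 1 − F_X((y − b)/a) (since P(X = (y − b)/a) = 0 for continuous X), so f_Y(y) = −(1/a)·f_X((y − b)/a). The two cases together give the factor 1/|a|.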
Common continuous distributions

Uniform U(a, b):
  f(x) = 1/(b − a),  for x in [a, b]
  E(X) = (a + b)/2,  var(X) = (b − a)²/12

Exponential(λ):
  f(x) = λ·e^(−λx),  for x ≥ 0
  E(X) = 1/λ,  var(X) = 1/λ²

Standard normal N(0, 1):
  φ(x) = (1/√(2π))·e^(−x²/2)
  E(X) = 0,  var(X) = 1

Normal N(µ, σ²):
  f(x) = (1/√(2πσ²))·e^(−(x − µ)²/(2σ²))
  E(X) = µ,  var(X) = σ²
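A hedged scipy.stats check of the means and variances in this table; the parameter values are arbitrary assumptions, and note scipy’s parameterizations uniform(loc = a, scale = b − a), expon(scale = 1/λ) and norm(µ, σ).

from scipy.stats import uniform, expon, norm

a, b, lam, mu, sigma = 1.0, 5.0, 2.0, 10.0, 3.0

U = uniform(loc=a, scale=b - a)
print(U.mean(), U.var())              # (a+b)/2 = 3.0, (b-a)^2/12 ≈ 1.333

E = expon(scale=1/lam)
print(E.mean(), E.var())              # 1/lam = 0.5, 1/lam^2 = 0.25

Z = norm(mu, sigma)                   # N(mu, sigma^2); scipy takes the standard deviation
print(Z.mean(), Z.var(), Z.pdf(mu))   # mu, sigma^2, 1/(sigma*sqrt(2*pi)) ≈ 0.133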
These distributions are often used as models
of stochastic reality:
Uniform: random numbers from an interval
Exponential: waiting times, service times
Normal: quantities or variables in nature,
the economy, etc., varying around an average
Some properties of these continuous
distributions
1. An exponential variable has no memory:
   P(X > x + y | X > x) = P(X > y).
   This follows from the exponential tail property
   P(X > x) = e^(−λx), x ≥ 0.
2. If X ~ U(0, 1),
   then Y = (b − a)X + a ~ U(a, b).
3. If X ~ N(µ, σ²),
   then Y = aX + b ~ N(aµ + b, a²σ²).
(A numerical sketch of these three properties follows below.)
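A minimal Python sketch of these three properties; all parameter values and sample sizes are assumptions for illustration.

from scipy.stats import expon, uniform, norm

# 1. Memorylessness: P(X > x + y | X > x) = P(X > y) for X ~ Exponential(lam).
lam, x, y = 2.0, 1.0, 0.7
X = expon(scale=1/lam)
print(X.sf(x + y) / X.sf(x), X.sf(y))      # both = e^(-lam*y) ≈ 0.247

# 2. If X ~ U(0, 1) then (b - a)X + a ~ U(a, b).
a, b = 2.0, 5.0
v = (b - a) * uniform.rvs(size=100_000) + a
print(v.min(), v.max(), v.mean())          # ≈ a, ≈ b, ≈ (a + b)/2

# 3. If X ~ N(mu, sigma^2) then cX + d ~ N(c*mu + d, c^2*sigma^2).
mu, sigma, c, d = 1.0, 2.0, 3.0, -4.0
z = c * norm.rvs(mu, sigma, size=100_000) + d
print(z.mean(), z.std())                   # ≈ c*mu + d = -1, |c|*sigma = 6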