Lecture

Probability Theory
Random Variables and Distributions
Rob Nicholls
MRC LMB Statistics Course 2014
Introduction
Significance magazine (December 2013), Royal Statistical Society
Contents
• Set Notation
• Intro to Probability Theory
• Random Variables
• Probability Mass Functions
• Common Discrete Distributions
Set Notation
A set is a collection of objects, written using curly brackets {}.
If A is the set of all outcomes, then:
A = {heads, tails}   (coin toss)
A = {one, two, three, four, five, six}   (die roll)
A set does not have to comprise all possible outcomes.
E.g. if A is the set of die outcomes no higher than three, then:
A = {one, two, three}
Set Notation
If A and B are sets, then:
A′      Complement – everything but A
A ∪ B   Union (or)
A ∩ B   Intersection (and)
A \ B   Set difference – everything in A but not in B
∅       Empty set
Set Notation
Venn diagram, for the die outcomes Ω = {one, two, three, four, five, six}:
[Venn diagram: "two, three" in A only; "four, five" in A ∩ B; "six" in B only; "one" outside both sets]
A = {two, three, four, five}
B = {four, five, six}
A ∩ B = {four, five}
A ∪ B = {two, three, four, five, six}
(A ∪ B)′ = {one}
A \ B = {two, three}
(A \ B)′ = {one, four, five, six}
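These operations map directly onto Python's built-in `set` type; a quick sketch reproducing the results above (the variable names are illustrative):

```python
# Reproduce the Venn-diagram results with Python's built-in sets.
omega = {"one", "two", "three", "four", "five", "six"}  # all die outcomes
A = {"two", "three", "four", "five"}
B = {"four", "five", "six"}

print(A & B)            # intersection A ∩ B: {'four', 'five'}
print(A | B)            # union A ∪ B: {'two', 'three', 'four', 'five', 'six'}
print(omega - (A | B))  # complement (A ∪ B)′: {'one'}
print(A - B)            # set difference A \ B: {'two', 'three'}
print(omega - (A - B))  # complement (A \ B)′: {'one', 'four', 'five', 'six'}
```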
Probability Theory
To consider probabilities, we need:
1. Sample space: Ω
2. Event space: F
3. Probability measure: P
Probability Theory
To consider probabilities, we need:
1. Sample space: Ω – the set of all possible outcomes
Ω = {heads, tails}
Ω = {one, two, three, four, five, six}
Probability Theory
To consider probabilities, we need:
2. Event space: F – the set of all possible events
For Ω = {heads, tails}:
F = {{heads, tails}, {heads}, {tails}, ∅}
Probability Theory
To consider probabilities, we need:
3. Probability measure: P
P : F → [0, 1]
P must satisfy two axioms:
P(Ω) = 1 – the probability that some outcome occurs is 1 (100% chance)
P(∪_i A_i) = Σ_i P(A_i) whenever A_1, A_2, … are disjoint
E.g. for a fair die:
P({one, two}) = P({one}) + P({two})
1/3 = 1/6 + 1/6
Probability Theory
To consider probabilities, we need:
1. Sample space: Ω
2. Event space: F
3. Probability measure: P
As such, a Probability Space is the triple (Ω, F, P)
Probability Theory
To consider probabilities, we need the triple (Ω, F, P),
i.e. we need to know:
1. The set of potential outcomes;
2. The set of potential events that may occur; and
3. The probabilities associated with the occurrence of those events.
Probability Theory
Notable properties of a Probability Space (Ω, F, P):
Probability Theory
Notable properties of a Probability Space (Ω, F, P):
P(A′) = 1 − P(A)
E.g. for a fair die:
A = {one, two}
A′ = {three, four, five, six}
P(A) = 1/3
P(A′) = 2/3
Probability Theory
Notable properties of a Probability Space (Ω, F, P):
P(A′) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
[Venn diagram: overlapping sets A and B – P(A) + P(B) counts the intersection twice, so P(A ∩ B) is subtracted once]
Probability Theory
Notable properties of a Probability Space (Ω, F, P):
P(A′) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
A = {one, two}              P(A) = 1/3
B = {two, three}            P(B) = 1/3
A ∪ B = {one, two, three}   P(A ∪ B) = 1/2
A ∩ B = {two}               P(A ∩ B) = 1/6
Probability Theory
Notable properties of a Probability Space (Ω, F, P):
P(A′) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A ⊆ B then P(A) ≤ P(B) and P(B \ A) = P(B) − P(A)
[Venn diagram: A nested inside B]
A = {one, two}
B = {one, two, three}
P(A) = 1/3
P(B) = 1/2
B \ A = {three}
P(B \ A) = 1/6
Probability Theory
Notable properties of a Probability Space (Ω, F, P):
P(A′) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A ⊆ B then P(A) ≤ P(B) and P(B \ A) = P(B) − P(A)
P(∅) = 0
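A minimal sketch checking these properties numerically, assuming a uniform measure on a fair die (the `P` helper below is illustrative, not part of the lecture):

```python
from fractions import Fraction

# Uniform probability measure on a fair die: P(event) = |event| / |Ω|.
omega = {"one", "two", "three", "four", "five", "six"}

def P(event):
    return Fraction(len(event), len(omega))

A = {"one", "two"}
B = {"one", "two", "three"}

assert P(omega - A) == 1 - P(A)            # P(A′) = 1 − P(A)
assert P(A | B) == P(A) + P(B) - P(A & B)  # inclusion–exclusion
assert A <= B and P(A) <= P(B)             # monotonicity for A ⊆ B
assert P(B - A) == P(B) - P(A)             # P(B \ A) = P(B) − P(A)
assert P(set()) == 0                       # P(∅) = 0
print("all properties hold")
```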
Probability Theory
So where’s this all going? These examples are trivial!
Suppose there are three bags, B1, B2 and B3, each of which contains a
number of coloured balls:
• B1 – 2 red and 4 white
• B2 – 1 red and 2 white
• B3 – 5 red and 4 white
A ball is randomly removed from one of the bags.
The bags were selected with probability:
• P(B1) = 1/3
• P(B2) = 5/12
• P(B3) = 1/4
What is the probability that the ball came from B1, given it is red?
Probability Theory
Conditional probability:
P(A | B) = P(A ∩ B) / P(B)
Partition Theorem:
P(A) = Σ_i P(A ∩ B_i)   if the B_i partition Ω
Bayes’ Theorem:
P(A | B) = P(B | A) P(A) / P(B)
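Applying these theorems to the bag question above (a sketch using exact `Fraction` arithmetic; the dictionary names are illustrative):

```python
from fractions import Fraction as F

# P(Bi): the probability each bag was selected.
prior = {"B1": F(1, 3), "B2": F(5, 12), "B3": F(1, 4)}
# P(red | Bi): the fraction of red balls in each bag.
p_red = {"B1": F(2, 6), "B2": F(1, 3), "B3": F(5, 9)}

# Partition Theorem: P(red) = Σ_i P(red | Bi) P(Bi)
total = sum(prior[b] * p_red[b] for b in prior)

# Bayes' Theorem: P(B1 | red) = P(red | B1) P(B1) / P(red)
posterior = p_red["B1"] * prior["B1"] / total
print(posterior)  # 2/7
```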
Random Variables
A Random Variable is an object whose value is determined by chance,
i.e. random events.
It maps elements of Ω onto real numbers, with corresponding probabilities
as specified by P.
Formally, a Random Variable is a function:
X : Ω → ℝ
Probability that the random variable X adopts a particular value x:
P({ω ∈ Ω : X(ω) = x})
Shorthand:
P(X = x)
Random Variables
Example: toss a fair coin.
If the result is heads then WIN – X takes the value 1
If the result is tails then LOSE – X takes the value 0
Ω = {heads, tails}
X : Ω → {0, 1}
P(X = x) = P({heads})   if x = 1
P(X = x) = P({tails})   if x = 0
P(X = x) = 1/2 for x ∈ {0, 1}
Random Variables
Example: roll a fair die.
Ω = {one, two, three, four, five, six}
Win £20 on a six, nothing on four/five, lose £10 on one/two/three
X : Ω → {−10, 0, 20}
P(X = x) = P({six}) = 1/6               if x = 20
P(X = x) = P({four, five}) = 1/3        if x = 0
P(X = x) = P({one, two, three}) = 1/2   if x = −10
Note – we are considering the probabilities of events in F
Probability Mass Functions
Given a random variable:
X : Ω → A
the Probability Mass Function is defined as:
p_X(x) = P(X = x)
(only for discrete random variables)
Probability Mass Functions
Example:
Win £20 on a six, nothing on four/five, lose £10 on one/two/three
p_X(x) = P({six}) = 1/6               if x = 20
p_X(x) = P({four, five}) = 1/3        if x = 0
p_X(x) = P({one, two, three}) = 1/2   if x = −10
Probability Mass Functions
Notable properties of Probability Mass Functions:
p_X(x) ≥ 0
Σ_{x∈A} p_X(x) = 1
Interesting note:
If p(·) is some function that has the above two properties, then it is
the mass function of some random variable…
Probability Mass Functions
For a random variable X : Ω → A
Mean:
E(X) = Σ_{x∈A} x p_X(x)
Compare with the sample mean: (1/n) Σ_{i=1}^{n} x_i
Median:
any m such that: Σ_{x≤m} p_X(x) ≥ 1/2 and Σ_{x≥m} p_X(x) ≥ 1/2
Mode:
argmax_x p_X(x) : the most likely value
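These three summaries can be computed for the die-winnings pmf from the earlier example (a sketch; note that the median here is not unique):

```python
from fractions import Fraction as F

# The die-winnings pmf: p_X(20) = 1/6, p_X(0) = 1/3, p_X(-10) = 1/2.
pmf = {20: F(1, 6), 0: F(1, 3), -10: F(1, 2)}
assert sum(pmf.values()) == 1          # a valid pmf sums to 1

mean = sum(x * p for x, p in pmf.items())
print(mean)                            # -5/3: on average you lose money

mode = max(pmf, key=pmf.get)
print(mode)                            # -10, the most likely value

# A median is any m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2.
medians = [m for m in pmf
           if sum(p for x, p in pmf.items() if x <= m) >= F(1, 2)
           and sum(p for x, p in pmf.items() if x >= m) >= F(1, 2)]
print(sorted(medians))                 # [-10, 0]: both values qualify
```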
Common Discrete Distributions
The Bernoulli Distribution: X ~ Bern(p)
p : success probability
X : Ω → {0, 1}
p_X(x) = p       if x = 1
p_X(x) = 1 − p   if x = 0
Common Discrete Distributions
The Bernoulli Distribution: X ~ Bern(p)
p : success probability
X : Ω → {0, 1}
p_X(x) = p       if x = 1
p_X(x) = 1 − p   if x = 0
Example: a fair coin toss
X : {heads, tails} → {0, 1}
p_X(x) = 1/2 for x ∈ {0, 1}
Therefore X ~ Bern(1/2)
Common Discrete Distributions
The Binomial Distribution: X ~ Bin(n, p)
E(X) = np
n : number of independent trials
p : success probability
X : Ω → {0, 1, …, n}
p_X(x) = C(n, x) p^x (1−p)^(n−x)
Common Discrete Distributions
The Binomial Distribution: X ~ Bin(n, p)
E(X) = np
n : number of independent trials
p : success probability
X : Ω → {0, 1, …, n}
p_X(x) = C(n, x) p^x (1−p)^(n−x)   : probability of getting x successes out of n trials
p^x                                : probability of x successes
(1−p)^(n−x)                        : probability of (n−x) failures
C(n, x) = n! / (x!(n−x)!)          : number of ways to achieve x successes and (n−x) failures
                                     (Binomial coefficient)
Common Discrete Distributions
The Binomial Distribution: X ~ Bin(n, p)
E(X) = np
For n = 1:
p_X(x) = p^x (1−p)^(1−x) = p       if x = 1
                         = 1 − p   if x = 0
X ~ Bin(1, p)  ⇔  X ~ Bern(p)
Common Discrete Distributions
The Binomial Distribution: X ~ Bin(n, p)
E(X) = np
n : number of independent trials
p : success probability
X : Ω → {0, 1, …, n}
p_X(x) = C(n, x) p^x (1−p)^(n−x)
[Plots of the pmf for n = 10, p = 0.5; n = 100, p = 0.5; and n = 100, p = 0.8]
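A quick numerical sketch of the Binomial pmf using Python's `math.comb` (the function name `binom_pmf` is illustrative), checking that it sums to 1 and that E(X) = np:

```python
from math import comb

def binom_pmf(x, n, p):
    # C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

assert abs(sum(probs) - 1) < 1e-12                       # pmf sums to 1
mean = sum(x * px for x, px in zip(range(n + 1), probs))
assert abs(mean - n * p) < 1e-12                         # E(X) = np
print(max(range(n + 1), key=lambda x: binom_pmf(x, n, p)))  # mode: 5
```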
Common Discrete Distributions
Example:
Number of heads in n fair coin toss trials
X : Ω → {0, 1, …, n}
For n = 2:
Ω = {heads:heads, heads:tails, tails:heads, tails:tails}
In general: |Ω| = 2^n
Notice:
X ~ Bin(n, 1/2)
p_X(x) = C(n, x) 0.5^n
E(X) = n/2
Common Discrete Distributions
The Poisson Distribution: X ~ Pois(λ)
E(X) = λ
Used to model the number of occurrences of an event
within a particular interval of time and/or space
λ : average number of counts (controls rarity of events)
X : Ω → {0, 1, …}
p_X(x) = λ^x e^{−λ} / x!
Common Discrete Distributions
The Poisson Distribution:
• Want to know the distribution of the number of occurrences of an event
  ⇒ Binomial?
• However, don’t know how many trials are performed – could be infinite!
• But we do know the average rate of occurrence:
X ~ Bin(n, p) ⇒ E(X) = np
⇒ λ = np  ⇒  p = λ/n  ⇒  E(X) = λ
Common Discrete Distributions
Binomial:
p_X(x) = n!/(x!(n−x)!) · p^x (1−p)^(n−x)
Substituting p = λ/n:
p_X(x) = n!/(x!(n−x)!) · (λ/n)^x (1 − λ/n)^(n−x)
Rearranging:
p_X(x) = (λ^x / x!) · n!/(n^x (n−x)!) · (1 − λ/n)^n / (1 − λ/n)^x
As n → ∞, n!/(n^x (n−x)!) → 1 and (1 − λ/n)^x → 1, so:
p_X(x) = lim_{n→∞} (λ^x / x!) (1 − λ/n)^n = (λ^x / x!) e^{−λ}   ∎
Common Discrete Distributions
The Poisson distribution is the Binomial distribution as n → ∞ with λ = np fixed.
If Xn ~ Bin(n, p) then Xn →d Pois(np)
If n is large and p is small then the Binomial distribution can be approximated using
the Poisson distribution.
This is referred to as the:
• “Poisson Limit Theorem”
• “Poisson Approximation to the Binomial”
• “Law of Rare Events”
Poisson is often more computationally convenient than Binomial.
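A sketch illustrating the approximation numerically, comparing Bin(1000, 0.005) with Pois(5) (the parameter choice is illustrative: large n, small p, λ = np = 5):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # Binomial pmf: C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x, lam):
    # Poisson pmf: λ^x e^{-λ} / x!
    return lam**x * exp(-lam) / factorial(x)

n, p = 1000, 0.005          # large n, small p; λ = np = 5
for x in range(10):
    b, q = binom_pmf(x, n, p), pois_pmf(x, 5.0)
    assert abs(b - q) < 1e-3  # the two pmfs agree closely point by point
print("Bin(1000, 0.005) is well approximated by Pois(5)")
```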
References
Countless books + online resources!
Probability theory and distributions:
• Grimmett and Stirzaker (2001) Probability and Random Processes.
Oxford University Press.
General comprehensive introduction to (almost) everything
mathematics (including a bit of probability theory):
• Garrity (2002) All the mathematics you missed: but need to know
for graduate school. Cambridge University Press.