Random variables
Discrete RVs on a discrete S
Discrete probability models
STAT 391
Discrete Random Variables
Emanuela Furfaro
Winter 2023
These slides are based on Chapter 3 and Chapter 8 of the textbook “Probability and
Statistics for Computer Science” by Marina Meila.
1 / 32
Random variables
É A random variable (r.v.) is defined as a function that associates
a number to each element of the outcome space. Hence, any function
r : S → (−∞, +∞) is a random variable.
É In general, for a random variable Y and an outcome space S,
with Y : S → SY ⊂ (−∞, +∞): the range of Y, SY is called the
outcome space of the random variable Y.
É The SY cannot have more elements than the original S.
É If the range (i.e. the outcome space) SY of a RV Y is discrete,
then Y is called a discrete random variable.
É If SY is continuous, then Y is called a continuous random variable.
É On a discrete S one can have only discrete RV’s.
É If S is continuous, one can construct both discrete and
continuous RV’s.
2 / 32
Probability distribution of a RV
É The probability distribution of a RV Y, denoted by PY, is a
probability over SY:
PY (y) = P ({s ∈ S|Y(s) = y})
É We will indicate random variables by capital letters and their
values by the same letter, in lower case
É PY (y) is the probability of the event Y(s) = y and we will often
simply write P(y).
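To make this definition concrete, here is a minimal Python sketch (our illustration, not part of the slides) that builds PY by summing P(s) over all outcomes s with Y(s) = y; the two-dice example is made up for this purpose.

```python
from itertools import product
from collections import defaultdict

# Outcome space S: ordered pairs of two fair die rolls; each outcome has probability 1/36.
P = {(a, b): 1 / 36 for a, b in product(range(1, 7), repeat=2)}

# Random variable Y(s) = sum of the two rolls; its outcome space is SY = {2, ..., 12}.
P_Y = defaultdict(float)
for s, p in P.items():
    P_Y[sum(s)] += p              # P_Y(y) = P({s in S : Y(s) = y})

print(P_Y[7])                     # 6/36 ≈ 0.1667
```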
3 / 32
Example
É We toss a fair coin (a coin with P(H) = P(T) = 0.5).
É The sample space is S = {H, T}.
É Let Y be the random variable that counts the number of tails observed;
Y takes values in SY = {0, 1}.
4 / 32
Repeated independent trials
É A coin tossed n times and a series of n die rolls are both
examples of experiments with repeated trials.
É In an experiment with n repeated trials, the outcome space is
S^n = S × S × ... × S (n times).
É The elements of S^n are length-n sequences of elements of S.
É If, in a set of repeated trials, the outcome of a trial is not
influenced in any way by the outcomes of the other trials,
either taken together or separately, we say that the trials are
independent.
É For independent trials, the probability of a sequence is obtained by
multiplying the probabilities of its individual outcomes.
5 / 32
Example
É A fair coin (P(T) = P(H) = 0.5) is tossed 3 times.
É The following 2 numbers that can be associated with each outcome of
the experiment are random variables on this space:
É Y: the number of heads in 3 tosses
É X: the position of the first toss that is heads (0 if no heads appear)
outcome   Y   X   P(outcome)
TTT       0   0   1/8
HTT       1   1   1/8
THT       1   2   1/8
TTH       1   3   1/8
HHT       2   1   1/8
HTH       2   1   1/8
THH       2   2   1/8
HHH       3   1   1/8
É We can use the definition of independent events to compute P(outcome):
P(outcome) = (1/2) · (1/2) · (1/2) = 1/8 for all outcomes.
6 / 32
Example (cnt.)
É It follows that SY = {0, 1, 2, 3} and SX = {0, 1, 2, 3}.
É An important question that will often occur is “What is the probability
that a RV takes a certain value"?
É Since the outcomes are disjoint:
PY(0) = 1/8
PY(1) = 1/8 + 1/8 + 1/8 = 3/8
PY(2) = 1/8 + 1/8 + 1/8 = 3/8
PY(3) = 1/8
PX(0) = 1/8
PX(1) = 1/8 + 1/8 + 1/8 + 1/8 = 4/8
PX(2) = 1/8 + 1/8 = 2/8
PX(3) = 1/8
É The events Y = y for y = 0, 1, 2, 3 are disjoint events, and their union is
equal to the whole sample space S.
É Similarly, the events X = x for x = 0, 1, 2, 3 are disjoint events, and their
union is equal to the whole sample space S.
É If we are interested only in Y (or only in X) instead of the experiment's
outcome itself, then we can ignore the original outcome space S and
instead look at the outcome space SY of Y or SX of X.
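The whole derivation in this example can be reproduced with a short Python sketch (ours, not from the slides); the helper functions Y and X mirror the random variables defined above.

```python
from itertools import product
from collections import defaultdict

# All length-3 toss sequences; by independence each has probability (1/2)^3 = 1/8.
P = {"".join(seq): (1 / 2) ** 3 for seq in product("HT", repeat=3)}

def Y(s):   # number of heads in the sequence
    return s.count("H")

def X(s):   # position of the first head, 0 if no heads appear
    return s.index("H") + 1 if "H" in s else 0

P_Y, P_X = defaultdict(float), defaultdict(float)
for s, p in P.items():
    P_Y[Y(s)] += p
    P_X[X(s)] += p

print(sorted(P_Y.items()))   # [(0, 0.125), (1, 0.375), (2, 0.375), (3, 0.125)]
print(sorted(P_X.items()))   # [(0, 0.125), (1, 0.5), (2, 0.25), (3, 0.125)]
```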
7 / 32
Expectation of a discrete RV
É The expectation of a discrete RV Y is a real number computed
as:
E[Y] = Σ_{y ∈ SY} y · PY(y)
É The expectation is often also called expected value, average,
mean.
É It is the “center of balance" of a distribution.
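For instance, a minimal Python computation (ours) of the expectation from a pmf stored as a dictionary, using the 3-toss head count from the earlier example:

```python
# E[Y] = sum over y in SY of y * P_Y(y); here Y = number of heads in 3 fair tosses.
P_Y = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

E_Y = sum(y * p for y, p in P_Y.items())
print(E_Y)   # 1.5
```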
8 / 32
Properties
NOTE: This is valid for continuous and discrete RVs, but here we
only refer to discrete RVs.
1. The expectation of a constant random variable is the constant
itself.
2. Let X be a random variable, c a real constant, and Y = cX. Then
E[Y] = cE[X]
3. The expectation of the sum of n r.v.'s Y1, Y2, ..., Yn is equal to the
sum of their expectations:
E[Σ_{i=1}^n Yi] = Σ_{i=1}^n E[Yi]
9 / 32
Expectation of a discrete RV (ctd.)
NOTE: This is valid for continuous and discrete RVs, but here we
only refer to discrete RVs.
É Given a random variable Y with probability distribution PY (y),
and a function g(Y):
E[g(Y)] = Σ_{y ∈ SY} g(y) PY(y)
É In other words, the expectation of the transformed random variable
g(Y) can be found without finding the distribution of the transformed
random variable, simply by applying the probability weights of the
original random variable to the transformed values.
É This theorem is sometimes referred to as the “Law of the unconscious
statistician” (LOTUS).
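As a small illustration (ours, not from the slides), LOTUS with g(y) = y² applied to the 3-toss pmf:

```python
# LOTUS: E[g(Y)] = sum over y in SY of g(y) * P_Y(y), without deriving the pmf of g(Y).
P_Y = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
g = lambda y: y ** 2

E_gY = sum(g(y) * p for y, p in P_Y.items())
print(E_gY)   # 3.0, i.e. E[Y^2] for the 3-toss head count
```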
10 / 32
Example (slide 6)
Calculate the expected value of X and the expected value of Y.
11 / 32
Variance
É The variance is defined as
Var[Y] = E[(Y − E[Y])²]
É It can be thought of as a special kind of expectation which
measures the average squared deviations from the expected
value E [Y].
É The variance is often computed using the following formula:
Var[Y] = E[Y²] − (E[Y])²
which for the discrete case translates into
Var[Y] = Σ_{y ∈ SY} y² PY(y) − (Σ_{y ∈ SY} y PY(y))².
É The square root of the variance is called standard deviation.
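A short Python check (ours) that the definition and the shortcut formula agree, again for the 3-toss head count:

```python
# Variance computed two ways for Y = number of heads in 3 fair tosses.
P_Y = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

E_Y = sum(y * p for y, p in P_Y.items())
var_def = sum((y - E_Y) ** 2 * p for y, p in P_Y.items())        # E[(Y - E[Y])^2]
var_short = sum(y ** 2 * p for y, p in P_Y.items()) - E_Y ** 2   # E[Y^2] - (E[Y])^2

print(var_def, var_short)   # both 0.75
```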
12 / 32
Properties of the variance
1. The variance is always ≥ 0. When the variance is 0, the RV is
deterministic (in other words it takes one value only).
2. Given a RV Y and a real constant c, Var[cY] = c² Var[Y]
3. Given two RVs X and Y, the variance of their sum
Var [X + Y] = Var [X] + Var [Y] + 2Cov [X, Y].
É If X and Y are independent: Var [X + Y] = Var [X] + Var [Y].
É Given a sequence of independent variables Y1, Y2, ..., Yn:
Var[Σ_{i=1}^n Yi] = Σ_{i=1}^n Var[Yi].
13 / 32
Example (slide 6)
Calculate the Variance of X and the variance of Y.
14 / 32
Discrete probability models
É In this section we describe a few naturally occurring families of
discrete probability models that arise out of certain idealized
random experiments.
É We will say that a random variable belongs to a certain family
of probability models if it follows a specific distribution.
É The term “family” here refers to the fact that the distributions
of these random variables have specific functional forms that
can be described in terms of one or more parameters.
15 / 32
Bernoulli trials and Binomial experiments
Bernoulli trials or Binomial experiment refers to the following
random experiment:
É the experiment consists of a sequence of independent trials,
É each trial results in one of two possible outcomes – success or
failure (this is often called binary experiment),
É the probability of success in each trial is a fixed number θ with
0 < θ < 1.
16 / 32
The Binomial distribution
É Suppose we conduct a binomial experiment consisting of n
trials, where n is predetermined, and count the number of
successes in these n trials.
É The latter random variable Y is said to be a Binomial random
variable with parameters (n, θ) if it follows the Binomial
distribution given by:
PY(y) = P(Y = y) = (n choose y) θ^y (1 − θ)^(n−y),   y = 0, 1, 2, ..., n,
where (n choose y) = n! / (y! (n − y)!).
É The expectation and variance are given by:
E [Y] = nθ and Var [Y] = nθ(1 − θ)
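A quick Python sketch (ours; it assumes scipy.stats is available) evaluating the Binomial pmf, mean and variance for the 3-toss example:

```python
from scipy.stats import binom

n, theta = 3, 0.5                                       # number of trials, success probability
print([binom.pmf(y, n, theta) for y in range(n + 1)])   # [0.125, 0.375, 0.375, 0.125]
print(binom.mean(n, theta), binom.var(n, theta))        # 1.5 0.75, i.e. n*theta and n*theta*(1-theta)
```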
17 / 32
Example (slide 6)
In the experiment described in slide 6, we had obtained the
following table:
y    PY(y)
0    PY(0) = 1/8
1    PY(1) = 1/8 + 1/8 + 1/8 = 3/8
2    PY(2) = 1/8 + 1/8 + 1/8 = 3/8
3    PY(3) = 1/8
Once we establish that Y ∼ Binomial(3, 0.5), we can readily
compute the probability distribution without having to construct S
and SY . For instance:
PY(1) = (3 choose 1) 0.5^1 (1 − 0.5)^(3−1) = 0.375
18 / 32
The Binomial distribution
[Figure: Binomial pmfs p(x) for n = 5, 30, 100 and p = 0.1, 0.5, 0.9: Y~Binom(p=0.1, n=5), Y~Binom(p=0.5, n=5), Y~Binom(p=0.9, n=5); Y~Binom(p=0.1, n=30), Y~Binom(p=0.5, n=30), Y~Binom(p=0.9, n=30); Y~Binom(p=0.1, n=100), Y~Binom(p=0.5, n=100), Y~Binom(p=0.9, n=100).]
19 / 32
The Bernoulli distribution
É The special case where Y ∼ Binomial(n, θ) and n = 1 is referred
to as the Bernoulli distribution.
É A Bernoulli RV has only one parameter θ and its probability
distribution is given by PY(y) = θ^y (1 − θ)^(1−y) for y ∈ {0, 1},
i.e. PY(1) = θ and PY(0) = 1 − θ.
É E[Y] = θ
É Var[Y] = θ(1 − θ)
É The Binomial RV is often derived as the sum of n independent
and identical Bernoullis.
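A tiny Python check (ours; scipy assumed) of these quantities:

```python
from scipy.stats import bernoulli

theta = 0.3                                          # illustrative value of the parameter
print(bernoulli.pmf([0, 1], theta))                  # [0.7 0.3]
print(bernoulli.mean(theta), bernoulli.var(theta))   # 0.3 0.21, i.e. theta and theta*(1-theta)
```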
20 / 32
Exercise: Binomial RV as the sum of Bernoullis
The Binomial RV is often derived as the sum of n independent and
identical Bernoullis. Let X1, X2, ..., Xn be independent and identically
distributed Bernoulli(θ) RVs and let Y = Σ_{i=1}^n Xi. Find E[Y] and Var[Y].
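As a numerical sanity check (ours; it does not replace the algebraic derivation asked for above), one can simulate sums of independent Bernoullis and compare the empirical moments with nθ and nθ(1 − θ):

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 10, 0.3, 200_000

# Each row is one experiment: n independent Bernoulli(theta) draws, summed into Y.
Y = rng.binomial(1, theta, size=(reps, n)).sum(axis=1)

print(Y.mean(), n * theta)                  # empirical mean vs 3.0
print(Y.var(), n * theta * (1 - theta))     # empirical variance vs 2.1
```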
21 / 32
Historical fact (Ross, 2005)
Independent trials having a common probability of success were first
studied by the Swiss mathematician Jacques Bernoulli (1654–1705). His
book Ars Conjectandi (The Art of Conjecturing) was published by his nephew
Nicholas eight years after his death in 1713. Jacques Bernoulli was from the
first generation of the most famous mathematical family of all time.
Altogether, there were between 8 and 12 Bernoullis, spread over three
generations, who made fundamental contributions to probability, statistics,
and mathematics. One difficulty in knowing their exact number is the fact
that several had the same name. (For example, two of the sons of Jacques’s
brother Jean were named Jacques and Jean.) Another difficulty is that
several of the Bernoullis were known by different names in different places.
Our Jacques (sometimes written Jaques) was, for instance, also known as
Jakob (sometimes written Jacob) and as James Bernoulli. But whatever their
number, their influence and output were prodigious.
22 / 32
Multinomial distribution
É The multinomial distribution is a generalization of the binomial
distribution
É The generalization consists in allowing us to model experiments
that are not binary.
É Let’s consider a trial that has m outcomes or m categories, with
each category having a given fixed success probability.
É For n independent trials each of which leads to a success for
exactly one of the m categories, the multinomial distribution
gives the probability of any particular combination of numbers
of successes for the various categories.
23 / 32
Multinomial distribution: Example
É Let’s start with the simple example of a binary experiment.
É Let’s assume a box contains 5 red balls and 15 yellow balls.
É We draw 3 balls with replacement.
É The following 2 numbers that can be associated with each outcome of
the experiment are random variables on this space:
É n1 : the number of red balls in 3 draws
É n2 : the number of yellow balls in 3 draws
É Let θ1 = 5/ 20 = 1/ 4 indicate the probability of a red ball and
θ2 = 15/ 20 = 3/ 4 the probability of a yellow ball,
outcome   n1   n2   P(outcome)
YYY       0    3    (3/4)^3 = 27/64
RYY       1    2    (1/4)(3/4)^2 = 9/64
YRY       1    2    9/64
YYR       1    2    9/64
RRY       2    1    (1/4)^2 (3/4) = 3/64
RYR       2    1    3/64
YRR       2    1    3/64
RRR       3    0    (1/4)^3 = 1/64
24 / 32
Multinomial distribution: Example
É Note that n = n1 + n2 and therefore the combination of counts
n1, n2 can occur n!/(n1! n2!) = (n choose n1) = (n choose n2) times.
É The probability of observing counts n1 and n2 is then given by:
(n choose n1) θ1^n1 θ2^n2
É Let’s assume that the box contains 5 red balls, 10 yellow balls
and 5 blue balls.
É We then have θ1 = 5/20 = 1/4, θ2 = 10/20 = 1/2 and
θ3 = 5/20 = 1/4:
outcome   n1   n2   n3   P(outcome)
RRR       3    0    0    (1/4)^3 = 1/64
YYY       0    3    0    (1/2)^3 = 8/64
BBB       0    0    3    (1/4)^3 = 1/64
RYY       1    2    0    (1/4)(1/2)^2 = 4/64
RYB       1    1    1    (1/4)(1/2)(1/4) = 2/64
...       ...  ...  ...  ...
25 / 32
Multinomial distribution: Example
É Note that n = n1 + n2 + n3 and therefore the combination of
counts n1, n2, n3 can occur n!/(n1! n2! n3!) = (n choose n1, n2) times.
É The combination of counts n1, n2, n3 has probability:
(n choose n1, n2) θ1^n1 θ2^n2 θ3^n3
É Generally speaking, the probability of observing a set of counts
(n1, ..., nm) is obtained by multiplying the probability of one
sequence with those counts by the total number of sequences
exhibiting those counts:
P(n1, n2, ..., nm) = n!/(n1! n2! ... nm!) ∏_{i=1}^m θi^ni
26 / 32
Multinomial distribution
É Let Yi denote the number of successes in category i (i = 1, ..., m),
and let θi denote the probability that a given trial is a success in
category i. The probability distribution of a multinomial random
variable is:
PY1,Y2,...,Ym(y1, y2, ..., ym) = n!/(y1! y2! ... ym!) ∏_{i=1}^m θi^yi
É For m = 2, the above equation is the binomial distribution.
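A small Python sketch (ours; a reasonably recent scipy is assumed) evaluating this pmf for the ball-drawing example with n = 3 and θ = (1/4, 1/2, 1/4):

```python
from math import factorial, prod
from scipy.stats import multinomial

n, theta = 3, [1/4, 1/2, 1/4]        # probabilities for red, yellow, blue
counts = [1, 1, 1]                   # e.g. one ball of each color in 3 draws

# Direct evaluation of n!/(n1! ... nm!) * prod(theta_i^(n_i))
coef = factorial(n) / prod(factorial(k) for k in counts)
p_manual = coef * prod(t ** k for t, k in zip(theta, counts))

print(p_manual)                                # 0.1875 = 6 * (1/4)(1/2)(1/4)
print(multinomial.pmf(counts, n=n, p=theta))   # same value via scipy
```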
27 / 32
Geometric distribution
É Suppose that independent trials, each having a probability θ,
0 < θ < 1, of being a success, are performed until a success
occurs. If we let Y equal the number of trials required, then
PY(y) = (1 − θ)^(y−1) θ
with y = 1, 2, ...
É Any random variable Y whose probability distribution is given
by equation above is said to be a geometric random variable
with parameter θ.
É This model is useful when we are interested in the number of
trials needed to obtain the first success.
É E[Y] = 1/θ and Var[Y] = (1 − θ)/θ²
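A brief Python sketch (ours; scipy's geom uses the same "number of trials until the first success" convention) checking the pmf and the moments:

```python
from scipy.stats import geom

theta = 0.25                                      # illustrative success probability
print([geom.pmf(y, theta) for y in range(1, 5)])  # (1 - theta)^(y-1) * theta for y = 1..4
print(geom.mean(theta), geom.var(theta))          # 4.0 12.0, i.e. 1/theta and (1-theta)/theta^2
```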
28 / 32
Geometric distribution
[Figure: Geometric pmfs for Y~geom(p=0.1), Y~geom(p=0.5) and Y~geom(p=0.8).]
29 / 32
The Poisson distribution
É A random variable Y is said to be a Poisson RV with parameter
λ if it follows the Poisson distribution given by:
PY(y) = P(Y = y) = e^(−λ) λ^y / y!,   y = 0, 1, 2, ...
É The mean and variance are given by: E[Y] = λ and Var[Y] = λ.
É λ is called the rate parameter and it represents the average
number of occurrences in a given interval.
É It expresses the probability of a given number of events
occurring in a fixed interval of time or space if these events
occur with a known constant mean rate and independently of
the time since the last event.
É It can be derived as a limit of the Binomial distribution when θ
is small and n → ∞.
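A minimal Python sketch (ours, assuming scipy.stats) evaluating the Poisson pmf and checking that mean and variance both equal λ:

```python
from scipy.stats import poisson

lam = 4.0                                        # illustrative rate parameter
print([poisson.pmf(y, lam) for y in range(4)])   # e^(-lam) * lam^y / y! for y = 0..3
print(poisson.mean(lam), poisson.var(lam))       # 4.0 4.0
```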
30 / 32
The Poisson approximation to Binomial
Consider a large number n of Bernoulli trials, and let the probability
of success θ for each trial be such that θ → 0 and nθ → λ > 0 as
n → ∞. Then the probability distribution of Y, the number of
successes in these n trials (which has a Binomial(n, θ) distribution),
is approximately Poisson(λ), in the sense that, for every fixed
nonnegative integer y,
PY(y) = (n choose y) θ^y (1 − θ)^(n−y) → e^(−λ) λ^y / y!
as n → ∞.
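A quick numerical illustration (ours) of this limit: compare Binomial(n, λ/n) probabilities with the Poisson(λ) value for growing n.

```python
from scipy.stats import binom, poisson

lam, y = 3.0, 2
for n in (10, 100, 1000):
    theta = lam / n
    print(n, binom.pmf(y, n, theta), poisson.pmf(y, lam))
# The Binomial values approach e^(-3) * 3^2 / 2! ≈ 0.224 as n grows.
```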
31 / 32
The Poisson distribution
[Figure: Poisson pmfs for Y~Pois(0.5), Y~Pois(2), Y~Pois(5), Y~Pois(10), Y~Pois(20) and Y~Pois(30).]
32 / 32