Probability and Random Variables
Chapter 1 - Experiments, Models, and Probabilities
Haneul Ko
Contents
• Introduction
• Set Theory
• Applying Set Theory to Probability
• Conditional Probability
• Independence
Why Probability and Random Processes?
• When you flip a coin, what will be the outcome, head or tail?
– Approach 1 (Deterministic Approach)
▪ Collect all information about the coin, the flipping dynamics, the weather, the location, and so on
▪ Then, solve the complicated equations of mechanics (physics)
▪ Pros: We might be able to obtain a deterministic answer
▪ Cons: It is far too complicated, and we might not be able to solve the equations
– Approach 2 (Probabilistic Approach)
▪ Model the coin as a fair coin and assign a probability to each outcome (1/2 for heads and 1/2 for tails)
▪ Pros: Very simple
▪ Cons: We cannot obtain a deterministic answer
• Many problems (e.g., flipping a coin) in engineering (and other
fields) can be simplified by probabilistic approaches
– Example 1: What is the error rate when you download a movie through the Internet?
– Example 2: How do the vehicles move?
Probability Theory
• You already know what “probability” is
– What is the probability of seeing heads when flipping a coin?
– What is the probability of seeing an even number when rolling a die?
– What is the probability that you win a LoL match?
• Even though we all understand probability intuitively, we (or rather, mathematicians) need to define it explicitly
• Probability theory provides a mathematical definition of probability
Set Theory
The mathematics of probability relies on the theory of sets
Set Definitions
• Set
– A collection of objects
• Element
– An object of a set
– 𝑎 ∈ 𝐴: 𝑎 is an element of set 𝐴
– 𝑎 ∉ 𝐴: 𝑎 is not an element of set 𝐴
• Set relations
– 𝐴 ⊂ 𝐵: 𝐴 is a subset of 𝐵, 𝐴 is contained in 𝐵
▪ 𝑎∈𝐴⇒𝑎∈𝐵
▪ 𝑎 ∈ 𝐵, ∀𝑎 ∈ 𝐴
– Def. Two sets 𝐴 and 𝐵 are equal, denoted by 𝐴 = 𝐵, if and only if 𝐴 ⊂ 𝐵 and 𝐵 ⊂ 𝐴
– 𝐴 ⊊ 𝐵: 𝐴 is a proper subset of 𝐵
▪ 𝐴 ⊂ 𝐵 but 𝐴 ≠ 𝐵
▪ 𝑎 ∈ 𝐵, ∀𝑎 ∈ 𝐴, but ∃𝑎 ∈ 𝐵 such that 𝑎 ∉ 𝐴
– 𝐴 and 𝐵 are mutually exclusive or disjoint if they have no common elements
▪ Draw a Venn diagram
• A class of sets (a collection of sets)
– We need set operations to handle more than one set
– The universal set
▪ The largest or all-encompassing set of objects
▪ Often denoted by 𝑆
▪ The universal set 𝑆 is represented by a rectangle in a Venn diagram
Set Operations
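• Given a universal set $S$, we define the following set operations
– Complement: $A^c \triangleq \{x \in S \mid x \notin A\}$, also denoted by $\bar{A}$
– Union: $A \cup B \triangleq \{x \in S \mid x \in A \text{ or } x \in B\}$
– Intersection: $A \cap B \triangleq \{x \in S \mid x \in A \text{ and } x \in B\}$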
– Mutually exclusive (or disjoint) sets: $A \cap B = \emptyset$
– Collectively exhaustive sets: $A_1 \cup A_2 \cup \cdots \cup A_n = S$
– A collection of sets $A_1, \dots, A_n$ is a partition if it is both mutually exclusive ($A_i \cap A_j = \emptyset$ for all $i \ne j$) and collectively exhaustive
Algebra of Sets
• Commutative law
– 𝐴⋃𝐵 = 𝐵⋃𝐴
– 𝐴⋂𝐵 = 𝐵⋂𝐴
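• Distributive law
– $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
– $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
• Associative law
– $A \cap (B \cap C) = (A \cap B) \cap C = A \cap B \cap C$
– $A \cup (B \cup C) = (A \cup B) \cup C = A \cup B \cup C$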
• De Morgan's Law
– $(A \cup B)^c = A^c \cap B^c$
▪ To show $(A \cup B)^c \subset A^c \cap B^c$
▪ To show $A^c \cap B^c \subset (A \cup B)^c$
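The two inclusions above are what a proof must establish. As a sanity check (not a proof), the following minimal Python sketch tests both inclusions for every pair of subsets of a small, hypothetical universal set; the helper names `all_subsets`, `complement`, and `check_de_morgan` are illustrative only.

```python
from itertools import chain, combinations

def all_subsets(universe):
    """Return every subset of `universe` as a frozenset (the power set)."""
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def complement(a, universe):
    """Complement of `a` relative to the universal set `universe`."""
    return universe - a

def check_de_morgan(universe):
    """Check (A ∪ B)^c = A^c ∩ B^c for every pair of subsets A, B of `universe`."""
    subsets = all_subsets(universe)
    for a in subsets:
        for b in subsets:
            lhs = complement(a | b, universe)
            rhs = complement(a, universe) & complement(b, universe)
            # Set equality is exactly the two inclusions lhs ⊂ rhs and rhs ⊂ lhs.
            if not (lhs <= rhs and rhs <= lhs):
                return False
    return True

S = frozenset({1, 2, 3, 4})
print(check_de_morgan(S))  # True (checks 16 x 16 = 256 pairs)
```

A finite check like this can only confirm the law on the chosen universe; the general statement still needs the element-wise argument outlined above.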
Applying Set Theory to Probability
Repeatable Experiment
• Probability is based on a repeatable experiment that consists of a procedure and an observation
– Procedure: Flipping a coin / Observation: Did it land with heads or tails facing up?
– Procedure: Rolling a die / Observation: Which number is facing up?
– Procedure: Playing a LoL match / Observation: Win or lose?
• We create models of experiments because real experiments are generally too complicated to analyze
– It is necessary to study a model that captures the important parts of the actual physical experiment
Experiment Example
• Example 1.2
– Procedure: Monitor activity at a smartphone store
– Observation: Observe which type of phone (A for Apple vs. S for Samsung) the next customer purchases
– Model: Apple and Samsung are equally likely. The result of each purchase is
unrelated to the results of previous purchases
Definition
• Experiment: an imaginary performance (e.g., flipping a coin)
• Outcome: any possible observation (e.g., head or tail)
• Sample space: set of all possible outcomes of an experiment (e.g.,
{head, tail})
• Event: a set of outcomes of an experiment
Set Algebra   | Probability
Set           | Event
Universal set | Sample space
Element       | Outcome
[Figure: an event 𝐸 drawn inside the sample space 𝑆]
Probability
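• A probability model is a function that maps each event in the sample space to a real number
– The probability model assigns a number between 0 and 1 to every event
[Figure: an event $E$ in the sample space $S$ is mapped to the number $P[E]$ in the interval $[0, 1]$]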
Probability Axiom
• Axiom 1: For any event $A$, $P[A] \ge 0$
• Axiom 2: $P[S] = 1$
• Axiom 3: For any countable collection $A_1, A_2, \dots$ of mutually exclusive events, $P[A_1 \cup A_2 \cup \cdots] = P[A_1] + P[A_2] + \cdots$
Some Obvious Consequences of the Axioms
• $P[\emptyset] = 0$
• $P[A^c] = 1 - P[A]$
• $P[A \cup B] = P[A] + P[B] - P[A \cap B] = P[A] + P[B] - P[AB]$
• If $A \subset B$, then $P[A] \le P[B]$
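To make these consequences concrete, here is a minimal Python sketch that checks each of them on one fair six-sided die, assuming the equally likely model $P[E] = |E|/|S|$. The events `A`, `B`, `C` are chosen only for illustration, and exact `Fraction` arithmetic avoids floating-point round-off.

```python
from fractions import Fraction

# Sample space of one fair six-sided die; every outcome has probability 1/6.
S = frozenset(range(1, 7))

def P(event):
    """Equally likely model: P[E] = |E| / |S| for any event E ⊂ S."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})   # even number
B = frozenset({4, 5, 6})   # number at least 4
C = frozenset({4, 6})      # C ⊂ A

assert P(frozenset()) == 0                      # P[∅] = 0
assert P(S - A) == 1 - P(A)                     # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)       # inclusion-exclusion
assert C <= A and P(C) <= P(A)                  # monotonicity
print(P(A | B))  # 2/3
```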
Example
• Consider flipping three coins. Assume that the three coins are fair and distinguishable from each other
– What is the sample space?
– How many possible outcomes are there?
– What is the probability of HHT?
– What is the probability of seeing 2 tails?
– What is the probability of seeing 2 tails or 2 heads?
– What is the probability of seeing 2 tails and 2 heads at the same time?
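One way to answer these questions is to enumerate the sample space and count. The minimal Python sketch below does this, assuming that "2 tails" means exactly two tails (and likewise for heads) and that all eight ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of three distinguishable fair coins, e.g. "HHT".
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

def prob(event):
    """Equally likely model: P[E] = |E| / |S|."""
    return Fraction(len(event), len(sample_space))

# Assumption: "2 tails" / "2 heads" mean exactly two tails / exactly two heads.
two_tails = {s for s in sample_space if s.count("T") == 2}
two_heads = {s for s in sample_space if s.count("H") == 2}

print(sample_space)                   # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(len(sample_space))              # 8
print(prob({"HHT"}))                  # 1/8
print(prob(two_tails))                # 3/8
print(prob(two_tails | two_heads))    # 3/4
print(prob(two_tails & two_heads))    # 0
```

The last two lines mirror the set operations: "or" is a union and "at the same time" is an intersection, which is empty here because three coins cannot show two tails and two heads at once.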
Example
• You are monitoring traffic. You classify each vehicle as a car (𝑪) or a bike (𝑩). You also classify a vehicle as fast (𝑭) if it moves faster than 60 km/h, or slow (𝑺) otherwise. Based on the observed data, you build a probability model: $P[C] = 0.7$, $P[F] = 0.3$, and $P[CF] = 0.15$. Find the following probabilities
– $P[BF]$
– $P[C \cup F]$
– $P[BS]$
– $P[B \cup F]$
[Table: joint probabilities of vehicle type and speed]
  | F    | S
C | 0.15 | 0.55
B | 0.15 | 0.15
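The four answers follow from the complement rule and inclusion-exclusion. The minimal Python sketch below derives them from the three given numbers only, using the facts that $B = C^c$ and $S = F^c$ (so $F$ splits into the disjoint events $CF$ and $BF$); variable names are illustrative.

```python
from fractions import Fraction

# Model from the example (exact fractions avoid floating-point round-off).
P_C = Fraction(7, 10)     # P[C]: car
P_F = Fraction(3, 10)     # P[F]: fast
P_CF = Fraction(15, 100)  # P[CF]: car and fast

P_B = 1 - P_C                 # B = C^c, so P[B] = 1 - P[C]
P_BF = P_F - P_CF             # F is split into the disjoint events CF and BF
P_C_or_F = P_C + P_F - P_CF   # inclusion-exclusion
P_BS = P_B - P_BF             # B is split into the disjoint events BF and BS
P_B_or_F = P_B + P_F - P_BF   # inclusion-exclusion

for name, value in [("P[BF]", P_BF), ("P[C ∪ F]", P_C_or_F),
                    ("P[BS]", P_BS), ("P[B ∪ F]", P_B_or_F)]:
    print(name, float(value))
# P[BF] 0.15
# P[C ∪ F] 0.85
# P[BS] 0.15
# P[B ∪ F] 0.45
```

These values agree with the entries of the joint probability table.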
Birthday problem
• In probability theory, the birthday
problem asks for the probability
that, in a set of 𝒏 randomly
chosen people, at least two will
share a birthday
• The birthday paradox refers to the counterintuitive fact that only 23
people are needed for that probability to exceed 50%
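The 23-person figure comes from the complement: $P[\text{at least two share}] = 1 - P[\text{all distinct}] = 1 - \prod_{k=0}^{n-1} \frac{365-k}{365}$. The minimal Python sketch below evaluates this formula and searches for the smallest $n$ that exceeds 50%, under the usual simplifying assumption of 365 equally likely birthdays (February 29 ignored, independent people); the function name is illustrative.

```python
import math

DAYS = 365  # assumption: 365 equally likely birthdays, Feb 29 ignored

def p_shared_birthday(n, days=DAYS):
    """P[at least two of n people share a birthday] = 1 - P[all n birthdays distinct]."""
    if n > days:
        return 1.0  # pigeonhole principle: a repeat is certain
    # P[all distinct] = (days/days) * ((days-1)/days) * ... * ((days-n+1)/days)
    p_distinct = math.prod((days - k) / days for k in range(n))
    return 1.0 - p_distinct

n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n, round(p_shared_birthday(n), 4))  # 23 0.5073
```

`math.prod` requires Python 3.8 or later.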
Conditional Probability
A modified probability model that reflects partial information
• What is the probability of winning a LoL match?
– Marginal probability
• What is the probability of winning a LoL match when the top laner is Teemo?
– Conditional probability
Definition of Conditional Probability
• The conditional probability of the event $A$ given the occurrence of the event $B$ (or, for short, of $A$ given $B$) is given as
$$P[A|B] = \frac{P[AB]}{P[B]}$$
– Defined only when $P[B] > 0$
▪ $P[B] = 0$ means that $B$ essentially never occurs. In this case, it makes no sense to speak of the probability of $A$ given that $B$ occurs
Conditional Probability Axioms
• Axiom 1: For any event $A$, $P[A|B] \ge 0$
• Axiom 2: $P[B|B] = 1$
• Axiom 3: For any disjoint events $A_1, A_2, \dots, A_n$,
$$P[A_1 \cup A_2 \cup \cdots \cup A_n | B] = P[A_1|B] + P[A_2|B] + \cdots + P[A_n|B]$$
Example
• Roll two fair four-sided dice. Let 𝑿𝟏 and 𝑿𝟐 denote the number of dots that appear on die 1 and die 2, respectively.
– Let 𝐴 be the event 𝑋1 ≥ 2. What is 𝑃[𝐴]?
– Let 𝐵 denote the event 𝑋2 > 𝑋1 . What is 𝑃[𝐵]? What is 𝑃[𝐴|𝐵]?
Example
• Roll two fair six-sided dice. Let 𝑿𝟏 and 𝑿𝟐 denote the number of dots that appear on die 1 and die 2, respectively.
– Let 𝐴 be the event 𝑋1 ≥ 3. What is 𝑃[𝐴]?
– Let 𝐵 denote the event 𝑋2 > 𝑋1 . What is 𝑃[𝐵]? What is 𝑃[𝐴|𝐵]?
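Both dice examples can be checked by brute-force counting. The minimal Python sketch below assumes all ordered pairs $(x_1, x_2)$ are equally likely and applies the definition $P[A|B] = P[AB]/P[B]$ directly; the function `dice_example` and its parameters are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

def dice_example(sides, threshold):
    """Two fair dice with `sides` faces; A = {X1 >= threshold}, B = {X2 > X1}.

    Returns (P[A], P[B], P[A|B]) computed by counting equally likely outcomes.
    """
    outcomes = list(product(range(1, sides + 1), repeat=2))  # (x1, x2) pairs
    total = len(outcomes)
    A = {(x1, x2) for x1, x2 in outcomes if x1 >= threshold}
    B = {(x1, x2) for x1, x2 in outcomes if x2 > x1}
    p_a = Fraction(len(A), total)
    p_b = Fraction(len(B), total)
    p_ab = Fraction(len(A & B), total)
    return p_a, p_b, p_ab / p_b  # P[A|B] = P[AB] / P[B], valid since P[B] > 0

print(dice_example(4, 2))  # (Fraction(3, 4), Fraction(3, 8), Fraction(1, 2))
print(dice_example(6, 3))  # (Fraction(2, 3), Fraction(5, 12), Fraction(2, 5))
```

In both cases $P[A|B] < P[A]$: knowing that die 2 beat die 1 makes a large value on die 1 less likely.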
Markov Property
• The environment in which reinforcement learning (RL) is applied can be modeled as a Markov decision process (MDP)
– An MDP has the Markov property
• The Markov property is stated in terms of conditional probability: given the present state, the next state does not depend on the earlier states
$$P[X_n = x_n | X_{n-1} = x_{n-1}, \dots, X_0 = x_0] = P[X_n = x_n | X_{n-1} = x_{n-1}]$$
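To see the property as a statement about conditional probabilities, the minimal Python sketch below builds the exact joint distribution of $(X_0, X_1, X_2)$ for a hypothetical two-state Markov chain (the initial distribution `pi0` and transition matrix `T` are made up for illustration) and checks that conditioning on the full history gives the same answer as conditioning on the most recent state.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-state Markov chain (states 0 and 1), chosen only for illustration.
pi0 = [Fraction(1, 2), Fraction(1, 2)]            # P[X0 = i]
T = [[Fraction(9, 10), Fraction(1, 10)],          # T[i][j] = P[X_n = j | X_{n-1} = i]
     [Fraction(3, 10), Fraction(7, 10)]]

# Joint probability of every path (x0, x1, x2) under the chain.
joint = {(x0, x1, x2): pi0[x0] * T[x0][x1] * T[x1][x2]
         for x0, x1, x2 in product(range(2), repeat=3)}

def prob(pred):
    """Sum of path probabilities over paths satisfying `pred`."""
    return sum((p for path, p in joint.items() if pred(path)), Fraction(0))

for x0, x1, x2 in product(range(2), repeat=3):
    # P[X2 = x2 | X1 = x1, X0 = x0]: conditions on the full history
    full = prob(lambda s: s == (x0, x1, x2)) / prob(lambda s: s[:2] == (x0, x1))
    # P[X2 = x2 | X1 = x1]: conditions on the most recent state only
    recent = prob(lambda s: s[1:] == (x1, x2)) / prob(lambda s: s[1] == x1)
    assert full == recent == T[x1][x2]
print("Markov property holds for every (x0, x1, x2)")
```

Because the joint distribution is built as a product of one-step transitions, the history $X_0$ cancels out of every conditional probability.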
Law of Total Probability
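• For disjoint events $B_1, B_2, \dots, B_m$ such that $P[B_i] > 0$ for all $i$ and $S = \bigcup_{i=1}^{m} B_i$,
$$P[A] = \sum_{i=1}^{m} P[A|B_i]\,P[B_i]$$
[Figure: the sample space $S$ partitioned into $B_1, B_2, B_3, B_4$, with the event $A$ overlapping each part]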
Example
• Assume that 20% of Koreans, 10% of Chinese, and 5% of Japanese are handsome. Also assume that the populations of Korea, China, and Japan are 0.1B, 1.3B, and 0.2B, respectively.
– What is the probability that a randomly chosen person in East Asia (here, Korea, China, and Japan) is handsome?
Example - Solution
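A worked sketch of the solution, taking $A$ = handsome and $B_1, B_2, B_3$ = Korean, Chinese, Japanese, and assuming every person is equally likely to be chosen, so that $P[B_i]$ is each country's share of the total population of 1.6B:

$$\begin{aligned}
P[B_1] &= \frac{0.1}{1.6} = \frac{1}{16}, \qquad P[B_2] = \frac{1.3}{1.6} = \frac{13}{16}, \qquad P[B_3] = \frac{0.2}{1.6} = \frac{2}{16} \\
P[A] &= \sum_{i=1}^{3} P[A|B_i]\,P[B_i] = 0.2 \cdot \frac{1}{16} + 0.1 \cdot \frac{13}{16} + 0.05 \cdot \frac{2}{16} = \frac{0.2 + 1.3 + 0.1}{16} = \frac{1.6}{16} = 0.1
\end{aligned}$$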
Example
• A company has three machines 𝑩𝟏, 𝑩𝟐, and 𝑩𝟑 making 1 kΩ resistors. Resistors within 50 Ω of the nominal value are considered acceptable. It has been observed that 80% of the resistors produced by 𝑩𝟏 and 90% of the resistors produced by 𝑩𝟐 are acceptable. The percentage for machine 𝑩𝟑 is 60%. Each hour, machines 𝑩𝟏, 𝑩𝟐, and 𝑩𝟑 produce 3000, 4000, and 3000 resistors, respectively. All of the resistors are mixed together at random in one bin and packed for shipment.
• What is the probability that the company ships an acceptable resistor?
Example - Solution
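A minimal Python sketch of the law of total probability for this example. It assumes, as the problem implies, that a shipped resistor is equally likely to be any resistor in the bin, so $P[B_i]$ is machine $i$'s share of the hourly output; the dictionary names are illustrative.

```python
from fractions import Fraction

# Hourly production and acceptance rate of each machine (from the example).
production = {"B1": 3000, "B2": 4000, "B3": 3000}
p_accept_given = {"B1": Fraction(8, 10), "B2": Fraction(9, 10), "B3": Fraction(6, 10)}

total = sum(production.values())
# Resistors are mixed at random, so P[B_i] is machine i's share of the output.
prior = {m: Fraction(n, total) for m, n in production.items()}

# Law of total probability: P[A] = sum_i P[A|B_i] P[B_i], with A = "acceptable".
p_accept = sum(p_accept_given[m] * prior[m] for m in production)
print(p_accept, float(p_accept))  # 39/50 0.78
```

That is, $P[A] = 0.8 \cdot 0.3 + 0.9 \cdot 0.4 + 0.6 \cdot 0.3 = 0.78$.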
Bayes’ Theorem
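• Expressing $P[B|A]$ in terms of $P[A|B]$, or the other way around
$$P[B|A] = \frac{P[A|B]\,P[B]}{P[A]}$$
• Expanding the denominator with the law of total probability, for a partition $B_1, \dots, B_m$:
$$P[B_i|A] = \frac{P[A|B_i]\,P[B_i]}{\sum_{j=1}^{m} P[A|B_j]\,P[B_j]}$$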
Example
• Assume that 20% of Koreans, 10% of Chinese, and 5% of Japanese are handsome. Also assume that the populations of Korea, China, and Japan are 0.1B, 1.3B, and 0.2B, respectively. What is the probability that a handsome person is Korean?
$$P[B_i|A] = \frac{P[A|B_i]\,P[B_i]}{\sum_{j=1}^{m} P[A|B_j]\,P[B_j]}$$
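A minimal Python sketch of Bayes' theorem over a partition, applied to this example. It assumes every person is equally likely to be chosen, so the priors $P[B_i]$ are the population shares; the helper `posterior` is a hypothetical name, and it normalizes raw population weights before applying the formula.

```python
from fractions import Fraction

# Data from the example: populations in billions and P[handsome | nationality].
population = {"Korea": Fraction(1, 10), "China": Fraction(13, 10), "Japan": Fraction(2, 10)}
p_handsome_given = {"Korea": Fraction(20, 100), "China": Fraction(10, 100), "Japan": Fraction(5, 100)}

def posterior(likelihood, prior_weights):
    """Bayes' theorem over a partition: P[B_i | A] for every part B_i.

    `prior_weights` need not sum to 1; they are normalized into P[B_i] first.
    """
    total_weight = sum(prior_weights.values())
    prior = {k: w / total_weight for k, w in prior_weights.items()}
    # Denominator: law of total probability, P[A] = sum_j P[A|B_j] P[B_j]
    p_a = sum(likelihood[k] * prior[k] for k in prior)
    return {k: likelihood[k] * prior[k] / p_a for k in prior}

post = posterior(p_handsome_given, population)
print(post["Korea"])  # 1/8
```

Even though Koreans have the highest rate, a handsome person is most likely Chinese ($13/16$), because China's population dominates the prior.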
Importance of Bayes’ Theorem
• We use Bayes’ theorem to obtain hard-to-get information from easy-to-get information
– For example, the probabilities of heart attacks in Korea, China, and Japan are easy to obtain because each country reports them
– However, the total probability of heart attacks in East Asia, or the probability that a person who had a heart attack is Korean, is not easy to obtain because nobody reports it
– But we can calculate them using the law of total probability and Bayes’ theorem
Example
• A company has three machines 𝑩𝟏, 𝑩𝟐, and 𝑩𝟑 making 1 kΩ resistors. Resistors within 50 Ω of the nominal value are considered acceptable. It has been observed that 80% of the resistors produced by 𝑩𝟏 and 90% of the resistors produced by 𝑩𝟐 are acceptable. The percentage for machine 𝑩𝟑 is 60%. Each hour, machines 𝑩𝟏, 𝑩𝟐, and 𝑩𝟑 produce 3000, 4000, and 3000 resistors, respectively. All of the resistors are mixed together at random in one bin and packed for shipment.
• What is the probability that an acceptable resistor comes from machine 𝑩𝟑 ?
$$P[B_3|A] = \frac{P[A|B_3]\,P[B_3]}{P[A]}$$
Example - Solution
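A worked sketch, with $A$ = acceptable and $P[B_i]$ taken as each machine's share of the hourly output (3000 + 4000 + 3000 = 10000 resistors), and $P[A]$ from the law of total probability:

$$\begin{aligned}
P[B_3] &= \frac{3000}{10000} = 0.3, \qquad P[A|B_3] = 0.6 \\
P[A] &= 0.8 \cdot 0.3 + 0.9 \cdot 0.4 + 0.6 \cdot 0.3 = 0.24 + 0.36 + 0.18 = 0.78 \\
P[B_3|A] &= \frac{0.6 \cdot 0.3}{0.78} = \frac{0.18}{0.78} = \frac{3}{13} \approx 0.231
\end{aligned}$$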
Independence
Independent Events
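• Events $A$ and $B$ are independent, written $A \perp\!\!\!\perp B$, if and only if $P[AB] = P[A]\,P[B]$
– Consequently, $P[A|B] = P[A]$ (when $P[B] > 0$) and $P[B|A] = P[B]$ (when $P[A] > 0$)
• Lemma: If $A \perp\!\!\!\perp B$, then $A \perp\!\!\!\perp B^c$, $A^c \perp\!\!\!\perp B$, and $A^c \perp\!\!\!\perp B^c$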
• Independence means
– The occurrence of one event does not change the probability of the other
– Knowing the result of one event does not help in predicting the outcome of the other event
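A minimal Python sketch that applies the definition $P[AB] = P[A]P[B]$ to events on two fair six-sided dice, assuming all 36 ordered outcomes are equally likely. The events `A`, `B`, `C` are illustrative: `A` and `B` involve different dice and turn out independent (together with all complements, as in the lemma), while `A` and `C` are dependent.

```python
from fractions import Fraction
from itertools import product

# Two fair six-sided dice: 36 equally likely ordered outcomes (x1, x2).
S = frozenset(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(S))

def independent(a, b):
    """Definition: A and B are independent iff P[AB] = P[A] P[B]."""
    return P(a & b) == P(a) * P(b)

A = frozenset(s for s in S if s[0] % 2 == 0)   # die 1 shows an even number
B = frozenset(s for s in S if s[1] >= 5)       # die 2 shows 5 or 6
C = frozenset(s for s in S if sum(s) >= 10)    # the total is at least 10

print(independent(A, B))                        # True
print(all(independent(x, y) for x in (A, S - A) for y in (B, S - B)))  # True (lemma)
print(independent(A, C))                        # False
```

For `A` and `C`, $P[AC] = 4/36 = 1/9$ while $P[A]\,P[C] = (1/2)(1/6) = 1/12$, so knowing that die 1 is even does change the chance of a large total.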
Example
• Assuming that 20% of Koreans, 10% of Chinese, and 5% of Japanese are handsome (with the populations from the earlier example), knowing that a person is Korean increases the probability that he/she is handsome, from $P[A] = 0.1$ to $P[A|\text{Korean}] = 0.2$
– Nationality and handsomeness are not independent
• Knowing the outcome of a coin flip does not help to predict the outcome of a die roll because they are independent
Independence of 𝑚 Events
• 𝑨𝟏 , 𝑨𝟐 and 𝑨𝟑 are independent if and only if
– 𝑨𝟏 and 𝑨𝟐 are independent
– 𝑨𝟐 and 𝑨𝟑 are independent
– 𝑨𝟑 and 𝑨𝟏 are independent
– $P[A_1 \cap A_2 \cap A_3] = P[A_1]\,P[A_2]\,P[A_3]$
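The last condition is needed because pairwise independence alone does not imply independence of all three events. The minimal Python sketch below shows this with a standard example, assuming two fair coin flips with four equally likely outcomes: $A_1$ = first flip heads, $A_2$ = second flip heads, $A_3$ = both flips agree.

```python
from fractions import Fraction
from itertools import product

# Two fair coin flips: 4 equally likely outcomes.
S = frozenset(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(S))

A1 = frozenset(s for s in S if s[0] == "H")   # first flip is heads
A2 = frozenset(s for s in S if s[1] == "H")   # second flip is heads
A3 = frozenset(s for s in S if s[0] == s[1])  # both flips agree

pairwise = all(P(x & y) == P(x) * P(y) for x, y in [(A1, A2), (A2, A3), (A3, A1)])
triple = P(A1 & A2 & A3) == P(A1) * P(A2) * P(A3)
print(pairwise, triple)  # True False
```

Every pair intersects only in the outcome HH, so each pairwise condition holds with $1/4 = (1/2)(1/2)$, but $P[A_1 \cap A_2 \cap A_3] = 1/4 \ne 1/8$.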