Lecture 1 - Applied Probability
L1.1 What is probability?
We use probability to describe uncertain situations
Will it rain tomorrow?
Will the party on Friday be fun?
Who will win in the elections?
What will the price of USD in ISK be tomorrow?
There are two main philosophical interpretations of probability: the frequentist approach and the
subjective approach
L1.1.1 Frequentist approach
Here, probability is described in terms of frequency of occurrence
This approach assumes that one can repeat an experiment many times, like tossing a coin
Then count the frequency of occurrence of each outcome
Define the probability of an outcome as the fraction of trials in which the outcome occurs, as the number
of trials approaches infinity
Main criticism: Many situations are not repeatable experiments
L1.1.2 Subjective approach (Bayesian)
Here, probability is described in terms of degree of belief
Given my experience, I believe the probability of rain tomorrow to be 65%
Main criticism: People have varying degrees of belief
We don't need to worry about philosophical differences here. We just assume that probability theory is very
useful in many different situations and use Kolmogorov's axiomatic approach.
Probability is the most important concept in modern science, especially as nobody has the slightest notion what
it means
-Bertrand Russell
L1.2 Historical development
Very long time ago (BC)
Games of chance popular in ancient Greece and Rome
No scientific development of games or probability
1500s
Cardano - Italian mathematician, publishes a book on probabilities in games involving dice and cards
1600s
Fermat and Pascal - French mathematicians (Pascal's triangle, Fermat's last theorem)
1700s
Bernoullis - law of large numbers
De Moivre - the normal distribution, central limit theorem
1800s
Laplace - publishes an important book on probability theory with many original results
Gauss - Greatest mathematician in history, least squares, Gaussian distribution, practical applications
Poisson - Poisson process
Markov - Stochastic processes, Markov chains
1900s
Kolmogorov - Russian mathematician, created an axiomatic approach to probability theory.
L1.3 Sample spaces, events, and probability measures
Kolmogorov's approach (somewhat simplified)
An experiment is any process whose outcome is not known in advance
toss of a coin
roll of a die
The sample space, Ω, is the set of all possible outcomes of an experiment
Ω = {H, T} for a coin toss
Ω = {1, 2, 3, 4, 5, 6} for a roll of a die
An event A is a subset of Ω
A = {H} is the event of tossing a Head in a single toss of a coin
A = {2,4,6} is the event of rolling an even number in a single roll of a die
A = {1,4} is the event of rolling either 1 or 4 in a single roll of a die
A probability measure, or just probability, is a function that assigns a number to all events A, denoted
by P (A), with the following properties
(Non-negativity) P (A) ≥ 0 for all events A
(Additivity) If A and B are two disjoint events, then the probability of their union satisfies
P (A ∪ B) = P (A) + P (B)
More generally, if A1, A2, A3, . . . is a sequence (possibly infinite) of disjoint events, then
P(A1 ∪ A2 ∪ A3 ∪ . . . ) = P(A1) + P(A2) + P(A3) + . . .
(Normalization) The probability of the full sample space Ω is 1, i.e., P (Ω) = 1
The definitions of events are more elaborate in more advanced courses (based on the concept of sigma
algebras). In that context, Probability Theory is a special case of a more general theory called Measure
Theory, which is sometimes also called Integration Theory.
Using the definition above, can you see that P (∅) = 0?
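Hint: Ω and ∅ are disjoint and Ω ∪ ∅ = Ω, so by additivity P(Ω) = P(Ω) + P(∅), which forces P(∅) = 0.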
Example 1. Coin tossing. A fair coin is tossed once. This is an experiment because the outcome is not
known in advance. The sample space is Ω = {H, T}. All possible events are ∅, {H}, {T}, and Ω (it is often
impossible to list all possible events). The probability measure is defined by P(∅) = 0, P({H}) = 0.5,
P({T}) = 0.5, and P(Ω) = 1.
Example 2. Rolling a die. A fair die is rolled once. This is an experiment because the outcome is not
known in advance. The sample space is Ω = {1, 2, 3, 4, 5, 6}. We now have 2^6 = 64 possible events (the
power set of Ω). The probability measure is defined by P(∅) = 0, P({i}) = 1/6, and
P({i, j, . . . , k}) = P(i) + P(j) + . . . + P(k)
when i, j, . . . , k are all different.
L1.4 Basic properties
Discrete Probability Law
If the sample space has a finite number of possible outcomes, then the probability measure is fully specified
by the probabilities of events with a single element. In particular,
P({s1, . . . , sn}) = P(s1) + P(s2) + . . . + P(sn)
Discrete Uniform Probability
If the sample space consists of n outcomes that are all equally likely, then the probability of any event A is given by
P(A) = (number of elements of A) / n
We will use this law very often. Anytime we are tossing fair coins or dice or drawing from a deck of cards,
etc., we are talking about uniform probabilities.
Example: What is the probability of {1,2,3,4} when rolling a six faced die?
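(By the discrete uniform law, P({1, 2, 3, 4}) = 4/6 = 2/3.)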
Exercise: Write down the sample space of the roll of two six sided dice
and determine the probability measure on events.
A couple of useful properties
Let A, B and C be events. Then
(a) If A ⊆ B then P (A) ≤ P (B)
(b) P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
(c) P (A ∪ B) ≤ P (A) + P (B)
Example
Roll two dice and suppose for simplicity that they are red and green. Let F = "At least one 4 appears", A = "A
4 appears on the red die", and B = "A 4 appears on the green die". So F = A ∪ B. Now,
P (A) = P (B) = 1/6
and A ∩ B = {(4, 4)}, which has a probability of 1/36. Using the relation above, we
find
P (F ) = P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 1/6 + 1/6 − 1/36 = 11/36
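Since the sample space here is small, a quick brute force enumeration in Python confirms this number (a minimal sketch):
In [ ]: #Enumerate the 36 equally likely outcomes of a red and a green die
outcomes = [(r, g) for r in range(1, 7) for g in range(1, 7)]
F = [o for o in outcomes if 4 in o] #At least one 4 appears
print(len(F), len(F)/len(outcomes)) #11 outcomes, 11/36 ≈ 0.3056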
L1.5 Monte Carlo simulations
Monte Carlo simulation is a method where we let the computer generate many experiments - draw random
outcomes from a specified distribution many times. It is, e.g., much easier to let the computer generate the
outcomes of 1000 coin tosses than to actually do that yourself.
Once we have generated the random samples, we can use them to compute various properties of
probability distributions, and compute the probability of certain events happening. Using Monte Carlo is
often much easier than calculating the probabilities using mathematics, and it also serves as a useful tool to
make sanity checks.
We will be using Monte Carlo simulations a lot in this course.
Let's do an example using Python (it's easy to do the same using R or Julia or other programming
languages). In this example, we are looking at the distribution, or the probability of all single outcomes, of a
toss of a single die. It demonstrates that for a small number of simulations, the probabilities may be erratic (we can even
use Monte Carlo to estimate how erratic), but as we increase the number of simulations, the probabilities
settle down to their theoretical values. We see the same thing happening in many situations, so it is always
advisable to perform many simulations to get reasonably accurate results.
In [5]: %matplotlib inline
import matplotlib.pyplot as plt
import random
import numpy as np
np.random.seed(42)
N = 100000 #Number of simulations
sample = np.random.choice(range(1, 7), size=N, replace=True) #Chooses N samples from 1 to 6 with replacement
#Aggregate the outcomes
samples_per_outcome = np.array([len(sample[sample==i]) for i in range(1,7)]) #Count the samples for each outcome
#Plot the distribution
fig = plt.figure()
plt.xlabel("Outcome")
plt.ylabel("Probability (%)")
plt.xlim([0, 7])
plt.bar(range(1,7), 100*samples_per_outcome/N)
100*samples_per_outcome/N #Display the estimated probabilities as percentages
Out[5]:
array([16.592, 16.799, 16.39 , 16.776, 16.81 , 16.633])
Let's see if we can use Monte Carlo to estimate the probability of getting an even number from a toss of the
die.
In [9]: N = 100000
d1 = np.random.choice(range(1, 7), size=N, replace=True)
even = 0
for outcome in d1:
    if outcome%2 == 0:
        even += 1
print("Estimated probability of an even number: ", even/N)
Estimated probability of an even number:  0.5003
Now, let's up the game a little bit and see what we can do with two dice. What is the probability of the sum
of two dice being 3? Let's first work this out theoretically. The sample space of rolling two dice is a 6x6
grid of outcomes and we have a uniform distribution. Only two outcomes lead to a sum of 3, namely (1,2) and
(2,1), so the probability of rolling a sum of 3 is 2/36 = 1/18, which is approximately 0.056.
In [13]: N = 1000000
d1 = np.random.choice(range(1, 7), size=N, replace=True) #Rolling the first die
d2 = np.random.choice(range(1, 7), size=N, replace=True) #Independent; the toss of d1 does not affect d2
s = d1 + d2
sum_is_3 = 0
for outcome in s:
    if outcome == 3:
        sum_is_3 += 1
print("Estimated probability of getting a sum of 3: ", sum_is_3/N)
Estimated probability of getting a sum of 3:  0.055763
Exercise: Can you find the probability of getting a sum greater than 9?
Let's find the distribution, or probabilities of all possible sums of two dice:
In [15]: N = 10000
d1 = np.random.choice(range(1, 7), size=N, replace=True) #Rolling the first die
d2 = np.random.choice(range(1, 7), size=N, replace=True) #Independent; the toss of d1 does not affect d2
sample = d1+d2
samples_per_outcome = np.array([len(sample[sample==i]) for i in range(2,13)]) #Count the samples for each sum
fig = plt.figure()
plt.xlabel("Outcome")
plt.ylabel("Probability (%)")
plt.xlim([1, 13])
plt.bar(range(2,13), 100*samples_per_outcome/N)
100*samples_per_outcome/N
Out[15]:
array([ 2.6 ,  5.62,  8.74, 11.08, 14.07, 16.81, 14.27, 10.64,  8.59,
        5.1 ,  2.48])
Finally, here is a cool example from 'Bayesian Methods for Hackers' - let's not worry too much about the
underlying assumptions, but the idea is to use a Bayesian approach to estimate the probabilities of getting
heads or tails using only data. Initially, we assume that we have no idea what the probability of heads (or
tails) is - in Bayesian terms, this is called a prior belief, and we assume complete ignorance, so all probabilities
between 0 and 1 are possible. Then, we start to toss and collect data. We toss once and get a tail, so the
probability of getting a head moves ever so slightly towards zero. Then we toss again and get a head, and now the
probabilities are centered around 0.5, but the distribution is quite wide (meaning that there is
significant uncertainty). As we make more and more tosses and collect the data, we get more and more
certain that there is a 50% probability of getting a head.
In [2]: %matplotlib inline
from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
figsize(11, 9)
import scipy.stats as stats

dist = stats.beta
n_trials = [0, 1, 2, 3, 4, 5, 8, 15, 50, 500]
data = stats.bernoulli.rvs(0.5, size=n_trials[-1])
x = np.linspace(0, 1, 100)

# For the already prepared, I'm using Binomial's conj. prior.
for k, N in enumerate(n_trials):
    sx = plt.subplot(len(n_trials)//2, 2, k+1)
    plt.xlabel("$p$, probability of heads") \
        if k in [0, len(n_trials)-1] else None
    plt.setp(sx.get_yticklabels(), visible=False)
    heads = data[:N].sum()
    y = dist.pdf(x, 1 + heads, 1 + N - heads)
    plt.plot(x, y, label="observe %d tosses,\n %d heads" % (N, heads))
    plt.fill_between(x, 0, y, color="#348ABD", alpha=0.4)
    plt.vlines(0.5, 0, 4, color="k", linestyles="--", lw=1)
    leg = plt.legend()
    leg.get_frame().set_alpha(0.4)
    plt.autoscale(tight=True)

plt.suptitle("Bayesian updating of posterior probabilities",
             y=1.02,
             fontsize=14)
plt.tight_layout()
There are a couple of important counting methods that we should review here: the multiplication rule,
permutations, and combinations.
Example: If I have 4 shirts and 3 jackets, I can dress in 12 different combinations. The reason is that each shirt
can be paired with 3 jackets, and I have 4 shirts, so the total number of combinations is 3 + 3 + 3 + 3 = 4*3 = 12.
The multiplication rule, permutations and combinations
Suppose there are k experiments performed in order and each experiment i has ni possible outcomes. Then
the total number of outcomes is n1 ∗ n2 ∗ . . . ∗ nk.
In the example above, we have two experiments (k = 2); the first experiment is choosing a shirt (n1 = 4)
and the second experiment is choosing a jacket (n2 = 3).
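A quick sanity check of the multiplication rule in Python (a minimal sketch; the shirt and jacket labels are made up for illustration):
In [ ]: from itertools import product
shirts = ["S1", "S2", "S3", "S4"] #4 hypothetical shirts
jackets = ["J1", "J2", "J3"] #3 hypothetical jackets
outfits = list(product(shirts, jackets)) #All (shirt, jacket) pairs
print(len(outfits)) #12 = 4*3, as the multiplication rule predicts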
Now, let's move to permutations.
In how many ways can 5 people stand in a line? The first experiment is to pick a person for the front of the
line. We can pick any of the 5 persons, so n1 = 5. For the second position, we can now choose from 4 people
(because we have picked one person for the first position). Then 3 people, then 2, and the last person must
come at the end of the line. Thus, the number of ways 5 people can stand in a line is 5*4*3*2*1 = 120.
We define n! = n ∗ (n − 1) ∗ . . . ∗ 2 ∗ 1, which is called n factorial.
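We can check both counts in Python (a minimal sketch using the standard library):
In [ ]: import math
from itertools import permutations
print(math.factorial(5)) #120
print(len(list(permutations(range(5))))) #Also 120 - all orderings of 5 people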
Suppose we have 11 football players. In how many ways can we pick 3 offensive players for the left wing,
center, and right wing out of the 11 players?
We start with the first striker (= offensive player) and we have 11 choices. For the second striker we then have
10 choices, and for the last one 9. This can also be written as 11!/8! and is what is called a permutation.
More generally, when we want to pick k objects from a population of n objects and arrange them in a
sequence, i.e., the number of distinct k object sequences, we have
nPk = n ∗ (n − 1) ∗ . . . ∗ (n − k + 1) = n! / (n − k)!
and this is called k-permutations.
Example: Count the number of 4 letter words using the English alphabet of 26
characters (never mind their meaning...)
n! / (n − k)! = 26! / 22! = 358,800
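A quick check in Python (math.perm requires Python 3.8 or later):
In [ ]: import math
print(math.perm(26, 4)) #358800 four letter words
print(math.factorial(26)//math.factorial(22)) #The same number, computed as 26!/22!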
The Icelandic congress has 63 congressmen. In how many ways can a 4 congressman committee be formed?
We can proceed as before, and say that for the first slot in the committee, we can pick from 63 members, the
second slot can accommodate 62 members, etc. So, the number of committees is 63*62*61*60. Well, that's
not quite right, because we have counted the same committees many times. E.g., A, B, C, D is the same
committee as B, A, C, D, but both of these are counted as separate committees in the 63*62*61*60. In how
many ways can a 4 person committee be arranged? This is the number of permutations of 4, i.e., 4*3*2*1 = 24 different
ways. Thus, we have counted each committee 24 times, and the correct answer is thus 63*62*61*60/24 =
595,665 committees.
This leads to combinations, which is the number of ways we can pick k objects from n objects:
nCk = n! / (k!(n − k)!) = (n ∗ (n − 1) ∗ . . . ∗ (n − k + 1)) / (1 ∗ 2 ∗ . . . ∗ k)
The main difference is that in permutations, the order matters, but in combinations the order does not
matter.
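Again, a quick check in Python (math.comb requires Python 3.8 or later):
In [ ]: import math
print(math.comb(63, 4)) #595665 possible committees
print(math.perm(63, 4)//math.factorial(4)) #Same count: ordered picks divided by the 4! orderings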
Exercise
In how many ways can we pick 5 cards out of a deck of 52 cards?
The combinations relate to the binomial theorem (the expansion of (a + b)^n) and to Pascal's triangle.
Now, let's do more with cards. Let's check the probability of getting a full house in poker (a full house means
that for a 5 card hand, we have 3 of one kind and 2 of another kind, e.g., 10H, 10D, 10S, 5C, 5D), assuming we
cannot swap any cards.
We use the multiplication rule with the following experiments:
1. Choose the kind for the 3 of a kind. Possibilities: 13
2. Choose 3 out of 4 possible cards for the kind chosen in step 1. Possibilities: 4C3 = 4
3. Choose the kind for the 2 of a kind. Possibilities: 12 (can't choose the same as in step 1)
4. Choose 2 out of 4 possible cards for the kind chosen in step 3. Possibilities: 4C2 = 6
Now, combine everything together, and we get the number of possible full houses: 13*4*12*6 = 3744. As we
saw from the exercise, the total number of hands is 2,598,960. Thus, the probability of getting a full house
(straight from the dealer) is 3,744/2,598,960 = 0.144%. Pretty low...
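The same count can be computed directly in Python (a minimal sketch):
In [ ]: import math
full_houses = 13*math.comb(4, 3)*12*math.comb(4, 2) #3744
total_hands = math.comb(52, 5) #2598960
print(full_houses, total_hands, full_houses/total_hands) #Probability ≈ 0.00144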
Now, let's do the same using Monte Carlo.
In [18]: import random
#Construct the deck
kinds = list(range(1,11)) + ['J','Q','K']
suits = ['H','S','D','C']
deck_of_cards = [(k,s) for k in kinds for s in suits]
#Pick a 5 card hand at random:
def pick_hand(deck):
    return random.sample(deck, 5)
def is_full_house(hand):
    #Need 3 cards of one kind and 2 of another
    kinds = [h[0] for h in hand] #Removes the suit from the hand
    unique_kinds = set(kinds) #Gives all kinds, removes duplicates
    if len(unique_kinds) > 2: #A full house has exactly two kinds (one kind is impossible in a 5 card hand)
        return 0 #Not a full house
    #Else, need to check that the split is 3 and 2, not 4 and 1
    if kinds.count(kinds[0]) in [1, 4]:
        return 0
    return 1
N = 1000000 #Number of simulations
number_of_houses = 0
for i in range(N):
    hand = pick_hand(deck_of_cards)
    number_of_houses += is_full_house(hand)
print("Probability of full house: ", number_of_houses/N)
Probability of full house:  0.001436
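The Monte Carlo estimate 0.001436 agrees well with the theoretical value 3,744/2,598,960 ≈ 0.00144.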
Lecture 1 - Homework
1. In Galileo's time, people thought that when 3 dice were rolled, a sum of 9 and a sum of 10 had the
same probability, since each can be obtained in 6 ways:
9: 1+2+6, 1+3+5, 1+4+4, 2+2+5, 2+3+4, 3+3+3; 10: 1+3+6, 1+4+5, 2+2+6, 2+3+5, 2+4+4, 3+3+4
Compute the probabilities of these sums and show that 10 is more likely than 9. Use either theory or
Monte Carlo (or both)
2. Five different awards are to be given to a class of 30 students. How many ways can this be done if (a)
each student can receive any number of awards, (b) each student can receive at most one award?
3. Seven people sit around a table. How many ways can this be done if Halli and Eiki (a) must sit next to
one another, (b) must not sit next to one another?
4. In Poker, what is the probability of being dealt a flush (5 card hand, dealer hands out 5 cards at random
from a deck of 52 cards)? Use either theory, Monte Carlo or both. Flush means that all the cards in the
hand have the same suit (e.g., all Hearts).