Lecture 1 - Applied Probability

L1.1 What is probability?

We use probability to describe uncertain situations:
- Will it rain tomorrow?
- Will the party on Friday be fun?
- Who will win in the elections?
- What will the price of USD in ISK be tomorrow?

There are two main philosophical interpretations of probability: the frequentist approach and the subjective approach.

L1.1.1 Frequentist approach

Here, probability is described in terms of frequency of occurrence. This approach assumes that one can repeat an experiment many times, like tossing a coin, and count the frequency of occurrence of each outcome. The probability of an outcome is then defined as the fraction of trials in which the outcome occurs, in the limit as the number of trials approaches infinity.

Main criticism: Many situations are not repeatable experiments.

L1.1.2 Subjective approach (Bayesian)

Here, probability is described in terms of degree of belief: "Given my experience, I believe the probability of rain tomorrow to be 65%."

Main criticism: People have varying degrees of belief.

We don't need to worry about the philosophical differences here. We just assume that probability theory is very useful in many different situations and use Kolmogorov's axiomatic approach.

"Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means." - Bertrand Russell

L1.2 Historical development

Very long time ago (BC): Games of chance were popular in ancient Greece and Rome, but there was no scientific development of games or probability.

1500s: Cardano - Italian mathematician, publishes a book on probabilities in games involving dice and cards.

1600s: Fermat and Pascal - French mathematicians (Pascal's triangle, Fermat's last theorem).

1700s: Bernoullis - the law of large numbers. De Moivre - the normal distribution, the central limit theorem.

1800s: Laplace - publishes an important book on probability theory with many original results. Gauss - greatest mathematician in history; least squares, the Gaussian distribution, practical applications. Poisson - the Poisson process. Markov - stochastic processes, Markov chains.

1900s: Kolmogorov - Russian mathematician, created the axiomatic approach to probability theory.

L1.3 Sample spaces, events, and probability measures

Kolmogorov's approach (somewhat simplified):

An experiment is any process whose outcome is not known in advance, e.g., the toss of a coin or the roll of a die.

The sample space, Ω, is the set of all possible outcomes of an experiment:
- Ω = {H, T} for a coin toss
- Ω = {1, 2, 3, 4, 5, 6} for a roll of a die

An event A is a subset of Ω:
- A = {H} is the event of tossing a head in a single toss of a coin
- A = {2, 4, 6} is the event of rolling an even number in a single roll of a die
- A = {1, 4} is the event of rolling either 1 or 4 in a single roll of a die

A probability measure, or just probability, is a function that assigns a number to every event A, denoted by P(A), with the following properties:
- (Non-negativity) P(A) ≥ 0 for all events A.
- (Additivity) If A and B are two disjoint events, then the probability of their union satisfies P(A ∪ B) = P(A) + P(B). More generally, if A1, A2, A3, ... is a sequence (possibly infinite) of disjoint events, then P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...
- (Normalization) The probability of the full sample space Ω is 1, i.e., P(Ω) = 1.

The definition of events is more elaborate in more advanced courses (based on the concept of sigma algebras). In that context, probability theory is a special case of a more general theory called measure theory, which is also sometimes called integration theory.

Using the definition above, can you see that P(∅) = 0?
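To make the definition concrete, here is a minimal sketch (our own illustration, not part of the lecture code; the names omega, prob, and P are ours): a probability measure on the finite sample space of a fair die, stored as a dictionary of outcome probabilities, with P(A) computed by summing over the outcomes in A.

omega = {1, 2, 3, 4, 5, 6}                  # sample space of a fair die
prob = {outcome: 1/6 for outcome in omega}  # each outcome equally likely

def P(event):
    # Probability of an event (a subset of omega): sum the outcome probabilities
    return sum(prob[outcome] for outcome in event)

print(P(set()))      # the empty event: 0, consistent with P(∅) = 0
print(P({2, 4, 6}))  # the even numbers: 0.5
print(P(omega))      # normalization: 1.0 (up to floating-point rounding)

Note how additivity holds by construction here: the probability of a union of disjoint events is the sum of their probabilities, because each is a sum over disjoint sets of outcomes.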
Example 1. Coin tossing. A fair coin is tossed once. This is an experiment because the outcome is not known in advance. The sample space is Ω = {H, T}. All possible events are ∅, {H}, {T}, and Ω (in general it is often impractical to list all possible events). The probability measure is defined by P(∅) = 0, P(H) = 0.5, P(T) = 0.5, P(Ω) = 1.

Example 2. Rolling a die. A fair die is rolled once. This is an experiment because the outcome is not known in advance. The sample space is Ω = {1, 2, 3, 4, 5, 6}. We now have 2^6 = 64 possible events (the power set of Ω). The probability measure is defined by P(∅) = 0, P(i) = 1/6, and P({i, j, ..., k}) = P(i) + P(j) + ... + P(k) when i, j, ..., k are all different.

L1.4 Basic properties

Discrete Probability Law: If the sample space has a finite number of possible outcomes, then the probability measure is fully specified by the probabilities of the single-element events. In particular,

P({s1, ..., sn}) = P(s1) + P(s2) + ... + P(sn)

Discrete Uniform Probability Law: If the sample space consists of n outcomes that are all equally likely, then the probability of any event A is given by

P(A) = (number of elements of A) / n

We will use this law very often. Any time we are tossing fair coins, rolling fair dice, drawing from a well-shuffled deck of cards, etc., we are talking about uniform probabilities.

Example: What is the probability of {1, 2, 3, 4} when rolling a six-sided die? By the discrete uniform law, it is 4/6 = 2/3.

Exercise: Write down the sample space of the roll of two six-sided dice and determine the probability measure on events.

A couple of useful properties. Let A and B be events. Then
(a) If A ⊆ B then P(A) ≤ P(B)
(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
(c) P(A ∪ B) ≤ P(A) + P(B)

Example: Roll two dice and suppose for simplicity that one is red and one is green. Let F = "at least one 4 appears", A = "a 4 appears on the red die", and B = "a 4 appears on the green die", so that F = A ∪ B. Now, P(A) = P(B) = 1/6 and A ∩ B = {(4, 4)}, which has probability 1/36. Using property (b), we find

P(F) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/6 + 1/6 − 1/36 = 11/36
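Since the 36 outcomes of rolling two dice are equally likely, we can also verify this result by brute-force enumeration, using the discrete uniform law directly. This is a sketch of our own, not part of the lecture code:

from fractions import Fraction

# All 36 equally likely outcomes of rolling a red and a green die
outcomes = [(r, g) for r in range(1, 7) for g in range(1, 7)]
favorable = [o for o in outcomes if 4 in o]  # at least one 4 appears

print(Fraction(len(favorable), len(outcomes)))  # 11/36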
L1.5 Monte Carlo simulations

Monte Carlo simulation is a method where we let the computer generate many experiments - draw random outcomes from a specified distribution many times. It is, e.g., much easier to let the computer generate the outcomes of 1000 coin tosses than to actually do that yourself. Once we have generated the random samples, we can use them to compute various properties of probability distributions, and to estimate the probability of certain events happening. Using Monte Carlo is often much easier than calculating the probabilities using mathematics, and it also serves as a useful tool for sanity checks. We will be using Monte Carlo simulations a lot in this course.

Let's do an example using Python (it's easy to do the same using R, Julia, or other programming languages). In this example, we are looking at the distribution, i.e., the probabilities of all single outcomes, of a toss of a single die. It demonstrates that for few simulations the probabilities may be erratic (we can even use Monte Carlo to estimate how erratic), but as we increase the number of simulations, the probabilities settle down to their theoretical values. We see the same thing happening in many situations, so it is always advisable to perform many simulations to get reasonably accurate results.

In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
import random
import numpy as np

np.random.seed(42)

N = 100000  # Number of simulations
sample = np.random.choice(range(1, 7), size=N, replace=True)  # Choose N samples from 1 to 6, with replacement

# Aggregate the outcomes
samples_per_outcome = np.array([len(sample[sample == i]) for i in range(1, 7)])  # Count the samples per outcome

# Plot the distribution
fig = plt.figure()
plt.xlabel("Outcome")
plt.ylabel("Probability (%)")
plt.xlim([0, 7])
plt.bar(range(1, 7), 100 * samples_per_outcome / N)
100 * samples_per_outcome / N  # Convert this to a table - display a table using html?

Out[5]:
array([16.592, 16.799, 16.39 , 16.776, 16.81 , 16.633])

Let's see if we can use Monte Carlo to estimate the probability of getting an even number from the toss of the die.

In [9]:
N = 100000
d1 = np.random.choice(range(1, 7), size=N, replace=True)
even = 0
for outcome in d1:
    if outcome % 2 == 0:
        even += 1
print("Estimated probability of an even number: ", even/N)

Estimated probability of an even number:  0.5003

Now, let's up the game a little bit and see what we can do with two dice. What is the probability of the sum of two dice being 3? Let's first work this out theoretically. The sample space of rolling two dice is a 6x6 square and we have a uniform distribution. Only two outcomes lead to a sum of 3, namely (1, 2) and (2, 1), so the probability of rolling a sum of 3 is 2/36 = 1/18, which is approximately 0.056.

In [13]:
N = 1000000
d1 = np.random.choice(range(1, 7), size=N, replace=True)  # Roll the first die
d2 = np.random.choice(range(1, 7), size=N, replace=True)  # Independent: the toss of d1 does not affect d2
s = d1 + d2
sum_is_3 = 0
for outcome in s:
    if outcome == 3:
        sum_is_3 += 1
print("Estimated probability of getting a sum of 3: ", sum_is_3/N)

Estimated probability of getting a sum of 3:  0.055763

Exercise: Can you find the probability of getting a sum greater than 9?

Let's find the distribution, i.e., the probabilities of all possible sums of two dice:

In [15]:
N = 10000
d1 = np.random.choice(range(1, 7), size=N, replace=True)  # Roll the first die
d2 = np.random.choice(range(1, 7), size=N, replace=True)  # Independent of the first die
sample = d1 + d2
samples_per_outcome = np.array([len(sample[sample == i]) for i in range(2, 13)])  # Count the samples per sum

fig = plt.figure()
plt.xlabel("Outcome")
plt.ylabel("Probability (%)")
plt.xlim([1, 13])
plt.bar(range(2, 13), 100 * samples_per_outcome / N)
100 * samples_per_outcome / N

Out[15]:
array([ 2.6 ,  5.1 ,  5.62,  8.74, 11.08, 14.07, 16.81, 14.27, 10.64,
        8.59,  2.48])
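As a sanity check (again a sketch of our own, not part of the original notebook), we can compute the exact distribution of the sum by enumerating the 36 equally likely outcomes and compare it with the Monte Carlo estimates above:

from collections import Counter

# Exact distribution of the sum of two dice, by exhaustive enumeration
counts = Counter(r + g for r in range(1, 7) for g in range(1, 7))
for total in range(2, 13):
    print(f"sum = {total:2d}: {100 * counts[total] / 36:5.2f}%")

The exact values (2.78%, 5.56%, 8.33%, ..., peaking at 16.67% for a sum of 7) are close to the Monte Carlo estimates above, and the agreement tightens as we increase N.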
Finally, here is a cool example from 'Bayesian Methods for Hackers' - let's not worry too much about the underlying assumptions, but the idea is to use a Bayesian approach to estimate the probability of getting heads or tails using only data. Initially, we assume that we have no idea what the probability of heads (or tails) is - in Bayesian terms, this is called the prior belief, and since we assume complete ignorance, all probabilities between 0 and 1 are equally possible. Then we start to toss and collect data. We toss once and get a tail, so the probability of getting a head moves ever so slightly toward zero. Then we toss again, this time getting a head, and now the distribution is centered around 0.5, but it is quite wide (meaning that there is significant uncertainty). As we make more and more tosses and collect the data, we get more and more certain that the probability of getting a head is 50%.

In [2]:
%matplotlib inline
from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
import scipy.stats as stats

figsize(11, 9)

dist = stats.beta
n_trials = [0, 1, 2, 3, 4, 5, 8, 15, 50, 500]
data = stats.bernoulli.rvs(0.5, size=n_trials[-1])
x = np.linspace(0, 1, 100)

# For the already prepared, I'm using the Binomial's conj. prior.
for k, N in enumerate(n_trials):
    sx = plt.subplot(len(n_trials)//2, 2, k+1)
    plt.xlabel("$p$, probability of heads") \
        if k in [0, len(n_trials)-1] else None
    plt.setp(sx.get_yticklabels(), visible=False)
    heads = data[:N].sum()
    y = dist.pdf(x, 1 + heads, 1 + N - heads)
    plt.plot(x, y, label="observe %d tosses,\n %d heads" % (N, heads))
    plt.fill_between(x, 0, y, color="#348ABD", alpha=0.4)
    plt.vlines(0.5, 0, 4, color="k", linestyles="--", lw=1)

    leg = plt.legend()
    leg.get_frame().set_alpha(0.4)
    plt.autoscale(tight=True)

plt.suptitle("Bayesian updating of posterior probabilities",
             y=1.02, fontsize=14)
plt.tight_layout()

There are a couple of important counting methods that we should review here: the multiplication rule, permutations, and combinations.

Example: If I have 4 shirts and 3 jackets, I can dress in 12 different combinations. The reason is that each shirt can be paired with any of the 3 jackets, and I have 4 shirts, so the total number of combinations is 3 + 3 + 3 + 3 = 4 · 3 = 12.

The multiplication rule, permutations and combinations

The multiplication rule: Suppose k experiments are performed in order, and experiment i has ni possible outcomes. Then the total number of outcomes is n1 · n2 · ... · nk. In the example above, we have two experiments (k = 2): the first experiment is choosing a shirt (n1 = 4) and the second experiment is choosing a jacket (n2 = 3).

Now, let's move to permutations. In how many ways can 5 people stand in a line? The first experiment is to pick the person at the front of the line. We can pick any of the 5 persons, so n1 = 5. For the second position, we can choose among 4 people (because we have already placed one person in the first position). Then 3 people, then 2, and the last person must come at the end of the line. Thus, the number of ways 5 people can stand in a line is 5 · 4 · 3 · 2 · 1 = 120.

We define n! = n · (n − 1) · ... · 2 · 1, which is called n factorial.

Suppose we have 11 football players. In how many ways can we pick 3 offensive players for the left wing, center, and right wing out of the 11 players? We start with the first striker (= offensive player), for whom we have 11 choices. For the second striker we then have 10 choices, and for the last one 9, giving 11 · 10 · 9 = 990. This can also be written as 11!/8!, and is what is called a permutation. More generally, when we want to pick k objects from a population of n objects and arrange them in a sequence, i.e., count the number of distinct k-object sequences, we have

nPk = n · (n − 1) · ... · (n − k + 1) = n!/(n − k)!

and this is called the number of k-permutations.

Example: Count the number of 4-letter words using the English alphabet of 26 characters (never mind their meaning...):

n!/(n − k)! = 26!/22! = 26 · 25 · 24 · 23 = 358,800
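These counts are easy to check numerically. Here is a small sketch of our own (math.factorial and math.perm are in the Python standard library; math.perm requires Python 3.8 or newer):

import math

print(math.factorial(5))  # 5! = 120 ways for 5 people to stand in a line
print(math.perm(11, 3))   # 11!/8! = 990 ways to fill the three striker positions
print(math.perm(26, 4))   # 26!/22! = 358800 four-letter words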
The Icelandic congress has 63 congressmen. In how many ways can a committee of 4 congressmen be formed? We can proceed as before and say that for the first slot on the committee we can pick from 63 members, the second slot can accommodate 62 members, etc., so the number of committees is 63 · 62 · 61 · 60. Well, that's not quite right, because we have counted the same committee many times. E.g., A, B, C, D is the same committee as B, A, C, D, but both of these are counted as separate committees in the 63 · 62 · 61 · 60.

In how many ways can a given committee of 4 be arranged in order? This is the number of permutations of 4 objects, i.e., 4 · 3 · 2 · 1 = 24 different ways. Thus, we have counted each committee 24 times, and the correct answer is 63 · 62 · 61 · 60/24 = 595,665 committees.

This leads to combinations, which count the number of ways we can pick k objects from n objects:

nCk = (n · (n − 1) · ... · (n − k + 1)) / (1 · 2 · ... · k) = n!/(k!(n − k)!)

The main difference is that in permutations the order matters, while in combinations the order does not matter.

Exercise: In how many ways can we pick 5 cards out of a deck of 52 cards?

Combinations are related to the binomial theorem (the expansion of (a + b)^n) and to Pascal's triangle.

Now, let's do more with cards. Let's check the probability of getting a full house in poker (a full house means that in a 5-card hand we have 3 of one kind and 2 of another kind, e.g., 10H, 10D, 10S, 5C, 5D), assuming we cannot swap any cards. We use the multiplication rule with the following experiments:
1. Choose the kind for the 3 of a kind. Possibilities: 13
2. Choose 3 out of the 4 possible cards of the kind chosen in step 1. Possibilities: 4C3 = 4
3. Choose the kind for the 2 of a kind. Possibilities: 12 (can't choose the same kind as in step 1)
4. Choose 2 out of the 4 possible cards of the kind chosen in step 3. Possibilities: 4C2 = 6

Now, combining everything together, we get the number of possible full houses: 13 · 4 · 12 · 6 = 3,744. As we saw from the exercise, the total number of hands is 2,598,960. Thus, the probability of getting a full house (straight from the dealer) is 3,744/2,598,960 = 0.144%. Pretty low...

Now, let's do the same using Monte Carlo.

In [18]:
import random

# Construct the deck
kinds = list(range(1, 11)) + ['J', 'Q', 'K']
suits = ['H', 'S', 'D', 'C']
deck_of_cards = [(k, s) for k in kinds for s in suits]

# Pick a 5 card hand at random
def pick_hand(deck):
    return random.sample(deck, 5)

def is_full_house(hand):
    # Need 3 cards of one kind and 2 of another
    kinds = [h[0] for h in hand]  # Strip the suit from each card
    unique_kinds = set(kinds)     # All kinds in the hand, duplicates removed
    if len(unique_kinds) > 2:     # A full house has exactly two kinds (a single kind cannot occur in a 5 card hand)
        return 0                  # Not a full house
    # Otherwise, check that the split is 3-2, not 4-1
    if kinds.count(kinds[0]) in [1, 4]:
        return 0
    return 1

N = 1000000  # Number of simulations
number_of_houses = 0
for i in range(N):
    hand = pick_hand(deck_of_cards)
    number_of_houses += is_full_house(hand)
print("Probability of full house: ", number_of_houses/N)

Probability of full house:  0.001436
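To double-check the Monte Carlo estimate against the counting argument above, we can compute the exact probability (a sketch of our own; math.comb requires Python 3.8 or newer):

import math

full_houses = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)  # 3,744
total_hands = math.comb(52, 5)                             # 2,598,960
print(full_houses / total_hands)                           # 0.00144..., i.e., about 0.144%

This agrees with the Monte Carlo estimate of 0.001436 above.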
Lecture 1 - Homework

1. In Galileo's time, people thought that when 3 dice were rolled, a sum of 9 and a sum of 10 had the same probability, since each can be obtained in 6 ways:
9: 1+2+6, 1+3+5, 1+4+4, 2+2+5, 2+3+4, 3+3+3
10: 1+3+6, 1+4+5, 2+2+6, 2+3+5, 2+4+4, 3+3+4
Compute the probabilities of these sums and show that 10 is more likely than 9. Use either theory or Monte Carlo (or both).

2. Five different awards are to be given to a class of 30 students. How many ways can this be done if (a) each student can receive any number of awards, (b) each student can receive at most one award?

3. Seven people sit around a table. How many ways can this be done if Halli and Eiki (a) must sit next to one another, (b) must not sit next to one another?

4. In poker, what is the probability of being dealt a flush (a 5-card hand; the dealer hands out 5 cards at random from a deck of 52 cards)? Use either theory, Monte Carlo, or both. A flush means that all the cards in the hand have the same suit (e.g., all Hearts).