Algebra 1 Summer Institute 2014

The Birthday Paradox

1. When is your birthday (month and day only)? Does another participant share your birthday?

2. How many people are required for it to be more likely than not that at least two people in the group share a birthday? In other words, how many people must be in a room for there to be at least a 50% chance that two of them share a birthday? Decide with the group what number of people to use when searching for a match. The following instructions use Excel and a group of 50 people.

3. The first step is to randomly generate 50 integers between 1 and 12, inclusive, to represent the months, and then 50 integers between 1 and 31, inclusive, to represent the days. For now we will disregard any "bogus" dates, such as Feb 30th. Later we will show another way to represent the dates that avoids such impossible dates. The dates can be created as follows. In Excel, use the function “randbetween(x, y)”, which returns a random integer from x to y, inclusive. In cell B2, type “=randbetween(1, 12)” to generate a birth month. In cell C2, type “=randbetween(1, 31)” to generate a birth day. Drag these formulas down to generate 50 random birthdates. The data in columns B and C now represent 50 randomly generated birthdays. For example, a 3 in cell B2 and a 24 in cell C2 represent the 24th of March.

We now need to browse through this list and search for a repeated date, which can be time-consuming and inconvenient. To simplify this part of the process, the dates may be converted to three- or four-digit numbers by entering the formula =100*B2+C2 in cell D2 and dragging it down column D. Once this is done, the first one or two digits of each number in column D represent the month, while the last two digits represent the day. For example, 225 in the list indicates the 25th of February, while 1019 indicates the 19th of October.
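For readers who want to check the spreadsheet logic outside Excel, here is a minimal Python sketch of step 3. Python and every function name in it (`random_birthdays`, `encode`, `decode`, `is_valid`, `find_matches`) are our own hypothetical choices, not part of the activity. `random.randint` plays the role of randbetween (both include their endpoints), `encode` reproduces the =100*B2+C2 column, and `find_matches` applies the sort-then-compare-neighbours idea used to spot repeats. The validity check assumes a non-leap year.

```python
import random

# Days in each month of a non-leap year (index 0 = January); an assumption
# used only to flag "bogus" dates such as Feb 30 (code 230).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def random_birthdays(n, rng=random):
    """Mimic columns B and C: n (month, day) pairs drawn like
    randbetween(1, 12) and randbetween(1, 31). Bogus dates can appear,
    just as in the spreadsheet."""
    return [(rng.randint(1, 12), rng.randint(1, 31)) for _ in range(n)]


def encode(month, day):
    """Column D: =100*B2+C2, so (3, 24) becomes 324."""
    return 100 * month + day


def decode(code):
    """Undo the encoding: the last two digits are the day."""
    return code // 100, code % 100


def is_valid(code):
    """True if the encoded date exists in a non-leap year."""
    month, day = decode(code)
    return 1 <= month <= 12 and 1 <= day <= DAYS_IN_MONTH[month - 1]


def find_matches(codes):
    """Sort the codes (like sorting the pasted column) and return every
    value that appears in two successive positions."""
    ordered = sorted(codes)
    return sorted({a for a, b in zip(ordered, ordered[1:]) if a == b})


if __name__ == "__main__":
    codes = [encode(m, d) for m, d in random_birthdays(50)]
    print("sorted:", sorted(codes))
    print("matches:", find_matches(codes))
    print("all dates valid:", all(is_valid(c) for c in codes))
```

Passing a seeded `random.Random(...)` as `rng` makes a run repeatable, which can help when comparing results across a group.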
The list of dates in column D needs to be sorted so that a birthday match appears as two successive equal numbers and can be easily identified. Because randbetween produces new numbers every time the sheet recalculates, first freeze the list as plain values: select column D, go to Edit, select Copy, click on a column away from the data (for example, any column from F onward), go to Edit, select Paste Special, click on Values, and then click OK. All the numbers of column D are now copied, in the same order, into the new column. Now click on the new column and select the sort (ascending) option from the toolbar. Once the dates are sorted, a match can be easily identified.

4. The experiment may be run about 10 times to see that nearly every simulation of 50 birthdays (representing the birthdays of 50 randomly selected people) contains at least one match. The pitfall of this simulation process is that impossible dates (such as 431, that is, April 31st) may appear in a list. In that case the entire list should be ignored and the simulation repeated. After each participant has run the experiment 10 times with 50 random birthdays, find the experimental probability for the whole group. Is it greater than 50%? How many people do we need for a probability of 50%?

5. Let us analyze the problem using probability theory. Begin the discussion by finding the probability of a match in groups of 2, 3, 4, and 5 people. Once a pattern is evident, participants can generalize it to a formula for the probability of a match in a group of n people. Assuming that all 365 days are equally likely and ignoring February 29th, the probability that two strangers do not share a birthday is

$$P_{\text{no match}}(2) = \frac{365}{365}\cdot\frac{364}{365} = \frac{364}{365},$$

and the probability of a match is the complement of this event:

$$P_{\text{match}}(2) = 1 - \frac{365}{365}\cdot\frac{364}{365} = \frac{1}{365} \approx 0.00274.$$
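The pattern asked for in step 5 can be checked with a short Python sketch (the language and the names `p_no_match` and `p_match` are our own, hypothetical choices). It multiplies the no-match factors 365/365, 364/365, 363/365, and so on exactly with `fractions.Fraction`, then takes the complement, so the decimals quoted in this step for 2, 3, 4, and 5 people can be compared against it. Like the text, it assumes 365 equally likely birthdays.

```python
from fractions import Fraction


def p_no_match(n, days=365):
    """(365/365)(364/365)...((365 - (n - 1))/365), computed exactly."""
    p = Fraction(1)
    for k in range(n):
        # The k-th person must avoid the k birthdays already taken.
        p *= Fraction(days - k, days)
    return p


def p_match(n, days=365):
    """Complement: at least two of the n people share a birthday."""
    return 1 - p_no_match(n, days)


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, "people:", round(float(p_match(n)), 6))
```

Working with exact fractions postpones all rounding to the final print; the last factor for n people is (365 − (n − 1))/365, which is the pattern participants are asked to find.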
The probability that three strangers do not share a birthday is

$$P_{\text{no match}}(3) = \frac{365}{365}\cdot\frac{364}{365}\cdot\frac{363}{365} \approx 0.991796,$$

and the complement, in which at least one pair of matching birthdays exists, is

$$P_{\text{match}}(3) = 1 - \frac{365}{365}\cdot\frac{364}{365}\cdot\frac{363}{365} \approx 0.008204.$$

It should be emphasized that in a group of 3 people there are three possible cases:

i. All three people have distinct birthdays.
ii. Two people share a birthday while the third has a different birthday.
iii. All three people share the same birthday.

Since the three cases are mutually exclusive and exhaustive, their probabilities sum to 1. The probability that at least two people share a birthday covers cases (ii) and (iii), so it can be obtained by subtracting the probability of case (i) from 1.

For a group of four strangers, the probability that a matching pair is present is

$$P_{\text{match}}(4) = 1 - \frac{365}{365}\cdot\frac{364}{365}\cdot\frac{363}{365}\cdot\frac{362}{365} \approx 0.016356.$$

At this stage, participants should develop a general formula for the probability of a matching pair of birthdays among n random birthdays. The general formula involves permutations:

$$P_{\text{match}}(n) = 1 - \frac{365!}{(365-n)!\,365^{n}}.$$

While generalizing the formula, participants may need help relating the last number of the product in the numerator to the group size n. For example, the last number for n = 3 is 365 − 2 = 363, for n = 4 it is 365 − 3 = 362, and for n = 5 it is 365 − 4 = 361; for n = k it is 365 − (k − 1). Once the general product is obtained, factorials can be used to write it concisely, since $365 \times 364 \times \cdots \times (365-n+1) = \frac{365!}{(365-n)!}$.

6. We will use this formula to explore in Excel the probability of a match for different numbers of people. The factorial of 365 (about 2.5 × 10^778) is far too large for Excel to handle, so we cannot evaluate the formula in one step; instead we will build it in stages. In column B we will count people. Type 1 in cell B2 and 2 in cell B3.
Select both cells and drag them down 100 cells or so. In column C we will count down by one from 365: type “365” in cell C2, and in cell C3 type “=C2-1”. Drag this formula down to the same row as column B. In column D we will compute the products 365 × 364 × … (the numerator of the formula). In cell D2 type “=product($C$2)”; the $ signs keep this cell fixed in the formula. In cell D3 type “=product($C$2:C3)” and drag this formula down. It multiplies all the numbers in column C from C2 (fixed) down to the current row. These numbers grow very quickly, and Excel will display them in scientific notation. In column E we will calculate 365^n, with the values of n taken from column B: in cell E2 type “=$C$2^B2” and drag it down. In column F we will calculate the probability: in cell F2 type “=1-(D2/E2)” and drag it down. To convert the results in column F to percentages, type “=F2*100” in cell G2, drag it down, and format the numbers to show only 2 decimal places. How many people are needed for a probability of about 50%? Let us make a graph of these percentages: select all the numbers in column G, click on Charts, and select “Line”.

7. We will now run simulations in Excel to see whether the experimental probabilities are close to the theoretical probabilities for this problem. This time we will number the days of the year from 1 to 365; for example, day 32 represents February 1st. This way we avoid impossible dates like April 31st (we will not consider leap years). In column B we will number the people used in the simulation, 23 in this case, in cells B2 through B24. In column C we will generate random numbers from 1 to 365 using “=randbetween(1,365)”. To find the matches, we will practice with conditional formatting. Select the numbers in column C and click on Conditional Formatting. We will use a formula to decide which cells to highlight. The formula is “=countif(C$2:C$24,C2)>1”.
When matches occur, Excel will highlight those cells. Run this simulation 20 times by pressing Command and the equal sign (=) key at the same time (Mac version; on Windows, press F9); each press generates a new set of random numbers. Count how many times you see highlighted cells (not how many cells are highlighted). Calculate the experimental probability.
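Steps 6 and 7 can also be mirrored in a short Python sketch (our choice of language; the helper names `exact_table`, `smallest_group`, and `simulate` are hypothetical). `exact_table` builds the same running product as columns C through F, but it divides by 365 one factor at a time instead of computing the numerator and 365^n separately, so every intermediate value stays between 0 and 1 and nothing overflows; this is a swapped-in variation, not the spreadsheet's column-by-column method. `simulate` repeats the step 7 experiment: day numbers 1 to 365, 23 people, 20 runs, counting runs that contain a repeated day, which is what the countif rule highlights. It assumes all 365 days are equally likely.

```python
import random


def exact_table(max_people=100, days=365):
    """Mirror columns B-F: P(match) for n = 1..max_people.

    Multiplying by one ratio (365 - (n - 1))/365 at a time keeps every
    intermediate value between 0 and 1, so nothing overflows."""
    table = {}
    p_none = 1.0
    for n in range(1, max_people + 1):
        p_none *= (days - (n - 1)) / days
        table[n] = 1 - p_none
    return table


def smallest_group(target=0.5, days=365):
    """Smallest n whose match probability is at least `target`."""
    p_none, n = 1.0, 0
    while 1 - p_none < target:
        n += 1
        p_none *= (days - (n - 1)) / days
    return n


def simulate(people=23, runs=20, days=365, rng=random):
    """Step 7: draw `people` day numbers per run (like randbetween(1,365))
    and return the fraction of runs in which some day appears more than
    once, i.e. runs where countif(...)>1 would highlight cells."""
    hits = 0
    for _ in range(runs):
        birthdays = [rng.randint(1, days) for _ in range(people)]
        if len(set(birthdays)) < people:  # at least one repeated day
            hits += 1
    return hits / runs


if __name__ == "__main__":
    table = exact_table()
    for n in (10, 20, 22, 23, 30, 50):
        print(f"{n:3d} people: {100 * table[n]:.2f}%")
    print("smallest group reaching 50%:", smallest_group())
    print("experimental probability, 20 runs:", simulate())
```

With only 20 runs, the estimate from `simulate` will vary noticeably from one run of the script to the next; raising `runs` to a few thousand brings it much closer to the exact value in the table.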