Algebra 1 Summer Institute 2014
The Birthday Paradox
1. When is your birthday (month and day only)? Is there another participant with the
same birthday?
2. How many people are required for it to be more likely than not that at least two
people in the group share the same birthday? In other words, how many people
must be in a room for the chance that two of them share a birthday to reach
50%?
Decide with the group what number of people we should use to find a match. Let
us use Excel and the number 50 for the following instructions:
3. The first step is to randomly generate 50 integers between 1 and 12, inclusive, to
represent the month, and then 50 integers between 1 and 31, inclusive, to
represent the day. For now we will disregard any “bogus” dates, like Feb 30th;
later we will show another way to represent the dates that avoids them.
The dates can be created like this:
In Excel, use the command “randbetween(x, y)” to generate the random numbers.
In cell B2, type “=randbetween(1, 12)” to generate the birth month. In cell C2,
type “=randbetween(1, 31)” to generate the birth day. Drag these formulas down to
generate 50 random birthdates.
The data in columns B and C now represent 50 randomly generated birthdays,
listed by month and day. For example, a 3 in cell B2 and a 24 in cell C2
represent the date 24th March. We now
need to browse through this list and search for a repeated date. This can be time
consuming and inconvenient. In order to simplify this part of the process the dates
may be converted to three or four digit numbers by entering the formula
=100*B2+C2 in column D. Once this is done the first one or two digits of each
number in column D will represent the month while the last two digits will
represent the day. For example, the appearance of 225 in the list indicates 25th of
February while 1019 indicates 19th of October.
The list of dates appearing in column D needs to be sorted so that a birthday
match can appear as two successive numbers and therefore be easily identified.
To do this we select column D, go to Edit, select Copy, click on a column away
from the data (for example choose a column from column F onwards), go to Edit,
select Paste Special and click on values and then click on OK. This will ensure
that all the numbers of column D will now be copied in the same sequence in the
new column. Now click on the new column and select the sort (in ascending
order) option from the toolbar. Once the dates are sorted a match can be easily
identified.
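For readers who want to check the spreadsheet outside Excel, the same generate, encode, sort and scan procedure can be sketched in Python. This is only an illustrative sketch, not part of the original activity: the function names simulate_birthday_codes and find_matches are hypothetical, and the seed is there only to make a run repeatable.

```python
import random

def simulate_birthday_codes(group_size=50, seed=None):
    """Mimic columns B, C and D: random month, random day, and 100*month + day."""
    rng = random.Random(seed)
    codes = []
    for _ in range(group_size):
        month = rng.randint(1, 12)  # plays the role of =randbetween(1, 12)
        day = rng.randint(1, 31)    # plays the role of =randbetween(1, 31); bogus dates can occur
        codes.append(100 * month + day)
    return codes

def find_matches(codes):
    """Sort the codes and report every value that sits next to an equal value."""
    ordered = sorted(codes)
    matches = []
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current and current not in matches:
            matches.append(current)
    return matches

codes = simulate_birthday_codes(50, seed=2014)
print(sorted(codes))
print("Repeated dates:", find_matches(codes))
```

As in the worksheet, a code such as 225 reads as 25th of February, and a match shows up as two equal neighbors in the sorted list.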
4. The experiment may be run about 10 times to check whether each simulation of
50 birthdays (representing the birthdays of 50 randomly selected people) contains
at least one match. The pitfall of this simulation process is that impossible dates
(such as 431, that is, 31 April) may appear in a particular list. In such a case
the entire list can be ignored and the simulation repeated.
After each participant has done the experiment 10 times with 50 random
birthdays, find the experimental probability for the whole group. Is it better than
50%? How many people do we need to have a probability of 50%?
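Pooling many runs can also be sketched in Python. The sketch below is an assumption-laden illustration: the names has_match and experimental_probability are hypothetical, and unlike the instructions above it does not discard lists that contain impossible dates, so its estimates describe the month-and-day simulation rather than real calendars.

```python
import random

def has_match(group_size, rng):
    """One simulated group: True if any month/day code appears twice."""
    codes = [100 * rng.randint(1, 12) + rng.randint(1, 31) for _ in range(group_size)]
    return len(set(codes)) < len(codes)

def experimental_probability(group_size, trials, seed=None):
    """Fraction of simulated groups that contain at least one match."""
    rng = random.Random(seed)
    hits = sum(has_match(group_size, rng) for _ in range(trials))
    return hits / trials

for size in (10, 23, 50):
    print(size, "people:", experimental_probability(size, trials=1000, seed=1))
```

Trying several group sizes this way gives a feel for where the experimental probability crosses 50% before the theory is introduced.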
5. Let us analyze the problem using probability theory. Begin the discussion by
finding the probability of a match in a group of 2 people, 3, 4, and 5. Once a
pattern is evident, participants can easily generalize it to find the formula for the
probability of a match in a group size of n persons.
The probability that two strangers do not share a birthday is:
Pno Match(2) = 365/365 x 364/365 = 364/365
assuming that neither of them was born in a leap year, with the probability of a
match being the complement of this event, that is:
Pmatch(2) = 1 -(365/365) x (364/365), or .00274.
The probability that three strangers do not share a birthday is
Pno Match(3) = (365/365)(364/365)(363/365), or .991796,
with the complement, where a pair of matching birthdays exists, being
Pmatch(3) =1 - (365/365)(364/365)(363/365) = 0.008204.
It needs to be emphasized here that in a group of 3 people, there are three possible
cases:
i. All three people have distinct birthdays
ii. Two people have the same birthday while the third has a different birthday
iii. All three have the same birthday
Since the three cases are mutually exclusive and exhaustive, the sum of their
probabilities is 1. Thus the probability that at least two people have the same
birthday includes cases (ii) and (iii) and can be obtained by subtracting the
probability of (i) from 1.
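The claim that the three cases add up to 1 can be checked with exact fractions. The short Python sketch below counts ordered triples of birthdays, assuming 365 equally likely days; the variable names are illustrative only.

```python
from fractions import Fraction

DAYS = 365
total = DAYS ** 3  # number of equally likely ordered triples of birthdays

# Case (i): all three birthdays distinct.
all_distinct = Fraction(DAYS * (DAYS - 1) * (DAYS - 2), total)
# Case (ii): choose which pair matches (3 ways), their shared day, and a different day.
exactly_two = Fraction(3 * DAYS * (DAYS - 1), total)
# Case (iii): all three share one day.
all_same = Fraction(DAYS, total)

print("Sum of the three cases:", all_distinct + exactly_two + all_same)
print("P(at least one match):", round(float(exactly_two + all_same), 6))
```

Adding cases (ii) and (iii) directly gives the same value as subtracting case (i) from 1.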
For a group of four strangers, the probability that a matching pair is present
is
Pmatch(4) = 1 - (365/365)(364/365)(363/365)(362/365) = 0.016356
At this stage, the participants should generate a general formula that calculates the
probability of a matching pair of birthdays for n random birthdays. The general
formula involves permutations and is
Pmatch(n) = 1 - 365! / [(365 - n)! x 365^n]
While generalizing the formula, participants may need help in relating the last
number of the product in the numerator to the group size, n. For example, the last
number for n=3 is 365–2=363, for n=4 it is 365–3=362, for n=5 it is 365–4=361;
for n=k, it is 365 – (k – 1). Once the generalized expression is obtained the
knowledge of factorials may be used to write the expression in concise manner.
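The general formula can be evaluated directly in Python, whose integers are not limited in size. The sketch below is one possible check, assuming 365 equally likely birthdays; the function name p_match is hypothetical.

```python
from fractions import Fraction
from math import factorial

def p_match(n, days=365):
    """Pmatch(n) = 1 - days! / ((days - n)! * days**n), computed exactly, then rounded to a float."""
    if n > days:
        return 1.0  # more people than days forces a match
    no_match = Fraction(factorial(days), factorial(days - n) * days ** n)
    return float(1 - no_match)

for n in (2, 3, 4, 5):
    print(n, round(p_match(n), 6))

# Smallest group size for which a match is more likely than not.
smallest = next(n for n in range(1, 366) if p_match(n) > 0.5)
print("Smallest n with Pmatch(n) > 0.5:", smallest)
```

The values for n = 2, 3 and 4 agree with the hand calculations above.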
6. We are going to use this formula to explore in Excel the probabilities of matches
for different numbers of people. The factorial of 365 is far larger than the largest
number Excel can store, so we cannot enter the formula all at once. We will need
to build it in stages.
In column B we will have a count of people. Type 1 in cell B2 and 2 in cell
B3. Select both cells and drag them down 100 cells or so.
In column C we will count down from 365 by one. Type “365” in cell C2, and in
cell C3 type “=C2-1”. Drag this formula down to the same row as the last entry
in column B.
In column D we will have the product of 365 x 364 and so on (the numerator of
the formula). In cell D2 type “=product($C$2)”; the $ signs keep this cell
fixed in the formula. In cell D3 type “=product($C$2:C3)”. Drag this formula
down. It multiplies all the numbers in column C, beginning with C2
(fixed) and ending at the current row. These numbers become large very quickly,
and Excel will write them in scientific notation.
In column E we will calculate 365^n with the values of n taken from column B.
In cell E2 type “=$C$2^B2”, and drag it down.
In column F we will calculate the probability. In cell F2 type “=1-(D2/E2)”, and
drag it down.
We are going to change the results from column F to percentages. In cell G2 type
“=F2*100”. Format the numbers so we can see 2 decimals only.
How many people are needed to have a probability around 50%?
Let us make a graph of these percentages. Select all the numbers in column G,
click on Charts and select “line”.
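The staged spreadsheet calculation can be mirrored in Python, one variable per column. This is a sketch for comparison only: probability_table is a hypothetical name, and the table is kept to 100 rows like the worksheet because much longer tables would overflow floating-point numbers.

```python
def probability_table(rows=100, days=365):
    """Rebuild columns B to G: people, countdown, running product, 365^n, probability, percent."""
    table = []
    numerator = 1.0
    for people in range(1, rows + 1):        # column B: 1, 2, 3, ...
        remaining = days - (people - 1)      # column C: 365, 364, 363, ...
        numerator *= remaining               # column D: running product of column C
        denominator = float(days) ** people  # column E: 365^n
        probability = 1 - numerator / denominator         # column F
        table.append((people, round(100 * probability, 2)))  # column G, 2 decimals
    return table

table = probability_table()
for people, percent in table[:5]:
    print(people, percent)

first_over_half = next(people for people, percent in table if percent >= 50)
print("First group size at or above 50%:", first_over_half)
```

Plotting the second entry of each row against the first reproduces the line chart described above.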
7. We are going to do some simulations in Excel to see if the experimental
probabilities are close to the mathematical probabilities for this problem.
In this case, we will number the days of the year from 1 to 365. For example day
32 represents Feb 1st. This way we will not have problems with dates like April
31st (we will not consider leap years).
In column B we will count the number of people that we will use in the
simulation, 23 in this case. In column C we will generate random numbers from 1
to 365 using “=randbetween(1,365)”.
To find the matches, we will practice with conditional formatting. Select the
numbers in column C and click on conditional formatting. We are going to use a
formula to select what cells to highlight. The formula is
“=countif(C$2:C$24,C2)>1”. When matches occur, Excel will highlight those
cells.
Run this simulation 20 times by pressing Command and the equal sign key at the
same time (Mac version) to recalculate the sheet. Count how many times you see
highlighted cells (not how many cells are highlighted). Calculate the
experimental probability.
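The day-number simulation and the countif test can be imitated in Python as a cross-check. The sketch below assumes 365 equally likely days; highlighted_days and run_trials are hypothetical names, and the seed only makes the run repeatable.

```python
import random
from collections import Counter

def highlighted_days(birthdays):
    """Return the entries that countif(...)>1 would highlight: days that occur more than once."""
    counts = Counter(birthdays)
    return [day for day in birthdays if counts[day] > 1]

def run_trials(group_size=23, trials=20, seed=None):
    """Fraction of trials in which at least one cell would be highlighted."""
    rng = random.Random(seed)
    runs_with_match = 0
    for _ in range(trials):
        birthdays = [rng.randint(1, 365) for _ in range(group_size)]  # like =randbetween(1,365)
        if highlighted_days(birthdays):
            runs_with_match += 1  # count the run once, not the number of highlighted cells
    return runs_with_match / trials

print("20 runs:", run_trials(23, trials=20, seed=7))
print("5000 runs:", run_trials(23, trials=5000, seed=7))
```

With only 20 runs the estimate can land well away from the theoretical value; a larger number of runs usually brings it closer.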