Introduction to Probability
The way we approach probability was thought up by an Italian mathematician named Gerolamo
Cardano (1501-1576), who invented it to help himself make a living by gambling. His theory was further
developed by the French mathematicians Pierre de Fermat (1601 or a few years later-1665) and Blaise
Pascal (1623-1662). They figured out a way to divide the stakes fairly in a game of chance which is
interrupted before it can be finished. I’m telling you this because it puts a human face on the whole
subject.
Sample Space and Event
The first idea is that of a probability experiment, which is a general term for something you do that
results in a certain outcome. It’s easier to understand with an example: You flip a coin. It lands with
heads or tails up (we don’t consider it ever landing on an edge). It’s one or the other, each time you flip
it. Landing with heads up is an outcome, and so is landing with tails up. Let’s label these outcomes H and
T. Then the set {H, T} is called the sample space, S, of the experiment. The number of outcomes in the
sample space is its size, and our notation for this is the function notation n(S), pronounced “n of S.” So S
= {H, T} and n(S)=2 in this experiment. Not so hard.
How about if your experiment is rolling a die (the singular form of “dice”)? Then there are six possible
outcomes – the 1 is facing up, the 2 is facing up, etc. – and we can write S = {1, 2, 3, 4, 5, 6}, and n(S)=6.
What about picking a card from a standard 52-card deck? Well, here you have to specify what the
experiment is. Are you looking at which suit you got? If so the sample space is S = {Club, Diamond, Spade,
Heart} and n(S)=4. If you’re looking at what rank you got, S = {2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King,
Ace} and n(S)=13. If you’re looking at both aspects, suit and rank, then S = {2 of Clubs, 3 of Clubs, … , King
of Hearts, Ace of Hearts} and n(S)=52. Or you could be looking to see if the card is red or black, in which
case S = {Red, Black} and n(S)=2. Or you could be looking to see if you got a face card (jack, queen, or
king) or not, and S = {Face Card, Not Face Card} and n(S)=2. We could probably go on like this for quite a
while, but you get the idea. The sample space and its size depend on what aspect of the card you’re
focusing on.
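Here is a minimal Python sketch of the card example, just to show that the same deck gives different sample spaces depending on what you record. The names deck, sample_space, color, and face are my own choices for illustration, not anything standard.

```python
# Build a standard 52-card deck as (rank, suit) pairs.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "Jack", "Queen", "King", "Ace"]
SUITS = ["Club", "Diamond", "Spade", "Heart"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def sample_space(aspect):
    """Return the set of distinct outcomes when only `aspect` of a card is recorded."""
    return {aspect(card) for card in deck}


def color(card):
    # Diamonds and hearts are red; clubs and spades are black.
    return "Red" if card[1] in ("Diamond", "Heart") else "Black"


def face(card):
    # Face cards are the jack, queen, and king.
    return "Face Card" if card[0] in ("Jack", "Queen", "King") else "Not Face Card"


by_suit = sample_space(lambda card: card[1])
by_rank = sample_space(lambda card: card[0])
by_card = sample_space(lambda card: card)
by_color = sample_space(color)
by_face = sample_space(face)

for name, space in [("suit", by_suit), ("rank", by_rank),
                    ("suit and rank", by_card), ("color", by_color),
                    ("face card or not", by_face)]:
    print(f"{name}: n(S) = {len(space)}")
```

The sizes printed are 4, 13, 52, 2, and 2, matching the counts above.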
One further concept is called an event. An event, E, is some set or collection of the outcomes in a
sample space (or maybe none of the outcomes). For example, with rolling a die, we could talk about the
event “rolling an even number.” Then E = {2, 4, 6}. We call its size n(E). In this case, n(E)=3. Here’s
another event: “rolling at least a 5.” Here E = {5, 6} and n(E)=2. The event could encompass the entire sample
space – “rolling less than a 7,” which is E = {1, 2, 3, 4, 5, 6} and n(E)=6, or could have absolutely no
outcomes – “rolling more than a 6,” which is E = { }, or ∅, which is the mathematical symbol for the
empty set, and n(E)=0.
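If it helps to see events as actual sets, here is a small Python sketch of the die events just described. The variable names (even, at_least_5, and so on) are mine, chosen only for this illustration.

```python
# Sample space for rolling one die.
S = {1, 2, 3, 4, 5, 6}

# Each event is the subset of S that matches a description.
even = {x for x in S if x % 2 == 0}      # "rolling an even number"
at_least_5 = {x for x in S if x >= 5}    # "rolling at least a 5"
less_than_7 = {x for x in S if x < 7}    # the entire sample space
more_than_6 = {x for x in S if x > 6}    # the empty set

for label, E in [("even", even), ("at least 5", at_least_5),
                 ("less than 7", less_than_7), ("more than 6", more_than_6)]:
    print(f"{label}: E = {sorted(E)}, n(E) = {len(E)}")
```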
Anything you can describe could be an event, or you could just list the outcomes in the event without
describing what they have in common.
Now we’ll move on to the probability experiment of flipping two coins and seeing what sides are face up.
We have to have some way of distinguishing the two coins. Maybe one is a dime and the other is a
quarter. Or one could be gold and the other could be silver (or think of a penny and a dime). Or one is
named Coin #1 and the other Coin #2. When I refer to the outcome HH, I mean both were heads. When I
mention HT, I mean that the first coin came up heads and the other one tails, but when I write TH, I
mean that the first coin came up tails and the other one heads. This is an important distinction to
remember.
So what’s the sample space? S = {HH, HT, TH, TT}, and n(S)=4. We could talk about the event “getting
two heads.” Let’s give this event the notation 2H. Then 2H = {HH}, and n(2H)=1. Using similar notation,
we get 1H = {HT, TH}, and n(1H)=2. And, of course, 0H = {TT}, and n(0H)=1.
How about flipping three coins? Well, the sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT},
so n(S)=8. (If you’re seeing a pattern of the sizes of the sample spaces here, good for you! Yes, for one
coin it’s 2^1 = 2, for two coins 2^2 = 4, and for three coins 2^3 = 8.) Here are the events of getting
various numbers of heads, and their sizes:
3H = {HHH}, n(3H)=1
2H = {HHT, HTH, THH}, n(2H)=3
1H = {HTT, THT, TTH}, n(1H)=3
0H = {TTT}, n(0H)=1
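Listing outcomes by hand gets tedious quickly, so here is a sketch of how you might enumerate them in Python. The function names coin_sample_space and heads_event are made up for this example; itertools.product is the standard-library tool doing the real work.

```python
from itertools import product


def coin_sample_space(n_coins):
    """List every outcome of flipping n_coins distinguishable coins, e.g. 'HTH'."""
    return ["".join(flips) for flips in product("HT", repeat=n_coins)]


def heads_event(space, k):
    """Return the outcomes in `space` that contain exactly k heads."""
    return [outcome for outcome in space if outcome.count("H") == k]


S = coin_sample_space(3)
print("S =", S, " n(S) =", len(S))

# Events 3H, 2H, 1H, 0H and their sizes.
for k in range(3, -1, -1):
    E = heads_event(S, k)
    print(f"{k}H = {E}, n({k}H) = {len(E)}")
```

Changing the 3 to another number of coins shows the sample space doubling in size with each added coin.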
Having set up the structure of the sample space, outcomes, and events, we can finally get to probability.
And this is classical probability, which simply means that all outcomes are equally likely to occur each
time you perform the experiment. The coins are fair, which means that they’re equally likely to land
with the heads up or the tails up each time they’re flipped; the dice are fair, which means... well, you get
the idea. It doesn’t mean that on every two flips you’ll get one head and one tail, or that every six rolls
of a die will yield a 1, a 2, etc. It just means that in the long run the outcomes will even out in frequency.
(Much later in the course we’ll talk about how you can show that certain dice aren’t fair, or if you can’t
show that then you’ll conclude that maybe they are fair.)
We talk about the probability of an event, E, occurring. This is a number between 0 and 1 – always! We
write P(E), pronounced “P of E,” in the manner of function notation. So we can write:
0 ≤ P(E) ≤ 1
to express the fact that the probability of an event is between 0 and 1. It’s 0 if the event is impossible
(like rolling a 7 with one die), it’s 1 if it’s a sure thing, and if it might or might not happen (what we call a
contingent event), it’s bigger than 0 and less than 1. And the closer the probability is to 1 the more
confident we are of it happening.
So here’s the formula for determining the probability of an event E:
P(E) = n(E) / n(S)
In other words, the probability is the fraction of the outcomes in the sample space that are also in the
event. If you ever get a probability that is less than 0 or greater than 1, you've made a mistake.
Let’s go back to rolling the die. The probability of rolling a 4 is 1/6, because there’s one outcome in the
event “rolling a 4” and 6 in the sample space of “rolling a die.” The probability of rolling at least a 5 is
2/6=1/3. The probability of rolling an even number is 3/6=1/2.
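The die calculations can be checked with a short Python sketch. The function name probability is my own; it simply counts outcomes and divides, and it assumes all outcomes in the sample space are equally likely. Python's Fraction type reduces fractions automatically, so 2/6 shows up as 1/3.

```python
from fractions import Fraction


def probability(event, space):
    """Classical probability n(E)/n(S), assuming equally likely outcomes."""
    if not event <= space:
        raise ValueError("every outcome in the event must be in the sample space")
    return Fraction(len(event), len(space))


S = {1, 2, 3, 4, 5, 6}

print(probability({4}, S))        # rolling a 4
print(probability({5, 6}, S))     # rolling at least a 5
print(probability({2, 4, 6}, S))  # rolling an even number
print(probability(set(), S))      # an impossible event
print(probability(S, S))          # a sure thing
```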
Let’s look at the coin flipping. Refer to the sample spaces and their sizes and the events of getting a
certain number of heads and their sizes to get these results:
One coin:
P(1H)=P(0H)=1/2
Two coins:
P(2H)=1/4
P(1H)=2/4
P(0H)=1/4
Three coins:
P(3H)=1/8
P(2H)=3/8
P(1H)=3/8
P(0H)=1/8
The numerators of these probabilities form some amazing patterns, which you may be familiar with if
you know about Pascal’s Triangle.
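If you'd like to see the pattern for yourself, here is a sketch that counts the outcomes with each number of heads by brute force and compares the counts with a row of Pascal's Triangle. The function name heads_counts is mine; math.comb(n, k) is the standard-library function for the entries of the triangle (it needs Python 3.8 or later).

```python
from itertools import product
from math import comb


def heads_counts(n_coins):
    """Count outcomes with 0, 1, ..., n_coins heads by listing every outcome."""
    counts = [0] * (n_coins + 1)
    for flips in product("HT", repeat=n_coins):
        counts[flips.count("H")] += 1
    return counts


for n in range(1, 5):
    counted = heads_counts(n)
    pascal_row = [comb(n, k) for k in range(n + 1)]
    print(f"{n} coin(s): counted {counted}, Pascal row {pascal_row}")
```

Dividing each count by 2^n gives the probabilities listed above, with the two-coin middle value left as 2/4 rather than reduced.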
Empirical Probability and Frequency
Think about how the probability formula is dependent on the notion of fairness, or equally-likely
outcomes. If a coin weren’t fair, we couldn’t say that P(1H)=1/2. We would have no idea what the
probability of getting a head would be, except by flipping the coin many, many times and seeing what
fraction of the flips come up heads.
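Here is a rough simulation of that idea. Everything in it is an assumption made for illustration: the function name estimate_p_heads, the pretend "true" chance of heads (0.7), and the fixed seed that makes the run repeatable. In real life you wouldn't know the true value; that's the whole point of flipping.

```python
import random


def estimate_p_heads(true_p, n_flips, seed=0):
    """Flip a simulated unfair coin n_flips times and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < true_p for _ in range(n_flips))
    return heads / n_flips


# Hypothetical unfair coin that lands heads 70% of the time.
for n in (10, 100, 10_000, 100_000):
    print(f"{n} flips: estimated P(heads) = {estimate_p_heads(0.7, n):.4f}")
```

With only a handful of flips the estimate can be well off; it tends to settle near the true value as the number of flips grows.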
In other words, we’d have to observe what actually happens and count the frequencies of particular
outcomes. This is the basis for empirical probability, in which probabilities are determined not by sample
spaces with equally-likely outcomes but by observing the fraction of times the probability experiment
(like flipping the coin) comes out one way or another. The word “empirical” comes from a Greek word
meaning experienced. Empirical probability is the result of experience, not deduction. Since empirical
probability must be directly based on frequencies of observable events, this way of looking at
probability is also called a "frequentist" approach. For example, what is the chance of you being hit by
lightning within this year? Without a nice mathematical model, our best bet is looking at the frequency
of lightning strikes compiled by national weather services.
By dividing the number of reported injuries and deaths (400) by the size of the U.S. population
(300,000,000), we arrive at the figure 1/750,000, although it's obviously a very rough estimate. For
example, what about all the strikes that were unreported? Also, by using the U.S. population as the
denominator, we are assuming that each person in the U.S. is a random experiment with two outcomes:
"being struck" and "not being struck." But we all know that lightning strikes don't happen like coin flips: certain weather conditions contribute to lightning more than others. By not sheltering under a tree in a
thunderstorm, we can probably greatly reduce the probability of being struck by lightning.
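The arithmetic behind that estimate is just one division, sketched here in Python with the two figures quoted above. The variable names are mine.

```python
from fractions import Fraction

reported_casualties = 400       # reported injuries and deaths (figure from the text)
us_population = 300_000_000     # rough U.S. population (figure from the text)

estimate = Fraction(reported_casualties, us_population)
print(estimate)          # 1/750000
print(float(estimate))   # the same estimate as a decimal
```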
This might lead you to wonder how we can call a coin fair if not by using empirical probability, by flipping it
lots of times. It would be possible to make a coin without any markings, except maybe for labels of
heads and tails written in very light-weight ink, and with uniform density and symmetry, and then we
could assume that it is fair because of the physics of flipping it.
Later in the course we’ll see how we can conclude that certain processes (rolls of dice is what we’ll use)
are most likely fair or not using empirical probability, but for now we’ll just say that the assumption of
fairness is an abstract one that enables us to develop the classical theory of probability using equally-likely
outcomes. The discussion can get philosophical very quickly if we keep pursuing the meaning of
probability.
Contingency Tables
Now that we've seen some examples of empirical probability, we're ready to introduce a more
sophisticated way of counting frequencies. It’s called a contingency table, and it’s a way of tabulating
data from two separate variables for each subject in the sample. For instance, looking at the Class Data,
we might wonder whether there are differences between men and women in the favorite colors they
pick. You might get some idea of this by just looking at the colors listed for the men and for the women,
but it would be hard to draw conclusions from just the lists. So we categorize each person in the data
base by their sex and by their favorite color, and we present the results in a neat table.
Immediately we encounter a problem, or at least something that needs to be decided (called a protocol
if you’re analyzing data). Whereas it’s easy to say there will be two categories for sex, what about for
favorite color? If we list all the colors separately there will be too many categories for our minds to take
in comfortably. We have to break the colors down into a few groups. There are endless ways to do this.
Each way would produce a different contingency table. We just have to choose one way and stick to it.
Here’s my way: Primary (blue, red, and yellow), Secondary (green, orange, and purple), and Other
(everything else, like white and brown and black and silver and anything with white in it, like a pastel).
Even with this decision, there are individual colors that we have to assign to one of the three
categories. Let’s call maroon Primary, turquoise Secondary, and lavender Other. Let’s just
leave out the person who didn’t have a favorite color. We then make a table that looks like this:
          Primary   Secondary   Other
Male
Female
Now we’re ready to put numbers in the six cells in the table. We could start with tally marks, looking at
each line in the Class Data Base and putting a tally mark in the correct box. For instance, Person #1,
being a male who chose blue, would be tallied in the upper left box. When we finish, the table would
look like this:
          Primary   Secondary   Other
Male         14          9         2
Female       12         24        11
That’s all we’ll do with the contingency table this time, just tabulate it. I hope you agree that it makes
the patterns of favorite colors of the two sexes a lot more apparent. For women, Other was almost as
popular as Primary, but Secondary was the most popular, whereas for men Primary outstripped
Secondary, and Other was hardly there! This is just the beginning of the kind of analysis we’ll be doing.
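The tallying itself is easy to hand over to a computer. Here is a sketch in Python that follows the grouping protocol described above. The six records are made up purely for illustration (they are not the Class Data), and the names color_group, records, and table are my own.

```python
from collections import Counter

# Grouping protocol from the text; anything not listed counts as Other.
PRIMARY = {"blue", "red", "yellow", "maroon"}
SECONDARY = {"green", "orange", "purple", "turquoise"}


def color_group(color):
    """Assign a favorite color to Primary, Secondary, or Other."""
    c = color.strip().lower()
    if c in PRIMARY:
        return "Primary"
    if c in SECONDARY:
        return "Secondary"
    return "Other"


# Hypothetical (sex, favorite color) records, NOT the actual Class Data.
records = [
    ("Male", "blue"), ("Female", "green"), ("Female", "lavender"),
    ("Male", "maroon"), ("Female", "purple"), ("Male", "silver"),
    ("Female", None),  # no favorite color: left out of the table
]

# One tally per (sex, color group) cell, skipping people with no favorite color.
table = Counter((sex, color_group(color)) for sex, color in records if color)

groups = ("Primary", "Secondary", "Other")
print(f"{'':8}" + "".join(f"{g:>11}" for g in groups))
for sex in ("Male", "Female"):
    print(f"{sex:8}" + "".join(f"{table[(sex, g)]:>11}" for g in groups))
```

A Counter returns 0 for any cell that never got a tally, so empty cells print as 0 without any extra work.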