Introduction to Probability

The way we approach probability was thought up by an Italian mathematician named Gerolamo Cardano (1501-1576), who invented it to help himself make a living by gambling. His theory was further developed by the French mathematicians Pierre de Fermat (1601 or a few years later-1665) and Blaise Pascal (1623-1662). They figured out a way to divide the stakes fairly in a game of chance that is interrupted before it can be finished. I'm telling you this because it puts a human face on the whole subject.

Sample Space and Event

The first idea is that of a probability experiment, which is a general term for something you do that results in a certain outcome. It's easier to understand with an example: You flip a coin. It lands with heads or tails up (we don't consider it ever landing on an edge). It's one or the other, each time you flip it. Landing with heads up is an outcome, and so is landing with tails up. Let's label these outcomes H and T. Then the set {H, T} is called the sample space, S, of the experiment. The number of outcomes in the sample space is its size, and our notation for this is the function notation n(S), pronounced "n of S." So S = {H, T} and n(S) = 2 in this experiment. Not so hard.

How about if your experiment is rolling a die (the singular form of "dice")? Then there are six possible outcomes (the 1 is facing up, the 2 is facing up, etc.), and we can write S = {1, 2, 3, 4, 5, 6}, and n(S) = 6.

What about picking a card from a standard 52-card deck? Well, here you have to specify what the experiment is. Are you looking at which suit you got? If so, the sample space is S = {Club, Diamond, Spade, Heart} and n(S) = 4. If you're looking at what rank you got, S = {2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, Ace} and n(S) = 13. If you're looking at both aspects, suit and rank, then S = {2 of Clubs, 3 of Clubs, … , King of Hearts, Ace of Hearts} and n(S) = 52. Or you could be looking to see whether the card is red or black, in which case S = {Red, Black} and n(S) = 2. Or you could be looking to see whether you got a face card (jack, queen, or king) or not, and S = {Face Card, Not Face Card} and n(S) = 2. We could probably go on like this for quite a while, but you get the idea. The sample space and its size depend on what aspect of the card you're focusing on.

One further concept is called an event. An event, E, is some set or collection of the outcomes in a sample space (or maybe none of the outcomes). For example, with rolling a die, we could talk about the event "rolling an even number." Then E = {2, 4, 6}. We call its size n(E). In this case, n(E) = 3. Here's another event: "rolling at least a 5." Here E = {5, 6} and n(E) = 2. The event could encompass the entire sample space, like "rolling less than a 7," which is E = {1, 2, 3, 4, 5, 6} and n(E) = 6, or it could have absolutely no outcomes, like "rolling more than a 6," which is E = { }, or φ, the mathematical symbol for the empty set, and n(E) = 0. Anything you can describe could be an event, or you could just list the outcomes in the event without describing what they have in common.
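If you'd like to check these counts with a computer, here is a minimal sketch in Python (the variable names are my own, chosen just for illustration): a sample space and its events are simply sets, and len() plays the role of n(S) and n(E).

```python
# A minimal sketch: sample spaces and events as Python sets.
# The names below are made up for this example.

S = {1, 2, 3, 4, 5, 6}               # sample space for rolling a die
E_even = {2, 4, 6}                   # event: rolling an even number
E_at_least_5 = {5, 6}                # event: rolling at least a 5
E_less_than_7 = {1, 2, 3, 4, 5, 6}   # event: the whole sample space
E_more_than_6 = set()                # event: the empty set

print(len(S))              # n(S) = 6
print(len(E_even))         # n(E) = 3
print(len(E_at_least_5))   # n(E) = 2
print(len(E_more_than_6))  # n(E) = 0
```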
Now we'll move on to the probability experiment of flipping two coins and seeing which sides are face up. We have to have some way of distinguishing the two coins. Maybe one is a dime and the other is a quarter. Or one could be gold and the other could be silver (or think of a penny and a dime). Or one is named Coin #1 and the other Coin #2. When I refer to the outcome HH, I mean both were heads. When I mention HT, I mean that the first coin came up heads and the other one tails, but when I write TH, I mean that the first coin came up tails and the other one heads. This is an important distinction to remember.

So what's the sample space? S = {HH, HT, TH, TT}, and n(S) = 4. We could talk about the event "getting two heads." Let's give this event the notation 2H. Then 2H = {HH}, and n(2H) = 1. Using similar notation, we get 1H = {HT, TH}, and n(1H) = 2. And, of course, 0H = {TT}, and n(0H) = 1.

How about flipping three coins? Well, the sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, so n(S) = 8. (If you're seeing a pattern in the sizes of the sample spaces here, good for you! Yes, for one coin it's 2^1 = 2, for two coins it's 2^2 = 4, and for three coins it's 2^3 = 8.) Here are the events of getting various numbers of heads, and their sizes:

3H = {HHH}, n(3H) = 1
2H = {HHT, HTH, THH}, n(2H) = 3
1H = {HTT, THT, TTH}, n(1H) = 3
0H = {TTT}, n(0H) = 1

Having set up the structure of the sample space, outcomes, and events, we can finally get to probability. And this is classical probability, which simply means that all outcomes are equally likely to occur each time you perform the experiment. The coins are fair, which means that they're equally likely to land with the heads up or the tails up each time they're flipped; the dice are fair, which means... well, you get the idea. It doesn't mean that on every two flips you'll get one head and one tail, or that every six rolls of a die will yield a 1, a 2, etc. It just means that in the long run the outcomes will even out in frequency. (Much later in the course we'll talk about how you can show that certain dice aren't fair, or, if you can't show that, how you conclude that maybe they are fair.)

We talk about the probability of an event, E, occurring. This is a number between 0 and 1 – always! We write P(E), pronounced "P of E," in the manner of function notation. So we can write

0 ≤ P(E) ≤ 1

to express the fact that the probability of an event is between 0 and 1. It's 0 if the event is impossible (like rolling a 7 with one die), it's 1 if it's a sure thing, and if it might or might not happen, it's bigger than 0 and less than 1. And the closer the probability is to 1, the more confident we are of it happening. So here's the formula for determining the probability of an event E:

P(E) = n(E) / n(S)

In other words, the probability is the fraction of the outcomes in the sample space that are also in the event. If you ever get a probability that is greater than 1, you've probably made a mistake.

Let's go back to rolling the die. The probability of rolling a 4 is 1/6, because there's one outcome in the event "rolling a 4" and 6 in the sample space of "rolling a die." The probability of rolling at least a 5 is 2/6 = 1/3. The probability of rolling an even number is 3/6 = 1/2.

Let's look at the coin flipping. Referring to the sample spaces, the events of getting a certain number of heads, and their sizes, we get these results:

One coin: P(1H) = P(0H) = 1/2
Two coins: P(2H) = 1/4, P(1H) = 2/4, P(0H) = 1/4
Three coins: P(3H) = 1/8, P(2H) = 3/8, P(1H) = 3/8, P(0H) = 1/8

The numerators of these probabilities form some amazing patterns, which you may be familiar with if you know about Pascal's Triangle.
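Here is a small sketch of the formula P(E) = n(E) / n(S) in Python, using the die events from above to check the fractions 1/6, 1/3, and 1/2. The helper name prob is my own invention, made up just for this example.

```python
from fractions import Fraction

# A sketch of the classical formula P(E) = n(E) / n(S), applied to one die.
def prob(E, S):
    return Fraction(len(E), len(S))

S = {1, 2, 3, 4, 5, 6}

print(prob({4}, S))        # rolling a 4: 1/6
print(prob({5, 6}, S))     # rolling at least a 5: 1/3
print(prob({2, 4, 6}, S))  # rolling an even number: 1/2
```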
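And here is a sketch, again in Python, that rebuilds the three-coin sample space by brute force and recovers the probabilities 1/8, 3/8, 3/8, 1/8 listed above; itertools.product does the work of writing out all 2^3 outcomes.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2^3 = 8 outcomes of flipping three coins, then count how
# many outcomes land in each event kH ("exactly k heads").
S = list(product("HT", repeat=3))   # outcomes look like ('H', 'T', 'H')

for k in (3, 2, 1, 0):
    E = [outcome for outcome in S if outcome.count("H") == k]
    print(f"P({k}H) =", Fraction(len(E), len(S)))

# P(3H) = 1/8
# P(2H) = 3/8
# P(1H) = 3/8
# P(0H) = 1/8
```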
Empirical Probability and Frequency

Think about how the probability formula depends on the notion of fairness, or equally-likely outcomes. If a coin weren't fair, we couldn't say that P(1H) = 1/2. We would have no idea what the probability of getting a head would be, except by flipping the coin many, many times and seeing what fraction of the flips come up heads. In other words, we'd have to observe what actually happens and count the frequencies of particular outcomes. This is the basis for empirical probability, in which probabilities are determined not by sample spaces with equally-likely outcomes but by observing the fraction of times the probability experiment (like flipping the coin) comes out one way or another. The word "empirical" comes from a Greek word meaning experienced. Empirical probability is the result of experience, not deduction. Since empirical probability must be based directly on frequencies of observable events, this way of looking at probability is also called a "frequentist" approach.

For example, what is the chance of your being hit by lightning within this year? Without a nice mathematical model, our best bet is looking at the frequency of lightning strikes compiled by national weather services. By dividing the number of reported injuries and deaths (400) by the size of the U.S. population (300,000,000), we arrive at the figure 1/750,000, although it's obviously a very rough estimate. For example, what about all the strikes that went unreported? Also, by using the U.S. population as the denominator, we are assuming that each person in the U.S. is a random experiment with two outcomes, "being struck" and "not being struck." But we all know that lightning strikes don't happen like coin flips: certain weather conditions contribute to lightning more than others. By avoiding hiding under a tree in a thunderstorm, we can probably greatly reduce our probability of being struck by lightning.

This might lead you to wonder how we can call a coin fair, if not by using empirical probability, that is, by flipping it lots of times. It would be possible to make a coin without any markings (except maybe for labels of heads and tails written in very lightweight ink), with uniform density and symmetry, and then we could assume that it is fair because of the physics of flipping it. Later in the course we'll see how we can conclude that certain processes (rolls of dice is what we'll use) are most likely fair or not using empirical probability, but for now we'll just say that the assumption of fairness is an abstract one that enables us to develop the classical theory of probability using equally-likely outcomes. The discussion can get philosophical very quickly if we keep pursuing the meaning of probability.
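To see empirical probability in action, here is a small simulation sketch in Python. The 0.6 bias is made up purely for illustration; the point is that the probability is estimated from observed frequencies rather than deduced from a sample space of equally-likely outcomes.

```python
import random

# Simulate a coin we do NOT assume is fair (the 0.6 bias is invented for
# this example), and estimate P(heads) from the observed frequency.
def flip_unfair_coin():
    return "H" if random.random() < 0.6 else "T"

flips = [flip_unfair_coin() for _ in range(100_000)]
print(flips.count("H") / len(flips))   # close to 0.6; closer with more flips

# The lightning estimate in the text is the same kind of calculation:
print(400 / 300_000_000)               # about 0.0000013, i.e. 1/750,000
```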
Contingency Tables

Now that we've seen some examples of empirical probability, we're ready to introduce a more sophisticated way of counting frequencies. It's called a contingency table, and it's a way of tabulating data from two separate variables for each subject in the sample. For instance, looking at the Class Data, we might wonder whether there are differences between men and women in the favorite colors they pick. You might get some idea of this by just looking at the colors listed for the men and for the women, but it would be hard to draw conclusions from just the lists. So we categorize each person in the data base by their sex and by their favorite color, and we present the results in a neat table.

Immediately we encounter a problem, or at least something that needs to be decided (called a protocol if you're analyzing data). Whereas it's easy to say there will be two categories for sex, what about for favorite color? If we list all the colors separately, there will be too many categories for our minds to take in comfortably. We have to break the colors down into a few groups. There are endless ways to do this, and each way would produce a different contingency table; we just have to choose one way and stick to it. Here's my way: Primary (blue, red, and yellow), Secondary (green, orange, and purple), and Other (everything else, like white and brown and black and silver and anything with white in it, like a pastel). Even with this decision, there are individual colors for which we have to choose which of the three categories they belong to. Let's call maroon Primary, turquoise Secondary, and lavender Other. Let's just leave out the person who didn't have a favorite color.

We then make a table that looks like this:

          Primary   Secondary   Other
Male
Female

Now we're ready to put numbers in the six cells of the table. We could start with tally marks, looking at each line in the Class Data Base and putting a tally mark in the correct box. For instance, Person #1, being a male who chose blue, would be tallied in the upper left box. When we finish, the table looks like this:

          Primary   Secondary   Other
Male         14          9          2
Female       12         24         11

That's all we'll do with the contingency table this time, just tabulate it. I hope you agree that it makes the patterns of favorite colors of the two sexes a lot more apparent. For women, Other was almost as popular as Primary, but Secondary was the most popular, whereas for men Primary outstripped Secondary, and Other was hardly there! This is just the beginning of the kind of analysis we'll be doing.
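If you're curious how the tallying could be automated, here is a sketch in Python. The six records below are invented just to show the mechanism; in practice each record would be one person's sex and color group taken from the Class Data.

```python
# Tally a contingency table from (sex, color group) records.
# The records here are made up; real ones would come from the Class Data.

records = [
    ("Male", "Primary"), ("Female", "Secondary"), ("Male", "Secondary"),
    ("Female", "Other"), ("Female", "Primary"), ("Male", "Primary"),
]

rows = ["Male", "Female"]
cols = ["Primary", "Secondary", "Other"]
table = {(r, c): 0 for r in rows for c in cols}

for sex, group in records:
    table[(sex, group)] += 1          # one tally mark per person

print("        " + "".join(f"{c:>11}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{table[(r, c)]:>11}" for c in cols))
```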