Probability
Topic 4

Probability
• Until now we've discussed descriptive statistics
  – Methods for organizing and summarizing data
  – Graphical and numerical displays
  – Linear regressions
• Collection and production of data
  – Sample design
  – Observational studies
  – Design of Experiments
  – Simulation

Probability
• We will now study a few more topics that will set the stage for our study of inferential statistics.
• We've already mentioned that inferential statistics involves drawing conclusions about a population based on data collected from a sample of that population.

Probability
• Because we are drawing conclusions about the population based on a sample, we can never be certain that our conclusions are 100% correct.
• That is, there is some uncertainty.
• Probability is the study of uncertainty and provides part of the foundation for our study of inferential statistics: confidence intervals and hypothesis testing.

Spinning Wheel Activity
• This experiment consists of spinning the spinner 3 times and recording the numbers in the order they occur. We want to determine the proportion of times that at least one digit occurs in its correct position. For example, in the number 123 all of the digits are in their proper positions, but in the number 331 none are.

Guess, Experiment, Theory
• First, let's guess the proportion of times at least one digit will occur in its proper position.
• How would you simulate this if you didn't have a spinner?
• Let's do the activity.
• Later we'll calculate the theoretical probability.

Probability
• Chance exists all around us.
• Human design: casinos
• In nature: the sex of a child
• Probability is the study of chance or uncertainty.
• Chance behavior, like our activity, seems haphazard and unpredictable in the short run but has a regular and predictable pattern in the long run.

Let's do another activity
• Flipping a coin
• Let's flip it 100 times and record/graph the proportion of heads.
• In the short run, this proportion is unpredictable.
• In the long run, the proportion of heads approaches the probability of getting a head, 0.5.

Random vs Haphazard
• Random is not synonymous with haphazard.
• Random describes a kind of order that emerges in the long run.
  – Uncertain in the short run, but a regular pattern in the long run
• We often encounter the unpredictable side of randomness in everyday life, but we rarely see enough repetitions of the same phenomenon to observe the long-run regularity that probability describes.
• Probability is the proportion of times an outcome occurs in the long run: its long-term relative frequency.
• Probability theory is the study of uncertainty, the study of random behavior.

Terminology
• Chance experiment: an activity or situation in which there is uncertainty about which of many possible outcomes will occur.
  – Our spinning wheel activity
  – Equal likelihood model
• Sample space: the collection of all possible outcomes of a chance experiment
  – 123, 121, 122, 111, …
  – Tree diagram

More Terminology
• Event: a collection of outcomes from the sample space of a chance experiment
  – A: second digit is correct
• Simple event: an event that consists of exactly one outcome
  – B: all digits are correct

Notation and Displaying Events
• Notation: E = the event that all digits are in the correct place
• Venn diagrams
  [Venn diagram: the event E and its complement, not E]

Relationship Among Events
• Complement: $A^C$
  – The event that A does not occur
  – Venn diagram
• Intersection: $A \cap B$, "A and B"
  – The event that both A and B occur
  – Venn diagram
• Union: $A \cup B$, "A or B"
  – The event that either A or B (or both) occurs

Examples
• Playing cards
  – A = card selected is the king of hearts
  – B = card selected is a king
  – C = card selected is a heart
  – D = card selected is a face card
  a. (not D)
  b. (B and C)
  c. (B or C)
• Describe the events.

More about relationships
• Mutually exclusive (disjoint) events: two or more events are mutually exclusive if at most one of them can occur when the experiment is performed, that is, if no two of them have outcomes in common.
• E = card selected is black
• F = card selected is a diamond
• D = card selected is a face card
• What is the relationship between E and F? Between E and D? Between F and D?

Approaches to Probability
• Probability is the study of uncertainty: the mathematical characterization of uncertainty.
• Notation: P(A) = the probability that event A occurs
• Interpretation of probability:
  – Near 0: unlikely
  – Near 1: very likely
  – Frequentist interpretation: the proportion of times an event occurs in a large number of repetitions of the experiment
• The Classical Approach
• The Relative Frequency Approach (Empirical Approach)
• The Subjective Approach

Subjective Approach
• We've already seen this approach during the spinning wheel activity, when I asked you to guess the proportion of times at least one digit would land in its correct place.
• We use the subjective approach to quantify the likelihood of events all the time:
  – Should I wear a raincoat?
  – Should I speed?
  – Should I do my homework?

Empirical Approach
• Also called the relative frequency approach
• Using an experiment or a simulation allows us to determine probabilities with the empirical approach:
  $P(E) \approx \frac{\text{number of times } E \text{ occurs}}{\text{total number of trials}}$
• As the number of repetitions of a chance experiment increases, the chance that the relative frequency of an event differs from the true probability of that event by more than a small amount approaches 0.

Law of Large Numbers
• As the number of trials grows larger and larger, the relative frequency of an event approaches the true probability of that event.
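The spinner activity, the empirical approach, and the Law of Large Numbers can be tied together in a short simulation. This is a minimal Python sketch that assumes the spinner has three equally likely sectors labeled 1, 2, and 3 (the slides do not state the labels; this is inferred from the examples 123 and 331). It computes the classical probability by listing all 27 outcomes, then estimates the same probability by relative frequency for increasing numbers of trials. The function names are hypothetical.

```python
import itertools
import random
from fractions import Fraction

# Assumption: the spinner has three equally likely sectors labeled 1, 2, 3.
DIGITS = (1, 2, 3)
# Digit k is "in its correct position" when it appears in position k.
CORRECT = (1, 2, 3)


def at_least_one_match(outcome):
    """Return True if at least one digit lands in its own position."""
    return any(d == c for d, c in zip(outcome, CORRECT))


def classical_probability():
    """Favorable outcomes divided by all 3**3 = 27 equally likely outcomes."""
    sample_space = list(itertools.product(DIGITS, repeat=3))
    favorable = sum(at_least_one_match(o) for o in sample_space)
    return Fraction(favorable, len(sample_space))


def empirical_probability(trials, rng):
    """Relative frequency of the event over `trials` simulated runs of 3 spins."""
    hits = 0
    for _ in range(trials):
        outcome = tuple(rng.choice(DIGITS) for _ in range(3))
        if at_least_one_match(outcome):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    exact = classical_probability()
    print(f"classical: {exact} = {float(exact):.4f}")
    rng = random.Random(2024)  # fixed seed so the run can be repeated
    for n in (10, 100, 1_000, 10_000, 100_000):
        freq = empirical_probability(n, rng)
        print(f"n = {n:>7}: relative frequency = {freq:.4f}, "
              f"gap = {abs(freq - float(exact)):.4f}")
```

Under this assumption the classical count gives 19/27 ≈ 0.704, because only the 2 × 2 × 2 = 8 outcomes with no digit in place are unfavorable. The printed relative frequencies wander for small n and settle near that value as n grows; the exact numbers depend on the seed.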
Examples of the Empirical Approach
• Spinner activity
• Flipping the coin
• When outcomes are equally likely, you can arrive at the correct probability using either the classical or the empirical approach (the empirical value is an approximation that improves with more trials).
• Not so when outcomes are not equally likely: then the classical approach does not apply.

Classical Approach
• First, some notation: P(E) denotes the probability of an event E.
• The classical approach says
  $P(E) = \frac{\text{number of outcomes favorable to } E}{\text{number of outcomes in the sample space}}$
• To use the classical approach to determine probabilities, all outcomes must be equally likely, as in games of chance.
• Go back to our spinner activity and calculate the probability.

Limitation to the Classical Approach
• Consider accident rates of young adults driving cars. Who do you think pays a higher insurance rate, an 18-year-old or a 35-year-old? Why?
• Consider an insurance policy for an 18-year-old and one for a 35-year-old. For each driver there are 2 outcomes: having an accident and not having an accident. The classical approach treats these outcomes as equally likely, so it assigns each driver a probability of 0.5 of having an accident, which defies reason and experience.

Basic Properties of Probability
• For any event E, $0 \le P(E) \le 1$.
• If S is the sample space for an experiment, then $P(S) = 1$.
• If two events E and F are mutually exclusive, then $P(E \cup F) = P(E) + P(F)$.
  – General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• For any event E, $P(E) + P(E^C) = 1$.

Conditional Probability
• Sometimes the knowledge that an event occurred changes the likelihood that another event will occur.
• A simple example. Let's say we have 2 events:
  – A: You choose a face card
  – B: You choose a king
  – Without any other information, what is P(B)?
  – What is the probability of choosing a king given that you have chosen a face card?
• Let's say 1% of the population has a certain disease. You can't tell whether you have the disease unless you are tested. Let's say that 80% of the people who test positive have the disease, and 20% of those who test positive don't.
  – E: You have the disease
  – F: You test positive
  – Without any information, what is P(E)?
  – Suppose you test positive. Now what is the probability that you have the disease? In other words, what is the probability you have the disease given that you test positive?

Notation for Conditional Probability
• $P(E \mid F)$
• Read as "the probability of event E given that F has occurred"

An example using two-way tables
• The ASU Statistical Summary provides information on various characteristics of the ASU faculty. Data on the age and rank of ASU faculty are presented in the table below.

| Age   | Full Prof | Assoc Prof | Asst Prof | Inst | Total |
|-------|-----------|------------|-----------|------|-------|
| <30   |         2 |          3 |        57 |    6 |    68 |
| 30-39 |        52 |        170 |       163 |   17 |   402 |
| 40-49 |       156 |        125 |        61 |    6 |   348 |
| 50-59 |       145 |         68 |        36 |    4 |   253 |
| >60   |        75 |         15 |         3 |    0 |    93 |
| Total |       430 |        381 |       320 |   33 |  1164 |

Determine some probabilities
• Let's select a faculty member at random.
• P(Asst Prof)
• P(30-39)
• P(Full Prof | 40-49)
• P(50-59 | Asst Prof)

Definition
• $P(E \mid F) = \frac{P(E \cap F)}{P(F)}$, provided $P(F) > 0$
• Let's use this definition to determine
  – P(Inst | 30-39)
  – P(40-49 | Asst Prof)

A Summary
• Probability: the study of uncertainty
• Chance experiment
• Event / simple event / sample space
• Venn diagrams: forming new events
• Complement, union, intersection, disjoint events
• Determining probabilities
  – Classical approach
  – Empirical approach
• Law of Large Numbers
• Properties of probability
• Conditional probability

Independence
• In conditional probability, the knowledge that one event has occurred affects the likelihood that some other event will occur.
• It is also possible that knowing one event has occurred does not change the probability of a second event; such events are independent.
• Can you think of a simple chance experiment in which events are independent?
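The definition $P(E \mid F) = P(E \cap F)/P(F)$ can be applied mechanically to the faculty table. Here is a minimal Python sketch: the counts are copied from the age-by-rank table, while the dictionary layout and the helper names (`p_joint`, `p_rank_given_age`, and so on) are hypothetical choices for illustration. `Fraction` keeps every answer exact.

```python
from fractions import Fraction

# Faculty counts copied from the age-by-rank table; "Inst" = instructor.
RANKS = ("Full Prof", "Assoc Prof", "Asst Prof", "Inst")
COUNTS = {
    "<30": (2, 3, 57, 6),
    "30-39": (52, 170, 163, 17),
    "40-49": (156, 125, 61, 6),
    "50-59": (145, 68, 36, 4),
    ">60": (75, 15, 3, 0),
}
GRAND_TOTAL = sum(sum(row) for row in COUNTS.values())


def p_joint(age, rank):
    """P(age and rank) for one faculty member selected at random."""
    return Fraction(COUNTS[age][RANKS.index(rank)], GRAND_TOTAL)


def p_age(age):
    """Marginal probability of an age group (row total / grand total)."""
    return Fraction(sum(COUNTS[age]), GRAND_TOTAL)


def p_rank(rank):
    """Marginal probability of a rank (column total / grand total)."""
    j = RANKS.index(rank)
    return Fraction(sum(row[j] for row in COUNTS.values()), GRAND_TOTAL)


def p_rank_given_age(rank, age):
    """P(rank | age) = P(rank and age) / P(age)."""
    return p_joint(age, rank) / p_age(age)


def p_age_given_rank(age, rank):
    """P(age | rank) = P(age and rank) / P(rank)."""
    return p_joint(age, rank) / p_rank(rank)


if __name__ == "__main__":
    questions = [
        ("P(Asst Prof)", p_rank("Asst Prof")),
        ("P(30-39)", p_age("30-39")),
        ("P(Full Prof | 40-49)", p_rank_given_age("Full Prof", "40-49")),
        ("P(50-59 | Asst Prof)", p_age_given_rank("50-59", "Asst Prof")),
        ("P(Inst | 30-39)", p_rank_given_age("Inst", "30-39")),
        ("P(40-49 | Asst Prof)", p_age_given_rank("40-49", "Asst Prof")),
    ]
    for label, value in questions:
        print(f"{label} = {value} ≈ {float(value):.4f}")
```

Because the joint and marginal probabilities share the denominator 1164, each conditional probability reduces to a ratio of counts within a single row or column; for example, P(Full Prof | 40-49) = 156/348.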
An example from the text

|         | Single Family (F) | Condo (C) | Multi Family | Total |
|---------|-------------------|-----------|--------------|-------|
| ARM (E) | .40               | .21       | .09          | .70   |
| Fixed   | .10               | .09       | .11          | .30   |
| Total   | .50               | .30       | .20          | 1.00  |

Determine the following probabilities:
• The mortgage is an ARM given that it is for a single-family home
• The mortgage is for a single-family home given that it is an ARM
• The mortgage is an ARM given that it is for a condo
• The mortgage is an ARM

Definition
• Two events E and F are said to be independent if $P(E \mid F) = P(E)$.
• If E and F are not independent, that is, if $P(E \mid F) \ne P(E)$, they are called dependent events.
• If $P(E \mid F) = P(E)$, then $P(F \mid E) = P(F)$.
• For independent events, nothing we learn about one event changes the likelihood of the other event.

Multiplication Rule for Independent Events
• If events E and F are independent, then $P(E \cap F) = P(E)\,P(F)$.

Sampling with and without replacement
• We've talked about sampling in earlier topics. We made sure our samples were random: basically, the probability of being selected is the same for each individual and for each sample of size n.
• When we sample with replacement, this is easy to accomplish.
• In reality, we usually sample without replacement, but then successive selections are not independent, are they?
• Under certain circumstances, the lack of independence when sampling without replacement is not a cause for concern.

Relatively small sample
• If a random sample of size n is taken from a population of size N, independence can be assumed when N is at least 20 times as large as n. The theoretical probabilities of selecting without replacement then differ insignificantly from the theoretical probabilities of selecting with replacement.
• Examples 6.20 and 6.21

HW I (Independence)
• Read POD Section 6.5
• Problems 6.36, 6.37, 6.38, 6.43, 6.44, 6.51, 6.52
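The independence definition can be checked directly against the mortgage table, and a second small function shows why a large population makes sampling without replacement behave almost like sampling with replacement. This is a Python sketch: the joint probabilities come from the table, and `Fraction` is used so that an equality test such as P(E | F) = P(E) is exact rather than subject to floating-point rounding. The helper names and the hypothetical population in which 30% of items are "successes" are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Joint probabilities from the mortgage table: (mortgage type, property type).
JOINT = {
    ("ARM", "Single Family"): Fraction("0.40"),
    ("ARM", "Condo"): Fraction("0.21"),
    ("ARM", "Multi Family"): Fraction("0.09"),
    ("Fixed", "Single Family"): Fraction("0.10"),
    ("Fixed", "Condo"): Fraction("0.09"),
    ("Fixed", "Multi Family"): Fraction("0.11"),
}


def p_type(mtype):
    """Marginal P(mortgage type): the row total."""
    return sum(p for (t, _), p in JOINT.items() if t == mtype)


def p_property(prop):
    """Marginal P(property type): the column total."""
    return sum(p for (_, h), p in JOINT.items() if h == prop)


def p_type_given_property(mtype, prop):
    return JOINT[(mtype, prop)] / p_property(prop)


def p_property_given_type(prop, mtype):
    return JOINT[(mtype, prop)] / p_type(mtype)


def independent(mtype, prop):
    """E and F are independent exactly when P(E | F) = P(E)."""
    return p_type_given_property(mtype, prop) == p_type(mtype)


def both_successes(N, K, replace):
    """P(two draws are both successes) from N items, K of which are successes."""
    first = Fraction(K, N)
    second = Fraction(K, N) if replace else Fraction(K - 1, N - 1)
    return first * second


if __name__ == "__main__":
    print("P(ARM | Single Family) =", float(p_type_given_property("ARM", "Single Family")))
    print("P(Single Family | ARM) =", round(float(p_property_given_type("Single Family", "ARM")), 4))
    print("P(ARM | Condo) =", float(p_type_given_property("ARM", "Condo")))
    print("P(ARM) =", float(p_type("ARM")))
    print("ARM, Condo independent?", independent("ARM", "Condo"))
    print("ARM, Single Family independent?", independent("ARM", "Single Family"))
    # Hypothetical population in which 30% of the items are "successes".
    for N in (10, 100, 1_000, 10_000):
        K = 3 * N // 10
        w = both_successes(N, K, replace=True)
        wo = both_successes(N, K, replace=False)
        print(f"N = {N:>6}: with = {float(w):.5f}, without = {float(wo):.5f}")
```

With these numbers, P(ARM | Single Family) = 0.8 differs from P(ARM) = 0.7, so those events are dependent, while P(ARM | Condo) = 0.21/0.3 = 0.7 = P(ARM), so ARM and Condo are independent; equivalently, P(ARM ∩ Condo) = 0.21 = 0.7 × 0.3, as the multiplication rule requires. In the sampling loop, the with- and without-replacement probabilities differ noticeably for N = 10 and become nearly identical as N grows.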