5 – Probability (Chapter 3 of the optional text) Basic ideas about probabilities: What they are where they come from Simple probability models Properties of probabilities (sample space, events, unions, interactions, etc…) Conditional Probabilities and the Concept of Independence Baye’s Rule How to calculate probabilities: Using table of counts Using properties of probabilities, such as independence. I toss a fair coin (where ‘fair’ means ‘equally likely outcomes’) What are the possible outcomes? What is the probability it will turn up heads? I choose a duck nest at random and look at predation What are the possible outcomes? What is the probability the duck nest is predated? A probability is a number….. WHERE DO PROBABILITIES COME FROM? Probabilities from models (games, genetics, certain types of random experiments) The probability of getting a four when a fair dice is rolled is ________. Probabilities from data (or _________________ probabilities) What is the probability that a randomly selected starling is female? – In a random sample of n = 67 starlings 40 are found to female. – The estimated probability that a randomly chosen starling is female is 36 Subjective probabilities: The probability that there will be another outbreak of ebola in Africa within the next year is 0.1. The probability of snow in the next 24 hours is very high or 80%. A doctor may state that a patient’s chances of a full recovery are 70%. PROBABILITIES FROM DATA - SOME BASIC IDEAS Example 1 - Below is a table containing the results of treatment for patients with different types of Hodgkin’s disease. The data was collected by taking a random sample of 538 patients diagnosed with some form of Hodgkin’s disease thus both type of Hodgkin’s and response to treatment are random. Response to Treatment None Partial Positive Type of LD 44 Hodgkin’s LP 12 Disease MC 58 NS 12 COLUMN 126 TOTALS 10 18 54 16 98 ROW TOTALS 18 72 74 104 154 266 68 96 314 n = 538 Histological Types of Hodgkin’s Disease LD = lymphocyte depletion LP = lymphocyte predominant MC = mixed cellularity NS = nodular schlerosis For a patient selected at random from these 538 Hodgkin’s patients, find the probability that the patient: (a) had a positive response (b) had at least some response to treatment. (c) had LP and had a positive response to treatment. (d) had LP or NS for their histological type. 37 CONDITIONAL PROBABILITY and INDEPENDENCE • The sample space is reduced. • Key words that indicate conditional probability are: given, amongst, for those with, … Conditional Probability “The probability of event A occurring given that event B has already occurred” is written in shorthand as Formal Definition P(A | B) = Independence Events A and B are said to be independent if Example 1 (cont’d) - Conditional Probabilities from Hodgkin’s Example Response to Treatment None Partial Positive Type of LD 44 Hodgkin’s LP 12 Disease MC 58 NS 12 COLUMN 126 TOTALS 10 18 54 16 98 ROW TOTALS 18 72 74 104 154 266 68 96 314 n = 538 38 Example 2 - A study was conducted in 1991 by the University of Wisconsin and the Wisconsin Department of Transportation in which linked police reports and hospital discharge records were used to assess, among other things, the risk for head injury for motorcyclists in motor-vehicle crashes. The data shown below can be used to examine the relationship between helmet use and whether brain injury was sustained in the accident. Brain Injury Helmet Worn 17 No Brain Injury Row Totals 977 994 No Helmet 97 1918 2015 Column Totals 114 2895 3009 For information on how to do this type of analysis in JMP see the tutorial Bivariate Displays for Categorical Data on the course website. a) What is the probability that a motorcycle accident victim in Wisconsin suffered brain injury? b) What is the probability that a motorcyclist involved in an accident was wearing a helmet? Can this be used to estimate the probability that a randomly sampled motorcyclist in WI wears a helmet? c) What is the probability that a motorcyclist suffered brain injury given that they were wearing a helmet? d) What is the probability that a motorcyclist not wearing a helmet suffered brain injury? e) How many times more likely is a motorcyclist not wearing a helmet to sustain a brain injury? This ratio is called the ________________ or __________________ . 39 RELATIVE RISK (RR) and ODDS RATIO (OR) The Odds for an event A are defined as The Odds Ratio associated with a “risk factor” are defined as Example 3 - Age at First Pregnancy and Cervical Cancer A case-control study was conducted to determine whether there was increased risk of cervical cancer amongst women who had their first child before age 25. A sample of 49 women with cervical cancer was taken of which 42 had their first child before the age of 25. From a sample of 317 “similar” women without cervical cancer it was found that 203 of them had their first child before age 25. Do these data suggest that having a child at or before age 25 increases risk of cervical cancer? Cervical Cancer: Case or Control Case Control Column Totals Age at First Age < 25 Pregnancy Age > 25 Row Totals n= a) Why can’t we meaningfully calculated P(cervical cancer|risk factor status)? b) Find P(risk factor|disease status) for each group of women. c) What are the odds for the risk factor amongst the cases? Amongst the controls? d) What is odds ratio for having the risk factor associated with being a case? e) Even though it is not appropriate to do so, calculate the P(disease|risk factor status) and the odds for disease for both risk factor groups. 40 f) Finally calculate the odds ratio for having cervical cancer associated with having first pregnancy at or before age 25. What do we find? Why do you suppose the OR is much more commonly used than RR? Properties of the OR: 1) OR = ________ Risk factor present Risk factor absent Disease Present (case) a c Disease Absent (control) b d 2) When the disease is rare in the population being studied, e.g. P(disease) < .10, then there is little difference between the RR and OR, with the difference getting smaller the rarer the disease is. Thus for many diseases RR OR , which makes it easier to discuss and interpret odds ratios because we can state the results in terms of how many times more likely something is rather than using a multiplicative statement in terms of the odds. Age at First Pregnancy and Cervical Cancer (cont’d) Disease Status Age at 1st Pregnancy Case Control Row Totals Age < 25 Age > 25 Column Totals a b 42 203 c d 7 114 121 49 317 n = 366 245 41