5 – Probability

advertisement
5 – Probability
(Chapter 3 of the optional text)
Basic ideas about probabilities:
 What they are where they come from
 Simple probability models
 Properties of probabilities (sample space, events, unions, interactions, etc…)
 Conditional Probabilities and the Concept of Independence
 Baye’s Rule
 How to calculate probabilities:
 Using table of counts
 Using properties of probabilities, such as independence.
I toss a fair coin (where ‘fair’ means ‘equally likely outcomes’)

What are the possible outcomes?

What is the probability it will turn up heads?
I choose a duck nest at random and look at predation

What are the possible outcomes?

What is the probability the duck nest is predated?
A probability is a number…..
WHERE DO PROBABILITIES COME FROM?
Probabilities from models (games, genetics, certain types of random experiments)
The probability of getting a four when a fair dice is rolled is ________.
Probabilities from data (or _________________ probabilities)
What is the probability that a randomly selected starling is female?
– In a random sample of n = 67 starlings 40 are found to female.
– The estimated probability that a randomly chosen
starling is female is
36
Subjective probabilities:
The probability that there will be another outbreak of ebola in Africa within the next year is 0.1.
The probability of snow in the next 24 hours is very high or 80%.
A doctor may state that a patient’s chances of a full recovery are 70%.
PROBABILITIES FROM DATA - SOME BASIC IDEAS
Example 1 - Below is a table containing the results of treatment for patients with different types of
Hodgkin’s disease. The data was collected by taking a random sample of 538 patients diagnosed with
some form of Hodgkin’s disease thus both type of Hodgkin’s and response to treatment are random.
Response to
Treatment
None Partial Positive
Type of LD
44
Hodgkin’s LP
12
Disease MC
58
NS
12
COLUMN 126
TOTALS
10
18
54
16
98
ROW
TOTALS
18
72
74
104
154
266
68
96
314 n = 538
Histological Types of Hodgkin’s Disease
LD = lymphocyte depletion
LP = lymphocyte predominant
MC = mixed cellularity
NS = nodular schlerosis
For a patient selected at random from these 538 Hodgkin’s patients, find the probability that the patient:
(a)
had a positive response
(b)
had at least some response to treatment.
(c)
had LP and had a positive response to treatment.
(d)
had LP or NS for their histological type.
37
CONDITIONAL PROBABILITY and INDEPENDENCE
•
The sample space is reduced.
•
Key words that indicate conditional probability are: given, amongst, for those with, …
Conditional Probability
“The probability of event A occurring given that event B has already occurred”
is written in shorthand as
Formal Definition
P(A | B) =
Independence
Events A and B are said to be independent if
Example 1 (cont’d) - Conditional Probabilities from Hodgkin’s Example
Response to
Treatment
None Partial Positive
Type of LD
44
Hodgkin’s LP
12
Disease MC
58
NS
12
COLUMN 126
TOTALS
10
18
54
16
98
ROW
TOTALS
18
72
74
104
154
266
68
96
314 n = 538
38
Example 2 - A study was conducted in 1991 by the University of Wisconsin and the Wisconsin Department
of Transportation in which linked police reports and hospital discharge records were used to assess, among
other things, the risk for head injury for motorcyclists in motor-vehicle crashes. The data shown below can
be used to examine the relationship between helmet use and whether brain injury was sustained in the
accident.
Brain Injury
Helmet Worn 17
No Brain Injury
Row
Totals
977
994
No Helmet
97
1918
2015
Column
Totals
114
2895
3009
For information on how to do this type of
analysis in JMP see the tutorial Bivariate
Displays for Categorical Data on the course
website.
a) What is the probability that a motorcycle accident victim in Wisconsin suffered brain injury?
b) What is the probability that a motorcyclist involved in an accident was wearing a helmet? Can this
be used to estimate the probability that a randomly sampled motorcyclist in WI wears a helmet?
c) What is the probability that a motorcyclist suffered brain injury given that they were wearing a
helmet?
d) What is the probability that a motorcyclist not wearing a helmet suffered brain injury?
e) How many times more likely is a motorcyclist not wearing a helmet to sustain a brain injury?
This ratio is called the ________________ or __________________ .
39
RELATIVE RISK (RR) and ODDS RATIO (OR)
The Odds for an event A are defined as
The Odds Ratio associated with a “risk factor” are defined as
Example 3 - Age at First Pregnancy and Cervical Cancer
A case-control study was conducted to determine whether there was increased risk of cervical cancer
amongst women who had their first child before age 25. A sample of 49 women with cervical cancer was
taken of which 42 had their first child before the age of 25. From a sample of 317 “similar” women without
cervical cancer it was found that 203 of them had their first child before age 25. Do these data suggest that
having a child at or before age 25 increases risk of cervical cancer?
Cervical Cancer:
Case or Control
Case Control Column Totals
Age at First Age < 25
Pregnancy
Age > 25
Row Totals
n=
a) Why can’t we meaningfully calculated P(cervical cancer|risk factor status)?
b) Find P(risk factor|disease status) for each group of women.
c) What are the odds for the risk factor amongst the cases? Amongst the controls?
d) What is odds ratio for having the risk factor associated with being a case?
e) Even though it is not appropriate to do so, calculate the P(disease|risk factor status) and the odds for
disease for both risk factor groups.
40
f) Finally calculate the odds ratio for having cervical cancer associated with having first pregnancy at or
before age 25. What do we find? Why do you suppose the OR is much more commonly used than RR?
Properties of the OR:
1) OR =
________
Risk factor present
Risk factor absent
Disease
Present (case)
a
c
Disease
Absent (control)
b
d
2) When the disease is rare in the population being studied, e.g. P(disease) < .10, then there is little
difference between the RR and OR, with the difference getting smaller the rarer the disease is. Thus for
many diseases RR  OR , which makes it easier to discuss and interpret odds ratios because we can state the
results in terms of how many times more likely something is rather than using a multiplicative statement in
terms of the odds.
Age at First Pregnancy and Cervical Cancer (cont’d)
Disease Status
Age at 1st Pregnancy Case Control Row Totals
Age < 25
Age > 25
Column Totals
a
b
42
203
c
d
7
114
121
49
317
n = 366
245
41
Download