PubH 6414 Course Overview so far

Course Overview so far
PubH 6414
Lesson 5
• Describing the Data: Categorical & Numerical
• Measuring the strength of relationship between two
variables
– Correlation coefficient for numerical data
– RR and OR for nominal variables
• Probability and Probability Models – this is where we
are now
• Estimating Parameters – Confidence Intervals
• Statistical Tests of Significance – Hypothesis Tests and
p-values
• Statistical Models: ANOVA and Regression
PubH 6414 Lesson 5
Lesson 5 Outline
• Definition and properties of Probability
• Sensitivity, Specificity, NPV and PPV as
Conditional Probabilities
• Mutually exclusive events and the addition rule
• Calculating PPV and NPV using Bayes’
Theorem
• Non-mutually exclusive events
– Marginal, joint and conditional probabilities
• Independent events and the multiplication rule
• Non-independent events
Randomness
• Definition:
– The phenomenon is random if the outcome is uncertain.
• The outcome of a single flip of a coin.
• The next participant’s blood type.
• Your blood pressure.
• Tomorrow’s weather.
Sources of Uncertainty
• Why are measurements of variables uncertain (i.e. variable)?
– Sampling Variability: different samples give different results.
– Measurement Variability: instrument calibration, observer skill.
– Intrinsic Variability: circadian rhythm, hormonal cycle.
– Modeling Variability: different models, applied to the same data, can give different results.
Handling such uncertainty is the foundation of statistics!!
Classic Probability Example
• A classic example of probability is tossing a coin
– It could be ‘heads’ or ‘tails’
• The outcome for a single toss is random
• After many repetitions of the trial, the long-term relative frequency of ‘heads’ can be calculated and the probability of getting heads is known: P(heads) = 0.5
Terminology and Definitions
• Terminology:
– Trial: One repetition of an experiment that can have one or more possible outcomes
– A trial is a single toss of the coin
– Event: one of the possible outcomes of a trial
– The event is getting ‘heads’
• Definition of Probability
– The probability that a given event occurs is the long-term relative frequency of the event over many trials
Probability Rules:
1. The probability P(A) of any event A satisfies 0 ≤ P(A) ≤ 1.
2. If S is the sample space in a probability model, then P(S) = 1.
3. The complement of any event A is the event that A does not occur, written as Ac. The complement rule states that P(Ac) = 1 - P(A).
4. Two events A and B are disjoint if they have no outcomes in common and so can never occur simultaneously. If A and B are disjoint, P(A or B) = P(A) + P(B).
5. If events A and B are not disjoint, P(A or B) = P(A) + P(B) – P(A and B).
Examples used to describe Probability Rules 1-5
The following Blood Type distributions will be used to illustrate some probability rules
• Distribution of blood types in the US
• Distribution of blood types by gender
• Distribution of blood types in Australia and Finland
These distributions are included on the Lesson 5 Practice Exercises
Probability: Rule 1
• The probability P(A) of any event A satisfies 0 ≤ P(A) ≤ 1.
• Probability = 1 if the event is certain.
• Probability = 0 if the event is impossible.
• Probability values between 0 and 1 express some degree of uncertainty about whether or not the event will occur in a single trial
Distribution of US Blood Types
Blood Type   Probability
O            0.42
A            0.43
B            0.11
AB           0.04
Event A: Blood Type A
What is P(A)?
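As a rough illustration of Rules 1 and 2 applied to the US blood type distribution, the sketch below checks that every probability lies between 0 and 1 and that the four probabilities sum to 1. The dictionary name and helper function are hypothetical, chosen only for this example.

```python
import math

# US blood type distribution as given on the slide.
us_blood_types = {"O": 0.42, "A": 0.43, "B": 0.11, "AB": 0.04}


def is_valid_distribution(dist):
    """Rule 1: each probability is in [0, 1]. Rule 2: P(S) = 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0)
    return in_range and sums_to_one


# Event A: Blood Type A, read directly from the table.
p_a = us_blood_types["A"]
print(is_valid_distribution(us_blood_types), p_a)
```

Reading P(A) is just a table lookup here because the four blood types are the complete list of outcomes.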
Probability: Rule 2
• If S is the sample space in a probability model, then P(S) = 1.
• The sum of the probabilities for all 4 blood types = 1.0
Complementary events: Rule 3
• The complement of any event A is the event that A does not occur, written as Ac. The complement rule states that P(Ac) = 1 - P(A).
• What is the probability of not having blood type O?
Mutually Exclusive (Disjoint) Events
• Two or more events are mutually exclusive if they cannot occur simultaneously (that is, they do not have any elements in common).
• Blood Types are disjoint events because each individual can have one (and only one) blood type
Addition Rule of Probability for Mutually Exclusive events: Rule 4
• Two events A and B are disjoint if they have no outcomes in common and so can never occur simultaneously. If A and B are disjoint, P(A or B) = P(A) + P(B).
• What is the probability of a randomly selected person having either blood type O or blood type A?
• What is the probability of a randomly selected person having blood type A, or B or AB?
Practice Exercises
• Work through the addition rule and complementary events practice exercise problems using the US blood type distributions
– page 1
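The complement rule (Rule 3) and the addition rule for disjoint events (Rule 4) can be sketched with the US blood type probabilities (O 0.42, A 0.43, B 0.11, AB 0.04). The function names below are assumptions made for illustration and are not part of the course materials.

```python
import math

us_blood_types = {"O": 0.42, "A": 0.43, "B": 0.11, "AB": 0.04}


def p_complement(dist, event):
    """Rule 3: P(not event) = 1 - P(event)."""
    return 1.0 - dist[event]


def p_any_of(dist, events):
    """Rule 4: for disjoint events, P(A or B or ...) is the sum of the parts."""
    return sum(dist[e] for e in set(events))


p_not_o = p_complement(us_blood_types, "O")            # not type O
p_o_or_a = p_any_of(us_blood_types, ["O", "A"])        # type O or type A
p_a_b_ab = p_any_of(us_blood_types, ["A", "B", "AB"])  # type A, B or AB
print(round(p_not_o, 2), round(p_o_or_a, 2), round(p_a_b_ab, 2))
```

The first and third results agree because "A, B or AB" is exactly the complement of "O".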
Non-mutually Exclusive Events
• Events are non-mutually exclusive if they can occur simultaneously (not disjoint)
• Gender and blood type are non-mutually exclusive events because each person has both a gender and a blood type
Gender and US Blood type distributions
Probabilities
Blood Type   Male    Female   Total
O            0.21    0.21     0.42
A            0.215   0.215    0.43
B            0.055   0.055    0.11
AB           0.02    0.02     0.04
Total        0.50    0.50     1.00
Non-mutually Exclusive Events
• The following probabilities can be identified for non-mutually exclusive events
– Marginal probability: the probability that one of the events occurs
– Joint probability: the probability that two events occur simultaneously
– Conditional probability: the probability that one event occurs given that the other event has occurred
Marginal Probability
• The marginal probability is the probability of a single event
• These probabilities are in the margins of the table of gender / blood type distributions
• For example, the probability of having blood type O, regardless of gender, is 0.42
• On the Lesson 5 practice exercises, find the other marginal probabilities for each blood type and gender.
Joint Probability
• A joint probability is the probability that two events occur simultaneously
• Using the blood type distribution by gender, P(Male and Type A) = 0.215
– Find the other blood type and gender joint probabilities on the Lesson 5 practice exercises
Conditional Probability
• A conditional probability is the probability of one event given that the other event has occurred.
• The notation for the conditional probability of having blood type O given that you are female is P(Type O | Female).
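To make the link between joint and marginal probabilities concrete, the sketch below stores the eight joint probabilities from the gender / blood type table and recovers the margins by summing over the other variable. The data layout and function names are hypothetical choices for this example.

```python
import math

# Joint probabilities P(blood type and gender) from the table.
joint = {
    ("O", "Male"): 0.21, ("O", "Female"): 0.21,
    ("A", "Male"): 0.215, ("A", "Female"): 0.215,
    ("B", "Male"): 0.055, ("B", "Female"): 0.055,
    ("AB", "Male"): 0.02, ("AB", "Female"): 0.02,
}


def marginal_blood_type(table, blood_type):
    """Sum the joint probabilities across genders for one blood type."""
    return sum(p for (bt, _), p in table.items() if bt == blood_type)


def marginal_gender(table, gender):
    """Sum the joint probabilities across blood types for one gender."""
    return sum(p for (_, g), p in table.items() if g == gender)


print(round(marginal_blood_type(joint, "O"), 3))   # row margin for type O
print(round(marginal_gender(joint, "Female"), 3))  # column margin for Female
print(joint[("A", "Male")])                        # joint: Male and Type A
```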
Conditional Probability: Rule 6
• When P(A) > 0, the conditional probability of event B, given A has occurred:
P(B | A) = P(A and B) / P(A)
Conditional Probability Example
Using the Distribution of blood types by gender, what is P(Type O | Female)?
Calculate other conditional probabilities on the Practice Exercises
Independent Events: Rule 7
• Two events are independent if the occurrence of one does not affect the probability of the other.
• If two events A and B are independent,
P(A and B) = P(A)P(B)
Multiplication rule for independent events
Independent Events
For example: Is male gender independent of type O blood?
Use the multiplication rule for all blood type / gender joint probabilities to confirm that gender and blood type are independent events – see Lesson 5 practice exercises
Independent Events and Conditional Probabilities
• If two events are independent, the conditional probability of the event is equal to the marginal probability of the event.
P(A | B) = P(A)
Independent Events and Conditional Probabilities
• For example, given that a participant is male, does this change the probability of type A blood?
• Confirm that other conditional and marginal probabilities are equal for the two independent events of gender and blood type on the Lesson 5 practice exercises
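Rule 6 and Rule 7 can be checked numerically on the gender / blood type table. The following is a minimal sketch, assuming the joint probabilities are stored in a dictionary keyed by (blood type, gender); the helper names and the tolerance are illustrative choices.

```python
import math

joint = {
    ("O", "Male"): 0.21, ("O", "Female"): 0.21,
    ("A", "Male"): 0.215, ("A", "Female"): 0.215,
    ("B", "Male"): 0.055, ("B", "Female"): 0.055,
    ("AB", "Male"): 0.02, ("AB", "Female"): 0.02,
}


def marginal(table, index, value):
    """Marginal probability: index 0 selects blood type, index 1 selects gender."""
    return sum(p for key, p in table.items() if key[index] == value)


def conditional(table, blood_type, gender):
    """Rule 6: P(type | gender) = P(type and gender) / P(gender)."""
    p_gender = marginal(table, 1, gender)
    if p_gender <= 0:
        raise ValueError("conditioning event must have positive probability")
    return table[(blood_type, gender)] / p_gender


def is_independent(table, tol=1e-9):
    """Rule 7: every joint probability equals the product of its margins."""
    return all(
        math.isclose(p, marginal(table, 0, bt) * marginal(table, 1, g), abs_tol=tol)
        for (bt, g), p in table.items()
    )


print(round(conditional(joint, "O", "Female"), 3))  # compare with P(Type O)
print(is_independent(joint))
```

Comparing with a tolerance rather than `==` avoids false alarms from floating-point rounding.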
Non-independent Events
• Non-independence between two events suggests that a statistical relationship may exist between them.
• Non-independence: the probability of one event is affected by the outcome of the other event
Non-independent Events
• When two events are not independent
– The joint probabilities are not equal to the product of the two marginal probabilities
P(A and B) ≠ P(A) * P(B)
– The conditional probability is not equal to the marginal probability
P(A|B) ≠ P(A)
Example of Non-independent Events
• Blood type distributions are not the same across different countries and ethnic groups.
• For simplicity, the probability table on the next slide assumes an equal proportion of individuals from each country
Distribution of blood types by country
Probabilities
Blood Type   Australia   Finland   Total
O            0.245       0.155     0.40
A            0.19        0.22      0.41
B            0.05        0.085     0.135
AB           0.015       0.04      0.055
Total        0.50        0.50      1.00
Joint probabilities and Non-independent events
• From the distribution of blood types in Australia and Finland table, the joint probability of (Type O and Australia) = 0.245
• The product of P(Type O) and P(Australia) = 0.40*0.50 = 0.20
• Confirm that other joint probabilities do not equal the product of the marginal probabilities on the Lesson 5 Practice exercises
Conditional Probabilities and Non-independent events
• The conditional probability of Type O blood given that you are Australian = P(Type O | Australia) = 0.245 / 0.5 = 0.49
• This conditional probability is not equal to the marginal probability of Type O for the two countries: P(Type O) = 0.40
• Confirm that other conditional probabilities do not equal the marginal probabilities on the Lesson 5 practice exercises
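The Australia / Finland comparison can be sketched by turning the joint table into one conditional distribution per country and setting it beside the marginal distribution. The joint probabilities are taken from the table of blood types by country; the variable names are assumptions made for this illustration.

```python
import math

# Joint probabilities P(blood type and country), assuming equal-sized countries.
joint = {
    ("O", "Australia"): 0.245, ("O", "Finland"): 0.155,
    ("A", "Australia"): 0.19, ("A", "Finland"): 0.22,
    ("B", "Australia"): 0.05, ("B", "Finland"): 0.085,
    ("AB", "Australia"): 0.015, ("AB", "Finland"): 0.04,
}
blood_types = ["O", "A", "B", "AB"]
countries = ["Australia", "Finland"]

# Marginal probabilities: sum the joint table over the other variable.
p_type = {bt: sum(joint[(bt, c)] for c in countries) for bt in blood_types}
p_country = {c: sum(joint[(bt, c)] for bt in blood_types) for c in countries}

# Conditional distribution P(blood type | country) for each country.
p_type_given_country = {
    c: {bt: joint[(bt, c)] / p_country[c] for bt in blood_types}
    for c in countries
}

for c in countries:
    row = {bt: round(p, 3) for bt, p in p_type_given_country[c].items()}
    print(c, row)
print("marginal", {bt: round(p, 3) for bt, p in p_type.items()})
```

If country and blood type were independent, each country's row would match the marginal row; here they differ, for example 0.49 versus 0.40 for type O in Australia.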
Toss two fair coins once. What is the sample space (outcome space)?
1. {H T}
2. {(HH), (HT), (TT)}
3. {(HH), (HT), (TH), (TT)}
4. {½}
5. {¼, ½}
6. {¼, ½,¼}
Eberly and Telke - PubH 6450 Fall 2009
How many trials did we do?
1. One
2. Two
3. Three
4. Four
If we were to toss two coins again, in the same way, would that be an identical trial?
1. No
2. Yes
In this experiment, what is the probability of getting one head and one tail?
1. 1/2
2. 1/3
3. 1/4
4. 1/6
In this experiment, what is the probability of getting two heads?
1. 1/2
2. 1/3
3. 1/4
4. 1/6
Consider tossing two dice. Below is the sample space for this experiment.
[Figure: sample space for tossing two dice]
What is the probability you roll (3,3)?
1. 1/2
2. 1/6
3. 1/12
4. 1/36
What is the probability the sum of the numbers is even (e.g. (1,1) => 1+1 = 2)?
1. 1/2
2. 1/6
3. 1/12
4. 1/36
What is the probability the sum of the numbers is even AND you roll (3,3)?
1. 1/2
2. 17/36
3. 1/6
4. 1/12
5. 1/36
What is the probability the sum of the numbers is even OR you roll (3,3)?
1. 1/2
2. 17/36
3. 1/6
4. 1/12
5. 1/36
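One way to check answers to the two-coin and two-dice questions is to enumerate the sample space and count outcomes. The sketch below assumes fair coins and fair six-sided dice, so every ordered outcome is equally likely; the helper `prob` and the event functions are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Ordered outcomes: 4 for two coins, 36 for two dice.
coin_space = list(product("HT", repeat=2))
dice_space = list(product(range(1, 7), repeat=2))


def prob(space, event):
    """P(event) = favorable outcomes / total outcomes (equally likely outcomes)."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))


def one_head_one_tail(outcome):
    return sorted(outcome) == ["H", "T"]


def double_three(outcome):
    return outcome == (3, 3)


def even_sum(outcome):
    return sum(outcome) % 2 == 0


p_mixed = prob(coin_space, one_head_one_tail)
p_two_heads = prob(coin_space, lambda o: o == ("H", "H"))
p_33 = prob(dice_space, double_three)
p_even = prob(dice_space, even_sum)
p_and = prob(dice_space, lambda o: even_sum(o) and double_three(o))
p_or = prob(dice_space, lambda o: even_sum(o) or double_three(o))
print(p_mixed, p_two_heads, p_33, p_even, p_and, p_or)
```

Because (3,3) already has an even sum, the two dice events are not disjoint, and the general addition rule P(A or B) = P(A) + P(B) – P(A and B) applies.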
Screening Test Measures
• Screening tests are used to classify people as healthy or as falling into one or more disease categories.
• Screening tests are not 100% accurate and therefore misclassification is unavoidable.
• Examples: HIV test, Colonoscopy, Skin tests for TB, mammograms
Screening test measures are conditional probabilities
Screening tests involve two events: disease (D+ or D-) and Test result (T+ or T-).
• Sensitivity is the probability of a positive test given that the disease is present
P(T+ | D+)
• Specificity is the probability of a negative test given that the disease is absent
P(T- | D-)
Cancer Screening Data
               Test Result (T)
Disease (D)    T+      T-       Total
D+             154     225      379
D-             362     23,362   23,724
Total          516     23,587   24,103
This table of counts of disease status (D+ and D-) and screening test outcomes (T+ and T-) gives the results of screening 24,103 individuals for a certain cancer.
Cancer Screening Probabilities
               Test Result, T
Disease, D     T+      T-       Total
D+             0.006   0.009    0.015
D-             0.015   0.970    0.985
Total          0.021   0.979    1.00
The probabilities in this table were calculated from the observed counts on the previous table
P(T+ and D+) = 154 / 24,103 = 0.006
Cancer Screening Example
               Test Result, T
Disease, D     T+             T-             Total
D+             P(T+ and D+)   P(T- and D+)   P(D+)
D-             P(T+ and D-)   P(T- and D-)   P(D-)
Total          P(T+)          P(T-)
Joint probabilities (the interior cells) are highlighted orange on the slide
Marginal probabilities (the Total row and column) are highlighted green on the slide
Sensitivity and Specificity
Sensitivity: P(T+ | D+) = P(T+ and D+) / P(D+) =
Specificity: P(T- | D-) = P(T- and D-) / P(D-) =
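Sensitivity and specificity can also be computed straight from the cancer screening counts (154 and 225 among D+, 362 and 23,362 among D-). This is a sketch under that assumption; because it uses raw counts rather than the rounded probability table, the results differ slightly from values worked out with the rounded figures. The dictionary keys and function names are illustrative.

```python
# Counts from the cancer screening data, keyed by (disease status, test result).
counts = {
    ("D+", "T+"): 154, ("D+", "T-"): 225,
    ("D-", "T+"): 362, ("D-", "T-"): 23362,
}


def sensitivity(c):
    """P(T+ | D+): true positives divided by everyone with the disease."""
    diseased = c[("D+", "T+")] + c[("D+", "T-")]
    return c[("D+", "T+")] / diseased


def specificity(c):
    """P(T- | D-): true negatives divided by everyone without the disease."""
    healthy = c[("D-", "T+")] + c[("D-", "T-")]
    return c[("D-", "T-")] / healthy


print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
```

Both measures condition on the true disease status, so each uses only one row of the table.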
Screening Test errors
Screening tests do not always accurately identify the disease state of an individual. The two possible errors are called ‘False Negative’ and ‘False Positive’
False Negative:
Probability of False Negative = P(T-|D+)
P(T-|D+) = 0.009 / 0.015 = 0.60
False Positive:
Probability of False Positive = P(T+|D-)
P(T+|D-) = 0.015 / 0.985 = 0.015
Positive and Negative Predictive Values
In addition to Sensitivity and Specificity, two other conditional probabilities are used to evaluate screening tests
Positive Predictive Value (PPV or PV+):
• The probability that the disease is present given that the test is positive.
• A conditional probability: P(D+ | T+)
Negative Predictive Value (NPV or PV-):
• The probability that the disease is not present given that the test is negative
• A conditional probability: P(D- | T-)
PPV and NPV
If the test is positive, what is the probability that the individual actually has the disease?
PPV = P(D+ | T+) = P(D+ and T+) / P(T+) =
NPV = P(D- | T-) = P(D- and T-) / P(T-) =
Interpretation of Positive and Negative Predictive Values
• For the cancer screening data, PPV = 0.285 so only 28.5% of those with positive screening tests actually have cancer.
• In this example, NPV = 0.99 so 99% of those with a negative test result don’t have cancer.
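PPV and NPV condition on the test result instead of the disease status, so they divide by the column totals of the probability table. The sketch below uses the rounded cancer screening probabilities (0.006, 0.009, 0.015, 0.970); with those inputs the PPV comes out near 0.286, in line with the roughly 28.5% quoted on the slide. Names are hypothetical.

```python
# Rounded joint probabilities, keyed by (disease status, test result).
joint = {
    ("D+", "T+"): 0.006, ("D+", "T-"): 0.009,
    ("D-", "T+"): 0.015, ("D-", "T-"): 0.970,
}


def predictive_value(table, disease, test):
    """P(disease status | test result) = joint probability / P(test result)."""
    p_test = sum(p for (_, t), p in table.items() if t == test)
    return table[(disease, test)] / p_test


ppv = predictive_value(joint, "D+", "T+")  # P(D+ | T+)
npv = predictive_value(joint, "D-", "T-")  # P(D- | T-)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```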
Effect of Prevalence on Screening test results
Example 1: AIDS screening test for 100,000 people in the general population
Screening    AIDS
Result       Yes     No       Total
T+           98      1,998    2,096
T-           2       97,902   97,904
Total        100     99,900   100,000
Example 1 Screening test measures
• Sensitivity = 98 / 100 = 98%
• Specificity = 97,902 / 99,900 = 98%
• The prevalence of AIDS for those screened can be calculated from the table. Prevalence = the proportion of the total with disease =
• PPV for this example =
Screening test results when prevalence of disease is low
• Example 1 illustrates the screening test results in a population with low prevalence of the disease (0.1%)
• The sensitivity and specificity were high.
• However:
• When the disease prevalence is low in the screening population, the results of the screening test:
– Have low PPV: only 4.7% of those who test positive actually have the disease.
Effect of Prevalence on Screening test results
• The PPV of a screening test depends not only on the sensitivity and specificity of the test but also on disease prevalence in the population screened
• The higher the prevalence, the higher the test’s positive predictive value.
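The dependence of PPV on prevalence can be sketched by rebuilding the expected screening table for different prevalence values while holding sensitivity and specificity at 98%, as in Example 1. The population size of 100,000 matches the example; the other prevalence values (1% and 10%) and the function names are assumptions added for illustration.

```python
def expected_screening_counts(population, prevalence, sensitivity, specificity):
    """Expected 2x2 screening counts for a population of the given size."""
    diseased = population * prevalence
    healthy = population - diseased
    return {
        "TP": diseased * sensitivity,        # diseased, test positive
        "FN": diseased * (1 - sensitivity),  # diseased, test negative
        "FP": healthy * (1 - specificity),   # healthy, test positive
        "TN": healthy * specificity,         # healthy, test negative
    }


def ppv_from_counts(counts):
    """PPV = true positives / all positive tests."""
    return counts["TP"] / (counts["TP"] + counts["FP"])


for prevalence in (0.001, 0.01, 0.10):
    table = expected_screening_counts(100_000, prevalence, 0.98, 0.98)
    print(f"prevalence = {prevalence:.3f}  PPV = {ppv_from_counts(table):.3f}")
```

At a prevalence of 0.1% this reproduces the 98 true positives and 1,998 false positives of Example 1, and the PPV rises as prevalence rises even though the test itself is unchanged.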
Effect of Prevalence on Screening test results
• Screening for conditions with low prevalence in the general population is most effective when done on a high-risk group.
– This is the rationale for limiting screening for certain conditions to older individuals or individuals with increased risk for the condition.
Page 7 on Exercise 5.
Using Bayes’ Theorem to Calculate PPV and NPV
• In the previous calculations of PPV and NPV we had data about the true disease status of the individuals (D+ and D-).
• Often we have the screening test results but are lacking information on the true disease state of the individual being screened
– Solution: Use Bayes’ Theorem.
Thomas Bayes
• Bayes, Thomas (b. 1702, London - d. 1761, Tunbridge Wells, Kent), mathematician who first used probability inductively and established a mathematical basis for probability inference.
• Source: http://www.bayesian.org/resources/bayes.html
Bayes’ Theorem:
P(A | B) = P(B | A) P(A) / [ P(B | A) P(A) + P(B | Ac) P(Ac) ]
Application of Bayes’ Theorem to obtain PPV
Replace A with D+ and replace B with T+:
P(D+ | T+) = P(T+ | D+) P(D+) / [ P(T+ | D+) P(D+) + P(T+ | D-) P(D-) ]
Using Bayes’ Theorem to calculate PPV and NPV
• To begin:
P(D+) = prevalence
P(D-) = 1 - prevalence
P(T-|D-) = specificity
P(T+|D+) = sensitivity
P(T+|D-) = false positive rate
P(T-|D+) = false negative rate
Bayes’s rule: An Example
• Suppose the University decides to screen the faculty for illegal drug use.
• Suppose the true prevalence of regular illegal drug use among the faculty is 0.1% (P(D=+)=0.001)
• The screening test for illegal drug use has a sensitivity of 99.5% (P(T=+|D=+)=0.995).
• The specificity of the screening test is 98% (P(T=-|D=-) = 0.98).
Bayes’s Rule: An Example
What is the probability a faculty member is a regular illegal drug user given he/she has a positive test result?
That is, P(D=+|T=+)
Bayes’s Rule: An Example
• To begin:
– P(D=+) = prevalence
– P(D=-) = 1 - prevalence
– P(T=+|D=-) = false positives
– P(T=-|D=-) = specificity
– P(T=+|D=+) = sensitivity
– P(T=-|D=+) = false negatives
Bayes’s Rule: An Example
• We want the probability of illegal drug use given a positive test result P(D=+|T=+).
• Use Bayes’s Rule (Positive Predictivity):
P(D=+ | T=+) = P(T=+ | D=+) P(D=+) / [ P(T=+ | D=+) P(D=+) + P(T=+ | D=-) P(D=-) ]
= 0.995 * 0.001 / (0.995 * 0.001 + 0.02 * 0.999) = 0.0474
Bayes’s Rule: An Example
• The probability of illegal drug use given a positive test result P(D=+|T=+) = 0.0474. This is the positive predictivity of the test.
• So, even with a highly sensitive and highly specific test the probability that someone is a regular illegal drug user given they have a positive test result is only 4.74% (for a population with a prevalence of 1 in 1000).
Bayesian Approach to Analysis
• In Bayesian analysis, the probability of an event is based on the collected data and known probability of prior events.
• The prior probability and the data are used to find the posterior probability.
• Bayes’ Theorem: P(A|B) = P(B|A)*P(A) / P(B)
• P(A) is called the prior probability,
P(B) and P(B|A) are determined from the data
P(A|B) is the posterior probability calculated from the prior probability and the data
BAYES’ RULE = Inversion of Probabilities
Getting from P(T+|D+) to P(D+|T+)
and SNIFFY
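The faculty drug-screening numbers (prevalence 0.001, sensitivity 0.995, specificity 0.98) can be run through Bayes' theorem in its prior / posterior form. This is a minimal sketch: the function name `posterior` and its parameter names are hypothetical, and the NPV line is an added illustration that the same function inverts the negative-test probabilities as well.

```python
def posterior(prior, likelihood, likelihood_given_complement):
    """Bayes' theorem: P(A|B) from P(A), P(B|A) and P(B|Ac)."""
    # P(B) by the law of total probability over A and its complement.
    evidence = likelihood * prior + likelihood_given_complement * (1.0 - prior)
    return likelihood * prior / evidence


prevalence = 0.001   # P(D=+)
sensitivity = 0.995  # P(T=+ | D=+)
specificity = 0.98   # P(T=- | D=-)

# PPV: A is "drug user", B is "positive test".
ppv = posterior(prevalence, sensitivity, 1.0 - specificity)
# NPV: A is "not a drug user", B is "negative test".
npv = posterior(1.0 - prevalence, specificity, 1.0 - sensitivity)
print(f"PPV = {ppv:.4f}, NPV = {npv:.6f}")
```

The prior here is the prevalence, and the posterior is the updated probability of drug use once the test result is known.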
SNIFFY tested positive for Lyme disease.
Do you think SNIFFY has the disease?
1. No
2. Maybe
3. Yes
4. What is Lyme disease?
What is P(T+|D+)?
1. Sensitivity
2. The probability of the disease given a positive test result.
3. 1-Sensitivity
4. Prevalence
What is P(D+|T+)?
1. Sensitivity
2. The probability of the disease given a positive test result.
3. 1-Sensitivity
4. Prevalence
What is the probability SNIFFY really has Lyme disease?
P(D+|T+) => You want this!!
Suppose
Sensitivity = 98%
Specificity = 97%
Prevalence = 5%
What is P(D+)?
1. 98%
2. 97%
3. 95%
4. 5%
5. 3%
What is P(D-)?
1. 98%
2. 97%
3. 95%
4. 5%
5. 3%
What is P(T+|D+)?
1. 98%
2. 97%
3. 95%
4. 5%
5. 3%
What is P(T+|D-)?
1. 98%
2. 97%
3. 95%
4. 5%
5. 3%
What is P(D+|T+)?
Positive Predictive Value
1. 98%
2. 97%
3. 95%
4. 85%
5. 63%
What is the probability SNIFFY really has Lyme disease?
P(D+|T+) = Sensitivity * Prevalence / [ Sensitivity * Prevalence + False Positives * (1 - Prevalence) ]
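For the SNIFFY questions, each quantity asked about can be derived from the three stated inputs (sensitivity 98%, specificity 97%, prevalence 5%). The sketch below simply walks through those quantities in order; the variable names are illustrative.

```python
sensitivity = 0.98
specificity = 0.97
prevalence = 0.05

p_d_pos = prevalence                    # P(D+)
p_d_neg = 1.0 - prevalence              # P(D-)
p_tpos_given_dpos = sensitivity         # P(T+ | D+)
p_tpos_given_dneg = 1.0 - specificity   # P(T+ | D-), the false positive rate

# P(T+): positive tests come from true positives plus false positives.
p_tpos = p_tpos_given_dpos * p_d_pos + p_tpos_given_dneg * p_d_neg

# P(D+ | T+): the positive predictive value.
ppv = p_tpos_given_dpos * p_d_pos / p_tpos
print(f"P(T+) = {p_tpos:.4f}, P(D+|T+) = {ppv:.2f}")
```

The intermediate P(T+) is the denominator of the formula above, written out as its own step.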
Online Clinical Calculator
• http://www.intmed.mcw.edu/clincalc/bayes.html
• Online Clinical Calculator from Medical College of Wisconsin
– Enter Prevalence, Sensitivity, Specificity as decimals
– The Online Clinical Calculator returns PPV and NPV
A link to this website has been added to the course weblinks
Readings and Assignments
• Reading
– Chapter 4 pgs. 63-68
– Chapter 12 pgs. 306 - 309
– Note on Pg. 63 of text: “Our experience indicates that the concepts underlying statistical inference are not easily absorbed in a first reading”
• Work through probability calculations on the Lesson 5 Practice Exercises
• Work through examples in Excel Module 5
• Complete Homework 3 by the due date