BIOINF 2118 01-Introduction to Probability p.
1 of 5
Probability and Statistics
Probability: Process → Data
“Given a process or mechanism, after many repetitions what kinds of outcomes (data) can we expect?”
Statistics: Data → Process
“Given some data, what can we say about the process or mechanism that gave rise to the data?”
Example: Diagnostic testing
A patient arrives in the clinic. The doctor suspects that the patient suffers a particular illness.
The true state, “healthy” or “sick”, is unknown; therefore the doctor orders a diagnostic test.
True state of patient = θ = “the process”. The result = X = “the data”.
Top arrow (probability): If we knew the process, then the probabilities of the result would be “known” (at least roughly, from previous patients’ data).
Bottom arrow (statistics): After the test, we still do not know the true state of the patient, but from the data X we now have better knowledge.
                                 DATA
PROCESS          X=negative   X=positive   X=indeterminate   TOTAL
θ = “healthy”       0.95         0.03           0.02          1.00
θ = “sick”          0.03         0.95           0.02          1.00
TABLE 1: each row is a model; the collection of rows is a model family.
Here θ is the unknown “true state of nature”, and X, the test result, is an observation.
A test result X is generated by a process under a particular state of nature θ.
The sample space is the set of possible observations,
X = {negative, positive, indeterminate}.
The parameter space is the set of possible “states of nature”,
Θ = {healthy, sick}.
Each table entry is a conditional probability.
The “healthy” row is a probability distribution;
the “sick” row is another probability distribution.
For example, if θ = “healthy”, the probability distribution is:
Pr(X=negative) = 0.95, Pr(X=positive) = 0.03, Pr(X=indeterminate) = 0.02.
The pair of rows is a model family (or a model).
Each column is a likelihood function, L. (We write L: Θ → ℝ+.)
For example, if X=negative is observed, then
L(θ = “healthy”) = 0.95, L(θ = “sick”) = 0.03.
In the context of a likelihood, these numbers are NOT probabilities. (Columns don’t add to one.)
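The row/column distinction in Table 1 can be sketched in a few lines of Python. This is only an illustration: the names `TABLE_1`, `model` and `likelihood` are hypothetical and do not come from the course materials.

```python
# Table 1 as a nested dict: outer key = state of nature (theta),
# inner key = test result (x). All names here are illustrative.
TABLE_1 = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

def model(theta):
    """A row of Table 1: the probability distribution of X for a fixed theta."""
    return TABLE_1[theta]

def likelihood(x):
    """A column of Table 1: a function of theta for a fixed observation x."""
    return {theta: row[x] for theta, row in TABLE_1.items()}

row_total = sum(model("healthy").values())          # rows are distributions
column_total = sum(likelihood("negative").values()) # columns are not

print(round(row_total, 10))     # 1.0
print(round(column_total, 10))  # 0.98
```

The row sums to one because it is a probability distribution over X; the column sums to 0.98 because a likelihood is a function of θ, not a distribution.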
Now suppose that the prevalence of the disease is 10%. Prevalence = Pr(θ = “sick”).
The following table is the joint distribution of θ and X.
                 X=negative   X=positive   X=indeterminate   TOTAL
θ = “healthy”      0.855        0.027          0.018          0.90
θ = “sick”         0.003        0.095          0.002          0.10
TOTAL              0.858        0.122          0.020          1.00
TABLE 2: joint probabilities for each combination
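As a rough check on Table 2, each joint probability is the prior probability of the state multiplied by the corresponding Table 1 entry. The sketch below assumes the 10% prevalence stated above; the variable names (`PRIOR`, `joint`, `marginal_x`) are hypothetical.

```python
# Joint distribution of theta and X: Pr(theta, x) = Pr(theta) * Pr(x | theta).
PREVALENCE = 0.10  # Pr(theta = "sick"), as assumed in the text
PRIOR = {"healthy": 1 - PREVALENCE, "sick": PREVALENCE}

TABLE_1 = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

joint = {
    theta: {x: PRIOR[theta] * p for x, p in row.items()}
    for theta, row in TABLE_1.items()
}

# Column totals: the marginal distribution of the test result X.
marginal_x = {
    x: sum(joint[theta][x] for theta in joint)
    for x in TABLE_1["healthy"]
}

for theta, row in joint.items():
    print(theta, {x: round(p, 3) for x, p in row.items()})
print("TOTAL", {x: round(p, 3) for x, p in marginal_x.items()})
```

The row totals recover the prior (0.90 and 0.10), and all six joint entries together sum to one.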
Interpretations of Probability
• Frequency interpretation:
“Pr(X = positive | θ = “sick”) = 0.95” means:
“If I do the test repeatedly on a large number of sick patients, then in the long run roughly 95% of the test results will equal positive.”
• Subjective (Bayesian) interpretation, before data is observed:
Pr(θ = “sick”) = 0.10, which means:
“Given what I know now, my current belief is that there’s a 10% chance that this patient is sick.”
Connection with decision-making :
This belief sometimes represents a willingness to gamble that the patient is sick, if the payoff is above the ratio 9-to-1 (0.9/0.1), but not if it’s below 9-to-1.
• Subjective (Bayesian) interpretation, after data is observed:
                 X=negative (1)   X=positive (2)   X=indeterminate (3)
θ = “healthy”       0.9965           0.221              0.9
θ = “sick”          0.0035           0.779              0.1
TOTAL               1.0000           1.000              1.0
TABLE 3: conditional probabilities, given X
Pr(θ = “sick” | X = positive) = 0.779, which means:
“Given what I knew before, plus what I know now (the data), my current belief is that there’s a 77.9% chance that this patient is sick.”
Table 3 combines the two types of probability: belief and frequency.
Now the gambling odds are 0.779/(1-0.779) = 3.52.
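The posterior probabilities in Table 3 come from dividing each joint probability by its column total. A minimal sketch, with the joint probabilities of Table 2 typed in directly and with hypothetical names (`JOINT`, `posterior`):

```python
# Joint probabilities Pr(theta, x) from Table 2.
JOINT = {
    "healthy": {"negative": 0.855, "positive": 0.027, "indeterminate": 0.018},
    "sick":    {"negative": 0.003, "positive": 0.095, "indeterminate": 0.002},
}

def posterior(x):
    """Pr(theta | X = x): one column of the joint table, rescaled to sum to one."""
    column_total = sum(row[x] for row in JOINT.values())
    return {theta: row[x] / column_total for theta, row in JOINT.items()}

post = posterior("positive")
odds_sick = post["sick"] / post["healthy"]

print(round(post["sick"], 3))  # 0.779
print(round(odds_sick, 2))     # 3.52
```

Calling `posterior("negative")` or `posterior("indeterminate")` gives the other two columns of Table 3.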
Key concept: odds: [0, 1] → [0, ∞] (inclusive), with odds(p) = p/(1−p).
Read it as follows:
“odds is a function from the unit interval to the extended non-negative real line”
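One way to write the odds map in code, including the endpoint p = 1 that maps to infinity, is sketched below. The inverse function `probability` is an addition for illustration and is not mentioned in the notes.

```python
import math

def odds(p):
    """Map a probability in [0, 1] to odds in [0, inf]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        return math.inf  # certainty corresponds to infinite odds
    return p / (1.0 - p)

def probability(o):
    """Inverse map: odds in [0, inf] back to a probability in [0, 1]."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    if math.isinf(o):
        return 1.0
    return o / (1.0 + o)

print(round(odds(0.779), 2))  # 3.52
print(odds(1.0))              # inf
```

For example, odds against a 10% prior probability of “sick” are `odds(0.9)`, which is 9 up to floating-point rounding, matching the 9-to-1 ratio in the text.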
NOTATION and TERMINOLOGY
“ Statistics ” is assessing whether the patient is healthy or sick, after observing X .
We saw this above, in the form of the posterior probability, 0.779.
When the prevalence is not known, we have to use the frequency interpretation of probability, using the models in TABLE 1. A great tool is the likelihood ratio, LR:
LR(X) = Pr(X | θ = “sick”) / Pr(X | θ = “healthy”).
LR(X=1) = 0.03/0.95 ≈ 1/32, LR(X=2) = 0.95/0.03 ≈ 32, LR(X=3) = 1.
The LR compares two explanations for the observation.
Here we see that the observation X =negative lowers the probability of “sick”, and X =positive raises it. Observing X =indeterminate does not provide any information, as reflected in LR =1.
For each value of X , we can see in what way the value of the probability changes, but we cannot say what the final probability is because we do not know the initial probability.
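The dependence on the initial probability can be made concrete with Bayes' rule in odds form, posterior odds = prior odds × LR. That identity is standard but is not stated in the text above, so treat the sketch below as an illustration; the function names and the three trial priors are hypothetical.

```python
TABLE_1 = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

def likelihood_ratio(x):
    """LR(x) = Pr(x | sick) / Pr(x | healthy)."""
    return TABLE_1["sick"][x] / TABLE_1["healthy"][x]

def posterior_sick(prior_sick, x):
    """Pr(sick | x) via posterior odds = prior odds * LR (requires prior_sick < 1)."""
    prior_odds = prior_sick / (1.0 - prior_sick)
    posterior_odds = prior_odds * likelihood_ratio(x)
    return posterior_odds / (1.0 + posterior_odds)

# The same positive result leads to different final probabilities
# depending on the (assumed) initial probability of "sick".
for prior in (0.01, 0.10, 0.50):
    print(prior, round(posterior_sick(prior, "positive"), 3))
```

With a prior of 0.10 this reproduces the 0.779 of Table 3, and an indeterminate result (LR = 1) leaves any prior unchanged.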
Experiments
• An experiment is any process in which the outcome is uncertain.
• Examples: Rolling a die, conducting a clinical trial, conducting a survey, getting married, ….
• The sample space X is the set of possible outcomes.
• Example: For our diagnostic test, X = {1, 2, 3}. For rolling a die, X = {1, 2, 3, 4, 5, 6}.
Sets and Subsets
• A sample space X is a set.
• An outcome is an element of the sample space, x ∈ X.
• An event is a subset of the sample space. For example, A = {2, 4, 6} is the event of rolling an even number with a die.
• An event A implies another event B if every outcome in A also belongs to B. This relation is denoted A ⊂ B, “A is a subset of B”.
• A parameter space is a set Θ.
• A hypothesis is a subset of the parameter space Θ.
Empty, Finite and Infinite Sets
• The empty set contains no outcomes. It is denoted by ∅. For all events A, ∅ ⊂ A.
• Sets may be finite or infinite. For example, X = {1, 2, 3, 4, 5, 6} is finite.
• Infinite sets may be countably infinite or uncountably infinite.
X = [0,1] is uncountable.
ℚ (all ratios) is countable (but infinite).
Union, Intersection, Complement
Concept                   Symbol       R function or value
union (either/or)         A ∪ B        union( )
intersection (both)       A ∩ B        intersect( )
complement (not)          A^c          setdiff( )
empty set, or null set    ∅ or { }     NULL, character(0)
set product               A × B        expand.grid( )
subset                    A ⊂ B        all(is.element( ))
element of                x ∈ A        is.element( )
Disjoint Events
• A and B are disjoint if and only if A ∩ B = ∅.
• Events A_1, A_2, … are disjoint or mutually exclusive if, for every i ≠ j, A_i ∩ A_j = ∅.
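For readers who work in Python rather than R, the set operations above have close analogues in Python's built-in `set` type. The sketch below is an illustration only: the events `A` and `B` are made up for the die example, and the R calls in the comments are approximate equivalents.

```python
from itertools import product

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # rolling a die
A = {2, 4, 6}                      # event: an even number
B = {4, 5, 6}                      # event: at least four (illustrative)

union = A | B                      # R: union(A, B)
intersection = A & B               # R: intersect(A, B)
complement = SAMPLE_SPACE - A      # R: setdiff(X, A)
empty = set()                      # R: NULL or character(0)
pairs = set(product(A, B))         # R: expand.grid(A, B)
is_subset = A <= SAMPLE_SPACE      # R: all(is.element(A, X))
is_element = 2 in A                # R: is.element(2, A)

# Disjoint events share no outcomes: their intersection is the empty set.
disjoint = A.isdisjoint(complement)

print(union, intersection, complement)
print(len(pairs), is_subset, is_element, disjoint)
```

Note that `set()` is the empty set in Python; the literal `{}` creates an empty dictionary instead.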
Some formal definitions:
A probability space is a sample space X, together with a mapping Pr from events in the sample space to [0,1] (in mathematical notation, Pr: 2^X → [0,1]) that satisfies three axioms:
Axiom 1: For every event A, Pr(A) ≥ 0.
(To be technically correct: there may be very esoteric sets which cannot be assigned a probability.)
Axiom 2: Pr(X) = 1.
Axiom 3: For every “countable” sequence of disjoint events A_1, A_2, …,
Pr(A_1 ∪ A_2 ∪ …) = Pr(A_1) + Pr(A_2) + ….
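On a finite sample space the three axioms can be checked by brute force. The sketch below assumes a fair die (an assumption made only for this example) and enumerates all 2^6 = 64 events; on a finite space, Axiom 3 reduces to additivity over pairs of disjoint events, which is what the code tests. The names `pr` and `all_events` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})

def pr(event):
    """Probability of an event under an assumed fair die (exact fractions)."""
    return Fraction(len(frozenset(event) & SAMPLE_SPACE), len(SAMPLE_SPACE))

def all_events(space):
    """Every subset of the sample space, i.e. the power set 2^X."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(SAMPLE_SPACE)

axiom_1 = all(pr(a) >= 0 for a in events)
axiom_2 = pr(SAMPLE_SPACE) == 1
axiom_3 = all(pr(a | b) == pr(a) + pr(b)
              for a in events for b in events if a.isdisjoint(b))

print(len(events), axiom_1, axiom_2, axiom_3)
```

Using `Fraction` keeps the additivity check exact, avoiding floating-point rounding in the comparison.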
Some probability theorems:
Given a parameter space Θ and a sample space X, a model family indexed by Θ is a set of probability distributions {Pr(· | θ) : θ ∈ Θ} on X.
When X is observed, the likelihood function is the function L: Θ → ℝ+ defined by
L(θ) = Pr(X | θ).
(Later, we’ll modify this slightly for “continuous distributions”.)
The likelihood function is defined on Θ, the parameter space.
The probability model is defined on X, the sample space.
In Table 1, copied here, each row is a probability model and each column is a likelihood function.
PROCESS          X=negative   X=positive   X=indeterminate   TOTAL
θ = “healthy”       0.95         0.03           0.02          1.00
θ = “sick”          0.03         0.95           0.02          1.00