PHY310: Lecture 03 (02/04/07)
Experimental Uncertainty and Probability
Road Map
The meaning of experimental uncertainty
The fundamental concepts of probability
PHY310: Statistical Data Analysis
Experimental Uncertainty
Statements you might have heard
The temperature is
22 ± 1 °C
The electron charge is
(1.602176462 ± 0.000000063) × 10^-19 C
The solar neutrino flux is
(2.35 ± 0.02 (stat) ± 0.08 (sys)) × 10^6 cm^-2 s^-1
Figures you might have seen
What Does the Uncertainty Mean?
The electron charge is (1.602176462 ± 0.000000063) × 10^-19 C
We intuitively know what this means
The “best estimate” of the value is 1.602176462
There is some probability (usually 68%) that the true charge is between
1.602176399 and 1.602176525 (× 10^-19 C).
This is a statement about our knowledge. Or is it?
Depends on the definition of probability
The True Value cannot be found statistically
This is fundamental:
There is a single true value.
No matter how precise a measurement gets, truth is unknowable.
What Does Uncertainty Mean?
A thermometer with a random uncertainty of ±5 K is used to measure a
true temperature T = 0.1 K
About half the measurements are like T = 1 ± 5 K
About half the measurements are like T = −1 ± 5 K
Is this reasonable?
You KNOW the temperature is greater than zero (2nd law of
thermodynamics)
But, the thermometer does measure negative values.
Time to define a confidence interval, but it depends on the definition of
probability
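The situation above is easy to simulate. The sketch below (a minimal illustration, with the true temperature and uncertainty taken from the slide) draws many Gaussian thermometer readings around T = 0.1 K with σ = 5 K and counts how often the reading comes out negative:

```python
import random

random.seed(42)
T_TRUE, SIGMA, N = 0.1, 5.0, 100_000  # values from the slide; N is illustrative

# Each measurement is the true temperature plus Gaussian random noise
readings = [random.gauss(T_TRUE, SIGMA) for _ in range(N)]

# Count how often the thermometer reads below absolute zero
frac_negative = sum(1 for t in readings if t < 0) / N
print(f"fraction of negative readings: {frac_negative:.3f}")
```

With the true value barely above zero, roughly half the readings are negative, even though a negative temperature is physically impossible.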
Classical Probability
Definition: Probability is the relative frequency of an event as the
number of trials tends towards infinity.
This is an objective definition
There are some strange consequences
How do you assign uncertainty to an approximation?
What about systematic uncertainty? (Discussion later this semester)
How do you define “relative frequency” without referencing probability?
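The definition can be demonstrated numerically. This sketch (illustrative parameters, not from the slides) shows the relative frequency of "heads" in simulated fair-coin tosses approaching the probability 0.5 as the number of trials grows:

```python
import random

random.seed(1)

def relative_frequency(n_trials):
    # Relative frequency of "heads" in n_trials simulated fair-coin tosses
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# The frequency wanders for small n and settles toward 0.5 for large n
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```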
Confidence Interval
There is a single true value.
There is no such thing as a “classical probability” of a true value.
We don't (can't!) know that value.
Must construct a meaningful distribution to represent uncertainty as a
probability:
A confidence interval is a member of a set of intervals which contain the true
value with a given frequency
This means that T = -1 ± 5 K is a member of a set of intervals that contains the
true value 68% of the time
In the usual sense, there is not a 68% chance that the true temperature is in
the confidence interval
Consider T = -6 ± 5 K
The true temperature must be outside of this interval
You can't say that there is a 50% chance that T < -6 K
That is OK!
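The coverage property of the set of intervals can be checked by simulation. This sketch (true temperature and σ from the slide, trial count illustrative) builds the interval [T_meas − 5, T_meas + 5] for many simulated measurements and counts what fraction of the intervals contain the true value; for a Gaussian, a ±1σ interval covers about 68%:

```python
import random

random.seed(7)
T_TRUE, SIGMA, N = 0.1, 5.0, 100_000

covered = 0
for _ in range(N):
    t_meas = random.gauss(T_TRUE, SIGMA)
    # Does the reported interval [t_meas - sigma, t_meas + sigma]
    # contain the true value?
    if t_meas - SIGMA <= T_TRUE <= t_meas + SIGMA:
        covered += 1

print(f"coverage: {covered / N:.3f}")  # close to 0.683
```

Note that individual intervals like [−11, −1] can miss the true value entirely; it is the set as a whole that covers 68% of the time.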
Confidence Intervals Reiterated
A 68% confidence interval is a member of a set of intervals of which
68% contain the true value.
A false statement: “There is a 68% chance that the true value is in the
confidence interval”
A single confidence interval (one member of the set) doesn't necessarily tell
you much about the rest of the set
If you are near a physical boundary, report the expected sensitivity of your
experiment (e.g. Tsens = 5 K)
Confusing the Issue:
Subjective/Bayesian/Modern Probability
Definition: Probability is a measure of the degree of belief that an event
will occur.
More general than the classical definition of probability
In fact, the classical definition is a special case of the subjective definition
Matches the colloquial meaning of probability
More importantly matches our ideas about theoretical and systematic
uncertainty
Examples:
T = 22 ± 1 °C means there is a 50% chance that T < 22 °C
If your thermometer measures T = −6 K there is a 50% chance that T < ~4 K
BUT
This introduces a subjective degree of belief
Considered EVIL by some physicists
This view is considered SILLY by some statisticians
Sorting it out
When you report an experimental result
Use a “Classical” confidence interval to report your measurements
Understand that your systematic uncertainty is “Subjective”
Understand that your theoretical uncertainty is “Subjective”
Keep the classical and subjective uncertainties separate
The solar neutrino flux is (2.35 ± 0.02 (stat) ± 0.08 (sys)) × 10^6 cm^-2 s^-1
When you need to make a decision use a Subjective confidence interval
When you write a paper:
Include enough information so that the reader can calculate both types of
confidence intervals
Notation for Probability
P(A): The probability of A
This is the probability that an event A will occur
P is not a function, it's just notation
When A is discrete, this is a single number (e.g. the probability of a coin toss)
When A is continuous, this is represented by a function
The probability of A satisfies
0 ≤ P(A) ≤ 1
P(not A) = 1 − P(A)
P(A|B): The conditional probability of A given B
This is the probability that an event A will occur given that B has occurred
P is not a function, it's just notation (&c)
Random Variables
When we talk about P(A), we implicitly assume that A is being drawn
from a set of possible values (the “Parent Distribution”)
The value of P(A) is the ratio of the number of times A occurs in the parent
distribution to the total number of elements in the parent distribution
Example:
Parent Distribution: {AAABBCCCCD}
P(A) = (number of times A is in the set) / (number of elements in the set)
P(A) = 3/10
P(B) = 2/10 = 1/5
P(C) = 4/10 = 2/5
P(D) = 1/10
Sum Rule: P(A) + P(B) + P(C) + P(D) = 1
This is the “Law of Total Probability”: Σ_i P(A_i) = 1
If the set is continuous it's got an infinite number of elements
A variable with values drawn from a parent distribution is called a
“Random Variable”
Can be continuous (x, y, z, t, &c), or discrete (n, m, &c)
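The parent-distribution example above can be worked through directly. This sketch counts each outcome in {AAABBCCCCD} and divides by the size of the set, using exact fractions so the sum rule comes out exactly:

```python
from collections import Counter
from fractions import Fraction

parent = "AAABBCCCCD"  # the parent distribution from the slide
counts = Counter(parent)

# P(X) = (number of times X is in the set) / (number of elements in the set)
probs = {k: Fraction(v, len(parent)) for k, v in counts.items()}

print(probs["A"])           # 3/10
print(probs["B"])           # 1/5
print(probs["C"])           # 2/5
print(probs["D"])           # 1/10
print(sum(probs.values()))  # 1  (law of total probability)
```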
Describing Continuous Parent
Distributions
Think about a random variable x drawn from a continuous parent
distribution
We want to describe the probability that x0 is between x and x+dx
P(x<x0<x+dx) is the probability that x0 is in the interval [x, x+dx]
This can be described using a “Probability Density Function”
P(x<x0<x+dx) = f(x)dx
f(x) is called the “probability density function” or p.d.f.
The law of total probability gives the normalization:
∫_{−∞}^{+∞} f(x) dx = 1
Sometimes it's useful to deal with the “Cumulative Distribution”:
F(x) = ∫_{−∞}^{x} f(x′) dx′
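Both properties can be checked numerically. This sketch (an illustrative example, not from the slides) evaluates a standard Gaussian p.d.f. on a grid, verifies that it integrates to 1, and evaluates the cumulative distribution F(0), which is 0.5 by symmetry:

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    # Standard Gaussian probability density function
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Riemann sum on [-8, 8], well beyond where the Gaussian has any weight
xs = [-8 + 16 * i / 10_000 for i in range(10_001)]
dx = xs[1] - xs[0]

# Normalization: integral of f(x) over all x should be 1
total = sum(gauss_pdf(x) * dx for x in xs)
print(round(total, 4))  # ~1.0

# Cumulative distribution F(0) = integral from -inf to 0, equals 0.5 by symmetry
F0 = sum(gauss_pdf(x) * dx for x in xs if x <= 0)
print(round(F0, 3))  # ~0.5
```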
Multi-Dimensional P.D.F.s
A p.d.f. can depend on several parameters, for instance f(x,y)
The probability that a measurement is in both intervals [x,x+dx] and
[y,y+dy] is P([x,x+dx],[y,y+dy]) = f(x,y) dx dy
Normalization:
∫_{−∞}^{+∞} dx ∫_{−∞}^{+∞} dy f(x,y) = 1
Sometimes you need to know the probability that x is in the interval
[x,x+dx], but y can be any value: “Marginalize w.r.t. y”
P([x,x+dx]) = f_x(x) dx = dx ∫_{−∞}^{+∞} dy f(x,y)
The probability that a measurement is in an interval [x,x+dx] given y is
written P([x,x+dx]|y) = f(x;y) dx
Conditional probability of x given y:
f(x;y) = f(x,y) / ∫ dx′ f(x′,y) = f(x,y) / f_y(y)
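The same three operations are easiest to see on a discrete grid. This sketch (the joint probability table is made up for illustration) checks normalization, marginalizes over y, and forms the conditional of x given y = 1:

```python
# Made-up joint distribution f(x, y) on a 2x2 grid, normalized to 1
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Normalization: all probabilities sum to 1
total = sum(joint.values())

# Marginalize w.r.t. y: f_x(x) = sum over y of f(x, y)
f_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}

# Conditional of x given y = 1: f(x; y) = f(x, y) / f_y(y)
f_y1 = sum(p for (_, yi), p in joint.items() if yi == 1)
cond = {x: joint[(x, 1)] / f_y1 for x in (0, 1)}

print(total)  # ~1.0
print(f_x)    # ~{0: 0.3, 1: 0.7}  (up to float rounding)
print(cond)   # ~{0: 0.333, 1: 0.667}
```

The conditional is just a slice of the joint table, renormalized by the y marginal, exactly mirroring the continuous formula above.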
Bayes Theorem
P  A∣ B=
P  B∣ A P  A
P  B
This is the jackknife of probability theory
P(A) is the “prior probability” for A
P(B) is the probability that B will occur (the “normalization”)
P(A|B) is the “posterior probability” for A
You have to be extremely careful with Bayes Theorem if you are trying
to have a classical probability
With subjective probability:
the prior probability is what we know about A before we start
the posterior probability is what we learned about A from our measurement
This is used a lot in information theory, robotics, control theory
(engineering), artificial intelligence, &c
Bayes Theorem Example
You go for a medical test and the result comes back positive
What you know
Only 0.1% of the population has the disease
So you can assume that P(disease) = 0.001 and P(no disease) = 0.999
Notice this is subjective: if you are at high risk, P(disease) might be higher!
The test is 98% efficient to detect the disease
P(+|disease) = 0.98 and P(−|disease) = 0.02
The test has 3% false positives:
P(+|no disease) = 0.03 and P(−|no disease) = 0.97
What you want to know: What is the probability that you have the disease?
P disease∣+=
P disease∣ +=
02/04/07
P +∣disease P disease
P +
P +∣disease P disease
P +∣disease P diseaseP +∣no disease P no disease
P disease∣+=
0.98×0.001
=0.032
0.98×0.0010.03×0.999
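The calculation above takes a few lines of code. This sketch uses exactly the numbers from the slide: the prior, the test efficiency, and the false-positive rate, combined via Bayes' theorem and the law of total probability:

```python
# Numbers from the slide
p_disease = 0.001           # prior: 0.1% of the population has the disease
p_pos_given_disease = 0.98  # test efficiency
p_pos_given_healthy = 0.03  # false-positive rate

# P(+) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(disease | +) via Bayes' theorem
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # 0.032
```

Despite the positive result, the posterior probability of disease is only about 3%, because the small prior is overwhelmed by false positives from the healthy 99.9% of the population.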