02/04/07
PHY310: Lecture 03
Experimental Uncertainty and Probability

Road Map
  The meaning of experimental uncertainty
  The fundamental concepts of probability

PHY310: Statistical Data Analysis

Experimental Uncertainty
Statements you might have heard
  The temperature is 22 ± 1 °C
  The electron charge is (1.602176462 ± 0.000000063) × 10⁻¹⁹ C
  The solar neutrino flux is (2.35 ± 0.02 (stat) ± 0.08 (sys)) × 10⁶ cm⁻² s⁻¹
Figures you might have seen

What Does the Uncertainty Mean?
The electron charge is (1.602176462 ± 0.000000063) × 10⁻¹⁹ C
We intuitively know what this means
  The “best estimate” of the value is 1.602176462 × 10⁻¹⁹ C
  There is some probability (usually 68%) that the true charge is between 1.602176399 × 10⁻¹⁹ C and 1.602176525 × 10⁻¹⁹ C
This is a statement about our knowledge. Or is it?
  It depends on the definition of probability
The True Value cannot be found statistically
  This is fundamental: there is a single true value.
  No matter how precise a measurement gets, truth is unknowable.

What Does Uncertainty Mean?
A thermometer with a random uncertainty of ±5 K is used to measure a true temperature T = 0.1 K
  About half the measurements are like T = 1 ± 5 K
  About half the measurements are like T = -1 ± 5 K
Is this reasonable?
  You KNOW the temperature is greater than zero (2nd law of thermodynamics)
  But the thermometer does measure negative values.
Time to define a confidence interval, but it depends on the definition of probability

Classical Probability
Definition: Probability is the relative frequency of an event as the number of trials tends towards infinity.
This is an objective definition
There are some strange consequences
  How do you assign uncertainty to an approximation?
  What about systematic uncertainty? (Discussion later this semester)
  How do you define “relative frequency” without referencing probability?
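The relative-frequency definition can be tried out on the thermometer example (true temperature T = 0.1 K, random uncertainty ±5 K). The sketch below is only an illustration: it assumes the readings are Gaussian with a standard deviation of 5 K, which the slides do not state, and the function name and seed are made up for this example. It counts how often a simulated reading comes out negative as the number of trials grows.

```python
import random


def fraction_negative(true_temp=0.1, sigma=5.0, n_trials=100_000, seed=310):
    """Relative frequency of negative readings from a simulated thermometer.

    Assumption (not from the slides): readings are Gaussian, centred on the
    true temperature, with standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    negatives = sum(
        1 for _ in range(n_trials) if rng.gauss(true_temp, sigma) < 0.0
    )
    return negatives / n_trials


if __name__ == "__main__":
    # The relative frequency settles down as the number of trials increases.
    for n in (100, 10_000, 100_000):
        print(n, fraction_negative(n_trials=n))
```

Under the Gaussian assumption the relative frequency approaches a value just under one half, which matches "about half the measurements" being negative even though the true temperature is positive.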

Confidence Interval
There is a single true value.
  There is no such thing as a “classical probability” of a true value.
  We don't (can't!) know that value.
Must construct a meaningful distribution to represent uncertainty as a probability:
  A confidence interval is a member of a set of intervals which contain the true value with a given frequency
This means that T = -1 ± 5 K is a member of a set of intervals that contains the true value 68% of the time
  In the usual sense, there is not a 68% chance that the true temperature is in the confidence interval
Consider T = -6 ± 5 K
  The true temperature must be outside of this interval
  You can't say that there is a 50% chance that T < -6 K
  That is OK!

Confidence Intervals Reiterated
A 68% confidence interval is a member of a set of intervals of which 68% contain the true value.
A false statement: “There is a 68% chance that the true value is in the confidence interval”
A single confidence interval (one member of the set) doesn't necessarily tell you much about the rest of the set
If you are near a physical boundary, report the expected sensitivity of your experiment (e.g. T_sens = 5 K)

Confusing the Issue: Subjective/Bayesian/Modern Probability
Definition: Probability is a measure of the degree of belief that an event will occur.
More general than the classical definition of probability
  In fact, the classical definition is a special case of the subjective definition
Matches the colloquial meaning of probability
More importantly, matches our ideas about theoretical and systematic uncertainty
Examples:
  T = 22 ± 1 °C means there is a 50% chance that T < 22 °C
  If your thermometer measures T = -6 K there is a 50% chance that T < ~4 K
BUT
  This introduces a subjective degree of belief
  Considered EVIL by some physicists
  This view is considered SILLY by some statisticians

Sorting it out
When you report an experimental result
  Use a “Classical” confidence interval to report your measurements
  Understand that your systematic uncertainty is “Subjective”
  Understand that your theoretical uncertainty is “Subjective”
  Keep the classical and subjective uncertainties separate
    The solar neutrino flux is (2.35 ± 0.02 (stat) ± 0.08 (sys)) × 10⁶ cm⁻² s⁻¹
When you need to make a decision, use a Subjective confidence interval
When you write a paper:
  Include enough information so that the reader can calculate both types of confidence intervals

Notation for Probability
P(A): The probability of A
  This is the probability that an event A will occur
  P is not a function, it's just notation
  When A is discrete, this is a single number (e.g. the probability of heads in a coin toss)
  When A is continuous, this is represented by a function
  The probability of A satisfies
    0 ≤ P(A) ≤ 1
    P(not A) = 1 - P(A)
P(A|B): The conditional probability of A given B
  This is the probability that an event A will occur given that B has occurred
  P is not a function, it's just notation (&c)

Random Variables
When we talk about P(A), we implicitly assume that A is being drawn from a set of possible values (the “Parent Distribution”)
The value of P(A) is the ratio of the number of times A occurs in the parent distribution to the total number of elements in the parent distribution
  $$P(A) = \frac{\text{Number of times } A \text{ is in the set}}{\text{Number of elements in the set}}$$
Example:
  Parent Distribution: {AAABBCCCCD}
  P(A) = 3/10, P(B) = 1/5, P(C) = 2/5, P(D) = 1/10
  Sum Rule: P(A) + P(B) + P(C) + P(D) = 1
This is the “Law of Total Probability”
  $$\sum_i P(A_i) = 1$$
If the set is continuous, it has an infinite number of elements
A variable with values drawn from a parent distribution is called a “Random Variable”
  Can be continuous (x, y, z, t, &c), or discrete (n, m, &c)

Describing Continuous Parent Distributions
Think about a random variable whose value x0 is drawn from a continuous parent distribution
  We want to describe the probability that x0 is between x and x+dx
  P(x<x0<x+dx) is the probability that x0 is in the interval [x, x+dx]
This can be described using a “Probability Density Function”
  P(x<x0<x+dx) = f(x)dx
  f(x) is called the “probability density function” or p.d.f.
The law of total probability gives the normalization
  $$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
Sometimes it's useful to deal with the “Cumulative Distribution”
  $$F(x) = \int_{-\infty}^{x} f(x')\,dx'$$

Multi-Dimensional P.D.F.s
A p.d.f. can depend on several variables, for instance f(x,y)
  The probability that a measurement is in both intervals [x,x+dx] and [y,y+dy] is P([x,x+dx],[y,y+dy]) = f(x,y)dxdy
  Sometimes you need to know the probability that x is in the interval [x,x+dx], but y can be any value: “Marginalize w.r.t. y”
  The probability that a measurement is in an interval [x,x+dx] given y is written P([x,x+dx]|y) = f(x;y)dx
[Figure: x and y axes with the intervals [x, x+dx] and [y, y+dy] marked]
Normalization
  $$\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, f(x,y) = 1$$
Marginalization
  $$P([x,x+dx]) = f_x(x)\,dx = dx \int_{-\infty}^{\infty} dy\, f(x,y)$$
Conditional Probability of x given y
  $$P([x,x+dx] \mid y) = f(x;y)\,dx = \frac{f(x,y)}{\int dx'\, f(x',y)}\,dx = \frac{f(x,y)}{f_y(y)}\,dx$$

Bayes Theorem
  $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
This is the jackknife of probability theory
  P(A) is the “prior probability” for A
  P(B) is the probability that B will occur (the “normalization”)
  P(A|B) is the “posterior probability” for A
You have to be extremely careful with Bayes Theorem if you are trying to have a classical probability
With subjective probability:
  the prior probability is what we know about A before we start
  the posterior probability is what we know about A after our measurement
This is used a lot in information theory, robotics, control theory (engineering), artificial intelligence, &c

Bayes Theorem Example
You go for a medical test and the result comes back positive
What you know
  Only 0.1% of the population has the disease
    So you can assume that P(disease) = 0.001 and P(no disease) = 0.999
    Notice this is subjective: if you are at high risk, P(disease) might be higher!
  The test is 98% efficient at detecting the disease
    P(+|disease) = 0.98 and P(-|disease) = 0.02
  The test has 3% false positives
    P(+|no disease) = 0.03 and P(-|no disease) = 0.97
What you want to know: What is the probability that you have the disease?
  $$P(\text{disease} \mid +) = \frac{P(+ \mid \text{disease})\,P(\text{disease})}{P(+)}$$
  $$P(\text{disease} \mid +) = \frac{P(+ \mid \text{disease})\,P(\text{disease})}{P(+ \mid \text{disease})\,P(\text{disease}) + P(+ \mid \text{no disease})\,P(\text{no disease})}$$
  $$P(\text{disease} \mid +) = \frac{0.98 \times 0.001}{0.98 \times 0.001 + 0.03 \times 0.999} = 0.032$$
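
The arithmetic in the medical test example can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture: the function name `posterior_given_positive` and its argument names are made up for illustration, and the "high risk" prior of 0.1 is a hypothetical value chosen only to show how strongly the answer depends on the prior.

```python
def posterior_given_positive(prior, p_pos_given_disease, p_pos_given_healthy):
    """Bayes Theorem for a positive test result.

    prior               -- P(disease) before the test
    p_pos_given_disease -- P(+|disease), the detection efficiency
    p_pos_given_healthy -- P(+|no disease), the false positive rate
    Returns P(disease|+).
    """
    # P(+) from summing over both cases: disease and no disease
    p_positive = (p_pos_given_disease * prior
                  + p_pos_given_healthy * (1.0 - prior))
    return p_pos_given_disease * prior / p_positive


if __name__ == "__main__":
    # Numbers from the example: 0.1% prevalence, 98% efficiency, 3% false positives
    print(round(posterior_given_positive(0.001, 0.98, 0.03), 3))
    # Hypothetical high-risk prior (an assumption, not from the slides)
    print(round(posterior_given_positive(0.1, 0.98, 0.03), 3))
```

With the numbers from the example the first call reproduces the 0.032 above. Changing only the prior changes the posterior substantially, which is the sense in which the result is subjective.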