Perspective on Astrostatistics from Particle Physics Louis Lyons Particle Physics, Oxford (CDF experiment, Fermilab) l.lyons@physics.ox.ac.uk SCMA IV Penn State 15th June 2006 1 Topics • • • • • • Basic Particle Physics analyses Similarities between Particles and Astrophysics issues Differences What Astrophysicists do particularly well What Particle Physicists have learnt Conclusions 2 Typical Analysis • Parameter determination: dn/dt = 1/τ * exp(-t / τ) Worry about backgrounds, t resolution, t-dependent efficiency • 1) Reconstruct tracks • 2) Select real events • 3) Select wanted events • 4) Extract t from L and v • 5) Model signal and background • 6) Likelihood fit for lifetime and statistical error • 7) Estimate systematic error τ ± στ (stat) ± στ (syst) Goodness of fit 9 Typical Analysis Hypothesis testing: Peak or statistical fluctuation? Why 5σ ? a) Past experience b) Look elsewhere c) Bayes priors 10 Similarities • • • • • • • • • Large data sets {ATLAS: event = Mbyte; total = 10 Pbytes} Experimental resolution Acceptance Systematics Separating signal from background Parameter estimation Testing models (versus alternative?) Search for signals: Setting limits or Discovery SCMA and PHYSTAT 11 Differences • Bayes or Frequentism? • Background • Specific Astrophysics issues Time dependence Spatial structures Correlations Non-parametric methods Visualisation Cosmic variance • Blind analyses ` 12 Bayesian versus Frequentism Bayesian Basis of method Bayes Theorem Posterior probability distribution Frequentist Uses pdf for data, for fixed parameters Meaning of Degree of belief probability Prob of Yes parameters? Frequentist definition Needs prior? Yes No Choice of interval? Data considered Likelihood principle? Yes Yes (except F+C) Only data you have ...+ other possible data Yes No Anathema 13 Bayesian versus Frequentism Bayesian Ensemble of experiments No Final statement Posterior probability distribution Unphysical/ empty ranges Excluded by prior Systematics Integrate over prior Coverage Decision making Unimportant Yes (uses cost function) Frequentist Yes (but often not explicit) Parameter values Data is likely Can occur Extend dimensionality of frequentist construction Built-in Not useful 14 Bayesianism versus Frequentism “Bayesians address the question everyone is interested in, by using assumptions no-one believes” “Frequentists use impeccable logic to deal with an issue of no interest to anyone” 15 Differences • Bayes or Frequentism? • Background • Specific Astrophysics issues Time dependence Spatial structures Correlations Non-parametric methods Visualisation Cosmic variance • Blind analyses ` 16 What Astrophysicists do well • • • • • • • • Glorious pictures Scientists + Statisticians working together Sharing data Making data publicly available Dealing with large data sets Visualisation Funding for Astrostatistics Statistical software 17 Whirlpool Galaxy Width of Z0 3 light neutrinos 18 What Astrophysicists do well • • • • • • • Glorious pictures Sharing data Making data publicly available Dealing with large data sets Visualisation Funding for Astrostatistics Statistical software 19 What Particle Physicists now know • (ln L) = 0.5 rule • Unbinned Lmax and Goodness of fit pU pL L dp 0.90 • Prob (data | hypothesis) ≠ Prob (hypothesis | data) • Comparing 2 hypotheses • Δ(c 2) ≠ c 2 • Bounded parameters: Feldman and Cousins • Use correct L (Punzi effect) • Blind analyses 20 ΔlnL = -1/2 rule If L(μ) is Gaussian, following definitions of σ are equivalent: 1) RMS of L(µ) 2) 1/√(-d2L/dµ2) 3) ln(L(μ±σ) = ln(L(μ0)) -1/2 If L(μ) is non-Gaussian, these are no longer the same “Procedure 3) above still gives interval that contains the true value of parameter μ with 68% probability” (Even Gaussian L(µ) might not) Heinrich: CDF note 6438 (see CDF Statistics Committee Web-page) Barlow: Phystat05 21 COVERAGE How often does quoted range for parameter include param’s true value? N.B. Coverage is a property of METHOD, not of a particular exptl result Coverage can vary with Study coverage of different methods of Poisson parameter observation of number of events n , from 100% Hope for: Nominal value C ( ) 22 Coverage : L approach (Not frequentist) P(n,μ) = e-μμn/n! -2 lnλ< 1 (Joel Heinrich CDF note 6438) λ = P(n,μ)/P(n,μbest) UNDERCOVERS 24 Frequentist central intervals, NEVER undercover (Conservative at both ends) 25 Unbinned Lmax and Goodness of Fit? Find params by maximising L So larger L better than smaller L So Lmax gives Goodness of Fit?? Bad Good? Great? Monte Carlo distribution of unbinned Lmax Frequency Lmax 27 Example 1 Lmax and Goodness of Fit? Fit exponential to times t1, t2 ,t3 ……. L = e t [Joel Heinrich, CDF 5639] i ln Lmax = -N(1 + ln tav) i.e. Depends only on AVERAGE t, but is INDEPENDENT OF DISTRIBUTION OF t (except for……..) (Average t is a sufficient statistic) Variation of Lmax in Monte Carlo is due to variations in samples’ average t , but NOT TO BETTER OR WORSE FIT pdf Same average t same Lmax t 29 Now for Likelihood When parameter changes from λ to τ = 1/λ (a’) L does not change dn/dt = 1/τ exp{-t/τ} and so L(τ;t) = L(λ=1/τ;t) because identical numbers occur in evaluations of the two L’s BUT (b’) 0 L (;t ) d 0 L (;t ) d 0 So it is NOT meaningful to integrate L (However,………) 33 CONCLUSION: pu L dp NOT recognised statistical procedure pl [Metric dependent: τ range agrees with τpred λ range inconsistent with 1/τpred ] BUT 1) Could regard as “black box” 2) Make respectable by L Bayes’ posterior Posterior(λ) ~ L(λ)* Prior(λ) [and Prior(λ) can be constant] 34 P (Data;Theory) P (Theory;Data) Theory = male or female Data = pregnant or not pregnant P (pregnant ; female) ~ 3% 36 P (Data;Theory) P (Theory;Data) Theory = male or female Data = pregnant or not pregnant P (pregnant ; female) ~ 3% but P (female ; pregnant) >>>3% 37 P (Data;Theory) P (Theory;Data) HIGGS SEARCH at CERN Is data consistent with Standard Model? or with Standard Model + Higgs? End of Sept 2000 Data not very consistent with S.M. Prob (Data ; S.M.) < 1% valid frequentist statement Turned by the press into: Prob (S.M. ; Data) < 1% and therefore Prob (Higgs ; Data) > 99% i.e. “It is almost certain that the Higgs has been seen” 38 p-value ≠ Prob of hypothesis being correct After Conference Banquet speech: “Of those results that have been quoted as significant at the 99% level, about half have turned out to be wrong!” Supposed to be funny, but in fact is perfectly OK 40 PARADOX Histogram with 100 bins Fit 1 parameter Smin: χ2 with NDF = 99 (Expected χ2 = 99 ± 14) For our data, Smin(p0) = 90 Is p1 acceptable if S(p1) = 115? 1) YES. 2) NO. Very acceptable χ2 probability σp from S(p0 +σp) = Smin +1 = 91 But S(p1) – S(p0) = 25 So p1 is 5σ away from best value 41 42 44 45 Comparing data with different hypotheses 46 χ2 with ν degrees of freedom? 1) ν = data – free parameters ? Why asymptotic (apart from Poisson Gaussian) ? a) Fit flatish histogram with y = N {1 + 10-6 cos(x-x0)} x0 = free param b) Neutrino oscillations: almost degenerate parameters y ~ 1 – A sin2(1.27 Δm2 L/E) 2 parameters 1 – A (1.27 Δm2 L/E)2 1 parameter Small Δm2 47 χ2 with ν degrees of freedom? 2) Is difference in χ2 distributed as χ2 ? H0 is true. Also fit with H1 with k extra params e. g. Look for Gaussian peak on top of smooth background y = C(x) + A exp{-0.5 ((x-x0)/σ)2} Is χ2H0 - χ2H1 distributed as χ2 with ν = k = 3 ? Relevant for assessing whether enhancement in data is just a statistical fluctuation, or something more interesting N.B. Under H0 (y = C(x)) : A=0 (boundary of physical region) x0 and σ undefined 48 Is difference in χ2 distributed as χ2 ? So need to determine the Δχ2 distribution by Monte Carlo N.B. 1) Determining Δχ2 for hypothesis H1 when data is generated according to H0 is not trivial, because there will be lots of local minima 2) If we are interested in 5σ significance level, needs lots of MC simulations 50 51 52 Xobs = -2 Now gives upper limit 56 58 Getting L wrong: Punzi effect Giovanni Punzi @ PHYSTAT2003 “Comments on L fits with variable resolution” Separate two close signals, when resolution σ varies event by event, and is different for 2 signals e.g. 1) Signal 1 1+cos2θ Signal 2 Isotropic and different parts of detector give different σ 2) M (or τ) Different numbers of tracks different σM (or στ) 59 Punzi Effect Events characterised by xi and σi A events centred on x = 0 B events centred on x = 1 L(f)wrong = Π [f * G(xi,0,σi) + (1-f) * G(xi,1,σi)] L(f)right = Π [f*p(xi,σi;A) + (1-f) * p(xi,σi;B)] p(S,T) = p(S|T) * p(T) p(xi,σi|A) = p(xi|σi,A) * p(σi|A) = G(xi,0,σi) * p(σi|A) So L(f)right = Π[f * G(xi,0,σi) * p(σi|A) + (1-f) * G(xi,1,σi) * p(σi|B)] If p(σ|A) = p(σ|B), Lright = Lwrong but NOT otherwise 60 Punzi Effect Punzi’s Monte Carlo for A : G(x,0, A) B : G(x,1, B) fA = 1/3 Lwrong Lright A B 1.0 1 .0 0.336(3) 0.08 Same 1.0 1.1 0.374(4) 0.08 0. 333(0) 0 1.0 2.0 0.645(6) 0.12 0.333(0) 0 12 1.5 3 0.514(7) 0.14 0.335(2) 0.03 1.0 12 0.482(9) 0.09 0.333(0) fA f fA f 0 1) Lwrong OK for p(A) p(B) , but otherwise BIASSED 2) Lright unbiassed, but Lwrong biassed (enormously)! 3) Lright gives smaller σf than Lwrong 61 Explanation of Punzi bias σA = 1 σB = 2 A events with σ = 1 B events with σ = 2 x ACTUAL DISTRIBUTION x FITTING FUNCTION [NA/NB variable, but same for A and B events] Fit gives upward bias for NA/NB because (i) that is much better for A events; and (ii) it does not hurt too much for B events 62 Another scenario for Punzi problem: PID A π B M K TOF Originally: Positions of peaks = constant K-peak π-peak at large momentum σi variable, (σi)A ≠ (σi)B σi ~ constant, pK ≠ pπ COMMON FEATURE: Separation / Error ≠ Constant Where else?? MORAL: Beware of event-by-event variables whose pdf’s do not appear in L 63 Avoiding Punzi Bias • Include p(σ|A) and p(σ|B) in fit (But then, for example, particle identification may be determined more by momentum distribution than by PID) OR • Fit each range of σi separately, and add (NA)i (NA)total, and similarly for B Incorrect method using Lwrong uses weighted average of (fA)j, assumed to be independent of j Talk by Catastini at PHYSTAT05 64 BLIND ANALYSES Why blind analysis? Methods of blinding Selections, corrections, method Add random number to result * Study procedure with simulation only Look at only first fraction of data Keep the signal box closed Keep MC parameters hidden Keep fraction visible for each bin hidden After analysis is unblinded, …….. * Luis Alvarez suggestion re “discovery” of free quarks 65 Conclusions • Common problems: scope for learning from each other Large data sets Separating signal Testing models Signal / Discovery • Targetted Workshops e.g. SAMSI: JanMay 2006 Banff: July 2006 (Limits with nuisance params; significance tests: signal-bgd sepn.) • Summer schools: Spain, FNAL,…… Thanks to Stefen Lauritzen, Lance Miller, Subir Sarkar, SAMSI, Roberto Trotta.……… 66 • Excellent Conference: Very big THANK YOU to Jogesh and Eric ! 67