Perspective on Astrostatistics from Particle Physics

Louis Lyons
Particle Physics, Oxford (CDF experiment, Fermilab)
l.lyons@physics.ox.ac.uk

SCMA IV, Penn State, 15th June 2006
Topics

• Basic Particle Physics analyses
• Similarities between Particle Physics and Astrophysics issues
• Differences
• What Astrophysicists do particularly well
• What Particle Physicists have learnt
• Conclusions
Typical Analysis

• Parameter determination, e.g. a lifetime fit:
  dn/dt = (1/τ) exp(-t/τ)
  Worry about backgrounds, t resolution, and t-dependent efficiency.
  1) Reconstruct tracks
  2) Select real events
  3) Select wanted events
  4) Extract t from the decay length L and velocity v
  5) Model signal and background
  6) Likelihood fit for the lifetime and its statistical error
  7) Estimate the systematic error
  Result: τ ± στ (stat) ± στ (syst)
  Goodness of fit
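As a concrete illustration of step 6, here is a minimal sketch of an unbinned likelihood fit for τ on toy data (the sample size, true value, and scan-based error extraction are illustrative assumptions, not the CDF procedure):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=1)
t = rng.exponential(scale=1.5, size=1000)   # toy decay times, assumed true tau = 1.5

def nll(tau):
    """Negative log-likelihood for dn/dt = (1/tau) exp(-t/tau)."""
    return np.sum(t / tau + np.log(tau))

fit = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
tau_hat = fit.x

# Statistical error from the Delta(lnL) = -1/2 rule: scan for the crossing points
taus = np.linspace(0.8 * tau_hat, 1.2 * tau_hat, 2001)
dll = nll(tau_hat) - np.array([nll(x) for x in taus])   # lnL(tau) - lnL(tau_hat)
inside = taus[dll > -0.5]
print(f"tau = {tau_hat:.3f}  +{inside.max()-tau_hat:.3f} -{tau_hat-inside.min():.3f} (stat)")
```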
Typical Analysis
Hypothesis testing: Peak or statistical fluctuation?
Why 5σ ?
a) Past experience
b) Look elsewhere
c) Bayes priors
Similarities

• Large data sets {ATLAS: event = Mbyte; total = 10 Pbytes}
• Experimental resolution
• Acceptance
• Systematics
• Separating signal from background
• Parameter estimation
• Testing models (versus alternative?)
• Search for signals: setting limits or discovery
• SCMA and PHYSTAT
Differences

• Bayes or Frequentism?
• Background
• Specific Astrophysics issues:
  Time dependence
  Spatial structures
  Correlations
  Non-parametric methods
  Visualisation
  Cosmic variance
• Blind analyses
Bayesian versus Frequentism

                          Bayesian                        Frequentist
Basis of method           Bayes' Theorem -->              Uses pdf for data,
                          posterior probability           for fixed parameters
                          distribution
Meaning of probability    Degree of belief                Frequentist definition
Prob of parameters?       Yes                             Anathema
Needs prior?              Yes                             No
Choice of interval?       Yes                             Yes (except F+C)
Data considered           Only data you have              ... + other possible data
Likelihood principle?     Yes                             No
Bayesian versus Frequentism (continued)

                          Bayesian                        Frequentist
Ensemble of experiments   No                              Yes (but often not explicit)
Final statement           Posterior probability           Parameter values -->
                          distribution                    data is likely
Unphysical/empty ranges   Excluded by prior               Can occur
Systematics               Integrate over prior            Extend dimensionality of
                                                          frequentist construction
Coverage                  Unimportant                     Built-in
Decision making           Yes (uses cost function)        Not useful
Bayesianism versus Frequentism

"Bayesians address the question everyone is interested in, by using assumptions no-one believes."

"Frequentists use impeccable logic to deal with an issue of no interest to anyone."
What Astrophysicists do well

• Glorious pictures
• Scientists + statisticians working together
• Sharing data
• Making data publicly available
• Dealing with large data sets
• Visualisation
• Funding for Astrostatistics
• Statistical software
[Figures: the Whirlpool Galaxy; the width of the Z0 resonance, giving 3 light neutrino species]
What Particle Physicists now know

• The Δ(ln L) = -1/2 rule
• Unbinned Lmax and goodness of fit
• ∫_{pL}^{pU} L dp = 0.90 is NOT a recognised statistical procedure
• Prob(data | hypothesis) ≠ Prob(hypothesis | data)
• Comparing 2 hypotheses
• Δ(χ²) ≠ χ²
• Bounded parameters: Feldman and Cousins
• Use the correct L (Punzi effect)
• Blind analyses
Δ(ln L) = -1/2 rule

If L(μ) is Gaussian, the following definitions of σ are equivalent:
1) RMS of L(μ)
2) 1/√(-d²lnL/dμ²)
3) ln L(μ0 ± σ) = ln L(μ0) - 1/2

If L(μ) is non-Gaussian, these are no longer the same.

"Procedure 3) above still gives an interval that contains the true value of the parameter μ with 68% probability" — but this is not guaranteed once L is non-Gaussian (and even a Gaussian L(μ) might not).

Heinrich: CDF note 6438 (see CDF Statistics Committee web page)
Barlow: PHYSTAT05
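A minimal numerical sketch of the point, assuming a Poisson likelihood L(μ) = e^{-μ} μ^n / n! with n = 5 observed (a toy assumption, not from the talk): the curvature estimate 2) and the Δ(ln L) = -1/2 interval 3) disagree once L is non-Gaussian.

```python
import numpy as np

n = 5                                      # assumed toy observation
lnL = lambda mu: n * np.log(mu) - mu       # Poisson log-likelihood, constants dropped

mu_hat = float(n)                          # maximum-likelihood estimate
# Definition 2): sigma from the curvature of lnL at the maximum (-d2lnL/dmu2 = n/mu^2)
sigma_curv = 1.0 / np.sqrt(n / mu_hat**2)

# Definition 3): scan for where lnL drops by 1/2 from its maximum
mus = np.linspace(0.5, 15.0, 20001)
inside = mus[lnL(mus) > lnL(mu_hat) - 0.5]
lo, hi = inside.min(), inside.max()

print(f"curvature:        mu = {mu_hat} +/- {sigma_curv:.2f}")
print(f"Delta lnL = -1/2: mu = {mu_hat} +{hi-mu_hat:.2f} -{mu_hat-lo:.2f}   (asymmetric)")
```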
COVERAGE

How often does the quoted range for a parameter include the parameter's true value?

N.B. Coverage is a property of the METHOD, not of a particular experimental result.

Coverage can vary with μ. Study the coverage C(μ) of different methods for a Poisson parameter μ, from observation of the number of events n.

[Figure: C(μ) versus μ, on a scale up to 100%; hope for C(μ) equal to the nominal value for all μ]
Coverage: likelihood approach (not frequentist)

P(n, μ) = e^{-μ} μ^n / n!
λ = P(n, μ) / P(n, μ_best); the interval is the set of μ with -2 ln λ < 1
(Joel Heinrich, CDF note 6438)

This UNDERCOVERS.
Frequentist central intervals NEVER undercover (conservative at both ends).
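A sketch of such a coverage scan, assuming the likelihood interval defined above (-2 ln λ < 1, i.e. ln L within 1/2 of its maximum) for a Poisson mean: for each assumed true μ, sum the Poisson probabilities of the n values whose quoted interval contains μ.

```python
import numpy as np
from scipy.stats import poisson

def lnL_interval(n, mu_grid):
    """mu values with -2 ln(lambda) < 1, i.e. lnL within 1/2 of its maximum."""
    mu_hat = max(n, 1e-9)                          # MLE (guard n = 0)
    lnL = n * np.log(mu_grid) - mu_grid
    lnL_hat = n * np.log(mu_hat) - mu_hat if n > 0 else 0.0
    ok = mu_grid[lnL > lnL_hat - 0.5]
    return (ok.min(), ok.max()) if ok.size else (0.0, 0.0)

mu_grid = np.linspace(1e-3, 30, 3000)
for mu_true in [0.5, 1.0, 2.0, 5.0, 10.0]:
    ns = np.arange(0, 60)
    probs = poisson.pmf(ns, mu_true)
    cover = sum(p for n, p in zip(ns, probs)
                if lnL_interval(n, mu_grid)[0] <= mu_true <= lnL_interval(n, mu_grid)[1])
    print(f"mu = {mu_true:4.1f}   coverage = {cover:.3f}   (nominal 0.683)")
```

For small μ the computed coverage should fall well below 68%, in line with the UNDERCOVERS statement above.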
Unbinned Lmax and goodness of fit?

We find the parameters by maximising L, so a larger L is better than a smaller L.
So does Lmax give a goodness of fit??

[Figure: Monte Carlo frequency distribution of unbinned Lmax, with regions marked "Bad", "Good?", "Great?"]
Example 1: Lmax and goodness of fit?

Fit an exponential to times t1, t2, t3, .......
L = Π_i λ exp(-λ t_i)
[Joel Heinrich, CDF 5639]

ln Lmax = -N(1 + ln t_av)

i.e. it depends only on the AVERAGE t, but is INDEPENDENT OF THE DISTRIBUTION OF t
(except for……..)
(The average t is a sufficient statistic.)

The variation of Lmax in Monte Carlo is due to variations in the samples' average t, and NOT TO BETTER OR WORSE FIT.

[Figure: pdfs of very different shape but the same average t give the same Lmax]
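A small sketch of this point (my own toy, not Heinrich's study): samples of wildly different shape, forced to have the same average t, give exactly the same ln Lmax even though the exponential fit quality ranges from good to absurd.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

def ln_Lmax(t):
    """Maximum of lnL = sum[ln(lam) - lam*t_i]; the MLE is lam = 1/mean(t)."""
    return -len(t) * (1.0 + np.log(t.mean()))

samples = {
    "exponential (good fit)":        rng.exponential(1.0, N),
    "uniform on [0,2] (poor fit)":   rng.uniform(0.0, 2.0, N),
    "all t identical (terrible fit)": np.full(N, 1.0),
}
for name, t in samples.items():
    t = t / t.mean()              # force the same average t for all three shapes
    print(f"{name:32s} ln Lmax = {ln_Lmax(t):.3f}")
```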
Now for the likelihood

When the parameter changes from λ to τ = 1/λ:

(a') L does not change:
dn/dt = (1/τ) exp(-t/τ), and so L(τ; t) = L(λ = 1/τ; t),
because identical numbers occur in evaluating the two L's.

BUT

(b') ∫_0^∞ L(λ; t) dλ ≠ ∫_0^∞ L(τ; t) dτ

So it is NOT meaningful to integrate L.
(However,………)
CONCLUSION:

Requiring ∫_{pl}^{pu} L dp = α (e.g. 0.90) is NOT a recognised statistical procedure.
[Metric dependent: the τ range agrees with τ_pred, while the λ range is inconsistent with 1/τ_pred.]

BUT
1) Could regard it as a "black box"
2) Make it respectable by turning L into a Bayes' posterior:
   Posterior(λ) ~ L(λ) * Prior(λ)
   [and Prior(λ) can be constant]
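A small numerical sketch of (b'), with assumed toy data (20 exponential times, true τ = 2): normalising L and quoting a central 90% range gives different answers depending on whether the integration is done in λ or in τ.

```python
import numpy as np

rng = np.random.default_rng(3)
t = rng.exponential(scale=2.0, size=20)          # toy decay times, assumed true tau = 2
N, S = len(t), t.sum()

lam = np.linspace(1e-3, 3.0, 200001)
tau = np.linspace(1e-2, 12.0, 200001)
L_lam = lam**N * np.exp(-lam * S)                # L as a function of lambda
L_tau = tau**(-N) * np.exp(-S / tau)             # the same L, written in tau = 1/lambda

def central_90(x, L):
    """Central 90% range obtained by (wrongly) treating normalised L as a pdf in x."""
    cdf = np.cumsum(L); cdf /= cdf[-1]
    return np.interp([0.05, 0.95], cdf, x)

lam_lo, lam_hi = central_90(lam, L_lam)
tau_lo, tau_hi = central_90(tau, L_tau)
print(f"from L(lambda): lambda in [{lam_lo:.3f}, {lam_hi:.3f}] -> tau in [{1/lam_hi:.2f}, {1/lam_lo:.2f}]")
print(f"from L(tau):    tau    in [{tau_lo:.2f}, {tau_hi:.2f}]   (not the same interval)")
```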
P(Data; Theory) ≠ P(Theory; Data)

Theory = male or female
Data = pregnant or not pregnant

P(pregnant; female) ~ 3%
but
P(female; pregnant) >>> 3%
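The inversion follows from Bayes' theorem; a tiny worked sketch, where the 3% is from the slide and the population fractions are illustrative assumptions:

```python
# P(female | pregnant) = P(pregnant | female) P(female) / P(pregnant)
p_preg_given_female = 0.03     # from the slide
p_female = 0.5                 # assumed prior
p_preg_given_male = 0.0        # assumed
p_preg = p_preg_given_female * p_female + p_preg_given_male * (1 - p_female)
print(p_preg_given_female * p_female / p_preg)   # 1.0 -- vastly bigger than 3%
```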
P(Data; Theory) ≠ P(Theory; Data)

HIGGS SEARCH at CERN: is the data consistent with the Standard Model, or with the Standard Model + Higgs?

End of Sept 2000: the data were not very consistent with the S.M.
Prob(Data; S.M.) < 1% — a valid frequentist statement.

Turned by the press into: Prob(S.M.; Data) < 1%, and therefore Prob(Higgs; Data) > 99%, i.e. "It is almost certain that the Higgs has been seen."
p-value ≠ Prob of hypothesis being correct
After Conference Banquet speech:
“Of those results that have been quoted as
significant at the 99% level, about half
have turned out to be wrong!”
Supposed to be funny, but in fact is perfectly OK
PARADOX

Histogram with 100 bins; fit 1 parameter.
Smin: χ² with NDF = 99 (expected χ² = 99 ± 14).

For our data, Smin(p0) = 90.
Is p1 acceptable if S(p1) = 115?

1) YES: 115 is a very acceptable χ² probability for 99 degrees of freedom.
2) NO: σp is defined by S(p0 + σp) = Smin + 1 = 91, but S(p1) - S(p0) = 25, so p1 is 5σ away from the best value.
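A quick check of both statements, using the slide's numbers (a sketch with scipy):

```python
from scipy.stats import chi2, norm

ndf, s_min, s_p1 = 99, 90, 115

# 1) Goodness of fit at p1: perfectly acceptable
print(f"P(chi2 > {s_p1}) for {ndf} dof = {chi2.sf(s_p1, ndf):.3f}")     # ~0.13

# 2) Parameter test: Delta S = 25 corresponds to 5 sigma for one parameter
delta = s_p1 - s_min
print(f"sqrt(Delta S) = {delta**0.5:.1f} sigma, "
      f"two-sided p-value = {2 * norm.sf(delta**0.5):.1e}")
```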
Comparing data with different hypotheses
χ² with ν degrees of freedom?

1) ν = (number of data points) - (free parameters)?
Why is this only asymptotic (apart from Poisson → Gaussian)? Examples:

a) Fit a flattish histogram with y = N {1 + 10⁻⁶ cos(x - x0)}, x0 = free parameter.

b) Neutrino oscillations: almost degenerate parameters.
   y ~ 1 - A sin²(1.27 Δm² L/E)        (2 parameters)
   For small Δm² this becomes
   y ~ 1 - A (1.27 Δm² L/E)²           (effectively 1 parameter, since only the combination A (Δm²)² matters)
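A sketch of case a) as a toy Monte Carlo (bin contents, amplitude, grids and toy count are my assumptions): because the 10⁻⁶ cosine term has essentially no effect, fitting x0 does not reduce Smin, so the effective number of degrees of freedom stays near 99 rather than the naive 98.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
nbins, mu = 100, 100.0
x = np.arange(nbins) * 2 * np.pi / nbins          # assumed bin centres

def s_of(N, x0, data):
    pred = N * (1 + 1e-6 * np.cos(x - x0))
    return np.sum((data - pred) ** 2 / pred)       # Pearson chi2

s_min = []
for _ in range(1000):
    data = rng.poisson(mu, nbins).astype(float)
    # minimise over N for a grid of x0; the 1e-6 amplitude makes x0 irrelevant
    best = min(minimize_scalar(lambda N: s_of(N, x0, data),
                               bounds=(50, 150), method="bounded").fun
               for x0 in np.linspace(0, 2 * np.pi, 8, endpoint=False))
    s_min.append(best)

print(f"mean Smin = {np.mean(s_min):.1f}  (naive ndf = 98, effective ndf ~ 99)")
```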
χ² with ν degrees of freedom?

2) Is the difference in χ² distributed as χ²?

H0 is true. Also fit with H1, which has k extra parameters,
e.g. look for a Gaussian peak on top of a smooth background:
y = C(x) + A exp{-0.5 ((x - x0)/σ)²}

Is χ²(H0) - χ²(H1) distributed as χ² with ν = k = 3?

This is relevant for assessing whether an enhancement in the data is just a statistical fluctuation, or something more interesting.

N.B. Under H0 (y = C(x)): A = 0 is on the boundary of the physical region, and x0 and σ are undefined.
Is the difference in χ² distributed as χ²?

So we need to determine the Δχ² distribution by Monte Carlo.

N.B.
1) Determining Δχ² for hypothesis H1 when the data are generated according to H0 is not trivial, because there will be lots of local minima.
2) If we are interested in a 5σ significance level, this needs lots of MC simulations.
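A sketch of such a toy study (my own simplification, not the talk's): assume a flat background C(x) = b, tame the local-minima problem with a coarse grid scan over (x0, σ) and a crude linear estimate of A at each grid point, and compare the Δχ² tail with the naive χ²(3) expectation.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
nbins, b_true, ntoys = 60, 50.0, 2000
x = np.linspace(0.0, 6.0, nbins)
x0_grid = np.linspace(0.5, 5.5, 26)              # assumed scan grid for the peak position
sig_grid = np.array([0.1, 0.2, 0.4])             # assumed scan grid for the peak width

def chisq(data, pred):
    return np.sum((data - pred) ** 2 / pred)

delta = []
for _ in range(ntoys):
    data = rng.poisson(b_true, nbins).astype(float)
    b_hat = data.mean()                           # H0 fit: flat background only
    chi2_h0 = chisq(data, np.full(nbins, b_hat))
    # H1 fit: for each (x0, sigma), estimate the amplitude A by unweighted least squares
    chi2_h1 = chi2_h0
    for x0 in x0_grid:
        for sig in sig_grid:
            g = np.exp(-0.5 * ((x - x0) / sig) ** 2)
            A = np.sum((data - b_hat) * g) / np.sum(g * g)   # crude linear estimate
            chi2_h1 = min(chi2_h1, chisq(data, np.clip(b_hat + A * g, 1e-3, None)))
    delta.append(chi2_h0 - chi2_h1)

delta = np.array(delta)
cut = chi2.ppf(0.99, df=3)                        # naive 99% point for 3 dof
print(f"naive chi2(3) says P(Delta > {cut:.1f}) = 1%;"
      f" toy MC gives {np.mean(delta > cut):.3f}")
```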
[Figure, from omitted slides on setting limits: Xobs = -2 now gives an upper limit]
Getting L wrong: the Punzi effect

Giovanni Punzi @ PHYSTAT2003: "Comments on L fits with variable resolution"

Separate two close signals, when the resolution σ varies event by event and is different for the 2 signals, e.g.

1) Signal 1 ~ 1 + cos²θ, Signal 2 ~ isotropic, and different parts of the detector give different σ
2) Mass M (or lifetime τ): different numbers of tracks → different σ_M (or σ_τ)
Punzi effect

Events are characterised by x_i and σ_i:
A events centred on x = 0
B events centred on x = 1

L(f)_wrong = Π [f G(x_i, 0, σ_i) + (1-f) G(x_i, 1, σ_i)]
L(f)_right = Π [f p(x_i, σ_i | A) + (1-f) p(x_i, σ_i | B)]

Using p(S, T) = p(S | T) * p(T):
p(x_i, σ_i | A) = p(x_i | σ_i, A) * p(σ_i | A) = G(x_i, 0, σ_i) * p(σ_i | A)

So
L(f)_right = Π [f G(x_i, 0, σ_i) p(σ_i | A) + (1-f) G(x_i, 1, σ_i) p(σ_i | B)]

If p(σ | A) = p(σ | B), then L_right = L_wrong, but NOT otherwise.
Punzi effect

Punzi's Monte Carlo for A: G(x, 0, σ_A), B: G(x, 1, σ_B), with f_A = 1/3:

                          L_wrong              L_right
σ_A      σ_B              f_A        σ_f       f_A        σ_f
1.0      1.0              0.336(3)   0.08      Same
1.0      1.1              0.374(4)   0.08      0.333(0)   0
1.0      2.0              0.645(6)   0.12      0.333(0)   0
1-2      1.5-3            0.514(7)   0.14      0.335(2)   0.03
1.0      1-2              0.482(9)   0.09      0.333(0)   0

1) L_wrong is OK for p(σ|A) = p(σ|B), but otherwise BIASSED
2) L_right is unbiassed, but L_wrong is biassed (enormously)!
3) L_right gives smaller σ_f than L_wrong
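A minimal toy reproduction of the effect (my own sketch, not Punzi's code), assuming σ = 1 for every A event and σ = 2 for every B event, with a simple grid scan over f:

```python
import numpy as np

rng = np.random.default_rng(5)
n_events, f_true = 10000, 1.0 / 3.0
sigma_A, sigma_B = 1.0, 2.0                     # p(sigma|A) != p(sigma|B): the dangerous case

# Generate the mixture: A centred at 0 with sigma_A, B centred at 1 with sigma_B
is_A = rng.random(n_events) < f_true
x = np.where(is_A, rng.normal(0.0, sigma_A, n_events),
                   rng.normal(1.0, sigma_B, n_events))
sig = np.where(is_A, sigma_A, sigma_B)

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

# Here p(sigma|A) and p(sigma|B) are delta functions: 1 if sigma matches, else 0
pA = (sig == sigma_A).astype(float)
pB = (sig == sigma_B).astype(float)

fs = np.linspace(0.01, 0.99, 981)
def fit(weight_A, weight_B):
    lnL = [np.sum(np.log(f * gauss(x, 0, sig) * weight_A
                         + (1 - f) * gauss(x, 1, sig) * weight_B)) for f in fs]
    return fs[int(np.argmax(lnL))]

print(f"true f_A = {f_true:.3f}")
print(f"L_wrong (ignores p(sigma))  -> f_A = {fit(1.0, 1.0):.3f}")   # biassed upward
print(f"L_right (includes p(sigma)) -> f_A = {fit(pA, pB):.3f}")     # ~ 1/3
```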
Explanation of the Punzi bias

σ_A = 1, σ_B = 2 (A events have σ = 1, B events have σ = 2).

[Figure: the actual distribution in x versus the fitting function; N_A/N_B is variable, but the same resolution treatment is used for A and B events]

The fit gives an upward bias for N_A/N_B because (i) that is much better for the A events, and (ii) it does not hurt too much for the B events.
Another scenario for the Punzi problem: particle identification (PID)

[Figure: mass M versus time-of-flight (TOF), with π and K bands and regions A and B]

Originally: the positions of the peaks are constant, but σ_i is variable, with (σ_i)_A ≠ (σ_i)_B.
Here: σ_i ~ constant, but the K peak approaches the π peak at large momentum, and the momentum spectra differ (p_K ≠ p_π).

COMMON FEATURE: Separation / Error ≠ constant

Where else??

MORAL: beware of event-by-event variables whose pdfs do not appear in L.
Avoiding the Punzi bias

• Include p(σ|A) and p(σ|B) in the fit
  (But then, for example, particle identification may be determined more by the momentum distribution than by the PID variable.)

OR

• Fit each range of σ_i separately, and add: Σ_i (N_A)_i → (N_A)_total, and similarly for B.
  The incorrect method using L_wrong uses a weighted average of the (f_A)_j, assumed to be independent of j.

Talk by Catastini at PHYSTAT05
BLIND ANALYSES

Why blind analysis? Selections, corrections, method.

Methods of blinding:
• Add a random number to the result *
• Study the procedure with simulation only
• Look at only a first fraction of the data
• Keep the signal box closed
• Keep MC parameters hidden
• Keep the fraction visible for each bin hidden

After the analysis is unblinded, ……..

* Luis Alvarez suggestion re the "discovery" of free quarks
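A minimal sketch of the first method (a hidden random offset, in the spirit of the Alvarez suggestion), with illustrative names and numbers; the offset is only removed once the analysis is frozen:

```python
import numpy as np

# Hidden offset, fixed by a seed kept in a sealed envelope / by a third party
_blind_rng = np.random.default_rng(20060615)           # assumed seed
_offset = _blind_rng.normal(0.0, 0.5)                   # hidden shift, arbitrary scale

def blind(measured_value):
    """What the analysts see while developing the analysis."""
    return measured_value + _offset

def unblind(blinded_value):
    """Applied once, after all selections and corrections are frozen."""
    return blinded_value - _offset

tau_blind = blind(1.532)       # analysts quote and debug this number
print("blinded result:", round(tau_blind, 3))
print("true result   :", round(unblind(tau_blind), 3))
```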
Conclusions

• Common problems, with scope for learning from each other:
  Large data sets
  Separating signal
  Testing models
  Signal / discovery

• Targeted workshops, e.g.
  SAMSI: Jan-May 2006
  Banff: July 2006 (limits with nuisance parameters; significance tests; signal-background separation)

• Summer schools: Spain, FNAL, ……

Thanks to Stefen Lauritzen, Lance Miller, Subir Sarkar, SAMSI, Roberto Trotta, ………
• An excellent conference: a very big THANK YOU to Jogesh and Eric!