23 Jan 2012
Paul Dauncey
• David Futyan presented the sideband approach at the last H→γγ meeting
• https://indico.cern.ch/conferenceDisplay.py?confId=169763
• Data-driven method used to estimate the background shape for the Higgs limit, using a 2D BDT including mass
• Today, show an update of the method
[Figure: schematic comparing the mass-factorised "kinematic" BDT (slice in BDT, fit to the γγ mass) with the 2D BDT (select $|\Delta M/M_H| < 2\%$ around the signal, fit to the BDT output shape)]
[Figure: BDT output distributions for background and signal]
• For this particular example, with $M_H = 120$ GeV, the BDT has 7 output bins
• Limit extracted by fitting the observed shape in data to background + signal
• Critical to have an accurate estimate of the background shape in data
  • Also need a robust estimate of the shape errors, including bin-to-bin correlations
• The error matrix of the background shape gives the nuisance parameters in the Higgs limit fit
• Create several sideband windows, all with $|\Delta M/M_H| < 2\%$
• Three on either side of the signal region were used for David's results last week
• Zeroth-order approximation: the shape in the sum of the sidebands is approximately the same as the shape in the signal window
• Sidebands give a good estimate of the shape of the background in the signal region
  • Need a procedure to quantify this statement
• Whole mass range from 100 to 180 GeV divided into windows of the same size in $\Delta M/M_H$
  • Allows 15 windows in total
• Due to careful construction of the BDT input variables, the fractions per bin are almost independent of the central mass of the window
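A quick way to see where "15 windows" comes from is to lay out windows of half-width 2% edge to edge, one centred on the Higgs mass. This is a minimal sketch under assumptions not stated on the slides: windows touch with no gap, and a window counts if its centre lies inside 100 to 180 GeV; `window_centres` is a hypothetical helper.

```python
import math


def window_centres(m_h, half_width=0.02, m_min=100.0, m_max=180.0):
    """Centres of contiguous mass windows of equal size in dM/M.

    Assumptions (not from the slides): windows tile the mass axis edge to
    edge, one window is centred on m_h, and a window is kept when its
    centre lies inside [m_min, m_max].
    """
    # Upper edge of window k equals lower edge of window k+1:
    # c_k * (1 + w) = c_{k+1} * (1 - w), so the centres grow geometrically.
    ratio = (1.0 + half_width) / (1.0 - half_width)
    step = math.log(ratio)
    k_lo = math.ceil(math.log(m_min / m_h) / step)
    k_hi = math.floor(math.log(m_max / m_h) / step)
    return [m_h * ratio ** k for k in range(k_lo, k_hi + 1)]


centres = window_centres(120.0)
print(len(centres))  # 15
```

With $M_H = 120$ GeV the signal window is `centres[4]`, with four windows below it and ten above.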
[Figure: fraction in each BDT bin vs window mass with linear fits, one panel per bin (Bin 0 to Bin 6)]
• Some residual mass dependence of the fractions per bin is seen
• Mainly (but possibly not entirely) due to the background composition changing with mass
  • Small effect; a first-order correction is applied to each sideband using the linear dependence from the fits above
• This method was used by David for the results shown last week
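The first-order correction can be sketched as follows: shift each sideband's per-bin fraction along the fitted line to the Higgs mass, then combine the sidebands weighted by their event counts. This is a minimal illustration, assuming the per-bin slopes are already known; `corrected_sideband_shape` and its arguments are hypothetical, not the analysis code.

```python
def corrected_sideband_shape(counts, centres, slopes, m_h):
    """Estimate BDT-bin fractions in the signal window from sidebands.

    counts[n][b]: events in BDT bin b of sideband window n
    centres[n]:   central mass of sideband window n (GeV)
    slopes[b]:    d(fraction)/dm for BDT bin b, e.g. from the 15-window
                  linear fits (hypothetical input here)
    Each sideband's fractions are shifted linearly to m_h, then the
    sidebands are combined weighted by their event counts.
    """
    n_bins = len(slopes)
    total = sum(sum(row) for row in counts)
    shape = [0.0] * n_bins
    for row, m in zip(counts, centres):
        n_window = sum(row)
        for b in range(n_bins):
            corrected = row[b] / n_window + slopes[b] * (m_h - m)
            shape[b] += n_window * corrected / total
    return shape


shape = corrected_sideband_shape([[30, 70], [50, 50]], [110.0, 130.0],
                                 [0.001, -0.001], 120.0)
print(shape)  # approximately [0.40, 0.60]
```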
• For a given Higgs mass, the same data were used twice
  • Once in the 15-window linear fit, and again in the sidebands
  • Only 6 of the 15 windows are used for sidebands, so the correlation is weak
  • A Higgs signal might distort the 15-window linear fits
  • David showed last week that this is a tiny effect
• The 15-window linear fits were done independently for each BDT bin
  • No fit constraint on the fractions having to sum to one
  • Scaling correction applied afterwards
• The 15-window linear fits were done with least squares
  • Assuming an error of √N, with N the expected value for each bin
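The previous procedure (an independent straight-line fit per BDT bin, then a rescaling so the fractions sum to one) can be sketched with a closed-form weighted least-squares fit. This is an illustration only, assuming the errors are supplied directly; the √N-of-expected errors described above would need an iteration that is not shown. Both helper names are hypothetical.

```python
def weighted_line_fit(x, y, sigma):
    """Closed-form weighted least-squares fit of y = a + b * x.

    sigma[i] is the error on y[i]. Using x = m - M_H makes the intercept a
    the fraction at the Higgs mass.
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (sw * sxy - sx * sy) / det
    return a, b


def rescale_to_unit_sum(fractions):
    """Scaling correction: force independently fitted fractions to sum to 1."""
    total = sum(fractions)
    return [f / total for f in fractions]


a, b = weighted_line_fit([-20.0, 0.0, 20.0], [0.06, 0.10, 0.14], [0.01, 0.02, 0.01])
print(rescale_to_unit_sum([0.2, 0.3, 0.45]))
```

Because each bin is fitted on its own, the intercepts need not sum to one, which is why the rescaling step was needed.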
• Today, describe a modified method for the sidebands
• Streamlines the previous procedure; all done in one fit
• Statistically robust, with no re-use of the same data
  • Allows accurate extraction of errors to use as nuisance parameters
• Want to fit the fraction in each BDT bin, for each mass window
• Want these fractions to have a linear dependence on mass
  • Fraction $f_{bn} = p_{b0} + p_{b1}(m_n - m_0)$ for BDT bin $b$, with $m_n$ the mass at the centre of window $n$
  • Constant $m_0$ can be any convenient value; take it as the Higgs mass
• With this choice, the $p_{b0}$ give the fractions in the signal window directly
• Fractions must sum to 1 over BDT bins, for every mass window
  • $\sum_b f_{bn} = 1 = \sum_b p_{b0} + (m_n - m_0)\sum_b p_{b1}$
  • Only possible for all $m_n - m_0$ if $\sum_b p_{b0} = 1$ and $\sum_b p_{b1} = 0$
  • Force the constraint by setting $p_{00} = 1 - \sum_{b\neq0} p_{b0}$ and $p_{01} = -\sum_{b\neq0} p_{b1}$
  • $7 - 1 = 6$ each of $p_{b0}$ and $p_{b1}$ parameters for $b \neq 0$ → 12 parameters
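The constrained parametrisation can be written as a small function that takes only the 12 free parameters and rebuilds bin 0 from the sum rules $p_{00} = 1 - \sum_{b\neq0} p_{b0}$ and $p_{01} = -\sum_{b\neq0} p_{b1}$. This is a minimal sketch; `bin_fractions` is a hypothetical name and the parameter values are made up for illustration.

```python
def bin_fractions(free_p0, free_p1, m, m0):
    """Fractions f_b(m) = p_b0 + p_b1 * (m - m0) for all BDT bins.

    free_p0, free_p1: the fitted p_b0 and p_b1 for b = 1..6 (12 numbers).
    Bin 0 is not free: p_00 = 1 - sum(p_b0) and p_01 = -sum(p_b1), so the
    fractions sum to one at every mass m.
    """
    p0 = [1.0 - sum(free_p0)] + list(free_p0)
    p1 = [-sum(free_p1)] + list(free_p1)
    return [a + s * (m - m0) for a, s in zip(p0, p1)]


f = bin_fractions([0.1] * 6, [0.001, -0.002, 0.0, 0.0005, 0.0, 0.0], 150.0, 120.0)
print(sum(f))  # 1.0 up to rounding
```

At $m = m_0$ the returned list is just the $p_{b0}$, i.e. the signal-window fractions.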
• 2D fit to sideband windows × BDT output bins simultaneously
• Normalisations of each sideband window used are free parameters
  • One normalisation parameter per sideband
  • Makes NO assumption (Pol5, Pow2, etc.) on the shape of the mass spectrum
• Fit with log-likelihood (LL) using the full Poisson likelihood for each data bin
  • Correct even for low-occupancy bins
• Binned LL, so gives an effective $\chi^2$ goodness-of-fit measure
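One common way to turn a binned Poisson LL into an effective $\chi^2$ is the likelihood ratio to the saturated model, $-2\ln(L/L_\mathrm{sat}) = 2\sum_b [\mu_b - n_b + n_b\ln(n_b/\mu_b)]$. The sketch below assumes that form (the slides do not say which variant was used) and evaluates it for one window with its normalisation at the maximum-likelihood value; the full fit would sum this over windows. `window_poisson_chi2` is hypothetical.

```python
import math


def window_poisson_chi2(counts, fractions):
    """-2 ln(L / L_saturated) for one mass window (binned Poisson).

    counts[b]:    observed events in BDT bin b
    fractions[b]: model fractions, assumed positive and summing to one
    The window normalisation is set to its maximum-likelihood value; with
    fractions summing to one this is simply the observed total.
    """
    norm = sum(counts)
    chi2 = 0.0
    for n, f in zip(counts, fractions):
        mu = norm * f
        chi2 += 2.0 * (mu - n)
        if n > 0:  # n ln(n / mu) -> 0 as n -> 0, so empty bins are handled
            chi2 += 2.0 * n * math.log(n / mu)
    return chi2


print(window_poisson_chi2([0, 10], [0.5, 0.5]))  # 20 ln 2, about 13.86
```

The empty bin in the example contributes only its $2\mu_b$ term, which is why this form stays valid for low-occupancy bins where a Gaussian $\chi^2$ would not.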
• NDoF count depends on the number of windows $N_W$
• 12 fraction parameters + $N_W$ normalisation parameters
• Fit to $7N_W$ data values, so NDoF $= 6N_W - 12$
• E.g. for 3 sideband windows on either side, as used previously
  • $N_W = 6$, so NDoF $= 24$
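The degree-of-freedom counting is simple arithmetic; as a check, the sketch below (hypothetical helper name) reproduces both numbers quoted in this talk.

```python
def sideband_fit_ndof(n_windows, n_bdt_bins=7):
    """NDoF of the 2D sideband fit: data values minus free parameters."""
    data_values = n_bdt_bins * n_windows
    fraction_params = 2 * (n_bdt_bins - 1)  # p_b0 and p_b1 for b != 0
    norm_params = n_windows                 # one normalisation per window
    return data_values - fraction_params - norm_params


print(sideband_fit_ndof(6), sideband_fit_ndof(15))  # 24 78
```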
• Consistency fit to all $N_W = 15$ windows
  • Equivalent to that done by David previously
• Fit gives $\chi^2/\mathrm{NDoF} = 83.75/78$, probability = 30.8%
  • The linear assumption is reasonable, even over 15 windows
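The quoted probability is the upper tail of the $\chi^2$ distribution. In practice one would call a library routine (e.g. ROOT's `TMath::Prob`); the pure-Python sketch below instead uses the standard series for the regularised incomplete gamma function so it has no dependencies. `chi2_probability` is a hypothetical helper.

```python
import math


def chi2_probability(chi2, ndof):
    """P(chi^2 >= chi2) for ndof degrees of freedom (requires chi2 > 0).

    Uses the series for the regularised lower incomplete gamma function
    P(a, z) with a = ndof / 2 and z = chi2 / 2, and returns 1 - P(a, z).
    """
    a, z = 0.5 * ndof, 0.5 * chi2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return 1.0 - lower


print(round(chi2_probability(83.75, 78), 3))  # compare with the 30.8% quoted above
```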
[Figure: fitted fraction vs window mass for each BDT bin, panels Bin 0 to Bin 6]
• Results effectively identical to the earlier per-bin 15-window linear fits
• The lack of a fraction-sum constraint and the √N errors used previously were good approximations
• The actual fit used for the limit shapes has 3 sideband windows on either side of the signal region
  • Assumes linearity over a mass range equivalent to 9.5 of the 15 windows
  • Seems a good assumption, given that the fit to all 15 windows looks reasonable
[Figure: e.g. fractions in BDT bin 1, showing the 21 low-mass and 21 high-mass sideband bins and the $M_H$ window; linear and log y-axis panels comparing the 1σ fit band with √N errors]
• Fit gives a direct estimate of the fraction at $M_H$
• Errors from the fit are always smaller than the Poisson √N errors of the bin contents (used for the limit fit)
  • Worst case: fit error ~ 1/3 of the √N error
• Fit errors checked against toys; agree within 10%
  • Error estimate robust
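The idea of a toy check is to generate many pseudo-datasets, refit each, and compare the spread of the fitted values with the errors the fit reports. The sketch below shows only that idea on a deliberately simple stand-in "fit" (a single fraction estimated as $k/N$ with a binomial error); it is not the sideband fit itself, and all names and numbers are hypothetical.

```python
import math
import random


def toy_error_ratio(f_true=0.2, n_events=400, n_toys=2000, seed=1):
    """Compare the spread of a fitted fraction over toys with its quoted error.

    Hypothetical stand-in for the real fit: the 'fit' here is just
    f_hat = k / n_events, with quoted error sqrt(f_hat (1 - f_hat) / n_events).
    Returns (toy RMS) / (mean quoted error); a value near 1 means the
    quoted errors describe the true scatter.
    """
    rng = random.Random(seed)
    estimates, quoted = [], []
    for _ in range(n_toys):
        # One toy: each event lands in the bin with probability f_true
        k = sum(1 for _ in range(n_events) if rng.random() < f_true)
        f_hat = k / n_events
        estimates.append(f_hat)
        quoted.append(math.sqrt(f_hat * (1.0 - f_hat) / n_events))
    mean = sum(estimates) / n_toys
    rms = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_toys - 1))
    return rms / (sum(quoted) / n_toys)


print(toy_error_ratio())  # expected to be close to 1
```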
• Critical point: there is only ONE assumption in this whole method
• BDT output fractions are assumed to be linear over the fit range of the sidebands
• Looks like a perfectly sensible assumption, even over the whole mass range
  • Working on a method to estimate a systematic associated with this assumption; will report on this at a later meeting
• Suspect the dependence is mainly driven by the change of background composition with mass
• I.e. Born/box vs QCD prompt-prompt vs QCD prompt-fake vs QCD fake-fake...
• Cannot check this accurately due to lack of MC statistics
• With MC samples of ×10 the data statistics, could even predict the linear dependence and check for consistency
  • If agreement were found, the systematic could then be based on a physical quantity, i.e. the degree of uncertainty in the relative background contributions
[Figure: Higgs expected limits for the Baseline, Previous and New methods]
• Higgs expected limits (from Nick)
  • Including the full correlation matrix of nuisance parameters from the fit
  • Code for this implemented in h2gglobe
• Very little difference; the approximations in the previous study ARE good
• For both, the only systematic not included is the one for the linear assumption
  • Minor difference at high mass seems to be due to adding the constraint on the fractions; one fewer DoF in the new method
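Since the fit returns only the free $p_{b0}$ ($b \neq 0$), the covariance for all seven signal-window fractions has to include the derived $p_{00} = 1 - \sum_{b\neq0} p_{b0}$. The sketch below shows the standard Jacobian propagation $C_\mathrm{full} = J C J^T$ under that assumption; only the $p_{b0}$ enter because the fractions at $m = m_0$ do not depend on the slopes. `full_fraction_covariance` is hypothetical, not the h2gglobe code.

```python
def full_fraction_covariance(cov_free):
    """Covariance of all signal-window fractions p_b0, b = 0..n.

    cov_free: covariance matrix of the free p_b0 (b = 1..n) from the fit.
    Since p_00 = 1 - sum_{b != 0} p_b0, propagate with the Jacobian J,
    whose first row is all -1 and whose remaining rows are the identity.
    """
    n = len(cov_free)
    jac = [[-1.0] * n] + [[1.0 if i == j else 0.0 for j in range(n)]
                          for i in range(n)]
    return [[sum(jac[r][i] * cov_free[i][j] * jac[c][j]
                 for i in range(n) for j in range(n))
             for c in range(n + 1)]
            for r in range(n + 1)]


cov = full_fraction_covariance([[0.01, 0.002], [0.002, 0.04]])
print([round(sum(row), 12) for row in cov])  # each row sums to zero
```

Every row summing to zero reflects that the fractions always add up to one, so the full matrix is singular and should be handled as correlated nuisance parameters rather than inverted directly.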
• Streamlined method for handling the sidebands
  • Previous (accurate) approximations no longer needed
  • Statistically robust; error matrix estimate accurate
• Results effectively identical
• Nuisance parameters from the shape for the limit fit are not large compared with the expected data statistics
• They arise directly from the statistics of the sidebands, so will scale in the same way with luminosity
• Only one (apparently very reasonable) assumption in the whole method
  • Systematic due to this still to be evaluated
  • Given the simple assumption, expected to be small and well controlled
• No assumption on (and hence no systematic from) the shape of the mass spectrum