23 Jan 2012
Paul Dauncey 1
David Futyan presented the sideband approach at the last H→γγ meeting
• https://indico.cern.ch/conferenceDisplay.py?confId=169763
Data-driven method used to estimate background shape for Higgs limit, using 2D BDT including mass
• Today, show update of the method
[Figure: schematic of the two approaches against γγ mass. Labels: "Slice in BDT, fit to mass"; "Fit to BDT output shape, $|\Delta M/M| < 2\%$ around signal"]
For this particular example with $M_H = 120$ GeV, BDT has 7 output bins
Limit extracted from fitting observed shape in data to background + signal
Critical to have accurate estimate of background shape in data
• Also need robust estimate of shape errors, including bin-to-bin correlations
Error matrix of background shape gives nuisance parameters in Higgs limit fit
Create several sideband windows, all with $|\Delta M/M| < 2\%$
Three either side of signal region used for David's results last week
Zeroth order approximation: shape in sum of sidebands ~ same as shape in signal window
Sidebands give good estimate of shape of background in signal region
• Need procedure to quantify this statement
Whole mass range from 100 to 180 GeV divided into windows of same size in $\Delta M/M$
• Allows 15 windows in total
Due to careful construction of BDT input variables
Fractions per bin are almost independent of central mass of any window
[Figure: fraction per BDT bin vs central mass of window, with linear fits; panels Bin 0 to Bin 6]
Some residual mass dependence of the fractions per bin is seen
Mainly (but possibly not entirely) due to background composition changing with mass
• Small effect; first order correction applied to each sideband by using linear dependence from above fits
This method was used by David for results shown last week
23 Jan 2012 Paul Dauncey 6
For a given Higgs mass, same data were used twice
Once in 15-window linear fit, again in sidebands
Only 6 of the 15 windows are used for sidebands, so the correlation is weak
• A Higgs signal might distort the 15-window linear fits
• David showed this is a tiny effect last week
The 15-window linear fits were done independently for each BDT bin
• No fit constraint on fractions having to sum to one
Scaling correction applied afterwards
The 15-window linear fits were done with least-squares
Assuming error = $\sqrt{N}$ of expected value for each bin
Today, describe modified method for sidebands
Streamlines previous procedure; all done in one fit
Statistically robust with no re-use of same data
• Allows accurate extraction of errors to use as nuisance parameters
Want to fit to fraction in each BDT bin, for each mass window
Want these fractions to have linear dependence on mass
• Fraction $f_{bn} = p_{b0} + p_{b1}(m_n - m_0)$, for BDT bin $b$, mass $m_n$ centred in window $n$
• Constant $m_0$ can be any convenient value; take as Higgs mass
With this choice, the $p_{b0}$ give the fractions in the signal window directly
Fractions must sum to 1 over BDT bins, for every mass window
$\sum_b f_{bn} = 1 = \sum_b p_{b0} + (m_n - m_0) \sum_b p_{b1}$
Only possible for all $m_n - m_0$ if $\sum_b p_{b0} = 1$ and $\sum_b p_{b1} = 0$
Force constraint by setting $p_{00} = 1 - \sum_{b \neq 0} p_{b0}$ and $p_{01} = -\sum_{b \neq 0} p_{b1}$
$7 - 1 = 6$ each of $p_{b0}$ and $p_{b1}$ parameters for $b \neq 0$ → 12 parameters
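The constrained parameterisation on this slide can be sketched in a few lines. This is only an illustration: the function name `bin_fractions` and all numerical parameter values are made up for the example and are not taken from the analysis code.

```python
def bin_fractions(p0_free, p1_free, m_n, m_0):
    """Return the BDT-bin fractions f_bn for a window centred at mass m_n.

    p0_free, p1_free: intercepts and slopes for bins b = 1..6 (free parameters).
    Bin 0 is not free: it is fixed by sum_b p_b0 = 1 and sum_b p_b1 = 0.
    """
    p00 = 1.0 - sum(p0_free)   # intercept of bin 0 from the sum-to-one constraint
    p01 = -sum(p1_free)        # slope of bin 0 so that the slopes sum to zero
    p0 = [p00] + list(p0_free)
    p1 = [p01] + list(p1_free)
    dm = m_n - m_0
    return [a + b * dm for a, b in zip(p0, p1)]


# Hypothetical parameter values, for illustration only
P0_FREE = [0.20, 0.18, 0.15, 0.12, 0.10, 0.05]
P1_FREE = [1e-3, -5e-4, 2e-4, 0.0, -3e-4, 1e-4]
M_0 = 120.0  # constant m_0, taken as the Higgs mass hypothesis

for m_n in (105.0, 120.0, 140.0):
    fractions = bin_fractions(P0_FREE, P1_FREE, m_n, M_0)
    # Fractions sum to one for every window, whatever m_n is
    assert abs(sum(fractions) - 1.0) < 1e-12
```

At `m_n == M_0` the returned fractions for bins 1 to 6 are just the `p0_free` values, which is why choosing $m_0$ as the Higgs mass gives the signal-window fractions directly.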
2D fit to sideband windows × BDT output bins simultaneously
Normalisations of each sideband window used are free parameters
One normalisation parameter per sideband
• Makes NO assumption (Pol5, Pow2, etc) on mass spectrum shape
Fit with LL using full Poisson likelihood for each data bin
• Correct even for low occupancy bins
Binned LL so gives effective $\chi^2$ goodness-of-fit measure
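A minimal sketch of the quantities involved, assuming the expected content of BDT bin $b$ in window $n$ is the window normalisation times the fraction $f_{bn}$. The effective $\chi^2$ here is the standard Poisson likelihood-ratio form (fitted model against the saturated model); the slides do not say which form was used, so treat that choice, and all function names, as assumptions.

```python
import math


def expected_counts(normalisation, fractions):
    """Expected events per BDT bin in one window: N_n * f_bn."""
    return [normalisation * f for f in fractions]


def poisson_nll(observed, expected):
    """-ln L for independent Poisson bins, including the ln(n!) term."""
    return sum(mu - n * math.log(mu) + math.lgamma(n + 1.0)
               for n, mu in zip(observed, expected))


def effective_chi2(observed, expected):
    """Likelihood-ratio chi2: 2 * sum(mu - n + n * ln(n / mu)).

    Bins with n = 0 contribute 2 * mu, so low occupancy bins are handled
    without any Gaussian approximation.
    """
    total = 0.0
    for n, mu in zip(observed, expected):
        total += mu - n
        if n > 0:
            total += n * math.log(n / mu)
    return 2.0 * total
```

In a full fit, `poisson_nll` summed over all sideband windows would be minimised with respect to the 12 fraction parameters and the window normalisations; `effective_chi2` evaluated at the minimum is then compared with NDoF.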
NDoF count depends on number of windows $N_w$
12 fraction parameters + $N_w$ normalisation parameters
Fit to $7 \times N_w$ data values, so NDoF = $7N_w - 12 - N_w = 6N_w - 12$
E.g. for 3 sideband windows either side, as used previously, $N_w = 6$ so NDoF = 24
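The degree-of-freedom counting can be checked with a short calculation. The function name `n_dof` is just for this sketch; the counting itself (7 data values per window, 12 fraction parameters, one normalisation per window) is as stated on the slide.

```python
N_BDT_BINS = 7


def n_dof(n_windows, n_bins=N_BDT_BINS):
    """Data values minus fit parameters for the 2D sideband fit."""
    n_data = n_bins * n_windows          # one count per BDT bin per window
    n_fraction_pars = 2 * (n_bins - 1)   # p_b0 and p_b1 for b != 0
    n_norm_pars = n_windows              # one free normalisation per window
    return n_data - n_fraction_pars - n_norm_pars


assert n_dof(6) == 24    # 3 sideband windows either side of the signal region
assert n_dof(15) == 78   # consistency fit to all 15 windows
```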
Consistency fit to all $N_w = 15$ windows
Equivalent to that done by David previously
Fit gives $\chi^2$/NDoF = 83.75/78, probability = 30.8%
Linear assumption is reasonable, even over 15 windows
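As a rough cross-check of the quoted probability, the upper-tail $\chi^2$ probability can be approximated with the Wilson-Hilferty transformation using only the standard library. This is an approximation chosen to keep the sketch dependency-free, not the method used to produce the number on the slide; the function name is hypothetical.

```python
import math


def chi2_prob_wilson_hilferty(chi2, ndof):
    """Approximate P(chi2_ndof > chi2) via the Wilson-Hilferty transformation.

    (chi2 / ndof)^(1/3) is treated as Gaussian with mean 1 - 2/(9 ndof)
    and variance 2/(9 ndof); the tail probability follows from erfc.
    """
    k = float(ndof)
    var = 2.0 / (9.0 * k)
    z = ((chi2 / k) ** (1.0 / 3.0) - (1.0 - var)) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


prob = chi2_prob_wilson_hilferty(83.75, 78)
# Should agree with the quoted 30.8% to well within a percentage point
assert abs(prob - 0.308) < 0.01
```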
[Figure: fitted fraction per BDT bin vs central mass of window; panels Bin 0 to Bin 6]
Results effectively identical to those shown on slide 6
Lack of fraction sum constraint and $\sqrt{N}$ errors used previously were good approximations
Actual fit used for limit shapes has 3 sideband windows either side of signal region
Assumes linearity over mass range equivalent to 9.5 of the 15 sidebands
Seems good assumption, given that fit to 15 windows looks reasonable
[Figure: e.g. fractions in BDT bin 1; 21 low mass sideband bins, 21 high mass sideband bins]
Fit gives direct estimation of fraction in the $M_H$ window
[Figure: panels with linear y and log y scales; band shows 1σ fit]
Errors from fit are always smaller than Poisson $\sqrt{N}$ errors of bin contents (used for limit fit)
• Worst case: fit error ~ 1/3 $\sqrt{N}$ error
Fit errors checked against toys; agree within 10%
• Error estimate robust
Critical point: there is only ONE assumption in this whole method
BDT output fractions are assumed to be linear over fit range of sidebands
Looks like a perfectly sensible assumption even over whole mass range
• Coming up with method to estimate a systematic associated with this assumption; report on this in later meeting
Suspect dependence is mainly driven by change of background composition with mass
I.e. Born/box vs QCD prompt-prompt vs QCD prompt-fake vs QCD fake-fake...
Cannot accurately check due to lack of MC statistics
With MC statistics a factor of ×10 larger than data, could even predict linear dependence and check for consistency
• If found agreement, systematic could then be based on physical quantity; i.e. degree of uncertainty in relative background contributions
• Higgs expected limits (from Nick)
[Figure: expected limits; panels Previous and New]
Including full correlation matrix of nuisance parameters from fit
Code for this implemented in h2gglobe
Very little difference; approximations in previous study ARE good
For both, only the systematic for the linear assumption is not included
• Minor difference at high mass seems to be due to adding constraint on fractions; one less DoF in new method
Streamlined method for handling of sidebands
• Previous (accurate) approximations no longer needed
• Statistically robust, error matrix estimate accurate
Results effectively identical
Nuisance parameters from shape for limit fit are not large compared with expected data statistics
Arise directly from statistics of sidebands so will scale in the same way with luminosity
Only one (apparently very reasonable) assumption in the whole method
• Systematic due to this still to be evaluated
• Given simple assumption, expected to be small and well-controlled
No assumption on (and hence no systematic from) shape of mass spectrum