
23 Jan 2012

Background shape estimates using sidebands

Paul Dauncey

G. Davies, D. Futyan, J. Hays, M. Jarvis,

M. Kenzie, C. Seez, J. Virdee, N. Wardle

Imperial College London


Shapes determined from sidebands

David Futyan presented the sideband approach at the last $H \to \gamma\gamma$ meeting

• https://indico.cern.ch/conferenceDisplay.py?confId=169763

Data-driven method used to estimate background shape for Higgs limit, using 2D BDT including mass

• Today, show update of the method

[Diagram: mass-factorised "kinematic" BDT (slice in BDT, fit to $\gamma\gamma$ mass) compared with 2D BDT (select $|\Delta M/M_H| < 2\%$ around signal, fit to BDT output shape)]


BDT output shapes

[Plot: BDT output shapes for background and signal]

For this particular example with $M_H = 120$ GeV, the BDT has 7 output bins

Limit extracted from fitting observed shape in data to background + signal

Critical to have accurate estimate of background shape in data

• Also need robust estimate of shape errors, including bin-to-bin correlations

Error matrix of background shape → nuisance parameters in Higgs limit fit


Sideband windows

Create several sideband windows, all with $|\Delta M/M_H| < 2\%$

Three either side of signal region used for David’s results last week

Zeroth order approximation: shape in sum of sidebands ~ same as shape in signal window

Sidebands give good estimate of shape of background in signal region

• Need procedure to quantify this statement


Study using 15 windows

Whole mass range from 100 to 180 GeV divided into windows of same size in $\Delta M/M_H$

• Allows 15 windows in total

Due to careful construction of BDT input variables

Fractions per bin are almost independent of central mass of any window


Fraction dependence per BDT output bin

[Plots: one panel per BDT output bin, Bin 0 to Bin 6]

Some residual mass dependence of the fractions per bin is seen

Mainly (but possibly not entirely) due to background composition changing with mass

• Small effect; first order correction applied to each sideband by using linear dependence from above fits

This method was used by David for results shown last week


Method shown had some approximations

For a given Higgs mass, same data were used twice

Once in 15-window linear fit, again in sidebands

Only 6 of the 15 windows used for sidebands, so a weak correlation

• A Higgs signal might distort the 15-window linear fits

• David showed this is a tiny effect last week

The 15-window linear fits were done independently for each BDT bin

• No fit constraint on fractions having to sum to one

Scaling correction applied afterwards

The 15-window linear fits were done with least-squares

Assuming error = $\sqrt{N}$ of expected value for each bin
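The previous procedure summarised on this slide (independent least-squares line per BDT bin, $\sqrt{N}$ errors taken from the expected value, scaling correction afterwards) could look roughly like the sketch below. The function name, the array layout and the two-pass way of getting "expected" values are assumptions made for illustration; this is not the code used for the earlier results.

```python
import numpy as np


def independent_linear_fits(counts, masses, m0):
    """Fit each BDT bin's fraction vs mass separately, then rescale.

    counts has shape (n_windows, n_bins).  Returns the fractions
    evaluated at m0, rescaled afterwards so that they sum to one.
    """
    counts = np.asarray(counts, dtype=float)
    dm = np.asarray(masses, dtype=float) - m0
    totals = counts.sum(axis=1)
    frac = counts / totals[:, None]

    at_m0 = np.empty(counts.shape[1])
    for b in range(counts.shape[1]):
        # First pass: unweighted fit, used only to get expected values.
        slope, intercept = np.polyfit(dm, frac[:, b], 1)
        expected = np.clip(intercept + slope * dm, 1e-9, None) * totals
        # Second pass: weight each point by 1 / sigma, with sigma = sqrt(N) / total.
        sigma = np.sqrt(expected) / totals
        slope, intercept = np.polyfit(dm, frac[:, b], 1, w=1.0 / sigma)
        at_m0[b] = intercept

    # No sum-to-one constraint inside the fits, so rescale afterwards.
    return at_m0 / at_m0.sum()
```

Because every bin is fitted on its own, nothing in the fits forces the fractions to sum to one, which is why the final rescaling step is needed.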


LL fit method

Today, describe modified method for sidebands

Streamlines previous procedure; all done in one fit

Statistically robust with no re-use of same data

• Allows accurate extraction of errors to use as nuisance parameters

Want to fit to fraction in each BDT bin, for each mass window

Want these fractions to have linear dependence on mass

• Fraction $f_{bn} = p_{b0} + p_{b1}(m - m_0)$, for BDT bin $b$, mass $m$ centred in window $n$

• Constant $m_0$ can be any convenient value; take as Higgs mass

With this choice, the $p_{b0}$ give the fractions in the signal window directly

Fractions must sum to 1 over BDT bins, for every mass window

$\sum_b f_{bn} = 1 = \sum_b \left[ p_{b0} + p_{b1}(m - m_0) \right]$

Only possible for all $m - m_0$ if $\sum_b p_{b0} = 1$ and $\sum_b p_{b1} = 0$

Force constraint by setting $p_{00} = 1 - \sum_{b \neq 0} p_{b0}$ and $p_{01} = -\sum_{b \neq 0} p_{b1}$

$7 - 1 = 6$ each of $p_{b0}$ and $p_{b1}$ parameters for $b \neq 0$ → 12 parameters
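A minimal sketch of this parameterisation, assuming the 12 free parameters are stored as the six intercepts for bins 1 to 6 followed by the six slopes for bins 1 to 6. The function name, the ordering and the numerical values are illustrative assumptions, not fit results.

```python
import numpy as np

N_BINS = 7  # BDT output bins in the M_H = 120 GeV example


def fractions(free_params, masses, m0):
    """Return f[b, n] = p_b0 + p_b1 * (m_n - m0) for all bins and windows.

    free_params holds the 12 free values: p_b0 for b = 1..6 followed by
    p_b1 for b = 1..6.  Bin 0 is fixed by the sum-to-one constraint.
    """
    free_params = np.asarray(free_params, dtype=float)
    p0_free = free_params[: N_BINS - 1]
    p1_free = free_params[N_BINS - 1:]

    # Constraint: sum_b p_b0 = 1 and sum_b p_b1 = 0.
    p0 = np.concatenate(([1.0 - p0_free.sum()], p0_free))
    p1 = np.concatenate(([-p1_free.sum()], p1_free))

    dm = np.asarray(masses, dtype=float) - m0
    return p0[:, None] + p1[:, None] * dm[None, :]


# Made-up parameter values, for illustration only.
example_params = [0.20, 0.18, 0.15, 0.12, 0.10, 0.05,
                  1e-4, -2e-4, 3e-4, 0.0, -1e-4, 5e-5]
f = fractions(example_params, masses=[105.0, 120.0, 135.0], m0=120.0)
print(f.sum(axis=0))  # every window sums to 1 up to rounding
```

With the constraint built into the parameterisation, the fractions sum to one at every mass by construction, so no rescaling is needed after the fit.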


LL fit method continued

2D fit to sideband windows × BDT output bins simultaneously

Normalisations of each sideband window used are free parameters

One normalisation parameter per sideband

• Makes NO assumption (Pol5, Pow2, etc) on mass spectrum shape
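As a rough sketch, the expected content of each (window, BDT bin) cell is the window normalisation times the linear fraction, and the fit objective is the Poisson negative log-likelihood summed over all cells. The parameter ordering and function names below are assumptions for illustration, and the minimiser itself is left out.

```python
import numpy as np


def expected_counts(theta, masses, m0, n_bins=7):
    """Expected counts mu[n, b] = N_n * f_bn for the simultaneous fit.

    theta = [p_b0 for b=1..n_bins-1, p_b1 for b=1..n_bins-1, N_n per window].
    """
    theta = np.asarray(theta, dtype=float)
    k = n_bins - 1
    p0_free, p1_free, norms = theta[:k], theta[k:2 * k], theta[2 * k:]
    p0 = np.concatenate(([1.0 - p0_free.sum()], p0_free))
    p1 = np.concatenate(([-p1_free.sum()], p1_free))
    dm = np.asarray(masses, dtype=float) - m0
    frac = p0[None, :] + p1[None, :] * dm[:, None]  # shape (n_windows, n_bins)
    return norms[:, None] * frac


def nll(theta, counts, masses, m0):
    """Poisson negative log-likelihood over windows x BDT bins.

    The constant ln(n!) terms are dropped; they do not affect the minimum.
    """
    counts = np.asarray(counts, dtype=float)
    mu = expected_counts(theta, masses, m0, n_bins=counts.shape[1])
    if np.any(mu <= 0):
        return np.inf  # outside the physical region
    return float(np.sum(mu - counts * np.log(mu)))
```

Each window has its own free normalisation `N_n`, so no functional form for the mass spectrum appears anywhere in the objective.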

Fit with LL using full Poisson likelihood for each data bin

• Correct even for low occupancy bins

Binned LL so gives effective $\chi^2$ goodness-of-fit measure
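The slides do not spell out how the effective $\chi^2$ is formed. One common choice, assumed here, is minus twice the log of the Poisson likelihood ratio against the saturated model, in which every expected value equals the observed count. The function name is illustrative.

```python
import numpy as np


def effective_chi2(observed, expected):
    """-2 ln(likelihood ratio) against the saturated model (mu = n).

    Equals 2 * sum(mu - n + n * ln(n / mu)) over all data bins.
    """
    n = np.asarray(observed, dtype=float)
    mu = np.asarray(expected, dtype=float)
    # n * ln(n / mu) -> 0 as n -> 0, so an empty bin contributes only 2 * mu.
    safe_n = np.where(n > 0, n, 1.0)
    log_term = np.where(n > 0, n * np.log(safe_n / mu), 0.0)
    return float(2.0 * np.sum(mu - n + log_term))
```

Unlike a Gaussian $\chi^2$ with $\sqrt{N}$ errors, this quantity stays well defined for empty or nearly empty bins, which is the point made above about low occupancy.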

NDoF count depends on number of windows $N_W$

12 fraction parameters + $N_W$ normalisation parameters

Fit to $7 \times N_W$ data values, so NDoF = $6N_W - 12$

E.g. for 3 sideband windows either side, as used previously, $N_W = 6$ so NDoF = 24
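The counting can be written out as a small helper. The function name is illustrative; the arithmetic is exactly the one on this slide.

```python
def ndof(n_windows, n_bins=7):
    """Degrees of freedom for the simultaneous fit.

    Data values: n_bins * n_windows.
    Parameters: 2 * (n_bins - 1) fraction parameters plus one
    normalisation per window.
    """
    n_data = n_bins * n_windows
    n_params = 2 * (n_bins - 1) + n_windows
    return n_data - n_params


print(ndof(6))   # 3 windows either side of the signal region: 24
print(ndof(15))  # all 15 windows: 78
```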


Consistency check fit

Consistency fit to all $N_W = 15$ windows

Equivalent to that done by David previously

Fit gives $\chi^2$/NDoF = 83.75/78, probability = 30.8%
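The quoted probability can be cross-checked without external libraries, because NDoF = $6N_W - 12$ is always even and the $\chi^2$ upper-tail probability then has a closed form. The function name is an assumption; the result should land close to the 30.8% quoted above.

```python
import math


def chi2_prob_even_ndof(chi2, ndof):
    """Upper-tail probability P(X >= chi2) for an even number of DoF.

    For ndof = 2a with integer a, the chi-squared survival function is
    exp(-y) * sum_{j=0}^{a-1} y**j / j!, with y = chi2 / 2.
    """
    if ndof <= 0 or ndof % 2 != 0:
        raise ValueError("closed form used here needs a positive even ndof")
    y = chi2 / 2.0
    term = 1.0   # y**0 / 0!
    total = 1.0
    for j in range(1, ndof // 2):
        term *= y / j   # builds y**j / j! iteratively
        total += term
    return math.exp(-y) * total


print(f"{chi2_prob_even_ndof(83.75, 78):.3f}")
```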

Linear assumption is reasonable, even over 15 windows

[Plots: one panel per BDT output bin, Bin 0 to Bin 6]

Results effectively identical to those shown on slide 6 (Fraction dependence per BDT output bin)

Lack of fraction sum constraint and $\sqrt{N}$ errors used previously were good approximations


Background fraction estimate fits

Actual fit used for limit shapes has 3 sideband windows either side of signal region

Assumes linearity over mass range equivalent to 9.5 of the 15 sidebands

Seems good assumption, given that fit to 15 windows looks reasonable

[Plot: e.g. fractions in BDT bin 1, showing 21 low mass sideband bins and 21 high mass sideband bins either side of the $M_H$ window]

Fit gives direct estimation of fraction at $M_H$

Example: fit results for $M_H = 120$ GeV

[Plots: linear y and log y scales; legend: $1\sigma$ fit, $\pm\sqrt{N}$]

Errors from fit are always smaller than Poisson $\sqrt{N}$ errors of bin contents (used for limit fit)

• Worst case: fit error ~ 1/3 of $\sqrt{N}$ error

Fit errors checked against toys; agree within 10%

• Error estimate robust
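The idea of a toy check can be illustrated with a deliberately simplified stand-in: fractions are taken as mass-independent, the estimate is the pooled sideband fraction, and the spread over Poisson pseudo-experiments is compared with the analytic error $\sqrt{f(1-f)/N}$. This is not the toy procedure used for the results above; the function name, the simplification and the default settings are all assumptions.

```python
import numpy as np


def toy_check(true_fracs, n_sideband_events, n_toys=2000, seed=1):
    """Ratio of toy spread to analytic error for a pooled fraction estimate.

    Simplification: fractions are mass-independent, so the estimate is
    (events in bin) / (all events) summed over the sidebands.
    """
    rng = np.random.default_rng(seed)
    f = np.asarray(true_fracs, dtype=float)
    # Independent Poisson counts per BDT bin, one row per toy.
    counts = rng.poisson(f * n_sideband_events, size=(n_toys, f.size))
    estimates = counts / counts.sum(axis=1, keepdims=True)

    toy_error = estimates.std(axis=0, ddof=1)
    analytic_error = np.sqrt(f * (1.0 - f) / n_sideband_events)
    return toy_error / analytic_error


ratio = toy_check([0.3, 0.25, 0.2, 0.1, 0.08, 0.05, 0.02], 5000.0)
print(np.round(ratio, 2))  # values near 1 mean toys and analytic errors agree
```

In the full method the same comparison would be made between the errors reported by the likelihood fit and the spread of fitted fractions over toys.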


Systematic (singular)

Critical point: there is only ONE assumption in this whole method

BDT output fractions are assumed to be linear over fit range of sidebands

Looks like a perfectly sensible assumption even over whole mass range

• Coming up with method to estimate a systematic associated with this assumption; report on this in later meeting

Suspect dependence is mainly driven by change of background composition with mass

I.e. Born/box vs QCD prompt-prompt vs QCD prompt-fake vs QCD fake-fake...

Cannot accurately check due to lack of MC statistics

With MC statistics a factor of 10 larger than data, could even predict the linear dependence and check for consistency

• If found agreement, systematic could then be based on physical quantity; i.e. degree of uncertainty in relative background contributions


Comparison of expected limits

[Plots: panels labelled Baseline, Previous and New]

• Higgs expected limits (from Nick)

Including full correlation matrix of nuisance parameters from fit

Code for this implemented in h2gglobe

Very little difference; approximations in previous study ARE good

For both, only the systematic for the linear assumption is not included

• Minor difference at high mass seems to be due to adding constraint on fractions; one less DoF in new method


Conclusions

Streamlined method for handling of sidebands

• Previous (accurate) approximations no longer needed

• Statistically robust, error matrix estimate accurate

Results effectively identical

Nuisance parameters from shape for limit fit are not large compared with expected data statistics

Arise directly from statistics of sidebands so will scale in the same way with luminosity

Only one (apparently very reasonable) assumption in the whole method

• Systematic due to this still to be evaluated

• Given simple assumption, expected to be small and well-controlled

No assumption on (and hence no systematic from) shape of mass spectrum


Backup Slides
