Status of the Jet Probability B-Tagging Algorithm for Run II

Darin Acosta, Michael Schmitt, Dmitri Tsybychev,
Song Ming Wang
University of Florida
Jet Probability
Construct Track Probability from the signed impact parameter (positive or negative) measured by SVX:
Probability that the track originates from the primary vertex
Construct Jet Probability from the product of Track Probabilities:
TP1⊗TP2⊗ … ⊗TPn
[Figure: Jet Probability distributions for bottom, charm, and primary jets]
Flat from 0 to 1 for primary jets
Peaks at 0 for b,c jets
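The slide does not spell out how the product of track probabilities becomes a jet probability. The sketch below assumes the combination commonly used for this kind of tagger, JP = Π · Σ_{k=0}^{N−1} (−ln Π)^k / k!, where Π is the product of the N track probabilities. For independent, uniformly distributed inputs this stays flat between 0 and 1. The function name `jet_probability` is illustrative and is not taken from JetProbModule.

```python
import math


def jet_probability(track_probs):
    """Combine N track probabilities into a single jet probability.

    Assumed formula (not stated on the slide):
        JP = P * sum_{k=0}^{N-1} (-ln P)^k / k!,  P = product of track probs.
    """
    if not track_probs:
        raise ValueError("need at least one track probability")
    # Work with logarithms so that many small probabilities do not underflow.
    minus_log = -sum(math.log(max(p, 1e-300)) for p in track_probs)
    product = math.exp(-minus_log)
    term, total = 1.0, 0.0
    for k in range(len(track_probs)):
        total += term                 # adds (-ln P)^k / k!
        term *= minus_log / (k + 1)
    return product * total
```

With a single track the jet probability equals the track probability. Several tracks that each have a small probability drive the result toward zero, which is the behaviour expected for b and c jets.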
B-Tagging Meeting, July 19, 2002
Darin Acosta
Jet Probability in AC++
The Jet Probability code is in two AC++ packages:
JetProbObjects (contains storeable objects JPTrack, JPJet)
JetProbMods (main steering module JetProbModule)
Talk-to implemented to select jet cone size, minimum Et,
eta range, track quality cuts, …
The user can get a collection of jets from JetProbModule
with probabilities attached
User can select probability cut to optimize S/√B for specific
analysis
Track probabilities and other track quality information also
attached
These tagging results are also available in Stntuple now
Must run JetProbModule beforehand
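As a rough illustration of choosing a probability cut, the sketch below scans candidate cuts and keeps the one with the largest S/√B. It works on plain lists of jet probabilities for signal and background jets. The function name `best_jp_cut` and its inputs are hypothetical and do not correspond to the JetProbModule or Stntuple interfaces.

```python
import math


def best_jp_cut(signal_jp, background_jp, cuts):
    """Return (cut, S/sqrt(B)) for the cut JP < cut that maximizes S/sqrt(B).

    Returns None if every candidate cut leaves no background jets.
    """
    best = None
    for cut in cuts:
        n_sig = sum(1 for jp in signal_jp if jp < cut)
        n_bkg = sum(1 for jp in background_jp if jp < cut)
        if n_bkg == 0:
            continue  # S/sqrt(B) is undefined without background
        figure_of_merit = n_sig / math.sqrt(n_bkg)
        if best is None or figure_of_merit > best[1]:
            best = (cut, figure_of_merit)
    return best
```

In a real analysis the two lists would come from MC samples weighted to the expected yields; the unweighted counts here only show the mechanics.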
Status Prior to This Summer
Last impact parameter fits were based on release
4.2.0 using QCD Monte Carlo from Pythia+cdfSim
Probabilities from these fits are included in JetProb library
release 4.5.0 and later
360 track categories were created (not all possible)
20 hit + shared hit combinations
SVX+COT and SVX-only categories
Pt dependence and innermost silicon layer reached
L00 and ISL included
r-z parameterizations also available
But these old fits, when applied to data or recent MC,
no longer lead to flat Track Probability distributions
Complete re-analysis started this summer
Data Samples and Processing
Data:
GJet03 (Jet20, Jet50, Jet70…) from 4.5.2 Production, 100Kevt
MC:
QCD samples produced using Pythia, 700Kevt
L00 removed, beam offset included
Offline defaults:
Everything re-processed using 4.5.0int7 (data) or 4.6.0 (MC)
Alignment version “20 3 Test” used for data
Drop ISLD bank (ISL and L00)
Outside-in r-φ tracking only
Track quality: ≥3 SVX hits, pT > 0.5 GeV, d0 < 1 mm (no COT cuts yet)
Vertex taken from VXPRIM
Impact parameter error convolutes track error with vertex error
Jet Probability cone size 0.4, ET > 7 GeV, |η| < 2.5
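The track selection and the impact parameter significance described above can be sketched as follows. The sketch assumes that the track and vertex errors are independent and therefore add in quadrature, and that the d0 cut applies to the absolute value. Both function names are illustrative only.

```python
import math


def passes_track_quality(n_svx_hits, pt_gev, d0_mm):
    """Track quality cuts from the slide: >=3 SVX hits, pT > 0.5 GeV, |d0| < 1 mm."""
    return n_svx_hits >= 3 and pt_gev > 0.5 and abs(d0_mm) < 1.0


def ip_significance(d0, sigma_track, sigma_vertex):
    """Signed impact parameter significance d0 / sigma_d0.

    Assumption: sigma_d0 is the quadrature sum of the track and vertex errors.
    All three arguments must use the same length unit.
    """
    sigma_d0 = math.hypot(sigma_track, sigma_vertex)
    if sigma_d0 == 0.0:
        raise ValueError("total impact parameter error must be positive")
    return d0 / sigma_d0
```

The sign of the result follows the sign of d0, so negative-side tracks give negative significance.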
Track Categories for IP Fits
Conventions:
The number of SVX “hits” along a track is considered to be the
number of distinct SVX layers hit (not the total number of hits).
An SVX “hit” is considered “shared” only if another outside-in
track uses the same hit
Track Categories:
Since we consider only SVX layers, and since each layer may have a hit, a shared hit, or no hit, there are a maximum of 3^5 = 243 categories labeling the hits along a track (but only 192 have 3 SVX layers or more). We’ll consider fewer:
Baseline (3 categories)
3, 4, or 5 SVX hits along track (don’t care if hits are shared)
Hits+Shared (12 categories)
3, 4, or 5 SVX hits by 0, 1, 2, or 3+ shared hits
Hits+Shared+pT (36 categories)
3, 4, or 5 SVX hits by 0, 1, 2, or 3+ shared hits by 3 pT bins:
0.5–2.0, 2.0–5.0, >5.0 GeV
Hits+Shared+pT+η (72 categories)
3, 4, or 5 SVX hits by 0, 1, 2, or 3+ shared hits by 3 pT bins
by 2 η bins (central, forward)
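The category counts quoted above can be checked by brute force, and the 72-category scheme can be turned into a simple index. This is a sketch only: the ordering of the index is an arbitrary choice, and the central/forward decision is passed in as a flag because the slides do not give the η boundary.

```python
from itertools import product

LAYER_STATES = ("hit", "shared", "none")


def count_layer_patterns(n_layers=5, min_layers=3):
    """Return (all patterns, patterns with at least min_layers layers hit)."""
    patterns = list(product(LAYER_STATES, repeat=n_layers))
    with_enough = [p for p in patterns
                   if sum(state != "none" for state in p) >= min_layers]
    return len(patterns), len(with_enough)


def category_index(n_hits, n_shared, pt_gev, forward):
    """Map a track to one of the 72 Hits+Shared+pT+eta categories (0..71)."""
    if not 3 <= n_hits <= 5:
        raise ValueError("track needs 3, 4, or 5 SVX hits")
    if pt_gev < 0.5:
        raise ValueError("track is below the 0.5 GeV threshold")
    hit_bin = n_hits - 3                   # 3, 4, 5 hits -> 0, 1, 2
    shared_bin = min(n_shared, 3)          # 0, 1, 2, 3+ shared hits
    pt_bin = 0 if pt_gev < 2.0 else (1 if pt_gev < 5.0 else 2)
    eta_bin = 1 if forward else 0          # central -> 0, forward -> 1
    return ((eta_bin * 3 + pt_bin) * 4 + shared_bin) * 3 + hit_bin


print(count_layer_patterns())  # (243, 192)
```

Dropping the η bin (always passing `forward=False`) leaves the 36 Hits+Shared+pT categories in the index range 0 to 35.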
Track Categories Continued
“mLayer” (52 categories)
Extend the “Hits+Shared” category to parameterize where
missing hits are, where the innermost layer reached by a
track is, and whether the innermost layers have shared hits
Doesn’t yet include any pT or η dependence
Comparison of Impact Parameter
Distributions: Data vs. MC
[Figure: Signed impact parameter (D0) and signed IP significance (D0/D0Err) distributions, comparing Data, MC, and B-Bbar MC]
Jet Data vs. QCD and bb MC, pT>40
Impact Parameter distribution is narrower in MC than in data
IP Dependence on Number of Hit Layers
[Figure: Monte Carlo signed impact parameter (D0) and signed IP significance (D0/D0Err) distributions for tracks with 3, 4, and 5 hit layers]
QCD MC, pT>40
IP Significance depends on number of hit layers:
Longer tails for fewer hits
IP Dependence on Number of Shared Hits
[Figure: Monte Carlo signed impact parameter (D0) and signed IP significance (D0/D0Err) distributions for tracks with 0, 1, 2, and 3+ shared hits]
QCD MC, pT>40
IP Significance depends on number of shared hits:
Longer tails for more shared hits
IP Dependence on Track pT
[Figure: Monte Carlo signed impact parameter (D0) and signed IP significance (D0/D0Err) distributions for low, mid, and high pT tracks]
QCD MC, pT>40
Includes all hit and shared hit categories
Not much dependence of IP significance on pT
IP Dependence on Track η
[Figure: Monte Carlo signed impact parameter (D0) and signed IP significance (D0/D0Err) distributions for central and forward η tracks]
QCD MC, pT>40
Includes all hit and shared hit categories
Not much dependence of IP significance on η
Fits to Impact Parameter Significance
Fit negative side of D0/D0Err distribution, separately for MC and data
No subtraction of heavy flavor content in data yet
Fit the sum of 4 Gaussians
8 parameters
(assume means are zero)
Choose nonlinear bins to increase statistics in tails
Transform axis to log(D0/D0Err)
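The fit model and the axis transform can be sketched as follows. The sketch assumes a natural logarithm and treats the four amplitudes as plain multiplicative factors; the normalization used in the actual fits is not stated in the slides. On the transformed axis u = ln|x| the density picks up a Jacobian factor e^u, which is why a zero-mean Gaussian of width σ peaks at u = ln σ, that is at zero for σ = 1.

```python
import math


def four_gaussians(x, amps, sigmas):
    """Sum of four zero-mean Gaussians: 4 amplitudes + 4 widths = 8 parameters."""
    return sum(a * math.exp(-0.5 * (x / s) ** 2) for a, s in zip(amps, sigmas))


def density_on_log_axis(u, amps, sigmas):
    """Density per unit u = ln|x|: f(x) * dx/du = f(e^u) * e^u."""
    x = math.exp(u)
    return four_gaussians(x, amps, sigmas) * x


def peak_on_log_axis(amps, sigmas, lo=-3.0, hi=3.0, steps=600):
    """Locate the maximum of the transformed density on a uniform grid in u."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda u: density_on_log_axis(u, amps, sigmas))


# A single unit-width Gaussian (other amplitudes set to zero) peaks at u = 0.
unit = peak_on_log_axis((1.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0))
```

Equal-width bins in u correspond to bins in |x| that grow geometrically, which is one way to obtain the nonlinear binning that gains statistics in the tails.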
n.b. A Gaussian with µ=0 and σ=1 exhibits a peak at zero along the transformed axis
[Figure: fit shown versus D0/D0Err and versus log(D0/D0Err)]
Baseline 3 Categories
[Figure: fits for 3 SVX hits, 4 SVX hits, and 5 SVX hits; one row for QCD MC, pT>40, and one row for jet data]
Group_0_Pt_0.5_2_0_noL00_allEta_SVX+COT
July 02, 2002
using: /grincdf/tmp1/tsybych/pythia_qcd_pt40_offset/stntuple/i˘@
Group 0: Shared 0, Hit 3 (_grp0_shr0_hit3)
Nent = 43163, Mean = -0.3371, RMS = 1.154, Under = 1890, Over = 0, Integ = 4.127e+04
Chi2 / ndf = 136.2 / 136, Prob = 0.4833
AmpxG = 1235 ± 83.82, SigxG = 1.384 ± 0.05236; AmpxG1 = 2123 ± 93.91, SigxG1 = 0.6442 ± 0.01371; AmpxG2 = 519 ± 19.47, SigxG2 = 4.767 ± 0.1511; AmpxG3 = 124 ± 12.69, SigxG3 = 12.05 ± 0.4359
Group 0: Shared 0, Hit 4 (_grp0_shr0_hit4)
Nent = 174054, Mean = -0.5144, RMS = 1.031, Under = 8187, Over = 0, Integ = 1.659e+05
Chi2 / ndf = 154.8 / 135, Prob = 0.1165
AmpxG = 5765 ± 350.9, SigxG = 1.16 ± 0.02461; AmpxG1 = 8960 ± 369.7, SigxG1 = 0.652 ± 0.00941; AmpxG2 = 1030 ± 31.16, SigxG2 = 4.037 ± 0.1091; AmpxG3 = 391.7 ± 22.36, SigxG3 = 10.16 ± 0.2058
Group 0: Shared 0, Hit 5 (_grp0_shr0_hit5)
Nent = 525430, Mean = -0.6057, RMS = 0.9514, Under = 2.597e+04, Over = 0, Integ = 4.995e+05
Chi2 / ndf = 1101 / 138, Prob = 0
AmpxG = 9832 ± 278.7, SigxG = 1.448 ± 0.01399; AmpxG1 = 4.119e+04 ± 937.9, SigxG1 = 0.7255 ± 0.003259; AmpxG2 = 3409 ± 939, SigxG2 = -0.7255 ± 0.02075; AmpxG3 = 1074 ± 15.21, SigxG3 = 9.183 ± 0.07441
Group 0: Shared 1, Hit 3 (_grp0_shr1_hit3)
Nent = 8470, Mean = 0.2656, RMS = 1.367, Under = 252, Over = 0, Integ = 8218
Chi2 / ndf = 207.1 / 135, Prob = 3.878e-05
AmpxG = 413.9 ± 8.141, SigxG = 0.8891 ± 0.01752; AmpxG1 = 212.8 ± 8.667, SigxG1 = 4.102 ± 0.1742; AmpxG2 = 78.49 ± 9.531, SigxG2 = 11.09 ± 5.552; AmpxG3 = 78.49 ± 9.531, SigxG3 = 11.09 ± 5.552
Group 0: Shared 1, Hit 4 (_grp0_shr1_hit4)
Nent = 39280, Mean = -0.2064, RMS = 1.183, Under = 1465, Over = 0, Integ = 3.782e+04
Chi2 / ndf = 376.3 / 136, Prob = 0
AmpxG = 2616 ± 14.26, SigxG = 0.8678 ± 0.004392; AmpxG1 = 709.9 ± 11.02, SigxG1 = 3.149 ± 0.04551; AmpxG2 = 156.5 ± 7.508, SigxG2 = 9.778 ± 3.662; AmpxG3 = 156.5 ± 7.508, SigxG3 = 9.778 ± 3.662
Group 0: Shared 1, Hit 5 (_grp0_shr1_hit5)
Nent = 125831, Mean = -0.4564, RMS = 0.9995, Under = 5471, Over = 0, Integ = 1.204e+05
Chi2 / ndf = 184 / 135, Prob = 0.002715
AmpxG = 4002 ± 283.2, SigxG = 1.319 ± 0.03561; AmpxG1 = 6881 ± 308, SigxG1 = 0.7421 ± 0.01142; AmpxG2 = 554.1 ± 33.26, SigxG2 = 3.909 ± 0.2053; AmpxG3 = 226.8 ± 17.39, SigxG3 = 10.7 ± 0.2934
Group 0: Shared 2, Hit 3 (_grp0_shr2_hit3)
Nent = 3854, Mean = 1.068, RMS = 1.344, Under = 49, Over = 0, Integ = 3805
Chi2 / ndf = 169.9 / 138, Prob = 0.03226
AmpxG = 77.95 ± 4.968, SigxG = 0.9883 ± 0.06339; AmpxG1 = 199 ± 15.22, SigxG1 = 6.727 ± 0.3293; AmpxG2 = 40.37 ± 209.9, SigxG2 = 13.57 ± 13.27; AmpxG3 = 40.37 ± 209.9, SigxG3 = 13.57 ± 13.27
Group 0: Shared 2, Hit 4 (_grp0_shr2_hit4)
Nent = 11411, Mean = 0.1711, RMS = 1.296, Under = 318, Over = 0, Integ = 1.109e+04
Chi2 / ndf = 197 / 135, Prob = 0.0002809
AmpxG = 622.9 ± 6.595, SigxG = 0.9712 ± 0.009949; AmpxG1 = 307.2 ± 6.13, SigxG1 = 4.522 ± 0.0818; AmpxG2 = 63.97 ± 4.686, SigxG2 = 11.09 ± 6.468; AmpxG3 = 63.97 ± 4.686, SigxG3 = 11.09 ± 6.468
Group 0: Shared 2, Hit 5 (_grp0_shr2_hit5)
Nent = 39000, Mean = -0.1841, RMS = 1.115, Under = 1442, Over = 0, Integ = 3.756e+04
Chi2 / ndf = 177 / 132, Prob = 0.004665
AmpxG = 1244 ± 138.9, SigxG = 0.7052 ± 0.0287; AmpxG1 = 1683 ± 130.1, SigxG1 = 1.316 ± 0.04244; AmpxG2 = 566.7 ± 20.8, SigxG2 = 4.42 ± 0.1372; AmpxG3 = 117.9 ± 15.65, SigxG3 = 10.05 ± 0.4157
Group 0: Shared 3, Hit 3 (_grp0_shr3_hit3)
Nent = 3932, Mean = 1.086, RMS = 1.238, Under = 46, Over = 0, Integ = 3886
Chi2 / ndf = 149.8 / 137, Prob = 0.2152
AmpxG = 41.18 ± 3.132, SigxG = 0.8637 ± 0.07574; AmpxG1 = 254.1 ± 5.754, SigxG1 = 5.689 ± 0.1081; AmpxG2 = 34.63 ± 4.945, SigxG2 = 12.79 ± 9.729; AmpxG3 = 34.63 ± 4.945, SigxG3 = 12.79 ± 9.729
Group 0: Shared 3, Hit 4 (_grp0_shr3_hit4)
Nent = 12033, Mean = 0.6093, RMS = 1.29, Under = 240, Over = 0, Integ = 1.179e+04
Chi2 / ndf = 259.1 / 137, Prob = 2.174e-10
AmpxG = 282.4 ± 6.658, SigxG = 0.8388 ± 0.02165; AmpxG1 = 603.4 ± 8.805, SigxG1 = 3.831 ± 0.05128; AmpxG2 = 114.5 ± 7.21, SigxG2 = 10.87 ± 4.694; AmpxG3 = 114.5 ± 7.21, SigxG3 = 10.87 ± 4.694
Group 0: Shared 3, Hit 5 (_grp0_shr3_hit5)
Nent = 36525, Mean = 0.06202, RMS = 1.178, Under = 1067, Over = 0, Integ = 3.546e+04
Chi2 / ndf = 293.3 / 135, Prob = 2.639e-15
AmpxG = 1551 ± 33.51, SigxG = 0.8551 ± 0.01506; AmpxG1 = 1571 ± 212.7, SigxG1 = 2.816 ± 0.04986; AmpxG2 = 45.45 ± 212.8, SigxG2 = -2.816 ± 0.4284; AmpxG3 = 309.4 ± 14.67, SigxG3 = 8.95 ± 0.1777
Group_1_Pt_2_5_0_noL00_allEta_SVX+COT
July 02, 2002
using: /grincdf/tmp1/tsybych/pythia_qcd_pt40_offset/stntuple/i˘@
Group 1: Shared 0, Hit 3 (_grp1_shr0_hit3)
Nent = 38226, Mean = -0.3958, RMS = 1.173, Under = 1753, Over = 0, Integ = 3.647e+04
Chi2 / ndf = 159.7 / 139, Prob = 0.11
AmpxG = 1076 ± 133.6, SigxG = 1.218 ± 0.0554; AmpxG1 = 2016 ± 140, SigxG1 = 0.656 ± 0.01763; AmpxG2 = 266.4 ± 17.08, SigxG2 = 5.605 ± 0.3614; AmpxG3 = 186 ± 19.55, SigxG3 = 12.8 ± 0.4269
Group 1: Shared 0, Hit 4 (_grp1_shr0_hit4)
Nent = 149037, Mean = -0.5914, RMS = 0.9907, Under = 7378, Over = 0, Integ = 1.417e+05
Chi2 / ndf = 500.4 / 141, Prob = 0
AmpxG = 2865 ± 182, SigxG = 1.336 ± 0.02546; AmpxG1 = 1.193e+04 ± 493.8, SigxG1 = 0.7157 ± 0.007262; AmpxG2 = 1438 ± 495.2, SigxG2 = -0.7157 ± 0.03934; AmpxG3 = 453.7 ± 8.386, SigxG3 = 11.56 ± 0.133
Group 1: Shared 0, Hit 5 (_grp1_shr0_hit5)
Nent = 437659, Mean = -0.6663, RMS = 0.9157, Under = 2.252e+04, Over = 0, Integ = 4.151e+05
Chi2 / ndf = 820.4 / 142, Prob = 0
AmpxG = 9460 ± 619.4, SigxG = 1.284 ± 0.01781; AmpxG1 = 3.325e+04 ± 331.7, SigxG1 = 0.7255 ± 0.003097; AmpxG2 = 2635 ± 620.8, SigxG2 = -1.284 ± 0.03792; AmpxG3 = 477.9 ± 8.209, SigxG3 = 13.82 ± 0.1513
Group 1: Shared 1, Hit 3 (_grp1_shr1_hit3)
Nent = 7484, Mean = 0.1768, RMS = 1.448, Under = 223, Over = 0, Integ = 7261
Chi2 / ndf = 140.5 / 140, Prob = 0.4762
AmpxG = 383.6 ± 15.61, SigxG = 0.8432 ± 0.02191; AmpxG1 = 144.3 ± 14.34, SigxG1 = 2.031 ± 0.1308; AmpxG2 = 83.62 ± 13.67, SigxG2 = 14.3 ± 7.252; AmpxG3 = 83.62 ± 13.67, SigxG3 = 14.3 ± 7.252
Group 1: Shared 1, Hit 4 (_grp1_shr1_hit4)
Nent = 33467, Mean = -0.3339, RMS = 1.122, Under = 1413, Over = 0, Integ = 3.205e+04
Chi2 / ndf = 165.9 / 139, Prob = 0.05772
AmpxG = 1433 ± 114.1, SigxG = 0.6983 ± 0.02121; AmpxG1 = 1351 ± 109.8, SigxG1 = 1.287 ± 0.03737; AmpxG2 = 138.2 ± 8.89, SigxG2 = 14.65 ± 0.4126; AmpxG3 = 177.5 ± 11.54, SigxG3 = 4.925 ± 0.2922
Group 1: Shared 1, Hit 5 (_grp1_shr1_hit5)
Nent = 96522, Mean = -0.5305, RMS = 0.9595, Under = 4361, Over = 0, Integ = 9.216e+04
Chi2 / ndf = 419.1 / 141, Prob = 0
AmpxG = 3028 ± 323.5, SigxG = 1.411 ± 0.03138; AmpxG1 = 6356 ± 179.4, SigxG1 = 0.7774 ± 0.009223; AmpxG2 = 597.5 ± 325.2, SigxG2 = -1.411 ± 0.07478; AmpxG3 = 159.7 ± 5.032, SigxG3 = 13.39 ± 0.2612
Group 1: Shared 2, Hit 3 (_grp1_shr2_hit3)
Nent = 3068, Mean = 0.874, RMS = 1.576, Under = 58, Over = 0, Integ = 3010
Chi2 / ndf = 194.2 / 139, Prob = 0.001089
AmpxG = 116.8 ± 4.75, SigxG = 0.9738 ± 0.03807; AmpxG1 = 53.96 ± 6.52, SigxG1 = 5.249 ± 0.507; AmpxG2 = 34.04 ± 95.23, SigxG2 = 16.35 ± 1.607; AmpxG3 = 80.01 ± 95.22, SigxG3 = 16.35 ± 0.7993
Group 1: Shared 2, Hit 4 (_grp1_shr2_hit4)
Nent = 10502, Mean = -0.1034, RMS = 1.204, Under = 332, Over = 0, Integ = 1.017e+04
Chi2 / ndf = 197.2 / 138, Prob = 0.0005259
AmpxG = 386.9 ± 29.4, SigxG = 0.867 ± 25.25; AmpxG1 = 453.1 ± 29.63, SigxG1 = 1.456 ± 25.27; AmpxG2 = 67.94 ± 26.56, SigxG2 = 13.07 ± 8.948; AmpxG3 = 67.94 ± 26.56, SigxG3 = 7.577 ± 4.079
Group 1: Shared 2, Hit 5 (_grp1_shr2_hit5)
Nent = 34344, Mean = -0.3367, RMS = 1.006, Under = 1286, Over = 0, Integ = 3.306e+04
Chi2 / ndf = 151.3 / 137, Prob = 0.1914
AmpxG = 2046 ± 120, SigxG = 1.253 ± 0.02378; AmpxG1 = 921.2 ± 122.4, SigxG1 = 0.699 ± 0.03284; AmpxG2 = 181.2 ± 9.819, SigxG2 = 4.521 ± 0.1683; AmpxG3 = 37.82 ± 3.956, SigxG3 = 15.37 ± 0.7664
Group 1: Shared 3, Hit 3 (_grp1_shr3_hit3)
Nent = 3661, Mean = 1.303, RMS = 1.467, Under = 51, Over = 0, Integ = 3610
Chi2 / ndf = 235.9 / 142, Prob = 4.844e-07
AmpxG = 75.63 ± 3.667, SigxG = 0.8285 ± 0.04044; AmpxG1 = 193.1 ± 15.36, SigxG1 = 9.732 ± 0.3758; AmpxG2 = 14.7 ± 97.73, SigxG2 = 18.04 ± 3.533; AmpxG3 = 56.13 ± 97.68, SigxG3 = 18.04 ± 1.364
Group 1: Shared 3, Hit 4 (_grp1_shr3_hit4)
Nent = 9490, Mean = 0.2953, RMS = 1.388, Under = 254, Over = 0, Integ = 9236
Chi2 / ndf = 172.6 / 140, Prob = 0.03036
AmpxG = 218.6 ± 48.7, SigxG = 0.675 ± 0.06362; AmpxG1 = 332.1 ± 45.25, SigxG1 = 1.384 ± 0.09552; AmpxG2 = 125.6 ± 8.863, SigxG2 = 15.79 ± 0.476; AmpxG3 = 205 ± 11, SigxG3 = 5.41 ± 0.2743
Group 1: Shared 3, Hit 5 (_grp1_shr3_hit5)
Nent = 30574, Mean = -0.1511, RMS = 1.093, Under = 1035, Over = 0, Integ = 2.954e+04
Chi2 / ndf = 193 / 141, Prob = 0.001971
AmpxG = 833.3 ± 108.1, SigxG = 0.7361 ± 0.03631; AmpxG1 = 1556 ± 101.8, SigxG1 = 1.4 ± 0.03887; AmpxG2 = 385.4 ± 16.49, SigxG2 = 4.581 ± 0.1513; AmpxG3 = 59.11 ± 6.453, SigxG3 = 14.56 ± 0.6616
Track Probability
Use the parameterized results and integrate the area under the curve to find the probability of a track getting this value of the impact parameter significance or larger
Should be a flat probability
distribution for prompt jets
Should peak at zero for b/c jets
Combine into Jet Probability
Constructed so that prompt
jets still give flat distribution
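The integration step can be sketched with the complementary error function. The sketch assumes the fitted shape is a sum of zero-mean Gaussians in the significance itself, so the tail integral of each component from |S| to infinity is proportional to A·|σ|·erfc(|S| / (|σ|√2)). Absolute values of the widths are used because some fits return a negative σ, which describes the same Gaussian. The parameter values below are hypothetical and are not taken from the fit tables.

```python
import math
import random


def track_probability(significance, amps, sigmas):
    """Probability of a significance at least this large in magnitude."""
    s = abs(significance)
    weights = [a * abs(sig) for a, sig in zip(amps, sigmas)]
    tail = sum(w * math.erfc(s / (abs(sig) * math.sqrt(2.0)))
               for w, sig in zip(weights, sigmas))
    return tail / sum(weights)


def sample_significance(rng, amps, sigmas):
    """Draw |S| from the same four-Gaussian shape (used to check flatness)."""
    weights = [a * abs(sig) for a, sig in zip(amps, sigmas)]
    sig = rng.choices(sigmas, weights=weights)[0]
    return abs(rng.gauss(0.0, abs(sig)))


# Hypothetical fit parameters, for illustration only.
AMPS = (2000.0, 1200.0, 500.0, 120.0)
SIGMAS = (0.65, 1.4, 4.8, -12.0)

rng = random.Random(7)
probs = [track_probability(sample_significance(rng, AMPS, SIGMAS), AMPS, SIGMAS)
         for _ in range(20000)]
mean_prob = sum(probs) / len(probs)  # close to 0.5 for a flat distribution
```

Tracks drawn from the fitted shape give a flat probability distribution by construction. Tracks from b or c decays have larger significance than the resolution shape predicts and therefore pile up near zero.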
Group_1_Pt_2_5_0_noL00_allEta_SVX+COT Track Probability Distribution
[Figure: Track Probability distributions (0 to 1) for the 12 histograms TP_grp1_shr0_hit3 through TP_grp1_shr3_hit5; entries per histogram range from 846 to 118090, means lie between 0.4937 and 0.5095, and RMS values lie between 0.2856 and 0.2952]
Jet Probability Distribution for b-jets
Calculate using positive IP and 36 fit categories (pT+shr+hit)
Pythia bb MC
[Figure: Jet Prob (RPhi Pos), histogram JPRPHIP, shown on linear and log scales; Entries = 47127, Mean = 0.2063, RMS = 0.2832]
Sharp spike near 0 probability
Jet Probability b-jet Tagging Efficiency
Cut JP<0.01 for Positive IP
for b-jets in Pythia bb MC
Central jets: |η|<1
Must have ≥ 1 good track
[Figure: Tag Eff vs. Jet Et (GeV) for the mLayer, AllGrp, Pt+Shr+Hit, Shr+Hit, and baseline track categories]
Jet Probability mis-Tagging Efficiency
Cut JP<0.01 for Negative IP
for non-b-jets in Pythia QCD MC
Central jets: |η|<1
Must have ≥ 1 good track
[Figure: Tag Eff vs. Jet Et (GeV) for the mLayer, AllGrp, Pt+Shr+Hit, Shr+Hit, and baseline track categories]
B-jet Efficiency vs. Prompt Jet Efficiency
Central jets: |η|<1
Must have ≥ 1 good track
36 fit categories (pT+shr+hit)
[Figure: JP: B jet Tag Eff vs Prompt Jet Tag Eff, with curves labeled positive IP for b-jets and negative IP for b-jets]
Old 4.2.0 IP Fit Categories
[Figure: JP: B jet Tag Eff vs Prompt Jet Tag Eff]
Heavy Flavor Parton Tagging Studies
Select 2 highest ET jets from Pythia QCD MC, pT>40
[Figure: CombAna: Eff. HF Tag vs. Jet ET]
Probability that a jet is a b-jet (tagged at parton level)
CombAna_fEffJPHFTagEt: Entries = 22, Mean = 52.78, RMS = 25.7, Underflow = 0, Overflow = 0.02151, Integral = 0.359
[Figure: CombAna: Eff. HF Tag 2 vs. Et (JP Tag 1)]
Probability that second jet is a b-jet when first jet is tagged with JP<0.01 (positive SIP)
CombAna_fEffHFTag2JPTag1Et2: Entries = 22, Mean = 48.29, RMS = 23.81, Underflow = 0, Overflow = 0.08824, Integral = 1.153
[Figure: CombAna: Ratio of HF Tag 2 Eff. (JP Tag 1) to All HF Tag vs. Et]
Ratio of above ⇒ b-jet “enrichment factor” ≈ 3.5
CombAna_fRatioHFTag2JPTag1Et2: Entries = 22, Mean = 51.97, RMS = 22.68, Underflow = 0, Overflow = 4.103, Integral = 52.49
Jet Probability Tagging Study (MC)
Select 2 highest ET jets from Pythia QCD MC, pT>40
[Figure: CombAna: Jet Prob (Pos SIP) and CombAna: Jet Prob (Neg SIP)]
Positive and negative SIP for all jets
CombAna_jet_JPpos: Entries = 185924, Mean = 0.4649, RMS = 0.3038, Integral = 1.859e+05
CombAna_jet_JPneg: Entries = 182096, Mean = 0.4946, RMS = 0.2959, Integral = 1.821e+05
[Figure: CombAna: Jet2 Prob (Pos SIP) (JP Tag 1) and CombAna: Jet2 Prob (Neg SIP) (JP Tag 1)]
Positive and negative SIP for second jet when first is tagged with JP<0.01 (positive IP)
CombAna_JPPos2JPTag1: Entries = 4789, Mean = 0.4009, RMS = 0.3097, Integral = 4789
CombAna_JPNeg2JPTag1: Entries = 4591, Mean = 0.4609, RMS = 0.2998, Underflow = 0, Overflow = 0, Integral = 4591
Jet Probability Tagging Study (Data)
Select 2 highest ET jets from all jet data
[Figure: CombAna: Jet Prob (Pos SIP) and CombAna: Jet Prob (Neg SIP), histograms CombAna_jet_JPpos and CombAna_jet_JPneg]
Positive and negative SIP for all jets
Stat boxes for the two all-jet histograms: Entries = 50335, Mean = 0.4961, RMS = 0.2938, Integral = 5.033e+04; Entries = 51521, Mean = 0.4688, RMS = 0.3011, Integral = 5.152e+04
[Figure: CombAna: Jet2 Prob (Pos SIP) (JP Tag 1) and CombAna: Jet2 Prob (Neg SIP) (JP Tag 1), histograms CombAna_JPPos2JPTag1 and CombAna_JPNeg2JPTag1]
Positive and negative SIP for second jet when first is tagged with JP<0.01 (positive IP)
Stat boxes for the two second-jet histograms: Entries = 655, Mean = 0.4471, RMS = 0.2904, Integral = 655; Entries = 682, Mean = 0.4078, RMS = 0.3031, Integral = 682
JetProb Tagging: MC vs. Data
[Figure: CombAna: Ratio Jet2 Prob (JP Tag 1) to All Jet Prob. (Pos SIP), shown for MC and for data]
Ratio of JP Distributions: enriched tagged sample to non-enriched jets (normalized to equal areas)
CombAna_RatioJPPos2JPTag1 (MC): Entries = 102, Mean = 0.4489, RMS = 0.2951, Underflow = 0, Overflow = 0, Integral = 98.93
CombAna_RatioJPPos2JPTag1 (Data): Entries = 102, Mean = 0.4549, RMS = 0.2901, Underflow = 0, Overflow = 0, Integral = 95.36
MC and data tend to agree on JP tagging (very preliminary study)
Conclusions
Good indication that the Jet Probability algorithm is tagging heavy-flavor jets in data
Will study the HF content in data (measure efficiency and purity)
Study HF tagging efficiency in MC
Optimize track categories for SVX
Interesting that the heavy-flavor discrimination power of the
Jet Probability algorithm is not too sensitive to the precise shape of the
IP fits (when expressed as HF efficiency vs. background eff.)
Although getting flat track probabilities is sensitive to the shape of the IP fits
We plan to converge on a baseline set of track categories soon, and
publish the fit results to the AC++ JetProbModule so that general users
can apply this b-tagging tool
Current studies have been using a Stntuple module
A CDF Note and documentation on how to use JP will follow
Will extend tool to include L00 and ISL in the future
Earlier JP studies show that L00 significantly improves b-tagging efficiency
We also would like to generalize the Jet Probability algorithm to other
jet algorithms, and even other objects (e.g. J/Ψ)
Jet Probability algorithm just operates on the impact parameters of
a collection of tracks about a given axis