Status of the Jet Probability B-Tagging Algorithm for Run II
Darin Acosta, Michael Schmitt, Dmitri Tsybychev, Song Ming Wang
University of Florida
B-Tagging Meeting, July 19, 2002

Jet Probability
Construct Track Probability from the signed impact parameter measured by the SVX:
  Probability that a track originates from the primary vertex
Construct Jet Probability from products of Track Probabilities: TP1⊗TP2⊗ … ⊗TPn
  Flat from 0 to 1 for primary jets
  Peaks at 0 for b, c jets
[Figure: sketch of the signed impact parameter (+/–) and of the jet probability distributions from 0 to 1 for bottom, charm, and primary jets]

Jet Probability in AC++
The Jet Probability code is in two AC++ packages:
  JetProbObjects (contains storable objects JPTrack, JPJet)
  JetProbMods (main steering module JetProbModule)
Talk-to implemented to select jet cone size, minimum Et, eta range, track quality cuts, …
The user can get a collection of jets from JetProbModule with probabilities attached
  User can select the probability cut to optimize S/√B for a specific analysis
  Track probabilities and other track quality information are also attached
These tagging results are also available in Stntuple now
  Must run JetProbModule beforehand

Status Prior to This Summer
Last impact parameter fits were based on release 4.2.0 using QCD Monte Carlo from Pythia+cdfSim
  Probabilities from these fits are included in JetProb library release 4.5.0 and later
360 track categories were created (not all possible)
  20 hit + shared hit combinations
  SVX+COT and SVX-only categories
  Pt dependence and innermost silicon layer reached
  L00 and ISL included
  r-z parameterizations also available
But these old fits, when applied to data or recent MC, no longer lead to flat Track Probability distributions
Complete re-analysis started this summer

Data Samples and Processing
Data: GJet03 (Jet20, Jet50, Jet70…) from 4.5.2 Production, 100K events
MC: QCD samples produced using Pythia, 700K events
  L00 removed, beam offset included
Offline defaults:
  Everything re-processed using 4.5.0int7 (data) or 4.6.0 (MC)
  Alignment version “20 3 Test” used for data
  Drop ISLD bank (ISL and L00)
  Outside-in r-φ tracking only
  Track quality: ≥3 SVX hits, pT > 0.5 GeV, d0 < 1 mm (no COT cuts yet)
  Vertex taken from VXPRIM
  Impact parameter error convolves track error with vertex error
  Jet Probability cone size 0.4, ET > 7 GeV, |η| < 2.5

Track Categories for IP Fits
Conventions:
  The number of SVX “hits” along a track is considered to be the number of distinct SVX layers hit (not the total number of hits).
  An SVX “hit” is considered “shared” only if another outside-in track uses the same hit.
Track Categories:
  Since we consider only SVX layers, and since each layer may have a hit, a shared hit, or no hit, there are a maximum of 3^5 = 243 categories labeling the hits along a track (but only 192 have 3 SVX layers or more). We’ll consider fewer:
  Baseline (3 categories)
    3, 4, or 5 SVX hits along track (don’t care if hits are shared)
  Hits+Shared (12 categories)
    3, 4, or 5 SVX hits by 0, 1, 2, or 3+ shared hits
  Hits+Shared+pT (36 categories)
    3, 4, or 5 SVX hits by 0, 1, 2, or 3+ shared hits by 3 pT bins: 0.5–2.0, 2.0–5.0, >5.0 GeV
  Hits+Shared+pT+η (72 categories)
    3, 4, or 5 SVX hits by 0, 1, 2, or 3+ shared hits by 3 pT bins by 2 η bins (central, forward)

Track Categories Continued
“mLayer” (52 categories)
  Extend the “Hits+Shared” category to parameterize where missing hits are, where the innermost layer reached by a track is, and whether the innermost layers have shared hits
  Doesn’t yet include any pT or η dependence

Comparison of Impact Parameter Distributions: Data vs.
MC
[Figure: two log-scale panels, signed impact parameter (D0, -0.1 to 0.1) and signed IP significance (D0/D0Err, -40 to 40), comparing Data, MC, and B-Bbar MC]
Jet data vs. QCD and bb MC, pT>40
Impact parameter distribution is narrower in MC than in data

IP Dependence on Number of Hit Layers
[Figure: signed impact parameter (D0) and signed IP significance (D0/D0Err) in Monte Carlo, compared for 3, 4, and 5 hits]
QCD MC, pT>40
IP significance depends on the number of hit layers: longer tails for fewer hits

IP Dependence on Number of Shared Hits
[Figure: signed impact parameter (D0) and signed IP significance (D0/D0Err) in Monte Carlo, compared for 0, 1, 2, and 3+ shared hits]
QCD MC, pT>40
IP significance depends on the number of shared hits: longer tails for more shared hits

IP Dependence on Track pT
[Figure: signed impact parameter (D0) and signed IP significance (D0/D0Err) in Monte Carlo, compared for low, mid, and high Pt]
QCD MC, pT>40
Includes all hit and shared hit categories
Not much dependence of IP significance on pT

IP Dependence on Track η
[Figure: signed impact parameter (D0) and signed IP significance (D0/D0Err) in Monte Carlo, compared for central and forward η]
QCD MC, pT>40
Includes all hit and shared hit categories
Not much dependence of IP significance on η

Fits to Impact Parameter Significance
Fit negative side of D0/D0Err distribution, separately for MC and data
  No subtraction of heavy flavor content in data yet
Fit the sum of 4 Gaussians
  8 parameters (assume means are zero)
Choose nonlinear bins to increase statistics in tails
  Transform axis to log(D0/D0Err)
[Figure: D0/D0Err distribution before and after the axis transformation]
n.b.
A Gaussian with µ=0 and σ=1 exhibits a peak at zero along the transformed axis log(D0/D0Err)

Baseline 3 Categories
[Figure: fits for 3, 4, and 5 SVX hits, shown for QCD MC (pT>40) and for jet data]

Group_0_Pt_0.5_2_0_noL00_allEta_SVX+COT
July 02, 2002, using: /grincdf/tmp1/tsybych/pythia_qcd_pt40_offset/stntuple/…
[Figure: 12 fit panels, "Group 0: Shared s, Hit h" for s = 0, 1, 2, 3 and h = 3, 4, 5; x axis from -2 to 8]
Stat-box summary (widths are the fitted SigxG … SigxG3):
  Panel             Nent     Chi2/ndf    Prob        SigxG    SigxG1   SigxG2   SigxG3
  _grp0_shr0_hit3   43163    136.2/136   0.4833      1.384    0.6442   4.767    12.05
  _grp0_shr0_hit4   174054   154.8/135   0.1165      1.16     0.652    4.037    10.16
  _grp0_shr0_hit5   525430   1101/138    0           1.448    0.7255   -0.7255  9.183
  _grp0_shr1_hit3   8470     207.1/135   3.878e-05   0.8891   4.102    11.09    11.09
  _grp0_shr1_hit4   39280    376.3/136   0           0.8678   3.149    9.778    9.778
  _grp0_shr1_hit5   125831   184/135     0.002715    1.319    0.7421   3.909    10.7
  _grp0_shr2_hit3   3854     169.9/138   0.03226     0.9883   6.727    13.57    13.57
  _grp0_shr2_hit4   11411    197/135     0.0002809   0.9712   4.522    11.09    11.09
  _grp0_shr2_hit5   39000    177/132     0.004665    0.7052   1.316    4.42     10.05
  _grp0_shr3_hit3   3932     149.8/137   0.2152      0.8637   5.689    12.79    12.79
  _grp0_shr3_hit4   12033    259.1/137   2.174e-10   0.8388   3.831    10.87    10.87
  _grp0_shr3_hit5   36525    293.3/135   2.639e-15   0.8551   2.816    -2.816   8.95

Group_1_Pt_2_5_0_noL00_allEta_SVX+COT
July 02, 2002, using: /grincdf/tmp1/tsybych/pythia_qcd_pt40_offset/stntuple/…
[Figure: 12 fit panels, "Group 1: Shared s, Hit h" for s = 0, 1, 2, 3 and h = 3, 4, 5; x axis from -2 to 8]
Stat-box summary (widths are the fitted SigxG … SigxG3):
  Panel             Nent     Chi2/ndf    Prob        SigxG    SigxG1   SigxG2   SigxG3
  _grp1_shr0_hit3   38226    159.7/139   0.11        1.218    0.656    5.605    12.8
  _grp1_shr0_hit4   149037   500.4/141   0           1.336    0.7157   -0.7157  11.56
  _grp1_shr0_hit5   437659   820.4/142   0           1.284    0.7255   -1.284   13.82
  _grp1_shr1_hit3   7484     140.5/140   0.4762      0.8432   2.031    14.3     14.3
  _grp1_shr1_hit4   33467    165.9/139   0.05772     0.6983   1.287    14.65    4.925
  _grp1_shr1_hit5   96522    419.1/141   0           1.411    0.7774   -1.411   13.39
  _grp1_shr2_hit3   3068     194.2/139   0.001089    0.9738   5.249    16.35    16.35
  _grp1_shr2_hit4   10502    197.2/138   0.0005259   0.867    1.456    13.07    7.577
  _grp1_shr2_hit5   34344    151.3/137   0.1914      1.253    0.699    4.521    15.37
  _grp1_shr3_hit3   3661     235.9/142   4.844e-07   0.8285   9.732    18.04    18.04
  _grp1_shr3_hit4   9490     172.6/140   0.03036     0.675    1.384    15.79    5.41
  _grp1_shr3_hit5   30574    193/141     0.001971    0.7361   1.4      4.581    14.56

Track Probability
Use the parameterized results and integrate the area under the curve to find the probability of a track getting this value of the impact parameter significance or larger
  Should be a flat probability distribution for
  prompt jets
  Should peak at zero for b/c jets
Combine into Jet Probability
  Constructed so that prompt jets still give flat distribution
[Figure: D0/D0Err resolution curve with the integrated tail area]

Group_1_Pt_2_5_0_noL00_allEta_SVX+COT
[Figure: 12 Track Probability Distribution panels, TP_grp1_shr0_hit3 through TP_grp1_shr3_hit5, probability from 0 to 1; stat boxes report 846 to 118090 entries per panel, means between 0.494 and 0.510, and RMS between 0.286 and 0.295]

Jet Probability Distribution for b-jets
Calculate using positive IP and 36 fit categories (pT+shr+hit)
Pythia bb MC
[Figure: Jet Prob (RPhi Pos) from 0 to 1, linear and log scale; JPRPHIP: Entries 47127, Mean 0.2063, RMS 0.2832]
Sharp spike near 0 probability

Jet Probability b-jet Tagging Efficiency
Cut JP<0.01 for Positive IP for b-jets in Pythia bb MC
Central jets: |η|<1
Must have ≥ 1 good track
[Figure: Tag Eff (0.2 to 0.8) vs. Jet Et (10 to 50 GeV) for the category sets mLayer, AllGrp, Pt+Shr+Hit, Shr+Hit, and baseline]

Jet Probability mis-Tagging Efficiency
Cut JP<0.01 for Negative IP for non-b-jets in Pythia QCD MC
Central jets: |η|<1
Must have ≥ 1 good track
[Figure: Tag Eff (0 to 0.025) vs. Jet Et (10 to 50 GeV) for the category sets mLayer, AllGrp, Pt+Shr+Hit, Shr+Hit, and baseline]

B-jet Efficiency vs.
Prompt Jet Efficiency
Central jets: |η|<1
Must have ≥ 1 good track
36 fit categories (pT+shr+hit)
[Figure: JP: B jet Tag Eff vs Prompt Jet Tag Eff; b-jet efficiency (0 to 1) against prompt-jet efficiency (10^-3 to 1, log scale); curves labeled "Positive IP for b-jets" and "Negative IP for b-jets"]

Old 4.2.0 IP Fit Categories
[Figure: JP: B jet Tag Eff vs Prompt Jet Tag Eff, same axes, using the old 4.2.0 IP fit categories]

Heavy Flavor Parton Tagging Studies
Select 2 highest ET jets from Pythia QCD MC, pT>40
[Figure: CombAna: Eff. HF Tag vs. Jet ET (0 to 100)]
  Probability that a jet is a b-jet (tagged at parton level)
[Figure: CombAna: Eff. HF Tag 2 vs. Et (JP Tag 1)]
  Probability that second jet is a b-jet when first jet is tagged with JP<0.01 (positive SIP)
[Figure: CombAna: Ratio of HF Tag 2 Eff. (JP Tag 1) to All HF Tag vs. Et]
  Ratio of above ⇒ b-jet “enrichment factor” ≈ 3.5

Jet Probability Tagging Study (MC)
Select 2 highest ET jets from Pythia QCD MC, pT>40
[Figure: CombAna: Jet Prob (Neg SIP), CombAna_jet_JPneg: Entries 182096, Mean 0.4946, RMS 0.2959; CombAna: Jet Prob (Pos SIP), CombAna_jet_JPpos: Entries 185924, Mean 0.4649, RMS 0.3038]
  Positive and negative SIP for all jets
[Figure: CombAna: Jet2 Prob (Neg SIP) (JP Tag 1), CombAna_JPNeg2JPTag1: Entries 4591, Mean 0.4609, RMS 0.2998; CombAna: Jet2 Prob (Pos SIP) (JP Tag 1), CombAna_JPPos2JPTag1: Entries 4789, Mean 0.4009, RMS 0.3097]
  Positive and negative SIP for second jet when first is tagged with JP<0.01 (positive IP)

Jet Probability Tagging Study (Data)
Select 2 highest ET jets from all jet data
[Figure: CombAna: Jet Prob (Neg SIP) and CombAna: Jet Prob (Pos SIP); the two panels have 50335 and 51521 entries]
  Positive and negative SIP for all jets
[Figure: CombAna: Jet2 Prob (Neg SIP) (JP Tag 1), CombAna_JPNeg2JPTag1: Entries 655, Mean 0.4471, RMS 0.2904; CombAna: Jet2 Prob (Pos SIP) (JP Tag 1), CombAna_JPPos2JPTag1: Entries 682, Mean 0.4078, RMS 0.3031]
  Positive and negative SIP for second jet when first is tagged with JP<0.01 (positive IP)

JetProb Tagging: MC vs. Data
[Figure: CombAna: Ratio Jet2 Prob (JP Tag 1) to All Jet Prob. (Pos SIP), MC; CombAna_RatioJPPos2JPTag1: Entries 102, Mean 0.4489, RMS 0.2951, Integral 98.93]
[Figure: CombAna: Ratio Jet2 Prob (JP Tag 1) to All Jet Prob. (Pos SIP), Data; CombAna_RatioJPPos2JPTag1: Entries 102, Mean 0.4549, RMS 0.2901, Integral 95.36]
Ratio of JP distributions: enriched tagged sample to non-enriched jets (normalized to equal areas)
MC and data tend to agree on JP tagging (very preliminary study)

Conclusions
Good indication that the Jet Probability algorithm is tagging heavy-flavor jets in data
  Will study the HF content in data (measure efficiency and purity)
  Study HF tagging efficiency in MC
  Optimize track categories for SVX
Interesting that the heavy-flavor discrimination power of the Jet Probability algorithm is not too sensitive to the precise shape of the IP fits (when expressed as HF efficiency vs. background eff.)
  Although getting flat track probabilities is sensitive to the fit shape
We plan to converge on a baseline set of track categories soon, and publish the fit results to the AC++ JetProbModule so that general users can apply this b-tagging tool
  Current studies have been using a Stntuple module
  A CDF Note and documentation on how to use JP will follow
Will extend the tool to include L00 and ISL in the future
  Earlier JP studies show that L00 significantly improves b-tagging efficiency
We would also like to generalize the Jet Probability algorithm to other jet algorithms, and even other objects (e.g. J/ψ)
  The Jet Probability algorithm just operates on the impact parameters of a collection of tracks about a given axis
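
Illustration (not from the talk): the remark that Jet Probability only needs the impact parameters of a set of tracks can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the JetProbModule code. It assumes the resolution function is a normalized sum of four zero-mean Gaussians in S = D0/D0Err, and it assumes the combination P = Π · Σ_{k<N} (−ln Π)^k / k!, with Π the product of the N track probabilities; the slides only say that the product is constructed to stay flat for prompt jets. The names WEIGHTS, SIGMAS, track_probability, jet_probability, and positive_jet_probability are hypothetical, and the numbers are placeholders rather than fitted values.

```python
import math

# Hypothetical resolution model: relative weights and widths of four
# zero-mean Gaussians in S = D0/D0Err. Placeholder numbers, NOT fit results.
WEIGHTS = (0.60, 0.30, 0.08, 0.02)
SIGMAS = (0.7, 1.3, 4.5, 12.0)


def track_probability(significance, weights=WEIGHTS, sigmas=SIGMAS):
    """Probability that a prompt track has |S| at least this large.

    For one zero-mean Gaussian of width sigma, P(|S| >= s) = erfc(s / (sqrt(2) sigma));
    the mixture tail is the weighted average of those terms.
    """
    s = abs(significance)
    tail = sum(w * math.erfc(s / (math.sqrt(2.0) * sig))
               for w, sig in zip(weights, sigmas))
    return tail / sum(weights)


def jet_probability(track_probs):
    """Combine N track probabilities into one jet probability.

    Assumed combination: prod * sum_{k=0}^{N-1} (-ln prod)^k / k!,
    which is uniform on [0, 1] when every input is uniform on [0, 1].
    """
    if not track_probs:
        raise ValueError("need at least one track probability")
    # Work with logarithms; the floor guards against log(0).
    log_prod = sum(math.log(max(p, 1e-300)) for p in track_probs)
    term, series = 1.0, 1.0  # k = 0 term
    for k in range(1, len(track_probs)):
        term *= -log_prod / k
        series += term
    return math.exp(log_prod) * series


def positive_jet_probability(significances):
    """Jet probability built only from tracks with positive signed S."""
    probs = [track_probability(s) for s in significances if s > 0]
    return jet_probability(probs) if probs else None


if __name__ == "__main__":
    print(positive_jet_probability([0.4, 1.1, 0.2]))   # prompt-like tracks
    print(positive_jet_probability([6.0, 9.5, 4.2]))   # displaced tracks
```

Selecting the negative-signed tracks instead of the positive ones would give the control quantity that the talk uses for mistag studies. Because the functions take only a list of significances, nothing in the sketch depends on the jet algorithm that supplied the tracks.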