Higgs Pair Production in bbττ Final States at the HL-LHC
by
Jay Mathew Lawhorn

Submitted to the Department of Physics in partial fulfillment of the requirements for the degree of BACHELOR OF SCIENCE at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2015

Jay Mathew Lawhorn, MMXV. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Physics, January 20, 2015
Certified by: Markus Klute, Associate Professor, Thesis Supervisor
Accepted by: Professor Nergis Mavalvala, Senior Thesis Coordinator, Department of Physics

Abstract

A measurement of standard model Higgs pair production in bbττ final states at the High Luminosity LHC is investigated. Higgs pair production can be used to measure the Higgs trilinear coupling constant, which uniquely determines the shape of the Higgs potential. The doubly hadronic, hadron-muon, and hadron-electron di-τ final states are considered, with a shape analysis on either the stransverse mass (doubly hadronic) or a BDT discriminant (hadron-muon, hadron-electron) distribution performed to extract expected significances. The expected 95% CL upper limit on the cross section times branching ratio from a combination of all three channels is 2.2 times the SM value, with an expected ±1σ uncertainty on the measured cross section of 67%, indicating this measurement is feasible.
Thesis Supervisor: Markus Klute
Title: Associate Professor

Acknowledgments

I am deeply grateful to Professor Markus Klute, my research supervisor. My experiences at MIT and future as a scientist have been unquestionably shaped by the amazing opportunities and challenges he presented me these past two years. A special thank you to Aram Apyan, with whom I worked closely on all our upgrade studies.

Also thank you to my previous research supervisors and mentors: Jim Annis, Jeff Kubo, Donna Kubik, James Battat, Professor Peter Fisher, and Shawn Henderson. The diverse skill set I gained in their groups was invaluable.

Thanks to all the current and past MIT-CMS members who made the group an easy and fun place to work, especially Kevin Sung, Leonardo di Matteo, Professor Christoph Paus, Max Goncharov, Valentina Dutta, Stephanie Brandt, Catherine Medlock, and Allison Christian.

Also thanks to my academic advisor Professor Jesse Thaler, Miri Skolnik, and Stephen Benyas, as well as Allison Mann, Chelsea Levy, Katharine Berry P.F., Dan Abercrombie, Ian Chen, Brandon Allen, Sid Narayanan, and all my other friends.

My research was funded by the MIT Undergraduate Research Opportunities Program and the MIT International Science and Technology Initiatives program.

Contents

1 Introduction
2 Higgs Physics
3 Signal Process
  3.1 The bbττ Final State
4 The Compact Muon Solenoid
  4.1 Object Reconstruction
5 Background Processes
6 Monte Carlo Samples
7 Event Selection
8 Signal Extraction
  8.1 Fully Hadronic Channel
  8.2 Semi-Leptonic Channels
  8.3 Statistical Interpretation
  8.4 Uncertainties
9 Results
  9.1 Cross Check Using 8 TeV Data Sets
  9.2 14 TeV Results
10 Conclusions
A Semi-Leptonic BDT

List of Figures

2-1 Feynman diagrams contributing to gluon fusion Higgs pair production.
7-1 Predicted m_ττ (top) and m_bb (bottom) distributions in the τ_hτ_h channel. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of five hundred.
7-2 Predicted m_ττ (top) and m_bb (bottom) distributions in the τ_μτ_h channel. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one thousand.
7-3 Predicted m_ττ (top) and m_bb (bottom) in the τ_eτ_h channel. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one thousand.
8-1 Predicted distribution of the p_T(bb) (top) and m_T2 (bottom) variables in the τ_hτ_h channel after mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one hundred.
8-2 Predicted distributions for the p_T(bb) (top) and m_T2 (bottom) variables in the τ_μτ_h channel after mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one thousand.
8-3 Predicted distributions for the p_T(bb) (top) and m_T2 (bottom) variables in the τ_eτ_h channel after mass window cuts.
The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of five thousand.
8-4 Predicted distribution of the BDT discriminant in the τ_μτ_h (top) and τ_eτ_h (bottom) channels for the signal region. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of five hundred.
9-1 Cross check with 8 TeV data for the τ_hτ_h channel. Predicted distributions for the m_T2 variable before (top) and after (bottom) mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by one thousand (top) or one hundred (bottom). In the bottom figure, the data is blinded for m_T2 > 100 GeV.
9-2 Cross check with 8 TeV data for the τ_μτ_h channel. Predicted distributions for the m_T2 variable before (top) and after (bottom) mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by ten thousand (top) or one thousand (bottom).
A-1 Input variable distributions for τ_μτ_h channel BDT. The signal is shown in blue, while the background is red.
A-2 Input variable distributions for τ_eτ_h channel BDT. The signal is shown in blue, while the background is red.
A-3 Correlation matrices for the (top) signal and (bottom) background samples in the τ_μτ_h channel.
A-4 Overtraining check for BDT classifier in the τ_μτ_h channel. The signal is
shown in blue, while the background is red.
A-5 ROC curve for BDT classifier in the τ_μτ_h channel.
A-6 Correlation matrices for the (top) signal and (bottom) background samples in the τ_eτ_h channel.
A-7 Overtraining check for BDT classifier in the τ_eτ_h channel. The signal is shown in blue, while the background is red.
A-8 ROC curve for BDT classifier in the τ_eτ_h channel.

List of Tables

3.1 Expected number of SM HH → bbττ events separated by ττ final state at √s = 14 TeV in 3000 fb⁻¹ of data.
6.1 List of SM background categories generated for CMS upgrade studies, including the main processes for each category, generator-level final states, and order in each coupling.
7.1 Summary of object-level selection criteria for each di-τ channel. The absolute isolation variable I is properly defined in the 8 TeV H → ττ analysis but not used here. The relative isolation variable R, as previously defined, is used in its place.
7.2 Expected yields in each channel for 3000 fb⁻¹ of integrated luminosity after baseline selection requirements.
7.3 Expected signal yields in each channel for 3000 fb⁻¹ of integrated luminosity after mass window requirements.
8.1 Expected signal yields in each channel for 3000 fb⁻¹ of integrated luminosity after requiring that the m_T2 variable is greater than 100 GeV.
9.1 Cross check with 8 TeV full simulation at two stages of the τ_hτ_h cut-based selection. The 8 TeV columns are from full simulation MC and are scaled to 8 TeV with 21 fb⁻¹ integrated luminosity, except (*) the signal and tt̄ yields, which are scaled to 14 TeV and 3000 fb⁻¹ integrated luminosity. The 14 TeV columns are from Delphes and are scaled to
3000 fb⁻¹ integrated luminosity.
9.2 Cross check with 8 TeV full simulation at two stages of the τ_μτ_h cut-based selection. The 8 TeV columns are from full simulation MC and are scaled to 8 TeV with 21 fb⁻¹ integrated luminosity, except (*) the signal and tt̄ yields, which are scaled to 14 TeV and 3000 fb⁻¹ integrated luminosity. The 14 TeV columns are from Delphes and are scaled to 3000 fb⁻¹ integrated luminosity.
9.3 Statistical results of the analysis, showing the asymptotic 95% CL upper limit on the expected cross section and the expected 1σ uncertainties on the cross section measurement.

Chapter 1 Introduction

The standard model (SM) of particle physics was developed in the 1960s and 1970s to explain observations at sub-atomic scale and correspondingly high energy. The theory has been highly successful, predicting the existence of W and Z bosons, gluons, and three heavy quarks before their respective discoveries. The recent discovery [1-3] of the Higgs boson at the Large Hadron Collider (LHC) was another major confirmation of the SM. All current measurements of Higgs boson properties are consistent with SM predictions [4-6], but some phenomena cannot be measured in the data acquired. Additionally, significant experimental evidence suggests the SM is not yet complete, including the gravitational force, dark matter, and neutrino masses.

The LHC is scheduled to continue data taking until 2022, when the proposed High Luminosity LHC (HL-LHC) project will make major upgrades to the accelerator complex, increasing the number of collisions per second by a factor of 2.5. The HL-LHC is expected to take 3000 fb⁻¹ of data over ten years and will allow observations of currently unobservable Higgs phenomena as well as precision measurements of previously known SM parameters.
The Higgs trilinear coupling constant is one SM parameter that could be measured at the HL-LHC. This constant governs the rate of interactions involving three Higgs bosons, including Higgs pair production events where one (highly off-shell) Higgs boson decays to two on-shell Higgs bosons. The trilinear coupling constant is particularly sensitive to non-SM phenomena in the Higgs sector because it uniquely determines the width of the SM Higgs potential. The SM cross section for Higgs pair production is almost minimal due to interference between two possible Feynman diagrams. Large deviations from the SM trilinear coupling constant increase the production cross section, making Higgs pair production searches uniquely sensitive to non-SM processes.

This thesis investigates a measurement of the Higgs pair production cross section in final states containing two b-quarks and two τ leptons at the Compact Muon Solenoid (CMS) experiment during the HL-LHC run. The HL-LHC configuration of the CMS detector is currently being designed, but will include a number of upgrades to combat both detector aging and the harsh HL-LHC run conditions. This analysis, as well as Higgs pair production analyses in the bbγγ and bbWW final states, will be included in the upcoming technical proposal for that upgrade.

Chapter 2 Higgs Physics

The Higgs mechanism [7-12] was proposed in 1964 to preserve local gauge invariance in the Lagrangian density while allowing mass terms for the fermions and gauge bosons [13-15]. In terms of the physical Higgs field H, the SM Lagrangian for Higgs interactions with vector bosons, Higgs bosons, and fermions is given by

$$\mathcal{L} = -\frac{m_f}{v}\,\bar{f} f H + \delta_V V_\mu V^\mu \left( \frac{2 m_V^2}{v} H + \frac{m_V^2}{v^2} H^2 \right) + \frac{m_H^2}{2v} H^3 + \frac{m_H^2}{8 v^2} H^4 \tag{2.1}$$

where m_f is the fermion mass, f is the fermion field, v is the Higgs vacuum expectation value, V is either a W or Z, δ_W = 1, and δ_Z = 1/2 [16]. The CMS Higgs coupling measurements from a combination of all analyzed final states in the full 8 TeV data set find all observed coupling constants to be consistent with the SM [6].
However, the Higgs decay modes to lighter quarks and leptons as well as the final two terms in Equation 2.1 cannot be probed in the current data due to their vastly smaller cross sections.

At a proton-proton collider like the LHC, gluon-gluon fusion via a top quark loop is the dominant mode for both single Higgs boson and Higgs pair production. Vector boson fusion (VBF) and associated production with either a vector boson (VH) or tt̄ pair (ttH) also contribute with smaller cross sections, but with additional tagging particles that can be exploited to increase the signal-to-background ratio. All four production modes were exploited for the Higgs discovery at √s = 7 and 8 TeV.

There are two main Feynman diagrams for Higgs pair production, shown in Figure 2-1. The right diagram shows a highly off-shell Higgs produced via gluon-gluon fusion that decays to a pair of less massive Higgs bosons. However, the dependence of the overall production cross section on λ_HHH is diluted by the left diagram, which also produces Higgs pairs without an HHH vertex. The overall production cross section is reduced because the two diagrams interfere destructively.

Figure 2-1: Feynman diagrams contributing to gluon fusion Higgs pair production.

At leading order in QCD, the SM partonic cross section for gluon-gluon fusion Higgs pair production is

$$\hat{\sigma}_{\mathrm{LO}}(gg \to HH) = \int_{\hat{t}_-}^{\hat{t}_+} d\hat{t}\, \frac{G_F^2\, \alpha_s^2(\mu_R)}{256 (2\pi)^3} \left[ \left| \frac{\lambda_{HHH}\, v}{\hat{s} - M_H^2 + i M_H \Gamma_H} F_\triangle + F_\square \right|^2 + \left| G_\square \right|^2 \right] \tag{2.2}$$

where Γ_H is the Higgs decay width, F_△, F_□, and G_□ are form factors, and the limits of integration are

$$\hat{t}_\pm = -\frac{\hat{s}}{2} \left( 1 - \frac{2 M_H^2}{\hat{s}} \mp \sqrt{1 - \frac{4 M_H^2}{\hat{s}}} \right)$$

where ŝ and t̂ are the partonic Mandelstam variables [17]. In the limit of an infinitely massive top quark, the form factors reduce to F_△ = 2/3, F_□ = -2/3, and G_□ = 0. Evaluated at √s = 14 TeV, the inclusive Higgs pair production cross section is 17.8 fb at leading order and 40.2 fb at NNLO [18].
Because SM Higgs pair production is nearly minimal, many beyond the SM models for the Higgs sector predict an increased cross section, including the Minimally Supersymmetric Standard Model (MSSM) and Higgs portal scenarios [19, 20]. For example, a Higgs portal scenario motivated by electroweak baryogenesis predicts yields of up to twenty times the SM expectation [21].

Chapter 3 Signal Process

In the 3000 fb⁻¹ of data collected at the HL-LHC, approximately one hundred and twenty thousand gluon-gluon fusion Higgs pair events are expected. However, these events are distributed among a large number of final states because of the large number of SM Higgs boson decay modes. Like the single Higgs searches, Higgs pair searches rely on minimizing all reducible backgrounds and precisely reconstructing the two Higgs mass peaks.

Much of the signal strength needed for the initial Higgs discovery lay in the H → γγ and H → ZZ → 4ℓ channels, which have quite small branching ratios of 0.23% and 0.00125% respectively. Because of the excellent lepton and photon resolutions achieved by the CMS and ATLAS detectors, the lack of neutrinos in the final states, the Higgs mass, and careful analysis efforts, these two channels outperformed the more likely channels bb, ττ, and WW. The single Higgs search in the bb final state, with a branching ratio of 57%, was complicated by overwhelming backgrounds and relatively poor jet resolution. The ττ and WW searches had the additional challenge of reconstructing neutrinos in the final state, which escape the detector.

While the Higgs mass resolution is still an important concern for Higgs pair searches, the much smaller inclusive cross section means large branching ratios are more important than in the single Higgs search. The large branching ratio for the H → bb process makes it a very attractive channel, especially when paired with a channel with more discriminating power.
The branching ratio for the H → ZZ → 4ℓ process is so small it is generally not a good candidate for these searches. The bbbb, bbγγ, bbττ, and bbWW final states have been considered in theoretical studies with mixed results [17, 22-24]. From these studies, the bbγγ and bbττ channels seem the most promising.

3.1 The bbττ Final State

This analysis focuses on the HH → bbττ channel because of its relatively large branching ratio of 7.29% compared to the cleaner bbγγ channel and its markedly lower backgrounds than the dominant bbbb channel. It is also an excellent standard candle for overall detector performance, as discussed further in the next chapter. The most promising theoretical work on this channel reported a possible measurement of the trilinear coupling constant λ_HHH with 30% uncertainty at the HL-LHC [23]. While promising, that study and similar ones neglect detector effects that must be taken into account and will likely reduce the significance of any result.

In 3000 fb⁻¹ of data, the expected yield in the bbττ final state is 8792 events. However, because τ leptons themselves decay before detection, we must consider each di-τ final state separately. The majority of τ leptons (64.8%) decay into a ν_τ with some combination of neutral and charged hadrons, classified by the number of charged hadrons into one-, three-, and five-prong decay modes. The remaining τ leptons decay almost equally into e ν_e ν_τ (17.8%) and μ ν_μ ν_τ (17.4%) modes [25].

Table 3.1: Expected number of SM HH → bbττ events separated by ττ final state at √s = 14 TeV in 3000 fb⁻¹ of data.

di-τ Final State      Notation   Yield
Electron-electron     τ_eτ_e       279
Muon-muon             τ_μτ_μ       266
Hadronic-hadronic     τ_hτ_h      3692
Electron-muon         τ_eτ_μ       545
Muon-hadronic         τ_μτ_h      1983
Electron-hadronic     τ_eτ_h      2028

Table 3.1 lists the number of expected HH → bbττ events for each ττ final state. The doubly hadronic final state, τ_hτ_h, dominates, while the τ_eτ_e and τ_μτ_μ final states are least likely.
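The yields in Table 3.1 follow from simple arithmetic on numbers quoted in the text: the NNLO cross section times the integrated luminosity gives the total number of Higgs pair events, the 7.29% branching ratio selects the bbττ final state, and products of the τ decay fractions split that yield by di-τ final state. The short Python sketch below reproduces this arithmetic. The function and variable names are illustrative only and are not taken from the analysis code.

```python
# Illustrative yield arithmetic; every input is a number quoted in the text.
SIGMA_HH_FB = 40.2        # NNLO gg -> HH cross section at 14 TeV, in fb
LUMINOSITY_FB = 3000.0    # HL-LHC integrated luminosity, in fb^-1
BR_BBTAUTAU = 0.0729      # branching ratio of HH -> bb tau tau

# Tau decay fractions: hadronic, electron, muon
TAU_BR = {"h": 0.648, "e": 0.178, "mu": 0.174}


def ditau_yields(n_bbtautau):
    """Split a bb-tau-tau yield into di-tau final states.

    Same-mode states get BR_a^2; mixed states get 2 * BR_a * BR_b
    because either tau can decay into either mode.
    """
    modes = list(TAU_BR)
    yields = {}
    for i, a in enumerate(modes):
        for b in modes[i:]:
            factor = 1.0 if a == b else 2.0
            yields[(a, b)] = factor * TAU_BR[a] * TAU_BR[b] * n_bbtautau
    return yields


n_hh = SIGMA_HH_FB * LUMINOSITY_FB    # total HH events
n_bbtautau = n_hh * BR_BBTAUTAU       # events in the bb tau tau final state
yields = ditau_yields(n_bbtautau)

for (a, b), n in yields.items():
    print(f"tau_{a} tau_{b}: {n:.0f}")
```

The factor of two for mixed final states is what makes the semi-leptonic channels comparable in size to the doubly hadronic one, even though each leptonic decay fraction is under 18%.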
Because the τ_eτ_e and τ_μτ_μ final states are completely overwhelmed by the SM Z → ee and Z → μμ processes respectively, they are not considered. While the τ_eτ_μ channel was initially considered, the Higgs mass reconstruction suffered from the four neutrinos in the final state. This, combined with overwhelming SM backgrounds, led to work on this channel being abandoned as well. This analysis considers three separate di-τ final states: τ_hτ_h, τ_μτ_h, and τ_eτ_h.

Chapter 4 The Compact Muon Solenoid

The CMS experiment [26] is one of two general-purpose physics detectors at the LHC. The detector itself is composed of tracker detectors and calorimeters in a 3.8 T magnetic field produced by a superconducting solenoid, surrounded by muon detectors. The CMS detector coordinates are given in terms of (η, φ), where η = -ln[tan(θ/2)] is the pseudo-rapidity, θ is the polar angle measured from the anticlockwise beam direction, and φ is the azimuthal angle.

The proposed HL-LHC program will provide 3000 fb⁻¹ of collisions at √s = 14 TeV starting in 2025 with an instantaneous luminosity of 5 × 10^34 cm⁻² s⁻¹. This corresponds to an average pileup (<PU>) of 140 interactions per bunch crossing, compared to the upcoming Run II with an expected <PU> = 25. Between the dramatically increased <PU> and projected aging effects, a non-upgraded CMS detector would struggle to produce physics results in this harsher environment.

In order to preserve or improve the detector performance achieved at √s = 7 and 8 TeV, the proposed Phase II CMS detector includes a new silicon tracker with coverage to |η| = 4.0 and new electromagnetic and hadronic calorimeters in the forward region, 1.6 < |η| < 3.0. The forward regions and regions closest to the beam line are most affected by both aging effects and increased pileup because the total energy deposition per unit and particle number density are highest in those regions.
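To give a feel for the pseudo-rapidity values quoted above, the following Python sketch converts between the polar angle θ and η using the definition η = -ln[tan(θ/2)] and its inverse. It is a minimal illustration; the function names are hypothetical and not part of any CMS software.

```python
import math


def pseudorapidity(theta):
    """Return eta = -ln(tan(theta / 2)) for a polar angle theta in radians."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Invert the definition: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))


# Polar angles at the edges of the detector regions mentioned in the text.
for eta in (1.6, 3.0, 4.0):
    theta_deg = math.degrees(polar_angle(eta))
    print(f"|eta| = {eta}: theta = {theta_deg:.1f} deg")
```

Tracker coverage out to |η| = 4.0 corresponds to particles travelling within roughly two degrees of the beam line, which is why the forward region is the most exposed to pileup.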
4.1 Object Reconstruction

Events in the CMS detector are analyzed using a particle flow algorithm [27-29], which considers information from all sub-detectors to identify and reconstruct individual particles in the event. It combines tracks from the inner tracker detectors, energy depositions in the various calorimeters, and muon segments in the muon detectors to create particle candidates based on track and muon segment extrapolation and the locations of energy depositions.

The Delphes fast simulation [30] is used to model the Phase II detector at <PU> = 140. The parameterized efficiencies used as input for Delphes are derived using a GEANT-based [31] full simulation of the proposed detector geometry.

Muons are identified by matching muon segments with tracks. The average muon identification efficiency is 98% for p_T > 30 GeV. The Delphes simulation does not include a muon fake rate.

Electrons are reconstructed from energy depositions in the electromagnetic calorimeter and compatible tracks in the tracker detector. The parameterized electron identification efficiency as a function of p_T and η is greater than 90% for electrons with p_T > 30 GeV. The electron fake rate from photon conversions in the tracker or other sources is not included in Delphes.

Electrons and muons from H → ττ decays are not expected to be near large numbers of hadrons, unlike leptons originating from jets. The lepton relative isolation variable R is a measure of how much unrelated activity is near the lepton candidate. It is calculated in Delphes as

$$R = \frac{\sum p_T^{\mathrm{charged}} + \max\left( \sum p_T^{\mathrm{neutral}} - \pi \rho \Delta R^2,\ 0 \right)}{p_T(\mathrm{lep})} \tag{4.1}$$

where p_T(lep) is the transverse momentum of the lepton, both sums run over objects within a cone of radius ΔR around the lepton, the first over charged hadrons and the second over neutral hadrons, and ρ is the average energy density per unit area for the event.
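The relative isolation defined in Equation 4.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the neutral sum is clamped at zero after the pileup subtraction, the inputs are plain lists of transverse momenta already restricted to the isolation cone, and the function name and signature are hypothetical rather than the Delphes interface.

```python
import math


def relative_isolation(lepton_pt, charged_pts, neutral_pts, rho, cone_radius):
    """Pileup-corrected relative isolation in the spirit of Equation 4.1.

    lepton_pt   : transverse momentum of the lepton candidate (GeV)
    charged_pts : pT of charged hadrons inside the cone (GeV)
    neutral_pts : pT of neutral hadrons inside the cone (GeV)
    rho         : average pileup energy density per unit area
    cone_radius : isolation cone radius, Delta R
    """
    # Expected pileup contribution inside the cone area pi * Delta R^2.
    pileup = math.pi * rho * cone_radius ** 2
    # The corrected neutral sum is not allowed to go negative.
    neutral = max(sum(neutral_pts) - pileup, 0.0)
    return (sum(charged_pts) + neutral) / lepton_pt


# A 40 GeV lepton with a little nearby activity in a Delta R = 0.4 cone.
iso = relative_isolation(40.0, [1.0, 0.5], [3.0], rho=10.0, cone_radius=0.4)
print(f"R = {iso:.4f}")
```

In this example the pileup estimate (about 5 GeV) exceeds the neutral sum, so only the charged hadrons contribute. A small R indicates an isolated lepton, as expected for leptons from τ decays rather than from jets.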
The πρΔR² term is subtracted to correct for the expected energy deposition from pileup within the cone around the lepton, allowing for better discrimination between leptons inside and outside jets.

Charged hadrons are reconstructed from energy depositions in the hadronic calorimeter and compatible tracks in the tracker. Neutral hadrons and photons are identified as energy depositions in the relevant calorimeters without matching tracks.

Jets are reconstructed using the anti-kt algorithm [32] with cone radius D = 0.4. The FastJet technique [33] is used to correct for pileup effects. Pileup jets are rejected using cuts on track-related and jet shape variables following previous pileup jet identification work within CMS [34], corresponding to 95% non-pileup jet efficiency and 20% pileup acceptance.

The CMS Combined Secondary Vertex algorithm (CSV) [35-37] is used to identify b-jets based on a likelihood discriminant which considers track impact parameters and the identification of displaced vertices from the relatively long-lived b-hadron. Jets originating from b-quarks are identified using the CSV medium working point with on average 68% efficiency, a 10% fake rate from c-quarks, and a 1% fake rate from light quarks.

In the CMS particle flow algorithm, hadronic τ decays are reconstructed from jets with an identified π⁰ decay and charged hadrons matching a hadronic τ decay mode [38]. However, the Delphes process for hadronic τ identification is greatly simplified and does not take into account the particular hadronic τ decay mode. Jets originating from hadronic τ decays are tagged with 65% efficiency and a fake rate of 1%. Both the efficiency and fake rate are flat in p_T and η, which is likely also a simplification. In data and the CMS full simulation, particular types of hadronic activity are much more likely to be incorrectly identified as hadronic τ decays than others.
Because the Delphes fake rate is applied to all jets without consideration of the underlying physics, its predictions could differ significantly from reality. In order to improve upon the Delphes hadronic τ-tagger, this analysis requires that a reconstructed hadronic τ candidate contain at least one Delphes isolated track, mimicking a full simulation restriction to one-prong τ decays.

The missing transverse energy, E_T^miss, is calculated as the negative vector sum of the transverse momenta of all particle flow candidate objects, $\vec{E}_T^{\,\mathrm{miss}} = -\sum_i \vec{p}_{T,i}$. Neutrinos are only reconstructed by the E_T^miss they create, but E_T^miss is also created by mis-reconstructing the momenta of objects in the event. Thus the E_T^miss resolution is degraded by increased pileup, as more objects give more opportunities for mis-reconstruction. After rejecting jets identified as pileup jets as described above, the E_T^miss resolution is on average 20 GeV for <PU> = 140.

The di-τ mass m_ττ is reconstructed using the SVFIT algorithm [39], which optimizes the mass resolution of the di-τ final state by performing a maximum likelihood fit that takes into account both the visible τ decay products and the E_T^miss, which includes contributions from 2-4 neutrinos. At 8 TeV, the SVFIT mass resolution was estimated to range between 10 and 20% depending on final state and category.

Chapter 5 Background Processes

There are a number of other SM processes with much larger cross sections than the signal process that also have two τ leptons and two b-jets in their final state. These include tt̄ events where the W bosons decay leptonically, and ZH and ZZ events, among others. Additionally, because there are non-negligible fake rates for both hadronic τ decays and b-jets, the analysis must consider processes that could fake this final state. QCD multijet processes could be a major source of fake hadronic τ decays or b-jets because of their overwhelmingly high production cross section, as could any other process with multiple jets.
One way to discriminate against backgrounds is to consider the di-τ mass m_ττ and di-b mass m_bb distributions. For the HH signal, both of these distributions should peak near M_H = 125 GeV. For the ZH background, we would expect one peak near M_H and one peak near M_Z. For the ZZ background, both peaks are expected near M_Z. For tt̄ and QCD multijet events, no peak is expected because the pairs of same-flavor objects do not come from a resonant decay.

We use one additional discriminant against the tt̄ background, the stransverse mass m_T2. It was proposed for this purpose in [23], but was originally designed for SUSY searches where a pair of equal-mass particles each decay into one invisible daughter and at least one visible daughter particle with unknown parent momenta. For the purposes of this analysis, it is defined as

$$m_{T2}(m_B, m'_B, \vec{b}_T, \vec{b}'_T, \vec{p}_T^{\,\Sigma}, m_C, m'_C) = \min_{\vec{c}_T + \vec{c}'_T = \vec{p}_T^{\,\Sigma}} \left\{ \max(m_T, m'_T) \right\} \tag{5.1}$$

where b_T and b'_T are the b-jet transverse momenta, m_B and m'_B are the b-jet masses, c_T and c'_T are the visible τ lepton candidate transverse momenta, m_C and m'_C are the visible τ lepton candidate masses, and

$$\vec{p}_T^{\,\Sigma} = \vec{p}_T^{\,\mathrm{miss}} + \vec{p}_T(\tau) + \vec{p}_T(\tau') = \vec{p}_T(W) + \vec{p}_T(W') \tag{5.2}$$

is the vector sum of the missing transverse momentum, presumably from neutrinos, and the transverse momenta of the visible τ decay products. For tt̄ events, the m_T2 variable is bounded above by the top mass. In contrast, the di-Higgs signal distribution is bounded only by √s/2.

In the following chapters, the background processes shown in plots are divided into five major categories: tt̄, SM H → ττ, Z → ττ, electroweak, and QCD. The tt̄ background is tt̄ events with no restriction on the final state, but in the signal region it is overwhelmingly composed of real τ decays and b-jets. The SM H → ττ background includes gluon-gluon fusion, VBF, VH, and ttH production modes to the ττ final state. In the signal region, it is dominated by [...], with some contribution from VBF jets and vector boson jets faking b-jets. The Z → ττ background includes events with any number of Z bosons and jets, but no W or H bosons. The electroweak background includes di-boson, tri-boson, single top, and W+jets processes. The QCD background is not shown in most plots, but is discussed separately in Section 9.1.

Chapter 6 Monte Carlo Samples

Monte Carlo (MC) samples for all signal and background processes were generated using the MC generation strategy developed for the Snowmass 2013 conference [40, 41], with the underlying physics processes simulated in Madgraph 5 [42], parton showering and fragmentation performed in PYTHIA 6 [43], and the simulation of τ lepton decays done using TAUOLA [44]. Detector simulation was performed using the Delphes fast simulator.

One million signal events were generated with the signal yield normalized to the NNLO Higgs pair production cross section at 14 TeV of 40.2 fb. The same number of events were also generated with the Higgs trilinear coupling constant λ_HHH set to 5×, 0×, -1×, and -5× λ_SM. Single Higgs samples constrained to the ττ final state for all four major production modes were also produced to provide increased statistics using the same generator level work flow.

The background processes were generated centrally for the CMS Phase II upgrade studies and are organized into five object categories at the generator level:

$$J = \{g, u, \bar{u}, d, \bar{d}, s, \bar{s}, c, \bar{c}, b, \bar{b}\}, \quad L = \{e^+, e^-, \mu^+, \mu^-, \tau^+, \tau^-, \nu_e, \nu_\mu, \nu_\tau\},$$
$$B = \{W^+, W^-, Z^0, \gamma\}, \quad T = \{t, \bar{t}\}, \quad H = \{h^0\}.$$
Main Processes                                  Final States                      Order
vector boson + jets                             B + nJ                            O(α_s^n α_w)
divector + jets                                 BB + nJ                           O(α_s^n α_w^2)
top pair + jets                                 TT + nJ                           O(α_s^(n+2))
top pair, off-shell T* → Wj + jets              TB + nJ                           O(α_s^(n+1) α_w)
single top (s- and t-channel) + jets            T + nJ                            O(α_s^(n-1) α_w^2)
off-shell B* → LL + jets                        LL + nJ [m_ll > 20 GeV]           O(α_s^n α_w^2)
top pair + boson                                TTB + nJ, TTH + nJ                O(α_s^(n+2) α_w)
off-shell divector B* → LL + jets               BLL + nJ [m_ll > 20 GeV]          O(α_s^n α_w^3)
tri-vector + jets, Higgs associated + jets      BBB + nJ, VH + nJ                 O(α_s^n α_w^3)
gluon fusion + jets                             H + nJ                            O(α_s^n α_h)
vector boson fusion + jets                      B + nJ, H + nJ [n > 2]            O(α_s^(n-2) α_w^3)

Table 6.1: List of SM background categories generated for CMS upgrade studies, including the main processes for each category, generator-level final states, and order in each coupling.

The included background samples are summarized in Table 6.1. Each sample was produced in orthogonal bins of the variable S, the scalar sum of the transverse momentum of all generator level particles, with the process cross section computed separately for each bin. The cross section of each event is computed at LO, with the branching ratio of each final state reweighted to enrich rare decay modes. NLO K-factors calculated using MCFM [45] and branching ratio scale factors are applied at the event level to produce the final event weight.

Because of the overwhelmingly large QCD multijet cross section and the relatively low probability for four QCD jets in a single event to fake exactly two τ lepton candidates and two b-jets, no QCD samples were produced. Instead, an 8 TeV data-driven estimate from same-sign ττ events, described in Section 9.1, was used to confirm that the QCD background is likely much smaller than the expected background contribution from MC-simulated processes.
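The per-event weighting described in this chapter can be illustrated with a short Python sketch. It assumes the common convention that a sample is normalized to cross section times integrated luminosity divided by the number of generated events, with the K-factor and branching ratio scale factor applied multiplicatively. The text does not spell out this formula, so the function below is a hypothetical illustration rather than the weighting code used in the analysis.

```python
def event_weight(sigma_lo_fb, k_factor, br_scale_factor, n_generated,
                 luminosity_fb=3000.0):
    """Weight applied to each MC event of one generated sample bin.

    sigma_lo_fb     : LO cross section of the bin, in fb
    k_factor        : NLO/LO K-factor for the process
    br_scale_factor : factor undoing the branching ratio reweighting
    n_generated     : number of generated events in the bin
    luminosity_fb   : target integrated luminosity, in fb^-1
    """
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    expected_events = sigma_lo_fb * k_factor * br_scale_factor * luminosity_fb
    return expected_events / n_generated


# Signal sample from the text: one million events normalized to 40.2 fb.
w_signal = event_weight(40.2, k_factor=1.0, br_scale_factor=1.0,
                        n_generated=1_000_000)
print(f"signal weight per event: {w_signal:.4f}")
```

With these inputs each generated signal event stands for about 0.12 expected events, so statistical fluctuations in the signal templates are small. Background samples with large cross sections and comparatively few generated events carry much larger weights, which is the origin of the limited MC statistics noted in later chapters.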
Chapter 7

Event Selection

The baseline selection criteria for the pT, η, and isolation of each object were established based on previous work on the H → ττ channel at CMS, the projected HL-LHC environment and detector capabilities, and the physics properties of the HH → bbττ signal. The object-level selection criteria for each channel are summarized in Table 7.1, along with the 8 TeV H → ττ analysis requirements [39].

The pT thresholds for the τμτh and τeτh channels were raised because, compared to the CMS Run I data, the HL-LHC data will have many more low-energy objects due to increased pileup, and it is unlikely that the available bandwidth for the trigger menu will increase proportionally. The τhτh channel pT threshold was not raised because the 8 TeV thresholds already limit the selection acceptance significantly. This analysis is restricted to essentially the same η regions as the Run I detector because the generator-level η distributions for signal events were predicted to be mostly central, leaving little motivation to consider the relatively poorly understood performance of the new forward detectors. The relative isolation criteria for leptons and hadronic τ candidates were relaxed because the isolated track requirement itself already greatly reduces the number of jets faking hadronic τ decays in the signal region.

The signal and background yields after applying the baseline selection criteria are shown in Table 7.2. The harsh requirements on hadronic τ decay kinematics result in a baseline signal yield for the τhτh channel of about 0.5% of the expected production cross section. The τeτh and τμτh channels fare better, retaining around 1-2% of their predicted cross sections. As expected, the tt̄ background completely overwhelms the signal in all channels, but the Z → ττ and electroweak backgrounds also make significant contributions to the τeτh and τμτh channels.
As previously discussed, the most powerful discriminators against non-resonant backgrounds and resonant Z → ττ events are mττ and mbb. The baseline mττ and mbb distributions are shown in Figures 7-1 (for τhτh), 7-2 (for τμτh), and 7-3 (for τeτh). In all figures, the signal distribution is scaled by a factor of five hundred to one thousand for visibility. Even at this baseline selection level, it is apparent the background MC statistics are lacking in the τhτh channel.

There is a clear peak in the signal distributions for all three channels. For the mbb distributions in all channels and the mττ distributions for the τμτh and τeτh channels, this peak is near 120 GeV as expected for Higgs decays. The mττ distribution for the τhτh channel peaks much lower, near 90 GeV, suggesting that further optimization of the SVFIT algorithm for <PU> = 140 is possible. However, the aim of this analysis is to demonstrate feasibility, not complete optimization.

The mττ and mbb window requirements were established separately for each channel to strike a balance between signal acceptance and background rejection. For the τhτh final state, the requirements are 90 < mbb < 140 GeV and 90 < mττ < 120 GeV. For the τeτh and τμτh channels, the requirements are 90 < mbb < 130 GeV and 100 < mττ < 150 GeV.

Table 7.3 shows the signal and background yields after applying mass window requirements. The background contributions in the τhτh channel are greatly reduced, but again it is apparent that the MC statistics limit the predictive power for that channel. For the τμτh channel, the tt̄ yield is reduced by a factor of ten after the mass window cuts, but still dwarfs the signal yield.

Table 7.1: Summary of object-level selection criteria for each di-τ channel. The absolute isolation variable I is properly defined in the 8 TeV H → ττ analysis but not used here. The relative isolation variable R, as previously defined, is used in its place.

di-τ Final State   Object   8 TeV Req.                                  14 TeV Req.
τhτh               τh       pT > 45 GeV, |η| < 2.1, I < 1.0 GeV         pT > 45 GeV, |η| < 2.1, R < 0.4
τμτh               μ        pT > 17-20 GeV, |η| < 2.1, R < 0.1          pT > 30 GeV, |η| < 2.5, R < 0.4
τμτh               τh       pT > 30 GeV, |η| < 2.4, I < 1.5 GeV         pT > 30 GeV, |η| < 2.1, R < 0.4
τeτh               e        pT > 20-24 GeV, |η| < 2.1, R < 0.1          pT > 30 GeV, |η| < 2.5, R < 0.4
τeτh               τh       pT > 30 GeV, |η| < 2.4, I < 1.5 GeV         pT > 30 GeV, |η| < 2.1, R < 0.4
All                b        -                                           pT > 30 GeV, |η| < 2.5

Table 7.2: Expected yields in each channel for 3000 fb^-1 of integrated luminosity after baseline selection requirements.

Process     τhτh                 τμτh                 τeτh
HH          23.6 ± 0.5           34.0 ± 0.6           30.6 ± 0.5
tt̄          (1.4 ± 0.1) × 10^4   (5.1 ± 0.1) × 10^5   (4.8 ± 0.1) × 10^5
Z → ττ      2300 ± 600           (1.7 ± 0.1) × 10^4   (1.0 ± 0.5) × 10^4
EWK         1500 ± 100           (3.9 ± 0.8) × 10^4   (2.7 ± 0.0) × 10^4
Single H    240 ± 20             960 ± 40             1000 ± 40

Table 7.3: Expected signal yields in each channel for 3000 fb^-1 of integrated luminosity after mass window requirements.

Process     τhτh          τμτh                 τeτh
HH          7.7 ± 0.3     14.4 ± 0.4           18.6 ± 0.4
tt̄          700 ± 300     (3.6 ± 0.2) × 10^4   (1.3 ± 0.0) × 10^5
Z → ττ      0.6 ± 0.6     83 ± 83              400 ± 300
EWK         60 ± 30       1600 ± 100           7300 ± 200
Single H    30 ± 10       80 ± 10              350 ± 30

Figure 7-1: Predicted mττ (top) and mbb (bottom) distributions in the τhτh channel. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of five hundred.

Figure 7-2: Predicted mττ (top) and mbb (bottom) distributions in the τμτh channel.
The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one thousand.

Figure 7-3: Predicted mττ (top) and mbb (bottom) distributions in the τeτh channel. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one thousand.

Chapter 8

Signal Extraction

The goal of this analysis is to estimate the expected significance of a Higgs pair production cross section measurement at the Phase II CMS detector. Because the signal yield is expected to be quite small compared to the backgrounds, a one-variable shape analysis for each channel is considered to exploit the differences between signal and background distributions in key variables. The CMS combine tool [46] was used to perform all statistical analysis.

For all three channels, a number of event-level variables were considered for signal extraction, though only the most promising are discussed below. The stransverse mass mT2 variable was defined in a previous chapter. The transverse momenta of the di-b system pT(bb), the visible di-τ system pT^vis(ττ), and the overall di-H system pT(HH) were also considered, as well as opening angles between the various selected objects. Many of these variables are correlated because, for example, a boosted di-b system with high pT(bb) will have a small opening angle.

8.1 Fully Hadronic Channel

The fully hadronic τhτh channel is already limited by the tt̄ MC statistics after the mass window cuts described in the previous chapter. Figure 8-1 shows the pT(bb) and mT2 distributions for this channel.
The signal is scaled by a factor of one hundred for visibility. Both show good separation between the signal and background distributions, but the mT2 variable is chosen for shape analysis.

Figure 8-1: Predicted distribution of the pT(bb) (top) and mT2 (bottom) variables in the τhτh channel after mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one hundred.

8.2 Semi-Leptonic Channels

Unlike the fully hadronic channel, the two semi-leptonic channels τμτh and τeτh have ample MC statistics after the mass window cuts described in the previous chapter. There are still overwhelming SM backgrounds, so further cuts on event-level variables are considered. The mT2 and pT(bb) distributions for the τμτh and τeτh channels are shown in Figures 8-2 and 8-3 respectively. In the τμτh channel, the signal distributions are scaled by a factor of one thousand for visibility, and in the τeτh channel by a factor of five thousand.

Based on these distributions, a further requirement that mT2 be greater than 100 GeV is applied to the τμτh and τeτh channels. Table 8.1 gives the expected signal and background yields after the mT2 cut. While it reduces the background yields by half in the τμτh channel and by a factor of ten in the τeτh channel without a significant reduction in signal yield, the signal-to-background ratio is still quite poor in both channels.

Table 8.1: Expected signal yields in each channel for 3000 fb^-1 of integrated luminosity after requiring that the mT2 variable is greater than 100 GeV.
Process     τμτh                 τeτh
HH          12.7 ± 0.3           12.5 ± 0.3
tt̄          (1.2 ± 0.1) × 10^4   (1.0 ± 0.1) × 10^4
Z → ττ      83 ± 83              0.6 ± 0.6
EWK         540 ± 50             570 ± 60
Single H    34 ± 4               40 ± 10

To improve the discrimination between signal and background samples, a boosted decision tree (BDT) is trained to separate the Higgs pair signal and tt̄ backgrounds after baseline selection. While in principle all descriptive variables can be used together to create excellent separation in the BDT training sample, in practice too many variables lead to overtraining, so that success in the training sample does not carry over into other samples. The two BDTs for this analysis are described in detail in Appendix A. Both are trained on the mT2 variable, and the masses, transverse momenta, and opening angles of the di-τ, di-b, and di-H systems after baseline selection cuts.

The BDT discriminants for each channel after all selection cuts are applied are shown in Figure 8-4 and are used for signal extraction. Both channels show good separation between signal and background, though the signal distributions are scaled by a factor of five hundred for visibility.

Figure 8-2: Predicted distributions for the pT(bb) (top) and mT2 (bottom) variables in the τμτh channel after mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of one thousand.
Figure 8-3: Predicted distributions for the pT(bb) (top) and mT2 (bottom) variables in the τeτh channel after mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of five thousand.

Figure 8-4: Predicted distribution of the BDT discriminant in the τμτh (top) and τeτh (bottom) channels for the signal region. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by a factor of five hundred.

8.3 Statistical Interpretation

Two methods are used to extract the statistical significance of the potential measurement. The first is the asymptotic CLs method [47], which reports an upper limit on the cross section that would still be consistent with the background-only hypothesis to a specified confidence level. The second is a maximum likelihood fit using the Asimov dataset, which estimates the expected precision on a cross section measurement.

8.4 Uncertainties

The maximum likelihood fit takes into account both the systematic uncertainties and the statistical uncertainty on the overall MC scale factor arising from the limited number of MC events; bin-by-bin statistical uncertainties are not included. The MC statistical uncertainties dominate for all three channels. The leading systematic uncertainties are a 20% uncertainty from the QCD scale at NLO for the signal process, and a 9% PDF uncertainty.
The systematic uncertainty on integrated luminosity is taken to be 2.6%, the same as in the 8 TeV data. Uncertainties on the jet, lepton, and missing energy scales are also included.

Chapter 9

Results

Before discussing the results of this analysis, a comparison of the 14 TeV Delphes and 8 TeV full simulation event yields and distributions is presented to show that those results are reasonable.

9.1 Cross Check Using 8 TeV Data Sets

Because a number of simplifying assumptions about detector performance were made to create the Delphes fast simulation, we compare to the very well understood 8 TeV detector using H → ττ full simulation samples. The major concerns are the QCD multijet backgrounds and hadronic τ performance.

The background yields from the Delphes samples at 14 TeV for the Phase II detector in the two most sensitive channels, τhτh and τμτh, were compared to full simulation samples at 8 TeV produced for the CMS H → ττ analysis [39]. A signal sample at 8 TeV was also produced in full simulation using the same production framework. The 8 TeV QCD contribution is estimated from data in same-sign di-τ events that otherwise pass the selection requirements. All 8 TeV samples have the relevant di-τ triggers applied.

Tables 9.1 and 9.2 show a comparison between the expected signal yields for 21 fb^-1 of integrated luminosity at √s = 8 TeV and 3000 fb^-1 of integrated luminosity at √s = 14 TeV at two stages of the cut-based selection in the τhτh and τμτh channels respectively. The Higgs pair and tt̄ yields from 8 TeV full simulation are scaled to 3000 fb^-1 and √s = 14 TeV. For the signal process, this is done using the 8 TeV NNLO Higgs pair production cross section of 9.8 fb [18]. For the tt̄ process, scale factors derived from MCFM are used. Overall, the full simulation and Delphes yields are consistent.
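The arithmetic behind rescaling an 8 TeV signal yield can be sketched as follows. This assumes the scale factor is simply the ratio of cross sections times the ratio of integrated luminosities; the text does not spell out the formula, any change in acceptance between 8 and 14 TeV is ignored, and the function name and argument names are hypothetical.

```python
def scale_yield_to_14tev(yield_8tev, xsec_8tev_fb, xsec_14tev_fb,
                         lumi_8tev_fb_inv=21.0, lumi_14tev_fb_inv=3000.0):
    """Rescale an 8 TeV yield to 14 TeV and a larger integrated luminosity.

    Assumption: the yield scales linearly with cross section and luminosity,
    with the selection acceptance taken to be unchanged.
    """
    xsec_ratio = xsec_14tev_fb / xsec_8tev_fb
    lumi_ratio = lumi_14tev_fb_inv / lumi_8tev_fb_inv
    return yield_8tev * xsec_ratio * lumi_ratio


# Signal cross sections quoted in the text: 9.8 fb at 8 TeV, 40.2 fb at 14 TeV.
signal_scale_factor = scale_yield_to_14tev(1.0, xsec_8tev_fb=9.8, xsec_14tev_fb=40.2)
```

Under these assumptions the signal scale factor is roughly 586, which is why 8 TeV signal yields of a few hundredths of an event correspond to a few tens of events in the starred table entries.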
There is some discrepancy between the signal and tt̄ yields after the mass window requirements are applied, but this is not surprising given the expected differences in jet pT and missing transverse energy resolution and response between <PU> = 21 and <PU> = 140. Additionally, this comparison demonstrates that the QCD contribution to the background, at least for 8 TeV and <PU> = 21, is negligible or zero when compared to the dominant tt̄ background. After the mT2 > 100 GeV selection cut in the τμτh channel, the QCD contribution is 7 ± 4 events, only slightly more than 10% of the 61 ± 2 tt̄ events.

Table 9.1: Cross check with 8 TeV full simulation at two stages of the τhτh cut-based selection. The 8 TeV columns are from full simulation MC and are scaled to 8 TeV with 21 fb^-1 integrated luminosity, except (*) the signal and tt̄ yields which are scaled to 14 TeV and 3000 fb^-1 integrated luminosity. The 14 TeV columns are from Delphes and are scaled to 3000 fb^-1 integrated luminosity.

            Baseline selection                           Mass window
Process     8 TeV                    14 TeV              8 TeV             14 TeV
HH          23.4 ± 1.2 (*)           23.6 ± 0.5          14.6 ± 1.2 (*)    7.7 ± 0.3
tt̄          (1.8 ± 0.1) × 10^4 (*)   (1.4 ± 0.1) × 10^4  530 ± 130 (*)     700 ± 300
Z → ττ      5.2 ± 0.8                2300 ± 600          0                 0.6 ± 0.6
EWK         3.4 ± 1.1                1500 ± 100          0.03 ± 0.03       60 ± 30
Single H    0.19 ± 0.04              240 ± 20            0.04 ± 0.02       30 ± 10
QCD         17.3 ± 4.8               -                   0                 -

The mT2 distributions for 8 TeV data and MC are also considered to demonstrate agreement between the two and to further consider the possible QCD multijet contributions to the background distributions at 14 TeV. Figure 9-1 shows the mT2 distributions at the baseline and mass window levels for the τhτh channel with the signal distribution scaled by a factor of one thousand (baseline) or one hundred (mass window). At the mass window level, the data points above mT2 = 100 GeV are not shown in order to blind the analysis. There are no QCD multijet events remaining after the mass window cuts. Figure 9-2 shows the same mT2 distributions for the τμτh channel with the signal distribution scaled by a factor of ten thousand (baseline) or one thousand (mass window). In both the τμτh and τhτh channels, there is excellent agreement between data and MC at the baseline selection level.

Table 9.2: Cross check with 8 TeV full simulation at two stages of the τμτh cut-based selection. The 8 TeV columns are from full simulation MC and are scaled to 8 TeV with 21 fb^-1 integrated luminosity, except (*) the signal and tt̄ yields which are scaled to 14 TeV and 3000 fb^-1 integrated luminosity. The 14 TeV columns are from Delphes and are scaled to 3000 fb^-1 integrated luminosity.

            Baseline selection                           Mass window
Process     8 TeV                    14 TeV              8 TeV                    14 TeV
HH          30.0 ± 1.6 (*)           34.0 ± 0.6          14.6 ± 1.1 (*)           19.2 ± 0.4
tt̄          (6.6 ± 0.0) × 10^5 (*)   (5.1 ± 0.1) × 10^5  (3.4 ± 0.1) × 10^4 (*)   (4.4 ± 0.2) × 10^4
Z → ττ      12.8 ± 1.3               (1.7 ± 0.1) × 10^4  0.43 ± 0.22              (2.3 ± 0.1) × 10^4
EWK         44.7 ± 4.4               (3.9 ± 0.8) × 10^4  3.4 ± 1.3                1893 ± 113
Single H    1.1 ± 0.1                960 ± 40            0.2 ± 0.1                1308 ± 45
QCD         29.9 ± 11.5              -                   6.9 ± 3.6                -

Figure 9-1: Cross check with 8 TeV data for the τhτh channel. Predicted distributions for the mT2 variable before (top) and after (bottom) mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by one thousand (top) or one hundred (bottom). In the bottom figure, the data is blinded for mT2 > 100 GeV.
Figure 9-2: Cross check with 8 TeV data for the τμτh channel. Predicted distributions for the mT2 variable before (top) and after (bottom) mass window cuts. The background yields are expected contributions from SM processes, while the signal yield is the expected contribution from SM Higgs pair production scaled by ten thousand (top) or one thousand (bottom).

9.2 14 TeV Results

The expected 95% CL upper limit on the branching ratio times cross section measurement and the expected 1σ uncertainty on the branching ratio times cross section measurement are computed for each channel separately, then combined. Both metrics are listed in Table 9.3 for the separate and combined scenarios.

The main result is that the expected upper 1σ uncertainty on the branching ratio times cross section measurement for SM Higgs pair production is 67%. This is less precise than the result achieved by [23], as expected given the detector effects taken into account in our simulation. However, the result is still comparable, which is encouraging overall for the bbττ final state. The τμτh channel appears to perform the best, followed by τhτh and finally τeτh. Note that the τeτh uncertainty on the cross section is consistent with zero. However, the doubly hadronic channel performance is much less definitive than the other two because it may be the result of some artifact in the distributions created by the limited statistics in that channel.

Table 9.3: Statistical results of the analysis, showing the asymptotic 95% CL upper limit on the expected cross section and the expected 1σ uncertainties on the cross section measurement.
Channel     95% CL upper limit   +1σ      −1σ
τhτh        3.35                 +119%    −89%
τμτh
τeτh        5.66                 +231%    −100%
Combined    2.19                 +67%     −57%

Chapter 10

Conclusions

This analysis indicates that a measurement of Higgs pair production in bbττ final states is feasible at the CMS detector during the HL-LHC run. The doubly hadronic and semi-leptonic di-τ channels are considered separately to establish selection cuts, before a shape-based signal extraction is performed. The doubly hadronic channel uses the mT2 distribution, while the two semi-leptonic channels are analyzed using a BDT discriminant trained to separate the tt̄ background from the signal. Additionally, the results from 14 TeV HL-LHC fast simulation samples were found to be consistent with 8 TeV full simulation samples, validating these projections.

The expected 95% CL upper limit on the cross section times branching ratio from a combination of all three channels is 2.2 times the SM value, with an expected +1σ uncertainty on the measured cross section of 67%. While much of the analysis is based on predictions that will be updated as the detector is built and commissioned, these results indicate that the bbττ final state could yield powerful constraints on the SM and non-SM Higgs sectors.

Appendix A

Semi-Leptonic BDT

The TMVA [48] package was used to train two BDTs for signal versus tt̄ discrimination in the τμτh and τeτh channels. Half of the MC events for each channel that pass the baseline selection requirements are used for training, while the other half comprise the testing sample.
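The half-and-half division of events can be illustrated with plain Python. This is only a sketch of the idea, not the TMVA interface: TMVA performs its own splitting internally, and the function name, the fixed seed, and the use of a random shuffle are all assumptions made for the example.

```python
import random


def split_in_half(events, seed=0):
    """Shuffle a copy of the events and split it into training and testing halves.

    The seed is fixed only so that the split is reproducible; the original
    list is left untouched.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Toy event identifiers standing in for MC events that pass baseline selection.
toy_events = list(range(10))
training_sample, testing_sample = split_in_half(toy_events)
```

Keeping the testing half entirely out of the training is what makes the overtraining checks in this appendix meaningful, since the classifier response is compared between events it has and has not seen.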
The variables used in these BDT are:

• visible di-τ mass mvis,
• visible di-τ transverse momentum pT^vis(ττ),
• di-τ opening angle $\Delta R(\tau\tau) = \sqrt{(\eta_{\tau_1} - \eta_{\tau_2})^2 + (\phi_{\tau_1} - \phi_{\tau_2})^2}$,
• di-b mass mbb,
• di-b transverse momentum pT(bb),
• di-b opening angle $\Delta R(bb) = \sqrt{(\eta_{b_1} - \eta_{b_2})^2 + (\phi_{b_1} - \phi_{b_2})^2}$,
• di-H mass mHH,
• di-H transverse momentum pT(HH),
• di-H opening angle $\Delta R(HH) = \sqrt{(\eta_{bb} - \eta_{\tau\tau})^2 + (\phi_{bb} - \phi_{\tau\tau})^2}$, and
• stransverse mass mT2.

Figure A-1: Input variable distributions for τμτh channel BDT. The signal is shown in blue, while the background is red.

Figure A-2: Input variable distributions for τeτh channel BDT. The signal is shown in blue, while the background is red.

Figure A-3: Correlation matrices for the (top) signal and (bottom) background samples in τμτh channel.

Figure A-4: Overtraining check for BDT classifier in the τμτh channel. The signal is shown in blue, while the background is red.

Figure A-5: ROC curve for BDT classifier in the τμτh channel.

Figure A-6: Correlation matrices for the (top) signal and (bottom) background samples in τeτh channel.
Figure A-7: Overtraining check for BDT classifier in the τeτh channel. The signal is shown in blue, while the background is red.

Figure A-8: ROC curve for BDT classifier in the τeτh channel.

Bibliography

[1] S. Chatrchyan et al. Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC. Phys.Lett., B716:30-61, 2012.
[2] S. Chatrchyan et al. Observation of a new boson with mass near 125 GeV in pp collisions at sqrt(s) = 7 and 8 TeV. JHEP, 06:081, 2013.
[3] G. Aad et al. Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Phys.Lett., B716:1-29, 2012.
[4] S. Chatrchyan et al. Study of the mass and spin-parity of the Higgs boson candidate via its decays to Z boson pairs. Phys. Rev. Lett., 110:081803, Feb 2013.
[5] G. Aad et al. Evidence for the spin-0 nature of the Higgs boson using ATLAS data. Phys.Lett., B726:120-144, 2013.
[6] V. Khachatryan et al. Precise determination of the mass of the Higgs boson and tests of compatibility of its couplings with the standard model predictions using proton collisions at 7 and 8 TeV. Technical Report arXiv:1412.8662. CERN-PH-EP-2014-288. CMS-HIG-14-009, CERN, Geneva, Dec 2014. Comments: Submitted to Eur. Phys. J. C.
[7] F. Englert and R. Brout. Broken Symmetry and the Mass of Gauge Vector Mesons. Phys.Rev.Lett., 13:321-323, 1964.
[8] P.W. Higgs. Broken symmetries, massless particles and gauge fields. Phys.Lett., 12:132-133, 1964.
[9] P.W. Higgs. Broken Symmetries and the Masses of Gauge Bosons. Phys.Rev.Lett., 13:508-509, 1964.
[10] G.S. Guralnik, C.R. Hagen, and T.W.B. Kibble. Global Conservation Laws and Massless Particles. Phys.Rev.Lett., 13:585-587, 1964.
[11] P.W. Higgs. Spontaneous Symmetry Breakdown without Massless Bosons.
Phys.Rev., 145:1156-1163, 1966.
[12] T.W.B. Kibble. Symmetry Breaking in Non-Abelian Gauge Theories. Phys.Rev., 155:1554-1561, 1967.
[13] S.L. Glashow. Partial-symmetries of weak interactions. Nucl.Phys., 22:579-588, 1961.
[14] S. Weinberg. A Model of Leptons. Phys.Rev.Lett., 19:1264-1266, 1967.
[15] A. Salam. Elementary Particle Physics: Relativistic Groups and Analyticity. page 367, 1968. Proceedings of the eighth Nobel symposium.
[16] M. Carena, C. Grojean, M. Kado, and V. Sharma. Status of Higgs Boson Physics. Chin.Phys., C38, 2014.
[17] J. Baglio, A. Djouadi, R. Grober, M.M. Muhlleitner, J. Quevillon, et al. The measurement of the Higgs self-coupling at the LHC: theoretical status. JHEP, 1304:151, 2013.
[18] D. de Florian and J. Mazzitelli. Higgs boson pair production at next-to-next-to-leading order in QCD. Phys. Rev. Lett., 111:201801, Nov 2013.
[19] T. Plehn, M. Spira, and P.M. Zerwas. Pair production of neutral Higgs particles in gluon-gluon collisions. Nucl.Phys., B479:46-64, 1996.
[20] M.J. Dolan, C. Englert, and M. Spannowsky. New physics in LHC Higgs boson pair production. Phys. Rev. D, 87:055002, Mar 2013.
[21] J.M. No and M. Ramsey-Musolf. Probing the Higgs portal at the LHC through resonant di-Higgs production. Phys. Rev. D, 89:095031, May 2014.
[22] M.J. Dolan, C. Englert, and M. Spannowsky. Higgs self-coupling measurements at the LHC. JHEP, 1210:112, 2012.
[23] A.J. Barr, M.J. Dolan, C. Englert, and M. Spannowsky. Di-Higgs final states augMT2ed: Selecting hh events at the high luminosity LHC. Phys.Lett., B728:308-313, 2014.
[24] D.E.F. de Lima, A. Papaefstathiou, and M. Spannowsky. Standard model Higgs boson pair production in the (bbbb) final state. Journal of High Energy Physics, 2014(8), 2014.
[25] K. Olive et al. Tau Branching Fractions. Chin.Phys., C38, 2014.
[26] S. Chatrchyan et al. The CMS experiment at the CERN LHC. Journal of Instrumentation, 3(08):S08004, 2008.
[27] Particle-Flow Event Reconstruction in CMS and Performance for Jets, Taus, and MET. Technical Report CMS-PAS-PFT-09-001, CERN, 2009. Geneva, Apr 2009.
[28] Commissioning of the Particle-Flow reconstruction in Minimum-Bias and Jet Events from pp Collisions at 7 TeV. Technical Report CMS-PAS-PFT-10-002, CERN, Geneva, 2010.
[29] Particle-flow commissioning with muons and electrons from J/Psi and W events at 7 TeV. Technical Report CMS-PAS-PFT-10-003, CERN, 2010. Geneva, 2010.
[30] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaitre, et al. DELPHES 3, A modular framework for fast simulation of a generic collider experiment. 2013.
[31] S. Agostinelli et al. GEANT4: a simulation toolkit. Nucl. Instrum. Meth., A506:250, 2003.
[32] M. Cacciari, G.P. Salam, and G. Soyez. The anti-kt jet clustering algorithm. Journal of High Energy Physics, 2008(04):063, 2008.
[33] M. Cacciari, G.P. Salam, and G. Soyez. FastJet User Manual. Eur.Phys.J., C72:1896, 2012.
[34] Pileup jet identification. Technical Report CMS PAS JME-13-005, CERN, 2013.
[35] S. Chatrchyan et al. Identification of b-quark jets with the CMS experiment. JINST, 8:P04013, 2013.
[36] Performance of b tagging at sqrt(s) = 8 TeV in multijet, ttbar and boosted topology events. Technical Report CMS-PAS-BTV-13-001, CERN, Geneva, 2013.
[37] Results on b-tagging identification in 8 TeV pp collisions. Technical Report CMS-DP-2013-005, CERN, 2013.
[38] S. Chatrchyan et al. Performance of τ-lepton reconstruction and identification in CMS. J. Instrum., 7(arXiv:1109.6034. CMS-TAU-11-001. CERN-PH-EP-2011-137):P01001. 33 p, Sep 2011.
[39] S. Chatrchyan et al. Evidence for the 125 GeV Higgs boson decaying to a pair of tau leptons. Journal of High Energy Physics, 2014(5), 2014.
[40] A. Avetisyan, J.M. Campbell, T. Cohen, N. Dhingra, J. Hirschauer, et al. Methods and Results for Standard Model Event Generation at √s = 14 TeV, 33 TeV and 100 TeV Proton Colliders (A Snowmass Whitepaper). Technical report, 2013.
[41] J. Anderson, A. Avetisyan, R. Brock, S. Chekanov, T. Cohen, et al. Snowmass Energy Frontier Simulations. 2013.
[42] J. Alwall, M. Herquet, F. Maltoni, O. Mattelaer, and T. Stelzer. MadGraph 5: Going Beyond. JHEP, 1106:128, 2011.
[43] T. Sjöstrand, S. Mrenna, and P.Z. Skands. PYTHIA 6.4 Physics and Manual. JHEP, 05:026, 2006.
[44] Stanislaw Jadach, Johann H. Kuhn, and Zbigniew Wąs. TAUOLA - a library of Monte Carlo programs to simulate decays of polarized tau leptons. Comp.Phys. Com., 64(2):275-299, 1991.
[45] J.M. Campbell and R.K. Ellis. MCFM for the Tevatron and the LHC. Nuc.Phys. B (Proc. Suppl.), 205-206:10-15, 2010. Loops and Legs in Quantum Field Theory, Proceedings of the 10th DESY Workshop on Elementary Particle Theory.
[46] Procedure for the LHC Higgs boson search combination in Summer 2011. Technical Report CMS-NOTE-2011-005. ATL-PHYS-PUB-2011-11, CERN, Geneva, Aug 2011.
[47] G. Cowan, K. Cranmer, E. Gross, and O. Vitells. Asymptotic formulae for likelihood-based tests of new physics. Eur.Phys.J., C71:1554, 2011.
[48] A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne, and H. Voss. TMVA: Toolkit for Multivariate Data Analysis. PoS, ACAT:040, 2007.