Technique for Long-Lived
Anomalously Charged Particle
Searches at ATLAS
Alejandro Javier Cortese
Class of 2012
Honors Thesis
Department of Physics
Duke University
Approved By:
Advisor: Mark Kruse, Ph.D
Reader: Henry Greenside, Ph.D
Reader: Stephen Teitsworth, Ph.D
Reader: Ashutosh Kotwal, Ph.D
© 2012 Alejandro Javier Cortese
All Rights Reserved
Abstract
All free particles predicted by the Standard Model are of charge |q| = 0 or |q| = e,
but multiple extensions of the Standard Model, including some supersymmetric theories,
postulate fundamental particles of anomalous charge. These extensions have motivated multiple group efforts to search for anomalously charged particles (ACPs) that are long-lived
enough to be detected within the ATLAS detector at the Large Hadron Collider (LHC).
Currently, searches for both long-lived fractionally charged and doubly charged particles are under way by the Highly Ionizing Particle (HIP) Group of the ATLAS Collaboration; these
searches are primarily focused on Drell-Yan production of ACPs. If such long-lived, anomalously charged particles exist, the charge profile of events constructed from measurements of
time-over-threshold (TOT) and trailing edge time (TE) from ATLAS’s Inner Tracker along
with energy loss (dE/dx) from the Pixel Detector and LAr Calorimeters, may offer a powerful discriminating tool. The purpose of Technique for Long-Lived Anomalously Charged
Particle Searches at ATLAS is to detail a technique the author developed to combine these
charge measurements to discriminate for ACPs most effectively and test the initial feasibility of these searches. The technique involved in this work is to utilize the resulting Fisher
distributions from a Fisher Linear Discriminant and apply a maximum likelihood fitting
method to extract limits related to the cross sections of ACP events and SM processes.
We first present results from a toy Monte Carlo that was used to test the initial feasibility of anomalously charged particle searches of |q| = 2e, (1/3)e, and (2/3)e using TOT and
TE measurements from the Transition Radiation Tracker. We then carry out our complete
technique with the Monte Carlo datasets that include the full ATLAS detector simulation utilized by the HIP group for a Doubly Charged Particle (DCP) search: Drell-Yan heavy
fermion pair production of 200 GeV singly charged q-balls serving as background and 200
GeV doubly charged q-balls serving as signal. We present both the Fisher distributions
with measures of separation for each variable and upper limits set at 95% confidence on the ratio of DCP to singly charged events out of 10,000 events using i) only time-over-threshold
and trailing edge time, ii) then including energy loss from the Pixel Detector and finally,
iii) including energy loss from the LAr calorimeters. The hope of this methodology is to
illustrate the additional sensitivity to anomalously charged particles gained through the
energy loss measurements. These limits are set using the simulated data detailed in §8.2 and
do not necessarily reflect upper limits set on real event data from ATLAS. We additionally
set upper limits to 95% confidence on the fraction of DCP events out of a range of total
events from 100 to 100,000 events to illustrate any inherent dependence on event statistics.
Contents

I Theoretical Foundations

1 Elementary Particle Physics

2 The Standard Model
  2.1 Classical Field Theory
  2.2 Noether’s Theorem and Conserved Currents
  2.3 The Dirac Equation and Gauge Theory
  2.4 Group Symmetries
  2.5 Quantum Field Theory and the Observable Quanta

3 Physics Beyond the Standard Model
  3.1 Anomalously Charged Particle Searches

II The Large Hadron Collider

4 The Large Hadron Collider and CERN
  4.1 The CERN Complex and Experiments at LHC

5 ATLAS
  5.1 Coordinate Basis and Variables
  5.2 Detector Composition
  5.3 The Inner Tracker
    5.3.1 The Pixel Detector
    5.3.2 The Semiconductor Central Tracker
    5.3.3 The Transition Radiation Tracker
  5.4 Calorimetry
    5.4.1 Electromagnetic Calorimeter
    5.4.2 Hadronic Calorimeter
  5.5 Muon Chamber
  5.6 Muon Trigger

III Technique for Anomalously Charged Particle Searches

6 Charge Profiles
  6.1 Time-Over-Threshold and Trailing Edge Time in the TRT
  6.2 Energy Loss in the Pixel Detector
  6.3 Energy Loss in the Calorimeters

7 Multivariate Analysis
  7.1 The Fisher Linear Discriminant
  7.2 TMVA Package
  7.3 Probability Density Functions and Cut Efficiencies
  7.4 Binned Maximum Likelihood Fitting
    7.4.1 Likelihood Functions
    7.4.2 Bayesian Interpretation
    7.4.3 Maximum Likelihood Fitting Implementation

8 Results
  8.1 Toy Monte Carlo
    8.1.1 Doubly Charged Particles
    8.1.2 Two-Thirds Charged Particles
    8.1.3 One-Third Charged Particles
  8.2 Full ATLAS Simulation Monte Carlo
    8.2.1 Fisher Linear Discriminant Results
    8.2.2 Maximum Likelihood Fitting

9 Conclusions

A Appendix: Derivation of the Fisher Linear Discriminant

B Appendix: Correlation Matrices
  B.1 Toy Monte Carlo
  B.2 Full Monte Carlo

C Appendix: Cut Efficiency and Significance Plots
  C.1 Toy Monte Carlo
  C.2 Full Monte Carlo

References
List of Figures

1  Standard Model Chart
2  Chirality of Fermions
3  Doubly Charged Higgs Production: Vector Boson Fusion
4  Doubly Charged Higgs Production: Drell Yan
5  CERN Complex
6  Labeled Schematic of the ATLAS Detector
7  Labeled Cross Section of the ATLAS Detector
8  Inner Tracker Schematic
9  Inner Tracker - Barrel Region Schematic
10 Inner Tracker - End-Caps Schematic
11 Labeled Schematic of Calorimeters
12 EM Calorimeter Module Schematic
13 Hadronic Calorimeter Module
14 Muon Chamber Schematic
15 TRT Straw Matrix Cross-Section
16 Time Over Threshold Schematic
17 Pixel Detector dE/dx Schematic
18 Track Traversing Pixel Module
19 EM Calorimeter LAr dE/dx Schematic
20 EM Calorimeter Shower
21 EM Calorimeter Track
22 Possible Choices for 2-dimensional FLD Projection
23 4-Dimensional Fisher Projection
24 Testing Phase: Fisher PDFs (FPDFs)
25 Straight Cut for a FPDF
26 Cut Efficiency
27 Toy MC - DCP and SCP - Mixed Plots
28 Toy MC - DCP and SCP - Parameter Discrimination Power
29 Toy MC - DCP and SCP - FPDFs
30 TTCP and SCP - Training Phase - Mixed Plots
31 TTCP and SCP - Parameter Discrimination Power
32 TTCP and SCP - FPDFs
33 OTCP and SCP - Training Phase - Mixed Plots
34 OTCP and SCP - Parameter Discrimination Power
35 OTCP and SCP - FPDFs
36 DCP and SCP - Training Phase - Mixed Plots
37 DCP and SCP - Parameter Discrimination Power
38 DCP and SCP - Scatter Profiles
39 DCP and SCP - FPDFs
40 ML Fitting with TRT FLD Analysis
41 ML Fitting with TRT and Pixel (dE/dx) FLD Analysis
42 ML Fitting with TRT, Pixel (dE/dx), and LAr (dE/dx) FLD Analysis
43 Upper DCP Fraction Limit Set as a Function of Total Number Events
44 Toy MC - DCP and SCP - Correlation Matrices
45 Toy MC - TTCP and SCP - Correlation Matrices
46 Toy MC - OTCP and SCP - Correlation Matrices
47 Full MC - DCP and SCP - Correlation Matrices
48 DCP and SCP - FLD Cut Efficiencies and Significance
49 TTCP and SCP - FLD Cut Efficiencies and Significance
50 OTCP and SCP - FLD Cut Efficiencies and Significance
51 DCP and SCP - FLD Cut Efficiencies and Significance
Acknowledgments
I am indebted to a tremendous number of people, not only for helping to shape the
work involved in this thesis, but also for their support and effort with respect to teaching me
physics. This thesis for me stands not only as the culmination of my HEP research, but also
of my experience studying physics at Duke University. Hence, first and foremost, I would like
to offer my thanks to the Duke University Physics Department, which places an immense
amount of importance on educating undergraduates and fostering their research. This
department has truly made me feel like a valued member of the Duke Physics community.
Of the individuals involved in this research, I owe many thanks to Prof. Mark Kruse for
all the support he has offered over the past few years. From the time I first approached
Prof. Kruse about doing HEP research all the way through final edits on this thesis, he
has provided guidance and mentoring that have been invaluable to me. In terms of keeping
my best interests in mind, I will consider myself extremely fortunate if the next research
advisor I have cares half as much about supporting my research and my understanding as
Prof. Kruse has over these years.
A huge part of this research for me was learning new physics. I would thus like to extend
my thanks to the various professors in the Duke Physics departments who helped shape my
understanding of physics and were willing to put up with me as a student; I remain grateful
that so many did not lose their patience with the incessant number of questions I ask. In
particular, I would like to extend a special thanks to Prof. Henry Greenside who provided
me a vast amount of guidance and knowledge during my time at Duke. The effort that
Prof. Greenside puts into his courses and his students is unparalleled and something I valued
greatly as an undergraduate. I truly appreciated that he was always willing to let me pick
his brain about physics, graduate school, complex time, philosophies, and life. I would
also like to thank Prof. Ronen Plesser who has helped me realize what I most enjoy about
physics and has always taken the time to discuss any topic I threw at him over coffee at
Joe Van Gogh. I greatly appreciate his sincere and contagious enthusiasm for physics.
I would also like to thank the Benjamin N. Duke Scholarship for funding my undergraduate education. The extent to which I have been able to dedicate myself to my studies
and this research is primarily owing to the BN Duke Scholarship Program. I feel immense
gratitude for the opportunity it has provided me at Duke that I could certainly not have had
otherwise. Additionally, I would like to thank the Mellon Mays Undergraduate Fellowship
and the Duke ATLAS Group for funding and organizing my summer at CERN. It was an
experience of a lifetime to take part in LHC research in Geneva.
Finally, I would like to thank my parents and my sister, Carina. They have tolerated
a level of obsessiveness towards physics that most could not, all while being nothing but
supportive and loving to me. It means a great deal to me to have parents who fostered
and encouraged all my pursuits and a sister who, among other things, cares enough to call
back when she senses over the phone that I have equations on the scratch paper in front of
me.
Preface
If you read the acknowledgments, realized from the table of contents that there are forty
pages before the results section, or merely read this sentence, you have probably noted that
my writing is a bit, well, long-winded. After noticing the suggested page limit of seventy
pages for undergraduate theses, I considered shaving down the content in the introduction
and more theoretically oriented sections of this thesis. Instead, I have left the sections
I considered trimming as they are, in the hope that some future student may find them useful.
In particular, §2.1-§2.5 provide a lengthy – although extremely cursory – introduction to
the Standard Model where I have attempted to explain the basic motivating connections
between classical field theory, conserved currents, gauge invariance, quantization, and elementary particles in a manner that I would have wanted to read as an undergraduate
beginning research and also fits in the confines of a thesis. Also, since we use charge measurements from various subdetectors of ATLAS, in §6 I take the opportunity to make a
greater connection between the apparatus involved (TRT, Pixel Detector, or LAr calorimeters) and the physical processes being measured (TOT, TE, or dE/dx) than is usually the
norm in most detector-composition sections of theses for ATLAS research. Finally, in §7
and Appendix A, I provide a somewhat explicit explanation of the statistical techniques
used. If this thesis were a more formal dissertation or journal entry, some, if not all, of the
information mentioned above would have to be removed. However, considering this work as
an undergraduate thesis, I think the information helps more than the extra length detracts.
Also, since I put a good deal of effort into them, I will note that any schematic, illustration, or Feynman diagram not cited immediately below the image was made by me specifically for that section using some combination of PowerPoint, GIMP, LaTeX, and Mathematica.
Part I
Theoretical Foundations

1 Elementary Particle Physics
Elementary particle physics addresses one of the oldest questions facing scientists: what
is matter composed of at the most fundamental level? Our notion of the fundamental
composition of matter has progressed from the initial concept of an inseparable atom, to
electrons, protons and neutrons, and has been further revised to include photons, mesons,
gauge bosons, antiparticles, neutrinos, and most recently, quarks. Along with these changes
to what we consider to be the most fundamental particles, came changes to what models
were proposed to explain them. From the Bohr model, to the development of nonrelativistic
quantum mechanics and quantum field theory, we have been continuously paralleling these
revisions with new theories to explain them. And, as was the case for the positron and
quarks, at times the theory advanced ahead of the observed phenomena. Currently, the most well-established model we have to explain the state of particle physics is the Standard Model.
2 The Standard Model
The Standard Model (SM) has stood for almost half a century as the most powerful, predictive theory available to particle physics. To delve into the full theory and mathematics of
SM physics would not only be entirely out of the scope of this thesis,∗ but it would also be far too extensive to examine within the confines of this work. However, one cannot
help but give context to the subject upon the shoulders of which this thesis stands, so in
what follows, I will attempt to at least motivate some of the primary concepts laying the
foundation of SM physics.
2.1 Classical Field Theory
To begin, it is beneficial to first look to the Lagrangian formulation of classical field theory.
Using the Lagrangian formulation of classical mechanics for particles, we consider an action with Lagrangian, $L$, a function of the generalized coordinates and velocities, $q$ and $\dot{q}$:
$$ S = \int_{t_1}^{t_2} dt\; L(q, \dot{q}, t). $$
However, motivated by classical field theory, we will consider an action determined by a Lagrangian density, $\mathcal{L}$, as a function of a field, $\phi$, which is itself a function of position in spacetime:
$$ S = \int_{t_1}^{t_2} dt \int_{x_1}^{x_2} d^3x\; \mathcal{L}(\phi, \dot{\phi}, \nabla\phi). $$
Introducing tensor notation, Einstein’s summation convention, and the metric $g_{\mu\nu} = \mathrm{Diag}(1, -1, -1, -1)$, this can be simplified to
$$ S = \int_{x_1}^{x_2} d^4x\; \mathcal{L}(\phi, \partial_\mu \phi). \tag{1} $$
∗ and the current reach of the author.
Varying $S$, we are led to
$$\begin{aligned}
\delta S &= \int d^4x \left[ \frac{\partial \mathcal{L}}{\partial \phi}\,\delta\phi + \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\delta(\partial_\mu \phi) \right] & \text{(2)} \\
&= \int d^4x \left[ \frac{\partial \mathcal{L}}{\partial \phi} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right]\delta\phi + \int d^4x\; \partial_\mu\!\left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\delta\phi \right) & \text{(3)} \\
&= \int d^4x \left[ \frac{\partial \mathcal{L}}{\partial \phi} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right]\delta\phi + \left[ \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\delta\phi \right]_{x_1}^{x_2} & \text{(4)} \\
&= \int d^4x \left[ \frac{\partial \mathcal{L}}{\partial \phi} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right]\delta\phi, & \text{(5)}
\end{aligned}$$
where in the third and fourth steps we have used that the function being integrated is a four-divergence and the integral is held fixed at the boundary points. In fact, from the definition
Equ.(1), the action is invariant under any transformation of the form L → L + ∂µ j µ where
j µ vanishes at the boundary; a property we will need in §2.2.
Since $\delta\phi$ is arbitrary in Equ.(5), if the action is to be at an extremum, we must have that
$$ \frac{\partial \mathcal{L}}{\partial \phi} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} = 0. \tag{6} $$
These simultaneous equations constitute the Euler-Lagrange equations for a field. If there
are multiple fields, there is a corresponding Euler-Lagrange equation for each.
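As a concrete check of Equ.(6), one can apply it to the free Klein-Gordon Lagrangian for a real scalar field (the standard introductory example, treated at the beginning of Peskin and Schroeder [18], rather than a result of this thesis):

```latex
\mathcal{L}_{\mathrm{KG}} = \tfrac{1}{2}\,\partial_\mu \phi\, \partial^\mu \phi
                          - \tfrac{1}{2}\, m^2 \phi^2
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \phi} = -m^2 \phi,
\qquad
\partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}
  = \partial_\mu \partial^\mu \phi,
```

so Equ.(6) becomes $(\partial_\mu \partial^\mu + m^2)\phi = 0$, the Klein-Gordon equation.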
2.2 Noether’s Theorem and Conserved Currents
Now following along the lines of Peskin and Schroeder’s discussion [18], we can note that
this formulation lends itself to illustrating conserved currents using Noether’s Theorem.
Consider an infinitesimal transformation of our field,
$$ \phi \to \phi + \epsilon\,\Delta\phi, \tag{7} $$
where $\epsilon$ is assumed to be small. This transformation results in our Lagrangian∗ becoming
$$\begin{aligned}
\mathcal{L} &\to \mathcal{L}(\phi + \epsilon\Delta\phi,\; \partial_\mu(\phi + \epsilon\Delta\phi)) \\
&= \mathcal{L}(\phi, \partial_\mu \phi) + \epsilon\,\frac{\partial \mathcal{L}}{\partial \phi}\,\Delta\phi + \epsilon\,\frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\partial_\mu(\Delta\phi) \\
&= \mathcal{L}(\phi, \partial_\mu \phi) + \epsilon\,\partial_\mu\!\left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\Delta\phi \right) + \epsilon\left[ \frac{\partial \mathcal{L}}{\partial \phi} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} \right]\Delta\phi,
\end{aligned}$$
where we have used the infinitesimal definition of partial derivatives and have only kept terms of order $\epsilon$. Imposing the Euler-Lagrange equations from Equ.(6), we have
$$ \mathcal{L} \to \mathcal{L}(\phi, \partial_\mu \phi) + \epsilon\,\partial_\mu\!\left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\Delta\phi \right). \tag{8} $$
∗ At this point, we will drop the term ‘density’ in ‘Lagrangian density’, since we will only be considering densities from here forward.
Now, let us consider this transformation to be a symmetry if the Lagrangian remains
invariant under the infinitesimal transformation. By Noether’s Theorem, this restriction
will introduce a conservation law. Note that doing so is a more restrictive requirement than
just making the action invariant; Equ.(8) already satisfies that restriction. For a Lagrangian
$\mathcal{L}'$ to be invariant under our transformation, Equ.(8) implies that we must have
$$ \partial_\mu\!\left( \frac{\partial \mathcal{L}'}{\partial(\partial_\mu \phi)}\,\Delta\phi \right) = 0. \tag{9} $$
Defining the Noether current, $j^\mu$, as
$$ j^\mu = \frac{\partial \mathcal{L}'}{\partial(\partial_\mu \phi)}\,\Delta\phi, $$
Equ.(9) immediately implies that this current is a conserved quantity:
$$ \partial_\mu j^\mu = 0. \tag{10} $$
Furthermore, noting that
$$ \partial_\mu j^\mu = 0 \iff \partial_0 j^0 = -\partial_i j^i, $$
we can easily see that integrating over all space,
$$ \frac{d}{dt}\int d^3x\; j^0 = \int d^3x\; \partial_0 j^0 = -\int d^3x\; \partial_i j^i = 0, $$
leads to a conserved scalar. So defining
$$ Q = \int d^3x\; j^0, \tag{11} $$
we see that in addition to Eq.(10), we can state that the symmetry leads to a “charge” that
remains constant in time.
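As a standard illustration (not a field used elsewhere in this thesis), consider a complex scalar field with Lagrangian $\mathcal{L} = \partial_\mu \phi^*\, \partial^\mu \phi - m^2 \phi^* \phi$, which is invariant under the global phase rotation $\phi \to e^{-iq\lambda}\phi$. With $\Delta\phi = -iq\phi$ and $\Delta\phi^* = iq\phi^*$, the current defined above becomes

```latex
j^\mu = \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,(-iq\phi)
      + \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi^*)}\,(iq\phi^*)
      = iq\left( \phi^*\, \partial^\mu \phi - \phi\, \partial^\mu \phi^* \right),
```

and the conserved scalar $Q = \int d^3x\, j^0$ is the total charge carried by the field, prefiguring the Dirac-field result of §2.3.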
But suppose that we want to impose that a pre-existing Lagrangian be invariant under
our transformation. In other words, if we have some Lagrangian, $\mathcal{L}$, such that
$$ \partial_\mu\!\left( \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\Delta\phi \right) \neq 0 $$
and is well supported by experiment, how would we go about adjusting $\mathcal{L}$ in a way that produces the same equations of motion while also being invariant under our infinitesimal transformation?
Well, we saw in §2.1 that we can add the four-divergence of any physically reasonable
current∗ to the Lagrangian and have the variation of the action remain unaffected; this
∗ In our case we are integrating over all space, so a ‘physically reasonable’ current is one that vanishes at infinity.
seems like a promising possibility since it would leave the equations of motion invariant. So suppose for a moment that we can find a $J^\mu$ that transforms as
$$ J^\mu \to J^\mu + \epsilon\,\frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\Delta\phi $$
under our infinitesimal transformation (which is not necessarily an easy task). Then one can easily check that the modified Lagrangian
$$ \mathcal{L}' = \mathcal{L} - \partial_\mu J^\mu $$
will remain invariant. $\mathcal{L}'$ thus satisfies the two requirements that we imposed; the equations of motion remain unaffected and the Lagrangian is invariant under Equ.(7). Also, if our choice of $J^\mu$ has no dependence on $\partial_\mu \phi$, then our conserved current is simply
$$ j^\mu = \frac{\partial \mathcal{L}'}{\partial(\partial_\mu \phi)}\,\Delta\phi = \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)}\,\Delta\phi = J^\mu. \tag{12} $$
So far, the above may seem a bit pedagogical and more about noticing invariants inherent
in differential equations than any physical predictions. To fix that inadequacy, we will
consider the simplest example relevant to our discussion of SM physics.
2.3 The Dirac Equation and Gauge Theory
Relativistic quantum systems of spin-1/2 are governed by the Dirac equation
$$ (i\gamma^\mu \partial_\mu - m)\psi = 0, $$
where we use units $\hbar = c = 1$ and our “field” is now $\psi$, a four-component wavefunction in
Hilbert space known as a spinor.∗ γ µ denote the standard 4 × 4 gamma matrices. Here, we
note that using the Lagrangian
$$ \mathcal{L} = \bar{\psi}(i\gamma^\mu \partial_\mu - m)\psi, \tag{13} $$
the Euler-Lagrange equations yield
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \bar{\psi}} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \bar{\psi})} = 0 &\;\Rightarrow\; i\gamma^\mu (\partial_\mu \psi) - m\psi = 0 \\
\frac{\partial \mathcal{L}}{\partial \psi} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \psi)} = 0 &\;\Rightarrow\; i(\partial_\mu \bar{\psi})\gamma^\mu + m\bar{\psi} = 0
\end{aligned}$$
∗ What proceeds in the next two sections follows closely with the discussion in Chapter 11 of Griffiths on gauge transformations and theories [13], but here we make the needed connection to Noether’s current.
which are consistent with the Dirac and adjoint Dirac equations∗ , so we take it to be the
“free” Dirac Lagrangian.†
From Eq.(13) it is rather obvious that the global gauge transformation of
$$ \psi \to e^{-iq\lambda}\,\psi $$
leaves L unchanged. However, consider a local gauge transformation, i.e., one dependent
on the position in spacetime, x, of the form
$$ \psi \to e^{-iq\lambda(x)}\,\psi, $$
where λ is a scalar function. In contrast to the global transformation, we see that the local
transformation induces the following transformation to the Lagrangian:
$$\begin{aligned}
\mathcal{L} &\to i(e^{+iq\lambda(x)}\bar{\psi})\gamma^\mu \partial_\mu (e^{-iq\lambda(x)}\psi) - m(e^{+iq\lambda(x)}\bar{\psi})(e^{-iq\lambda(x)}\psi) \\
&= i\bar{\psi}\gamma^\mu \left[ -iq(\partial_\mu \lambda(x))\psi + \partial_\mu \psi \right] - m\bar{\psi}\psi \\
&= \mathcal{L} + (q\bar{\psi}\gamma^\mu \psi)\,\partial_\mu \lambda(x). \tag{14}
\end{aligned}$$
At last, we arrive at one of the fundamental concepts of SM physics. Although in electrodynamics we often choose a gauge (Coulomb, Lorentz, etc.) and gain some mathematical
convenience, here we will require that the complete Lagrangian be invariant under local
gauge transformations and discover the physical, conserved currents within our system. So
with this idea in mind, how do we go about killing off the second term in Eq.(14)?
To answer that question, we look back to electrodynamics. As mentioned above, gauge
invariance does appear in classical electrodynamics when considering Maxwell’s inhomogeneous equations
$$ \partial_\mu F^{\mu\nu} = 4\pi J^\nu \tag{15} $$
governing a massless vector potential‡ , Aµ , where
$$ F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. $$
Now, as we did with the Dirac equation, we can construct a free Lagrangian consistent with
Eq.(15), to wit,
$$ \mathcal{L} = -\frac{1}{16\pi}\,F_{\mu\nu} F^{\mu\nu} - J^\mu A_\mu. $$
As we can easily check, the gauge transformation
$$ A_\mu \to A_\mu + \partial_\mu \lambda(x) $$
∗ Here, the adjoint wavefunction ψ̄ denotes ψ̄ = ψ†γ⁰, where, as usual, ψ† denotes the hermitian conjugate of ψ.
† I wish there were some way to derive such a Lagrangian from fundamental principles, instead of this guess-and-check method. Sadly, in relativistic QM we generally construct L to match the governing equation of motion.
‡ This is the vector potential of a very well-known particle, but I don’t want to spoil my punchline.
leaves
$$\begin{aligned}
\partial_\mu F^{\mu\nu} &\to \partial_\mu\left( \partial^\mu \{A^\nu + \partial^\nu \lambda\} - \partial^\nu \{A^\mu + \partial^\mu \lambda\} \right) \\
&= \partial_\mu(\partial^\mu A^\nu - \partial^\nu A^\mu) + \partial_\mu(\partial^\mu \partial^\nu - \partial^\nu \partial^\mu)\lambda \\
&= \partial_\mu(\partial^\mu A^\nu - \partial^\nu A^\mu) \\
&= \partial_\mu F^{\mu\nu}
\end{aligned}$$
invariant. Hence, the Lagrangian for this system transforms as
$$ \mathcal{L} \to \mathcal{L} - J^\mu \partial_\mu \lambda(x). \tag{16} $$
Comparing Eq.(14) and Eq.(16) we see that the transformation property of the vector
potential with an appropriate choice of J µ has the form needed to cancel the unwanted
term. So combining the free Dirac and free massless vector field Lagrangians,
$$ \mathcal{L}_{\mathrm{QED}} = \left[ i\bar{\psi}\gamma^\mu \partial_\mu \psi - m\bar{\psi}\psi \right] + \left[ -\frac{1}{16\pi}\,F_{\mu\nu} F^{\mu\nu} \right] + \left[ -q\bar{\psi}\gamma^\mu \psi A_\mu \right], \tag{17} $$
we have constructed a complete Lagrangian invariant under local gauge transformations, where we have determined $J^\mu$ to be
$$ J^\mu = q\bar{\psi}\gamma^\mu \psi. $$
Not only that, but as the label and boxing suggest, local gauge invariance has resulted in a description of quantum electrodynamics (QED):
$$\begin{aligned}
i\bar{\psi}\gamma^\mu \partial_\mu \psi - m\bar{\psi}\psi &\;\Rightarrow\; \text{Dirac Field} \\
-\tfrac{1}{16\pi}\,F_{\mu\nu} F^{\mu\nu} &\;\Rightarrow\; \text{Massless Vector Field} \\
-q\bar{\psi}\gamma^\mu \psi A_\mu &\;\Rightarrow\; \text{Electromagnetic Coupling}
\end{aligned}$$
Going even one step further, we see that the local gauge invariance described above is an
example of a symmetry with a corresponding Noether’s current. Consider the local gauge
transformation considered above to be infinitesimal. Then,
$$ \psi \to e^{-iq\lambda(x)}\psi = (1 - iq\lambda(x))\psi, $$
where we drop terms of order $\lambda^2$. Comparing with Eq.(7), we adjust notation from $\epsilon \to \lambda$ and $\Delta\phi \to -iq\psi$. Then from Eq.(12) and our free Dirac Lagrangian, we know the conserved Noether’s current for the transformation to be
$$ j^\mu = J^\mu = (i\bar{\psi}\gamma^\mu)(-iq\psi) = q\bar{\psi}\gamma^\mu \psi, $$
which is just the Dirac current present in $\mathcal{L}_{\mathrm{QED}}$. We can easily check that for $\psi$ satisfying the Dirac equation,
$$\begin{aligned}
\partial_\mu j^\mu &= q\,\partial_\mu(\bar{\psi}\gamma^\mu \psi) \\
&= q\left( (\partial_\mu \bar{\psi})\gamma^\mu \psi + \bar{\psi}\gamma^\mu (\partial_\mu \psi) \right) \\
&= q\left( im\bar{\psi}\psi - im\bar{\psi}\psi \right) \\
&= 0,
\end{aligned}$$
as it should. And we note that the conserved “charge” from this current is indeed the charge
of the system
Z
Q =
d3 x j 0
Z
= q d3 x ψ̄γ 0 ψ
Z
= q d3 x ψ † ψ
= q,
justifying our choice of the constant q.
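The gauge invariance of $F^{\mu\nu}$ demonstrated above can also be checked numerically. The sketch below is my own illustration, not code from the thesis analysis; the potential components and the gauge function λ are arbitrary smooth choices, and derivatives are taken by central finite differences on a (t, x) grid.

```python
import numpy as np

# Work on a 2D (t, x) slice, where F_{01} = d_0 A_1 - d_1 A_0 is the only
# independent component of the field-strength tensor.
t = np.linspace(0.0, 1.0, 200)
x = np.linspace(0.0, 1.0, 200)
T, X = np.meshgrid(t, x, indexing="ij")

# Arbitrary smooth potential A_mu and gauge function lambda(t, x).
A0 = np.sin(2 * np.pi * X) * T
A1 = np.cos(2 * np.pi * T) * X**2
lam = np.exp(-((T - 0.5) ** 2 + (X - 0.5) ** 2))

def F01(a0, a1):
    """F_{01} = d_0 a_1 - d_1 a_0, via central finite differences."""
    return np.gradient(a1, t, axis=0) - np.gradient(a0, x, axis=1)

# Gauge transformation: A_mu -> A_mu + d_mu(lambda).
A0p = A0 + np.gradient(lam, t, axis=0)
A1p = A1 + np.gradient(lam, x, axis=1)

# The extra pieces cancel because mixed partial derivatives commute,
# d_0 d_1 lambda = d_1 d_0 lambda, so F is unchanged up to rounding error.
diff = np.max(np.abs(F01(A0, A1) - F01(A0p, A1p)))
assert diff < 1e-8
```

The cancellation here is the discrete analogue of the $\partial_\mu(\partial^\mu \partial^\nu - \partial^\nu \partial^\mu)\lambda = 0$ step in the derivation above.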
2.4 Group Symmetries
In the above discussion, we used a notational choice of exp(−iqλ) for our gauge transformation, when we could have equally represented the same transformation with any unitary
1 × 1 matrix, U . Written more explicitly, the transformation
$$ \psi \to U\psi $$
satisfying
$$ U^\dagger U = 1 \tag{18} $$
would have been equivalent. Using this transformation, we refer to the local gauge invariance
present in our discussion of the Dirac field as U (1)∗ gauge invariance. Hence, we found that
applying U (1) gauge invariance to a single spinor obeying the Dirac equation yields the SM
description of QED and the electromagnetic interaction.†
Similarly, this process can be continued to explain further physical fields and interactions. Specifically, applying SU (3)‡ gauge invariance to a composite wavefunction of three
spinors obeying the Dirac equation, yields the SM description of quantum chromodynamics
(QCD) and the strong interaction. Lastly, applying SU (2) gauge invariance to a composite
wavefunction of two spinors obeying the Proca equation yields the SM description of the
weak interaction.§
Combining these groups yields the structure of SM physics, SU(3) × SU(2) × U(1),
which consists of a description of three fundamental forces:
SU (3) ⇒ Strong Force
SU (2) ⇒ Weak Force
U (1) ⇒ Electromagnetic Force
∗ This notation is used to represent the group of unitary 1 × 1 matrices. In general, U(n) would represent the group of n × n matrices satisfying Eq.(18).
† Obviously, considering how much we have simplified the discussion, the boldness of this statement is to be taken with a grain of salt.
‡ This follows the same notation as U(n) except the inclusion of the S denotes that the group is also “special”, having determinant of one.
§ This is increasingly more complex because the vector potentials introduced are no longer massless.
At sufficiently high energies, this model successfully predicts the unification of the electromagnetic and weak forces into SU (2) × U (1), describing the electroweak force. Since there
is not currently a complete theory of quantum gravity, SM does not include the fourth
fundamental force, gravity.
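To make the group-theoretic statements above concrete, the short numerical sketch below (my own illustration, not part of the thesis) constructs SU(2) elements by exponentiating the Pauli matrices and verifies the defining properties: unitarity, Eq.(18), together with unit determinant (the “special” condition).

```python
import numpy as np

# Pauli matrices: the generators of SU(2) (up to a factor of 1/2).
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

def su2_element(theta, n):
    """exp(-i (theta/2) n.sigma) = cos(theta/2) I - i sin(theta/2) n.sigma,
    using (n.sigma)^2 = I for a unit vector n."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)  # normalize the rotation axis
    n_dot_sigma = sum(ni * si for ni, si in zip(n, sigma))
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * n_dot_sigma

U = su2_element(1.3, [0.2, -0.5, 0.8])

# Unitarity, Eq.(18): U^dagger U = 1.
assert np.allclose(U.conj().T @ U, np.eye(2))
# "Special": det U = 1, which distinguishes SU(2) inside U(2).
assert np.isclose(np.linalg.det(U), 1.0)
```

The same construction with the eight Gell-Mann matrices in place of the Pauli matrices would generate elements of SU(3).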
2.5 Quantum Field Theory and the Observable Quanta
At this point, we have described SM physics using fields and vector potentials, yet the model
is well known for the structure it creates with respect to particles. To describe how these
fields and interaction currents give rise to observable particles, we would need to make the
full transition to quantum field theory (QFT) and quantize the fields being discussed. As
mentioned before, to go into the full quantization of these fields is both tremendously time
consuming and beyond the reach of this thesis. Here, we will have to settle for presenting
the important results, cursory as they may be, as described below.
Quantizing the above fields using QFT in a sense amounts to Fourier transforming the
differential equation governing a field and using the structure of the resulting equation to
reinterpret the fields as an infinite number of quantum harmonic oscillators permeating free
spacetime.∗ From this interpretation, excitations from vacuum† in these fields correspond
to the observable quanta: particles. From the Dirac field, this quantization yields twelve‡
fundamental particles; six leptons and six quarks, each with a corresponding antiparticle
with opposite quantum numbers. These particles constitute all known Fermions, which
have spin-1/2, and are organized into three generations of increasing mass. There are three
lepton types (electron, muon and tau) each with a corresponding neutrino, a nearly massless,
neutral particle.
Lepton                   Generation   Electric Charge (e)   Mass (MeV)
electron (e)             1st          -1                    0.510998910 ± 1.3 × 10⁻⁹
electron neutrino (νe)   1st           0                    < 2 × 10⁻⁶
muon (µ)                 2nd          -1                    105.658367 ± 4 × 10⁻⁵
muon neutrino (νµ)       2nd           0                    < 1.9
tau (τ)                  3rd          -1                    1776.82 ± 0.16
tau neutrino (ντ)        3rd           0                    < 18.2

Table 1: Lepton Properties and Data [20]
There are 6 flavors of quark (up, down, charm, strange, top, and bottom), each of which
can carry one of three colors (red, blue, or green). This results from the triplet of spinors
∗ For instance, an introductory example of this process is done at the beginning of Peskin and Schroeder [18] using the Klein-Gordon equation.
† For those who are familiar with quantum harmonic oscillators, this phrase refers to applying the raising ladder operator on the ground state.
‡ The counting here is a matter of taste. Some would state that I should have written twenty-four since an antiparticle is also fundamental, but the symmetry between these groups is so great that I will continue to count in this manner. It avoids continuing this argument further to counting thirty-six quarks because there are six quarks, six antiquarks and three of each color.
introduced to gain a description of QCD, and hence the root “chromo” in chromodynamics.
The notion of color in this context has nothing to do with the color associated with light
and merely stands as a convention to label and handle the symmetries of SU (3).
Flavor        Generation   Electric Charge (e)   Mass (MeV)
up (u)        1st          +2/3                  1.7 - 3.3
down (d)      1st          -1/3                  4.1 - 5.8
charm (c)     2nd          +2/3                  1270 +70/-90
strange (s)   2nd          -1/3                  101 +29/-21
top (t)       3rd          +2/3                  172,000 +900/-1,300
bottom (b)    3rd          -1/3                  4190 +180/-60

Table 2: Quark Properties and Data [20]
From SU(3), SU(2) and U(1) gauge invariance, SM predicts a total of 12 spin-1 gauge
bosons that mediate three of the four fundamental forces. U (1) gauge invariance generates
a single gauge boson, the photon, mediating the electromagnetic interaction between all
charged particles. SU (3) gauge invariance generates eight gauge bosons, gluons, mediating
the strong interaction between quarks. SU (2) gauge invariance generates three bosons,
W + , W − , and Z, mediating the weak interaction between all fermions.
Gauge Boson   Force Mediated    Electric Charge (e)   Mass (GeV)
photon (γ)    electromagnetic   0                     0
gluon (g)     strong            0                     0
W±            weak              ±1                    80.399 ± 0.023
Z             weak              0                     91.1876 ± 0.0021

Table 3: Gauge Boson Properties and Data [20]
Lastly, the inclusion of a Higgs mechanism in the SM predicts the existence of a spin-0
boson, the Higgs boson, that interacts with all massive particles (including itself), giving
them mass. This boson is predicted from arguments of symmetry breaking and has yet to
be observed experimentally at the time of writing of this thesis. A schematic summary of
the fundamental particles of the Standard Model is shown below.∗
∗ Not included in this figure is the Higgs boson.
Figure 1: Standard Model Chart
3 Physics Beyond the Standard Model
One of the often-noted shortcomings of the SM is the fact that weak interactions only
take place between left-handed fermions. Here the handedness, or chirality, of a particle is
determined by the orientation of its intrinsic spin relative to its velocity: if the spin points
in the same direction as the velocity, the particle is referred to as right-handed, whereas if
the spin points opposite to the velocity, it is referred to as left-handed.
Figure 2: Chirality of Fermions: (a) Left-handed; (b) Right-handed
Using gauge group notation, the SM picture of the electroweak interaction is often
written as
SU(2)L × U(1)
where SU(2)L denotes the left-handed gauge group for the weak interactions.
3.1 Anomalously Charged Particle Searches
Now instead of leaving the aforementioned left-skewed attribute present in the SM to be
a feature inherent in nature, many theories extend the SM to have a right-handed gauge
group for weak interactions.[8]∗ One of these theories is the Left-Right Symmetric Model
(LRSM) with
SU(2)L × SU(2)R × U(1).
By extending SM physics in this manner, the LRSM predicts a triplet of complex Higgs
fields, with resulting left- and right-handed gauge bosons ∆^0_{L,R}, ∆^+_{L,R} and ∆^{++}_{L,R}.[8, 12, 23]
The mass of the doubly positively charged Higgs boson predicted from this model is on the
order of ∼100 GeV. This is in contrast to the SM, which suggests a scalar Higgs field with
a single, neutral, left-handed gauge boson ∆^0_L.
The Feynman diagrams for two possible processes that would lead to an observable ∆^{++}_{L,R}
at high enough energies are shown below.†

Figure 3: Doubly Charged Higgs Production: Vector Boson Fusion
W^+_{L,R} + W^+_{L,R} → ∆^{++}_{L,R}

Figure 4: Doubly Charged Higgs Production: Drell-Yan
q + q̄ → ∆^{++}_{L,R} + ∆^{−−}_{L,R} → 4l
The above extension of the Standard Model and others that similarly predict fundamental particles of anomalous charge have motivated multiple group efforts to search for
anomalously charged particles (ACPs) that are long-lived enough to be detected at the
LHC and other detectors. In 1995, the OPAL Collaboration at CERN set limits on
long-lived fractionally and doubly charged particles up to a mass of approximately 45 GeV,
∗ Unless otherwise stated, the reference for the material in this section is [8].
† In Figure 4 the notation γ/Z/Z′ denotes that the mediator can be a photon, Z boson or a Z′ boson, where Z′ is the theorized right-handed partner to the left-handed Z boson.
finding no evidence of such signals.[21] Currently, there are both long-lived, fractionally and
doubly charged particle searches under way by the Highly Ionizing Particle Group of the
ATLAS Collaboration with respect to the above Drell-Yan production of ACPs. For more
information regarding long-lived, fractionally charged particle searches, see [19]. There are
also dark matter searches underway that involve long-lived, doubly charged particles.[1]
If such long-lived, anomalously charged particles exist, the charge profile of events constructed from ATLAS’s Inner Tracker and Calorimeters, detailed in §5.3 and §5.4, may
offer a powerful discriminating tool. The purpose of what follows in this thesis is to detail
a technique to combine measurements from these charge profiles to discriminate for ACPs
most effectively and test the initial feasibility of these searches.
Part II
The Large Hadron Collider
4 The Large Hadron Collider and CERN
4.1 The CERN Complex and Experiments at LHC
CERN, the European Organization for Nuclear Research, is the scientific research center
which is the site of the Large Hadron Collider (LHC). With a circumference of 27 km, the
LHC is the world’s largest particle accelerator and is designed to produce proton-proton
(p-p) collisions at the highest energies and luminosities ever experimentally observed.
To reach the energies that the LHC was designed to achieve, the protons first pass
through a series of smaller accelerators. At optimal operation, these protons are stripped
from a hydrogen source and accelerated to 50 MeV in the Linac 2, 1.4 GeV in the Proton
Synchrotron Booster (PSB), 25 GeV in the Proton Synchrotron (PS), 450 GeV in the Super
Proton Synchrotron (SPS) and lastly, reach 7 TeV in the LHC. [11] The structure of this
series of accelerators is depicted in Figure 5. There are two oppositely circulating beams of
protons, so when these protons collide at the interaction point they have a center-of-mass
energy of 14 TeV and a luminosity of 1034 cm−2 s−1 . These high energy, high luminosity
collisions allow for the production of extremely rare particle events and thus, the possibility
to discover new physics phenomena.
Figure 5: CERN Complex [10]
Although there are many experiments going on at CERN, there are two primary general-purpose detectors aimed at measuring p-p collisions: the Compact Muon Solenoid (CMS)
and A Toroidal LHC ApparatuS (ATLAS). These two detectors measure similar attributes
of the p-p collisions; however, each detector does so using different methods. The analysis
of this thesis focuses entirely on the ATLAS detector, detailed below.
5 ATLAS
With the LHC designed to produce luminosities on the order of 1034 cm−2 s−1 at energies of
14 TeV, the ATLAS detector is designed to provide more precise measurements of existing
physical phenomena such as QCD and electroweak interactions while also contributing to
the search for the SM Higgs boson and new physics beyond SM. In what follows, we detail
the composition of the detector and its various subdetectors.
The primary source for the description that follows is [3]. Any other source will be
explicitly cited.
Figure 6: ATLAS Detector: Above we have a labeled schematic of the ATLAS detector.
The detector is 25 m high, 44 m long and over 7,000 tons. As a reference of scale, there are
two people shown near the left end-cap.[3]
5.1 Coordinate Basis and Variables
Within this thesis, there are many references to the standard coordinate system and common
variable names used in ATLAS research. In what follows, we detail our specific choices of
notation.
The interaction point of the opposing beams at ATLAS’s center is taken to be the origin
of our coordinate system. The z-axis points along the beam axis while the positive y-axis
points up and the positive x-axis points towards the center of the LHC ring. The positive z-axis is taken to be such that the coordinate axes compose a right-handed coordinate system,
i.e., the z-axis points along the counter-clockwise direction as seen from above, looking down
at the LHC ring.
Two angles used are the polar angle, θ, which is the angle from the positive z-axis
(0 ≤ θ ≤ π) and the azimuthal angle, φ, which is the angle measured around the z-axis
starting from the positive x-axis (0 ≤ φ ≤ 2π). More commonly, instead of using θ,
pseudorapidity, η, is utilized, where

    η = −ln tan(θ/2),

0 ≤ |η| ≤ ∞. Pseudorapidity is an approximation of the exact rapidity,

    y = (1/2) ln[(E + pz)/(E − pz)],

for highly relativistic particles, i.e., y → η as v → 1. To gain an idea of this variable
transformation, θ = 45°, 10°, and 1° yield η ≈ 0.88, 2.44, and 4.74, respectively.
Using this coordinate system, we define distances between particle tracks in pseudorapidity-azimuthal space by

    ∆R = √((∆φ)² + (∆η)²).
And finally, parameters such as momentum and energy are often measured using only
their transverse (x and y) components, since their z-components are not as well measured
(we can only apply conservation of momentum in the transverse plane). These quantities will
be denoted with a subscript T, such as pT for transverse momentum and ET for transverse
energy.
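To make the coordinate variables above concrete, here is a minimal sketch (the function names are illustrative, not from the thesis) computing pseudorapidity from the polar angle and the ∆R separation between two tracks:

```python
import math

def pseudorapidity(theta: float) -> float:
    """eta = -ln(tan(theta/2)), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))

def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Separation in pseudorapidity-azimuthal space; dphi is wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    deta = eta1 - eta2
    return math.hypot(dphi, deta)

# Reproduce the example values quoted in the text:
for deg in (45.0, 10.0, 1.0):
    print(f"theta = {deg:4.1f} deg -> eta = {pseudorapidity(math.radians(deg)):.2f}")
```

The azimuthal difference must be wrapped around ±π before being combined with ∆η, since φ is a periodic coordinate.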
5.2 Detector Composition
The ATLAS detector is composed of four primary, concentric sections. The first section,
closest to the interaction point is the Inner Tracker, which is responsible for reconstructing
the tracks and momentum of charged particles. The next section is the Electromagnetic
Calorimeter, which is primarily responsible for measuring the energy of electrons and photons. The third section, the Hadron Calorimeter, is primarily responsible for measuring
the energy of hadrons such as neutrons and protons traversing the detector. And finally,
the fourth section is the Muon Chambers, which discriminate for muons and measure their
momentum. Figure 7, illustrating this structure, is shown below.
Figure 7: Labeled Cross Section of the ATLAS Detector: Above we see a schematic
of the cross-section of the barrel region of the ATLAS detector. Each subdetector composes
the next concentric cylinder that will measure a different quantity or particle from collision
events. The Tracking Chamber is the subdetector closest to the interaction point and
surrounding it are the EM Calorimeter, the Hadron Calorimeter, and the Muon Chamber.[6]
5.3 The Inner Tracker
The precise tracking and momentum measurements of particle events are primarily achieved
using the Inner Tracker (IT) of the ATLAS detector. The IT, which is immersed in a 2
Tesla magnetic field, is composed of three subdetectors each with a different method of
measurement: silicon pixel detectors in the Pixel Detector, silicon microstrips in the Silicon
Semiconductor Central Tracker (SCT), and gas filled straws in the Transition Radiation
Tracker (TRT), listed here in order of increasing radii. Although all three subdetectors
are independent, their information is combined for reconstruction of events in the detector,
providing extremely precise measurements of particle tracks and momentum in the IT.
Figure 8: Inner Tracker Schematic: Shown to the left is a diagram of the Inner Tracker (IT) of the ATLAS Detector. Each subdetector is composed of both a barrel region and a section of end-caps that measure the momentum and tracking of particles emerging from the interaction point at various η.[3]
As shown in Figure 8, the IT has a two-fold structure consisting of a barrel region and
end-caps. In the barrel region, the detectors are arranged in concentric cylinders around
the interaction point. This region is shown in Figure 9.
Figure 9: Inner Tracker - Barrel Region Schematic: Here we have a cross-sectional schematic of the barrel region of the IT. As radius increases, the method of detection changes: the Pixel Detector (50.5 mm < R < 122.5 mm), SCT (299 mm < R < 514 mm), and the TRT (554 mm < R < 1082 mm).[3]
In the end-caps, the detectors are arranged on disks that are perpendicular to the beam
axis. This portion of the IT, shown in Figure 10, detects particles emerging at higher η.
5.3.1 The Pixel Detector
The pixel detector is composed of 1,744 n-doped, silicon pixel sensors biased at a gate voltage
greater than 150 V. In the barrel region, these sensors form three concentric cylinders,
entirely covering radii of 50.5mm, 88.5mm and 122.5mm and |η| < 1.7. In the end-cap
region, these sensors cover 3 disks on each side from 1.7 < |η| < 2.5. Each of these sensors
is 19×63 mm2 and has 46,080 readout pixels of approximate dimensions 50×400 µm2. The
resulting spatial resolution is 10 µm in the R−φ direction and 115 µm in the z-direction.∗
∗ The quoted spatial resolutions are all for the barrel regions of these subdetectors. Resolutions for the disks are similar, but merely measure in the R-direction instead of the z-direction.
Figure 10: Inner Tracker - End-Caps Schematic: Shown to the left is a cross-sectional schematic of one of the IT's end-caps. Illustrated in red are two tracks of particles emerging at η = 2.2 and η = 1.4 from the interaction point.[3]
With such a high resolution, the pixel detector is both the most accurate and expensive
tracking subdetector of ATLAS.
When a pixel is traversed by an ionizing particle, the sensor is triggered and an array
of its pixels’ voltage signals is read out. Most commonly, a collection of neighboring pixels
(referred to as a cluster) is triggered and using this information, we can extract energy loss
(dE/dx) as detailed in §6.2.
5.3.2 The Semiconductor Central Tracker
The second-most inner component of the IT is the Semiconductor Central Tracker (SCT).
This subdetector is composed of 15,912 p-in-n∗ semiconductor sensors. In the barrel region
these sensors form four concentric, tiled cylinders covering radii of 299 mm, 371 mm, 443
mm and 514 mm while in the end-caps, the sensors cover 9 disks on each side, with η
coverage similar to that of the pixel detector. Each sensor has 768 active strips each with
width of 80 µm and a length of 12 cm.
Even though the SCT uses technology similar to that of the pixel detector for tracking
events, the SCT is substantially less precise; the spatial resolution for the SCT is 17 µm in
the R − φ direction and 580 µm in the z-direction. The reason for this method of detection
is simply that it was the most financially feasible way to cover the larger surface area, while
still providing precise tracking.
5.3.3 The Transition Radiation Tracker
The outermost subdetector of the IT, the Transition Radiation Tracker (TRT), utilizes
gas-filled straws serving as drift chambers for ionizing particles. Each of these 4 mm diameter
straws has a 31 µm, gold-coated tungsten wire running along the straw’s center to serve
as an anode held at high-voltage. Functioning as a cathode held at ground potential, the
straw’s interior is coated with 0.2 µm of Al. In between the anode and cathode is a Xenon
gas mixture composed of 70% Xe, 27% CO2 and 3% O2. There are 122,880 straws in each
end-cap and 52,544 in the barrel region, composing three rings, each with 32 modules
of straws installed into a foam matrix. This structure is more clearly seen in Figure 9.
When ionizing particles traverse the TRT, there are two signals detected. One of these
signals is due to transition radiation (TR) emitted from the particle traversing the radiator
foam material contained in the straw matrix embedded between straws. The TR produced
is usually in the low-energy X-ray spectrum of approximately 5 keV which causes cascades
in the Xe gas mixture. The second signal is the result of primary electrons (PE) from
the ionization of the straw’s gas. These electrons are collected on the anode due to the
potential difference between the wire and the wall. The anode is then sampled at a rate of
3.125 ns to determine the resulting current. To distinguish between the primary electron
signals and the TR signals, the TRT uses a low-threshold (∼250 eV) for the PEs and a highthreshold of 6 keV for TR. A more detailed explanation of the resulting time-over-threshold
measurements gained from the TRT is detailed in §6.
The barrel region of the TRT covers |η| < 1.0 and the end-caps approximately cover
1.0 < |η| < 2.0.
∗ The notation here refers to an intrinsic semiconductor placed between a p-doped side and n-doped side.
5.4 Calorimetry
Further from the interaction point, ATLAS is composed of multiple calorimeters that measure the energy deposited by particles that are long-lived enough to travel further than the
Inner Detector and interact with the various materials present in the calorimeters. This
section of the detector is composed of the Electromagnetic Calorimeter and the Hadronic
Calorimeter. The Electromagnetic Calorimeter, detailed in §5.4.1, is primarily responsible
for measuring the energy of photons and electrons, whereas the Hadronic Calorimeter, detailed in §5.4.2, is primarily responsible for the energy deposition measurements of hadronic
particles such as protons, neutrons, and pions.
Figure 11: Labeled Schematic of Calorimeters[3]
5.4.1 Electromagnetic Calorimeter
The Electromagnetic Calorimeter (EMC) is composed of two main sections contained in a
cryostat at 88 K that cover different ranges of pseudorapidity; the barrel region covers
|η| < 1.475 while the two end-caps cover 1.375 < |η| < 3.2. The barrel region is composed
of two identical, separate cylindrical barrels leaving a small gap of 6 mm at z = 0 that
can be seen in Figure 11. The end-caps are also composed of two sections: an inner wheel
covering 2.5 < |η| < 3.2 and an outer wheel covering 1.375 < |η| < 2.5. Even within these
sections there is further sub-structure that we will detail below.
All components of the EMC measure energy deposition using lead (Pb) as an absorber for
particles traversing the EMC and Liquid-Argon (LAr) as the scintillator medium. The signal
from the interaction between the particle and these layers is then read out by electrodes
between two layers of LAr. A more detailed explanation of the structure of the EMC
components, the physical processes involved, and how energy loss is calculated using the
EMC is detailed in §6.3. Here, we will instead focus on the general structure of the EMC.
Figure 12: EM Calorimeter Module Schematic: [3]
Each barrel region of the EMC is composed of 1024 accordion-shaped layers of absorbers.
A schematic of one of the 16 accordion-shaped modules of a barrel is shown in Figure
12. The zig-zag layers are composed of 1) a sheet of lead of thickness between 1.13 mm and
1.53 mm glued between two 0.2 mm thick sheets of stainless steel, 2) a gap of approximately
2.1 mm filled with LAr, 3) a readout electrode, and 4) another LAr gap of 2.1mm. This
four layer structure is then repeated. The electrode contained between the LAr gaps is
composed of two outer layers of copper held at high voltage (∼ 2000V) and an inner readout
electrode also made of copper that extends radially to the front and back of the module.
The approximately 190,000 channels needed to read out this information are fed outside of
the EMC through the gaps between the barrels and end-caps.
A similar structure of LAr-Pb and electrodes is utilized in the end-caps of the EMC
that cover higher |η|. Whereas the barrel region has a radial thickness of approximately
22 radiation lengths (Xo ), the end-caps, which were designed to withstand more energetic
particles, have thicknesses of about 24 Xo.
5.4.2 Hadronic Calorimeter
The Hadronic Calorimeter, which is located outside of the Electromagnetic Calorimeter,
serves to measure the energy of hadrons traversing the detector. This calorimeter is composed of a tile calorimeter (TileCalo) in the barrel region (|η| < 1.7), the Hadronic End-cap
Calorimeters (HEC) immediately behind the EMC end-caps (1.5 < |η| < 3.2), and a Forward Calorimeter (FCal) located close to the beam pipe (3.1 < |η| < 4.9).
Figure 13: Hadronic Calorimeter Module: Shown above is a schematic of a module
from the barrel region of the Hadronic Calorimeter.[3]
The TileCalo uses steel as an absorber and plastic scintillators as the active medium. A
total of 64 modules (see Figure 13) compose the TileCalo, forming a projective geometry
of cells in the η-direction. The resulting light from the scintillators is read out through
wavelength shifting fibers fed to photomultiplier tubes (PMT) located between the three
separate barrels composing the TileCalo.
Housed in the same cryostat as the EMC End-caps, the HECs are also similar in design
to the EM components. The HECs are made with copper plates as absorbers and liquid
argon as a medium to induce hadronic showers.
Finally, the FCal serves to measure both electromagnetic and hadronic showers at
high pseudorapidity. The first section of the FCal, closest to the interaction point, uses
copper plates as absorbers while the second and third sections use tungsten. The FCal is
composed of a matrix of drift tubes parallel to the beam pipe containing liquid argon as
the medium.
5.5 Muon Chamber
The outermost subdetector of ATLAS is the Muon Chamber, which is responsible for the
tracking and momentum measurements of muons in conjunction with the muon’s IT track.
After passing through the inner tracker, the EM Calorimeter, and the Hadron Calorimeter,
muons are typically the only particles besides neutrinos emerging from the interaction point.
To accurately track muons, the Muon Chamber uses three superconducting magnet systems
to bend muons in the barrel region (|η| < 1.4), the transition region (1.4 < |η| < 1.6) and
the end-caps (1.6 < |η| < 2.7). Precision tracking is achieved using Monitored Drift Tubes
(MDT) over the range |η| < 2.7 and Cathode Strip Chambers (CSC) in the higher η region
2.0 < |η| < 2.7. These two devices function similarly to the TRT’s straw tubes and the
pixel sensors of the SCT,∗ respectively.
Figure 14: Muon Chamber Schematic: Shown above is a rendered image of the Muon
Chamber. Here we can see the primary detector components of the Muon Chamber – the
MDT and CSC – and the triggering components – the TGC and RPC.[3]
5.6 Muon Trigger
Due to the immense amount of data that is produced in all of the subdetectors of ATLAS,
a three level triggering system was developed to make selections of ‘interesting’ events;
these selections are part of the Trigger and Data Acquisition (TDAQ) systems. Since many
searches for new physics beyond the Standard Model involve muon events, including searches
for the Z′ boson and SUSY events, the triggering system relies heavily on information from
the Muon Chamber. Since it is the muon trigger that we would utilize for ACP searches,
we focus on it in our discussion. The three levels of triggering are LVL1, LVL2, and
Event Filter (EF), where each succeeding level refines the selections of the previous one.
The supporting material for this section is [17] and [3].
The first triggering stage is the LVL1 trigger. This stage uses three layers of resistive-plate chambers (RPCs) in the barrel region of the Muon Chamber and seven layers of
thin-gap chambers (TGCs) in the end-caps to identify high-pT events. The LVL1 trigger
also notes any Regions of Interest (RoIs) in η − φ space that have interesting features, such
as missing total transverse energy.
Following LVL1, the entirely software based LVL2 selections are made. This process is
composed of three algorithms that refine the selections made by LVL1 including the RoIs:
∗ Instead of using semiconductor sensors like those of the SCT, the Muon Chamber’s CSCs utilize the voltage signal from cathode and anode wires overlaid in a grid.
µFast, µComb, and µIso. µFast uses the measurements made from the RPCs, TGCs, and
the MDTs of the Muon Chamber to reconstruct the η − φ position and pT of the events
from LVL1. µComb, as the name suggests, combines information from the Inner Tracker
to make further selections. And finally, µIso utilizes the information from the calorimeters
to reject muons emerging from bottom and charm semileptonic decays. Rejecting these
semileptonic decays increases the fraction of muon events coming from Z or W bosons, as
is the case in Drell-Yan production.
The third and final muon trigger stage is EF. Similarly to the LVL2 trigger, the EF
stage is entirely software based; using an offline reconstruction and analysis algorithm, EF
makes further refinements to LVL2’s selections. The muon EF offers more complex pattern
recognition techniques around a RoI to select from specific muon properties.
The three stages take the rate of selected events from approximately 40 MHz to 75 kHz
at LVL1, to 2 kHz at LVL2, and finally 200 Hz at EF, which is permanently stored; the
entire process is done with only 2.5µs of latency time at LVL1, 10 ms at LVL2, and 1 s at
EF.
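As a quick sanity check on these quoted rates, the per-stage and overall rejection factors can be computed directly (a back-of-the-envelope sketch using only the numbers above):

```python
# Quoted event rates at each stage of the ATLAS muon trigger chain, in Hz.
rates_hz = {"input": 40e6, "LVL1": 75e3, "LVL2": 2e3, "EF": 200.0}

stages = list(rates_hz)
for prev, cur in zip(stages, stages[1:]):
    factor = rates_hz[prev] / rates_hz[cur]
    print(f"{prev} -> {cur}: rejection factor ~ {factor:,.1f}")

overall = rates_hz["input"] / rates_hz["EF"]
print(f"overall: roughly 1 in {overall:,.0f} collision events is permanently stored")
```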
Part III
Technique for Anomalously Charged Particle Searches
6 Charge Profiles
To search for anomalously charged particles, we focus on the calorimeters and two components of the ATLAS Inner Tracker: the TRT and the Pixel Detector. These components of
ATLAS are most readily equipped to measure various properties of charged particles which
can be used to construct a charge profile characterizing an event. The hope is that the
charge profile composed from these measurements can be used to effectively discriminate
for anomalously charged particles. Detailed below are the primary parameters measured
and used in the analysis explained in §7.
6.1 Time-Over-Threshold and Trailing Edge Time in the TRT
Two of the parameters used in the analysis to follow are gained from the TRT. As mentioned
in §5.3.3, when charged particles traverse the straws of the TRT, the Xe gas mixture is
ionized and the PEs produced result in charge collected on the inner wire due to the potential
difference. In addition, low-energy x-rays from TR in the radiator foam can pair produce
leading to additional PEs from the tracks of the resulting, ionizing particles. A schematic
of such a scenario is depicted in Figure 15.
Figure 15: TRT Straw Matrix Cross-Section:
Here we have a cross-section schematic illustrating a highly relativistic electron traversing
the TRT. After passing through the first straw on the right, the electron emits transition
radiation while passing through the foam. This photon then pair produces an electron and
a positron. The thinner arrows within the straws represent PEs migrating towards the
straw’s inner wire. Note that the TR photon is shown at an exaggerated emission angle for
illustrative purposes; these photons are typically very colinear with the track direction.
To track particles traversing the TRT, the voltage signal from the straws is read out at
intervals of 3.125 ns. This signal is represented by a 24-bit binary pattern corresponding
to an interval of 75 ns; when the voltage is over the low threshold (LT) of ∼250 eV, a 1
is recorded, otherwise, a 0 is recorded. An example bit-pattern and corresponding voltage
signal are shown in Figure 16.
From this information we extract three parameters relevant to the particle’s charge:
leading edge time (LE), trailing edge time (TE), and time-over-threshold (TOT). LE and
TE roughly correspond to the time at which the first and last primary electrons produced
by the traversing particle are detected, respectively. The time-over-threshold is then the
time elapsed while the voltage signal is over the LT. Since TOT = TE − LE, and hence LE
is linearly correlated with TOT and TE, we only need to include TOT and TE in our analysis;
the reason for not needing to include LE is made clearer in §7.1.

Figure 16: Time Over Threshold Schematic (Courtesy of Prof. Mark Kruse)
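As an illustration of how LE, TE, and TOT can be extracted from the 24-bit readout pattern described above, here is a simplified sketch (the actual TRT reconstruction is more involved, and the bit pattern below is invented for illustration):

```python
BIN_WIDTH_NS = 3.125  # each of the 24 bits covers one 3.125 ns sampling interval

def edge_times(bits: str):
    """Return (LE, TE, TOT) in ns from a 24-bit over-threshold pattern.

    LE/TE are taken as the start of the first and the end of the last
    bin in which the signal was over the low threshold, so TOT = TE - LE
    holds by construction, as in the text.
    """
    assert len(bits) == 24 and set(bits) <= {"0", "1"}
    if "1" not in bits:
        return None  # no over-threshold signal in this straw
    first = bits.index("1")
    last = bits.rindex("1")
    le = first * BIN_WIDTH_NS
    te = (last + 1) * BIN_WIDTH_NS
    return le, te, te - le

# Invented example: signal over threshold from bin 5 through bin 14.
le, te, tot = edge_times("000001111111111000000000")
print(le, te, tot)
```

Note this simple version treats any gap between the first and last over-threshold bins as part of the signal; how gaps are handled in the real readout is not addressed here.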
6.2 Energy Loss in the Pixel Detector
The next discriminating measurement of charge we include in our analysis is energy loss
(dE/dx) measured in the Pixel Detector. As mentioned in §5.3.1, when an ionizing particle
traverses the Pixel Detector and a pixel module registers a hit, an array of its pixels is
read out. Similar to the TOT measurements detailed above, a hit pixel corresponds to a
pixel whose voltage signal is registered to be above a specified threshold (∼3 keV), resulting
from the production of electron-hole pairs in the silicon [4]. In both the barrel region and
end-caps, we expect three hits since each region is composed of three concentric layers that
the particle will traverse.
When the array of pixels is readout, it is often the case that multiple neighboring pixels
are registered as hit. Using the collection of time-over-threshold measurements from this
cluster of pixels, we can gain a measure of the total charge deposited on a pixel module,
Q. Combining this information with the average energy needed to produce an electron-hole
pair, W = 3.68 ± 0.02 eV/pair, the width of the pixel module, d = 250 µm, the density of
the silicon, ρ, and the angle of incidence from normal, α, we calculate the energy loss as [4]
    dE/dx = (number of pairs / density of Si) × (avg. energy per pair) × (1 / length of track in module)
          = ((Q/e) / ρ) × W × (cos α / d)
          = Q W cos α / (e ρ d)
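Plugging representative numbers into this formula gives a feel for the scale (a sketch only: the cluster charge and incidence angle are invented for illustration, the silicon density 2.33 g/cm³ is a standard value not quoted in the text, and the real ATLAS calibration includes further corrections):

```python
import math

# Constants from the text (units noted inline); RHO_SI is a standard value.
W_EV_PER_PAIR = 3.68   # average energy to create one electron-hole pair in Si, eV
D_CM = 250e-4          # pixel module thickness d = 250 um, in cm
RHO_SI = 2.33          # density of silicon, g/cm^3 (assumed standard value)

def pixel_dedx(n_pairs: float, alpha_rad: float) -> float:
    """dE/dx in MeV cm^2/g from the cluster charge Q/e = n_pairs and the
    incidence angle alpha from the module normal: Q W cos(alpha) / (e rho d)."""
    de_mev = n_pairs * W_EV_PER_PAIR * 1e-6        # deposited energy, MeV
    dx_gcm2 = RHO_SI * D_CM / math.cos(alpha_rad)  # track path as areal density, g/cm^2
    return de_mev / dx_gcm2

# Invented example: ~20,000 electron-hole pairs collected at normal incidence.
print(f"{pixel_dedx(20_000, 0.0):.2f} MeV cm^2/g")
```

For a fixed collected charge, a larger incidence angle means a longer path through the module and hence a smaller dE/dx, which is exactly the cos α/d factor in the formula.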
Figure 17: Pixel Detector dE/dx Schematic: Shown above is a schematic of an electron
traversing the three concentric cylinders composing the barrel region of the Pixel Detector.
Here, three clusters of hit pixels are registered on three pixel modules (hit pixels are shown
in blue). From this information we can extract an estimate for energy loss (dE/dx) along
the track.
Figure 18: Track Traversing Pixel Module: Shown above is a schematic of an electron
track traversing a pixel module of the pixel detector. The angle from the normal of the
module to the track is α, the thickness of the module is d = 250 µm and hence the distance
of the track through the module is given by x = d/cos α.
6.3 Energy Loss in the Calorimeters
After having discussed the charge deposition measurements of the TRT and the energy loss
of the Pixel Detector, it is fitting that we now consider the energy loss measurements of the
LAr portions of the calorimeters, which utilize similar aspects of both components. Using
Liquid-Argon (LAr) as a scintillator medium, the Electromagnetic Calorimeter (EMC), the
Hadronic End-Cap Calorimeters (HEC), and the Forward Calorimeters (FCal) have a series
of narrow drift chambers – in a similar style to the TRT – to measure the energy lost by
traversing particles through electromagnetic showers induced by absorbers of lead or copper.
Using a projective geometric structure with high φ and η resolution, the EMC, HEC, and
FCal allow for a high precision measurement of dx – as was done for the Pixel Detector –
so that dE/dx can be calculated.
As detailed in §5.4.1, each module of the EMC contains readout electrodes between
two outer layers of copper held at high voltage, which are between two gaps filled with
LAr, which are themselves between two lead sheets. This repeated structure is more clearly
illustrated in Figure 19.
Figure 19: EM Calorimeter LAr dE/dx Schematic: Shown above is a schematic illustrating the internal structure of the accordion-shaped modules in the EMC. More specifically, above we are seeing an enlarged image of the cross section of one of the barrel modules
where we can see the layered LAr-Pb and electrode structure that underlies the module.
The above diagram was adapted from [15] and [3].
When high energy electrons and photons traverse the EMC, for instance, they produce
electromagnetic showers of electron-positron pairs and photons. When a high energy electron passes through one of the lead sheets of the EMC, the strong deflection of the electron
from the heavy nucleus of the lead atoms can lead to the emission of a photon through
Bremsstrahlung. This resulting photon pair produces and that resulting electron-positron
pair continues this process, depositing energy in the EMC as it progresses. (See Figure 20.)
If one of the resulting charged particles passes through a LAr gap, it ionizes the argon along
its trajectory and since the outer layer of the readout electrode is held at a voltage of 2000
V, the ionization electrons are collected near the outer layer. This process leads to a current
proportional to the deposited energy that can be read out through the channels leaving the
EMC.
Figure 20: EM Calorimeter Shower: Shown above is an exaggerated illustration of an
electromagnetic shower occurring in the EMC. After an electron is deflected by the heavy
lead nucleus – represented as a grey circle – and undergoing Bremsstrahlung, the resulting
photon cascades into a shower of electron-positron pairs and photons.
To calculate energy loss in terms of dE/dx using the above energy deposition measurement, the EMC utilizes its high η and φ resolution resulting from the accordion-shaped
geometry of the modules. Since the readout electrodes are grouped into various cells – of
different resolution depending on the region of the detector – dE/dx can be determined
by first, finding the estimated entry and exit points of the particle’s track to calculate the
length of that segment, dxi , and secondly, determining the amount of energy, dEi , deposited
in that cell. Summing over all ‘hit’ cells allows for the total dE/dx to be determined. A
schematic of this process is shown in Figure 21.
Figure 21: EM Calorimeter Track: The illustration above depicts an electron traversing
the barrel region of the EMC, registering four ‘hit’ cells shown in red. The length of the
track through each cell is estimated and combined with the energy deposited in that cell to
calculate dE/dx.
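One natural implementation of this cell-by-cell procedure, sketched below with invented (dE_i, dx_i) values, sums the deposits and segment lengths over the hit cells before dividing:

```python
def dEdx_from_cells(cells):
    """Estimate dE/dx from a list of (dE_i, dx_i) pairs, one per 'hit' cell:
    total deposited energy divided by total track length through the cells."""
    total_dE = sum(dE for dE, _ in cells)
    total_dx = sum(dx for _, dx in cells)
    return total_dE / total_dx

# Four hypothetical hit cells: (energy deposited [MeV], segment length [mm]).
hits = [(12.0, 4.0), (30.0, 10.0), (18.0, 6.0), (6.0, 2.0)]
print(dEdx_from_cells(hits))  # 66.0 / 22.0 = 3.0
```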
7 Multivariate Analysis
The primary purpose of this research is to quantitatively measure how effectively we can use
these charge profiles to distinguish anomalously charged particles (signal) from other SM
processes (background). In particular, we use the multivariate analysis technique known as
a Fisher Linear Discriminant, described below, to separate signal events from background
events using charge measurements such as TOT, TE, Pixel dE/dx, and LAr dE/dx.
7.1 The Fisher Linear Discriminant
The concept behind the Fisher Linear Discriminant (FLD) is rather simple; we have two
classes of data (signal and background) that we need to discriminate and we want the most
efficient way of separating them. To begin, we consider a sample of vectors∗ that describe
multiple instances of each class.
As a basic example, consider the data shown in Figure 22. Here, we have a signal,
labeled by blue points, and a background, labeled by red points, where the vectors that
describe them are merely the x and y coordinates of their position in the xy plane. Now,
the idea motivating the FLD is that we want to map these n-dimensional vectors (in this
toy example, n = 2) from Rn to the real-line, R, in a manner that maximizes the separation
between signal and background while minimizing the distance between instances of the same
class. We accomplish this goal by projecting the data onto the vector v ∈ Rn so that this
maximum separation is achieved. If we had the data shown below,
(a) ‘Poor’ Choice
(b) ‘Best’ Choice
Figure 22: Possible Choices for 2-dimensional FLD Projection
we would see that there are choices of v that yield large separation between the two classes
and others that lead to a large amount of overlap between the two classes. The questions
∗ For the purists out there, we would be more accurate to call these objects ‘n-tuples,’ as is more common
practice in particle physics. However, in the discussion to follow, it helps to intuit these n-tuples as the
elementary notion of vectors as arrows in Rn .
that the FLD addresses are: what quantity is a good measure of separation, and which v will
maximize this separation?
To see the full derivation of how we find the vector, v, the full explanation of the variables
referenced, and the choice of measurement for separation, see Appendix A. The final
result is that we project each point onto the vector
\[ \mathbf{v} = W^{-1}(\mathbf{m} - \mathbf{M}) \]
where m and M are the means of the signal and background in Rn , and W −1 is the inverted
within-class matrix that contains information regarding the separation between instances
within each class and the correlations between their components.
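As a numerical illustration of this projection (a sketch, not the thesis's TMVA code), v = W⁻¹(m − M) can be computed directly with NumPy; here the within-class matrix is taken as the sum of the two class covariance matrices, one common convention, and the two-dimensional Gaussian samples are invented:

```python
import numpy as np

def fisher_vector(sig, bkg):
    """Fisher direction v = W^{-1} (m - M) for signal and background
    samples of shape (n_events, n_vars)."""
    m, M = sig.mean(axis=0), bkg.mean(axis=0)
    # Within-class matrix: scatter of each class about its own mean.
    W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    return np.linalg.solve(W, m - M)

rng = np.random.default_rng(0)
sig = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(500, 2))
bkg = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))
v = fisher_vector(sig, bkg)

# Projected (Fisher) values: signal sits above background on average.
print((sig @ v).mean() > (bkg @ v).mean())  # True
```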
For the example considered in this analysis, the FLD projects a vector in R4 with components such as (TOT, LE, TE, pixeldEdx) onto R as shown below.
Figure 23: 4-Dimensional Fisher Projection: Here, we have a more realistic example of
the projection resulting from a FLD applied to a Monte Carlo event dataset for Drell-Yan
production of doubly charged Q-balls.[14]
Since the FLD maximizes the separation between signal and background based on the
within-class and between-class matrices, it takes into account linear correlations between
the various inputs used. If, for example, TOT has a linear dependence on η that helps
distinguish highly and minimally ionizing particles, one merely inputs both TOT and η
into the FLD without needing to explicitly calculate how TOT depends on η. Not only
does this attribute of the FLD save the effort of inputting known, linear dependencies between variables, but it also allows for unforeseen correlations to be utilized for maximum
separation.
7.2 TMVA Package
The Toolkit for Multivariate Data Analysis with ROOT (TMVA) is a toolkit – to be implemented using ROOT – that allows the user to carry out a significant number of multivariate
analysis techniques such as boosted decision trees, neural networks, projected likelihood estimators, and most relevant to our purposes, Fisher Linear Discriminants. The FLD analysis
carried out in this thesis was implemented using TMVA.[14]
Using TMVA to implement the majority of these techniques consists of two phases: the
Training Phase and the Testing Phase. To begin the Training Phase, the user must have
samples where it is known which events are to be classified as signal and which are to be
classified as background. Using this information, TMVA is trained to separate these classes
as efficiently as possible. For other multivariate techniques such as neural networks, one
must train the sample multiple times since, unlike the FLD, these techniques do not have
an exact, pre-determined method of separating the data. However, for the FLD there is
a single vector that is determined, so training TMVA once is sufficient. Next, the Testing
Phase simply applies the trained method of separation to a dataset where it is
unknown which events are to be classified as signal or background. For FLD, this amounts
to a resulting one-dimensional probability density function of the Fisher values. Using this
representation of the data, one can make straight cuts to extract signal from background;
this process is detailed more fully in §7.3. In the analysis carried out in this thesis, however,
we are more interested in finding out to what sensitivity we can determine the cross section
of these anomalously charged particles and in doing so we use the full signal and background
FLD distributions. Hence, we will implement a maximum likelihood fitting technique using
the results from the training phase of the FLD. The process for maximum likelihood fitting,
detailed in §7.4, provides a method of setting cross section limits.
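The thesis implements this two-phase workflow with TMVA's Fisher method; as a library-free sketch of the same idea (class name and data entirely invented), the "train once, then test" property looks like:

```python
import numpy as np

class SimpleFLD:
    """Toy stand-in for the TMVA Fisher method: a single deterministic
    training pass fixes the projection vector; testing just applies it."""

    def train(self, sig, bkg):
        # Training Phase: labelled samples determine the one Fisher direction.
        W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
        self.v = np.linalg.solve(W, sig.mean(axis=0) - bkg.mean(axis=0))
        return self

    def test(self, data):
        # Testing Phase: project unlabelled events onto scalar Fisher values.
        return data @ self.v

rng = np.random.default_rng(1)
sig = rng.normal(1.5, 1.0, size=(1000, 4))  # stand-ins for TOT, TE, dE/dx, ...
bkg = rng.normal(0.0, 1.0, size=(1000, 4))
fld = SimpleFLD().train(sig, bkg)
fisher_values = fld.test(rng.normal(0.0, 1.0, size=(200, 4)))
```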
7.3 Probability Density Functions and Cut Efficiencies
After the Testing Phase is complete, we are left with a probability density function (PDF)
that has resulted from the FLD. This PDF, which in our case will actually be discrete, gives
us an estimate of the probability that an event has a Fisher value within a specified bin.
Henceforth, we will refer to the PDF of the Fisher distribution for a class of events
as its FPDF.
Shown in Figure 24 is an example of the resulting superimposed FPDFs of the signal and
background datasets after being projected onto the vector determined in the FLD training
phase. However, for the dataset being tested, we need to extract which events we will
classify as signal or background. To do this we impose a cut on the resulting PDFs.
In Figure 25, every event with a Fisher value to the right of the cut would be classified
as signal (blue) and every event with a Fisher value to the left of the cut would be classified as
background (red). But even with these idealized, smooth FPDFs, we can see that with such
a cut, there could be background events classified as signal and vice versa. So a question
Figure 24: Testing Phase: Fisher PDFs (FPDFs)
Figure 25: Straight Cut for a FPDF
that begs to be addressed is ‘what are appropriate measures of accuracy for PDF cuts and
how do we impose them on our FPDF?’
To answer this question, we need to introduce what are referred to as signal and background efficiencies. Suppose the one-dimensional PDFs of signal and background are given
by Sg(x) and Bg(x), respectively.
Figure 26: Cut Efficiency
As shown above, if we impose a cut at xc , the area under Bg(x) that would be mistakenly
classified as signal is given by
\[ \alpha = \int_{x_c}^{\infty} Bg(x)\,dx, \tag{19} \]
while the area under Sg(x) that would mistakenly be classified as background is given by
\[ \beta = \int_{-\infty}^{x_c} Sg(x)\,dx. \]
The discriminating power to observe the signal, which is merely the area under Sg(x) in the
acceptance region above the cut xc , is thus given by
\[ 1 - \beta = \int_{x_c}^{\infty} Sg(x)\,dx. \tag{20} \]
For a cut at xc , we refer to the quantity (1 − β) in Eq.(20) as the signal efficiency and α in Eq.(19) as the background efficiency.
Now to answer our initial question, we can use signal and background efficiencies to
measure the significance of our cut. Suppose we have Nsg signal events and Nbg background
events in our dataset. Then using a cut at xc , we would expect to have accepted
S = Nsg (1 − β)
signal events and
B = Nbg α
background events. From this we can define the significance of our cut as
\[
\frac{S}{\sqrt{S+B}}
= \frac{N_{sg}(1-\beta)}{\sqrt{N_{sg}(1-\beta) + N_{bg}\,\alpha}}
= \frac{N_{sg}\int_{x_c}^{\infty} Sg(x)\,dx}{\sqrt{N_{sg}\int_{x_c}^{\infty} Sg(x)\,dx + N_{bg}\int_{x_c}^{\infty} Bg(x)\,dx}}
= \sqrt{N_{sg}}\;\frac{\int_{x_c}^{\infty} Sg(x)\,dx}{\sqrt{\int_{x_c}^{\infty} Sg(x)\,dx + \frac{N_{bg}}{N_{sg}}\int_{x_c}^{\infty} Bg(x)\,dx}}\,,
\]
where in the last step we emphasize that the significance depends on the ratio of background to signal events, Nbg /Nsg , and grows like √Nsg. Utilizing the above quantity, we can
specify the degree of accuracy achieved by our cut. TMVA, when implementing the Testing
Phase of a FLD, maximizes this quantity for all cuts.
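The cut scan TMVA performs can be imitated on binned FPDFs; in this sketch the two normalized histograms and the event counts are invented:

```python
import numpy as np

def best_cut(sig_pdf, bkg_pdf, edges, n_sig, n_bkg):
    """Scan candidate cut positions and return (Z, cut) maximizing
    S / sqrt(S + B), with S and B the accepted signal/background counts."""
    best_Z, best_edge = -1.0, None
    for i, edge in enumerate(edges):
        eff_sig = sig_pdf[i:].sum()   # 1 - beta: signal fraction above the cut
        eff_bkg = bkg_pdf[i:].sum()   # alpha: background fraction above the cut
        S, B = n_sig * eff_sig, n_bkg * eff_bkg
        if S + B > 0:
            Z = S / np.sqrt(S + B)
            if Z > best_Z:
                best_Z, best_edge = Z, edge
    return best_Z, best_edge

edges = np.linspace(-2, 2, 9)[:-1]    # left edges of 8 Fisher-value bins
sig = np.array([0.00, 0.02, 0.05, 0.08, 0.15, 0.25, 0.30, 0.15])
bkg = np.array([0.15, 0.30, 0.25, 0.15, 0.08, 0.05, 0.02, 0.00])
Z, cut = best_cut(sig, bkg, edges, n_sig=100, n_bkg=1000)
print(cut)  # the scan picks the edge balancing signal kept vs background leaked
```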
Also, to measure how effectively the FLD separates the FPDFs of signal and background,
we define the separation of two PDFs, Σ² , as [14]
\[ \Sigma^2 = \frac{1}{2}\int_{-\infty}^{\infty} \frac{\left(Sg(x) - Bg(x)\right)^2}{Sg(x) + Bg(x)}\,dx. \tag{21} \]
This measure of separation would yield 0.0 for identical PDFs and 1.0 for PDFs with no
overlap. In §8, where we discuss our results, we will frequently make use of this definition
of separation to estimate how well the two classes being examined have been separated by
our FLD.
In the discussion above, we have described the technique for making cuts to the resulting
FPDF which could be used to isolate events of interest and study them further. In the next
section we will instead show how to set limits on the cross section of these events without
isolating any particular events.
7.4 Binned Maximum Likelihood Fitting
In the following sections, we intend to illustrate the construction of likelihood functions and
a technique that can be used to set limits on the cross section of ACP events. In §7.4.1,
we construct likelihood functions in a general setting. Following this discussion, in §7.4.2
we outline the Bayesian interpretation of our posterior probability that will allow us to
relate the likelihood function to the cross section limit setting process. Finally, in §7.4.3, we
outline a more specific technique that can be used to set limits on the cross section of ACP
events. The primary source for the statistical discussion present in what follows, unless
otherwise noted, is [22].
7.4.1 Likelihood Functions
Suppose we have an experimentally measured dataset n = (n1 , ..., nN ), where ni denotes
the measured value of the i-th bin of some parameter. Using some initial hypothesis, H,
and a set of parameters a that characterize that hypothesis, we construct the likelihood
that n was measured given these assumptions:
\[ \mathcal{L}(\mathbf{n}; \mathbf{a}) := P(\mathbf{n} \mid \mathbf{a}, H) \]
Without an example in mind, the above statement almost sounds tautologous. Hence,
suppose ni denotes the number of events that occurred with the momentum corresponding
to the i-th bin. Now, suppose we construct a hypothesis, H, that predicts a mean of µi events in the i-th bin and that the bin counts follow independent Poisson distributions. In this example,
the value of µi is determined by some set of parameters a – for example, the mass and
generation of the particle – and our likelihood function would become
\begin{align*}
\mathcal{L}(\mathbf{n}; \mathbf{a}) &= P(\mathbf{n} \mid \mathbf{a}, H) \\
&= P(n_1 \mid \mu_1(\mathbf{a}), H)\, P(n_2 \mid \mu_2(\mathbf{a}), H) \cdots P(n_N \mid \mu_N(\mathbf{a}), H) \\
&= \prod_{i=1}^{N} \frac{\mu_i(\mathbf{a})^{n_i}}{n_i!}\, e^{-\mu_i(\mathbf{a})}
\end{align*}
where we have emphasized the dependence of each µi on a. The above helps illustrate that
the likelihood function is indeed the probability of observing exactly the results n given
some initial hypothesis and the parameters that describe it.
We want to focus our attention on the set of parameters, â, that maximizes the likelihood
that we observed n. This amounts to solving the simultaneous system of partial differential
equations
\[ \left.\frac{\partial \mathcal{L}}{\partial a_i}\right|_{\mathbf{a}=\hat{\mathbf{a}}} = 0\,, \qquad i = 1, \ldots, N. \]
Since the values of the likelihood function are so small for most purposes (consider the
above example when N is large!), we instead maximize the logarithm∗ of L :
\[ \left.\frac{\partial \ln \mathcal{L}}{\partial a_i}\right|_{\mathbf{a}=\hat{\mathbf{a}}} = 0\,, \qquad i = 1, \ldots, N. \]
7.4.2 Bayesian Interpretation
Using the likelihood function as described above, we want to be able to place limits on
the cross section of ACP events. That is to say, we wish to place a confidence interval on
the ACP cross section given our dataset n = (n1 , ..., nN ) whose i-th component yields the
number of events recorded with the Fisher value corresponding to the i-th bin. To set this
confidence interval, we have to find the relationship between the following quantities: i) the
probability density of a particular cross section given our dataset n and our initial hypothesis
H, denoted P (σ|n, H) and ii) the likelihood function L (n; σ).† We will construct our
specific likelihood function in §7.4.3, but it is not needed for the discussion below.
Using Bayes' Theorem, we have that
\[ P(\sigma \mid \mathbf{n}, H) = \frac{P(\mathbf{n} \mid \sigma, H)\, P(\sigma \mid H)}{P(\mathbf{n} \mid H)}. \]
Here, P (σ|n, H) is the quantity of interest, referred to as the posterior probability. It
represents our knowledge of the distribution of the cross section given the number of events
recorded in each bin and our initial hypothesis. P (σ|H) denotes the prior probability
density, which reflects our knowledge of the cross section distribution given solely our initial
hypothesis H. P (n|H) is referred to as the evidence and since it carries no σ-dependence,
it can be thought of as merely a normalizing factor. Hence,
P (σ|n, H) ∝ P (n|σ, H)P (σ|H)
where we can normalize to unity. Now, assuming a uniform prior knowledge of the cross
section, i.e., P (σ|H) is some positive constant, we have that
\[ P(\sigma \mid \mathbf{n}, H) \propto P(\mathbf{n} \mid \sigma, H) \tag{22} \]
∗ The proof that maximization of L occurs at â if and only if the maximization of ln L occurs at â follows
immediately from the logarithm being a monotone increasing function.
† Here, we will begin discussing the case where a is the single parameter, σ.
The right-hand side of Eq.(22) is precisely what we defined the likelihood function, L (n; σ),
to be. Hence, a Bayesian interpretation would claim that the estimator σ̂ that maximizes
the likelihood function would be maximizing the posterior probability assuming a uniform
prior probability.
Under these assumptions calculating limits on the cross section becomes merely a matter
of integrating the likelihood function, which we will assume is normalized to unity:
\[ \mathrm{Prob}(\sigma < \sigma_-) = \int_{-\infty}^{\sigma_-} \mathcal{L}(\mathbf{n}; \sigma)\,d\sigma =: \alpha \]
\[ \mathrm{Prob}(\sigma > \sigma_+) = \int_{\sigma_+}^{\infty} \mathcal{L}(\mathbf{n}; \sigma)\,d\sigma =: \beta \]
Thus, we can determine our confidence on the cross section limits to be
\[ \mathrm{Prob}(\sigma \in [\sigma_-, \sigma_+]) = 1 - (\alpha + \beta) \tag{23} \]
From a Bayesian interpretation, the above result is the relationship we wanted. Given the
dataset n and our initial hypothesis H, the likelihood function entirely determines that the
true value of the cross section σ is within the interval [σ− , σ+ ] to a confidence of 1 − (α + β).
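For an upper limit, one sets σ− = 0 (so α vanishes for a likelihood peaked at zero) and finds the σ+ at which the integral reaches the desired confidence; a grid-based sketch with an invented half-Gaussian likelihood, whose 95% point is known to be about 1.96:

```python
import numpy as np

def upper_limit(sigma_grid, likelihood, cl=0.95):
    """Normalize L(sigma) on the grid, accumulate its integral from the
    left (trapezoid rule), and return the first sigma where it reaches cl."""
    L = np.asarray(likelihood, float)
    seg = 0.5 * (L[1:] + L[:-1]) * np.diff(sigma_grid)
    cdf = np.concatenate([[0.0], np.cumsum(seg)])
    cdf /= cdf[-1]                      # normalize the posterior to unity
    return sigma_grid[int(np.searchsorted(cdf, cl))]

# Stand-in likelihood peaked at zero: a half-Gaussian in sigma.
sigma = np.linspace(0.0, 5.0, 2001)
L = np.exp(-0.5 * sigma**2)
print(upper_limit(sigma, L))  # ~1.96, the 95% point of a half-Gaussian
```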
7.4.3 Maximum Likelihood Fitting Implementation
In this section, we lay out how to carry out a binned maximum likelihood fit using the
PDFs resulting from the Fisher Linear Discriminant (FLD) analysis.
As described in §7.1, the FLD yields two expected PDFs for the Standard Model events
(background) and the ACP processes (signal) based on the MC data. These discrete PDFs
represent the probability of an event having the Fisher value corresponding to each bin.
Recall that the Fisher value is the scalar corresponding to the projection of the event's n-tuple along the vector in our parameter space that maximizes separation between signal and
background classes. From these distributions we have a set of expectation values at each
bin. More specifically, the i-th bin of the background FPDF yields the expected fraction of Standard Model events, $\mu_i^{SM}$, that would have the i-th bin's Fisher value. Similarly, the signal FPDF yields the expected fraction of ACP events, $\mu_i^{ACP}$, that would have the i-th bin's Fisher value.
Now, we turn to an unknown distribution in Fisher space, i.e., a set of Fisher values determined by a dataset of an unknown composition of signal and background events. We make
the assumption that the data is an unknown composition of only SM and ACP processes,
or at the very least, that other processes are negligible compared to other uncertainties.
From this assumption, we would predict that if we measure the number of events in the i-th
bin of the data, we will find some linear combination of the SM and ACP expected values,
namely,
\[ \mu_i = \alpha\,\mu_i^{SM} + \beta\,\mu_i^{ACP}. \]
The coefficients α and β will eventually be related to the cross sections of these processes.
Assuming that the observed events in each bin follow independent Poisson distributions,
the probability that we measure ni events in the i-th bin is given by
\[ P_i(n_i \mid \mu_i) = \frac{\mu_i^{n_i}}{n_i!}\, e^{-\mu_i}, \]
where we have dropped the H denoting the hypothesis that the µi values characterize.
Hence, the likelihood function becomes
\[ \mathcal{L}'(\mathbf{n}; \boldsymbol{\mu}) = \prod_i P_i(n_i \mid \mu_i) \]
where i ranges over all bins. If we know α and β to be within some limits,
\[ \alpha = \bar{\alpha} \pm \delta\bar{\alpha}\,, \qquad \beta = \bar{\beta} \pm \delta\bar{\beta} \]
– due to previous studies or other assumptions – then we can Gaussian constrain the likelihood function L′ as
\[ \mathcal{L}(\mathbf{n}; \boldsymbol{\mu}) = \prod_i P_i(n_i \mid \mu_i)\, \exp\!\left(-\frac{(\alpha - \bar{\alpha})^2}{2\,\delta\bar{\alpha}^2}\right) \exp\!\left(-\frac{(\beta - \bar{\beta})^2}{2\,\delta\bar{\beta}^2}\right) \]
As alluded to earlier, we will actually be utilizing the logarithm of this function:
\[ \ln \mathcal{L} = \sum_i \ln P_i - \frac{(\alpha - \bar{\alpha})^2}{2\,\delta\bar{\alpha}^2} - \frac{(\beta - \bar{\beta})^2}{2\,\delta\bar{\beta}^2}. \]
As promised before, we will now relate α and β to the cross sections of SM and ACP
events since we want to write the likelihood function in terms of the cross sections. The
relationship relating the ACP cross-section, σACP , is given by
β = σACP
×
BRACP
| {z }
×
Branching Ratio
|{z}
Acceptance
×
L .
|{z}
Integrated Luminosity
An analogous relationship holds for α and σSM .
Now that we have the likelihood function in terms of the cross section – assuming, of
course, that we know the branching ratio, acceptance, and luminosity – we could in principle
determine the cross section that maximizes L and set limits on it. Instead, since in this analysis we do not
know the branching ratio or acceptance for these events, we instead set limits on the ratio of
β to α. To do so, we find the values α̂ and β̂ that maximize L ; this process is implemented
using the ROOT program TMinuit[9].∗ From here, we fix an initial value of β and calculate
the maximum value of L achievable, or, in other words, find the α that maximizes L given
some fixed β. Then, we will iteratively carry out this process at various β values within
a specified range and then use the resulting distribution to set limits via Equ.(23) to 95%
confidence. This process will be more clearly depicted in the results §8.2.
∗ Technically, we minimize − ln L since TMinuit, as the name suggests, minimizes a function.
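A schematic, grid-based version of this fit (with invented six-bin FPDF templates and background-only pseudo-data, and a brute-force scan standing in for the TMinuit minimization; no Gaussian constraint applied) might look like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented expected per-bin fractions from the FLD training phase.
mu_SM  = np.array([0.35, 0.30, 0.20, 0.10, 0.04, 0.01])   # background FPDF
mu_ACP = np.array([0.01, 0.04, 0.10, 0.20, 0.30, 0.35])   # signal FPDF

# Pseudo-data: 10,000 background-only events distributed over the bins.
n = rng.multinomial(10_000, mu_SM)

def neg_lnL(alpha, beta):
    """-ln L for mu_i = alpha*mu_i^SM + beta*mu_i^ACP; the ln(n_i!) term
    is constant in (alpha, beta) and dropped."""
    mu = alpha * mu_SM + beta * mu_ACP
    return float(-(n * np.log(mu) - mu).sum())

# For each fixed beta, maximize over alpha (TMinuit's role in the thesis),
# yielding a one-dimensional profiled likelihood curve in beta.
alphas = np.linspace(8_000, 12_000, 201)
betas  = np.linspace(0.0, 400.0, 101)
profile = np.array([min(neg_lnL(a, b) for a in alphas) for b in betas])
L_beta = np.exp(-(profile - profile.min()))

# 95% upper limit on beta (and hence on beta/alpha) by integrating L_beta.
cdf = np.cumsum(L_beta)
beta_95 = betas[int(np.searchsorted(cdf / cdf[-1], 0.95))]
print(beta_95)  # upper limit on the number of signal-like events
```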
8 Results

8.1 Toy Monte Carlo
To begin this analysis, we used a Toy Monte Carlo (Toy MC) developed by TRT software
experts Esben Klinkby and Thomas Kittlemann. This Toy MC allows us to look at low-level quantities and change low-level configuration of the Toy MC independently of the
full ATLAS simulation. Although its minimalism gains us the advantage of more easily
accessing and modifying the details of the TRT measurements, the Toy MC still utilizes
the full ATLAS simulation code (PAI model) for particle ionization. The primary benefits
of beginning our analysis with this simulation are that one can easily modify charge to any
double while also being able to specify the type of particle traversing the TRT. This allows
for simulating and testing of fractionally and doubly charged particles of relatively low mass
– on the order of 100 GeV in accordance with the motivating LRSM theory.
Below we present results for simulation and FLD testing phases with signals of long-lived,
Doubly Charged Particles (DCPs), Two-Third Charged Particles (TTCPs) and One-Third
Charged Particles (OTCPs) with masses of 200 GeV and backgrounds of Singly Charged
Particles (SCPs) with masses of 200 GeV. No cuts were implemented in the Toy MC analysis,
since only low-level quantities were being considered.
Since no energy loss (dE/dx) information was available for the Toy MC, the following
analysis was done using TOT, LE, TE and GeomTOT as the input parameters for the
FLD. Although there is a linear dependence between TOT, LE, and TE, to wit, TOT =
TE - LE, and hence the FLD should take into account the correlation, initial studies were
done with all three included in the Toy MC analysis to specifically check that assumption.
GeomTOT is shorthand for TOT Adjusted for Geometry, which is TOT adjusted to take
into account the angle of incidence between the trajectory of the particle and the TRT straw
hit.
8.1.1 Doubly Charged Particles
Shown in Figure 27 are plots of all FLD input parameter distributions for the ToyMC DCP
and SCP simulation, where signal events are shown in blue and background events are
shown in red.
A table of the discrimination power and separation of the input parameters is shown
in Figure 28. Discrimination power is calculated using the diagonal components of the
between-class and within-class matrices corresponding to that parameter: B^{kk}/(B^{kk} + W^{kk}),
where k is the entry corresponding to the parameter.∗ Discrimination power thus helps
measure the power with which the FLD helps both separate instances of different classes
while also grouping together instances of the same class. The separation is a measure of
the overlap between the signal and background PDFs of an input parameter. For example,
if the separation were 1.0, it would imply that the PDFs of signal and background did not
overlap at all, whereas a separation of 0.0 would correspond to signal and background PDFs
being identical. See §7.3 and Eq.(21) for exact definitions.
∗
Once again, the reader is referred to Appendix A for a more detailed account of these matrices and the
notation used. Correlation Matrices for signal and background are also found in Appendix B.
Figure 27: Toy MC - DCP and SCP - Mixed Plots
Rank   Parameter   Discrimination Power
1      TOT         0.1736
2      TE          0.1704
3      GeomTOT     0.0325
4      LE          0.01789
Figure 28: Toy MC - DCP and SCP - Parameter Discrimination Power
The resulting Fisher distribution is shown below.
Figure 29: Toy MC - DCP and SCP - FPDFs: Separation of 0.579 achieved with a
maximum significance of 1.360
In addition, cut efficiencies for varying number of signal and background events are
shown in Appendix C.
8.1.2 Two-Thirds Charged Particles
Shown below are the corresponding plots and figures from §8.1.1 for a signal of TTCPs and
background of SCPs.
Figure 30: TTCP and SCP - Training Phase - Mixed Plots
Rank   Parameter   Discrimination Power
1      TOT         0.08233
2      TE          0.03217
3      GeomTOT     0.01234
4      LE          0.009971
Figure 31: TTCP and SCP - Parameter Discrimination Power
Figure 32: TTCP and SCP - FPDFs: Separation of 0.165 achieved with a maximum
significance of 0.603
8.1.3 One-Third Charged Particles
Lastly, shown below are the corresponding plots from §8.1.1 for a signal of OTCPs and
background of SCPs.
Figure 33: OTCP and SCP - Training Phase - Mixed Plots
Rank   Parameter   Discrimination Power
1      TOT         0.06041
2      TE          0.03185
3      LE          0.01290
4      GeomTOT     0.01207
Figure 34: OTCP and SCP - Parameter Discrimination Power
Figure 35: OTCP and SCP - FPDFs: Separation of 0.357 achieved with a maximum
significance of 0.980
8.2 Full ATLAS Simulation Monte Carlo
Currently, the Highly Ionizing Particle (HIP) group, within the ATLAS Collaboration, is
undertaking a search for long-lived, anomalously charged particles, including a search for
doubly charged particles (DCP). In this section, we carry out Fisher Linear Discriminant
(FLD) analysis similar to §8.1 and detailed in §7.1 using the Monte Carlo datasets that are
being utilized by this group; we present the Fisher distributions for signal and background
using i) only time-over-threshold and trailing edge time, ii) then including energy loss from
the Pixel Detector and finally, iii) including energy loss from the LAr calorimeters.
The two datasets used in this section are
1. user.SimoneZimmermann.LLPD3PD-00-01-23.SGLMU DESD.user.mjuengst
.qballs MDTfixYury 201201.mc11.m200.q1
2. user.SimoneZimmermann.LLPD3PD-00-01-23.SGLMU DESD.user.mjuengst
.qballs MDTfixYury 201201.mc11.m200.q2
and are Drell-Yan heavy fermion pair production of 1) 200 GeV, singly charged q-balls
serving as background and 2) 200 GeV, doubly charged q-balls serving as signal. (There are
approximately 4400 signal events and 3770 background events in these Monte Carlos.) Q-balls are non-topological solitons that can carry arbitrary charges. More concretely, they are
spherical solutions to a Lagrangian that – by having a non-vanishing field in their interior –
yield a conserved charge via Equ.(11) that are not constrained to have charge |q| = ±1.[2]
Many of the anomalously charged particle searches are utilizing q-balls to simulate ACP
events.
Additionally, we set upper limits to 95% confidence on the fraction of DCP events out
of 10,000 events that the FLD information allows us to be sensitive to using the maximum likelihood fitting method detailed in §7.4.3. Since a specific analysis channel has not
been decided upon by the HIP group, instead of setting limits on real event data, we set
limits to a simulation of 10,000 events sampled randomly from the singly charged q-ball
Fisher distribution. As we did for the FLD analysis, we present this limit setting using
only time-over-threshold and trailing edge time, then including energy loss from the Pixel
Detector and finally, including energy loss from the LAr calorimeters. This enables us to
illustrate the added discrimination power achieved through the inclusion of these energy
loss measurements in the FLD analysis.
Finally, we set upper limits to 95% confidence on the fraction of DCP events out of a
range of total events from 100 to 100,000 events to illustrate any inherent dependence on
event statistics.
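Drawing such a pseudo-dataset from a binned Fisher distribution is a single multinomial draw; a sketch with an invented five-bin background FPDF:

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented background FPDF over five Fisher-value bins (sums to unity).
fpdf = np.array([0.40, 0.30, 0.15, 0.10, 0.05])

# Each of the N pseudo-events lands in bin i with probability fpdf[i];
# the limit-vs-statistics study repeats this for N from 100 to 100,000.
for N in (100, 10_000, 100_000):
    counts = rng.multinomial(N, fpdf)
    print(N, counts.sum())  # every draw conserves the total event count
```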
8.2.1 Fisher Linear Discriminant Results
Shown below are plots of all FLD input parameter distributions for the Full Monte Carlo
DCP and SCP simulation.
Figure 36: DCP and SCP - Training Phase - Mixed Plots
Below, in Figure 37, we present the discrimination power of each parameter and their
relative ranking. As was the case in the Toy MC, time-over-threshold remains the most
powerful discriminant, but both Pixel and LAr energy loss (dE/dx) provide greater
discrimination power than trailing edge time. Additionally, we present the scatter profiles
of the various parameters in Figure 38 along with their precise correlation matrices in
Appendix B.
Rank   Parameter        Discrimination Power
1      TOT              0.2790
2      Pixel (dE/dx)    0.2756
3      LAr (dE/dx)      0.2606
4      TE               0.2213
Figure 37: DCP and SCP - Parameter Discrimination Power
(a) Pixel (dE/dx) vs. TOT
(b) Pixel (dE/dx) vs. TE
(c) LAr (dE/dx) vs. TOT
(d) LAr (dE/dx) vs. TE
(e) Pixel (dE/dx) vs. LAr (dE/dx)
(f) TE vs. TOT
Figure 38: DCP and SCP - Scatter Profiles: Above we present the scatter profiles of
time-over-threshold (TOT), trailing edge time (TE), energy loss in the Pixel Detector (Pixel
dE/dx), and energy loss in the LAr calorimeters (LAr dE/dx). SCP events are represented
by black points and DCP events are represented by contour plots.
In Figure 39, we present the resulting Fisher PDFs for signal and background.
(a) TOT and TE
(b) TOT, TE, and Pixel (dE/dx)
(c) TOT, TE, Pixel (dE/dx), and LAr (dE/dx)
Figure 39: DCP and SCP - FPDFs: (a) Separation of 0.756 achieved with a maximum
significance of 1.845 using only time-over-threshold (TOT) and trailing edge time (TE) from
the TRT. (b) Separation of 0.824 achieved with a maximum significance of 2.089 using TOT,
TE, and energy loss (dE/dx) from the Pixel Detector. (c) Separation of 0.944 achieved with
a maximum significance of 2.871 using TOT, TE, Pixel dE/dx, and LAr dE/dx from the
calorimeters.
8.2.2 Maximum Likelihood Fitting
Below we present upper limits set on the fraction, i.e., β/α, of DCP events out of 10,000
total simulated unknown events using the methods described in §7.1 and the beginning
of this section. The likelihood function utilized, which was not Gaussian constrained, is
illustrated as a function of DCP events per 10,000 instead of the ratio of β to α for clarity.
Since all likelihood functions were strongly peaked at zero, the lower bound for the 95%
confidence interval is simply null.
Figure 40: ML Fitting with TRT FLD Analysis: To 95% confidence, we set upper limits
on the fraction of DCP events per 10,000 to be 23 × 10−4 using Fisher Linear Discriminant
analysis of trailing edge time (TE) and time-over-threshold (TOT) from the TRT detector.
The top plot shows how well the simulated data was fit by the values of α and β that
maximize the likelihood.
Figure 41: ML Fitting with TRT and Pixel (dE/dx) FLD Analysis: To 95% confidence, we set upper limits on the fraction of DCP events per 10,000 to be 12 × 10−4 using
Fisher Linear Discriminant analysis of Trailing-edge time (TE) and time-over-threshold
(TOT) from the TRT detector, and energy loss (dE/dx) from the Pixel Detector. The top
plot shows how well the simulated data was fit by the values of α and β that maximize the
likelihood.
Figure 42: ML Fitting with TRT, Pixel (dE/dx), and LAr (dE/dx) FLD Analysis:
To 95% confidence, we set upper limits on the fraction of DCP events per 10,000 to be
6.9 × 10−4 using Fisher Linear Discriminant analysis of Trailing-edge time (TE) and timeover-threshold (TOT) from the TRT detector, energy loss (dE/dx) from the Pixel Detector,
and energy loss (dE/dx) from the LAr calorimeters. The top plot shows how well the
simulated data was fit by the values of α and β that maximize the likelihood.
Total Number of Events   DCP Fraction Limit Set   DCP Num. Events Limit Set
100                      4.9E-02                  4.9
200                      2.3E-02                  4.6
300                      1.6E-02                  4.7
500                      9.5E-03                  4.7
800                      5.7E-03                  4.6
1,000                    4.8E-03                  4.8
2,000                    2.7E-03                  5.5
3,000                    2.0E-03                  6.1
5,000                    1.2E-03                  5.9
8,000                    8.3E-04                  6.6
10,000                   6.4E-04                  6.4
20,000                   3.5E-04                  7.0
30,000                   2.4E-04                  7.2
50,000                   1.4E-04                  5.9
80,000                   8.8E-05                  6.6
100,000                  6.8E-05                  6.8
Figure 43: Upper DCP Fraction Limit Set as a Function of the Total Number of Events:
Above we present the upper limits set on the fraction of doubly charged particle events as
a function of the total number of events in our unknown dataset. Each entry represents the
average taken over 10 samples. Since the likelihoods were all strongly peaked at zero, the
lower limit set for all entries is simply null.
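The table suggests the event-number sensitivity is roughly flat while the fraction limit falls approximately as 1/N. A quick consistency check, recomputing the implied event limit (fraction times total) from the tabulated values; the recomputed values agree with the tabulated event-limit column to within rounding of the quoted fractions for most rows:

```python
# Tabulated values from Figure 43: total events and the fraction limit set.
totals = [100, 200, 300, 500, 800, 1_000, 2_000, 3_000, 5_000,
          8_000, 10_000, 20_000, 30_000, 50_000, 80_000, 100_000]
fractions = [4.9e-2, 2.3e-2, 1.6e-2, 9.5e-3, 5.7e-3, 4.8e-3, 2.7e-3, 2.0e-3,
             1.2e-3, 8.3e-4, 6.4e-4, 3.5e-4, 2.4e-4, 1.4e-4, 8.8e-5, 6.8e-5]

# Implied DCP event limit = fraction * total: roughly constant (a few events)
# across three decades in N while the fraction falls approximately as 1/N.
event_limits = [f * n for n, f in zip(totals, fractions)]
print([round(e, 1) for e in event_limits])
assert all(4.5 < e < 7.3 for e in event_limits)
```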
9 Conclusions
For the full Monte Carlo of Doubly Charged Particle (DCP) and Singly Charged Particle
(SCP) events detailed in §8.2, time-over-threshold from the TRT, energy loss (dE/dx) from
the Pixel Detector, and energy loss from the LAr calorimeters are found to offer similar
discrimination powers of 0.2790, 0.2756, and 0.2606, respectively; discrimination power is
measured using the traces of the between-class and within-class matrices, $B^{kk}/(B^{kk} + W^{kk})$.
Trailing edge time offered the smallest discrimination power, 0.2213.
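For a single variable, this trace-ratio figure of merit reduces to the squared mean difference over the summed within-class variances. The sketch below is illustrative only (toy Gaussian samples, not the thesis Monte Carlo), and the normalization convention for the scatters may differ from the thesis convention by factors of the sample sizes:

```python
import numpy as np

def discrimination_power(sig, bkg):
    """One-variable trace-ratio figure of merit B/(B + W), with the
    between-class term B = (mean difference)^2 and the within-class
    term W taken as the sum of the per-class variances (one common
    normalization; the thesis convention may differ by factors of N)."""
    B = (sig.mean() - bkg.mean()) ** 2
    W = sig.var() + bkg.var()
    return B / (B + W)

rng = np.random.default_rng(0)
sig = rng.normal(1.0, 1.0, 10_000)  # hypothetical TOT-like signal values
bkg = rng.normal(0.0, 1.0, 10_000)  # hypothetical background values
# close to 1/3 for unit-width Gaussians one sigma apart
print(discrimination_power(sig, bkg))
```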
The Fisher Linear Discriminant (FLD) analysis of the full Monte Carlo produced Fisher
distributions of signal and background with separation, Σ, and significance, σ, of i)
Σ = 0.756 and σ = 1.845 using only time-over-threshold (TOT) and trailing edge time
(TE) from the Transition Radiation Tracker, ii) Σ = 0.824 and σ = 2.089 using TOT, TE,
and energy loss (dE/dx) from the Pixel Detector, and iii) Σ = 0.944 and σ = 2.871 using
TOT, TE, energy loss from the Pixel Detector, and energy loss from the LAr calorimeters.
In agreement with the discrimination power results, the separation and significance of the
Fisher distributions increase substantially when the energy loss (dE/dx) measurements
from the Pixel Detector and the LAr calorimeters are incorporated.
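A common way to compute a separation of this kind, assuming the TMVA-style histogram definition (an assumption; the thesis defines its Σ earlier in the text) in which separation is half the summed squared bin differences over bin sums of the unit-normalized signal and background histograms, is:

```python
import numpy as np

def separation(signal, background, bins=50):
    """Histogram separation 0.5 * sum (s_i - b_i)^2 / (s_i + b_i) over
    unit-normalized bins: 0 for identical shapes, 1 for disjoint ones."""
    lo = min(signal.min(), background.min())
    hi = max(signal.max(), background.max())
    s, _ = np.histogram(signal, bins=bins, range=(lo, hi))
    b, _ = np.histogram(background, bins=bins, range=(lo, hi))
    s = s / s.sum()
    b = b / b.sum()
    nonzero = (s + b) > 0
    return 0.5 * np.sum((s[nonzero] - b[nonzero]) ** 2 / (s[nonzero] + b[nonzero]))

rng = np.random.default_rng(1)
fisher_sig = rng.normal(1.5, 1.0, 50_000)  # hypothetical Fisher outputs, signal
fisher_bkg = rng.normal(0.0, 1.0, 50_000)  # hypothetical Fisher outputs, background
print(separation(fisher_sig, fisher_bkg))
```

By construction the measure grows as the two Fisher distributions pull apart, which is the behavior reported above when more charge measurements are added.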
Using the FLD analysis and the maximum likelihood fitting technique detailed in this
thesis, upper limits on the ratio of DCP to SCP events out of 10,000 events were found to
be i) 23 × 10−4, ii) 12 × 10−4, and iii) 6.9 × 10−4 to 95% confidence. These limits are set
using the simulated data detailed in §8.2 and do not reflect upper limits set on real event
data. As above, the upper limits were attained using i) only TOT and TE measurements,
ii) TOT, TE, and energy loss from the Pixel Detector, and iii) TOT, TE, pixel energy loss,
and energy loss from the LAr calorimeters. The number of DCP events we are sensitive to
using all four measurements shows no strong dependence on event statistics, with the
total number of events ranging from 100 to 100,000.
The above results provide strong evidence of the additional sensitivity gained by adding the energy loss from the Pixel Detector and LAr calorimeters to the TRT charge
measurements in anomalously charged particle searches. Adding energy loss from the Pixel
Detector to the TRT measurements approximately halves the expected upper limits from
the TRT alone, and adding energy loss from the LAr calorimeters halves this limit again,
demonstrating the value of combining energy loss information from all detector components.
Although the relative improvement in the measures of discriminating power (separation,
significance, and the upper limits set on β/α) is significant, the individual measurements
of discriminating power should not be considered nearly as definitive. Firstly, the limits
set above were attained using a Monte Carlo composed of Drell-Yan, singly charged q-balls,
whereas searches for ACP events will need to examine primarily muon-like SM backgrounds,
such as Z → µ+ + µ−. Additionally, the maximum achieved separation of 0.944 found in
the FLD analysis is likely to diminish when further selection cuts are made in the data
analysis; the limits set thereafter, which depend heavily on these Fisher distributions,
will change accordingly.
A Appendix: Derivation of the Fisher Linear Discriminant
Although it may not be entirely necessary for us to derive the FLD for this thesis, there are
three reasons motivating its inclusion here. Firstly, out of all the multivariate techniques
commonly used (FLD, neural networks, boosted decision trees, etc.), the FLD stands as
one of the few that is simple enough to derive and truly understand the process involved.
It is much less of a ‘black box’ than some of the other techniques listed. The second reason
for including this derivation is that the FLD plays such a central role in the analysis of
this thesis and the conclusions drawn. The last reason is simply that I could not find a
derivation of the FLD that was to my liking; few both use explicit notation and fully
explain their results. So here, I include what I hope will offer a clearer explanation of the FLD.
Suppose we have two classes of objects, signal and background, that we wish to discriminate. The signal class has $N_1$ vectors∗
\[ x(1), x(2), \ldots, x(N_1) \]
that are $n$-dimensional and describe $N_1$ instances of the signal class. Each component
of these vectors describes a different characteristic of that instance; for example, in our
analysis, each vector would represent a particular event and would have time-over-threshold
as one of its components. Similarly, the background class has $N_2$ vectors
\[ X(1), X(2), \ldots, X(N_2) \]
of the same dimension, with each component corresponding to the same characteristics as
$x(i)$, $i = 1, \ldots, N_1$. Now, we want to project these points from $\mathbb{R}^n$ onto $\mathbb{R}$. To begin,
we consider projecting each instance onto some unit vector $v \in \mathbb{R}^n$ and determining the
length of the resulting projection. Specifically, for some point $x$, the length of its
projection onto $v$ is simply the dot product
\[ v_i x^i , \]
where we have used generalized Einstein notation: repeated indices are summed over
all $n$ components. That is,
\[ v_i x^i \equiv \sum_{i=1}^{n} v_i x^i . \]
Note that these indices are purely spatial, so we do not need to worry about any negative
signs from a metric.
To measure the separation of the signal and background events projected onto this
vector, we look at the mean projected distances from the origin. The signal and
background mean projected distances are
\[ \tilde{m} = \frac{1}{N_1} \sum_{x} v_i x^i \]
and
\[ \tilde{M} = \frac{1}{N_2} \sum_{X} v_i X^i , \]
respectively. Now we can note that
\[ \tilde{m} = \frac{1}{N_1} \sum_{x} v_i x^i = v_i \left( \frac{1}{N_1} \sum_{x} x^i \right) = v_i m^i , \]
where we have defined $m^i$ to be the mean of the signal sample. We can similarly find
that $\tilde{M} = v_i M^i$, where $M^i$ is the mean of the background sample. Note the difference
in dimensionality in this discussion: the mean projected distances are scalars, whereas
the means of the samples are $n$-dimensional vectors.

∗ Once again, for the purists out there, replace every instance of the word 'vector' with 'n-tuple' if it is more to your liking.
It would almost seem sufficient to merely use $(\tilde{m} - \tilde{M})^2$ as our measure of separation
between the two classes, but this would not take into account the variance of each sample,
which is a measure of the separation within each class. To solve this problem, we incorporate
the scatter [25] of the samples,
\[ \tilde{\sigma}^2 = \sum_{x} (v_i x^i - \tilde{m})^2 \]
for the signal and
\[ \tilde{S}^2 = \sum_{X} (v_i X^i - \tilde{M})^2 \]
for the background. These quantities are equal to the variances up to factors of $N_1$ and $N_2$.
With these definitions, we can define a more suitable measure of separation for the
Fisher discriminant, $\mathcal{J}$, given by
\[ \mathcal{J}(v_i) = \frac{(\tilde{m} - \tilde{M})^2}{\tilde{\sigma}^2 + \tilde{S}^2} = \frac{(v_i m^i - v_i M^i)^2}{\sum_{x} (v_i x^i - v_i m^i)^2 + \sum_{X} (v_i X^i - v_i M^i)^2} . \]
This seems much more cumbersome to work with, but we can simplify things quite a bit.
First, expanding and rearranging the numerator gives∗
\[ (v_i m^i - v_i M^i)^2 = v_i (m^i - M^i)\, v_j (m^j - M^j) = v_i (m^i - M^i)(m^j - M^j)\, v_j = v_i B^{ij} v_j , \]
where $B^{ij}$ is the between-class matrix
\[ B^{ij} = (m^i - M^i)(m^j - M^j) . \]
∗ This is why I have chosen to use tensor notation instead of conventional matrices. Since each of these variables is merely a scalar, we do not have to worry about the non-abelian nature of matrix multiplication or need to transpose any objects.

Here, the notation of raised and lowered indices on one letter indicates that the quantity
is a matrix; $B^{ij}$ refers to the $(i, j)$-th component of the matrix $B$.
Next, we simplify the first term of the denominator:
\[ \tilde{\sigma}^2 = \sum_{x} (v_i x^i - v_i m^i)^2 = \sum_{x} v_i (x^i - m^i)\, v_j (x^j - m^j) = v_i \left( \sum_{x} (x^i - m^i)(x^j - m^j) \right) v_j = v_i \sigma^{ij} v_j , \]
where $\sigma^{ij}$ is the signal scatter (unnormalized covariance) matrix, defined as
\[ \sigma^{ij} = \sum_{x} (x^i - m^i)(x^j - m^j) . \]
The same process can be applied to $\tilde{S}^2$, yielding the background scatter matrix
\[ S^{ij} = \sum_{X} (X^i - M^i)(X^j - M^j) . \]
Combining these terms for the denominator,
\[ \tilde{\sigma}^2 + \tilde{S}^2 = v_i \sigma^{ij} v_j + v_i S^{ij} v_j = v_i (\sigma^{ij} + S^{ij}) v_j = v_i W^{ij} v_j , \]
gives us the within-class matrix∗ $W^{ij}$,
\[ W^{ij} = \sigma^{ij} + S^{ij} . \]
Now, with all the cumbersome variable definitions out of the way, we can represent $\mathcal{J}$
in its cleanest form,
\[ \mathcal{J}(v_i) = \frac{v_i B^{ij} v_j}{v_i W^{ij} v_j} , \]
which is also known as the generalized Rayleigh quotient. Next, since $\mathcal{J}$ is a measure of
the separation of the samples projected onto $v$, we want to maximize it with respect to $v_i$:
\[ \frac{d\mathcal{J}}{dv_i} = 0 \;\Rightarrow\; \frac{B^{ij} v_j \,(v_l W^{lk} v_k) - W^{ij} v_j \,(v_l B^{lk} v_k)}{(v_l W^{lk} v_k)^2} = 0 \;\Rightarrow\; B^{ij} v_j - \frac{v_l B^{lk} v_k}{v_l W^{lk} v_k}\, W^{ij} v_j = 0 \;\Rightarrow\; B^{ij} v_j = \lambda W^{ij} v_j , \]
where $\lambda$ is some scalar. As long as $W$ is invertible,
\[ (W^{-1})_{ni} B^{ij} v_j = \lambda v_n . \]

∗ If it is not clear, the naming of these matrices is supposed to reflect the notion that the between-class matrix measures the separation between the classes and the within-class matrix measures the separation within each class.
We recognize this as a matrix eigenvalue problem. Using that
\[ B^{ij} v_j = (m^i - M^i)\,[(m^j - M^j) v_j] = (m^i - M^i)\, c , \]
where we have labeled the second scalar term, $(m^j - M^j) v_j$, as $c$, our eigenvalue problem becomes
\[ c\,(W^{-1})_{ni}(m^i - M^i) = \lambda v_n . \]
Since we only care about the direction of $v$ and not its overall scale, the scalar factors
$c$ and $\lambda$ may be dropped. Hence, we find that the vector that maximizes the Fisher
Linear Discriminant is given by
\[ v_n = (W^{-1})_{ni}(m^i - M^i) , \]
or, in more common matrix notation,
\[ v = W^{-1}(m - M) . \]
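The closed-form result above translates directly into a few lines of numpy. The sketch below (not the thesis implementation; all names and numbers are hypothetical) uses toy two-dimensional Gaussian samples to verify that the recovered direction points along the axis that actually separates the classes:

```python
import numpy as np

def fisher_direction(sig, bkg):
    """v = W^{-1}(m - M): the direction maximizing the Fisher criterion.
    sig, bkg are arrays of shape (N_events, n_variables)."""
    m = sig.mean(axis=0)                      # signal mean vector m^i
    M = bkg.mean(axis=0)                      # background mean vector M^i
    sigma = (sig - m).T @ (sig - m)           # signal scatter matrix
    S = (bkg - M).T @ (bkg - M)               # background scatter matrix
    return np.linalg.solve(sigma + S, m - M)  # solve W v = (m - M)

rng = np.random.default_rng(42)
sig = rng.normal([2.0, 0.0], 1.0, size=(5_000, 2))  # toy signal, shifted in x only
bkg = rng.normal([0.0, 0.0], 1.0, size=(5_000, 2))  # toy background
v = fisher_direction(sig, bkg)
print(v / np.linalg.norm(v))  # close to (1, 0): x carries all the separation
```

Since only the direction of $v$ matters, the overall normalization returned by the solver is irrelevant; projecting events onto $v$ yields the one-dimensional Fisher distributions used in the analysis.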
B Appendix: Correlation Matrices

B.1 Toy Monte Carlo
Figure 44: Toy MC - DCP and SCP - Correlation Matrices: (a) Signal, (b) Background.
Figure 45: Toy MC - TTCP and SCP - Correlation Matrices: (a) Signal, (b) Background.
Figure 46: Toy MC - OTCP and SCP - Correlation Matrices: (a) Signal, (b) Background.

B.2 Full Monte Carlo
Figure 47: Full MC - DCP and SCP - Correlation Matrices: (a) Signal, (b) Background.

C Appendix: Cut Efficiency and Significance Plots

C.1 Toy Monte Carlo
Figure 48: DCP and SCP - FLD Cut Efficiencies and Significance: Shown are the
cut efficiency plots for various ratios of the number of background events to the number
of signal events.
Figure 49: TTCP and SCP - FLD Cut Efficiencies and Significance
Figure 50: OTCP and SCP - FLD Cut Efficiencies and Significance

C.2 Full Monte Carlo

Figure 51: DCP and SCP - FLD Cut Efficiencies and Significance
References
[1] V. A. Aliev et al., Search for doubly charged particles as stable constituents of composite
dark matter in the ATLAS experiment. ATLAS Collaboration, ATLAS NOTE, August
11, 2011.
[2] S. Coleman, Q-balls. Nucl. Phys. B 262 (1985) 263.
[3] ATLAS Collaboration, The ATLAS Experiment at the CERN Large Hadron Collider.
JINST 3 S08003, 2008. URL: http://iopscience.iop.org/1748-0221/3/08/S08003
[4] ATLAS Collaboration, dE/dx measurement in the ATLAS Pixel Detector and its use
for particle identification. ATLAS-CONF-2011-016, Mar 2011.
[5] ATLAS Collaboration, ATLAS Detector and Physics Performance Technical Design
Report. ATLAS TDR 14, CERN/LHCC 99-14, 25 May 1999.
[6] ATLAS Collaboration, ATLAS Etours. URL http://www.atlas.ch/etours. 2011
[7] ATLAS Experiment, Multimedia - ATLAS Photos. URL: http://www.atlas.ch/photos. Web, March 15, 2011.
[8] G. Azuelos, K. Benslama and J. Ferland, arXiv:hep-ph/0503096v1
[9] R. Brun and F. Rademakers, ROOT - An Object Oriented Data Analysis Framework,
Proceedings AIHENP’96 Workshop, Lausanne, Sep. 1996, Nucl. Inst. & Meth. in Phys.
Res. A 389 (1997) 81-86. See also http://root.cern.ch/.
[10] CERN, CERN Document Server. URL: http://cdsweb.cern.ch/
[11] CERN, CERN Public Website. URL: http://public.web.cern.ch/
[12] Z. Chacko and R. N. Mohapatra, arXiv:hep-ph/9712359v1
[13] D. Griffiths, Introduction to Elementary Particles. John Wiley & Sons, INC., New
York, 1st Edition, 1987
[14] A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne, and H. Voss,
TMVA: Toolkit for Multivariate Data Analysis. PoS A CAT 040 (2007) [physics/0703039]
[15] M. D. Jørgensen, Search for long lived massive particles with the ATLAS detector at
the LHC. Niels Bohr Institute, University of Copenhagen, Master Thesis
[16] M. Jüngst et al., ExoticMultiCharge. 2012. URL: https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/ExoticMultiCharge
[17] A. Krasznahorkay, Documentation for the ATLAS Muon Trigger Slice. 2011. URL:
https://twiki.cern.ch/twiki/bin/viewauth/Atlas/MuonTriggerDocumentation
[18] M.E. Peskin and D.V. Schroeder, An Introduction to Quantum Field Theory. Westview
Press, 1995.
[19] M. L. Perl, E. R. Lee, and D. Loomba, Searches for Fractionally Charged Particles.
Annu. Rev. Nucl. Part. Sci. 2009. 59:47–65, June, 2009.
[20] K. Nakamura et al. (Particle Data Group), J. Phys. G 37, 075021 (2010)
[21] The OPAL Collaboration, Search for Heavy Charged Particles and for Particles with
Anomalous Charge in e+ e− Collisions at LEP. CERN-PPE/95-021, 27 February 1995.
[22] K.F. Riley, M.P. Hobson, S.J. Bence, Mathematical Methods for Physics and Engineering: A comprehensive guide. Cambridge University Press, 2nd Edition, 2002
[23] T. Rizzo, DOI: 10.1103/PhysRevD.25.1355,
URL: http://link.aps.org/doi/10.1103/PhysRevD.25.1355
[24] R. Shankar, Principles of Quantum Mechanics. Springer, New York, 2nd Edition, 1994
[25] O. Veksler, Pattern Recognition - Lecture 8. CS Department University of Western
Ontario, 2006