
Springer Theses
Recognizing Outstanding Ph.D. Research
José Ricardo C. C. C. Correia
A New Generation of Cosmic Superstring Simulations
Aims and Scope
The series “Springer Theses” brings together a selection of the very best Ph.D. theses
from around the world and across the physical sciences. Nominated and endorsed by
two recognized specialists, each published volume has been selected for its scientific
excellence and the high impact of its contents for the pertinent field of research. For
greater accessibility to non-specialists, the published versions include an extended
introduction, as well as a foreword by the student’s supervisor explaining the special
relevance of the work for the field. As a whole, the series will provide a valuable
resource both for newcomers to the research fields described, and for other scientists
seeking detailed background information on special questions. Finally, it provides
an accredited documentation of the valuable contributions made by today’s younger
generation of scientists.
Theses may be nominated for publication in this series by heads
of department at internationally leading universities or institutes
and should fulfill all of the following criteria
• They must be written in good English.
• The topic should fall within the confines of Chemistry, Physics, Earth Sciences,
Engineering and related interdisciplinary fields such as Materials, Nanoscience,
Chemical Engineering, Complex Systems and Biophysics.
• The work reported in the thesis must represent a significant scientific advance.
• If the thesis includes previously published material, permission to reproduce this
must be gained from the respective copyright holder (a maximum 30% of the thesis
should be a verbatim reproduction from the author’s previous publications).
• They must have been examined and passed during the 12 months prior to
nomination.
• Each thesis should include a foreword by the supervisor outlining the significance
of its content.
• The theses should have a clearly defined structure including an introduction
accessible to new PhD students and scientists not expert in the relevant field.
Indexed by zbMATH.
José Ricardo C. C. C. Correia
A New Generation of Cosmic Superstring Simulations
Doctoral Thesis accepted by
Centro de Astrofísica da Universidade do Porto, Rua das Estrelas s/n, Porto, Portugal
Author
Dr. José Ricardo C. C. C. Correia
Cosmology Thematic Line
Centre for Astrophysics of the University
of Porto
Porto, Portugal
Supervisor
Prof. Carlos Martins
Centro de Astrofísica e Astronomia da Universidade do Porto
Instituto de Astrofísica e Ciências do Espaço
Porto, Portugal
ISSN 2190-5053
ISSN 2190-5061 (electronic)
Springer Theses
ISBN 978-3-031-20228-5
ISBN 978-3-031-20229-2 (eBook)
https://doi.org/10.1007/978-3-031-20229-2
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
In memory of
Armanda Oliveira Pereira Campos
Maria Inês Correia
Maria Branca da Silva Santos
Supervisor’s Foreword
Cosmic string networks are the best motivated fossil relics of the early universe.
Our current understanding of particle physics implies that they must have formed
in cosmological phase transitions, but so far they have not been detected, which
could in principle place very stringent constraints on several theoretical paradigms.
This is also the case in superstring-inspired inflation models, where so-called cosmic
superstrings may form, providing a cosmological-scale fossil of string theory. Such
constraints primarily come from the cosmic microwave background, pulsar timing
and gravitational waves, but recent examples find limits on relevant model parameters
which span more than ten orders of magnitude. It follows that existing constraints are
manifestly unreliable, and more robust ones are mandatory. How can this problem
be solved?
Being intrinsically nonlinear objects, the study of the cosmological evolution of
defect networks unavoidably requires both analytic modeling (the canonical model
thereof being due to Martins and Shellard) and high-resolution numerical simulations.
Thus, for the sake of simplicity, one often focuses on the simplest such
defects, e.g., Abelian-Higgs strings, but realistic cosmic strings will have non-trivial
internal structure, including charges and currents. And realistic networks will lead
to observational predictions that differ from those of simpler strings.
This is where José Ricardo Correia's thesis comes in. Its main outcome is the
first GPU-accelerated defect network evolution code (including technical in situ
visualization advances). The code is more than 30 times faster than previous-generation
CPU-based codes, which has removed a 20-year bottleneck in the field
and opened new avenues of research, including extending defect studies beyond the
simplest Abelian-Higgs strings to more realistic superconducting strings and cosmic
superstrings.
The new code enables the world's largest (8192³ or larger) field theory simulations
of plain cosmic strings, superconducting strings, and then cosmic superstrings,
complementing ongoing work in our team to develop general analytic evolution
models for cosmic strings with internal degrees of freedom, including charges and
currents. Specifically, the numerical simulation diagnostics (densities, RMS velocities, correlation lengths, loop distributions, etc.) will be used to obtain, through a
full MCMC-based analysis, a rigorous calibration of these analytic models.
Ultimately, this thesis paves the way for improving the reliability of observational
constraints on cosmic strings and superstrings. The GPU-accelerated code enables
generating thousands of full-sky template maps, which are crucial to obtain reliable
predictions, and forecasts, for the CMB signatures of these networks and for their
stochastic gravitational wave background. (Currently these are based on a few
maps or naive toy models.) The ultimate goal is to develop tools to search for these
fossils not only with current observational facilities, but also with forthcoming ones
(such as the SKAO) and longer-term ones (such as LISA).
This thesis therefore lays out a new paradigm in field theory simulations of topological
defect networks. As such it will be not only an ideal starting point for the new generation
of students entering the field, but also an important reference for researchers
wanting to catch up on what is, undoubtedly, the new gold standard in the field.
Porto, Portugal
July 2022
Carlos Martins
Abstract
Topological defects are an unavoidable consequence of some phase transitions in
the early Universe, being produced via the Kibble mechanism. Depending on the
symmetry broken, different defects can be produced, from planar domain walls, to
line-like cosmic strings, to point-like monopoles. The safest of these, in the sense
that it usually cannot overclose the Universe, and therefore the most studied, is the
cosmic string. Given that these objects would provide unequivocal evidence of new
physics beyond the Standard Model, the expected imprints of such defect networks
are often primary targets for current and upcoming observational facilities.
In order to study the evolution of defect networks for analytical and observational
purposes, there are two possible approaches, one based on simulations and
another based on semi-analytical models. Although one can think of them as separate
approaches, the relationship between the two is in reality symbiotic.
For example, given the orders of magnitude over which the typical defect width
changes throughout cosmological history, and the large dynamic range required to
evolve for such a long conformal time, simulations are bottlenecked and cannot
follow a network of defects throughout its entire lifetime. The solution is
to use a semi-analytical model which, properly calibrated, can evolve some mean
quantities describing the network throughout its life. However, we emphasize that this
requires proper calibration of a priori unknown model parameters, which can be
done via a comparison with the scale-invariant evolution of a string network from a
simulation.
To set tighter experimental constraints and probe larger parameter spaces of theoretical
models, one often requires ever higher resolution and larger dynamic
range in simulations. We set out to alleviate this problem. One way to do so is to
explore the use of graphical supercomputing to accelerate field theory simulations
of defects. All of the work necessary to achieve this is presented in Chap. 3. In the
aforementioned chapter, we first present a simulation of domain walls which can
only use a single graphical accelerator, and then move on to single and multiple
accelerator versions of Abelian-Higgs cosmic string simulations. The main result is
that there are indeed noticeable gains, either in speed-ups or in reduction of required
computational time. With this we can simulate the largest lattices attainable
so far: 8192³ for Abelian-Higgs strings, using 4096 graphical accelerators.
Equipped with faster and higher resolution simulations, we set out to calibrate
the canonical semi-analytical model of defect evolution—the Velocity-dependent
One-Scale (VOS) model—in Chap. 4. Specifically, we explore the calibration of a
six-parameter version of this model, extended to take into account energy loss via
radiation and the possibility of wiggles on small scales. To do so, we first explore
a calibration of walls that re-enter the horizon after being pushed out by inflation.
Following calibration, we then test if this model properly predicts the behavior of
the network as it goes from frozen outside the horizon, to re-entering, to
achieving scale-invariant evolution. We then set out to update the calibration for
Abelian-Higgs strings. Firstly, this was done by using the single-GPU
version of the simulation, which gave us a preliminary set of model parameters
and allowed us to study the effects of overcooled initial conditions. Secondly, the
hardware resources of Piz Daint and the ability of the simulation code to use them
have allowed a high-resolution calibration with a proper exploration of possible
sources of systematic error. This then led us to discuss the impact of said
systematics on observational predictions derived from the VOS.
To conclude, we then present the end-goal of this thesis project in Chap. 5: to
simulate a toy model of cosmic superstrings. Although cosmic superstrings are very
different from field theory strings, and such a toy model cannot properly capture all the
relevant phenomenological effects, it does permit the evolution and interaction of two
string networks. More specifically, it allows the formation of networks of combined
bound-state strings, which eventually enter a scale-invariant regime. Although the current
literature has shown that this combined network is rather sparse, this presumes
certain numerical and parameter choices, which can plausibly inhibit or enhance
bound-state formation. One such effect we explore is changing the behavior
of the model coupling to allow shrinking, growing, and constant string widths.
Overall, we have developed and implemented string and wall simulations which
can be used for future studies on a variety of topics, ranging from the study of
small-scale structure (relying heavily on the centerline capabilities) to a further
exploration of dual U(1) models. We remark as well that there are numerous computational
upgrades that can be made to further exploit extreme-scale hardware.
Acknowledgements
This work was financially supported by a fellowship from Fundação para a Ciência
e a Tecnologia (SFRH/BD/130445/2017). Due to the computationally demanding
nature of the work developed, there are also some acknowledgements to be made
in terms of hardware donated/accessed. We gratefully acknowledge the support of
NVIDIA Corporation with the donation of the Quadro P5000 GPU used for the
beginning of this research. In addition, we acknowledge PRACE—Partnership for
Advanced Computing in Europe—for awarding us access to Piz Daint at CSCS—
Centro Svizzero di Calcolo Scientifico—Switzerland, through Preparatory Access
proposal 2010PA4610, Project Access proposal 2019204986, and Project Access
proposal 2020225448.
I greatly thank my supervisor Carlos Martins for believing in the crazy idea that
led to this thesis and for supporting me throughout the development of this work.
His patience, guidance, and knowledge have without a shadow of doubt molded my
way of thinking and, consequently, a lot of this manuscript. In terms of interesting
discussions, I would also like to thank Asier Lopez-Eiguren, Ivan Rybak, José Vieira,
Jon Urrestilla, Daniel Jimenez-Aguilar, José Juan Blanco Pillado, Guy Moore, Lara
Sousa, Luisa Maria Serrano, João Camacho, and João Faria. In addition, I would
also like to acknowledge visualization guru Jean Favre at CSCS for his technical
support and for his deep knowledge of ParaView and VTK. Without his know-how
the centerlines approach would not have been developed (and this thesis would be
shorter). I would also like to thank all the students of Carlos Martins with whom
I’ve worked throughout the duration of the doctoral program—Manuel Rosa, Diogo
Gomes, Ana Almeida, Filipe Costa, Siri Berge—we were, and still are, a good team.
No man is an island, and throughout the past 4.5 years there have been multiple
people who've made this complicated journey a lot more bearable, by being there
when I needed them the most, and by celebrating my successes. I deeply thank my parents
Armanda and José, whose unwavering support, dedication, and love have shaped me
into who I am today. A special thank you as well to my grandfather Manuel, who
taught me the basics of programming with qBASIC, some bits of Latin, Greek, and a
lot of history. He also helped raise me and showed love and guidance when needed.
I owe everything I am to all the folk I mentioned and I hope I have made them proud
with this Thesis.
I’m deeply sorry my late grandmothers Armanda Oliveira Pereira Campos and
Maria Inês Correia, and my late great-aunt Maria Branca da Silva Santos could not
see me become a doctor and I gratefully acknowledge their kindness, their love, and
their wisdom. To them I respectfully dedicate this manuscript.
Last but not least, a big thank you to all my friends who, either via beers, coffees,
board game sessions or via a friendly office environment, have never failed to put
a smile on my face. Allow me to name a few (apologies if I forget someone) in no
particular order—Bachar, Jorge, Camacho, Saeed, Faria, André, Leal, Oliveira, Inês,
Elisa, Miguel, Susana, Olivier, Vardan. To conclude, I would also like to state that
the coolest office in CAUP is 1.01—no doubt due to its occupants and the piles of
boxes of Beira Douro coffee.
Contents
1 A Brief Description of Cosmology ... 1
  1.1 The History of Cosmology ... 1
  1.2 The Beginning of Modern Cosmology ... 2
  1.3 Missing Ingredients ... 6
    1.3.1 Missing Ingredient I—Inflation ... 6
    1.3.2 Missing Ingredient II—Dark Energy and Dark Matter ... 9
  1.4 Scope of the Thesis ... 10
  References ... 12

2 Topological Defects ... 15
  2.1 Solitons and Topology ... 15
  2.2 Global Domain Walls ... 16
  2.3 Abelian-Higgs Strings ... 18
  2.4 Beyond Abelian-Higgs ... 19
  2.5 Kibble Mechanism ... 23
  2.6 Simulations of Defects ... 24
    2.6.1 Global Domain Walls ... 24
    2.6.2 Abelian-Higgs Strings ... 27
  2.7 Network Evolution Modelling ... 31
    2.7.1 Standard Velocity Dependent One-Scale Model ... 32
    2.7.2 Extended Velocity Dependent One-Scale Model ... 35
    2.7.3 Observational Footprints from Semi-Analytical Modelling ... 36
  2.8 Summary ... 42
  References ... 43

3 Supercomputing with Graphics Processing Units ... 47
  3.1 An Introduction to High Performance Computing ... 47
    3.1.1 Amdahl and Gustafson's Laws ... 49
    3.1.2 Architectures and Programming Paradigms ... 50
  3.2 Global Domain Walls ... 56
    3.2.1 Single Accelerator ... 57
  3.3 Abelian-Higgs Strings ... 62
    3.3.1 Single Accelerator ... 65
    3.3.2 Multiple Accelerators ... 77
  3.4 In-Situ Analysis and Visualization Pipeline ... 88
    3.4.1 Reduced Winding Output ... 88
    3.4.2 Centerlines Post-Processing ... 91
  3.5 Conclusion ... 93
  References ... 95

4 Calibration of Extended VOS Models ... 99
  4.1 Prelude ... 99
  4.2 Global Domain Walls ... 100
    4.2.1 Walls Formed Before the End of Inflation ... 100
    4.2.2 A Primer on Calibrating the Extended VOS–Domain Walls ... 102
  4.3 Abelian-Higgs Cosmic Strings ... 109
    4.3.1 Calibrations on Small Lattices–A First Approach ... 109
    4.3.2 Overcooled Initial Conditions ... 119
    4.3.3 Further Exploration of Model Sensitivity to Numerical Choices ... 129
    4.3.4 Coda: Observational Impact of Different Calibrations ... 141
  4.4 Conclusion ... 144
  References ... 145

5 Strings in U(1)_L × U(1)_L Simulations ... 149
  5.1 Simulation Overview ... 150
  5.2 Validation ... 154
    5.2.1 On Average Network Quantities ... 155
    5.2.2 On Locating pq-Segments ... 159
  5.3 Impact of Physical Evolution ... 166
  5.4 Conclusion ... 174
  References ... 175

6 A New Generation of String Simulations ... 177
  6.1 Overview and Concluding Remarks ... 177
  6.2 Next Steps ... 180
    6.2.1 Computational Improvements ... 180
    6.2.2 Small-Scale Structure of Abelian-Higgs Strings ... 181
    6.2.3 Further Exploration of String Networks in the U(1)_L × U(1)_L Model ... 183
  References ... 184

Curriculum Vitae ... 187
Chapter 1
A Brief Description of Cosmology
The story so far:
In the beginning the Universe was created.
This has made a lot of people very angry and been widely regarded as a bad move.
Douglas Adams
1.1 The History of Cosmology
Cosmology is almost as old as humankind. The question of how and when "everything"
started is a problem tackled by every ancient civilization in human history. It
started in the Neolithic period (roughly 12,000 years ago), when cosmology was more local
and based on phenomena such as the weather, the moon and earthquakes, and evolved into a
cosmology based on myths and the supernatural with the Egyptian and Mesopotamian
civilizations—Old Babylonian, Assyrian, New and Late Babylonian periods—from
about 5,000 years ago. Note that from this point on, almost every single religion
includes a mechanism/story for the creation of the Universe.
Eventually such a pressing and difficult question also troubled the Greek and
Roman civilizations, whose cosmology sprung from philosophy and geometry (circa
600 BC). In this sense, their cosmology differed from the previous models as
it included the relationship between cause and effect, and it noted the need to deliver
predictions. For example, in Plato's model of cosmology, Earth is surrounded by
spheres of Air and Fire, such that any balloon with hot air will rise to reach the sphere
of Fire. In this model—which we now know to be incorrect—Earth is at the center of
the Universe and all planets, the Sun and all other stars revolve around it. This was
the first of many geocentric models, which inevitably led to the conclusion that the
Universe was designed for us.
Scientific thought was severely stifled in the Middle Ages, but eventually, during
the 16th century, some important milestones occurred. Examples of these important
milestones include the work of Johannes Kepler and Tycho Brahe—which allowed
the first mathematical descriptions of the motion of celestial bodies and proposed
a hybrid helio-geocentric model of the Universe—and the work of Nicolaus Copernicus
and Giordano Bruno, who supported a heliocentric model of the Universe,
in which the Earth is not at the center. While Copernicus explicitly
placed the Sun at the center of his model, Giordano Bruno, by extension of
the aforementioned model, did away with the idea of stars glued to a celestial sphere,
and suggested that stars were akin to distant suns surrounded by their own planets. In
this sense, he even forsook the heliocentrism of the Copernican model, and radically
suggested that the Universe was infinite and had no center. The ideas of Copernicus and
Giordano Bruno were also supported and championed by Galileo Galilei.
In the 17th century, religious obscurantism again stifled scientific thought in
Europe; in England, however, Isaac Newton, inspired by the previous ideas about
the motion of celestial bodies, boldly proposed physical laws that led to classical
Gravitation.
The era of modern cosmology began with Einstein's theory of general relativity
in 1915. General Relativity, along with the idea of an expanding Universe put forward
by Friedmann and Lemaître in 1922 and 1927, respectively, eventually led to a consistent
mathematical description of the Universe, one that could be tested and observed.
The expanding Universe started as a hot and dense fluid which gradually expanded
into the one we observe today.
From the observational point of view, the idea of an expanding Universe was
confirmed by Edwin Hubble in 1929, with the discovery that galaxies recede away
from us. Hubble also noted that the recession velocity of each galaxy was proportional
to its distance from the observer. As a starting point for this Thesis we will begin by
describing succinctly the first ingredients of standard Cosmology and the need for
inflation and dark energy. We will then note the role defects play in Cosmology and
which defects are to be studied in this manuscript.
1.2 The Beginning of Modern Cosmology
For now, let us begin with Einstein's equation,

$$8\pi G T_{\mu\nu} = R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} \qquad (1.1)$$

where $G$ is the gravitational constant, $T_{\mu\nu}$ the energy-momentum tensor, $R_{\mu\nu}$ and $R$ are the Ricci tensor and scalar respectively, and $g_{\mu\nu}$ the metric of space-time. Greek indices run from 0 to 3, with 0 representing time and the remaining values spatial coordinates. There are two ingredients necessary to derive the dynamical equations of the
Universe: first the energy-momentum tensor of a perfect isotropic fluid and then the metric to be used (of signature +,−,−,−). The energy-momentum tensor of a perfect fluid is

$$T_{\mu\nu} = (p + \rho) U_\mu U_\nu - p g_{\mu\nu} \qquad (1.2)$$

where $U$ is the fluid velocity four-vector, which in the comoving rest frame is given by $U_\mu = (1, 0, 0, 0)$, and the fluid density and pressure in the rest frame are given by $\rho$ and $p$ respectively. The tensor itself then takes the following form,

$$T_{\mu\nu} = \begin{pmatrix} \rho g_{00} & 0 & 0 & 0 \\ 0 & -p g_{11} & 0 & 0 \\ 0 & 0 & -p g_{22} & 0 \\ 0 & 0 & 0 & -p g_{33} \end{pmatrix}. \qquad (1.3)$$
For the metric (and derived quantities such as the Ricci tensor and scalar) we will use the Friedmann-Lemaître-Robertson-Walker (from now on shortened to FLRW) metric in natural units,

$$ds^2 = dt^2 - a(t)^2 \left[ \frac{dr^2}{1 - Kr^2} + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \right] \qquad (1.4)$$

where $a(t)$ is the scale factor, the space-time coordinates are $(t, r, \theta, \phi)$, and $K$ is a constant responsible for the curvature of the spatial slices at a given time. Depending on the value of $K$ ($K < 0$, $K > 0$ or $K = 0$), the Universe will be hyperbolic/open, spherical/closed or flat, respectively. Note that the spatial part of this metric is maximally symmetric (i.e., it possesses the maximal number of Killing vectors). This is merely a reflection of the following cosmological principles:
• Homogeneity—the Universe exhibits translational invariance and thus there are no privileged points in the Universe;
• Isotropy—the Universe exhibits rotational invariance and thus there are no privileged directions.
Note that these principles hold for the Universe at large but need not hold locally.
This metric can also be re-written in terms of conformal time $\tau$,

$$ds^2 = a(\tau)^2 \left[ d\tau^2 - \frac{dr^2}{1 - Kr^2} - r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \right] \qquad (1.5)$$

where the conformal time is related to physical time via

$$dt^2 = a^2 \, d\tau^2. \qquad (1.6)$$
Inserting the two necessary ingredients into the Einstein equations, we can take the 00-component and the trace to derive the dynamical equations of the Universe: first the Friedmann equation,

$$H^2 = \frac{8\pi G}{3} \rho - \frac{K}{a^2} \qquad (1.7)$$

and the Raychaudhuri equation,

$$\dot{H} + H^2 = -\frac{4\pi G}{3} (\rho + 3p) \qquad (1.8)$$

where $H = \frac{1}{a}\frac{da}{dt}$ is known as the Hubble parameter. Before we continue with a description of the dynamics of the Universe, it is useful to define two different cosmological horizons: the Hubble horizon and the particle horizon.
The Hubble horizon is a boundary separating objects receding faster and slower than the speed of light. It is defined as

$$r_h = \frac{1}{H}, \qquad (1.9)$$

and a comoving version of it can be defined by dividing by the scale factor,

$$r_h^c = \frac{1}{aH}. \qquad (1.10)$$
The particle horizon corresponds to the maximum distance from which photons could have traveled to the observer in the age of the Universe. It thus separates the observable from the unobservable. In natural units it is given simply by the conformal time,

$$\eta = \int_0^t \frac{dt'}{a(t')}. \qquad (1.11)$$
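As a quick illustrative check (not code from the thesis), the particle horizon integral can be evaluated symbolically for a power-law scale factor $a \propto t^q$; taking the radiation-era value $q = 1/2$ shows the horizon is finite and grows with time:

```python
import sympy as sp

# Particle horizon of Eq. (1.11) for a power-law scale factor a(t) = t**q.
# q = 1/2 corresponds to the radiation era; illustrative sketch only.
t, tp = sp.symbols('t t_p', positive=True)
q = sp.Rational(1, 2)

eta = sp.integrate(1 / tp**q, (tp, 0, t))  # eta = ∫_0^t dt'/a(t')
print(sp.simplify(eta))  # 2*sqrt(t): a finite horizon, growing as t^(1-q)
```

The same computation with $q = 2/3$ (matter era) likewise gives a finite, growing horizon; only for $q \geq 1$ would the integral diverge at the lower limit.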
Throughout this thesis both will be used at some point or another. Having defined these, we now continue with the description of the dynamics of the Universe.

Using the conservation of energy-momentum, $T^{\mu\nu}{}_{;\nu} = 0$, or alternatively re-writing Eq. 1.8 with the help of a differentiated Eq. 1.7, one can write a conservation equation,

$$\frac{d\rho}{dt} + 3H(\rho + p) = 0. \qquad (1.12)$$
It is also useful to define the density of the Universe when $K = 0$, also known as the critical density,

$$\rho_c = \frac{3H^2}{8\pi G} \qquad (1.13)$$

which can be used to define a critical density parameter,

$$\Omega = \frac{\rho}{\rho_c}. \qquad (1.14)$$
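To get a feel for the numbers, Eq. 1.13 can be evaluated in SI units; the sketch below (not part of the thesis, and using the illustrative value $H_0 = 70$ km/s/Mpc) shows the present critical density is only a few proton masses per cubic metre:

```python
import math

# Critical density, Eq. (1.13), in SI units.
# H0 = 70 km/s/Mpc is an illustrative choice, not a value from the thesis.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
Mpc = 3.0857e22      # one megaparsec in metres
H0 = 70e3 / Mpc      # Hubble parameter today, s^-1

rho_c = 3 * H0**2 / (8 * math.pi * G)
print(f"rho_c = {rho_c:.2e} kg/m^3")  # ~9e-27 kg/m^3
```

Any measured density above or below this value then gives $\Omega > 1$ or $\Omega < 1$ via Eq. 1.14.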
This parameter can be used to re-write the first Friedmann equation 1.7 as

$$1 - \Omega = -\frac{K}{a^2 H^2}. \qquad (1.15)$$

Before we advance, we will take the continuity equation 1.12 and note that the fluids one considers in Cosmology obey the barotropic equation of state, $p = \omega \rho$; the appropriate substitution allows a solution of the continuity equation,

$$\rho \propto a^{-3(1+\omega)}. \qquad (1.16)$$

This already gives quite a lot of information: in any era dominated by a relativistic fluid ($\omega = \frac{1}{3}$), such as radiation, or in any era dominated by non-relativistic matter where the pressure is negligible ($\omega = 0$),

$$\rho_r \propto a^{-4}, \qquad \rho_m \propto a^{-3}. \qquad (1.17)$$
We can now use these two components to write the density ρ as a sum of component densities, ρ = Σᵢ ρᵢ = ρ_r + ρ_m, and define a critical density parameter for each
component. The first Friedmann equation can then be re-written in terms of the
critical density parameters of each component at the present time,
H² = H₀² [Ω_r,0 a^(−4) + Ω_m,0 a^(−3)]   (1.18)
Using this dynamical equation, we can solve for the scale factor in terms of time,
considering at first some barotropic fluid without specifying ω,
a ∝ t^(2/(3(1+ω)))   (1.19)
which means that the scale factor must have evolved in radiation and matter according
to,
a ∝ t^(1/2) ∝ τ

a ∝ t^(2/3) ∝ τ²   (1.20)
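As a quick consistency check (a sketch, not code from the thesis), one can integrate Eq. 1.18 numerically and measure the effective power-law index d ln a/d ln t in each era; H₀ = 1 and the density parameters below are illustrative choices, not the observational values quoted later.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate da/dt = H0 * sqrt(Om_r / a^2 + Om_m / a), i.e. Eq. 1.18,
# with illustrative parameter values and H0 = 1.
H0, Om_r = 1.0, 9.0e-5
Om_m = 1.0 - Om_r

def dadt(t, y):
    a = y[0]
    return [H0 * np.sqrt(Om_r / a**2 + Om_m / a)]

# start deep in the radiation era, on the a ~ t^(1/2) attractor
t0 = 1.0e-12
a0 = np.sqrt(2.0 * H0 * np.sqrt(Om_r) * t0)
sol = solve_ivp(dadt, (t0, 1.0), [a0], rtol=1e-10, atol=1e-18,
                dense_output=True)

def slope(t1, t2):
    """Effective power-law index d ln a / d ln t between t1 and t2."""
    a1, a2 = sol.sol(t1)[0], sol.sol(t2)[0]
    return np.log(a2 / a1) / np.log(t2 / t1)

print("radiation era:", slope(1e-10, 1e-9))  # close to 1/2
print("matter era:   ", slope(0.1, 1.0))     # close to 2/3
```

The measured indices approach 1/2 and 2/3, in agreement with Eq. 1.20.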
We note now that this description already tells us the Universe first passed through
a hot and dense phase dominated by relativistic particles (radiation domination) and
as it expanded and cooled down it transitioned to a matter-dominated era. This picture
is however incomplete, and to arrive at the accepted standard model of Cosmology
there are two fundamental missing ingredients that must be discussed.
1 A Brief Description of Cosmology
1.3 Missing Ingredients
1.3.1 Missing Ingredient I—Inflation
Inflation is one of the major ingredients of standard Cosmology and it was devised to
solve some issues of the hot big bang model [9]. It comprises a period of exponential
expansion of space (i.e., a ∝ exp(t)) during the early Universe that was proposed to
address three major problems:
• Horizon problem—This problem arises from the high homogeneity of the cosmic
microwave background (CMB). This basically means that all distinct patches of
the CMB sky are statistically the same (and indeed have the same temperature).
However, in a Universe with only a matter and a radiation epoch and no other epoch preceding them, it is not possible for two distant patches of the sky to equilibrate and obtain the same temperature, as such patches would move apart faster than the speed of light. Inflation provides a solution: it posits that all patches were causally connected in the past in a small region in thermal equilibrium, but as inflation proceeded all patches were isolated by being pushed beyond the size of the cosmological horizon (equivalently, the comoving horizon shrinks). In the end, these regions are no longer in causal contact, but due to the period of rapid expansion still maintain the same statistical properties;
• Monopole problem—Monopoles are topological defects (see next section for a
definition) which form in specific symmetry breaking patterns. Note that the formation of magnetic monopoles [14, 17] would be an inevitable prediction of many
Grand Unified Theories (GUT), as the required ingredient is to have some nonspecific symmetry group G, which contains an unbroken U (1) (gauge symmetry of
electromagnetism). This would be sufficient to form magnetic monopoles. There
is a problem with magnetic monopoles being formed in the early Universe. As shown in [19], monopole annihilation proceeds extremely slowly, to the point where the monopole density would greatly exceed the critical density of the Universe, contrary to observational evidence. Inflation can deal with this issue by pushing monopoles out of the horizon (this is similar to the way inflation solves the horizon problem), provided a sufficient number of e-folds (see below) is attainable. Note
that not only monopoles can be pushed out of the horizon, but also other cosmic
defects (strings and domain walls for example). The number of e-folds for each
defect can be different. Another consideration is when the symmetry breaking that
forms the defect takes place (before or during inflation). Depending on such details
it can be possible for defects to re-enter the horizon during radiation or matter era;
• Flatness problem—This issue arises due to the severe fine-tuning required to achieve a flat Universe (where the total density parameter, Ω = Σᵢ ρᵢ/ρ_c, is exactly unity). To see how this problem arises, let's use the re-written Friedmann equation 1.15 above. Assuming no inflation, the scale factor will grow according to a power law (a ∝ t^λ) for both matter and radiation. As an example, let's assume a 1% deviation of |1 − Ω| from zero at present time. Going back to Planck era
then reveals that it must have had a value of 10⁻⁶². This then reveals that any small variation of the total density parameter from unity in the initial conditions results in a non-flat Universe. Inflation fixes this by introducing a period wherein the scale factor evolves according to a ∝ exp(Ct), where C is a constant, which in the above equation has the desirable effect: |1 − Ω| can then begin with any arbitrary value but a period of inflation can force it down to near zero (for example to near 10⁻⁶²) as,

1 − Ω ∝ exp(−2Ct)   (1.21)

and subsequent evolution (post-inflation with power law scale factor) can bring this value to the necessary small deviation.
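The 10⁻⁶² figure can be checked with a back-of-the-envelope scaling argument: |1 − Ω| ∝ 1/(aH)² grows like a² ∝ T⁻² during radiation domination and like a ∝ T⁻¹ during matter domination (with a ∝ 1/T). The temperatures below are rough assumed values, chosen only to illustrate the orders of magnitude.

```python
# Scale |1 - Omega| back from today to the Planck era, using
# |1 - Omega| ~ T^-2 (radiation era) and ~ T^-1 (matter era).
# Temperatures in eV; all values are rough assumptions for illustration.
T_planck = 1.2e28   # Planck temperature
T_eq = 0.8          # matter-radiation equality
T_now = 2.35e-4     # CMB temperature today

# a 1% deviation today, scaled back through both eras
ratio = (T_now / T_eq) * (T_eq / T_planck) ** 2
flatness_planck = 0.01 * ratio
print(flatness_planck)  # of order 1e-62
```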
In addition to solving these problems, we must also note that inflation provides
an explanation for the appearance of density fluctuations responsible for large-scale
structure. So far, it is in agreement with experimental data from the cosmic microwave background: the shape of the temperature power spectrum of the CMB is indeed well described by an inflationary spectrum.
Initially inflation was inspired by first-order phase transitions, where some scalar field lies in a false vacuum state, acting much like a cosmological constant; as the Universe cools down, the field would quantum tunnel to bubbles of true vacuum (bubble nucleation). "Old" inflation however has a major drawback called the "graceful exit" problem: if inflation proceeds via these bubbles and the probability of them forming is large, inflation will be short-lived and frequent bubble collisions create a highly inhomogeneous Universe. On the other hand, if the probability of these bubbles forming is too low, inflation will indeed last a long time and each bubble will represent an open Universe with a null density parameter Ω (in stark contrast with observational evidence). Neither option is acceptable.
A different approach which solves this issue comes from the “new” inflation
paradigm [3, 11]. In order to accurately describe how inflation resolves these issues
one could start with a scalar field (inflaton) which drives this early Universe behavior.
In order to describe new inflation we should however note the dynamics of the inflaton
and why a scalar field can be proposed. Starting with the second Friedmann equation 1.8, and requiring a shrinking comoving Hubble radius (and therefore accelerated expansion),

ṙ_hc < 0 → ä > 0   (1.22)

which equivalently means that,

ρ < −3p   (1.23)

This condition is naturally satisfied by a scalar field φ (the aforementioned inflaton). To show it, let's begin with the simplest possible Lagrangian for a real scalar
field,
L = ½ φ_,μ φ^,μ − V(φ)   (1.24)
Remembering the previous relation for an isotropic fluid 1.2 and the definition
for the energy-momentum tensor,
T_μν = −(2/√−g) δS/δg^μν   (1.25)
we can write expressions for pressure and density,
ρ = ½ (∂_t φ)² + V(φ)   (1.26)

p = ½ (∂_t φ)² − V(φ)   (1.27)
which, upon insertion in the previously mentioned condition 1.23, specifies that inflation can only occur should the potential energy dominate,

V(φ) > (∂_t φ)² .   (1.28)
This is possible should the potential be flat enough. Note that there should also be a minimum of the potential, so as to end inflation. From here on, we can also re-insert these expressions for pressure and density in the Friedmann equations, which reveal,
H² = (8πG/3) [V(φ) + ½ φ̇²] − K/a²   (1.29)

φ̈ + 3H φ̇ = −∂_φ V   (1.30)
In order to simplify these equations one can use the fact that Hubble expansion
will be dominated by potential energy as mentioned in 1.28,
H² ≈ (8πG/3) V(φ)   (1.31)

3H φ̇ ≈ −∂_φ V   (1.32)
and define two additional parameters,

ε = (m_Pl²/16π) (∂_φV/V)² ,  η = (m_Pl²/8π) (∂_φ²V/V) .   (1.33)
We can now state the necessary conditions for inflation to occur. These are known as the slow-roll conditions, as they require the slope and the curvature of the potential to be sufficiently small,

ε ≪ 1 ,  |η| ≪ 1 .   (1.34)
In order to solve all previously indicated problems, inflation must last for a sufficient amount of time. We can express this demand by introducing the number of
e-folds,
N = ln[a(t_e)/a(t_i)]   (1.35)

  = ∫_{t_i}^{t_e} H dt   (1.36)

  ≈ ∫_{φ_f}^{φ} dφ/√(2ε)   (1.37)
and it can be shown that this number must be minimally around 60, in order to ensure
that all problems can be solved by inflation [4].
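As an illustration of the slow-roll machinery (for an assumed toy potential V = m²φ²/2, which is not a model singled out in the text), one can compute ε, the field value where inflation ends, and the e-fold integral, here in units m_Pl = 1 with an assumed initial field value.

```python
import numpy as np
from scipy.integrate import quad

# Toy slow-roll computation for V = m^2 phi^2 / 2 in units m_Pl = 1.
# epsilon = (m_Pl^2 / 16 pi) (V'/V)^2 with V'/V = 2/phi, and the e-fold
# number reduces to N = (8 pi / m_Pl^2) * Int_{phi_end}^{phi_i} (V/V') dphi.
m_pl = 1.0

def epsilon(phi):
    return (m_pl**2 / (16 * np.pi)) * (2.0 / phi) ** 2

phi_end = m_pl / np.sqrt(4 * np.pi)   # epsilon(phi_end) = 1: inflation ends
phi_i = 3.1 * m_pl                    # assumed initial field value

N, _ = quad(lambda p: (8 * np.pi / m_pl**2) * (p / 2.0), phi_end, phi_i)
print(N)  # roughly 60 e-folds
```

A starting value of only a few Planck masses is enough to reach the ~60 e-folds quoted above for this toy potential.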
1.3.2 Missing Ingredient II—Dark Energy and Dark Matter
Previously we mentioned that the recession velocity of galaxies depends linearly on the distance of the galaxy to the observer (Hubble law). This result by Edwin Hubble in 1929 largely confirmed the Universe to be expanding. Up until the late nineties, the accepted view was that this expansion was either constant or decelerating. However, in 1998 both the High-Z Supernova Search Team [15] and the Supernova Cosmology Project [13] found that the expansion is neither decelerating nor constant: it is accelerating. This was done by looking at the luminosity of type Ia supernovae at a range of redshifts. Note that if we only test low redshifts, accelerated expansion is not evident (one obtains only the Hubble law). However, as soon as one includes higher redshifts, the view changes and indeed it becomes necessary to introduce some fluid component into the Universe which accounts for accelerated expansion.
In order to do so, one can introduce a term Λ in Einstein's equations,

8πG T_μν = R_μν − ½ R g_μν + Λ g_μν   (1.38)
this term is known as the cosmological constant. It describes a fluid (known as dark
energy) which exerts negative pressure (ω = −1) countering gravity. From here we
can re-obtain the Friedmann equations again,
H² = (8πG/3) ρ − K/a² + Λ/3   (1.39)

Ḣ + H² = −(4πG/3)(ρ + 3p) + Λ/3   (1.40)
Introducing a critical density parameter for dark energy, Ω_Λ = Λ/(3H²), one can re-write the first Friedmann equation 1.39 (for K = 0) as,

Ω_m + Ω_r + Ω_Λ = 1   (1.41)
The latest observational constraints from Planck data [2], assuming a flat Universe, reveal the following present-day values for each density parameter: an Ω_Λ of 0.6911 ± 0.0062, an extremely small Ω_r (of order 10⁻⁴), and an Ω_m of around 0.3089 ± 0.0062. We note here that this means that accelerated expansion (and therefore dark energy) has been experimentally confirmed by CMB experiments (see also [1] or, for older WMAP results, [10]). In addition, cosmic shear from weak lensing and Lyman-α absorption spectra confirm accelerated expansion.
We also remark that there is another missing ingredient that should be included:
dark matter. It was proposed in order to account for missing matter necessary to
explain the rotational velocity of galaxies and plays an important role in the understanding of large-scale structure. In order to be in agreement with observational data
from large-scale structure one should use cold (non-relativistic) dark matter, which suggests that the term Ω_m can be decomposed as Ω_m = Ω_cdm + Ω_b, where Ω_cdm and Ω_b are the critical density parameters of cold dark matter and baryonic matter, respectively. With these we can write the dynamical equation of the Universe in terms of the Hubble constant,
H² = H₀² [Ω_r,0 a^(−4) + Ω_m,0 a^(−3) + Ω_Λ]   (1.42)
Integrating this particular equation and fitting to supernova data (see Fig. 1.1) then
reveals good agreement with accelerated expansion. This equation also tells us one
thing: at late times, dark energy dominates the energy density of the Universe.
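A short numerical aside (a sketch using the Planck values quoted above): the sign of the present-day deceleration parameter q₀ = ½ Σᵢ Ωᵢ(1 + 3ωᵢ), which follows from the second Friedmann equation, makes the acceleration explicit, and the equation also fixes the redshift at which dark energy overtakes matter.

```python
# Deceleration parameter today, q0 = (1/2) sum_i Omega_i (1 + 3 w_i),
# using the Planck values quoted in the text (flat Universe assumed).
Om_r, Om_m, Om_L = 1e-4, 0.3089, 0.6911
w = {"r": 1.0 / 3.0, "m": 0.0, "L": -1.0}

q0 = 0.5 * (Om_r * (1 + 3 * w["r"])
            + Om_m * (1 + 3 * w["m"])
            + Om_L * (1 + 3 * w["L"]))
print("q0 =", q0)  # negative, so the expansion accelerates today

# redshift at which the dark energy density overtakes matter
z_L = (Om_L / Om_m) ** (1.0 / 3.0) - 1.0
print("z  =", z_L)
```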
1.4 Scope of the Thesis
In this chapter we began by reviewing some of the basic concepts underlying modern Cosmology, such as the concordance model ΛCDM and inflation. Although we cannot deny the successes of this standard model of Cosmology, we should be aware of its failings. Of note we can list a few of these, which are active research topics in
Fig. 1.1 Supernova data from the Supernova Cosmology Project (Union 2.1 data from [16]) in blue points, and a red line indicating the fit assuming Ω_Λ = 0.7 and Ω_m = 0.3
Modern Cosmology at the time of writing. For example, we can consider the extreme discrepancy (some 120 orders of magnitude) between the cosmological constant and the vacuum energy predicted by the Standard Model of particle physics [12], the 5σ tension in the Hubble constant (H₀) measurements when comparing early- and late-time probes [18], the current non-detection of a Dark Matter candidate particle [5, 8], the fact that no explanation is offered for the matter-antimatter asymmetry [6], disagreements with N-body simulations at small scales [7], etc. In addition, we point out that many extensions to the SM of Particle Physics might not seek to solve only some of its problems but can also offer solutions to some of the cosmological issues outlined above. Although such theories of new physics can be probed via direct accelerator searches, the early Universe and the possibility of high-energy phenomena also provide an excellent laboratory. This can lead us to ask a different question: are there phenomena common to several of these theories of new physics? From this question we can posit the possibility that these new sectors introduce phase transitions in the early Universe, which, if they give rise to observational footprints, can be used to search for several theories at once.
We will next introduce (in the following chapter) one class of possible by-products
of phase transitions in the early Universe, named topological defects, and how they
might have formed. Some types of defect will be the object of study of this manuscript.
Subsequently, in Chap. 3 we will briefly introduce the High-Performance Computing aspects to be used throughout the rest of the chapter, where we will also describe the numerical simulations used for the physics results of this thesis. We remark that
banishing all computing aspects to Chap. 3 is intentional: any reader who is not
computationally inclined can simply skip reading this chapter.
Afterwards, in Chap. 4, we will explore some results obtained with the simulations of Chap. 3, subsequently moving to explain how these simulations will
help us simulate a toy model of cosmic superstrings in Chap. 5. We then present some
concluding remarks and some tentative next steps.
References
1. Ade P et al (2016a) Planck 2015 results. XIV. Dark energy and modified gravity. Astron
Astrophys 594:A14. https://doi.org/10.1051/0004-6361/201525814
2. Ade P et al (2016) Planck 2015 results. XIII. Cosmological parameters. Astron Astrophys
594:A13. https://doi.org/10.1051/0004-6361/201525830
3. Albrecht A, Steinhardt PJ (1982) Cosmology for grand unified theories with radiatively induced
symmetry breaking. Phys Rev Lett 48:1220–1223. https://doi.org/10.1103/PhysRevLett.48.
1220
4. Baumann D (2011) Inflation. In: Theoretical advanced study institute in elementary particle physics: physics of the large and the small. pp 523–686. https://doi.org/10.1142/
9789814327183_0010
5. Bertone G, Hooper D (2018) History of dark matter. Rev Mod Phys 90(4):045002. https://doi.
org/10.1103/RevModPhys.90.045002
6. Canetti L, Drewes M, Shaposhnikov M (2012) Matter and antimatter in the universe. New J
Phys 14:095012. https://doi.org/10.1088/1367-2630/14/9/095012
7. Del Popolo A, Le Delliou M (2017) Small scale problems of the CDM model: a short review.
Galaxies 5(1):17. https://doi.org/10.3390/galaxies5010017
8. Freese K (2017) Status of dark matter in the universe. Int J Mod Phys 1(06):325–355. https://
doi.org/10.1142/S0218271817300129
9. Guth AH (1981) Inflationary universe: a possible solution to the horizon and flatness problems.
Phys Rev D 23:347–356. https://doi.org/10.1103/PhysRevD.23.347
10. Hinshaw G, Larson D, Komatsu E, Spergel DN, Bennett CL, Dunkley J, Nolta MR, Halpern
M, Hill RS, Odegard N, Page L, Smith KM, Weiland JL, Gold B, Jarosik N, Kogut A, Limon
M, Meyer SS, Tucker GS, Wollack E, Wright EL (2013) Nine-year Wilkinson microwave
anisotropy probe (WMAP) observations: cosmological parameter results. APJS 208(2):19.
https://doi.org/10.1088/0067-0049/208/2/19
11. Linde AD (1987) A new inflationary universe scenario: a possible solution of the horizon,
flatness, homogeneity, isotropy and primordial monopole problems. Adv Ser Astrophys Cosmol
3:149–153. https://doi.org/10.1016/0370-2693(82)91219-9
12. Martin J (2012) Everything you always wanted to know about the cosmological constant
problem (but were afraid to ask). Comptes Rendus Physique 13:566–665. https://doi.org/10.
1016/j.crhy.2012.04.008
13. Perlmutter S et al (1999) Measurements of Ω and Λ from 42 high-redshift supernovae. Astrophys J 517:565–586. https://doi.org/10.1086/307221
14. Polyakov AM (1974) Particle spectrum in the quantum field theory. JETP Lett 20:194–195
15. Riess AG et al (1998) Observational evidence from supernovae for an accelerating universe
and a cosmological constant. Astron J 116:1009–1038. https://doi.org/10.1086/300499
16. Suzuki N, Rubin D, Lidman C, Aldering G, Amanullah R, Barbary K, Barrientos LF, Botyanszki
J, Brodwin M, Connolly N, Dawson KS, Dey A, Doi M, Donahue M, Deustua S, Eisenhardt
P, Ellingson E, Faccioli L, Fadeyev V, Fakhouri HK, Fruchter AS, Gilbank DG, Gladders
MD, Goldhaber G, Gonzalez AH, Goobar A, Gude A, Hattori T, Hoekstra H, Hsiao E, Huang
X, Ihara Y, Jee MJ, Johnston D, Kashikawa N, Koester B, Konishi K, Kowalski M, Linder
EV, Lubin L, Melbourne J, Meyers J, Morokuma T, Munshi F, Mullis C, Oda T, Panagia N,
Perlmutter S, Postman M, Pritchard T, Rhodes J, Ripoche P, Rosati P, Schlegel DJ, Spadafora
A, Stanford SA, Stanishev V, Stern D, Strovink M, Takanashi N, Tokita K, Wagner M, Wang
L, Yasuda N, Yee HKC (The Supernova Cosmology Project) (2012) The Hubble Space Telescope cluster supernova survey. V. Improving the dark-energy constraints above z > 1 and building an early-type-hosted supernova sample. Astrophys J 746(1):85. https://doi.org/10.1088/0004-637X/746/1/85
17. ’t Hooft G (1974) Magnetic monopoles in unified gauge theories. Nucl Phys B 79:276–284.
https://doi.org/10.1016/0550-3213(74)90486-6
18. Verde L, Treu T, Riess AG (2019) Tensions between the early and the late universe. Nature
Astron 3:891. https://doi.org/10.1038/s41550-019-0902-0
19. Zeldovich Y, Khlopov M (1978) On the concentration of relic magnetic monopoles in the universe. Phys Lett B 79(3):239–241. https://doi.org/10.1016/0370-2693(78)90232-0
Chapter 2
Topological Defects
Its height gradually diminished, and after a chase of one or two
miles [2–3 km] I lost it in the windings of the channel. Such, in
the month of August 1834, was my first chance interview with
that singular and beautiful phenomenon which I have called the
Wave of Translation.
John Scott Russell
2.1 Solitons and Topology
As shown in the epigraph above, in 1834 the Scottish engineer John Scott Russell, when attempting to optimize the design of boats for the Union Canal, noted a particular phenomenon: a water "wave of translation" which propagated along the channel without change of form or velocity. These "waves of translation" quickly became an object of obsession not just for Russell, but for several physicists in different areas such as fluid dynamics (Korteweg-de Vries equation), condensed matter (Gross-Pitaevskii equation) and cosmology (as will be seen in the next paragraph). Today we name these waves "solitons," and while a specific definition is often hard to pin down, a few general properties can be identified: they correspond to stable, self-sustaining wave packet solutions with a permanent form, they are localized to a given region, and, unlike regular waves, they will not merge with other solitons upon interaction.
One specific type of soliton owes its stability to a topological condition, i.e., to the inability to decay to a topologically trivial solution. The main objects of study of this thesis are topological defects possibly created in the early Universe by means of the Kibble mechanism [32]. In order to define them, we take the approach of beginning with a more technical definition and then gradually simplifying to a more intuitive one. We begin by assuming a phase transition wherein, at some critical temperature Tc, some symmetry group G of elements g ∈ G is broken spontaneously
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. R. C. C. C. Correira, A New Generation of Cosmic Superstring Simulations,
Springer Theses, https://doi.org/10.1007/978-3-031-20229-2_2
Table 2.1 A list of defects formed from distinct homotopy groups

Homotopy condition    Topology of the vacuum       Resulting defect
π₀(G/H) ≠ 1           disconnected                 Domain wall
π₁(G/H) ≠ 1           non-contractible loop        Cosmic string
π₂(G/H) ≠ 1           non-contractible 2-sphere    Monopole
π₃(G/H) ≠ 1           non-contractible 3-sphere    Texture
to a subgroup H of elements h ∈ H. The Hamiltonian of the theory is described by a set of fields φ that are invariant under transformations by elements of G,

H[φ] = H[φ_g]   (2.1)

while the vacuum configuration φ₀ is left invariant only by the unbroken subgroup H,

φ₀^h = φ₀   (2.2)

so that the vacuum manifold M = G/H can be identified with the space of cosets,

gH = {gh : h ∈ H} .   (2.3)

The field configurations which are called defects arise when the vacuum manifold is homotopically non-trivial,

π_n(M) ≠ 1 ,   (2.4)
in other words, when the vacuum can be described by a non-contractible n-dimensional sphere, which can have non-connected components. The field configuration we are describing is a solution to a given set of partial differential equations (in a field theory, the equations of motion) that obeys a set of boundary conditions. In other words, this solution exists because at the boundaries the solution is described by a topologically non-trivial vacuum manifold. This definition also has an interesting side-effect: depending on the dimensionality of the n-sphere (i.e., on the topology of the vacuum manifold) different defects emerge (Table 2.1).
To make the point about boundary conditions clearer and to visualize the nature of defects in real space, we can introduce two examples of defects which, coincidentally, are the ones studied throughout this thesis. The two examples will be the domain wall, which arises when n = 0, and the cosmic string, which arises when n = 1.
2.2 Global Domain Walls
Starting with the Lagrangian for a theory originally (before symmetry breaking) invariant under global Z₂ transformations,
Fig. 2.1 On the left-hand panel, an example of a potential which allows the equations of motion of a real scalar field to admit a wall solution; on the right-hand panel, the analytical static domain wall solution and corresponding energy density obtained with the aforementioned potential
L = ½ φ_,μ φ^,μ − V₀ (φ²/φ₀² − 1)² ,   (2.5)
which, by standard variational techniques admits the following equations of motion
in Minkowski space,
∂_μ∂^μ φ + (4V₀/φ₀²)(φ²/φ₀² − 1) φ = 0 .   (2.6)
In this specific case, the vacuum manifold (see Fig. 2.1a) corresponds to two disconnected points, and each minimum corresponds to a distinct value of the field, φ₀ and −φ₀. The solution we are looking for respects the boundary conditions φ(−∞) = −φ₀ and φ(∞) = φ₀. Restricting ourselves to the one-dimensional static case, the analytical solution takes the form,

φ(x) = φ₀ tanh(√(2V₀) x/φ₀) ,   (2.7)
which is plotted in Fig. 2.1b. The wall core then corresponds to a region where the Vacuum Expectation Value (VEV) is the one predicted by the unbroken symmetry phase, a region with higher energy density than its surroundings.
To finalize, we can also make the following remark: defects are stable against
perturbations and decay precisely because there exists no continuous transformation
which maps them into a trivial solution. Physically, this implies that the removal of the
defect comes with an associated energy cost: lifting all of the field over the potential
barrier. One can take this a step further by stating that this stability is a consequence
of a topological conservation law, and that each defect has an associated charge. This
was first shown by [1].
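As a quick numerical check (a sketch, with illustrative values V₀ = φ₀ = 1), one can verify by finite differences that the tanh profile of Eq. 2.7 satisfies the static one-dimensional equation of motion following from Eq. 2.6, φ′′ = (4V₀/φ₀²)(φ²/φ₀² − 1)φ.

```python
import numpy as np

# Verify that phi(x) = phi0 * tanh(sqrt(2 V0) x / phi0) satisfies the static
# 1D equation phi'' = (4 V0 / phi0^2) (phi^2 / phi0^2 - 1) phi.
V0, phi0 = 1.0, 1.0
x = np.linspace(-5.0, 5.0, 2001)
h = x[1] - x[0]
phi = phi0 * np.tanh(np.sqrt(2.0 * V0) * x / phi0)

# second derivative by central finite differences
phi_xx = (phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / h**2
rhs = (4.0 * V0 / phi0**2) * (phi[1:-1]**2 / phi0**2 - 1.0) * phi[1:-1]

residual = np.max(np.abs(phi_xx - rhs))
print(residual)  # small residual: the profile solves the equation
```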
2.3 Abelian-Higgs Strings
We can also show another example of a defect, the line-like cosmic string. For this we start with a Lagrangian density invariant under local U(1) transformations,

L = |D_μφ|² − (λ/4)(|φ|² − η²)² − (1/4e²) F^μν F_μν   (2.8)
Note that the shape of the potential is the so-called “Mexican-hat,” as evidenced
by Fig. 2.2a. By means of the Euler-Lagrange equations one can then obtain the
following equations of motion,
D_μD^μφ + (λ/2)(|φ|² − η²)φ = 0 ,   (2.9)

∂_μF^μν = 2e Im[φ* D^νφ]   (2.10)
Analytical solutions are difficult (impossible, actually) to obtain in this case from first principles, but for our purposes we can demonstrate the existence of the string solution using the Nielsen-Olesen ansatz for a static straight string, whereby the fields are described by a set of auxiliary functions f(r) and α(r),

φ(r, θ) = f(r) e^(inθ)   (2.11)

A_θ = −(n/e) α(r)/r   (2.12)
where r is the radial coordinate, θ the angular coordinate and n is an integer denoted the winding number (which dictates how many times the field non-trivially winds around the core of the string). This integer constitutes a topological charge and is a conserved quantity. Substitution of this ansatz into the equations of motion under the assumption of staticity then reveals the following system of equations,
d²f/dr² + (1/r) df/dr − n² f (α − 1)²/r² − (λ/2) f (f² − 1) = 0   (2.13)

d²α/dr² − (1/r) dα/dr − 2e² f² (α − 1) = 0   (2.14)
In order to solve for f(r) and α(r) we can use collocation methods (via the COLNEW software [9]), imposing the boundary conditions,
Fig. 2.2 On the left-hand panel, an example of a potential which allows the equations of motion of a complex scalar field to admit a string solution; on the right, the static gauged string solution obtained with the aforementioned potential
lim_{r→0} f(r) = 0 ,   lim_{r→∞} f(r) = 1   (2.15)

lim_{r→0} α(r) = 0 ,   lim_{r→∞} α(r) = 1   (2.16)
The resulting solution for unit winding (n = 1) can be seen in Fig. 2.2b. These
boundary conditions imply that the ground state of the scalar field is recovered far
away from the string.
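For readers without access to COLNEW, the same boundary value problem can be sketched with SciPy's collocation solver solve_bvp; the couplings below (λ = 2, e = 1, winding n = 1) are an illustrative choice, not necessarily the parameters used in the thesis.

```python
import numpy as np
from scipy.integrate import solve_bvp

# Nielsen-Olesen profile from Eqs. (2.13)-(2.14) with boundary conditions
# (2.15)-(2.16); state vector y = [f, f', alpha, alpha'].
n, lam, e = 1, 2.0, 1.0

def rhs(r, y):
    f, fp, al, alp = y
    return np.vstack([
        fp,
        -fp / r + n**2 * f * (al - 1.0)**2 / r**2 + 0.5 * lam * f * (f**2 - 1.0),
        alp,
        alp / r + 2.0 * e**2 * f**2 * (al - 1.0),
    ])

def bc(ya, yb):
    # f(0) = alpha(0) = 0 and f(inf) = alpha(inf) = 1
    return np.array([ya[0], ya[2], yb[0] - 1.0, yb[2] - 1.0])

r = np.linspace(1e-4, 12.0, 400)   # small r_min avoids the r = 0 singularity
guess = np.vstack([np.tanh(r), 1.0 / np.cosh(r)**2,
                   1.0 - np.exp(-r**2), 2.0 * r * np.exp(-r**2)])
sol = solve_bvp(rhs, bc, r, guess, tol=1e-6, max_nodes=50000)
print(sol.status, sol.sol(1.0)[0])  # status 0 means the solver converged
```

The resulting f(r) and α(r) rise from 0 at the core to 1 far away, reproducing the qualitative behavior shown in Fig. 2.2b.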
These two numerical solutions should give the reader a better intuition for the topological defects used throughout this thesis. The intuition gained here will aid us in the description of more exotic types of strings, which we will attempt to study as an end goal of the thesis. After such introductions, we will also contemplate how defects could have formed in the early Universe.
2.4 Beyond Abelian-Higgs
In 1985 [68], with the rise of string theory and the appearance of cosmic strings, the
possibility that strings from perturbative string theory could play the role of cosmic
strings was put forth. They are in a strict sense different objects: in one case, the
string is the fundamental object (and as such the action is written for it), in the other
some field is the fundamental object and a specific configuration of it corresponds to
a cosmic string. However, if strings could be stretched to cosmological scales they
could play a similar role to cosmic strings.
In the aforementioned publication, the answer was no, for several reasons: the strings would have extremely high tensions, which is not only in direct conflict with current observational limits for cosmic strings, but also suggests they could not form after inflation (meaning that, forming before it, they would be diluted away); and they would lack stability (bosonic open strings could quickly break up into smaller strings, thus not providing any observational signal [70]). This however changed with the introduction of superstring theory and the proposal of higher-dimensional extended objects called D-branes [21], on which open strings can end (D for Dirichlet boundary condition) and which can possess a gauge field. Although we will later state the action
for a brane, for now let’s keep to arguments that will allow us to relate branes
produced in brane-inflation models to cosmic strings. Note that the author is not a
string theorist, and therefore most arguments presented here will take a field theory
slant.
As such, we will briefly turn our attention to a Type IIB brane-annihilation scenario [30, 54] where F-strings (F for fundamental) and D-strings (branes with all dimensions but one compactified) are formed and act as cosmic strings, even obeying the homotopy group condition for topological defects. These will be known as cosmic superstrings. Note that this is not the only possible formation scenario, and in fact hybrid inflation in the context of superstring theory could also reasonably lead to cosmic superstrings [22, 53]. In brane annihilation a brane and a parallel anti-brane collide at the bottom of a throat in a compact manifold [49], where the metric is warped as,
ds² = e^(2A(y)) η_μν dx^μ dx^ν   (2.17)
which effectively redshifts the tension of any resulting strings. The effective tension in a KKLMMT model [31] is,

μ_eff = e^(2A(y)) μ_F   (2.18)
where e^(2A(y)) is a warp factor, y are compact coordinates and μ_F the ten-dimensional tension of a fundamental string. This directly tackles the issue of tensions being either too high, and thus in conflict with observational data, or too low, wherein strings would have a mass scale lower than the inflaton. In order to address the next issue one also needs to consider stability; however, this is highly model dependent and will not be discussed further (see [45] for general comments). We will now discuss how these superstrings are analogous to the field theory strings, and we will (in a later section) describe how they come to be stretched to cosmological sizes.
In order to show the first, we will succinctly report the arguments of [54]. We begin by noting that there is a tachyonic open superstring stretching between the two branes, which ceases to exist as the branes annihilate. This process is known as tachyon condensation and was first noted by [56]. For this we must write (here without proof, but see [54]) the tachyon potential for an open tachyon string stretching between branes, with each brane compactified in (p − 3) dimensions and containing a U(1) gauge symmetry, close to when the inter-brane separation is null and inflation ends,
V(T) = 2τ_p V_∥ − (M_s²/4) T†T + (λ/4)(T†T)² + …   (2.19)
where τ_p is the tension of the branes, T is a complex scalar field (the tachyon per se), V_∥ the compactification volume, λ a parameter which can be obtained from open superstring field theory [11] and M_s = 1/√α′ the string scale. The vacuum manifold clearly obeys the condition,

π₁(U(1)) = Z ≠ 0   (2.20)
and proves that an object akin to a vortex or string should be formed. Branes of co-dimension 2k = 2, or equivalently dimension p − 2, are produced (the argument follows from K-theory, see [69]), and, since p − 3 dimensions are compactified, the resulting object must have a single large dimension. This connects the topological character of the string to its one-dimensional (one-large-dimensional) nature. Higher-dimensional U(N) symmetries could give rise to domain walls and monopoles (2k = 1 and 2k = 3, respectively), but odd p would result in BPS instability (in Type IIB string theory), thus ensuring any such defect network would quickly decay.
So far we’ve been describing cosmic superstrings as Dp-branes wrapped around
p − 1 cycles. However, it was also mentioned above that fundamental strings (Fstrings) could play the role of cosmic superstrings. This is a consequence of the
existence of the SL(2, Z) transformation (S-duality) relating the gauge field Aμν
from a D-string to an antisymmetric field Bμν and interchanging the coupling constant
, which establishes D and F strings as duals of one another. A striking
gs → −1
gs
reflection of this duality is the existence of bound-states of p F-strings and q Dstrings—hereby referred to as ( p, q) strings. This gives rise to a tension spectra
which in flat space looks like the following [55],
μ_{p,q} = μ_F √(p² + q²/g_s²)  (2.21)
where μ_F = 1/(2πα′) is the tension of the fundamental string, g_s the coupling constant and p and q the charges of each string type. We remark that this is modified
in a cosmological background [26]. Later on we will introduce a possible starting
point for the analytical study of cosmic superstrings. However, given that this
thesis is based on field theory simulations, we will now turn our attention to a toy
model that can be used to study superstrings with such simulations: the dual U(1) model of [52]. In this model, the parameters
of each U(1) sector, and the coupling connecting the scalar fields of the two sectors,
must support the existence of two "Abelian-Higgs" strings and of possible string combinations. We start with the
following Lagrangian density,
ℒ = |D_μ φ|² − (1/4) F^{μν} F_{μν} + |D_μ ψ|² − (1/4) G^{μν} G_{μν} − V(|φ|, |ψ|)  (2.22)
for two complex scalar fields φ and ψ and two U(1) gauge fields A_μ and B_μ with
corresponding gauge field strengths F_μν and G_μν. The covariant derivatives, gauge
field strengths and potential are given by,
D_μ φ = (∂_μ − i e_p A_μ) φ,  D_μ ψ = (∂_μ − i e_q B_μ) ψ  (2.23)

F_μν = ∂_μ A_ν − ∂_ν A_μ,  G_μν = ∂_μ B_ν − ∂_ν B_μ  (2.24)
V(|φ|, |ψ|) = (λ_1/4)(|φ|² − η_1²)² + (λ_2/4)(|ψ|² − η_2²)² − λ_3(|φ|² − η_1²)(|ψ|² − η_2²)  (2.25)
where λ_{1,2} are scalar couplings, e_{p,q} are gauge couplings and λ_3 is the coupling
between the two scalar sectors. If these parameters are such that 0 < λ_3 < √(λ_1 λ_2)/2, then the
vacuum manifold will be non-trivial in the two sectors, supporting the existence of two
strings and, due to the non-zero value of λ_3, also bound-state strings.
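This two-sector vacuum structure can be checked numerically. The sketch below (with illustrative parameter values, not taken from the thesis) evaluates the potential of Eq. (2.25) on real field amplitudes and confirms that, when the cross-coupling is small enough, the global minimum sits at (|φ|, |ψ|) = (η_1, η_2), i.e. both U(1) symmetries break:

```python
import numpy as np

# Illustrative parameter values (not from the thesis), chosen to satisfy
# 0 < lam3 < sqrt(lam1 * lam2) / 2.
lam1, lam2, lam3 = 2.0, 2.0, 0.3
eta1, eta2 = 1.0, 1.0

def V(phi, psi):
    """Potential of Eq. (2.25), evaluated on real field amplitudes |phi|, |psi|."""
    d1 = phi**2 - eta1**2
    d2 = psi**2 - eta2**2
    return 0.25 * lam1 * d1**2 + 0.25 * lam2 * d2**2 - lam3 * d1 * d2

# Scan the (|phi|, |psi|) plane and locate the global minimum numerically.
grid = np.linspace(0.0, 1.5, 301)
P, Q = np.meshgrid(grid, grid, indexing="ij")
vals = V(P, Q)
i, j = np.unravel_index(np.argmin(vals), vals.shape)
print(P[i, j], Q[i, j])  # minimum at (eta1, eta2): both U(1) symmetries break
```

Repeating the scan with a larger λ_3 (so that only one sector breaks) would move the minimum off this point, which is the situation relevant for the superconducting strings discussed below.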
There are of course several details not adequately captured by this model,
such as the lack of supersymmetry, the absence of dynamical effects of compactified dimensions
on the strings, and intercommutation probabilities that are simply unity (unlike what
is expected from scattering amplitudes [63]). In order to study the dynamical impact
of these effects on cosmic superstring networks it might be fruitful to modify Nambu-Goto simulations; the reason will become apparent when we describe the Nambu-Goto action later in this chapter. However, this toy model is already suitable for
studying the kinematic conditions for the formation of bound states, as in [12], and
the presence of scaling for different types of strings, as done in [34] and as will
be shown in a later chapter.
Finally, we note that there is an additional, arguably more realistic, type of field
theory string, first proposed by [67], that this model can be used to study. In the
case where only one U(1) symmetry is broken, for instance by choosing parameters
such that 4λ_3² > λ_1 λ_2, only a single string type forms. The unbroken sector
will then form a condensate at the string core, where the scalar field ψ takes some
expectation value and tends to zero infinitely far away from the string. In effect, this
translates into a trapping of the flux of the charged scalar field on the string, which then
behaves like a superconducting wire; hence such strings are known as superconducting
cosmic strings. Although these strings have been studied in simulations of colliding
strings in flat space [33], it remains to be seen whether a full network can also
be simulated in a cosmological setting.
Having reviewed the types of defects to be studied throughout this thesis, we must
understand how they form before we move on to explaining the types of simulations
and analytical models used.
2.5 Kibble Mechanism
Topological defects in cosmology were first proposed by [32], who, besides positing
their existence, also proposed a mechanism for their formation. In the early Universe, provided some large symmetry group underpinning a Grand Unified Theory
breaks down to smaller groups (eventually to the Standard Model), phase transitions
will occur at each breaking. In order to describe the Kibble mechanism, we can use the
following effective potential for a Goldstone model (the potential of the Lagrangian
in Eq. 2.8 plus thermal effects),
V(T, φ) = (λ/4)(|φ|² − η²)² + (λ/12) T² |φ|² − 0.5 T⁴.  (2.26)
Above the critical temperature, T ≫ T_c, where T_c is given by T_c = √6 η, the two
last terms (thermal effects) dominate the expression and thus the effective potential
takes the shape of a parabola. However, as the Universe cools down, past the critical
temperature, the potential changes drastically. Even though there is still some contribution from the thermal effects, the vacuum manifold is now topologically non-trivial
and the vacuum ceases to be invariant under phase rotations. The behavior of the
potential in the different regimes (well above T_c, at T_c and below T_c) can be seen in
Fig. 2.3.
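This behavior is easy to reproduce numerically. The sketch below (with illustrative values λ = η = 1; the φ-independent thermal term does not move the minimum) locates the minimum of V(T, φ) above, at, and below T_c:

```python
import numpy as np

# Illustrative parameters; the -0.5 T^4 piece is phi-independent and does not
# shift the location of the minimum, but we keep it for completeness.
lam, eta = 1.0, 1.0
Tc = np.sqrt(6.0) * eta

def V(phi, T):
    return 0.25 * lam * (phi**2 - eta**2) ** 2 + (lam / 12.0) * T**2 * phi**2 - 0.5 * T**4

phi = np.linspace(0.0, 2.0, 2001)
minima = {T: phi[np.argmin(V(phi, T))] for T in (2 * Tc, Tc, 0.0)}
print(minima)  # minimum at phi = 0 for T >= Tc; at phi = eta once T -> 0
```

The thermal mass term (λ/12)T²|φ|² cancels the negative mass of the double well exactly at T = √6 η, which is where the symmetry-breaking minimum first appears.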
Fig. 2.3 The Goldstone potential with thermal effects, for a phase transition that should lead to the
production of a vortex/string solution should the temperature decrease sufficiently

This signals the threshold at which spontaneous symmetry breaking (SSB) takes
place. At this point the field φ in each region of space will randomly roll down
to one of the minima (VEVs). In the case of a string, the field can select a phase
along the vacuum manifold; the number of times this phase winds around the manifold defines the winding number.
However, this is not yet enough to ensure the formation of the defect, as the height of
the potential barrier is not large enough to prevent fields (with sufficient kinetic energy) from
jumping over the potential maxima. When the Universe cools down sufficiently, below the
Ginzburg temperature T_G < T_c, temperature fluctuations will no longer be enough to allow
these “jumps,” and correlated regions will remain in the same minima (freeze-out in
comoving coordinates). We then note another critical detail: the typical size of these
patches of correlated choices of phase, which allows us to define a characteristic
scale for defects and defect separation, namely the correlation length, L. In general
in phase transitions the correlation length diverges as one approaches
the critical temperature. However, these regions cannot be infinitely large, as this
would violate causality. As such, the size of the correlated patches must not be larger than
the size of the horizon, L ≤ d_H ∼ t.
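The phase-choice step of the Kibble mechanism can be visualized with a toy computation. In the sketch below (a 2-D toy, where each lattice site plays the role of an independent correlated patch), random U(1) phases are assigned and the winding number around each elementary plaquette is computed; non-zero windings mark cells pierced by a string:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
theta = rng.uniform(-np.pi, np.pi, size=(N, N))  # one random phase per patch

def wrap(x):
    """Bring a phase difference to the principal branch (the geodesic rule)."""
    return (x + np.pi) % (2.0 * np.pi) - np.pi

# Corners of each elementary plaquette (periodic boundaries).
a = theta
b = np.roll(theta, -1, axis=0)
c = np.roll(b, -1, axis=1)
d = np.roll(theta, -1, axis=1)

# Net phase winding around each plaquette, in units of 2*pi; a value of +-1
# marks a cell pierced by a string.
w = np.rint((wrap(b - a) + wrap(c - b) + wrap(d - c) + wrap(a - d)) / (2.0 * np.pi)).astype(int)
print(np.unique(w), (w != 0).mean())
```

On a periodic lattice the windings sum to zero, reflecting the fact that strings and anti-strings form in equal numbers.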
In the case of D-strings, one can also interpret their formation through the Kibble
mechanism, but with a twist. The first thing to note is the shape
of the potential of the tachyon field near the end of brane inflation (see Eq.
2.19), whose vacuum structure supports defect formation. We can then follow the arguments of [54]: as the correlation length needs to
obey a causality constraint, L ≤ d_H ∼ t ∼ 1/H, one can say that the mechanism is
only allowed to proceed in dimensions where the compactification size l_∥ is larger
than the horizon size. In other words, the Kibble mechanism would not occur in
compactified dimensions, only in the large non-compactified ones (Fig. 2.4).
2.6 Simulations of Defects
The main goals of this thesis are twofold: to improve upon existing defect simulations
(taking advantage of graphics processors) and to use these to improve and calibrate
existing semi-analytical models of string evolution. These tasks are described in
subsequent chapters. For now we must introduce both the simulations and the semi-analytical models to be used throughout the thesis. There are two types of string
simulations: ones that evolve fields on a 3D lattice in conformal time, and ones that
evolve string segments over conformal time. Two snapshots of isosurfaces of either
the field or its absolute value (for walls or strings, respectively) can be found in
Fig. 2.5a, b. The type of simulation that has traditionally struggled computationally
is the field theory one, which is what we will introduce now.
2.6.1 Global Domain Walls
We begin with the Lagrangian density of the simplest defect type: the global domain
wall. Writing the action and applying standard variational techniques, one can obtain
the following equation of motion,
Fig. 2.4 In this figure we present a schematic view of the Kibble mechanism. First, on the top-left,
we present a Mexican-hat potential (one whose non-trivial vacuum topology supports the existence
of string solutions). Note the choices of scalar field phase A, B, C (phases are represented by color
here). At the phase transition, different regions (for example A, B, C; see the top-right panel) make
different choices of the phase of the scalar field (different colors). Such choices are typically
correlated over a lengthscale of the size of the horizon, which sets a size for each "patch." However,
some regions remain trapped between these patches (the white string core in the top-right and lower-left figures). Note that in the case of the string we should in reality admit a continuous variation of
the phase around the string core, as seen in the lower-left panel. "Stacking" many
copies of the lower-left figure (as shown in the lower-right panel) then reveals a cosmic string and
the phase variation around it
φ̈ + 2(ȧ/a) φ̇ = ∇²φ − a² ∂V(φ)/∂φ  (2.27)
where a is the scale factor and dotted derivatives indicate derivatives with respect
to conformal time, η, which is related to physical time by dη = dt/a. The double-well potential takes the form given in Eq. 2.5 above. For the purposes
of obtaining the discrete version of the equations of motion, we re-write the
Hubble damping term such that,
Fig. 2.5 Simulation snapshots of a network of domain walls and cosmic strings on the left and
right-hand-side panels, respectively. Both snapshots correspond to matter era simulations
φ̈ + 2 (d ln a/d ln η) (1/η) φ̇ = ∇²φ − a² ∂V(φ)/∂φ  (2.28)
Consider then a 3D lattice of comoving spatial coordinates, specified by,

x_i = Δx n_i,  n_i ∈ ℤ  (2.29)

where the lattice spacing, Δx, is the same along all directions. The scalar fields φ_x
then reside on the lattice at each lattice point x. From here we can write partial
derivatives and the Laplacian operator via finite differences, to order Δx and Δx²,
∂_i⁺ φ → (1/Δx) [φ^{x+k_i} − φ^x]  (2.30)

∂_i⁻ ∂_i⁺ φ → (1/Δx²) [φ^{x+k_i} − 2φ^x + φ^{x−k_i}]  (2.31)
where + or − indicate whether the derivatives are forwards or backwards,
and k_i indicates a unit vector along the spatial direction i. With this we could write a
discrete version of the above-mentioned equation of motion. However, before we do
so, there is a tricky problem that must be addressed: how the comoving thickness of a
defect behaves throughout a simulation. In general, given that the physical thickness
of a wall is constant, the comoving thickness shrinks by several orders of
magnitude over time. A way to circumvent this is to force the defect thickness to be
constant; this is known as the Press-Ryden-Spergel algorithm (PRS for short [48]). It
involves modifying the original equations of motion to yield,
∂²φ/∂η² + α (d ln a/d ln η) (1/η) ∂φ/∂η − ∇²φ = −a^β ∂V/∂φ,  (2.32)
where α and β are parameters that allow one to adjust momentum conservation (α = 3) and
to enforce a constant comoving thickness (β = 0). With α = 2 and β = 2 we recover the original
equations of motion. Note that in the original PRS article this was obtained by
fiat (i.e., by modifying the equations of motion by hand), but it is possible to obtain
constant comoving width by forcing the comoving scalar coupling to be related
to the physical one by some power of the scale factor, as was done for strings by
[14]. This procedure is detailed in the next section. Note however that momentum
conservation is not enforced in that way. This can then be used to apply a staggered
leap-frog evolution scheme, first-order Crank-Nicolson with respect to time,
Δ^{x,η+1/2} = [(1 − δ)Δ^{x,η−1/2} + Δη(∇²φ^{x,η} − a^β ∂V/∂φ^{x,η})] / (1 + δ)  (2.33)

φ^{x,η+1} = φ^{x,η} + Δη Δ^{x,η+1/2},  (2.34)
where Δ = φ̇ is the derivative of the scalar field with respect to conformal time
and the δ parameter is given by,

δ = (α/2) (Δη/η) (d ln a/d ln η).  (2.35)
(2.35)
With this we are then able to update the scalar field at each site x and conformal
time η, thus simulating the scalar field on the lattice. This lets us stress one of the
differences between field theory simulations of defects and Nambu-Goto simulations:
we are not simulating defects directly, but merely the fields on a lattice. Hubble
damping and a potential with a non-trivial vacuum manifold are then
responsible for forming defects on their own.
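A minimal sketch of this update cycle (Eqs. 2.33-2.35) is given below, in 1-D for brevity and with an illustrative radiation-era scale factor a ∝ η; the thesis simulations are 3-D, but the staggered leap-frog structure is the same:

```python
import numpy as np

# 1-D sketch; all parameter values here are illustrative choices, not the
# thesis settings. Radiation era in conformal time: d ln a / d ln eta = 1.
N, dx, deta = 256, 0.5, 0.1
alpha, beta = 3.0, 0.0               # PRS choice: alpha = 3, beta = 0
lam, eta0 = 1.0, 1.0                 # double-well couplings
kink = np.sqrt(lam / 2.0) * eta0

x = np.arange(N) * dx
box = N * dx
# Wall-antiwall pair compatible with the periodic boundaries.
phi = np.tanh(kink * (x - box / 4)) - np.tanh(kink * (x - 3 * box / 4)) - 1.0
mom = np.zeros(N)                    # Delta = dphi/deta, staggered at eta - deta/2

def lap(f):
    return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2

eta = 1.0
for _ in range(200):
    delta = 0.5 * alpha * (deta / eta) * 1.0
    dV = lam * (phi**2 - eta0**2) * phi
    mom = ((1.0 - delta) * mom + deta * (lap(phi) - dV)) / (1.0 + delta)  # Eq. (2.33), a^beta = 1
    phi = phi + deta * mom                                                # Eq. (2.34)
    eta += deta

print(phi.min(), phi.max())  # the walls persist; the field stays near the vacua
```

Note that nothing here tracks the walls themselves: they persist only because the double-well potential and the damped field dynamics sustain them, exactly the point made above.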
2.6.2 Abelian-Higgs Strings
For the Abelian-Higgs string we begin with the U(1) locally invariant
Lagrangian density presented previously. From variation of the action, under the assumption
of an FLRW metric and the temporal gauge (A_0 = 0), come the equations of motion,
φ̈ + 2(ȧ/a) φ̇ = D_j D_j φ − (a²λ/2)(|φ|² − σ²)φ  (2.36)
Ḟ_{0j} = ∂_i F_{ij} − 2a² e² Im[φ* D_j φ]  (2.37)

along with Gauss's law,

∂_i F_{0i} = 2a² e² Im[φ* φ̇]  (2.38)
Again we have the same issue in these simulations as in the domain wall case: the
comoving radii of strings shrink by several orders of magnitude. In order to fix the
comoving width, [13] took a different approach from the one of [48]
described in the previous section (which simply modified the equations of motion by
fiat). This approach consists in modifying the comoving scalar and gauge couplings
to explicitly depend on the physical couplings as,
λ = λ_0 a^{−2(1−β)},  e = e_0 a^{−(1−β)}  (2.39)
which now means that, depending on the value of β, the comoving widths can either be constant
throughout the simulation (β = 0) or shrink as in the true physical case (β = 1).
Note that there is some flexibility in this approach, allowing one to implement what [13] named
core growth. In the true equations of motion, by normalizing the scale factor to unity at
the end of the simulation, the couplings are extremely large at the initial time-steps,
which prevents evolution of the fields without artifacts. In the core growth "trick"
we start instead with a scale factor normalized to unity at the beginning, and a negative β, which forces
the comoving radii to grow, evading the problem of too large a string width at early
conformal times and permitting a string network to form. After a sufficient number of
time-steps, we can then set β = 1 and resume normal physical evolution. This core
growth procedure is illustrated in Fig. 2.6a.
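The effect of β on the comoving width can be read off from Eq. (2.39). The scaling below (comoving width ∝ 1/(a √λ σ), hence ∝ a^(−β)) is our reading of the modified couplings; treat it as an illustrative sketch:

```python
import numpy as np

# Sketch of the comoving string width under the modified couplings of Eq. (2.39).
# The width formula w ∝ 1/(a sqrt(lambda) sigma) is an assumption of this sketch.
a = np.linspace(0.1, 1.0, 100)

def comoving_width(a, beta, lam0=1.0, sigma=1.0):
    lam = lam0 * a ** (-2.0 * (1.0 - beta))
    return 1.0 / (a * np.sqrt(lam) * sigma)

w_true = comoving_width(a, beta=1.0)      # physical case: shrinks as 1/a
w_fixed = comoving_width(a, beta=0.0)     # constant comoving width
w_growth = comoving_width(a, beta=-0.27)  # core growth (the value shown in Fig. 2.6a)
print(w_true[0] / w_true[-1], bool(w_growth[-1] > w_growth[0]))
```

The three cases reproduce the qualitative behavior of Fig. 2.6a: shrinking, constant, and growing comoving radii.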
The equations of motion are now,
φ̈ + 2(ȧ/a) φ̇ = D_j D_j φ − (a^{2β} λ_0/2)(|φ|² − σ²)φ  (2.40)
Ḟ_{0j} + 2(1 − β)(ȧ/a) F_{0j} = ∂_i F_{ij} − 2a^{2β} e_0² Im[φ* D_j φ]  (2.41)
Notice the Hubble damping term in the Maxwell equation. This term vanishes in
the true equations of motion (β = 1); however, when β ≠ 1 it must be kept.
This term was not present in earlier Abelian-Higgs simulations, such as [42], which
meant that the modified evolution would not preserve Gauss's law. After
this modification by [13], Gauss's law is indeed preserved on the lattice to machine
precision. The only problem with constant comoving width is the fact that energy-momentum conservation is violated (albeit at the level of a few percent). So far, at least
for Abelian-Higgs string networks, it seems this violation does not impact overall
network dynamics heavily (see [27]), although this has not been explored for other
string types.
Fig. 2.6 Two crucial aspects of Abelian-Higgs field theory simulations: the evolution of the string
radius versus conformal time, when β is equal to 1 or to −0.27, demonstrating why the core growth trick works; and the lattice discretization, with the scalar field (blue dots)
defined at lattice sites and the links between sites on which the gauge field
lives
Now we are almost ready to write the discretized equations of motion to update
all the fields on a 3D comoving lattice. The correct way to do so is to use the
description of gauge fields on a lattice given by [66], which preserves gauge
invariance on the lattice. This description is based on the interpretation that gauge fields
act as parallel transporters,

U_j^x = e^{−i A_j^x}  (2.42)

defined half-way (at links) between lattice points spaced by Δx (note: in the above
definition we have re-scaled the gauge field as A_j → A_j Δx; this implies the electric
field is re-scaled in the same way, since E_j = F_{0j}). A schematic representation can
be found in Fig. 2.6b. The scalar fields reside at lattice sites. Going around a
lattice square of area Δx² we can write the following product of link variables, Π_{ij},
Π_{ij} = U_j^x U_i^{x+k_j} (U_j^{x+k_i})* (U_i^x)* = exp[iΔx(∂_i⁺ A_j(x) − ∂_j⁺ A_i(x))]  (2.43)
denominated the plaquette operator. Here the electromagnetic field tensor is already
apparent. From this, we can subsequently write down the gauge field strength,

(1/2) Σ_{i,j} F_{ij} F_{ij} = (1/Δx⁴) Σ_i Σ_j [1 − Re Π_{ij}]  (2.44)
For convenience, we will also define the backwards derivative of Fi j , (forward)
gauge covariant derivatives and a Laplacian stencil,
30
2 Topological Defects
∂_j⁻ F_{ij} = (1/Δx³) Σ_{j≠i} ( Im[Π_{ij}(x)] − Im[Π_{ij}(x − k_j)] )  (2.45)

D_j⁺ φ^x = (1/Δx) [U_j^x φ^{x+k_j} − φ^x]  (2.46)

D_j⁻ D_j⁺ φ^x = (1/Δx²) Σ_j [U_j^x φ^{x+k_j} − 2φ^x + (U_j^{x−k_j})* φ^{x−k_j}].  (2.47)
We now have all the ingredients to recover the lattice discretization of [13]. It
is then straightforward to take the equations of motion and create the following
staggered leap-frog (second order in time) evolution scheme,
(a²Δ)^{x,η+1/2} = (a²Δ)^{x,η−1/2} + Δη a_η² [D_j⁻ D_j⁺ φ^{x,η} − (λ_0/2) a_η^{2β} (|φ^{x,η}|² − σ²) φ^{x,η}]  (2.48)

[E_i/e²]^{x,η+1/2} = [E_i/e²]^{x,η−1/2} + (Δη/e_η²) [−∂_j⁻ F_{ij} − 2e_0² a_η^{2β} Im[φ* D_i⁺ φ]^{x,η}]  (2.49)

φ^{x,η+1} = φ^{x,η} + Δη Δ^{x,η+1/2}  (2.50)

A_i^{x,η+1} = A_i^{x,η} + Δη E_i^{x,η+1/2}  (2.51)
to order O(Δx²), O(Δη²). In the continuum limit (i.e., when the lattice spacing Δx
vanishes), this evolution scheme reduces to the above equations of motion. We can
also devise the discrete version of Gauss's law,

G = ∂_j⁻ E_j − 2e_0² a^{2β} Im[φ^{x,η,*} Δ^{x,η−1/2}] = 0  (2.52)
With the summary of the simulations used throughout this thesis now complete,
we must warn the reader that simulations are not the be-all and end-all of cosmic string
studies, in the sense that one is prevented (by hardware and precision limits) from completely
simulating a network of strings/walls throughout all of cosmic history. For this reason, a
combination of analytical studies with simulations is often used. We will thus present
in the next section an example of a model that describes the averaged properties of a
network of defects, provided, of course, it is properly calibrated by simulations beforehand.
2.7 Network Evolution Modelling
The canonical way to analytically treat network evolution for defects is through the
Velocity-dependent One-Scale (VOS) model [38]. For the cosmic string case, we
begin by jotting down the simplest action for an infinitely thin and infinitely long
four-dimensional bosonic string,

S = −μ ∫ √(−γ) d²σ  (2.53)
where μ is the four-dimensional string tension and γ is the determinant of the induced world-sheet metric,
γ_ab = g_μν ∂_a x^μ ∂_b x^ν, with ∂_a denoting derivatives with respect to the world-sheet coordinates τ, σ. We note that while field theory strings don't necessarily have an infinitely
small radius (in fact, this is undesirable in field theory simulations, as it leads to lattice
artifacts), assuming a small radius as an approximation, one is often able to recover
the Nambu-Goto action as an effective action from the field theory one (see [65] for
the Abelian-Higgs model). Bear in mind, however, that in some aspects Nambu-Goto
strings and field theory ones are massively different (all one needs to do is compare
velocities and loop production rates from [27] and [50]). Still, this is beside the
point we want to make presently.
The procedure to derive the VOS model involves obtaining the equations of motion
of the Nambu-Goto action on a flat FLRW background, where the metric is given by,
ds 2 = a 2 (τ )(dτ 2 − d x 2 )
(2.54)
where a is the scale factor and τ the conformal time, and where the transverse-temporal
gauge (ẋ · x′ = 0) is applied,
ẍ + 2(ȧ/a)(1 − ẋ²) ẋ = (1/ε)(x′/ε)′  (2.55)

ε̇ = −2(ȧ/a) ε ẋ²  (2.56)

where dots and dashes indicate derivatives with respect to the conformal time τ and
to the spatial world-sheet coordinate σ, respectively, and ε is the quantity,

ε = √( x′² / (1 − ẋ²) )  (2.57)
Then an averaging procedure is applied to obtain two macroscopic quantities,

ρ = (μa/V) ∫ ε dσ = μ/L²,  v² = ( ∫ ẋ² ε dσ ) / ( ∫ ε dσ )  (2.58)
a density, ρ, and a root-mean-squared velocity, v. Here μ is the string mass per unit length,
ε is related to the string energy (the zeroth component of the energy-momentum tensor), a is the
scale factor and L is the correlation length. Once this averaging procedure is applied,
a set of two differential equations is revealed,
2 (dL/dt) = 2H L(1 + v²)  (2.59)

dv/dt = (1 − v²) ( k/L − 2Hv )  (2.60)
and these comprise the core of the VOS model, where L is a correlation length, k is a
momentum parameter (which, while originally a constant, can be analytically determined
to take a specific form, detailed in the next subsection) and v the root-mean-square
(RMS) velocity. In order to obtain the standard model, we need to explicitly account
for the velocity dependence of the momentum parameter and also include one of the
main energy-loss mechanisms of string networks.
2.7.1 Standard Velocity Dependent One-Scale Model
With regards to the first missing ingredient, a phenomenological shape for the
momentum parameter k(v) was obtained via a combination of the explicit form
of this parameter for the helicoidal string solution and subsequent comparison with
Nambu-Goto simulations in non-relativistic and relativistic regimes [39],
k(v) = (2√2/π) (1 − v²)(1 + 2√2 v³) (1 − 8v⁶)/(1 + 8v⁶)  (2.61)

which in the relativistic regime should take the following form,

k(v) = (2√2/π) (1 − 8v⁶)/(1 + 8v⁶)  (2.62)
This form of the momentum parameter presumes a maximal v² of
1/2 (obtained in the low expansion rate limit) and a maximum k(v) of 2√2/π, which
corresponds to the small-amplitude limit of the helicoidal string solution ansatz.
Note that even without considering this small-amplitude limit, this parameter cannot
exceed unity unless a wiggly string is present.1 In the case of v > 1/√2, the momentum
parameter is set to k(v) = 0.
1 Essentially this follows from considering that in the wiggly case the one-scale approximation is
not exactly valid: the curvature scale and the characteristic length are different. Assuming some
proportionality between the two, a constant factor would then multiply the momentum
parameter in the model.
Fig. 2.7 The two main mechanisms for the creation of cosmic string loops: the collision between
two strings and the self-intersection of a single string
And now for the second missing ingredient: energy loss via a velocity-dependent
function F(v). In the standard VOS, this function only has a term describing energy
loss via loop production. The creation of loops occurs either when a long string
self-intersects or when two long strings collide and exchange partners (depicted
in Fig. 2.7). These loops (which contain some string length and therefore some energy
density) contract and collapse to a point, vanishing. We remark that when cosmic
strings meet, the probability of intercommutation is close to unity, although such is
not necessarily the case for cosmic superstrings (where the probabilities follow from scattering
amplitudes [63]). The rate of energy loss due to loop production is parametrized as
being proportional to the velocity,
being proportional to the velocity,
dρ/dt = c v ρ/L  (2.63)

where c, the loop-chopping parameter, is the constant of proportionality. We then end
up with the following VOS model,
2 (dL/dt) = 2H L(1 + v²) + F(v)  (2.64)

dv/dt = (1 − v²) ( k(v)/L − 2Hv )  (2.65)

where F(v) = cv.
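The scaling behavior encoded in these equations can be seen by direct integration. The sketch below integrates Eqs. (2.64)-(2.65) with the relativistic k(v) of Eq. (2.62) in the radiation era (H = 1/2t); the value c = 0.23 is a commonly quoted Nambu-Goto calibration, used here only for illustration:

```python
import numpy as np

c = 0.23  # loop-chopping efficiency; an example value, not a result of this thesis

def k_rel(v):
    """Relativistic momentum parameter, Eq. (2.62)."""
    return (2.0 * np.sqrt(2.0) / np.pi) * (1.0 - 8.0 * v**6) / (1.0 + 8.0 * v**6)

t, dt = 1.0, 1.0e-3
L, v = 0.1, 0.1                     # deliberately far from the attractor
for _ in range(500_000):            # simple Euler integration up to t ~ 500
    H = 1.0 / (2.0 * t)             # radiation era
    dL = (2.0 * H * L * (1.0 + v**2) + c * v) / 2.0   # from 2 dL/dt = 2HL(1+v^2) + cv
    dv = (1.0 - v**2) * (k_rel(v) / L - 2.0 * H * v)
    L, v, t = L + dL * dt, v + dv * dt, t + dt

print(L / t, v)  # both ratios approach constants: the linear scaling attractor
```

Starting from very different initial conditions leads to the same late-time L/t and v, illustrating the attractor nature of the scaling solution discussed below.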
Note that so far we have assumed strings, but it is possible to deduce
equivalent models for other defect networks (see [37, 40, 47]). We can also present
the standard domain-wall VOS, derived from a higher-dimensional analogue of the
Nambu-Goto action, as its extended form will be used in this thesis,
dL/dt = H L(1 + 3v²) + c_ω v  (2.66)

dv/dt = (1 − v²) ( k_ω/L − 3Hv )  (2.67)
where k_ω and c_ω are, by analogy, the momentum parameter and the blob-chopping
parameter. Note that in this standard case both parameters are constants. Naturally,
extensions of Nambu-Goto (and by extension of the VOS) can also come from
higher-dimensional generalizations. For instance, the most natural action for a cosmic superstring would be the Dirac-Born-Infeld action from [46], which looks like
Nambu-Goto,
S = −μ ∫ dτ dσ √(−|γ_αβ + λF_αβ|)  (2.68)
plus an additional U(1) gauge field strength F_αβ and the corresponding coupling
constant λ. This is the reason for our previous comment on why a more "natural"
cosmic superstring simulation might be derived from existing Nambu-Goto codes.
For cosmic superstrings, this action together with junction conditions (to ensure the conservation of charge) enables one to obtain junction dynamics in a string network, which
can be incorporated in a VOS model for multiple strings; see for instance [51].
Going back to standard string network evolution, a notable behavior of these networks
is known as linear scaling. It is one of the possible solutions to the VOS
model, and is given by,

L ∝ t ∝ d_H,  v = const.  (2.69)
This linear attractor solution [38] can either be a curse or a blessing, depending on
the defect. For instance, let's consider for a moment domain walls, where the density
is given by ρ = σ/L with σ the tension per unit area, and therefore ρ ∝ t⁻¹. By
comparison with the critical density of the Universe (of order ρ_c ∝ t⁻²) one can see
that the density parameter for domain walls grows linearly with time, and so walls would
eventually dominate the energy density of the Universe! Fortunately this overclosing
behavior can be evaded through various mechanisms (an example would be the
introduction of biases, see [57]). For analogous reasons (ρ = μ/L² ∝ t⁻², tracking the
critical density), cosmic strings are benign.
Monopoles are a slightly more complicated case: depending on how effective
monopole-anti-monopole pair annihilation is (global [35] vs. local), and therefore
on whether or not they tend to scaling, they can be catastrophic as well.
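The arithmetic behind this curse-or-blessing distinction fits in a few lines (a scaling estimate only; all prefactors are ignored):

```python
# Under linear scaling L ∝ t: rho_walls = sigma/L ∝ t^-1, rho_strings = mu/L^2 ∝ t^-2,
# while the critical density obeys rho_c ∝ t^-2. Prefactors are ignored throughout.
def omega_ratio(t, defect):
    """Omega(t)/Omega(1), the growth of the density parameter since t = 1."""
    rho = t**-1.0 if defect == "wall" else t**-2.0
    rho_c = t**-2.0
    return rho / rho_c

print(omega_ratio(10.0, "wall"))    # walls: the density parameter grows with t
print(omega_ratio(10.0, "string"))  # strings: the density parameter stays constant
```

This is why scaling walls overclose the Universe while scaling strings remain cosmologically benign.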
2.7.2 Extended Velocity Dependent One-Scale Model
As was mentioned in the previous section, the standard k(v) for Nambu-Goto cosmic
strings was shown to have a specific dependence on the velocity, while no such
work had been done for domain walls and their respective standard VOS; this is a
consequence of the non-existence of a non-trivial analytical ansatz for walls (in contrast
with the cosmic string case, where the helicoidal string is used). However, in order to predict the correct velocity dependencies and
to ensure the walls VOS could accurately model the evolution in regimes where the
velocity does change (examples include the radiation-to-matter transition, or the
transition from an inflationary era to radiation), it was necessary to introduce
some modifications. By analogy with the Nambu-Goto string k(v), the following
generalized momentum parameter was proposed [40],
k(v) = k_0 (1 − (qv²)^β) / (1 + (qv²)^β).  (2.70)
where k_0, β and q are free parameters. Note as well that this reduces to the relativistic-string
k(v) after making appropriate choices of parameters. In both cases q and k_0
have a clear physical interpretation: q is limited by the maximal velocity of the defect
network,

0 < 1/q ≤ v²_max  (2.71)
which, by the arguments of [58],

v²_max = n/(n + 1)  (2.72)
should not exceed 2/3 for walls and 1/2 for strings; k_0, assuming non-wiggliness,
cannot exceed 2. The calibration of the extended VOS for walls in [40] respects these
constraints. Later in this thesis we investigate whether the Nambu-Goto string form is adequate
for the case of field theory strings or whether the generalized form must be used.
The other ingredient of the extension adds another energy-loss mechanism to
F(v),

F(v) = c_ω v + d[k_0 − k(v)]^r  (2.73)
where d and r are new free parameters. This new term phenomenologically assumes that
radiative losses can be represented by a power law of the curvature: essentially, regions
where the wall (or string) has greater curvature are smoothed out and emit radiation.
Note that it does not specify the nature of this radiation: in the case of
global domain walls it will be scalar radiation only, but one cannot determine
how much of it is emitted via the massless or the massive channel. In the
case of walls it was shown that losses via radiation greatly exceed blob production
(via the linear term above; see [40]). In this thesis we will also study whether this term is
necessary (and sufficient) for describing the evolution of field theory strings, where
the emission of massive radiation is posited to play a large role [27]. In the case
of strings this emission of radiation should be greater in regions where the string is
discontinuous (such as at a sharp kink) or where the string doubles back on itself (a cusp).
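Putting the two extended-VOS ingredients together, the sketch below evaluates the generalized momentum parameter of Eq. (2.70) and the energy-loss function of Eq. (2.73); all parameter values are placeholders, not a calibration:

```python
import numpy as np

# Placeholder parameter values for illustration; real values come from
# calibration against simulations, as discussed in the text.
k0, q, beta = 1.0, 2.3, 1.5
c_w, d, r = 0.5, 0.3, 1.4

def k_gen(v):
    """Generalized momentum parameter, Eq. (2.70)."""
    x = (q * v**2) ** beta
    return k0 * (1.0 - x) / (1.0 + x)

def F(v):
    """Energy losses, Eq. (2.73): chopping term plus curvature power-law radiation."""
    return c_w * v + d * (k0 - k_gen(v)) ** r

v_max = np.sqrt(1.0 / q)
print(k_gen(0.0), k_gen(v_max), F(0.3))  # k falls from k0 to 0 as v^2 -> 1/q
```

Note how the radiative term d[k_0 − k(v)]^r grows as the network becomes faster (smaller k), mimicking losses from high-curvature regions.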
2.7.3 Observational Footprints from Semi-Analytical
Modelling
The previously mentioned VOS model can be used to derive the expected observational footprints of a network of strings (or other defects) in a given cosmological
background. Since this easily motivates the need for improvements in the calibration of semi-analytical models, especially in light of next-generation facilities [15, 25,
36], we will now review how to connect such models to footprints in two different
background types.
2.7.3.1 Cosmic Microwave Background
Approximately 370,000 years after its birth (redshift z = 1100) the Universe had
cooled and expanded enough to allow charged electrons and protons to bind, forming neutral hydrogen atoms. This had the consequence of greatly increasing the
mean free path of photons, allowing them to traverse greater distances (previously, Thomson scattering with electrons occurred after a short traveling distance).
The Universe effectively became "transparent." Shortly thereafter, since the neutral atoms had electrons not in the ground state, the electrons themselves would jump
towards the ground state and produce photons (this is known as photon decoupling).
The combined effect is for these freely traveling photons to produce a background
of electromagnetic radiation (at microwave wavelengths) that can be observed even
today. The Cosmic Microwave Background, first detected in 1965 by Arno Penzias
and Robert Wilson [43], is landmark evidence of the Big Bang origin of the Universe
and, via successive observational probing [4, 10], has allowed for the confirmation
of several pillars of modern Cosmology (such as ΛCDM and nucleosynthesis, and by
providing evidence for inflation).
This background constitutes one of the most, if not the most, precisely described
black-body radiation spectra observed to date. It currently has a temperature of
2.72548 ± 0.00057 K. Due to its rather homogeneous nature, the anisotropies of
this background are often the object of study. For instance, the map of temperature
anisotropies related to density perturbations can be described in terms of spherical
harmonics,
δT/T (n) = Σ_l Σ_{m=−l}^{l} a_lm Y_lm(n)  (2.74)
where l is a multipole moment, n is a unit vector in the direction of the line of
sight and a_lm are coefficients which describe the temperature perturbation. We can
additionally define the multipole power spectrum C_l,

C_l^T = ⟨|a_lm|²⟩  (2.75)
which allows the full power spectrum of anisotropies to be written as,

⟨ |δT/T (n)|² ⟩ = (1/4π) Σ_{l=0}^{∞} (2l + 1) C_l^T  (2.76)

D_l^T = l(l + 1) C_l^T / 2π  (2.77)
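These relations are simple enough to sketch numerically. The snippet below applies them to a synthetic toy spectrum (chosen, for illustration only, to be flat in D_l):

```python
import numpy as np

l = np.arange(2, 1001)
Cl = 1.0e-10 / (l * (l + 1.0))         # synthetic toy spectrum, flat in D_l

Dl = l * (l + 1.0) * Cl / (2.0 * np.pi)                  # Eq. (2.77)
variance = np.sum((2.0 * l + 1.0) * Cl) / (4.0 * np.pi)  # Eq. (2.76), truncated at l = 1000
print(Dl[0], variance)
```

The D_l convention compensates the geometric l(l+1) factor, which is why CMB spectra are usually plotted this way.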
Note that these are not the only types of anisotropies of the CMB: one can also look
into anisotropies in polarization, either curl or divergence-free, which by analogy with
electrostatics are named E-mode (EE) and B-mode polarizations power spectra (BB),
respectively, and additionally a cross-correlation temperature-E-polarization power
spectrum (TE). In order to obtain the theoretical expectation of how different processes can contribute to the existence of perturbations and the resulting anisotropies, it is necessary to solve a set of linear differential equations in Fourier space [23],

\mathcal{D}_{ac}\, \tilde{X}_a(k, \eta) = \tilde{S}_c(k, \eta)    (2.78)
where \mathcal{D}_{ac} is a linear differential operator, \tilde{S} is a seed or source term, and \tilde{X} is a vector with information on all the perturbation variables for each k mode (including dark matter density, velocity fluctuations, etc.). Perturbations from the analysis of this equation can be classified as scalar, tensor or vector. Note that, to linear order, there is no mixing between perturbation types.
Now, in order to solve this equation one must first make an assumption about the seed/source term above. At the beginning of the 1980s, the cosmological community was split between two distinct possibilities: inflation or topological defects. In the first case, perturbations are generated during inflation, pushed out of the cosmological horizon, and after inflation will gradually re-enter (as the specific wavelengths become comparable to the horizon), then evolving passively under the effects of cosmological expansion and gravity. The phases of such perturbations will remain constant after their production, hence the source term will be null, \tilde{S}(k, \eta) = 0. As soon as the perturbations re-enter the horizon they will provoke in-phase (coherent) acoustic oscillations of the surrounding fluid. This gives rise to a structure of peaks and troughs in the resulting temperature power spectra.
Topological defects, however, are an entirely different beast: a network will continue contributing perturbations actively throughout space as it evolves. Here the source term is no longer null and will generically depend on the energy-momentum tensor of the full network. Due to the nature of this seeding, an ensemble of perturbations will give rise to incoherent acoustic oscillations. The computation of perturbations in Fourier space according to the aforementioned equation is then intrinsically linked to knowing the full history of the energy-momentum tensor of the full network.
In order to compute the power spectrum, we must describe the source term and
then solve for X̃ , which should be a matter of computing,
\tilde{X}_j(\eta, \eta_0, k) = \int_{\eta_{in}}^{\eta_0} d\eta\, G_{jm}(\eta_0; \eta, k)\, \tilde{S}_m(\eta, k)    (2.79)
and for obtaining the power spectrum,

\langle \tilde{X}_j(\eta_0, k)\, \tilde{X}_l(\eta_0, k') \rangle = \int\!\!\int d\eta\, d\eta'\, G_{jm}(\eta, k)\, G_{ln}(\eta', k')\, \langle \tilde{S}_m(\eta, k)\, \tilde{S}_n(\eta', k') \rangle .    (2.80)
Equivalently, this means that obtaining the power spectrum entails computing the
Unequal-Time Correlators (UETC) of the seed term,
U_{mn}(k, \eta, \eta') = \frac{1}{V} \int d^3x\, \langle S_m(\eta, \mathbf{x})\, S_n(\eta', \mathbf{x}') \rangle .    (2.81)
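To make the UETC concrete, here is a minimal numerical sketch: a toy ensemble of random source histories at a fixed Fourier mode (standing in for the network stress-energy; nothing here is calibrated) is averaged to obtain the discrete analogue of Eq. (2.81):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ensemble of toy source histories S(eta) at a fixed Fourier mode:
# random walks in phase, so correlations decay with |eta - eta'|
n_real, n_eta = 2000, 64
phases = rng.normal(size=(n_real, n_eta)).cumsum(axis=1)
sources = np.cos(0.3 * phases)

# Discrete analogue of the UETC: average S(eta) S(eta') over realizations
uetc = sources.T @ sources / n_real

# The equal-time diagonal is the source power at each conformal time
power = np.diag(uetc)
```

In a real computation the ensemble would come from defect simulations or the segment model described next, but the averaging step is exactly this one.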
This can be done either in numerical simulations (Nambu-Goto or Abelian-Higgs) or by following the Unconnected Segment Model approach from [44], wherein the seed terms are given by the total network energy-momentum tensor \Theta_{\mu\nu}(k, \eta), given by a sum over K energy-momentum tensors of unconnected consolidated “sticks,”

\Theta_{\mu\nu}(k, \eta) = \sum_{i=1}^{K} \sqrt{N_d^i}\; \Theta^i_{\mu\nu}\, T^{off}(\eta, \eta_i),    (2.82)
where T^{off} is a string decay factor responsible for switching off the contribution of the i-th segment after the time of its decay, and N_d is the number of segments that decay between two different conformal times. The stress-energy tensor of each i-th segment is given by the expression for a straight Nambu-Goto segment,
\Theta^{\mu\nu}(y) = \frac{\mu}{\sqrt{-g}} \int d\sigma \left( \sqrt{\frac{x'^2}{\dot{x}^2}}\, \dot{x}^\mu \dot{x}^\nu - \sqrt{\frac{\dot{x}^2}{x'^2}}\, x'^\mu x'^\nu \right) \delta^4(y - x(\sigma)) .    (2.83)
This quantity in Fourier space will depend on the string network velocity v and the comoving correlation length, ξ = L/a. As an example, consider the 00 component,
Fig. 2.8 The CMB spectrum obtained by the Planck collaboration (top) and the spectrum predicted by an inflationary model (bottom; obtained via the CMBACT4 software [44]). The first image is taken from [6]
\Theta_{00}(\eta, k) = \frac{\mu}{\sqrt{1 - v^2}}\, \frac{\sin(\mathbf{k} \cdot \mathbf{X}\, \xi \eta / 2)}{\mathbf{k} \cdot \mathbf{X} / 2}    (2.84)
where the vectors X and Ẋ are merely the segment orientation and velocity orientations (which can be related to the string position and velocity x and ẋ; see for instance [19]).
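A direct evaluation of this form factor can be sketched as follows (the function name and all numerical values are illustrative, not calibrated); note the sin(x)/x structure, which suppresses the contribution of segments longer than roughly 1/k:

```python
import numpy as np

def theta_00(k_vec, x_hat, v, xi, eta, mu=1.0):
    """00 component of a straight-segment stress tensor (Eq. 2.84).

    k_vec: Fourier mode; x_hat: unit segment orientation; v: rms velocity;
    xi: comoving correlation length in units of conformal time eta.
    Valid for k_vec not orthogonal to x_hat (else take the k -> 0 limit).
    """
    k_par = np.dot(k_vec, x_hat)
    boost = 1.0 / np.sqrt(1.0 - v**2)
    return mu * boost * np.sin(k_par * xi * eta / 2.0) / (k_par / 2.0)

# Long-wavelength mode: the segment sources its full comoving length xi * eta
val = theta_00(np.array([0.1, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
               v=0.6, xi=0.3, eta=1.0)
```

In the long-wavelength limit the ratio reduces to ξη, so the segment contributes like a point source of that comoving length.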
Assuming an r.m.s. velocity, the two quantities v and ξ can be given by the VOS equations from the previous section. Note that this approach is extremely plastic: even though a network of long unconnected Nambu-Goto segments is considered, a network of long Abelian-Higgs strings could still be assumed, as long as the evolution of the VOS is dictated by free parameters calibrated from Abelian-Higgs simulations. In addition, this approach has also been applied to domain wall networks [60] and even cosmic superstrings [19], where limits on the wall tension (GσL_0 < 5.6 × 10⁻⁶, where L_0 is the characteristic lengthscale at the present time) and the fundamental string tension (Gμ_F < 2.8 × 10⁻⁸) are obtained, respectively. The Nambu-Goto string tension limit is Gμ < 1.1 × 10⁻⁷ [5, 19].
Although we have not yet mentioned it, the relatively low tension limits presented above are a direct result of inflation being the dominant contribution to the temperature power spectrum. This is obvious from a glance at the observational data, whose power spectrum shape is consistent with the peak-and-trough structure predicted by inflation (see Fig. 2.8). Moreover, the broad hump predicted by a string network (see Chap. 4) is clearly not in agreement with the observed peak structure, and therefore strings cannot play a dominant role in structure formation. To be more quantitative: at multipole moment l = 10, the Planck collaboration [7] determined that strings cannot contribute more than 1–4% of the total temperature spectrum. Note, however, that even if strings cannot contribute a significant amount to the temperature spectrum, they can still contribute significantly to the polarization spectra [14], whose characterization is one of the main reasons for the next-generation instrument COrE [25].
2.7.3.2 Stochastic Gravitational Wave Background
Another type of cosmological background expected to have been produced in the early Universe is the stochastic gravitational wave background (SGWB). Although as yet undetected, the recent observation of gravitational waves from black hole and neutron star mergers by the LIGO experiment [2, 3], together with both current and upcoming observational facilities [15, 29, 36], has brought the characterization of this background into the limelight. Several cosmological sources can contribute to this background (see [20] for a general review), including a network of cosmic strings. Such networks form loops, and cosmic string loops can emit gravitational waves with power [64],
\frac{dE_i}{dt} = P_i\, G\mu^2    (2.85)

where P_i will be given by assumptions derived from the presence of small-scale structure on the loop (kinky or cuspy loops, for instance, give rise to different P_i).
Assuming some (if not all) of the energy of the loops is emitted via this gravitational channel, loops forming and decaying throughout cosmic history will contribute to the SGWB. One possible way to describe the history of the loop number density involves using the VOS model [59, 62], a central part of this thesis. The main advantage of using the VOS is that the evolution of the network in non-scaling regimes can be determined exactly, provided the correct calibration of the model is used. We remark that in the case of the SGWB, the proper treatment of the radiation-to-matter transition can have an impact on the resulting spectra; again, see [59].
Note that, so far, such computations presume all energy in the loop is lost via gravitational radiation, which is in line with Nambu-Goto simulations, though possibly not with Abelian-Higgs simulations (due to the hinted presence of scalar radiation). Some work has been done on studying the validity of the Nambu-Goto approximation in flat-space backgrounds, although the initial shape of the loop can yield contradicting conclusions [28, 41]; we will return to this point in the conclusions of the thesis.
For now we will show how the SGWB arises from the number density of string loops and how it can be computed from the VOS. To do so, we begin with the gravitational wave energy density spectrum in a unit logarithmic interval of frequency, which can be written as a sum over emission modes i,
\Omega_{gw}(f) = \sum_{i=1}^{\infty} \Omega^i_{gw}(f)    (2.86)

= \frac{G\mu^2 f}{\rho_c} \sum_{i=1}^{\infty} C_i P_i    (2.87)
where the specific expression for \Omega^i_{gw}(f) can be modelled (following [16, 17, 64]) in terms of the power spectra P_i of loops in harmonic i, where
C_i accounts for the contribution of the number density of loops n(l, t) of length l at different times. The first assumption often made is that the P_i power spectrum is dominated by cusps along loops, and thus must take the form P_i = Γ/(i^q ζ(q)), where q = 4/3 for cuspy loops, and the normalization is such that Γ corresponds to the total emitted power, Γ = Σ_i P_i, which from Nambu-Goto simulations takes the value Γ ≈ 50 [17]. We note that this might not be valid for Abelian-Higgs simulations.
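The normalization of this cusp-dominated spectrum is easy to check numerically. The sketch below (pure NumPy; ζ(4/3) is evaluated by a partial sum plus an integral tail correction, and Γ ≈ 50 is the Nambu-Goto calibration quoted above) verifies that the P_i sum back towards Γ:

```python
import numpy as np

GAMMA = 50.0   # total emitted power, Nambu-Goto calibration [17]
q = 4.0 / 3.0  # cusp-dominated spectral index

# Riemann zeta(q): direct partial sum plus the integral tail N^(1-q)/(q-1)
n = np.arange(1, 100001)
zeta_q = np.sum(n ** -q) + 100000.0 ** (1.0 - q) / (q - 1.0)

def p_harmonic(i):
    """Power radiated into harmonic i: P_i = Gamma / (i**q * zeta(q))."""
    return GAMMA / (i ** q * zeta_q)

# Partial sums approach Gamma only slowly, since the tail falls as N^(-1/3)
partial = p_harmonic(n).sum()
```

The slow convergence of the harmonic sum is itself physically relevant: high harmonics carry a non-negligible fraction of the emitted power.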
The expression for C_i is given by,

C_i(f) = \int_0^{\infty} dt\, \frac{2i}{f^2}\, \frac{n(l_i, t)}{(1+z)^5}    (2.88)
where z is the redshift. With this, the crux of computing the SGWB is reduced to determining how this number density evolves throughout cosmic history. The expression for loops created (per comoving volume) at time t_c can be computed by taking into account that,
n(l_c, t_c) = \left. \frac{dn_c}{dt} \right|_{t=t_c} \left. \frac{dt}{dl} \right|_{t=t_c}    (2.89)
where dn_c/dt is the rate of loop production and dl/dt is the rate of change of the loop length. An expression for the first term can be found by dividing the expression for energy loss through loop production, dρ/dt = cvρ/L, by the energy of each loop at the moment of creation, E = μl, where l is the size of the loops at creation (assumed to be some fraction of the size of the horizon, l = αL),
\frac{dn_c}{dt} = \frac{c\, v}{\alpha L^4}    (2.90)
which is almost uniquely determined by the VOS, either through calibration (c) or via evolution of the equations (L, v). Note however that α needs to be determined by numerical simulations, and so far it has been determined only for Nambu-Goto strings, where it has been found to be α = 0.1. The second term can be found by noting that the current loop length is given by the difference between the initial loop size αL and the length lost to gravitational radiation,

l(t) = \alpha L_c + \Gamma G\mu (t_c - t)    (2.91)

where L_c is the mean string separation at the creation time, and therefore l(t_c) = αL_c.
In other words, we can write the second term as,
\left. \frac{dt}{dl} \right|_{t=t_c} = \frac{1}{\alpha \frac{dL}{dt} + \Gamma G\mu} .    (2.92)
Afterwards, it’s simply a matter of remembering that a given number density of
loops will contain loops created at different times. Therefore we must sum over these
different creation times,
n(l, t) = \sum_{c} n(l_c, t_c) \left( \frac{a(t_c)}{a(t)} \right)^3    (2.93)
and from this one can compute the entire SGWB spectrum. In [17], an older release of NANOGrav data (and other Pulsar Timing Array data) seemed to yield a bound on the string tension of Gμ < 1.5 × 10⁻¹¹. We do note, however, the presence of an unexplained signal in the latest 12.5-year NANOGrav data [8], whose shape is consistent with the SGWB spectra produced by a cosmic string network [18, 24], assuming tensions in the range Gμ ∈ [4 × 10⁻¹¹, 10⁻¹⁰]. Note, however, that it might be too early to claim that a SGWB from a network of strings has been detected: the signal itself does not seem (at least at first sight) to exhibit quadrupolar spatial correlations. As such, further studies (and observations) are required to properly characterize it.
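The bookkeeping of Eqs. (2.89)–(2.93) can be sketched as a toy radiation-era computation. All VOS numbers below are hypothetical placeholders; only α = 0.1 and Γ ≈ 50 echo the calibrations quoted in the text:

```python
import numpy as np

# Hypothetical scaling-regime values; a calibrated VOS would supply these
cbar, v, xi = 0.23, 0.66, 0.27
alpha, Gamma_Gmu = 0.1, 50.0 * 1.0e-11

t = 1.0                             # observation time (arbitrary units)
t_c = np.linspace(1.0e-3, t, 4000)  # grid of loop creation times
dt_c = t_c[1] - t_c[0]
L_c = xi * t_c                      # scaling: L grows linearly with time

# Eq. (2.90): loop production rate per unit volume at each creation time
dnc_dt = cbar * v / (alpha * L_c ** 4)

# Eq. (2.91): loops shrink through gravitational-wave emission
l_now = alpha * L_c - Gamma_Gmu * (t - t_c)

# Eq. (2.93): dilute each epoch by (a(t_c)/a(t))^3 with a ~ t^(1/2)
weight = dnc_dt * dt_c * (t_c / t) ** 1.5

# Binning surviving loops in length gives n(l, t); the change of
# variables dt/dl of Eq. (2.89) is handled implicitly by the binning
alive = l_now > 0.0
n_l, edges = np.histogram(l_now[alive], bins=50, weights=weight[alive])
n_l = n_l / np.diff(edges)          # per unit loop length
```

As expected, the resulting distribution is strongly dominated by the smallest (earliest-created, most diluted but most abundantly produced) loops.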
To conclude, we offer some remarks about the computation of the SGWB for cosmic superstring networks, which is so far poorly understood. For now, the role of junctions and their interplay with aspects of string phenomenology, and the subsequent impact on gravitational wave production, is not yet clear. What can be computed is the evolution of string networks with reduced intercommutation probabilities (and therefore reduced loop number densities), or the multi-tension VOS model for some fixed number of bound states. This was done in [61], where a conservative limit on the tension of F-strings was obtained: Gμ < 3.2 × 10⁻⁹.
2.8 Summary
We showed that the study of cosmic defects often employs a combination of numerical and analytical techniques. We also described more exotic string types, such as cosmic superstrings, which have some phenomenological aspects that can be captured via field theory simulations with more than one interacting string type. Both numerical and analytical approaches to string studies have advantages and disadvantages: semi-analytical models allow one to study the whole cosmological evolution of the network, which simulations, limited by resolution, box size and dynamical range, cannot. However, any such model often contains free parameters which cannot be obtained ab initio, but must instead be calibrated using simulations.
Since this symbiotic relationship sits at the heart of this thesis, we also described lattice field theory simulations for two types of defects (global domain walls and Abelian-Higgs strings) and presented the semi-analytical modelling to be used from here on out: versions of the canonical Velocity-dependent One-Scale model of string evolution.
References
1. Skyrme THR (1961) A non-linear field theory. Proc R Soc Lond A 260(1300):127–138. ISSN 0080-4630. https://doi.org/10.1098/rspa.1961.0018. http://rspa.royalsocietypublishing.org/content/260/1300/127
2. Abbott BP et al (2016) Observation of gravitational waves from a binary black hole merger.
Phys Rev Lett 116(6):061102. https://doi.org/10.1103/PhysRevLett.116.061102
3. Abbott BP et al (2017) Multi-messenger observations of a binary neutron star merger. Astrophys
J Lett 848(2):L12. https://doi.org/10.3847/2041-8213/aa91c9
4. Ade P et al (2016) Planck 2015 results. XIII. Cosmological parameters. Astron Astrophys
594:A13. https://doi.org/10.1051/0004-6361/201525830
5. Ade PAR et al (2014a) Planck 2013 results. XXV. Searches for cosmic strings and other topological defects. Astron Astrophys 571:A25. https://doi.org/10.1051/0004-6361/201321621
6. Ade PAR et al (2014b) Planck 2013 results. XVI. Cosmological parameters. Astron Astrophys
571:A16. https://doi.org/10.1051/0004-6361/201321591
7. Ade PAR et al (2014c) Planck 2013 results. XXV. Searches for cosmic strings and other topological defects. Astron Astrophys 571:A25. https://doi.org/10.1051/0004-6361/201321621
8. Arzoumanian Z et al (2020) The NANOGrav 12.5 yr data set: search for an isotropic stochastic gravitational-wave background. Astrophys J Lett 905(2):L34. https://doi.org/10.3847/2041-8213/abd401
9. Ascher UM, Mattheij RMM, Russell RD (1988) Numerical solution of boundary value problems for ordinary differential equations. Class Appl Math
10. Bennett CL, Larson D, Weiland JL, Jarosik N, Hinshaw G, Odegard N, Smith KM, Hill RS,
Gold B, Halpern M, Komatsu E, Nolta MR, Page L, Spergel DN, Wollack E, Dunkley J,
Kogut A, Limon M, Meyer SS, Tucker GS, Wright EL (2013) Nine-year Wilkinson microwave
anisotropy probe (WMAP) observations: final maps and results. APJS 208(2):20. https://doi.
org/10.1088/0067-0049/208/2/20
11. Berkovits N (2000) The Tachyon potential in open Neveu-Schwarz string field theory. JHEP
04:022. https://doi.org/10.1088/1126-6708/2000/04/022
12. Bevis N, Saffin PM (2008) Cosmic string Y-junctions: a comparison between field theoretic
and Nambu-Goto dynamics. Phys Rev D 78:023503. https://doi.org/10.1103/PhysRevD.78.
023503
13. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from
cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D
75:065015. https://doi.org/10.1103/PhysRevD.75.065015
14. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2010) CMB power spectra from cosmic strings:
predictions for the Planck satellite and beyond. Phys Rev D 82:065004. https://doi.org/10.
1103/PhysRevD.82.065004
15. Binetruy P, Bohe A, Caprini C, Dufaux J-F (2012) Cosmological backgrounds of gravitational waves and eLISA/NGO: phase transitions, cosmic strings and other sources. JCAP 1206:027. https://doi.org/10.1088/1475-7516/2012/06/027
16. Blanco-Pillado JJ, Olum KD, Shlaer B (2014) The number of cosmic string loops. Phys Rev
D 89(2):023512. https://doi.org/10.1103/PhysRevD.89.023512
17. Blanco-Pillado JJ, Olum KD, Siemens X (2017) New limits on cosmic strings from gravitational
wave observation
18. Blasi S, Brdar V, Schmitz K (2021) Has NANOGrav found first evidence for cosmic strings?
Phys Rev Lett 126(4):041305. https://doi.org/10.1103/PhysRevLett.126.041305
19. Charnock T, Avgoustidis A, Copeland EJ, Moss A (2016) CMB constraints on cosmic strings
and superstrings. Phys Rev D 93(12):123503. https://doi.org/10.1103/PhysRevD.93.123503
20. Christensen N (2019) Stochastic gravitational wave backgrounds. Rept Prog Phys
82(1):016903. https://doi.org/10.1088/1361-6633/aae6b5
21. Dai J, Leigh RG, Polchinski J (1989) New connections between string theories. Mod Phys Lett
A 4:2073–2083. https://doi.org/10.1142/S0217732389002331
22. Davis A-C, Brax P, van de Bruck C (2008) Brane inflation and defect formation. Phil Trans
Roy Soc Lond A366:2833–2842. https://doi.org/10.1098/rsta.2008.0065
23. Durrer R, Kunz M, Melchiorri A (2002) Cosmic structure formation with topological defects.
Phys Rept 364:1–81. https://doi.org/10.1016/S0370-1573(02)00014-5
24. Ellis J, Lewicki M (2021) Cosmic string interpretation of NANOGrav pulsar timing Data. Phys
Rev Lett 126(4):041304. https://doi.org/10.1103/PhysRevLett.126.041304
25. Finelli F et al (2018) Exploring cosmic origins with CORE: inflation. JCAP 1804:016. https://
doi.org/10.1088/1475-7516/2018/04/016
26. Firouzjahi H, Leblond L, Tye SH (2006) The (p, q) string tension in a warped deformed conifold.
J High Energy Phys 2006(05):047–047. https://doi.org/10.1088/1126-6708/2006/05/047.
27. Hindmarsh M, Lizarraga J, Urrestilla J, Daverio D, Kunz M (2017) Scaling from gauge and
scalar radiation in Abelian Higgs string networks. Phys Rev D 96(2):023525. https://doi.org/
10.1103/PhysRevD.96.023525
28. Hindmarsh M, Lizarraga J, Urio A, Urrestilla J (2021) Loop decay in Abelian-Higgs string
networks
29. Jenet F, Finn LS, Lazio J, Lommen A, McLaughlin M, Stairs I, Stinebring D, Verbiest J,
Archibald A, Arzoumanian Z, Backer D, Cordes J, Demorest P, Ferdman R, Freire P, Gonzalez
M, Kaspi V, Kondratiev V, Lorimer D, Lynch R, Nice D, Ransom S, Shannon R, Siemens X
(2009) The north American nanohertz observatory for gravitational waves
30. Jones NT, Stoica H, Tye SHH (2003) The production, spectrum and evolution of cosmic strings
in brane inflation. Phys Lett B 563:6–14. https://doi.org/10.1016/S0370-2693(03)00592-6
31. Kachru S, Kallosh R, Linde AD, Maldacena JM, McAllister LP, Trivedi SP (2003) Towards
inflation in string theory. JCAP 10:013. https://doi.org/10.1088/1475-7516/2003/10/013
32. Kibble TWB (1976) Topology of cosmic domains and strings. J Phys A 9:1387–1398. https://
doi.org/10.1088/0305-4470/9/8/029
33. Laguna P, Matzner RA (1990) Numerical simulation of bosonic superconducting string interactions. Phys Rev D 41:1751–1763. https://doi.org/10.1103/PhysRevD.41.1751
34. Lizarraga J, Urrestilla J (2016) Survival of pq-superstrings in field theory simulations. JCAP
1604(04):053. https://doi.org/10.1088/1475-7516/2016/04/053
35. Lopez-Eiguren A, Lizarraga J, Hindmarsh M, Urrestilla J (2017) Cosmic microwave background constraints for global strings and global monopoles. JCAP 1707:026. https://doi.org/
10.1088/1475-7516/2017/07/026
36. Maartens R, Abdalla FB, Jarvis M, Santos MG (2015) Overview of cosmology with the SKA.
PoS, AASKA14:016. https://doi.org/10.22323/1.215.0016
37. Martins CJAP, Achúcarro A (2008) Evolution of local and global monopole networks. Phys
Rev D 78:083541. https://doi.org/10.1103/PhysRevD.78.083541.
38. Martins CJAP, Shellard EPS (1996) Scale-invariant string evolution with friction. Phys Rev D
53:R575–R579. https://doi.org/10.1103/PhysRevD.53.R575.
39. Martins CJAP, Shellard EPS (2002) Extending the velocity dependent one scale string evolution
model. Phys Rev D 65:043514. https://doi.org/10.1103/PhysRevD.65.043514
40. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
41. Matsunami D, Pogosian L, Saurabh A, Vachaspati T (2019) Decay of cosmic string loops due
to particle radiation. Phys Rev Lett 122(20):201301. https://doi.org/10.1103/PhysRevLett.122.
201301
42. Moore J, Shellard E, Martins C (2002) On the evolution of Abelian-Higgs string networks.
Phys Rev D 65:023503. https://doi.org/10.1103/PhysRevD.65.023503
43. Penzias AA, Wilson RW (1965) A measurement of excess antenna temperature at 4080 Mc/s.
APJ 142:419–421. https://doi.org/10.1086/148307
44. Pogosian L, Vachaspati T (1999) Cosmic microwave background anisotropy from wiggly
strings. Phys Rev D 60:083504. https://doi.org/10.1103/PhysRevD.60.083504
45. Polchinski J (2005) Cosmic superstrings revisited. Int J Mod Phys A 20:3413–3415. https://
doi.org/10.1142/S0217751X05026686. [AIP Conf Proc 743,331(2005)]
46. Polchinski J (2007) String theory. Vol. 2: Superstring theory and beyond. Cambridge University
Press. ISBN 9780511252280, 9780521633048, 9780521672283
47. Pourtsidou A, Avgoustidis A, Copeland EJ, Pogosian L, Steer DA (2011) Scaling configurations
of cosmic superstring networks and their cosmological implications. Phys Rev D 83:063525.
https://doi.org/10.1103/PhysRevD.83.063525
48. Press WH, Ryden BS, Spergel DN (1989) Dynamical evolution of domain walls in an expanding
universe. Astrophys J 347:590–604. https://doi.org/10.1086/168151
49. Randall L, Sundrum R (1999) A large mass hierarchy from a small extra dimension. Phys Rev
Lett 83:3370–3373. https://doi.org/10.1103/PhysRevLett.83.3370
50. Ringeval C, Sakellariadou M, Bouchet F (2007) Cosmological evolution of cosmic string loops.
JCAP 0702:023. https://doi.org/10.1088/1475-7516/2007/02/023
51. Rybak IY, Avgoustidis A, Martins CJAP (2019) Dynamics of junctions and the multitension velocity-dependent one-scale model. Phys Rev D 99:063516. https://doi.org/10.1103/
PhysRevD.99.063516.
52. Saffin PM (2005) A practical model for cosmic (p, q) superstrings. JHEP 09:011. https://doi.
org/10.1088/1126-6708/2005/09/011
53. Sakellariadou M (2008) Production of topological defects at the end of inflation. Lect Notes
Phys 738:359–392. https://doi.org/10.1007/978-3-540-74353-8_10
54. Sarangi S, Tye SHH (2002) Cosmic string production towards the end of brane inflation. Phys
Lett B 536:185–192. https://doi.org/10.1016/S0370-2693(02)01824-5
55. Schwarz JH (1995) An sl(2, z) multiplet of type iib superstrings. Phys Lett B 360(1): 13–18.
ISSN 0370-2693. https://doi.org/10.1016/0370-2693(95)01138-G. https://www.sciencedirect.
com/science/article/pii/037026939501138G
56. Sen A (2000) Non-BPS d-branes in string theory. Class Quantum Gravity 17(5):1251–1256.
https://doi.org/10.1088/0264-9381/17/5/334.
57. Sikivie P (1982) Of Axions, domain walls and the early universe. Phys Rev Lett 48:1156–1159.
https://doi.org/10.1103/PhysRevLett.48.1156
58. Sousa L, Avelino PP (2011) The cosmological evolution of p-brane networks. Phys Rev D
84:063502. https://doi.org/10.1103/PhysRevD.84.063502
59. Sousa L, Avelino PP (2014) Stochastic gravitational wave background generated by cosmic
string networks: the small-loop regime. Phys Rev D 89(8):083503. https://doi.org/10.1103/
PhysRevD.89.083503
60. Sousa L, Avelino PP (2015) Cosmic microwave background anisotropies generated by domain
wall networks. Phys Rev D 92(8):083520. https://doi.org/10.1103/PhysRevD.92.083520
61. Sousa L, Avelino PP (2016) Probing cosmic superstrings with gravitational waves. Phys Rev
D 94(6):063529. https://doi.org/10.1103/PhysRevD.94.063529
62. Sousa L, Avelino PP, Guedes GSF (2020) Full analytical approximation to the stochastic gravitational wave background generated by cosmic string networks. Phys Rev D 101(10):103508.
https://doi.org/10.1103/PhysRevD.101.103508
63. Tong D (2009) String theory
64. Vilenkin A (1981) Gravitational radiation from cosmic strings. Phys Lett 107B:47–50. https://
doi.org/10.1016/0370-2693(81)91144-8
65. Vilenkin A, Shellard EPS (2000) Cosmic strings and other topological defects. Cambridge University Press. ISBN 9780521654760. http://www.cambridge.org/mw/academic/subjects/physics/theoretical-physics-and-mathematical-physics/cosmic-strings-and-other-topological-defects?format=PB
66. Wilson KG (1974) Confinement of quarks. Phys Rev D 10:2445–2459. https://doi.org/10.1103/
PhysRevD.10.2445.
67. Witten E (1985) Superconducting strings. Nucl Phys B 249:557–592. https://doi.org/10.1016/
0550-3213(85)90022-7
68. Witten E (1985) Cosmic superstrings. Phys Lett 153B:243–246. https://doi.org/10.1016/0370-2693(85)90540-4
69. Witten E (1998) D-branes and K theory. JHEP 12:019. https://doi.org/10.1088/1126-6708/
1998/12/019
70. Zwiebach B (2006) A first course in string theory. Cambridge University Press. ISBN 978-0521-83143-7, 978-0-511-20757-0
Chapter 3
Supercomputing with Graphics Processing Units
A very large part of space-time must be investigated, if reliable
results are to be obtained.
Alan Turing
3.1 An Introduction to High Performance Computing
For well over a decade, the empirical Moore's law (a doubling in transistor count should occur every 18–24 months, courtesy of refined manufacturing processes) has been upheld. Up until the early 2000s, the way to use these extra transistors was to create more complex processors, larger caches and much faster clock frequencies. However, this eventually proved unfeasible: simply increasing clock frequencies in an ever smaller package eventually resulted in power and heat limitations.
The way around this was to introduce parallelism in the processor itself, either through vector-based instructions, simultaneous multi-threading¹ or simple glue logic: just add a second independent (bar cache sharing) processor (called a “core”). However, while hardware multi-threading is managed by the Operating System, and vector instructions are compiler-managed, the onus of multi-core usage falls on the programmer. The type of programming needed to leverage multi-core architectures is commonly known as parallel programming and, while challenging, it is a necessary way to perform highly demanding (in computational time, for instance) scientific computing tasks.
¹ A thread can be defined as an instruction stream; in hardware-based multi-threading, two streams of different types of instructions can be executed simultaneously.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. R. C. C. C. Correira, A New Generation of Cosmic Superstring Simulations,
Springer Theses, https://doi.org/10.1007/978-3-031-20229-2_3
Since the study of cosmic defects outlined in the introduction and expanded upon in subsequent chapters does require leveraging extreme hardware resources, an exploration of High Performance Computing (essentially, parallel computing) is necessary. Note that since defect simulations (especially field theory simulations) are heavily bottlenecked, they end up limiting analytical and observational constraints both in current and in future observational studies. Improving current simulations will then allow more accurate descriptions of defect networks, and the reasons for this are manifold: increases in lattice size and resolution allow one to probe smaller and smaller sub-horizon scales and thus more properly resolve the small-scale structure of strings, and also to evolve the lattice for a longer amount of time (the final simulation time is normally given by the time when the horizon is half the box size).
As a consequence, a large part of the thesis was spent creating from scratch two defect evolution codes, one for global domain walls and the other for Abelian-Higgs cosmic strings, both able to exploit GPU accelerators. However, it is understandable that most readers of a thesis submitted for a doctoral degree in physics will not necessarily be interested in the computing details underpinning the codes used. As such, this entire chapter is an effort to self-contain all computing aspects of this manuscript, and we will leave new results in physics to later chapters. We will review some concepts in this area (both the architectures and the programming paradigms used) and then present a parallel implementation of the evolution of the simplest defect network: domain walls. With this said, we would like to stress that parallelism is not a silver bullet: even well-optimized codes can be bottlenecked, and achieving well-optimized code is already a difficult task.
The second part of this chapter will contain original content from the following
publications:
• Work published in Physical Review E, titled “General purpose graphics-processing-unit implementation of cosmological domain wall network evolution,” found at the following reference [15];
• Work published in Astronomy and Computing, titled “Abelian-Higgs cosmic
string network evolution with the Compute Unified Device Architecture.” This
publication can be found at the following reference [16];
• Work published in Astronomy and Computing, titled “Abelian-Higgs cosmic string
evolution with multiple GPUs.” This publication can be found at the following
reference [17].
which discuss benchmarks and behavior of the implementations.
The first part, which begins in the next subsection, contains a bird's-eye overview of Amdahl's and Gustafson's laws, architectures and programming paradigms, and a description of GPU accelerators. This is non-original content which would otherwise be located in the introduction, were it not for the fact that all computing content is relegated here.
3.1.1 Amdahl and Gustafson’s Laws
In order to clearly expose how parallelism is absolutely necessary to execute large simulations in a reasonable amount of time, we will explain two laws that underpin all parallel computing. The first is commonly called Amdahl's law, proposed in 1967 by computer scientist Gene Amdahl. It is a formula for predicting the theoretical maximum speed-up at a fixed workload as the number of processors increases. This law takes the following form:
S = \frac{1}{(1 - p) + \frac{p}{N}}    (3.1)
where S is the maximum theoretical speed-up, p is the proportion of the program that can be parallelized, and N is the number of processors. This law already states a limitation of parallelizing programs: as the number of processors increases, N → ∞, the maximum speed-up tends to S = 1/(1 − p), where 1 − p is the serial portion of the code. Therefore, speed-up is limited by the sequential part of the program.
Measured speed-up is obtained by computing,
S = \frac{t_1}{t_N}    (3.2)
where t1 is the serial execution time, and t N is the execution time with N processors.
Characterizing how this measured speed-up varies with the number of processors for
a fixed problem size is known as strong scaling.
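As a quick illustration of Eq. (3.1) (the 95% parallel fraction is purely an example), the snippet below shows the speed-up saturating at 1/(1 − p):

```python
def amdahl_speedup(p, n):
    """Maximum theoretical speed-up of Eq. (3.1):
    p = parallelizable fraction, n = number of processors."""
    return 1.0 / ((1.0 - p) + p / n)

# A 95%-parallel code is capped at 1 / 0.05 = 20x, no matter how many
# processors are thrown at it
for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Plotting measured speed-ups against this curve is precisely the strong-scaling test described above.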
However, we don’t always want to execute the same problem size faster: we often
wish to execute larger and larger problem sizes. Conversely for a small problem size,
it might be overkill to use a large amount of processors. In 1988 computer scientists
John Gustafson and Edwin Barsis presented Gustafson’s law. In this law, one still
predicts the maximum theoretical speed-up, however the problem size scales with
the number of processors. This “scaled” speed-up is then,
S=s+ p×N
(3.3)
where s is the serial fraction of the program (s = 1 − p), and p and N have the same meaning as in Amdahl's law. In this law the speed-up scales linearly with the number of processors. The problem size for each processor stays constant, and additional processors are basically used to solve a larger problem. Characterizing how the measured speed-up varies with the number of processors at a fixed problem size per processor is known as weak scaling. Weak scaling is often necessary for applications that are limited by the amount of memory.
Both will be demonstrated for the developed Multi-GPU Abelian-Higgs simulation
code later in the chapter.
3 Supercomputing with Graphics Processing Units
3.1.2 Architectures and Programming Paradigms
As mentioned in the previous section, getting the best possible performance often
requires mapping to hardware present in typical supercomputers. As an introduction
we will briefly explore two different types of programming models and how these are
suited for different hardware architectures. The good news is that supercomputers
are often hybrid machines: the basic building blocks are indeed shared memory
machines daisy-chained into a distributed platform. We begin with the most familiar
architecture: shared memory. Unless otherwise stated, the material in this section
and subsections within is based on the following materials: [4, 5, 13, 43].
3.1.2.1 Vector and Multithreaded Computing
According to Flynn's taxonomy, computer architectures can be classified by how
instruction streams are applied to data. One possible class comprises the execution
of a single instruction applied simultaneously to multiple data items. Most
contemporary processors support applying an instruction to multiple data streams,
for instance under vectorized instruction sets (as is the case of Intel CPUs with the
MMX, SSE and AVX instructions), even if they are not designed from the ground up
to execute everything this way. A specific variant of this classification is known as
Single Instruction Multiple Threads, in which threads are used to allow simultaneous
issuing of instructions to multiple data items. A thread is a stream of instructions
spawned from a single process, which shares memory with other threads also spawned
from the same process. We stress here an important distinction: even though a process
can be thought of as a stream of instructions as well, processes do not share memory,
and therefore explicit communications must be implemented. As a result, threaded
implementations are often memory-light when compared to distributed counterparts.
Architectures which support simultaneous execution of these threads and allow them
equal access to memory are also often denoted shared memory architectures. As a
simple analogy to explain how shared memory architecture works, think of a white
board in an office, where two office mates collaborate: both office mates are working
on the same problem and have access to the same data (barring small amounts of
private data) (Fig. 3.1).
The most familiar example of a SIMD machine comprises a Central Processing Unit
(CPU or processor) and a pool of memory, which allows vector instructions to be
applied to multiple data items. In addition, it allows a process to spawn multiple
threads, where in each thread an instruction can be executed simultaneously in the
cores available to it; this is known as Simultaneous Multi-Threading (SMT). While
threads have existed since before the advent of parallel computing (to allow for
concurrency), they became quite useful in this area, as mapping each thread to a
single core, with no migration to other cores, is an effective
way to parallelize applications. We can also think of an alternative classification
of computer architectures that maps well to threaded programming, often denoted a
shared memory architecture. In any such machine a common, shared pool of memory
is available to all threads of a given executing program.

Fig. 3.1 In the first image 3.1a, we show a die-shot of a 10-core Intel Ivy Bridge Xeon. In the
second, 3.1b, a schematic representation of the office analogy for shared memory architectures,
where office mates (threads) can read and write from the same board. Taken from [4]
There are also several programming paradigms that allow writing code specifically
optimized for these shared memory architectures: some are directive based (like
Open Multi Processing—OpenMP; Open Accelerators—OpenACC) and others
kernel based (Open Computing Language—OpenCL; Compute Unified Device
Architecture—CUDA), and, with the exception of CUDA, they are known to work
on both accelerators and traditional CPUs. We remark, however, that depending
on the problem at hand and on the hardware used, well-optimized code may be
easier to achieve in one paradigm than in another.2 They all share a couple of
things in common, essentially all the pitfalls of threaded programming: the need for
synchronization mechanisms to avoid threads interfering with each other (commonly
known as a "data race") and the fact that they are limited to the amount of memory
available in a single shared memory platform. There is a way to avoid being limited
to a single Processing Unit (be it a GPU or a CPU), and that is to marry Message-Passing
with the threaded approach. It is thus inevitable, in order to bypass this limitation,
to explore distributed memory architectures, covered two sections ahead.
For the purpose of this manuscript, and before we move on to distributed
architectures, we will dwell in more detail on another shared memory architecture,
built from the ground-up for parallel tasks (using both SIMD and SIMT): the
Graphical Processing Unit (GPU).
2 For instance, since OpenMP 4.5's support for GPUs is still in its infancy, it can be more difficult
to match the performance of a comparable CUDA-based code.
3.1.2.2 GPUs as Computing Accelerators
GPU’s, while originally only for graphical purposes, have, in recent years, also been
used for data-parallel scientific tasks. At its core this results from a simple statement:
GPU’s are designed from the ground-up to perform as well as possible in parallel
workloads, often using a number of features to maximize compute throughput.
This is extremely clear already at an instruction level, where the parallelism is
evident in the lanewise SIMD instructions all cards (AMD or Nvidia) employ, which
essentially uses threads to implement SIMD (Nvidia calls this Single Instruction
Multiple Threads). To really contrast this with a typical CPU, let’s compare how
GPU cores operate with how a traditional CPU core operates. A CUDA
core/Streaming Processor is an Arithmetic Logic Unit (ALU) which performs
operations on scalar integers or single-precision floating point values, and a collection
of such units is known as a Streaming Multiprocessor (Nvidia, since the Fermi
architecture) or a Compute Unit (AMD, ever since the premiere of the Graphics
Core Next architecture). The streaming multiprocessor (SM) contains 8 streaming
processors (SP). These SMs only get one instruction at a time, which means that the 8
SPs all execute the same instruction. We can think of such SMs as the GPU equivalent
of the CPU core. This is done through a warp (32 threads), where the 8 SPs spend
4 clock cycles executing a single instruction on multiple data (SIMD). In the AMD
case, each compute unit is composed of four SIMD units, each equipped with
a 16-lane wide vector Arithmetic Logic Unit (vALU). Due to this structure,
a group of 64 threads scheduled for execution in a compute unit over four cycles is
referred to as a "wavefront" (the 32-thread analogue in Nvidia cards, executed over
2 cycles, being the "warp"). But there is more: each SIMD unit also has an instruction
buffer for 10 wavefronts, each of which can be scheduled for execution. This means that,
at any given time, a Radeon R9 M395³ with 28 compute units can handle 71,680 active
threads.⁴ The obvious cost is single-threaded performance. So unlike a CPU, GPUs
have to rely heavily on thread-level parallelism.
In reality it’s not just the prior points that make GPU’s excel at scientific highly
parallel workloads, it’s also the fact that their design is heavily skewed to support
fast thread switching and a large number of high-bandwidth-low-latency pins are
dedicated to memory traffic. In other words, while some threads might be busy
executing a memory reading operation of sort, another group is executing at the same
time some floating point operation. A consequence of this is that GPU’s somewhat
encourage “oversubscription” of threads, again in marked contrast with a CPU, where
the ideal speed-up is limited by the number of cores.
The OpenCL memory model describes several types of memory: Global (which
on a graphics card corresponds to video memory), local (a fast-access cache on each
compute unit), constant (technically part of video memory as well, but constant) and
private (memory bound to each work-item/thread).
3 This exact card was used for development of the domain walls code, which will be shown in a
later section.
4 4 SIMD units × 10 wavefronts × 64 threads × 28 Compute Units.
3.1.2.3 Parallel Processing with Message-Passing
Typical supercomputers do not consist of a single very large CPU with a ridiculous
(10⁶, for instance) number of cores. In fact, the cost-effective solution is to
combine many networked nodes, each with their own processors and processes
which communicate through Message Passing. Each autonomous processor
operates on data resident in its own memory space. Keeping the earlier
classification in mind, these machines are an example of a hybrid distributed and shared
memory architecture, with multiple nodes (each with their own set of processing
elements, such as CPUs) connected via network interconnects. Returning again to
the whiteboard analogy: two people are working on the same problem but in different
offices; they each have their own copy of the data with no implicit sharing. But what if
one person needs some data that only the other individual has? The solution is explicit
communication: person A calls person B by telephone to send whatever data
is required and B accepts the data. We also note a curious detail: we have assumed that
each process would be at a different node, but this need not be the case. In fact it is
always possible to have several processes, each mapped to a core on a node, with each
process having its own memory space, and communication occurring within the node.
This sounds inefficient, and perhaps socially awkward from the perspective of the
analogy (two office-mates who each keep half of the whiteboard for themselves
and, to share insights, video call each other even though they are standing near one
another), but it is an alternative to thread parallelism. Eliminating the in-node
communication costs can be beneficial to performance, but requires application
developers to combine two programming paradigms (which is a difficult task).
Alternatively, some implementations of MPI (such as MPICH or OpenMPI) do
create a memory space which is shared between processes (Figs. 3.2 and 3.3).
To describe in more detail an example of such architectures, consider a real-life
example: the Piz Daint Cray XC50 supercomputer, housed at the Swiss Centre for
Scientific Computing in Lugano, Switzerland. It consists of several nodes that can
be grouped according to their characteristics: hybrid CPU/GPU nodes, each with one
12-core Intel Haswell Xeon (E5-2690 v3) and an Nvidia Tesla P100 accelerator;
multicore nodes with two 18-core Intel Xeons (E5-2695 v4); and login nodes (not to
be used for compute tasks) with 10-core Intel Haswell Xeons. Four compute nodes
are stacked in one compute blade and 16 of these form a chassis. Each cabinet can
then be populated by three chassis. Everything is connected in a dragonfly topology
(all-to-all) by means of Cray's proprietary high-speed network interconnect, Aries.
The full machine contains 5704 hybrid nodes, 1431 multicore nodes, and a grand
total of 387,872 cores with a theoretical peak performance of 27,154.3 TFlop/s
(ranking 12th in the Top500 supercomputer list as of November 2020). The main
reason we succinctly describe the CPU/GPU numbers is that this particular
supercomputer will be used later in the thesis (Fig. 3.4).
Having described the typical supercomputer architecture and the need for
Message-Passing, only one question remains to be answered: which programming
paradigm should we use here? The gold standard for Message-Passing is the
Message-Passing Interface (MPI), a paradigm which allows different types of
communication between processes (collective or point-to-point) and in which
synchronization is handled by the message passing itself (thus avoiding the data
races present in the threaded approach).

Fig. 3.2 Schematic summaries of the structure of 3rd generation Graphics Core Next cards
(Fig. 3.2a) and Pascal GP100 cards (Fig. 3.2b), showcasing the number of Compute
Units/Streaming Multi-Processors. Taken from [6] and [3], respectively

Fig. 3.3 The office analogy used to explain two processes in distinct nodes (Fig. 3.3a) and in the
same node (Fig. 3.3b). Taken from [4]

Fig. 3.4 The Cray XC50 compute blade (with four compute nodes), the building block of the Piz
Daint supercomputer in Fig. 3.4a. The Dragonfly topology (all-to-all) used at different hierarchical
ranks of Daint's network. Taken from [13]

To illustrate synchronization in message-passing,
let’s return to the “two offices analogy” where person A sends a letter to person B
and person A only knows when the letter was sent not when it was received (this
is an example of an asynchronous send). Person B waits for some precious data to
arrive in his mailbox, and a receive is given as completed as soon as the message
hits his inbox (this is a synchronous receive). In this case, no data in A can be
corrupted by B and vice-versa, there is in fact no way for A and B to access the each
others data without explicitly sending/receiving a message. MPI also offers another
communication mode: non-blocking or blocking (which can either be synchronous
or asynchronous), which can be implied if work is allowed to be done (non-blocking)
while the communication has yet to be completed (if no work is allowed, then the
operation is blocking). To put it simply synchronicity is related to completion of
56
3 Supercomputing with Graphics Processing Units
an operation, but it is not responsible for declaring when control is returned to the
program in question.
Having briefly described the architectures that characterize parallel machines and
the programming paradigms used to exploit them we now move on to describing
efforts to parallelize defect simulations on GPU accelerators.
3.2 Global Domain Walls
In this section we will begin to describe the first code that was ported to GPU
accelerators: the cosmic domain walls code of [11, 31, 32], capable not only of
evolving the fields such that a network of domain walls eventually appears and evolves,
but also of extracting "useful" quantities about the network. Let us define
what is meant by "useful" more explicitly.
As mentioned in the previous chapter, some cosmic defects are capable of
undergoing scaling evolution, where a characteristic scale (the defect separation) grows
linearly with time and the network on average has a constant velocity. Depending on
how this scale is related to the energy density, defects can be expected to overclose
the Universe (for example, domain walls; see however the next chapter for a way to
avoid this fate) or to neither disappear nor overclose it (such as in the case of cosmic
strings). We note that this behavior is crucial both for analytical studies and numerical
studies of defects, and with good reason. From the analytical point-of-view, the way
the network loses energy (and therefore sustains scaling) has direct implications on
analytical models of string evolution (which in turn has implications on observational
consequences of the network itself). And from the numerical side, given that most
simulations cannot last long enough to cover all of cosmological evolution, scaling is
the only way one can extrapolate a set of small short-lived simulations to the required
cosmological scales, by taking the appropriate observables and, for instance, using
them to calibrate analytical models (see next chapter). Scaling is also a way to validate
defect simulations—which will be required in this chapter.
For domain walls it can be shown, both from analytic arguments [30, 51] and via
simulations [31–33], that wall networks in a universe whose scale factor grows as a
power law, a ∝ t^m ∝ η^{m/(1−m)}, where m is the expansion rate, will reach this
attractor linear scaling solution after a sufficient amount of conformal time.5
Radiation and matter domination are examples of epochs where scaling is reached
in simulations, corresponding to m = 1/2 and m = 2/3, respectively.
In order to numerically characterize whether a walls simulation is in scaling we measure
and output two quantities of interest, the energy density ρ and the mean velocity
squared v², and check if they obey the following relations,

\[
\rho \propto \eta^{\mu}\,, \qquad \gamma v \propto \eta^{\nu}\,, \tag{3.4}
\]

5 I.e. a sufficiently large dynamical range.
3.2 Global Domain Walls
57
where μ = −1 and ν = 0. We then define a threshold for how much these exponents
may differ from their expected values and use it to infer whether the network is scaling.
Note that for simulations with a smaller dynamic range this asymptotic regime may not
be reached, as there is not enough time for the simulation to transition from the initial
conditions to the expected scaling behavior. This translates into a dependence
of the exponents μ and ν on the box size [19, 31].
At this point, we should also define how the two diagnostic quantities are computed
in walls simulations. The first is the energy density, or equivalently the comoving
wall area per unit volume. It is computed using the following robust method [47],

\[
\rho = \frac{A}{V} = \frac{1}{V}\int n\cdot dA = \frac{\Delta A}{V}\sum_{\rm links}\delta_{\pm}\,\frac{|\nabla\phi|}{|\phi_{,x}| + |\phi_{,y}| + |\phi_{,z}|}\,, \tag{3.5}
\]

where δ± is unity every time the field changes sign between two neighboring points
(such a pair is called a link, and indicates the possible presence of a wall) and zero
otherwise.
The other useful quantity to be computed is the root-mean-squared velocity v of
the network and the corresponding Lorentz factor. A possible method to compute the
velocity was demonstrated in [31]; it uses the ratio between the kinetic and potential
energies (E_k and V(φ)),

\[
(\gamma v)^2 = \frac{1}{2N}\sum_{\rm walls}\frac{E_k}{V(\phi)}\,, \tag{3.6}
\]

where γ = 1/√(1 − v²) is the Lorentz factor and the sum is over the N grid points
containing walls. The criterion used to identify walls can be changed, but the
standard version of this code identifies a point as part of a wall if the absolute value
of the scalar field φ does not exceed a certain threshold. For the standard values used
throughout this chapter, this threshold corresponds to 0.5.
3.2.1 Single Accelerator
3.2.1.1 Validation
This code uses the Open Computing Language (OpenCL) 1.2 framework by the
Khronos Consortium [36], and was created on a machine equipped with the following:
a Radeon R9 M395 graphics card (28 compute units clocked at 834 MHz, and
2048 MiB video memory clocked at 1365 MHz) and an Intel i5 6600k processor
(3.3 GHz core clock; can boost to 3.9 GHz) and 8192 MiB of system memory
(clocked at 1867 MHz). The non-vectorized sequential version of the code will run
on the aforementioned processor, mapped to core 0, and our implementation will run
on the graphics card.
Before we compare the performance benefits of this GPU-based implementation,
we must not forget that it must behave sufficiently closely to the CPU version in order
Table 3.1 Scaling exponents μ and ν (with 1σ statistical errors) for single and double precision
runs, calculated using the points beyond log(η) = 2.58, for both 2048² and 128³ simulations

2048²               μ                    ν
Single precision    −0.9381 ± 0.0003     −0.0374 ± 0.0005
Double precision    −0.9381 ± 0.0003     −0.0374 ± 0.0005

128³                μ                    ν
Single precision    −0.956 ± 0.003       −0.034 ± 0.006
Double precision    −0.905 ± 0.002       −0.025 ± 0.004
for it to be used for scientific purposes. In this section, we will validate the new code
in two ways: first from the expected scaling behavior and then by direct comparison
with the CPU version.
For the first step in validating the code, we use sets of five single and double
precision runs of 2048² and 128³ boxes to calculate the previously defined scaling
exponents of Eq. 3.4. We keep the same set of five fixed initial conditions for both
the CPU and GPU versions, and for both single and double precision.6 Our decision
to compare how single and double precision fare is simply a matter of performance
considerations versus getting the correct behavior out of the code. Note that we compile
our code in single precision with the flag -cl-fp32-correctly-rounded-divide-sqrt in
order to ensure correct (i.e. IEEE 754 standard compliant) rounding in division and
square root operations in OpenCL. This flag is not necessary in double precision.
We then take simulation outputs and calculate scaling exponents using a linear fit
to the later part of the simulation, as we are only interested in the achieved asymptotic
behavior. The computed exponents can be seen in Table 3.1, and are in agreement
with previous simulations of boxes of these sizes for the CPU version [19, 31]. We
can additionally impose criteria on the scaling exponents to ensure consistency with
scaling. Using the criteria of [32], one can see these exponents are consistent with the
expected behavior. The uncertainties shown are statistical, arising from the average
of each set of five runs. We note that additional systematic uncertainties in these
quantities are discussed in [33], but for the purposes of our comparison we do not
compute and include them. Figure 3.5 showcases the evolution of the density ρ and
the root-mean-squared velocity γv, illustrating both the initial period of the evolution
(where the simulation gradually “forgets” the initial conditions) and the later scaling
period.
As a final validation, we compare CPU and GPU evolution in both single and
double precision, timestep by timestep, in terms of the computed quantities. The
differences seem to be negligible after early timesteps, as the wall network starts
scaling. In fact, at scaling timesteps, they seem to be consistent with no difference
between the two implementations (to machine precision). This can be seen in Fig. 3.6.
At early timesteps, the differences are indeed large (especially in the single precision
6 The initial conditions are generated in single precision and used for both single and double
precision cases.
Fig. 3.5 Evolution of the density (ρ, left panels) and the velocity (γv, right panels), for 2048²
and 128³ box simulations (top and bottom panels, respectively), showcasing the expected scaling
behavior

Fig. 3.6 Relative error between sequential and parallel code implementations, with 2048² boxes
(top panels) and 128³ (bottom panels), for both single (blue) and double precision (orange), for the
wall density (left panels) and the velocity (right panels)
case, even reaching 10⁻²); however, these large differences seem to have no impact
at later timesteps.
3.2.1.2 Performance
Since we have just validated our implementation, it is now time to describe its
performance, both by describing the optimizations used and by benchmarking
the program. In OpenCL, applications are subdivided into data-parallel functions
named kernels, which are compiled at run-time (so-called Just-In-Time, JIT,
compilation). Every single step of the PRS algorithm corresponds to a kernel, and so
do the velocity and density calculations, with a separate kernel for the sums. These
kernels execute in order, one timestep at a time. In order to illustrate this, consider
the following simple example of a Laplacian kernel (which is basically a simplified
version of the first step in the PRS algorithm),
__kernel void Lapl(__global float *Lphi, __global float *P0,
                   const uint size_box) {
  // Indices
  const uint i = get_global_id(0);
  const uint j = get_global_id(1);
  const uint k = get_global_id(2);
  const uint id  = i + j * size_box + k * size_box * size_box;
  const uint ip1 = (i + 1) & (size_box - 1);
  const uint im1 = (i - 1 + size_box) & (size_box - 1);
  const uint jp1 = (j + 1) & (size_box - 1);
  const uint jm1 = (j - 1 + size_box) & (size_box - 1);
  const uint kp1 = (k + 1) & (size_box - 1);
  const uint km1 = (k - 1 + size_box) & (size_box - 1);

  // Laplacian (6-point stencil with periodic boundaries)
  Lphi[id] = -6.0f * P0[id]
           + P0[im1 + j * size_box + k * size_box * size_box]
           + P0[ip1 + j * size_box + k * size_box * size_box]
           + P0[i + jm1 * size_box + k * size_box * size_box]
           + P0[i + jp1 * size_box + k * size_box * size_box]
           + P0[i + j * size_box + km1 * size_box * size_box]
           + P0[i + j * size_box + kp1 * size_box * size_box];
}
which is launched by selecting the total number of threads and their subdivision into
groups of threads. The OpenCL compiler (and the underlying hardware) handles
the distribution of threads to each Compute Unit automatically, following our
recommendation of how to group threads. The fields are represented in memory as
buffers (linear, contiguous data), and the number of threads (work-items) spawned is,
in our case, always equal to the number of points in a box. Not all kernels are
optimized to use local memory. We will see in the Abelian-Higgs code (in CUDA)
how to do this, where tiled halos will be implemented. Examples of kernels where
tiled local memory could be used are the Laplacian kernel and the density kernel, the
latter being one of the most time-consuming kernels (25.81% of runtime). Two kernels
where we already employ local memory are the velocity kernel (where we
highlight the increased granularity of the atomic additions needed to count the number
of walls, as seen in [40]) and the partial sums kernel.
Speaking of the sum reduction kernel, we use the scalar version of the kernel in
[48]. The reason for not using the vector version (which uses vector data-types like
float4 where the scalar one uses float) is that the preferred vector width7 of the device
in question is, for both double and floating point types,
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT : 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT : 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE : 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE : 1
so it is equivalent to use either kernel. The sum reduction kernel computes a partial
sum for each local memory patch, and all partial sums are transferred back to the
host side, summed and written to disk. The only place where the CPU computes
something sequentially is in summing the partial sums which result from the calculation
of the velocity and the density.
In order to test a small optimization, we use two queues running asynchronously
with respect to each other, ensuring overlap between execution of compute kernels
and data transfer operations. In order to quantify if there is a data transfer bottleneck,
we first remark how the overlap between compute and data transfer works: one has
two different queues, one for data transfer, one for kernel execution, and using events
one triggers data transfer upon completion of the partial sums kernel. Unfortunately,
to allow for overlap, the enqueueing of data transfers needs to be non-blocking. After
enqueueing some kernels, it is important to wait for the data transfers to complete (to
ensure the sum of partial sums isn’t summing over garbage). Since the waiting time
will also include waiting for compute kernels to finish (again enqueueing kernels
is a non-blocking operation) we estimate the time taken by data transfer to roughly
correspond to the difference between waiting time and total kernel execution time.
Comparing to the runtime reveals that data transfer is only a bottleneck in low
resolution boxes.
From the roofline model (see Fig. 3.7), this implementation seems overall to have
low arithmetic intensity, and to be mostly compute bound (when taking local memory
bandwidth into account). With AMD's CodeXL, we report that all kernels have an
occupancy of 70%; the main bottleneck on the number of waves per SIMD unit seems
to be the number of scalar registers (96 are used, which corresponds to a score of 8/10;
below 81 would be ideal). The tool also shows that the implementation would highly
benefit from more local memory and more vector register usage (4-23 vector registers
are used, depending on the kernel).
We must now discuss one final detail: our code is compatible with both double and
single precision (as seen in the validation section). It should be noted that consumer
7 The OpenCL compiler automatically packs the preferred number of work-items or threads into
Single-Instruction-Multiple-Data lanes and thus takes advantage of the native vector width.
We mentioned previously that GPUs tend to pack instructions into vectors. Originally, for AMD at
least, it was preferable to pack everything into float4 vectors as well. However, the native width is
the number of elements a Vector Arithmetic Logic Unit can process at once.
Fig. 3.7 On the top left: an estimate of the time wasted in data transfer, or how good the overlap
between compute and data transfer is, for different box sizes. On the bottom left panel, a roofline
model for the 2D implementation. On the right-hand-side, we can see the relative speed-up of the
parallel version when compared to the sequential one, for both single (blue) and double (orange)
precision, for 2D (top) and 3D (bottom) simulations
facing graphics cards usually have much lower peak double precision operations per
second, and as such there is a severe speed penalty in utilizing double precision (for
AMD cards based on the Graphics Core Next architecture this varies between 1/2 and
1/16 of peak single precision operations per second [1, 2]). This expectation is
confirmed by our analysis, summarized in Fig. 3.7. We can also highlight how the
relative speed-up grows with box size until it reaches a certain plateau. This is
relatively easy to justify, also from the roofline: GPUs require a large number of
threads to hide the latency of operations, and with too small a box size there are too
few threads to do so. As the size increases, we start having enough threads; however,
eventually we hit another bottleneck (described above in the roofline discussion).
3.3 Abelian-Higgs Strings
We can now describe the Abelian-Higgs single-GPU implementation. We begin by
noting that there is a practical problem with the discretization presented earlier in the
first chapter, particularly when evolving the simulations at relatively large expansion
rates (m ≥ 0.9 at single precision): the divisions and multiplications by a² factors
can, at early timesteps, go beyond the range of one's precision and thus result in
field variables being evaluated to NaN. The first thing we do is therefore to modify the
equations as follows,
\[
(1+\delta)\,\Pi^{x,\eta+\frac{1}{2}} = (1-\delta)\,\Pi^{x,\eta-\frac{1}{2}} + \Delta\eta\left[D_j^- D_j^+\phi^{x,\eta} - \frac{\lambda_0}{2}\,a_\eta^{2\beta}\left(|\phi^{x,\eta}|^2 - \sigma^2\right)\phi^{x,\eta}\right] \tag{3.7}
\]

\[
(1+\omega)\,E_i^{x,\eta+\frac{1}{2}} = (1-\omega)\,E_i^{x,\eta-\frac{1}{2}} + \Delta\eta\left[-\partial_j^- F_{ij}^{x,\eta} + 2e_0^2\,a_\eta^{2\beta}\,\mathrm{Im}\left[\phi^* D_i^+\phi\right]^{x,\eta}\right] \tag{3.8}
\]

where

\[
\omega = \delta(1-\beta) \tag{3.9}
\]

\[
\delta = \frac{1}{2}\,\alpha\,\frac{d\ln a}{d\ln\eta}\,\frac{1}{\eta} = \frac{\alpha}{2}\,\frac{m}{(1-m)\,\eta}\,. \tag{3.10}
\]
As in the previous domain walls case, δ = (α/2)(d ln a/d ln η)(1/η) is responsible for
Hubble damping of the scalar field. The difference is that one also introduces an
unphysical term responsible for damping the gauge field, which is multiplied by
ω = (1 − β)δ. Note that when β = 1 this gauge damping vanishes (as expected in the
physical equations of motion).
This is a trick similar to what is employed in the discretization for walls of [44], since the scheme is now Crank-Nicolson at first order with respect to the time terms. Note that our earlier precision problem disappears if one selects constant comoving string thickness, in other words β = 0 (such that $a^{2\beta}$ is replaced by 1), and δ and ω are then computed directly from the expansion rate. We set α = 2.0, as this is the choice that yields the physical equations of motion in the continuum limit. Other choices of α could possibly be explored later.
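To make the role of the damping factors concrete, the update of Eqs. 3.7-3.10 can be sketched as follows. This is a minimal C++ illustration for a single scalar degree of freedom; the function names and the folding of the spatial terms into rhsPi/rhsE are our own conventions for this sketch, not the production CUDA kernels.

```cpp
#include <cmath>

// Hubble-damping factor of Eq. 3.10 (alpha and m as defined in the text).
double deltaFactor(double alpha, double m, double eta, double deta) {
    return 0.5 * alpha * (m / (1.0 - m)) * (deta / eta);
}

// Damped update of a conjugate momentum Pi (Eq. 3.7) and a gauge electric
// field component E (Eq. 3.8). `rhsPi` and `rhsE` stand for the spatial
// terms in square brackets (finite differences, potential, current), which
// are assumed to have been evaluated elsewhere.
void dampedStep(double& pi, double& e, double rhsPi, double rhsE,
                double alpha, double beta, double m,
                double eta, double deta) {
    double delta = deltaFactor(alpha, m, eta, deta);
    double omega = delta * (1.0 - beta);                 // Eq. 3.9
    pi = ((1.0 - delta) * pi + deta * rhsPi) / (1.0 + delta);
    e  = ((1.0 - omega) * e  + deta * rhsE ) / (1.0 + omega);
}
```

Note that for β = 0 (constant comoving width) one has ω = δ, so scalar and gauge sectors are damped identically, and no $a^{2\beta}$ factor ever enters the update.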
For the purpose of validating the simulation (and later calibrating semi-analytical models), there are two essential diagnostics to be extracted from the simulations: first a correlation length ξ, and second a root mean squared velocity ⟨v²⟩. Before describing how to compute these outputs in the simulations, we must begin by defining some relevant quantities. For starters, the Lagrangian density
$$\mathcal{L}_x = \frac{1}{2e_0^2 a_x^2}\sum_i E_i^2 - \frac{1}{4e_0^2 a_x^2}\sum_{i,j} F_{ij}^2 + |\Pi_x|^2 - \sum_i |D_i^+ \phi_x|^2 - a_x^2 V(\phi_x) = \mathcal{E} - \mathcal{B} + \mathcal{P} - \mathcal{D} - \mathcal{V}, \qquad (3.11)$$
where for convenience in the last line we have also introduced a simplified notation for each of its components. From here we can also define an energy density and a pressure,

$$\rho_x = \mathcal{E} + \mathcal{B} + \mathcal{P} + \mathcal{D} + \mathcal{V} \qquad (3.12)$$

$$p_x = \frac{\mathcal{E} + \mathcal{B}}{3} + \mathcal{P} - \frac{\mathcal{D}}{3} - \mathcal{V}. \qquad (3.13)$$
3 Supercomputing with Graphics Processing Units
From here we can already define two possible estimators for the correlation length ξ. Since $\xi = \sqrt{V/l}$ (with V and l respectively being the box volume and the total length of string it contains) we need only find the total length of string in the box. The first estimator makes use of the fact that the Lagrangian density should vanish away from the string, while being negatively valued at the string itself [8]; this leads to the definition

$$\xi_L = \sqrt{\frac{-\mu V}{\sum_x \mathcal{L}_x}}, \qquad (3.14)$$
which we will from now on refer to as the Lagrangian-based correlation length
estimator. The second option requires computing a gauge-invariant winding, as defined in [29], at each plaquette (lattice cell face),

$$W_{ij} = \frac{1}{2\pi}\left( Y_{i,x} + Y_{j,x+i} - Y_{i,x+j} - Y_{j,x} \right), \qquad (3.15)$$

where $Y_i$ is given by

$$Y_{i} = \left[ (\phi_x)_{\mathrm{arg}} - (\phi_{x+k_i})_{\mathrm{arg}} + A_{i,x} \right]_\pi - A_{i,x}. \qquad (3.16)$$
The presence of a string segment of length Δx piercing a plaquette is indicated by $W_{ij} \neq 0$. Obtaining the total string length is then a trivial matter of adding up the number of pierced plaquettes throughout the lattice,

$$\xi_W = \sqrt{\frac{V}{\Delta x \sum_{ij,x} |W_{ij,x}|}}, \qquad (3.17)$$

which results in the winding-based correlation length estimator. Given that we assume straight segments connecting cells (which then form collections of strings), this length estimator suffers from the taxi-cab geometry of the strings. To correct for the resulting overestimation of the length, we must multiply it by a factor of π/6, as seen in [49].
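A minimal sketch of the winding computation of Eqs. 3.15-3.16 for a single plaquette, written in plain C++ rather than as the production CUDA kernel; the corner ordering, array layout and helper names are hypothetical choices made for this illustration:

```cpp
#include <cmath>

// The [.]_pi operation of Eq. 3.16: wrap an angle into (-pi, pi].
double wrapPi(double x) {
    return std::remainder(x, 2.0 * M_PI);
}

// Link variable Y_i (Eq. 3.16) between two sites with scalar-field phases
// argA, argB and gauge link Ai.
double linkY(double argA, double argB, double Ai) {
    return wrapPi(argA - argB + Ai) - Ai;
}

// Gauge-invariant winding W_ij (Eq. 3.15) on one plaquette whose corner
// phases are arg[0..3] = {x, x+i, x+i+j, x+j}, with the i-links Ai[0..1]
// (bottom, top edge) and j-links Aj[0..1] (left, right edge).
double plaquetteWinding(const double arg[4],
                        const double Ai[2], const double Aj[2]) {
    double Y_i_x  = linkY(arg[0], arg[1], Ai[0]);  // link x   -> x+i
    double Y_j_xi = linkY(arg[1], arg[2], Aj[1]);  // link x+i -> x+i+j
    double Y_i_xj = linkY(arg[3], arg[2], Ai[1]);  // link x+j -> x+i+j
    double Y_j_x  = linkY(arg[0], arg[3], Aj[0]);  // link x   -> x+j
    return (Y_i_x + Y_j_xi - Y_i_xj - Y_j_x) / (2.0 * M_PI);
}
```

For vanishing gauge links, a plaquette whose corner phases advance by π/2 each yields |W| = 1, while slowly varying phases with no vortex yield W = 0.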
For the ⟨v²⟩ estimators we will use two possible choices. The first one comes from [26, 27] and is based on the fact that, for the conjugate scalar field momentum Π, the field configuration of a moving straight string can be given by Lorentz boosts of the static straight string ansatz. A detailed derivation can be found in [27]. For our purposes it is sufficient to simply quote the estimator itself,

$$\langle v^2 \rangle_\phi = \frac{2R}{1+R}, \qquad (3.18)$$

where R is given by

$$R = \frac{\sum_x |\Pi_x|^2\, \mathcal{W}_x}{\sum_{x,i} |D^+_{x,i}\phi|^2\, \mathcal{W}_x} \qquad (3.19)$$
and $\mathcal{W}$ is a weight function, meant merely to localize the estimators around strings. We will refer to this as the field-based velocity estimator. The second possibility is to use the equation of state estimator of [27], in which the box averages of density and pressure (each appropriately weighted by a weight function $\mathcal{W}$) yield

$$\langle v^2 \rangle_\omega = \frac{1}{2}\left( 1 + 3\,\frac{\sum_x p_x \mathcal{W}_x}{\sum_x \rho_x \mathcal{W}_x} \right); \qquad (3.20)$$

we will refer to this as the equation of state based velocity estimator. As for the choices of weight function, we will explore two possibilities: one from the literature, in which the Lagrangian has been used (see [20, 26]), and a second choice which corresponds to the potential of the scalar field V(φ).
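Both estimators reduce to simple functions of weighted box sums. A schematic C++ illustration (the sums are assumed to have been accumulated over the lattice beforehand; the function names are ours):

```cpp
#include <cmath>

// Field-based estimator of Eqs. 3.18-3.19, from the weighted sums of
// |Pi|^2 W (momentum) and |D+ phi|^2 W (gradient).
double vSqField(double sumMomentumW, double sumGradientW) {
    double R = sumMomentumW / sumGradientW;   // Eq. 3.19
    return 2.0 * R / (1.0 + R);               // Eq. 3.18
}

// Equation-of-state estimator of Eq. 3.20, from weighted box sums of the
// pressure p and density rho.
double vSqEoS(double sumPressureW, double sumDensityW) {
    return 0.5 * (1.0 + 3.0 * sumPressureW / sumDensityW);
}
```

As consistency checks, equal momentum and gradient sums (R = 1) give the ultra-relativistic limit ⟨v²⟩ = 1, and a static-string equation of state w = p/ρ = -1/3 gives ⟨v²⟩ = 0.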
Development, benchmarking and validation were all conducted on an NVIDIA Quadro P5000, with 2560 CUDA cores, a core clock of 1607 MHz and 16384 MiB of memory clocked at 1126 MHz, graciously donated by the Nvidia Corporation. Before we study the performance properties of our specific implementation, we have to validate it and cross-check it by comparison with literature results.
3.3.1 Single Accelerator

3.3.1.1 Validation
The very first check of this code is to verify that Gauss's law is obeyed to machine precision (either with the modified or the old discretization) at a lattice site. It matters not which specific lattice site, so we chose one exactly at the middle of the lattice. Both evolution schemes preserve Gauss's law to machine precision and the new scheme correctly reproduces the dynamics of the network. These two characteristics can be seen for a 256³ box size simulation (same initial conditions in both the old and new scheme) in Fig. 3.8, where the top panel shows Gauss's law violations at single precision, and the bottom panel the behavior of the winding based correlation length estimator ξ_W. We add that, at most, the relative difference between ξ_W in the old and new discretizations is 0.02%. Additionally, inspecting iso-surfaces of the scalar field provides visual confirmation that a network of strings is formed and evolves as expected; some examples can be seen in Fig. 3.9.
For the domain walls GPU code (previous section) we had a serial version of the simulation which had been previously tested and validated [32, 33, 44], and we could directly compare outputs. In the present strings case both the serial and parallel versions were completely new to us, so it is perhaps more useful to take an alternative approach. As such, the simulations were validated by evaluating the asymptotic scaling values and comparing them with results in the literature (which come from Lattice Abelian-Higgs, LAH). We have performed simulations in the two standard cosmological epochs, radiation and matter, wherein the scale factor evolves as a(η) ∝ η and a(η) ∝ η² respectively.
Fig. 3.8 The top panel shows the Gauss's law violation operator G_x at lattice site i, j, k = 0, 0, 0 at single precision for a box of size 256³, while the bottom panel shows the winding based correlation length estimator ξ_W for two simulations using the same initial conditions, with either the new or the old discretization, as described in the text. For this comparison we use the same parameters as in the rest of the paper: Δx = 0.5, Δη = 0.1, λ₀ = 2, e₀ = 1 and σ = 1
Qualitatively, scaling is clearly seen, as expected, in the plots of Fig. 3.10, where the evolution of the Lagrangian-based mean string separation ξ_L, the winding based mean string separation ξ_W and the mean velocity squared ⟨v²⟩ for 512³ runs are shown in the two aforementioned epochs. Quantitatively, a comparison of asymptotic quantities, the mean string separation slope and the average velocity squared, can be found in Table 3.2. These quantities are measured in the dynamic range in which the networks have reached scaling (we roughly use the last 10% of the dynamic range). In each case we obtain a statistical error from the average of 5 different runs, each with different random initial conditions.
As shown in the aforementioned table, comparing our results for the mean string separation slope with the values of ξ̇_L in [8] (at the same 512³ lattice size) we find excellent agreement for both matter and radiation era simulations. Our other length estimator, ξ̇_W, is also in excellent agreement with the results of the first, but in mild disagreement (about 1.5 standard deviations) with the value found in [9] (larger lattice size, 1024³). The discrepancy remains even when comparing with the larger lattice simulations of [20]. In [9] it is explained that this might be due to a period of early cooling being applied to the initial conditions in the works of [9, 20] and to an extended dynamic range, which jointly lead to a slow drift in the dξ/dη value (changing ξ̇ from 0.3 to about 0.28 at 1024³ and then to about 0.24 at 4096³).
We will later explore in depth how the degree of initial cooling can affect the evolution of the network (and how this is reflected in the calibration of a semi-analytical model), and how even with no cooling there is a slow drift of ξ/η with growing lattice size. The lack of cooling in the initial conditions is also evident in the oscillations present in the Lagrangian (field-based) mean string separation estimator ξ_L shown in Fig. 3.10. This evidently signals the presence of some radiation, caused by the high gradients present in the initial conditions, which is not dissipated
Fig. 3.9 Isosurfaces of the absolute value of the complex scalar field at the value 0.5, showing a network of Abelian-Higgs cosmic strings in the radiation and matter eras (left and right side panels respectively). All pictures are from simulations with box size 512³; the top panels correspond to timestep 60, while the bottom panels correspond to timestep 128
completely. Note that such radiation does not prevent scaling, neither in our case nor in the high-resolution domain walls simulations of [32, 33].
As for the velocities, the comparison is a bit more qualitative, since there are fewer measurements in the literature. The most recent work on field theory local string velocities is [27], which tabulates values obtained by extrapolating to infinitely thin strings, a process designed to enable comparison with Nambu-Goto velocities. We still present the extrapolated values in Table 3.2 (denoted ext.), but we also note
Fig. 3.10 The evolution of the mean string separation ξ_L (left panel) and the winding based mean string separation ξ_W (right panel) for 512³ runs, in the radiation era (m = 1/2, blue lines without core growth and green lines with core growth) and in the matter era (m = 2/3, red lines without core growth). The values of the mean string separation slopes ξ̇, inferred after the networks have reached scaling, are also given in the figure legends. These slopes are an average over the slopes of 5 different runs
that the more correct comparison is likely with the asymptotic values one can infer by visually reading the measured velocities from the top and bottom panels of Fig. 9 of [27]. These are denoted in our table as asy.
When it comes to velocities we also observe late-time scaling behaviour (constant velocity), as expected. This can be seen in Fig. 3.11. Note as well the large oscillations present at early times, as the network relaxes from the choice of initial conditions. Considering the asymptotic values, our velocity estimates are in reasonable agreement, the only exception being the potential-weighted estimator in the radiation epoch (note, however, that the difference is not statistically significant). In the matter era the comparison is also not direct, as more dynamic range is required to evolve the true equations of motion (with or without a period of initial core growth). As such, we directly compare our velocities in constant comoving width simulations to values obtained from the physical simulations of [27] for the matter epoch. This should not be a significant issue since, as mentioned in [27], PRS and physical simulations give similar velocities.
3.3.1.2 Performance
Given that we have just validated our implementation, it is now time to describe its performance: not only by describing how to port a cosmic string simulation to GPU accelerators, but also by benchmarking it and noting where it can be improved. This program utilizes an application programming interface named Compute Unified Device Architecture (CUDA, by the NVIDIA Corporation), similar to OpenCL but with some key differences: it can only target NVIDIA GPU accelerators, it supports C++14
Table 3.2 Numerical results for asymptotic scaling quantities ξ̇ (calculated using the Lagrangian or the winding estimator) and the three velocity estimators, for s = 0 and s = 1 (where applicable), from our simulations and from the literature. All quantities were measured in simulations with box sizes of 512³, except where otherwise noted. The ext. and asy. denote values that were extrapolated (rather than directly measured from the simulations) and inferred by visual inspection of Fig. 9 of [27]; see the main text for further discussion of these

s | Epoch     | ξ̇_W           | ξ̇_L           | ⟨v²⟩_V      | ⟨v²⟩_L      | ⟨v²⟩_ω             | Reference
1 | Radiation | –             | 0.33 ± 0.02   | –           | –           | –                  | [8]
1 | Radiation | –             | –             | –           | –           | 0.37 ± 0.01 (ext.) | [27]@4096³
1 | Radiation | –             | –             | –           | –           | 0.30 ± 0.01 (asy.) | [27]@4096³
1 | Radiation | 0.265 ± 0.005 | 0.254 ± 0.005 | –           | –           | –                  | [20]@4096³
1 | Radiation | 0.32 ± 0.03   | 0.32 ± 0.01   | 0.34 ± 0.01 | 0.31 ± 0.01 | 0.31 ± 0.01        | This work
0 | Radiation | –             | 0.31 ± 0.02   | –           | –           | –                  | [8]
0 | Radiation | 0.26 ± 0.02   | –             | –           | –           | –                  | [9]@1024³
0 | Radiation | 0.244 ± 0.005 | 0.234 ± 0.006 | –           | –           | –                  | [20]@4096³
0 | Radiation | 0.32 ± 0.03   | 0.30 ± 0.02   | 0.34 ± 0.01 | 0.31 ± 0.01 | 0.32 ± 0.01        | This work
1 | Matter    | –             | –             | –           | –           | 0.31 ± 0.01 (ext.) | [27]@4096³
1 | Matter    | –             | –             | –           | –           | 0.26 ± 0.01 (asy.) | [27]@4096³
1 | Matter    | 0.277 ± 0.008 | 0.261 ± 0.008 | –           | –           | –                  | [20]@4096³
0 | Matter    | –             | 0.30 ± 0.01   | –           | –           | –                  | [8]
0 | Matter    | 0.28 ± 0.01   | –             | –           | –           | –                  | [9]@1024³
0 | Matter    | 0.247 ± 0.008 | 0.235 ± 0.008 | –           | –           | –                  | [20]@4096³
0 | Matter    | 0.29 ± 0.02   | 0.29 ± 0.01   | 0.26 ± 0.01 | 0.27 ± 0.01 | 0.25 ± 0.01        | This work
Fig. 3.11 The evolution of the mean square velocity, estimated in three different ways: by using the estimator of Eq. 3.18, weighted by the potential ⟨v²⟩_V (top left panel) or the Lagrangian ⟨v²⟩_L (top right panel), or by using the equation of state parameter ⟨v²⟩_ω (bottom panel, see Eq. 3.20). In all cases the results are from 512³ runs, in the radiation era (m = 1/2, blue lines without core growth and green lines with core growth) and in the matter era (m = 2/3, red lines without core growth). The asymptotic values of the velocities, inferred after the networks have reached scaling, are also depicted
even at kernel level, it is proprietary, and it has a lot of support, both at the level of documentation and of performance analysis tools.
As previously mentioned, one of the roles of these interfaces (OpenCL, CUDA) is to abstract away details of the underlying hardware. Much like in OpenCL, even though threads maintain the organization of being grouped into thread blocks, we do not need to (in fact we cannot) assign groups of threads to specific Streaming Multiprocessors; this is done "automagically" without any intervention from us. It then becomes a matter of how many threads are spawned and how large each thread block is. In our case, we spawn a number of threads equal to N², where N is the size of the side of an N³ cubic lattice. Note that this is in contrast to what was done in the previous section for global domain walls; the reason for this will become apparent later.
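The launch-configuration arithmetic implied above can be made concrete with a small sketch (plain C++; the `gridFor` helper is a hypothetical name, and the (32, 4) tile used in the check is the thread block size adopted later in this section):

```cpp
// Hypothetical launch-configuration arithmetic for an N^3 lattice with one
// thread per (x, y) column; each thread then loops over z ("Z-cycling").
struct Dim { unsigned x, y; };

// Number of thread blocks needed to cover an N x N slab with tiles of
// size tileX x tileY (rounding up).
Dim gridFor(unsigned n, unsigned tileX, unsigned tileY) {
    return { (n + tileX - 1) / tileX, (n + tileY - 1) / tileY };
}
```

For N = 256 with a (32, 4) thread block this gives an 8 x 64 grid, i.e. 512 blocks of 128 threads, for a total of exactly N² = 65536 threads.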
CUDA, like OpenCL, is not directive based and instead splits applications into data-parallel functions named kernels. As was the case for domain walls, we will map kernels to the different pieces of the evolution equations and to the estimators of network-averaged quantities. For evolving the fields at every timestep, there are three kernels: the first one corresponds to Eq. 2.48, the second to Eq. 2.49 and the third one to Eqs. 2.50-2.51. These will be denoted update, updateE and updateφA respectively. And then there are the kernels for computing useful quantities such as the mean string separations or the velocities. All of these kernels implement finite differences and as such there is a real risk of becoming memory bound. In the present case we will go further than in the domain walls case and optimize memory loads even more. Doing so entails again the usual considerations of the memory hierarchy of a GPU, plus an optimization called "Z-cycling," which will be described in the next paragraph. As in OpenCL, the abstract memory model of CUDA describes several types of memory, and the ones relevant for the next paragraph include: global memory (which corresponds to video memory), shared memory (a fast on-chip memory available to groups of threads, known as thread blocks) and local memory (per-thread, even faster on-chip memory). We note here a small source of confusion: the equivalent of shared memory in CUDA is called local memory in OpenCL, and the equivalent of local memory in CUDA is called private memory in OpenCL (Fig. 3.12).
We will now describe the "Z-cycling" algorithm common to all kernels (except the kernel to update φ and A) used for Abelian-Higgs cosmic strings. The importance of such an algorithm for optimal finite differences was studied in [34, 37, 41, 53]. The goal is to move relevant field quantities from global memory, where the fields, declared as contiguous arrays of structures (float2 and float4), reside, into shared memory. However, instead of merely loading all of the quantities of the entire box, one begins by loading only the data residing at zero height, k = 0. This 2D slab is decomposed into different chunks of shared memory (oriented along X and Y) denominated tiles. One could naively expect these tiles to have size equal to the number of threads in a thread block (in the corresponding x and y directions); however, as seen from the equations of motion, every such tile would need data values from neighboring tiles, for example at i ± 1, j ± 1, k ± 1. Let us first address the values along x and y (i ± 1 and j ± 1). In order to contain the necessary data, and given that there is no
Fig. 3.12 Schematic representation of the stepA kernel (which updates the conjugate momentum Π): the tile in the middle represents a 2D shared memory tile where current values in the z-direction (site k) are loaded together with halos, and register values (blue pinheads) hold field values directly above and below (k − 1 and k + 1)
communication between thread blocks, one must pad these XY-tiles by 2 and load the appropriate boundary field values into these padding regions (commonly known as ghost cells or halos) before any computation can take place. Now we are only missing the values at k ± 1. For the first 2D slab, at k = 0, we load the field values at the next and previous z-direction positions, respectively k = 1 and k = N − 1 (periodic boundary conditions), into local memory. After this we can perform the necessary computation for this specific slab (say, for instance, using the scalar field and gauge field to update the conjugate scalar field). In order to proceed, one simply cycles upwards through k = 1, k = 2, ..., k = N − 1 and recycles whatever values were already loaded before, from local to shared memory. For example, in the case of the 2D slab at k = 1, one has to load new values from global memory into local memory at k = 2, but the values at k = 1 were previously loaded into local memory and can be moved to shared memory. The code listing below showcases this technique for computing the Laplacian (assuming gauge fields),
template<typename scalar, typename gauge>
__global__ void
__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
LaplacianZCycle(float ct, scalar *P0, scalar *Lapl, gauge *A,
                int xstart, int xend, int ystart, int yend, int zstart, int zend)
{
    const uint i = blockIdx.x * blockDim.x + threadIdx.x;
    const uint j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i >= xend)
        return;
    if (j >= yend)
        return;
    if (i < xstart)
        return;
    if (j < ystart)
        return;

    __shared__ scalar P0_s[TILE_Y + 2][TILE_X + 2];
    __shared__ gauge  A_s[TILE_Y + 2][TILE_X + 2];

    const uint si = threadIdx.x + 1;
    const uint sj = threadIdx.y + 1;

    // Hold current, nadir and zenith values in registers
    scalar P0_top;
    scalar P0_bot = P0(i, j, zstart - 1);
    scalar P0_cur = P0(i, j, zstart);
    gauge  A_bot  = A(i, j, zstart - 1);

    for (int k = zstart; k < zend; k++)
    {
        P0_top = P0(i, j, k + 1);

        P0_s[sj][si] = P0_cur;
        A_s[sj][si]  = A(i, j, k);

        // West/southmost halos for the X,Y tile
        if (threadIdx.x == 0)
        {
            P0_s[sj][si - 1] = P0(i - 1, j, k);
            A_s[sj][si - 1]  = A(i - 1, j, k);
        }
        if (threadIdx.y == 0)
        {
            P0_s[sj - 1][si] = P0(i, j - 1, k);
            A_s[sj - 1][si]  = A(i, j - 1, k);
        }
        // East/northmost halos for the X,Y tile
        if (threadIdx.x == blockDim.x - 1)
        {
            P0_s[sj][si + 1] = P0(i + 1, j, k);
            A_s[sj][si + 1]  = A(i + 1, j, k);
        }
        if (threadIdx.y == blockDim.y - 1)
        {
            P0_s[sj + 1][si] = P0(i, j + 1, k);
            A_s[sj + 1][si]  = A(i, j + 1, k);
        }
        __syncthreads();

        Lapl(i, j, k) = (-6.0f * P0_s[sj][si]
                         + P0_s[sj][si + 1] * Cexp(nI * A_s[sj][si].x)
                         + P0_s[sj + 1][si] * Cexp(nI * A_s[sj][si].y)
                         + P0_top           * Cexp(nI * A_s[sj][si].z)
                         + P0_s[sj][si - 1] * Cexp( I * A_s[sj][si - 1].x)
                         + P0_s[sj - 1][si] * Cexp( I * A_s[sj - 1][si].y)
                         + P0_bot           * Cexp( I * A_bot.z));

        // Recycle values for the next slab: current becomes nadir,
        // zenith becomes current
        P0_bot = P0_s[sj][si];
        A_bot  = A_s[sj][si];
        P0_cur = P0_top;
        __syncthreads();
    }
}
This has a much higher bandwidth than reading from global memory. Likewise, the previous shared memory tile at k = 0 can be loaded into the bottom local memory buffers. A schematic representation can be found in Fig. 3.12. The only slight variation on this technique concerns the kernel that updates the conjugate momentum of the gauge field: there, too much local memory would be used, and thus we opted to use shared memory again for both the field values above and below the current 2D slab.
Therein lies another advantage of first loading field values into shared and local memory and then using these cached values in the computations themselves, where many times only a specific component of a field is necessary. This poses an interesting question: should we load specific components from the fields in global memory, or first cache them in shared/local memory and then load the necessary component? The two
types of field variables (float2 and float4) are given by aligned vector types defined in CUDA. This means that reading a specific component of all field values from global memory would result in an un-coalesced (non-sequential) read, which has a heavy impact on performance. In the opposite approach, given that on Nvidia GPUs every 128 successive bytes can be loaded by a warp (32 threads) in one single transaction (to the point that a vector load instruction actually exists), one loads entire float2's and float4's from global memory into the faster caches available and then accesses specific components as necessary. This essentially moves the previous bottleneck to another type of memory, where the bandwidth is naturally higher and where the requirement of coalesced reads need not apply.
The kernel to update φ and A is the only straightforward kernel, since software pre-fetching cannot be implemented: we only need field values at positions i, j, k for both reading and writing. Since this kernel performs coalesced reads and writes, we can use it as an additional baseline for comparison.
Given that we have described the "Z-cycling" optimization and the thread block size to be used, it is now a matter of finding and characterizing the main bottlenecks. Fortunately, Nvidia's Visual Profiler makes this job a little easier than AMD CodeXL (with which we counted assembly instructions and then created the roofline as needed) by simply pointing out the main bottleneck for each kernel. The evolution kernels are all limited by the global memory bandwidth when reading field values. As such, the relevant performance metric here is the global memory read bandwidth and how it compares to the peak theoretical bandwidth of the test bench GPU. Such data can be found in Table 3.3, for box size 256³ in the radiation era with constant comoving width. This lattice size is chosen because one requires two things: a sufficiently large lattice such that the number of threads spawned is large enough to hide latency, and a small enough size in order to run each kernel quickly thousands of times. One can easily read from the table that we are sufficiently close to the peak read bandwidth of 288.5 GB/s.⁸ Note that we can also compare with the bandwidth of the updateφA kernel, since, as previously mentioned, it purely reads and writes in a coalesced fashion to memory. Interestingly, the kernel updateE hits a 2.6% larger bandwidth, while update is about 8.7% lower.
There is also one detail we have thus far not discussed. One must strike a careful balance between too large and too small a thread block size, both to ensure that the bottleneck is not the amount of shared/local memory used (too much would result in a less than ideal Streaming Multiprocessor occupancy) and to perform less granular global memory loads (in order to minimize the performance hit from loading the padding of each tile). The only rule-of-thumb in this case is to keep the thread block size a multiple of 32, given that instructions are issued in 32-thread warps. Apart from that, it is a matter of trial-and-error and of using the Nvidia Occupancy Calculator [52]. The best performance was obtained with a thread block size of (32, 4) at 256³ box size. Note that even though this results in an occupancy of around 50% for the kernels which use "Z-cycling," it is large enough for latency
⁸ This peak bandwidth is the one reported by the Visual Profiler; we do not perform additional measurements with custom kernels.
Table 3.3 The effective Global Load and Store bandwidth (in units of GB/s), the number of Floating Point Operations per second (in Teraflops) and the achieved occupancy, for a 256³ simulation in the radiation era and for constant comoving width

Kernel    | GLS (GB/s) | TFLOPs | Occupancy (%)
update    | 245.92     | 1.59   | 48.0
updateE   | 271.60     | 0.80   | 47.8
updateφA  | 264.83     | 0.04   | 90.8
VelRVLag  | 212.88     | 1.68   | 48.3
VelRWLag  | 217.09     | 1.67   | 48.3
VelEoSLag | 193.70     | 1.80   | 48.3
Winding   | 133.72     | 1.26   | 48.3
hiding such that the main bottleneck (as discussed in the previous paragraph) is global
memory read bandwidth.
Now we move on to the kernels for mean string separation and velocity estimation. Those that compute the Lagrangian, and therefore the mean string separation estimator derived from it, ξ_L, will also compute a velocity based on one of the three following estimators: ⟨v²⟩_φ weighted with the potential, the same velocity estimator weighted with the Lagrangian, or the equation of state estimator weighted by the Lagrangian, ⟨v²⟩_ω. These will be denoted throughout the manuscript as "VelRVLag," "VelRWLag" and "VelEoSLag" respectively. In addition, there will be one more kernel for computing the mean string separation from the winding at each lattice plaquette, denoted simply "Winding." All of these compute their necessary quantities for every thread and store the result in local memory, and subsequently a sum reduction is performed at thread block level by leveraging the CUDA Unbound library [39]. Since each block computes a partial sum (as in the global domain walls case), we must transfer the results back to the host/CPU side and sum them before writing to disk.
According to the Visual Profiler, the three velocity kernels end up being both compute and memory bound, and on account of this also have similar performance characteristics, as seen in Table 3.3. The memory limitations are explained by the over-use of local memory (this ends up being necessary for the partial sums). This was mitigated to some extent by turning on the compiler flag -Xptxas -dlcm=ca, which caches register spills in the L1 cache. However, improving the compute part is more challenging, as several of the compiler flags to either use hardware intrinsics or reduce precision can significantly alter the mean quantities computed, often changing the overall asymptotic values themselves or increasing uncertainties.
Since computing these mean quantities would otherwise dominate the overall runtime (in proportions similar to what happens in the walls case), we will apply another simple optimization, related to the fact that it is not necessary to compute them every timestep. As an example, and by default in the application, they are computed every n = 5 timesteps. This effectively reduces the time spent on diagnostic computation, as can be seen in Table 3.4. We also note that in a typical production run one will select
Table 3.4 Total elapsed times, in seconds, of the three evolution kernels plus the estimator kernels (which calculate averaged network quantities) for one 256³ and one 512³ run. The total time is computed by taking the average runtime and multiplying by the number of times a kernel is executed in a single run, i.e. Time × n_calls. The first three kernels are executed every timestep, while the others are executed only every five timesteps

Kernel    | 256³ | 512³
stepA     | 2.29 | 36.86
stepB     | 3.10 | 50.02
stepC     | 2.89 | 46.30
VelRVLag  | 0.57 | 8.38
VelRWLag  | 0.56 | 8.45
VelEoSLag | 0.64 | 8.88
Winding   | 0.87 | 11.92
only one of the velocity estimators and optionally the Winding estimator. The total run time then depends on the choice of diagnostic output and on the frequency of said output.
We additionally remark that the time spent on Input/Output operations (by which for now we mean transferring the partial sums, computing the total sum and writing said output) also benefits from this optimization. While in the domain walls case we considered diagnostic outputs every timestep, and then additionally added the compute-data-transfer overlap, here the reduced frequency of outputs greatly diminishes the need for such overlap. This is just as well, since such an overlap would generically complicate our life when writing the multi-GPU generalization, where compute-communication overlaps will be needed.
Given that we have described the application's performance in terms of bandwidth and runtime, we now need to compare to a baseline CPU implementation, in order to infer if there really is a performance advantage to the use of GPUs. First we can make a ball-park estimate of the speed-up, based merely on the typical memory bandwidth figures of current memory available to high-performance computing multicore CPUs and of the video memory on GPUs. Based on the figures presented in [38], one can possibly stipulate a speed-up of an order of magnitude for bandwidth and compute-bound applications. This ballpark estimate also rests on the assumption that both applications are parallelized and optimized (and will reach close to peak bandwidth/throughput). In the case of walls, considered in the previous section, a speed-up of about two orders of magnitude was obtained when comparing to a single-threaded implementation. For Abelian-Higgs cosmic strings we don't have a CPU code to compare to, but it is reasonable to estimate a speed-up of about an order of magnitude.
We can however compare to the simulations of [8, 27], also known as Lattice Abelian-Higgs (LAH), with benchmarks kindly provided via private communication [25]. The benchmarks provided are expressed as time to full evolution multiplied by the number of processors per number of sites (in units of gpu-sec/site vs core-sec/site), for the evolution update and for generating and writing windings.
Table 3.5 The performance of each of our kernels, given in gpu-sec/site. Note that in order to compare with the LAH performance, provided by [25], we present the performance of all three update kernels together (stepA+B+C, computed by summing the time for each update kernel from Table 3.4 and then dividing by the number of sites). These numbers can be obtained from the times at 512³ in Table 3.4 by dividing by the number of calls of each kernel in a run (1280 for steps A, B and C, 256 for estimators) and by the size of the lattice, 512³

Kernel    | Performance GPU-AH (gpu-sec/site at 512³) | Performance LAH (core-sec/site at 4096³)
stepA+B+C | 7.75 · 10⁻¹⁰                              | 8 · 10⁻⁷
VelRVLag  | 2.43 · 10⁻¹⁰                              | Not available
VelRWLag  | 2.46 · 10⁻¹⁰                              | Not available
VelEoSLag | 2.58 · 10⁻¹⁰                              | Not available
Winding   | 3.47 · 10⁻¹⁰                              | 1.3 · 10⁻⁶
The LAH timings are for a single run on the Monte Rosa supercomputer at a 4096³ lattice size with 32768 cores. Before we perform such a comparison we must note a caveat: [25] remarked that their simulation is not highly optimised for the target architecture (multi-core CPU). In addition, while we were provided the timings for generating and writing windings to disk (where it was mentioned that most time is spent writing to disk), in our case we output only the average quantity and not the full winding output. In a later section, we will revisit this benchmark and compare with an in-situ visualization approach (where the winding output is indeed written to disk); until then, the two winding figures are not strictly comparable. For the evolution updates (analogously Update + UpdateE + UpdateφA) over 10903 timesteps at 4096³ lattice size, and for the winding compute and write (1300 timesteps), LAH has the respective performance figures 8 · 10⁻⁷ core-sec/site and 1.3 · 10⁻⁶ core-sec/site. A 512³ run from start to finish (1280 timesteps) reveals a performance of about 7.75 · 10⁻¹⁰ gpu-sec/site for updating all fields. Our code thus spends three orders of magnitude less time updating the fields on any given lattice site. We present the figures for all other kernels in the rest of Table 3.5. It is often said that GPU cores are drastically slower (or more correctly, have lower throughput) than CPU cores. It is thus curious that the table seems to imply that they are only about 2.5 times slower. Speculatively, this might be due to the different levels of optimization of the two codes.
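The per-site figures in Table 3.5 follow from a simple normalization: total kernel time divided by the number of kernel launches and by the lattice volume. A minimal sketch, where the 133.0 s total time is illustrative only (the real per-kernel timings come from Table 3.4):

```python
# Per-site kernel cost: total kernel time in a 512^3 run, divided by the
# number of kernel launches and by the lattice volume. The 133.0 s total
# below is a hypothetical value chosen to land near the Table 3.5 figure.
def per_site(total_time_s, n_calls, n=512):
    return total_time_s / (n_calls * n**3)

print(f"{per_site(133.0, 1280):.2e} gpu-sec/site")  # ~7.74e-10
```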
3.3.2 Multiple Accelerators
Having described the single-GPU implementation, we will now move on to the next step of adapting our code to modern supercomputing facilities: introducing Message Passing to enable the exploitation of a large number of GPU accelerators to evolve fields on a lattice. We will again take the same methodology of validating and
3 Supercomputing with Graphics Processing Units
describing the performance for this multi-GPU implementation as done previously for the global domain walls and the single-GPU Abelian-Higgs cases.
3.3.2.1 Validation
We begin by verifying that the code behaves exactly as expected, according to the literature. As previously done for the single-GPU case and for the global domain walls case, this will involve comparing the measured mean network properties with those found in the literature—for both the rate of change of the mean string separation and the asymptotic mean velocity squared. We will perform such a comparison at different box sizes, including the single-GPU result at 512³ lattices, and 1024³, 2048³ and 4096³ for an average of five runs in the matter and radiation eras. Note that all simulation parameters (lattice spacing, timestep size and coupling constants) are exactly as they were in the previous section, with constant comoving width (PRS) enabled.
The results of this comparison can be found in Table 3.6, with the corresponding figures of our runs in Fig. 3.13. Overall, our code is in agreement with literature results, within one-sigma uncertainty. In more detail, we can describe each estimator and how it compares. For the mean string separation, there seems to be a dependency on the lattice size, and therefore it is not correct to compare values for different lattice sizes. Given that our simulations are in agreement with reference literature values at each lattice size, this slow drift is also present in our work. Whether this slow drift is due to the lattice size (and therefore the resolution of small scales) affecting the energy loss mechanisms of the network is something we will explore in the next chapter. We also note that the two different mean string separation estimators (based on the winding and on the Lagrangian density) lead to fully consistent values for the slopes.
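The quoted slopes are obtained by fitting a straight line to the mean string separation over the scaling range of each run. A minimal least-squares sketch on synthetic data (not simulation output):

```python
# Least-squares slope of xi(eta): for a scaling network, xi grows linearly
# with conformal time, and the fitted slope is the quoted rate of change.
def slope(eta, xi):
    n = len(eta)
    me, mx = sum(eta) / n, sum(xi) / n
    num = sum((e - me) * (x - mx) for e, x in zip(eta, xi))
    den = sum((e - me) ** 2 for e in eta)
    return num / den

eta = [100.5 + 5.0 * i for i in range(32)]  # a fit range, as for 1024^3
xi = [0.25 * e + 3.0 for e in eta]          # synthetic network with slope 0.25
print(round(slope(eta, xi), 3))  # → 0.25
```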
As for the average velocity squared, our previous work [16, 18] using the estimators of [27] had already established qualitative agreement with the values in the literature, up to and including 4096³ simulations, and this agreement continues here: all values are consistent within one-sigma uncertainties, both with the aforementioned literature reference and with the values reported in the previous section. Note that in the case of the velocities there is no statistically significant drift in the scaling value as a function of the box size. On the other hand, and in agreement both with [27] and with our earlier 512³ study, our present analysis confirms that the velocity estimator based on the gradient of φ leads to values that are consistently lower than those of the equation of state estimator, by about ten per cent at all box sizes (Fig. 3.14).
Table 3.6 The asymptotic rate of change of the mean string separation ξ̇ and the mean velocity squared v² for the estimators defined in the text, in the radiation and matter eras, for our simulations with box sizes of 4096³, 2048³ and 1024³, using 4096, 512 and 64 GPUs respectively. The error bars are the statistical uncertainties from averages of 20 runs with different initial conditions. For comparison we show the results reported in [16] from the single-GPU code (for averages of 12 512³ simulations) as well as results from simulations with CPU-based codes. The range of timesteps used for each fit to the GPU simulations is respectively [517, 1023.5], [300.5, 511.5], [100.5, 255.5], [80, 128] for the 4096³, 2048³, 1024³ and 512³ simulations

Size  | m   | ξ̇_L           | ξ̇_W           | v²_ω          | v²_φ          | Reference
1024³ | 1/2 | 0.280 ± 0.023 | 0.282 ± 0.026 | 0.306 ± 0.003 | 0.272 ± 0.002 | This section
2048³ | 1/2 | 0.268 ± 0.011 | 0.267 ± 0.010 | 0.312 ± 0.001 | 0.283 ± 0.001 | This section
4096³ | 1/2 | 0.253 ± 0.007 | 0.251 ± 0.006 | 0.308 ± 0.002 | 0.282 ± 0.001 | This section
512³  | 1/2 | 0.30 ± 0.02   | 0.32 ± 0.03   | 0.32 ± 0.01   | 0.31 ± 0.01   | Previous section
512³  | 1/2 | 0.31 ± 0.02   | –             | –             | –             | [8]
1024³ | 1/2 | –             | 0.26 ± 0.02   | –             | –             | [9]
4096³ | 1/2 | 0.234 ± 0.006 | 0.244 ± 0.005 | –             | –             | [20]
1024³ | 2/3 | 0.279 ± 0.016 | 0.285 ± 0.017 | 0.255 ± 0.003 | 0.228 ± 0.004 | This section
2048³ | 2/3 | 0.256 ± 0.006 | 0.257 ± 0.005 | 0.264 ± 0.001 | 0.240 ± 0.001 | This section
4096³ | 2/3 | 0.252 ± 0.010 | 0.250 ± 0.009 | 0.265 ± 0.001 | 0.243 ± 0.001 | This section
512³  | 2/3 | 0.29 ± 0.01   | 0.29 ± 0.02   | 0.27 ± 0.01   | 0.25 ± 0.01   | Previous section
512³  | 2/3 | 0.30 ± 0.01   | –             | –             | –             | [8]
1024³ | 2/3 | –             | 0.28 ± 0.01   | –             | –             | [9]
4096³ | 2/3 | 0.235 ± 0.008 | 0.247 ± 0.008 | –             | –             | [20]
Fig. 3.13 The evolution of the four relevant average network estimators, defined in Eqs. 3.14, 3.17, 3.18 and 3.20, for the average of 20 runs in the radiation-dominated epoch (m = 1/2), with lattice sizes of 4096³, 2048³ and 1024³, using 4096, 512 and 64 GPUs respectively. We assume constant co-moving width throughout
3.3.2.2 Performance
Now that the multi-GPU implementation has been validated, we turn our attention to describing its performance. As previously mentioned, the standard way to subdivide a domain across multiple processing elements (which in our case are GPU accelerators) and communicate boundary terms is to use the Message Passing Interface (MPI). This is necessary, for example, to compute the conjugate scalar and gauge fields Π and E, and for the computation of the network average quantities. Since MPI is designed to be used not only across several cores of a specific processor but also across several nodes in a network, this ensures the code works not only on a multi-GPU workstation but also on a large supercomputing architecture. We will throughout this work assume a 3D domain decomposition (the most scalable choice in terms of communications). In practice this means each sub-domain will be extended by 2 along each direction in order to store boundary values from nearby sub-domains. The typical dimension of each sub-domain will then be (NX + 2) × (NY + 2) × (NZ + 2) and the full lattice size will be NX·Nprocs_x × NY·Nprocs_y × NZ·Nprocs_z, where Nprocs_i indicates the number of processes along a given direction.
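As a concrete check of this bookkeeping (a sketch, assuming the "+2" above means a halo of width one cell on each side):

```python
# Local storage per sub-domain (interior plus one-cell halo on each side)
# and the global lattice that the process grid tiles.
def extents(nx, ny, nz, procs):
    px, py, pz = procs
    local = (nx + 2, ny + 2, nz + 2)   # storage including boundary cells
    full = (nx * px, ny * py, nz * pz)  # global lattice size
    return local, full

local, full = extents(256, 256, 256, (16, 16, 16))
print(local, full)  # → (258, 258, 258) (4096, 4096, 4096)
```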
Fig. 3.14 Same as Fig. 3.13, for the matter-dominated epoch (m = 2/3)
Given that boundary terms are to be communicated to and from neighbouring sub-domains, we additionally need to create CUDA kernels to extract field values (from the outer core of each sub-domain) into additional buffers, which are sent via Isend (from the MPI standard). Similarly, we also require unpacking kernels which take the received values and write them to the boundary of each sub-domain. The receive instruction is correspondingly Irecv. These are both non-blocking instructions, which, as mentioned in the standard good practices [43], allow MPI to decide the best pattern of communication. To ensure completion of communication before unpacking, a barrier is necessary, and therefore the instruction WaitAll is used. In order to both make our life easier and comply with standard good practices (see again [43]), we store these communication buffers in Unified Memory and allow Remote Direct Memory Access. This means these buffers are resident on the GPU, but accessible by the host side after appropriate data movement (and amenable to CUDA-aware MPI). There is an additional challenge here, which comes from the fact that CUDA kernels are non-blocking with respect to the host: after the host side launches a kernel on a GPU, it can immediately execute other instructions without waiting for the completion of the kernel on the GPU. This could cause issues if one sends a communication buffer before it has been updated by the GPU. The correct way out of this problem is
to use the cudaStreamSynchronize function at the appropriate steps, to ensure that a (un)packing kernel and all kernels before it have completed.
We note an additional "detail" about the boundary conditions of each sub-domain. It is necessary (as can be seen from the computation of the backwards derivative ∂−Fij and the winding) to ensure that diagonal terms of the gauge field Ai components are also available at boundaries. The way to handle this correctly is to use the "diagonal" trick, which uses the values previously exchanged in a given direction to update the new corners necessary in the next direction. This establishes a dependency of exchanges in one direction on the previous exchange, and therefore imposes that communications must proceed in order. In other words, a typical communication step will look like the following series of steps:
1. Pack the values to be sent to neighbors along X (outer part of the inner NX × NY × NZ part of the domain);
2. Send packed buffers to neighboring sub-domains;
3. Unpack received values into boundaries in the X direction;
4. Pack the values from the outer cells of the inner NX × NY × NZ, along with values received from the previous exchange (to ensure corners are appropriately handled), to be sent to neighbors in the Y direction;
5. Exchange packed buffers in the Y direction;
6. Unpack received buffers into the boundaries of the sub-domain;
7. Pack the values from not only the inner core of the sub-domain but also from the two previous exchanges;
8. Exchange packed buffers in the Z direction;
9. Unpack received buffers into boundaries.
This "diagonal trick" is schematically represented in Fig. 3.15 for the 2D case. Note how the field values from the inner core (blue) are packed into the red buffers, an exchange occurs along X, and then these same values are used to pack field values along Y (red buffers). Generalizing this to 3D implies that the Z-buffers are extended by two along both the X and Y directions, and red and green buffers are then used to update the last halos atop and below each sub-domain. The attentive reader will also notice the presence of magenta and orange boxes; these will be explained below.
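The ordering logic can be demonstrated with a toy, single-process stand-in for the exchange: one periodic sub-domain exchanging with itself, halo width 1, in pure Python (the real code packs these buffers with CUDA kernels and ships them via MPI Isend/Irecv). The key point is that each exchange widens its index range using the halos filled by the previous one, so corners are filled without dedicated diagonal messages:

```python
# Toy stand-in for the ordered halo exchange on an interior N^3 lattice.
N = 4

def f(x, y, z):
    """Arbitrary field values on the global periodic lattice."""
    return 100 * x + 10 * y + z

def wrap(i):
    """Periodic image of halo index i back into the interior 1..N."""
    return (i - 1) % N + 1

# Allocate (N+2)^3 storage and fill only the interior.
g = [[[None] * (N + 2) for _ in range(N + 2)] for _ in range(N + 2)]
for x in range(1, N + 1):
    for y in range(1, N + 1):
        for z in range(1, N + 1):
            g[x][y][z] = f(x - 1, y - 1, z - 1)

# Steps 1-3: X exchange covers interior y, z only.
for y in range(1, N + 1):
    for z in range(1, N + 1):
        for h in (0, N + 1):
            g[h][y][z] = g[wrap(h)][y][z]

# Steps 4-6: Y exchange widens x to include the halos just filled along X.
for x in range(0, N + 2):
    for z in range(1, N + 1):
        for h in (0, N + 1):
            g[x][h][z] = g[x][wrap(h)][z]

# Steps 7-9: Z exchange widens both x and y, completing every corner.
for x in range(0, N + 2):
    for y in range(0, N + 2):
        for h in (0, N + 1):
            g[x][y][h] = g[x][y][wrap(h)]

# Corners now hold the periodic wrap without any diagonal messages.
assert g[0][0][0] == f(N - 1, N - 1, N - 1)
assert g[N + 1][0][N + 1] == f(0, N - 1, 0)
print("all corners consistent")
```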
Having completed all communication steps, one already has the necessary ingredients for computing the conjugate momenta fields Π and E, and then, without any additional communication, for updating φ and A. This would already allow one to simulate with multiple GPUs. However, in order to simulate large lattices, we still need to achieve near-perfect weak scaling for almost the full machine. This can be achieved through compute-communication overlap: updating the fields in the inner core of each sub-domain while one starts to collect field values for the exchange buffers along the X direction. Note that the outermost points of this inner core will require values from the boundaries—which are still being communicated. This means that computation cannot occur for the inner core of size NX × NY × NZ but must instead occur for an inner core of size (NX − 2) × (NY − 2) × (NZ − 2). After the exchange in the X direction is
Fig. 3.15 The packing procedure along two different directions. Blue represents the core of each domain (of size NX × NY × NZ), red represents the buffers being filled with appropriate values to send to neighboring sub-domains and green represents an already received buffer. In the left panel, the buffer values come only from the blue inner core. After communication has taken place in this first direction, one can unpack the received buffer into the boundaries of the sub-domain. Once done, one can start packing the communication buffers for the next direction. This involves using not only the blue inner core but also the freshly unpacked boundaries (in green). The magenta boxes indicate domain areas where one updates the fields E and φ either as the packing procedure begins (left and middle panels) or after all communication has taken place (right panel), whereas orange indicates these areas have already been updated
completed, we can then proceed with updating the outer part of the inner core, since the necessary boundary is now available, while communication proceeds along the Y direction. This continues until all halos are updated and all field values of the conjugate momenta throughout the sub-domain are updated. Returning to the schematic view of Fig. 3.15, the inner areas which are updated (compute) are highlighted in magenta at each step, while the already updated areas are depicted in orange.
There is an additional ingredient to allowing multiple CUDA kernels to overlap as well, and this is to allow them to execute in separate CUDA streams, asynchronous to each other. There will be one stream per pack/unpack kernel pair (for the exchanges along a given direction) and one for the kernels that perform the field updates. Finally, given the interdependencies of exchanges in different directions, one must also ensure these are respected/enforced. The way to do this is to use a combination of cudaEventRecord (signaling the completion of a kernel) and cudaStreamWaitEvent (which awaits the completion of an event in a different stream). After the completion of all communication operations and of the Π and E field updates, we can proceed to update φ and A.
Now that we have described the general way in which multiple GPU accelerators are to be used, we will describe the code's scalability. All benchmarking was conducted on the Piz Daint supercomputer, described in the previous section. All benchmarks assume the evolution of fields as previously mentioned and, in order to mimic a real production run, we additionally compute the Lagrangian-based mean string separation and the mean velocity squared estimated from v²_φ. These network average quantities are computed every five timesteps. Simulation parameters are as before: λ = 2.0, e = 1.0, Δx = 0.5, Δt = 0.2Δx, η₀ = 1.0 and η_final = 0.5·N·Δx, where N is the lattice size of an N³ lattice.
Before we characterize the application in terms of performance metrics, it should be noted which one of these is more relevant and was thus the target of our optimization efforts. While both strong and weak scaling can be relevant, weak scaling is the one we pursue, as increases in lattice size (and consequently dynamic range) allow us to probe a larger range of scales between the string width and the size of the horizon. Since one cannot simulate all of cosmological history in a single simulation (due to obvious memory limitations), one often needs to extrapolate from smaller simulations (with the aid of semi-analytical modelling, as described in the next section for instance). Strong scaling would be much more critical if the wall-clock times (to be presented next) were larger. We do however remark that strong scaling still has importance in terms of finding the most efficient (in terms of node-hour usage) configuration for a given run of a given lattice size.
Since in string simulations the final conformal time is directly proportional to the size of the side of the simulation lattice, we need some way to quantify weak scaling—either normalizing by the time per timestep (total run time divided by the number of timesteps) or fixing the number of timesteps we evolve the box for. We measure only the amount of time taken to evolve fields on a lattice for 630 timesteps—the number of timesteps necessary to evolve a 256³ lattice (with Δx = 0.5) from the initial time to its final conformal time. Additionally, we will also use some of the times measured for strong scaling to compute a derived weak scaling metric, based on the time to evolve a lattice of size 512³ for 1270 timesteps.
For both types of scaling we describe them using a speed-up factor S, whose definition can be found earlier in this chapter, and a parallel efficiency E. It is worth mentioning that we compare to a reference wall-clock time t_ref which, for weak scaling, corresponds to the time taken to evolve the smallest overall domain size on a single GPU; for strong scaling it has a different definition, corresponding to the time necessary to fully evolve a lattice of size N³ on the smallest possible number of GPUs in which all field variables for the full lattice fit. The efficiency for weak scaling is trivially the speed-up converted to a percentage. For strong scaling, however, we re-scale the definition with the number of GPUs of the reference run,

E_strong = (n_ref · t_ref) / (n · t_n).    (3.21)
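Applied to one row of Table 3.7 (the 1024³ lattice, whose reference run uses 8 GPUs), these definitions give:

```python
def speedup(t_ref, t_n):
    return t_ref / t_n

def strong_efficiency(n_ref, t_ref, n, t_n):
    # Eq. 3.21: the reference GPU count n_ref rescales the efficiency.
    return n_ref * t_ref / (n * t_n)

# 1024^3 lattice: 8 GPUs take 217.39 s, 512 GPUs take 12.48 s (Table 3.7).
s = speedup(217.39, 12.48)
e = strong_efficiency(8, 217.39, 512, 12.48)
print(f"S = {s:.2f}, E = {e:.1%}")  # → S = 17.42, E = 27.2%
```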
Having defined these, we can then proceed to describe the performance of our application. Let us begin with strong scaling. It is obvious, from both Table 3.7 and Fig. 3.16, that there exists a point beyond which we cease to obtain useful strong scaling. We will take "useful" to mean a minimum of 50% efficiency, though we note that there is no consensus on the definition of "useful" scaling. This point of diminishing returns happens when the sub-domain size becomes small enough, as
Table 3.7 Strong scaling measurements for different lattice sizes, reported as the wall-clock time to fully simulate a network from start to finish. We also present the speed-up (relative to the reference measurement) and a parallel efficiency

Box size | Number of GPUs | Domain decomposition (x,y,z) | Wall-clock time (s) | Speed-up | Efficiency (%)
512³     | 1    | (1,1,1)    | 96.0    | –     | –
512³     | 2    | (1,1,2)    | 50.1    | 1.92  | 95.9
512³     | 8    | (2,2,2)    | 18.2    | 5.16  | 66.0
512³     | 32   | (2,4,4)    | 6.59    | 14.57 | 45.5
1024³    | 8    | (2,2,2)    | 217.39  | –     | –
1024³    | 64   | (4,4,4)    | 37.06   | 5.87  | 73.3
1024³    | 512  | (8,8,8)    | 12.48   | 17.41 | 27.2
2048³    | 64   | (4,4,4)    | 438.45  | –     | –
2048³    | 512  | (8,8,8)    | 76.15   | 5.76  | 72.0
4096³    | 512  | (8,8,8)    | 948.52  | –     | –
4096³    | 4096 | (16,16,16) | 156.96  | 6.04  | 74.3
8192³    | 4096 | (16,16,16) | 1990.51 | –     | –
Table 3.8 Weak scaling measurements for a fixed box size of 256³ per domain. The wall-clock time corresponds to the time to complete 640 timesteps (the number of timesteps for a full 256³ size simulation). In addition we present a speed-up as well as a parallel efficiency

Box size     | Number of GPUs | Domain decomposition (x,y,z) | Wall-clock time (s) | Speed-up | Efficiency (%)
256³         | 1    | (1,1,1)    | 8.93 | –    | –
256² × 512   | 2    | (1,1,2)    | 8.95 | 1.00 | 99.7
256 × 512²   | 4    | (1,2,2)    | 8.93 | 1.00 | 99.9
512³         | 8    | (2,2,2)    | 8.94 | 1.00 | 99.8
1024³        | 64   | (4,4,4)    | 9.17 | 0.97 | 97.4
1024² × 2048 | 128  | (4,4,8)    | 9.34 | 0.96 | 95.6
1024 × 2048² | 256  | (4,8,8)    | 9.44 | 0.95 | 94.6
2048³        | 512  | (8,8,8)    | 9.68 | 0.92 | 92.2
2048² × 4096 | 1024 | (8,8,16)   | 9.61 | 0.92 | 92.9
4096³        | 4096 | (16,16,16) | 9.81 | 0.91 | 91.2
one approaches 128³. This appears to be relatively common in most multi-GPU implementations (see for example [22, 42]). The reason for poor strong scaling is twofold: first, the amount of communication relative to the execution of CUDA kernels, where not even the overlap is effective at hiding the communication cost; and second, the inherent latency of launching CUDA kernels. In fact, one might even say the reason why the overlap is less effective is this latency cost as well.
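The first effect can be illustrated with a back-of-the-envelope sketch (assuming a one-cell halo): the number of boundary sites exchanged grows relative to the number of interior sites updated as the sub-domain shrinks:

```python
# Ratio of halo sites (communication) to interior sites (compute) for a
# cubic sub-domain of side n with a one-cell halo on every face.
def halo_to_interior(n):
    return ((n + 2) ** 3 - n ** 3) / n ** 3

for n in (512, 256, 128):
    print(f"{n}^3: {halo_to_interior(n):.1%}")  # → 1.2%, 2.4%, 4.8%
```

The per-site communication volume thus roughly quadruples between 512³ and 128³ sub-domains, on top of the fixed kernel-launch latency.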
Fig. 3.16 Performance indicators for our multiple-GPU code; strong scaling is shown in the top panels, while weak scaling can be seen in the bottom ones. The left-hand panels show the wall-clock time for a full run (for the strong scaling plot) or the wall-clock time necessary to complete 630 timesteps (for the weak scaling plot). The corresponding parallel efficiencies as defined in the text (see e.g. Eq. 3.21 for strong scaling) are presented in the right-hand panels
Table 3.9 Derived weak scaling measurements for a fixed box size of 512³ per domain. The wall-clock time corresponds to the time to complete 1270 timesteps (the number of timesteps for a full 512³ size simulation). These are derived from the strong scaling measurements above. In addition we present a speed-up as well as a parallel efficiency

Box size | Number of GPUs | Domain decomposition (x,y,z) | Wall-clock time (s) | Speed-up | Efficiency (%)
512³     | 1    | (1,1,1)    | 96.0   | –    | –
1024³    | 8    | (2,2,2)    | 108.27 | 0.89 | 88.7
2048³    | 64   | (4,4,4)    | 108.97 | 0.88 | 88.1
4096³    | 512  | (8,8,8)    | 117.75 | 0.82 | 81.5
8192³    | 4096 | (16,16,16) | 124.41 | 0.77 | 77.1
However, as discussed previously, having excellent strong scaling might not even be necessary if the runtimes across the board are relatively short for the number of processes involved. Before we discuss this, let us turn our attention to weak scaling. For a smaller sub-domain size of 256³, weak scaling is near-perfect all the way to 4096 GPUs, as seen in Table 3.8 and the corresponding figures. This suggests the overlap is successful in hiding communication costs. Curiously, if we look at the derived weak scaling benchmarks for 512³ sub-domains (Table 3.9), it is not quite as successful, with the weak scaling efficiency dropping to about 77% at 4096 GPUs. The reason for this is not well understood, although we suspect that more careful tuning of the overlap, to compensate for the larger buffers being exchanged, is required to achieve better weak scaling.
Let us now return to the point of using a small amount of node-hours for the simulation. In order to do so we will make a comparison, in node-hours, with the Lattice Abelian-Higgs benchmark (graciously sent to us by [25]). Our simulation can evolve a 4096³ lattice in around 3 min of wall-clock time with 4096 GPUs, or about 140–180 node-hours depending on the number of GPUs used. In the case of LAH, one 4096³ run on the Monte Rosa supercomputer, which used 32768 cores (equivalently 1024 nodes), would take 5251 node-hours. This suggests a node-hour speed-up of a factor of around 30, which also indirectly leads to another advantage: we can simulate lattices larger than what has so far been seen in the literature (of about 4096³) by going to a simulation of 8192³, which is a factor of 8 larger in volume and a factor of 2 larger in dynamic range. The time for such a production run, as seen in Table 3.7, is around 33.2 min of wall-clock time with 4096 GPUs, which gives around 2200 node-hours.
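These node-hour figures follow directly from Table 3.7, given that each Piz Daint node hosts a single P100, so GPU count equals node count for our runs (small differences from the quoted 140–180 range come from rounding and the exact run configuration):

```python
# Node-hours = nodes used x wall-clock time, with one GPU per node assumed.
def node_hours(n_nodes, wall_clock_s):
    return n_nodes * wall_clock_s / 3600.0

nh_hi = node_hours(4096, 156.96)    # 4096^3 lattice on 4096 GPUs
nh_lo = node_hours(512, 948.52)     # same lattice on 512 GPUs
nh_big = node_hours(4096, 1990.51)  # 8192^3 production run
print(f"{nh_lo:.0f}-{nh_hi:.0f} node-hours; ~{5251 / nh_hi:.0f}x vs LAH; "
      f"8192^3: {nh_big:.0f} node-hours")
```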
We would also like to comment on the choice of using node-hours for the comparison and not core-hours. As mentioned previously, the idea of a traditional "core" does not apply to a GPU, and so it might not necessarily be correct to compare in terms of cores. In most PRACE documentation for applying to supercomputing centers, the conversion is to consider the number of cores of a node—12 for Daint—plus the number of Streaming Multiprocessors of the GPU—56 for the Tesla P100—which avoids penalizing the GPU for its large number of cores. Note as well some ambiguity here: since we do not use the full node, but merely one CPU core per node, one could argue the amount of core-hours would involve multiplying by a factor of 57, and not 68. On Piz Daint the practical solution to this dilemma is taken: all book-keeping is in node-hours. To avoid these ambiguities, we follow the Daint approach and make the comparison in node-hours.
As a final word on this simulation, we note that while we end up not being I/O-bound by not outputting any extra information from the lattice, this restriction would also only enable the exploration of the simplest type of semi-analytical modelling (i.e. without any extra arguments to describe additional degrees of freedom). However, if we choose to output more information, one quickly ends up with an onerous amount of data, which would be too much even for the high-end facilities of Piz Daint. One way to short-circuit this is to add in-situ capabilities to the simulation, as will be discussed next.
3.4 In-Situ Analysis and Visualization Pipeline
3.4.1 Reduced Winding Output
For almost every High-Performance Computing simulation, Input/Output is the most stringent bottleneck. This is unfortunate, as the scientific exploration of a simulation can often be limited by the data needs of the code. The problem is exacerbated by the rate at which storage solutions evolve versus the rate at which computational throughput evolves: the former is much slower than the latter. While outputting initial conditions can be done only once, and checkpointing is not necessary (though supported in our simulation), outputting only average quantities is a way to evade this problem, though it limits the amount of science that can be extracted. It is thus not surprising that we started looking to the literature for possible solutions to this bottleneck. One solution is the application of in-situ analysis/visualization techniques, as seen in Camata et al. [12], Rautenhaus et al. [45], Mu et al. [35], Sohrabi et al. [50], Kageyama et al. [28], wherein the output data is heavily reduced prior to output and thus only a smaller amount of data is written to disk. To demonstrate this technique in our simulation, our approach will be to output only the cells pierced by strings, instead of outputting, for every single cell in the simulation, whether a string is present or not.
In order to use in-situ techniques for outputting string positions in the lattice, we use the library ParaView Catalyst (tested at version 5.8.0). There are two components necessary to achieve this: an Adaptor, written in C++, which is responsible for placing all data in a format vtk (and by extension ParaView) will understand, and a Python script which reduces the data and then outputs an unstructured grid (in parallel vtu format).
Let us then describe the first component of this puzzle: the Adaptor. One begins by creating a vtkMultiBlockDataSet, where each process contains a vtkImageData with the necessary information about sub-domain extents (lattice spacing, number of points/cells, origin of each block). After this, a series of vtkFloatArray are created (or re-used) and filled with the contents of seven different buffers resident in Host memory but accessible by a GPU (pinned). Six of these pinned buffers are updated by the winding estimator kernel to contain information about which winding pierces each cell face and in what direction (as such, each buffer can take the values −1, 0, 1). Then one extra kernel updates the last array, which merely indicates if a cell is pierced (basically an OR of the absolute values of the previous buffers).
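The reduction logic is simple enough to sketch in a few lines (a plain-Python stand-in with made-up cell values; in the real pipeline the buffers are pinned arrays filled by CUDA kernels and filtered by ParaView's Threshold filter):

```python
# Each cell carries six face windings in {-1, 0, 1}; the seventh array is
# the pierced flag: an OR over the absolute face values.
cells = {
    (0, 0, 0): (0, 0, 0, 0, 0, 0),
    (3, 1, 2): (1, 0, 0, -1, 0, 0),  # a string passes through this cell
    (5, 5, 5): (0, 0, 0, 0, 0, 0),
}
pierced = {idx: int(any(abs(w) for w in faces)) for idx, faces in cells.items()}

# The threshold step keeps only pierced cells for output.
string_cells = [idx for idx, flag in pierced.items() if flag]
print(string_cells)  # → [(3, 1, 2)]
```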
Having all data in a format ParaView Catalyst can understand, we then need to perform the reduction and output. This task is left to the second ingredient of the in-situ strategy: the Python script. This script first applies a threshold filter to the data, selecting only cells wherein the last array is non-zero. This is the reduction step. Afterwards, we apply the Merge Blocks filter to merge all string segments throughout the different sub-domains, and we finish by using the appropriate parallel writer for the format the data is now in: the parallel unstructured grid writer. The output is then in the *.pvtu format, with several auxiliary files (with the content of each sub-domain) in *.vtu. With this we already have the position of string cell centers for each string at the output timesteps. Some additional treatment can be necessary to identify which points belong to the same string, to smooth the resulting centerline and, if necessary, to visualize the network. This will be described in the next section.
We are now in a position to describe the performance aspects of this approach. Before we do so, we will characterize the I/O subsystem of the machine used thus far, the Piz Daint supercomputer. All outputs are written to the /scratch partition, backed by a Cray Sonexion 3000 Lustre filesystem with 8.8 PB capacity. This filesystem contains 40 object storage targets and handles a file-per-process approach well, as long as one does not output thousands of files in a single folder. File-per-process is what the simulation uses for initial conditions and checkpointing outputs. The maximum bandwidth measured for file-per-process (for the configuration of each run) is what we use to estimate the time taken by the HDF5 approach, and the amount of data is computed from the outputs of a single 256³ simulation (around 662 MB for a single timestep). For the in-situ part, measurements of the amount of data and timings are obtained over a range of timesteps. The reason we do so is that, as the string network evolves in the linear scaling regime, there will be less string in the lattice, and therefore fewer string points to output. Data is output every five timesteps in both cases. All of these measurements can be thought of as a weak scaling benchmark, as we keep the sub-domain size constant at 256³, which means the domain decompositions of lattice sizes 512³, 1024³, 2048³ and 4096³ will be (2,2,2), (4,4,4), (8,8,8) and (16,16,16) respectively. The maximum file-per-process bandwidths for each configuration are 5712, 45640 and 113300 MB/s, with the last value also being used for the (16,16,16) run at 4096³ lattice size (as it is already close to the peak bandwidth of the filesystem). The results are summarized in Table 3.10 and Fig. 3.17.
These two comparisons summarize well the usual advantages of in-situ techniques. The first is that the amount of data output is heavily reduced. The smallest
Table 3.10 On the left-hand side, a summary of typical output sizes for a timestep, either with raw output of all cells (HDF5) or using only the unstructured grid outputs from our in-situ approach. On the right-hand side, we summarize the corresponding range of times taken by either approach

Lattice size | HDF5 size estimated (MB) | In-situ size measured (MB) | HDF5 time estimated (s) | In-situ time measured (s)
512³         | 5296                     | [8.7, 25.0]                | 0.9                     | [2.6, 3.5]
1024³        | 42368                    | [11.0, 126.0]              | 0.9                     | [2.7, 3.2]
2048³        | 338944                   | [33.0, 199.0]              | 3.0                     | [2.9, 5.6]
4096³        | 2711552                  | [99.0, 144.0]              | 23.9                    | [4.1, 8.2]
Fig. 3.17 On the left-hand side, the typical amount of data output via raw output of all cell contents (windings for six different cell faces) in HDF5 (dashed lines) and the size of outputs for the in-situ approach, where unstructured grids are output (full lines). On the right-hand side, the corresponding time taken by either approach. These outputs are obtained for four different lattice sizes: 4096³ (blue), 2048³ (purple), 1024³ (orange) and 512³ (green)
size of an HDF5 dataset for a 256³ run is about 662 MB in our test set, and the largest is 2711552 MB at 4096³ lattice size (around 2.7 TB for a single timestep). Given the large amount of data required by the HDF5 approach for a single timestep, it is already unfeasible to analyse several timesteps. In-situ here provides a significant improvement, reducing the amount of storage space needed by up to four orders of magnitude. This means we can either output at a greater temporal rate (or for a longer period of conformal time) or even add more datasets to the outputs, if required.
The second advantage is that there is a wall-clock time benefit to using in-situ, but only at a lattice size large enough to saturate the bandwidth of the filesystem. This is evident when going from 2048³ to 4096³: the time taken is roughly comparable for both techniques at 2048³, whereas at 4096³ in-situ takes an order of magnitude less time.
It is thus clear that this technique is a significant improvement over the usual I/O strategy of our multi-GPU simulation and makes the study of the small-scale structure of strings possible for large lattices. It does so by overcoming two possible bottlenecks: the amount of storage available and the maximum bandwidth of the system. We note that it is possible to improve upon this technique by changing
what information is output: cell centers and the absolute value of the scalar field would result in a large reduction in the amount of data output and in memory consumption (going from seven to two data arrays) and, coupled with a travelling salesman heuristic solver, would enable the creation of string centerlines (with interpolated string centers) in post-processing. For now, we will describe the post-processing approach with this seven-array version of the outputs.
3.4.2 Centerlines Post-Processing
Given that the previous step did not perform any analysis or visualization, we now need to do so in post-processing. We will proceed to describe the creation of string centerlines. Since all outputs are in the VTK unstructured grid format, they are easily readable by ParaView. This means we can readily use some of the already existing filters to facilitate our task somewhat. First we use the Connectivity filter to group neighboring cells pierced by a winding together. This already gives a “blocky” string made up of groups of voxels. From this step onwards we must create a custom filter. In order to ensure the necessary data is available to the custom filter, we additionally apply the PassArrays filter.
Next, the custom filter, from now on denoted centerlines, begins looping over each region of neighboring cells and over each cellId. The idea is to add cell centers to a vtkPolyData collection such that these represent vertices of a vtkLine. There is however a small catch to doing so directly: cells are ordered by index (i, j, k) and not by the order in which they appear along a string.
The way in which this is solved is by using the winding at each cell face to determine physically valid cell neighbors. Without loss of generality, we will choose the direction given by positive magnetic flux (+1 winding). Once valid neighbors
are identified, the voxel bounds for a specific cell and the next one are used to find
cell center coordinates, thus tracing a vtkLine connecting both cells. Repeating this
procedure until no more neighbors are found then yields a string centerline, as a
collection of vtkLines.
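The ordering step can be illustrated with a toy version of the tracer. The sketch below is plain Python with no VTK (the names and the dictionary encoding are ours; the real filter operates on vtk cell data): each pierced cell stores the face through which the +1 winding exits, and cell centers are chained in that order rather than in (i, j, k) order.

```python
def trace_centerline(cells, start):
    """cells maps (i, j, k) -> (di, dj, dk), the face through which the
    +1 winding flux exits; returns ordered cell centers for one string."""
    line, cur, visited = [], start, set()
    while cur in cells and cur not in visited:
        visited.add(cur)
        i, j, k = cur
        line.append((i + 0.5, j + 0.5, k + 0.5))   # cell center
        di, dj, dk = cells[cur]
        cur = (i + di, j + dj, k + dk)             # physically valid neighbor
    return line

# A short segment running along x, then turning towards y:
segment = {(0, 0, 0): (1, 0, 0), (1, 0, 0): (1, 0, 0), (2, 0, 0): (0, 1, 0)}
print(trace_centerline(segment, (0, 0, 0)))
# → [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (2.5, 0.5, 0.5)]
```

The `visited` set guards against walking around a closed loop forever; in the actual filter, consecutive centers become vtkLines.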
Note that this procedure is not completely fool-proof and there are some special
cases where it can fail to provide correct output. These cases are merely expected
instances of string network phenomenology which the above script initially failed to
take into account. When two strings intersect at one or more points they can exchange
ends, or likewise when one single string self-intersects, a closed loop of string forms
at the scale of the horizon or smaller. The subsequent decay of these loops is an
object of study in and of itself, as the decay mechanisms imposed and the scale(s)
at which they occur can significantly alter observational imprints of cosmic strings.
Loops are one special case that was initially not well-handled by the script, as the centerline would not close in on itself. This can be solved via a comparison of the number of points and the total number of cells: if one exceeds the other by a single unit,
Fig. 3.18 String cells colored by region in two close-up screenshots of a 2048³ radiation era simulation. We show in addition the output of the centerlines custom filter with smoothing via a Hanning window. A loop (in blue cells) is shown at the center of the top panel screenshot. An intercommutation event (cells in red) is shown on the left-hand side of the bottom panel
the last end of the string should be re-connected to the beginning. An example of a
loop can be seen in the top panel of Fig. 3.18.
The second pathological case comes from the intercommutation regions. When
two strings meet at one point, this will generate an X-shaped patchwork of cells.
By choosing one cell at random and going through the valid physical neighbors,
we only create a string centerline going from one end of the box to the other (one
half of the X-shape), without using all cell centers. If the total number of cells and the number of cells used for the centerline differ, it is necessary to restart the centerline reconstruction at one of the un-used cells. An example of an
intercommutation event can be seen in the bottom panel of Fig. 3.18.
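Both special cases reduce to simple bookkeeping. A self-contained sketch (the names are hypothetical; the actual filter works on vtkPolyData): restart at un-used cells until the region is exhausted, and re-connect the ends whenever a walk returns to its starting cell.

```python
def split_region(cells, next_of):
    """Partition one connected region of pierced cells into centerlines.

    `cells` is the set of pierced cells and `next_of(c)` returns the
    physically valid neighbor of c (or None at a string end).
    """
    unused, lines = set(cells), []
    while unused:                       # restart at an un-used cell
        start = min(unused)
        line, cur = [], start
        while cur in unused:
            unused.discard(cur)
            line.append(cur)
            cur = next_of(cur)
        if cur == start:                # walk closed on itself: a loop
            line.append(start)          # re-connect end to beginning
        lines.append(line)
    return lines

# A 4-cell square loop (2D cells for brevity):
loop = {(0, 0): (1, 0), (1, 0): (1, 1), (1, 1): (0, 1), (0, 1): (0, 0)}
print(split_region(set(loop), loop.get))
# → [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
```

For an X-shaped intercommutation region the outer `while` fires a second time, producing two centerlines from the two arms.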
After all centerlines are created (and all pathological cases are dealt with) we possess a collection of string centerlines with string positions located at the end of each stair-case segment. This taxicab geometry is merely a consequence of the lattice and of not performing an interpolation to the zero of the complex scalar field. Nevertheless, this artificial small-scale structure should be present only at scales of roughly the lattice spacing. Following [26] we can attempt to remove this lattice effect by smoothing the strings along the string path. One way to do so is to convolve the string coordinates with a choice of window function and window length, such that this artificial structure is removed. Both panels of Fig. 3.18 show the smoothed centerline output (in white). Qualitatively this is more representative of a natural string.
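The smoothing itself amounts to a convolution of each coordinate along the string path. A minimal NumPy sketch (the edge-padding choice and function name are ours; the window length is a free parameter, cf. [26]):

```python
import numpy as np

def smooth_centerline(points, window_len=7):
    """Smooth a stair-cased centerline by convolving each coordinate along
    the string path with a normalized Hanning window (odd length assumed)."""
    w = np.hanning(window_len)
    w /= w.sum()
    pts = np.asarray(points, dtype=float)
    # pad with edge values so the string ends are not pulled inwards
    pad = window_len // 2
    padded = np.pad(pts, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, d], w, mode="valid")
                     for d in range(pts.shape[1])], axis=1)
```

For closed loops, wrap-around padding (`mode="wrap"`) would be the natural choice instead of edge padding.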
3.5 Conclusion
93
A myriad of information can then be extracted from these smoothed centerlines, such as, for instance, the distribution of string lengths as a fraction of the particle horizon over the entire lattice as the network evolves. An example of this for a 2048³ radiation era simulation can be seen in Fig. 3.19.
3.5 Conclusion
We have ported a cosmic domain wall simulation to a parallel OpenCL implementation, written from scratch a single-GPU CUDA implementation of an Abelian-Higgs simulation, and then proceeded to tackle the challenge of extending it to harness multiple GPUs. In all cases, validation proved that the behavior of each simulation is in agreement with what is presented in the literature. Furthermore, we have discussed the performance benefits of each case, showing the large speed-up of walls (around 200×) compared to the previous code, and the benefits of the single-GPU and multi-GPU simulations when compared to the literature reference Lattice Abelian Higgs simulations (using 30 times fewer node-hours; near perfect weak scaling). Having succeeded in doing so, we will note what further steps can be taken to improve and extend this work, and then conclude this chapter by introducing how we can already use it to explore astrophysical consequences of such defect networks.
First let us go over possible improvements to the walls code. The simplest improvement is to apply the (admittedly simple) optimization used in Abelian-Higgs, which is to compute the density and velocity not at every timestep, but every n timesteps. This would easily allow us to ease the compute-bound bottleneck and probably end up with a memory-bound implementation. Afterwards, all work done in the multiple-GPU implementation to tackle memory bandwidth (Z-cycling) and the amount of memory available (multi-GPU) can be added. Given that the multiple-GPU AH code is written in a more generic way, it is perhaps easier to port the relevant kernels to CUDA (velocity and density estimation) and use the existing code for communication, IO and field updates from the multiple-GPU case. Basically, walls would be a specific sub-case of the existing kernels, one without a gauge field and with a real scalar instead. After all of this, let us comment on what work could be implemented
for any multi-GPU defect simulation. In terms of evading or short-circuiting the I/O bottleneck that will exist as soon as we wish to output more information about the network than averaged quantities, we are already attempting an improvement by adding in-situ capabilities to the simulation code (as was done for example in Ayachit et al. [7]). These in-situ capabilities—unpublished work, described in the previous section—will allow us to study the small-scale structure properties of string networks in more detail than ever before (by outputting string centerlines at resolutions of 4096³ and higher) and to detect the presence of bound states in networks with more than one type of string.
We can still ask how we can improve the scalability of the simulation, specifically
to keep up with the rapidly evolving landscape of supercomputing. One interesting
possibility, which could have advantages in both weak and strong scaling, is the
Fig. 3.19 The three panels show a network of strings at 2048³ lattice size in the radiation era at different conformal times η: 445.0, 477.0 and 511.0 for the top, middle and bottom panels. Strings are colored by their length divided by the size of the particle horizon at the given conformal time. For each case, we plot histograms to reveal the distribution of string lengths per horizon size
hypercube decomposition of Blanco-Pillado et al. [10], wherein communication is avoided. We can also attempt to do this in another way: avoid going for larger lattices and instead increase the resolution where it might be necessary, via Adaptive Mesh Refinement, as seen in [14, 21, 24]. Lastly, we remark that adding Fast Fourier Transform capabilities to this simulation would open up another window for the scientific exploration of this code. A recent (encouraging) work on this matter, in the context of a pseudo-spectral solver with multiple GPUs, can be found in [46]. Another earlier reference is the FFT library of reference [23].
In terms of scientific work that can already be leveraged from the mean quantities extraction, we will showcase in the next chapter how to use thousands of simulations to calibrate semi-analytical models of string evolution, appropriately extended to account for the velocity dependence of energy loss and curvature, as seen in Martins et al. [32], which enables a quantitative description of the importance of each energy loss mechanism in the network. This has obvious implications for observational studies which rely on the semi-analytical approach.
References
1. AMD Graphics Core Next Architecture White Paper. Technical report, 2012. http://www.amd.
com/Documents/GCN_Architecture_whitepaper.pdf
2. AMD OpenCL Optimisation Guide. Technical report, 2014. http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
3. Nvidia Tesla P100 Whitepaper. Technical report, 2016. https://images.nvidia.com/content/pdf/
tesla/whitepaper/pascal-architecture-whitepaper.pdf
4. ASS (2017) Hands on introduction to hpc, a. https://www.archer.ac.uk/training/coursematerial/2017/07/intro-epcc/index.php
5. ASS (2017) Message passing programming with mpi, b. http://www.archer.ac.uk/training/
course-material/2017/07/mpi-epcc/index.php
6. Anandtech. Amd radeon 285 review: Feat. sapphire r9 285 dual-x oc. https://www.anandtech.
com/show/8460/amd-radeon-r9-285-review
7. Ayachit U, Bauer A, Geveci B, O’Leary P, Moreland K, Fabian N, Mauldin J (2015) Paraview
catalyst: enabling in situ data analysis and visualization. In: Proceedings of the First Workshop
on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, ISAV2015,
pp. 25–29, New York, NY, USA. ACM. ISBN 978-1-4503-4003-8. https://doi.org/10.1145/
2828612.2828624
8. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from
cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D
75:065015. https://doi.org/10.1103/PhysRevD.75.065015
9. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2010) CMB power spectra from cosmic strings:
predictions for the Planck satellite and beyond. Phys Rev D 82:065004. https://doi.org/10.
1103/PhysRevD.82.065004
10. Blanco-Pillado JJ, Olum KD, Shlaer B (2012) A new parallel simulation technique. J Comput
Phys 231:98–108. https://doi.org/10.1016/j.jcp.2011.08.029
11. Briggs J, Pennycook SJ, Shellard EPS, Martins CJAP, Woodacre M, Feind K (2014) Unveiling the Early Universe: Optimizing Cosmology Workloads for Intel Xeon Phi Coprocessors in an SGI UV2000 System. Technical report, SGI/Intel White Paper
96
3 Supercomputing with Graphics Processing Units
12. Camata JJ, Silva V, Valduriez P, Mattoso M, Coutinho AL (2018) In situ visualization and data analysis for turbidity currents simulation. Comput Geosci 110:23–31. ISSN 0098-3004. https://doi.org/10.1016/j.cageo.2017.09.013. https://www.sciencedirect.com/science/article/pii/S0098300417305009
13. EPCC. Introduction to ARCHER. http://www.archer.ac.uk/training/online/index.php#IntroARCHER
14. Clough K, Figueras P, Finkel H, Kunesch M, Lim EA, Tunyasuvunakool S (2015) GRChombo: Numerical relativity with adaptive mesh refinement. Class Quant Grav 32(24):245011. https://doi.org/10.1088/0264-9381/32/24/245011
15. Correia JRCCC, Martins CJAP (2017) General purpose graphics-processing-unit
implementation of cosmological domain wall network evolution. Phys Rev E 96:043310.
https://doi.org/10.1103/PhysRevE.96.043310
16. Correia JRCCC, Martins CJAP (2020) Abelian-higgs cosmic string evolution with CUDA.
Astron Comput 32:100388. ISSN 2213-1337. https://doi.org/10.1016/j.ascom.2020.100388
17. Correia JRCCC, Martins CJAP (2021) Abelian-Higgs cosmic string evolution with multiple
GPUs. Astron Comput 34:100438. https://doi.org/10.1016/j.ascom.2020.100438
18. Correia JRCCC, Martins CJAP (2019) Extending and calibrating the velocity dependent one-scale model for cosmic strings with one thousand field theory simulations. Phys Rev D 100(10):103517. https://doi.org/10.1103/PhysRevD.100.103517
19. Correia JRCCC, Leite ISCR, Martins CJAP (2014) Effects of biases in domain wall network evolution. Phys Rev D 90:023521. https://doi.org/10.1103/PhysRevD.90.023521
20. Daverio D, Hindmarsh M, Kunz M, Lizarraga J, Urrestilla J (2016) Energy-momentum
correlations for Abelian Higgs cosmic strings. Phys Rev D 93(8):085014
21. Drew A, Shellard EPS (2019) Radiation from global topological strings using adaptive mesh
refinement: methodology and massless modes
22. Fuhrer O, Chadha T, Hoefler T, Kwasniewski G, Lapillonne X, Leutwyler D, Lüthi D, Osuna
C, Schär C, Schulthess TC, Vogt H (2018) Near-global climate simulation at 1 km resolution:
establishing a performance baseline on 4888 gpus with cosmo 5.0. Geosci Model Dev
11(4):1665–1681. 10.5194/gmd-11-1665-2018. https://www.geosci-model-dev.net/11/1665/
2018/
23. Gholami A, Hill J, Malhotra D, Biros G (2015) Accfft: a library for distributed-memory FFT
on CPU and GPU architectures. CoRR, abs/1506.07933 http://arxiv.org/abs/1506.07933
24. Helfer T, Aurrekoetxea JC, Lim EA (2019) Cosmic string loop collapse in full general relativity.
Phys Rev D 99(10):104028. https://doi.org/10.1103/PhysRevD.99.104028
25. Hindmarsh M, Daverio D (2019) Private communication, 20 December 2019
26. Hindmarsh M, Stuckey S, Bevis N (2009) Abelian higgs cosmic strings: small scale structure
and loops. Phys Rev D 79:123504. https://doi.org/10.1103/PhysRevD.79.123504
27. Hindmarsh M, Lizarraga J, Urrestilla J, Daverio D, Kunz M (2017) Scaling from gauge and
scalar radiation in Abelian higgs string networks. Phys Rev D 96(2):023525. https://doi.org/
10.1103/PhysRevD.96.023525
28. Kageyama A, Sakamoto N, Miura H, Ohno N (2020) Interactive exploration of the in-situ
visualization of a magnetohydrodynamic simulation. Plasma and Fusion Res 15:1401065–
1401065 https://doi.org/10.1585/pfr.15.1401065
29. Kajantie K, Karjalainen M, Laine M, Peisa J, Rajantie A (1998) Thermodynamics of gauge
invariant U(1) vortices from lattice Monte Carlo simulations. Phys Lett B 428:334–341. https://
doi.org/10.1016/S0370-2693(98)00440-7
30. Kibble TWB (1976) Topology of cosmic domains and strings. J Phys A 9:1387–1398. https://
doi.org/10.1088/0305-4470/9/8/029
31. Leite AMM, Martins CJAP (2011) Scaling properties of domain wall networks. Phys Rev D
84:103523. https://doi.org/10.1103/PhysRevD.84.103523
32. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
References
97
33. Martins CJAP, Rybak IYu, Avgoustidis A, Shellard EPS (2016) Stretching and Kibble scaling regimes for Hubble-damped defect networks. Phys Rev D 94(11):116017 [Erratum: Phys Rev D 95(3):039902 (2017), https://doi.org/10.1103/PhysRevD.95.039902]
34. Micikevicius P (2009) 3d finite difference computation on gpus using cuda. In: Proceedings
of 2Nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2,
pp 79–84, New York, NY, USA. ACM. ISBN 978-1-60558-517-8. https://doi.org/10.1145/
1513895.1513905
35. Mu D, Moran J, Zhou H, Cui Y, Hawkins R, Tatineni M, Campbell S (2019) In-situ analysis
and visualization of earthquake simulation. In: Proceedings of the Practice and Experience in
Advanced Research Computing on Rise of the Machines (Learning), PEARC ’19, New York,
NY, USA. Association for Computing Machinery. ISBN 9781450372275. https://doi.org/10.
1145/3332186.3332201
36. Munshi A (2012) OpenCL 1.2 Specification http://scholar.google.com/scholar?hl=en&
btnG=Search&q=intitle:The+opencl+specification#0
37. Nguyen A, Satish N, Chhugani J, Kim C, Dubey P (2010) 3.5-d blocking optimization for
stencil computations on modern cpus and gpus. In: 2010 ACM/IEEE International Conference
for High Performance Computing, Networking, Storage and Analysis, pp 1–13. https://doi.
org/10.1109/SC.2010.2
38. Nvidia Corporation. CUDA programming guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
39. NvidiaResearch-NVLabs (2018) Cub—cuda unbound v1.8.0. https://nvlabs.github.io/cub/
40. OpenCL S (2013) Performance of atomics. http://simpleopencl.blogspot.pt/2013/04/
performance-of-atomics-atomics-in.html
41. Phillips EH, Fatica M (2010) Implementing the himeno benchmark with cuda on gpu clusters.
In: 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS)
42. Potter D, Stadel J, Teyssier R (2017) PKDGRAV3: Beyond trillion particle cosmological
simulations for the next era of galaxy surveys. Comput Astrophys Cosmol 4:2. https://doi.
org/10.1186/s40668-017-0021-1
43. PRACE (2017) Best practice guide GPGPU. https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_GPGPU.pdf
44. Press WH, Ryden BS, Spergel DN (1989) Dynamical evolution of domain walls in an expanding
universe. Astrophys J 347:590–604. https://doi.org/10.1086/168151
45. Rautenhaus M, Böttinger M, Siemen S, Hoffman R, Kirby RM, Mirzargar M, Röber N,
Westermann R (2018) Visualization in meteorology-a survey of techniques and tools for data
analysis tasks. IEEE Trans Visual Comput Graphics 24(12):3268–3296. https://doi.org/10.
1109/TVCG.2017.2779501
46. Ravikumar K, Appelhans D, Yeung PK (2019) Gpu acceleration of extreme scale pseudospectral simulations of turbulence using asynchronism. In: Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis, SC ’19,
New York, NY, USA. Association for Computing Machinery. ISBN 9781450362290. https://
doi.org/10.1145/3295500.3356209
47. Ryden BS (1988) The area of isodensity contours as a measure of large-scale structure.
Astrophys J 333:L41–L44. https://doi.org/10.1086/185284
48. Scarpino M (2011) OpenCL in action. Manning Publications. ISBN 9781617290176.
https://papers2://publication/uuid/69731F95-2EF6-4DAA-93D3-3E101997D299
49. Scherrer RJ, Vilenkin A (1998) “Lattice-free” simulations of topological defect formation. Phys Rev D 58:103501. https://doi.org/10.1103/PhysRevD.58.103501
50. Sohrabi R, Omlin S, Miller SA (2019) Geyser: 3d thermo-hydrodynamic reactive transport
numerical simulator including porosity and permeability evolution using gpu clusters. Comput
Geosci 23(6):1317–1330. ISSN 1573-1499. https://doi.org/10.1007/s10596-019-09885-w
51. Vilenkin A, Shellard ES (2000) Cosmic Strings and Other Topological Defects. Cambridge
University Press, 7. ISBN 978-0-521-65476-0
52. Xmartlabs (2012) Cuda occupancy calculator. https://github.com/xmartlabs/cuda-calculator
98
3 Supercomputing with Graphics Processing Units
53. Zhang Y, Mueller F (2012) Auto-generation and auto-tuning of 3d stencil codes on gpu clusters.
In: Proceedings of the Tenth International Symposium on Code Generation and Optimization,
CGO ’12, pp 155–164, New York, NY, USA. ACM. ISBN 978-1-4503-1206-6. https://doi.org/
10.1145/2259016.2259037
Chapter 4
Calibration of Extended VOS Models
There is geometry in the humming of the strings, there is music
in the spacing of the spheres.
Pythagoras
4.1 Prelude
As previously highlighted in the introduction, a network of cosmic strings is expected to generate imprints on cosmological backgrounds such as the Cosmic Microwave Background, the Stochastic Gravitational Wave Background or lensing. Comparing to such backgrounds inevitably results in a constraint on the mass scale of these objects. This scale is intimately connected to the symmetry breaking scale of the phase transition which generated the network. Therein lies a connection between high-energy physics and observational cosmology. The aforementioned imprints will inevitably be sourced by the energy-momentum tensor of strings, which for a network in scale-invariant evolution (typical of the radiation dominated epoch) is greatly simplified. The only catch is the treatment of the transition between epochs and the conformal stretching regime of dark energy domination. There are two possible ways to tackle this issue, albeit in broad terms both involve some manner of extrapolation to the full cosmological history.
The first one involves the brute-force computation of Unequal Time Correlation
functions in radiation and matter simulations, with some additional interpolation
added to treat the transition between radiation and matter. The second way involves
using the canonical semi-analytical model of string evolution (shown in Chap. 2) to
compute the full history of the string network. Then, either by focusing on the number of loops produced [42] or by assuming perturbations seeded by Unconnected Segments [7, 13, 38], one can arrive at a predicted power spectrum for the chosen
background. While it might seem this approach is disconnected from simulations, this
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. R. C. C. C. Correira, A New Generation of Cosmic Superstring Simulations,
Springer Theses, https://doi.org/10.1007/978-3-031-20229-2_4
is emphatically not the case: any thermodynamic model will contain free parameters which cannot be derived ab initio analytically. Recently it was shown that a single free parameter might not be sufficient to accurately describe the full evolution of a domain wall network and its energy loss mechanisms. This class of extended models was also shown to properly predict network evolution through transitions between epochs, as shown for domain walls in [32, 33].
Having created simulations which can compute and output average network characteristics faster than previous simulations, we will now exploit this higher performance to calibrate extended VOS models. The calibrations and conclusions to be drawn throughout this chapter resulted in the following:
• Work published in Physical Review D, “Effects of Biases in Domain Wall Network
Evolution II: Quantitative Analysis”, found in reference [20];
• Work published in Physical Review D, “Extending and calibrating the velocity dependent one-scale model for cosmic strings with one thousand field theory simulations”, found in reference [18];
• Work published in Physical Review D, “Quantifying the effect of cooled initial conditions on cosmic string network evolution”, found in reference [16];
• Work published in Physical Review D, entitled “High resolution calibration of the cosmic strings velocity dependent one-scale model”, found in reference [17].
We will begin with the simpler case of domain walls pushed outside the horizon, which re-enter at a later time and by doing so enter scaling. We will then turn our attention to the calibration of the extended VOS model for gauged cosmic strings. Given the larger number of numerical choices available in this latter type of simulation, and their unknown impact, we resort to systematically exploring the impact of each possible choice in order to obtain the most robust and highest-resolution calibration possible. We then finish by discussing the impact on observational signatures.
4.2 Global Domain Walls
4.2.1 Walls Formed Before the End of Inflation
Let us then begin with the extended VOS model for domain walls and its calibration in light of super-horizon (anisotropic) walls. This section will serve as a “tutorial” on the ways to calibrate the VOS, with subsequent sections improving on the methodology presented here.
As previously noted, a network of domain walls at scaling can potentially overclose the Universe. This can be seen by considering how the density of walls is expected to decay, ρ_w ∝ t⁻¹, and comparing to the critical density, ρ_c ∝ t⁻², so that the fraction ρ_w/ρ_c ∝ t grows without bound. There are some solutions which can make domain walls cosmologically viable, which either
require deviating from scaling (via some biasing of the potential) or ensuring walls form/enter scaling at a sufficiently late time to avoid overclosure. Ensuring that walls form sufficiently late imposes an upper bound of ≈ 1 MeV on the energy scale of these defects–a constraint known as the Zeldovich bound [46]. Overclosure can also be avoided by moving the formation time in the diametrically opposite direction, to earlier times, during an inflationary epoch. If a network of domain walls forms during inflation, it would enter a conformal stretching regime where L ∝ a and v → 0. Upon transitioning into the radiation era this network of walls would have a correlation length larger than the horizon, and would therefore remain “frozen” outside it. Walls would only start scaling when the horizon grows to sizes comparable to the mean separation of the network. The time of re-entry into the horizon will then depend on the typical length scale of walls after inflation, and therefore on the details of the inflationary epoch: the energy scale of walls and the number of e-folds occurring from wall formation until inflation’s end. We remark that too large a number of e-folds would essentially mean that walls would never re-enter the horizon during the radiation or matter epochs (similar arguments apply to strings and monopoles, see for instance [43]).
One other possibility is to start not with an isotropic network, where the mean separation is the same in all directions, but with one carrying an anisotropy inherited from formation in an anisotropic scenario. In this case the typical lengthscale would, for instance, be larger along one direction than along the others. This has two effects: first, the timescale for re-entry into the horizon will be set by the larger lengthscale; second, since in anisotropic inflation the Universe isotropizes (and then transitions into radiation), the network of walls will tend towards the usual scaling behavior dictated by a single correlation length as the anisotropic imprints on the network are slowly erased. This scenario was introduced in [5] and later re-explored (with larger simulations) in [19].
For the purpose of model calibration we must test the evolution of walls in post-inflationary Universes; there is no need to simulate anisotropic Universes (and inflation), only to start from a network of super-horizon (anisotropic) domain walls. This essentially means we need to generate initial conditions where the mean separation of the walls is larger than the size of the horizon and the velocities tend to zero. Numerically this is easy to achieve by evolving the network to some conformal time where the network of walls has formed and is reasonably well-defined, and subsequently re-setting the simulation to the initial conformal time and setting velocities to zero. An anisotropic super-horizon network will also require an additional ingredient: a “stretching” by a factor f via interpolation (and clamping) along a spatial direction. This leaves us with three cases to compare,
• Standard networks with random initial conditions for the scalar field φ, and φ̇ set
to zero. It will be henceforth denoted case A. These provide a fiducial comparison
not only to account for possible numerical effects (lattice size) but for calibration
comparison with the subsequent cases.
• Super-horizon isotropic networks, whose initial conditions are obtained from evolving a standard network from conformal time η0 = 1.0 till η = 20.0 (twice the wall thickness), resetting conformal time back to η = η0 and setting φ̇ to zero. This will be referred to as case B.
• Super-horizon anisotropic networks, where the initial conditions will be as in case B but with an additional stretching applied, to have a “preferential” mean separation which is either twice as large as the one in any other direction (case C) or four times larger (case D). Note that in [19] case D corresponded to having a preferential direction 16 times larger.
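The stretching ingredient can be sketched as follows (the helper is hypothetical; the actual simulation operates on its own field arrays). Linear interpolation is used along the stretched axis, with source coordinates clamped at the boundary:

```python
import numpy as np

def stretch_field(phi, f):
    """Stretch a 2D field by a factor f along axis 0: output row x
    samples the input at x/f, linearly interpolated and clamped."""
    n = phi.shape[0]
    x = np.arange(n) / f                        # source coordinates
    i0 = np.clip(np.floor(x).astype(int), 0, n - 1)
    i1 = np.clip(i0 + 1, 0, n - 1)              # clamp at the last row
    w = (x - np.floor(x))[:, None]              # interpolation weights
    return (1.0 - w) * phi[i0] + w * phi[i1]

phi = np.arange(16.0).reshape(4, 4)
stretched = stretch_field(phi, 2.0)             # walls now twice as far apart
```

With f = 2 (case C) the mean wall separation along the stretched axis doubles; f = 4 gives case D.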
The analysis of [19] had previously demonstrated via 4096² matter epoch simulations that, within statistical uncertainties, the wall networks do reach scaling, and that convergence to scaling requires more time for the anisotropic networks, in particular those with a larger stretch factor. We will continue this analysis by doubling the box size along each direction, to 8192², and by simulating several expansion rates, as required for the extended VOS calibration (exemplified for the first time in [32, 33]) with super-horizon (anisotropic) networks.
4.2.2 A Primer on Calibrating the Extended VOS–Domain
Walls
In order to describe the procedure for VOS model calibration we will begin by
presenting the extended VOS for domain walls (similar to the cosmic string model
presented earlier). As mentioned previously, the VOS model for domain walls had
been qualitatively derived in [6, 41], and a more rigorous ab initio derivation was
found by [32]. It relies on two averaged quantities, a density ρw (or equivalently a
characteristic physical lengthscale L, related to the former via ρw = σ/L) and a root
mean squared velocity v. In a FLRW spacetime, these evolve as,
\[
\frac{dL}{dt} = (1 + 3v^2)\, H L + F(v), \tag{4.1}
\]
\[
\frac{dv}{dt} = (1 - v^2)\left[\frac{k(v)}{L} - 3 H v\right], \tag{4.2}
\]
where F(v) and k(v) are the velocity-dependent energy loss and momentum (or curvature) functions, respectively. The model has been shown to be in very good agreement with high-resolution field theory simulations of domain wall networks in FLRW
isotropic universes, for a very broad range of expansion rates. The simpler version of
the VOS (with a constant momentum parameter and a linear energy loss function)
seemed to accurately describe low-resolution simulations [27]. However, as the
resolution increased, the extension of [32, 33] became necessary to properly predict the
evolution of the quantities of interest.
Although a detailed derivation of the velocity dependence of the two model parameters can be found in [32] and is briefly discussed in the introduction, here we will
simply state them. The momentum parameter takes the following form,
\[
k(v) = k_0\,\frac{1 - (qv^2)^{\beta_w}}{1 + (qv^2)^{\beta_w}}, \tag{4.3}
\]
where k0 is the maximal value of the momentum parameter, q the inverse of the
maximal wall network velocity squared, and βw has no clear physical interpretation,
serving instead as a parameter controlling the interpolation of the momentum parameter
between the low- and high-velocity limits. All of them are constant free parameters. In the case of
cosmic strings, an analytical ansatz can be derived to fix all of these values (assuming
the Nambu-Goto approximation and a helicoidal string); however, such is not the case
for walls.
The maximal momentum parameter k0 might be expected to be bounded by unity (albeit
larger values can indicate a deviation from the one-scale approximation), and q must
be within the interval 0 < 1/q ≤ v²max, where v²max can be determined via
defect dimensionality to be ≈ 2/3.
As for the energy loss function, the traditional blob chopping term cw v is complemented by a power-law term to describe scalar radiation,
\[
F(v) = c_w v + d\,[k_0 - k(v)]^{r}, \tag{4.4}
\]
where d and r are two more constant parameters, setting the normalization and exponent
of the power law. Note that, by definition, this term tends to zero in the low-velocity
limit.
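For concreteness, the two velocity-dependent functions of Eqs. (4.3) and (4.4) can be written down directly. The sketch below uses the global-fit parameter values of Table 4.2 as illustrative defaults; the function names are ours.

```python
import numpy as np

def k_wall(v, k0=1.72, q=5.68, beta=1.52):
    """Extended momentum parameter, Eq. (4.3); defaults are the global-fit
    values of Table 4.2 (illustrative, not definitive)."""
    x = (q * v**2) ** beta
    return k0 * (1.0 - x) / (1.0 + x)

def F_wall(v, cw=0.0, d=0.22, r=2.10, k0=1.72, q=5.68, beta=1.52):
    """Extended energy-loss function, Eq. (4.4): blob chopping term plus
    a power-law scalar-radiation term that vanishes as v -> 0."""
    return cw * v + d * (k0 - k_wall(v, k0, q, beta)) ** r
```

Note that k_wall(0) returns k0 and F_wall(0) vanishes, matching the low-velocity limits stated in the text.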
For easier comparison with numerical simulations, the model can be re-written in
terms of the density ρw = σ/L and in conformal time η,
\[
\frac{d\rho_w}{d\eta} = -3v^2 \mathcal{H} \rho_w - \frac{F(v)\,\rho_w^2}{\sigma}, \tag{4.5}
\]
\[
\frac{dv}{d\eta} = (1 - v^2)\left[\frac{k(v)\,\rho_w}{\sigma} - 3 \mathcal{H} v\right], \tag{4.6}
\]
where \(\mathcal{H}\) is the conformal Hubble parameter.
The first step towards VOS model calibration is to ensure that, in the conformal
time range to be used, all simulations have reached scaling. Scaling
is used for the calibration because it is a fixed point of the model for
any Universe where the scale factor behaves as a ∝ t^m ∝ η^{m/(1−m)}. Given the need
to properly extract the velocity dependencies of each function in the model, we will
also need to verify the existence of scaling for six expansion rates in the range
0.5 ≤ m ≤ 0.99. A possible way to do so is to verify if, from the following scaling
laws,
\[
\rho \propto \eta^{\mu}, \qquad \gamma v \propto \eta^{\nu}, \tag{4.7}
\]
the scaling exponents μ and ν are consistent with −1 and 0, respectively. This procedure
is applied to the average of 10 runs for each expansion rate (and case) in a given fit
Table 4.1 Scaling exponents μ and ν, and asymptotic values σ/(ρw η) and γv, for each case mentioned in the text (A, B, C and D) and each of the simulated expansion rates. A fit range of η ∈ [501.25, 3096.25] was used in all cases. One-sigma statistical uncertainties are shown throughout

| Case | m | μ | ν | σ/(ρw η) | γv |
|---|---|---|---|---|---|
| A | 1/2 | −0.972 ± 0.004 | −0.081 ± 0.005 | 0.547 ± 0.018 | 0.397 ± 0.022 |
| A | 2/3 | −0.973 ± 0.013 | −0.043 ± 0.008 | 0.510 ± 0.055 | 0.338 ± 0.021 |
| A | 4/5 | −0.971 ± 0.006 | −0.013 ± 0.005 | 0.410 ± 0.020 | 0.269 ± 0.010 |
| A | 9/10 | −1.024 ± 0.006 | −0.028 ± 0.006 | 0.319 ± 0.016 | 0.192 ± 0.009 |
| A | 95/100 | −1.014 ± 0.005 | 0.022 ± 0.006 | 0.225 ± 0.010 | 0.136 ± 0.006 |
| A | 99/100 | −0.975 ± 0.002 | 0.010 ± 0.003 | 0.099 ± 0.001 | 0.059 ± 0.001 |
| B | 1/2 | −0.985 ± 0.010 | −0.017 ± 0.006 | 0.565 ± 0.042 | 0.408 ± 0.017 |
| B | 2/3 | −0.963 ± 0.009 | −0.042 ± 0.009 | 0.495 ± 0.038 | 0.336 ± 0.023 |
| B | 4/5 | −1.031 ± 0.006 | −0.049 ± 0.005 | 0.434 ± 0.023 | 0.271 ± 0.012 |
| B | 9/10 | −0.979 ± 0.003 | −0.034 ± 0.004 | 0.305 ± 0.007 | 0.189 ± 0.006 |
| B | 95/100 | −0.992 ± 0.003 | 0.010 ± 0.007 | 0.220 ± 0.006 | 0.134 ± 0.007 |
| B | 99/100 | −0.990 ± 0.002 | 0.018 ± 0.002 | 0.100 ± 0.001 | 0.059 ± 0.001 |
| C | 1/2 | −1.003 ± 0.012 | −0.043 ± 0.010 | 0.504 ± 0.046 | 0.373 ± 0.028 |
| C | 2/3 | −0.959 ± 0.008 | −0.037 ± 0.006 | 0.435 ± 0.025 | 0.316 ± 0.014 |
| C | 4/5 | −0.983 ± 0.010 | −0.032 ± 0.008 | 0.376 ± 0.028 | 0.258 ± 0.015 |
| C | 9/10 | −0.987 ± 0.008 | −0.046 ± 0.006 | 0.294 ± 0.018 | 0.187 ± 0.009 |
| C | 95/100 | −0.990 ± 0.004 | 0.029 ± 0.005 | 0.214 ± 0.006 | 0.135 ± 0.006 |
| C | 99/100 | −0.992 ± 0.001 | 0.021 ± 0.003 | 0.100 ± 0.001 | 0.059 ± 0.001 |
| D | 1/2 | −0.992 ± 0.010 | −0.012 ± 0.007 | 0.351 ± 0.028 | 0.307 ± 0.017 |
| D | 2/3 | −0.933 ± 0.016 | −0.097 ± 0.011 | 0.320 ± 0.039 | 0.258 ± 0.023 |
| D | 4/5 | −0.962 ± 0.005 | −0.043 ± 0.009 | 0.294 ± 0.012 | 0.230 ± 0.016 |
| D | 9/10 | −0.990 ± 0.008 | −0.004 ± 0.008 | 0.250 ± 0.014 | 0.184 ± 0.011 |
| D | 95/100 | −0.971 ± 0.005 | 0.036 ± 0.006 | 0.191 ± 0.007 | 0.134 ± 0.006 |
| D | 99/100 | −0.982 ± 0.002 | 0.023 ± 0.003 | 0.097 ± 0.002 | 0.059 ± 0.002 |
range, which we take to be η ∈ [501.25, 3096.25]. The reason for this specific choice
of range is related to two factors: at early enough times the network might
not yet be in scaling (this can be especially true for case D, as the timescale for
re-entry is larger than in the other cases), and at late times there might not be enough
walls to ensure good enough statistics when computing the mean velocity squared. The
results can be found in Table 4.1 and point towards a consistency with scaling for
all sets of data. In fact, it is clear that even if the approach to scaling is affected by
the value of the stretch factor and the expansion rate, all simulations achieve scaling
given enough conformal time. This can be visually confirmed in Figs. 4.2 and 4.3.
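In practice, checking the scaling laws of Eq. (4.7) amounts to a linear fit in log-log space over the chosen conformal time window. A minimal sketch, using synthetic data and an illustrative function name:

```python
import numpy as np

def scaling_exponent(eta, rho, fit_range=(501.25, 3096.25)):
    """Fit rho ∝ eta^mu inside the chosen conformal-time window and
    return the exponent mu (the slope in log-log space)."""
    mask = (eta >= fit_range[0]) & (eta <= fit_range[1])
    slope, _ = np.polyfit(np.log(eta[mask]), np.log(rho[mask]), 1)
    return slope

# synthetic scaling network: rho = A / eta should give mu ≈ -1
eta = np.linspace(10.0, 3200.0, 2000)
mu = scaling_exponent(eta, 4.0 / eta)
```

The same fit applied to γv yields ν; the network is deemed to be scaling when μ is consistent with −1 and ν with 0 within the quoted uncertainties.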
Having confirmed scaling, we then use the same minimization procedure from
[32, 33] to obtain the best-fit model parameters for each case first and then with all
cases (global fit). The output, together with one-sigma uncertainties (statistical from
the average of runs), can be found in Table 4.2. We find that the best-fit parameters
for Case D are different from those of the other cases, which can be confirmed as well
from the VOS model predictions for each velocity-dependent function and average
Table 4.2 Best fit parameters, with one-sigma statistical uncertainties, for the extended VOS for the 8192² domain wall simulations in the present work (cases A, B, C and D, plus a global fit to all the data). For comparison, the bottom two rows show the parameters obtained for 4096³ simulations of standard walls in [32, 33]

| Case | m | k0 | q | βw | r | d | cw |
|---|---|---|---|---|---|---|---|
| Case A | 0.5 ≤ m ≤ 0.99 | 1.71 ± 0.01 | 5.07 ± 0.39 | 1.92 ± 0.21 | 1.44 ± 0.12 | 0.28 ± 0.01 | 0.00 ± 0.01 |
| Case B | 0.5 ≤ m ≤ 0.99 | 1.73 ± 0.02 | 4.08 ± 0.43 | 1.47 ± 0.18 | 1.57 ± 0.20 | 0.29 ± 0.02 | −0.00 ± 0.01 |
| Case C | 0.5 ≤ m ≤ 0.99 | 1.73 ± 0.03 | 5.25 ± 0.52 | 1.37 ± 0.17 | 1.84 ± 0.21 | 0.23 ± 0.03 | 0.00 ± 0.01 |
| Case D | 0.5 ≤ m ≤ 0.99 | 1.71 ± 0.05 | 8.97 ± 0.78 | 1.30 ± 0.18 | 2.05 ± 0.22 | 0.13 ± 0.01 | −0.00 ± 0.01 |
| Global | 0.5 ≤ m ≤ 0.99 | 1.72 ± 0.03 | 5.68 ± 0.89 | 1.52 ± 0.30 | 2.10 ± 0.39 | 0.22 ± 0.02 | 0.00 ± 0.01 |
| Ref. [32] | 0.5 ≤ m ≤ 0.9 | 1.72 ± 0.03 | 4.10 ± 0.17 | 1.65 ± 0.17 | 1.30 ± 0.06 | 0.29 ± 0.01 | 0.00 ± 0.03 |
| Ref. [33] | 0.2 ≤ m ≤ 0.9998 | 1.77 ± 0.03 | 3.35 ± 0.32 | 1.08 ± 0.07 | 1.42 ± 0.04 | 0.26 ± 0.02 | 0.00 ± 0.08 |
Fig. 4.1 Comparing the VOS model predictions, using the best-fit parameters listed in Table 4.2,
for the scaling parameters σ/(ρη) (top left) and v (top right), the curvature parameter k(v) (bottom left)
and the global energy losses parameter F(v) (bottom right). The fits for Cases A, B, C and D are
separately shown in each panel
quantities, as seen in Fig. 4.1. We remark that the latter figure also confirms that the
extended model accurately describes scaling for all simulations. For a discussion of
additional systematic uncertainties in these simulations, see [33].
In addition, we also compare with the previously obtained calibrations from
full 4096³ simulations shown in [32, 33]. Given the aforementioned uncertainties, all
calibrations are mostly consistent. We point out the remarkable consistency in the
value of the blob chopping parameter cw = 0, indicating that in no case does blob
chopping play a dominant role in energy loss, and in the maximal value of the
momentum parameter, k0. As a larger expansion rate translates into larger
Hubble damping, this consistency is somewhat expected: at large expansion rates
all anisotropic imprints on small-scale structure are quickly erased. This is not the
case at low expansion rate, and indeed a rather different k(v) is expected (clearly
shown in the bottom panels of Fig. 4.1).
This completes the description of how to calibrate the extended VOS model. However, we
can now ask an additional question: does the VOS model describe the approach to
scaling reasonably? To answer it, we insert the calibrated parameters into the equations,
use the measured quantities at the first timestep as initial conditions and numerically
integrate the model.
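As a sketch of this integration step, the wall VOS of Eqs. (4.5) and (4.6) can be evolved with an off-the-shelf ODE solver. The choice σ = 1, the global-fit parameters of Table 4.2 and the initial conditions below are illustrative, not the measured simulation inputs:

```python
import numpy as np
from scipy.integrate import solve_ivp

# illustrative global-fit parameters from Table 4.2, with sigma set to unity
K0, Q, BETA, CW, D, R = 1.72, 5.68, 1.52, 0.0, 0.22, 2.10

def k(v):
    x = (Q * v * v) ** BETA
    return K0 * (1.0 - x) / (1.0 + x)

def F(v):
    return CW * v + D * (K0 - k(v)) ** R

def vos_rhs(eta, y, m):
    """Conformal-time wall VOS, Eqs. (4.5)-(4.6), with sigma = 1."""
    rho, v = y
    H = m / ((1.0 - m) * eta)   # conformal Hubble rate for a ∝ η^{m/(1-m)}
    drho = -3.0 * v * v * H * rho - F(v) * rho * rho
    dv = (1.0 - v * v) * (k(v) * rho - 3.0 * H * v)
    return [drho, dv]

# integrate from an (illustrative) first measured timestep in the radiation era
m = 0.5
sol = solve_ivp(vos_rhs, (1.0, 3000.0), [1.0, 0.1], args=(m,),
                rtol=1e-8, atol=1e-12)
rho_eta = sol.y[0] * sol.t   # in scaling rho ∝ 1/η, so this tends to a constant
```

The late-time plateau of ρη and of v can then be compared against the asymptotic values in Table 4.1.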
The results of this integration (dashed lines) and the measured average network
quantities are shown in Figs. 4.2 and 4.3. We remark that the model seems to do
quite well for low expansion rates, although (qualitatively) less so for higher ones.
The reason for this might be the transient scaling regime identified in [33].
Fig. 4.2 Comparing the evolution of the density ρ (left column) and the root mean squared velocity
γv (right column) for simulations in the various cases and expansion rates. Each panel compares the
results of the various expansion rates m for one specific Case (A, B, C or D). In all panels the solid
lines show the results of the simulations, while the dashed lines show the corresponding integration
of the VOS, using the best-fit parameters of Table 4.2
Fig. 4.3 As in Fig. 4.2, but each panel now compares the results of the four Cases (A, B, C and D)
for a fixed expansion rate m; specifically the cases m = 1/2 (radiation era), m = 2/3 (matter era),
m = 9/10 and m = 99/100 are shown
Given that this section illustrated how to calibrate the extended VOS model, we
can now describe the calibration for cosmic string networks and all improvements
made to the calibration pipeline.
4.3 Abelian-Higgs Cosmic Strings
4.3.1 Calibrations on Small Lattices–A First Approach
We will now exemplify the calibration of local Abelian-Higgs cosmic strings. This
first calibration will be based on low-resolution lattices of size 512³, with lattice
spacing Δx = 0.5 and with constant comoving width. We will use 12 runs per expansion rate (to reduce statistical uncertainty), for 43 expansion rates and two sets of
estimator choices, resulting in a total of 1032 simulations used.
Some of the aforementioned choices (lattice spacing, lattice size, constant comoving width) and their impact on the model calibration need to be properly explored;
however, this requires more hardware resources than were available at the time of
this first calibration: we had only two Nvidia 1080Ti's (11 GB of Video RAM) and
one Nvidia Quadro P5000 (16 GB of VRAM), installed on different machines. We
will therefore defer an exploration of several numerical choices to when more
adequate hardware resources became available. The only choice that could be (and in fact
was) explored with the same resources as this preliminary calibration was the impact
of cooling on scaling, and the sensitivity of the model to the removal of radiation.
Following the same recipe as for domain walls, we must first ascertain the constancy of ξ̇ and ⟨v²⟩, the asymptotic quantities. There is a small numerical technicality that needs some explaining. While one would expect the following scaling
laws,
\[
\xi \propto \eta^{\mu}, \qquad v^2 \propto \eta^{\nu}, \tag{4.8}
\]
with the expected values of μ = 1 and ν = 0 indicating that the network has reached
scaling, this is not completely true here. The observed scaling law (as seen in [10]) is
of the form ξ ∝ (η − η0), with the extra offset η0 being a consequence of the choice
of initial conditions. With sufficient dynamic range (that is, at late enough times) this
offset plays a less and less prominent role, and one therefore tends to the de facto
expected scaling law ξ ∝ η. There are some strategies to drive this offset to
zero, such as preparing the initial conditions to have correlations on a certain scale,
starting the simulation at a later time and then cooling the initial conditions (the
strategy employed in [11, 21, 24]) or, for example, evolving the network in a
high expansion rate Universe and then changing to the desired expansion rate. We have
explored driving the offset to zero in a radiation epoch by using the fast expansion
first and then allowing the normal evolution; this indeed reduces the value of the
offset, as expected. Note that this technique is not useful here: it has little to no effect
for high enough expansion rates (as expected) and the details of the high damping
evolution need to be tuned for each expansion rate.
From the point of view of the VOS equations (re-written in conformal quantities), we have
\[
\frac{d\xi}{d\eta} = \frac{m}{(1-m)\eta}\,\xi v^2 + \frac{F(v)}{2}, \tag{4.9}
\]
\[
\frac{dv}{d\eta} = (1 - v^2)\left[\frac{k(v)}{\xi} - \frac{2mv}{(1-m)\eta}\right]. \tag{4.10}
\]
This different scaling law is also not an issue, as the quantity of interest is the
slope of ξ, that is ξ̇, meaning this rate of change can simply be computed with ξ/(η − η0).
Given that the model will be calibrated from linear scaling behavior, we will use
the asymptotic quantities v0 and ε, the latter defined as,
\[
\epsilon = \frac{\xi}{(\eta - \eta_0)(1 - m)}, \tag{4.11}
\]
where m is the expansion rate in any Universe with a ∝ t^m (t being the physical
time), or in conformal time a ∝ η^{m/(1−m)}. The asymptotic quantities are obtained
from the measured ξ and v² values at linear scaling. For each expansion rate, we
compute the average offset η0 and subsequently ε, as well as its uncertainties.
We find that this offset varies mildly from 36 to 48, depending on the expansion rate.
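The offset computation just described can be sketched as a simple least-squares fit of the observed law ξ = slope · (η − η0); the slope then gives ε = slope/(1 − m). The synthetic data and function name below are illustrative:

```python
import numpy as np

def fit_offset_and_epsilon(eta, xi, m):
    """Fit the observed linear scaling law xi = slope * (eta - eta0) and
    return (eta0, epsilon), where epsilon = slope / (1 - m) as in Eq. (4.11).
    Simple least-squares sketch."""
    slope, intercept = np.polyfit(eta, xi, 1)
    eta0 = -intercept / slope   # xi extrapolates to zero at eta = eta0
    epsilon = slope / (1.0 - m)
    return eta0, epsilon

# synthetic check: xi = 0.15 * (eta - 40) in a radiation era (m = 0.5)
eta = np.linspace(80.0, 128.0, 25)
eta0, eps = fit_offset_and_epsilon(eta, 0.15 * (eta - 40.0), m=0.5)
```

In production one would average the per-run offsets and propagate their scatter into the uncertainty on ε.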
It is also useful to re-write the VOS model assuming linear scaling as,
\[
F(v_0) = 2\epsilon\,[1 - m(1 + v_0^2)], \tag{4.12}
\]
\[
k(v_0) = 2m\,\epsilon\, v_0, \tag{4.13}
\]
which gives us a direct way to compare analytical expectations (standard or extended
forms) of each velocity-dependent function with simulation output. Before such a
comparison can be done, we must, however, verify the exponents μ and ν (and therefore
the quality of scaling) and use them to select a specific conformal time range. For this
verification, we will also use two different correlation length estimators (winding- and
Lagrangian-based estimators, see Eqs. 3.17 and 3.14, respectively) and two different
velocity estimators (scalar conjugate momenta and equation-of-state based estimators, see Eqs. 3.18 and 3.20, respectively). The values for these two exponents and the
respective asymptotic quantities can be found in Table 4.3; ε and v0 are furthermore
depicted in Fig. 4.4. Our criteria for ensuring the scaling assumption holds will be
those used in the domain walls case (previous section and [32, 33]), which consist
of demanding a value of μ consistent with unity to at least two decimal places and
a ν deviating from zero by at most about 10%. The second criterion is
relatively more lax due to the inherent difficulty of measuring velocities in field
theory simulations. Using this information and criteria we can then select the conformal time range for the calibration, which in the table and throughout the rest of
Table 4.3 Relevant quantities measured from the two sets of simulations, for each expansion rate m: specifically the scaling exponents μ and ν, together with the mean correlation length divided by conformal time (corrected by an offset), ξ/(η − η0), and the mean velocity squared ⟨v²⟩. The left side of the table uses the winding-based correlation length estimator and the equation of state based velocity estimator, while the right side of the table uses the Lagrangian-based correlation length estimator and the field-based velocity estimator. All quantities are the result of the average of 12 simulations with different initial conditions

| m | μW | νω | ξW/(η−η0) | ⟨v²⟩ω | μL | νφ | ξL/(η−η0) | ⟨v²⟩φ |
|---|---|---|---|---|---|---|---|---|
| 0.50 | 0.999±0.005 | 0.024±0.004 | 0.307±0.004 | 0.549±0.006 | 0.999±0.003 | 0.047±0.007 | 0.309±0.004 | 0.513±0.008 |
| 0.51 | 1.000±0.005 | 0.003±0.005 | 0.310±0.004 | 0.547±0.006 | 1.000±0.003 | 0.014±0.006 | 0.311±0.004 | 0.512±0.008 |
| 0.52 | 0.999±0.005 | 0.003±0.005 | 0.303±0.004 | 0.544±0.006 | 1.000±0.003 | 0.023±0.007 | 0.303±0.004 | 0.510±0.009 |
| 0.53 | 0.999±0.005 | 0.008±0.004 | 0.300±0.004 | 0.544±0.006 | 1.000±0.003 | 0.027±0.006 | 0.300±0.004 | 0.510±0.008 |
| 0.54 | 0.999±0.004 | 0.004±0.004 | 0.298±0.003 | 0.541±0.005 | 1.000±0.003 | 0.019±0.006 | 0.299±0.003 | 0.508±0.007 |
| 0.55 | 0.999±0.004 | 0.017±0.004 | 0.297±0.003 | 0.539±0.005 | 1.000±0.003 | 0.034±0.006 | 0.299±0.003 | 0.506±0.007 |
| 0.56 | 0.999±0.003 | 0.009±0.004 | 0.292±0.002 | 0.536±0.005 | 0.999±0.003 | 0.024±0.005 | 0.291±0.002 | 0.504±0.007 |
| 0.57 | 0.999±0.003 | 0.023±0.004 | 0.291±0.002 | 0.533±0.005 | 0.999±0.003 | 0.043±0.005 | 0.291±0.002 | 0.501±0.006 |
| 0.58 | 0.999±0.003 | 0.036±0.005 | 0.292±0.002 | 0.530±0.006 | 0.999±0.003 | 0.057±0.006 | 0.292±0.002 | 0.499±0.007 |
| 0.59 | 0.999±0.003 | 0.033±0.005 | 0.288±0.003 | 0.525±0.006 | 0.999±0.003 | 0.054±0.006 | 0.287±0.003 | 0.494±0.008 |
| 0.60 | 0.999±0.003 | 0.027±0.005 | 0.288±0.002 | 0.522±0.006 | 0.999±0.003 | 0.045±0.007 | 0.287±0.003 | 0.491±0.008 |
| 0.61 | 0.999±0.003 | 0.029±0.005 | 0.289±0.002 | 0.518±0.006 | 0.999±0.003 | 0.046±0.006 | 0.288±0.003 | 0.488±0.008 |
| 0.62 | 0.999±0.003 | 0.043±0.005 | 0.291±0.002 | 0.515±0.006 | 0.999±0.003 | 0.060±0.007 | 0.290±0.002 | 0.484±0.008 |
| 0.63 | 0.999±0.003 | 0.051±0.005 | 0.292±0.003 | 0.511±0.006 | 0.999±0.003 | 0.066±0.007 | 0.290±0.003 | 0.481±0.008 |
| 0.64 | 0.999±0.003 | 0.054±0.005 | 0.293±0.003 | 0.507±0.006 | 0.999±0.003 | 0.073±0.007 | 0.292±0.003 | 0.477±0.008 |
| 0.6(6) | 1.000±0.003 | 0.073±0.006 | 0.292±0.002 | 0.496±0.007 | 1.000±0.004 | 0.091±0.008 | 0.290±0.002 | 0.466±0.009 |
| 0.68 | 0.999±0.003 | 0.070±0.006 | 0.293±0.002 | 0.488±0.007 | 0.999±0.004 | 0.089±0.009 | 0.291±0.002 | 0.459±0.010 |
| 0.69 | 0.999±0.003 | 0.080±0.006 | 0.292±0.003 | 0.483±0.007 | 0.999±0.005 | 0.102±0.009 | 0.290±0.003 | 0.454±0.010 |
| 0.70 | 1.000±0.003 | 0.085±0.006 | 0.293±0.002 | 0.477±0.008 | 0.999±0.005 | 0.107±0.010 | 0.290±0.002 | 0.448±0.010 |
| 0.71 | 1.000±0.003 | 0.084±0.006 | 0.291±0.002 | 0.471±0.007 | 1.000±0.005 | 0.105±0.009 | 0.288±0.002 | 0.442±0.010 |
| 0.72 | 0.999±0.003 | 0.081±0.007 | 0.287±0.002 | 0.463±0.008 | 1.000±0.005 | 0.106±0.010 | 0.283±0.002 | 0.435±0.011 |
Table 4.3 (continued)

| m | μW | νω | ξW/(η−η0) | ⟨v²⟩ω | μL | νφ | ξL/(η−η0) | ⟨v²⟩φ |
|---|---|---|---|---|---|---|---|---|
| 0.73 | 0.999±0.004 | 0.083±0.007 | 0.283±0.003 | 0.454±0.008 | 0.999±0.003 | 0.106±0.011 | 0.279±0.003 | 0.426±0.011 |
| 0.74 | 0.999±0.004 | 0.074±0.008 | 0.280±0.003 | 0.446±0.009 | 0.999±0.004 | 0.091±0.012 | 0.275±0.003 | 0.418±0.011 |
| 0.75 | 0.999±0.004 | 0.060±0.009 | 0.278±0.003 | 0.438±0.009 | 1.000±0.004 | 0.071±0.014 | 0.273±0.002 | 0.411±0.012 |
| 0.76 | 0.999±0.004 | 0.062±0.009 | 0.277±0.003 | 0.431±0.009 | 1.000±0.004 | 0.072±0.014 | 0.272±0.002 | 0.404±0.012 |
| 0.77 | 1.000±0.004 | 0.083±0.008 | 0.277±0.003 | 0.424±0.009 | 1.000±0.004 | 0.094±0.012 | 0.271±0.003 | 0.398±0.011 |
| 0.78 | 1.000±0.004 | 0.084±0.009 | 0.274±0.003 | 0.416±0.009 | 1.000±0.005 | 0.095±0.013 | 0.268±0.002 | 0.389±0.011 |
| 0.80 | 1.000±0.003 | 0.075±0.008 | 0.267±0.002 | 0.397±0.008 | 1.000±0.005 | 0.083±0.012 | 0.259±0.002 | 0.371±0.010 |
| 0.81 | 1.000±0.003 | 0.074±0.008 | 0.266±0.002 | 0.388±0.008 | 1.000±0.005 | 0.079±0.012 | 0.257±0.002 | 0.363±0.010 |
| 0.82 | 1.000±0.003 | 0.078±0.008 | 0.261±0.002 | 0.378±0.007 | 1.000±0.004 | 0.083±0.012 | 0.252±0.002 | 0.353±0.010 |
| 0.83 | 1.000±0.003 | 0.091±0.008 | 0.259±0.002 | 0.369±0.007 | 1.000±0.004 | 0.098±0.012 | 0.249±0.002 | 0.344±0.009 |
| 0.84 | 1.000±0.004 | 0.101±0.008 | 0.254±0.002 | 0.359±0.007 | 1.000±0.004 | 0.109±0.011 | 0.245±0.002 | 0.335±0.009 |
| 0.85 | 1.000±0.004 | 0.106±0.008 | 0.250±0.002 | 0.347±0.007 | 1.000±0.004 | 0.116±0.012 | 0.240±0.002 | 0.323±0.009 |
| 0.86 | 1.000±0.004 | 0.102±0.008 | 0.245±0.003 | 0.336±0.007 | 1.000±0.004 | 0.109±0.012 | 0.235±0.002 | 0.312±0.009 |
| 0.87 | 1.000±0.004 | 0.101±0.008 | 0.240±0.003 | 0.324±0.006 | 1.000±0.003 | 0.108±0.011 | 0.230±0.003 | 0.300±0.008 |
| 0.88 | 1.000±0.005 | 0.095±0.008 | 0.235±0.003 | 0.311±0.006 | 1.000±0.003 | 0.098±0.011 | 0.224±0.003 | 0.288±0.007 |
| 0.89 | 1.000±0.005 | 0.091±0.007 | 0.229±0.003 | 0.299±0.006 | 1.000±0.003 | 0.090±0.010 | 0.218±0.003 | 0.275±0.007 |
| 0.90 | 1.000±0.004 | 0.092±0.006 | 0.220±0.003 | 0.285±0.005 | 1.000±0.003 | 0.089±0.009 | 0.209±0.003 | 0.262±0.006 |
| 0.91 | 1.000±0.004 | 0.097±0.006 | 0.212±0.002 | 0.271±0.004 | 1.000±0.004 | 0.093±0.009 | 0.201±0.002 | 0.248±0.005 |
| 0.92 | 1.000±0.004 | 0.106±0.005 | 0.202±0.002 | 0.256±0.004 | 1.000±0.003 | 0.106±0.008 | 0.191±0.002 | 0.233±0.005 |
| 0.93 | 1.000±0.004 | 0.097±0.005 | 0.191±0.002 | 0.241±0.003 | 0.999±0.003 | 0.097±0.007 | 0.180±0.002 | 0.217±0.004 |
| 0.94 | 0.999±0.003 | 0.077±0.005 | 0.180±0.002 | 0.224±0.003 | 1.000±0.004 | 0.070±0.006 | 0.169±0.002 | 0.199±0.003 |
| 0.95 | 0.999±0.003 | 0.070±0.004 | 0.169±0.001 | 0.207±0.002 | 0.999±0.003 | 0.053±0.006 | 0.159±0.001 | 0.181±0.003 |
Fig. 4.4 Asymptotic values of ε (top left panel) and root mean squared velocity √⟨v²⟩ (top right
panel) for the two pairs of estimators used in our production runs. The bottom panel shows the
relative difference between the pairs of estimators, showing that the difference between the obtained
velocities is in the range 6%−12%, while for the correlation length estimators it is at most about 6%
the calibration will be η ∈ [80, 128]. In conclusion, all networks have reached the
scaling regime and can therefore be used to calibrate the VOS.
Here we will also make another choice, given that exploring which velocity estimator should be used will require a larger resolution (larger lattice size with smaller
spacing). This will be explored in a later section. For now, we will make an educated
guess and use only the equation of state estimator, as in [24] the conjugate momenta
estimator seemed to underestimate the velocity of an oscillating string in Minkowski
space. This underestimation for expanding Universes is evident as well in our asymptotic
quantities in Table 4.3 and in Fig. 4.4. The difference is maximal at larger expansion
rates (of order 12%), and minimal, of about 6%, for the radiation epoch. On the same
note, both estimates of ε agree very well at low expansion rates, but begin disagreeing
at higher ones (maximally by about 6%). We remark that here there is no literature reference as to which correlation length estimator performs better; as such, we will attempt a calibration
with each of these. The small disagreement at large expansion rates will cause some
differences in the calibration, as will be shown next.
Before we proceed to the calibration let us briefly remind the reader of the extended
forms of the momentum parameter k(v) and the energy loss function F(v), first
proposed for domain walls in [32, 33] and here used with cosmic strings. The first
velocity-dependent function takes the following form,
\[
k(v) = k_0\,\frac{1 - (qv^2)^{\beta}}{1 + (qv^2)^{\beta}}, \tag{4.14}
\]
where β, q and k0 are free parameters. This form is general enough to also reduce
(via appropriate choices) to the analytical Nambu-Goto ansatz of Eq. 2.62. We
also note that a k0 larger than unity can be a sign of wiggliness.
The latter function, energy loss, is modified to include a scalar and gauge radiation
term (in addition to loop production),
\[
F(v) = cv + d\,[k_0 - k(v)]^{r}, \tag{4.15}
\]
where d and r are additional free parameters. Note that the additional power-law
term is motivated by the idea that uniformly moving defects do not radiate: only
perturbations of the defect surface will. As a fast expansion rate will smooth out the
presence of structure, it also makes sense that the energy loss function reduces to only
the loop production term at large expansion rates. We additionally remark that the
radiative power law is limited to some extent, as it cannot distinguish between different
types of radiation (scalar, gauge, massive, massless). In any case, it will allow us to
pinpoint which energy loss term plays a dominant role in sustaining scaling for
any given expansion rate.
At this point, we are ready to calibrate the extended VOS and phenomenologically
test our ansatze for the velocity-dependent functions. Here we stress that the large number
of free parameters (six, instead of a single one) is not a problem in and of itself, as the
large range and number of expansion rates will allow us to completely understand the
velocity dependencies of each phenomenological function and numerically measure
each parameter to a good level of statistical significance. It is in fact crucial that
expansion rates one would not expect, at least in the standard cosmological picture,
are used: without them the uncertainties would become larger and larger. In a
later section (and with all other sources of uncertainty under control) we will explore
this effect and understand if it is possible to calibrate this extended model in a more
restricted (and realistic) expansion rate range, m ∈ [0.5, 2/3].
As was done in the previous section and in the calibrations of [32, 33] we now
apply a bootstrap procedure to compare model and simulation data. The resulting
calibrated model parameters and their uncertainties for the two choices of correlation
length estimators are shown in Table 4.4. We additionally list the calibrations of the
domain walls VOS for a range of expansion rates comparable to what was used in
our work (relativistic regime; from [32]) and additionally including ultra-relativistic
and non-relativistic networks (from [33]).
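The bootstrap machinery itself is straightforward; the generic sketch below (illustrative names, with a toy fit function) resamples the runs with replacement, refits each resample, and reports the spread of the refitted parameters:

```python
import numpy as np

def bootstrap_fit(runs, fit_func, n_boot=1000, seed=0):
    """Generic bootstrap: resample the per-run measurements with replacement,
    refit each resample with fit_func, and report the mean and 1-sigma spread
    of the fitted parameters. `runs` indexes the individual measurements along
    its first axis; `fit_func` maps a resampled set to a parameter estimate."""
    rng = np.random.default_rng(seed)
    n = len(runs)
    params = [fit_func(runs[rng.integers(0, n, n)]) for _ in range(n_boot)]
    params = np.asarray(params)
    return params.mean(axis=0), params.std(axis=0)

# toy usage: the "fit" is just the mean of each resample
data = np.random.default_rng(1).normal(5.0, 1.0, 100)
mean, err = bootstrap_fit(data, fit_func=np.mean)
```

In the actual calibration, fit_func would be the least-squares minimization of the extended-VOS predictions against the measured (ε, v0) data, so that the quoted uncertainties on the six parameters reflect the run-to-run scatter.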
Before we comment on the values of the parameters, let us first ascertain if the
model predictions are in line with what is expected from simulations and if our ansatze
for velocity dependent functions provide a better description of simulation data than
the standard ones. Taking the inverted VOS equations to generate a “measured”
Table 4.4 Calibrated parameters for the cosmic strings VOS model, obtained from the two sets
of GPU-based simulations in this work and corresponding to the winding-based and Lagrangian-based correlation length estimators described in the text. For comparison we show the analogous
parameters for the domain walls VOS model (obtained in the literature), both for a range of expansion
rates comparable to the one in this section and for a wider range of expansion rates

| Parameter | Cosmic strings (Winding) | Cosmic strings (Lagrangian) | Domain walls (Relativistic) | Domain walls (All) |
|---|---|---|---|---|
| Reference | This section | This section | [32] | [33] |
| m range | [0.50-0.95] | [0.50-0.95] | [0.50-0.90] | [0.20-0.9998] |
| k0 | 1.37 ± 0.07 | 1.27 ± 0.09 | 1.72 ± 0.03 | 1.77 ± 0.03 |
| q | 2.30 ± 0.04 | 2.27 ± 0.05 | 4.10 ± 0.17 | 3.35 ± 0.32 |
| β | 1.46 ± 0.07 | 1.54 ± 0.09 | 1.65 ± 0.12 | 1.08 ± 0.07 |
| r | 1.85 ± 0.11 | 1.66 ± 0.10 | 1.30 ± 0.06 | 1.42 ± 0.04 |
| d | 0.21 ± 0.01 | 0.26 ± 0.01 | 0.29 ± 0.01 | 0.26 ± 0.02 |
| c | 0.34 ± 0.02 | 0.31 ± 0.02 | 0.00 ± 0.03 | 0.00 ± 0.08 |
value of the momentum parameter and energy loss function, we then compare with
the standard and extended calibrated functions; the comparison can be seen in Fig.
4.5. As evidenced by the orange solid lines, the standard ansatz fails to accurately
reproduce the measured k(v) and F(v) for the extended range of expansion rates.
The extended ansatze, however, provide a better fit, as illustrated by the blue line.
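The inversion step can be made explicit: at linear scaling, Eqs. (4.12) and (4.13) turn each measured triple (m, ε, v0) into a "measured" point of F(v) and k(v). A short sketch with illustrative input numbers (not the tabulated measurements):

```python
def measured_k_F(m, eps, v0):
    """Invert the linear-scaling VOS relations, Eqs. (4.12)-(4.13), to obtain
    'measured' values of the momentum and energy-loss functions at v = v0:
        k(v0) = 2*m*eps*v0,   F(v0) = 2*eps*(1 - m*(1 + v0**2))."""
    k = 2.0 * m * eps * v0
    F = 2.0 * eps * (1.0 - m * (1.0 + v0**2))
    return k, F

# illustrative radiation-era numbers, not the measured table values
k_meas, F_meas = measured_k_F(m=0.5, eps=0.6, v0=0.55)
```

Scanning m across the full simulated range traces out the data points against which the standard and extended ansatze are compared in Fig. 4.5.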
In addition, we can also plot the measured asymptotes and check if the VOS model
provides a reasonable fit for all expansion rates. This comparison can be found in
Fig. 4.6, and it shows the extended model predicts the asymptotic quantities very well
at larger expansion rates, slightly less so for lower ones. We will see in the next
section that in part this can be attributed to the computation of the offset in ε, and to
the presence of radiative contaminants stemming from the initial conditions.
Now that we have shown that the calibrations provide a reasonable description
of simulation data, we can comment on the parameter values themselves. A first
remark is that winding and Lagrangian based estimations lead to compatible VOS
model parameters, given the uncertainties inferred. The largest difference occurs with
parameter d, but given that it is of order two standard deviations, it is not statistically
significant.
Another noteworthy feature is the fact that the maximal momentum parameter
value k0 exceeds unity. This might indicate the presence of additional internal structure on strings—wiggles. A more detailed study of the presence of wiggles in these
simulations, in the vein of [23, 30] or by comparison with the wiggly VOS [31, 40]
is a possible next step to take.
Also of note is the preferred value of β, which disagrees with the Nambu-Goto ansatz
prediction (of β = 3), being instead roughly half this value. This partially explains
why the standard momentum parameter ansatz of Eq. 2.62 cannot correctly reproduce the
velocity dependencies of the dynamical quantities (see again Fig. 4.5). Parameter
Fig. 4.5 Comparisons between the analytic VOS model predictions (solid lines) and the simulation
outputs (data points) for both the momentum parameter k(v) (top panels) and a generalized energy
loss function F(v) (bottom panels). Left side and right side panels correspond to the winding-based
and Lagrangian-based correlation length estimators discussed in the text. In each case we show the
simulation diagnostics used as input for the inverted VOS expressions. We show for comparison
both the previous and extended versions of k(v) and F(v) (depicted in red and blue lines, and given
respectively by Eqs. 2.62–2.63 and 2.70–2.73) in order to emphasize that the previous one provides
a poor fit while the extended one provides a very good one. To facilitate comparisons with previous
works the radiation and matter era values are explicitly indicated
degeneracies might also partially be responsible for the difference between analytical
and free β.
It is also curious to compare the calibrated model parameters for strings and walls,
as shown in Table 4.4. First the normalization parameters for each function are clearly
larger in the domain walls case. This is not completely surprising as for instance,
q, by definition, will depend on the dimensionality of the defects. The predicted
values for the maximal network velocity, for which the momentum parameter would
vanish, would be vmax = 0.5 for walls and around vmax = 0.66 for strings. From the
measured case, q is indeed in agreement with this prediction for both cases, giving for
strings a maximal velocity of v ∼ 0.66, while for domain walls v ∼ 0.5 (or v ∼ 0.55
if one considers both non-relativistic and ultra-relativistic regimes).
When it comes to the exponent parameters, such as r and β, the first is somewhat
larger for cosmic strings, while for the latter the situation is less clear.

4.3 Abelian-Higgs Cosmic Strings

Fig. 4.6 Comparison between simulation outputs and the calibrated extended VOS model prediction for the rate of change of ξ (specifically ξ/η, top panels) and the root mean square velocity
(bottom panels). Left-side and right-side panels correspond to the two different choices of correlation length estimators, winding-based and Lagrangian-based, described in the text. To facilitate
comparisons with previous works the radiation and matter era values are explicitly indicated

If one compares strings with walls in a comparable (relativistic) expansion rate range,
then both seem compatible with the same β. However, as soon as non-relativistic and
ultra-relativistic walls are taken into account, a β of unity is obtained. In a later
section we will explore whether the non-relativistic regime can be probed, in order
to try to answer this question.
The last parameters we still need to discuss are the most important for observational signatures—the behaviour of the energy loss parameters c and d. The normalization of radiative losses, d, seems to be very similar across all cases, and it
is tempting to speculate there might be a universal value for it, applicable also to
field theory simulations of other defect networks, such as global monopoles [28] and
semi-local strings [1]. This requires testing on a case-by-case basis.
The most striking difference comes from the loop chopping parameter, c, which
for walls is always consistent with zero, independently of which expansion rates
are included, while for cosmic strings it is very clearly different from zero (at a high
level of statistical significance). This preliminary calibration seems to lead us to the
conclusion that the energy loss mechanisms are different for walls and strings: in the
former, radiative losses are the dominant mechanism, while for strings this is not the
case. To explore the relative importance of each mechanism in gauged strings, we
can evaluate the ratio of the two energy loss terms for each expansion rate (i.e. for each
velocity) in the evolution equation for the correlation length. This ratio is given by
    Loop losses / Radiation losses = c v / ( d [k0 − k(v)]^r ) .    (4.16)
Using the obtained model parameters along with the velocities from the simulations,
we find that in the radiation epoch (m = 1/2) the ratio takes the value ∼ 0.82 (4.17),
while in the matter era (m = 2/3) it becomes ∼ 1.06 (4.18), indicating that in the latter
loop production and radiative losses contribute in nearly equal parts to the energy loss,
while in the former radiative losses become more important. This is expected, as faster
expansion rates (smaller velocities) should result in a less significant role for radiative
losses. As a final comparison, taking an expansion rate of m = 0.9, the ratio rises to
∼ 6.92 (4.19), and it is seen that in this extreme regime loop production is the dominant
energy loss mechanism.
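The ratio in Eq. 4.16 is straightforward to evaluate for any calibrated parameter set; a minimal sketch (with the value of k(v) at the measured velocity supplied as an input, since the extended ansatz itself is defined in Chap. 2, and the function name being illustrative) might read:

```python
def loop_to_radiation_ratio(v, k_of_v, c, d, k0, r):
    """Ratio of loop-chopping to radiative energy losses, as in Eq. 4.16:
    (c * v) / (d * [k0 - k(v)]^r)."""
    return (c * v) / (d * (k0 - k_of_v) ** r)
```

With the Hot-case parameters of Table 4.6 (c = 0.34, d = 0.21, k0 = 1.37, r = 1.85), the ratio grows for faster expansion rates: the velocity drops, but k(v) approaches k0, so the radiative term shrinks faster, reproducing the trend of Eqs. 4.17–4.19.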
As a last concluding remark we can also compare this calibration of the VOS
model with the calibrations of the standard VOS for both Nambu-Goto and field
theory simulations. In [34] it was found that Nambu-Goto simulations (in radiation
and matter eras) result in a value of c = 0.23 ± 0.04, whereas for field theory simulations a value of c = 0.57 ± 0.05 is preferred. Our new result differs from the
former significantly (at the level of two standard deviations) which is to some extent
expected, as c can have correlations with other parameters (this will be analyzed in
greater detail later on), in particular note that the form of k(v) is also very different
in Nambu-Goto and Abelian-Higgs. Compared to field theory simulations, the new
result is also smaller, which can be expected due to combined effect of the noninclusion of an explicit radiative term and a different k(v). In addition, at the time of
writing the results whereupon this section is based, we knew not how certain possible
sources of systematics could impact our results and the conclusions withdrawn. As
such, the rest of the chapter is an exercise in improving these calibrations, walking
towards a definitive calibration of the VOS for field theory strings.
4.3 Abelian-Higgs Cosmic Strings
119
4.3.2 Overcooled Initial Conditions
Having already established what a preliminary calibration of the Extended VOS
for field theory gauged strings predicts, we now need to march towards a definitive
calibration, with all possible sources of systematics under control. Although most
of these sources of systematic error will require more computing firepower, there is
one source which can be investigated without the need of larger lattices.
This source pertains to the use of cooling procedures in the initial conditions. In
many instances of (field theory) gauged string simulations, the simulation is started at
a later time to reduce the offset η0 to zero; this, however, can significantly delay the formation
of a string network via Hubble damping (depending on the size of the Hubble length
relative to the string radius). As such, a period of cooling is applied to accelerate the
formation of strings. Therein also lies an additional advantage: most random initial
conditions have large gradients, which result in the appearance of extra radiation
in the simulation box that cannot easily be removed. This radiation manifests itself as
an oscillation of the Lagrangian estimator (see Fig. 4.7, top left-hand panel),
or as oscillations in the velocity computation (top right-hand panel of the same figure).
This radiation is not damped out at sufficiently low expansion
rates (being evident, for instance, in the radiation epoch). Cooling (also known as gradient
flow) corresponds to evolving the fields according to the discrete form of the following
equations of motion,
    φ̇ = D_j D_j φ − (λ/2) (|φ|² − 1) φ    (4.20)

    Ḟ_{0j} = ∂_i F_{ij} − 2 a² e² Im[φ* D_j φ] ,    (4.21)

which can be obtained from the physical equations of motion for the system by setting
all second-order time derivatives of the fields to zero and removing the Hubble damping
(i.e. setting ȧ/a to zero). The timestep size for this period is set to
δη = 1/30, and the parameters λ and e are set to the same values as the corresponding
physical couplings λ0 and e0 used in the cosmological part of the run. Since the extended VOS
model explicitly accounts for separate energy loss mechanisms, in the form of loop
chopping and radiation (scalar or gauge) emission, calibrations with differing degrees
of cooling will allow us to pinpoint exactly how much radiation is removed, if there is
any enhancement of loop production (to sustain scaling), or how small-scale structure
might be affected.
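As an illustration of the update these equations imply, here is a minimal sketch of a discrete gradient-flow ("cooling") step for the scalar sector alone (a simplified global limit: the gauge field and covariant derivatives of Eqs. 4.20–4.21 are omitted, and the lattice spacing and step size are illustrative, not the thesis's values):

```python
import numpy as np

def laplacian(phi, dx=1.0):
    """Nearest-neighbour lattice Laplacian with periodic boundaries."""
    lap = -2.0 * phi.ndim * phi
    for ax in range(phi.ndim):
        lap += np.roll(phi, 1, axis=ax) + np.roll(phi, -1, axis=ax)
    return lap / dx**2

def cooling_step(phi, lam=2.0, dtau=0.05, dx=1.0):
    """One explicit gradient-flow step: phi += dtau * [lap(phi) - (lam/2)(|phi|^2 - 1) phi],
    i.e. the discrete form of Eq. 4.20 with the gauge field switched off."""
    return phi + dtau * (laplacian(phi, dx) - 0.5 * lam * (np.abs(phi) ** 2 - 1.0) * phi)
```

Iterating such steps from η_cool to η = 1 (with δη = 1/30 in the text) relaxes the large random gradients of the initial conditions toward the vacuum manifold |φ| = 1, which is precisely the radiation-removal effect discussed above.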
To this effect we will then produce three sets of simulations, with three different
amounts of cooling:
• Standard, no cooling case, where simulations start at conformal time η = 1. This
is basically the calibration set from the previous section, and we will refer to it as
Hot case;
• A case with some amount of cooling, with the same initial conditions as the
previous set but with a dissipation period applied from time ηcool = −10.0 to η = 1.0.
This will be denominated the Warm case;
Fig. 4.7 The evolution of mean string separation (left) and mean velocity squared (right) according
to the Lagrangian estimator and the equation of state estimator, averaged for sets of 12 runs at each
expansion rate in the range [0.50, 0.95]. The top panel shows the results for the Hot case (standard
case, without cooling), while the middle and bottom panels show the Warm and Cold cases. Low
expansion rates are at the top of the panels while high expansion rates are at the bottom of the
panels. All simulations have box sizes 512³ with constant comoving width (PRS algorithm)
• A third case with more cooling applied, i.e. with the same initial conditions but a
cooling period from ηcool = −50.0 until the initial conformal time η = 1.0. We will
refer to this overcooled case as Cold.
We remark that in all three cases the cosmological evolution starts at conformal time
η = 1.0, and the only difference in terms of initial conditions for this period is
the degree of cooling. All calibrations of the VOS are done with the scaling
regime observed during the cosmological evolution, so as to ascertain the effects of
varying degrees of cooling on the obtained parameters. For each set of simulations
we will use 43 expansion rates in the range m ∈ [0.5, 0.95], in FLRW universes with
the scale factor varying as a ∝ t^m, each with 12 runs, exactly as was done for the
standard case. Radiation and matter epochs correspond to m = 1/2 and m = 2/3,
respectively.
In order to re-write the VOS in the more compact form, and to calibrate it, we will
measure two quantities from the simulations, ε and v0, with the first being given by

    ε = ξ / [ (η − η0)(1 − m) ] ,    (4.22)

and v0 being the square root of the mean squared velocity at scaling. We note again that η0 is
nothing more than an offset dependent on the initial conditions (see the previous section).
The first quantity implicitly assumes that one can approximate the slope of ξ (the rate of
change of ξ) by ξ/η or, given the presence of the offset, by ξ/(η − η0). As before, this
means we expect the following scaling laws,

    ξ ∝ (η − η0)^μ    (4.23)

    v ∝ η^ν ,    (4.24)

with the analytical expectation that the exponents μ and ν approach unity and zero,
respectively, once the network exhibits scaling behavior. As done before for the
standard case (previous section), we will now verify the quality of scaling from the fitted
values of these exponents and use them to select a “good enough” calibration
conformal time range. For the Warm and Cold cases we will therefore use a more
stringent fitting range of η ∈ [100, 128]. The scaling exponents obtained and the
network parameters for each expansion rate are listed in Table 4.5.
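The fits behind Eqs. 4.23–4.24 can be sketched as simple power-law fits in log-log space (the array names and the fitting window are illustrative; the thesis's actual pipeline details are not reproduced here):

```python
import numpy as np

def scaling_exponents(eta, xi, v, eta0=0.0, window=(100.0, 128.0)):
    """Fit xi ~ (eta - eta0)^mu and v ~ eta^nu over a conformal-time window,
    returning the slopes of the corresponding log-log linear fits."""
    sel = (eta >= window[0]) & (eta <= window[1])
    mu = np.polyfit(np.log(eta[sel] - eta0), np.log(xi[sel]), 1)[0]
    nu = np.polyfit(np.log(eta[sel]), np.log(v[sel]), 1)[0]
    return mu, nu
```

At scaling one expects μ ≈ 1 and ν ≈ 0, so deviations of the fitted exponents from these values quantify how far a given conformal-time range is from scaling.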
With these network parameters measured from the simulations, we can now calibrate the Extended VOS via the same bootstrapping procedure as in the previous
sections (and in the earlier domain wall calibrations of [32, 33]). The resulting
parameters from this procedure for the three sets of simulations are shown in Table
4.6. A comparison between the VOS predictions and the simulation data for the asymptotic quantities is shown in Fig. 4.8 (top panels), which shows that the model accurately
describes the simulations. In addition, one can also use the inverted VOS equations to
display a measured energy loss function or momentum parameter and compare with
ν_warm
0.130±0.006
0.121±0.006
0.120±0.006
0.118±0.006
0.121±0.006
0.134±0.006
0.141±0.005
0.142±0.006
0.139±0.006
0.144±0.007
0.154±0.006
0.159±0.007
0.163±0.007
0.167±0.007
0.157±0.008
μ_warm
0.005±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.003±0.001
0.003±0.001
0.004±0.001
0.003±0.001
0.004±0.001
m
0.5
0.51
0.52
0.53
0.54
0.55
0.56
0.57
0.58
0.59
0.6
0.61
0.62
0.63
0.64
0.789±0.058
0.769±0.052
0.755±0.056
0.741±0.054
0.725±0.053
0.705±0.054
0.689±0.053
0.669±0.051
0.652±0.051
0.639±0.051
0.621±0.050
0.606±0.051
0.595±0.049
0.581±0.047
0.572±0.049
ε_warm
0.503±0.015
0.509±0.015
0.514±0.014
0.519±0.014
0.523±0.014
0.526±0.014
0.531±0.014
0.534±0.013
0.538±0.012
0.541±0.012
0.543±0.012
0.545±0.013
0.548±0.012
0.550±0.012
0.553±0.012
v²_warm
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
0.005±0.001
μ_cold
0.110±0.005
0.108±0.005
0.107±0.005
0.105±0.005
0.106±0.004
0.104±0.004
0.103±0.004
0.099±0.004
0.101±0.004
0.096±0.004
0.093±0.004
0.090±0.004
0.086±0.004
0.082±0.004
0.085±0.003
ν_cold
0.774±0.064
0.757±0.062
0.741±0.063
0.723±0.061
0.705±0.061
0.688±0.058
0.673±0.057
0.657±0.057
0.640±0.055
0.626±0.054
0.610±0.053
0.597±0.051
0.585±0.049
0.572±0.047
0.560±0.047
ε_cold
0.486±0.016
0.492±0.016
0.497±0.016
0.502±0.016
0.506±0.016
0.510±0.015
0.514±0.015
0.518±0.015
0.522±0.015
0.526±0.014
0.529±0.013
0.532±0.013
0.536±0.013
0.539±0.012
0.542±0.012
v²_cold
Table 4.5 Scaling exponents μ and ν and network parameters used for VOS calibration for the Warm and Cold initial conditions cases. One-sigma statistical uncertainties,
from averaging sets of 12 simulations, are reported throughout
0.182±0.010
0.166±0.011
0.134±0.013
0.105±0.014
0.092±0.014
0.073±0.014
0.074±0.015
0.053±0.014
0.036±0.016
0.020±0.017
0.038±0.018
0.067±0.019
0.087±0.018
0.092±0.019
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.003±0.001
0.003±0.001
0.69
0.71
0.72
0.73
0.74
0.75
0.76
0.77
0.78
0.8
0.82
0.83
0.84
0.68
0.7
0.165±0.009
0.182±0.009
0.004±0.001
0.004±0.001
0.6(6)
ν_warm
μ_warm
m
Table 4.5 (continued)
1.534±0.104
1.469±0.098
1.407±0.100
1.295±0.093
1.203±0.085
1.160±0.084
1.119±0.082
1.078±0.083
1.037±0.075
1.003±0.073
0.974±0.071
0.942±0.066
0.918±0.065
0.895±0.064
0.870±0.062
0.834±0.064
ε_warm
0.351±0.014
0.361±0.015
0.371±0.016
0.390±0.017
0.408±0.016
0.417±0.015
0.424±0.016
0.431±0.017
0.438±0.016
0.445±0.016
0.452±0.017
0.459±0.016
0.466±0.016
0.473±0.016
0.480±0.016
0.487±0.016
v²_warm
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.004±0.001
0.005±0.001
μ_cold
0.197±0.008
0.168±0.008
0.152±0.007
0.122±0.006
0.112±0.006
0.111±0.006
0.108±0.006
0.108±0.006
0.107±0.006
0.114±0.005
0.113±0.005
0.108±0.006
0.110±0.005
0.112±0.005
0.105±0.005
0.107±0.005
ν_cold
1.470±0.100
1.414±0.096
1.362±0.092
1.266±0.088
1.177±0.084
1.137±0.082
1.101±0.078
1.067±0.077
1.035±0.076
1.005±0.075
0.973±0.075
0.943±0.071
0.916±0.071
0.889±0.07
0.864±0.068
0.833±0.066
ε_cold
0.332±0.014
0.343±0.015
0.353±0.015
0.372±0.016
0.390±0.016
0.398±0.017
0.407±0.018
0.415±0.017
0.423±0.017
0.431±0.017
0.438±0.017
0.445±0.016
0.452±0.017
0.458±0.016
0.464±0.016
0.472±0.016
v²_cold
ν_warm
0.122±0.017
0.140±0.016
0.158±0.014
0.161±0.016
0.147±0.016
0.145±0.015
0.127±0.013
0.095±0.011
0.093±0.011
0.096±0.009
0.081±0.009
μ_warm
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.004±0.001
0.004±0.001
0.003±0.001
0.002±0.001
m
0.85
0.86
0.87
0.88
0.89
0.9
0.91
0.92
0.93
0.94
0.95
Table 4.5 (continued)
3.190±0.183
2.820±0.199
2.544±0.204
2.339±0.181
2.176±0.152
2.043±0.128
1.928±0.127
1.824±0.127
1.736±0.125
1.663±0.117
1.598±0.110
ε_warm
0.202±0.004
0.218±0.005
0.234±0.007
0.249±0.008
0.263±0.009
0.278±0.01
0.291±0.011
0.303±0.011
0.317±0.012
0.328±0.013
0.340±0.013
v²_warm
0.002±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
0.003±0.001
μ_cold
0.240±0.003
0.227±0.003
0.221±0.003
0.228±0.004
0.227±0.005
0.228±0.006
0.234±0.006
0.228±0.007
0.222±0.007
0.216±0.008
0.212±0.008
ν_cold
2.774±0.184
2.517±0.185
2.311±0.177
2.147±0.166
2.016±0.154
1.908±0.136
1.812±0.127
1.729±0.120
1.656±0.115
1.591±0.110
1.529±0.105
ε_cold
0.183±0.004
0.200±0.005
0.215±0.006
0.230±0.007
0.245±0.008
0.259±0.009
0.272±0.010
0.285±0.011
0.298±0.012
0.309±0.013
0.321±0.014
v²_cold
Table 4.6 Calibrated VOS model parameters for our three cooling scenarios: Hot (standard), Warm
and Cold initial conditions. These were obtained through the previously used bootstrap methods
Case    d            r            β            k0           q            c            Reference
Hot     0.21±0.01    1.85±0.11    1.46±0.07    1.37±0.07    2.30±0.04    0.34±0.02    Previous section
Warm    0.26±0.01    1.58±0.10    1.29±0.06    1.21±0.06    2.05±0.04    0.36±0.03    This section
Cold    0.17±0.01    1.64±0.09    1.91±0.03    0.97±0.03    2.38±0.02    0.56±0.01    This section
Fig. 4.8 Top panels: the string network average velocity and the dimensionless comoving string separation, v = (v̄²)^{1/2} and ξ/η, respectively in the left and right panels, for the three cooling scenarios.
Bottom panels: the momentum parameter and the energy loss function (left and right panels, respectively) for the same cooling scenarios. In all cases the error bars are the statistical uncertainties from
averaging over 12 simulations with different initial conditions, and the solid line is the prediction
from the VOS model, with the calibrated parameters listed in Table 4.3. For convenience the values
corresponding to simulations in the radiation and matter eras have been highlighted
the analytical ansatz for their forms (inserting of course the parameters as appropriate). This is shown in the bottom panels of Fig. 4.8.
Our results support the expectation that a small amount of cooling has very
little impact on the network evolution, as it merely removes thermal oscillations.
Indeed, from Fig. 4.8 and Table 4.6 we can see that the Warm case parameters
bring about a better agreement between the model predictions and the simulation data,
and that the changes in the parameters are not statistically significant, especially
considering that the (purely statistical) parameter uncertainties might be optimistic;
more on this in the next section.
In contrast, for the case with a large amount of cooling (cold) there are clear
differences which are much more significant from a statistical point of view. Take
for instance the observationally important loop chopping parameter c. This parameter
clearly increases with excessive cooling: c = 0.34 → 0.56 when going from the Hot
to the Cold case. This can be interpreted as the analytical model correctly identifying
the reduced amount of radiation in the box as being compensated by a larger loop
production term.
Another parameter of note is the maximal value of the momentum parameter, k0 .
As previously mentioned this parameter can indicate that the curvature radius is not
the same as the correlation length or the mean string separation, which acts as a clue
for the presence of small-scale structure. Going from the Hot to the Warm and then
Cold case, there are clear statistically significant differences as the best-fit values are
reduced from 1.37, to 1.21 and then to 0.97 respectively. We remark that this effect is
rather subtle as the velocities are not particularly affected, with only the correlation
length showing some differences (statistically significant when going from Warm to
Cold cases, but see next section).
4.3.2.1 An Improved Calibration Pipeline
In preparation for the larger runs (4096³ and 8192³) used to investigate possible
systematic sources of error in the calibration of the Extended VOS, we sought to
improve our calibration pipeline. The more robust pipeline results from two added
features: full uncertainty propagation, together with a new computation of
the offset η0, and Bayesian inference for parameter estimation.
Considering the existence of parameter degeneracies and the possible expansion rate
dependence of the uncertainties (specifically for ε), we must discuss the impact of each
feature on the previous conclusions.
The first improvement (full uncertainty propagation) relies on the uncertainties
Python package to automatically propagate uncertainties for the velocity and the mean
string separation. For each case, at each expansion rate, the average and standard
deviation are obtained and stored in ufloat arrays from the aforementioned uncertainties package. In addition, we compute the offset η0 from each run, storing the
mean offset and its standard deviation in such an array (for each expansion rate). The
new uncertainties can be seen in the top two panels of Fig. 4.9. Velocity uncertainties
are not significantly affected by this procedure; however, since the computation of
the offset has changed, the uncertainties of ε increased.
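For concreteness, the first-order propagation that the uncertainties package performs automatically can be sketched by hand for ε = ξ/[(η − η0)(1 − m)] (the function and argument names here are illustrative, not the pipeline's):

```python
import math

def epsilon_with_uncertainty(xi, sig_xi, eta, eta0, sig_eta0, m):
    """First-order (linear) error propagation for eps = xi / ((eta - eta0)(1 - m)),
    treating xi and eta0 as the uncertain inputs."""
    denom = (eta - eta0) * (1.0 - m)
    eps = xi / denom
    d_xi = 1.0 / denom                             # d(eps)/d(xi)
    d_eta0 = xi / ((eta - eta0) ** 2 * (1.0 - m))  # d(eps)/d(eta0)
    sig_eps = math.sqrt((d_xi * sig_xi) ** 2 + (d_eta0 * sig_eta0) ** 2)
    return eps, sig_eps
```

This makes explicit why the new offset computation enlarges the uncertainty of ε: any scatter in η0 across runs now feeds directly into σ_ε through the second term.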
From here on out, all uncertainties are propagated automatically, which is much
more convenient, less error-prone, and independent of the structure of the VOS given
to the pipeline. This can already be seen in action in the uncertainty computation of
F(v) and k(v) in the bottom panels of Fig. 4.9. Broadly speaking, the uncertainties
Fig. 4.9 Same as Fig. 4.8, but including the uncertainty propagation described in the main text and
with the solid line now being the prediction from the VOS model with the calibrated parameters
listed in Table 4.6
Table 4.7 Same as Table 4.3, but including the uncertainty propagation described in the text
Case    d            r            β            k0           q            c            Reference
Hot     0.20±0.01    2.06±0.13    1.54±0.06    1.38±0.02    2.38±0.03    0.35±0.01    This section
Warm    0.21±0.01    1.68±0.12    1.41±0.05    1.27±0.02    2.24±0.03    0.37±0.01    This section
Cold    0.19±0.01    2.00±0.10    1.95±0.03    0.98±0.01    2.45±0.01    0.58±0.01    This section
of F(v) now show the opposite pattern of what is seen on the previous figure: they
become larger at smaller expansion rates, and reduced at high ones.
Regarding the VOS model parameters, we must now verify the impact of the new
uncertainty computation procedure. The new parameters for all cases are listed in
Table 4.7. Comparing with the previous parameters, the changes are small, within at
most two standard deviations. The changes do seem larger in the Cold case than in
the Hot and Warm cases, and overall the uncertainties of the model parameters seem to
decrease. This can be due to a multitude of reasons, including parameter degeneracies
or even uncertainty underestimation. The latter can result from a subtle point:
the difference between parameter distributions marginalized
over the values of the other parameters and distributions conditional on them.
For these reasons, we improve the VOS calibrator tool by implementing Markov
Chain Monte Carlo (MCMC) capabilities. To be more specific, we used the emcee¹
[22] package to implement such Bayesian inference capabilities. Besides improving
uncertainty estimation, this will also allow us to test whether the best-fit parameters obtained
via bootstrapping minimization lie on global minima, and to unveil any parameter
interdependencies. In order to obtain posterior distributions for all parameters, we
assume uniform priors (implemented as logarithmic probability density functions) and a
log-likelihood computed via the well-known χ² statistic, defined as

    χ² = Σ_m { [v_predicted(m) − v_simulated(m)]² / σ_v²  +  [ε_predicted(m) − ε_simulated(m)]² / σ_ε² } .    (4.25)
With a minimum of 10000 steps and 32 walkers we achieve convergence in all
cases. In addition, the mean acceptance rate is always at least 0.4.
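Schematically, the log-posterior handed to emcee combines flat priors with the χ² of Eq. 4.25. In the sketch below, the prior bounds and the model callables are illustrative assumptions, not the thesis's actual choices:

```python
import numpy as np

# Hypothetical flat-prior bounds for the parameters (d, r, beta, k0, q, c).
BOUNDS = [(0.0, 2.0), (0.0, 10.0), (0.0, 5.0), (0.0, 3.0), (0.0, 5.0), (0.0, 2.0)]

def log_prior(theta):
    """Uniform (flat) prior: 0 inside the bounding box, -inf outside."""
    for val, (lo, hi) in zip(theta, BOUNDS):
        if not lo < val < hi:
            return -np.inf
    return 0.0

def log_posterior(theta, v_sim, sig_v, eps_sim, sig_eps, v_model, eps_model):
    """Log prior plus log likelihood = -chi^2 / 2, with chi^2 as in Eq. 4.25.
    v_model and eps_model are callables returning the VOS predictions for theta."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    chi2 = np.sum((v_model(theta) - v_sim) ** 2 / sig_v ** 2) + \
           np.sum((eps_model(theta) - eps_sim) ** 2 / sig_eps ** 2)
    return lp - 0.5 * chi2
```

With emcee, a function of this shape would be passed to an EnsembleSampler with 32 walkers and run for at least 10000 steps, as described in the text.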
The resulting posterior distributions can be seen in Figs. 4.10, 4.11 and 4.12,
respectively for the Hot, Warm and Cold cases, with blue solid lines indicating the
minimization best-fit results. We also report the 50th quantile (as the best-fit value)
and use the 16th and 84th quantiles (represented as black dashed lines on the corner
plots) for the uncertainties in Table 4.8. We can make some first remarks on the
aforementioned figures. For instance, the minima found by minimization normally
coincide with the likelihood peak found via MCMC methods. However, in the specific
case of r, the posterior distribution widens with the amount
of cooling, and it becomes possible for minimization to find an incorrect minimum: this is
evident in the Cold case (see Fig. 4.12).
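Extracting the reported summary from an MCMC chain is a one-liner per parameter; a sketch of the 16th/50th/84th-quantile convention used here (assuming a flat array of posterior samples for one parameter):

```python
import numpy as np

def summarize_posterior(samples):
    """Return (median, +err, -err) from the 16th, 50th and 84th percentiles,
    matching the quoting convention used for Table 4.8."""
    q16, q50, q84 = np.percentile(samples, [16.0, 50.0, 84.0])
    return q50, q84 - q50, q50 - q16
```

For a strongly asymmetric posterior, as found below for d in the Cold case, the two error bars differ noticeably and the median can sit away from the likelihood peak.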
Note as well the asymmetry that arises in the d posterior for the Cold case, to the
point where the likelihood peak lies not at the 50th quantile but roughly at the 16th
quantile (we always quote the 16th, 50th and 84th quantiles in tables). This can be
attributed to the reduction of radiative losses from the network, which erases some
information useful for calibrating the model. We remark as well that while the uncertainty
of d in the Warm and Hot cases is relatively similar to the bootstrap uncertainties, in the
Cold case the uncertainty is indeed larger. Overall, the uncertainties of the parameters c, k0
and d (with the exception mentioned above) are in line with the previous bootstrap
uncertainties, while those of q, β and r were previously underestimated. The parameter
that is least well determined is (somewhat unsurprisingly) r.
To conclude, we also note the degeneracies that exist between parameters
(clearly seen in the corner plots). Several of these degeneracies can be physically
explained in the context of the VOS. For instance, let us take the specific example of
k0, which is negatively correlated with q. Even if k0 reflects the maximal value of the
momentum parameter k(v), and is therefore largely determined by the large expansion
rate regime, it is an indicator of the presence of small-scale structure. And the regime
where small-scale structure will be more obvious is the low expansion rate limit,
where it will have an effect on the velocity, and thus on q. Negative correlations
¹ https://emcee.readthedocs.io/en/stable/.
Fig. 4.10 The corner plots for the posterior distributions in the Hot (standard) case. Above the
1D histogram for each variable we report the 50th quantile and use the 16th and 84th quantiles
to compute and show uncertainties. These three quantiles are indicated by the dashed black lines.
Contour plots between pairs of parameters are also shown. The blue lines (and dots) represent the
values found via the bootstrapping procedure
for k0 also include d and β. Positive correlations exist for other parameters, such as
for c and r .
4.3.3 Further Exploration of Model Sensitivity to Numerical Choices
Having improved the robustness of our VOS calibration tool, and with the computational resources of Piz Daint at hand (1 million node hours), we are now in a
position to explore systematic errors which require the use of large lattices. We
will first explore the impact of dynamic range (or equivalently, lattice size) and then
we will move to attempting to understand the impact of different velocity estimators
on VOS model calibration. In all cases we will attempt to understand the necessary
numerical choices to compensate for the studied systematic error sources and obtain
a definitive VOS calibration.
4.3.3.1 Dynamic Range and Lattice Size
Field theory simulations have a problem of separation of scales. That is, one wishes
to understand the properties of the network at the level of mean string separation,
but also have a sufficiently large resolution to describe behavior happening at small
Fig. 4.12 Same as Fig. 4.10, for the Cold case
Table 4.8 Same as Tables 4.6 and 4.7, but using the Bayesian inference method described in the
text. We always report the 50th quantile value, with the 16th and 84th being used for computing the
uncertainties

Case    d                   r                   β                   k0                  q                   c                   Reference
Hot     0.20 (+0.03/−0.03)  2.11 (+0.50/−0.42)  1.55 (+0.19/−0.18)  1.37 (+0.07/−0.06)  2.38 (+0.08/−0.09)  0.35 (+0.03/−0.04)  This section
Warm    0.21 (+0.04/−0.04)  1.88 (+0.79/−0.53)  1.42 (+0.23/−0.20)  1.27 (+0.09/−0.07)  2.24 (+0.12/−0.11)  0.39 (+0.05/−0.07)  This section
Cold    0.37 (+0.35/−0.19)  3.74 (+1.81/−1.70)  1.94 (+0.31/−0.27)  0.98 (+0.04/−0.04)  2.45 (+0.13/−0.14)  0.59 (+0.02/−0.03)  This section
scales, close to the string radius for instance. The size of the lattice (multiplied by
lattice spacing) will dictate the final conformal time and hence available dynamic
range, beyond which boundary effects are expected to play a more dominant role.
This means that, for the same lattice spacing, larger lattices will resolve scales down to
smaller fractions of the horizon (as the horizon can grow to larger sizes). This should
have a visible impact on the calibration of the VOS, given that some parameters are
intimately connected to effects occurring at small scales.
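A back-of-the-envelope version of this scale-separation argument (assuming, as is conventional for periodic boxes, that the evolution is stopped at the half-light-crossing time; the function name is illustrative) might read:

```python
def dynamic_range(n_lattice, dx=0.5):
    """Final conformal time for a periodic box evolved to half its light-crossing
    time, and the horizon fraction resolved by a single lattice site at that time."""
    eta_final = n_lattice * dx / 2.0
    return eta_final, dx / eta_final
```

Under this assumption, the 512³ runs of the previous sections reach η = 128, matching the fitting ranges quoted there, while 1024³, 2048³ and 4096³ boxes reach η = 256, 512 and 1024 and thus resolve progressively smaller fractions of the horizon.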
To explore this issue we will consider lattices of sizes 1024³, 2048³ and 4096³,
with a single lattice spacing of Δx = 0.5. We will simulate 25 expansion rates m
(again assuming a power-law scale factor) in the range m ∈ [0.5, 0.95] and, in order
to reduce statistical errors, we will use 10 runs for each of these. Throughout this
section we will use the winding-based correlation length estimator ξ_W
defined in Eq. 3.17. We will also, both as a cross-check and to segue into the next
section, consider two different velocity estimators, the equation-of-state based v²_ω and the conjugate-momenta based v²_φ, defined in Eqs. 3.20 and 3.18 respectively.
In terms of changes to the measured asymptotes, we see a slow drift in the values
of ξ/η. This is represented in Fig. 4.13, where in the top-left panel the shaded region
Fig. 4.13 Comparison of the mean rate of change of correlation length ξ/η (top left) and the mean
velocity v (top right) with the solid lines corresponding to the calibration and the shaded regions
to the uncertainty of the measurements of each estimator for three different box sizes. The bottom
plots show how these differences impact the momentum parameter k(v) (bottom left) and the
energy loss parameter F(v) (bottom right)
corresponds to the simulation values (with statistical uncertainties) and the solid lines
correspond to the VOS predictions. We remark that this drift has been reported in the
literature before [11, 24] and can also be seen in our work (previous chapter, section
on the validation of the multi-GPU application). Note, however, that this drift has been partially attributed to the presence of cooling in the preparation of the initial conditions, at least when going from 512³ to 1024³ (see [11]). In the simulations in this section we did not apply any cooling; this drift is therefore entirely due to lattice size. On the top right-hand panel of Fig. 4.13, we also see that little to no change takes place in the values of the velocity (nor in its uncertainties).
There is also an additional behavior to note: the uncertainties of ξ/η are reduced with an increase in lattice size. This is a result of the increased dynamic range reducing the importance of the initial-conditions offset. It is also of note that all changes (slow drift, reduced uncertainties) are qualitatively smaller when going from 2048³ to 4096³ than from 1024³ to 2048³. This seems to suggest that, much like in the domain walls case [32], there is a minimum lattice size for model calibration, beyond which parameter values are stable.
These differences will consequently impact the velocity-dependent functions and in turn affect the model parameters. This can be visually inferred from the bottom panels of Fig. 4.13 and from the 1σ and 2σ contours of the corner plots of Fig. 4.14. The different calibrations are also summarized in Table 4.9. To show how the shapes of the velocity-dependent functions and the parameters are connected, let us take the example of the energy loss function F(v). As the resolution increases, and ξ/η decreases, F(v) shifts downwards, which suggests a reduction of the normalization parameters c and/or d. The impact on the shape of the momentum parameter k(v) is however much less noticeable, with changes being circumscribed to a reduction of its maximal value k0. These expectations are confirmed in Table 4.9 and in the corner plots of Fig. 4.14, where k0 and c are reduced and d (to a lesser extent) increases. The anti-correlation of c and d is explained by the fact that both control the relative importance of different mechanisms in the energy loss function. This anti-correlation also means that loop formation is gradually replaced by radiative losses, eventually becoming negligible at 4096³. The effect of reduced uncertainties is also manifest in the decreasing area of the confidence contours, and the smaller changes from 2048³ to 4096³ visually confirm that most parameters, with the notable exception of c, are close to their stable values.
However, before we declare that for gauged cosmic strings loop-chopping plays no dominant role at any expansion rate (much as for domain walls [32]), there are two clues suggesting this could be an incorrect conclusion. Visually inspecting networks in the radiation era reveals the formation and evolution of loops of different sizes (both large, horizon-sized loops and small ones). Screenshots of a 4096³ radiation era simulation can be seen in Fig. 4.16 for conformal times η = 700, 710 and 714. The full animations (colored either by velocity or by group of string cells) spanning the full conformal time η ∈ [741, 1024] are available at the following webpages [14, 15]. The fact that such loops are present is at odds with the information that c → 0.
Fig. 4.14 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vω, for three different box sizes. The 2D panels depict the 1σ and 2σ confidence regions
Table 4.9 Calibrated VOS model parameters for our three different lattice sizes, 1024³, 2048³ and 4096³, all with the same lattice spacing Δx = 0.5, and two different choices of velocity estimators, vω² and vφ² (in the top and bottom parts of the table, respectively), further described in the main text. Displayed values correspond to 16th, 50th, 84th percentiles of the posterior distributions

Lattice size | Δx | Velocity estimator | d | r | β | k0 | q | c
1024³ | 0.5 | vω² | 0.32 (+0.04/−0.04) | 1.51 (+0.48/−0.37) | 1.82 (+0.34/−0.30) | 1.27 (+0.08/−0.06) | 2.41 (+0.13/−0.13) | 0.15 (+0.05/−0.07)
2048³ | 0.5 | vω² | 0.37 (+0.02/−0.02) | 1.27 (+0.17/−0.15) | 2.33 (+0.21/−0.20) | 1.21 (+0.03/−0.03) | 2.57 (+0.06/−0.06) | 0.03 (+0.02/−0.03)
4096³ | 0.5 | vω² | 0.39 (+0.02/−0.02) | 1.36 (+0.15/−0.13) | 2.32 (+0.20/−0.18) | 1.18 (+0.03/−0.03) | 2.59 (+0.05/−0.05) | 0.00 (+0.01/−0.01)
1024³ | 0.5 | vφ² | 0.35 (+0.23/−0.10) | 2.39 (+1.58/−0.94) | 2.79 (+0.73/−0.56) | 1.06 (+0.05/−0.05) | 2.95 (+0.18/−0.19) | 0.44 (+0.04/−0.05)
2048³ | 0.5 | vφ² | 0.33 (+0.05/−0.04) | 1.86 (+0.39/−0.32) | 2.65 (+0.28/−0.26) | 1.05 (+0.03/−0.03) | 2.84 (+0.08/−0.08) | 0.31 (+0.02/−0.02)
4096³ | 0.5 | vφ² | 0.36 (+0.03/−0.03) | 1.72 (+0.26/−0.23) | 2.50 (+0.21/−0.20) | 1.06 (+0.02/−0.02) | 2.83 (+0.06/−0.06) | 0.23 (+0.01/−0.01)
Fig. 4.15 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vφ, for three different box sizes. The 2D panels depict the 1σ and 2σ confidence regions
The second hint is to repeat this analysis with a different velocity estimator. So far we have used the equation of state estimator vω² defined in Eq. 3.20, but we can repeat the analysis with the conjugate momenta estimator vφ of Eq. 3.18. The resulting parameter tables and corner plots are available in Table 4.9 and Fig. 4.15, respectively. The model calibration instead reveals a loop-chopping parameter that does not go to zero; in fact, even at the largest resolution, 4096³, it is statistically different from zero and rather similar to the value of d. Remarkably, d seems not to change as the resolution increases, being in fact the least affected of the six parameters. The fact that these two model calibrations result in two very different conclusions prompts an investigation of the reliability of each velocity estimator. This may eventually lead to a solution which reconciles (even if partially) the two calibrations.
Fig. 4.16 Winding centerlines displayed for a radiation epoch at three timesteps, for a 4096³ lattice with spacing Δx = 0.5. The top, middle and bottom panels correspond to conformal times η = 700, 710 and 714. We use the local velocity estimator of [45] to color the reconstructed strings. The full animation can be found at [15]
4.3.3.2 Lattice Spacing and Velocity Estimators
In order to start uncovering the origin of the two differing calibrations, we remind the reader that in the previous small-lattice calibration (at 512³) we noted that differences between estimators were more pronounced for the velocity estimators. In fact, the difference between correlation length estimators is at most about 6% at high expansion rate (m = 0.95), whereas in the same low-velocity limit the difference grows to about 10–12% for the velocity estimators. Better agreement occurs in the moderate expansion rate limit (where the radiation and matter epochs lie), with a negligible difference for the correlation length estimators and a difference of about 5% for the velocity estimators. Given that in the high expansion rate limit the VOS model reduces to two parameters,
\[ \frac{d\xi}{d\eta} = \frac{m\,\xi}{(1-m)\,\eta}\,v^{2} + c\,v \qquad (4.26) \]

\[ \frac{dv}{d\eta} = \left(1-v^{2}\right)\left[\frac{k_{0}}{\xi} - \frac{2\,m\,v}{(1-m)\,\eta}\right], \qquad (4.27) \]
being c and k0, and the fact that at larger expansion rates the uncertainties are globally smaller (thus imparting larger statistical weight), it is not entirely surprising that even a small disagreement (of order 10%) is sufficient to impact the calibration. Given how important these two parameters are, indicating either the presence of wiggliness or the importance of loop formation to the overall energy losses, we will now turn our attention to the high expansion rate limit and to finding a possible solution to this conundrum.
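To illustrate this reduced system, the scaling attractor of Eqs. 4.26 and 4.27 can be recovered numerically. The sketch below is a minimal Python illustration (with arbitrary parameter values k0 = 1.0 and c = 0.3, not the calibrated ones): it integrates the two equations with a logarithmically stepped Euler scheme and compares the result to the analytic fixed point ξ/η = k0/(2λv) and v² = k0/[λ(k0 + 2c)], where λ = m/(1 − m).

```python
import math

def integrate_reduced_vos(m, k0, c, xi0=0.2, v0=0.5, eta0=1.0,
                          eta_end=1000.0, rel_step=1e-3):
    """Euler-integrate the reduced high-expansion-rate VOS system,
    dxi/deta = lam*xi*v^2/eta + c*v,
    dv/deta  = (1 - v^2)*(k0/xi - 2*lam*v/eta),  with lam = m/(1 - m)."""
    lam = m / (1.0 - m)           # scale factor exponent: a ~ eta^lam
    xi, v, eta = xi0, v0, eta0
    while eta < eta_end:
        deta = rel_step * eta     # logarithmic stepping in conformal time
        dxi = lam * xi * v * v / eta + c * v
        dv = (1.0 - v * v) * (k0 / xi - 2.0 * lam * v / eta)
        xi, v, eta = xi + dxi * deta, v + dv * deta, eta + deta
    return xi / eta, v            # the scaling quantities (xi/eta, v)

# Illustrative (not calibrated) parameter values at expansion rate m = 0.95
m, k0, c = 0.95, 1.0, 0.3
lam = m / (1.0 - m)
v_star = math.sqrt(k0 / (lam * (k0 + 2.0 * c)))   # analytic fixed point
x_star = k0 / (2.0 * lam * v_star)
x_num, v_num = integrate_reduced_vos(m, k0, c)    # converges to (x_star, v_star)
```

Regardless of the (reasonable) initial conditions chosen, the integration settles onto the linear scaling solution ξ ∝ η with constant v, which is the attractor whose position the parameters c and k0 control.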
Throughout the literature it can also be seen that certain potential systematics may arise in this high expansion rate/low velocity limit. For instance, for Abelian-Higgs in flat space it is (analytically) expected that vortices, provided their velocities are small enough (non-relativistic, v < 0.2), will pin to lattice sites, unable to overcome a potential barrier between sites [44]. This barrier is known as the Peierls-Nabarro barrier [35, 37]. Such an effect would manifest itself as a lattice spacing dependence of the measured velocity-dependent functions, k(v) and F(v), and could be a limiting factor in the calibration of the VOS (especially deep in the non-relativistic regime).
Continuing on the issue of lattice spacing, in [24] it was shown that in Minkowski (flat) space the equation of state velocity estimator was closer to the analytical expectation for the velocity of an oscillating string than the conjugate momenta estimator. In addition, reducing the lattice spacing improved agreement between estimators, bringing the latter closer to the former, and both closer to the analytical expectation. This follows as the increase in resolution can be seen as an approximation to the continuum limit. The conclusion that in Minkowski space the equation of state estimator is more reliable might not, however, apply in the opposite limit.
Fig. 4.17 The effect of lattice spacing on the velocity estimators, for high expansion rate simulations. The top panels show the separate values of the velocities (with the corresponding statistical uncertainties) obtained with the two velocity estimators defined in the text, while the bottom panels show the relative difference between the two. Left and right side panels depict the results for standard spacing and half-lattice spacing
With the goal of understanding how lattice spacing might impact the behavior of both velocity estimators in this critical regime, we will characterize the differences in velocities from the high expansion rates used for relativistic calibrations (m = 0.93, 0.94, 0.95) all the way into the deep non-relativistic regime. We will do so for two sets of simulations, corresponding to two lattice spacings, Δx = 0.5 and Δx = 0.25. At each expansion rate, the two velocities will have a statistical uncertainty and will represent the average of 10 runs; as is standard, the same 10 initial conditions are used for both sets.
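The run-averaging just described can be sketched generically. The following Python fragment is our own illustration (hypothetical data, not the actual measurement pipeline): it computes the mean and standard error over a set of runs, plus the relative difference between the two estimators shown in the bottom panels of Fig. 4.17.

```python
import math

def mean_and_error(values):
    """Run-averaged value and its standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)

def relative_difference(v_omega, v_phi):
    """Relative difference between the two estimators, as plotted in Fig. 4.17."""
    return (v_omega - v_phi) / v_phi
```

Since the same 10 initial conditions are used for both spacings, differences between the two sets are dominated by the spacing itself rather than by realization scatter.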
From the top panels of Fig. 4.17 one can already see that the difference between estimators increases with expansion rate, and that reducing the lattice spacing improves agreement between them. More specifically, for Δx = 0.5 the relative difference is maximally of order 60% at m = 0.997 and minimally of order 10% at m = 0.93, while agreement is significantly improved at Δx = 0.25: the lower expansion rates are now in near agreement, and in the non-relativistic limit the difference is at most about 30%. Qualitatively, the largest change is in the equation of state estimator, whose velocity vω approaches that of the conjugate momenta, vφ. Although in Minkowski space, and at most of the expansion rates analyzed here, the velocity from the equation of state estimator underestimates the conjugate momenta
velocity, at the largest expansion rate m = 0.997 we can see the opposite. The reason for this is unclear, although an inversion of this tendency might signal a possible breakdown of the reliability of vω.
Fig. 4.18 The effect of lattice spacing on the velocity estimators, as manifest in the velocity-dependent functions of the VOS model, for high expansion rate simulations. The top panels show the momentum parameter k(v) while the bottom panels show the energy loss parameter F(v), all with the corresponding statistical uncertainties, obtained with the two velocity estimators defined in the text. Left and right side panels depict the results for standard and half-lattice spacing
These behaviors should translate into a change in the shapes of k(v) and F(v): a larger disagreement should be obvious at coarser lattice spacing, and the differences should be larger at large expansion rates. Both conclusions follow from the panels of Fig. 4.18. Comparing the panels of the aforementioned figure, we confirm that a small lattice spacing is necessary to assess the proper shape of the velocity-dependent functions and therefore calibrate the VOS adequately. We note as well one detail: at coarse lattice spacing the equation of state estimator fails to give physically reasonable values (i.e. positive definite) for the energy loss function. Although the decreased lattice spacing improves this somewhat, it does not prevent some measured values of this function from being negative. This casts some doubt on the reliability of this estimator in this low-velocity limit. While this runs counter to the Minkowski expectation, it is also not completely surprising: fast expansion and Minkowski are opposite limits, and it is also true that the evolution of string networks in flat space is not representative of evolution in expanding Universes [30].
In terms of the expansion rates used in our relativistic calibration (m ∈ [0.93,
0.95]), the figure shows similar predictions for the velocity dependent functions. The
Table 4.10 Calibrated VOS model parameters for our two choices of lattice spacing Δx and corresponding lattice sizes, for the two different choices of velocity estimators, vω² and vφ², further described in the main text. Displayed values correspond to 16th, 50th, 84th percentiles of the posterior distributions

Lattice size | Δx | Velocity estimator | d | r | β | k0 | q | c
2048³ | 0.5  | vω² | 0.37 (+0.02/−0.02) | 1.27 (+0.17/−0.15) | 2.33 (+0.21/−0.20) | 1.21 (+0.03/−0.03) | 2.57 (+0.06/−0.06) | 0.03 (+0.02/−0.03)
4096³ | 0.25 | vω² | 0.34 (+0.07/−0.05) | 2.32 (+0.52/−0.40) | 2.62 (+0.29/−0.26) | 1.06 (+0.03/−0.02) | 2.37 (+0.06/−0.07) | 0.25 (+0.02/−0.02)
2048³ | 0.5  | vφ² | 0.33 (+0.05/−0.04) | 1.86 (+0.39/−0.32) | 2.65 (+0.28/−0.26) | 1.05 (+0.03/−0.03) | 2.84 (+0.08/−0.08) | 0.31 (+0.02/−0.02)
4096³ | 0.25 | vφ² | 0.36 (+0.09/−0.06) | 2.56 (+0.64/−0.50) | 2.69 (+0.30/−0.27) | 1.04 (+0.03/−0.02) | 2.47 (+0.07/−0.07) | 0.30 (+0.02/−0.02)
better agreement in velocities, and in such predictions, hints at lattice spacing being able to help soften the tension between the two calibrations of the previous section. In order to test this, while keeping the same dynamic range independently of lattice spacing, we will compare the previous 2048³ calibrations with standard spacing Δx = 0.5 with new ones at lattice size 4096³ with half-standard spacing Δx = 0.25. We will perform this test for both velocity estimators and infer which estimator should provide a more stable calibration.
The resulting calibrations are summarized in Table 4.10, and the corresponding corner plots for each velocity estimator are found in Figs. 4.19 and 4.20. Both from the table and from comparing these two figures we can immediately infer that the conjugate momenta estimator calibration is far more stable to changes in lattice spacing, with the only statistically significant change occurring in the parameter q. This follows not only from the fact that the high expansion rate velocities change (comparatively) very little, but also from this velocity approaching the equation of state one at moderate expansion rates. This approximation can be indicative of behavior consistent with Minkowski space simulations.
For the equation of state estimator, both Fig. 4.19 and Table 4.10 show that the model parameter estimation is heavily affected by lattice spacing: 4 out of 6 parameters differ (by several standard deviations). Note that the loop-chopping efficiency c is included in this group of parameters and ceases to be consistent with zero. This again follows from the dramatic changes at high expansion rate.
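The statement that parameters differ "by several standard deviations" can be made quantitative with a simple Gaussian-approximate tension measure. The sketch below is our own illustration (not part of the thesis pipeline); the asymmetric errors are approximated by the one on the side facing the other estimate.

```python
import math

def tension(val1, err1, val2, err2):
    """Separation of two estimates in units of their combined uncertainty."""
    return abs(val1 - val2) / math.sqrt(err1 ** 2 + err2 ** 2)

# Loop-chopping efficiency c for the EoS estimator at the two spacings
# (values from Table 4.10, errors taken on the sides facing each other)
sig = tension(0.03, 0.02, 0.25, 0.02)   # about 7.8 combined standard deviations
```

By this measure the shift in c between Δx = 0.5 and Δx = 0.25 is indeed a many-sigma change, while the corresponding shifts for the conjugate momenta estimator remain at the one-sigma level.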
To conclude, we remark that both calibrations are in near agreement at Δx = 0.25, as can be read from Table 4.10. This shows that although the EoS velocity is less reliable in the low velocity limit, it is possible to compensate by reducing the lattice spacing. This is exactly the opposite of what was shown to happen in Minkowski space simulations: in [24] the EoS estimator is shown to be the most reliable for a flat space oscillating string.
Fig. 4.19 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vω and two different lattice spacings Δx = 0.5 and Δx = 0.25. The 2D panels depict the 1σ and 2σ confidence regions
4.3.4 Coda: Observational Impact of Different Calibrations
With this we have thus obtained the highest-resolution, most accurate calibration of the Velocity-dependent One-Scale model. We will now highlight the need for such an accurate calibration, since it can be connected to observational consequences. Although deriving improved constraints is beyond the focus of the present work, we can showcase how different calibrations will affect computed power spectra.
In order to achieve this goal we will use Cosmic Microwave Background spectra, computed for different VOS calibrations. The reason to do so is that these spectra are expected to depend mostly on a description of long string dynamics. In the case of the Stochastic Gravitational Wave Background produced by network loops one would
Fig. 4.20 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vφ and two different lattice spacings Δx = 0.5 and Δx = 0.25. The 2D panels depict the 1σ and 2σ confidence regions
require not only an accurate calibration but also an in-depth study of how reliable the Nambu-Goto approximation is on a loop-by-loop basis (and at which scales).
To perform such computations, we will use the publicly available software CMBACT4 [38]. Note that in this code the string network is approximated by several unconnected segments (Unconnected Segment Model, USM; see [3, 8]) whose average properties are dictated by integration of the VOS model equations. While this is a simplistic approximation (and indeed more robust simulation-based methods are available, see [2, 10, 11, 21, 25, 26]), for our purposes, which are to assess the impact of different calibrations, it will suffice.
For each spectrum we wish to compute we will use 200 realizations of the string network; this has been shown to be a sufficiently large number to produce spectra as accurate as the approximation allows [13]. We will also keep the standard settings of the code, except for the calibrations used (see next paragraph). Of note, we will
Fig. 4.21 Power spectrum of cosmic microwave background anisotropies, obtained with the
CMBACT4 code, for the standard Nambu-Goto calibration, the standard Abelian-Higgs calibration
and two extended VOS calibrations in the present work. The panels depict the TT (top left), EE
(top right), TE (bottom left) and BB (bottom right) spectra. In each case the spectrum is obtained
by averaging over 200 realizations
keep the string decay parameter Lf at its default standard value (Lf = 0.5). Although the risk is that string segments might decay earlier than their respective epoch for 0 ≤ Lf ≤ 1, given the illustrative nature of this work and how little information there is on the impact of this parameter, we decided to keep the default value for all computations.
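The realization-averaging step just described can be sketched generically. The following Python fragment is a schematic illustration with mock data (not CMBACT4 itself, which handles this internally): it averages a set of per-realization Cl curves multipole by multipole.

```python
def average_spectra(realizations):
    """Average C_l curves over network realizations, multipole by multipole.
    `realizations` is a list of equal-length lists, one curve per realization."""
    n = len(realizations)
    return [sum(curve[i] for curve in realizations) / n
            for i in range(len(realizations[0]))]

# Mock example: 3 realizations of a 4-multipole spectrum
mock = [[1.0, 2.0, 3.0, 4.0],
        [3.0, 2.0, 1.0, 0.0],
        [2.0, 2.0, 2.0, 2.0]]
avg = average_spectra(mock)
```

Because the USM places segments at random positions and orientations, individual realizations fluctuate strongly, and it is only this ensemble average that converges to a smooth spectrum.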
We will obtain TT, EE, TE and BB spectra for four different calibrations, all normalized to the same tension (Gμ = 1.1 × 10−7). The first two cases correspond to the standard version of the VOS model (already used in the CMBACT4 codebase), which uses the helicoidal ansatz for the momentum parameter k(v) (see [29]) and only the linear velocity term (loop-chopping) in the energy loss function F(v). The difference between the first two cases is the parameter c, which is set to either c = 0.57 or c = 0.23 ± 0.04 to reflect either the Abelian-Higgs or the Nambu-Goto calibration (previously used in the Planck 2013 constraints, [2]). The last two calibrations result from the extended VOS, with a worst and a best case scenario for the parameters: i.e. the lowest-resolution lattice, more affected by the choice of velocity estimator (1024³, Δx = 0.5 and vω²), and a more reliable calibration (4096³, Δx = 0.25 and vφ²).
The resulting spectra are shown in Fig. 4.21. The first detail to remark is that all Abelian-Higgs spectra are in better agreement with each other than with
Nambu-Goto. This is entirely expected, as the velocities of these networks are similar, whereas Nambu-Goto strings exhibit larger velocities [4, 9, 12, 34, 36, 39], possibly due to the absence of radiation backreaction. However, even among Abelian-Higgs calibrations there are still some noteworthy scale-dependent differences. The most discrepant calibration of the three is, without a doubt, the 1024³ case. Comparing the standard AH calibration with our new 4096³ case, the differences are more obvious at high multipole l for most spectra, except TT (where instead the differences are more evident at low l). In the case of TT the scalar, vector and tensor components are 16%, 30% and 11% higher in the 4096³ case than in the standard case. At present we do not know if such differences are only a result of the calibration, or if there might be scale-dependent effects inherent to the USM approximation. In any case, our point still stands: the accuracy of calibrations of the VOS model will have an impact on the computed spectra.
4.4 Conclusion
The study of defect networks, whether analytical or observational, is full of challenges. In the case of thermodynamic models used to study string evolution, such as the Velocity-dependent One-Scale model, there will be a number of free parameters one cannot analytically predict (or whose prediction rests on simplistic assumptions). This establishes an important symbiosis: while simulations are limited in resolution and dynamic range, and therefore unable to simulate the entirety of cosmological history, they can calibrate semi-analytical modelling, which is then capable of extrapolating the full evolution of a network.
Given that the evolution of defect networks is also intimately connected to observational consequences, we sought to apply the improvements of the extended VOS model for domain walls to both super-horizon wall networks and gauged cosmic string networks. For the first case, we saw that the scaling expected from re-entry into the horizon can have an impact on model calibration, changing the parameters related to small-scale structure (that is, parameters entering the momentum parameter). We also saw that the model adequately described the approach to scaling behavior, lending further credence to the model as an adequate description of network evolution.
Afterwards we used a similar prescription to study the calibration of this model in the case of cosmic strings. Initially we began with relatively small lattices, preliminarily finding that energy loss is not predominantly through radiative losses, i.e. loop creation and evolution is still a contributor (unlike the domain wall case). In order to continue improving our parameter estimation, we decided to explore how numerical choices can affect the calibration, and we sought to implement a more robust pipeline. The more robust pipeline came as a consequence of a better handling of uncertainties, either through automatic propagation or through the use of Bayesian inference to predict parameter posteriors. Equipped with this likelihood analysis and the most extensive set of high-resolution simulations to date, we
assessed the impact of different numerical choices, related to initial-conditions cooling, dynamic range, lattice spacing and the choice of numerical velocity estimator. Our conclusions led us to a best-case-scenario calibration. This was all made possible by the simulation speed-ups described in the previous chapter.
To conclude, we also compared the observational impact of different calibrations on the Cosmic Microwave Background by computing anisotropies for several cases. We saw scale-dependent differences across all computed spectra. Although we cannot exclude that such differences might be a result of the approximations made in the spectra computation, this still highlights the need for a proper characterization of the VOS model parameters.
We remark that there is still one possible source of systematics whose effects on the extended VOS are unknown: the PRS algorithm. Fortunately, we will analyse outputs from 8192³ physical-evolution runs with m ∈ [0.5, 0.75] in the upcoming months. In parallel, we will implement some improvements in the VOS calibration tool, such as full Bayesian inference with differential equations (instead of assuming a linear scaling solution) and phase-space analysis.
References
1. Achucarro A, Avgoustidis A, Leite AMM, Lopez-Eiguren A, Martins CJAP, Nunes AS,
Urrestilla J (2014) Evolution of semilocal string networks: Large-scale properties. Phys Rev
D 89(6):063503. https://doi.org/10.1103/PhysRevD.89.063503
2. Ade PAR, et al (2013) Planck results. XXV. Searches for cosmic strings and other topological
defects. Astron Astrophys 571:A25. https://doi.org/10.1051/0004-6361/201321621
3. Albrecht A, Battye RA, Robinson J (1997) The Case against scaling defect models of cosmic
structure formation. Phys Rev Lett 79:4736–4739. https://doi.org/10.1103/PhysRevLett.79.
4736
4. Allen B, Shellard EPS (1990) Cosmic string evolution: A numerical simulation. Phys Rev Lett
64:119–122
5. Avelino PP, Martins CJAP (2000) Topological defects: Fossils of an anisotropic era? Phys Rev
D 62:103510. https://doi.org/10.1103/PhysRevD.62.103510
6. Avelino PP, Martins CJAP, Oliveira JCRE (2005) One-scale model for domain wall network
evolution. Phys Rev D 72:083506. https://doi.org/10.1103/PhysRevD.72.083506
7. Avgoustidis A, Copeland EJ, Moss A, Skliros D (2012) Fast analytic computation of cosmic
string power spectra. Phys Rev D 86:123513. https://doi.org/10.1103/PhysRevD.86.123513
8. Battye RA, Robinson J, Albrecht A (1998) Structure formation by cosmic strings with a cosmological constant. Phys Rev Lett 80:4847–4850. https://doi.org/10.1103/PhysRevLett.80.4847
9. Bennett DP, Bouchet FR (1990) High resolution simulations of cosmic string evolution. 1.
network evolution. Phys Rev D41:2408
10. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from
cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D
75:065015. https://doi.org/10.1103/PhysRevD.75.065015
11. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2010) CMB power spectra from cosmic strings:
predictions for the Planck satellite and beyond. Phys Rev D 82:065004. https://doi.org/10.
1103/PhysRevD.82.065004
12. Blanco-Pillado JJ, Olum KD, Shlaer B (2011) Large parallel cosmic string simulations: New results on loop production. Phys Rev D 83:083514. https://doi.org/10.1103/PhysRevD.83.083514
13. Charnock T, Avgoustidis A, Copeland EJ, Moss A (2016) CMB constraints on cosmic strings
and superstrings. Phys Rev D 93(12):123503. https://doi.org/10.1103/PhysRevD.93.123503
14. Correia J, Martins C (2021a) High-resolution GPU-accelerated Abelian-Higgs string simulation: length colormap, dataset on Zenodo. https://doi.org/10.5281/zenodo.4710664
15. Correia J, Martins C (2021b) High-resolution GPU-accelerated Abelian-Higgs string simulation: velocity colormap, dataset on Zenodo. https://doi.org/10.5281/zenodo.4710670
16. Correia JRCCC, Martins CJAP (2020) Quantifying the effect of cooled initial conditions on cosmic string network evolution. Phys Rev D 102(4):043503. https://doi.org/10.1103/PhysRevD.
102.043503
17. Correia JRCCC, Martins CJAP (2021) High resolution calibration of the cosmic strings velocity
dependent one-scale model. Phys Rev D 104(6):063511. https://doi.org/10.1103/PhysRevD.
104.063511
18. Correia JRCCC, Martins CJAP (2019) Extending and calibrating the velocity-dependent one-scale model for cosmic strings with one thousand field theory simulations. Phys Rev D 100(10):103517. https://doi.org/10.1103/PhysRevD.100.103517
19. Correia JRCCC, Leite ISCR, Martins CJAP (2014) Effects of biases in domain wall network
evolution. Phys Rev D 90(2):023521. https://doi.org/10.1103/PhysRevD.90.023521
20. Correia JRCCC, Leite ISCR, Martins CJAP (2018) Effects of biases in domain wall network evolution. II. Quantitative analysis. Phys Rev D 97(8):083521. https://doi.org/10.1103/
PhysRevD.97.083521
21. Daverio D, Hindmarsh M, Kunz M, Lizarraga J, Urrestilla J (2016) Energy-momentum correlations for Abelian Higgs cosmic strings. Phys Rev D 93(8):085014. https://doi.org/10.1103/PhysRevD.93.085014. [Erratum: Phys Rev D 95(4):049903 (2017), https://doi.org/10.1103/PhysRevD.95.049903]
22. Foreman-Mackey D, Hogg DW, Lang D, Goodman J (2013) Emcee: The MCMC hammer.
Publ Astron Soc Pac 125:306–312. https://doi.org/10.1086/670067
23. Hindmarsh M, Stuckey S, Bevis N (2009) Abelian Higgs cosmic strings: Small scale structure and loops. Phys Rev D 79:123504. https://doi.org/10.1103/PhysRevD.79.123504
24. Hindmarsh M, Lizarraga J, Urrestilla J, Daverio D, Kunz M (2017) Scaling from gauge and
scalar radiation in Abelian Higgs string networks. Phys Rev D 96(2):023525. https://doi.org/
10.1103/PhysRevD.96.023525
25. Lazanu A, Shellard P (2015) Constraints on the Nambu-Goto cosmic string contribution to the
CMB power spectrum in light of new temperature and polarisation data. JCAP 02:024. https://
doi.org/10.1088/1475-7516/2015/02/024
26. Lazanu A, Shellard EPS, Landriau M (2015) CMB power spectrum of Nambu-Goto cosmic
strings. Phys Rev D 91(8):083519. https://doi.org/10.1103/PhysRevD.91.083519
27. Leite AMM, Martins CJAP (2011) Scaling properties of domain wall networks. Phys Rev D
84:103523. https://doi.org/10.1103/PhysRevD.84.103523
28. Lopez-Eiguren A, Urrestilla J, Achucarro A (2017) Measuring global monopole velocities, one
by one. JCAP 1701(01):020. https://doi.org/10.1088/1475-7516/2017/01/020
29. Martins CJAP, Shellard EPS (2002) Extending the velocity dependent one scale string evolution
model. Phys Rev D 65:043514. https://doi.org/10.1103/PhysRevD.65.043514
30. Martins CJAP, Shellard EPS (2006) Fractal properties and small-scale structure of cosmic string
networks. Phys Rev D 73:043515. https://doi.org/10.1103/PhysRevD.73.043515
31. Martins CJAP, Shellard EPS, Vieira JPP (2014) Models for small-scale structure on cosmic strings: Mathematical formalism. Phys Rev D 90(4):043518. https://doi.org/10.1103/
PhysRevD.90.043518
32. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocitydependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.
1103/PhysRevD.93.043534
33. Martins CJAP, Rybak IYu, Avgoustidis A, Shellard EPS (2016) Stretching and Kibble scaling regimes for Hubble-damped defect networks. Phys Rev D 94(11):116017. https://doi.org/10.1103/PhysRevD.94.116017. [Erratum: Phys Rev D 95(3):039902 (2017), https://doi.org/10.1103/PhysRevD.95.039902]
34. Moore J, Shellard E, Martins C (2002) On the evolution of Abelian-Higgs string networks.
Phys Rev D 65:023503. https://doi.org/10.1103/PhysRevD.65.023503
35. Nabarro PRN (1947) Dislocations in a simple cubic lattice. Proceed Phys Soc 59:256–272
36. Olum KD, Vanchurin V (2007) Cosmic string loops in the expanding universe. Phys Rev D
75:063521
37. Peierls R (1940) The size of a dislocation. Proceed Phys Soc 52:34–37
38. Pogosian L, Vachaspati T (1999) Cosmic microwave background anisotropy from wiggly
strings. Phys Rev D 60:083504. https://doi.org/10.1103/PhysRevD.60.083504
39. Ringeval C, Sakellariadou M, Bouchet F (2007) Cosmological evolution of cosmic string loops.
JCAP 0702:023. https://doi.org/10.1088/1475-7516/2007/02/023
40. Rybak IYu, Avgoustidis A, Martins CJAP (2017) Semianalytic calculation of cosmic microwave
background anisotropies from wiggly and superconducting cosmic strings. Phys Rev D
96(10):103535. https://doi.org/10.1103/PhysRevD.96.103535
41. Sousa L, Avelino PP (2010) Evolution of domain wall networks: The press-ryden-spergel
algorithm. Phys Rev D 81:087305. https://doi.org/10.1103/PhysRevD.81.087305
42. Sousa L, Avelino PP, Guedes GSF (2020) Full analytical approximation to the stochastic gravitational wave background generated by cosmic string networks. Phys Rev D 101(10):103508.
https://doi.org/10.1103/PhysRevD.101.103508
43. Vilenkin A, Shellard EPS (2000) Cosmic strings and other topological defects. Cambridge
University Press, ISBN 9780521654760. http://inspirehep.net/record/1384873?ln=pt
44. Ward RS (1997) Bogomolnyi bounds for two-dimensional lattice systems. Commun Math Phys
184:397–410. https://doi.org/10.1007/s002200050065
45. Yamaguchi M, Yokoyama J (2002) Lagrangian evolution of global strings. Phys Rev D
66:121303. https://doi.org/10.1103/PhysRevD.66.121303
46. Zeldovich Y, Khlopov M (1978) On the concentration of relic magnetic monopoles in
the universe. Phys Lett B 79(3):239–241. ISSN 0370-2693. https://doi.org/10.1016/03702693(78)90232-0. http://www.sciencedirect.com/science/article/pii/0370269378902320
Chapter 5
Strings in U(1)_L × U(1)_L Simulations
String theory cosmologists have discovered cosmic strings
lurking everywhere in the undergrowth
Tom Kibble
As mentioned in the epigraph, fundamental superstrings were shown in [6] to behave in some ways similarly to cosmic strings, being one-dimensional, horizon-sized objects formed in the early Universe and even obeying a homotopy condition (see the introduction and references therein for more details). Attempts to simulate these objects will inevitably fall into one of two categories: extensions of Nambu-Goto cosmic string simulations (of which none with more than one string type exist) or field theory simulations, for which the simplest model to implement is the dual local U(1) model of [4], with Lagrangian density
\mathcal{L} = |D_\mu \phi|^2 - \frac{1}{4} F^{\mu\nu} F_{\mu\nu} + |D_\mu \psi|^2 - \frac{1}{4} G^{\mu\nu} G_{\mu\nu} - V(|\phi|, |\psi|)    (5.1)
for two complex scalar fields φ and ψ and two U(1) gauge fields A_μ and B_μ, with corresponding gauge field strengths F_{μν} and G_{μν}. The covariant derivatives, gauge field strengths and potential are given by,
D_\mu \phi = (\partial_\mu - i e_p A_\mu)\phi, \qquad D_\mu \psi = (\partial_\mu - i e_q B_\mu)\psi    (5.2)

F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu, \qquad G_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu    (5.3)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. R. C. C. C. Correira, A New Generation of Cosmic Superstring Simulations,
Springer Theses, https://doi.org/10.1007/978-3-031-20229-2_5
V(|\phi|, |\psi|) = \frac{\lambda_p}{4}(|\phi|^2 - \sigma_p^2)^2 + \frac{\lambda_q}{4}(|\psi|^2 - \sigma_q^2)^2 - \kappa (|\phi|^2 - \sigma_p^2)(|\psi|^2 - \sigma_q^2)    (5.4)
where λ_{p,q} are scalar couplings, e_{p,q} are gauge couplings and κ is the coupling between the two scalar fields. If these parameters are such that 0 < \kappa < \frac{1}{2}\sqrt{\lambda_p \lambda_q}, then the vacuum manifold is non-trivial in two sectors, supporting the existence of two types of string and, due to the non-zero value of κ, also bound-state strings (see [4]).
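The role of the coupling term can be made concrete with a quick numerical check, a sketch of ours using the validation-section parameters (λ_p = λ_q = 2, κ = 0.9, unit symmetry-breaking scales; the function name V is our own): the quartic form stays bounded for 0 < κ < ½√(λ_p λ_q), and two coincident string cores cost less potential energy than two separated ones, which is what makes (1,1) bound states energetically favourable.

```python
import math

# Sketch (not thesis code): the potential of Eq. (5.4) as a function of
# |phi|^2 and |psi|^2, with the validation parameters as defaults.
def V(phi2, psi2, lp=2.0, lq=2.0, kappa=0.9, sp2=1.0, sq2=1.0):
    return (lp / 4) * (phi2 - sp2) ** 2 + (lq / 4) * (psi2 - sq2) ** 2 \
        - kappa * (phi2 - sp2) * (psi2 - sq2)

lp = lq = 2.0
kappa = 0.9
# Boundedness of the quartic form requires 0 < kappa < sqrt(lp * lq) / 2.
assert 0.0 < kappa < 0.5 * math.sqrt(lp * lq)

# Two separated cores (one field at zero, the other at its VEV) cost more
# potential energy than two coincident cores, so the kappa term binds:
separated = V(0.0, 1.0) + V(1.0, 0.0)   # two isolated string cores
coincident = V(0.0, 0.0)                # a (1,1) bound-state core
assert coincident < separated
```

With κ = 0.9 the coincident-core potential is 0.1 against 1.0 for separated cores, a direct (if crude) indication of the binding provided by the interaction term.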
As mentioned previously, this model cannot capture all the proper physics of cosmic superstrings. For instance, it clearly lacks supersymmetry, the intercommutation probabilities will not be the expected ones for superstrings, and the tension spectrum cannot be reproduced. Still, a field theory simulation of this model can tell us a lot about the abundances of bound states and the scaling properties of each network component. Although this model has been studied previously in the literature, there are three details that can play an important role and have thus far been neglected: the need for proper non-PRS evolution for bound state formation, the impact of resolution effects on small-scale structure (and again on bound state formation), and the possibility of unequal tensions. Our previous Abelian-Higgs code (appropriately extended) will be able to shed some light on these issues. We additionally remark that everything in this chapter is unpublished work that will, in the near future, form the basis of a publication.
5.1 Simulation Overview
We begin with the locally U(1)_L × U(1)_L invariant Lagrangian density presented in the previous section. From the variation of the action, under the assumption of an FLRW metric and the temporal gauge (A_0 = B_0 = 0), come the equations of motion,
\ddot{\phi} + 2\frac{\dot{a}}{a}\dot{\phi} = D_j D_j \phi - a^2 \phi \left[\frac{\lambda_p}{2}(|\phi|^2 - \sigma_p^2) - \kappa(|\psi|^2 - \sigma_q^2)\right]    (5.5)

\ddot{\psi} + 2\frac{\dot{a}}{a}\dot{\psi} = D_j D_j \psi - a^2 \psi \left[\frac{\lambda_q}{2}(|\psi|^2 - \sigma_q^2) - \kappa(|\phi|^2 - \sigma_p^2)\right]    (5.6)

\dot{F}_{0j} = \partial_i F_{ij} - 2 a^2 e_p^2\, \mathrm{Im}[\phi^* D_j \phi]    (5.7)

\dot{G}_{0j} = \partial_i G_{ij} - 2 a^2 e_q^2\, \mathrm{Im}[\psi^* D_j \psi]    (5.8)

along with two copies of Gauss's law, one for each sector,

\partial_i F_{0i} = 2 a^2 e_p^2\, \mathrm{Im}[\phi^* \dot{\phi}]    (5.9)

\partial_i G_{0i} = 2 a^2 e_q^2\, \mathrm{Im}[\psi^* \dot{\psi}]    (5.10)
which will be tested in the validation section. All of the previous equations are symmetric under the exchanges φ ↔ ψ, A_μ ↔ B_μ, F_{ij} ↔ G_{ij}, and under the corresponding exchanges of couplings, λ_p ↔ λ_q and e_p ↔ e_q. We will assume criticality in the two sectors, which corresponds to λ_{p,q}/(2e_{p,q}^2) = 1, and we will assume both symmetry breaking scales to be unity, σ_{p,q} = 1. This corresponds to equal-tension constituent strings which do not (at coupling κ = 0) form bound states with winding greater than one (say of the form (2,0) or (0,2)). In these simulations, much like in the Abelian-Higgs
and domain-wall cases, the comoving string radius varies as a^{-1} for both the scalar and gauge radii. Here we adopt the same comoving width controlling trick used in previous chapters, where all coupling constants are made to vary as,
\lambda_{p,q} = \lambda_{p,q\,0}\, a^{-2(1-\beta)}, \qquad e_{p,q} = e_{p,q\,0}\, a^{-(1-\beta)}, \qquad \kappa = \kappa_0\, a^{-2(1-\beta)}    (5.11)
where, following [2], κ is made to vary in the same way as λ_{p,q}. The parameter β can be set to 0 (constant comoving width) or 1 (recovering the original equations of motion), while β < 0 implies growing radii. With this comoving width trick, the equations of motion can be re-written as,
\ddot{\phi} + 2\frac{\dot{a}}{a}\dot{\phi} = D_j D_j \phi - a^{2\beta} \phi \left[\frac{\lambda_p}{2}(|\phi|^2 - \sigma_p^2) - \kappa(|\psi|^2 - \sigma_q^2)\right]    (5.12)

\ddot{\psi} + 2\frac{\dot{a}}{a}\dot{\psi} = D_j D_j \psi - a^{2\beta} \psi \left[\frac{\lambda_q}{2}(|\psi|^2 - \sigma_q^2) - \kappa(|\phi|^2 - \sigma_p^2)\right]    (5.13)

\dot{F}_{0j} + 2(1-\beta)\frac{\dot{a}}{a} F_{0j} = \partial_i F_{ij} - 2 a^{2\beta} e_p^2\, \mathrm{Im}[\phi^* D_j \phi]    (5.14)

\dot{G}_{0j} + 2(1-\beta)\frac{\dot{a}}{a} G_{0j} = \partial_i G_{ij} - 2 a^{2\beta} e_q^2\, \mathrm{Im}[\psi^* D_j \psi]    (5.15)
This is very similar to writing two discretized evolution equations for two Abelian-Higgs strings, but with an extra term arising from the coupling term in the potential. The evolution equations updating the conjugate momenta of the scalar fields in both string sectors are executed first in a timestep via,
(1+\delta)\, \Pi^{x,\eta+1/2} = (1-\delta)\, \Pi^{x,\eta-1/2} + \Delta\eta \left[ D_j^- D_j^+ \phi^{x,\eta} - a_\eta^{2\beta}\, \phi^{x,\eta} \left( \frac{\lambda_{p0}}{2}(|\phi^{x,\eta}|^2 - \sigma_p^2) - \kappa(|\psi^{x,\eta}|^2 - \sigma_q^2) \right) \right]    (5.16)

(1+\delta)\, \Sigma^{x,\eta+1/2} = (1-\delta)\, \Sigma^{x,\eta-1/2} + \Delta\eta \left[ D_j^- D_j^+ \psi^{x,\eta} - a_\eta^{2\beta}\, \psi^{x,\eta} \left( \frac{\lambda_{q0}}{2}(|\psi^{x,\eta}|^2 - \sigma_q^2) - \kappa(|\phi^{x,\eta}|^2 - \sigma_p^2) \right) \right]    (5.17)

with Π and Σ the conjugate momenta of φ and ψ, respectively,
where δ is defined as,
\delta = \frac{\alpha}{2}\, \frac{d\ln a}{d\ln \eta}\, \frac{\Delta\eta}{\eta} = \frac{\alpha}{2}\, \frac{m\, \Delta\eta}{(1-m)\,\eta} \, .    (5.18)
and sets the strength of the Hubble damping on the scalar fields. Afterwards come the evolution equations for the gauge conjugate momenta E and H,
(1+\omega)\, E_i^{x,\eta+1/2} = (1-\omega)\, E_i^{x,\eta-1/2} + \Delta\eta \left[ -\partial_j^- F_{ij} + 2 e_{p0}^2\, a_\eta^{2\beta}\, \mathrm{Im}[\phi^* D_i^+ \phi]^{x,\eta} \right]    (5.19)

(1+\omega)\, H_i^{x,\eta+1/2} = (1-\omega)\, H_i^{x,\eta-1/2} + \Delta\eta \left[ -\partial_j^- G_{ij} + 2 e_{q0}^2\, a_\eta^{2\beta}\, \mathrm{Im}[\psi^* D_i^+ \psi]^{x,\eta} \right]    (5.20)

where ω,

\omega = \delta(1-\beta)    (5.21)
introduces an unphysical damping of the gauge fields for any value of β ≠ 1. In the case β = 1 the damping vanishes, as expected in the physically correct limit. To finish updating all the fields in a timestep, we then apply the following,
\phi^{x,\eta+1} = \phi^{x,\eta} + \Delta\eta\, \Pi^{x,\eta+1/2}    (5.22)

\psi^{x,\eta+1} = \psi^{x,\eta} + \Delta\eta\, \Sigma^{x,\eta+1/2}    (5.23)

A_i^{x,\eta+1} = A_i^{x,\eta} + \Delta\eta\, E_i^{x,\eta+1/2}    (5.24)

B_i^{x,\eta+1} = B_i^{x,\eta} + \Delta\eta\, H_i^{x,\eta+1/2}    (5.25)
again for all non-conjugate fields in all sectors.
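The staggered-leapfrog scalar updates above can be sketched in a few lines of NumPy. This is a toy of our own, not the thesis' CUDA code: it takes the decoupled limit e_p = e_q = 0, where D_j^- D_j^+ reduces to the ordinary lattice Laplacian, and all variable names (lp0, kappa0, Sig for the ψ momentum, and so on) are our choices.

```python
import numpy as np

# Toy sketch (ours) of the leapfrog updates, Eqs. (5.16)-(5.18) and
# (5.22)-(5.23), with gauge fields switched off (e_p = e_q = 0).
N, dx, deta = 32, 0.5, 0.1
lp0 = lq0 = 2.0
kappa0 = 0.9
beta, alpha, m = 0.0, 2.0, 0.5   # constant comoving width, radiation era;
                                 # alpha = 2 reproduces the 2(a'/a) damping

rng = np.random.default_rng(0)
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, (N, N, N)))  # VEV magnitude, random phase
psi = np.exp(1j * rng.uniform(0, 2 * np.pi, (N, N, N)))
Pi = np.zeros_like(phi)    # conjugate momentum of phi
Sig = np.zeros_like(psi)   # conjugate momentum of psi

def laplacian(f):
    out = -6.0 * f
    for ax in range(3):
        out += np.roll(f, 1, ax) + np.roll(f, -1, ax)
    return out / dx ** 2

def step(phi, psi, Pi, Sig, eta):
    a2b = (eta ** (m / (1 - m))) ** (2 * beta)        # a^{2 beta}, a ~ eta^{m/(1-m)}
    delta = 0.5 * alpha * m * deta / ((1 - m) * eta)  # Eq. (5.18)
    Fphi = laplacian(phi) - a2b * phi * (0.5 * lp0 * (np.abs(phi) ** 2 - 1)
                                         - kappa0 * (np.abs(psi) ** 2 - 1))
    Fpsi = laplacian(psi) - a2b * psi * (0.5 * lq0 * (np.abs(psi) ** 2 - 1)
                                         - kappa0 * (np.abs(phi) ** 2 - 1))
    Pi = ((1 - delta) * Pi + deta * Fphi) / (1 + delta)    # Eq. (5.16)
    Sig = ((1 - delta) * Sig + deta * Fpsi) / (1 + delta)  # Eq. (5.17)
    return phi + deta * Pi, psi + deta * Sig, Pi, Sig      # Eqs. (5.22)-(5.23)

eta = 1.0
for _ in range(10):
    phi, psi, Pi, Sig = step(phi, psi, Pi, Sig, eta)
    eta += deta
```

The gauge-sector updates, Eqs. (5.19)-(5.21) and (5.24)-(5.25), follow the same (1−ω)/(1+ω) pattern and are omitted here for brevity.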
One non-trivial detail about defect simulations is that one is often constrained in the amount of dynamical range available to reach and characterize certain behaviors. For instance, to ascertain the scaling of all types of networks, it is necessary for the fields to relax into string configurations and then for these networks to reach scaling, again for all types of strings in the simulation (pure p, pure q, or bound states thereof). There is no unique way to proceed in terms of generating initial conditions, but we can take some notes from the works of [2, 5] while applying some differences of our
own. In all cases we will generate simple initial conditions, with both complex scalar
fields having magnitudes set by the VEV and random phases. Similarly, we will also
apply a diffusive phase to smooth out the large gradients present in such initial
conditions, followed by a damping period to form string networks more quickly. The
details of how these operations are applied (and when, in terms of conformal time)
differ from those of the previous authors. We start with the diffusive cooling equations,
taken from the previous cooling functions used in Abelian-Higgs for the first sector,
\dot{\phi} = D_j D_j \phi - \frac{\lambda_{p0}}{2}(|\phi|^2 - \sigma_p^2)\phi    (5.26)

\dot{F}_{0j} = \partial_i F_{ij} - 2 e_{p0}^2\, \mathrm{Im}[\phi^* D_j \phi]    (5.27)
and correspondingly for the q-sector by performing the appropriate substitutions (φ → ψ, A_μ → B_μ, λ_p → λ_q and e_p → e_q),
\dot{\psi} = D_j D_j \psi - \frac{\lambda_{q0}}{2}(|\psi|^2 - \sigma_q^2)\psi    (5.28)

\dot{G}_{0j} = \partial_i G_{ij} - 2 e_{q0}^2\, \mathrm{Im}[\psi^* D_j \psi] \, .    (5.29)
ψ̇ = D j D j ψ −
Note that while this method of cooling is the same as previously implemented for the Abelian-Higgs case (even ignoring the presence of the coupling term in the potential), it differs from what was applied in the works of [2, 5], which merely average each scalar field over nearest neighbors 30 times. In practice, both serve to smooth the initial condition gradients, and we see no indication throughout the validation procedure that either method produces a discernible impact on network evolution. We can however explore the introduction of the coupling term in the potential to see if the initial formation of bound states changes. This cooling is applied from an initial conformal time of η = −10.0 until η = 1.0, in timesteps of size Δη = 1/30.
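A minimal sketch of this gradient-flow cooling, again in the ungauged e_p = 0 limit and with an arbitrary (hypothetical) lattice size; the only behavior tested is that the flow reduces the gradient energy of random-phase initial conditions, which is all the cooling phase is meant to do.

```python
import numpy as np

# Our sketch of the diffusive cooling of Eq. (5.26) with e_p = 0:
# forward-Euler gradient flow, using the same step 1/30 as in the text.
N, dx, dtau = 24, 0.5, 1.0 / 30.0
lp0 = 2.0
rng = np.random.default_rng(1)
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, (N, N, N)))  # random-phase start

def laplacian(f):
    out = -6.0 * f
    for ax in range(3):
        out += np.roll(f, 1, ax) + np.roll(f, -1, ax)
    return out / dx ** 2

def gradient_energy(f):
    return sum(np.sum(np.abs((np.roll(f, -1, ax) - f) / dx) ** 2)
               for ax in range(3))

e0 = gradient_energy(phi)
for _ in range(60):   # two units of cooling time
    phi = phi + dtau * (laplacian(phi)
                        - 0.5 * lp0 * (np.abs(phi) ** 2 - 1.0) * phi)
e1 = gradient_energy(phi)
assert e1 < e0   # cooling smooths the initial gradients
```

Note that the chosen step satisfies the forward-Euler diffusion stability bound Δτ ≤ Δx²/6 for this 3D stencil.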
Next comes an extra step, vital in preparing the initial conditions such that the network might reach scaling as fast as possible: a damping period. Although previous works on U(1)_L × U(1)_L string network simulations have used the discretized cosmological equations (along with the same lattice spacing and timestep size) for this period, they introduce a fixed Hubble damping factor γ = ȧ/a of either 0.2 [5] or 0.4 [2]. In our case the highly damped cosmological evolution is instead set by varying the expansion rate. For instance, if it is set to m = 0.9 and damping is applied in a short, early period of time (from η = 1.0 until η = 5.0), the damping factor will be sufficiently large to relax the fields into a network quickly. The fact that this period is short also allows the network to spend most of its time evolving under matter or radiation domination. We will set the duration of this phase such that γ never falls below 0.4. Although [5] suggested that a sufficiently high damping would aid in the formation of bound states (at an initial stage), we see no evidence of a large population of bound states forming at early stages of the network evolution.
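For a power-law scale factor a ∝ η^{m/(1−m)}, the conformal damping factor is γ = ȧ/a = m/[(1−m)η], so the quoted damping phase can be checked directly. This is a trivial sketch; the numbers m = 0.9 and η ∈ [1, 5] are those quoted in the text.

```python
# Conformal Hubble damping factor for a power-law scale factor
# a ~ eta^{m/(1-m)}: gamma = a'/a = m / ((1 - m) * eta).
def gamma(eta, m=0.9):
    return m / ((1.0 - m) * eta)

g1 = gamma(1.0)   # about 9.0 at the start of the damping phase
g5 = gamma(5.0)   # about 1.8 at its end
assert abs(g1 - 9.0) < 1e-9 and abs(g5 - 1.8) < 1e-9
# gamma stays well above the 0.4 floor used to set the phase duration:
assert min(gamma(e) for e in (1.0, 2.0, 3.0, 4.0, 5.0)) >= 0.4
```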
5.2 Validation
We will now proceed with a comparison of the present simulations against the two literature references [2, 5]. We will use the same lattice spacing as [2] (Δx = 0.5, which is double that of [5]); the modified cooling and damping applied here mean we do not require a large lattice for this validation, so the box size is kept at 1024³. This also has the advantage of providing the same dynamic range presented in the literature. As explained previously, the simulation parameters are consistent with equal tensions for the constituent strings and criticality in each sector: λ_{p0} = λ_{q0} = 2 and e_{p0} = e_{q0} = 1, so that λ_{p0} = 2e_{p0}^2 and λ_{q0} = 2e_{q0}^2. In the validation we
will also force all constants to scale such that the comoving width of strings is kept
constant throughout (β = 0). In addition, both symmetry breaking scales are equal and of value one, σ_p = σ_q = 1. These parameter choices essentially mean that basic strings of either sector have equal tensions, μ_p = μ_q, unlike the cosmic superstring case, where μ_p = μ_F and μ_q = μ_F/g_s, with μ_F the fundamental string tension. The coupling constant κ_0 will be kept at 0.9, which according to [2] shows no significant change in the abundance of pq-strings relative to κ = 0.95. We
will conduct the validation procedure as we introduce all the necessary estimators for the mean string separation and mean velocity squared. Before we do so, however, there are two necessary simple checks to be performed: first, the verification of Gauss's law for each U(1)_L sector and, secondly, visual confirmation of the existence of more than two string networks.
The verification of Gauss's law for each sector can be seen below in Fig. 5.1, for a radiation era run with β = 0, starting at conformal time η = 1.0 and extending throughout the damping phase and subsequent cosmological evolution. As can be observed, both versions of Gauss's law are preserved to at worst one part in 10^7. Note that this is true both in the damping and the cosmological phase: there is no a priori reason for this to hold during the damping phase.
A first approach to the visual confirmation of multiple string types can be made by using isosurfaces of the absolute value of each scalar field, |φ| < 0.5 and |ψ| < 0.5, and isosurfaces of the interaction potential:

V_{\rm interaction} = \kappa (|\phi|^2 - \sigma_p^2)(|\psi|^2 - \sigma_q^2) = \zeta    (5.30)

where ζ is a threshold, which takes the value 0.855 from [2] if one desires to exactly match the winding cells of bound states. The result can be seen in the two panels of Fig. 5.2 for a small lattice (512³, Δx = 0.5) at conformal times η = 101.5 and η = 127.5. The red and blue surfaces correspond to the isosurfaces of the scalar fields, and the green surfaces to the interaction potential with a threshold of ζ = 0.655 (lowered from 0.855 for visual purposes). Red surfaces not overlapping with blue ones are indicative of strings of the first sector (p-strings in our nomenclature), and blue of the second type (q-strings). Here we visually confirm that this threshold locates reasonably well possible overlaps of the two basic strings, indicative of bound states (pq-strings). The fact that bound states are far rarer than the basic strings is in
Fig. 5.1 The panel shows the violation of Gauss’s law, throughout the evolution of a 5123 , Δx = 0.5
matter epoch simulation. Red indicates the p-string sector, blue the q-string sector
qualitative agreement with the conclusion of [2], where bound states constitute no more than 2% of the total string length. We will return to the discussion of bound state abundances and to the use of windings to pinpoint pq-strings in a later subsection.
5.2.1 On Average Network Quantities
Although this already seems indicative of the expected behavior of the network, we must also assess which types of bound states exist and characterize them (in terms of relative abundances), as well as define suitable lengthscale estimators for each network type. We can do so by using the winding estimator from the previous chapters,
\xi_p = \sqrt{\frac{V}{L_p}}\, , \qquad \xi_q = \sqrt{\frac{V}{L_q}}\, , \qquad \xi_{pq} = \sqrt{\frac{V}{L_{pq}}}    (5.31)
where L_p and L_q are given by the sum of non-zero plaquettes computed in the respective sector (so either using φ, A_μ or ψ, B_μ), with the length of string in bound states, L_{pq}, subtracted. L_{pq} corresponds to cells where both types of winding overlap. We leave the details of this computation to the next section. For now, we merely add that with the choice of parameters made for the validation procedure (strings of equal tension), we do not observe any bound state with winding two or
Fig. 5.2 The two panels show isosurfaces of the absolute magnitude of the scalar field φ (blue, representing p-strings) and of the scalar field ψ (red, representing q-strings), set at |φ| = |ψ| = 0.5, together with isosurfaces of the interaction potential ζ = 0.655 (green, representing pq-strings). The snapshots come from a matter epoch simulation of size 256³, Δx = 0.5 at two conformal times, η = 101.5 and η = 127.5, with no treatment applied to the initial conditions (which is why it needed to be evolved past half a light-crossing time to reach scaling)
Fig. 5.3 The two panels show the evolution of the mean string separation for the full network, either using the Lagrangian length estimator (orange) or the winding length estimator (purple), for both radiation and matter epochs (left and right-hand panels, respectively). In the case of the winding estimator, the length of pq-segments is computed using the fast method described in the next section
higher (in any sector), meaning pq-strings are, for all intents and purposes of this section, (1,1) bound states.
This already enables us to detect scaling behavior for each relevant string species,
although we can additionally define length estimators for the full network (all string
types included together). This was done in [2] via a combined winding estimator,

\xi_W = \sqrt{\frac{V}{L_p + L_q + L_{pq}}}    (5.32)

or via the Lagrangian length estimator also used in previous chapters and throughout Abelian-Higgs simulations,

\xi_{\mathcal{L}} = \sqrt{\frac{-\mu V}{\mathcal{L}_x}}    (5.33)

where V is the box volume, μ the tension of either basic string type,¹ and \mathcal{L}_x is the lattice-wide sum of the Lagrangian computed at every site.
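As a purely numerical illustration of the winding-based estimators of Eqs. (5.31) and (5.32), with made-up string lengths (the numbers below are not simulation output):

```python
import math

# Illustration (not simulation data): mean string separations from total
# winding lengths, xi = sqrt(V / L) for each network component.
V = 1024 ** 3 * 0.5 ** 3                 # box volume, 1024^3 lattice, dx = 0.5
L_p, L_q, L_pq = 9.0e4, 9.0e4, 2.0e3     # hypothetical winding lengths

xi_p = math.sqrt(V / L_p)
xi_q = math.sqrt(V / L_q)
xi_pq = math.sqrt(V / L_pq)
xi_W = math.sqrt(V / (L_p + L_q + L_pq))  # full-network estimator

assert xi_pq > xi_p             # rarer bound states => larger separation
assert xi_W < min(xi_p, xi_q)   # the combined network is denser
```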
For now we will merely state that both full-network mean string separation estimators show scaling behavior, as can be seen in Fig. 5.3, albeit the details of scaling, namely the rate of change of ξ, differ by about 10% when comparing ξ_W and ξ_L. There is no literature reference for these values and therefore no direct comparison can be made for the full-network estimators. Given that there are two methods for the computation of L_{pq} (and this quantity can impact the computation of ξ_W), we will
¹ The tension of the (1,1) states is different; however, because the most abundant string types have exactly the same tension, this estimator still gives a good indication of the scaling behavior of the network. If we wish to test the specific case where the basic constituent strings have unequal tensions, this estimator might not be the most reasonable choice.
leave a more thorough discussion of ξ̇ for the next section. We remark that although our end goal is merely to study the abundance of bound states in these simulations and how certain numerical choices impact it, we will also use the velocity estimators of [2] as an additional validation source. We can take the previously defined (and used in previous chapters) scalar field velocity estimator, where the mean string velocity is given by,
\langle v^2 \rangle_\phi = \frac{2R}{1+R} \, ,    (5.34)

where R, in [2], is given by

R = \frac{\sum_x |\Pi|^2\, W}{\sum_{x,i} |D_{x,i}^+ \phi|^2\, W}    (5.35)

although here we opt for a definition that takes into account both scalar fields (or both string sectors),

R = \frac{\sum_x \left(|\Pi|^2 + |\Sigma|^2\right) W}{\sum_{x,i} \left(|D_{x,i}^+ \phi|^2 + |D_{x,i}^+ \psi|^2\right) W}    (5.36)
and W is a weight function, meant merely to localize the estimators around strings. We will refer to this as the field-based velocity estimator. The weight function can be used to select either the mean velocity of all strings, by choosing it to be equal to the Lagrangian, or the mean velocity of only bound-state strings, by choosing the weight function to be given by the interaction potential.
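Equations (5.34) and (5.36) amount to a weighted ratio of kinetic to gradient terms; a sketch with random stand-in fields (all array names are ours, and real inputs would come from the evolved lattice) shows the mechanics:

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (16, 16, 16)
# Random stand-ins for the conjugate momenta and forward gradients; in a
# real simulation these come from the evolved fields.
Pi = rng.normal(size=shape) + 1j * rng.normal(size=shape)      # momentum of phi
Sig = rng.normal(size=shape) + 1j * rng.normal(size=shape)     # momentum of psi
dphi = rng.normal(size=(3,) + shape) + 1j * rng.normal(size=(3,) + shape)
dpsi = rng.normal(size=(3,) + shape) + 1j * rng.normal(size=(3,) + shape)
W = rng.uniform(size=shape)   # weight: e.g. the Lagrangian or V_interaction

# Eq. (5.36): weighted ratio of kinetic to gradient terms, both sectors.
num = np.sum((np.abs(Pi) ** 2 + np.abs(Sig) ** 2) * W)
den = np.sum((np.abs(dphi) ** 2 + np.abs(dpsi) ** 2) * W)  # W broadcasts over i
R = num / den
v2 = 2.0 * R / (1.0 + R)   # Eq. (5.34)
assert 0.0 < v2 < 2.0      # the estimator's range as R varies
```

Swapping the weight array between the Lagrangian and the interaction potential is all that distinguishes the full-network and pq-only velocity estimates.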
The evolution of both velocity estimates is presented in Fig. 5.4, with the corresponding asymptotic values computed over the conformal time range η ∈ [230.0, 256.0]. Equivalently, the same information on these asymptotic values can be found

Fig. 5.4 The two panels show the evolution of the mean squared velocity ⟨v²⟩_W for either the full network (by specifying the full Lagrangian as the weight function, in orange) or the pq-segments (by using the interaction potential as the weight, in green). The left panel corresponds to radiation epoch, the right to matter epoch
Table 5.1 The asymptotic values of the mean velocity squared ⟨v²⟩_W for either the full network (weighted by the Lagrangian) or pq-segments (weighted by the interaction potential), for the simulations from this section, Abelian-Higgs simulations from Chap. 3, and pq-string simulations from [2]

Size, Δx         | m   | ⟨v²⟩_pq       | ⟨v²⟩_L        | Reference
1024³, Δx = 0.5  | 1/2 | –             | 0.272 ± 0.002 | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 1/2 | 0.319 ± 0.008 | 0.293 ± 0.006 | This section
1024³, Δx = 0.5  | 1/2 | ∼0.33         | 0.306 ± 0.004 | [2]
512³, Δx = 1.0   | 1/2 | –             | –             | [5]
1024³, Δx = 0.5  | 2/3 | –             | 0.228 ± 0.004 | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 2/3 | 0.247 ± 0.006 | 0.253 ± 0.009 | This section
1024³, Δx = 0.5  | 2/3 | ∼0.27         | 0.264 ± 0.006 | [2]
512³, Δx = 1.0   | 2/3 | –             | –             | [5]
in Table 5.1, along with the asymptotic values of pure Abelian-Higgs (only for the full network, with the same velocity estimator) and the directly comparable figures of [2]. We note that [5] performs no velocity estimates. First let us discuss ⟨v²⟩_L. In the matter and radiation epochs (m = 2/3 and m = 1/2, respectively) the values are indeed compatible within 1σ uncertainties, although lower than the ones presented in [2]. For the case of ⟨v²⟩_pq there is an important difference to note: in [2] the velocities are given as a range because, over the conformal time period where the network reaches scaling (based on full network estimators), the velocity of pq-strings decreases. Here we observe a different behavior: the velocities increase throughout this range. Possibly this can be attributed to the different preparation of initial conditions, specifically the different damping applied in our case. If we compute the values from the last 20.0 conformal time units, however, we can compare to the lower bound of the ranges provided in [2]. Assuming uncertainties comparable to ours for this lower bound, we see that velocities are compatible in the radiation epoch, while underestimated in the matter era (by about 10%). This can again be attributed to the damping period, given that it is not entirely clear (see again Fig. 5.4) whether the velocity has completely stabilized; more dynamic range may thus be required for it to reach its asymptotic value.
We will now move to the (more complicated) tasks of computing the length of bound-state strings, of verifying the scaling of bound states, and of computing the relative abundances of pq-strings.
5.2.2 On Locating pq-Segments
The computation of the total length of pq-strings throughout the box might seem like the relatively simple task of detecting plaquettes of exactly double winding (one winding of each type); however, there can be "accidental" displacements of windings, such that the two strings still overlap but the windings are situated, for instance, one site apart. On the other hand, one can have "accidental" crossings at any given timestep of, say, only one plaquette, which in the next timestep no longer overlap, in which case we can reasonably state that no bound-state string formed. In order to solve these issues, the authors of [2] allowed p- and q-strings to be considered overlapped if within a transverse distance of four lattice units, while [5] used a maximum intersegment distance (rather than the transverse distance) of 5. In addition, [2] and [5] also considered a minimum threshold on the length of segments, requiring that proper bound states have a minimum length of L_pq = 3 or L_pq = 20, respectively.
We will present here our two methods for the computation of the length estimator, which differ slightly from those in the literature but give rise to similar conclusions. The first thing we must mention is that we consider cells pierced by strings, without specifying the plaquettes themselves. This is less memory intensive, although it gives no information about the orientation of different strings in either the same cell or neighbouring cells. The first method is implemented via a custom CUDA kernel and is meant to be no more than a fast, approximate (and less robust) computation of the length itself, while the second requires the in-situ visualization capabilities of our simulation. Note that the second is more time-consuming, a result of being more IO-intensive, although the method itself is more robust, for reasons we shall explain next.
For the fast computation we merely store, in two separate arrays (one for each sector), whether a cell is pierced by a winding (with the content of each cell set by the total magnitude of windings that pierce it). We then verify whether cells pierced in both arrays are non-zero at the same location, or displaced by one cell in any direction. We then sum the number of cells pierced by both types of strings and use this as an indication of the number of segments that constitute the length. This method is less robust overall for two reasons:
• The cells of an overlap region should give rise to a collection of segments of length Δx × (N_cells − 1). Since this kernel merely detects overlap site-by-site and does not attempt to cluster cells into regions (this would in fact have to be done a posteriori, via some clustering algorithm), we directly use the overall number of cells. In a sense, then, there is an "off-by-one" systematic error included here. This is solved by the robust method, as collections of cells are separated into regions.
• Since no clustering is performed, it is not possible to apply a lower threshold on the number of cells a segment should contain in order to be considered a proper bound state. This again is easily solved in the robust method.
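The fast method can be sketched as a one-cell dilation followed by a logical AND. This toy version of ours uses a 6-neighbour dilation; the actual kernel's notion of "displaced by one in any direction" may also include diagonal displacements.

```python
import numpy as np

# Toy version of the fast pq-detection: mark cells pierced by p- and
# q-windings, dilate the q array by one cell and AND with the p array.
N = 16
p_cells = np.zeros((N, N, N), dtype=bool)
q_cells = np.zeros((N, N, N), dtype=bool)
p_cells[4, 4, :] = True   # straight p-string along z
q_cells[4, 5, :] = True   # q-string displaced by one cell in y

q_dilated = q_cells.copy()
for ax in range(3):
    for shift in (1, -1):
        q_dilated |= np.roll(q_cells, shift, axis=ax)

overlap = p_cells & q_dilated
n_overlap = int(overlap.sum())
assert n_overlap == N   # every cell of the p-string is flagged as overlap
```

Because the dilation is applied cell-by-cell, no segment structure is recovered, which is precisely the limitation the two bullet points above describe.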
We could argue that these biases might not be significant, or at least that there is a reasonable chance of obtaining the expected results: L_{pq} will be dominated by long strings (and not by small, few-cell strings), and an off-by-one error in long segments will not incur a significant difference in the total string length. Nonetheless, the robust method also has an advantage, as we shall see next: we can choose the overlap to occur at greater distances than just one cell over. This is due to the flexibility of the filter that identifies overlap between two data sources in Paraview. However robust and flexible the second method may be, the first one already yields reasonable results, in agreement with the literature. In order to show this explicitly, let us define some additional quantities besides the estimators ξ_p, ξ_q and ξ_pq.² Similarly to [2], one can compute two fractions which show the relative abundance of bound states. The first is the total fraction of bound states, f_total,
f_{\rm total} = \frac{L_{pq}}{L_{pq} + L_p + L_q}    (5.37)
which is expected not to exceed 1–2% once scaling is reached. We can also compute this abundance relative to the length of one of the constituent strings, for instance relative to L_p. In this case, and following the definition and notation of [2],
f_p = \frac{L_{pq}}{L_p} \, .    (5.38)
All mean string separation estimators ξ_p, ξ_q and ξ_pq, and both fractions f_total and f_p, can be seen in Fig. 5.5 for both the radiation (left-hand side) and matter (right-hand side) epochs, having used the fast method to compute L_pq. All string separations achieve scaling, although the behavior of ξ_pq is far less "smooth" than that of ξ_p and ξ_q. Even though we could be tempted to attribute this effect to the two systematic error sources listed previously, it might instead be related to the low abundance of bound states. In fact, even in the literature [2] the "jagged" behavior of ξ_pq can be observed. We note as well that the fraction f_total is always within 1–2%, as seen in the aforementioned literature references, and that it seems to be slightly larger in the matter era (bottom panels of Fig. 5.5). The fraction of bound states relative to p-strings is also in agreement with the literature values, being about 4% in the matter era (Table 5.2).
We can now discuss the asymptotic values of the several mean string separations. Note that this still assumes the fast computation, and we will need to re-assess this comparison with the robust method. Table 5.2 compares the asymptotic rate of change of ξ for the full network, ξ̇_W, for p-strings (equivalently q-strings), ξ̇_p, and for bound states, ξ̇_pq. For comparison we add the Abelian-Higgs values, where the full network and p-string estimators coincide, and for the full network and p-strings we compare to the work of [5]. Note that we have no literature values to compare ξ̇_pq with, so those values are presented for reference only; in the aforementioned reference the typical distance between Y-junctions is computed in a very different way, using the number of pq-segments. The values of ξ̇_W and ξ̇_p are both larger than those presented by [5]. The addition of error bars to the values of [5] could certainly lessen the disagreement, although we suspect more factors play a part here. One thing to note is the resolution of the literature simulation, 512³ with lattice spacing Δx = 1.0, which could affect the reliability of the mean
² Remember that the lengths L_p and L_q require subtraction of L_pq and are therefore dependent on bound state identification. This is the reason we place the discussion of the asymptotic behavior of ξ_{p,q} in this section, and not in the previous one.
Fig. 5.5 All panels showcase the evolution of several quantities derived from the total length of pq-strings present in the box, L_pq, obtained with the fast computation method. The mean string separations ξ_p and ξ_q are shown in the two upper plots (red and blue, respectively); ξ_pq in the middle panels (green); and the relative abundances of pq-strings, f_total and f_p, in the lower panels (purple and red, respectively). As in previous figures, the left-hand side corresponds to radiation epoch, the right-hand side to matter epoch
string separation estimates and the evolution of the fields themselves. We do not think the difference in damping or in bound state identification can explain this tension, as the fractions f_total and f_p are in agreement with [5]. Moreover, given the small values of f_total and f_p, the identification of bound states cannot impact ξ_p, ξ_q and ξ_W significantly (note that this might not be the case for ξ_pq).
Table 5.2 The asymptotic rate of change of the mean string separation ξ: for the full network using windings, ξ̇_W; for p-strings only, ξ̇_p; and for pq-strings, ξ̇_pq. For comparison we provide both the literature values (which have no reported uncertainties) and the Abelian-Higgs values (where the full network estimator is equivalent to having only a single string type, say p-strings)

Size, Δx         | m   | ξ̇_W          | ξ̇_p          | ξ̇_pq         | Reference
1024³, Δx = 0.5  | 1/2 | 0.280 ± 0.026 | = ξ̇_W        | –             | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 1/2 | 0.194 ± 0.026 | 0.270 ± 0.050 | 2.488 ± 0.612 | This section, fast method
1024³, Δx = 0.5  | 1/2 | –             | –             | –             | [2]
512³, Δx = 1.0   | 1/2 | 0.15          | 0.22          | –             | [5]
1024³, Δx = 0.5  | 2/3 | 0.279 ± 0.016 | = ξ̇_W        | –             | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 2/3 | 0.194 ± 0.022 | 0.277 ± 0.042 | 1.634 ± 0.721 | This section, fast method
1024³, Δx = 0.5  | 2/3 | –             | –             | –             | [2]
512³, Δx = 1.0   | 2/3 | 0.15          | 0.21          | –             | [5]
Before we move to the characterization of the robust method, there is an additional quantity which has been used to characterize pq-string field theory simulations: the average physical length of pq-segments. It can be defined as

l_pq = L_pq / N_pq    (5.39)

where N_pq is the number of pq-segments and L_pq is now defined as L_pq = (N_cells − 1) × Δx. We additionally consider only regions with N_cells larger than 7 (equivalently, L_pq > 3.0). Here we note another disadvantage of the fast method: since it does not classify windings into respective segments, it does not compute N_pq, meaning that in order to study this quantity we must use the robust method, which we now introduce.
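Equation (5.39) and the two cuts just described can be sketched as follows (a hypothetical helper; the per-segment cell counts are assumed to come from a segment classification such as the robust method's):

```python
def average_segment_length(segment_cells, dx=0.5, min_cells=7):
    """Average pq-segment length l_pq = L_pq / N_pq, with the two cuts
    used above: each segment contributes (Ncells - 1) * dx (off-by-one
    correction), and segments with Ncells <= min_cells are discarded
    (equivalently L_pq > 3.0 for dx = 0.5)."""
    kept = [n for n in segment_cells if n > min_cells]
    if not kept:
        return 0.0, 0.0, 0
    lengths = [(n - 1) * dx for n in kept]
    L_pq = sum(lengths)
    return L_pq / len(kept), L_pq, len(kept)
```

For example, segments of 3, 9 and 21 cells leave two segments after the threshold, of lengths 4.0 and 10.0, so l_pq = 7.0.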
This method relies heavily on the visualization tooling introduced in Chap. 3: the Catalyst adaptor was changed to output cells pierced by windings of the first sector, windings of the second sector, the interaction potential V_int, and cells identified as pierced by pq-segments via the fast method. We further note that the value of each cell will depend on the absolute number of windings that pierce the cell through its plaquettes: one p-string at criticality with λ_p = 2e_p² = 2 will pierce two faces of a cell, resulting in an output value of 2. This is important as strings with higher windings, say a (2,0)-string, would appear as a collection of cells valued at 4 for p-winding cells, but with no corresponding cells for q-windings. This also allows us to confirm that no (2,0), (0,2), or other higher-winding bound states were found. In total this means we end up with four (optional) output files per timestep: one per winding type (p, q or pq) and one for the interaction potential isosurface. All outputs are then
passed to a custom pipeline, adapted from the centerlines script of Chap. 3, which visualizes not only an isosurface of V_int but also both types of winding cells, finds overlapping cells (pq-strings), and applies a new connectivity filter to separate each individual string segment. From this we can perform (and output) the computations of L_pq and N_pq, while also correcting for the possible systematic error sources described above.
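The role of the connectivity filter can be illustrated with a simple 6-connected flood fill over the boolean overlap mask (a stand-in sketch; the actual pipeline uses the visualization tooling, and periodic boundaries are ignored here for brevity):

```python
import numpy as np
from collections import deque

def split_segments(overlap):
    """Separate a boolean pq-overlap mask into individual segments via
    6-connected flood fill (a stand-in for the connectivity filter);
    returns the cell count of each segment."""
    visited = np.zeros_like(overlap, dtype=bool)
    counts = []
    for start in zip(*np.nonzero(overlap)):
        if visited[start]:
            continue
        visited[start] = True
        queue, size = deque([start]), 0
        while queue:
            x, y, z = queue.popleft()
            size += 1
            for sx, sy, sz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                nb = (x + sx, y + sy, z + sz)
                if all(0 <= nb[i] < overlap.shape[i] for i in range(3)) \
                        and overlap[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        counts.append(size)
    return counts
```

The per-segment cell counts returned here are exactly what N_pq and the length cuts of Eq. (5.39) require.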
We further remark that adjusting the size of the overlap region is crucial for better bound state identification, as is immediately obvious from comparing the top (interaction potential), middle (exact cell overlap) and bottom (one-cell overlap threshold) panels of Fig. 5.6. Although both identification mechanisms are clearly reasonable, it is also clear that in some cases not allowing some margin for the overlap inevitably fails to identify all cells pierced by bound states.
We then present the average length of pq-segments, l_pq, and the mean string separation ξ_pq computed via this method, introducing one by one each of the choices made to lessen systematic effects. For this comparison we will use a single 512³ matter epoch run, and compute these quantities first with no correction, then correcting the off-by-one error, and then additionally using a minimum threshold on the length of segments. Note that we keep only one cell of overlap, on purpose, so the uncorrected case is equal to the fast method. The results can be seen in Fig. 5.7.
It is immediately obvious that the off-by-one correction does not heavily alter l_pq. This is somewhat expected, as the number of segments stays the same and only the length of each segment is reduced by Δx. However, the second correction does significantly alter the average length, which makes sense for two reasons: first, the threshold value is higher than the average uncorrected l_pq; second, removing small segments might not change L_pq drastically, but it will reduce the number of segments N_pq. Given that there is a large number of small segments (as can be seen visually in Fig. 5.6), setting a minimum length threshold will therefore alter the number of segments, and consequently l_pq. Curiously, l_pq seems to be stable, whereas in [2] it
begins to evolve linearly around η ∼ 150 for the normal simulations (in combined simulations, l_pq is also stable). This might be due to different identification criteria, but it might simply be a result of the smaller dynamic range combined with different initial conditions. Speaking of different criteria, recall that in the work of [2] there is a higher minimum segment length (L_pq = 20) and a maximum distance between windings of 4 lattice units. There is of course no unique way to establish each of these criteria (only trial and error and visualization), and l_pq is clearly highly sensitive to them. Using the one-cell-over overlap, we find our choice of minimum segment length to be reasonable at removing small segments. Comparing to the case of [5], linear evolution also occurs around time η < 100, but no data is available before that.
We also need to discuss ξ_pq. Each of the corrections increases its value by about ∼10% at all timesteps, although it does not significantly alter the tendency of ξ_pq. It therefore seems the fast method is reasonable at revealing the scaling evolution of pq-strings. We additionally note that the abundances f_p, f_total and the mean string separations ξ_W, ξ_p, ξ_q are not significantly altered, given that L_pq is also much smaller than L_p, L_q.
Fig. 5.6 Different views of a snapshot of a 512³, Δx = 0.5 simulation at conformal time η = 101.0 in matter epoch. Cells pierced by p-strings are color coded in blue, those pierced by q-strings in red, and pq-strings in green. In the left-hand-side panels we display all string types: p-strings, q-strings and pq-segments. In the right-hand-side panels we show only pq-strings. The identification of bound states is done either via the interaction potential isosurface ζ = 0.855 (top) or via winding cell overlap (exact in the middle panel, one cell of tolerance in the lower panel, obtained via the fast method). Note one additional detail: we are not excluding regions of cells using a minimum threshold on the length of the bound state
Fig. 5.7 Plots showing the evolution of l_pq and ξ_pq in the uncorrected case (which corresponds to the fast method), after correcting for the off-by-one error, and after additionally imposing a minimum segment length
Overall we conclude that the fast method allows a lightweight exploration of the dynamics of a cosmic superstring network, with the exception of the computation of l_pq, the average pq-segment length. Note that it will of course carry corrections of order ∼20% to ξ_pq, although the expected conclusions do not change.
5.3 Impact of Physical Evolution
We will now assess the impact of varying the comoving string width behavior on bound state abundance and network properties. To do so we will increase the lattice size to 4096³, keeping the same lattice spacing Δx = 0.5 and conformal timestep size Δη = 0.1. This update gives us more dynamic range, but even then the "true" comoving string width r_s would vary too quickly in both radiation and matter epochs (r_s ∝ a^(−β), with β = 1) for us to be able to resolve the string network either at late or early times. Therefore, we will use the core growth trick of [1] to allow the string width to grow in the initial stages of the simulation by setting β < 0. The way we choose these values is to pick a maximum string width, normalize the scale factor such that the radius is unity at the end of the simulation, and from this take note of the maximum radius at some transition time (we choose to fix either one or the other). The transition time is when the value of β jumps from core growth to physical evolution. The value of β for the growth phase is chosen such that the simulation begins with unit radius and reaches the maximum radius at the transition time.
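As an illustration, and under assumptions not all stated explicitly here (a ∝ η^(m/(1−m)) with a normalized to unity at η_end = 1024, a growth phase starting at η ≈ 5, and r_s ∝ a^(−β)), this recipe can be sketched numerically; with these assumed values it approximately reproduces the β_growth and r_s,max entries of Table 5.3:

```python
import math

def core_growth_params(m, eta_start, eta_trans, eta_end):
    """Sketch of the parameter choices described above.

    Assumes a scale factor a ∝ η^(m/(1-m)) (m = 1/2 radiation, 2/3
    matter) normalized so that a(eta_end) = 1, and a comoving string
    width r_s ∝ a^(-β).  Physical evolution (β = 1) after eta_trans,
    with r_s = 1 at the end, fixes r_max = 1/a(eta_trans); the growth
    exponent β_growth < 0 is then chosen so that r_s grows from 1 at
    eta_start to r_max at eta_trans."""
    p = m / (1.0 - m)                     # a ∝ η^p
    a = lambda eta: (eta / eta_end) ** p  # normalized scale factor
    r_max = 1.0 / a(eta_trans)
    beta_growth = -math.log(r_max) / math.log(a(eta_trans) / a(eta_start))
    return beta_growth, r_max
```

With the assumed η_start = 5 and η_end = 1024, radiation (m = 1/2) gives β_growth ≈ −0.259 and r_s,max ≈ 3, while matter (m = 2/3) gives r_s,max ≈ 9, close to the Table 5.3 values.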
For the same choice of maximum radius, and again the same normalization at the end of the simulation, the radius changes more quickly in matter epoch than in radiation for physical evolution; as such, the transition time tends to be later. However, we would like to have the same dynamic range in either core growth or physical evolution, in both matter and radiation. As such we fix the transition time, and instead vary the
Table 5.3 A summary of the choices made for the damping of the initial conditions, the core growth phase, the conformal time at which one changes to physical evolution, and the maximum string radius, for physical simulations (β = 1) and constant comoving width ones (β = 0), in both matter and radiation epochs

| m | m_damping | β_growth | β | η_transition | r_s,max |
|---|---|---|---|---|---|
| 1/2 | 0.95 | −0.25984202 | 1 | 343.0 | 3.0 |
| 1/2 | 0.95 | 0.0 | 0 | – | 1.0 |
| 2/3 | 0.75 | −0.25984981 | 1 | 343.0 | 9.0 |
| 2/3 | 0.75 | 0.0 | 0 | – | 1.0 |
maximum radius. Additionally, before the growth phase, there is a period of diffusive and damped evolution, as applied in the validation section. Nonetheless, we make one small change, given the over-damping noted in matter epoch in the validation section: we use a lower damping expansion rate of m = 0.75 for matter, while we keep m = 0.95 for radiation era. During the damping period, and to simplify our lives, we keep the comoving width constant (r_s = 1). A summary of all parameter choices can be seen in Table 5.3.
For comparison purposes, we will also evolve constant comoving width simulations with the same periods of diffusive and damped evolution (same damping expansion rate), and the same set of 10 initial conditions is used for all choices (matter, radiation, physical and constant width simulations). As such we will have three different cases to compare: β = 0, β = 1 and β < 0. As lowering β from 1 breaks the time-invariance of the discretized action, one can also think of β as a parameter that controls the violation of energy conservation. Given the role of kinematics in junction dynamics (see [3]), it is not entirely unexpected that changing β might have an impact on the formation/destruction of pq-segments. In this sense, the core growth period, which is a necessary "evil" for shrinking-width simulations, ends up giving us an additional source of comparison.
Let us begin with the full network mean string separation, shown in the panels of Fig. 5.8 for both physical (and core growth) and constant comoving width simulations (top and lower panels, respectively), for both radiation (left-hand side) and matter epoch (right-hand side). Overall there is reasonable agreement between physical and comoving width simulations in the asymptotic values of ξ̇, computed for the conformal time range η ∈ [950, 1024]. Again we see a ∼10% difference between the two length estimators ξ_L and ξ_W. This agreement is not entirely unexpected, and neither is the existence of a different slope in the core growth regime. This is particularly obvious if we compute the ratio between the quantities in the upper and lower panels: in core growth the difference between it and PRS keeps increasing, while this tendency is inverted as soon as we shift to physical evolution. This, for now, tells us very little about the behavior of bound states in core growth + physical simulations, and therefore we must describe other quantities.
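The asymptotic slopes quoted throughout are obtained from the late-time window; a minimal sketch of such an estimate (a plain least-squares fit over η ∈ [950, 1024]; the actual pipeline also propagates uncertainties, which we omit here):

```python
import numpy as np

def asymptotic_rate(eta, xi, window=(950.0, 1024.0)):
    """Least-squares estimate of the asymptotic slope dxi/deta over
    the late-time conformal time window used in the text."""
    eta, xi = np.asarray(eta), np.asarray(xi)
    mask = (eta >= window[0]) & (eta <= window[1])
    slope, _ = np.polyfit(eta[mask], xi[mask], 1)
    return slope
```

Applied to ξ_W(η), ξ_p(η) or ξ_pq(η), this yields the ξ̇ entries reported in the tables (up to the uncertainty treatment).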
Before we discuss relative abundances and the mean string separation of each species of string, we will first take a small detour to discuss velocities, computed either for
Fig. 5.8 The four panels show the evolution of the mean string separation for the entire network (all string species included), using either the Lagrangian length estimator (in orange) or the winding length estimator (in purple). Left panels correspond to radiation epoch, right to matter epoch. Top panels use core growth and subsequent physical evolution, lower panels correspond to constant comoving width simulations. L_pq is computed via the fast method
the full network (Lagrangian weighted) or for pq-segments (interaction potential weighted). The evolution of these quantities in both growth+physical and constant comoving width simulations can be found in the top and middle panels of Fig. 5.9, respectively. In both aforementioned figures and in Table 5.4 we can find the asymptotic values of the velocities, computed in the conformal time range η ∈ [950, 1024]. Note that in some cases the velocities are not exactly stable and are still decreasing in the range used. This is the case of v²_pq, which keeps decreasing in all cases
Fig. 5.9 The panels show the evolution of the mean squared velocity v²_W for either the full network (by specifying the full Lagrangian as a weight function, in orange) or the pq-segments (by using the interaction potential as the weight, colored in green). Top panels use core growth and subsequent physical evolution, lower panels correspond to constant comoving width simulations. Left panels correspond to radiation epoch, right to matter epoch
Table 5.4 The asymptotic values of the mean velocity squared v²_W for either the full network (weighted by the Lagrangian) or pq-segments (weighted by the interaction potential), for the simulations from this section, Abelian-Higgs simulations from Chap. 3, and pq-string simulations from [2]

| β | m | Size, Δx | v²_pq | v²_L | Reference |
|---|---|---|---|---|---|
| 0 | 1/2 | 4096³, Δx = 0.5 | 0.307 ± 0.005 | 0.298 ± 0.005 | This section |
| 0 | 1/2 | 1024³, Δx = 0.5 | 0.319 ± 0.008 | 0.293 ± 0.006 | Previous section |
| 0 | 1/2 | 1024³, Δx = 0.5 | ∼0.33 | 0.306 ± 0.004 | [2] |
| 1 | 1/2 | 4096³, Δx = 0.5 | 0.309 ± 0.008 | 0.298 ± 0.004 | This section |
| 0 | 2/3 | 4096³, Δx = 0.5 | 0.254 ± 0.005 | 0.250 ± 0.004 | This section |
| 0 | 2/3 | 1024³, Δx = 0.5 | 0.247 ± 0.006 | 0.253 ± 0.009 | Previous section |
| 0 | 2/3 | 1024³, Δx = 0.5 | ∼0.27 | 0.264 ± 0.006 | [2] |
| 1 | 2/3 | 4096³, Δx = 0.5 | 0.249 ± 0.006 | 0.246 ± 0.006 | This section |
Table 5.5 The asymptotic rate of change of the mean string separation ξ: for the full network using windings (ξ̇_W), for p-strings only (ξ̇_p) and for pq-strings (ξ̇_pq). For comparison we provide both the literature values (which have no uncertainties reported) and the Abelian-Higgs values (where the full network estimator is equivalent to only having a single string type, say p-strings)

| β | m | Size, Δx | ξ̇_W | ξ̇_p | ξ̇_pq | Reference |
|---|---|---|---|---|---|---|
| 0 | 1/2 | 4096³, Δx = 0.5 | 0.172 ± 0.006 | 0.242 ± 0.010 | 1.501 ± 0.375 | This section, fast method |
| 0 | 1/2 | 1024³, Δx = 0.5 | 0.194 ± 0.026 | 0.270 ± 0.050 | 2.488 ± 0.612 | Previous section, fast method |
| 0 | 1/2 | 512³, Δx = 1.0 | 0.15 | 0.22 | – | [5] |
| 1 | 1/2 | 4096³, Δx = 0.5 | 0.179 ± 0.008 | 0.267 ± 0.015 | 2.012 ± 0.012 | This section, fast method |
| 0 | 2/3 | 4096³, Δx = 0.5 | 0.174 ± 0.007 | 0.248 ± 0.007 | 1.460 ± 0.240 | This section, fast method |
| 0 | 2/3 | 1024³, Δx = 0.5 | 0.194 ± 0.022 | 0.277 ± 0.042 | 1.634 ± 0.721 | Previous section, fast method |
| 0 | 2/3 | 512³, Δx = 1.0 | 0.15 | 0.21 | – | [5] |
| 1 | 2/3 | 4096³, Δx = 0.5 | 0.194 ± 0.011 | 0.286 ± 0.021 | 1.573 ± 0.292 | This section, fast method |
except in matter era with constant comoving width. Again we see agreement in this time range for both physical and PRS simulations; however, from the figures one can also infer a clear disagreement between core growth pq-string velocities and constant width ones. To quantify the disagreements in core growth or in the physical era we can compute the ratio between the quantities in the upper and middle panels. The lower panels show that physical evolution and constant comoving width are closer to each other than the core growth phase is to either (Table 5.5).
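Both velocity estimators described above reduce to the same weighted average, differing only in the weight field; a minimal sketch (array names are ours):

```python
import numpy as np

def weighted_mean_v2(v2, weight):
    """Weighted mean squared velocity <v^2> = sum(w * v^2) / sum(w).
    Weighting by the Lagrangian density gives the full-network
    estimator; weighting by the interaction potential V_int restricts
    the average to pq-segments."""
    v2, w = np.asarray(v2, dtype=float), np.asarray(weight, dtype=float)
    return float(np.sum(w * v2) / np.sum(w))
```

The same lattice-wide v² field is thus reused for both estimators, only the weight array changes.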
And so we move on to discuss the mean string separations of each string species, ξ_p, ξ_q and ξ_pq, along with relative abundances. All figures for the relevant quantities can be found in Fig. 5.10 for physical simulations; the equivalent PRS ones are present in Fig. 5.11, for both radiation and matter epochs. The asymptotic rate of change of ξ can be found in the corresponding panels, and is also summarized in Table 5.5. The figures of ξ_p and ξ_q are rather unsurprising, in the sense that scaling is reached for both types of simulations and that this scaling is consistent (reasonably similar) in terms of asymptotic ξ̇ in physical and PRS simulations. The real surprise starts revealing itself in the figures of ξ_pq, where the change from core growth to physical evolution clearly signals a change in the behavior of bound states. While in core growth the mean pq-string separation increases linearly, as soon as the transition occurs this quantity decreases, signaling a possible production of bound states, and then inverts its tendency back to growing linearly (albeit with a different ξ̇).
This peculiarity tells us that a change in pq-string abundances must have occurred after the jump to physical evolution, and that in principle production of bound states must have taken place. Given that we observe linear scaling again at the very end of the simulation, this also means the bound state abundances must have stabilized. Our suspicions are confirmed by the lower panels of Fig. 5.10, which show the behavior of f_total and f_p. Both decrease in the core growth phase to much lower abundance than in the PRS case. For instance, f_p in radiation drops to 0.5%, indicating a much sparser pq-string network than in the β = 0 case. However, as soon as the transition happens, p- and q-strings begin binding more and more, producing more bound states. At the end of the simulation, both physical and PRS abundances are in agreement. The situation is not quite the same for matter era: although the reason is unclear, bound state abundances appear slightly larger (at most 1% for f_total). We additionally note different tendencies in the late time behavior of most abundances (although, due to the large uncertainties, one should take this with a grain of salt). In radiation epoch there seems to be a decreasing tendency for constant comoving width, but a stable abundance for physical simulations. In matter epoch, the stable abundances of β = 0 seem to be decreasing in the β = 1 case (albeit we do not presently know what would happen with larger dynamic range).
From this we can conclude that changing β can clearly have an impact on the formation and destruction of pq-strings, changing their abundances. This follows from the fact that values of β different from unity introduce a time dependence into the action, thus spoiling energy conservation. Compared with β = 0 simulations, it is clear the core growth period can result in lower abundances, and in the case of matter epoch, β = 1 will yield slightly higher abundances. Overall, the effect of physical evolution still results in a sparse pq-string network and no evidence of frustration is found, with all string species scaling.
Note that we still need to verify whether these conclusions hold up with the robust method, and to study the evolution of l_pq. We will also take the time to introduce further refinements to the robust method (such as the use of classification instead of the connectivity filter) and to use Travelling Salesman methods to create filaments whose length can be easily computed. The 4096³ outputs from Piz Daint have been produced and will be analysed over the next months.
Fig. 5.10 All panels showcase the evolution of several quantities derived from the total length of pq-strings present in the box, L_pq, obtained via the fast computation method, except that now we use core growth up to conformal time η = 343.0 and physical evolution from then onwards, for both radiation and matter. The mean string separations ξ_p and ξ_q are shown in the two upper plots (red and blue, respectively); ξ_pq in the middle panels (in green); and the relative abundances of pq-strings, f_total and f_p, in the lower panels (purple and red, respectively). As in previous figures, the left-hand side corresponds to radiation epoch, the right-hand side to matter epoch
Fig. 5.11 All panels showcase the evolution of several quantities derived from the total length of pq-strings present in the box, L_pq, obtained via the fast computation method, in constant comoving width simulations, for both radiation and matter. The mean string separations ξ_p and ξ_q are shown in the two upper plots (red and blue, respectively); ξ_pq in the middle panels (in green); and the relative abundances of pq-strings, f_total and f_p, in the lower panels (purple and red, respectively). As in previous figures, the left-hand side corresponds to radiation epoch, the right-hand side to matter epoch
5.4 Conclusion
In this chapter we have presented recent (still in progress) work using the U(1)_L × U(1)_L model of [4] to simulate networks with multiple string types. Our work began by implementing the model itself, appropriately modifying our multi-GPU Abelian-Higgs simulation, and then by validating its correctness.
The first step to do so was checking the validity of Gauss's law in both string sectors and visualizing bound states. The visualization strategy enabled us to use a more robust way to compute the mean pq-segment separation and the average number of bound state segments in the simulation. We also developed an alternative, less robust but much faster, way to compute the amount of string in bound states, which in conjunction with the network velocity estimators allowed us to verify good agreement with the work of [2]. There exists a large disagreement with the work of [5] for the slopes of some mean string separations, which warrants further investigation. Nonetheless, the bound state abundances are in agreement with both works.
After the validation was completed, we took to comparing PRS evolution with physical evolution, i.e. with the true equations of motion, which, given the kinematic conditions for bound state formation (see for example [3] and references therein), could have an impact on the abundances measured. In order to do so for as much time as possible, we used a combination of the core growth method and larger lattices of size 4096³, Δx = 0.5, hence a larger dynamic range than previously available. The combination of both allowed us to see how changing energy conservation in the action itself can impact the formation and destruction of bound states.
In the case of core growth nearly all bound states unwind into their basic constituent strings. As soon as the true physical evolution begins, bound states start re-forming and we even reach higher relative abundances than in the PRS case. For instance, f_total reaches around 5% versus about 4% in constant comoving width, for matter epoch. Even if the uncertainties are quite large, it seems both abundances exhibit opposite late-time tendencies: decreasing in the PRS case, and increasing in the true evolution case. It is presently unknown whether a larger dynamic range would result in a stabilization of all fractions, in either case. The core growth phase also shows how, for negative β, bound states are quickly unwound, and the result is a much lower abundance of bound states (order 0.1%, or 0.2% for f_total, f_p, in matter epoch with the fast method). This allows us to conclude that, even if the effect is quite marginal, changing how the comoving string width behaves (and therefore deviating from energy conservation at the level of the action) can easily affect which process is preferred: destruction or formation of bound states.
In the upcoming months we will analyze all the outputs necessary for the robust method and verify whether the conclusions remain in agreement with what has thus far been presented. In principle, given the comparisons made early in the validation section, we do not expect the conclusions to change.
References
1. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D 75:065015. https://doi.org/10.1103/PhysRevD.75.065015
2. Lizarraga J, Urrestilla J (2016) Survival of pq-superstrings in field theory simulations. JCAP 1604(04):053. https://doi.org/10.1088/1475-7516/2016/04/053
3. Rybak IY, Avgoustidis A, Martins CJAP (2019) Dynamics of junctions and the multitension velocity-dependent one-scale model. Phys Rev D 99:063516. https://doi.org/10.1103/PhysRevD.99.063516
4. Saffin PM (2005) A practical model for cosmic (p, q) superstrings. JHEP 09:011. https://doi.org/10.1088/1126-6708/2005/09/011
5. Urrestilla J, Vilenkin A (2008) Evolution of cosmic superstring networks: A numerical simulation. JHEP 02:037. https://doi.org/10.1088/1126-6708/2008/02/037
6. Witten E (1985) Cosmic superstrings. Phys Lett 153B:243–246. https://doi.org/10.1016/0370-2693(85)90540-4
Chapter 6
A New Generation of String Simulations
so much depends
upon
a red wheel
barrow
glazed with rain
water
beside the white
chickens
The Red Wheelbarrow,
William Carlos Williams
6.1 Overview and Concluding Remarks
The main objective of this thesis was to build a simulation capable of tackling more realistic and/or more exotic strings at resolutions adequate for deriving reliable constraints with future observational data. This would of course require extreme hardware resources and more optimized simulations than had been available thus far. Our final objective, in fact, was to explore cosmic superstrings, which can be done either via Nambu-Goto simulations (albeit no one has, to the author's knowledge, attempted to simulate multiple string types yet) or via models with two coupled field theory sectors, such as the U(1) × U(1) model of [25].
Although the double U(1) model has the distinct advantage of being an easy modification of Abelian-Higgs simulations (just copy all fields and add a coupling term to the potential), it also has the disadvantage of using a type of simulation more often bottlenecked by hardware resources and/or degree of optimization. The author's experience with (field theory) domain wall simulations and a previous interest in General Purpose Graphics Processing Unit programming eventually led us to take this path instead of the Nambu-Goto one.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. R. C. C. C. Correira, A New Generation of Cosmic Superstring Simulations,
Springer Theses, https://doi.org/10.1007/978-3-031-20229-2_6
To such an end, this thesis initially focused on creating simulations adequately tuned for extreme hardware resources, as seen in the third chapter. This journey began with the graphically accelerated domain walls software, where it was shown that there is a tangible benefit to using graphical processors in lieu of traditional Central Processing Units. To be more exact, when compared to the standard version of the domain walls code in two spatial dimensions, we obtained speed-ups of order 200 in single precision and order 50 in double precision. Having demonstrated that GPUs can be used for field theory simulations, we then took to implementing, in CUDA, the evolution of Abelian-Higgs strings on a lattice. We further attested to the speed-up benefit of using GPUs for field theory simulations, as a single GPU could complete operations at a rate of order 10⁻¹⁰ gpu-sec/site versus the 10⁻⁷-10⁻⁶ core-sec/site of the Latfield2-based Lattice Abelian Higgs.
Still, this much higher throughput needed to be put to the test in a supercomputing facility, where communications between GPUs could easily limit the scalability (and therefore the speed-ups) of our application. Fortunately, in such an environment (the Piz Daint supercomputer) we demonstrated weak scaling up to 4096 GPUs. In spite of the comparatively poor strong scaling, the weak scaling capabilities of this code allow an order 30 speed-up compared to Lattice Abelian Higgs. Given that only outputting average quantities would limit future scientific work, we also added in-situ visualization to our simulation, to evade or lessen any Input/Output bottleneck that might rear its ugly head. Using winding output with in-situ visualization instead of outputting the whole lattice, we showed a reduction of computational time and of output sizes at large lattice sizes (by a factor of 4 and by an order of magnitude, respectively).
The resulting simulations are therefore the "Red Wheelbarrow" of this work (see the epigraph at the beginning of this chapter). From them sprang the scientific works collected in the fourth chapter. These are based on the calibration of a version of the canonical semi-analytical model for string evolution, the Velocity-dependent One-Scale model, whose extended version features explicit radiative energy loss functions (in addition to loop production only) and a generalized momentum parameter. When introduced, it was shown to adequately predict the form of each velocity dependence function and even the evolution in the radiation-to-matter transition [20]. We further tested this model in the domain walls context, by calibrating it with the linear scaling reached by (anisotropic) super-horizon networks. This shows that even the approach to scaling is reasonably well described by the model after parameter determination. We then moved to the gauged strings case, where we first showed the model could describe the asymptotic values of small lattices.
From this preliminary calibration we could already determine that, unlike in the domain walls case, energy loss by loop production does contribute significantly to the energy loss of the network, and that the analytical ansatz for the momentum parameter cannot reproduce simulations. Given the importance of proper model parameter estimation for observational footprints of string networks, we set out both to improve our calibration pipeline and to find and eliminate possible systematic error sources. The modifications of the calibration pipeline included automatic error propagation and Bayesian inference for proper uncertainty estimation on model parameters. The exploration of systematic error sources was done in three different ways: by varying the amount of cooling of the initial conditions, by increasing the resolution of sub-horizon scales, and by choosing different velocity estimators. First, we saw that too large an amount of cooling has a significant impact on model parameters, suggesting a change in the resulting network: smoothened small-scale structure and a larger contribution of loop production to energy losses. Then, increasing dynamic range/lattice size suggests that there exists a minimum resolution beyond which model parameters change minimally (similar to the domain walls case). Changing velocity estimators seemed to result in an apparent contradiction of model predictions, where either radiative energy loss would be the only relevant energy loss process, or both loop production and radiative losses could contribute with equal weight at medium expansion rates (radiation, matter). However, this apparent contradiction can be resolved by reducing the lattice spacing, thus forcing both calibrations into agreement.
With this we suggest best- and worst-case scenario calibrations, and set out to show how this impacts observational consequences. To do so, we computed the Cosmic Microwave Background anisotropies for several cases and showed that different calibrations can induce scale-dependent differences. Although we do not compute new limits on the string tension, this illustrates the need for the best possible calibration.
After all of these tasks were completed we moved on to the end goal of this thesis:
to explore cosmic superstrings via simulations of multiple interacting cosmic strings.
All work is based on implementing the U(1)_L × U(1)_L model of [25]. Initially
we made sure that, despite differences in either lattice spacing or initial-condition
treatment, our simulations were compatible with what is shown in the literature, in
terms of velocity estimates and pq-segment abundances. Past this initial task, we
simulated (p,q)-string networks in this toy model with true physical comoving width
and higher resolution and dynamic range than was previously available. Given how
quickly the comoving string radius drops in these simulations, we also needed to use
the core growth "trick," which also tells us something about bound-state velocities and
abundances in this regime. We see that during core growth pq-segments quickly unwind
and the remaining few have a large velocity. Immediately after we switch to the true
physical evolution, bound states begin to form again, as evidenced by the behavior
of the relative abundances, and with lower and decreasing velocities. In the matter epoch,
the abundances eventually even exceed those of constant comoving width
simulations. After the transition to shrinking width, the
mean string separation of pq-segments does achieve scaling, even with the reduced
amount of conformal time available to reach it.
To wrap up this manuscript we will now consider possible
next steps, both from a computational perspective and in terms of
avenues for exploring the physics of cosmological defect networks.
6 A New Generation of String Simulations
6.2 Next Steps
6.2.1 Computational Improvements
In order to state some tentative next steps to improve the computational aspects of the
simulation, it helps to put into perspective some of the advances of recent years
across all simulation types. This will entail a comparison of the features of
certain codes and will result in a description of what the author believes an ideal (in
computational terms) Abelian-Higgs simulation might entail. Note that we cannot
promise to implement all of the features discussed here; they are merely food for
thought.
First, we have to look into two libraries that serve a similar purpose, LatField2
[9] and the recently unveiled CosmoLattice [11]. They share the goal of
letting the user implement fields on a lattice with
very little effort. Both use only MPI (one with a 1D domain decomposition, the
other with a 2D decomposition) and both have Fast Fourier Transform capabilities.
Although we could attempt to write a complete library ourselves, it can be more
difficult to optimize one for every single use case, especially in CUDA. Nevertheless, some
steps were already taken during development to make this simulation a bit more user-friendly and its code much more reusable. In terms of Fast Fourier Transform
capabilities, the need for a 2D decomposition can hinder performance; however,
due to the communication-heavy nature of FFTs they are bound to scale poorly anyway
(even worse if we consider using GPUs). Implementing this is a good way to delve into
the brute-force computation of Unequal Time Correlators for the observational footprints
of networks, so we should eventually add such capabilities to the simulations.
Luckily, reasonable weak scaling of FFTs has been obtained by the
pseudo-spectral solver of [24] and, in mixed CPU/GPU settings, with accFFT [12]. Last but
not least, the I/O server used by LatField2 easily solves the issue of communication between
aggregators dominating runtime. For outputs with sizes that are multiples of the
lattice size, in-situ processing makes this a non-issue. For initial-condition files we have thus far
used a file-per-process approach, after failing to achieve good performance in single-shared-file mode on Piz Daint. The bottleneck responsible for this is the communication
between processes and aggregator processes, and it is exactly here that a dedicated
I/O server could be useful. A shift to a hybrid approach with a file per aggregator
could also be studied, making use of the ADIOS2 library [13].
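As an illustration of the file-per-process strategy mentioned above, the following minimal Python sketch shows the basic idea: each rank writes its local slab to its own file, so no process-to-aggregator communication is needed. All names here are hypothetical (the production code writes binary lattice slabs, not .npy files, and obtains `rank` from MPI); this is a sketch of the pattern, not our implementation.

```python
import os
import tempfile
import numpy as np

def write_field_per_process(field, rank, outdir, prefix="ic"):
    """Each (pretend-)MPI rank writes its local slab of the lattice to its
    own file, sidestepping process-to-aggregator communication entirely.
    Illustrative sketch only."""
    path = os.path.join(outdir, f"{prefix}_rank{rank:05d}.npy")
    np.save(path, field)
    return path

# Serial demonstration, standing in for rank 0 of a larger job:
outdir = tempfile.mkdtemp()
local_slab = np.zeros((64, 64, 8), dtype=np.float32)  # this rank's share of the lattice
path = write_field_per_process(local_slab, rank=0, outdir=outdir)
```

A hybrid file-per-aggregator variant would instead group several ranks into one file, which is where a library such as ADIOS2 becomes useful.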
Next, we have two possibilities which could allow us to simulate even larger swathes
of spacetime, especially given that GPUs do not possess a lot of memory: the diamond
decomposition from the Nambu-Goto simulations of [5], and the Adaptive Mesh
Refinement capabilities of the library GRChombo [6], as seen in [10, 15]. The first
subdivides the 4D simulation volume into smaller 4D volumes with only initial
and final surfaces. There is no communication between volumes, since one merely
reads and writes the initial and final surfaces of each diamond. As such, the simulation is
never memory limited (although a larger number of diamonds will of course result in
faster simulation times). It is interesting to note that the most promising optimization
for stencil CUDA kernels makes use of a 4D lattice [22], suggesting the two could
work in tandem. The second way to evade memory costs is to use Adaptive
Mesh Refinement. In principle, in this technique one creates sub-lattices with
smaller spacing only in regions of interest, and maintains a coarse lattice everywhere
else. The level of refinement necessary will depend on the string width (if allowed
to shrink), and the number of cells tagged for refinement will depend on how dense
the network is; both quantities are related to the expansion rate. Implementing this
exclusively on GPUs can also be a technical challenge, as the (sub-)lattices
with fewer points might not fully exploit the high thread-level parallelism of
graphical accelerators. Therefore, we can hazard a guess that a mixed approach, with
CPU cores handling the lattices with fewer points and GPUs handling the denser lattices,
might be the correct call.
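To make the tagging step concrete, here is a minimal Python sketch of one plausible refinement criterion (the threshold, names, and criterion are our illustrative assumptions, not GRChombo's actual tagging API): cells where the scalar-field magnitude falls well below the vacuum expectation value, i.e. cells in or near a string core, are flagged as candidates for a finer sub-lattice.

```python
import numpy as np

def tag_cells_for_refinement(phi, vev=1.0, frac=0.5):
    """Tag lattice cells whose scalar-field magnitude is well below the
    vacuum value -- i.e. cells inside or near a string core -- as
    candidates for a finer AMR sub-lattice.  Hypothetical criterion."""
    return np.abs(phi) < frac * vev

# Toy example: a fake straight 'string' along z where |phi| dips towards zero.
phi = np.ones((32, 32, 32), dtype=complex)
phi[15:17, 15:17, :] = 0.1                # fake string core
tags = tag_cells_for_refinement(phi)
refined_fraction = tags.mean()            # network density sets the AMR cost
```

The fraction of tagged cells is what ultimately determines whether the AMR bookkeeping pays off relative to a uniform fine lattice.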
6.2.2 Small-Scale Structure of Abelian-Higgs Strings
In addition to the computational improvements described above, we can also highlight some extra physics that can be extracted from simulation outputs, in this case
precisely via the in-situ centerline capabilities described in Chap. 3. Knowing the
exact position of strings at multiple timesteps means we can extract worldsheet position vectors X and Ẋ for each string in the box. As kinks, cusps and loops are of
central importance in understanding the observational consequences of strings, this puts
us in the unique position of understanding small-scale structure features (called
wiggles) of strings evolving in different cosmological backgrounds.
To exemplify, consider the stochastic gravitational wave background (SGWB).
In the introduction we mentioned that the typical size of loops at formation (expressed
as a fraction of the horizon, α) and the loop spectra (Pi) are specified from Nambu-Goto
simulations; with centerline detection and reconstruction, we can extract similar
statistics and even compare the loop number density n(l, t) obtained via the VOS with
the directly measured one. There is, however, an important question that should be
answered before we do any of this: for which loops, and at what scales, is the Nambu-Goto approximation valid? The work of [21] has shown that initially square loops
decay according to Nambu-Goto up until a point where the loops are small enough
to decay far more rapidly than NG predicts. This led the authors to propose
the existence of a kink scale below which the NG approximation is invalid, and
above which loops would decay via gravitational radiation. However, [17] have shown,
with network loops formed in flat space, that the NG approximation fails at all scales.
While these results appear contradictory, in reality they show that, depending
on the initial configuration of loops and on their small-scale features, the NG
approximation may or may not hold. What these studies do not show, however, is
how loops in a cosmological background behave; and, as shown in [18], flat-space
strings have different small-scale properties from those of strings in the radiation or
matter epochs. This is also evident when visually comparing the loops of [17] with
radiation-era loops from [7, 8].
In order to investigate whether Nambu-Goto is a reasonable approximation of Abelian-Higgs strings, and at what scales, we are now collaborating with Daniel Jimenez-Aguilar, Jose Juan Blanco-Pillado and Jon Urrestilla to extract string centerlines and use them as initial conditions in Nambu-Goto simulations. We are currently
working on the 2D case (domain walls and Nambu-Goto), although we expect to soon
move towards 3D with local Abelian-Higgs strings. On our side, this involves finding
links throughout the lattice and interpolating where the center of the wall lies. This
results in a collection of points, separated into different regions (loops), which
need to be organized to construct a proper centerline. Note that the absence
of windings, and therefore of a magnetic flux to tell us how to connect each point to a
neighbor, means we need a different method to order the points. This involved exploring
Travelling Salesman algorithms to find the shortest path that links all these
points and thus constructs full centerlines. These inputs are then passed to our collaborators, who are able to compute local velocities and evolve in Nambu-Goto
as necessary.
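The point-ordering step above can be sketched with one of the simplest TSP-style heuristics, greedy nearest-neighbour chaining. This is an illustrative stand-in, not necessarily the heuristic used in our pipeline; function names are hypothetical.

```python
import numpy as np

def order_centerline_points(points):
    """Greedy nearest-neighbour ordering: starting from the first point,
    repeatedly jump to the closest unvisited point.  A simple TSP-style
    heuristic for chaining unordered centre-of-wall points into a line."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        d = np.linalg.norm(points - points[order[-1]], axis=1)
        d[visited] = np.inf          # never revisit a point
        nxt = int(np.argmin(d))
        order.append(nxt)
        visited[nxt] = True
    return order

# Shuffled points on a circle should come back out in angular order:
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
pts = np.c_[np.cos(theta), np.sin(theta)]
perm = np.random.default_rng(0).permutation(20)
order = order_centerline_points(pts[perm])
```

Greedy chaining is O(n²) per loop and can fail on very tangled configurations, which is why more sophisticated TSP heuristics are worth exploring.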
In parallel, we can also study how some properties of the string network change
with scale and provide a comparison with NG expectations. This change of "description with scale" is not an intrinsic property of the network; rather, it is a dependency
of the network's average features on scale. An example would be how the total length
of a coastline changes with the size of the "ruler" used to measure it. This dependency
on coarse- or fine-graining the description of a string has clear ties to the concept
of renormalization, and in fact one quantity which is renormalized for a wiggly string
is the mass per unit length μ. The more small-scale features a string network has, the
more μ diverges from the bare μ0 at small scales. As the coastline analogy might
have hinted, μ is also related to the multi-fractal dimension,

    ∂ ln μ / ∂ ln l ≈ d_m(l) − 1,                (6.1)
where d_m(l) is the multi-fractal dimension of the network and l is a coarse-graining
scale. The computation of the renormalized mass per unit length has recently started
being explored by Filipe Costa of the Faculty of Sciences of the University of Porto,
with small lattices (256³, Δx = 0.5) in the radiation era, using spheres to set the coarse-graining scale along the string. The idea is to vary the radius of the spheres necessary to
cover a segment between two points, and then compute the ratio between the Euclidean
and the comoving (along the string) distance. So far, preliminary tests of the script
for computing the renormalized mass per unit length provide the correct results for
simple cases (such as a semi-circle) and show an increase of μ for long strings from
small to large (near-horizon) scales. Once further refinements (particularly to
the performance of the pipeline) are ready, we will use it to study larger lattices
output on Piz Daint. Note that while a previous study of small-scale structure exists
for Abelian-Higgs networks [16], it was done only for 512³ (we are aiming for even
larger lattices) and started from a different set of variables (two-point correlations
of X and Ẋ) which can also be related to the fractal dimension and μ.
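The core of the coarse-graining idea can be sketched as follows. This is not the sphere-covering production pipeline; the keep-point criterion and names are illustrative assumptions. One measures the length of a discretized string at a coarse-graining scale l by walking along it and keeping only points at least a Euclidean distance l apart; the scale dependence of this length is what feeds a d_m(l) estimate and hence Eq. (6.1).

```python
import numpy as np

def coarse_grained_length(curve, l):
    """Walk along a discretised curve, keeping only points at least a
    Euclidean distance l from the previously kept point, and sum the
    resulting chord lengths.  The l-dependence of this length probes the
    multifractal dimension d_m(l).  Illustrative sketch only."""
    kept = [curve[0]]
    for p in curve[1:]:
        if np.linalg.norm(p - kept[-1]) >= l:
            kept.append(p)
    kept = np.array(kept)
    return np.sum(np.linalg.norm(np.diff(kept, axis=0), axis=1))

# Sanity check on a unit semicircle (one of the 'simple cases'):
t = np.linspace(0, np.pi, 2000)
semi = np.c_[np.cos(t), np.sin(t)]
fine = coarse_grained_length(semi, l=0.01)   # close to pi, the true arc length
coarse = coarse_grained_length(semi, l=1.0)  # shorter: small scales averaged out
```

For a smooth curve the two lengths converge as l shrinks; for a wiggly string the fine-grained length keeps growing, which is exactly the divergence of μ from μ0.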
Obtaining a complete description of μ will also enable an exploration of the
wiggly-string VOS [19, 26] and of the possible scaling solutions that exist in flat-space or cosmological backgrounds (see [1]). This model is derived by assuming
a modification of the Nambu-Goto action, in which wiggles are described as a mass current
propagating along the string. In this VOS there are partial differential equations
describing the evolution of the mass per unit length μ, the rms velocity and the total energy,
and how they relate to one another depending on scale. This will be explored in
collaboration with Ana Almeida and Filipe Costa, both of the Faculty of Sciences of
the University of Porto. Comparisons with the analytical models of [2, 23] are also
a possible avenue to explore.
6.2.3 Further Exploration of String Networks in the U(1)_L × U(1)_L Model
Given that we have already studied the scaling of pq-strings at large resolutions with
physical evolution (ergo, shrinking comoving string width), we can now also
explore the effect of unequal tensions for the constituent strings. Arguably, given
that the tension spectrum of cosmic superstrings arises from unequal tensions, with
μ_p = μ_F and μ_q = μ_F/g_s, we could consider this case the more realistic one. In
principle, all that is necessary is to set one of the symmetry-breaking scales such that
2σ_p = σ_q. In this particular case, one should carefully consider how to introduce
new velocity estimators and different criteria (from overlaps to interaction-potential
thresholds) to characterize each possible (new) species of string.
One possible avenue to explore in this particular model pertains to a different choice
of parameter values. If the coupling constant is chosen such that κ < −√(λ_q λ_p),
only one of the U(1) symmetries is broken. This results in a string in the broken sector,
with a condensate core courtesy of the unbroken symmetry. Such strings possess a
current and are therefore known as superconducting strings. To the author's knowledge, and at the time of writing, a full cosmological network simulation of superconducting strings has never been performed. The current cosmic superstrings code
could be used for this endeavour, although it requires some careful tuning of parameters
(to ensure a non-vanishing, positive-mass condensate inside the strings, see [14])
and of numerical choices. One example of a numerical choice would be the initial
conditions, where the scalar fields of the charged sector can be chosen to
mimic a homogeneous charged background, as in [3, 4], and the gauge fields of this
sector must be set so as to obey Gauss's law.
References
1. Almeida ARR, Martins CJAP (2021) Scaling solutions of wiggly cosmic strings
2. Austin D, Copeland EJ, Kibble TWB (1993) Evolution of cosmic string configurations. Phys
Rev D 48:5594–5627. https://doi.org/10.1103/PhysRevD.48.5594
3. Battye RA, Pearson JA (2010) Charge, junctions and the scaling dynamics of domain wall
networks. Phys Rev D 82:125001. https://doi.org/10.1103/PhysRevD.82.125001
4. Battye RA, Pearson JA, Pike S, Sutcliffe PM (2009) Formation and evolution of kinky vortons.
JCAP 09:039. https://doi.org/10.1088/1475-7516/2009/09/039
5. Blanco-Pillado JJ, Olum KD, Shlaer B (2012) A new parallel simulation technique. J Comput
Phys 231:98–108. https://doi.org/10.1016/j.jcp.2011.08.029
6. Clough K, Figueras P, Finkel H, Kunesch M, Lim EA, Tunyasuvunakool S (2015) GRChombo:
numerical relativity with adaptive mesh refinement. Class Quant Grav 32(24):245011. https://
doi.org/10.1088/0264-9381/32/24/245011
7. Correia J, Martins C (2021a) High-resolution GPU-accelerated Abelian-Higgs string simulation: length colormap, dataset on zenodo. https://doi.org/10.5281/zenodo.4710664
8. Correia J, Martins C (2021b) High-resolution GPU-accelerated Abelian-Higgs string simulation: velocity colormap, dataset on zenodo. https://doi.org/10.5281/zenodo.4710670
9. Daverio D, Hindmarsh M, Bevis N (2015) LatField2: a C++ library for classical lattice field theory
10. Drew A, Shellard EPS (2019) Radiation from global topological strings using adaptive mesh
refinement: methodology and massless modes
11. Figueroa DG, Florio A, Torrenti F, Valkenburg W (2021) CosmoLattice
12. Gholami A, Hill J, Malhotra D, Biros G (2015) Accfft: a library for distributed-memory FFT
on CPU and GPU architectures. CoRR, abs/1506.07933 http://arxiv.org/abs/1506.07933
13. Godoy WF, Podhorszki N, Wang R, Atkins C, Eisenhauer G, Gu J, Davis P, Choi J, Germaschewski K, Huck K, Huebl A, Kim M, Kress J, Kurc T, Liu Q, Logan J, Mehta
K, Ostrouchov G, Parashar M, Poeschel F, Pugmire D, Suchyta E, Takahashi K, Thompson N, Tsutsumi S, Wan L, Wolf M, Wu K, Klasky S (2020) Adios 2: the adaptable input output system. a framework for high-performance data management. SoftwareX 12:100561. ISSN 2352-7110. https://doi.org/10.1016/j.softx.2020.100561. https://
www.sciencedirect.com/science/article/pii/S2352711019302560
14. Hartmann B, Carter B (2008) Logarithmic equation of state for superconducting cosmic strings.
Phys Rev D 77:103516. https://doi.org/10.1103/PhysRevD.77.103516
15. Helfer T, Aurrekoetxea JC, Lim EA (2019) Cosmic string loop collapse in full general relativity.
Phys Rev D 99(10):104028. https://doi.org/10.1103/PhysRevD.99.104028
16. Hindmarsh M, Stuckey S, Bevis N (2009) Abelian Higgs cosmic strings: small scale structure
and loops. Phys Rev D 79:123504. https://doi.org/10.1103/PhysRevD.79.123504
17. Hindmarsh M, Lizarraga J, Urio A, Urrestilla J (2021) Loop decay in Abelian-Higgs string
networks
18. Martins CJAP, Shellard EPS (2006) Fractal properties and small-scale structure of cosmic string
networks. Phys Rev D 73:043515. https://doi.org/10.1103/PhysRevD.73.043515
19. Martins CJAP, Shellard EPS, Vieira JPP (2014) Models for small-scale structure on cosmic strings: mathematical formalism. Phys Rev D 90(4):043518. https://doi.org/10.1103/
PhysRevD.90.043518
20. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
21. Matsunami D, Pogosian L, Saurabh A, Vachaspati T (2019) Decay of cosmic string loops due
to particle radiation. Phys Rev Lett 122(20):201301. https://doi.org/10.1103/PhysRevLett.122.
201301
22. Nguyen A, Satish N, Chhugani J, Kim C, Dubey P (2010) 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: 2010 ACM/IEEE international conference for high performance computing, networking, storage and analysis, pp 1–13. https://doi.org/10.1109/SC.2010.2
23. Polchinski J, Rocha JV (2007) Cosmic string structure at the gravitational radiation scale. Phys Rev D 75:123503. https://doi.org/10.1103/PhysRevD.75.123503
24. Ravikumar K, Appelhans D, Yeung PK (2019) GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC '19, New York, NY, USA. Association for Computing Machinery. ISBN 9781450362290. https://doi.org/10.1145/3295500.3356209
25. Saffin PM (2005) A practical model for cosmic (p, q) superstrings. JHEP 09:011. https://doi.org/10.1088/1126-6708/2005/09/011
26. Vieira JPP, Martins CJAP, Shellard EPS (2016) Models for small-scale structure on cosmic strings. II. Scaling and its stability. Phys Rev D 94(9):096005. https://doi.org/10.1103/PhysRevD.94.096005. [Erratum: Phys Rev D 94, 099907 (2016)]
Curriculum Vitae
José R. C. C. C. Correia
PostDoctoral Researcher
“A very large part of space-time must be investigated, if reliable
results are to be obtained”–Alan Turing
Work experience
PostDoctoral researcher, Department of Physics, University of Helsinki (Helsinki, Finland), Sep. 2022 - Ongoing
PhD Researcher, Faculdade de Ciências da Universidade do Porto (Porto, Portugal), Jan. 2017 - May 2022
Education
PhD in Physics, Faculdade de Ciências da Universidade do Porto (Porto, Portugal), Jan. 2017 - May 2022
MSc in Physics, Faculdade de Ciências da Universidade do Porto (Porto, Portugal), Sep. 2014 - Sep. 2016
BSc in Physics, Faculdade de Ciências da Universidade do Porto (Porto, Portugal), Sep. 2010 - Sep. 2014

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. R. C. C. C. Correia, A New Generation of Superstring Simulations, Springer Theses, https://doi.org/10.1007/978-3-031-20229-2
Skills
DevOps: Docker, CircleCI
Parallel Computing: CUDA, OpenCL, MPI
Programming: C/C++, Python, Julia, LaTeX
Languages: Portuguese, English, Italian, Spanish, French
Honors & Awards
2022 Springer Thesis Award, PhD Thesis published by Springer Nature (Online)
2021 PRACE Best Poster Award, EuroHPC Summit Week (Online)
List of Publications
Multitension strings in high-resolution U(1)×U(1) simulations, Phys. Rev. D 106, 043521 (published), Aug. 2022. J. R. C. C. C. Correia, C. J. A. P. Martins
High Resolution calibration of string network evolution, arXiv:2110.15427 (accepted), Oct. 2021. J. R. C. C. C. Correia, C. J. A. P. Martins
High Resolution calibration of the cosmic strings velocity dependent one-scale model, Phys. Rev. D 104, 063511 (published), Sep. 2021. J. R. C. C. C. Correia, C. J. A. P. Martins
Abelian-Higgs cosmic string network evolution with multiple GPUs, Astron. Comput. 34, 100438 (published), Jan. 2021. J. R. C. C. C. Correia, C. J. A. P. Martins
Quantifying the effect of cooled initial conditions on cosmic string network evolution, Phys. Rev. D 102, 043503 (published), Aug. 2020. J. R. C. C. C. Correia, C. J. A. P. Martins
Abelian-Higgs cosmic string network evolution with CUDA, Astron. Comput. 32, 100388 (published), Jul. 2020. J. R. C. C. C. Correia, C. J. A. P. Martins
Extending and Calibrating the Velocity dependent One-Scale model for Cosmic Strings with One thousand field theory simulations, Phys. Rev. D 100, 103517 (published), Nov. 2019. J. R. C. C. C. Correia, C. J. A. P. Martins
Effects of Biases in Domain Wall Network evolution II. Quantitative analysis, Phys. Rev. D 97 8, 083521 (published), Apr. 2018. J. R. C. C. C. Correia, I. S. C. R. Leite, C. J. A. P. Martins
General purpose graphics-processing-unit implementation of cosmological domain wall network evolution, Phys. Rev. E 96 4, 043310 (published), Oct. 2017. J. R. C. C. C. Correia, C. J. A. P. Martins
Effects of Biases in Domain Wall Network evolution, Phys. Rev. D 90 2, 023521 (published), Jul. 2014. J. R. C. C. C. Correia, I. S. C. R. Leite, C. J. A. P. Martins
Conference Participations
CarterFest: Black Holes and other Cosmic Systems, Paris, France, Jul. 2022. Attended
12th Iberian Gravitational Waves Meeting, Braga, Portugal, Jun. 2022. Attended
GRChombo meeting, Cambridge, United Kingdom, Mar.-Apr. 2022. Attended
GR Seminar, Cambridge, United Kingdom, Mar. 2022. Presented seminar titled "On Multitension string networks"
CSCS User Lab Day, Online, Sep. 2021. Attended
COSMO21, Online, Aug. 2021. Presented talk titled "High resolution calibration of string modelling"
Marcel Grossmann 16, Online, Jul. 2021. Presented talk titled "High resolution calibration of string modelling"
Current challenges in gravitational physics workshop, Online, Apr. 2021. Attended
Interactive Computing with Jupyter on Piz Daint, using Python, ParaView and Julia, Online, Apr. 2021. Attended
Ibericos 2021, Online, Mar. - Apr. 2021. Presented talk titled "On improving string modelling"
EuroHPC Summit week 2021, Online, Mar. 2021. Presented poster titled "Coding the cosmos: A New Generation of Superstring Simulations"
Zooming in on Strings and Vortons workshop, Online, Oct. 2020. Attended
Scientific visualization course, Online, Oct. 2020. Attended
Encontro Nacional de Astronomia e Astrofísica XXX, Online, Sep. 2020. Presented poster titled "Overcooling string simulations"
Texas Symposium 2019, Portsmouth, United Kingdom, Dec. 2019. Presented talk titled "Calibrating string evolution modelling"
IA-ON 2019, Porto, Portugal, Oct. 2019. Presented talk titled "On string evolution and graphical computing"
Encontro Nacional de Astronomia e Astrofísica XXIX, Online, Sep. 2019. Presented poster titled "On extending cosmic string analytical modelling with one thousand simulations"
Ibericos 2019, Bilbao, Spain, Apr. 2019. Presented talk titled "On cosmic string evolution and graphical supercomputing"
Cosmic Topological Defects: Dynamics and Multi-Messenger Signatures, Leiden, Netherlands, Sep. 2018. Presented talk titled "Cracks in the sky: Cosmic String Evolution with the compute unified device architecture"
Gravity@Prague School, Prague, Czech Republic, Sep. 2018. Presented poster titled "Cracks in the sky: cosmic string evolution with CUDA"
XIII Reunión Científica de la Sociedad Española de Astronomía (SEA) 2018, Salamanca, Spain, Jul. 2018. Presented poster titled "Anisotropic Domain Walls"
Ibericos 2018, Lisbon, Portugal, Apr. 2018. Presented talk titled "GPGPU Anisotropic Domain Walls"
Encontro Nacional de Astronomia e Astrofísica XXVII, Lisbon, Portugal, Jul. 2017. Presented talk titled "GPGPU Domain Wall network simulations"
Física 2016, Braga, Portugal, Sep. 2016. Presented poster titled "Search for new vector-like quarks in hadronic topologies"
Ibericos 2014, Aveiro, Portugal, Mar. 2014. Presented talk titled "Effects of biases on domain wall network evolution"
Encontro Nacional de Astronomia e Astrofísica XXIII, Lisbon, Portugal, Jul. 2013. Presented talk titled "Different Views on cosmic defect evolution"
Outreach
"O Universo numa caixa" ("The Universe in a box"), blog article, SapoTeK, online, Jul. 2021. Writer
"Strings of the cosmos", article, ScienceNode, online, May 2021. Interviewee
"Iniciativa de computação avançada da UE distingue estudante da FCUP" ("EU advanced computing initiative distinguishes FCUP student"), press release, Portal de Notícias da UP, online, Apr. 2021. Interviewee
"José Correia wins PRACE Best Poster Award", press release, PRACE website, online, Mar. 2021. Interviewee
"Accelerated and improved simulations shed light on topological defects in the Universe", press release, CSCS website, online, Feb. 2021. Interviewee
"Trabalhar em Cosmologia: Simulações de defeitos" ("Working in Cosmology: defect simulations"), outreach talk, Ovar, Portugal, Feb. 2020. Talk presenter
"Fendas no Universo" ("Cracks in the Universe"), outreach talk, Tomar, Portugal, Nov. 2018. Talk presenter
COSMOEspresso goes to School // Job shadowing, multiple locations, Portugal, 2018 - Ongoing. Mentor
"Partículas: do Bosão Higgs à matéria escura" ("Particles: from the Higgs boson to dark matter"), exhibition, Braga, Portugal, Feb. - Mar. 2016. Exhibition staff