José Ricardo C. C. C. Correia

A New Generation of Cosmic Superstring Simulations

Springer Theses: Recognizing Outstanding Ph.D. Research

Aims and Scope

The series “Springer Theses” brings together a selection of the very best Ph.D. theses from around the world and across the physical sciences. Nominated and endorsed by two recognized specialists, each published volume has been selected for its scientific excellence and the high impact of its contents for the pertinent field of research. For greater accessibility to non-specialists, the published versions include an extended introduction, as well as a foreword by the student’s supervisor explaining the special relevance of the work for the field. As a whole, the series will provide a valuable resource both for newcomers to the research fields described, and for other scientists seeking detailed background information on special questions. Finally, it provides an accredited documentation of the valuable contributions made by today’s younger generation of scientists.

Theses may be nominated for publication in this series by heads of department at internationally leading universities or institutes and should fulfill all of the following criteria:

• They must be written in good English.
• The topic should fall within the confines of Chemistry, Physics, Earth Sciences, Engineering and related interdisciplinary fields such as Materials, Nanoscience, Chemical Engineering, Complex Systems and Biophysics.
• The work reported in the thesis must represent a significant scientific advance.
• If the thesis includes previously published material, permission to reproduce this must be gained from the respective copyright holder (a maximum of 30% of the thesis should be a verbatim reproduction from the author’s previous publications).
• They must have been examined and passed during the 12 months prior to nomination.
• Each thesis should include a foreword by the supervisor outlining the significance of its content.
• The theses should have a clearly defined structure including an introduction accessible to new PhD students and scientists not expert in the relevant field.

Indexed by zbMATH.

José Ricardo C. C. C. Correia

A New Generation of Cosmic Superstring Simulations

Doctoral Thesis accepted by Centro de Astrofísica da Universidade do Porto, Rua das Estrelas s/n, Porto, Portugal

Author: Dr. José Ricardo C. C. C. Correia, Cosmology Thematic Line, Centre for Astrophysics of the University of Porto, Porto, Portugal

Supervisor: Prof. Carlos Martins, Centro de Astrofísica e Astronomia da Universidade do Porto, Instituto de Astrofísica e Ciências do Espaço, Porto, Portugal

ISSN 2190-5053 / ISSN 2190-5061 (electronic)
Springer Theses
ISBN 978-3-031-20228-5 / ISBN 978-3-031-20229-2 (eBook)
https://doi.org/10.1007/978-3-031-20229-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
This Springer imprint is published by the registered company Springer Nature Switzerland AG (Gewerbestrasse 11, 6330 Cham, Switzerland).

In memory of
Armanda Oliveira Pereira Campos
Maria Inês Correia
Maria Branca da Silva Santos

Supervisor’s Foreword

Cosmic string networks are the best motivated fossil relics of the early universe. Our current understanding of particle physics implies that they must have formed in cosmological phase transitions, but so far they have not been detected, which, in principle, could place very stringent constraints on several theoretical paradigms. This is also the case in superstring inspired inflation models, where so-called cosmic superstrings may form, providing a cosmological-scale fossil of string theory. Such constraints primarily come from the cosmic microwave background, pulsar timing and gravitational waves, but recent examples find limits on relevant model parameters which span more than ten orders of magnitude. It follows that existing constraints are manifestly unreliable, and more robust ones are mandatory.

How can this problem be solved? Being intrinsically nonlinear objects, the study of the cosmological evolution of defect networks unavoidably requires both analytic modeling (the canonical model thereof being due to Martins and Shellard) and high-resolution numerical simulations. Thus, for the sake of simplicity, one often focuses on the simplest such defects, e.g., Abelian-Higgs strings, but realistic cosmic strings will have non-trivial internal structure, including charges and currents.
And realistic networks will lead to observational predictions that differ from those of simpler strings. This is where José Ricardo Correia’s thesis comes in. Its main outcome is the first GPU-accelerated defect network evolution code (including technical in situ visualization advances). The code is more than 30 times faster than previous-generation CPU-based codes, which has removed a 20-year bottleneck in the field and opened new avenues of research, including extending defect studies beyond the simplest Abelian-Higgs strings to more realistic superconducting strings and cosmic superstrings. The new code enables the world’s largest (8192³ or larger) field theory simulations of plain cosmic strings, superconducting strings, and then cosmic superstrings, complementing ongoing work in our team to develop general analytic evolution models for cosmic strings with internal degrees of freedom, including charges and currents. Specifically, the numerical simulation diagnostics (densities, RMS velocities, correlation lengths, loop distributions, etc.) will be used to obtain, through a full MCMC-based analysis, a rigorous calibration of these analytic models.

Ultimately, this thesis paves the way for improving the reliability of observational constraints on cosmic strings and superstrings. The GPU-accelerated code enables generating thousands of full sky template maps, which are crucial to obtain reliable predictions for the CMB signatures of these networks and for their stochastic gravitational wave background, as well as forecasts. (Currently these are based on a few maps or naive toy models.) The ultimate goal is to develop tools to search for these fossils not only with current observational facilities, but also with forthcoming ones (such as the SKAO) and longer term ones (such as LISA). This thesis therefore lays out a new paradigm in field theory simulations of topological defect networks.
As such it will be an ideal starting point for the new generation of students entering the field, but also an important reference for researchers wanting to catch up on what is, undoubtedly, the new gold standard in the field.

Porto, Portugal
July 2022

Carlos Martins

Abstract

Topological defects are an unavoidable consequence of some phase transitions in the early Universe, being produced via the Kibble mechanism. Depending on the symmetry broken, different defects can be produced, from planar domain walls to line-like cosmic strings to point-like monopoles. The safest of these, in the sense that it usually cannot overclose the Universe, and therefore the most studied, is the cosmic string. Given that these objects provide unequivocal evidence of new physics beyond the Standard Model, the expected imprints of such defect networks are often primary targets for current and upcoming observational facilities. In order to study the evolution of defect networks for analytical and observational purposes, there are two possible approaches, one based on simulations and another based on semi-analytical models. Although one can think of them as separate approaches, the relationship between the two is in fact symbiotic. For example, given the orders of magnitude over which the typical defect width changes throughout cosmological history, and the large dynamic range required to evolve for such a long conformal time, simulations are bottlenecked and cannot follow a network of defects throughout its entire lifetime. The solution is to use a semi-analytical model, which, properly calibrated, can evolve some mean quantities describing the network throughout its life. However, we emphasize that this requires proper calibration of a priori unknown model parameters, which can be done via a comparison with the scale-invariant evolution of a string network from a simulation.
To set tighter experimental constraints and probe larger parameter spaces of theoretical models, one often requires ever larger resolution and dynamic range in simulations. We set out to alleviate this problem. One way to do so is to explore the use of graphical supercomputing to accelerate field theory simulations of defects. All of the work necessary to achieve this is presented in Chap. 3. In the aforementioned chapter, we first present a simulation of domain walls which can only use a single graphical accelerator, and then move on to single and multiple accelerator versions of Abelian-Higgs cosmic string simulations. The main result is that there are indeed noticeable gains, either in speed-ups or in reduction of required computational time. With this we can simulate the largest lattices attainable so far: 8192³ for Abelian-Higgs strings, using 4096 graphical accelerators. Equipped with faster and larger resolution simulations, we set out to calibrate the canonical semi-analytical model of defect evolution—the Velocity-dependent One-Scale model (VOS)—in Chap. 4. Specifically, we explore the calibration of a six-parameter version of this model, extended to take into account energy loss via radiation and the possibility of wiggles on small scales. To do so, we first explore a calibration of walls that re-enter the horizon after being pushed out by inflation. Following calibration, we then test whether this model properly predicts the behavior of a network that is frozen outside the horizon, re-enters, and then achieves scale-invariant evolution. We then set out to update the calibration for Abelian-Higgs strings. Firstly, this was done by using the single-GPU version of the simulation, which gave us a preliminary set of model parameters and allowed us to study the effects of overcooled initial conditions.
Secondly, the hardware resources of Piz Daint and the ability of the simulation code to use them have allowed a high-resolution calibration with a proper exploration of possible sources of systematic error. This then led us to discuss the impact of said systematics on observational predictions derived from the VOS. To conclude, we then present the end-goal of this thesis project in Chap. 5: to simulate a toy model of cosmic superstrings. Although cosmic superstrings are very different from field theory strings, and such a toy model cannot properly capture all the relevant phenomenological effects, it does permit the evolution and interaction of two string networks. More specifically, it allows the formation of networks of combined bound-state strings, which eventually enter a scale-invariant regime. Although the current literature has shown that this combined network is rather sparse, this presumes certain numerical and parameter choices, which can plausibly inhibit or enhance bound-state formation. One such effect we explore is changing the behavior of the model coupling to allow shrinking, growing, and constant width of strings. Overall, we have developed and implemented string and wall simulations which can be used for future studies on a variety of topics, ranging from the study of small-scale structure (heavily relying on the centerline capabilities) to a further exploration of dual U(1) models. We remark as well that there are numerous upgrades that can be made on a computational basis to further exploit extreme hardware.

Acknowledgements

This work was financially supported by a fellowship from Fundação para a Ciência e a Tecnologia (SFRH/BD/130445/2017). Due to the computationally demanding nature of the work developed, there are also some acknowledgements to be made in terms of hardware donated or accessed.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro P5000 GPU used for the beginning of this research. In addition, we acknowledge PRACE—Partnership for Advanced Computing in Europe—for awarding us access to Piz Daint at CSCS—Centro Svizzero di Calcolo Scientifico—Switzerland, through Preparatory Access proposal 2010PA4610, Project Access proposal 2019204986, and Project Access proposal 2020225448. I greatly thank my supervisor Carlos Martins for believing in the crazy idea that led to this thesis and for supporting me throughout the development of this work. His patience, guidance, and knowledge have without a shadow of doubt molded my way of thinking and, consequently, a lot of this manuscript. In terms of interesting discussions, I would also like to thank Asier Lopez-Eiguren, Ivan Rybak, José Vieira, Jon Urrestilla, Daniel Jimenez-Aguilar, José Juan Blanco Pillado, Guy Moore, Lara Sousa, Luisa Maria Serrano, João Camacho, and João Faria. In addition, I would also like to acknowledge visualization guru Jean Favre at CSCS for his technical support and for his deep knowledge of ParaView and VTK. Without his know-how the centerlines approach would not have been developed (and this thesis would be shorter). I would also like to thank all the students of Carlos Martins with whom I’ve worked throughout the duration of the doctoral program—Manuel Rosa, Diogo Gomes, Ana Almeida, Filipe Costa, Siri Berge—we were, and still are, a good team. No man is an island, and throughout the past 4.5 years there have been multiple people who’ve made this complicated journey a lot more bearable, by being there when I needed it most, and by celebrating my successes. I deeply thank my parents Armanda and José, whose unwavering support, dedication, and love have shaped me into who I am today. A special thank you as well to my grandfather Manuel, who taught me the basics of programming with qBASIC, some bits of Latin, Greek, and a lot of history.
He also helped raise me and showed love and guidance when needed. I owe everything I am to all the folk I mentioned and I hope I have made them proud with this Thesis. I’m deeply sorry my late grandmothers Armanda Oliveira Pereira Campos and Maria Inês Correia, and my late great-aunt Maria Branca da Silva Santos could not see me become a doctor, and I gratefully acknowledge their kindness, their love, and their wisdom. To them I respectfully dedicate this manuscript. Last but not least, a big thank you to all my friends who, either via beers, coffees, board game sessions or via a friendly office environment, have never failed to put a smile on my face. Allow me to name a few (apologies if I forget someone) in no particular order—Bachar, Jorge, Camacho, Saeed, Faria, André, Leal, Oliveira, Inês, Elisa, Miguel, Susana, Olivier, Vardan. To conclude, I would also like to state that the coolest office in CAUP is 1.01—no doubt due to its occupants and the piles of boxes of Beira Douro coffee.

Contents

1 A Brief Description of Cosmology
  1.1 The History of Cosmology
  1.2 The Beginning of Modern Cosmology
  1.3 Missing Ingredients
    1.3.1 Missing Ingredient I—Inflation
    1.3.2 Missing Ingredient II—Dark Energy and Dark Matter
  1.4 Scope of the Thesis
  References

2 Topological Defects
  2.1 Solitons and Topology
  2.2 Global Domain Walls
  2.3 Abelian-Higgs Strings
  2.4 Beyond Abelian-Higgs
  2.5 Kibble Mechanism
  2.6 Simulations of Defects
    2.6.1 Global Domain Walls
    2.6.2 Abelian-Higgs Strings
  2.7 Network Evolution Modelling
    2.7.1 Standard Velocity Dependent One-Scale Model
    2.7.2 Extended Velocity Dependent One-Scale Model
    2.7.3 Observational Footprints from Semi-Analytical Modelling
  2.8 Summary
  References

3 Supercomputing with Graphics Processing Units
  3.1 An Introduction to High Performance Computing
    3.1.1 Amdahl and Gustafson’s Laws
    3.1.2 Architectures and Programming Paradigms
  3.2 Global Domain Walls
    3.2.1 Single Accelerator
  3.3 Abelian-Higgs Strings
    3.3.1 Single Accelerator
    3.3.2 Multiple Accelerators
  3.4 In-Situ Analysis and Visualization Pipeline
    3.4.1 Reduced Winding Output
    3.4.2 Centerlines Post-Processing
  3.5 Conclusion
  References

4 Calibration of Extended VOS Models
  4.1 Prelude
  4.2 Global Domain Walls
    4.2.1 Walls Formed Before the End of Inflation
    4.2.2 A Primer on Calibrating the Extended VOS–Domain Walls
  4.3 Abelian-Higgs Cosmic Strings
    4.3.1 Calibrations on Small Lattices–A First Approach
    4.3.2 Overcooled Initial Conditions
    4.3.3 Further Exploration of Model Sensitivity to Numerical Choices
    4.3.4 Coda: Observational Impact of Different Calibrations
  4.4 Conclusion
  References

5 Strings in U(1)_L × U(1)_L Simulations
  5.1 Simulation Overview
  5.2 Validation
    5.2.1 On Average Network Quantities
    5.2.2 On Locating pq-Segments
  5.3 Impact of Physical Evolution
  5.4 Conclusion
  References

6 A New Generation of String Simulations
  6.1 Overview and Concluding Remarks
  6.2 Next Steps
    6.2.1 Computational Improvements
    6.2.2 Small-Scale Structure of Abelian-Higgs Strings
    6.2.3 Further Exploration of String Networks in the U(1)_L × U(1)_L Model
  References

Curriculum Vitae

Chapter 1
A Brief Description of Cosmology

The story so far: In the beginning the Universe was created.
This has made a lot of people very angry and been widely regarded as a bad move.
Douglas Adams

1.1 The History of Cosmology

Cosmology is almost as old as humankind. The question of how and when “everything” started is a problem tackled by every ancient civilization in human history. It started in the Neolithic period (around 10,000 years ago), when cosmology was more local and based on phenomena such as the weather, the moon and earthquakes, and evolved into a cosmology based on myths and the supernatural with the Egyptian and Mesopotamian civilizations—Old Babylonian, Assyrian, New and Late Babylonian periods—from about 5,000 years ago. Note that from this point on, almost every single religion includes a mechanism/story for the creation of the Universe. Eventually such a pressing and difficult question also troubled the Greek and Roman civilizations, whose cosmology sprang from philosophy and geometry (circa 600 BC). In this sense, their cosmology differed from the previous models as it included the relationship between cause and effect, and it noted the need to deliver predictions. For example, in Plato’s model of cosmology, Earth is surrounded by spheres of Air and Fire, such that any balloon with hot air will rise to reach the sphere of Fire. In this model—which we now know to be incorrect—Earth is at the center of the Universe and all planets, the Sun and all other stars revolve around it. This was the first of many geocentric models, which inevitably led to the conclusion that the Universe was designed for us.

Scientific thought was severely stifled in the Middle Ages, but eventually, during the 16th century, some important milestones occurred.
Some examples of these important milestones include the work of Johannes Kepler and Tycho Brahe—which allowed the first mathematical descriptions of the motion of celestial bodies and proposed a hybrid helio-geocentric model of the Universe—and the work of Nicolaus Copernicus and Giordano Bruno—which supported a heliocentric model of the Universe, where the Earth is not at its center. While Copernicus explicitly placed the Sun at the center of his model, Giordano Bruno, by extension of the aforementioned model, did away with the idea of stars glued to a celestial sphere, and suggested that stars were akin to distant suns surrounded by their own planets. In this sense, he even forsook the heliocentrism of the Copernican model, and radically suggested the Universe was infinite and with no center. The ideas of Copernicus and Giordano Bruno were also supported and championed by Galileo Galilei. In the 17th century, religious obscurantism again stifled scientific thought in Europe; in England, however, Isaac Newton, inspired by the previous ideas about the motion of celestial bodies, boldly proposed physical laws that led to classical gravitation.

The era of modern cosmology began with Einstein’s theory of general relativity in 1915. General Relativity, along with the idea of an expanding Universe put forward by Friedmann and Lemaître in 1922 and 1927, respectively, eventually led to a consistent mathematical description of the Universe, one that could be tested and observed. The expanding Universe started as a hot and dense fluid which gradually expanded into the one we observe today. From the observational point of view, the idea of an expanding Universe was confirmed by Edwin Hubble in 1929, with the discovery that galaxies recede away from us. Hubble also noted that the recession velocity of each galaxy was proportional to its distance from the observer.
As a starting point for this Thesis we will begin by succinctly describing the first ingredients of standard Cosmology and the need for inflation and dark energy. We will then note the role defects play in Cosmology and which defects are to be studied in this manuscript.

1.2 The Beginning of Modern Cosmology

For now, let us begin with Einstein’s equation,

\[ 8\pi G T_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2} R g_{\mu\nu} \tag{1.1} \]

where G is the gravitational constant, \(T_{\mu\nu}\) the energy-momentum tensor, \(R_{\mu\nu}\) and R the Ricci tensor and scalar respectively, and \(g_{\mu\nu}\) the metric of space-time. Greek indices run from 0 to 3, with 0 representing time and the rest spatial coordinates. There are two ingredients necessary to derive the dynamical equations of the Universe: first the energy-momentum tensor of a perfect isotropic fluid, and then the metric to be used (of signature +,−,−,−). The energy-momentum tensor of a perfect fluid is

\[ T_{\mu\nu} = (p + \rho) U_\mu U_\nu - p g_{\mu\nu} \tag{1.2} \]

where U is the fluid velocity four-vector, which in the comoving rest frame is given by \(U_\mu = (1, 0, 0, 0)\), and the fluid density and pressure in the rest frame are given by ρ and p respectively. The tensor itself then takes the diagonal form

\[ T_{\mu\nu} = \mathrm{diag}\left(\rho g_{00},\; -p g_{11},\; -p g_{22},\; -p g_{33}\right). \tag{1.3} \]

For the metric (and derived quantities such as the Ricci tensor and scalar) we will use the Friedmann-Lemaître-Robertson-Walker (from now on shortened to FLRW) metric in natural units,

\[ ds^2 = dt^2 - a(t)^2 \left[ \frac{dr^2}{1 - K r^2} + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \right] \tag{1.4} \]

where a(t) is the scale factor, the space-time coordinates are (t, r, θ, φ), and K is a constant responsible for the curvature of the spatial slices at a given time. Depending on whether K < 0, K > 0 or K = 0, the Universe will be hyperbolic/open, spherical/closed or flat, respectively. Note that the spatial part of this metric is maximally symmetric (i.e., possesses the maximal number of Killing vectors).
This is merely a reflection of the following cosmological principles:

• Homogeneity—the Universe exhibits translational invariance and thus there are no privileged points in the Universe;
• Isotropy—the Universe exhibits rotational invariance and thus there are no privileged directions.

Note that these principles hold for the Universe at large but need not hold locally. This metric can also be re-written in terms of conformal time τ,

\[ ds^2 = a(\tau)^2 \left[ d\tau^2 - \frac{dr^2}{1 - K r^2} - r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \right] \tag{1.5} \]

where the conformal time is related to physical time via

\[ dt^2 = a^2 \, d\tau^2. \tag{1.6} \]

Inserting the two necessary ingredients into the Einstein equations, we can take the 00-component and the trace to derive the dynamical equations of the Universe: first the Friedmann equation,

\[ H^2 = \frac{8\pi G}{3} \rho - \frac{K}{a^2} \tag{1.7} \]

and the Raychaudhuri equation,

\[ \dot{H} + H^2 = -\frac{4\pi G}{3} (\rho + 3p) \tag{1.8} \]

where \(H = \frac{1}{a}\frac{da}{dt}\) is known as the Hubble parameter. Before we continue with a description of the dynamics of the Universe, it is useful to define two different cosmological horizons: the Hubble horizon and the particle horizon. The Hubble horizon is a boundary separating objects receding faster and slower than the speed of light. It is defined as

\[ r_h = \frac{1}{H}, \tag{1.9} \]

and a comoving version of it can be defined by dividing by the scale factor,

\[ r_{hc} = \frac{1}{aH}. \tag{1.10} \]

The particle horizon corresponds to the maximum distance from which photons could have traveled to the observer in the age of the Universe. It thus separates the observable from the unobservable. In natural units it is given simply by the conformal time,

\[ \eta = \int_0^t \frac{dt'}{a(t')}. \tag{1.11} \]

Throughout this thesis both will be used at some point or another. Having defined these, we now continue with the description of the dynamics of the Universe.
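The two horizons just defined can be checked numerically for a power-law scale factor. The sketch below (a minimal Python illustration in arbitrary units, assuming \(a(t) = t^n\) so that \(H = n/t\); the function names are ours, not from any library) compares a trapezoid-rule estimate of \(\eta = \int_0^t dt'/a(t')\) against the analytic radiation-era result \(\eta = 2\sqrt{t}\).

```python
import numpy as np

def hubble_radius(t, n):
    """Hubble radius r_h = 1/H for a power-law scale factor a = t**n (so H = n/t)."""
    return t / n

def particle_horizon(t, n, steps=200_000):
    """Particle horizon eta = integral of dt'/a(t') for a(t') = t'**n,
    via the trapezoid rule. The grid point t' = 0 is dropped: the
    singularity of the integrand there is integrable for n < 1."""
    tp = np.linspace(0.0, t, steps)[1:]
    f = tp ** (-n)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(tp))

# Radiation era, a ∝ t^{1/2}: analytically eta = 2 sqrt(t).
eta = particle_horizon(1.0, 0.5)
```

For n = 1/2 the estimate approaches 2 at t = 1, matching the analytic conformal time; multiplied by a(t) it gives a physical horizon of 2t, which in this era happens to coincide with the Hubble radius 1/H = 2t (in the matter era the two differ by a factor of 2).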
Using the conservation of energy-momentum, \(T^{\mu\nu}{}_{;\nu} = 0\), or alternatively re-writing 1.8 with the help of a differentiated 1.7, one can write a conservation equation,

\[ \frac{d\rho}{dt} + 3H(\rho + p) = 0. \tag{1.12} \]

It is also useful to define the density of the Universe when K = 0, also known as the critical density,

\[ \rho_c = \frac{3H^2}{8\pi G} \tag{1.13} \]

which can be used to define a critical density parameter,

\[ \Omega = \frac{\rho}{\rho_c}. \tag{1.14} \]

This parameter can be used to re-write the first Friedmann equation 1.7 as

\[ \Omega - 1 = \frac{K}{a^2 H^2}. \tag{1.15} \]

Before we advance, we will take the continuity equation above in 1.12 and note that the fluids one considers in Cosmology obey the barotropic equation of state, p = ωρ; the appropriate substitution allows a solution of the continuity equation,

\[ \rho \propto a^{-3(1+\omega)}. \tag{1.16} \]

This already gives quite a lot of information: in any era dominated by a relativistic fluid (ω = 1/3), such as radiation, or in any era dominated by non-relativistic matter where the pressure is negligible (ω = 0),

\[ \rho_r \propto a^{-4}, \qquad \rho_m \propto a^{-3}. \tag{1.17} \]

We can now use these two components to write the density ρ as a sum of component densities, ρ = ρ_r + ρ_m, and define a critical density parameter for each component. The first Friedmann equation can then be re-written in terms of the critical density parameters of each component at the present time,

\[ H^2 = H_0^2 \left[ \Omega_{r,0} \, a^{-4} + \Omega_{m,0} \, a^{-3} \right]. \tag{1.18} \]

Using this dynamical equation, we can solve for the scale factor in terms of time, considering at first some barotropic fluid without specifying ω,

\[ a \propto t^{\frac{2}{3(1+\omega)}} \tag{1.19} \]

which means that the scale factor must have evolved in the radiation and matter eras according to

\[ a \propto t^{1/2} \propto \tau, \qquad a \propto t^{2/3} \propto \tau^2. \tag{1.20} \]

We note now that this description already tells us the Universe first passed through a hot and dense phase dominated by relativistic particles (radiation domination) and, as it expanded and cooled down, it transitioned to a matter-dominated era.
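As a sanity check on equations 1.18 to 1.20, one can integrate \(t(a) = \int da/(aH)\) for a two-fluid Universe and verify that the local exponent of a(t) moves from 1/2 deep in the radiation era to 2/3 in the matter era. The sketch below is a minimal Python illustration with assumed parameter values (H0 = 1, Ω_r,0 = 10⁻⁴, Ω_m,0 = 1, chosen only to place radiation-matter equality at a_eq = 10⁻⁴); all function names are ours.

```python
import numpy as np

def hubble(a, H0=1.0, Om_r=1e-4, Om_m=1.0):
    """H(a) from the two-fluid Friedmann equation (flat case, K = 0)."""
    return H0 * np.sqrt(Om_r * a**-4 + Om_m * a**-3)

def cosmic_time(a_grid):
    """t(a) = integral of da' / (a' H(a')), trapezoid rule; t = 0 is set at
    the first grid point, deep enough in the radiation era that the offset
    is negligible at the epochs probed below."""
    f = 1.0 / (a_grid * hubble(a_grid))
    dt = 0.5 * (f[1:] + f[:-1]) * np.diff(a_grid)
    return np.concatenate([[0.0], np.cumsum(dt)])

def log_slope(a, t, a_target):
    """Local exponent n in a ∝ t^n, from a centered finite difference in log-log."""
    i = np.searchsorted(a, a_target)
    return np.log(a[i + 1] / a[i - 1]) / np.log(t[i + 1] / t[i - 1])

a = np.geomspace(1e-6, 1.0, 200_000)
t = cosmic_time(a)
# log_slope(a, t, 1e-5) ≈ 1/2 (radiation era); log_slope(a, t, 0.5) ≈ 2/3 (matter era)
```

The slope well below a_eq is close to (though not exactly) 1/2, since matter already contributes a few percent to H² there; well above a_eq it settles onto the matter-era value 2/3.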
This picture is however incomplete, and to arrive at the accepted Standard model of Cosmology there are two fundamental missing ingredients that must be discussed.

1.3 Missing Ingredients

1.3.1 Missing Ingredient I—Inflation

Inflation is one of the major ingredients of standard Cosmology and it was devised to solve some issues of the hot big bang model [9]. It comprises a period of exponential expansion of space (i.e., a \propto \exp(t)) during the early Universe that was proposed to address three major problems:

• Horizon problem—This problem arises from the high homogeneity of the cosmic microwave background (CMB). This basically means that all distinct patches of the CMB sky are statistically the same (and indeed have the same temperature). However, in a Universe with only a matter and a radiation epoch, and no other epoch preceding them, it is not possible for two distant patches of the sky to equilibrate and obtain the same temperature, as such patches would move apart faster than the speed of light. Inflation provides a solution: it posits that all patches were causally connected in the past in a small region in thermal equilibrium, but as inflation proceeded all patches were isolated by being pushed beyond the size of the cosmological horizon (equivalently, the comoving cosmological horizon shrinks). In the end, these regions end up no longer being in causal contact, but due to the period of rapid expansion still maintain the same statistical properties;

• Monopole problem—Monopoles are topological defects (see the next section for a definition) which form in specific symmetry breaking patterns. Note that the formation of magnetic monopoles [14, 17] would be an inevitable prediction of many Grand Unified Theories (GUT), as the only required ingredient is some nonspecific symmetry group G which contains an unbroken U(1) (the gauge symmetry of electromagnetism). This would be sufficient to form magnetic monopoles.
There is a problem with magnetic monopoles being formed in the early Universe. As shown in [19], monopole annihilation proceeds extremely slowly, to the point where the monopole density would greatly exceed the critical density of the Universe, contrary to observational evidence. Inflation can deal with this issue by pushing monopoles out of the horizon (similar to the way inflation solves the horizon problem), provided a sufficient number of e-folds (see below) is attainable. Note that not only monopoles can be pushed out of the horizon, but also other cosmic defects (strings and domain walls, for example). The number of e-folds required for each defect can be different. Another consideration is when the symmetry breaking that forms the defect takes place (before or during inflation). Depending on such details it can be possible for defects to re-enter the horizon during the radiation or matter era;

• Flatness problem—This issue arises due to the severe fine-tuning required to achieve a flat Universe (where the total density parameter \Omega = \sum_i \rho_i / \rho_c is exactly unity). To see how this problem arises, let's use the re-written Friedmann equation above, (1.15). Assuming no inflation, the scale factor will grow according to a power law (a \propto t^\lambda) for both matter and radiation. As an example, let's assume a 1% deviation of |1 - \Omega| from zero at the present time. Going back to the Planck era then reveals that it must have had a value of 10^{-62}. In other words, unless the initial value of the density parameter is tuned to unity to one part in 10^{62}, any small variation in the initial conditions results in a manifestly non-flat Universe today.
Inflation fixes this by introducing a period wherein the scale factor evolves according to a \propto \exp(Ct), where C is a constant, which in the above equation has the desirable effect: |1 - \Omega| can then begin with any arbitrary value, but a period of inflation can force it down to near zero (for example to near 10^{-62}) as,

|1 - \Omega| \propto \exp(-2Ct) (1.21)

and subsequent evolution (post-inflation, with a power-law scale factor) can bring this value to the necessary small deviation. In addition to solving these problems, we must also note that inflation provides an explanation for the appearance of the density fluctuations responsible for large-scale structure. So far, it is in agreement with experimental data from the cosmic microwave background: the shape of the temperature power spectrum of the CMB is indeed well described by an inflationary spectrum. Initially inflation was inspired by first-order phase transitions, where some scalar field lies in a false vacuum state, acting much like a cosmological constant; as the Universe cools down, the field would quantum tunnel to bubbles of true vacua (bubble nucleation). "Old" inflation however has a major drawback called the "graceful exit" problem: if inflation proceeds via these bubbles and the probability of them forming is large, inflation will be short-lived and frequent bubble collisions create a highly inhomogeneous Universe. On the other hand, if the probability of these bubbles forming is too low, inflation will indeed last a long time, but each bubble will represent an open Universe with a nearly null density parameter \Omega (in stark contrast with observational evidence). Neither option is acceptable. A different approach which solves this issue comes from the "new" inflation paradigm [3, 11]. In order to accurately describe how inflation resolves these issues one can start with a scalar field (the inflaton) which drives this early Universe behavior.
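Since |1 − Ω| = |K|/(aH)² and H is approximately constant during inflation, the deviation from flatness falls as e^{−2N}, where N is the number of e-folds of expansion. A one-line sketch shows how many e-folds it takes to drive an order-unity curvature contribution down to the 10^{−62} level quoted above:

```python
import math

# |1 - Omega| ~ exp(-2N) during inflation (H approximately constant), so
# reducing an O(1) deviation to 1e-62 requires solving exp(-2N) = 1e-62.
target = 1e-62
N = -0.5 * math.log(target)   # N = 31 ln(10) ~ 71.4 e-folds
assert 71 < N < 72
```

This is comfortably compatible with the ~60 e-folds usually quoted as the minimum needed to solve the horizon and monopole problems.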
In order to describe new inflation we should however note the dynamics of the inflaton and why a scalar field can be proposed. Starting with the second Friedmann equation (1.8), and requiring a shrinking comoving Hubble radius (and therefore accelerated expansion),

\dot{r}_{hc} < 0 \;\rightarrow\; \ddot{a} > 0 (1.22)

which equivalently means that,

\rho < -3p . (1.23)

This condition is naturally satisfied by a scalar field \phi (the aforementioned inflaton). To show it, let's begin with the simplest possible Lagrangian for a real scalar field,

\mathcal{L} = \frac{1}{2}\phi_{,\mu}\phi^{,\mu} - V(\phi) . (1.24)

Remembering the previous relation for an isotropic fluid (1.2) and the definition of the energy-momentum tensor,

T_{\mu\nu} = -\frac{2}{\sqrt{-g}}\frac{\delta S}{\delta g^{\mu\nu}} (1.25)

we can write expressions for density and pressure,

\rho = \frac{1}{2}(\partial_t\phi)^2 + V(\phi) (1.26)

p = \frac{1}{2}(\partial_t\phi)^2 - V(\phi) (1.27)

which, upon insertion in the previously mentioned condition (1.23), specify that inflation can only occur should the potential energy dominate,

V(\phi) > (\partial_t\phi)^2 . (1.28)

This is possible should the potential be flat enough. Note that the potential should also have a minimum, so as to end inflation. From here on, we can also re-insert these expressions for pressure and density in the Friedmann equations, which reveal,

H^2 = \frac{8\pi G}{3}\left[ V(\phi) + \frac{1}{2}\dot{\phi}^2 \right] - \frac{K}{a^2} (1.29)

\ddot{\phi} + 3H\dot{\phi} = -\partial_\phi V . (1.30)

In order to simplify these equations one can use the fact that the Hubble expansion will be dominated by the potential energy, as mentioned in (1.28),

H^2 \approx \frac{8\pi G}{3} V(\phi) (1.31)

3H\dot{\phi} \approx -\partial_\phi V (1.32)

and define two additional parameters,

\epsilon = \frac{m_{Pl}^2}{16\pi}\left(\frac{\partial_\phi V}{V}\right)^2 , \quad \eta = \frac{m_{Pl}^2}{8\pi}\frac{\partial_\phi^2 V}{V} . (1.33)

We can now state the necessary conditions for inflation to occur. These are known as the slow-roll conditions, as they require the slope and the curvature of the potential to be sufficiently small,

\epsilon \ll 1 , \quad |\eta| \ll 1 . (1.34)

In order to solve all previously indicated problems, inflation must last for a sufficient amount of time.
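As a concrete illustration of the slow-roll parameters in (1.33), consider the hypothetical quadratic potential V = ½m²φ² (an assumed toy example, not one discussed in the text). For it, both parameters reduce to ε = η = m_Pl²/(4πφ²), so slow roll holds for super-Planckian field values:

```python
import math

m_pl = 1.0   # work in Planck units (an assumption for this illustration)

def eps(V, dV, phi):
    """First slow-roll parameter, Eq. (1.33)."""
    return m_pl**2 / (16 * math.pi) * (dV(phi) / V(phi))**2

def eta(V, d2V, phi):
    """Second slow-roll parameter, Eq. (1.33)."""
    return m_pl**2 / (8 * math.pi) * d2V(phi) / V(phi)

# Quadratic toy potential V = 0.5 m^2 phi^2 and its derivatives.
m = 1e-6
V   = lambda phi: 0.5 * m**2 * phi**2
dV  = lambda phi: m**2 * phi
d2V = lambda phi: m**2

phi = 3.0 * m_pl                  # super-Planckian field value
e, h = eps(V, dV, phi), eta(V, d2V, phi)
assert abs(e - h) < 1e-12         # both equal m_pl^2 / (4 pi phi^2) here
assert e < 1 and h < 1            # slow-roll conditions (1.34) satisfied
```

As φ rolls down towards φ ~ m_Pl/√(4π), ε grows to unity and inflation ends.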
We can express this demand by introducing the number of e-folds,

N = \ln\frac{a(t_e)}{a(t_i)} (1.35)

\;\; = \int_{t_i}^{t_e} H \, dt (1.36)

\;\; \approx \frac{2\sqrt{\pi}}{m_{Pl}} \int_{\phi_f}^{\phi_i} \frac{d\phi}{\sqrt{\epsilon}} (1.37)

and it can be shown that this number must be minimally around 60, in order to ensure that all problems can be solved by inflation [4].

1.3.2 Missing Ingredient II—Dark Energy and Dark Matter

Previously we mentioned that the recession velocity of galaxies depends linearly on the distance of the galaxy to the observer (Hubble law). This result by Edwin Hubble in 1929 largely confirmed the Universe to be expanding. Up until the late nineties, the accepted view was that this expansion was either constant or decelerated. However, in 1998 both the High-Z Supernova Search Team [15] and the Supernova Cosmology Project [13] found that not only is the expansion neither decelerating nor constant, it is accelerating. This was done by looking at the luminosity of type Ia supernovae at a range of redshifts. Note that if we only probe low redshifts, accelerated expansion is not evident (one obtains only the Hubble law). However, as soon as one includes non-local redshifts the view changes, and indeed it becomes necessary to introduce some fluid component into the Universe which accounts for accelerated expansion. In order to do so, one can introduce a term \Lambda in Einstein's equations,

8\pi G T_{\mu\nu} = R_{\mu\nu} - \frac{1}{2}R g_{\mu\nu} + \Lambda g_{\mu\nu} . (1.38)

This term is known as the cosmological constant. It describes a fluid (known as dark energy) which exerts negative pressure (\omega = -1), countering gravity.
From here we can re-obtain the Friedmann equations again,

H^2 = \frac{8\pi G}{3}\rho - \frac{K}{a^2} + \frac{\Lambda}{3} (1.39)

\dot{H} + H^2 = -\frac{4\pi G}{3}(\rho + 3p) + \frac{\Lambda}{3} . (1.40)

Introducing a critical density parameter for dark energy, \Omega_\Lambda = \frac{\Lambda}{3H^2}, one can re-write the first Friedmann equation (for K = 0) as,

\Omega_m + \Omega_r + \Omega_\Lambda = 1 . (1.41)

The latest observational constraints from Planck data [2], assuming a flat Universe, reveal the following present-day values for each density parameter: an \Omega_\Lambda of 0.6911 ± 0.0062, an extremely small \Omega_r (of order 10^{-4}), and an \Omega_m of 0.3089 ± 0.0062. We note here that this means that accelerated expansion (and therefore dark energy) has been experimentally confirmed by CMB experiments (see also [1] or, for older WMAP results, [10]). In addition, cosmic shear from weak lensing and Lyman-α absorption spectra confirm accelerated expansion. We also remark that there is another missing ingredient that should be included: dark matter. It was proposed in order to account for the missing matter necessary to explain the rotational velocity of galaxies, and it plays an important role in the understanding of large-scale structure. In order to be in agreement with observational data from large-scale structure one should use cold (non-relativistic) dark matter, which suggests that the term \Omega_m can be decomposed as \Omega_m = \Omega_{cdm} + \Omega_b, where \Omega_{cdm}, \Omega_b are the critical density parameters of dark matter and baryonic matter, respectively. With these we can write the dynamical equation of the Universe in terms of the Hubble constant,

H^2 = H_0^2 \left[ \Omega_{r,0} a^{-4} + \Omega_{m,0} a^{-3} + \Omega_\Lambda \right] . (1.42)

Integrating this particular equation and fitting to supernova data (see Fig. 1.1) then reveals good agreement with accelerated expansion. This equation also tells us one thing: at late times, dark energy dominates the energy density of the Universe.

1.4 Scope of the Thesis

In this chapter we began by reviewing some of the basic concepts underlying modern Cosmology, such as the concordance model ΛCDM and inflation.
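As a closing numerical illustration of the background dynamics reviewed in this chapter, integrating (1.42) yields the age of the Universe. The sketch below uses the Planck-like density parameters quoted above, with H0 ≈ 67.7 km/s/Mpc as an assumed input (not a value stated in the text):

```python
import numpy as np

# Density parameters as quoted above; H0 is an assumed input for this sketch.
omega_r, omega_m, omega_l = 9e-5, 0.3089, 0.6911
H0 = 67.7                          # km/s/Mpc
H0_inv_gyr = 977.8 / H0            # 1/H0 in Gyr (1/(km/s/Mpc) ~ 977.8 Gyr)

def E(a):
    """Dimensionless Hubble rate E(a) = H(a)/H0 from Eq. (1.42)."""
    return np.sqrt(omega_r * a**-4 + omega_m * a**-3 + omega_l)

# Age: t0 = (1/H0) * integral_0^1 da / (a E(a)), trapezoidal rule.
a = np.linspace(1e-8, 1.0, 200_001)
f = 1.0 / (a * E(a))
age = H0_inv_gyr * np.sum(0.5 * np.diff(a) * (f[1:] + f[:-1]))
assert 13.0 < age < 14.2           # roughly 13.8 Gyr
```

Dropping Ω_Λ from E(a) and repeating the integral gives a markedly younger universe, one simple way to see how dark energy reconciles the expansion age with the ages of the oldest stars.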
Although we cannot deny the successes of this model of standard Cosmology, we should be aware of its failings. Of note we can list a few of these, which are active research topics in modern Cosmology at the time of writing.

Fig. 1.1 Supernova data from the Supernova Cosmology Project (Union 2.1 data from [16]) in blue points, and a red line indicating the fit assuming \Omega_\Lambda = 0.7 and \Omega_m = 0.3

For example, we can consider the extreme discrepancy (some 120 orders of magnitude) between the observed cosmological constant and the vacuum energy predicted by the Standard Model of particle physics [12], the 5σ tension in Hubble constant (H_0) measurements when comparing early- and late-time probes [18], the current non-detection of a dark matter candidate particle [5, 8], the fact that no explanation is offered for the matter-antimatter asymmetry [6], disagreements with N-body simulations on small scales [7], etc. In addition, we point out that many extensions to the Standard Model of particle physics might not only seek to solve some of its own problems but can also offer solutions to some of the cosmological issues outlined above. Although such theories of new physics can be probed via direct accelerator searches, the early Universe and the possibility of high-energy phenomena also provide an excellent laboratory. This can lead us to ask a different question: are there phenomena common to several of these theories of new physics? And from this question we can posit the possibility that these new sectors introduce phase transitions in the early Universe, which, if they give rise to observational footprints, can be used to search for several theories at once. We will next introduce (in the following chapter) one class of possible by-products of phase transitions in the early Universe, named topological defects, and how they might have formed. Some types of defect will be the object of study of this manuscript. Subsequently, in Chap.
3 we will briefly introduce the High-Performance Computing aspects to be used throughout the rest of the thesis, where we will also describe the numerical simulations used for the physics results of this thesis. We remark that banishing all computing aspects to Chap. 3 is intentional: any reader who is not computationally inclined can simply skip reading this chapter. Afterwards, in Chap. 4, we will explore some results obtained with the simulations of Chap. 3, subsequently moving on to explain how these simulations will help us simulate a toy model of cosmic superstrings in Chap. 5. We then present some concluding remarks and some tentative next steps.

References

1. Ade P et al (2016a) Planck 2015 results. XIV. Dark energy and modified gravity. Astron Astrophys 594:A14. https://doi.org/10.1051/0004-6361/201525814
2. Ade P et al (2016) Planck 2015 results. XIII. Cosmological parameters. Astron Astrophys 594:A13. https://doi.org/10.1051/0004-6361/201525830
3. Albrecht A, Steinhardt PJ (1982) Cosmology for grand unified theories with radiatively induced symmetry breaking. Phys Rev Lett 48:1220–1223. https://doi.org/10.1103/PhysRevLett.48.1220
4. Baumann D (2011) Inflation. In: Theoretical advanced study institute in elementary particle physics: physics of the large and the small, pp 523–686. https://doi.org/10.1142/9789814327183_0010
5. Bertone G, Hooper D (2018) History of dark matter. Rev Mod Phys 90(4):045002. https://doi.org/10.1103/RevModPhys.90.045002
6. Canetti L, Drewes M, Shaposhnikov M (2012) Matter and antimatter in the universe. New J Phys 14:095012. https://doi.org/10.1088/1367-2630/14/9/095012
7. Del Popolo A, Le Delliou M (2017) Small scale problems of the ΛCDM model: a short review. Galaxies 5(1):17. https://doi.org/10.3390/galaxies5010017
8. Freese K (2017) Status of dark matter in the universe. Int J Mod Phys 1(06):325–355. https://doi.org/10.1142/S0218271817300129
9.
Guth AH (1981) Inflationary universe: a possible solution to the horizon and flatness problems. Phys Rev D 23:347–356. https://doi.org/10.1103/PhysRevD.23.347 10. Hinshaw G, Larson D, Komatsu E, Spergel DN, Bennett CL, Dunkley J, Nolta MR, Halpern M, Hill RS, Odegard N, Page L, Smith KM, Weiland JL, Gold B, Jarosik N, Kogut A, Limon M, Meyer SS, Tucker GS, Wollack E, Wright EL (2013) Nine-year Wilkinson microwave anisotropy probe (WMAP) observations: cosmological parameter results. APJS 208(2):19. https://doi.org/10.1088/0067-0049/208/2/19 11. Linde AD (1987) A new inflationary universe scenario: a possible solution of the horizon, flatness, homogeneity, isotropy and primordial monopole problems. Adv Ser Astrophys Cosmol 3:149–153. https://doi.org/10.1016/0370-2693(82)91219-9 12. Martin J (2012) Everything you always wanted to know about the cosmological constant problem (but were afraid to ask). Comptes Rendus Physique 13:566–665. https://doi.org/10. 1016/j.crhy.2012.04.008 13. Perlmutter S et al (1999) Measurements of and from 42 high redshift supernovae. Astrophys J 517:565–586. https://doi.org/10.1086/307221 14. Polyakov AM (1974) Particle spectrum in the quantum field theory. JETP Lett 20:194–195 15. Riess AG et al (1998) Observational evidence from supernovae for an accelerating universe and a cosmological constant. Astron J 116:1009–1038. https://doi.org/10.1086/300499 16. 
Suzuki N, Rubin D, Lidman C, Aldering G, Amanullah R, Barbary K, Barrientos LF, Botyanszki J, Brodwin M, Connolly N, Dawson KS, Dey A, Doi M, Donahue M, Deustua S, Eisenhardt P, Ellingson E, Faccioli L, Fadeyev V, Fakhouri HK, Fruchter AS, Gilbank DG, Gladders MD, Goldhaber G, Gonzalez AH, Goobar A, Gude A, Hattori T, Hoekstra H, Hsiao E, Huang X, Ihara Y, Jee MJ, Johnston D, Kashikawa N, Koester B, Konishi K, Kowalski M, Linder EV, Lubin L, Melbourne J, Meyers J, Morokuma T, Munshi F, Mullis C, Oda T, Panagia N, Perlmutter S, Postman M, Pritchard T, Rhodes J, Ripoche P, Rosati P, Schlegel DJ, Spadafora A, Stanford SA, Stanishev V, Stern D, Strovink M, Takanashi N, Tokita K, Wagner M, Wang L, Yasuda N, Yee HKC (The Supernova Cosmology Project) (2012) The Hubble Space Telescope cluster supernova survey. V. Improving the dark-energy constraints above z > 1 and building an early-type-hosted supernova sample. Astrophys J 746(1):85. https://doi.org/10.1088/0004-637X/746/1/85
17. 't Hooft G (1974) Magnetic monopoles in unified gauge theories. Nucl Phys B 79:276–284. https://doi.org/10.1016/0550-3213(74)90486-6
18. Verde L, Treu T, Riess AG (2019) Tensions between the early and the late universe. Nature Astron 3:891. https://doi.org/10.1038/s41550-019-0902-0
19. Zeldovich Y, Khlopov M (1978) On the concentration of relic magnetic monopoles in the universe. Phys Lett B 79(3):239–241. https://doi.org/10.1016/0370-2693(78)90232-0

Chapter 2 Topological Defects

Its height gradually diminished, and after a chase of one or two miles [2–3 km] I lost it in the windings of the channel. Such, in the month of August 1834, was my first chance interview with that singular and beautiful phenomenon which I have called the Wave of Translation.
John Scott Russell

2.1 Solitons and Topology

As shown in the epigraph above, in 1834 the Scottish engineer John Scott Russell, when attempting to optimize the design of boats for the Union Canal, noted a particular phenomenon: a water "wave of translation" which propagated along the channel without change of form or velocity. These "waves of translation" quickly became an object of obsession not just for Russell, but for several physicists in different areas, such as fluid dynamics (Korteweg–de Vries equation), condensed matter (Gross–Pitaevskii equation) and cosmology (as will be seen shortly). Today we name these waves "solitons," and while a specific definition is often hard to pin down, a few general properties can be identified: they correspond to stable, self-sustaining wave packet solutions with a permanent form; they are localized to a given region; and, unlike regular waves, they will not merge with other solitons upon interaction. One specific type of soliton owes its stability to a topological condition, i.e., to the inability to decay to a topologically trivial solution. The main objects of study of this thesis are topological defects possibly created in the early Universe by means of the Kibble mechanism [32]. In order to define them, we take the approach of beginning with a more technical definition and then gradually simplifying to a more intuitive one. We begin by assuming a phase transition wherein, at some critical temperature T_c, some symmetry group G of elements g ∈ G is broken spontaneously
to a subgroup H of elements h ∈ H. The Hamiltonian of the theory is described by a set of fields \phi that are invariant under transformations by elements of G,

\mathcal{H}[\phi] = \mathcal{H}[\phi^g] (2.1)

while the vacuum state \phi_0 is invariant only under elements of the unbroken subgroup H,

\phi_0^h = \phi_0 . (2.2)

The vacuum manifold \mathcal{M} can then be identified with the space of cosets gH = \{gh : h \in H\}, that is,

\mathcal{M} = G/H . (2.3)

The field configurations which are called defects arise when the vacuum manifold is homotopically non-trivial,

\pi_n(\mathcal{M}) \neq 1 , (2.4)

in other words, when the vacuum manifold contains non-contractible n-dimensional spheres, or has disconnected components. The field configuration we are describing is a solution to a given set of partial differential equations (in a field theory, the equations of motion) that obeys a set of boundary conditions. In other words, this solution exists because at the boundaries the solution is described by a topologically non-trivial vacuum manifold. This definition also has an interesting side-effect: depending on the dimensionality of the n-sphere (i.e., on the topology of the vacuum manifold) different defects emerge (Table 2.1).

Table 2.1 A list of defects formed from distinct homotopy groups

Homotopy condition        Topology of the vacuum        Resulting defect
\pi_0(G/H) \neq 1         disconnected                  Domain wall
\pi_1(G/H) \neq 1         non-contractible loop         Cosmic string
\pi_2(G/H) \neq 1         non-contractible 2-sphere     Monopole
\pi_3(G/H) \neq 1         non-contractible 3-sphere     Texture

To make the point about boundary conditions clearer, and to visualize the nature of defects in real space, we can introduce two examples of defects which, coincidentally, are the ones studied throughout this thesis: the domain wall, which arises when n = 0, and the cosmic string, which arises when n = 1.

2.2 Global Domain Walls

Starting with the Lagrangian for a theory originally (before symmetry breaking) invariant under global Z_2 transformations, Fig.
2.1 On the left-hand panel, an example of a potential which allows the equations of motion of a real scalar field to admit a wall solution; on the right-hand panel, the analytical static domain wall solution and corresponding energy density obtained with the aforementioned potential

\mathcal{L} = \frac{1}{2}\phi_{,\mu}\phi^{,\mu} - V_0\left(\frac{\phi^2}{\phi_0^2} - 1\right)^2 , (2.5)

which by standard variational techniques admits the following equation of motion in Minkowski space,

\partial_\mu\partial^\mu\phi + \frac{4V_0}{\phi_0^2}\left(\frac{\phi^2}{\phi_0^2} - 1\right)\phi = 0 . (2.6)

In this specific case, the vacuum manifold (see Fig. 2.1a) corresponds to two disconnected points, each minimum corresponding to a distinct value of the field, \phi_0 and -\phi_0. The solution we are looking for respects the boundary conditions \phi(-\infty) = -\phi_0 and \phi(\infty) = \phi_0. Restricting ourselves to the one-dimensional static case, the analytical solution takes the form,

\phi(x) = \phi_0 \tanh\left(\frac{\sqrt{2V_0}}{\phi_0}\, x\right) , (2.7)

which is plotted in Fig. 2.1b. The wall core then corresponds to a region where the field sits at the value of the unbroken-symmetry phase (\phi = 0), a region with higher energy density than its surroundings. To finalize, we can also make the following remark: defects are stable against perturbations and decay precisely because there exists no continuous transformation which maps them into a trivial solution. Physically, this implies that the removal of the defect comes with an associated energy cost: lifting all of the field over the potential barrier. One can take this a step further by stating that this stability is a consequence of a topological conservation law, and that each defect has an associated charge. This was first shown by [1].

2.3 Abelian-Higgs Strings

We can also show another example of a defect, the line-like cosmic string. For this we start with a Lagrangian density invariant under local U(1) transformations,

\mathcal{L} = |D_\mu\phi|^2 - \frac{\lambda}{4}(|\phi|^2 - \eta^2)^2 - \frac{1}{4e^2}F^{\mu\nu}F_{\mu\nu} . (2.8)

Note that the shape of the potential is the so-called "Mexican hat," as evidenced by Fig. 2.2a.
By means of the Euler-Lagrange equations one can then obtain the following equations of motion,

D_\mu D^\mu\phi + \frac{\lambda}{2}(|\phi|^2 - \eta^2)\phi = 0 , (2.9)

\partial_\mu F^{\mu\nu} = 2e^2 \, \mathrm{Im}[\phi^* D^\nu\phi] . (2.10)

Analytical solutions are difficult (impossible, actually) to obtain in this case from first principles, but for our purposes we can demonstrate the existence of the string solution using the Nielsen-Olesen ansatz for a static straight string, whereby the fields are described by a set of auxiliary functions f(r) and \alpha(r),

\phi(r, \theta) = f(r)\, e^{in\theta} (2.11)

A_\theta = \frac{n}{e}\frac{\alpha(r)}{r} (2.12)

where r is the radial coordinate, \theta the angular coordinate, and n is an integer denoted the winding (which dictates how many times the field non-trivially winds around the core of the string). This integer constitutes a topological charge and is a conserved quantity. Substitution of this ansatz into the equations of motion, under the assumption of staticity, then reveals the following system of equations,

\frac{d^2 f}{dr^2} + \frac{1}{r}\frac{df}{dr} - \frac{n^2}{r^2}f(\alpha - 1)^2 - \frac{\lambda}{2}f(f^2 - 1) = 0 (2.13)

\frac{d^2\alpha}{dr^2} - \frac{1}{r}\frac{d\alpha}{dr} - 2e^2 f^2(\alpha - 1) = 0 . (2.14)

In order to solve for f(r) and \alpha(r) we can use collocation methods (via the COLNEW software [9]), imposing the boundary conditions,

\lim_{r \to 0} f(r) = 0 , \quad \lim_{r \to \infty} f(r) = 1 (2.15)

\lim_{r \to 0} \alpha(r) = 0 , \quad \lim_{r \to \infty} \alpha(r) = 1 . (2.16)

Fig. 2.2 On the left-hand panel, an example of a potential which allows the equations of motion of a complex scalar field to admit a string solution; on the right, the static gauged string solution obtained with the aforementioned potential

The resulting solution for unit winding (n = 1) can be seen in Fig. 2.2b. These boundary conditions imply that the ground state of the scalar field is recovered far away from the string. These two numerical solutions should give the reader a better intuition for the topological defects used throughout this thesis.
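A boundary value problem of this kind can be solved with any collocation solver, not only COLNEW. The sketch below sets up the profile equations (2.13)-(2.14) with SciPy's `solve_bvp` (a substitute for COLNEW, and an assumption of this example), with couplings chosen at the illustrative Bogomolny point λ = 2e² and a finite outer radius standing in for infinity:

```python
import numpy as np
from scipy.integrate import solve_bvp

lam, e, n = 2.0, 1.0, 1               # illustrative couplings (lam = 2 e^2)

def rhs(r, y):
    """First-order form of Eqs. (2.13)-(2.14): y = [f, f', alpha, alpha']."""
    f, fp, a, ap = y
    fpp = -fp / r + n**2 * f * (a - 1)**2 / r**2 + 0.5 * lam * f * (f**2 - 1)
    app = ap / r + 2 * e**2 * f**2 * (a - 1)
    return np.vstack([fp, fpp, ap, app])

def bc(y_left, y_right):
    """f(0) = alpha(0) = 0 and f(inf) = alpha(inf) = 1, Eqs. (2.15)-(2.16)."""
    return np.array([y_left[0], y_left[2], y_right[0] - 1, y_right[2] - 1])

r = np.linspace(1e-3, 12.0, 400)
y0 = np.vstack([np.tanh(r), 1 / np.cosh(r)**2,
                np.tanh(r)**2, 2 * np.tanh(r) / np.cosh(r)**2])
sol = solve_bvp(rhs, bc, r, y0, max_nodes=20000)
assert sol.success

f, alpha = sol.sol(r)[0], sol.sol(r)[2]
assert f[-1] > 0.99 and alpha[-1] > 0.99   # vacuum recovered far from the core
```

Both profiles rise monotonically from 0 at the core to 1 at large radius, reproducing the qualitative shape of Fig. 2.2b.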
The intuition gained here will aid us in the description of more exotic types of strings, which we will attempt to study as an end goal of this thesis. After such introductions, we will also contemplate how defects could have formed in the early Universe.

2.4 Beyond Abelian-Higgs

In 1985 [68], with the rise of string theory, the possibility that strings from perturbative string theory could play the role of cosmic strings was put forth. They are, in a strict sense, different objects: in one case the string is the fundamental object (and as such the action is written for it), in the other some field is the fundamental object and a specific configuration of it corresponds to a cosmic string. However, if fundamental strings could be stretched to cosmological scales they could play a similar role to cosmic strings. In the aforementioned publication the answer was no, and for several reasons: they would have extremely high tensions, which is not only in direct conflict with current observational limits for cosmic strings but also suggests they could not form after inflation (meaning that, forming before it, they would be diluted away), and they lack stability (bosonic open strings could quickly break up into smaller strings, thus not providing any observational signal [70]). This however changed with the introduction of superstring theory and the proposal of higher-dimensional extended objects called branes [21], on which open strings can end (D for Dirichlet boundary condition) and which can possess a gauge field. Although we will later state the action for a brane, for now let's keep to arguments that will allow us to relate branes produced in brane-inflation models to cosmic strings. Note that the author is not a string theorist, and therefore most arguments presented here will take a field theory slant.
As such, we will briefly turn our attention to a Type IIB brane-annihilation scenario [30, 54] where F-strings (F for fundamental) and D-strings (branes with all dimensions but one compactified) are formed and act as cosmic strings, even obeying the homotopy group condition for topological defects. These will be known as cosmic superstrings. Note that this is not the only possible formation scenario; in fact, hybrid inflation in the context of superstring theory could also reasonably lead to cosmic superstrings [22, 53]. In brane annihilation, a brane and a parallel anti-brane collide at the bottom of a throat in a compact manifold [49], where the metric is warped as,

ds^2 = e^{2A(y)}\eta_{\mu\nu}dx^\mu dx^\nu (2.17)

which effectively redshifts the tension of any resulting strings. The effective tension in a KKLMMT model [31] is,

\mu_{eff} = e^{2A(y)}\mu_F (2.18)

where e^{2A(y)} is a warp factor, y are the compact coordinates and \mu_F the ten-dimensional tension of a fundamental string. This directly tackles the issue of tensions being either too high, and thus in conflict with observational data, or too low, wherein strings would have a mass scale lower than the inflaton. As for the next issue, stability, it is highly model dependent and will not be discussed further (see [45] for general comments). We will now discuss how these superstrings are analogous to the field theory strings, and we will (in a later section) describe how they come to be stretched to cosmological sizes. In order to show the first, we will succinctly report the arguments of [54]. We begin by noting that there is a tachyonic open superstring stretching between the two branes, which ceases to exist as the branes annihilate. This process is known as tachyon condensation and was first noted by [56].
For this we must write (here without proof, but see [54]) the tachyon potential for an open tachyon string stretching between branes, with each brane compactified in (p − 3) dimensions and containing a U(1) gauge symmetry, close to when the inter-brane separation is null and inflation ends,

V(T) = 2\tau_p V_{||} - \frac{M_s^2}{4}T^\dagger T + \frac{\lambda}{4}(T^\dagger T)^2 + \ldots (2.19)

where \tau_p is the tension of the branes, T is a complex scalar field (the tachyon per se), V_{||} the compactification volume, \lambda a parameter which can be obtained from open superstring field theory [11] and M_s = 1/\sqrt{\alpha'} the string scale. The vacuum manifold clearly obeys the condition,

\pi_1(U(1)) = \mathbb{Z} \neq 1 (2.20)

which proves that an object akin to a vortex or string should be formed. Since such a defect is a brane of co-dimension 2k = 2, or equivalently dimension p − 2 (the argument follows from K-theory, see [69]), and since p − 3 dimensions are compactified, the resulting object must have a single large dimension. This connects the topological character of the string to its one-(large-)dimensional nature. Higher-dimensional U(N) breaking patterns could give rise to domain walls and monopoles (2k = 1 and 2k = 3, respectively), but odd p would result in BPS instability (in Type IIB string theory), ensuring any such defect network would quickly decay. So far we've been describing cosmic superstrings as Dp-branes wrapped around p − 1 cycles. However, it was also mentioned above that fundamental strings (F-strings) could play the role of cosmic superstrings. This is a consequence of the existence of the SL(2, Z) transformation (S-duality) relating the gauge field A_{\mu\nu} of a D-string to an antisymmetric field B_{\mu\nu} and interchanging the coupling constant, g_s \to 1/g_s, which establishes D- and F-strings as duals of one another. A striking reflection of this duality is the existence of bound states of p F-strings and q D-strings—hereby referred to as (p, q) strings.
This gives rise to a tension spectrum which in flat space looks like the following [55],

\mu_{p,q} = \mu_F \sqrt{p^2 + \frac{q^2}{g_s^2}} (2.21)

where \mu_F = 1/(2\pi\alpha') is the tension of the fundamental string, g_s the string coupling constant, and p and q the charges of each string type. We remark that this is modified in a cosmological background [26]. Later on we will introduce a possible starting point for the analytical study of cosmic superstrings. However, given that this thesis is based on field theory simulations, we will now turn our attention to a toy model that can be used to study superstrings with such simulations. We then propose to study the dual U(1) model of [52]. In this model, the parameters of each U(1) sector, together with the coupling connecting the scalar fields of both sectors, must support the existence of two "Abelian-Higgs" strings and possible string combinations. Starting with the following Lagrangian density,

\mathcal{L} = |D_\mu\phi|^2 - \frac{1}{4}F^{\mu\nu}F_{\mu\nu} + |D_\mu\psi|^2 - \frac{1}{4}G^{\mu\nu}G_{\mu\nu} - V(|\phi|, |\psi|) (2.22)

for two complex scalar fields \phi and \psi, and two U(1) gauge fields A_\mu and B_\mu with corresponding gauge field strengths F_{\mu\nu} and G_{\mu\nu}. The covariant derivatives, gauge field strengths and potential are given by,

D_\mu\phi = (\partial_\mu - ie_p A_\mu)\phi , \quad D_\mu\psi = (\partial_\mu - ie_q B_\mu)\psi (2.23)

F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu , \quad G_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu (2.24)

V(|\phi|, |\psi|) = \frac{\lambda_1}{4}(|\phi|^2 - \eta_1^2)^2 + \frac{\lambda_2}{4}(|\psi|^2 - \eta_2^2)^2 - \lambda_3(|\phi|^2 - \eta_1^2)(|\psi|^2 - \eta_2^2) (2.25)

where \lambda_{1,2} are the scalar couplings, e_{p,q} the gauge couplings and \lambda_3 the coupling between the two scalar fields. If these parameters are such that 0 < 4\lambda_3^2 < \lambda_1\lambda_2, then the vacuum manifold will be non-trivial in the two sectors, supporting the existence of two strings and, due to the non-zero value of \lambda_3, also bound-state strings.
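The tension spectrum (2.21) is easy to explore numerically. The sketch below (with an illustrative coupling g_s = 0.5, an assumed value) checks the F- and D-string limits and the key property that a (1,1) bound state is lighter than a separate F-string plus D-string, which is what makes bound-state formation energetically favorable:

```python
import math

def mu_pq(p, q, g_s, mu_f=1.0):
    """(p,q)-string tension in flat space, Eq. (2.21), in units of mu_F."""
    return mu_f * math.sqrt(p**2 + q**2 / g_s**2)

g_s = 0.5                                  # illustrative string coupling
assert mu_pq(1, 0, g_s) == 1.0             # F-string
assert mu_pq(0, 1, g_s) == 2.0             # D-string, heavier by 1/g_s
# A (1,1) bound state is lighter than a free F-string plus a free D-string:
assert mu_pq(1, 1, g_s) < mu_pq(1, 0, g_s) + mu_pq(0, 1, g_s)
```

The inequality holds for any g_s > 0 because the square root of a sum is strictly smaller than the sum of square roots for positive arguments.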
There are of course several obvious features not adequately captured by this model, such as the lack of supersymmetry, the absence of dynamical effects of compactified dimensions on the strings, and the fact that intercommutation probabilities will simply be unity (unlike what is expected from scattering amplitudes [63]). In order to study the dynamical impact of these effects on cosmic superstring networks it might be more fruitful to modify Nambu-Goto simulations; the reason will become apparent when we describe the Nambu-Goto action later in this chapter. However, this toy model is already suitable for studying the kinematic conditions for the formation of bound states, as in [12], and the presence of scaling for different types of strings, as done in [34] and as will be shown in a later chapter.

Finally, we note that there is an additional, arguably more realistic, type of field theory string, first proposed by [67], that this model can be used to study. In the case where only one U(1) symmetry is broken, for instance by choosing parameters such that 4λ_3 > λ_1λ_2, only a single string type forms. The unbroken sector will form a condensate at the string core, where the scalar field ψ takes some expectation value, tending to zero infinitely far away from the string. In effect, this translates into a trapping of the flux of the charged scalar field on the string, which then behaves like a superconducting wire; hence these are known as superconducting cosmic strings. Although such strings have been studied in simulations of colliding strings in flat space [33], it remains to be seen whether a full network can also be simulated in a cosmological setting.

Having reviewed the types of defects to be studied throughout this thesis, we must understand how they form before we move on to explaining the types of simulations and analytical models used.
2.5 Kibble Mechanism

Topological defects in cosmology were first proposed by [32], which, besides positing their existence, also proposed a mechanism for their formation. In the early Universe, provided some large symmetry group underpinning a Grand Unified Theory breaks down to smaller groups (eventually to the Standard Model), phase transitions will occur at each breaking. In order to describe the Kibble mechanism, we can use the following effective potential for a Goldstone model (the potential of the Lagrangian at 2.8 plus thermal effects),

V(T, \phi) = \frac{\lambda}{4}(|\phi|^2 - \eta^2)^2 + \frac{\lambda}{12}(T^2 - 6\eta^2)|\phi|^2 - 0.5\,T^4.   (2.26)

Above the critical temperature T_c, given by T_c = \sqrt{6}\,\eta, the last two terms (thermal effects) dominate the expression and thus the effective potential takes the shape of a parabola. However, as the Universe cools down past the critical temperature, the potential changes drastically. Even though there is still some contribution from the thermal effects, the vacuum manifold is now topologically non-trivial and the vacuum no longer shares the symmetry of the Lagrangian. The behavior of the potential in different regimes (well above T_c, at T_c and below T_c) can be seen in Fig. 2.3. This signals the threshold at which spontaneous symmetry breaking (SSB) takes place. At this point the field φ in each region of space will randomly roll down to one of the minima, acquiring a vacuum expectation value (VEV).

Fig. 2.3 The Goldstone potential with thermal effects for a phase transition that should lead to the production of a vortex/string solution, should the temperature decrease sufficiently

In the case of a string, the field can select a phase along the vacuum manifold, and the net variation of this phase around a closed loop is proportional to the winding number. However this is not yet enough to ensure the formation of the defect, as the height of the potential is not large enough to prevent fields (with sufficient kinetic energy) from jumping over the potential maxima.
When the Universe cools down sufficiently below T_c, at the Ginzburg temperature T_G thermal fluctuations will no longer be enough to allow these "jumps," and correlated regions will remain in the same minimum (freeze-out in comoving coordinates). We then note another critical detail: the typical size of these patches of correlated phase choices allows us to define a characteristic scale for defects and defect separation, namely the correlation length, L. In general, in phase transitions the correlation length diverges as one approaches the critical temperature. However, these regions cannot be infinitely large, as this would violate causality. As such, the size of the correlated patches must not be larger than the size of the horizon, L ≤ d_H ∼ t.

In the case of D-strings, one can also interpret their formation through the Kibble mechanism, but with a twist. The first thing to note is the shape of the potential of the tachyon field near the end of brane inflation (see Eq. 2.19), which triggers the transition. We can then follow the arguments of [54]: as the correlation length needs to obey a causality constraint, L ≤ d_H ∼ t ∼ 1/H, one can say that the mechanism is only allowed to proceed in dimensions where the compactification size l_∥ is larger than the horizon size. In other words, the Kibble mechanism would not occur in compactified dimensions, only in the large non-compactified ones (Fig. 2.4).

2.6 Simulations of Defects

The main goals of this thesis are twofold: to improve upon existing defect simulations (taking advantage of graphics processors) and to use these to improve and calibrate existing semi-analytical models of string evolution. These tasks are described in subsequent chapters. For now we must introduce both the simulations and the semi-analytical models to be used throughout the thesis. There are two types of string simulations: ones that evolve fields on a 3D lattice in conformal time, and ones that evolve string segments in conformal time.
Two snapshots of isosurfaces of either the field or its absolute value (for walls and strings, respectively) can be found in Fig. 2.5a, b. The type of simulation that has traditionally struggled computationally corresponds to field theory strings, which is what we will introduce now.

Fig. 2.4 Schematic view of the Kibble mechanism. On the top-left we present a Mexican-hat potential (one whose non-trivial vacuum topology supports the existence of a string solution). Note the choices of scalar field phase A, B, C (phases are represented by color here). At the phase transition, different regions (for example A, B, C, see the top-right panel) make different choices of the phase of the scalar field (different colors). Such choices are typically correlated over a lengthscale of the size of the horizon, which sets a size for each "patch." However, some regions remain trapped between these patches (the white string core in the top-right and lower-left figures). Note that in the case of the string we should in reality admit a continuous variation of the phase around the string core, as seen in the panel on the lower left. "Stacking" many different copies of the lower-left figure (as shown on the lower-right panel) then reveals a cosmic string and the phase variation around it

2.6.1 Global Domain Walls

We begin with the Lagrangian density of the simplest defect type: the global domain wall. Writing an action and applying standard variational techniques one can obtain the following equation of motion,

\ddot\phi + 2\frac{\dot a}{a}\dot\phi = \nabla^2\phi - a^2\frac{\partial V(\phi)}{\partial\phi}   (2.27)

where a is the scale factor and the dotted derivatives indicate derivatives with respect to conformal time, η, which is related to physical time by dη = dt/a. The double-well potential takes the form presented in the Lagrangian above (Eq. 2.5).
For the purposes of obtaining the discrete version of the equations of motion, we re-write the Hubble damping term of the previous equation such that,

\ddot\phi + 2\frac{d\ln a}{d\ln\eta}\frac{1}{\eta}\dot\phi = \nabla^2\phi - a^2\frac{\partial V(\phi)}{\partial\phi}   (2.28)

Fig. 2.5 Simulation snapshots of a network of domain walls and of cosmic strings on the left and right-hand-side panels, respectively. Both snapshots correspond to matter era simulations

Consider then a 3D lattice of comoving spatial coordinates, specified by,

x_i = n_i\,\Delta x, \qquad n_i \in \mathbb{Z}   (2.29)

where the lattice spacing, Δx, is the same along all directions. The scalar field φ^x then resides at each lattice point x. From here we can write partial derivatives and the Laplacian operator via finite differences, to order Δx and Δx²,

\partial_i^+\phi \to \frac{1}{\Delta x}\left[\phi^{x+k_i} - \phi^x\right]   (2.30)

\partial_i^-\partial_i^+\phi \to \frac{1}{\Delta x^2}\left[\phi^{x+k_i} - 2\phi^x + \phi^{x-k_i}\right]   (2.31)

where + or − indicate whether the derivatives are forwards or backwards and k_i indicates a unit vector along the spatial direction i. With this we could write a discrete version of the above-mentioned equation of motion. However, before we do so, there is a tricky problem that must be addressed: how the comoving thickness of a defect behaves throughout a simulation. Given that the physical thickness of a wall is constant, its comoving thickness shrinks by several orders of magnitude over time. A way to circumvent this is to force the defect thickness to be constant in comoving coordinates; this is known as the Press-Ryden-Spergel algorithm (PRS for short, [48]). It involves modifying the original equations of motion to yield,

\frac{\partial^2\phi}{\partial\eta^2} + \alpha\frac{d\ln a}{d\ln\eta}\frac{1}{\eta}\frac{\partial\phi}{\partial\eta} - \nabla^2\phi = -a^\beta\frac{\partial V}{\partial\phi},   (2.32)

where α and β are parameters that allow one to adjust for momentum conservation (α = 3) and constant comoving thickness (β = 0). With α = 2 and β = 2 we recover the original equations of motion.
Note that in the original PRS article this was obtained by fiat (i.e., by modifying the equations of motion by hand), but it is possible to obtain constant comoving width by forcing the comoving scalar coupling to be related to the physical one by some power of the scale factor, as was done for strings by [14]. This procedure is detailed in the next section. Note however that momentum conservation is not enforced in this way. The equations above can then be evolved with a staggered leap-frog scheme, first-order Crank-Nicholson with respect to time,

\Pi^{x,\eta+\frac{1}{2}} = \frac{(1-\delta)\,\Pi^{x,\eta-\frac{1}{2}} + \Delta\eta\left(\nabla^2\phi^{x,\eta} - a^\beta\,\partial V/\partial\phi^{x,\eta}\right)}{1+\delta}   (2.33)

\phi^{x,\eta+1} = \phi^{x,\eta} + \Delta\eta\,\Pi^{x,\eta+\frac{1}{2}},   (2.34)

where Π is the derivative of the scalar field with respect to conformal time, Π = φ̇, and the δ parameter is given by,

\delta = \frac{\alpha}{2}\,\frac{\Delta\eta}{\eta}\,\frac{d\ln a}{d\ln\eta}.   (2.35)

With this we are able to update the scalar field at each site x and conformal time η, thus simulating the scalar field on the lattice. From this we can stress one of the differences between field theory simulations of defects and Nambu-Goto simulations: we are not simulating defects directly, but merely the fields on a lattice. Hubble damping, and a potential whose vacuum manifold is non-trivial, are then responsible for forming the defects on their own.

2.6.2 Abelian-Higgs Strings

For the Abelian-Higgs string we begin with the U(1) locally invariant Lagrangian density presented in the previous subsection. From variation of the action, under the assumption of an FLRW metric and the temporal gauge (A_0 = 0), come the equations of motion,

\ddot\phi + 2\frac{\dot a}{a}\dot\phi = D_jD_j\phi - \frac{a^2\lambda}{2}(|\phi|^2 - \sigma^2)\phi   (2.36)

\dot F_{0j} = \partial_i F_{ij} - 2a^2e^2\,\mathrm{Im}[\phi^* D_j\phi]   (2.37)

along with Gauss's law,

\partial_i F_{0i} = 2a^2e^2\,\mathrm{Im}[\phi^*\dot\phi]   (2.38)

Again we have the same issue in these simulations as in the domain wall case: the comoving radii of strings shrink by several orders of magnitude.
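A minimal 1-D sketch of the PRS leap-frog update of Eqs. (2.33)-(2.35), assuming a radiation-era scale factor a ∝ η (so d ln a/d ln η = 1) and illustrative lattice parameters of our own choosing; the function and variable names are not from the thesis code:

```python
import math

# alpha = 3 enforces momentum conservation, beta = 0 fixes the comoving
# wall width; these are the PRS choices quoted in the text.
def evolve_kink(N=256, dx=1.0, deta=0.2, steps=200,
                alpha=3.0, beta=0.0, lam=0.5, m=1.0):
    w = math.sqrt(2.0 / lam)                     # static kink width
    phi = [math.tanh((i - N // 2) * dx / w) for i in range(N)]
    pi = [0.0] * N                               # Pi = dphi/deta at eta - deta/2
    eta = 1.0
    for _ in range(steps):
        delta = 0.5 * alpha * (deta / eta) * m   # Eq. (2.35), d ln a/d ln eta = m
        a_beta = eta ** (m * beta)               # a^beta, trivial for beta = 0
        lap = [(phi[(i + 1) % N] - 2 * phi[i] + phi[(i - 1) % N]) / dx**2
               for i in range(N)]
        for i in range(N):
            dV = lam * phi[i] * (phi[i] ** 2 - 1.0)   # double-well, sigma = 1
            pi[i] = ((1 - delta) * pi[i]
                     + deta * (lap[i] - a_beta * dV)) / (1 + delta)  # Eq. (2.33)
        for i in range(N):
            phi[i] += deta * pi[i]               # Eq. (2.34)
        eta += deta
    return phi

walls = evolve_kink()
# the kink stays centred, with its plateaus sitting at the two vacua +-1
```

With β = 0 the comoving wall width stays fixed throughout the run, which is precisely the point of the PRS trick.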
In order to fix the comoving width, [13] took a different approach from that of [48] described in the previous section (which simply modified the equations of motion by fiat). This approach consists of modifying the comoving scalar and gauge couplings to depend explicitly on the physical ones as,

\lambda = \lambda_0\,a^{2(\beta-1)}, \qquad e = e_0\,a^{\beta-1}   (2.39)

which means that, depending on the value of β, the comoving string width can either be constant throughout the simulation (β = 0) or shrink as in the true physical case (β = 1). Note that there is some flexibility in this approach to implement what [13] named core growth. In the true equations of motion, by normalizing the scale factor to unity at the end of the simulation, the couplings are extremely large at the initial time-steps, which prevents evolution of the fields without artifacts. In the core growth "trick" we start instead with a scale factor normalized to unity at the beginning, and a negative β which forces the comoving radii to grow, evading the problem of too large a string width at early conformal times and permitting a string network to form. After a sufficient number of time-steps, we can then set β = 1 and resume normal physical evolution. This core growth procedure is illustrated in Fig. 2.6a. The equations of motion are now,

\ddot\phi + 2\frac{\dot a}{a}\dot\phi = D_jD_j\phi - \frac{a^{2\beta}\lambda_0}{2}(|\phi|^2 - \sigma^2)\phi   (2.40)

\dot F_{0j} + 2(1-\beta)\frac{\dot a}{a}F_{0j} = \partial_i F_{ij} - 2a^{2\beta}e_0^2\,\mathrm{Im}[\phi^* D_j\phi]   (2.41)

Notice the Hubble damping term in the Maxwell equation. This term vanishes in the true equations of motion; however, when β ≠ 1 this is no longer true. This term was not present in earlier Abelian-Higgs simulations, such as [42], which meant that the modified evolution would not preserve Gauss's law. After this modification by [13], Gauss's law is indeed preserved on the lattice to machine precision. The only problem with constant comoving width is that energy-momentum conservation is violated (albeit at the level of a few percent).
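The effect of Eq. (2.39) on the string core can be sketched in a few lines; here the comoving width is taken to scale as a^{-β} (our reading of the modified equations of motion above), so β = 0 freezes it, β = 1 reproduces the physical shrinking, and a negative β grows the core:

```python
def comoving_width(a, beta, w0=1.0):
    """Comoving string width under the coupling scaling of Eq. (2.39)."""
    return w0 * a ** (-beta)

a_vals = [1.0, 2.0, 4.0]
frozen   = [comoving_width(a, 0.0) for a in a_vals]    # constant (beta = 0)
physical = [comoving_width(a, 1.0) for a in a_vals]    # shrinks like 1/a
growing  = [comoving_width(a, -0.27) for a in a_vals]  # core growth phase
```

The value −0.27 is the one quoted in the caption of Fig. 2.6a.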
So far, at least for Abelian-Higgs string networks, it seems this violation does not heavily impact overall network dynamics (see [27]), although this has not been explored for other string types.

Fig. 2.6 Two crucial aspects of Abelian-Higgs field theory simulations: the evolution of the string radius versus conformal time, when s is equal to 1 or when it is equal to −0.27, demonstrating why the core growth trick works; and the lattice discretization with the scalar field (blue dots) defined at lattice sites, and lattice links between each site required to define how the gauge field lives on the lattice

Now we are almost ready to write the discretized equations of motion to update all the fields on a 3D comoving lattice. The correct way to do so is to use the description of gauge fields on a lattice given by [66], which preserves gauge invariance on the lattice. This description is based on the interpretation that gauge fields act as parallel transporters,

U_j^x = e^{-iA_j}   (2.42)

defined half-way (at links) between lattice points spaced by Δx (note: in the above definition we have re-scaled the gauge field as A_j → A_jΔx; this implies the electric field is re-scaled in the same way since E_j = F_{0j}). A schematic representation can be found in Fig. 2.6b. The scalar fields reside at lattice sites. Going around a lattice square of area Δx² we can write the following product of link variables, Π_{ij},

\Pi_{ij} = U_j^x\,U_i^{x+k_j}\,(U_j^{x+k_i})^*\,(U_i^x)^* = \exp\left[i\Delta x\,(\partial_i^+ A_j(x) - \partial_j^+ A_i(x))\right]   (2.43)

denominated the plaquette operator. Here the electromagnetic field tensor is already apparent.
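The two defining properties of the plaquette can be checked on a toy lattice: its phase reproduces the lattice field strength ∂_i^+A_j − ∂_j^+A_i of Eq. (2.43), and it is exactly invariant under lattice gauge transformations A_j(x) → A_j(x) + ∂_j^+χ(x). A sketch in 2-D with made-up random gauge fields (Δx = 1; names are ours):

```python
import cmath
import random

N = 4
random.seed(7)
# A[x][y][j] holds the (rescaled, dimensionless) gauge field on link j at (x, y)
A = [[[random.uniform(-0.7, 0.7) for _ in range(2)]
      for _ in range(N)] for _ in range(N)]

def U(A, x, y, j):
    """Link variable U_j(x) = exp(-i A_j(x)), Eq. (2.42)."""
    return cmath.exp(-1j * A[x][y][j])

def plaquette(A, x, y):
    """Pi_ij = U_j(x) U_i(x+k_j) U_j(x+k_i)^* U_i(x)^*, with i = 0, j = 1."""
    xp, yp = (x + 1) % N, (y + 1) % N
    return (U(A, x, y, 1) * U(A, x, yp, 0)
            * U(A, xp, y, 1).conjugate() * U(A, x, y, 0).conjugate())

def F01(A, x, y):
    """Lattice field strength d+_0 A_1 - d+_1 A_0 at (x, y)."""
    xp, yp = (x + 1) % N, (y + 1) % N
    return (A[xp][y][1] - A[x][y][1]) - (A[x][yp][0] - A[x][y][0])

# random gauge transformation A_j(x) -> A_j(x) + chi(x + k_j) - chi(x)
chi = [[random.uniform(-3, 3) for _ in range(N)] for _ in range(N)]
A2 = [[[A[x][y][j] + chi[(x + (j == 0)) % N][(y + (j == 1)) % N] - chi[x][y]
        for j in range(2)] for y in range(N)] for x in range(N)]
```

Both the plaquette and its phase agree with F_{01} at every site, and are unchanged when evaluated on the transformed field `A2`.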
From this, we can subsequently write down the gauge field strength,

\frac{1}{4}F_{ij}F_{ij} = \frac{1}{\Delta x^4}\sum_{i<j}\left(1 - \frac{1}{2}\left[\Pi_{ij} + \Pi_{ij}^*\right]\right)   (2.44)

For convenience, we also define the backwards derivative of F_{ij}, the (forward) gauge covariant derivatives and a Laplacian stencil,

\partial_j^- F_{ij} = \frac{1}{\Delta x^3}\sum_{j\neq i}\mathrm{Im}\left[\Pi_{ij}(x) - \Pi_{ij}(x - k_j)\right]   (2.45)

D_j^+\phi^x = \frac{1}{\Delta x}\left[U_j^x\,\phi^{x+k_j} - \phi^x\right]   (2.46)

D_j^-D_j^+\phi^x = \frac{1}{\Delta x^2}\sum_j\left[U_j^x\,\phi^{x+k_j} - 2\phi^x + (U_j^{x-k_j})^*\,\phi^{x-k_j}\right].   (2.47)

We now have all the ingredients to recover the lattice discretization of [13]. It is then straightforward to take the equations of motion and create the following staggered leap-frog (second order in time) evolution scheme,

(a^2\Pi)^{x,\eta+\frac{1}{2}} = (a^2\Pi)^{x,\eta-\frac{1}{2}} + \Delta\eta\,a_\eta^2\left[D_j^-D_j^+\phi^{x,\eta} - \frac{\lambda_0}{2}a^{2\beta}(|\phi^{x,\eta}|^2 - \sigma^2)\phi^{x,\eta}\right]   (2.48)

\left(\frac{E_i}{e^2}\right)^{x,\eta+\frac{1}{2}} = \left(\frac{E_i}{e^2}\right)^{x,\eta-\frac{1}{2}} + \frac{\Delta\eta}{e_\eta^2}\left[-\partial_j^-F_{ij} - 2e_0^2a_\eta^{2\beta}\,\mathrm{Im}[\phi^*D_i^+\phi]^{x,\eta}\right]   (2.49)

\phi^{x,\eta+1} = \phi^{x,\eta} + \Delta\eta\,\Pi^{x,\eta+\frac{1}{2}}   (2.50)

A_i^{x,\eta+1} = A_i^{x,\eta} + \Delta\eta\,E_i^{x,\eta+\frac{1}{2}}   (2.51)

to order O(Δx²), O(Δη²). In the continuum limit (i.e., when the lattice spacing Δx vanishes), this evolution scheme reduces to the above equations of motion. We can also devise the discrete version of Gauss's law,

\mathcal{G} = \partial_i^- E_i - 2e_0^2a^{2\beta}\,\mathrm{Im}\left[\phi^{x,\eta,*}\,\Pi^{x,\eta-\frac{1}{2}}\right] = 0   (2.52)

Given that a summary of the simulations used throughout this thesis is now complete, we must warn the reader that simulations are not the be-all and end-all of cosmic string studies, in the sense that one is prevented (by hardware and precision) from simulating a network of strings/walls throughout all of cosmic history. For this reason, a combination of analytical studies with simulations is often used. We will thus present in the next section an example of a model that describes the average properties of a network of defects, provided, of course, it is properly calibrated by simulations beforehand.

2.7 Network Evolution Modelling

The canonical way to analytically treat network evolution for defects is through the Velocity-dependent One-Scale (VOS) model [38].
For the cosmic string case, we begin by writing down the simplest action for an infinitely thin and infinitely long four-dimensional bosonic string,

S = -\mu\int\sqrt{-\gamma}\,d^2\sigma   (2.53)

where μ is the four-dimensional string tension and γ is the determinant of the induced world-sheet metric, γ_{ab} = g_{μν}∂_ax^μ∂_bx^ν, with ∂ denoting derivatives with respect to the world-sheet coordinates τ, σ. We note that while field theory strings do not have an infinitely small radius (in fact, this is undesirable in field theory simulations, as it leads to lattice artifacts), assuming a small radius as an approximation one is often able to recover the Nambu-Goto action as an effective action from the field theory one (see [65] for the Abelian-Higgs model). Bear in mind, however, that in some respects Nambu-Goto and field theory strings are massively different (all one needs to do is compare velocities and loop production rates from [27] and [50]). Still, this is beside the point we want to make presently. The procedure to derive the VOS model involves obtaining the equations of motion of the Nambu-Goto action on a flat FLRW background, where the metric is given by,

ds^2 = a^2(\tau)(d\tau^2 - d\mathbf{x}^2)   (2.54)

where a is the scale factor and τ is the conformal time, and where the transverse-temporal gauge (\dot{\mathbf{x}}\cdot\mathbf{x}' = 0) is applied,

\ddot{\mathbf{x}} + 2\frac{\dot a}{a}(1-\dot{\mathbf{x}}^2)\dot{\mathbf{x}} = \frac{1}{\epsilon}\left(\frac{\mathbf{x}'}{\epsilon}\right)'   (2.55)

\dot\epsilon = -2\frac{\dot a}{a}\epsilon\,\dot{\mathbf{x}}^2   (2.56)

where dots and dashes indicate derivatives with respect to the conformal time τ and the spatial world-sheet coordinate σ, respectively, and ε is the quantity,

\epsilon = \sqrt{\frac{\mathbf{x}'^2}{1-\dot{\mathbf{x}}^2}}   (2.57)

Then an averaging procedure is applied to obtain two macroscopic quantities,

\rho = \frac{\mu a}{V}\int\epsilon\,d\sigma \equiv \frac{\mu}{L^2}, \qquad v^2 = \frac{\int\dot{\mathbf{x}}^2\epsilon\,d\sigma}{\int\epsilon\,d\sigma}   (2.58)

a density, ρ, and a root-mean-squared velocity, v. Here μ is the string mass per unit length, ε is proportional to the string energy (the zeroth component of the energy-momentum tensor), a is the scale factor and L is the correlation length.
Once this averaging procedure is applied, a set of two differential equations is revealed,

2\frac{dL}{dt} = 2HL(1+v^2)   (2.59)

\frac{dv}{dt} = (1-v^2)\left(\frac{k}{L} - 2Hv\right)   (2.60)

and these comprise the core of the VOS model, where L is the correlation length, k is a momentum parameter (which, while originally a constant, can be analytically determined to take a specific form, detailed in the next subsection) and v is the root-mean-square (RMS) velocity. In order to obtain the standard model, we need to explicitly account for the velocity dependence of the momentum parameter and also include one of the main energy loss mechanisms of string networks.

2.7.1 Standard Velocity Dependent One-Scale Model

With regard to the first missing ingredient, a phenomenological shape for the momentum parameter k(v) was obtained via a combination of the explicit form of this parameter for the helicoidal string solution and subsequent comparison with Nambu-Goto simulations in the non-relativistic and relativistic regimes [39],

k(v) = \frac{2\sqrt{2}}{\pi}\,(1-v^2)(1+2\sqrt{2}v^3)\,\frac{1-8v^6}{1+8v^6}   (2.61)

which in the relativistic regime takes the following form,

k(v) = \frac{2\sqrt{2}}{\pi}\,\frac{1-8v^6}{1+8v^6}   (2.62)

This form of the momentum parameter presumes a maximal velocity squared, v² = 1/2 (obtained in the low expansion rate limit), and a maximum k(v) of 2\sqrt{2}/\pi, which corresponds to the small-amplitude limit of the helicoidal string solution ansatz. Note that even without considering this small-amplitude limit, this parameter cannot exceed unity unless a wiggly string is present.¹ For v > 1/\sqrt{2}, the momentum parameter is set to k(v) = 0.

¹ Essentially this follows from considering that in the wiggly case the one-scale approximation is not exactly valid: the curvature scale and the characteristic length are different. Assuming some proportionality between the two, a constant factor would then multiply the momentum parameter in the model.
Fig. 2.7 The two main mechanisms for the creation of cosmic string loops: the collision between two strings and the self-intersection of a single string

And now for the second missing ingredient: energy loss via a velocity-dependent function F(v). In the standard VOS, this function only has a term describing energy loss via loop production. The creation of loops occurs either when a long string self-intersects or when two long strings collide and exchange partners (depicted in Fig. 2.7). These loops (which contain some string length and therefore some energy density) contract and collapse to a point, vanishing. We remark that when cosmic strings meet, the probability of intercommutation is close to unity, although such is not necessarily the case for cosmic superstrings (where these probabilities follow from scattering amplitudes [63]). The rate of energy loss due to loop production is parametrized as being proportional to the velocity,

\frac{d\rho}{dt} = c\,v\,\frac{\rho}{L}   (2.63)

where c, the loop-chopping parameter, is the constant of proportionality. We then end up with the following VOS model,

2\frac{dL}{dt} = 2HL(1+v^2) + F(v)   (2.64)

\frac{dv}{dt} = (1-v^2)\left(\frac{k(v)}{L} - 2Hv\right)   (2.65)

where F(v) = cv.

Note that so far we have assumed strings, but in reality it is possible to deduce equivalent models for other defect networks (see [37, 40, 47]). We can also present the standard domain wall VOS, derived from a higher-dimensional analogue of the Nambu-Goto action, as its extended form will be used in this thesis,

\frac{dL}{dt} = HL(1+3v^2) + c_\omega v   (2.66)

\frac{dv}{dt} = (1-v^2)\left(\frac{k_\omega}{L} - 3Hv\right)   (2.67)

where k_ω and c_ω are, by analogy, the momentum parameter and the blob-chopping parameter. Note that in this standard case, both parameters are constants. Naturally, extensions of Nambu-Goto (and by extension of the VOS) can also come from higher-dimensional generalizations.
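The attractor behaviour of the standard string VOS, Eqs. (2.62), (2.64) and (2.65), can be checked numerically. The sketch below uses a simple Euler integration in the radiation era (H = 1/2t) with an illustrative loop-chopping value c = 0.23; function names and step sizes are our own:

```python
import math

def k_of_v(v):
    """Relativistic momentum parameter, Eq. (2.62), set to 0 for v > 1/sqrt(2)."""
    if v * v > 0.5:
        return 0.0
    return (2.0 * math.sqrt(2.0) / math.pi) * (1.0 - 8.0 * v**6) / (1.0 + 8.0 * v**6)

def evolve_vos(c=0.23, t=1.0, L=0.1, v=0.1, steps=200_000, dlnt=1e-4):
    """Euler integration of Eqs. (2.64)-(2.65) with F(v) = c v and H = 1/(2t)."""
    for _ in range(steps):
        H = 0.5 / t
        dt = t * dlnt
        dLdt = H * L * (1.0 + v * v) + 0.5 * c * v   # Eq. (2.64) divided by 2
        dvdt = (1.0 - v * v) * (k_of_v(v) / L - 2.0 * H * v)
        L += dLdt * dt
        v += dvdt * dt
        t += dt
    return L / t, v

xi, v = evolve_vos()
# the network reaches linear scaling: L/t and v settle to constants
```

Running with different initial conditions converges to the same (ξ, v), which is the linear scaling attractor discussed below Eq. (2.69).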
For instance, the most natural action for a cosmic superstring would be the Dirac-Born-Infeld action from [46], which looks like Nambu-Goto,

S = -\mu\int d\tau\,d\sigma\,\sqrt{-|\gamma_{\alpha\beta} + \lambda F_{\alpha\beta}|}   (2.68)

plus an additional U(1) gauge field strength F_{αβ} and the corresponding coupling constant λ. This is the reason for our previous comment on why a more "natural" cosmic superstring simulation might be derived from existing Nambu-Goto codes. For cosmic superstrings, this action supplemented with junction conditions (to ensure the conservation of charge) enables one to obtain junction dynamics in a string network, which can be incorporated in a VOS model for multiple strings; see for instance [51].

Going back to standard string network evolution, a notable behavior of these networks is known as linear scaling, one of the possible solutions of the VOS model. The solution is given by,

L \propto t \propto d_H, \qquad v = \mathrm{const.}   (2.69)

This linear attractor solution [38] can either be a curse or a blessing, depending on the defect. For instance, let us consider for a moment a domain wall, where the density is given by ρ = σ/L with σ the tension per unit area, and therefore ρ ∝ t^{-1}. By comparison with the critical density of the Universe (of order ρ_c ∝ t^{-2}) one can see that the density parameter for domain walls evolves linearly with time, and so they would eventually dominate the energy density of the Universe! Fortunately, this overclosing behavior can be evaded through various mechanisms (an example being the introduction of biases, see [57]). For analogous reasons (ρ = μ/L² ∝ t^{-2}, the same scaling as the critical density), cosmic strings are benign. Monopoles are a slightly more complicated case: depending on how effective monopole-anti-monopole pair annihilation is (global [35] vs local), and therefore on whether or not they tend to scaling, they can be catastrophic as well.
2.7.2 Extended Velocity Dependent One-Scale Model

As mentioned in the previous section, the standard k(v) for Nambu-Goto cosmic strings was shown to have a specific dependence on the velocity, while no such work had been done for domain walls in their respective standard VOS. This is a consequence of the non-existence of a non-trivial analytical ansatz for walls (in contrast with the cosmic string case, where the helicoidal string is used). However, in order to predict the correct velocity dependencies, and to ensure the walls VOS can accurately model the evolution in regimes where the velocity does change (examples include the radiation-to-matter transition, or the transition from an inflationary era to radiation), some modifications were required. By analogy with the Nambu string k(v), the following generalized momentum parameter was proposed [40],

k(v) = k_0\,\frac{1-(qv^2)^\beta}{1+(qv^2)^\beta}   (2.70)

where k_0, β and q are free parameters. Note as well that this reduces to the relativistic string k(v) after appropriate choices of parameters. In both cases q and k_0 have a clear physical interpretation: q is limited by the maximal velocity of the defect network,

0 < v_{\max}^2 \leq \frac{1}{q}   (2.71)

which, by the arguments of [58],

v_{\max}^2 = \frac{n}{n+1}   (2.72)

should not exceed 2/3 for walls and 1/2 for strings; k_0, assuming non-wiggliness, cannot exceed 2. The calibration of the extended VOS for walls in [40] respects these constraints. Later in this thesis we investigate whether the Nambu string form is adequate for the case of field theory strings or whether the generalized form must be used. The other ingredient of the extension adds another energy loss mechanism to F(v),

F(v) = c_\omega v + d\,[k_0 - k(v)]^r   (2.73)

where d and r are new free parameters. This new term phenomenologically assumes that radiation losses can be represented by a power law of the curvature.
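As a quick consistency check of Eq. (2.70) (a sketch; the parameter choices below are ours): taking k_0 = 2\sqrt{2}/\pi, q = 2 and β = 3 gives (qv²)^β = 8v⁶, recovering the relativistic string form of Eq. (2.62), with v_{max}² = 1/q = 1/2:

```python
import math

def k_generalized(v, k0, q, beta):
    """Generalized momentum parameter of Eq. (2.70)."""
    x = (q * v * v) ** beta
    return k0 * (1.0 - x) / (1.0 + x)

def k_relativistic(v):
    """Relativistic Nambu-Goto form, Eq. (2.62)."""
    return (2.0 * math.sqrt(2.0) / math.pi) * (1.0 - 8.0 * v**6) / (1.0 + 8.0 * v**6)

k0, q, beta = 2.0 * math.sqrt(2.0) / math.pi, 2.0, 3.0
vmax = 1.0 / math.sqrt(q)   # velocity at which k(v) crosses zero
```

The zero of k(v) at v = 1/\sqrt{q} is what encodes the maximal-velocity bound of Eq. (2.71).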
Essentially, regions where the wall (or string) has greater curvature are smoothed out and emit radiation. Note that this term does not specify the nature of the radiation: in the case of comparison with global domain walls it will be scalar radiation only, but one cannot determine how much of this radiation is emitted via the massless or the massive channel. In the case of walls it was shown that losses via radiation greatly exceed blob production (via the linear term above; see [40]). In this thesis we will also study whether this term is necessary (and sufficient) for describing the evolution of field theory strings, where the emission of massive radiation is posited to play a large role [27]. In the case of strings this emission of radiation should be greater in regions where the string has a discontinuity (such as a sharp kink) or where the string doubles back on itself (a cusp).

2.7.3 Observational Footprints from Semi-Analytical Modelling

The previously mentioned VOS model can be used to derive the expected observational footprints of a network of strings (or other defects) in a given cosmological background. Since this easily motivates the need for improvements in the calibration of semi-analytical models, especially in the light of next-generation facilities [15, 25, 36], we will now review how to connect such models to footprints in two different background types.

2.7.3.1 Cosmic Microwave Background

Approximately 370,000 years after its birth (redshift z ≈ 1100) the Universe had cooled and expanded enough to allow charged electrons and protons to bind, forming neutral hydrogen atoms. This had the consequence of greatly increasing the mean free path of photons, allowing them to traverse greater distances (previously Thomson scattering with electrons occurred after a short traveling distance).
The Universe effectively became "transparent." Shortly thereafter, since the neutral atoms had electrons not in the ground state, the electrons would decay towards the ground state and produce photons (this is known as photon decoupling). The combined effect is for these freely traveling photons to produce a background of electromagnetic radiation (at microwave wavelengths) that can be observed even today. The Cosmic Microwave Background, first detected in 1965 by Arno Penzias and Robert Wilson [43], is a landmark piece of evidence for the Big Bang origin of the Universe and, via successive observational probing [4, 10], has allowed for the confirmation of several pillars of modern Cosmology (such as ΛCDM and nucleosynthesis, and by providing evidence towards inflation). This background constitutes one of the most, if not the most, precisely described black-body radiation spectra observed to date. It currently has a temperature of 2.72548 ± 0.00057 K. Due to its rather homogeneous nature, the anisotropies of this background are often the object of study. For instance, the map of temperature anisotropies related to density perturbations can be described in terms of spherical harmonics,

\frac{\delta T}{T}(\mathbf{n}) = \sum_l\sum_{m=-l}^{l} a_{lm}\,Y_{lm}(\mathbf{n})   (2.74)

where l is the multipole moment, n is a unit vector along the line of sight and a_{lm} are coefficients which describe the temperature perturbation. We can additionally define the multipole power spectrum C_l,

C_l^T = \langle|a_{lm}|^2\rangle   (2.75)

which allows the full power spectrum of anisotropies to be written as,

\left\langle\left(\frac{\delta T}{T}(\mathbf{n})\right)^2\right\rangle = \frac{1}{4\pi}\sum_{l=0}^{\infty}(2l+1)\,C_l^T   (2.76)

D_l^T = \frac{l(l+1)}{2\pi}\,C_l^T   (2.77)

Note that these are not the only types of anisotropies of the CMB: one can also look into anisotropies in polarization, either curl-free or divergence-free, which by analogy with electrostatics are named E-mode (EE) and B-mode (BB) polarization power spectra, respectively, together with the cross-correlation temperature-E-polarization power spectrum (TE).
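The relation between C_l and the commonly plotted quantity D_l of Eq. (2.77) can be sketched with a toy spectrum (our own choice: C_l ∝ 1/[l(l+1)], the scale-invariant Sachs-Wolfe plateau, for which D_l is exactly flat):

```python
import math

def dl_from_cl(l, cl):
    """D_l = l(l+1) C_l / (2 pi), Eq. (2.77)."""
    return l * (l + 1) * cl / (2.0 * math.pi)

def variance_contribution(l, cl):
    """One term of the sum in Eq. (2.76), (2l+1) C_l / (4 pi)."""
    return (2 * l + 1) * cl / (4.0 * math.pi)

# toy scale-invariant spectrum: C_l = 1 / [l(l+1)] for l >= 2
cls = {l: 1.0 / (l * (l + 1)) for l in range(2, 100)}
dls = {l: dl_from_cl(l, c) for l, c in cls.items()}   # flat: 1/(2 pi) for all l
```

This is why spectra are usually displayed as D_l: a scale-invariant signal appears as a horizontal line.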
In order to obtain the theoretical expectation of how different processes contribute to the existence of perturbations and the resulting anisotropies, it is necessary to solve a set of linear differential equations in Fourier space [23],

\mathcal{D}_{ac}\tilde{X}_c(k,\eta) = \tilde{S}_a(k,\eta)   (2.78)

where \mathcal{D}_{ac} is a linear differential operator, \tilde{S} is a seed or source term, and \tilde{X} is a vector with information on all the perturbation variables for each k mode (including dark matter density, velocity fluctuations, etc.). Perturbations from the analysis of this equation can be classified as scalar, tensor or vector. Note that to linear order there is no mixing between perturbation types. Now, in order to solve this equation one must first make an assumption about the seed/source term above. In the beginning of the 1980s, the cosmological community was split between two distinct possibilities: inflation or topological defects. In the first case, perturbations are generated during inflation, pushed out of the cosmological horizon, and after inflation gradually re-enter (as the specific wavelengths become comparable to the horizon), then evolving passively under the effects of cosmological expansion and gravity. The phases of such perturbations remain constant after their production, hence the source term is null, \tilde{S}(k, τ) = 0. As soon as the perturbations re-enter the horizon they provoke in-phase (coherent) acoustic oscillations of the surrounding fluid. This gives rise to a structure of peaks and troughs in the resulting temperature power spectrum.

Topological defects, however, are an entirely different beast: a network will continue contributing perturbations actively throughout space as it evolves. Here the source term is no longer null and will generically depend on the energy-momentum tensor of the full network. Due to the nature of this seeding, an ensemble of perturbations will give rise to incoherent acoustic oscillations.
The computation of perturbations in Fourier space according to the aforementioned equation is then intrinsically linked to knowing the full history of the energy-momentum tensor of the full network. In order to compute the power spectrum, we must describe the source term and then solve for X̃, which should be a matter of computing,

$$\tilde{X}_j(\eta_0, \mathbf{k}) = \int_{\eta_{in}}^{\eta_0} d\eta\, G_{jm}(\eta_0; \eta, k)\, \tilde{S}_m(\eta, \mathbf{k}) \qquad (2.79)$$

and, for obtaining the power spectrum,

$$\langle \tilde{X}_j(\eta_0, \mathbf{k})\, \tilde{X}_l(\eta_0, \mathbf{k}') \rangle = \int d\eta\, d\eta'\, G_{jm}(\eta, k)\, G_{ln}(\eta', k')\, \langle \tilde{S}_m(\eta, \mathbf{k})\, \tilde{S}_n(\eta', \mathbf{k}') \rangle. \qquad (2.80)$$

Equivalently, this means that obtaining the power spectrum entails computing the Unequal-Time Correlators (UETC) of the seed term,

$$U_{mn}(\mathbf{k}, \eta, \eta') = \frac{1}{V} \int d^3x\, d^3x'\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')}\, \langle S_m(\eta, \mathbf{x})\, S_n(\eta', \mathbf{x}') \rangle. \qquad (2.81)$$

This can be done either in numerical simulations (both Nambu-Goto or Abelian-Higgs) or by following the Unconnected Segment Model approach from [44], wherein the seed terms are given by the total network energy-momentum tensor Θ_μν(k, η), given by a sum over K energy-momentum tensors of unconnected consolidated “sticks,”

$$\Theta_{\mu\nu}(\mathbf{k}, \eta) = \sum_{i=1}^{K} \sqrt{N_d^{\,i}}\; \Theta^{i}_{\mu\nu}\, T^{off}(\eta, \eta_i), \qquad (2.82)$$

where T^off is a string decay factor responsible for switching off the contribution of the i-th segment after the time of its decay, and N_d^i is the number of segments that decay between two different conformal times. The stress-energy tensor for each i-th segment will be given by the expression for a straight Nambu-Goto segment,

$$\Theta^{\mu\nu}(y) = \frac{1}{\sqrt{-g}} \int d\sigma \left( \mu \sqrt{\frac{x'^2}{\dot{x}^2}}\, \dot{x}^\mu \dot{x}^\nu - \mu \sqrt{\frac{\dot{x}^2}{x'^2}}\, x'^\mu x'^\nu \right) \delta^4(y - x(\sigma)). \qquad (2.83)$$

In Fourier space this quantity will depend on the string network velocity v and the comoving correlation length, ξ = aL. As an example see the 00 component below.

Fig. 2.8 The CMB spectrum obtained by the Planck collaboration (top) and the spectrum predicted by an inflationary model (bottom; obtained via the CMBACT4 software [44]).
The first image is taken from [6].

$$\Theta_{00}(\eta, k) = \frac{\mu}{\sqrt{1 - v^2}}\, \frac{\sin(\mathbf{k} \cdot \hat{\mathbf{X}}\, \xi \eta / 2)}{\mathbf{k} \cdot \hat{\mathbf{X}} / 2} \qquad (2.84)$$

where the vectors X̂ and X̂˙ are merely the segment orientation and velocity orientation (which can be related to the string position and velocity x and ẋ; see for instance [19]). Assuming an r.m.s. velocity, the two quantities v and ξ can be given by the VOS equations of the previous section. Note that this approach is extremely plastic: even though a network of long unconnected Nambu-Goto segments is considered, a network of long Abelian-Higgs strings could still be assumed, as long as the evolution of the VOS is dictated by free parameters calibrated from Abelian-Higgs simulations. In addition, this approach has also been applied to domain wall networks [60] and even cosmic superstrings [19], where limits on the wall tension (Gσ L₀ < 5.6 × 10⁻⁶, where L₀ is the characteristic lengthscale at the present time) and the fundamental string tension (Gμ_F < 2.8 × 10⁻⁸) are obtained, respectively. The Nambu-Goto string tension limit is Gμ < 1.1 × 10⁻⁷ [5, 19]. Although we have not mentioned it yet, the relatively low tension limits presented above are a direct result of inflation being the dominant contribution to the temperature power spectrum. This is obvious from a glance at the observational data, whose power spectrum shape is consistent with the structure of peaks and troughs predicted by inflation (see Fig. 2.8). Moreover, the broad hump predicted by a string network (see Chap. 4) is clearly not in agreement with the observed peak structure, and therefore strings cannot play a dominant role in structure formation. To be more quantitative, at multipole moment l = 10 the Planck collaboration [7] determined that strings cannot contribute more than 1–4% of the total temperature spectrum.
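As a quick sanity check of the segment form of eq. (2.84), the following minimal sketch (argument names are ours, and k_dot_X stands for the projection of the wavevector onto the segment direction) evaluates Θ₀₀ for a single segment; in the k → 0 limit it recovers μξη/√(1−v²), i.e. the boosted energy of a segment of comoving length ξη:

```python
import numpy as np

def theta_00(k_dot_X, v, xi, eta, mu=1.0):
    """00-component of a straight-segment stress tensor in Fourier space,
    following eq. (2.84):
        Theta_00 = mu/sqrt(1 - v^2) * sin(k.X xi eta / 2) / (k.X / 2)."""
    arg = k_dot_X * xi * eta / 2.0
    return mu / np.sqrt(1.0 - v**2) * np.sin(arg) / (k_dot_X / 2.0)

# small-k limit: gamma * mu * xi * eta (segment energy boosted by gamma)
small_k = theta_00(1e-8, v=0.6, xi=0.3, eta=1.0)
```

For large k·X̂ the sinc-like factor suppresses the contribution, reflecting the fact that modes much shorter than the segment length average out.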
Note however, that even if strings cannot contribute a significant amount to the temperature spectrum, they can still contribute significantly to the polarization spectra [14], whose characterization is one of the main reasons for the next-generation instrument COrE [25].

2.7.3.2 Stochastic Gravitational Wave Background

Another type of cosmological background expected to have been produced in the early Universe is the stochastic gravitational wave background (SGWB). Although as yet undetected, the recent observation of gravitational waves from black hole and neutron star mergers by the LIGO experiment [2, 3], together with both current and upcoming observational facilities [15, 29, 36], has brought the characterization of this background into the limelight. Several cosmological sources can contribute to this background (see [20] for a general review), including a network of cosmic strings. Such networks form loops, and cosmic string loops can emit gravitational waves with power [64],

$$\frac{dE_i}{dt} = P_i\, G\mu^2 \qquad (2.85)$$

where P_i is given by assumptions derived from the presence of small-scale structure on the loop (kinky or cuspy loops, for instance, give rise to different P_i). Assuming some (if not all) energy of the loops is emitted via this gravitational channel, loops forming and decaying throughout cosmic history will contribute to the SGWB. One possible way to describe the history of the loop number density involves using the VOS model [59, 62], a central part of this thesis. The main advantage of using the VOS is that the evolution of the network in non-scaling regimes can be determined exactly, provided the correct calibration of the model is used. We remark that in the case of the SGWB, the proper treatment of the radiation-to-matter transition can have an impact on the resulting spectra (again, see [59]).
Note that so far, such computations presume all the energy in a loop is lost via gravitational radiation, which is in line with Nambu-Goto simulations, though possibly not with Abelian-Higgs simulations (due to the hinted presence of scalar radiation). Some work has been done on studying the validity of the Nambu-Goto approximation in flat-space backgrounds, although the initial shape of the loop can yield contradicting conclusions [28, 41]; we will return to this point in the conclusions of the thesis. For now we will show how the SGWB follows from the number density of string loops, and how this can be computed from the VOS. To do so, we begin with the gravitational wave energy density spectrum in a unit logarithmic interval of frequency, which can be written as a sum over emission modes i as,

$$\Omega_{gw}(f) = \sum_{i=1}^{\infty} \Omega^{i}_{gw}(f) \qquad (2.86)$$

$$= \frac{G\mu^2}{\rho_c f} \sum_{i=1}^{\infty} C_i P_i \qquad (2.87)$$

where the specific expression for Ω^i_gw(f) can be modelled (following [16, 17, 64]) as if it were composed of different power spectra P_i in harmonic i of loops, where C_i accounts for the contribution of the number density n(l, t) of loops of length l at different times. The first assumption often made is that the power spectrum P_i is dominated by cusps along loops, and thus takes the form P_i = Γ/(i^q ζ(q)), where q = 4/3 for cuspy loops, and the normalization is such that Γ corresponds to the total emitted power, Γ = Σ_i P_i, which from Nambu-Goto simulations takes the value Γ ≈ 50 [17]. We note that this might not be valid for Abelian-Higgs simulations. The expression for C_i is given by,

$$C_i(f) = \frac{2i}{f^2} \int_0^{\infty} dt\, \frac{n(l_i, t)}{(1+z)^5} \qquad (2.88)$$

where z is the redshift. With this, the crux of computing the SGWB is reduced to how this number density evolves throughout cosmic history.
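The normalization of the mode powers can be checked numerically. Below is a sketch assuming the cuspy values q = 4/3 and Γ = 50, with ζ(q) approximated by a truncated sum plus its integral tail so that no external libraries are needed; the function name is ours:

```python
def loop_power_modes(Gamma=50.0, q=4.0 / 3.0, n_terms=200_000):
    """Cusp-dominated emission spectrum P_i = Gamma / (i^q * zeta(q)).

    zeta(q) is approximated by a truncated sum plus the integral tail
    N^(1-q)/(q-1), which is accurate for q > 1."""
    zeta_q = sum(i**-q for i in range(1, n_terms + 1)) \
        + n_terms**(1 - q) / (q - 1)
    return [Gamma / (i**q * zeta_q) for i in range(1, n_terms + 1)]

P = loop_power_modes()
# sum(P) approaches Gamma as more harmonics are kept; convergence is slow
# for q = 4/3 (the truncated tail scales as n_terms^(-1/3))
```

The slow i^(−4/3) fall-off is why high harmonics matter for cuspy loops: truncating the mode sum too early visibly distorts the high-frequency end of the SGWB spectrum.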
The expression for loops created (per comoving volume) at time t_c can be computed by taking into account that,

$$n(l_c, t_c) = \left.\frac{dn_c}{dt}\right|_{t=t_c} \left.\frac{dt}{dl}\right|_{t=t_c} \qquad (2.89)$$

where n_c is the rate of loop production and dl/dt is the rate of change of loop length. An expression for the first term can be found by dividing the expression for energy loss through loop production, dρ/dt = c̃vρ/L, by the energy of each loop at the moment of creation, E = μl, where l is the size of the loops at creation (assumed to be some fraction of the size of the horizon, l = αL),

$$\frac{dn_c}{dt} = \frac{\tilde{c}}{\alpha} \frac{v}{L^4} \qquad (2.90)$$

which is almost uniquely determined by the VOS, either through calibration (c̃) or via evolution of the equations (L, v). Note however that α needs to be determined by numerical simulations, and so far it has been determined only for Nambu-Goto, where it has been found to be α = 0.1. The second term can be found by noting that the current loop length is given by the difference between the initial loop size αL_c and the power radiated,

$$l(t) = \alpha L_c + \Gamma G\mu (t_c - t) \qquad (2.91)$$

where L_c is the mean string separation at creation time, and therefore l(t_c) = αL_c. In other words, we can write the second term as,

$$\left.\frac{dt}{dl}\right|_{t=t_c} = \frac{1}{\alpha \frac{dL}{dt} + \Gamma G\mu}. \qquad (2.92)$$

Afterwards, it is simply a matter of remembering that a given number density of loops will contain loops created at different times. Therefore we must sum over these different creation times,

$$n(l, t) = \sum_c n(l_c, t_c) \left( \frac{a(t_c)}{a(t)} \right)^3 \qquad (2.93)$$

and from this one can compute the entire SGWB spectrum. In [17], an older release of NANOGrav data (and other Pulsar Timing Array data) seemed to yield a bound on the string tension of Gμ < 1.5 × 10⁻¹¹. We do note however the presence of an unexplained signal in the latest 12.5-year NANOGrav data [8], whose shape is consistent with the SGWB spectra produced by a cosmic string network [18, 24], assuming tensions in the range Gμ ∈ [4 × 10⁻¹¹, 10⁻¹⁰].
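The chain (2.89)–(2.93) can be evaluated numerically. Below is a minimal sketch assuming an idealized radiation-era scaling solution L = εt with constant velocity v and a ∝ t^(1/2); the parameter values (c̃, v, ε, ΓGμ) are illustrative placeholders, not calibrations quoted in the text:

```python
# Illustrative scaling parameters (placeholders, not calibrations):
alpha, ctilde, v, eps = 0.1, 0.23, 0.65, 0.15
GammaGmu = 50.0 * 1e-11  # Gamma * G * mu

def loop_density(l, t):
    """Evaluate eqs. (2.89)-(2.93) for one (l, t) pair.

    A loop of length l observed at time t was born at a unique creation
    time t_c, found by inverting l = alpha*eps*t_c - GammaGmu*(t - t_c);
    its contribution is the production rate times dt/dl, diluted by the
    volume factor (a(t_c)/a(t))^3 with a ~ t^(1/2)."""
    tc = (l + GammaGmu * t) / (alpha * eps + GammaGmu)
    if not 0.0 < tc <= t:
        return 0.0                                 # no such loop exists yet
    L = eps * tc                                   # scaling: L(t_c) = eps*t_c
    dnc_dt = (ctilde / alpha) * v / L**4           # eq. (2.90)
    dt_dl = 1.0 / (alpha * eps + GammaGmu)         # eq. (2.92), dL/dt = eps
    return dnc_dt * dt_dl * (tc / t) ** 1.5        # eq. (2.93) dilution
```

Smaller loops today were produced earlier, when L was smaller and the production rate per volume higher, so the density rises steeply towards small l, as the sketch reproduces.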
Note however, that it might be too early to claim that a SGWB from a network of strings has been detected: the signal itself does not seem (at least at first sight) to exhibit quadrupolar spatial correlations. As such, further studies (and observations) are required to properly characterize it. To conclude, we offer some remarks about the computation of the SGWB for cosmic superstring networks, which so far is poorly understood. For now, the role of junctions and their interplay with aspects of string phenomenology, and the subsequent impact on gravitational wave production, is clearly not yet understood. What can be computed is the evolution of string networks with reduced intercommutation probabilities (and therefore reduced loop number densities), or the multi-tension VOS model for some fixed number of bound states. This was done in [61], where a conservative limit on the tension of F-strings was obtained: Gμ < 3.2 × 10⁻⁹.

2.8 Summary

We showed that the study of cosmic defects often employs a combination of numerical and analytical techniques. We also described more exotic string types, such as cosmic superstrings, which have phenomenological aspects that can be captured via field theory simulations with more than one interacting string type. Both numerical and analytical approaches to string studies have advantages and disadvantages: semi-analytical models allow one to study the whole cosmological evolution of the network, which simulations, limited by resolution, box size and dynamical range, cannot. However, any such model often contains free parameters which cannot be obtained ab initio, but must instead be calibrated using simulations.
Since this symbiotic relationship sits at the heart of this thesis, we also described lattice field theory simulations for two types of defects (global domain walls and Abelian-Higgs strings) and described the semi-analytical modelling to be used from here on out: versions of the canonical Velocity-dependent One-Scale model of string evolution.

References

1. Skyrme THR (1961) A non-linear field theory. Proc R Soc Lond A 260(1300):127–138. ISSN 0080-4630. https://doi.org/10.1098/rspa.1961.0018. http://rspa.royalsocietypublishing.org/content/260/1300/127
2. Abbott BP et al (2016) Observation of gravitational waves from a binary black hole merger. Phys Rev Lett 116(6):061102. https://doi.org/10.1103/PhysRevLett.116.061102
3. Abbott BP et al (2017) Multi-messenger observations of a binary neutron star merger. Astrophys J Lett 848(2):L12. https://doi.org/10.3847/2041-8213/aa91c9
4. Ade P et al (2016) Planck 2015 results. XIII. Cosmological parameters. Astron Astrophys 594:A13. https://doi.org/10.1051/0004-6361/201525830
5. Ade PAR et al (2014a) Planck 2013 results. XXV. Searches for cosmic strings and other topological defects. Astron Astrophys 571:A25. https://doi.org/10.1051/0004-6361/201321621
6. Ade PAR et al (2014b) Planck 2013 results. XVI. Cosmological parameters. Astron Astrophys 571:A16. https://doi.org/10.1051/0004-6361/201321591
7. Ade PAR et al (2014c) Planck 2013 results. XXV. Searches for cosmic strings and other topological defects. Astron Astrophys 571:A25. https://doi.org/10.1051/0004-6361/201321621
8. Arzoumanian Z et al (2020) The NANOGrav 12.5 yr data set: search for an isotropic stochastic gravitational-wave background. Astrophys J Lett 905(2):L34. https://doi.org/10.3847/2041-8213/abd401
9. Ascher UM, Mattheij RMM, Russell RD (1988) Numerical solution of boundary value problems for ordinary differential equations. Class Appl Math
10.
Bennett CL, Larson D, Weiland JL, Jarosik N, Hinshaw G, Odegard N, Smith KM, Hill RS, Gold B, Halpern M, Komatsu E, Nolta MR, Page L, Spergel DN, Wollack E, Dunkley J, Kogut A, Limon M, Meyer SS, Tucker GS, Wright EL (2013) Nine-year Wilkinson microwave anisotropy probe (WMAP) observations: final maps and results. APJS 208(2):20. https://doi.org/10.1088/0067-0049/208/2/20
11. Berkovits N (2000) The Tachyon potential in open Neveu-Schwarz string field theory. JHEP 04:022. https://doi.org/10.1088/1126-6708/2000/04/022
12. Bevis N, Saffin PM (2008) Cosmic string Y-junctions: a comparison between field theoretic and Nambu-Goto dynamics. Phys Rev D 78:023503. https://doi.org/10.1103/PhysRevD.78.023503
13. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D 75:065015. https://doi.org/10.1103/PhysRevD.75.065015
14. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2010) CMB power spectra from cosmic strings: predictions for the Planck satellite and beyond. Phys Rev D 82:065004. https://doi.org/10.1103/PhysRevD.82.065004
15. Binetruy P, Bohe A, Caprini C, Dufaux J-F (2012) Cosmological backgrounds of gravitational waves and eLISA/NGO: phase transitions, cosmic strings and other sources. JCAP 1206:027. https://doi.org/10.1088/1475-7516/2012/06/027
16. Blanco-Pillado JJ, Olum KD, Shlaer B (2014) The number of cosmic string loops. Phys Rev D 89(2):023512. https://doi.org/10.1103/PhysRevD.89.023512
17. Blanco-Pillado JJ, Olum KD, Siemens X (2017) New limits on cosmic strings from gravitational wave observation
18. Blasi S, Brdar V, Schmitz K (2021) Has NANOGrav found first evidence for cosmic strings? Phys Rev Lett 126(4):041305. https://doi.org/10.1103/PhysRevLett.126.041305
19. Charnock T, Avgoustidis A, Copeland EJ, Moss A (2016) CMB constraints on cosmic strings and superstrings. Phys Rev D 93(12):123503. https://doi.org/10.1103/PhysRevD.93.123503
20.
Christensen N (2019) Stochastic gravitational wave backgrounds. Rept Prog Phys 82(1):016903. https://doi.org/10.1088/1361-6633/aae6b5
21. Dai J, Leigh RG, Polchinski J (1989) New connections between string theories. Mod Phys Lett A 4:2073–2083. https://doi.org/10.1142/S0217732389002331
22. Davis A-C, Brax P, van de Bruck C (2008) Brane inflation and defect formation. Phil Trans Roy Soc Lond A366:2833–2842. https://doi.org/10.1098/rsta.2008.0065
23. Durrer R, Kunz M, Melchiorri A (2002) Cosmic structure formation with topological defects. Phys Rept 364:1–81. https://doi.org/10.1016/S0370-1573(02)00014-5
24. Ellis J, Lewicki M (2021) Cosmic string interpretation of NANOGrav pulsar timing data. Phys Rev Lett 126(4):041304. https://doi.org/10.1103/PhysRevLett.126.041304
25. Finelli F et al (2018) Exploring cosmic origins with CORE: inflation. JCAP 1804:016. https://doi.org/10.1088/1475-7516/2018/04/016
26. Firouzjahi H, Leblond L, Tye SH (2006) The (p, q) string tension in a warped deformed conifold. J High Energy Phys 2006(05):047. https://doi.org/10.1088/1126-6708/2006/05/047
27. Hindmarsh M, Lizarraga J, Urrestilla J, Daverio D, Kunz M (2017) Scaling from gauge and scalar radiation in Abelian Higgs string networks. Phys Rev D 96(2):023525. https://doi.org/10.1103/PhysRevD.96.023525
28. Hindmarsh M, Lizarraga J, Urio A, Urrestilla J (2021) Loop decay in Abelian-Higgs string networks
29. Jenet F, Finn LS, Lazio J, Lommen A, McLaughlin M, Stairs I, Stinebring D, Verbiest J, Archibald A, Arzoumanian Z, Backer D, Cordes J, Demorest P, Ferdman R, Freire P, Gonzalez M, Kaspi V, Kondratiev V, Lorimer D, Lynch R, Nice D, Ransom S, Shannon R, Siemens X (2009) The North American Nanohertz Observatory for Gravitational Waves
30. Jones NT, Stoica H, Tye SHH (2003) The production, spectrum and evolution of cosmic strings in brane inflation. Phys Lett B 563:6–14. https://doi.org/10.1016/S0370-2693(03)00592-6
31.
Kachru S, Kallosh R, Linde AD, Maldacena JM, McAllister LP, Trivedi SP (2003) Towards inflation in string theory. JCAP 10:013. https://doi.org/10.1088/1475-7516/2003/10/013
32. Kibble TWB (1976) Topology of cosmic domains and strings. J Phys A 9:1387–1398. https://doi.org/10.1088/0305-4470/9/8/029
33. Laguna P, Matzner RA (1990) Numerical simulation of bosonic superconducting string interactions. Phys Rev D 41:1751–1763. https://doi.org/10.1103/PhysRevD.41.1751
34. Lizarraga J, Urrestilla J (2016) Survival of pq-superstrings in field theory simulations. JCAP 1604(04):053. https://doi.org/10.1088/1475-7516/2016/04/053
35. Lopez-Eiguren A, Lizarraga J, Hindmarsh M, Urrestilla J (2017) Cosmic microwave background constraints for global strings and global monopoles. JCAP 1707:026. https://doi.org/10.1088/1475-7516/2017/07/026
36. Maartens R, Abdalla FB, Jarvis M, Santos MG (2015) Overview of cosmology with the SKA. PoS AASKA14:016. https://doi.org/10.22323/1.215.0016
37. Martins CJAP, Achúcarro A (2008) Evolution of local and global monopole networks. Phys Rev D 78:083541. https://doi.org/10.1103/PhysRevD.78.083541
38. Martins CJAP, Shellard EPS (1996) Scale-invariant string evolution with friction. Phys Rev D 53:R575–R579. https://doi.org/10.1103/PhysRevD.53.R575
39. Martins CJAP, Shellard EPS (2002) Extending the velocity dependent one scale string evolution model. Phys Rev D 65:043514. https://doi.org/10.1103/PhysRevD.65.043514
40. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
41. Matsunami D, Pogosian L, Saurabh A, Vachaspati T (2019) Decay of cosmic string loops due to particle radiation. Phys Rev Lett 122(20):201301. https://doi.org/10.1103/PhysRevLett.122.201301
42. Moore J, Shellard E, Martins C (2002) On the evolution of Abelian-Higgs string networks. Phys Rev D 65:023503.
https://doi.org/10.1103/PhysRevD.65.023503
43. Penzias AA, Wilson RW (1965) A measurement of excess antenna temperature at 4080 Mc/s. APJ 142:419–421. https://doi.org/10.1086/148307
44. Pogosian L, Vachaspati T (1999) Cosmic microwave background anisotropy from wiggly strings. Phys Rev D 60:083504. https://doi.org/10.1103/PhysRevD.60.083504
45. Polchinski J (2005) Cosmic superstrings revisited. Int J Mod Phys A 20:3413–3415. https://doi.org/10.1142/S0217751X05026686. [AIP Conf Proc 743:331 (2005)]
46. Polchinski J (2007) String theory. Vol. 2: Superstring theory and beyond. Cambridge University Press. ISBN 9780511252280, 9780521633048, 9780521672283
47. Pourtsidou A, Avgoustidis A, Copeland EJ, Pogosian L, Steer DA (2011) Scaling configurations of cosmic superstring networks and their cosmological implications. Phys Rev D 83:063525. https://doi.org/10.1103/PhysRevD.83.063525
48. Press WH, Ryden BS, Spergel DN (1989) Dynamical evolution of domain walls in an expanding universe. Astrophys J 347:590–604. https://doi.org/10.1086/168151
49. Randall L, Sundrum R (1999) A large mass hierarchy from a small extra dimension. Phys Rev Lett 83:3370–3373. https://doi.org/10.1103/PhysRevLett.83.3370
50. Ringeval C, Sakellariadou M, Bouchet F (2007) Cosmological evolution of cosmic string loops. JCAP 0702:023. https://doi.org/10.1088/1475-7516/2007/02/023
51. Rybak IY, Avgoustidis A, Martins CJAP (2019) Dynamics of junctions and the multi-tension velocity-dependent one-scale model. Phys Rev D 99:063516. https://doi.org/10.1103/PhysRevD.99.063516
52. Saffin PM (2005) A practical model for cosmic (p, q) superstrings. JHEP 09:011. https://doi.org/10.1088/1126-6708/2005/09/011
53. Sakellariadou M (2008) Production of topological defects at the end of inflation. Lect Notes Phys 738:359–392. https://doi.org/10.1007/978-3-540-74353-8_10
54. Sarangi S, Tye SHH (2002) Cosmic string production towards the end of brane inflation. Phys Lett B 536:185–192.
https://doi.org/10.1016/S0370-2693(02)01824-5
55. Schwarz JH (1995) An SL(2, Z) multiplet of type IIB superstrings. Phys Lett B 360(1):13–18. ISSN 0370-2693. https://doi.org/10.1016/0370-2693(95)01138-G
56. Sen A (2000) Non-BPS D-branes in string theory. Class Quantum Gravity 17(5):1251–1256. https://doi.org/10.1088/0264-9381/17/5/334
57. Sikivie P (1982) Of Axions, domain walls and the early universe. Phys Rev Lett 48:1156–1159. https://doi.org/10.1103/PhysRevLett.48.1156
58. Sousa L, Avelino PP (2011) The cosmological evolution of p-brane networks. Phys Rev D 84:063502. https://doi.org/10.1103/PhysRevD.84.063502
59. Sousa L, Avelino PP (2014) Stochastic gravitational wave background generated by cosmic string networks: the small-loop regime. Phys Rev D 89(8):083503. https://doi.org/10.1103/PhysRevD.89.083503
60. Sousa L, Avelino PP (2015) Cosmic microwave background anisotropies generated by domain wall networks. Phys Rev D 92(8):083520. https://doi.org/10.1103/PhysRevD.92.083520
61. Sousa L, Avelino PP (2016) Probing cosmic superstrings with gravitational waves. Phys Rev D 94(6):063529. https://doi.org/10.1103/PhysRevD.94.063529
62. Sousa L, Avelino PP, Guedes GSF (2020) Full analytical approximation to the stochastic gravitational wave background generated by cosmic string networks. Phys Rev D 101(10):103508. https://doi.org/10.1103/PhysRevD.101.103508
63. Tong D (2009) String theory
64. Vilenkin A (1981) Gravitational radiation from cosmic strings. Phys Lett 107B:47–50. https://doi.org/10.1016/0370-2693(81)91144-8
65. Vilenkin A, Shellard EPS (2000) Cosmic strings and other topological defects. Cambridge University Press. ISBN 9780521654760. http://www.cambridge.org/mw/academic/subjects/physics/theoretical-physics-and-mathematical-physics/cosmic-strings-and-other-topological-defects?format=PB
66. Wilson KG (1974) Confinement of quarks. Phys Rev D 10:2445–2459.
https://doi.org/10.1103/PhysRevD.10.2445
67. Witten E (1985) Superconducting strings. Nucl Phys B 249:557–592. https://doi.org/10.1016/0550-3213(85)90022-7
68. Witten E (1985) Cosmic superstrings. Phys Lett 153B:243–246. https://doi.org/10.1016/0370-2693(85)90540-4
69. Witten E (1998) D-branes and K theory. JHEP 12:019. https://doi.org/10.1088/1126-6708/1998/12/019
70. Zwiebach B (2006) A first course in string theory. Cambridge University Press. ISBN 978-0-521-83143-7, 978-0-511-20757-0

Chapter 3 Supercomputing with Graphics Processing Units

A very large part of space-time must be investigated, if reliable results are to be obtained.
Alan Turing

3.1 An Introduction to High Performance Computing

For well over a decade, the empirical Moore's law (a doubling in transistor count should occur every 18–24 months, courtesy of refined manufacturing processes) has been upheld. Up until the early 2000s, the way to use these extra transistors was to create more complex processors, with larger caches and much faster clock frequencies. However, this eventually proved unfeasible: simply increasing clock frequencies in a much smaller package eventually resulted in power and heat limitations. The way around this was to introduce parallelism in the processor itself, either through vector-based instructions, simultaneous multi-threading,1 or simple glue logic: just add a second independent (bar cache sharing) processor (called a "core"). However, while hardware multi-threading is managed by the Operating System, and vector instructions are compiler-managed, the onus of multi-core usage falls on the programmer. The type of programming needed to leverage multi-core architectures is commonly known as parallel programming and, while challenging, it is a necessary way to perform highly demanding (in computational time, for instance) scientific computing tasks.
1 A thread can be defined as an instruction stream; in hardware-based multi-threading, two streams of different types of instructions can be executed simultaneously.

Since the study of cosmic defects outlined in the introduction and expanded upon in subsequent chapters requires leveraging extreme hardware resources, an exploration of High Performance Computing (essentially, parallel computing) is necessary. Note that since defect simulations (especially field theory simulations) are heavily bottlenecked, they end up limiting analytical and observational constraints, both current and for future observational studies. Improving current simulations will then allow more accurate descriptions of defect networks, and the reasons for this are manyfold: increases in lattice size and resolution allow one to probe smaller and smaller sub-horizon scales and thus more properly resolve the small-scale structure of strings, and also to evolve the lattice for a longer amount of time (the final simulation time is normally given by the time when the horizon is half the box size). As a consequence, a large part of the thesis was spent creating two defect evolution codes from scratch, one for global domain walls and the other for Abelian-Higgs cosmic strings, both able to exploit GPU accelerators. However, it is understandable that most readers of a thesis submitted for a doctoral degree in physics will not necessarily be interested in the computing details underpinning the codes used. As such, this entire chapter is an effort to self-contain all computing aspects of this manuscript, and we will leave new results in physics to later chapters.
We will review some concepts in this area (both the architectures and programming paradigms used) and then present a parallel implementation of the evolution of the simplest defect network: domain walls. With this said, we would like to stress that parallelism is not a silver bullet: even well-optimized codes can be bottlenecked, and achieving well-optimized code is already a difficult task in itself. The second part of this chapter will contain original content from the following publications:

• Work published in Physical Review E, titled "General purpose graphics-processing-unit implementation of cosmological domain wall network evolution," found at the following reference [15];
• Work published in Astronomy and Computing, titled "Abelian-Higgs cosmic string network evolution with the Compute Unified Device Architecture." This publication can be found at the following reference [16];
• Work published in Astronomy and Computing, titled "Abelian-Higgs cosmic string evolution with multiple GPUs." This publication can be found at the following reference [17];

which discuss benchmarks and the behavior of the implementations. The first part, which begins in the next subsection, contains a bird's-eye overview of Amdahl's and Gustafson's laws, architectures and programming paradigms, and a description of GPU accelerators. This is non-original content which would otherwise be located in the introduction, were it not for the fact that all computing content is relegated here.

3.1.1 Amdahl and Gustafson's Laws

In order to clearly expose how parallelism is absolutely necessary to execute large simulations in a reasonable amount of time, we will explain two laws that underpin all parallel computing. The first is commonly called Amdahl's law, proposed in 1967 by computer scientist Gene Amdahl. It is a formula for predicting the theoretical maximum speed-up at a fixed workload as the number of processors increases.
This law takes the following form:

$$S = \frac{1}{(1 - p) + \frac{p}{N}} \qquad (3.1)$$

where S is the maximum theoretical speed-up, p the proportion of the program that can be parallelized, and N the number of processors. This law already states a limitation of parallelizing programs: as the number of processors increases, N → ∞, the maximum speed-up tends to S = 1/(1 − p), where 1 − p is the serial portion of the code. Therefore, speed-up is limited by the sequential part of the program. Measured speed-up is obtained by computing,

$$S = \frac{t_1}{t_N} \qquad (3.2)$$

where t_1 is the serial execution time, and t_N is the execution time with N processors. Characterizing how this measured speed-up varies with the number of processors for a fixed problem size is known as strong scaling. However, we don't always want to execute the same problem size faster: we often wish to execute larger and larger problem sizes. Conversely, for a small problem size it might be overkill to use a large number of processors. In 1988 computer scientists John Gustafson and Edwin Barsis presented Gustafson's law. This law still predicts the maximum theoretical speed-up, but the problem size now scales with the number of processors. This "scaled" speed-up is then,

$$S = s + p \times N \qquad (3.3)$$

where p and N have exactly the same meaning as in Amdahl's law, and s = 1 − p is the serial proportion. In this law the speed-up scales linearly with the number of processors: the problem size per processor stays constant, and additional processors are used to solve a larger problem. Characterizing the measured speed-up in this way is known as weak scaling. Weak scaling is often necessary for applications that are limited by the amount of memory. Both will be demonstrated for the developed multi-GPU Abelian-Higgs simulation code later in the chapter.

3.1.2 Architectures and Programming Paradigms

As mentioned in the previous section, getting the best possible performance often requires mapping to the hardware present in typical supercomputers.
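The two scaling laws above can be stated compactly in code; a minimal sketch (the function names are ours):

```python
def amdahl_speedup(p, n):
    """Fixed-size (strong-scaling) speed-up, eq. (3.1):
    S = 1 / ((1 - p) + p/N)."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    """Scaled (weak-scaling) speed-up, eq. (3.3), with s = 1 - p:
    S = s + p*N."""
    return (1.0 - p) + p * n
```

For example, a code that is 95% parallel can never exceed a 20x strong-scaling speed-up (`amdahl_speedup(0.95, n)` saturates at 1/0.05 as n grows), while under Gustafson's assumption the same code keeps scaling because the per-processor workload is held fixed.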
As an introduction we will briefly explore two different types of programming models and how they are suited to different hardware architectures. The good news is that supercomputers are often hybrid machines: the basic building blocks are indeed shared memory machines daisy-chained into a distributed platform. We begin with the most familiar architecture: shared memory. Unless otherwise stated, the material in this section and its subsections is based on the following materials: [4, 5, 13, 43].

3.1.2.1 Vector and Multithreaded Computing

According to Flynn's taxonomy, computer architectures can be classified by how instruction streams are applied to data. One possible class comprises the execution of a single instruction applied simultaneously to multiple data items (Single Instruction Multiple Data, SIMD). Most contemporary processing devices support applying an instruction to multiple data streams, typically via vectorized instruction sets (as is the case of Intel CPUs using MMX, SSE and AVX instructions), even if they are not designed from the ground up to execute everything in such a way. A specific variant of this classification is known as Single Instruction Multiple Threads (SIMT), in which threads are used to allow the simultaneous issuing of instructions to multiple data items. A thread is a stream of instructions spawned from a single process, which shares memory with other threads spawned from the same process. We stress here an important distinction: even though a process can also be thought of as a stream of instructions, processes do not share memory, and therefore explicit communications must be implemented. As a result, threaded implementations are often memory-light when compared to their distributed counterparts. Architectures which support the simultaneous execution of these threads, and allow them equal access to memory, are often denoted shared memory architectures.
As a simple analogy for how a shared memory architecture works, think of a whiteboard in an office where two office mates collaborate: both are working on the same problem and have access to the same data, barring small amounts of private data (Fig. 3.1). The most familiar example of a SIMD machine is comprised of a familiar Central Processing Unit (CPU or processor) and a pool of memory, which allows vector instructions to be applied to multiple data items. In addition, it allows a process to spawn multiple threads, where in each thread an instruction can be executed simultaneously on the cores available to it; this is known as Simultaneous Multi-Threading (SMT). While threads existed even before the advent of parallel computing (to allow for concurrency), they became quite useful in this area, as pinning a thread to a single core, with no migration to other cores, is an effective way to parallelize applications. We can also think of an alternative classification of computer architectures that maps well to threaded programming, often denoted a shared memory architecture.

Fig. 3.1 In the first image, 3.1a, we show a die shot of a 10-core Intel Ivy Bridge Xeon. In the second, 3.1b, a schematic representation of the office analogy for shared memory architectures, where office mates (threads) can read and write from the same board. Taken from [4]

In any such machine a common, shared pool of memory is available to all threads of a given executing program. There are also several programming paradigms that allow writing code specifically optimized for these shared memory architectures: some are directive-based (like Open Multi-Processing, OpenMP, and Open Accelerators, OpenACC) and others kernel-based (the Open Computing Language, OpenCL, and the Compute Unified Device Architecture, CUDA), and, with the exception of CUDA, they are known to work on both accelerators and traditional CPUs.
We remark, however, on a peculiarity: depending on the problem at hand and on the hardware used, well-optimized code might be easier to achieve in one paradigm than in another.² They all share a couple of things in common, essentially all the pitfalls of threaded programming: the need for synchronization mechanisms to avoid threads interfering with each other (commonly known as a "data race") and the fact that they are limited to the amount of memory available in a single shared memory platform. There is a way to avoid being limited to a single Processing Unit (be it a GPU or a CPU), and that is to marry Message-Passing with the threaded approach. It is thus inevitable that, in order to bypass this limitation, it is necessary to explore distributed memory architectures, to be covered two sections ahead. For the purposes of this manuscript, and before we move on to distributed architectures, we will dwell in more detail on another shared memory architecture, built from the ground up for parallel tasks (using both SIMD and SIMT): the Graphics Processing Unit (GPU).

² For instance, since OpenMP 4.5's support for GPUs is still in its infancy, it can be more difficult to match the performance of a comparable CUDA-based code.

3.1.2.2 GPUs as Computing Accelerators

GPUs, while originally intended only for graphical purposes, have in recent years also been used for data-parallel scientific tasks. At its core this results from a simple statement: GPUs are designed from the ground up to perform as well as possible on parallel workloads, often using a number of features to maximize compute throughput. This is already clear at the instruction level, where the parallelism is evident in the lane-wise SIMD instructions all cards (AMD or Nvidia) employ, which essentially use threads to implement SIMD (Nvidia calls this Single Instruction Multiple Threads).
To really contrast this with a typical CPU, let us compare how GPU "cores" operate with how a traditional CPU core operates. A CUDA core/streaming processor is an Arithmetic Logic Unit (ALU) which performs operations on scalar integers or single-precision floating point values, and a collection of such units is known as a Streaming Multiprocessor (Nvidia, since the Fermi architecture) or a Compute Unit (AMD, since the premiere of the Graphics Core Next architecture). In the original design, a streaming multiprocessor (SM) contained 8 streaming processors (SP). These SMs only receive one instruction at a time, which means that the 8 SPs all execute the same instruction. We can think of such SMs as the GPU equivalent of the CPU core. Execution proceeds through a warp (32 threads), where the 8 SPs spend 4 clock cycles executing a single instruction on multiple data (SIMD). In the AMD case, each compute unit is composed of four SIMD units, each equipped with a 16-lane-wide vector Arithmetic Logic Unit (vALU). Due to this structure, a group of 64 threads scheduled for execution on a compute unit over four cycles is referred to as a "wavefront" (the analogue of the 32-thread "warp" in Nvidia cards). But there's more: each SIMD unit also has an instruction buffer for 10 wavefronts, each of which can be scheduled for execution. This means that, at any given time, a Radeon M395³ with 28 compute units can handle 71,680 active threads.⁴ The obvious cost is single-threaded performance. So, unlike a CPU, GPUs have to rely heavily on thread-level parallelism. In reality it is not just the prior points that make GPUs excel at highly parallel scientific workloads; it is also the fact that their design is heavily skewed to support fast thread switching, and that a large number of high-bandwidth, low-latency pins are dedicated to memory traffic.
In other words, while some threads might be busy executing a memory read of some sort, another group can simultaneously execute floating point operations. A consequence of this is that GPUs somewhat encourage "oversubscription" of threads, again in marked contrast with a CPU, where the ideal speed-up is limited by the number of cores. The OpenCL memory model describes several types of memory: global (which on a graphics card corresponds to video memory), local (a fast-access cache on each compute unit), constant (technically part of video memory as well, but read-only) and private (memory bound to each work-item/thread).

³ This exact card was used for development of the domain walls code, which will be shown in a later section.
⁴ 4 SIMD units × 10 wavefronts × 64 threads × 28 compute units.

3.1.2.3 Parallel Processing with Message-Passing

Typical supercomputers do not consist of a single very large CPU with a ridiculous (10⁶, for instance) number of cores. In fact, the cost-effective solution is to combine many networked nodes, each with their own processors, whose processes can communicate through Message-Passing. Each autonomous processor operates on data resident in its own memory space. Keeping the earlier classification in mind, these machines are an example of a hybrid distributed and shared memory architecture, with multiple nodes (each with their own sets of processing elements, such as CPUs) connected via network interconnects. Returning again to the whiteboard analogy: two people are working on the same problem but in different offices; they each have their own copy of the data, with no implicit sharing. But what if one person needs some data that only the other individual has? The solution is explicit communication: person A calls person B by telephone to send whatever data is required, and B accepts the data.
We also note a curious detail: we have assumed that each process sits on a different node, but this need not be the case. It is always possible to have several processes, each mapped to a core of the same node, with each process having its own memory space and communication occurring within the node. From the perspective of the analogy this sounds inefficient, and perhaps socially awkward (two office mates who each keep half of the whiteboard to themselves and share insights by video call even though they sit side by side), but it is an alternative to thread parallelism. Eliminating the in-node communication costs can be beneficial to performance, but requires application developers to combine two programming paradigms (which is a difficult task). Alternatively, some implementations of MPI (such as MPICH or OpenMPI) do create a memory space which is shared between processes (Figs. 3.2 and 3.3). To describe such architectures in more detail, consider a real-life example: the Piz Daint Cray XC50 supercomputer, housed at the Swiss National Supercomputing Centre in Lugano, Switzerland. It consists of several nodes, which can be grouped according to their characteristics: hybrid CPU/GPU nodes, each with one 12-core Intel Haswell Xeon (E5-2690v3) and an Nvidia Tesla P100 accelerator; multicore nodes with two 18-core Intel Xeons (E5-2695v4); and login nodes (not to be used for compute tasks) with 10-core Intel Haswell Xeons. Four compute nodes are stacked in one compute blade, and 16 of these form a chassis. Each cabinet can then be populated by three chassis. Everything is connected in a dragonfly topology (all-to-all) by means of Cray's proprietary high-speed network interconnect, Aries.
The full machine contains 5704 hybrid nodes and 1431 multicore nodes, for a grand total of 387,872 cores and a theoretical peak performance of 27,154.3 TFlop/s (ranking 12th in the Top500 supercomputer list as of November 2020). The main reason we succinctly describe the CPU/GPU numbers is that this particular supercomputer will be used later in the thesis (Fig. 3.4).

Fig. 3.2 Schematic summaries of the structure of 3rd-generation Graphics Core Next cards (Fig. 3.2a) and Pascal GP100 cards (Fig. 3.2b), showcasing the number of Compute Units/Streaming Multiprocessors. Taken from [6] and [3], respectively

Fig. 3.3 The office analogy used to explain two processes in distinct nodes (Fig. 3.3a) and in the same node (Fig. 3.3b). Taken from [4]

Fig. 3.4 The Cray XC50 compute blade (with four compute nodes), the building block of the Piz Daint supercomputer, in Fig. 3.4a. The dragonfly topology (all-to-all) used at different hierarchical ranks of Daint's network. Taken from [13]

Given that we have described the typical supercomputer architecture and the need for Message-Passing, only one question remains to be answered: which programming paradigm must we use here? The gold standard for Message-Passing is the Message-Passing Interface (MPI), a paradigm which allows different types of communication between processes (collective or point-to-point), where synchronization is handled by the message passing itself (thus avoiding the data races present in the threaded approach).
To illustrate synchronization in message-passing, let us return to the two-offices analogy: person A sends a letter to person B, and person A only knows when the letter was sent, not when it was received (this is an example of an asynchronous send). Person B waits for some precious data to arrive in his mailbox, and the receive is considered completed as soon as the message hits his inbox (this is a synchronous receive). In this case, no data in A can be corrupted by B and vice-versa; there is in fact no way for A and B to access each other's data without explicitly sending/receiving a message. MPI also distinguishes another pair of communication modes: non-blocking and blocking (each of which can be synchronous or asynchronous). An operation is non-blocking if control returns to the program and work is allowed to be done while the communication has yet to complete; if no work is allowed in the meantime, the operation is blocking. To put it simply, synchronicity is related to the completion of an operation, while blocking behavior determines when control is returned to the program in question. Having briefly described the architectures that characterize parallel machines and the programming paradigms used to exploit them, we now move on to describing efforts to parallelize defect simulations on GPU accelerators.

3.2 Global Domain Walls

In this chapter we will begin by describing the first code that was ported to GPU accelerators. It was the cosmic domain walls code of [11, 31, 32], capable not only of evolving the fields such that a network of domain walls eventually appears and evolves, but also of extracting "useful" quantities about the network. Let us define what is meant by "useful" more explicitly. As mentioned in the previous chapter, some cosmic defects are capable of undergoing scaling evolution, where a characteristic scale (the defect separation) grows linearly with time and the network on average has a constant velocity.
Depending on how this scale is related to the energy density, defects can be expected to overclose the Universe (for example, domain walls; however, see the next chapter for a way to avoid this fate) or to neither disappear nor overclose it (as in the case of cosmic strings). This behavior is crucial for both analytical and numerical studies of defects, and with good reason. From the analytical point of view, the way the network loses energy (and therefore sustains scaling) has direct implications for analytical models of string evolution (which in turn have implications for the observational consequences of the network itself). From the numerical side, given that most simulations cannot last long enough to cover all of cosmological evolution, scaling is the only way one can extrapolate a set of small, short-lived simulations to the required cosmological scales, by taking the appropriate observables and, for instance, using them to calibrate analytical models (see the next chapter). Scaling is also a way to validate defect simulations, which will be required in this chapter. For domain walls it can be shown, both from analytic arguments [30, 51] and via simulations [31-33], that wall networks in a universe whose scale factor grows as a power law, a ∝ t^m ∝ η^{m/(1−m)}, where m is the expansion rate, will reach this attractor linear scaling solution after a sufficient amount of conformal time.⁵ Radiation and matter are examples of epochs where scaling is reached in simulations, corresponding to m = 1/2 and m = 2/3, respectively. In order to numerically characterize whether a walls simulation is in scaling, we measure and output two quantities of interest, the energy density ρ and the mean velocity squared v², and check if they obey the following relations,

$$\rho \propto \eta^{\mu}, \qquad \gamma v \propto \eta^{\nu}, \tag{3.4}$$

where μ = −1 and ν = 0.

⁵ I.e., a sufficiently large dynamical range.
We then define a threshold for how much these exponents may differ from their expected values and use it to infer whether the network is scaling. Note that for simulations with a smaller dynamic range this asymptotic regime may not be reached, as there is not enough time for the simulation to transition from the initial conditions to the expected scaling behavior. This translates into a dependence of the exponents μ and ν on the box size [19, 31]. At this point, we should also define how the two diagnostic quantities are computed in walls simulations. The first is the energy density, or equivalently the comoving wall area per unit volume. It is computed using the following robust method [47],

$$\rho = \frac{A}{V} = \frac{1}{V} \sum_{\text{links}} \delta_{\pm} \frac{|\nabla\phi|}{|\phi_{,x}| + |\phi_{,y}| + |\phi_{,z}|}, \tag{3.5}$$

where δ± is unity every time the field changes sign between two neighboring grid points (such a link indicates the possible presence of a wall) and zero otherwise. The other useful quantity to be computed is the root-mean-squared velocity v of the network and the corresponding Lorentz factor. A possible method to compute the velocity was demonstrated in [31]; it uses the ratio between the kinetic and potential energies (E_k and V(φ)),

$$(\gamma v)^2 = \frac{1}{2N} \sum_{\text{walls}} \frac{E_k}{V(\phi)}, \tag{3.6}$$

where γ = 1/√(1 − v²) is the Lorentz factor and the sum is over the number N of grid points containing walls. The criterion used to identify walls can be changed, but the standard version of this code identifies a point as part of a wall if the absolute value of the scalar field φ does not exceed a certain threshold. For the standard values used throughout this chapter, this threshold corresponds to 0.5.
3.2.1 Single Accelerator

3.2.1.1 Validation

This code uses the Open Computing Language (OpenCL) 1.2 framework by the Khronos Consortium [36], and was developed on a machine equipped with the following: a Radeon R9 M395 graphics card (28 compute units clocked at 834 MHz, and 2048 MiB of video memory clocked at 1365 MHz), an Intel i5 6600K processor (3.3 GHz core clock, boosting up to 3.9 GHz) and 8192 MiB of system memory (clocked at 1867 MHz). The non-vectorized sequential version of the code runs on the aforementioned processor, mapped to core 0, and our implementation runs on the graphics card. Before we compare the performance benefits of this GPU-based implementation, we must not forget that it has to behave sufficiently close to the CPU version in order for it to be used for scientific purposes. In this section, we validate the new code in two ways: first from the expected scaling behavior, and then by direct comparison with the CPU version. For the first step in validating the code, we use sets of five single and double precision runs of 2048² and 128³ boxes to calculate the previously defined scaling exponents of Eq. 3.4. We keep the same set of five fixed initial conditions for both the CPU and GPU versions, and for both single and double precision.⁶ Our decision to compare how single and double precision fare is simply a matter of weighing performance considerations against getting the correct behavior out of the code.

Table 3.1 Scaling exponents μ and ν (with 1σ statistical errors) for single and double precision runs, calculated using the points beyond log(η) = 2.58, for both 2048² and 128³ simulations

2048² boxes:
  μ: single precision −0.9381 ± 0.0003; double precision −0.9381 ± 0.0003
  ν: single precision −0.0374 ± 0.0005; double precision −0.0374 ± 0.0005
128³ boxes:
  μ: single precision −0.956 ± 0.003; double precision −0.905 ± 0.002
  ν: single precision −0.034 ± 0.006; double precision −0.025 ± 0.004
Note that we compile our code in single precision with the flag -cl-fp32-correctly-rounded-divide-sqrt in order to ensure correct (i.e., IEEE 754 standard-compliant) rounding in division and square root operations in OpenCL. This flag is not necessary in double precision. We then take the simulation outputs and calculate the scaling exponents using a linear fit to the later part of the simulation, as we are only interested in the achieved asymptotic behavior. The computed exponents can be seen in Table 3.1, and are in agreement with previous simulations of boxes of these sizes for the CPU version [19, 31]. We can additionally impose criteria on the scaling exponents to ensure consistency with scaling. Using the criteria of [32], one can see that these exponents are consistent with the expected behavior. The uncertainties shown are statistical, arising from the average of each set of five runs. We note that additional systematic uncertainties in these quantities are discussed in [33], but for the purposes of our comparison we do not compute and include them. Figure 3.5 showcases the evolution of the density ρ and the root-mean-squared velocity γv, illustrating both the initial period of the evolution (where the simulation gradually "forgets" the initial conditions) and the later scaling period. As a final validation, we compare CPU and GPU evolution in both single and double precision, timestep by timestep, in terms of the computed quantities. The differences are negligible after the early timesteps, once the wall network starts scaling. In fact, at scaling timesteps, they are consistent with no difference between the two implementations (to machine precision). This can be seen in Fig. 3.6. At the early timesteps the differences are indeed large (especially in the single precision case, even reaching 10⁻²); however, these large differences have no impact at later timesteps.

⁶ The initial conditions are generated in single precision and used for both single and double precision cases.
Fig. 3.5 Evolution of the density (ρ, left panels) and the velocity (γv, right panels), for 2048² and 128³ box simulations (top and bottom panels, respectively), showcasing the expected scaling behavior

Fig. 3.6 Relative error between the sequential and parallel code implementations, with 2048² boxes (top panels) and 128³ (bottom panels), for both single (blue) and double precision (orange), for the wall density (left panels) and the velocity (right panels)

3.2.1.2 Performance

Since we have just validated our implementation, it is now time to describe its performance, both by describing the optimizations used and by benchmarking the program. In OpenCL, applications are subdivided into data-parallel functions named kernels, which are compiled at run-time (so-called Just-In-Time, JIT, compilation). Every single step of the PRS algorithm corresponds to a kernel, and so do the velocity and density calculations, with a separate kernel for the sums. These kernels execute in order, one timestep at a time.
In order to illustrate this, consider the following simple example of a Laplacian kernel (which is basically a simplified version of the first step in the PRS algorithm):

```c
__kernel void Lapl(__global float *Lphi, __global float *P0,
                   const uint size_box)
{
    // Indices (periodic wrap via bitmask: size_box must be a power of two)
    const uint i   = get_global_id(0);
    const uint j   = get_global_id(1);
    const uint k   = get_global_id(2);
    const uint id  = i + j * size_box + k * size_box * size_box;
    const uint ip1 = (i + 1) & (size_box - 1);
    const uint im1 = (i - 1 + size_box) & (size_box - 1);
    const uint jp1 = (j + 1) & (size_box - 1);
    const uint jm1 = (j - 1 + size_box) & (size_box - 1);
    const uint kp1 = (k + 1) & (size_box - 1);
    const uint km1 = (k - 1 + size_box) & (size_box - 1);

    // 7-point Laplacian stencil
    Lphi[id] = -6.0f * P0[id]
             + P0[im1 + j * size_box + k * size_box * size_box]
             + P0[ip1 + j * size_box + k * size_box * size_box]
             + P0[i + jm1 * size_box + k * size_box * size_box]
             + P0[i + jp1 * size_box + k * size_box * size_box]
             + P0[i + j * size_box + km1 * size_box * size_box]
             + P0[i + j * size_box + kp1 * size_box * size_box];
}
```

The kernel is launched by selecting the number of threads and the typical subdivisions into groups of threads. The OpenCL compiler (and the underlying hardware) handles the distribution of threads to each Compute Unit automatically, following our recommendation of how to group threads. The fields are represented in memory using buffers (linear, contiguous), and the number of threads (work-items) spawned is, in our case, always equal to the number of points in a box. Not all kernels are optimized to use local memory. We will see in the Abelian-Higgs code (in CUDA) how to do this, where tiled halos will be implemented. Examples of kernels where tiled local memory could be used are the Laplacian kernel and the density kernel, the latter being one of the most time-consuming kernels (25.81% of runtime).
Two kernels where we already employ local memory are the velocity kernel (where we highlight the increased granularity of the atomic additions needed to count the number of walls, as seen in [40]) and the partial sums kernel. Speaking of the sum reduction kernel, we use the scalar version of the kernel in [48]. The reason for not using the vector one (which uses vector data types such as float4 instead of scalar types like float) is that the preferred vector width⁷ of the device in question is, for both single and double precision floating point types,

CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT : 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT : 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE : 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE : 1

so it is equivalent to use either kernel. The sum reduction kernel computes a partial sum for each local memory patch, and all partial sums are transferred back to the host side, summed and written to disk. The only place where the CPU computes something sequentially is in summing the partial sums which result from the calculation of the velocity and the density. As a small optimization, we use two queues running asynchronously with respect to each other, ensuring overlap between the execution of compute kernels and data transfer operations. In order to quantify whether there is a data transfer bottleneck, we first remark on how the overlap between compute and data transfer works: one has two different queues, one for data transfer and one for kernel execution, and using events one triggers the data transfer upon completion of the partial sums kernel. To allow for overlap, the enqueueing of data transfers needs to be non-blocking. After enqueueing some kernels, it is important to wait for the data transfers to complete (to ensure the sum of partial sums is not summing over garbage).
Since the waiting time also includes waiting for compute kernels to finish (enqueueing kernels is likewise a non-blocking operation), we estimate the time taken by data transfer as roughly the difference between the waiting time and the total kernel execution time. Comparing this to the runtime reveals that data transfer is only a bottleneck in low resolution boxes. From the roofline model (see Fig. 3.7), this implementation seems to have overall low arithmetic intensity, and to be mostly compute bound (when taking local memory bandwidth into account; see the roofline model in Fig. 3.7). With AMD's CodeXL, we report that all kernels have an occupancy of 70%, and the main bottleneck on the number of waves per SIMD unit seems to be the number of scalar registers (96 are used, which corresponds to a score of 8/10; below 81 would be ideal). The tool also shows that the implementation would highly benefit from more local memory usage and more vector register usage (4-23 vector registers are used, depending on the kernel). We must now discuss one final detail: our code is compatible with both double and single precision (as seen in the validation section). It should be noted that consumer-facing graphics cards usually have much lower peak double precision operations per second, and as such there is a severe speed penalty in utilizing double precision (for AMD cards based on the Graphics Core Next architecture this varies between 1/2 and 1/16 of the peak single precision operations per second [1, 2]). This expectation is confirmed by our analysis, summarized in Fig. 3.7. We can also highlight how the relative speed-up grows with box size until it reaches a certain plateau. This is relatively easy to justify, also from the roofline: GPUs require a large number of threads to hide the latency of operations, and with too small a box size there are too few threads to do so. As the size increases we start having enough threads; eventually, however, we hit another bottleneck (described above in the roofline discussion).

⁷ The OpenCL compiler automatically packs the preferred number of work-items (threads) into Single-Instruction-Multiple-Data lanes and thereby takes advantage of the native vector width. We mentioned previously that GPUs tend to pack instructions into vectors; originally, for AMD at least, it was preferable to pack everything into float4 vectors. The native width is the number of elements a vector Arithmetic Logic Unit can process at once.

Fig. 3.7 On the top left: an estimate of the time wasted in data transfer, or how good the overlap between compute and data transfer is, for different box sizes. On the bottom left panel, a roofline model for the 2D implementation. On the right-hand side, the relative speed-up of the parallel version when compared to the sequential one, for both single (blue) and double (orange) precision, for 2D (top) and 3D (bottom) simulations

3.3 Abelian-Higgs Strings

We can now describe the Abelian-Higgs single-GPU implementation. We begin by noting that there is a practical problem with the discretization presented earlier in the first chapter, particularly when evolving the simulations at relatively large expansion rates (m ≥ 0.9 at single precision): the divisions and multiplications by a² factors can, at early timesteps, go beyond the scope of one's precision and thus result in field variables being evaluated to NaN. The first thing we do is to modify the equations as follows,

$$(1+\delta)\,\Pi^{x,\eta+\frac{1}{2}} = (1-\delta)\,\Pi^{x,\eta-\frac{1}{2}} + \Delta\eta\left[D_j^- D_j^+ \phi^{x,\eta} - \frac{\lambda_0}{2}\,a_\eta^{2\beta}\left(|\phi^{x,\eta}|^2 - \sigma^2\right)\phi^{x,\eta}\right] \tag{3.7}$$

$$(1+\omega)\,E_i^{x,\eta+\frac{1}{2}} = (1-\omega)\,E_i^{x,\eta-\frac{1}{2}} + \Delta\eta\left[-\partial_j^- F_{ij} + 2e_0^2\,a_\eta^{2\beta}\,\mathrm{Im}\!\left[\phi^* D_i^+ \phi\right]^{x,\eta}\right] \tag{3.8}$$

where

$$\omega = \delta\,(1-\beta), \tag{3.9}$$

$$\delta = \frac{1}{2}\,\alpha\,\frac{d\ln a}{d\ln\eta}\,\frac{\Delta\eta}{\eta} = \frac{\alpha}{2}\,\frac{m}{1-m}\,\frac{\Delta\eta}{\eta}. \tag{3.10}$$

As in the previous domain walls case, δ is responsible for the Hubble damping of the scalar field.
The difference is that one also introduces an unphysical term responsible for damping the gauge field, which is multiplied by ω = (1 − β)δ. Note that when β = 1 this gauge damping vanishes (as expected for the physical equations of motion). This is a trick similar to the one employed in the discretization for walls of [44], since the scheme is now Crank-Nicolson at first order with respect to the time terms. Note that our previous problem with precision is eliminated by selecting constant comoving string thickness, in other words β = 0 (such that a^{2β} is replaced by 1), with δ and ω computed directly from the expansion rate. We set α = 2.0 as this is the choice for the physical equations of motion in the continuum limit. Other choices of α could possibly be explored later. For the purpose of validating the simulation (and later calibrating semi-analytical models), there are two essential diagnostics to be extracted from the simulations: first a correlation length ξ and second a root mean squared velocity v². Before describing how to compute these outputs in the simulations, we must begin by defining some relevant quantities. For starters, the Lagrangian density

$$\mathcal{L}_x = \frac{1}{2e^2 a^2}\sum_i E_{i,x}^2 - \frac{1}{4e^2 a^2}\sum_{ij} F_{ij,x}^2 + |\Pi_x|^2 - \sum_i |D_i^+ \phi_x|^2 - a^2 V(\phi_x) = \mathcal{E} - \mathcal{B} + \mathcal{P} - \mathcal{D} - \mathcal{V}, \tag{3.11}$$

where for convenience in the last equality we have also introduced a simplified notation for each of its components. From here we can also define an energy density and a pressure,

$$\rho_x = \mathcal{E} + \mathcal{B} + \mathcal{P} + \mathcal{D} + \mathcal{V}, \tag{3.12}$$

$$p_x = \frac{1}{3}\left(\mathcal{E} + \mathcal{B}\right) + \mathcal{P} - \frac{1}{3}\,\mathcal{D} - \mathcal{V}. \tag{3.13}$$

From here we can already define two possible estimators for the correlation length ξ. Since ξ = √(V/ℓ) (with V and ℓ respectively being the box volume and the total length of string it contains), we need only find the total length of string in the box.
The first estimator makes use of the fact that the Lagrangian density should vanish away from the string, while being negatively valued at the string itself [8]; this leads to the definition

$$\xi_L = \sqrt{\frac{-\mu V}{\sum_x \mathcal{L}_x}}, \tag{3.14}$$

which we will from now on refer to as the Lagrangian-based correlation length estimator. The second option requires computing a gauge-invariant winding, as defined in [29], at each plaquette (lattice cell face),

$$W_{ij} = \frac{1}{2\pi}\left(Y_{i,x} + Y_{j,x+\hat{i}} - Y_{i,x+\hat{j}} - Y_{j,x}\right), \tag{3.15}$$

where Y_i is given by

$$Y_i = \left[(\phi_x)_{\mathrm{arg}} - (\phi_{x+\hat{i}})_{\mathrm{arg}} + A_{i,x}\right]_\pi - A_{i,x}. \tag{3.16}$$

The presence of a string segment of length Δx piercing a plaquette is indicated by W_ij ≠ 0. Obtaining the total string length is then a trivial matter of adding up the number of segments throughout the lattice,

$$\xi_W = \sqrt{\frac{V}{\sum_{ij,x} |W_{ij,x}|\,\Delta x}}, \tag{3.17}$$

which results in the winding-based correlation length estimator. Given that we assume straight segments connecting cells (which then form collections of strings), this length estimator suffers from the taxi-cab geometry of the strings. To correct for the resulting overestimation of the length, we must multiply it by a factor of π/6, as seen in [49]. For the v² estimators we will likewise use two possible choices. The first one comes from [26, 27] and is based on the fact that, in terms of the conjugate scalar field momentum Π, the field configuration of a moving straight string can be given by Lorentz boosts of the static straight string ansatz. A detailed derivation can be found in [27]. For our purposes it is sufficient to simply quote the estimator itself,

$$\langle v^2\rangle_\phi = \frac{2R}{1+R}, \tag{3.18}$$

where R is given by

$$R = \frac{\sum_x |\Pi_x|^2\,\mathcal{W}_x}{\sum_{x,i} |D_i^+ \phi_x|^2\,\mathcal{W}_x}, \tag{3.19}$$

and W is a weight function, meant merely to localize the estimator around strings. We will refer to this as the field-based velocity estimator.
The second possibility is to use the equation of state estimator of [27], in which the box averages of density and pressure (each appropriately weighted by a weight function W) yield

⟨v²⟩_ω = (1/2) ( 1 + 3 (Σ_x p_x W_x) / (Σ_x ρ_x W_x) );   (3.20)

we will refer to this as the equation of state based velocity estimator. As for the choices of weight functions, we will explore two possibilities: one from the literature, in which the Lagrangian has been used (see [20, 26]), and a second choice which corresponds to the potential of the scalar field, V(φ).

Development, benchmarking and validation were all conducted on an NVIDIA Quadro P5000, with 2560 CUDA cores, a core clock of 1607 MHz and 16384 MiB of memory clocked at 1126 MHz, graciously donated by NVIDIA Corporation. Before we study the performance properties of our specific implementation, we have to validate it and cross-check it by comparison with literature results.

3.3.1 Single Accelerator

3.3.1.1 Validation

The very first check of this code is to verify that Gauss's law is obeyed to machine precision (with either the modified or the old discretization) at a lattice site. It matters not which specific lattice site, so we chose one exactly at the middle of the lattice. Both evolution schemes preserve Gauss's law to machine precision, and the new scheme correctly reproduces the dynamics of the network. These two characteristics can be seen for a 256³ box simulation (same initial conditions in both the old and new scheme) in Fig. 3.8, where the top panel shows Gauss's law violations at single precision, and the bottom panel the behavior of the winding-based correlation length estimator ξ_W. We add that, at most, the relative difference between ξ_W in the old and new discretizations is 0.02%. Additionally, inspecting iso-surfaces of the scalar field provides visual confirmation that a network of strings is formed and evolves as expected; some examples can be seen in Fig. 3.9.
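The equation of state estimator introduced above (Eq. 3.20) is likewise just two weighted box averages; a minimal host-side sketch (plain C++, per-site arrays assumed precomputed):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Equation-of-state velocity estimator, Eq. 3.20: the W-weighted box
// averages of pressure and density give an effective w = <p>/<rho>, and
// <v^2> = (1 + 3w)/2.
double v2_eos(const std::vector<double>& rho,  // per-site density
              const std::vector<double>& p,    // per-site pressure
              const std::vector<double>& W) {  // per-site weight
    double num = 0.0, den = 0.0;
    for (size_t x = 0; x < rho.size(); ++x) {
        num += p[x] * W[x];
        den += rho[x] * W[x];
    }
    return 0.5 * (1.0 + 3.0 * num / den);
}
```

Note that a pressureless (w = 0) configuration gives ⟨v²⟩ = 1/2, the usual relativistic random-velocity limit.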
For the domain walls GPU code (previous section) we had a serial version of the simulation which had been previously tested and validated [32, 33, 44], so we could directly compare outputs. In the present strings case both the serial and parallel versions were completely new to us, so it is perhaps more useful to take an alternative approach. As such, the simulations were validated by evaluating the asymptotic scaling values and comparing them with the results in the literature (which come from Lattice Abelian-Higgs, LAH). We have performed simulations in the two standard cosmological epochs, radiation and matter, wherein the scale factor evolves as a(η) ∝ η and a(η) ∝ η², respectively.

Fig. 3.8 The top panel shows the Gauss's law violation operator G_x at lattice site i, j, k = 0, 0, 0 at single precision for a box of size 256³, while the bottom panel shows a winding-based correlation length estimator ξ_W for two simulations using the same initial conditions, with either the new or the old discretization, described in the text. For this comparison we use the same parameters as in the rest of the paper: Δx = 0.5, Δη = 0.1, λ_0 = 2, e_0 = 1 and σ = 1

Qualitatively, scaling is clearly seen, as expected, when looking at the plots of Fig. 3.10, where the evolution of the Lagrangian-based mean string separation ξ_L, the winding-based mean string separation ξ_W and the mean velocity squared ⟨v²⟩ for 512³ runs are shown in the two aforementioned epochs. Quantitatively, a comparison of asymptotic quantities, the average mean string separation slope and the average velocity squared, can be found in Table 3.2. These quantities are measured in the dynamic range in which the networks have reached scaling (we roughly use the last 10% of the dynamic range). In each case we obtain a statistical error from the average of 5 different runs, each with different random initial conditions.
As shown in the aforementioned table, comparing our results for the mean string separation slope with the values of ξ̇_L in [8] (at the same 512³ lattice size) we find excellent agreement for both matter and radiation era simulations. Our other length estimator, ξ̇_W, is also in excellent agreement with the results of the first, but in mild disagreement (about 1.5 standard deviations) with the value found in [9] (larger lattice size, 1024³). The discrepancy is even larger when comparing with the larger lattice simulations of [20]. In [9] it is explained that this might be due to a period of early cooling being applied to the initial conditions in the works of [9, 20] and an extended dynamic range, which jointly lead to a slow drift in the dξ/dη value (changing ξ̇ from the 0.3 value to about 0.28 at 1024³ and then to about 0.24 at 4096³). We will later explore in depth how the degree of initial cooling can affect the evolution of the network (and how this is reflected in the calibration of a semi-analytical model), and how, even with no cooling, there is a slow drift of ξ/η with growing lattice size. The lack of cooling in the initial conditions is also evident in the oscillations present in the Lagrangian (field-based) mean string separation estimator ξ_L shown in Fig. 3.10. This signals the presence of some radiation, caused by the high gradients present in the initial conditions, which is not completely dissipated. Note that such radiation does not prevent scaling, neither in our case nor in the high-resolution domain walls simulations of [32, 33].

Fig. 3.9 Isosurfaces of the absolute value of the complex scalar field at the value 0.5, showing a network of Abelian-Higgs cosmic strings in the radiation and matter eras (left and right side panels respectively). All pictures are from simulations with box size 512³; the top panels correspond to timestep 60, while the bottom panels correspond to timestep 128
As for the velocities, the comparison is a bit more qualitative, since there are fewer measurements in the literature. The most recent work on field theory local string velocities comes from [27], which tabulates values obtained by extrapolating to infinitely thin strings, a process meant to enable comparison with Nambu-Goto velocities. We still present the extrapolated values in Table 3.2 (denoted ext.), but we also note that the more correct comparison is likely with the asymptotic values one can infer by visually reading the measured velocities from the top and bottom panels of Fig. 9 of [27]. These are denoted in our table as asy.

Fig. 3.10 The evolution of the mean string separation ξ_L (left panel) and the winding-based mean string separation ξ_W (right panel) for 512³ runs, in the radiation era (m = 1/2, blue lines without core growth and green lines with core growth) and in the matter era (m = 2/3, red lines without core growth). The values of the mean string separation slopes, ξ̇, inferred after the networks have reached scaling, are also added in the figure legends. These slopes are an average of the slopes of 5 different runs

When it comes to velocities we also observe late-time scaling behaviour, that is, constant velocity, as expected. This can be seen in Fig. 3.11. Note as well the large oscillations present at early times as the network relaxes from the choice of initial conditions. Considering the asymptotic values, our velocity estimates are in reasonable agreement, the only exception being the potential-weighted estimator in the radiation epoch (note, however, that the difference is not statistically significant). In the matter era the comparison is also not direct, as more dynamic range is required to evolve the true equations of motion (with or without a period of initial core growth).
As such, we directly compare our velocities in constant comoving width simulations to values obtained from the physical simulations of [27], for the matter epoch. This should not be a significant issue since, as mentioned in [27], PRS and physical simulations give similar velocities.

3.3.1.2 Performance

Given that we have just validated our implementation, it is now time to describe its performance: not only by describing how to port a cosmic string simulation to GPU accelerators, but also by benchmarking it and noting where it can be improved. This program utilizes an application programming interface named Compute Unified Device Architecture (CUDA, by NVIDIA Corporation), similar to OpenCL but with some key differences: it can only target NVIDIA GPU accelerators, and it supports C++14 even at kernel level; while proprietary, it is well supported, both in documentation and in performance analysis tools.

Table 3.2 Numerical results for asymptotic scaling quantities ξ̇ (calculated using the Lagrangian or the winding estimator) and the three velocity estimators, for s = 0 and s = 1 (where applicable), from our simulations and from the literature.

s  Epoch      ξ̇_L            ξ̇_W            ⟨v²⟩_V       ⟨v²⟩_L       ⟨v²⟩_ω               References
1  Radiation  0.33 ± 0.02    –              –            –            –                    [8]
1  Radiation  –              –              –            –            0.37 ± 0.01 (ext.)   [27]@4096³
1  Radiation  –              –              –            –            0.30 ± 0.01 (asy.)   [27]@4096³
1  Radiation  0.254 ± 0.005  0.265 ± 0.005  –            –            –                    [20]@4096³
1  Radiation  0.32 ± 0.01    0.32 ± 0.03    0.34 ± 0.01  0.31 ± 0.01  0.31 ± 0.01          This work
0  Radiation  0.31 ± 0.02    –              –            –            –                    [8]
0  Radiation  –              0.26 ± 0.02    –            –            –                    [9]@1024³
0  Radiation  0.234 ± 0.006  0.244 ± 0.005  –            –            –                    [20]@4096³
0  Radiation  0.30 ± 0.02    0.32 ± 0.03    0.34 ± 0.01  0.31 ± 0.01  0.32 ± 0.01          This work
1  Matter     –              –              –            –            0.31 ± 0.01 (ext.)   [27]@4096³
1  Matter     –              –              –            –            0.26 ± 0.01 (asy.)   [27]@4096³
1  Matter     0.261 ± 0.008  0.277 ± 0.008  –            –            –                    [20]@4096³
0  Matter     0.30 ± 0.01    –              –            –            –                    [8]
0  Matter     –              0.28 ± 0.01    –            –            –                    [9]@1024³
0  Matter     0.235 ± 0.008  0.247 ± 0.008  –            –            –                    [20]@4096³
0  Matter     0.29 ± 0.01    0.29 ± 0.02    0.26 ± 0.01  0.27 ± 0.01  0.25 ± 0.01          This work
All quantities were measured in simulations with box sizes of 512³, except where otherwise noted. The labels ext. and asy. denote values that were, respectively, extrapolated (rather than directly measured from the simulations) and inferred by visual inspection of Fig. 9 of [27]; see the main text for further discussion of these.

Fig. 3.11 The evolution of the mean square velocity, estimated in three different ways: by using the estimator of Eq. 3.18, weighted by the potential ⟨v²⟩_V (top left panel) or the Lagrangian ⟨v²⟩_L (top right panel), or by using the equation of state parameter ⟨v²⟩_ω (bottom panel, see Eq. 3.20). In all cases the results are from 512³ runs, in the radiation era (m = 1/2, blue lines without core growth and green lines with core growth) and in the matter era (m = 2/3, red lines without core growth). The asymptotic values of the velocities, inferred after the networks have reached scaling, are also depicted

As previously mentioned, one of the roles of these interfaces (OpenCL, CUDA) is to abstract away details of the underlying hardware. Much like in OpenCL, even if threads maintain the organization of being grouped into thread blocks, we do not need to (in fact we cannot) assign groups of threads to specific Streaming Multiprocessors; this is done "automagically" without any intervention from us. It then becomes a matter of how many threads are spawned and how large each thread block is. In our case, we spawn a number of threads equal to N², where N is the size of the side of an N³ cubic lattice. Note that this is in contrast to what was done in the previous section for global domain walls; the reason for this will become apparent later. CUDA, like OpenCL, is not directive based and instead splits applications into data-parallel functions named kernels.
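The launch configuration just described (N² threads covering an XY slab of the N³ lattice, each thread then cycling over z) can be sketched as follows; this is a plain C++ host-side sketch, and the (32, 4) block shape is used purely as an example:

```cpp
#include <cassert>

// Grid dimensions needed for N^2 threads organized into (bx, by) thread
// blocks; divisions round up so the whole XY slab is covered even when N
// is not a multiple of the block shape.
struct Dim2 { unsigned x, y; };

Dim2 gridFor(unsigned N, unsigned bx = 32, unsigned by = 4) {
    return { (N + bx - 1) / bx, (N + by - 1) / by };
}
```

For N = 256 this yields an 8 × 64 grid of (32, 4) blocks, i.e. exactly 256² threads.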
As was the case for domain walls, we will map kernels to the different pieces of the evolution equations and to the estimators of network-averaged quantities. For evolving the fields at every timestep there are three kernels: the first corresponds to Eq. 2.48, the second to Eq. 2.49 and the third to Eqs. 2.50–2.51. These will be denoted update, updateE and updateφA respectively. Then there are the kernels for computing useful quantities such as the mean string separations or the velocities. All of these kernels implement finite differences, and as such there is a real risk of becoming memory bound. In the present case we will go further than in the domain walls case and optimize memory loads even more. Doing so entails again the usual considerations of the memory hierarchy of a GPU, plus an optimization called "Z-cycling," which will be described below. As in OpenCL, the abstract memory model of CUDA describes several types of memory; the ones relevant here include global memory (which corresponds to video memory), shared memory (a fast on-chip memory available to groups of threads, known as thread blocks) and local memory (per-thread, even faster on-chip memory). We note here a small source of confusion: the equivalent of shared memory in CUDA is called local memory in OpenCL, and the equivalent of local memory in CUDA is called private memory in OpenCL (Fig. 3.12). We will now describe the "Z-cycling" algorithm common to all kernels (except the kernel to update φ and A) used for Abelian-Higgs cosmic strings. The importance of such an algorithm for optimal finite differences was studied in [34, 37, 41, 53]. The goal is to load the relevant field quantities from global memory, where the fields, declared as contiguous arrays of structures (float2 and float4), reside, into shared memory.
However, instead of merely loading all of the quantities of the entire box, one begins by loading only the data residing at zero height, k = 0. This 2D slab is decomposed into different chunks of shared memory (oriented along X and Y) denominated tiles. One could naively expect these tiles to have a size equal to the number of threads in a thread block (in the corresponding x and y directions); however, as seen from the equations of motion, every such tile needs data values from neighboring tiles, for example at i ± 1, j ± 1, k ± 1. Let us first address the values along x and y (i ± 1 and j ± 1). In order to contain the necessary data, and given that there is no communication between thread blocks, one must pad these XY-tiles by 2 and load the appropriate boundary field values into these padding regions (commonly known as ghost cells or halos) before any computation can take place. Now we are only missing the values at k ± 1. For the first 2D slab, at k = 0, we load the field values at the next and previous z-direction positions, respectively k = 1 and k = N − 1 (periodic boundary conditions), into local memory. After this we can perform the necessary computation for this specific slab (say, for instance, using the scalar field and gauge field to update the conjugate scalar field momentum). In order to proceed, one simply cycles upwards to k = 1, k = 2, ..., k = N − 1 and recycles values already loaded before, moving them from local to shared memory.

Fig. 3.12 Schematic representation of the stepA kernel (which updates the conjugate momentum Π): the tile in the middle represents a 2D shared memory tile where current values in the z-direction (site k) are loaded together with halos, and register values (blue pinheads) hold field values directly above and below (k + 1 and k − 1)
For example, in the case of the 2D slab at k = 1, one has to load new values from global memory into local memory at k = 2, but the values at k = 1 were previously loaded into local memory and can be moved to shared memory. The code listing below showcases this technique for computing the Laplacian (assuming gauge fields):

template<typename scalar, typename gauge>
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
LaplacianZCycle(float ct, scalar *P0, scalar *Lapl, gauge *A,
                int xstart, int xend, int ystart, int yend,
                int zstart, int zend)
{
    const uint i = blockIdx.x * blockDim.x + threadIdx.x;
    const uint j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i >= xend) return;
    if (j >= yend) return;
    if (i < xstart) return;
    if (j < ystart) return;

    __shared__ scalar P0_s[TILE_Y + 2][TILE_X + 2];
    __shared__ gauge A_s[TILE_Y + 2][TILE_X + 2];

    const uint si = threadIdx.x + 1;
    const uint sj = threadIdx.y + 1;

    // Hold the current, nadir and zenith values in registers
    scalar P0_top;
    scalar P0_bot = P0(i, j, zstart - 1);
    scalar P0_cur = P0(i, j, zstart);
    gauge A_bot = A(i, j, zstart - 1);

    for (int k = zstart; k < zend; k++) {
        P0_top = P0(i, j, k + 1);

        P0_s[sj][si] = P0_cur;
        A_s[sj][si] = A(i, j, k);

        // West/southmost halo for the X,Y tile
        if (threadIdx.x == 0) {
            P0_s[sj][si - 1] = P0(i - 1, j, k);
            A_s[sj][si - 1] = A(i - 1, j, k);
        }
        if (threadIdx.y == 0) {
            P0_s[sj - 1][si] = P0(i, j - 1, k);
            A_s[sj - 1][si] = A(i, j - 1, k);
        }
        // East/northmost halo for the X,Y tile
        if (threadIdx.x == blockDim.x - 1) {
            P0_s[sj][si + 1] = P0(i + 1, j, k);
            A_s[sj][si + 1] = A(i + 1, j, k);
        }
        if (threadIdx.y == blockDim.y - 1) {
            P0_s[sj + 1][si] = P0(i, j + 1, k);
            A_s[sj + 1][si] = A(i, j + 1, k);
        }
        __syncthreads();

        // Gauge-covariant 7-point Laplacian from the cached values
        Lapl(i, j, k) = -6.0f * P0_s[sj][si]
                      + P0_s[sj][si + 1] * Cexp(nI * A_s[sj][si].x)
                      + P0_s[sj + 1][si] * Cexp(nI * A_s[sj][si].y)
                      + P0_top * Cexp(nI * A_s[sj][si].z)
                      + P0_s[sj][si - 1] * Cexp(I * A_s[sj][si - 1].x)
                      + P0_s[sj - 1][si] * Cexp(I * A_s[sj - 1][si].y)
                      + P0_bot * Cexp(I * A_bot.z);

        // Rotate the register buffers before moving up in z
        P0_bot = P0_s[sj][si];
        A_bot = A_s[sj][si];
        P0_cur = P0_top;
        __syncthreads();
    }
}

This recycling has a much higher bandwidth than re-reading from global memory. Likewise, the previous shared memory tile at k = 0 can be loaded into the bottom local memory buffers. A schematic representation can be found in Fig. 3.12. The only slight variation on this technique that we use concerns the kernel which updates the conjugate momentum of the gauge field: there, too much local memory would be used, and we thus opted to use shared memory again for the field values both above and below the current 2D slab. Therein lies another advantage of first loading field values into shared and local memory and only then using these cached values in the computations themselves, where many times a specific component of a field is needed. This poses an interesting question: should we load specific components of the fields from global memory, or first cache the full values in shared/local memory and then read the necessary component? The two types of field variables (float2 and float4) are aligned vector types defined in CUDA. This means that reading a specific component of all field values from global memory would result in un-coalesced (non-sequential) reads, which have a heavy impact on performance. With the opposite approach, given that on NVIDIA GPUs every successive 128 bytes can be loaded by a warp (32 threads) in one single transaction (to the point that a vector load instruction actually exists), one loads entire float2s and float4s from global memory into the faster caches available and then accesses specific components as necessary.
This essentially moves the previous bottleneck to another type of memory, where bandwidth is naturally higher and where the requirement of coalesced reads need not apply. The kernel to update φ and A is the only straightforward kernel, since software pre-fetching cannot be implemented: we only need field values at positions i, j, k, for both reading and writing. Since this kernel performs coalesced reads and writes, we can use it as an additional baseline for comparison. Given that we have described the "Z-cycling" optimization and the thread block size to be used, it is now a matter of finding and characterizing the main bottlenecks. Fortunately, NVIDIA's Visual Profiler makes this job a little easier than AMD CodeXL (with which we counted assembly instructions and then created the roofline as needed), by simply pointing out the main bottleneck for each kernel. The evolution kernels are all limited by the global memory bandwidth when reading field values. As such, the relevant performance metric here is the global memory read bandwidth and how it compares to the peak theoretical bandwidth of the test bench GPU. Such data can be found in Table 3.3, for box size 256³ in the radiation era with constant comoving width. The lattice size is chosen to satisfy two requirements: a sufficiently large lattice such that the number of threads spawned is large enough to hide latency, and a small enough size that each kernel can be run quickly thousands of times. One can easily read from the table that we are sufficiently close to the peak read bandwidth of 288.5 GB/s.⁸ Note that we can also compare with the bandwidth of the updateφA kernel since, as previously mentioned, it purely reads and writes in a coalesced fashion to memory. Interestingly, the kernel updateE seems to hit a 2.6% larger bandwidth, while update is about 8.7% lower. There is also one point we have thus far not discussed in detail.
In order to ensure that the bottleneck is not the amount of shared/local memory used (too much would result in a less than ideal Streaming Multiprocessor occupancy), and to perform less granular global memory loads (in order to minimize the performance hit from loading the padding of each tile), one must strike a careful balance between too large and too small a thread block size. The only rule-of-thumb in this case is to keep the thread block size a multiple of 32, given that instructions are issued in 32-thread warps. Apart from that, it is a matter of trial-and-error and of using the NVIDIA Occupancy Calculator [52]. The best performance was obtained with a thread block size of (32, 4) at 256³ box size. Note that even though this results in an occupancy of around 50% for kernels which use "Z-cycling," it is large enough for latency hiding, such that the main bottleneck (as discussed in the previous paragraph) is global memory read bandwidth.

⁸ This peak bandwidth is the one reported by the Visual Profiler; we do not perform additional measurements with custom kernels.

Table 3.3 The effective global load and store bandwidth (in units of GB/s), the number of floating point operations per second (in teraflops) and the achieved occupancy, for a 256³ simulation in the radiation era and for constant comoving width

Kernel     GLS (GB/s)  TFLOPs  Occupancy (%)
update     245.92      1.59    48.0
updateE    271.60      0.80    47.8
updateφA   264.83      0.04    90.8
VelRVLag   212.88      1.68    48.3
VelRWLag   217.09      1.67    48.3
VelEoSLag  193.70      1.80    48.3
Winding    133.72      1.26    48.3

Now we move on to the kernels for mean string separation and velocity estimation.
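As a rough illustration of the shared-memory side of this trade-off, the following back-of-the-envelope sketch (plain C++) budgets one Z-cycling block, assuming a (32, 4) block with a one-site halo on each side, a float2 scalar tile (8 bytes per site) and a float4 gauge tile (16 bytes per site); the 48 KiB-per-SM figure used below is an assumption typical of Pascal-class GPUs such as the Quadro P5000, not a number quoted in the text:

```cpp
#include <cassert>

// Shared-memory bytes for one padded (tileX + 2) x (tileY + 2) tile holding
// bytesPerSite bytes at each site (scalar plus gauge tiles combined).
unsigned sharedBytesPerBlock(unsigned tileX, unsigned tileY, unsigned bytesPerSite) {
    return (tileX + 2) * (tileY + 2) * bytesPerSite;
}
```

For the assumed sizes this gives 4896 bytes per block, so shared memory alone would allow about ten resident blocks per SM; the ~50% occupancy quoted above is therefore set by other resources (registers, block slots) rather than by shared memory.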
The kernels that compute the Lagrangian, and therefore the mean string separation estimator derived from it, ξ_L, also compute a velocity based on one of the three following estimators: ⟨v²⟩_φ weighted with the potential, the same velocity estimator weighted with the Lagrangian, or the equation of state estimator ⟨v²⟩_ω weighted by the Lagrangian. These will be denoted throughout the manuscript as "VelRVLag," "VelRWLag" and "VelEoSLag," respectively. In addition, there is a kernel for computing the mean string separation from the winding at each lattice plaquette, denoted simply "Winding." All of these compute the necessary quantities for every thread and store the result in local memory, and subsequently a sum reduction is performed at thread block level by leveraging the CUDA Unbound library [39]. Since each block computes a partial sum (as in the global domain walls case), we must transfer the results back to the host/CPU side and sum them before writing to disk. According to the Visual Profiler, the three velocity kernels end up being both compute and memory bound, and on account of this also have similar performance characteristics, as seen in Table 3.3. The memory limitations are explained by the over-use of local memory (which ends up being necessary for the partial sums). This was mitigated to some extent by turning on the compiler flag -Xptxas -dlcm=ca, which caches register spills in the L1 cache. Improving the compute part is more challenging, however, as several of the compiler flags that either use hardware intrinsics or reduce precision can significantly alter the mean quantities computed, often changing the overall asymptotic values themselves or increasing uncertainties. Since computing these mean quantities would dominate the overall runtime (in proportions similar to what happens in the walls case), we apply another simple optimization, related to the fact that it is not necessary to compute them at every timestep.
As an example, and by default in the application, they are computed every n = 5 timesteps. This effectively reduces the time spent on diagnostic computation, as can be seen in Table 3.4. We also note that in a typical production run one will select only one of the velocity estimators and, optionally, the Winding estimator. The total run time then depends on the choice of diagnostic output and on the frequency of said output. We additionally remark that the time spent on Input/Output operations (which for now means transferring the partial sums, computing the total sum and writing said output) also depends on this optimization. While in the domain walls case we considered diagnostic outputs at every timestep, and then additionally added compute-data-transfer overlap, here the reduced frequency of outputs greatly reduces the need for such overlap. In addition, such an overlap would generically complicate our life when writing the multi-GPU generalization, where compute-communication overlaps will be needed. Given that we have described the application's performance in terms of bandwidth and runtime, we now need to compare to a baseline CPU implementation, in order to infer if there really is a performance advantage to the use of GPUs.

Table 3.4 Total elapsed times, in seconds, of the three evolution kernels plus the estimator kernels (which calculate averaged network quantities) for one 256³ and one 512³ run. The total time is computed by taking the average runtime and multiplying by the number of times a kernel is executed in a single run, i.e. average time × n_calls. The first three kernels are executed every timestep, while the others are executed only every five timesteps

Kernel     256³  512³
stepA      2.29  36.86
stepB      3.10  50.02
stepC      2.89  46.30
VelRVLag   0.57  8.38
VelRWLag   0.56  8.45
VelEoSLag  0.64  8.88
Winding    0.87  11.92
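The effect of the output cadence on total runtime can be captured with a trivial cost model (plain C++ sketch; the every-step diagnostic cost fed into it below is a hypothetical linear extrapolation of the Table 3.4 figures, which were measured at n = 5):

```cpp
#include <cassert>
#include <cmath>

// Total run time as evolution cost (every step) plus diagnostic cost,
// where diagEveryStep is the hypothetical cost of running the diagnostics
// at every timestep; running them every n-th step divides that cost by n.
double totalTime(double evolvePerRun, double diagEveryStep, int n) {
    return evolvePerRun + diagEveryStep / n;
}
```

With the 512³ stepA+B+C total of 133.18 s and the Winding diagnostic extrapolated to every-step cost, this model reproduces the measured n = 5 behaviour and shows the diagnostic share shrinking as 1/n.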
First, we can make a ballpark estimate of the speed-up based merely on the typical memory bandwidths of current memory available to high-performance computing multicore CPUs and of the video memory on GPUs. Based on the figures presented in [38], one can stipulate a speed-up of an order of magnitude for bandwidth- and compute-bound applications. This ballpark estimate also rests on the assumption that both applications are parallelized and optimized (and will reach close to peak bandwidth/throughput). In the case of walls, considered in the previous section, a speed-up of about two orders of magnitude was obtained when comparing to a single-threaded implementation. For Abelian-Higgs cosmic strings we do not have a CPU code to compare to, but it is reasonable to estimate a speed-up of about an order of magnitude. We can, however, compare to the simulations of [8, 27], also known as Lattice Abelian-Higgs (LAH), with benchmarks kindly provided via private communication [25]. The benchmarks provided are expressed as time to full evolution multiplied by the number of processors per number of sites (expressed in units of gpu-sec/site vs core-sec/site), for the evolution update and for generating and writing windings.

Table 3.5 The performance of each of our kernels, given in gpu-sec/site. Note that in order to compare with the LAH performance, provided by [25], we present the performance of all three update kernels together (stepA+B+C, computed by summing the time for each update kernel from Table 3.4 and then dividing by the number of sites).
These numbers can be obtained from the times at 512³ in Table 3.4 by dividing by the number of calls of each kernel in a run (1280 for steps A, B and C, 256 for the estimators) and by dividing by the size of the lattice, 512³

Kernel     Performance GPU-AH       Performance LAH
           (gpu-sec/site at 512³)   (core-sec/site at 4096³)
stepA+B+C  7.75 · 10⁻¹⁰             8 · 10⁻⁷
VelRVLag   2.43 · 10⁻¹⁰             Not available
VelRWLag   2.46 · 10⁻¹⁰             Not available
VelEoSLag  2.58 · 10⁻¹⁰             Not available
Winding    3.47 · 10⁻¹⁰             1.3 · 10⁻⁶

LAH timings are for a single run on the Monte Rosa supercomputer at 4096³ lattice size with 32768 cores.

Before we perform such a comparison we must note a caveat: [25] remarked that their simulation is not very optimised for the target architecture (multi-core CPU). Furthermore, while we were provided the timings for generating and writing windings to disk (where it was mentioned that most of the time is spent writing to disk), in our case we output only the average quantity and not the full winding output. In a later section we will revisit this benchmark and compare with an in-situ visualization approach (where the winding output is indeed written to disk). As such, it is not entirely fair to compare the two winding figures directly. For the evolution updates (analogously update + updateE + updateφA) over 10903 timesteps at 4096³ lattice size, and for the winding computation and writing (1300 timesteps), LAH has the respective performance figures 8 · 10⁻⁷ core-sec/site and 1.3 · 10⁻⁶ core-sec/site. A 512³ run from start to finish (1280 timesteps) reveals a performance of about 7.75 · 10⁻¹⁰ gpu-sec/site for updating all fields. We thus spend three orders of magnitude less time updating fields at any given lattice site. We present the figures for all other kernels in the rest of Table 3.5. It is often said that GPU cores are drastically slower (or, more correctly, have lower throughput) than CPU cores. It is thus curious that the table seems to imply that they are only about 2.5 times slower.
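The arithmetic behind these figures can be sketched directly from the 512³ totals of Table 3.4 (plain C++; the per-core ratio at the end uses the 2560 CUDA cores of the test bench GPU):

```cpp
#include <cassert>
#include <cmath>

// gpu-sec/site: total elapsed time of a kernel over a run, divided by the
// number of calls in that run and by the number of lattice sites.
double perSite(double totalSeconds, double calls, double latticeSide) {
    return totalSeconds / (calls * latticeSide * latticeSide * latticeSide);
}

// Speed-up of one GPU over one CPU core, normalized per CUDA core.
double coreRatio(double lahPerSite, double gpuPerSite, double cudaCores) {
    return (lahPerSite / gpuPerSite) / cudaCores;
}
```

For the evolution kernels, perSite(36.86 + 50.02 + 46.30, 1280, 512) recovers the 7.75 · 10⁻¹⁰ gpu-sec/site of Table 3.5, a factor of roughly a thousand over the LAH 8 · 10⁻⁷ core-sec/site; dividing that factor by the 2560 CUDA cores is what yields the "about 2.5 times slower per core" figure quoted above.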
Speculatively, this might be due to the different levels of optimization of the two codes.

3.3.2 Multiple Accelerators

Having described the single-GPU implementation, we now move on to the next step of adapting our code to modern supercomputing facilities: introducing Message Passing to enable the exploitation of a large number of GPU accelerators to evolve fields on a lattice. We will again follow the same methodology of validating and describing the performance of this multi-GPU implementation, as done previously for global domain walls and for the single-GPU Abelian-Higgs case.

3.3.2.1 Validation

We thus begin by verifying that it behaves exactly as expected, according to the literature. As previously done for the single-GPU case and for the global domain walls case, this involves comparing the measured physical mean network properties with those found in the literature, for both the rate of change of the mean string separation and the asymptotic mean velocity squared. We perform such a comparison at different box sizes, including the single-GPU result at 512³ lattices, and 1024³, 2048³ and 4096³ for an average of five runs in the matter and radiation eras. Note that all simulation parameters (lattice spacing, timestep size and coupling constants) are exactly as they were in the previous section, with constant comoving width (PRS) enabled. The results of this comparison can be found in Table 3.6, with the corresponding figures of our runs in Fig. 3.13. Overall, our code seems to be in agreement with literature results, within one-sigma uncertainty. In more detail, we can describe each estimator and how it compares. For the mean string separation, there seems to be a dependency on the lattice size, and therefore it is not correct to compare values obtained at different lattice sizes. Given that our simulations are in agreement with reference literature values at each lattice size, this slow drift is also present in our work.
Exploring whether this slow drift is due to the lattice size (and therefore to the resolution of the small scales that affect the energy loss mechanisms of the network) is something we will pursue in the next chapter. We also note that the two different mean string separation estimators (based on the winding and on the Lagrangian density) lead to fully consistent values for the slopes. As for the mean velocity squared, our previous work [16, 18] using the estimators of [27] had already established qualitative agreement with the values in the literature, up to and including 4096³ simulations, and here this agreement continues; no statistically significant drift of the scaling value with box size is observed for the velocities. While literature values for the velocities are available only in [27], all our values are consistent within one-sigma uncertainties, both with that reference and with the values reported in the previous section. On the other hand, and in agreement both with [27] and with our earlier 512³ study, our present analysis confirms that the velocity estimator based on the gradient of φ leads to values that are consistently lower than those of the equation-of-state estimator, by about ten per cent at all box sizes (Fig. 3.14).
Table 3.6 The asymptotic rate of change of the mean string separation, ξ̇, and the mean velocity squared, v², for the estimators defined in the text, in the radiation and matter eras, for our simulations with box sizes of 4096³, 2048³ and 1024³, using 4096, 512 and 64 GPUs respectively. The error bars are the statistical uncertainties from averages of 20 runs with different initial conditions. For comparison we show the results reported in [16] from the single-GPU code (for averages of 12 512³ simulations) as well as results from simulations with CPU-based codes. The ranges of timesteps used for each fit to the GPU simulations are [517, 1023.5], [300.5, 511.5], [100.5, 255.5] and [80, 128] for the 4096³, 2048³, 1024³ and 512³ simulations respectively

Size  | m   | ξ̇_L           | ξ̇_W           | v²_ω          | v²_φ          | Reference
1024³ | 1/2 | 0.280 ± 0.023 | 0.282 ± 0.026 | 0.306 ± 0.003 | 0.272 ± 0.002 | This section
2048³ | 1/2 | 0.268 ± 0.011 | 0.267 ± 0.010 | 0.312 ± 0.001 | 0.283 ± 0.001 | This section
4096³ | 1/2 | 0.253 ± 0.007 | 0.251 ± 0.006 | 0.308 ± 0.002 | 0.282 ± 0.001 | This section
512³  | 1/2 | 0.30 ± 0.02   | 0.32 ± 0.03   | 0.32 ± 0.01   | 0.31 ± 0.01   | Previous section
512³  | 1/2 | 0.31 ± 0.02   | –             | –             | –             | [8]
1024³ | 1/2 | –             | 0.26 ± 0.02   | –             | –             | [9]
4096³ | 1/2 | 0.234 ± 0.006 | 0.244 ± 0.005 | –             | –             | [20]
1024³ | 2/3 | 0.279 ± 0.016 | 0.285 ± 0.017 | 0.255 ± 0.003 | 0.228 ± 0.004 | This section
2048³ | 2/3 | 0.256 ± 0.006 | 0.257 ± 0.005 | 0.264 ± 0.001 | 0.240 ± 0.001 | This section
4096³ | 2/3 | 0.252 ± 0.010 | 0.250 ± 0.009 | 0.265 ± 0.001 | 0.243 ± 0.001 | This section
512³  | 2/3 | 0.29 ± 0.01   | 0.29 ± 0.02   | 0.27 ± 0.01   | 0.25 ± 0.01   | Previous section
512³  | 2/3 | 0.30 ± 0.01   | –             | –             | –             | [8]
1024³ | 2/3 | –             | 0.28 ± 0.01   | –             | –             | [9]
4096³ | 2/3 | 0.235 ± 0.008 | 0.247 ± 0.008 | –             | –             | [20]

Fig. 3.13 The evolution of the four relevant average network estimators, defined in Eqs. 3.14, 3.17, 3.18 and 3.20, for the average of 20 runs in the radiation-dominated epoch (m = 1/2), with lattice sizes of 4096³, 2048³ and 1024³, using 4096, 512 and 64 GPUs respectively.
We assume constant comoving width throughout.

3.3.2.2 Performance

Now that the multi-GPU implementation has been validated, we turn our attention to its performance. As previously mentioned, the standard way to subdivide a domain across multiple processing elements (in our case, GPU accelerators) and to communicate boundary terms is to use the Message Passing Interface (MPI). This is necessary, for example, to compute the conjugate momenta of the scalar and gauge fields (the latter denoted E), and for the computation of the network average quantities. Since MPI is designed to work not only across the cores of a single processor but also across several nodes in a network, this ensures the code works both on a multi-GPU workstation and on a large supercomputing architecture. Throughout this work we assume a 3D domain decomposition (the most scalable choice in terms of communications). In practice this means each sub-domain is extended by two cells along each direction in order to store boundary values from neighbouring sub-domains. The typical dimension of each sub-domain will then be (N_X + 2) × (N_Y + 2) × (N_Z + 2), and the full lattice size will be (N_X · N_procs,x) × (N_Y · N_procs,y) × (N_Z · N_procs,z), where N_procs,i indicates the number of processes along a given direction.

Fig. 3.14 Same as Fig. 3.13, for the matter-dominated epoch (m = 2/3)

Given that boundary terms are to be communicated to and from neighbouring sub-domains, we additionally need CUDA kernels to extract field values (from the outer shell of the inner core of each sub-domain) into additional buffers, which are sent via Isend (from the MPI standard). Similarly, we require unpacking kernels which take the received values and write them to the boundary of each sub-domain; the corresponding receive instruction is Irecv. These are both non-blocking instructions, which, as recommended by standard good practice [43], allow MPI to decide the best pattern of communication.
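The sub-domain bookkeeping just described can be sketched in a few lines; the helper names are ours, not from the code:

```python
def subdomain_shape(nx, ny, nz, halo=1):
    """Allocated shape of one sub-domain: the core plus a halo of `halo`
    cells on every face, i.e. extended by 2*halo along each direction."""
    return (nx + 2 * halo, ny + 2 * halo, nz + 2 * halo)

def global_lattice(nx, ny, nz, procs):
    """Full lattice size for a (px, py, pz) grid of processes, each owning
    an (nx, ny, nz) core."""
    px, py, pz = procs
    return (nx * px, ny * py, nz * pz)

# e.g. a 4096^3 lattice split over a (16, 16, 16) grid of 4096 GPUs:
assert global_lattice(256, 256, 256, (16, 16, 16)) == (4096, 4096, 4096)
assert subdomain_shape(256, 256, 256) == (258, 258, 258)
```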
To ensure communication has completed before unpacking, a barrier is necessary, and therefore the WaitAll instruction is used. In order both to make our life easier and to comply with standard good practice (see again [43]), we store these communication buffers in Unified Memory and allow Remote Direct Memory Access. This means the buffers are resident on the GPU but accessible from the host side after appropriate data movement (and amenable to CUDA-aware MPI). There is an additional challenge here, which comes from the fact that CUDA kernels are non-blocking with respect to the host: after the host launches a kernel on a GPU, it can immediately execute other instructions without waiting for the kernel to complete. This could cause issues if one sends a communication buffer before it has been updated by the GPU. The correct way out of this problem is to use the cudaStreamSynchronize function at the appropriate steps, to ensure that an (un)packing kernel, and all kernels before it, have completed in time.

We note an additional "detail" about the boundary conditions of each sub-domain. It is necessary (as can be seen from the backward-derivative computation of F_ij and from the winding) to ensure that diagonal terms of the gauge field components A_i are also available at the boundaries. The way to handle this correctly is the "diagonal trick", which uses the values previously exchanged in a given direction to update the new corners needed in the next direction. This establishes a dependency of the exchanges in one direction on the previous exchange, and therefore imposes that communications proceed in order. In other words, a typical communication step will look like the following series of steps: 1. Pack the values to be sent to neighbors along X (the outer shell of the inner N_X × N_Y × N_Z part of the domain); 2.
Send the packed buffers to the neighboring sub-domains; 3. Unpack the received values into the boundaries in the X direction; 4. Pack the values from the outer cells of the inner N_X × N_Y × N_Z core, along with values received in the previous exchange (to ensure corners are appropriately handled), to be sent to neighbors in the Y direction; 5. Exchange the packed buffers in the Y direction; 6. Unpack the received buffers into the boundaries of the sub-domain; 7. Pack the values from not only the inner core of the sub-domain but also from the two previous exchanges; 8. Exchange the packed buffers in the Z direction; 9. Unpack the received buffers into the boundaries.

This "diagonal trick" is schematically represented in Fig. 3.15 for the 2D case. Note how the field values from the inner core (blue) are packed into the red buffers, an exchange occurs along X, and then these same values are used to pack field values along Y (red buffers). Generalizing this to 3D implies that the Z buffers are extended by two along both the X and Y directions, and the red and green buffers are then used to update the last halos atop and below each sub-domain. The attentive reader will also notice the presence of magenta and orange boxes; these will be explained below.

Having completed all communication steps, one has the necessary ingredients to compute the conjugate momenta (including E) and then, without any additional communication, to update φ and A. This would already allow one to simulate with multiple GPUs. However, in order to simulate large lattices, we still need to achieve near-perfect weak scaling up to almost the full machine. This can be done through compute-communication overlap: updating the fields in the inner core of each sub-domain while one starts collecting field values for the exchange buffers along the X direction. Note that the outermost points of this inner core will require values from the boundaries—which are still being communicated.
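The ordered exchange and the "diagonal trick" can be demonstrated with a small NumPy stand-in; a single process with periodic wrap-around plays the role of all MPI neighbours, and `exchange_halos_2d` is our toy name, but the ordering logic (X first, then Y built from rows that already include the received X halos) is the one described above:

```python
import numpy as np

def exchange_halos_2d(core):
    """Ordered halo exchange with the 'diagonal trick' on a periodic 2D lattice.
    One process stands in for all neighbours (periodic wrap), but the ordering
    matches the MPI version: exchange along X first, then build the Y buffers
    from rows that already contain the received X halos, so the corner
    (diagonal) cells come out right without any extra corner messages."""
    f = np.zeros((core.shape[0] + 2, core.shape[1] + 2), dtype=core.dtype)
    f[1:-1, 1:-1] = core
    # Exchange along X: pack the outermost core columns, "receive" into halos.
    f[1:-1, 0] = core[:, -1]
    f[1:-1, -1] = core[:, 0]
    # Exchange along Y: pack full rows *including* the X halos just received.
    f[0, :] = f[-2, :]
    f[-1, :] = f[1, :]
    return f

core = np.arange(16, dtype=float).reshape(4, 4)
f = exchange_halos_2d(core)
# Corner halos equal the diagonally opposite core cells, as periodicity demands:
assert f[0, 0] == core[-1, -1] and f[-1, -1] == core[0, 0]
```

Had the Y buffers been packed from the bare core instead, the four corner halos would have been left unfilled, which is exactly the failure mode the ordering prevents.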
Because the outermost points of the inner core require boundary values that are still in flight, computation cannot proceed over the full inner core of size N_X × N_Y × N_Z, but must instead be restricted to an inner core of size (N_X − 2) × (N_Y − 2) × (N_Z − 2). After the exchange in the X direction is completed, we can proceed to update the outer part of the inner core, since the necessary boundary is now available, while communication proceeds along the Y direction. This continues until all halos are updated and all field values of the conjugate momenta throughout the sub-domain have been updated.

Fig. 3.15 The packing procedure along two different directions. Blue represents the core of each domain (of size N_X × N_Y × N_Z), red represents the buffers being filled with the appropriate values to send to neighboring sub-domains, and green represents an already received buffer. In the left panel, the buffer values come only from the blue inner core. After communication has taken place in this first direction, one can unpack the received buffer into the boundaries of the sub-domain. Once done, one can start packing the communication buffers for the next direction; this involves using not only the blue inner core but also the freshly unpacked boundaries (in green). The pink boxes indicate domain areas where the fields E and φ are updated either as the packing procedure begins (left and middle panels) or after all communication has taken place (right panel), whereas orange indicates areas that have already been updated

Returning to the schematic view of Fig. 3.15, the inner areas being updated (compute) are highlighted in magenta at each step, while the already updated areas are depicted in orange. An additional ingredient is needed to allow multiple CUDA kernels themselves to overlap: they must execute in separate CUDA streams, asynchronous to each other.
There will be one stream per pack/unpack kernel pair (for the exchanges along a given direction), plus the kernels that perform the field updates. Given the interdependencies between exchanges in different directions, one must also ensure these are respected and enforced. The way to do this is to use a combination of cudaEventRecord (signaling the completion of a kernel) and cudaStreamWaitEvent (which awaits the completion of an event in a different stream). After the completion of all communication operations and of the conjugate momenta updates (including E), we can proceed to update φ and A.

Now that we have described the general way in which multiple GPU accelerators are to be used, we describe the code's scalability. All benchmarking was conducted on the Piz Daint supercomputer, described in the previous section. All benchmarks assume the evolution of the fields as previously described and, in order to mimic a real production run, we additionally compute the Lagrangian-based mean string separation and the mean velocity squared estimated from v²_φ. These network average quantities are computed every five timesteps. Simulation parameters are as before: λ = 2.0, e = 1.0, Δx = 0.5, Δt = 0.2 · Δx, η₀ = 1.0 and η_final = 0.5 · N · Δx, where N is the lattice size of an N³ lattice.

Before we characterize the application in terms of performance metrics, it should be noted which of these is more relevant and was thus the target of our optimization efforts. While both strong and weak scaling can be relevant, weak scaling is the one we pursue, as increases in lattice size (and consequently in dynamic range) allow us to probe a larger range of scales between the string width and the size of the horizon.
While one cannot simulate all of cosmological history in a single simulation (due to obvious memory limitations), one often needs to extrapolate from smaller simulations (with the aid of semi-analytical modelling, as described in the next section, for instance). Strong scaling would be much more critical if the wall-clock times (to be presented next) were larger. We do, however, remark that strong scaling still matters for finding the most efficient configuration (in terms of node-hour usage) for a run of a given lattice size.

Since in string simulations the final conformal time is directly proportional to the side of the simulation lattice, we need some way to quantify weak scaling—either normalizing to the time per timestep (total run time divided by the number of timesteps) or fixing the number of timesteps for which we evolve the box. We measure only the time taken to evolve the fields on a lattice for 630 timesteps—the number of timesteps needed to evolve a 256³ lattice (with Δx = 0.5) from the initial time to its final conformal time. Additionally, we use some of the times measured for strong scaling to compute a derived weak scaling metric, based on the time to evolve a lattice of size 512³ for 1270 timesteps.

For both types of scaling we use a speed-up factor S, whose definition can be found earlier in this chapter, and a parallel efficiency E. It is worth mentioning that we compare to a reference wall-clock time t_ref which, for weak scaling, corresponds to the time taken to evolve the smallest overall domain size on a single GPU; for strong scaling it corresponds instead to the time needed to fully evolve a lattice of size N³ on the smallest possible number of GPUs in which all field variables for the full lattice fit. The efficiency for weak scaling is trivially the speed-up expressed as a percentage.
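These definitions can be checked numerically; the sketch below uses the 512³ strong-scaling wall-clock times quoted in Table 3.7 (96.0 s on 1 GPU, 6.59 s on 32 GPUs), and the strong-scaling efficiency re-scaled by the GPU counts, E_strong = n_ref·t_ref/(n·t_n), of Eq. 3.21:

```python
def speedup(t_ref, t_n):
    """Speed-up factor S relative to the reference wall-clock time."""
    return t_ref / t_n

def eff_weak(t_ref, t_n):
    """Weak-scaling efficiency: simply the speed-up as a percentage."""
    return 100.0 * t_ref / t_n

def eff_strong(n_ref, t_ref, n, t_n):
    """Strong-scaling efficiency, re-scaled by the GPU counts (Eq. 3.21)."""
    return 100.0 * (n_ref * t_ref) / (n * t_n)

# 512^3 strong scaling (Table 3.7): 96.0 s on 1 GPU vs 6.59 s on 32 GPUs.
S = speedup(96.0, 6.59)            # ~14.57
E = eff_strong(1, 96.0, 32, 6.59)  # ~45.5 %, i.e. below "useful" scaling
```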
For strong scaling, however, we re-scale the definition with the number of GPUs of the reference run, n_ref:

E_strong = (n_ref · t_ref) / (n · t_n). (3.21)

Having defined these, we can proceed to describe the performance of our application. Let us begin with strong scaling. It is obvious, from both Table 3.7 and Fig. 3.16, that there exists a point beyond which we cease to obtain useful strong scaling. We take "useful" to mean a minimum of 50% efficiency, while noting that there is no consensus on the definition of "useful" scaling. This point of diminishing returns is reached when the sub-domain size becomes small enough, as one approaches 128³. This appears to be relatively common in most multi-GPU implementations (see for example [22, 42]). The reason for poor strong scaling is twofold: first, the cost of communications relative to the execution of the CUDA kernels, where not even the overlap is effective at hiding the communication cost; and second, the inherent latency of launching CUDA kernels. In fact, one might even say that the reduced effectiveness of the overlap is itself due to this latency cost.

Table 3.7 Strong scaling measurements for different lattice sizes, reported as the wall-clock time to fully simulate a network from start to finish. We also present the speed-up (relative to the reference measurement) and a parallel efficiency

Box size | Number of GPUs | Domain decomposition (x,y,z) | Wall-clock time (s) | Speed-up | Efficiency (%)
512³  | 1    | (1,1,1)    | 96.0    | –     | –
512³  | 2    | (1,1,2)    | 50.1    | 1.92  | 95.9
512³  | 8    | (2,2,2)    | 18.2    | 5.16  | 66.0
512³  | 32   | (2,4,4)    | 6.59    | 14.57 | 45.5
1024³ | 8    | (2,2,2)    | 217.39  | –     | –
1024³ | 64   | (4,4,4)    | 37.06   | 5.87  | 73.3
1024³ | 512  | (8,8,8)    | 12.48   | 17.41 | 27.2
2048³ | 64   | (4,4,4)    | 438.45  | –     | –
2048³ | 512  | (8,8,8)    | 76.15   | 5.76  | 72.0
4096³ | 512  | (8,8,8)    | 948.52  | –     | –
4096³ | 4096 | (16,16,16) | 156.96  | 6.04  | 74.3
8192³ | 4096 | (16,16,16) | 1990.51 | –     | –

Table 3.8 Weak scaling measurements for a fixed box size of 256³ per domain. The wall-clock time corresponds to the time to complete 630 timesteps (the number of timesteps for a full 256³ simulation). In addition we present the speed-up as well as a parallel efficiency

Box size     | Number of GPUs | Domain decomposition (x,y,z) | Wall-clock time (s) | Speed-up | Efficiency (%)
256³         | 1    | (1,1,1)    | 8.93 | –    | –
256² × 512   | 2    | (1,1,2)    | 8.95 | 1.00 | 99.7
256 × 512²   | 4    | (1,2,2)    | 8.93 | 1.00 | 99.9
512³         | 8    | (2,2,2)    | 8.94 | 1.00 | 99.8
1024³        | 64   | (4,4,4)    | 9.17 | 0.97 | 97.4
1024² × 2048 | 128  | (4,4,8)    | 9.34 | 0.96 | 95.6
1024 × 2048² | 256  | (4,8,8)    | 9.44 | 0.95 | 94.6
2048³        | 512  | (8,8,8)    | 9.68 | 0.92 | 92.2
2048² × 4096 | 1024 | (8,8,16)   | 9.61 | 0.92 | 92.9
4096³        | 4096 | (16,16,16) | 9.81 | 0.91 | 91.2

Fig. 3.16 Performance indicators for our multiple-GPU code; strong scaling is shown in the top panels, while weak scaling can be seen in the bottom ones. The left-hand panels show the wall-clock time for a full run (strong scaling plot) or the wall-clock time needed to complete 630 timesteps (weak scaling plot). The corresponding parallel efficiencies, as defined in the text (see e.g. Eq. 3.21 for strong scaling), are presented in the right-hand panels

Table 3.9 Derived weak scaling measurements for a fixed box size of 512³ per domain. The wall-clock time corresponds to the time to complete 1270 timesteps (the number of timesteps for a full 512³ simulation). These are derived from the strong scaling measurements above. In addition we present the speed-up as well as a parallel efficiency

Box size | Number of GPUs | Domain decomposition (x,y,z) | Wall-clock time (s) | Speed-up | Efficiency (%)
512³  | 1    | (1,1,1)    | 96.0   | –    | –
1024³ | 8    | (2,2,2)    | 108.27 | 0.89 | 88.7
2048³ | 64   | (4,4,4)    | 108.97 | 0.88 | 88.1
4096³ | 512  | (8,8,8)    | 117.75 | 0.82 | 81.5
8192³ | 4096 | (16,16,16) | 124.41 | 0.77 | 77.1

However, as discussed previously, excellent strong scaling might not even be necessary if the runtimes are short across the board for the number of processes involved. Before we discuss this, let us turn our attention to weak scaling. For the smaller sub-domain size of 256³, weak scaling is near-perfect all the way up to 4096 GPUs, as seen in Table 3.8 and the corresponding figures. This suggests the overlap is successful at hiding communication costs. Curiously, the derived weak scaling benchmarks at 512³ per GPU (Table 3.9) are not quite as successful, with the weak scaling efficiency dropping to about 77% at 4096 GPUs. The reason for this is not well understood, although we suspect that more careful tuning of the overlap, to compensate for the larger buffers being exchanged, would be required to achieve better weak scaling.

Let us now return to the point of using a small number of node-hours per simulation. To do so, we compare with the Lattice Abelian Higgs benchmark (graciously sent to us by the authors of [25]) in node-hours. Our simulation can evolve a 4096³ lattice in around 3 min of wall-clock time with 4096 GPUs, or about 140–180 node-hours depending on the number of GPUs used. In the case of LAH, one 4096³ run on the Monte Rosa supercomputer, which used 32768 cores (equivalently 1024 nodes), would take 5251 node-hours.
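The node-hour figures follow directly from the wall-clock times of Table 3.7, under the assumption (consistent with the text) of one GPU per Piz Daint node:

```python
def node_hours(n_nodes, wall_seconds):
    """Node-hours consumed by a run."""
    return n_nodes * wall_seconds / 3600.0

# 4096^3 run (Table 3.7): 4096 single-GPU nodes for 156.96 s,
# or 512 nodes for 948.52 s -- reproducing the quoted 140-180 range.
high = node_hours(4096, 156.96)   # ~179 node-hours
low = node_hours(512, 948.52)     # ~135 node-hours
ratio = 5251.0 / high             # vs LAH's 5251 node-hours: a factor ~30

# 8192^3 production run: 1990.51 s (~33.2 min) on 4096 GPUs.
big = node_hours(4096, 1990.51)   # ~2265 node-hours, "around 2200"
```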
This comparison suggests a node-hour speed-up of a factor of around 30, which indirectly leads to another advantage: we can simulate lattices larger than those seen in the literature so far (about 4096³) by going to 8192³, which is a factor of 8 larger in volume and a factor of 2 larger in dynamic range. The time for such a production run, as seen in Table 3.7, is around 33.2 min of wall-clock time with 4096 GPUs, which gives around 2200 node-hours.

We would also like to comment on the choice of node-hours, rather than core-hours, for the comparison. As mentioned previously, the idea of a traditional "core" does not apply to a GPU, and so it is not necessarily correct to compare in terms of cores. In most PRACE documentation for applications to new supercomputing centres, the convention is to consider the number of cores of a node—12 for Daint—plus the number of Streaming Multiprocessors of the GPU—56 for the Tesla P100—which avoids penalizing the GPU for its large number of cores. Note as well some ambiguity here: since we do not use the full node, but merely one CPU core per node, one could argue that the number of core-hours should involve multiplying by a factor of 57, and not 68. On Piz Daint a practical solution to this dilemma is adopted: all book-keeping is in node-hours. To avoid these ambiguities, we follow the Daint approach and make the comparison in node-hours.

As a final word on this simulation, we note that while by not outputting any extra information from the lattice we avoid being I/O-bound, this also restricts us to the simplest type of semi-analytical modelling (i.e. without any extra arguments to describe additional degrees of freedom). If we choose to output more information, however, one quickly ends up with an onerous amount of data—too much even for the high-end facilities of Piz Daint.
One way to short-circuit this is to add in-situ capabilities to the simulation, as will be discussed next.

3.4 In-Situ Analysis and Visualization Pipeline

3.4.1 Reduced Winding Output

For almost every High-Performance Computing simulation, input/output is the most stringent bottleneck. This is unfortunate, as the scientific exploitation of a simulation is often limited by the data needs of the code. The problem is exacerbated by the rate at which storage solutions evolve compared with the rate at which computational throughput evolves: the first is much slower than the latter. While initial conditions need be output only once, and checkpointing is not strictly necessary (though supported in our simulation), outputting only average quantities is one way to evade this problem, though it limits the amount of science that can be extracted. It is thus not surprising that we looked to the literature for possible solutions to this bottleneck. One solution is the application of in-situ analysis/visualization techniques, as seen in Camata et al. [12], Rautenhaus et al. [45], Mu et al. [35], Sohrabi et al. [50] and Kageyama et al. [28], wherein the output data is heavily reduced prior to output, so that only a small amount of data is written to disk. To demonstrate this technique in our simulation, our approach is to output the cells pierced by strings, instead of outputting, for every single cell in the simulation, whether a string is present or not. To perform in-situ output of string positions in the lattice, we use the library ParaView Catalyst (tested at version 5.8.0). Two components are necessary to achieve this: an Adaptor, written in C++, which is responsible for placing all data in a format vtk (and by extension ParaView) will understand, and a Python script which reduces the data and then outputs an unstructured grid (in the parallel vtu format). Let us then describe the first component of this puzzle: the Adaptor.
One begins by creating a vtkMultiBlockDataSet, where each process contains a vtkImageData with the necessary information about the sub-domain extents (lattice spacing, number of points/cells, origin of each block). After this, a series of vtkFloatArray are created (or re-used) and filled with the contents of seven different buffers resident in host memory but accessible by the GPU (pinned). Six of these pinned buffers are updated by the winding estimator kernel to contain information about which winding pierces each cell face and in which direction (so each buffer can take the values −1, 0, 1). One extra kernel then updates the last array, which merely indicates whether a cell is pierced (basically an OR of the absolute values of the previous buffers).

Having all data in a format ParaView Catalyst can understand, we then need to perform the reduction and the output. This task is left to the second ingredient of the in-situ strategy, the Python script. The script first applies a threshold filter to the data, selecting only cells where the last array is non-zero; this is the reduction step. Afterwards we apply the Merge Blocks filter to merge all string segments from the different sub-domains, and we finish by using the appropriate parallel writer for the format the data is now in: the parallel unstructured grid writer. The output is then in the *.pvtu format, with several auxiliary files (with the contents of each sub-domain) in *.vtu. With this we already have the positions of the string cell centers for each string at the output timesteps. Some additional treatment can be necessary to identify which points belong to the same string, to smooth the resulting centerline and, if necessary, to visualize the network; this will be described in the next section. We are now in a position to describe the performance aspects of this approach.
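The data layout and the reduction step can be mimicked with a small NumPy stand-in (the actual pipeline uses vtk arrays and ParaView's Threshold filter; the random windings here are synthetic, not a physical string configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Six per-cell-face winding arrays, each taking values in {-1, 0, 1}, standing
# in for the six pinned buffers filled by the winding estimator kernel.
windings = rng.integers(-1, 2, size=(6, 8, 8, 8))

# The seventh array: a cell is "pierced" if any of its faces carries a
# winding -- an OR over the absolute values of the six face arrays.
pierced = np.abs(windings).max(axis=0).astype(bool)

# The threshold filter's reduction step: keep only the pierced cells.
cells = np.argwhere(pierced)  # (i, j, k) indices of string cells

# Every kept cell really does have at least one non-zero face winding:
assert all(np.abs(windings[:, i, j, k]).sum() > 0 for i, j, k in cells)
```

Only the `cells` (plus their face windings) need to be written to disk, which is the source of the data reduction quantified below.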
Before describing that performance, we characterize the I/O subsystem of the machine used thus far, the Piz Daint supercomputer. All outputs are written to the /scratch partition, backed by a Cray Sonexion 3000 Lustre filesystem with 8.8 PB capacity. This filesystem contains 40 object storage targets and handles a file-per-process approach well, as long as one does not output thousands of files into a single folder. File-per-process is what the simulation uses for initial conditions and checkpointing outputs. The maximum bandwidth measured for file-per-process (for the configuration of each run) is what we use to estimate the time taken by the HDF5 approach, and the amount of data is computed from the outputs of a single 256³ simulation (around 662 MB for a single timestep). For the in-situ part, the measurements of the amount of data and of the timings are obtained over a range of timesteps. The reason is that as the string network evolves in the linear scaling regime there will be less string in the lattice, and therefore fewer string points to output. Data is output every five timesteps in both cases. All of these measurements can be thought of as a weak scaling benchmark, as we keep the sub-domain size constant at 256³, which means the domain decompositions for lattice sizes 512³, 1024³, 2048³ and 4096³ are (2,2,2), (4,4,4), (8,8,8) and (16,16,16) respectively. The maximum file-per-process bandwidths for each configuration are 5712, 45640 and 113300 MB/s, with the last value also used for the (16,16,16) run at 4096³ lattice size (as it is already close to the peak bandwidth of the filesystem). The results are summarized in Table 3.10 and Fig. 3.17. These two comparisons summarize well the usual advantages of in-situ techniques, the first being that the amount of data output is heavily reduced.
Table 3.10 On the left-hand side, a summary of typical output sizes for a timestep, either with raw output of all cells (HDF5) or using only the unstructured grid outputs from our in-situ approach. On the right-hand side, the corresponding range of times taken by either approach

Lattice size | HDF5 estimated size (MB) | In-situ measured size (MB) | HDF5 estimated time (s) | In-situ measured time (s)
512³  | 5296    | [8.7, 25.0]   | 0.9  | [2.6, 3.5]
1024³ | 42368   | [11.0, 126.0] | 0.9  | [2.7, 3.2]
2048³ | 338944  | [33.0, 199.0] | 3.0  | [2.9, 5.6]
4096³ | 2711552 | [99.0, 144.0] | 23.9 | [4.1, 8.2]

Fig. 3.17 On the left-hand side, the typical amount of data output via raw output of all cell contents (windings for the six different cell faces) in HDF5 (dashed lines) and the size of the outputs for the in-situ approach, where unstructured grids are output (full lines). On the right-hand side, the corresponding times taken by each approach. These outputs are obtained for four different lattice sizes: 4096³ (blue), 2048³ (purple), 1024³ (orange) and 512³ (green)

The smallest size of an HDF5 dataset, for a 256³ run, is about 662 MB in our test set, and the largest, at 4096³ lattice size, is 2711552 MB (around 2.7 TB for a single timestep). Given the large amount of data required for a single timestep by the HDF5 approach, it is already unfeasible to analyse several timesteps. In-situ provides a significant improvement here, reducing the amount of storage space needed by, at best, four orders of magnitude. This means we can either output at a greater temporal rate (or for a longer period of conformal time) or even add more datasets to the outputs, if required.
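The HDF5 estimates in Table 3.10 follow from two simple scalings: the raw output grows with lattice volume (662 MB per 256³ block), and the write time is size divided by the measured file-per-process bandwidth. A minimal sketch (helper names ours):

```python
def hdf5_output_mb(lattice_n, mb_per_256=662.0):
    """Raw per-timestep winding output, scaling with lattice volume."""
    return mb_per_256 * (lattice_n / 256) ** 3

def write_time_s(size_mb, bandwidth_mb_s):
    """Estimated write time at the measured peak bandwidth."""
    return size_mb / bandwidth_mb_s

assert hdf5_output_mb(4096) == 2711552.0   # ~2.7 TB for one timestep
# At the measured 113300 MB/s peak for the (8,8,8)/(16,16,16) runs:
t = write_time_s(hdf5_output_mb(4096), 113300)  # ~23.9 s per output
```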
The second advantage is that there is a wall-clock time benefit to using in-situ, but only at lattice sizes large enough to saturate the bandwidth of the filesystem. This is evident when going from 2048³ to 4096³: at 2048³ the time taken is roughly comparable for both techniques, whereas at 4096³ in-situ takes an order of magnitude less time. It is thus clear that this technique is a significant improvement over the usual I/O strategy of our multi-GPU simulation, and makes the study of the small-scale structure of strings a possibility for large lattices. It does so by overcoming two possible bottlenecks: the amount of storage available and the maximum bandwidth of the system. We note that it is possible to improve upon this technique by changing what information is output: cell centers and the absolute value of the scalar field would give a large reduction in the amount of data output and in memory consumption (going from seven data arrays to two) and, coupled with a travelling-salesman heuristic solver, would enable the creation of string centerlines (with interpolated string centers) in post-processing. For now, we describe the post-processing approach with this seven-array version of the outputs.

3.4.2 Centerlines Post-Processing

Given that the previous step did not perform any analysis or visualization, we now need to do so in post-processing. We proceed to the description of the creation of string centerlines. Since all outputs are in the vtk unstructured grid format, they are easily readable by ParaView. This means we can readily use some of the existing filters to ease the task somewhat. First we use the Connectivity filter to group neighboring cells pierced by a winding together. This already gives a "blocky" string made up of groups of voxels. From this step onwards we must create a custom filter.
In order to ensure the necessary data is available to the custom filter, we additionally apply the PassArrays filter. Next, the custom filter, from now on denoted centerlines, loops over each region of neighboring cells and over each cellId. The idea is to add cell centers to a vtkPolyData collection, such that they represent the vertices of a vtkLine. There is however a small catch to doing so directly: cells are ordered by index (i, j, k) and not by the order in which they appear along a string. This is solved by using the winding at each cell face to determine the physically valid cell neighbors. Without loss of generality, we choose the direction given by positive magnetic flux (+1 winding). Once valid neighbors are identified, the voxel bounds of a given cell and of the next one are used to find the cell center coordinates, thus tracing a vtkLine connecting both cells. Repeating this procedure until no more neighbors are found then yields a string centerline, as a collection of vtkLines.

Note that this procedure is not completely fool-proof, and there are some special cases where it can fail to provide correct output. These cases are simply expected instances of string network phenomenology which the above script initially failed to take into account. When two strings intersect at one or more points they can exchange ends, or likewise, when a single string self-intersects, a closed loop of string forms at the scale of the horizon or smaller. The subsequent decay of these loops is an object of study in and of itself, as the decay mechanisms imposed, and the scale(s) at which they occur, can significantly alter the observational imprints of cosmic strings. Loops are one special case that was not well handled by the script initially, as the centerline would not close in on itself.
This can be solved via a comparison of the number of points and the total number of cells: if one exceeds the other by a single unit, the last end of the string should be re-connected to the beginning. An example of a loop can be seen in the top panel of Fig. 3.18. The second pathological case comes from intercommutation regions. When two strings meet at one point, this generates an X-shaped patchwork of cells. By choosing one cell at random and going through the valid physical neighbors, we only create a string centerline going from one end of the box to the other (one half of the X-shape), without using all cell centers. If the total number of cells and the number of cells used for the centerline differ, it is necessary to restart the centerline reconstruction at one of the unused cells. An example of an intercommutation event can be seen in the bottom panel of Fig. 3.18.

Fig. 3.18 String cells colored by region in two close-up screenshots of a 2048³ radiation-era simulation. We show in addition the output of the centerlines custom filter with smoothing via a Hanning window. A loop (in blue cells) is shown at the center of the top panel screenshot. An intercommutation event (cells in red) is shown on the left-hand side of the bottom panel.

After all centerlines are created (and all pathological cases are dealt with) we possess a collection of string centerlines with string positions located at the end of each staircase segment. This taxicab geometry is merely a consequence of the lattice and of not performing an interpolation to the zero of the complex scalar field. Nevertheless, the resulting artificial small-scale structure should only be present at scales of roughly the lattice spacing. Following [26] we can attempt to remove this lattice effect by smoothing the strings along the string path.
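Such smoothing can be sketched as a convolution of each coordinate of the staircase path with a normalized Hanning window. This is a pure-Python illustration under assumed conventions: the window length is a free choice, and for closed loops one would wrap the window around the ends rather than truncate it, as done here.

```python
import math

def hanning(n):
    """Hanning window of length n >= 2: w[i] = 0.5 - 0.5 cos(2*pi*i/(n-1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def smooth_centerline(points, window=5):
    """Smooth a centerline (list of (x, y, z)) by convolving each
    coordinate with a normalized Hanning window, damping lattice-scale
    staircase structure while preserving the larger-scale shape.

    `window` should be odd; near the ends the window is truncated and
    renormalized (an open-segment choice; closed loops would wrap).
    """
    w = hanning(window)
    half = window // 2
    n = len(points)
    smoothed = []
    for idx in range(n):
        lo, hi = max(0, idx - half), min(n, idx + half + 1)
        seg = points[lo:hi]
        ww = w[half - (idx - lo): half + (hi - idx)]
        norm = sum(ww)
        smoothed.append(tuple(
            sum(wk * p[c] for wk, p in zip(ww, seg)) / norm
            for c in range(3)))
    return smoothed
```

Applied to a right-angle staircase corner, the corner point is pulled towards the diagonal, which is the qualitative effect visible in the white smoothed centerlines of Fig. 3.18.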
One way to do so is to convolve the string path with a chosen window function of a given window length, such that this artificial structure is removed. Both panels of Fig. 3.18 show the smoothed centerline output (in white). Qualitatively this is more representative of a natural string. A myriad of information can then be extracted from these smoothed centerlines, such as, for instance, the distribution of string lengths as a fraction of the particle horizon over the entire lattice, as the network evolves. An example of this for a 2048³ radiation-era simulation can be seen in Fig. 3.19.

3.5 Conclusion

We have ported a cosmic domain walls simulation to a parallel OpenCL implementation, written from scratch a single-GPU CUDA implementation of an Abelian-Higgs simulation, and then proceeded to tackle the challenge of extending it to harness multiple GPUs. In all cases, validation proved that the behavior of each simulation is in agreement with what is presented in the literature. Furthermore, we have discussed the performance benefits of each case, showing the large speed-up of the walls code (around 200) compared to the previous code, and the benefits of the single-GPU and multiple-GPU simulations when compared to the literature reference Lattice Abelian Higgs simulations (using 30 times fewer node-hours; near perfect weak scaling). Having succeeded in doing so, we will note what further steps can be taken to improve and extend this work, and then conclude this chapter by introducing how we can already use it to explore the astrophysical consequences of such defect networks. First, let us go over possible improvements to the walls code. The simplest improvement is to apply the (admittedly simple) optimization used in Abelian-Higgs, which is to compute the density and velocity not at every timestep, but every n timesteps. This would easily allow us to ease the compute-bound bottleneck and probably end up with a memory-bound implementation.
Afterwards, all the work done in the multiple-GPU implementation to tackle memory bandwidth (Z-cycling) and the amount of available memory (multiple GPUs) can be added. Given that the multiple-GPU Abelian-Higgs code is written in a more generic way, it is perhaps easier to port the relevant kernels (velocity and density estimation) to CUDA and reuse the existing code for communication, I/O and field updates from the multiple-GPU case. Basically, walls would be a specific sub-case of the existing kernels, one without a gauge field and with a real scalar instead. After all of this, let us comment on what work could be implemented for any multi-GPU defect simulation. In terms of evading or short-circuiting the I/O bound that will exist as soon as we wish to output more information about the network than averaged quantities, we are already attempting to improve upon this by adding in-situ capabilities to the simulation code (as was done for example in Ayachit et al. [7]). These in-situ capabilities (unpublished work, described in the previous section) will allow us to study the small-scale structure properties of string networks in more detail than ever before (by outputting string centerlines at resolutions of 4096³ and higher) and to detect the presence of bound states in networks with more than one type of string. We can still ask how to improve the scalability of the simulation, specifically to keep up with the rapidly evolving landscape of supercomputing. One interesting possibility, which could have advantages in both weak and strong scaling, is the hypercube decomposition of Blanco-Pillado et al. [10], wherein communication is avoided. We can also attempt to do this in another way: avoid going to larger lattices, but increase the resolution where it might be necessary, via Adaptive Mesh Refinement, as seen in [14, 21, 24]. Lastly, we remark that adding Fast Fourier Transform capabilities to this simulation would open up another window for the scientific exploration of this code. A recent (encouraging) work on this matter, in the context of a pseudo-spectral solver with multiple GPUs, can be found in [46]. Another, earlier reference is the FFT library of reference [23]. In terms of scientific work that can already be leveraged from the mean-quantity extraction, we will showcase in the next chapter how to use thousands of simulations to calibrate semi-analytical models of string evolution, appropriately extended to account for the velocity dependencies of energy loss and curvature, as seen in Martins et al. [32], which enables a quantitative description of the importance of each energy loss mechanism in the network. This has obvious implications for observational studies which rely on the semi-analytical approach.

Fig. 3.19 The three panels show a network of strings at 2048³ lattice size in the radiation era at different conformal times η, 445.0, 477.0 and 511.0 for the top, middle and bottom panels. Strings are colored by their length divided by the size of the particle horizon at the given conformal time. For each case, we plot histograms to reveal the distribution of string lengths per horizon size.

References

1. AMD Graphics Core Next Architecture White Paper. Technical report, 2012. http://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
2. AMD OpenCL Optimisation Guide. Technical report, 2014. http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
3. Nvidia Tesla P100 Whitepaper. Technical report, 2016. https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
4. ARCHER (2017) Hands-on introduction to HPC. https://www.archer.ac.uk/training/course-material/2017/07/intro-epcc/index.php
5. ARCHER (2017) Message passing programming with MPI.
http://www.archer.ac.uk/training/course-material/2017/07/mpi-epcc/index.php
6. Anandtech. AMD Radeon R9 285 review: feat. Sapphire R9 285 Dual-X OC. https://www.anandtech.com/show/8460/amd-radeon-r9-285-review
7. Ayachit U, Bauer A, Geveci B, O'Leary P, Moreland K, Fabian N, Mauldin J (2015) ParaView Catalyst: enabling in situ data analysis and visualization. In: Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV2015), pp 25–29, New York, NY, USA. ACM. ISBN 978-1-4503-4003-8. https://doi.org/10.1145/2828612.2828624
8. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D 75:065015. https://doi.org/10.1103/PhysRevD.75.065015
9. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2010) CMB power spectra from cosmic strings: predictions for the Planck satellite and beyond. Phys Rev D 82:065004. https://doi.org/10.1103/PhysRevD.82.065004
10. Blanco-Pillado JJ, Olum KD, Shlaer B (2012) A new parallel simulation technique. J Comput Phys 231:98–108. https://doi.org/10.1016/j.jcp.2011.08.029
11. Briggs J, Pennycook SJ, Shellard EPS, Martins CJAP, Woodacre M, Feind K (2014) Unveiling the early Universe: optimizing cosmology workloads for Intel Xeon Phi coprocessors in an SGI UV2000 system. Technical report, SGI/Intel White Paper
12. Camata JJ, Silva V, Valduriez P, Mattoso M, Coutinho AL (2018) In situ visualization and data analysis for turbidity currents simulation. Comput Geosci 110:23–31. ISSN 0098-3004. https://doi.org/10.1016/j.cageo.2017.09.013
13. EPCC. Introduction to ARCHER. http://www.archer.ac.uk/training/online/index.php#IntroARCHER
14. Clough K, Figueras P, Finkel H, Kunesch M, Lim EA, Tunyasuvunakool S (2015) GRChombo:
numerical relativity with adaptive mesh refinement. Class Quant Grav 32(24):245011. https://doi.org/10.1088/0264-9381/32/24/245011
15. Correia JRCCC, Martins CJAP (2017) General purpose graphics-processing-unit implementation of cosmological domain wall network evolution. Phys Rev E 96:043310. https://doi.org/10.1103/PhysRevE.96.043310
16. Correia JRCCC, Martins CJAP (2020) Abelian-Higgs cosmic string evolution with CUDA. Astron Comput 32:100388. ISSN 2213-1337. https://doi.org/10.1016/j.ascom.2020.100388
17. Correia JRCCC, Martins CJAP (2021) Abelian-Higgs cosmic string evolution with multiple GPUs. Astron Comput 34:100438. https://doi.org/10.1016/j.ascom.2020.100438
18. Correia JRCCC, Martins CJAP (2019) Extending and calibrating the velocity dependent one-scale model for cosmic strings with one thousand field theory simulations. Phys Rev D 100(10):103517. https://doi.org/10.1103/PhysRevD.100.103517
19. Correia JRCCC, Leite ISCR, Martins CJAP (2014) Effects of biases in domain wall network evolution. Phys Rev D 90(2):023521. ISSN 1550-2368. https://doi.org/10.1103/PhysRevD.90.023521
20. Daverio D, Hindmarsh M, Kunz M, Lizarraga J, Urrestilla J (2016) Energy-momentum correlations for Abelian Higgs cosmic strings. Phys Rev D 93(8):085014
21. Drew A, Shellard EPS (2019) Radiation from global topological strings using adaptive mesh refinement: methodology and massless modes
22. Fuhrer O, Chadha T, Hoefler T, Kwasniewski G, Lapillonne X, Leutwyler D, Lüthi D, Osuna C, Schär C, Schulthess TC, Vogt H (2018) Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0. Geosci Model Dev 11(4):1665–1681. https://doi.org/10.5194/gmd-11-1665-2018
23. Gholami A, Hill J, Malhotra D, Biros G (2015) AccFFT: a library for distributed-memory FFT on CPU and GPU architectures. CoRR abs/1506.07933. http://arxiv.org/abs/1506.07933
24.
Helfer T, Aurrekoetxea JC, Lim EA (2019) Cosmic string loop collapse in full general relativity. Phys Rev D 99(10):104028. https://doi.org/10.1103/PhysRevD.99.104028
25. Hindmarsh M, Daverio D (2019) Private communication, 20 December 2019
26. Hindmarsh M, Stuckey S, Bevis N (2009) Abelian Higgs cosmic strings: small scale structure and loops. Phys Rev D 79:123504. https://doi.org/10.1103/PhysRevD.79.123504
27. Hindmarsh M, Lizarraga J, Urrestilla J, Daverio D, Kunz M (2017) Scaling from gauge and scalar radiation in Abelian Higgs string networks. Phys Rev D 96(2):023525. https://doi.org/10.1103/PhysRevD.96.023525
28. Kageyama A, Sakamoto N, Miura H, Ohno N (2020) Interactive exploration of the in-situ visualization of a magnetohydrodynamic simulation. Plasma Fusion Res 15:1401065. https://doi.org/10.1585/pfr.15.1401065
29. Kajantie K, Karjalainen M, Laine M, Peisa J, Rajantie A (1998) Thermodynamics of gauge invariant U(1) vortices from lattice Monte Carlo simulations. Phys Lett B 428:334–341. https://doi.org/10.1016/S0370-2693(98)00440-7
30. Kibble TWB (1976) Topology of cosmic domains and strings. J Phys A 9:1387–1398. https://doi.org/10.1088/0305-4470/9/8/029
31. Leite AMM, Martins CJAP (2011) Scaling properties of domain wall networks. Phys Rev D 84:103523. https://doi.org/10.1103/PhysRevD.84.103523
32. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
33. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Stretching and Kibble scaling regimes for Hubble-damped defect networks. Phys Rev D 94(11):116017. https://doi.org/10.1103/PhysRevD.95.039902 [Erratum: Phys Rev D 95(3):039902 (2017)]
34. Micikevicius P (2009) 3D finite difference computation on GPUs using CUDA.
In: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-2), pp 79–84, New York, NY, USA. ACM. ISBN 978-1-60558-517-8. https://doi.org/10.1145/1513895.1513905
35. Mu D, Moran J, Zhou H, Cui Y, Hawkins R, Tatineni M, Campbell S (2019) In-situ analysis and visualization of earthquake simulation. In: Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), PEARC '19, New York, NY, USA. Association for Computing Machinery. ISBN 9781450372275. https://doi.org/10.1145/3332186.3332201
36. Munshi A (2012) The OpenCL 1.2 specification
37. Nguyen A, Satish N, Chhugani J, Kim C, Dubey P (2010) 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–13. https://doi.org/10.1109/SC.2010.2
38. Nvidia Corporation. CUDA C programming guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
39. Nvidia Research-NVLabs (2018) CUB: CUDA Unbound v1.8.0. https://nvlabs.github.io/cub/
40. Simple OpenCL (2013) Performance of atomics. http://simpleopencl.blogspot.pt/2013/04/performance-of-atomics-atomics-in.html
41. Phillips EH, Fatica M (2010) Implementing the Himeno benchmark with CUDA on GPU clusters. In: 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS)
42. Potter D, Stadel J, Teyssier R (2017) PKDGRAV3: beyond trillion particle cosmological simulations for the next era of galaxy surveys. Comput Astrophys Cosmol 4:2. https://doi.org/10.1186/s40668-017-0021-1
43. PRACE (2017) Best practice guide: GPGPU. https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_GPGPU.pdf
44. Press WH, Ryden BS, Spergel DN (1989) Dynamical evolution of domain walls in an expanding universe. Astrophys J 347:590–604. https://doi.org/10.1086/168151
45.
Rautenhaus M, Böttinger M, Siemen S, Hoffman R, Kirby RM, Mirzargar M, Röber N, Westermann R (2018) Visualization in meteorology: a survey of techniques and tools for data analysis tasks. IEEE Trans Visual Comput Graphics 24(12):3268–3296. https://doi.org/10.1109/TVCG.2017.2779501
46. Ravikumar K, Appelhans D, Yeung PK (2019) GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '19, New York, NY, USA. Association for Computing Machinery. ISBN 9781450362290. https://doi.org/10.1145/3295500.3356209
47. Ryden BS (1988) The area of isodensity contours as a measure of large-scale structure. Astrophys J 333:L41–L44. https://doi.org/10.1086/185284
48. Scarpino M (2011) OpenCL in action. Manning Publications. ISBN 9781617290176
49. Scherrer RJ, Vilenkin A (1998) “Lattice-free” simulations of topological defect formation. Phys Rev D 58:103501. https://doi.org/10.1103/PhysRevD.58.103501
50. Sohrabi R, Omlin S, Miller SA (2019) GEYSER: 3D thermo-hydrodynamic reactive transport numerical simulator including porosity and permeability evolution using GPU clusters. Comput Geosci 23(6):1317–1330. ISSN 1573-1499. https://doi.org/10.1007/s10596-019-09885-w
51. Vilenkin A, Shellard EPS (2000) Cosmic strings and other topological defects. Cambridge University Press. ISBN 978-0-521-65476-0
52. Xmartlabs (2012) CUDA occupancy calculator. https://github.com/xmartlabs/cuda-calculator
53. Zhang Y, Mueller F (2012) Auto-generation and auto-tuning of 3D stencil codes on GPU clusters. In: Proceedings of the Tenth International Symposium on Code Generation and Optimization, CGO '12, pp 155–164, New York, NY, USA. ACM. ISBN 978-1-4503-1206-6.
https://doi.org/10.1145/2259016.2259037

Chapter 4
Calibration of Extended VOS Models

There is geometry in the humming of the strings, there is music in the spacing of the spheres.
Pythagoras

4.1 Prelude

As previously highlighted in the introduction, a network of cosmic strings is expected to generate imprints on cosmological backgrounds such as the Cosmic Microwave Background, the Stochastic Gravitational Wave Background, or lensing. Comparison with such backgrounds inevitably results in a constraint on the mass scale of these objects. This scale is intimately connected to the symmetry-breaking scale of the phase transition which generated the network. Therein lies a connection between high-energy physics and observational cosmology. The aforementioned imprints will inevitably be sourced by the energy-momentum tensor of the strings, which for a network in scale-invariant evolution (typical of the radiation-dominated epoch) is greatly simplified. The only catch is the treatment of the transition between epochs and of the conformal stretching regime of dark-energy domination. There are two possible ways to tackle this issue, although in broad terms both involve some manner of extrapolation to the full cosmological history. The first involves the brute-force computation of Unequal Time Correlation functions in radiation- and matter-era simulations, with some additional interpolation added to treat the transition between radiation and matter. The second involves using the canonical semi-analytical model of string evolution (shown in Chap. 2) to compute the full history of the string network. Then, either by focusing on the number of loops produced [42] or by assuming perturbations seeded by Unconnected Segments [7, 13, 38], one can arrive at a predicted power spectrum for the chosen background. While it might seem that this approach is disconnected from simulations, this
is emphatically not the case: any such thermodynamic model will contain free parameters which cannot be derived ab initio analytically. Recently it was shown that a single free parameter might not be sufficient to accurately describe the full evolution of a domain wall network and its energy loss mechanisms. This class of extended models was also shown to properly predict network evolution through transitions between epochs, as shown for domain walls in [32, 33]. Having created simulations which can compute and output average network characteristics faster than previous simulations, we will now exploit this higher performance to calibrate extended VOS models. The calibrations and conclusions drawn throughout this chapter resulted in the following:

• Work published in Physical Review D, “Effects of Biases in Domain Wall Network Evolution II: Quantitative Analysis”, found in reference [20];
• Work published in Physical Review D, “Extending and calibrating the velocity dependent one-scale model for cosmic strings with one thousand field theory simulations”, found in reference [18];
• Work published in Physical Review D, “Quantifying the effect of cooled initial conditions on cosmic string network evolution”, found in reference [16];
• Work published in Physical Review D, “High resolution calibration of the cosmic strings velocity dependent one-scale model”, found in reference [17].

We will begin with the simpler case of domain walls pushed outside the horizon, which re-enter at a later time and, by doing so, enter scaling. We will then turn our attention to the calibration of the extended VOS model for gauged cosmic strings.
Given the larger number of numerical choices available to this latter type of simulation, and their unknown impact, we resort to systematically exploring the effect of each possible choice in order to obtain the most robust and highest-resolution calibration possible. We then finish by discussing the impact on observational signatures.

4.2 Global Domain Walls

4.2.1 Walls Formed Before the End of Inflation

Let us then begin with the extended VOS model for domain walls and its calibration in light of super-horizon (anisotropic) walls. This section will serve as a “tutorial” on the ways to calibrate the VOS, with subsequent sections improving on the methodology presented here. As previously noted, a network of domain walls at scaling can potentially overclose the Universe. This can be seen by considering how the density of walls is expected to decay, ρw ∝ t⁻¹, and comparing it to the critical density, ρc ∝ t⁻², so that the ratio of the two grows with time. There are some solutions which can make domain walls cosmologically viable, which either require deviating from scaling (via some biasing of the potential) or ensuring that walls are formed, or enter scaling, at a sufficiently late time to avoid overclosure. Ensuring that walls form sufficiently late imposes an upper bound of ≈ 1 MeV on the energy scale of these defects, a constraint known as the Zeldovich bound [46]. Overclosure can also be avoided by moving the formation time in the diametrically opposite direction, to earlier times, during an inflationary epoch. If a network of domain walls forms during inflation, it enters a conformal stretching regime where L ∝ a and v → 0. Upon transitioning into the radiation era this network of walls has a correlation length larger than the horizon, and therefore remains “frozen” outside it. Walls only start scaling when the horizon grows to sizes comparable to the mean separation of the network.
The time of horizon re-entry will then depend on the typical length scale of the walls after inflation, and therefore on the details of the inflationary epoch: the energy scale of the walls and the number of e-folds between wall formation and inflation's end. We remark that too large a number of e-folds would essentially mean that the walls never re-enter the horizon during the radiation or matter epochs (similar arguments apply to strings and monopoles, see for instance [43]). One other possibility is to start not with an isotropic network, where the mean separation is the same in all directions, but with an anisotropic one, inherited from formation in an anisotropic scenario. In this case the typical lengthscale would, for instance, be larger along one direction than along the others. This has two effects: first, the timescale for horizon re-entry is set by the larger lengthscale; and second, since in anisotropic inflation the Universe isotropizes (and then transitions into radiation), the network of walls tends towards the usual scaling behavior dictated by a single correlation length as the anisotropic imprints on the network are slowly erased. This scenario was introduced in [5] and later re-explored (with larger simulations) in [19]. For the purpose of model calibration we must test the evolution of walls in post-inflationary Universes; there is no need to simulate anisotropic Universes (and inflation), but only to start from a network of super-horizon (anisotropic) domain walls. This essentially means we need to generate initial conditions where the mean separation of the walls is larger than the size of the horizon and the velocities tend to zero. Numerically this is easy to achieve by evolving the network to some conformal time at which the network of walls has formed and is reasonably well-defined, and subsequently re-setting the simulation to the initial conformal time and setting the velocities to zero.
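As an illustration of this recipe, the toy sketch below evolves a 1-D scalar field with a double-well potential in conformal time and then resets the clock and the velocities. All names and numerical choices here are illustrative (the production code evolves much larger 2-D/3-D fields on GPUs with a PRS-type algorithm); this is only meant to show the evolve-then-reset logic.

```python
import random

def evolve_wall_field(phi, phidot, eta0, eta_end, deta, m=0.5):
    """Evolve a 1-D scalar field with double-well potential
    V = (phi^2 - 1)^2 / 2 in conformal time, using a simple damped
    kick-drift update with Hubble friction 3*H, H = (m/(1-m))/eta for a
    scale factor a ~ t^m.  Periodic boundaries, lattice spacing 1."""
    n = len(phi)
    eta = eta0
    while eta < eta_end:
        damping = 3.0 * (m / (1.0 - m)) / eta
        for i in range(n):
            lap = phi[(i - 1) % n] - 2.0 * phi[i] + phi[(i + 1) % n]
            dV = 2.0 * phi[i] * (phi[i] ** 2 - 1.0)
            phidot[i] += deta * (lap - dV - damping * phidot[i])
        for i in range(n):
            phi[i] += deta * phidot[i]
        eta += deta
    return phi, phidot

def superhorizon_ics(n=256, eta0=1.0, eta_form=20.0, deta=0.1, seed=1):
    """Case-B-style initial conditions: evolve a random field until walls
    are well defined, then reset the clock to eta0 and zero the
    velocities, leaving a frozen, super-horizon-like network."""
    rng = random.Random(seed)
    phi = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    phidot = [0.0] * n
    phi, _ = evolve_wall_field(phi, phidot, eta0, eta_form, deta)
    return phi, [0.0] * n, eta0
```

By the reset time most lattice sites have settled into one of the two vacua, so the returned configuration contains well-formed walls with identically zero velocities, which is the defining property of these super-horizon initial conditions.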
An anisotropic super-horizon network will also require an additional ingredient: a “stretching” by a factor f via interpolation (and clamping) along a spatial direction. This leaves us with three sets of cases to compare:

• Standard networks with random initial conditions for the scalar field φ, and φ̇ set to zero, henceforth denoted case A. These provide a fiducial comparison, not only to account for possible numerical effects (lattice size) but also for calibration comparison with the subsequent cases.
• Super-horizon isotropic networks, whose initial conditions are obtained by evolving a standard network from conformal time η0 = 1.0 till η = 20.0 (twice the wall thickness), resetting conformal time back to η = η0 and setting φ̇ to zero. This will be referred to as case B.
• Super-horizon anisotropic networks, where the initial conditions are as in case B but with an additional stretching applied, so that the “preferential” mean separation is either twice as large as in any other direction (case C) or four times larger (case D). Note that in [19] case D corresponded to a preferential direction 16 times larger.

The analysis of [19] had previously demonstrated, via 4096² matter-epoch simulations, that within statistical uncertainties the wall networks do reach scaling, and that convergence to scaling requires more time for the anisotropic networks, in particular those with a larger stretch factor. We will continue this analysis by doubling the box size along each direction, to 8192², and by simulating several expansion rates, as required for the extended VOS calibration (exemplified for the first time in [32, 33]) with super-horizon (anisotropic) networks.

4.2.2 A Primer on Calibrating the Extended VOS for Domain Walls

In order to describe the procedure for VOS model calibration, we will begin by presenting the extended VOS for domain walls (similar to the cosmic string model presented earlier).
As mentioned previously, the VOS model for domain walls had been qualitatively derived in [6, 41], and a more rigorous ab initio derivation was presented in [32]. It relies on two averaged quantities: a density ρw (or, equivalently, a characteristic physical lengthscale L, related to the former via ρw = σ/L) and a root-mean-squared velocity v. In a FLRW spacetime, these evolve as

dL/dt = (1 + 3v²) H L + F(v) ,   (4.1)

dv/dt = (1 − v²) [k(v)/L − 3 H v] ,   (4.2)

where F(v) and k(v) are, respectively, the energy loss and momentum (or curvature) velocity-dependent functions. The model has been shown to be in very good agreement with high-resolution field theory simulations of domain wall networks in isotropic FLRW universes, for a very broad range of expansion rates. The simpler version of the VOS, with a constant momentum parameter and a linear energy loss function, seemed to accurately describe low-resolution simulations [27]. However, as the resolution increased, the extension became necessary [32, 33] to properly predict the evolution of the quantities of interest. Although a detailed derivation of the velocity dependence of the two model parameters can be found in [32] and is briefly discussed in the introduction, here we will briefly state them. The momentum parameter takes the form

k(v) = k0 [1 − (qv²)^βw] / [1 + (qv²)^βw] ,   (4.3)

where k0 is the maximal value of the momentum parameter, q is the inverse of the maximal wall network velocity squared, and βw has no clear physical interpretation, serving as a parameter to control the interpolation of the momentum parameter between the low- and high-velocity limits. All of them are constant free parameters. In the case of cosmic strings, an analytical ansatz can be derived to fix all of these values (assuming the Nambu-Goto approximation and a helicoidal string); such is not the case for walls.
The maximal momentum parameter k0 might be expected to be bounded by unity (albeit larger values can indicate a deviation from the one-scale approximation), and q must lie within the interval 0 < 1/q ≤ v²max, where v²max can be determined via defect dimensionality to be ≈ 2/3. As for the energy loss function, the traditional blob-chopping term cw v is complemented by a power-law term to describe scalar radiation,

F(v) = cw v + d [k0 − k(v)]^r ,   (4.4)

where d and r are two more constant parameters, the normalization and the exponent of the power law. Note that, by definition, this term tends to zero in the low-velocity limit. For easier comparison with numerical simulations, the model can be re-written in terms of the density ρw = σ/L and of conformal time η,

dρw/dη = −3v² H ρw − F(v) ρw²/σ ,   (4.5)

dv/dη = (1 − v²) [k(v) ρw/σ − 3 H v] ,   (4.6)

where H is the conformal Hubble parameter. The first step towards VOS model calibration is to ensure that, over the conformal time range to be used, all simulations have reached scaling. Scaling is used for the calibration because it is a fixed point of the model for any Universe where the scale factor behaves as a ∝ t^m ∝ η^(m/(1−m)). Given the need to properly extract the velocity dependence of each function in the model, we also need to verify the existence of scaling for six expansion rates in the range 0.5 ≤ m ≤ 0.99. A possible way to do so is to verify whether, from the scaling laws

ρ ∝ η^μ ,   γv ∝ η^ν ,   (4.7)

the scaling exponents μ and ν are consistent with −1 and 0, respectively. This procedure is applied to the average of 10 runs for each expansion rate (and case) in a given fit range, which we take to be η ∈ [501.25, 3096.25].

Table 4.1 Scaling exponents μ and ν, and asymptotic values σ/(ρw η) and γv, for each case mentioned in the text (A, B, C and D) and for each of the simulated expansion rates. A fit range of η ∈ [501.25, 3096.25] was used in all cases. One-sigma statistical uncertainties are shown throughout

Case  m       μ               ν               σ/(ρw η)       γv
A     1/2     −0.972 ± 0.004  −0.081 ± 0.005  0.547 ± 0.018  0.397 ± 0.022
A     2/3     −0.973 ± 0.013  −0.043 ± 0.008  0.510 ± 0.055  0.338 ± 0.021
A     4/5     −0.971 ± 0.006  −0.013 ± 0.005  0.410 ± 0.020  0.269 ± 0.010
A     9/10    −1.024 ± 0.006  −0.028 ± 0.006  0.319 ± 0.016  0.192 ± 0.009
A     95/100  −1.014 ± 0.005   0.022 ± 0.006  0.225 ± 0.010  0.136 ± 0.006
A     99/100  −0.975 ± 0.002   0.010 ± 0.003  0.099 ± 0.001  0.059 ± 0.001
B     1/2     −0.985 ± 0.010  −0.017 ± 0.006  0.565 ± 0.042  0.408 ± 0.017
B     2/3     −0.963 ± 0.009  −0.042 ± 0.009  0.495 ± 0.038  0.336 ± 0.023
B     4/5     −1.031 ± 0.006  −0.049 ± 0.005  0.434 ± 0.023  0.271 ± 0.012
B     9/10    −0.979 ± 0.003  −0.034 ± 0.004  0.305 ± 0.007  0.189 ± 0.006
B     95/100  −0.992 ± 0.003   0.010 ± 0.007  0.220 ± 0.006  0.134 ± 0.007
B     99/100  −0.990 ± 0.002   0.018 ± 0.002  0.100 ± 0.001  0.059 ± 0.001
C     1/2     −1.003 ± 0.012  −0.043 ± 0.010  0.504 ± 0.046  0.373 ± 0.028
C     2/3     −0.959 ± 0.008  −0.037 ± 0.006  0.435 ± 0.025  0.316 ± 0.014
C     4/5     −0.983 ± 0.010  −0.032 ± 0.008  0.376 ± 0.028  0.258 ± 0.015
C     9/10    −0.987 ± 0.008  −0.046 ± 0.006  0.294 ± 0.018  0.187 ± 0.009
C     95/100  −0.990 ± 0.004   0.029 ± 0.005  0.214 ± 0.006  0.135 ± 0.006
C     99/100  −0.992 ± 0.001   0.021 ± 0.003  0.100 ± 0.001  0.059 ± 0.001
D     1/2     −0.992 ± 0.010  −0.012 ± 0.007  0.351 ± 0.028  0.307 ± 0.017
D     2/3     −0.933 ± 0.016  −0.097 ± 0.011  0.320 ± 0.039  0.258 ± 0.023
D     4/5     −0.962 ± 0.005  −0.043 ± 0.009  0.294 ± 0.012  0.230 ± 0.016
D     9/10    −0.990 ± 0.008  −0.004 ± 0.008  0.250 ± 0.014  0.184 ± 0.011
D     95/100  −0.971 ± 0.005   0.036 ± 0.006  0.191 ± 0.007  0.134 ± 0.006
D     99/100  −0.982 ± 0.002   0.023 ± 0.003  0.097 ± 0.002  0.059 ± 0.002
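The scaling fixed point underlying this procedure can also be verified numerically: integrating Eqs. (4.5) and (4.6) forward in conformal time should drive σ/(ρw η) and γv towards constants, i.e. μ ≈ −1 and ν ≈ 0. The sketch below does this with a standard RK4 integrator, with σ set to 1 and parameter values that are merely illustrative (of the order of the best-fit values discussed below), not a reproduction of the actual fitting pipeline.

```python
import math

def k_of_v(v, k0, q, beta):
    # Momentum parameter, Eq. (4.3)
    x = (q * v * v) ** beta
    return k0 * (1.0 - x) / (1.0 + x)

def F_of_v(v, cw, d, r, k0, q, beta):
    # Energy loss function, Eq. (4.4)
    return cw * v + d * (k0 - k_of_v(v, k0, q, beta)) ** r

def evolve_vos(m, pars, eta0=1.0, eta_end=3000.0, rho0=1.0, v0=0.1):
    """Integrate the conformal-time wall VOS, Eqs. (4.5)-(4.6), with RK4
    and logarithmically growing steps; conformal Hubble parameter
    H = (m/(1-m))/eta for a scale factor a ~ t^m, and sigma = 1."""
    cw, d, r, k0, q, beta = pars
    lam = m / (1.0 - m)

    def rhs(eta, rho, v):
        H = lam / eta
        drho = -3.0 * v * v * H * rho - F_of_v(v, cw, d, r, k0, q, beta) * rho * rho
        dv = (1.0 - v * v) * (k_of_v(v, k0, q, beta) * rho - 3.0 * H * v)
        return drho, dv

    eta, rho, v = eta0, rho0, v0
    while eta < eta_end:
        h = 1e-3 * eta  # small fraction of a Hubble time
        k1 = rhs(eta, rho, v)
        k2 = rhs(eta + h / 2, rho + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = rhs(eta + h / 2, rho + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = rhs(eta + h, rho + h * k3[0], v + h * k3[1])
        rho += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        eta += h
    return eta, rho, v
```

For m = 1/2 this should yield an effective exponent μ close to −1 and an approximately constant velocity at late times, qualitatively matching the behavior reported in Table 4.1.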
The reason for this specific choice of fit range is related to two factors: at early enough times the network might not yet be in scaling (this can be especially true for Case D, as the timescale for re-entry is larger than in the other cases), and at late times there might not be enough walls to ensure good enough statistics when computing the mean squared velocity. The results can be found in Table 4.1 and point towards consistency with scaling for all sets of data. In fact, it is clear that even if the approach to scaling is affected by the value of the stretch factor and the expansion rate, all simulations achieve scaling given enough conformal time. This can be visually confirmed in Figs. 4.2 and 4.3. Having confirmed scaling, we then use the same minimization procedure from [32, 33] to obtain the best-fit model parameters, first for each case and then with all cases together (global fit). The output, together with one-sigma uncertainties (statistical, from the average of runs), can be found in Table 4.2. We confirm that the best-fit parameters for Case D are different from those of the other cases, which can be confirmed as well from the VOS model predictions for each velocity-dependent function and average quantities, as seen in Fig. 4.1.

Table 4.2 Best-fit parameters, with one-sigma statistical uncertainties, for the extended VOS for the 8192² domain wall simulations in the present work (Cases A, B, C and D, plus a global fit to all the data). For comparison, the bottom two rows show the parameters obtained for 4096³ simulations of standard walls in [32, 33]

Case       m range           cw            d            r            βw           k0           q
A          0.5 ≤ m ≤ 0.99    0.00 ± 0.01   0.28 ± 0.01  1.44 ± 0.12  1.92 ± 0.21  1.71 ± 0.01  5.07 ± 0.39
B          0.5 ≤ m ≤ 0.99    −0.00 ± 0.01  0.29 ± 0.02  1.57 ± 0.20  1.47 ± 0.18  1.73 ± 0.02  4.08 ± 0.43
C          0.5 ≤ m ≤ 0.99    0.00 ± 0.01   0.23 ± 0.03  1.84 ± 0.21  1.37 ± 0.17  1.73 ± 0.03  5.25 ± 0.52
D          0.5 ≤ m ≤ 0.99    −0.00 ± 0.01  0.13 ± 0.01  2.05 ± 0.22  1.30 ± 0.18  1.71 ± 0.05  8.97 ± 0.78
Global     0.5 ≤ m ≤ 0.99    0.00 ± 0.01   0.22 ± 0.02  2.10 ± 0.39  1.52 ± 0.30  1.72 ± 0.03  5.68 ± 0.89
Ref. [32]  0.5 ≤ m ≤ 0.9     0.00 ± 0.03   0.29 ± 0.01  1.30 ± 0.06  1.65 ± 0.17  1.72 ± 0.03  4.10 ± 0.17
Ref. [33]  0.2 ≤ m ≤ 0.9998  0.00 ± 0.08   0.26 ± 0.02  1.42 ± 0.04  1.08 ± 0.07  1.77 ± 0.03  3.35 ± 0.32

Fig. 4.1 Comparing the VOS model predictions, using the best-fit parameters listed in Table 4.2, for the scaling parameters σ/(ρη) (top left) and v (top right), the curvature parameter k(v) (bottom left) and the global energy losses parameter F(v) (bottom right). The fits for Cases A, B, C and D are shown separately in each panel

We remark that the latter figure also confirms that the extended model accurately describes scaling for all simulations. For a discussion of additional systematic uncertainties in these simulations, see [33]. In addition, we will also compare with the previously obtained calibrations from full 4096³ simulations shown in [32, 33].
Given the aforementioned uncertainties, all calibrations are mostly consistent. We point out the remarkable consistency in the value of the blob chopping parameter, cw = 0, indicating that in no case does blob chopping play a dominant role in energy loss, and in the maximal value of the momentum parameter, k0. As a larger expansion rate translates into stronger Hubble damping, this consistency is somewhat expected: at large expansion rates all anisotropic imprints on small-scale structure are quickly erased. This is not the case at low expansion rates, and indeed a rather different k(v) is expected there (clearly shown in the bottom panels of Fig. 4.1). This already describes how to calibrate the extended VOS model. However, we can now ask an additional question: does the VOS model describe the approach to scaling reasonably? To answer it, we insert the calibrated parameters into the equations, use the measured quantities at the first timestep as initial conditions, and numerically integrate the model. The results of this integration (dashed lines) and the measured average network quantities are shown in Figs. 4.2 and 4.3. We remark that the model seems to do quite well for low expansion rates, although (qualitatively) less so for higher ones. This might be due to the transient scaling regime identified in [33].
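The integration just described can be sketched as follows, using the conformal-time wall equations (4.5)–(4.6) with the generalized k(v) and F(v). The parameter values are the global best-fit ones of Table 4.2 and σ is set to unity; this is an illustrative sketch, not the actual calibration code:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Global best-fit parameters (Table 4.2); wall surface density sigma = 1.
cw, d, r, beta, k0, q = 0.00, 0.22, 2.10, 1.52, 1.72, 5.68
m = 0.5  # radiation era

def k(v):
    x = (q * v**2) ** beta
    return k0 * (1 - x) / (1 + x)

def F(v):
    return cw * v + d * (k0 - k(v)) ** r

def vos_rhs(eta, y):
    rho, v = y
    H = m / ((1 - m) * eta)            # conformal Hubble parameter
    drho = -3 * v**2 * H * rho - F(v) * rho**2
    dv = (1 - v**2) * (k(v) * rho - 3 * H * v)
    return [drho, dv]

sol = solve_ivp(vos_rhs, (1.0, 3000.0), [1.0, 0.1],
                rtol=1e-8, atol=1e-10, dense_output=True)
rho_end, v_end = sol.sol(3000.0)
# At late times the solution approaches the scaling attractor, with
# sigma/(rho*eta) of order 0.5 and v of order 0.34 for m = 1/2,
# of the order of the radiation-era entries of Table 4.1.
```

Swapping m and the Case-specific parameters reproduces the dashed curves of Figs. 4.2 and 4.3.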
Fig. 4.2 Comparing the evolution of the density ρ (left column) and the root mean squared velocity γv (right column) for simulations in the various cases and expansion rates. Each panel compares the results of the various expansion rates m for one specific Case (A, B, C or D).
In all panels the solid lines show the results of the simulations, while the dashed lines show the corresponding integration of the VOS, using the best-fit parameters of Table 4.2.

Fig. 4.3 As in Fig. 4.2, but each panel now compares the results of the four Cases (A, B, C and D) for a fixed expansion rate m; specifically, the cases m = 1/2 (radiation era), m = 2/3 (matter era), m = 9/10 and m = 99/100 are shown.

Given that this section illustrated how to calibrate the extended VOS model, we can now describe the calibration for cosmic string networks and all improvements made to the calibration pipeline.

4.3 Abelian-Higgs Cosmic Strings

4.3.1 Calibrations on Small Lattices–A First Approach

We will now exemplify the calibration of local Abelian-Higgs cosmic strings. This first calibration will be based on small lattices of size 512³, with lattice spacing Δx = 0.5 and with constant comoving string width. We will use 12 runs per expansion rate (to reduce statistical uncertainty), for 43 expansion rates and two sets of estimator choices–resulting in a total of 1032 simulations.
Some of the aforementioned choices (lattice spacing, lattice size, constant comoving width) and their impact on the model calibration need to be properly explored; however, this requires more hardware resources than were available at the time of this first calibration–we had only two Nvidia 1080Ti's (11 GB of video RAM) and one Nvidia Quadro P5000 (16 GB of VRAM), installed on different machines. We will therefore defer the exploration of several numerical choices to when more adequate hardware resources became available. The only choice that could be (and in fact was) explored with the same resources as this preliminary calibration was the impact of cooling on scaling, and the sensitivity of the model to the removal of radiation. Following the same recipe as for domain walls, we must first ascertain the constancy of ξ̇ and <v²>, the asymptotic quantities. There is a small numerical technicality that needs some explaining. One would expect the following scaling laws,

ξ ∝ η^μ ,   v² ∝ η^ν ,   (4.8)

with the expected values μ = 1 and ν = 0 indicating that the network has reached scaling; however, this is not completely true here. The observed scaling law (as seen in [10]) is of the form ξ ∝ (η − η0), with the extra offset η0 being a consequence of the choice of initial conditions. With sufficient dynamic range (so at late enough times) this value takes a less and less prominent role, and one therefore tends to the de facto expected scaling law ξ ∝ η. There are some strategies to drive this offset to zero, such as preparing the initial conditions to have correlations on a certain scale, starting the simulation at a later time after cooling the initial conditions (the strategy employed in [11, 21, 24]) or, for example, evolving the network in a high expansion rate Universe and then changing to the desired expansion rate.
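Since ξ ∝ (η − η0) is simply a straight line with a nonzero intercept, the offset can be read off from a linear fit. A small sketch of this (our own illustration, with a synthetic offset of 40, of the order of the values measured in this section):

```python
import numpy as np

def fit_offset(eta, xi):
    """Fit xi = s*(eta - eta0); return the slope s and the offset eta0."""
    s, intercept = np.polyfit(eta, xi, 1)
    return s, -intercept / s

# Synthetic correlation length with slope 0.15 and offset eta0 = 40.
eta = np.linspace(80.0, 128.0, 100)
s, eta0 = fit_offset(eta, 0.15 * (eta - 40.0))
```

In practice the fit is done on the run-averaged ξ(η) for each expansion rate, and the recovered η0 is then used to build the offset-corrected quantities discussed next.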
We have explored driving the offset to zero in a radiation epoch by using fast expansion first and then allowing normal evolution–this indeed reduces the value of the offset, as expected. Note, however, that this technique is not useful here: it has little to no effect for high enough expansion rates (as expected), and the details of the high-damping evolution need to be tuned for each expansion rate. From the point of view of the VOS equations (re-written in conformal quantities),

dξ/dη = (m/(1−m)) (ξ/η) v² + F(v)/2 ,   (4.9)

dv/dη = (1 − v²)[k(v)/ξ − (2m/(1−m)) (v/η)] ,   (4.10)

this different scaling law is also not an issue, as the quantity of interest is the slope of ξ, or ξ̇, meaning this rate of change can simply be identified with ξ/(η − η0). Given that the model will be calibrated from linear scaling behavior, we will use the asymptotic quantities v0 and ϵ, the latter defined as

ϵ = ξ / [(η − η0)(1 − m)] ,   (4.11)

where m is the expansion rate in any Universe with a ∝ t^m (t being physical time), or in conformal time a ∝ η^(m/(1−m)). The asymptotic quantities are obtained from the measured values of ξ and v² at linear scaling. For each expansion rate, we compute the average offset η0 in order to subsequently compute ϵ as well as its uncertainties. We find that this offset varies mildly from 36 to 48, depending on the expansion rate. It is also useful to re-write the VOS model assuming linear scaling as

F(v0) = 2ϵ[1 − m(1 + v0²)] ,   (4.12)

k(v0) = 2mϵv0 ,   (4.13)

which gives us a direct way to compare analytical expectations (standard or extended forms) of each velocity-dependent function with simulation output. Before such a comparison can be done we must, however, verify the exponents μ and ν (and therefore the quality of scaling) and use them to select a specific conformal time range. For this verification, we will also use two different correlation length estimators (winding and Lagrangian based estimators, see Eqs.
3.17 and 3.14, respectively) and two different velocity estimators (scalar conjugate momenta and equation of state based estimators, see Eqs. 3.18 and 3.20, respectively). The values of these two exponents and the respective asymptotic quantities can be found in Table 4.3; ϵ and v0 are furthermore depicted in Fig. 4.4. Our criteria for ensuring that the scaling assumption holds are those used in the domain walls case (previous section and [32, 33]): we demand a value of μ consistent with unity to at least two decimal places, and a ν deviating from zero by at most about 10%. The second criterion is relatively more lax due to the inherent difficulty of measuring velocities in field theory simulations. Using this information and these criteria we can then select the conformal time range for the calibration, which in the table and throughout the rest of the calibration will be η ∈ [80, 128].

Table 4.3 Relevant quantities measured from the two sets of simulations, for each expansion rate m: specifically the scaling exponents μ and ν, together with the mean correlation length divided by conformal time (corrected by an offset), ξ/(η − η0), and the mean velocity squared <v²>. The left side of the table uses the winding-based correlation length estimator and the equation of state based velocity estimator, while the right side uses the Lagrangian-based correlation length estimator and the field-based velocity estimator. All quantities are the result of the average of 12 simulations with different initial conditions

m       μW           νω           ξW/(η−η0)    <v²>ω        μL           νφ           ξL/(η−η0)    <v²>φ
0.50    0.999±0.005  0.024±0.004  0.307±0.004  0.549±0.006  1.000±0.005  0.047±0.007  0.309±0.004  0.513±0.008
0.51    1.000±0.005  0.003±0.005  0.310±0.004  0.547±0.006  1.000±0.005  0.014±0.006  0.311±0.004  0.512±0.008
0.52    0.999±0.005  0.003±0.005  0.303±0.004  0.544±0.006  0.999±0.005  0.023±0.007  0.303±0.004  0.510±0.009
0.53    0.999±0.005  0.008±0.004  0.300±0.004  0.544±0.006  0.999±0.005  0.027±0.006  0.300±0.004  0.510±0.008
0.54    0.999±0.004  0.004±0.004  0.298±0.003  0.541±0.005  0.999±0.004  0.019±0.006  0.299±0.003  0.508±0.007
0.55    0.999±0.004  0.017±0.004  0.297±0.003  0.539±0.005  1.000±0.004  0.034±0.006  0.299±0.003  0.506±0.007
0.56    0.999±0.003  0.009±0.004  0.292±0.002  0.536±0.005  0.999±0.003  0.024±0.005  0.291±0.002  0.504±0.007
0.57    0.999±0.003  0.023±0.004  0.291±0.002  0.533±0.005  0.999±0.003  0.043±0.005  0.291±0.002  0.501±0.006
0.58    0.999±0.003  0.036±0.005  0.292±0.002  0.530±0.006  0.999±0.003  0.057±0.006  0.292±0.002  0.499±0.007
0.59    0.999±0.003  0.033±0.005  0.288±0.003  0.525±0.006  0.999±0.003  0.054±0.006  0.287±0.003  0.494±0.008
0.60    0.999±0.003  0.027±0.005  0.288±0.002  0.522±0.006  0.999±0.003  0.045±0.007  0.287±0.003  0.491±0.008
0.61    0.999±0.003  0.029±0.005  0.289±0.002  0.518±0.006  0.999±0.003  0.046±0.006  0.288±0.003  0.488±0.008
0.62    0.999±0.003  0.043±0.005  0.291±0.002  0.515±0.006  0.999±0.003  0.060±0.007  0.290±0.002  0.484±0.008
0.63    0.999±0.003  0.051±0.005  0.292±0.003  0.511±0.006  0.999±0.003  0.066±0.007  0.290±0.003  0.481±0.008
0.64    0.999±0.003  0.054±0.005  0.293±0.003  0.507±0.006  0.999±0.003  0.073±0.007  0.292±0.003  0.477±0.008
0.6(6)  1.000±0.003  0.073±0.006  0.292±0.002  0.496±0.007  1.000±0.003  0.091±0.008  0.290±0.002  0.466±0.009
0.68    0.999±0.003  0.070±0.006  0.293±0.002  0.488±0.007  1.000±0.003  0.089±0.009  0.291±0.002  0.459±0.010
0.69    0.999±0.003  0.080±0.006  0.292±0.003  0.483±0.007  1.000±0.003  0.102±0.009  0.290±0.003  0.454±0.010
0.70    1.000±0.003  0.085±0.006  0.293±0.002  0.477±0.008  1.000±0.003  0.107±0.010  0.290±0.002  0.448±0.010
0.71    1.000±0.003  0.084±0.006  0.291±0.002  0.471±0.007  1.000±0.003  0.105±0.009  0.288±0.002  0.442±0.010
0.72    0.999±0.003  0.081±0.007  0.287±0.002  0.463±0.008  1.000±0.003  0.106±0.010  0.283±0.002  0.435±0.011
Table 4.3 (continued)

m       μW           νω           ξW/(η−η0)    <v²>ω        μL           νφ           ξL/(η−η0)    <v²>φ
0.73    0.999±0.004  0.083±0.007  0.283±0.003  0.454±0.008  0.999±0.003  0.106±0.011  0.279±0.003  0.426±0.011
0.74    0.999±0.004  0.074±0.008  0.280±0.003  0.446±0.009  1.000±0.004  0.091±0.012  0.275±0.003  0.418±0.011
0.75    0.999±0.004  0.060±0.009  0.278±0.003  0.438±0.009  0.999±0.003  0.071±0.014  0.273±0.002  0.411±0.012
0.76    0.999±0.004  0.062±0.009  0.277±0.003  0.431±0.009  1.000±0.003  0.072±0.014  0.272±0.002  0.404±0.012
0.77    1.000±0.004  0.083±0.008  0.277±0.003  0.424±0.009  1.000±0.004  0.094±0.012  0.271±0.003  0.398±0.011
0.78    1.000±0.004  0.084±0.009  0.274±0.003  0.416±0.009  1.000±0.003  0.095±0.013  0.268±0.002  0.389±0.011
0.80    1.000±0.003  0.075±0.008  0.267±0.002  0.397±0.008  1.000±0.003  0.083±0.012  0.259±0.002  0.371±0.010
0.81    1.000±0.003  0.074±0.008  0.266±0.002  0.388±0.008  1.000±0.003  0.079±0.012  0.257±0.002  0.363±0.010
0.82    1.000±0.003  0.078±0.008  0.261±0.002  0.378±0.007  1.000±0.003  0.083±0.012  0.252±0.002  0.353±0.010
0.83    1.000±0.003  0.091±0.008  0.259±0.002  0.369±0.007  1.000±0.004  0.098±0.012  0.249±0.002  0.344±0.009
0.84    1.000±0.004  0.101±0.008  0.254±0.002  0.359±0.007  1.000±0.004  0.109±0.011  0.245±0.002  0.335±0.009
0.85    1.000±0.004  0.106±0.008  0.250±0.002  0.347±0.007  1.000±0.004  0.116±0.012  0.240±0.002  0.323±0.009
0.86    1.000±0.004  0.102±0.008  0.245±0.003  0.336±0.007  1.000±0.004  0.109±0.012  0.235±0.002  0.312±0.009
0.87    1.000±0.004  0.101±0.008  0.240±0.003  0.324±0.006  1.000±0.004  0.108±0.011  0.230±0.003  0.300±0.008
0.88    1.000±0.005  0.095±0.008  0.235±0.003  0.311±0.006  1.000±0.005  0.098±0.011  0.224±0.003  0.288±0.007
0.89    1.000±0.005  0.091±0.007  0.229±0.003  0.299±0.006  1.000±0.005  0.090±0.010  0.218±0.003  0.275±0.007
0.90    1.000±0.004  0.092±0.006  0.220±0.003  0.285±0.005  1.000±0.005  0.089±0.009  0.209±0.003  0.262±0.006
0.91    1.000±0.004  0.097±0.006  0.212±0.002  0.271±0.004  1.000±0.004  0.093±0.009  0.201±0.002  0.248±0.005
0.92    1.000±0.004  0.106±0.005  0.202±0.002  0.256±0.004  1.000±0.004  0.106±0.008  0.191±0.002  0.233±0.005
0.93    1.000±0.004  0.097±0.005  0.191±0.002  0.241±0.003  1.000±0.004  0.097±0.007  0.180±0.002  0.217±0.004
0.94    0.999±0.003  0.077±0.005  0.180±0.002  0.224±0.003  0.999±0.004  0.070±0.006  0.169±0.002  0.199±0.003
0.95    0.999±0.003  0.070±0.004  0.169±0.001  0.207±0.002  0.999±0.003  0.053±0.006  0.159±0.001  0.181±0.003

Fig. 4.4 Asymptotic values of ϵ (top left panel) and of the root mean squared velocity (top right panel) for the two pairs of estimators used in our production runs. The bottom panel shows the relative difference between the pairs of estimators, showing that the difference between the obtained velocities is in the range 6%–12%, while for the correlation length estimators it is at most 6%

In conclusion, all networks have reached the scaling regime and can therefore be used to calibrate the VOS. Here we will also make another choice, given that exploring which velocity estimator should be used would require larger resolution (a larger lattice with smaller spacing); this will be explored in a later section. For now, we make an educated guess and use only the equation of state estimator, as in [24] the conjugate momenta estimator seemed to underestimate the velocity of an oscillating string in Minkowski space. This underestimation is evident in expanding Universes as well, in our asymptotic quantities in Table 4.3 and in Fig. 4.4. The difference is maximal at larger expansion rates (of order 12%), and minimal (about 6%) in the radiation epoch.
On the same note, both estimates of ϵ agree very well at low expansion rates, but begin to disagree at higher ones (by at most about 6%). We remark that here there is no literature reference as to which estimator performs better; as such, we will attempt a calibration with each of them. The small disagreement at large expansion rates will cause some differences in the calibration, as will be shown next. Before we proceed to the calibration, let us briefly remind the reader of the extended forms of the momentum parameter k(v) and the energy loss function F(v), first proposed for domain walls in [32, 33] and here used for cosmic strings. The first velocity-dependent function takes the following form,

k(v) = k0 [1 − (qv²)^β] / [1 + (qv²)^β] ,   (4.14)

where β, q and k0 are free parameters. This form is general enough to also reduce (via appropriate choices) to the analytical Nambu-Goto ansatz of Eq. 2.62. We also note that a k0 larger than unity can be a sign of wiggliness. The second function, the energy loss, is modified to include a scalar and gauge radiation term (in addition to loop production),

F(v) = cv + d[k0 − k(v)]^r ,   (4.15)

where d and r are additional free parameters. Note that the additional power-law term is motivated by the idea that uniformly moving defects do not radiate–only perturbations of the defect surface will. As a fast expansion rate will smooth out the presence of structure, it also makes sense that the energy loss function reduces to only the loop production term at large expansion rates. We additionally remark that the radiative power law is limited to some extent, as it cannot distinguish between different types of radiation (scalar, gauge, massive, massless). In any case, it will allow us to pinpoint which energy loss term plays a dominant role in sustaining scaling for any given expansion rate.
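For concreteness, the two ansatze of Eqs. 4.14–4.15 can be written down directly. The sketch below checks the limiting behaviours mentioned in the text: k(0) = k0, k vanishing at v = 1/√q, and F reducing to the loop term cv as v → 0 (the parameter values are illustrative only, of the order of those calibrated later):

```python
import math

def k_ansatz(v, k0, q, beta):
    """Extended momentum parameter, Eq. 4.14."""
    x = (q * v * v) ** beta
    return k0 * (1 - x) / (1 + x)

def F_ansatz(v, c, d, r, k0, q, beta):
    """Extended energy loss function, Eq. 4.15: loop chopping + radiation."""
    return c * v + d * (k0 - k_ansatz(v, k0, q, beta)) ** r

# Illustrative parameter values only.
k0, q, beta, c, d, r = 1.37, 2.30, 1.46, 0.34, 0.21, 1.85
vmax = 1.0 / math.sqrt(q)  # velocity at which the momentum parameter vanishes
```

Note that the radiative term switches off for uniformly moving (low-velocity, structureless) networks, exactly as the text argues.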
At this point we are ready to calibrate the extended VOS and phenomenologically test our ansatz for the velocity-dependent functions. Here we stress that the large number of free parameters–six, instead of a single one–is not a problem in and of itself, as the large range and number of expansion rates will allow us to completely understand the velocity dependencies of each phenomenological function and to numerically measure each parameter to a good level of statistical significance. It is in fact crucial that expansion rates one would not expect–at least in the standard cosmological picture–are used; without them the uncertainties would become larger and larger. In a later section (and with all other sources of uncertainty under control) we will explore this effect and determine whether it is possible to calibrate this extended model in a more restricted (and realistic) expansion rate range, m ∈ [0.5, 2/3]. As was done in the previous section and in the calibrations of [32, 33], we now apply a bootstrap procedure to compare model and simulation data. The resulting calibrated model parameters and their uncertainties for the two choices of correlation length estimators are shown in Table 4.4. We additionally list the calibrations of the domain walls VOS for a range of expansion rates comparable to that used in our work (relativistic regime; from [32]) and additionally including ultra-relativistic and non-relativistic networks (from [33]). Before we comment on the values of the parameters, let us first ascertain whether the model predictions are in line with what is expected from simulations, and whether our ansatze for the velocity-dependent functions provide a better description of simulation data than the standard ones.
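The bootstrap procedure can be illustrated schematically: resample the per-run measurements with replacement, refit, and take the spread of the refitted values as the uncertainty. A toy version with a single quantity (the real pipeline fits all six VOS parameters simultaneously; names and numbers here are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "per-run" measurements: 12 runs measuring the same asymptote.
truth = 0.55
runs = truth + 0.02 * rng.standard_normal(12)

def bootstrap(samples, n_boot=2000):
    """Mean and bootstrap one-sigma uncertainty of the sample mean."""
    idx = rng.integers(0, len(samples), size=(n_boot, len(samples)))
    means = samples[idx].mean(axis=1)
    return samples.mean(), means.std()

estimate, sigma = bootstrap(runs)
```

In the actual calibration the resampling is done over the 12 simulations per expansion rate, and the minimization of the previous section is repeated on each resample.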
Taking the inverted VOS equations to generate a “measured” value of the momentum parameter and of the energy loss function, we then compare these with the standard and extended calibrated functions—the comparison can be seen in Fig. 4.5. As evidenced by the orange solid lines, the standard ansatz fails to accurately reproduce the measured k(v) and F(v) over the extended range of expansion rates. The extended ansatze, however, provide a better fit, as illustrated by the blue lines. In addition, we can also plot the measured asymptotes and check whether the VOS model provides a reasonable fit for all expansion rates. This comparison can be found in Fig. 4.6, and it shows that the extended model predicts the asymptotic quantities very well at larger expansion rates, slightly less so for lower ones. We will see in the next section that in part this can be attributed to the computation of the offset in ϵ, and to the presence of radiative contaminants stemming from the initial conditions.

Table 4.4 Calibrated parameters for the cosmic strings VOS model, obtained from the two sets of GPU-based simulations in this work and corresponding to the winding-based and Lagrangian-based correlation length estimators described in the text. For comparison we show the analogous parameters for the domain walls VOS model (obtained in the literature), both for a range of expansion rates comparable to the one in this section and for a wider range of expansion rates

Parameter  Cosmic strings (Winding)  Cosmic strings (Lagrangian)  Domain walls (Relativistic)  Domain walls (All)
Reference  This section              This section                 [32]                         [33]
m range    [0.50, 0.95]              [0.50, 0.95]                 [0.50, 0.90]                 [0.20, 0.9998]
k0         1.37 ± 0.07               1.27 ± 0.09                  1.72 ± 0.03                  1.77 ± 0.03
q          2.30 ± 0.04               2.27 ± 0.05                  4.10 ± 0.17                  3.35 ± 0.32
β          1.46 ± 0.07               1.54 ± 0.09                  1.65 ± 0.12                  1.08 ± 0.07
r          1.85 ± 0.11               1.66 ± 0.10                  1.30 ± 0.06                  1.42 ± 0.04
d          0.21 ± 0.01               0.26 ± 0.01                  0.29 ± 0.01                  0.26 ± 0.02
c          0.34 ± 0.02               0.31 ± 0.02                  0.00 ± 0.03                  0.00 ± 0.08
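The “inverted” VOS expressions are just Eqs. 4.12–4.13 read as definitions: each measured triplet (m, ϵ, v0) yields one “measured” point of F and k. A minimal sketch, which also verifies that the inversion is consistent with the linear scaling solution of Eqs. 4.9–4.10 (the numerical inputs are illustrative, not actual measurements):

```python
def measured_F(m, eps, v0):
    """Inverted VOS, Eq. 4.12."""
    return 2.0 * eps * (1.0 - m * (1.0 + v0**2))

def measured_k(m, eps, v0):
    """Inverted VOS, Eq. 4.13."""
    return 2.0 * m * eps * v0

def scaling_residuals(m, eps, v0):
    """Residuals of Eqs. 4.9-4.10 on the linear scaling solution
    xi = eps*(1-m)*eta, evaluated at large eta (offset neglected)."""
    F, k = measured_F(m, eps, v0), measured_k(m, eps, v0)
    res_xi = eps * (1.0 - m) - (m / (1.0 - m)) * eps * (1.0 - m) * v0**2 - F / 2.0
    res_v = k / (eps * (1.0 - m)) - 2.0 * m * v0 / (1.0 - m)
    return res_xi, res_v

r1, r2 = scaling_residuals(0.5, 0.61, 0.55)  # illustrative radiation-era values
```

Both residuals vanish identically, confirming that the inversion reproduces the scaling fixed point by construction.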
Now that we have shown that the calibrations provide a reasonable description of simulation data, we can comment on the parameter values themselves. A first remark is that the winding and Lagrangian based estimators lead to compatible VOS model parameters, given the inferred uncertainties. The largest difference occurs for the parameter d, but given that it is of order two standard deviations, it is not statistically significant. Another noteworthy feature is the fact that the maximal momentum parameter value k0 exceeds unity. This might indicate the presence of additional internal structure on strings—wiggles. A more detailed study of the presence of wiggles in these simulations, in the vein of [23, 30] or by comparison with the wiggly VOS [31, 40], is a possible next step. Also of note is the preferred value of β, which disagrees with the Nambu-Goto ansatz prediction (of β = 3), being instead roughly half this value. This partially explains why the standard momentum parameter ansatz 2.62 cannot correctly reproduce the velocity dependencies of the dynamical quantities (see again Fig. 4.5). Parameter degeneracies might also be partially responsible for the difference between the analytical and the free β.

Fig. 4.5 Comparisons between the analytic VOS model predictions (solid lines) and the simulation outputs (data points) for both the momentum parameter k(v) (top panels) and a generalized energy loss function F(v) (bottom panels). Left side and right side panels correspond to the winding-based and Lagrangian-based correlation length estimators discussed in the text. In each case we show the simulation diagnostics used as input for the inverted VOS expressions. We show for comparison both the previous and extended versions of k(v) and F(v) (depicted in red and blue lines, and given respectively by Eqs. 2.62–2.63 and 2.70–2.73) in order to emphasize that the previous one provides a poor fit while the extended one provides a very good one. To facilitate comparisons with previous works the radiation and matter era values are explicitly indicated

It is also curious to compare the calibrated model parameters for strings and walls, as shown in Table 4.4. First, the normalization parameters for each function are clearly larger in the domain walls case. This is not completely surprising since, for instance, q will by definition depend on the dimensionality of the defects. The predicted values for the maximal network velocity, for which the momentum parameter would vanish, would be vmax = 0.5 for walls and around vmax = 0.66 for strings. The measured values of q are indeed in agreement with this prediction in both cases, giving for strings a maximal velocity of v ∼ 0.66, while for domain walls v ∼ 0.5 (or v ∼ 0.55 if one considers both the non-relativistic and ultra-relativistic regimes). When it comes to the exponent parameters, such as r and β, the first is somewhat larger for cosmic strings, and for the latter parameter the situation is less clear.
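The relation between q and the maximal velocity quoted above is a one-line check: the momentum parameter of Eq. 4.14 vanishes at vmax = 1/√q, whatever the value of β. Using the central values of Table 4.4 (labels are our own):

```python
import math

# vmax = 1/sqrt(q) for each calibration in Table 4.4 (central values).
vmax = {label: 1.0 / math.sqrt(q)
        for label, q in [("strings (winding)", 2.30),
                         ("strings (Lagrangian)", 2.27),
                         ("walls, relativistic [32]", 4.10),
                         ("walls, all rates [33]", 3.35)]}
# q = 2.30 gives vmax ~ 0.66 for strings; q = 4.10 gives vmax ~ 0.49
# for relativistic walls, and q = 3.35 gives vmax ~ 0.55 for the wider range.
```

These reproduce the values v ∼ 0.66, v ∼ 0.5 and v ∼ 0.55 quoted in the text.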
If one compares strings with walls in a comparable expansion rate range (relativistic), then both seem compatible with the same β. However, as soon as non-relativistic and ultra-relativistic walls are taken into account, a β of value one is obtained. In a later section we will explore whether it is possible to probe the non-relativistic regime, in order to try to answer this question.

Fig. 4.6 Comparison between simulation outputs and the calibrated extended VOS model prediction for the rate of change of ξ (specifically ξ/η, top panels) and the root mean square velocity (bottom panels). Left side and right side panels correspond to the two different choices of correlation length estimators, winding-based and Lagrangian-based, described in the text. To facilitate comparisons with previous works the radiation and matter era values are explicitly indicated

The last parameters we still need to discuss are the most important for observational signatures—the energy loss parameters c and d. The normalization of radiative losses, d, seems to be very similar across all cases, and it is tempting to speculate that there might be a universal value for it, applicable also to field theory simulations of other defect networks, such as global monopoles [28] and semi-local strings [1]. This requires testing on a case-by-case basis.
The most striking difference is in the loop chopping parameter, c, which for walls is always consistent with zero, independently of which expansion rates are included, while for cosmic strings it is very clearly different from zero (at a high level of statistical significance). This preliminary calibration seems to lead us to the conclusion that the energy loss mechanisms are different for walls and strings: in the former, radiative losses are the dominant mechanism, while for strings this is not the case. To explore the relative importance of each mechanism in gauged strings, we can evaluate the ratio of the two energy loss terms in the evolution equation for the correlation length, for each expansion rate (that is, for each velocity). This ratio defines the parameter R,

R = Loop losses / Radiation losses = cv / (d[k0 − k(v)]^r) .   (4.16)

Using the obtained model parameters along with the velocities from the simulations, we find that in the radiation epoch (m = 1/2) the ratio takes the value

R_rad ∼ 0.82 ,   (4.17)

while in the matter era (m = 2/3),

R_mat ∼ 1.06 ,   (4.18)

indicating that in the matter era loop production and radiative losses contribute in nearly equal parts to the energy loss, while in the radiation era radiative losses become more important. This is expected, as faster expansion rates (smaller velocities) should result in a less significant role for radiative losses. As a final comparison, taking an expansion rate of m = 0.9,

R_0.9 ∼ 6.92 ,   (4.19)

and it is seen that in this extreme regime loop production is the dominant energy loss mechanism. As a concluding remark, we can also compare this calibration of the VOS model with the calibrations of the standard VOS for both Nambu-Goto and field theory simulations. In [34] it was found that Nambu-Goto simulations (in radiation and matter eras) result in a value of c = 0.23 ± 0.04, whereas for field theory simulations a value of c = 0.57 ± 0.05 is preferred.
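The ratio of Eq. 4.16 can be evaluated directly from the calibrated ansatze. The sketch below uses the winding-based central values of Table 4.4 and placeholder velocity arguments; the quoted numbers 0.82, 1.06 and 6.92 come from the actual simulation velocities, which we do not reproduce here:

```python
def k_v(v, k0=1.37, q=2.30, beta=1.46):
    """Extended momentum parameter, Eq. 4.14, winding-based central values."""
    x = (q * v * v) ** beta
    return k0 * (1 - x) / (1 + x)

def loss_ratio(v, c=0.34, d=0.21, r=1.85, k0=1.37, q=2.30, beta=1.46):
    """R = loop losses / radiative losses, Eq. 4.16."""
    return c * v / (d * (k0 - k_v(v, k0, q, beta)) ** r)

# Slow networks (fast expansion) are loop-dominated, faster networks less so:
# R grows without bound as v -> 0, since the radiative term switches off.
ratios = [loss_ratio(v) for v in (0.2, 0.35, 0.5)]
```

The monotonic decrease of R with velocity is what makes loop production dominant in the m = 0.9 regime while radiation dominates in the radiation era.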
Our new result differs from the former significantly (at the level of two standard deviations), which is to some extent expected, as c can have correlations with other parameters (this will be analyzed in greater detail later on); in particular, note that the form of k(v) is also very different in the Nambu-Goto and Abelian-Higgs cases. Compared to field theory simulations, the new result is also smaller, which can be expected due to the combined effect of the non-inclusion of an explicit radiative term and a different k(v). In addition, at the time of writing the results whereupon this section is based, we did not know how certain possible sources of systematics could impact our results and the conclusions drawn from them. As such, the rest of the chapter is an exercise in improving these calibrations, working towards a definitive calibration of the VOS for field theory strings.

4.3.2 Overcooled Initial Conditions

Having already established what a preliminary calibration of the Extended VOS for field theory gauged strings predicts, we now need to march towards a definitive calibration, with all possible sources of systematics under control. Although most of these sources of systematic error will require more computing firepower, there is one source which can be investigated without the need for larger lattices. This source pertains to the use of cooling procedures in the initial conditions. In many instances of gauged string simulations (field theory), the simulation is started at a later time to reduce the offset η0 to zero. Hubble damping can significantly delay the formation of a string network (this depends on the size of the Hubble length relative to the string radius), and as such a period of cooling is applied to accelerate the formation of strings. Therein also lies an additional advantage: most random initial conditions have large gradients which result in the appearance of extra radiation in the simulation box, which one cannot easily remove.
This radiation manifests itself as an oscillation of the Lagrangian estimator (see Fig. 4.7, top left-hand panel), or as oscillations in the velocity computation (see the top right-hand panel of the same figure). This radiation is not damped out at sufficiently low expansion rates (being evident, for instance, in the radiation epoch). Cooling (also known as gradient flow) corresponds to evolving the fields according to the discrete form of the following equations of motion,

φ̇ = Dj Dj φ − (λ/2)(|φ|² − 1)φ (4.20)

Ḟ0j = ∂i Fij − 2a²e² Im[φ∗ Dj φ] , (4.21)

which can be obtained from the physical equations of motion for the system by assuming all second-order time derivatives of the fields to be null and no Hubble damping (i.e. the ȧ/a term is set to zero). The timestep size for this period is set to δη = 1/30, and the parameters λ and e are set to the same values as the corresponding physical couplings λ0 and e0 used in the cosmological part of the run. Since the extended VOS model explicitly accounts for separate energy loss mechanisms, in the form of loop chopping and radiation (scalar or gauge) emission, calibrations with differing degrees of cooling will allow us to pinpoint exactly how much radiation is removed, whether there is any enhancement of loop production (to sustain scaling), or how small-scale structure might be affected. To this effect we will produce three sets of simulations, with three different amounts of cooling:

• Standard, no-cooling case, where simulations start at conformal time η = 1. This is basically the calibration set from the previous section, and we will refer to it as the Hot case;
• A case with some amount of cooling, with the same initial conditions as the previous set but with a dissipation period applied from time ηcool = −10.0 to η = 1.0.
This will be denominated the Warm case;
• A third case with more cooling applied, i.e. with the same initial conditions but a cooling period from ηcool = −50.0 until the initial conformal time η = 1.0. We will refer to this overcooled case as Cold.

Fig. 4.7 The evolution of the mean string separation (left) and mean velocity squared (right) according to the Lagrangian estimator and the equation of state estimator, averaged over sets of 12 runs at each expansion rate in the range [0.50, 0.95]. The top panels show the results for the Hot case (standard case, without cooling), while the middle and bottom panels show the Warm and Cold cases. Low expansion rates are at the top of the panels while high expansion rates are at the bottom of the panels. All simulations have box sizes 512³ with constant comoving width (PRS algorithm)

We remark that in all three cases the cosmological evolution starts at conformal time η = 1.0, and the only difference in terms of initial conditions for this period is the degree of cooling. All calibrations of the VOS are done with the scaling regime observed during the cosmological evolution, so as to ascertain the effects of varying degrees of cooling on the obtained parameters. For each set of simulations we will use 43 expansion rates in the range m ∈ [0.5, 0.95] in FLRW Universes with the scale factor varying as a ∝ t^m, each with 12 runs, exactly as was done for the standard case. Radiation and matter epochs correspond to m = 1/2 and m = 2/3, respectively.
In order to re-write the VOS in the more compact form, and to calibrate it, we will measure two quantities from the simulations, ε and v0, with the first being given by

ε = ξ / [(1 − m)(η − η0)] , (4.22)

and v0 being the square root of the mean squared velocity at scaling. We note again that η0 is nothing more than an offset dependent on the initial conditions (see the previous section). The first quantity implicitly assumes one can approximate the slope of ξ (the rate of change of ξ) by ξ/η or, given the presence of the offset, by ξ/(η − η0). As before, this means we expect the following scaling laws,

ξ ∝ (η − η0)^μ (4.23)

v ∝ η^ν , (4.24)

with the analytical expectation that the exponents μ and ν reach unity and zero, respectively, once the network exhibits scaling behavior. As done before for the standard case (previous section), we will now verify the quality of scaling from the fitted values of these exponents and use them to select a "good enough" calibration conformal time range. For the Warm and Cold cases we will therefore use a more stringent fitting range of η ∈ [100, 128]. The scaling exponents obtained and the network parameters for each expansion rate are listed in Table 4.5. With these network parameters measured from the simulations, we can now calibrate the Extended VOS via the same bootstrapping procedure of the previous sections (and of the earlier domain wall calibrations of [32, 33]). The resulting parameters for the three sets of simulations are shown in Table 4.6. A comparison between VOS predictions and simulation data for the asymptotic quantities is shown in Fig. 4.8 (top panels), which shows the model accurately describes the simulations.
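The exponent fits of Eqs. (4.23)–(4.24) and the measurement of ε from Eq. (4.22) reduce to simple least-squares fits in log-log space. A minimal sketch (here η0 is assumed to have been measured already; in the actual pipeline it is obtained per run):

```python
import numpy as np

def scaling_diagnostics(eta, xi, v, eta0, m):
    """Fit xi ~ (eta - eta0)**mu and v ~ eta**nu by linear least squares in
    log-log space (Eqs. 4.23-4.24), and evaluate the slope parameter
    epsilon = xi / ((1 - m)(eta - eta0)) of Eq. (4.22) at the final time.
    Scaling corresponds to mu -> 1 and nu -> 0."""
    mu = np.polyfit(np.log(eta - eta0), np.log(xi), 1)[0]
    nu = np.polyfit(np.log(eta), np.log(v), 1)[0]
    eps = xi[-1] / ((1.0 - m) * (eta[-1] - eta0))
    return mu, nu, eps
```

On a perfectly scaling synthetic network (ξ linear in η − η0, constant v), this recovers μ = 1, ν = 0 and the exact slope.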
In addition, one can also use the inverted VOS equations to display a measured energy loss function or momentum parameter and compare with the analytical ansatz for their forms (inserting, of course, the calibrated parameters as appropriate). This is shown in the bottom panels of Fig. 4.8.

Table 4.5 Scaling exponents μ and ν and network parameters (ε and v²) used for VOS calibration, for the Warm and Cold initial conditions cases, at each expansion rate m ∈ [0.5, 0.95]. One-sigma statistical uncertainties, from averaging sets of 12 simulations, are reported throughout. (Table entries omitted here.)

Table 4.6 Calibrated VOS model parameters for our three cooling scenarios: Hot (standard), Warm and Cold initial conditions. These were obtained through the previously used bootstrap methods

Case | d | r | β | k0 | q | c | Reference
Hot | 0.21±0.01 | 1.85±0.11 | 1.46±0.07 | 1.37±0.07 | 2.30±0.04 | 0.34±0.02 | Previous section
Warm | 0.26±0.01 | 1.58±0.10 | 1.29±0.06 | 1.21±0.06 | 2.05±0.04 | 0.36±0.03 | This section
Cold | 0.17±0.01 | 1.64±0.09 | 1.91±0.03 | 0.97±0.03 | 2.38±0.02 | 0.56±0.01 | This section

Fig. 4.8 Top panels: the string network average velocity and dimensionless comoving string separation, v = √⟨v²⟩ and ξ/η, respectively in the left and right panels, for the three cooling scenarios. Bottom panels: the momentum parameter and the energy loss function (left and right panels, respectively) for the same cooling scenarios. In all cases the error bars are the statistical uncertainties from averaging over 12 simulations with different initial conditions, and the solid line is the prediction from the VOS model, with the calibrated parameters listed in Table 4.3. For convenience, the values corresponding to simulations in the radiation and matter eras have been highlighted

Our results support the expectation that a small amount of cooling has very little impact on the network evolution, as its effect is only to remove thermal oscillations. Indeed, from Fig.
4.8 and Table 4.6 we can see that the Warm case parameters bring about a better agreement between model prediction and simulation data, and that the changes in parameters are not statistically significant, especially considering that the (purely statistical) parameter uncertainties might be optimistic; more on this in the next section. In contrast, for the case with a large amount of cooling (Cold) there are clear differences which are much more significant from a statistical point of view. Take for instance the observationally important loop chopping parameter c. This parameter clearly increases with excessive cooling: c = 0.34 → 0.56 when going from the Hot to the Cold case. This can be interpreted as the analytical model correctly identifying the reduced amount of radiation in the box as being compensated by a larger loop production term. Another parameter of note is the maximal value of the momentum parameter, k0. As previously mentioned, this parameter can indicate that the curvature radius is not the same as the correlation length or the mean string separation, which acts as a clue for the presence of small-scale structure. Going from the Hot to the Warm and then to the Cold case, there are clear statistically significant differences, as the best-fit values are reduced from 1.37 to 1.21 and then to 0.97, respectively. We remark that this effect is rather subtle, as the velocities are not particularly affected, with only the correlation length showing some differences (statistically significant when going from the Warm to the Cold case, but see the next section).

4.3.2.1 An Improved Calibration Pipeline

In preparation for the larger runs (4096³ and 8192³) used to investigate possible systematic sources of error in the calibration of the Extended VOS, we sought to improve our calibration pipeline.
The more robust pipeline is the result of two added features: full uncertainty propagation, together with a new computation of the offset η0, and Bayesian inference for parameter estimation. Considering the existence of parameter degeneracies and the possible expansion-rate dependence of the uncertainties (specifically for ε), we must discuss the impact of each feature on the previous conclusions.

The first improvement (full uncertainty propagation) relies on the uncertainties Python package to automatically propagate uncertainties for the velocity and the mean string separation. For each case, at each expansion rate, the average and standard deviation are obtained and stored in ufloat arrays from the aforementioned uncertainties package. In addition, we compute the offset η0 from each run, storing the mean offset and its standard deviation in such an array (for each expansion rate). The new uncertainties can be seen in the top two panels of Fig. 4.9. Velocity uncertainties are not significantly affected by this procedure; however, since the computation of the offset has changed, the uncertainties of ε increased. From here on out, all uncertainties are propagated automatically, which is much more convenient, less error-prone, and independent of the structure of the VOS given to the pipeline. This can already be seen in action in the uncertainty computation of F(v) and k(v) in the bottom panels of Fig. 4.9. Broadly speaking, the uncertainties of F(v) now show the opposite pattern of what is seen in the previous figure: they become larger at smaller expansion rates, and smaller at high ones.

Fig. 4.9 Same as Fig. 4.8, but including the uncertainty propagation described in the main text and with the solid line now being the prediction from the VOS model with the calibrated parameters listed in Table 4.6

Table 4.7 Same as Table 4.3, but including the uncertainty propagation described in the text

Case | d | r | β | k0 | q | c | Reference
Hot | 0.20±0.01 | 2.06±0.13 | 1.54±0.06 | 1.38±0.02 | 2.38±0.03 | 0.35±0.01 | This section
Warm | 0.21±0.01 | 1.68±0.12 | 1.41±0.05 | 1.27±0.02 | 2.24±0.03 | 0.37±0.01 | This section
Cold | 0.19±0.01 | 2.00±0.10 | 1.95±0.03 | 0.98±0.01 | 2.45±0.01 | 0.58±0.01 | This section

Regarding the VOS model parameters, we must now verify the impact of the new uncertainty computation procedure. The new parameters for all cases are listed in Table 4.7. Comparing with the previous parameters, the changes are small, within at most two standard deviations. The changes do seem larger in the Cold case than in the Hot and Warm cases, and overall the uncertainties of the model parameters seem to decrease. This can be due to a multitude of reasons, including parameter degeneracies or even uncertainty underestimation. The latter can be the result of a subtle point: the difference between having parameter distributions marginalized over the values of other parameters or conditional on other parameters.

For these reasons, we improved the VOS calibrator tool by implementing Markov Chain Monte Carlo (MCMC) capabilities. To be more specific, we used the emcee¹ [22] package to implement such Bayesian inference capabilities. Besides improving uncertainty estimation, this will also allow us to test whether the best-fit parameters obtained via bootstrapping minimization lie on global minima, and to unveil any parameter interdependencies.
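For the first of these improvements, note that for the specific combination in Eq. (4.22) the linear error propagation automated by the uncertainties package's ufloat arithmetic reduces to a quadrature sum over partial derivatives. A minimal equivalent manual sketch (the numbers in the usage are illustrative only); the σ_η0 term is what enlarges the ε uncertainties once the offset is measured per run:

```python
from math import sqrt

def epsilon_with_uncertainty(xi, sigma_xi, eta, eta0, sigma_eta0, m):
    """epsilon = xi / ((1 - m)(eta - eta0)), Eq. (4.22), with first-order
    propagation of independent uncertainties on xi and on the offset eta0.
    This mirrors what ufloat arrays from the uncertainties package automate."""
    denom = (1.0 - m) * (eta - eta0)
    eps = xi / denom
    d_xi = 1.0 / denom                              # d(eps)/d(xi)
    d_eta0 = xi / ((1.0 - m) * (eta - eta0) ** 2)   # |d(eps)/d(eta0)|
    sigma_eps = sqrt((d_xi * sigma_xi) ** 2 + (d_eta0 * sigma_eta0) ** 2)
    return eps, sigma_eps
```

Automating this with ufloat arrays has the advantage that correlations and more complicated expressions (such as the inverted VOS functions F(v) and k(v)) are handled without writing the derivatives by hand.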
In order to obtain posterior distributions for all parameters, we assume uniform priors (implemented as logarithmic probability density functions) and a log-likelihood computed via the well-known χ² statistic, defined as

χ² = Σ_m { [v_predicted(m) − v_simulated(m)]² / σ_v² + [ε_predicted(m) − ε_simulated(m)]² / σ_ε² } . (4.25)

With a minimum of 10000 steps and 32 walkers we achieve convergence in all cases. In addition, we always obtain a mean acceptance rate of at least 0.4. The resulting posterior distributions can be seen in Figs. 4.10, 4.11 and 4.12, respectively for the Hot, Warm and Cold cases, with blue solid lines indicating the minimization best-fit results. We also report, in Table 4.8, the 50th quantile (as the best-fit value) and the 16th and 84th quantiles (represented as black dashed lines on the corner plots) for the uncertainties.

We can make some first remarks on the aforementioned figures. For instance, the minima found by minimization normally coincide with the likelihood peak found via MCMC methods. However, in the specific case of r, it seems that as the posterior distribution widens with the amount of cooling it becomes possible for the minimization to find an incorrect minimum: this is evident in the Cold case (see Fig. 4.12). Note as well the asymmetry that arises in the d posterior for the Cold case, to the point where the likelihood peak lies not at the 50th quantile, but roughly at the 16th quantile (we always quote the 16th, 50th and 84th quantiles in tables). This can be attributed to the reduction of radiative losses from the network, which erases some of the information useful to calibrate the model. We remark as well that while the uncertainty of d in the Warm and Hot cases is relatively similar to the bootstrap uncertainties, in the Cold case the uncertainty is indeed larger.
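The log-probability handed to the sampler combines flat priors within chosen bounds with the χ² likelihood of Eq. (4.25). A minimal sketch of such a callable; the parameter bounds and the `model` function mapping VOS parameters to predicted v and ε are placeholders (in the actual pipeline the predictions come from solving the VOS equations):

```python
import numpy as np

def chi_squared(v_pred, v_sim, sigma_v, eps_pred, eps_sim, sigma_eps):
    """Eq. (4.25): uncertainty-weighted squared residuals in v and epsilon,
    summed over the simulated expansion rates m."""
    return np.sum((v_pred - v_sim) ** 2 / sigma_v ** 2
                  + (eps_pred - eps_sim) ** 2 / sigma_eps ** 2)

def make_log_posterior(bounds, v_sim, sigma_v, eps_sim, sigma_eps, model):
    """Return log P(theta | data) up to a constant: flat priors inside
    `bounds`, log-likelihood = -chi^2 / 2. The returned callable is what
    one would pass to emcee.EnsembleSampler."""
    def log_posterior(theta):
        if any(not (lo <= t <= hi) for t, (lo, hi) in zip(theta, bounds)):
            return -np.inf
        v_pred, eps_pred = model(theta)
        return -0.5 * chi_squared(v_pred, v_sim, sigma_v,
                                  eps_pred, eps_sim, sigma_eps)
    return log_posterior
```

With this in hand, the sampler setup reduces to choosing the number of walkers and steps and checking the mean acceptance fraction, as described in the text.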
Overall, the uncertainties of the parameters c, k0 and d (with the exception mentioned above) are in line with the previous bootstrap uncertainties, while those of q, β and r were previously underestimated. The parameter that is least well determined is (somewhat unsurprisingly) r. To conclude, we also note the degeneracies that exist between parameters (clearly seen in the corner plots). Several of these degeneracies can be physically explained in the context of the VOS. For instance, let us take the specific example of k0, which is negatively correlated with q. Even if k0 reflects the maximal value of the momentum parameter k(v), and is therefore largely determined by the large expansion rate regime, it is an indicator of the presence of small-scale structure. And the regime where small-scale structure will be more obvious is the low expansion rate limit, where it will have an effect on the velocity, and thus on q. Negative correlations for k0 also include d and β. Positive correlations exist for other parameters, such as between c and r.

¹ https://emcee.readthedocs.io/en/stable/.

Fig. 4.10 The corner plots for the posterior distributions in the Hot (standard) case. Above the 1D histogram for each variable we report the 50th quantile and use the 16th and 84th quantiles to compute and show uncertainties. These three quantiles are indicated by the dashed black lines. Contour plots between pairs of parameters are also shown. The blue lines (and dots) represent the values found via the bootstrapping procedure

Fig. 4.11 Same as Fig. 4.10, for the Warm case

4.3.3 Further Exploration of Model Sensitivity to Numerical Choices

Having improved the robustness of our VOS calibration tool, and with the computational resources of Piz Daint at hand (1 million node hours), we are now in a position to explore systematic errors which require the use of large lattices.
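The degeneracies visible in the corner plots can also be quantified directly from the flattened chains (e.g. `sampler.get_chain(flat=True)` in emcee) via correlation coefficients. A sketch on synthetic samples with a built-in k0-q anti-correlation (the numbers are illustrative placeholders, not the thesis chains):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "posterior samples" with an imposed k0-q anti-correlation,
# standing in for two columns of a flattened MCMC chain.
n = 20000
k0 = rng.normal(1.3, 0.07, n)
q = 2.4 - 1.2 * (k0 - 1.3) + rng.normal(0.0, 0.05, n)

corr = np.corrcoef(k0, q)[0, 1]  # Pearson correlation between k0 and q
```

A strongly negative value here corresponds to a tilted k0-q contour of the kind seen in the corner plots.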
We will first explore the impact of dynamic range (or equivalently, lattice size), and then move on to understanding the impact of different velocity estimators on the VOS model calibration. In all cases we will attempt to understand the numerical choices necessary to compensate for the studied sources of systematic error and so obtain a definitive VOS calibration.

4.3.3.1 Dynamic Range and Lattice Size

Field theory simulations have a problem of separation of scales. That is, one wishes to understand the properties of the network at the level of the mean string separation, but also to have sufficiently high resolution to describe behavior happening at small scales, close to the string radius for instance.

Fig. 4.12 Same as Fig. 4.10, for the Cold case

Table 4.8 Same as Tables 4.6 and 4.7, but using the Bayesian inference method described in the text. We always report the 50th quantile value, with the 16th and 84th quantiles being used for computing the uncertainties

Case | d | r | β | k0 | q | c | Reference
Hot | 0.20 (+0.03/−0.03) | 2.11 (+0.50/−0.42) | 1.55 (+0.19/−0.18) | 1.37 (+0.07/−0.06) | 2.38 (+0.08/−0.09) | 0.35 (+0.03/−0.04) | This section
Warm | 0.21 (+0.04/−0.04) | 1.88 (+0.79/−0.53) | 1.42 (+0.23/−0.20) | 1.27 (+0.09/−0.07) | 2.24 (+0.12/−0.11) | 0.39 (+0.05/−0.07) | This section
Cold | 0.37 (+0.35/−0.19) | 3.74 (+1.81/−1.70) | 1.94 (+0.31/−0.27) | 0.98 (+0.04/−0.04) | 2.45 (+0.13/−0.14) | 0.59 (+0.02/−0.03) | This section

The size of the lattice (multiplied by the lattice spacing) dictates the final conformal time and hence the available dynamic range, beyond which boundary effects are expected to play a more dominant role. This means that, for the same lattice spacing, larger lattices will resolve scales down to smaller fractions of the horizon (as the horizon can grow to larger sizes). This should have a visible impact on the calibration of the VOS, given that some parameters are intimately connected to effects occurring at small scales.
To explore this issue we will consider lattices of sizes 1024³, 2048³ and 4096³ with a single lattice spacing of Δx = 0.5. We will simulate 25 expansion rates m (again assuming a power-law scale factor) in the range m ∈ [0.5, 0.95], and for each of these, in order to reduce statistical errors, we will use 10 runs. We will consider throughout this section the winding-based correlation length estimator ξW defined in Eq. 3.17. We will also, both as a cross-check and to segue into the next section, consider two different velocity estimators: the equation of state based vω² and the conjugate momenta based vφ², defined in Eqs. 3.20 and 3.18 respectively.

In terms of changes to the measured asymptotes, we see a slow drift in the values of ξ/η. This is represented in Fig. 4.13, where in the top left panel the shaded region corresponds to the simulation values (with statistical uncertainties) and the solid lines correspond to the VOS predictions.

Fig. 4.13 Comparison of the mean rate of change of the correlation length ξ/η (top left) and the mean velocity v (top right), with the solid lines corresponding to the calibration and the shaded regions to the uncertainty of the measurements of each estimator, for three different box sizes. The bottom plots show how these differences impact the momentum parameter k(v) (bottom left) and the energy loss function F(v) (bottom right)

We remark that this drift has been reported in the literature before [11, 24] and can also be seen in our own work (previous chapter, section on the validation of the multi-GPU application). Note, however, that this drift has been partially attributed to the presence of cooling in the preparation of the initial conditions, at least when going from 512³ to 1024³ (see [11]). In the simulations in this section we did not apply any cooling; therefore this drift is entirely due to lattice size. On the top right-hand panel of Fig.
4.13, we also see that little to no change takes place in the values of the velocity (nor in its uncertainties). There is an additional behavior to note: the uncertainties of ξ/η are reduced with an increase in lattice size. This is a result of the increased dynamic range reducing the importance of the initial conditions offset. It is also of note that all changes (slow drift, reduced uncertainties) are qualitatively smaller when going from 2048³ to 4096³ than from 1024³ to 2048³. This seems to suggest that, much like in the domain walls case [32], there is a minimum lattice size for model calibration, beyond which parameter values are stable.

These differences will consequently impact the velocity-dependent functions and in turn affect the model parameters. This can be visually inferred from the bottom panels of Fig. 4.13 and from the 1σ and 2σ contour corner plots of Fig. 4.14. The different calibrations are also summarized in Table 4.9. To show how the shapes of the velocity-dependent functions and the parameters are connected, let us take the example of the energy loss function F(v). As the resolution increases, and ξ/η decreases, F(v) shifts downwards, which suggests a reduction of either normalization parameter, c and/or d. The impact on the shape of the momentum parameter k(v) is, however, much less noticeable, with changes being circumscribed to a reduction of its maximal value k0. These expectations are confirmed in Table 4.9 and in the corner plots of Fig. 4.14, where k0 and c are reduced and d (to a lesser extent) increases. The anti-correlation of c and d is explained by the fact that both control the relative importance of the different mechanisms in the energy loss function. This anti-correlation also means that loop formation is gradually replaced by radiative losses, eventually becoming negligible at 4096³.
The effect of reduced uncertainties is also manifest in the decreasing area of the confidence contours, and the smaller changes from 2048³ to 4096³ visually confirm that most parameters, with the notable exception of c, are close to their stable values. However, before we immediately declare that for gauged cosmic strings loop chopping plays no dominant role at any expansion rate (much as for domain walls [32]), there are two clues that lead us to note that this could be an incorrect conclusion. Visually inspecting networks in the radiation era reveals the formation and evolution of loops of different sizes (both large, horizon-sized loops and small ones). Screenshots of a 4096³ radiation era simulation can be seen in Fig. 4.16 for conformal times η = 700, 710 and 714. The full animations (colored either by velocity or by group of string cells) spanning the full conformal time range η ∈ [741, 1024] are available at the following webpages [14, 15]. The fact that such loops are present is at odds with the finding that c → 0.

Fig. 4.14 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vω, for three different box sizes. The 2D panels depict the 1σ and 2σ confidence regions

Table 4.9 Calibrated VOS model parameters for our three different lattice sizes, 1024³, 2048³ and 4096³, all with the same lattice spacing Δx = 0.5, and two different choices of velocity estimator, vω² and vφ² (in the top and bottom parts of the table, respectively), further described in the main text.
Displayed values correspond to the 16th, 50th and 84th percentiles of the posterior distributions

Lattice size | Δx | Velocity estimator | d | r | β | k0 | q | c
1024³ | 0.5 | vω² | 0.32 (+0.04/−0.04) | 1.51 (+0.48/−0.37) | 1.82 (+0.34/−0.30) | 1.27 (+0.08/−0.06) | 2.41 (+0.13/−0.13) | 0.15 (+0.05/−0.07)
2048³ | 0.5 | vω² | 0.37 (+0.02/−0.02) | 1.27 (+0.17/−0.15) | 2.33 (+0.21/−0.20) | 1.21 (+0.03/−0.03) | 2.57 (+0.06/−0.06) | 0.03 (+0.02/−0.03)
4096³ | 0.5 | vω² | 0.39 (+0.02/−0.02) | 1.36 (+0.15/−0.13) | 2.32 (+0.20/−0.18) | 1.18 (+0.03/−0.03) | 2.59 (+0.05/−0.05) | 0.00 (+0.01/−0.01)
1024³ | 0.5 | vφ² | 0.35 (+0.23/−0.10) | 2.39 (+1.58/−0.94) | 2.79 (+0.73/−0.56) | 1.06 (+0.05/−0.05) | 2.95 (+0.18/−0.19) | 0.44 (+0.04/−0.05)
2048³ | 0.5 | vφ² | 0.33 (+0.05/−0.04) | 1.86 (+0.39/−0.32) | 2.65 (+0.28/−0.26) | 1.05 (+0.03/−0.03) | 2.84 (+0.08/−0.08) | 0.31 (+0.02/−0.02)
4096³ | 0.5 | vφ² | 0.36 (+0.03/−0.03) | 1.72 (+0.26/−0.23) | 2.50 (+0.21/−0.20) | 1.06 (+0.02/−0.02) | 2.83 (+0.06/−0.06) | 0.23 (+0.01/−0.01)

Fig. 4.15 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vφ, for three different box sizes. The 2D panels depict the 1σ and 2σ confidence regions

The second hint comes from repeating this analysis with a different velocity estimator. So far we have used the equation of state estimator vω² defined in Eq. 3.20, but we can repeat the analysis with the conjugate momenta estimator vφ² of Eq. 3.18. The resulting parameters and corner plots are available in Table 4.9 and in Fig. 4.15, respectively. This model calibration instead reveals a loop-chopping parameter that does not go to zero; in fact, even at the largest resolution, 4096³, it is statistically different from zero and rather similar to the value of d. Remarkably, d seems not to change as the resolution increases, being in fact the least affected of the six parameters. The fact that these two model calibrations result in two very different conclusions prompts an investigation of the reliability of each velocity estimator. This can eventually lead to a solution which reconciles (even if partially) the two calibrations.

Fig.
4.16 Winding centerlines for a radiation epoch 4096³ simulation with lattice spacing Δx = 0.5, at three timesteps. The top, middle and bottom panels correspond to conformal times η = 700, 710 and 714. We use the local velocity estimator of [45] to color the reconstructed strings. The full animation can be found at [15]

4.3.3.2 Lattice Spacing and Velocity Estimators

In order to start uncovering the origin of the two differing calibrations, we remind the reader that in the previous small-lattice calibration (at 512³) we noted that the differences between estimators were more accentuated for the velocity estimators. In fact, the difference between correlation length estimators is at most about 6% at high expansion rate (m = 0.95), whereas in the same low-velocity limit the difference grows to about 10–12% for the velocity estimators. Better agreement occurs at moderate expansion rates (where the radiation and matter epochs lie), with a negligible difference for the correlation length estimators and a difference of about 5% for the velocity estimators. Given that in the high expansion rate limit the VOS model reduces to two parameters, c and k0,

dξ/dη = mξv²/((1 − m)η) + cv    (4.26)

dv/dη = (1 − v²)[k0/ξ − 2mv/((1 − m)η)]    (4.27)

and that at larger expansion rates uncertainties are globally smaller (thus imparting larger statistical weight), it is not completely surprising that even a small disagreement (of order 10%) is sufficient to impact the calibration. Given how important these two parameters are, indicating either the presence of wiggliness or the importance of loop formation to overall energy losses, we now turn our attention to the high expansion rate limit and to finding a possible solution to this conundrum. Throughout the literature it can also be seen that certain potential systematics may arise in this high expansion rate/low velocity limit.
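The linear scaling attractor implied by Eqs. 4.26–4.27 can be verified by direct integration; the following is a minimal sketch, with illustrative values of m, c and k0 rather than our calibrated ones:

```python
import numpy as np
from scipy.integrate import solve_ivp

def vos_rhs(eta, y, m, c, k0):
    """High expansion rate VOS equations (4.26)-(4.27) in conformal time,
    with conformal Hubble rate H = m/((1-m) eta) for a ~ eta^(m/(1-m))."""
    xi, v = y
    H = m / ((1.0 - m) * eta)
    dxi = H * xi * v**2 + c * v                   # Eq. (4.26)
    dv = (1.0 - v**2) * (k0 / xi - 2.0 * H * v)   # Eq. (4.27)
    return [dxi, dv]

# Illustrative values, not the calibrated ones of Table 4.9
m, c, k0 = 0.95, 0.2, 1.2
sol = solve_ivp(vos_rhs, (10.0, 1.0e5), [5.0, 0.3],
                args=(m, c, k0), rtol=1e-8, atol=1e-10)
eta, (xi, v) = sol.t, sol.y
print(xi[-1] / eta[-1], v[-1])   # both approach constants (linear scaling)
```

For these values the solution relaxes to ξ ≈ 0.16 η and v ≈ 0.20: a larger c drives ξ/η up, while k0 fixes the equilibrium velocity through the balance k0/ξ = 2Hv, making explicit why the calibration in this limit is so sensitive to these two parameters.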
For instance, for Abelian-Higgs in flat space it is (analytically) expected that vortices, provided their velocities are small enough (non-relativistic, v < 0.2), will pin to lattice sites, unable to overcome the potential barrier between sites [44]. This barrier is known as the Peierls-Nabarro barrier [35, 37]. Such an effect would manifest itself as a lattice-spacing dependence of the measured values of the velocity-dependent functions k(v) and F(v), and could be a limiting factor in the calibration of the VOS (especially deep in the non-relativistic regime). Continuing on the issue of lattice spacing, in [24] it was shown that in Minkowski space the equation of state velocity estimator was closer to the analytical expectation for the velocity of an oscillating string than the conjugate momenta estimator. In addition, reducing the lattice spacing improved the agreement between estimators, bringing the latter closer to the former, and both closer to the analytical expectation. This follows as the increase in resolution can be seen as an approach to the continuum limit. The conclusion that in Minkowski space the equation of state estimator is more reliable might not, however, apply in the opposite limit. With the goal of understanding how lattice spacing might impact the behavior of both velocity estimators in this critical regime, we will characterize the differences in velocities from the high expansion rates used for relativistic calibrations (m = 0.93, 0.94, 0.95) all the way into the deep non-relativistic regime.

Fig. 4.17 The effect of lattice spacing on the velocity estimators, for high expansion rate simulations. The top panels show the separate values of the velocities (with the corresponding statistical uncertainties) obtained with the two velocity estimators defined in the text, while the bottom panels show the relative difference between the two. Left and right side panels depict the results for standard spacing and half-lattice spacing

We will do so for two sets of simulations, corresponding to two lattice spacings, Δx = 0.5 and Δx = 0.25. At each expansion rate, each of the two velocities will have a statistical uncertainty and will represent the average of 10 runs; as is standard, the same 10 initial conditions are used for both sets. From the top panels of Fig. 4.17 one can already see that the difference between estimators increases with expansion rate and that a smaller lattice spacing improves the agreement between them. More specifically, the relative differences are at most of order 60% (at m = 0.997, for Δx = 0.5) and at least of order 10% (at m = 0.93), and the agreement is significantly improved at Δx = 0.25: the estimators are now in near agreement at lower m, with the difference reaching at most 30% in the non-relativistic limit. Qualitatively, the largest change is in the equation of state estimator, whose velocity vω approaches that of the conjugate momenta estimator, vφ. Although in Minkowski space and at most of the expansion rates analyzed here the equation of state estimator underestimates the conjugate momenta velocity, at the largest expansion rate m = 0.997 we can see the opposite. The reason for this is unclear, although an inversion of this tendency might signal a possible breakdown of the reliability of vω.

Fig. 4.18 The effect of lattice spacing on the velocity estimators, as manifest in the velocity-dependent functions of the VOS model, for high expansion rate simulations. The top panels show the momentum parameter k(v) while the bottom panels show the energy loss parameter F(v), all with the corresponding statistical uncertainties, obtained with the two velocity estimators defined in the text. Left and right side panels depict the results for standard and half-lattice spacing

These behaviors should translate into a change in the shapes of k(v) and F(v): a larger disagreement should be obvious with a coarser lattice spacing, and the differences should be larger at large expansion rate. Both conclusions follow from the panels of Fig. 4.18. From comparing the bottom and top panels of that figure, we confirm that a small lattice spacing is necessary to assess the proper shape of the velocity-dependent functions and therefore to calibrate the VOS adequately. We note as well one detail: at coarse lattice spacing the equation of state estimator fails to give physically reasonable (i.e. positive-definite) values for the energy loss function. Although the decreased lattice spacing improves this somewhat, it does not prevent some measured values of this function from being negative. This casts some doubt on the reliability of this estimator in this low-velocity limit. While this runs counter to the Minkowski expectation, it is also not completely surprising: fast expansion and Minkowski are opposite limits, and it is also true that the evolution of string networks in flat space is not representative of evolution in expanding Universes [30]. In terms of the expansion rates used in our relativistic calibration (m ∈ [0.93, 0.95]), the figure shows similar predictions for the velocity-dependent functions.

Table 4.10 Calibrated VOS model parameters for our two choices of lattice spacing Δx and corresponding lattice sizes, for the two different choices of velocity estimators, vω² and vφ², further described in the main text.
Displayed values correspond to the 16th, 50th, and 84th percentiles of the posterior distributions.

Lattice size | Δx   | Velocity estimator | d              | r              | β              | k0             | q              | c
2048³        | 0.5  | vω²                | 0.37+0.02−0.02 | 1.27+0.17−0.15 | 2.33+0.21−0.20 | 1.21+0.03−0.03 | 2.57+0.06−0.06 | 0.03+0.02−0.03
4096³        | 0.25 | vω²                | 0.34+0.07−0.05 | 2.32+0.52−0.40 | 2.62+0.29−0.26 | 1.06+0.03−0.02 | 2.37+0.06−0.07 | 0.25+0.02−0.02
2048³        | 0.5  | vφ²                | 0.33+0.05−0.04 | 1.86+0.39−0.32 | 2.65+0.28−0.26 | 1.05+0.03−0.03 | 2.84+0.08−0.08 | 0.31+0.02−0.02
4096³        | 0.25 | vφ²                | 0.36+0.09−0.06 | 2.56+0.64−0.50 | 2.69+0.30−0.27 | 1.04+0.03−0.02 | 2.47+0.07−0.07 | 0.30+0.02−0.02

The better agreement in velocities and in these predictions hints at a reduced lattice spacing being able to soften the tension between the two calibrations of the previous section. In order to test this, while keeping the same dynamic range independent of lattice spacing, we will compare the previous 2048³ calibrations at standard spacing Δx = 0.5 with new ones at lattice size 4096³ and half-standard spacing Δx = 0.25. We will perform this test for both velocity estimators and infer which estimator provides a more stable calibration. The resulting calibrations are summarized in Table 4.10, and the corresponding corner plots for each velocity estimator are found in Figs. 4.19 and 4.20. Both from the table and from comparing these two figures we can immediately infer that the conjugate momenta estimator calibration is far more stable to changes in lattice spacing, with the only statistically significant change occurring in the parameter q. This follows not only from the fact that the high expansion rate velocities change (comparatively) very little, but also from this velocity approaching the equation of state one at moderate expansion rates, which can be indicative of behavior consistent with Minkowski space simulations. For the equation of state estimator, both Fig.
4.19 and Table 4.10 show that the model parameter estimation is heavily affected by lattice spacing: 4 out of 6 parameters differ by several standard deviations. Note that the loop-chopping efficiency c is included in this group of parameters and ceases to be consistent with zero. This again follows from the dramatic changes at high expansion rate. To conclude, we remark that both calibrations are in near agreement at Δx = 0.25, as can be read from Table 4.10. This shows that although the EoS velocity is less reliable in the low-velocity limit, it is possible to compensate by reducing the lattice spacing. This is exactly the opposite of what was shown to happen in Minkowski space simulations: in [24] the EoS estimator is shown to be the most reliable for a flat-space oscillating string.

Fig. 4.19 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vω and two different lattice spacings Δx = 0.5 and Δx = 0.25. The 2D panels depict the 1σ and 2σ confidence regions

4.3.4 Coda: Observational Impact of Different Calibrations

With this we have obtained the highest-resolution, most accurate calibration of the Velocity-dependent One-Scale model. We will now highlight the need for an accurate calibration of this model, since it can be connected to observational consequences. Although deriving improved constraints is beyond the focus of the present work, we can showcase how different calibrations affect computed power spectra. To achieve this goal we will use Cosmic Microwave Background spectra, computed for different VOS calibrations. The reason is that these spectra are expected to depend mostly on a description of long string dynamics. In the case of the Stochastic Gravitational Wave Background produced by network loops one would

Fig.
4.20 Corner plots for the MCMC calibration of the VOS model, obtained with the velocity estimator vφ and two different lattice spacings Δx = 0.5 and Δx = 0.25. The 2D panels depict the 1σ and 2σ confidence regions

require not only an accurate calibration but also an in-depth study of how reliable the Nambu-Goto approximation is on a loop-by-loop basis (and at which scales). To perform such computations, we will use the publicly available software CMBACT4 [38]. Note that in this code the string network is approximated by several unconnected segments (Unconnected Segment Model, USM; see [3, 8]) whose average properties are dictated by integration of the VOS model equations. While this is a simplistic approximation (and indeed more robust simulation-based methods are available, see [2, 10, 11, 21, 25, 26]), it will suffice for our purposes, which are to assess the impact of different calibrations. For each spectrum we wish to compute we will use 200 realizations of the string network; this has been shown to be a sufficiently large number to produce spectra as accurate as the approximation allows [13]. We will also keep the standard settings of the code, except for the calibrations used (see next paragraph). Of note, we will keep the string decay parameter Lf at its default standard value (Lf = 0.5).

Fig. 4.21 Power spectrum of cosmic microwave background anisotropies, obtained with the CMBACT4 code, for the standard Nambu-Goto calibration, the standard Abelian-Higgs calibration and two extended VOS calibrations in the present work. The panels depict the TT (top left), EE (top right), TE (bottom left) and BB (bottom right) spectra. In each case the spectrum is obtained by averaging over 200 realizations
Although the risk is that string segments might decay earlier than their respective epoch for 0 ≤ Lf ≤ 1, given the illustrative nature of this work and how little information there is on the impact of this parameter, we decided to keep the default value for all computations. We will obtain TT, EE, TE and BB spectra for four different calibrations, all normalized to the same tension (Gμ = 1.1 × 10⁻⁷). The first two cases correspond to the standard version of the VOS model (already used in the CMBACT4 codebase), which uses the helicoidal ansatz for the momentum parameter k(v) (see [29]) and only the linear velocity term (loop-chopping) in the energy loss function F(v). The difference between these two cases is the parameter c, which is set to either c = 0.57 or c = 0.23 ± 0.04 to reflect either the Abelian-Higgs or the Nambu-Goto calibration (previously used in the Planck 2013 constraints [2]). The last two calibrations result from the extended VOS, with a worst-case and a best-case scenario for the parameters: the lowest-resolution lattice most affected by the choice of velocity estimator (1024³, Δx = 0.5 and vω²) and a more reliable calibration (4096³, Δx = 0.25 and vφ²). The resulting spectra are shown in Fig. 4.21. The first detail we can remark on is that all Abelian-Higgs spectra are in better agreement with each other than with Nambu-Goto. This is entirely expected, as the velocities of these networks are similar, whereas Nambu-Goto strings exhibit larger velocities [4, 9, 12, 34, 36, 39], possibly due to the absence of radiation backreaction. However, even among Abelian-Higgs calibrations there are still some noteworthy scale-dependent differences. The most discrepant calibration of the three is, without a doubt, the 1024³ case.
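Since each spectrum is an average over 200 USM realizations, the comparison between calibrations reduces to averaging per-realization C_ℓ's and taking relative differences per multipole; the sketch below illustrates this with synthetic stand-in spectra, not actual CMBACT4 output:

```python
import numpy as np

rng = np.random.default_rng(0)
ell = np.arange(2, 1501)

def average_spectrum(realizations):
    """Average C_ell over network realizations: the string signal is
    incoherent, so realizations combine at the power spectrum level."""
    return np.asarray(realizations).mean(axis=0)

# Synthetic stand-ins for 200 per-realization TT spectra of two calibrations
cl_a = average_spectrum([0.10 / ell * (1.0 + 0.05 * rng.standard_normal(ell.size))
                         for _ in range(200)])
cl_b = average_spectrum([0.12 / ell * (1.0 + 0.05 * rng.standard_normal(ell.size))
                         for _ in range(200)])

# Scale-dependent relative difference between calibrations, per multipole
frac_diff = (cl_b - cl_a) / cl_a
```

Averaging over many realizations suppresses the per-realization scatter (here by a factor of about 1/√200), so any remaining differences between the averaged spectra can be attributed to the calibrations themselves.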
Comparing the standard AH calibration with our new 4096³ case, the differences are more obvious at high multipole ℓ for most spectra, except TT (where the differences are instead more evident at low ℓ). In the case of TT, the scalar, vector and tensor components are 16%, 30% and 11% higher in the 4096³ case than in the standard case. At present we do not know if such differences are only a result of the calibration, or if there might be scale-dependent effects inherent to the USM approximation. In any case, our point still stands: the accuracy of calibrations of the VOS model will have an impact on the computed spectra.

4.4 Conclusion

The study of defect networks, whether analytical or observational, is full of challenges. In the case of thermodynamic models used to study string evolution, such as the Velocity-dependent One-Scale model, there will be a number of free parameters one cannot analytically predict (or whose prediction rests on simplistic assumptions). This establishes an important symbiosis: while simulations are limited in resolution and dynamic range, and therefore unable to simulate the entirety of cosmological history, they can calibrate semi-analytical modelling, which is then capable of extrapolating the full evolution of a network. Given that the evolution of defect networks is also intimately connected to observational consequences, we sought to use and apply the improvements of the extended VOS model for domain walls to both super-horizon wall networks and gauged cosmic string networks. For the first case, we saw that the scaling expected from the re-entry into the horizon can have an impact on model calibration, changing the parameters related to small-scale structure (that is, parameters included in the momentum parameter). We also saw that the model adequately described the approach to scaling behavior, lending further credence to the model being an adequate description of network evolution.
Afterwards we used a similar prescription to study the calibration of this model in the case of cosmic strings. We began with relatively small lattices, preliminarily finding that energy loss is not predominantly through radiative losses, i.e. loop creation and evolution is still a contributor (unlike in the domain wall case). In order to continue improving our parameter estimation, we explored how numerical choices can affect the calibration, and we sought to implement a more robust pipeline. This came as a consequence of a better handling of uncertainties, either through automatic propagation or through the use of Bayesian inference to predict parameter posteriors. Equipped with this likelihood analysis and the most extensive set of high-resolution simulations to date, we assessed the impact of different numerical choices, related to initial condition cooling, dynamic range, lattice spacing, and the choice of numerical velocity estimator. Our conclusions led us to a best-case-scenario calibration. This was all made possible by the simulation speed-ups described in the previous chapter. To conclude, we also compared the observational impact of different calibrations on the Cosmic Microwave Background by computing anisotropies for several cases. We saw scale-dependent differences across all computed spectra. Although we cannot exclude that such differences might be a result of the approximations made in the spectra computation, this still highlights the need for a proper characterization of the VOS model parameters. We remark that there is still one possible source of systematics whose effects on the extended VOS are unknown: the PRS algorithm. Fortunately, we will analyse outputs from 8192³ physical evolution runs with m ∈ [0.5, 0.75] in the upcoming months.
In parallel, we will implement some improvements in the VOS calibration tool, such as full Bayesian inference with differential equations (instead of assuming a linear scaling solution) and phase space analysis.

References

1. Achucarro A, Avgoustidis A, Leite AMM, Lopez-Eiguren A, Martins CJAP, Nunes AS, Urrestilla J (2014) Evolution of semilocal string networks: large-scale properties. Phys Rev D 89(6):063503. https://doi.org/10.1103/PhysRevD.89.063503
2. Ade PAR, et al (2013) Planck 2013 results. XXV. Searches for cosmic strings and other topological defects. Astron Astrophys 571:A25. https://doi.org/10.1051/0004-6361/201321621
3. Albrecht A, Battye RA, Robinson J (1997) The case against scaling defect models of cosmic structure formation. Phys Rev Lett 79:4736–4739. https://doi.org/10.1103/PhysRevLett.79.4736
4. Allen B, Shellard EPS (1990) Cosmic string evolution: a numerical simulation. Phys Rev Lett 64:119–122
5. Avelino PP, Martins CJAP (2000) Topological defects: fossils of an anisotropic era? Phys Rev D 62:103510. https://doi.org/10.1103/PhysRevD.62.103510
6. Avelino PP, Martins CJAP, Oliveira JCRE (2005) One-scale model for domain wall network evolution. Phys Rev D 72:083506. https://doi.org/10.1103/PhysRevD.72.083506
7. Avgoustidis A, Copeland EJ, Moss A, Skliros D (2012) Fast analytic computation of cosmic string power spectra. Phys Rev D 86:123513. https://doi.org/10.1103/PhysRevD.86.123513
8. Battye RA, Robinson J, Albrecht A (1998) Structure formation by cosmic strings with a cosmological constant. Phys Rev Lett 80:4847–4850. https://doi.org/10.1103/PhysRevLett.80.4847
9. Bennett DP, Bouchet FR (1990) High resolution simulations of cosmic string evolution. 1. Network evolution. Phys Rev D 41:2408
10. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D 75:065015. https://doi.org/10.1103/PhysRevD.75.065015
11.
Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2010) CMB power spectra from cosmic strings: predictions for the Planck satellite and beyond. Phys Rev D 82:065004. https://doi.org/10.1103/PhysRevD.82.065004
12. Blanco-Pillado JJ, Olum KD, Shlaer B (2011) Large parallel cosmic string simulations: new results on loop production. Phys Rev D 83:083514. https://doi.org/10.1103/PhysRevD.83.083514
13. Charnock T, Avgoustidis A, Copeland EJ, Moss A (2016) CMB constraints on cosmic strings and superstrings. Phys Rev D 93(12):123503. https://doi.org/10.1103/PhysRevD.93.123503
14. Correia J, Martins C (2021a) High-resolution GPU-accelerated Abelian-Higgs string simulation: length colormap. Dataset on Zenodo. https://doi.org/10.5281/zenodo.4710664
15. Correia J, Martins C (2021b) High-resolution GPU-accelerated Abelian-Higgs string simulation: velocity colormap. Dataset on Zenodo. https://doi.org/10.5281/zenodo.4710670
16. Correia JRCCC, Martins CJAP (2020) Quantifying the effect of cooled initial conditions on cosmic string network evolution. Phys Rev D 102(4):043503. https://doi.org/10.1103/PhysRevD.102.043503
17. Correia JRCCC, Martins CJAP (2021) High resolution calibration of the cosmic strings velocity dependent one-scale model. Phys Rev D 104(6):063511. https://doi.org/10.1103/PhysRevD.104.063511
18. Correia JRCCC, Martins CJAP (2019) Extending and calibrating the velocity dependent one-scale model for cosmic strings with one thousand field theory simulations. Phys Rev D 100(10):103517. https://doi.org/10.1103/PhysRevD.100.103517
19. Correia JRCCC, Leite ISCR, Martins CJAP (2014) Effects of biases in domain wall network evolution. Phys Rev D 90(2):023521. https://doi.org/10.1103/PhysRevD.90.023521
20. Correia JRCCC, Leite ISCR, Martins CJAP (2018) Effects of biases in domain wall network evolution. II. Quantitative analysis. Phys Rev D 97(8):083521.
https://doi.org/10.1103/PhysRevD.97.083521
21. Daverio D, Hindmarsh M, Kunz M, Lizarraga J, Urrestilla J (2016) Energy-momentum correlations for Abelian Higgs cosmic strings. Phys Rev D 93(8):085014. https://doi.org/10.1103/PhysRevD.93.085014 [Erratum: Phys Rev D 95(4):049903 (2017). https://doi.org/10.1103/PhysRevD.95.049903]
22. Foreman-Mackey D, Hogg DW, Lang D, Goodman J (2013) emcee: the MCMC hammer. Publ Astron Soc Pac 125:306–312. https://doi.org/10.1086/670067
23. Hindmarsh M, Stuckey S, Bevis N (2009) Abelian Higgs cosmic strings: small scale structure and loops. Phys Rev D 79:123504. https://doi.org/10.1103/PhysRevD.79.123504
24. Hindmarsh M, Lizarraga J, Urrestilla J, Daverio D, Kunz M (2017) Scaling from gauge and scalar radiation in Abelian Higgs string networks. Phys Rev D 96(2):023525. https://doi.org/10.1103/PhysRevD.96.023525
25. Lazanu A, Shellard P (2015) Constraints on the Nambu-Goto cosmic string contribution to the CMB power spectrum in light of new temperature and polarisation data. JCAP 02:024. https://doi.org/10.1088/1475-7516/2015/02/024
26. Lazanu A, Shellard EPS, Landriau M (2015) CMB power spectrum of Nambu-Goto cosmic strings. Phys Rev D 91(8):083519. https://doi.org/10.1103/PhysRevD.91.083519
27. Leite AMM, Martins CJAP (2011) Scaling properties of domain wall networks. Phys Rev D 84:103523. https://doi.org/10.1103/PhysRevD.84.103523
28. Lopez-Eiguren A, Urrestilla J, Achucarro A (2017) Measuring global monopole velocities, one by one. JCAP 1701(01):020. https://doi.org/10.1088/1475-7516/2017/01/020
29. Martins CJAP, Shellard EPS (2002) Extending the velocity dependent one scale string evolution model. Phys Rev D 65:043514. https://doi.org/10.1103/PhysRevD.65.043514
30. Martins CJAP, Shellard EPS (2006) Fractal properties and small-scale structure of cosmic string networks. Phys Rev D 73:043515. https://doi.org/10.1103/PhysRevD.73.043515
31.
Martins CJAP, Shellard EPS, Vieira JPP (2014) Models for small-scale structure on cosmic strings: mathematical formalism. Phys Rev D 90(4):043518. https://doi.org/10.1103/PhysRevD.90.043518
32. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
33. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Stretching and Kibble scaling regimes for Hubble-damped defect networks. Phys Rev D 94(11):116017. https://doi.org/10.1103/PhysRevD.94.116017 [Erratum: Phys Rev D 95(3):039902 (2017). https://doi.org/10.1103/PhysRevD.95.039902]
34. Moore J, Shellard E, Martins C (2002) On the evolution of Abelian-Higgs string networks. Phys Rev D 65:023503. https://doi.org/10.1103/PhysRevD.65.023503
35. Nabarro FRN (1947) Dislocations in a simple cubic lattice. Proc Phys Soc 59:256–272
36. Olum KD, Vanchurin V (2007) Cosmic string loops in the expanding universe. Phys Rev D 75:063521
37. Peierls R (1940) The size of a dislocation. Proc Phys Soc 52:34–37
38. Pogosian L, Vachaspati T (1999) Cosmic microwave background anisotropy from wiggly strings. Phys Rev D 60:083504. https://doi.org/10.1103/PhysRevD.60.083504
39. Ringeval C, Sakellariadou M, Bouchet F (2007) Cosmological evolution of cosmic string loops. JCAP 0702:023. https://doi.org/10.1088/1475-7516/2007/02/023
40. Rybak IY, Avgoustidis A, Martins CJAP (2017) Semianalytic calculation of cosmic microwave background anisotropies from wiggly and superconducting cosmic strings. Phys Rev D 96(10):103535. https://doi.org/10.1103/PhysRevD.96.103535
41. Sousa L, Avelino PP (2010) Evolution of domain wall networks: the Press-Ryden-Spergel algorithm. Phys Rev D 81:087305. https://doi.org/10.1103/PhysRevD.81.087305
42.
Sousa L, Avelino PP, Guedes GSF (2020) Full analytical approximation to the stochastic gravitational wave background generated by cosmic string networks. Phys Rev D 101(10):103508. https://doi.org/10.1103/PhysRevD.101.103508
43. Vilenkin A, Shellard EPS (2000) Cosmic strings and other topological defects. Cambridge University Press. ISBN 9780521654760
44. Ward RS (1997) Bogomolnyi bounds for two-dimensional lattice systems. Commun Math Phys 184:397–410. https://doi.org/10.1007/s002200050065
45. Yamaguchi M, Yokoyama J (2002) Lagrangian evolution of global strings. Phys Rev D 66:121303. https://doi.org/10.1103/PhysRevD.66.121303
46. Zeldovich Y, Khlopov M (1978) On the concentration of relic magnetic monopoles in the universe. Phys Lett B 79(3):239–241. https://doi.org/10.1016/0370-2693(78)90232-0

Chapter 5
Strings in U(1)L × U(1)L Simulations

String theory cosmologists have discovered cosmic strings lurking everywhere in the undergrowth
Tom Kibble

As mentioned in the epigraph, fundamental superstrings were shown in [6] to behave in some ways similarly to cosmic strings, being one-dimensional, horizon-sized objects formed in the early Universe and even obeying a homotopy condition (see the introduction and references therein for more details). Attempts to simulate these objects inevitably fall into two categories: extensions of Nambu-Goto cosmic string simulations (of which no simulations with more than one string type exist) or field theory simulations, of which the simplest model to implement corresponds to the dual local U(1) model of [4],

L = |Dμφ|² − (1/4)F^μν Fμν + |Dμψ|² − (1/4)G^μν Gμν − V(|φ|, |ψ|)    (5.1)

for two complex scalar fields φ and ψ and two U(1) gauge fields Aμ and Bμ, with corresponding gauge field strengths Fμν and Gμν.
The covariant derivatives, gauge field strengths and potential are given by

Dμφ = (∂μ − i ep Aμ)φ,   Dμψ = (∂μ − i eq Bμ)ψ    (5.2)

Fμν = ∂μAν − ∂νAμ,   Gμν = ∂μBν − ∂νBμ    (5.3)

V(|φ|, |ψ|) = (λp/4)(|φ|² − σp²)² + (λq/4)(|ψ|² − σq²)² − κ(|φ|² − σp²)(|ψ|² − σq²)    (5.4)

where λp,q are scalar couplings, ep,q are gauge couplings and κ is the coupling between the scalar fields. If these parameters are such that 0 < κ < √(λp λq)/2, then the vacuum manifold will be non-trivial in two sectors, supporting the existence of two types of strings and, due to the non-zero value of κ, also bound-state strings (see [4]). As mentioned previously, this model cannot capture all the proper physics of cosmic superstrings. For instance, it clearly lacks supersymmetry, intercommutation probabilities will not be the expected ones for superstrings, and the tension spectrum cannot be reproduced. Still, a field theory simulation of this model can tell us a lot about the abundances of bound states and the scaling properties of each network component. Although this model has been studied previously in the literature, three details that can play an important role have thus far been neglected: the need for proper non-PRS evolution for bound state formation, the impact of resolution effects on small-scale structure (and again on bound state formation), and the possibility of unequal tensions. Our previous Abelian-Higgs code (appropriately extended) will be able to shed some light on these issues. We additionally remark that everything in this chapter is unpublished work that will form the basis of a publication in the near future.
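The condition on κ can be checked numerically; the sketch below (with our own illustrative coupling values) verifies that for 0 < κ < √(λp λq)/2 the potential of Eq. 5.4 stays bounded below, with its minimum on the vacuum manifold:

```python
import numpy as np

def V(phi2, psi2, lam_p=2.0, lam_q=2.0, kappa=0.9, sig_p2=1.0, sig_q2=1.0):
    """Two-sector potential of Eq. (5.4), in terms of |phi|^2 and |psi|^2."""
    x = phi2 - sig_p2
    y = psi2 - sig_q2
    return 0.25 * lam_p * x**2 + 0.25 * lam_q * y**2 - kappa * x * y

lam_p, lam_q, kappa = 2.0, 2.0, 0.9
# Boundedness / non-trivial vacuum condition: 0 < kappa < sqrt(lam_p lam_q)/2
assert 0.0 < kappa < 0.5 * np.sqrt(lam_p * lam_q)

grid = np.linspace(0.0, 4.0, 201)            # values of |phi|^2 and |psi|^2
vals = V(grid[:, None], grid[None, :], lam_p, lam_q, kappa)
i, j = np.unravel_index(vals.argmin(), vals.shape)
# Minimum lies on the vacuum manifold |phi|^2 = sigma_p^2, |psi|^2 = sigma_q^2
print(grid[i], grid[j], vals[i, j])
```

For κ above the bound the quadratic form in (|φ|² − σp², |ψ|² − σq²) acquires a negative direction and the potential becomes unbounded below, which is why the window 0 < κ < √(λp λq)/2 is required.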
5.1 Simulation Overview

As for the Abelian-Higgs string, we begin with the locally invariant Lagrangian density presented in the previous section. From variation of the action, under the assumption of an FLRW metric and the temporal gauge (A0 = B0 = 0), come the equations of motion

φ̈ + 2(ȧ/a)φ̇ = DjDjφ − a²φ[(λp/2)(|φ|² − σp²) − κ(|ψ|² − σq²)]    (5.5)

ψ̈ + 2(ȧ/a)ψ̇ = DjDjψ − a²ψ[(λq/2)(|ψ|² − σq²) − κ(|φ|² − σp²)]    (5.6)

Ḟ0j = ∂iFij − 2a² ep² Im[φ* Djφ]    (5.7)

Ġ0j = ∂iGij − 2a² eq² Im[ψ* Djψ]    (5.8)

along with two copies of Gauss's law, one for each sector,

∂iF0i = 2a² ep² Im[φ* φ̇]    (5.9)

∂iG0i = 2a² eq² Im[ψ* ψ̇]    (5.10)

which will be tested in the validation section. All of the previous equations are symmetric under the exchanges φ ↔ ψ, Aμ ↔ Bμ, Fμν ↔ Gμν and under the corresponding exchanges of couplings λp ↔ λq and ep ↔ eq. We will assume criticality in the two sectors, which corresponds to λp,q/(2ep,q²) = 1, and we will set both symmetry breaking scales to unity, σp,q = 1. This corresponds to equal-tension constituent strings which do not (at coupling κ = 0) form bound states with winding greater than one (say, of the form (2,0) or (0,2)). In these simulations, much as in the Abelian-Higgs and domain wall cases, the comoving string radius varies as a⁻¹ for both the scalar and gauge radii. Here we adopt the same comoving width controlling trick used in previous chapters, where all coupling constants are made to vary as

λp,q = λp,q0 a^−2(1−β),   ep,q = ep,q0 a^−(1−β),   κ = κ0 a^−2(1−β)    (5.11)

where, following [2], κ is made to vary in the same way as λp,q. The parameter β can be set to 0 (constant comoving width) or 1 (recovering the original equations of motion), while β < 0 implies growing radii.
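Under the rescalings of Eq. 5.11 the comoving width, proportional to 1/(a√λ) up to constant factors, scales as a^−β; a small sketch (with illustrative coupling normalizations of our own) makes this explicit:

```python
import numpy as np

def couplings(a, beta, lam0=2.0, e0=1.0, kappa0=0.9):
    """Coupling rescalings of Eq. (5.11) for scale factor a and parameter beta."""
    lam = lam0 * a ** (-2.0 * (1.0 - beta))
    e = e0 * a ** (-(1.0 - beta))
    kappa = kappa0 * a ** (-2.0 * (1.0 - beta))
    return lam, e, kappa

a = np.array([1.0, 10.0, 100.0])
for beta in (0.0, 1.0, -0.5):
    lam, e, kappa = couplings(a, beta)
    w = 1.0 / (a * np.sqrt(lam))   # comoving width, up to constants: ~ a^(-beta)
    print(beta, w / w[0])
```

The printed ratios show a constant comoving width for β = 0, a width shrinking as 1/a for β = 1 (the physical case), and a growing width for β < 0, matching the three regimes described above.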
The previous comoving width trick implies that the equations of motion can now be re-written as

$$\ddot\phi + 2\frac{\dot a}{a}\dot\phi = D_j D_j\phi - a^{2\beta}\,\phi\left[\frac{\lambda_{p0}}{2}\left(|\phi|^2-\sigma_p^2\right) - \kappa_0\left(|\psi|^2-\sigma_q^2\right)\right] \qquad (5.12)$$

$$\ddot\psi + 2\frac{\dot a}{a}\dot\psi = D_j D_j\psi - a^{2\beta}\,\psi\left[\frac{\lambda_{q0}}{2}\left(|\psi|^2-\sigma_q^2\right) - \kappa_0\left(|\phi|^2-\sigma_p^2\right)\right] \qquad (5.13)$$

$$\dot F_{0j} + 2(1-\beta)\frac{\dot a}{a}F_{0j} = \partial_i F_{ij} - 2a^{2\beta} e_{p0}^2\,\mathrm{Im}[\phi^* D_j\phi] \qquad (5.14)$$

$$\dot G_{0j} + 2(1-\beta)\frac{\dot a}{a}G_{0j} = \partial_i G_{ij} - 2a^{2\beta} e_{q0}^2\,\mathrm{Im}[\psi^* D_j\psi] \qquad (5.15)$$

This is very similar to writing two discretized evolution equations for two Abelian-Higgs strings, but with an extra term arising from the coupling term in the potential. The evolution equations updating the conjugate momenta of the scalar fields in both string sectors are executed first in a timestep via

$$(1+\delta)\,\Pi^{x,\eta+\frac{1}{2}} = (1-\delta)\,\Pi^{x,\eta-\frac{1}{2}} + \Delta\eta\left\{D^-_j D^+_j\phi^{x,\eta} - a_\eta^{2\beta}\,\phi^{x,\eta}\left[\frac{\lambda_{p0}}{2}\left(|\phi^{x,\eta}|^2-\sigma_p^2\right) - \kappa_0\left(|\psi^{x,\eta}|^2-\sigma_q^2\right)\right]\right\} \qquad (5.16)$$

$$(1+\delta)\,\Pi_\psi^{x,\eta+\frac{1}{2}} = (1-\delta)\,\Pi_\psi^{x,\eta-\frac{1}{2}} + \Delta\eta\left\{D^-_j D^+_j\psi^{x,\eta} - a_\eta^{2\beta}\,\psi^{x,\eta}\left[\frac{\lambda_{q0}}{2}\left(|\psi^{x,\eta}|^2-\sigma_q^2\right) - \kappa_0\left(|\phi^{x,\eta}|^2-\sigma_p^2\right)\right]\right\} \qquad (5.17)$$

where $\delta$ is defined as

$$\delta = \frac{1}{2}\,\alpha\,\frac{d\ln a}{d\ln\eta}\,\frac{\Delta\eta}{\eta} = \frac{1}{2}\,\alpha\,\frac{m\,\Delta\eta}{(1-m)\,\eta} \qquad (5.18)$$

and sets the strength of the Hubble damping of the scalar fields. Afterwards we have the evolution equations for the gauge conjugate momenta $E$ and $H$,

$$(1+\omega)\,E_i^{x,\eta+\frac{1}{2}} = (1-\omega)\,E_i^{x,\eta-\frac{1}{2}} + \Delta\eta\left[-\partial^-_j F_{ij} + 2e_{p0}^2\,a_\eta^{2\beta}\,\mathrm{Im}[\phi^* D^+_i\phi]^{x,\eta}\right] \qquad (5.19)$$

$$(1+\omega)\,H_i^{x,\eta+\frac{1}{2}} = (1-\omega)\,H_i^{x,\eta-\frac{1}{2}} + \Delta\eta\left[-\partial^-_j G_{ij} + 2e_{q0}^2\,a_\eta^{2\beta}\,\mathrm{Im}[\psi^* D^+_i\psi]^{x,\eta}\right] \qquad (5.20)$$

where $\omega$,

$$\omega = \delta\,(1-\beta) \qquad (5.21)$$

introduces an unphysical damping of the gauge field for any value of $\beta \neq 1$. In the case $\beta = 1$ the damping vanishes, as expected in the physically correct limit. To finish updating all the fields in a timestep, we then apply

$$\phi^{x,\eta+1} = \phi^{x,\eta} + \Delta\eta\,\Pi^{x,\eta+\frac{1}{2}} \qquad (5.22)$$

$$\psi^{x,\eta+1} = \psi^{x,\eta} + \Delta\eta\,\Pi_\psi^{x,\eta+\frac{1}{2}} \qquad (5.23)$$

$$A_i^{x,\eta+1} = A_i^{x,\eta} + \Delta\eta\,E_i^{x,\eta+\frac{1}{2}} \qquad (5.24)$$

$$B_i^{x,\eta+1} = B_i^{x,\eta} + \Delta\eta\,H_i^{x,\eta+\frac{1}{2}} \qquad (5.25)$$

again for all non-conjugate fields in all sectors.
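The scalar-sector updates (5.16) and (5.22) can be illustrated with a deliberately simplified toy: a 1D periodic lattice with the gauge links dropped (plain finite differences in place of covariant derivatives) and the q-sector frozen at its VEV. This is a hedged sketch of the stencil only, not the thesis's production CUDA code; all names and defaults are ours:

```python
# Toy 1D, ungauged sketch of the damped leapfrog kick (5.16) and field
# update (5.22); sigma_q^2 = 1 is hardcoded for the frozen q-sector.
def step_phi(phi, pi, dt, dx, a_beta2, lp0=2.0, sp2=1.0, kappa0=0.9,
             psi_abs2=None, delta=0.0):
    n = len(phi)
    if psi_abs2 is None:
        psi_abs2 = [1.0] * n              # q-sector sitting at its VEV
    pi_new, phi_new = [0j] * n, [0j] * n
    for x in range(n):
        # discrete Laplacian on a periodic lattice
        lap = (phi[(x + 1) % n] - 2 * phi[x] + phi[x - 1]) / dx**2
        force = lap - a_beta2 * phi[x] * (
            0.5 * lp0 * (abs(phi[x]) ** 2 - sp2) - kappa0 * (psi_abs2[x] - 1.0)
        )
        # (1 + delta) Pi_new = (1 - delta) Pi_old + dt * force
        pi_new[x] = ((1 - delta) * pi[x] + dt * force) / (1 + delta)
    for x in range(n):
        phi_new[x] = phi[x] + dt * pi_new[x]   # field update, Eq. (5.22)
    return phi_new, pi_new
```

A field sitting at the VEV with vanishing momentum is a fixed point of this update, as it should be.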
One non-trivial detail about defect simulations is that one is often constrained in the amount of dynamical range available to reach and characterize certain behaviors. For instance, to ascertain the scaling of all types of networks, the fields must first relax into string configurations, and these networks must then reach scaling, again for all types of strings in the simulation (pure p, pure q, or bound states thereof). There is no unique way to proceed in terms of generating initial conditions, but we can take some notes from the works of [2, 5] while applying some differences of our own. In all cases we will generate simple initial conditions, with both complex scalar fields having magnitudes set by the VEV and random phases. Similarly, we will also apply a diffusive phase to smooth out the large gradients present in such initial conditions, followed by a damping period to form string networks more quickly. The details of how these operations are applied (and when, in terms of conformal time) differ from the previous authors. We start with the diffusive cooling equations, taken from the cooling functions previously used in the Abelian-Higgs case, for the first sector,

$$\dot\phi = D_j D_j\phi - \frac{\lambda_{p0}}{2}\left(|\phi|^2-\sigma_p^2\right)\phi \qquad (5.26)$$

$$\dot F_{0j} = \partial_i F_{ij} - 2e_{p0}^2\,\mathrm{Im}[\phi^* D_j\phi] \qquad (5.27)$$

and interchangeably for the q-sector by, again, performing the appropriate substitutions ($\phi \to \psi$, $A_\mu \to B_\mu$, $\lambda_p \to \lambda_q$ and $e_p \to e_q$),

$$\dot\psi = D_j D_j\psi - \frac{\lambda_{q0}}{2}\left(|\psi|^2-\sigma_q^2\right)\psi \qquad (5.28)$$

$$\dot G_{0j} = \partial_i G_{ij} - 2e_{q0}^2\,\mathrm{Im}[\psi^* D_j\psi]\,. \qquad (5.29)$$

Note that while this method of cooling is equal to what was previously implemented for the Abelian-Higgs case (even ignoring the presence of the coupling term of the potential), it is different from what was applied in the works of [2, 5], which merely average each scalar field over its nearest neighbors 30 times.
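A one-step sketch of the diffusive cooling (5.26), again for an ungauged 1D periodic toy field (the gauge update (5.27) is omitted and the names are illustrative, not the thesis code):

```python
def cooling_step(phi, dtau, dx, lp0=2.0, sp2=1.0):
    """One explicit Euler step of the gradient-flow cooling (5.26) for an
    ungauged 1D periodic toy field."""
    n = len(phi)
    out = [0j] * n
    for x in range(n):
        lap = (phi[(x + 1) % n] - 2 * phi[x] + phi[x - 1]) / dx**2
        out[x] = phi[x] + dtau * (lap - 0.5 * lp0 * (abs(phi[x]) ** 2 - sp2) * phi[x])
    return out
```

Repeated application with small $\Delta\tau$ (the text uses $\Delta\eta = 1/30$ for the cooling phase) damps the large gradients of random-phase initial data while leaving a field at the VEV untouched.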
In practice, both serve to smooth over the initial condition gradients, and we see no indication throughout the validation procedure that using either method produces a discernible impact on network evolution. We can however explore the introduction of the coupling term in the potential to see if the initial formation of bound states changes. This cooling is applied from an initial conformal time of $\eta = -10.0$ until $\eta = 1.0$, in timesteps of size $\Delta\eta = 1/30$. Next comes an extra step, vital in preparing the initial conditions so that the network can reach scaling as fast as possible: a damping period. Previous work on $U(1)_L \times U(1)_L$ string network simulations has used the cosmological discretized equations (along with the same lattice spacing and timestep size) for this period, but with a fixed Hubble damping factor $\gamma = \dot a/a$ of either 0.2 [5] or 0.4 [2]. In our case the highly damped cosmological evolution will instead be set by varying the expansion rate. For instance, if it is set to $m = 0.9$ and damping is applied in a short, early period of time (from $\eta = 1.0$ until $\eta = 5.0$), the damping factor will be sufficiently large to relax the fields into a network quickly. The fact that this period is short also allows the network to spend most of its time evolving under matter or radiation expansion. We will set the duration of this phase such that $\gamma$ never falls below 0.4. Although [5] suggested that a sufficiently high damping would aid the formation of bound states (at an initial stage), we see no evidence of a large population of bound states forming at early stages of the network evolution.

5.2 Validation

We will now proceed with a comparison of the present simulations with the two literature references [2, 5].
We will use the same lattice spacing as [2] ($\Delta x = 0.5$, double the lattice spacing of [5]); the modified cooling and damping applied here mean we do not require a large lattice for this validation, and therefore the box size is kept at $1024^3$. This also has the advantage of providing the same dynamic range presented in the literature. As explained previously, the simulation parameters are consistent with equal tensions for the constituent strings and criticality in each sector: $\lambda_{p0} = \lambda_{q0} = 2$ and $e_{p0} = e_{q0} = 1$, such that $\lambda_{p0} = 2e_{p0}^2$ and $\lambda_{q0} = 2e_{q0}^2$. In the validation we will also force all constants to scale such that the comoving width of strings is kept constant throughout ($\beta = 0$). In addition, both symmetry breaking scales are equal and of value one, $\sigma_p = \sigma_q = 1$. These parameter choices essentially mean that basic strings of either sector have equal tensions, $\mu_p = \mu_q = 2\pi\sigma_p^2 = 2\pi\sigma_q^2$, unlike the cosmic superstring case where $\mu_p = \mu_F$ and $\mu_q = \mu_F/g_s$, with $\mu_F$ the fundamental string tension. The coupling constant $\kappa_0$ will be kept at 0.9, which according to [2] shows no significant change in pq-string abundances relative to $\kappa = 0.95$. We will conduct the validation procedure as we introduce all the necessary estimators for the mean string separation and mean velocity squared. Before we do so, however, there are two simple but necessary checks to be performed: first, the verification of Gauss's law for each $U(1)_L$ sector; and secondly, visual confirmation of the existence of more than two string networks. The verification of Gauss's law for each sector can be seen in Fig. 5.1 for a radiation era run with $\beta = 0$, starting at conformal time $\eta = 1.0$ and continuing throughout the damping phase and subsequent cosmological evolution. As can be observed, both versions of Gauss's law are preserved, at worst to one part in $10^7$. Note that this is true both in the damping and in the cosmological phase: there is no a priori reason for this not to hold during the damping phase.
A first approach to the visual confirmation of multiple string types can be made by using isosurfaces of the absolute value of each scalar field, $|\phi| < 0.5$ and $|\psi| < 0.5$, and by using isosurfaces of the interaction potential,

$$V_{\rm interaction} = \kappa\left(|\phi|^2-\sigma_p^2\right)\left(|\psi|^2-\sigma_q^2\right) = \zeta \qquad (5.30)$$

where $\zeta$ is a threshold, which takes the value 0.855 from [2] if one desires to exactly match the winding cells of bound states. The result can be seen in the two panels of Fig. 5.2 for a small lattice ($512^3$, $\Delta x = 0.5$) at conformal times $\eta = 101.5$ and $\eta = 127.5$. The red and blue surfaces correspond to the isosurfaces of the scalar fields, and the green surfaces to the interaction potential with a threshold of $\zeta = 0.655$ (lowered from 0.855 for visual purposes). Red surfaces that do not overlap with blue ones are indicative of strings of the first sector (p-strings in our nomenclature), and blue ones of the second type (q-strings). Here we visually confirm that this threshold locates reasonably well the possible overlaps of the two basic strings, indicative of bound states (pq-strings). The fact that bound states are far rarer than the basic strings is in qualitative agreement with the conclusion of [2], where bound states constituted no more than 2% of the total string length. We will return to the discussion of bound state abundances, and to using windings to pinpoint pq-strings, in a later subsection.

Fig. 5.1 The panel shows the violation of Gauss's law throughout the evolution of a $512^3$, $\Delta x = 0.5$ matter epoch simulation. Red indicates the p-string sector, blue the q-string sector

5.2.1 On Average Network Quantities

Although this already seems indicative of the expected behavior of the network, we must also assess which types of bound states exist and characterize them (in terms of relative abundances), as well as define suitable lengthscale estimators for each network type.
We can do so by using the winding estimator from the previous chapters,

$$\xi_p = \sqrt{\frac{V}{L_p}}\,,\qquad \xi_q = \sqrt{\frac{V}{L_q}}\,,\qquad \xi_{pq} = \sqrt{\frac{V}{L_{pq}}} \qquad (5.31)$$

where $L_p$ and $L_q$ are given via the sum of non-zero plaquettes computed in the respective sector (so using either $\phi, A_\mu$ or $\psi, B_\mu$), with the length of string in bound states, $L_{pq}$, subtracted. $L_{pq}$ corresponds to cells where both types of winding overlap. We leave the details of this computation to the next section. For now we merely add that, with the parameter choices made for the validation procedure (strings of equal tension), we do not observe any bound state with winding two or higher (in any sector), meaning pq-strings are, for the intents and purposes of this section, (1,1) bound states. This already enables us to detect scaling behavior for each relevant string species, although we can additionally define length estimators for the full network (all string types included together).

Fig. 5.2 The two panels show isosurfaces of the absolute magnitude of the scalar field $\phi$ (blue, representing p-strings) and of the scalar field $\psi$ (red, representing q-strings), set at $|\phi| = |\psi| = 0.5$, together with isosurfaces of the interaction potential $\zeta = 0.655$ (in green, representing pq-strings). The snapshots come from a matter epoch simulation of size $256^3$, $\Delta x = 0.5$ at two conformal times, $\eta = 101.5$ and $\eta = 127.5$, with no treatment applied to the initial conditions (which is why it needed to be evolved past half a light-crossing time to reach scaling)

Fig. 5.3 The two panels show the evolution of the mean string separation for the full network, using either the Lagrangian length estimator (in orange) or the winding length estimator (in purple), for both radiation and matter epochs (left and right-hand-side panels, respectively). In the case of the winding estimator, the length of pq-segments is computed using the fast method described in the next section
This was done in [2] via a combined winding estimator,

$$\xi_W = \sqrt{\frac{V}{L_p + L_q + L_{pq}}} \qquad (5.32)$$

or via the Lagrangian length estimator also used in previous chapters and throughout the Abelian-Higgs simulations,

$$\xi_{\mathcal{L}} = \sqrt{\frac{-\mu V}{\sum_x \mathcal{L}}} \qquad (5.33)$$

where $V$ is the box volume, $\mu$ the tension of either basic string type¹ and $\sum_x \mathcal{L}$ is a lattice-wide sum of the Lagrangian computed at every site. For now we will merely state that both full-network mean string separation estimators show scaling behavior, as can be seen in Fig. 5.3, albeit the details of scaling, namely the rate of change of $\xi$, differ by about 10% when comparing $\xi_W$ and $\xi_{\mathcal{L}}$. There is no literature reference for these values, and therefore no direct comparison can be made for the full-network estimators. Given that there are two methods for the computation of $L_{pq}$ (and this quantity can impact the computation of $\xi_W$), we leave a more thorough discussion of $\dot\xi$ for the next section. We remark that although our end goal is merely to study the abundance of bound states in these simulations, and the impact of certain numerical choices upon it, we will also use the velocity estimators of [2] as an additional validation source.

¹ The tension of the (1,1) states is different; however, because the most abundant string types have exactly the same tension, this estimator still gives a good indication of the scaling behavior of the network. If we wish to test the specific case where the basic constituent strings have unequal tensions, this estimator might not be the most reasonable choice.
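The length estimators above reduce to a few lines; the sketch below (our naming, not the thesis code) assumes $L_p$ and $L_q$ already have $L_{pq}$ subtracted, as in the text, and that the Lagrangian sum is negative so that $\xi_{\mathcal{L}}$ is real:

```python
import math

def xi_winding(volume, L_p, L_q, L_pq):
    """Mean string separations of Eqs. (5.31)-(5.32)."""
    return {
        "xi_p": math.sqrt(volume / L_p),
        "xi_q": math.sqrt(volume / L_q),
        "xi_pq": math.sqrt(volume / L_pq),
        "xi_W": math.sqrt(volume / (L_p + L_q + L_pq)),
    }

def xi_lagrangian(volume, mu, lagrangian_sum):
    """Lagrangian length estimator of Eq. (5.33); lagrangian_sum < 0."""
    return math.sqrt(-mu * volume / lagrangian_sum)
```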
We can take the previously defined (and used in previous chapters) scalar field velocity estimator, where the mean string velocity is given by

$$\langle v^2\rangle_\phi = \frac{2R}{1+R}\,, \qquad (5.34)$$

where $R$, in [2], is given by

$$R = \frac{\sum_x |\Pi|^2\,W}{\sum_{x,i} |D^+_i\phi|^2\,W} \qquad (5.35)$$

although here we opt for a definition that takes into account both scalar fields (i.e. both string sectors),

$$R = \frac{\sum_x \left(|\Pi|^2 + |\Pi_\psi|^2\right)W}{\sum_{x,i} \left(|D^+_i\phi|^2 + |D^+_i\psi|^2\right)W} \qquad (5.36)$$

where $W$ is a weight function, meant merely to localize the estimator around strings. We will refer to this as the field-based velocity estimator. The weight function can be used to obtain the mean velocity of all strings, merely by choosing it to be equal to the Lagrangian, or the mean velocity of only bound-state strings, by choosing the weight function to be given by the interaction potential. The evolution of both velocity estimates is presented in Fig. 5.4, with the corresponding asymptotic values computed for the conformal time range $\eta \in [230.0, 256.0]$.

Fig. 5.4 The two panels show the evolution of the mean squared velocity $v^2_W$ for either the full network (by specifying the full Lagrangian as a weight function, in orange) or the pq-segments (by using the interaction potential as the weight, colored in green). The left panel corresponds to the radiation epoch, the right to the matter epoch

Equivalently, the same information on these asymptotic values can be found in Table 5.1, along with the asymptotic values of pure Abelian-Higgs simulations (only for the full network, same velocity estimator) and the directly comparable figures of [2]. We note that [5] performs no velocity estimates.

Table 5.1 The asymptotic values of the mean velocity squared $v^2_W$ for either the full network (weighted by the Lagrangian) or pq-segments (weighted by the interaction potential), for the simulations from this section, Abelian-Higgs simulations from Chap. 3, and pq-string simulations from [2]

Size, Δx         | m   | ⟨v²⟩_pq       | ⟨v²⟩_L        | Reference
1024³, Δx = 0.5  | 1/2 | –             | 0.272 ± 0.002 | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 1/2 | 0.319 ± 0.008 | 0.293 ± 0.006 | This section
1024³, Δx = 0.5  | 1/2 | ∼0.33         | 0.306 ± 0.004 | [2]
512³, Δx = 1.0   | 1/2 | –             | –             | [5]
1024³, Δx = 0.5  | 2/3 | –             | 0.228 ± 0.004 | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 2/3 | 0.247 ± 0.006 | 0.253 ± 0.009 | This section
1024³, Δx = 0.5  | 2/3 | ∼0.27         | 0.264 ± 0.006 | [2]
512³, Δx = 1.0   | 2/3 | –             | –             | [5]

First let us discuss $v^2_L$. In the matter and radiation epochs ($m = 2/3$ and $m = 1/2$, respectively) the values are indeed compatible within $1\sigma$ uncertainties, although lower than the ones presented in [2]. For the case of $v^2_{pq}$ there is an important difference to note: in [2] the velocities are given as a range, since over the conformal time period in which the network reaches scaling (based on full-network estimators) the velocity of pq-strings decreases. Here we observe a different behavior: the velocities increase throughout this range. Possibly this can be attributed to the different preparation of initial conditions, specifically to the different damping applied in our case. If we compute the values from the last 20.0 conformal time units, however, we can compare to the lower bound of the ranges provided in [2]. Assuming uncertainties comparable to ours for this lower bound, we see that the velocities are compatible in the radiation epoch, while underestimated in the matter era (by about 10%). This can again be attributed to the damping period, given that it is not entirely clear (see again Fig. 5.4) whether the velocity has completely stabilized; more dynamic range may thus be required for it to reach its asymptotic value.
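The field-based estimator (5.34)-(5.36) can be sketched as follows, with the lattice flattened into lists and the weight $W$ passed explicitly (Lagrangian weighting for the full network, interaction-potential weighting for pq-segments). Naming is ours, not the thesis code:

```python
def mean_v2(pi2_phi, pi2_psi, grad2_phi, grad2_psi, weight):
    """Field-based velocity estimator of Eqs. (5.34)-(5.36): R is a weighted
    ratio of conjugate-momentum to gradient energies over the lattice, here
    flattened into per-site lists; returns <v^2> = 2R/(1+R)."""
    num = sum(w * (p + q) for w, p, q in zip(weight, pi2_phi, pi2_psi))
    den = sum(w * (gp + gq) for w, gp, gq in zip(weight, grad2_phi, grad2_psi))
    R = num / den
    return 2 * R / (1 + R)
```

Equal momentum and gradient energies give $R = 1$ and hence $\langle v^2\rangle = 1$, the ultra-relativistic limit of this estimator.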
We will now move to the (more complicated) task of computing the length of bound-state strings, of detecting scaling of bound states, and of computing the relative abundances of pq-strings.

5.2.2 On Locating pq-Segments

The computation of the total length of pq-strings throughout the box might seem like the relatively simple task of detecting plaquettes of exactly double winding (one winding of each type). However, there can be "accidental" displacements of windings, such that the two strings still overlap but the windings sit, for instance, one site apart. On the other hand, one can have "accidental" crossings at a given timestep of, say, only one plaquette, which at the next timestep no longer overlap; in that case we can reasonably state that no bound-state string formed. To solve these issues, the authors of [2] considered p- and q-strings to be overlapped if within a transverse distance of four lattice units, while [5] previously used a maximum intersegment distance (rather than the transverse distance) of 5. In addition, [2] and [5] also imposed a minimum threshold on the length of segments, requiring that proper bound states have a minimum length of $L_{pq} = 3$ or $L_{pq} = 20$, respectively. We will present here our two methods for the computation of the length estimator, which differ slightly from those in the literature but give rise to similar conclusions. The first thing we must mention is that we consider cells pierced by strings, without specifying the plaquettes themselves. This is less memory-intensive, although it gives no information about the orientation of different strings in either the same cell or neighbouring cells.
The first method is implemented as a custom CUDA kernel and is meant to be no more than a fast, approximate (and less robust) computation of the length itself, while the second requires the in-situ visualization capabilities of our simulation. Note that the second is more time-consuming, a result of being more IO-intensive, although the method itself is more robust, for reasons we shall explain next. For the fast computation we merely store, in two separate arrays (one for each sector), whether a cell is pierced by a winding (with the content of each cell set by the total magnitude of the windings that pierce it). We then verify whether cells pierced in both arrays are either non-zero at the same location or displaced by one site in any direction. We then sum the number of cells pierced by both types of strings and use this as an indication of the number of segments that constitute the length. This method is less robust overall for two reasons:

• The cells of an overlap region should give rise to a collection of segments of length $\Delta x \times (N_{\rm cells} - 1)$. Since this kernel merely detects overlap site-by-site, but does not attempt to cluster cells into regions (this would in fact have to be done a posteriori, via some clustering algorithm), we directly use the overall number of cells. So, in a sense, there is an "off-by-one" systematic error included here. This is solved by the robust method, where cells are clustered into separate regions.

• Since no clustering is performed, it is not possible to apply a lower threshold on the number of cells a segment should contain in order to be considered a proper bound state. This again is easily solved by the robust method.

We could argue that these biases might not be significant, or at least have a reasonable chance of yielding the expected results: $L_{pq}$ will be dominated by large strings (and not by small, few-cell strings), and having an off-by-one error in large segments will not incur a significant difference in the total string length.
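The detection logic of the fast method can be sketched in plain Python (the actual implementation is a CUDA kernel; whether the one-site displacement check uses only face neighbours or the full 27-cell neighbourhood is our assumption — we use the latter here, and cell sets stand in for the two winding arrays):

```python
def fast_pq_cells(p_winds, q_winds):
    """Fast (approximate) bound-state detection: a q-pierced cell counts as
    overlapping if a p-pierced cell sits at the same site or one site away
    in any direction. p_winds and q_winds are sets of (i, j, k) cells
    pierced by windings of each sector; returns the overlapping q-cells."""
    offsets = [(dx, dy, dz)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
    overlap = set()
    for (i, j, k) in q_winds:
        if any((i + dx, j + dy, k + dz) in p_winds for dx, dy, dz in offsets):
            overlap.add((i, j, k))
    return overlap
```

The fast $L_{pq}$ estimate is then $\Delta x$ times the number of returned cells, with the off-by-one and minimum-length caveats discussed above.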
Nonetheless, the robust method also has an advantage, as we shall see next: we can choose the overlap to occur at greater distances than just one cell over. This is due to the flexibility of the filter that identifies overlap between two data sources in Paraview. However robust and flexible the second method may be, the first one already yields reasonable results, in agreement with the literature. In order to show this explicitly, let us define some additional quantities besides the estimators $\xi_p$, $\xi_q$ and $\xi_{pq}$.² Similarly to [2], one can compute two fractions which show the relative abundance of bound states. First is the total fraction of bound states, $f_{\rm total}$,

$$f_{\rm total} = \frac{L_{pq}}{L_{pq} + L_p + L_q} \qquad (5.37)$$

which is expected not to exceed 1–2% when scaling is reached. We can also compute this abundance relative to the length of one of the constituent strings, for instance relative to $L_p$. In this case, and following the definition and notation of [2],

$$f_p = \frac{L_{pq}}{L_p}\,. \qquad (5.38)$$

All mean string separation estimators $\xi_p$, $\xi_q$ and $\xi_{pq}$, and both fractions $f_{\rm total}$ and $f_p$, can be seen in Fig. 5.5 in both the radiation (left-hand-side) and matter (right-hand-side) epochs, having used the fast method to compute $L_{pq}$. All string separations achieve scaling, although the behavior of $\xi_{pq}$ is far less "smooth" than that of $\xi_p$ and $\xi_q$. Even though we could be tempted to attribute this effect to the two systematic error sources listed previously, it might instead be related to the low abundance of bound states. In fact, even in the literature [2] the "jagged" behavior of $\xi_{pq}$ can be observed. We note as well that the fraction $f_{\rm total}$ always stays within 1–2%, as seen in the aforementioned literature references, and that it seems to be slightly larger in the matter era (bottom panels of Fig. 5.5).
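The two abundance fractions of (5.37)-(5.38) are a one-liner each; a sketch with our naming:

```python
def bound_state_fractions(L_p, L_q, L_pq):
    """Relative abundances of pq-strings, Eqs. (5.37)-(5.38)."""
    return {
        "f_total": L_pq / (L_pq + L_p + L_q),
        "f_p": L_pq / L_p,
    }
```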
The fraction of bound states relative to the length of p-strings is also in agreement with the literature values, as in the matter era it is about 4%. We can now discuss the asymptotic values of several mean string separations. Note that this still assumes the fast computation, and we will need to re-assess this comparison with the robust method. Table 5.2 compares the asymptotic rate of change of $\xi$ for the full network, $\xi_W$, for p-strings (equivalently q-strings), $\xi_p$, and for bound states, $\xi_{pq}$. For comparison we add the Abelian-Higgs values, where the full network and p-string (for instance) estimators coincide, and for the full network and p-strings we compare to the work of [5]. Note that we do not have any literature values to compare $\dot\xi_{pq}$ with, and therefore those values are presented for reference only; in the aforementioned reference, the typical distance between Y-junctions is computed in a very different way, using the number of pq-segments. The values of $\dot\xi_W$ and $\dot\xi_p$ are both larger than those presented by [5]. Note that the addition of error bars to the values of [5] could certainly lessen the disagreement, although we suspect more factors play a part here. One thing to note is the resolution of the literature simulation, $512^3$ with lattice spacing $\Delta x = 1.0$, which could affect the reliability of the mean string separation estimates and of the evolution of the fields themselves. We do not think the difference in damping or in bound state identification can explain this tension, as the fractions $f_{\rm total}$ and $f_p$ are in agreement with [5]. Moreover, given the small values of $f_{\rm total}$ and $f_p$, the identification of bound states cannot impact $\xi_p$, $\xi_q$ and $\xi_W$ significantly (note that this might not be the case with $\xi_{pq}$).

² Remember that the lengths $L_p$ and $L_q$ require subtraction of $L_{pq}$ and are therefore dependent on bound state identification. This is the reason we place the discussion of the asymptotic behavior of $\xi_{p,q}$ in this section, and not in the previous one.

Fig. 5.5 All panels showcase the evolution of several quantities derived from the total length of pq-strings present in the box, $L_{pq}$, computed with the fast method. The mean string separations $\xi_p$ and $\xi_q$ are shown in the two upper plots (red and blue, respectively); $\xi_{pq}$ in the middle panels (in green); the relative abundances of pq-strings, $f_{\rm total}$ and $f_p$, in the lower panels (in purple and red, respectively). As in previous figures, the left-hand side corresponds to the radiation epoch, the right-hand side to the matter epoch

Table 5.2 The asymptotic rate of change of the mean string separation $\xi$: for the full network using windings, $\dot\xi_W$; for p-strings only, $\dot\xi_p$; and for pq-strings, $\dot\xi_{pq}$. For comparison we provide both the literature values (which have no reported uncertainties) and the Abelian-Higgs values (where the full network estimator is equivalent to having only a single string type, say p-strings)

Size, Δx         | m   | ξ̇_W          | ξ̇_p          | ξ̇_pq         | Reference
1024³, Δx = 0.5  | 1/2 | 0.280 ± 0.026 | = ξ̇_W        | –             | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 1/2 | 0.194 ± 0.026 | 0.270 ± 0.050 | 2.488 ± 0.612 | This section, fast method
1024³, Δx = 0.5  | 1/2 | –             | –             | –             | [2]
512³, Δx = 1.0   | 1/2 | 0.15          | 0.22          | –             | [5]
1024³, Δx = 0.5  | 2/3 | 0.279 ± 0.016 | = ξ̇_W        | –             | Abelian-Higgs, Chap. 3
1024³, Δx = 0.5  | 2/3 | 0.194 ± 0.022 | 0.277 ± 0.042 | 1.634 ± 0.721 | This section, fast method
1024³, Δx = 0.5  | 2/3 | –             | –             | –             | [2]
512³, Δx = 1.0   | 2/3 | 0.15          | 0.21          | –             | [5]

Before we move to the characterization of the robust method, there is an additional quantity which has been used to characterize pq-string field theory simulations: the average physical length of pq-segments.
It can be defined as follows,

$$l_{pq} = \frac{L_{pq}}{N_{pq}} \qquad (5.39)$$

where $N_{pq}$ is the number of pq-segments and $L_{pq}$ is now defined as $L_{pq} = (N_{\rm cells} - 1)\times\Delta x$. We additionally consider only regions with $N_{\rm cells}$ larger than 7 (equivalently, $L_{pq} > 3.0$). Here we note another disadvantage of the fast method: since it does not classify windings into their respective segments, it does not compute $N_{pq}$, meaning that in order to study this quantity we must use the robust method.

We can now introduce the robust method. It relies heavily on the visualization tooling introduced in Chap. 3: the Catalyst adaptor was changed to output cells pierced by windings of the first sector, windings of the second sector, the interaction potential $V_{\rm int}$, and cells identified as pierced by pq-segments via the fast method. We further note that the value of each cell depends on the absolute number of windings that pierce the cell through its plaquettes: one p-string at criticality, with $\lambda_p = 2e_p^2 = 2$, will pierce two faces of a cell, resulting in an output value of 2. This is important, as strings with higher windings, say a (2,0) string, would appear as a collection of cells valued at 4 for p-winding cells, but with no corresponding q-winding cells. This also leads us to note that no (2,0), (0,2) or other higher-winding bound states were found. In total we end up with four (optional) output files per timestep: one per winding type (p, q or pq) and one for the interaction potential isosurface. All outputs are then passed to a custom pipeline, adapted from the centerlines script of Chap. 3, which visualizes not only an isosurface of $V_{\rm int}$ but also both types of cells and finds overlapping cells (pq-strings), after which a connectivity filter is applied to separate each individual string segment.
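The robust pipeline runs inside Paraview, but its counting logic can be sketched in plain Python: cluster overlap cells into 26-connected components, drop segments below the minimum size, and apply the off-by-one correction $L = (N_{\rm cells}-1)\Delta x$. This is a stand-in for the connectivity-filter stage, not the thesis pipeline itself; the default `min_cells=8` corresponds to the $N_{\rm cells} > 7$ threshold quoted above:

```python
def pq_segment_stats(cells, dx=0.5, min_cells=8):
    """Cluster overlap cells into segments via flood fill (26-connectivity),
    apply the off-by-one correction L = (Ncells - 1) * dx and a minimum-size
    threshold, and return (L_pq, N_pq, l_pq)."""
    cells = set(cells)
    nbrs = [(di, dj, dk) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            for dk in (-1, 0, 1) if (di, dj, dk) != (0, 0, 0)]
    segments = []
    while cells:
        stack = [cells.pop()]
        comp = {stack[0]}
        while stack:
            i, j, k = stack.pop()
            for di, dj, dk in nbrs:
                n = (i + di, j + dj, k + dk)
                if n in cells:
                    cells.remove(n)
                    comp.add(n)
                    stack.append(n)
        if len(comp) >= min_cells:
            segments.append(comp)
    L_pq = sum((len(c) - 1) * dx for c in segments)
    N_pq = len(segments)
    l_pq = L_pq / N_pq if N_pq else 0.0
    return L_pq, N_pq, l_pq
```

A straight run of eight cells survives the threshold and yields one segment of length $3.5$ at $\Delta x = 0.5$, while an isolated two-cell blob is discarded.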
From this we can perform (and output) the computations of $L_{pq}$ and $N_{pq}$, also correcting for the possible systematic error sources described above. We further remark that adjusting the size of the overlap region is crucial for better bound state identification, as is immediately obvious from the comparison of the top (interaction potential), middle (exact cell overlap) and bottom (one-cell tolerance) panels of Fig. 5.6. Although both identification mechanisms are clearly reasonable, it is also clear that in some cases not allowing some margin for the overlap inevitably fails to identify all cells pierced by bound states. We then present the average length of pq-segments, $l_{pq}$, and the mean string separation $\xi_{pq}$, computed via this method while introducing one-by-one each of the choices made to lessen systematic effects. For this comparison we will use a single $512^3$ matter epoch run and compute these quantities first with no correction, then correcting the off-by-one error, and finally also applying a minimum threshold on the length of segments. Note that we keep only one cell of overlap on purpose, so that the uncorrected case is equal to the fast method. The results can be seen in Fig. 5.7. It is immediately obvious that the off-by-one correction does not alter $l_{pq}$ heavily. This is somewhat expected, as the number of segments remains the same and the length of each segment is merely reduced by $\Delta x$. The second correction, however, does significantly alter the average length, which makes sense for two reasons: first, the threshold value is higher than the average uncorrected $l_{pq}$; and second, removing small segments might not change $L_{pq}$ drastically, but it does reduce the number of segments $N_{pq}$. Given that there is a large number of small segments (as can be seen visually in Fig. 5.6), setting a minimum length threshold will alter the number of segments, and consequently $l_{pq}$.
Curiously, $l_{pq}$ seems to be stable, whereas in [2] it begins to evolve linearly around $\eta \sim 150$ for the normal simulations (in the combined simulations of [2], $l_{pq}$ is also stable). This might be due to different identification criteria, but it might simply be a result of the smaller dynamic range combined with different initial conditions. Speaking of different criteria, remember that a higher minimum segment length ($L_{pq} = 20$) is used in [5], and a maximum distance between windings of 4 lattice units in the work of [2]. There is of course no unique way to establish each of these criteria (only trial and error and visualization), and, as is clear, $l_{pq}$ is highly sensitive to them. Using the one-cell-over overlap, we find our choice of minimum segment length to be reasonable at cleaning out small segments. Comparing to the case of [5], linear evolution also occurs around $\eta < 100$, but no data is available before that. We also need to discuss $\xi_{pq}$. Each of the corrections increases its value by about $\sim$10% at all timesteps, although it does not significantly alter the tendency of $\xi_{pq}$. It therefore seems the fast method is reasonable at revealing the scaling evolution of pq-strings. We additionally note that the abundances $f_p$, $f_{\rm total}$ and the mean string separations $\xi_W$, $\xi_p$, $\xi_q$ are not significantly altered, given that $L_{pq}$ is much smaller than $L_p$, $L_q$.

Fig. 5.6 Different views of a simulation snapshot of a $512^3$, $\Delta x = 0.5$ simulation at conformal time $\eta = 101.0$ in the matter epoch. Cells pierced by p-strings are color-coded in blue, those pierced by q-strings in red, and pq-strings in green. In the left-hand-side panels we display all string types: p-strings, q-strings and pq-segments. In the right-hand-side panels we show only pq-strings.
The identification of bound states is done either via the interaction potential isosurface $\zeta = 0.855$ (top) or via winding cell overlap (exact overlap in the middle panel, one cell of tolerance in the lower panel, obtained via the fast method). We further note that no regions of cells are being excluded here via a minimum threshold on the length of the bound state

Fig. 5.7 Plots showing the evolution of $l_{pq}$ and $\xi_{pq}$ in the uncorrected case (which corresponds to the fast method), when correcting for the off-by-one error, and when additionally correcting for a minimum segment length

Overall we conclude that the fast method allows a lightweight exploration of the dynamics of a cosmic superstring network, with the exception of the computation of $l_{pq}$, the average pq-segment length. Note that this will of course carry corrections of order $\sim$20% to $\xi_{pq}$, although the expected conclusions do not change.

5.3 Impact of Physical Evolution

We will now assess the impact of varying the comoving string width behavior on bound state abundance and network properties. To do so we will increase the lattice size to $4096^3$, keeping the same lattice spacing $\Delta x = 0.5$ and conformal timestep size $\Delta\eta = 0.1$. This update gives us more dynamic range, but even then the "true" comoving string width $r_s$ would vary too quickly in both the radiation and matter epochs ($r_s \propto a^{-2\beta}$ with $\beta = 1$) for us to be able to resolve the string network at either late or early times. Therefore, we will use the core growth trick of [1] to allow the string width to grow in the initial stages of the simulation by setting $\beta < 0$. The way we choose these values is to pick a maximum string width, normalize the scale factor such that the radius is unity at the end, and from this take note of the maximum radius at some transition time (we choose to fix either one or the other).
The transition time is when the value of β jumps from core growth to physical evolution. The value of β for the growth phase is chosen such that the simulation begins with unit radius and reaches the maximum radius at the transition time. For the same choice of maximum radius, and again the same normalization at the end of the simulation, the radius changes more quickly in matter than in radiation during physical evolution, and as such the transition time would tend to be later. However, we would like to have the same dynamic range in either core growth or physical evolution, in both matter and radiation. As such, we fix the transition time and instead vary the maximum radius. Additionally, before the growth phase there is a period of diffusive and damped evolution, as applied in the validation section. Nonetheless we do make one small change, given the over-damping noted in matter epoch in the validation section: we use a lower damping expansion rate of m = 0.75 for matter, while we use m = 0.95 for radiation era. During the damping period, to simplify our lives, we keep the comoving width constant (r_s = 1). A summary of all parameter choices can be seen in Table 5.3.

Table 5.3 A summary of the choices made to prepare the damping of the initial conditions, the core growth phase, the conformal time at which one changes to physical evolution and the maximum string radius, for physical simulations (β = 1) and constant comoving width ones (β = 0), in both matter and radiation epochs

m    | m_damping | β_growth    | β | η_transition | r_s,max
1/2  | 0.95      | −0.25984202 | 1 | 343.0        | 3.0
1/2  | 0.95      | 0.0         | 0 | −            | 1.0
2/3  | 0.75      | −0.25984981 | 1 | 343.0        | 9.0
2/3  | 0.75      | 0.0         | 0 | −            | 1.0
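As a worked example of this prescription, consider a simple power-law sketch: assume the comoving radius scales as r_s ∝ a^(−β) within each phase, that a ∝ η in radiation and a ∝ η² in matter, and that the growth phase starts at η_start = 5 (an assumed value; in practice this is fixed by the damping setup). Requiring r_s = 1 at the start of the growth phase and r_s = r_s,max at η_transition = 343 then yields growth exponents close to the β_growth values of Table 5.3:

```python
import math

def growth_beta(r_max, a_start, a_transition):
    """Core-growth exponent: with r_s ∝ a^(-beta), pick beta so the comoving
    radius grows from 1 (start of growth phase) to r_max (transition time)."""
    return -math.log(r_max) / math.log(a_transition / a_start)

eta_start, eta_tr = 5.0, 343.0                        # eta_start is assumed
beta_rad = growth_beta(3.0, eta_start, eta_tr)        # radiation: a ∝ η,  r_max = 3
beta_mat = growth_beta(9.0, eta_start**2, eta_tr**2)  # matter:    a ∝ η², r_max = 9
# Under these assumptions both come out near -0.2598.
```

By construction, (a_transition/a_start)^(−β) returns r_max, which is the self-consistency the simulation setup relies on; the small residual differences with respect to the table presumably reflect details of the damping phase not modeled here.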
For comparison purposes, we will also evolve constant comoving width simulations with the same periods of diffusive and damped evolution (same damping expansion rate), and the same set of 10 initial conditions is used for all choices (matter, radiation, physical and constant width simulations). As such we will have three different cases to compare: β = 0, β = 1 and β < 0. As lowering β from 1 breaks the time-invariance of the discretized action, one can also think of it as a parameter that controls the violation of energy conservation. Given the role of kinematics in junction dynamics (see [3]), it is not entirely unexpected that changing β might have an impact on the formation/destruction of pq-segments. In this sense, the core growth period, which is a necessary "evil" for shrinking width simulations, ends up giving us an additional source of comparison. Let us begin with the full network mean string separation, shown in the panels of Fig. 5.8 for both physical (and core growth) and constant comoving width simulations (top and middle panels, respectively), for both radiation (left-hand-side) and matter epoch (right-hand-side). Overall there is reasonable agreement between physical and comoving width simulations in the asymptotic values of ξ̇, computed for the conformal time range η ∈ [950, 1024]. Again we see a ∼10% difference between the two length estimators ξ_L and ξ_W. This agreement is not entirely unexpected, and neither is the existence of a different slope in the core growth regime. This is particularly obvious if we compute the ratio between the quantities in the upper and lower panels: in core growth the difference between it and PRS keeps increasing, while this tendency is inverted as soon as we shift to physical evolution. This by itself tells us very little about the behavior of bound states in core growth + physical simulations, and therefore we must examine other quantities.
Before we discuss relative abundances and the mean string separation of each species of string, we first take a small detour to discuss velocities, computed either for the full network (Lagrangian weighted) or for pq-segments (interaction potential weighted).

Fig. 5.8 The four panels show the evolution of the mean string separation for the entire network (all string species included), using either the Lagrangian length estimator (in orange) or the winding length estimator (in purple). Left panels correspond to radiation epoch, right to matter epoch. Top panels use core growth and subsequent physical evolution, lower panels correspond to constant comoving width simulations. L_pq is computed via the fast method

The evolution of these quantities in both growth+physical and constant comoving width simulations can be found in the top and middle panels of Fig. 5.9, respectively. In the aforementioned figure and in Table 5.4 we can find the asymptotic values of the velocities, computed in the conformal time range η ∈ [950, 1024]. Note that in some cases the velocities are not exactly stable and are still decreasing in the range used. This is the case for v²_pq, which keeps decreasing in all cases

Fig. 5.9 The panels show the evolution of the mean squared velocity v²_W for either the full network (by specifying the full Lagrangian as a weight function, in orange) or the pq-segments (by using the interaction potential as the weight, colored in green). Top panels use core growth and subsequent physical evolution, lower panels correspond to constant comoving width simulations.
Left panels correspond to radiation epoch, right to matter epoch

Table 5.4 The asymptotic values of the mean velocity squared v²_W for either the full network (weighted by the Lagrangian) or pq-segments (weighted by the interaction potential), for the simulations from this section, the Abelian-Higgs simulations from Chap. 3, and the pq-string simulations from [2]

β | m   | Size, Δx        | v²_pq         | v²_L          | Reference
0 | 1/2 | 4096³, Δx = 0.5 | 0.307 ± 0.005 | 0.298 ± 0.005 | This section
0 | 1/2 | 1024³, Δx = 0.5 | 0.319 ± 0.008 | 0.293 ± 0.006 | Previous section
0 | 1/2 | 1024³, Δx = 0.5 | ∼0.33         | 0.306 ± 0.004 | [2]
1 | 1/2 | 4096³, Δx = 0.5 | 0.309 ± 0.008 | 0.298 ± 0.004 | This section
0 | 2/3 | 4096³, Δx = 0.5 | 0.254 ± 0.005 | 0.250 ± 0.004 | This section
0 | 2/3 | 1024³, Δx = 0.5 | 0.247 ± 0.006 | 0.253 ± 0.009 | Previous section
0 | 2/3 | 1024³, Δx = 0.5 | ∼0.27         | 0.264 ± 0.006 | [2]
1 | 2/3 | 4096³, Δx = 0.5 | 0.249 ± 0.006 | 0.246 ± 0.006 | This section

Table 5.5 The asymptotic rate of change of the mean string separation ξ, for the full network using windings (ξ̇_W), for p-strings only (ξ̇_p) and for pq-strings (ξ̇_pq).
For comparison we provide both the literature values (which have no reported uncertainties) and the Abelian-Higgs values (where the full network estimator is equivalent to having only a single string type, say p-strings)

β | m   | Size, Δx        | ξ̇_W          | ξ̇_p          | ξ̇_pq         | Reference
0 | 1/2 | 4096³, Δx = 0.5 | 0.172 ± 0.006 | 0.242 ± 0.010 | 1.501 ± 0.375 | This section, fast method
0 | 1/2 | 1024³, Δx = 0.5 | 0.194 ± 0.026 | 0.270 ± 0.050 | 2.488 ± 0.612 | Previous section, fast method
0 | 1/2 | 512³, Δx = 1.0  | 0.15          | 0.22          | –             | [5]
1 | 1/2 | 4096³, Δx = 0.5 | 0.179 ± 0.008 | 0.267 ± 0.015 | 2.012 ± 0.012 | This section, fast method
0 | 2/3 | 4096³, Δx = 0.5 | 0.174 ± 0.007 | 0.248 ± 0.007 | 1.460 ± 0.240 | This section, fast method
0 | 2/3 | 1024³, Δx = 0.5 | 0.194 ± 0.022 | 0.277 ± 0.042 | 1.634 ± 0.721 | Previous section, fast method
0 | 2/3 | 512³, Δx = 1.0  | 0.15          | 0.21          | –             | [5]
1 | 2/3 | 4096³, Δx = 0.5 | 0.194 ± 0.011 | 0.286 ± 0.021 | 1.573 ± 0.292 | This section, fast method

except in matter era with constant comoving width. Again we see agreement in this time range for both physical and PRS simulations; however, from the figures one can also infer that there is a clear disagreement between core growth pq-string velocities and constant width ones. To quantify the disagreements in the core growth and physical eras we can compute the ratio between the quantities in the upper and middle panels. The lower panels show that physical evolution and constant comoving width are closer to each other than the core growth phase is to either (Table 5.5). We now move on to discuss the mean string separations of each string species, ξ_p, ξ_q and ξ_pq, along with the relative abundances. All figures for the relevant quantities can be found in Fig. 5.10 for physical simulations, with the equivalent PRS ones in Fig. 5.11, for both radiation and matter epochs. The asymptotic rates of change of ξ can be found in their corresponding panels, and are also summarized in Table 5.5.
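The late-time statistics quoted in Tables 5.4 and 5.5 come in two flavors: weighted mean squared velocities, and asymptotic slopes of ξ(η) over the window η ∈ [950, 1024]. A minimal sketch of both estimators follows; the field arrays, window choice and fit details are illustrative stand-ins for the actual pipeline, which also propagates uncertainties over the 10 runs.

```python
import numpy as np

def mean_v2(v2_local, weight):
    """Weighted mean squared velocity over the lattice: <v²> = Σ w v² / Σ w.
    `weight` is the Lagrangian density for the full-network estimator, or the
    interaction potential to localize the average on pq-segments."""
    w = np.asarray(weight, dtype=float)
    return float((w * np.asarray(v2_local)).sum() / w.sum())

def asymptotic_slope(eta, xi, window=(950.0, 1024.0)):
    """Asymptotic rate of change dξ/dη from a linear fit of ξ(η)
    restricted to a late-time conformal time window."""
    eta, xi = np.asarray(eta, float), np.asarray(xi, float)
    mask = (eta >= window[0]) & (eta <= window[1])
    return float(np.polyfit(eta[mask], xi[mask], 1)[0])
```

With the interaction-potential weight, cells far from bound states contribute negligibly, which is what makes v²_pq track the pq-segments only.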
The figures of ξ_p and ξ_q are rather unsurprising, in the sense that scaling is reached in both types of simulations, and that this scaling is consistent (a reasonably similar asymptotic ξ̇) in physical and PRS simulations. The real surprise reveals itself in the figures of ξ_pq, where the change from core growth to physical evolution clearly signals a change in the behavior of bound states. While in core growth the mean pq-string separation increases linearly, as soon as the transition occurs this quantity decreases, signaling a possible production of bound states, and then inverts its tendency again, returning to linear growth (albeit with a different ξ̇). This peculiarity tells us that a change in pq-string abundances must have occurred after the jump to physical evolution, and that in principle production of bound states must have taken place. Given that we again observe linear scaling at the very end of the simulation, this also means the bound state abundances must have stabilized. Our suspicions are confirmed when looking at the lower panels of Fig. 5.10, which show the behavior of f_total and f_p. Both decrease in the core growth phase to much lower abundances than in the PRS case. For instance, f_p in radiation drops to 0.5%, indicating a much sparser pq-string network than in the β = 0 case. However, as soon as the transition happens, p- and q-strings begin binding more and more, producing more bound states. At the end of the simulation, the physical and PRS abundances are in agreement. The situation is not quite the same in matter era: although the reason is unclear, bound state abundances appear slightly larger (by at most 1% for f_total). We additionally note different tendencies in the late-time behavior of most abundances (although, due to the large uncertainties, one should take this with a grain of salt). In radiation epoch there seems to be a decreasing tendency for constant comoving width, but a stable abundance for physical simulations.
In matter epoch, the abundances that are stable for β = 0 seem to be decreasing in the β = 1 case (albeit we do not presently know what would happen with a larger dynamic range). From this we can conclude that changing β can clearly have an impact on the formation and destruction of pq-strings, changing their abundances. This follows from the fact that values of β different from unity introduce a time-dependence into the action, thus spoiling energy conservation. Compared with β = 0 simulations, it is clear the core growth period can result in lower abundances, and in the case of matter epoch, β = 1 will yield slightly higher abundances. Overall, the effect of physical evolution still results in a sparse pq-string network, and no evidence of frustration is found, with all string species scaling. Note that we still need to verify whether these conclusions hold up with the slow method, and to study the evolution of l_pq. We will also take the time to introduce further refinements to the slow method (such as the use of classification instead of the connectivity filter) and to use Travelling Salesman methods to create filaments whose length can be easily computed. The 4096³ outputs from Piz Daint have been produced and will be analysed over the next months.

Fig. 5.10 All panels showcase the evolution of several quantities derived from the total length of pq-strings present in the box, L_pq, obtained via the fast computation method, except that we now use core growth up to conformal time η = 343.0 and physical evolution from then onwards, for both radiation and matter. The mean string separations ξ_p and ξ_q are shown in the two upper plots (red and blue, respectively); ξ_pq in the middle panels (in green); the relative abundances of pq-strings, f_total and f_p, in the lower panels (in purple and red, respectively).
As in previous figures, the left-hand-side corresponds to radiation epoch, the right-hand-side to matter epoch

Fig. 5.11 All panels showcase the evolution of several quantities derived from the total length of pq-strings present in the box, L_pq, obtained via the fast computation method, in constant comoving width simulations, for both radiation and matter. The mean string separations ξ_p and ξ_q are shown in the two upper plots (red and blue, respectively); ξ_pq in the middle panels (in green); the relative abundances of pq-strings, f_total and f_p, in the lower panels (in purple and red, respectively). As in previous figures, the left-hand-side corresponds to radiation epoch, the right-hand-side to matter epoch

5.4 Conclusion

In this chapter we have presented recent (still in progress) work using the U(1)_L × U(1)_L model of [4] to simulate networks with multiple string types. Our work began by implementing the model itself, appropriately modifying our multi-GPU Abelian-Higgs simulation, and then validating its correctness. The first step in doing so was checking the validity of Gauss's law in both string sectors and visualizing bound states. The visualization strategy enabled us to use a more robust way to compute the mean pq-segment separation and the average number of bound state segments in the simulation. We also developed an alternative, less robust but much faster, way to compute the amount of string in bound states, which in conjunction with the network velocity estimators allowed us to verify good agreement with the work of [2]. There is a large disagreement with the work of [5] for the slopes of some mean string separations, which warrants further investigation. Nonetheless, the bound state abundances are in agreement with both works. After the validation was completed, we took to comparing PRS evolution with physical evolution, i.e.
with the true equations of motion, which, given the kinematic conditions for bound state formation (see for example [3] and references therein), could have an impact on the measured abundances. In order to do so for as much time as possible, we used a combination of the core growth method and larger lattices of size 4096³, Δx = 0.5, ergo a larger dynamic range than previously available. The combination of both let us see how changing energy conservation in the action itself can impact the formation and destruction of bound states. In the core growth phase, nearly all bound states unwind into their basic constituent strings. As soon as the true physical evolution begins, bound states start re-forming, and we even reach higher relative abundances than in the PRS case. For instance, f_total reaches around 5% versus about 4% for constant comoving width, in matter epoch. Even if the uncertainties are quite large, it seems that the two abundances exhibit opposite late-time tendencies, decreasing in the PRS case and increasing in the true evolution case. It is presently unknown whether a larger dynamic range would result in a stabilization of all fractions, in either case. The core growth phase also shows how, for negative β, bound states are quickly unwound, the result being a much lower abundance of bound states (of order 0.1%, or 0.2% for f_total and f_p in matter epoch with the fast method). This allows us to conclude that, even if the effect is quite marginal, changing how the comoving string width behaves (and therefore deviating from energy conservation at the level of the action) can easily affect which process is preferred: destruction or formation of bound states. In the upcoming months we will analyse all the outputs necessary for the robust method and verify whether the conclusions remain in agreement with what has been presented thus far. In principle, given the comparisons made earlier in the validation section, we do not expect the conclusions to change.
References

1. Bevis N, Hindmarsh M, Kunz M, Urrestilla J (2007) CMB power spectrum contribution from cosmic strings using field-evolution simulations of the Abelian Higgs model. Phys Rev D 75:065015. https://doi.org/10.1103/PhysRevD.75.065015
2. Lizarraga J, Urrestilla J (2016) Survival of pq-superstrings in field theory simulations. JCAP 1604(04):053. https://doi.org/10.1088/1475-7516/2016/04/053
3. Rybak IY, Avgoustidis A, Martins CJAP (2019) Dynamics of junctions and the multitension velocity-dependent one-scale model. Phys Rev D 99:063516. https://doi.org/10.1103/PhysRevD.99.063516
4. Saffin PM (2005) A practical model for cosmic (p, q) superstrings. JHEP 09:011. https://doi.org/10.1088/1126-6708/2005/09/011
5. Urrestilla J, Vilenkin A (2008) Evolution of cosmic superstring networks: A numerical simulation. JHEP 02:037. https://doi.org/10.1088/1126-6708/2008/02/037
6. Witten E (1985) Cosmic superstrings. Phys Lett 153B:243–246. https://doi.org/10.1016/0370-2693(85)90540-4

Chapter 6 A New Generation of String Simulations

so much depends
upon
a red wheel
barrow
glazed with rain
water
beside the white
chickens

The Red Wheelbarrow, William Carlos Williams

6.1 Overview and Concluding Remarks

The main objective of this thesis was to build a simulation capable of tackling more realistic and/or more exotic strings at resolutions adequate for deriving reliable constraints with future observational data. This would of course require extreme hardware resources and more optimized simulations than had been available thus far. Our final objective, in fact, was to explore cosmic superstrings, which can be done either via Nambu-Goto simulations (albeit no one has, to the author's knowledge, attempted to simulate multiple string types yet) or via models with two coupled field theory sectors, such as the U(1) × U(1) model of [25].
Although the double U(1) has the distinct advantage of being an easy modification of Abelian-Higgs simulations (just copy all fields and add a coupling term to the potential), it also has the disadvantage of relying on a type of simulation more often bottlenecked by hardware resources and/or the degree of optimization. The experience of the author of this manuscript with domain wall simulations (field theory) and a previous interest in General Purpose Graphics Processing Unit programming eventually led us to take this path instead of the Nambu-Goto one. To such an end, this thesis initially focused on creating simulations adequately tuned for extreme hardware resources, as seen in the third chapter. This journey began with the graphically accelerated domain walls software, where it was proved that there was a tangible benefit to using graphical processors in lieu of traditional Central Processing Units. To be more exact, when compared to the standard version of the domain walls code in two spatial dimensions, we obtained speed-ups of order 200 in single precision and order 50 in double precision. Having demonstrated that GPUs can be used for field theory simulations, we then took to implementing, in CUDA, the evolution of Abelian-Higgs strings on a lattice. We further attested to the speed-up benefit of using GPUs for field theory simulations, showing that a single GPU could complete operations at a rate of order 10^−10 GPU-sec/site versus the 10^−7–10^−6 core-sec/site of the Latfield2-based Lattice Abelian Higgs.
Still, this much higher throughput needed to be put to the test in a supercomputing facility, where communications between GPUs could easily limit the scalability (and therefore the speed-ups) of our application. Fortunately, in such an environment (the Piz Daint supercomputer) we demonstrated weak scaling up to 4096 GPUs. In spite of the comparatively poor strong scaling, the weak scaling capabilities of this code allow an order-30 speed-up compared to Lattice Abelian Higgs. Given that outputting only averaged quantities would limit future scientific work, we also added in-situ visualization to our simulation, to evade or at least lessen any Input/Output bottleneck that might rear its ugly head. Using winding output with in-situ processing, versus outputting the whole lattice, we showed a reduction of computational time and output sizes at large lattice sizes (of a factor of 4 and one order of magnitude, respectively). The resulting simulations are therefore the "Red Wheelbarrow" of this work (see the epigraph of this chapter). From them sprang the scientific works collected in the fourth chapter. These are based on the calibration of a version of the canonical semi-analytical model for string evolution. Known as the Velocity-dependent One-Scale model, its extended version features explicit radiative energy loss functions (in addition to loop production only) and a generalized momentum parameter. When introduced, it was shown to adequately predict the form of each velocity-dependence function and even the evolution in the radiation-to-matter transition [20]. We further tested this model in the domain walls context, by calibrating it with the linear scaling reached by (anisotropic) super-horizon networks. This showed that even the approach to scaling is reasonably well described by the model after parameter determination. We then moved to the gauged strings case, where we first showed the model could describe the asymptotic values of small lattices.
From this preliminary calibration we could already determine that, unlike in the domain walls case, energy loss by loop production does contribute significantly to the energy loss of the network, and that the analytical ansatz for the momentum parameter cannot reproduce simulations. Given the importance of proper model parameter estimation for observational footprints of string networks, we set out to improve our calibration pipeline and to find and eliminate possible sources of systematic error. The modifications to the calibration pipeline included automatic error propagation and Bayesian inference for proper uncertainty estimation of model parameters. The exploration of systematic error sources was done in three different ways: by varying the amount of cooling of the initial conditions, by increasing the resolution of sub-horizon scales and by choosing different velocity estimators. First, we saw that too large an amount of cooling has a significant impact on model parameters, suggesting a change in the resulting network: smoothened small-scale structure and a larger contribution of loop production to energy losses. Then, increasing the dynamic range/lattice size suggests that there exists a minimum resolution beyond which model parameters change minimally (similar to the domain walls case). Changing velocity estimators seemed to result in an apparent contradiction of model predictions, where either radiative energy loss would be the only relevant energy loss process, or loop production and radiative losses could contribute with equal weight at intermediate expansion rates (radiation, matter). However, this apparent contradiction can be resolved by reducing the lattice spacing, thus forcing both calibrations into agreement. With this we suggest best- and worst-case scenario calibrations and set out to show how this impacts observational consequences.
To do so, we compute the Cosmic Microwave Background anisotropies for several cases and show that different calibrations can induce scale-dependent differences. Although we do not compute new limits on the string tension, this illustrates the need for the best possible calibration. After all of these tasks were completed, we moved on to the end-goal of this thesis: to explore cosmic superstrings via simulations of multiple interacting cosmic strings. All of this work is based on implementing the U(1)_L × U(1)_L model of [25]. Initially we made sure that, despite differences in lattice spacing and initial condition treatment, our simulations were compatible with what is shown in the literature, in terms of velocity estimates and pq-segment abundances. Past this initial task, we simulated (p, q)-string networks in this toy model with true physical comoving width and higher resolution and dynamic range than previously available. Given how quickly the comoving string radius drops in these simulations, we also needed to use the core growth "trick", which in itself says something about bound state velocities and abundances in this regime. We see that in core growth pq-segments quickly unwind, and the remaining few have a large velocity. Immediately after we switch to the true physical evolution, bound states begin to form again, as evidenced by the behavior of the relative abundances, and with lower and decreasing velocities. The abundances eventually become even larger than those of constant comoving width simulations, in the case of matter epoch. After the transition to shrinking width, the mean string separation of pq-segments does achieve scaling, even with the reduced amount of conformal time available to reach it. In order to wrap up this manuscript, we will now consider possible next steps, both from a computational perspective and in terms of possible avenues to explore the physics of cosmological defect networks.
6.2 Next Steps

6.2.1 Computational Improvements

In order to state some tentative next steps to improve the computational aspects of the simulation, it helps to put into perspective some of the advances of recent years across all simulation types. This will entail a comparison of the features of certain codes and will result in a description of what the author believes an ideal (in computational terms) Abelian-Higgs simulation might entail. Note that we cannot promise to implement all of the features to be discussed; this is merely some food for thought. First, we have to look into two libraries that serve a similar purpose, LatField2 [9] and the recently unveiled CosmoLattice [11]. They serve a similar goal in the sense that both are libraries which allow the user to implement fields on a lattice with very little effort. Both use only MPI (one with a 1D domain decomposition, the other with a 2D one) and both have Fast Fourier Transform capabilities. Although we could attempt to write a complete library, it is true that it can be more difficult to optimize one for every single case, especially in CUDA. Nevertheless, some steps were already taken during development to make this simulation a bit more user-friendly and its code much more reusable. In terms of Fast Fourier Transform capabilities, the need for a 2D decomposition can hinder performance; however, due to the communication-heavy nature of FFTs, they are bound to scale poorly already (even worse if we consider using GPUs). Implementing this is a good way to delve into the brute-force computation of Unequal Time Correlators for observational footprints of networks, and therefore we should eventually add such capabilities to the simulations. Luckily, some success has been obtained with reasonable weak scaling of FFTs by the pseudo-spectral solver of [24] and, in mixed CPU/GPU settings, with accFFT [12].
Last but not least, the I/O server used by Latfield2 easily solves the issue of communication between aggregators dominating runtime. For outputs with sizes that are multiples of the lattice size, in-situ output makes this a non-issue. For initial condition files we have thus far used a file-per-process approach, after failing to achieve good performance in single-shared-file mode on Daint. The bottleneck responsible for this is the communication between processes and aggregator processes, and it is exactly here that a dedicated I/O server could be useful. A shift to a hybrid approach with a file per aggregator could also be studied, making use of the ADIOS2 library [13]. Next, we have two possibilities which can allow us to simulate even larger swathes of spacetime, especially given that GPUs do not possess a lot of memory: the diamond decomposition from the Nambu-Goto simulations of [5] and the Adaptive Mesh Refinement capabilities of the GRChombo library [6], as seen in [10, 15]. The first subdivides the 4D simulation volume into smaller 4D volumes with only initial and final surfaces. There is no communication between volumes, since one merely reads and writes initial and final surfaces for each diamond. As such, the simulation is never memory limited (although a larger number of diamonds will of course result in faster simulation times). It is interesting to note that the most promising optimization for stencil CUDA kernels makes use of a 4D lattice [22], suggesting the two could possibly work in tandem. The second way to evade memory costs is to use Adaptive Mesh Refinement. In principle, in this technique one creates sub-lattices with smaller spacing only in regions of interest, and maintains a coarse lattice everywhere else.
The level of refinement necessary will depend on the string width (if allowed to shrink), and the number of cells tagged for refinement will depend on how dense the network is; both quantities are related to the expansion rate. Implementing this exclusively on GPUs can also be a technical challenge, as the (sub-)lattices with fewer points might not completely exploit the high thread-level parallelism of graphical accelerators. Therefore, we can hazard a guess that a mixed approach, with CPU cores for lattices with fewer points and GPUs for the denser lattices, might be the correct call.

6.2.2 Small-Scale Structure of Abelian-Higgs Strings

In addition to the computational improvements described above, we can also highlight some extra physics that can be extracted from simulation outputs, in this case precisely from the centerline in-situ capabilities described in Chap. 3. Knowing the exact position of strings at multiple timesteps means we can extract the worldsheet position vectors X and Ẋ for each string in the box. As kinks, cusps and loops take on a central importance in understanding the observational consequences of strings, this puts us in a unique position to understand small-scale structure features (called wiggles) of strings evolving in different cosmological backgrounds. To exemplify, consider the stochastic gravitational wave background (SGWB). In the introduction we mentioned that the typical size of loops at formation (expressed as a fraction of the horizon, α) and the loop spectra (P_i) are specified from Nambu-Goto simulations; with centerline detection and reconstruction, we can extract similar statistics and even compare the loop number density n(l, t) obtained via the VOS with the directly measured one. There is, however, an important question that should be answered before we do any of this: for which loops, and at what scales, is the Nambu-Goto approximation valid?
The work of [21] has shown that initially square loops decay according to Nambu-Goto up to a point where the loops are small enough to decay far more rapidly than NG predicts. This led the authors to propose the existence of a kink scale below which the NG approximation is invalid, and above which loops would decay via gravitational radiation. However, [17] have shown, with network loops formed in flat space, that the NG approximation fails at all scales. While these results appear contradictory, in reality they show that, depending on the initial configuration of the loops and on their small-scale features, the NG approximation may or may not hold. What these studies do not show, however, is how loops in a cosmological background behave; and, as shown in [18], flat-space strings have different small-scale properties from strings in radiation or matter epochs. This is also evident when visually comparing the loops in [17] with the radiation era loops of [7, 8]. In order to investigate whether Nambu-Goto is a reasonable approximation of Abelian-Higgs strings, and at what scales, we are now collaborating with Daniel Jimenez-Aguilar, Jose Juan Blanco-Pillado and Jon Urrestilla to extract string centerlines and use them as initial conditions in Nambu-Goto simulations. We are currently working on the 2D case (domain walls and Nambu-Goto), although we expect to soon move towards 3D with local Abelian-Higgs strings. On our side, this involves finding links throughout the lattice and interpolating where the center of the wall lies. This results in a collection of points, separated into different regions (loops), which needs to be ordered to construct a proper centerline. Note that the non-existence of windings, and therefore of a magnetic flux to tell us how to connect each point to a neighbor, means we need a different method to order points.
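The simplest version of such a point-ordering method is a greedy nearest-neighbour walk, a basic Travelling-Salesman-style heuristic: start anywhere and repeatedly hop to the closest unvisited point. This is only a sketch of the idea (more sophisticated solvers are certainly possible), with illustrative function names:

```python
import numpy as np

def order_centerline(points):
    """Greedy nearest-neighbour ordering of an unordered centerline point
    cloud: start at the first point and repeatedly hop to the closest
    unvisited one. A simple Travelling-Salesman-type heuristic."""
    pts = np.asarray(points, float)
    n = len(pts)
    order = [0]
    visited = np.zeros(n, bool)
    visited[0] = True
    for _ in range(n - 1):
        d = np.linalg.norm(pts - pts[order[-1]], axis=1)
        d[visited] = np.inf          # never revisit a point
        nxt = int(np.argmin(d))
        visited[nxt] = True
        order.append(nxt)
    return pts[order]

def centerline_length(ordered, closed=True):
    """Length along the ordered centerline, optionally closing the loop."""
    steps = np.diff(ordered, axis=0)
    length = float(np.linalg.norm(steps, axis=1).sum())
    if closed:
        length += float(np.linalg.norm(ordered[-1] - ordered[0]))
    return length
```

For well-sampled, well-separated loops the greedy walk recovers the loop ordering; degenerate cases (nearby loops, self-intersections) are where more careful Travelling Salesman machinery becomes necessary.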
This involved exploring Travelling Salesman algorithms to find the shortest path linking all of these points, and thus construct full centerlines. These inputs are then passed to our collaborators, who are able to compute local velocities and evolve the loops in Nambu-Goto as necessary. In parallel, we can also study how some properties of the string network change with scale, and compare them with NG expectations. This change of "description with scale" is not an intrinsic property of the network; it is rather a dependence of average features of the network on scale. An example would be how the total length of a coastline changes with the size of the "ruler" used to measure it. This dependence on coarse- or fine-graining the description of a string has clear ties to the concept of renormalization, and indeed one quantity which is renormalized for a wiggly string is the mass per unit length μ. The more small-scale features a string network has, the more μ will diverge from the bare μ0 at small scales. As the coastline analogy might have hinted, μ is also related to the multi-fractal dimension,

∂ ln μ / ∂ ln l ≈ dm(l) − 1,    (6.1)

where dm(l) is the multi-fractal dimension of the network and l is a coarse-graining scale. The computation of the renormalized mass per unit length has recently started being explored by Filipe Costa of the Faculty of Sciences of the University of Porto, with small lattices (256³, Δx = 0.5) in the radiation era, using spheres to set the coarse-graining scale along the string. The idea is to vary the radius of the spheres necessary to cover a segment between two points, and then to compute the ratio between the Euclidean distance and the comoving distance along the string. So far, preliminary tests of the script for computing the renormalized mass per unit length give the correct results for simple cases (such as a semi-circle), and show an increase of μ for long strings from small to large (near-horizon) scales.
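The essence of this coarse-graining test can be caricatured in a few lines: for an ordered centerline, compare the distance measured along the string with the straight-line distance between points a given separation apart; their ratio is a proxy for μ(l)/μ0 at that scale. The helper name and the discretization are assumptions of this sketch, not the actual script; the semi-circle is used because it is one of the simple cases mentioned above, for which the endpoint ratio should approach π/2.

```python
import numpy as np

def mu_ratio(points, step):
    """Proxy for mu(l)/mu0 at a coarse-graining separation of `step`
    samples: distance measured along the (ordered) centerline divided
    by the straight-line distance between the same two points."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    arclen = np.concatenate([[0.0], np.cumsum(seg)])     # cumulative arclength
    along = arclen[step:] - arclen[:-step]               # along-string distance
    euclid = np.linalg.norm(pts[step:] - pts[:-step], axis=1)
    return float(np.mean(along / euclid))

# sanity check on a semi-circle: between the two endpoints the
# along-string distance is pi*r while the straight-line distance is
# 2r, so the ratio should approach pi/2
t = np.linspace(0.0, np.pi, 201)
semi = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
ratio = mu_ratio(semi, step=200)   # close to pi/2 ~ 1.5708
```

Repeating this for a range of `step` values, and taking the logarithmic slope of the result, would give a finite-difference estimate of the dm(l) − 1 combination in Eq. (6.1).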
After further refinements (particularly to the performance of the pipeline) are ready, we will use it to study the larger lattices output on Piz Daint. Note that while a previous study of small-scale structure exists for Abelian-Higgs networks [16], it was done only at 512³ (we are aiming for even larger lattices) and started from a different set of variables, the two-point correlations of X and Ẋ, which can also be related to the fractal dimension and μ.

Obtaining a complete description of μ will also enable an exploration of the wiggly-string VOS [19, 26] and of the possible scaling solutions that exist for flat-space or cosmological backgrounds (see [1]). This model is derived by assuming a modification of the Nambu-Goto action, where wiggles are described as a mass current propagating along the string. In this VOS there are partial differential equations describing the evolution of the mass per unit length μ, the rms velocity and the total energy, and how they relate to one another depending on scale. This will be explored in collaboration with Ana Almeida and Filipe Costa, both of the Faculty of Sciences of the University of Porto. Comparisons with the analytical models of [2, 23] are also a possible avenue to explore.

6.2.3 Further Exploration of String Networks in the U(1)L × U(1)L Model

Given that we have already studied the scaling of pq-strings at large resolutions with physical evolution (that is, with shrinking comoving string width), we can now also explore the effect of unequal tensions for the constituent strings. Arguably, given that the tension spectrum of cosmic superstrings arises from unequal tensions, with μp = μF and μq = μF/gs, we could consider this the more realistic case. In principle, all that is necessary is to set one of the symmetry breaking scales such that 2σp = σq.
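For context on the hierarchy such unequal tensions would produce, the flat-space (p, q) tension formula commonly quoted in the cosmic superstring literature, μ(p,q) = μF √(p² + q²/gs²), reduces to μp = μF and μq = μF/gs for the two constituent strings. The sketch below merely tabulates it; the coupling value gs = 0.5 is an arbitrary illustration, and this formula is an external reference point, not the tension spectrum of the field-theory model itself.

```python
import math

def pq_tension(p, q, g_s, mu_f=1.0):
    """Flat-space (p, q) superstring tension in units of the
    fundamental string tension mu_f:
        mu_(p,q) = mu_f * sqrt(p**2 + q**2 / g_s**2)
    so that (1, 0) gives mu_f and (0, 1) gives mu_f / g_s."""
    return mu_f * math.sqrt(p * p + (q * q) / (g_s * g_s))

# illustrative hierarchy for an arbitrary coupling g_s = 0.5
g_s = 0.5
for p, q in [(1, 0), (0, 1), (1, 1), (2, 1)]:
    print((p, q), round(pq_tension(p, q, g_s), 3))
```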
In this particular case, one should carefully consider how to introduce new velocity estimators and different criteria (from overlaps to interaction potential thresholds) to characterize each possible (new) species of string.

One possible avenue for exploring this particular model pertains to a different choice of parameter values. If the coupling constant is chosen such that κ < −√(λp λq), only one of the U(1) symmetries is broken. This results in a string in the broken sector, with a condensate core courtesy of the unbroken symmetry. Such strings possess a current and are therefore known as superconducting strings. To the author's knowledge, and at the time of writing, a full cosmological network simulation of superconducting strings has never been performed. The current cosmic superstring code could be used for this endeavour, albeit it requires some careful tuning of parameters (to ensure a non-vanishing, positive-mass condensate inside the strings, see [14]) and of numerical choices. One example of a numerical choice would be the initial conditions, where the scalar fields in the charged sector can be chosen to mimic a homogeneous charged background, as in [3, 4], and the gauge fields of this sector must be set so as to obey Gauss's law.

References

1. Almeida ARR, Martins CJAP (2021) Scaling solutions of wiggly cosmic strings
2. Austin D, Copeland EJ, Kibble TWB (1993) Evolution of cosmic string configurations. Phys Rev D 48:5594–5627. https://doi.org/10.1103/PhysRevD.48.5594
3. Battye RA, Pearson JA (2010) Charge, junctions and the scaling dynamics of domain wall networks. Phys Rev D 82:125001. https://doi.org/10.1103/PhysRevD.82.125001
4. Battye RA, Pearson JA, Pike S, Sutcliffe PM (2009) Formation and evolution of kinky vortons. JCAP 09:039. https://doi.org/10.1088/1475-7516/2009/09/039
5. Blanco-Pillado JJ, Olum KD, Shlaer B (2012) A new parallel simulation technique. J Comput Phys 231:98–108.
https://doi.org/10.1016/j.jcp.2011.08.029
6. Clough K, Figueras P, Finkel H, Kunesch M, Lim EA, Tunyasuvunakool S (2015) GRChombo: numerical relativity with adaptive mesh refinement. Class Quant Grav 32(24):245011. https://doi.org/10.1088/0264-9381/32/24/245011
7. Correia J, Martins C (2021a) High-resolution GPU-accelerated Abelian-Higgs string simulation: length colormap, dataset on Zenodo. https://doi.org/10.5281/zenodo.4710664
8. Correia J, Martins C (2021b) High-resolution GPU-accelerated Abelian-Higgs string simulation: velocity colormap, dataset on Zenodo. https://doi.org/10.5281/zenodo.4710670
9. Daverio D, Hindmarsh M, Bevis N (2015) LATfield2: a C++ library for classical lattice field theory
10. Drew A, Shellard EPS (2019) Radiation from global topological strings using adaptive mesh refinement: methodology and massless modes
11. Figueroa DG, Florio A, Torrenti F, Valkenburg W (2021) CosmoLattice
12. Gholami A, Hill J, Malhotra D, Biros G (2015) AccFFT: a library for distributed-memory FFT on CPU and GPU architectures. CoRR, abs/1506.07933. http://arxiv.org/abs/1506.07933
13. Godoy WF, Podhorszki N, Wang R, Atkins C, Eisenhauer G, Gu J, Davis P, Choi J, Germaschewski K, Huck K, Huebl A, Kim M, Kress J, Kurc T, Liu Q, Logan J, Mehta K, Ostrouchov G, Parashar M, Poeschel F, Pugmire D, Suchyta E, Takahashi K, Thompson N, Tsutsumi S, Wan L, Wolf M, Wu K, Klasky S (2020) ADIOS 2: the adaptable input output system. A framework for high-performance data management. SoftwareX 12:100561. https://doi.org/10.1016/j.softx.2020.100561
14. Hartmann B, Carter B (2008) Logarithmic equation of state for superconducting cosmic strings. Phys Rev D 77:103516. https://doi.org/10.1103/PhysRevD.77.103516
15. Helfer T, Aurrekoetxea JC, Lim EA (2019) Cosmic string loop collapse in full general relativity. Phys Rev D 99(10):104028. https://doi.org/10.1103/PhysRevD.99.104028
16.
Hindmarsh M, Stuckey S, Bevis N (2009) Abelian Higgs cosmic strings: small scale structure and loops. Phys Rev D 79:123504. https://doi.org/10.1103/PhysRevD.79.123504
17. Hindmarsh M, Lizarraga J, Urio A, Urrestilla J (2021) Loop decay in Abelian-Higgs string networks
18. Martins CJAP, Shellard EPS (2006) Fractal properties and small-scale structure of cosmic string networks. Phys Rev D 73:043515. https://doi.org/10.1103/PhysRevD.73.043515
19. Martins CJAP, Shellard EPS, Vieira JPP (2014) Models for small-scale structure on cosmic strings: mathematical formalism. Phys Rev D 90(4):043518. https://doi.org/10.1103/PhysRevD.90.043518
20. Martins CJAP, Rybak IY, Avgoustidis A, Shellard EPS (2016) Extending the velocity-dependent one-scale model for domain walls. Phys Rev D 93(4):043534. https://doi.org/10.1103/PhysRevD.93.043534
21. Matsunami D, Pogosian L, Saurabh A, Vachaspati T (2019) Decay of cosmic string loops due to particle radiation. Phys Rev Lett 122(20):201301. https://doi.org/10.1103/PhysRevLett.122.201301
22. Nguyen A, Satish N, Chhugani J, Kim C, Dubey P (2010) 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: 2010 ACM/IEEE international conference for high performance computing, networking, storage and analysis, pp 1–13. https://doi.org/10.1109/SC.2010.2
23. Polchinski J, Rocha JV (2007) Cosmic string structure at the gravitational radiation scale. Phys Rev D 75:123503. https://doi.org/10.1103/PhysRevD.75.123503
24. Ravikumar K, Appelhans D, Yeung PK (2019) GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC '19, New York, NY, USA. Association for Computing Machinery. https://doi.org/10.1145/3295500.3356209
25. Saffin PM (2005) A practical model for cosmic (p, q) superstrings. JHEP 09:011. https://doi.org/10.1088/1126-6708/2005/09/011
26. Vieira JPP, Martins CJAP, Shellard EPS (2016) Models for small-scale structure on cosmic strings. II. Scaling and its stability. Phys Rev D 94(9):096005. https://doi.org/10.1103/PhysRevD.94.096005. [Erratum: Phys Rev D 94, 099907 (2016)]

Curriculum Vitae

José R. C. C. C. Correia
Postdoctoral Researcher

"A very large part of space-time must be investigated, if reliable results are to be obtained" – Alan Turing

Work Experience
• Postdoctoral researcher, Department of Physics, University of Helsinki, Helsinki, Finland. Sep. 2022 – ongoing
• PhD researcher, Faculdade de Ciências da Universidade do Porto, Porto, Portugal. Jan. 2017 – May 2022

Education
• PhD in Physics, Faculdade de Ciências da Universidade do Porto, Porto, Portugal. Jan. 2017 – May 2022
• MSc in Physics, Faculdade de Ciências da Universidade do Porto, Porto, Portugal. Sep. 2014 – Sep. 2016
• BSc in Physics, Faculdade de Ciências da Universidade do Porto, Porto, Portugal. Sep. 2010 – Sep. 2014

Skills
• DevOps: Docker, CircleCI
• Parallel computing: CUDA, OpenCL, MPI
• Programming languages: C/C++, Python, Julia, LaTeX
• Spoken languages: Portuguese, English, Italian, Spanish, French

Honors & Awards
• 2022: Springer Thesis Award, PhD thesis published by Springer Nature
• 2021: PRACE Best Poster Award, EuroHPC Summit Week, online

List of Publications
• Multitension strings in high-resolution U(1)×U(1) simulations. Phys. Rev. D 106, 043521 (published), Aug. 2022. J. R. C. C. C. Correia, C. J. A. P. Martins
• High resolution calibration of string network evolution (accepted). arXiv:2110.15427, Oct. 2021. J. R. C. C. C. Correia, C. J. A. P. Martins
• High resolution calibration of the cosmic strings velocity dependent one-scale model. Phys. Rev. D 104, 063511 (published), Sep. 2021. J. R. C. C. C. Correia, C. J. A. P. Martins
• Abelian-Higgs cosmic string network evolution with multiple GPUs. Astron. Comput. 34, 100438 (published), Jan. 2021. J. R. C. C. C. Correia, C. J. A. P. Martins
• Quantifying the effect of cooled initial conditions on cosmic string network evolution. Phys. Rev. D 102, 043503 (published), Aug. 2020. J. R. C. C. C. Correia, C. J. A. P. Martins
• Abelian-Higgs cosmic string network evolution with CUDA. Astron. Comput. 32, 100388 (published), Jul. 2020. J. R. C. C. C. Correia, C. J. A. P. Martins
• Extending and calibrating the velocity-dependent one-scale model for cosmic strings with one thousand field theory simulations. Phys. Rev. D 100, 103517 (published), Nov. 2019. J. R. C. C. C. Correia, C. J. A. P. Martins
• Effects of biases in domain wall network evolution II. Quantitative analysis. Phys. Rev. D 97(8), 083521 (published), Apr. 2018. J. R. C. C. C. Correia, I. S. C. R. Leite, C. J. A. P. Martins
• General purpose graphics-processing-unit implementation of cosmological domain wall network evolution. Phys. Rev. E 96(4), 043310 (published), Oct. 2017. J. R. C. C. C. Correia, C. J. A. P. Martins
• Effects of biases in domain wall network evolution. Phys. Rev. D 90(2), 023521 (published), Jul. 2014. J. R. C. C. C. Correia, I. S. C. R. Leite, C. J. A. P. Martins

Conference Participations
• CarterFest: Black Holes and other Cosmic Systems, Paris, France, Jul. 2022. Attended
• 12th Iberian Gravitational Waves Meeting, Braga, Portugal, Jun. 2022. Attended
• GRChombo meeting, Cambridge, United Kingdom, Mar.–Apr. 2022. Attended
• GR Seminar, Cambridge, United Kingdom, Mar. 2022. Presented seminar titled "On Multitension string networks"
• CSCS User Lab Day, online, Sep. 2021. Attended
• COSMO21, online, Aug. 2021. Presented talk titled "High resolution calibration of string modelling"
• Marcel Grossmann 16, online, Jul. 2021. Presented talk titled "High resolution calibration of string modelling"
• Current challenges in gravitational physics workshop, online, Apr. 2021. Attended
• Interactive Computing with Jupyter on Piz Daint, using Python, ParaView and Julia, online, Apr. 2021. Attended
• Ibericos 2021, online, Mar.–Apr. 2021. Presented talk titled "On improving string modelling"
• EuroHPC Summit Week 2021, online, Mar. 2021. Presented poster titled "Coding the cosmos: A New Generation of Superstring Simulations"
• Zooming in on Strings and Vortons workshop, online, Oct. 2020. Attended
• Scientific visualization course, online, Oct. 2020. Attended
• Encontro Nacional de Astronomia e Astrofísica XXX, online, Sep. 2020. Presented poster titled "Overcooling string simulations"
• Texas Symposium 2019, Portsmouth, United Kingdom, Dec. 2019. Presented talk titled "Calibrating string evolution modelling"
• IA-ON 2019, Porto, Portugal, Oct. 2019. Presented talk titled "On string evolution and graphical computing"
• Encontro Nacional de Astronomia e Astrofísica XXIX, online, Sep. 2019. Presented poster titled "On extending cosmic string analytical modelling with one thousand simulations"
• Ibericos 2019, Bilbao, Spain, Apr. 2019. Presented talk titled "On cosmic string evolution and graphical supercomputing"
• Cosmic Topological Defects: Dynamics and Multi-Messenger Signatures, Leiden, Netherlands, Sep. 2018. Presented talk titled "Cracks in the sky: Cosmic String Evolution with the compute unified device architecture"
• Gravity@Prague School, Prague, Czech Republic, Sep. 2018. Presented poster titled "Cracks in the sky: cosmic string evolution with CUDA"
• XIII Reunión Científica de la Sociedad Española de Astronomía (SEA) 2018, Salamanca, Spain, Jul. 2018. Presented poster titled "Anisotropic Domain Walls"
• Ibericos 2018, Lisbon, Portugal, Apr. 2018. Presented talk titled "GPGPU Anisotropic Domain Walls"
• Encontro Nacional de Astronomia e Astrofísica XXVII, Lisbon, Portugal, Jul. 2017. Presented talk titled "GPGPU Domain Wall network simulations"
• Física 2016, Braga, Portugal, Sep. 2016. Presented poster titled "Search for new vector-like quarks in hadronic topologies"
• Ibericos 2014, Aveiro, Portugal, Mar. 2014. Presented talk titled "Effects of biases on domain wall network evolution"
• Encontro Nacional de Astronomia e Astrofísica XXIII, Lisbon, Portugal, Jul. 2013. Presented talk titled "Different Views on cosmic defect evolution"

Outreach
• "O Universo numa caixa" (blog article), SapoTek, online, Jul. 2021. Writer
• "Strings of the cosmos" (article), ScienceNode, online, May 2021. Interviewee
• "Iniciativa de computação avançada da UE distingue estudante da FCUP" (press release), Portal de Notícias da UP, online, Apr. 2021. Interviewee
• "José Correia wins PRACE Best Poster Award" (press release), PRACE website, online, Mar. 2021. Interviewee
• "Accelerated and improved simulations shed light on topological defects in the Universe" (press release), CSCS website, online, Feb. 2021. Interviewee
• "Trabalhar em Cosmologia: Simulações de defeitos" (outreach talk), Ovar, Portugal, Feb. 2020. Talk presenter
• "Fendas no Universo" (outreach talk), Tomar, Portugal, Nov. 2018. Talk presenter
• COSMOEspresso goes to School // Job shadowing, multiple locations, Portugal, 2018 – ongoing. Mentor
• Partículas: do Bosão Higgs à matéria escura (exhibition), Braga, Portugal, Feb.–Mar. 2016. Exhibition staff