Improvements to a Fully Kinetic Hall Thruster Simulation Code and Characterization of the Cylindrical Cusped Field Thruster

by Louis Boulanger
Ingénieur diplômé de l'École Polytechnique

Submitted to the Department of Aeronautics and Astronautics in partial fulfillment of the requirements for the degree of Master of Science in Aeronautics and Astronautics at the Massachusetts Institute of Technology, June 2014.

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Department of Aeronautics and Astronautics, May 22, 2014
Certified by: Manuel Martinez-Sanchez, Professor, Thesis Supervisor
Accepted by: Paulo C. Lozano, Chairman, Department Committee on Graduate Theses

Abstract

This thesis presents an effort towards a better understanding of the operation of miniaturized cylindrical Hall thrusters. This class of space propulsion devices has come under attention since the 1990s as a possible candidate for the propulsion of 100-1000 kg satellites. In the first part, a fully kinetic simulation code developed at the MIT Space Propulsion Laboratory (SPL) is described and applied to two devices of interest: the Princeton Cylindrical Hall Thruster (CHT) and the MIT Diverging Cusped-Field Thruster (DCFT). During this simulation effort, limitations of PTpic were identified which prompted a major redesign, whose central idea is a better parallelization of the workload. At the same time, possible candidates to replace the leapfrog algorithm in the particle pusher were studied. This work is described in Chapter 3. Finally, Chapter 4 presents the results of the testing of the recently built Cylindrical Cusped-Field Thruster (CCFT) performed at the SPL.

Thesis Supervisor: Manuel Martinez-Sanchez
Title: Professor

Acknowledgments

My thanks go first to my advisor, Professor Martinez-Sanchez, for making these two years possible, and for his guidance, patience and friendliness. This thesis would not exist without him and all the friends and colleagues, members of the SPL or visitors, who helped me at some point along the way. First, Anthony Pang, who built the CCFT and made me feel welcome in the lab; Tom Coles, who provided invaluable help in dealing with computer and programming issues; Steve Gildea, who always answered my questions on "the code" in detail and very fast, even three time zones apart; Professor Lozano, who fought with me to keep Astrovac working; Todd Billings, who taught me Machine Shop 101 (several times); and Regina Sullivan and Jaume Navarro. Special thanks to Jeff for helping with the experiments, and for being such a skilled plumber. Thanks to all my fellow SPLers for all the good moments and beer Fridays we had together. Grad school would have been much different without all my friends in the AeroAstro department and elsewhere. As we are about to leave for new horizons, I wish them all the best. A special mention here for my two roommates and friends, Rémi Lam and Alexandre Constantin. Last but not least, I want to dedicate this thesis to my parents and my sister Mathilde.
They have supported me from kindergarten to MIT, and I look forward to being reunited with them. This project was funded by a grant of the Air Force Office of Scientific Research. The Direction Générale de l'Armement also supported me financially through these two years.

Contents

1 Introduction  17
  1.1 Space propulsion  17
  1.2 Hall Effect Thrusters  18
  1.3 Miniaturized Hall thrusters  20
  1.4 Hall thrusters numerical simulation  21
  1.5 Thesis Overview  21

2 The PTpic code: description and applications  23
  2.1 Particle-in-Cell codes  23
    2.1.1 Operations performed over one iteration  24
    2.1.2 Stability limits  25
  2.2 PTpic features  25
  2.3 Princeton Cylindrical Hall Thruster  29
    2.3.1 Simulation domain and Boundary conditions  29
    2.3.2 Magnetic field  30
    2.3.3 Results  32
  2.4 MIT Diverging Cusped-Field Thruster  41
    2.4.1 Simulation domain and Boundary conditions  41
    2.4.2 Magnetic field  41
    2.4.3 Results  43

3 The PTpic code: improvements  47
  3.1 Parallelization redesign  47
    3.1.1 Background  47
    3.1.2 Implementation  49
    3.1.3 Performance assessment  53
  3.2 Implicit particle pusher  59
    3.2.1 Stability limitations of the leapfrog algorithm  59
    3.2.2 A framework for implicit PIC codes  61
    3.2.3 Semi-implicit field solver  61
    3.2.4 Particle Predictor-Corrector  63

4 Experimental Characterization of the Cylindrical Cusped-Field Thruster  67
  4.1 Cylindrical Cusped-Field Thruster overview  67
    4.1.1 Background  67
  4.2 Experimental Setup  69
    4.2.1 Vacuum chamber  69
    4.2.2 Electrical setup  70
    4.2.3 Stage  70
    4.2.4 Measurement devices  71
  4.3 Results  74
    4.3.1 Anode overheating  74
    4.3.2 Regime transitions  76
    4.3.3 Voltage-Current characteristics  78
    4.3.4 Faraday Cup measurements  80
    4.3.5 Retarding Potential Analyzer Measurements  84

5 Conclusion  87
  5.1 Future work recommendations for PTpic  87
    5.1.1 Electric field solver  87
    5.1.2 Improved particle pushers  89
    5.1.3 Refinement of the load metric  90
A Particle pusher integrators  93
  A.1 Leapfrog method  93
  A.2 Boris method  94

List of Figures

1-1 Hall Thruster schematic  19
2-1 The PTpic cycle  24
2-2 CHT simulation domain  30
2-3 CHT magnetic field, with superimposed streamlines  31
2-4 Count of electron superparticles - CHT case A  32
2-5 Anode current - CHT case A  33
2-6 Electron density - CHT case A  34
2-7 Electron temperature - CHT case A  34
2-8 Ion density - CHT case A  35
2-9 Anode current - CHT case B  36
2-10 Anode current - CHT case C  37
2-11 Electron density - CHT case C  38
2-12 Electron temperature - CHT case C  39
2-13 Anode current - CHT case D  40
2-14 DCFT simulation domain  42
2-15 DCFT magnetic field, with superimposed streamlines  42
2-16 Anode current - DCFT simulation  44
2-17 Electron density in steady state  44
2-18 Electron temperature in steady state  45
3-1 Principal partitioning generated by ParMETIS with 24 processes  50
3-2 Particles in a cell whose corners are owned by different processes  51
3-3 Average iteration time for the low density plasma, regular grid test  55
3-4 Average iteration time for the high density plasma, regular grid test  56
3-5 Average iteration time for the high density plasma, large grid test  56
3-6 Average time per iteration (high density, regular grid, 24 processes)  57
3-7 Computation time breakdown - low density  58
3-8 Computation time breakdown - high density  58
3-9 Number of electron super-particles in various PPC configurations  65
4-1 Schematic DCFT diagram (from [9], p. 31)  68
4-2 Simulated CCFT magnetic field (from [20], p. 36)  70
4-3 Sketch of the electrical setup used for the CCFT experiments  71
4-4 Stage system and thruster stand  72
4-5 The Faraday cup  73
4-6 Cutaway drawing of the RPA  73
4-7 Collected current as a function of the repelling voltage - Configuration A, 0°  73
4-8 Graphite cap showing some fragments of molten steel  74
4-9 Stainless steel rod showing evidence of melting (at the right tip)  75
4-10 Red glow emitted by the anode assembly during operation  75
4-11 Normal plume  77
4-12 Bag plume  77
4-13 Jet plume  77
4-14 Anode current-voltage characteristics - I  79
4-15 Anode current-voltage characteristics - II  79
4-16 Plume current density - Case 1  81
4-17 Plume current density - Case 2  82
4-18 Plume current density - Case 3  82
4-19 Plume current density - Case 4  83
4-20 Plume current density - Case 5  83
4-21 Normalized dI/dV at different angles - case A  85
4-22 Normalized dI/dV at different angles - case B  85
4-23 Normalized dI/dV at different angles - case C  85
5-1 Mesh numbering scheme used by PTpic up to now  88

List of Tables

2.1 Parameters for CHT simulation A  32
2.2 Performance parameters - CHT case A  33
2.3 Parameters for CHT simulation B  35
2.4 Average performance parameters - CHT case B  37
2.5 Parameters for CHT simulation C  37
2.6 Average performance parameters - CHT case C  38
2.7 Parameters for CHT simulation D  39
2.8 Average performance parameters - CHT case D  40
2.9 Parameters for the DCFT simulation  43
2.10 Performance parameters - DCFT simulation  43
3.1 Common parameters for code speed experiments  54
3.2 Specific parameters for the low density plasma, regular grid test  54
3.3 Specific parameters for the high density plasma, regular grid test  55
3.4 Parameters for the Particle Predictor-Corrector assessment  64
4.1 DCFT performance at the nominal operating point (from [20], p. 30)  69
4.2 Configurations for the voltage-current characteristics  78
4.3 Comparison of measured anode current to the anode current expected from full single ionization and full double ionization of the propellant  80
4.4 Configurations for the Faraday cup scans  81
4.5 Integrated results for Faraday cup scans  82
4.6 Configurations for the RPA measurements  84

Chapter 1

Introduction

1.1 Space propulsion

The constraints of cost and mass associated with spaceflight make the design of any space propulsion device a challenging task. Aside from considerations of cost and reliability, two parameters are especially relevant to mission planners when considering the powerplant of a spacecraft: the thrust F (in N) and the specific impulse I_sp (usually expressed in s), which are connected by the following equation, with g = 9.81 m/s^2 and ṁ the total mass flow rate to the engine (including oxidizer if applicable):

\[ I_{sp} = \frac{F}{\dot{m}\, g} \]  (1.1)

Thus, the specific impulse quantifies the "fuel efficiency" of a thruster: the higher it is, the lower the fuel flow rate required to achieve a given level of thrust. If the thruster ejects only one type of particle at a single velocity c, and it operates in vacuum (no pressure effects), there is a straightforward expression of F, and thus of I_sp:

\[ F = \dot{m}\, c, \qquad I_{sp} = \frac{c}{g} \]  (1.2)

Thus, the specific impulse is directly tied to the exhaust velocity achieved by the engine.
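As a quick numerical illustration of Eqs. (1.1) and (1.2), the short program below computes the specific impulse and effective exhaust velocity for a hypothetical operating point; the thrust and mass flow values are placeholders chosen for illustration, not data from this thesis.

```cpp
#include <cstdio>

int main() {
    // Hypothetical operating point (illustrative values only).
    const double thrust_N  = 5.0e-3;   // 5 mN, representative of a small electric thruster
    const double mdot_kg_s = 0.4e-6;   // 0.4 mg/s total propellant flow (assumed)
    const double g0        = 9.81;     // m/s^2

    const double isp_s = thrust_N / (mdot_kg_s * g0);   // Eq. (1.1)
    const double c_m_s = thrust_N / mdot_kg_s;          // effective exhaust velocity, Eq. (1.2)

    std::printf("Isp ~ %.0f s, effective exhaust velocity ~ %.0f m/s\n", isp_s, c_m_s);
    return 0;   // prints roughly 1270 s and 12500 m/s for these assumed values
}
```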
The thrusters developed to this day can be broadly classified into two categories. Chemical rockets are characterized by the ability to develop a very high thrust (up to 6.77 MN for the F-1 engine), at the expense of specific impulse. The best cryogenic engines achieve a vacuum I_sp of 450-460 s (RL-10 or Space Shuttle Main Engine), not far from the theoretical limit of about 500 s for the liquid hydrogen / liquid oxygen mixture. On the other hand, electric thrusters, which use an electromagnetic field to accelerate charged particles, can achieve specific impulses of several thousand seconds. However, they usually generate less than a newton of thrust, both because of intrinsic limitations and because of the limited amount of electric power available on spacecraft.

1.2 Hall Effect Thrusters

Hall Effect Thrusters, also commonly referred to as "Hall thrusters", belong to the electric thruster family. The initial development effort, from the 1960s until the early 1990s, was largely carried out by Soviet engineers and scientists. The typical Hall thruster, such as the Soviet/Russian SPT-100, is described in fig. 1-1. It features an annular discharge channel, sealed at one end by the anode and open at the other end. Gaseous xenon is injected near the anode region; as it travels through the discharge channel, it gets ionized by electrons coming from the external cathode, and the resulting ions are ejected at a high velocity because they are repelled by the anode. A radial magnetic field is set across the chamber, usually by electromagnets. Under the combination of electric and magnetic fields, the electrons, instead of going straight to the anode, drift in the azimuthal direction, an effect known as the E x B drift (see [4], p. 50). This increases the electron residence time, and thus the probability of hitting a neutral, by several orders of magnitude.

Figure 1-1: Hall Thruster schematic (annular boron nitride channel, anode / gas distributor, external cathode-neutralizer, inner and outer magnetic coils, magnetic circuit)

Compared to ion engines (see [13] for a detailed description), Hall thrusters have several key advantages:

- Only one power supply, between the anode and the cathode, is required, while ion engines need at least two.
- They can achieve a much higher thrust density because the plasma is quasineutral everywhere, and thus is not subject to the space-charge limitation (see [13], p. 85).

However, a few disadvantages must be noted. Most of them stem from the fact that a single voltage difference is used for both ionization and acceleration, whereas ion engines allow the operator to tune the ionization voltage and the acceleration voltage separately:

- Lower specific impulses, usually in the 1600 s range.
- Lower efficiencies.
- Since the whole chamber contains high-energy ions, erosion of the lining material (usually boron nitride) is a critical issue and ultimately determines the lifetime of the device.

Extensive flight experience has been accumulated by Soviet and Russian satellite operators since the 1970s, as well as by Western operators since the late 1990s. Hall thrusters are mostly used for station-keeping and orbit raising (in 2010, a Hall thruster originally meant for station-keeping was used to raise the USA-214 satellite to geostationary orbit after the failure of its liquid-fuel apogee motor), but a few spacecraft, such as the European Space Agency's SMART-1 moon orbiter, have used them as main propulsion.

1.3 Miniaturized Hall thrusters

Besides the "traditional" Hall thruster of fig. 1-1, characterized by an annular discharge channel and a radial magnetic field, less conventional designs have been investigated since the late 1990s.
A primary driver for these efforts is the need to miniaturize Hall thrusters in order to make them useful for a wider range of spacecraft. When scaling down a Hall thruster, the designer typically wants to preserve the following ratios:

- Mean free path to characteristic length, in order to keep the collisionality level constant. Since the device is smaller, this means that a higher plasma density is required.
- Electron gyroradius to characteristic length, in order to keep the electrons magnetically confined. This requires a significantly higher magnetic field.

Consequently, the following characteristics are common among miniaturized Hall thrusters:

- A cylindrical discharge channel (no centerpiece), which offers a lower area-to-volume ratio than an annular one for small radii, and thus reduces losses to the walls.
- An intense magnetic field with a complex topology, due to the absence of a centerpiece, which prevents the establishment of a radial magnetic field as in the conventional design.
- While some designs, such as Princeton's Cylindrical Hall Thruster, retain electromagnets, permanent Samarium-Cobalt magnets tend to be preferred.

1.4 Hall thrusters numerical simulation

Significant efforts have been invested in the development of simulation codes for Hall thrusters and other electric propulsion devices. Fully kinetic, or Particle-in-Cell (PIC), codes, which model all species as collections of superparticles, theoretically have the greatest potential for high-fidelity simulation. The MIT SPL has developed its own code, called PTpic (Plasma Thruster Particle-in-Cell), since the early 2000s.

1.5 Thesis Overview

This thesis comprises three main parts. In the first one, we present the main properties of PIC codes, describe the essential features of PTpic, and apply it to two different devices for which a large body of experimental data is available. The second part deals with the improvements made to PTpic by the author, including a major redesign which substantially enhances performance. The third and last part is independent of the first two and deals with the testing of the Cylindrical Cusped-Field Thruster, recently built at the SPL.

Chapter 2

The PTpic code: description and applications

2.1 Particle-in-Cell codes

Due to the high cost of testing, and the inherent difficulty of instrumenting the thruster without significantly disturbing its operation, numerical simulation is a useful tool to predict the characteristics of a Hall thruster. However, due to the low density and large mean free path prevalent in the device, a fluid treatment is not appropriate; instead, part or all of the charged species must be simulated as collections of particles, which consumes significant computational resources. If all species (neutrals, electrons and ions) are treated as particles, as is the case in PTpic, the method is said to be fully kinetic. If some species, usually the electrons, are treated as a fluid, the method is "hybrid".

The properties of a Hall thruster arise from the interaction of charged particles with the electromagnetic field created partly by the particles themselves, and partly imposed by the boundary conditions. While it is possible to calculate directly the force exerted by all other particles on the particle of interest, it is impractical because it requires O(N^2) operations per step (N being the number of particles). A more common solution consists in projecting the charge carried by the particles onto a grid, solving for the potential on this grid, and then interpolating to find the value of the field at the particles' positions. With this approach, the cost grows only linearly with the number of particles. This method is called Particle-in-Cell (PIC) and has been used by PTpic since its inception.
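The structure of one such scatter-solve-gather-push iteration can be summarized with the following sketch. The types and function names (Particle, Grid, pushParticle) are illustrative stand-ins, not PTpic's actual data structures; only the declarations are shown and the bodies are omitted.

```cpp
#include <vector>

// Illustrative data structures -- not PTpic's actual types.
struct Particle { double x[2], v[3], q, m; };

struct Grid {
    void clearCharge();                                          // zero the charge accumulator
    void depositCharge(const double x[2], double q);             // scatter charge to the cell corners
    void solvePotential();                                       // solve the discretized Poisson problem
    void computeElectricField();                                 // E = -grad(phi) at the nodes
    void gatherFields(const double x[2], double E[3], double B[3]) const;
};

void pushParticle(Particle& p, const double E[3], const double B[3], double dt);

// One electrostatic PIC iteration: scatter -> field solve -> gather -> push.
void picIteration(std::vector<Particle>& particles, Grid& grid, double dt) {
    grid.clearCharge();
    for (const Particle& p : particles)
        grid.depositCharge(p.x, p.q);     // particles contribute charge to the grid nodes

    grid.solvePotential();                // field solve on the grid (magnetic field held fixed)
    grid.computeElectricField();

    for (Particle& p : particles) {
        double E[3], B[3];
        grid.gatherFields(p.x, E, B);     // interpolate the fields back to the particle position
        pushParticle(p, E, B, dt);        // advance velocity and position (e.g. leapfrog/Boris)
    }
}
```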
Figure 2-1: The PTpic cycle

2.1.1 Operations performed over one iteration

During one iteration, an electrostatic PIC code such as PTpic performs the following operations:

- Calculation of the electric field; the magnetic field is assumed constant (electrostatic code).
- "Particle pusher": moves the particles to their new positions and handles collisions.
- Calculation of particle moments, including the charge, from the new particle distribution.

2.1.2 Stability limits

This section applies to electrostatic PIC codes with the following characteristics:

- Particles are stepped forward with the leapfrog algorithm (see Annex A).
- The electric field used to move the particles at iteration n is determined by the charge distribution at the previous iteration (n - 1):

\[ \Delta \phi^{n+1} = -\frac{\rho^n}{\varepsilon_0} \]

Under these conditions, PIC codes are subject to the following instabilities:

- Plasma oscillations instability, when ω_pe Δt > 2 (ω_pe being the plasma pulsation). This instability can be demonstrated by deriving the modified plasma dispersion relation when time is discrete (with a timestep Δt) rather than continuous (see [15]).
- Debye length instability, when the ratio λ_D / Δx is too small (λ_D being the Debye length), i.e. when the mesh does not resolve the Debye length. A detailed analysis can be found in Birdsall ([3]).
- CFL-type instability, when v_th Δt / Δx > 1 or v_0 Δt / Δx > 1 (v_th being the thermal velocity and v_0 the beam velocity). It is usually not a problem for the simulations we are running.
- Cyclotron instability, when ω_c Δt > 2 (ω_c being the cyclotron pulsation). However, Parker ([21]) suggests that this instability is rather benign.
- Nonlinear instability ([16]), when χ = (ω_pe Δt)^2 / N_s >> 1, where N_s is the number of superparticles per cell. In that case, the density of superparticles is too low, so that the movement of a single superparticle induces a large variation of the electric field. It is not an issue as of now.

2.2 PTpic features

Alteration of physics

The typical conditions prevailing in the channel of a Hall thruster (n_e ~ 10^17 - 10^18 m^-3, T_e ~ 15 eV) mean that the Debye length is of the order of 10-50 µm. For a cylindrical Hall thruster, such as those we want to model with PTpic, the channel size is usually 3-5 cm in length and 1.5-2 cm in radius. This means that, in order to resolve the Debye length, a mesh of roughly 4000 x 2000 nodes is required. The timestep, dictated by the cyclotron frequency and the plasma frequency criterion (see above), has to be of the order of 10^-11 s. The convergence time is usually estimated from the time taken by the slowest species, i.e. the neutrals, to cross the chamber. Neutrals have a speed of about 200 m/s, yielding a conservative estimate for the convergence time of 10^-3 s, i.e. at least 100 million iterations. With the computer facilities available at the SPL, such a simulation would take several months to complete.

In order to make the task more tractable, PTpic uses an altered physics model developed by James Szabo ([28]), which greatly alleviates the computational workload:

- The vacuum permittivity ε_0 is increased by a factor γ^2. The Debye length is thus multiplied by γ, allowing the use of a much coarser mesh.
Additionally, this makes the simulation less prone to the plasma oscillations instability, since it also divides the plasma pulsation ω_pe by γ.
- The mass of heavy species (ions and neutrals) is reduced by a factor f. This multiplies their speed, and thus reduces the convergence time, by √f.
- In order to compensate for the reduced neutral residence time, the electron-neutral cross-sections are increased by √f.

The rationale backing this methodology is that the overall discharge properties will not change if the Debye length remains small compared to the dimensions of the thruster; nor will they change if the mass ratio of ions and neutrals to electrons remains large enough. It does, however, require a "post-processing" step in order to recover the "physical" parameters from the simulated ones. For instance, the ion beam current has to be divided by √f in order to recover a meaningful value. All these manipulations are detailed in Szabo's PhD thesis ([28], p. 87).

Electric field solver

The governing equation for the electric potential is discretized by applying Gauss's theorem to a small control volume surrounding each node ([12], p. 144). This yields a matrix equation linking the charge and the potential at each node:

\[ A\phi = \tilde{\rho} \]  (2.1)

where φ and ρ̃ are the vectors containing the electric potential and the normalized charge at each node, respectively. In its current version, PTpic uses a direct solver which computes an LU decomposition of A during initialization. Reusing this decomposition throughout the run enables a significant saving of time compared to previous versions of the code, which used an iterative solver ([12], p. 166).

Particle pusher

The method used to advance the particles in time is the Boris algorithm, which is the most common choice for PIC codes (see Annex A). Its popularity stems from the fact that it is second-order accurate and has good energy-conservation properties, which some more accurate methods, such as fourth-order Runge-Kutta, do not possess. However, it is bounded by the stability limits mentioned earlier.

Collisions management

The "particle pusher" step is also responsible for the handling of collisions. The collisions currently included in PTpic are:

- Electron-Neutral elastic scattering.
- Ionization and double ionization collisions. Double ions can be generated from single ions and from neutrals.
- Electron-Neutral excitation in one lumped level.
- Ion-Neutral charge-exchange collisions.
- Ion-Neutral and Neutral-Ion scattering.
- Electron and Ion wall recombination.
- Secondary Electron Emission.

All the particle-particle collisions are modeled with a Monte-Carlo Collisions (MCC) methodology. This means that, for each collision, one species (the "target") is treated as a background of particles characterized only by its density n_target and its bulk speed u_target. Then, for each superparticle of the other species, the probability p of a collision over the timestep is calculated from the collision frequency ν (see Szabo [28], p. 192):

\[ p = 1 - e^{-\nu\,\Delta t}, \qquad \nu = n_{target}\, Q\, \| \mathbf{v} - \mathbf{u}_{target} \| \]  (2.2)

Q being the collision cross-section, evaluated at the incoming particle's relative velocity. This probability is then compared to a random number r drawn from a uniform distribution between 0 and 1; if r < p, the collision is allowed to proceed, provided that there are enough target superparticles to support the collision event.
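The acceptance test of Eq. (2.2) is easy to illustrate in code. The sketch below assumes the common form p = 1 - exp(-νΔt), which reduces to νΔt for small νΔt; the function and variable names are hypothetical and do not correspond to PTpic's internals.

```cpp
#include <cmath>
#include <random>

// Decide whether one superparticle undergoes a collision during this timestep,
// following the MCC acceptance test of Eq. (2.2). Illustrative sketch only.
bool collisionOccurs(double crossSection,      // Q(|v - u_target|), in m^2
                     double targetDensity,     // n_target, in m^-3
                     const double v[3],        // incoming superparticle velocity
                     const double uTarget[3],  // bulk velocity of the target background
                     double dt,
                     std::mt19937& rng)
{
    const double dvx = v[0] - uTarget[0];
    const double dvy = v[1] - uTarget[1];
    const double dvz = v[2] - uTarget[2];
    const double relSpeed = std::sqrt(dvx*dvx + dvy*dvy + dvz*dvz);

    const double nu = targetDensity * crossSection * relSpeed;   // collision frequency
    const double p  = 1.0 - std::exp(-nu * dt);                  // collision probability over dt

    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) < p;   // accept the collision if the random draw falls below p
}
```

In PTpic the candidates accepted by this test are additionally checked against the number of target superparticles actually available in the cell, as described above.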
Particle-surface collisions are handled as follows:

- Neutrals undergo a diffuse reflection with an energy accommodation factor of 0.5.
- Ions are converted to neutrals and diffusely reflected, again with an energy accommodation factor of 0.5.
- Electrons are destroyed; if they hit a dielectric, their charge is transferred to it and secondary electrons are emitted in accordance with the yield reported in [13], p. 349.

Moments calculation

Finally, certain particle moments have to be calculated at each iteration because they are required by the field solver or the collision management functions. The electric charge is calculated with a first-order weighting, which has lower "numerical noise" than the nearest-grid-point (NGP) weighting. The particle moments which do not directly contribute to the simulation, but provide valuable information on the operation of the thruster, are only computed every few dozen thousand iterations.

Parallelization

PTpic started as a serial code (i.e. run on a single processor), but it was parallelized by Fox ([11]) in 2005. Inter-process communications are handled with MPI (Message-Passing Interface); the implementation of MPI used in this thesis is OpenMPI. Chapter 3 deals with the efforts made to increase the speed and scalability (i.e. the ability to run efficiently on a large number of processes) of the parallelized code.

2.3 Princeton Cylindrical Hall Thruster

The first device studied here is the 2.6 cm diameter Cylindrical Hall Thruster (CHT), built at the Princeton Plasma Physics Laboratory (PPPL) ([23]). This thruster was selected because it is one of the best studied miniaturized Hall thrusters; thus a large body of published data is available against which to evaluate the simulation results.

2.3.1 Simulation domain and Boundary conditions

A 193 x 157 mesh with a uniform spacing of 0.25 mm was used. A drawing of the simulation domain is provided in fig. 2-2. The light grey parts represent boron nitride; the dark grey parts represent metal. The boundary conditions for the electric potential are a combination of Dirichlet and Neumann type:

- The metal parts (dark grey) are set to a prescribed potential φ_m.
- The anode (red) is set to a prescribed potential.
- On the centerline (yellow), the electric field is tangent to the boundary.
- On the free space boundaries (blue), the potential is set to 0.

Figure 2-2: CHT simulation domain

2.3.2 Magnetic field

The CHT has two electromagnetic coils which allow for an adjustable magnetic field. All the simulations were performed with the magnetic field (fig. 2-3) corresponding to a current of +1.4 A in the back coil and -0.9 A in the front coil. The maximum magnetic field in the channel is 270 G. This magnetic field was calculated with MAXWELL, a finite element software package.

Figure 2-3: CHT magnetic field, with superimposed streamlines
Table 2.1: Parameters for CHT simulation A
  Anode flow rate: 4 sccm
  Anode voltage: 250 V
  γ: 50
  f: 1000
  Propellant: Xenon
  Super-particle size (ions and electrons): 10^8
  Super-particle size (neutrals): 10^8
  Timestep: 10^-11 s
  Number of iterations: 3 x 10^6
  Thruster body potential φ_m: 20 V

2.3.3 Results

Note: PTpic outputs the performance parameters (thrust, anode current, ...) at every iteration. All graphs presented hereafter were time-averaged within groups of 1000 iterations.

Case A

A first round of simulations was performed with a conservative value of the artificial permittivity factor: γ = 50. The values of the other significant simulation parameters are given in table 2.1. After a large ionization peak, convergence is reached at about 8 x 10^-6 s, i.e. 800,000 iterations, as evidenced by figures 2-4 and 2-5. Performance parameters in the steady state are presented in table 2.2, alongside the experimental values measured at the PPPL ([26]). The current utilization η_c is defined as the ratio of ion beam current to anode current. The propellant utilization η_p is defined as the ratio of ion beam current to the anode mass flow rate expressed in amperes, with 1 sccm Xe corresponding to 0.0722399 A ([13], p. 464).

Figure 2-4: Count of electron superparticles - CHT case A
Figure 2-5: Anode current - CHT case A

Table 2.2: Performance parameters - CHT case A
  Quantity                       Experimental    Case A
  Thrust (mN)                    4.0             2.54
  Anode current (A)              0.57            0.181
  Ion beam current (A)           0.288           0.139
  Electron beam current (A)      -0.288          -0.128
  Current utilization η_c        0.51            0.77
  Propellant utilization η_p     0.98            0.47
  Efficiency η                   0.22            0.18

The simulated discharge is significantly weaker than the actual one, with the thrust under-predicted by about 40% and an anode current about 3 times smaller than the experimental value. However, the computed efficiency (0.18) is close to the experimental value of 0.22.

Let us now examine the structure of the plasma in its steady state. The electron density (fig. 2-6) is rather homogeneous in the chamber, but features a narrow region of higher density (peaking at 2.8 x 10^19 m^-3) on the center axis. The electron temperature (fig. 2-7) is low, with most regions below 7 eV, while experimental values for channel electrons put the temperature at 15 eV ([25]).

Figure 2-6: Electron density - CHT case A
Figure 2-7: Electron temperature - CHT case A
Figure 2-8: Ion density - CHT case A

The ion density (fig. 2-8) is very similar to the electron density in the channel region. The plume, however, has a conical structure of its own, with a divergence angle between 30° and 45°. This rather large divergence angle echoes the observations made at the MIT SPL on the DCFT ([18]) and the CCFT (see section 4.3.4).
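To make the definitions of η_c and η_p concrete, the following check recomputes the utilization numbers of Table 2.2 from the simulated currents and the anode flow rate, using the conversion 1 sccm Xe = 0.0722399 A quoted above. It is a verification of the definitions only, not part of PTpic.

```cpp
#include <cstdio>

int main() {
    // Values from Table 2.2 (CHT case A) and the sccm-to-ampere conversion for xenon.
    const double ionBeamCurrent_A = 0.139;
    const double anodeCurrent_A   = 0.181;
    const double anodeFlow_sccm   = 4.0;
    const double sccmXe_to_A      = 0.0722399;

    const double currentUtilization    = ionBeamCurrent_A / anodeCurrent_A;
    const double propellantUtilization = ionBeamCurrent_A / (anodeFlow_sccm * sccmXe_to_A);

    std::printf("eta_c ~ %.2f, eta_p ~ %.2f\n", currentUtilization, propellantUtilization);
    return 0;   // prints roughly 0.77 and 0.48, consistent with Table 2.2
}
```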
Case B

In order to improve the fidelity of the simulation, another run was performed with γ = 25. φ_m was also raised to 100 V in order to ease the initiation of the discharge by attracting a larger number of electrons to the thruster.

Table 2.3: Parameters for CHT simulation B
  Anode flow rate: 4 sccm
  Anode voltage: 250 V
  γ: 25
  f: 1000
  Propellant: Xenon
  Super-particle size (ions and electrons): 10^8
  Super-particle size (neutrals): 10^8
  Timestep: 10^-11 s
  Number of iterations: 3 x 10^6
  Thruster body potential φ_m: 100 V

After an initial ionization surge, the discharge enters a highly oscillatory mode, as evidenced by fig. 2-9, which shows the anode current. On this graph, the "corrected time" is used; it is equal to the simulation time multiplied by √f. The rationale for this manipulation is that low-frequency oscillations in Hall thrusters are usually set by the travel time of neutrals through the chamber. Since neutrals are accelerated by a factor of √f in our altered physics model (see 2.2), the period of the simulated oscillations must be multiplied by the same factor in order to make comparison with experimental measurements possible.

Figure 2-9: Anode current - CHT case B

With this dilated time, the oscillation frequency appears to be around 16.7 kHz (a corrected period of about 60 µs), which agrees quite well with the findings of the PPPL experiments, which detected oscillations at a frequency slightly below 20 kHz ([24]). These oscillations have the effect of increasing very significantly the average anode current and thrust compared to the previous run, as evidenced by table 2.4. The thrust is now about 25% higher than the experimental value, and the anode current is much closer to 0.57 A. An undesirable effect of the high thruster potential φ_m is that it allows the thruster front surface to draw a large amount of electrons, thus leading to an unphysical imbalance between the ion and electron beam currents.

Table 2.4: Average performance parameters - CHT case B
  Thrust (mN): 5.03
  Anode current (A): 0.361
  Ion beam current (A): 0.408
  Electron beam current (A): -0.15
  Current utilization η_c: 1.13
  Propellant utilization η_p: 1.39
  Efficiency η: 0.35
  Oscillations corrected period (µs): 60
  Anode current oscillations relative amplitude: 5.1

Case C

Pursuing our trend towards lower and more realistic values of γ, we now attempt a simulation with γ = 16. φ_m is set to the more realistic value of 20 V, as in run A. Quite surprisingly, the discharge is steady (see fig. 2-10), without any trace of the large amplitude oscillations observed in run B.

Table 2.5: Parameters for CHT simulation C
  Anode flow rate: 4 sccm
  Anode voltage: 250 V
  γ: 16
  f: 1000
  Propellant: Xenon
  Super-particle size (ions and electrons): 10^8
  Super-particle size (neutrals): 10^8
  Timestep: 10^-11 s
  Number of iterations: 3 x 10^6
  Thruster body potential φ_m: 20 V

Figure 2-10: Anode current - CHT case C

Table 2.6: Average performance parameters - CHT case C
  Thrust (mN): 5.25
  Anode current (A): 0.407
  Ion beam current (A): 0.283
  Electron beam current (A): -0.270
  Current utilization η_c: 0.695
  Propellant utilization η_p: 0.98
  Efficiency η: 0.34

In spite of this lack of oscillations, the performance parameters remain high, and close to the values they had with γ = 25. A noteworthy point is the good agreement between the predicted (0.283 A) and experimental (0.288 A) beam currents. While the electron density map (fig. 2-11) shows the same general structure as in run A, the electron temperature (fig. 2-12) is significantly higher. A large part of the channel has an electron temperature above 10 eV, with peaks well above 20 eV.

Figure 2-11: Electron density - CHT case C
Figure 2-12: Electron temperature - CHT case C

Case D

In this last case, γ is set to 10 and φ_m = 20 V.

Table 2.7: Parameters for CHT simulation D
  Anode flow rate: 4 sccm
  Anode voltage: 250 V
  γ: 10
  f: 1000
  Propellant: Xenon
  Super-particle size (ions and electrons): 10^8
  Super-particle size (neutrals): 10^8
  Timestep: 10^-11 s
  Number of iterations: 3 x 10^6
  Thruster body potential φ_m: 20 V

Figure 2-13: Anode current - CHT case D

Table 2.8: Average performance parameters - CHT case D
  Thrust (mN): 3.462
  Anode current (A): 0.238
  Ion beam current (A): 0.261
  Electron beam current (A): -0.268
  Current utilization η_c: 1.097
  Propellant utilization η_p: 0.904
  Efficiency η: 0.256
This run shows a return of large discharge oscillations, although their amplitude 38 T e [eVI 24 3.5 22 20 18 16 14 12 10 8 6 4 2 0 3 -2.5 E 2 1.5 0.5 00 0.5 1 1.5 2 2.5 Z [CM] 3 O.U + 14.; Figure 2-12: Electron temperature - CHT case C Table 2.7: Parameters for CHT simulation D Anode flow rate (secm) Anode Voltage (V) 7 f Propellant 4 250 10 1000 Xenon Super-particle size (ions and electrons) Super-particle size (neutrals) Timestep (s) Number of iterations Thruster body potential #m (V) 39 108 108 10-11 3.106 20 0.6 0.5 0.4 i 0.3 0.2 0.1 8.OOE-06 9.00E-06 1.00E-05 1.10E-05 1.20E-05 1.30E-05 1.4E-0S 1.5OE-05 1.60E-05 Time (s) Figure 2-13: Anode current - CHT case D Table 2.8: Average performance parameters - CHT case D C Thrust (mN) Anode current (A) Ion beam current (A) Electron beam current (A) Current utilization 77c Propellant utilization T7H Efficiency q 3.462 0.238 0.261 -0.268 1.097 0.904 0.256 is lower than in run B (fig. 2-13). While run C suggested that oscillations in run B may be due to the high value of OM rather than the decrease in -y, this last simulation invalidates this hypothesis. The average anode current and thrust are lower than for runs B and C (table 2.8). This may be connected to the observed propensity of cylindrical Hall thrusters to switch their operating mode. For instance, Raitses ([24]) reports that oscillations can be turned on and off in the CHT by changing the cathode settings, which are currently not included in the PTpic model. Different numerical treatments may end up simulating different modes; this should be further investigated. 40 2.4 MIT Diverging Cusped-Field Thruster This section deals with the application of PTpic to the Diverging Cusped Field Thruster, the first cusped field thruster built at the SPL (see [9]). This simulation program was started by Gildea ([12], p.183), who extensively studied the y = 50 case. At the time, he was using a feature of PTpic by which the thruster body potential Om is allowed to change, based on the amount of charge collected ([12], p. 158). While it is true that OM should be allowed to float in order to replicate the reality more closely, implementation has been troublesome because of the difficulty of accounting for the variations of capacitance caused by the presence of plasma. Thus we will use a fixed Om, as is standard in the Hall thruster simulation community. Apart from this change, the code used for these simulations is essentially the same as used by Gildea; it does not include the modifications presented in the next chapter. 2.4.1 Simulation domain and Boundary conditions A 213x113 mesh was used. It is based on the mesh used by Gildea in his thesis, but focuses on the discharge channel in order to reduce the computational cost. Due to the divergent shape of the thruster, this mesh is not mapped to a rectangular physical space and thus the shape and size of its cells varies significantly; the spacing is smaller near the anode and the cusps, where the highest plasma densities are expected. A drawing of the simulation domain, with the electric potential boundary conditions, is provided in fig. 2-14. 2.4.2 Magnetic field The DCFT magnetic field (fig. 2-15) is created by permanent Samarium-Cobalt magnets. It is much stronger than in the CHT, most of the channel being exposed to a field greater than 2000G. The strong gradient from the center axis towards the walls, as well as the cusped magnetic streamlines, are clearly visible. This magnetic field was also calculated with MAXWELL. 
Figure 2-14: DCFT simulation domain
Figure 2-15: DCFT magnetic field, with superimposed streamlines

Table 2.9: Parameters for the DCFT simulation
  Anode flow rate: 6 sccm
  Anode voltage: 300 V
  γ: 25
  f: 1000
  Propellant: Xenon
  Super-particle size (ions and electrons): 10^8
  Super-particle size (neutrals): 10^8
  Timestep: 10^-11 s
  Number of iterations: 3 x 10^6
  Thruster body potential φ_m: 20 V

Table 2.10: Performance parameters - DCFT simulation
  Thrust (mN): 5.4
  Anode current (A): 0.309
  Ion beam current (A): 0.296
  Electron beam current (A): -0.298
  Current utilization η_c: 0.958
  Propellant utilization η_p: 0.683
  Efficiency η: 0.267

2.4.3 Results

The simulation presented in this section was performed with the parameters described in table 2.9. The chosen value of γ represents a significant decrease compared to the γ = 50 used in [12]. Performance parameters in steady state are given in table 2.10. The anode current prediction closely matches the experimental value of 0.3 A (see [12], p. 80). Unfortunately, direct experimental measurements are not available for the other performance parameters. However, comparison with the measurements made by Courtney ([9], p. 85) for a configuration at 300 V anode voltage and 8.5 sccm anode flow rate shows that the simulation results are reasonable.

The thrust and anode current are significantly higher than the values reported by Gildea ([12], p. 189), which were 2.72 mN and 0.198 A, respectively. The results of a γ = 50, fixed-φ_m run performed by the author (not included in this thesis) strongly suggest that this increase is due to the lower value of γ, rather than to the switch from a floating to a fixed body potential.

Finally, this discharge is clearly steady; the amplitude of the fluctuations is very small (fig. 2-16). Thus the oscillations seen by Gildea were likely due to the floating body potential boundary condition.

Figure 2-16: Anode current - DCFT simulation
Figure 2-17: Electron density in steady state

Snapshots of the electron density and temperature in steady state are provided in figs. 2-17 and 2-18. The shaping of the electron density by the magnetic field cusps is clearly visible. Interestingly, the high density regions (around 2 x 10^18 m^-3) near the cusps seem to be at a low temperature (about 5 eV). These density and temperature maps globally agree with those obtained by Gildea ([12], p. 192 & 198). While the performance parameters reported in table 2.10 are reasonable, the high electron density and temperature seen in the anode region suggest that we may be locally close to the onset of numerical instability. In fact, simulations attempted at γ = 16 were generally unsuccessful, contrary to the CHT case.

Figure 2-18: Electron temperature in steady state

Chapter 3

The PTpic code: improvements

3.1 Parallelization redesign

3.1.1 Background

PTpic has proved itself a valuable research tool, but over recent years its limitations have become more apparent.
The Cusped-Field Thrusters studied at the MIT SPL since 2008 have proved especially challenging to model, since their intense magnetic field and high plasma density make them prone to the numerical instabilities mentioned in section 2.1.2, which in turn mandates the use of a very small timestep (1 x 10^-12 s), even with a high value of γ. For example, Steve Gildea's DCFT simulations were performed with γ = 50 (meaning that the vacuum permittivity is 2500 times higher in the simulation than in reality) and Δt = 1 x 10^-12 s. Although these simulations were able to predict the erosion rate along the dielectric surface with good accuracy, some other features, such as the location of the main ionization region, were not resolved with the same success.

Tackling these issues and obtaining a more realistic simulation likely requires the use of a vacuum permittivity as close as possible, and ideally identical, to its real value. This, in turn, requires a much finer mesh in order to resolve the local Debye length, as well as a much larger number of superparticles, so that the number of superparticles per cell stays high enough to allow for the calculation of meaningful statistics. If run on 24 processes, as was standard during Gildea's work, such a simulation would take several months to converge.

Since PTpic is a parallel code, the obvious solution consists in running it on more processes. Unfortunately, the version of PTpic inherited by the author has limited scalability; in practice, using more than 30 processes does not yield any reduction in computation time (also called "wall time"), and can even increase it. Thus, a prerequisite to the use of more realistic values of γ and f is the removal of this scalability limitation, together with a general acceleration of PTpic.

A key limiting factor to the speed and scalability of PTpic is the fact that each process manages particles spread all over the grid (see [11]). This non-local architecture degrades performance for two reasons:

- Since each process needs, and contributes to, the particle moments everywhere on the grid, the calculation of particle moments requires many time-consuming calls to MPI collective communication functions per iteration. This effect is especially significant when there are few particles, for instance at the beginning of a run, because the statistics calculation then takes a larger share of the overall iteration time.
- PTpic is designed to ensure that the number of collision events per iteration in a given cell is lower than the number of target superparticles present in this cell. For instance, the number of Xe+ to Xe++ transitions in a given cell must be smaller than the number of Xe+ superparticles in the cell. If this count check were performed independently by each process, the collisionality would be artificially decreased, because a process could exhaust its quota of target superparticles and thus reject a collision event while there are still target superparticles in the cell belonging to other processes. Thus, each process has to send all its collision candidates to a single process, which approves or cancels them based on the total count of target superparticles (see [11]). This introduces a serial component in the code which degrades speed and scalability, especially when there is a large number of particles.
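A back-of-the-envelope Amdahl's-law estimate shows why even a modest serial component caps the achievable speedup. The 10% serial fraction used below is an assumption chosen for illustration, not a measured property of PTpic.

```cpp
#include <cstdio>

// Amdahl's law: speedup S(P) = 1 / (s + (1 - s)/P), where s is the serial fraction.
// The value s = 0.10 is an illustrative assumption, not a PTpic measurement.
int main() {
    const double serialFraction = 0.10;
    for (int processes : {8, 24, 48, 96}) {
        const double speedup = 1.0 / (serialFraction + (1.0 - serialFraction) / processes);
        std::printf("P = %2d  ->  speedup = %.1fx\n", processes, speedup);
    }
    return 0;   // with s = 0.1 the speedup saturates near 10x, however many processes are added
}
```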
Rather than this non-local architecture, a better program design consists in assigning a region of the domain to each process, so that all particles in that region are managed by a single process (local architecture). The calculation of particle moments then no longer requires inter-process communication, except for a few boundary cells (and even in this case, the situation is much better than in the old design, because the local process only needs to exchange information with its neighbours, rather than with all processes), and each process can manage its collision events in full autonomy.

This introduces a new issue, though: load balancing. In Fox's design, it was automatically guaranteed because each new particle was randomly assigned to a process. Now, we need to repartition the domain periodically so that each process manages roughly the same number of particles. Fortunately, a team at the University of Minnesota has developed a freely available package, called ParMETIS, which is able to quickly compute load-balanced partitionings. With ParMETIS available, it was decided to go ahead with a major redesign of PTpic, switching from a non-local to a local architecture. The resulting code will be referred to as PTpicFP (for "Fully Parallel").

3.1.2 Implementation

Three different partitionings

It is essential to understand that PTpicFP alternately uses three partitionings of the grid. A partitioning is a division of the mesh such that each node is assigned to one, and only one, process.

- The principal partitioning, which determines the domain over which each process is responsible for moving the particles and computing statistical moments. An example with 24 processes on the 193 x 157 CHT mesh is provided in fig. 3-1.
- The field solver partitioning, used exclusively by the field solver. Each process is responsible for computing the potential on a portion of the mesh.
- The ParMETIS partitioning, used exclusively by the ParMETIS functions.

Figure 3-1: Principal partitioning generated by ParMETIS with 24 processes

The code would be slightly simpler if these three partitionings were identical. However, this would be detrimental to performance, because the relevant load metric (a measure of the computation time spent on each cell) is different in each case. For the principal partitioning, the relevant load is the number of superparticles present in the cell. Ions and electrons have a higher weight than neutrals, since they are moved at every iteration while neutrals are only moved every DT_N2E iterations. Thus, the load associated with a cell [k, j] is (a small sketch of this computation is given below):

\[ \mathrm{load}[k,j] = \mathrm{DT\_N2E} \cdot \left( N_{ions}[k,j] + N_{electrons}[k,j] \right) + N_{neutrals}[k,j] \]  (3.1)

This means, for instance, that a process assigned to the plume of the thruster, where the plasma is very diffuse, will manage several dozen times more cells than a process responsible for a dense plasma region within the channel. On the other hand, for the field solver and ParMETIS partitionings, the number of particles is not relevant and we simply want to assign the same number of cells to each process.
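The sketch below shows how the per-cell load of Eq. (3.1) might be assembled into an array of weights for the partitioner; the array names and the dtN2E parameter are illustrative stand-ins for PTpic's internal data structures.

```cpp
#include <cstdint>
#include <vector>

// Compute one load value per cell according to Eq. (3.1).
// Ions and electrons are weighted by DT_N2E because they are pushed every
// iteration, while neutrals are only pushed every DT_N2E iterations.
// Names and types are illustrative, not PTpic's.
std::vector<int64_t> computeCellLoads(const std::vector<int>& nIons,
                                      const std::vector<int>& nElectrons,
                                      const std::vector<int>& nNeutrals,
                                      int dtN2E)
{
    std::vector<int64_t> load(nIons.size());
    for (std::size_t c = 0; c < load.size(); ++c)
        load[c] = static_cast<int64_t>(dtN2E) * (nIons[c] + nElectrons[c]) + nNeutrals[c];

    // These loads would then be supplied as vertex weights to a graph partitioner
    // such as ParMETIS, which balances the total weight assigned to each process.
    return load;
}
```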
Statistics exchange between processes

The local architecture requires the development of tools and structures to handle the selective exchange of information between processes. The spatial specialization of each process drastically reduces the amount of statistics information that needs to be exchanged between processes, but it does not completely eliminate it. First, since the principal partitioning and the field solver partitioning are different, a process may very well have to solve for the potential at a location which belongs to its field solver domain, but not to its principal domain. To keep things simple, it was decided to retain the logic of the conventional PTpic: all processes know the value of the charge density everywhere.

All particles present in a given cell are managed by the process owning the lower-left corner of that cell. However, they contribute to the statistics at all four corners, which means that some exchange of statistics may be required if some of the corners are owned by other processes. An example is given in fig. 3-2. All particles in the cell (red disks) are managed by process 1, because it owns the lower-left node A. However, these particles contribute to statistics at nodes B, C and D, which are owned by processes 2, 3 and 4 respectively.

Figure 3-2: Particles in a cell whose corners are owned by different processes

The statistics calculation in PTpic is a two-stage process:

- Each time a particle is moved (in the "Particle pusher" step, see fig. 2-1), its contribution to the four neighboring nodes is added.
- Then, during the "Moments calculation" step, these raw statistics are divided by the number of particles in order to obtain the moments.

The solution adopted is as follows:

- During the particle pusher step, process 1 computes the statistics contribution of its particles on all nodes, including nodes owned by other processes.
- It then sends the raw statistics for nodes B, C and D to their owner processes.
- Each process normalizes the statistics for the nodes it owns.
- Processes 2, 3 and 4 then send back the normalized moments for nodes B, C and D to process 1. This is necessary because process 1 needs to know the correct value of the moments at all four corners of the cell (for instance to interpolate the neutral density at the location of an electron in order to compute its collision probability).

Obviously, most cells have their four corners managed by the same process, and thus do not require any external input.

Particle exchange

After each move, each process must check whether the particle is still in its area of responsibility. If it has moved to another process's domain, one could send it to the target process right away; however, it is likely more efficient to pack all particles to be moved into a single buffer and send them in one message. This requires creating a separate array of particles to be displaced, as well as an array to keep count of their number.

Non-blocking communications and communication-computation overlap

Care was taken to use non-blocking communications as much as possible. The basic concept of non-blocking communications (for more details, see the MPI reference [27], p. 79) consists in starting the transaction between two processes, leaving it running in the background, doing some useful computation in the meantime, and then closing the communication channel once we make sure all data has been received. This program design is called communication-computation overlap and is a very desirable feature in parallel computing.

To make it clearer, let us take the example of the statistics exchange described in section 3.1.2. The first thing process 1 does is to initiate the sending of the raw statistics for nodes B, C and D to processes 2, 3 and 4 via a non-blocking send. At the same time, these processes open a non-blocking receive resource. Then, each process normalizes the statistics for the nodes it owns which do not require external input (i.e. the vast majority). Once this is done, processes 2, 3 and 4 wait for the message sent by process 1. The key point is that, since they have been busy for some time with the statistics calculation, this message has hopefully already arrived, so that they do not waste any time waiting for it. They can then proceed with the calculation of statistics for nodes B, C and D and send back the normalized moments to process 1.
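The general non-blocking pattern just described (post the receive, start the send, do local work, then wait) is sketched below. The buffer layout and the two helper functions are hypothetical placeholders; only the MPI calls (MPI_Irecv, MPI_Isend, MPI_Waitall) are the real API.

```cpp
#include <mpi.h>
#include <vector>

// Stand-ins for the real local work; declared only, bodies omitted.
void normalizeInteriorNodes();
void accumulateReceivedStats(const std::vector<double>& recvBuf);

// Exchange raw boundary-node statistics with one neighbouring process while
// overlapping the communication with local computation. Illustrative sketch only.
void exchangeBoundaryStats(std::vector<double>& sendBuf,   // raw statistics for nodes owned by the neighbour
                           std::vector<double>& recvBuf,   // raw statistics the neighbour computed for our nodes
                           int neighbourRank, MPI_Comm comm)
{
    MPI_Request requests[2];

    // Post the receive first, then start the send; both calls return immediately.
    MPI_Irecv(recvBuf.data(), static_cast<int>(recvBuf.size()), MPI_DOUBLE,
              neighbourRank, /*tag=*/0, comm, &requests[0]);
    MPI_Isend(sendBuf.data(), static_cast<int>(sendBuf.size()), MPI_DOUBLE,
              neighbourRank, /*tag=*/0, comm, &requests[1]);

    // Useful local work while the messages are in flight:
    // normalize the nodes that need no external input (the vast majority).
    normalizeInteriorNodes();

    // Only now block until the exchanged data is actually available.
    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

    // Fold the neighbour's contribution into the boundary nodes.
    accumulateReceivedStats(recvBuf);
}
```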
Removal of unnecessary MPI function calls

The collision approval functions have been redesigned to remove the inter-process approval sequence, which is now unnecessary. A broader effort to reduce the number of calls to MPI functions was also implemented: since the ratio of latency time to total communication time is usually high for small messages, it is often beneficial to pack as many messages as possible into a single buffer and then send it, rather than sending them separately as was previously the case.

3.1.3 Performance assessment

The performance comparison between the FP and conventional versions of PTpic was done for 8, 24 and 48 processes, on two different grids. The first one (the "regular" grid) is the 193 x 157 mesh used in the CHT runs (see previous chapter); the other one is a 301 x 151 mesh that was also used for a few CHT simulations whose results are not included in this thesis. In addition, thanks to a feature of PTpic which allows the user to "seed" a part of the domain with electrons and ions, various plasma states (density and temperature) were investigated. The parameters in table 3.1 were identical for all simulations. The times reported hereafter are averaged over the first 1000 iterations.

Table 3.1: Common parameters for code speed experiments
  Anode flow rate: 4.0 sccm
  Anode voltage: 250 V
  Timestep: 5 x 10^-12 s
  γ: 15

Low density plasma, regular grid

In this experiment, a low density, low temperature plasma (see table 3.2) is seeded in the domain; the pusher component of the algorithm is thus lightly solicited. This configuration is representative of the beginning of a simulation. Since it can easily take one million iterations before the thruster "ignites", reducing the iteration time in this kind of situation is very important.

Table 3.2: Specific parameters for the low density plasma, regular grid test
  Ion temperature: 1.0 eV
  Electron temperature: 10.0 eV
  Approximate ion and electron superparticle count: 13,000

The average iteration time is given in fig. 3-3. In this configuration, the poor scalability of the conventional PTpic clearly appears: the computation time actually increases steeply with the number of processes. With this low number of particles, adding more processes does not make the particle pusher substantially faster, but it adds a considerable communication overhead due to the inefficient communication patterns. In contrast, the computation time for PTpicFP decreases significantly between 8 and 24 processes and then stays approximately constant, which is mostly due to the fact that the field solver currently implemented does not scale well above about 30 processes for this kind of mesh (see [12], p. 166).
Figure 3-3: Average iteration time for the low density plasma, regular grid test

High density plasma, regular grid

In this section, a much larger number of plasma particles is seeded. The electron energy is also increased in order to maximize the collisionality.

Table 3.3: Specific parameters for the high density plasma, regular grid test
    Ion temperature (eV)                                1.0
    Electron temperature (eV)                           20.0
    Approximate ion and electron superparticle count    1.3 x 10^6

The average iteration time is given in fig. 3-4. The difference in scalability between the two versions is not as striking as in the previous case. Because of the number of particles to manage, going from 24 to 48 processes helps, even for the conventional PTpic. However, the speedup achieved by the Fully Parallel version is much larger: 34.2% versus 13.9%.

Figure 3-4: Average iteration time for the high density plasma, regular grid test

High density plasma, large grid

Reducing γ will mean using much larger meshes. The number of plasma superparticles seeded and their temperature have the same values as in the previous section, but the mesh size is now 301x151, a 50% increase. The average iteration time is given in fig. 3-5.

Figure 3-5: Average iteration time for the high density plasma, large grid test

The advantage enjoyed by the Fully Parallel code is significantly larger than in the high density - regular grid situation. On 48 processes, it runs more than twice as fast as the conventional version, with a better speedup (35.6% against 24.5%). The fact that the statistics calculation is completely serial in the conventional version is obviously a big handicap on large meshes.

Occurrence of load imbalance during a run of the Fully Parallel version

If we allow the Fully Parallel code to run for more than 1,000 iterations in the high density - regular grid configuration, it appears that the time per iteration increases rapidly. To investigate this undesirable phenomenon, we measure the time spent by each process to step its particles forward (fig. 3-6).

Figure 3-6: Average time per iteration (high density, regular grid, 24 processes)

Examining the results, we isolate three interesting processes (5, 9 and 17). While their pusher times are initially close, they then drift rapidly apart. Since the code is only as fast as the slowest process, process 5 significantly slows down the whole computation. What happened is simple: the partitioning was computed for the initial, homogeneous plasma. Under the influence of the electric and magnetic fields, a fast redistribution takes place, which throws the work repartition between processes off balance. The solution obviously consists in repartitioning the mesh regularly. The whole repartitioning sequence, including particle redistribution and statistics calculation, takes about 0.7-0.8 s. Thus, even with a repartitioning every 1000 iterations, the time penalty is less than 1%. After the initial redistribution, it is likely that the need for such frequent repartitionings will disappear.
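The decision of when to repartition could also be driven by the measured imbalance rather than by a fixed interval. The sketch below (mpi4py, with a stand-in workload, an arbitrary check interval and a hypothetical threshold; PTpic simply repartitions on a fixed schedule) illustrates how the drift between the slowest process and the mean pusher time could be detected.

    # Hypothetical imbalance-triggered repartitioning check (mpi4py sketch; the
    # workload, interval and threshold are illustrative, not PTpic's values).
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    CHECK_EVERY = 250        # iterations between checks
    IMBALANCE_LIMIT = 1.5    # repartition if the slowest process is 50% above the mean

    def pusher_step():
        # stand-in for moving this process's particles; cost grows with rank
        return sum(range(10000 * (rank + 1)))

    for it in range(1, 1001):
        t0 = MPI.Wtime()
        pusher_step()
        elapsed = MPI.Wtime() - t0
        if it % CHECK_EVERY == 0:
            slowest = comm.allreduce(elapsed, op=MPI.MAX)
            mean = comm.allreduce(elapsed, op=MPI.SUM) / comm.Get_size()
            if rank == 0 and slowest > IMBALANCE_LIMIT * mean:
                print(f"iteration {it}: imbalance {slowest / mean:.2f}, repartition here")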
Breakdown of the computation time for the Fully Parallel code

Finally, timing the separate components of PTpicFP individually makes it possible to establish the time profile of an iteration. We do it for the low (fig. 3-7) and high (fig. 3-8) density configurations, on 24 processes and the regular mesh. As expected, in the low-density case, the field solver is the most time-consuming step, accounting for more than half of the global iteration time. In the high-density case, the particle pusher consumes more than 80% of the iteration time. In both cases, the time consumed by the particle exchange routine is negligible, which shows that the principles that guided the design of the inter-process exchanges were sound.

Figure 3-7: Computation time breakdown - low density
Figure 3-8: Computation time breakdown - high density

Summary

PTpicFP shows promising performance. It outperforms the previous version substantially, scales up well in both low and high density plasma situations (unlike the conventional PTpic, for which adding processes actually makes the code slower in a low density context), and handles large meshes better. The overhead associated with the periodic repartitionings and the exchange of particles between processes is very small, even in the worst case.

3.2 Implicit particle pusher

3.2.1 Stability limitations of the leapfrog algorithm

A critical component of any particle-in-cell code is the algorithm used to advance the particles (the "particle pusher"). All versions of PTpic up to now have used the leapfrog algorithm:

    x^{n+1} = x^n + v^{n+1/2} \Delta t
    v^{n+1/2} = v^{n-1/2} + F(x^n) \Delta t          (3.2)

where F is such that \ddot{x} = F(x). In spite of its relatively low order of accuracy (2), it is still the most common algorithm for PIC codes, due to its low computational cost (the force field F needs to be evaluated only once per iteration, in contrast with higher-order Runge-Kutta methods for instance) and its favorable energy conservation properties. PTpic uses a variant of the leapfrog known as the Boris algorithm, which further simplifies the calculation when F is the Lorentz force. A detailed description is provided in Appendix A.

The traditional PIC method based on the leapfrog algorithm mandates the use of a very small timestep, of the order of 10^-12 to 10^-11 s. This is several orders of magnitude smaller than the timescale of any phenomenon of interest happening in the thruster. With a more stable algorithm, it may be possible to use a much larger timestep without sacrificing the fidelity of the model.

In the field of ordinary differential equations (ODE), a common issue is the existence of so-called stiff differential equations, such as the following example (from [22], p. 727):

    u' = 998 u + 1998 v
    v' = -999 u - 1999 v          (3.3)
    u(0) = 1,  v(0) = 0

The solution to this system is:

    u = 2 e^{-x} - e^{-1000x}
    v = -e^{-x} + e^{-1000x}          (3.4)

As soon as one gets slightly away from x = 0, the e^{-1000x} term becomes completely negligible. But if we are using an explicit Euler scheme (y_{n+1} = y_n + h y'_n), we are required to use a timestep h smaller than 2/1000 = 0.002, otherwise numerical instability yields a result completely different from the exact solution. Thus, with the explicit Euler scheme, we are forced to use a timestep much smaller than the characteristic variation time of the exact solution.
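The stated bound is easy to verify numerically. The sketch below integrates system (3.3) with forward Euler in plain NumPy (the endpoint x = 1 is chosen arbitrarily): just below h = 0.002 the result matches the exact solution (3.4); just above it, the e^{-1000x} mode blows up.

    # Forward Euler on the stiff system (3.3): stable for h < 0.002, divergent above.
    import numpy as np

    A = np.array([[998.0, 1998.0],
                  [-999.0, -1999.0]])
    y0 = np.array([1.0, 0.0])

    def forward_euler(h, x_end=1.0):
        y = y0.copy()
        for _ in range(int(x_end / h)):
            y = y + h * (A @ y)
        return y

    exact = np.array([2 * np.exp(-1.0), -np.exp(-1.0)])   # solution (3.4) at x = 1
    for h in (0.0019, 0.0021):
        print(f"h = {h}: numerical {forward_euler(h)}, exact {exact}")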
This is exactly the same problem we face with PIC codes: we have to resolve the cyclotron motion of the electrons and the plasma oscillations to avoid numerical instability, even if we have little interest in them. In the field of ODEs, the solution to stiffness consists in using an implicit method, where the derivative is evaluated at the new location y_{n+1} instead of y_n: y_{n+1} = y_n + h y'_{n+1}. Thus it appears desirable to make PTpic implicit in order to allow for a larger timestep (at constant γ) or a smaller, more realistic γ (at constant timestep).

3.2.2 A framework for implicit PIC codes

There are two knobs on which one can act to make a leapfrog-based PIC method implicit:

* The location \tilde{x}^n where the electric and the magnetic field are evaluated to calculate the acceleration.
* The way we calculate \tilde{E}^n, since it is itself time-dependent.

A generic "implicit leapfrog" method thus reads:

    x^{n+1} = x^n + v^{n+1/2} \Delta t
    v^{n+1/2} = v^{n-1/2} + \frac{q}{m} \left( \tilde{E}^n(\tilde{x}^n) + \frac{v^{n+1/2} + v^{n-1/2}}{2} \times B \right) \Delta t          (3.5)

where \tilde{x}^n and \tilde{E}^n are to be defined. This formulation remains similar to the leapfrog Boris method and thus does not encompass all possible implicit PIC schemes. These could be based on a higher-order method, such as Runge-Kutta. However, we will limit our investigation to schemes complying with 3.5.

In the straightforward, explicit leapfrog-based PIC (used in all versions of PTpic up to now), we have \tilde{x}^n = x^n and \tilde{E}^n = E^n, where E^n is calculated from the charge density at t^n, \rho^n:

    \nabla \cdot E^n = \rho^n / (\gamma^2 \epsilon_0)          (3.6)

3.2.3 Semi-implicit field solver

Cho recently reported very good results using a "semi-implicit field solver" ([6]). The core of this method consists in using the particle densities as well as the current densities in order to predict the charge distribution at t^{n+1}:

    \nabla \cdot E^{n+1} = \frac{1}{1 + (\omega_{pe} \Delta t)^2 / 2} \, \frac{1}{\epsilon_0} \left\{ \rho^n - \Delta t \, \nabla \cdot \left[ j_i^n + (1 - \nu_{en} \Delta t) \, j_e^n + \frac{e \Delta t}{m_e} \, j_e^n \times B \right] \right\}          (3.7)

where \nu_{en} is the electron-neutral scattering collision frequency. One can recognize on the right the ion and electron advection terms (including a collision attenuation term for the electrons) as well as the Hall current. It does not, however, include any pressure term.

This modification has been implemented in PTpic and tested on the CHT but has not delivered a significant improvement in stability. Several reasons may explain this:

* Cho used the 4th order Runge-Kutta scheme to step the particles forward, while the PTpic semi-implicit implementation retained the leapfrog form of eq. 3.5.
* The thruster studied by Cho was the SPT-100, a conventional Hall thruster with a lower magnetic field and plasma density than the CHT, and thus less prone to numerical instabilities.
* Finally, instead of making ions and neutrals lighter as in PTpic, Cho increased the electron mass instead, which decreases the plasma and cyclotron frequencies. But at the same time, he retained a small timestep. This might explain the increase in stability.

Another issue is that formula 3.7 is derived from a cold plasma model: it does not include any pressure or temperature term. In the author's experience, the thermal velocity of the electrons is not negligible with respect to their bulk velocity. Moreover, given that a prominent numerical instability is linked to the ratio of the Debye length to the grid spacing, it seems reasonable to expect that any scheme able to address this instability should include a temperature term (since the Debye length itself depends on the electron temperature).
3.2.4 Particle Predictor-Corrector

The other implicit scheme investigated by the author, called the Particle Predictor-Corrector (PPC), works at the particle level rather than with moments. It can be summarized as follows (\theta \in [0, 1]):

    \tilde{v}^{n+1/2} = v^{n-1/2} + \frac{q}{m} \left( E^n(x^n) + \frac{\tilde{v}^{n+1/2} + v^{n-1/2}}{2} \times B(x^n) \right) \Delta t
    \tilde{x}^{n+1} = x^n + \tilde{v}^{n+1/2} \Delta t
    \nabla \cdot E^\theta = \left[ \theta \, \tilde{\rho}^{n+1} + (1 - \theta) \, \rho^n \right] / (\gamma^2 \epsilon_0)          (3.8)
    v^{n+1/2} = v^{n-1/2} + \frac{q}{m} \left( E^\theta(x^n) + \frac{v^{n+1/2} + v^{n-1/2}}{2} \times B(x^n) \right) \Delta t
    x^{n+1} = x^n + v^{n+1/2} \Delta t

In this sequence:

* Particles are first advanced to a position \tilde{x}^{n+1} with the classic, explicit leapfrog ("simulated push").
* The charge density \tilde{\rho}^{n+1} created by this new distribution is computed. Then the composite charge density \theta \tilde{\rho}^{n+1} + (1 - \theta) \rho^n is calculated, as well as the associated composite electric field E^\theta.
* Particles are then "stepped back" to their initial position and advanced with the standard leapfrog, but this time using the composite field E^\theta (final push).

The parameter θ sets the "degree of implicitness" of the algorithm. A value of 0, for instance, corresponds to the classic, explicit leapfrog. Unlike the semi-implicit method, we do not make any assumption about the nature of the plasma (cold or warm) in order to obtain \tilde{\rho}^{n+1}, and we fully account for the thermal motion of the electrons. Obviously, the computational cost is higher, since we have to move the particles and solve for the electric field twice at each iteration. However, this increase can be kept reasonable by doing the simulated push for the electrons only (since ions are much slower and are not subject to numerical instabilities at the timesteps we use). Also, collisions are not taken into account during the simulated push. Thus, the additional cost of the PPC mostly consists in the extra call to the field solver; the extra electron loop executes fast enough to be of no concern.

The PPC was found to have a significant positive effect on stability. It was tested on the CHT 193x157 mesh, with the parameters reported in table 3.4.

Table 3.4: Parameters for the Particle Predictor-Corrector assessment
    Anode flow rate (sccm)                      4
    Anode voltage (V)                           250
    γ                                           4
    Super-particle size (ions and electrons)    1000
    Timestep (s)                                5 x 10^-12

Fig. 3-9 represents the number of electron superparticles in four cases: explicit leapfrog, PPC θ = 0.25, PPC θ = 0.5, and PPC θ = 1. The electron count was chosen because numerical instability increases the electron speed and temperature, thus leading to a large, unphysical increase in ionization frequency and secondary electron emission, which results in an exponential growth of the number of electrons. Indeed, we can see that the electron count diverges fast in the explicit leapfrog case, while it remains bounded for the implicit cases.

Figure 3-9: Number of electron super-particles in various PPC configurations

The extinction of the discharge after the initial ionization surge was found to be caused by an accumulation of positive charge on the dielectric surfaces, which gives rise to an unphysically high potential in the chamber, expelling all ions and effectively ending the discharge. This problem is likely unrelated to the PPC, but should be solved in order to allow for simulations at low γ.
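To make the structure of the scheme concrete, here is a minimal 1-D electrostatic sketch of the predictor/corrector sequence: a simulated push with E^n, a field solve on the blended density, and a final push with the blended field. It is a toy periodic-domain model (electrons over a fixed neutralizing background, no magnetic field, no collisions, arbitrary normalized parameters) that mirrors only the structure of the PPC, not PTpic's physics or geometry.

    # Toy 1-D electrostatic illustration of the Particle Predictor-Corrector structure.
    import numpy as np

    np.random.seed(0)
    L, Nc, Np, dt, theta = 1.0, 64, 20000, 0.05, 0.5
    dx = L / Nc
    x = np.random.rand(Np) * L             # electron positions
    v = np.random.randn(Np) * 0.01         # electron velocities (v^{n-1/2})
    qm = -1.0                              # charge-to-mass ratio in normalized units

    def deposit(xp):
        """Nearest-grid-point charge density: electrons over a uniform ion background."""
        idx = (xp / dx).astype(int) % Nc
        n_e = np.bincount(idx, minlength=Nc) / (Np / Nc)
        return 1.0 - n_e

    def solve_field(rho):
        """Periodic Poisson solve via FFT: dE/dx = rho (normalized units)."""
        k = 2 * np.pi * np.fft.fftfreq(Nc, d=dx)
        rho_k = np.fft.fft(rho)
        E_k = np.zeros_like(rho_k)
        E_k[1:] = rho_k[1:] / (1j * k[1:])
        return np.fft.ifft(E_k).real

    def gather(E, xp):
        return E[(xp / dx).astype(int) % Nc]

    rho_n = deposit(x)
    for step in range(100):
        E_n = solve_field(rho_n)
        # 1. simulated push with the explicit field
        v_pred = v + qm * gather(E_n, x) * dt
        x_pred = (x + v_pred * dt) % L
        # 2. blended density and composite field
        rho_theta = theta * deposit(x_pred) + (1 - theta) * rho_n
        E_theta = solve_field(rho_theta)
        # 3. final push from the original positions with the composite field
        v = v + qm * gather(E_theta, x) * dt
        x = (x + v * dt) % L
        rho_n = deposit(x)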
Chapter 4

Experimental Characterization of the Cylindrical Cusped-Field Thruster

4.1 Cylindrical Cusped-Field Thruster overview

4.1.1 Background

The Cylindrical Cusped-Field Thruster (CCFT) builds on the experience gained by the SPL with the Diverging Cusped-Field Thruster. Designed in 2008 by Daniel Courtney ([9]), this device has been extensively tested at MIT, as well as in the Air Force Research Laboratory facility at Edwards AFB, California. The key features of the DCFT are the absence of a centerpiece, the cusped magnetic field created by permanent magnets rather than electromagnets, and obviously its unusual divergent shape.

The cusped magnetic field is a relatively recent concept in Hall thruster engineering; it was first incorporated into the HEMP thruster ([14]). Unlike the uniform radial magnetic field found in traditional Hall thrusters (see fig. 1-1), cusped-field thrusters feature regions of strong magnetic field gradient, called cusps, between which electrons are axially confined, a process known as magnetic bottling ([4], p. 77). The strong magnetic field gradient from the center axis to the wall also keeps most electrons (and most ions, since they are electrostatically tied to the electrons) away from the walls, dramatically reducing erosion, except at the cusps. The schematic structure of the magnetic field is given in fig. 4-1.

Figure 4-1: Schematic DCFT diagram (from [9], p. 31)

Testing showed good performance compared to other devices from the same category (see table 4.1), but several drawbacks were identified. First, the wide plume divergence and the hollow conical plume structure (see [9], p. 80) decrease the global efficiency significantly. Also, the DCFT operates alternately in a high anode current (HC) and a low anode current (LC) mode, and transitions are rather difficult to predict ([9], p. 62). The high current mode has a lower efficiency and features large oscillations, first observed by Matlock ([18], p. 162), which may increase the erosion of the boron nitride lining. Thus we want to keep the thruster operating in a single, non-oscillatory mode.

Table 4.1: DCFT performance at the nominal operating point (from [20], p. 30)
    Anode potential (V)            550
    Anode power (W)                242
    Xenon mass flow rate (sccm)    8.5
    Anode efficiency               44%
    Specific impulse (s)           1640

The CCFT was built to address these deficiencies; a detailed account of the design process can be found in [20]. It features a cylindrical discharge channel (37 mm dia. x 51.5 mm length) and an exit separatrix perpendicular to the center axis. These features have been incorporated mainly in order to reduce the plume divergence angle. Additionally, the cylindrical shape keeps the neutral density higher than in the DCFT, thus increasing the ionization probability.

4.2 Experimental Setup

4.2.1 Vacuum chamber

All tests presented here were performed in Astrovac, SPL's largest vacuum chamber (1.6 m diameter x 2.8 m length). Pumping is provided by two CTI-Cryogenics cryopumps, one OB-400 and one CT-10, both cooled down by a CTI-Cryogenics 9600 compressor. Their combined pumping speed is rated at 7500 L/s for Argon. Pressure during high vacuum operation is manually recorded with an Instrutech IGM-401 Hornet hot cathode vacuum gauge. Unless otherwise indicated, all pressures subsequently mentioned were obtained with the gauge configured in Xenon mode.
The background pressure with the standard setup installed, no gas load and all valves closed is between 0.5 and 1 µTorr. Grade 5.0 (i.e. 99.999% pure) Xenon is used for testing. Two OMEGA FMA6502-ST-XE flow controllers regulate the supply of gas to the anode and the cathode, respectively.

The neutralizing cathode is a Busek BHT-1500, which can deliver an emission current of up to 3 A. It is a Barium Oxide impregnated cathode, and thus subject to poisoning. To prevent this, a Restek 20600 high capacity oxygen trap and a Restek 22010 indicating oxygen trap are fitted to the cathode gas supply line; together, they are rated to reduce the oxygen concentration to 0.1 ppm.

Figure 4-2: Simulated CCFT magnetic field (from [20], p. 36)

4.2.2 Electrical setup

The basic electrical setup is presented in fig. 4-3. The main elements are the anode (A), the cathode (C), the cathode heater (H) and the cathode keeper electrode (K). The anode and the keeper were powered by two computer-controlled Agilent N5722A DC power supplies, rated for a maximum voltage of 600 V and a maximum current of 2.6 A. The heater was supplied by an Agilent HPJA146OPS source, controlled manually.

Figure 4-3: Sketch of the electrical setup used for the CCFT experiments

4.2.3 Stage

Previous experiments conducted in Astrovac used a 2-axis stage system to move the probes in the chamber, and a rotary stage to keep them pointed towards the thruster (see [18], p. 98). Although this setup worked well, it was quite bulky and took a long time to install. It was thus decided to build a much more compact and lightweight stage. This stage provides azimuthal and radial mobility and is thus well suited to azimuthal scans of the plume, which are the primary diagnostic for Hall thrusters.

The baseplate, the rail, the chariot and the instrument-bearing mast were machined from 6061 aluminum in order to minimize outgassing. Both axes (azimuth and radius) are powered by 6 RPM DC motors; position feedback is provided by a rotary and a linear encoder. All the electronics are commanded through an Arduino board. Many components (motors, encoders, timing belt for the linear axis, ball bearings for the chariot) are not certified for vacuum operation, and there were initially concerns about whether the setup would outgas too much and contaminate the chamber. These concerns proved unfounded: the background pressure (below 1 µTorr) remained small with respect to the pressure elevation caused by the injected Xenon. This new stage has proved itself reliable and well suited for azimuthal-scan based measurements. The only caveat is the vulnerability of the electronics to sparks.

Figure 4-4: Stage system and thruster stand

4.2.4 Measurement devices

A Faraday cup and a Retarding Potential Analyzer (RPA) were built for these experiments. The Faraday cup (fig. 4-5) features a 9.91 mm diameter stainless steel collector plate, surrounded by a stainless steel guard ring. Both are biased to a potential of about -27 V during operation in order to repel electrons.

The RPA (see fig. 4-6) is made of an aluminum cylinder (outer diameter = 1") onto which are screwed a backplate and a frontplate, both made of stainless steel. This cylinder houses a stack of grids (made of a stainless steel mesh spot-welded on a steel washer) separated by Macor washers (6). From front to back, the components are: floating front grid (1), electron repelling grid (2), ion repelling grid (3), secondary electron repelling grid (4) and collector plate (5).
Insulation between the stack and the hollow cylinder is provided by a layer of Kapton tape. During operation, the two electron repelling grids are held at the same negative potential of about -27 V, while the ion repelling grid potential is progressively increased. Since only ions with an energy higher than this potential can reach the collector, plotting the collected current as a function of the repelling voltage gives access to the ion energy distribution. A typical RPA profile, obtained during the tests reported in section 4.3.5, is provided in fig. 4-7.

Figure 4-5: The Faraday cup
Figure 4-6: Cutaway drawing of the RPA
Figure 4-7: Collected current as a function of the repelling voltage - Configuration A, 0°

4.3 Results

4.3.1 Anode overheating

A salient feature noted during the operation of the CCFT is the very high temperature reached by the anode assembly. This assembly consists of a graphite cap screwed onto a metal threaded rod. During early testing, the temperature was high enough to cause the graphite cap to glow red and the stainless steel threaded rod to shear off. Pictures (figs. 4-8 and 4-9) clearly show evidence of melting at the location where the steel sheared off. These observations are consistent with an anode temperature in excess of 800°C.

Figure 4-8: Graphite cap showing some fragments of molten steel
Figure 4-9: Stainless steel rod showing evidence of melting (at the right tip)
Figure 4-10: Red glow emitted by the anode assembly during operation

All the stainless steel parts in the anode assembly were subsequently replaced with molybdenum, which solved the rupture issue. However, the red-orange glow (fig. 4-10) was observed during the whole test campaign. Qualitative observations based on the intensity of the glow show that the temperature is strongly correlated with the anode voltage, but much less with the anode current or anode power level. The most likely explanation for these unusually high temperatures is the magnetic field, which funnels a large number of electrons into a narrow channel around the center axis. It is worth emphasizing that it is the impact of these electrons which is the cause, and not ohmic heating, which is orders of magnitude too small to explain this effect.

4.3.2 Regime transitions

Over the course of the experiments, three distinct plume structures were observed.

* The first one, the most common, called the normal shape (fig. 4-11), is characterized by a globular, diffuse plume extending about 20 centimeters downstream of the exit plane.
* The second one, dubbed the bag shape (fig. 4-12), is somewhat similar to the normal plume, but features a distinct separation between a bright part centered on the exit orifice and a diffuse peripheral plasma around it.
* The third one has been called the jet regime (fig. 4-13). It is characterized by a straight, solid plume coming out of the thruster, again surrounded by a much dimmer plasma. This regime looks promising from the point of view of beam collimation; unfortunately it is very elusive, and sometimes moving the probes in the chamber is enough to cause a transition back to the normal or the bag regime. Consequently, the only data available for this mode consists of a few anode current-voltage points.

Figure 4-11: Normal plume
Figure 4-12: Bag plume
Figure 4-13: Jet plume

The factors governing the transition from one mode to another have not been conclusively identified.
The thruster almost always starts in the normal mode; transition to the "bag" mode, when it happens, usually takes place after several dozen minutes of operation. Once this transition has happened, the "bag" mode is rather stable and will subsist even if the thruster is turned off for a few seconds. This suggests that the transition may be due to a "warm-up" of the device. Temperature can change the operation of the thruster through two main avenues: first, by affecting the velocity, and thus the travel times, of the xenon atoms; second, by modifying the magnetic field (the Samarium-Cobalt S3212 magnets used in the CCFT are reported to have temperature coefficients of -0.03%/°C for induction and -0.17%/°C for coercivity [1]). It should be possible to fit some small magnetic field sensors between the magnets and the boron nitride lining in order to verify this hypothesis.

Since three modes have been observed, and the magnetic field has three cusps, the mode transitions may correspond to a displacement of the ionization region from one cusp to another: the well-focused jet beam would correspond to an ionization region located at the cusp closest to the anode, while the poorly collimated bag mode would come from the external cusp.

4.3.3 Voltage-Current characteristics

The simplest measurement that can be made on a Hall thruster is the anode voltage-current characteristic curve. The configurations investigated are listed in table 4.2.

Table 4.2: Configurations for the voltage-current characteristics
    Case    ṁa (sccm Xe)    ṁc (sccm Xe)    Pb (µTorr, corrected for Xe)    Ik (A)
    A       4.0             2.1             23.8 - 27.0                     0.5
    B       6.0             2.1             32.4                            0.5
    C       2.1             2.1             15.6 - 16.5                     0.5
    D       1.5             2.1             14.2 - 14.6                     0.5
    E       1.0             2.1             12.5 - 12.7                     0.5
    F       0.5             2.1             10.1 - 10.2                     0.5
    G       2.1             1.0             14.2 - 14.9                     0.5
    H       2.0             1.0             12.0 - 12.1                     0.5
    I       1.0             1.0             7.76                            0.5
    J       2.0             0.3             11.8 - 12.1                     0.3
    K       3.0             0.3             15                              0.3
    ṁa: anode mass flow rate; ṁc: cathode mass flow rate; Pb: background pressure; Ik: cathode keeper current.

The anode current appears to be rather weakly dependent on the anode voltage (all other things being equal). The correlation with the anode flow rate is much stronger. This suggests that the CCFT achieves, at least at anode flow rates below 4 sccm, a near-complete ionization of the propellant, so that the only way to get more current is to provide more gas. This hypothesis is reinforced by the fact that the levels of anode current measured are close to, or even exceed, what would result from full double ionization of the propellant, as shown in table 4.3.

Figure 4-14: Anode current-voltage characteristics - I
Figure 4-15: Anode current-voltage characteristics - II

Table 4.3: Comparison of measured anode current to the anode current expected from full single ionization and full double ionization of the propellant
    Case    Va (V)    ṁa (sccm Xe)    Ia (A)    Isingle (A)    Idouble (A)
    A       150       4.0             0.73      0.289          0.578
    B       100       6.0             1.113     0.433          0.867
    H       300       2.0             0.309     0.144          0.289
    K       300       3.0             0.504     0.217          0.433
    Va: anode voltage; Ia: anode current; Isingle: current corresponding to the full single ionization of the anode flow; Idouble: current corresponding to the full double ionization of the anode flow.
    Note: the full single ionization of 1 sccm Xe corresponds to 0.0722399 A ([13], p. 464).
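The Isingle and Idouble columns follow directly from the stated conversion factor; the short check below reproduces them from the flow rates and measured currents of table 4.3 (all numbers are taken from the table, nothing new is introduced).

    # Reproducing the Isingle / Idouble columns of Table 4.3.
    A_PER_SCCM_XE = 0.0722399    # single-ionization current per sccm of xenon

    cases = {                    # case: (anode flow in sccm, measured anode current in A)
        "A": (4.0, 0.730),
        "B": (6.0, 1.113),
        "H": (2.0, 0.309),
        "K": (3.0, 0.504),
    }

    for name, (flow, i_meas) in cases.items():
        i_single = flow * A_PER_SCCM_XE
        i_double = 2 * i_single
        print(f"{name}: measured {i_meas:.3f} A, single {i_single:.3f} A, "
              f"double {i_double:.3f} A, ratio to double {i_meas / i_double:.2f}")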
However, the anode current is likely boosted by the backflow of Xenon into the thruster caused by the limitations of the pumping system. Testing in a facility with a higher pumping speed would be appropriate.

4.3.4 Faraday Cup measurements

Several scans of the plume were performed with the Faraday cup described in section 4.2.4, in order to ascertain the degree of collimation of the beam; obtaining a solid plume was one of the main goals of the CCFT project ([20], p. 31). The probe was moved on a circular arc from -90° to +90° about the center axis. An interpolating current density function J_i(θ) was then fitted to the data points, and this interpolant was integrated to compute the beam current I_b and its component along the thrust axis I_bz:

    I_b = 2\pi R^2 \int_0^{\pi/2} J_i(\theta) \sin\theta \, d\theta          (4.1)

    I_{bz} = 2\pi R^2 \int_0^{\pi/2} J_i(\theta) \cos\theta \sin\theta \, d\theta          (4.2)

The ratio I_bz / I_b can be converted to a divergence angle θ_div via formula (4.3), which quantifies the degree of collimation of the beam, a value of 0° corresponding to the ideal case of a beam entirely directed along the main thrust axis:

    \cos(\theta_{div}) = I_{bz} / I_b          (4.3)

Table 4.4: Configurations for the Faraday cup scans
    Case    ṁa (sccm Xe)    Va (V)    Ia (A)    Ik (A)    Vk (V)    Pb (µTorr, Xe corr.)    ṁc (sccm Xe)    R (cm)    Mode
    1       2.0             250       0.282     0.5       19.3      12.6                    1.0             26.0      Normal
    2       2.0             325       0.319     0.5       19.3      11.9                    1.0             26.0      Normal
    3       2.0             250       0.265     0.5       23.7      11.4 - 11.9             1.0             12.9      Normal
    4       2.0             250       0.298     0.5       19.7      12.2                    1.0             12.9      Bag
    5       2.0             250       0.258     0.5       19.7      11.3 - 11.6             1.0             12.9      Bag
    Vk: cathode keeper voltage; R: radius of the scan arc.

Figure 4-16: Plume current density - Case 1
Figure 4-17: Plume current density - Case 2
Figure 4-18: Plume current density - Case 3
Figure 4-19: Plume current density - Case 4
Figure 4-20: Plume current density - Case 5

Table 4.5: Integrated results for Faraday cup scans
    Case    I_b (A)      I_bz (A)    θ_div (°)    I_b / I_a (-)
    1       0.2637       0.1698      49.9         0.935
    2       0.2309       0.1382      53.2         0.724
    3       0.2194       0.1413      49.9         0.828
    4       0.1655       0.1010      52.4         0.555
    5       0.142805     0.0869      52.5         0.543

These results show that the CCFT never exhibits the hollow conical plume structure observed for the DCFT in high-current mode. However, the collimation of the beam is rather disappointing, with a characteristic angle on the order of 50°. We also notice that the current utilization efficiency (I_b / I_a) drops sharply when the thruster goes into the "bag" mode. However, contrary to what one may infer from visual inspection, the degree of collimation is similar in both cases. The poor current utilization efficiency of the bag mode means that it should probably be avoided in the interest of performance.
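A compact version of this processing is sketched below: fit an interpolant to the measured J(θ), then evaluate eqs. 4.1 to 4.3. The angular profile and its amplitude are invented for illustration, and only R = 26 cm is borrowed from table 4.4; the actual scan data would be substituted for them.

    # Beam current, axial beam current and divergence angle from a Faraday cup scan.
    import numpy as np
    from scipy.interpolate import interp1d
    from scipy.integrate import quad

    theta_deg = np.arange(-90, 91, 10)                       # scan angles
    J = 1.2e-3 * np.exp(-(theta_deg / 45.0) ** 2)            # made-up current density profile
    R = 0.26                                                 # scan radius (m)

    Ji = interp1d(np.radians(theta_deg), J, kind="cubic")    # interpolant J_i(theta)

    Ib = 2 * np.pi * R**2 * quad(lambda t: Ji(t) * np.sin(t), 0, np.pi / 2)[0]
    Ibz = 2 * np.pi * R**2 * quad(lambda t: Ji(t) * np.cos(t) * np.sin(t), 0, np.pi / 2)[0]
    theta_div = np.degrees(np.arccos(Ibz / Ib))              # divergence angle, eq. 4.3

    print(f"Ib = {Ib:.4e}, Ibz = {Ibz:.4e}, divergence = {theta_div:.1f} deg")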
4.3.5 Retarding Potential Analyzer Measurements

RPA scans were taken for three different operating points, as reported in table 4.6, and at different off-axis angles.

Table 4.6: Configurations for the RPA measurements
    Case    ṁa (sccm Xe)    Va (V)    Ia (A)    Ik (A)    Vk (V)    ṁc (sccm Xe)    Pb (µTorr, Xe corr.)    Mode
    A       2.0             250       0.311     0.5       17.7      1.0             17.9                    Normal
    B       2.0             300       0.281     0.5       25.5      1.0             13.6                    Normal
    C       2.0             200       0.422     0.5       23.2      1.0             17.2                    Normal

An interpolating function was fitted to the raw data with the "smoothing spline" option of the MATLAB Curve Fitting Toolbox; this interpolant was then differentiated, and the resulting dI/dV curve was normalized by its maximum value to enable easy comparison across scans.

The normalized derivative curves show the expected aspect. Ion energies are clustered around a value equal to or slightly lower than the anode potential in all cases. The Full Width at Mid-Height (FWMH) is usually around 25 V, and appears to be larger for run A (one must keep in mind that the FWMH is not the ion temperature).

Figure 4-21: Normalized dI/dV at different angles - case A
Figure 4-22: Normalized dI/dV at different angles - case B
Figure 4-23: Normalized dI/dV at different angles - case C

The literature reports that RPA scans sometimes show the existence of two distinct ion populations: a high-energy population clustered around the anode voltage, and a lower energy population created by charge-exchange collisions (see [13], p. 400). The energy of the charge-exchange population decreases with the off-axis angle. Given the diffuse appearance of the plume in the normal mode (confirmed by the Faraday cup scans), one may expect to find a lot of low-energy charge-exchange ions, which are created far from the anode and are expelled in a much more isotropic pattern than the axially-directed high-energy ions. However, the RPA scans clearly show that this is not the case: ions with an energy close to the anode voltage dominate at all angles and in all configurations. Thus, the causes of the CCFT plume divergence are yet to be determined.
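A Python analogue of this processing (the thesis used MATLAB's smoothing spline) is sketched below on synthetic, made-up I-V data. The sign flip in the derivative reflects the fact that the collected current decreases with the repelling potential, so the plotted quantity is the magnitude of dI/dV.

    # Smoothing-spline differentiation of an RPA trace (synthetic data for illustration).
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    V = np.linspace(0, 300, 61)                                    # ion repelling potential (V)
    I = 2.5 / (1 + np.exp((V - 230) / 12)) + 0.02 * np.random.randn(V.size)   # collected current

    spline = UnivariateSpline(V, I, s=0.05)      # smoothing factor chosen by eye
    dIdV = -spline.derivative()(V)               # ion energy distribution ~ -dI/dV
    dIdV_norm = dIdV / dIdV.max()                # normalize by the maximum value

    peak_energy = V[np.argmax(dIdV_norm)]        # most probable ion energy (eV per charge)
    print(f"peak near {peak_energy:.0f} V")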
Chapter 5

Conclusion

5.1 Future work recommendations for PTpic

The improvements that can be made to PTpic broadly fall into two categories: those related to performance (doing the same things faster), and those related to the physical model being implemented (representing the physics of the thruster more accurately). The parallelization redesign work presented in this thesis is a major improvement from the point of view of performance. However, in the author's opinion, there are still several avenues to make PTpic faster. They are listed in order of importance.

5.1.1 Electric field solver

The experiments performed in section 3.1.3 show that the electric field solver is now the main obstacle to scalability for runs with a low number of plasma particles (see fig. 3-3). This is problematic since a simulation usually runs in this low density plasma regime for several hundred thousand iterations before the thruster "ignites". Thus, removing this scalability limitation should be a priority of the future work on PTpicFP. In order to understand how we may achieve this, some insight into the internal operation of the field solver is required.

The purpose of the field solver is to solve a large linear system (5.1) at each iteration:

    A \phi = q          (5.1)

Figure 5-1: Mesh numbering scheme used by PTpic up to now

With the CHT 193x157 mesh for instance, there are N = 21930 potential unknowns (this number is lower than the number of nodes since the anode and the thruster body are held at a fixed potential), meaning that A is a 21930 x 21930 matrix. We take advantage of the fact that A is the same for all iterations by computing an LU factorization A = LU during the initialization. Since L and U are triangular, solving eq. 5.1 is much easier: it only requires a double backsubstitution. In the general case, L and U together have of the order of N^2 non-zero entries (the coefficients below and above the diagonal, respectively). This number of non-zero entries in L and U is called the fill-in. The number of operations required to perform the backsubstitution is proportional to the fill-in. Thus, minimizing it is critical in order to improve the performance of the field solver.

Fortunately, A is a banded matrix, meaning that its coefficients are zero except in a narrow diagonal band. This results from two things: the mesh node numbering, and the finite difference scheme we use to discretize Poisson's equation. In the current version of PTpic, the node numbering starts at the lower-left corner, then goes column after column, from bottom to top (see fig. 5-1). Since we use a 9-point discretization of the Laplacian ([12], p. 143), a given node can only interact with its immediate neighbors. Thus, the index number of any node involved in the expression for the discretized Laplacian at node n is within [n - (Nj + 1); n + (Nj + 1)], Nj being the number of lines. For instance, in the example given in fig. 5-1, we have Nj = 6. Then, the nodes involved in the discretized Laplacian at n = 21 have an index number between 14 and 28. Thus, A is banded, with a bandwidth w = Nj + 1. This means that any coefficient which is more than w away from the main diagonal is zero. This banded structure drastically reduces the fill-in of the LU factors: it is now of the order of N w, roughly N^{3/2}, instead of N^2. For N = 21930, this reduces the workload by a factor of roughly sqrt(N) = 148.

This is impressive, but it is possible to do better. With a clever numbering of the mesh nodes, it is actually possible to bring the fill-in down to N log2(N). One algorithm that can be used to compute this optimal numbering scheme (or fill-in reducing ordering) is the nested-dissection algorithm. Using a fill-in reducing ordering can thus reduce the operations count of the field solver by a further factor of about sqrt(N)/log2(N); for N = 21930, this is a factor of more than 10. This does not mean that the field solver will be 10 times faster; in modern computers with a very high clock frequency, the speed-limiting part is often the loading of data into memory rather than the actual calculation. However, it is certainly worth trying. Implementation would be easy since ParMETIS provides a function to calculate these fill-in reducing orderings.

It is also important that PTpic users keep themselves informed of the state of the art in numerical linear algebra. Right now we are using ScaLAPACK to handle the LU factorization and the backsubstitutions, but it may very well be superseded by another package in the near future.
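The effect of the ordering is easy to observe with any sparse direct solver. The sketch below uses SciPy's SuperLU interface on a 5-point 2-D Poisson matrix as a stand-in (PTpic uses a 9-point stencil and ScaLAPACK, and a production implementation would take its ordering from ParMETIS); it simply compares the LU fill-in obtained with the natural, banded ordering against a fill-in reducing one.

    # LU fill-in with the natural (banded) ordering vs. a fill-in reducing ordering.
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    n = 100                                   # grid is n x n, so N = n*n unknowns
    I = sp.identity(n)
    T = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n))
    A = (sp.kron(I, T) + sp.kron(sp.diags([-1, -1], [-1, 1], shape=(n, n)), I)).tocsc()

    for ordering in ("NATURAL", "COLAMD"):    # natural ordering vs. fill-in reducing
        lu = splu(A, permc_spec=ordering)
        fill = lu.L.nnz + lu.U.nnz
        print(f"{ordering:8s}: fill-in = {fill}")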
5.1.2 Improved particle pushers

Further research on implicit particle pushers should also be a priority; being able to relax the timestep and grid size stability limits would be a considerable advance. Two interesting candidates have been identified and should be investigated in detail. In both cases, the idea is again to predict the field at the next iteration, E^{n+1}, then advance the particles with a linear combination θE^{n+1} + (1 - θ)E^n.

The first one is the Implicit Moments Method (IMM), developed at Los Alamos National Laboratory by Brackbill and Forslund ([5]). Like Cho's semi-implicit method, it uses particle moments to predict the electric field. The difference is that the IMM takes the pressure into account, and that the predicted electric field and particle moments are iteratively refined until they fully agree (while the semi-implicit method computes the electric field only once per iteration). Stable and accurate results at large timesteps were reported by the authors.

The second possibility is the Direct Implicit Method (DIM), developed at Lawrence Livermore National Laboratory by Langdon and Hewett ([16]). Unlike the IMM, it works at the particle level and is thus conceptually close to the Particle Predictor-Corrector method.

Finally, an alternative or complement to these methods is orbit-averaging, a technique which consists in filtering out the high-frequency electron oscillations by averaging their position over a group of iterations. More information can be found in Denavit ([10]) and Cohen ([7] and [8]).

5.1.3 Refinement of the load metric

A somewhat tedious but useful task would consist in finding the optimal load metric. Currently, the following formula is used to calculate the computational load associated with the cell [k, j]:

    load[k, j] = DTN2E \cdot (N_i[k, j] + N_e[k, j]) + N_n[k, j]          (5.2)

where DTN2E is the number of iterations that we allow to elapse before moving the neutrals (values between 20 and 50 are customarily used), and N_i, N_n and N_e are the numbers of ion, neutral and electron superparticles in the cell. The underlying assumptions are that moving a superparticle of any type takes the same amount of time, and that the cost of calculating the statistics for the cell is negligible. Both are questionable.

First, it is likely that moving a charged particle takes more time than moving a neutral: the integration is more complicated because one must take the electric and magnetic fields into account; also, the collision management functions are different in each case, and are probably more time-consuming for the electrons, since they have to be individually checked for almost all the types of collision modeled in PTpic. Second, the calculation of particle moments in a cell, even if it is empty, does take some computation time. Thus, it may be appropriate to add a constant cost to the load metric in eq. 5.2. A good solution to this problem would consist in measuring the time spent on each cell (including particle move and statistics calculation) directly, through timers embedded into the code, rather than inferring it from a somewhat arbitrary formula such as 5.2.
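For reference, eq. 5.2 and the timer-based alternative are sketched side by side below; the particle counts, array shapes and the stand-in work function are purely illustrative and do not reflect PTpic's data structures.

    # Formula-based load metric (eq. 5.2) vs. a timer-based measurement, on made-up counts.
    import time
    import numpy as np

    rng = np.random.default_rng(0)
    DTN2E = 30                                       # neutrals moved every DTN2E iterations
    Ni = rng.integers(0, 50, size=(157, 193))        # ion superparticles per cell
    Ne = rng.integers(0, 50, size=(157, 193))        # electron superparticles per cell
    Nn = rng.integers(0, 200, size=(157, 193))       # neutral superparticles per cell

    # Current metric: charged particles weighted by DTN2E, neutrals by 1.
    load = DTN2E * (Ni + Ne) + Nn

    # Timer-based alternative: measure the time actually spent on each cell.
    def process_cell(k, j):
        # stand-in for the pusher + moments work done on cell [k, j]
        return np.sum(np.sqrt(np.arange(1 + Ni[k, j] + Ne[k, j] + Nn[k, j])))

    measured = np.empty(load.shape, dtype=float)
    for k in range(load.shape[0]):
        for j in range(load.shape[1]):
            t0 = time.perf_counter()
            process_cell(k, j)
            measured[k, j] = time.perf_counter() - t0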
Defining v" =nl/2+vn+1/2 and a' = F(x'), it reads: 1 ~2 X (A.3) an+1 2a+ 2 A.2 Boris method The Boris method is an elegant reformulation of the leapfrog method when F is the Lorentz force. In this case, the second leapfrog equation reads: n-1/2 n+1/2 = mq(E + 'At m x B) 2 - Let us introduce the auxiliary variables v qE -/ n+1/2I 7 - rn 2 and v+ (A.4) Vf+1/2 _ At -2 Substituting into (A.4) yields (see [2]) V+ +- = (v+ At2m A We introduce t = 1B - At and s = 2t v)xB (A.5) . Then it appears that, "from geometrical considerations", v+ can be obtained by: V V- + V- X t (A.6) V- + V' X S V+ Thus the sequence to obtain e Calculate v- = vn-1/2 + q rn vn+1/ -t 2 from Vn-1/2 is: 2 e Then form v' = v- + v- x t 94 o Then calculate v+ SFinally, v'+1/2 = V+ v + v' x s + E - At Hence the common description of Boris algorithm in the litterature: "Advance velocity with half the electric field (step 1), then do the full magnetic rotation (steps 2 and 3), and finally add the other half of the electric field". 95 96 Bibliography [1] Dexter Magnetic Technologies - S3212 Magnet Material Grade Samarium Cobalt, 2014. [2] Charles K. Birdsall and Bruce Langdon. Plasma Physics via Computer Simulation. Taylor & Francis, New York, NY, 2005. [3] Charles K. Birdsall and Neil Maron. Plasma Self-Heating and Saturation due to Numerical Instabilities. Journal of Computational Physics, 36:1-19, 1980. [4] J.A. Bittencourt. Fundamentals of Plasma Physics. Springer, New York, third edition, 2004. [5] Jeremiah U. Brackbill and David W. Forslund. An Implicit Method for Electromagnetic Simulation in Two Dimensions. Journal of Computational Physics, 46(2):271-308, 1982. [6] Shinatora Cho, Kimiya Komurasaki, and Yoshihiro Arakawa. Kinetic particle simulation of discharge and wall erosion of a Hall thruster. Physics of Plasmas, 20(6):63501, 2013. [7] Bruce I. Cohen, Thomas A. Brengle, Davis B. Conley, and Robert P. Freis. An orbit averaged particle code. Journal of Computational Physics, 38(1):45-63, November 1980. [8] Bruce I. Cohen, Robert P. Freis, and Vincent Thomas. Orbit-averaged implicit particle codes. Journal of Computational Physics, 45(3):345-366, March 1982. [9] Daniel George Courtney. Development and Characterization of a Diverging Cusped Field Thruster and a Lanthanum Hexaboride Hollow Cathode. Sm thesis, Massachusetts Institute of Technology, 2008. [10] J. Denavit. Time-Filtering Particle Simulations with wpe.At >> 1. Journal of Computational Physics, 42:337-366, 1981. [11] Justin M Fox. Parallelization of a Particle-in-Cell Simulation Modeling HallEffect Thrusters. PhD thesis, Massachusetts Institute of Technology, 2005. Development of the Plasma Thruster Particle-in[12] Stephen Robert Gildea. Cell Simulator to Complement Empirical Studies of a Low-Power Cusped-Field Thruster. PhD thesis, Massachusetts Institute of Technology, 2013. 97 [13] Dan M. Goebel and Ira Katz. Fundamentals of Electric Propulsion: Ion and Hall Thrusters. John Wiley & Sons, Hoboken, NJ, 2008. [14] G. Kornfeld, N. Koch, and Harmann H.-P. Physics and Evolution of HEMPThrusters. In 30th InternationalElectric Propulsion Conference, Florence, Italy, 2007. [15] Bruce Langdon. Analysis of the time integration in plasma simulation. Journal of Computational Physics, 30(2):202-221, February 1979. [16] Bruce Langdon and Dennis Hewett. Direct Implicit Plasma Simulation. Journal of Computational Physics, 72:121-155, 1987. [17] Daniel W Markiewicz. Survey on symplectic integrators. Technical report, University of California at Berkeley, 1999. 
[18] Taylor Scott Matlock. An Exploration of Prominent Cusped-Field Thruster Phenomena : The Hollow Conical Plume and Anode Current Bifurcation. PhD thesis, Massachusetts Institute of Technology, 2012. [19] R. I. McLachlan and M. Perlmutter. Energy drift in reversible time integration. Journal of Physics A: Mathematical and General, 37(45):L593-L598, November 2004. [20] Anthony Pang. Development and Simulation of a Cylindrical Cusped-Field Thruster and a Diagnostics Tool for Plasma-MaterialsInteractions. Sm thesis, Massachusetts Institute of Technology, 2013. [21] S.E. Parker and C.K. Birdsall. Numerical error in electron orbits with large W,,At. Journal of Computational Physics, 97(1):91-102, November 1991. [22] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in FORTRAN. Cambridge University Press, Cambridge (UK), second edi edition, 1992. [23] Y. Raitses and N. J. Fisch. Parametric investigations of a nonconventional Hall thruster. Physics of Plasmas, 8(5):2579, 2001. [24] Y. Raitses, A. Smirnov, and N. J. Fisch. Effects of enhanced cathode electron emission on Hall thruster operation. Physics of Plasmas, 16(5):057106, 2009. [25] A. Smirnov. Plasma measurements in a 100 W cylindrical Hall thruster. Journal of Applied Physics, 95(5):2283, 2004. [26] A Smirnov, Y. Raitses, and N. J. Fisch. Parametric investigation of miniaturized cylindrical and annular Hall thrusters. Journal of Applied Physics, 92(10):5673, 2002. [27] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. MPI: The Complete Reference. The MIT Press, Cambridge, MA, 1996. 98 [28] James Joseph Szabo. Fully Kinetic Numerical Modeling of a Plasma Thruster. Phd thesis, Massachusetts Institute of Technology, 2001. 99