Aalto University publication series
DOCTORAL DISSERTATIONS 174/2012

Building blocks for fast circuit simulation

Mikko Honkala

Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Aalto University School of Electrical Engineering for public examination and debate in Auditorium S1 at the Aalto University School of Electrical Engineering (Otakaari 5, Espoo, Finland) on the 18th of January, 2013, at 12 noon.

Aalto University
School of Electrical Engineering
Department of Radio Science and Engineering

Supervising professor
Professor Martti Valtonen

Thesis advisor
D.Sc. (Tech.) Janne Roos

Preliminary examiners
Associate professor Gabriela Ciuprina, Polytechnic University of Bucharest, Romania
Professor Timo Rahkonen, University of Oulu, Finland

Opponent
Prof.dr. Wil H.A. Schilders, Technische Universiteit Eindhoven, The Netherlands

© Mikko Honkala

ISBN 978-952-60-4922-9 (printed)
ISBN 978-952-60-4923-6 (pdf)
ISSN-L 1799-4934
ISSN 1799-4934 (printed)
ISSN 1799-4942 (pdf)
http://urn.fi/URN:ISBN:978-952-60-4923-6

Unigrafia Oy
Helsinki 2012
Finland

Abstract
Aalto University, P.O.
Box 11000, FI-00076 Aalto www.aalto.fi

Author: Mikko Honkala
Name of the doctoral dissertation: Building blocks for fast circuit simulation
Publisher: School of Electrical Engineering
Unit: Department of Radio Science and Engineering
Series: Aalto University publication series DOCTORAL DISSERTATIONS 174/2012
Field of research: Circuit theory
Manuscript submitted: 8 June 2012
Date of the defence: 18 January 2013
Permission to publish granted (date): 30 August 2012
Language: English
Article dissertation (summary + original articles)

Abstract

Modern electronic circuits are typically large, consisting of thousands of transistors and other components. During the design process, computationally demanding numerical simulations are needed to verify the functionality of the circuit. Thus, there is a clear need for fast and accurate circuit simulation tools. Four approaches to improve the speed and the convergence of numerical circuit simulation are presented.

The first approach utilizes efficient iteration methods for nonlinear DC analysis. Newton–Raphson (NR) iteration is the most widely used iteration method for nonlinear circuit equations, but it lacks good global convergence properties. Some new variants of nonlinear iteration methods are proposed to improve the convergence of DC analysis.

In the second approach, the computing time is reduced by using parallel processing. Parallelization of harmonic balance (HB) analysis using multithreading is studied. Also, a modified multilevel NR method that has improved convergence properties is presented.

The third approach concentrates on improving the convergence of iterative solvers for linear systems using preconditioners. The emphasis is on the preconditioning of Jacobians of the HB method. It is shown how to use time-domain preconditioners with frequency-domain preconditioners in order to benefit from both.
The fourth approach to speed up the circuit simulation is to use model-order reduction (MOR), where the idea is to approximate complex circuit models with simpler ones. This thesis concentrates on MOR methods for linear circuits or the linear parts of nonlinear circuits. Efficient partitioning-based MOR methods and a new global approach to projection-based MOR are proposed.

Keywords: Circuit simulation, numerical analysis, parallel processing, iterative methods, model-order reduction, preconditioners
ISBN (printed) 978-952-60-4922-9
ISBN (pdf) 978-952-60-4923-6
ISSN-L 1799-4934
ISSN (printed) 1799-4934
ISSN (pdf) 1799-4942
Location of publisher: Espoo
Pages: 157
Location of printing: Helsinki
Year: 2012
urn http://urn.fi/URN:ISBN:978-952-60-4923-6

Tiivistelmä (Finnish abstract, in English translation)

Finnish title of the doctoral dissertation: Nopean piirisimuloinnin rakennuspalikoita

Modern electronic circuits are typically large assemblies of thousands of transistors. During the design process, their operation must be verified with computationally demanding simulations. Hence, there is a need for fast and accurate simulation tools. This thesis examines four different approaches to speeding up numerical circuit simulation.

The first approach studies efficient iteration methods for solving nonlinear circuit equations. The Newton–Raphson (NR) method is a commonly used iteration method for solving nonlinear DC circuit equations. Its drawback is the lack of global convergence properties.
This thesis presents some new iteration methods to improve the convergence of DC analysis.

In the second approach, circuit simulation is accelerated by means of parallel computing. The thesis addresses the parallelization of the harmonic balance (HB) method using threads. In addition, a multilevel NR method suitable for parallel computing is presented, in which particular attention has been paid to aiding convergence.

The third approach focuses on preconditioners for the iterative methods used to solve systems of linear equations. Frequency-domain preconditioners are normally used with the HB equations, but this thesis shows how frequency-domain preconditioners can be combined with time-domain preconditioners so that the good properties of both are obtained.

The fourth approach uses model-order reduction. The idea is to reduce a large circuit model to a smaller one such that the accuracy nevertheless remains sufficient. This thesis concentrates on the model-order reduction of linear circuits and presents methods based on circuit partitioning as well as a new method based on global approximation.

Keywords (Avainsanat): Circuit simulation, numerical analysis, parallel computing, iterative methods, model-order reduction, preconditioners

Preface

Starting from 1999, I have worked on many industrial projects (SYANIDE, ARFSIM, MOSAICS, AMAZE, STONGA), mainly in the development of APLAC’s analysis methods. Also, from 2005 to 2007, we had a joint project NETMOR with NEC Europe on model-order reduction. In 2009–2010, I also worked on the EU project ICESTARS. This thesis comprises a collection of the publications that resulted from these projects.

I would especially like to thank my supervisor Prof.
Martti Valtonen and my instructor D.Sc. (Tech.) Janne Roos for many reasons and Ville Karanko and Jarmo Virtanen for co-operating in the APLAC projects. I also wish to thank Pekka Miettinen, in particular, but also Dr. Achim Basermann and Carsten Neff for fruitful co-operation in MOR research. I am very grateful to Sakari Aaltonen for proof-reading my articles and to Luis Costa for proof-reading the overview of this thesis.

Many other current and former members of the circuit theory group have, at least indirectly, influenced my thesis: Mikko Hulkkonen, Tuomo Kujanpää, Anu Lehtovuori, Vesa Linja-aho, Timo Palenius, Dr. Neslihan Şengör (visiting scientist), D.Sc. (Tech.) Kimmo Silvonen, Taisto Tinttunen, Tuukka Tuomisto, and D.Sc. (Tech.) Timo Veijola, to mention some of the most important. Thank you.

And, of course, the warmest thanks go to my wife Sanna for her constant support.

The Jenny and Antti Wihuri Foundation and the Nokia Foundation have partially funded this thesis.

Espoo, November 19, 2012,
Mikko Honkala

Contents

Preface
Contents
List of Publications
Author’s Contribution
1. Introduction
  1.1 Background
  1.2 Scope of study
2. Numerical circuit-analysis methods
  2.1 Circuit equations
  2.2 DC analysis
    2.2.1 Aiding the convergence
  2.3 Transient analysis
  2.4 Harmonic balance analysis
    2.4.1 Frequency selective harmonic balance analysis
    2.4.2 Preconditioning
3. Iterative methods for nonlinear equations
  3.1 Equation formulation
  3.2 Line-search methods
  3.3 Trust-region methods
  3.4 Dogleg
  3.5 Tensor methods
  3.6 Multilevel Newton–Raphson
4. Parallel processing in circuit simulation
5. Model-order reduction
  5.1 Overview
  5.2 PRIMA
    5.2.1 Equation formulation
    5.2.2 PRIMA algorithm
    5.2.3 Eigenvalue decomposition
    5.2.4 Macromodel synthesis by Matsumoto’s method
  5.3 Liao–Dai method
    5.3.1 T model
    5.3.2 Π model
    5.3.3 Circuit model of a port
6. Discussion
7. Summary of the publications
  7.1 Publication I: Nonmonotone norm-reduction method for circuit simulation
  7.2 Publication II: On nonlinear iteration methods for DC analysis of industrial circuits
  7.3 Publication III: New multilevel Newton–Raphson method for parallel circuit simulation
  7.4 Publication IV: A parallel harmonic balance simulator for shared memory multicomputers
  7.5 Publication V: Mixed preconditioners for harmonic balance Jacobians
  7.6 Publication VI: Frequency/time block preconditioners for harmonic balance Jacobians
  7.7 Publication VII: Study and development of an efficient RC-in–RC-out MOR method
  7.8 Publication VIII: Hierarchical model-order reduction flow
  7.9 Publication IX: GABOR: Global-approximation-based order reduction
  7.10 Publication X: PartMOR: Partitioning-based realizable model-order reduction method for RLC circuits
  References
Bibliography
Errata
Publications

List of Publications

This thesis consists of an overview and of the following publications, which are referred to in the text by their Roman numerals.

I M. Honkala. Nonmonotone norm-reduction method for circuit simulation. Electronics Letters, vol. 38, pp. 1316–1317, Oct. 2002.

II M. Honkala, J. Roos, and V. Karanko. On nonlinear iteration methods for DC analysis of industrial circuits. Mathematics in Industry 8: Progress in Industrial Mathematics at ECMI 2004, (A. D. Bucchianico, R. M. M. Mattheij, and M. A. Peletier, eds.), pp. 144–148, 2006.

III M. Honkala, J. Roos, and M. Valtonen. New multilevel Newton–Raphson method for parallel circuit simulation. Proceedings of European Conference on Circuit Theory and Design, vol. II, pp. 113–116, Aug. 2001.

IV V. Karanko and M. Honkala. A parallel harmonic balance simulator for shared memory multicomputers. Proceedings of the 34th European Microwave Conference, pp. 849–851, 2004.

V M. Honkala and V. Karanko. Mixed preconditioners for harmonic balance Jacobians. International Journal of RF and Microwave Computer-Aided Engineering, vol. 19, no. 2, pp. 211–217, 2009.

VI M. Honkala, V. Karanko, J. Roos, and M. Valtonen. Frequency/time block preconditioners for harmonic balance Jacobians. Proceedings of European Conference on Circuit Theory and Design, pp. 607–610, Aug. 2009.

VII P. Miettinen, M. Honkala, J.
Roos, C. Neff, and A. Basermann. Study and development of an efficient RC-in–RC-out MOR method. Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems, pp. 1277–1280, Aug. 2008.

VIII M. Honkala, P. Miettinen, J. Roos, and C. Neff. Hierarchical model-order reduction flow. Mathematics in Industry 14: Scientific Computing in Electrical Engineering SCEE 2008, (J. Roos and L. R. J. Costa, eds.), pp. 539–546, 2010.

IX J. Roos, M. Honkala, and P. Miettinen. GABOR: global-approximation-based order reduction. Mathematics in Industry 14: Scientific Computing in Electrical Engineering SCEE 2008, (J. Roos and L. R. J. Costa, eds.), pp. 517–514, 2010.

X P. Miettinen, M. Honkala, J. Roos, and M. Valtonen. PartMOR: partitioning-based realizable model-order reduction method for RLC circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 3, pp. 374–387, 2011.

Author’s Contribution

Publication I: “Nonmonotone norm-reduction method for circuit simulation”

All the work, i.e., the implementation, the testing, and the paper preparation, was done by the author.

Publication II: “On nonlinear iteration methods for DC analysis of industrial circuits”

The author developed, implemented, and evaluated the algorithms. Ville Karanko worked jointly with the author to develop and implement the basic tensor methods. D.Sc. (Tech.) Janne Roos helped with the preparation of the paper.

Publication III: “New multilevel Newton–Raphson method for parallel circuit simulation”

The development, implementation, and evaluation of the method were done by the author. D.Sc. (Tech.) Janne Roos helped with the preparation of the paper.

Publication IV: “A parallel harmonic balance simulator for shared memory multicomputers”

The major part of the implementation (about 70 %) was done by Ville Karanko. The author contributed to the implementation of the component-function evaluations and general parallelization.
Ville Karanko was responsible for the writing of the paper.

Publication V: “Mixed preconditioners for harmonic balance Jacobians”

The idea of mixed preconditioners was proposed by the author, and the major part of the implementation was done by the author. Ville Karanko contributed to the paper through discussions and helped with implementations.

Publication VI: “Frequency/time block preconditioners for harmonic balance Jacobians”

The idea of this approach was proposed by the author. The author implemented the methods, building on the foundation laid by Ville Karanko, who also contributed to writing some parts of the paper. The tests were performed by the author.

Publication VII: “Study and development of an efficient RC-in–RC-out MOR method”

The author was responsible for implementing the major part of the MOR flow, analyzing the nodal-formulation-based circuit-equation approach, and deriving the simplified macromodel approach together with D.Sc. (Tech.) Janne Roos. A large portion of the method implementation and the extensive simulations were done by Pekka Miettinen. Janne Roos contributed substantially to the paper through discussions and an initial study of the Liao–Dai and PRIMA methods. Carsten Neff helped through discussions on the implementation.

Publication VIII: “Hierarchical model-order reduction flow”

The paper and the related additional research were inspired by the discoveries made in the test simulations performed during the development of [PVII]. Thus, the contributions stated above for paper [PVII] apply here to some extent. In addition, the author performed test simulations to study hierarchical analysis with PRIMA and discussed the hierarchical method flow at a more general level. Pekka Miettinen’s contribution to the paper consisted of writing two preliminary sections of the manuscript, discussions, and commenting on the text. D.Sc. (Tech.) Janne Roos helped through discussions and by commenting on the text.
Publication IX: “GABOR: global-approximation-based order reduction”

The main idea behind this approach as well as the proof of the GABOR method were proposed by D.Sc. (Tech.) Janne Roos. The author’s contribution was in the implementation of the method, the test simulations, and inventing some parts of the method, e.g., frequency shifting and scaling.

Publication X: “PartMOR: partitioning-based realizable model-order reduction method for RLC circuits”

The original idea as well as most of the method implementation and all the test simulations were done by Pekka Miettinen. The author contributed to the theory at various points during the development of the method. In addition, the idea for Fig. 11 was proposed by the author. D.Sc. (Tech.) Janne Roos helped with preparing the manuscript and with additional discussions.

Symbols

1    identity matrix
a    auxiliary vector
A    matrix of the linear equation
A    auxiliary matrix
A_i  ith diagonal block in the BBD matrix
b    right-hand side vector
b    auxiliary vector
B    trust region
B    selector matrix
B̃    reduced-order selector matrix
B_i  ith upper off-diagonal block in the BBD matrix
c    capacitance
C    nodal capacitance matrix
C̃    nodal capacitance matrix in the time domain
C̃    reduced-order capacitance matrix
C_i  ith lower off-diagonal block in the BBD matrix
D    diagonal block in the BBD matrix
e    voltage source
E    incidence matrix
E    auxiliary matrix
E    part of modified nodal matrix
f    function
f    frequency
F    function
g    conductance
g    gradient vector
G    conductance
G    nodal conductance matrix
G̃    nodal conductance matrix in the time domain
G̃    reduced-order resistance matrix
H    Hessian matrix
H    inductor stamp matrix
H    auxiliary matrix
i    index
i    current
i    vector of currents in the time domain
i_D  diode current
i_p  vector of port currents
I    vector of currents in the frequency domain
j    index
J    Jacobian matrix
J̃    Jacobian matrix in the time domain
k    index
K    number of block moments
K    preconditioner
L    inductance
L    linear model
L    inductance matrix
L    selector matrix
L̃    reduced-order selector matrix
m    moment
m    quadratic model
M    moment matrix
n    number of unknowns
N    number of harmonic frequencies
N, N_x, N_y  number of ports
N    resistor stamp matrix
q    order of reduction
q_c  number of complex poles
q_r  number of real poles
q    vector of charges
Q    capacitor stamp matrix
r    residual
R    resistance
R    auxiliary matrix
S    scattering parameter matrix
S    eigenvector matrix
t    time
T    tensor
u    vector of excitation currents
u_p  vector of port voltages
U    vector of excitation currents
v    nodal voltage
v    vector of voltages
x    unknown
x̃    state variable
x    vector of unknowns
x∗   solution vector
X    vector of unknowns in frequency domain
X    projection matrix
y    y parameter
Y    admittance matrix
Z_0  reference impedance
α    damping factor
α_k  weight of the time-domain difference operator
β    contracting factor
Γ    DFT matrix
δ    radius of trust region
∆    auxiliary matrix
∆x   update of the vector of unknowns
∆x_DL  dogleg update
∆x_NR  Newton–Raphson update
∆x_SD  steepest-descent update
ε    error limit
λ    damping factor
Λ    diagonal matrix containing the eigenvalues
ρ    gain factor
τ    maximum error
Ω    frequency matrix
ω    angular velocity

Abbreviations

2D      two-dimensional
AC      alternating current
APLAC   circuit simulator
AWE     asymptotic waveform evaluation
BBD     bordered-block diagonal
BE      backward Euler
CG      conjugate gradient
CP      Cauchy point
CPU     central processing unit
DAE     differential algebraic equations
DC      direct current
DFT     discrete Fourier transform
DL      dog leg
FD      frequency domain
FE      forward Euler
FFT     fast Fourier transform
FGMRES  flexible generalized minimal residual
FSHB    frequency selective harmonic balance
GABOR   global-approximation-based order reduction
GMRES   generalized minimal residual
GPU     graphics processing unit
HB      harmonic balance
HMOR    hierarchical model-order reduction
IDFT    inverse discrete Fourier transform
IFFT    inverse fast Fourier transform
LU      lower upper
MEMS    micro-electro-mechanical systems
MIMD    multiple instruction stream / multiple
data stream
MLNA    multilevel Newton analysis
MLNR    multilevel Newton–Raphson
MNA     modified nodal analysis
MOR     model-order reduction
MPI     message-passing interface
MRHB    multirate harmonic balance
NOW     network of workstations
NR      Newton–Raphson
PartMOR partitioning-based model-order reduction
PRIMA   passive reduced-order interconnect macromodeling algorithm
PVL     Padé via Lanczos
PVM     parallel virtual machine
RF      radio frequency
ROM     reduced-order model
SAPOR   second-order Arnoldi method for passive order reduction
SD      steepest descent
SIMD    single instruction / multiple data
SPICE   a well-known circuit simulator
SPRIM   structure-preserving reduced-order interconnect macromodeling algorithm
SVD     singular value decomposition
TBR     truncated balanced realization
TD      time domain

1. Introduction

1.1 Background

Modern electronic circuits are typically large, consisting of thousands of transistors and other components. During the design process, computationally demanding numerical simulations are needed to verify the functionality of the circuit. Thus, there is a clear need for fast and accurate circuit simulation tools.

A circuit simulator is a tool that analyzes (simulates) the behavior of electrical circuits using numerical, or in some cases symbolic, algorithms. The circuit to be simulated is constructed from components that can each be described with a mathematical model. In other words, these mathematical circuit models are interconnected and combined into a large system of equations to be solved by the simulation algorithms.

There are numerous tools for simulating analog electronics, like SPICE [1] and its many derivatives. Also, tools for digital circuits and mixed-mode simulators for mixed analog and digital design exist [2]. For RF and microwave circuits, there are sophisticated tools like APLAC [3, 4].

Competent circuit simulation is based on accurate circuit models and efficient simulation algorithms.
High-quality numerical analysis methods are both fast and robust; e.g., the iterative equation-solving algorithms used by the simulation methods should converge in any circumstance. The obtained solution should be as accurate as the models permit. However, sometimes less accurate circuit models that are faster to evaluate are needed, e.g., in a timing analysis of digital circuits. Also, the memory consumption can sometimes be the bottleneck. This leads to model-order reduction (MOR) techniques, which have been heavily studied in the past decades. In addition to improvements in the numerical algorithms, parallel processing (or concurrent programming) has become a standard approach to improve the efficiency of numerical computation, also in circuit simulation.

1.2 Scope of study

The general topic of this thesis is numerical circuit-simulation methods for analog and RF/microwave circuits. Four approaches to improve the speed and the convergence of numerical circuit simulation are presented. A very brief overview of the novel contributions of the publications [PI–PX] is presented in the following. A more detailed description of each publication is presented in Section 7.

The first approach utilizes efficient iteration methods for nonlinear DC analysis. When choosing the nonlinear iteration method for a circuit simulator, the special properties of both the circuit equations and the circuit simulator have to be taken into account. Newton–Raphson (NR) iteration is the most widely used iteration method for nonlinear circuit equations, but it lacks good global convergence properties. A nonmonotone line-search strategy for NR iteration is studied in [PI], and some new (variants of) iteration methods for nonlinear equations are presented in [PII].

In the second approach, the computing time is reduced by using parallel processing. The necessary requirement for that is parallel hardware.
Traditionally, parallel processing has been performed on supercomputers with multiple processors, but these computers are usually very expensive. Recently, multi-core processors have become available at a low price. These multi-core architectures can execute several threads concurrently. The parallelization of the harmonic balance (HB) method using multithreading is studied in [PIV]. Also, the utilization of networks of workstations (NOW) as parallel computers is studied. In networked parallel processing, or distributed computing, each serial (or parallel) computer is used as a processing unit and data is transferred via a local area network, like Ethernet. With this approach, the communication between processors is expensive. The utilization of multilevel iteration methods has been proposed to minimize the communication. [PIII] proposes a variant of the multilevel Newton–Raphson (MLNR) method with improved global convergence properties.

The third approach concentrates on improving the convergence of iterative solvers for linear systems, like generalized minimal residual (GMRES) [5], using preconditioners. In this thesis, the emphasis is on the preconditioning of Jacobians of the HB method using time-domain (TD) preconditioners instead of the typical frequency-domain (FD) ones. The main contribution of this thesis is to show how to combine the FD preconditioners with TD preconditioners [PV, PVI] in order to benefit from both.

The fourth approach is a little different. Sometimes the improvements in simulation algorithms and hardware are not enough to improve the speed (and reduce the memory consumption) of the circuit simulation. Then, there is a need for MOR, where the idea is to simplify complex circuit models by approximating the model with a simpler one. In the best case, the reduced-order model (ROM) is small compared to the original, but still describes the original system perfectly.
In practice, the reduction process introduces some error into the ROM as a trade-off for a smaller system. There are MOR methods for linear and nonlinear circuits, but this thesis concentrates on methods for linear circuits or the linear parts of nonlinear circuits. [PVII, PVIII, PX] present partitioning-based MOR for RC and RLC circuits. A new global approach to projection-based MOR was invented and tested in [PIX].

The following highlights the new contributions:

• Application of a nonmonotone line-search (norm-reduction) method to a nonlinear DC analysis method in [PI].
• Development of the nonmonotone trust-region dogleg method in [PII].
• Development of the nonmonotone line-search and trust-region tensor methods in [PII].
• Comparison of different nonlinear iterative methods in DC analysis [PII].
• Development of the modified MLNR method with improved line-search properties for DC analysis in [PIII].
• Convergence analysis of the MLNR method in [PIII].
• Parallelization of HB analysis using multithreading in [PIV].
• Development of two mixed TD/FD preconditioners that combine TD and FD preconditioners for HB Jacobians in [PV].
• Development of three TD/FD block preconditioners that divide the Jacobian into blocks that can be preconditioned with different preconditioners in [PVI].
• In [PVII], circuit-partitioning-based MOR methods for RC circuits are developed and studied.
• The results in [PVII] are extended into the hierarchical MOR flow presented in [PVIII]. The MOR flow shows how to apply suitable reduction methods to different parts of the linear circuit, e.g., PRIMA for RLC circuit parts and Liao–Dai for RC circuit parts.
• A new global approach to projection-based MOR was invented and tested in [PIX].
• A new partitioning-based low-order macromodel MOR method for RLC circuits is proposed in [PX].

2.
Numerical circuit-analysis methods

In this section, the basics of numerical circuit-analysis methods are presented briefly in order to provide a mathematical framework for the rest of the thesis. Readers less familiar with numerical circuit simulation methods can consult the fundamental textbooks, e.g., [6, 7, 8, 9, 10, 11].

2.1 Circuit equations

There are many ways to formulate the circuit equations [7], e.g., nodal, tableau, and mesh analysis, but the one most used in circuit simulation is the modified nodal analysis (MNA) [12]. The MNA equations are formulated using Kirchhoff’s current law (the sum of currents in a node is zero) and branch constitutive equations: e.g., a resistor is modelled using Ohm’s law as $i = v/R$, and the behavior of a linear capacitor is described by $i = \frac{dq(v)}{dt} = C\frac{dv}{dt}$. The MNA differs from nodal analysis in that a circuit element that has no admittance representation (e.g., voltage sources and inductances) is represented using an additional current variable.

Consider the system of nonlinear differential algebraic equations (DAEs)

\[ f(x(t), t) = \frac{dq(x(t))}{dt} + i(x(t), t) + u(t) = 0, \tag{2.1} \]

where x is the nodal vector of unknowns, i.e., nodal voltages and currents of elements that have no admittance representation and that are required as part of the solution, q is the nonlinear charge vector, i is the nonlinear function of nodal currents, and u is the excitation (current) vector.

Figure 2.1. Example circuit.

For example, the MNA equations for the circuit in Fig. 2.1 are

\[
\begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ \dfrac{\partial q(v_2(t))}{\partial t} \\ 0 \end{bmatrix}
+
\begin{bmatrix} G(v_1(t) - v_2(t)) + i_E \\ -G(v_1(t) - v_2(t)) + i_D(v_2) \\ v_1(t) \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ -e(t) \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\tag{2.2}
\]

The other way to formulate the MNA equations (the approach used in APLAC [4]) is to use the gyrator transformation [13] and nodal analysis. In this way, a voltage source is transformed into a gyrator and a current source, and an inductance into a gyrator and a capacitance.
Then, pure nodal analysis can be applied to formulate the equations.

The linear equations arising from electrical circuits are commonly asymmetric and extremely sparse, and thus sparse LU factorization algorithms [7, 14] are applied.

2.2 DC analysis

DC analysis is the basis of all circuit simulation. For example, the operating point has to be found before AC analysis, and the DC solution is also the initial condition for transient analysis. Moreover, the DC characteristics themselves are sometimes of interest.

DC analysis solves the steady-state behavior of the circuit variables under DC excitation. By setting $\frac{dq}{dt} = 0$, Eq. (2.1) reduces to the nonlinear algebraic circuit equation

\[ f(x) = 0, \tag{2.3} \]

where x is solved iteratively using the NR method (see Section 3.2).

2.2.1 Aiding the convergence

The local convergence of NR iteration is quadratic, but it has no global convergence properties. If the NR iteration lacks an initial guess close enough to the solution, convergence is not guaranteed. Therefore, convergence-aiding methods are needed.

There are several approaches to aid convergence. Using homotopy and continuation methods [15, 16, 17, 18] is one way to improve the convergence of DC analysis and even to find multiple DC solutions. Another way to help the convergence is to use line search to damp the iterations [19, 20] such that the norm of the objective function decreases in every iteration. Thus, the term norm reduction is sometimes used for this strategy. A third approach is to use totally different solution algorithms, e.g., piecewise linear analysis [21]. More details on different nonlinear iterative solvers are found in Section 3.

2.3 Transient analysis

Transient analysis is the computation of the time-domain transient response of (nonlinear) DAEs using numerical integration methods. The analysis is an initial value problem, where the initial values are typically obtained from the DC solution.
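As an illustration of the norm-reduction idea of Section 2.2.1, the following minimal Python sketch computes such a DC solution for a single node. It is not taken from the publications or from APLAC: the circuit (a source e in series with a resistor r feeding a diode), all parameter values, and the function names `diode_residual` and `damped_newton` are assumptions made for this example only. The step is halved until the residual magnitude decreases, which is a plain monotone line search.

```python
import math

def diode_residual(v, e=5.0, r=1e3, i_s=1e-14, v_t=0.025852):
    """KCL at the diode node (hypothetical circuit): f(v) and df/dv."""
    try:
        ex = math.exp(v / v_t)
    except OverflowError:
        # Treat an overflowing exponential as an unacceptable trial point.
        return math.inf, math.inf
    f = (v - e) / r + i_s * (ex - 1.0)
    df = 1.0 / r + i_s / v_t * ex
    return f, df

def damped_newton(residual, v0=0.0, tol=1e-12, max_iter=100):
    """NR iteration with step halving so that |f| decreases in every iteration."""
    v = v0
    f, df = residual(v)
    for iteration in range(max_iter):
        if abs(f) < tol:
            return v, iteration
        dv = -f / df                 # full Newton-Raphson update
        alpha = 1.0                  # damping factor
        while alpha > 1e-10:
            f_new, df_new = residual(v + alpha * dv)
            if abs(f_new) < abs(f):  # norm-reduction test
                break
            alpha *= 0.5
        else:
            raise RuntimeError("line search failed to reduce the residual")
        v, f, df = v + alpha * dv, f_new, df_new
    raise RuntimeError("NR iteration did not converge")

v_dc, n_iter = damped_newton(diode_residual)
print(f"diode voltage {v_dc:.4f} V after {n_iter} iterations")
```

With these assumed values, the undamped first step from v = 0 would jump to roughly 5 V, where the diode exponential is enormous; the damping rejects that trial point and accepts a shorter step instead.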
DAEs are discretized with methods such as forward Euler (FE), backward Euler (BE), or the trapezoidal rule into a set of nonlinear equations that is solved similarly to DC analysis (using NR iteration). By using the FE formulation, in which the currents and excitations are evaluated at the previous time point, Eq. (2.1) transforms to

\[
\frac{\mathbf{q}(\mathbf{x}_{k+1}) - \mathbf{q}(\mathbf{x}_k)}{t_{k+1} - t_k} + \mathbf{i}(\mathbf{x}_k, t_k) + \mathbf{u}(t_k) = \mathbf{0}, \tag{2.4}
\]

and by using the BE formulation, in which they are evaluated at the new time point, to

\[
\frac{\mathbf{q}(\mathbf{x}_{k+1}) - \mathbf{q}(\mathbf{x}_k)}{t_{k+1} - t_k} + \mathbf{i}(\mathbf{x}_{k+1}, t_{k+1}) + \mathbf{u}(t_{k+1}) = \mathbf{0}, \tag{2.5}
\]

where $k$ is the time index. Starting from the initial condition $\mathbf{x}_0$, which typically is the DC solution, and using (variable) time stepping, the time-domain transient response can be computed. At each time point, the nonlinear algebraic equation has to be solved, typically using NR iteration. The time steps are selected such that the truncation error satisfies a prescribed error bound. BE and FE are first-order methods, and the trapezoidal integration rule is a second-order method. Many other integration methods can be applied, e.g., Runge–Kutta methods [22].

2.4 Harmonic balance analysis

The harmonic balance (HB) [11, 23, 24] method is a frequency-domain analysis technique for solving the periodic and quasi-periodic steady state. It is widely used for RF and microwave circuits. In HB analysis, the variables are represented in terms of Fourier coefficients:

\[
\mathbf{x}(t) = \sum_{k=-N}^{N} \mathbf{X}_k e^{j\omega_k t}, \tag{2.6}
\]

where $N$ is the number of harmonic frequencies, $\mathbf{X}_k$ is the $k$th Fourier coefficient, and $\omega_k$ is the $k$th harmonic frequency. Since the HB equations easily become huge, they are usually solved using the inexact Newton [25] method, i.e., the NR method with an iterative linear solver like GMRES [5].

There are two major ways to formulate the HB equations, namely piecewise HB [23] and nodal HB. For simplicity, the following considers nodal equations instead of MNA.
By transforming (2.1) into the frequency domain using the (multidimensional) discrete Fourier transform (DFT) $\mathbf{\Gamma}$, the HB equations become

\[
\mathbf{F}(\mathbf{X}) = \mathbf{I}(\mathbf{X}) + j\mathbf{\Omega}\mathbf{Q}(\mathbf{X}) + \mathbf{U}, \tag{2.7}
\]

where $\mathbf{U}$ is the vector of excitation currents, $\mathbf{X} = \mathbf{\Gamma}\mathbf{x}(t)$ is the nodal voltage vector, and $\mathbf{I} = \mathbf{\Gamma}\mathbf{i}(t)$ and $\mathbf{Q} = \mathbf{\Gamma}\mathbf{q}(t)$ are the nonlinear nodal current and charge vectors, respectively. $\mathbf{\Omega}$ is the frequency-domain differentiation matrix

\[
\mathbf{\Omega} =
\begin{bmatrix}
\omega_{-N}\mathbf{1} & & & & \\
 & \omega_{-N+1}\mathbf{1} & & & \\
 & & \ddots & & \\
 & & & \omega_{N-1}\mathbf{1} & \\
 & & & & \omega_{N}\mathbf{1}
\end{bmatrix}, \tag{2.8}
\]

where $\mathbf{1}$ is the identity matrix. In practice, the DFT is usually replaced with the fast Fourier transform (FFT), which is an efficient implementation of the DFT. Another important way to improve the efficiency of multitone analysis is to use 1-D frequency mappings [11, 24, 26].

The whole HB analysis (using the inexact Newton method) is as follows:

ALGORITHM 1 HB($\mathbf{X}_0$)
1. Set initial guess $\mathbf{X}_0$, $k = 0$.
2. Inverse discrete Fourier transform (IDFT): $\mathbf{x}(t) = \mathbf{\Gamma}^{-1}\mathbf{X}_k$.
3. Compute nonlinear functions $\mathbf{i}(\mathbf{x}(t))$ and $\mathbf{q}(\mathbf{x}(t))$.
4. DFT: $\mathbf{I} = \mathbf{\Gamma}\mathbf{i}$ and $\mathbf{Q} = \mathbf{\Gamma}\mathbf{q}$.
5. Solve new iterate for $\mathbf{X}_{k+1}$.
6. Check convergence. If no convergence, set $k := k + 1$ and go to step 2.
7. Solution found.

As mentioned before, the nonlinear equation is solved using inexact Newton. It requires a Jacobian matrix

\[
\mathbf{J} = j\mathbf{\Omega}\mathbf{C} + \mathbf{G}, \tag{2.9}
\]

where $\mathbf{C}$ and $\mathbf{G}$ are the nodal capacitance and conductance matrices, respectively.

2.4.1 Frequency selective harmonic balance analysis

A circuit design usually has parts with very different operating frequencies; e.g., the mixer parts of a circuit have 2-tone frequencies while a DC biasing circuit has 1-tone frequencies or just DC. In frequency selective HB (FSHB), or multirate HB (MRHB), which is implemented in APLAC [27, 28], each circuit element may be assigned a different frequency set. A similar approach was presented by Rizzoli [29], where each circuit block may have a reduced set of frequencies. The circuit is, therefore, effectively decomposed into partitions having the same frequencies.
Nodal equations are required for the union of the frequencies of each partition the node is referenced from, i.e., the boundary node equations differ from the ordinary nodal HB with respect to the frequency selection. This way, the number of unknowns in the HB analysis can be reduced while preserving good accuracy in the critical parts. The node equations are then formed for each node from the set of frequencies contributed by the elements connected to that node.

2.4.2 Preconditioning

In order to solve the general linear equation of the form

\[
\mathbf{A}\mathbf{x} = \mathbf{b}, \tag{2.10}
\]

iterative solvers (like GMRES) can be used, and they typically are used if the linear system is huge. In the following, consider the iterative solver GMRES, which minimizes the residual

\[
\mathbf{r} = \mathbf{b} - \mathbf{A}\mathbf{x}. \tag{2.11}
\]

The iterative linear solver needs preconditioners to function efficiently. The goal of preconditioning is to reduce the number of iterations of GMRES (or of other iterative solvers, or of variants of GMRES more suitable for preconditioning, like [30, 31]) by making the problem easier for the solver:

\[
\mathbf{K}^{-1}\mathbf{A}\mathbf{x} = \mathbf{K}^{-1}\mathbf{b}, \tag{2.12}
\]

where $\mathbf{K}$ is a preconditioner. For complicated problems, the number of iterations cannot be directly analyzed as a function of the preconditioner. However, a matrix $\mathbf{K}$ is often a good preconditioner if [32]
1. $\mathbf{K}$ is a good approximation to $\mathbf{A}$ in some sense,
2. the cost of the construction of $\mathbf{K}$ is not prohibitive, and
3. the solution of the preconditioner equation requires much less computation than solving the original equation.

In the best case, when $\mathbf{K}^{-1}\mathbf{A} \approx \mathbf{1}$, Eq. (2.12) becomes $\mathbf{x} = \mathbf{K}^{-1}\mathbf{b}$, and GMRES converges in one iteration (or a few iterations), which needs only one computationally cheap inversion of the preconditioner. The preconditioning of GMRES in HB analysis has been studied, e.g., in [33, 34, 35, 36, 37, 38].

In inexact Newton, the linear equation to be solved is

\[
\mathbf{J}\Delta\mathbf{X} = -\mathbf{F}(\mathbf{X}). \tag{2.13}
\]

As HB is a frequency-domain method, a natural choice for a preconditioner is a frequency-domain (FD) preconditioner. One of the most commonly used preconditioners is the block Jacobi preconditioner, which consists of the diagonal blocks of the Jacobian matrix when ordered in frequency-major order:

\[
\mathbf{K}_{\mathrm{FD}} =
\begin{bmatrix}
\mathbf{J}_{-N} & 0 & 0 & \cdots & 0 \\
0 & \mathbf{J}_{-N+1} & 0 & \cdots & 0 \\
0 & 0 & \mathbf{J}_{-N+2} & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \mathbf{J}_{N}
\end{bmatrix}. \tag{2.14}
\]

It is equivalent to zeroing the off-diagonal terms of the conversion matrices, which explains why strongly nonlinear problems are not handled well by this preconditioner. However, the cost of inverting this block equation is low, because each diagonal block can be LU factored separately.

While frequency-domain preconditioning is effective for weakly nonlinear circuits, for highly nonlinear cases, especially for frequency dividers, time-domain (TD) preconditioners [34, 35, 36, 37], which take nonlinear behavior better into account, become attractive. The following considers 1-tone HB analysis only. In the time domain, the Jacobian is

\[
\tilde{\mathbf{J}} = \mathbf{\Gamma}^{-1}\mathbf{J}\mathbf{\Gamma} = \mathbf{D}\tilde{\mathbf{C}} + \tilde{\mathbf{G}}, \tag{2.15}
\]

where

\[
\tilde{\mathbf{G}} := \mathbf{\Gamma}^{-1}\mathbf{G}\mathbf{\Gamma} =
\begin{bmatrix}
\mathbf{g}_1 & 0 & 0 & \cdots & 0 \\
0 & \mathbf{g}_2 & 0 & \cdots & 0 \\
0 & 0 & \mathbf{g}_3 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \mathbf{g}_n
\end{bmatrix} \tag{2.16}
\]

and, similarly,

\[
\tilde{\mathbf{C}} := \mathbf{\Gamma}^{-1}\mathbf{C}\mathbf{\Gamma} =
\begin{bmatrix}
\mathbf{c}_1 & 0 & 0 & \cdots & 0 \\
0 & \mathbf{c}_2 & 0 & \cdots & 0 \\
0 & 0 & \mathbf{c}_3 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \mathbf{c}_n
\end{bmatrix}, \tag{2.17}
\]

where $\mathbf{g}_k$ and $\mathbf{c}_k$ are block matrices, and $\mathbf{\Gamma}$ is, as before, the DFT operating on each nodal variable. The difference operator $\mathbf{D} = \mathbf{\Gamma}^{-1} j\mathbf{\Omega}\mathbf{\Gamma}$ is a matrix having the general form

\[
\mathbf{D} =
\begin{bmatrix}
0 & \alpha_1\mathbf{1} & \alpha_2\mathbf{1} & \cdots & \alpha_{-1}\mathbf{1} \\
\alpha_{-1}\mathbf{1} & 0 & \alpha_1\mathbf{1} & \cdots & \alpha_{-2}\mathbf{1} \\
\alpha_{-2}\mathbf{1} & \alpha_{-1}\mathbf{1} & 0 & \cdots & \alpha_{-3}\mathbf{1} \\
\vdots & & & \ddots & \vdots \\
\alpha_1\mathbf{1} & \alpha_2\mathbf{1} & \cdots & \alpha_{-1}\mathbf{1} & 0
\end{bmatrix}, \tag{2.18}
\]

where the coefficients $\alpha_k$ are the weights of the time-domain difference operator. Since the resistive nonlinearities are dominant for strongly nonlinear circuits, it is tempting to approximate the equations by considering them in this form and further approximating the difference operator $\mathbf{D}$ by some typical finite difference.

In [PV] and [PVI], mixed FD and TD preconditioners are considered.
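To make the remark about weak and strong nonlinearity more concrete, the sketch below builds a small frequency-major matrix with the structure of Eq. (2.9) and applies the block Jacobi preconditioner of Eq. (2.14). Two simplifications are assumptions of this example: the conversion matrices contain only a DC term and one coupling term `g1` between neighbouring frequency blocks, and a preconditioned Richardson iteration is swapped in for GMRES so that the example needs only NumPy. All names and values are hypothetical.

```python
import numpy as np

def toy_hb_jacobian(n_harm, g0, g1, c0, omega0):
    """Frequency-major toy Jacobian J = j*Omega*C + G.
    g0, c0: DC terms of the conversion matrices (diagonal blocks);
    g1: assumed coupling between neighbouring frequency blocks."""
    n, m = g0.shape[0], 2 * n_harm + 1
    J = np.zeros((m * n, m * n), dtype=complex)
    for a, k in enumerate(range(-n_harm, n_harm + 1)):
        rows = slice(a * n, (a + 1) * n)
        J[rows, rows] = g0 + 1j * k * omega0 * c0
        for b in (a - 1, a + 1):
            if 0 <= b < m:
                J[rows, b * n:(b + 1) * n] = g1
    return J

def apply_block_jacobi(J, n, r):
    """Compute K_FD^{-1} r by solving each diagonal block separately."""
    z = np.empty_like(r)
    for a in range(J.shape[0] // n):
        s = slice(a * n, (a + 1) * n)
        z[s] = np.linalg.solve(J[s, s], r[s])
    return z

def richardson(J, n, b, tol=1e-10, max_iter=500):
    """x <- x + K_FD^{-1}(b - J x); a stand-in for preconditioned GMRES."""
    x = np.zeros_like(b)
    for it in range(max_iter):
        r = b - J @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        x = x + apply_block_jacobi(J, n, r)
    return x, max_iter

g0 = np.array([[2.0, -1.0], [-1.0, 2.0]])
c0 = np.eye(2)
results = {}
for label, eps in (("weak", 0.05), ("strong", 0.4)):
    J = toy_hb_jacobian(3, g0, eps * np.eye(2), c0, omega0=1.0)
    b = np.ones(J.shape[0], dtype=complex)
    x, iters = richardson(J, 2, b)
    results[label] = (iters, np.linalg.norm(b - J @ x))
print(results)
```

With the larger off-diagonal coupling, which mimics a stronger nonlinearity, the block-diagonal approximation is poorer and more iterations are needed; this is the effect that motivates TD and mixed preconditioners.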
Several approaches to mixing different preconditioners are proposed there. The paper [PVI] presents block preconditioners that can be used with FSHB. In these preconditioners, highly nonlinear 1-tone parts can be preconditioned with TD preconditioners while the other parts can be preconditioned with FD preconditioners.

3. Iterative methods for nonlinear equations

In this section, some standard iterative methods for solving nonlinear equations are presented. This section offers an introduction to the study in [PI–PIII], where these methods are evaluated and further developed in the context of DC circuit analysis.

3.1 Equation formulation

The nonlinear algebraic circuit equations were presented in Eq. (2.3). In DC and transient analysis, $\mathbf{x}$ is the vector of nodal voltages and currents, but in HB analysis the vector contains Fourier coefficients. The function values can be obtained directly from the model equations, and the derivatives either by numerical perturbation or directly from the model equations.

In order to use more sophisticated methods, the objective function is defined as

\[
F = \tfrac{1}{2}\|\mathbf{f}(\mathbf{x})\|_2^2 = \tfrac{1}{2}\mathbf{f}(\mathbf{x})^{T}\mathbf{f}(\mathbf{x}). \tag{3.1}
\]

The gradient then is

\[
\mathbf{g} = \nabla F = \mathbf{J}^{T}\mathbf{f}(\mathbf{x}), \tag{3.2}
\]

where $\mathbf{J}$ is the Jacobian matrix. The Hessian matrix

\[
\mathbf{H} = \nabla^2 F = \frac{\partial \mathbf{g}}{\partial \mathbf{x}} = \frac{\partial \mathbf{J}^{T}}{\partial \mathbf{x}}\mathbf{f}(\mathbf{x}) + \mathbf{J}^{T}\mathbf{J} \tag{3.3}
\]

can be obtained, but it would need expensive numerical computation.

3.2 Line-search methods

This thesis studies damped iteration methods, where the new iterate $\mathbf{x}_{k+1}$ at the $k$th iteration is

\[
\mathbf{x}_{k+1} = \mathbf{x}_k + \lambda_k \Delta\mathbf{x}_k. \tag{3.4}
\]

The damping factor $\lambda_k$, $0 < \lambda_k \le 1$, and the update $\Delta\mathbf{x}_k$ depend on the iteration method used. For example, the updates for the NR, the steepest descent (SD), and the conjugate gradient (CG) methods are

\[
\Delta\mathbf{x}_k^{\mathrm{NR}} = -\mathbf{J}_k^{-1}\mathbf{f}_k, \tag{3.5}
\]
\[
\Delta\mathbf{x}_k^{\mathrm{SD}} = -\mathbf{J}_k^{T}\mathbf{f}_k = -\mathbf{g}_k, \tag{3.6}
\]
\[
\Delta\mathbf{x}_k^{\mathrm{CG}} = -\mathbf{g}_k + \beta_{k-1}\Delta\mathbf{x}_{k-1}, \tag{3.7}
\]

respectively, where $\beta$ can be computed in many ways, e.g., by using the Fletcher–Reeves formula [39]

\[
\beta_{k-1} = \frac{\|\mathbf{g}_k\|^2}{\|\mathbf{g}_{k-1}\|^2}. \tag{3.8}
\]

Alternative formulas can be found, e.g., in [40].

The local convergence of the NR method is quadratic, i.e.,

\[
\|\mathbf{x}_{k+1} - \mathbf{x}^*\| < K\|\mathbf{x}_k - \mathbf{x}^*\|^2, \tag{3.9}
\]

where $\mathbf{x}^*$ is the solution of the nonlinear equation, and $K$ is a constant.

The nonmonotone approach to line search has been proposed in several papers, e.g., [41, 42]. The idea is that close to the solution the NR iteration converges, i.e., $\|\mathbf{x}_{k+1} - \mathbf{x}^*\| < \|\mathbf{x}_k - \mathbf{x}^*\|$, even if the norm of the objective function does not decrease. If this is the case, some nonmonotonicity in the decrease of the function norm can be allowed.

3.3 Trust-region methods

A trust region $B = \{\mathbf{x} \mid \|\mathbf{x} - \mathbf{x}_k\| \le \delta\}$, where $\delta$ is the trust-region radius, is the region where the linear or quadratic model $m(\mathbf{x})$ is assumed to approximate $\mathbf{f}(\mathbf{x})$. In trust-region methods, the iteration step, $\Delta\mathbf{x}_k$, is obtained by minimizing the model within the trust region:

\[
\min_{\|\Delta\mathbf{x}\| \le \delta} m(\mathbf{x}_k + \Delta\mathbf{x}). \tag{3.10}
\]

The trust-region radius, $\delta$, is adaptively adjusted during the iteration. The quality of the linear model

\[
\mathbf{L}(\Delta\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}_k\Delta\mathbf{x}_k \tag{3.11}
\]

is monitored using the gain ratio

\[
\rho = \frac{F(\mathbf{x}_k) - F(\mathbf{x}_k + \Delta\mathbf{x}_k)}{L(0) - L(\Delta\mathbf{x}_k)}, \tag{3.12}
\]

i.e., the ratio between the actual and predicted decrease in function value. A large value of $\rho$ indicates that the linear model is good and the trust-region size can be increased. A small $\rho$ indicates a poor model, and a smaller step size should be used.

There are several trust-region methods, like dogleg [43], Levenberg–Marquardt [44, 45], the tensor method with a 2D trust region [46], and so on. The nonmonotone strategy is reported to have been used with trust-region methods [47, 48].

3.4 Dogleg

As an example of trust-region methods, dogleg (DL) is presented in the following. The method combines the NR and SD methods as illustrated in Fig. 3.1. If the NR step is inside the trust region, it is accepted as a trial step.
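Otherwise, the point that minimizes the objective function in the direction of SD, the Cauchy point (CP), is computed, i.e., the minimizer of the linear model of $F(\mathbf{x} + \alpha\Delta\mathbf{x}^{\mathrm{SD}})$ is

\[
\alpha = -\frac{(\Delta\mathbf{x}^{\mathrm{SD}})^{T}\mathbf{J}^{T}\mathbf{f}}{\|\mathbf{J}\Delta\mathbf{x}^{\mathrm{SD}}\|^2} = \frac{\|\mathbf{g}\|^2}{\|\mathbf{J}\mathbf{g}\|^2}. \tag{3.13}
\]

If the CP is outside the trust region, a damped SD step to the trust-region boundary is taken. When the CP is inside the trust region, a step is taken to the trust-region boundary between the CP and the NR point:

\[
\Delta\mathbf{x}^{\mathrm{DL}} = \alpha\Delta\mathbf{x}^{\mathrm{SD}} + \beta(\Delta\mathbf{x}^{\mathrm{NR}} - \alpha\Delta\mathbf{x}^{\mathrm{SD}}). \tag{3.14}
\]

By defining $\mathbf{a} := \alpha\Delta\mathbf{x}^{\mathrm{SD}}$, $\mathbf{b} := \Delta\mathbf{x}^{\mathrm{NR}}$, and $c := \mathbf{a}^{T}(\mathbf{b} - \mathbf{a})$, the value of $\beta$ for which $\|\mathbf{a} + \beta(\mathbf{b} - \mathbf{a})\| = \delta$ is computed as follows:

\[
\beta =
\begin{cases}
\dfrac{-c + \sqrt{c^2 + \|\mathbf{b} - \mathbf{a}\|^2(\delta^2 - \|\mathbf{a}\|^2)}}{\|\mathbf{b} - \mathbf{a}\|^2}, & \text{if } c \le 0, \\[2ex]
\dfrac{\delta^2 - \|\mathbf{a}\|^2}{c + \sqrt{c^2 + \|\mathbf{b} - \mathbf{a}\|^2(\delta^2 - \|\mathbf{a}\|^2)}}, & \text{if } c > 0.
\end{cases} \tag{3.15}
\]

Figure 3.1. Dogleg step.

The whole DL algorithm is as follows:

ALGORITHM 2 DL($\mathbf{x}_0$, $\delta_0$)
1. Set $k = 0$, $\mathbf{x} = \mathbf{x}_0$, and $\delta = \delta_0$.
2. Compute $\mathbf{g} = \mathbf{J}^{T}\mathbf{f}$.
3. While $\|\mathbf{f}(\mathbf{x})\| > \varepsilon$ and $\|\mathbf{g}\| > \varepsilon$ and $k < k_{\max}$
   (a) Compute CP.
   (b) Solve NR step $\Delta\mathbf{x}^{\mathrm{NR}}$.
   (c) Compute $\Delta\mathbf{x}^{\mathrm{DL}}$.
   (d) If solution found, end iteration.
   (e) $\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta\mathbf{x}^{\mathrm{DL}}$.
   (f) $\rho = \dfrac{F(\mathbf{x}_k) - F(\mathbf{x}_{k+1})}{L(0) - L(\Delta\mathbf{x}^{\mathrm{DL}})}$.
   (g) If $\rho > 0$, accept step and compute $\mathbf{g}_k$.
   (h) If $\rho > 0.75$, then $\delta = \max\{\delta, 3\|\Delta\mathbf{x}^{\mathrm{DL}}\|\}$.
   (i) If $\rho < 0.75$, then $\delta = \delta/2$.
   (j) Set $k := k + 1$.
4. EndWhile.

3.5 Tensor methods

Tensor methods with line search were presented in [49]. In [46], tensor methods with 2D trust-region methods were introduced. In these methods, the quadratic model is

\[
m(\mathbf{x} + \Delta\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}_k\Delta\mathbf{x} + \tfrac{1}{2}\mathbf{T}_k\Delta\mathbf{x}\Delta\mathbf{x}, \tag{3.16}
\]

where $\mathbf{T}_k$ is the tensor obtained from interpolating past function values. Although a quadratic model is used, there is no need for the Hessian matrix. The iteration update $\Delta\mathbf{x}$ is found by minimizing $\|m(\mathbf{x} + \Delta\mathbf{x})\|$.

In [PII], nonmonotone trust-region and line-search tensor methods are introduced.

3.6 Multilevel Newton–Raphson

Multilevel iteration methods are based on the hierarchical analysis of a partitioned circuit.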
For hierarchical analysis, concepts like diakoptics and tearing were introduced in the 1970s [50, 51, 52, 53, 54]. In the 1990s, the term domain decomposition was connected to these methods [55]. In these methods, the linear or linearized circuit equations are ordered into bordered block diagonal (BBD) form, which can be decomposed into separately solved submatrices. The equations are solved by using hierarchical LU factorization and forward-backward substitution. The BBD ordering of the matrix can even be done recursively on multiple levels of hierarchy. These methods have been efficiently utilized for parallel computation in DC and transient analysis. The BBD formulation and efficient equation solvers are also used for MOR methods [56].

Consider a circuit that has $n$ nodes and that can be decomposed into $m$ subcircuits consisting of $n_i$ internal nodes and $n_{E_i}$ external connection nodes. The nonlinear systems of nodal equations for the internal and external nodes can be written as

\[
\mathbf{f}_i(\mathbf{x}_i, \mathbf{x}_E) = \mathbf{0}, \qquad \mathbf{f}_E(\mathbf{x}_1, \ldots, \mathbf{x}_m, \mathbf{x}_E) = \mathbf{0}, \tag{3.17}
\]

respectively, where $\mathbf{x}_i$ is the internal nodal voltage vector of subcircuit $i$, $\mathbf{x}_E$ contains the voltages of the external connection nodes of the subcircuits, and $\mathbf{f}_i$ and $\mathbf{f}_E$ are the corresponding nodal equations. The Jacobian matrix $\mathbf{J}$ has the BBD form [57]:

\[
\mathbf{J} =
\begin{bmatrix}
\mathbf{A}_1 & & & & \mathbf{B}_1 \\
 & \mathbf{A}_2 & & & \mathbf{B}_2 \\
 & & \ddots & & \vdots \\
 & & & \mathbf{A}_m & \mathbf{B}_m \\
\mathbf{C}_1 & \mathbf{C}_2 & \cdots & \mathbf{C}_m & \mathbf{D}
\end{bmatrix}, \tag{3.18}
\]

where

\[
\mathbf{A}_i = \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}, \quad
\mathbf{B}_i = \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_E}, \quad
\mathbf{C}_i = \frac{\partial \mathbf{f}_E}{\partial \mathbf{x}_i}, \quad
\mathbf{D} = \frac{\partial \mathbf{f}_E}{\partial \mathbf{x}_E}. \tag{3.19}
\]

The function vector $\mathbf{f}_E$, as well as the matrix $\mathbf{D}$, can be further decomposed into parts that contain the contributions of the circuit elements of the main circuit and of each subcircuit:

\[
\mathbf{f}_E = \mathbf{f}_{E,0} + \sum_{i=1}^{m}\mathbf{f}_{E_i}, \tag{3.20}
\]
\[
\mathbf{D} = \mathbf{D}_{E,0} + \sum_{i=1}^{m}\mathbf{D}_i. \tag{3.21}
\]

In the BBD formulation above, the decomposition is performed on the linear equation level, but if the circuit is partitioned before linearization, then, on the nonlinear equation level, nonlinear analysis methods like MLNR methods or Multilevel Newton Analysis (MLNA) [58] can be applied.
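As an illustration of how the BBD structure of Eq. (3.18) decouples the subcircuits, the following sketch solves a linear BBD system by eliminating each block $\mathbf{A}_i$ separately and then solving a reduced (Schur complement) system for the external variables. The random test blocks and all names are assumptions of this example, and dense `numpy.linalg.solve` calls stand in for the sparse hierarchical LU factorization mentioned above.

```python
import numpy as np

def solve_bbd(A, B, C, D, b_int, b_ext):
    """Solve the BBD system of Eq. (3.18) block by block.
    A[i], B[i], C[i]: blocks of subcircuit i; D: interconnection block."""
    S = D.copy()           # reduced interconnection matrix
    rhs = b_ext.copy()     # reduced right-hand side
    partial = []
    for Ai, Bi, Ci, bi in zip(A, B, C, b_int):
        # Each subcircuit is processed independently (could run in parallel).
        sol = np.linalg.solve(Ai, np.column_stack([Bi, bi]))
        AinvB, Ainvb = sol[:, :-1], sol[:, -1]
        S -= Ci @ AinvB
        rhs -= Ci @ Ainvb
        partial.append((AinvB, Ainvb))
    x_ext = np.linalg.solve(S, rhs)
    # Back-substitution for the internal variables of each subcircuit.
    x_int = [Ainvb - AinvB @ x_ext for AinvB, Ainvb in partial]
    return x_int, x_ext

rng = np.random.default_rng(0)
n_int, n_ext = [3, 4], 2               # assumed block sizes
A = [rng.standard_normal((n, n)) + 10 * np.eye(n) for n in n_int]
B = [rng.standard_normal((n, n_ext)) for n in n_int]
C = [rng.standard_normal((n_ext, n)) for n in n_int]
D = rng.standard_normal((n_ext, n_ext)) + 10 * np.eye(n_ext)
b_int = [rng.standard_normal(n) for n in n_int]
b_ext = rng.standard_normal(n_ext)

x_int, x_ext = solve_bbd(A, B, C, D, b_int, b_ext)
x_bbd = np.concatenate(x_int + [x_ext])

# Reference: assemble the full matrix and solve it directly.
N = sum(n_int) + n_ext
J = np.zeros((N, N))
off = 0
for Ai, Bi, Ci in zip(A, B, C):
    n = Ai.shape[0]
    J[off:off + n, off:off + n] = Ai
    J[off:off + n, N - n_ext:] = Bi
    J[N - n_ext:, off:off + n] = Ci
    off += n
J[N - n_ext:, N - n_ext:] = D
x_ref = np.linalg.solve(J, np.concatenate(b_int + [b_ext]))
print(np.max(np.abs(x_bbd - x_ref)))
```

Only the small reduced system couples the subcircuits, which is why the per-block work can be distributed over several processing elements.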
Multilevel methods have been applied to DC and transient analyses to solve the system of (discretized) nonlinear equations [59, 60, 61, 62, 63, 64, 65]. They can also be used in the HB method [66] as well as in the simulation of micro-electro-mechanical systems (MEMS) [67, 68, 69] and mixed circuit/device systems [70, 71].

The multilevel methods can be effectively parallelized [59, 60, 61, 63, 64, 65, 72], because they apply circuit hierarchy in a natural way. The circuit equations as well as the linearized equations can be processed in parallel.

One of the first MLNR methods, MLNA [58], performs the iterations on multiple levels. Between outer iterations, the external variables are kept constant and only the inner variables of the subcircuits are iterated:

\[
\mathbf{x}_i^{k,j+1} = \mathbf{x}_i^{k,j} - \left(\mathbf{A}_i^{k,j}\right)^{-1}\mathbf{f}_i\!\left(\mathbf{x}_i^{k,j}, \mathbf{x}_{E_i}^{k}\right), \tag{3.22}
\]

where $j$ is the inner iteration index. The inner iteration is stopped at an error level $\tau = \min(\tau^0, \|\Delta\mathbf{x}_E\|^2)$, where $\tau^0$ is the maximum allowed error level; this is needed for quadratic convergence of the outer-level iteration [58]. The initial guess for the inner variables, $\mathbf{x}_i^{k,0}$, can be the same at every inner iteration, or it may be the ending value of the previous iteration. The main-circuit variables are iterated using the subcircuits as macromodels. The MLNA [58] is summarized in the following.

ALGORITHM 3 MLNA(circuit)
1. Set $\mathbf{x}_E^0$, $\varepsilon$, and $\tau^0$.
2. Begin outer iteration: Set $k = 0$.
3. Begin inner iterations for all subsystems $i$ (in parallel): Set $j = 0$ and $\mathbf{x}_i^{k,0}$.
   (a) Solve $\mathbf{A}_i^{k,j}\Delta\mathbf{x}_i^{k,j} = -\mathbf{f}_i(\mathbf{x}_i^{k,j}, \mathbf{x}_{E_i}^{k})$.
   (b) Set $\mathbf{x}_i^{k,j+1} = \mathbf{x}_i^{k,j} + \Delta\mathbf{x}_i^{k,j}$.
   (c) Set $j = j + 1$.
   (d) If $\|\Delta\mathbf{x}_i^{k,j}\| > \tau$, go to Step 3 (a).
4. End inner iteration.
5. Solve $\left(\mathbf{D}_{E_0}^{k} + \sum_{i=1}^{m}\mathbf{D}_{\mathrm{Sub},i}^{k}\right)\Delta\mathbf{x}_E = -\left(\mathbf{f}_{E_0}^{k} + \sum_{i=1}^{m}\mathbf{f}_{\mathrm{Sub},i}^{k}\right)$.
6. $\tau = \min(\tau^0, \|\Delta\mathbf{x}_E\|^2)$.
7. Set $k = k + 1$.
8. End outer iteration if $\|\Delta\mathbf{x}_E\| < \varepsilon$.

A method that utilizes parallel computing and has improved global convergence properties is presented in [PIII].
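The structure of Algorithm 3 can be sketched on a toy DC problem. The circuit below (one external node fed through a conductance, two subcircuits each consisting of a conductance and a diode on one internal node), the element values, the lower bound on $\tau$, and the final inner solve are assumptions made for this illustration; this is not the implementation of [58] or [PIII]. The subcircuit macromodel is taken here as $\mathbf{D}_i - \mathbf{C}_i\mathbf{A}_i^{-1}\mathbf{B}_i$, evaluated after the inner iteration has converged.

```python
import math

# Assumed toy circuit: source E behind conductance G0 drives the external node;
# subcircuit i is a conductance G_SUB[i] to an internal node loaded by a diode.
I_S, V_T = 1e-14, 0.02585
G0, E = 1e-3, 1.0
G_SUB = (1e-3, 2e-3)

def f_sub(xi, xe, g):
    """Internal-node equation f_i(x_i, x_E) of one subcircuit."""
    return g * (xi - xe) + I_S * (math.exp(xi / V_T) - 1.0)

def a_sub(xi, g):
    """A_i = d f_i / d x_i."""
    return g + (I_S / V_T) * math.exp(xi / V_T)

def inner_newton(xi, xe, g, tau, max_iter=500):
    """Inner NR iteration of Eq. (3.22) with x_E kept constant."""
    for _ in range(max_iter):
        dx = -f_sub(xi, xe, g) / a_sub(xi, g)
        xi += dx
        if abs(dx) <= tau:
            break
    return xi

def mlna(xe=0.0, eps=1e-9, tau0=1e-6, max_outer=50):
    xi = [0.0] * len(G_SUB)
    tau = tau0
    for k in range(max_outer):
        # Step 3: the subcircuits are independent and could run in parallel.
        xi = [inner_newton(x, xe, g, tau) for x, g in zip(xi, G_SUB)]
        # Step 5: here B_i = C_i = -g and D_i = g, so the macromodel of
        # subcircuit i contributes g - g*g/A_i to the reduced Jacobian.
        d_red = G0 + sum(g - g * g / a_sub(x, g) for x, g in zip(xi, G_SUB))
        f_e = G0 * (xe - E) + sum(g * (xe - x) for x, g in zip(xi, G_SUB))
        dxe = -f_e / d_red
        xe += dxe
        # Step 6, with an assumed floor to stay above rounding noise.
        tau = max(min(tau0, dxe * dxe), 1e-14)
        if abs(dxe) < eps:
            break
    # Assumed final inner solve so that x_i matches the last x_E.
    xi = [inner_newton(x, xe, g, 1e-14) for x, g in zip(xi, G_SUB)]
    return xe, xi, k + 1

xe_sol, xi_sol, n_outer = mlna()
f_e_sol = G0 * (xe_sol - E) + sum(g * (xe_sol - x) for x, g in zip(xi_sol, G_SUB))
print(xe_sol, xi_sol, n_outer)
```

The inner tolerance tightens as the outer update shrinks, so early outer iterations do not spend effort on solving the subcircuits more accurately than the outer iterate warrants.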
4. Parallel processing in circuit simulation

Traditionally, computer software has been designed for serial computation. Parallel computing, in turn, uses multiple processing elements simultaneously to perform the computation concurrently (or in parallel). The computation problem is decomposed into independent parts so that each processing element can execute its part of the algorithm independently. The processing elements can be a computer with multiple processors (or cores), a network of workstations (NOW), or specialized hardware like the graphics processing unit (GPU), which can also be used for double-precision floating-point computations.

Parallel processing can be performed in hardware where single- or multiprocessor computers are connected in a network and a software backplane is used to control the processing. Such a system is treated as a single multiprocessor computer (virtual machine). This kind of computing is called networked computing or distributed computing.

Of the many types of architectures, multiple instruction stream / multiple data stream (MIMD) architectures execute multiple instruction streams in parallel on multiple data [32]. Most multiprocessor computers belong to this class. The GPU architecture is a so-called single instruction / multiple data (SIMD) architecture, where a single instruction (e.g., addition) can be performed on multiple data simultaneously.

The memory in a parallel computer is either shared memory (shared between all processing elements in a single address space) or distributed memory (in which each processing element has its own local address space).

In the message-passing programming model, each processor has its own local memory, and message passing is used to deliver data between processors. In the shared-memory model, the processors have shared data [32]. There are programming packages, like PVM [73] and MPI [74], available for message-passing programming.
In the multithread computing model, the algorithm is designed such that concurrent processing is performed in different threads. Threads are the smallest units of processing that can be scheduled by an operating system, and they can even be executed simultaneously in different processing elements (typically in cores). Usually, multithreading utilizes shared memory.

The history of parallel circuit simulation is long. Decades ago, supercomputers were used for parallel processing [75]. More recently, studies on parallel circuit simulation on multicore processors have been presented, e.g., in [76], and the GPU has also been used for the same purpose [77], along with proposed applications [78, 79]. Multilevel iteration methods suitable for NOWs are presented in [63, 64]. A great deal of effort has been put into solving sparse linear systems in circuit simulation, e.g., [80, 81, 82, 83]. Work on parallel HB has also been reported, e.g., in [84, 85, 86].

In this thesis, a variant of the MLNA method with some convergence aiding [PIII] and an implementation of parallel HB analysis for shared-memory computers [PIV] are presented.

5. Model-order reduction

5.1 Overview

The goal of model-order reduction (MOR) is to produce a smaller model from a large one. Linear MOR concentrates on the reduction of RLCM circuits or RLCM parts of nonlinear circuits. Nonlinear MOR also reduces circuits consisting of transistors and other nonlinear components. An overview of MOR is presented in [87, 88, 89, 90].

One of the first proposed MOR methods was the asymptotic waveform evaluation (AWE) [91] in 1990. After that, a large number of projection-based MOR methods have been proposed. The AWE algorithm uses the Padé approximation to obtain an approximation of the original transfer function. However, the direct matching of high-order moments causes numerical instability problems.
A solution to the instability was to use implicit moment matching, i.e., to project the original moment space onto an orthonormal Krylov subspace. The first such method was the Padé via Lanczos (PVL) [92], where the Lanczos process is used to generate the Krylov subspace. Alternatively, the Arnoldi process can be used to generate the subspace.

The passive reduced-order interconnect macromodeling algorithm (PRIMA) [93], proposed in 1998, uses the Arnoldi process to generate the Krylov subspace. The constructed projection matrix is used to perform a congruence transformation of the original system into a smaller system. PRIMA generates provably passive reduced-order models (ROMs). The easy implementation and the guaranteed passivity made PRIMA very popular. The structure-preserving reduced-order interconnect macromodeling algorithm (SPRIM) [94], proposed in 2004, added a structure-preserving feature to the Arnoldi process. Thus, the reciprocity of the system is preserved.

Methods based on singular value decomposition (SVD) have also been presented, e.g., truncated-balanced realization (TBR) [95, 96, 97, 98]. The idea of these methods is to use different balancing techniques to capture specific system properties. The methods have computable error bounds. However, they have been considered computationally very expensive.

Yet another approach to MOR is nodal-elimination methods like TICER [99, 100], which is used for the reduction of RC circuits. Its extension [101] is used for RLC circuits. The outcome of these methods is by definition an RC or RLC circuit.

One approach to MOR is to use partitioning-based reduction. In general, the original circuit can be divided into subcircuits (or matrices into submatrices), and each subcircuit can be analyzed separately with any of the previously presented methods. The partitioning itself can be performed for the graph constructed from the RLC circuit. There are standard methods for partitioning the graph, e.g.
hMETIS [102]. However, some more sophisticated methods are substantially based on partitioning. For them, the partitioning is not an additional part of the MOR flow but an essential part of the method. One of these partitioning-based MOR methods was presented by Liao and Dai in 1999 [103]. Another partitioning-based method, SparseRC for RC circuits, is presented in [104]. For RLC circuits, PartMOR [PX] and its extension for RLCM circuits [105] were proposed. Some other partitioning-based approaches to MOR are presented in [56, 106, 107, 108].

There are several MOR methods for nonlinear systems of equations, but this field of MOR is beyond the scope of this thesis, as is parametric MOR, where the model is parameterized with respect to, e.g., temperature or geometric dimensions.

One often overlooked issue in the development of new MOR methods is the realizability of the reduced-order models. If a potential MOR method produces only a reduced mathematical model of a transfer function or state equations instead of a realizable circuit macromodel, simulation tools may need to be modified to handle these mathematical representations. Also, a realized RLC netlist allows the usage of all analysis modes of a simulator; e.g., it is simpler to use an RLC circuit in transient analysis than a frequency-domain representation.

In [109], the macromodel realizations for MOR are well studied, but they contain voltage-controlled current and charge sources in addition to standard RLC elements. Depending on the design flow, this may severely limit the utility and usability of MOR. RLCSYN [110] can be utilized with structure-preserving (reciprocal) methods like SPRIM and SAPOR [111]. The TICER and Liao–Dai methods produce, as mentioned before, RC circuits.
In order to give a more detailed idea of MOR, some methods used in [PVII–PX] are presented in the following: PRIMA with the macromodel realization method proposed by Matsumoto, and the partitioning-based method proposed by Liao and Dai.

5.2 PRIMA

The passive reduced-order interconnect macromodeling algorithm (PRIMA) [93] is based on the block Arnoldi algorithm and employs congruence transformations to project a large system of equations onto a smaller subspace so that passivity is preserved during reduction. PRIMA uses the Arnoldi iteration as a numerically stable method of generating the Krylov subspace to match $K = \lfloor q/N \rfloor$ block moments of the $N$-port y-parameters, where $q$ is the order of reduction.

5.2.1 Equation formulation

The MNA equations of an $N$-port can be expressed as follows:

\[
\begin{aligned}
\mathbf{C}\frac{\mathrm{d}\mathbf{x}(t)}{\mathrm{d}t} &= -\mathbf{G}\mathbf{x}(t) + \mathbf{B}\mathbf{u}_p(t), \\
\mathbf{i}_p(t) &= \mathbf{L}^{T}\mathbf{x}(t),
\end{aligned} \tag{5.1}
\]

where $\mathbf{x}(t)$ contains the nodal voltages and the branch currents of ports and inductances ($\mathbf{x}(0) = \mathbf{0}$), and $\mathbf{u}_p$ and $\mathbf{i}_p$ denote the port voltages and currents, respectively. $\mathbf{B} = \mathbf{L}$, where $\mathbf{B} \in \mathbb{R}^{n \times N}$ is a selector matrix consisting of ones, minus ones, and zeroes, and $n$ is the total number of unknowns. The MNA matrices $\mathbf{G} \in \mathbb{R}^{n \times n}$ and $\mathbf{C} \in \mathbb{R}^{n \times n}$ can be partitioned as

\[
\mathbf{C} \equiv \begin{bmatrix} \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{H} \end{bmatrix}, \qquad
\mathbf{G} \equiv \begin{bmatrix} \mathbf{N} & \mathbf{E} \\ -\mathbf{E}^{T} & \mathbf{0} \end{bmatrix}, \qquad
\mathbf{x} \equiv \begin{bmatrix} \mathbf{v} \\ \mathbf{i} \end{bmatrix}. \tag{5.2}
\]

$\mathbf{N}$, $\mathbf{Q}$, and $\mathbf{H}$ are symmetric non-negative definite matrices containing the stamps from resistances, capacitances, and inductances, respectively. Vector $\mathbf{v}$ is the nodal voltage vector, and $\mathbf{i}$ contains the branch currents of ports and inductances. The matrix $\mathbf{E}$ represents the current variable contributions in the MNA equations.

Define $\mathbf{A} \equiv -\mathbf{G}^{-1}\mathbf{C}$ and $\mathbf{R} \equiv \mathbf{G}^{-1}\mathbf{B}$. Taking the Laplace transformation of (5.1) and solving for the port current variables, the y-parameter matrix $\mathbf{Y}(s)$ is

\[
\mathbf{Y}(s) = \mathbf{L}^{T}(\mathbf{1} - s\mathbf{A})^{-1}\mathbf{R}. \tag{5.3}
\]

The block moments of $\mathbf{Y}(s)$ are defined as the coefficients of the Taylor expansion of $\mathbf{Y}$ around $s = 0$:

\[
\mathbf{Y}(s) = \mathbf{M}_0 + \mathbf{M}_1 s + \mathbf{M}_2 s^2 + \cdots. \tag{5.4}
\]

The block moments can be computed using the relation

\[
\mathbf{M}_i = \mathbf{L}^{T}\mathbf{A}^{i}\mathbf{R}. \tag{5.5}
\]

The reduction happens when this series is truncated to the $K$ first moments. PRIMA does not compute these moments directly, but other methods, such as PartMOR, compute the first 1–3 moments directly using (5.5).

5.2.2 PRIMA algorithm

The generation of the projection matrix $\mathbf{X}$ is considered in the following algorithm:

ALGORITHM 4 PRIMA($\mathbf{G}$, $\mathbf{C}$, $\mathbf{B}$, $K$)
1. Solve $\mathbf{G}\mathbf{R} = \mathbf{B}$ for $\mathbf{R}$.
2. Block orthonormalization: compute the QR factorization $\mathbf{R} = \mathbf{X}_0\mathbf{T}$.
3. For $j = 1, \ldots, K$
   (a) Solve for $\mathbf{X}_j$ in $\mathbf{G}\mathbf{X}_j = -\mathbf{C}\mathbf{X}_{j-1}$.
   (b) For $i = 1, \ldots, j$ (modified Gram–Schmidt orthogonalization)
       i. $\mathbf{\Delta} = \mathbf{X}_{j-i}^{T}\mathbf{X}_j$.
       ii. $\mathbf{X}_j = \mathbf{X}_j - \mathbf{X}_{j-i}\mathbf{\Delta}$.
       iii. $\mathbf{H}_{j-i,j-1} = \mathbf{H}_{j-i,j-1} + \mathbf{\Delta}$.
   (c) EndFor.
   (d) Block orthonormalization: compute the QR factorization $\mathbf{X}_j = \mathbf{X}_j\mathbf{H}_{j,j-1}$.
4. EndFor.
5. Collect the generated vector blocks $\mathbf{X} = [\mathbf{X}_0 \ldots \mathbf{X}_{K-1}]$.

Using the projection matrix $\mathbf{X}$, PRIMA transforms (5.1) into

\[
\begin{aligned}
\tilde{\mathbf{C}}\frac{\mathrm{d}\tilde{\mathbf{x}}(t)}{\mathrm{d}t} &= -\tilde{\mathbf{G}}\tilde{\mathbf{x}}(t) + \tilde{\mathbf{B}}\mathbf{u}(t), \\
\mathbf{i}(t) &= \tilde{\mathbf{L}}^{T}\tilde{\mathbf{x}}(t),
\end{aligned} \tag{5.6}
\]

where the reduced matrices are

\[
\tilde{\mathbf{C}} = \mathbf{X}^{T}\mathbf{C}\mathbf{X}, \quad
\tilde{\mathbf{G}} = \mathbf{X}^{T}\mathbf{G}\mathbf{X}, \quad
\tilde{\mathbf{B}} = \mathbf{X}^{T}\mathbf{B}, \quad
\tilde{\mathbf{L}} = \mathbf{X}^{T}\mathbf{L}. \tag{5.7}
\]

These types of transformations are known as congruence transformations. The matrix $\mathbf{X}$ is an $n \times q$ matrix, which is obtained after $q/N + 1$ iterations of the block Arnoldi algorithm (the extra step is not necessary if $q/N$ is an integer).

5.2.3 Eigenvalue decomposition

The mathematical formulation in this subsection is based on [109] and [112]. The reduced model in (5.6) is described by dense block matrices. If direct stamping methods [93, 109], where the matrices are directly stamped into components, are used for macromodel creation, they might create many new components and, thus, lead to realizations that have more components than the original circuit. Therefore, more sophisticated macromodel realization methods are needed. Most of these methods need eigenvalue decomposition as a preprocessing step, and it is presented in the following.

If the first equation of Eq.
(5.6) is premultiplied by $\tilde{\mathbf{G}}^{-1}$, and a basis of eigenvectors is assumed to exist for the matrix $\tilde{\mathbf{G}}^{-1}\tilde{\mathbf{C}}$, then $\tilde{\mathbf{G}}^{-1}\tilde{\mathbf{C}} = \mathbf{S}\mathbf{\Lambda}\mathbf{S}^{-1}$, where $\mathbf{\Lambda}$ is a diagonal matrix containing the eigenvalues of $\tilde{\mathbf{G}}^{-1}\tilde{\mathbf{C}}$ as its diagonal elements and $\mathbf{S}$ has the corresponding $q$ eigenvectors as its columns. After premultiplying by $\mathbf{S}^{-1}$, Eq. (5.6) can be written as

\[
\begin{aligned}
\mathbf{S}^{-1}\mathbf{S}\mathbf{\Lambda}\mathbf{S}^{-1}\frac{\mathrm{d}\tilde{\mathbf{x}}(t)}{\mathrm{d}t} &= -\mathbf{S}^{-1}\tilde{\mathbf{x}}(t) + \mathbf{S}^{-1}\tilde{\mathbf{G}}^{-1}\tilde{\mathbf{B}}\mathbf{u}(t), \\
\mathbf{i}(t) &= \tilde{\mathbf{L}}^{T}\mathbf{S}\mathbf{S}^{-1}\tilde{\mathbf{x}}(t),
\end{aligned} \tag{5.8}
\]

or, with the change of variables $\mathbf{S}^{-1}\tilde{\mathbf{x}} \to \tilde{\mathbf{x}}$, as

\[
\begin{aligned}
\mathbf{\Lambda}\frac{\mathrm{d}\tilde{\mathbf{x}}(t)}{\mathrm{d}t} &= -\mathbf{1}\tilde{\mathbf{x}}(t) + \mathbf{H}\mathbf{u}(t), \\
\mathbf{i}(t) &= \mathbf{E}^{T}\tilde{\mathbf{x}}(t),
\end{aligned} \tag{5.9}
\]

where $\mathbf{H} = \mathbf{S}^{-1}\tilde{\mathbf{G}}^{-1}\tilde{\mathbf{B}}$, $\mathbf{E} = \mathbf{S}^{T}\tilde{\mathbf{L}}$, and $\mathbf{1}$ is the $q \times q$ identity matrix. Eq. (5.9) has the same dimensions as Eq. (5.6), but the coefficient matrices $\mathbf{\Lambda}$ and $\mathbf{1}$ are diagonal.

The real matrix $\tilde{\mathbf{G}}^{-1}\tilde{\mathbf{C}}$ has $q_r$ real eigenvalues and $q_c$ complex conjugate pairs such that $q = q_r + 2q_c$. Consider one conjugate pair, $\Lambda_m^{r} \pm j\Lambda_m^{i}$. The corresponding eigenvectors and the corresponding rows of the matrices $\mathbf{H}$ and $\mathbf{E}$ in Eq. (5.9) are complex conjugate. Let the corresponding elements of vector $\tilde{\mathbf{x}}$ be $\tilde{x}_m^{r} \pm j\tilde{x}_m^{i}$. The $m$th row of the first equation of (5.9) can be written as

\[
\Lambda_m\frac{\mathrm{d}\tilde{x}_m(t)}{\mathrm{d}t} = -\tilde{x}_m(t) + \sum_{j=1}^{N} H_{mj}u_j(t). \tag{5.10}
\]

If $\tilde{x}_m^{r} \pm j\tilde{x}_m^{i}$ are inserted into Eq. (5.10) and the real and imaginary parts of the equation are required to hold independently, Eq. (5.10) becomes

\[
\begin{aligned}
\Lambda_m^{r}\frac{\mathrm{d}\tilde{x}_m^{r}(t)}{\mathrm{d}t} &= -\tilde{x}_m^{r}(t) + \Lambda_m^{i}\frac{\mathrm{d}\tilde{x}_m^{i}(t)}{\mathrm{d}t} + \sum_{j=1}^{N} H_{mj}^{r}u_j(t), \\
\Lambda_m^{r}\frac{\mathrm{d}\tilde{x}_m^{i}(t)}{\mathrm{d}t} &= -\tilde{x}_m^{i}(t) - \Lambda_m^{i}\frac{\mathrm{d}\tilde{x}_m^{r}(t)}{\mathrm{d}t} + \sum_{j=1}^{N} H_{mj}^{i}u_j(t).
\end{aligned} \tag{5.11}
\]

5.2.4 Macromodel synthesis by Matsumoto's method

The macromodeling method proposed by Matsumoto [112] produces efficient macromodel realizations [112, 109] of the reduced-order models for the matrices obtained from PRIMA. The method is a realization of Eqs. (5.10) and (5.11). The equivalent circuit is presented in Fig. 5.1. The nodal equations, e.g., for the circuit in Fig. 5.1(b), can be expressed as

\[
\sum_{j=1}^{N} H_{mj}U_j = (s\Lambda_m + 1)\tilde{X}_m. \tag{5.12}
\]

Figure 5.1. Matsumoto's equivalent-circuit realization [112]. (a) A port VCCS, (b) realization of a real eigenvalue, and (c) realization of a complex eigenvalue pair.

5.3 Liao–Dai method

In principle, the method proposed by Liao and Dai (here, Liao–Dai) [103] is a partitioning- and macromodel-based RC MOR method, where the circuit is partitioned into subcircuits that are modeled with simple low-order RC circuits. In this section, the original method is presented briefly for reference purposes. In [PVII], the method is investigated further: each step is analyzed carefully, and alternative approaches are proposed and evaluated.

The Liao–Dai method begins by describing the circuit with scattering (S) parameters, where each circuit element is described in S-parameter terms. The goal is to minimize the total number of entries of the S matrix. This is done by decomposing the circuit into subcircuits that have as few ports, i.e., connection nodes between subcircuits, as possible.

The partitioning in the original Liao–Dai method is done by considering the RC netlist as a weighted graph $G(V, E)$. For RC circuits, the vertices $V$ consist of nodes that connect more than two elements, and the edges $E$ represent the adjacency (resistance or capacitance) between nodes in the circuit. For each edge, a weight is determined based on the number of ports in each subcircuit the edge is connected to, if the two subcircuits were joined together.

The partitioning process then chooses the elements to be combined into larger subcircuits. This is done by eliminating edges between the elements until the weight of all remaining edges is greater than a preset maximum weight for an edge. In practice, an edge between two subcircuits is eliminated if the inequality

\[
N_x^2 + N_y^2 > \beta(N_x + N_y - 2)^2 \tag{5.13}
\]

holds, where $N_x$ and $N_y$ are the numbers of ports in the two subcircuits, $x$ and $y$, and $\beta$ is the user-defined contracting factor.
Inequality (5.13) is derived from the size of the corresponding S-parameter matrix: the left-hand side describes the number of entries in the matrices before the elimination, and the right-hand side describes the number of entries after the elimination of an edge. As a result, the circuit is divided into partitions with a small number of ports, i.e., a smaller number of entries in the S-parameter matrix.

At each step, the S parameters of subcircuits are updated by calculating new S parameters for each subcircuit. The S-parameter equations are also truncated to the first two low-order terms, because higher-order terms are not needed for the low-order macromodel synthesis.

After the partitioning is completed, the S parameters of each subcircuit are converted into y parameters as follows:

$$\mathbf{Y} = Z_0^{-1}(\mathbf{1} + \mathbf{S})^{-1}(\mathbf{1} - \mathbf{S}), \tag{5.14}$$

where $Z_0$ is the reference impedance. These y parameters are used to realize the subcircuits with low-order RC macromodels. For an $N$-port, the admittance between the $i$th port and ground is given by the sum of the $i$th row of its Y matrix, $\mathbf{Y}(s)$. The admittance between ports $i$ and $j$ is $-y_{ij}$. The admittances between a port and ground and between pairs of ports are synthesized with R and C elements.

In the original Liao–Dai method [103], the admittance matrices of an RC circuit are synthesized using a moment-matching technique. Once $\mathbf{M}_0$ and $\mathbf{M}_1$ have been calculated for the $N$-port using (5.5), each element of $\mathbf{Y}(s)$ can be expressed as

$$y_{ij} \approx m_0^{ij} + m_1^{ij}s. \tag{5.15}$$

Figure 5.2. Liao–Dai circuit macromodels between pairs of ports.

Figure 5.2 presents the macromodel realizations between two ports $i$ and $j$ in different situations. A terminal macromodel, such as that shown in Fig. 5.3, is also needed for each port. In the case of a one-port, only the terminal macromodel is needed. Depending on the sign of $m_1^{ij}$, different macromodels are used between ports $i$ and $j$:

1. The T circuit in Fig. 5.2(a) may be used if $m_1^{ij} \ge 0$.
Along with the port impedance of Fig. 5.3 at the two ports, this creates a 2Π circuit.

2. If $m_1^{ij}$ is negative, Fig. 5.2(b) must be used instead, which forms, combined with the port model of Fig. 5.3 at the two ports, a Π circuit.

Figure 5.3. Liao–Dai circuit macromodel of a port.

5.3.1 T model

The synthesis of the T model shown in Fig. 5.2(a), where the circuit parameters are determined by matching the moments of $y_{ij}$, is presented first. The T model together with the port model at the two ports (see Section 5.3.3) forms a 2Π model.

By matching the first two moments, $m_0^{ii}$, $m_0^{ij}$, and $m_1^{ij}$, of the series in (5.15) and the relation $m_1^{ii}/m_1^{jj}$ with the 2Π macromodel description, we obtain the values

$$\begin{aligned}
R_{ij1} &= \frac{-\sqrt{m_1^{jj}}}{m_0^{ij}\left(\sqrt{m_1^{ii}} + \sqrt{m_1^{jj}}\right)},\\
R_{ij2} &= \frac{-\sqrt{m_1^{ii}}}{m_0^{ij}\left(\sqrt{m_1^{ii}} + \sqrt{m_1^{jj}}\right)},\\
C_{ij} &= \frac{m_1^{ij}\left(\sqrt{m_1^{ii}} + \sqrt{m_1^{jj}}\right)^2}{\sqrt{m_1^{ii}m_1^{jj}}}.
\end{aligned} \tag{5.16}$$

According to [103], the reason for matching $m_1^{ii}/m_1^{jj}$ instead of matching $m_1^{ii}$ and $m_1^{jj}$ directly is that $m_1^{ii}$ and $m_1^{jj}$ are the second moments of $y_{ii}$ and $y_{jj}$, which are the total contribution of the circuit, not just of a branch between ports $i$ and $j$.

5.3.2 Π model

The above formulas could be used to synthesize all off-diagonal elements of the Y matrix with $m_1^{ij} \ge 0$. However, $m_1^{ij}$ may be negative for circuits with floating capacitances. In this case, a floating capacitance can be used to realize the $y_{ij}$ between port $i$ and port $j$ as presented in Fig. 5.2(b). The admittance-matrix elements of the reduced macromodel in Fig. 5.2(b) are

$$\tilde{y}_{ii} = \tilde{y}_{jj} = -\tilde{y}_{ij} = \frac{1}{R_{ij}} + sC_{ij}. \tag{5.17}$$

Thus

$$R_{ij} = -\frac{1}{m_0^{ij}}, \tag{5.18}$$

$$C_{ij} = -m_1^{ij}. \tag{5.19}$$

5.3.3 Circuit model of a port

For the diagonal elements $y_{ii}$, $y_{i0}$ needs to be synthesized, where

$$y_{i0} = y_{ii} - \sum_{\substack{j=1\\ j\neq i}}^{N}\tilde{y}_{ij} = m_0^{i0} + m_1^{i0}s + \cdots \tag{5.20}$$

The parameter $y_{i0}$ is modeled with the parallel RC circuit in Fig. 5.3. If $m_0^{i0} = 0$, $y_{i0}$ is modeled with a single capacitance.
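Returning briefly to the branch models of Sections 5.3.1 and 5.3.2, the selection between the T and Π models and the element formulas (5.16), (5.18), and (5.19) can be sketched as follows. This is a hedged illustration rather than the code of [103] or [PVII]: the function names are hypothetical, no guarding against zero moments is attempted, and the helper `t_circuit_moments` assumes an ideal T circuit with $R_{ij1}$ attached to port $i$, $R_{ij2}$ attached to port $j$, and $C_{ij}$ from the internal node to ground (an assumption about the topology of Fig. 5.2(a)). The helper is used only for a round-trip check.

```python
import math


def synthesize_branch(m0_ij, m1_ij, m1_ii, m1_jj):
    """Choose and size the branch model between ports i and j.

    Returns ("T", R_ij1, R_ij2, C_ij) following Eq. (5.16) when m1_ij >= 0,
    and ("PI", R_ij, C_ij) following Eqs. (5.18)-(5.19) when m1_ij < 0.
    Zero-valued moments are not handled in this sketch.
    """
    if m1_ij >= 0:
        s_ii = math.sqrt(m1_ii)
        s_jj = math.sqrt(m1_jj)
        r_ij1 = -s_jj / (m0_ij * (s_ii + s_jj))
        r_ij2 = -s_ii / (m0_ij * (s_ii + s_jj))
        c_ij = m1_ij * (s_ii + s_jj) ** 2 / (s_ii * s_jj)
        return ("T", r_ij1, r_ij2, c_ij)
    return ("PI", -1.0 / m0_ij, -m1_ij)


def t_circuit_moments(r1, r2, c):
    """Moments (m0_ij, m1_ij, m1_ii, m1_jj) of an assumed ideal T circuit.

    Obtained by expanding the y parameters of the T circuit around s = 0;
    this helper is an illustrative assumption, not part of the original method.
    """
    total = r1 + r2
    m0_ij = -1.0 / total
    m1_ij = c * r1 * r2 / total**2
    m1_ii = c * (r2 / total) ** 2
    m1_jj = c * (r1 / total) ** 2
    return m0_ij, m1_ij, m1_ii, m1_jj


if __name__ == "__main__":
    # Round trip: moments of a known T circuit should give back its elements.
    moments = t_circuit_moments(100.0, 200.0, 1e-12)
    print(synthesize_branch(*moments))
    # Negative first moment: a floating RC branch (Pi model) is returned.
    print(synthesize_branch(-0.01, -2e-12, 0.0, 0.0))
```

The round trip through `t_circuit_moments` is a convenient sanity check of (5.16): the element values recovered from the moments of a known T circuit should agree with the original ones up to rounding.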
The parameters of the port model are

$$R_{ii} = \frac{1}{m_0^{i0}}, \tag{5.21}$$

$$C_{ii} = m_1^{i0}. \tag{5.22}$$

In case the capacitance $C_{ii}$ computed in Eq. (5.22) is negative, [103] suggests that $C_{ii}$ can be set to zero and all $C_{ij}$ scaled down to keep the total capacitance unchanged. This way, all the resistances and capacitances are non-negative and the total macromodel is passive.

6. Discussion

This thesis presented four approaches to increase the speed and improve the convergence of numerical circuit simulation.

In the first approach, several nonlinear iteration methods were evaluated and further developed. The main emphasis was on convergence, and as a result of the studies in [PI, PII], the Dogleg method has been implemented in APLAC’s DC analysis as one of the many convergence strategies [3]. Also implemented is the nonmonotone norm-reduction method [PI], which the user may choose to use in APLAC’s DC and HB analyses. Only some attention was paid to the computational cost of the methods. In future studies, the possibility of applying nonlinear methods to massive problems has to be taken better into account, especially how to use these methods efficiently with HB analysis.

Even though the utilization of parallel processing for circuit simulation has been studied for decades, it is still a relevant topic since parallel hardware is constantly developing; e.g., multicore processors exist in almost every computer. The multithreaded HB analysis reported in [PIV] is the basis of APLAC’s current HB implementation. The MLNR method [PIII] was available in some older versions of APLAC but was then disabled due to maintenance difficulties.

Preconditioning is an essential part of the development of HB algorithms. Both FD and TD preconditioners have their own advantages. In this thesis, combining the TD preconditioner with the FD preconditioner was suggested in order to benefit from both.
The mixed preconditioners seem to improve the performance at least in some cases and, therefore, most of the preconditioners are implemented in APLAC and can be selected by the user. Unfortunately, TD preconditioners are directly applicable to 1-tone HB problems only. Therefore, mixing of multitone (FD) preconditioning, especially with MRHB, might be a good direction for future studies.

The MOR research presented in this thesis concentrated mainly on partitioning-based methods (excluding GABOR [PIX]). The PartMOR method presented in [PX] has been further developed and improved [105], and the possibility of applying this and other partitioning-based methods efficiently to tightly coupled problems is under study. GABOR [PIX] was one attempt to open up a new approach for projection-based MOR but, unfortunately, was not very successful. However, it was an inspiration for another experimental method: Passive, Reciprocal, and Infinity-Observing Reduction (PRIOR) [113]. To bring these MOR methods, and methods not presented in this thesis, into more widespread use, there is a need for research or, at least, technical development work that takes real-life problems into account.

7. Summary of the publications

7.1 Publication I: Nonmonotone norm-reduction method for circuit simulation

A nonmonotone norm-reduction (line-search) method for aiding the convergence of NR analysis is presented. It has been implemented in APLAC’s DC analysis and tested with some benchmark circuits. The test results showed that the method can reduce the number of line searches during the DC analysis and, thus, increase the speed of the NR iteration.

7.2 Publication II: On nonlinear iteration methods for DC analysis of industrial circuits

This paper concentrates on some trust-region methods: the DL and tensor methods, which should be efficient in the case of nearly singular Jacobian matrices and do not need the computation of Hessian matrices.
The convergence of these methods is also improved using the nonmonotone strategy presented in [PI]. The efficiency of the above-mentioned methods was compared to NR and some CG methods. All the methods have been implemented in the in-house development version of APLAC. Simulations with real-life circuits are presented. The results showed that the DL method, especially with the nonmonotone search strategy, was the most robust of the methods tested.

7.3 Publication III: New multilevel Newton–Raphson method for parallel circuit simulation

In this paper, a variant of the multilevel Newton–Raphson method for parallel circuit simulation is presented. The reduced communication between processors is the motivation to use multilevel methods in a network of workstations. The proof for the quadratic local convergence is given, and with a specific circuit-equation formulation, the multilevel method is shown to be adjustable using line-search methods to achieve better global convergence. Finally, experimental results are presented that show some speed-up compared to the non-multilevel parallel NR method.

7.4 Publication IV: A Parallel harmonic balance simulator for shared memory multicomputer

A parallelization of the HB simulator (APLAC) for a shared-memory multiprocessor computer is presented. The paper shows how to utilize multithreading in some critical operations: the computation of the values of nonlinear elements at each sample point, the computation of matrix-vector products, and the construction and solution of the block-diagonal preconditioner. As a result, a reasonably scalable simulator is achieved.

7.5 Publication V: Mixed preconditioners for harmonic balance Jacobians

The efficiency of a linear iterative solver depends heavily on the preconditioner used. Naturally, most preconditioners for HB equations are in the frequency domain, one of the simplest being the block-diagonal preconditioner.
While this is simple and effective for weakly nonlinear circuits, for highly nonlinear cases, especially for frequency dividers, TD preconditioners, which take nonlinear behavior better into account, become attractive. However, both TD and FD preconditioners may lack some good properties. In some situations, changing the preconditioner during the iteration is needed. In this paper, mixed FD/TD preconditioners for HB Jacobians are presented. The efficiency of mixed preconditioners is demonstrated with realistic simulation examples. The results showed that mixing the preconditioners is mostly a good strategy, but not a superior one.

7.6 Publication VI: Frequency/time block preconditioners for harmonic balance Jacobians

An RF circuit may have a structure where different parts of the HB Jacobian require different preconditioners. [PV] shows how to mix frequency-domain and time-domain preconditioners for full HB Jacobians, but this paper proposes block preconditioners that combine time- and frequency-domain preconditioners such that a different preconditioner can be chosen for each HB Jacobian block separately. The preconditioners were tested with a circuit consisting of two parts, one suitable for an FD preconditioner and the other for a TD preconditioner. The simulation results showed that these preconditioners can improve the convergence for this type of circuit.

7.7 Publication VII: Study and development of an efficient RC-in–RC-out MOR method

The paper presents a partitioning and macromodel-based MOR method for RC circuits. The original method proposed by Liao and Dai [103] is divided into three parts: circuit partitioning, moment calculation, and macromodel synthesis. For each of these parts, alternative approaches are presented. The alternatives are then analyzed, and the most efficient solutions to each of these steps are presented.
As a result, a MOR method that uses hMETIS as the partitioning algorithm, an MNA-based moment calculation, and the simplified macromodeling method is constructed. In other words, the revised method uses the same original idea as in [103], but in a more efficient manner. The revised partitioning-based RC MOR flow was compared to PRIMA and TICER. For most RC circuits, the proposed method outperformed TICER and PRIMA. PRIMA can, of course, also be applied to RLC circuits.

7.8 Publication VIII: Hierarchical model-order reduction flow

This paper presents a hierarchical model-order reduction (HMOR) flow, where the linear parts of a hierarchically defined circuit are divided into independently reducible subcircuits. The impact of the hierarchical structure and circuit partitioning on two MOR methods is discussed and simulation results are presented.

The benefits of performing MOR in a hierarchical manner are as follows. The problem can be divided into parts that can be solved separately, thus allowing faster analysis using parallel processing. It also requires less memory. The repeated subcircuits need to be analyzed only once, and a different MOR method may be chosen for each individual part of the original problem. Circuit partitioning presents a natural way to benefit from hierarchical analysis.

In addition to general discussion, the HMOR flow is demonstrated using two MOR methods: the RC MOR method presented in [PVII] and PRIMA. The simulation examples showed that PRIMA also benefits from the hierarchical analysis, as does the RC MOR method, where the partitioning is a vital step of the method.

7.9 Publication IX: GABOR: Global-approximation-based order reduction

This paper proposes a new approach for the MOR of RLC circuit blocks. Instead of Taylor-series-like local fitting using implicit moment matching, a global approximation of the matrix-valued s-domain transfer function is generated.
Then, the Krylov-like subspace spanned by the moments of this approximation is used to set up the projection matrices needed for MOR. The proposed MOR approach preserves passivity, reciprocity, and the properties of the global approximation. The simulation example, a reduction of a dispersive transmission-line model, verifies the correct operation of the method. However, there are numerical problems with the method that imply that GABOR is not a competitive method. The concept of global approximation is still worth studying.

7.10 Publication X: PartMOR: Partitioning-based realizable model-order reduction method for RLC circuits

PartMOR is an extension of the RC and RL MOR methods presented in [PVII] and [114] into a general-purpose RLC MOR method. This method partitions the RLC circuit into subcircuits that are modeled with two macromodels that are used to match three moments of the original y parameters of each subcircuit. The matching is done simultaneously, partly at DC and partly at infinity. Also, thanks to analysis at both DC and infinity, a singularity of the G matrix at either frequency can be avoided. The test simulations with RC, RL, and RLC circuits are presented, and they show that the performance of PartMOR in terms of accuracy and reduction ratio is generally better than that of SPRIM with the RLCSYN macromodel synthesis method.

Bibliography

[1] L. W. Nagel, SPICE2: A Computer Program to Simulate Semiconductor Circuits. PhD thesis, EECS Department, University of California, Berkeley, 1975.
[2] R. Saleh, S.-J. Jou, and A. R. Newton, Mixed-Mode Simulation and Analog Multilevel Simulation. Boston, MA: Kluwer Academic Publishers, 1994.
[3] APLAC 8.6 manuals, 2012.
[4] M. Valtonen, P. Heikkilä, H. Jokinen, and T. Veijola, “APLAC – object-oriented circuit simulator and design tool,” in Low-power HF Microelectronics: a Unified Approach (G. A. S. Machado, ed.), pp. 333–372, London, UK: IEE, 1996.
[5] Y. Saad and M. H. Schultz, “GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM Journal on Scientific and Statistical Computing, vol. 7, pp. 856–869, 1986.
[6] L. O. Chua and P.-M. Lin, Computer-Aided Analysis of Electronic Circuits: Algorithms and Computational Techniques. Prentice-Hall, 1975.
[7] J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design. New York, NY, USA: John Wiley & Sons, Inc., 1983.
[8] J. Ogrodzki, Circuit Simulation Methods and Algorithms. Boca Raton-Ann Arbor-Tokyo-London: CRC Press, 1994.
[9] W. J. McCalla, Fundamentals of Computer-Aided Circuit Simulation. Boston/Dordrecht/Lancaster: Kluwer Academic Publishers, 1988.
[10] P. J. C. Rodrigues, Computer-aided analysis of nonlinear microwave circuits. Norwood, MA: Artech House, Inc., 1998.
[11] K. S. Kundert, J. K. White, and A. Sangiovanni-Vincentelli, Steady-State Methods for Simulating Analog and Microwave Circuits. Boston: Kluwer Academic Publishers, 1990.
[12] C.-W. Ho, A. E. Ruehli, and P. A. Brennan, “The modified nodal approach to network analysis,” IEEE Transactions on Circuits and Systems, vol. 22, pp. 504–509, June 1975.
[13] H. Gaunholt, P. Heikkilä, K. Mannersalo, V. Porra, and M. Valtonen, “Gyrator transformation – a better way for modified nodal approach,” in Proceedings of European Conference on Circuit Theory and Design, vol. 2, pp. 864–872, July 1991.
[14] K. S. Kundert, “Sparse matrix techniques,” in Circuit Analysis, Simulation and Design (A. E. Ruehli, ed.), pp. 281–324, Elsevier Science Publishers B. V., 1986.
[15] V. Linja-aho, “Homotopy methods in DC circuit analysis,” Master’s thesis, Helsinki University of Technology, 2006.
[16] L. Trajković, R. C. Melville, and S.-C. Fang, “Finding DC operating points of transistor circuits using homotopy methods,” in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 758–761, 1991.
[17] K. Yamamura, T. Sekiguchi, and Y.
Inoue, “A fixed-point homotopy method for solving modified nodal equations,” IEEE Transactions on Circuits and Systems I, vol. 46, pp. 654–665, June 1999.
[18] R. C. Melville, L. Trajković, S.-C. Fang, and L. T. Watson, “Artificial parameter homotopy methods for the DC operating point problem,” IEEE Transactions on Computer-Aided Design, vol. 12, pp. 861–877, June 1993.
[19] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations. Philadelphia: SIAM, 1995.
[20] H. R. Yeager and R. W. Dutton, “Improvement in norm-reducing Newton methods for circuit simulation,” IEEE Transactions on Computer-Aided Design, vol. 8, pp. 538–546, May 1989.
[21] J. Roos, Improving the Speed and Convergence of DC Analysis by Means of Self-Generating Lookup Tables and Piecewise-Linear Analysis. PhD thesis, Helsinki University of Technology, 1999.
[22] P. Maffezzoni, L. Codecasa, and D. D’Amore, “Time-domain simulation of nonlinear circuits through implicit Runge–Kutta methods,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 54, pp. 391–400, Feb. 2007.
[23] M. S. Nakhla and J. Vlach, “A piecewise harmonic balance technique for determination of the periodic response of nonlinear systems,” IEEE Transactions on Circuits and Systems, vol. CAS-23, pp. 85–91, Feb. 1976.
[24] S. A. Maas, Nonlinear Microwave Circuits. Norwood, MA: Artech House, Inc., 1988.
[25] R. Dembo, S. Eisenstat, and T. Steihaug, “Inexact Newton methods,” SIAM Journal on Numerical Analysis, vol. 19, pp. 400–408, 1982.
[26] D. Valtchev and V. Georgiev, “Time-frequency transformation for the spectral balance methods,” International Journal of Electronics and Communications (AEÜ), vol. 49, no. 1, 1995.
[27] J. Virtanen, V. Karanko, T. Tinttunen, and M. Heimlich, “Frequency selective harmonic balance analysis,” in Proceedings of the EUMW’09, pp. 1070–1073, 2009.
[28] V. Karanko and T. Tinttunen, “Multi-rate harmonic balance provides a new solution for nonlinear simulation,” High Frequency Electronics, pp.
30–37, 2009.
[29] V. Rizzoli, D. Masotti, F. Mastri, and E. Montanari, “System-oriented harmonic-balance algorithm for circuit-level simulation,” IEEE Transactions on Computer-Aided Design, vol. 30, pp. 256–269, Feb. 2011.
[30] Y. Saad, “A flexible inner-outer preconditioned GMRES algorithm,” SIAM Journal on Scientific Computing, vol. 14, no. 2, pp. 461–469, 1993.
[31] H. A. van der Vorst and C. Vuik, “GMRESR: a family of nested GMRES methods,” Numerical Linear Algebra with Applications, vol. 1, pp. 369–386, 1994.
[32] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. Van der Vorst, Numerical Linear Algebra for High-Performance Computers. Philadelphia: SIAM, 1998.
[33] P. Feldmann, B. Melville, and D. Long, “Efficient frequency domain analysis of large nonlinear analog circuits,” in Proceedings of the IEEE 1996 Custom Integrated Circuits Conference, pp. 461–464, May 1996.
[34] R. Telichevesky, K. Kundert, I. Elfadel, and J. White, “Fast simulation algorithms for RF circuits,” in Proceedings of the IEEE 1996 Custom Integrated Circuits Conference, pp. 437–444, May 1996.
[35] D. Long, R. Melville, K. Ashby, and B. Horton, “Full-chip harmonic balance,” in Proceedings of the IEEE 1997 Custom Integrated Circuits Conference, pp. 379–382, May 1997.
[36] O. Nastov, Spectral methods for Circuit Analysis. PhD thesis, MIT, EECS, 1999.
[37] F. Veerse, “Efficient iterative time preconditioners for harmonic balance RF circuit simulation,” in Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design, (Washington, DC, USA), pp. 251–254, IEEE Computer Society, 2003.
[38] W. Dong and P. Li, “Hierarchical harmonic balance methods for frequency-domain analog circuit analysis,” IEEE Transactions on Computer-Aided Design, vol. 26, pp. 2089–2101, Dec. 2007.
[39] R. Fletcher and C. M. Reeves, “Function minimization by conjugate gradients,” The Computer Journal, vol. 7, pp. 149–154, 1964.
[40] C. T. Kelley, Iterative Methods for Optimization.
Philadelphia: SIAM, 1999.
[41] L. Grippo, F. Lampariello, and S. Lucidi, “A nonmonotone line search technique for Newton’s method,” SIAM Journal on Numerical Analysis, vol. 23, pp. 707–716, Aug. 1986.
[42] M. Ferris, S. Lucidi, and M. Roma, “Nonmonotone curvilinear line search methods for unconstrained optimization,” Computational Optimization and Applications, vol. 6, pp. 117–136, 1996.
[43] M. Powell, “A new algorithm for unconstrained optimization,” in Nonlinear Programming (J. Rosen, O. Mangasarian, and K. Ritter, eds.), pp. 31–65, New York: Academic Press, 1970.
[44] K. Levenberg, “A method for the solution of certain nonlinear problems in least squares,” Quarterly of Applied Mathematics, vol. 4, pp. 164–168, 1944.
[45] D. W. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters,” Journal of the Society for Industrial and Applied Mathematics, vol. 11, pp. 431–441, 1963.
[46] A. Bouaricha and R. Schnabel, “TENSOLVE: A software package for solving systems of nonlinear equations and nonlinear least squares problems using tensor methods,” preprint MCS-P463-0894, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, 1994.
[47] N. Deng, Y. Xiao, and F. Zhou, “Nonmonotonic trust region algorithm,” Journal of Optimization Theory and Applications, vol. 76, pp. 259–285, Feb. 1993.
[48] P. L. Toint, “A non-monotone trust-region algorithm for nonlinear optimization subject to convex constraints,” Mathematical Programming, vol. 77, pp. 69–94, 1997.
[49] R. Schnabel and P. Frank, “Tensor methods for nonlinear equations,” SIAM Journal on Numerical Analysis, vol. 21, pp. 815–843, Oct. 1984.
[50] H. H. Happ, Diakoptics and Networks. New York: Academic Press, 1971.
[51] L. O. Chua and L.-K. Chen, “Diakoptic and generalized hybrid analysis,” IEEE Transactions on Circuits and Systems, vol. CAS-23, pp. 694–705, Dec. 1976.
[52] F. F.
Wu, “Solution of large-scale networks by tearing,” IEEE Transactions on Circuits and Systems, vol. CAS-23, pp. 706–713, Dec. 1976.
[53] G. Guardabassi and A. Sangiovanni-Vincentelli, “A two level algorithm for tearing,” IEEE Transactions on Circuits and Systems, vol. CAS-23, pp. 783–791, Dec. 1976.
[54] I. N. Hajj, “Sparsity considerations in network solution by tearing,” IEEE Transactions on Circuits and Systems, vol. CAS-27, pp. 357–366, May 1980.
[55] U. Kleis, O. Wallat, U. Wever, and Q. Zheng, “Domain decomposition methods for circuit simulation,” in Proceedings of the 8th Workshop on Parallel and Distributed Simulation, pp. 183–184, 1994.
[56] H. Yu, C. Chu, Y. Shi, D. Smart, L. He, and S. X.-D. Tan, “Fast analysis of a large-scale inductive interconnect by block-structure-preserved macromodeling,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, pp. 1399–1411, Oct. 2010.
[57] M. Vlach, “LU decomposition algorithms for parallel and vector computation,” in Analog Methods for Computer-Aided Circuit Analysis and Design (T. Ozawa, ed.), pp. 37–64, New York and Basel: Marcel Dekker Inc., 1988.
[58] N. B. G. Rabbat, A. L. Sangiovanni-Vincentelli, and H. Y. Hsieh, “A multilevel Newton algorithm with macromodeling and latency for the analysis of large-scale nonlinear circuits in the time domain,” IEEE Transactions on Circuits and Systems, vol. CAS-26, pp. 733–741, Sept. 1979.
[59] X. Zhang, R. H. Byrd, and R. B. Schnabel, “Parallel methods for solving nonlinear block bordered system of equations,” SIAM Journal on Scientific and Statistical Computing, vol. 13, pp. 841–859, July 1992.
[60] X. Zhang, “Dynamic and static load balancing for block bordered system circuit equations on multiprocessors,” IEEE Transactions on Computer-Aided Design, vol. 11, pp. 1086–1094, Sept. 1992.
[61] J. Borchhardt, F. Grund, D. Horn, and M. Uhle, “MAGNUS – mehrstufige analyse großer netzwerke und systeme,” Tech. Report 9, WIAS, Berlin, 1994.
[62] C. Cocchi, A. Benedetti, and Z. M. Kovàcs-V., “A new subcircuit ordering algorithm for a multilevel circuit simulator,” in Proceedings of European Conference on Circuit Theory and Design, pp. 1059–1062, 1995.
[63] U. Wever and Q. Zheng, “Parallel transient analysis for circuit simulation,” in Proceedings of the 29th Annual Hawaii International Conference on Systems Sciences, pp. 442–447, 1996.
[64] N. Fröhlich, B. M. Riess, U. A. Wever, and Q. Zheng, “A new approach for parallel simulation of VLSI circuits on a transistor level,” IEEE Transactions on Circuits and Systems I, vol. 45, pp. 601–613, June 1998.
[65] J. Borchhardt, F. Grund, and D. Horn, “Parallelized numerical methods for large systems of differential-algebraic equations in industrial applications,” Surveys on Mathematics for Industry, vol. 8, pp. 201–211, 1999.
[66] V. Rizzoli, F. Mastri, and D. Masotti, “A hierarchical harmonic-balance technique for the efficient simulation of large size nonlinear microwave circuits,” in Proceedings of 25th European Microwave Conference, pp. 615–619, 1995.
[67] N. R. Aluru and J. White, “A multi-level Newton method for static and fundamental frequency analysis of electromechanical systems,” in International Conference on Simulation of Semiconductor Processes and Devices SISPAD’97, pp. 125–128, 1997.
[68] S. D. Senturia, N. Aluru, and J. White, “Simulating the behavior of MEMS devices: Computational methods and needs,” IEEE Computational Science and Engineering, vol. 4, pp. 30–43, Jan. 1997.
[69] N. R. Aluru and J. White, “A multilevel Newton method for mixed-energy domain simulation of MEMS,” Journal of Microelectromechanical Systems, vol. 8, pp. 299–308, Sept. 1999.
[70] K. Mayaram and D. O. Pederson, “Coupling algorithms for mixed-level circuit and device simulation,” IEEE Transactions on Computer-Aided Design, vol. 11, pp. 1003–1012, Aug. 1992.
[71] F. M.
Rotella, Mixed Circuit and Device Simulation for Analysis, Design, and Optimization of Opto-Electronic, Radio Frequency, and High Speed Semiconductor Devices. PhD thesis, Stanford University, Apr. 2000.
[72] G. Zanghirati, “Global convergence of nonmonotone strategies in parallel methods for block-bordered nonlinear systems,” Journal of Computational and Applied Mathematics, vol. 107, pp. 137–168, Jan. 2000.
[73] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine, A Users’ Guide and Tutorial for Networked Parallel Computing. The MIT Press, 1994.
[74] Message Passing Interface Forum, “MPI: a message-passing interface standard,” The International Journal of Supercomputer Applications, vol. 8, no. 3/4, 1994.
[75] R. A. Saleh, K. A. Gallivan, M.-C. Chang, I. N. Hajj, D. Smart, and T. N. Trick, “Parallel circuit simulation on supercomputers,” Proceedings of the IEEE, vol. 77, pp. 1915–1931, Dec. 1989.
[76] X. Ye, W. Dong, P. Li, and S. Nassif, “Hierarchical multialgorithm parallel circuit simulation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, pp. 45–58, Jan. 2011.
[77] M. Hulkkonen, “Graphics processing unit utilization in circuit simulation,” Master’s thesis, Aalto University School of Electrical Engineering, 2011.
[78] R. E. Poore, “GPU-accelerated time-domain circuit simulation,” in IEEE 2009 Custom Integrated Circuits Conference, pp. 629–632, 2009.
[79] K. Gulati, J. Croix, S. Khatri, and R. Shastry, “Fast circuit simulation on graphics processing units,” in Proceedings of 2009 Asia and South Pacific Design Automation Conference, pp. 403–408, 2009.
[80] W. Bomhof, Iterative and Parallel methods for Linear Systems with Applications in Circuit Simulation. PhD thesis, Utrecht University, 2001.
[81] W. Bomhof and H. A. van der Vorst, “A parallel linear system solver for circuit simulation problems,” Numerical Linear Algebra with Applications, vol. 7, pp.
649–665, Oct.–Dec. 2000.
[82] A. Basermann, U. Jaekel, M. Nordhausen, and K. Hachiya, “Parallel iterative solvers for sparse linear systems in circuit simulation,” Future Generation Computer Systems, pp. 1275–1284, 2005.
[83] T. A. Davis and E. Palamadai Natarajan, “Algorithm 907: KLU, a direct sparse solver for circuit simulation problems,” ACM Transactions on Mathematical Software, vol. 37, Sept. 2010.
[84] D. L. Rhodes and B. S. Perlman, “Parallel computation for microwave circuit simulation,” IEEE Transactions on Microwave Theory and Techniques, vol. 45, pp. 587–592, May 1997.
[85] D. L. Rhodes and A. Gerasoulis, “A scheduling approach to parallel harmonic balance simulation,” Concurrency: Practice and Experience, vol. 12, pp. 175–187, June 2000.
[86] W. Dong and P. Li, “A parallel harmonic balance approach to steady-state and envelope-following simulation of driven and autonomous circuits,” IEEE Transactions on Computer-Aided Design, vol. 28, pp. 409–501, Apr. 2009.
[87] M. Celik, L. Pileggi, and A. Odabasioglu, IC Interconnect Analysis. Boston/Dordrecht/London: Kluwer Academic Publishers, 2002.
[88] S. X.-D. Tan and L. He, Advanced Model Order Reduction Techniques in VLSI Design. New York, NY: Cambridge University Press, 2007.
[89] P. Benner, M. Hinze, and E. J. W. ter Maten, eds., Model Reduction for Circuit Simulation, vol. 74 of Lecture Notes in Electrical Engineering. Springer, 2011.
[90] A. C. Antoulas, Approximation of Large-Scale Dynamical Systems (Advances in Design and Control). Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2005.
[91] L. T. Pillage and R. A. Rohrer, “Asymptotic waveform evaluation for timing analysis,” IEEE Transactions on Computer-Aided Design, vol. 9, pp. 352–366, Apr. 1990.
[92] P. Feldmann and R. Freund, “Efficient linear circuit analysis by Padé approximation via the Lanczos process,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, pp.
639–649, May 1995.
[93] A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: passive reduced-order interconnect macromodeling algorithm,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, pp. 645–654, Aug. 1998.
[94] R. W. Freund, “SPRIM: structure-preserving reduced-order interconnect macromodeling,” in Proceedings of the 2004 IEEE/ACM International Conference on Computer-Aided Design, ICCAD ’04, pp. 80–87, 2004.
[95] B. Moore, “Principal component analysis in linear systems: controllability, observability, and model reduction,” IEEE Transactions on Automatic Control, vol. 26, pp. 17–32, Feb. 1981.
[96] J. R. Phillips, L. Daniel, and L. M. Silveira, “Guaranteed passive balanced transformation for model order reduction,” in Proceedings of Design Automation Conference, pp. 52–57, 2002.
[97] J. Phillips and L. Silveira, “Poor man’s TBR: a simple model reduction scheme,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, pp. 43–55, Jan. 2005.
[98] T. Stykel, “Balancing-related model reduction of circuit equations using topological structure,” in Model Reduction for Circuit Simulation (P. Benner, M. Hinze, and E. J. W. ter Maten, eds.), pp. 53–83, Springer, 2011.
[99] B. N. Sheehan, “TICER: realizable reduction of extracted RC circuits,” in Proceedings of the 1999 IEEE/ACM International Conference on Computer-Aided Design, pp. 200–203, 1999.
[100] B. N. Sheehan, “Realizable reduction of RC networks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, pp. 1393–1407, Aug. 2007.
[101] C. S. Amin, M. H. Chowdhury, and Y. I. Ismail, “Realizable RLCK circuit crunching,” in Proceedings of the 40th annual Design Automation Conference, (New York, USA), pp. 226–231, ACM, 2003.
[102] G. Karypis and V. Kumar, “hMETIS, a hypergraph partitioning package version 1.5.3.”
[103] H. Liao and W. W.-M.
Dai, “Partitioning and reduction of RC interconnect networks based on scattering parameter macromodels,” in Digest of Technical Papers of IEEE/ACM International Conference on Computer Aided Design, pp. 704–709, 1995. [104] R. Ionutiu, J. Rommes, and W. Schilders, “SparseRC: sparsity preserving model reduction for RC circuits with many terminals,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, pp. 1828–1841, Dec. 2011. [105] P. Miettinen, M. Honkala, J. Roos, and M. Valtonen, “Partitioning-based reduction of circuits with mutual inductances,” in Scientific Computing in Electrical Engineering SCEE 2010 (B. Michielsen and J.-R. Poirier, eds.), Mathematics in Industry, Vol. 16, 2012. [106] Y.-M. Lee and C. C.-P. Chen, “Hierarchical model order reduction for signal-integrity interconnect synthesis,” in Proceedings of the 11th Great Lakes symposium on VLSI, pp. 109–114, 2001. [107] Y.-M. Lee, Y. Cao, T.-H. Chen, J. M. Wang, and C. C.-P. Chen, “HiPRIME: hierarchical and passivity preserved interconnect macromodeling engine for RLCK power delivery,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, pp. 797–806, June 2005. [108] D. Li, S. X.-D. Tan, and L. Wu, “Hierarchical Krylov subspace based reduction of large interconnects,” Integration, the VLSI Journal, vol. 42, pp. 193– 202, Feb. 2009. [109] T. Palenius and J. Roos, “Comparison of reduced-order interconnect macromodels for time-domain simulation,” IEEE Transactions on Microwave Theory and Techniques, vol. 52, pp. 2240–2250, Sept. 2004. [110] F. Yang, X. Zeng, Y. Su, and D. Zhou, “RLCSYN: RLC equivalent circuit synthesis for structure-preserved reduced-order model of interconnect,” in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 2710–2713, 2007. [111] Y. Su, J. Wang, X. Zeng, Z. Bai, C. Chiang, and D. 
Errata

Publication III

Eq. (4):

\[
J =
\begin{bmatrix}
A_1 &     &        &     & B_1 \\
    & A_2 &        &     & B_2 \\
    &     & \ddots &     & \vdots \\
    &     &        & A_m & B_m \\
C_1 & C_2 & \cdots & C_m & D
\end{bmatrix}
\]

Page 4: “– it is in direction of steepest descent” should be “– it is in the descent direction”.

Publication V

Eq. (10) and (15): C and G should be C̃ and G̃.

Eq. (17): $C_k$ s and $G_k$ s should be $c_k$ s and $g_k$.