Nat.Lab. Unclassified Report 2002/817
Date of issue: 05/2002

Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis

Master's Thesis
Joost Rommes

Authors' address data: Joost Rommes WAY03.73; jrommes@math.uu.nl, jrommes@cs.uu.nl

© Koninklijke Philips Electronics N.V. 2002
All rights are reserved. Reproduction in whole or in part is prohibited without the written consent of the copyright owner.

Unclassified Report: 2002/817
Title: Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis (Master's Thesis)

Author(s): Joost Rommes

Part of project: Traineeship at Philips Electronic Design and Tools/Analogue Simulation (ED&T/AS) as master's thesis project at Utrecht University

Customer:

Keywords: Pole-zero analysis, Jacobi-Davidson method, Jacobi-Davidson QR, Jacobi-Davidson QZ, preconditioning, Pstar, Eigenvalue problems, PSS, JDQR, JDQZ, QR, QZ, Arnoldi
Abstract:

This report discusses the application of Jacobi-Davidson style methods in electric circuit simulation. Using the generalised eigenvalue problem, which arises from pole-zero analysis, as a starting point, both the JDQR method and the JDQZ method are studied. Although the JDQR method (for the ordinary eigenproblem) and the JDQZ method (for the generalised eigenproblem) are designed to converge fast to a few selected eigenvalues, they will be used to compute all eigenvalues. With the help of suitable preconditioners for the GMRES process, which is used to solve the correction equation of the Jacobi-Davidson method, the eigenvalue methods are made more suitable for pole-zero analysis. Numerical experiments show that the Jacobi-Davidson methods can be used for pole-zero analysis. However, in a comparison with the direct QR and QZ methods, the shortcomings in accuracy of certain implementations of iterative methods become visible. Here preconditioning techniques improve the performance of the Jacobi-Davidson methods. The Arnoldi method is considered as the iterative competitor of the Jacobi-Davidson methods.

Besides applications in pole-zero analysis, the Jacobi-Davidson methods are of great use in stability analysis and periodic steady-state analysis. An implementation of the iterative Jacobi-Davidson methods in Pstar, respecting the hierarchy, is possible, because no dense, full-dimensional matrix multiplications are involved. A description of the hierarchical algorithm in Pstar is given.

This project has been a cooperation between Philips Electronic Design & Tools/Analogue Simulation (ED&T/AS) and Utrecht University (UU). It has been executed under the supervision of Dr. E.J.W. ter Maten (ED&T/AS) and Prof.Dr. H.A. van der Vorst (UU).
Conclusions:

The most important conclusion is that Jacobi-Davidson style methods are suitable for application in pole-zero analysis under the following assumptions, apart from the dimension of the problem: the eigenspectrum of the generalised eigenproblem must not be too wide, or an efficient preconditioner must be available. If one or both of these assumptions are not met, there is no special preference for Jacobi-Davidson style methods over the (restarted) Arnoldi method. On the contrary, with the typical convergence behaviour of Jacobi-Davidson in mind, the Arnoldi method should be chosen in that case. Nevertheless, if both assumptions are met, one can profit from the quadratic convergence of the Jacobi-Davidson style methods, combined with acceptable accuracy.

Arguments that Jacobi-Davidson style methods are more robust than the Arnoldi method are in this case not valid, as numerical experiments have shown. Some of these arguments are based on the current Arnoldi implementation in Pstar; some improvements for the Arnoldi implementation in Pstar are proposed.

Nevertheless, Jacobi-Davidson style methods are very applicable in stability analysis and periodic steady-state analysis, where only one or a few eigenvalues are needed. For this type of application, Jacobi-Davidson style methods are preferred over the Arnoldi method. Furthermore, Jacobi-Davidson style methods are suitable for high-dimensional problems because the spectrum can be searched part by part. The Arnoldi method lacks this property.

Finally, the direct QR and QZ methods are superior in accuracy, robustness and efficiency for problems with relatively small dimensions. Even for larger problems, their performance is acceptable. The only disadvantage is that the direct methods do not fit in the hierarchical implementation of Pstar, while the iterative methods do.
Contents

1 Introduction

2 Basic electric circuit theory
  2.1 Circuit terminology
  2.2 Kirchhoff's laws
  2.3 Branch constitutive relations
      2.3.1 Resistive components
      2.3.2 Reactive components
      2.3.3 Controlled components

3 Circuit equations
  3.1 Incidence matrix
  3.2 Nodal analysis
      3.2.1 Mathematical formulation
      3.2.2 Nodal admittance matrix
      3.2.3 Shortcomings
  3.3 Modified nodal analysis
      3.3.1 Mathematical formulation
      3.3.2 Matrix-vector formulation
  3.4 Hierarchical approach
      3.4.1 Hierarchical circuit equations
      3.4.2 Solving using the hierarchy
      3.4.3 Solving without using the hierarchy

4 Circuit analysis
  4.1 Direct current analysis (DC)
  4.2 Small signal analysis (AC)
  4.3 Transient analysis (TR)
  4.4 Pole-zero analysis (PZ)
      4.4.1 Transfer function
      4.4.2 Physical applications
      4.4.3 Numerical strategies
  4.5 Stamps for the C and G matrices

5 Terminal currents
  5.1 The terminal current as unknown on the circuit level
  5.2 The terminal current as unknown on the sub-circuit level
  5.3 Eliminating the terminal current on the circuit level
  5.4 Multi-level Newton-Raphson method

6 Pole-zero analysis
  6.1 Introduction
  6.2 Basic theory of pole-zero analysis
  6.3 Alternative formulation of the transfer function
      6.3.1 Elementary response
      6.3.2 Natural frequencies
      6.3.3 General response
      6.3.4 Eigenvalue formulation
      6.3.5 Eigenvalues and eigenvectors of interest
      6.3.6 Frequency shift
      6.3.7 Time domain impulse response
  6.4 Visualisation techniques
      6.4.1 Plots of poles and zeroes
      6.4.2 Pole-zero Bode plot
      6.4.3 Nyquist plot
      6.4.4 Resonance example
  6.5 Numerical methods for pole-zero analysis
      6.5.1 Separate pole and zero computation
      6.5.2 Combined pole and zero computation

7 Pole-zero computation in Pstar
  7.1 Introduction
  7.2 The QR method
      7.2.1 Algorithm
      7.2.2 Accuracy
      7.2.3 Computational notes
      7.2.4 Implementation notes
  7.3 The Arnoldi method
      7.3.1 Algorithm
      7.3.2 Accuracy
      7.3.3 Computational notes
      7.3.4 Implementation notes
      7.3.5 Possible improvements
      7.3.6 Arnoldi versus the QR method
  7.4 A note on condition numbers
  7.5 The QZ method
      7.5.1 Algorithm
      7.5.2 Accuracy
      7.5.3 Computational notes
      7.5.4 Implementation notes
  7.6 Cancellation of poles and zeroes

8 Stability analysis
  8.1 Introduction
  8.2 Complex mapping of the problem
  8.3 Efficient maximal eigenvalue computation
  8.4 Numerical results

9 Jacobi-Davidson methods
  9.1 Introduction
  9.2 The origin of Jacobi-Davidson methods
      9.2.1 Jacobi's method: JOCC
      9.2.2 Davidson's method
  9.3 The Jacobi-Davidson method
      9.3.1 Algorithm
      9.3.2 The correction equation
  9.4 Jacobi-Davidson QR
      9.4.1 Introduction
      9.4.2 Deflation techniques
      9.4.3 Preconditioning
      9.4.4 JDQR Algorithm
      9.4.5 Computational notes
  9.5 Jacobi-Davidson QZ
      9.5.1 Introduction
      9.5.2 JDQZ Theory
      9.5.3 Deflation and restarting
      9.5.4 Preconditioning
      9.5.5 Approximate solution of the deflated correction equation
      9.5.6 JDQZ Algorithm
      9.5.7 Computational notes

10 Pole-zero analysis and eigenvalue computation: numerical experiments
  10.1 Introduction
  10.2 Problem description
  10.3 Preconditioning
      10.3.1 Left-, right- and split-preconditioning
      10.3.2 Requirements
      10.3.3 Diagonal preconditioning
      10.3.4 ILUT preconditioning
  10.4 Test problems
  10.5 Test environment
  10.6 The ordinary eigenvalue problem
      10.6.1 Strategy
      10.6.2 Numerical results
      10.6.3 JDQR results for problem pz 28
  10.7 The generalised eigenvalue problem
      10.7.1 Strategy
      10.7.2 Numerical results
      10.7.3 JDQZ results for problem pz 28
  10.8 Application in PSS analysis
  10.9 Numerical conclusions

11 Conclusions and future work
  11.1 Conclusions
  11.2 Future work
  11.3 Acknowledgements

References

A Matrix transformations and factorisations
  A.1 Householder transformations
  A.2 Givens transformations
  A.3 QR-factorisation
  A.4 Schur decomposition

Distribution
Chapter 1
Introduction
Computer aided circuit simulation is used by many electric engineers to obtain characteristics of particular circuits, without the necessity to have a physical prototype of the circuit concerned. There
are at least three factors which determine the applicability of an electric circuit simulator: the performance, the accuracy and the robustness. On the one hand, one should be able to run simulations
of complicated circuits on a workstation. On the other hand, the results must be both accurate and
characteristic.
The simulation of an electric circuit starts by forming the system of circuit equations, which is a numerical model of the circuit. As there are several kinds of circuit analysis, several numerical methods
are used to solve the system. However, the choice of numerical methods depends on the way the simulation is implemented, i.e. using plain matrices or using a hierarchical basis. Moreover, the performance and robustness of the simulator depend mainly on the numerical methods used. It is clear that the numerical methods should be chosen with all this in mind.

This report will focus on numerical methods for one particular kind of circuit analysis, namely pole-zero analysis. The numerical formulation of a pole-zero analysis is a generalised eigenproblem. Chapter 2 handles elementary electric circuit aspects like the laws of Kirchhoff. Chapter 3 gives some
approaches to systematically formulate the circuit equations, while Chapter 4 is dedicated to several
kinds of circuit analysis. Chapter 5 serves as an intermezzo and discusses a special case which may
arise during hierarchical analysis of certain circuits, namely terminal currents.
Chapter 6 describes pole-zero analysis in detail, as well as the mathematical formulation. In Chapter
7, the numerical methods to solve the pole-zero problem currently used in Pstar are discussed. A
technique to perform a stability analysis is presented in Chapter 8. Chapter 9 introduces Jacobi-Davidson style methods for the solution of the pole-zero problem. Finally, Chapter 10 presents the
numerical results with the Jacobi-Davidson style methods, compared with other eigenvalue methods.
Readers who are well informed on the topics arising in electric circuit simulation, such as Modified
Nodal Analysis and pole-zero analysis, can skip the first chapters, except Chapter 5, and start with
Chapter 7. The first chapters are advised especially for numerical scientists with minimal knowledge
of electric circuit simulation.
Chapter 2
Basic electric circuit theory
2.1 Circuit terminology
Throughout this report, electric circuits will be considered as graphs of two types of elements: nodes
and branches. The branches, which are electric components like resistors and voltage sources, connect
the nodes, which can be viewed as representatives of voltage potentials. Figure 2.1 shows a simple
circuit consisting of 4 nodes and 5 branches.

Figure 2.1: A simple RLC-circuit with 4 nodes and 5 branches.
The electric properties of some branches, like a voltage source, require a concrete direction due to the difference between their positive and negative end node. A subgraph (sub-circuit) G_s of the whole graph (circuit) G is called a loop if G_s is connected (i.e. there is a path between any two nodes of the graph) and every node of G_s has exactly two branches of G_s incident at it [7]. Unless stated otherwise, the symbol i_k will denote the current through branch k and the symbol v_j will denote the voltage potential of node j.
2.2 Kirchhoff’s laws
A direct result of the requirement that there is no accumulation of charge at any node is Kirchhoff's Current Law (KCL). This law states that all currents entering a node add up to zero:

Kirchhoff's Current Law (KCL): At each node, the sum of all incoming currents is equal to the sum of all outgoing currents:

\[
\sum_{k \in \text{node}} i_k = 0. \tag{2.1}
\]
Kirchhoff’s second law concerns a relationship for the branch voltages:
Kirchhoff's Voltage Law (KVL): The sum of all branch voltages along any closed loop in a circuit adds up to zero:

\[
\sum_{k \in \text{loop}} v_k = 0. \tag{2.2}
\]
As the branch voltage v_k is equal to the potential difference v_k^+ − v_k^- between the two nodes the branch connects, the sum of potential differences along a closed loop must also add up to zero. This is a direct result of the fact that any loop comes back to the same node, i.e. the same voltage potential.
Kirchhoff's laws are so-called topological laws: they are concerned with the topology of the circuit.
With the help of these laws the topological equations can be formed, which only depend on the way
the branches are connected.
2.3 Branch constitutive relations
The two Kirchhoff laws cover the topology of the circuit, but neglect the electrical properties of
branches. The branch constitutive relations (BCR) reflect the electrical properties of branches and
together with the two Kirchhoff Laws they form a complete description of the circuit. The branch
equations, which result from applying the branch constitutive relations, can contain both branch variables, like the current through a branch, and expressions containing branch variables.
In general, a branch equation only contains branch variables of the branch concerned. However, it is
possible that branch variables associated with other branches are also included in the branch equation.
Branch variables of this type are called controlling variables, making the branch a controlled branch.
An example of a controlled branch is a voltage-controlled current source.
2.3.1 Resistive components
Resistive components are characterised by algebraic branch equations, i.e. equations of the form

\[
x_i = f(t, x),
\]

where x_i ∈ R is the circuit variable concerned, x ∈ R^n is a vector containing all circuit variables and f : R × R^n → R is a function depending on one or more circuit variables. Throughout this report, the time variable will be t. Hence we exclude so-called frequency-domain-defined components.
Resistor

Figure 2.2: A resistor.
The BCR of a resistor (see Figure 2.2), in the linear case, is given by Ohm's law, V = IR, resulting in

\[
i_R = \frac{V}{R}. \tag{2.3}
\]
More generally, covering both the linear and non-linear resistor, the BCR is given by

\[
i_R = i(v_R), \tag{2.4}
\]

where v_R is the potential difference between the two nodes connected by the resistor. Unless stated otherwise, a resistor is considered as a linear resistor with resistance R_i and is labelled r_i, with i a number.
Independent current source
Figure 2.3 shows an independent, ideal current source.

Figure 2.3: An independent current source.
A current source (j_i) generates a given current between its two nodes, independent of the voltage across the nodes. This generated current may be a function of time (e.g. periodic sources). The BCR of a current source becomes

\[
i_I = I(t). \tag{2.5}
\]
Note that the voltage across the source can take any value, which will be defined implicitly by the
system.
Independent voltage source

Figure 2.4: An independent voltage source.
An independent voltage source (e_i) supplies a given voltage (see Figure 2.4). Again, the voltage may be a function of time, so the BCR becomes

\[
v_V = V(t). \tag{2.6}
\]
Similar to the voltage across a current source, the current through the voltage source can take any
value.
2.3.2 Reactive components
Components that are modelled by differential equations are called reactive components. Differential branch equations are of the following form:

\[
x_i = \frac{d}{dt} f(t, x).
\]

Unless stated otherwise, the notation ẋ will be used for the time derivative dx/dt of x.
Capacitor
An ideal capacitor (see Figure 2.5) is an electrical component which can store an amount of electrical
charge (q), without loss.

Figure 2.5: A capacitor.
Current through a capacitor results in a growth of the charge: i = q̇. The relation between the charge and the voltage across the capacitor in the general case is given by q_c = q(v_c). Hence, a linear capacitor, with a (constant) capacitance C, has the following BCR:

\[
i_c = \dot{q}_c = C \dot{v}_c. \tag{2.7}
\]

A capacitor c_i is in general considered to be a linear capacitor with capacitance C_i. Non-linear capacitors are used in the compact modelling of transistor devices (for instance modelling the behaviour of substrate coupling).
Inductor
Where a capacitor is characterised by a relationship between the charge it stores and the voltage across it, an inductor (see Figure 2.6) is characterised by a relationship between the magnetic flux φ_L and its current i_L.

Figure 2.6: An inductor.

Current through an inductor results in the creation of an electromagnetic voltage by self-inductance: v_inductance = −φ̇_L. The actual voltage across the inductor then reads v_L = φ̇_L, with φ_L = φ(i_L) in the general case. The BCR for a linear inductor with inductance L is

\[
v_L = L \dot{i}_L. \tag{2.8}
\]

An inductor l_i is in general considered to be a linear inductor with inductance L_i.
2.3.3 Controlled components
Controlled components are circuit components whose BCRs contain branch variables which belong to
other components. The previously mentioned voltage-controlled current source (see also Figure 2.7)
is a simple example of a controlled component.

Figure 2.7: A voltage-controlled current source.
The difference between an independent current source and a purely controlled current source is the fact that its controlling component has no concrete contribution of its own to the BCR (i.e. i_c = 0). However, the generated current is now a function of the voltage across the controlling component, so the BCR becomes

\[
i_s = I(v_c). \tag{2.9}
\]

However, mixed forms like i_s = I(v_s, v_c) are also allowed. In principle, v_c may again be controlled by i_s.
Chapter 3
Circuit equations
This chapter describes in detail how the circuit equations corresponding to a circuit can be formulated.
The main part of this chapter is based on three different sources[7, 16, 18].
3.1 Incidence matrix
The two Kirchhoff Laws (2.1 and 2.2) can be represented in matrix-vector form. As these laws are
topological laws, the matrix-vector form can be obtained by inspecting the topology of the circuit.
The matrix is called the incidence matrix, which has the following definition [7]:

Incidence matrix: The incidence matrix A ∈ R^{n×b} of a graph of n nodes and b branches is defined by

\[
a_{ij} = \begin{cases}
1 & \text{if branch } j \text{ is incident at node } i \text{ and node } i \text{ is the positive node of } j,\\
-1 & \text{if branch } j \text{ is incident at node } i \text{ and node } i \text{ is the negative node of } j,\\
0 & \text{if branch } j \text{ is not incident at node } i.
\end{cases} \tag{3.1}
\]
The incidence matrix has an important property. In the case of no self-loops (i.e. the positive and negative node of a branch are not the same), each column of A contains exactly one 1, one −1 and has all other entries zero. A direct result of this property is the fact that all rows of A add up to zero (i.e. \sum_i a_{ij} = 0). Furthermore, as the topology does not change in time, neither does the incidence matrix.
The incidence matrix of the example with 4 nodes and 5 branches (Figure 2.1) becomes

\[
A = \begin{pmatrix}
-1 & 0 & 0 & -1 & -1 \\
1 & 1 & 0 & 0 & 0 \\
0 & -1 & 1 & 1 & 0 \\
0 & 0 & -1 & 0 & 1
\end{pmatrix}, \tag{3.2}
\]

where the rows correspond to the nodes 0, 1, 2, 3 and the columns to the branches j_1, c_1, c_2, l_1, r_1.
It has been stated that all rows of the incidence matrix A add up to zero. A consequence is that the
rows of A are not linearly independent, so the rank of A is less than n. In fact, for a realistic circuit,
which consists of one piece, it can be proven that the rank of A is n − 1[7].
Now let the vector v_n ∈ R^4 contain the nodal voltages, arranged in the same order as the rows of A, and let the vector i_b ∈ R^5 contain the branch currents, arranged in the same order as the columns of A. Simple algebra shows that

\[
A i_b = 0 \tag{3.3}
\]

at any time, which is exactly Kirchhoff's Current Law (2.1). In a similar way, it follows that

\[
A^T v_n = v_b \tag{3.4}
\]

at any time, where v_b ∈ R^5 contains the branch voltages. It is clear that A^T v_n − v_b = 0 is another way of writing Kirchhoff's Voltage Law (2.2). These two re-formulations of the KCL and the KVL hold in general for any valid circuit. The elementary nodal analysis and the more extended modified nodal analysis are built upon this incidence matrix, as is described in the following sections.
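As an illustration of this incidence-matrix formulation (a sketch only, not taken from Pstar), the following small Python/numpy fragment builds the matrix A of equation (3.2) and checks the matrix forms (3.3) and (3.4) of the Kirchhoff laws; the node potentials and the loop current are made-up values chosen only to satisfy the laws.

import numpy as np

# Incidence matrix A of the circuit of Figure 2.1, equation (3.2).
# Rows: nodes 0, 1, 2, 3; columns: branches j1, c1, c2, l1, r1.
A = np.array([[-1,  0,  0, -1, -1],
              [ 1,  1,  0,  0,  0],
              [ 0, -1,  1,  1,  0],
              [ 0,  0, -1,  0,  1]])

# Each column contains one +1 and one -1, so the rows of A add up to zero,
# and rank(A) = n - 1 for this connected circuit.
assert np.all(A.sum(axis=0) == 0)
assert np.linalg.matrix_rank(A) == A.shape[0] - 1

# KVL (3.4): the branch voltages are differences of node potentials.
v_n = np.array([0.0, 1.5, 0.7, -0.2])       # arbitrary node potentials
v_b = A.T @ v_n

# KCL (3.3): a current circulating around the loop 1 -> 2 -> 3 -> 0 -> 1
# (branches c1, c2, r1, and j1 traversed backwards) satisfies A i_b = 0.
i_b = np.array([-1.0, 1.0, 1.0, 0.0, 1.0])  # order j1, c1, c2, l1, r1
print(np.allclose(A @ i_b, 0.0))            # True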
3.2 Nodal analysis
3.2.1 Mathematical formulation
Nodal analysis uses the branch constitutive relations in combination with the KCL and KVL equations
to form a system of equations for the n unknown node voltages. These branch constitutive equations,
as described in Section 2.3, will have the form

\[
i_k = \frac{d}{dt} q(t, v_b) + j(t, v_b), \tag{3.5}
\]

where i_k is the current through branch k, v_b ∈ R^b contains the branch voltages and q, j : R × R^b → R are functions resulting from the BCR (q represents reactive parts while j represents resistive contributions). In general, the functions q and j will depend only on their own local v_{b,k}, but in the case of controlled components they might involve the whole vector or even more (see Section 3.3).

If all branch currents i_k are put together in the vector i_b ∈ R^b, the corresponding branch equations can be presented in matrix-vector form:

\[
i_b = \frac{d}{dt}\tilde{q}(t, v_b) + \tilde{j}(t, v_b), \tag{3.6}
\]

with q̃, j̃ : R × R^b → R^b.
System (3.6) contains b equations for 2b unknowns, so it cannot be solved in a straightforward manner. This is the part where the rewritten KCL (3.3) and KVL (3.4) come into play, after left-multiplying system (3.6) with the corresponding incidence matrix A:

\[
\begin{aligned}
i_b &= \frac{d}{dt}\tilde{q}(t, v_b) + \tilde{j}(t, v_b) \\
\Rightarrow \quad A i_b &= \frac{d}{dt} A\tilde{q}(t, v_b) + A\tilde{j}(t, v_b) \\
\Rightarrow \quad 0 &= \frac{d}{dt} A\tilde{q}(t, v_b) + A\tilde{j}(t, v_b) \\
\Rightarrow \quad 0 &= \frac{d}{dt} A\tilde{q}(t, A^T v_n) + A\tilde{j}(t, A^T v_n).
\end{aligned} \tag{3.7}
\]
The original system (3.6) is now transformed to a system of n equations with n unknowns, the nodal
voltages. Note that when the nodal voltages are known, the branch voltages and the currents follow
immediately.
However, there still is a problem with the transformed system (3.7), and the problem lies in the incidence matrix A. The rows of A are not linearly independent, so the equations of the reformulated KCL (3.3) are not either. In fact, each equation is implied by the other n − 1 equations. Knowing that rank(A) = n − 1, the solution of this problem is to choose one of the unknowns v_{n,k}. Physically, this corresponds with grounding one node, i.e. setting v_{n,k} = v_k. As a result, the k-th row of A and the k-th coordinate of v_n can be removed, resulting in Â and v̂_n. The new system has n − 1 equations for n − 1 unknowns:

\[
0 = \frac{d}{dt}\hat{A}\tilde{q}(t, \hat{A}^T \hat{v}_n + v_k A^T e_k) + \hat{A}\tilde{j}(t, \hat{A}^T \hat{v}_n + v_k A^T e_k), \tag{3.8}
\]

with e_k ∈ R^n the k-th unit vector.

A commonly used form in circuit simulation for the branch equations and the topological Kirchhoff equations is

\[
\frac{d}{dt} q(t, x) + j(t, x) = 0. \tag{3.9}
\]

By defining x := v̂_n and

\[
q(t, x) := \hat{A}\tilde{q}(t, \hat{A}^T \hat{v}_n + v_k A^T e_k), \qquad
j(t, x) := \hat{A}\tilde{j}(t, \hat{A}^T \hat{v}_n + v_k A^T e_k),
\]

the system (3.8) can be written in the form (3.9). A shorter notation can be obtained by defining f : R × R^{n−1} × R^{n−1} → R^{n−1} as

\[
f(t, x, \dot{x}) := \frac{d}{dt} q(t, x) + j(t, x) = \frac{\partial}{\partial t} q(t, x) + \frac{\partial}{\partial x}\bigl(q(t, x)\bigr)\,\dot{x} + j(t, x), \tag{3.10}
\]

resulting in the most general form

\[
f(t, x, \dot{x}) = 0. \tag{3.11}
\]

Reactive branches will give contributions to q, while resistive branches will give contributions to j. Table 3.1 shows the contributions of some elements to j and q.
Branch type (BCR)               Node    j                         q
Current source (I = I(t))       v+      I(t)
                                v-      -I(t)
Resistor (i_R = (v+ - v-)/R)    v+      (1/R) v+ - (1/R) v-
                                v-      -(1/R) v+ + (1/R) v-
Capacitor (i_C = C(v̇+ - v̇-))    v+                                C v+ - C v-
                                v-                                -C v+ + C v-

Table 3.1: Contributions of frequently used branches to the circuit equations.
Note that not all types of branches are present in the table. This is because not all branches, for example an inductor, can be handled by nodal analysis. Section 3.2.3 will shed light on this issue.
3.2.2 Nodal admittance matrix
An alternative formulation of the nodal analysis method is given in [18]. This section will illustrate the method for linear circuits. The system of circuit equations is formulated as

\[
Y V = I, \tag{3.12}
\]

where Y ∈ R^{n×n} is the nodal admittance matrix, V ∈ R^n contains the (unknown) node voltages and
I ∈ Rn contains the independent source currents. Each equation of the system reflects the KCL for
a node, i.e. that the sum of all currents entering that node is zero. Consequently, the nodal admittance matrix typically contains admittances of branches, which together form the contribution of each
branch to the KCL for a node. The formation of Y happens in a straightforward, constructive process,
also called stamping. The basic idea is that a branch gives a positive contribution to its positive node
and a negative contribution to its negative node1 . These contributions are calculated using the BCRs of
Section 2.3; Table 3.2 shows the contributions of several components to the nodal admittance matrix
and I . Note that again not all types of branches are present in the table.
Branch type (BCR)               Node    v+      v-      I
Current source (I = I(t))       v+                      I(t)
                                v-                      -I(t)
Resistor (i_R = (v+ - v-)/R)    v+      1/R     -1/R
                                v-      -1/R    1/R
Capacitor (i_C = C(v̇+ - v̇-))    v+      *C      -*C
                                v-      -*C     *C

Table 3.2: Contributions to the nodal admittance matrix and the RHS of frequently used branches.
The stars (*_C) at some places indicate that the value is dependent on the kind of analysis that is performed. The different kinds of analysis will be discussed in Chapter 4; the value will be the same for all stars belonging to the same branch.
Each row of Y , which reflects the KCL for a unique node, is identified by that node. Of course, these
nodes appear in the same order in the vector V . Finally, I contains independent current sources which
also contribute to the KCL.
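To make the stamping process concrete, here is a minimal Python sketch of the stamps of Table 3.2 (an illustration only, not Pstar's hierarchical implementation); the function names, the two-node example and the component values are made up.

import numpy as np

def stamp_conductance(Y, p, m, g):
    # Stamp a conductance g (1/R for a resistor, *C for a capacitor)
    # between nodes p and m, following Table 3.2.
    Y[p, p] += g
    Y[m, m] += g
    Y[p, m] -= g
    Y[m, p] -= g

def stamp_current_source(I, p, m, value):
    # Stamp an independent current source with positive node p into the RHS,
    # using the sign convention of Table 3.2.
    I[p] += value
    I[m] -= value

# Made-up example: a 1 mA source driving a 1 kOhm resistor between node 1
# and the ground node 0.
n = 2
Y = np.zeros((n, n))
I = np.zeros(n)
stamp_conductance(Y, 1, 0, 1.0 / 1.0e3)
stamp_current_source(I, 1, 0, 1.0e-3)

# Ground node 0 by deleting its row and column, then solve Y V = I.
V = np.linalg.solve(Y[1:, 1:], I[1:])
print(V)        # [1.] : one volt across the resistor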
3.2.3 Shortcomings
Nodal analysis is a very simple approach, but suffers from a couple of shortcomings. In the first place,
there is no satisfying way in which voltage sources can be handled, simply because they do not fit in
the form of equation (3.5) as their branch equation is of the form vb = V . In fact, no other unknowns
can be introduced besides the nodal voltages, so inductors and capacitors with unknown charge also
cannot be handled by nodal analysis. In the second place, current-controlled components cannot be handled either, leaving nodal analysis appropriate only for current-defined, voltage-controlled components. The next section describes the modified nodal analysis method, which does not suffer from these two shortcomings.
1 This is an arbitrary choice; other choices are possible, provided that they are applied consistently.
3.3 Modified nodal analysis
3.3.1 Mathematical formulation
Besides the nodal voltages, there are other unknown variables which should be solved. It is clear
that nodal analysis is not capable of handling particular kinds of components, which introduce other
unknown variables. The modified nodal analysis is, as the name reads, in principle the same as nodal
analysis, but has a slight modification. In general, the unknown variables of modified nodal analysis
are
• Node voltages
• Voltage source currents
• Output currents
• Controlling source currents.
Modified nodal analysis differs from nodal analysis in the fact that it can handle voltage-defined,
current-controlled components as well, i.e. branches with branch equations like

\[
v_j = \frac{d}{dt}\bar{q}(t, i_j) + \bar{j}(t, i_j). \tag{3.13}
\]

In practice, there are current-defined and voltage-defined branches. As far as the type of a branch is concerned, it does not matter whether the branch is current-controlled or voltage-controlled. Therefore, the vector of branch voltages, v_b, and the vector of branch currents, i_b, are split into two parts, v_b = (v_{b_1}^T, v_{b_2}^T)^T and i_b = (i_{b_1}^T, i_{b_2}^T)^T respectively, both of length b = b_1 + b_2. The first part of both vectors, v_{b_1} and i_{b_1}, corresponds to current-defined branches, while the second part, v_{b_2} and i_{b_2}, corresponds to voltage-defined branches. As a result, the incidence matrix A can also be partitioned into two parts: A = (A_1 | A_2), with dim(A_1) = n × b_1 and dim(A_2) = n × b_2. In this new notation, Kirchhoff's Current Law (3.3) can be reformulated as
vj =
A1 ib1 + A2 ib2 = 0,
(3.14)
while Kirchhoff's Voltage Law (3.4) becomes

\[
A_1^T v_n = v_{b_1}, \qquad A_2^T v_n = v_{b_2}. \tag{3.15}
\]
As stated earlier, there are now two types of branch equations. Again using the vector notation, the branch equations become

\[
i_{b_1} = \frac{d}{dt}\tilde{q}(t, v_{b_1}, i_{b_2}) + \tilde{j}(t, v_{b_1}, i_{b_2}), \tag{3.16}
\]
\[
v_{b_2} = \frac{d}{dt}\bar{q}(t, v_{b_1}, i_{b_2}) + \bar{j}(t, v_{b_1}, i_{b_2}), \tag{3.17}
\]

with q̃, j̃ : R × R^{b_1} × R^{b_2} → R^{b_1} and q̄, j̄ : R × R^{b_1} × R^{b_2} → R^{b_2}. Again, as in section 3.2, using the KCL (3.14) and the KVL (3.15), these equations can be rewritten:
\[
-A_2 i_{b_2} = \frac{d}{dt} A_1 \tilde{q}(t, A_1^T v_n, i_{b_2}) + A_1 \tilde{j}(t, A_1^T v_n, i_{b_2}), \tag{3.18}
\]
\[
A_2^T v_n = \frac{d}{dt}\bar{q}(t, A_1^T v_n, i_{b_2}) + \bar{j}(t, A_1^T v_n, i_{b_2}). \tag{3.19}
\]
However, the same problem as with nodal analysis arises: the system is undetermined. By grounding node k and proceeding in the same way as in section 3.2, the following determined system can be obtained:

\[
\begin{aligned}
-\hat{A}_2 i_{b_2} &= \frac{d}{dt}\hat{A}_1 \tilde{q}(t, \hat{A}_1^T \hat{v}_n + v_k A_1^T e_k, i_{b_2}) + \hat{A}_1 \tilde{j}(t, \hat{A}_1^T \hat{v}_n + v_k A_1^T e_k, i_{b_2}), \\
\hat{A}_2^T \hat{v}_n + v_k A_2^T e_k &= \frac{d}{dt}\bar{q}(t, \hat{A}_1^T \hat{v}_n + v_k A_1^T e_k, i_{b_2}) + \bar{j}(t, \hat{A}_1^T \hat{v}_n + v_k A_1^T e_k, i_{b_2}).
\end{aligned} \tag{3.20}
\]
This system has n − 1 + b_2 equations for n − 1 + b_2 unknown variables. The unknown variables are v̂_n and i_{b_2}. Finally, by taking these unknown variables as the circuit variables, this system too can be written in the general form of (3.9).
Table 3.3 is an addition to Table 3.1. It covers all other branches, which could not be handled by the
nodal analysis method.
Branch type (BCR)                         Variable   j                       q
Voltage source (V = V(t))                 v+         i_V
                                          v-         -i_V
                                          i_V        v+ - v- - V
V-def resistor (v = V(i_R))               v+         i_R
                                          v-         -i_R
                                          i_R        v+ - v- - V(i_R)
Capacitor (i_C = q̇_C, q_C unknown)        v+                                 q_C
                                          v-                                 -q_C
                                          q_C        q_C - Q_C(v+ - v-)
Inductor (v_L = φ̇_L, φ_L unknown)         v+         i_L
                                          v-         -i_L
                                          i_L        v+ - v-                 -φ_L
                                          φ_L        φ_L - Φ_L(i_L)

Table 3.3: Contributions of frequently used branches to the circuit equations for modified nodal analysis.
For both the capacitor and the inductor, it is sometimes the case that the charge q_C and the flux φ_L, respectively, are treated as unknowns. This results in an extra equation for the unknown charge and flux, where it is assumed that there is a nonlinear relation between the charge and the voltage (Q_C) and between the flux and the current (Φ_L), respectively.
Example
Figure 3.1 shows a simple RL-circuit. The resistor is a current-defined branch, while the inductor and the voltage source are both voltage-defined branches.

Figure 3.1: A simple RL-circuit.

This leads to the following partition of v_b and i_b:

\[
v_{b_1} = v_R, \qquad v_{b_2} = \begin{pmatrix} v_L \\ v_E \end{pmatrix}, \qquad
i_{b_1} = i_R, \qquad i_{b_2} = \begin{pmatrix} i_L \\ i_E \end{pmatrix},
\]

with v_R = v_1 − v_2, v_L = v_2 − v_0 and v_E = v_1 − v_0. The incidence matrix A is split accordingly in
two parts A_1 and A_2:

\[
A_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad
A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ -1 & -1 \end{pmatrix},
\]

where the rows correspond to the nodes 1, 2, 0 and the columns to the branches r_1 (for A_1) and l_1, e_1 (for A_2).
The functions q̃, j̃, q̄ and j̄ become

\[
\tilde{q}\!\left(t, v_R, \begin{pmatrix} i_L \\ i_E \end{pmatrix}\right) = 0, \qquad
\tilde{j}\!\left(t, v_R, \begin{pmatrix} i_L \\ i_E \end{pmatrix}\right) = \frac{1}{R} v_R,
\]
\[
\bar{q}\!\left(t, v_R, \begin{pmatrix} i_L \\ i_E \end{pmatrix}\right) = \begin{pmatrix} L i_L \\ 0 \end{pmatrix}, \qquad
\bar{j}\!\left(t, v_R, \begin{pmatrix} i_L \\ i_E \end{pmatrix}\right) = \begin{pmatrix} 0 \\ V(t) \end{pmatrix}.
\]
These equations can be substituted in equations (3.18) and (3.19), yielding

\[
-\begin{pmatrix} i_E \\ i_L \\ -i_L - i_E \end{pmatrix}
= \begin{pmatrix} \frac{1}{R}(v_1 - v_2) \\ -\frac{1}{R}(v_1 - v_2) \\ 0 \end{pmatrix}, \qquad
\begin{pmatrix} v_2 - v_0 \\ v_1 - v_0 \end{pmatrix}
= \frac{d}{dt}\begin{pmatrix} L i_L \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ V(t) \end{pmatrix}.
\]
Because the system is again undetermined, node 0 will be grounded (v_0 = 0). The result is the following determined system:

\[
-\begin{pmatrix} i_E \\ i_L \end{pmatrix}
= \begin{pmatrix} \frac{1}{R}(v_1 - v_2) \\ -\frac{1}{R}(v_1 - v_2) \end{pmatrix}, \qquad
\begin{pmatrix} v_2 \\ v_1 \end{pmatrix}
= \frac{d}{dt}\begin{pmatrix} L i_L \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ V(t) \end{pmatrix},
\]
which is of the form of equation (3.20). Finally, this result can be rewritten to the general form

\[
\frac{d}{dt} q(t, x) + j(t, x) = 0
\]

by defining

\[
x = \begin{pmatrix} v_1 \\ v_2 \\ i_L \\ i_E \end{pmatrix}, \qquad
j(t, x) = \begin{pmatrix} \frac{1}{R}(v_1 - v_2) + i_E \\ i_L - \frac{1}{R}(v_1 - v_2) \\ -v_2 \\ v_1 - V(t) \end{pmatrix}, \qquad
q(t, x) = \begin{pmatrix} 0 \\ 0 \\ L i_L \\ 0 \end{pmatrix}. \tag{3.21}
\]
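As a small numerical cross-check of this example (an illustration only; the time point, state and component values are made up, and the variable ordering x = (v_1, v_2, i_L, i_E) of (3.21) is assumed), the residual d/dt q(t, x) + j(t, x) can be evaluated in Python:

import numpy as np

R, L = 1.0e3, 1.0e-3                        # made-up component values
V_src = lambda t: np.sin(2.0 * np.pi * 1.0e3 * t)

def q(t, x):
    v1, v2, iL, iE = x
    return np.array([0.0, 0.0, L * iL, 0.0])

def j(t, x):
    v1, v2, iL, iE = x
    return np.array([(v1 - v2) / R + iE,
                     iL - (v1 - v2) / R,
                     -v2,
                     v1 - V_src(t)])

def residual(t, x, xdot, eps=1e-8):
    # f(t, x, xdot) = d/dt q(t, x) + j(t, x); since q does not depend on t
    # explicitly, d/dt q = (dq/dx) xdot, approximated by finite differences.
    n = len(x)
    dq_dx = np.zeros((n, n))
    for k in range(n):
        dx = np.zeros(n)
        dx[k] = eps
        dq_dx[:, k] = (q(t, x + dx) - q(t, x)) / eps
    return dq_dx @ xdot + j(t, x)

x = np.zeros(4)                             # all voltages and currents zero
print(residual(0.0, x, np.zeros(4)))        # zero residual: consistent DC point at t = 0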
3.3.2 Matrix-vector formulation
Similar to nodal analysis, modified nodal analysis can also be presented in an equivalent matrix-vector
form [18]. This time, the system of linear circuit equations has the following form:

\[
\begin{pmatrix} Y_R & B \\ C & D \end{pmatrix}
\begin{pmatrix} V \\ I \end{pmatrix}
= \begin{pmatrix} J \\ F \end{pmatrix}. \tag{3.22}
\]
In this system, Y_R is the nodal admittance matrix containing only contributions of branches whose currents are not output currents. Matrix B handles the contributions of output and/or controlling currents to the KCL, C contains the contributions of voltage-defined branches to the KVL and D contains contributions of current-controlled branches and/or branches with output currents to the BCR. Furthermore, V contains the node voltages, I contains the output and/or controlling currents, J contains the independent source currents and F contains the independent source voltages and/or output currents. Table 3.4 shows the stamps for the branches which are not covered by nodal analysis.
Branch type (BCR)                   Row variable   v+     v-     i      RHS
Voltage source (V = V(t))           v+                           1
                                    v-                           -1
                                    i_V            1      -1            V
V-def resistor (V = V(i_R))         v+                           1
                                    v-                           -1
                                    i_R            1      -1            V
Capacitor (i_C = q̇_C(v+ - v-))      v+             *C     -*C
                                    v-             -*C    *C
Inductor (v_L = φ̇_L(i_L))           v+                           1
                                    v-                           -1
                                    i_L            1      -1     -*L

Table 3.4: Stamps for modified nodal analysis (the column i denotes the branch current variable i_V, i_R or i_L).
The following example shows how to create this system of equations.
Example
The RL-circuit of Figure 3.1 can be represented in matrix-vector form by applying the stamps of Tables 3.2 and 3.4:

\[
\begin{pmatrix}
\frac{1}{R} & -\frac{1}{R} & 0 & 0 & 1 \\
-\frac{1}{R} & \frac{1}{R} & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & -1 \\
0 & 1 & -1 & -\ast_L & 0 \\
1 & 0 & -1 & 0 & 0
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_0 \\ i_L \\ i_S \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ V(t) \end{pmatrix}, \tag{3.23}
\]

where the rows correspond to the equations for v_1, v_2, v_0, i_L and i_S respectively.
To make the system determined, node 0 is grounded. As a result, the equation corresponding to v_0 must be deleted, together with v_0 and the corresponding column. The resulting system is

\[
\begin{pmatrix}
\frac{1}{R} & -\frac{1}{R} & 0 & 1 \\
-\frac{1}{R} & \frac{1}{R} & 1 & 0 \\
0 & 1 & -\ast_L & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ i_L \\ i_S \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ V(t) \end{pmatrix}. \tag{3.24}
\]
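A small Python sketch of equation (3.24) (purely illustrative; the component values are made up, and the DC interpretation *_L = 0, which anticipates Chapter 4, is an assumption of this sketch):

import numpy as np

def mna_system(R, sL, V):
    # The 4x4 system of equation (3.24) for the RL-circuit of Figure 3.1,
    # unknowns ordered (v1, v2, i_L, i_S); sL stands for the analysis-
    # dependent inductor entry written *_L in Table 3.4.
    G = np.array([[ 1.0 / R, -1.0 / R, 0.0, 1.0],
                  [-1.0 / R,  1.0 / R, 1.0, 0.0],
                  [ 0.0,      1.0,     -sL, 0.0],
                  [ 1.0,      0.0,     0.0, 0.0]])
    rhs = np.array([0.0, 0.0, 0.0, V])
    return G, rhs

# In a DC analysis the inductor acts as a short circuit, i.e. *_L = 0:
G, rhs = mna_system(R=1.0e3, sL=0.0, V=5.0)
v1, v2, iL, iS = np.linalg.solve(G, rhs)
print(v1, v2, iL, iS)   # 5.0, 0.0, 0.005, -0.005: node 2 is shorted to ground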
3.4 Hierarchical approach
The examples of circuits discussed so far are all examples of circuits consisting of elementary sub-circuits: branches, which cannot be subdivided into other sub-circuits. In practice, one wants to study large scale circuits. These large scale circuits usually are built on a modular design: the circuits consist of several sub-circuits, which also consist of several sub-sub-circuits. The main purpose of this modular design is to encapsulate certain characteristics which are not of interest and to hide the actual design of sub-circuits. This simplifies the design of big, complicated circuits, and stimulates re-usability to decrease the throughput time of a design phase.

Figure 3.2: A hierarchical circuit, with on the left side the super-circuit and on the right side the sub-circuit Subm1.
However, modular design also has consequences for the way the system of circuit equations, arising
from the circuit, can be solved. The previous examples all used a so-called plain strategy: the whole circuit is described using one matrix. Spice, Spectre and Titan are examples of simulators which use
the plain strategy. Modular design introduces another, alternative, approach: a hierarchical strategy.
Since the circuit consists of sub-circuits, it is natural to view those sub-circuits as sub-problems of the
original problem. This section will describe how this hierarchical approach works. Example circuit
simulators which use the hierarchical approach are Pstar and Brazil.
3.4.1 Hierarchical circuit equations
Figure 3.2 shows an example of a hierarchical circuit, which is in fact the hierarchical variant of the
circuit in figure 2.1. Although the sub-model in Figure 3.2 contains a loop, in general a sub-model
does not have to contain a loop. The highest level of a circuit, i.e. the circuit which is not the subcircuit of another circuit, is called circuit level. In general, a sub-circuit consists of two kinds of
variables: internal variables and terminal variables. Internal variables are variables which are of no
direct meaning to higher level circuits. In figure 3.2, node v3 is an example of an internal variable
of sub-circuit Subm1 . Terminal variables however, are variables which are needed in BCRs of higher
level circuits. Node v2 is a typical terminal variable of sub-circuit Subm1 , for it is needed on both the
sub-circuit and the circuit level.
It is important to note that a sub-circuit only communicates with its parent circuit in the hierarchy through its terminal unknowns. Within this parent circuit, such an unknown may again appear as
terminal unknown or as an internal unknown.
Sub-circuits are processed as if they were normal circuits, so the circuit equations for sub-circuit j
with n_j variables stored in x^{(j)} ∈ R^{n_j} read

\[
f^{(j)}(t, x^{(j)}, \dot{x}^{(j)}) := \frac{d}{dt} q^{(j)}(t, x^{(j)}) + j^{(j)}(t, x^{(j)}) = 0, \tag{3.25}
\]

with q^{(j)}, j^{(j)} : R × R^{n_j} → R^{n_j}. In matrix-vector notation, an equivalent system can be obtained, for example, for sub-circuit Subm1 in Figure 3.2:

\[
q^{(1)}\!\begin{pmatrix} v_0 \\ v_2 \\ i_L \\ v_3 \end{pmatrix}
= \begin{pmatrix} 0 \\ C_2 (v_2 - v_3) \\ -L i_L \\ -C_2 (v_2 - v_3) \end{pmatrix}, \qquad
j^{(1)}\!\begin{pmatrix} v_0 \\ v_2 \\ i_L \\ v_3 \end{pmatrix}
= \begin{pmatrix} -i_L - \frac{1}{R}(v_3 - v_0) \\ i_L \\ v_2 - v_0 \\ \frac{1}{R}(v_3 - v_0) \end{pmatrix}. \tag{3.26}
\]
Note that there is no need yet to ground a node. Furthermore, the current through the inductor, i_L, has also become a terminal variable, as it is needed on the circuit level. In general, when there is a completely connected list of branches from terminal a to terminal b, each i_x which is treated as unknown becomes a terminal variable. The super-circuit is treated in the same way; here the sub-circuit is viewed as a black box, whose contributions appear later:

\[
q_{\mathrm{Super}}\!\begin{pmatrix} v_0 \\ v_1 \\ v_2 \end{pmatrix}
= \begin{pmatrix} 0 \\ C_1 (v_1 - v_2) \\ -C_1 (v_1 - v_2) \end{pmatrix}, \qquad
j_{\mathrm{Super}}\!\begin{pmatrix} v_0 \\ v_1 \\ v_2 \end{pmatrix}
= \begin{pmatrix} -I \\ I \\ 0 \end{pmatrix}. \tag{3.27}
\]
Before verifying that this hierarchical formulation indeed is equivalent to the plain matrix formulation,
some logistical questions must be answered. It is useful to divide the variables into two groups, namely
the internal variables and the terminal variables. In Pstar² the vector of unknown variables x^{(j)} ∈ R^{n_j} is divided such that y^{(j)} ∈ R^{m_j} contains the m_j terminal variables and z^{(j)} ∈ R^{k_j} contains the k_j internal variables:

\[
x^{(j)} = \begin{pmatrix} y^{(j)} \\ z^{(j)} \end{pmatrix}. \tag{3.28}
\]
At the circuit level, the internal variables of the circuit level are completed with the terminal variables of all sub-circuits, resulting in the vector x at the circuit level:

\[
x = \begin{pmatrix} y \\ z^{(1)} \\ \vdots \\ z^{(n)} \end{pmatrix}, \tag{3.29}
\]
where all the terminal variables of the sub-circuits are collected in y (without loss of generality, only
the case in which there are two levels in the hierarchy is discussed). The choice to store the internal
variables at higher indices than the terminal variables is an arbitrary choice. The consequences of this
choice for the way the system can be solved, will be discussed later.
2 Pstar is an analogue circuit simulator used by Philips.
The variables in equation (3.26) are already ordered in the right way. On the circuit level, the vector of unknowns becomes

\[
x = \begin{pmatrix} v_0 \\ v_1 \\ v_2 \\ i_L \\ v_3 \end{pmatrix},
\]

where node v_0 is chosen at the first position, for it will be grounded. In fact, node v_0 acts as a terminal variable on the circuit level. The terminal variables are in a particular way connected to or associated with variables on a higher level, and possibly also connected with variables in another sub-circuit. These terminal variables can have other names in sub-circuits, so there should be a mapping from the variables visible on the circuit level to the terminal variables of the sub-circuits. This mapping is called B_j for sub-circuit j, with B_j ∈ R^{n×m_j} defined as

\[
B_j(i, k) = \begin{cases}
1 & \text{if node } k \text{ of sub-circuit } j \text{ is connected with node } i \text{ of the super-circuit},\\
0 & \text{if node } k \text{ of sub-circuit } j \text{ is not connected with node } i \text{ of the super-circuit}.
\end{cases} \tag{3.30}
\]

So B_j^T x results in the terminal variables of sub-circuit j in terms of super-circuit variables, i.e. y^{(j)}, while B_j y^{(j)} converts the terminal variables of sub-circuit j to their associated variables at the super-circuit.
With these mappings B_j, the complete set of circuit equations at the circuit level becomes

\[
f(t, x, \dot{x}) = \sum_{j=1}^{N} B_j f^{(j)}(t, B_j^T x, B_j^T \dot{x})
= \frac{d}{dt} \sum_{j=1}^{N} B_j q^{(j)}(t, B_j^T x) + \sum_{j=1}^{N} B_j j^{(j)}(t, B_j^T x) = 0 \tag{3.31}
\]

for a circuit consisting of N sub-circuits.
Proceeding with the example, mapping B_1 becomes

\[
B_1 = \begin{pmatrix}
1 & 0 & 0 \\
0 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0
\end{pmatrix},
\]

where the rows correspond to the circuit-level variables v_0, v_1, v_2, i_L, v_3 and the columns to the terminal variables v_0, v_2, i_L of Subm1.
Using mapping B_1, the complete system for the circuit level can be constructed:

\[
f(t, x, \dot{x}) = \frac{d}{dt}\, q\!\begin{pmatrix} v_0 \\ v_1 \\ v_2 \\ i_L \\ v_3 \end{pmatrix} + j\!\begin{pmatrix} v_0 \\ v_1 \\ v_2 \\ i_L \\ v_3 \end{pmatrix}
= \frac{d}{dt}\begin{pmatrix} 0 \\ C_1 (v_1 - v_2) \\ -C_1 (v_1 - v_2) + C_2 (v_2 - v_3) \\ -L i_L \\ -C_2 (v_2 - v_3) \end{pmatrix}
+ \begin{pmatrix} -i_L - \frac{1}{R}(v_3 - v_0) - I \\ I \\ i_L \\ v_2 - v_0 \\ \frac{1}{R}(v_3 - v_0) \end{pmatrix} = 0.
\]
Considering the matrix-vector notation, the plain matrix representation can be obtained from the sub-circuits in the following way:

\[
A = \sum_{j=1}^{N} B_j A_j B_j^T, \tag{3.32}
\]
\[
x = \sum_{j=1}^{N} B_j x_j, \tag{3.33}
\]
\[
b = \sum_{j=1}^{N} B_j b_j, \tag{3.34}
\]

with A_j, x_j and b_j corresponding to sub-circuit j.
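The assembly (3.32)-(3.34) can be mimicked in a few lines of Python (illustrative only; the sub-circuit contribution and the super-circuit contribution used below are made-up matrices, and Pstar never forms the plain matrix in this way, as the next subsection explains):

import numpy as np

def expand(B, A_sub):
    # Map a sub-circuit contribution to circuit-level coordinates: B A_sub B^T.
    return B @ A_sub @ B.T

# Mapping B1 of the example: circuit-level unknowns (v0, v1, v2, iL, v3),
# terminal unknowns of Subm1 (v0, v2, iL).
B1 = np.array([[1., 0., 0.],
               [0., 0., 0.],
               [0., 1., 0.],
               [0., 0., 1.],
               [0., 0., 0.]])

# Made-up 3x3 contribution of Subm1 to its terminal unknowns and made-up
# 5x5 contribution of the super-circuit itself.
A_sub   = np.array([[ 2., -1.,  0.],
                    [-1.,  2., -1.],
                    [ 0., -1.,  1.]])
A_super = np.diag([1., 3., 1., 0., 0.])

A_plain = A_super + expand(B1, A_sub)       # equation (3.32) for this example
print(A_plain)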
3.4.2 Solving using the hierarchy
It is possible to construct a plain matrix out of all the sub-matrices from the sub-circuits, but this is
highly inefficient. Even if a sparse storage method is used, memory usage will grow and performance
will decrease (however, in [5] sparse pivoting promises good results). Besides that, the whole process of constructing the matrix is unnecessary. Instead, a bottom-up strategy is used, following the hierarchy of the circuit. In fact, the Gaussian elimination necessary for the UL-decomposition, explained in the next paragraph, can be done within the memory allocated for the matrices of the sub-circuits.
First, for all sub-circuits on the lowest level, a UL-decomposition is performed on the part of the matrix concerning the internal variables, part A_{22} ∈ R^{k_j×k_j} in equation (3.35):

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}. \tag{3.35}
\]
After that, the columns of the first m_j rows, corresponding to the k_j internal variables (part A_{12} ∈ R^{m_j×k_j}), are also eliminated to zero. Next, the part associated with the terminal variables, part A_{11} ∈ R^{m_j×m_j}, can be added to the corresponding part of the matrix of the higher level. Corresponding actions are done with the right-hand side b. Proceeding in this way until the circuit level is reached, the result is a matrix which is completely UL-decomposed and a right-hand side that contains U^{-1} b. Finally, to solve the system, one grounds one node on the circuit level and solves for the terminal variables, followed by a top-down traversal of the sub-circuits to substitute the terminal values and solve for the internal values.

This way of performing Gaussian elimination is optimal from a memory point of view, but it may not be from a stability point of view when compared to a complete flat matrix approach.
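The elimination of the internal block A_{22} described above amounts to passing a Schur complement to the parent level. The following dense Python sketch illustrates one such step (made-up blocks; a real implementation works in the sparse hierarchical data structure and performs the UL-decomposition in place):

import numpy as np

def eliminate_internals(A11, A12, A21, A22, b1, b2):
    # Eliminate the internal variables (block A22) of one sub-circuit and
    # return the contribution added to the parent level: the Schur complement
    # A11 - A12 A22^{-1} A21 and the updated right-hand side.
    X = np.linalg.solve(A22, A21)
    y = np.linalg.solve(A22, b2)
    return A11 - A12 @ X, b1 - A12 @ y

# Made-up sub-circuit with 2 terminal and 2 internal unknowns.
A11 = np.array([[4., 1.], [1., 3.]]);   A12 = np.array([[1., 0.], [0., 2.]])
A21 = A12.T;                            A22 = np.array([[5., 1.], [1., 4.]])
b1  = np.array([1., 0.]);               b2  = np.array([0., 1.])

S, c = eliminate_internals(A11, A12, A21, A22, b1, b2)

# The terminal part of the flat solution equals the solution of S y = c.
A_full = np.block([[A11, A12], [A21, A22]])
x_full = np.linalg.solve(A_full, np.concatenate([b1, b2]))
print(np.allclose(np.linalg.solve(S, c), x_full[:2]))   # True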
3.4.3 Solving without using the hierarchy
Sometimes it is preferred to solve the system of equations in a non-hierarchical way. Typical examples are analysis methods which require iterative methods with preconditioning that do not work within the hierarchy. In that case the system is handled as if it were a plain system and, consequently, the decomposition of the sub-matrices is postponed until the circuit level. However, the system is still assembled first in a hierarchical way; creating a plain matrix directly is not an option, for efficiency reasons.

An alternative way to use iterative methods with preconditioning is to modify the preconditioners such that they fit into the hierarchical data structure, while maintaining as many as possible of the properties they were constructed for.
Chapter 4
Circuit analysis
The most important result of the previous chapter is the observation that the contributions of all circuit
elements together form a system of first order, nonlinear, ordinary differential-algebraic equations:
\[
f(t, x, \dot{x}) \equiv \frac{d}{dt} q(t, x) + j(t, x) = 0. \tag{4.1}
\]
A differential-algebraic equation (DAE) contains both differential equations (due to for instance a
capacitor) and algebraic equations (due to for instance a resistor). The fact that we are dealing with a system of differential-algebraic equations has consequences for the way the system is solved. The following
sections discuss several kinds of analysis and the way to solve the resulting systems.
4.1 Direct current analysis (DC)
Direct current analysis, often referred to as DC analysis, concerns the steady-state solution (or DC
equilibrium) of a circuit. In the steady-state, all currents are direct currents, i.e. time-independent
currents. Although all sources of excitation are constants, circuits which behave as oscillators will
have a periodic steady-state solution instead, which is still time-varying. However, assuming that a
time-independent steady state (x_DC), which is described by

\[
\dot{x}_{DC} = 0, \tag{4.2}
\]

exists, equation (4.1) can be reduced to

\[
f(t, x, 0) = f_{DC}(x, 0) = 0. \tag{4.3}
\]
Furthermore, in this steady-state situation, all time-derivatives are zero:

\[
f_{DC}(x, \dot{x}) \equiv \frac{\partial}{\partial t} q_{DC}(x) + \frac{\partial}{\partial x} q_{DC}(x)\,\dot{x} + j_{DC}(x) = j_{DC}(x). \tag{4.4}
\]
Hence, a capacitor acts in a DC analysis as an open circuit, while an inductor acts as a short circuit.
An open circuit leads to a singular matrix G; therefore, Pstar bypasses each capacitor with a small
resistor during DC analysis.
To perform a DC analysis, the (in general non-linear) system to solve is f_DC(x, 0) = 0, which is

\[
j_{DC}(x) = 0. \tag{4.5}
\]

The solution x_DC of (4.5), also called the DC operating point, represents the circuit variables in the DC steady state.
Equation (4.5) is in general a nonlinear algebraic equation. If the circuit is linear, equation (4.5) of course is a linear algebraic equation. In that case, the corresponding system of linear equations can be solved by a linear solver. Pstar uses a UL-decomposition to solve the linear system. UL-decomposition is a variant of the well-known LU-decomposition. The difference is that UL-decomposition decomposes a matrix A into A = UL, with U an upper-triangular matrix and L a lower-triangular matrix.
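A UL-decomposition is easily derived from an ordinary LU-factorisation by reversing the ordering of the unknowns; the sketch below (illustrative only, without the pivoting a robust implementation needs) shows the idea:

import numpy as np

def ul_decompose(A):
    # A = U L with U unit-upper-triangular and L lower-triangular.  With J
    # the order-reversing permutation, an LU-factorisation J A J = L' U'
    # gives A = (J L' J)(J U' J) = U L.  No pivoting in this sketch.
    n = A.shape[0]
    B = A[::-1, ::-1].copy()                # J A J
    Lp = np.eye(n)
    Up = B.copy()
    for k in range(n):                      # plain LU without pivoting
        for i in range(k + 1, n):
            Lp[i, k] = Up[i, k] / Up[k, k]
            Up[i, k:] -= Lp[i, k] * Up[k, k:]
    return Lp[::-1, ::-1], Up[::-1, ::-1]   # U = J L' J,  L = J U' J

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
U, L = ul_decompose(A)
print(np.allclose(U @ L, A))                # True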
The nonlinear case is more difficult, for the equations can be strongly nonlinear. In practice, the
Newton method is usually used, as it is sufficient to find the roots of equation (4.5). Globally, the
Newton method works as follows. The function j(x) can be expanded in a Taylor series about x_i:

\[
j(x) = j(x_i) + Dj(x_i)(x - x_i) + O(\|x - x_i\|_2^2), \tag{4.6}
\]

where Dj(x) is the Jacobian matrix of j:

\[
(Dj(x))_{ik} = \frac{\partial j(x)_i}{\partial x_k}. \tag{4.7}
\]

The Newton method now tries to solve

\[
j(x_i) + Dj(x_i)(x - x_i) = 0, \tag{4.8}
\]

which suggests choosing x = x_{i+1} and performing the following iteration, starting with x_0:

\[
Dj(x_i)\, x_{i+1} = Dj(x_i)\, x_i - j(x_i). \tag{4.9}
\]
At each step of the iteration, a linear system must be solved by, for instance, UL-decomposition. The stop criterion is based on a maximum number of iterations, a tolerance value for the norm of j(x_i) and a tolerance value for the difference between x_{i+1} and x_i. For non-singular Jacobian matrices, the Newton method can be proved to converge quadratically [30]. If the Jacobian matrices are singular, the Newton method converges linearly at best.
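A compact Python sketch of iteration (4.9) with the stop criteria mentioned above (the test function j and its Jacobian are a made-up two-dimensional example, not a circuit from this report):

import numpy as np

def newton(j, Dj, x0, max_iter=25, tol_f=1e-10, tol_x=1e-10):
    # Newton iteration (4.9): x_{i+1} = x_i - Dj(x_i)^{-1} j(x_i), stopped on
    # a maximum number of iterations, ||j(x_i)|| and ||x_{i+1} - x_i||.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        f = j(x)
        dx = np.linalg.solve(Dj(x), -f)
        x = x + dx
        if np.linalg.norm(f) < tol_f and np.linalg.norm(dx) < tol_x:
            break
    return x

def j(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0,
                     x[0] - x[1]])

def Dj(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

print(newton(j, Dj, x0=[1.0, 0.5]))         # converges to (sqrt(2)/2, sqrt(2)/2)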
It is not difficult to combine the Newton method with the hierarchical solver. Schematically, the
system on the circuit-level is

[ D_11  D_12 ] [ y_{i+1} ]   [ t_i ]
[ D_21  D_22 ] [ z_{i+1} ] = [ b_i ],  (4.10)

where (t_i, b_i)^T is the right-hand side of equation (4.9), the upper part corresponds to the equations for the terminal variables and the lower part corresponds to those for the non-terminal variables. Note that this is not the usual way to arrange the system; it is only used for notational convenience. In equation (4.9), many coordinates of the right-hand side may be zero due to an abundance of linear
elements. The system on the sub-circuit is

[ D_11^(1)  D_12^(1) ] [ y_{i+1}^(1) ]   [ t_i^(1) ]
[ D_21^(1)  D_22^(1) ] [ z_{i+1}^(1) ] = [ b_i^(1) ],  (4.11)
where in this case also, the upper part corresponds to the terminal variables and the lower part corresponds to the internal variables.
As described in Section 3.4, a UL-decomposition is made for D_22^(1), D_12^(1) is eliminated to zero and the right-hand side t_i^(1) and D_11^(1) are updated accordingly to t̃_i^(1) and D̃_11^(1). Next, D̃_11^(1) is added to D_11 and t̃_i^(1) is added to t_i on the circuit-level:

[ D_11 + D̃_11^(1)  D_12 ] [ y_{i+1} ]   [ t_i + t̃_i^(1) ]
[ D_21             D_22 ] [ z_{i+1} ] = [ b_i           ].  (4.12)

Note that it is assumed here that y_{i+1} contains precisely the terminal variables that also appear in y_{i+1}^(1).
In general this will not be the case, so some extra bookkeeping is needed. Next, a node is chosen to be grounded and the internal variables are solved together with the terminal variables, which are then passed down to the sub-circuit, as described in Section 3.4.
Parts of circuits which are only connected by capacitors are called floating areas. It has been remarked
that capacitors are bypassed by small resistors during DC analysis. In fact, the contribution of a
capacitor becomes ρC, where C is the sub-matrix containing the original contribution of the capacitor
and ρ is a scaling factor such that ρC represents an equivalent resistor.
4.2 Small signal analysis (AC)
Small signal analysis, sometimes referred to as alternating current analysis (AC analysis), studies the perturbation of the DC solution due to a small time-varying sine-wave excitation. Therefore, it is
best suited to linear circuits or nonlinear circuits behaving in a linear way with respect to the small
excitation. Because many interesting circuits operate this way, small signal analysis is of great value
in practice.
Small signal analysis starts with the steady-state solution x(t) = x_DC of a time-independent system

f_DC(x, ẋ) = (∂q(x)/∂x) ẋ + j(x) = 0,  (4.13)

i.e. x(t) = x_DC is the solution of

f_DC(x, 0) = 0.  (4.14)
Hence AC analysis starts with finding x DC . Although it is not guaranteed that such a solution exists,
this is assumed for the rest of this section.
The small signal analysis problem is now formally defined by the system which results from adding an independent small sine-wave excitation e(t) to system (4.13):

f_AC(x, ẋ) = f_DC(x, ẋ) = (d/dt) q(x) + j(x) = e(t),  (4.15)
where x AC (t) = x DC + x(t) and e(t) is independent of x. The next step is to expand the functions q
and j in a Taylor series about the steady-state solution (or operating point) x DC . If the excitation e(t)
is small enough, second order and higher terms in x can be omitted from the Taylor expansions of q
and j. The result of this linearisation about the operating point is
d/dt [ q(x_DC) + (∂q/∂x)|_{x_DC} x(t) ] + j(x_DC) + (∂j/∂x)|_{x_DC} x(t) = e(t).  (4.16)

Observing that q(x_DC) is constant, and thus (d/dt) q(x_DC) = 0, that the Jacobian matrix (∂q/∂x)|_{x_DC} is also time-independent, and that the steady-state solution x_DC satisfies j(x_DC) = 0, the above equation reduces to

(∂q/∂x)|_{x_DC} ẋ(t) + (∂j/∂x)|_{x_DC} x(t) = e(t).  (4.17)
The two Jacobian matrices are often referred to as C and G:
C = (∂q/∂x)|_{x_DC},  (4.18)
G = (∂j/∂x)|_{x_DC}.  (4.19)
The system of linear ordinary differential-algebraic equations, which describes the small signal behaviour of the circuit, becomes

C ẋ(t) + G x(t) = e(t).  (4.20)
System (4.20) is commonly solved in the frequency domain. To that end, the real system (4.20) is transformed to a complex system by writing x(t) = X exp(iωt) and e(t) = E exp(iωt), with X and E the (complex) amplitude vectors, i the imaginary unit and ω the angular frequency:

C (d/dt)(X exp(iωt)) + G X exp(iωt) = E exp(iωt).  (4.21)

In general, for small signals e(t) = E exp(iωt) it is assumed that ||e(t)|| = |E| << 1. In practice, however, the signal e(t) is scaled such that E = 1, to avoid problems when computing with a value of |E| near the machine precision. After propagating the derivative, the result is a linear algebraic complex system:

(iωC + G)X = E.  (4.22)
This system is usually solved by LU-factorisation, which works the same for complex and real matrices, or by an iterative linear solver like conjugate gradients (for a complex variant of conjugate gradients, see [2]).
The solution of the original system (4.20) is the real part of the complex solution of system (4.22).
In practice, designers want to study the behaviour of the circuit over a range of frequencies, so the
system should be solved for a range of values of the angular frequency ω. This process is called a
frequency sweep. As a result, the complex matrix iωC + G must be re-factored for each value of
ω. However, as this usually is done symbolically, and the structure of the matrix is unchanged for
different values of ω, one symbolic factorisation is sufficient for the whole range of frequencies.
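As a sketch of how such a frequency sweep can be carried out numerically (here with a dense solver instead of the symbolic factorisation used in Pstar), consider the following Python fragment; the one-unknown C and G matrices are a hypothetical reduction of the RC low-pass filter of Chapter 6, with the values R = 100 Ohm and C = 10 nF used there.

import numpy as np

def ac_sweep(C, G, E, omegas):
    """Solve (i*omega*C + G) X = E for each angular frequency omega in the sweep."""
    return [np.linalg.solve(1j * w * C + G, E) for w in omegas]

# hypothetical scalar system (i*omega*R*C + 1) Vo = 1, i.e. the RC low-pass response
R, Cval = 100.0, 10e-9
C_mat, G_mat, E_vec = np.array([[R * Cval]]), np.array([[1.0]]), np.array([1.0])
omegas = np.logspace(0, 8, 81)
X = ac_sweep(C_mat, G_mat, E_vec, omegas)
gain_dB = [20.0 * np.log10(abs(x[0])) for x in X]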
4.3 Transient analysis (TR)
In practice, most circuits are designed for real-life applications. In those cases, a full time-domain simulation of the circuit is desired. This is called transient analysis (TR analysis). The response of the circuit is determined over a time interval [0, T], computed as the solution of the DAE
f(t, x, ẋ) := (d/dt) q(t, x) + j(t, x) = 0.  (4.23)
In general, two types of problems arise in transient analysis. The first one is an initial value problem (IVP). In that case, the initial state of the circuit x(t_0) = x_DC is the operating point or steady-state solution of the circuit (which has to be computed first). However, as stated earlier, a steady-state solution is not always a constant solution. In some important cases, a so-called periodic steady-state can be found instead. Here, the initial state of the circuit is equal to its state one period T_p later: x(t_0) = x(t_0 + T_p). This type of problem is called a two-point boundary value problem and is common in periodic steady-state analysis [16].
A common way to solve the DAE representing the circuit equations is a combination of an implicit
integration method and a nonlinear solver. The integration method is often chosen from the class of
linear multi-step methods. A linear s-step method computes approximate solutions q(t_i, x_i) at the times t_i = t_0 + iΔt using the following formula:

q(t_{i+s}, x_{i+s}) ≈ Σ_{j=0}^{s−1} α_j q(t_{i+j}, x_{i+j}) + Δt Σ_{j=0}^{s} β_j (d/dt) q(t_{i+j}, x_{i+j}),

with s ≥ 1 an integer, α_0, α_1, ..., α_{s−1} ∈ R and β_0, β_1, ..., β_s ∈ R. Furthermore, α_0² + β_0² > 0 to ensure that it is a real s-step method and not an s′-step method with s′ < s. The well-known Euler Forward scheme is given by s = 1, α_0 = β_0 = 1 and β_1 = 0, while the implicit Euler Backward scheme is given by s = 1, α_0 = β_1 = 1 and β_0 = 0. In general, a linear s-step method is called implicit if β_s ≠ 0 and explicit if β_s = 0.
The derivative (d/dt) q(t, x) of system (4.23) in iteration i + s can be approximated by

(d/dt) q(t_{i+s}, x_{i+s}) ≈ −(1/(β_s Δt)) Σ_{j=0}^{s} α_j q(t_{i+j}, x_{i+j}) − (1/β_s) Σ_{j=0}^{s−1} β_j (d/dt) q(t_{i+j}, x_{i+j}),  (4.24)

with α_s = −1. For notational convenience, b_{i+s} is defined as

b_{i+s} = −(1/(β_s Δt)) Σ_{j=0}^{s−1} α_j q(t_{i+j}, x_{i+j}) − (1/β_s) Σ_{j=0}^{s−1} β_j (d/dt) q(t_{i+j}, x_{i+j}),
such that equation (4.24) can be rewritten as
(d/dt) q(t_{i+s}, x_{i+s}) ≈ b_{i+s} + (1/(β_s Δt)) q(t_{i+s}, x_{i+s}).  (4.25)
Note that b_{i+s} is known during iteration i + s. Substitution of (4.25) in equation (4.23) results in a non-linear algebraic equation for x_{i+s},

b_{i+s} + (1/(β_s Δt)) q(t_{i+s}, x_{i+s}) + j(t_{i+s}, x_{i+s}) = 0,  (4.26)
which can be solved for x_{i+s} by a Newton method. The coefficient matrix of this Newton process is the Jacobian of the left-hand side of equation (4.26), namely

(1/(β_s Δt)) C(t, x) + G(t, x).  (4.27)
Hence, at each time-step a non-linear equation has to be solved.
Explicit integration methods usually give poor results for (stiff) DAEs, so one is restricted to the use
of implicit linear multi-step methods.
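The following Python sketch shows one way to combine the implicit Euler Backward scheme (s = 1, β_1 = 1) with a Newton solver for equation (4.26), using the coefficient matrix (4.27); the scalar RC discharge used for illustration is a hypothetical example, not Pstar code.

import numpy as np

def transient_euler_backward(q, j, C_mat, G_mat, x0, t0, dt, n_steps,
                             newton_iters=20, tol=1e-12):
    """Euler Backward applied to d/dt q(x) + j(t, x) = 0: each time-step solves
    (q(x_new) - q(x_old))/dt + j(t_new, x_new) = 0 with Newton, whose coefficient
    matrix is C/dt + G as in (4.27)."""
    xs, t = [np.asarray(x0, dtype=float)], t0
    for _ in range(n_steps):
        t += dt
        x_old, x = xs[-1], xs[-1].copy()
        for _ in range(newton_iters):
            F = (q(x) - q(x_old)) / dt + j(t, x)
            if np.linalg.norm(F) < tol:
                break
            x = x - np.linalg.solve(C_mat(x) / dt + G_mat(x), F)
        xs.append(x)
    return np.array(xs)

# hypothetical scalar RC discharge: q(v) = C v, j(v) = v / R
R, C = 100.0, 10e-9
v = transient_euler_backward(lambda x: C * x, lambda t, x: x / R,
                             lambda x: np.array([[C]]), lambda x: np.array([[1.0 / R]]),
                             np.array([1.0]), 0.0, 1e-7, 50)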
4.4 Pole-zero analysis (PZ)
Three kinds of analysis methods have been discussed in the previous sections. DC analysis (DC)
tries to compute an equilibrium state of the circuit, transient analysis (TR) studies the time domain
behaviour of the circuit and small-signal analysis (AC) is used to compute the linearised response of a
circuit to a sinusoidal input source. Pole-zero analysis (PZ), which concerns a study of circuit transfer
functions, will be introduced in this section and described in detail in Chapter 6.
4.4.1 Transfer function
Pole-zero analysis is a method to compute the dynamic response of a circuit variable or expression to
small pulse excitations by an independent source. This response is described by the transfer function,
which is usually presented in its pole-zero representation. Although pole-zero analysis seems to be
the same as AC analysis, there is an important difference. Where AC analysis only computes the linearised response of a circuit to a sinusoidal input source, pole-zero analysis additionally provides, through the pole-zero representation, stability properties of the circuit, as will become clear in the following sections.
The circuit transfer function H(s) describes the response of a linear circuit to source variations in the
Laplace (i.e. frequency) domain. The transfer function H(s) is defined by
H(s) = [L(zero-state response)(s)] / [L(source variation)(s)],  (4.28)
where L( f )(s) is the Laplace transform of a function f defined in the time domain and s is the (complex) variable in the frequency domain. The zero-state response denotes the response to the stationary
solution (i.e. the solution when all free oscillations have damped out). The zero-state response does
not depend on the initial conditions, but depends only on the excitation.
As will be described in Chapter 6, the transfer function H(s) can be obtained by applying a Laplace transformation to the small-signal MNA model (4.22):

H(s) = (sC + G)⁻¹.  (4.29)
4.4.2 Physical applications
Each entry H_oi(s) of the matrix H(s) will be represented in terms of poles, zeroes and residues. Each H_oi(s) corresponds with the zero-state response of one circuit variable to a variation of a single source variable. The poles of H_oi(s) represent the natural frequencies of the circuit transfer function H(s). The natural frequencies in the Laplace domain give information about damping and amplification properties in the time domain. As a consequence, the locations of the natural frequencies of a circuit in the Laplace domain determine the physical stability of a linear circuit and are thus of great interest.
The natural frequencies are the frequencies of the free oscillations of a circuit (i.e. oscillations due
solely to capacitors and/or inductors). The zeroes in general have no specific interpretation, but in
case of input impedances and admittances, the zeroes also tell something about the stability[6].
4.4.3 Numerical strategies
There are two different strategies to perform pole-zero analysis. The first strategy is called the separate
pole-zero approximation[32] and tries to locate the roots of polynomials, which are the determinants
of two linearly parameterised square matrices. As will become clear in chapter 6, this strategy can
be proven to be equivalent to solving a generalised eigenvalue problem. A generalised eigenvalue
problem is defined by calculating the eigenvalues λg of
G x = λ_g C x.  (4.30)
The remainder of this report will mostly be dedicated to numerical iterative solvers for generalised
eigenvalue problems, applied to pole-zero analysis.
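For small problems, the generalised eigenvalue problem (4.30) can be solved directly with a dense QZ-based routine, as in the following sketch; the iterative methods studied in this report target the larger case. The 2×2 matrices are purely illustrative.

import numpy as np
from scipy.linalg import eig

# solve G x = lambda_g C x with a dense (QZ-based) solver;
# a singular C gives rise to infinite eigenvalues
G = np.array([[1.0, -1.0],
              [-1.0, 2.0]])
C = np.array([[1e-9, 0.0],
              [0.0, 0.0]])          # singular: expect one infinite eigenvalue
lam_g, X = eig(G, C)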
The second strategy is called the combined approximation of the pole-zero pattern. Here, the transfer
function is treated like a rational function, as it is the quotient of two polynomials. There are several
numerical analysis methods for rational functions which approximate the poles and zeroes of the
transfer function.
4.5 Stamps for the C and G matrices
The importance of the matrices C and G, defined by
C(t, x) = ∂q(t, x)/∂x  and  G(t, x) = ∂j(t, x)/∂x,  (4.31)
has become clear in the above sections. In DC analysis, the matrix G(x, 0) is used in the Newton-(Raphson) process to find the roots of f(x, 0). In AC analysis, the matrix iωC(t, x) + G(t, x) forms, together with the vector of unknowns and the right-hand side, the linear algebraic system to solve.
Transient analysis also needs to solve a linear system each time-step, where the matrices C and G are
needed. Finally, the matrices are of great importance for pole-zero analysis, as the transfer function
(4.29) is defined in terms of C and G.
Table 4.1 (contributions to C and G of several branches) lists, for each branch type discussed so far — voltage source (V = V(t)), resistor (i_R = I(v)), voltage-defined resistor (V = V(i_R)), capacitor (i_C = q̇_C(v⁺ − v⁻), with and without the charge q_C as extra unknown) and inductor (v_L = φ̇_L(i_L), with and without the flux φ_L as extra unknown) — the contributions of the branch variables (v⁺, v⁻, the branch current, and possibly q_C or φ_L) to the G and C matrices. The entries are the partial derivatives of the branch contributions to j and q, e.g. ±dI/dv for the resistor in G, ±dq/dv for the capacitor in C, ±dφ/di for the inductor, and ±1 entries in the rows and columns of the branch-current unknowns.

Table 4.1: Contributions to C and G of several branches.
According to their definition, the contributions of branches to C and G can be deduced directly from their contributions to j and q, which are given in Tables 3.1 and 3.3 in Chapter 3. Table 4.1 shows a complete overview of the contributions of the branches discussed so far, except the current source, which has no contribution. Note that the inductor with unknown flux and the capacitor with unknown charge give asymmetric contributions to C and G respectively, while all other branches give symmetric contributions. The matrices G^(j) and C^(j) are related to G and C of the circuit in the following way:
G(t, x) = Σ_{j=1}^{N} B_j G^(j)(t, B_j^T x) B_j^T,
C(t, x) = Σ_{j=1}^{N} B_j C^(j)(t, B_j^T x) B_j^T,

where B_j is the association matrix as defined in Section 3.4.
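A minimal sketch of this assembly process: each association matrix B_j only selects the circuit variables of branch j, so in code it can be represented by an index list; the resistor and capacitor stamps used below are the standard symmetric ±dI/dv and ±dq/dv patterns of Table 4.1, specialised to linear elements.

import numpy as np

def assemble(branches, n):
    """Assemble circuit-level G and C from branch stamps,
    G = sum_j B_j G^(j) B_j^T and C = sum_j B_j C^(j) B_j^T,
    with B_j represented by the list of global variable indices of branch j."""
    G, C = np.zeros((n, n)), np.zeros((n, n))
    for idx, Gj, Cj in branches:
        G[np.ix_(idx, idx)] += Gj
        C[np.ix_(idx, idx)] += Cj
    return G, C

# hypothetical example: linear resistor R between nodes 0 and 1,
# linear capacitor Cc between nodes 1 and 2
R, Cc = 100.0, 10e-9
res_G = (1.0 / R) * np.array([[1.0, -1.0], [-1.0, 1.0]])
cap_C = Cc * np.array([[1.0, -1.0], [-1.0, 1.0]])
branches = [([0, 1], res_G, np.zeros((2, 2))),
            ([1, 2], np.zeros((2, 2)), cap_C)]
G, C = assemble(branches, 3)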
Chapter 5
Terminal currents
As outlined before, controlled branches are branches whose branch equations contain variables of other branches or of other nodes than their end nodes. An example of a controlled branch, a current-controlled voltage-defined branch in particular, is a voltage-defined resistor where the voltage is a function of the current through a terminal node of a sub-circuit. Such currents are called terminal currents. Figure 5.1 shows a part of a circuit with a sub-circuit and a terminal current-controlled voltage-defined resistor.
Figure 5.1: A part of a circuit with a sub-circuit (inside the box) and a terminal current-controlled voltage-defined resistor. The sub-circuit contains the resistors r1 (with i = v² + v) and r3 (with resistance R); on the circuit level, the resistor r2 (with v = i(v1)² + i(v1)) is controlled by the terminal current i(v1), and e1 is a voltage source V.
The terminal current i(v1) is needed on the circuit level to evaluate the voltage over resistor r2. There are three possibilities to take care of this, namely
1. Add an extra equation for i(v1 ) to the equations of the circuit level, or,
2. Push the terminal current i(v1) as an extra (terminal) unknown to the set of equations of the sub-circuit, which results in an equation for i(v1) on the sub-model.
3. Eliminate the terminal current i(v1 ) on the circuit level.
The three alternatives will be worked out in the next sections for the circuit in figure 5.1.
5.1 The terminal current as unknown on the circuit level
Currently, Pstar uses this alternative to handle terminal currents. To make clear what actually happens, modified nodal analysis is applied to the example circuit in Figure 5.1 in order to perform a DC analysis. For the sub-circuit, the vector j^(1) of resistive contributions and its Jacobian G^(1) read
        [ I(v1 − v3)                   ]   v1
j^(1) = [ −(1/R)(v3 − v2)              ]   v2
        [ −I(v1 − v3) + (1/R)(v3 − v2) ]   v3

and

         [  dI/dv     0      −dI/dv       ]   v1
G^(1) =  [  0         1/R    −1/R         ]   v2
         [ −dI/dv    −1/R     dI/dv + 1/R ]   v3,
with I(v) = v² + v, while j and G for the circuit level are (unknowns ordered as v0, v1, v2, i(r2), i(v1), i(e1))

     [ i(r2) − i(e1)       ]                [  0   0              0              1    0       −1 ]
     [ i(e1) + j̃^(1)(v1)   ]                [  0   G̃^(1)_{v1v1}  G̃^(1)_{v1v2}   0    0        1 ]
j =  [ i(r2) + ∗           ]     and   G =  [  0   ∗              ∗              1    0        0 ]
     [ −v0 + v2 − V(i)     ]                [ −1   0              1              0   −dV/di    0 ]
     [ −i(v1) + j̃^(1)(v1)  ]                [  0   G̃^(1)_{v1v1}  G̃^(1)_{v1v2}   0   −1         0 ]
     [ −v0 + v1 − V        ]                [ −1   1              0              0    0        0 ],
with V(i) = i² + i. The entries ∗ take their values from the UL-decomposed matrix and correspondingly updated right-hand side of the sub-circuit, while the term j̃^(1)(v1) denotes the swept entry corresponding to v1 on the sub-circuit and G̃^(1)_{v1v1} is the swept entry in row v1, column v1 on the
sub-circuit. Internal variable v3 of the sub-circuit does not exist on the circuit level and hence does not appear in the equations for the circuit level. Note that the equation corresponding to i(v1) is −i(v1) + Σ_k g(v_k) = 0, where the sum runs over all branches k of the sub-model incident on v1 and g(v_k) results from the BCR of the branch. Without loss of generality, v1 is assumed to be the positive end node of branch k, hence v_k = v1 − v⁻, such that g(v_k) reads g(v1 − v⁻) = (v1 − v⁻)² + (v1 − v⁻) explicitly. Thus, the coefficients for the part Σ_k g(v_k) are the same as the inherited coefficients for v1, as can be observed from the rows of G corresponding to v1 and i(v1) respectively (the contributions of all branches of the sub-circuit incident on v1 together form the terminal current i(v1), as there is no accumulation of charge in a node). The equation is completed with −i(v1) to make −i(v1) + Σ_k g(v_k) = 0. The right-hand side of the equation for i(v1) is the evaluation of Σ_k g(v_k) (provided from the equation for v1) plus the value of −i(v1) at the previous iteration.
The Newton process starts with the zero vector as the first approximation. During iteration i, G i and
ji are evaluated at xi and the equation for xi+1 then reads
G_i x_{i+1} = G_i x_i − j_i.  (5.1)
Following the hierarchy in a bottom-up manner, first the Jacobian G (1) of the sub-circuit is partially
U L-decomposed. This gives no problems. The remainder of the Newton process is described in
section 4.1.
Possible difficulties may arise when the sub-circuit is an open circuit, for instance in the case of two series-connected capacitors. Formally, the matrix G^(1) of the sub-circuit would then be a 3 × 3 zero-matrix and thus singular. However, as Pstar bypasses capacitors with resistors during DC analysis, this problem does not occur.
To summarise the handling of terminal currents on the circuit level, a general description follows. If the terminal current of sub-circuit S is denoted i_t, and its corresponding node v_t, then i_t will be equal to Σ_k g(v_k) for all branches k of S incident on v_t. This sum Σ_k g(v_k) is exactly the contribution of the sub-circuit to the KCL for node v_t. Consequently, the row of the Jacobian matrix corresponding to i_t on the circuit level is equal to the contribution for v_t of the sub-circuit (Σ_k dg(v_k)/dv_k) with a −1 in the column corresponding to i_t. The right-hand side is equal to the contribution of the sub-circuit for v_t minus the value of i_t. Equation (5.2) shows the situation schematically.
     [ ⋮             ]
     [ j̃_t^(1) + ĵ_t  ]   v_t
j =  [ ⋮             ]          and the rows v_t and i_t of G read
     [ j̃_t^(1) − i_t  ]   i_t
     [ ⋮             ]

row v_t:  ( ⋯  G̃^(1)_{v_t v_t}  ⋯  G̃^(1)_{v_t v_k}  ⋯   0 )
row i_t:  ( ⋯  G̃^(1)_{v_t v_t}  ⋯  G̃^(1)_{v_t v_k}  ⋯  −1 ),  (5.2)

where the last column shown is the column corresponding to i_t. The term j̃_t^(1) denotes the swept entry corresponding to v_t on the sub-circuit, ĵ_t denotes the contribution to the KCL for v_t on the circuit level and G̃^(1)_{v_t v_t} is the swept entry in row v_t, column v_t on the sub-circuit.
5.2 The terminal current as unknown on the sub-circuit
This section describes the situation which arises if the terminal current is pushed one level down to the sub-circuit as an extra terminal unknown. A corresponding equation arises in the system for the sub-circuit, which can be solved in the usual manner. This time, the right-hand side is processed correctly, i.e. not forgotten. The vector j^(1) and the Jacobian matrix G^(1) of the sub-circuit are
        [ I(v1 − v3)                   ]   v1
        [ −(1/R)(v3 − v2)              ]   v2
j^(1) = [ I(v1 − v3) − i(v1)           ]   i(v1)
        [ −I(v1 − v3) + (1/R)(v3 − v2) ]   v3

and

         [  dI/dv    0      0    −dI/dv       ]   v1
G^(1) =  [  0        1/R    0    −1/R         ]   v2
         [  dI/dv    0     −1    −dI/dv       ]   i(v1)
         [ −dI/dv   −1/R    0     dI/dv + 1/R ]   v3,
and the equivalent vector and matrix for the circuit level are (unknowns ordered as v0, v1, v2, i(r2), i(v1), i(e1))

     [ i(r2) − i(e1)    ]                [  0   0               0               1    0               −1 ]
     [ i(e1) + ∗        ]                [  0   G̃^(1)_{v1v1}   G̃^(1)_{v1v2}    0    0                1 ]
j =  [ i(r2) + ∗        ]     and   G =  [  0   ∗               ∗               1    0                0 ]
     [ −v0 + v2 − V(i)  ]                [ −1   0               1               0   −dV/di            0 ]
     [ ∗                ]                [  0   G̃^(1)_{i_t v1}  G̃^(1)_{i_t v2}  0    G̃^(1)_{i_t i_t}  0 ]
     [ −v0 + v1 − V     ]                [ −1   1               0               0    0                0 ],
with G̃^(1)_{i_t i_t} the swept entry in row i_t = i(v1), column i_t = i(v1). Note that G̃^(1)_{i_t v1} = G̃^(1)_{v1 v1} and G̃^(1)_{i_t v2} = G̃^(1)_{v1 v2}.
It is clear that the only difference with the previous section is that the equation for the terminal current
is completely computed at the sub-circuit. In fact, this strategy may be more expensive than the
strategy of the previous section, as now one extra row has to be swept on the sub-circuit, although it
can directly be derived from the equation for the terminal node.
Two remarks have to be made about the previous two sections. The first concerns the situation which arises if one wants to solve the circuit equations with a plain method. As described, the hierarchy is still used, but the matrices of sub-circuits are not decomposed in advance, as the complete matrix is considered as a plain matrix. In this case it does not make any difference whether the terminal current is added as an unknown to the circuit level or to the sub-circuit.
The second remark applies to the situation in which there are more than two hierarchical levels. Suppose that the terminal current on a sub-circuit also depends on a sub-sub-circuit. It is readily verified
that both approaches give correct results. The contributions to the terminal node, and thus the terminal
current, are still equal in this situation. In fact, every branch is an elementary sub-circuit and the only
difference between bigger sub-circuits and elementary sub-circuits is that the contributions of the bigger sub-circuits contain contributions of more than one branch. These contributions are still defined
in terms of terminal unknowns, so the above strategies also work for deeply nested circuits.
5.3 Eliminating the terminal current on the circuit level
The strategy used in some earlier releases of Pstar contained an alternative approach. The terminal
current was neither added as unknown variable to the circuit level nor as unknown variable to the
sub-circuit. On the circuit level, the unknown i(v1 ) (refer again to Figure 5.1), was simply eliminated
by evaluating i(v1 ) as the contribution to the right-hand side for the equation corresponding to v1 of
the sub-circuit, after U L-decomposing, i.e. the contribution of the sub-circuit, in terms of terminal
variables, to the KCL for node v1 . For instance, assume that the right-hand side corresponding to v1
on the sub-circuit before decomposing is equal to A and that the right-hand side after decomposing
is equal to A − B. The value B is a sum of products of right-hand side values corresponding to
internal variables with Gaussian Elimination factors. In the limit, or when only linear components
are considered for the sub-circuit, B will be equal to zero, as all right-hand side values corresponding
to internal variables will be zero (to satisfy the KCL) and hence the described approach will suffice.
However, in the case of non-linear components, B is non-zero during the iterating process. The result
is that instead of solving the system
F(x) = 0,
one tries to solve the perturbed system
F̃(x) = F(x + g(x)) = F(y) = 0,
which has Jacobian matrix

∂F̃/∂x = (∂F/∂y)(I + ∂g/∂x).
Note that a requirement is that I + ∂g/∂x is invertible. If in the limit g(x) = 0, one automatically has F(x) = 0, so the Newton process will converge to the correct solution. However, it is difficult to make a statement about the rate of convergence. If the matrix G = ∂F/∂y is diagonally dominant (in general it is), pivoting will give no problems and the convergence will be quadratic. A possible danger of this strategy is the situation which occurs if ∂g/∂x = −I, and hence ∂F̃/∂x = 0, or if ∂g/∂x ≈ −I. In this case, the whole Newton process halts.
5.4 Multi-level Newton-Raphson method
In Section 4.1, the Newton-Raphson method has been described as a method to solve the circuit equations for DC analysis which fits in the hierarchy. In [15], a variant of the multi-level Newton-Raphson method (MLNR) is presented, making use of terminal currents. Although it is originally designed for parallel circuit simulation, it has some interesting features.
Like in Section 5.2, the terminal current is added to the sub-circuit as an independent variable and will, together with the terminal variables, appear on the circuit-level. If the internal variables of sub-circuit i are represented by x^(i), the terminal variables by x_E^(i) and the terminal currents by i^i, the nodal equations for the internal and connection nodes become

j^(i)(x^(i), x_E^(i)) = 0,  (5.3)
j_E^(i)(x^(i), x_E^(i)) + i^i = 0.  (5.4)

On the circuit-level, the nodal equations become

j_E(x, x_E, i) = 0.  (5.5)
The MLNR presented in [22] has inner and outer iterations. During the inner iteration, only internal
variables are iterated, while the connection variables are kept constant:
x^(i)_{k,j+1} = x^(i)_{k,j} − (G^(i)_{k,j})⁻¹ j^(i)_{k,j}(x^(i)_{k,j}, x^(i)_{E_k}).  (5.6)
The outer iterations have indices k, the inner iterations have indices j . The inner iterations are stopped
at a certain error level. The outer iterations iterate over the terminal variables and treat the sub-circuits
as macro-models. Despite low communication costs, the work balance is poor, because the number of
internal iterations can vary for the sub-circuits. In [15], the work balance is improved by defining a maximum number of J inner iterations. Furthermore, the local convergence becomes quadratic because the outer iterations treat all variables instead of only the terminal variables, thereby improving the global convergence, at the cost of more expensive iterations. During the inner iterations, the connection variables are still kept constant. Algorithm 1 sketches a template for the new MLNR method, where it is noted that terminal currents are embedded in x_E and f.
Algorithm 1: The new Multi-level Newton-Raphson method.
  Set x_0 and J.
  for k = 1, 2, 3, ... until convergence do
    for j = 0, 1, 2, ..., J − 1 do
      x^(i)_{k,j+1} = x^(i)_{k,j} − (G^(i)_{k,j})⁻¹ j^(i)_{k,j}(x^(i)_{k,j}, x^(i)_{E_k})
    x_{k+1,0} = x_{k,J} − (G_{k,J})⁻¹ j_{k,J}(x_{k,J})
The convergence is measured by ||x_{k+1,0} − x_{k,0}||_2. Convergence of the inner iterations can be controlled by step-size adjustment and terminal currents. The terminal currents as independent variables decouple the f^(i) and f_E^(i) from variables of other sub-circuits, while f_E is constant during inner iterations. The inner iteration should be stopped where the error

[ j^(i)_{k,j+1} ;  j^(i)_{E k,j+1} + i_k ]  (5.7)

is decreased most. The outer iteration starts from the point where this error is minimised (or equal to the starting error).
The simulation results are promising. Increasing the number of inner iterations decreased the running
time and appeared to be a good load balancing method, which may be used in other analyses where
the computational costs are dominating the communication and preprocessing costs. However, the
number of processors should be chosen with care to avoid dominating communication costs, while
the load balance may not be that good for circuits with varying sub-circuits (in terms of linearity).
A remark should be made concerning implementation in Pstar. Because of the hierarchical implementation in Pstar, there may be problems when computing in parallel. Different sub-circuits share
representations of common sub-models and hence cannot be seen as independent sub-circuits. As a result, certain computations must be done sequentially. This can be solved by providing each sub-circuit
with a unique representation of the sub-model. The shared representation has been implemented for
memory saving reasons.
Chapter 6
Pole-zero analysis
6.1 Introduction
In Section 4.4, already a short introduction has been given to pole-zero analysis. Pole-zero analysis
is described as a method to compute the dynamic response of a circuit variable or expression to small
pulse excitations by an independent source, applied to a linear(ised) circuit. The main difference
between AC analysis and pole-zero analysis is the extra stability information pole-zero analysis gives
about the circuit. This stability information can be extracted from the transfer function by means of
the natural frequencies of the circuit. The natural frequencies are important for physical properties
like damping and amplification in the time domain.
This chapter will focus both on the theory behind pole-zero analysis and the numerical formulation of
the pole-zero analysis subject.
6.2 Basic theory of pole-zero analysis
The whole concept of pole-zero analysis is based upon the dynamic response (output) of a circuit
variable or expression to small pulse excitations (input or input-expression) of an independent source.
For example, Figure 6.1 again shows a simple low pass filter. Why this circuit is called a low pass
filter, will become clear soon. Instead of performing MNA on this circuit, the circuit equation is,
without loss of generality, studied in a one-dimensional form
(d/dt) q(t, x) + j(t, x) = e(t).  (6.1)
For the above example, with a periodic voltage source V (t) = V0 exp(iωt), one is interested in the
response current and/or response voltage. Considering the response voltage Vo (t) = VC (t) of an input
voltage Vi (t) = V (t), equation (6.1) becomes
R i_R + V_C(t) = V_0 exp(iωt)  ≡  RC (d/dt) V_o(t) + V_o(t) = V_0 exp(iωt).

The particular solution of this equation is given by V_o(t) = V_0/(1 + iωRC) · exp(iωt), which is the response of the circuit.
Figure 6.1: A simple RC-circuit (voltage source e1, resistor r1, capacitor c1), which acts as a low pass filter.
The transfer function of a circuit is defined by
H(t) = y(t)/x(t),  (6.2)
where y(t) is the response due solely to input sources and x(t) is the excitation. If the input signal
varies exponentially with time, i.e. x(t) ∼ exp( pt), then, because of linearity, the output signal y(t)
also varies exponentially, with the same exponential, with time (in the limit). The transfer function
then becomes
H(p) = y(t)/x(t),  (6.3)
where the transfer function is independent of the time-variable t. If, for instance, x(t) ∼ exp(iωt),
H ( p) will be a function of p = iω. When known, the rate of proportion H ( p) can be used to compute
the response to any excitation. Note that the dimension of the transfer function can take various values,
as the response y(t) and the input x(t) do not necessarily have to be the same quantity (the response
is either an unknown current or an unknown voltage, while the excitation is either a (known) source
current or (known) source voltage). Table 6.1 shows the four possible combinations.
Quantity           | input voltage              | input current
response voltage   | H dimensionless            | impedance ([H] = Ohm)
response current   | admittance ([H] = Ohm⁻¹)   | H dimensionless

Table 6.1: The four possible combinations for the transfer function H.
The transfer function for the ratio between Vo (t) = VC (t) and Vi (t) = V (t) for the example circuit
becomes

H(iω) = V_o(t)/V(t) = [V_0/(1 + iωRC) · exp(iωt)] / [V_0 exp(iωt)] = 1/(1 + iωRC).  (6.4)
A couple of observations can be made. First, the amplitude of the response V_o(t) is a factor √(1 + ω²R²C²) smaller than the excitation. Note that this factor is a function of ω. Secondly, there is a phase shift between the response and the excitation. This phase shift is given by arg H(iω) = arg 1 − arg(1 + iωRC) = −arctan(ωRC), which is also a function of ω. With this information, it is clear why the RC-circuit in Figure 6.1 is called a low-pass filter. Figure 6.2 shows |H(iω)| = 1/√(1 + ω²R²C²) as a function of ω, both on normal scale and logarithmic scale (throughout this report, log must be read as log10, unless stated otherwise).
Figure 6.2: |H(iω)| as a function of ω for the RC-circuit in Figure 6.1, with R = 100 Ohm and C = 10 · 10⁻⁹ Farad. The right figure uses logarithmic and dB axes.
Apparently, high frequencies (ω >> 1/RC) are suppressed by the filter, while low frequencies (ω << 1/RC) do pass, hence the name low-pass filter. The term RC is often referred to as the RC-time (for this example, the RC-time is 1 · 10⁻⁶ s), while ω_c = 1/RC is called the cutoff frequency, because of its distinguishing property. The cutoff frequency becomes even more clear in the right graph of Figure 6.2. Note that ω = 2πf is the angular frequency.
Another application of the transfer function is stability analysis. The roots of the denominator are
called the poles of the transfer function, while the roots of the numerator are called the zeroes of
the transfer function. The role of the zeroes of the transfer function will not be discussed here, but
the poles deserve some attention. The poles of the transfer function are the natural frequencies of
the corresponding circuit, i.e. the frequencies of the free oscillations in absence of any sources. For
an exponentially damped circuit, the real parts of the natural frequencies should be less than zero,
while for an undamped circuit the frequencies should have real parts equal to zero. However, if one of the natural frequencies has a real part larger than zero, the circuit becomes unstable, as the energy produced grows. In other words, if all natural frequencies ω_n have real parts less than or equal to zero, Re(ω_n) ≤ 0, then the circuit is exponentially stable. This information is very important, especially in
the design of active circuits.
The above example is a typical passive circuit (it produces no energy) and is therefore stable. Indeed, the pole of the transfer function (6.4) is ω_n ≡ iω = −1/RC ≤ 0 (if R, C > 0). Note that a stable circuit is not necessarily passive.
The transfer function for circuits with an exponential input signal (x(t) ∼ exp(st)) in its most general
form reads
H(s) = y(t)/x(t),  (6.5)
where again the time-dependency has disappeared, because the output signal varies with the same
exponential as the input signal when the natural oscillations have damped out. Furthermore, the
natural frequencies, which are the roots of an equation with real coefficients, are either real or appear
in complex pairs.
In realistic circuits, there usually are multiple sources. The principle of superposition states that the
total response is equal to the sum of the responses to the individual sources[6]. Hence, the above
example can be expanded to the general case.
6.3 Alternative formulation of the transfer function
In Section 4.4, a definition of the transfer function H(s) has been given that differs from the one in the previous section:

H(s) = [L(zero-state response)(s)] / [L(source variation)(s)],  (6.6)
where L( f )(s) is the Laplace transformation of a function f defined in the time domain and s is
the (complex) variable in the frequency domain. In fact, both definitions are equivalent, except that
this definition is defined in the frequency domain while the former is defined in the time domain.
Definition (6.6) is derived from the Laplace transformation of the small-signal MNA model
d x(t)
C dt + Gx(t) = e(t)
.
(6.7)
x(0) = x0
Using the Laplace differentiation rule L(df/dt)(s) = sF(s) − f(0⁻), the Laplace transformation results in the following system in the Laplace (frequency) domain:

(sC + G)X(s) = E(s) + Cx_0,  (6.8)
where X (s) = L(x(t)), E(s) = L(e(t)) and s is the variable in the Laplace domain. Note that C and
G are square, real matrices and not necessarily symmetric or invertible. For the zero-state response
(i.e. only due to excitations), it holds that x0 = x(0) = 0, which reduces equation (6.8) to
(sC + G)X(s) = E(s).  (6.9)
Some rewriting results in the following definition for the transfer function H(s):

H(s) = X(s)/E(s) = (sC + G)⁻¹.  (6.10)
So H(s) is in fact the inverse of a parameterised square matrix.
6.3.1 Elementary response
Each entry Hoi (s) describes the zero-state response of a single unknown circuit variable Xo to a
variation of a single unknown source variable Ei :
H_oi(s) = X_o(s)/E_i(s).  (6.11)
At this point, an analogy can be made with the previous section. The example studied the response H_oi of the unknown output voltage to a variation of the voltage source. Were these actions performed in the frequency domain, then in fact a single entry H_oi(s) would have been computed.
The variation of a single input variable can be interpreted as an impulse input signal ei (t) = δ(t),
where δ(t) is the Dirac-delta function. The Laplace transform of ei (t) = δ(t) yields Ei (s) = ei , with
ei the i-th unit vector. Thus, the entry Hoi (s) can also be identified by
Hoi (s) = eoT (sC + G)−1 ei ,
(6.12)
where it is noted that (sC + G)−1 ei results in the response of all output variables to a variation
of input variable Ei , which explains the inner product with eoT to select out the particular response
of output variable Xo . Alternatively, Hoi (s) is the o-th entry of the solution H·i (s) of the system
[H(s)]−1 H·i (s) = (sC + G)H·i (s) = ei . By Cramer’s rule[13], Hoi is given by
Hoi (s) =
det([[H(s)]−1 ]o,ei )
det([sC + G]o,ei )
=
,
−1
det([H(s)] )
det(sC + G)
(6.13)
where [sC + G]o,ei is the matrix sC + G with the o-th column replaced by ei 1 . Thus, the function
Hoi (s) is a rational function of s and can also be represented by
H_oi(s) = det([sC + G]_{o,e_i}) / det(sC + G) = Z(s)/P(s) = c_oi · ∏_{l=1}^{deg(Z)} (s − z_l) / ∏_{k=1}^{deg(P)} (s − p_k),  (6.14)
for a suitable, dimensionless c_oi. The transfer function is, up to the constant c_oi, determined by the locations of both the zeroes z_l and the poles p_k. This factor c_oi can be determined from

H_oi(0) = (G⁻¹)_{oi} = c_oi · ∏_{l=1}^{deg(Z)} (−z_l) / ∏_{k=1}^{deg(P)} (−p_k),  (6.15)
where the term Hoi (0) is called the DC-gain. When a dynamic system is exposed to a step input, it
eventually reaches a final steady-state value, which is a function of the DC-gain. For linear circuits,
this value is equal to the DC-gain for a unit step input (for more information on the DC-gain, see
[20]). Essentially, the DC-gain gives information about the DC analysis solution (then C = 0) and
is therefore a good indication for the quality of the results of the pole zero analysis. Furthermore,
because the coefficients of the polynomials P(s) and Z(s) (although s may be complex, the matrices
¹ A computational gain can sometimes be obtained by only leaving element (sC + G)_{oi} unequal to zero.
C and G are both real) are all real, the roots are either real or appear in complex-conjugate pairs.
The polynomials Z(s) and P(s) are real polynomials in s and the degree of Z(s) is one less than or
equal to the degree of P(s), as Z(s) corresponds to the determinant of the minor matrix of (sC + G).
The complex values z k satisfying Z(z k ) = 0 are the zeroes of Hoi (s), while the complex values pk
satisfying P( pk ) = 0 are called the poles of Hoi . Notice that the case C = 0 is a special case resulting
in no appearance of s on the diagonal of (sC + G).
In practice, cancellation of poles and zeroes frequently happens. This cancellation usually cannot be
determined by analytical methods and must therefore be detected during or after numerical approximation of the poles and zeroes.
6.3.2 Natural frequencies
The term natural frequencies has been mentioned a couple of times in the previous sections. The
natural frequencies of a circuit transfer function H(s) are formally defined as the complex values p
satisfying det( pC + G) = 0, i.e. the roots of the polynomial P(s) = det(sC + G). As has been stated
in the previous sub-section, these poles are either real or appear in complex-conjugate pairs, such that
the following is valid:
deg(
YP)
P(s) = det(sC + G) = ci
(s − f k ),
(6.16)
k=1
with natural frequency f 2k either real or ( f 2k , f 2k+1 ) a complex-conjugate pair. The constant ci can be
determined from
det(G) = ci
deg(
YP)
(− f k ).
(6.17)
k=1
The location of the natural frequencies determines the physical stability of a linear circuit. If all
natural frequencies are located in the negative complex half-plane, the circuit is exponentially stable.
However, if, for example due to noise, modes are excited that correspond to natural frequencies with positive real parts, the result is an exponential growth in time of the energy produced, which is characteristic of an unstable circuit.
6.3.3 General response
In realistic applications, one often wants to know how a specific combination of (unknown) output
variables, identified by a vector a, responds to a specific combination of (unknown) input variables,
identified by a vector b, instead of the special case Hoi (s), where a = ei and b = eo . The reason for
this is that the response of secondary elements like fluxes, charges and currents, that are not system
unknowns, can be studied as well. Similar to equation (6.12), the system
R(a, b)(s) = bT H(s)a,
(6.18)
needs to be rewritten into an equation like (6.13). However, for vectors a and b which are not equal to a unit vector, this cannot be done without a coordinate transformation. In fact, one searches for a transformation A(x) ∈ R^{n×n} such that A(x)e_k = x for some integer k. This transformation, however, should be easy to invert. There are several transformations which satisfy these requirements. In [34],
the following simple transformation is proposed. The coordinates of x are denoted by x_j and m is the lowest integer such that x_m ≠ 0, i.e. m = argmin_j(x_j ≠ 0). The transformation A(x) is now defined by

A(x) = [ e_1 | e_2 | ⋯ | e_{m−1} | x | e_{m+1} | ⋯ | e_n ].  (6.19)
Note that A(x) is a lower-triangular and non-singular matrix (familiar as an operator in LU-decompositions): it equals the identity matrix with its m-th column replaced by x,

        [ 1                          ]
        [    ⋱                       ]
        [        1                   ]
A(x) =  [            x_m             ]   ← row m      (6.20)
        [            x_{m+1}  1      ]
        [            ⋮          ⋱    ]
        [            x_n           1 ]
The inverse of A(x), [A(x)]−1 is easy to determine:
[A(x)]−1 = e1 |e2 | · · · |em−1 |x̃|em+1 | · · · |en ,
(6.21)
where the vector x̃ is defined by

x̃_j = 0          if j < m,
x̃_j = 1/x_m      if j = m,
x̃_j = −x_j/x_m   if j > m.     (6.22)
If m = argmin_j(a_j ≠ 0) and n = argmin_j(b_j ≠ 0), then the vectors a and b can be written as a = A(a)e_m and b = A(b)e_n, such that (6.18) can be written into an equation similar to (6.13):

b^T H(s) a = e_n^T [A(b)]^T (sC + G)⁻¹ [A(a)] e_m
           = e_n^T { [A(a)]⁻¹ (sC + G) [A(b)]⁻ᵀ }⁻¹ e_m
           = det([sĈ + Ĝ]_{n,e_m}) / det(sĈ + Ĝ),     (6.23)

Ĉ = [A(a)]⁻¹ C [A(b)]⁻ᵀ,                              (6.24)
Ĝ = [A(a)]⁻¹ G [A(b)]⁻ᵀ.                              (6.25)
Because the coordinate transformation is applied both to C and G, the poles of (6.23) are the same
as those of (6.13). Note that this is necessary for a correct analysis. If the poles (i.e. the natural
frequencies) would not be the same, the problems would not be similar anymore.
A remark about the chosen transformation A(x) should be made. Due to rounding errors, it is possible that elements of x which are zero in exact arithmetic show up as small non-zero values. This may cause stability problems in computing [A(x)]⁻¹, as an evaluation of 1/x_m, with x_m very small, is needed. An alternative is to take m as the index of the maximal absolute entry of x, i.e. m = arg(max_j |x_j|), and to define A(x) as
A(x) = e1 |e2 | · · · |em−1 |x|em+1 | · · · |en .
(6.26)
In this case, [A(x)]−1 is given by
[A(x)]−1 = e1 |e2 | · · · |em−1 |x̃|em+1 | · · · |en ,
(6.27)
with x̃ defined by

x̃_j = −x_j/x_m   if j < m,
x̃_j = 1/x_m      if j = m,
x̃_j = −x_j/x_m   if j > m.     (6.28)
Because xm is the maximal absolute entry of x, this transformation may be more stable when x contains
near machine-precision values.
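A small numpy sketch of this more stable variant (6.26)-(6.28), building A(x) and its inverse directly from x; the test vector is arbitrary.

import numpy as np

def A_of_x(x):
    """A(x) of (6.26): the identity with column m replaced by x,
    where m is the index of the entry of x of maximal absolute value."""
    m = int(np.argmax(np.abs(x)))
    A = np.eye(len(x))
    A[:, m] = x
    return A

def A_inv_of_x(x):
    """[A(x)]^{-1} according to (6.27)-(6.28), formed without any factorisation."""
    m = int(np.argmax(np.abs(x)))
    x_tilde = -x / x[m]
    x_tilde[m] = 1.0 / x[m]
    Ainv = np.eye(len(x))
    Ainv[:, m] = x_tilde
    return Ainv

x = np.array([0.3, -2.0, 0.5])
m = int(np.argmax(np.abs(x)))
assert np.allclose(A_of_x(x) @ A_inv_of_x(x), np.eye(3))
assert np.allclose(A_of_x(x) @ np.eye(3)[:, m], x)     # A(x) e_m = x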
6.3.4 Eigenvalue formulation
The poles pk ∈ C and the zeroes zl ∈ C satisfy
det(sC + G) = 0,  (6.29)
det([sC + G]_{n,e_m}) = 0,  (6.30)

respectively. By defining s = −p_k and s̃ = −z_l, the poles and zeroes can also be found by solving the generalised eigenvalue problems²

G x = s C x,   x ≠ 0,  (6.31)
Ḡ x̃ = s̃ C̄ x̃,   x̃ ≠ 0,  (6.32)
where Ḡ and C̄ are minor matrices. Note that the sets of generalised eigenvalues do not change when
the same coordinate transformation is applied to both C and G, and thus the poles are the same.
If matrix G is non-singular, the generalised eigenvalue problems (6.31) and (6.32) can be transformed
to two ordinary eigenvalue problems:
(G⁻¹C − λI) x = 0,   x ≠ 0,  (6.33)
(Ḡ⁻¹C̄ − λ̃I) x̃ = 0,   x̃ ≠ 0.  (6.34)
In this case, the relations between the eigenvalues and the poles and zeroes are λ = −1/ pk and
λ̃ = −1/zl .
Example
The G and C matrices corresponding to (v0, v1, v2, i(e1)) of the circuit in Figure 6.1 read

     [  0    0      0    −1 ]   v0            [  C    0   −C    0 ]   v0
     [  0    1/R   −1/R   1 ]   v1            [  0    0    0    0 ]   v1
G =  [  0   −1/R    1/R   0 ]   v2    and C = [ −C    0    C    0 ]   v2
     [ −1    1      0     0 ]   i(e1)         [  0    0    0    0 ]   i(e1).     (6.35)
2 In a generalised eigenvalue problem, one searches the λ such that Ax = λBx.
By grounding node v0 (the circuit terminal), the first row of G is replaced by e1T , resulting in G̃, and
the first row of C is replaced by 0, resulting in C̃. The vector of eigenvalues of matrix G̃ −1 C̃ is
(λ1 , λ2 , λ3 , λ4 )T = (0, RC, 0, 0)T . The eigenvalues equal to zero imply infinite poles, which have no
direct physical interpretation. However, the number of infinite poles is equal to the rank-deficiency of
the matrix C. A small perturbation of (the diagonal elements of) C may result in a situation without
infinite poles, as C is theoretically regular. The eigenvalue λ2 = RC implies a pole of −1/RC, which
is in accordance with the pole found earlier (see Section 6.2).
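The example can be checked with a few lines of numpy, using the values R = 100 Ohm and C = 10 nF from Figure 6.2; the grounding of v0 is applied exactly as described above, and the non-zero eigenvalue indeed comes out as RC = 10⁻⁶.

import numpy as np

R, C = 100.0, 10e-9
G = np.array([[ 0.0,  0.0,    0.0,  -1.0],
              [ 0.0,  1.0/R, -1.0/R, 1.0],
              [ 0.0, -1.0/R,  1.0/R, 0.0],
              [-1.0,  1.0,    0.0,   0.0]])
Cm = np.array([[  C, 0.0,  -C, 0.0],
               [0.0, 0.0, 0.0, 0.0],
               [ -C, 0.0,   C, 0.0],
               [0.0, 0.0, 0.0, 0.0]])
# ground node v0: first row of G becomes e_1^T, first row of C becomes zero
Gt, Ct = G.copy(), Cm.copy()
Gt[0, :] = [1.0, 0.0, 0.0, 0.0]
Ct[0, :] = 0.0
lam = np.linalg.eigvals(np.linalg.solve(Gt, Ct))   # eigenvalues of Gt^{-1} Ct
# three eigenvalues are (numerically) zero; the remaining one equals R*C,
# corresponding to the pole -1/(R*C)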
6.3.5 Eigenvalues and eigenvectors of interest
The formulation of the transfer function Hoi (s) as a rational function in equation (6.14) can be rewritten, using equation (6.15), into
H_oi(s) = c_oi · ∏_{l=1}^{deg(Z)} (s − z_l) / ∏_{k=1}^{deg(P)} (s − p_k)
        = c_oi · [∏_{l=1}^{deg(Z)} (−z_l) · ∏_{l=1}^{deg(Z)} (1 − s/z_l)] / [∏_{k=1}^{deg(P)} (−p_k) · ∏_{k=1}^{deg(P)} (1 − s/p_k)]
        = H_oi(0) · ∏_{l=1}^{deg(Z)} (1 − s/z_l) / ∏_{k=1}^{deg(P)} (1 − s/p_k).     (6.36)
The factors 1 − s/z_l and 1 − s/p_k will, for relatively large zeroes and poles, be approximately equal to 1 for frequencies s with s << z_l and s << p_k. In other words, only frequencies of the order of the poles (s ≈ p_k) will have a visible influence on the transfer function. Therefore, for pole-zero analysis one is only interested in the poles and zeroes which are not much larger than a given maximum frequency, for these poles and zeroes determine the system behaviour. Having in mind the relation between the ordinary eigenvalues and the poles (p_k = −1/λ_k), the interesting eigenvalues will have absolute value larger than some value which depends on the specified maximum frequency. Similarly, when one considers the generalised eigenvalue problem, one is only interested in eigenvalues with absolute value not much greater than the maximum frequency.
For stable circuits, the eigenvalues are distributed in the positive half of the complex plane. It is
possible that the distribution becomes dense near the origin and hence, the maximum frequency should
be chosen with care to avoid computation of superfluous eigenvalues. Furthermore, the difference
between the largest and the smallest eigenvalue can be very large.
If pole-zero analysis is used to perform stability analysis, positive poles are interesting. Positive poles
correspond with negative eigenvalues. Furthermore, negative poles close to the imaginary axis should
also be studied. Both the C and G matrices contain nominal values for resistors, capacitors and other
elements. In practice, these values may vary by a certain factor, such that it may be the case that a
negative pole becomes a positive pole due to noise. Poles with a quality-factor greater than 10 (in
absolute value) may imply growing oscillations, and are thus of interest for stability analysis3 . The
sign of the real part determines whether there is exponential growth (i.e. for positive real parts) or
decay.
The eigenvectors corresponding to positive poles can be used as estimations for the initial solution of
³ Q-factor(x) = −sign(Re(x)) · (1/2) · √((Im(x)/Re(x))² + 1).
autonomous periodic steady-state problems, instead of using for instance DC solutions. The corresponding circuits are essentially non-linear, to assure that an initial unstable mode can not grow out
of bounds in the time domain.
6.3.6 Frequency shift
Although the conductance matrix G in general is not singular, it can be nearly singular. This may
give problems for those pole-zero analyses which use G −1 , as they will suffer from instability. To
overcome stability problems, one can apply a frequency shift s0 to the original frequency domain
system

(sC + G)X(s) = E(s),  (6.37)

which results in the shifted system

(s̃C + G̃)X(s) = E(s),  (6.38)

with s = s̃ + s0 and G̃ = s0·C + G. The shift s0 must be chosen such that G̃ becomes regular and
can be inverted in a stable manner. To avoid expensive computations, the frequency shift s0 should
be real-valued. One can compare this frequency shift with the bypass of capacitors (when treated as
reactive elements) by small parallel resistors.
The poles and zeroes of the shifted problem must be translated to the poles and zeroes of the original
problem. The poles pk0 = s̃ of the shifted problem are translated to the poles pk of the original problem
by pk = pk0 + s0 . A similar translation can be made for the zeroes of the shifted problem.
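A compact sketch of this shift-and-translate procedure in the ordinary-eigenvalue formulation (the one that needs a regular G); the tolerance used here to discard the infinite poles is an arbitrary illustrative choice.

import numpy as np

def poles_via_shift(G, C, s0, tol=1e-14):
    """Compute finite poles using a real frequency shift s0: form G_shift = s0*C + G
    (assumed regular), take the eigenvalues lambda of G_shift^{-1} C, convert them to
    shifted poles p' = -1/lambda and translate back with p = p' + s0."""
    G_shift = s0 * C + G
    lam = np.linalg.eigvals(np.linalg.solve(G_shift, C))
    lam = lam[np.abs(lam) > tol]          # lambda = 0 corresponds to an infinite pole
    return -1.0 / lam + s0

# applied to the grounded RC matrices Gt, Ct of the earlier sketch (with, say, s0 = -1e3),
# this should again yield the single finite pole -1/(R*C)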
If G has a rank-deficiency, a corresponding number of poles at zero will arise. Physically, this can
be interpreted as an infinitely slow response of the circuit to input signals. A small perturbation of a
singular matrix G may remove the rank-deficiency, making the circuit behaviour sensitive to perturbations. Numerically, simulation problems for circuits with poles near zero are ill-posed, because the
circuit behaviour may be very sensitive to perturbations, and certain precautions should be taken to
analyse these problems in a stable manner. One of the ways to handle this is the described frequency
shift.
6.3.7 Time domain impulse response
The time domain impulse response corresponding to H_oi(s) can be calculated from the pole-residue representation of H_oi(s). This pole-residue representation can be obtained in the following way, starting from H_oi(s) = Z(s)/P(s). First, a polynomial Q(s) can be defined such that

Z(s)/P(s) = Q(s) + Z̃(s)/P̃(s),  (6.39)
with deg( Z̃) < deg( P̃). Furthermore, residue values rk,m can be calculated such that the contribution
of an M-fold pole pk to the general pole-residue representation becomes
H_oi,p_k(s) = Σ_{m=1}^{M} r_{k,m} / (s − p_k)^m.  (6.40)
The general pole-residue representation then looks like

H_oi(s) = Q(s) + Σ_{poles p_k} H_oi,p_k(s).  (6.41)
In its most general form, the inverse Laplace transformation can be defined in terms of a Cauchy
integral expression:
h_oi(t) = (1/(2πi)) ∫_{c−i∞}^{c+i∞} H_oi(s) exp(st) ds,  (6.42)
for a sufficiently large real constant c. However, using some inverse Laplace transformation rules, a more explicit representation can be derived from the pole-residue representation (6.41). The inverse Laplace transformation of the polynomial Q(s) = Σ_{k=0}^{deg(Q)} q_k s^k reads

L⁻¹(Q)(t) = Σ_{k=0}^{deg(Q)} q_k δ^(k)(t),  (6.43)
with δ^(k)(t) the k-th derivative of the Dirac-delta function. The inverse Laplace transformation of the remainder of equation (6.41) is

L⁻¹( r_{k,m} / (s − p_k)^m )(t) = r_{k,m} t^{m−1} exp(p_k t) / (m − 1)!.  (6.44)
In this formulation, it is clear that simple real poles result in exponential time domain behaviour, while
multiple poles will result in a combination of polynomial and exponential time domain behaviour.
Finally, a non-zero imaginary part of complex-conjugate pairs of poles indicates harmonic behaviour
in the time domain.
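A short Python sketch of evaluating the impulse response from a pole-residue list, following (6.41)-(6.44) and ignoring the polynomial part Q(s) (whose contribution consists of Dirac-delta derivatives); the single-pole data reproduce the RC low-pass filter, for which H(s) = 1/(1 + sRC) = (1/(RC))/(s + 1/(RC)).

import numpy as np
from math import factorial

def impulse_response(t, terms):
    """Evaluate h(t) = sum over (residue r, pole p, multiplicity m) terms of
    r * t^(m-1) * exp(p*t) / (m-1)!; real poles give exponentials, multiple poles
    polynomial-times-exponential behaviour and complex pairs harmonic behaviour."""
    t = np.asarray(t, dtype=float)
    h = np.zeros_like(t, dtype=complex)
    for r, p, m in terms:
        h += r * t**(m - 1) * np.exp(p * t) / factorial(m - 1)
    return h.real   # real-valued if complex poles/residues come in conjugate pairs

R, C = 100.0, 10e-9
t = np.linspace(0.0, 5 * R * C, 200)
h = impulse_response(t, [(1.0 / (R * C), -1.0 / (R * C), 1)])   # h(t) = exp(-t/RC)/(RC)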
6.4 Visualisation techniques
Pole-zero analysis will result in an approximation of the transfer function

H_oi(s) = c_oi · ∏_{l=1}^{deg(Z)} (s − z_l) / ∏_{k=1}^{deg(P)} (s − p_k),  (6.45)
in terms of a real gain value coi , a set of complex poles pk and a set of complex zeroes zl . Depending
on the way the results are processed further, there are different ways to present the results. Usually,
one is most interested in the poles of the transfer function. However, the gain value often gives a good
indication of the accuracy of the analysis method used.
Besides ordinary textual output, where the poles, the zeroes and the gain are simply printed in a nice
format, more interesting visualisations are given by various types of graphical outputs.
6.4.1 Plots of poles and zeroes
A general plot of poles and zeroes is a set of data points in the complex plane. In such a figure, one
can detect coinciding poles and zeroes, which are found by the specific analysis method. Furthermore,
Figure 6.3: The pole-zero plot corresponding to the circuit in Figure 6.1. There is one pole and there are no zeroes.
both poles with positive and negative real parts can be identified from the figure, such that exponential stability analysis is possible. Figure 6.3 shows the pole-zero plot of the example circuit in Figure 6.1. Note that the (one and only) pole has the value ω_n/2π = −1/(2πRC). If the poles and the zeroes vary significantly in magnitude, the above visualisation technique may not suffice. An alternative is to use logarithmic axes, which are better suited to reflect the circuit response dynamics. Real parts r_k and imaginary parts z_k are mapped to a logarithmic real or imaginary axis respectively, depending on their values. A common convention is to let relatively large absolute values |r_k| or |z_k| (e.g. |r_k| ≥ 1 or |z_k| ≥ 1) correspond to the logarithmic values sign(r_k) log(|r_k|) or sign(z_k) log(|z_k|). Small absolute values |r_k| or |z_k| (e.g. |r_k| < 1 or |z_k| < 1) are truncated to zero.
6.4.2 Pole-zero Bode plot
A Bode plot visualises the approximate transfer function Hoi (s) by evaluating it for a sequence of
frequency values sk = iωk , ωk > 0. Both the magnitude |Hoi (sk )| (see the upper graph of Figure 6.4)
and the phase arg(Hoi (sk )) (see the lower graph of Figure 6.4) are plotted as a function of the angular
frequency ω_k. Note that these plots can also be obtained, in general even more cheaply, from a small signal analysis, where H_oi(s) is calculated as H_oi(s) = e_o^T (sC + G)⁻¹ e_i for s = iω. Pole-zero analysis does the same by evaluating (6.45), after having computed the poles and the zeroes. A Bode-plot
has a base 10-logarithmic frequency scale such that the graph is easily expanded near zero as well as
towards larger frequency values. On the vertical axis a decibel (dB) scale is used, i.e. a 20 log scale⁴. Note that from Figure 6.4 one can clearly identify the cutoff frequency: at the cutoff frequency, the curve starts to fall off. Furthermore, the phase, which is given by −arctan(ωRC), equals 135 degrees at ω = 2πf = 1/(RC) and drops to 90 degrees for ω ≫ 1/(RC), such that the output voltage (which is nearly zero for ω ≫ 1/(RC)) lags the input by 90 degrees.
⁴ dB(x) = 10 log x² = 20 log x.
[Plot for Figure 6.4: Bode Plot (Pstar AC analysis); upper panel DB(VN(2)), lower panel PHA(VN(2)), plotted against frequency F on a logarithmic axis.]
Figure 6.4: The pole-zero Bode plot corresponding to the circuit in Figure 6.1.
Also note that the curve in the upper graph can be approximated by straight line segments. The slope of the curve for ω ≫ 1/(RC) is approximately −1 (in logarithmic values), i.e. a growth of the frequency by a factor 10 results in a decrease of the output voltage by a factor 10 (or, equivalently, a decrease of 20 dB).
In fact, a Bode-plot is a rough sketch of sums of straight lines:

20 \log H(s) = 20 \log \left( c \, \frac{\prod_j (s - z_j)}{\prod_j (s - p_j)} \right) = 20 \left( \log c + \sum_j \log(s - z_j) - \sum_j \log(s - p_j) \right).   (6.46)
So in general, poles cause a decrease of 20 dB per decade, while zeroes cause a growth of 20 dB per
decade. The pole which causes the first decrease (i.e. the pole with the smallest absolute value) is
called the dominant pole.
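The following Python fragment sketches how such a pole-zero Bode plot can be obtained by evaluating (6.45) on s = iω for a logarithmically spaced frequency grid; it is an illustrative sketch only (the RC values are hypothetical), not the Pstar implementation.

# Minimal sketch: evaluate the approximate transfer function (6.45) from a gain,
# zeroes and poles on s = i*omega, and convert to dB magnitude and phase.
import numpy as np

def transfer(s, gain, zeros, poles):
    num = np.prod([s - z for z in zeros], axis=0) if len(zeros) else 1.0
    den = np.prod([s - p for p in poles], axis=0)
    return gain * num / den

# hypothetical first-order RC example: one pole at -1/(R*C), no zeroes
R, C = 1.0e3, 1.0e-6
gain, zeros, poles = 1.0 / (R * C), [], [-1.0 / (R * C)]

omega = 2 * np.pi * np.logspace(0, 8, 9)        # log-spaced frequency grid
H = transfer(1j * omega, gain, zeros, poles)
mag_db = 20 * np.log10(np.abs(H))
phase_deg = np.degrees(np.angle(H))
for f, m, ph in zip(omega / (2 * np.pi), mag_db, phase_deg):
    print(f"{f:10.1e} Hz  {m:8.2f} dB  {ph:8.2f} deg")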
6.4.3 Nyquist plot
A Nyquist plot presents the response H(iω) as a function of ω ≥ 0 in the complex plane. A Nyquist
plot can be useful if one wants to get a complete overview of how the transfer function depends on
the frequency. Figure 6.5 shows the Nyquist figure corresponding to the circuit in Figure 6.1 with
H(iω) = \frac{1 - iωRC}{1 + ω²R²C²}.
There are various other visualisation methods, like plots of the zero-state time domain response
and Smith charts, which are polar plots of the reflection coefficients (characterising for instance an
impedance), used for RF-applications.
[Plot for Figure 6.5: Nyquist curve in the complex plane (Real versus Imaginary part of H(iω)), with the points 2πf = 0 and 2πf = ∞ marked.]
Figure 6.5: The Nyquist plot corresponding to the circuit in Figure 6.1.
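As a small illustration, the Nyquist curve of Figure 6.5 can be traced by evaluating H(iω) for ω ≥ 0; the sketch below uses hypothetical values for R and C.

# Minimal sketch: trace H(i*omega) = (1 - i*omega*R*C)/(1 + omega^2*R^2*C^2)
# for omega >= 0 (hypothetical R, C values).
import numpy as np

R, C = 1.0e3, 1.0e-6
omega = np.logspace(-2, 8, 200) / (R * C)
H = (1.0 - 1j * omega * R * C) / (1.0 + (omega * R * C) ** 2)

print(H[0])    # close to 1 + 0j   (2*pi*f = 0)
print(H[-1])   # close to 0 + 0j   (2*pi*f -> infinity)
# plotting H.real against H.imag gives the Nyquist curve of Figure 6.5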
6.4.4 Resonance example
A common phenomenon in electric circuits is resonance. A circuit is called resonant when it operates
at a frequency where the reactive, i.e. imaginary, component of the total impedance or admittance is
zero. The corresponding frequency is called the resonance frequency. Figure 6.6 shows an RLC-circuit where resonance takes place at a particular frequency. The transfer function of the circuit in
Figure 6.6 is given by
H_{oi}(s) = \frac{L(i_R)(s)}{L(e_V)(s)} = \frac{LCs² + 1}{(LCs² + 1)R + Ls}.   (6.47)
Figure 6.7 shows the pole-zero Bode plot for this circuit, where the current through resistor r 1 is plotted
against the frequency. One can clearly identify a peak in the centre of the upper graph. At the same
frequency, the phase also makes a big shift of approximately 180 degrees. The peak corresponds to an
approximately zero current through resistor r 1 and indicates an infinite impedance at that frequency,
caused by the parallel LC-network. Indeed, the equivalent impedance of the parallel LC-network is, for a harmonic input signal, given by
Z = i \left( \frac{-1}{ωC - \frac{1}{ωL}} \right),   (6.48)
which is (in absolute value) infinite if ω = ω_r = 1/\sqrt{LC}. In Figure 6.7, the peak exactly corresponds to a frequency f_r = ω_r/2π = 1/(2π\sqrt{LC}), which is the resonance frequency. At the same frequency, the phase drops 180 degrees. For low frequencies, the capacitor behaves as an open circuit, so that the circuit is a simple series RL-network. Consequently, in the frequency interval (R/L, ω_r), the phase decreases by 90 degrees. In the interval (ω_r, 1/(RC)), the phase decreases by another 90 degrees.
[Diagram for Figure 6.6: RLC-circuit with source e1, inductor l1, capacitor c1 and resistor r1.]
Figure 6.6: A simple RLC-circuit.
[Plot for Figure 6.7: Bode Plot (Pstar AC analysis); upper panel DB(I(R_1)), lower panel PHA(I(R_1)), plotted against frequency F on a logarithmic axis.]
Figure 6.7: The pole-zero Bode plot corresponding to the circuit in Figure 6.6, with R = 100 Ohm, L = 0.02 Henry and C = 10 · 10⁻⁹ Farad.
[Plot for Figure 6.8: Linear Pole Zero Distribution (Pstar PZ analysis); axes Re F and Im F (linear); legend: POLES, ZEROES.]
Figure 6.8: The pole-zero plot corresponding to the circuit in Figure 6.6, with R = 100 Ohm, L = 0.02 Henry and C = 10 · 10⁻⁹ Farad.
For high frequencies, the roles of the inductor and the capacitor are switched: the inductor behaves as an open circuit. Furthermore, because the resonance frequency corresponds with zero conductance, it is expected that the transfer function (6.47) has a zero at ω_r or a complex-conjugated pair of zeroes (iω_r, −iω_r). The pole-zero plot in Figure 6.8 confirms the presence of a complex-conjugated pair. Furthermore, it seems that there is a pole at zero. However, a glance at the same plot with logarithmic axes (as explained in Section 6.4.1) shows that this is not the case (see Figure 6.9). The two poles p_1 ≈ −799.795 and p_2 ≈ −158.355 · 10³ clearly vary in magnitude. For more information about resonant circuits, the reader is referred to [3, 6].
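The resonance behaviour described above is easy to verify numerically. The following sketch (illustrative only) computes the resonance frequency for the component values quoted in the captions of Figures 6.7 and 6.8 and checks that the numerator of (6.47) vanishes at s = iω_r:

# Minimal sketch: resonance frequency of the parallel LC network of Figure 6.6
# (R = 100 Ohm, L = 0.02 H, C = 10e-9 F, as in the captions above).
import numpy as np

R, L, C = 100.0, 0.02, 10e-9
omega_r = 1.0 / np.sqrt(L * C)
f_r = omega_r / (2 * np.pi)
print(f_r)                                    # about 1.125e4 Hz, the peak in Figure 6.7

def H(s):
    return (L * C * s**2 + 1) / ((L * C * s**2 + 1) * R + L * s)

print(abs(H(1j * omega_r)))                   # numerator vanishes: |H| is (numerically) tiny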
6.5 Numerical methods for pole-zero analysis
Basically, there are two kinds of methods for pole-zero analysis, namely methods which approximate
the poles and zeroes separately by solving P(s) = 0 and Z(s) = 0, or methods which approximate the
transfer function H_{oi}(s) = Z(s)/P(s) as a whole as a rational function. This section will briefly introduce methods of both kinds, while the next chapters will focus on the methods currently used in Pstar and possible improvements or replacement methods.
6.5.1 Separate pole and zero computation
The poles and the zeroes are calculated by separate computation of the roots of the determinant polynomials Z(s) = det([sC + G]_{o,e_i}) and P(s) = det(sC + G). This calculation of the roots can be done in several ways; note, however, that poles and zeroes may cancel. Because both problems are similar, without
loss of generality only the problem of finding the roots of P(s) is considered.
The Q Z method The Q Z method[36] directly computes all the generalised eigenvalues of problems
[Plot for Figure 6.9: Logarithmic Pole Zero Distribution (Pstar PZ analysis); axes g_FminRe(F) and g_FminIm(F); legend: POLES, ZEROES.]
Figure 6.9: The pole-zero plot corresponding to the circuit in Figure 6.6, with R = 100 Ohm, L = 0.02 Henry and C = 10 · 10⁻⁹ Farad, with logarithmic axes.
of the form (6.31) by applying similarity transforms. However, the Q Z method works on full
matrices and is therefore less interesting for pole-zero analysis. For more information about the
Q Z method applied to pole-zero analysis, see [32] and the next chapter.
The Q R method The Q R method[36] directly computes all the eigenvalues of problems of the form
(6.33), i.e. of transformed generalised eigenvalue problems. The Q R method is currently being used as the solver for flat matrices in Pstar and will be studied in detail in the next chapter.
The Arnoldi process The Arnoldi process[36] is a Krylov subspace method which iteratively approximates the spectrum of large sparse non-symmetric matrices. Currently, the Arnoldi method
is the only (iterative) alternative in Pstar for Q R, but the implementation lacks the robustness
and accuracy of Q R. The Arnoldi method and some variants will be discussed in the next
chapter.
The Jacobi-Davidson algorithm The Jacobi-Davidson algorithm[26, 27, 36] covers the Arnoldi algorithm but considers other subspaces. Several variants (JDQZ, JDQR) are known; some of
them will be discussed in Chapter 9.
Pseudospectra Although pseudospectra, as described in [31], may not be useful for computing the whole spectrum, they can be used to study the sensitivity to perturbations of negative poles with small real parts.
Other methods, which have been shown to be less appropriate for pole-zero analysis, can be found
in [32].
6.5.2 Combined pole and zero computation
In this case, the rational function H_{oi}(s) = Z(s)/P(s) is approximated by another rational function H̃(s). In
general, methods that use rational approximation do not suffer from the cancellation of poles and
zeroes.
Interpolation of transfer data The transfer function Hoi (s) is evaluated by determinant calculation
at a number of frequencies sk . After that, an estimation is made for the number of (significant)
poles and zeroes. The data will be interpolated to yield an approximation for the transfer function, whose roots are the poles. This method in general is not applicable, as large errors are to
be expected in case of underestimation of the number of poles and zeroes.
Asymptotic waveform evaluation algorithm (AWE) AWE, as described in [32], is based on asymptotic moment expansion to approximate the transfer function.
Lanczos Padé approximation The Padé Via Lanczos method (PVL) is described in [11]. It utilises
a Lanczos method for non-symmetric systems to approximate the dominant natural frequencies
and the corresponding eigenvectors.
Other methods like Cauchy’s integral calculations and Maehly’s method for rational functions are
described in [32].
Chapter 7
Pole-zero computation in Pstar
7.1 Introduction
In the previous chapter, the numerical formulation of the pole-zero analysis problem has been stated.
Considering the transfer function
H_{io}(s) = \frac{\det([sC + G]_{o,e_i})}{\det(sC + G)} = \frac{\det(s\bar{C} + \bar{G})}{\det(sC + G)},   (7.1)
where C̄ and Ḡ are minor matrices of C and G, one can try to solve either the generalised eigenvalue
problem
Gx = sCx,   x ≠ 0,   (7.2)
Ḡx̃ = s̃C̄x̃,   x̃ ≠ 0,   (7.3)
or the ordinary eigenvalue problem
(G⁻¹C − λI)x = 0,   x ≠ 0,   (7.4)
(Ḡ⁻¹C̄ − λ̃I)x̃ = 0,   x̃ ≠ 0.   (7.5)
For the generalised eigenvalue problem, the poles (respectively zeroes) are given by pk = −s (respectively zl = −s̃). For the ordinary eigenvalue problem, the poles (respectively zeroes) are given by
p_k = −1/λ (respectively z_l = −1/λ̃). Unless stated otherwise, the notation C will be used for both C
and C̄ and the notation G will be used for both G and Ḡ, as the eigenvalue problems can be solved in
the same way. Furthermore, the matrices C and G will be n × n matrices.
Before the eigenvalue solvers used in Pstar are described, it is important to state the properties of the
C and G matrices and the requirements for the solvers, as they will be part of the criteria on which
the numerical eigenvalue methods will be chosen.
• The C and G matrices are square, real matrices and not necessarily symmetric or non-singular.
• Because the C and G matrices are real, the eigenvalues of G −1 C, if G −1 exists, are real or
appear in complex-conjugate pairs and hence the spectrum is symmetric with respect to the real
axis.
• In general, one is not interested in all eigenvalues, but only in the eigenvalues greater than some
value (ordinary eigenvalue problems like 7.4 and 7.5) or less than some value (generalised
eigenvalue problems like 7.2 and 7.3). See Section 6.3.5.
• The zeroes are only of interest if a detailed analysis is done. In practice, one is more interested
in stability analysis.
• For stability analysis, one is only interested in the positive poles, i.e. negative eigenvalues, if any,
or in poles with a quality-factor greater than 10, which may imply growing oscillations. The
corresponding eigenvectors can be used as estimations for the initial solution of autonomous
periodic steady-state problems. Sometimes, only the pronouncement whether or not there are
positive poles is sufficient.
• Poles in the right half-plane should never be missed. Furthermore, ’false’ positive poles are not
desired.
• There should be a means of accuracy measurement, by, for example, the DC-gain. When not
all poles (and zeroes) are calculated, the Bode-plot, which sometimes serves as a quality check,
may differ from the Bode-plot computed with AC analysis. However, the Bode-plot is not the
main target of pole-zero analysis.
• Both smaller and larger problems (10³ < n < 10⁴) need to be handled. A direct solver will not
be sufficient from a computational costs point of view.
• The eigenvalue solver should preferably fit in the hierarchical design of Pstar, which means that matrix-vector multiplications and inner products are allowed, but matrix-matrix products in general are not. As a result, direct methods, which need access to the elements of matrices to compute matrix-matrix products, are not applicable in Pstar.
• Switching from a generalised eigenvalue problem to an ordinary eigenvalue problem, by computing the inverse of a matrix, may be dangerous (note that the inverse is not computed directly, but via, for example, an LU-decomposition). Although the ordinary eigenvalue problem may be
solved, it is possible that the inverse matrix operations introduce substantial numerical errors,
making the physical interpretation of the results worthless.
Note that this chapter is focused on separate pole and zero computation. A technique (PVL) which
performs combined pole and zero computation will be discussed later. The following sections describe the two implemented eigenvalue solvers in Pstar: the Q R method and the Arnoldi method.
Furthermore, a section will be dedicated to the Q Z method, the equivalent of Q R for the generalised
eigenproblem.
Both the Q R method and the Arnoldi method are used in a skeleton algorithm for the computation of
the poles and the zeroes. Algorithm 2 shows this skeleton. The steps which depend on the method
used are described in the corresponding sections. Both methods work with the ordinary eigenvalue
problem, i.e. with G −1 C.
One may decide to scale C and/or G to reduce the condition number of the eigenvalues, which is
defined by
Cond(λ) = \frac{1}{\cos θ(u, w)} = \frac{||u|| \, ||w||}{u^T w},   (7.6)
Algorithm 2: The algorithm for pole-zero computation in Pstar.
Apply a frequency shift σ to sC + G
Scale C and/or G.
Compute the poles (method dependent)
Compute the zeroes (method dependent)
Filter cancelled poles and zeroes
Compute the DC gain H_{oi}(0) = (G⁻¹)_{oi}
with u and w the right and left eigenvectors. Badly conditioned eigenvalues, i.e. eigenvalues with
large condition numbers, may be sensitive to small perturbations. For example, a perturbation of
order ρ to a matrix may result in a perturbation of magnitude ρCond(λ) to the eigenvalues. The
norm of the matrix itself may be reduced by this balancing too, thereby reducing the roundoff error
made by numerical algorithms, which often is of size ε_M ||A||, with ε_M the machine precision (see
[1, 14, 23]).
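As an illustration of definition (7.6), the following Python sketch computes eigenvalue condition numbers from the left and right eigenvectors returned by SciPy (for a small random test matrix; for complex eigenvectors the inner product w^H u is used):

# Minimal sketch of the eigenvalue condition number (7.6): 1/cos(angle between
# left and right eigenvectors), computed with scipy for a small random matrix.
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

lam, vl, vr = eig(A, left=True, right=True)
for k in range(len(lam)):
    u = vr[:, k]                    # right eigenvector
    w = vl[:, k]                    # left eigenvector
    cond = np.linalg.norm(u) * np.linalg.norm(w) / abs(w.conj() @ u)
    print(lam[k], cond)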
Because there is no actual detailed software documentation of Pstar in relation to pole-zero analysis,
the statements made about the implementation of the method are based on [32, 34] and on some
inspection of the source code.
7.2 The Q R method
The Q R method can be used to solve ordinary eigenvalue problems. In Pstar, the ordinary eigenvalue
problem is obtained by first computing the matrix product G −1 C, after applying an optional frequency
shift to make G regular. The inversion is done by LU -decomposition of matrix G, which may be made
more accurate by iterative refinement[32], because LU -decomposition can be unstable. The resulting
ordinary eigenvalue problem is
(G⁻¹C − λI)x = 0,   x ≠ 0.   (7.7)
7.2.1 Algorithm
For reasons which will become clear soon, matrix A = G −1 C first is reduced to an upper Hessenberg
matrix. An upper Hessenberg matrix H is a nearly triangular matrix, where the lower sub-diagonal is
also non-zero:

H = \begin{pmatrix}
* & * & \cdots & \cdots & * \\
* & * & \cdots & \cdots & \vdots \\
  & \ddots & \ddots &        & \vdots \\
  &        & \ddots & \ddots & \vdots \\
  &        &        & *      & *
\end{pmatrix}.   (7.8)
For a real matrix A, there exists an orthogonal similarity transformation to upper Hessenberg form[14].
In other words, for every matrix A there exists an orthogonal matrix Q such that Q ∗ AQ = H . The
transformation is usually done by Householder reflections (if A is dense) or Givens transformations
(if relatively few entries of A must be rotated to zero). In Appendix A, both transformations are
described. Because the transformation is orthogonal, the eigenvalues of A and H = Q ∗ AQ are the
same:
Ax = λx ⇐⇒ Q ∗ AQ Q ∗ x = λQ ∗ x.
(7.9)
The eigenvectors x of A can be derived from the eigenvectors y of H by the relation y = Q ∗ x.
The Q R algorithm orthogonally transforms A (in an iterative manner) to an upper quasi-triangular
matrix Q^*AQ = R, which is defined as a nearly upper triangular matrix whose block-diagonal consists of 1 × 1 or 2 × 2 blocks. The form R = Q^*AQ is called the Schur form of A and Q contains
the Schur vectors (see Appendix A.4). The eigenvalues of R are equal to the eigenvalues of A and,
moreover, are easy to compute. The algorithm is based on the Orthogonal Iteration Method (OIM, see
[36]), which computes the k largest eigenvalues of a real matrix A by power iterations:
Algorithm 3: The Orthogonal Iteration Method.
Choose an orthonormal Uk(1)
for i = 1, 2, 3, . . . until convergence do
Vk = AUk(i)
Orthonormalise the columns of Vk :
Vk = Q k Rk
Uk(i+1) = Q k
The k largest real eigenvalues of A appear on the diagonal of the k × k matrix Rk , while the columns
of the n × k matrix Q k form an orthonormal basis for an invariant subspace of dimension k. The
Orthogonal Iteration Method is based on the block version of the well-known Power Method, which
exploits the fact that the Rayleigh quotient of the vector v_j = A^{j−1} u_1 converges to the dominant eigenvalue λ_1:

\lim_{j \to \infty} \frac{v_j^T A v_j}{v_j^T v_j} = λ_1,   (7.10)
with u_1 ≠ 0. However, the OIM orthonormalises the columns of V_k. The Q R decomposition V_k = Q_k R_k, with Q_k orthogonal and R_k upper triangular, has been added to avoid that all columns of U_k^{(i)} converge to the same dominant eigenvector (and, consequently, the entries r_{jj} to the dominant eigenvalue). If the OIM starts with a full start matrix U_n^{(0)}, the matrix U_n^{(i)} converges to the matrix of Schur vectors, and the matrix U_n^{(i)*} A U_n^{(i)} converges to the Schur form. Appendix A.3 gives a description of the Q R decomposition.
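A minimal Python sketch of the Orthogonal Iteration Method is given below; it uses numpy's QR factorisation for the orthonormalisation step and, for robustness of the illustration, reads the eigenvalue approximations from the projected matrix U^T A U rather than from the diagonal of R_k (test matrix and iteration count are chosen arbitrarily):

# Minimal sketch of the Orthogonal Iteration Method (Algorithm 3).
import numpy as np

def orthogonal_iteration(A, k, iterations=200):
    rng = np.random.default_rng(1)
    U, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iterations):
        V = A @ U                       # block power step
        U, _ = np.linalg.qr(V)          # orthonormalise the columns (V = Q_k R_k)
    return np.linalg.eigvals(U.T @ A @ U), U   # eigenvalues of the projected matrix

A = np.diag([5.0, 3.0, 1.0, 0.5])
vals, U = orthogonal_iteration(A, 2)
print(sorted(vals.real, reverse=True))  # approximately [5.0, 3.0]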
The Q R method proceeds as follows. Starting with a full start matrix U^{(0)} = U_n^{(0)}, one iteration of the OIM results in AU^{(0)} = U^{(1)} R^{(1)}. The similarity transform A_1 = U^{(1)*} A U^{(1)} transforms A a little bit towards Schur form, which suggests continuing with A_1 instead of with A. Note that, with Q_1 = U^{(1)} and R_1 = R^{(1)}, and A = Q_1 R_1, the matrix A_1 can be computed as

A_1 = Q_1^* A Q_1 = Q_1^* Q_1 R_1 Q_1 = R_1 Q_1.   (7.11)
The next iteration of the OIM proceeds with A1 instead of A, such that the OIM essentially becomes
a sequence of orthogonal transformations:
A_s = Q_s R_s,
A_{s+1} = R_s Q_s,   (7.12)
where, because of the orthogonality of Q s , it is noted that As+1 = Q ∗s As Q s . The above sequence of
orthogonal transformations is known as the Q R method. It can be proven that
\lim_{s \to \infty} A_s = Q^* A Q = R,   (7.13)

where R is in Schur form (see [30]). The diagonal blocks of R appear in decreasing
order of absolute value of their eigenvalues. The matrix product Q s = Q 1 Q 2 . . . Q s−1 converges to
the matrix of Schur vectors Q, where the matrices Q i can be seen as correction matrices (which are
computed from the factors Q i−1 and Ri−1 of the previous iteration). Algorithm 4 gives a template for
the Q R method.
Algorithm 4: The Q R Method.
Choose A0 = A
for i = 1, 2, 3, . . . until convergence do
Q R factor Ai−1 : Ai−1 = Q i Ri
Compute Ai = Ri Q i
The rate of convergence is determined by the speed with which a sub-diagonal element a_{ij}^{(s)} converges to zero, which is proportional to

\left( \frac{|λ_i|}{|λ_j|} \right)^s,   (7.14)

where |λ_j| > |λ_i| (the eigenvalues appear in decreasing order of absolute value). If |λ_j| is close to |λ_i|, convergence may be very slow. This convergence can be improved by introducing a shift σ_s to A_s. The eigenvalues of A_s − σ_s I are equal to λ_k − σ_s, so that the rate of convergence becomes proportional to

\left( \frac{|λ_i − σ_s|}{|λ_j − σ_s|} \right)^s.   (7.15)
See [30] for detailed information about the convergence of the Q R method. In the ith iteration,
Ai−1 − σi I , instead of Ai−1 , is Q R factored, which results in the following sequence of orthogonal
transformations:
A_i − σ_i I = Q_i R_i,
A_{i+1} = Q_i^* A_i Q_i = R_i Q_i + σ_i I.   (7.16)
In order to maximise the rate of convergence, a new shift should be computed every iteration. The
preferable shift is close to the smallest eigenvalue of the quotient. Algorithm 5 shows the Q R method
with shifts.
The Q R algorithm only uses real arithmetic. If one wants to use complex shifts, the computations
can still be done in real arithmetic. Complex eigenvalues of a real matrix always appear in complex-conjugate pairs, so convergence to both members of the pair is desired. This suggests combining two successive steps in which two complex-conjugate shifts are used. The complex effects cancel, so that
real arithmetic can still be used.
Algorithm 5: The Q R Method with shifts σi .
Choose A0 = A
for i = 1, 2, 3, . . . until convergence do
Determine shift σi
Q R factor A_{i−1} − σ_i I: A_{i−1} − σ_i I = Q_i R_i
Compute Ai = Ri Q i + σi I
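The following Python fragment sketches Algorithm 5 with a simple choice of shift (the bottom-right entry of the current iterate); it is an illustration only, without the Hessenberg reduction, deflation and convergence tests a practical implementation would need.

# Minimal sketch of the shifted QR iteration of Algorithm 5.
import numpy as np

def qr_with_shifts(A, iterations=100):
    Ai = np.array(A, dtype=float)
    n = Ai.shape[0]
    for _ in range(iterations):
        sigma = Ai[n - 1, n - 1]               # shift: bottom-right entry
        Q, R = np.linalg.qr(Ai - sigma * np.eye(n))
        Ai = R @ Q + sigma * np.eye(n)         # similarity transform, eigenvalues preserved
    return Ai

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.sort(np.diag(qr_with_shifts(A)))[::-1])   # approximates the eigenvalues of A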
Shifts are not only useful for faster convergence; they may even prevent the Q R algorithm from failing to terminate, as an example in [7] shows. If the Q R algorithm without shifts is applied to

A_1 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},   (7.17)

the Q_1 R_1 factorisation results in

A_1 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},   (7.18)

and hence A_2 = R_1 Q_1 = A. Applying a shift of σ = 1 immediately solves this problem:

A_1 − σ I = \begin{pmatrix} -1 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix} = \begin{pmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix} \begin{pmatrix} \sqrt{2} & -1/\sqrt{2} & -1/\sqrt{2} \\ 0 & \sqrt{3/2} & -\sqrt{3/2} \\ 0 & 0 & 0 \end{pmatrix},   (7.19)

so that

A_2 = R_1 Q_1 + σ I = \begin{pmatrix} -1/2 & -\sqrt{3}/2 & 0 \\ \sqrt{3}/2 & -1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (7.20)
The number of flops for a Q R factorisation of a real non-symmetric n × n matrix is O(n³). As each Q R iteration is dominated by a Q R factorisation, the costs for the whole Q R method will be intolerably high (probably O(n⁴) flops for real un-symmetric matrices). This is why the original matrix is first transformed to upper Hessenberg form, as mentioned at the start of this section. This transformation H = Q^*AQ costs O(n³) flops only once. Besides the fact that the eigenvalues of H are equal to the eigenvalues of A, another important observation can be made: the Q R factorisation H = Q R of an upper Hessenberg matrix can be efficiently done (O(n²)), as only the entries on the
lower co-diagonal must be rotated to zero. Moreover, because Q is again upper Hessenberg, as its
columns are the orthonormalised columns of H , and R is upper triangular, the product R Q is also
in upper Hessenberg form. This is really interesting, because each iteration of the Q R method only
requires the Q R factoring of an upper Hessenberg matrix.
The Q R method with shifts is currently being used in Pstar. The implementation is similar to the
NAG library routine F02AGF, a Fortran implementation (see [19]), which expects a matrix in upper
Hessenberg form.
To compute the eigenvectors of A, it is necessary to accumulate the matrices Q i . The actual eigenvectors x can be computed from equation (7.9).
7.2.2 Accuracy
In [32], the Q R method is reported to have high, probably maximal accuracy. All kinds of poles
and zeroes, both with large or small negative real part, are approximated with an almost maximum
number of significant digits (17). Frequency shifts also do not influence the accuracy. Of course, the
accuracy is also dependent on the stability of the LU decomposition. Iterative refinement of the LU
decomposition improves stability. However, maximal accuracy would imply exact eigenvalues, and
this was not the case. Besides that, the examples used in [32] are more or less artificial with compact
spectra.
Comments of circuit designers confirm these observations. Despite its running time, the Q R method is valued for its accuracy. However, parasitic poles are observed for circuits with many distinct poles, i.e. for problems with wide spectra. This is probably caused by a badly conditioned matrix
G.
Theoretically, the computed Schur form satisfies Q^*(A + E)Q = R, with ||E|| = O(u||A||) for machine precision u.
7.2.3 Computational notes
The NAG library routine is estimated to cost 20n³ flops if all eigenvalues and eigenvectors of an upper Hessenberg matrix are computed. The total costs of the computation of all eigenvalues and eigenvectors of a real un-symmetric matrix become, including the transformation to upper Hessenberg form, roughly 25n³ flops. If only the eigenvalues are computed, the costs reduce to 10n³ flops. For symmetric matrices, the costs can be reduced by a factor 2. The total costs of O(n³) make the Q R method only applicable for matrices of order up to a few thousand. Note that the above costs exclude the computation of the LU-decomposition, which costs n³/2 flops. The current implementation of the
Q R method is properly optimised, but could be replaced by a more recent implementation. However,
the Q R method is not likely to be suitable to handle matrices of order greater than a few thousands.
7.2.4 Implementation notes
Because of the matrix-matrix products, the Q R method works on explicitly given matrices. This
means that the hierarchical matrices C and G must be determined in explicit format before applying
the Q R method. The poles can be computed from the eigenvalues λ by p = σ − α/λ, where α ∈ R
is a scalar such that α × maxi j (|Ci j |) ≈ 1.
The actual implementation in Pstar also contains an optimisation, which is described in [4]. A zero
column in C causes a zero column in G −1 C, which will result in an eigenvalue zero. If column j is
zero, the problem size can be reduced by removing column j and row j from G −1 C. This is only
a simple application of the technique described in [4]. One can further reduce the problem size by
considering zero rows as well.
The computation of G −1 is made more stable by dividing each row G i by max j (|G i j |, |Ci j |). Note that
the rows of C must be scaled with the same factor, to ensure that λ((DG)−1 DC) = λ(G −1 D −1 DC) =
λ(G −1 C), with D a diagonal matrix containing the row scale factors.
Before the zeroes z can be computed, first the transformation matrices A(a) and A(b) with corresponding m and n, for input and output expressions a and b, must be computed, as described in
Section 6.3.3. Applying the transformations results in explicitly given matrices Ĉ and Ĝ. After that,
the zeroes can be determined in a similar way as the poles are.
7.3 The Arnoldi method
The Arnoldi method computes the eigenvalues of the ordinary eigenvalue problem

(A − λI)x = 0,   x ≠ 0,   (7.21)

by creating an orthonormal basis for the Krylov subspace K_m(A, v) and approximating the eigenvalues of A by the eigenvalues of the projection of A onto K_m(A, v). The Krylov space K_m(A, v) is defined by

K_m(A, v) = span{v, Av, A²v, . . . , A^{m−1}v}.   (7.22)

Krylov subspaces have some important properties. In the scope of this chapter, a relevant property is that the Krylov subspace K_m(A, v) is spanned by the same basis if A is scaled and/or shifted, with α ≠ 0:

K_m(αA + βI, v) = span{v, (αA + βI)v, (αA + βI)²v, . . . , (αA + βI)^{m−1}v}
               = span{v, αAv + βv, (α²A² + 2αβA + β²I)v, . . . , (αA + βI)^{m−1}v}
               = K_m(A, v).   (7.23)
From this it can be concluded that subspace iteration methods for computation of the eigenvalues of
A are invariant under scaling and shifting of A.
In pole-zero analysis, A = G⁻¹C, but A will not be computed itself (in contrast to the Q R method, where this computation is necessary). Because the Arnoldi method is an iterative method, it is sufficient to compute the LU-decomposition of G, which can be used to compute matrix-vector multiplications of G⁻¹C with arbitrary vectors.
7.3.1 Algorithm
The trivial basis v, Av, A²v, . . . , A^{m−1}v for the Krylov subspace K_m(A, v) is, apart from the fact that it is not orthonormal, not interesting, because the vectors A^j v will tend to the dominant eigenvector of A. This is exactly what the OIM does. Arnoldi proposed to orthonormalise the basis by applying a modified Gram-Schmidt procedure[13, 36]. Assuming that there already exists an orthonormal basis v_1, v_2, . . . , v_j for K_j(A, v_1), it can be expanded with v_{j+1} = q_{j+1}/||q_{j+1}||_2 to form an orthonormal basis for K_{j+1}(A, v_1):

q_{j+1} = Av_j − \sum_{i=1}^{j} (Av_j, v_i) v_i.   (7.24)
If this is implemented in a straightforward manner, it is called the classical Gram-Schmidt method.
However, the classical Gram-Schmidt method suffers from severe cancellation problems. A solution
is the modified Gram-Schmidt method, as will be described in the next paragraph and Algorithm 6.
The starting vector v1 = v/||v||2 should be chosen with care. Of course, it must be non-zero, but it
should also be avoided that ( Av1 , v1 )v1 = Av1 , which would result in a breakdown at the start: v2
would be zero in that case. This situation can also occur in a later iteration and means that an invariant
subspace for A has been obtained. If, for instance,
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},
and v1 = e1 , the Arnoldi method breaks down during the first iteration. Possible solutions for this
situation will be discussed later.
In exact arithmetic, the modified Gram-Schmidt scheme will be sufficient to orthonormalise the basis.
In finite precision, however, it may happen that the orthogonalised new vector is much smaller in norm
than it was before orthogonalisation. This common phenomenon leads to numerical cancellation, and
this can be solved in several ways, one of which will be discussed now. If the quotient of the norm
after and before the orthogonalisation is less than κ < 1, Gram-Schmidt can be applied again until
the quotient is greater than κ. This procedure is a form of iterative refinement. Algorithm 6 shows a
template for the Arnoldi orthonormalisation with iterative refinement.
Algorithm 6: The Arnoldi Method with iteratively refined Modified Gram-Schmidt.
Choose a convenient v ≠ 0
Determine a tolerance κ < 1
v_1 = v/||v||_2
for j = 1, 2, 3, . . . , m − 1 do
  w = A v_j
  w_0 = ||w||_2
  for i = 1, 2, 3, . . . , j do
    h_{ij} = v_i^* w
    w = w − h_{ij} v_i
  while ||w||_2 / w_0 ≤ κ do
    w_0 = ||w||_2
    for i = 1, 2, 3, . . . , j do
      r = v_i^* w
      w = w − r v_i
      h_{ij} = h_{ij} + r
  h_{j+1,j} = ||w||_2
  v_{j+1} = w / h_{j+1,j}
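A direct Python transcription of Algorithm 6 may look as follows (illustrative sketch only; no breakdown handling, and dense matrix-vector products are used):

# Minimal sketch of Algorithm 6: Arnoldi with modified Gram-Schmidt and
# re-orthogonalisation when severe cancellation is detected.
import numpy as np

def arnoldi(A, v, m, kappa=0.25):
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m - 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m - 1):
        w = A @ V[:, j]
        w0 = np.linalg.norm(w)
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        while np.linalg.norm(w) <= kappa * w0:     # severe cancellation: refine
            w0 = np.linalg.norm(w)
            for i in range(j + 1):
                r = V[:, i] @ w
                w = w - r * V[:, i]
                H[i, j] += r
        H[j + 1, j] = np.linalg.norm(w)            # (no breakdown handling in this sketch)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 50))
V, H = arnoldi(A, rng.standard_normal(50), m=21)
ritz = np.linalg.eigvals(H[:20, :20])              # Ritz values of the leading part
print(np.sort(np.abs(ritz))[::-1][:3])             # a few exterior eigenvalue estimates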
Thus far, in exact arithmetic, an orthonormal basis for Km ( A, v1 ) has been created by means of
v1 , v2 , v3 , . . . , vm . So what is the connection with eigenvalues? The answer follows from a closer
inspection of the Arnoldi iterations. If V j is the n × j matrix with columns v1 to v j and H j, j −1 is a
j × ( j − 1) upper Hessenberg matrix, the Arnoldi orthonormalisation process, up to iteration m, can
be summarised as
AVm−1 = Vm Hm,m−1 = Vm−1 Hm−1,m−1 + h m,m−1 vm e∗m .
(7.25)
The elements of Hm,m−1 are defined as h i j by the Arnoldi algorithm. The matrix Hm,m−1 is the
matrix A reduced to the Krylov subspace Km ( A, v1 ) and often referred to as the reduced matrix A.
Eigenvalues θ of the leading (m−1)×(m−1) part of H_{m,m−1}, i.e. eigenvalues of H_{m−1} = V_{m−1}^* A V_{m−1}
by orthogonality of Vm−1 , are called the Ritz values of A with respect to the Krylov subspace of
dimension m. The vector Vm−1 s, with s an eigenvector of Hm−1 , is called a Ritz vector of A. Reasoning
in a similar way as with the OIM, the eigenvalues of Hm−1 will converge to the eigenvalues of A and
the Ritz vectors will converge to eigenvectors of A. However, as has been stressed before, there is
an important difference between the OIM and the Arnoldi method. Because A and à = α A + β I
generate the same Krylov subspace (in other words, H̃_{m−1} = α H_{m−1} + β I_{m−1}), the position of the
origin is not relevant. Krylov methods have no specific preference for the absolute largest eigenvalue,
but approximate eigenvalues to the border of the convex hull of the spectrum faster than eigenvalues
in the centre of the spectrum[36]. Note that eigenvalues at the border are not necessarily the largest
eigenvalues in absolute value. Furthermore, with (θ, s) an eigenpair of Hm−1 , an important relation
for the residual of the approximate eigenpair (θ, Vm−1 s) of A can be derived:
H_{m−1,m−1} s = θ s,
V_{m−1}^T A V_{m−1} s − θ V_{m−1}^T V_{m−1} s = 0,
V_{m−1}^T (A V_{m−1} s − θ V_{m−1} s) = 0,
i.e., the residual of the approximate eigenpair (θ, V_{m−1}s) is orthogonal to the current Krylov subspace. This is a nice property when working with iterative methods, because it guarantees an optimal search direction. Another practical feature is that the residual norm can be computed quite easily. For the
approximate eigenpair (θ, Vm−1 s), it follows from
AVm−1 s = Vm−1 Hm−1 s + h m,m−1 vm e∗m s
= θ Vm−1 s + h m,m−1 e∗m svm ,
that the residual is given by
( A − θ I )Vm−1 s = h m,m−1 e∗m svm ,
and therefore, the residual norm is
||( A − θ I )Vm−1 s||2 = ||h m,m−1 e∗m svm ||2
= |h m,m−1 e∗m s|.
The residual norm is thus equal to the absolute value of the m-th entry of s multiplied by h_{m,m−1}.
The residual norm can be used for deriving a stopping procedure (although it is not always an accurate
representation of the actual error in finite precision). After the mth Arnoldi iteration, the spectrum of
the leading part Hm−1 can be computed by for instance the Q R method. Note that this can be done in
a relatively cheap way (O((m − 1)³) flops). However, only m − 1 out of n eigenvalue approximations are obtained.
7.3.2 Accuracy
According to [32], eigenvalues with large absolute values are approximated in an efficient and accurate way (±14 significant digits). However, eigenvalues near zero are poorly approximated. Moreover, artificial eigenvalues near zero are produced by the algorithm. This may be expected when the number of iterations is too small. Indeed, if no shift is applied, the exterior eigenvalues will be approximated faster
than the interior (i.e. near zero) eigenvalues. A possible cure for this is to shift-and-invert the original
matrix (which does not change the Krylov subspace), and to check the convergence behaviour automatically. It is not clear from [32] whether they applied iterative refinements to the LU -decomposition
of G and the Modified Gram-Schmidt process.
In practice, Arnoldi iterations have been proven to be very fast, but they lack the necessary accuracy.
The accuracy may be improved by using more iterations or adding techniques like restarting and
shift-and-invert, which will be discussed in the next chapter.
7.3.3 Computational notes
The costs for m Arnoldi iterations, without refinement, are about 2nm(2m + n) flops. The costs for the computation of the eigenvalues of the upper Hessenberg matrix H_{m−1} are 10m³ flops. So for m much smaller than the dimension n of A, a substantial gain in computational costs is to be expected when using Arnoldi iterations instead of the Q R method in combination with direct transformation to upper Hessenberg form. Additional refinement iterations will contribute O(2m²n) flops to the total number of operations.
These costs are for dense matrices. If sparse matrices are used, then the costs for the Arnoldi iterations
can be decreased. For instance, if A contains N non-zero entries, the costs for a matrix-vector product
are 2N − 1 flops.
7.3.4 Implementation notes
The Arnoldi method can be applied to hierarchical matrices C and G, but the resulting Hessenberg
matrix HN must be an explicit matrix. The dimension of HN , i.e. the number of Arnoldi iterations N,
must be determined by some criterion or automatically. Currently, N is determined beforehand. The
eigenvalues λ of HN are computed with the Q R method, which is relatively cheap for N << n. The
corresponding poles are computed by p = σ − α/λ, where α is a scaling factor for C, similar to the
factor used to scale C in the previous section.
After the hierarchical matrices Ĉ and Ĝ have been computed by applying the transformations A(a) and A(b) (see Section 6.3.3), the zeroes can be computed in a similar way as the poles.
7.3.5 Possible improvements
For successful application of the Arnoldi method, it is necessary to make some adjustments to the
basic scheme. Besides the option of replacing Arnoldi by another iterative eigenvalue solver such as Jacobi-Davidson, several improvements are available in the literature:
Shift-and-Invert Arnoldi This method is useful when the eigenvalues closest to a particular shift need to be computed and the computation of (G⁻¹C)⁻¹ is relatively cheap (a small sketch of shift-and-invert follows after this list).
Implicitly Restarted Arnoldi When relatively many iterations are needed, it is efficient to restart a
couple of times to decrease memory usage and computational work.
Preconditioning Like for iterative methods to solve linear systems, iterative methods for the solution
of eigenproblems may take advantage of pre-conditioners.
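As an illustration of the first item, shift-and-invert Arnoldi is readily available in SciPy; the sketch below (with a made-up sparse test matrix) asks for the six eigenvalues closest to a shift σ:

# Minimal sketch (not Pstar): shift-and-invert Arnoldi via scipy's eigs, to find
# the eigenvalues of a sparse matrix closest to a chosen shift sigma.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags(np.linspace(-1.0e4, 1.0e4, n), format="csc")   # hypothetical wide spectrum
sigma = 0.0                                                  # look near the origin

vals = spla.eigs(A, k=6, sigma=sigma, return_eigenvectors=False)
print(np.sort(np.abs(vals)))       # the six eigenvalues closest to sigma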
7.3.6 Arnoldi versus the Q R method
Using Arnoldi iterations to compute all n eigenvalues of an n × n matrix may be inefficient. If the
matrix is dense, constructing the orthonormal basis is rather expensive when compared with reduction
to upper Hessenberg form by Householder transformations. Moreover, the Q R method is used to
compute the eigenvalues of the n × n Hessenberg matrix. In this case, one should consider using the
Q R method from the start. The same argument holds for sparse matrices. although the difference
may be less significant. The greatest advantage of Arnoldi iteration is the possibility to handle very
large (sparse) problems. By using Shift-and-Invert techniques, the whole spectrum can be scanned,
whereas the Q R method computes the whole spectrum at once (which is impossible for large n).
7.4 A note on condition numbers
The condition number κ of a matrix A is usually defined by
κ = ||A−1 ||||A||.
(7.26)
If the condition number of a matrix is large, i.e. κ > 10⁴, the matrix is called ill-conditioned. A rule of thumb, resulting from perturbation theory, is that inverting a matrix with condition number κ may generate a loss of log₁₀ κ significant digits. Consequently, the impact of the condition number on numerical computations depends on the working precision of the computer.
Turning to the pole-zero eigenvalue problem, matrices G with κ = O(10⁵) are rather common. This means that transforming the generalised eigenproblem to an ordinary eigenproblem by inverting G introduces a loss of at least 5 digits. If calculations are performed in double precision with 15 significant digits, only 10 significant digits are left for the eigencomputation. Iterative refinement may prevent this number of digits from dropping further. However, if the eigenvalues vary by, for example, O(10¹⁰) in absolute magnitude, then the smallest eigenvalues have no significant digits.
For this reason, solving the eigenproblem by inverting G and applying the Q R method is very dangerous. Although the resulting eigenproblem may be solved without problems, the results may be
worthless if G is badly conditioned. Therefore, it is advised to use the Q Z method, which is the
equivalent of the Q R method for generalised eigenproblems, for solving the pole-zero eigenproblem.
The Q Z method produces essentially the same results as the Q R method, but it can indicate inaccurate eigenvalues.
This will be discussed in the next section.
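The rule of thumb above is easy to try out; the following short sketch (with a hypothetical, badly scaled matrix) computes κ and the corresponding estimate of the number of digits that may be lost:

# Minimal sketch of the rule of thumb: log10 of the condition number of G
# estimates how many significant digits may be lost when G is inverted.
import numpy as np

G = np.diag([1.0e5, 1.0, 1.0e-1])        # hypothetical badly scaled matrix
kappa = np.linalg.cond(G)
print(kappa, np.log10(kappa))            # kappa = 1e6, about 6 digits potentially lost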
7.5 The Q Z method
The Q Z method can be used to solve generalised eigenvalue problems. The specific generalised
eigenproblem arising from pole-zero analysis is
Cx = λGx,   x ≠ 0,   (7.27)
where the poles p are given by p = −1/λ. In this section, however, the generalised eigenproblem
Ax = λBx,   x ≠ 0,   (7.28)
will be referred to. This section is based on Section 12 of [36] and Chapter 7.7 of [14]. It is assumed
that the matrices A and B are real. The theory can be extended for complex matrices A and B.
7.5.1 Algorithm
The Q Z algorithm can be derived from the following expression, which is obtained by applying the
Q R method to AB −1 y = λy:
Q ∗ AB −1 Q = R,
(7.29)
the unitary reduction of AB −1 to Schur form. The eigenvalues are the same for the generalised eigenproblem, while the eigenvectors are given by x = B −1 y. Note that this strategy is only possible when
B is nonsingular and is well conditioned. Dangers of converting from a generalised eigenproblem to
an ordinary problem, like round-off errors, have been discussed in the previous sections.
After B −1 Q is reduced to upper triangular form S, with a unitary Z, it follows from
B −1 Q = Z S
(7.30)
that (7.29) can be written as
Q^* A Z S = R,   or   A Z = Q R S^{-1}.   (7.31)
Because S and R are upper triangular matrices, so are S −1 and RS −1 . Since B Z = Q S −1 , both A and
B can simultaneously be reduced to Schur form, with unitary matrices Q and Z (instead of a single
unitary matrix Q for the Q R method):
AZ = Q RS −1 = Q R A ,
B Z = Q S −1 = Q R B .
(7.32)
This reduction is possible for general B.
To accomplish this, the Q Z method follows a similar pattern as the Q R method does for the ordinary
eigenproblem. First, the matrices A and B are simultaneously transformed to respectively upper
Hessenberg and upper triangular form:
Q̃ ∗ A Z̃ = H
and Q̃ ∗ B Z̃ = R,
(7.33)
where Q̃ and Z̃ are orthogonal matrices. This transformation can be done in two steps. First, B is
factored as B = Q R such that Q ∗ B is upper triangular. After that, Givens transformations G L(i) to
the left are used to reduce Q ∗ A to upper Hessenberg form. However, because it is desired to preserve
the eigenvalues, these Givens transformation have to be applied to Q ∗ B also, thereby destroying the
upper triangular form. This can be fixed by applying another sequence of Givens transformations
G R(i) to the right. Summarising, the reduction to upper Hessenberg-triangular form can be written as
G_L^{(m)} \cdots G_L^{(0)} (Q^* A) G_R^{(0)} \cdots G_R^{(k)} = H,   (7.34)
G_L^{(m)} \cdots G_L^{(0)} (Q^* B) G_R^{(0)} \cdots G_R^{(k)} = R.   (7.35)
The matrices Q̃ and Z̃ are thus defined by
Z̃ = G_R^{(0)} \cdots G_R^{(k)},   (7.36)
Q̃ = Q G_L^{(0)} \cdots G_L^{(m)}.   (7.37)
Having arrived at the upper Hessenberg-triangular form, the following major step is to reduce A to
upper (quasi-)triangular form by equivalence transformations Q i and Z i . This can be done in the same
way as with the Q R method. However, again the upper-triangular form of B will be destroyed and
again this can be cured by applying Givens transformations. If this is done iteratively, one eventually
arrives at orthogonal Q and Z, such that Q ∗ AZ = R A and Q ∗ B Z = R B . The non-zero sub-diagonal
elements of the Hessenberg form of A are iteratively transformed to zero.
The eigenvalues of the original problem can now be computed easily from the generalised Schur form
(R A − λR B )x = 0.
(7.38)
The generalised eigenvalues are given by the ratio of the diagonal elements of R A and R B :
λ_i = \frac{R_{ii}^A}{R_{ii}^B},   (7.39)
or the ratio of the eigenvalues of the 2 × 2 blocks of R A and the diagonal entries of R B in case of
complex-conjugated pairs of eigenvalues.
Taking the ratio of the diagonal elements can be dangerous if, for instance, B is singular or badly conditioned. Computing the ratio R_{ii}^A / R_{ii}^B for nearly zero elements R_{ii}^B (which may be zero in exact arithmetic) results in very inaccurate, finite eigenvalues λ < ∞ (so-called parasitic eigenvalues). On the other hand, if A is non-singular, one could have transformed the problem to Bx = µAx. In this case, the eigenvalues µ are given by µ_i = R̃_{ii}^B / R̃_{ii}^A and will be approximated much better.
It should be noted that the Q Z method computes the same eigenvalues as the Q R method does. However, there are (numerically) some big advantages of using the Q Z method instead of the Q R method.
First of all, there is more control over the accuracy of the computed eigenvalues: tiny diagonal elements can be detected during the algorithm. Moreover, if both the matrices A and B have singularities,
which may cause both R_{ii}^A and R_{ii}^B to be near zero, this can also be detected (and reported, whereas the Q R method would have computed the ratio). The computed ratio, and even all other computed eigenvalues, may in that case have little or no reliability. Such singularities may indicate ill-defined
problems.
If the eigenvectors are needed too, the products of the intermediate matrices Q i and Z i , resulting in the
final Q and Z, need to be computed also. The generalised eigenvectors xk of the original problem are
given by xk = Zyk , where yk are the generalised eigenvectors of the generalised Schur form (7.38).
A summary of the complete Q Z algorithm can be found in Algorithm 7.
Algorithm 7: The Q Z Method.
Jointly reduce A − λB to upper Hessenberg-Triangular form:
1. Decompose B = Q R, with R = Q ∗ B upper triangular;
2. Use Givens rotations to reduce Q ∗ A to upper Hessenberg form H .
Set A1 = H and B1 = R.
for i = 1, 2, 3, . . . until convergence do
Compute orthonormal Q i and Z i such that Ai+1 −λBi+1 = Q ∗i Ai Z i −λQ ∗i Bi Z i is again in upper
Hessenberg-triangular form.
A remark concerning the rank-deficiency of B is of particular interest for pole-zero analysis, because it
regularly happens that either C or G has rank smaller than n. In [14], it is stated that a rank-deficiency
of B does not affect the convergence speed of the Q Z algorithm.
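For illustration, the generalised Schur form and the eigenvalue ratios of (7.39) can be obtained with SciPy's qz routine; the sketch below (with small random matrices, B made singular on purpose) also flags nearly zero diagonal entries of R_B, which would correspond to parasitic eigenvalues:

# Minimal sketch: generalised Schur (QZ) decomposition with scipy and the
# eigenvalue ratios of (7.39), flagging nearly zero diagonal entries of R_B.
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
B[:, 0] = 0.0                              # make B singular on purpose

AA, BB, Q, Z = qz(A, B, output="complex")  # A = Q AA Z^H,  B = Q BB Z^H
for a, b in zip(np.diag(AA), np.diag(BB)):
    if abs(b) < 1e-12 * np.linalg.norm(B):
        print("infinite/parasitic eigenvalue: R_A ii =", a, " R_B ii =", b)
    else:
        print("lambda =", a / b)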
7.5.2 Accuracy
It is to be expected that the accuracy of the Q Z method does not differ much from the accuracy
of the Q R method. However, numerical experiments in [32] report convergence problems for small
positive eigenvalues. Furthermore, the accuracy of the Q Z method seems to depend heavily on the
ordering of the variables and equations, as well as on the problem size.
Formally, the exactly orthogonal matrices Q and Z satisfy Q ∗ ( A+E)Z = R A and Q ∗ (B+F)Z = R B ,
with ||E|| = O(u||A||) and ||F|| = O(u||B||), for machine-precision u.
7.5.3 Computational notes
The total costs of the Q Z method for computing all eigenvalues are 30n³ flops. It costs another 16n³ flops
to compute the right or left eigenvectors (which are in fact the cost for computing the matrices Z or
Q). The costs for computing the eigenvalues are roughly a factor 3 higher than the costs of the Q R
method. One of the reasons for this is the fact that the Q R method somehow has had all the attention,
and is thus optimised, while the Q Z method still exists in its ”original” implementation.
7.5.4 Implementation notes
The Q Z method is currently not (officially) implemented in Pstar.
7.6 Cancellation of poles and zeroes
Due to numerical errors, poles and zeroes which would cancel in exact arithmetic, may not cancel in
inexact arithmetic. Therefore, a criterion for the cancellation of real poles to real zeroes or pairs of
complex-conjugated poles to pairs of complex-conjugated zeroes must be defined. Assuming imaginary parts with equal sign, this criterion for the cancellation of a pole p to a zero z reads as
|p − z| < ε_{rel} \frac{|Re(p) + Re(z)|}{2} + ε_{abs}.   (7.40)

The absolute distance ε_{abs}, which controls the absolute distance between p and z, should be chosen small (ε_{abs} ≪ 1): if a pole and a zero have the same imaginary part and the same absolute real part with different signs, ε_{abs} is the only criterion, because |Re(p) + Re(z)| = 0. The relative criterion, with parameter ε_{rel}, states that the distance between p and z is small compared to the distance of p and z to the imaginary axis. For small ε_{abs}, the ratio z/p for cancelling (real) poles and zeroes (p, z ≥ 0) can be estimated as

\frac{z}{p} ≈ \frac{1 − \frac{ε_{rel}}{2}}{1 + \frac{ε_{rel}}{2}}.   (7.41)
Note that poles and zeroes satisfying this criterion numerically cancel, but may not cancel in exact
arithmetic. Suppose that a pole and a zero are unjustly judged to cancel. With s the frequency variable,
their contribution to the transfer function would be an amplification factor

\frac{\frac{s}{z} − 1}{\frac{s}{p} − 1} \cdot \frac{z}{p},   (7.42)

but is now equal to 1 due to the cancellation criterion. Depending on the absolute values of p and z, the erroneous factor made this way may be small for s = iω with small absolute value and grow for s with larger absolute value (compared to the absolute values of p and z). For s with small absolute value, the contribution is 1 instead of z/p. In Pstar, the default value of ε_{rel} is ε_{rel} = 0.05, which implies a ratio z/p ≈ 0.95, an error of approximately 5%. By decreasing ε_{rel}, i.e. by tightening the cancellation criterion, this error can be decreased¹.
¹ In Pstar, the parameters PZ CANCEL DISTANCE ABS and PZ CANCEL DISTANCE REL can be used to set ε_{abs} and ε_{rel}.
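The cancellation test (7.40) translates directly into code; the following sketch uses the Pstar default ε_rel = 0.05 mentioned above and an illustrative value for ε_abs:

# Minimal sketch of the cancellation test (7.40): a pole p and a zero z cancel
# when |p - z| < eps_rel*|Re(p) + Re(z)|/2 + eps_abs.
def cancels(p, z, eps_rel=0.05, eps_abs=1e-12):
    return abs(p - z) < eps_rel * abs(p.real + z.real) / 2.0 + eps_abs

print(cancels(-1000.0 + 0j, -980.0 + 0j))   # True:  within 5% of the average distance to the axis
print(cancels(-1000.0 + 0j, -700.0 + 0j))   # False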
Chapter 8
Stability analysis
8.1 Introduction
In the previous chapter, stability analysis has been mentioned as one of the applications of pole-zero
computation. For stability analysis, one is interested in the positive poles, if any, or, more preferably,
poles with a quality factor greater than a certain value (to be called Q m ). Other applications concern
free running oscillators, where an instability is explicitly desired. In this chapter, a strategy will be
discussed to compute the negative eigenvalues (which correspond with positive poles), without the
necessity to compute all eigenvalues. In fact, if a (single) negative eigenvalue exists, that eigenvalue
is computed first. Recall that the eigenproblem is given by
Cx = λGx,
(8.1)
G −1 Cx = λx.
(8.2)
or, equivalently,
The poles p are given by p = −1/λ. Note that the roles of C and G may be interchanged if G is ill-conditioned and C is better behaved.
8.2 Complex mapping of the problem
In order to efficiently compute a possible negative eigenvalue of G −1 C, a mapping M with the following properties is desired:
1. M must be onto and one-to-one.
2. The image M(G −1 C) must be cheap to compute.
3. Eigenvalues of M(G −1 C) indicating negative eigenvalues of G −1 C must be clearly separated.
The first property is desired to be able to (uniquely) compute the pole corresponding to the eigenvalue.
The second property follows from the goal of computing negative eigenvalues rather cheaply. The third
property is connected with this last argument. In general, subspace methods for the computation of
eigenvalues tend to have a preference for eigenvalues of largest absolute value[36]. Although this can be influenced by applying shifts, that may not be very applicable in this case, because it is not certain whether there is a negative eigenvalue and what its value may be. Therefore, M should map the
negative eigenvalues of G −1 C to the border of the spectrum of M(G −1 C).
With these three requirements in mind, the following mapping is constructed:
M(G −1 C) = (I + G −1 C)−1 (I − G −1 C).
(8.3)
It can be proved that this mapping is onto and one-to-one (see for example [8]). M maps the positive
half-plane onto the interior part of the unit circle. Some algebra shows that eigenvalues λ of G −1 C
relate to eigenvalues λ̃ of M(G −1 C) in the following way:
λ = \frac{1 − λ̃}{1 + λ̃}.   (8.4)

Conversely, it also holds that

λ̃ = \frac{1 − λ}{1 + λ}.   (8.5)
Note that relation (8.4) satisfies the third property, because λ̃ > 1 → λ < 0 and λ̃ < −1 → λ < 0, while −1 < λ̃ < 1 → λ > 0. Furthermore, λ̃ = 1 implies an eigenvalue λ = 0, while for λ̃ = −1, relation (8.4) is not defined (lim_{λ̃↑−1} \frac{1−λ̃}{1+λ̃} = −∞ and lim_{λ̃↓−1} \frac{1−λ̃}{1+λ̃} = ∞). Although it may be the case that there is a λ = −1, this will in general not happen due to numerical round-off. Moreover,
this λ = −1 causes a singularity in I + G −1 C and will be detected during the LU -decomposition of
I + G −1 C.
Having satisfied the three requirements, only the greatest eigenvalue in absolute value of M(G −1 C)
needs to be computed to come to a conclusion about stability. If this eigenvalue is greater than one in absolute value, it can be concluded that G⁻¹C has a negative eigenvalue and, consequently, the original
problem has a positive pole.
It remains to make a remark about the computation of (I + G −1 C)−1 (I − G −1 C). Note that this term
can be rewritten as
(I + G −1 C)−1 (I − G −1 C) = (G −1 (G + C))−1 G −1 (G − C)
= (G + C)−1 GG −1 (G − C)
= (G + C)−1 (G − C).
The matrices G and C are sparse matrices and the inversion of G + C will not be computed explicitly,
but in the form of an LU -decomposition. This LU -decomposition needs to be computed only once.
Note that such a sum G + C also appears in transient analysis, where G + \frac{β_{s1}}{\Delta t} C is the Jacobian of the
Newton process (see Section 4.3). So the mapping does not introduce substantial extra computational
costs.
From a more general point of view, when the value of Q m is considered as the criterion, more complex
mappings have to be designed. This will not be discussed here.
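The following Python sketch illustrates the mapping of this section on a small, artificial example (the matrices G and C below are random stand-ins, not circuit matrices): an eigenvalue of (G + C)⁻¹(G − C) outside the unit circle reveals a negative eigenvalue of G⁻¹C and hence a positive pole.

# Minimal sketch of the mapping M: eigenvalues of (G+C)^{-1}(G-C) outside the
# unit circle indicate negative eigenvalues of G^{-1}C (positive poles).
import numpy as np

rng = np.random.default_rng(5)
n = 30
G = np.eye(n) + 0.1 * rng.standard_normal((n, n))
C = np.diag(np.abs(rng.standard_normal(n)))
C[0, 0] = -2.0                                   # plant one negative eigenvalue of G^{-1}C

M = np.linalg.solve(G + C, G - C)                # (G+C)^{-1}(G-C), via an LU solve
mu = np.linalg.eigvals(M)
print(np.max(np.abs(mu)) > 1.0)                  # True: the example would be unstable
lam = np.linalg.eigvals(np.linalg.solve(G, C))
print(np.min(lam.real))                          # indeed a negative eigenvalue exists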
8.3 Efficient maximal eigenvalue computation
The problem of computing a possible negative eigenvalue is now transformed to the problem of computing the eigenvalue(s) with absolute value greater than one. The matrix concerned is
A = (G + C)−1 (G − C).
(8.6)
The largest eigenvalue of A can be computed by for example the Power Method[36], Jacobi-Davidson[26]
or Arnoldi iterations[36]. However, due to the choice of the mapping M, the eigenvalues of M( A)
are clustered together around −1 and 1, with possibly poor convergence as a result. Negative eigenvalues are mapped near the unit circle, while only eigenvalues near −1 are mapped away from the unit
circle. Eigenvalues which satisfy −1 < λ < 0 are mapped near λ̃ = 1, while eigenvalues λ < −1
are mapped near λ̃ = −1. A possible way to get rid of this clustering is by scaling the transformed
problem. If the negative eigenvalue is more or less separated (in absolute value) from the other eigenvalues, O(1) Power Iterations with G −1 C give a good approximation for the absolute value of the
largest eigenvalue. This value (α) can be used to scale the transformed problem:
(αI + G^{-1}C)^{-1}(αI − G^{-1}C) = (G + \frac{1}{α}C)^{-1}(G − \frac{1}{α}C).   (8.7)
This is a special case of the more general Cayley transformation. As a result, the negative eigenvalue
will be mapped away from the unit circle instead of near the unit circle. This technique can be used
for both relatively large and relatively small negative eigenvalues. However, such a clear distinction
of a negative eigenvalue highly depends on the nature of the problem. Techniques for handling more
clustered problems, like Chebychev polynomials for instance, go beyond the scope of this report. In
general, the matrix C has a rank deficiency, caused by columns (and rows) of zeroes. These columns
result in eigenvalues equal to zero of G −1 C and to eigenvalues equal to one in the mapped problem.
Similar to the strategy discussed in Section 7.2.4 and [4], columns j which satisfy [M(G −1 C)] j j = 1
and [M(G⁻¹C)]_{ij} = 0 for i ≠ j can be removed, together with the corresponding rows. These columns
can be detected by taking a small number (O(1)) of inner-products of the columns with random
vectors. However, this can only be done for explicitly given matrices.
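As a sketch of the scaling step (8.7): a few Power iterations with G −1 C give an estimate α, after which the scaled mapped matrix is applied via one LU -decomposition of G + (1/α)C. The number of iterations and the variable names below are illustrative.

  % Sketch of the scaled transformation (8.7).
  [Lg, Ug] = lu(G);
  x = rand(size(G,1), 1);  x = x / norm(x);
  for k = 1:5                          % O(1) Power iterations with G^{-1}C
    x = Ug \ (Lg \ (C * x));
    alpha = norm(x);  x = x / alpha;   % alpha ~ largest |eigenvalue| of G^{-1}C
  end
  [Ls, Us] = lu(G + C/alpha);          % factor the scaled pencil once
  applyM = @(v) Us \ (Ls \ ((G - C/alpha) * v));   % (G + C/alpha)^{-1}(G - C/alpha) v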
For some applications, like Periodic Steady-State analysis [16], the eigenvector corresponding to the
computed eigenvalue is needed. The choice of the numerical method for computing the eigenvalue
should also consider this fact.
8.4 Numerical results
To illustrate the above strategy, it has been tested on matrices C and G of several circuits from the Pstar
test set. Figure 8.1 shows the eigenspectrum of G −1 C, arising from a realistic model of an amplifier.
The negative eigenvalue is in fact a complex-conjugate pair with values −2.9926 · 10^1 ± i 2.0676 · 10^2. In absolute value, this eigenvalue clearly does not distinguish itself from the other eigenvalues. As a result, a promising shift is hard to find, while the spectrum of the mapped matrix M(G −1 C) suffers from clustered eigenvalues (see the lower picture in Figure 8.1). Although the spectrum is clustered (the left-most eigenvalue, which lies outside the unit circle, can hardly be distinguished), still a computational gain is reached. Using standard Arnoldi iterations (eigs), without Shift-and-Invert, the costs for computing all eigenvalues are 2.7 · 10^7 flops, while computing only the largest value of the transformed problem with the same Arnoldi method costs 1.4 · 10^6 flops.
[Plot: eig( G\C ) (upper panel) and eig( (1+G\C)\(1−G\C) ) (lower panel).]
Figure 8.1: The spectrum of G −1 C (upper figure) and of M(G −1 C). The dimension is 60.
However, using Arnoldi iterations to compute all eigenvalues is not efficient, because it will be outperformed by the Q R-method.
Figure 8.2 shows the eigenspectrum of G −1 C, arising from a model of another amplifier. In this case, the negative eigenvalue λ = −1.1414 · 10^5 is perfectly separated from the other eigenvalues (in absolute value). However, the lower picture in Figure 8.2, which shows the spectrum of M(G −1 C), still shows some clustering (the complex-conjugated pair seems to be projected to the same point, but this is not the case: the imaginary part is small, O(10^−10), compared to the scale). However, by scaling with α = 1 · 10^5, the negative eigenvalue of the scaled problem lies near −1, which causes the mapped eigenvalue to be mapped away from the unit circle (see Figure 8.3), resulting in faster convergence. Indeed, the Power Method for computing the largest eigenvalue needs 3 iterations instead of 6 iterations with a tolerance of 1 · 10^−12. Considering the flop count, 5.0 · 10^5 flops are used against 2.1 · 10^7 flops when using standard Arnoldi iterations. Again it should be noted that it is not efficient to use Arnoldi iterations to compute all eigenvalues. Note that in general it is not fruitful to use the Power Method on the original problem, because it is not known whether the largest eigenvalue in absolute value is a negative eigenvalue. This is another advantage of this strategy: if the spectrum of the transformed problem is sufficiently spread, an efficient method with little overhead can be used to compute the largest eigenvalue.
It is to be expected that this strategy works well on ill-conditioned problems, i.e. problems with a wide
spectrum. Given a good transformation, the computational costs for performing stability analysis can
be reduced as a result of both isolating the interesting eigenvalue and using an effective eigenmethod.
[Plot: eig( G\C ) (upper panel) and eig( (1+G\C)\(1−G\C) ) (lower panel).]
Figure 8.2: The spectrum of G −1 C (upper figure) and of M(G −1 C). The dimension is 30.
[Plot: eig( G\C ) (upper panel) and eig( (1+G\C)\(1−G\C) ) (lower panel).]
Figure 8.3: The spectrum of G −1 C (upper figure) and of M(G −1 C). The dimension is 30.
Chapter 9
Jacobi-Davidson methods
9.1 Introduction
This chapter will primarily focus on the Jacobi-Davidson methods JDQR and JDQZ [26, 12, 36].
After a short introduction of the original Jacobi-Davidson method, which is based on both a method of
Jacobi and the Davidson method, JDQR, for the ordinary eigenproblem, and JDQZ, for the generalised
eigenproblem, will be discussed. The methods will not be described in the deepest detail, but attention will be paid to the global concepts and to the correction equation, which must be solved in every iteration of the Jacobi-Davidson algorithm. It is in the correction equation that the question of a suitable preconditioner comes into view.
9.2 The origin of Jacobi-Davidson methods
9.2.1 Jacobi’s method: JOCC
The method known as the Jacobi method, based on Jacobi rotations to force an n×n matrix A to diagonal dominance, is well known. However, another method of Jacobi, the Jacobi Orthogonal Component
Correction (JOCC), originating from 1846, forms a part of the base of the Jacobi-Davidson method.
With the Jacobi and Gauss-Jacobi iterations for solving linear systems in mind (Gauss-Jacobi iterations split A = D − R, with D = diag( A), and iterate D xi+1 = b + Rxi to solve (diagonally dominant) systems Ax = b), Jacobi considered the eigenvalue problem as a system of linear equations. For a diagonally dominant matrix A, with a11 = α the largest diagonal element, this α is an approximation of the largest
eigenvalue λ. The unit vector e1 is an approximation for the corresponding eigenvector u = (1, zT )T :
A (1, zT )T = [ α cT ; b F ] (1, zT )T = λ (1, zT )T ,   (9.1)
where α is a scalar, b, c, z are vectors of size n − 1 and F is an (n − 1) × (n − 1) matrix. This equation
is equivalent to the two equations
λ = α + cT z,
(F − λI )z = −b.
Using Gauss-Jacobi iterations, the latter equations can be solved iteratively to compute approximations θk (these are not Ritz values) of λ and approximations (1, zkT )T of u:
θk = α + cT zk ,
(D − θk I )zk+1 = (D − F)zk − b,   (9.2)
where D is the diagonal of F. Note that it is customary to compute the update Δk of zk and to form zk+1 = zk + Δk, instead of computing zk+1 directly. However, in the light of Jacobi-Davidson, representation (9.2) will be considered.
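A minimal MATLAB sketch of iteration (9.2); the partitioning of A and the stopping criterion are illustrative choices, not prescribed by the method.

  function [theta, z] = jocc(A, maxit, tol)
  % JOCC iteration (9.2) for a diagonally dominant matrix A.
  alpha = A(1,1);  c = A(1,2:end)';  b = A(2:end,1);  F = A(2:end,2:end);
  D = diag(diag(F));  n1 = length(b);
  z = zeros(n1, 1);  theta = alpha;
  for k = 1:maxit
    theta_new = alpha + c'*z;                               % theta_k = alpha + c'*z_k
    z = (D - theta_new*eye(n1)) \ ((D - F)*z - b);          % (D - theta_k I) z_{k+1} = (D - F) z_k - b
    if abs(theta_new - theta) < tol*abs(theta_new)
      theta = theta_new;  return
    end
    theta = theta_new;
  end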
An important remark addresses the fact that the JOCC looks for the orthogonal complement with respect to the initial approximation u1 = e1 in every iteration: u − (uT e1 )e1. During the iteration process, better approximations uk of u are computed, which can be used to try to compute the orthogonal complement u − (uT uk )uk instead (which would change the operator F in each step because of the orthogonalisation). This observation, which may improve the iteration process significantly, is one of the key ideas of the Jacobi-Davidson method.
9.2.2 Davidson’s method
Davidson’s method has been invented by the chemist Davidson with diagonally dominant matrices in
mind. The starting point is a k-dimensional subspace K with orthonormal basis u1 , u2 , . . . , uk . With
the n×k matrix Uk having columns u1 , u2 , . . . , uk , the projection of A onto K becomes Bk = Uk∗ AUk .
An eigenvalue θ of Bk , with eigenvector s, corresponds to a Ritz value θ and a Ritz vector yk = Uk s
of A:
Bk s = θs,   (9.3)
AUk s = θUk s.   (9.4)
Because Bk = Uk∗ AUk and Uk∗ Uk = I , it follows that the Ritz pair (θ, Uk s) satisfies a Ritz-Galerkin
condition:
AUk s − θUk s ⊥ {u1 , u2 , . . . , uk }.
(9.5)
Denoting y = Uk s, the residual of this approximated eigenpair reads as
r = Ay − θy.
(9.6)
To find a successful expansion of the subspace, and thereby an improving update of the approximation
y, Davidson focuses on the remaining dominant component of the wanted eigenvector in the residual.
Because Davidson’s problem domain consisted of diagonally dominant matrices, he suggests to solve
t from (D − θ I )t = r, where D is the diagonal of A. This vector t is orthogonalised to the basis
u1 , u2 , . . . , uk , resulting in a vector uk+1 with which the subspace K is expanded.
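In MATLAB-like notation, assuming the current orthonormal basis U, the Ritz pair (θ, y) and the residual r = Ay − θy are given (an illustrative setting), the Davidson expansion step is simply:

  % Davidson expansion: solve (D - theta*I) t = r with D = diag(A),
  % orthogonalise against the current basis and expand.
  d = diag(A);
  t = r ./ (d - theta);
  t = t - U*(U'*t);
  t = t / norm(t);
  U = [U, t];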
Although Davidson’s method shows fast convergence for diagonally dominant matrices, it stagnates for diagonal matrices: in this case, r = Ay − θy = Dy − θy, such that t = (D − θ I )−1 r = (A − θ I )−1 r = y. In fact, (A − θ I )−1 maps r onto y and does not lead to an expansion of the search space. It is clear that preconditioners for A − θ I should not be too good.
The Davidson method tries, similar to the JOCC, to find an orthogonal update for the initial approximation u1 = e1, and hence also uses a fixed operator (D). However, whereas the JOCC only considers
the last computed orthogonal component zk and its correction, the Davidson method takes into account the subspace constructed so far (by orthogonalising the expansion t against it). This aspect is
another key idea of the Jacobi-Davidson method.
For the remainder of this chapter, the matrix A may be either real or complex. As in the preceding
sections, A is not assumed to be symmetric.
9.3 The Jacobi-Davidson method
9.3.1 Algorithm
The base of the Jacobi-Davidson method [26] is formed by the following two ideas, originating from
the JOCC and Davidson’s method respectively (hence the name):
1. Search for the component of the wanted eigenvector x orthogonal to the most recent approximation uk of x.
2. Orthogonally expand the subspace in which the eigenvectors are approximated.
To satisfy the first idea, it is interesting to consider the subspace uk⊥. From the relation (I − uk u∗k )uk = 0 (note that ||uk || = 1), it follows that the orthogonal projection of A onto uk⊥ is given by
B = (I − uk u∗k ) A(I − uk u∗k ).   (9.7)
From the Ritz-Galerkin condition ( Auk − θk uk , uk ) = 0 it follows that u∗k Auk = θk , which gives
A = B + Auk u∗k + uk u∗k A − θk uk u∗k .   (9.8)
In order to compute the eigenpair (λ, x) of A, for a λ close to θk and an x = uk + v close to uk , the correction v ⊥ uk must satisfy
A(uk + v) = λ(uk + v).
(9.9)
This equation can be rewritten, using equation (9.8) and the facts Buk = 0 and u∗k v = 0, as
Bv + Auk + θk uk − θk uk + uk u∗k Av = Bv + (Auk − θk uk ) + (θk + u∗k Av)uk
= λ(uk + v),
(9.10)
to finally yield, with rk = Auk − θk uk ,
(B − λI )v = −rk + (λ − θk − u∗k Av)uk .
(9.11)
Using the Ritz-Galerkin condition rk ⊥ uk and v ⊥ uk again, and thus (B − λI )v ⊥ uk , the equation
for the correction v reduces to
(B − λI )v = −rk = −( Auk − θk uk ).
(9.12)
As λ is unknown, it is replaced by θk on the left-hand side, as in the JOCC and Davidson’s method (except that θk in this case is a Ritz value). Noting that (I − uk u∗k ) is idempotent (of course) and that (I − uk u∗k )rk = rk , the correction equation becomes
(I − uk u∗k )( A − θk I )(I − uk u∗k )v = −rk .   (9.13)
After the vector v is solved from the correction equation, the current subspace will be expanded with
an orthogonalised v, following the second idea of the Jacobi-Davidson method. This is similar to the
expansion in Davidson’s method.
It should be remarked that the approximation v = −rk leads to the same subspaces as the Arnoldi method [36]. However, Jacobi-Davidson does not take into account that an upper Hessenberg structure can be used. Furthermore, by approximating the projected operator by D − θk I , with D the diagonal of A, and neglecting the condition v ⊥ uk , the original Davidson method is obtained.
Algorithm 8 shows the basic Jacobi-Davidson method for computing the largest eigenvalue and corresponding eigenvector of a matrix A. Based on this method, Jacobi-Davidson style algorithms like
Jacobi-Davidson QR (JDQR) and Jacobi-Davidson QZ (JDQZ) have been developed.
Algorithm 8: The Jacobi-Davidson method.
Choose a starting vector t = v0 , V0 = V0A = [].
for m = 1, 2, 3, . . . do
  {Orthogonalise t to Vm−1 and expand:}
  for i = 1, . . . , m − 1 do
    t = t − (v∗i t)vi
  vm = t/||t||2 , vmA = Avm
  Vm = [Vm−1 , vm ], VmA = [Vm−1A , vmA ]
  {Compute the expansion of the projection of A onto Vm (W m = V ∗ AV ):}
  for i = 1, . . . , m − 1 do
    W m i,m = v∗i vmA
    W m m,i = v∗m viA
  W m m,m = v∗m vmA
  {Directly compute the largest eigenpair of W m :}
  (θ, s̃) = Q R(W m ), s = s̃/||s̃||2
  u = Vm s, u A = V A s
  r = u A − θu
  {Test for convergence, with tolerance ε:}
  if ||r||2 ≤ ε then
    Stop, with λ = θ, x = u
  {Solve t ⊥ u from the Correction Equation (approximately):}
  (I − uu∗ )(A − θ I )(I − uu∗ )t = −r
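To make the structure of Algorithm 8 concrete, the following MATLAB sketch computes the largest eigenvalue (in absolute value) of A. The small projected eigenproblem is solved with eig instead of an explicit Q R call, and the GMRES settings for the correction equation are illustrative assumptions.

  function [lambda, x] = jd_basic(A, v0, tol, maxit, gmres_its)
  % Sketch of Algorithm 8 (Jacobi-Davidson, no restarts, no preconditioner).
  n = size(A,1);
  t = v0;  V = zeros(n,0);  VA = zeros(n,0);
  for m = 1:maxit
    for i = 1:size(V,2)                         % orthogonalise t to V
      t = t - (V(:,i)'*t) * V(:,i);
    end
    v = t / norm(t);
    V = [V, v];  VA = [VA, A*v];
    W = V' * VA;                                % projected matrix W = V'*A*V
    [S, D] = eig(W);
    [mx, j] = max(abs(diag(D)));                % largest Ritz value
    theta = D(j,j);  s = S(:,j) / norm(S(:,j));
    u = V*s;  r = VA*s - theta*u;
    if norm(r) <= tol
      lambda = theta;  x = u;  return
    end
    % Correction equation (I - uu')(A - theta I)(I - uu') t = -r,
    % solved approximately with a few GMRES steps.
    op = @(z) proj(u, A*proj(u,z) - theta*proj(u,z));
    t  = gmres(op, -r, [], 1e-2, gmres_its);
    t  = proj(u, t);                            % keep t orthogonal to u
  end
  error('Jacobi-Davidson did not converge');

  function w = proj(u, z)
  w = z - u*(u'*z);                             % (I - u u') z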
The Ritz-Galerkin condition
AVk s − θ Vk s ⊥ {v1 , . . . , vk }
(9.14)
is used in Algorithm 8 to derive the reduced system
Vk∗ AVk s − θs = 0.
(9.15)
The computational costs for the Jacobi-Davidson method are approximately the same as the costs of Davidson’s method, if the costs of solving the correction equation are equal to the costs of solving Davidson’s equation. However, the costs of solving the correction equation heavily depend on the method and preconditioner used.
If the correction equation is solved exactly, the Jacobi-Davidson method has cubic convergence for symmetric problems and quadratic convergence for unsymmetric problems (the Jacobi-Davidson method can be derived as a Newton process). If the correction equation is solved approximately, in general cubic or quadratic convergence will not be reached [26, 36].
9.3.2 The correction equation
The correction equation
(I − uu∗ )(A − θ I )(I − uu∗ )t = −r
(9.16)
needs to be solved for t ⊥ u. Because t ∈ u⊥ , it holds that
(I − uu∗ )t = t.
(9.17)
Using this, and with α = u∗ ( A − θ I )t, the correction equation (9.16) can be rewritten as
( A − θ I )t = −r + αu.
(9.18)
Note that α depends on the unknown t and hence cannot be calculated directly. An approximation t̃ ∈ u⊥ of the correction t can be defined, using a preconditioner K ≈ A − θ I , by
K t̃ = −r + α̃u.
(9.19)
Using the orthogonality constraint u∗ t̃ = 0, the following relation for α̃ can be derived:
α̃ = (u∗ K −1 r)/(u∗ K −1 u).   (9.20)
This technique is a so-called one-step approximation. Setting α̃ = 0 results in Davidson’s method with preconditioner K , and hence t̃ will not be orthogonal to u. In its original form (9.20), this is an instance of the Jacobi-Davidson method. This form may be rather expensive, because a system with K must be solved twice.
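In MATLAB-like notation, assuming the preconditioner K is available as LU factors L and U (an illustrative assumption), the one-step approximation amounts to:

  % One-step approximation (9.19)-(9.20): two solves with K per outer iteration.
  Kr    = U \ (L \ r);                       % K^{-1} r
  Ku    = U \ (L \ u);                       % K^{-1} u
  alpha = (u' * Kr) / (u' * Ku);             % alpha~ from (9.20)
  t     = -Kr + alpha * Ku;                  % u'*t = 0 up to rounding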
Another way to solve the correction equation is to use iterative approximations. In [26], it is suggested
to solve equation (9.16) only approximately by using a small number of steps of an iterative method
like GMRES (GMRES builds an upper-Hessenberg matrix like the Arnoldi method does, and uses this Hessenberg matrix to form a low-dimensional least-squares problem; for more information, see [24]). A suitable preconditioner K ≈ A − θk I can be used to accelerate the process. Like the original matrix A − θk I , the preconditioner K has to be restricted to uk⊥ too:
K̃ = (I − uk u∗k )K (I − uk u∗k ).
(9.21)
Iterative solvers like GMRES and BiCGStab typically contain matrix-vector products. Using a Krylov solver, with starting vector t0 = 0 ∈ uk⊥, to solve the correction equation (9.16), all iteration vectors will be in that space. With left preconditioning, the computation of a vector z = K̃ −1 Ãv, with v generated by the Krylov solver (and thus an element of uk⊥) and
à = (I − uk u∗k )(A − θk I )(I − uk u∗k ),   (9.22)
is performed in two steps. First, Ãv is computed as
Ãv = (I − uk u∗k )(A − θk I )(I − uk u∗k )v
= (I − uk u∗k )y,
where y = ( A − θk I )v since u∗k v = 0. The second step is to solve z ∈ uk⊥ from
K̃ z = (I − uk u∗k )y.   (9.23)
Using z∗ uk = 0, z can be solved from
K z = y − αuk ,   (9.24)
with, again using z∗ uk = 0,
α = (u∗k K −1 y)/(u∗k K −1 uk ).   (9.25)
The vector û needs to be solved from K û = uk only once per application of the linear solver for
determining z, while the vector ŷ has to be solved from K ŷ = y every iteration. Furthermore, only
one inner product and one vector update are needed for a matrix-vector product with K̃ −1 Ã, instead of
four operations with the projector (I − uk u∗k ). Note that equation (9.23) cannot be reduced to K z = y because ker(I − uu∗ ) ≠ {0}.
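The two steps can be combined into a single operator routine that is handed to the Krylov solver. The sketch below assumes that K is given by LU factors and that uhat, the solution of K uhat = uk, has been computed once beforehand; all names are illustrative.

  function z = jd_precond_op(A, theta, u, LK, UK, uhat, v)
  % z = Ktilde^{-1} * Atilde * v, following (9.22)-(9.25).
  y     = A*v - theta*v;                     % (A - theta I) v, since u'*v = 0
  yhat  = UK \ (LK \ y);                     % K^{-1} y
  alpha = (u'*yhat) / (u'*uhat);             % (9.25)
  z     = yhat - alpha*uhat;                 % z is orthogonal to u, cf. (9.23)-(9.24)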
9.4 Jacobi-Davidson QR
9.4.1 Introduction
Whereas the Jacobi-Davidson method as described in the previous section has fast (even quadratic, if the correction equation is solved exactly) convergence to one single eigenvalue closest to the target, Arnoldi rather has slow convergence to more than one eigenvalue (and eventually all eigenvalues).
Besides the fact that restarting Jacobi-Davidson with another Ritz pair does not guarantee convergence
to a new eigenpair, it also suffers from the problem of detecting multiple eigenvalues (common to other
subspace methods). Both problems can be handled by deflation.
9.4.2 Deflation techniques
The idea of deflation is, after a Ritz value has converged to an eigenvalue, to continue the search in
the subspace spanned by the remaining eigenvectors. In [12, 36], the following partial Schur form of
A is considered:
AQ k = Q k Rk ,
(9.26)
where Q k ∈ Rn×k is orthonormal and Rk ∈ Rk×k is upper triangular, with k ≪ n. A Schur form has been chosen because of its orthogonality, so that an orthogonal reduction can be obtained. Furthermore, λ(Rk ) ⊂ λ( A) and an eigenvector x of Rk corresponds with an eigenvector Q k x of A. This partial
Schur form can be constructed in two steps, as will be described next.
First the matrix A is projected onto the orthonormal subspace spanned by v1 , v2 , . . . vi (the current
searchspace), the columns of Vi :
M = Vi∗ AVi .
(9.27)
Of this projected matrix M, a complete Schur form MU = U S, with S upper triangular, can be
computed with for instance the Q R method. The eigenvalues of M are on the diagonal of the upper
triangular matrix S, or appear in 2 × 2 blocks on the diagonal for complex-conjugated pairs. For
the target eigenvalue τ , the eigenvalues are reordered, while preserving upper triangularity, such that
|sii − τ | forms a non-decreasing row for increasing i. As a result, the first diagonal elements of S are
the eigenvalue approximations closest to the target τ , with the subspace of corresponding eigenvector
approximations spanned by the correspondingly reordered columns of Vi (eigenvector approximations
of A are in the space spanned by the columns of Vi U·i ). From the point of view of memory savings, one could choose to continue with these approximations by discarding the other columns, or one could continue with all columns. The Jacobi-Davidson method finally arrives at an approximated eigenpair (λ̃, q̃).
Secondly, the already existing partial Schur form of dimension k must be expanded with a new pair
(λ, q):
A [Q k q] = [Q k q] [ Rk s ; 0 λ ],   (9.28)
with Q ∗k q = 0. Using this and left-multiplying with (I − Q k Q ∗k ) yields
(I − Q k Q ∗k )(A − λI )(I − Q k Q ∗k )q = 0,
(9.29)
which leads to the conclusion that (λ, q) is an eigenpair of the deflated matrix
à = (I − Q k Q ∗k ) A(I − Q k Q ∗k ).
(9.30)
This pair can be computed with the Jacobi-Davidson method. In this process, for each new Schur
vector approximation ui with Ritz value θi , the correction equation now has to do with the deflated
matrix too:
(I − ui u∗i )(I − Q k Q ∗k )(A − θi I )(I − Q k Q ∗k )(I − ui u∗i )ti = −ri ,
(9.31)
where orthogonality to the current searchspace is represented by the operator (I − ui u∗i ) and the
deflation is represented by the operator (I − Q k Q ∗k ). Concerning the explicit deflation operations,
it is strongly advised in [12] to include the deflation in the correction equation, while this is not
necessary for the projection of A onto the subspace spanned by v j (computed from the correction
equation). Instead, this projection can be computed as Vi∗ AVi . This is correct because the solution of
the deflated correction equation already is orthogonal to the detected Schur vectors. Furthermore, the
computations with explicitly deflated operators become more expensive for every converged eigenpair. However, the overall process is expected (and observed [12]) to be more efficient: the deflated matrix à will be better conditioned, and hence the correction equation can be solved more easily.
A suitable restart strategy is to restart with the subspace spanned by the Ritz vectors corresponding
to the Ritz values close to the target value. Restarting with a subspace of dimension one, one would throw away much possibly valuable information, while this information is more or less preserved in a subspace of the ‘best’ Ritz vectors. Of course, the eigenvalue equation has to be deflated to prevent
convergence to already found eigenvectors.
9.4.3 Preconditioning
The correction equation (9.31) can be solved by an iterative solver like GMRES, and applying a
preconditioner goes in a similar way as with the basic Jacobi-Davidson method (see Section 9.3.2).
Assuming there is a left preconditioner K ≈ A − θ j I , the preconditioner K̃ for the deflated operator
reads as
K̃ = (I − ui u∗i )(I − Q k Q ∗k )K (I − Q k Q ∗k )(I − ui u∗i ).
(9.32)
For the remainder of this section the notation Q̃ = [Q k |ui ] will be used, resulting in
K̃ = (I − Q̃ Q̃ ∗ )K (I − Q̃ Q̃ ∗ ).
(9.33)
During the Krylov process, the vector z = K̃ −1 Ãv has to be computed, for a vector v supplied by the
Krylov solver and
à = (I − Q̃ Q̃ ∗ )(A − θ j I )(I − Q̃ Q̃ ∗ ).
(9.34)
Having started with a starting vector in the subspace orthogonal to Q̃, all iteration vectors will be in
that subspace, and thus Q̃ ∗ v = 0. This fact can be used in the first step:
Ãv = (I − Q̃ Q̃ ∗ )(A − θ j I )(I − Q̃ Q̃ ∗ )v
= (I − Q̃ Q̃ ∗ )y,
with y = ( A − θ j I )v.
Next, z ∈ Q̃ ⊥ must be solved from
K̃ z = (I − Q̃ Q̃ ∗ )y.
(9.35)
Using Q̃ ∗ z = 0 and thus (I − Q̃ Q̃ ∗ )z = z, the above equation for z can be rewritten as
K z = y − Q̃( Q̃ ∗ y − Q̃ ∗ K z)
= y − Q̃ αE .
At first sight, αE cannot be computed explicitly, because z is unknown. Because Q̃ ∗ z = 0, αE can be computed from
αE = ( Q̃ ∗ K −1 Q̃)−1 Q̃ ∗ K −1 y.   (9.36)
Finally, z can be computed from
z = K −1 y − K −1 Q̃ αE ,   (9.37)
where ŷ = K −1 y is solved from K ŷ = y and Q̂ = K −1 Q̃ is solved from K Q̂ = Q̃. (Note that (9.35) cannot be reduced to K z = y, because ker(I − Q̃ Q̃ ∗ ) ≠ {0}.)
Similar savings as with the single vector variant can be obtained: equations involving y have to be solved once every iteration of the linear solver, while Q̂ must be solved only once per application of the linear solver. The matrix-vector multiplications with the left-preconditioned operator require one operation with Q̃ ∗ and K −1 Q̃, instead of four operations with (I − Q̃ Q̃ ∗ ). As a result, for iLS iterations of the linear solver, iLS + k actions with the preconditioner are required.
The preconditioner K ≈ A − θ j I suffers from at least three difficulties:
1. The better the Ritz value θ j approximates an eigenvalue of A, the more singular becomes A −
θj I.
2. The Ritz value θ j changes every iteration of the Jacobi-Davidson process, and consequently so does A − θ j I .
3. The preconditioner is projected, which may cause undesired effects.
Recomputing K every iteration may be too expensive, while keeping K the same for too many iterations may lead to poor convergence (the correction equation cannot be solved easily). In [28], it is
argued that the preconditioner can be kept the same for a number of successive iterations. In [29],
an efficient update scheme for the preconditioner is derived, so that it is not necessary to recompute
the whole preconditioner. In Chapter 10, these issues will be discussed and illustrated with numerical
examples.
9.4.4 JDQR Algorithm
Before giving the complete JDQR algorithm, first a global outline is presented which serves as a
summary of the above theory: Given a partial Schur form AQ k = Q k Rk and a searchspace Vm ,
compute a new Schur pair (Rk+1,k+1 , qk+1 ) by
• first projecting A onto the searchspace Vm : M = Vm∗ AVm ;
• computing the Schur pair (θ, s) of M with θ closest to the target eigenvalue τ ;
• computing the Schur pair (Rk+1,k+1 , qk+1 ) = (θ, Vm s) of A;
• using an iterative linear solver (with preconditioning) to solve the correction equation
(I − s̃s̃∗ )(I − Q k+1 Q ∗k+1 )(A − θ̃ I )(I − Q k+1 Q ∗k+1 )(I − s̃s̃∗ )t = −r,
(9.38)
where (θ̃, s̃) is the first Schur pair not satisfying the tolerance and r = (I − Q k+1 Q ∗k+1 )(As̃− θ̃ s̃).
• orthogonally expanding the searchspace Vm with t.
Algorithm 9 computes the partial Schur form
AQ k = Q k Rk ,
(9.39)
with Q k ∈ Rn×k orthogonal and Rk ∈ Rk×k upper triangular. The eigenvalue approximations are on
the diagonal of Rk . The corresponding eigenvectors can be computed from Q k and Rk .
In total, kmax eigenvalues closest to the target τ are computed (the eigenvalues of a non-Hermitian matrix generally are not ordered in the complex plane). The accuracy of the computed partial Schur form is
||AQ k − Q k Rk ||2 = O(ε),   (9.40)
for a given tolerance ε.
Algorithm 9: Jacobi-Davidson QR for kmax exterior eigenvalues (JDQR).
Input: Starting vector t = v0 .
  Target τ .
  Tolerance ε (||AQ − Q R|| < ε).
  Number of eigenvalues kmax .
  Minimal searchspace m min .
  Maximal searchspace m max .
Output: kmax eigenvalues.
  Partial Schur form AQ k = Q k Rk .
k = 0, m = 0, Q = [], R = []
while k < kmax do
  {Orthogonalise t to Vm and expand:}
  for i = 1, . . . , m − 1 do
    t = t − (v∗i t)vi
  m = m + 1, vm = t/||t||2 , vmA = Avm
  V = [V, vm ], V A = [V A , vmA ]
  {Update the projection of A onto V (M = V ∗ AV ):}
  for i = 1, . . . , m − 1 do
    Mi,m = v∗i vmA
    Mm,i = v∗m viA
  Mm,m = v∗m vmA
  Directly compute the Schur decomposition M = S T S ∗ , such that |Tii − τ | ≤ |Ti+1,i+1 − τ |
  u = V S·1 , u A = V A S·1 , θ = T11 , r = u A − θu, ã = Q ∗ r, r̃ = r − Q ã
  {Test for acceptance of Schur vector approximations:}
  while ||r̃|| ≤ ε do
    R = [ R ã ; 0 θ ], Q = [Q, u], k = k + 1
    if k = kmax then
      Return Q k , Rk .
    m = m − 1
    {Continue the search for a new Ritz pair with deflated space:}
    for i = 1, . . . , m do
      vi = V S·,i+1 , viA = V A S·,i+1 , S·,i = ei
    M = lower m × m block of T
    u = v1 , θ = M11 , r = v1A − θu, ã = Q ∗ r, r̃ = r − Q ã
  {Test whether a restart is required:}
  if m ≥ m max then
    for i = 2, . . . , m min do
      vi = V S·,i , viA = V A S·,i
    M = leading m min × m min block of T
    v1 = u, v1A = u A , m = m min
  Q̃ = [Q, u]
  {Solve t ∈ Q̃ ⊥ from the deflated Correction Equation (approximately):}
  (I − Q̃ Q̃ ∗ )(A − θ I )(I − Q̃ Q̃ ∗ )t = −r̃
9.4.5 Computational notes
If deflation and restarting are not taken into account, the costs per iteration of Jacobi-Davidson QR are about 25m^3 + n^2 (4 iLS + k + 1) + 17mn + 6n flops, where iLS is the number of iterations of the linear solver, k is the dimension of the current partial Schur form and m is the dimension of the current searchspace. An action of the (dense) preconditioner is estimated to cost 2n^2 flops; the costs for constructing the preconditioner are not counted. Deflation will introduce 4m^2 n + 9mn + 2n flops extra costs (but will improve the process). Restarting costs 4m^2 n flops, but is not used every iteration. The term 25m^3 is caused by the Q R operation; it is clear that the searchspace must be kept small during the process.
The cost picture shows that JDQR is not suitable for computing all eigenvalues, unless the Q R method cannot be used because of memory restrictions: JDQR stores partial Schur forms instead of complete Schur forms. Compared with the cost per iteration of Arnoldi, which is about 2n^2 + 4mn flops for a Krylov space of dimension m, JDQR is more expensive (roughly a factor four). However, Arnoldi finishes with an application of the Q R method to the Hessenberg matrix of dimension k, which costs 10k^3 flops. Furthermore, the convergence properties of JDQR are better than those of Arnoldi, making it very difficult to make a good comparison on purely theoretical grounds. Another issue is the process of solving the correction equation. If this can be done efficiently, with the help of a preconditioner, JDQR has another advantage over Arnoldi. Numerical experiments in the following chapter will give a clearer picture. The overhead of JDQR makes the method more expensive than Arnoldi, per iteration.
9.5 Jacobi-Davidson QZ
9.5.1 Introduction
The Jacobi-Davidson QZ method (JDQZ) [12, 36] is a Jacobi-Davidson method for the generalised
eigenvalue problem
Ax = λBx.
(9.41)
It can be seen as a subspace iteration variant of the Q Z method and has the advantage that it avoids
transformation of the generalised eigenproblem to an ordinary eigenproblem by solving a linear system involving A or B, which would introduce numerical errors if A or B is ill conditioned (see Section
7.4). Besides that, JDQZ works with orthonormal transformations for stability reasons.
9.5.2 JDQZ Theory
To avoid underflow and overflow in finite precision arithmetic, instead of the generalised problem
(9.41), the equivalent problem
(β A − α B)x = 0
(9.42)
is considered. A generalised eigenvalue λ = α/β is now denoted by the pair (α, β). Instead of working with eigenvectors, generalised Schur vectors will be used (to ensure orthonormal transformations).
The partial generalised Schur form of dimension k for a matrix pair (A, B) is
AQ k = Z k RkA ,   B Q k = Z k RkB ,   (9.43)
with Q k , Z k ∈ Rn×k orthonormal and RkA , RkB ∈ Rk×k upper-triangular. With qi , the generalised
Schur vectors, as the columns of Q k and (αi , βi ) = (RiiA , RiiB ), the generalised Schur pair is denoted
by ((αi , βi ), qi ). A generalised eigenpair ((α, β), y) of (RkA , RkB ) implies a generalised eigenpair
((α, β), Q k y) of ( A, B).
From relation (9.43), it follows via
(βi Aqi − αi Bqi , zi ) = (βi Z k riA − αi Z k riB , zi ) = (βi αi − αi βi ) = 0,   (9.44)
that
βi Aqi − αi Bqi ⊥ zi .   (9.45)
So, unlike the JDQR case, where a Ritz-Galerkin condition can be used, a Petrov-Galerkin condition
is needed. An approximation u of an eigenvector is selected from the searchspace V , while the
residual η Au − ζ Bu, for a generalised Petrov pair (ζ, η), is required to be orthogonal to a so-called
test subspace span(W ):
η Au − ζ Bu ⊥ span(W ).
(9.46)
From this requirement, noting that the dimension of both V and W is m, the projected eigenproblem
(ηW ∗ AV − ζ W ∗ BV )s = 0   (9.47)
can be deduced. This eigenproblem can be reduced to generalised Schur form with the direct Q Z method (m ≪ n), which leads to orthogonal matrices S L , S R ∈ Rm×m and upper triangular matrices
T A , T B ∈ Rm×m , such that
(S L )∗ (W ∗ AV )S R = T A ,
(S L )∗ (W ∗ BV )S R = T B .
(9.48)
After reordering the decomposition (see [12]), the Petrov vector of the pair ( A, B) is defined as u =
V S·1R , with associated generalised Petrov value (ζ, η).
Concerning the testspace W , it follows from V S R ≈ Q k , W S L ≈ Z k and the relation span(Z k ) =
span(AQ k ) = span(B Q k ) (see 9.43) that it is advised to choose W such that span(W ) = span(ν0 AV +
µ0 BV ) for scalars ν0 and µ0 . The scalars ν0 and µ0 can be used to influence the convergence of the
Petrov values; to compute eigenvalues in the interior of the spectrum, the values
ν0 = 1/√(1 + |τ |^2 ),   µ0 = −τ ν0   (9.49)
are reported to be effective[12].
The Jacobi-Davidson correction equation becomes
(I − (pp∗ )/(p∗ p))(η A − ζ B)(I − uu∗ )t = −r,   (9.50)
with p = ν0 Au + µ0 Bu and r = η Au − ζ Bu. Solving this equation exactly leads to quadratic convergence [12], while using iterative subspace methods like GMRES combined with suitable preconditioners may also lead to good results. The solution t is used to form the orthonormal expansion v of V and the testspace W is orthonormally expanded with ν0 Av + µ0 Bv.
9.5.3 Deflation and restarting
Following similar steps as with the JDQR method, deflation can enhance the Jacobi-Davidson process.
In this case, the partial generalised Schur form
AQ k = Z k RkA ,
B Q k = Z k RkB ,
(9.51)
needs to be expanded with right Schur vector q and left Schur vector z. Similar to the JDQR method,
the new generalised Schur pair is an eigenpair of the deflated matrix pair
((I − Z k Z k∗ )A(I − Q k Q ∗k ),
(I − Z k Z k∗ )B(I − Q k Q ∗k )).
(9.52)
This generalised eigenproblem can be solved with the JDQZ method, where the vectors vi are orthogonal to Q k and the vectors wi are orthogonal to Z k , simplifying the projected deflated matrices M A
and M B :
M A ≡ W ∗ (I − Z k Z k∗ ) A(I − Q k Q ∗k )V = W ∗ AV
(9.53)
M B ≡ W ∗ (I − Z k Z k∗ )B(I − Q k Q ∗k )V = W ∗ BV.
Regarding the restart technique, a strategy equivalent with JDQR is used. The generalised Schur form
(9.48) is ordered with respect to target τ such that
|T11A /T11B − τ | ≤ |T22A /T22B − τ | ≤ · · · ≤ |TmmA /TmmB − τ |,   (9.54)
where m = dim(V ). The columns of S R and S L are ordered correspondingly, so that span(V S·1R , . . . , V S·iR )
contains the i best right Schur vectors of (A, B). The corresponding test subspace is given by
span(W S·1L , . . . , W S·iL ). An implicit restart can be performed by continuing the Jacobi-Davidson algorithm with
V = [V S·1R , . . . , V S·Rjmin ] and W = [W S·1L , . . . , W S·Ljmin ],
(9.55)
where jmin is the minimal dimension of the search and test subspace.
The correction equation in the JDQZ process with deflation becomes
(I − z̃i z̃∗i )(I − Z k Z k∗ )(η A − ζ B)(I − Q k Q ∗k )(I − q̃i q̃∗i )t = −r,
(9.56)
with q̃i = V S·1R , z̃i = W S·1L and r = (I − Z k Z k∗ )(η A − ζ B)(I − Q k Q ∗k )q̃i .
9.5.4 Preconditioning
The correction equation (9.50) again can be solved by an iterative solver like GMRES and applying a
preconditioner goes in a similar way as with JDQR. Already having a preconditioner K ≈ η A − ζ B,
the preconditioner K̃ for the deflated operator reads as
K̃ = (I − z̃i z̃∗i )(I − Z k Z k∗ )K (I − Q k Q ∗k )(I − q̃i q̃∗i ).
(9.57)
For the remainder of this section the notations Q̃ = [Q k |q̃i ] and Z̃ = [Z k |z̃i ] will be used, resulting
in
K̃ = (I − Z̃ Z̃ ∗ )K (I − Q̃ Q̃ ∗ ).   (9.58)
During the Krylov process, the vector z = K̃ −1 C̃v has to be computed, for a vector v supplied by the
Krylov solver and
C̃ = (I − Z̃ Z̃ ∗ )(η A − ζ B)(I − Q̃ Q̃ ∗ ).
(9.59)
The process starts with a vector in the subspace orthogonal to Q̃. Hence, all iteration vectors will be
in that subspace, and thus Q̃ ∗ v = 0. This fact can be used in the first step:
C̃v = (I − Z̃ Z̃ ∗ )(η A − ζ B)(I − Q̃ Q̃ ∗ )v
= (I − Z̃ Z̃ ∗ )y,
with y = (η A − ζ B)v.
The following step is to solve z ∈ Q̃ ⊥ from
K̃ z = (I − Z̃ Z̃ ∗ )y.
(9.60)
Using Q̃ ∗ z = 0 and thus (I − Q̃ Q̃ ∗ )z = z, the above equation for z can be rewritten as
K z = y − Z̃( Z̃ ∗ y − Z̃ ∗ K z)
= y − Z̃ αE .
Again, αE cannot be computed explicitly, because z is unknown. Because Q̃ ∗ z = 0, αE can be computed from
αE = ( Q̃ ∗ K −1 Z̃)−1 Q̃ ∗ K −1 y.   (9.61)
Finally, z can be computed from
z = K −1 y − K −1 Z̃ αE ,   (9.62)
where ŷ = K −1 y is solved from K ŷ = y and Ẑ = K −1 Z̃ is solved from K Ẑ = Z̃.
Similar savings as with the JDQR method can be obtained, with as a result that for iLS iterations of the linear solver, iLS + k actions with the preconditioner are required.
The preconditioner K ≈ η A − ζ B also suffers from at least three difficulties:
1. The better the Petrov value (ζ, η) approximates a generalised eigenvalue of ( A, B), the more
singular becomes η A − ζ B.
2. The Petrov value (ζ, η) changes every iteration of the Jacobi-Davidson process, and consequently so does η A − ζ B.
3. The preconditioner is projected, which may cause undesired effects.
Recomputing K every iteration may be too expensive, while keeping K the same for too many iterations may lead to poor convergence. In Chapter 10, these issues will be discussed and illustrated with
numerical examples.
9.5.5 Approximate solution of the deflated correction equation
Because the deflated correction equation (9.56) may be a bit confusing, the solution process may not be clear at first sight. In the previous sections, the ingredients of the solution process (i.e. computing
matrix vector products z = K̃ −1 C̃v) have been discussed. Algorithm 10 presents all the steps needed
to solve the deflated correction equation, using a preconditioner K̃ . Note that the steps correspond to
the JDQZ process, but that the steps for the JDQR process are similar (replace Z̃ by Q̃). Note also that the notations K̃ = (I − Z̃ Z̃ ∗ )K (I − Q̃ Q̃ ∗ ) and C̃ = (I − Z̃ Z̃ ∗ )(η A − ζ B)(I − Q̃ Q̃ ∗ ) do not mean that these products are computed explicitly.
Algorithm 10: Approximate solution of the deflated JDQZ correction equation (9.56), with preconditioning.
Solve Ẑ from K Ẑ = Z̃
Compute M = Q̃ ∗ Ẑ
Compute r̃ = K̃ −1 r as
  1. solve r̂ from K r̂ = r
  2. αE = M −1 Q̃ ∗ r̂
  3. r̃ = r̂ − Ẑ αE
Apply a Krylov subspace method with start t0 ∈ Q̃ ⊥ to the linear problem with operator K̃ −1 C̃ and right-hand side −r̃. Products z = K̃ −1 C̃v are computed as
  1. y = (η A − ζ B)v
  2. solve ŷ from K ŷ = y
  3. αE = M −1 Q̃ ∗ ŷ
  4. z = ŷ − Ẑ αE
Operations concerning the inversion of the matrix M = Q̃ ∗ Ẑ can be done efficiently by making an
LU -decomposition once. Products z = M −1 v can now be computed by solving a lower-triangular
system and an upper-triangular system.
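A MATLAB sketch of Algorithm 10, under the assumption that the preconditioner K is available as sparse LU factors and that GMRES is used as the Krylov solver; all function and variable names, tolerances and iteration counts are illustrative.

  function t = deflated_correction(A, B, eta, zeta, Qt, Zt, LK, UK, r, its)
  % Approximate solution of the deflated JDQZ correction equation (9.56).
  Zhat = UK \ (LK \ Zt);                     % solve K*Zhat = Ztilde (once)
  M    = Qt' * Zhat;                         % small k x k matrix
  [LM, UM] = lu(M);                          % LU of M, computed once
  rhat = UK \ (LK \ r);
  rt   = rhat - Zhat * (UM \ (LM \ (Qt'*rhat)));   % rtilde = Ktilde^{-1} r
  op   = @(v) applyKC(A, B, eta, zeta, Qt, Zhat, LK, UK, LM, UM, v);
  t    = gmres(op, -rt, [], 1e-2, its);      % start t0 = 0 lies in Qtilde-perp

  function z = applyKC(A, B, eta, zeta, Qt, Zhat, LK, UK, LM, UM, v)
  % z = Ktilde^{-1} * Ctilde * v
  y    = eta*(A*v) - zeta*(B*v);             % (eta*A - zeta*B) v
  yhat = UK \ (LK \ y);                      % K^{-1} y
  a    = UM \ (LM \ (Qt'*yhat));             % alpha = M^{-1} Qtilde' yhat
  z    = yhat - Zhat*a;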
9.5.6 JDQZ Algorithm
First a global outline of the JDQZ algorithm is presented, which serves as a summary of the above
theory: Given a generalised partial Schur form
AQ k = Z k RkA ,
B Q k = Z k RkB ,
(9.63)
a searchspace Vm and a testspace Wm , compute a new generalised Schur triple
((RAk+1,k+1 , RBk+1,k+1 ), qk+1 , zk+1 )   (9.64)
by
• first computing the projections M A = Wm∗ AVm and M B = Wm∗ BVm ;
• computing the generalised Schur triple ((ζ, η), s R , s L ) of (M A , M B ) with (ζ, η) closest to the
target generalised eigenvalue (α, β);
• computing the generalised Schur triple ((RAk+1,k+1 , RBk+1,k+1 ), qk+1 , zk+1 ) = ((ζ, η), Vm s R , Wm s L ) of (A, B);
• using a linear solver (with preconditioning) to solve the correction equation
(I − Wm s̃ L (Wm s̃ L )∗ )(I − Z k+1 Z ∗k+1 )(η̃ A − ζ̃ B)(I − Q k+1 Q ∗k+1 )(I − Vm s̃ R (Vm s̃ R )∗ )t = −r,   (9.65)
where ((ζ̃ , η̃), s̃ R , s̃ L ) is the first generalised Schur triple not satisfying the tolerance and r = (I − Z k+1 Z ∗k+1 )(η̃ A − ζ̃ B)(I − Q k+1 Q ∗k+1 )s̃.
• orthogonally expanding the searchspace Vm with t and the testspace Wm with ν0 Avm+1 +
µ0 Bvm+1 .
Algorithm 11 (initialisation) and 12 compute the partial Schur form
AQ k = Z k RkA ,
B Q k = Z k RkB ,
(9.66)
with Q k , Z k ∈ Rn×k orthogonal and RkA , RkB ∈ Rk×k upper triangular, for k = kmax . The generalised
eigenvalue approximations for (αi , βi ) are given by (RiiA , RiiB ). The corresponding (left) eigenvectors
can be computed from Q k , RkA and RkB .
In total, kmax generalised eigenvalues (αi , βi ) closest to target τ are computed. The accuracy of the computed partial generalised Schur form for a tolerance ε is
||AQ − Z R A ||2 = O(ε),   ||B Q − Z R B ||2 = O(ε).
Algorithm 11: Jacobi-Davidson QZ initialisation for kmax eigenvalues close to τ (JDQZ).
Input: Starting vector t = v0 .
  Target τ .
  Tolerance ε (||AQ − Z R A || < ε, ||B Q − Z R B || < ε).
  Number of eigenvalues kmax .
  Minimal searchspace m min .
  Maximal searchspace m max .
Output: kmax generalised eigenvalues.
  Partial generalised Schur form AQ = Z R A , B Q = Z R B .
k = 0, m = 0
ν0 = 1/√(1 + |τ |^2 ), µ0 = −τ ν0
Q = Z = R A = R B = []
Call Algorithm 12
Algorithm 12: Jacobi-Davidson QZ for kmax eigenvalues close to τ (JDQZ).
while k < kmax do
  {Orthogonalise t to Vm and expand V and W :}
  for i = 1, . . . , m − 1 do
    t = t − (v∗i t)vi
  m = m + 1, vm = t/||t||2 , vmA = Avm , vmB = Bvm , w = ν0 vmA + µ0 vmB
  V = [V, vm ], V A = [V A , vmA ], V B = [V B , vmB ]
  for i = 1, . . . , k do
    w = w − (z∗i w)zi
  for i = 1, . . . , m − 1 do
    w = w − (w∗i w)wi
  wm = w/||w||2 , W = [W, wm ]
  {Compute the expansion of the projection of A onto Vm :}
  for i = 1, . . . , m − 1 do
    MAi,m = w∗i vmA , MAm,i = w∗m viA
    MBi,m = w∗i vmB , MBm,i = w∗m viB
  MAm,m = w∗m vmA , MBm,m = w∗m vmB
  Directly compute a Q Z decomposition M A S R = S L T A , M B S R = S L T B such that
    |TiiA /TiiB − τ | ≤ |Ti+1,i+1A /Ti+1,i+1B − τ |
  u = V S·1R , p = W S·1L , u A = V A S·1R , u B = V B S·1R , ζ = T11A , η = T11B ,
  r = ηu A − ζ u B , ã = Z ∗ u A , b̃ = Z ∗ u B , r̃ = r − Z(ηã − ζ b̃)
  {Test for acceptance of Schur vector approximations:}
  while ||r̃|| ≤ ε do
    R A = [ R A ã ; 0 ζ ], R B = [ R B b̃ ; 0 η ]
    Q = [Q, u], Z = [Z, p], k = k + 1
    if k = kmax then
      Return Q, Z, R A , R B
    m = m − 1
    {Continue the search for a new Petrov pair with deflated space:}
    for i = 1, . . . , m do
      vi = V S·,i+1R , viA = V A S·,i+1R , viB = V B S·,i+1R
      wi = W S·,i+1L , S·,iR = S·,iL = ei
    M A , M B are the lower m × m blocks of T A , T B
    u = v1 , p = w1 , u A = v1A , u B = v1B , ζ = T11A , η = T11B ,
    r = ηu A − ζ u B , ã = Z ∗ u A , b̃ = Z ∗ u B , r̃ = r − Z(ηã − ζ b̃)
  {Test whether a restart is required:}
  if m ≥ m max then
    for i = 2, . . . , m min do
      vi = V S·,iR , viA = V A S·,iR , viB = V B S·,iR , wi = W S·,iL
    M A , M B are the leading m min × m min blocks of T A , T B
    v1 = u, v1A = u A , v1B = u B , w1 = p, m = m min
  Q̃ = [Q, u], Z̃ = [Z, p]
  {Solve t ∈ Q̃ ⊥ from the deflated Correction Equation (approximately):}
  (I − Z̃ Z̃ ∗ )(η A − ζ B)(I − Q̃ Q̃ ∗ )t = −r̃
9.5.7 Computational notes
The costs per iteration, without deflation and restarting, of Jacobi-Davidson QZ are about 56m^3 + 2n(nC + nG )(4 iLS + k + 1) + 30mn + 19n + 4kn flops, where iLS is the number of iterations of the linear solver, k is the dimension of the current partial generalised Schur form and m is the dimension of the current searchspace. Assuming that C and G are sparse matrices, and nC and nG denote the number of nonzero entries per row, a multiplication with C or G costs 2n nC or 2n nG flops. An action of the (sparse) preconditioner is estimated to cost 2n(nC + nG ) flops; the costs for the construction of the preconditioner are not counted. Deflation will introduce 8m^2 n + 14mn + 7n flops extra costs (but will improve the process). A restart costs an additional 8m^2 n flops. The term 56m^3 is caused by the Q Z operation; again it is clear that the searchspace must be kept small during the process.
Compared with JDQR, the JDQZ method is between 2 and 3 times more expensive (neglecting sparsity); the same relation as between the Q R method and the Q Z method. If the matrices C and G are sparse (and they are in pole-zero analysis: C has approximately 4 nonzero elements per row, G approximately 5), JDQZ will become attractive from a computational point of view if the maximal searchspace is much smaller than the dimension n of the problem (m < n^(1/3)). For a problem size of n = 300, this would imply a maximal searchspace of m = 7, which is rather small. However, the memory requirements of JDQZ are also twice the memory requirements of JDQR. The main reason for choosing the JDQZ method is the accuracy of the computed eigenvalues: it is not necessary to convert the generalised eigenvalue problem to an ordinary eigenvalue problem.
Chapter 10
Pole-zero analysis and eigenvalue
computation: numerical experiments
10.1 Introduction
This chapter will present the numerical results of eigenvalue computations on eigenproblems arising
from pole-zero analysis. The direct Q R- and Q Z method and the subspace method Arnoldi will be
covered globally, while the JDQR- and JDQZ-methods will be studied in more detail. For the latter
methods, there is an important role for the preconditioner. This role will be explained and filled after
a short summary of the pole-zero problem has been given.
10.2 Problem description
Chapter 6 describes the pole-zero problem and its numerical formulation in detail. Without loss of
generality, the problem of computing the poles will be the subject of all discussions. The problem of
computing the zeroes can be solved in the same way.
The poles of an electric circuit with corresponding conductance matrix G ∈ Rn×n and capacitance
matrix C ∈ Rn×n , are the values p ∈ C satisfying
det( pC + G) = 0,
(10.1)
where the number of unknowns is n. The matrices C and G are sparse and near-symmetric. The
corresponding generalised eigenvalue problem is
Gx = λCx,   x ≠ 0,   (10.2)
where the poles p are given by p = −λ. Because the matrices C and G are real matrices, the
eigenvalues are real or appear in complex-conjugated pairs.
The generalised eigenvalue problem can be converted to an ordinary eigenvalue problem if G and/or C
are in a way invertible. Because the matrix G is most likely to be invertible, the following formulation
of the ordinary eigenproblem will be used:
G −1 Cx = µx,   x ≠ 0,   (10.3)
where the poles p are given by p = −1/µ. Of course, the sparse matrix G will not be inverted
directly. Besides the additional computational costs of the conversion, issues concerning stability and accuracy may also arise.
The corresponding eigenvectors are not of direct importance for pole-zero analysis. Nevertheless,
eigenvectors associated with positive poles (i.e. negative eigenvalues), may be applied in the field of
PSS analysis, as will be described in Section 10.8.
As is the case with almost all numerical problems, a trade-off between computational costs, accuracy and robustness must be made. From the list of requirements concerning the accuracy given in Chapter 7, probably the most important one is that poles in the right half-plane should never be missed. Concerning the computational costs, it should be noted that the dimension varies from O(10) to O(10^4). Poles (and zeroes) not much larger than a given maximum frequency are needed. The quality of the computed poles is tested by the circuit designer on the basis of experience: for example by studying the Bode-plot or the DC-gain. Last, but not least, the method must have a certain robustness, so that it is applicable to a wide range of problems, without decreasing the quality. Not discarding all other points made in this report, the numerical results will be discussed in the light of these requirements.
The generalised eigenvalue formulation (10.2) clearly is the most natural translation of the pole-zero
problem. However, after having discussed the role of the preconditioner, first the numerical experiments for the ordinary eigenvalue problem (10.3) will be presented.
10.3 Preconditioning
10.3.1 Left-, right- and split-preconditioning
Preconditioning is widely used in solving linear systems of the form Ax = b with iterative Krylov
solvers, but less known in the field of eigenproblems. The goal of preconditioning with a preconditioner K is to improve the spectral properties of the matrix A, and thus decreasing the condition number of the matrix K −1 A. As a result, the Krylov process should converge much faster.
There are three types of preconditioning, namely left-preconditioning, right-preconditioning and split-preconditioning.
Left-preconditioning uses a preconditioner K ≈ A to form the better-conditioned system K −1 Ax =
K −1 b. Note that the solution of this system is the same as the solution x of the original system.
However, the computed residuals are not the same.
Right-preconditioning considers the system AK −1 y = b, where y = K x. In this case the computed
residuals are equal to the residuals of the original system, but the solution x must be solved from
K x = y.
Split-preconditioning uses a preconditioner K = K 1 K 2 to yield the system K 1−1 AK 2−1 y = K 1−1 b,
with K 2 x = y.
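As a small illustration of the difference, assuming A, b and LU factors L and U of a preconditioner K are given (illustrative names only), left- and right-preconditioning with MATLAB's gmres look like:

  % Left-preconditioning: solve (K^{-1}A)x = K^{-1}b; residuals are preconditioned residuals.
  Aleft = @(x) U \ (L \ (A*x));
  bleft = U \ (L \ b);
  x1 = gmres(Aleft, bleft, [], 1e-10, 50);
  % Right-preconditioning: solve (A K^{-1})y = b, then K x = y; residuals are true residuals.
  Aright = @(y) A * (U \ (L \ y));
  y  = gmres(Aright, b, [], 1e-10, 50);
  x2 = U \ (L \ y);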
Focusing on eigenproblems, preconditioning appears in a slightly different way. An example of preconditioning is Shift-and-Invert, discussed in Chapter 7 and [23, 36], where the term preconditioning
is appropriate because of the better spectral properties of the converted system. Another example is
polynomial preconditioning, where the matrix A is replaced by a matrix B = pk ( A) for a polynomial
pk . This type of preconditioning will not be discussed in this report; Shift-and-Invert techniques are
especially useful for finding interior eigenvalues. Because all eigenvalues are wanted, and there is
no good target for the interior eigenvalues, Shift-and-Invert techniques will not be discussed in detail. Nevertheless, numerical experiments confirm that this technique is very efficient if a target is
available, as is also reported in [4].
For Jacobi-Davidson style methods, the preconditioning techniques for linear systems can be used
to solve the correction equation efficiently. Furthermore, Shift-and-Invert techniques can be used to
direct the convergence towards the desired eigenvalues.
10.3.2 Requirements
Based on various literature [14, 28, 23] and own experiences, there are several requirements which a
preconditioner K should satisfy:
• The matrix of the preconditioned system should have better spectral properties.
• Systems of the form K z = r should be solved easily.
• The preconditioner should be constructed efficiently in both time and memory.
• The preconditioner should resemble the structure of the original matrix in a certain way.
Preconditioners which are used in Jacobi-Davidson style methods to solve the correction equation suffer from at least three difficulties, as has been argued in Sections 9.4.3 and 9.5.4:
1. The preconditioner should approximate a nearly singular matrix.
2. The original system changes every iteration, so the preconditioner should be updated.
3. The preconditioner approximates a matrix which is projected afterwards.
Approximating a nearly singular matrix may give accuracy problems with for instance incomplete
LU -decompositions. Projecting the preconditioner does not always lead to a good approximation for
the projected matrix [28], and computing a preconditioner for the projected matrix is in general no
option because the original matrix is sparse and the projected matrix is dense. This last argument
especially holds for JDQZ and in a less strong sense for JDQR, because the matrix G −1 C already is
dense.
Having a possible implementation in Pstar in mind, the preconditioner should also fit into the hierarchy and the memory. In other words, one should remember that direct access of matrix elements is
expensive, because of the way the matrices are stored in Pstar: one has to traverse through a linked list of sub-matrices to obtain specific elements.
In the following subsections some proposals for preconditioners (to be used with the Jacobi-Davidson
methods) are discussed.
10.3.3 Diagonal preconditioning
Diagonal preconditioning is the cheapest form of preconditioning and is only used for strongly diagonally dominant systems. The associated preconditioner K is defined by
Ki j = Ai j  if i = j,   and   Ki j = 0  if i ≠ j.   (10.4)
Not much is to be expected from this preconditioner in pole-zero analysis, because the problems are
not strongly diagonally dominant. Nevertheless, numerical experiments will be used to check this.
10.3.4 ILUT preconditioning
Many preconditioning techniques are based on incomplete LU decomposition, of which a variant of
ILUT [17, 24] will be discussed here. The T in ILUT originates from Tolerance. The idea behind
the tolerance (or drop-tolerance) is to control the fill-in and the computational costs of the incomplete
LU decomposition of a (sparse) matrix A. To that end, off-diagonal elements of the current columns
of L and U which are smaller than the tolerance t times the norm of the corresponding column of the
original matrix A (the local drop-tolerance), are replaced by zero. Algorithm 13 gives a description
of the ILUT algorithm.
Algorithm 13: The algorithm for computing the incomplete LU decomposition with tolerance t.
for i = 1 to n do
  for j = 1 to i − 1 do
    if Ai j ≠ 0 then
      L i j = Ai j /A j j
      if |L i j | < t ||A· j || then
        L i j = 0
      if L i j ≠ 0 then
        for k = j + 1 to n do
          Aik = Aik − L i j U j k
  for j = i to n do
    if |Ai j | < t ||A· j || then
      Ui j = 0
    else
      Ui j = Ai j
Note that the algorithm is simplified by discarding pivoting. The experiments however make use of
the MATLAB function luinc which does pivoting. The important difference with Saad’s ILUT is
the absence of a parameter which controls the fill-in directly. Furthermore, if a zero appears on the
diagonal of U , this is replaced by the local drop-tolerance.
Increasing the drop-tolerance t will lead to cheaper, sparser and less accurate ILU decompositions. Decreasing the drop-tolerance increases the costs and the accuracy. A tolerance t = 0 corresponds to the standard LU decomposition. Tolerances smaller than 1 · 10^−6 are not interesting as preconditioner because of the computational costs.
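As an illustration of this trade-off, the fill-in and the accuracy of the incomplete factors can be inspected for a few drop-tolerances with the MATLAB function luinc used in the experiments; the loop below and the printed quantities are illustrative, not part of the test framework.

  for t = [1e-2 1e-4 1e-6]
    [L, U] = luinc(G, t);                    % incomplete LU with drop-tolerance t
    fillin = (nnz(L) + nnz(U)) / nnz(G);     % fill-in relative to G
    err    = norm(L*U - G, 'fro') / norm(G, 'fro');
    fprintf('t = %g: fill-in %.1f, relative error %.2e\n', t, fillin, err);
  end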
The ILUT preconditioner will be tested for a range of tolerance values. With an eye on the requirements of the preconditioner applied to the Jacobi-Davidson methods, it is interesting to watch the
costs, the number of actual ILUT-decompositions and the fill-in of the preconditioner during the
Jacobi-Davidson process. These points will be discussed in the respective sections.
10.4 Test problems
All test problems are chosen from the Pstar test set, a collection of pole-zero problems which are either
numerically difficult, give rise to computational problems or have a high dimension. In [4] a large part of these test problems is covered and the interesting test problems are identified. The 12 test problems
are of varying dimensions and lead (or have led) to difficulties for the current Pstar implementation of
the Arnoldi algorithm, like producing non-existing right half-plane poles or missing poles, computing
wrong poles and zeroes, and accuracy and robustness problems (which lead to wrong Bode-plots). A
typical structure of the matrices G and C associated with an electrical circuit is shown in Figure 10.1.
The matrices are nearly symmetric and very sparse: an average number of four nonzero elements per
row. The relatively dense parts are located in a small part of the matrix (this is not necessarily true for
all problems).
Figure 10.1: The non-zero pattern of the matrices G (656 non-zeroes) and C (244 non-zeroes) associated with test problem pz_28.
In the following sections these test problems will be studied and special attention will be paid to the
diagnosis already known beforehand.
10.5 Test environment
All experiments are executed in Matlab version 5.3.1 on a HP-UX 9000/800 system (the system is not
of real importance, as cpu-time characteristics will seldom be given).
The Matlab data is created by Pstar version 4.1 HPUX10, by setting some additional debug parameters1 . Pstar is an analogue circuit simulator, written in C. The data provided by Pstar is listed in Table
10.1. The data reflects the situation just before the actual eigenvalue computation process is started
(i.e. Arnoldi or the Q R method) and is printed with 15 decimals.
1 LIN ITER DEBUG=50 and STOP AFTER FLAT=1.
Identifier   Description
G            The conductance matrix (scaled).
C            The capacitance matrix (scaled).
GC           The product G −1 C, via LU -decomposition.
GC_red       The product G −1 C, where the rows and columns containing only zeroes are removed.
maxC         The scale factor maxi, j |Ci j |, with which C is scaled.
rowScale     The row-scale factors with which the rows of G and C are scaled (for row i, this factor is max j (|Ci j |, |G i j |)).
Table 10.1: Numerical data obtained from Pstar.
The notation G or cond(G) will be used to refer to variables, functions and results in Matlab, while
the ordinary G and C will be used in numerical analysis.
10.6 The ordinary eigenvalue problem
10.6.1 Strategy
The ordinary eigenvalue problem associated with a pole-zero analysis of a circuit is
G −1 Cx = λx,   x ≠ 0.   (10.5)
The matrix G is not inverted explicitly, but is decomposed as G = LU, so that products z = G^{-1}y can be computed by solving Lw = y followed by solving z from Uz = w. As has been argued in Section 7.4, this action introduces a possible loss of log(cond(G)) digits.
In Pstar, the following step is to calculate the eigenvalues of G −1 C. If the eigenvalues are computed
with the direct Q R method, the matrix G −1 C is computed explicitly from the LU factors. If the
eigenvalues are computed with Arnoldi iterations, matrix-vector products G −1 Cy are computed by
solving the systems with U and L. In the experiments presented here, however, all computations will
be done with matrices which are available explicitly. In practice this means that the product G −1 C is
computed by using the LU -factorisation computed by Matlab.
The poles p_i of the problem are computed as p_i = −1/(λ_i · max_{i,j} |C_ij|) for λ_i ≠ 0. These values can be compared with the values computed by Pstar. The quality of the eigenvalues and poles can be tested by computing the determinants det(G^{-1}C − λ_i I) and det(p_i C + G), and, with the eigenvectors of G^{-1}C as the columns of V and D a diagonal matrix with the corresponding eigenvalues, the norm ||G^{-1}C V − V D||_2, which must be zero in exact arithmetic. Another way to compare the results is to make a Bode-plot, for which the zeroes are needed as well.
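These checks translate directly into a few lines of Matlab. The sketch below is illustrative only; it assumes the variables G, C and maxC of Table 10.1 and a problem small enough to treat G^{-1}C as a full matrix.

    % Minimal sketch (assumed variables: G, C, maxC from Table 10.1).
    GC     = G \ C;                       % G^{-1}C via the LU-factorisation of mldivide
    [V, D] = eig(full(GC));               % Q R method on the ordinary eigenproblem
    lambda = diag(D);
    p      = -1 ./ (lambda(lambda ~= 0) * maxC);   % poles for the nonzero eigenvalues
    res    = norm(GC*V - V*D);            % zero in exact arithmetic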
10.6.2 Numerical results
Numerical properties of G and C
Before the focus is moved to the eigenvalue methods, some numerical properties of the test problems
are presented. Table 10.2 and Table 10.3 show the dimension of the problem, the condition number
Test problem    size(G)   Cond(G)    nnz(G)   nnz(C)
pz_09           504       8.1e+05    2166     1307
pz_20           19        3.0e+06    45       26
pz_28           177       7.2e+04    656      244
pz_31           30        4.5e+06    76       44
pz_32_lf_a      10        1.1e+01    19       13
pz_35_osc       10        2.2e+05    30       29
pz_36_osc       96        4.6e+07    326      246
pz_38           7         4.4e+03    18       5
pz_sh3_crt      28        1.2e+04    113      131
ER_9883         12        4.5e+04    33       10
ER_11908        8         2.8e+03    21       7
jr_1            815       1.8e+010   3870     2152

Table 10.2: Numerical properties of the matrices G and C associated with the test problems. Condition numbers larger than 10^{14} indicate singularity (and are in fact infinite).
of G, the number of non-zeroes in G, C and G\C, the difference between G\C (G −1 C computed by
LU -decomposition in Matlab) and GC (G −1 C computed by Pstar), and the dimension and condition
number of the reduced product G −1 C. The matrix C in general has an infinite condition number,
because of the rows and columns which are equal to zero (resulting in a minimal singular value (and
absolute eigenvalue) of zero).
The values give no indication of trouble beforehand, except for problem pz_31. For the other problems, the differences between the product G^{-1}C computed with Matlab and with Pstar can be explained by the condition number of G and its consequences. Furthermore, reducing the problem greatly decreases the dimension of the problem and hence the computational costs. Moreover, it improves the condition number of G^{-1}C, as well as the condition of the eigenvalues, because Jordan blocks may be removed.
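The reduction itself is inexpensive. A minimal sketch is given below; it assumes that an index is removed only when both its row and its column of G^{-1}C are identically zero, which is how GC_red of Table 10.1 is described.

    % Minimal sketch (assumption: drop index i only if row i and column i of
    % G^{-1}C are both identically zero, as for GC_red in Table 10.1).
    GC    = G \ C;
    keep  = any(GC, 2) | any(GC, 1)';     % keep indices with a nonzero row or column
    GCred = GC(keep, keep);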
Concerning problem pz_31, a possible cause may be a difference between the LU processes in Matlab and Pstar. In any case, this relatively large difference will influence the eigenvalues.
The Q R method
The Matlab function eig implements the Q R method and is used for the experiments. Because no accuracy and robustness problems are reported for the Q R implementation in Pstar, the results with eig(U\L\C) are not expected to differ much from the results of Pstar. Indeed, numerical experiments show that the poles only differ in the digits which were already uncertain due to the inversion of G. The results for problem pz_31, with the rather large difference between the product G^{-1}C computed by Matlab and by Pstar, agree only up to the fourth digit; this can be explained by the difference between U\L\C and GC.
To make a good comparison with other methods, it is necessary to mention the computational costs and the accuracy of the Q R method. The computational costs are measured in flops, the number of floating point operations (as reported by Matlab). The computation of the LU-decomposition of G is included in the costs. An indication of the accuracy is obtained by computing the determinants det(G^{-1}C − λ_i I) and det(p_i C + G), and by determining the matrix norm ||G^{-1}C V − V D||_2.
Test problem    nnz(G\C)   norm(G\C-GC)   size(GC_red)   cond(GC_red)
pz_09           177755     3.565e-09      365            7.3e+21
pz_20           183        8.7064e-08     13             6.2e+24
pz_28           11100      1.0738e-10     74             2.7e+16
pz_31           220        7.5696e-05     12             Inf
pz_32_lf_a      24         1.1102e-16     7              4.4e+24
pz_35_osc       67         6.5276e-15     8              4.2e+28
pz_36_osc       4920       2.0714e-10     88             2.2e+25
pz_38           30         1.3662e-18     5              8e+06
pz_sh3_crt      379        1.6072e-11     20             1.5e+25
ER_9883         110        0              10             1.8e+02
ER_11908        31         1.6078e-13     5              2.8e+17
jr_1            508843     2.3752e-08     681            4.0e+061

Table 10.3: Numerical properties of the matrices G and C associated with the test problems.
Test problem    flops     max_i |det(G^{-1}C - λ_i I)|   max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09           4.6e+8    Inf                            Inf                      2.3e-10
pz_20           3.7e+4    Inf                            Inf                      2e-11
pz_28           9.3e+6    Inf                            Inf                      8e-12
pz_31           5.5e+4    1.4e+236                       1.4e+236                 1.2e-05
pz_32_lf_a      5.7e+3    7.5e-38                        3.4e+81                  7.5e-18
pz_35_osc       6.6e+3    2.4e-48                        3.8e+20                  3.8e-18
pz_36_osc       5.8e+6    Inf                            Inf                      1.6e-11
pz_38           2.5e+3    3.3e+05                        1e-09                    8.9e-11
pz_sh3_crt      1.3e+5    0                              1.9e+41                  3.3e-07
ER_9883         1.8e+4    5.2e+24                        2.5e-43                  4.5e-11
ER_11908        2.9e+3    6.4e+08                        0.0016                   9.3e-13
jr_1            -         Inf                            Inf                      4.1e-10

Table 10.4: Computational costs and accuracy of the Q R method.
The results are listed in Table 10.4. The results for the reduced problems are listed in Table 10.5.
The number of flops for problem jr_1 is not available: due to its high dimension, the experiments with jr_1 have been done in Matlab 6.12, which does not support flop counts. The determinants det(p_i C + G) and det(G^{-1}C − λ_i I) do not seem to be a good measure of the accuracy, or, formulated better, they give extremely high values although the computed eigenvalues are accurate. This is caused by the behaviour of the determinants, which can be very sensitive near poles or eigenvalues, resulting in extremely large values there. An example is given in Figure 10.2, where det(G^{-1}C − sI) is plotted for test problem ER_11908, near the eigenvalue λ ≈ 9.999 · 10^2. The accuracy of the Q R method depends on the norm of G^{-1}C: Q^T(G^{-1}C + E)Q = R, with ||E||_2 ≈ u ||G^{-1}C||_2, where u denotes the machine precision [14]. The computational costs are near the theoretical approximation of 13n^3 for the whole process, including the LU-decomposition and the Q R method (computing only the eigenvalues). The costs are a little lower because the costs of the LU-decomposition and the computation of G^{-1}C are lower, as the matrices G and C are sparse. The difference in computational costs between the unreduced and reduced problems depends on the reduction. Although this reduction is in some cases about 50%, the computational costs do not decrease that much (the spectral properties do not change). Nevertheless, the condition of the problem is improved and the number of Jordan blocks is decreased.
Test problem    flops     max_i |det(G^{-1}C - λ_i I)|   max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09           3.9e+8    Inf                            Inf                      2.6e-07
pz_20           2.9e+4    Inf                            Inf                      3.6e-11
pz_28           6.2e+6    1.4e+236                       Inf                      1.3e-11
pz_31           4.3e+4    1.4e+236                       1.4e+236                 1.9e-06
pz_32_lf_a      4.6e+3    1.2e-32                        4.9e+79                  1.1e-17
pz_35_osc       8.1e+3    9.5e-42                        9.9e+21                  6.4e-19
pz_36_osc       5.6e+6    Inf                            Inf                      2.4e-11
pz_38           2.4e+3    3.2e-25                        1e-11                    8e-11
pz_sh3_crt      1.2e+5    0                              1.7e+38                  1.8e-08
ER_9883         5.9e+3    1.1e+15                        1.5e-42                  1.5e-11
ER_11908        3.2e+3    4.4e-05                        5.5e+02                  7.4e-13
jr_1            -         Inf                            Inf                      4.1e-10

Table 10.5: Computational costs and accuracy of the Q R method, applied to the reduced problems.
Figure 10.2: The log of the absolute determinant det(G^{-1}C − sI) plotted as a function of s.
eigs parameter   Description                                                                                              Values (actual)
K                The number of eigenvalues desired.                                                                       (5)
SIGMA            Location of wanted eigenvalues.                                                                          (Largest Magnitude), Smallest Magnitude, Largest Real part, Smallest Real part, Both ends, or a target value.
OPTIONS.tol      Convergence tolerance: ||AV - VD||_1 <= tol*||A||_1.                                                     (1e-8)
OPTIONS.p        Dimension of the Arnoldi basis, i.e. the maximum dimension of the upper-Hessenberg matrix before restarting.   (2*K)
OPTIONS.maxit    The maximum number of outer iterations.                                                                  (300)
Table 10.6: Arnoldi parameters with description and values (with actual value between parentheses)
used for the experiments.
Arnoldi iterations
In Matlab, the function eigs implements the Arnoldi method. More specifically, it uses Arnoldi iterations with restarting, in contrast to the implementation in Pstar (version 4.1), which does not use a restarting strategy. A restarting strategy has two possible advantages, namely a reduction of the memory usage and a reduction of the computational costs. Both reductions concern the upper Hessenberg matrix which is constructed during the Arnoldi process. If a restarting strategy is used, the dimension of this Hessenberg matrix is bounded by a maximum which is in general smaller than the number of wanted eigenvalues. As a result, the Q R method, which is used to compute the eigenvalues of the upper Hessenberg matrix, can be applied to a smaller problem (but multiple times).
In Chapter 7, the practical use of Arnoldi iterations to compute all eigenvalues has been discussed. Arnoldi iterations may be preferred over direct methods to save memory; however, a reduction of the computational costs is not expected if all eigenvalues are needed. This statement will be tested on the test problems. Another issue is the robustness of the Arnoldi method. For some problems from the test set, the Pstar implementation of Arnoldi gives wrong results, varying from computing positive poles while there are none to not computing any (correct) poles.
Table 10.6 shows the parameters which are used for the experiments with Arnoldi iterations. The starting vector is a random vector with elements between −0.5 and 0.5. It is noted that one outer iteration constructs an Arnoldi basis of dimension 2*K. Of more interest is the total number of matrix-vector multiplications, but this number is not reported explicitly. The product of the total number of outer iterations and the Arnoldi basis dimension is a rough estimate for the number of multiplications. Nevertheless, a simple counter is added to the Matlab eigs code to obtain the exact number. The functional call is eigs(U\L\C, opt, n).
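For reference, a call with the settings of Table 10.6 looks roughly as follows in present-day eigs syntax; the thesis itself uses the Matlab 5.3 convention eigs(U\L\C, opt, n), so the sketch below is indicative only and k denotes the number of wanted eigenvalues.

    % Minimal sketch (assumptions: L, U from an LU-factorisation of G, sparse C,
    % k the number of wanted eigenvalues).
    opt.tol   = 1e-8;                   % convergence tolerance (cf. Table 10.6)
    opt.p     = 2*k;                    % Arnoldi basis dimension before restarting
    opt.maxit = 300;                    % maximum number of outer iterations
    A = U \ (L \ C);                    % explicit G^{-1}C, as in the experiments
    [V, D] = eigs(A, k, 'LM', opt);     % k eigenvalues of largest magnitude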
Test problem    #MV   flops       max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09           372   1.66e+10    Inf                      5.5e-05
pz_20           14    3.73e+05    1.6e+76                  5.1e-10
pz_28           75    6.49e+07    Inf                      3.8e-07
pz_31           12    2.22e+05    2e+38                    0.00065
pz_38           5     1.57e+04    3.4e+25                  4.2e-10
pz_32_lf_a      7     4.86e+04    6.2e+68                  1.9e-15
pz_35_osc       8     7.59e+04    Inf                      5.7e-15
pz_36_osc       94    2.31e+08    Inf                      2e-06
pz_sh3_crt      20    3.644e+05   3.3e+18                  2.6e-12
ER_9883         10    6.76e+04    4.4e-42                  2.5e-11
ER_11908        5     1.67e+04    1.6e-21                  1.7e-12
jr_1            684   -           Inf                      1.5e-4

Table 10.8: Computational costs and accuracy of the Arnoldi method, applied to the reduced problems.
Table 10.7 shows the results of using Arnoldi iterations to compute all eigenvalues. Table 10.8 shows the results for the reduced problems. An extra column, #MV, is added to list the number of matrix-vector multiplications G^{-1}Cx, which is of interest when Arnoldi is compared to other iterative subspace methods, especially Jacobi-Davidson methods.
Test problem    #MV   flops       max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09           509   3.06e+10    Inf                      0.00016
pz_20           22    1.67e+06    2e+86                    0.00026
pz_28           179   7.71e+08    Inf                      3.3e-07
pz_31           30    2.34e+06    2.7e+44                  0.0011
pz_38           7     3.28e+04    5.3e+71                  1.3e-09
pz_32_lf_a      10    1.15e+05    1.3e+65                  1.7e-10
pz_35_osc       10    1.29e+05    2.9e+53                  1.6e-15
pz_36_osc       101   2.61e+08    Inf                      4.7e-06
pz_sh3_crt      28    1.77e+06    2.1e+21                  2.8e-12
ER_9883         12    1.60e+05    1.2e-42                  7.4e-11
ER_11908        8     4.13e+04    2.5e+56                  2.5e-07
jr_1            817   -           Inf                      5.5e-4

Table 10.7: Computational costs and accuracy of the Arnoldi method.
Compared to the results of the Q R method, the expected increase of the computational costs is confirmed by the experiments. The accuracy of the eigenvalues and eigenvectors is not below the tolerance of 1e-8 for all problems. The most disappointing cases are the problems pz_09, pz_20 and pz_31. These exceptions are most likely caused by the fact that, in finite arithmetic, the Ritz estimates used to estimate the convergence become poor error estimates as the process converges (and the Ritz estimates become small). Although this is tackled by using the more expensive error ||AV − V D||_2 when the estimates become too small, the effects cannot be neglected. If the problems are reduced, the accuracy increases. Only problem pz_31 still has very poor accuracy. The difference in computational costs between the unreduced and reduced problem is more pronounced than for the Q R method. Apparently, the gain is made in the dominant matrix-vector multiplications.
For certain test problems, it is known that the Pstar implementation of Arnoldi experiences robustness problems. For example, problem pz_35_osc has four poles, namely p_1 ≈ −2.871 · 10^4, p_2 ≈ −1.698 · 10^6, p_3 ≈ 6.114 · 10^2 − 1.601 · 10^7 i and p_4 ≈ 6.114 · 10^2 + 1.601 · 10^7 i. In Pstar, these poles are found using the Q R method. However, the Arnoldi method only finds two negative poles and a third positive pole p̃ ≈ 1.732 · 10^6, which is not a pole at all. Using the Matlab implementation of the Arnoldi method does lead to the correct poles, so the reason for the problems of Pstar is likely to be in the (numerical) software, and not in the numerical properties of the specific test problems.
A first suspect is the Gram-Schmidt process, used to orthogonalise the basis constructed by the Arnoldi process. In Pstar, the ordinary Gram-Schmidt method is used, with a number of iterative refinement steps. It is known that modified Gram-Schmidt with one additional orthogonalisation is sufficient to account for cancellation effects [24]. A study of the eigs code reveals that this implementation also uses ordinary Gram-Schmidt, with only one additional orthogonalisation. In practice it is sufficient to use ordinary Gram-Schmidt with one iterative refinement (every iteration), although there are a few theoretical counterexamples. This implementation does lead to correct results for the test problems, so the Gram-Schmidt implementation does not seem to be the cause of the problems.
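For completeness, a minimal sketch of ordinary (classical) Gram-Schmidt with one refinement step, as observed in the eigs code, is given below; V is assumed to have orthonormal columns and w is the new basis vector. Note that the norm is tested only after the refinement step, which is precisely the point raised below for the Pstar parameter PZ_CRIT_ARNOLDI.

    % Minimal sketch: classical Gram-Schmidt with one reorthogonalisation.
    % Assumptions: V (n x j) has orthonormal columns, w is the new vector,
    % invtol is a (small) threshold for detecting an invariant subspace.
    h  = V' * w;            % first orthogonalisation
    w  = w - V * h;
    h2 = V' * w;            % one refinement step against cancellation
    w  = w - V * h2;
    h  = h + h2;            % accumulated projection coefficients
    if norm(w) > invtol
        vnew = w / norm(w); % next orthonormal basis vector
    end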
Some further digging in the source code reveals that the private parameter PZ_CRIT_ARNOLDI is used to test whether an invariant subspace is reached. The standard value of this parameter is 1e-15, which is dangerous for two reasons. First, the norm of the new basis vector is compared with this parameter before cancellation effects are compensated by iterative refinement. Secondly, it is too large, as it is not meant to account for the machine precision, but for the detection of an invariant subspace. Setting this parameter to a value of 1e-30 does solve the problem: Pstar is able to compute the correct poles with the Arnoldi method. From a numerical point of view, however, it is advised to use modified Gram-Schmidt instead of ordinary Gram-Schmidt.
There are indications that there are more errors in the Pstar implementation. Pstar cannot compute
the correct poles of problem pz_09, while the Arnoldi algorithm of Matlab does compute the correct
poles.
An advantage of the Arnoldi method over the Q R method is that the eigenvectors are computed together with the eigenvalues. However, the application of eigenvectors in pole-zero analysis is restricted to the field of PSS analysis. Another advantage, of more practical value to the end-user, is the availability of intermediate results of the Arnoldi method: answers may become available during the process.
The JDQR-method
The JDQR method has been described in Chapter 9. For the experiments, the Matlab code of Gerard Sleijpen (dated 1998), based on the algorithms in [1], has been used. The most important property of Jacobi-Davidson style methods to keep in mind during the experiments is their fast convergence to the eigenvalues near a specified target.
JDQR parameter      Description                                                           Values (actual)
K                   The number of eigenvalues desired.                                    all
SIGMA               Location of wanted eigenvalues.                                       (Largest Magnitude), Smallest Magnitude, Largest Real part, Smallest Real part, Both ends, or a target value
OPTIONS.Tol         Convergence tolerance: ||AQ - QR||_2 <= Tol.                          (1E-8)
OPTIONS.jmin        Minimum dimension searchspace.                                        min(10,n)
OPTIONS.jmax        Maximum dimension searchspace.                                        2*min(10,n)
OPTIONS.MaxIt       Maximum number of JD iterations.                                      (100)
OPTIONS.v0          Starting space.                                                       (ones + 0.1*rand)
OPTIONS.Schur       Gives Schur decomposition.                                            ('no'), 'yes'
OPTIONS.TestSpace   The type of testspace.                                                Harmonic, ('Standard')
OPTIONS.LSolver     The linear solver for the correction equation.                        MINRES, GMRES, CG, SymmLQ, BiCGStab, exact
OPTIONS.LS_Tol      Residual reduction of the linear solver.                              (0.7)
OPTIONS.LS_MaxIt    Maximum number of iterations of the linear solver.                    (5)
OPTIONS.Precond     Preconditioner for solving the correction equation (user defined).    (None)
Table 10.9: JDQR parameters with description and values (with actual value between parentheses)
used for the experiments.
This means that Jacobi-Davidson style methods may not be that suitable for finding all eigenvalues, unless a range of targets is available. In pole-zero analysis, there is in general no information about the poles, or only about the dominant poles. Nevertheless, the JDQR method will be used to compute all eigenvalues.
The JDQR method has a range of parameters which influence the convergence. From a practical point of view, it is desirable that many of these parameters have default values which work well for many pole-zero problems. Table 10.9 shows all parameters used, together with a description and the values actually taken for the experiments. If other parameter values are used for certain experiments, this will be mentioned with the experiments concerned. The convergence tolerance for the norm
of the Schur pairs, ||AQ − QS||, is 1 · 10^{-8}. Furthermore, the parameters jmin and jmax control the searchspace dimension: the process is restarted when the searchspace reaches dimension jmax and is thereby reduced to a searchspace of dimension jmin, keeping the subspace spanned by the jmin best Ritz vectors (i.e. the Ritz vectors with corresponding Ritz values closest to the target).
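In Matlab terms, the settings of Table 10.9 amount to filling an options structure and passing it to the jdqr routine of [1]. The sketch below is indicative only: the field names follow Table 10.9, but the exact calling sequence of the code is an assumption here.

    % Minimal sketch (assumptions: A = G^{-1}C available as a matrix, n = size(A,1);
    % the call signature of jdqr is assumed, the option fields follow Table 10.9).
    opts.Tol       = 1e-8;                        % tolerance on the Schur pair residual
    opts.jmin      = min(10, n);                  % minimum searchspace dimension
    opts.jmax      = 2*min(10, n);                % restart at this dimension
    opts.MaxIt     = 100;                         % maximum number of JD iterations
    opts.v0        = ones(n,1) + 0.1*rand(n,1);   % starting space
    opts.TestSpace = 'Standard';
    opts.LSolver   = 'gmres';                     % solver for the correction equation
    opts.LS_Tol    = 0.7;                         % residual reduction per call
    opts.LS_MaxIt  = 5;                           % at most 5 GMRES iterations per call
    [X, Lambda] = jdqr(A, n, 'LM', opts);         % all n eigenvalues (assumed signature)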
The choice of the minimal searchspace is rather tricky. The initialisation phase consists of the Arnoldi
algorithm, which takes as dimension of its searchspace the minimal search-space dimension. It is not
unlikely that some eigenvalues are detected during this phase. However, besides the fact that these eigenvalues may not be near the target (which is not important for the moment, since all eigenvalues are desired), the number of eigenvalues to be found by the JDQR process itself is reduced. For a clean study of the convergence of the JDQR process, this can be somewhat distorting.
Finally, there is a range of methods to solve the correction equation. If this correction equation is
solved exactly, quadratic convergence is expected. If an iterative linear solver is used, with or
without preconditioning, the convergence speed depends on the tolerance LS_Tol and the maximum
number of linear solver iterations LS_MaxIt. The behaviour of the JDQR-method will be studied by
using an exact solver (Gaussian elimination), the GMRES method without preconditioning and the
GMRES method with the preconditioners discussed in Section 10.3.
First of all, the results of the various strategies will be presented. After that, the JDQR results of test
problem pz_28 will be analysed in depth, where topics like the convergence behaviour and the role
of the preconditioner will be discussed.
Table 10.10 and Table 10.11 list the results for the JDQR method with exact solving of the correction
equation.
Test problem    #it   #MV    flops      max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09           664   674    6.00e+11   Inf                      5.4e-11
pz_20           0     0      1.92e+06   7.9e+99                  2.7e-11
pz_28           130   141    6.79e+09   Inf                      7.4e-12
pz_31           0     0      8.50e+06   5.3e+31                  1.5e-09
pz_32_lf_a      0     0      2.54e+05   2.4e+78                  1.1e-15
pz_35_osc       0     0      1.27e+05   2.3e+73                  3.2e-16
pz_36_osc       99    111    1.31e+09   Inf                      2.4e-11
pz_38           0     0      3.20e+04   1.4e+83                  1.1e-12
pz_sh3_crt      0     0      6.68e+06   1.2e+17                  9.7e-13
ER_9883         0     0      2.88e+05   4.6e-44                  1.2e-11
ER_11908        0     0      4.22e+04   3.7e+54                  5.2e-13
jr_1            877   3561   -          9.2e+295                 1.6e-010

Table 10.10: Computational costs and accuracy of the JDQR-method, with Gaussian elimination.
Test problem    #it   #MV    flops      max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09           659   669    2.89e+11   Inf                      5.3e-11
pz_20           0     0      5.68e+05   2.3e+84                  7.9e-12
pz_28           114   124    6.23e+08   Inf                      7.3e-12
pz_31           0     0      3.91e+05   9e+26                    3.8e-10
pz_32_lf_a      0     0      1.02e+05   3.8e+73                  4e-16
pz_35_osc       0     0      6.53e+04   1.2e+69                  1.9e-16
pz_36_osc       106   118    1.06e+09   Inf                      2.6e-11
pz_38           0     0      1.48e+04   1.6e+09                  1.1e-12
pz_sh3_crt      0     0      2.00e+06   1.6e+17                  5.5e-13
ER_9883         0     0      1.90e+05   4.1e-44                  1.5e-11
ER_11908        0     0      1.46e+04   0.13                     7.7e-13
jr_1            999   1013   -          0                        6.8e-009

Table 10.11: Computational costs and accuracy of the JDQR-method, with Gaussian elimination, applied to the reduced problems.
The quality of the computed eigenvalues with norm larger than the Schur-tolerance is good, and compares positively with the eigenvalues computed by Arnoldi. JDQR does find all poles; however, there is a large number of eigenvalues with norm smaller than the Schur-tolerance. In exact arithmetic, about 90% of these small eigenvalues are zero; in the results of JDQR (and other iterative methods like Arnoldi) one cannot distinguish the eigenvalues which should be zero from the small, but non-zero eigenvalues. Although decreasing the tolerance may help, there is still the risk that eigenvalues are discarded while they are not equal to zero in exact arithmetic. The Q R method does not suffer from these problems; the Arnoldi method does have these symptoms, but in a less severe way. Shift-and-invert techniques can be used to improve the accuracy of the small eigenvalues, but this requires additional invert operations.
Reducing the problem size by removing rows and columns which are zero decreases the computational costs, but the number of Jacobi-Davidson iterations remains equal. This confirms the thought that reducing the problem does not influence the spectral properties of the problem, and hence does not improve the convergence. Note also that a new eigenvalue is found almost every two iterations, because the total number of iterations is about twice the number of eigenvalues.
Tables 10.12 and 10.13 show the results of JDQR with GMRES as solver for the correction equation.
Test problem    #it    #MV    flops      max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09*          1999   8162   3.61e+10   8.2e-23                  4.5e-10
pz_20           0      0      1.92e+06   7.9e+99                  2.7e-11
pz_28           388    1576   4.86e+09   Inf                      6.7e-12
pz_31           0      0      8.50e+06   -5.3e+31                 1.5e-09
pz_32_lf_a      0      0      2.54e+05   2.4e+78                  1.1e-15
pz_35_osc       0      0      1.27e+05   2.3e+73                  3.2e-16
pz_36_osc       218    957    7.96e+08   Inf                      2.4e-11
pz_38           0      0      3.20e+04   1.4e+83                  1.1e-12
pz_sh3_crt      0      0      6.68e+06   1.2e+17                  9.7e-13
ER_9883         0      0      2.88e+05   4.6e-44                  1.2e-11
ER_11908        0      0      4.22e+04   3.7e+54                  5.2e-13
jr_1            898    3627   -          3.3e+295                 1.5e-010

Table 10.12: Computational costs and accuracy of the JDQR-method, with unpreconditioned GMRES.
Test problem pz_09 is marked with a * because JDQR was not able to find all eigenvalues within
2000 iterations (only 233 out of 509).
Test problem    #it    #MV    flops      max_i |det(p_i C + G)|   ||G\C*V-V*D||_2
pz_09*          1999   8099   2.42e+10   2.7e-18                  4.6e-10
pz_20           0      0      5.68e+05   2.3e+84                  7.9e-12
pz_28           363    1595   4.87e+08   Inf                      5.8e-12
pz_31           0      0      3.91e+05   9e+26                    3.8e-10
pz_32_lf_a      0      0      1.02e+05   3.8e+73                  4e-16
pz_35_osc       0      0      6.53e+04   1.2e+69                  1.9e-16
pz_36_osc       215    971    6.22e+08   Inf                      2.1e-11
pz_38           0      0      1.48e+04   1.6e+09                  1.1e-12
pz_sh3_crt      0      0      2.00e+06   1.6e+17                  5.5e-13
ER_9883         0      0      1.90e+05   4.1e-44                  1.5e-11
ER_11908        0      0      1.46e+04   0.13                     7.7e-13
jr_1            865    3500   -          0                        5.4e-009

Table 10.13: Computational costs and accuracy of the JDQR-method, with unpreconditioned GMRES, applied to the reduced problems.
Compared with the results for the JDQR method with exact solving of the correction equation, three remarks can be made. Firstly, there is hardly any influence on the accuracy of the eigenvalues. Secondly, JDQR with unpreconditioned GMRES performs better for larger problems (n ≳ 100) than JDQR with the exact solver with respect to computational costs. Thirdly, the results for the smaller problems are nearly the same for both strategies. This is not caused by the solver used, but by the fact that all eigenvalues are already found during the initialisation phase, when the Arnoldi method is used. Therefore, only the problems pz_09, pz_28 and pz_36_osc will be considered for the experiments with preconditioned GMRES. The computational costs are lower for the reduced problems.
As an intermezzo, an impression of the accuracy is given by means of the Bode-plots of the problems pz_20, pz_31, pz_36_osc and pz_sh3_crt, which are all known to experience problems with the Arnoldi method. Figure 10.3 shows the corresponding Bode-plots for the exact case (via computation of (sC + G)^{-1} for several s), the Q R method (eig), the Arnoldi method (eigs), JDQR with Gaussian elimination and JDQR with GMRES.
Figure 10.3: Bode-plots for the problems pz_20 (elementary response (i, o) = (1, n−1)), pz_31 ((1, 19)), pz_36_osc ((1, 19)) and pz_sh3_crt ((1, n−1)). The response (i, o) means the response of the o-th element to the excitation of the i-th element (i.e. ((sC + G)^{-1})_{oi}).
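The 'exact' reference curves in such plots can be reproduced with a direct frequency sweep. The sketch below is illustrative only; Gu and Cu are hypothetical names for the unscaled conductance and capacitance matrices, and (i, o) is the elementary response of interest.

    % Minimal sketch (assumptions: Gu, Cu unscaled matrices, (i,o) the response).
    f  = logspace(0, 12, 400);            % frequency grid in Hz
    ei = zeros(size(Gu,1), 1); ei(i) = 1; % unit excitation of the i-th element
    H  = zeros(size(f));
    for k = 1:numel(f)
        s    = 2i*pi*f(k);                % s = 2*pi*i*f on the imaginary axis
        x    = (s*Cu + Gu) \ ei;          % one sparse solve per frequency point
        H(k) = x(o);                      % response of the o-th element
    end
    semilogx(f, 20*log10(abs(H)));        % |H(f)| in dB
    xlabel('f (Hz)'); ylabel('|H(f)| (dB)');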
The results are remarkable. The plot computed by the exact method can be used as reference. The plots for pz_20 and pz_36 are nearly equal for all eigenmethods, which indicates that the poles are of the same accuracy. However, the plots for pz_31 and pz_sh3_crt show severe discrepancies. The inaccuracies are caused by the typical spectra of the problems: one or two extremal eigenvalues and many very small, but non-zero eigenvalues, which are about a factor 10^{15} smaller than the extremal eigenvalues. This results in problems for iterative methods, where the absolute error is nearly equal for both the smallest and the largest eigenvalues, and hence the relative error varies enormously. The smallest eigenvalues are needed for the poles. The problems are already visible in the matrix C, where the elements vary over 14 orders of magnitude. A possible strategy might be to compute the eigenvalues in two or more runs: one run for the smallest eigenvalues, and another run for the largest eigenvalues. However, information about the spectrum, both qualitative and quantitative, is needed
for that strategy. The Q R method also has problems, as it is not able to produce the same plot as the exact plot for pz_31 and pz_sh3_crt. It appears that a pole and a zero do not cancel, while they do in the exact solution. Possible reasons are loss of accuracy during the transformation to the ordinary eigenproblem and the computation of parasitic poles.
The value of a correct Bode-plot is sometimes overestimated. Although it gives an indication of the accuracy in a single glance, it says nothing about the accuracy of the largest poles in absolute value, especially poles greater than the maximum frequency. In some cases these poles are parasitic poles, but in other cases they are correct poles. Therefore it is dangerous to adopt strategies which do not compute the largest poles: these may produce 'correct' Bode-plots, but hide possibly valuable information about positive poles. Besides that, it is not known how accurate the original C and G data are. Furthermore, a comparison between the iterative methods and the Q R method may be unfair because a tolerance of 10^{-8} is used. In other words, the results can be viewed as being of equal quality in 8 digits of accuracy.
The first preconditioner is the diagonal preconditioner. This is the cheapest preconditioner, but it is not expected to bring any improvements. The only advantage is that, because of its low costs of computation (none) and application (O(n)), it can be recomputed for every change of the Ritz value θ. The results in Table 10.14 for the reduced problems confirm the expectations. A similar conclusion holds for the unreduced problems.
Test problem    #it        #MV    flops      max_i (det(p_i C + G))   ||G\C*V-V*D||_2
pz_09           1999 (5)   5921   2.81e+10   4.7e-15                  6.1e-10
pz_28           399 (4)    2296   6.97e+08   1.3e-08                  1.69e-09
pz_36_osc       399 (2)    2390   8.54e+08   5.1e-08                  2.6361e-10

Table 10.14: Computational costs and accuracy of the JDQR-method, with diagonally preconditioned GMRES, applied to the reduced problems.
The diagonal preconditioner worsens the process. The number of eigenvalues found (between parentheses in the #it column) is very low, while using GMRES without preconditioning results in all eigenvalues (except for pz_09, where 233 out of 509 eigenvalues are found).
The ILUT-preconditioner promises to perform much better. This preconditioner has an additional parameter, namely the drop-tolerance t. With this drop-tolerance, the fill-in and computational costs of the preconditioner can be controlled, thereby influencing both the costs and the accuracy. Because the ILUT-preconditioner is not cheap (O(n^3) for dense matrices, and its costs depend on the tolerance and the sparsity of the matrix concerned), it is desirable to compute as few ILUT-decompositions as possible. In any case, at most one preconditioner per eigenvalue should be computed. To this end, the JDQR code has been adapted by the author to control the computation of the preconditioner.
The experiments have been done with drop-tolerances t chosen from the values 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}, 10^{-7}, 10^{-8} and 0. The results are presented per test problem, and the columns containing the determinants are replaced by columns for the number of ILUT-decompositions and the percentage of the total costs formed by those decompositions. Not for every correction equation a new preconditioner is computed; instead, a new preconditioner is computed only if the difference between two consecutive Ritz values is larger than 1 · 10^{-4} in absolute value. Computing a new preconditioner only when the target is adjusted is not sufficient: the intermediate Ritz values may change too much in absolute value. The value 1 · 10^{-4} takes this into account, and it does not lead to superfluous computations of new preconditioners.
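A minimal sketch of this recomputation criterion is given below, with hypothetical variable names: theta is the current Ritz value, theta_prec the Ritz value for which the present preconditioner was built, and A the matrix of the ordinary eigenproblem.

    % Minimal sketch: rebuild the ILUT preconditioner only when the Ritz value
    % has moved more than 1e-4 in absolute value since the last build.
    if abs(theta - theta_prec) > 1e-4
        opts.droptol = t;                           % drop-tolerance of Section 10.3
        K = sparse(A) - theta*speye(size(A,1));     % K approximates A - theta*I
        [Lp, Up] = luinc(K, opts);
        theta_prec = theta;
        nILUT = nILUT + 1;                          % counter reported in the tables
    end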
Droptolerance   #it          #MV    flops       #ILUT decompositions   Workload ILUT   ||G\C*V-V*D||_2
1e-02           157 (7)      640
1e-03           200 (12)     746
1e-04           362 (34)     915
1e-05           393 (56)     792
1e-06           1067 (150)   2858
1e-07           1627 (230)   5643
1e-08           1999 (446)   7697   6.04e+011   453                    5%              5.3e-010
0               731          907    1.06e+011   386                    40%             4.7e-011

Table 10.15: Computational costs and accuracy of the JDQR-method, with ILUT preconditioned GMRES, applied to the reduced problem pz_09.
Tables 10.15, 10.16 and 10.17 show the results. Because of early stagnations of the Jacobi-Davidson process, caused by problems in selecting the searchspace expansion vectors, the data is not complete for all drop-tolerances. The data presented is taken from the point where the stagnation started.
Droptolerance   #it        #MV    flops       #ILUT decompositions   Workload ILUT   ||G\C*V-V*D||_2
1e-02           499 (0)    2756   8.10e+008   416                    25%             4.4e-010
1e-03           499 (0)    2947   5.06e+008   1                      0%              8.8e-010
1e-04           499 (0)    2967   5.10e+008   1                      0%              1.8e-010
1e-05           499 (13)   1687   8.26e+008   264                    22%             4.5e-012
1e-06           263        409    6.35e+008   197                    27%             4.6e-012
1e-07           173        258    5.26e+008   176                    33%             1e-011
1e-08           153        134    4.33e+008   129                    32%             7.6e-012
0               120        135    4.32e+008   129                    33%             5.8e-012

Table 10.16: Computational costs and accuracy of the JDQR-method, with ILUT preconditioned GMRES, applied to the reduced problem pz_28.
Droptolerance   #it        #MV    flops       #ILUT decompositions   Workload ILUT   ||G\C*V-V*D||_2
1e-02           499 (6)    2703   5.56e+008   18                     0.6%            9.4e-010
1e-03           499 (6)    2663   5.78e+008   28                     1.6%            6.2e-010
1e-04           499 (10)   2378   7.55e+008   232                    10%             9.6e-010
1e-05           499 (10)   2326   8.80e+008   314                    23%             1.2e-009
1e-06           499 (14)   2485   1.02e+009   372                    20%             1.6e-009
1e-07           499 (20)   2019   1.06e+009   398                    26%             6.6e-010
1e-08           406        813    1.28e+009   173                    9.4%            2.5e-011
0               146        198    7.30e+008   78                     14%             3.2e-011

Table 10.17: Computational costs and accuracy of the JDQR-method, with ILUT preconditioned GMRES, applied to the reduced problem pz_36_osc.
The effect of the preconditioner varies for the three problems. Where the preconditioners work as expected for problem pz_28, the outcome for problems pz_09 and pz_36_osc is surprising in the negative sense. There is only success for a complete LU-decomposition or an ILUT-decomposition with a drop-tolerance of 10^{-8} as preconditioner; all other drop-tolerances lead to stagnation in an early stage.
For problem pz_28, a decrease of the drop-tolerance leads to a decrease of the number of Jacobi-Davidson iterations and a decrease of the number of matrix-vector multiplications. This is expected, because a smaller drop-tolerance implies a more accurate ILUT-decomposition and thus a better solved correction equation. The total computational costs decrease as well, but not dramatically. Because the costs of the ILUT-decompositions form about 30 percent of the total costs, the computation of the preconditioner itself cannot be neglected, besides its application: one action with the preconditioner costs approximately 2n^2 flops (two solves of triangular systems), while the worst case costs for the computation are n^3 flops. Thus, a preconditioner becomes really interesting if it can be both computed and applied at low costs.
In the following subsection, the results for test problem pz_28 will be studied in depth.
10.6.3 JDQR results for problem pz_28
In this section the behaviour of the JDQR-method for test problem pz_28 will be discussed. The central subject of all discussions is the graphical presentation of the convergence history of the Jacobi-Davidson process. The results are categorised according to the way the correction equation is solved. First the results for the exact solver will be presented, followed by the results for GMRES and GMRES with preconditioning.
It is noted again that all eigenvalues are wanted, and that the problems are reduced in the known way.
The values for the JDQR-parameters can be found in Table 10.9, unless stated otherwise.
[Plot: log10 ||r||_2 per JD iteration; jmin=10, jmax=20, residual tolerance 1.16248e-009; correction equation solved exactly; test subspace W = V, V orthonormal.]
Figure 10.4: Convergence history of JDQR with Gaussian elimination.
JDQR with Gaussian elimination
Figure 10.4 shows the convergence history of JDQR where the correction equation is solved exactly,
i.e. using Gaussian elimination. The general convergence history graph plots the iteration number of
the Jacobi-Davidson process against the norm of the residual on log10 scale. The tolerance is visualised
by a dotted line; if the norm of the residual is smaller than the tolerance, the Ritz value is promoted to
an eigenvalue (this is visualised by a point below the dotted line). The search for the next eigenvalue
starts with a new residual norm, in general above the dotted line. The convergence history chart in
fact is a summary of the convergence to all eigenvalues found by JDQR. Between two stars, there
are a number of inner iterations (for instance GMRES iterations or a Gaussian Elimination process).
The convergence history is not surprising: the correction equation is solved exactly and hence a new eigenvalue is approximated nearly every iteration of the Jacobi-Davidson process. The total amount of flops is 6.23e+08 and 124 matrix-vector products are computed. It can be confirmed that the convergence is at least quadratic: every iteration, the norm of the residual is reduced quadratically or more, which could be caused by the fact that the problem is more or less symmetric. Suppose that G is symmetric positive-definite; then there exists a Cholesky decomposition G = L̃ L̃^T. If C is symmetric, the eigenproblem can be rewritten as

    L̃ L̃^T x = λ C x
    L̃^T x = λ L̃^{-1} C L̃^{-T} L̃^T x
    L̃^{-1} C L̃^{-T} y = (1/λ) y,

with y = L̃^T x and the matrix L̃^{-1} C L̃^{-T} symmetric. However, both C and G are not symmetric in this case, although parts of the matrices are. The departure of the matrix from normality is dep(G^{-1}C) = ||G^{-1}C (G^{-1}C)^T − (G^{-1}C)^T G^{-1}C||_2 = 2.7 · 10^7.
A special case happens in iteration 78, where first a quadratic reduction is observed, but next the
residual grows. The most likely cause is that the Jacobi-Davidson process ’comes along’ a Ritz
value which approximates another eigenvalue than the one corresponding to the current target and
eigenvector (Jacobi-Davidson zooms to the target). This thought is confirmed by the fast convergence
to the next eigenvalue. However, if it takes too many iterations to converge to the first target, it may
happen that this approximation is discarded (due to a restart). At the moment, the approximations
found this way are not stored, only up to the minimum dimension of the restart space.
Using exact solves is not a realistic option in Pstar. Although it is possible to incorporate the LU-decomposition in the hierarchy (where it becomes a UL-decomposition), this is not desired because of the relatively high costs: the LU-decomposition of a dense, ill-conditioned matrix would have to be computed every iteration, which is not suitable for high-dimensional problems (n > 200).
JDQR with GMRES
Figure 10.5 shows the convergence history of JDQR where the correction equation is solved using GMRES with at most 5 iterations. The reduction of the tolerance in the i-th call of the GMRES process (for the same eigenvalue) is 0.7^i, starting with a tolerance of 1. The thought behind this is that it is not necessary to solve the correction equation very accurately, especially not in the beginning of the search (when the approximation θ of λ is still very poor). This technique is also applied to inexact Newton processes, with which the Jacobi-Davidson process can be compared [36].
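In code, this schedule amounts to tightening the GMRES tolerance with every new call for the same eigenvalue. The sketch below uses Matlab's gmres and hypothetical names (Aop for the correction-equation operator, r for the current residual); it is illustrative only.

    % Minimal sketch: inexact solves of the correction equation; the tolerance is
    % reduced by a factor 0.7 for every new call for the same eigenvalue.
    ls_tol = 1;                                   % tolerance before the first call
    for i = 1:max_calls                           % calls for the current eigenvalue
        ls_tol = 0.7 * ls_tol;                    % equals 0.7^i in the i-th call
        [t_corr, flag] = gmres(Aop, -r, [], ls_tol, 5);   % at most 5 GMRES iterations
        % ... expand the searchspace with t_corr, update the Ritz pair and r ...
    end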
[Plot: log10 ||r||_2 per JD iteration; jmin=10, jmax=20, residual tolerance 1.16248e-009; correction equation solved with GMRES; test subspace W = V, V orthonormal.]
Figure 10.5: Convergence history of JDQR with GMRES as solver.
The total number of flops is 4.87e+08 and 1595 matrix-vector products (including the ones needed for GMRES) are computed. Note that the costs are slightly less than the costs for JDQR with Gaussian elimination. The rate of convergence is approximately linear. The large number of JD iterations needed per eigenvalue can be explained by the quality of the solution of the correction equation. For each eigenvalue, an average of 4.8 calls to GMRES is needed (for each eigenvalue 5.5 correction equations have to be solved), of which each call needs an average of 3.2 GMRES iterations. Setting the maximum number of iterations to 1 results in an average number of GMRES calls of 5.8, which does not give a significant computational gain, as fewer matrix-vector multiplications are needed (877) but more Jacobi-Davidson iterations (433).
[Plot: log10 ||r||_2 per JD iteration; jmin=10, jmax=20, residual tolerance 1.16248e-09; correction equation solved with GMRES and diagonal preconditioning; test subspace W = V, V orthonormal.]
Figure 10.6: Convergence history of JDQR with GMRES as solver and diagonal preconditioning.
The poor convergence near iterations 100 and 120 can be explained as follows. It may be the case that the selected Ritz vector does not have a large component in the direction of an eigenvector [12], or does not correspond to a nearby Schur pair. As a result, the searchspace is expanded in a poor direction and the convergence halts (temporarily). This is more or less confirmed by the convergence history.
Because GMRES only uses matrix-vector products and inner products, it can easily be combined with the hierarchical datastructures in Pstar, where the vectors and matrices are stored hierarchically. However, the product G^{-1}C is a dense matrix, so the practical use of the hierarchy is in doubt if the product G^{-1}C is to be computed for every multiplication.
JDQR with GMRES and diagonal preconditioning
Figure 10.6 shows the convergence history of JDQR where the correction equation is solved using GMRES with at most 5 iterations and diagonal preconditioning. Diagonal preconditioning is one of the simplest (and cheapest) forms of preconditioning. This form of preconditioning works well for strongly diagonally dominant problems. In this case, there are no indications that G^{-1}C is (strongly) diagonally dominant. This is one explanation for the poor convergence history. Another reason may be the fact that the preconditioner is projected. The stagnation in large intervals is caused by selecting the wrong Ritz pair.
Although diagonal preconditioning is not a good candidate, it is remarked that this preconditioner is very easy to implement in Pstar: the diagonal elements can be made available directly, for instance by storing the diagonal in a separate datastructure.
JDQR with GMRES and ILUT preconditioning
Figure 10.7 and Figure 10.8 show the convergence history of JDQR where the correction equation is solved using GMRES with at most 5 iterations and ILUT preconditioning (with pivoting). For the ILUT decomposition, the Matlab routine luinc has been used. The drop-tolerance t has taken the values t = 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}, 10^{-7}, 10^{-8} and 0. Pivoting is used, and any zeroes on the diagonal of U are replaced. The value of the drop-tolerance can be used to control the computational complexity of the decomposition and the accuracy. It is clear that a balance between the computational costs and the accuracy must be found for a suitable preconditioner.
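Within the Jacobi-Davidson correction equation the incomplete factors are not applied directly but in projected form. The sketch below shows the standard one-projection-per-application approach known from the Jacobi-Davidson literature; it is illustrative only and need not coincide with the implementation in the code of [1]. Lp and Up are the ILUT factors of A − θI and u is the current Ritz vector.

    % Minimal sketch: applying K = Lp*Up ~ A - theta*I as preconditioner inside the
    % projected correction equation (I - u*u')(A - theta*I)(I - u*u') t = -r.
    yhat = Up \ (Lp \ u);             % K^{-1} u, computed once per preconditioner
    mu   = u' * yhat;
    % preconditioning step for a vector v produced by GMRES:
    y = Up \ (Lp \ v);                % K^{-1} v
    z = y - ((u' * y) / mu) * yhat;   % subtract the component along yhat so that u'*z = 0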
[Four plots: log10 ||r||_2 per JD iteration; jmin=10, jmax=20, residual tolerance 1.16248e-009; correction equation solved with ILUT-preconditioned GMRES; test subspace W = V, V orthonormal.]
Figure 10.7: Convergence history of JDQR with GMRES as solver and ILUT preconditioning for the respective thresholds t = 10^{-2}, 10^{-3}, 10^{-4} and 10^{-5} (from left to right).
The results for the two largest drop-tolerances are somewhat disappointing. There is no convergence at all, because the preconditioners are too poor to speed up the GMRES process. In fact, the system is made worse, so that instead of fewer GMRES iterations, more GMRES iterations are needed to obtain any convergence. For drop-tolerances t = 10^{-4} and t = 10^{-5}, the situation improves. Although it takes a lot of iterations (350) to converge to a single eigenvalue, the power of the Jacobi-Davidson method becomes visible immediately afterwards. Once a good searchspace is constructed, linear convergence to new eigenvalues is obtained. One can think of a strategy which performs intermediate
Arnoldi iterations to get better search directions (instead of only using Arnoldi iterations during the
initialisation phase). The thought behind this is that the Jacobi-Davidson method, which has the same
convergence properties as the Newton method, tries to find one eigenvalue (the zero of the function) in
a neighbourhood of large values with small derivatives. As a result, approximations for the eigenvalue
and corresponding eigenvector may be poor for a large number of iterations, while Arnoldi iterations
do not suffer from this.
[Four plots: log10 ||r||_2 per JD iteration; jmin=10, jmax=20, residual tolerance 1.16248e-009; correction equation solved with ILUT-preconditioned GMRES; test subspace W = V, V orthonormal.]
Figure 10.8: Convergence history of JDQR with GMRES as solver and ILUT preconditioning for the respective thresholds t = 10^{-6}, 10^{-7}, 10^{-8} and 0 (from left to right).
Further decreasing the drop-tolerance does give the expected results. The number of iterations to compute all eigenvalues decreases significantly, and the speed of convergence improves from linear to quadratic. The number of iterations for ILUT with a drop-tolerance of t = 0 is almost equal to the number of iterations for the exact solver (120 against 114 iterations), which confirms the theory (both correction equations are in fact solved exactly).
The convergence history for ILUT with t = 10^{-7} shows some starting problems, which indicates difficulties with constructing a suitable searchspace. This can be caused by selecting the wrong Ritz pairs. However, during this initial phase, components of other eigenvectors are collected in the searchspace, so that there is fast convergence during the next iterations. These stagnation problems only return sporadically for the remainder of the algorithm, from which it can be concluded that there are no difficulties with selecting the expansions for the searchspace. Other experiments have shown that these
initial stagnation problems also depend on the starting vector (which is random).
The computational costs are of the same order as the costs for unpreconditioned GMRES, as can be concluded from Table 10.16. Nevertheless, the number of Jacobi-Davidson iterations is reduced by almost a factor three, and the convergence is improved from linear to quadratic. The average number of GMRES iterations is very low: for a drop-tolerance t of 10^{-3} there are 5 iterations per call (the maximum), while for the three smallest tolerances the average is about 1. So decreasing the maximum number of GMRES iterations will not make a big difference (which is confirmed by experiments). On the other hand, by making the tolerance for the GMRES process stricter, it is expected that the average number of GMRES iterations will grow. Numerical experiments confirm this, but the growth is about 1.5 iterations, which can be explained by the fact that the ILUT-preconditioner is of good quality: the GMRES process comes to a good approximation in very few (1) iterations.
A good measure of the accuracy of the poles is obtained by comparing the Bode-plots of the pole-zero problem. To that end, the zeroes must also be computed, but this can be done in a similar way. Figure 10.9 shows the Bode-plot of problem pz_28, where the poles are computed with the Q R method, JDQR with Gaussian elimination, JDQR with unpreconditioned GMRES, and JDQR with ILUT preconditioned GMRES for drop-tolerances 1e-8 and 1e-6.
Figure 10.9: Bode-plot of problem pz_28 (response (16, 68)), for various correction equation solvers.
Near the end of the frequency domain, the Bode-plots start to differ. Especially JDQR with preconditioned GMRES experiences accuracy problems. The accuracy becomes worse for larger drop-tolerances. The last part of the Bode-plot is influenced by the largest poles, and these largest poles correspond with the smallest eigenvalues. Apparently, these smallest eigenvalues become less accurate when the ILUT-preconditioner is used to solve the correction equation. Decreasing the tolerance for the correction equation may help in this case, so one can think of an adaptive scheme for the drop-tolerance, depending on the frequency.
The fill-in of the ILUT-decompositions is not discussed here, because only dense problems are concerned. Jacobi-Davidson QR with harmonic test spaces (W = G^{-1}C V − σ V) is also not considered
here, because in general no suitable values for σ are at hand, and not only the interior eigenvalues are
wanted.
It can be concluded that the ILUT preconditioner K = L̃Ũ, for drop-tolerances smaller than 10^{-5}, satisfies the requirements of improving the spectral properties of the matrix G^{-1}C − θ_i I and of solving systems of the form Kz = r easily (O(n^2)). On the other hand, the ILUT preconditioner is expensive (2/3 n^3 flops), and in practice the costs are roughly a factor two higher because θ_i may be complex. The costs for constructing the preconditioners form a third of the total costs, which is too high. However, it should be noted that the matrix G^{-1}C is dense, and that the ILUT preconditioner is not designed to work with dense matrices. The accuracy of the computed eigenvalues is another point of discussion. There are several possible improvements to think of, such as further reducing the fill-in and other variants of ILUT (MILU). The optimal case is an easily updatable preconditioner. Numerical experiments show that using a fixed ILUT- or LU-decomposition for the whole process leads to stagnation right away or after a couple of eigenvalues have been found. So if a preconditioner is used, there must be some inexpensive update technique.
Studying the possibilities of an ILUT-preconditioner in combination with the implementation in Pstar, there is one drawback: the computation of the norm of a row or column is expensive, as matrix elements are not available cheaply. However, in [17] it is reported that using the diagonal element instead of the row-norm is a good alternative, giving comparable results (this has been done for the linear systems arising in transient analysis). Nevertheless, the use of this alternative should be justified by a deeper insight into the structure of G^{-1}C (how do the diagonal elements relate to the norm of the corresponding row or column?). The implementation of the ILUT-algorithm does not give any other problems, as it is a variation of the LU-algorithm, which can be implemented in the hierarchy as a UL-decomposition.
10.7 The generalised eigenvalue problem
10.7.1 Strategy
The generalised eigenvalue problem associated with a pole-zero analysis of a circuit is
    G x = λ C x,    x ≠ 0,                                        (10.6)
where the scaling of matrix C by maxC is undone. One advantage over the ordinary eigenvalue strategy is visible immediately: the LU-decomposition of G is not needed, which saves computational costs (approximately 2n^3/3 flops for the LU-decomposition and 2n^2 for each action with U^{-1}L^{-1}) and keeps the digits which would have been lost by inverting G. Another advantage, in particular for iterative subspace methods, is that the matrices C and G are sparse, while the product G^{-1}C is dense. A disadvantage is that there is no trivial reduction strategy available.
There is no official implementation of a generalised eigenvalue solver in Pstar. Therefore, there are
no specific problems known which fail for generalised eigenvalue solvers. The numerical properties
of the test problems can be found in Section 10.6.2. The results will be compared to the results of the
ordinary eigenvalue solvers.
The poles pi of the problem are computed as pi = −λi . These values can be compared with the
values computed by Pstar. The quality of the eigenvalues and poles can be tested by computing,
for the matrix V containing the generalised eigenvectors and the diagonal matrix D containing the
generalised eigenvalues, ||C V − GV D||2 and the determinant det( pi C + G). Both are zero in exact
arithmetic. Another way to compare the results is to make a Bode-plot, for which the zeroes are needed
also. Note that the norm ||C V − GV D||2 is used and not ||GV − C V D||2 because C in general has a
rank-deficiency, which leads to poles at infinity, with undefined eigenvectors. By switching the roles
of C and G, these poles at infinity become eigenvalues at zero and the norm is defined. This is only
done for numerical analysis; the actual poles are computed as described above.
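In Matlab these checks can be carried out directly on the generalised problem. The sketch below is illustrative only; it assumes the variables G, C and maxC of Table 10.1 and, as described above, solves the problem with the roles of C and G switched.

    % Minimal sketch (assumed variables: G, C, maxC from Table 10.1).
    Cu = maxC * C;                        % undo the scaling of C
    [V, D] = eig(full(Cu), full(G));      % Cu*x = mu*G*x, i.e. mu = 1/lambda
    mu  = diag(D);
    fin = (mu ~= 0);                      % mu = 0 corresponds to poles at infinity
    p   = -1 ./ mu(fin);                  % finite poles p_i = -lambda_i
    res = norm(Cu*V - G*V*D);             % zero in exact arithmetic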
10.7.2 Numerical results
The Q Z method
The experiments are done with the Matlab implementation of the Q Z method, the function eig. The
function call is eig(G,maxC*C). Table 10.18 shows the results for the Q Z method.
Test problem    flops      max_i (det(p_i C + G))   ||C*V-G*V*D||_2
pz_09           7.86e+09   1.8e-250                 3e-21
pz_20           3.81e+05   2.6e-09                  7.1e-21
pz_28           3.16e+08   1.3e-162                 5.4e-24
pz_31           1.42e+06   6.3e+04                  1.2e-10
pz_32_lf_a      7.63e+04   1                        8e-24
pz_35_osc       5.62e+04   1.7e-09                  6.5e-19
pz_36_osc       6.10e+07   6.5e-44                  4.1e-19
pz_38           2.78e+04   6.3e+04                  9.6e-18
pz_sh3_crt      1.45e+06   1.9e+62                  1.4e-05
ER_9883         1.23e+05   1e-30                    1.3e-22
ER_11908        2.74e+04   1.1e-06                  2.9e-21
jr_1            -          0                        2.6e-16

Table 10.18: Computational costs and accuracy of the Q Z method.
The accuracy of the Q Z method is really striking at first sight. Except for problems pz_31 and pz_sh3_crt, the error is of order O(10^{-20}). Compared with the results of the Q R method, this is a gain of about 10 orders of magnitude. However, the computational costs are about 10 times as high, independent of the problem size, but still lower than the costs of the Arnoldi method. This factor 10 is higher than the expected factor 3. A cause may be the ill-conditioned matrix C, whose effects are less severe in the ordinary eigenproblem. Furthermore, the determinant det(p_i C + G) is in most cases nearly zero. This is caused by the improved accuracy of the poles, combined with the absence of the reciprocal relation between the eigenvalues and the poles.
A reason for the improved accuracy is the fact that the generalised eigenproblem is not transformed to
an ordinary eigenproblem, which requires the inversion of the matrix G. This inversion may change
the spectral (and numerical) properties of the problem slightly, thereby influencing the convergence of
the Q R method. The main reason is the scaling of the matrix C with maxC, which is undone for the
generalised case. If the Q R method is applied to the unscaled problem, the accuracy is of the same
order as the accuracy of the Q Z method.
As has been discussed in Chapter 7, the QR method and the QZ method compute the same eigenvalues in exact arithmetic, but the QZ method is more stable. To see the effects in finite arithmetic, the poles computed by the QR method and the QZ method are compared by taking the norm of the difference. The results are listed in Table 10.19.
    Test problem   norm(p_QZ - p_QR)   max_i |p_i^QZ - p_i^QR| / |p_i^QZ|
    pz_09          3.5e+11             2.0
    pz_20          1.5e+11             1.5
    pz_28          1.1e+10             2.0
    pz_31          2.7e+08             1.9
    pz_32_lf_a     3.8e+14             2.0
    pz_35_osc      2.3e+04             2.0
    pz_36_osc      1.9e+08             2.0
    pz_38          1.1e-05             2.6e-13
    pz_sh3_crt     1.3e+9              1.5
    ER_9883        2.4e-09             1.1e-14
    ER_11908       9.4e+03             2.2e-08
    jr_1           1.3e+03             1.3

Table 10.19: The norm of the difference of the poles computed by the QR method and the QZ method.
The results may look disturbing, but they can be explained as follows. Due to the inversion of G, a number of digits have been lost. This loss has its effect on the QR method and is visible especially in the large poles: if, for example, the eighth digit in a pole of order O(10^14) differs, there is already a difference of O(10^6) with that same pole computed by the QZ method. For small poles (O(10) to O(10^6)) these effects are less severe. More important are the differences of a factor 2 in some of the computed poles of some problems. Theoretically, these differences can also be caused by the inversion of G. Inspection of the triangular matrices resulting from the QZ process does not indicate possible stability problems, because the two triangular matrices do not both have diagonal elements near zero for the same indices.
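The comparison measure in the last column of Table 10.19 can be computed along the following lines. The vectors polesQR and polesQZ are hypothetical names for the poles computed by the two methods; matching them by sorting on magnitude is an assumption made here for illustration only.

    % Sketch: compare the poles computed by the QR and the QZ method (Table 10.19).
    [s, iQR] = sort(abs(polesQR));  pQR = polesQR(iQR);   % match the poles by sorting
    [s, iQZ] = sort(abs(polesQZ));  pQZ = polesQZ(iQZ);   % on magnitude

    absdiff = norm(pQZ - pQR);                        % norm(p_QZ - p_QR)
    reldiff = max(abs(pQZ - pQR) ./ abs(pQZ));        % max_i |p_i^QZ - p_i^QR| / |p_i^QZ|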
The JDQZ method

The JDQZ method has been described in Chapter 9. The Matlab code of Gerard Sleijpen (dated 2002), based on the algorithms in [1], has been used. As with the JDQR method, the most important property of Jacobi-Davidson style methods to keep in mind during the experiments is their fast convergence to the eigenvalues near a specified target.

The JDQZ method has a range of parameters in common with the JDQR method. Because several parameters can have values which differ from those of the JDQR method, all parameters are given again. Table 10.20 shows all parameters used, together with a description and the values actually used for the experiments.
    JDQZ parameter      Description                                                 Values (actual)
    K                   The number of eigenvalues desired.                          all
    SIGMA               Location of wanted eigenvalues.                             (Largest Magnitude), Smallest Magnitude, Largest Real part, Smallest Real part, Both ends, or a target value
    NSigma              The best eigenvalue approximations are chosen as target
                        for the second and next eigenvalues.                        'no', ('yes')
    OPTIONS.Tol         Convergence tolerance:
                        ||A*Q - Z*R_A||_2, ||B*Q - Z*R_B||_2 <= Tol.                (1E-8)
    OPTIONS.jmin        Minimum dimension searchspace.                              (min(10,n))
    OPTIONS.jmax        Maximum dimension searchspace.                              (2*min(10,n))
    OPTIONS.MaxIt       Maximum number of JD iterations.                            (100)
    OPTIONS.v0          Starting space.                                             (ones + 0.1*rand)
    OPTIONS.Schur       Gives Schur decomposition.                                  ('no'), 'yes'
    OPTIONS.TestSpace   The type of test-space.                                     Standard W=a*A*V+b*B*V, harmonic W=b*A*V-a*B*V, 'SearchSpace' W=V, AV, (W=BV)
    OPTIONS.LSolver     The linear solver for the correction equation.              MINRES, (GMRES), CG, SymmLQ, BiCGStab, exact
    OPTIONS.LS_Tol      Residual reduction of the linear solver.                    (0.7)
    OPTIONS.LS_MaxIt    Maximum number of iterations of the linear solver.          (5)
    OPTIONS.Precond     Preconditioner for solving the correction equation
                        (user defined).                                             (None)

Table 10.20: JDQZ parameters with description and values (with actual value between parentheses) used for the experiments.
If for certain experiments other parameter values are used, this will be mentioned with the experiments concerned. The parameter NSigma is new and is used to control the targets. If NSigma is set, the targets for the second and next eigenvalues will be set to the best eigenvalue approximations from the test subspace. The harmonic testspace is interesting if good targets are available (which they are not here). The choice W=V is justified if the matrix B is positive definite (so that B=LL' and the generalised eigenproblem can be rewritten). The choices AV and BV are interesting if either A or B is a diagonal matrix. Because the matrix A=C is special in the sense that about a third of its rows and columns are zero, it is expected that the difference between the standard testspace and the testspace BV (=GV) is not big. This is confirmed by numerical experiments and therefore the testspace BV is chosen.
Besides the choices for parameters such as the minimum and maximum searchspace dimensions, which have already been discussed for the JDQR method, a choice for the testspace must be made. Because the generalised eigenvalues relate to the poles as p_i = -λ_i, especially the eigenvalues with largest magnitude are desired. In that case, the standard and harmonic testspaces are of less interest (as they are more applicable for interior eigenvalues), so the Ritz space is chosen. The initialisation phase builds a searchspace with Arnoldi iterations applied to SIGMA*A+B.
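A possible call of the jdqz routine with the settings of Table 10.20 is sketched below. The option field names follow Table 10.20; the exact calling sequence, output arguments and keyword strings (for instance for SIGMA and the test space) depend on the version of jdqz.m that is used, so this sketch is indicative only.

    % Sketch of a JDQZ call with the actual values of Table 10.20 (A = C, B = G).
    n = size(G, 1);
    options.Tol       = 1e-8;                        % convergence tolerance
    options.jmin      = min(10, n);                  % minimum searchspace dimension
    options.jmax      = 2*min(10, n);                % maximum searchspace dimension
    options.MaxIt     = 100;                         % maximum number of JD iterations
    options.v0        = ones(n, 1) + 0.1*rand(n, 1); % starting space
    options.Schur     = 'no';
    options.TestSpace = 'BV';                        % test space W = BV (= GV); keyword may differ per version
    options.LSolver   = 'GMRES';                     % linear solver for the correction equation
    options.LS_Tol    = 0.7;                         % residual reduction of the linear solver
    options.LS_MaxIt  = 5;                           % maximum number of GMRES iterations
    options.Precond   = [];                          % no preconditioner

    k     = n;                                       % all eigenvalues are wanted
    sigma = 'LM';                                    % largest magnitude (keyword assumed)
    lambda = jdqz(C, G, k, sigma, options);          % further output (Schur vectors etc.)
                                                     % depends on the jdqz version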
Again, quadratic convergence is expected if the correction equation is solved exactly. The correction equation will be solved with the exact solver (Gaussian elimination) and with the iterative linear solver GMRES, with and without preconditioning.

The results of the various strategies will be presented first. After that, the JDQZ results for test problem pz_28 will be analysed in depth, where topics like the convergence behaviour and the role of the preconditioner will be discussed.

Table 10.21 lists the results for the JDQZ method with exact solving of the correction equation.²
    Test problem   #it          #MV    flops       max_i det(p_i C + G)   ||G*V-C*V*D||_2
    pz_20          9            19     7.76e+006   1.2e+028                5.2e+031
    pz_28+         116          226    3.00e+009   Inf                     1.9e-008
    pz_31          2            32     1.12e+007   5.7e+028                1.1e+017
    pz_32_lf_a     0            10     7.31e+005   1.6e-005                1.5e+034
    pz_35_osc      0            10     7.36e+005   1.4e+024                5.2e+023
    pz_36_osc      186          196    2.10e+009   Inf                     2e+029
    pz_38          0            7      1.27e+005   6.2e-012                3e+042
    pz_sh3_crt     0            30     9.60e+006   1.2e-200                3.9e+004
    ER_9883        2            12     1.31e+006   5.4e-044                7.3e+039
    ER_11908       0            8      1.82e+005   1.4e-018                1.5e+032
    jr_1           1000 (234)   1010   -           NaN                     5.7e-010

Table 10.21: Computational costs and accuracy of the JDQZ method, with Gaussian elimination.
Problems marked with a + experience severe near-singularity situations for the correction equation. These situations are caused by targets which are very near an eigenvalue, so that the operator in the correction equation becomes nearly singular. Another cause is the matrix C, which is singular in most cases: considering the operator βG - αC in the correction equation, a large α and a small β lead to a dominating role of C. For certain combinations of the current target and the current Schur bases, the operator in the correction equation becomes singular. These problems can be circumvented by changing the roles of G and C, i.e. solving the problem Cx = λGx avoids the second cause and gives the results presented for pz_28. Furthermore, it is observed that scaling G as G/maxC leads to better convergence than unscaling C as C*maxC. This is because maxC is rather small (between O(10^-4) and O(10^-10)). Both scalings lead to equivalent generalised eigenproblems.
² Although the number of matrix-vector multiplications is given as a measure, the number of vector inner products and updates should be measured too, as these costs are also substantial when using sparse matrices. The flop count should compensate for this.
Despite the fact that the JDQZ method can calculate with the sparse matrices G and C, the computational costs are of the same order as the costs of the JDQR method. This confirms the predictions made in Section 9.5.7. Although the last column does not give much confidence in the accuracy of the results, the poles are of the same quality as the poles computed with the JDQR method and Arnoldi iterations. The huge values for ||G*V-C*V*D||_2 are caused by parasitic poles: the rank deficiency of C results in infinite poles, but these are finite in finite arithmetic (the relative accuracy is of course better). If the roles of G and C are changed, the accuracy is O(10^-8). The disadvantage is that the reciprocal values of the eigenvalues must then be computed to obtain the poles.

The JDQZ method seems to suffer from the rank deficiency of C, whereas the JDQR method has fewer problems. Clearly, the product G^-1 C - θI behaves significantly differently from the sum βG - αC (α, β ∈ C). Furthermore, the generalised eigenproblem cannot be reduced by removing rows and columns which are zero. Changing the roles of G and C does help, as can be argued from the fact that the number of Jacobi-Davidson iterations is then approximately equal to the number of iterations needed by the JDQR method. Problem pz_09 could not be solved within 2000 iterations, but 496 out of 504 eigenvalues were found before the process stagnated.
To get an idea of the accuracy, the Bode-plots of the test problems pz_20, pz_31, pz_36_osc and pz_sh3_crt are computed; these problems are all known to experience problems with the Arnoldi method. Figure 10.10 shows the corresponding Bode-plots for the exact solution, the QZ method, the Arnoldi method and JDQZ with Gaussian elimination.
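For reference, a Bode magnitude plot can be produced from a set of computed poles and zeros roughly as follows. The gain constant k and the vector zers of zeros are assumptions here (their computation is described elsewhere in this report); the transfer function is taken in the standard pole-zero product form, so this is a sketch rather than the exact plotting code used for the figures.

    % Sketch: Bode magnitude plot from computed poles and zeros,
    % H(s) = k * prod_j (s - z_j) / prod_i (s - p_i), evaluated at s = 2*pi*i*f.
    pfin = poles(isfinite(poles));             % leave out the poles at infinity
    f = logspace(0, 12, 1000);                 % frequency range as in Figure 10.10
    s = 2i*pi*f;
    H = k*ones(size(s));
    for jj = 1:length(zers),  H = H .* (s - zers(jj)); end
    for ii = 1:length(pfin),  H = H ./ (s - pfin(ii)); end
    semilogx(f, 20*log10(abs(H)));
    xlabel('f (Hz)'); ylabel('|H(f)| (dB)');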
Figure 10.10: Bode-plots for the problems pz_20 (elementary response (i,o) = (1, n-1)), pz_31 ((1,19)), pz_36_osc ((1,19)) and pz_sh3_crt ((1, n-1)). (Each panel shows |H(f)| in dB against f in Hz, for the exact solution, QZ, Arnoldi and JDQZ with exact solver.)
As with the results for JDQR, the plots for pz_20 and pz_36_osc are nearly equal for all eigenmethods, which indicates that the poles are of the same accuracy. The plots for pz_31 and pz_sh3_crt again show severe discrepancies, although the QZ method performs better than the QR method for problem pz_31. The same arguments (the wide spectra) also hold here; the advantage of the generalised eigenproblem (the poles are equal to the negated eigenvalues) is of no value because the differences between the extremal eigenvalues remain equally large.

Table 10.22 shows the results for the JDQZ method with GMRES as linear solver of the correction equation.
    Test problem   #it        #MV    flops       max_i det(p_i C + G)   ||G*V-C*V*D||_2
    pz_09          500 (1)    3002   2.06e+009   5.2e+56                 1.9e-008
    pz_20          9          64     9.23e+006   2.9e+138                5.8e+030
    pz_28+         500 (1)    226    3.00e+009   Inf                     1.9e-008
    pz_31          16         106    1.96e+007   7.2e-020                2.8e+022
    pz_36_osc      500 (1)    196    2.10e+009   Inf                     2e+029
    ER_9883        2          14     2.22e+006   2.6e+068                2.8e+038
    jr_1           1000 (0)   5968   -           -                       -

Table 10.22: Computational costs and accuracy of the JDQZ method, with the linear solver GMRES.
The singularity problems which appeared when using the exact solver appear here as well. Five GMRES iterations are not enough to obtain any results for the larger problems. Setting the maximum number of GMRES iterations to 50 does improve the situation (40 eigenvalues of problem pz_28 are computed within 500 iterations), but is still far from satisfactory, as the computational costs grow too much. Changing the roles of C and G does not help, and using the standard testspace is no improvement either. Increasing the maximum dimension of the searchspace does not result in better convergence. In short, it is clear that a preconditioner is needed.

Another remarkable observation is the exhaustion of the searchspace in some cases (pz_31 and pz_20): all Ritz values and Ritz vectors have converged sufficiently. If this happens, a new searchspace is constructed with the Arnoldi method, as is done during the initialisation phase.

Table 10.23 shows the results for GMRES with diagonal preconditioning. The results speak for themselves.
    Test problem   #it       #MV    flops     max_i det(p_i C + G)   ||G*V-C*V*D||_2
    pz_09          500 (0)   2399   4.8e+09   -                       -
    pz_28          500 (0)   3000   1.2e+09   -                       -
    pz_36_osc      500 (0)   2991   7.2e+08   -                       -

Table 10.23: Computational costs and accuracy of the JDQZ method, with the linear solver GMRES and diagonal preconditioning.
The expectations for the ILUT preconditioner are high, as they were for the JDQR method. The drop-tolerances are chosen from the values 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, 10^-8, 0. Again the criterion is used that the difference between the old and the new Ritz value must be larger than 10^-4 before the preconditioner is recomputed. It is remarked that the roles of C and G are changed to avoid (some of) the singular operators, so that the problem becomes Cx = λGx. Tables 10.24, 10.25 and 10.26 show the results.
    Droptolerance   #it        #MV    flops       #ILUT decompositions   Workload ILUT   ||G\C*V-V*D||_2
    1e-02           500 (0)    2976   1.80e+009   2                      0%              -
    1e-03           500 (0)    2991   1.62e+009   2                      0%              -
    1e-04           500 (7)    2992   1.94e+009   3                      0.2%            1.2e-008
    1e-05           500 (2)    2981   1.94e+009   2                      0.1%            9.4e-009
    1e-06           500 (6)    2936   2.50e+009   12                     2%              1.2e-008
    1e-07           500 (13)   2929   3.13e+009   15                     10%             1.6e-008
    1e-08+          500 (4)    2834   2.67e+009   2                      1%              1.1e-008
    0+              -          -      -           -                      -               -

Table 10.24: Computational costs and accuracy of the JDQZ method, with ILUT preconditioned GMRES, applied to the reduced problem pz_09.
    Droptolerance   #it        #MV    flops      #ILUT decompositions   Workload ILUT   ||G\C*V-V*D||_2
    1e-02           400 (16)   2387   8.71e+08   3                      0.2%            1.3e-08
    1e-03           400 (24)   2226   1.14e+09   21                     0.1%            1.5e-08
    1e-04           400 (39)   2179   1.45e+09   13                     0.2%            1.9e-08
    1e-05           400 (53)   2117   1.91e+09   15                     0.2%            1.8e-07
    1e-06           400 (21)   2319   1.19e+09   12                     0.1%            1.1e-08
    1e-07           400 (38)   2237   1.48e+09   21                     0.2%            2.3e-08
    1e-08           400 (21)   1974   1.06e+09   9                      0.2%            1.5e-08
    0               -          -      -          -                      -               -

Table 10.25: Computational costs and accuracy of the JDQZ method, with ILUT preconditioned GMRES, applied to the reduced problem pz_28.
    Droptolerance   #it        #MV    flops       #ILUT decompositions   Workload ILUT   ||G\C*V-V*D||_2
    1e-02           400 (1)    2396   5.03e+008   4                      0.6%            9e-009
    1e-03           400 (36)   2288   9.81e+008   4                      0.4%            2.5e-008
    1e-04           400 (23)   2309   8.16e+008   4                      0.4%            1.1e-008
    1e-05           400 (16)   2343   7.74e+008   3                      0.3%            2.2e-008
    1e-06           400 (22)   2368   8.18e+008   4                      0.4%            1.3e-008
    1e-07           400 (15)   2349   7.80e+008   7                      0.5%            7.4e-009
    1e-08+          400 (14)   1980   7.23e+008   5                      0.4%            9.7e-009
    0+              -          -      -           -                      -               -

Table 10.26: Computational costs and accuracy of the JDQZ method, with ILUT preconditioned GMRES, applied to the reduced problem pz_36_osc.
Because the operator βC - αG in the correction equation is often ill-conditioned, the computation of the ILUT-decomposition experiences difficulties, which become visible as zeroes on the diagonal of U. Replacing these zeroes makes the decomposition consistent, but possibly degrades the quality of the preconditioner. So although the roles of C and G are changed, severe singularity problems are still experienced.

The costs for the preconditioners form a minor part of the total costs. The small number of ILUT-decompositions indicates that the Ritz values do not change much, or that the criterion is too strict. Removing the criterion, i.e. setting the minimum difference to zero, does improve the situation a little: experiments show that the singularity problems still exist, and that 20 additional eigenvalues are found for the smallest drop-tolerances, of course at the cost of one ILUT-decomposition per Jacobi-Davidson iteration. Remember that for the smallest drop-tolerances nearly the same convergence as with the exact solver is expected; this is not the case.
10.7.3 JDQZ results for problem pz_28

Just as the results of the JDQR method were studied in detail for test problem pz_28 in Section 10.6.3, this section will focus on the performance of the JDQZ method for that problem. The results will be presented in the same way. Remember that the problem cannot be reduced as it could be in the ordinary eigenproblem. The values for the JDQZ parameters can be found in Table 10.20, unless stated otherwise.
JDQZ with Gaussian elimination
Figure 10.11 shows the convergence history of JDQZ where the correction equation is solved exactly,
i.e. using Gaussian elimination.
Figure 10.11: Convergence history of JDQZ with Gaussian elimination (log10 ||r||_2 against the iteration number; test subspace computed as Bv; jmin=10, jmax=20, residual tolerance 1e-08).
The convergence history shows the expected result: quadratic convergence. The main difference with the JDQR method (cf. Figure 10.4) is the stagnation in the first 30 iterations. Besides this stagnation, which is caused by wrong selections of Ritz pairs, two near-convergences at iterations 5 and 25 can be seen. These cases represent Ritz values which approximate other eigenvalues than the ones corresponding to the target; however, the Ritz information is kept, so that the process in iterations 30-50 is accelerated.

If the iterations of the stagnation in the beginning are not counted, the total number of iterations is equal to the number of iterations needed by JDQR. This is expected, because the spectral properties of the problems are the same. On the other hand, there are side-effects which should be accounted for: the operators in the correction equation are different, and the operator G^-1 C - θI is not equal to the operator βC - αG. Also, the projection operator (I - ZZ*) in JDQZ differs from the projection operator (I - QQ*) in JDQR.

Although the matrices C and G are sparse, no benefit is taken from that fact if the correction equation is solved exactly. An exact solver which exploits the sparsity may decrease the computational costs. Using exact solving is not a realistic option in Pstar. Although it is possible to incorporate the LU-decomposition in the hierarchy (which becomes a UL-decomposition in that case), this is not desired because of the relatively high costs: the LU-decomposition has to be computed every iteration. Nevertheless, an exact solver would become a serious option if it makes use of the sparsity.
JDQZ with GMRES

Figure 10.12 shows the convergence history of JDQZ where the correction equation is solved using GMRES with at most 5 iterations. The reduction of the tolerance in the i-th call of the GMRES process (for the same eigenvalue) is 0.7^i, starting with a tolerance of 1.
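A sketch of one such GMRES call is given below. The matrices A and B denote the pencil actually used (C and G, possibly with their roles interchanged), Q and Z are the current Schur bases, (alpha, beta) is the current target pair, r the current residual and it the call counter; all of these are assumed to come from the surrounding Jacobi-Davidson iteration, and the form of the correction equation follows [12]. The sketch is illustrative and not the jdqz.m implementation itself.

    % Sketch: one GMRES solve of the projected JDQZ correction equation,
    % (I - Z Z*)(beta A - alpha B)(I - Q Q*) t = -r, with at most 5 iterations.
    proj = @(w) w - Z*(Z'*w);                          % (I - Z Z*) w
    op   = @(t) (beta*A - alpha*B)*(t - Q*(Q'*t));     % (beta A - alpha B)(I - Q Q*) t
    Afun = @(t) proj(op(t));                           % complete projected operator

    tol = 0.7^it;                                      % adaptive tolerance, 0.7^i in the i-th call
    [t, flag] = gmres(Afun, -r, [], tol, 5);           % at most 5 GMRES iterations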
Figure 10.12: Convergence history of JDQZ with GMRES as solver (log10 ||r||_2 against the iteration number; test subspace computed as Bv; jmin=10, jmax=20, residual tolerance 1e-08).
There is no convergence at all, and the most obvious reason is that no suitable searchspace can be constructed. This is caused directly by the fact that GMRES is not able to solve the correction equation with an accuracy satisfying the tolerance. If the maximum number of GMRES iterations is enlarged to 50, the situation improves only very little (1 eigenvalue is found within 400 iterations). The reason must be sought in the property of GMRES that it does not converge well for ill-conditioned problems [24]. Preconditioning techniques are expected to improve the process.
JDQZ with GMRES and diagonal preconditioning

Figure 10.13 shows the convergence history of JDQZ where the correction equation is solved using GMRES with at most 5 iterations and diagonal preconditioning.
Figure 10.13: Convergence history of JDQZ with GMRES as solver and diagonal preconditioning (log10 ||r||_2 against the iteration number; test subspace computed as Bv; jmin=10, jmax=20, residual tolerance 1e-08).
The diagonal preconditioner does not change the convergence history. The reasons are the same as those given in the previous paragraph and in the corresponding paragraph of JDQR with diagonal preconditioning.
JDQZ with GMRES and ILUT preconditioning

The convergence histories for JDQZ with ILUT preconditioned GMRES as solver for the correction equation are shown in Figure 10.14 and Figure 10.15. The Matlab function luinc is used for the ILUT-decompositions, with pivoting and with replacement of zeroes on the diagonal of U.
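A sketch of such an ILUT-preconditioned GMRES solve is given below; the projected operator Afun, the pencil matrices A and B, the target pair (alpha, beta) and the right-hand side r are as in the earlier sketch, the drop-tolerance 1e-6 is just one of the values used, and the exact option names of luinc should be checked against the Matlab version at hand.

    % Sketch: ILUT preconditioner (luinc) for the correction-equation operator
    % beta*A - alpha*B, with pivoting (thresh) and replacement of zeroes on the
    % diagonal of U (udiag), followed by preconditioned GMRES.
    setup.droptol = 1e-6;                              % drop-tolerance t
    setup.thresh  = 1;                                 % pivoting
    setup.udiag   = 1;                                 % replace zero diagonal entries of U
    [L, U, P] = luinc(beta*A - alpha*B, setup);        % P*(beta*A - alpha*B) ~ L*U

    Mfun = @(x) U\(L\(P*x));                           % application of the preconditioner
    [t, flag] = gmres(Afun, -r, [], 0.7, 5, Mfun);     % ILUT preconditioned GMRES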
Figure 10.14: Convergence history of JDQZ with GMRES as solver and ILUT preconditioning for the respective thresholds t = 10^-2, 10^-3, 10^-4 and 10^-5 (from left to right; log10 ||r||_2 against the iteration number, test subspace computed as Bv, jmin=10, jmax=20, residual tolerance 1e-08).
The first remarkable observation is the performance of the ILUT preconditioners with relatively high drop-tolerances. Whereas the effect of the preconditioners with the same thresholds for JDQR is not impressive, here they at least improve the JDQZ process. Apparently, GMRES with ILUT succeeds in solving the correction equation well enough to build a suitable searchspace. Approximately 20 Jacobi-Davidson iterations per eigenvalue are needed for a drop-tolerance t = 10^-2, and this number decreases to 8 iterations per eigenvalue for a drop-tolerance t = 10^-5. Especially the last two convergence histories show only one or two regions with selection problems, with accelerated convergence afterwards. These intermediate results promise more improvement for smaller drop-tolerances.
Figure 10.15: Convergence history of JDQZ with GMRES as solver and ILUT preconditioning for the respective thresholds t = 10^-6, 10^-7 and 10^-8 (from left to right; log10 ||r||_2 against the iteration number, test subspace computed as Bv, jmin=10, jmax=20, residual tolerance 1e-08).
However, the convergence histories show the contrary of the expectations just made. Where the severe stagnation for the drop-tolerance t = 10^-6 can be called 'bad luck' (other test runs also show stagnations, but more spread out over the eigenvalues), the two other convergence histories are more representative. Some additional test runs, for the same drop-tolerances, result in an average number of 25 eigenvalues in 400 iterations. The drop-tolerances t = 10^-4 and t = 10^-5 perform slightly better than the other drop-tolerances; the difference is three eigenvalues. The runs with smaller drop-tolerances experience problems with the computation of the preconditioner, as there were several cases where zeroes on the diagonal of U were replaced by the drop-tolerance. These replacements make the preconditioner consistent, but affect its quality for the current operator. This explains why the convergence is hardly improved for smaller drop-tolerances. Finally, the runs for a drop-tolerance t = 0 do not lead to convergence at all: the ILUT-decomposition is singular and the process halts.

Because the matrices G and C are sparse, the ILUT-decompositions can be computed relatively cheaply (compared with the costs for dense matrices): for a drop-tolerance t = 10^-6 and a dimension of 177, the costs of one decomposition are 4·10^5 flops. However, compared with a matrix-vector multiplication with the sparse matrices, this is rather expensive. Because the maximum searchspace dimension is kept small (20), the ILUT computation is still dominant. The ILUT-decomposition has a large fill-in: the number of non-zeroes in A + B is 704, while the number of non-zeroes in L is 1900 on average and 2300 in U (which is about 10% of the total number of elements in the upper triangle). Nevertheless, a complete multiplication in a GMRES iteration, with the operator, preconditioner and projection, is still ten times cheaper than the construction of the preconditioner.
To check the accuracy of the poles, the Bode-plot of problem pz_28 is computed. Figure 10.16 shows the Bode-plot: in the left picture the poles are computed with the QZ method and with JDQZ with Gaussian elimination; in the right picture the plot for JDQZ with ILUT preconditioned GMRES with a drop-tolerance of 1e-8 is added.
Figure 10.16: Bode-plot of problem pz_28 (|H(f)| in dB against f in Hz), for QZ and JDQZ with exact solver for the correction equation (left), and with the plot for JDQZ with ILUT(1e-8) preconditioned GMRES added (right).
Concerning the accuracy, JDQZ with Gaussian elimination does not perform as well as the QZ method. The plot for JDQZ with preconditioned GMRES is good in the beginning, but because not all eigenvalues could be computed within 500 iterations, it fails after the first peak. The plot of JDQZ with unpreconditioned GMRES is absent, because earlier results indicated that only 1 eigenvalue could be computed within 500 iterations.

The ILUT preconditioner does not meet the requirements for the JDQZ method; only the computational costs are acceptable. Although the convergence is improved, so that preconditioned GMRES can be used as solver for the correction equation, only the dominant eigenvalues can be computed within reasonable time. For high dimensional problems (n > O(10^4)), this may be an alternative for the exact solver. However, in the field of pole-zero analysis, where all eigenvalues are needed, JDQZ only becomes attractive if a good preconditioner is available.

In comparison with JDQR, the JDQZ method suffers more from ill-conditioned operators in the correction equation. Of course, the operator G^-1 C - θI differs much from the operator G + θC. Where the main problem with a preconditioner for JDQR is the cost of constructing the preconditioner for a dense matrix, the problem for JDQZ can be identified as finding a preconditioner for very ill-conditioned operators. The accuracy of the JDQZ method is also of a lower level.
10.8 Application in PSS analysis
If the generalised eigenproblem has a negative eigenvalue, and hence the pole-zero problem has a positive pole, this indicates that the corresponding electric circuit is unstable. To detect these poles, a pole-zero analysis is performed. Nevertheless, these positive poles are in some cases desired, for example in periodic steady-state analysis [16]. In periodic steady-state analysis, one wants the circuit to oscillate.

In Chapter 8, a technique for identifying negative eigenvalues without computing all eigenvalues is described. The Arnoldi method is used to compute the largest eigenvalue of the transformed problem, but one can use the Jacobi-Davidson method as well. Numerical experiments show that the Jacobi-Davidson method is preferred if the largest eigenvalue (in absolute magnitude) is well separated from the other eigenvalues. However, there is another technique to compute the negative eigenvalue efficiently. In practice, the designer often has an estimate for the positive pole, and hence an estimate for the negative eigenvalue can be computed. In that case, no transformation is needed, and the Jacobi-Davidson method can be used right away, using the estimate as the target.

The numerical process to perform a periodic steady-state analysis can be improved if the oscillator frequency and an initial guess for the circuit state are available. From the theory of pole-zero analysis, it is known that the imaginary part of a pole p = p_1 + p_2 i is the frequency at which the circuit oscillates. Numerical experiments show that this frequency is a nearly perfect approximation (within finite arithmetic) of the oscillator frequency.
Furthermore, for such a pole p and corresponding eigenvector v, the following harmonic function can be defined:

    x(t) = v e^{p_1 t} e^{p_2 i t}.                                      (10.7)
This eigenvector v can now be used as initial guess for the periodic steady state. Currently, an intuitive perturbation of the DC solution is used as initial guess (which is then refined by some transient iterations), while it is expected that perturbing the DC solution by the eigenvector v gives a better approximation of the periodic function. As a result, the Poincaré process can be accelerated.
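A sketch of such an initial guess is given below; the DC solution xdc, the perturbation size delta and the pole/eigenvector pair (p, v) are assumed to be available, and the construction is only meant to illustrate the idea.

    % Sketch: initial guess for the periodic steady state from a pole p = p1 + i*p2
    % and its eigenvector v, using the harmonic function (10.7).
    p1 = real(p);  p2 = imag(p);                      % p2: the oscillation frequency
    xosc = @(t) real(v * exp(p1*t) * exp(1i*p2*t));   % harmonic function (10.7), real part
    x0 = xdc + delta*xosc(0);                         % DC solution perturbed along v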
There is more work on this topic, which goes beyond the scope of this report. See also Section 11.2.
10.9 Numerical conclusions

The direct eigenmethods QR and QZ are superior in both accuracy and robustness. They do not suffer from extremely varying spectra, while the Arnoldi method and the Jacobi-Davidson style QR and QZ methods have the property that the absolute error in the largest eigenvalues is approximately equal to the absolute error in the smallest eigenvalues. This can be concluded by observing the Bode-plots, where especially the small eigenvalues are important. A cure for this is to use Shift-and-Invert techniques for the iterative methods. The direct QZ method has the advantage that no matrix inversion is needed, as it handles generalised eigenproblems. Furthermore, it is more stable than the QR method, and possible instabilities can be detected by inspecting the diagonal values of the generalised Schur forms.

Concerning the computational costs, the QR method came out to be 10 times cheaper than the QZ method. Compared with the iterative methods, the direct methods are also faster for the larger problems (n > 100). The iterative methods may outperform the direct methods in some cases: the Jacobi-Davidson style methods are characterised by fast convergence to the eigenvalues near a target; if more targets are known, and the correction equation can be solved efficiently and accurately, quadratic convergence can be obtained. Unfortunately, the targets are in general not known. The Arnoldi method is characterised by another typical convergence behaviour: it converges slowly to all eigenvalues. However, during the process, as more eigenvalues are found, the convergence speed grows [35]. Because of this property and the fact that the Arnoldi method has less overhead, it is faster than the Jacobi-Davidson style methods if all eigenvalues are needed.
The quality of the eigenvalues computed by the Arnoldi method and the JDQR method is comparable, while the accuracy of the JDQZ method is somewhat worse. However, the quality is influenced by the way the correction equation is solved. If this equation is solved exactly, or with unpreconditioned GMRES (with an adaptive tolerance), the quality is comparable with that of the Arnoldi method. Incorporating an ILUT preconditioner for the GMRES process does significantly reduce the number of Jacobi-Davidson iterations (depending on the drop-tolerance) and the number of GMRES iterations (and thereby the norm of the residual), but the quality is only acceptable for rather small drop-tolerances (t = 10^-7 or smaller). Nevertheless, using a preconditioner has its benefits, as the number of Jacobi-Davidson iterations is reduced. The fact that this is not visible in the computational costs, because the intermediate preconditioner computations are too expensive, stresses that an efficient preconditioner update scheme is needed. The singularities of the operator must also be taken into account; because these occur more than once, preconditioners which are not based on ILU-decompositions should be considered. The projection of the preconditioner, afterwards, seems to be of less influence.

The Jacobi-Davidson style methods are not designed to compute all eigenvalues, but only a selected few eigenvalues with certain properties. With this in mind, the performance is not that bad, and it is interesting to study improvements which may make the Jacobi-Davidson style methods more suitable for computing all eigenvalues.

Finally, the advantages of the iterative methods are in the field of memory savings and computational costs if the problems become extremely large (n >> 10^3). By adopting efficient restart strategies, the iterative methods may be preferred over the direct methods (or they may even be the only option). The accuracy remains an important issue; the wide spectra, with small and large eigenvalues, are in the nature of the pole-zero problems.
Chapter 11
Conclusions and future work
11.1 Conclusions
The most important conclusion is that Jacobi-Davidson style methods are suitable for application in pole-zero analysis under the following assumptions, apart from the dimension of the problem: the eigenspectrum of the generalised eigenproblem must not be too wide, or an efficient preconditioner must be available. If one or both of these assumptions are not met, there is no special preference for Jacobi-Davidson style methods over the (restarted) Arnoldi method; on the contrary, with the typical convergence behaviour of Jacobi-Davidson in mind, the Arnoldi method should be chosen in that case. Nevertheless, if both assumptions are met, one can profit from the quadratic convergence of the Jacobi-Davidson style methods, combined with acceptable accuracy.

The wide spectra are quite in the nature of the pole-zero problem. There are always some extremal eigenvalues. This is not a problem for iterative eigenvalue solvers if the difference between the smallest and the largest eigenvalues is modest. However, there are examples where the difference is a factor 10^14. Any iterative method experiences accuracy problems in that case.

The other assumption, the availability of efficient preconditioners, raises three issues. The fact that an ill-conditioned operator must be approximated causes problems for ILU-based preconditioners, while the change of the operator per Jacobi-Davidson iteration asks for computational efficiency. The projection of the operator has less influence on the effect of preconditioning.

Arguments that Jacobi-Davidson style methods are more robust than the Arnoldi method are in this case not valid, as numerical experiments have shown. Some of these arguments are based on the current Arnoldi implementation in Pstar; some improvements for the Arnoldi implementation in Pstar are proposed.

Nevertheless, Jacobi-Davidson style methods are very applicable in stability analysis and periodic steady-state analysis, where only one or a few eigenvalues are needed. For this type of applications, Jacobi-Davidson style methods are preferred over the Arnoldi method, because of the fast convergence to the few selected eigenvalues. This convergence even becomes quadratic if the correction equation is solved exactly. Furthermore, Jacobi-Davidson style methods are suitable for high dimensional problems because the spectrum can be searched part by part. The Arnoldi method lacks this property.

Finally, the direct QR and QZ methods are superior in accuracy, robustness and efficiency for problems with relatively small dimensions. Even for larger problems, their performance is acceptable. The disadvantages are that the direct methods do not fit in the hierarchical implementation of Pstar, while the iterative methods do, and that their memory and computational requirements grow out of bounds.
11.2 Future work

Some interesting research subjects have been found or are left open in this report. They are presented in the following, non-exhaustive list.

• The use of a preconditioner for Jacobi-Davidson style methods is fruitful, as numerical experiments have shown. However, to make the preconditioner really applicable, it should be easy to update, because it is necessary to recompute the preconditioner for every new eigenvalue. Besides that, the preconditioner must be suitable for ill-conditioned problems [29].

• The (very) fast convergence of Jacobi-Davidson style methods to a few selected eigenvalues with special properties is a strong property, but makes them at the same time less applicable to the class of problems where more than a few eigenvalues are needed. Tailoring the Jacobi-Davidson style methods to meet the requirements for this type of problems is a challenging subject.

• A combination of Arnoldi iterations and Jacobi-Davidson iterations to improve the construction of the searchspace is promising. When stagnation of the Jacobi-Davidson process is detected, Arnoldi iterations can be used, like they are in the initialisation phase, to expand the searchspace. The Jacobi-Davidson process can proceed with this searchspace to obtain quadratic convergence.

• Accuracy problems arise for eigenproblems with very wide eigenspectra. Strategies which circumvent these situations, for example by splitting the extremal eigenvalues from the other eigenvalues, need to be developed. One can also think of adaptive solvers, which adapt their method of solving (for instance Shift-and-Invert) to the situation of the spectrum.

• Numerical experiments confirm that the Padé via Lanczos method outperforms all other methods for stable circuits (in computing the dominant poles), but it may fail for unstable circuits [11]. Nevertheless, one could use the estimates for the poles computed by PVL as targets for the Jacobi-Davidson style methods.

• The reduction technique, first seen in [4], to remove certain rows and columns from the ordinary eigenproblem is effective, both for the accuracy and for the computational costs. It is not inconceivable that a similar technique exists for the generalised eigenproblem.

• Applications of Jacobi-Davidson style methods in other fields of electric circuit simulation, such as PSS analysis and stability analysis, need to be researched in more detail.
11.3 Acknowledgements

During this project I have had the assistance of people from several areas of science. In arbitrary order, I would like to thank Michiel Stoutjesdijk for his information about pole-zero analysis and the interesting test problems. My thanks also go to Gerard Sleijpen for the discussions about Jacobi-Davidson methods and the improved Matlab routines of JDQR and JDQZ. I thank Jos Peters for his help with the Pstar source code and other Pstar technical facts. Stephan Houben is thanked for his sharp suggestions. Last, but not least, I thank Jan ter Maten for the daily supervision and Henk van der Vorst for the remote assistance and the discussions and ideas.
Bibliography

[1] Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., and van der Vorst, H., Eds. Templates for the Solution of Algebraic Eigenvalue Problems: a Practical Guide. SIAM, 2000.

[2] Barret, R., Berry, M., Chan, T., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1994.

[3] Bogart, Jr., T. F. Electric Circuits, second ed. Macmillan Publishing Company, 1992.

[4] Bomhof, C. Jacobi-Davidson methods for eigenvalue problems in pole zero analysis. Nat.Lab. Unclassified Report 012/97, Philips Electronics NV, 1997.

[5] Bomhof, C. Iterative and parallel methods for linear systems, with applications in circuit simulation. PhD thesis, Universiteit Utrecht, 2001.

[6] Butterweck, H. Elektrische netwerken, 1 ed. Prisma-Technika, Utrecht/Antwerpen, 1974. In Dutch.

[7] Chua, L. O., and Lin, P.-M. Computer aided analysis of electric circuits: algorithms and computational techniques, first ed. Prentice Hall, 1975.

[8] Derrick, W. R. Complex analysis and applications, 2 ed. Wadsworth Inc., 1984.

[9] ED&T/Analogue Simulation. Pstar User Guide for Pstar 4.0. Philips ED&T/Analogue Simulation, 2000.

[10] Feldmann, P., and Freund, R. W. Course Notes for Numerical Simulation of Electronic Circuits: State-of-the-Art Techniques and Challenges. http://www.wavelet.org/cm/cs/who/freund/, 1995.

[11] Feldmann, P., and Freund, R. W. Efficient linear circuit analysis by Padé approximation via the Lanczos process. IEEE Trans. Computer-Aided Design 14 (May 1995), 639-649.

[12] Fokkema, D. R., Sleijpen, G. L., and van der Vorst, H. A. Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils. SIAM J. Sci. Comput. 20, 1 (Aug. 1998), 94-125.

[13] Fraleigh, J. B., and Beauregard, R. A. Linear Algebra, 3 ed. Addison-Wesley, 1995.

[14] Golub, G. H., and Van Loan, C. F. Matrix Computations, third ed. Johns Hopkins University Press, 1996.

[15] Honkala, M., Roos, J., and Valtonen, M. New Multilevel Newton-Raphson Method for Parallel Circuit Simulation. In European Conference on Circuit Theory and Design (2001).

[16] Houben, S. Algorithms for Periodic Steady State Analysis on Electric Circuits. Master's thesis, TU Eindhoven, 2 1999.

[17] Lengowski, L. CGS preconditioned with ILUT as a solver for circuit simulation. Master's thesis, TU Eindhoven, 12 1998.

[18] McCalla, W. J. Fundamentals of Computer Aided Circuit Simulation, 1 ed. Kluwer Academic Publishers, 1988.

[19] NAG. NAG Fortran Library Routine Document F08PEF (SHSEQR/DHSEQR). http://www.nag.com/numeric/FL/manual/pdf/F08/f08pef.pdf.

[20] Parker, G. G. DC gain and response characteristics of first order systems. http://www.me.mtu.edu/~ggparker/me4700/pdfs/dcgain info.pdf.

[21] Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. Numerical recipes in C, second ed. Cambridge University Press, 1999.

[22] Rabbat, N., Sangiovanni-Vincentelli, A., and Hsieh, H. A Multilevel Newton Algorithm with Macromodeling and Latency for the Analysis of Large-Scale Non-linear Circuits in the Time Domain. IEEE Trans. Circuits Syst. CAS-26, 9 (1979), 733-741.

[23] Saad, Y. Numerical methods for large eigenvalue problems: theory and algorithms. Manchester University Press, 1992.

[24] Saad, Y. Iterative methods for sparse linear systems. PWS Publishing Company, 1996.

[25] Schilders, W., and Driessen, M. Iterative Linear Solvers for Circuit Simulation. Nat.Lab. Report 6854/95, Philips Research Laboratories, 1995.

[26] Sleijpen, G. L., and van der Vorst, H. A. A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems. SIAM Review 42, 2 (Apr. 2000), 267-293.

[27] Sleijpen, G. L., van der Vorst, H. A., and Bai, Z. Jacobi-Davidson algorithms for various eigenproblems: A working document. Preprint 1114, Universiteit Utrecht, Aug. 1999. Published in [1].

[28] Sleijpen, G. L., van der Vorst, H. A., and Meijerink, E. Efficient expansion of subspaces in the Jacobi-Davidson method. Electronic Transactions on Numerical Analysis 7 (1998), 75-89.

[29] Sleijpen, G. L., and Wubs, F. W. Exploiting multilevel preconditioning techniques in eigenvalue computations. Preprint 1117, Universiteit Utrecht, Aug. 1999.

[30] Stoer, J., and Bulirsch, R. Introduction to Numerical Analysis. Springer, 1992.

[31] Trefethen, L. Computation of Pseudospectra. Acta Numerica (1999), 247-295.

[32] van de Wiel, M. Pole-zero and stability analysis of electrical circuits. Nat.Lab. Report 6807/94, Philips Research Laboratories, 1994.

[33] van de Wiel, M. Application of pole-zero analysis methods to real life analogue electronic circuits. Nat.Lab. Technical Note 241/95, Philips Research Laboratories, 1995.

[34] van de Wiel, M., Driessen, M., and ter Maten, E. The Pstar algorithm for pole zero analysis. ED&T Internal Report, Philips Research Laboratories, 1996.

[35] van der Sluis, A., and van der Vorst, H. The convergence behavior of Ritz values in the presence of close eigenvalues. Linear Algebra and its Applications 88/89 (1987), 651-694.

[36] van der Vorst, H. A. Computational Methods for Large Eigenvalue Problems. In Handbook of Numerical Analysis, P.G. Ciarlet and J.L. Lions, Eds., vol. VIII. North-Holland (Elsevier), 2001, pp. 3-179.
Appendix A
Matrix transformations and
factorisations
In the following sections, some commonly used transformations and factorisations will be described.
A.1 Householder transformations
The Householder transformation H_v ∈ R^{n×n} is defined by

    H_v = I - (2 / (v^T v)) v v^T,                                       (A.1)

with v ∈ R^n. The product of H_v with a non-zero vector x ∈ R^n is

    H_v x = (I - (2 / (v^T v)) v v^T) x = x - (2 v^T x / (v^T v)) v = x - αv.    (A.2)

Householder transformations, also known as Householder reflections (H_v x reflects x in v⊥), can be used to transform consecutive vector components to zero. By choosing v = x ± ||x||_2 e_1, with e_1 ∈ R^n the first unit vector, H_v x becomes

    H_v x = x - (2 v^T x / (v^T v)) v
          = x - (2(||x||_2^2 ± ||x||_2 x_1) / (2||x||_2^2 ± 2||x||_2 x_1)) (x ± ||x||_2 e_1)
          = ∓||x||_2 e_1.                                                (A.3)

The sign in the expression for v is chosen to be the same as the sign of x_1, to avoid numerical cancellation. Note that any consecutive sequence of vector components can be transformed to zero by restricting the Householder transformation to that part only. For instance, if components k+1 to m need to be transformed to zero, the Householder transformation H_v becomes

    H_v = [ I_{k-1}   0      0
            0         H̃_v    0
            0         0      I_{n-m} ],                                  (A.4)

where H̃_v is restricted to span{e_k, ..., e_m}. Finally, Householder transformations have some nice properties:

1. H_v = H_v^T.
2. H_v = H_v^{-1}.
3. H_v is orthogonal.

It costs 3m flops to compute a v such that H_v transforms m vector components to zero. Application of H_v to an n × q matrix costs 4q(m+1) flops. More information about Householder transformations can be found in [14, 21].
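As a small illustration (a sketch only, with an arbitrary example vector), the construction and application of such a Householder reflection in Matlab reads:

    % Householder vector and its application, following (A.1)-(A.3).
    x = [3; 1; 4; 1];                          % example vector
    s = sign(x(1)); if s == 0, s = 1; end      % sign of x_1, to avoid cancellation
    v = x;  v(1) = v(1) + s*norm(x);           % v = x + sign(x_1) ||x||_2 e_1
    Hx = x - 2*(v'*x)/(v'*v)*v;                % H_v x = -sign(x_1) ||x||_2 e_1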
A.2 Givens transformations
Givens transformations, also known as Givens rotations, can be used to selectively transform elements of a vector to zero. A Givens transformation G_{ij} ∈ R^{n×n} is of the form

    G_{ij} = [ I_{i-1}
                        cos θ           sin θ
                               I_{j-i-1}
                       -sin θ           cos θ
                                                I_{n-j} ].               (A.5)

To zero the j-th element of y = G_{ij} x, θ must be chosen such that

    y_i = cos θ x_i + sin θ x_j,
    y_j = -sin θ x_i + cos θ x_j = 0.

This can be achieved by choosing

    cos θ = x_i / sqrt(x_i^2 + x_j^2),    sin θ = x_j / sqrt(x_i^2 + x_j^2).     (A.6)

Overflow situations can be avoided by computing cos θ and sin θ as follows:

    cos θ = 1 / sqrt(1 + (x_j/x_i)^2),    sin θ = (x_j/x_i) cos θ,               (A.7)

if |x_i| > |x_j|. If |x_j| > |x_i|, a similar computation can be made. It costs 6q flops to zero an element of A ∈ R^{n×q}, while it costs 6nq flops to zero a whole column except the first element. Givens transformations are thus twice as expensive as Householder reflections. However, if relatively few elements have to be zeroed, Givens transformations may be preferred. In [14, 21], more information about Givens rotations can be found.
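A sketch of a single Givens rotation in Matlab, with the overflow-safe formulas of (A.7) and an arbitrary example vector, is:

    % Givens rotation zeroing x(j) against x(i), following (A.5)-(A.7).
    x = [0; 3; 0; 4; 0];  i = 2;  j = 4;       % example: zero the fourth component
    if abs(x(i)) > abs(x(j))
        t = x(j)/x(i);  c = 1/sqrt(1 + t^2);  s = t*c;
    else
        t = x(i)/x(j);  s = 1/sqrt(1 + t^2);  c = t*s;
    end
    Gij = eye(length(x));
    Gij([i j], [i j]) = [c s; -s c];           % rows and columns i and j of G_ij
    y = Gij*x;                                 % y(j) is zero (up to rounding)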
A.3 QR-factorisation
For every A ∈ R^{n×m} with independent column vectors in R^n, there exists an n × k matrix Q with orthonormal column vectors and an upper-triangular invertible k × k matrix R such that

    A = QR.                                                              (A.8)

This is in fact a matrix representation of the normalised Gram-Schmidt formula

    v_j = a_j - Σ_{i=1}^{j-1} (a_j, q_i) q_i,                            (A.9)

where q_j = (1/||v_j||) v_j. Some reordering gives

    a_j = Σ_{i=1}^{j} r_{ij} q_i.                                        (A.10)

A QR decomposition can be used to solve systems of linear equations, to solve least squares problems, or in eigen-computations. QR decompositions are usually computed by successive Householder transformations. The costs for a QR decomposition are approximately n^2(m - n/3) flops. The QR decomposition and its applications are discussed in detail in [14].
A.4 Schur decomposition
It can be proved that every matrix A ∈ R^{n×n} can be transformed with an orthonormal Q ∈ R^{n×n} as

    Q^T A Q = R,                                                         (A.11)

where R ∈ R^{n×n} is an upper quasi-triangular matrix:

    R = [ R_11   R_12   ...   R_1n
                 R_22   ...   R_2n
                        ...   ...
                              R_nn ].                                    (A.12)

For a proof, see, e.g., [14]. Each diagonal block R_ii is either a 1×1 matrix or a 2×2 matrix with complex-conjugate eigenvalues. The eigenvalues of the diagonal blocks are equal to the eigenvalues of A, while eigenvectors of A are given by Qy for eigenvectors y of R. The diagonalisation of a real symmetric matrix is a special case of this Schur decomposition, with R a diagonal matrix.
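In Matlab the real Schur decomposition is available directly; a short sketch with an arbitrary example matrix:

    % Real Schur decomposition, following (A.11)-(A.12).
    A = randn(5);
    [Q, R] = schur(A, 'real');         % Q'*A*Q = R, Q orthogonal, R quasi-triangular
    err = norm(Q'*A*Q - R);            % of the order of machine precision
    ev  = eig(R);                      % eigenvalues of the diagonal blocks equal eig(A)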
Author(s): Joost Rommes
Title: Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis (Master's Thesis)
Distribution:
Nat.Lab./PI
PRB
LEP
PFL
CIP
WB-5
Briarcliff Manor, USA
Limeil–Brévannes, France
Aachen, BRD
WAH
Director:
Department Head:
Ir. G.F.M. Beenker
Ir. A. Jongepier
WAY 5.53
WAY 3.01
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS
WAY 3.57
WAY 3.65
WAY 3.57
WAY 3.55
WAY 3.59
WAY 3.71
WAY 3.55
WAY 3.69
WAY 3.71
WAY 3.61
ED&T/AS
ED&T/AS
WAY 3.79
WAY 3.65
ED&T/AS
ED&T/AS
ED&T/AS, TU/e
ED&T/AS
ED&T/AS
ED&T/AS
ED&T/AS, UU
Research
Research
Research, TU/e
Research
Research
WAY 3.77
WAY 3.67
WAY 3.73
WAY 3.69
WAY 3.73
WAY 3.79
WAY 3.73
WAY 4.55
WAY 4.91
WAY 4.85
WAY 4.01
WAY 4.91
Abstract
M.E. Alarcon Rivero
Ir. H.A.J. van de Donk
Dr. H.H.J. Janssen
Ing. M. Kluitmans
Mw.Ir. L.S. Lengowski
Dr.Ir. C. Lin
Dr.Ir. J. Niehof
Ir. M.F. Sevat
Dr. A.J. Strachan
G. Warrink
Full report
Mw.Drs. M.M.A. Driessen
Mw.Drs. E.J.M. van Durenvan der Aa
Ir. J.G. Fijnvandraat
Ir. J.C.H. van Gerwen
Ir. S.H.M.J. Houben
Dr.Ir. M.E. Kole
Dr. E.J.W. ter Maten
Ir. J.M.F. Peters
Drs. J. Rommes (10 copies)
Ir. A.J.W.M. ten Berg
Dr. T.G.A. Heijmen
Ir. P.J. Heres
Dr. R. Horváth
Dr. P.B.L. Meijer
Dr. Ir. V. Nefedov
Ir. C. Niessen
Prof.Dr. W.H.A. Schilders
Dr.Ir. P.W. Hooijmans
Dr.Ir. M.J.M Pelgrom
Dr. A.J.H. Wachters
L.M.C. van den Hoven
M. van Lier
Dr. D.R. Fokkema
Drs. M.C.J. van de Wiel
Ir. M. Stoutjesdijk
R. Duffee
Ir. J.A. Riedel
Dr. P. Vermeeren
Dr. S.P. Onneweer
Research, TU/e
WAY 4.85
Research
WAY 4.65
Research
WAY 4.77
Research
WAY 5.101
Research
WAY 5.39
Nat.Lab.
WB 2.17
Semiconductors
BF-p
Semiconductors
BF-p
Semiconductors Nijmegen
MOS4YOU
Semiconductors Nijmegen
Semiconductors CIC Nijmegen M3.573
Semiconductors Southampton MF-36
Semiconductors CIC Hamburg LA303
Consumer Electronics
IC-Lab, SFJ
Semiconductors, 811 E. Arques Ave, MS78
PO Box 3409 Sunnyvale, CA94088-3409
Full report
Prof.Dr. R.M.M. Mattheij
Dr. J.M. Maubach
Onderafdeling der Wiskunde
Scientific Computing Group
TU Eindhoven
Den Dolech 2, HG 8.38
5612 AZ Eindhoven
The Netherlands
Dr. Ir. C.W. Bomhof
Plaxis BV
P.O.Box 572
2600 AN Delft
The Netherlands
Dr. B.J.W. Polman
KU Nijmegen
Math. Inst. Fac. Wiskunde en Informatica
Toernooiveld 4
6525 ED Nijmegen
The Netherlands
Prof.Dr. S.M. Verduyn Lunel
Mathematical Institute
Leiden University
P.O.Box 9512
2300 RA Leiden
The Netherlands
Prof.Dr. H.A. van der Vorst (2 copies)
Dr. G.L.G. Sleijpen (1)
Drs. M.R. Pistorius (1)
Utrecht University
Dept. of Mathematics
P.O. Box 80.010
3508 TA Utrecht
The Netherlands
Dr.Ir. F. W. Wubs
Rijksuniversiteit Groningen
Dept. of Mathematics
P.O. Box 800
9700 AV Groningen
The Netherlands
Full report
Dr. G. Ali
Instituto per Applicazioni della Matematica
Consiglio Nazionale delle Ricerche
Napoli
Italy
Prof. Dr. Z. Bai
Department of Computer Science
University of California
One Shields Avenue
Davis, CA 95616
USA
Dr. H.G. Brachtendorf
Frauenhofer IIS, ADTM
Am Weichselgarten 3
D-91058 Erlangen
Deutschland
Dipl.Math. A. Bartel
Dr. M. Guenther
Dipl.Math. R. Pulch
Universität Karlsruhe (TH)
IWRMM, AG Wissenschaftliches Rechnen
Engesser Str. 6
D-76128 Karlsruhe
Deutschland
Prof.Dr. A. Brambilla
DEI - Dipartimento di Elettronica e Informazione
Politecnico di Milano
Via Ponzio / Piazza Leonardo da Vinci 32
I-20133 Milano
Italy
Prof.Dr. A. Bunse-Gerstner
Uni Bremen FB3
Mathematik und Informatik
Bibliothekstrasse
Postfach 330440
D-28334 Bremen
Deutschland
Dr. U. Feldmann
Infineon Technologies AG
MP TI CS ATS
Balanstr. 73
D-81541 Muenchen
Deutschland
Dr. R.W. Freund
Bell Laboratories
Room 2C-525
700 Mountain Avenue
Murray Hill, New Jersey 07974-0636
USA
Prof.Dr. G.G.E. Gielen
Katholieke Universiteit Leuven
Departement Elektrotechniek, ESAT-MICAS
Kardinaal Mercierlaan 94
B-3001 LEUVEN
Belgium
Prof.Dr. P. Rentrop
Technische Universität München
Zentrum Mathematik, SCB
D-80290 München
Deutschland
Dr. J. Roos
Helsinki University of Technology
Circuit Theory Laboratory
Department of Electrical and Communications Engineering
P.O.Box 3000, FIN-02015 HUT
Finland
Prof.Dr. J.S. Roychowdhury
Dept. of Electrical and Computer Engineering
University of Minnesota
200 Union St. SE, Room 4-155
Minneapolis MN 55455-0154
USA