SPECTRAL METHODS FOR
PARAMETERIZED MATRIX EQUATIONS
A DISSERTATION
SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL
AND MATHEMATICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Paul G. Constantine
August 2009
© Copyright by Paul G. Constantine 2009
All Rights Reserved
I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Doctor of Philosophy.
(Gianluca Iaccarino) Principal Adviser
I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Doctor of Philosophy.
(George Papanicolaou)
I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Doctor of Philosophy.
(Parviz Moin)
Approved for the University Committee on Graduate Studies.
Preface
I have always been fascinated by the process of acquiring and validating knowledge,
i.e., how we know what we know. In earlier years, this fascination resided in philosophical and religious inquiries. But somehow the metrics for quantifying the ever-present uncertainties were too subjective to satisfy my cravings. This dissatisfaction
ultimately persuaded me toward a course of study in mathematics – a field with more
rigorous concepts for objective measurements.
In modern computers, we have a powerful technology that aids our search for
knowledge and ideally provides the precise path that brought us to each new vista of
understanding. However, the epistemic considerations of these paths are still largely
unknown. Questions like "How much can we trust the results of a simulation?" often do not have a straightforward answer.
These challenging questions are the core of the field of uncertainty quantification, and it is these questions that attracted my attention; it was a natural fit for
my fundamental fascinations with knowledge. This dissertation and its mathematical manipulations represent a nudge in the direction of addressing the questions in
uncertainty quantification for scientific computing.
Abstract
In this age of parallel, high performance computing, simulation of complex physical
systems for engineering computations has become routine. However, the question
arises at the end of such a computation: How much can we trust these results? Determining appropriate measures of confidence falls in the realm of uncertainty quantification, where the goal is to quantify the variability in the output of a physical model
given uncertainty in the model inputs. The computational procedures for computing
these measures (i.e. statistics) often reduce to solving an appropriate matrix equation
whose inputs depend on a set of parameters.
In this work, we present the problem of solving a system of linear equations
where the coefficient matrix and the right hand side are parameterized by a set
of independent variables. Each parameterizing variable has its own range, and we
assume a separable weight function on the product space induced by the variables.
By assuming that the elements of the matrix and right hand side depend analytically
on the parameters (i.e. can be represented in a power series expansion) and by
requiring that the matrix be non-singular for every parameter value, we ensure the
existence of a unique solution.
We present a class of multivariate polynomial approximation methods – known
in numerical PDE communities as spectral methods – for approximating the vector-valued function that satisfies the parameterized system of equations at each point in
the parameter space. These approximations converge rapidly to the true solution in
a mean-squared sense as the degree of polynomial increases, and they provide flexible
and robust methods for computation. We derive rigorous asymptotic error estimates
for a spectral Galerkin and an interpolating pseudospectral method, as well as a more
practical a posteriori residual error estimate. We explore the fascinating connections
between these two classes of methods, yielding conditions under which both give the
same approximation. Where the methods differ, we provide insightful perspectives
into the discrepancies based on the symmetric, tridiagonal Jacobi matrices associated
with the weight function.
Unfortunately, the standard multivariate spectral methods suffer from the so-called curse of dimensionality, i.e., as the number of parameters increases, the work
required to compute the approximation increases exponentially.
To combat this
curse, we exploit the flexibility of the choice of multivariate basis polynomials in
a Galerkin framework to construct an efficient representation that takes advantage of
any anisotropic dependence the solution may exhibit with respect to different parameters. Despite the savings from the anisotropic approximation, the size of systems to
be solved for the approximation remains daunting. We therefore offer strategies for
large-scale problems based on a unique factorization of the Galerkin system matrix.
The factorization allows for straightforward implementation of the method, as well
as surprisingly useful tools for further analysis.
To complement the analysis, we demonstrate the power and efficiency of the spectral methods with a series of examples including a univariate example from a PageRank model for ranking nodes in a graph and two variants of a conjugate heat transfer
problem with uncertain flow conditions.
Acknowledgements
First and foremost, I would like to recognize the support and guidance of my adviser,
Professor Gianluca Iaccarino. From the outset of our time working together, he gave
me the freedom to explore my own peculiar inklings while judiciously directing me
towards the important questions for the broader research community in engineering.
His words from class to casual conversation shaped both my research ideas and my
professional ambitions. I am exceedingly grateful to call him a mentor, a colleague,
and a friend.
Secondly, I would like to thank the rest of my reading committee, which included
Professor George Papanicolaou and Professor Parviz Moin. Their comments and
direction – wrapped in years of experience – influenced not only the words in this
dissertation but also my general views and opinions on academic research.
I would like to thank the remainder of my oral defense committee, Professor James
Lambers and Professor Oliver Fringer, for taking time to participate in my defense,
for asking insightful questions, and for offering constructive feedback.
I would like to acknowledge the financial support from the following sources: The
Department of Energy’s PSAAP and ASC programs, The Franklin P. and Caroline
M. Johnson Fellowship, the course assistantships at the Institute for Computational
and Mathematical Engineering (ICME), and the federal Stafford loans. These generous gifts and loans from private, public, and university sources made possible my
education and this research.
I would like to express my gratitude to my collaborators – specifically to:
• Doctor David Gleich, whose brilliant mind, inspiring work ethic, and steadfast
friendship have motivated me throughout our time together in graduate school.
Our countless whiteboard scribble sessions and late night Gchats took many of
the ideas in this dissertation from gut feelings to formalisms.
• Doctor Qiqi Wang, whose remarkable ability to explain complex concepts through
simple examples clarified so many sticky points of my research.
• Doctor Alireza Doostan, whose experience and guidance were an invaluable
contribution to my education in the UQ research community and its methods.
I would also like to thank the uncertainty quantification group at Stanford for listening
to my rants and providing feedback at our weekly pizza lunches.
Thanks to all the experienced researchers who have contributed to my work via
emails, conversations, and feedback at conferences including: Michael Eldred, Dongbin Xiu, Roger Ghanem, Clayton Webster, Raul Tempone, Anthony Nuoy, Olivier
LeMaitre, Didier Lucor, Habib Najm, Youssef Marzouk, Alex Loeven, Jeroen Witteveen, and Marcel Bieri.
I would like to thank all the associates of ICME including:
• Directors Professor Peter Glynn and Professor Walter Murray, for their leadership and guidance,
• Student Services Coordinator Indira Choudhury, for making sure I signed all of
my required forms on time,
• IT Services Seth Tornborg and Brian Tempero, for keeping my computer secure
and updated.
Thanks to the faculty associated with ICME, including:
• Professor Michael Saunders, for his willingness to hear my ideas and offer constructive feedback,
• Professor Amin Saberi, for supporting my transition from the M.S. program to
the Ph.D. program,
• Professor Amir Dembo, for an outstanding course sequence in probability theory,
• the late Professor Gene Golub, whose contributions to the field of numerical analysis cannot be overstated and whose fingerprints are subtly scattered
throughout my work.
A heartfelt thanks to my officemates Doctor Michael Atkinson and Doctor Jeremy
Kozdon, who – along with Doctor Gleich – created a stimulating environment for all
manners of work and play.
Finally, I would like to thank my parents and family for their limitless encouragement and outstanding genetic material.
Contents

Preface
Abstract
Acknowledgements

1 Introduction
  1.1 Verification, Validation, and Uncertainty Quantification
  1.2 Parameterized Matrix Equations
  1.3 Spectral Methods
    1.3.1 Anisotropic Approximation
    1.3.2 Large-scale Computations
  1.4 Summary of Contributions and Future Work

2 Parameterized Matrix Equations
  2.1 Problem Definition and Notation
  2.2 Example – An Elliptic PDE with Random Coefficients
  2.3 A Discussion of Singularities

3 Spectral Methods — Univariate Approximation
  3.1 Orthogonal Polynomials and Gaussian Quadrature
  3.2 Fourier Series
  3.3 Spectral Collocation
  3.4 Pseudospectral Methods
  3.5 Spectral Galerkin
  3.6 A Brief Review
  3.7 Connections Between Pseudospectral and Galerkin
  3.8 Error Estimates
  3.9 Numerical Examples
    3.9.1 A 2 × 2 Parameterized Matrix Equation
    3.9.2 A Parameterized Second Order ODE
  3.10 Summary
  3.11 Application — PageRank

4 Spectral Methods — Multivariate Approximation
  4.1 Tensor Product Extensions
  4.2 Non-Tensor Basis Functions
  4.3 A Multivariate Spectral Galerkin Method
  4.4 L^2 Decompositions
    4.4.1 ANOVA Decomposition
    4.4.2 Fourier Series
    4.4.3 Connections Between ANOVA and Fourier Series
  4.5 Developing A Heuristic
    4.5.1 Incrementing a Basis Set
    4.5.2 Generating a Score Vector
    4.5.3 Computing The Dimension Weights
    4.5.4 Estimating the Curvature Parameters
    4.5.5 Stopping Criteria
    4.5.6 Algorithm
  4.6 Numerical Examples
    4.6.1 Anisotropic Parameter Dependence
    4.6.2 Small Interaction Effects
    4.6.3 Large Interaction Effects
    4.6.4 High Dimensional Problem
  4.7 Summary

5 Strategies for Large-Scale Problems
  5.1 A Weakly Intrusive Paradigm
  5.2 Galerkin with Numerical Integration (G-NI)
    5.2.1 A Useful Decomposition
    5.2.2 Eigenvalue Bounds
    5.2.3 A Least-Squares Interpretation
    5.2.4 Iterative Methods
    5.2.5 Preconditioning Strategies
  5.3 Parameterized Matrix Package — A MATLAB Suite
  5.4 Application — Heat Transfer with Uncertain Material Properties
    5.4.1 Problem Set-up
    5.4.2 Solution Method
    5.4.3 Results
  5.5 Summary

6 An Example from Conjugate Heat Transfer
  6.1 Problem Description and Motivation
    6.1.1 Mathematical Formulation
    6.1.2 Uncertainty sources
    6.1.3 Objective
  6.2 A Hybrid Propagation Scheme
    6.2.1 Galerkin method for the energy equation
    6.2.2 Collocation method for the modified system
    6.2.3 Analysis of computational cost
  6.3 Results
    6.3.1 Numerical convergence and verification
    6.3.2 A physical interpretation

7 Summary and Conclusions
  7.1 Summary of Results
  7.2 Future Work
    7.2.1 Improved Heuristics for Choosing a Polynomial Basis
    7.2.2 Locating and Exploiting Singularities
    7.2.3 Nonlinear Models
    7.2.4 Software and Benchmark Problems
    7.2.5 Alternative Approximation Methods
  7.3 Concluding Remarks

Bibliography
List of Tables

5.1 The weights computed with the ANOVA-based method for choosing an efficient anisotropic basis.
List of Figures

2.1 Plotting the response surface x0(s) that solves equation (3.66) for different values of ε.
3.1 The convergence of the spectral methods applied to equation (3.66). The figure on the left plots the L2 error as the order of approximation increases, and the figure on the right plots the residual error estimate. The stairstep behavior relates to the fact that x0(s) and x1(s) are odd functions over [−1, 1].
3.2 The convergence of the residual error estimate for the Galerkin and pseudospectral approximations applied to the parameterized matrix equation (3.73).
3.3 The x-axis counts the number of points in the Gaussian quadrature rule for the expectation and standard deviation of PageRank. This is equivalent to the number of basis functions in the pseudospectral approximation. The y-axis measures the difference between the approximation and the exact solution. Solid circles correspond to expectation and plus signs correspond to standard deviation. The different colors correspond to the following Beta distributions for α: β(2, 16, 0, 1) – blue, β(0, 0, 0.6, 0.9) – salmon, β(1, 1, 0.1, 0.9) – green, β(−0.5, −0.5, 0.2, 0.7) – red, where β(a, b, l, r) signifies a Beta distribution with parameters a and b and endpoints l and r.
4.1 For d = 2, the shaded squares represent the included multi-indices for the various basis sets with n = 10. The weighted index set with no curvature has weights 1 and 2. The weighted index set with curvature has curvature parameter p = 0.7.
4.2 Tensor product pseudospectral coefficients (color) of a solution with anisotropic parameter dependence along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation.
4.3 Tensor product pseudospectral coefficients (color) of a solution with weak interaction effects along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation.
4.4 Tensor product pseudospectral coefficients (color) of a solution with strong interaction effects along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation.
4.5 Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with a full polynomial basis.
4.6 Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with an ANOVA-based weighted basis.
4.7 Convergence of the weights in the ANOVA-based weighted scheme as n increases.
4.8 Convergence of the residual error estimate for both the ANOVA-based anisotropic basis and the full polynomial basis plotted against the number of basis elements in each approximation.
5.1 Mesh used to compute the temperature distribution.
5.2 The number of terms as n increases in the weighted, ANOVA-based anisotropic basis compared to the number of terms in the full polynomial basis.
5.3 Plotting the residual error estimate of the G-NI approximation with the weighted, ANOVA-based polynomial basis.
5.4 The expectation (above) and variance (below) of the temperature field φ over the domain.
6.1 A turbine engine cooling system with a pin array cooling system.
6.2 Computational mesh for the two-dimensional cylinder problem.
6.3 Schematic of uncertain inflow conditions. The arrows represent the stochastic inflow conditions, and the shading represents the heat flux on the cylinder wall.
6.4 Convergence of the variance of the Galerkin approximation TN as the number of terms in the expansion N increases. Each line represents the convergence for each quadrature rule. The convergence tolerance is 10^−5.
6.5 Convergence of the variance of the Galerkin approximation T4 as the number of points in the quadrature rule M increases. The convergence tolerance is 10^−4.
6.6 Approximate expectation as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.
6.7 Approximate variance as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.
6.8 Approximate conditional variance at s1 = s2 = 0 as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.
Chapter 1
Introduction
The advent of parallel, high performance computing has brought with it the potential
for high-fidelity simulation of physics and engineering systems. The last fifty years have
seen an explosion in computing power and a comparable increase in computational
methods for approximating the solution of differential equations and other models
that describe complex physical phenomena. One example is the Department of Energy’s Advanced Simulation and Computing (ASC) program, whose purpose is to
develop simulation capabilities that assess the performance, safety, and reliability of
the nation’s nuclear weapons stockpile. Programs such as ASC have driven research
in computational engineering subdisciplines from fundamental algorithm development
to computer architecture. In general, the goal of such large-scale simulation is to approximate an output quantity of interest given a model, such as a set of coupled
partial differential equations, and the problem data, such as forcing terms, model
parameters, and initial/boundary conditions.
Once the model has been discretized, the code written and compiled, the problem data chosen, and the simulation executed, an intriguing question surrounds the
computed output: How much can we trust these results?
The trend in computational engineering is to treat numerical simulation like an
experiment — one that can explore models and parameter values that are infeasible
or prohibitively expensive for physical experiments. But experimental data typically
include a measure of uncertainty related to measurement errors or inherent variability
of the observed system. This uncertainty measure provides a level of confidence for
the results of the experiment. Is there an analog to this measure of confidence for
numerical simulation? Can we generate something like a confidence interval for the
output of the simulation?
1.1 Verification, Validation, and Uncertainty Quantification
When the United States signed the Comprehensive Nuclear-Test-Ban Treaty in 1996,
these sorts of questions gained prominence in the national labs as weapons certification
procedures shifted from test-based methodologies to simulation-based methodologies.
Since then, the questions have been refined and expanded, a jargon has emerged along
with a set of mathematical tools, and research in verification and validation (V&V)
and uncertainty quantification (UQ) was born [71]. Moreover, other simulation-based research communities — particularly researchers in computational mechanics
and computational fluid dynamics — realized the importance of ascribing confidence
measures to simulation results, and they have become very active in V&V and UQ studies [70].
The oft-cited AIAA Guide [1] defines the terms verification and validation as
follows:
• Verification: The process of determining that a model implementation accu-
rately represents the developer’s conceptual description of the model and the
solution to the model.
• Validation: The process of determining the degree to which a model is an
accurate representation of the real world from the perspective of the intended
uses of the model.
A more colloquial definition with an appealing mnemonic reads: verification tries to answer the question, "Are we solving the equations correctly?" while validation attempts to answer, "Are we solving the correct equations?" In practice, verification involves
ensuring that simulation codes are running properly via software development best
practices, debugging, and test cases that compare results to a known analytical or
manufactured solution [49]. Validation procedures are not nearly as well defined
since there are only guidelines and heuristics for characterizing the comparison of
simulation results to experimental data.
However, one undeniably critical component in the validation process is quantifying the uncertainties associated with a given mathematical model. Before one can
quantitatively compare simulation to experiment, one must understand the effect of
input uncertainties on the computed output. An input uncertainty may be a range or
probability distribution associated with a model parameter, or it may be a spatially
varying random process associated with material properties or boundary conditions.
Other prevalent examples of input uncertainties include geometric inconsistencies
from manufacturing processes and noise associated with model forcing terms. Exploring the decisions involved in representing these various types of uncertainty falls
outside the scope of this work; we mention the following terms merely to provide
context.
The UQ community has broadly divided uncertainties into two categories: (i)
Aleatory uncertainty describes variability that is inherent in the system; it is irreducible in the sense that further measurements will not reduce the variability. A
natural choice for modeling this type of uncertainty is with probability models, which
assign density functions to input quantities. (ii) Epistemic uncertainty, on the other
hand, derives from lack of knowledge; it is reducible in the sense that further measurements add to existing knowledge. This is sometimes called model form uncertainty,
since a mathematical model can be tuned or modified to match new observations. A
debate currently exists over whether or not probability theory contains the appropriate set of tools for representing epistemic uncertainty. Alternative methods have
been proposed including evidence theory and interval arithmetic [42].
We will assume the uncertainties of interest can be adequately represented by a
finite — though potentially large — set of parameters. If the uncertainties originate from model parameters, then this assumption is entirely natural. If the model
calls for a spatially or temporally varying random process to represent, for instance,
boundary conditions or material properties, then this can often be approximated by
a truncated parametric series such as the Karhunen-Loeve expansion [58]. (We will
not address the error made by truncating the infinite series, but we assume that
this modeling choice was justified by other considerations.) We refer to the valid
range of the parameters as the parameter space. This space immediately becomes a
component of the domain of the model output due to the output’s functional relationship to the parameters. In general, the parameters representing the uncertainty
may have some underlying correlation structure depending on how they are derived.
Unfortunately, the approximation techniques we consider for exploring the functional
relationship require the parameters to be mutually independent. Addressing any correlation amongst the parameters is still an active area of research and falls outside
the scope of this work.
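For concreteness, a truncated Karhunen-Loeve expansion of a random field a takes the form (a standard construction, written here only to fix the idea; the notation used later is set in Chapter 2)
\[
a(x,\omega) \;\approx\; \bar{a}(x) \;+\; \sum_{i=1}^{d} \sqrt{\lambda_i}\, \phi_i(x)\, s_i(\omega),
\]
where \bar{a} is the mean field, (\lambda_i, \phi_i) are the dominant eigenpairs of the covariance operator of a, and the uncorrelated random variables s_1, \dots, s_d become the parameters of the resulting parameterized problem.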
1.2 Parameterized Matrix Equations
In recent literature, research in UQ has been closely linked to work in numerical
methods for partial differential equations with stochastic input quantities. One of the
first to examine such models was Ghanem [33], who developed his spectral stochastic finite element method in the context of mechanics models with stochastic material properties. Spurred by
these developments, many authors began investigating methods for elliptic equations
with stochastic coefficients [5, 7, 62, 28], while others extended this work to more
general fluid flow models [61, 52, 54, 45, 26]. One finds a common recipe in the
elliptic problems and steady flow formulations: First, the problem is formulated on
a tensor product of the spatial domain and an abstract probability space, and uncertain problem data are modeled as infinite dimensional stochastic processes. Next,
the stochastic processes are approximated by a multivariate function of a finite set of
parameters, where the parameters are interpreted as random variables. This yields
a perturbed version of the original problem, and questions of well-posedness must
be reconsidered. Each of the parameters induces a coordinate direction in the tensor product domain, which implies that the sought-after solution is a multivariate
function of both the spatial coordinates and the parameters. Typically the first step
in devising a computational method is to perform a standard discretization, such as
finite element or finite difference, in the spatial domain. For linear problems, the spatial discretization results in a linear system of equations for the degrees of freedom.
If there were no parametric dependence in the problem, then this linear system could
be solved with any standard solver. However, in contrast to deterministic models, the
coefficient matrix and the right hand side of the linear system of equations derived
from the stochastic problem depend on the values of the input parameters. Thus, we
have arrived at a parameterized matrix equation. At this point, we have not completed
the recipe for the full discretization of the stochastic problem. Nevertheless, we will
pause here and use this as a primary motivation for a general study of parameterized
matrix equations.
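To fix notation informally (the precise definitions appear in Chapter 2), the model problem reads: given a matrix-valued function A(s) and a vector-valued function b(s) of the parameters s = (s_1, \dots, s_d), find x(s) such that
\[
A(s)\, x(s) \;=\; b(s) \qquad \text{for all } s \in \Gamma = \Gamma_1 \times \cdots \times \Gamma_d,
\]
where each \Gamma_k is the range of the k-th parameter and the product space \Gamma carries a separable weight function w(s) = w_1(s_1) \cdots w_d(s_d).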
The primacy of matrix equations was recognized very early in the development
of scientific computing by distinguished researchers such as von Neumann [85] and
Golub [37]. They realized that the solution to many science and engineering models
could be approximated by solving an appropriate matrix equation or sequence of matrix equations. Prominent examples include linear or linearized differential equation
models, general optimization methods, and least-squares model fitting. As the focus
now shifts from computing the solution to quantifying the uncertainty in the computed solution, the natural choice for analyzing proposed computational methods is
to incorporate the uncertainty directly into the matrix equations. To be sure, this is
not a substitute for a priori analysis of the effects of uncertainty on a given model,
but it does provide a touchstone for algorithm development.
There are many models used for expressing uncertainty in a matrix equation. We
are loose with semantics here, since these models were developed to address errors,
perturbations, or noise in the problem data; the interpretation may be different,
but the mathematics translates remarkably well. The first such model comes from
investigations into round-off error in matrix computations [88]. In this setting, each
stored floating point number is assumed to be perturbed by something on the order
of machine precision from the intended value. Then one can examine the effects of
this perturbation — which is typically assumed to be linear — on the computed
solution. Kato [48] tackles a more general perturbation model where the matrix
operator depends analytically on a parameter. His focus is primarily on the eigenvalue
problem, but some of his results will be quite useful for our analysis. Sun [81] extends
this work to the case of analytic dependence on multiple parameters. In the total least
squares model [35], the goal is to find the solution that minimizes the effects of the
errors assumed to be in both the matrix and right hand side. However, this approach
tells nothing about the effects of the errors, only that those effects are minimized in
the computed approximation. Another popular model for representing uncertainty is
the interval matrix equation [2]. In this model, each element of the matrix and right
hand side is prescribed by two endpoints of an interval. The goal is then to construct
a vector of intervals that bounds all possible solutions to the linear system of interval
equations.
To demonstrate their utility, we mention some other examples of parameterized
matrix equations not connected to discretizing differential equations with stochastic
inputs. The PageRank model [72] is one example that we will examine in detail
in Chapter 3. It computes a vector that ranks the nodes in a graph according to
the link structure. This model depends on a parameter that represents the behavior
or an idealized random surfer. The paths taken by the random surfer constitute a
Markov chain on the graph nodes, and the ranking vector can be interpreted as the
stationary distribution of this chain. However, the ranking depends on the value of
the parameter. Thus we examine a modification where we assume a distribution for
the parameter and compute statistics of the ranking over the range of the parameter.
Another parameterized matrix equation is found in models for nonlinear image deblurring [17], where the parametric dependence in the matrix represents blurring of the
image pixels. Some models for electronic circuit design beget parameterized systems
of equations [56], where the parameters represent varying geometric configurations of
the chip. Finally, we mention a recent multivariate rational interpolation scheme [86]
where each evaluation of the interpolant requires the solution of an optimization
problem to minimize the error. This minimization problem can be formulated as a
matrix equation such that the elements of the matrix depend on the point where the
interpolant is evaluated.
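Returning to the PageRank example, the following sketch illustrates the parameterized point of view; it is hypothetical code (not the implementation used in Chapter 3) and assumes the standard linear-system form of PageRank with a column-stochastic link matrix P, a teleportation vector v, and a damping parameter alpha. Brute-force sampling of alpha stands in here for the pseudospectral approximation developed in Chapter 3.

```python
import numpy as np

def pagerank(P, v, alpha):
    """Solve (I - alpha * P) x = (1 - alpha) * v for the ranking vector x."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * P, (1.0 - alpha) * v)

# Tiny 3-node graph: column-stochastic link matrix and uniform teleportation.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
v = np.ones(3) / 3.0

# Treat alpha as a random parameter (here uniform on [0.1, 0.9]) and estimate
# statistics of the ranking over its range by sampling.
rng = np.random.default_rng(0)
samples = np.array([pagerank(P, v, a) for a in rng.uniform(0.1, 0.9, size=2000)])
print("mean ranking:", samples.mean(axis=0))
print("std  ranking:", samples.std(axis=0))
```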
While parameterized matrix equations occur in a host of unrelated computational
models, we know of no systematic treatment of them as a proper subject. This is likely
because many of the analysis results are straightforward to derive, such as the fact that
each component of the solution is a rational function of the parameters. Such results,
however, are immensely important when attempting to derive computational methods
for approximating the vector-valued solution. For example, with this information, we
now have a concrete approximation question: what is the best way to approximate
multivariate rational functions? When we turn our attention to spectral methods,
we can then ask how well do polynomials approximate rational functions? These
simple questions have not emerged in the UQ literature. We address them by offering
the parameterized matrix equation as a general model problem for both analysis and
algorithm development.
In Chapter 2, we present an exposition of parameterized matrix equations and
make statements characterizing the solution. The key assumption that we will make
is that the matrix is nonsingular at all points in the parameter space; we will also
discuss what happens when this assumption is not satisfied. We will also assume that
the elements of the matrix and right hand side depend analytically on the parameters,
i.e. each element has a convergent power series in some region that contains the
parameter space. These two assumptions will imply that the solution is also an
analytic function of the parameters.
One question that immediately arises is: what, specifically, do we want to compute?
Some applications ask only for bounds on the solution, while others need an approximate functional relationship — or response surface — with respect to the parameters
for surrogate modeling. In the probabilistic context, we would like to estimate the
expectation and variance, or the probability that the solution exceeds some critical
threshold. These statistics can be formulated as high dimensional integrals over the
parameter space, which are intimately tied to the approximation methods. When we
refer to “solving” the parameterized system, we mean computing an explicit approximation of the vector-valued function that satisfies the parameterized linear system
of equations for all values of the parameters. This is the most comprehensive of the
possible computations, since many statistics can be estimated from this approximation.
1.3 Spectral Methods
We examine a class of polynomial approximation methods known in the context of
numerical PDEs as spectral methods [16, 43, 12, 39]. In their most basic form, these
methods are characterized by a finite degree global polynomial approximation to the
function of interest, whether it is the solution to a partial differential equation or the
solution of a parameterized matrix equation. We will not delve into the long, rich
history of polynomial approximation, but we will present the necessary background
theory — now considered classical — in Chapter 3, including relevant facts about
orthogonal polynomials, Fourier series, Gaussian quadrature, and Lagrange interpolation. Spectral methods rose to prominence as numerical methods for PDEs in the
70s and 80s, and their use is now widespread. Different varieties are characterized
by the choice of basis functions — typically trigonometric or algebraic orthogonal
polynomials — and the method for treating boundary conditions. Their popularity
results from the so-called exponential (i.e. geometric) asymptotic convergence rate
in the mean-squared norm as the order of polynomial approximation increases for
infinitely smooth solutions. Even solutions with a finite number of continuous derivatives typically enjoy a high algebraic rate of convergence, which is dictated by how
many derivatives are continuous.
This rapid convergence was the primary attraction of spectral methods for researchers in UQ who were working with differential equations with random inputs.
Before their introduction, the standard was to employ Monte Carlo (MC) methods
to sample the inputs, compute the output for each sample, and aggregate statistics of
the output [41]. In fact, this is still the most widely used method in practice due to
its robustness and ease of implementation. However, the MC methods suffer from a
dreadfully slow convergence rate proportional to the inverse of the square root of the
number of samples. And if each sample evaluation is expensive — such as the solution
of a PDE — then obtaining hundreds of thousands of samples may be entirely infeasible. Thus, the initial applications of spectral methods showed orders-of-magnitude
reduction in the work needed to estimate statistics with comparable accuracy [90].
Such results spurred interest in applying spectral methods to differential equations
with stochastic (i.e. parameterized) inputs.
Ghanem was one of the first on this path [33]. He introduced spectral methods within his spectral stochastic finite element method under the name polynomial
chaos expansion, which uses a basis of multivariate Hermite polynomials to span the
set of square-integrable functions on the parameter space. He justified this terminology by referring back to the work of Wiener [87], whose chaos expansions were an
extension of Hilbert theory to infinite dimensional stochastic processes. The polynomial chaos methods come in intrusive and non-intrusive flavors: The intrusive
variety is a Galerkin projection method that requires the solution of a large, coupled
system of equations to compute the coefficients of the expansion, whereas the non-intrusive variety uses existing deterministic codes — in the way MC methods do — to
compute the coefficients of a pseudo-projection. Xiu and Karniadakis [90] extended
Ghanem’s method to basis functions from the Askey family of orthogonal polynomials and dubbed it the generalized polynomial chaos. Around the same time, Deb,
et al. [20] introduced a method for elliptic problems with stochastic coefficients and
labeled it the stochastic Galerkin method; the p-version of this has strong connections
to spectral methods. This work was done in a Galerkin framework, where one seeks
a finite dimensional approximation such that the residual of the equation is orthogonal to the approximation space. Paralleling the development of spectral methods for
PDEs, the next advance was the introduction of spectral collocation methods. Xiu
and Hesthaven [89] were the first to introduce the collocation idea in this context, and
they were quickly followed by Babuska et al., who popularized the phrase stochastic collocation [6]. Since the problems lack a differential operator in the parameter
space, the collocation methods reduce to Lagrange polynomial interpolation on a set
of quadrature points, which can be implemented non-intrusively while retaining an
asymptotic convergence rate similar to the Galerkin methods.
Instead of following the nomenclature from this body of recent literature, we prefer
the terminology from the spectral methods communities, which is well-established
amongst the larger numerical analysis and engineering communities. This amounts
to avoiding the term “stochastic” when describing the approximation methods. In
this way we hope to connect the probabilistic interpretations to existing analyses.
Contrary to what the flurry of research activity may suggest, spectral methods
have significant drawbacks. Since these methods produce global approximations, they
have trouble resolving solutions with local behavior, such as rapid oscillations or discontinuities. Also, they are not well-suited for the complex geometries that arise
in most engineering applications. But these particular drawbacks are alleviated for
the parameterized matrix equation, since (i) the assumptions of non-singularity and
analytic parameter dependence yield infinitely smooth solutions, and (ii) the tensor product parameter space implies that the domains are typically hyperrectangles.
From this perspective, the spectral methods are ideal. Moreover, the solution of the
parameterized matrix equation has no boundary conditions to satisfy, which dramatically simplifies implementation. In Chapter 3, we derive a spectral Galerkin method
and an interpolatory pseudospectral method for systems that depend on a single parameter. Using classical theory, we derive error estimates showing that both methods
have a similar asymptotic rate of geometric convergence, and this rate is related to
the size of the region of analyticity. For many problems in practice, the region of
analyticity is determined by the nearest point outside the domain where the parameterized matrix is singular. Often this point is related to some existence or stability
criteria for the underlying model, e.g. the point where the parameterized coefficients
of an elliptic PDE reach zero. If it is close to the boundary of the domain, then the
convergence rate can degrade considerably.
We will see in the derivations that the Galerkin and pseudospectral varieties differ
dramatically in implementation. Computing the Galerkin coefficients requires the solution of a constant linear system of equations that is n times larger than the original
parameterized system, where n is the order of the approximation. In contrast, the nth
order pseudospectral approximation can be computed by solving the parameterized
system at the points in the parameter space given by an n-point Gaussian quadrature rule. In UQ parlance, this is the difference between an intrusive method and a
non-intrusive method. As the labeling suggests, a non-intrusive method takes full advantage of a solver written for the constant matrix equation resulting from evaluating
the parameterized matrix equation at a point in the parameter space. In contrast,
an intrusive method requires one to write a new solver for the larger linear system
of equations derived for the Galerkin coefficients. Clearly, non-intrusive methods are
embarrassingly parallelizable and take advantage of performance enhancements built
for the related constant matrix equation. Immediately, one encounters the question:
does the increased accuracy of the intrusive Galerkin method justify the extra effort
in implementation? This question continues to puzzle researchers in UQ. To address
this question, we present a rigorous comparison of Galerkin and pseudospectral methods in Chapter 3 using a novel approach based on the symmetric, tridiagonal Jacobi
matrices associated with the orthogonal polynomial basis. This analysis reveals the
conditions under which the two approximations are equivalent, and it uncovers a
wealth of fascinating relationships when the methods produce differing results. In
particular, we show that each approximation method can be viewed as solving a
truncated infinite system of equations; the difference between the two methods lies
in when the truncation is performed — before or after applying the parameterized
operator. This points to numerous strategies for efficient implementation. All of these
results for the single parameter case extend to the case of multiple parameters when
the multivariate basis functions are constructed using tensor products of univariate
basis functions; we make this explicit at the beginning of Chapter 4. However, this
construction is highly impractical — if not infeasible — due to the dramatic cost
increase associated with tensor approximations.
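As a concrete, if hypothetical, illustration of the non-intrusive recipe for a single parameter — it assumes a uniform weight on [-1, 1], so the orthonormal polynomials are normalized Legendre polynomials and the natural quadrature is Gauss-Legendre — the pseudospectral coefficients are obtained by solving the system at the quadrature points and applying a discrete projection:

```python
import numpy as np
from numpy.polynomial import legendre

def pseudospectral_coeffs(A, b, n):
    """Pseudospectral (discrete-projection) coefficients of x(s) solving
    A(s) x(s) = b(s), using an n-point Gauss-Legendre rule on [-1, 1]."""
    pts, wts = legendre.leggauss(n)
    wts = wts / 2.0                      # normalize: uniform density on [-1, 1]
    # The only solver calls needed: one constant matrix equation per point.
    X = np.array([np.linalg.solve(A(s), b(s)) for s in pts])
    coeffs = []
    for k in range(n):
        e_k = np.zeros(n); e_k[k] = 1.0
        phi_k = legendre.legval(pts, e_k) * np.sqrt(2 * k + 1)  # orthonormal Legendre
        coeffs.append((wts * phi_k) @ X)                        # discrete projection
    return np.array(coeffs)              # row k holds the degree-k coefficient vector

# A small 2-by-2 parameterized matrix equation, nonsingular for all s in [-1, 1].
eps = 0.5
A = lambda s: np.array([[1.0 + eps, s], [s, 1.0]])
b = lambda s: np.array([2.0, 1.0])
print(pseudospectral_coeffs(A, b, 6))
```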
1.3.1 Anisotropic Approximation
The most serious disadvantage of spectral methods occurs when the matrix and/or right
hand side depend on multiple parameters. For this case, we have entered the realm of
multivariate approximation, and we quickly encounter the so-called curse of dimensionality [40]. Loosely speaking, this curse reflects the harsh reality that the cost of
constructing an accurate approximation increases exponentially as the number of parameters increases. In many practical cases, we expect the dimension of the parameter
space to be at least moderately high, particularly when the parameters represent an
approximation of an infinite dimensional stochastic process. Thus, heuristics for combating the curse are essential. This is a very active research area within UQ, with new
strategies proposed regularly at conferences and in journals. Some prominent techniques include sparse-grid interpolation schemes [68], hierarchical basis functions [53],
low-rank approximation methods [23], and reduced order modeling [78] — all of which
share the goal of reducing the computational work necessary to compute a sufficiently
accurate multivariate approximation. In some special cases, these methods may have
error estimates that are formally independent of the number of parameters. But it is
unreasonable to expect this type of result in general.
In Chapter 4, we present a heuristic for efficient multivariate approximation using
the main effects and interaction effects from a functional ANOVA decomposition [57]
as a guide to uncover the most important coefficients of the Galerkin approximation.
The ANOVA decomposition has been useful in machine learning and data analysis
communities for determining the most important independent variables in a model.
It has also been used to develop quasi-Monte Carlo methods for high dimensional
integration. Its use in multivariate approximation is not new, but it continues to
gain status as demand for high dimensional approximation methods increases. It is
also closely connected to the Fourier series which underlies the spectral methods; we
reveal this connection in greater detail in Chapter 4.
The central idea of our heuristic is as follows. If we knew the true values of
the coefficients of the multivariate Fourier expansion, then we could include only the
most important basis functions for an accurate, finite order approximation. The error
estimates for the Galerkin approximation depend on the fact that the Fourier coefficients decay asymptotically as the order of the associated basis function increases.
Therefore, by finding an approximate measure of the decay along the coefficients associated with each parameter, we can take advantage of any anisotropic dependence
(i.e. parameters whose variation affects the solution more than others) and drastically
reduce the number of necessary basis functions in the approximation. This measure
of anisotropy can be computed by estimating the variance contribution from each
of the main effects functions in a functional ANOVA decomposition and comparing
them with the total variance of the solution. (These are sometimes called the Sobol
indices [3].) Additionally, we can use the variance contribution from each interaction
effect to adjust the number of cross-term basis functions used to capture the functional dependence on subsets of the parameters. The algorithm repeats the following
steps until a termination criterion is satisfied: (i) for a given set of basis functions,
compute a Galerkin approximation to the solution of the parameterized matrix equation, (ii) use the computed coefficients to estimate the variance contribution for each
subset of parameters, and (iii) use the estimated variance contributions to select an
efficient basis set for the next Galerkin approximation. We demonstrate the power
of this heuristic with a set of representative numerical examples. The upside of the
curse of dimensionality is that simple reduction ideas can dramatically reduce the
work required to compute an efficient approximation.
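Step (ii) of this loop can be sketched as follows. The fragment below is hypothetical (the actual heuristic of Chapter 4 also scores interaction effects and curvature); it assumes an orthonormal polynomial basis indexed by multi-indices, so that the solution variance is the sum of squared coefficients and the main effect of parameter i is carried by the basis functions that depend on s_i alone.

```python
import numpy as np

def main_effect_fractions(coeffs, d):
    """Estimate per-parameter variance fractions (Sobol-style main effects) from
    Galerkin coefficients. `coeffs` maps a multi-index (tuple of length d) to the
    coefficient vector of the corresponding orthonormal basis function."""
    main = np.zeros(d)
    total = 0.0
    for alpha, c in coeffs.items():
        if all(a == 0 for a in alpha):
            continue                         # the constant term carries no variance
        v = float(np.dot(c, c))              # variance contribution of this term
        total += v
        active = [i for i, a in enumerate(alpha) if a > 0]
        if len(active) == 1:
            main[active[0]] += v             # pure main effect in one parameter
    return main / total if total > 0 else main

# Example with d = 3: coefficients decay much faster in s2 and s3 than in s1,
# so the heuristic would allot more basis functions to the s1 direction.
coeffs = {(0, 0, 0): np.array([1.0, 2.0]),
          (1, 0, 0): np.array([0.50, 0.40]),
          (2, 0, 0): np.array([0.20, 0.10]),
          (0, 1, 0): np.array([0.10, 0.05]),
          (0, 0, 1): np.array([0.02, 0.01]),
          (1, 1, 0): np.array([0.03, 0.02])}
print(main_effect_fractions(coeffs, d=3))
```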
One important consequence of this technique is that it reveals when the full problem decouples into smaller sub-problems via the variance contributions of the interaction effects. For example, if the variance contribution is zero from the interaction
effect between two parameters, then the problem will split into two smaller, uncoupled problems — one for each parameter. This dependence-revealing property can
greatly reduce the necessary work for problems with unknown, complex parameter
dependence, such as multi-physics models with multiple, independent sources of uncertainty. We demonstrate this for a conjugate heat transfer model in Chapter 6.
1.3.2 Large-scale Computations
Arguably, the most pressing challenge for computational methods in UQ is the magnitude of the computations. Non-intrusive techniques like pseudospectral methods may
require hundreds of thousands of evaluations to compute accurate approximations
for multiple parameter systems. If each of those evaluations corresponds to solving
a large linear system of equations (e.g. a properly refined three-dimensional PDE
model), then these methods quickly exceed the limits of the most powerful existing
computing resources. A Galerkin method can reduce the cost by selectively choosing
the basis functions as suggested in our heuristic, but any solution that significantly
depends on many parameters will always require a tremendous number of basis functions for an accurate representation. Even forming and storing the linear system for
these coefficients may be infeasible for many platforms.
To help cope with the scale of these computations, we propose an algorithmic
paradigm that falls between the all-or-nothing distinctions of intrusive and non-intrusive. For parameterized matrix equations, non-intrusive methods use only the
solution of the parameterized system evaluated at a set of parameter values. In contrast, the intrusive method must solve a related system of equations many times
larger than the original parameterized system. Between these extremes lies an unexplored middle ground. In the weakly intrusive paradigm, we allow multiplication
of an arbitrary vector against the parameterized matrix evaluated at a point in the
parameter space, as well as evaluations of the parameterized right hand side. This
is comparable to matrix-free solvers such as Krylov-based iterative methods that use
less storage and require only matrix-vector multiplies. Under the weakly intrusive
paradigm, we take full advantage of any sparsity in the parameterized matrix, and
the only code required is to form the matrix and right hand side at a given parameter
value. The following mnemonic may be helpful for distinguishing the weakly intrusive method: Non-intrusive methods evaluate the solution, whereas weakly intrusive
methods evaluate the operators.
One strong advantage of the weakly intrusive paradigm is that it permits the
computation of a residual error estimate for a given approximation. This is comparable to residual error estimates for the solution of constant matrix equations; the
residual is bounded by a constant times the true error. It can also be thought of as
analogous to the a posteriori error estimates used in adaptive finite element methods. This type of
error estimate is not available in the strictly non-intrusive paradigm. We derive the
residual error estimate in Chapter 3 and discuss its computation in Chapter 5.
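In symbols: if \tilde{x}(s) is a computed approximation, the residual r(s) = b(s) - A(s)\tilde{x}(s) can be evaluated anywhere in the parameter space using only operator evaluations, and since x(s) - \tilde{x}(s) = A(s)^{-1} r(s), the nonsingularity assumption gives
\[
\|r\| \;\le\; \Big(\sup_{s}\|A(s)\|\Big) \|x - \tilde{x}\|
\qquad \text{and} \qquad
\|x - \tilde{x}\| \;\le\; \Big(\sup_{s}\|A(s)^{-1}\|\Big) \|r\|,
\]
so the residual norm and the true error norm are equivalent up to constants determined by the parameterized matrix. (This is informal; the precise norms and statement appear in Chapter 3.)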
From the rigorous comparison of Galerkin and pseudospectral methods in Chapter
3, we unwittingly recover a cousin of the Galerkin method called Galerkin with numerical
integration (G-NI) [16]. The basic idea of G-NI is to use Gaussian quadrature to
approximate the integrals in the Galerkin formulation. By the polynomial exactness
of Gaussian quadrature, if the parameterized matrix depends at most polynomially
on the parameters, then there is a quadrature rule that exactly recovers the Galerkin
system. The advantage is that the matrix used to compute the G-NI coefficients can be
factored into a product of a matrix with orthogonal rows, its transpose, and a block-diagonal matrix sandwiched between. The block-diagonal matrix has blocks equal
to the parameterized matrix evaluated at each quadrature point. This factorization
allows us to solve the system of equations for the G-NI coefficients within the
weakly intrusive paradigm. We derive and discuss this novel factorization, which has
not yet appeared in the literature, in Chapter 5. We accompany this discussion with
a downloadable suite of MATLAB tools called Parameterized Matrix Package that
solves parameterized matrix equations within the weakly intrusive paradigm.
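The following sketch shows how the factorization translates into a matrix-free (weakly intrusive) matrix-vector product with the G-NI system in the single-parameter, Legendre case; the function names are hypothetical, but the structure — evaluate the expansion at the quadrature points, apply the parameterized matrix block by block, project back — is exactly the decomposition described above.

```python
import numpy as np
from numpy.polynomial import legendre

def gni_matvec(A, X, n_quad):
    """Apply the G-NI operator to the coefficient array X (n_basis x dim) without
    forming the large Galerkin matrix; only products with A(s) at the quadrature
    points are required. Orthonormal Legendre basis, Gauss-Legendre rule."""
    n_basis, dim = X.shape
    pts, wts = legendre.leggauss(n_quad)
    wts = wts / 2.0                                   # uniform density on [-1, 1]
    # Q[k, j] = sqrt(w_k) * phi_j(s_k); Q has orthonormal columns once
    # n_quad >= n_basis, by the polynomial exactness of Gaussian quadrature.
    Q = np.zeros((n_quad, n_basis))
    for j in range(n_basis):
        e_j = np.zeros(n_basis); e_j[j] = 1.0
        Q[:, j] = np.sqrt(wts) * legendre.legval(pts, e_j) * np.sqrt(2 * j + 1)
    V = Q @ X                                         # expansion values at the points
    W = np.array([A(pts[k]) @ V[k] for k in range(n_quad)])  # block-diagonal action
    return Q.T @ W                                    # project back onto the basis

# Example: a 2-by-2 parameterized matrix, four basis functions, six quadrature points.
A = lambda s: np.array([[2.0, s], [s, 2.0]])
X = np.eye(4, 2)
print(gni_matvec(A, X, n_quad=6))
```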
Beyond implementation strategies, the factorization has many other useful consequences: (i) It allows one to interpret the G-NI approximation as the solution of a
weighted least-squares problem, which yields both theoretical insight and additional
computational strategies. (ii) It shows that increasing the order of the G-NI approximation can be viewed as adding a set of constraints in an active set optimization
method. (iii) It reveals sharp bounds on the eigenvalues of the linear system used to
compute the G-NI coefficients in the symmetric case. (iv) It points to preconditioning
strategies when using iterative methods to solve for the G-NI coefficients. We explore
these facts in detail in Chapter 5.
1.4 Summary of Contributions and Future Work
At the risk of being overly explicit, we summarize the contributions of this thesis, mentioned in the preceding discussion, to the larger body of research.
• We propose and analyze the parameterized matrix equation as the model problem for research in UQ methods and algorithm development, effectively reframing the discussion from a stochastic context to an approximation theory context.
(Chapter 2).
• We rigorously compare the intrusive spectral Galerkin method and the non-intrusive pseudospectral method and provide a novel interpretation of each
method as solving a truncated infinite system of linear equations. (Chapter
3).
• We develop a heuristic for multivariate anisotropic approximation in the Galerkin
framework aimed at battling the curse of dimensionality by finding the most
important basis functions for accurate approximation. (Chapter 4).
• We propose a weakly intrusive paradigm for algorithm development that takes
advantage of sparsity for large-scale problems and permits a residual error estimate. (Chapter 5).
• We derive a factorization of the linear system used to compute the G-NI coefficients that yields theoretical insight, eigenvalue bounds, and preconditioning
strategies. (Chapter 5).
• We apply the spectral methods to a conjugate heat transfer problem of channel
flow around a cylinder with uncertain (i.e. parameterized) boundary conditions
to demonstrate the types of statistics computed to quantify the uncertainties in
model output. (Chapter 6).
By the end, we will have developed a set of efficient methods for approximating the
variability in a given model output and its functional dependence on a set of parameters representing uncertainties in the model inputs.
Extensions to this work are plentiful. Regarding the parameterized matrix equations, the fundamental assumptions that allow one to apply spectral methods ought
to be relaxed for a more general problem setting; the independence assumption in the
parameters should be removed and methods should be sought that handle the case of
singularities within the parameter space. In fact, many systems in practice may have
singularities inside the parameter space, but spectral methods will not necessarily detect them (e.g. by failing to converge). A class of methods that detects singularities
or a set of heuristics for finding singularities in the parameter space would be very
useful. Even for problems without singularities, if there were a method developed for
finding singularities outside the parameter space, then there is hope of estimating the
convergence rate of a spectral method before ever applying it. There is potential for
such heuristics within the optimal control community, which has remained disjoint
from UQ even though the two often ask similar questions and seek similar answers.
There may be other ways to approximate the solution of a parameterized matrix
equation using other linear algebra techniques. Some matrix factorizations including
the singular value decomposition [15] and QR [22] come in parameterized flavors, which
hold promise for alternative methods to spectral methods. There has been some
work on a parameterized Lanczos method [66], but it is underdeveloped. However, if
a Lanczos-Stieltjes procedure for parameterized systems had as many nice practical
and theoretical properties as the Lanczos method for constant linear systems, then
such a method could have great impact on the field.
Of course, there are many parameterized models that do not easily reduce to matrix equations, such as nonlinear PDEs. These models are arguably more important
to predictive simulation than the linear models. The goal of this thesis was to understand nonlinear parameter dependence within the parameterized matrix equation
as the first step to addressing fully nonlinear models. Thus the next step is to apply
this understanding to nonlinear models. One possible strategy is to explore a Newton
iteration for nonlinear equations where each step is a parameterized matrix equation.
Within the context of multivariate spectral methods, the scale of the computations
will always challenge researchers and practitioners. One promising method for handling large-scale problems is to seek a low-rank approximation to the true Galerkin
coefficients. This could potentially reduce storage requirements and computational
cost. Some initial work has been done along these lines [23, 69], and this type of approximation is actively pursued in data analysis and machine learning communities,
but it is still immature and untested.
One practical step that must be taken is the development of easy-to-use software
libraries for solving parameterized matrix equations for high performance, massively
parallel platforms. As UQ continues to gain prominence and its questions continue
to resonate within computational science communities, there will be great demand
for software libraries that encode and execute the state-of-the-art in solvers, just
as we have seen with linear system solvers such as LAPACK. Our vision is for a
Parameterized LAPACK that will approximate the solution to parameterized matrix
equations.
These projects merely scrape the surface of possible research directions emanating
from this thesis. Our experience to date is that there is much to be learned from
disparate research communities in engineering, math, computer science, and physics.
Once a jargon barrier has been breached, the cross-fertilization of ideas flows freely
and both parties find insight into their own struggles. We hope this work will be
uniting instead of dividing and offer clarity for some complex concepts.
Chapter 2
Parameterized Matrix Equations
2.1 Problem Definition and Notation
We consider problems that involve a set of $d$ parameters $s = (s_1, \ldots, s_d)$ that take values in the hypercube $[-1,1]^d$. Assume that the hypercube is equipped with a normalized, separable scalar weight function $w(s) = \prod_k w_k(s_k)$. For functions $f : [-1,1]^d \to \mathbb{R}$, we use bracket notation to denote the integral against the weight function over the hypercube, i.e.
$$ \langle f \rangle \equiv \int_{[-1,1]^d} f(s)\, w(s)\, ds. \qquad (2.1) $$
In a stochastic context, w(s) is the probability density function for the independent
random variables s, and hf i is the expectation of f .
Since we are working with multivariate functions, we employ the standard multi-index notation. Let a multi-index $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}^d$ be a $d$-tuple of non-negative integers. A subscript $\alpha$ denotes association with the particular multi-index. A superscript $\alpha$ denotes the following product:
$$ s^\alpha \equiv \prod_{k=1}^{d} s_k^{\alpha_k}. \qquad (2.2) $$
This notation makes manipulations in multiple variables much clearer.
Let the $\mathbb{R}^N$-valued function $x(s)$ satisfy the linear system of equations
$$ A(s)x(s) = b(s), \qquad s \in [-1,1]^d \qquad (2.3) $$
for a given $\mathbb{R}^{N \times N}$-valued function $A(s)$ and $\mathbb{R}^N$-valued function $b(s)$. We assume that both $A(s)$ and $b(s)$ are analytic in a region containing $[-1,1]^d$, which implies that they have a convergent power series
$$ A(s) = \sum_{\alpha \in \mathbb{N}^d} A_\alpha s^\alpha, \qquad b(s) = \sum_{\alpha \in \mathbb{N}^d} b_\alpha s^\alpha \qquad (2.4) $$
for some constant matrices $A_\alpha$ and constant vectors $b_\alpha$. We assume that $A(s)$ is bounded away from singularity for all $s \in [-1,1]^d$. This implies that we can write $x(s) = A^{-1}(s)b(s)$.
The elements of the solution $x(s)$ can also be written using Cramer's rule [64, Chapter 6] as a ratio of determinants,
$$ x_i(s) = \frac{\det(A_i(s))}{\det(A(s))}, \qquad i = 0, \ldots, N-1, \qquad (2.5) $$
where $A_i(s)$ is the parameterized matrix formed by replacing the $i$th column of $A(s)$ by $b(s)$. From equation (2.5) and the invertibility of $A(s)$, we can conclude that each component of $x(s)$ is analytic in a region containing $[-1,1]^d$.
Equation (2.5) reveals the underlying structure of the solution as a function of s. If
A(s) and b(s) depend polynomially on s, then (2.5) tells us that x(s) is a multivariate
rational function. Note also that this structure is independent of the particular weight
function w(s).
2.2 Example – An Elliptic PDE with Random Coefficients
One of the motivating examples for this work comes from the field of partial differential equations with stochastic inputs. A model problem from this field – studied by
various authors [7, 28, 33] – is an elliptic equation with stochastic coefficients. More
precisely, let $a(y,\omega)$ be a positive, bounded random field with bounds
$$ 0 < a_l \le a(y,\omega) \le a_u, \qquad y \in D, \ \omega \in \Omega, \qquad (2.6) $$
where $D$ is a given spatial domain and $\Omega$ is a given sample space. Then we seek a function $u = u(y,\omega)$ that satisfies
$$ \nabla \cdot (a(y,\omega)\nabla u) = f(y,\omega), \qquad y \in D, \ \omega \in \Omega \qquad (2.7) $$
$$ u(y,\omega) = 0, \qquad y \in \partial D, \ \omega \in \Omega, \qquad (2.8) $$
where $f(y,\omega)$ is a given forcing function. The first step is to approximate the coefficients by some finite dimensional parameterized function
$$ a(y,\omega) \approx a_d(y,s), \qquad (2.9) $$
where the $d$ independent parameters $s_1, \ldots, s_d$ take values in some parameter space $S$ such that the approximation $a_d(y,s)$ is also bounded and positive. This can be accomplished with the well-known Karhunen-Loeve expansion [58] or some other modelling techniques. We can assume without loss of generality that $f(y,\omega) = f(y)$ depends only on the spatial variable $y$. Then equations (2.7)-(2.8) are replaced by
$$ \nabla \cdot (a_d(y,s)\nabla u_d) = f(y), \qquad y \in D, \ s \in S \qquad (2.10) $$
$$ u_d(y,s) = 0, \qquad y \in \partial D, \ s \in S. \qquad (2.11) $$
At this point, the spatial domain is discretized using a standard discretization such
as a finite element or finite difference scheme, which results in a linear system of
equations for the degrees of freedom $u = u(s)$,
$$ K(s)u(s) = f, \qquad (2.12) $$
where the elements of the coefficient matrix K(s) depend on the parameters s. Thus
we are left with a parameterized matrix equation we must solve to obtain the degrees
of freedom at each parameter value.
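As a concrete illustration of how such a parameterized matrix arises, the following Python sketch assembles a one-dimensional analogue with central finite differences. The coefficient $a_d(y,s) = 1 + 0.5\,s\,\sin(\pi y)$, the grid size, and the use of finite differences rather than finite elements are illustrative assumptions, not choices made in the references above.

```python
import numpy as np

# Minimal sketch: assemble a parameterized stiffness matrix K(s) for the
# one-parameter model problem -d/dy( a_d(y,s) du/dy ) = f on [0,1] with
# u(0) = u(1) = 0, using central finite differences. The coefficient
# a_d(y,s) = 1 + 0.5*s*sin(pi*y) is a hypothetical single-parameter example.
def assemble_K(s, m=64):
    h = 1.0 / m
    y_mid = (np.arange(m) + 0.5) * h              # cell midpoints
    a = 1.0 + 0.5 * s * np.sin(np.pi * y_mid)     # coefficient at midpoints
    K = np.zeros((m - 1, m - 1))
    for i in range(m - 1):
        K[i, i] = (a[i] + a[i + 1]) / h**2
        if i > 0:
            K[i, i - 1] = -a[i] / h**2
        if i < m - 2:
            K[i, i + 1] = -a[i + 1] / h**2
    return K

f = np.ones(63)                                   # constant forcing
u = np.linalg.solve(assemble_K(0.3), f)           # degrees of freedom at s = 0.3
```

Each new parameter value requires only a fresh assembly and solve, which is exactly the structure the spectral methods of the later chapters exploit.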
There are more technical details associated with this particular problem that
we have conveniently sidestepped. We refer to the above references for a thorough
treatment of the existence and well-posedness of this problem, as well as numerical
methods for its approximation. For our purposes, it serves as an outstanding example
of where one finds parameterized matrix equations.
2.3 A Discussion of Singularities
Returning to the general problem in equation (2.3), we may ask what is the goal of
the computation? We may not need the function pointwise over the whole parameter
space. Instead, we may seek a constant vector of averages $\langle x \rangle$ over the parameter space. In a stochastic context, this can be thought of as computing the mean. Similarly, we may seek a measure of variability such as the variance $\langle (x - \langle x \rangle)^2 \rangle$ or other
higher moments. Similar probabilistic measures include density or distribution functions, or the probability that some component of x(s) exceeds a given threshold. In
a reliability or control context, we may be interested in bounds on the components
of x(s) over the parameter space.
If the components of x(s) contain singularities in the parameter space, then some
of these measures – particularly the integral measures – do not exist, i.e. are infinite.
Singularities may arise in x(s) if A(s) is singular at some point in the parameter
space. This is apparent from equation (2.5). Notice that since the elements of A(s)
vary continuously with s by the assumption of analyticity, det(A(s)) also varies continuously with s. Thus, if A(s∗ ) is singular for some point s∗ in the domain, then
det(A(s∗ )) = 0 and the limit of 1/ det(A(s)) as s approaches s∗ is infinity. If each
function det(Ai (s)) does not also go to zero as s goes to s∗ , then some components of
x(s) will contain a pole. In fact, if the components of b(s) contain singularities, then
a non-singular A(s) implies that x(s) will inherit those singularities.
As a brief aside, some applications in differential equations with random inputs
try to model the randomness using Gaussian random fields. The problem with using
such models is that the resulting parameter space is unbounded. An unbounded
parameter space means that A(s) must be non-singular for every possible parameter
value. If A(s) depends polynomially on s with some odd degree polynomial (e.g.
linearly), then the matrix A(s) will become singular at some point in an unbounded
parameter space. The Gaussian measure imposed on the parameter space will not –
despite its exponential decay about the mean – remove the singularities, and integral
measures such as expectation and variance will not exist. In terms of function spaces,
the components of x(s) will not belong to L1 (or L2 ). To avoid these situations
altogether, we use a bounded parameter space and require A(s) to be non-singular at
each point in the parameter space.
Despite the protection of these assumptions, however, a singularity that occurs
outside the parameter space but close to the boundary can induce sharp gradients and
large variability in the function within the parameter space. These two characteristics
of a solution typically cause difficulties for numerical approximation methods. In
fact, we will see in Chapter 3 that the convergence of the spectral methods is directly
related to the nearness of the closest singularity to the parameter space in the complex
plane. We provide the following simple example to demonstrate the behavior of a
solution with a nearby singularity.
Let $\varepsilon > 0$, and consider the following parameterized matrix equation
$$ \begin{bmatrix} 1+\varepsilon & s \\ s & 1 \end{bmatrix} \begin{bmatrix} x_0(s) \\ x_1(s) \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \qquad (2.13) $$
For this case, we can compute the exact solution,
$$ x_0(s) = \frac{2-s}{1+\varepsilon-s^2}, \qquad x_1(s) = \frac{1+\varepsilon-2s}{1+\varepsilon-s^2}. \qquad (2.14) $$
Both of these functions have poles at $s = \pm\sqrt{1+\varepsilon}$. In Figure 2.1 we plot $x_0(s)$ for various values of $\varepsilon$. Notice how the gradients become steeper and the variability increases as $\varepsilon$ goes to zero. We will return to this example in Chapter 3 to assess the performance of the spectral methods.
Figure 2.1: Plotting the response surface $x_0(s) = (2-s)/(1+\varepsilon-s^2)$ that solves equation (2.13) for different values of $\varepsilon$ ($\varepsilon = 0.2, 0.4, 0.6, 0.8$).
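The behavior in Figure 2.1 is easy to reproduce numerically. The short Python sketch below evaluates the exact solution (2.14), checks it against the system (2.13), and prints $x_0$ at two sample points for several values of $\varepsilon$; the sample points are arbitrary.

```python
import numpy as np

# Evaluate the exact solution (2.14) of the 2x2 example and confirm it
# satisfies the parameterized system (2.13).
def exact_x(s, eps):
    denom = 1.0 + eps - s**2
    return np.array([(2.0 - s) / denom, (1.0 + eps - 2.0 * s) / denom])

for eps in [0.2, 0.4, 0.6, 0.8]:
    s = 0.95                                   # a point near the right endpoint
    A = np.array([[1.0 + eps, s], [s, 1.0]])
    b = np.array([2.0, 1.0])
    assert np.allclose(A @ exact_x(s, eps), b)
    # the response steepens as eps -> 0: compare x0 at s = 0 and s = 0.95
    print(eps, exact_x(0.0, eps)[0], exact_x(0.95, eps)[0])
```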
The point of the preceding discussion is to emphasize that singularities create
difficulties when solving parameterized matrix equations. Fortunately, many parameterized systems in practice have a priori bounds on valid parameters that yield
non-singular systems. For example, the matrix from the discretized elliptic equation
(2.12) is non-singular as long as the elliptic coefficients $a_d(y,s)$ remain strictly positive. If such a priori knowledge is not available, then heuristics may be pursued
that search for the nearest singularity, and such heuristics will be highly problem
dependent.
Chapter 3
Spectral Methods — Univariate Approximation
In this section, we derive the spectral methods we use to approximate the solution
x(s) for the case when d = 1. We begin with a brief review of the relevant theory
of orthogonal polynomials, Gaussian quadrature, and Fourier series. We include this
section primarily for the sake of notation and refer the reader to a standard text
on orthogonal polynomials [82] for further theoretical details and [30] for a modern
perspective on computation.
3.1 Orthogonal Polynomials and Gaussian Quadrature
Let $P$ be the space of real polynomials defined on $[-1,1]$, and let $P_n \subset P$ be the space of polynomials of degree at most $n$. For any $p, q$ in $P$, we define the inner product as
$$ \langle pq \rangle \equiv \int_{-1}^{1} p(s)q(s)w(s)\, ds. \qquad (3.1) $$
We define a norm on $P$ as $\|p\|_{L^2} = \sqrt{\langle p^2 \rangle}$, which is the standard $L^2$ norm for the given weight $w(s)$. Let $\{\pi_k(s)\}$ be the set of polynomials that are orthonormal with
respect to $w(s)$, i.e. $\langle \pi_i\pi_j \rangle = \delta_{ij}$. It is known that $\{\pi_k(s)\}$ satisfy the three-term recurrence relation
$$ \beta_{k+1}\pi_{k+1}(s) = (s - \alpha_k)\pi_k(s) - \beta_k\pi_{k-1}(s), \qquad k = 0, 1, 2, \ldots, \qquad (3.2) $$
with $\pi_{-1}(s) = 0$ and $\pi_0(s) = 1$. If we consider only the first $n$ equations, then we can rewrite (3.2) as
$$ s\pi_k(s) = \beta_k\pi_{k-1}(s) + \alpha_k\pi_k(s) + \beta_{k+1}\pi_{k+1}(s), \qquad k = 0, 1, \ldots, n-1. \qquad (3.3) $$
Setting $\boldsymbol{\pi}_n(s) = [\pi_0(s), \pi_1(s), \ldots, \pi_{n-1}(s)]^T$, we can write this conveniently in matrix form as
$$ s\boldsymbol{\pi}_n(s) = J_n\boldsymbol{\pi}_n(s) + \beta_n\pi_n(s)e_n, \qquad (3.4) $$
where en is a vector of zeros with a one in the last entry, and Jn (known as the Jacobi
matrix) is a symmetric, tridiagonal matrix defined as
$$ J_n = \begin{bmatrix} \alpha_0 & \beta_1 & & & \\ \beta_1 & \alpha_1 & \beta_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \beta_{n-2} & \alpha_{n-2} & \beta_{n-1} \\ & & & \beta_{n-1} & \alpha_{n-1} \end{bmatrix}. \qquad (3.5) $$
The zeros {λi } of πn (s) are the eigenvalues of Jn and π n (λi ) are the corresponding
eigenvectors; this follows directly from (3.4). Let Qn be the orthogonal matrix of
eigenvectors of $J_n$. By equation (3.4), the elements of $Q_n$ are given by
$$ Q_n(i,j) = \frac{\pi_i(\lambda_j)}{\|\boldsymbol{\pi}_n(\lambda_j)\|_2}, \qquad i, j = 0, \ldots, n-1. \qquad (3.6) $$
Then we write the eigenvalue decomposition of $J_n$ as
$$ J_n = Q_n\Lambda_n Q_n^T. \qquad (3.7) $$
It is known (cf. [30]) that the eigenvalues $\{\lambda_i\}$ are the familiar Gaussian quadrature points associated with the weight function $w(s)$. The quadrature weight $\nu_i$ corresponding to $\lambda_i$ is equal to the square of the first component of the eigenvector associated with $\lambda_i$, i.e.
$$ \nu_i = Q_n(0,i)^2 = \frac{1}{\|\boldsymbol{\pi}_n(\lambda_i)\|_2^2}. \qquad (3.8) $$
The weights $\{\nu_i\}$ are known to be strictly positive. We will use these facts repeatedly in the sequel. For an integrable scalar function $f(s)$, we can approximate its integral by an $n$-point Gaussian quadrature rule, which is a weighted sum of function evaluations,
$$ \int_{-1}^{1} f(s)w(s)\, ds = \sum_{i=0}^{n-1} f(\lambda_i)\nu_i + R_n(f). \qquad (3.9) $$
If $f \in P_{2n-1}$, then $R_n(f) = 0$; that is to say, the degree of exactness of the Gaussian quadrature rule is $2n-1$. We use the notation
$$ \langle f \rangle_n \equiv \sum_{i=0}^{n-1} f(\lambda_i)\nu_i \qquad (3.10) $$
to denote the Gaussian quadrature rule. This is a discrete approximation to the true integral.
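The eigenvalue characterization above translates directly into a small computation. The following Python sketch builds the Jacobi matrix for the Legendre case (constant weight $w(s) = 1/2$), using the known recurrence coefficients $\alpha_k = 0$ and $\beta_k = k/\sqrt{4k^2-1}$, and recovers the Gauss points and weights from its eigendecomposition; the comparison against numpy's built-in rule is only a sanity check.

```python
import numpy as np

def gauss_legendre_from_jacobi(n):
    """Nodes and weights for the normalized uniform weight w(s) = 1/2 on [-1, 1],
    computed from the eigendecomposition of the n x n Jacobi matrix (3.5)."""
    k = np.arange(1, n)
    beta = k / np.sqrt(4.0 * k**2 - 1.0)      # off-diagonal entries for Legendre
    J = np.diag(beta, 1) + np.diag(beta, -1)  # alpha_k = 0 by symmetry of w
    lam, Q = np.linalg.eigh(J)                # eigenvalues = quadrature nodes
    nu = Q[0, :]**2                           # weights = squared first components
    return lam, nu

lam, nu = gauss_legendre_from_jacobi(5)
x, w = np.polynomial.legendre.leggauss(5)     # reference rule (weights sum to 2)
assert np.allclose(lam, x) and np.allclose(nu, w / 2.0)
```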
3.2 Fourier Series
The polynomials $\{\pi_k(s)\}$ form an orthonormal basis for the Hilbert space
$$ L^2 \equiv L^2_w([-1,1]) = \{ f : [-1,1] \to \mathbb{R} \mid \|f\|_{L^2} < \infty \}. \qquad (3.11) $$
Therefore, any $f \in L^2$ admits a convergent Fourier series
$$ f(s) = \sum_{k=0}^{\infty} \langle f\pi_k \rangle \, \pi_k(s). \qquad (3.12) $$
The coefficients $\langle f\pi_k \rangle$ are called the Fourier coefficients. If we truncate the series (3.12) after $n$ terms, we are left with a polynomial of degree $n-1$ that is the best approximation polynomial in the $L^2$ norm. In other words, if we denote
$$ P_n f(s) = \sum_{k=0}^{n-1} \langle f\pi_k \rangle \, \pi_k(s), \qquad (3.13) $$
then
$$ \|f - P_n f\|_{L^2} = \inf_{p \in P_{n-1}} \|f - p\|_{L^2}. \qquad (3.14) $$
In fact, the error made by truncating the series is equal to the sum of squares of the neglected coefficients,
$$ \|f - P_n f\|_{L^2}^2 = \sum_{k=n}^{\infty} \langle f\pi_k \rangle^2. \qquad (3.15) $$
These properties of the Fourier series motivate the theory and practice of spectral
methods.
We have shown that each element of the solution x(s) of the parameterized matrix
equation is analytic in a region containing the closed interval [−1, 1]. Therefore it is
continuous and bounded on [−1, 1], which implies that xi (s) ∈ L2 for i = 0, . . . , N −1.
We can thus write the convergent Fourier expansion for each element using vector notation as
$$ x(s) = \sum_{k=0}^{\infty} \langle x\pi_k \rangle \, \pi_k(s). \qquad (3.16) $$
Note that we are abusing the bracket notation here, but this will make further manipulations very convenient. The computational strategy is to choose a truncation
level n − 1 and estimate the coefficients of the truncated expansion.
3.3 Spectral Collocation
The term spectral collocation typically refers to the technique of constructing a Lagrange interpolating polynomial through the exact solution evaluated at the Gaussian
quadrature points. Suppose that λi , i = 0, . . . , n − 1 are the Gaussian quadrature
points for the weight function $w(s)$. We can construct an $n-1$ degree polynomial interpolant of the solution through these points as
$$ x_{c,n}(s) = \sum_{i=0}^{n-1} x(\lambda_i)\ell_i(s) \equiv X_c l_n(s). \qquad (3.17) $$
The vector $x(\lambda_i)$ is the solution to the equation $A(\lambda_i)x(\lambda_i) = b(\lambda_i)$. The $n-1$ degree polynomial $\ell_i(s)$ is the standard Lagrange basis polynomial defined as
$$ \ell_i(s) = \prod_{j=0,\, j\neq i}^{n-1} \frac{s - \lambda_j}{\lambda_i - \lambda_j}. \qquad (3.18) $$
The $N \times n$ constant matrix $X_c$ (the subscript $c$ is for collocation) has one column for each $x(\lambda_i)$, and $l_n(s)$ is a vector of the Lagrange basis polynomials.
By construction, the collocation polynomial xc,n interpolates the true solution
x(s) at the Gaussian quadrature points. We will use this construction to show the
connection between the pseudospectral method and the Galerkin method.
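A small Python sketch of this construction for the $2 \times 2$ example (2.13) follows; the values of $\varepsilon$ and $n$ are arbitrary, and scipy's generic Lagrange interpolation stands in for a more careful barycentric implementation.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.interpolate import lagrange

# Sketch of spectral collocation (3.17) for the 2x2 example (2.13) under the
# uniform weight on [-1, 1]: solve A(s)x(s) = b at the Gauss-Legendre points
# and interpolate each component. The choices eps = 0.4 and n = 9 are illustrative.
eps, n = 0.4, 9
nodes, _ = leggauss(n)
X = np.array([np.linalg.solve([[1 + eps, s], [s, 1.0]], [2.0, 1.0]) for s in nodes])
x0_interp = lagrange(nodes, X[:, 0])      # degree n-1 interpolant of x0(s)

s_test = 0.3
exact = (2 - s_test) / (1 + eps - s_test**2)
print(abs(x0_interp(s_test) - exact))     # small, since x0 is analytic on [-1, 1]
```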
3.4 Pseudospectral Methods
Notice that computing the true coefficients of the Fourier expansion of x(s) requires
the exact solution. The essential idea of the pseudospectral method is to approximate
the Fourier coefficients of $x(s)$ by a Gaussian quadrature rule. In other words,
$$ x_{p,n}(s) = \sum_{k=0}^{n-1} \langle x\pi_k \rangle_n \, \pi_k(s) \equiv X_p\boldsymbol{\pi}_n(s), \qquad (3.19) $$
where $X_p$ is an $N \times n$ constant matrix of the approximated Fourier coefficients; the subscript $p$ is for pseudospectral. For clarity, we recall
$$ \langle x\pi_k \rangle_n = \sum_{i=0}^{n-1} x(\lambda_i)\pi_k(\lambda_i)\nu_i, \qquad (3.20) $$
where x(λi ) solves A(λi )x(λi ) = b(λi ). In general, the number of points in the quadrature rule need not have any relationship to the order of truncation. However, when
the number of terms in the truncated series is equal to the number of points in the
quadrature rule, the pseudospectral approximation is equivalent to the collocation
approximation. This relationship is well-known, but we include the following lemma
and theorem for use in later proofs.
Lemma 1 Let $q_0$ be the first row of $Q_n$, and define $D_{q_0} = \mathrm{diag}(q_0)$. The matrices $X_p$ and $X_c$ are related by the equation $X_p = X_c D_{q_0} Q_n^T$.

Proof. Write
$$ X_p(:,k) = \langle x\pi_k \rangle_n = \sum_{j=0}^{n-1} x(\lambda_j)\pi_k(\lambda_j)\nu_j = \sum_{j=0}^{n-1} X_c(:,j)\,\frac{1}{\|\boldsymbol{\pi}_n(\lambda_j)\|_2}\,\frac{\pi_k(\lambda_j)}{\|\boldsymbol{\pi}_n(\lambda_j)\|_2} = X_c D_{q_0} Q_n^T(:,k), $$
which implies $X_p = X_c D_{q_0} Q_n^T$ as required.
Theorem 2 The $n-1$ degree collocation approximation is equal to the $n-1$ degree pseudospectral approximation using an $n$-point Gaussian quadrature rule, i.e.
$$ x_{c,n}(s) = x_{p,n}(s) \qquad (3.21) $$
for all $s$.

Proof. Note that the elements of $q_0$ are all non-zero, so $D_{q_0}^{-1}$ exists. Then Lemma 1 implies $X_c = X_p Q_n D_{q_0}^{-1}$. Using this change of variables, we can write
$$ x_{c,n}(s) = X_c l_n(s) = X_p Q_n D_{q_0}^{-1} l_n(s). \qquad (3.22) $$
Thus it is sufficient to show that $\boldsymbol{\pi}_n(s) = Q_n D_{q_0}^{-1} l_n(s)$. Since this is just a vector of polynomials with degree at most $n-1$, we can do this by multiplying each element by each orthonormal basis polynomial up to order $n-1$ and integrating. Towards this end we define $\Theta \equiv \langle l_n\boldsymbol{\pi}_n^T \rangle$.

Using the polynomial exactness of the Gaussian quadrature rule, we compute the $i,j$ element of $\Theta$:
$$ \Theta(i,j) = \langle \ell_i\pi_j \rangle = \sum_{k=0}^{n-1} \ell_i(\lambda_k)\pi_j(\lambda_k)\nu_k = \frac{1}{\|\boldsymbol{\pi}_n(\lambda_i)\|_2}\,\frac{\pi_j(\lambda_i)}{\|\boldsymbol{\pi}_n(\lambda_i)\|_2} = Q_n(0,i)\,Q_n(j,i), $$
which implies that $\Theta = D_{q_0} Q_n^T$. Therefore
$$ \langle Q_n D_{q_0}^{-1} l_n\boldsymbol{\pi}_n^T \rangle = Q_n D_{q_0}^{-1}\langle l_n\boldsymbol{\pi}_n^T \rangle = Q_n D_{q_0}^{-1}\Theta = Q_n D_{q_0}^{-1} D_{q_0} Q_n^T = I, $$
where $I$ is the appropriately sized identity matrix. This completes the proof.
Some refer to the pseudospectral method explicitly as an interpolation method [12].
See [43] for an insightful interpretation in terms of a discrete projection. Because of
this property, we will freely interchange the collocation and pseudospectral approximations when convenient in the ensuing analysis.
The work required to compute the pseudospectral approximation is highly dependent on the parameterized system. In general, we assume that the computation of
x(λi ) dominates the work; in other words, the cost of computing Gaussian quadrature formulas is negligible compared to computing the solution to each linear system.
Then if each $x(\lambda_i)$ costs $O(N^3)$, the pseudospectral approximation with $n$ terms costs $O(nN^3)$.
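The following Python sketch carries out the pseudospectral construction for the $2 \times 2$ example of Chapter 2 with the constant weight $w(s) = 1/2$, whose orthonormal basis is $\pi_k(s) = \sqrt{2k+1}\,P_k(s)$; the truncation level and the test point are illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Sketch of the pseudospectral construction (3.19)-(3.20) for the 2x2 example.
eps, n = 0.4, 11
lam, w = leggauss(n)
nu = w / 2.0                                           # weights for w(s) = 1/2

Xc = np.column_stack(
    [np.linalg.solve([[1 + eps, s], [s, 1.0]], [2.0, 1.0]) for s in lam])

scale = np.sqrt(2 * np.arange(n) + 1)                  # normalization of Legendre polys
pi_nodes = scale[:, None] * legval(lam, np.eye(n))     # pi_nodes[k, i] = pi_k(lambda_i)
Xp = Xc @ (nu[:, None] * pi_nodes.T)                   # eq. (3.20): column k = <x pi_k>_n

s = 0.5                                                # evaluate x_{p,n}(s) and compare
x_approx = Xp @ (scale * legval(s, np.eye(n)))
x_exact = np.array([2 - s, 1 + eps - 2 * s]) / (1 + eps - s**2)
print(np.abs(x_approx - x_exact))
```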
3.5 Spectral Galerkin
The spectral Galerkin method computes a finite dimensional approximation to x(s)
such that each element of the equation residual is orthogonal to the approximation
space. Define
$$ r(y,s) = A(s)y(s) - b(s). \qquad (3.23) $$
The finite dimensional approximation space for each component xi (s) will be the
space of polynomials of degree at most n − 1. This space is spanned by the first
$n$ orthonormal polynomials, i.e. $\mathrm{span}(\pi_0(s), \ldots, \pi_{n-1}(s)) = P_{n-1}$. We seek an $\mathbb{R}^N$-valued polynomial $x_{g,n}(s)$ of maximum degree $n-1$ such that
$$ \langle r_i(x_{g,n})\pi_k \rangle = 0, \qquad i = 0, \ldots, N-1, \quad k = 0, \ldots, n-1, \qquad (3.24) $$
where $r_i(x_{g,n})$ is the $i$th component of the residual. We can write equations (3.24) in matrix notation as
$$ \langle r(x_{g,n})\boldsymbol{\pi}_n^T \rangle = 0 \qquad (3.25) $$
or equivalently
$$ \langle A x_{g,n}\boldsymbol{\pi}_n^T \rangle = \langle b\boldsymbol{\pi}_n^T \rangle. \qquad (3.26) $$
Since each component of xg,n (s) is a polynomial of degree at most n − 1, we can write
its expansion in $\{\pi_k(s)\}$ as
$$ x_{g,n}(s) = \sum_{k=0}^{n-1} x_{g,k}\pi_k(s) \equiv X_g\boldsymbol{\pi}_n(s), \qquad (3.27) $$
where $X_g$ is a constant matrix of size $N \times n$; the subscript $g$ is for Galerkin. Then equation (3.26) becomes
$$ \langle A X_g\boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \rangle = \langle b\boldsymbol{\pi}_n^T \rangle. \qquad (3.28) $$
Using the vec notation [37, Section 4.5], we can rewrite (3.28) as
$$ \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle \, \mathrm{vec}(X_g) = \langle \boldsymbol{\pi}_n \otimes b \rangle, \qquad (3.29) $$
where $\mathrm{vec}(X_g)$ is an $Nn \times 1$ constant vector equal to the columns of $X_g$ stacked on top of each other. The constant matrix $\langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle$ has size $Nn \times Nn$ and a distinct block structure; the $i,j$ block of size $N \times N$ is equal to $\langle \pi_i\pi_j A \rangle$. More explicitly,
$$ \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle = \begin{bmatrix} \langle \pi_0\pi_0 A \rangle & \cdots & \langle \pi_0\pi_{n-1} A \rangle \\ \vdots & \ddots & \vdots \\ \langle \pi_{n-1}\pi_0 A \rangle & \cdots & \langle \pi_{n-1}\pi_{n-1} A \rangle \end{bmatrix}. \qquad (3.30) $$
Similarly, the $i$th block of the $Nn \times 1$ vector $\langle \boldsymbol{\pi}_n \otimes b \rangle$ is equal to $\langle b\pi_i \rangle$, which is exactly the $i$th Fourier coefficient of $b(s)$.
Since A(s) is bounded and nonsingular for all s ∈ [−1, 1], it is straightforward to
show that xg,n (s) exists and is unique using the classical Galerkin theorems presented
and summarized in Brenner and Scott [14, Chapter 2]. This implies that Xg is unique,
and since $b(s)$ is arbitrary, we conclude that the matrix $\langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle$ is nonsingular
for all finite truncations n.
The work required to compute the Galerkin approximation depends on how one
computes the integrals in equation (3.29). If we assume that the cost of forming the
system is negligible, then the costly part of the computation is solving the system
(3.29). The size of the matrix $\langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle$ is $Nn \times Nn$, so we expect an operation count of $O(N^3 n^3)$ in general. However, many applications beget systems with
sparsity or exploitable structure that can considerably reduce the required work.
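For the same $2 \times 2$ example, the Galerkin system (3.29) can be assembled entry by entry with a Gaussian quadrature rule, which is exact here because the matrix depends linearly on $s$ (this is precisely the G-NI point of view mentioned in Chapter 1). A Python sketch with illustrative choices of $\varepsilon$ and $n$ follows.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Sketch of assembling and solving the Galerkin system (3.29)-(3.30) for the
# 2x2 example. An n-point Gauss rule integrates every entry <pi_i pi_j A>
# exactly because A(s) is linear in s (cf. Corollary 5).
eps, n, N = 0.4, 8, 2
lam, w = leggauss(n)
nu = w / 2.0
scale = np.sqrt(2 * np.arange(n) + 1)
pi = scale[:, None] * legval(lam, np.eye(n))           # pi[k, q] = pi_k(lambda_q)

A = lambda s: np.array([[1 + eps, s], [s, 1.0]])
b = np.array([2.0, 1.0])

G = np.zeros((N * n, N * n))                           # block (i, j) is <pi_i pi_j A>
rhs = np.zeros(N * n)
for i in range(n):
    rhs[i * N:(i + 1) * N] = sum(nu[q] * pi[i, q] * b for q in range(n))
    for j in range(n):
        G[i * N:(i + 1) * N, j * N:(j + 1) * N] = sum(
            nu[q] * pi[i, q] * pi[j, q] * A(lam[q]) for q in range(n))

Xg = np.linalg.solve(G, rhs).reshape(n, N).T           # undo vec(Xg)

s = 0.5                                                # compare with the exact solution
x_exact = np.array([2 - s, 1 + eps - 2 * s]) / (1 + eps - s**2)
print(np.abs(Xg @ (scale * legval(s, np.eye(n))) - x_exact))
```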
3.6 A Brief Review
We have discussed two classes of spectral methods: (i) the interpolatory pseudospectral method which approximates the truncated Fourier series of x(s) by using a Gaussian quadrature rule to approximate each Fourier coefficient, and (ii) the Galerkin
projection method which finds an approximation in a finite dimensional subspace
such that the residual A(s)xg,n (s) − b(s) is orthogonal to the approximation space. In
general, the n-term pseudospectral approximation requires n solutions of the original
parameterized matrix equation (2.3) evaluated at the Gaussian quadrature points,
while the Galerkin method requires the solution of the coupled linear system of equations (3.29) that is n times as large as the original parameterized matrix equation. A
rough operation count for the pseudospectral and Galerkin approximations is $O(nN^3)$ and $O(n^3N^3)$, respectively.
Before discussing asymptotic error estimates, we first derive some interesting and
useful connections between these two classes of methods. In particular, we can interpret each method as a set of functions acting on the infinite Jacobi matrix for the
weight function w(s); the difference between the methods lies in where each truncates
the infinite system of equations.
3.7 Connections Between Pseudospectral and Galerkin
We begin with a useful lemma for representing a matrix of Gaussian quadrature
integrals in terms of functions of the Jacobi matrix.
Lemma 3 Let $f(s)$ be a scalar function analytic in a region containing $[-1,1]$. Then $\langle f\boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \rangle_n = f(J_n)$.

Proof. We examine the $i,j$ element of the $n \times n$ matrix $f(J_n)$:
$$ e_i^T f(J_n) e_j = e_i^T Q_n f(\Lambda_n) Q_n^T e_j = q_i^T f(\Lambda_n) q_j = \sum_{k=0}^{n-1} f(\lambda_k)\,\frac{\pi_i(\lambda_k)}{\|\boldsymbol{\pi}_n(\lambda_k)\|_2}\,\frac{\pi_j(\lambda_k)}{\|\boldsymbol{\pi}_n(\lambda_k)\|_2} = \sum_{k=0}^{n-1} f(\lambda_k)\pi_i(\lambda_k)\pi_j(\lambda_k)\nu_k = \langle f\pi_i\pi_j \rangle_n, $$
which completes the proof.
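Lemma 3 is easy to confirm numerically. The sketch below compares the quadrature matrix $\langle f\pi_i\pi_j \rangle_n$ with $f(J_n)$ for the Legendre weight, using an arbitrary analytic $f$ and evaluating the matrix function through the eigendecomposition (3.7).

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Numerical check of Lemma 3 for the Legendre (constant) weight; f and n are
# illustrative. f(J_n) is computed from the eigendecomposition since J_n is symmetric.
n = 6
k = np.arange(1, n)
Jn = np.diag(k / np.sqrt(4 * k**2 - 1), 1) + np.diag(k / np.sqrt(4 * k**2 - 1), -1)

f = lambda s: 1.0 / (2.0 + s)                      # analytic on [-1, 1]
lamb, Q = np.linalg.eigh(Jn)
f_Jn = Q @ np.diag(f(lamb)) @ Q.T                  # matrix function via (3.7)

nodes, w = leggauss(n)
nu = w / 2.0
pi = np.sqrt(2 * np.arange(n) + 1)[:, None] * legval(nodes, np.eye(n))
quad = (pi * (f(nodes) * nu)) @ pi.T               # [i,j] = sum_k f pi_i pi_j nu_k

assert np.allclose(f_Jn, quad)
```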
Note that Lemma 3 generalizes Theorem 3.4 in [36]. With this in the arsenal, we
can prove the following theorem relating pseudospectral to Galerkin.
Theorem 4 The pseudospectral solution is equal to an approximation of the Galerkin
solution where each integral in equation (3.29) is approximated by an n-point Gaussian
quadrature formula. In other words, $X_p$ solves
$$ \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle_n \, \mathrm{vec}(X_p) = \langle \boldsymbol{\pi}_n \otimes b \rangle_n. \qquad (3.31) $$
Proof. Define the $N \times n$ matrix $B_c = [b(\lambda_0) \cdots b(\lambda_{n-1})]$. Using the power series expansion of $A(s)$ (equation (2.4)), we can write the matrix of each collocation solution as
$$ A(\lambda_k) = \sum_{i=0}^{\infty} A_i\lambda_k^i \qquad (3.32) $$
for $k = 0, \ldots, n-1$. We collect these into one large block-diagonal system by writing
$$ \left( \sum_{i=0}^{\infty} \Lambda_n^i \otimes A_i \right) \mathrm{vec}(X_c) = \mathrm{vec}(B_c). \qquad (3.33) $$
Let $I$ be the $N \times N$ identity matrix. Premultiply (3.33) by $(D_{q_0} \otimes I)$, and by commutativity of diagonal matrices and the mixed product property, it becomes
$$ \left( \sum_{i=0}^{\infty} \Lambda_n^i \otimes A_i \right) (D_{q_0} \otimes I)\,\mathrm{vec}(X_c) = (D_{q_0} \otimes I)\,\mathrm{vec}(B_c). \qquad (3.34) $$
Premultiplying (3.34) by $(Q_n \otimes I)$, properly inserting $(Q_n^T \otimes I)(Q_n \otimes I)$ on the left hand side, and using the eigenvalue decomposition (3.7), this becomes
$$ \left( \sum_{i=0}^{\infty} J_n^i \otimes A_i \right) (Q_n \otimes I)(D_{q_0} \otimes I)\,\mathrm{vec}(X_c) = (Q_n \otimes I)(D_{q_0} \otimes I)\,\mathrm{vec}(B_c). \qquad (3.35) $$
But note that Lemma 1 implies
$$ (Q_n \otimes I)(D_{q_0} \otimes I)\,\mathrm{vec}(X_c) = \mathrm{vec}(X_p). \qquad (3.36) $$
Using an argument identical to the proof of Lemma 1, we can write
$$ (Q_n \otimes I)(D_{q_0} \otimes I)\,\mathrm{vec}(B_c) = \langle \boldsymbol{\pi}_n \otimes b \rangle_n. \qquad (3.37) $$
Finally, using Lemma 3, equation (3.35) becomes
$$ \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle_n \, \mathrm{vec}(X_p) = \langle \boldsymbol{\pi}_n \otimes b \rangle_n, \qquad (3.38) $$
as required.
Theorem 4 begets a corollary giving conditions for equivalence between Galerkin
and pseudospectral approximations.
Corollary 5 If b(s) contains only polynomials of maximum degree mb and A(s)
contains only polynomials of maximum degree 1 (i.e. linear functions of s), then
xg,n (s) = xp,n (s) for n ≥ mb for all s ∈ [−1, 1].
Proof. The parameterized matrix $\boldsymbol{\pi}_n(s)\boldsymbol{\pi}_n(s)^T \otimes A(s)$ has polynomials of degree at most $2n-1$. Thus, by the polynomial exactness of the Gaussian quadrature formulas,
$$ \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle_n = \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle, \qquad \langle \boldsymbol{\pi}_n \otimes b \rangle_n = \langle \boldsymbol{\pi}_n \otimes b \rangle. \qquad (3.39) $$
Therefore $X_g = X_p$, and consequently
$$ x_{g,n}(s) = X_g\boldsymbol{\pi}_n(s) = X_p\boldsymbol{\pi}_n(s) = x_{p,n}(s), \qquad (3.40) $$
as required.
By taking the transpose of equation (3.28) and following the steps of the proof of
Theorem 4, we get another interesting corollary.
Corollary 6 First define A(Jn ) to be the Nn × Nn constant matrix with the i, j
block of size n × n equal to A(i, j)(Jn ) for i, j = 0, . . . , N − 1. Next define b(Jn ) to
be the Nn × n constant matrix with the ith n × n block equal to bi (Jn ). Then the
pseudospectral coefficients $X_p$ satisfy
$$ A(J_n)\,\mathrm{vec}(X_p^T) = b(J_n)e_0, \qquad (3.41) $$
where $e_0 = [1, 0, \ldots, 0]^T$ is an $n$-vector.
Theorem 4 leads to a fascinating connection between the matrix operators in
the Galerkin and pseudospectral methods, namely that the matrix in the Galerkin
system is equal to a submatrix of the matrix from a sufficiently larger pseudospectral
computation. This is the key to understanding the relationship between the Galerkin
and pseudospectral approximations. In the following lemma, we denote the first r × r
principal submatrix of a matrix $M$ by $[M]_{r \times r}$.
Lemma 7 Let A(s) contain only polynomials of degree at most ma , and let b(s)
contain only polynomials of degree at most $m_b$. Define
$$ m \equiv m(n) \ge \max\left( \frac{m_a + 2n - 1}{2}, \ \frac{m_b + n}{2} \right). \qquad (3.42) $$
Then
$$ \langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle = \left[ \langle \boldsymbol{\pi}_m\boldsymbol{\pi}_m^T \otimes A \rangle_m \right]_{Nn \times Nn}, \qquad \langle \boldsymbol{\pi}_n \otimes b \rangle = \left[ \langle \boldsymbol{\pi}_m \otimes b \rangle_m \right]_{Nn \times 1}. $$

Proof. The integrands of the matrix $\langle \boldsymbol{\pi}_n\boldsymbol{\pi}_n^T \otimes A \rangle$ are polynomials of degree at most $2n + m_a - 2$. Therefore they can be integrated exactly with a Gaussian quadrature rule of order $m$. A similar argument holds for $\langle \boldsymbol{\pi}_n \otimes b \rangle$.
Combining Lemma 7 with Corollary 6, we get the following proposition relat-
ing the Galerkin coefficients to the Jacobi matrices for A(s) and b(s) that depend
polynomially on s.
Proposition 8 Let m, ma , and mb be defined as in Lemma 7. Define [A]n (Jm ) to be
the Nn × Nn constant matrix with the i, j block of size n × n equal to [A(i, j)(Jm )]n×n
for i, j = 0, . . . , N − 1. Define [b]n (Jm ) to be the Nn × n constant matrix with the ith
n × n block equal to [bi (Jm )]n×n for i = 0, . . . , N − 1. Then the Galerkin coefficients
$X_g$ satisfy
$$ [A]_n(J_m)\,\mathrm{vec}(X_g^T) = [b]_n(J_m)e_0, \qquad (3.43) $$
where $e_0 = [1, 0, \ldots, 0]^T$ is an $n$-vector.
Notice that Proposition 8 provides a way to compute the exact matrix for the
Galerkin computation without any symbolic manipulation, but beware that m depends on both n and the largest degree of polynomial in A(s). Written in this form,
we have no trouble taking m to infinity, and we arrive at the main theorem of this
section.
Theorem 9 Using the notation of Proposition 8 and Corollary 6, the coefficients Xg
of the n-term Galerkin approximation of the solution x(s) to equation (2.3) satisfy
the linear system of equations
$$ [A]_n(J_\infty)\,\mathrm{vec}(X_g^T) = [b]_n(J_\infty)e_0, \qquad (3.44) $$
where $e_0 = [1, 0, \ldots, 0]^T$ is an $n$-vector.
Proof. Let $A^{(m_a)}(s)$ be the truncated power series of $A(s)$ up to order $m_a$, and let $b^{(m_b)}(s)$ be the truncated power series of $b(s)$ up to order $m_b$. Since $A(s)$ is analytic and bounded away from singularity for all $s \in [-1,1]$, there exists an integer $M$ such that $A^{(m_a)}(s)$ is also bounded away from singularity for all $s \in [-1,1]$ and all $m_a > M$ (although the bound may depend on $m_a$). Assume that $m_a > M$.

Define $m$ as in equation (3.42). Then by Proposition 8, the coefficients $X_g^{(m_a,m_b)}$ of the $n$-term Galerkin approximation to the solution of the truncated system satisfy
$$ [A^{(m_a)}]_n(J_m)\,\mathrm{vec}((X_g^{(m_a,m_b)})^T) = [b^{(m_b)}]_n(J_m)e_0. \qquad (3.45) $$
By the definition of $m$ (equation (3.42)), equation (3.45) holds for all integers greater than some minimum value. Therefore, we can take $m \to \infty$ without changing the solution at all, i.e.
$$ [A^{(m_a)}]_n(J_\infty)\,\mathrm{vec}((X_g^{(m_a,m_b)})^T) = [b^{(m_b)}]_n(J_\infty)e_0. \qquad (3.46) $$
Next we take $m_a, m_b \to \infty$ to get
$$ [A^{(m_a)}]_n(J_\infty) \to [A]_n(J_\infty), \qquad [b^{(m_b)}]_n(J_\infty) \to [b]_n(J_\infty), $$
which implies
$$ X_g^{(m_a,m_b)} \to X_g, \qquad (3.47) $$
as required.
Theorem 9 and Corollary 6 reveal the fundamental difference between the Galerkin
and pseudospectral approximations. We put them side-by-side for comparison.
$$ [A]_n(J_\infty)\,\mathrm{vec}(X_g^T) = [b]_n(J_\infty)e_0, \qquad A(J_n)\,\mathrm{vec}(X_p^T) = b(J_n)e_0. \qquad (3.48) $$
The difference lies in where the truncation occurs. For pseudospectral, the infinite
Jacobi matrix is first truncated, and then the operator is applied. For Galerkin, the
operator is applied to the infinite Jacobi matrix, and the resulting system is truncated. The question that remains is whether it matters. As we will see in the error
estimates in the next section, the interpolating pseudospectral approximation converges at a rate comparable to the Galerkin approximation, yet requires considerably
less computational effort.
3.8 Error Estimates
Asymptotic error estimates for polynomial approximation are well-established in
many contexts, and the theory is now considered classical. Our goal is to apply
the classical theory to relate the rate of geometric convergence to some measure of
singularity for the solution. We do not seek the tightest bounds in the most appropriate norm as in [16], but instead we offer intuition for understanding the asymptotic
rate of convergence. We also present a residual error estimate that may be more
useful in practice. We complement the analysis with two representative numerical
examples.
To discuss convergence, we need to choose a norm. In the statements and proofs,
we will use the standard L2 and L∞ norms generalized to RN -valued functions.
Definition 10 For a function $f : \mathbb{R} \to \mathbb{R}^N$, define the $L^2$ and $L^\infty$ norms as
$$ \|f\|_{L^2} := \sqrt{\sum_{i=0}^{N-1} \int_{-1}^{1} f_i^2(s)w(s)\, ds} \qquad (3.49) $$
$$ \|f\|_{L^\infty} := \max_{0 \le i \le N-1} \ \sup_{-1 \le s \le 1} |f_i(s)| \qquad (3.50) $$
With these norms, we can state error estimates for both Galerkin and pseudospectral methods.
Theorem 11 (Galerkin Asymptotic Error Estimate) Let ρ∗ be the sum of the
semi-axes of the greatest ellipse with foci at ±1 in which xi (s) is analytic for i =
0, . . . , N −1. Then for 1 < ρ < ρ∗ , the asymptotic error in the Galerkin approximation
is
$$ \|x - x_{g,n}\|_{L^2} \le C\rho^{-n}, \qquad (3.51) $$
where $C$ is a constant independent of $n$.
Proof. We begin with the standard error estimate for the Galerkin method [16,
Section 6.4] in the $L^2$ norm,
$$ \|x - x_{g,n}\|_{L^2} \le C\,\|x - R_n x\|_{L^2}. \qquad (3.52) $$
The constant $C$ is independent of $n$ but depends on the extremes of the bounded eigenvalues of $A(s)$. Under the consistency hypothesis, the operator $R_n$ is a projection operator such that
$$ \|x_i - R_n x_i\|_{L^2} \to 0, \qquad n \to \infty, \qquad (3.53) $$
for $i = 0, \ldots, N-1$. For our purpose, we let $R_n x$ be the expansion of $x(s)$ in terms of the Chebyshev polynomials,
$$ R_n x(s) = \sum_{k=0}^{n-1} a_k T_k(s), \qquad (3.54) $$
where $T_k(s)$ is the $k$th Chebyshev polynomial, and
$$ a_{k,i} = \frac{2}{\pi c_k} \int_{-1}^{1} x_i(s)T_k(s)(1-s^2)^{-1/2}\, ds, \qquad c_k = \begin{cases} 2 & \text{if } k = 0 \\ 1 & \text{otherwise} \end{cases} \qquad (3.55) $$
for $i = 0, \ldots, N-1$. Since $x(s)$ is continuous for all $s \in [-1,1]$ and $w(s)$ is normalized, we can bound
$$ \|x - R_n x\|_{L^2} \le \sqrt{N}\,\|x - R_n x\|_{L^\infty}. \qquad (3.56) $$
The Chebyshev series converges uniformly for functions that are continuous on [−1, 1],
so we can bound
$$ \|x - R_n x\|_{L^\infty} = \left\| \sum_{k=n}^{\infty} a_k T_k(s) \right\|_{L^\infty} \qquad (3.57) $$
$$ \le \sum_{k=n}^{\infty} \big\| |a_k| \big\|_\infty \qquad (3.58) $$
since $-1 \le T_k(s) \le 1$ for all $k$. To be sure, the quantity $|a_k|$ is the component-wise absolute value of the constant vector $a_k$, and the norm $\|\cdot\|_\infty$ is the standard infinity norm on $\mathbb{R}^N$.
Using the result stated in [39, Section 3], we have
$$ \limsup_{k\to\infty} |a_{k,i}|^{1/k} = \frac{1}{\rho_i^*}, \qquad i = 0, \ldots, N-1, \qquad (3.59) $$
where $\rho_i^*$ is the sum of the semi-axes of the greatest ellipse with foci at $\pm 1$ in which $x_i(s)$ is analytic. This implies that asymptotically
$$ |a_{k,i}| = O\!\left(\rho_i^{-k}\right), \qquad i = 0, \ldots, N-1, \qquad (3.60) $$
for $\rho_i < \rho_i^*$. We take $\rho = \min_i \rho_i$, which suffices to prove the estimate (3.51).
Theorem 11 recalls the well-known fact that the convergence of many polynomial
approximations (e.g. power series, Fourier series) depends on the size of the region
in the complex plane in which the function is analytic. Thus, the location of the
singularity nearest the interval [−1, 1] determines the rate at which the approximation
converges as one includes higher powers in the polynomial approximation. Next we
derive a similar result for the pseudospectral approximation using the fact that it
interpolates x(s) at the Gaussian points of the weight function w(s).
Theorem 12 (Pseudospectral Asymptotic Error Estimate) Let ρ∗ be the sum
of the semi-axes of the greatest ellipse with foci at ±1 in which xi (s) is analytic for
i = 0, . . . , N − 1. Then for 1 < ρ < ρ∗ , the asymptotic error in the pseudospectral
approximation is
$$ \|x - x_{p,n}\|_{L^2} \le C\rho^{-n}, \qquad (3.61) $$
where $C$ is a constant independent of $n$.
Proof. Recall that xc,n (s) is the Lagrange interpolant of x(s) at the Gaussian
points of w(s), and let xc,n,i(s) be the ith component of xc,n (s). We will use the result
from [79, Theorem 4.8] that
$$ \int_{-1}^{1} \left( x_i(s) - x_{c,n,i}(s) \right)^2 w(s)\, ds \le 4E_n^2(x_i), \qquad (3.62) $$
where $E_n(x_i)$ is the error of the best approximation polynomial in the uniform norm. We can, again, bound $E_n(x_i)$ by the error of the Chebyshev expansion (3.54). Using Theorem 2 with equation (3.62),
$$ \|x - x_{p,n}\|_{L^2} = \|x - x_{c,n}\|_{L^2} \le 2\sqrt{N}\,\|x - R_n x\|_{L^\infty}. $$
The remainder of the proof proceeds exactly as the proof of Theorem 11.
We have shown, using classical approximation theory, that the interpolating pseudospectral method and the Galerkin method have the same asymptotic rate of geometric convergence. This rate of convergence depends on the size of the region in
the complex plane where the functions x(s) are analytic. The structure of the matrix
equation reveals at least one singularity that occurs when A(s∗ ) is rank-deficient for
some s∗ ∈ R, assuming the right hand side b(s∗ ) does not fortuitously remove it.
For a general parameterized matrix, this fact may not be useful. However, for many
parameterized systems in practice, the range of the parameter is dictated by existence
and/or stability criteria. The value that makes the system singular is often known
and has some interpretation in terms of the model. In these cases, one may have an
upper bound on ρ, which is the sum of the semi-axes of the ellipse of analyticity, and
this can be used to estimate the geometric rate of convergence a priori.
We end this section with a residual error estimate – similar to residual error
estimates for constant matrix equations – that may be more useful in practice than
the asymptotic results.
Theorem 13 Define the residual r(y, s) as in equation (3.23), and let e(y, s) = x(s)−
y(s) be the RN -valued function representing the error in the approximation y(s). Then
$$ C_1\|r(y)\|_{L^2} \le \|e(y)\|_{L^2} \le C_2\|r(y)\|_{L^2} \qquad (3.63) $$
for some constants $C_1$ and $C_2$, which are independent of $y(s)$.
Proof. Since $A(s)$ is non-singular for all $s \in [-1,1]$, we can write
$$ A^{-1}(s)r(y,s) = y(s) - A^{-1}(s)b(s) = -e(y,s), \qquad (3.64) $$
so that
$$ \|e(y)\|_{L^2}^2 = \langle e(y)^T e(y) \rangle = \langle r^T(y)A^{-T}A^{-1}r(y) \rangle. $$
Since $A(s)$ is bounded, so is $A^{-1}(s)$. Therefore, there exist constants $C_1^*$ and $C_2^*$ that depend only on $A(s)$ such that
$$ C_1^*\langle r^T(y)r(y) \rangle \le \langle e^T(y)e(y) \rangle \le C_2^*\langle r^T(y)r(y) \rangle. \qquad (3.65) $$
Taking the square root yields the desired result.
Theorem 13 states that the L2 norm of the residual behaves like the L2 norm of
the error. In many cases, this residual error may be much easier to compute than the
true L2 error. However, as in residual error estimates for constant matrix problems,
the constants in Theorem 13 will be large if the bounds on the eigenvalues of A(s)
are large. We apply these results in the next section with two numerical examples.
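In practice the residual norm in Theorem 13 can be estimated with a quadrature rule finer than the approximation itself. A Python sketch for the $2 \times 2$ example follows, with a deliberately crude approximation $y(s)$ chosen only to exercise the estimator.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Sketch of the residual error estimate of Theorem 13: given coefficients Y of a
# polynomial approximation y(s) = Y pi_n(s), estimate ||A(s)y(s) - b(s)||_{L2}
# with a quadrature rule finer than the approximation order.
def residual_norm(Y, A, b, n_quad=50):
    nodes, w = leggauss(n_quad)
    nu = w / 2.0
    n = Y.shape[1]
    scale = np.sqrt(2 * np.arange(n) + 1)
    total = 0.0
    for s, v in zip(nodes, nu):
        y = Y @ (scale * legval(s, np.eye(n)))
        r = A(s) @ y - b(s)
        total += v * (r @ r)
    return np.sqrt(total)

eps = 0.4
A = lambda s: np.array([[1 + eps, s], [s, 1.0]])
b = lambda s: np.array([2.0, 1.0])
Y = np.zeros((2, 5))                            # crude constant approximation
Y[:, 0] = np.linalg.solve(A(0.0), b(0.0))       # pi_0 = 1, so column 0 is the constant
print(residual_norm(Y, A, b))
```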
3.9 Numerical Examples
We examine two simple examples of spectral methods applied to parameterized matrix
equations. The first is a 2 × 2 symmetric parameterized matrix, and the second is
a discretized second order ODE. In both cases, we relate the convergence of the
spectral methods to the size of the region of analyticity and verify this relationship
numerically. We also compare the behavior of the true error to the behavior of the
residual error estimate from Theorem 13.
To keep the computations simple, we use a constant weight function w(s). The
corresponding orthonormal polynomials are the normalized Legendre polynomials,
and the Gauss points are the Gauss-Legendre points.
3.9.1 A 2 × 2 Parameterized Matrix Equation
We return to the example introduced in Chapter 2. Let ε > 0, and consider the
following parameterized matrix equation
$$ \begin{bmatrix} 1+\varepsilon & s \\ s & 1 \end{bmatrix} \begin{bmatrix} x_0(s) \\ x_1(s) \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \qquad (3.66) $$
For this case, we can easily compute the exact solution,
$$ x_0(s) = \frac{2-s}{1+\varepsilon-s^2}, \qquad x_1(s) = \frac{1+\varepsilon-2s}{1+\varepsilon-s^2}. \qquad (3.67) $$
Both of these functions have poles at $s = \pm\sqrt{1+\varepsilon}$, so the sum of the semi-axes of the ellipse of analyticity is bounded, i.e. $\rho < \sqrt{1+\varepsilon}$.
Figure 3.1: The convergence of the spectral methods applied to equation (3.66). The figure on the left plots the $L^2$ error as the order of approximation increases, and the figure on the right plots the residual error estimate. The stairstep behavior relates to the fact that $x_0(s)$ and $x_1(s)$ are odd functions over $[-1,1]$.

Notice that the matrix is linear
in s, and the right hand side has no dependence on s. Thus, Corollary 5 implies that
the Galerkin approximation is equal to the pseudospectral approximation for all n;
there is no need to solve the system (3.29) to compute the Galerkin approximation.
In Figure 3.1 we plot both the true L2 error and the residual error estimate for four
values of ε. The results confirm the analysis.
3.9.2 A Parameterized Second Order ODE
Consider the second order boundary value problem
$$ \frac{d}{dt}\left( \alpha(s,t)\frac{du}{dt} \right) = 1, \qquad t \in [0,1] \qquad (3.68) $$
$$ u(0) = 0 \qquad (3.69) $$
$$ u(1) = 0 \qquad (3.70) $$
where, for $\varepsilon > 0$,
$$ \alpha(s,t) = 1 + 4\cos(\pi s)(t^2 - t), \qquad s \in [\varepsilon, 1]. \qquad (3.71) $$
Figure 3.2: The convergence of the residual error estimate for the Galerkin and pseudospectral approximations applied to the parameterized matrix equation (3.73).
The exact solution is
$$ u(s,t) = \frac{1}{8\cos(\pi s)} \ln\!\left( 1 + 4\cos(\pi s)(t^2 - t) \right). \qquad (3.72) $$
The solution $u(s,t)$ has a singularity at $s = 0$ and $t = 1/2$. Notice that we have adjusted the range of $s$ to be bounded away from 0 by $\varepsilon$. We use a standard piecewise linear Galerkin finite element method with 512 elements in the $t$ domain to construct a stiffness matrix parameterized by $s$, i.e.
$$ (K_0 + \cos(\pi s)K_1)\,x(s) = b. \qquad (3.73) $$
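The affine structure $K(s) = K_0 + \cos(\pi s)K_1$ is straightforward to reproduce. The Python sketch below uses central finite differences on a much coarser grid rather than the 512-element finite element discretization used for the experiments, so it only illustrates the assembly and a pointwise solve.

```python
import numpy as np

# Finite-difference sketch of the parameterized system (3.73); the dissertation's
# experiments use piecewise linear finite elements instead.
m = 64                                             # number of cells in t
h = 1.0 / m
t_mid = (np.arange(m) + 0.5) * h                   # cell midpoints

def stiffness(coef):
    """Tridiagonal matrix for u -> -d/dt( coef(t) du/dt ) with u(0) = u(1) = 0."""
    c = coef(t_mid)
    K = np.zeros((m - 1, m - 1))
    for i in range(m - 1):
        K[i, i] = (c[i] + c[i + 1]) / h**2
        if i > 0:
            K[i, i - 1] = -c[i] / h**2
        if i < m - 2:
            K[i, i + 1] = -c[i + 1] / h**2
    return K

K0 = stiffness(lambda t: np.ones_like(t))          # contribution of the constant 1
K1 = stiffness(lambda t: 4.0 * (t**2 - t))         # contribution of 4*(t^2 - t)

s = 0.3
rhs = -np.ones(m - 1)                              # d/dt(alpha du/dt) = 1  ->  -(alpha u')' = -1
u = np.linalg.solve(K0 + np.cos(np.pi * s) * K1, rhs)
t_grid = np.arange(1, m) * h
u_exact = np.log(1 + 4 * np.cos(np.pi * s) * (t_grid**2 - t_grid)) / (8 * np.cos(np.pi * s))
print(np.max(np.abs(u - u_exact)))                 # O(h^2) discretization error
```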
Figure 3.2 shows the convergence of the residual error estimate for both Galerkin and
pseudospectral approximations as n increases. (Despite having the exact solution
(3.72) available, we do not present the decay of the L2 error; it is dominated entirely
by the discretization error in the t domain.) As ε gets closer to zero, the geometric
convergence rate of the spectral methods degrades considerably. Also, note that each
element of the parameterized stiffness matrix is an analytic function of s, but figure
3.2 verifies that the less expensive pseudospectral approximation converges at the
same rate as the Galerkin approximation.
3.10 Summary
We derived two basic spectral methods: (i) the interpolatory pseudospectral method,
which approximates the coefficients of the truncated Fourier series with Gaussian
quadrature formulas, and (ii) the Galerkin method, which finds an approximation in
a finite dimensional subspace by requiring that the equation residual be orthogonal to
the approximation space. The primary work involved in the pseudospectral method
is solving the parameterized system at a finite set of parameter values, whereas the
Galerkin method requires the solution of a coupled system of equations many times
larger than the original parameterized system.
We showed that one can interpret the differences between these two methods
as a choice of when to truncate an infinite linear system of equations. Employing
this relationship we derived conditions under which these two approximations are
equivalent. In this case, there is no reason to solve the large coupled system of
equations for the Galerkin approximation.
Using classical techniques, we presented asymptotic error estimates relating the
decay of the error to the size of the region of analyticity of the solution; we also
derived a residual error estimate that may be more useful in practice. We verified the
theoretical developments with two numerical examples: a 2 × 2 matrix equation and
a finite element discretization of a parameterized second order ODE.
The popularity of spectral methods for PDEs stems from their infinite (i.e. geometric) order of convergence for smooth functions compared to finite difference
schemes. We have the same advantage in the case of parameterized matrix equations, plus the added bonus that there are no boundary conditions to consider. The
primary concern for these methods is determining the value of the parameter closest
to the domain that renders the system singular.
3.11 Application — PageRank
Google’s PageRank model [72] provides an excellent example of a parameterized matrix equation with a single parameter in a real application. We present the results
of applying spectral methods to this parameterized system and interpret some of the
results. The goal of PageRank is to develop a measure of importance to rank web
pages (or nodes in a graph, more generally). We will briefly derive the PageRank
system here – omitting some important details – starting from the Markov chain formulation. The interested reader is encouraged to explore references [72] and [51] for
further details.
The PageRank model begins with a directed graph on N nodes and posits a random surfer traveling from node to node. At each step, the surfer either follows a
link in the graph with probability α or jumps to a node in the graph uniformly with
probability 1 − α. If the surfer follows a link, then he chooses amongst the outlinks of
the current page with uniform probability. This model defines an irreducible, aperi-
odic Markov chain with a unique stationary probability distribution. The stationary
distribution measures the importance of a page, and this metric is used to rank a
given subset of pages, such as those that are returned by a search query.
The graph begets a column stochastic transition probability matrix P, and the
vector of stationary probabilities (i.e. the PageRank vector) is the unique eigenvector
$x(\alpha)$ with unit one-norm that satisfies
$$ (\alpha P + (1-\alpha)ve^T)\,x(\alpha) = x(\alpha), \qquad (3.74) $$
where $e$ is an $N$-vector of ones and $v$ is the probability distribution for the random jump (a uniform jump corresponds to $v = e/N$). Equivalently, $x(\alpha)$ solves the parameterized matrix equation
$$ (I - \alpha P)x(\alpha) = (1-\alpha)v. \qquad (3.75) $$
The current practice is to choose a value for α based on some modelling assumptions,
solve the constant linear system of equations, and use the resulting solution vector as
the PageRank vector.
The choice of α is an unsettled issue in the PageRank literature, with the most
popular choices being 0.85 [60] and 0.5 [4]. However, if we instead leave α variable, we
can prescribe a weight function over the interval [0, 1] and interpret α as a random
variable with a given probability density function. This idea has been explored in [19].
Notice that both the matrix and the right hand side in equation (3.75) depend linearly on the parameter α. Therefore, by Corollary 5, the Galerkin and pseudospectral
approximations produce the same polynomial representations for n ≥ 2.
The matrix I − αP is actually singular for α = 1, since by equation (3.75),
(I − P)y = 0 for some non-zero y. Initially, this may suggest that the solution
has a singularity at the endpoint of the interval, thus destroying all hope for rapid
convergence of the spectral methods. However, by construction (see equation (3.74)),
the point α = 1 actually constitutes a removable discontinuity in the functions x(α).
The closest true singularity on the real line lies beyond the endpoint 1.
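A minimal Python sketch of the pseudospectral treatment of (3.75) follows. The four-node graph, the uniform weight on [0, 1], and the rule size are illustrative stand-ins for the har500cc experiment described below.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Illustrative sketch: treat alpha as uniform on [0, 1] and estimate the mean
# PageRank vector <x> with an n-point Gauss-Legendre rule mapped to [0, 1].
# The 4-node graph and its link structure are made up for demonstration.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}     # node -> outlinks
N = 4
P = np.zeros((N, N))                               # column-stochastic transition matrix
for j, outs in links.items():
    P[outs, j] = 1.0 / len(outs)
v = np.ones(N) / N                                 # uniform teleportation vector

n = 20
nodes, w = leggauss(n)
alphas = 0.5 * (nodes + 1.0)                       # map [-1, 1] -> [0, 1]
weights = 0.5 * w                                  # weights now sum to 1

mean_x = sum(wq * np.linalg.solve(np.eye(N) - a * P, (1 - a) * v)
             for a, wq in zip(alphas, weights))
print(mean_x, mean_x.sum())                        # each x(alpha) sums to 1, so <x> does too
```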
We test convergence by comparing the expectation and standard deviation of a
pseudospectral approximation with a semi-analytical solution. Using the symbolic
toolbox in Matlab, we compute the PageRank vector as a rational function of α on
the 335 node connected component of the har500cc graph [65]. Using Mathematica,
we numerically integrate to compute the expectation and standard deviation in 32-digit arithmetic, which resolve the exact solution when converted to a double precision
number. Figure 3.3 plots the convergence of expectation (solid circles) and standard
deviation (plus signs) for four different Beta distribution models of α.
The covariance structure in the components of the vector x(α) reveals new information about the underlying graph, and its elements may prove useful for additional
websearch applications such as spam detection [34].
Figure 3.3: The x-axis counts the number of points in the Gaussian quadrature rule for the expectation and standard deviation of PageRank. This is equivalent to the number of basis functions in the pseudospectral approximation. The y-axis measures the difference between the approximation and the exact solution. Solid circles correspond to expectation and plus signs correspond to standard deviation. The different colors correspond to the following Beta distributions for α: β(2, 16, 0, 1) – blue, β(0, 0, 0.6, 0.9) – salmon, β(1, 1, 0.1, 0.9) – green, β(−0.5, −0.5, 0.2, 0.7) – red, where β(a, b, l, r) signifies distribution parameters a and b and endpoints l and r.
Chapter 4
Spectral Methods — Multivariate Approximation
In this chapter we extend the univariate spectral methods to multivariate analogs,
which will handle systems that depend on multiple parameters. In other words, we
assume s = (s1 , . . . , sd ) where d > 1. Multivariate approximation schemes are notoriously expensive and quickly push the bounds of feasibility for even a modest number
of variables. There is a vast literature on recent methods for alleviating this curse of
dimensionality for certain classes of problems; Griebel provides an excellent survey of
this work in [40]. Many of these techniques are now applied in the context of differential equations with stochastic inputs, where the inputs are typically represented as
a function of a set of parameters [67, 89, 23, 10]. Such problems are closely related to
parameterized matrix equations since discretization in the spatial and temporal domains often yields a linear system of equations whose elements depend on the input
parameters.
In Section 4.5, we develop a scheme that is robust and efficient for many types of
problems within the parameterized matrix equation context. We will compare this
proposed scheme to standard multivariate approximation schemes on two problems
that showcase the features of the developed heuristic. We will also tackle a standard
test problem from the literature. As a final performance assessment, in Chapter 5 we
apply the proposed scheme to an engineering application where the standard methods
are infeasible.
Before we delve into heuristics, we present some of the fundamental concepts of multivariate approximation in a spectral methods context.
4.1 Tensor Product Extensions
The most natural extension of the univariate methods to multivariate methods is via
product basis functions. Since each parameter induces an independent coordinate
direction in the parameter space, and since we assume a separable weight function on the space, $w(s) = \prod_k w_k(s_k)$, we can construct a multivariate orthogonal basis by taking products of the univariate basis functions. At this point, it is most convenient to employ the multi-index notation introduced in Chapter 2. Each multi-index $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}^d$ has an associated basis function
$$ \pi_\alpha(s) = \prod_{k=1}^{d} \pi_{\alpha_k}(s_k). \qquad (4.1) $$
The separability of the weight function implies that these multivariate polynomials
are orthogonal with respect to $w(s)$ since, for $\alpha, \beta \in \mathbb{N}^d$,
$$ \langle \pi_\alpha\pi_\beta \rangle = \prod_{k=1}^{d} \int_{-1}^{1} \pi_{\alpha_k}(s_k)\pi_{\beta_k}(s_k)w_k(s_k)\, ds_k = \delta_{\alpha\beta}, \qquad (4.2) $$
where δαβ is one if each component of α and β are the same, and zero otherwise.
Therefore, the multivariate orthogonal polynomials {πα (s), α ∈ Nd } are a basis
for square-integrable functions of the parameter space. In other words, any square
integrable function $f : [-1,1]^d \to \mathbb{R}$ can be expressed as
$$ f(s) = \sum_{\alpha \in \mathbb{N}^d} \langle f\pi_\alpha \rangle \, \pi_\alpha(s), \qquad (4.3) $$
where $\langle f\pi_\alpha \rangle$ are the Fourier coefficients.
For a finite term approximation, we use a subset of these basis functions and
compute approximate Fourier coefficients. A full tensor product basis consists of all
possible products of the set of univariate polynomials for each parameter. Thus, if
we define n = (n1 , . . . , nd ) ∈ Nd , then we can write the full tensor product basis (or
just tensor basis) compactly with vector and Kronecker notation as
$$ \boldsymbol{\pi}_n(s) = \boldsymbol{\pi}_{n_1}(s_1) \otimes \cdots \otimes \boldsymbol{\pi}_{n_d}(s_d), \qquad (4.4) $$
where $\boldsymbol{\pi}_{n_k}(s_k)$ are the first $n_k$ univariate basis polynomials for the parameter $s_k$. The number of terms in this basis is $\prod_k n_k$, so that if all the $n_k$'s are equal, then this is $n_k^d$. In words, the number of basis functions in the basis grows exponentially as the
number of parameters increases.
In a similar way, a tensor product Gaussian quadrature rule is constructed from
a set of univariate rules for each coordinate. Denote the set of d-variate quadrature
nodes by
$$ \boldsymbol{\lambda}_n = \boldsymbol{\lambda}_{n_1} \times \cdots \times \boldsymbol{\lambda}_{n_d}, \qquad (4.5) $$
where λnk = (λ0 , . . . , λnk ) are the nk -point Gaussian quadrature nodes for the weight
function wk (sk ). Each d-variate point λα can be indexed by a multi-index α. The
associated weight να is computed as the product of the weights for each coordinate
in the node, i.e.
$$ \nu_\alpha = \prod_{k=1}^{d} \nu_{\alpha_k} = \prod_{k=1}^{d} \frac{1}{\|\boldsymbol{\pi}_{n_k}(\lambda_{\alpha_k})\|_2^2} = \frac{1}{\|\boldsymbol{\pi}_n(\lambda_\alpha)\|_2^2}. \qquad (4.6) $$
Then we can write the integral of a function $f : [-1,1]^d \to \mathbb{R}$ as
$$ \langle f \rangle = \sum_{\alpha \le n-1} f(\lambda_\alpha)\nu_\alpha + R(f), \qquad (4.7) $$
where R(f ) is the remainder. This is the multivariate version of equation (3.9). The
notation α ≤ n is short-hand for the set of multi-indices {α ∈ Nd : 0 ≤ αk ≤ nk −1}.
We will use the same notation for the Gaussian quadrature approximation as in the
univariate case, where it is understood that the subscript n is a d-tuple of integers,
i.e.
$$ \langle f \rangle_n \equiv \sum_{\alpha \le n-1} f(\lambda_\alpha)\nu_\alpha. \qquad (4.8) $$
There is a subtle but crucial difference between using the multi-indices to index the
multivariate polynomials and using them for the quadrature points/weights. For the
polynomials, a given multi-index will always refer to the same polynomial, regardless
of the number of basis functions in an approximation. For example, π(2,3) (s1 , s2 ) =
π2 (s1 )π3 (s2 ), always. However, the specific points on the real line in the quadrature
formulas will depend on the total number of points in the formula. For example,
a quadrature rule with n = (3, 3) will have λ(1,2) ≈ (0, 0.77), but for n = (4, 4),
λ(1,2) ≈ (−0.34, 0.34). To avoid confusion, we will typically refer to the quadrature
rule for a fixed n, and we will state explicitly otherwise.
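The tensor rule (4.5)-(4.8) is simple to realize in code. The Python sketch below builds the grid for the constant weight from univariate Gauss-Legendre rules and checks it on a separable integrand; the rule sizes and the integrand are illustrative.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import leggauss

# Sketch of the tensor product quadrature rule for the constant weight on [-1, 1]^d.
n = (3, 4, 5)                                      # points per coordinate, d = 3
rules = [leggauss(nk) for nk in n]                 # (nodes, weights) for each s_k

f = lambda s: np.exp(np.sum(s))                    # separable test integrand
total = 0.0
for alpha in product(*[range(nk) for nk in n]):    # multi-indices alpha <= n - 1
    node = np.array([rules[k][0][alpha[k]] for k in range(len(n))])
    weight = np.prod([rules[k][1][alpha[k]] / 2.0 for k in range(len(n))])
    total += weight * f(node)

exact = np.sinh(1.0) ** len(n)                     # (1/2 * int_{-1}^{1} e^s ds)^3
print(total, exact)
```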
We can use the tensor grid of Gaussian quadrature points to construct a multivariate Lagrange interpolant, which is equivalent to spectral collocation (Section 3.3) for multiple parameters. The multivariate basis polynomials become products of the univariate Lagrange interpolating basis polynomials (equation (3.18)). More precisely, the tensor product interpolant of the solution x(s) of equation (2.3) is given by

x_{c,n}(s) = Σ_{α ≤ n−1} x(λ_α) ℓ_α(s) ≡ X_c l_n(s),   (4.9)

where

ℓ_α(s) = ∏_{k=1}^d ∏_{j_k=0, j_k ≠ α_k}^{n_k−1} (s_k − λ_{j_k}) / (λ_{α_k} − λ_{j_k}).   (4.10)

The N × ∏_k n_k constant matrix X_c (the subscript c is for collocation) has one column for each x(λ_α), and l_n(s) is a vector of the multivariate Lagrange basis polynomials.
The tensor product pseudospectral approximation is identical to the univariate case but employs the multi-index notation:

x_{p,n}(s) = Σ_{α ≤ n−1} ⟨x π_α⟩_n π_α(s) ≡ X_p π_n(s),   (4.11)

where X_p contains the pseudospectral coefficients. Using these tensor constructions, it is straightforward to show – by adapting arguments from the proof of Theorem 2 with appropriately placed Kronecker products – that the multivariate collocation interpolant is equivalent to the multivariate pseudospectral approximation.
The coefficients of the Galerkin approximation are computed in precisely the same way as in the univariate case (see equation (3.29)) except using the multivariate basis polynomials. In other words, to compute X_g in the approximation x_{g,n} = X_g π_n(s), we solve the linear system of equations

⟨π_n π_n^T ⊗ A⟩ vec(X_g) = ⟨π_n ⊗ b⟩.   (4.12)
We will see in the next section that we can choose any subset of the multivariate
orthogonal polynomials to construct the Galerkin approximation. We use the tensor
construction here to develop analogs of the connections between the spectral methods
for the multivariate case.
To proceed with this development, we need to define a multivariate function of matrices. Let f : [−1, 1]^d → R be analytic with series representation

f(s) = Σ_{α∈N^d} f_α s^α,   (4.13)

with constants f_α. Let J_n = (J_{n_1}, . . . , J_{n_d}) be a collection of d matrices, and define

f(J_n) = Σ_{α∈N^d} f_α J_1^{α_1} ⊗ · · · ⊗ J_d^{α_d}.   (4.14)

By the Kronecker construction, the matrix f(J_n) has dimension equal to the product of the dimensions of the J_k. We use the notation J_k here to invoke the Jacobi matrices (equation (3.5)) for each coordinate direction s_k with respective weight function w_k(s_k).
Using these notational conventions, Theorems 9 and 4 hold as written under the tensor basis construction, as well as Corollaries 5 and 6 and Proposition 8. The multivariate versions of these ideas merely require the multi-indices to replace the single indices in the basis elements and a d-tuple of integers for n. The proofs are structured exactly as in the univariate case but liberally employ Kronecker products and their mixed product property [44].
Despite the straightforward theoretical extensions, the tensor constructions are highly impractical. The exponential growth in the size of the orthogonal basis and/or the number of quadrature points in the tensor grid makes computation entirely infeasible for a moderate number of parameters; even a linear approximation of the solution (two points per parameter) will require 2^d points. To combat this exponential growth, we seek a Galerkin approximation with carefully chosen basis elements such that the number of elements is far smaller than the tensor basis.
4.2  Non-Tensor Basis Functions
One distinct advantage of the Galerkin method over the decoupled pseudospectral
method is that one may construct the approximation space using any subset of the
multivariate basis functions. Therefore, if a crystal ball existed that could tell us
precisely which basis functions were most important (i.e. the ones whose associated
coefficients had the largest magnitude), then we could use only those basis functions
for the approximation. While no such oracle actually exists, we can take advantage
of the theoretical decay of the coefficients to effectively find the most important basis
functions. In essence, for the functions x(s) that satisfy the parameterized matrix equation (2.3), we expect a particular type of decay in the coefficients due to the quasi-rational structure of the solution (equation (2.5)). We do not expect that x(s) will have entirely arbitrary or independent Fourier coefficients. Therefore, by constructing
an inexpensive, low-order approximation with few terms, we can often discern which
basis functions to add to best improve the approximation. This heuristic yields an
iterative scheme for finding an efficient finite term approximation for the solution.
The scheme we propose for approximating x(s) is naturally adaptive. If the solution varies differently along different parameters, the scheme will detect and exploit
that anisotropy to efficiently choose an appropriate basis set. This detection is based
on approximating the variance contributions from an ANOVA-like decomposition [57].
There is an intimate connection between the ANOVA decomposition and the Fourier
expansion of a square-integrable function – the ANOVA decomposition is equivalent
to partitioning the Fourier coefficients according to subsets of associated parameters.
We use this connection along with estimated asymptotic decay rates of the Fourier
coefficients to construct the heuristic that drives our scheme.
4.3  A Multivariate Spectral Galerkin Method
In this section, we recall the spectral Galerkin method proposed in Chapter 3 to
approximate the solution x(s), and we extend it to the case of multiple parameters
(d > 1). The spectral Galerkin method computes a finite dimensional approximation
to x(s) such that each element of the equation residual is orthogonal to the approximation space. In one dimension, we use an approximation space spanned by the first n orthonormal polynomials, where the orthogonality is with respect to the given weight function. In multiple dimensions, we construct the multivariate basis functions as products of one-dimensional orthonormal polynomials – see equation (4.1).
The finite dimensional approximation space for the Galerkin method is defined as the
span of a subset Π of the multivariate basis functions. This subset of basis functions
is given by a subset of the multi-indices I ⊂ N^d, i.e.

Π = {π_α(s) : α ∈ I ⊂ N^d}.   (4.15)
In section 4.5.1 we discuss choices for this basis set. In what follows, we restate – for
the sake of clarity – the Galerkin method including the generalization to arbitrary
approximation spaces. Define the vector-valued function
r(y, s) = A(s)y(s) − b(s).
(4.16)
We seek an R^N-valued polynomial x_g(s) ∈ span(Π) such that

⟨r_i(x_g) π_α⟩ = 0,   i = 0, . . . , N − 1,   α ∈ I.   (4.17)
Let π(s) be the vector of elements in Π. We can write equations (4.17) in matrix notation as

⟨r(x_g) π^T⟩ = 0   (4.18)

or equivalently

⟨A x_g π^T⟩ = ⟨b π^T⟩.   (4.19)

Since each component of x_g(s) is a polynomial from the span of Π, we can write its expansion as

x_g(s) = Σ_{α∈I} x_{g,α} π_α(s) ≡ Xπ(s),   (4.20)

where X is a constant matrix of size N × |I|; the number |I| is the number of multi-indices in I. Then equation (4.19) becomes

⟨A X π π^T⟩ = ⟨b π^T⟩.   (4.21)

Using the vec notation [37, Section 4.5], we can rewrite (4.21) as

⟨π π^T ⊗ A⟩ vec(X) = ⟨π ⊗ b⟩,   (4.22)

where vec(X) is an N|I| × 1 constant vector equal to the columns of X stacked on top of each other. The constant matrix ⟨π π^T ⊗ A⟩ has size N|I| × N|I| and a distinct block structure.
The work required to compute the Galerkin approximation depends on how one
computes the integrals in equation (4.22). If we assume that the cost of forming the
system is negligible, then the costly part of the computation is solving the system
(4.22). We expect an operation count of O(N 3 |I|3 ), in general, but many applications
beget systems with sparsity or exploitable structure. Notice that any reduction in |I|
dramatically reduces required work. The goal of the proposed approximation scheme
is to generate an accurate approximation with a small number of expansion terms. In
the next section we motivate a heuristic for choosing the basis functions by examining
the relation between the multivariate Fourier series and the ANOVA decomposition.
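To make the system (4.22) concrete, the sketch below assembles and solves the Galerkin system for a small, hypothetical 2 × 2 parameterized matrix in d = 2 parameters. The matrix A(s), the right hand side, and the total-degree basis are illustrative assumptions, and the integrals are approximated with a tensor Gauss–Legendre rule (uniform weight assumed), which anticipates the G-NI variant discussed in Chapter 5.

    import itertools
    import numpy as np
    from numpy.polynomial import legendre

    def legp(j, s):                       # orthonormal Legendre, weight 1/2
        c = np.zeros(j + 1); c[j] = 1.0
        return np.sqrt(2 * j + 1) * legendre.legval(s, c)

    def pi_vec(index_set, s):             # vector pi(s) of multivariate basis values
        return np.array([np.prod([legp(a, sk) for a, sk in zip(alpha, s)])
                         for alpha in index_set])

    # Illustrative parameterized matrix and right hand side (not from the text).
    def A(s):  return np.array([[2.0 + s[0], 0.1], [0.1, 2.0 + s[1]]])
    def b(s):  return np.array([1.0, 1.0])

    # Basis: full polynomials of total degree <= 2 in d = 2 parameters.
    I = [a for a in itertools.product(range(3), repeat=2) if sum(a) <= 2]
    N, M = 2, len(I)

    # Tensor Gauss-Legendre rule, 5 points per coordinate, weight 1/2 per coordinate.
    x1, w1 = legendre.leggauss(5)
    G = np.zeros((N * M, N * M)); rhs = np.zeros(N * M)
    for (i, si), (j, sj) in itertools.product(enumerate(x1), repeat=2):
        s, w = np.array([si, sj]), (w1[i] / 2.0) * (w1[j] / 2.0)
        p = pi_vec(I, s)
        G += w * np.kron(np.outer(p, p), A(s))     # approximates <pi pi^T (x) A>
        rhs += w * np.kron(p, b(s))                # approximates <pi (x) b>
    X = np.linalg.solve(G, rhs).reshape(M, N).T    # coefficients, one column per alpha
    print(X[:, 0])   # coefficients of the constant basis function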
4.4  L² Decompositions
In this section we briefly recall both the Fourier series and the ANOVA decomposition,
and we discuss the connections between the two.
4.4.1  ANOVA Decomposition
A square-integrable function f (s) defined on [−1, 1]d admits a decomposition into a
sum of functions representing the effects of each subset of the independent variables
on the total variability of f (s). We closely follow the notation from Liu and Owen [57],
and we refer to that paper and its references for more detailed information.
For subsets u ⊆ {1, . . . , d}, let |u| denote the cardinality of u. The set of indices in {1, . . . , d} not in u is denoted by u′. Let s_u denote the |u|-tuple of components s_k with k ∈ u. The domain of s_u is the sub-hypercube [−1, 1]^{|u|}.
Let ⟨f⟩ be the mean of f(s) over the domain, and let σ² = ⟨(f − ⟨f⟩)²⟩ be the variance. The notation ⟨f⟩_u means integration against the weight function w_u(s) = ∏_{k∈u} w_k(s_k) with respect to the variables s_u ∈ [−1, 1]^{|u|}, which results in a function that does not depend on s_u.
The ANOVA decomposition of f(s) is given by

f(s) = Σ_{u⊆{1,...,d}} f_u(s),   (4.23)

where f_u(s) depends only on s_u. There are precisely 2^d terms in this expansion. The functions f_u are obtained by integrating over s_{u′} and subtracting from f all terms for strict subsets of u, which results in a function depending only on s_u,

f_u(s) = ⟨f⟩_{u′} − Σ_{v⊂u} f_v(s).   (4.24)

The convention is to set f_∅ = ⟨f⟩. It follows that ⟨f_u(s) f_v(s)⟩ = 0 for v ≠ u, i.e. the
f_u are orthogonal. The variance of f_u is σ_u², and the following identity holds,

σ² = Σ_{u⊆{1,...,d}} σ_u²,   (4.25)

where σ_∅² = 0. The ANOVA decomposition is so named because it decomposes the variance into contributions by subsets of the independent variables. The functions f_u(s) for the singletons u = {k}, k = 1, . . . , d, are called the main effects, since these functions depend on only one of the variables. Functions f_u(s) where u is not a singleton are called interaction effects.
For the vector-valued function x(s) that solves equation (2.3), we define xu (s)
component-wise, so that xu (s) is also a vector-valued function.
4.4.2  Fourier Series
The Hilbert space L² = {f : [−1, 1]^d → R | ⟨f²⟩ < ∞} is spanned by the multivariate orthogonal polynomials π_α(s) with α ∈ N^d. Therefore, each function f ∈ L² can be written

f(s) = Σ_{α∈N^d} f̂_α π_α(s),   (4.26)

where f̂_α = ⟨f π_α⟩ are the Fourier coefficients. Let I ⊂ N^d be a finite subset and Π be the set of basis polynomials associated with I. The projection

P_I f(s) = Σ_{α∈I} f̂_α π_α(s)   (4.27)

is the best approximation polynomial in the L² norm amongst all polynomials in span(Π).
4.4.3  Connections Between ANOVA and Fourier Series
We first rewrite the Fourier series (4.26) as

f(s) = Σ_{α_1=0}^∞ Σ_{α_2=0}^∞ · · · Σ_{α_d=0}^∞ f̂_{α_1,α_2,...,α_d} π_{α_1}(s_1) π_{α_2}(s_2) · · · π_{α_d}(s_d)
     ≡ Σ_{α_{{1,...,d}}=0}^∞ f̂_{α_{{1,...,d}}} ∏_{i∈{1,...,d}} π_{α_i}(s_i).
Notice that integrating against some subset u′ of the variables s leaves

⟨f⟩_{u′} = Σ_{α_u=0}^∞ f̂_{α_u} ∏_{i∈u} π_{α_i}(s_i),   (4.28)

since π_0(s_i) = 1 and ∫_{−1}^{1} π_{α_i}(s_i) w_i(s_i) ds_i = 0 for α_i > 0 and i = 1, . . . , d. The notation f̂_{α_u} denotes the Fourier coefficient with index α_i in the ith position for i ∈ u and zero in the jth position for j ∈ u′. Therefore, since ⟨f⟩ = f̂_{0,...,0}, we have the relations
f_u(s) = Σ_{α_u=1}^∞ f̂_{α_u} ∏_{i∈u} π_{α_i}(s_i),        σ_u² = Σ_{α_u=1}^∞ f̂_{α_u}².   (4.29)
In words, the ANOVA decomposition naturally partitions the Fourier series by its
indices. The nonzero components of the multi-index α of a coefficient fˆα prescribe
exactly to which f_u it belongs. In essence, the ANOVA decomposition collapses the infinite series associated with each component index to a single term, hence the 2^d terms in the ANOVA expansion.
The goal of the spectral Galerkin method is to approximate the Fourier coefficients
of x(s). Thus, if we merely group the Galerkin coefficients according to the nonzero
components of their multi-indices, then we have an approximation of the ANOVA
expansion. And summing the squares of each set of coefficients yields the variance
contributions. This connection motivates an adaptive scheme for choosing a good
approximation with a limited number of terms.
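This grouping is straightforward to carry out on a computed coefficient matrix. The sketch below groups squared coefficients by the nonzero support of their multi-indices to produce approximate variance contributions; the index set and coefficient values are placeholders.

    from collections import defaultdict
    import numpy as np

    def anova_contributions(index_set, coeffs):
        """Group squared coefficients by the set u of nonzero positions of alpha.
        Returns a dict mapping frozenset u -> approximate sigma_u^2."""
        sigma2 = defaultdict(float)
        for alpha, c in zip(index_set, coeffs):
            u = frozenset(k for k, a in enumerate(alpha) if a > 0)
            if u:                       # skip the constant term (the mean)
                sigma2[u] += float(c) ** 2
        return dict(sigma2)

    # Placeholder coefficients for d = 2 and a total-degree-2 basis.
    I = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
    zhat = np.array([1.0, 0.5, 0.1, 0.2, 0.05, 0.01])
    print(anova_contributions(I, zhat))
    # {frozenset({0}): 0.29, frozenset({1}): 0.0101, frozenset({0, 1}): 0.0025}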
4.5  Developing A Heuristic
In this section we develop a heuristic for choosing the basis elements in the truncated
expansion to reduce the size of the Galerkin system (4.22) while maintaining a good
approximation. We treat the approximation as an iterative procedure. In other words,
given n we compute an approximation using the set of basis functions for some index
set parameterized by n. We then use the characteristics of the level n approximation
to choose the basis set for level n + 1. We repeat this procedure until some stopping criterion is satisfied. There are some straightforward implementation tricks to reuse
the solution at level n to help compute the solution at level n + 1, but we will not
discuss them here.
To develop the heuristic, it will be very convenient to assume that we are working
with the Chebyshev expansion of x(s), and the weight function is the product of one-dimensional weight functions for the Chebyshev polynomials, i.e. w_i(s_i) = (1 − s_i²)^{−1/2}. In this case, the truncated Chebyshev series is the truncated Fourier series.
Asymptotically, we expect that the Galerkin coefficients will behave much like the
Chebyshev coefficients, which justifies this assumption.
4.5.1  Incrementing a Basis Set
The most natural extension of the one dimensional basis to higher dimensions is
via the tensor products discussed in Section 4.1. This is equivalent to using an
approximation space of multivariate polynomials with largest degree in each variable
up to order n. (We assume for simplicity that all components of the multi-index n
are equal.) In other words, the set of multi-indices corresponding to the tensor basis is

I_n = {α ∈ N^d : max_{1≤i≤d} α_i ≤ n}.   (4.30)

There are |I_n| = n^d terms in this set, which makes it at best impractical and at worst infeasible for applications with a moderately large number of dimensions. An alternative basis uses the full polynomials, which is more standard for analysis of orthogonal polynomials in several variables [24]. This basis includes polynomials where the sum of the degrees in each variable is less than or equal to n,

I_n = {α ∈ N^d : α_1 + · · · + α_d ≤ n}.   (4.31)

The number of terms in this set is |I_n| = (n+d choose n). This is also the basis used for the so-called polynomial chaos methods [33, 90]. The number (n+d choose n) is smaller than n^d, but suffers the same exponential rate of growth asymptotically.
In fact, we expect the full polynomial basis to be a much more efficient approximation, particularly for low order approximations. We borrow a result from Bochner and Martin [11] by way of Boyd [13, Theorem 11] related to the asymptotic decay of the multivariate Chebyshev coefficients to justify this claim.

Theorem 14 (Multivariate Chebyshev Series) Let s_j, j = 1, . . . , d, denote an ordinary complex variable and let f(s_1, . . . , s_d) be analytic in the elliptic polycylinder

ε = { |s_j + √(s_j² + 1)| < r_j,  j = 1, . . . , d },   (4.32)

where r_j > 1. Then f(s_1, . . . , s_d) has the unique expansion

f(s_1, . . . , s_d) = Σ_{α∈N^d} a_α T_α(s_1, . . . , s_d),   (4.33)

which is valid in ε. The T_α(s_1, . . . , s_d) are the multivariate product Chebyshev polynomials. Convergence is uniform and absolute in every interior polycylinder E(ρ_1, . . . , ρ_d) for which 1 < ρ_j < r_j, j = 1, . . . , d. The coefficients satisfy the bound

|a_α| ≤ K(ρ) ∏_{k=1}^d ρ_k^{−α_k}   (4.34)

for some constant K(ρ).
From the bound (4.34), we expect the cross-term coefficients to decay faster than the coefficients associated with only a single parameter. Thus the tensor product expansion will use more basis functions than necessary to capture the essential features of the function in the Chebyshev space. We can trim the corners of the tensor product approximation to get the full polynomial expansion and a more efficient approximation; this metaphor makes sense after examining the basis functions in Figure 4.1. However, we will see examples in Section 4.6 where this bound is very loose, and even the full polynomial basis uses too many basis elements.
Examining the full and tensor sets of multi-indices, we notice a correspondence
between the tensor product set (4.30) and the standard infinity norm on Rd , and
similarly the full polynomial set (4.31) and the standard one norm on Rd . We can
generalize these sets by considering more general semi-norms, such as a weighted
p-norm. Consider the set

I_{γ,n} = { α ∈ N^d : ( Σ_{j=1}^d (γ_j α_j)^p )^{1/p} ≤ n },   γ ∈ R_+^d.   (4.35)

We can choose the parameters γ_i and p of this set to further reduce the number of terms in the basis. If γ_i ≥ 1 and p ≤ 1, then |I_n| ≤ (n+d choose n). The goal, however, is to
choose these parameters to construct a sequence of basis sets indexed by n that are
appropriate for the particular solution. For example, if the Chebyshev coefficients
corresponding to s_i decay much more slowly than those corresponding to s_j, then we can use the parameter γ_i to increase the number of included terms associated with s_i more quickly as n increases. Equivalently, we can use γ_j to exclude basis functions associated with s_j until n increases sufficiently.
Instead of choosing a single parameter p, we can choose a different p_u for each subset of variables s_u to create

I_{u,γ,n} = { α ∈ N^{|u|} : ( Σ_{j∈u} (γ_j α_j)^{p_u} )^{1/p_u} ≤ n },   γ ∈ R_+^{|u|}.   (4.36)

Then we can choose the full index set to be

I_{γ,n} = ∪_{u⊆{1,...,d}} I_{u,γ,n}.   (4.37)
Figure 4.1: For d = 2, the shaded squares represent the included multi-indices for the
various basis sets with n = 10. The weighted index set with no curvature has weights
1 and 2. The weighted index set with curvature has curvature parameter p = 0.7.
The parameters p_u will dictate the number of included Chebyshev coefficients associated with all the variables s_u. Examples of index sets in two dimensions corresponding to choices of γ and p_u are given in Figure 4.1. In what follows, we describe how to use the ANOVA information to choose the values γ and p_u for the basis set (4.37).
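A sketch of how the index sets (4.36)–(4.37) might be generated is given below; the weights and curvature parameters used in the example call are arbitrary placeholders, and the enumeration strategy is only one of several reasonable implementations.

    import itertools

    def weighted_index_set(n, gamma, p_u):
        """Union over subsets u of {alpha : (sum_{j in u} (gamma_j*alpha_j)^{p_u})^{1/p_u} <= n},
        where alpha is supported on u.  p_u maps frozenset(u) -> curvature parameter."""
        d = len(gamma)
        included = {tuple([0] * d)}                     # constant term
        for r in range(1, d + 1):
            for u in itertools.combinations(range(d), r):
                pu = p_u.get(frozenset(u), 1.0)
                # candidate values 1..n for each coordinate in u, zero elsewhere
                for vals in itertools.product(range(1, n + 1), repeat=r):
                    if sum((gamma[j] * a) ** pu for j, a in zip(u, vals)) ** (1.0 / pu) <= n:
                        alpha = [0] * d
                        for j, a in zip(u, vals):
                            alpha[j] = a
                        included.add(tuple(alpha))
        return sorted(included)

    # d = 2, weights gamma = (1, 2), curvature p = 0.7 for the mixed terms (placeholders).
    I = weighted_index_set(5, (1.0, 2.0), {frozenset({0, 1}): 0.7})
    print(len(I), I[:6])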
4.5.2  Generating a Score Vector
The computed matrix of coefficients X of size N × |I_n| contains the expansion coefficients (which we assume are Chebyshev coefficients) for each component x_i(s). The
first step is to generate a vector of length |In | from this matrix. This will give us one
number ẑα associated with each basis element πα (s), where α ∈ In . Define ẑ to be
the vector of numbers ẑα .
We motivate the choice of ẑ by noting the relationship of the expansion Xπ(s)
from equation (3.27) to the Karhunen-Loeve (KL) expansion of a random process [58],
which decomposes a random process as a countable series in the eigenfunctions of its
two-point covariance function. We first partition

X = [x_0  X_r],        π(s) = [1  π_r(s)]^T,   (4.38)

where x_0 is the first column of X and X_r are the remaining columns. If we take the singular value decomposition X_r = UΣV^T, then by the orthogonality of π(s), we can write the covariance matrix of the functions Xπ(s),

Cov(Xπ(s)) = X_r X_r^T = U Σ Σ^T U^T.   (4.39)

This is exactly an eigendecomposition of the covariance matrix for the Galerkin approximation. Next note that the |I_n| − 1 functions ξ(s) ≡ V^T π_r(s) are orthonormal since

⟨ξ ξ^T⟩ = V^T ⟨π_r π_r^T⟩ V = I.   (4.40)
Define M = min(N − 1, |I_n| − 2). Then we can write

Xπ(s) = x_0 + X_r π_r(s)
      = x_0 + U Σ V^T π_r(s)
      = x_0 + U Σ ξ(s)
      = x_0 + Σ_{j=0}^M u_j σ̄_j ξ_j(s),

where the u_j are the eigenvectors of the covariance matrix of Xπ(s), as in the KL expansion. The σ̄_j are the singular values, which are denoted by a bar to distinguish them from the variance contributions in the ANOVA decomposition. By the construction of ξ(s), we can treat the scaled right singular vectors σ̄_j v_j as a score on the basis functions π_r(s). Each v_j contains an element v_{j,α} for the basis function π_α(s). By multiplying v_{j,α} by σ̄_j and taking its absolute value, we create a ranking for π_α(s) corresponding to the jth singular value. From these, we can construct a ranking for the basis function π_α(s) with

ẑ_α = Σ_{j=0}^M |σ̄_j v_{j,α}|.   (4.41)

For the constant term, we use the 2-norm of the first column of X, i.e. ẑ_0 = ‖x_0‖_2.
With the vector ẑ, we have implicitly constructed a function

z_n(s) = Σ_{α∈I_n} ẑ_α π_α(s).   (4.42)
We assume that we can extend this definition for all n. In other words, we assume there exists a square-integrable function z : [−1, 1]^d → R such that ‖z − z_n‖_{L²} → 0 as n → ∞. Additionally, we assume that z(s) is analytic in the same region that x_i(s) is
analytic for i = 0, . . . , N − 1. In essence, the function z(s) amalgamates the behavior
of the functions xi (s), and using its coefficients, we can compute global ANOVA-like
measures for the solution vector x(s).
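A minimal sketch of the score construction of this section, using NumPy's singular value decomposition on a placeholder coefficient matrix, is the following.

    import numpy as np

    def score_vector(X):
        """Compute the scores z_hat from the N x |I_n| coefficient matrix X,
        following equation (4.41); the first column of X is the constant term."""
        x0, Xr = X[:, 0], X[:, 1:]
        U, sbar, Vt = np.linalg.svd(Xr, full_matrices=False)
        M = min(X.shape[0] - 1, X.shape[1] - 2)
        # sum over the leading M+1 singular directions of |sigma_bar_j * v_{j,alpha}|
        z_rest = np.sum(np.abs(sbar[:M + 1, None] * Vt[:M + 1, :]), axis=0)
        z0 = np.linalg.norm(x0, 2)
        return np.concatenate(([z0], z_rest))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 6))        # placeholder coefficients, N = 4, |I_n| = 6
    print(score_vector(X))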
4.5.3  Computing The Dimension Weights
Consider the univariate main effects functions z_k(s) = z_k(s_k) for k = 1, . . . , d from the ANOVA decomposition of the newly constructed z(s). Using spectral theory for univariate analytic functions, we assume that there exist ρ_k > 1 such that

‖z_k − P_n z_k‖_{L²}² = Σ_{j=n}^∞ ẑ_{k,j}² = C_k² ρ_k^{−2n},   (4.43)

where the ρ_k are related to the largest ellipse in the complex plane in which z_k(s) is analytic. The second equality should be an inequality; equation (4.43) motivates the heuristic. Note that for any j, k ∈ {1, . . . , d}, there is a constant μ_{j,k} such that

C_j ρ_j^{−n} = C_k ρ_k^{−μ_{j,k} n}.   (4.44)
The constant μ_{j,k} is a relative measure of how fast the Chebyshev coefficients of z_j(s) decay relative to the coefficients of z_k(s). In other words, if incrementing n by one reduces the error in the Fourier-Chebyshev projection P_n z_j(s) by 1/ρ_j, then we expect a comparable error reduction for the projection P_n z_k(s) if we increase n to the integer nearest n + μ_{j,k}. Solving for this constant, we get

μ_{j,k} = log(ρ_j)/log(ρ_k) − [log(C_j) − log(C_k)] / [n log(ρ_k)] ≈ log(ρ_j)/log(ρ_k)   (4.45)

for sufficiently large n.
Using equations (4.43) and (4.29), we relate the ρ_k to the variance contributions from the main effects. Let I be the coefficient of z(s) associated with the constant function. If we first set n = 0, then for k = 1, . . . , d,

C_k² = Σ_{j=0}^∞ ẑ_{k,j}² = I² + σ_k².   (4.46)

Next set n = 1 to get

C_k²/ρ_k² = Σ_{j=1}^∞ ẑ_{k,j}² = σ_k².   (4.47)

Solving for ρ_k, we get

ρ_k = √( (I² + σ_k²) / σ_k² ).   (4.48)
Define ρ_* = min_{1≤k≤d} ρ_k, and let σ_* be the associated variance contribution. Then using (4.45), we compute the parameters γ in the definition of the index set (4.36) as

γ_k = log(ρ_k)/log(ρ_*) = [log(I² + σ_k²) − log(σ_k²)] / [log(I² + σ_*²) − log(σ_*²)].   (4.49)
According to this definition, γ_k ≥ 1 for all k. The indices k with γ_k = 1 are associated with the main effects functions whose Chebyshev coefficients decay the slowest.
Incrementing n by one in the definition of the index set (4.37) will always increase
the order of approximation associated with the slowest converging main effects by
one. The approximation along main effects associated with γk > 1 will not always be
incremented for each increment of n. Instead, it will wait for an increment sufficiently
large so that the error reduction along that main effect is comparable to the slowest
decaying main effects. This dramatically reduces the number of basis functions for
each increment of n if there is detectable anisotropy in the function z(s).
Clearly, a more aggressive approach would be to set γ_k = log(ρ_k)/log(max_k ρ_k). This definition would have the opposite effect of accelerating the series approximation along the slower decaying main effects with each increment of the main effects with
faster decay rates. We prefer the more conservative approach since it yields fewer
additional basis functions per iteration.
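A small sketch of the weight computation via (4.48)–(4.49), from an estimated constant-term energy I² and main-effect variances σ_k², is shown below with placeholder numbers.

    import numpy as np

    def dimension_weights(I2, sigma2):
        """Compute gamma_k = log(rho_k)/log(rho_*) from the constant-term energy I^2
        and the main-effect variances sigma_k^2, via rho_k = sqrt((I^2+sigma_k^2)/sigma_k^2)."""
        sigma2 = np.asarray(sigma2, dtype=float)
        log_rho = 0.5 * (np.log(I2 + sigma2) - np.log(sigma2))
        return log_rho / log_rho.min()          # rho_* = min_k rho_k has the largest sigma_k^2

    # Placeholder values: three parameters with decreasing variance contributions.
    print(dimension_weights(1.0, [0.5, 0.1, 0.01]))   # smallest weight is 1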
4.5.4  Estimating the Curvature Parameters
The curvature parameters pu in the definition of the index sets (4.36) control how
many mixed terms are allowed in a particular truncated representation. If pu = 1,
then the mixed term basis functions will be the full polynomial basis for the variables
su . If pu < 1, the basis will exclude many mixed terms, and if pu > 1, the basis will
include mixed terms beyond the full polynomial basis.
If we treat the bound (4.34) on coefficients of mixed terms from Theorem 14 as
an equality, then we can use this relation as an estimate of what we expect from the
variance contribution of the interaction effects σu2 , where u is not a singleton. By
inspection, this estimate corresponds with pu = 1, i.e. if the mixed coefficients decay
according to (4.34), then the full polynomial basis is a very efficient basis set. And
this gives us a reference point for comparison.
Define the quantity

τ_u² = I² Σ_{α_u ∈ N^d} ∏_{j∈u} ρ_j^{−2α_j}.   (4.50)

This is the variance contribution we would get if the coefficients decayed like (4.34), which corresponds to p_u = 1. We then compute the actual variance contribution,

σ_u² = Σ_{α_u ∈ N^d} ẑ_{α_u}².   (4.51)

From these quantities, we set

p_u = (σ_u² / τ_u²)^r.   (4.52)
The parameter r controls how much impact discrepancies between τu2 and σu2 have on
the number of terms in the basis set. We conservatively set r = 0.1, which has the
tendency to push both large and small discrepancies toward 1.
There is a strong caveat for this measure. If z(s) is an even (odd) function on a
symmetric interval with symmetric weight function, then every coefficient associated
with a basis function with an odd (even) degree will be zero. If this is known a priori,
then equation (4.52) can be corrected accordingly.
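In the same spirit, the curvature parameter (4.52) is a one-line computation once σ_u² and τ_u² are available; the values below are placeholders.

    def curvature_parameter(sigma2_u, tau2_u, r=0.1):
        """p_u = (sigma_u^2 / tau_u^2)^r, per equation (4.52)."""
        return (sigma2_u / tau2_u) ** r

    # Placeholder: observed interaction variance much smaller than the reference tau_u^2.
    print(curvature_parameter(1e-6, 1e-2))   # about 0.398 -> fewer mixed terms
    print(curvature_parameter(1e-2, 1e-2))   # 1.0 -> full polynomial behavior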
4.5.5  Stopping Criteria
The most natural stopping criterion for general problems is the residual error estimate developed in Theorem 13. For a vector-valued approximation y(s), we compute the norm of the residual r(y, s) = A(s)y(s) − b(s) as

‖r(y)‖_{L²} = ( Σ_{i=0}^{N−1} ∫_{[−1,1]^d} r_i²(y, s) w(s) ds )^{1/2}.   (4.53)
In practice, we may use a Gaussian quadrature rule to evaluate this quantity. We
stop once this residual falls below a set tolerance.
4.5.6  Algorithm
Since we cannot compute the exact variance contributions σ_u² without the full (infinite) expansion, we approximate these terms using the coefficients we have at a given iteration; denote this quantity by σ_{u,n}². As n increases, we add more terms to the approximate variance contributions, and we expect that σ_{u,n}² approaches σ_u² as quickly as the approximation converges to the true solution.
We summarize the above discussion in the following algorithm. First choose an
initial basis set I_0 from the possible choices in Section 4.5.1, and compute the coefficients X_0. To get initial values for all curvature parameters p_u and weights γ_k,
the initial basis set must have at least one term for each coordinate sk , k = 1, . . . , d
and each interaction su , u ⊂ {1, . . . , d}. The simplest such expansion uses a tensor
product basis with n = 1. Then while the residual is greater than TOL, repeat the
following steps.
1. Compute a score vector ẑ from the coefficients Xn using the method described
in Section 4.5.2.
2. Set I_n² = ẑ_0². For k = 1, . . . , d, compute

   σ_{k,n}² = Σ_{α^(k) ∈ I_n} ẑ_{α^(k)}².   (4.54)

   Set σ_*² = max_k σ_{k,n}², and compute the weights

   γ_{k,n} = [log(I_n² + σ_{k,n}²) − log(σ_{k,n}²)] / [log(I_n² + σ_*²) − log(σ_*²)].   (4.55)

   If the set {α^(k) ∈ I_n} is empty for some iteration, then use the initial values computed from the approximation with basis index set I_0. If σ_{k,n}² = 0, then set γ_k to a very large number.
3. For u ⊂ {1, . . . , d}, compute

   τ_{u,n}² = I_n² Σ_{α_u ∈ I_n} ∏_{k∈u} ( σ_{k,n}² / (I_n² + σ_{k,n}²) )^{α_k},        σ_{u,n}² = Σ_{α^(u) ∈ I_n} ẑ_{α^(u)}²,   (4.56)

   and set p_u = (σ_{u,n}²/τ_{u,n}²)^r; we set r = 0.1. If the set {α_u ∈ I_n} is empty, then set p_u to the values computed from the initial approximation. If σ_{u,n}² = 0, then set p_u to a very small number.
4. Determine the index set In+1,γ,p and compute the coefficients Xn+1 of the approximation xg,n+1 (s).
5. Compute the residual error estimate of xg,n+1 (s).
We apply this algorithm in the next section to a series of representative numerical
examples to demonstrate its behavior.
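The following sketch outlines the control flow of the iteration described above. The callables passed in are hypothetical stand-ins for the machinery of Sections 4.3 and 4.5 (the Galerkin solve, the score vector, the weighted index set, and the residual estimate); the lambdas in the example exist only so the loop runs.

    import itertools

    def adaptive_galerkin(compute_coefficients, choose_next_index_set, residual_norm,
                          d, tol, max_level=20):
        """Schematic version of the adaptive loop above: compute coefficients on the
        current basis, use them to pick the next index set, and stop when the residual
        error estimate falls below tol."""
        index_set = list(itertools.product(range(2), repeat=d))   # tensor basis, n = 1
        X = compute_coefficients(index_set)
        for n in range(2, max_level + 1):
            index_set = choose_next_index_set(n, index_set, X)    # steps 1-4
            X = compute_coefficients(index_set)
            if residual_norm(index_set, X) < tol:                 # step 5
                break
        return index_set, X

    # Trivial stand-ins so the control flow runs; real implementations would use the
    # Galerkin solver, the score vector, and equations (4.54)-(4.56).
    I, X = adaptive_galerkin(
        compute_coefficients=lambda I: [0.5 ** sum(a) for a in I],
        choose_next_index_set=lambda n, I, X: [a for a in
            itertools.product(range(n + 1), repeat=2) if sum(a) <= n],
        residual_norm=lambda I, X: 0.5 ** len(I),
        d=2, tol=1e-3)
    print(len(I))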
4.6  Numerical Examples
We first present three examples of 2 × 2 matrix equations dependent on two parameters, s1 and s2 , where each problem corresponds to a different type of behavior we
expect from general situations.
4.6.1  Anisotropic Parameter Dependence
Let ε_2 > ε_1 > 0, and consider the equation

[ 1 + ε_1   s_1 ] [ 1 + ε_2   s_2 ] [ x_1(s_1, s_2) ]   [ 1 ]
[   s_1      1  ] [   s_2      1  ] [ x_2(s_1, s_2) ] = [ 1 ],        (s_1, s_2) ∈ [−1, 1]².   (4.57)
We set ε1 = 0.2 and ε2 = 2 to induce anisotropic parameter dependence in the
solution. In particular, the region of analyticity with respect to s1 is smaller than the
region of analyticity with respect to s2 . Therefore, we expect the Galerkin coefficients
corresponding to s1 to decay much slower than those corresponding to s2 . In Figure
4.2 we show the pseudospectral coefficients of a tensor product approximation of
order 50 in each variable. The coloring corresponds to the magnitude of the log
of each coefficient squared. The difference in decay rates is clearly visible from the
asymmetry of the region of large coefficients. We show the included basis functions
for the Galerkin approximation with the weighted basis of order 50, which shows how
our method finds the region of large coefficients.
4.6.2  Small Interaction Effects
Let ε > 0, and consider the equation

[ 2 + s_1      ε    ] [ x_1(s_1, s_2) ]   [ 1 ]
[    ε      2 + s_2 ] [ x_2(s_1, s_2) ] = [ 1 ],        (s_1, s_2) ∈ [−1, 1]².   (4.58)
If ε = 0, then x1 = x1 (s1 ) and x2 = x2 (s2 ), i.e. there will be no interaction effects.
Therefore, we expect that a small ε will result in very small interaction effects in the
functions x_1(s_1, s_2) and x_2(s_1, s_2). For the following results, we set ε = 0.01.

Figure 4.2: Tensor product pseudospectral coefficients (color) of a solution with anisotropic parameter dependence along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation.

In Figure 4.3 we present the pseudospectral coefficients for a tensor product approximation of
order 50, and the included coefficients for the weighted Galerkin basis of order 50. The
coefficients corresponding to the interaction effects are clearly smaller than those for
the main effects (along the boundaries), and the weighted basis with curvature finds
an efficient basis set corresponding to such weak interaction effects. To accentuate the
effect, we use a more aggressive curvature parameter r = 0.2 (see equation (4.52)).
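The coefficient structure described here can be reproduced directly from equation (4.58). The sketch below computes tensor product pseudospectral coefficients of x_1(s_1, s_2) on a Gauss–Legendre grid; for simplicity it uses normalized Legendre polynomials (uniform weight) rather than the Chebyshev setting assumed in Section 4.5, so the numbers differ from the figures but exhibit the same weak-interaction pattern.

    import numpy as np
    from numpy.polynomial import legendre

    eps, order = 0.01, 10
    pts, wts = legendre.leggauss(order + 1)        # enough points to resolve degree `order`

    def x1(s1, s2):
        A = np.array([[2.0 + s1, eps], [eps, 2.0 + s2]])
        return np.linalg.solve(A, np.array([1.0, 1.0]))[0]

    def legp(j, s):                                # orthonormal w.r.t. weight 1/2
        c = np.zeros(j + 1); c[j] = 1.0
        return np.sqrt(2 * j + 1) * legendre.legval(s, c)

    # coefficient <x1 * pi_(i,j)>, approximated on the tensor Gauss grid
    coeff = np.zeros((order + 1, order + 1))
    for a, (sa, wa) in enumerate(zip(pts, wts)):
        for b, (sb, wb) in enumerate(zip(pts, wts)):
            val = x1(sa, sb) * (wa / 2.0) * (wb / 2.0)
            for i in range(order + 1):
                for j in range(order + 1):
                    coeff[i, j] += val * legp(i, sa) * legp(j, sb)

    # The interaction coefficient (both indices nonzero) is orders of magnitude
    # smaller than the s1 main-effect coefficient.
    print(abs(coeff[1, 0]), abs(coeff[1, 1]))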
4.6.3  Large Interaction Effects
Consider the equation

[      4          s_2 + s_1 s_2 ] [ x_1(s_1, s_2) ]   [ 1 ]
[ s_1 + s_1 s_2         1       ] [ x_2(s_1, s_2) ] = [ 1 ],        (s_1, s_2) ∈ [−1, 1]².   (4.59)
Here we expect large interaction effects since much of the variability from parametric
variation comes from the term s1 s2 . In Figure 4.4, we again show the pseudospectral coefficients for a tensor product approximation of order 50 side-by-side with the
included coefficients of a weighted Galerkin approximation of order 50. Notice how
the included basis elements curve outward to capture the relatively large coefficients corresponding to the interaction effects.

Figure 4.3: Tensor product pseudospectral coefficients (color) of a solution with weak interaction effects along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation.
4.6.4  High Dimensional Problem
Next we test our heuristic on a second order boundary value problem derived from the
linear elliptic PDE with random coefficients developed in [67]. We seek the function
u(t, s) that satisfies the equation

−d/dt( a_d(t, s) du/dt ) = cos(t),        t ∈ [0, 1],  s ∈ [−√3, √3]^d,   (4.60)
u(0, s) = u(1, s) = 0.   (4.61)
The parameterized coefficient a_d(t, s) is given by

log(a_d(t, s) − 0.5) = 1 + s_1 (√(πL)/2)^{1/2} + Σ_{k=2}^d ζ_k φ_k(t) s_k,   (4.62)

where

ζ_k = (√(πL))^{1/2} exp( −(⌊k/2⌋ πL)² / 8 )   if k > 1   (4.63)
and

φ_k(t) = sin(⌊k/2⌋ π t / L_p)   if k even,
φ_k(t) = cos(⌊k/2⌋ π t / L_p)   if k odd.   (4.64)

Figure 4.4: Tensor product pseudospectral coefficients (color) of a solution with strong interaction effects along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation.
Let L_c represent the physical correlation length for a random field a, meaning the random variables a(t_1) and a(t_2) become essentially uncorrelated for |t_1 − t_2| ≫ L_c. Then L_p in (4.64) can be taken as L_p = max(1, 2L_c), and the parameter L in (4.62) and (4.63) is L = L_c/L_p. Under this setup, the expression in (4.62) approximates a random field with stationary covariance

Cov(log(a − 0.5))(t_1, t_2) = exp( −(t_1 − t_2)² / L_c² ).   (4.65)
It has been shown in [67] that the region of analyticity for u(t, s) with respect to
sk grows as k increases. Thus, we expect that the Galerkin coefficients of u(t, s)
associated with sk will decay at increasingly faster rates as k increases – a clear sign
of anisotropic dependence on s. They also show in [67] that the anisotropy increases
as the correlation length Lc decreases. We present numerical results for d = 5 with
correlation length 3/4.
Figure 4.5: Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with a full polynomial basis.
We use the standard piecewise linear finite element approximation with 512 elements to discretize (4.60) in the t domain. This yields the parameterized matrix equation

(K_0 + K_r(s)) x(s) = b.   (4.66)
The matrix K0 is the standard symmetric, tridiagonal and positive definite matrix
that results from discretizing a one dimensional Poisson’s equation with coefficient 0.5.
The parameterized matrix Kr (s) is symmetric and tridiagonal for all s; its elements
depend nonlinearly but analytically on the parameters.
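To make the construction of (4.66) concrete, the following sketch evaluates the coefficient a_d(t, s) from (4.62)–(4.64) and assembles the tridiagonal stiffness matrix for one parameter value with piecewise linear finite elements. The midpoint evaluation of the coefficient and the assembly details are simplifying assumptions, not the discretization used for the reported results.

    import numpy as np

    d, Lc = 5, 0.75
    Lp = max(1.0, 2.0 * Lc)
    L = Lc / Lp

    def a_coeff(t, s):
        """Evaluate a_d(t, s) from equations (4.62)-(4.64)."""
        total = 1.0 + s[0] * np.sqrt(np.sqrt(np.pi * L) / 2.0)
        for k in range(2, d + 1):
            zeta = np.sqrt(np.sqrt(np.pi * L)) * np.exp(-(np.floor(k / 2) * np.pi * L) ** 2 / 8.0)
            if k % 2 == 0:
                phi = np.sin(np.floor(k / 2) * np.pi * t / Lp)
            else:
                phi = np.cos(np.floor(k / 2) * np.pi * t / Lp)
            total += zeta * phi * s[k - 1]
        return 0.5 + np.exp(total)

    def stiffness(s, n_el=512):
        """Tridiagonal FE stiffness matrix K(s) = K0 + Kr(s) with homogeneous
        Dirichlet conditions; coefficient evaluated at element midpoints."""
        h = 1.0 / n_el
        mids = (np.arange(n_el) + 0.5) * h
        a = np.array([a_coeff(t, s) for t in mids]) / h
        K = np.zeros((n_el - 1, n_el - 1))
        for e in range(n_el):
            for i in (e - 1, e):            # global indices of the two interior nodes
                for j in (e - 1, e):
                    if 0 <= i < n_el - 1 and 0 <= j < n_el - 1:
                        K[i, j] += a[e] * (1.0 if i == j else -1.0)
        return K

    s0 = np.zeros(d)                          # sample parameter value
    print(stiffness(s0).shape)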
We expect anisotropic parameter dependence in the solution, but no weak interaction effects. Therefore, we apply the weighted Galerkin method without the curvature
parameters. To see the anisotropy more clearly, we first compute a Galerkin approximation with the standard full polynomial basis. In Figure 4.5 we plot the decay of the
Galerkin coefficients corresponding to the main effects. The coefficients associated
with the last two variables decay much faster than the first three, which is entirely
expected by construction.
Figure 4.6: Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with an ANOVA-based weighted basis.
Using the adaptive, weighted basis, we find the anisotropic parameter dependence
and exploit it by including more coefficients for main effects functions with slower
coefficient decay. In Figure 4.6, we plot the main effects coefficients of the weighted
approximation at n = 15. The weighted basis has actually overshot the optimal basis
set by including more coefficients than necessary for the most important main effects.
Thus, the weighting scheme may need some tuning for this particular problem. However, the general results are encouraging. The ANOVA-based weighted scheme found
and exploited the anisotropy. In Figure 4.7, we plot the weights that determine the
included indices in the weighted basis as n increases. By construction, the smallest
weight is always 1, which corresponds to the most important parameter (where “importance” means the one that contributes the most to the variability). The remaining
weights indicate how often the weighted basis expands along the other coordinates
as n increases. Notice that the weights converge to their proper values, and the
two distinct convergence rates from the full polynomial basis (Figure 4.5) are clearly
distinguishable in the separation of the weights.
Figure 4.7: Convergence of the weights in the ANOVA-based weighted scheme as n increases.
Finally, in Figure 4.8 we plot the residual error estimate from Theorem 13 for both
the full polynomial basis and the ANOVA-based weighted basis against the number
of terms in the approximation. The overshoot in the weights yields an approximation
that is not as efficient as possible, but the ability to discover and exploit the anisotropy
is very encouraging.
4.7  Summary
We have extended the univariate spectral methods presented in Chapter 3 to multivariate methods. The most natural extension to the multivariate case is via the
tensor product bases, but the cost of computing these approximations grows exponentially with the number of parameters; this is the dreaded curse of dimensionality.
To combat this curse, we propose an anisotropic multivariate Galerkin method that
exploits anisotropic parameter dependence in the solution. The method uses the
information from an approximate ANOVA decomposition to determine the most important parameters with respect to variability, as well revealing the importance of the
CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION79
−4
10
ANOVA−based
Full Polynomial
−5
Residual
10
−6
10
−7
10
−8
10
1
10
2
3
10
10
4
10
Number of Terms
Figure 4.8: Convergence of the residual error estimate for both the ANOVA-based
anisotropic basis and the full polynomial basis plotted against the number of basis
elements in each approximation.
interaction effects. The method is essentially iterative; at each iteration it uses the
ANOVA information from the lower order approximation to construct an appropriate
set of basis functions for the next higher order approximation. In this way, it uncovers
many qualitative behaviors of the solution in an efficient way.
We tested the method on a series of toy parameterized matrix equations to showcase its features, and we applied it to a standard problem from the literature with
encouraging results. Since the method is based on heuristics, there are opportunities
for improvement, which we will pursue in future research activities.
Chapter 5

Strategies for Large-Scale Problems
A spirited debate continues in the uncertainty quantification community amongst
those who work with PDEs with random inputs over whether or not the more expensive Galerkin method holds any advantages over the decoupled, interpolatory
pseudospectral method. More generally, the debate is between (i) nonintrusive methods (pseudospectral, collocation, Monte Carlo), which use only function evaluations
at points in the parameter space yielding maximum code reuse, and (ii) intrusive
methods which require additional coding effort to solve larger systems related to the
original parameterized problem. Some argue that the system solved for the Galerkin method typically contains fewer equations than those for the pseudospectral methods, which results in savings from a linear solver perspective [77, 10]. Others contend that the Galerkin approximation produces more efficient and accurate approximations for a
Galerkin approximation produces more efficient and accurate approximations for a
fixed polynomial order [21]. From a practical perspective, one can adaptively control the choice of basis functions with maximum flexibility from an intrusive framework [10, 9]. Nevertheless, the nonintrusive interpolation methods (collocation, pseudospectral) retain an asymptotic rate of convergence comparable to the intrusive
Galerkin methods [6, 89], so that the dramatically simpler implementation often overcomes these stated advantages – especially when quickly devising experiments.
For the case of parameterized matrix equations, the computational effort for a
fixed order approximation is a choice between (i) many solves of the constant linear systems

A(s_0) x(s_0) = b(s_0),        s_0 ∈ [−1, 1]^d,   (5.1)

where the s_0 are chosen according to a Gaussian quadrature rule, or (ii) a single solve of the system

⟨π π^T ⊗ A⟩ vec(X) = ⟨π ⊗ b⟩,   (5.2)

for a given basis set π(s). Regardless of the choice of intrusive Galerkin or nonintrusive pseudospectral methods, the scale of the computations required for accurate approximation – particularly for multiple parameters – is enormous. The nonintrusive varieties may require hundreds of thousands of function evaluations, whereas the system (5.2) for the Galerkin approximation may be intractably large and/or difficult to form.
In this chapter, we will focus on large-scale problems of the Galerkin form (5.2) for
two reasons. First, in Chapter 4 we presented an anisotropic approximation method
based on the Galerkin framework where the basis elements were chosen adaptively.
To be consistent with this method, we seek to solve the large systems corresponding
to that scheme. Secondly, tackling the large-scale problem of solving many linear
systems in the nonintrusive context can effectively be swept under the proverbial
rug by massively parallel machines. In other words, it is a problem of capacity –
solving a large number of small problems – instead of capability – solving one large
problem. And if we assume that such solvers are optimally tuned/preconditioned for
the structure of the system, then there is little to be improved upon from a broader,
global perspective.
However, numerous challenges arise when solving (5.2). Firstly, is it possible to
reuse code in a sense similar to the nonintrusive methods? In Section 5.1, we propose
a weakly intrusive paradigm, which allows for substantial code reuse when computing
the Galerkin approximation and offers a much-needed middle ground in the continuing
debate between intrusive and nonintrusive methods.
Even if we can optimally choose the perfect basis for a finite term approximation,
the size of the system (5.2) may still be daunting. If the size of the parameterized
matrix N or the number of basis functions |I| is large, then the system can quickly
become unwieldy. It may not fit into memory, and/or the integrals may be very difficult to evaluate. In other words, this bare-bones formulation does not lend itself to
large-scale problems. To alleviate this challenge, we analyze a variant of the Galerkin
method known as Galerkin with Numerical Integration (G-NI) [16] in Section 5.2.
The derivation of G-NI is identical to the standard Galerkin derivation but each integral is replaced by a tensor-product Gaussian quadrature formula. This simple
change – which is often implemented in practice without analytical considerations –
surprisingly begets a useful decomposition of the matrix ⟨π π^T ⊗ A⟩. We will use this
decomposition to place the G-NI method within the weakly intrusive paradigm, as
well as derive useful insights into the system (5.2) including eigenvalue bounds, preconditioning strategies, and an elegant interpretation of the approximation problem
as a weighted least-squares problem.
5.1  A Weakly Intrusive Paradigm
The broadest interpretation of the terms intrusive and nonintrusive is straightforward: nonintrusive methods take advantage of existing codes or solvers (with possible pre- or post-processing) and intrusive methods require new codes or solvers to
be written. To formally introduce an alternative paradigm, we need to dramatically
restrict these interpretations.
From the viewpoint of the parameterized matrix equations, we define a nonintrusive method as one that uses the vectors x(s0 ) = A(s0 )−1 b(s0 ) for some point s0 in the
parameter space along with possible pre-processing (such as computing the Gaussian
quadrature nodes) and post-processing (such as forming pseudospectral coefficients)
computations. In words, we say that the nonintrusive methods may use evaluations
(or samples) of the solution vector at a set of points in the parameter space. An
intrusive method is one that does not use such point evaluations of the solution. In
the case of the Galerkin method, it requires the solution of a larger related system
of equations. But notice that sampling the solution is nowhere to be found in the
method derivation.
The essential idea behind a weakly intrusive paradigm is that it allows evaluation of
the parameterized matrix operator and the parameterized right hand side at points in
the parameter space. Loosely speaking, if nonintrusive methods sample the solution,
then weakly intrusive methods sample the operators, or strictly the action of the
operators on a vector. More formally, we classify an algorithm as weakly intrusive if
it requires only
1. matrix-vector products of the form A(s0 )y for a point s0 in the parameter space
and a given N-vector y,
2. evaluation of the vector b(s0 ) for a point s0 in the parameter space,
3. additional pre- and/or post-processing.
In this way, we are allowed to examine the action of the parameterized matrix throughout the parameter space by observing its effects on given vectors. The key to the
weakly intrusive paradigm is that we restrict the algorithms to matrix-vector multiplies where the matrix is the parameterized operator evaluated at a point in the
parameter space. We are essentially taking a page from methods for iterative linear
solvers (such as MINRES [73]) and eigenvalue solvers (such as Lanczos’ method [50])
that use only matrix-vector products. In fact, we retain some advantages of such
methods, including exploiting sparse structures in the matrices by only needing the
nonzero elements. This can dramatically reduce memory requirements for the methods.
The workhorse of the method is a function f (s0 , y) = A(s0 )y that computes the
matrix-vector products given a d-dimensional point and an N-vector. By taking advantage of this interface, we can write reusable software libraries similar to existing
iterative solver libraries [84, 8, 55] that require only matrix-vector multiplies. The
development of libraries with a common interface will encourage the widespread acceptance of UQ methods. Software based solely on black-box nonintrusive methods,
such as Sandia's DAKOTA framework [25], cannot accommodate problem-dependent algorithms. The weakly intrusive paradigm proposes a software interface that improves
on this shortfall of purely nonintrusive methods.
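As an illustration of the interface suggested here, a weakly intrusive problem could be wrapped as a small class exposing only the two operations listed above. The class name and the toy 2 × 2 problem below are hypothetical, not an existing library API.

    import numpy as np

    class WeaklyIntrusiveProblem:
        """Exposes only A(s0) @ y and b(s0): the two operations a weakly intrusive
        algorithm is allowed to use (plus any pre-/post-processing it does itself)."""

        def __init__(self, matvec, rhs, N, dim):
            self._matvec, self._rhs = matvec, rhs
            self.N, self.dim = N, dim

        def matvec(self, s0, y):
            return self._matvec(np.asarray(s0), np.asarray(y))

        def rhs(self, s0):
            return self._rhs(np.asarray(s0))

    # Hypothetical toy problem: A(s) = [[2 + s_1, 0.1], [0.1, 2 + s_2]], b(s) = [1, 1].
    toy = WeaklyIntrusiveProblem(
        matvec=lambda s, y: np.array([[2 + s[0], 0.1], [0.1, 2 + s[1]]]) @ y,
        rhs=lambda s: np.array([1.0, 1.0]),
        N=2, dim=2)

    print(toy.matvec([0.5, -0.5], np.array([1.0, 0.0])))   # action of A(s0) on a vector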
An additional and crucial advantage of the looser requirements of a weakly intrusive method compared to a nonintrusive method is that one can compute the residual
error estimate from Theorem 13 using only matrix-vector products of the parameterized system evaluated at a set of Gaussian quadrature points. Such a posteriori error
estimates are not computable in a purely nonintrusive framework.
Notice that the nonintrusive methods can easily satisfy the restrictions of the
weakly intrusive paradigm if the systems A(s_0)x(s_0) = b(s_0) are solved with a Krylov-based iterative method that uses matrix-vector products. In what follows, we derive
the G-NI method and show how it can satisfy the weakly intrusive requirements, as
well.
5.2  Galerkin with Numerical Integration (G-NI)
The G-NI method is identical to the Galerkin method developed in Section 4.3 but we replace each integral ⟨·⟩ by a tensor product Gaussian quadrature rule ⟨·⟩_n for some n = (n_1, . . . , n_d) (see equation (4.8)). The derivation is not repeated here, but follows exactly as in Section 4.3 using quadrature rules instead of integrals. The system of equations to be solved is then

⟨π π^T ⊗ A⟩_n vec(X) = ⟨π ⊗ b⟩_n,   (5.3)

where the solution vector vec(X) collects the coefficients of the G-NI approximation

x_gni(s) = Σ_{α∈I} x_α π_α(s) ≡ Xπ(s)   (5.4)

for some set of multi-indices I ⊂ N^d. The conditions for equivalence are straightforward generalizations of the theorems in Chapter 3, which we state here for completeness.
Theorem 15 Given n = (n1 , . . . , nd ), the pseudospectral approximation with tensor
product basis π n (s) is equal to the G-NI approximation with tensor product basis π n (s)
using n-point tensor product Gaussian quadrature.
Proof. The proof is a straightforward adaptation of the proof of Theorem 4.
Theorem 16 Assume A(s) and b(s) contain only finite degree polynomials of maximum degree m_a = (m_{a,1}, . . . , m_{a,d}) and m_b = (m_{b,1}, . . . , m_{b,d}), respectively. Assume also that the basis π_n(s) contains polynomials of maximum degree n, not necessarily a tensor product basis. For j = 1, . . . , d, define

m_j ≡ m_j(n_j) ≥ max( (m_{a,j} + 2n_j − 1)/2 , (m_{b,j} + n_j)/2 ),   (5.5)

and let m = (m_1, . . . , m_d). Then the G-NI approximation with basis π_n(s) and m-point tensor product Gaussian quadrature rule is equivalent to the Galerkin approximation with basis π_n(s).
Proof. This is a straightforward adaptation of the proof of Lemma 7 utilizing the
polynomial exactness of the Gaussian quadrature formulas.
In essence, Theorem 16 states that for operators A(s) and b(s) that depend at
most polynomially on s, there is a Gaussian quadrature rule such that G-NI is equal
to Galerkin (within numerical precision). For general analytic dependence on s, the
error analysis is more subtle. However, from Theorems 15 and 16 we conjecture that
the G-NI approximation converges at a rate comparable to the pseudospectral and
Galerkin approximations for a properly incremented Gaussian quadrature rule and
polynomial basis set.
5.2.1  A Useful Decomposition
The true advantage of the G-NI method is not a superior rate of convergence, but a practical decomposition of the matrix ⟨π π^T ⊗ A⟩_n. If we explicitly write the Gaussian quadrature formula, we get

⟨π π^T ⊗ A⟩_n = Σ_{α ≤ n−1} ν_α ( π(λ_α) π(λ_α)^T ⊗ A(λ_α) ).   (5.6)
Recalling the definition of ν_α in equation (4.6), we define q_α = π(λ_α)/‖π(λ_α)‖_2, so that

⟨π π^T ⊗ A⟩_n = Σ_{α ≤ n−1} q_α q_α^T ⊗ A(λ_α).   (5.7)

Let Q be the |I| × ∏_k n_k matrix whose columns are the q_α. Then we can write the factorization of the G-NI matrix as

⟨π π^T ⊗ A⟩_n = (Q ⊗ I) A(Λ) (Q^T ⊗ I),   (5.8)

where I is the N × N identity matrix and A(Λ) is a block-diagonal matrix with N × N
blocks equal to A(λα ) for each α ≤ n − 1. The matrix Q has one row for each basis
function from π(s) and one column for each α ≤ n − 1. By the orthogonality of π(s)
and the polynomial exactness of the quadrature rules, the rows of Q are orthogonal.
This implies that QQT = I, where I is the appropriately sized identity matrix.
The columns of Q are, in general, not orthogonal. However, if the basis set π(s)
is constructed as a tensor product of univariate bases (see equation (4.4)), i.e.
π(s) = π n1 (s1 ) ⊗ · · · ⊗ π nd (sd ),
(5.9)
and if each n_k is equal to the number of points in the univariate quadrature rules λ_{n_k} that beget λ (i.e. as many points in the tensor grid as polynomials in the tensor basis), then Q becomes square and orthogonal. In this case, equation (5.3) can be transformed into a set of |I| = |λ| decoupled linear systems, each of size N × N. In
fact, in this case, the computed coefficients are exactly equal to the coefficients of an
interpolatory pseudospectral method with a tensor product basis (see Theorem 15).
We will assume, however, that this is not the case, i.e. that the basis set π(s) is much
smaller than the tensor product basis.
If we define the diagonal matrix D = diag(Q(0, :)), then we can write the right hand side in (5.3) as

⟨π ⊗ b⟩_Λ = (QD ⊗ I) b(Λ),   (5.10)

where b(Λ) is a vector of the parameterized right hand side b(s) evaluated at the
Gauss quadrature points. The scaling D is necessary to recover the weights of the
quadrature formula. Notice that (5.10) are also the pseudospectral coefficients of b(s)
corresponding to the basis elements in π(s).
Using equations (5.8) and (5.10), we can rewrite the G-NI system (5.3) as
(Q ⊗ I)A(Λ)(QT ⊗ I)vec(X) = (QD ⊗ I)b(Λ).
(5.11)
We will derive a number of interesting insights from this equation.
5.2.2  Eigenvalue Bounds
Eigenvalue information is always a crucial component of analyzing linear systems and
matrix operators. From the factorization (5.8), we can immediately derive bounds on
the eigenvalues of the matrix ⟨π π^T ⊗ A⟩_n. Technically speaking, we need to restrict this statement to symmetric parameterized matrices A(s) that admit a full parameterized eigenvalue decomposition. Such objects were studied at length in Kato's
seminal work [48], where he shows that parameterized matrices whose elements depend analytically on a single parameter have parameterized eigenvalues that also
depend analytically on the parameter. This work was extended by Sun to systems
that depend on several parameters [81]. We use this decomposition to prove the
following theorem, which states that the eigenvalues of the G-NI matrix are always
bounded by the extremes of the parameterized eigenvalues of A(s).
Theorem 17 Assume that A(s) = X(s)Θ(s)X(s)^T is the analytic, parameterized eigenvalue decomposition of the symmetric, parameterized matrix A(s), where s = (s_1, . . . , s_d). For any subset of the orthogonal basis π(s) and any n-point tensor product Gaussian quadrature rule (where n = (n_1, . . . , n_d)) such that each n_k is greater than the maximum degree of the basis polynomials corresponding to s_k,

min_s{ θ_min(A(s)) } ≤ θ_min( ⟨π π^T ⊗ A⟩_n ) ≤ θ_max( ⟨π π^T ⊗ A⟩_n ) ≤ max_s{ θ_max(A(s)) }.   (5.12)
Proof. The elements of the matrix Q in the decomposition (5.8) are equal to the
basis polynomials π(s) evaluated at the tensor product Gaussian quadrature points
λn , i.e. Q has one row for each element of π(s) and one column for each point in λn .
Let π n (s) be the tensor product basis corresponding to the d-tuple of integers n.
By the assumption on the size of n relative to the maximum degree of π(s), each
element of π(s) is contained in π n (s). Define π ′ (s) to be the elements of π n (s) not
in π(s), and let Q′ be the matrix that has elements equal to the basis functions π ′ (s)
evaluated at the tensor product Gaussian quadrature points λn – comparable to Q.
Notice that the matrix

Q̃ = [ Q
      Q′ ]   (5.13)
is square and orthogonal, i.e. Q̃−1 = Q̃T . Therefore the matrix (Q̃ ⊗ I)A(Λ)(Q̃T ⊗
I) is a similarity transformation with the block diagonal matrix A(Λ). Since A(s)
is symmetric, each block of A(Λ) is also symmetric and admits a full eigenvalue
decomposition, which we write as X(Λ)Θ(Λ)X(Λ)T where Θ(Λ) is the diagonal
matrix of the parameterized eigenvalues Θ(s) evaluated at the Gaussian quadrature
points.
By construction, the matrix

⟨π π^T ⊗ A⟩_n = (Q ⊗ I) A(Λ) (Q^T ⊗ I)   (5.14)

is a submatrix of

(Q̃ ⊗ I) A(Λ) (Q̃^T ⊗ I) = (Q̃ ⊗ I) X(Λ) Θ(Λ) X(Λ)^T (Q̃^T ⊗ I).   (5.15)
So by an interlacing theorem [37, Theorem 8.1.7], the eigenvalues of ⟨π π^T ⊗ A⟩_n are bounded above by the largest of θ(λ_n) and bounded below by the smallest of θ(λ_n).
To complete the proof, we state the obvious:

min_s{ θ_min(A(s)) } ≤ θ(Λ) ≤ max_s{ θ_max(A(s)) },   (5.16)

as required.
Depending on the problem, these bounds can be either tight or loose. However,
they can offer practical guidance originating from the parameterized A(s) when devising solver methods for the G-NI system.
5.2.3  A Least-Squares Interpretation
The factorization (5.8) leads to a weighted least-squares interpretation of the G-NI
system (5.11). First, we manipulate the right hand side. Define x(Λ) to be the
solution vector evaluated at each Gauss point and stacked in the appropriate order.
Then the right hand side of (5.11) becomes
    (QD ⊗ I)b(Λ) = (QD ⊗ I)A(Λ)x(Λ)
                 = (Q ⊗ I)(D ⊗ I)A(Λ)x(Λ)
                 = (Q ⊗ I)A(Λ)(D ⊗ I)x(Λ).
Next we take the Cholesky factorization A(Λ) = F(Λ)F(Λ)^T, where F(Λ) is block-diagonal and each block is lower-triangular. Substituting this factorization into (5.11) with the manipulated right hand side we get

    (Q ⊗ I)F(Λ)F(Λ)^T(Q^T ⊗ I)vec(X) = (Q ⊗ I)F(Λ)F(Λ)^T(D ⊗ I)x(Λ).        (5.17)
Upon inspection we see that (5.17) is equivalent to the normal equations for the
weighted least-squares problem
    vec(X) = argmin_{vec(Y)} ‖ F(Λ)^T [ (Q^T ⊗ I)vec(Y) − (D ⊗ I)x(Λ) ] ‖_2 .        (5.18)
We can interpret this minimization problem in the following way. Notice that we can
write Q = PD where the columns of P are equal to π(λ). Applying this change we
get
    vec(X) = argmin_{vec(Y)} ‖ F(Λ)^T (D ⊗ I) [ (P^T ⊗ I)vec(Y) − x(Λ) ] ‖_2 .        (5.19)
The difference (P^T ⊗ I)vec(Y) − x(Λ) exactly measures the difference between a
truncated expansion evaluated at the Gaussian quadrature points λ and the true
solution evaluated at λ. For the minimization problem, this difference is weighted by
the operator. In other words, the G-NI coefficients X produce the truncated expansion
with the smallest weighted mean-squared error at the Gaussian quadrature points.
This interpretation may not lead to a realizable computational benefit, since large-scale weighted least-squares solvers may yet be too expensive for Galerkin problems of interest, particularly if they require the computation of the Cholesky factors F(Λ). However, the derivation offers additional insight into the results of the G-NI approximation.
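
As a quick sanity check of this equivalence (with random stand-ins rather than an actual G-NI system), one can verify that solving the normal equations matches the direct least-squares solution. In the sketch below, B plays the role of F(Λ)^T(Q^T ⊗ I) and c plays the role of F(Λ)^T(D ⊗ I)x(Λ); both are hypothetical arrays.

    import numpy as np

    rng = np.random.default_rng(1)
    m, p = 40, 12                                     # overdetermined stand-in problem
    B = rng.standard_normal((m, p))                   # stand-in for F(Lambda)^T (Q^T kron I)
    c = rng.standard_normal(m)                        # stand-in for F(Lambda)^T (D kron I) x(Lambda)

    x_normal = np.linalg.solve(B.T @ B, B.T @ c)      # normal equations, cf. (5.17)
    x_lstsq, *_ = np.linalg.lstsq(B, c, rcond=None)   # direct least squares, cf. (5.18)
    print(np.allclose(x_normal, x_lstsq))             # True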
5.2.4
Iterative Methods
A crucial observation follows from the factorization (5.8) that allows us to place the
G-NI method within the weakly intrusive paradigm. Note that multiplying a vector
against the matrix (5.8) can be accomplished in three steps. Suppose
    vec(Z) = (Q ⊗ I)A(Λ)(Q^T ⊗ I)vec(U).        (5.20)
Then, using the properties of the Kronecker product,
1. W = UQ. Let wα be a column of W.
2. For α ≤ n − 1, yα = A(λα )wα . Define Y to be the matrix with columns yα .
3. Z = YQ^T.
Step 1 can be thought of as pre-processing, and step 3 as post-processing. Since
each row of the matrix Q has a Kronecker structure corresponding to the tensor
product quadrature rule, steps 1 and 3 can be computed accurately and efficiently
using multiplication methods such as [27].
The second step requires only constant matrix-vector products where the matrix
is A(s) evaluated at some point in the parameter space. The initialization procedure
for a Krylov-based iterative method uses the right hand side (5.10), and this is constructed with Gaussian quadrature needing only evaluations of b(s) at the quadrature
points plus post-processing. Therefore, applying such iterative methods to compute
the G-NI approximation satisfies the requirements for the weakly intrusive paradigm.
To reiterate, under this paradigm we can take advantage of a tuned, reusable interface
for the matrix vector multiplies that will exploit any sparsity in the matrix to save
memory and increase efficiency.
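
A minimal sketch of this three-step product in Python/NumPy follows, under the assumption that the blocks A(λα) are supplied by the deterministic solver's matrix-vector interface (here they are simply stored as dense arrays, and the function name gni_matvec is our own):

    import numpy as np

    def gni_matvec(vec_u, Q, A_blocks):
        """Apply (Q kron I) A(Lambda) (Q^T kron I) to vec(U) via the three steps above.

        Q has one (orthonormal) row per basis function and one column per quadrature
        point; A_blocks[alpha] is A(lambda_alpha); vec(U) stacks the columns of U."""
        N = A_blocks[0].shape[0]
        U = vec_u.reshape(N, -1, order='F')           # columns indexed by basis functions
        W = U @ Q                                     # step 1: pre-processing
        Y = np.column_stack([A_blocks[a] @ W[:, a]    # step 2: constant matrix-vector products
                             for a in range(W.shape[1])])
        return (Y @ Q.T).reshape(-1, order='F')       # step 3: post-processing

In practice step 2 would call the tuned matrix-vector interface of the deterministic code rather than forming the A(λα) explicitly, which is the whole point of the weakly intrusive paradigm.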
5.2.5
Preconditioning Strategies
The number of iterations required to achieve a given convergence criterion (e.g. a
sufficiently small residual) can be greatly reduced for Krylov-based iterative methods
with a proper preconditioner. In general, preconditioning a system is highly problem
dependent and begs for the artful intuition of the scientist. However, the structure
revealed by the decomposition (5.8) of the G-NI system offers a number of useful
clues.
Suppose we have an N × N matrix L that is easily invertible. We can construct a
block-diagonal preconditioner I ⊗ L^{-1}, where I is the identity matrix of size |I| × |I|. (Recall that |I| is the number of basis functions in the G-NI approximation.) If we
premultiply the preconditioner against the factored form of the G-NI matrix, we get
    (I ⊗ L^{-1})(Q ⊗ I)A(Λ)(Q^T ⊗ I) = (Q ⊗ I)(I ⊗ L^{-1})A(Λ)(Q^T ⊗ I).        (5.21)
Notice that by the mixed product property and commutativity of the identity matrix,
the block-diagonal preconditioner slips past Q⊗I to act directly on the parameterized
matrix evaluated at the quadrature points. The blocks on the diagonal of the inner
matrix product are L−1 A(λα ) for α ≤ n − 1. In other words, we can choose one
constant matrix L to affect the parameterized system at any point in the parameter
space.
A reasonable and popular choice is the mean L = ⟨A⟩; see [76, 74] for a detailed
analysis of this preconditioner for stochastic finite element systems. Notice that this
is also the (1, 1) block of the Galerkin matrix and approximately the (1, 1) block of
the G-NI matrix. However, if A(s) is very large or has some complicated parametric
dependence, then forming the mean system and inverting it (or computing partial
factors) for the preconditioner may be prohibitively expensive. If the dependence of
A(s) on s is close to linear, then L = A(⟨s⟩) may be much easier to evaluate and just
as effective.
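
As a sketch of how such a block-diagonal preconditioner might be applied in practice (Python/SciPy, assuming a constant matrix L such as ⟨A⟩ or A(⟨s⟩) has already been formed; the helper name is our own), a single factorization of L suffices to precondition every block:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def make_block_preconditioner(L):
        """Return a function applying I kron L^{-1}: the same constant matrix L is
        solved against each length-N block of the stacked vector."""
        lu_piv = lu_factor(L)                        # factor the constant matrix once
        N = L.shape[0]
        def apply(vec):
            V = vec.reshape(N, -1, order='F')        # one column per basis function
            return lu_solve(lu_piv, V).reshape(-1, order='F')
        return apply

For a sparse L one would swap lu_factor/lu_solve for scipy.sparse.linalg.splu, or for an incomplete factorization when even a full sparse factorization is too expensive.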
One goal of the preconditioner is to reduce the condition number of the matrix, and
one way of achieving this is to reduce the spread of the eigenvalues. If we knew a
priori which region of the parameter space yielded the extrema of the parameterized
eigenvalues of A(s), then we could choose a parameter value in that space to evaluate
some preconditioner related to A(s). Unfortunately, we only get one such evaluation.
Therefore, if the largest possible value of the parameterized eigenvalues is very large,
we may choose this parameter value. Alternatively, if the smallest eigenvalue is close
to zero (for positive definite systems), then this may be a better option to reduce the
condition number of the G-NI system.
In any case, the structure of the G-NI matrix revealed in the factorization (5.8)
offers many clues for constructing effective preconditioners to compute the coefficients
of the G-NI approximation, regardless of the structure of the parameterized matrix
A(s).
5.3
Parameterized Matrix Package — A MATLAB
Suite
We have implemented many G-NI variants in a functioning MATLAB suite of tools
available on the online repository GitHub. This suite takes advantage of the common
matrix-vector product interface in the weakly intrusive paradigm, and – unlike most
stochastic finite element codes – does not require any symbolic integration to compute
an approximate Galerkin solution. It also contains codes for computing pseudospectral approximations with tensor product bases, which is useful for verification when
dealing with relatively small problems. The MATLAB codes, documentation, and a
series of examples can be downloaded at http://github.com/paulcon/pimp/tree/master.
5.4
Application — Heat Transfer with Uncertain
Material Properties
To conclude this chapter, we examine an application from computational fluid dynamics with uncertain model inputs. The codes (affectionately named Joe) used to
compute the deterministic version of this problem – i.e. for a single realization of the
model inputs – were developed at Stanford’s Center for Turbulence Research as part
of the Department of Energy’s Predictive Science Academic Alliance Program; the
numerics behind the codes are described in [75]. For this example, we made a slight
modification to the codes which allowed the extraction of the non-zero elements of
the matrix and right hand side used in the computation of the temperature distribution. Once we gained access to the linear system, we were able to apply the G-NI
method to the stochastic version of the problem in the weakly intrusive paradigm to
approximate the statistics of the solution.
5.4.1
Problem Set-up
The equation we are solving with Joe is the integral version of a two-dimensional
steady advection-diffusion equation. We seek a scalar field φ = φ(x, y) representing
the temperature defined on the domain Ω that satisfies
    ∫_{∂Ω} ρ φ v⃗(s4, s5, s6) · dS⃗ = ∫_{∂Ω} (Γ(s1, s2, s3) + Γt) ∇φ · dS⃗,        (5.22)
where ρ is the density, assumed constant, and the velocity v⃗ is precomputed by
an incompressible Navier-Stokes model and then randomly perturbed by three spatially varying oscillatory functions with different frequencies whose magnitudes are
respectively parameterized by s4 , s5 , and s6 . The parameterized magnitudes are interpreted as random perturbations of the velocity field, which is simply an input to
this model. The diffusion coefficient Γ = Γ(s1 , s2 , s3 ) is similarly altered by a strictly
positive, parameterized, spatially varying function that models random perturbation. The turbulent diffusion coefficient Γt is computed using a Reynolds-Averaged Navier-Stokes (RANS) model. The domain Ω is a channel with a series of cylinders; the computational mesh on the domain Ω contains roughly 10,000 nodes and is shown in Figure 5.1. Inflow and outflow boundary conditions are prescribed in the streamwise direction, and periodic boundary conditions are set along the y coordinate. Specified heat flux boundary conditions are set on the boundaries of the cylinders to model a cooling system.

Figure 5.1: Mesh used to compute temperature distribution.
5.4.2
Solution Method
The goal is to compute the expectation and variance of the scalar field φ = φ(x, y, s1, . . . , s6 )
over the domain Ω with respect to the variability introduced by the parameters. We
use the G-NI method to construct a polynomial approximation of φ along the coordinates induced by the parameters s1 , . . . , s6 . We employ the ANOVA-based heuristic
proposed in Chapter 4 to choose an anisotropic basis of orthogonal polynomials, but
we do not include the curvature parameters for the interaction effects. To solve the
G-NI system, we use a BiCGstab [80] method since the matrix is not symmetric.
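
The following self-contained sketch illustrates this solver setup with SciPy, using small random stand-ins for the quantities that the modified Joe interface would supply (a Q with orthonormal rows, nonsymmetric blocks A(λα), and a quadrature-built right hand side; all names and sizes here are invented for illustration). The G-NI matrix is never formed, only the three-step matrix-vector product from Section 5.2.4.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, bicgstab

    rng = np.random.default_rng(0)
    nb, n, N = 4, 8, 50                                    # basis size, quadrature points, unknowns
    Q = np.linalg.qr(rng.standard_normal((n, nb)))[0].T    # nb x n with orthonormal rows
    A_blocks = [np.eye(N) + 0.1 * rng.standard_normal((N, N)) for _ in range(n)]
    rhs = rng.standard_normal(nb * N)                      # stand-in for (QD kron I) b(Lambda)

    def gni_matvec(v):
        U = v.reshape(N, nb, order='F')
        W = U @ Q                                                          # step 1
        Y = np.column_stack([A_blocks[a] @ W[:, a] for a in range(n)])     # step 2
        return (Y @ Q.T).reshape(-1, order='F')                            # step 3

    Aop = LinearOperator((nb * N, nb * N), matvec=gni_matvec)
    x, info = bicgstab(Aop, rhs)
    print('converged' if info == 0 else 'info = %d' % info)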
5.4.3
Results
The weights computed for the anisotropic basis (see equation (4.36)) are shown in
Table 5.1.

    γ1      γ2      γ3      γ4       γ5         γ6
    1.00    6.79    3.67    16.40    4859.35    4265.45

Table 5.1: The weights computed with the ANOVA-based method for choosing an efficient anisotropic basis.

The weights for coordinates 5 and 6 are orders of magnitude larger than 1 – the weight for the most important coordinate. With respect to the basis, this
implies that the polynomial expansion along the first coordinate must expand beyond
degree 4859 before it expands at all along the 5th coordinate. This is effectively a
dimension reduction. The ANOVA-based method finds that the 5th and 6th parameter coordinates do not contribute to the overall variability in the function compared
to the first (and most important) coordinate. This dramatically reduces the number
of basis functions required for the approximation. In Figure 5.2, we plot the growth
of the number of terms in the weighted, ANOVA-based anisotropic basis compared
to the growth of a full polynomial basis; we observe orders of magnitude difference in
the number of basis functions between the two methods.
Using the full polynomial basis for comparison would have been infeasible. Therefore we plot the residual error estimate (Theorem 13) for the weighted basis as a
function of the parameter n. This is shown in Figure 5.3. The sharper decrease
(i.e. the stairstep behavior) in the convergence of the residual occurs when an important coefficient is added to the approximation. This signals that there are possible
improvements to the heuristic for choosing the basis functions.
Finally, we show the expectation and variance of φ over the domain Ω in Figure
5.4. As expected, the variance in φ occurs in the downstream portion of the domain
as a result of the variability in the diffusion coefficient.
5.5
Summary
In this chapter, we introduced a new paradigm for spectral methods for parameterized
matrix equations dubbed weakly intrusive that offers a middle ground in the debate
over the relative advantages of intrusive versus nonintrusive methods.

Figure 5.2: The number of terms as n increases in the weighted, ANOVA-based anisotropic basis compared to the number of terms in the full polynomial basis.

Figure 5.3: Plotting the residual error estimate of the G-NI approximation with the weighted, ANOVA-based polynomial basis.

Figure 5.4: The expectation (above) and variance (below) of the temperature field φ over the domain.

The weakly intrusive paradigm – in analogy to Krylov-based iterative solvers – uses only matrix-vector products against the parameterized matrix evaluated at points in the parameter space, thus creating the possibility of code reuse and exploiting sparsity for memory and
efficiency gains. Also, the weakly intrusive paradigm allows the computation of a
residual error estimate for a given approximation of the solution x(s), which is not
possible in purely nonintrusive methods.
We presented a variant of the spectral methods called Galerkin with Numerical
Integration, or G-NI, which is equivalent to the Galerkin method except that all integrals are replaced by Gaussian quadrature formulas. We showed how this simple change
opens up many new avenues for analysis and interpretation of the results through an
elegant factorization of the linear system of equations solved to compute the G-NI
coefficients. This factorization allowed us to implement the G-NI method within the
weakly intrusive paradigm. It also allowed us to derive bounds on the eigenvalues
of the G-NI matrix, which are useful for analysis of the method and connecting
it to the original parameterized matrix equation. The factorization also suggested
broad strategies for preconditioning the G-NI system for rapid convergence inside an
iterative solver.
We tested this method with the ANOVA-based anisotropic basis on an engineering
application from computational fluid dynamics. We were able to reuse code developed
for the associated deterministic problem and take advantage of a custom matrix-vector
product interface. The ANOVA-based method found an appropriate anisotropic basis
for the solution and showed rapid convergence of the residual error estimate with
orders of magnitude fewer terms compared to the full polynomial basis.
Chapter 6
An Example from Conjugate Heat
Transfer
In this chapter, we demonstrate the application of the spectral methods beyond the
context of parameterized matrix equations. While the approximation of many engineering models reduces to solving an appropriate linear system, many nonlinear models
do not reduce in such a straightforward manner. Fortunately, we are not hindered by
such difficulties when applying the nonintrusive pseudospectral/collocation methods.
The work in this chapter has been published in [18].
In what follows, we examine the incompressible flow and heat transfer around an
array of circular cylinders. In this case the momentum transport is decoupled from the
energy equations and this allows us to derive a semi-intrusive method combining the
advantages of intrusive and non-intrusive methods. The physical model is based on
the two-dimensional Reynolds-averaged Navier-Stokes (RANS) equations completed
by an eddy-viscosity turbulence model [63]. We introduce stochastic boundary conditions to account for uncertainties in the incoming flow and the thermal state of the
cylinder surface.
To approximate the statistics of the stochastic temperature field, we derive a hybrid uncertainty propagation scheme that applies (i) a spectral collocation method to
the momentum equations and (ii) a spectral Galerkin method to the energy equation.
The Galerkin form of the energy equation naturally decouples to a set of stochastic
scalar transport equations, and the resulting system is developed within a commercial
computational fluid dynamics code [47].
There has been a flurry of recent work applying both intrusive and non-intrusive
techniques to stochastic flow models. We refer the curious reader to the following
papers for further details [52, 83]. General hybrid propagation methods are also not
without precedent; see [31] for a similar approach that integrates spectral expansion
methods with a collocation-like procedure for an efficient solution approach.
The chapter is structured as follows: Section 6.1 describes the problem in full
detail including our modelling choices for the uncertain input parameters and the
specific objectives of our computations. Section 6.2 derives the hybrid propagation
scheme for the model. Section 6.3 presents the results and analysis of our numerical
experiments.
6.1
Problem Description and Motivation
To achieve higher thermal efficiency and thrust, modern gas turbine engines operate
at high combustor outlet temperatures and, therefore, turbine blades undergo severe
thermal stress and fatigue. Secondary cooling flow passages are built into each blade
(Figure 6.1) and consist of turbulators, film cooling holes, tube bundles, and pins.
These are mostly used in the narrow trailing edge region of the blade [59].
In the present study we consider the flow and heat transfer around a periodic
array of pins separated by a distance L/D = 1 (where D is the cylinder diameter).
The flow conditions are assumed to be fully turbulent with a Reynolds number based
on the incoming fluid stream (and D) of ReD = 1, 000, 000. In this regime, direct
solutions of the Navier-Stokes equations are impractical due to the range of length
and time scales, which results in an extremely large computational cost; we resort to
Reynolds-averaged modeling.
The problem is assumed two-dimensional and the computational domain is representative of a single row of aligned cylinders; x1 ∈ [−2, 10] and x2 ∈ [−2, 2] with
a circle of radius 0.5 centered at the origin. Periodicity is enforced in the vertical
direction with inflow and outflow conditions applied in the streamwise direction (see
Figure 6.2).

Figure 6.1: A turbine engine cooling system with a pin array cooling system.
Most numerical simulations of similar phenomena rely on simple thermal boundary conditions (constant temperature or constant heat flux) to evaluate the heat
transfer characteristics of pin cooling devices. In realistic operating conditions, the
overall surface thermal state is the result of an energy balance between convection and
conduction in the fluid and in the solid. Therefore, accurate predictions of the heat
transfer rates require the solution of the conjugate (solid-fluid) heat transfer problem. Instead of modeling the solid-fluid interactions directly [46], we introduce an
uncertain heat flux on the boundary of the cylinder as a mild substitute; the precise
formulation of this is given below.
Another simplification that is typically invoked in the design of blade cooling
systems is to ignore the interactions between the various components (turbulators,
slots, etc.) and optimize their performance independently. The pins, in particular,
are the last stage of the cooling system and, therefore, more strongly affected by flow
distortions introduced upstream. To investigate the effect of inflow perturbations,
we model the uncertain inflow as a linear combination of oscillatory functions with
different wave lengths and random amplitude; the precise formulation is given below.
Figure 6.2: Computational mesh for two-dimensional cylinder problem.
6.1.1
Mathematical Formulation
The governing equations are the two-dimensional RANS equations written in the
assumption of incompressible fluid and steady flow. The conservation of mass, momentum and energy can be written in indicial notation as:
    ∂Ui/∂xi = 0        (6.1)

    Uj ∂Ui/∂xj = ∂/∂xj [ (ν + νt) ∂Ui/∂xj ] − (1/ρ) ∂P/∂xi        (6.2)

    Uj ∂T/∂xj = ∂/∂xj [ (κ + νt/Prt) ∂T/∂xj ]        (6.3)
where the density (ρ), the molecular viscosity (ν) and the thermal conductivity (κ)
are given properties of the fluid and assumed constant. The eddy viscosity (νt ) is
computed using the k-ω turbulence model [63], and the turbulent Prandtl number
(P rt ) is assumed to be a constant. In the assumption of incompressible flow, the
energy equation is decoupled from the momentum equation and can be solved after
the velocity field is computed.
6.1.2
Uncertainty sources
We assume that the sources of uncertainty are the specification of the velocity boundary condition on the incoming flow – the effect of the upstream components – and the
definition of the thermal condition on the surface of the cylinder – the effect of the
conductivity on the pin. Let s1 , s2 , and s3 be independent random variables on an appropriate probability space, each uniformly distributed over the interval [−1, 1]. The
uncertain boundary conditions are prescribed by continuous functions of si , i = 1, 2, 3,
and are therefore random variables themselves.
The inlet velocity profile is constructed as a linear combination of two cosine
functions of x2 ∈ [−2, 2], i.e.
    U|inlet(x2, s1, s2) = 1 + σ1 (s1 cos(2πx2) + s2 cos(10πx2)),        (6.4)
where σ1 controls the inflow velocity fluctuations. For numerical experiments, we
set σ1 = 0.25, which ensured that the amplitude of the random fluctuations did not
cause the inlet velocity to become negative. This model allowed moderate random
fluctuations (at most 25%) about a mean value, hUinlet i = 1. The wave numbers 2 and
10 in (6.4) were chosen to introduce low and high frequency fluctuations, respectively.
The heat flux is specified as an exponential function of s3 over the cylinder wall,
namely
    ∂T/∂n |cyl (θ, s3) = e^{−(0.1+σ2 s3)(cos(θ)/2)},        (6.5)
where n is the normal to the cylinder and σ2 controls the influence of s3 . The angle
θ ∈ [0, π] is the angle away from the front of the cylinder as shown in figure 6.3. For the
numerical experiments, we chose σ2 = 0.05. This model prescribes a larger heat flux
at the left side of the cylinder where the flow strikes it; the realization of s3 determines
precisely how much greater. The maximum variability due to s3 is approximately
2.5%.
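
For concreteness, the two stochastic boundary conditions (6.4) and (6.5) are simple enough to transcribe directly; the following Python sketch uses the σ1 and σ2 values from the numerical experiments, with function names of our own choosing.

    import numpy as np

    sigma1, sigma2 = 0.25, 0.05

    def u_inlet(x2, s1, s2):
        """Inlet velocity profile (6.4), x2 in [-2, 2], s1 and s2 in [-1, 1]."""
        return 1.0 + sigma1 * (s1 * np.cos(2 * np.pi * x2) + s2 * np.cos(10 * np.pi * x2))

    def wall_heat_flux(theta, s3):
        """Prescribed heat flux dT/dn on the cylinder (6.5), theta in [0, pi], s3 in [-1, 1]."""
        return np.exp(-(0.1 + sigma2 * s3) * (np.cos(theta) / 2.0))

    # At the stagnation point theta = 0, the flux varies between wall_heat_flux(0.0, 1.0)
    # and wall_heat_flux(0.0, -1.0), roughly a 2.5% deviation about the nominal value,
    # consistent with the statement above.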
The one-way coupling in equations (6.2) and (6.3) implies that the heat flux boundary condition parameterized by s3 has no effect on the velocity field. Therefore
we can write Ui = Ui (x, s1 , s2 ) for i = 1, 2. However, the temperature field does
depend on the variability in the velocity inflow specification, which we denote by T = T(x, s1, s2, s3). This observation is crucial to the derivation of the hybrid method in section 6.2.

Figure 6.3: Schematic of uncertain inflow conditions. The arrows represent the stochastic inflow conditions, and the shading represents the heat flux on the cylinder wall.
Remarks
Our approach to modeling input uncertainties is admittedly ad hoc. For a real application – instead of a problem that is simply motivated by a real application –
the parameters of a stochastic model would be estimated from experimental data,
e.g. using a procedure such as [32].
6.1.3
Objective
We are interested in the effects of the input uncertainties on the temperature distribution around the cylinder wall. To this end, we desire the variance of temperature
as a function of θ around the cylinder. In other words, we wish to compute

    σT²(θ) ≡ Var[ T|cyl(θ) ] = ⟨ ( T|cyl(θ, s1, s2, s3) − µT(θ) )² ⟩,        (6.6)
where µT(θ) = ⟨ T|cyl(θ) ⟩. To approximate σT², we use a hybrid stochastic Galerkin/collocation
scheme described in the next section.
6.2
A Hybrid Propagation Scheme
In general, the spectral Galerkin method requires the solution of a large, coupled system of equations to solve for the coefficients of the global expansion. In contrast, the
collocation method requires the evaluation of the parameterized model at a discrete
set of points, similar to sampling methods such as Monte Carlo; thus the collocation
method can typically be implemented in a nonintrusive fashion. The difference in
the approximation given by each approach, known as aliasing error, typically decays
like the approximation error, i.e. exponentially fast, for linear problems [16].
In what follows, we apply a Galerkin method to the energy equation (6.3) and a collocation method to the nonlinear momentum equation (6.2). A non-aliased Galerkin
formulation of the full RANS equations would introduce a large, coupled system for
the coefficients of the Galerkin approximation because of the non-linear convective
operators in the momentum, the energy, and the turbulence transport equation. This
greatly complicates the solution procedure and cannot be accomplished within the
framework of a commercial CFD code. The present approach represents an attempt
to retain an efficient Galerkin formulation for the (linear) energy transport while
relying on a collocation formulation for the non-linear momentum equation. The result is a semi -intrusive hybrid scheme that takes advantage of the flexibility in the
commercial software used to solve the flow problem.
Remark
The convergence of both the Galerkin and collocation methods depends on the smoothness of the
quantities of interest with respect to the parameters. We assume that the relatively
small and bounded range of variability in the boundary conditions ensures that the
solution satisfies such a smoothness assumption and does not introduce any singularities in the solution within the parameter space.
6.2.1
Galerkin method for the energy equation
To solve the Galerkin form of the energy equation (6.3), we express the Galerkin
approximation of temperature TN as an orthogonal expansion in s3 :
    TN = TN(x, s1, s2, s3) = Σ_{k=0}^{N} Tk(x, s1, s2) πk(s3).        (6.7)
where the πk (s3 ) are the normalized Legendre polynomials. For notational convenience, we write this in vector notation as
    TN = T^T π(s3),        (6.8)
where T is a vector of the expansion coefficients and π(s3 ) a vector of the Legendre
basis polynomials. Then by projecting the energy equation onto each basis polynomial
and requiring the residual to be orthogonal to the approximation space, we can write
the Galerkin form as
    ⟨ Uj ∂/∂xj (T^T π(s3)) π(s3)^T ⟩ = ⟨ ∂/∂xj [ (κ + νt/Prt) ∂/∂xj (T^T π(s3)) ] π(s3)^T ⟩,        (6.9)
subject to the boundary conditions
    ⟨ ∂/∂n (T^T π(s3)) π(s3)^T ⟩|cyl = ⟨ e^{−(0.1+σ2 s3)(cos(θ)/2)} π(s3)^T ⟩.        (6.10)
Note that the projection is with respect to only s3 . By the linearity of the expectation
operator and the orthonormality of the basis, equation (6.9) reduces to a set of uncoupled scalar transport equations for the coefficients of the Galerkin approximation
TN:

    Uj ∂Tk/∂xj = ∂/∂xj [ (κ + νt/Prt) ∂Tk/∂xj ],        k = 0, . . . , N,        (6.11)
each subject to boundary conditions on the cylinder wall given by
    ∂Tk/∂n |cyl = ⟨ e^{−(0.1+σ2 s3)(cos(θ)/2)} πk(s3) ⟩.        (6.12)
(Recall that the subscript on U denotes the spatial coordinate while the subscript on T
denotes the coefficient in the Galerkin expansion.) Note that the velocity components
Uj and the temperature expansion coefficients Tk are functions of the spatial variables
x and the parameters s1 and s2 . Thus by exploiting the one-way coupling in the
RANS model, we have effectively replaced the random variable s3 by a set of N +
1 parameterized transport equations. We can treat the new system of equations
(momentum plus scalar transports) with a collocation method in two dimensions.
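
The only place where s3 enters the decoupled system (6.11)-(6.12) is through the projected heat-flux coefficients on the right hand side of (6.12). These one-dimensional projections are cheap to evaluate; a sketch with NumPy (normalized Legendre polynomials and a Gauss-Legendre rule, with the σ2 and N values from the text and a function name of our own) is:

    import numpy as np
    from numpy.polynomial import legendre

    N, sigma2 = 4, 0.05
    nodes, weights = legendre.leggauss(12)        # ample points for this smooth integrand
    weights = weights / 2.0                       # uniform density 1/2 on [-1, 1]

    def flux_coefficients(theta):
        """Galerkin coefficients <exp(-(0.1 + sigma2*s3)(cos(theta)/2)) pi_k(s3)>,
        k = 0..N, i.e. the boundary data (6.12) for the scalar transport equations."""
        f = np.exp(-(0.1 + sigma2 * nodes) * (np.cos(theta) / 2.0))
        coeffs = np.empty(N + 1)
        for k in range(N + 1):
            pk = legendre.legval(nodes, np.eye(N + 1)[k]) * np.sqrt(2 * k + 1)
            coeffs[k] = np.sum(f * pk * weights)
        return coeffs

    # e.g. flux_coefficients(0.0): the k = 0 term dominates and the higher coefficients
    # decay rapidly, which is one reason a small N suffices in this problem.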
6.2.2
Collocation method for the modified system
Following the prescribed collocation algorithm, we evaluate the solution of the model
(6.2) and (6.11) at a discrete set of points within the range of the random variables s1
and s2 . We choose a tensor grid of Gauss-Legendre points with M + 1 points in each
direction. In other words, we solve (M + 1)² deterministic systems given by equations (6.2) and (6.11), so that the parameterized solution is exact at each point (λ1^(i), λ2^(j)), for i, j = 0, . . . , M in the two-dimensional Gauss-Legendre grid.
An important part of any implementation of a non-intrusive propagation technique is the deterministic solver. For each point (λ1^(i), λ2^(j)), we employ the commercial software package Fluent [47] to solve for the temperature coefficients Tk and velocity fields
U1 and U2 in the modified RANS equations. Fluent uses a finite volume second-order
discretization on unstructured grids. The mesh has been generated to achieve high
resolution of the boundary layer on the cylinder surface with y⁺ ≈ 1. We performed
preliminary simulations to assess the resolution requirements for the present problem.
Each deterministic solve is converged to steady state by ensuring that the residuals
of all the equations are reduced by four orders of magnitude.
We are not interested in the response surface of the temperature as a function
of s1 , s2 , and s3 – only its variance. Therefore we do not need to construct the
interpolant through the collocation points. Instead we approximate the variance of
temperature as a function of θ with the following steps.
1. For each point (λ1^(i), λ2^(j)) in the tensor grid of Gauss-Legendre points, solve for the velocity fields U1^(i,j) and U2^(i,j).
2. For k = 0, . . . , N, solve the scalar transport equation for Tk^(i,j) using the result from the velocity computation.
3. Compute the approximate variance of temperature as

    µT(θ) ≈ Σ_{i=0}^{M} Σ_{j=0}^{M} T0^(i,j)(θ) wi,j ≡ µ̄T(θ),        (6.13)

    σT²(θ) ≈ Σ_{i=0}^{M} Σ_{j=0}^{M} ( Σ_{k=0}^{N} (Tk^(i,j)(θ))² ) wi,j − µ̄T(θ)² ≡ σ̄T²(θ),        (6.14)
where wi,j is the weight corresponding to the Gauss-Legendre two-dimensional
quadrature rule. This approximation follows from the variance formula (6.6)
applied to the Galerkin approximation TN .
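
Once the coefficient fields Tk^(i,j) are available from the Fluent runs, (6.13)-(6.14) are a simple weighted reduction. A sketch in NumPy follows, with assumed array shapes; the names and layout here are our own convention, not Joe's or Fluent's.

    import numpy as np

    def hybrid_moments(T, w):
        """Approximate mean and variance on the wall via (6.13)-(6.14).

        T: array of shape (M+1, M+1, N+1, n_theta) holding Tk^(i,j)(theta);
        w: array of shape (M+1, M+1) of tensor Gauss-Legendre weights, normalized so
        that they sum to one for the uniform density on [-1, 1]^2."""
        mean = np.einsum('ij,ijt->t', w, T[:, :, 0, :])      # (6.13)
        second = np.einsum('ij,ijkt->t', w, T ** 2)          # sums over i, j and k
        return mean, second - mean ** 2                      # (6.14)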
6.2.3
Analysis of computational cost
The hybrid approach benefits both from the increased accuracy of the Galerkin formulation and decreased computational cost. In this section we compare the cost of
the hybrid method to a naive three-dimensional collocation method. Let C1 be the
cost of one deterministic solve of the standard RANS equations (6.1)-(6.3), and let C2
be the cost of the modified system (6.1), (6.2), and (6.11), (both solved with Fluent).
Assume that

    C2 = α(N) C1,        (6.15)

where α(N) > 1. The cost of naive three-dimensional collocation is then κc = C1(M + 1)³, and the cost of the hybrid method is κh = C2(M + 1)² = α(N)C1(M + 1)². Thus we have the following relation:

    κh = [α(N)/(M + 1)] κc.        (6.16)
In the numerical experiments in section 6.3, we found that N = 4 was sufficient
for converged variance, and α(4) ≈ 2. The M required for converged variance was
18. Therefore the hybrid method is roughly ten times as efficient as a naive three-dimensional collocation.
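For reference, substituting these observed values into (6.16) gives

    κh / κc = α(4)/(M + 1) ≈ 2/19 ≈ 0.11,

i.e. the hybrid computation costs roughly a tenth of the naive collocation.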
We note that if we had parameterized the stochastic heat flux boundary condition
by more random variables to account for random spatial fluctuation, we expect the
savings to be even greater.
6.3
Results
In this section, we demonstrate numerical convergence of the approximate variance
computed with the hybrid method, and we compare the results with a conventional
Monte Carlo uncertainty propagation method. Following this verification, we make
some remarks about the physical phenomenon described by the stochastic model.
6.3.1
Numerical convergence and verification
We first check convergence of the variance approximation from the hybrid method by
increasing (i) the order of the Galerkin approximation TN and (ii) the number of points
in the collocation scheme. Figure 6.4 displays the difference between the two-norm of
the Galerkin approximations TN and TN −1 on the cylinder wall for two-dimensional
tensor collocation schemes built from successive Gauss quadrature formulas. We set a
tolerance of 10⁻⁵ to be consistent with the convergence tolerance for each Fluent solve.
The approximation achieves the chosen tolerance when N = 3 for each quadrature
formula, and we include N = 4 to increase the confidence in the convergence.
From these results we conclude that N = 4 is sufficient for converged results. We
then check the convergence of the quadrature rule by increasing the number of points
M. Note that the total number of points in the two-dimensional rule is (M + 1)². We adjust the tolerance for the quadrature rule to 10⁻⁴ to account for the increased
effects of rounding due to quadrature. In Figure 6.5, we plot the difference in the
variance computed with an M point rule and an M − 1 point rule. The values appear
to converge after M = 10, and we compute the remaining points to ensure the results
remain within the tolerance.
The relatively slow convergence of the quadrature versus Galerkin is not a result
of any deficiencies in the method; the quadrature integrates along the coordinates
that affect the nonlinear momentum equation. The nonlinearity in the computation
of velocity yields substantial error in the quadrature which is absent in the Galerkin
approximation of the linear energy equation.
Figure 6.4: Convergence of the variance of the Galerkin approximation TN as the number of terms in the expansion N increases, plotted as ||Var[TN] − Var[TN−1]||₂ versus N. Each line represents the convergence for one quadrature rule (9, 25, 49, 81, 121, 169, 225, 289, and 361 points). The convergence tolerance is 10⁻⁵.
6.3.2
A physical interpretation
In Figures 6.6 and 6.7 we plot the approximate expectation µ̄T and variance σ̄T2 as a
function of θ around the top half of the cylinder wall. We compare these results to
a Monte Carlo method with 10,000 samples. The purpose of this comparison is to
verify the qualitative features of the hybrid results, which fare very well.
In the expectation plot, the separation point of the flow is clearly identified by the
sharp dip in the temperature. After the separation, the temperature rises again from
Figure 6.5: Convergence of the variance of the Galerkin approximation T4 as the number of points in the quadrature rule M increases. The convergence tolerance is 10⁻⁴.
the impact of the recirculating flow in the wake of the cylinder. The variance plot
shows that the variability in temperature is largest at the front of the cylinder, i.e.
the stagnation point of the flow. This results from two sources: First, the heat flux
boundary condition (equation 6.5) is defined to have larger variability at the front of
the cylinder. Second, the variability in the inflow conditions (equation 6.4) affects
the formation of a thermal boundary layer around the front of the cylinder wall, thus
modifying flow conditions at the stagnation point.
In addition to a large temperature variance at the stagnation point, Figure 6.7
illustrates that the variability is considerably higher in the boundary layer upstream
of the separation (θ ≈ π/2) than in the downstream area. This is expected since the
variability in the upstream conditions does not directly penetrate the separated shear
layer. In other words, the location of the flow separation is only a function of the
Reynolds number, which is not considerably altered by the different inflow conditions.
We can qualitatively assess the effects of the inflow variability on the variance at
Figure 6.6: Approximate expectation as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.
the stagnation point by approximating the conditional variance on the cylinder wall
given s1 = s2 = 0. In fact, we have computed this quantity already when we computed
the solution corresponding to the central Gauss quadrature point (λ1 = 0, λ2 = 0).
We approximate the conditional variance by
    Var[T(x, s1, s2, s3) | s1 = s2 = 0] ≈ Var[TN(x, 0, 0, s3)] = Σ_{k=1}^{N} Tk(x, 0, 0)².        (6.17)
In Figure 6.8 we plot the conditional variance against the Monte Carlo variance on
the wall of the cylinder. We note that the conditional variance is much smaller than
the Monte Carlo variance, which suggests that the total variance in temperature has
significant contributions from the variability in s1 and s2 .
Figure 6.7: Approximate variance as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.
Figure 6.8: Approximate conditional variance at s1 = s2 = 0 as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.
Chapter 7
Summary and Conclusions
In this final chapter, we summarize the development and contributions of this dissertation and discuss possible extensions and future work.
7.1
Summary of Results
The fundamental issue motivating this work is the need to understand the variability
in the output of a physical model given a parametric representation for the model
inputs. This question is explored in the burgeoning field of uncertainty quantification,
which employs primarily probabilistic tools to quantify the output variability given a
stochastic representation of the inputs in terms of a finite set of random variables or
parameters. In many cases of interest, the computational procedure for exploring this
relationship involves a spatial discretization of the differential operators in the physical model, where the elements of discrete operators and forcing terms then depend
on the newly introduced input parameters. In other words, these models yield a parameterized matrix equation, where the objective is to approximate the vector-valued
function that solves the parameterized equation or compute some derived statistics
of the solution. Beyond this initial motivation, parameterized matrix equations appear in a wide range of applications including image processing, webpage ranking,
circuit design, control problems, and recently an interpolation scheme for arbitrarily
distributed data points.
With its wide range of applicability and generality, we offer the easily stated
parameterized matrix equation as a model problem for analysis and algorithm development for problems of interest in uncertainty quantification and beyond. Formally,
we examine the equation
    A(s)x(s) = b(s),        s ∈ [−1, 1]^d,        (7.1)
where we assume that the elements of A(s) and b(s) depend analytically on the
parameters s and the matrix A(s) is nonsingular for all s ∈ [−1, 1]d . In Chapter 2, we
analyze this model problem and discuss characteristics of the vector-valued solution
x(s). In particular, we show how using Cramer’s rule we can write each component
of x(s) as a ratio of determinants of parameterized matrices. The assumptions then
imply that each component of x(s) depends analytically on s as well. This rational
structure reveals insight into the solution that we can use to understand the types of
functions we will encounter in the approximation schemes. We take advantage of these
insights in an informal discussion of singularities and their importance for computing
statistics of x(s). If singularities exist within the parameter space, then some desired
statistics may not exist (i.e. the integral quantities may be infinite). Other highlights
from this informal discussion include noting the difficulties associated with unbounded
parameter spaces, developing an intuition for the variability in x(s), and relating the
position of singularities to parameter values outside the hypercube where A(s) is
singular. The primary purpose of Chapter 2 is to become familiar with the sorts of
functions and measurements we will approximate with the numerical schemes.
In Chapter 3, we presented the univariate polynomial approximation schemes –
known as spectral methods – for computing the desired statistics of x(s), where s
is a single parameter. Broadly speaking, these methods construct a finite degree
polynomial that globally approximates x(s) over the parameter space; approximate
statistics are then computed from the constructed polynomial. These methods are
well-studied in the context of numerical methods for partial differential equations, and
the theory of polynomial approximation is now considered classical. We briefly sketched
the necessary background in orthogonal polynomials, Lagrange interpolation, Fourier
series for square-integrable functions, and Gaussian quadrature which underlies the
spectral methods.
We then derived a pseudospectral method that uses a Gaussian quadrature rule
to approximate the coefficients of a truncated Fourier expansion of x(s). We show
how the finite-term pseudospectral approximation interpolates x(s) at the Gaussian
quadrature nodes, and we emphasized that computationally this method only requires
the solution of the parameterized matrix equation at a finite set of parameter values.
Therefore this procedure is categorized as non-intrusive, since one may repeatedly
employ an optimally tuned solver for the matrix equation that results from choosing
a parameter value.
We also derived a spectral Galerkin method that finds the coefficients of a finite
series approximation in the orthogonal polynomial basis such that the residual is
orthogonal to the finite dimensional approximation space spanned by the basis polynomials. While the solution is optimal in the energy norm induced by the operator,
computing its coefficients requires solving a linear system of equations that is n times
larger than the original parameterized system, where n is the number of terms in the
finite series. Unfortunately, we cannot exploit existing solvers for the parameterized
system given a parameter value. Therefore the Galerkin method is categorized as
intrusive.
We rigorously compared the relative merits of both the Galerkin and pseudospectral methods in Chapter 3. Via this comparison, we uncovered a fascinating interpretation of the difference between the two methods in terms of the symmetric,
tridiagonal Jacobi matrices of three-term recurrence coefficients of the orthogonal basis. In short, we found that the coefficients of both approximations solve a carefully
truncated infinite system of linear equations. In the pseudospectral case, first the
infinite Jacobi matrix is truncated and then the operator is applied to arrive at the
necessary system. In the Galerkin case, the operator is applied to the infinite Jacobi
matrix, and the resulting system is truncated. This result is new, and it extends
the work of Golub and Gautschi [29, 38] on Gaussian quadrature and its relationship
to the Jacobi matrices. As a corollary to this result, the conditions for equivalence
between the two methods become immediately apparent.
After uncovering this relationship, we used classical theory to derive asymptotic
error estimates for both methods in terms of the mean-squared norm. We found that
both methods have the same asymptotic rate of convergence – which was well-known
in the context of spectral methods for PDEs – and that rate is intimately tied to
the location of the nearest singularity in the solution extended to the complex plane.
Loosely speaking, the closer the singularity is to the region of interest, the slower the
convergence of the polynomial approximation, which implies that an approximation
may need many terms to achieve a fixed accuracy in the mean-squared norm. However, the asymptotic error estimates are typically not useful in practice. Therefore,
we also derived a residual error estimate for a given approximation, which functions
as an a posteriori error estimate.
The comparable asymptotic convergence rates lead us to favor the nonintrusive
pseudospectral method for its ease of implementation. We close Chapter 3 with two
representative numerical examples and an application to the PageRank model for
ranking nodes in a graph. The PageRank model includes a parameter, which we
consider variable, and we used the pseudospectral method to compute the statistics
of a random variant of PageRank [34].
In Chapter 4 we extended the univariate spectral methods to problems that depend
on multiple parameters, which led to the challenging realm of multivariate approximation. The most natural extension of these methods is through a tensor product
construction of the multivariate basis. This extension uses all possible products of
the univariate basis functions for each parameter to construct the multivariate basis
functions. The analysis for such a basis follows in a straightforward way from the
univariate analysis, and all the connections related to the Jacobi matrices extend via
Kronecker products. However, such a basis is highly impractical – and often computationally infeasible – since the number of basis functions increases exponentially
as the number of parameters increases. We therefore turned to non-tensor bases in
the hope of finding an accurate approximation method with significantly reduced
computational cost.
Standard multivariate polynomial approximation uses the so-called full polynomial basis, which is equivalent to using basis functions whose multi-index elements
sum to something less than or equal to a given n. Comparing this to the tensor construction – where each element of the basis multi-indices must be less than or equal to n
– we saw a direct analogy with norms on Rd . In particular, the full polynomial basis
corresponds to a one norm restriction on the multi-indices while the tensor product
basis corresponds to an infinity norm restriction on the multi-indices. We use this
analogy to generalize the possible set of basis elements to a parameterized, weighted
semi-norm restriction on the allowable multi-indices in a finite term polynomial approximation of x(s). This basis let us exploit any anisotropic parameter dependence
in the solution. In other words, if some subset of the parameters contributes more to
the overall variability of x(s) – as measured by the mean-squared norm – then this
weighted basis can choose more basis elements to efficiently capture the effects of the
important parameters and fewer basis elements for the unimportant parameters.
To choose the weights of the semi-norm, we used the functional ANOVA decomposition of x(s) to determine the most important parameters as characterized
by their main effects functions and interaction effects functions. We developed an
iterative heuristic that computes a polynomial approximation for a fixed n, computes the ANOVA statistics for the approximation, transforms the ANOVA statistics
to weights for the semi-norm restriction, and computes an approximation at n + 1
with this semi-norm. Through this procedure we choose the most important basis
functions to represent a finite term approximation of x(s), where the importance is
related to the magnitude of the associated coefficient, i.e. its contribution to the total
variability.
We tested this method on a series of simple problems built to showcase the merits
of the method: The first problem has a solution that depends anisotropically on its
two parameters, and the ANOVA-based method uncovers and exploits this anisotropy.
The solution to the second problem has very weak interaction between its two input
parameters, and the ANOVA-based method requires many fewer basis elements to
capture the overall variability. The third problem’s solution has strong interaction
effects, and the weighted basis chooses basis elements beyond a one norm restriction
to efficiently approximate the solution. We also tested the weighted method on a
problem from the literature: an elliptic partial differential equation with coefficients
parameterized by five independent parameters. The method revealed the most important parameters and chose basis elements to capture the variability induced by
those parameters.
The purpose of Chapter 5 was to develop strategies for large-scale systems, where
the scale may come from the size of the parameterized linear system or the number
of parameters used in the model. The chapter opened by revisiting the question of intrusive versus nonintrusive and proposing a middle ground – dubbed weakly intrusive
– as an alternative paradigm for algorithmic development. Methods developed in this
paradigm are allowed (i) matrix-vector products, where the matrix is the parameterized matrix evaluated at a point in the parameter space and (ii) evaluations of the
right hand side at points in the parameter space. This yields an analogy between the
weakly intrusive paradigm and so-called matrix-free solvers, such as Krylov-based iterative methods. A helpful mnemonic for distinguishing the weakly intrusive paradigm
goes: Nonintrusive methods sample the solution whereas weakly intrusive methods
sample the operator.
We presented a variant of the Galerkin method within this framework called
Galerkin with Numerical Integration (G-NI), which is equivalent to the Galerkin
method except that every integral is computed with a Gaussian quadrature rule.
Through analyzing the resulting G-NI method, we derived a new and useful factorization of the matrix in the linear system solved to compute the G-NI coefficients.
In short, the factorization separates the G-NI system into a matrix of orthogonal
rows times a block-diagonal matrix – where each block is equal to the parameterized
matrix A(s) evaluated at the Gaussian quadrature nodes – times the transpose of the
matrix with orthogonal rows. This factorization yielded surprisingly useful insights
into this system. In particular, we showed (i) how an iterative solver applied to the
G-NI system can be implemented within the weakly intrusive paradigm, (ii) eigenvalue bounds on the G-NI system, (iii) a weighted-least squares interpretation of the
G-NI coefficients, and (iv) how preconditioning strategies become apparent from the
Kronecker structure of the factorization.
We closed Chapter 5 with an application to a heat transfer problem in a channel
with cylindrical obstructions. The input flow velocity was perturbed by introducing
three variable parameters, and the diffusion coefficient of the fluid was parameterized
by an additional three parameters to represent uncertainties in the material properties. The G-NI method with the weighted basis was then applied to the solution of the
scalar steady advection-diffusion equation to approximate the solution as a function
of the parameters representing uncertainty. The weights computed by the method
amounted to effective dimension reduction in the approximation, which resulted in
dramatically fewer required basis functions. In this case, the full polynomial basis
would have been infeasible due to the number of terms necessary for six parameters.
We were able to take advantage of existing code for computing a related flow problem
with given (unperturbed) inputs by writing a simple matrix-vector product interface
to the code, thus working within the weakly intrusive paradigm.
In Chapter 6, we extended the spectral methods beyond the parameterized matrix equation to a nonlinear model of conjugate heat transfer based on the Reynolds-Averaged Navier-Stokes equations. We derived a hybrid collocation/Galerkin method
to quantify the variability of the temperature on the boundary of a cylindrical obstruction in a channel flow given uncertainties in (i) the heat flux boundary condition
and (ii) the inflow velocity specification. Essentially, we used a spectral collocation
method for the nonlinear momentum equation and a spectral Galerkin method on
the linear energy equation. The one-way coupling in the model (i.e. temperature
depended on velocity but not vice versa) allowed us to decouple the equations for the
parameterized Galerkin coefficients of the temperature. Thus, through this method
we transformed the original two equation model to a system including the momentum
equation and n scalar transport equations – one for each term in the Galerkin approximation. This yielded dramatic cost savings for the computation of approximate
statistics.
We verified all results by computing the change in computed solution over a series
of approximations with an increasing number of terms. We computed the variance
and conditional variance of the temperature on the cylinder wall and validated these
computations with Monte Carlo estimates. We offered physical interpretations for the
spatially varying variance, and posited that such a hybrid approach may be useful for
more general models.
7.2
Future Work
The work in this dissertation presented a complete and coherent picture of employing
spectral methods to approximate the vector-valued solutions of parameterized matrix
equations. Nevertheless, the possible extensions to this work are both broad and
numerous. We outline some important extensions in what follows. This list is by no
means exhaustive, but it does present directions with a clear initial path.
7.2.1
Improved Heuristics for Choosing a Polynomial Basis
The idea of choosing an appropriate basis for a given function that exploits some prior
known characteristics is common across many engineering heuristics for numerical
approximation. Our ANOVA-based heuristic succeeded in discovering anisotropic
parameter dependence and weak interaction effects, but the resulting approximations
tended to include too many terms for the more important dimensions. Therefore
there is room for improvement in the ANOVA heuristic to create a more balanced
basis. In particular, the drawback of the iterative method with ANOVA is that it
throws away information when collapsing the Galerkin coefficients to the ANOVA
statistics. A strategy could be developed to use all of the Galerkin coefficients
directly without such a collapse.
7.2.2
Locating and Exploiting Singularities
For some parameterized systems of interest, the singularities in the solution are dictated by some condition on the model, e.g. positivity of the coefficients of an elliptic
equation. However, for many systems in practice the location of singularities is not
known a priori, particularly if the parameterization represents variable geometry in
some physical domain. Thus it could be very useful to develop heuristics for locating the singularities of the solution, which would then restrict the ranges of the
parameters. In general, these would be non-convex optimization procedures with no
guarantees of success.
However, if they did succeed, then the location of the singularities could potentially
be used to construct singularity-matching rational basis functions. And this could
reduce the number of terms necessary for an accurate approximation.
7.2.3
Nonlinear Models
In a sense we have restricted the applicability of our analysis to linear differential
equations whose discretizations conveniently beget matrix equations. By examining
general (nonlinear) parameter dependence in the parameterized matrix, we took an
important step toward analyzing nonlinear parameterized models. We expect that
the analysis techniques that use the Jacobi matrices can be extended to nonlinear
models as well.
Additionally, we can analyze a parameterized Newton's iteration for approximating the solution of a nonlinear equation. In this framework, each iteration of
Newton’s method would involve solving a parameterized matrix equation. The difficulty in the analysis lies in the fact that the parameterized matrix at a given iteration
typically depends on the solution computed at the previous iteration. Therefore, a
worst-case error analysis would see errors in the polynomial approximations accumulating at each iteration. Nevertheless, this idea is worth pursuing in future work, even
if it only yields heuristics.
7.2.4
Software and Benchmark Problems
One way to popularize a method is to write an accessible software implementation.
Such software is necessary for the continued development of methods for uncertainty
quantification, and the weakly intrusive paradigm provides an elegant algorithmic
framework for such a coding endeavor. By stipulating the interface for the matrix-vector product for a given parameter value, we can implement a host of algorithms
for solving problems that involve parameterized matrix equations. Ideally, this could
result in a flexible, cohesive, and object-oriented library of implementations that
maximize code reuse.
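A minimal sketch of such an interface, with entirely hypothetical class and method names, might stipulate only the action of A(s) on a vector at a given parameter value (and, here, the right hand side); any solver written against the interface could then be reused across problems.

```python
import numpy as np
from abc import ABC, abstractmethod

# Minimal sketch of a weakly intrusive interface: the algorithm only needs the
# action of A(s) on a vector and the right hand side at a parameter value.
# The names below are illustrative, not an existing library.

class ParameterizedMatrixEquation(ABC):
    @abstractmethod
    def matvec(self, x, s):
        """Return A(s) @ x for parameter value s."""

    @abstractmethod
    def rhs(self, s):
        """Return the right hand side f(s)."""

class ToyProblem(ParameterizedMatrixEquation):
    def matvec(self, x, s):
        A = np.array([[2.0 + s, 1.0], [1.0, 2.0 - s]])
        return A @ x

    def rhs(self, s):
        return np.array([1.0, 1.0])

# Collocation, matrix-free Galerkin, or sampling solvers could all be written
# against the abstract class above and exercised on ToyProblem.
problem = ToyProblem()
print(problem.matvec(np.ones(2), 0.3))
```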
Another need in this vein is for a suite of benchmark problems for testing proposed approximation schemes. As it stands, many methods are supposedly widely
applicable, but there is no standard set of problems on which to compare methods.
Simply proposing the parameterized matrix equation as a model problem encourages
the development and consideration of a set of test problems. Each problem could be
written with a common matrix-vector product interface, which would result in easy
comparison of methods.
7.2.5
Alternative Approximation Methods
As discussed at length, the success of the spectral methods depends on the distance
of the solution singularities in the complex plane to the parameter region of interest.
If this distance is small, then perhaps spectral methods are not ideal, i.e. they may
require too many terms to be efficient. An alternative to the spectral methods –
which can apply to approximating many types of unknown functions – is to develop
a parameterized Lanczos-Stieltjes method, which takes full advantage of the matrix structure of the problem. The Stieltjes procedure constructs the three-term recurrence coefficients for the polynomials orthogonal with respect to a given measure. Lanczos’
procedure is a basic component of many Krylov-based iterative solvers as well as iterative eigenvalue solvers. It can also be used to estimate the condition number of
a matrix. Such varied utility is rare for a single idea. If this idea can be extended
to the parameterized case, then perhaps some of that varied utility can extend, as
well. The combination of Lanczos and Stieltjes may yield a unique interpretation of
the parameterized matrix as some type of weight function, and this interpretation
could have powerful consequences for both rigorous analysis and general intuition.
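For reference, the classical discretized Stieltjes procedure is short; the sketch below computes the first few three-term recurrence coefficients for a measure represented by quadrature nodes and weights. This is the standard scalar procedure, not the speculative parameterized extension, and the Gauss-Legendre rule simply stands in for the uniform measure on [-1, 1].

```python
import numpy as np

# Minimal sketch of the discretized Stieltjes procedure: given nodes and
# weights representing a measure, build the three-term recurrence coefficients
# alpha_k, beta_k of the monic orthogonal polynomials.

def stieltjes(nodes, weights, n):
    alpha = np.zeros(n)
    beta = np.zeros(n)
    p_prev = np.zeros_like(nodes)          # p_{-1}
    p = np.ones_like(nodes)                # p_0
    beta[0] = weights.sum()
    for k in range(n):
        norm = np.sum(weights * p**2)
        alpha[k] = np.sum(weights * nodes * p**2) / norm
        if k + 1 < n:
            p_next = (nodes - alpha[k]) * p - (beta[k] if k > 0 else 0.0) * p_prev
            beta[k + 1] = np.sum(weights * p_next**2) / norm
            p_prev, p = p, p_next
    return alpha, beta

# Gauss-Legendre nodes and weights stand in for the uniform measure on [-1, 1];
# the recovered alpha_k are ~0 and the beta_k match the Legendre recurrence.
nodes, weights = np.polynomial.legendre.leggauss(40)
alpha, beta = stieltjes(nodes, weights, 5)
print(alpha)
print(beta)
```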
However, if we remain in the spectral methods realm and accept that we will sometimes need many coefficients for an accurate approximation, we may find some hope
in the recently studied low-rank approximation methods. If the matrix of coefficients
of the polynomial approximation can be legitimately assumed to have some low-rank
structure – which is likely for many problems of interest – then methods such as alternating least squares can be used to directly approximate the matrix of coefficients
in factored form. These methods currently have an active research community, and
a cross-fertilization of ideas could be very fruitful.
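As a small illustration of the alternating least squares idea, the sketch below fits a low-rank factorization to a synthetic coefficient matrix by alternately solving linear least-squares problems for the two factors; the matrix, rank, and iteration count are arbitrary choices for the example rather than values from any application in this work.

```python
import numpy as np

# Minimal alternating-least-squares sketch for a low-rank factorization
# C ~ U @ V.T. In the setting of the text, the rows of C would index spatial
# degrees of freedom and the columns the polynomial basis terms; here C is a
# synthetic matrix with low-rank-plus-noise structure.

rng = np.random.default_rng(1)
n, m, r = 200, 60, 3
C = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
C += 1e-3 * rng.standard_normal((n, m))        # small perturbation

U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
for _ in range(30):
    # Fix V and solve for U in least squares, then fix U and solve for V.
    U = np.linalg.lstsq(V, C.T, rcond=None)[0].T
    V = np.linalg.lstsq(U, C, rcond=None)[0].T

rel_err = np.linalg.norm(C - U @ V.T) / np.linalg.norm(C)
print("relative error of rank-%d ALS fit: %.2e" % (r, rel_err))
```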
7.3
Concluding Remarks
The impetus for this dissertation came from a desire to test and compare the varieties
of spectral methods on engineering problems with uncertain model inputs, and the
first steps were to understand and implement the methods as derived and analyzed in
the current literature. A series of simple test problems left us with some unanswered
questions: (i) are the intrusive methods more accurate than the nonintrusive methods?
(ii) how do we know if the order of polynomial approximation is sufficiently large?
and (iii) how might we improve these methods given the scale of the computations?
By posing the parameterized matrix equation, we were able to address these questions for a broad range of applications and set an anchor for analysis. In particular,
the comparable convergence rates led us to favor the nonintrusive methods due to ease
of implementation, except in the multivariate case where the basis functions can be
from non-tensor sets. In the latter case, determining the cost is more complicated. To
check the quality of a given finite term approximation, we derived a practical residual
error estimate that behaves like the true error. As an improvement, we proposed
an anisotropic approximation scheme in the Galerkin framework that uses heuristics
to choose an appropriate basis set for a finite term approximation. In the end, we
applied these analyses to engineering problems of interest to compute measures of
confidence for the model outputs.