SPECTRAL METHODS FOR PARAMETERIZED MATRIX EQUATIONS

A DISSERTATION SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL AND MATHEMATICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Paul G. Constantine
August 2009

© Copyright by Paul G. Constantine 2009. All Rights Reserved.

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Gianluca Iaccarino) Principal Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (George Papanicolaou)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Parviz Moin)

Approved for the University Committee on Graduate Studies.

Preface

I have always been fascinated by the process of acquiring and validating knowledge, i.e. how we know what we know. In earlier years, this fascination resided in philosophical and religious inquiries. But somehow the metrics for quantifying the ever-present uncertainties were too subjective to satisfy my cravings. This dissatisfaction ultimately persuaded me toward a course of study in mathematics – a field with more rigorous concepts for objective measurements. In modern computers, we have a powerful technology that aids our search for knowledge and ideally provides the precise path that brought us to each new vista of understanding. However, the epistemic considerations of these paths are still largely unknown. Questions like “How much can we trust the results of a simulation?” often do not have a straightforward answer. These challenging questions are the core of the field of uncertainty quantification, and it is these questions that attracted my attention; it was a natural fit for my fundamental fascinations with knowledge. This dissertation and its mathematical manipulations represent a nudge in the direction of addressing the questions in uncertainty quantification for scientific computing.

Abstract

In this age of parallel, high performance computing, simulation of complex physical systems for engineering computations has become routine. However, the question arises at the end of such a computation: How much can we trust these results? Determining appropriate measures of confidence falls in the realm of uncertainty quantification, where the goal is to quantify the variability in the output of a physical model given uncertainty in the model inputs. The computational procedures for computing these measures (i.e. statistics) often reduce to solving an appropriate matrix equation whose inputs depend on a set of parameters. In this work, we present the problem of solving a system of linear equations where the coefficient matrix and the right hand side are parameterized by a set of independent variables. Each parameterizing variable has its own range, and we assume a separable weight function on the product space induced by the variables. By assuming that the elements of the matrix and right hand side depend analytically on the parameters (i.e. can be represented in a power series expansion) and by requiring that the matrix be non-singular for every parameter value, we ensure the existence of a unique solution.
We present a class of multivariate polynomial approximation methods – known in numerical PDE communities as spectral methods – for approximating the vector-valued function that satisfies the parameterized system of equations at each point in the parameter space. These approximations converge rapidly to the true solution in a mean-squared sense as the degree of polynomial increases, and they provide flexible and robust methods for computation. We derive rigorous asymptotic error estimates for a spectral Galerkin and an interpolating pseudospectral method, as well as a more practical a posteriori residual error estimate. We explore the fascinating connections between these two classes of methods, yielding conditions under which both give the same approximation. Where the methods differ, we provide insightful perspectives into the discrepancies based on the symmetric, tridiagonal Jacobi matrices associated with the weight function. Unfortunately, the standard multivariate spectral methods suffer from the so-called curse of dimensionality, i.e. as the number of parameters increases, the work required to compute the approximation increases exponentially. To combat this curse, we exploit the flexibility of the choice of multivariate basis polynomials in a Galerkin framework to construct an efficient representation that takes advantage of any anisotropic dependence the solution may exhibit with respect to different parameters. Despite the savings from the anisotropic approximation, the size of the systems to be solved for the approximation remains daunting. We therefore offer strategies for large-scale problems based on a unique factorization of the Galerkin system matrix. The factorization allows for straightforward implementation of the method, as well as surprisingly useful tools for further analysis. To complement the analysis, we demonstrate the power and efficiency of the spectral methods with a series of examples including a univariate example from a PageRank model for ranking nodes in a graph and two variants of a conjugate heat transfer problem with uncertain flow conditions.

Acknowledgements

First and foremost, I would like to recognize the support and guidance of my adviser, Professor Gianluca Iaccarino. From the outset of our time working together, he gave me the freedom to explore my own peculiar inklings while judiciously directing me towards the important questions for the broader research community in engineering. His words from class to casual conversation shaped both my research ideas and my professional ambitions. I am exceedingly grateful to call him a mentor, a colleague, and a friend. Secondly, I would like to thank the rest of my reading committee, which included Professor George Papanicolaou and Professor Parviz Moin. Their comments and direction – wrapped in years of experience – influenced not only the words in this dissertation but also my general views and opinions on academic research. I would like to thank the remainder of my oral defense committee, Professor James Lambers and Professor Oliver Fringer, for taking time to participate in my defense, for asking insightful questions, and for offering constructive feedback. I would like to acknowledge the financial support from the following sources: the Department of Energy’s PSAAP and ASC programs, the Franklin P. and Caroline M. Johnson Fellowship, the course assistantships at the Institute for Computational and Mathematical Engineering (ICME), and the federal Stafford loans.
These generous gifts and loans from private, public, and university sources made possible my education and this research. I would like to express my gratitude to my collaborators – specifically to: • Doctor David Gleich, whose brilliant mind, inspiring work ethic, and steadfast friendship have motivated me throughout our time together in graduate school. vii Our countless whiteboard scribble sessions and late night Gchats took many of the ideas in this dissertation from gut feelings to formalisms. • Doctor Qiqi Wang, whose remarkable ability to explain complex concepts through simple examples clarified so many sticky points of my research. • Doctor Alireza Doostan, whose experience and guidance were an invaluable contribution to my education in the UQ research community and its methods. I would also like to thank the uncertainty quantification group at Stanford for listening to my rants and providing feedback at our weekly pizza lunches. Thanks to all the experienced researchers who have contributed to my work via emails, conversations, and feedback at conferences including: Michael Eldred, Dongbin Xiu, Roger Ghanem, Clayton Webster, Raul Tempone, Anthony Nuoy, Olivier LeMaitre, Didier Lucor, Habib Najm, Youssef Marzouk, Alex Loeven, Jeroen Witteveen, and Marcel Bieri. I would like to thank all the associates of ICME including: • Directors Professor Peter Glynn and Professor Walter Murray, for their leadership and guidance, • Student Services Coordinator Indira Choudhury, for making sure I signed all of my required forms on time, • IT Services Seth Tornborg and Brian Tempero, for keeping my computer secure and updated. Thanks to the faculty associated with ICME, including: • Professor Michael Saunders, for his willingness to hear my ideas and offer constructive feedback, • Professor Amin Saberi, for supporting my transition from the M.S. program to the Ph.D. program, • Professor Amir Dembo, for an outstanding course sequence in probability theory, viii • the late Professor Gene Golub, whose contributions to the field of numerical analysis cannot be overstated and whose fingerprints are subtly scattered throughout my work. A heartfelt thanks to my officemates Doctor Michael Atkinson and Doctor Jeremy Kozdon, who – along with Doctor Gleich – created a stimulating environment for all manners of work and play. Finally, I would like to thank my parents and family for their limitless encouragement and outstanding genetic material. ix Contents Preface iv Abstract v Acknowledgements vii 1 Introduction 1 1.1 Verification, Validation, and Uncertainty Quantification . . . . . . . . 2 1.2 Parameterized Matrix Equations . . . . . . . . . . . . . . . . . . . . . 4 1.3 Spectral Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.1 Anisotropic Approximation . . . . . . . . . . . . . . . . . . . 11 1.3.2 Large-scale Computations . . . . . . . . . . . . . . . . . . . . 13 1.4 Summary of Contributions and Future Work . . . . . . . . . . . . . . 15 2 Parameterized Matrix Equations 19 2.1 Problem Definition and Notation . . . . . . . . . . . . . . . . . . . . 19 2.2 Example – An Elliptic PDE with Random Coefficients . . . . . . . . 20 2.3 A Discussion of Singularities . . . . . . . . . . . . . . . . . . . . . . . 22 3 Spectral Methods — Univariate Approximation 25 3.1 Orthogonal Polynomials and Gaussian Quadrature . . . . . . . . . . . 25 3.2 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3 Spectral Collocation . . . . . . . . . . . . . . . . . . . . . . . 
. . . . 28 3.4 Pseudospectral Methods . . . . . . . . . . . . . . . . . . . . . . . . . 29 x 3.5 Spectral Galerkin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.6 A Brief Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.7 Connections Between Pseudospectral and Galerkin . . . . . . . . . . 34 3.8 Error Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.9 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.9.1 A 2 × 2 Parameterized Matrix Equation . . . . . . . . . . . . 44 A Parameterized Second Order ODE . . . . . . . . . . . . . . 45 3.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.11 Application — PageRank . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.9.2 4 Spectral Methods — Multivariate Approximation 51 4.1 Tensor Product Extensions . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2 Non-Tensor Basis Functions . . . . . . . . . . . . . . . . . . . . . . . 56 4.3 A Multivariate Spectral Galerkin Method . . . . . . . . . . . . . . . . 57 2 4.4 L Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.4.1 ANOVA Decomposition . . . . . . . . . . . . . . . . . . . . . 59 4.4.2 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.4.3 Connections Between ANOVA and Fourier Series . . . . . . . 61 4.5 Developing A Heuristic . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.5.1 Incrementing a Basis Set . . . . . . . . . . . . . . . . . . . . . 62 4.5.2 Generating a Score Vector . . . . . . . . . . . . . . . . . . . . 65 4.5.3 Computing The Dimension Weights . . . . . . . . . . . . . . . 67 4.5.4 Estimating the Curvature Parameters . . . . . . . . . . . . . . 69 4.5.5 Stopping Criteria . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.5.6 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.6 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.6.1 Anisotropic Parameter Dependence . . . . . . . . . . . . . . . 72 4.6.2 Small Interaction Effects . . . . . . . . . . . . . . . . . . . . . 72 4.6.3 Large Interaction Effects . . . . . . . . . . . . . . . . . . . . . 73 4.6.4 High Dimensional Problem . . . . . . . . . . . . . . . . . . . . 74 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 xi 5 Strategies for Large-Scale Problems 80 5.1 A Weakly Intrusive Paradigm . . . . . . . . . . . . . . . . . . . . . . 82 5.2 Galerkin with Numerical Integration (G-NI) . . . . . . . . . . . . . . 84 5.2.1 A Useful Decomposition . . . . . . . . . . . . . . . . . . . . . 85 5.2.2 Eigenvalue Bounds . . . . . . . . . . . . . . . . . . . . . . . . 87 5.2.3 A Least-Squares Interpretation . . . . . . . . . . . . . . . . . 89 5.2.4 Iterative Methods . . . . . . . . . . . . . . . . . . . . . . . . . 90 5.2.5 Preconditioning Strategies . . . . . . . . . . . . . . . . . . . . 91 5.3 Parameterized Matrix Package — A MATLAB Suite . . . . . . . . . 92 5.4 Application — Heat Transfer with Uncertain Material Properties . . 93 5.4.1 Problem Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.4.2 Solution Method . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6 An Example from Conjugate Heat Transfer 99 6.1 Problem Description and Motivation . . . . . . . . . . . . . . . . . . 
100 6.1.1 Mathematical Formulation . . . . . . . . . . . . . . . . . . . . 102 6.1.2 Uncertainty sources . . . . . . . . . . . . . . . . . . . . . . . . 103 6.1.3 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.2 A Hybrid Propagation Scheme . . . . . . . . . . . . . . . . . . . . . . 105 6.2.1 Galerkin method for the energy equation . . . . . . . . . . . . 106 6.2.2 Collocation method for the modified system . . . . . . . . . . 107 6.2.3 Analysis of computational cost . . . . . . . . . . . . . . . . . . 108 6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.3.1 Numerical convergence and verification . . . . . . . . . . . . . 109 6.3.2 A physical interpretation . . . . . . . . . . . . . . . . . . . . . 110 7 Summary and Conclusions 114 7.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 7.2.1 Improved Heuristics for Choosing a Polynomial Basis . . . . . 121 xii 7.2.2 Locating and Exploiting Singularities . . . . . . . . . . . . . . 121 7.2.3 Nonlinear Models . . . . . . . . . . . . . . . . . . . . . . . . . 122 7.2.4 Software and Benchmark Problems . . . . . . . . . . . . . . . 122 7.2.5 Alternative Approximation Methods . . . . . . . . . . . . . . 123 7.3 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Bibliography 125 xiii List of Tables 5.1 The weights computed with the ANOVA-based method for choosing an efficient anisotropic basis. . . . . . . . . . . . . . . . . . . . . . . . xiv 95 List of Figures 2.1 Plotting the response surface x0 (s) that solves equation (3.66) for different values of ε. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.1 The convergence of the spectral methods applied to equation (3.66). The figure on the left shows plots the L2 error as the order of approximation increases, and the figure on the right plots the residual error estimate. The stairstep behavior relates to the fact that x0 (s) and x1 (s) are odd functions over [−1, 1]. . . . . . . . . . . . . . . . . . . . 45 3.2 The convergence of the residual error estimate for the Galerkin and pseudospectral approximations applied to the parameterized matrix equation (3.73). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.3 The x-axis counts the number of points in the Gaussian quadrature rule for the expectation and standard deviation of PageRank. This is equivalent to the number of basis functions in the pseudospectral approximation. The y-axis measures the difference between between the approximation and the exact solution. Solid circles correspond to expectation and plus signs correspond to standard deviation. The different colors correspond to the following Beta distributions for α: β(2, 16, 0, 1) – blue, β(0, 0, 0.6, 0.9) – salmon, β(1, 1, 0.1, 0.9) – green, β(−0.5, −0.5, 0.2, 0.7) – red, where β(a, b, l, r) are the signifies distri- bution parameters a and b, and endpoints l and r. . . . . . . . . . . . xv 50 4.1 For d = 2, the shaded squares represent the included multi-indices for the various basis sets with n = 10. The weighted index set with no curvature has weights 1 and 2. The weighted index set with curvature has curvature parameter p = 0.7. . . . . . . . . . . . . . . . . . . . . 
65 4.2 Tensor product pseudospectral coefficients (color) of a solution with anisotropic parameter dependence along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation. . 73 4.3 Tensor product pseudospectral coefficients (color) of a solution with weak interaction effects along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation. . . . . . . 74 4.4 Tensor product pseudospectral coefficients (color) of a solution with strong interaction effects along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation. . . . . . . 75 4.5 Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with a full polynomial basis. . . . . . . . . . . . . . . . . . . . . . . . 76 4.6 Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with an ANOVA-based weighted basis. . . . . . . . . . . . . . . . . . 77 4.7 Convergence of the weights in the ANOVA-based weighted scheme as n increases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.8 Convergence of the residual error estimate for both the ANOVA-based anisotropic basis and the full polynomial basis plotted against the number of basis elements in each approximation. . . . . . . . . . . . . . . 79 5.1 Mesh used to compute temperature distribution. . . . . . . . . . . . . 94 5.2 The number of terms as n increases in the weighted, ANOVA-based anisotropic basis compared to the number of terms in the full polynomial basis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.3 Plotting the residual error estimate of the G-NI approximation with the weighted, ANOVA-based polynomial basis. . . . . . . . . . . . . . xvi 96 5.4 The expecation (above) and variance (below) of the temperature field φ over the domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 A turbine engine cooling system with a pin array cooling system. 97 . . 101 6.2 Computational mesh for two-dimensional cylinder problem. . . . . . . 102 6.3 Schematic of uncertain inflow conditions. The arrows represent the stochastic inflow conditions, and the shading represents the heat flux on the cylinder wall. . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.4 Convergence of the variance of the Galerkin approximation TN as the number of terms in the expansion N increases. Each line represents the convergence for each quadrature rule. The convergence tolerance is 10−5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 6.5 Convergence of the variance of the Galerkin approximation T4 as the number of points in the quadrature rule M increases. The convergence tolerance is 10−4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 6.6 Approximate expectation as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo. . . . . . . . . . 112 6.7 Approximate variance as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo. . . . . . . . . . . . . 113 6.8 Approximate conditional variance at s1 = s2 = 0 as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
113

Chapter 1
Introduction

The advent of parallel, high performance computing has brought with it the potential for high fidelity simulation of physics and engineering systems. The last fifty years have seen an explosion in computing power and a comparable increase in computational methods for approximating the solution of differential equations and other models that describe complex physical phenomena. One example is the Department of Energy’s Advanced Simulation and Computing (ASC) program, whose purpose is to develop simulation capabilities that assess the performance, safety, and reliability of the nation’s nuclear weapons stockpile. Programs such as ASC have driven research in computational engineering subdisciplines from fundamental algorithm development to computer architecture. In general, the goal of such large-scale simulation is to approximate an output quantity of interest given a model, such as a set of coupled partial differential equations, and the problem data, such as forcing terms, model parameters, and initial/boundary conditions. Once the model has been discretized, the code written and compiled, the problem data chosen, and the simulation executed, an intriguing question surrounds the computed output: How much can we trust these results? The trend in computational engineering is to treat numerical simulation like an experiment — one that can explore models and parameter values that are infeasible or prohibitively expensive for physical experiments. But experimental data typically include a measure of uncertainty related to measurement errors or inherent variability of the observed system. This uncertainty measure provides a level of confidence for the results of the experiment. Is there an analog to this measure of confidence for numerical simulation? Can we generate something like a confidence interval for the output of the simulation?

1.1 Verification, Validation, and Uncertainty Quantification

When the United States signed the Comprehensive Nuclear-Test-Ban Treaty in 1996, these sorts of questions gained prominence in the national labs as weapons certification procedures shifted from test-based methodologies to simulation-based methodologies. Since then, the questions have been refined and expanded, a jargon has emerged along with a set of mathematical tools, and research in verification and validation (V&V) and uncertainty quantification (UQ) was birthed [71]. Moreover, other simulation-based research communities — particularly researchers in computational mechanics and computational fluid dynamics — realized the importance of ascribing confidence measures to simulation results, and they have become very active in V&V and UQ studies [70]. The oft-cited AIAA Guide [1] defines the terms verification and validation as follows:

• Verification: The process of determining that a model implementation accurately represents the developer’s conceptual description of the model and the solution to the model.

• Validation: The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model.

A more colloquial definition with an appealing mnemonic reads: Verification tries to answer the question, are we solving the equations correctly? while validation attempts to answer, are we solving the correct equations? In practice, verification involves
ensuring that simulation codes are running properly via software development best practices, debugging, and test cases that compare results to a known analytical or manufactured solution [49]. Validation procedures are not nearly as well defined since there are only guidelines and heuristics for characterizing the comparison of simulation results to experimental data. However, one undeniably critical component in the validation process is quantifying the uncertainties associated with a given mathematical model. Before one can quantitatively compare simulation to experiment, one must understand the effect of input uncertainties on the computed output. An input uncertainty may be a range or probability distribution associated with a model parameter, or it may be a spatially varying random process associated with material properties or boundary conditions. Other prevalent examples of input uncertainties include geometric inconsistencies from manufacturing processes and noise associated with model forcing terms. Exploring the decisions involved in representing these various types of uncertainty falls outside the scope of this work; we mention the following terms merely to provide context. The UQ community has broadly divided uncertainties into two categories: (i) Aleatory uncertainty describes variability that is inherent in the system; it is irreducible in the sense that further measurements will not reduce the variability. A natural choice for modeling this type of uncertainty is with probability models, which assign density functions to input quantities. (ii) Epistemic uncertainty, on the other hand, derives from lack of knowledge; it is reducible in the sense that further measurements add to existing knowledge. This is sometimes called model form uncertainty, since a mathematical model can be tuned or modified to match new observations. A debate currently exists over whether or not probability theory contains the appropriate set of tools for representing epistemic uncertainty. Alternative methods have been proposed including evidence theory and interval arithmetic [42]. We will assume the uncertainties of interest can be adequately represented by a finite — though potentially large — set of parameters. If the uncertainties originate from model parameters, then this assumption is entirely natural. If the model calls for a spatially or temporally varying random process to represent, for instance, boundary conditions or material properties, then this can often be approximated by a truncated parametric series such as the Karhunen-Loeve expansion [58]. (We will not address the error made by truncating the infinite series, but we assume that this modeling choice was justified by other considerations.) We refer to the valid range of the parameters as the parameter space. This space immediately becomes a component of the domain of the model output due to the output’s functional relationship to the parameters. In general, the parameters representing the uncertainty may have some underlying correlation structure depending on how they are derived. Unfortunately, the approximation techniques we consider for exploring the functional relationship require the parameters to be mutually independent. Addressing any correlation amongst the parameters is still an active area of research and falls outside the scope of this work.
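As a rough illustration of this parameterization step, the following Python sketch (not taken from the dissertation) builds a truncated Karhunen-Loeve-type representation of a one-dimensional random field; the exponential covariance kernel, the grid, the correlation length, the truncation level, and the scaling are all assumed, illustrative choices rather than prescriptions from the text.

```python
import numpy as np

# Illustrative setup: exponential covariance kernel on a 1D spatial grid.
n_grid, corr_len, d = 200, 0.5, 4                  # assumed values
y = np.linspace(0.0, 1.0, n_grid)
C = np.exp(-np.abs(y[:, None] - y[None, :]) / corr_len)

# Discrete eigenpairs of the covariance matrix play the role of the KL modes.
eigvals, eigvecs = np.linalg.eigh(C)
idx = np.argsort(eigvals)[::-1][:d]                # keep the d largest modes
lam, phi = eigvals[idx], eigvecs[:, idx]

def a_d(s, a_mean=1.0, scale=0.1):
    """Truncated parameterized field a_d(y, s) for s in [-1, 1]^d."""
    return a_mean + scale * (phi * np.sqrt(lam)) @ np.asarray(s)

# One realization of the field at a sampled parameter value.
realization = a_d(np.random.uniform(-1.0, 1.0, size=d))
```

In this hedged sketch, the d entries of s are the independent parameters discussed above, and every realization of the field corresponds to one point in the parameter space.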
1.2 Parameterized Matrix Equations

In recent literature, research in UQ has been closely linked to work in numerical methods for partial differential equations with stochastic input quantities. One of the first to examine such models was Ghanem [33], who developed his spectral stochastic finite element method in the context of mechanics models with stochastic material properties. Spurred by these developments, many authors began investigating methods for elliptic equations with stochastic coefficients [5, 7, 62, 28], while others extended this work to more general fluid flow models [61, 52, 54, 45, 26]. One finds a common recipe in the elliptic problems and steady flow formulations: First, the problem is formulated on a tensor product of the spatial domain and an abstract probability space, and uncertain problem data are modeled as infinite dimensional stochastic processes. Next, the stochastic processes are approximated by a multivariate function of a finite set of parameters, where the parameters are interpreted as random variables. This yields a perturbed version of the original problem, and questions of well-posedness must be reconsidered. Each of the parameters induces a coordinate direction in the tensor product domain, which implies that the sought-after solution is a multivariate function of both the spatial coordinates and the parameters. Typically the first step in devising a computational method is to perform a standard discretization, such as finite element or finite difference, in the spatial domain. For linear problems, the spatial discretization results in a linear system of equations for the degrees of freedom. If there were no parametric dependence in the problem, then this linear system could be solved with any standard solver. However, in contrast to deterministic models, the coefficient matrix and the right hand side of the linear system of equations derived from the stochastic problem depend on the values of the input parameters. Thus, we have arrived at a parameterized matrix equation. At this point, we have not completed the recipe for the full discretization of the stochastic problem. Nevertheless, we will pause here and use this as a primary motivation for a general study of parameterized matrix equations. The primacy of matrix equations was recognized very early in the development of scientific computing by distinguished researchers such as von Neumann [85] and Golub [37]. They realized that the solution to many science and engineering models could be approximated by solving an appropriate matrix equation or sequence of matrix equations. Prominent examples include linear or linearized differential equation models, general optimization methods, and least-squares model fitting. As the focus now shifts from computing the solution to quantifying the uncertainty in the computed solution, the natural choice for analyzing proposed computational methods is to incorporate the uncertainty directly into the matrix equations. To be sure, this is not a substitute for a priori analysis of the effects of uncertainty on a given model, but it does provide a touchstone for algorithm development. There are many models used for expressing uncertainty in a matrix equation. We are loose with semantics here, since these models were developed to address errors, perturbations, or noise in the problem data; the interpretation may be different, but the mathematics translates remarkably well.
The first such model comes from investigations into round-off error in matrix computations [88]. In this setting, each stored floating point number is assumed to be perturbed by something on the order of machine precision from the intended value. Then one can examine the effects of this perturbation — which is typically assumed to be linear — on the computed solution. Kato [48] tackles a more general perturbation model where the matrix operator depends analytically on a parameter. His focus is primarily on the eigenvalue problem, but some of his results will be quite useful for our analysis. Sun [81] extends this work to the case of analytic dependence on multiple parameters. In the total least squares model [35], the goal is to find the solution that minimizes the effects of the errors assumed to be in both the matrix and right hand side. However, this approach tells nothing about the effects of the errors, only that those effects are minimized in the computed approximation. Another popular model for representing uncertainty is the interval matrix equation [2]. In this model, each element of the matrix and right hand side is prescribed by two endpoints of an interval. The goal is then to construct a vector of intervals that bounds all possible solutions to the linear system of interval equations. To demonstrate their utility, we mention some other examples of parameterized matrix equations not connected to discretizing differential equations with stochastic inputs. The PageRank model [72] is one example that we will examine in detail in Chapter 3. It computes a vector that ranks the nodes in a graph according to the link structure. This model depends on a parameter that represents the behavior of an idealized random surfer. The paths taken by the random surfer constitute a Markov chain on the graph nodes, and the ranking vector can be interpreted as the stationary distribution of this chain. However, the ranking depends on the value of the parameter. Thus we examine a modification where we assume a distribution for the parameter and compute statistics of the ranking over the range of the parameter. Another parameterized matrix equation is found in models for nonlinear image deblurring [17], where the parametric dependence in the matrix represents blurring of the image pixels. Some models for electronic circuit design beget parameterized systems of equations [56], where the parameters represent varying geometric configurations of the chip. Finally, we mention a recent multivariate rational interpolation scheme [86] where each evaluation of the interpolant requires the solution of an optimization problem to minimize the error. This minimization problem can be formulated as a matrix equation such that the elements of the matrix depend on the point where the interpolant is evaluated. While parameterized matrix equations occur in a host of unrelated computational models, we know of no systematic treatment of them as a proper subject. This is likely because many of the analysis results are straightforward to derive, such as the fact that each component of the solution is a rational function of the parameters. Such results, however, are immensely important when attempting to derive computational methods for approximating the vector-valued solution. For example, with this information, we now have a concrete approximation question: what is the best way to approximate multivariate rational functions?
When we turn our attention to spectral methods, we can then ask how well polynomials approximate rational functions. These simple questions have not emerged in the UQ literature. We address them by offering the parameterized matrix equation as a general model problem for both analysis and algorithm development. In Chapter 2, we present an exposition of parameterized matrix equations and make statements characterizing the solution. The key assumption that we will make is that the matrix is nonsingular at all points in the parameter space; we will also discuss what happens when this assumption is not satisfied. We will also assume that the elements of the matrix and right hand side depend analytically on the parameters, i.e. each element has a convergent power series in some region that contains the parameter space. These two assumptions will imply that the solution is also an analytic function of the parameters. One question that immediately arises is what specifically do we want to compute? Some applications ask only for bounds on the solution, while others need an approximate functional relationship — or response surface — with respect to the parameters for surrogate modeling. In the probabilistic context, we would like to estimate the expectation and variance, or the probability that the solution exceeds some critical threshold. These statistics can be formulated as high dimensional integrals over the parameter space, which are intimately tied to the approximation methods. When we refer to “solving” the parameterized system, we mean computing an explicit approximation of the vector-valued function that satisfies the parameterized linear system of equations for all values of the parameters. This is the most comprehensive of the possible computations, since many statistics can be estimated from this approximation.

1.3 Spectral Methods

We examine a class of polynomial approximation methods known in the context of numerical PDEs as spectral methods [16, 43, 12, 39]. In their most basic form, these methods are characterized by a finite degree global polynomial approximation to the function of interest, whether it is the solution to a partial differential equation or the solution of a parameterized matrix equation. We will not delve into the long, rich history of polynomial approximation, but we will present the necessary background theory — now considered classical — in Chapter 3, including relevant facts about orthogonal polynomials, Fourier series, Gaussian quadrature, and Lagrange interpolation. Spectral methods rose to prominence as numerical methods for PDEs in the 70s and 80s, and their use is now widespread. Different varieties are characterized by the choice of basis functions — typically trigonometric or algebraic orthogonal polynomials — and the method for treating boundary conditions. Their popularity results from the so-called exponential (i.e. geometric) asymptotic convergence rate in the mean-squared norm as the order of polynomial approximation increases for infinitely smooth solutions. Even solutions with a finite number of continuous derivatives typically enjoy a high algebraic rate of convergence, which is dictated by how many derivatives are continuous. This rapid convergence was the primary attraction of spectral methods for researchers in UQ who were working with differential equations with random inputs.
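As a toy illustration of this convergence behavior (an assumption-laden sketch, not an example from the text), the following Python snippet expands the rational function 1/(s + 2), whose nearest singularity lies outside [-1, 1], in Legendre polynomials and prints the coefficient magnitudes; their roughly geometric decay is the univariate analogue of the behavior described above.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

f = lambda s: 1.0 / (s + 2.0)          # rational function, singular at s = -2

pts, wts = leggauss(64)                # quadrature rule for the projection integrals
deg = 20
coeffs = np.zeros(deg + 1)
for k in range(deg + 1):
    e_k = np.zeros(k + 1); e_k[k] = 1.0
    P_k = legval(pts, e_k)             # Legendre polynomial P_k at the nodes
    coeffs[k] = (wts * P_k) @ f(pts) * (2.0 * k + 1.0) / 2.0   # <f, P_k> / <P_k, P_k>

print(np.abs(coeffs))                  # roughly geometric decay in k
```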
Before their introduction, the standard was to employ Monte Carlo (MC) methods to sample the inputs, compute the output for each sample, and aggregate statistics of the output [41]. In fact, this is still the most widely used method in practice due to its robustness and ease of implementation. However, the MC methods suffer from a dreadfully slow convergence rate proportional to the inverse of the square root of the number of samples. And if each sample evaluation is expensive — such as the solution of a PDE — then obtaining hundreds of thousands of samples may be entirely infeasible. Thus, the initial applications of spectral methods showed orders-of-magnitude reduction in the work needed to estimate statistics with comparable accuracy [90]. Such results spurred interest in applying spectral methods to differential equations with stochastic (i.e. parameterized) inputs. Ghanem was one of the first on this path [33]. He introduced spectral methods within his spectral stochastic finite element method under the name polynomial chaos expansion, which uses a basis of multivariate Hermite polynomials to span the set of square-integrable functions on the parameter space. He justified this terminology by referring back to the work of Wiener [87], whose chaos expansions were an extension of Hilbert theory to infinite dimensional stochastic processes. The polynomial chaos methods come in intrusive and non-intrusive flavors: The intrusive variety is a Galerkin projection method that requires the solution of a large, coupled system of equations to compute the coefficients of the expansion, whereas the non-intrusive variety uses existing deterministic codes — in the way MC methods do — to compute the coefficients of a pseudo-projection. Xiu and Karniadakis [90] extended Ghanem’s method to basis functions from the Askey family of orthogonal polynomials and dubbed it the generalized polynomial chaos. Around the same time, Deb et al. [20] introduced a method for elliptic problems with stochastic coefficients and labeled it the stochastic Galerkin method; the p-version of this has strong connections to spectral methods. This work was done in a Galerkin framework, where one seeks a finite dimensional approximation such that the residual of the equation is orthogonal to the approximation space. Paralleling the development of spectral methods for PDEs, the next advance was the introduction of spectral collocation methods. Xiu and Hesthaven [89] were the first to introduce the collocation idea in this context, and they were quickly followed by Babuska et al., who popularized the phrase stochastic collocation [6]. Since the problems lack a differential operator in the parameter space, the collocation methods reduce to Lagrange polynomial interpolation on a set of quadrature points, which can be implemented non-intrusively while retaining an asymptotic convergence rate similar to the Galerkin methods. Instead of following the nomenclature from this body of recent literature, we prefer the terminology from the spectral methods communities, which is well-established amongst the larger numerical analysis and engineering communities. This amounts to avoiding the term “stochastic” when describing the approximation methods. In this way we hope to connect the probabilistic interpretations to existing analyses. Contrary to what the flurry of research activity may suggest, spectral methods have significant drawbacks.
Since these methods produce global approximations, they have trouble resolving solutions with local behavior, such as rapid oscillations or discontinuities. Also, they are not well-suited for the complex geometries that arise in most engineering applications. But these particular drawbacks are alleviated for the parameterized matrix equation, since (i) the assumptions of non-singularity and analytic parameter dependence yield infinitely smooth solutions, and (ii) the tensor product parameter space implies that the domains are typically hyperrectangles. From this perspective, the spectral methods are ideal. Moreover, the solution of the parameterized matrix equation has no boundary conditions to satisfy, which dramatically simplifies implementation. In Chapter 3, we derive a spectral Galerkin method and an interpolatory pseudospectral method for systems that depend on a single parameter. Using classical theory, we derive error estimates showing that both methods have a similar asymptotic rate of geometric convergence, and this rate is related to the size of the region of analyticity. For many problems in practice, the region of analyticity is determined by the nearest point outside the domain where the parameterized matrix is singular. Often this point is related to some existence or stability criteria for the underlying model, e.g. the point where the parameterized coefficients of an elliptic PDE reach zero. If it is close to the boundary of the domain, then the convergence rate can degrade considerably. We will see in the derivations that the Galerkin and pseudospectral varieties differ dramatically in implementation. Computing the Galerkin coefficients requires the solution of a constant linear system of equations that is n times larger than the original parameterized system, where n is the order of the approximation. In contrast, the nth order pseudospectral approximation can be computed by solving the parameterized system at the points in the parameter space given by an n-point Gaussian quadrature rule. In UQ parlance, this is the difference between an intrusive method and a CHAPTER 1. INTRODUCTION 11 non-intrusive method. As the labeling suggests, a non-intrusive method takes full advantage of a solver written for the constant matrix equation resulting from evaluating the parameterized matrix equation at a point in the parameter space. In contrast, an intrusive method requires one to write a new solver for the larger linear system of equations derived for the Galerkin coefficients. Clearly, non-intrusive methods are embarrassingly parallelizable and take advantage of performance enhancements built for the related constant matrix equation. Immediately, one encounters the question: does the increased accuracy of the intrusive Galerkin method justify the extra effort in implementation? This question continues to puzzle researchers in UQ. To address this question, we present a rigorous comparison of Galerkin and pseudospectral methods in Chapter 3 using a novel approach based on the symmetric, tridiagonal Jacobi matrices associated with the orthogonal polynomial basis. This analysis reveals the conditions under which the two approximations are equivalent, and it uncovers a wealth of fascinating relationships when the methods produce differing results. 
In particular, we show that each approximation method can be viewed as solving a truncated infinite system of equations; the difference between the two methods lies in when the truncation is performed — before or after applying the parameterized operator. This points to numerous strategies for efficient implementation. All of these results for the single parameter case extend to the case of multiple parameters when the multivariate basis functions are constructed using tensor products of univariate basis functions; we make this explicit at the beginning of Chapter 4. However, this construction is highly impractical — if not infeasible — due to the dramatic cost increase associated with tensor approximations. 1.3.1 Anisotropic Approximation The worst disadvantage for spectral methods occurs when the matrix and/or right hand side depend on multiple parameters. For this case, we have entered the realm of multivariate approximation, and we quickly encounter the so-called curse of dimensionality [40]. Loosely speaking, this curse reflects the harsh reality that the cost of CHAPTER 1. INTRODUCTION 12 constructing an accurate approximation increases exponentially as the number of parameters increases. In many practical cases, we expect the dimension of the parameter space to be at least moderately high, particularly when the parameters represent an approximation of an infinite dimensional stochastic process. Thus, heuristics for combating the curse are essential. This is a very active research area within UQ, with new strategies proposed regularly at conferences and in journals. Some prominent techniques include sparse-grid interpolation schemes [68], hierarchical basis functions [53], low-rank approximation methods [23], and reduced order modeling [78] — all of which share the goal of reducing the computational work necessary to compute a sufficiently accurate multivariate approximation. In some special cases, these methods may have error estimates that are formally independent of the number of parameters. But it is unreasonable to expect this type of result in general. In Chapter 4, we present a heuristic for efficient multivariate approximation using the main effects and interaction effects from a functional ANOVA decomposition [57] as a guide to uncover the most important coefficients of the Galerkin approximation. The ANOVA decomposition has been useful in machine learning and data analysis communities for determining the most important independent variables in a model. It has also been used to develop quasi Monte Carlo methods for high dimensional integration. Its use in multivariate approximation is not new, but it continues to gain status as demand for high dimensional approximation methods increases. It is also closely connected to the Fourier series which underlies the spectral methods; we reveal this connection in greater detail in Chapter 4. The central idea of our heuristic is as follows. If we knew the true values of the coefficients of the multivariate Fourier expansion, then we could include only the most important basis functions for an accurate, finite order approximation. The error estimates for the Galerkin approximation depend on the fact that the Fourier coefficients decay asymptotically as the order of the associated basis function increases. Therefore, by finding an approximate measure of the decay along the coefficients associated with each parameter, we can take advantage of any anisotropic dependence (i.e. 
parameters whose variation affects the solution more than others) and drastically reduce the number of necessary basis functions in the approximation. This measure of anisotropy can be computed by estimating the variance contribution from each of the main effects functions in a functional ANOVA decomposition and comparing them with the total variance of the solution. (These are sometimes called the Sobol indices [3].) Additionally, we can use the variance contribution from each interaction effect to adjust the number of cross-term basis functions used to capture the functional dependence on subsets of the parameters. The algorithm repeats the following steps until a termination criterion is satisfied: (i) For a given set of basis functions, compute a Galerkin approximation to the solution of the parameterized matrix equation, (ii) use the computed coefficients to estimate the variance contribution for each subset of parameters, and (iii) use the estimated variance contributions to select an efficient basis set for the next Galerkin approximation. We demonstrate the power of this heuristic with a set of representative numerical examples. The upside of the curse of dimensionality is that simple reduction ideas can dramatically reduce the work required to compute an efficient approximation. One important consequence of this technique is that it reveals when the full problem decouples into smaller sub-problems via the variance contributions of the interaction effects. For example, if the variance contribution is zero from the interaction effect between two parameters, then the problem will split into two smaller, uncoupled problems — one for each parameter. This dependence-revealing property can greatly reduce the necessary work for problems with unknown, complex parameter dependence, such as multi-physics models with multiple, independent sources of uncertainty. We demonstrate this for a conjugate heat transfer model in Chapter 6.

1.3.2 Large-scale Computations

Arguably, the most pressing challenge for computational methods in UQ is the magnitude of the computations. Non-intrusive techniques like pseudospectral methods may require hundreds of thousands of evaluations to compute accurate approximations for multiple parameter systems. If each of those evaluations corresponds to solving a large linear system of equations (e.g. a properly refined three-dimensional PDE model), then these methods quickly exceed the limits of the most powerful existing computing resources. A Galerkin method can reduce the cost by selectively choosing the basis functions as suggested in our heuristic, but any solution that significantly depends on many parameters will always require a tremendous number of basis functions for an accurate representation. Even forming and storing the linear system for these coefficients may be infeasible for many platforms. To help cope with the scale of these computations, we propose an algorithmic paradigm that falls between the all-or-nothing distinctions of intrusive and non-intrusive. For parameterized matrix equations, non-intrusive methods use only the solution of the parameterized system evaluated at a set of parameter values. In contrast, the intrusive method must solve a related system of equations many times larger than the original parameterized system. Between these extremes lies an unexplored middle ground.
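Before turning to that middle ground, a minimal Python sketch of the non-intrusive extreme may help fix ideas. It solves an arbitrary 2-by-2 parameterized system (assumed nonsingular on [-1, 1]) at the points of a Gaussian quadrature rule and recovers pseudospectral coefficients by discrete projection; it is an illustration only, not the dissertation's Parameterized Matrix Package.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Hypothetical parameterized matrix equation A(s) x(s) = b(s), s in [-1, 1].
A = lambda s: np.array([[2.0 + s, 1.0], [1.0, 3.0 - s]])
b = lambda s: np.array([1.0, 2.0 * s])

n = 8                                  # order of the pseudospectral approximation
pts, wts = leggauss(n + 1)             # (n+1)-point Gaussian quadrature rule

# Non-intrusive step: one constant-matrix solve per quadrature point.
solves = np.array([np.linalg.solve(A(s), b(s)) for s in pts])      # shape (n+1, 2)

# Discrete projection onto Legendre polynomials P_0, ..., P_n.
coeffs = np.zeros((n + 1, 2))
for k in range(n + 1):
    e_k = np.zeros(k + 1); e_k[k] = 1.0
    P_k = legval(pts, e_k)
    coeffs[k] = (wts * P_k) @ solves * (2.0 * k + 1.0) / 2.0

# Evaluate the polynomial approximation of x(s) anywhere in the parameter space.
x_approx = lambda s: legval(s, coeffs)
print(x_approx(0.3), np.linalg.solve(A(0.3), b(0.3)))
```

Each quadrature-point solve reuses whatever solver exists for the constant matrix equation, which is precisely the sense in which this approach is non-intrusive.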
In the weakly intrusive paradigm, we allow multiplication of an arbitrary vector against the parameterized matrix evaluated at a point in the parameter space, as well as evaluations of the parameterized right hand side. This is comparable to matrix-free solvers such as Krylov-based iterative methods that use less storage and require only matrix-vector multiplies. Under the weakly intrusive paradigm, we take full advantage of any sparsity in the parameterized matrix, and the only code required is to form the matrix and right hand side at a given parameter value. The following mnemonic may be helpful for distinguishing the weakly intrusive method: Non-intrusive methods evaluate the solution, whereas weakly intrusive methods evaluate the operators. One strong advantage of the weakly intrusive paradigm is that it permits the computation of a residual error estimate for a given approximation. This is comparable to residual error estimates for the solution of constant matrix equations; the residual is bounded by a constant times the true error. It can also be thought of as an a posteriori error estimate used in adaptive finite element methods. This type of error estimate is not available in the strictly non-intrusive paradigm. We derive the residual error estimate in Chapter 3 and discuss its computation in Chapter 5. From the rigorous comparison of Galerkin and pseudospectral methods in Chapter 3, we unwittingly recover a cousin of the Galerkin method called Galerkin with numerical integration (G-NI) [16]. The basic idea of G-NI is to use Gaussian quadrature to approximate the integrals in the Galerkin formulation. By the polynomial exactness of Gaussian quadrature, if the parameterized matrix depends at most polynomially on the parameters, then there is a quadrature rule that exactly recovers the Galerkin system. The advantage is that the matrix used to compute the G-NI coefficients can be factored into a product of a matrix with orthogonal rows, its transpose, and a block-diagonal matrix sandwiched between. The block-diagonal matrix has blocks equal to the parameterized matrix evaluated at each quadrature point. This factorization allows us to solve the system of equations for the G-NI coefficients within the weakly intrusive paradigm. We derive and discuss this novel factorization, which has not yet appeared in the literature, in Chapter 5. We accompany this discussion with a downloadable suite of MATLAB tools called the Parameterized Matrix Package that solves parameterized matrix equations within the weakly intrusive paradigm. Beyond implementation strategies, the factorization has many other useful consequences: (i) It allows one to interpret the G-NI approximation as the solution of a weighted least-squares problem, which yields both theoretical insight and additional computational strategies. (ii) It shows that increasing the order of the G-NI approximation can be viewed as adding a set of constraints in an active set optimization method. (iii) It reveals sharp bounds on the eigenvalues of the linear system used to compute the G-NI coefficients in the symmetric case. (iv) It points to preconditioning strategies when using iterative methods to solve for the G-NI coefficients. We explore these facts in detail in Chapter 5.

1.4 Summary of Contributions and Future Work

At the risk of being overly explicit, we summarize the contributions of this thesis mentioned in the preceding discussion to the larger body of research.
• We propose and analyze the parameterized matrix equation as the model problem for research in UQ methods and algorithm development, effectively reframing the discussion from a stochastic context to an approximation theory context (Chapter 2).

• We rigorously compare the intrusive spectral Galerkin method and the non-intrusive pseudospectral method and provide a novel interpretation of each method as solving a truncated infinite system of linear equations (Chapter 3).

• We develop a heuristic for multivariate anisotropic approximation in the Galerkin framework aimed at battling the curse of dimensionality by finding the most important basis functions for accurate approximation (Chapter 4).

• We propose a weakly intrusive paradigm for algorithm development that takes advantage of sparsity for large-scale problems and permits a residual error estimate (Chapter 5).

• We derive a factorization of the linear system used to compute the G-NI coefficients that yields theoretical insight, eigenvalue bounds, and preconditioning strategies (Chapter 5).

• We apply the spectral methods to a conjugate heat transfer problem of channel flow around a cylinder with uncertain (i.e. parameterized) boundary conditions to demonstrate the types of statistics computed to quantify the uncertainties in model output (Chapter 6).

By the end, we will have developed a set of efficient methods for approximating the variability in a given model output and its functional dependence on a set of parameters representing uncertainties in the model inputs. Extensions to this work are plentiful. Regarding the parameterized matrix equations, the fundamental assumptions that allow one to apply spectral methods ought to be relaxed for a more general problem setting; the independence assumption on the parameters should be removed, and methods should be sought that handle the case of singularities within the parameter space. In fact, many systems in practice may have singularities inside the parameter space, but spectral methods will not necessarily detect them (e.g. by failing to converge). A class of methods that detects singularities or a set of heuristics for finding singularities in the parameter space would be very useful. Even for problems without singularities, if there were a method developed for finding singularities outside the parameter space, then there is hope of estimating the convergence rate of a spectral method before ever applying it. There is potential for such heuristics within the optimal control community, which has remained disjoint from UQ even though the two often ask similar questions and seek similar answers. There may be other ways to approximate the solution of a parameterized matrix equation using other linear algebra techniques. Some matrix factorizations, including the singular value decomposition [15] and QR [22], come in parameterized flavors, which hold promise for alternatives to spectral methods. There has been some work on a parameterized Lanczos method [66], but it is underdeveloped. However, if a Lanczos-Stieltjes procedure for parameterized systems had as many nice practical and theoretical properties as the Lanczos method for constant linear systems, then such a method could have great impact on the field. Of course, there are many parameterized models that do not easily reduce to matrix equations, such as nonlinear PDEs. These models are arguably more important to predictive simulation than the linear models.
The goal of this thesis was to understand nonlinear parameter dependence within the parameterized matrix equation as the first step to addressing fully nonlinear models. Thus the next step is to apply this understanding to nonlinear models. One possible strategy is to explore a Newton iteration for nonlinear equations where each step is a parameterized matrix equation.

Within the context of multivariate spectral methods, the scale of the computations will always challenge researchers and practitioners. One promising method for handling large-scale problems is to seek a low-rank approximation to the true Galerkin coefficients. This could potentially reduce storage requirements and computational cost. Some initial work has been done along these lines [23, 69], and this type of approximation is actively pursued in the data analysis and machine learning communities, but it is still immature and untested.

One practical step that must be taken is the development of easy-to-use software libraries for solving parameterized matrix equations on high performance, massively parallel platforms. As UQ continues to gain prominence and its questions continue to resonate within computational science communities, there will be great demand for software libraries that encode and execute the state of the art in solvers, just as we have seen with linear system solvers such as LAPACK. Our vision is for a Parameterized LAPACK that will approximate the solution to parameterized matrix equations.

These projects merely scrape the surface of possible research directions emanating from this thesis. Our experience to date is that there is much to be learned from disparate research communities in engineering, math, computer science, and physics. Once a jargon barrier has been breached, the cross-fertilization of ideas flows freely and both parties find insight into their own struggles. We hope this work will be uniting instead of dividing and offer clarity for some complex concepts.

Chapter 2

Parameterized Matrix Equations

2.1 Problem Definition and Notation

We consider problems that involve a set of d parameters $s = (s_1, \dots, s_d)$ that take values in the hypercube $[-1,1]^d$. Assume that the hypercube is equipped with a normalized, separable scalar weight function $w(s) = \prod_k w_k(s_k)$. For functions $f : [-1,1]^d \to \mathbb{R}$, we use bracket notation to denote the integral against the weight function over the hypercube, i.e.

$$\langle f \rangle \equiv \int_{[-1,1]^d} f(s)\, w(s)\, ds. \qquad (2.1)$$

In a stochastic context, $w(s)$ is the probability density function for the independent random variables $s$, and $\langle f \rangle$ is the expectation of $f$.

Since we are working with multivariate functions, we employ the standard multi-index notation. Let a multi-index $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}^d$ be a d-tuple of non-negative integers. A subscript $\alpha$ denotes association with the particular multi-index. A superscript $\alpha$ denotes the following product:

$$s^\alpha \equiv \prod_{k=1}^{d} s_k^{\alpha_k}. \qquad (2.2)$$

This notation makes manipulations in multiple variables much clearer.

Let the $\mathbb{R}^N$-valued function $x(s)$ satisfy the linear system of equations

$$A(s)\, x(s) = b(s), \qquad s \in [-1,1]^d, \qquad (2.3)$$

for a given $\mathbb{R}^{N \times N}$-valued function $A(s)$ and $\mathbb{R}^N$-valued function $b(s)$. We assume that both $A(s)$ and $b(s)$ are analytic in a region containing $[-1,1]^d$, which implies that they have a convergent power series

$$A(s) = \sum_{\alpha \in \mathbb{N}^d} A_\alpha s^\alpha, \qquad b(s) = \sum_{\alpha \in \mathbb{N}^d} b_\alpha s^\alpha \qquad (2.4)$$

for some constant matrices $A_\alpha$ and constant vectors $b_\alpha$.
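To make the setup concrete, the following minimal MATLAB sketch builds a small, hypothetical parameterized matrix equation with a single parameter and a truncated power series of the form (2.4), and evaluates the solution at one point in the parameter space. The matrices A0, A1 and vectors b0, b1 are illustrative only and are not taken from any example in this work.

A0 = [4 1; 1 3];  A1 = [0 1; 1 0];      % hypothetical constant matrices A_alpha
b0 = [1; 2];      b1 = [0; 1];          % hypothetical constant vectors b_alpha
A  = @(s) A0 + s*A1;                    % truncated power series for A(s)
b  = @(s) b0 + s*b1;                    % truncated power series for b(s)
s0 = 0.3;                               % one point in the parameter space [-1,1]
x0 = A(s0) \ b(s0);                     % the solution x(s) evaluated at s = s0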
We assume that $A(s)$ is bounded away from singularity for all $s \in [-1,1]^d$. This implies that we can write $x(s) = A^{-1}(s)\, b(s)$. The elements of the solution $x(s)$ can also be written using Cramer's rule [64, Chapter 6] as a ratio of determinants,

$$x_i(s) = \frac{\det(A_i(s))}{\det(A(s))}, \qquad i = 0, \dots, N-1, \qquad (2.5)$$

where $A_i(s)$ is the parameterized matrix formed by replacing the $i$th column of $A(s)$ by $b(s)$. From equation (2.5) and the invertibility of $A(s)$, we can conclude that each component of $x(s)$ is analytic in a region containing $[-1,1]^d$.

Equation (2.5) reveals the underlying structure of the solution as a function of $s$. If $A(s)$ and $b(s)$ depend polynomially on $s$, then (2.5) tells us that $x(s)$ is a multivariate rational function. Note also that this structure is independent of the particular weight function $w(s)$.

2.2 Example – An Elliptic PDE with Random Coefficients

One of the motivating examples for this work comes from the field of partial differential equations with stochastic inputs. A model problem from this field – studied by various authors [7, 28, 33] – is an elliptic equation with stochastic coefficients. More precisely, let $a(y, \omega)$ be a positive, bounded random field with bounds

$$0 < a_l \le a(y, \omega) \le a_u, \qquad y \in D, \ \omega \in \Omega, \qquad (2.6)$$

where $D$ is a given spatial domain and $\Omega$ is a given sample space. Then we seek a function $u = u(y, \omega)$ that satisfies

$$\nabla \cdot (a(y, \omega) \nabla u) = f(y, \omega), \qquad y \in D, \ \omega \in \Omega, \qquad (2.7)$$
$$u(y, \omega) = 0, \qquad y \in \partial D, \ \omega \in \Omega, \qquad (2.8)$$

where $f(y, \omega)$ is a given forcing function. The first step is to approximate the coefficients by some finite dimensional parameterized function

$$a(y, \omega) \approx a_d(y, s), \qquad (2.9)$$

where the d independent parameters $s_1, \dots, s_d$ take values in some parameter space $S$ such that the approximation $a_d(y, s)$ is also bounded and positive. This can be accomplished with the well-known Karhunen-Loeve expansion [58] or some other modelling techniques. We can assume without loss of generality that $f(y, \omega) = f(y)$ depends only on the spatial variable $y$. Then equations (2.7)-(2.8) are replaced by

$$\nabla \cdot (a_d(y, s) \nabla u_d) = f(y), \qquad y \in D, \ s \in S, \qquad (2.10)$$
$$u_d(y, s) = 0, \qquad y \in \partial D, \ s \in S. \qquad (2.11)$$

At this point, the spatial domain is discretized using a standard discretization such as a finite element or finite difference scheme, which results in a linear system of equations for the degrees of freedom $u = u(s)$,

$$K(s)\, u(s) = f, \qquad (2.12)$$

where the elements of the coefficient matrix $K(s)$ depend on the parameters $s$. Thus we are left with a parameterized matrix equation we must solve to obtain the degrees of freedom at each parameter value. There are more technical details associated with this particular problem that we have conveniently sidestepped. We refer to the above references for a thorough treatment of the existence and well-posedness of this problem, as well as numerical methods for its approximation. For our purposes, it serves as an outstanding example of where one finds parameterized matrix equations.

2.3 A Discussion of Singularities

Returning to the general problem in equation (2.3), we may ask: what is the goal of the computation? We may not need the function pointwise over the whole parameter space. Instead, we may seek a constant vector of averages $\langle x \rangle$ over the parameter space. In a stochastic context, this can be thought of as computing the mean. Similarly, we may seek a measure of variability such as the variance $\langle (x - \langle x \rangle)^2 \rangle$ or other higher moments.
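Every such statistic requires the solution at many points in the parameter space. As a point of reference before the spectral methods of the following chapters, here is a minimal MATLAB sketch that estimates $\langle x \rangle$ and the componentwise variance by plain Monte Carlo sampling for a hypothetical two-parameter system with a uniform weight function; the matrix A(s) below is illustrative only, chosen so that it is nonsingular on $[-1,1]^2$.

A = @(s) [3 + s(1), s(2); s(2), 2 - s(1)];   % hypothetical parameterized matrix, nonsingular on [-1,1]^2
b = [1; 1];
M = 10000;  X = zeros(2, M);
for j = 1:M
    s = 2*rand(2,1) - 1;                     % sample s uniformly from [-1,1]^2
    X(:,j) = A(s) \ b;                       % solve the parameterized system at the sample
end
xmean = mean(X, 2);                          % sample estimate of <x>
xvar  = var(X, 0, 2);                        % sample estimate of <(x - <x>)^2>

The spectral methods developed in Chapters 3 and 4 replace this brute-force sampling with far fewer, carefully chosen solves.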
Similar probabilistic measures include density or distribution functions, or the probability that some component of $x(s)$ exceeds a given threshold. In a reliability or control context, we may be interested in bounds on the components of $x(s)$ over the parameter space. If the components of $x(s)$ contain singularities in the parameter space, then some of these measures – particularly the integral measures – do not exist, i.e. are infinite.

Singularities may arise in $x(s)$ if $A(s)$ is singular at some point in the parameter space. This is apparent from equation (2.5). Notice that since the elements of $A(s)$ vary continuously with $s$ by the assumption of analyticity, $\det(A(s))$ also varies continuously with $s$. Thus, if $A(s^*)$ is singular for some point $s^*$ in the domain, then $\det(A(s^*)) = 0$ and the limit of $1/\det(A(s))$ as $s$ approaches $s^*$ is infinity. If each function $\det(A_i(s))$ does not also go to zero as $s$ goes to $s^*$, then some components of $x(s)$ will contain a pole. In fact, if the components of $b(s)$ contain singularities, then a non-singular $A(s)$ implies that $x(s)$ will inherit those singularities.

As a brief aside, some applications in differential equations with random inputs try to model the randomness using Gaussian random fields. The problem with using such models is that the resulting parameter space is unbounded. An unbounded parameter space means that $A(s)$ must be non-singular for every possible parameter value. If $A(s)$ depends polynomially on $s$ with some odd degree polynomial (e.g. linearly), then the matrix $A(s)$ will become singular at some point in an unbounded parameter space. The Gaussian measure imposed on the parameter space will not – despite its exponential decay about the mean – remove the singularities, and integral measures such as expectation and variance will not exist. In terms of function spaces, the components of $x(s)$ will not belong to $L^1$ (or $L^2$). To avoid these situations altogether, we use a bounded parameter space and require $A(s)$ to be non-singular at each point in the parameter space.

Despite the protection of these assumptions, however, a singularity that occurs outside the parameter space but close to the boundary can induce sharp gradients and large variability in the function within the parameter space. These two characteristics of a solution typically cause difficulties for numerical approximation methods. In fact, we will see in Chapter 3 that the convergence of the spectral methods is directly related to the nearness of the closest singularity to the parameter space in the complex plane.

We provide the following simple example to demonstrate the behavior of a solution with a nearby singularity. Let $\varepsilon > 0$, and consider the following parameterized matrix equation

$$\begin{bmatrix} 1+\varepsilon & s \\ s & 1 \end{bmatrix} \begin{bmatrix} x_0(s) \\ x_1(s) \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \qquad (2.13)$$

For this case, we can compute the exact solution,

$$x_0(s) = \frac{2-s}{1+\varepsilon-s^2}, \qquad x_1(s) = \frac{1+\varepsilon-2s}{1+\varepsilon-s^2}. \qquad (2.14)$$

Both of these functions have poles at $s = \pm\sqrt{1+\varepsilon}$. In Figure 2.1 we plot $x_0(s)$ for various values of $\varepsilon$. Notice how the gradients become steeper and the variability increases as $\varepsilon$ goes to zero. We will return to this example in Chapter 3 to assess the performance of the spectral methods.

[Figure 2.1: Plotting the response surface $x_0(s) = (2-s)/(1+\varepsilon-s^2)$ that solves equation (2.13) for different values of $\varepsilon$ ($\varepsilon = 0.2, 0.4, 0.6, 0.8$).]
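The behavior shown in Figure 2.1 can be reproduced directly from the exact solution (2.14); the following short MATLAB sketch evaluates and plots $x_0(s)$ on a grid for the same values of $\varepsilon$.

s = linspace(-1, 1, 401)';
for epsilon = [0.2 0.4 0.6 0.8]
    plot(s, (2 - s) ./ (1 + epsilon - s.^2)); hold on   % x_0(s) from (2.14)
end
xlabel('s'); ylabel('x_0(s)'); hold off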
The point of the preceding discussion is to emphasize that singularities create difficulties when solving parameterized matrix equations. Fortunately, many parameterized systems in practice have a priori bounds on valid parameters that yield non-singular systems. For example, the matrix from the discretized elliptic equation (2.12) is non-singular as long as the elliptic coefficients $a_d(y, s)$ remain strictly positive. If such a priori knowledge is not available, then heuristics may be pursued that search for the nearest singularity, and such heuristics will be highly problem dependent.

Chapter 3

Spectral Methods — Univariate Approximation

In this chapter, we derive the spectral methods we use to approximate the solution $x(s)$ for the case when $d = 1$. We begin with a brief review of the relevant theory of orthogonal polynomials, Gaussian quadrature, and Fourier series. We include this material primarily for the sake of notation and refer the reader to a standard text on orthogonal polynomials [82] for further theoretical details and [30] for a modern perspective on computation.

3.1 Orthogonal Polynomials and Gaussian Quadrature

Let $\mathcal{P}$ be the space of real polynomials defined on $[-1,1]$, and let $\mathcal{P}_n \subset \mathcal{P}$ be the space of polynomials of degree at most $n$. For any $p, q$ in $\mathcal{P}$, we define the inner product as

$$\langle pq \rangle \equiv \int_{-1}^{1} p(s)\, q(s)\, w(s)\, ds. \qquad (3.1)$$

We define a norm on $\mathcal{P}$ as $\|p\|_{L^2} = \sqrt{\langle p^2 \rangle}$, which is the standard $L^2$ norm for the given weight $w(s)$. Let $\{\pi_k(s)\}$ be the set of polynomials that are orthonormal with respect to $w(s)$, i.e. $\langle \pi_i \pi_j \rangle = \delta_{ij}$. It is known that $\{\pi_k(s)\}$ satisfy the three-term recurrence relation

$$\beta_{k+1}\, \pi_{k+1}(s) = (s - \alpha_k)\, \pi_k(s) - \beta_k\, \pi_{k-1}(s), \qquad k = 0, 1, 2, \dots, \qquad (3.2)$$

with $\pi_{-1}(s) = 0$ and $\pi_0(s) = 1$. If we consider only the first $n$ equations, then we can rewrite (3.2) as

$$s\, \pi_k(s) = \beta_k\, \pi_{k-1}(s) + \alpha_k\, \pi_k(s) + \beta_{k+1}\, \pi_{k+1}(s), \qquad k = 0, 1, \dots, n-1. \qquad (3.3)$$

Setting $\boldsymbol{\pi}_n(s) = [\pi_0(s), \pi_1(s), \dots, \pi_{n-1}(s)]^T$, we can write this conveniently in matrix form as

$$s\, \boldsymbol{\pi}_n(s) = J_n \boldsymbol{\pi}_n(s) + \beta_n\, \pi_n(s)\, e_n, \qquad (3.4)$$

where $e_n$ is a vector of zeros with a one in the last entry, and $J_n$ (known as the Jacobi matrix) is a symmetric, tridiagonal matrix defined as

$$J_n = \begin{bmatrix} \alpha_0 & \beta_1 & & & \\ \beta_1 & \alpha_1 & \beta_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \beta_{n-2} & \alpha_{n-2} & \beta_{n-1} \\ & & & \beta_{n-1} & \alpha_{n-1} \end{bmatrix}. \qquad (3.5)$$

The zeros $\{\lambda_i\}$ of $\pi_n(s)$ are the eigenvalues of $J_n$, and $\boldsymbol{\pi}_n(\lambda_i)$ are the corresponding eigenvectors; this follows directly from (3.4). Let $Q_n$ be the orthogonal matrix of eigenvectors of $J_n$. By equation (3.4), the elements of $Q_n$ are given by

$$Q_n(i,j) = \frac{\pi_i(\lambda_j)}{\|\boldsymbol{\pi}_n(\lambda_j)\|_2}, \qquad i, j = 0, \dots, n-1. \qquad (3.6)$$

Then we write the eigenvalue decomposition of $J_n$ as

$$J_n = Q_n \Lambda_n Q_n^T. \qquad (3.7)$$

It is known (c.f. [30]) that the eigenvalues $\{\lambda_i\}$ are the familiar Gaussian quadrature points associated with the weight function $w(s)$. The quadrature weight $\nu_i$ corresponding to $\lambda_i$ is equal to the square of the first component of the eigenvector associated with $\lambda_i$, i.e.

$$\nu_i = Q_n(0,i)^2 = \frac{1}{\|\boldsymbol{\pi}_n(\lambda_i)\|_2^2}. \qquad (3.8)$$

The weights $\{\nu_i\}$ are known to be strictly positive. We will use these facts repeatedly in the sequel. For an integrable scalar function $f(s)$, we can approximate its integral by an $n$-point Gaussian quadrature rule, which is a weighted sum of function evaluations,

$$\int_{-1}^{1} f(s)\, w(s)\, ds = \sum_{i=0}^{n-1} f(\lambda_i)\, \nu_i + R_n(f). \qquad (3.9)$$

If $f \in \mathcal{P}_{2n-1}$, then $R_n(f) = 0$; that is to say, the degree of exactness of the Gaussian quadrature rule is $2n-1$.
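The nodes and weights in (3.9) can be computed directly from the Jacobi matrix (3.5), in the spirit of the Golub-Welsch procedure. The following MATLAB sketch does this for the normalized (constant) Legendre weight $w(s) = 1/2$ on $[-1,1]$, for which $\alpha_k = 0$ and $\beta_k = k/\sqrt{4k^2-1}$; for other weight functions the recurrence coefficients would simply be replaced.

n = 8;                                      % number of quadrature points
k = (1:n-1)';
beta = k ./ sqrt(4*k.^2 - 1);               % recurrence coefficients for orthonormal Legendre
J = diag(beta, 1) + diag(beta, -1);         % Jacobi matrix J_n (alpha_k = 0 by symmetry)
[Q, Lambda] = eig(J);
lambda = diag(Lambda);                      % Gaussian quadrature points (eigenvalues of J_n)
nu = Q(1,:)'.^2;                            % weights: squared first eigenvector components, as in (3.8)
% sanity check: the rule integrates s^2 against w(s) = 1/2 exactly, giving 1/3
err = abs(nu' * lambda.^2 - 1/3);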
We use the notation

$$\langle f \rangle_n \equiv \sum_{i=0}^{n-1} f(\lambda_i)\, \nu_i \qquad (3.10)$$

to denote the Gaussian quadrature rule. This is a discrete approximation to the true integral.

3.2 Fourier Series

The polynomials $\{\pi_k(s)\}$ form an orthonormal basis for the Hilbert space

$$L^2 \equiv L^2_w([-1,1]) = \{ f : [-1,1] \to \mathbb{R} \;|\; \|f\|_{L^2} < \infty \}. \qquad (3.11)$$

Therefore, any $f \in L^2$ admits a convergent Fourier series

$$f(s) = \sum_{k=0}^{\infty} \langle f \pi_k \rangle\, \pi_k(s). \qquad (3.12)$$

The coefficients $\langle f \pi_k \rangle$ are called the Fourier coefficients. If we truncate the series (3.12) after $n$ terms, we are left with a polynomial of degree $n-1$ that is the best approximation polynomial in the $L^2$ norm. In other words, if we denote

$$P_n f(s) = \sum_{k=0}^{n-1} \langle f \pi_k \rangle\, \pi_k(s), \qquad (3.13)$$

then

$$\|f - P_n f\|_{L^2} = \inf_{p \in \mathcal{P}_{n-1}} \|f - p\|_{L^2}. \qquad (3.14)$$

In fact, the error made by truncating the series is equal to the sum of squares of the neglected coefficients,

$$\|f - P_n f\|_{L^2}^2 = \sum_{k=n}^{\infty} \langle f \pi_k \rangle^2. \qquad (3.15)$$

These properties of the Fourier series motivate the theory and practice of spectral methods.

We have shown that each element of the solution $x(s)$ of the parameterized matrix equation is analytic in a region containing the closed interval $[-1,1]$. Therefore it is continuous and bounded on $[-1,1]$, which implies that $x_i(s) \in L^2$ for $i = 0, \dots, N-1$. We can thus write the convergent Fourier expansion for each element using vector notation as

$$x(s) = \sum_{k=0}^{\infty} \langle x \pi_k \rangle\, \pi_k(s). \qquad (3.16)$$

Note that we are abusing the bracket notation here, but this will make further manipulations very convenient. The computational strategy is to choose a truncation level $n-1$ and estimate the coefficients of the truncated expansion.

3.3 Spectral Collocation

The term spectral collocation typically refers to the technique of constructing a Lagrange interpolating polynomial through the exact solution evaluated at the Gaussian quadrature points. Suppose that $\lambda_i$, $i = 0, \dots, n-1$, are the Gaussian quadrature points for the weight function $w(s)$. We can construct an $(n-1)$-degree polynomial interpolant of the solution through these points as

$$x_{c,n}(s) = \sum_{i=0}^{n-1} x(\lambda_i)\, \ell_i(s) \equiv X_c\, l_n(s). \qquad (3.17)$$

The vector $x(\lambda_i)$ is the solution to the equation $A(\lambda_i)\, x(\lambda_i) = b(\lambda_i)$. The $(n-1)$-degree polynomial $\ell_i(s)$ is the standard Lagrange basis polynomial defined as

$$\ell_i(s) = \prod_{j=0,\ j \ne i}^{n-1} \frac{s - \lambda_j}{\lambda_i - \lambda_j}. \qquad (3.18)$$

The $N \times n$ constant matrix $X_c$ (the subscript c is for collocation) has one column for each $x(\lambda_i)$, and $l_n(s)$ is a vector of the Lagrange basis polynomials. By construction, the collocation polynomial $x_{c,n}$ interpolates the true solution $x(s)$ at the Gaussian quadrature points. We will use this construction to show the connection between the pseudospectral method and the Galerkin method.

3.4 Pseudospectral Methods

Notice that computing the true coefficients of the Fourier expansion of $x(s)$ requires the exact solution. The essential idea of the pseudospectral method is to approximate the Fourier coefficients of $x(s)$ by a Gaussian quadrature rule. In other words,

$$x_{p,n}(s) = \sum_{k=0}^{n-1} \langle x \pi_k \rangle_n\, \pi_k(s) \equiv X_p\, \boldsymbol{\pi}_n(s), \qquad (3.19)$$

where $X_p$ is an $N \times n$ constant matrix of the approximated Fourier coefficients; the subscript p is for pseudospectral. For clarity, we recall

$$\langle x \pi_k \rangle_n = \sum_{i=0}^{n-1} x(\lambda_i)\, \pi_k(\lambda_i)\, \nu_i, \qquad (3.20)$$

where $x(\lambda_i)$ solves $A(\lambda_i)\, x(\lambda_i) = b(\lambda_i)$. In general, the number of points in the quadrature rule need not have any relationship to the order of truncation.
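As an illustration, the following MATLAB sketch computes the pseudospectral coefficients $X_p$ in (3.19)-(3.20) for the 2 × 2 example (2.13) with the normalized Legendre weight; the solves at the quadrature points are the only interaction with the parameterized system.

epsilon = 0.4;  n = 10;
A = @(s) [1 + epsilon, s; s, 1];  b = [2; 1];          % the 2x2 example (2.13)
% n-point Gauss-Legendre rule (w(s) = 1/2) from the Jacobi matrix
k = (1:n-1)';  beta = k ./ sqrt(4*k.^2 - 1);
[Q, L] = eig(diag(beta,1) + diag(beta,-1));
lambda = diag(L);  nu = Q(1,:)'.^2;
% orthonormal Legendre polynomials pi_0,...,pi_{n-1} evaluated at the nodes
P = ones(n, n);  P(:,2) = lambda;
for j = 1:n-2
    P(:,j+2) = ((2*j+1)*lambda.*P(:,j+1) - j*P(:,j)) / (j+1);   % standard Legendre recurrence
end
P = P * diag(sqrt(2*(0:n-1) + 1));                     % normalize so <pi_k^2> = 1
% solve at the quadrature points and form the approximate Fourier coefficients
X = zeros(2, n);
for i = 1:n
    X(:,i) = A(lambda(i)) \ b;
end
Xp = X * diag(nu) * P;                                 % Xp(:,k+1) = sum_i x(lambda_i) pi_k(lambda_i) nu_i

Evaluating the approximation at any $s$ then amounts to multiplying $X_p$ by the vector of orthonormal polynomials evaluated at $s$.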
However, when the number of terms in the truncated series is equal to the number of points in the quadrature rule, the pseudospectral approximation is equivalent to the collocation approximation. This relationship is well-known, but we include the following lemma and theorem for use in later proofs. Lemma 1 Let q0 be the first row of Qn , and define Dq0 = diag(q0 ). The matrices Xp and Xc are related by the equation Xp = Xc Dq0 QTn . Proof. Write Xp (:, k) = hxπk in = n−1 X x(λj )πk (λj )νj j=0 = n−1 X Xc (:, j) j=0 1 πk (λj ) kπ n (λj )k2 kπ n (λj )k2 = Xc Dq0 QTn (:, k) which implies Xp = Xc Dq0 QTn as required. Theorem 2 The n − 1 degree collocation approximation is equal to the n − 1 degree pseudospectral approximation using an n-point Gaussian quadrature rule, i.e. xc,n (s) = xp,n (s). (3.21) for all s. Proof. Note that the elements of q0 are all non-zero, so D−1 q0 exists. Then Lemma 1 implies Xc = Xp Qn D−1 q0 . Using this change of variables, we can write xc,n (s) = Xc ln (s) = Xp Qn D−1 q0 ln (s). (3.22) CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 31 Thus it is sufficient to show that π n (s) = Qn D−1 q0 ln (s). Since this is just a vector of polynomials with degree at most n − 1, we can do this by multiplying each element by each orthonormal basis polynomial up to order n − 1 and integrating. Towards this end we define Θ ≡ ln π Tn . Using the polynomial exactness of the Gaussian quadrature rule, we compute the i, j element of Θ. Θ(i, j) = hli πj i = n−1 X ℓi (λk )πj (λk )νk k=0 = 1 πj (λi ) kπ n (λi )k2 kπ n (λi )k2 = Qn (0, i)Qn (j, i), which implies that Θ = Dq0 QTn . Therefore T −1 T Qn D−1 q0 ln π n = Qn Dq0 ln π n = Qn D−1 q0 Θ T = Qn D−1 q0 Dq0 Qn = I, where I is the appropriately sized identity matrix. This completes the proof. Some refer to the pseudospectral method explicitly as an interpolation method [12]. See [43] for an insightful interpretation in terms of a discrete projection. Because of this property, we will freely interchange the collocation and pseudospectral approximations when convenient in the ensuing analysis. The work required to compute the pseudospectral approximation is highly dependent on the parameterized system. In general, we assume that the computation of x(λi ) dominates the work; in other words, the cost of computing Gaussian quadrature formulas is negligible compared to computing the solution to each linear system. Then if each x(λi ) costs O(N 3 ), the pseudospectral approximation with n terms costs CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 32 O(nN 3 ). 3.5 Spectral Galerkin The spectral Galerkin method computes a finite dimensional approximation to x(s) such that each element of the equation residual is orthogonal to the approximation space. Define r(y, s) = A(s)y(s) − b(s). (3.23) The finite dimensional approximation space for each component xi (s) will be the space of polynomials of degree at most n − 1. This space is spanned by the first n orthonormal polynomials, i.e. span(π0 (s), . . . , πn−1 (s)) = Pn−1 . We seek an RN valued polynomial xg,n (s) of maximum degree n − 1 such that hri (xg,n )πk i = 0, i = 0, . . . , N − 1, k = 0, . . . , n − 1, (3.24) where ri (xg,n ) is the ith component of the residual. We can write equations (3.24) in matrix notation as r(xg,n )π Tn = 0 (3.25) Axg,n π Tn = bπ Tn . 
(3.26) or equivalently Since each component of xg,n (s) is a polynomial of degree at most n − 1, we can write its expansion in {πk (s)} as xg,n (s) = n−1 X k=0 xg,k πk (s) ≡ Xg π n (s), (3.27) where Xg is a constant matrix of size N × n; the subscript g is for Galerkin. Then equation (3.26) becomes AXg π n π Tn = bπ Tn . (3.28) CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 33 Using the vec notation [37, Section 4.5], we can rewrite (3.28) as π n π Tn ⊗ A vec(Xg ) = hπ n ⊗ bi . (3.29) where vec(Xg ) is an Nn × 1 constant vector equal to the columns of Xg stacked on top of each other. The constant matrix π n π Tn ⊗ A has size Nn × Nn and a distinct block structure; the i, j block of size N × N is equal to hπi πj Ai. More explicitly, hπ0 π0 Ai · · · hπ0 πn−1 Ai .. .. .. . π n π Tn ⊗ A = . . . hπn−1 π0 Ai · · · hπn−1 πn−1 Ai (3.30) Similarly, the ith block of the Nn×1 vector hπ n ⊗ bi is equal to hbπi i, which is exactly the ith Fourier coefficient of b(s). Since A(s) is bounded and nonsingular for all s ∈ [−1, 1], it is straightforward to show that xg,n (s) exists and is unique using the classical Galerkin theorems presented and summarized in Brenner and Scott [14, Chapter 2]. This implies that Xg is unique, and since b(s) is arbitrary, we conclude that the matrix π n π Tn ⊗ A is nonsingular for all finite truncations n. The work required to compute the Galerkin approximation depends on how one computes the integrals in equation (3.29). If we assume that the cost of forming the system is negligible, then the costly part of the computation is solving the system (3.29). The size of the matrix π n π Tn ⊗ A is Nn × Nn, so we expect an operation count of O(N 3 n3 ), in general. However, many applications beget systems with sparsity or exploitable structure that can considerably reduce the required work. 3.6 A Brief Review We have discussed two classes of spectral methods: (i) the interpolatory pseudospectral method which approximates the truncated Fourier series of x(s) by using a Gaussian quadrature rule to approximate each Fourier coefficient, and (ii) the Galerkin projection method which finds an approximation in a finite dimensional subspace CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 34 such that the residual A(s)xg,n (s) − b(s) is orthogonal to the approximation space. In general, the n-term pseudospectral approximation requires n solutions of the original parameterized matrix equation (2.3) evaluated at the Gaussian quadrature points, while the Galerkin method requires the solution of the coupled linear system of equations (3.29) that is n times as large as the original parameterized matrix equation. A rough operation count for the pseudospectral and Galerkin approximations is O(nN 3 ) and O(n3 N 3 ), respectively. Before discussing asymptotic error estimates, we first derive some interesting and useful connections between these two classes of methods. In particular, we can interpret each method as a set of functions acting on the infinite Jacobi matrix for the weight function w(s); the difference between the methods lies in where each truncates the infinite system of equations. 3.7 Connections Between Pseudospectral and Galerkin We begin with a useful lemma for representing a matrix of Gaussian quadrature integrals in terms of functions of the Jacobi matrix. Lemma 3 Let f (s) be a scalar function analytic in a region containing [−1, 1]. Then f π n π Tn n = f (Jn ). Proof. We examine the i, j element of the n × n matrix f (Jn ). 
eTi f (Jn )ej = eTi Qn f (Λn )QTn ej = qTi f (Λn )qj = = n−1 X k=0 n−1 X f (λk ) f (λk )πi (λk )πj (λk )νk k=0 = hf πi πj in , which completes the proof. πi (λk ) πj (λk ) kπ(λk )k2 kπ(λk )k2 CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 35 Note that Lemma 3 generalizes Theorem 3.4 in [36]. With this in the arsenal, we can prove the following theorem relating pseudospectral to Galerkin. Theorem 4 The pseudospectral solution is equal to an approximation of the Galerkin solution where each integral in equation (3.29) is approximated by an n-point Gaussian quadrature formula. In other words, Xp solves π n π Tn ⊗ A n vec(Xp ) = hπ n ⊗ bin . (3.31) Proof. Define the N × n matrix Bc = [b(λ0 ) · · · b(λn−1 )]. Using the power se- ries expansion of A(s) (equation (2.4)), we can write the matrix of each collocation solution as A(λk ) = ∞ X Aiλik (3.32) i=0 for k = 0, . . . , n − 1. We collect these into one large block-diagonal system by writing ∞ X i=0 Λin ⊗ Ai ! vec(Xc ) = vec(Bc ). (3.33) Let I be the N × N identity matrix. Premultiply (3.33) by (Dq0 ⊗ I), and by com- mutativity of diagonal matrices and the mixed product property, it becomes ∞ X i=0 Λin ⊗ Ai ! (Dq0 ⊗ I)vec(Xc ) = (Dq0 ⊗ I)vec(Bc ). (3.34) Premultiplying (3.34) by (Qn ⊗ I), properly inserting (QTn ⊗ I)(Qn ⊗ I) on the left hand side, and using the eigenvalue decomposition (3.7), this becomes ∞ X i=0 ! Jin ⊗ Ai (Qn ⊗ I)(Dq0 ⊗ I)vec(Xc ) = (Qn ⊗ I)(Dq0 ⊗ I)vec(Bc ). (3.35) But note that Lemma 1 implies (Qn ⊗ I)(Dq0 ⊗ I)vec(Xc ) = vec(Xp ). (3.36) CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 36 Using an argument identical to the proof of Lemma 1, we can write (Qn ⊗ I)(Dq0 ⊗ I)vec(Bc ) = hπ n ⊗ bin . (3.37) Finally, using Lemma 3, equation (3.35) becomes as required. π n π Tn ⊗ A n vec(Xp ) = hπ n ⊗ bin (3.38) Theorem 4 begets a corollary giving conditions for equivalence between Galerkin and pseudospectral approximations. Corollary 5 If b(s) contains only polynomials of maximum degree mb and A(s) contains only polynomials of maximum degree 1 (i.e. linear functions of s), then xg,n (s) = xp,n (s) for n ≥ mb for all s ∈ [−1, 1]. Proof. The parameterized matrix π n (s)π n (s)T ⊗ A(s) has polynomials of degree at most 2n − 1. Thus, by the polynomial exactness of the Gaussian quadrature formulas, π n π Tn ⊗ A n = π n π Tn ⊗ A , hπ n ⊗ bin = hπ n ⊗ bi . (3.39) Therefore Xg = Xp , and consequently xg,n (s) = Xg π n (s) = Xp π n (s) = xp,n (s). (3.40) as required. By taking the transpose of equation (3.28) and following the steps of the proof of Theorem 4, we get another interesting corollary. Corollary 6 First define A(Jn ) to be the Nn × Nn constant matrix with the i, j block of size n × n equal to A(i, j)(Jn ) for i, j = 0, . . . , N − 1. Next define b(Jn ) to be the Nn × n constant matrix with the ith n × n block equal to bi (Jn ). Then the CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 37 pseudospectral coefficients Xp satisfy A(Jn )vec(XTp ) = b(Jn )e0 , (3.41) where e0 = [1, 0, . . . , 0]T is an n-vector. Theorem 4 leads to a fascinating connection between the matrix operators in the Galerkin and pseudospectral methods, namely that the matrix in the Galerkin system is equal to a submatrix of the matrix from a sufficiently larger pseudospectral computation. This is the key to understanding the relationship between the Galerkin and pseudospectral approximations. In the following lemma, we denote the first r × r principal minor of a matrix M by [M]r×r . 
Lemma 7 Let A(s) contain only polynomials of degree at most ma , and let b(s) contain only polynomials of degree at most mb . Define m ≡ m(n) ≥ max ma + 2n − 1 2 , mb + n 2 (3.42) Then π n π Tn ⊗ A = π m π Tm ⊗ A m N n×N n hπ n ⊗ bi = [hπ m ⊗ bim ]N n×1 . Proof. The integrands of the matrix π n π Tn ⊗ A are polynomials of degree at most 2n + ma − 2. Therefore they can be integrated exactly with a Gaussian quadrature rule of order m. A similar argument holds for hπ n ⊗ bi. Combining Lemma 7 with Corollary 6, we get the following proposition relat- ing the Galerkin coefficients to the Jacobi matrices for A(s) and b(s) that depend polynomially on s. Proposition 8 Let m, ma , and mb be defined as in Lemma 7. Define [A]n (Jm ) to be the Nn × Nn constant matrix with the i, j block of size n × n equal to [A(i, j)(Jm )]n×n for i, j = 0, . . . , N − 1. Define [b]n (Jm ) to be the Nn × n constant matrix with the ith CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 38 n × n block equal to [bi (Jm )]n×n for i = 0, . . . , N − 1. Then the Galerkin coefficients Xg satisfy [A]n (Jm )vec(XTg ) = [b]n (Jm )e0 , (3.43) where e0 = [1, 0, . . . , 0]T is an n-vector. Notice that Proposition 8 provides a way to compute the exact matrix for the Galerkin computation without any symbolic manipulation, but beware that m depends on both n and the largest degree of polynomial in A(s). Written in this form, we have no trouble taking m to infinity, and we arrive at the main theorem of this section. Theorem 9 Using the notation of Proposition 8 and Corollary 6, the coefficients Xg of the n-term Galerkin approximation of the solution x(s) to equation (2.3) satisfy the linear system of equations [A]n (J∞ )vec(XTg ) = [b]n (J∞ )e0 , (3.44) where e0 = [1, 0, . . . , 0]T is an n-vector. Proof. Let A(ma ) (s) be the truncated power series of A(s) up to order ma , and let b(mb ) (s) be the truncated power series of b(s) up to order mb . Since A(s) is analytic and bounded away from singularity for all s ∈ [−1, 1], there exists an integer M such that A(ma ) (s) is also bounded away from singularity for all s ∈ [−1, 1] and all ma > M (although the bound may be depend on ma ). Assume that ma > M. (ma ,mb ) Define m as in equation (3.42). Then by Proposition 8, the coefficients Xg of the n-term Galerkin approximation to the solution of the truncated system satisfy a ,mb ) T [A(ma ) ]n (Jm )vec((X(m ) ) = [b(mb ) ]n (Jm )e0 . g (3.45) By the definition of m (equation (3.42)), equation (3.45) holds for all integers greater than some minimum value. Therefore, we can take m → ∞ without changing the solution at all, i.e. a ,mb ) T [A(ma ) ]n (J∞ )vec((X(m ) ) = [b(mb ) ]n (J∞ )e0 . g (3.46) CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 39 Next we take ma , mb → ∞ to get [A(ma ) ]n (J∞ ) → [A]n (J∞ ) [b(mb ) ]n (J∞ ) → [b]n (J∞ ) which implies a ,mb ) X(m → Xg g (3.47) as required. Theorem 9 and Corollary 6 reveal the fundamental difference between the Galerkin and pseudospectral approximations. We put them side-by-side for comparison. [A]n (J∞ )vec(XTg ) = [b]n (J∞ )e0 , A(Jn )vec(XTp ) = b(Jn )e0 . (3.48) The difference lies in where the truncation occurs. For pseudospectral, the infinite Jacobi matrix is first truncated, and then the operator is applied. For Galerkin, the operator is applied to the infinite Jacobi matrix, and the resulting system is truncated. The question that remains is whether it matters. 
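A numerical check of Theorem 4 and Corollary 5 is straightforward: assemble the Galerkin system (3.29) with the n-point quadrature rule, as in (3.31), and compare the resulting coefficients with the pseudospectral coefficients. The MATLAB sketch below does this for the 2 × 2 example (2.13), where A(s) is linear in s and b is constant, so the quadrature is exact and the two coefficient matrices agree to machine precision; it is a self-contained illustration, not part of the Parameterized Matrix Package.

epsilon = 0.4;  N = 2;  n = 10;
A = @(s) [1 + epsilon, s; s, 1];  b = @(s) [2; 1];     % the 2x2 example (2.13)
k = (1:n-1)';  beta = k ./ sqrt(4*k.^2 - 1);           % Gauss-Legendre rule, w(s) = 1/2
[Q, L] = eig(diag(beta,1) + diag(beta,-1));
lambda = diag(L);  nu = Q(1,:)'.^2;
P = ones(n, n);  P(:,2) = lambda;                      % orthonormal Legendre at the nodes
for j = 1:n-2
    P(:,j+2) = ((2*j+1)*lambda.*P(:,j+1) - j*P(:,j)) / (j+1);
end
P = P * diag(sqrt(2*(0:n-1) + 1));
% quadrature approximation of < pi pi^T kron A > and < pi kron b >, as in (3.31)
K = zeros(N*n);  f = zeros(N*n, 1);
for i = 1:n
    p = P(i,:)';                                       % pi_n(lambda_i)
    K = K + nu(i) * kron(p*p', A(lambda(i)));
    f = f + nu(i) * kron(p, b(lambda(i)));
end
Xg = reshape(K \ f, N, n);                             % Galerkin coefficients from (3.29)/(3.31)
% pseudospectral coefficients for comparison
X = zeros(N, n);
for i = 1:n, X(:,i) = A(lambda(i)) \ b(lambda(i)); end
Xp = X * diag(nu) * P;
discrepancy = norm(Xg - Xp);                           % near machine precision here (Corollary 5)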
As we will see in the error estimates in the next section, the interpolating pseudospectral approximation converges at a rate comparable to the Galerkin approximation, yet requires considerably less computational effort. 3.8 Error Estimates Asymptotic error estimates for polynomial approximation are well-established in many contexts, and the theory is now considered classical. Our goal is to apply the classical theory to relate the rate of geometric convergence to some measure of singularity for the solution. We do not seek the tightest bounds in the most appropriate norm as in [16], but instead we offer intuition for understanding the asymptotic rate of convergence. We also present a residual error estimate that may be more useful in practice. We complement the analysis with two representative numerical examples. CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 40 To discuss convergence, we need to choose a norm. In the statements and proofs, we will use the standard L2 and L∞ norms generalized to RN -valued functions. Definition 10 For a function f : R → RN , define the L2 and L∞ norms as kf kL2 v uN −1 Z uX := t 1 i=0 −1 kf kL∞ := max 0≤i≤N −1 fi2 (s)w(s) ds sup |fi (s)| −1≤s≤1 (3.49) (3.50) With these norms, we can state error estimates for both Galerkin and pseudospectral methods. Theorem 11 (Galerkin Asymptotic Error Estimate) Let ρ∗ be the sum of the semi-axes of the greatest ellipse with foci at ±1 in which xi (s) is analytic for i = 0, . . . , N −1. Then for 1 < ρ < ρ∗ , the asymptotic error in the Galerkin approximation is kx − xg,n kL2 ≤ Cρ−n , (3.51) where C is a constant independent of n. Proof. We begin with the standard error estimate for the Galerkin method [16, Section 6.4] in the L2 norm, kx − xg,n kL2 ≤ C kx − Rn xkL2 . (3.52) The constant C is independent of n but depends on the extremes of the bounded eigenvalues of A(s). Under the consistency hypothesis, the operator Rn is a projection operator such that kxi − Rn xi kL2 → 0, n→∞ (3.53) for i = 0, . . . , N − 1. For our purpose, we let Rn x be the expansion of x(s) in terms of the Chebyshev polynomials, Rn x(s) = n−1 X k=0 ak Tk (s), (3.54) CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 41 where Tk (s) is the kth Chebyshev polynomial, and ak,i 2 = πck Z 1 −1 xi (s)Tk (s)(1 − s2 )−1/2 ds, ck = ( 2 if k = 0 1 otherwise (3.55) for i = 0, . . . , N −1. Since x(s) is continuous for all s ∈ [−1, 1] and w(s) is normalized, we can bound kx − Rn xkL2 ≤ √ N kx − Rn xkL∞ (3.56) The Chebyshev series converges uniformly for functions that are continuous on [−1, 1], so we can bound ∞ X = ak Tk (s) ∞ L k=n ∞ X |ak | ≤ kx − Rn xkL∞ k=n (3.57) (3.58) ∞ since −1 ≤ Tk (s) ≤ 1 for all k. To be sure, the quantity |ak | is the component-wise absolute value of the constant vector ak , and the norm k · k∞ is the standard infinity norm on RN . Using the result stated in [39, Section 3], we have lim sup |ak,i|1/k = k→∞ 1 , ρ∗i i = 0, . . . , N − 1 (3.59) where ρ∗i is the sum of the semi-axes of the greatest ellipse with foci at ±1 in which xi (s) is analytic. This implies that asymptotically |ak,i | = O ρ i k , i = 0, . . . , N − 1. (3.60) for ρi < ρ∗i . We take ρ = mini ρi , which suffices to prove the estimate (3.51). Theorem 11 recalls the well-known fact that the convergence of many polynomial approximations (e.g. power series, Fourier series) depend on the size of the region in the complex plane in which the function is analytic. Thus, the location of the CHAPTER 3. 
SPECTRAL METHODS — UNIVARIATE APPROXIMATION 42 singularity nearest the interval [−1, 1] determines the rate at which the approximation converges as one includes higher powers in the polynomial approximation. Next we derive a similar result for the pseudospectral approximation using the fact that it interpolates x(s) at the Gaussian points of the weight function w(s). Theorem 12 (Pseudospectral Asymptotic Error Estimate) Let ρ∗ be the sum of the semi-axes of the greatest ellipse with foci at ±1 in which xi (s) is analytic for i = 0, . . . , N − 1. Then for 1 < ρ < ρ∗ , the asymptotic error in the pseudospectral approximation is kx − xp,n kL2 ≤ Cρ−n , (3.61) where C is a constant independent of n. Proof. Recall that xc,n (s) is the Lagrange interpolant of x(s) at the Gaussian points of w(s), and let xc,n,i(s) be the ith component of xc,n (s). We will use the result from [79, Theorem 4.8] that Z 1 −1 (xi (s) − xc,n,i(s))2 w(s) ds ≤ 4En2 (xi ), (3.62) where En (xi ) is the error of the best approximation polynomial in the uniform norm. We can, again, bound En (xi ) by the error of the Chebyshev expansion (3.54). Using Theorem 2 with equation (3.62), kx − xp,n kL2 = kx − xc,n kL2 √ ≤ 2 N kx − Rn xkL∞ . The remainder of the proof proceeds exactly as the proof of Theorem 11. We have shown, using classical approximation theory, that the interpolating pseudospectral method and the Galerkin method have the same asymptotic rate of geometric convergence. This rate of convergence depends on the size of the region in the complex plane where the functions x(s) are analytic. The structure of the matrix equation reveals at least one singularity that occurs when A(s∗ ) is rank-deficient for some s∗ ∈ R, assuming the right hand side b(s∗ ) does not fortuitously remove it. CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 43 For a general parameterized matrix, this fact may not be useful. However, for many parameterized systems in practice, the range of the parameter is dictated by existence and/or stability criteria. The value that makes the system singular is often known and has some interpretation in terms of the model. In these cases, one may have an upper bound on ρ, which is the sum of the semi-axes of the ellipse of analyticity, and this can be used to estimate the geometric rate of convergence a priori. We end this section with a residual error estimate – similar to residual error estimates for constant matrix equations – that may be more useful in practice than the asymptotic results. Theorem 13 Define the residual r(y, s) as in equation (3.23), and let e(y, s) = x(s)− y(s) be the RN -valued function representing the error in the approximation y(s). Then C1 kr(y)kL2 ≤ ke(y)kL2 ≤ C2 kr(y)kL2 (3.63) for some constants C1 and C2 , which are independent of y(s). Proof. Since A(s) is non-singular for all s ∈ [−1, 1], we can write A−1 (s)r(y, s) = y(s) − A−1 (s)b(s) = e(y, s) (3.64) so that ke(y)k2L2 = e(y)T e(y) = r T (y)A−T A−1 r(y) Since A(s) is bounded, so is A−1 (s). Therefore, there exist constants C1∗ and C2∗ that depend only on A(s) such that C1∗ r T (y)r(y) ≤ eT (y)e(y) ≤ C2∗ r T (y)r(y) . Taking the square root yields the desired result. (3.65) CHAPTER 3. SPECTRAL METHODS — UNIVARIATE APPROXIMATION 44 Theorem 13 states that the L2 norm of the residual behaves like the L2 norm of the error. In many cases, this residual error may be much easier to compute than the true L2 error. 
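One way to carry out that computation is to approximate $\|r(y)\|_{L^2}^2$ with a Gaussian quadrature rule, evaluating the residual of a polynomial approximation $y(s) = Y \boldsymbol{\pi}_n(s)$ at each node. The following MATLAB function is a minimal sketch; the name residual_estimate is hypothetical, the inputs follow the conventions of the earlier sketches, and in practice the rule should have enough points to integrate the residual accurately.

function res = residual_estimate(A, b, Y, lambda, nu, P)
% Quadrature estimate of the L2 residual norm ||r(y)||_{L2} from Theorem 13,
% for an approximation y(s) = Y * pi_n(s). Inputs: function handles A(s), b(s);
% coefficient matrix Y (N-by-n); quadrature nodes lambda and weights nu; and P,
% whose i-th row holds the orthonormal polynomials evaluated at lambda(i).
res2 = 0;
for i = 1:numel(lambda)
    yi   = Y * P(i,:)';                        % evaluate the approximation at lambda(i)
    ri   = A(lambda(i)) * yi - b(lambda(i));   % residual r(y, lambda(i))
    res2 = res2 + nu(i) * (ri' * ri);          % accumulate nu_i * ||r||_2^2
end
res = sqrt(res2);
end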
However, as in residual error estimates for constant matrix problems, the constants in Theorem 13 will be large if the bounds on the eigenvalues of $A(s)$ are large. We apply these results in the next section with two numerical examples.

3.9 Numerical Examples

We examine two simple examples of spectral methods applied to parameterized matrix equations. The first is a 2 × 2 symmetric parameterized matrix, and the second is a discretized second order ODE. In both cases, we relate the convergence of the spectral methods to the size of the region of analyticity and verify this relationship numerically. We also compare the behavior of the true error to the behavior of the residual error estimate from Theorem 13. To keep the computations simple, we use a constant weight function $w(s)$. The corresponding orthonormal polynomials are the normalized Legendre polynomials, and the Gauss points are the Gauss-Legendre points.

3.9.1 A 2 × 2 Parameterized Matrix Equation

We return to the example introduced in Chapter 2. Let $\varepsilon > 0$, and consider the following parameterized matrix equation

$$\begin{bmatrix} 1+\varepsilon & s \\ s & 1 \end{bmatrix} \begin{bmatrix} x_0(s) \\ x_1(s) \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \qquad (3.66)$$

For this case, we can easily compute the exact solution,

$$x_0(s) = \frac{2-s}{1+\varepsilon-s^2}, \qquad x_1(s) = \frac{1+\varepsilon-2s}{1+\varepsilon-s^2}. \qquad (3.67)$$

Both of these functions have poles at $s = \pm\sqrt{1+\varepsilon}$, so the sum of the semi-axes of the ellipse of analyticity is bounded, i.e. $\rho < \sqrt{1+\varepsilon}$. Notice that the matrix is linear in $s$, and the right hand side has no dependence on $s$. Thus, Corollary 5 implies that the Galerkin approximation is equal to the pseudospectral approximation for all $n$; there is no need to solve the system (3.29) to compute the Galerkin approximation. In Figure 3.1 we plot both the true $L^2$ error and the residual error estimate for four values of $\varepsilon$. The results confirm the analysis.

[Figure 3.1: The convergence of the spectral methods applied to equation (3.66) for $\varepsilon = 0.2, 0.4, 0.6, 0.8$. The figure on the left plots the $L^2$ error as the order of approximation increases, and the figure on the right plots the residual error estimate. The stairstep behavior relates to the fact that $x_0(s)$ and $x_1(s)$ are odd functions over $[-1,1]$.]

3.9.2 A Parameterized Second Order ODE

Consider the second order boundary value problem

$$\frac{d}{dt}\left( \alpha(s,t)\, \frac{du}{dt} \right) = 1, \qquad t \in [0,1], \qquad (3.68)$$
$$u(0) = 0, \qquad (3.69)$$
$$u(1) = 0, \qquad (3.70)$$

where, for $\varepsilon > 0$,

$$\alpha(s,t) = 1 + 4\cos(\pi s)(t^2 - t), \qquad s \in [\varepsilon, 1]. \qquad (3.71)$$

The exact solution is

$$u(s,t) = \frac{1}{8\cos(\pi s)} \ln\!\left( 1 + 4\cos(\pi s)(t^2 - t) \right). \qquad (3.72)$$

The solution $u(s,t)$ has a singularity at $s = 0$ and $t = 1/2$. Notice that we have adjusted the range of $s$ to be bounded away from 0 by $\varepsilon$. We use a standard piecewise linear Galerkin finite element method with 512 elements in the $t$ domain to construct a stiffness matrix parameterized by $s$, i.e.

$$(K_0 + \cos(\pi s)\, K_1)\, x(s) = b. \qquad (3.73)$$

[Figure 3.2: The convergence of the residual error estimate for the Galerkin and pseudospectral approximations applied to the parameterized matrix equation (3.73), for $\varepsilon = 0.1, 0.2, 0.3, 0.4$.]
Figure 3.2 shows the convergence of the residual error estimate for both Galerkin and pseudospectral approximations as $n$ increases. (Despite having the exact solution (3.72) available, we do not present the decay of the $L^2$ error; it is dominated entirely by the discretization error in the $t$ domain.) As $\varepsilon$ gets closer to zero, the geometric convergence rate of the spectral methods degrades considerably. Also, note that each element of the parameterized stiffness matrix is an analytic function of $s$, but Figure 3.2 verifies that the less expensive pseudospectral approximation converges at the same rate as the Galerkin approximation.

3.10 Summary

We derived two basic spectral methods: (i) the interpolatory pseudospectral method, which approximates the coefficients of the truncated Fourier series with Gaussian quadrature formulas, and (ii) the Galerkin method, which finds an approximation in a finite dimensional subspace by requiring that the equation residual be orthogonal to the approximation space. The primary work involved in the pseudospectral method is solving the parameterized system at a finite set of parameter values, whereas the Galerkin method requires the solution of a coupled system of equations many times larger than the original parameterized system. We showed that one can interpret the differences between these two methods as a choice of when to truncate an infinite linear system of equations. Employing this relationship, we derived conditions under which these two approximations are equivalent. In this case, there is no reason to solve the large coupled system of equations for the Galerkin approximation.

Using classical techniques, we presented asymptotic error estimates relating the decay of the error to the size of the region of analyticity of the solution; we also derived a residual error estimate that may be more useful in practice. We verified the theoretical developments with two numerical examples: a 2 × 2 matrix equation and a finite element discretization of a parameterized second order ODE.

The popularity of spectral methods for PDEs stems from their infinite (i.e. geometric) order of convergence for smooth functions compared to finite difference schemes. We have the same advantage in the case of parameterized matrix equations, plus the added bonus that there are no boundary conditions to consider. The primary concern for these methods is determining the value of the parameter closest to the domain that renders the system singular.

3.11 Application — PageRank

Google's PageRank model [72] provides an excellent example of a parameterized matrix equation with a single parameter in a real application. We present the results of applying spectral methods to this parameterized system and interpret some of the results. The goal of PageRank is to develop a measure of importance to rank web pages (or nodes in a graph, more generally). We will briefly derive the PageRank system here – omitting some important details – starting from the Markov chain formulation. The interested reader is encouraged to explore references [72] and [51] for further details.

The PageRank model begins with a directed graph on $N$ nodes and posits a random surfer traveling from node to node. At each step, the surfer either follows a link in the graph with probability $\alpha$ or jumps to a node in the graph uniformly with probability $1-\alpha$.
If the surfer follows a link, then he chooses amongst the outlinks of the current page with uniform probability. This model defines an irreducible, aperiodic Markov chain with a unique stationary probability distribution. The stationary distribution measures the importance of a page, and this metric is used to rank a given subset of pages, such as those that are returned by a search query. The graph begets a column stochastic transition probability matrix $P$, and the vector of stationary probabilities (i.e. the PageRank vector) is the unique eigenvector $x(\alpha)$ with unit one-norm that satisfies

$$(\alpha P + (1-\alpha)\, v e^T)\, x(\alpha) = x(\alpha), \qquad (3.74)$$

where $e$ is an $N$-vector of ones. Equivalently, $x(\alpha)$ solves the parameterized matrix equation

$$(I - \alpha P)\, x(\alpha) = (1-\alpha)\, v. \qquad (3.75)$$

The current practice is to choose a value for $\alpha$ based on some modelling assumptions, solve the constant linear system of equations, and use the resulting solution vector as the PageRank vector. The choice of $\alpha$ is an unsettled issue in the PageRank literature, with the most popular choices being 0.85 [60] and 0.5 [4]. However, if we instead leave $\alpha$ variable, we can prescribe a weight function over the interval $[0,1]$ and interpret $\alpha$ as a random variable with a given probability density function. This idea has been explored in [19].

Notice that both the matrix and the right hand side in equation (3.75) depend linearly on the parameter $\alpha$. Therefore, by Corollary 5, the Galerkin and pseudospectral approximations produce the same polynomial representations for $n \ge 2$. The matrix $I - \alpha P$ is actually singular for $\alpha = 1$, since by equation (3.75), $(I - P)\, y = 0$ for some non-zero $y$. Initially, this may suggest that the solution has a singularity at the endpoint of the interval, thus destroying all hope for rapid convergence of the spectral methods. However, by construction (see equation (3.74)), the point $\alpha = 1$ actually constitutes a removable discontinuity in the functions $x(\alpha)$. The closest true singularity on the real line lies beyond the endpoint 1.

We test convergence by comparing the expectation and standard deviation of a pseudospectral approximation with a semi-analytical solution. Using the symbolic toolbox in Matlab, we compute the PageRank vector as a rational function of $\alpha$ on the 335 node connected component of the har500cc graph [65]. Using Mathematica, we numerically integrate to compute the expectation and standard deviation in 32-digit arithmetic, which resolves the exact solution when converted to a double precision number. Figure 3.3 plots the convergence of the expectation (solid circles) and standard deviation (plus signs) for four different Beta distribution models of $\alpha$. The covariance structure in the components of the vector $x(\alpha)$ reveals new information about the underlying graph, and its elements may prove useful for additional websearch applications such as spam detection [34].

[Figure 3.3: Convergence of the pseudospectral approximations of the expectation (solid circles) and standard deviation (plus signs) of PageRank. The x-axis counts the number of points in the Gaussian quadrature rule, which is equivalent to the number of basis functions in the pseudospectral approximation; the y-axis measures the difference between the approximation and the exact solution. The colors correspond to the following Beta distributions for $\alpha$: $\beta(2, 16, 0, 1)$ – blue, $\beta(0, 0, 0.6, 0.9)$ – salmon, $\beta(1, 1, 0.1, 0.9)$ – green, $\beta(-0.5, -0.5, 0.2, 0.7)$ – red, where $\beta(a, b, l, r)$ signifies the distribution parameters $a$ and $b$ and the endpoints $l$ and $r$.]
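To make the computation in this section concrete, the following MATLAB sketch estimates the expectation and variance of $x(\alpha)$ from equation (3.75) by Gaussian quadrature, with $\alpha$ uniformly distributed on an interval $[l, r]$. The tiny three-node transition matrix is a hypothetical stand-in for the har500cc graph used above, and the uniform weight is a simpler stand-in for the Beta densities of Figure 3.3.

Pmat = [0 1/2 1/3; 1/2 0 1/3; 1/2 1/2 1/3];   % hypothetical column-stochastic transition matrix
N = size(Pmat, 1);  v = ones(N, 1) / N;
l = 0.1;  r = 0.9;  n = 12;                   % alpha uniform on [l, r]; n quadrature points
k = (1:n-1)';  beta = k ./ sqrt(4*k.^2 - 1);  % Gauss-Legendre rule on [-1, 1]
[Q, L] = eig(diag(beta,1) + diag(beta,-1));
s = diag(L);  nu = Q(1,:)'.^2;
alpha = l + (r - l) * (s + 1) / 2;            % map the nodes to [l, r]
X = zeros(N, n);
for i = 1:n
    X(:,i) = (eye(N) - alpha(i)*Pmat) \ ((1 - alpha(i)) * v);   % equation (3.75)
end
pr_mean = X * nu;                             % quadrature estimate of the expected PageRank
pr_var  = (X.^2) * nu - pr_mean.^2;           % componentwise variance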
Chapter 4

Spectral Methods — Multivariate Approximation

In this chapter we extend the univariate spectral methods to multivariate analogs, which will handle systems that depend on multiple parameters. In other words, we assume $s = (s_1, \dots, s_d)$ where $d > 1$. Multivariate approximation schemes are notoriously expensive and quickly push the bounds of feasibility for even a modest number of variables. There is a vast literature on recent methods for alleviating this curse of dimensionality for certain classes of problems; Griebel provides an excellent survey of this work in [40]. Many of these techniques are now applied in the context of differential equations with stochastic inputs, where the inputs are typically represented as a function of a set of parameters [67, 89, 23, 10]. Such problems are closely related to parameterized matrix equations since discretization in the spatial and temporal domains often yields a linear system of equations whose elements depend on the input parameters.

In Section 4.5, we develop a scheme that is robust and efficient for many types of problems within the parameterized matrix equation context. We will compare this proposed scheme to standard multivariate approximation schemes on two problems that showcase the features of the developed heuristic. We will also tackle a standard test problem from the literature. As a final performance assessment, in Chapter 5 we apply the proposed scheme to an engineering application where the standard methods are infeasible. Before we delve into heuristics, we present some of the fundamental concepts of multivariate approximation in a spectral methods context.

4.1 Tensor Product Extensions

The most natural extension of the univariate methods to multivariate methods is via product basis functions. Since each parameter induces an independent coordinate direction in the parameter space, and since we assume a separable weight function on the space, $w(s) = \prod_k w_k(s_k)$, we can construct a multivariate orthogonal basis by taking products of the univariate basis functions. At this point, it is most convenient to employ the multi-index notation introduced in Chapter 2. Each multi-index $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}^d$ has an associated basis function

$$\pi_\alpha(s) = \prod_{k=1}^{d} \pi_{\alpha_k}(s_k). \qquad (4.1)$$

The separability of the weight function implies that these multivariate polynomials are orthogonal with respect to $w(s)$ since, for $\alpha, \beta \in \mathbb{N}^d$,

$$\langle \pi_\alpha \pi_\beta \rangle = \prod_{k=1}^{d} \int_{-1}^{1} \pi_{\alpha_k}(s_k)\, \pi_{\beta_k}(s_k)\, w_k(s_k)\, ds_k = \delta_{\alpha\beta}, \qquad (4.2)$$

where $\delta_{\alpha\beta}$ is one if each component of $\alpha$ and $\beta$ are the same, and zero otherwise. Therefore, the multivariate orthogonal polynomials $\{\pi_\alpha(s), \alpha \in \mathbb{N}^d\}$ are a basis for square-integrable functions on the parameter space. In other words, any square integrable function $f : [-1,1]^d \to \mathbb{R}$ can be expressed as

$$f(s) = \sum_{\alpha \in \mathbb{N}^d} \langle f \pi_\alpha \rangle\, \pi_\alpha(s), \qquad (4.3)$$

where $\langle f \pi_\alpha \rangle$ are the Fourier coefficients.

For a finite term approximation, we use a subset of these basis functions and compute approximate Fourier coefficients. A full tensor product basis consists of all possible products of the set of univariate polynomials for each parameter. Thus, if we define n = (n1 , . . .
, nd ) ∈ Nd , then we can write the full tensor product basis (or just tensor basis) compactly with vector and Kronecker notation as π n (s) = π n1 (s1 ) ⊗ · · · ⊗ π nd (sd ), (4.4) where π nk (sk ) are the first nk univariate basis polynomials for the parameter sk . The Q number of terms in this basis is k nk , so that if all the nk ’s are equal, then this is ndk . In words, the number of basis functions in the basis grows exponentially as the number of parameters increases. In a similar way, a tensor product Gaussian quadrature rule is constructed from a set of univariate rules for each coordinate. Denote the set of d-variate quadrature nodes by λn = λn1 × · · · × λnd , (4.5) where λnk = (λ0 , . . . , λnk ) are the nk -point Gaussian quadrature nodes for the weight function wk (sk ). Each d-variate point λα can be indexed by a multi-index α. The associated weight να is computed as the product of the weights for each coordinate in the node, i.e. να = d Y ναk = k=1 d Y 1 1 = . 2 kπ nk (λαk )k2 kπ n (λα )k22 k=1 (4.6) Then we can write the integral of a function f : [−1, 1]d → R as hf i = X f (λα )να + R(f ), (4.7) α≤n−1 where R(f ) is the remainder. This is the multivariate version of equation (3.9). The notation α ≤ n is short-hand for the set of multi-indices {α ∈ Nd : 0 ≤ αk ≤ nk −1}. We will use the same notation for the Gaussian quadrature approximation as in the univariate case, where it is understood that the subscript n is a d-tuple of integers, CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION54 i.e. hf in ≡ X f (λα )να . (4.8) α≤n−1 There is a subtle but crucial difference between using the multi-indices to index the multivariate polynomials and using them for the quadrature points/weights. For the polynomials, a given multi-index will always refer to the same polynomial, regardless of the number of basis functions in an approximation. For example, π(2,3) (s1 , s2 ) = π2 (s1 )π3 (s2 ), always. However, the specific points on the real line in the quadrature formulas will depend on the total number of points in the formula. For example, a quadrature rule with n = (3, 3) will have λ(1,2) ≈ (0, 0.77), but for n = (4, 4), λ(1,2) ≈ (−0.34, 0.34). To avoid confusion, we will typically refer to the quadrature rule for a fixed n, and we will state explicitly otherwise. We can use the tensor grid of Gaussian quadrature points to construct a multivariate Lagrange interpolant, which is equivalent to spectral collocation (Section 3.3) for multiple parameters. The multivariate basis polynomials become products of the univariate Lagrange interpolating basis polynomials (equation (3.18)). More precisely, the tensor product interpolants of the solution x(s) of equation (2.3) is given by X xc,n (s) = α≤n−1 where ℓα (s) = d Y x(λα )ℓα (s) ≡ Xc ln (s), nY k −1 k=1 jk =0, jk 6=αk The N × Q k sk − λ j k . λαk − λjk (4.9) (4.10) nk constant matrix Xc (the subscript c is for collocation) has one column for each x(λα ), and ln (s) is a vector of the multivariate Lagrange basis polynomials. The tensor product pseudospectral approximation is identical to the univariate case but employs the multi-index notation. xp,n (s) = X α≤n−1 hxπα in πα (s) ≡ Xp π n (s), (4.11) where Xp are the pseudospectral coefficients. Using these tensor constructions, it CHAPTER 4. 
SPECTRAL METHODS — MULTIVARIATE APPROXIMATION55 is straightforward to show – by adapting arguments from the proof of Theorem 2 with appropriately placed Kronecker products – that the multivariate collocation interpolant is equivalent to the multivariate pseudospectral approximation. The coefficients of the Galerkin approximation are computed in precisely the same way as the univariate case (see equation (3.29)) except using the multivariate basis polynomials. In other words, to compute Xg in the approximation xg,n = Xg π n (s), we solve the linear system of equations π n π Tn ⊗ A vec(Xg ) = hπ n ⊗ bi . (4.12) We will see in the next section that we can choose any subset of the multivariate orthogonal polynomials to construct the Galerkin approximation. We use the tensor construction here to develop analogs of the connections between the spectral methods for the multivariate case. To proceed with this development, we need to define a multivariate function of matrices. Let f : [−1, 1]d → R be analytic with series representation f (s) = X fα sα , (4.13) α∈Nd with constants fα . Let Jn = (Jn1 , . . . , Jnd ) be a collection of d matrices, and define f (Jn ) = X α∈Nd fα Jα1 1 ⊗ · · · ⊗ Jαd d . (4.14) By the Kronecker construction, the matrix f (Jn ) has dimension that is the respective products of the dimensions of the Jk . We use the notation Jk here to invoke the Jacobi matrices (equation (3.5)) for each coordinate direction sk with respective weight function wk (sk ). Using these notational conventions, Theorems 9 and 4 hold as written under the tensor basis construction, as well as Corollaries 5 and 6 and Proposition 8. The multivariate versions of these ideas merely require the multi-indices to replace the single indices in the basis elements and a d-tuple of integers for n. The proofs are CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION56 structured exactly as in the univariate case but liberally employ Kronecker products and their mixed product property [44]. Despite the straightforward theoretical extensions, the tensor constructions are highly impractical. The exponential growth in the size of the orthogonal basis and/or the number of quadrature points in the tensor grid makes computation entirely infeasible for a moderate number of parameters; even a linear approximation of the solution (two points per parameter) will require 2d points. To combat this exponential growth, we seek a Galerkin approximation with carefully chosen basis elements such that the number of elements is far smaller than the tensor basis. 4.2 Non-Tensor Basis Functions One distinct advantage of the Galerkin method over the decoupled pseudospectral method is that one may construct the approximation space using any subset of the multivariate basis functions. Therefore, if a crystal ball existed that could tell us precisely which basis functions were most important (i.e. the ones whose associated coefficients had the largest magnitude), then we could use only those basis functions for the approximation. While no such oracle actually exists, we can take advantage of the theoretical decay of the coefficients to effectively find the most important basis functions. In essence, for the functions x(s) that satisfy the parameterized matrix equation (2.3), we expect a particular type of decay in the coefficients due to the quasirational structure of the solution (equation (2.5)). We do not expect that x(s) will have entirely arbitrary or independent Fourier coefficients. 
Therefore, by constructing an inexpensive, low-order approximation with few terms, we can often discern which basis functions to add to best improve the approximation. This heuristic yields an iterative scheme for finding an efficient finite term approximation for the solution. The scheme we propose for approximating x(s) is naturally adaptive. If the solution varies differently along different parameters, the scheme will detect and exploit that anisotropy to efficiently choose an appropriate basis set. This detection is based on approximating the variance contributions from an ANOVA-like decomposition [57]. There is an intimate connection between the ANOVA decomposition and the Fourier CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION57 expansion of a square-integrable function – the ANOVA decomposition is equivalent to partitioning the Fourier coefficients according to subsets of associated parameters. We use this connection along with estimated asymptotic decay rates of the Fourier coefficients to construct the heuristic that drives our scheme. 4.3 A Multivariate Spectral Galerkin Method In this section, we recall the spectral Galerkin method proposed in Chapter 3 to approximate the solution x(s), and we extend it to the case of multiple parameters (d > 1). The spectral Galerkin method computes a finite dimensional approximation to x(s) such that each element of the equation residual is orthogonal to the approximation space. In one dimension, we use an approximation space spanned by the first n orthonormal polynomials, where the orthogonality was with respect to the given weight function. In multiple dimensions, we construct the multivariate basis functions as products of one dimensional orthonormal polynomials – see equation (4.1). The finite dimensional approximation space for the Galerkin method is defined as the span of a subset Π of the multivariate basis functions. This subset of basis functions is given by a subset of the multi-indices I ⊂ Nd , i.e. Π = {πα (s) : α ∈ I ⊂ Nd }. (4.15) In section 4.5.1 we discuss choices for this basis set. In what follows, we restate – for the sake of clarity – the Galerkin method including the generalization to arbitrary approximation spaces. Define the vector-valued function r(y, s) = A(s)y(s) − b(s). (4.16) We seek an RN -valued polynomial xg (s) ∈ span(Π) such that hri (xg )πα i = 0, i = 0, . . . , N − 1, α ∈ I. (4.17) CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION58 Let π(s) be the vector of elements in Π. We can write equations (4.17) in matrix notation as r(xg )π T = 0 (4.18) Axg π T = bπ T . (4.19) or equivalently Since each component of xg (s) is a polynomial from the span of Π, we can write its expansion as xg (s) = X α∈I xg,α πα (s) ≡ Xπ(s), (4.20) where X is a constant matrix of size N × |I|; the number |I| is the number of multiindices in I. Then equation (4.19) becomes AXππ T = bπ T . (4.21) T ππ ⊗ A vec(X) = hπ ⊗ bi . (4.22) Using the vec notation [37, Section 4.5], we can rewrite (4.21) as where vec(X) is an N|I| ×1 constant vector equal to the columns of X stacked on top of each other. The constant matrix ππ T ⊗ A has size N|I| × N|I| and a distinct block structure. The work required to compute the Galerkin approximation depends on how one computes the integrals in equation (4.22). If we assume that the cost of forming the system is negligible, then the costly part of the computation is solving the system (4.22). 
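For small problems, the system (4.22) can be assembled and solved directly. The sketch below is illustrative Python for a hypothetical 2 x 2 parameterized matrix; the integrals are approximated by a tensor Gauss-Legendre rule (which anticipates the G-NI variant of Chapter 5), the basis is any set of multi-indices, and all names are placeholders rather than part of a released code.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import leggauss
from numpy.polynomial import legendre as leg

def orthonormal_legendre(deg, s):                        # same helper as in the previous sketch
    V = np.stack([leg.legval(s, np.eye(deg)[k]) for k in range(deg)])
    return V * np.sqrt(2.0 * np.arange(deg) + 1.0)[:, None]

def eval_basis(index_set, S):
    """Multivariate orthonormal polynomials pi_alpha (rows) at the points in S (npoints x d)."""
    S = np.atleast_2d(S)
    max_deg = max(max(a) for a in index_set) + 1
    uni = [orthonormal_legendre(max_deg, S[:, k]) for k in range(S.shape[1])]
    return np.stack([np.prod([uni[k][ak] for k, ak in enumerate(alpha)], axis=0)
                     for alpha in index_set])

def galerkin_solve(A, b, N, index_set, nq=10):
    """Assemble and solve (4.22); the integrals are replaced by an nq-point tensor Gauss rule."""
    d = len(index_set[0])
    x1, w1 = leggauss(nq)
    pts = np.array(list(product(x1, repeat=d)))
    wts = np.array([np.prod(c) for c in product(w1 / 2.0, repeat=d)])
    P = eval_basis(index_set, pts)                       # |I| x (number of quadrature points)
    G = sum(w * np.kron(np.outer(P[:, j], P[:, j]), A(*pts[j])) for j, w in enumerate(wts))
    rhs = sum(w * np.kron(P[:, j], b(*pts[j])) for j, w in enumerate(wts))
    return np.linalg.solve(G, rhs).reshape(len(index_set), N).T   # X is N x |I|

# Hypothetical 2 x 2 problem in two parameters with a total-degree-2 index set.
A = lambda s1, s2: np.array([[2.0 + s1, 0.2], [0.2, 1.5 + 0.5 * s2]])
b = lambda s1, s2: np.array([1.0, 1.0])
I2 = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]
print(galerkin_solve(A, b, N=2, index_set=I2))
```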
We expect an operation count of O(N 3 |I|3 ), in general, but many applications beget systems with sparsity or exploitable structure. Notice that any reduction in |I| dramatically reduces required work. The goal of the proposed approximation scheme is to generate an accurate approximation with a small number of expansion terms. In the next section we motivate a heuristic for choosing the basis functions by examining the relation between the multivariate Fourier series and the ANOVA decomposition. CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION59 4.4 L2 Decompositions In this section we briefly recall both the Fourier series and the ANOVA decomposition, and we discuss the connections between the two. 4.4.1 ANOVA Decomposition A square-integrable function f (s) defined on [−1, 1]d admits a decomposition into a sum of functions representing the effects of each subset of the independent variables on the total variability of f (s). We closely follow the notation from Liu and Owen [57], and we refer to that paper and its references for more detailed information. For subsets u ⊆ {1, . . . , d}, let |u| denote the cardinality of u. The indices in {1, . . . , d} and not in u is denoted by u′ . Let su denote the |u|-tuple of components sk with k ∈ u. The domain of su is the sub-hypercube [−1, 1]|u| . Let hf i be the mean of f (s) over the domain, and let σ 2 = h(f − hf i)2 i be the variance. The notation hf iu means integration against the weight function wu (s) = Q |u| k∈u wk (sk ) with respect to the variables su ∈ [−1, 1] , which results in a function that does not depend on su . The ANOVA decomposition of f (s) is given by f (s) = X fu (s), (4.23) u⊆{1,...,d} where fu (s) depends only on su . There are precisely 2d terms in this expansion. The functions fu are obtained by integrating over su′ and subtracting from f all terms for strict subsets of u, which results in a function depending only on su , fu (s) = hf iu′ − X fv (s) (4.24) v⊂u The convention is to set f∅ = hf i. It follows that hfu (s)fv (s)i = 0 for v 6= u, i.e. the CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION60 fu are orthogonal. The variance of fu is σu2 , and the following identity holds, X σ2 = σu2 , (4.25) u⊆{1,...,d} where σ∅2 = 0. The ANOVA decomposition is so named because it decomposes the variance into contributions by subsets of the independent variables. The functions fu (s) for u = 1, . . . , d are called the main effects, since these functions only depend on one of the variables. For functions fu (s) where u is not a singleton are called interaction effects. For the vector-valued function x(s) that solves equation (2.3), we define xu (s) component-wise, so that xu (s) is also a vector-valued function. 4.4.2 Fourier Series The Hilbert space L2 = {f : [−1, 1]d → R | hf 2 i < ∞} is spanned by the multivariate orthogonal polynomials πα (s) with α ∈ Nd . Therefore, each function f ∈ L2 can be written f (s) = X fˆα πα (s), (4.26) α∈Nd where fˆα = hf πα i are the Fourier coefficients. Let I ⊂ Nd be a finite subset and Π be the set of basis polynomials associated with I. The projection PI f (s) = X fˆα πα (s) (4.27) α∈I is the best approximation polynomial in the L2 norm amongst all polynomials in span(Π). CHAPTER 4. 
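To make the two decompositions concrete, consider a small worked example with d = 2 and, for simplicity, the uniform weight wk(sk) = 1/2 (the Legendre case): take f(s1, s2) = s1 + s1s2. The ANOVA pieces are f∅ = ⟨f⟩ = 0, f1(s1) = ⟨f⟩{2} − f∅ = s1 (integrating out s2), f2(s2) = 0, and f12(s1, s2) = f − f∅ − f1 − f2 = s1s2, with variance contributions σ1² = ⟨s1²⟩ = 1/3, σ2² = 0, and σ12² = ⟨s1²s2²⟩ = 1/9, so that σ² = 4/9 in agreement with (4.25). In the corresponding orthonormal basis, the degree-one Legendre polynomial is π1(s) = √3 s, so the only nonzero Fourier coefficients of f are those with multi-indices (1, 0) and (1, 1), namely 1/√3 and 1/3; squaring them and grouping them by the nonzero components of their multi-indices recovers exactly σ1² = 1/3 and σ12² = 1/9. The next section shows that this correspondence holds in general.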
SPECTRAL METHODS — MULTIVARIATE APPROXIMATION61 4.4.3 Connections Between ANOVA and Fourier Series We first rewrite the Fourier series (4.26) as f (s) = ≡ ∞ X ∞ X α1 =0 α2 =0 ∞ X ··· ∞ X fˆα1 ,α2 ,...,αd πα1 (s1 )πα2 (s2 ) · · · παd (sd ) αd =0 fˆα{1,...,d} α{1,...,d} =0 Y παi (si ) i∈{1,...,d} Notice that integrating against some subset u′ of the variables s leaves hf iu′ = ∞ X αu =0 fˆαu Y παi (si ), (4.28) i∈u R1 since π0 (si ) = 1 and −1 παi (si )wi (si ) dsi = 0 for αi > 0 and i = 1, . . . , d. The notation fˆαu denotes the Fourier coefficient with index αi in the ith position for i ∈ u and zero in the jth position for j ∈ u′ . Therefore, since hf i = fˆ0,...,0 , we have the relations fu (s) = ∞ X αu =1 fˆαu Y παi (si ) i∈u σu2 = ∞ X fˆα2u . (4.29) αu =1 In words, the ANOVA decomposition naturally partitions the Fourier series by its indices. The nonzero components of the multi-index α of a coefficient fˆα prescribe exactly to which fu it belongs. In essence, the ANOVA decomposition collapses the infinite series associated with each component index to a single term, hence the 2d terms in the ANOVA expansion. The goal of the spectral Galerkin method is to approximate the Fourier coefficients of x(s). Thus, if we merely group the Galerkin coefficients according to the nonzero components of their multi-indices, then we have an approximation of the ANOVA expansion. And summing the squares of each set of coefficients yields the variance contributions. This connection motivates an adaptive scheme for choosing a good approximation with a limited number of terms. CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION62 4.5 Developing A Heuristic In this section we develop a heuristic for choosing the basis elements in the truncated expansion to reduce the size of the Galerkin system (4.22) while maintaining a good approximation. We treat the approximation as an iterative procedure. In other words, given n we compute an approximation using the set of basis functions for some index set parameterized by n. We then use the characteristics of the level n approximation to choose the basis set for level n + 1. We repeat this procedure until some stopping criteria is satisfied. There are some straightforward implementation tricks to reuse the solution at level n to help compute the solution at level n + 1, but we will not discuss them here. To develop the heuristic, it will be very convenient to assume that we are working with the Chebyshev expansion of x(s), and the weight function is the product of one dimensional weight functions for the Chebyshev polynomials, i.e. wi (si ) = (1 − s2i )−1/2 . In this case, the truncated Chebyshev series is the truncated Fourier series. Asymptotically, we expect that the Galerkin coefficients will behave much like the Chebyshev coefficients, which justifies this assumption. 4.5.1 Incrementing a Basis Set The most natural extension of the one dimensional basis to higher dimensions is via the tensor products discussed in Section 4.1. This is equivalent to using an approximation space of multivariate polynomials with largest degree in each variable up to order n. (We assume for simplicity that all components of the multi-index n are equal.) In other words, the set of multi-indices corresponding to the tensor basis is In = α ∈ N : max αi ≤ n . d 1≤i≤d (4.30) There are |In | = nd terms in this set, which makes it at best impractical and at worst infeasible for applications with a moderately large number of dimensions. 
An alternative basis uses the full polynomials, which is more standard for analysis of orthogonal polynomials in several variables [24]. This basis includes polynomials CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION63 where the sum of the degrees in each variable is less than or equal to n, In = α ∈ Nd : α1 + · · · + αd ≤ n . The number of terms in this set is |In | = (4.31) n+d n . This is also the basis used for the so-called polynomial chaos methods [33, 90]. The number n+d is smaller than nd , n but suffers the same exponential rate of growth asymptotically. In fact, we expect the full polynomial basis to be a much more efficient approximation, particularly for low order approximations. We borrow a result from Bochner and Martin [11] by way of Boyd [13][Theorem 11] related to the asymptotic decay of the multivariate Chebyshev coefficients to justify this claim. Theorem 14 (Multivariate Chebyshev Series) Let sj , j = 1, . . . , d, denote an ordinary complex variable and let f (s1 , . . . , sd ) be analytic in the elliptic polycylinder q n o ǫ = sj + s2j + 1 < rj , j = 1, . . . , d (4.32) X (4.33) where rj > 1. Then f (s1 , . . . , sd ) has the unique expansion f (s1 , . . . , sd ) = aα Tα (s1 , . . . , sd ) α∈Nd which is valid in ǫ. The Tα (s1 , . . . , sd ) are the multivariate product Chebyshev polynomials. Convergence is uniform and absolute in every interior polycylinder E(ρ1 , . . . , ρj ) for which 1 < ρj < rj , j = 1, . . . , d. The coefficients satisfy the bound |aα | ≤ K(ρ) d Y k ρ−α k (4.34) k=1 for some constant K(ρ). From the bound (4.34), we expect the cross-term coefficients to decay faster than the coefficients associated with only a single parameter. Thus the tensor product expansion will use more basis functions than necessary to capture the essential features of CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION64 the function in the Chebyshev space. We can trim the corners of the tensor product approximation to get the full polynomial expansion and a more efficient approximation; this metaphor makes sense after examining the basis functions in Figure 4.1. However, we will see examples in Section 4.6 where this bound is very loose, and even the full polynomial basis uses too many basis elements. Examining the full and tensor sets of multi-indices, we notice a correspondence between the tensor product set (4.30) and the standard infinity norm on Rd , and similarly the full polynomial set (4.31) and the standard one norm on Rd . We can generalize these sets by considering more general semi-norms, such as a weighted p-norm. Consider the set Iγ,n = α ∈ Nd , γ ∈ Rd+ : d X (γj αj )p i=1 !1/p ≤n . (4.35) We can choose the parameters γi and p of this set to further reduce the number of term in the basis. If γi ≥ 1 and p ≤ 1, then |In | ≤ n+d . The goal, however, is to n choose these parameters to construct a sequence of basis sets indexed by n that are appropriate for the particular solution. For example, if the Chebyshev coefficients corresponding to si decay much slower than those corresponding to sj , then we can use the parameter γi to accelerate the number of included terms associated with si as n increases. Equivalently, we can use γj to exclude basis functions associated with sj until n increases sufficiently. Instead of choosing a single parameter p, we can choose a different pu for each subset of variables su to create Iu,γ,n |u| = α ∈ N|u| , γ ∈ R+ : X (γj αj )pu j∈u Then we can choose the full index set to be Iγ,n = [ u∈{1,...,d} Iu,γ,n . !1/pu ≤n . 
(4.36) (4.37) CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION65 Figure 4.1: For d = 2, the shaded squares represent the included multi-indices for the various basis sets with n = 10. The weighted index set with no curvature has weights 1 and 2. The weighted index set with curvature has curvature parameter p = 0.7. The parameters pu will dictate the number of included Chebyshev coefficients associated with all the variables su . Examples of index sets in two-dimensions corresponding to choices of γ and pu are given in Figure 4.1. In what follows, we describe how to use the ANOVA information to choose the values γ and pu for the basis set (4.37). 4.5.2 Generating a Score Vector The computed matrix of coefficients X of size N × |In | contains the expansion coeffi- cients (which we assume are Chebyshev coefficients) for each component xi (s). The first step is to generate a vector of length |In | from this matrix. This will give us one number ẑα associated with each basis element πα (s), where α ∈ In . Define ẑ to be the vector of numbers ẑα . We motivate the choice of ẑ by noting the relationship of the expansion Xπ(s) from equation (3.27) to the Karhunen-Loeve (KL) expansion of a random process [58], CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION66 which decomposes a random process as a countable series in the eigenfunctions of its two-point covariance function. We first partition h i X = x0 Xr h iT π(s) = 1 π r (s) . (4.38) where x0 is the first column of X and Xr are the remaining columns. If we take the singular value decomposition Xr = UΣVT , then by the orthogonality of π(s), we can write the covariance matrix of the functions Xπ(s), Cov(Xπ(s)) = Xr XTr = UΣΣT UT (4.39) This is exactly an eigendecomposition of the covariance matrix for the Galerkin approximation. Next note that the |In | − 1 functions ξ(s) ≡ VT π r (s) are orthonormal since, T ξξ = VT π r π Tr V = I. (4.40) Define M = min(N − 1, |In | − 2). Then we can write Xπ(s) = x0 + Xr π r (s) = x0 + UΣVT π r (s) = x0 + UΣξ(s) = x0 + M X uj σ̄j ξj (s), j=0 where uj are the eigenvectors of the covariance matrix of Xπ(s), as in the KL expansion. The σ̄j are the singular values, which are denoted by a bar to distinguish them from the variance contributions in the ANOVA decomposition. By the construction of ξ(s), we can treat the scaled right singular vectors λj vj as a score on the basis functions π r (s). Each vj contains an element vj,α for the basis function πα (s). By multiplying vj,α by σ̄j and taking its absolute value, we create a ranking for πα (s) corresponding to the jth singular value. From these, we can construct a ranking for CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION67 the basis function πα (s) with ẑα = M X j=0 |σ̄j vj,α |. (4.41) For the constant term, we use the 2-norm of the first column of X, i.e. ẑ0 = kx0 k2 . With the vector ẑ, we have implicitly constructed a function zn (s) = X ẑα πα (s). (4.42) α∈In We assume that we can extend this definition for all n. In other words we assume there exists a square-integrable function z(s) : Rd → R such that kz − zn kL2 → 0 as n → ∞. Additionally, we assume that z(s) is analytic in the same region that xi (s) is analytic for i = 0, . . . , N − 1. In essence, the function z(s) amalgamates the behavior of the functions xi (s), and using its coefficients, we can compute global ANOVA-like measures for the solution vector x(s). 4.5.3 Computing The Dimension Weights Consider the univariate main effects functions zk (s) = zk (sk ) for k = 1, . . . 
, d from the ANOVA decomposition of the newly constructed z(s). Using spectral theory for univariate analytic functions, we assume that there exist ρk > 1 such that kzk − Pn zk k2L2 = ∞ X 2 ẑk,j = Ck2 ρ−2n , k (4.43) j=n where the ρk are related to the largest ellipse in the complex plane in which zk (s) is analytic. The second equality should be an inequality; equation (4.43) motivates the heuristic. Note that for any j, k ∈ {1, . . . , d}, there is a constant µj,k such that −µj,k n = Ck ρk Cj ρ−n j . (4.44) The constant µj,k is a relative measure of how fast the Chebyshev coefficients of zj (s) decay relative to the coefficients of zk (s). In other words, if incrementing n by one CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION68 reduces the error in the Fourier-Chebyshev projection Pn zj (s) by 1/ρj , then we expect a comparable error reduction for the projection Pn zk (s) if we increase n to the integer nearest n + µj,k . Solving for this constant, we get µj,k = log(ρj ) log(ρj ) log(Cj ) − log(Ck ) − ≈ log(ρk ) n log(ρk ) log(ρk ) (4.45) for sufficiently large n. Using equations (4.43) and (4.29), we relate the ρk to the variance contributions from the main effects. Let I be the coefficient of z(s) associated with the constant function. If we first set n = 0, then for k = 1, . . . , d, Ck2 = ∞ X 2 ẑk,j = I 2 + σk2 . (4.46) j=0 Next set n = 1 to get ∞ X Ck2 2 = ẑk,j = σk2 ρ2k j=1 Solving for ρk , we get ρk = s I 2 + σk2 . σk2 (4.47) (4.48) Define ρ∗ = min1≤k≤d ρk , and let σ∗ be the associated variance contribution. Then using (4.45), we compute the parameters γ in the definition of the index set (4.36) as γk = log(ρk ) log(I 2 + σk2 ) − log(σk2 ) = . log(ρ∗ ) log(I 2 + σ∗2 ) − log(σ∗2 ) (4.49) According to this definition, γk ≥ 1 for all k. The set of k such that γk = 1 are associ- ated with the main effects functions whose Chebyshev coefficients decay the slowest. Incrementing n by one in the definition of the index set (4.37) will always increase the order of approximation associated with the slowest converging main effects by one. The approximation along main effects associated with γk > 1 will not always be incremented for each increment of n. Instead, it will wait for an increment sufficiently large so that the error reduction along that main effect is comparable to the slowest CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION69 decaying main effects. This dramatically reduces the number of basis functions for each increment of n if there is detectable anisotropy in the function z(s). Clearly, a more aggressive approach would be to set γk = log(ρk )/ log(maxk ρk ). This definition would have the opposite affect of accelerating the series approximation along the slower decaying main effects with each increment of the main effects with faster decay rates. We prefer the more conservative approach since it yields fewer additional basis functions per iteration. 4.5.4 Estimating the Curvature Parameters The curvature parameters pu in the definition of the index sets (4.36) control how many mixed terms are allowed in a particular truncated representation. If pu = 1, then the mixed term basis functions will be the full polynomial basis for the variables su . If pu < 1, the basis will exclude many mixed terms, and if pu > 1, the basis will include mixed terms beyond the full polynomial basis. 
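Putting the last few formulas together, a compact sketch of the bookkeeping might look as follows (Python, with illustrative names and toy numbers): the squared coefficients are grouped by the support of their multi-indices as in (4.29), the main-effect variances give the decay rates (4.48) and weights (4.49), and the weights in turn define the index set (4.35). The curvature parameters of the next subsection can be attached to the same index-set routine through the exponent p.

```python
import numpy as np
from itertools import product
from collections import defaultdict

def anova_variances(coeffs):
    """Group squared (orthonormal-basis) coefficients by multi-index support, as in (4.29).
    `coeffs` maps multi-index tuples to coefficients; the empty tuple collects the mean."""
    var = defaultdict(float)
    for alpha, c in coeffs.items():
        var[tuple(k for k, a in enumerate(alpha) if a > 0)] += c ** 2
    return dict(var)

def dimension_weights(I2, sigma2_main):
    """gamma_k from the squared mean and main-effect variances, equations (4.48)-(4.49)."""
    s2 = np.asarray(sigma2_main, dtype=float)
    rho = np.sqrt((I2 + s2) / s2)                        # estimated decay rates, eq. (4.48)
    return np.log(rho) / np.log(rho.min())               # slowest-decaying coordinate gets 1

def weighted_index_set(n, gamma, p=1.0):
    """Multi-indices with (sum_k (gamma_k alpha_k)^p)^(1/p) <= n, as in (4.35); gamma = 1 and
    p = 1 gives the full polynomial set, p = inf the tensor set.  Brute force, small d only."""
    out = []
    for alpha in product(range(n + 1), repeat=len(gamma)):
        g = np.asarray(gamma) * np.asarray(alpha, dtype=float)
        norm = g.max() if np.isinf(p) else (g ** p).sum() ** (1.0 / p)
        if norm <= n + 1e-12:
            out.append(alpha)
    return out

# Toy score coefficients z_hat for d = 3: strong dependence on s1, weak on s2, weaker on s3.
zhat = {(0, 0, 0): 1.0, (1, 0, 0): 0.5, (2, 0, 0): 0.3, (3, 0, 0): 0.18,
        (0, 1, 0): 0.2, (0, 2, 0): 0.03, (0, 0, 1): 0.04, (1, 1, 0): 0.05}
v = anova_variances(zhat)
I2, s2 = v[()], [v.get((k,), 1e-16) for k in range(3)]
gamma = dimension_weights(I2, s2)
print(np.round(gamma, 2))                                # roughly [1.   2.48 4.94]
print(len(weighted_index_set(6, gamma)), "vs", len(weighted_index_set(6, [1.0, 1.0, 1.0])))
```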
If we treat the bound (4.34) on coefficients of mixed terms from Theorem 14 as an equality, then we can use this relation as an estimate of what we expect from the variance contribution of the interaction effects σu2 , where u is not a singleton. By inspection, this estimate corresponds with pu = 1, i.e. if the mixed coefficients decay according to (4.34), then the full polynomial basis is a very efficient basis set. And this gives us a reference point for comparison. Define the quantity τu2 = I 2 X Y −2αj ρj . (4.50) αu ∈Nd j∈u This is the variance contribution we would get if the coefficients decayed like (4.34), which corresponds to pu = 1. We then compute the actual variance contribution, σu2 = X ẑα2 u . (4.51) pu = (σu2 /τu2 )r . (4.52) αu ∈Nd From these quantities, we set CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION70 The parameter r controls how much impact discrepancies between τu2 and σu2 have on the number of terms in the basis set. We conservatively set r = 0.1, which has the tendency to push both large and small discrepancies toward 1. There is a strong caveat for this measure. If z(s) is an even (odd) function on a symmetric interval with symmetric weight function, then every coefficient associated with a basis function with an odd (even) degree will be zero. If this is known a priori, then equation (4.52) can be corrected accordingly. 4.5.5 Stopping Criteria The most natural stopping criteria for general problems is the residual error estimate developed in Theorem 13. For a vector-valued approximation y(s), we compute the norm of the residual r(y, s) = A(s)y(s) − b(s) as kr(y)kL2 v uN −1 Z uX =t i=0 [−1,1]d ri2 (y, s)w(s) ds. (4.53) In practice, we may use a Gaussian quadrature rule to evaluate this quantity. We stop once this residual falls below a set tolerance. 4.5.6 Algorithm Since we cannot compute the exact variance contributions σu2 without the full (infinite) expansion, we approximate these terms using the coefficients we have at a given 2 iteration; denote this quantity by σu,n . As n increases, we add more terms to the 2 approaches σu2 as quickly approximate variance contributions, and we expect that σu,n as the approximation converges to the true solution. We summarize the above discussion in the following algorithm. First choose an initial basis set I0 from the possible choices in Section 4.5.1, and compute the co- efficients X0 . To get initial values for all curvature parameters pu and weights γk , the initial basis set must have at least one term for each coordinate sk , k = 1, . . . , d and each interaction su , u ⊂ {1, . . . , d}. The simplest such expansion uses a tensor CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION71 product basis with n = 1. Then while the residual is greater than TOL, repeat the following steps. 1. Compute a score vector ẑ from the coefficients Xn using the method described in Section 4.5.2. 2. Set In2 = ẑ02 . For k = 1, . . . , d, compute 2 σk,n = X ẑα2 (k) . (4.54) α(k) ∈In 2 Set σ∗2 = maxk σk,n , and compute the weights γk,n = log(In2 + σ∗2 ) − log(σ∗2 ) . 2 2 ) ) − log(σk,n log(In2 + σk,n (4.55) If the set {α(k) ∈ In } is empty for some iteration, then use the initial values 2 computed from the approximation with basis index set I0 . If σk,n = 0, then set γk to a very large number. 3. For u ⊂ {1, . . . , d}, compute 2 τu,n = In X Y αu ∈In k∈u 2 σk,n 2 In2 + σk,n 2 σu,n = X ẑα2 (u) (4.56) α(u) ∈In 2 2 and set pu = (σu,n /τu,n )r ; we set r = 0.1. 
If the set {αu ∈ In } is empty, then 2 set pu to values computed from the initial approximation. If σu,n = 0, then set pu to a very small number. 4. Determine the index set In+1,γ,p and compute the coefficients Xn+1 of the approximation xg,n+1 (s). 5. Compute the residual error estimate of xg,n+1 (s). We apply this algorithm in the next section to a series of representative numerical examples to demonstrate its behavior. CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION72 4.6 Numerical Examples We first present three examples of 2 × 2 matrix equations dependent on two parameters, s1 and s2 , where each problem corresponds to a different type of behavior we expect from general situations. 4.6.1 Anisotropic Parameter Dependence Let ε2 > ε1 > 0, and consider the equation " 1 + ε 1 s1 s1 1 #" #" # 1 + ε2 s2 x1 (s1 , s2 ) s2 1 x2 (s1 , s2 ) = " # 1 1 , (s1 , s2 ) ∈ [−1, 1]2 . (4.57) We set ε1 = 0.2 and ε2 = 2 to induce anisotropic parameter dependence in the solution. In particular, the region of analyticity with respect to s1 is smaller than the region of analyticity with respect to s2 . Therefore, we expect the Galerkin coefficients corresponding to s1 to decay much slower than those corresponding to s2 . In Figure 4.2 we show the pseudospectral coefficients of a tensor product approximation of order 50 in each variable. The coloring corresponds to the magnitude of the log of each coefficient squared. The difference in decay rates is clearly visible from the asymmetry of the region of large coefficients. We show the included basis functions for the Galerkin approximation with the weighted basis of order 50, which shows how our method finds the region of large coefficients. 4.6.2 Small Interaction Effects Let ε > 0, and consider the equation " 2 + s1 ε ε 2 + s2 #" # x1 (s1 , s2 ) x2 (s1 , s2 ) = " # 1 1 , (s1 , s2 ) ∈ [−1, 1]2 . (4.58) If ε = 0, then x1 = x1 (s1 ) and x2 = x2 (s2 ), i.e. there will be no interaction effects. Therefore, we expect that a small ε will result in very small interaction effects in the functions x1 (s1 , s2 ) and x2 (s1 , s2 ). For the following results, we set ε = 0.01. In Figure CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION73 50 50 45 40 40 35 α 2 30 α2 30 20 25 20 15 10 10 5 10 20 α 30 40 50 5 1 10 15 20 25 α 30 35 40 45 50 1 Figure 4.2: Tensor product pseudospectral coefficients (color) of a solution with anisotropic parameter dependence along with the included coefficients (black & white) of the non-tensor weighted Galerkin approximation. 4.3 we present the pseudospectral coefficients for a tensor product approximation of order 50, and the included coefficients for the weighted Galerkin basis of order 50. The coefficients corresponding to the interaction effects are clearly smaller than those for the main effects (along the boundaries), and the weighted basis with curvature finds an efficient basis set corresponding to such weak interaction effects. To accentuate the effect, we use a more aggressive curvature parameter r = 0.2 (see equation (4.52)). 4.6.3 Large Interaction Effects Let ε > 0, and consider the equation " 4 s2 + s1 s2 s1 + s1 s2 1 #" # x1 (s1 , s2 ) x2 (s1 , s2 ) = " # 1 1 , (s1 , s2 ) ∈ [−1, 1]2 . (4.59) Here we expect large interaction effects since much of the variability from parametric variation comes from the term s1 s2 . In Figure 4.4, we again show the pseudospectral coefficients for a tensor product approximation of order 50 side-by-side with the included coefficients of a weighted Galerkin approximation of order 50. 
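The examples of this section are easy to experiment with numerically. The sketch below is a stand-in rather than a transcription: it solves a hypothetical 2 x 2 system whose entries are illustrative (not those of (4.57)) but whose dependence on s1 is deliberately much stronger than its dependence on s2, samples the first solution component on a tensor grid of Gauss-Legendre points, and forms its pseudospectral coefficients; the slow decay along α1 and fast decay along α2 mirror the anisotropy visible in Figure 4.2.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from numpy.polynomial import legendre as leg

def orthonormal_legendre(deg, s):                        # same helper as in the earlier sketches
    V = np.stack([leg.legval(s, np.eye(deg)[k]) for k in range(deg)])
    return V * np.sqrt(2.0 * np.arange(deg) + 1.0)[:, None]

# Stand-in anisotropic system (illustrative entries): strong dependence on s1, weak on s2.
A = lambda s1, s2: np.array([[2.0 + 1.8 * s1, 0.5], [0.5, 2.0 + 0.2 * s2]])
b = np.array([1.0, 1.0])

n = 30
x, w = leggauss(n); w = w / 2.0
P = orthonormal_legendre(n, x)

# Sample the first solution component on the tensor grid of Gauss points ...
X1 = np.array([[np.linalg.solve(A(a, c), b)[0] for c in x] for a in x])
# ... and form its tensor pseudospectral coefficients, equation (4.11).
C = (P * w) @ X1 @ (P * w).T

print(np.round(np.log10(np.abs(C[:8, 0])), 1))   # slow decay along alpha_1
print(np.round(np.log10(np.abs(C[0, :8])), 1))   # fast decay along alpha_2
```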
Notice how the included basis elements curve outward to capture the relatively large coefficients CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION74 50 50 45 40 40 35 α 2 30 α2 30 20 25 20 15 10 10 5 10 20 α 30 40 50 5 10 15 20 1 25 α 30 35 40 45 50 1 Figure 4.3: Tensor product pseudospectral coefficients (color) of a solution with weak interaction effects along with the included coefficients (black & white) of the nontensor weighted Galerkin approximation. corresponding to the interaction effects. 4.6.4 High Dimensional Problem Next we test our heuristic on a second order boundary value problem derived from the linear elliptic PDE with random coefficients developed in [67]. We seek the function u(t, s) that satisfies the equation d − dt du ad (t, s) = cos(t), dt √ √ t ∈ [0, 1], s ∈ [− 3, 3]d u(0, s) = u(1, s) = 0. (4.60) (4.61) The parameterized coefficient ad (t, s) is given by log(ad (t, s) − 0.5) = 1 + s1 where √ ζk = ( πL)1/2 exp √ πL 2 1/2 −(⌊ k2 ⌋πL)2 8 ! + d X ζk φk (t)sk (4.62) k=2 if k > 1 (4.63) CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION75 50 50 45 40 40 35 α 2 30 α2 30 20 25 20 15 10 10 5 10 20 α 30 40 50 5 10 15 1 20 25 α 30 35 40 45 50 1 Figure 4.4: Tensor product pseudospectral coefficients (color) of a solution with strong interaction effects along with the included coefficients (black & white) of the nontensor weighted Galerkin approximation. and k sin ⌊ 2 ⌋πt if k even, kLp φk (t) = ⌋πt ⌊ cos 2 if k odd. Lp (4.64) Let Lc represent the physical correlation length for a random field a, meaning the random variables a(t1 ) and a(t2 ) become essentially uncorrelated for |t1 − t2 | >> Lc . Then Lp in (4.64) can be taken as Lp = max(1, 2Lc ) and the parameter L in (4.62) and (4.63) is L = Lc /Lp . Under this set up, the expression in (4.62) approximates a random field with stationary covariance (t1 − t2 )2 Cov(log(a − 0.5))(t1 , t2 ) = exp − . L2c (4.65) It has been shown in [67] that the region of analyticity for u(t, s) with respect to sk grows as k increases. Thus, we expect that the Galerkin coefficients of u(t, s) associated with sk will decay at increasingly faster rates as k increases – a clear sign of anisotropic dependence on s. They also show in [67] that the anisotropy increases as the correlation length Lc decreases. We present numerical results for d = 5 with correlation length 3/4. CHAPTER 4. SPECTRAL METHODS — MULTIVARIATE APPROXIMATION76 0 10 −2 10 −4 Magnitude 10 −6 10 −8 10 −10 10 −12 10 −14 10 1 1 2 3 4 5 2 3 4 5 Order 6 7 8 9 Figure 4.5: Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with a full polynomial basis. We use the standard piecewise linear finite element approximation with 512 elements to discretize (4.60) in the t domain. This yields the parameterized matrix equation (K0 + Kr (s))x(s) = b. (4.66) The matrix K0 is the standard symmetric, tridiagonal and positive definite matrix that results from discretizing a one dimensional Poisson’s equation with coefficient 0.5. The parameterized matrix Kr (s) is symmetric and tridiagonal for all s; its elements depend nonlinearly but analytically on the parameters. We expect anisotropic parameter dependence in the solution, but no weak interaction effects. Therefore, we apply the weighted Galerkin method without the curvature parameters. To see the anisotropy more clearly, we first compute a Galerkin approximation with the standard full polynomial basis. 
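Before looking at the results, note that the coefficient (4.62)-(4.64) is the only problem-dependent piece of this setup, so it is worth writing out explicitly. The following sketch is a direct transcription of those formulas in Python (illustrative only; the constants should be checked against [67] before being relied upon). Evaluating it at the finite element nodes is what produces the parameterized tridiagonal matrix Kr(s) in (4.66).

```python
import numpy as np

def elliptic_coefficient(t, s, Lc=0.75):
    """The parameterized diffusion coefficient a_d(t, s) of (4.62)-(4.64) for d = len(s)."""
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    Lp = max(1.0, 2.0 * Lc)
    L = Lc / Lp
    log_a = 1.0 + s[0] * np.sqrt(np.sqrt(np.pi) * L / 2.0) * np.ones_like(t)
    for k in range(2, s.size + 1):
        zeta = np.sqrt(np.sqrt(np.pi) * L) * np.exp(-(np.floor(k / 2) * np.pi * L) ** 2 / 8.0)
        phi = (np.sin(np.floor(k / 2) * np.pi * t / Lp) if k % 2 == 0
               else np.cos(np.floor(k / 2) * np.pi * t / Lp))
        log_a = log_a + zeta * phi * s[k - 1]
    return 0.5 + np.exp(log_a)                   # since log(a_d - 0.5) is the expansion above

# d = 5 parameters on [-sqrt(3), sqrt(3)], correlation length 3/4, a few grid points in t.
t = np.linspace(0.0, 1.0, 5)
print(elliptic_coefficient(t, np.array([0.3, -1.0, 0.5, 0.2, -0.4])))
```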
In Figure 4.5 we plot the decay of the Galerkin coefficients corresponding to the main effects. The coefficients associated with the last two variables decay much faster than those of the first three, which is entirely expected by construction.

Figure 4.6: Decay of the coefficients associated with the main effects for the high dimensional elliptic problem computed with a spectral Galerkin method with an ANOVA-based weighted basis.

Using the adaptive, weighted basis, we find the anisotropic parameter dependence and exploit it by including more coefficients for main effects functions with slower coefficient decay. In Figure 4.6, we plot the main effects coefficients of the weighted approximation at n = 15. The weighted basis has actually overshot the optimal basis set by including more coefficients than necessary for the most important main effects. Thus, the weighting scheme may need some tuning for this particular problem. However, the general results are encouraging. The ANOVA-based weighted scheme found and exploited the anisotropy.

In Figure 4.7, we plot the weights that determine the included indices in the weighted basis as n increases. By construction, the smallest weight is always 1, which corresponds to the most important parameter (where "importance" means the one that contributes the most to the variability). The remaining weights indicate how often the weighted basis expands along the other coordinates as n increases. Notice that the weights converge to their proper values, and the two distinct convergence rates from the full polynomial basis (Figure 4.5) are clearly distinguishable in the separation of the weights.

Figure 4.7: Convergence of the weights in the ANOVA-based weighted scheme as n increases.

Finally, in Figure 4.8 we plot the residual error estimate from Theorem 13 for both the full polynomial basis and the ANOVA-based weighted basis against the number of terms in the approximation. The overshoot in the weights yields an approximation that is not as efficient as possible, but the ability to discover and exploit the anisotropy is very encouraging.

Figure 4.8: Convergence of the residual error estimate for both the ANOVA-based anisotropic basis and the full polynomial basis plotted against the number of basis elements in each approximation.

4.7 Summary

We have extended the univariate spectral methods presented in Chapter 3 to multivariate methods. The most natural extension to the multivariate case is via the tensor product bases, but the cost of computing these approximations grows exponentially with the number of parameters; this is the dreaded curse of dimensionality. To combat this curse, we propose an anisotropic multivariate Galerkin method that exploits anisotropic parameter dependence in the solution. The method uses the information from an approximate ANOVA decomposition to determine the most important parameters with respect to variability, as well as revealing the importance of the interaction effects.
The method is essentially iterative; at each iteration it uses the ANOVA information from the lower order approximation to construct an appropriate set of basis functions for the next higher order approximation. In this way, it uncovers many qualitative behaviors of the solution in an efficient way. We tested the method on a series of toy parameterized matrix equations to showcase its features, and we applied it to a standard problem from the literature with encouraging results. Since the method is based on heuristics, there are opportunities for improvement, which we will pursue in future research activities. Chapter 5 Strategies for Large-Scale Problems A spirited debate continues in the uncertainty quantification community amongst those who work with PDEs with random inputs over whether or not the more expensive Galerkin method holds any advantages over the decoupled, interpolatory pseudospectral method. More generally, the debate is between (i) nonintrusive methods (pseudospectral, collocation, Monte Carlo), which use only function evaluations at points in the parameter space yielding maximum code reuse, and (ii) intrusive methods which require additional coding effort to solve larger systems related to the original parameterized problem. Some argue that the system solved for the Galerkin typically contains fewer equations than those for the pseudospectral methods which results in savings from a linear solver perspective [77, 10]. Others contend that the Galerkin approximation produces more efficient and accurate approximations for a fixed polynomial order [21]. From a practical perspective, one can adaptively control the choice of basis functions with maximum flexibility from an intrusive framework [10, 9]. Nevertheless, the nonintrusive interpolation methods (collocation, pseudospectral) retain an asymptotic rate of convergence comparable to the intrusive Galerkin methods [6, 89], so that the dramatically simpler implementation often overcomes these stated advantages – especially when quickly devising experiments. For the case of parameterized matrix equations, the computational effort for a 80 CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 81 fixed order approximation is a choice between (i) many solves of the constant linear systems A(s0 )x(s0 ) = b(s0 ), s0 ∈ [−1, 1]d , (5.1) where the s0 are chosen according to a Gaussian quadrature rule, or (ii) a single solve of the system T ππ ⊗ A vec(X) = hπ ⊗ bi , (5.2) for a given basis set π(s). Regardless of the choice of intrusive Galerkin or nonintrusive pseudospectral methods, the scale of the computations required for accurate approximation – particularly for multiple parameters – is enormous. The nonintrusive varieties may require hundreds of thousands of function evaluations, whereas the system (5.2) for the Galerkin approximation maybe be intractably large and/or difficult to form. In this chapter, we will focus on large-scale problems of the Galerkin form (5.2) for two reasons. First, in Chapter 4 we presented an anisotropic approximation method based on the Galerkin framework where the basis elements were chosen adaptively. To be consistent with this method, we seek to solve the large systems corresponding to that scheme. Secondly, tackling the large-scale problem of solving many linear systems in the nonintrusive context can effectively be swept under the proverbial rug by massively parallel machines. 
In other words, it is a problem of capacity – solving a large number of small problems – instead of capability – solving one large problem. And if we assume that such solvers are optimally tuned/preconditioned for the structure of the system, then there is little to be improved upon from a broader, global perspective. However, numerous challenges arise when solving (5.2). Firstly, is it possible to reuse code in a sense similar to the nonintrusive methods? In Section 5.1, we propose a weakly intrusive paradigm, which allows for substantial code reuse when computing the Galerkin approximation and offers a much-needed middle ground in the continuing debate between intrusive and nonintrusive methods. Even if we can optimally choose the perfect basis for a finite term approximation, the size of the system (5.2) may still be daunting. If the size of the parameterized CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 82 matrix N or the number of basis functions |I| is large, then the system can quickly become unwieldy. It may not fit into memory, and/or the integrals may be very difficult to evaluate. In other words, this bare-bones formulation does not lend itself to large-scale problems. To alleviate this challenge, we analyze a variant of the Galerkin method known as Galerkin with Numerical Integration (G-NI) [16] in Section 5.2. The derivation of G-NI is identical to the standard Galerkin derivation but each integral is replaced by a tensor-product Gaussian quadrature formula. This simple change – which is often implemented in practice without analytical considerations – surprisingly begets a useful decomposition of the matrix ππ T ⊗ A . We will use this decomposition to place the G-NI method within the weakly intrusive paradigm, as well as derive useful insights into the system (5.2) including eigenvalue bounds, preconditioning strategies, and an elegant interpretation of the approximation problem as a weighted least-squares problem. 5.1 A Weakly Intrusive Paradigm The broadest interpretation of the terms intrusive and nonintrusive is straightforward: nonintrusive methods take advantage of existing codes or solvers (with possible pre- or post-processing) and intrusive methods require new codes or solvers to be written. To formally introduce an alternative paradigm, we need to dramatically restrict these interpretations. From the viewpoint of the parameterized matrix equations, we define a nonintrusive method as one that uses the vectors x(s0 ) = A(s0 )−1 b(s0 ) for some point s0 in the parameter space along with possible pre-processing (such as computing the Gaussian quadrature nodes) and post-processing (such as forming pseudospectral coefficients) computations. In words, we say that the nonintrusive methods may use evaluations (or samples) of the solution vector at a set of points in the parameter space. An intrusive method is one that does not use such point evaluations of the solution. In the case of the Galerkin method, it requires the solution of a larger related system of equations. But notice that sampling the solution is nowhere to be found in the method derivation. CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 83 The essential idea behind a weakly intrusive paradigm is that it allows evaluation of the parameterized matrix operator and the parameterized right hand side at points in the parameter space. Loosely speaking, if nonintrusive methods sample the solution, then weakly intrusive methods sample the operators, or strictly the action of the operators on a vector. 
More formally, we classify an algorithm as weakly intrusive if it requires only 1. matrix-vector products of the form A(s0 )y for a point s0 in the parameter space and a given N-vector y, 2. evaluation of the vector b(s0 ) for a point s0 in the parameter space, 3. additional pre- and/or post-processing. In this way, we are allowed to examine the action of the parameterized matrix throughout the parameter space by observing its effects on given vectors. The key to the weakly intrusive paradigm is that we restrict the algorithms to matrix-vector multiplies where the matrix is the parameterized operator evaluated at a point in the parameter space. We are essentially taking a page from methods for iterative linear solvers (such as MINRES [73]) and eigenvalue solvers (such as Lanczos’ method [50]) that use only matrix-vector products. In fact, we retain some advantages of such methods, including exploiting sparse structures in the matrices by only needing the nonzero elements. This can dramatically reduce memory requirements for the methods. The workhorse of the method is a function f (s0 , y) = A(s0 )y that computes the matrix-vector products given a d-dimensional point and an N-vector. By taking advantage of this interface, we can write reusable software libraries similar to existing iterative solver libraries [84, 8, 55] that require only matrix-vector multiplies. The development of libraries with a common interface will encourage the widespread acceptance of UQ methods. Software based solely on black-box nonintrusive methods, such as Sandia’s DAKOTA framework [25], cannot accomodate problem dependent algorithms. The weakly intrusive paradigm proposes a software interface that improves on this shortfall of purely nonintrusive methods. CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 84 An additional and crucial advantage of the looser requirements of a weakly intrusive method compared to a nonintrusive method is that one can estimate the residual error estimate from Theorem 13 using only matrix-vector products of the parameterized system evaluated at a set of Gaussian quadrature points. Such a posteriori error estimates are not computable in a purely nonintrusive framework. Notice that the nonintrusive methods can easily satisfy the restrictions of the weakly intrusive paradigm if the systems A(s0 )x(s0 ) = b(s0 ) are solved with a Krylovbased iterative method that uses matrix-vector products. In what follows, we derive the G-NI method and show how it can satisfy the weakly intrusive requirements, as well. 5.2 Galerkin with Numerical Integration (G-NI) The G-NI method is identical to the Galerkin method developed in Section 4.3 but we replace each integral h·i by a tensor product Gaussian quadrature rule h·in for some n = (n1 , . . . , nd ) (see equation (4.8)). The derivation is not repeated here, but follows exactly as in 4.3 using quadrature rules instead of integrals. The system of equations to be solved is then T ππ ⊗ A n vec(X) = hπ ⊗ bin , (5.3) where the solution vector vec(X) collects the coefficients of the G-NI approximation xgni (s) = X α∈I xα πα (s) ≡ Xπ(s) (5.4) for some set of multi-indices I ⊂ Nd . The conditions for equivalence are straightfor- ward generalizations of the theorems in Chapter 3, which we state here for completeness. Theorem 15 Given n = (n1 , . . . , nd ), the pseudospectral approximation with tensor product basis π n (s) is equal to the G-NI approximation with tensor product basis π n (s) using n-point tensor product Gaussian quadrature. 85 CHAPTER 5. 
STRATEGIES FOR LARGE-SCALE PROBLEMS Proof. The proof is a straightforward adaptation of the proof of Theorem 4. Theorem 16 Assume A(s) and b(s) contain only finite degree polynomials of maximum degree ma = (ma,1 , . . . , ma,d ) and mb = (mb,1 , . . . , mb,d ), respectively. Assume also that the basis π n (s) contains polynomials of maximum degree n, not necessarily a tensor product basis. For j = 1, . . . , d, define mj ≡ mj (nj ) ≥ max ma,j + 2nj − 1 2 , mb,j + nj 2 , (5.5) and let m = (m1 , . . . , md ). Then the G-NI approximation with basis π n (s) and mpoint tensor product Gaussian quadrature rule is equivalent to the Galerkin approximation with basis π n (s). Proof. This is a straightforward adaptation of the proof of Lemma 7 utilizing the polynomial exactness of the Gaussian quadrature formulas. In essence, Theorem 16 states that for operators A(s) and b(s) that depend at most polynomially on s, there is a Gaussian quadrature rule such that G-NI is equal to Galerkin (within numerical precision). For general analytic dependence on s, the error analysis is more subtle. However, from Theorems 15 and 16 we conjecture that the G-NI approximation converges at a rate comparable to the pseudospectral and Galerkin approximations for a properly incremented Gaussian quadrature rule and polynomial basis set. 5.2.1 A Useful Decomposition The true advantage of the G-NI method is not a superior rate of convergence, but a practical decomposition of the matrix ππ T ⊗ A n If we explicitly write the Gaussian quadrature formula, we get X T να (π(λα )π(λα )T ⊗ A(λα )) ππ ⊗ A n = α≤n−1 (5.6) 86 CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS Recalling the definition of νλ in equation (4.6), we define qα = π(λα )/kπ(λα )k2 , so that Let Q be the |I| × Q k X ππ T ⊗ A n = qα qTα ⊗ A(λα ). (5.7) α≤n−1 nk matrix whose columns are qα . Then we can write the factorization of the G-NI matrix as T ππ ⊗ A n = (Q ⊗ I)A(Λ)(QT ⊗ I), (5.8) where I is the N ×N identity matrix and A(Λ) is a block-diagonal matrix with N ×N blocks equal to A(λα ) for each α ≤ n − 1. The matrix Q has one row for each basis function from π(s) and one column for each α ≤ n − 1. By the orthogonality of π(s) and the polynomial exactness of the quadrature rules, the rows of Q are orthogonal. This implies that QQT = I, where I is the appropriately sized identity matrix. The columns of Q are, in general, not orthogonal. However, if the basis set π(s) is constructed as a tensor product of univariate bases (see equation (4.4)), i.e. π(s) = π n1 (s1 ) ⊗ · · · ⊗ π nd (sd ), (5.9) and if each nk is equal to the number of points in the univariate quadrature rules λnk that begets λ (i.e. as many points in the tensor grid as polynomials in the tensor basis), then Q becomes square and orthogonal. In this case, the equation (5.3) can be transformed to set of |I| = |λ| decoupled linear systems, each of size N × N. In fact, in this case, the computed coefficients are exactly equal to the coefficients of an interpolatory pseudospectral method with a tensor product basis (see Theorem 15). We will assume, however, that this is not the case, i.e. that the basis set π(s) is much smaller than the tensor product basis. If we define the diagonal matrix D = diag(Q(0, :)), then we can write the right hand side in (5.3) as hπ ⊗ biΛ = (QD ⊗ I)b(Λ) (5.10) where b(Λ) is a vector of the parameterized right hand side b(s) evaluated at the 87 CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS Gauss quadrature points. 
The scaling D is necessary to recover the weights of the quadrature formula. Notice that (5.10) are also the pseudospectral coefficients of b(s) corresponding to the basis elements in π(s). Using equations (5.8) and (5.10), we can rewrite the G-NI system (5.3) as (Q ⊗ I)A(Λ)(QT ⊗ I)vec(X) = (QD ⊗ I)b(Λ). (5.11) We will derive a number of interesting insights from this equation. 5.2.2 Eigenvalue Bounds Eigenvalue information is always a crucial component of analyzing linear systems and matrix operators. From the factorization (5.8), we can immediately derive bounds on the eigenvalues of the matrix ππ T ⊗ A n . Technically speaking, we need to restrict this statement to symmetric parameterized matrices A(s) that admit a full param- eterized eigenvalue decomposition. Such objects were studied at length in Kato’s seminal work [48], where he shows that parameterized matrices whose elements depend analytically on a single parameter have parameterized eigenvalues that also depend analytically on the parameter. This work was extended by Sun to systems that depend on several parameters [81]. We use this decomposition to prove the following theorem, which states that the eigenvalues of the G-NI matrix are always bounded by the extremes of the parameterized eigenvalues of A(s). Theorem 17 Assume that A(s) = X(s)Θ(s)X(s)T is the analytic, paramterized eigenvalue decomposition of the symmetric, parameterized matrix A(s), where s = (s1 , . . . , sd ). For any subset of the orthogonal basis π(s) and any n-point tensor product Gaussian quadrature rule (where n = (n1 , . . . , nd )) such that each nk is greater than the maximum degree of the basis polynomials corresponding to sk , min{θmin (A(s))} ≤ θmin s ππ T ⊗ A n ≤ θmax ππ T ⊗ A n ≤ max{θmax (A(s))}. s (5.12) CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 88 Proof. The elements of the matrix Q in the decomposition (5.8) are equal to the basis polynomials π(s) evaluated at the tensor product Gaussian quadrature points λn , i.e. Q has one row for each element of π(s) and one column for each point in λn . Let π n (s) be the tensor product basis corresponding to the d-tuple of integers n. By the assumption on the size of n relative to the maximum degree of π(s), each element of π(s) is contained in π n (s). Define π ′ (s) to be the elements of π n (s) not in π(s), and let Q′ be the matrix that has elements equal to the basis functions π ′ (s) evaluated at the tensor product Gaussian quadrature points λn – comparable to Q. Notice that the matrix Q̃ = " Q Q′ # (5.13) is square and orthogonal, i.e. Q̃−1 = Q̃T . Therefore the matrix (Q̃ ⊗ I)A(Λ)(Q̃T ⊗ I) is a similarity transformation with the block diagonal matrix A(Λ). Since A(s) is symmetric, each block of A(Λ) is also symmetric and admits a full eigenvalue decomposition, which we write as X(Λ)Θ(Λ)X(Λ)T where Θ(Λ) is the diagonal matrix of the parameterized eigenvalues Θ(s) evaluated at the Gaussian quadrature points. By construction, the matrix T ππ ⊗ A n = (Q ⊗ I)A(Λ)(QT ⊗ I) (5.14) (Q̃ ⊗ I)A(Λ)(Q̃T ⊗ I) = (Q̃ ⊗ I)X(Λ)Θ(Λ)X(Λ)T (Q̃T ⊗ I). (5.15) is a submatrix of So by an interlacing theorem [37, Theorem 8.1.7], the eigenvalues of ππ T ⊗ A n are bounded above by the largest of θ(λn ) and bounded below by the smallest of θ(λn ). To complete the proof, we state the obvious: min{θmin (A(s))} ≤ θ(Λ) ≤ max{θmax (A(s))}, s as required. s (5.16) CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 89 Depending on the problem, these bounds can be either tight or loose. 
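Both the factorization (5.8) and the bound of Theorem 17 are easy to confirm numerically on a small example. The sketch below (Python, a hypothetical symmetric 2 x 2 matrix in one parameter, Legendre weight assumed) first checks the weight identity (4.6) and the factorization in the square case, where the basis is the full set of polynomials matching the quadrature grid, and then checks the eigenvalue bound when the basis is a strict subset, so that the G-NI matrix is a genuine principal submatrix.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from numpy.polynomial import legendre as leg
from scipy.linalg import block_diag

def orthonormal_legendre(deg, s):
    """Legendre polynomials normalized against the uniform weight w(s) = 1/2 on [-1, 1]."""
    V = np.stack([leg.legval(s, np.eye(deg)[k]) for k in range(deg)])
    return V * np.sqrt(2.0 * np.arange(deg) + 1.0)[:, None]

# Hypothetical symmetric 2 x 2 parameterized matrix in a single parameter.
A = lambda s: np.array([[2.0 + s, 0.4], [0.4, 1.5 - 0.5 * s]])

n = 5                                                # basis size = number of Gauss points here
x, w = leggauss(n); w = w / 2.0                      # weights of the measure (1/2) ds
P = orthonormal_legendre(n, x)                       # column j holds pi(lambda_j)

# In this square case the Gauss weights equal 1/||pi(lambda_j)||^2, as in equation (4.6).
assert np.allclose(w, 1.0 / (P ** 2).sum(axis=0))

# Quadrature assembly of <pi pi^T (x) A>_n versus the factorization (5.8).
G = sum(w[j] * np.kron(np.outer(P[:, j], P[:, j]), A(x[j])) for j in range(n))
Q = P / np.linalg.norm(P, axis=0)                    # columns q_j = pi(lambda_j)/||pi(lambda_j)||
Gfac = np.kron(Q, np.eye(2)) @ block_diag(*[A(xj) for xj in x]) @ np.kron(Q.T, np.eye(2))
assert np.allclose(G, Gfac)

# Theorem 17: with fewer basis functions than points, the G-NI matrix is a principal
# submatrix of a matrix similar to A(Lambda), so its eigenvalues are bounded by the
# extreme eigenvalues of A at the quadrature points.
m, nq = 3, 9
xq, wq = leggauss(nq); wq = wq / 2.0
Pq = orthonormal_legendre(m, xq)
Gm = sum(wq[j] * np.kron(np.outer(Pq[:, j], Pq[:, j]), A(xq[j])) for j in range(nq))
gni = np.linalg.eigvalsh(Gm)
pointwise = np.concatenate([np.linalg.eigvalsh(A(xj)) for xj in xq])
print(pointwise.min() <= gni.min(), gni.max() <= pointwise.max())   # True True
```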
However, they can offer practical guidance originating from the parameterized A(s) when devising solver methods for the G-NI system. 5.2.3 A Least-Squares Interpretation The factorization (5.8) leads to a weighted least-squares interpretation of the G-NI system (5.11). First, we manipulate the right hand side. Define x(Λ) to be the solution vector evaluated at each Gauss point and stacked in the appropriate order. Then the right hand side of (5.11) becomes (QD ⊗ I)b(Λ) = (QD ⊗ I)A(Λ)x(Λ) = (Q ⊗ I)(D ⊗ I)A(Λ)x(Λ) = (Q ⊗ I)A(Λ)(D ⊗ I)x(Λ) Next we take the Cholesky factorization of A(Λ) = F (Λ)F (Λ)T , where F (Λ) is block-diagonal and each block is lower-triangular. Substituting this factorization into (5.11) with the manipulated right hand side we get (Q ⊗ I)F (Λ)F (Λ)T (QT ⊗ I)vec(X) = (Q ⊗ I)F (Λ)F (Λ)T (D ⊗ I)x(Λ). (5.17) Upon inspection we see that (5.17) is equivalent to the normal equations for the weighted least-squares problem vec(X) = argmin F (Λ)T (QT ⊗ I)vec(Y) − (D ⊗ I)x(Λ) 2 . (5.18) vec(Y) We can interpret this minimization problem in the following way. Notice that we can write Q = PD where the columns of P are equal to π(λ). Applying this change we get vec(X) = argmin F (Λ)T (D ⊗ I) (PT ⊗ I)vec(Y) − x(Λ) 2 . vec(Y) (5.19) 90 CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS The difference (PT ⊗ I)vec(Y) − x(Λ) exactly measures the difference between a truncated expansion evaluated at the Gaussian quadrature points λ and the true solution evaluated at λ. For the minimization problem, this difference is weighted by the operator. In other words, the G-NI coefficients X produce the truncated expansion with the smallest weighted mean-squared error at the Gaussian quadrature points. This interpretation may not lead to a realizable computation benefit as the largescale weighted least-squares solvers may yet be too expensive for Galerkin problems of interest, particularly if they require the computation of the Cholesky factors F (Λ). However, the derivation offers additional insights into the results of the G-NI approximation. 5.2.4 Iterative Methods A crucial observation follows from the factorization (5.8) that allows us to place the G-NI method within the weakly intrusive paradigm. Note that multiplying a vector against the matrix (5.8) can be accomplished in three steps. Suppose vec(Z) = (Q ⊗ I)A(Λ)(QT ⊗ I)vec(U). (5.20) Then, using the properties of the Kronecker product, 1. W = UQ. Let wα be a column of W. 2. For α ≤ n − 1, yα = A(λα )wα . Define Y to be the matrix with columns yα . 3. Z = YQT . Step 1 can be thought of as pre-processing, and step 3 as post-processing. Since each row of the matrix Q has a Kronecker structure corresponding to the tensor product quadrature rule, steps 1 and 3 can be computed accurately and efficiently using multiplication methods such as [27]. The second step requires only constant matrix-vector products where the matrix is A(s) evaluated at some point in the parameter space. The initialization procedure CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 91 for a Krylov-based iterative method uses the right hand side (5.10), and this is constructed with Gaussian quadrature needing only evaluations of b(s) at the quadrature points plus post-processing. Therefore, applying such iterative methods to compute the G-NI approximation satisfies the requirements for the weakly intrusive paradigm. 
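A minimal matrix-free realization of this three-step product might look as follows (Python with SciPy, a hypothetical symmetric system in one parameter; the columns of Q are scaled here by the square roots of the quadrature weights, which yields the same product as (5.8) even when the basis is smaller than the tensor grid). The parameterized operator enters only through the products A(λα)wα in step 2, which is exactly the interface described in Section 5.1, and the block-diagonal matrix I ⊗ L−1 with L = A(⟨s⟩) is the kind of preconditioner discussed in the next subsection.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from numpy.polynomial import legendre as leg
from scipy.sparse.linalg import LinearOperator, minres

def orthonormal_legendre(deg, s):                    # same helper as in the sketch above
    V = np.stack([leg.legval(s, np.eye(deg)[k]) for k in range(deg)])
    return V * np.sqrt(2.0 * np.arange(deg) + 1.0)[:, None]

# Hypothetical symmetric, positive definite 2 x 2 system in one parameter.
A = lambda s: np.array([[3.0 + s, 0.5], [0.5, 2.0 - 0.8 * s]])
b = lambda s: np.array([1.0, 1.0])
N, nbasis, nq = 2, 8, 12

x, w = leggauss(nq); w = w / 2.0
P = orthonormal_legendre(nbasis, x)
Q = P * np.sqrt(w)                                   # rows are orthonormal: Q Q^T = I
Ablocks = [A(xj) for xj in x]                        # in practice: a matrix-vector interface

def gni_matvec(u):
    """vec(Z) = (Q (x) I) A(Lambda) (Q^T (x) I) vec(U), computed by the three steps above."""
    U = u.reshape(nbasis, N).T                       # unstack vec(U) into an N x |I| matrix
    W = U @ Q                                        # step 1: W = U Q
    Y = np.column_stack([Ablocks[j] @ W[:, j] for j in range(nq)])   # step 2: local products
    return (Y @ Q.T).T.ravel()                       # step 3: Z = Y Q^T, restacked

# Right hand side <pi (x) b>_n, stacked in the same column order as vec(X).
rhs = ((np.column_stack([b(xj) for xj in x]) * w) @ P.T).T.ravel()

# Block-diagonal preconditioner I (x) L^{-1} with L = A(<s>) = A(0).
Linv = np.linalg.inv(A(0.0))
M = LinearOperator((N * nbasis,) * 2, matvec=lambda u: (Linv @ u.reshape(nbasis, N).T).T.ravel())
G = LinearOperator((N * nbasis,) * 2, matvec=gni_matvec)

vecX, info = minres(G, rhs, M=M)
X = vecX.reshape(nbasis, N).T                        # N x |I| matrix of G-NI coefficients
print(info, np.round(X[:, :4], 4))
```

Only the list Ablocks would change in a realistic setting: instead of storing dense matrices, step 2 would call whatever routine applies A(λα) to a vector, which is the point of the weakly intrusive interface.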
To reiterate, under this paradigm we can take advantage of a tuned, reusable interface for the matrix vector multiplies that will exploit any sparsity in the matrix to save memory and increase efficiency. 5.2.5 Preconditioning Strategies The number of iterations required to achieve a given convergence criterion (e.g. a sufficiently small residual) can be greatly reduced for Krylov-based iterative methods with a proper preconditioner. In general, preconditioning a system is highly problem dependent and begs for the artful intuition of the scientist. However, the structure revealed by the decomposition (5.8) of the G-NI system offers a number of useful clues. Suppose we have an N × N matrix L that is easily invertible. We can construct a block-diagonal preconditioner I ⊗ L−1 , where I is the identity matrix of size |I| × |I|. (Recall that |I| is the number of basis function in the G-NI approximation.) If we premultiply the preconditioner against the factored form of the G-NI matrix, we get (I ⊗ L−1 )(Q ⊗ I)A(Λ)(QT ⊗ I) = (Q ⊗ I)(I ⊗ L−1 )A(Λ)(QT ⊗ I). (5.21) Notice that by the mixed product property and commutativity of the identity matrix, the block-diagonal preconditioner slips past Q⊗I to act directly on the parameterized matrix evaluated at the quadrature points. The blocks on the diagonal of the inner matrix product are L−1 A(λα ) for α ≤ n − 1. In other words, we can choose one constant matrix L to affect the parameterized system at any point in the parameter space. A reasonable and popular choice is the mean L = hAi; see [76, 74] for a detailed analysis of this preconditioner for stochastic finite element systems. Notice that this is also the (1, 1) block of the Galerkin matrix and approximately the (1, 1) block of CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 92 the G-NI matrix. However, if A(s) is very large or has some complicated parametric dependence, then forming the mean system and inverting it (or computing partial factors) for the preconditioner may be prohibitively expensive. If the dependence of A(s) on s is close to linear, then L = A (hsi) may be much easier to evaluate and just as effective. One goal of the preconditioner is reduce the condition number of the matrix, and one way of achieving this is to reduce the spread of the eigenvalues. If we knew a priori which region of the parameter space yielded the extrema of the parameterized eigenvalues of A(s), then we could choose a parameter value in that space to evaluate some preconditioner related to A(s). Unfortunately, we only get one such evaluation. Therefore, if the largest possible value of he parameterized eigenvalues is very large, we may choose this parameter value. Alternatively, if the smallest eigenvalue is close to zero (for positive definite systems), then this may be a better option to reduce the condition number of the G-NI system. In any case, the structure of the G-NI matrix revealed in the factorization (5.8) offers many clues for constructing effective preconditioners to compute the coefficients of the G-NI approximation, regardless of the structure of the parameterized matrix A(s). 5.3 Parameterized Matrix Package — A MATLAB Suite We have implemented many G-NI variants in a functioning MATLAB suite of tools available on the online repository Github. This suite takes advantage of the common matrix-vector product interface in the weakly intrusive paradigm, and – unlike most stochastic finite element codes – does not require any symbolic integration to compute an approximate Galerkin solution. 
The suite also contains codes for computing pseudospectral approximations with tensor product bases, which is useful for verification when dealing with relatively small problems. The MATLAB codes, documentation, and a series of examples can be downloaded at http://github.com/paulcon/pimp/tree/master.

5.4 Application — Heat Transfer with Uncertain Material Properties

To conclude this chapter, we examine an application from computational fluid dynamics with uncertain model inputs. The codes (affectionately named Joe) used to compute the deterministic version of this problem – i.e. for a single realization of the model inputs – were developed at Stanford's Center for Turbulence Research as part of the Department of Energy's Predictive Science Academic Alliance Program; the numerics behind the codes are described in [75]. For this example, we made a slight modification to the codes which allowed the extraction of the non-zero elements of the matrix and right hand side used in the computation of the temperature distribution. Once we gained access to the linear system, we were able to apply the G-NI method to the stochastic version of the problem in the weakly intrusive paradigm to approximate the statistics of the solution.

5.4.1 Problem Set-up

The equation we are solving with Joe is the integral version of a two-dimensional steady advection-diffusion equation. We seek a scalar field φ = φ(x, y) representing the temperature defined on the domain Ω that satisfies

∫_∂Ω ρ φ v(s4, s5, s6) · dS = ∫_∂Ω (Γ(s1, s2, s3) + Γt) ∇φ · dS,   (5.22)

where ρ is the density, assumed constant, and the velocity v is precomputed by an incompressible Navier-Stokes model and then randomly perturbed by three spatially varying oscillatory functions with different frequencies whose magnitudes are respectively parameterized by s4, s5, and s6. The parameterized magnitudes are interpreted as random perturbations of the velocity field, which is simply an input to this model. The diffusion coefficient Γ = Γ(s1, s2, s3) is similarly altered by a strictly positive, parameterized, spatially varying function that models random perturbation. The turbulent diffusion coefficient Γt is computed using a Reynolds-Averaged Navier-Stokes (RANS) model.

The domain Ω is a channel with a series of cylinders; the computational mesh on the domain Ω contains roughly 10,000 nodes and is shown in Figure 5.1. Inflow and outflow boundary conditions are prescribed in the streamwise direction, and periodic boundary conditions are set along the y coordinate. Specified heat flux boundary conditions are set on the boundaries of the cylinders to model a cooling system.

Figure 5.1: Mesh used to compute the temperature distribution.

5.4.2 Solution Method

The goal is to compute the expectation and variance of the scalar field φ = φ(x, y, s1, . . . , s6) over the domain Ω with respect to the variability introduced by the parameters. We use the G-NI method to construct a polynomial approximation of φ along the coordinates induced by the parameters s1, . . . , s6. We employ the ANOVA-based heuristic proposed in Chapter 4 to choose an anisotropic basis of orthogonal polynomials, but we do not include the curvature parameters for the interaction effects. To solve the G-NI system, we use a BiCGstab [80] method since the matrix is not symmetric.

5.4.3 Results

The weights computed for the anisotropic basis (see equation (4.36)) are shown in Table 5.1.
The weights for coordinates 5 and 6 are orders of magnitude larger than 1, the weight for the most important coordinate.

γ1        γ2       γ3       γ4        γ5         γ6
1.00      6.79     3.67     16.40     4859.35    4265.45

Table 5.1: The weights computed with the ANOVA-based method for choosing an efficient anisotropic basis.

With respect to the basis, this implies that the polynomial expansion along the first coordinate must expand beyond degree 4859 before it expands at all along the 5th coordinate. This is effectively a dimension reduction. The ANOVA-based method finds that the 5th and 6th parameter coordinates do not contribute to the overall variability in the function compared to the first (and most important) coordinate. This dramatically reduces the number of basis functions required for the approximation. In Figure 5.2, we plot the growth of the number of terms in the weighted, ANOVA-based anisotropic basis compared to the growth of a full polynomial basis; we observe orders of magnitude difference in the number of basis functions between the two methods.

Using the full polynomial basis for comparison would have been infeasible. Therefore we plot the residual error estimate (Theorem 13) for the weighted basis as a function of the parameter n. This is shown in Figure 5.3. The sharper decrease (i.e. the stair-step behavior) in the convergence of the residual occurs when an important coefficient is added to the approximation. This signals that there are possible improvements to the heuristic for choosing the basis functions.

Finally, we show the expectation and variance of φ over the domain Ω in Figure 5.4. As expected, the variance in φ occurs in the downstream portion of the domain as a result of the variability in the diffusion coefficient.

Figure 5.2: The number of terms as n increases in the weighted, ANOVA-based anisotropic basis compared to the number of terms in the full polynomial basis.

Figure 5.3: The residual error estimate of the G-NI approximation with the weighted, ANOVA-based polynomial basis.

Figure 5.4: The expectation (above) and variance (below) of the temperature field φ over the domain.

5.5 Summary

In this chapter, we introduced a new paradigm for spectral methods for parameterized matrix equations dubbed weakly intrusive that offers a middle ground in the debate over the relative advantages of intrusive versus nonintrusive methods. The weakly intrusive paradigm – in analogy to Krylov-based iterative solvers – uses only matrix-vector products against the parameterized matrix evaluated at points in the parameter space, thus creating the possibility of code reuse and exploiting sparsity for memory and efficiency gains. Also, the weakly intrusive paradigm allows the computation of a residual error estimate for a given approximation of the solution x(s), which is not possible in purely nonintrusive methods.

We presented a variant of the spectral methods called Galerkin with Numerical Integration, or G-NI, which is equivalent to the Galerkin method except that we replace all integrals by Gaussian quadrature formulas. We showed how this simple change opens up many new avenues for analysis and interpretation of the results through an elegant factorization of the linear system of equations solved to compute the G-NI coefficients. This factorization allowed us to implement the G-NI method within the weakly intrusive paradigm.
It also allowed us to derive bounds on the eigenvalues of the G-NI matrix, which are useful for analysis of the method and connecting it to the original parameterized matrix equation. The factorization also suggested broad strategies for preconditioning the G-NI system for rapid convergence inside an CHAPTER 5. STRATEGIES FOR LARGE-SCALE PROBLEMS 98 iterative solver. We tested this method with the ANOVA-based anisotropic basis on an engineering application from computational fluid dynamics. We were able to reuse code developed for the associated deterministic problem and take advantage of a custom matrix-vector product interface. The ANOVA-based method found an appropriate anisotropic basis for the solution and showed rapid convergence of the residual error estimate with orders of magnitude fewer terms compared to the full polynomial basis. Chapter 6 An Example from Conjugate Heat Transfer In this chapter, we demonstrate the application of the spectral methods beyond the context of parameterized matrix equations. While the approximation of many engineering models reduces to solving appropriate linear system, many nonlinear models do not reduce in such a straightforward manner. Fortunately, we are not hindered by such difficulties when applying the nonintrusive pseudospectral/collocation methods. The work in this chapter has been published in [18]. In what follows, we examine the incompressible flow and heat transfer around an array of circular cylinders. In this case the momentum transport is decoupled from the energy equations and this allows us to derive a semi -intrusive method combining the advantages of intrusive and non-intrusive methods. The physical model is based on the two-dimensional Reynolds-averaged Navier-Stokes (RANS) equations completed by an eddy-viscosity turbulence model [63]. We introduce stochastic boundary conditions to account for uncertainties in the incoming flow and the thermal state of the cylinder surface. To approximate the statistics of the stochastic temperature field, we derive a hybrid uncertainty propagation scheme that applies (i) a spectral collocation method to the momentum equations and (ii) a spectral Galerkin method to the energy equation. The Galerkin form of the energy equation naturally decouples to a set of stochastic 99 CHAPTER 6. AN EXAMPLE FROM CONJUGATE HEAT TRANSFER 100 scalar transport equations, and the resulting system is developed within a commercial computational fluid dynamics code [47]. There has been a flurry of recent work applying both intrusive and non-intrusive techniques to stochastic flow models. We refer the curious reader to the following papers for further details [52, 83]. General hybrid propagation methods are also not without precedent; see [31] for a similar approach that integrates spectral expansion methods with a collocation-like procedure for an efficient solution approach. The chapter is structured as follows: Section 6.1 describes the problem in full detail including our modelling choices for the uncertain input parameters and the specific objectives of our computations. Section 6.2 derives the hybrid propagation scheme for the model. Section 6.3 presents the results and analysis of our numerical experiments. 6.1 Problem Description and Motivation To achieve higher thermal efficiency and thrust, modern gas turbine engines operate at high combustor outlet temperatures and, therefore, turbine blades undergo severe thermal stress and fatigue. 
Secondary cooling flow passages are built into each blade (Figure 6.1) and consist of turbulators, film cooling holes, tube bundles, and pins. These are mostly used in the narrow trailing edge region of the blade [59].

Figure 6.1: A turbine engine cooling system with a pin array cooling system.

In the present study we consider the flow and heat transfer around a periodic array of pins separated by a distance L/D = 1 (where D is the cylinder diameter). The flow conditions are assumed to be fully turbulent with a Reynolds number based on the incoming fluid stream (and D) of ReD = 1,000,000. In this regime, direct solutions of the Navier-Stokes equations are impractical due to the range of length and time scales, which results in an extremely large computational cost; we resort to Reynolds-averaged modeling. The problem is assumed two-dimensional and the computational domain is representative of a single row of aligned cylinders; x1 ∈ [−2, 10] and x2 ∈ [−2, 2] with a circle of radius 0.5 centered at the origin. Periodicity is enforced in the vertical direction with inflow and outflow conditions applied in the streamwise direction (see Figure 6.2).

Figure 6.2: Computational mesh for the two-dimensional cylinder problem.

Most numerical simulations of similar phenomena rely on simple thermal boundary conditions (constant temperature or constant heat flux) to evaluate the heat transfer characteristics of pin cooling devices. In realistic operating conditions, the overall surface thermal state is the result of an energy balance between convection and conduction in the fluid and in the solid. Therefore, accurate predictions of the heat transfer rates require the solution of the conjugate (solid-fluid) heat transfer problem. Instead of modeling the solid-fluid interactions directly [46], we introduce an uncertain heat flux on the boundary of the cylinder as a mild substitute; the precise formulation is given below.

Another simplification that is typically invoked in the design of blade cooling systems is to ignore the interactions between the various components (turbulators, slots, etc.) and optimize their performance independently. The pins, in particular, are the last stage of the cooling system and are, therefore, more strongly affected by flow distortions introduced upstream. To investigate the effect of inflow perturbations, we model the uncertain inflow as a linear combination of oscillatory functions with different wavelengths and random amplitudes; the precise formulation is given below.

6.1.1 Mathematical Formulation

The governing equations are the two-dimensional RANS equations written under the assumption of incompressible fluid and steady flow. The conservation of mass, momentum and energy can be written in indicial notation as:

∂Ui/∂xi = 0,   (6.1)

Uj ∂Ui/∂xj = −(1/ρ) ∂P/∂xi + ∂/∂xj [ (ν + νt) ∂Ui/∂xj ],   (6.2)

Uj ∂T/∂xj = ∂/∂xj [ (κ + νt/Prt) ∂T/∂xj ],   (6.3)

where the density (ρ), the molecular viscosity (ν) and the thermal conductivity (κ) are given properties of the fluid and assumed constant. The eddy viscosity (νt) is computed using the k-ω turbulence model [63], and the turbulent Prandtl number (Prt) is assumed to be a constant. Under the assumption of incompressible flow, the energy equation is decoupled from the momentum equation and can be solved after the velocity field is computed.
6.1.2 Uncertainty sources

We assume that the sources of uncertainty are the specification of the velocity boundary condition on the incoming flow – the effect of the upstream components – and the definition of the thermal condition on the surface of the cylinder – the effect of the conductivity of the pin. Let s1, s2, and s3 be independent random variables on an appropriate probability space, each uniformly distributed over the interval [−1, 1]. The uncertain boundary conditions are prescribed by continuous functions of si, i = 1, 2, 3, and are therefore random variables themselves.

The inlet velocity profile is constructed as a linear combination of two cosine functions of x2 ∈ [−2, 2], i.e.

U|inlet(x2, s1, s2) = 1 + σ1 (s1 cos(2πx2) + s2 cos(10πx2)),   (6.4)

where σ1 controls the inflow velocity fluctuations. For the numerical experiments, we set σ1 = 0.25, which ensured that the amplitude of the random fluctuations did not cause the inlet velocity to become negative. This model allowed moderate random fluctuations (at most 25%) about a mean value, ⟨U|inlet⟩ = 1. The wave numbers 2 and 10 in (6.4) were chosen to introduce low and high frequency fluctuations, respectively.

The heat flux is specified as an exponential function of s3 over the cylinder wall, namely

∂T/∂n |cyl (θ, s3) = e^(−(0.1+σ2 s3)(cos(θ)/2)),   (6.5)

where n is the normal to the cylinder and σ2 controls the influence of s3. The angle θ ∈ [0, π] is the angle away from the front of the cylinder, as shown in Figure 6.3. For the numerical experiments, we chose σ2 = 0.05. This model prescribes a larger heat flux at the left side of the cylinder where the flow strikes it; the realization of s3 determines precisely how much greater. The maximum variability due to s3 is approximately 2.5%.

Figure 6.3: Schematic of uncertain inflow conditions. The arrows represent the stochastic inflow conditions, and the shading represents the heat flux on the cylinder wall.

The one-way coupling in equations (6.2) and (6.3) implies that the heat flux boundary condition parameterized by s3 has no effect on the velocity field. Therefore we can write Ui = Ui(x, s1, s2) for i = 1, 2. However, the temperature field does depend on the variability in the velocity inflow specification, which we denote by T = T(x, s1, s2, s3). This observation is crucial to the derivation of the hybrid method in Section 6.2.

Remarks

Our approach to modeling input uncertainties is admittedly ad hoc. For a real application – instead of a problem that is simply motivated by a real application – the parameters of a stochastic model would be estimated from experimental data, e.g. using a procedure such as [32].

6.1.3 Objective

We are interested in the effects of the input uncertainties on the temperature distribution around the cylinder wall. To this end, we desire the variance of temperature as a function of θ around the cylinder. In other words, we wish to compute

σT²(θ) ≡ Var[ T|cyl(θ) ] = ⟨ ( T|cyl(θ, s1, s2, s3) − µT(θ) )² ⟩,   (6.6)

where µT(θ) = ⟨ T|cyl(θ) ⟩. To approximate σT², we use a hybrid stochastic Galerkin/collocation scheme described in the next section.

6.2 A Hybrid Propagation Scheme

In general, the spectral Galerkin method requires the solution of a large, coupled system of equations to solve for the coefficients of the global expansion.
In contrast, the collocation method requires the evaluation of the parameterized model at a discrete set of points, similar to sampling methods such as Monte Carlo; thus the collocation method can typically be implemented in a nonintrusive fashion. The difference in the approximation given by each approach, known as aliasing error, typically decays like the approximation error, i.e. exponentially fast, for linear problems [16].

In what follows, we apply a Galerkin method to the energy equation (6.3) and a collocation method to the nonlinear momentum equation (6.2). A non-aliased Galerkin formulation of the full RANS equations would introduce a large, coupled system for the coefficients of the Galerkin approximation because of the non-linear convective operators in the momentum, the energy, and the turbulence transport equations. This greatly complicates the solution procedure and cannot be accomplished within the framework of a commercial CFD code. The present approach represents an attempt to retain an efficient Galerkin formulation for the (linear) energy transport while relying on a collocation formulation for the non-linear momentum equation. The result is a semi-intrusive hybrid scheme that takes advantage of the flexibility in the commercial software used to solve the flow problem.

Remark

The convergence of both Galerkin and collocation depends on the smoothness of the quantities of interest with respect to the parameters. We assume that the relatively small and bounded range of variability in the boundary conditions ensures that the solution satisfies such a smoothness assumption and does not introduce any singularities in the solution within the parameter space.

6.2.1 Galerkin method for the energy equation

To solve the Galerkin form of the energy equation (6.3), we express the Galerkin approximation of temperature TN as an orthogonal expansion in s3:

TN = TN(x, s1, s2, s3) = Σ_{k=0}^{N} Tk(x, s1, s2) πk(s3),   (6.7)

where the πk(s3) are the normalized Legendre polynomials. For notational convenience, we write this in vector notation as

TN = T^T π(s3),   (6.8)

where T is a vector of the expansion coefficients and π(s3) a vector of the Legendre basis polynomials. Then, by projecting the energy equation onto each basis polynomial and requiring the residual to be orthogonal to the approximation space, we can write the Galerkin form as

⟨ Uj ∂/∂xj (T^T π(s3)) π(s3)^T ⟩ = ⟨ ∂/∂xj [ (κ + νt/Prt) ∂/∂xj (T^T π(s3)) ] π(s3)^T ⟩,   (6.9)

subject to the boundary conditions

⟨ ∂/∂n (T^T π(s3)) π(s3)^T ⟩|cyl = ⟨ e^(−(0.1+σ2 s3)(cos(θ)/2)) π(s3)^T ⟩.   (6.10)

Note that the projection is with respect to only s3. By the linearity of the expectation operator and the orthonormality of the basis, equation (6.9) reduces to a set of uncoupled scalar transport equations for the coefficients of the Galerkin approximation TN,

Uj ∂Tk/∂xj = ∂/∂xj [ (κ + νt/Prt) ∂Tk/∂xj ],   k = 0, . . . , N,   (6.11)

each subject to boundary conditions on the cylinder wall given by

∂Tk/∂n |cyl = ⟨ e^(−(0.1+σ2 s3)(cos(θ)/2)) πk(s3) ⟩.   (6.12)

(Recall that the subscript on U denotes the spatial coordinate while the subscript on T denotes the coefficient in the Galerkin expansion.) Note that the velocity components Uj and the temperature expansion coefficients Tk are functions of the spatial variables x and the parameters s1 and s2.
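To illustrate the kind of computation behind the right hand side of (6.12), the following MATLAB sketch approximates the projected flux coefficients ⟨e^(−(0.1+σ2 s3)(cos θ/2)) πk(s3)⟩ with a Gauss-Legendre rule built from the Jacobi matrix of the normalized Legendre polynomials. The number of quadrature points and the sampling of θ are arbitrary choices made for the sketch, not values taken from the computations reported in this chapter.

% Gauss-Legendre nodes and weights from the Jacobi matrix (Golub-Welsch),
% normalized so that the weights sum to 1 (uniform density on [-1,1]).
m    = 20;                                 % quadrature points (arbitrary for the sketch)
k    = (1:m-1)';
b    = k ./ sqrt(4*k.^2 - 1);              % three-term recurrence coefficients
[V, D] = eig(diag(b, 1) + diag(b, -1));
pts  = diag(D);                            % nodes in (-1, 1)
wts  = V(1, :)'.^2;                        % weights summing to 1

% Normalized Legendre polynomials pi_0, ..., pi_N evaluated at the nodes.
N = 4;
P = zeros(m, N+1);
P(:, 1) = 1;
P(:, 2) = sqrt(3)*pts;
for kk = 2:N
    bk  = kk / sqrt(4*kk^2 - 1);
    bk1 = (kk-1) / sqrt(4*(kk-1)^2 - 1);
    P(:, kk+1) = (pts .* P(:, kk) - bk1 * P(:, kk-1)) / bk;
end

% Projected heat-flux coefficients of (6.12) as functions of theta.
sigma2 = 0.05;
theta  = linspace(0, pi, 181)';                          % wall angles (arbitrary sampling)
flux   = exp( -(cos(theta)/2) * (0.1 + sigma2*pts') );   % samples of the flux model (6.5)
g      = flux * diag(wts) * P;                           % g(:, k+1) approximates < e^{...} pi_k(s3) >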
Thus, by exploiting the one-way coupling in the RANS model, we have effectively replaced the random variable s3 by a set of N + 1 parameterized transport equations. We can treat the new system of equations (momentum plus scalar transports) with a collocation method in two dimensions.

6.2.2 Collocation method for the modified system

Following the prescribed collocation algorithm, we evaluate the solution of the model (6.2) and (6.11) at a discrete set of points within the range of the random variables s1 and s2. We choose a tensor grid of Gauss-Legendre points with M + 1 points in each direction. In other words, we solve (M + 1)² deterministic systems given by equations (6.2) and (6.11), so that the parameterized solution is exact at each point (λ1^(i), λ2^(j)), for i, j = 0, . . . , M, in the two-dimensional Gauss-Legendre grid.

An important part of any implementation of a non-intrusive propagation technique is the deterministic solver. For each point (λ1^(i), λ2^(j)), we employ the commercial software package Fluent [47] to solve for the temperature coefficients Tk and velocity fields U1 and U2 in the modified RANS equations. Fluent uses a finite volume second-order discretization on unstructured grids. The mesh has been generated to achieve high resolution of the boundary layer on the cylinder surface with y+ ≈ 1. We performed preliminary simulations to assess the resolution requirements for the present problem. Each deterministic solve is converged to steady state by ensuring that the residuals of all the equations are reduced by four orders of magnitude.

We are not interested in the response surface of the temperature as a function of s1, s2, and s3 – only its variance. Therefore we do not need to construct the interpolant through the collocation points. Instead we approximate the variance of temperature as a function of θ with the following steps.

1. For each point (λ1^(i), λ2^(j)) in the tensor grid of Gauss-Legendre points, solve for the velocity fields U1^(i,j) and U2^(i,j).

2. For k = 0, . . . , N, solve the scalar transport equation for Tk^(i,j) using the result from the velocity computation.

3. Compute the approximate mean and variance of temperature as

µT(θ) ≈ Σ_{i=0}^{M} Σ_{j=0}^{M} T0^(i,j)(θ) wi,j ≡ µ̄T(θ),   (6.13)

σT²(θ) ≈ Σ_{i=0}^{M} Σ_{j=0}^{M} ( Σ_{k=0}^{N} (Tk^(i,j)(θ))² ) wi,j − µ̄T(θ)² ≡ σ̄T²(θ),   (6.14)

where wi,j is the weight corresponding to the two-dimensional Gauss-Legendre quadrature rule. This approximation follows from the variance formula (6.6) applied to the Galerkin approximation TN.

6.2.3 Analysis of computational cost

The hybrid approach benefits both from the increased accuracy of the Galerkin formulation and from a decreased computational cost. In this section we compare the cost of the hybrid method to a naive three-dimensional collocation method. Let C1 be the cost of one deterministic solve of the standard RANS equations (6.1)-(6.3), and let C2 be the cost of the modified system (6.1), (6.2), and (6.11) (both solved with Fluent). Assume that

C2 = α(N) C1,   (6.15)

where α(N) > 1. The cost of naive three-dimensional collocation, κc, is then C1 (M + 1)³, and the cost of the hybrid method, κh, is C2 (M + 1)² = α(N) C1 (M + 1)². Thus we have the following relation:

κh / κc = α(N) / (M + 1).   (6.16)

In the numerical experiments in Section 6.3, we found that N = 4 was sufficient for converged variance, and α(4) ≈ 2. The M required for converged variance was 18.
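Plugging these observed values into (6.16) gives a concrete estimate of the savings:

κh / κc = α(4) / (M + 1) ≈ 2 / 19 ≈ 0.105.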
Therefore the hybrid method is roughly ten times as efficient as a naive three-dimensional collocation. We note that if we had parameterized the stochastic heat flux boundary condition by more random variables to account for random spatial fluctuations, we would expect the savings to be even greater.

6.3 Results

In this section, we demonstrate numerical convergence of the approximate variance computed with the hybrid method, and we compare the results with a conventional Monte Carlo uncertainty propagation method. Following this verification, we make some remarks about the physical phenomenon described by the stochastic model.

6.3.1 Numerical convergence and verification

We first check convergence of the variance approximation from the hybrid method by increasing (i) the order of the Galerkin approximation TN and (ii) the number of points in the collocation scheme. Figure 6.4 displays the difference between the two-norms of the Galerkin approximations TN and TN−1 on the cylinder wall for two-dimensional tensor collocation schemes built from successive Gauss quadrature formulas. We set a tolerance of 10^−5 to be consistent with the convergence tolerance for each Fluent solve. The approximation achieves the chosen tolerance when N = 3 for each quadrature formula, and we include N = 4 to increase the confidence in the convergence. From these results we conclude that N = 4 is sufficient for converged results.

We then check the convergence of the quadrature rule by increasing the number of points M. Note that the total number of points in the two-dimensional rule is (M + 1)². We adjust the tolerance for the quadrature rule to 10^−4 to account for the increased effects of rounding due to quadrature. In Figure 6.5, we plot the difference in the variance computed with an M point rule and an M − 1 point rule. The values appear to converge after M = 10, and we compute the remaining points to ensure the results remain within the tolerance.

The relatively slow convergence of the quadrature versus Galerkin is not a result of any deficiencies in the method; the quadrature integrates along the coordinates that affect the nonlinear momentum equation. The nonlinearity in the computation of velocity yields substantial error in the quadrature which is absent in the Galerkin approximation of the linear energy equation.

Figure 6.4: Convergence of the variance of the Galerkin approximation TN as the number of terms in the expansion N increases. Each line represents the convergence for one quadrature rule. The convergence tolerance is 10^−5.

Figure 6.5: Convergence of the variance of the Galerkin approximation T4 as the number of points in the quadrature rule M increases. The convergence tolerance is 10^−4.

6.3.2 A physical interpretation

In Figures 6.6 and 6.7 we plot the approximate expectation µ̄T and variance σ̄T² as a function of θ around the top half of the cylinder wall. We compare these results to a Monte Carlo method with 10,000 samples. The purpose of this comparison is to verify the qualitative features of the hybrid results, which fare very well. In the expectation plot, the separation point of the flow is clearly identified by the sharp dip in the temperature. After the separation, the temperature rises again from
the impact of the recirculating flow in the wake of the cylinder.

The variance plot shows that the variability in temperature is largest at the front of the cylinder, i.e. the stagnation point of the flow. This results from two sources: First, the heat flux boundary condition (equation 6.5) is defined to have larger variability at the front of the cylinder. Second, the variability in the inflow conditions (equation 6.4) affects the formation of a thermal boundary layer around the front of the cylinder wall, thus modifying flow conditions at the stagnation point. In addition to a large temperature variance at the stagnation point, Figure 6.7 illustrates that the variability is considerably higher in the boundary layer upstream of the separation (θ ≈ π/2) than in the downstream area. This is expected since the variability in the upstream conditions does not directly penetrate the separated shear layer. In other words, the location of the flow separation is only a function of the Reynolds number and is not considerably altered by the different inflow conditions.

Figure 6.6: Approximate expectation as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.

Figure 6.7: Approximate variance as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.

We can qualitatively assess the effects of the inflow variability on the variance at the stagnation point by approximating the conditional variance on the cylinder wall given s1 = s2 = 0. In fact, we have computed this quantity already when we computed the solution corresponding to the central Gauss quadrature point (λ1 = 0, λ2 = 0). We approximate the conditional variance by

Var[T(x, s1, s2, s3) | s1 = s2 = 0] ≈ Var[TN(x, 0, 0, s3)] = Σ_{k=1}^{N} Tk(x, 0, 0)².   (6.17)

In Figure 6.8 we plot the conditional variance against the Monte Carlo variance on the wall of the cylinder. We note that the conditional variance is much smaller than the Monte Carlo variance, which suggests that the total variance in temperature has significant contributions from the variability in s1 and s2.

Figure 6.8: Approximate conditional variance at s1 = s2 = 0 as a function of θ around the cylinder wall computed with the hybrid method and Monte Carlo.

Chapter 7

Summary and Conclusions

In this final chapter, we summarize the development and contributions of this dissertation and discuss possible extensions and future work.

7.1 Summary of Results

The fundamental issue motivating this work is the need to understand the variability in the output of a physical model given a parametric representation for the model inputs. This question is explored in the burgeoning field of uncertainty quantification, which employs primarily probabilistic tools to quantify the output variability given a stochastic representation of the inputs in terms of a finite set of random variables or parameters. In many cases of interest, the computational procedure for exploring this relationship involves a spatial discretization of the differential operators in the physical model, where the elements of the discrete operators and forcing terms then depend on the newly introduced input parameters.
In other words, these models yield a parameterized matrix equation, where the objective is to approximate the vector-valued function that solves the parameterized equation or compute some derived statistics of the solution. Beyond this initial motivation, parameterized matrix equations appear in a wide range of applications including image processing, webpage ranking, circuit design, control problems, and recently an interpolation scheme for arbitrarily distributed data points. 114 CHAPTER 7. SUMMARY AND CONCLUSIONS 115 With its wide range of applicability and generality, we offer the easily stated parameterized matrix equation as a model problem for analysis and algorithm development for problems of interest in uncertainty quantification and beyond. Formally, we examine the equation A(s)x(s) = b(s), s ∈ [−1, 1]d , (7.1) where we assume that the elements of A(s) and b(s) depend analytically on the parameters s and the matrix A(s) is nonsingular for all s ∈ [−1, 1]d . In Chapter 2, we analyze this model problem and discuss characteristics of the vector-valued solution x(s). In particular, we show how using Cramer’s rule we can write each component of x(s) as a ratio of determinants of parameterized matrices. The assumptions then imply that each component of x(s) depends analytically on s as well. This rational structure reveals insight into the solution that we can use to understand the types of functions we will encounter in the approximation schemes. We take advantage of these insights in an informal discussion of singularities and their importance for computing statistics of x(s). If singularities exist within the parameter space, then some desired statistics may not exist (i.e. the integral quantities may be infinite). Other highlights from this informal discussion include noting the difficulties associated with unbounded parameter spaces, developing an intuition for the variability in x(s), and relating the position of singularities to parameter values outside the hypercube where A(s) is singular. The primary purpose of the Chapter 2 is to become familiar with sorts of functions and measurements we will approximate with the numerical schemes. In Chapter 3, we presented the univariate polynomial approximation schemes – known as spectral methods – for computing the desired statistics of x(s), where s is a single parameter. Broadly speaking, these methods construct a finite degree polynomial that globally approximates x(s) over the parameter space; approximate statistics are then computed from the constructed polynomial. These methods are well-studied in the context of numerical methods for partial differential equations, and theory of polynomial approximation is now considered classical. We briefly sketched the necessary background in orthogonal polynomials, Lagrange interpolation, Fourier CHAPTER 7. SUMMARY AND CONCLUSIONS 116 series for square-integrable functions, and Gaussian quadrature which underlies the spectral methods. We then derived a pseudospectral method that uses a Gaussian quadrature rule to approximate the coefficients of a truncated Fourier expansion of x(s). We show how the finite-term pseudospectral approximation interpolates x(s) at the Gaussian quadrature nodes, and we emphasized that computationally this method only requires the solution of the parameterized matrix equation at a finite set of parameter values. 
Therefore this procedure is categorized as non-intrusive, since one may repeatedly employ an optimally tuned solver for the matrix equation that results from choosing a parameter value. We also derived a spectral Galerkin method that finds the coefficients of a finite series approximation in the orthogonal polynomial basis such that the residual is orthogonal to the finite dimensional approximation space spanned by the basis polynomials. While the solution is optimal in the energy norm induced by the operator, computing its coefficients requires solving a linear system of equations that is n times larger than the original parameterized system, where n is the number of terms in the finite series. Unfortunately, we cannot exploit existing solvers for the parameterized system given a parameter value. Therefore the Galerkin method is categorized as intrusive. We rigorously compared the relative merits of both the Galerkin and pseudospectral methods in Chapter 3. Via this comparison, we uncovered a fascinating interpretation of the difference between the two methods in terms of the symmetric, tridiagonal Jacobi matrices of three-term recurrence coefficients of the orthogonal basis. In short, we found that the coefficients of both approximations solve a carefully truncated infinite system of linear equations. In the pseudospectral case, first the infinite Jacobi matrix is truncated and then the operator is applied to arrive at the necessary system. In the Galerkin case, the operator is applied to the infinite Jacobi matrix, and the resulting system is truncated. This result is new, and it extends the work of Golub and Gautschi [29, 38] on Gaussian quadrature and its relationship to the Jacobi matrices. As a corollary to this result, the conditions for equivalence between the two methods becomes immediately apparent. CHAPTER 7. SUMMARY AND CONCLUSIONS 117 After uncovering this relationship, we used classical theory to derive asymptotic error estimates for both methods in terms of the mean-squared norm. We found that both methods have the same asymptotic rate of convergence – which was well-known in the context of spectral methods for PDEs – and that rate is intimately tied to the location of the nearest singularity in the solution extended to the complex plane. Loosely speaking, the closer the singularity is to the region of interest, the slower the convergence of the polynomial approximation, which implies that an approximation may need many terms to achieve a fixed accuracy in the mean-squared norm. However, the asymptotic error estimates are typically not useful in practice. Therefore, we also derived a residual error estimate for a given approximation, which functions as an a posteriori error estimate. The comparable asymptotic convergence rates lead us to favor the nonintrusive pseudospectral method for its ease of implementation. We close Chapter 3 with two representative numerical examples and an application to the PageRank model for ranking nodes in a graph. The PageRank model includes a parameter, which we consider variable, and we used the pseudospectral method to compute the statistics of a random variant of PageRank [34]. In Chapter 4 we extended the univariate spectral methods to problems that depend on multiple parameters, which led to the challenging realm of multivariate approximation. The most natural extension of these methods is through a tensor product construction of the multivariate basis. 
This extension uses all possible products of the univariate basis functions for each parameter to construct the multivariate basis functions. The analysis for such a basis follows in a straightforward way from the univariate analysis, and all the connections related to the Jacobi matrices extend via Kronecker products. However, such a basis is highly impractical – and often computationally infeasible – since the number of basis functions increases exponentially as the number of parameters increases. We therefore turned to non-tensor bases in the hope of finding an accurate approximation method with significantly reduced computational cost. Standard multivariate polynomial approximation uses the so-called full polynomial basis, which is equivalent to using basis functions whose multi-index elements CHAPTER 7. SUMMARY AND CONCLUSIONS 118 sum to something less or equal than a given n. Comparing this to the tensor construction – where each element of the basis multi-indices must be less than or equal to n – we saw a direct analogy with norms on Rd . In particular, the full polynomial basis corresponds to a one norm restriction on the multi-indices while the tensor product basis corresponds to an infinity norm restriction on the multi-indices. We use this analogy to generalize the possible set of basis elements to a parameterized, weighted semi-norm restriction on the allowable multi-indices in a finite term polynomial approximation of x(s). This basis let us exploit any anisotropic parameter dependence in the solution. In other words, if some subset of the parameters contributes more to the overall variability of x(s) – as measured by the mean-squared norm – then this weighted basis can choose more basis elements to efficiently capture the effects of the important parameters and less basis elements for the unimportant parameters. To choose the weights of the semi-norm, we used the functional ANOVA decomposition of x(s) to determine the most important parameters as characterized by their main effects functions and interaction effects functions. We developed an iterative heuristic that computes a polynomial approximation for a fixed n, computes the ANOVA statistics for the approximation, transforms the ANOVA statistics to weights for the semi-norm restriction, and computes an approximation at n + 1 with this semi-norm. Through this procedure we choose the most important basis functions to represent a finite term approximation of x(s), where the importance is related to the magnitude of the associated coefficient, i.e. its contribution to the total variability. We tested this method on a series of simple problems built to showcase the merits of the method: The first problem has a solution that depends anisotropically on its two parameters, and the ANOVA-based method uncovers and exploits this anisotropy. The solution to the second problem has very weak interaction between its two input parameters, and the ANOVA-based method requires many fewer basis elements to capture the overall variability. The third problem’s solution has strong interaction effects, and the weighted basis chooses basis elements beyond a one norm restriction to efficiently approximate the solution. We also tested the weighted method on a problem from literature of an elliptic partial differential equation with coefficients CHAPTER 7. SUMMARY AND CONCLUSIONS 119 parameterized by five independent parameters. 
The method revealed the most important parameters and chose basis elements to capture the variability induced by those parameters. The purpose of Chapter 5 was to develop strategies for large-scale systems, where the scale may come from the size of the parameterized linear system or the number of parameters used in the model. The chapter opened by revisiting the question of intrusive versus nonintrusive and proposing a middle ground – dubbed weakly intrusive – as an alternative paradigm for algorithmic development. Methods developed in this paradigm are allowed (i) matrix-vector products, where the matrix is the parameterized matrix evaluated at a point in the parameter space and (ii) evaluations of the right hand side at points in the parameter space. This yields an analogy between the weakly intrusive paradigm and so-called matrix-free solvers, such as Krylov-based iterative methods. A helpful mnemonic for distinguishing the weakly intrusive paradigm goes: Nonintrusive methods sample the solution whereas weakly intrusive methods sample the operator. We presented a variant of the Galerkin method within this framework called Galerkin with Numerical Integration (G-NI), which is equivalent to the Galerkin method except that every integral is computed with a Gaussian quadrature rule. Through analyzing the resulting G-NI method, we derived a new and useful factorization of the matrix in the linear system solved to compute the G-NI coefficients. In short, the factorization separates the G-NI system into a matrix of orthogonal rows times a block-diagonal matrix – where each block is equal to the parameterized matrix A(s) evaluated at the Gaussian quadrature nodes – times the transpose of the matrix with orthogonal rows. This factorization yielded surprisingly useful insights into this system. In particular, we showed (i) how an iterative solver applied to the G-NI system can be implemented within the weakly intrusive paradigm, (ii) eigenvalue bounds on the G-NI system, (iii) a weighted-least squares interpretation of the G-NI coefficients, and (iv) how preconditioning strategies become apparent from the Kronecker structure of the factorization. We closed Chapter 5 with an application to a heat transfer problem in a channel with cylindrical obstructions. The input flow velocity was perturbed by introducing CHAPTER 7. SUMMARY AND CONCLUSIONS 120 three variable parameters, and the diffusion coefficient of the fluid was parameterized by an additional three parameters to represent uncertainties in the material properties. The G-NI method with the weighted basis was then applied to the solution of the scalar steady advection-diffusion equation to approximate the solution as a function of the parameters representing uncertainty. The weights computed by the method amounted to effective dimension reduction in the approximation, which resulted in dramatically fewer required basis functions. In this case, the full polynomial basis would have been infeasible due to the number of terms necessary for six parameters. We were able to take advantage of existing code for computing a related flow problem with given (unperturbed) inputs by writing a simple matrix-vector product interface to the code, thus working within the weakly intrusive paradigm. In Chapter 6, we extended the spectral methods beyond the parameterized matrix equation to a nonlinear model of conjugate heat transfer based on the ReynoldsAveraged Navier Stokes equations. 
We derived a hybrid collocation/Galerkin method to quantify the variability of the temperature on the boundary of a cylindrical obstruction in a channel flow given uncertainties in (i) the heat flux boundary condition and (ii) the inflow velocity specification. Essentially, we used a spectral collocation method for the nonlinear momentum equation and a spectral Galerkin method on the linear energy equation. The one-way coupling in the model (i.e. temperature depended on velocity but not vice versa) allowed us to decouple the equations for the parameterized Galerkin coefficients of the temperature. Thus, through this method we transformed the original two equation model to a system including the momentum equation and n scalar transport equations – one for each term in the Galerkin approximation. This yielded dramatic cost savings for the computation of approximate statistics. We verified all results by computing the change in computed solution over a series of approximations with an increasing number of terms. We computed the variance and conditional variance of the temperature on the cylinder wall and validated these computations with Monte Carlo estimates. We offered physical interpretations for the spatially varying variance, and posited that such a hybrid approach may be useful for more general models. CHAPTER 7. SUMMARY AND CONCLUSIONS 7.2 121 Future Work The work in this dissertation presented a complete and coherent picture of employing spectral methods to approximate the vector-valued solutions of parameterized matrix equations. Nevertheless, the possible extensions to this work are both broad and numerous. We outline some important extensions in what follows. This list is by no means exhaustive, but it does present directions with a clear initial path. 7.2.1 Improved Heuristics for Choosing a Polynomial Basis The idea of choosing an appropriate basis for a given function that exploits some prior known characteristics is common across many engineering heuristics for numerical approximation. Our ANOVA-based heuristic succeeded in discovering anisotropic parameter dependence and weak interaction effects, but the resulting approximations tended to include too many terms for the more important dimensions. Therefore there is room for improvement in the ANOVA heuristic to create a more balanced basis. In particular, the drawback of the iterative method with ANOVA is that it throws away information when collapsing the Galerkin coefficients to the ANOVA statistics. A strategy could be developed to use the all of the Galerkin coefficients directly without such a collapse. 7.2.2 Locating and Exploiting Singularities For some parameterized systems of interest, the singularities in the solution are dictated by some condition on the model, e.g. positivity of the coefficients of an elliptic equation. However, for many systems in practice the location of singularities is not known a priori, particularly if the parameterization represents variable geometry in some physical domain. Thus it could be very useful to develop heuristics for locating the singularities of the solution, which would then restrict the ranges of the parameters. In general, these would be non-convex optimization procedures with no guarantees of success. However, if they did succeed, then the location of the singularities could potentially CHAPTER 7. SUMMARY AND CONCLUSIONS 122 be used to construct singularity-matching rational basis functions. And this could reduce number of terms necessary for an accurate approximation. 
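One possible starting point for such a heuristic is sketched below in MATLAB: search for parameter values where the smallest singular value of A(s) approaches zero, using a generic derivative-free local optimizer. The 2-by-2 matrix is a toy placeholder, and the local search carries no guarantee of finding the global minimizer, in line with the caveats above.

% Heuristic search for a nearby singularity of A(s): minimize the smallest
% singular value of A(s) over the parameter s. A (near-)zero minimum flags a
% parameter value where A(s) is (nearly) singular.
Afun   = @(s) [2 + s(1), s(2); s(2), 0.5 + s(1)];   % toy parameterized matrix
sigmin = @(s) min(svd(Afun(s)));

[s_star, val] = fminsearch(sigmin, [0; 0]);         % local, derivative-free search
if val < 1e-8
    fprintf('A(s) is nearly singular at s = (%g, %g)\n', s_star(1), s_star(2));
end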
7.2.3 Nonlinear Models In a sense we have restricted the applicability of our analysis to linear differential equations whose discretizations conveniently beget matrix equations. By examining general (nonlinear) parameter dependence in the parameterized matrix, we took an important step toward analyzing nonlinear parameterized models. We expect that the analysis techniques that use the Jacobi matrices can be extended to nonlinear models as well. Additionally, we can analyze a parameterized Newton’s iteration for approximating the solution of a nonlinear equation. In this framework, each iteration of the Newton’s method would involve solving a parameterized matrix equation. The difficulty in the analysis lies in the fact that the parameterized matrix at a given iteration typically depends on the solution computed at the previous iteration. Therefore, a worst-case error analysis would see errors in the polynomial approximations accumulating at each iteration. Nevertheless, this idea is worth pursuing in future work, even if it only yields heuristics. 7.2.4 Software and Benchmark Problems One way to popularize a method is to write an accessible software implementation. Such software is necessary for the continued development of methods for uncertainty quantification, and the weakly intrusive paradigm provides an elegant algorithmic framework for such a coding endeavor. By stipulating the interface for the matrixvector product for a given parameter value, we can implement of a host of algorithms for solving problems that involve parameterized matrix equations. Ideally, this could result in a flexible, cohesive, and object-oriented library of implementations that maximize code reuse. Another need in this vein is for a suite of benchmark problems for testing proposed approximation schemes. As it stands, many methods are supposedly widely CHAPTER 7. SUMMARY AND CONCLUSIONS 123 applicable, but there is no standard set of problems on which to compare methods. Simply proposing the parameterized matrix equation as a model problem encourages the development and consideration of a set of test problems. Each problem could be written with a common matrix-vector product interface, which would result in easy comparison of methods. 7.2.5 Alternative Approximation Methods As discussed at length, the success of the spectral methods depends on the distance of the solution singularities in the complex plane to the parameter region of interest. If this distance is small, then perhaps spectral methods are not ideal, i.e. they may require too many terms to be efficient. An alternative to the spectral methods – which can apply to approximating many types of unknown functions – is to develop a parameterized Lanczos-Stieljes method, which takes full advantage of the matrix structure of the problem. Stieljes method constructs the three-term recurrence coefficients for the polynomials orthogonal with respect to a given measure. Lanczos’ procedure is a basic component of many Krylov-based iterative solvers as well as iterative eigenvalue solvers. It can also be used to estimate the condition number of a matrix. Such varied utility is rare for a single idea. If this idea can be extended to the parameterized case, then perhaps some of that varied utility can extend, as well. The combination of Lanczos and Stieljes may yield a unique interpretation of the parameterized matrix as some type of weight function, and this interpretation could have powerful consequences for both rigorous analysis and general intuition. 
However, if we remain in the spectral methods realm and accept that we will sometimes need many coefficients for an accurate approximation, we may find some hope in the recently studied low-rank approximation methods. If the matrix of coefficients of the polynomial approximation can be legitimately assumed to have some low-rank structure – which is likely for many problems of interest – then methods such as alternating least squares can be used to directly approximate the matrix of coefficients in factored form. These methods currently have an active research community, and a cross-fertilization of ideas could be very fruitful. CHAPTER 7. SUMMARY AND CONCLUSIONS 7.3 124 Concluding Remarks The impetus for this dissertation came from a desire to test and compare the varieties of spectral methods on engineering problems with uncertain model inputs, and the first steps were to understand and implement the methods as derived and analyzed in the current literature. A series of simple test problems left us with some unanswered questions: (i)are the intrusive methods more accurate than the nonintrusive methods? (ii) how do we know if the order of polynomial approximation is sufficiently large? and (iii) how might we improve these methods given the scale of the computations? By posing the parameterized matrix equation, we were able to address these questions for a broad range of applications and set an anchor for analysis. In particular, the comparable convergence rates led us to favor the nonintrusive methods due to ease of implementation, except in the multivariate case where the basis functions can be from non-tensor sets. In the latter case, determining the cost is more complicated. To check the quality of a given finite term approximation, we derived a practical residual error estimate that behaves like the true error. As an improvement, we proposed an anisotropic approximation scheme in the Galerkin framework the uses heuristics to choose an appropriate basis set for a finite term approximation. In the end, we applied these analyses to engineering problems of interest to compute measures of confidence for the model outputs. Bibliography [1] AIAA. Guide for the Verification and Validation of Computational Fluid Dynamics Simulations. Number AIAA-G-077-1998. American Institute of Aeronautics and Astronautics, Reston, VA, 1998. [2] G. Alefeld and J. Herzberger. Introduction to Interval Computations. Academic Press, 1983. [3] G. E. B. Archer, A. Saltelli, and I. M. Sobol. Sensitivity measures, anova-like techniques and the use of bootstrap. Journal of Statistical Computation and Simulation, 58(2):99 – 120, 1997. [4] K. Avrachenkov, N. Litvak, and K.S. Pham. Distribution of pagerank mass among principle components of the web. In Proceedings of the 5th Workshop on Algorithms and Models for the Web Graph (WAW2007), pages 16 – 28, 2007. [5] Ivo Babus̆ka, Manas K. Deb, and J. Tinsley Oden. Solution of stochastic partial differential equations using galerkin finite element techniques. Computer Methods in Applied Mechanics and Engineering, 190(48):6359–6372, 2001. [6] Ivo Babus̆ka, Fabio Nobile, and Raul Tempone. A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Journal of Numerical Analysis, 45:1005 – 1034, 2007. [7] Ivo Babus̆ka, Raul Tempone, and George E. Zouraris. Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM Journal of Numerical Analysis, 42:800 – 825, 2004. 125 BIBLIOGRAPHY 126 [8] S. 