Chapter 3

A posteriori error estimation and adaptivity

3.1 Introduction

3.1.1 Review: The main questions

In this chapter we return to finite element approximation of solutions to Poisson's problem $-\Delta u = f$ in $\Omega$, where $\Omega$ is a polygonal or polyhedral domain. To set the stage, we briefly recall some observations we made in our computational prologue in Chapter 1. Let $u_h \in V_h$ be a Lagrange finite element approximation to $u$ on a shape-regular grid. Then:

1. Assume that $u \in H^s(\Omega)$. If we use a finite element space of degree $r$ and quasi-uniform grids of size $h$, then $\|u - u_h\|_{H^1(\Omega)} \le Ch^{\mu}\|u\|_{H^{\mu+1}(\Omega)}$, where $\mu = \min(r, s-1)$. The computational rates we observed in Chapter 1 align precisely with those that we predict from the $H^s$ regularity results obtained for polygonal and polyhedral domains in Chapter 2.
2. When solving the problem $-\Delta u = 1$ in two space dimensions, we were always able to recover optimal convergence rates $\mathrm{DOF}^{-r/2}$ by employing adaptivity.
3. When solving the problem $-\Delta u = 1$ in three space dimensions, adaptivity generally led to improved convergence rates, but we were not always able to recover optimal convergence rates $\mathrm{DOF}^{-r/3}$.

In summary, the results of Chapter 2 now allow us to make rigorously grounded theoretical predictions that align with experimental results when using quasi-uniform mesh refinement. However, even though we now understand the singularities produced on polyhedral domains, we still haven't rigorously explained how adaptive mesh refinement works to improve convergence rates in many cases, and why its ability to do so is sometimes limited. Note that there are two separate questions we may ask in this regard:

1. Given a function $u$ with a given singularity structure, is it in theory possible to construct a (possibly locally refined) mesh such that $\|u - u_h\|_{H^1(\Omega)} \lesssim \mathrm{DOF}^{-r/d}$?
2. Can we construct an algorithm such that, given the right-hand side $f$ and the boundary value problem $-\Delta u = f$, $u = 0$ on $\partial\Omega$, the algorithm returns a sequence of meshes that recovers the rate $\|u - u_h\|_{H^1(\Omega)} \lesssim \mathrm{DOF}^{-r/d}$? Or more generally, does the algorithm return a sequence of meshes which is in some sense the best possible sequence for approximating $u$?

The first question above is essentially a question in a priori error estimation, and many papers over the years have addressed it. In the interest of time we will mainly leave it alone for now, but may return to it later. The short answer is that, using weighted Sobolev spaces, one can prove that it is always possible to construct meshes that recover the optimal rate of convergence $\mathrm{DOF}^{-r/d}$, provided that one allows for anisotropic mesh refinement in three space dimensions. The latter point is critical, as anisotropic meshing can be much trickier than shape-regular meshing. Our main focus for now will be on the second question: Can we construct an algorithm that will automatically produce a sequence of meshes that optimally reduces the finite element error?

3.1.2 Vocabulary and a little history

We now recall some basic error estimation concepts and vocabulary. First, a (quasi-)optimality result states that a given numerical approximation is in some sense the best possible. Céa's Lemma is an example of an optimality result:
$$\|u - u_h\|_{H^1_0(\Omega)} = \inf_{\chi \in V_h}\|u - \chi\|_{H^1_0(\Omega)}.$$
An estimate is quasi-optimal if it expresses optimality up to a nonessential constant, e.g.,
$$\|u - u_h\|_{H^1(\Omega)} \le C\inf_{\chi \in V_h}\|u - \chi\|_{H^1(\Omega)},$$
where $C$ does not depend on $u$ or the mesh size parameter $h$. Such optimality results do not by themselves guarantee that a given method converges, much less that it converges with a given rate. They do, however, reduce the problem of showing such convergence to proving things about the finite element space; that is, they remove the PDE from the task of error estimation.
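Whether a sequence of meshes attains the rate $\mathrm{DOF}^{-r/d}$ from the questions above is usually checked empirically, as in the computations of Chapter 1. A minimal sketch, assuming errors $e_k$ and degree-of-freedom counts $N_k$ from successive computations are available: the observed exponent between runs $k$ and $k+1$ is $\alpha_k = \log(e_k/e_{k+1})/\log(N_{k+1}/N_k)$, to be compared with $r/d$. The function name is illustrative and the data are synthetic, not results from any computation.

```python
import math
from typing import List, Sequence


def observed_rates(dofs: Sequence[float], errors: Sequence[float]) -> List[float]:
    """Observed exponents alpha_k in  error ~ DOF^(-alpha)  between successive runs.

    alpha_k = log(e_k / e_{k+1}) / log(N_{k+1} / N_k).
    """
    if len(dofs) != len(errors) or len(dofs) < 2:
        raise ValueError("need at least two (dof, error) pairs of matching length")
    return [
        math.log(errors[k] / errors[k + 1]) / math.log(dofs[k + 1] / dofs[k])
        for k in range(len(dofs) - 1)
    ]


# Synthetic data (not from any computation) that decay exactly like DOF^(-1/2),
# the optimal rate r/d for r = 1, d = 2.
dofs = [100, 400, 1600, 6400]
errors = [n ** -0.5 for n in dofs]
rates = observed_rates(dofs, errors)
print(rates)  # each entry is 0.5 up to rounding
```

Measuring against DOF rather than $h$ is what makes the comparison meaningful for locally refined meshes, where no single mesh size exists.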
An a priori error estimate is one which estimates the finite element error in terms of a mesh parameter and the generally unknown continuous solution $u$. A standard example is the bound $\|u - u_h\|_{H^1(\Omega)} \le Ch^r|u|_{H^{r+1}(\Omega)}$. A priori estimates are quite useful in understanding whether a given method is effective, but are much less useful in assessing whether the output of a given computation is accurate. We have already seen a major reason why this is so: very often the solution $u$ is simply not smooth enough to make use of the estimate above. However, proving such estimates does ensure that the method will make full use of the polynomials at hand if the regularity of $u$ allows it. A priori estimates are also an important tool for testing codes, as one may insert a manufactured solution with known regularity and test whether the expected rate of convergence is observed in practice.

A priori error estimation has been studied intensively since the inception of the mathematical study of finite element methods in the late 1960s and early 1970s. By the late 1970s, basic error estimates had been established for a wide range of problems and methods, while quite detailed estimates (estimates in $L^p$ norms, local estimates, etc.) had been proved for relatively simple model problems.

An a posteriori error bound is one which estimates the error in a given finite element computation by means of some computable functional of the problem data, finite element solution, mesh, and finite element space:
$$\|u - u_h\|_{H^1(\Omega)} \le \mathcal{E}(f, u_h, \mathcal{T}_h, V_h), \tag{3.1}$$
where $\mathcal{T}_h$ is the mesh. We call such a functional $\mathcal{E}$ an a posteriori error estimator. In ideal cases there is no unknown constant in the error estimate, although one has to work harder to obtain such estimates. There are many different options in the literature for obtaining useful functionals $\mathcal{E}$ having the above form.
We say that the estimator $\mathcal{E}$ is reliable if (3.1) holds at least up to a nonessential constant, and that it is efficient if it also provides a lower bound for the error:
$$\mathcal{E}(f, u_h, \mathcal{T}_h, V_h) \le C\|u - u_h\|_{H^1(\Omega)}. \tag{3.2}$$
Reliability guarantees that if the quantity $\mathcal{E}$ is small, then so is the actual error. That is, reliability ensures that we do not underestimate the error. Efficiency, on the other hand, guarantees that $\mathcal{E}$ does not overestimate the true error. The effectivity index
$$\mathrm{eff} = \frac{\text{estimated error}}{\text{true error}}$$
is often used to assess the quality of a given error estimator $\mathcal{E}$. Ideally we have $\mathrm{eff} = 1$. In practice, some estimators guarantee that $\mathrm{eff} \to 1$ as $h \to 0$; this property is called asymptotic exactness. Other estimators guarantee that $c_1 \le \mathrm{eff} \le c_2$ on any mesh (that is, they are efficient and reliable on any mesh). Much research has been directed toward finding estimators that are both asymptotically exact and unconditionally reliable.

An adaptive finite element method is an iterative feedback loop of the form
$$\texttt{solve} \to \texttt{estimate} \to \texttt{mark} \to \texttt{refine}. \tag{3.3}$$
A rough definition of the modules is as follows. One first solves for the finite element solution $u_h$. In the estimate step, an estimator $\mathcal{E}$ is used to determine whether the error has reached a given user-defined tolerance. If not, one proceeds to the mark step. In this step a posteriori error indicators $\eta(T)$ are used to determine which elements in the mesh are "most responsible" for the error. Roughly speaking, a posteriori error indicators $\eta(T)$ are mesh functions which assign a nonnegative number to each element $T \in \mathcal{T}_h$. If $\eta(T)$ is relatively large, then $T$ is judged to be more responsible for the error on the given mesh. Thus in mark, some subset of the elements in $\mathcal{T}_h$ is "marked". In refine, the marked elements are refined (e.g., bisected) in order to produce a new mesh, and the feedback procedure is repeated.

Adaptive FEM are related to a posteriori error estimation, although the two tasks can also be separated.
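The feedback loop (3.3) can be organized as a driver that knows nothing about the individual modules. The sketch below is a minimal illustration, assuming hypothetical callables for solve, estimate, mark and refine and the common choice $\mathcal{E} = (\sum_T \eta(T)^2)^{1/2}$. In the toy usage a real PDE solve and estimator are replaced by a synthetic indicator $\eta(T) = |T|^2$ on intervals and a simple "mark everything close to the maximum" rule, purely to exercise the control flow; none of this is the marking or refinement strategy analyzed later.

```python
import math
from typing import Callable, List, Sequence, Tuple


def estimator(indicators: Sequence[float]) -> float:
    """E = (sum_T eta(T)^2)^(1/2) assembled from elementwise indicators eta(T)."""
    return math.sqrt(sum(eta * eta for eta in indicators))


def afem(mesh, solve: Callable, estimate: Callable, mark: Callable, refine: Callable,
         tol: float, max_iter: int = 50):
    """Generic solve -> estimate -> mark -> refine loop, stopped once E <= tol.

    Hypothetical module interfaces:
      solve(mesh) -> u_h
      estimate(mesh, u_h) -> indicators eta(T), one per element
      mark(mesh, etas) -> indices of the marked elements
      refine(mesh, marked) -> new mesh
    Returns the final mesh, the last discrete solution and a history of (#T, E).
    """
    history: List[Tuple[int, float]] = []
    u_h = None
    for _ in range(max_iter):
        u_h = solve(mesh)
        etas = estimate(mesh, u_h)
        E = estimator(etas)
        history.append((len(mesh), E))
        if E <= tol:
            break
        mesh = refine(mesh, mark(mesh, etas))
    return mesh, u_h, history


# Toy usage on [0, 1]: elements are intervals (a, b), "solve" does nothing and the
# synthetic indicator eta(T) = |T|^2 stands in for a real a posteriori estimator.
def bisect(mesh, marked):
    marked = set(marked)
    new_mesh = []
    for i, (a, b) in enumerate(mesh):
        if i in marked:
            m = 0.5 * (a + b)
            new_mesh.extend([(a, m), (m, b)])
        else:
            new_mesh.append((a, b))
    return new_mesh


final_mesh, _, hist = afem(
    [(0.0, 1.0)],
    solve=lambda m: None,
    estimate=lambda m, u: [(b - a) ** 2 for a, b in m],
    mark=lambda m, etas: [i for i, e in enumerate(etas) if e >= 0.5 * max(etas)],
    refine=bisect,
    tol=1e-2,
)
print(hist)
```

Keeping the modules as interchangeable callables mirrors the point made in the text that estimation and adaptivity can be studied separately.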
One might be interested in assessing the solution quality a posteriori on a fixed mesh. Alternatively, it is quite possible to adaptively refine the mesh without being too concerned about accurately assessing the error. However, in typical situations the local error indicators $\eta(T)$ are used to construct an error estimator. Very often we have $\mathcal{E} = \big(\sum_{T \in \mathcal{T}_h}\eta(T)^2\big)^{1/2}$.

A posteriori error estimation and adaptivity, although now fairly well understood, developed as subjects on the whole somewhat later than a priori theory. Early study of a posteriori error estimation was carried out, for example, in a 1978 paper of Babuška and Rheinboldt [5]. The topic continued to be studied in the 1980s and up to the present, with many improvements and innovations occurring over the years. The concept of adaptivity developed around the same time, and a 1984 paper of Babuška and Vogelius [6] gave a convergence analysis of an adaptive FEM for 1D problems. After this the next paper to appear on the subject was the landmark 1996 paper of Dörfler [24], which provided foundational ideas for the rigorous study of adaptive FEM that has blossomed over the two decades since. Binev, Dahmen, and DeVore in 2004 used ideas from nonlinear approximation theory to establish a notion of optimality for AFEM [12]. Stevenson's 2007 paper [38] established optimality of a standard (practical) AFEM. Finally, a mature theory for linear elliptic scalar problems was given by Cascon, Kreuzer, Nochetto, and Siebert in the 2008 paper [16], and many papers have since appeared which adapt that work's general framework to other situations. From a historical perspective it is interesting to note that adaptive finite element methods were used for many years before any in-depth understanding of their mathematical properties was gained.
This is probably due in part to the fact that their convergence can be observed by use of a posteriori error estimators. Another reason is that, as we shall see below, the ideas needed to analyze AFEM are in many ways quite different from those used to understand standard FEM using a priori error estimates.

3.2 Residual a posteriori error estimates and AFEM convergence

There are quite a number of types of a posteriori error estimates available in the literature. We first will prove residual-type a posteriori error estimates and look at their properties in some detail. We will then survey other types of a posteriori error estimators and discuss their relative advantages and drawbacks. Note that our purpose in studying residual estimators in depth is twofold: they will provide the foundation for our study of AFEM convergence theory, and they also provide the basis for understanding the properties of other types of estimators.

3.2.1 Preliminaries and technical tools

Let $\Omega \subset \mathbb{R}^d$ be a polyhedral domain. We generally think of $d = 2, 3$, but this is not necessary for our considerations. Let $\mathcal{T}_h$ be a simplicial decomposition of $\Omega$. As in Chapter 1, we assume that $\mathcal{T}_h$ is conforming in the sense that $T_1 \cap T_2$ is either empty or a shared subsimplex (edge, face, vertex, etc.) of both elements. In addition, we assume that $\mathcal{T}_h$ is shape-regular. That is, there are constants $c, C$ such that for each $T \in \mathcal{T}_h$, there exist balls having diameters $c\,\mathrm{diam}(T)$ and $C\,\mathrm{diam}(T)$ inscribed in and circumscribing $T$, respectively. Recall that shape regularity places some restrictions on mesh geometry but allows for substantial local grading (refinement). Let $h_T = |T|^{1/d}$ be the local mesh size. This definition of $h_T$ is perhaps unfamiliar, as typically in the finite element literature $h_T$ is defined as the diameter of $T$. These two definitions are equivalent for shape-regular grids, but the one we use is more convenient for technical purposes.
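To make the two notions of local mesh size concrete in two dimensions ($d = 2$), here is a minimal sketch computing $h_T = |T|^{1/2}$ and $\mathrm{diam}(T)$ for a triangle given by its vertices (the helper names are illustrative). For a well-shaped triangle the two quantities are comparable, while for a flattened triangle $h_T/\mathrm{diam}(T) \to 0$, which is why their equivalence requires shape regularity.

```python
import math
from itertools import combinations


def triangle_area(p0, p1, p2):
    """Area |T| of the triangle with vertices p0, p1, p2 (2D points)."""
    return 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def local_sizes(tri):
    """Return (h_T, diam T), where h_T = |T|^(1/d) with d = 2."""
    h_T = math.sqrt(triangle_area(*tri))
    diam = max(math.dist(p, q) for p, q in combinations(tri, 2))
    return h_T, diam


equilateral = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))
flat = ((0.0, 0.0), (1.0, 0.0), (0.5, 1e-6))
for name, tri in [("equilateral", equilateral), ("flat", flat)]:
    h_T, diam = local_sizes(tri)
    print(f"{name}: h_T = {h_T:.4f}, diam = {diam:.4f}, ratio = {h_T / diam:.4f}")
```

Since among triangles of a given diameter the equilateral one has the largest area, $h_T \le (\sqrt{3}/4)^{1/2}\,\mathrm{diam}(T)$ always holds; only the reverse bound needs shape regularity.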
Next we define a patch of elements about a given $T \in \mathcal{T}_h$:
$$\omega_T = \bigcup_{T' \in \mathcal{T}_h:\ T' \cap T \neq \emptyset} T'.$$
Finally, we shall consider standard Lagrange finite element spaces of degree $r$ on $\mathcal{T}_h$:
$$V_h = \{u \in C(\bar\Omega) : u|_T \in \mathcal{P}^r \ \forall T \in \mathcal{T}_h\}.$$
Note that we have not yet incorporated any boundary conditions into our definition of $V_h$.

Trace inequalities play an important role below. Note that on a reference element $\hat T$ (i.e., a simplex with vertices at the origin and the canonical unit directions $(0, \dots, 0, 1, 0, \dots, 0)$), we have for $1 \le p < \infty$
$$\|v\|_{L^p(\partial\hat T)} \lesssim \|v\|_{W^{1,p}(\hat T)}, \quad v \in W^{1,p}(\hat T).$$
Scaling this inequality to an arbitrary $T \in \mathcal{T}_h$ yields
$$\|v\|_{L^p(\partial T)} \lesssim h_T^{-1/p}\|v\|_{L^p(T)} + h_T^{1-1/p}\|\nabla v\|_{L^p(T)}. \tag{3.4}$$
Implicitly used in the above inequalities is the fact that if $v \in H^1(T)$, then the trace of $v$ lies in $L^2(\partial T)$ (and more generally $W^{1,p}(T)$ functions have traces in $L^p(\partial T)$).

We will also need the Bramble-Hilbert Lemma below. The Bramble-Hilbert Lemma on a single mesh element is quite standard in finite element texts (cf. [15]). The final form needed below (for element patches) can be found for example in [25, Theorem 7.1].

Lemma 3.2.1 Suppose that $\omega$ is either an element $T \in \mathcal{T}_h$ or an element patch $\omega_T$ corresponding to $T \in \mathcal{T}_h$. Then for $0 \le k \le r$, $k \le m \le r+1$, $1 \le p \le \infty$, and $u \in W^{m,p}(\omega)$,
$$\inf_{\chi \in \mathcal{P}^r}|u - \chi|_{W^{k,p}(\omega)} \le Ch_T^{m-k}|u|_{W^{m,p}(\omega)}. \tag{3.5}$$
Here $C$ depends on the shape regularity properties of $\mathcal{T}_h$ but not on $h_T$ or $u$.

We do not prove the Bramble-Hilbert Lemma here, but make a couple of notes on the proof. Note first that when $k = 0$ and $m = 1$ the Bramble-Hilbert Lemma reduces to a Poincaré inequality. One standard technique for proving the Bramble-Hilbert Lemma (used in the references given above) is essentially to generalize standard techniques used to prove Poincaré inequalities, namely averaged Taylor polynomials and potential theory. Another option is to assume the standard Poincaré inequality and then iterate it to "knock out" higher-derivative terms.
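The case $k = 0$, $m = 1$ of (3.5) can be checked numerically on a single one-dimensional element. A minimal sketch, assuming an interval element $(a, a+h)$ and using that the $L^2$-best constant approximation is the mean value $\bar u$: the ratio $\|u - \bar u\|_{L^2}/(h|u|_{H^1})$ should stay bounded as $h \to 0$. For intervals it is in fact at most $1/\pi$, the sharp Poincaré constant for mean-zero functions, and for smooth $u$ it tends to $1/\sqrt{12}$ (the value for linear functions). The helper names and the quadrature rule are illustrative choices.

```python
import math
from typing import Callable

# Three-point Gauss-Legendre rule on [-1, 1] (exact for degree <= 5).
_NODES = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def integrate(g: Callable[[float], float], a: float, b: float, pieces: int = 32) -> float:
    """Composite 3-point Gauss-Legendre approximation of the integral of g over [a, b]."""
    step = (b - a) / pieces
    total = 0.0
    for i in range(pieces):
        mid = a + (i + 0.5) * step
        total += 0.5 * step * sum(w * g(mid + 0.5 * step * t) for t, w in zip(_NODES, _WEIGHTS))
    return total


def poincare_ratio(u, du, a: float, h: float) -> float:
    """||u - mean||_{L2(a, a+h)} / (h |u|_{H1(a, a+h)}): the k = 0, m = 1 case of (3.5),
    with the infimum over constants attained by the mean value."""
    b = a + h
    mean = integrate(u, a, b) / h
    l2 = math.sqrt(integrate(lambda x: (u(x) - mean) ** 2, a, b))
    h1 = math.sqrt(integrate(lambda x: du(x) ** 2, a, b))
    return l2 / (h * h1)


ratios = {h: poincare_ratio(math.sin, math.cos, 0.3, h) for h in (1.0, 0.1, 0.01)}
for h, r in ratios.items():
    print(f"h = {h:5.2f}: ratio = {r:.6f}  (1/pi = {1 / math.pi:.6f})")
```

The constant does not blow up as $h \to 0$, which is exactly the $h_T^{m-k} = h_T$ scaling asserted by (3.5).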
Finally, we mention that shape regularity implies that $h_T \sim h_{T'} \sim \mathrm{diam}(\omega_T)$ for $T' \subset \omega_T$. That is, shape-regular meshes are locally quasi-uniform. In addition, shape-regular meshes possess a finite overlap property:
$$\#\{T' \in \mathcal{T}_h : T' \subset \omega_T\} \lesssim 1. \tag{3.6}$$

We next recall the Scott-Zhang interpolation operator [37]. Let $\mathcal{N}$ be the set of Lagrange nodes of $\mathcal{T}_h$. For a node $z \in \mathcal{N}$, let $\psi_z$ be the corresponding Lagrange basis function. That is, $\psi_z \in V_h$, $\psi_z(z) = 1$, and $\psi_z(z') = 0$ for $z' \in \mathcal{N}$, $z' \neq z$. We next attach to each $z \in \mathcal{N}$ a "control simplex" $T_z$. $T_z$ may be either an element in $\mathcal{T}_h$ or a $(d-1)$-dimensional subsimplex (face) of an element in $\mathcal{T}_h$. In either case, we require that $z \in T_z$. In particular, we use the following rules:

1. If $z \in \mathrm{int}(\Omega)$, then $T_z \in \mathcal{T}_h$ is any mesh element containing $z$.
2. If $z \in \partial\Omega$, then we choose $T_z \subset \partial\Omega$ to be any element face ($(d-1)$-dimensional subsimplex) lying on $\partial\Omega$ and containing $z$.

Finally, we let $\varphi_z \in \mathcal{P}^r(T_z)$ be dual to $\{\psi_z\}_{z \in \mathcal{N}}$ in the sense that for $z, z' \in \mathcal{N}$,
$$\int_{T_z}\varphi_z\psi_{z'} = \begin{cases} 1, & z = z', \\ 0, & z \neq z'. \end{cases} \tag{3.7}$$
Given $v \in H^1(\Omega)$, we now define the Scott-Zhang quasi-interpolant as
$$I_hv(x) = \sum_{z \in \mathcal{N}}\psi_z(x)\int_{T_z}v\,\varphi_z. \tag{3.8}$$
($I_h$ is referred to as a quasi-interpolant because it does not actually interpolate, or exactly reproduce, $v$ at a set of control points.)

Theorem 3.2.2 The Scott-Zhang interpolant $I_h$ acts as the identity on $V_h$. In addition, given $v \in W^{1,p}(\Omega)$ ($1 \le p \le \infty$), we have
$$\|I_hv\|_{L^p(T)} \lesssim \|v\|_{L^p(\omega_T)} + h_T\|\nabla v\|_{L^p(\omega_T)}, \qquad \|\nabla I_hv\|_{L^p(T)} \lesssim \|\nabla v\|_{L^p(\omega_T)}. \tag{3.9}$$
If we additionally assume that $v \in W^{1,p}_0(\Omega)$, then $I_hv \in W^{1,p}_0(\Omega) \cap V_h$, and
$$\|I_hv\|_{L^p(T)} \lesssim \|v\|_{L^p(\omega_T)}, \qquad \|\nabla I_hv\|_{L^p(T)} \lesssim \|\nabla v\|_{L^p(\omega_T)}. \tag{3.10}$$

Proof. We first show that $I_h$ is a projection (acts as the identity on $V_h$). Given $T \in \mathcal{T}_h$ or a subsimplex $T_z$, let $\mathcal{N}_T = \{z \in \mathcal{N} : z \in T\}$.
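We use (3.7) to compute that for $v \in V_h$, writing $v = \sum_{z'}v(z')\psi_{z'}$ and noting that only the $\psi_{z'}$ with $z' \in \mathcal{N}_{T_z}$ are nonzero on $T_z$,
$$I_hv = \sum_{z \in \mathcal{N}}\psi_z\int_{T_z}v\,\varphi_z = \sum_{z \in \mathcal{N}}\psi_z\int_{T_z}\varphi_z\sum_{z' \in \mathcal{N}_{T_z}}v(z')\psi_{z'} = \sum_{z \in \mathcal{N}}v(z)\psi_z = v.$$
We next need to bound norms of $\varphi_z$ and $\psi_z$. Clearly $\|\psi_z\|_{L^\infty(T_z)} \lesssim 1$, so employing inverse inequalities yields
$$\|\psi_z\|_{W^{k,p}(T)} \lesssim h_T^{-k+d/p}, \quad k = 0, 1. \tag{3.11}$$
Also, by equivalence of norms on the finite-dimensional space $\mathcal{P}^r(T_z)$ (as can be seen by transformation to a reference simplex),
$$\|\varphi_z\|_{L^1(T_z)} \simeq \sup_{\substack{v \in \mathcal{P}^r(T_z)\\ \|v\|_{L^\infty(T_z)} = 1}}\int_{T_z}\varphi_zv = \sup_{\substack{v \in \mathcal{P}^r(T_z)\\ \|v\|_{L^\infty(T_z)} = 1}}\int_{T_z}\varphi_z\sum_{z' \in \mathcal{N}_{T_z}}v(z')\psi_{z'} = \sup_{\substack{v \in \mathcal{P}^r(T_z)\\ \|v\|_{L^\infty(T_z)} = 1}}v(z) \le 1.$$
Standard inverse inequalities then yield
$$\|\varphi_z\|_{W^{k,p}(T_z)} \lesssim h_{T_z}^{-k-\dim(T_z)(1-1/p)}, \quad k = 0, 1. \tag{3.12}$$
We may then use (3.11) and (3.12) to compute that for $T \in \mathcal{T}_h$,
$$\|I_hv\|_{L^p(T)} = \Big\|\sum_{z \in \mathcal{N}_T}\psi_z\int_{T_z}v\,\varphi_z\Big\|_{L^p(T)} \le \sum_{z \in \mathcal{N}_T}\|\psi_z\|_{L^p(T)}\|v\|_{L^p(T_z)}\|\varphi_z\|_{L^q(T_z)} \lesssim \sum_{z \in \mathcal{N}_T}h_T^{d/p - \dim(T_z)/p}\|v\|_{L^p(T_z)}. \tag{3.13}$$
Here $\frac1p + \frac1q = 1$. If $T_z \in \mathcal{T}_h$, then $\dim(T_z) = d$ and $\|v\|_{L^p(T_z)} \le \|v\|_{L^p(\omega_T)}$. If $T_z$ is a face, then $\dim(T_z) = d - 1$, and we also use (3.4) to find that $\|v\|_{L^p(T_z)} \lesssim h_T^{-1/p}\|v\|_{L^p(\omega_T)} + h_T^{1-1/p}\|\nabla v\|_{L^p(\omega_T)}$. In either case,
$$h_T^{d/p - \dim(T_z)/p}\|v\|_{L^p(T_z)} \lesssim \|v\|_{L^p(\omega_T)} + h_T\|\nabla v\|_{L^p(\omega_T)},$$
which yields the first inequality in (3.9). If in addition $v \in W^{1,p}_0(\Omega)$, then all boundary terms fall out (since $v = 0$ on faces $T_z \subset \partial\Omega$), leaving only those $T_z$ for which $\dim(T_z) = d$. This then yields the first inequality in (3.10), since we no longer need a trace inequality to bound $\|v\|_{L^p(T_z)}$ for any $z$.

To obtain the second inequalities in (3.9) and (3.10), note that for any constant $A$, $\nabla I_hv = \nabla(I_hv - A) = \nabla I_h(v - A)$, since $I_h$ is the identity on $V_h$ and $A \in V_h$. Using an inverse inequality, the first inequality in (3.9), and the Bramble-Hilbert Lemma (3.5) with $k = 0$ and $m = 1$ yields, for appropriately chosen $A$,
$$\|\nabla I_hv\|_{L^p(T)} \lesssim h_T^{-1}\|I_h(v - A)\|_{L^p(T)} \lesssim h_T^{-1}\|v - A\|_{L^p(\omega_T)} + \|\nabla(v - A)\|_{L^p(\omega_T)} \lesssim \|\nabla v\|_{L^p(\omega_T)}.$$
□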
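The construction (3.7)-(3.8) can be made fully explicit in one space dimension with $r = 1$. A minimal sketch, assuming a partition $z_0 < z_1 < \dots < z_n$ of an interval: for an interior node $z_i$ we pick (one admissible choice) $T_z = [z_i, z_{i+1}]$, on which the dual function is $\varphi_z = (4\lambda_{\rm left} - 2\lambda_{\rm right})/h$ in terms of the two local hat functions; at a boundary node the control "face" is a single point, where the integral in (3.8) becomes point evaluation. The function returns the nodal values of $I_hv$; its name is hypothetical, and the 2-point Gauss rule makes the integrals exact only when $v$ is at most quadratic on each control element.

```python
import math
from typing import Callable, List, Sequence

# Two-point Gauss-Legendre rule on [-1, 1]: exact for polynomials of degree <= 3.
_GAUSS2 = ((-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0))


def _integrate(g: Callable[[float], float], a: float, b: float) -> float:
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return half * sum(w * g(mid + half * t) for t, w in _GAUSS2)


def scott_zhang_p1(nodes: Sequence[float], v: Callable[[float], float]) -> List[float]:
    """Nodal values of a 1D, P1 Scott-Zhang quasi-interpolant of v.

    Boundary nodes: T_z is the boundary point itself, so (I_h v)(z) = v(z).
    Interior node z_i: T_z = [z_i, z_{i+1}] with dual function
    varphi_z = (4*lam_left - 2*lam_right) / h, which satisfies
    int varphi_z * lam_left = 1 and int varphi_z * lam_right = 0 on T_z.
    """
    n = len(nodes) - 1
    values = [v(nodes[0])]
    for i in range(1, n):
        a, b = nodes[i], nodes[i + 1]
        h = b - a

        def dual(x: float, a=a, b=b, h=h) -> float:
            lam_left, lam_right = (b - x) / h, (x - a) / h
            return (4.0 * lam_left - 2.0 * lam_right) / h

        values.append(_integrate(lambda x: v(x) * dual(x), a, b))
    values.append(v(nodes[n]))
    return values


print(scott_zhang_p1([0.0, 0.5, 1.0], lambda x: x * x))
```

Any continuous piecewise linear $v$ on the mesh is reproduced exactly, as Theorem 3.2.2 asserts, whereas for $v(x) = x^2$ on the nodes $0, \tfrac12, 1$ the interior value is $5/24$ rather than $v(\tfrac12) = \tfrac14$: the operator is a projection onto $V_h$ but not an interpolant.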
We now combine the trace estimate (3.4), the Bramble-Hilbert Lemma (3.5), and (3.9) and (3.10) in order to obtain approximation estimates of the form we shall need below.

Theorem 3.2.3 Assume that $u \in W^{m,p}(\Omega)$ with $0 \le k \le m \le r+1$ and $1 \le p \le \infty$. Then for $T \in \mathcal{T}_h$,
$$\|u - I_hu\|_{W^{k,p}(T)} \lesssim h_T^{m-k}|u|_{W^{m,p}(\omega_T)}. \tag{3.14}$$
If in addition $m \ge 1$, then for $T \in \mathcal{T}_h$,
$$\|u - I_hu\|_{L^p(\partial T)} \lesssim h_T^{m-1/p}|u|_{W^{m,p}(\omega_T)}. \tag{3.15}$$

3.2.2 Residual-type a posteriori error estimates

Assume that we approximate the solution $u$ to a differential equation $Lu = f$ by $u_h$. The basic approach of residual-type estimates is to bound the residual $Lu_h - f$ in an appropriate norm (typically the dual of the norm in which we wish to measure the error).

We begin by defining elementwise error indicators. First recall that a finite element function $u_h \in V_h$ is continuous, but its gradient $\nabla u_h$ is discontinuous across element boundaries. More precisely, $\nabla u_h \cdot \vec n$ is discontinuous on $\partial T$, with $\vec n$ the outward-pointing unit normal on $\partial T$, while the tangential component of $\nabla u_h$ is continuous. Given $T, T' \in \mathcal{T}_h$ sharing a face $e$, we define
$$[\![\nabla u_h]\!] = \nabla u_h|_T \cdot \vec n_T + \nabla u_h|_{T'} \cdot \vec n_{T'}. \tag{3.16}$$
Note that adding rather than subtracting is correct here because $\vec n_T = -\vec n_{T'}$. Also, for $x \in \partial T$ we more precisely define $\nabla u_h|_T(x) = \lim_{T \ni x' \to x}\nabla u_h(x')$. Finally, because the tangential component of $\nabla u_h$ is continuous across $\partial T$, we have $|[\![\nabla u_h]\!]| = |\nabla u_h|_T - \nabla u_h|_{T'}|$.

To fix thoughts, we solve $-\Delta u = f$ in $\Omega$, $u = 0$ on $\partial\Omega$, as before. For element faces lying on the boundary, we define $[\![\nabla u_h]\!] = 0$. The reasons for this will become clear later. We will also briefly discuss modifications for other boundary conditions later on. For $T \in \mathcal{T}_h$, let
$$\eta(T)^2 = h_T^2\|\Delta u_h + f\|_{L^2(T)}^2 + h_T\|[\![\nabla u_h]\!]\|_{L^2(\partial T)}^2. \tag{3.17}$$
We then have the following basic a posteriori error estimate.
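To make (3.16) and (3.17) concrete before stating the estimate, here is a minimal sketch in one space dimension, assuming continuous piecewise linear elements ($r = 1$) on a partition of an interval. There $d = 1$, so $h_T$ is the element length, $\Delta u_h = u_h'' = 0$ inside each element, the faces of $T$ are its two endpoints (so the $L^2(\partial T)$ norm reduces to a sum of point values), and the jump at a node is the difference of the one-sided slopes. The function name and the quadrature choice are illustrative.

```python
import math
from typing import Callable, List, Sequence

# Three-point Gauss-Legendre rule on [-1, 1].
_GAUSS3 = ((-math.sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (math.sqrt(0.6), 5.0 / 9.0))


def residual_indicators_1d(nodes: Sequence[float], uh: Sequence[float],
                           f: Callable[[float], float]) -> List[float]:
    """eta(T)^2 from (3.17) for continuous P1 elements on the mesh given by `nodes`.

    uh[i] is the nodal value u_h(nodes[i]).  Since u_h'' = 0 inside each element,
    the volume term reduces to h_T^2 ||f||^2_{L2(T)} (computed by quadrature).
    """
    n = len(nodes) - 1
    slopes = [(uh[i + 1] - uh[i]) / (nodes[i + 1] - nodes[i]) for i in range(n)]
    # Jump (3.16) at interior node i: the outward normal is +1 for the element on
    # the left and -1 for the element on the right.  Boundary jumps are set to 0.
    jumps = [0.0] * (n + 1)
    for i in range(1, n):
        jumps[i] = slopes[i - 1] - slopes[i]
    eta_sq = []
    for i in range(n):
        a, b = nodes[i], nodes[i + 1]
        h = b - a
        mid, half = 0.5 * (a + b), 0.5 * h
        vol = half * sum(w * f(mid + half * t) ** 2 for t, w in _GAUSS3)
        # The boundary of T is {a, b}, so ||jump||^2_{L2(dT)} is a sum of two point values.
        eta_sq.append(h * h * vol + h * (jumps[i] ** 2 + jumps[i + 1] ** 2))
    return eta_sq


# -u'' = 1 on (0, 1), u(0) = u(1) = 0 has u = x(1 - x)/2; in 1D the P1 Galerkin
# solution on the nodes 0, 1/2, 1 coincides with u at the nodes.
eta_sq = residual_indicators_1d([0.0, 0.5, 1.0], [0.0, 0.125, 0.0], lambda x: 1.0)
print(eta_sq, math.sqrt(sum(eta_sq)))
```

In higher dimensions the same two ingredients appear, but the volume residual no longer vanishes identically for $r \ge 2$ and the jumps are integrated over genuine faces.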
Theorem 3.2.4 Assume that $u_h \in V_h^0 := V_h \cap H^1_0(\Omega)$ is the standard finite element approximation to $u$. Then there exists a constant $C_{\rm rel}$ depending on the shape regularity properties of $\mathcal{T}_h$, but not on other essential quantities, such that
$$\|u - u_h\|_{H^1_0(\Omega)} \le C_{\rm rel}\Big(\sum_{T \in \mathcal{T}_h}\eta(T)^2\Big)^{1/2}. \tag{3.18}$$

Proof. Our basic strategy is to apply Galerkin orthogonality, integrate by parts elementwise, do some careful bookkeeping, and apply the approximation properties proved in the previous subsection. In particular, let $\phi = u - u_h$, and let $I_h\phi \in V_h^0$ be its Scott-Zhang interpolant. Applying Galerkin orthogonality, applying the global weak form $\int_\Omega \nabla u \cdot \nabla v = \int_\Omega fv$, and integrating by parts elementwise for the remaining terms yields
$$\|u - u_h\|_{H^1_0(\Omega)}^2 = \int_\Omega \nabla(u - u_h) \cdot \nabla\phi = \int_\Omega \nabla(u - u_h) \cdot \nabla(\phi - I_h\phi) = \int_\Omega f(\phi - I_h\phi) + \sum_{T \in \mathcal{T}_h}\Big[\int_T \Delta u_h(\phi - I_h\phi) - \int_{\partial T}\nabla u_h \cdot \vec n_T(\phi - I_h\phi)\Big]. \tag{3.19}$$
Note that $\phi - I_h\phi = 0$ on element faces lying in $\partial\Omega$. Otherwise each face is shared by two elements, and we may compute that
$$\sum_{T \in \mathcal{T}_h}\int_{\partial T}\nabla u_h \cdot \vec n_T(\phi - I_h\phi) = \frac12\sum_{T \in \mathcal{T}_h}\int_{\partial T}[\![\nabla u_h]\!](\phi - I_h\phi). \tag{3.20}$$
Combining the previous two equalities, applying the Cauchy-Schwarz inequality, using (3.14) and (3.15), and finally applying the $\ell^2$ Cauchy-Schwarz inequality over $\mathcal{T}_h$ yields
$$\begin{aligned}
\|u - u_h\|_{H^1_0(\Omega)}^2 &\le \sum_{T \in \mathcal{T}_h}\Big(\|f + \Delta u_h\|_{L^2(T)}\|\phi - I_h\phi\|_{L^2(T)} + \tfrac12\|[\![\nabla u_h]\!]\|_{L^2(\partial T)}\|\phi - I_h\phi\|_{L^2(\partial T)}\Big)\\
&\lesssim \sum_{T \in \mathcal{T}_h}\Big(\|f + \Delta u_h\|_{L^2(T)}h_T|\phi|_{H^1(\omega_T)} + \|[\![\nabla u_h]\!]\|_{L^2(\partial T)}h_T^{1/2}|\phi|_{H^1(\omega_T)}\Big)\\
&\lesssim \Big(\sum_{T \in \mathcal{T}_h}\eta(T)^2\Big)^{1/2}\Big(\sum_{T \in \mathcal{T}_h}|\phi|_{H^1(\omega_T)}^2\Big)^{1/2}.
\end{aligned} \tag{3.21}$$
The finite overlap property (3.6) implies that
$$\sum_{T \in \mathcal{T}_h}|\phi|_{H^1(\omega_T)}^2 \lesssim |\phi|_{H^1(\Omega)}^2 = \|u - u_h\|_{H^1_0(\Omega)}^2. \tag{3.22}$$
Combining the last two inequalities and dividing through by $\|u - u_h\|_{H^1_0(\Omega)}$ completes the proof. □

We also prove a local a posteriori upper bound for the difference between finite element solutions on two nested grids. Let $\mathcal{T}$ be a shape-regular grid having the same properties as $\mathcal{T}_h$ above, and let $\mathcal{T}'$ be a refinement of $\mathcal{T}$.
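That is, for each $T \in \mathcal{T}$, we either have $T \in \mathcal{T}'$, or $T$ is the union of some subset of elements in $\mathcal{T}'$. In this case we write $\mathcal{T} \subset \mathcal{T}'$. Denote also by $\mathcal{R}_{\mathcal{T} \to \mathcal{T}'}$ the set of elements that are refined in passing from $\mathcal{T}$ to $\mathcal{T}'$. Let $V_{\mathcal{T}}$ and $V_{\mathcal{T}'}$ be Lagrange finite element spaces on $\mathcal{T}$ and $\mathcal{T}'$ as above. Finally, denote by $\eta_{\mathcal{T}}(T)$ and $\eta_{\mathcal{T}'}(T)$ the elementwise error indicators (as above) computed on $\mathcal{T}$ and $\mathcal{T}'$, respectively.

Corollary 3.2.5 Let $\mathcal{T} \subset \mathcal{T}'$ as above. Let $u_{\mathcal{T}}$ and $u_{\mathcal{T}'}$ be the Galerkin solutions on $\mathcal{T}$ and $\mathcal{T}'$. Then
$$\|u_{\mathcal{T}} - u_{\mathcal{T}'}\|_{H^1_0(\Omega)} \lesssim C_{\rm rel}\Big(\sum_{T \in \mathcal{R}_{\mathcal{T} \to \mathcal{T}'}}\eta_{\mathcal{T}}(T)^2\Big)^{1/2}. \tag{3.23}$$

Proof. Following the proof of (3.18), let $\phi = u_{\mathcal{T}} - u_{\mathcal{T}'}$. We consider the interpolation error $\phi - I_h\phi$. Let $\mathcal{N}_{nr}$ be the set of nodes lying in elements that are not refined in passing from $\mathcal{T}$ to $\mathcal{T}'$, that is, lying in $\mathcal{T} \setminus \mathcal{R}_{\mathcal{T} \to \mathcal{T}'}$. We define $I_h$ so that the control simplices for all $z \in \mathcal{N}_{nr}$ also lie in $\mathcal{T} \setminus \mathcal{R}_{\mathcal{T} \to \mathcal{T}'}$. In this case $\mathrm{supp}(\phi - I_h\phi) \subset \bigcup_{T \in \mathcal{R}_{\mathcal{T} \to \mathcal{T}'}}T$. We may then follow the proof of (3.18) nearly verbatim, noting that $\phi - I_h\phi = 0$ on elements lying outside of $\mathcal{R}_{\mathcal{T} \to \mathcal{T}'}$, so that residual terms from those elements may be omitted. □

The inequality (3.18) establishes that $C_{\rm rel}(\sum_{T \in \mathcal{T}_h}\eta(T)^2)^{1/2}$ is a reliable estimator for the energy error $\|u - u_h\|_{H^1_0(\Omega)}$. We next wish to establish that it is also efficient. In order to do so, we first introduce the concept of data oscillation. Data oscillation measures the degree to which problem data (right-hand side, coefficients, etc.) differ from piecewise polynomials. In our case, data oscillation depends only on the right-hand side $f$. We define
$$\mathrm{osc}(T) = \inf_{f_T \in \mathcal{P}^{r-1}}h_T\|f - f_T\|_{L^2(T)}, \qquad \mathrm{osc}(\mathcal{T}_h) = \Big(\sum_{T \in \mathcal{T}_h}\mathrm{osc}(T)^2\Big)^{1/2}. \tag{3.24}$$
We similarly define $\mathrm{osc}(\mathcal{T}')$ and $\mathrm{osc}(\omega)$ for any subset $\mathcal{T}' \subset \mathcal{T}_h$ and $\omega \subset \Omega$.

Theorem 3.2.6 Assume that $T \in \mathcal{T}_h$. Then
$$\eta(T)^2 \le \tilde C_{\rm eff}^2\big(\|u - u_h\|_{H^1_0(\omega_T)}^2 + \mathrm{osc}(\omega_T)^2\big). \tag{3.25}$$
Thus
$$\Big(\sum_{T \in \mathcal{T}_h}\eta(T)^2\Big)^{1/2} \le C_{\rm eff}\big(\|u - u_h\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_h)\big). \tag{3.26}$$
Here $\tilde C_{\rm eff}$ and $C_{\rm eff}$ depend on shape regularity properties of $\mathcal{T}_h$ but are independent of other essential quantities.

Proof. Our proof follows a technique due to Verfürth [41]. We begin by defining an element bubble function $b_T(x) := \prod_{i=1}^{d+1}\lambda_i(x)$, where $\lambda_i$, $i = 1, \dots, d+1$, are the barycentric coordinates on $T$. Note that $b_T = 0$ on $\partial T$, since on each face one of the barycentric coordinates is identically zero. In addition, $b_T > 0$ in $\mathrm{int}(T)$. Also, $\|b_T\|_{L^\infty(T)} \simeq 1$; in fact, the maximum is attained at the barycenter, where it equals $(d+1)^{-(d+1)}$ (that is, $1/27$ for $d = 2$).

Let now $f_T \in \mathcal{P}^{r-1}$ be the minimizer in (3.24). Then
$$h_T\|f + \Delta u_h\|_{L^2(T)} \le h_T\|f - f_T\|_{L^2(T)} + h_T\|f_T + \Delta u_h\|_{L^2(T)} = \mathrm{osc}(T) + h_T\|f_T + \Delta u_h\|_{L^2(T)}. \tag{3.27}$$
Note next that $f_T + \Delta u_h \in \mathcal{P}^{r-1}(T)$, and that $\|v\|_{b_T} := (\int_T b_Tv^2)^{1/2}$ is a norm on $\mathcal{P}^{r-1}(T)$. By transformation to a reference element and using the fact that $\mathcal{P}^{r-1}$ is a finite-dimensional space, we in fact have that $\|\cdot\|_{L^2(T)}$ and $\|\cdot\|_{b_T}$ are equivalent norms on $\mathcal{P}^{r-1}$, with constants independent of $h_T$ and other essential quantities. Thus, using $f + \Delta u_h = -\Delta(u - u_h)$ on $T$ and $\|b_T\|_{L^\infty(T)} \le 1$,
$$\begin{aligned}
h_T^2\|f_T + \Delta u_h\|_{L^2(T)}^2 &\simeq h_T^2\|f_T + \Delta u_h\|_{b_T}^2 = h_T^2\int_T b_T(f_T + \Delta u_h)(f_T - f + f + \Delta u_h)\\
&\le h_T^2\int_T b_T(f_T + \Delta u_h)[-\Delta(u - u_h)] + h_T^2\|f_T + \Delta u_h\|_{L^2(T)}\|f_T - f\|_{L^2(T)}\\
&\le h_T^2\int_T b_T(f_T + \Delta u_h)[-\Delta(u - u_h)] + \mathrm{osc}(T)^2 + \tfrac14h_T^2\|f_T + \Delta u_h\|_{L^2(T)}^2.
\end{aligned} \tag{3.28}$$
We then integrate by parts while recalling that $b_T = 0$ on $\partial T$, and then apply an inverse inequality to obtain
$$\begin{aligned}
h_T^2\int_T b_T(f_T + \Delta u_h)[-\Delta(u - u_h)] &= h_T^2\int_T \nabla[b_T(f_T + \Delta u_h)] \cdot \nabla(u - u_h)\\
&\le h_T^2\|\nabla[b_T(f_T + \Delta u_h)]\|_{L^2(T)}\|\nabla(u - u_h)\|_{L^2(T)}\\
&\le Ch_T\|b_T(f_T + \Delta u_h)\|_{L^2(T)}\|\nabla(u - u_h)\|_{L^2(T)}\\
&\le C\|\nabla(u - u_h)\|_{L^2(T)}^2 + \tfrac14h_T^2\|f_T + \Delta u_h\|_{L^2(T)}^2.
\end{aligned} \tag{3.29}$$
Combining the last two estimates and reabsorbing the resulting final term $\frac12h_T^2\|f_T + \Delta u_h\|_{L^2(T)}^2$ (adjusting the weights in Young's inequality to the norm-equivalence constant if necessary) then yields
$$h_T^2\|f_T + \Delta u_h\|_{L^2(T)}^2 \le C\big(\|\nabla(u - u_h)\|_{L^2(T)}^2 + \mathrm{osc}(T)^2\big).$$
Inserting into (3.27) then yields
$$h_T\|f + \Delta u_h\|_{L^2(T)} \lesssim \|\nabla(u - u_h)\|_{L^2(T)} + \mathrm{osc}(T). \tag{3.30}$$
We now consider the edge term $h_T^{1/2}\|[\![\nabla u_h]\!]\|_{L^2(\partial T)}$.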
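Let $e$ be one of the faces of $T$, and assume that $e$ is shared by $T$ and $T'$; recall that $[\![\nabla u_h]\!]|_e = 0$ if $e \subset \partial\Omega$. We first define a face bubble $b_e$ as follows. By (possibly) renumbering we can assume that $\lambda_1, \dots, \lambda_d$ are the barycentric coordinates on $T$ that do not vanish identically on $e$, and similarly for $T'$. We then define $b_e|_T = \prod_{i=1}^d\lambda_i$, and similarly on $T'$. It is easy to compute that $b_e = 0$ on $\partial(T \cup T')$ and that $b_e$ is continuous on $T \cup T'$. Also, $\|b_e\|_{L^\infty(T \cup T')} \simeq 1$. In addition, we have as above that for $v \in \mathcal{P}^{r-1}$, $\|v\|_{L^2(e)} \simeq \|b_e^{1/2}v\|_{L^2(e)}$. Finally, we let $\sigma$ be the polynomial on $T \cup T'$ which is obtained by extending $[\![\nabla u_h]\!]$ as a constant in the direction normal to $e$. Using the fact that $[\![\nabla u_h]\!]$ is a polynomial, we can compute that
$$\|\sigma\|_{L^2(T \cup T')} \lesssim h_T^{1/2}\|\sigma\|_{L^2(e)} = h_T^{1/2}\|[\![\nabla u_h]\!]\|_{L^2(e)}. \tag{3.31}$$
We next note that, since $b_e\sigma$ vanishes on $\partial(T \cup T')$, $0 = \int_{\partial(T \cup T')}(b_e\sigma)\nabla u \cdot \vec n = \int_{T \cup T'}\nabla(b_e\sigma) \cdot \nabla u + b_e\sigma\Delta u$. Thus
$$\begin{aligned}
\|[\![\nabla u_h]\!]\|_{L^2(e)}^2 &\simeq \int_e b_e[\![\nabla u_h]\!]^2 = \int_e b_e\sigma[\![\nabla u_h]\!]\\
&= \int_{\partial T}b_e\sigma\nabla u_h \cdot \vec n_T + \int_{\partial T'}b_e\sigma\nabla u_h \cdot \vec n_{T'}\\
&= \int_T\big(\nabla(b_e\sigma) \cdot \nabla u_h + b_e\sigma\Delta u_h\big) + \int_{T'}\big(\nabla(b_e\sigma) \cdot \nabla u_h + b_e\sigma\Delta u_h\big)\\
&= \int_{T \cup T'}\nabla(b_e\sigma) \cdot \nabla(u_h - u) + \sum_{K \in \{T, T'\}}\int_K b_e\sigma\Delta(u_h - u)\\
&\le \|\nabla(b_e\sigma)\|_{L^2(T \cup T')}\|\nabla(u - u_h)\|_{L^2(T \cup T')} + \|b_e\sigma\|_{L^2(T \cup T')}\|f + \Delta_hu_h\|_{L^2(T \cup T')},
\end{aligned} \tag{3.32}$$
where by $\Delta_h$ we mean $\Delta$ computed elementwise. Using (3.31), $\|b_e\|_{L^\infty(T \cup T')} \lesssim 1$, and an inverse inequality, we now compute that
$$\|\nabla(b_e\sigma)\|_{L^2(T \cup T')} \lesssim h_T^{-1/2}\|[\![\nabla u_h]\!]\|_{L^2(e)}, \qquad \|b_e\sigma\|_{L^2(T \cup T')} \lesssim h_T^{1/2}\|[\![\nabla u_h]\!]\|_{L^2(e)}. \tag{3.33}$$
Inserting these relationships and then (3.30) into (3.32) then yields
$$\begin{aligned}
h_T\|[\![\nabla u_h]\!]\|_{L^2(e)}^2 &\le C\big(h_T^{1/2}\|\nabla(u - u_h)\|_{L^2(T \cup T')}\|[\![\nabla u_h]\!]\|_{L^2(e)} + h_T^{3/2}\|f + \Delta_hu_h\|_{L^2(T \cup T')}\|[\![\nabla u_h]\!]\|_{L^2(e)}\big)\\
&\le C\big(\|\nabla(u - u_h)\|_{L^2(T \cup T')}^2 + \mathrm{osc}(T \cup T')^2\big) + \tfrac12h_T\|[\![\nabla u_h]\!]\|_{L^2(e)}^2.
\end{aligned} \tag{3.34}$$
Reabsorbing the last term and combining the result with (3.30) completes the proof. □

We finally make a philosophical remark about the role that data oscillation plays in a posteriori error estimation and adaptive finite element methods.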
First note that $\mathrm{osc}(T) \le \eta(T)$, since $\Delta u_h|_T \in \mathcal{P}^{r-1}$ and so $\inf_{f_T \in \mathcal{P}^{r-1}}\|f - f_T\|_{L^2(T)} \le \|f + \Delta u_h\|_{L^2(T)}$. Combining this observation with the above reliability and efficiency estimates, we obtain
$$\|u - u_h\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_h) \simeq \Big(\sum_{T \in \mathcal{T}_h}\eta(T)^2\Big)^{1/2}.$$
The quantity $\|u - u_h\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_h)$ (error plus oscillation) is often referred to as the total error. Data oscillation is typically a higher-order term ($O(h^{r+1})$) if $f$ is smooth enough, but it may dominate the total error on coarse meshes. An important philosophical step in establishing a robust convergence theory for adaptive FEM was the realization that a posteriori error estimates (and the corresponding adaptive FEM) really control the total error, not just the plain energy error as is usually considered in a priori error estimates.

3.2.3 Mesh refinement: Bisection

Adaptive finite element methods produce a sequence $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \dots$ of grids. Here $\mathcal{T}_0$ can be thought of as an algorithm input, i.e., a user-supplied grid, and $\mathcal{T}_1, \mathcal{T}_2, \dots$ are generated automatically by the algorithm. Establishing convergence of adaptive FEM relies critically on the construction of mesh refinement routines (refine in (3.3)) with appropriate properties. In particular, we will eventually need the following (or closely related) properties to hold:

1. Given $\mathcal{T}_\ell$, refine produces $\mathcal{T}_{\ell+1}$ with
$$\mathcal{T}_\ell \subset \mathcal{T}_{\ell+1}. \tag{3.35}$$
2. The sequence $\{\mathcal{T}_\ell\}_{\ell \ge 0}$ is uniformly shape regular. That is, there exist constants $c, C$ independent of $\ell$ such that for any $T \in \mathcal{T}_\ell$ ($\ell \ge 0$), there are balls of radius $ch_T$ and $Ch_T$ inscribed in and circumscribing $T$, respectively. Here $c, C$ may be smaller and larger, respectively, than the corresponding constants $c_0, C_0$ describing the shape regularity of $\mathcal{T}_0$, but not by more than a fixed factor.
3. If we instruct refine to subdivide (at least) the subset $\mathcal{M}_\ell$ of $\mathcal{T}_\ell$ in order to produce $\mathcal{T}_{\ell+1}$, the algorithm will produce a conforming mesh $\mathcal{T}_{\ell+1}$ such that the number of newly produced triangles is no more than a fixed multiple of the number of triangles in $\mathcal{M}_\ell$:
$$\#\mathcal{T}_{\ell+1} - \#\mathcal{T}_\ell \lesssim \#\mathcal{M}_\ell. \tag{3.36}$$

The above properties are (almost) fulfilled by the newest vertex bisection algorithm and its higher-dimensional generalizations, as we describe below. (We say "almost" here because (3.36) does not hold with a constant uniform with respect to the mesh level; it only holds in a cumulative fashion described below.) However, other algorithms exist which violate some of the above conditions, and good algorithmic performance may still be achieved even if proof of this fact is more difficult. The essential requirements are that shape regularity is maintained, that the algorithm does not inflate the number of refined elements too far beyond the number of marked elements, and that the depth of nonconformity is bounded. Nestedness is sometimes violated in practice, as is the conformity requirement.

We briefly describe two refinement algorithms that are widely known but lack some of the above properties. The first is "red-green" refinement [9]. We give a brief description of the two-dimensional version; a 3D version is also available ("red-green-blue" refinement and variants). Given a marked set $\mathcal{M}_\ell \subset \mathcal{T}_\ell$, the algorithm produces $\mathcal{T}_{\ell+1}$ by a combination of red and green refinements. "Red" refinement of $T \in \mathcal{T}_\ell$ involves connecting the midpoints of all edges of $T$ so that four children are produced, i.e., $T$ is subdivided into four similar triangles. "Green" refinement of $T \in \mathcal{T}_\ell$ is a bisection, that is, a new edge is created by connecting a vertex of $T$ with the midpoint of its opposite edge. $T \in \mathcal{T}_\ell$ is colored red if either $T \in \mathcal{T}_0$ or it is the child of some previous $T' \in \mathcal{T}_k$ by red refinement. $T \in \mathcal{T}_\ell$ is green if it is the child by green refinement of some $T' \in \mathcal{T}_k$, $k < \ell$.
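The two elementary operations just described can be sketched on a single triangle as follows. This is a minimal illustration with hypothetical helper names; the colors, fathers and hanging-node bookkeeping of the full red-green algorithm are omitted. The usage checks that red children are similar to the parent (each side halved, area quartered) and that a green bisection splits the area in half.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def _mid(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def signed_area(t: Triangle) -> float:
    (x0, y0), (x1, y1), (x2, y2) = t
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def red_refine(t: Triangle) -> List[Triangle]:
    """Join the three edge midpoints: four children similar to t, same orientation."""
    a, b, c = t
    ab, bc, ca = _mid(a, b), _mid(b, c), _mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def green_refine(t: Triangle, k: int) -> List[Triangle]:
    """Bisect t by joining vertex t[k] to the midpoint of the opposite edge."""
    a, b, c = t[k], t[(k + 1) % 3], t[(k + 2) % 3]
    m = _mid(b, c)
    return [(a, b, m), (a, m, c)]


def sorted_sides(t: Triangle) -> List[float]:
    a, b, c = t
    return sorted([math.dist(a, b), math.dist(b, c), math.dist(c, a)])


parent = ((0.0, 0.0), (2.0, 0.0), (0.5, 1.5))
red = red_refine(parent)
green = green_refine(parent, 0)
print([signed_area(c) for c in red], [signed_area(c) for c in green])
```

Because red children are exact scaled copies of the parent, repeated red refinement never changes element shapes; all shape control in the algorithm therefore hinges on how green refinement is used.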
The steps in red-green refinement are:

1. Divide $\mathcal{M}_\ell$ into $\mathcal{M}_\ell^r$ (red elements) and $\mathcal{M}_\ell^g$ (green elements).
2. Refine all elements in $\mathcal{M}_\ell^r$ by red refinement.
3. Coarsen all $T \in \mathcal{M}_\ell^g$ by removing their fathers' midlines. Refine their fathers by red refinement and repaint red.
4. Remove possible hanging nodes by iterating the following:
   (a) Refine all red elements with at least two hanging nodes by red refinement and paint the children red.
   (b) Coarsen green elements with at least one hanging node and refine their fathers by red refinement.
   (c) Refine red elements with only one hanging node by green refinement and paint the children green.

Note that this strategy clearly preserves shape regularity: children of red-refined triangles are similar to the parent. Green-refined triangles have angles at most half that of their parent, and because no triangle is ever green-refined more than once, the shape cannot degenerate.

A second option is longest-edge bisection. Longest-edge bisection will produce uniformly shape-regular meshes, since always bisecting the longest edge produces children which do the least damage to the shape regularity properties of the new mesh. However, proving combinatorial properties of the created sequence of meshes, in particular that the conforming closures do not damage the cardinality of the produced meshes, is problematic.

Conceptually closely related to longest-edge bisection, but having provably good combinatorial properties, is newest vertex bisection and its generalizations to higher space dimensions. We refer to [39] for a thorough exploration of these properties. The basic algorithm in two dimensions involves first inputting an initial mesh $\mathcal{T}_0$ with one vertex in each element labeled as the "newest". Then for $\ell \ge 0$:

1. Bisect each element in $\mathcal{M}_\ell$, with refinement edge being the one opposite the newest vertex. Label the newly created vertex as the newest.
2. Recursively apply this step to the elements containing hanging nodes until the algorithm terminates, i.e., until no hanging nodes remain.

The last step should of course raise warning flags, as it is not clear that the algorithm should terminate at all. It can however be proved that the algorithm does in fact terminate, provided that the initial labeling of "newest" vertices in $\mathcal{T}_0$ satisfies certain properties. Such a labeling can always be supplied for two-dimensional meshes, and for higher-dimensional meshes after possibly carrying out a finite number of uniform refinements of $\mathcal{T}_0$. In addition, it turns out that (3.36) does not hold with a uniform constant. A cumulative version does, however, hold.

Lemma 3.2.7 Given a polyhedral domain $\Omega$, an initial conforming shape-regular simplicial decomposition $\mathcal{T}_0$, and for each $\ell \ge 0$ a subset $\mathcal{M}_\ell \subset \mathcal{T}_\ell$, the newest vertex bisection algorithm or its generalization to higher space dimensions will produce a sequence of meshes $\{\mathcal{T}_\ell\}_{\ell \ge 0}$ which is conforming, uniformly shape-regular, nested in the sense of (3.35), and ensures that all elements $T \in \mathcal{M}_\ell$ are bisected at least once in passing from $\mathcal{T}_\ell$ to $\mathcal{T}_{\ell+1}$. In addition, there is a constant $C$ depending on $\mathcal{T}_0$ such that for any $\ell \ge 0$,
$$\#\mathcal{T}_\ell - \#\mathcal{T}_0 \le C\sum_{k=0}^{\ell-1}\#\mathcal{M}_k. \tag{3.37}$$

3.2.4 Contraction

We now more precisely define the steps in (3.3). Let an initial mesh $\mathcal{T}_0$ be given. We denote by $V_\ell$ the finite element space corresponding to $\mathcal{T}_\ell$, etc. Then for $\ell \ge 0$:

1. We solve for $u_\ell \in V_\ell$, assuming exact linear algebra. (This is not the typical case in practice; there are papers available which analyze the effect of inexact system solution on AFEM convergence.)
2. We estimate using the residual estimator $(\sum_{T \in \mathcal{T}_\ell}\eta(T)^2)^{1/2}$.
3. Let $0 < \theta \le 1$ be given. In mark, we choose a set $\mathcal{M}_\ell$ of minimal cardinality so that
$$\sum_{T \in \mathcal{M}_\ell}\eta(T)^2 \ge \theta\sum_{T \in \mathcal{T}_\ell}\eta(T)^2. \tag{3.38}$$
This is called Dörfler or bulk marking.
Note for the sake of intuition that $\theta = 1$ corresponds to uniform refinement, unless there are elements on which the indicators vanish.
4. In refine, we use newest vertex bisection or its generalization to higher space dimensions to refine (bisect) the elements in $\mathcal{M}_\ell$ $b \ge 1$ times, and then bisect additional elements in order to produce a new conforming mesh $\mathcal{T}_{\ell+1}$ with $\mathcal{T}_\ell \subset \mathcal{T}_{\ell+1}$.

The work [16] identified four ingredients for proving that AFEM converges to the true solution: an a posteriori upper bound, orthogonality, an estimator reduction property, and contraction.

Ingredient 1: A posteriori upper bound. We have from above that
$$\|u - u_\ell\|_{H^1_0(\Omega)} \le C_{rel} \Big(\sum_{T \in \mathcal{T}_\ell} \eta(T)^2\Big)^{1/2}. \tag{3.39}$$

Ingredient 2: Orthogonality. Nestedness of the finite element spaces and Galerkin orthogonality produce the Pythagorean identity
$$\|u - u_{\ell+1}\|^2_{H^1_0(\Omega)} = \|u - u_\ell\|^2_{H^1_0(\Omega)} - \|u_\ell - u_{\ell+1}\|^2_{H^1_0(\Omega)}. \tag{3.40}$$
To prove this, we write $a(u,v) := \int_\Omega \nabla u \cdot \nabla v$ and $\|u\|^2_{H^1_0(\Omega)} = a(u,u)$. Then using the fact that $V_\ell \subset V_{\ell+1}$, and so $a(u - u_{\ell+1}, v_{\ell+1}) = 0$ for all $v_{\ell+1} \in V_{\ell+1}$, we have
$$\begin{aligned}
\|u - u_\ell\|^2_{H^1_0(\Omega)} &= a(u - u_\ell,\, u - u_\ell) \\
&= a\big((u - u_{\ell+1}) + (u_{\ell+1} - u_\ell),\, (u - u_{\ell+1}) + (u_{\ell+1} - u_\ell)\big) \\
&= a(u - u_{\ell+1},\, u - u_{\ell+1}) + 2a(u - u_{\ell+1},\, u_{\ell+1} - u_\ell) + a(u_{\ell+1} - u_\ell,\, u_{\ell+1} - u_\ell) \\
&= \|u - u_{\ell+1}\|^2_{H^1_0(\Omega)} + \|u_{\ell+1} - u_\ell\|^2_{H^1_0(\Omega)},
\end{aligned}$$
where we used the symmetry of $a$ and the fact that the cross term vanishes because $u_{\ell+1} - u_\ell \in V_{\ell+1}$. Rearranging gives (3.40).

Roughly speaking, our strategy is to show that $\|u - u_{\ell+1}\|_{H^1_0(\Omega)} \le \rho \|u - u_\ell\|_{H^1_0(\Omega)}$ for some $\rho < 1$. The identity (3.40) in essence tells us that we must show that $\|u_\ell - u_{\ell+1}\|_{H^1_0(\Omega)}$ is sufficiently large with respect to $\|u - u_\ell\|_{H^1_0(\Omega)}$.

Ingredient 3: Estimator reduction. We first expand our notation. Given $v_\ell \in V_\ell$, we write
$$\eta_\ell(T; v_\ell)^2 = h_T^2 \|f + \Delta v_\ell\|^2_{L^2(T)} + h_T \|[\![\nabla v_\ell]\!]\|^2_{L^2(\partial T)}, \quad T \in \mathcal{T}_\ell,$$
and for a collection $\mathcal{S}$ of elements we write $\eta_\ell(\mathcal{S}; v_\ell)^2 := \sum_{T \in \mathcal{S}} \eta_\ell(T; v_\ell)^2$. Note that if $v_\ell \in V_\ell$, then also $v_\ell \in V_{\ell+1}$, and so we can consider both $\eta_\ell(T; v_\ell)$ ($T \in \mathcal{T}_\ell$) and $\eta_{\ell+1}(T; v_\ell)$ ($T \in \mathcal{T}_{\ell+1}$).
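In this case $[\![\nabla v_\ell]\!]|_e = 0$ on edges $e$ which are interfaces of elements in $\mathcal{T}_{\ell+1}$ but not of elements in $\mathcal{T}_\ell$, because $v_\ell$ is a polynomial on each $T \in \mathcal{T}_\ell$. We begin with an auxiliary proposition establishing what might be termed an "indicator continuity" property.

Proposition 3.2.8 Assume that $v, w \in V_\ell$. Then there exists $C > 0$ such that for any $0 < \alpha < 1$ and $T \in \mathcal{T}_\ell$,
$$\eta_\ell(T; v)^2 \le (1+\alpha)\,\eta_\ell(T; w)^2 + C(1+\alpha^{-1})\|v - w\|^2_{H^1_0(\omega_T)}. \tag{3.41}$$

Proof. First consider the volumetric residual $h_T\|f + \Delta v\|_{L^2(T)}$. The triangle inequality followed by an inverse inequality yields
$$h_T\|f + \Delta v\|_{L^2(T)} \le h_T\|f + \Delta w\|_{L^2(T)} + h_T\|\Delta(w - v)\|_{L^2(T)} \le h_T\|f + \Delta w\|_{L^2(T)} + C\|w - v\|_{H^1_0(T)}.$$
Squaring the above and applying Young's inequality $2ab \le \alpha a^2 + \alpha^{-1}b^2$ yields
$$\begin{aligned}
h_T^2\|f + \Delta v\|^2_{L^2(T)} &\le h_T^2\|f + \Delta w\|^2_{L^2(T)} + 2Ch_T\|f + \Delta w\|_{L^2(T)}\|w - v\|_{H^1_0(T)} + C^2\|w - v\|^2_{H^1_0(T)} \\
&\le (1+\alpha)\,h_T^2\|f + \Delta w\|^2_{L^2(T)} + C(1+\alpha^{-1})\|w - v\|^2_{H^1_0(T)}.
\end{aligned} \tag{3.42}$$
Similarly, let $e$ be a face in $\partial T$ which is shared by $T, T' \in \mathcal{T}_\ell$. Using the scaled trace inequality (3.4) and an inverse inequality yields
$$\begin{aligned}
h_T^{1/2}\|[\![\nabla v]\!]\|_{L^2(e)} &\le h_T^{1/2}\|[\![\nabla w]\!]\|_{L^2(e)} + h_T^{1/2}\|[\![\nabla(v - w)]\!]\|_{L^2(e)} \\
&\le h_T^{1/2}\|[\![\nabla w]\!]\|_{L^2(e)} + Ch_T^{1/2}\big(h_T^{-1/2}\|v - w\|_{H^1_0(T \cup T')} + h_T^{1/2}|v - w|_{H^2(T)} + h_T^{1/2}|v - w|_{H^2(T')}\big) \\
&\le h_T^{1/2}\|[\![\nabla w]\!]\|_{L^2(e)} + C\|v - w\|_{H^1_0(T \cup T')}.
\end{aligned}$$
Squaring the above, adding the result over the faces of $T$, and applying Young's inequality as above yields
$$h_T\|[\![\nabla v]\!]\|^2_{L^2(\partial T)} \le (1+\alpha)\,h_T\|[\![\nabla w]\!]\|^2_{L^2(\partial T)} + C(1+\alpha^{-1})\|v - w\|^2_{H^1_0(\omega_T)}. \tag{3.43}$$
Adding together (3.42) and (3.43) completes the proof. □

Lemma 3.2.9 Assume that the marked set $\mathcal{M}_\ell$ satisfies the bulk marking criterion (3.38). Then there are a constant $C_{red} > 0$ independent of essential quantities and a constant $0 < \lambda < 1$ depending on the number of bisections $b$ applied to each $T \in \mathcal{M}_\ell$ but otherwise independent of essential quantities such that for any $0 < \alpha < 1$,
$$\eta_{\ell+1}(\mathcal{T}_{\ell+1}; u_{\ell+1})^2 \le (1+\alpha)(1 - \lambda\theta)\,\eta_\ell(\mathcal{T}_\ell; u_\ell)^2 + C_{red}(1+\alpha^{-1})\|u_\ell - u_{\ell+1}\|^2_{H^1_0(\Omega)}. \tag{3.44}$$

Proof. Let $T \in \mathcal{T}_{\ell+1}$. Because $u_\ell, u_{\ell+1} \in V_{\ell+1}$, (3.41) yields
$$\eta_{\ell+1}(T; u_{\ell+1})^2 \le (1+\alpha)\,\eta_{\ell+1}(T; u_\ell)^2 + C(1+\alpha^{-1})\|u_\ell - u_{\ell+1}\|^2_{H^1_0(\omega_T)}.$$
Applying finite overlap of the patches $\omega_T$ while summing over $T \in \mathcal{T}_{\ell+1}$ yields
$$\eta_{\ell+1}(\mathcal{T}_{\ell+1}; u_{\ell+1})^2 \le (1+\alpha)\,\eta_{\ell+1}(\mathcal{T}_{\ell+1}; u_\ell)^2 + C(1+\alpha^{-1})\|u_\ell - u_{\ell+1}\|^2_{H^1_0(\Omega)}. \tag{3.45}$$
Assume now that $T \in \mathcal{M}_\ell$. Then there exist $2^b$ elements $T_1, \dots, T_{2^b}$ in $\mathcal{T}_{\ell+1}$ such that $T = \bigcup_{i=1}^{2^b} T_i$. Note also that $|T_i| = 2^{-b}|T|$, and so $h_{T_i} = 2^{-b/d}h_T$, $i = 1, \dots, 2^b$ (this uses the mesh size $h_T := |T|^{1/d}$, which is equivalent to the diameter on shape-regular meshes). Let $\tilde\lambda = 2^{-b/d}$. Then
$$\sum_{i=1}^{2^b} h_{T_i}^2\|f + \Delta u_\ell\|^2_{L^2(T_i)} = \tilde\lambda^2 h_T^2 \sum_{i=1}^{2^b}\|f + \Delta u_\ell\|^2_{L^2(T_i)} = \tilde\lambda^2 h_T^2\|f + \Delta u_\ell\|^2_{L^2(T)}.$$
Recalling that $[\![\nabla u_\ell]\!]|_e = 0$ for faces $e$ which are faces of elements in $\mathcal{T}_{\ell+1}$ but not of elements in $\mathcal{T}_\ell$, we similarly have
$$\sum_{i=1}^{2^b} h_{T_i}\|[\![\nabla u_\ell]\!]\|^2_{L^2(\partial T_i)} = \tilde\lambda\, h_T\|[\![\nabla u_\ell]\!]\|^2_{L^2(\partial T)}.$$
Thus, since $\tilde\lambda < 1$ and writing $\eta_{\ell+1}(T; u_\ell)^2$ for the sum of the indicators over the children of $T$ in $\mathcal{T}_{\ell+1}$,
$$\eta_{\ell+1}(T; u_\ell)^2 \le \tilde\lambda\,\eta_\ell(T; u_\ell)^2, \quad T \in \mathcal{M}_\ell. \tag{3.46}$$
If $T \in \mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}_{\ell+1}}$ (the set of elements of $\mathcal{T}_\ell$ that are refined in passing to $\mathcal{T}_{\ell+1}$), we may similarly prove
$$\eta_{\ell+1}(T; u_\ell)^2 \le \eta_\ell(T; u_\ell)^2, \tag{3.47}$$
and clearly
$$\eta_{\ell+1}(T; u_\ell)^2 = \eta_\ell(T; u_\ell)^2, \quad T \in \mathcal{T}_\ell \setminus \mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}_{\ell+1}}. \tag{3.48}$$
Combining the above, summing over $\mathcal{T}_{\ell+1}$, and then using the fact that $\eta_\ell(\mathcal{M}_\ell; u_\ell)^2 \ge \theta\,\eta_\ell(\mathcal{T}_\ell; u_\ell)^2$, we obtain
$$\begin{aligned}
\eta_{\ell+1}(\mathcal{T}_{\ell+1}; u_\ell)^2 &\le \eta_\ell(\mathcal{T}_\ell \setminus \mathcal{M}_\ell; u_\ell)^2 + \tilde\lambda\,\eta_\ell(\mathcal{M}_\ell; u_\ell)^2 \\
&= \eta_\ell(\mathcal{T}_\ell; u_\ell)^2 - (1 - \tilde\lambda)\,\eta_\ell(\mathcal{M}_\ell; u_\ell)^2 \\
&\le \eta_\ell(\mathcal{T}_\ell; u_\ell)^2 - (1 - \tilde\lambda)\theta\,\eta_\ell(\mathcal{T}_\ell; u_\ell)^2 \\
&= \big(1 - (1 - \tilde\lambda)\theta\big)\,\eta_\ell(\mathcal{T}_\ell; u_\ell)^2.
\end{aligned} \tag{3.49}$$
Inserting (3.49) into (3.45) while defining $\lambda = 1 - \tilde\lambda$ completes the proof. □

Ingredient 4: Contraction. Properly combining the first three ingredients yields a reduction in the total error at each step of the AFEM algorithm.

Theorem 3.2.10 Assume that AFEM as defined above is employed. Then there exist constants $\gamma > 0$ and $0 < \rho < 1$ depending on the shape regularity of $\mathcal{T}_0$, $b$, and $\theta$ but not on other essential quantities such that
$$\|u - u_{\ell+1}\|^2_{H^1_0(\Omega)} + \gamma\,\eta_{\ell+1}(\mathcal{T}_{\ell+1}; u_{\ell+1})^2 \le \rho\big(\|u - u_\ell\|^2_{H^1_0(\Omega)} + \gamma\,\eta_\ell(\mathcal{T}_\ell; u_\ell)^2\big).$$

Proof. Let $\gamma = \frac{1}{C_{red}(1+\alpha^{-1})}$, with $C_{red}$ and $\alpha$ as in (3.44), and write $e_\ell := \|u - u_\ell\|_{H^1_0(\Omega)}$ and $\eta_\ell := \eta_\ell(\mathcal{T}_\ell; u_\ell)$ for brevity. Multiplying (3.44) by $\gamma$ gives
$$\gamma\,\eta_{\ell+1}^2 \le \gamma(1+\alpha)(1 - \lambda\theta)\,\eta_\ell^2 + \|u_\ell - u_{\ell+1}\|^2_{H^1_0(\Omega)}.$$
Adding this inequality (Ingredient 3) to the orthogonality relationship (3.40) (Ingredient 2) then yields
$$e_{\ell+1}^2 + \gamma\,\eta_{\ell+1}^2 \le e_\ell^2 + \gamma(1+\alpha)(1 - \lambda\theta)\,\eta_\ell^2.$$
We next use the a posteriori upper bound (3.39) (Ingredient 1), in the form $\eta_\ell^2 \ge C_{rel}^{-2}e_\ell^2$, to find
$$\gamma(1+\alpha)(1 - \lambda\theta)\,\eta_\ell^2 = \gamma(1+\alpha)\Big(1 - \frac{\lambda\theta}{2}\Big)\eta_\ell^2 - \gamma(1+\alpha)\frac{\lambda\theta}{2}\,\eta_\ell^2 \le \gamma(1+\alpha)\Big(1 - \frac{\lambda\theta}{2}\Big)\eta_\ell^2 - \frac{\gamma(1+\alpha)\lambda\theta}{2C_{rel}^2}\,e_\ell^2.$$
Combining the last two inequalities then yields
$$e_{\ell+1}^2 + \gamma\,\eta_{\ell+1}^2 \le \Big(1 - \frac{\gamma(1+\alpha)\lambda\theta}{2C_{rel}^2}\Big)e_\ell^2 + (1+\alpha)\Big(1 - \frac{\lambda\theta}{2}\Big)\gamma\,\eta_\ell^2.$$
Recall that $0 < \alpha < 1$ is arbitrary. Since $1 - \lambda\theta/2 < 1$, we may take $\alpha$ small enough that $(1+\alpha)(1 - \lambda\theta/2) < 1$. We do so and then set
$$\rho = \max\Big(1 - \frac{\gamma(1+\alpha)\lambda\theta}{2C_{rel}^2},\ (1+\alpha)\Big(1 - \frac{\lambda\theta}{2}\Big)\Big) < 1.$$
This completes the proof. □

In the next subsection we will need the following variation of Theorem 3.2.10. Recalling that $\eta_\ell(\mathcal{T}_\ell; u_\ell) \simeq \|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)$, the proof is immediate.

Corollary 3.2.11 Under the assumptions of Theorem 3.2.10, we have for $m \ge \ell$
$$\|u - u_m\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_m) \lesssim \rho^{m-\ell}\big[\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big], \tag{3.50}$$
where, since Theorem 3.2.10 contracts squared quantities, $\rho$ here denotes the square root of the constant appearing there.

3.2.5 Quasi-Optimality of AFEM

Theorem 3.2.10 guarantees convergence of AFEM, and it even gives a convergence rate, in that it establishes that the total error is reduced by a fixed fraction at each step of the algorithm. On the other hand, it leaves important questions open. Recall the standard a priori estimate $\|u - u_h\|_{H^1_0(\Omega)} \le Ch^r|u|_{H^{r+1}(\Omega)}$. This estimate shows that a better convergence rate can be achieved with a higher polynomial degree if $u$ is sufficiently smooth. Theorem 3.2.10, however, references neither the polynomial degree $r$ nor the regularity of $u$. In this subsection we remedy this shortcoming by establishing a rate optimality result for AFEM.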
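The "fixed fraction" mentioned above is produced explicitly by the proof of Theorem 3.2.10: once $C_{rel}$, $C_{red}$, $\lambda$, and $\theta$ are known, the proof prescribes an admissible $\alpha$, the weight $\gamma$, and $\rho$. The Python sketch below simply evaluates those formulas. The values $C_{rel} = 2$ and $C_{red} = 5$ are invented placeholders (the true constants depend on the mesh and are rarely known), and taking $\alpha$ as half of its largest admissible value is an arbitrary choice made for illustration.

```python
def contraction_constants(C_rel, C_red, lam, theta):
    """Evaluate the choices made in the proof of Theorem 3.2.10.

    Returns (alpha, gamma, rho). Following the proof, alpha must satisfy
    (1 + alpha) * (1 - lam*theta/2) < 1; here alpha is (arbitrarily) taken
    to be half of the largest admissible value.
    """
    if not (0 < lam < 1 and 0 < theta <= 1 and C_rel > 0 and C_red > 0):
        raise ValueError("need 0 < lam < 1, 0 < theta <= 1 and positive constants")
    q = 1 - lam * theta / 2                   # estimator factor before adding alpha
    alpha = 0.5 * (1 - q) / q                 # lies strictly inside (0, 1/q - 1)
    gamma = 1 / (C_red * (1 + 1 / alpha))     # weight of the estimator term
    rho_err = 1 - gamma * (1 + alpha) * lam * theta / (2 * C_rel**2)
    rho_est = (1 + alpha) * q
    return alpha, gamma, max(rho_err, rho_est)

# Placeholder constants: C_rel = 2 and C_red = 5 are invented for illustration;
# lam = 1 - 2**(-b/d) with b = 1 bisection in d = 2 dimensions (Lemma 3.2.9).
lam = 1 - 2 ** (-1 / 2)
for theta in (0.3, 0.6, 0.9):
    alpha, gamma, rho = contraction_constants(2.0, 5.0, lam, theta)
    print(f"theta={theta}: alpha={alpha:.4f}, gamma={gamma:.5f}, rho={rho:.6f}")
```

Because $\gamma(1+\alpha) = \alpha/C_{red}$, the first entry of the maximum equals $1 - \alpha\lambda\theta/(2C_{red}C_{rel}^2)$, which for such constants is extremely close to 1. The proof therefore guarantees contraction, but the factor it delivers is far from sharp and should not be read as a prediction of the error reduction observed per step.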
Let $\mathbb{T}$ be the set of all conforming meshes that can be derived by newest vertex bisection (or its generalization to higher space dimensions) from the initial mesh $\mathcal{T}_0$. The goal of AFEM is to optimize approximation to $u$ over $\mathbb{T}$. That is, AFEM should pick out a sequence of meshes in $\mathbb{T}$ that is optimal in some reasonable sense. We shall show that if $u$ can be approximated with rate $s$ by a sequence of meshes in $\mathbb{T}$, then AFEM in fact picks out a sequence of meshes which also approximates $u$ with rate $s$.

More formally, we define nonlinear approximation classes. Let $s > 0$ be given; $s$ may be thought of as an approximation rate. We define
$$|u|_{\mathbb{A}_s} := \sup_{N \ge 0} N^s \inf_{\substack{\mathcal{T} \in \mathbb{T} \\ \#\mathcal{T} - \#\mathcal{T}_0 \le N}} \ \inf_{u_{\mathcal{T}} \in V_{\mathcal{T}}} \big(\|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T})\big) \tag{3.51}$$
and write $u \in \mathbb{A}_s$ if $|u|_{\mathbb{A}_s} < \infty$. Our goal will be to prove that $\|u - u_\ell\|_{H^1_0(\Omega)} \lesssim (\#\mathcal{T}_\ell - \#\mathcal{T}_0)^{-s}|u|_{\mathbb{A}_s}$ whenever $u \in \mathbb{A}_s$ for $0 < s \le r/d$, under appropriate assumptions. Comparing with the classical a priori estimate $\|u - u_h\|_{H^1_0(\Omega)} \le Ch^r|u|_{H^{r+1}(\Omega)}$, we see that, roughly speaking, $(\#\mathcal{T} - \#\mathcal{T}_0)^{-s}$ takes the place of the convergence rate $h^r$ (recall that $h^r \simeq (\#\mathcal{T})^{-r/d}$ on quasi-uniform meshes), and the nonlinear approximation measure $|u|_{\mathbb{A}_s}$ takes the place of the regularity measure $|u|_{H^{r+1}(\Omega)}$.

At this point, it may appear that employing the approximation measure $|u|_{\mathbb{A}_s}$ to measure the regularity of $u$ is somewhat circular or unsatisfying, as we do not refer here to intrinsic regularity properties of $u$. It is in fact possible to make some statements about the relationship of $\mathbb{A}_s$ with certain Besov spaces, but the relationship is not entirely straightforward. This relationship is discussed further below.

We first prove some important technical lemmas. The first establishes that, in essence, the set of elements refined in passing to any mesh that achieves sufficient error reduction satisfies a Dörfler-type criterion. This lemma will be used to bound the number of marked elements in the Dörfler marking.
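Since Lemma 3.2.13 concerns the cardinality of the set $\mathcal{M}_\ell$ selected by Dörfler marking (3.38), it may help to recall how a set of minimal cardinality can be computed: sort the indicators in decreasing order and take the shortest prefix whose squared sum reaches $\theta$ times the total. This greedy choice is minimal because, for any fixed number of elements, taking the largest indicators maximizes the captured sum. The Python sketch below assumes the indicators $\eta(T)$ are supplied as a NumPy array indexed by element; the function name `dorfler_mark` is hypothetical.

```python
import numpy as np

def dorfler_mark(eta, theta):
    """Indices of a minimal-cardinality set M with sum_M eta^2 >= theta * sum eta^2."""
    if not 0 < theta <= 1:
        raise ValueError("theta must lie in (0, 1]")
    eta2 = np.asarray(eta, dtype=float) ** 2
    order = np.argsort(-eta2, kind="stable")    # largest indicators first
    partial = np.cumsum(eta2[order])            # nondecreasing partial sums
    target = theta * partial[-1]
    # shortest prefix whose partial sum reaches the target
    k = int(np.searchsorted(partial, target, side="left")) + 1
    return np.sort(order[:k])

eta = np.array([0.1, 0.5, 0.05, 0.3, 0.2])
print(dorfler_mark(eta, 0.5))   # [1]
print(dorfler_mark(eta, 0.8))   # [1 3]
```

With $\theta = 1$ every element with a nonzero indicator is marked, consistent with the remark that $\theta = 1$ corresponds to uniform refinement. Ties among indicators mean that the minimal set need not be unique; the stable sort simply picks one of them.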
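Lemma 3.2.12 Let $C_{rel}$ be the constant from the local upper bound (3.23) and $C_{eff}$ the constant from the efficiency estimate (3.26). Assume that the Dörfler marking parameter $\theta$ satisfies $\theta < \frac{1}{C_{eff}(C_{rel}+1)}$. Then for $\mathcal{T}_\ell \subset \mathcal{T} \in \mathbb{T}$ satisfying
$$\|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}) \le [1 - \theta C_{eff}(C_{rel}+1)]\big(\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big), \tag{3.52}$$
there holds $\eta_\ell(\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}) \ge \theta\,\eta_\ell(\mathcal{T}_\ell)$.

Proof. We add the inequalities
$$\|u - u_\ell\|_{H^1_0(\Omega)} \le \|u_\ell - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)}, \tag{3.53}$$
$$\mathrm{osc}(\mathcal{T}_\ell) \le \mathrm{osc}(\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}) + \mathrm{osc}(\mathcal{T}) \tag{3.54}$$
to obtain
$$\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell) \le \|u_\ell - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}) + \|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}). \tag{3.55}$$
Employing (3.26), a rearranged version of (3.52), a rearranged version of (3.55), and (3.23) then yields
$$\begin{aligned}
\theta(C_{rel}+1)\,\eta_\ell(\mathcal{T}_\ell) &\le \theta C_{eff}(C_{rel}+1)\big(\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big) \\
&\le \|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell) - \|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)} - \mathrm{osc}(\mathcal{T}) \\
&\le \|u_\ell - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}) \\
&\le (C_{rel}+1)\,\eta_\ell(\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}).
\end{aligned} \tag{3.56}$$
Here we have also employed the fact that $\mathrm{osc}(T) \le \eta_\ell(T)$, $T \in \mathcal{T}_\ell$. Dividing through by $C_{rel}+1$ completes the proof. □

Lemma 3.2.13 Let $u \in \mathbb{A}_s$ for some $s > 0$. Under the assumptions of Lemma 3.2.12, the collection of marked elements defined by the Dörfler marking strategy (3.38) satisfies
$$\#\mathcal{M}_\ell \lesssim |u|_{\mathbb{A}_s}^{1/s}\big[\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big]^{-1/s}. \tag{3.57}$$

Proof. We first assert that there exist a mesh $\mathcal{T}' \in \mathbb{T}$ such that
$$\#\mathcal{T}' - \#\mathcal{T}_0 \lesssim |u|_{\mathbb{A}_s}^{1/s}\Big([1 - \theta C_{eff}(C_{rel}+1)]\big[\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big]\Big)^{-1/s} \tag{3.58}$$
and $u_{\mathcal{T}'} \in V_{\mathcal{T}'}$ such that
$$\|u - u_{\mathcal{T}'}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}') \le [1 - \theta C_{eff}(C_{rel}+1)]\big[\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big]. \tag{3.59}$$
To see this, choose $\mathcal{T}'$ of minimal cardinality so that (3.59) holds. Denote by $\varepsilon$ the right-hand side of (3.59). By the definition (3.51) of the approximation class $\mathbb{A}_s$, for any $N \ge 1$ there is a mesh $\mathcal{T} \in \mathbb{T}$ with $\#\mathcal{T} - \#\mathcal{T}_0 \le N$ and $u_{\mathcal{T}} \in V_{\mathcal{T}}$ whose error plus oscillation is at most $2N^{-s}|u|_{\mathbb{A}_s}$; choosing $N \simeq (2|u|_{\mathbb{A}_s}/\varepsilon)^{1/s}$ shows that the minimal mesh $\mathcal{T}'$ also satisfies (3.58).

Let now $\mathcal{T} \in \mathbb{T}$ be the smallest common refinement of $\mathcal{T}_\ell$ and $\mathcal{T}'$. It can be shown that $\#\mathcal{T} - \#\mathcal{T}_\ell \le \#\mathcal{T}' - \#\mathcal{T}_0$. Note that oscillation is monotone under refinement; that is, $\mathcal{T}' \subset \mathcal{T}$ implies $\mathrm{osc}(\mathcal{T}) \le \mathrm{osc}(\mathcal{T}')$. Because $V_{\mathcal{T}'} \subset V_{\mathcal{T}}$, Céa's Lemma then yields
$$\|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}) \le \|u - u_{\mathcal{T}'}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}') \le [1 - \theta C_{eff}(C_{rel}+1)]\big[\|u - u_\ell\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_\ell)\big]. \tag{3.60}$$
By Lemma 3.2.12, we thus have $\eta_\ell(\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}) \ge \theta\,\eta_\ell(\mathcal{T}_\ell)$, that is, $\sum_{T \in \mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}}}\eta_\ell(T)^2 \ge \theta^2\sum_{T \in \mathcal{T}_\ell}\eta_\ell(T)^2$. Because $\mathcal{M}_\ell$ is a subset of $\mathcal{T}_\ell$ of minimal cardinality satisfying this Dörfler-type inequality (strictly speaking, this requires carrying out the marking (3.38) with parameter $\theta^2$ in place of $\theta$, which affects only constants), we have
$$\#\mathcal{M}_\ell \le \#\mathcal{R}_{\mathcal{T}_\ell \to \mathcal{T}} \le \#\mathcal{T} - \#\mathcal{T}_\ell \le \#\mathcal{T}' - \#\mathcal{T}_0. \tag{3.61}$$
The result then follows from (3.58). □

We finally prove that AFEM converges with the best possible rate $s$, in the sense that if $u \in \mathbb{A}_s$, then AFEM chooses a sequence of meshes such that the error decreases with rate $s$.

Theorem 3.2.14 Let $u \in \mathbb{A}_s$ for some $s > 0$. Under the previous assumptions, it holds that
$$\|u - u_{\ell-1}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_{\ell-1}) \lesssim (\#\mathcal{T}_\ell - \#\mathcal{T}_0)^{-s}|u|_{\mathbb{A}_s}. \tag{3.62}$$

Proof. Combining (3.37) and (3.57) yields
$$\#\mathcal{T}_\ell - \#\mathcal{T}_0 \lesssim \sum_{i=0}^{\ell-1}\#\mathcal{M}_i \lesssim |u|_{\mathbb{A}_s}^{1/s}\sum_{i=0}^{\ell-1}\big[\|u - u_i\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_i)\big]^{-1/s}. \tag{3.63}$$
From (3.50) we have
$$\|u - u_{\ell-1}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_{\ell-1}) \lesssim \rho^{\ell-1-i}\big[\|u - u_i\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_i)\big], \tag{3.64}$$
so that for $0 \le i \le \ell-1$,
$$\big[\|u - u_i\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_i)\big]^{-1/s} \lesssim \rho^{(\ell-1-i)/s}\big[\|u - u_{\ell-1}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_{\ell-1})\big]^{-1/s}. \tag{3.65}$$
Combining (3.65) and (3.63) then yields
$$\begin{aligned}
\#\mathcal{T}_\ell - \#\mathcal{T}_0 &\lesssim |u|_{\mathbb{A}_s}^{1/s}\big[\|u - u_{\ell-1}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_{\ell-1})\big]^{-1/s}\sum_{i=0}^{\ell-1}\rho^{(\ell-1-i)/s} \\
&\lesssim |u|_{\mathbb{A}_s}^{1/s}\big[\|u - u_{\ell-1}\|_{H^1_0(\Omega)} + \mathrm{osc}(\mathcal{T}_{\ell-1})\big]^{-1/s},
\end{aligned} \tag{3.66}$$
with the last step following from the fact that $\rho^{1/s} < 1$, so that the last sum is a partial sum of a convergent geometric series and is bounded by $1/(1 - \rho^{1/s})$. Rearranging completes the proof. □

3.2.6 Approximation classes and Besov spaces

In this subsection we shall try to understand better which functions lie in the approximation classes $\mathbb{A}_s$. The answer we give is only a partial answer, in two senses. First, it does not take into account the role that data oscillation plays in membership of $u$ in a given approximation class.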
Second, we do not quite obtain an "if-and-only-if" statement telling us that $u \in \mathbb{A}_s$ exactly when $u$ lies in some particular function space. However, the answer we obtain nonetheless gives us meaningful information about the smoothness needed in order for AFEM to converge with a given rate. The answer to this question was originally given in [13] for space dimension $d = 2$ and polynomial degree $r = 1$. Some of the definitions we use below and a broader overview of nonlinear approximation theory may be found in [23]; this reference does not treat finite element approximation but is nonetheless useful. Finally, [30, 29] prove equivalences in the context of higher space dimension and higher polynomial degree. The latter reference additionally includes consideration of data oscillation, and the full results in the latter two references require use of a type of generalized Besov space.

As a first step we define Besov spaces. The Besov space $B^\alpha_q(L_p(\Omega))$ has three indices: a smoothness index $\alpha$, an integrability index $p$, and a "fine-tuning" index $q$. Here $\alpha \ge 0$ and $0 < p, q \le \infty$. Note carefully that the integrability index here may be less than 1, in contrast to classical $L_p$ spaces. This fact plays an important role in understanding AFEM convergence. Also, the smoothness and integrability indices should be quite intuitive to those familiar with Sobolev spaces, while the third index $q$ is rather more mysterious. To give at least a little intuition about it, we think briefly of the case $L_\infty(\Omega)$. Here the space $L_\infty$ of essentially bounded functions and the space $C(\overline{\Omega})$ of continuous and bounded functions have the same smoothness and integrability, but are not the same space. The spaces BMO (functions of bounded mean oscillation) and VMO (vanishing mean oscillation) also lie at the same place in the smoothness-integrability scale, giving a total of four spaces corresponding to $\alpha = 0$, $p = \infty$.
These are all different spaces with different properties and uses. Thus we see that the usual Sobolev scale does not always allow us to distinguish properly between different types of functions.

There are various definitions of Besov spaces available (most of them equivalent under reasonable assumptions); we now give one such definition. We begin by defining a difference operator. Let $h \in \mathbb{R}^d$ and $k \in \mathbb{N}$. For $x \in G \subset \mathbb{R}^d$, the $h$-difference of order $k$ is given by
$$\Delta_h^k(f, x, G) := \begin{cases} \displaystyle\sum_{j=0}^{k}(-1)^{k+j}\binom{k}{j}f(x + jh), & [x, x + kh] \subset G, \\ 0, & \text{otherwise}. \end{cases} \tag{3.67}$$
Here $[x, x + kh]$ is the prism $(x_1 + [0, kh_1]) \times \dots \times (x_d + [0, kh_d])$. We also define the modulus of smoothness of order $k$ in $L_p(G)$ as
$$\omega_k(f, t)_p = \omega_k(f, t, G)_p := \sup_{|h| \le t}\|\Delta_h^k(f, \cdot, G)\|_{L_p(G)}, \quad t > 0. \tag{3.68}$$
We next define the Besov semi-(quasi)norm $|\cdot|_{B^\alpha_q(L_p(\Omega))}$. Let $\alpha > 0$ and $0 < p, q \le \infty$, and let $\kappa \in \mathbb{N}$ be any integer such that $\alpha < \kappa + \max\{1, 1/p\} = \kappa + 1/p^*$, where $p^* = \min\{1, p\}$. Then the Besov space $B^\alpha_q(L_p(\Omega))$ is the set of all $f \in L_p(\Omega)$ for which the semi-(quasi)norm
$$|f|_{B^\alpha_q(L_p(\Omega))} := \begin{cases} \Big(\displaystyle\int_0^\infty \big[t^{-\alpha}\omega_{\kappa+1}(f, t)_p\big]^q\,\frac{dt}{t}\Big)^{1/q}, & 0 < q < \infty, \\ \sup_{t > 0} t^{-\alpha}\omega_{\kappa+1}(f, t)_p, & q = \infty, \end{cases} \tag{3.69}$$
is finite. The (quasi)norm of $B^\alpha_q(L_p(\Omega))$ is then
$$\|f\|_{B^\alpha_q(L_p(\Omega))} = \|f\|_{L_p(\Omega)} + |f|_{B^\alpha_q(L_p(\Omega))}. \tag{3.70}$$
This is only a quasinorm when $p < 1$ or $q < 1$, because in this case the triangle inequality is only satisfied up to a constant.

Other definitions of Besov spaces exist and are generally equivalent, at least if $\partial\Omega$ is sufficiently regular. We briefly discuss the real interpolation method by way of example, or more precisely the J-method of real interpolation. We follow [2, Chapter 7]. Assume that we are given Banach spaces $X_0$, $X_1$, both continuously embedded in a larger Hausdorff topological vector space $X$. Then $X_0 + X_1$ and $X_0 \cap X_1$ are also Banach spaces. We assume that $X_0 \cap X_1$ is nontrivial.
We define for $t > 0$ the J-norm
$$J(t; u) = \max\{\|u\|_{X_0},\ t\|u\|_{X_1}\}, \quad u \in X_0 \cap X_1. \tag{3.71}$$
If $0 \le \theta \le 1$ and $1 \le q \le \infty$, we denote by $(X_0, X_1)_{\theta,q;J}$ the space of all $u \in X_0 + X_1$ with
$$u = \int_0^\infty f(t)\,\frac{dt}{t} \tag{3.72}$$
for some $f \in L_1(0, \infty; dt/t, X_0 + X_1)$ having values in $X_0 \cap X_1$ and such that the real-valued function $t \mapsto t^{-\theta}J(t; f(t))$ belongs to $L_q(0, \infty; dt/t, \mathbb{R})$. (Here $g \in L_q(0, \infty; dt/t, \mathbb{R})$ if $\int_0^\infty |g(t)|^q\,\frac{dt}{t} < \infty$ for $1 \le q < \infty$, and similarly for $q = \infty$. We also denote this space by $L^*_q$.) We then have the following ([2, Theorem 7.13]).

Theorem 3.2.15 If either $1 < q \le \infty$ and $0 < \theta < 1$, or $q = 1$ and $0 \le \theta \le 1$, then $(X_0, X_1)_{\theta,q;J}$ is a nontrivial Banach space with norm
$$\|u\|_{\theta,q;J} = \inf_{f \in S(u)}\big\|t^{-\theta}J(t; f(t)); L^*_q\big\| = \inf_{f \in S(u)}\Big(\int_0^\infty \big[t^{-\theta}J(t; f(t))\big]^q\,\frac{dt}{t}\Big)^{1/q} \quad (q < \infty), \tag{3.73}$$
where
$$S(u) = \Big\{f \in L_1(0, \infty; dt/t, X_0 + X_1) : u = \int_0^\infty f(t)\,\frac{dt}{t}\Big\}. \tag{3.74}$$
Furthermore, with $q'$ the conjugate exponent of $q$,
$$\|u\|_{X_0 + X_1} \le \big\|t^{-\theta}\min\{1, t\}; L^*_{q'}\big\|\,\|u\|_{\theta,q;J}, \qquad \|u\|_{\theta,q;J} \le C\max\{\|u\|_{X_0}, \|u\|_{X_1}\} \quad (u \in X_0 \cap X_1). \tag{3.75}$$
Thus
$$X_0 \cap X_1 \hookrightarrow (X_0, X_1)_{\theta,q;J} \hookrightarrow X_0 + X_1, \tag{3.76}$$
that is, $(X_0, X_1)_{\theta,q;J}$ is an intermediate space between $X_0$ and $X_1$.

One can then define Besov spaces by real interpolation. Let $0 < \alpha < \infty$, $1 \le p < \infty$, and $1 \le q \le \infty$. Also, let $m$ be the smallest integer greater than $\alpha$. Then we have
$$B^\alpha_q(L_p(\Omega)) = \big(L_p(\Omega), W^{m,p}(\Omega)\big)_{\alpha/m,q;J}. \tag{3.77}$$

Now we return to our goal of gaining intuition about the relationship between the intrinsic smoothness of $u$ and its membership in a given approximation class. Here a technical issue arises. Our original definition of approximation classes given in (3.51) includes data oscillation. It is possible to characterize approximation classes which incorporate data oscillation in a manner similar to our development below, but the spaces that are used are not as easily characterized by referring to classical spaces; cf. [29]. We thus omit data oscillation, or in essence assume that $f$ is piecewise polynomial on $\mathcal{T}_0$.
We are still able to gain meaningful intuition about approximation classes by doing so. Accordingly, we now redefine our approximation class in order to omit data oscillation. Let
$$|u|_{\mathbb{A}_s} := \sup_{N \ge 0} N^s \inf_{\substack{\mathcal{T} \in \mathbb{T} \\ \#\mathcal{T} - \#\mathcal{T}_0 \le N}} \ \inf_{u_{\mathcal{T}} \in V_{\mathcal{T}}} \|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)}. \tag{3.78}$$
Then $u \in \mathbb{A}_s$ if $|u|_{\mathbb{A}_s} < \infty$. From [30, Theorem 2.2], we have the following. Recall that $r$ is the polynomial degree of the finite element space and $d$ is the space dimension.

Theorem 3.2.16 Assume that $u \in B^{s+1}_\tau(L_\tau(\Omega))$ with $\frac{1}{\tau} < \frac{s}{d} + \frac{1}{2}$ and $0 < s \le r$. Then $u \in \mathbb{A}_{s/d}$ and
$$|u|_{\mathbb{A}_{s/d}} \lesssim \|u\|_{B^{s+1}_\tau(L_\tau(\Omega))}. \tag{3.79}$$
More precisely, given $N \ge 1$,
$$\inf_{\substack{\mathcal{T} \in \mathbb{T} \\ \#\mathcal{T} - \#\mathcal{T}_0 \le N}} \ \inf_{u_{\mathcal{T}} \in V_{\mathcal{T}}} \|u - u_{\mathcal{T}}\|_{H^1_0(\Omega)} \lesssim N^{-s/d}|u|_{B^{s+1}_\tau(L_\tau(\Omega))}. \tag{3.80}$$

The indices in the theorem may appear somewhat daunting. A function space diagram (DeVore diagram) can be helpful in interpreting them. We may roughly restate Theorem 3.2.16 by saying that $u \in \mathbb{A}_{s/d}$ if $u$ lies in a Besov space $B^{s+1}_\tau(L_\tau(\Omega))$ which compactly embeds into $H^1(\Omega)$ (this is always true when the assumptions of Theorem 3.2.16 are satisfied). A DeVore diagram has horizontal axis $1/p$ (inverse of the integrability index) and vertical axis $\alpha$ (smoothness index).

Figure 3.1: DeVore diagram illustrating Theorem 3.2.16. Convergence of order $N^{-s/d}$ is obtained if $u \in B^{s+1}_\tau(L_\tau(\Omega))$ with $(1/\tau, s+1)$ lying in the shaded region, but not on the line $\frac{1}{\tau} = \frac{s}{d} + \frac{1}{2}$.

The "fine-tuning" index $q$ plays little role for the direct estimate given here and could be thought of as a third dimension on the diagram. Another way of stating this is that various spaces (Besov, Sobolev, etc.) may occupy the same point on such a diagram. In order to identify the spaces which compactly embed into $H^1(\Omega)$, we draw a line of slope $d$ through the point $(1/p, \alpha) = (1/2, 1)$ on the diagram. Besov spaces $B^{s+1}_\tau(L_\tau(\Omega))$ with $(1/\tau, s+1)$ lying strictly to the left of (equivalently, above) this line and with $s + 1 \ge 1$ then embed compactly into $H^1$.
Solving this condition and adding in the natural convergence barrier $s \le r$ then gives the conditions of Theorem 3.2.16. If $(1/\tau, s+1)$ lies on the line $1/\tau = s/d + 1/2$, then $B^{s+1}_\tau(L_\tau(\Omega))$ embeds continuously but not compactly into $H^1$, and we have no guarantee that AFEM converges with rate $s/d$; but we also have no indication that it does not. The following theorem (cf. [30, Theorem 2.5]), however, indicates that if $u$ only lies in Besov spaces below the line $1/\tau = s/d + 1/2$, then we cannot achieve convergence with rate $s/d$.

Theorem 3.2.17 Let $s > 0$ and $\frac{1}{\tau} = \frac{s}{d} + \frac{1}{2}$. Then
$$\mathbb{A}_{s/d} \subset B^{1+s}_\tau(L_\tau(\Omega)). \tag{3.81}$$

Note carefully that there is a slight gap between the results of Theorem 3.2.16 and those of Theorem 3.2.17. To see this, let us seek to determine what Besov regularity is required of $u$ in order to guarantee $u \in \mathbb{A}_{r/d}$ (which in turn gives us hope that our AFEM can approximate $u$ with the generally best possible rate $r/d$). We take $s = r$ in Theorem 3.2.16 and then find that we need $\frac{1}{\tau} < \frac{r}{d} + \frac{1}{2}$. Solving for $\tau$ yields the condition $\tau > \frac{2d}{2r+d}$; that is, $u \in B^{r+1}_{\frac{2d}{2r+d}+\epsilon}(L_{\frac{2d}{2r+d}+\epsilon}(\Omega))$ for some $\epsilon > 0$ will guarantee that $u \in \mathbb{A}_{r/d}$. On the other hand, from Theorem 3.2.17 we have that $u \in \mathbb{A}_{r/d}$ implies only that $u \in B^{r+1}_{\frac{2d}{2r+d}}(L_{\frac{2d}{2r+d}}(\Omega))$. Thus membership in $\mathbb{A}_{r/d}$ is not precisely equivalent to membership in some specific Besov space; the Besov regularity required to guarantee $u \in \mathbb{A}_{r/d}$ is slightly stronger than that guaranteed by $u \in \mathbb{A}_{r/d}$.

$$\begin{array}{c|c|c}
 & d = 2 & d = 3 \\ \hline
r = 1 & B^{2}_{1+\epsilon}(L_{1+\epsilon}(\Omega)) & B^{2}_{\frac{6}{5}+\epsilon}(L_{\frac{6}{5}+\epsilon}(\Omega)) \\
r = 2 & B^{3}_{\frac{2}{3}+\epsilon}(L_{\frac{2}{3}+\epsilon}(\Omega)) & B^{3}_{\frac{6}{7}+\epsilon}(L_{\frac{6}{7}+\epsilon}(\Omega)) \\
r = 3 & B^{4}_{\frac{1}{2}+\epsilon}(L_{\frac{1}{2}+\epsilon}(\Omega)) & B^{4}_{\frac{6}{9}+\epsilon}(L_{\frac{6}{9}+\epsilon}(\Omega)) \\
r = 4 & B^{5}_{\frac{2}{5}+\epsilon}(L_{\frac{2}{5}+\epsilon}(\Omega)) & B^{5}_{\frac{6}{11}+\epsilon}(L_{\frac{6}{11}+\epsilon}(\Omega))
\end{array}$$
Table 3.1: Regularity needed to achieve convergence rate $r/d$ for various $r$, $d$.
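The entries of Table 3.1 follow from the condition $\tau > 2d/(2r+d)$ with smoothness index $r+1$. The short Python sketch below recomputes them with exact fractions (the function name `besov_threshold` is hypothetical); note that the entry $6/9$ for $r = 3$, $d = 3$ equals $2/3$ in lowest terms.

```python
from fractions import Fraction

def besov_threshold(r, d):
    """Smoothness index r+1 and critical integrability 2d/(2r+d) for rate r/d in H^1."""
    return r + 1, Fraction(2 * d, 2 * r + d)

for r in range(1, 5):
    cells = []
    for d in (2, 3):
        smooth, tau = besov_threshold(r, d)
        cells.append(f"B^{smooth}_{{{tau}+eps}}(L_{{{tau}+eps}})")
    print(f"r={r}:  d=2: {cells[0]}   d=3: {cells[1]}")
# first line: r=1:  d=2: B^2_{1+eps}(L_{1+eps})   d=3: B^2_{6/5+eps}(L_{6/5+eps})
```

The critical index $2d/(2r+d)$ drops below 1 exactly when $r > d/2$, that is, for $r \ge 2$ in both two and three dimensions; this is one concrete reason why Besov spaces with integrability index below 1 enter the analysis.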
In Table 3.1 we reproduce [30, Table 1], which gives the Besov regularity needed to achieve the generally best possible adaptive convergence rate of $r/d$ for $d = 2, 3$ and $r = 1, 2, 3, 4$. These convergence rates may be observed quite precisely in practice in simple situations, for example when solving $-\Delta u = 1$ on polyhedral domains.

We now return to some examples to understand the adaptive convergence rates observed computationally in Chapter 1. First consider the two-dimensional case $d = 2$, and assume that $u(x, y)$ solves $-\Delta u = 1$ on some polygonal domain $\Omega$. Then $u \sim \rho^\beta$ for some $\beta \ge 1/4$, depending on the situation ($\beta = 1/4$ corresponds to mixed boundary conditions which change type at a crack vertex; cf. (2.6)). We wish to check whether we can achieve the "optimal" convergence rate $r/2$ given polynomial degree $r$. As already discussed, we require $u \in B^{r+1}_\tau(L_\tau(\Omega))$ with $\tau > \frac{4}{2r+2} = \frac{2}{r+1}$. Recall the heuristic that each derivative of $\rho^\beta$ subtracts one power from the exponent. Taking $\beta = 1/4$, we have $D^{r+1}\rho^{1/4} \sim \rho^{1/4 - r - 1} = \rho^{-r-3/4}$. Raising to the power $\tau$ and integrating in polar coordinates, we require $\int_0^1 \rho^{(-r-3/4)\tau + 1}\,d\rho < \infty$, that is, $\tau(-r - 3/4) + 1 > -1$, which gives $\tau < \frac{2}{r + 3/4}$. That is, $D^{r+1}\rho^{1/4}$ is $\tau$-integrable for all $\tau < \frac{2}{r+3/4}$. We however only need $\rho^{1/4} \in B^{r+1}_\tau(L_\tau(\Omega))$ for some $\tau > \frac{2}{r+1}$. Because $\frac{2}{r+1} < \frac{2}{r+3/4}$, we can always find $\tau$ satisfying both conditions, and it is possible to achieve an $O(\mathrm{DOF}^{-r/2})$ convergence rate.

We can extend this observation to problems with stronger point singularities due to discontinuous coefficients, although these situations are not directly covered by our theory. Consider the problem $-\mathrm{div}(a\nabla u) = f$ with $a$ a (potentially discontinuous) scalar coefficient. Let $\Omega = (-1, 1) \times (-1, 1)$, and
$$a(x, y) = \begin{cases} b, & xy > 0, \\ 1, & \text{otherwise}. \end{cases} \tag{3.82}$$
Here we assume $b > 0$. If $b \ne 1$, then the coefficient $a$ is discontinuous in a way that forms a "checkerboard" pattern. Such problems were analyzed in a well-known paper of Kellogg [34].
We take our example from [14], where a slight variation of this problem is discussed and a theoretical analysis of the effects of coefficient regularity on AFEM construction and convergence rates is given. Choosing $f = 0$ and fixing appropriate boundary conditions for $u$, we find that the solution $u$ is given in polar coordinates by $u(\rho, \theta) = \rho^\alpha\mu(\theta)$ with $0 < \alpha < 2$ and
$$\mu(\theta) = \begin{cases}
\cos\big((\tfrac{\pi}{2} - \sigma)\alpha\big)\cos\big((\theta - \tfrac{\pi}{4})\alpha\big), & 0 \le \theta < \tfrac{\pi}{2}, \\
\cos\big(\tfrac{\pi}{4}\alpha\big)\cos\big((\theta - \pi + \sigma)\alpha\big), & \tfrac{\pi}{2} \le \theta < \pi, \\
\cos(\alpha\sigma)\cos\big((\theta - \tfrac{5\pi}{4})\alpha\big), & \pi \le \theta < \tfrac{3\pi}{2}, \\
\cos\big(\tfrac{\pi}{4}\alpha\big)\cos\big((\theta - \tfrac{3\pi}{2} - \sigma)\alpha\big), & \tfrac{3\pi}{2} \le \theta < 2\pi.
\end{cases} \tag{3.83}$$
Here $b$, $\alpha$, and $\sigma$ satisfy
$$b = -\tan\big((\tfrac{\pi}{2} - \sigma)\alpha\big)\cot\big(\tfrac{\pi}{4}\alpha\big), \qquad \frac{1}{b} = -\tan\big(\tfrac{\pi}{4}\alpha\big)\cot(\sigma\alpha), \qquad b = -\tan(\alpha\sigma)\cot\big(\tfrac{\pi}{4}\alpha\big), \tag{3.84}$$
and
$$\max\big(0, \pi(\alpha - 1)\big) < \tfrac{\pi}{2}\alpha < \min(\pi\alpha, \pi), \qquad \max\big(0, \pi(1 - \alpha)\big) < -2\alpha\sigma < \min\big(\pi, \pi(2 - \alpha)\big). \tag{3.85}$$
Setting $\alpha = 1/8$, Mathematica returns an approximate solution $b = 103.087$, $\sigma = -11.781$ (one can check that these are $b = \cot^2(\pi/32)$ and $\sigma = -15\pi/4$). Note that the coefficient $a$ has rather high contrast. The solution thus produced has principal singularity $\rho^{1/8}$, which is nastier than the $\rho^{1/4}$ singularity obtained from Poisson's problem on a crack domain with mixed boundary conditions. In similar fashion we may produce $u$ with $u \sim \rho^\beta$ with $\beta$ arbitrarily close to 0, and thus only in $H^{1+\epsilon}(\Omega)$ with $\epsilon$ arbitrarily close to 0. Using the same reasoning as in the case of corner singularities, even in these cases it is possible to recover the "optimal" convergence rate $\mathrm{DOF}^{-r/2}$ for $\|u - u_\ell\|_{H^1_0(\Omega)}$ by using adapted shape-regular conforming meshes. The question of whether AFEM will also recover this rate is more complicated, because the algorithm must also approximate the coefficient $a$ suitably; see [14] for an in-depth discussion.

We now move to three-dimensional problems and consider the regularity of vertex and edge singularities. We first consider vertex singularities of the form $\rho^\alpha$ with $\rho$ the distance to a point. Given polynomial degree $r$, we have as above that $D^{r+1}\rho^\alpha \sim \rho^{\alpha - r - 1}$.
In order to test whether $D^{r+1}\rho^\alpha$ is $\tau$-integrable for some $\tau > \frac{2d}{2r+d} = \frac{6}{2r+3}$, we use spherical coordinates to write
$$\int_\Omega |D^{r+1}\rho^\alpha|^\tau\,dx \sim \int_0^1 |\rho^{\alpha - r - 1}|^\tau\rho^2\,d\rho \sim \int_0^1 \rho^{\tau(\alpha - r - 1) + 2}\,d\rho. \tag{3.86}$$
This integral is finite if $\tau(\alpha - r - 1) + 2 > -1$, or $\tau < \frac{3}{r + 1 - \alpha}$ (assuming that $\alpha - r - 1 < 0$, which holds in interesting cases). Combined with the requirement $\tau > \frac{6}{2r+3}$, we thus require
$$\frac{6}{2r+3} < \tau < \frac{3}{r + 1 - \alpha}.$$
There are solutions $\tau$ to this relationship if $2(r + 1 - \alpha) < 2r + 3$, or $\alpha > -1/2$. This condition is precisely the one required to guarantee that $\rho^\alpha \in H^1(\Omega)$, so as in the 2D case any such singularity lying in $H^1$ can always be approximated with the best possible rate $\mathrm{DOF}^{-r/3}$ using AFEM on shape-regular meshes. We thus may formulate the following heuristic: vertex singularities lying in $H^1$ have infinite smoothness in the scale of Besov spaces used to measure approximability by classes of shape-regular simplicial adaptive meshes.

We now turn to edge singularities. Here the situation is markedly different, as we see below. We first return to the case of a polyhedral domain having maximum edge opening angle $\omega_e = \frac{7\pi}{8}$, as in Figure 1.12. Recall that the edge singularity at this edge is of the form $\rho_e^{\pi/\omega_e} = \rho_e^{8/7}$, where $\rho_e$ is the distance to the given edge $e$. In Figure 1.13 we observed an adaptive convergence rate of $\mathrm{DOF}^{-1.12}$ in $H^1_0(\Omega)$ when using polynomials of degree $r = 4$. We now explain this rate. We require $u \in B^{s+1}_\tau(L_\tau(\Omega))$ with $\frac{1}{\tau} < \frac{s}{d} + \frac{1}{2}$, or $\tau > \frac{2d}{2s+d}$. We now test the $\tau$-integrability of $D^{s+1}\rho_e^{8/7}$. As for vertex singularities, we have $D^{s+1}\rho_e^{8/7} \sim \rho_e^{8/7 - s - 1}$. Also, using now cylindrical coordinates, we have
$$\int_\Omega |D^{s+1}\rho_e^{8/7}|^\tau\,dx \sim \int_0^1\int_0^{7\pi/8}\int_0^1 |\rho_e^{8/7 - s - 1}|^\tau\rho_e\,d\rho_e\,d\theta\,dz \sim \int_0^1 |\rho_e^{8/7 - s - 1}|^\tau\rho_e\,d\rho_e.$$
Assuming that $8/7 - s - 1 < 0$ (which again holds in the cases of interest to us), this expression is finite if $\tau(8/7 - s - 1) + 1 > -1$, or $\tau < \frac{2}{s + 1 - 8/7}$.
Thus in order to achieve order $s/d$ convergence, we require
$$\frac{6}{2s+3} < \tau < \frac{2}{s + 1 - 8/7}.$$
Solving for $s$, we find that there exists such a $\tau$ only if $\frac{6}{2s+3} < \frac{2}{s + 1 - 8/7}$, or $s < \frac{24}{7}$. This yields a convergence rate of $s/3 < \frac{8}{7} \approx 1.143$, which is quite close to our observed adaptive convergence rate of 1.12 in Figure 1.13. Note that using polynomial degree $r = 3$ will lead to the generally best possible rate of $\mathrm{DOF}^{-1}$, while increasing the polynomial degree above 4 will never yield a convergence rate better than $\mathrm{DOF}^{-8/7+\epsilon}$. This situation reminds us to be careful when stating that a method is "optimal", as the above method is optimal in one sense but suboptimal in another. We have said that an AFEM is rate-optimal if it achieves the best possible convergence rate over all conforming meshes obtainable by systematic bisection from $\mathcal{T}_0$. Our AFEM is indeed optimal in this sense, and the above calculations and numerical computations confirm that this is so. On the other hand, we also think of a method as being optimal if it achieves the best possible convergence rate with respect to the polynomial degree, that is, $\mathrm{DOF}^{-r/d}$. In this sense the method used above (AFEM with polynomial degree $r = 4$) is not optimal, as it does not achieve a rate $\mathrm{DOF}^{-4/3}$. Anisotropic elements would be needed to recover an optimal convergence rate in this sense; cf. [7].

We now generalize the calculation. Assume an edge opening angle of $\omega_e$, which yields an edge singularity for Poisson's problem with homogeneous Dirichlet boundary conditions of the form $\rho_e^{\pi/\omega_e}$. Then $D^{s+1}\rho_e^{\pi/\omega_e} \sim \rho_e^{\pi/\omega_e - s - 1}$, and
$$\int_\Omega |D^{s+1}\rho_e^{\pi/\omega_e}|^\tau\,dx \sim \int_0^1 \rho_e^{\tau(\pi/\omega_e - s - 1) + 1}\,d\rho_e,$$
which is finite when $\tau(\pi/\omega_e - s - 1) + 1 > -1$, that is, $\tau < \frac{2}{s + 1 - \pi/\omega_e}$. Combining with the condition $\tau > \frac{2d}{2s+d}$ yields
$$\frac{6}{2s+3} < \tau < \frac{2}{s + 1 - \pi/\omega_e},$$
which when solved for $s$ yields $s < \frac{3\pi}{\omega_e}$. Thus the best possible rate that can generally be achieved is $\frac{s}{d} < \frac{\pi}{\omega_e}$.
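The computation above compares two bounds on $\tau$. The Python sketch below (the helper names `tau_window` and `predicted_edge_rate` are hypothetical) encodes them with exact fractions for $d = 3$, entering the opening angle as the ratio $\omega_e/\pi$. It reproduces the threshold $s = 24/7$ for $\omega_e = 7\pi/8$ and the resulting rate cap $\min(r/3, \pi/\omega_e)$. It only mirrors the heuristic integrability argument of the text, not a proof, and the cap $\pi/\omega_e$ itself is not attained (one only gets $\pi/\omega_e - \epsilon$).

```python
from fractions import Fraction

def tau_window(s, beta):
    """Heuristic admissible interval (lower, upper) for tau, edge singularity in 3D.

    lower: tau > 2d/(2s+d) with d = 3, so B^{s+1}_tau(L_tau) embeds compactly in H^1.
    upper: tau < 2/(s+1-beta), so D^{s+1} rho_e^beta is tau-integrable near the edge
           (cylindrical coordinates); assumes s + 1 > beta.
    The window is nonempty exactly when lower < upper.
    """
    d = 3
    lower = Fraction(2 * d, 2 * s + d)
    upper = Fraction(2) / (s + 1 - beta)
    return lower, upper

def predicted_edge_rate(r, omega_over_pi):
    """Heuristic cap on the H^1 rate exponent in 3D: min(r/3, pi/omega_e)."""
    beta = 1 / Fraction(omega_over_pi)     # singularity exponent pi/omega_e
    return min(Fraction(r, 3), beta)

omega = Fraction(7, 8)                     # omega_e = 7*pi/8 (the domain of Figure 1.12)
beta = 1 / omega                           # singularity exponent 8/7
lo, hi = tau_window(Fraction(24, 7) - Fraction(1, 100), beta)
print(lo < hi)                             # True: s slightly below 24/7 is admissible
lo, hi = tau_window(Fraction(24, 7), beta)
print(lo == hi)                            # True: the window closes at s = 24/7
for r in (3, 4, 5):
    print(r, predicted_edge_rate(r, omega))   # prints 3 1, then 4 8/7, then 5 8/7
```

Working with exact fractions makes the boundary case $s = 24/7$, where the two bounds coincide at $\tau = 14/23$, visible without any rounding ambiguity.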
More precisely, an AFEM for solving $-\Delta u = 1$ with homogeneous Dirichlet boundary conditions on a polyhedral domain with maximum edge opening angle $\omega_e$ will achieve a convergence rate of
$$\|u-u_\ell\|_{H_0^1(\Omega)} \lesssim DOF^{-s/3}, \qquad \frac{s}{3} = \min\left(\frac{r}{3}, \frac{\pi}{\omega_e}-\epsilon\right).$$
Recalling that the largest possible edge opening angle is $2\pi$, there are cases where the best possible convergence rate even with high-degree polynomials is $\frac{\pi}{2\pi} = \frac12$. This rate is already achievable with quadratics, and using shape-regular finite elements of any higher degree would simply waste degrees of freedom. If we were to take mixed instead of Dirichlet boundary conditions, the singularity strength could be so severe that even using quadratic elements would not be helpful.

[Figure 3.2: Two-brick domain.]

We finally briefly discuss the situation when we wish to approximate $u$ in $L_p(\Omega)$. Here rigorous proof of AFEM optimality is only available when $p = 2$ (and with the restriction that $\Omega$ is convex), but the same principles can generally be used to guess optimal convergence rates. We consider an example from [21] in which $u$ is approximated in $L_\infty(\Omega)$. Our $L_\infty$ AFEM is structured as follows. Let
$$\eta_{\infty;\ell}(T) = h_T^2\|f+\Delta u_\ell\|_{L_\infty(T)} + h_T\|[\![\nabla u_\ell]\!]\|_{L_\infty(\partial T)}.$$
Then [21, 22]
$$\|u-u_\ell\|_{L_\infty(\Omega)} \lesssim \Big(1+\big|\ln \min_{T\in\mathcal{T}_\ell} h_T\big|\Big)\max_{T\in\mathcal{T}_\ell}\eta_{\infty;\ell}(T).$$
We now use a maximum strategy to mark elements for refinement. That is, for marking parameter $0 < \theta \le 1$,
$$\mathcal{M}_\ell = \Big\{T \in \mathcal{T}_\ell : \eta_{\infty;\ell}(T) \ge \theta \max_{T'\in\mathcal{T}_\ell}\eta_{\infty;\ell}(T')\Big\}.$$
We run AFEM on the two-brick domain that we considered in Figure 2.2 (cf. Figure 2.3), which we reproduce in Figure 3.2. We used polynomial degrees $r = 2, 3$, which produced the sample adaptive mesh in Figure 3.3 and the convergence rates seen in Figure 3.4. In $L_p$ norms, the best convergence rate that we can hope for a priori is $\|u-u_h\|_{L_p(\Omega)} \le Ch^{r+1}$. Similarly, in an adaptive setting we aim for $\|u-u_\ell\|_{L_p(\Omega)} \lesssim DOF^{-(r+1)/d}$.
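The maximum marking strategy is a simple filter over the element indicators. A minimal NumPy sketch, assuming the indicators $\eta_{\infty;\ell}(T)$ and the element sizes $h_T$ are already available as arrays (the function names are hypothetical); `linf_error_bound` evaluates the right-hand side of the $L_\infty$ bound without its unknown constant.

```python
import numpy as np

def maximum_marking(eta, theta):
    """Maximum strategy: indices of T with eta(T) >= theta * max_T' eta(T'), 0 < theta <= 1."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    eta = np.asarray(eta, dtype=float)
    return np.flatnonzero(eta >= theta * eta.max())

def linf_error_bound(eta, h):
    """(1 + |ln min_T h_T|) * max_T eta(T): the L_inf bound up to its hidden constant."""
    return (1.0 + abs(np.log(np.min(h)))) * np.max(eta)

eta = np.array([0.10, 0.50, 0.45, 0.20])
h = np.array([0.25, 0.125, 0.125, 0.25])
print(maximum_marking(eta, 0.8))   # [1 2]
print(linf_error_bound(eta, h))    # (1 + ln 8) * 0.5
```

With $\theta = 1$ only the elements attaining the maximum are marked; smaller $\theta$ marks more elements per step.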
In particular, when measuring the error in $L_\infty$ in a 3D calculation, we would like to see a convergence rate of $DOF^{-2/3}$ when using linears ($r = 1$), $DOF^{-1}$ when using quadratics ($r = 2$), etc. In Figure 3.4, however, we only see a convergence rate of $DOF^{-2/3}$ (already achievable using piecewise linears) even if we use quadratics or cubics. To understand these convergence rates, recall that $u$ is approximable with rate $s/d$ in $H_0^1(\Omega)$ if $u$ also lies in a Besov space with smoothness index $s+1$ (and equal integrability and fine-tuning indices) which compactly imbeds into $H_0^1(\Omega)$. We can analogously guess that $u$ can be approximated with rate $\frac{s+1}{d}$ in $L_\infty$ if $u$ also lies in a Besov space $B^{s+1}_\tau(L_\tau(\Omega))$ which compactly imbeds into $L_\infty(\Omega)$. A DeVore diagram tells us that the relevant condition is $s+1 > \frac{3}{\tau}$, that is, we need integrability $\tau > \frac{3}{s+1}$. On the other hand, the edge singularity at the L-shaped edges is of strength $\rho_e^{\pi/(3\pi/2)} = \rho_e^{2/3}$. As above, $\int_\Omega |D^{s+1}\rho_e^{2/3}|^\tau\,dx$ is finite when $\tau < \frac{2}{s+1/3}$, so overall we require
$$\frac{3}{s+1} < \tau < \frac{2}{s+1/3}.$$
These conditions are solvable when $s < 1$, yielding a convergence rate of $\frac{s+1}{d} < \frac{2}{3}$. This is precisely what is observed in Figure 3.4. We emphasize again that in this case, using any element degree above $r = 1$ simply wastes computational effort, as we see in Figure 3.4 (the error line for $r = 3$ is above that for $r = 2$, so when we use cubics more degrees of freedom are required to achieve a given error level).

[Figure 3.3: Two-brick domain; adaptive mesh.]

[Figure 3.4: Two-brick domain: convergence rates. Log-log plot of the $L_\infty$ error and estimator against dof for P2 and P3 elements, with a reference line of slope $-2/3$.]

3.3 Other a posteriori error estimation techniques

Above we have focused on residual-type a posteriori error estimates.
Such estimates continue to be widely studied and employed, and they are the easiest and most natural to work with in the context of AFEM convergence theory. However, they also have some drawbacks, and there are quite a number of other a posteriori error estimation techniques with varying applications and properties. As above, we assume that our goal is to approximate a solution to an elliptic boundary value problem, say $-\Delta u = f$ or $-\mathrm{div}(A\nabla u) = f$, with appropriate boundary conditions prescribed. Some of the issues one might consider in choosing an estimator are:

1. What information about $u$ are you trying to obtain from the calculation? Do you want to control the error $u - u_h$ in some norm, or the error in some other quantity of interest derived from $u$?
2. How much are you willing to pay computationally in order to achieve more accurate error estimation?
3. How accurately must your estimator track the actual error?
4. Are you mainly interested in driving adaptivity, or in estimating computational errors?
5. How flexible do you want your estimator to be with respect to coding and differential operator? That is, are you willing to accept an estimator that must be substantially modified when you change the differential operator, or do you have a strong preference for one that is more portable?
6. Do you insist on rigorous and provable a posteriori upper bounds (reliability), or are you willing to accept an estimator with guaranteed properties only on sufficiently fine meshes?
7. Are you using only low-order finite element methods, or higher-degree (or $p$ or $hp$) methods for which polynomial-degree robustness matters?

We illustrate varying approaches to balancing the demands posed by these questions by giving a few examples of other a posteriori error estimation techniques. Our set of examples is decidedly non-exhaustive and should not be viewed as a survey of current techniques.

3.3.1 Goal-oriented adaptivity

The essence of goal-oriented adaptivity is that one should first identify the desired output from a given calculation, then design an adaptive algorithm that will efficiently reduce the error in producing that output. This idea was popularized in the late 1990s and early 2000s and continues to be a viable and important part of the a posteriori error estimation and adaptivity landscape. An important reference is the book [8] of Bangerth and Rannacher; cf. the survey article [11] of Becker and Rannacher. Also, one often hears the term "dual-weighted residual method" (DWR) in the context of goal-oriented adaptivity, and the two terms are sometimes used almost interchangeably. Goal-oriented adaptivity is more properly seen as the general approach to a posteriori error estimation, while dual-weighted residual methods are the means used to implement goal-oriented adaptivity.

The core assumption of goal-oriented adaptivity is that one wishes to compute some functional $J(u)$ of the PDE solution. The functional $J$ may not be closely related to the energy norm or other $L_p$-type norms that are typically used in finite element error analysis. An important feature of $J$ is that it is most often locally supported within the overall computational domain $\Omega$. We also mention that analysis of such methods is much simpler when $J$ is linear and continuous in the $H^1$ norm, but the methodology may still be quite effective even if one or both of these assumptions is violated.

We give an illustrative example from [8, Chapter 1]. Consider momentarily the classical incompressible Navier-Stokes equations
$$\partial_t v - \nu\Delta v + v\cdot\nabla v + \nabla p = f, \qquad \nabla\cdot v = 0.$$
Here $v$ is the fluid velocity and $p$ the pressure. We assume the problem is posed on a domain $\Omega$ consisting of a rectangle with a small ball removed; we denote by $S$ the boundary of the ball.
Solving the Navier-Stokes equations with appropriate boundary conditions will allow us to compute the flow around this obstacle. Our goal here is to compute the drag coefficient of the obstacle, given by
$$J(v, p) := c_{\mathrm{drag}} := \frac{2}{U^2 D}\int_S n^T(2\nu\tau - pI)\,d\,ds,$$
where $D$ is the diameter of the ball, $U$ is the maximal inflow velocity, $\tau = \frac12(\nabla v + \nabla v^T)$ is the strain tensor, and $d = (0, 1)^T$ is the main flow direction. (Some parameters may be a little unclear here; we are only after the general structure of $J$.) $J$ is then linear in $v$ and $p$, but it is not bounded on $H^1 \times L_2$ (a natural energy space here) because it involves surface integrals of $\nabla v$ and $p$. Thus controlling the energy error here will not guarantee control of the error in computing $J$. On the other hand, $J$ is locally supported on the surface $S$, which leads us to ask how accurately we must compute $(v, p)$ in regions of $\Omega$ removed from $S$ in order to guarantee accurate computation of $J(v, p)$. The dual-weighted residual methodology gives us the ability to approach these questions in a meaningful and computationally efficient manner.

We return to the model problem $-\Delta u = f$, $u = 0$ on $\partial\Omega$ in order to give more details (cf. [8, Chapter 3] for much of this discussion). Our goal is to control the error $J(u) - J(u_h)$ in approximating the goal functional $J(u)$. The DWR approach involves computing a dual solution that may be viewed as a response function or generalized Green's function for the functional $J$ of interest. In particular, we let $a(u, v) := (\nabla u, \nabla v)$, and let $z$ solve
$$a(\varphi, z) = J(\varphi), \qquad \varphi \in H_0^1(\Omega).$$
The finite element approximation $z_h \in S_h$ to $z$ is given by
$$a(\varphi_h, z_h) = J(\varphi_h), \qquad \varphi_h \in S_h.$$
Assuming now that $J$ is linear, we have for any $\psi_h \in S_h$ that
$$J(u) - J(u_h) = J(u - u_h) = a(u - u_h, z) = a(u - u_h, z - \psi_h). \qquad (3.87)$$
Computationally we have access to $u_h$ and $z_h$, but not to $u$ and $z$, and so we can attempt to build an a posteriori error estimate out of $u_h$ and $z_h$.
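Identity (3.87) can be seen at work in a setting small enough to check by hand. The sketch below is a hypothetical 1D analogue, not taken from the text: $-u'' = 1$ on $(0,1)$ with homogeneous Dirichlet conditions, piecewise linear elements on a uniform mesh, and the point functional $J(v) = v(x_0)$ with $x_0$ a mesh node (in 1D, point evaluation is bounded on $H^1$). The dual solution is then the Green's function $z(x) = x(1-x_0)$ for $x \le x_0$ and $z(x) = x_0(1-x)$ for $x \ge x_0$, which is piecewise linear with its kink at a node, so $z \in S_h$. Taking $\psi_h = z$ in (3.87) gives $J(u) - J(u_h) = 0$; the code confirms this, along with $J(u_h) = a(u_h, z_h) = (f, z_h)$.

```python
import numpy as np

def p1_poisson_1d(n):
    """Stiffness matrix K_ij = a(phi_j, phi_i) for -u'' on (0,1), n uniform cells, interior nodes."""
    h = 1.0 / n
    K = (2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]
    return K, x, h

def primal_and_dual(n, i0):
    """Primal u_h for f = 1 and dual z_h for the point functional J(v) = v(x0), x0 = x[i0]."""
    K, x, h = p1_poisson_1d(n)
    F = h * np.ones(n - 1)          # (f, phi_i) = h exactly for f = 1
    U = np.linalg.solve(K, F)       # a(u_h, phi) = (f, phi) for all phi in S_h
    j = np.zeros(n - 1)
    j[i0] = 1.0                     # J(phi_i) = phi_i(x0) = delta_{i, i0}
    Z = np.linalg.solve(K.T, j)     # a(phi, z_h) = J(phi) for all phi in S_h
    return x, U, Z, F

n, i0 = 8, 3
x, U, Z, F = primal_and_dual(n, i0)
x0 = x[i0]
G = np.where(x <= x0, x * (1 - x0), x0 * (1 - x))   # exact Green's function at the nodes
print(np.allclose(Z, G))                 # True: z_h = z because z lies in S_h
print(np.allclose(U, x * (1 - x) / 2))   # True: u_h(x0) = u(x0), so J(u) - J(u_h) = 0
print(np.isclose(U[i0], F @ Z))          # True: J(u_h) = a(u_h, z_h) = (f, z_h)
```

In 2D or 3D the point functional is not bounded on $H^1$ and $z \notin S_h$, which is why the regularization and weighting discussed below are needed.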
There are multiple options for approaching this task. The most primitive is to set $\psi_h = z_h$ and use standard residual (or other) estimators for measuring the product $\|u - u_h\|_{H^1(\Omega)}\|z - z_h\|_{H^1(\Omega)}$. This approach assumes that $z \in H^1(\Omega)$ and costs us the ability to take advantage of local interactions between $u - u_h$ and $z - z_h$, but it is simple and straightforward, allows use of just about any a posteriori error estimator off the shelf, and still may lead to significantly faster error decrease than considering the error $\|u - u_h\|$ in isolation. A second approach based on the parallelogram identity is also possible; cf. [3, Chapter 8].

Returning to (3.87) and following again [8], we again assume that $\psi_h \in S_h$ is arbitrary. We have from (3.87) that
$$J(u) - J(u_h) = (f, z - \psi_h) - a(u_h, z - \psi_h) = \rho(u_h)(z - \psi_h),$$
where $\rho(u_h)$ is the residual. Standard elementwise integration by parts yields
$$\rho(u_h)(z - \psi_h) = \sum_{T\in\mathcal{T}_h}\left(\int_T (f + \Delta u_h)(z - \psi_h) - \frac12\int_{\partial T}[\![\nabla u_h]\!](z - \psi_h)\right) \qquad (3.88)$$
$$\le \sum_{T\in\mathcal{T}_h}\rho_T\,\omega_T, \qquad (3.89)$$
where
$$\rho_T = \left(\|f + \Delta u_h\|^2_{L_2(T)} + h_T^{-1}\|[\![\nabla u_h]\!]\|^2_{L_2(\partial T)}\right)^{1/2}$$
may be termed smoothness indicators and
$$\omega_T = \left(\|z - \psi_h\|^2_{L_2(T)} + h_T\|z - \psi_h\|^2_{L_2(\partial T)}\right)^{1/2}$$
may be termed influence factors.

The trick now is to meaningfully bound $z - \psi_h$, using $z_h$ and/or other information about $z$. We illustrate with a specific example. Let $J(u) = u(x_0)$ for a given point $x_0 \in \Omega$. The dual solution $z$ is then the Green's function, satisfying $a(\varphi, z) = \varphi(x_0)$. Typically the functional $J$ and therefore also the dual solution $z$ are regularized by setting $J(u) = \frac{1}{|B_\epsilon|}\int_{B_\epsilon} u(x)\,dx$, where $B_\epsilon$ is the ball of radius $\epsilon$ centered at $x_0$. Averaging $u$ over a small ball around $x_0$ generally gives a close approximation to $u(x_0)$, with error depending on the smoothness of $u$ and on $\epsilon$. Let $r(x) = \sqrt{|x - x_0|^2 + \epsilon^2}$ be a regularized distance to $x_0$. It is known that in 2D, $z(x) \sim \log(r(x))$ and, for $|\alpha| > 0$, $D^\alpha z(x) \sim r(x)^{-|\alpha|}$ (at least assuming that $\partial\Omega$ is sufficiently regular).
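Given these asymptotics, the influence factors can be tabulated heuristically: an interpolation estimate gives $\omega_T \lesssim h_T^2\|D^2 z\|_{L_2(T)}$, and with $D^2 z \sim r^{-2}$ and $|T|^{1/2} \approx h_T$ in 2D this is of size $h_T^3\|r^{-2}\|_{L_\infty(T)}$. The NumPy sketch below computes such weights for a list of triangles. The input layout and function names are hypothetical, and the maximum of $r^{-2}$ over $T$ is only approximated by sampling the vertices and the centroid (an assumption, not an exact supremum).

```python
import numpy as np

def dwr_point_weights(tri, x0, eps):
    """Heuristic DWR weights omega_T ~ h_T^3 * max_T r^{-2} for the regularized point functional.

    tri: array (nT, 3, 2) of triangle vertex coordinates (hypothetical mesh format).
    r(x) = sqrt(|x - x0|^2 + eps^2); max over T approximated at vertices and centroid.
    """
    tri = np.asarray(tri, dtype=float)
    centroid = tri.mean(axis=1, keepdims=True)              # (nT, 1, 2)
    samples = np.concatenate([tri, centroid], axis=1)       # (nT, 4, 2)
    r2 = np.sum((samples - np.asarray(x0, dtype=float)) ** 2, axis=2) + eps**2
    max_inv_r2 = np.max(1.0 / r2, axis=1)                   # sampled ||r^{-2}||_{L_inf(T)}
    edges = tri - np.roll(tri, 1, axis=1)                   # the three edge vectors
    h_T = np.sqrt(np.sum(edges**2, axis=2)).max(axis=1)     # diameter = longest edge
    return h_T**3 * max_inv_r2

def dwr_indicators(rho, tri, x0, eps):
    """Elementwise products rho_T * omega_T; their sum mimics the DWR error estimate."""
    return np.asarray(rho, dtype=float) * dwr_point_weights(tri, x0, eps)

near = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
far = [[0.5, 0.5], [0.6, 0.5], [0.5, 0.6]]
print(dwr_point_weights([near, far], (0.0, 0.0), 0.01))   # the element at x0 dominates
```

Congruent elements thus receive very different weights depending on their distance to $x_0$, which is what steers refinement toward the point of interest.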
By choosing $\psi_h$ suitably (for example, as an interpolant of $z$), we can then compute that
$$\omega_T \approx h_T^2\|D^2 z\|_{L_2(T)} \approx h_T^2|T|^{1/2}\|r^{-2}\|_{L_\infty(T)} \approx h_T^3\|r^{-2}\|_{L_\infty(T)},$$
where in the last step we used $|T|^{1/2} \approx h_T$ in 2D. Thus
$$|J(u) - J(u_h)| \approx \sum_{T\in\mathcal{T}_h} h_T^3\|r^{-2}\|_{L_\infty(T)}\,\rho_T.$$
This sketch is very incomplete. For example, in general information about $D^2 z$ must be estimated from $z_h$, which can become computationally involved. However, in general DWR estimators do a very effective job at directing adaptive refinement to compute $u(x_0)$ and other locally supported quantities of interest efficiently, and with proper adjustments also at estimating the resulting error.

There are also drawbacks. First, computation of the dual solution $z_h$ will be viewed by many as relatively expensive. It involves solution of a global dual problem that in the current situation is just as costly as solving the original finite element equations. It is however often pointed out that in the case of nonlinear problems, the dual problem is still linear and thus the added expense may be relatively minor. In addition, the use of DWR methods becomes less clear as the functional $J$ becomes more nonlinear. If one wants for example to control $\|u - u_h\|_{L_\infty(\Omega)}$ instead of $(u - u_h)(x_0)$, then it is not so clear how to choose $J$ and compute the associated dual solution $z$. If error control in norms is desired, it may make sense to instead resort to other techniques. Finally, the most effective techniques for estimating $Dz$ (which are derived from difference quotients) do not lead to provably reliable estimators, even though they are typically effective in practice.

3.3.2 Recovery estimators

We next discuss recovery-type a posteriori error estimators. Broadly speaking, such estimators work by "recovering" an approximation $G(u_h)$ to $\nabla u$ from the discrete solution $u_h$ which is superconvergent or otherwise a better approximation to $\nabla u$ than is $\nabla u_h$. By superconvergent, we mean that
$$\frac{\|G(u_h) - \nabla u\|}{\|\nabla(u - u_h)\|} \to 0 \quad \text{as } h \to 0. \qquad (3.90)$$
A standard and popular example is the Zienkiewicz-Zhu superconvergent patch recovery estimator; we take our discussion from [3, Chapter 4]. Let $\mathcal{N}$ be the set of nodes in a mesh $\mathcal{T}_h$. Given $z \in \mathcal{N}$, we denote by $\omega_z$ the patch of elements sharing the vertex $z$. Assume also that $S_h$ is a standard piecewise linear finite element space. The recovered gradient $G(u_h)$ will be a continuous piecewise linear finite element function in each component. We begin by defining an auxiliary function $g_z$ that is linear in each component: $g_z$ has $x$-component $\alpha_{1,x} + \alpha_{2,x}x + \alpha_{3,x}y$, and similarly for the $y$-component. Given $T \subset \omega_z$, let $b_T = (b_{T,x}, b_{T,y})$ be the barycenter of $T$. Then $g_z(x, y) = (\alpha_{1,x} + \alpha_{2,x}x + \alpha_{3,x}y,\ \alpha_{1,y} + \alpha_{2,y}x + \alpha_{3,y}y)$ is taken to be the vector function which uniquely minimizes
$$\sum_{T\subset\omega_z}\left(\frac{\partial u_h}{\partial x}(b_T) - (\alpha_{1,x} + \alpha_{2,x}b_{T,x} + \alpha_{3,x}b_{T,y})\right)^2,$$
and similarly for the $y$-component. The recovered gradient $G(u_h)$ is the piecewise linear function (in each component) with value $g_z(z)$ at each node $z$. Assuming that (3.90) holds, we then have
$$\frac{\|G(u_h) - \nabla u_h\|_{L_2(\Omega)}}{\|\nabla(u - u_h)\|_{L_2(\Omega)}} \le \frac{\|G(u_h) - \nabla u\|_{L_2(\Omega)} + \|\nabla(u - u_h)\|_{L_2(\Omega)}}{\|\nabla(u - u_h)\|_{L_2(\Omega)}} \to 1 \quad \text{as } h \to 0.$$
A similar lower bound can be obtained by using the triangle inequality in the other direction. We thus have $\|G(u_h) - \nabla u_h\|_{L_2(\Omega)}/\|\nabla(u - u_h)\|_{L_2(\Omega)} \to 1$ as $h \to 0$. We say that the error estimator $\|G(u_h) - \nabla u_h\|_{L_2(\Omega)}$ is asymptotically exact in this case. The property of asymptotic exactness is highly desirable, and because the ZZ estimator depends directly only on the solution $u_h$ and not on properties of the underlying PDE, it is also very easy to program into a highly portable code. There are two main drawbacks to the ZZ estimator, however. First, while the superconvergence property (3.90) is often observed in practice, it is much harder to establish theoretically in much generality.
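The patch fit described above is an ordinary linear least-squares problem, one per gradient component. A minimal NumPy sketch of the value $g_z(z)$ at one vertex, assuming the barycenters and elementwise gradients of the patch have already been extracted from the mesh (the function name and input layout are hypothetical); assembling $G(u_h)$ would repeat this at every node, and the estimator is then $\|G(u_h) - \nabla u_h\|_{L_2(\Omega)}$.

```python
import numpy as np

def zz_patch_value(z, barycenters, grads):
    """ZZ-style least-squares recovery of grad u_h at a vertex z (2D, piecewise linear u_h).

    barycenters: (m, 2) barycenters b_T of the triangles T in the patch omega_z.
    grads:       (m, 2) constant gradient of u_h on each T (its value at b_T).
    Each component is fit by alpha1 + alpha2*x + alpha3*y in the least-squares sense;
    the fit is unique only if at least three barycenters are not collinear.
    Returns g_z(z), the recovered gradient value at z.
    """
    b = np.asarray(barycenters, dtype=float)
    g = np.asarray(grads, dtype=float)
    A = np.column_stack([np.ones(len(b)), b[:, 0], b[:, 1]])   # rows (1, x, y) at each b_T
    coeffs, *_ = np.linalg.lstsq(A, g, rcond=None)             # (3, 2): one column per component
    return np.array([1.0, z[0], z[1]]) @ coeffs

# Six triangles around z = (0, 0); if the sampled gradients come from a linear field,
# the recovery reproduces that field exactly.
b = np.array([[0.3, 0.1], [0.1, 0.3], [-0.2, 0.2], [-0.3, -0.1], [-0.1, -0.3], [0.2, -0.2]])
g = np.column_stack([1 + 2 * b[:, 0] - b[:, 1], 3 - b[:, 0] + 0.5 * b[:, 1]])
print(zz_patch_value((0.0, 0.0), b, g))   # approximately [1. 3.]
```

Reproducing linear gradient fields exactly is the property that underlies the superconvergence of the recovered gradient on sufficiently structured meshes.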
On highly structured (uniform) grids and for linear elements it is observed, and can be proved, that $\nabla u_h(b_T)$ is a higher-order approximation to $\nabla u(b_T)$ than is generally the case at other points. That is, the finite element method using linear elements on uniform grids is superconvergent at barycenters for gradients. However, superconvergence is observed computationally in much more general circumstances. The paper [10] explains this observation by showing that if the grid satisfies a much weaker regularity property involving "near-parallelograms" formed by adjacent mesh elements, then sufficient superconvergence will occur to guarantee (3.90) for suitably defined recovery operators.

A second issue involves reliability of ZZ-type (recovery) estimators. Assume that we solve $-\Delta u = f$, and it happens that $0 \ne f \perp S_h$. Although this situation may seem artificial, it is not hard to construct such examples. Then we find $u_h = 0$, and because $G(u_h)$ depends only on $u_h$ and not on $f$, we also have $G(u_h) = 0$. Consequently the error estimate $\|G(u_h) - \nabla u_h\|$ vanishes as well, even though $u \ne 0$. Unlike the residual estimators we studied previously, recovery estimators thus are not reliable. If implemented in an AFEM, the AFEM will stop at this point. Clearly this situation is a disaster if it occurs, although it may not happen often in practice. There are also techniques for "safeguarding" the ZZ estimator by adding on additional terms which guarantee coarse-mesh reliability while preserving asymptotic exactness; cf. [28].

3.3.3 Equilibrated flux estimators

Another option for developing a posteriori error estimators uses somewhat different techniques. Our entry point is the Prager-Synge identity. We present a variation more directly useful for a posteriori error estimation. While we take our immediate presentation from [26], we note that a large number of papers over the past couple of decades have exploited similar ideas in constructing a posteriori error estimates.
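Lemma 3.3.1 Let $u \in H_0^1(\Omega)$ weakly solve $-\Delta u = f$. Let $u_h \in S_h$ be the finite element approximation to $u$ over the mesh $\mathcal{T}_h$, and let $\sigma_h \in H(\mathrm{div};\Omega)$ satisfy $\int_T \mathrm{div}\,\sigma_h\,dx = \int_T f\,dx$ for all $T \in \mathcal{T}_h$. Let also $h_T = \mathrm{diam}(T)$. Then
$$\|\nabla(u - u_h)\|^2_{L_2(\Omega)} \le \sum_{T\in\mathcal{T}_h}\left(\|\nabla u_h + \sigma_h\|_{L_2(T)} + \frac{h_T}{\pi}\|f - \mathrm{div}\,\sigma_h\|_{L_2(T)}\right)^2. \qquad (3.91)$$

Proof. Let $v \in H_0^1(\Omega)$. Then integrating by parts and employing $-\Delta u = f$, we have
$$\int_\Omega \nabla(u - u_h)\cdot\nabla v = \int_\Omega (\nabla u + \sigma_h)\cdot\nabla v - \int_\Omega (\nabla u_h + \sigma_h)\cdot\nabla v = \int_\Omega (f - \mathrm{div}\,\sigma_h)v - \int_\Omega (\nabla u_h + \sigma_h)\cdot\nabla v.$$
Using $\int_T \mathrm{div}\,\sigma_h = \int_T f$, we have $\int_T (f - \mathrm{div}\,\sigma_h)v = \int_T (f - \mathrm{div}\,\sigma_h)(v - v_T)$ for any $v_T \in \mathbb{R}$. We now recall the Poincaré inequality
$$\inf_{w_T\in\mathbb{R}}\|w - w_T\|_{L_2(T)} \le C_T h_T\|\nabla w\|_{L_2(T)}, \qquad w \in H^1(T).$$
Since each element $T$ is convex, it is known that we can in fact take $C_T = 1/\pi$ [36]. Thus breaking integrals over $\Omega$ into sums of integrals over mesh elements, we have for any $v \in H_0^1(\Omega)$
$$\int_\Omega \nabla(u - u_h)\cdot\nabla v = \sum_{T\in\mathcal{T}_h}\left(\int_T (f - \mathrm{div}\,\sigma_h)(v - v_T) - \int_T (\nabla u_h + \sigma_h)\cdot\nabla v\right)$$
$$\le \sum_{T\in\mathcal{T}_h}\left(\frac{h_T}{\pi}\|f - \mathrm{div}\,\sigma_h\|_{L_2(T)} + \|\nabla u_h + \sigma_h\|_{L_2(T)}\right)\|\nabla v\|_{L_2(T)}$$
$$\le \left(\sum_{T\in\mathcal{T}_h}\left(\frac{h_T}{\pi}\|f - \mathrm{div}\,\sigma_h\|_{L_2(T)} + \|\nabla u_h + \sigma_h\|_{L_2(T)}\right)^2\right)^{1/2}\|\nabla v\|_{L_2(\Omega)}.$$
Taking $v = u - u_h$ and dividing through by $\|\nabla(u - u_h)\|_{L_2(\Omega)}$ completes the proof. □

Let us reflect on what we have shown. First, note that (3.91) is rigorous (it gives a reliable upper bound), and that there are no unknown constants in (3.91). These two facts together constitute the good news. The bad news is that we have not come up with an effective way to compute $\sigma_h$; it is thus far completely unknown. One option is to solve the standard mixed finite element problem. Let $RT_k \times PC_k$ be a standard mixed finite element space consisting of degree-$k$ ($k \ge 0$) Raviart-Thomas elements and piecewise polynomials. We then seek $(\sigma_h, \tilde u_h) \in RT_k \times PC_k$ such that
$$\int_\Omega \sigma_h\cdot\tau_h = \int_\Omega \tilde u_h\,\mathrm{div}\,\tau_h, \quad \tau_h \in RT_k, \qquad \int_\Omega (\mathrm{div}\,\sigma_h)v_h = \int_\Omega f v_h, \quad v_h \in PC_k. \qquad (3.92)$$
$\sigma_h$ then satisfies the given conditions, including the approximate equilibrium condition $\int_T \mathrm{div}\,\sigma_h = \int_T f$, and we could in theory insert it into our identity.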
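Lemma 3.3.1 can be checked in a 1D analogue where every term is computable by quadrature (in 1D, $H(\mathrm{div})$ and $\mathrm{div}$ reduce to $H^1$ and the derivative). The sketch below assumes $-u'' = 1$ on $(0,1)$ with $u(0) = u(1) = 0$, so that $u = x(1-x)/2$, the P1 solution interpolates $u$ at the nodes, and $\|(u - u_h)'\|_{L_2} = h/\sqrt{12}$ on a uniform mesh. It uses the exact flux $\sigma = -u' = x - 1/2$, which satisfies $\sigma' = f$; in that case the bound (3.91) holds with equality, illustrating that no unknown constant is involved. The function name is hypothetical.

```python
import numpy as np

def equilibrated_estimator_1d(uh_slopes, sigma, dsigma, f, n, nq=4):
    """Right side of (3.91) in 1D: sqrt(sum_T (||u_h' + sigma||_T + h_T/pi ||f - sigma'||_T)^2).

    uh_slopes: constant derivative of the P1 solution on each of n uniform cells of (0,1).
    sigma, dsigma, f: vectorized callables (flux, its derivative, right-hand side).
    """
    nodes = np.linspace(0.0, 1.0, n + 1)
    gp, gw = np.polynomial.legendre.leggauss(nq)   # Gauss-Legendre rule on [-1, 1]
    total = 0.0
    for k in range(n):
        a, b = nodes[k], nodes[k + 1]
        h = b - a
        xq = 0.5 * (a + b) + 0.5 * h * gp          # mapped quadrature points
        wq = 0.5 * h * gw                          # mapped weights
        flux = np.sqrt(np.sum(wq * (uh_slopes[k] + sigma(xq)) ** 2))
        res = np.sqrt(np.sum(wq * (f(xq) - dsigma(xq)) ** 2))
        total += (flux + h / np.pi * res) ** 2
    return np.sqrt(total)

n = 8
h = 1.0 / n
K = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
U = np.concatenate([[0.0], np.linalg.solve(K, h * np.ones(n - 1)), [0.0]])  # P1 solution, f = 1
slopes = np.diff(U) / h
one = lambda x: np.ones_like(x)
eta = equilibrated_estimator_1d(slopes, lambda x: x - 0.5, one, one, n)
print(eta, h / np.sqrt(12))   # equal: sigma = -u' exactly, so the bound is sharp here
```

Any other flux with $\sigma' = f$ (for example $x - 1/2 + c$ for a constant $c$) gives a larger, but still guaranteed, upper bound.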
Using the mixed solution in this way, however, raises an important cost-benefit question. Solving the mixed finite element problem above requires again solving a global system, and it is a saddle point system to boot. It would be more desirable to instead find an appropriate flux approximation $\sigma_h$ which is locally instead of globally defined. We thus employ a local version of (3.92). Given a vertex $z \in \mathcal{N}_h$, let $\omega_z$ be the patch of elements sharing $z$. Also, let $RT_{k,z}$ be the set of functions which are 0 outside of $\omega_z$ and lie in $RT_k$; the latter condition is equivalent to requiring $\tau_h\cdot\vec n = 0$ on $\partial\omega_z \setminus \partial\Omega$, where $\vec n$ is the unit normal on $\partial\omega_z$. We similarly define $PC_{k,z}$ to be the piecewise polynomials of degree $k$ on $\omega_z$ with the constraint $\int_{\omega_z} v_h = 0$ when $z$ is an interior node. Finally, let $\psi_z$ be the standard continuous piecewise linear hat function corresponding to $z$. We then let $(\sigma_{h,z}, r_{h,z}) \in RT_{k,z} \times PC_{k,z}$ solve
$$\int_{\omega_z} \sigma_{h,z}\cdot\tau_{h,z} - \int_{\omega_z} r_{h,z}\,\mathrm{div}\,\tau_{h,z} = -\int_{\omega_z} \psi_z\nabla u_h\cdot\tau_{h,z}, \qquad \tau_{h,z} \in RT_{k,z},$$
$$\int_{\omega_z} (\mathrm{div}\,\sigma_{h,z})v_{h,z} = \int_{\omega_z} (\psi_z f - \nabla\psi_z\cdot\nabla u_h)v_{h,z}, \qquad v_{h,z} \in PC_{k,z}. \qquad (3.93)$$
We then set $\sigma_h = \sum_{z\in\mathcal{N}_h}\sigma_{h,z}$, which is in $H(\mathrm{div};\Omega)$.

We now discuss the properties of $\sigma_h$. First, for any interior node $z$ we have $\int_{\omega_z} \nabla u_h\cdot\nabla\psi_z = \int_{\omega_z} \psi_z f$ (since $u_h$ is the Galerkin solution), so $\int_{\omega_z} (\psi_z f - \nabla\psi_z\cdot\nabla u_h) = 0$. This in turn implies that $\int_{\omega_z} (\mathrm{div}\,\sigma_{h,z})v_h = \int_{\omega_z} (\psi_z f - \nabla\psi_z\cdot\nabla u_h)v_h$ for all $v_h \in PC_k$, not only for those with mean zero. Summing over $z$ and using the fact that $\{\psi_z\}$ is a partition of unity yields
$$\int_\Omega (\mathrm{div}\,\sigma_h)v_h = \sum_{z\in\mathcal{N}_h}\int_\Omega (\psi_z f - \nabla\psi_z\cdot\nabla u_h)v_h = \int_\Omega \Big(f - \nabla\Big(\sum_{z\in\mathcal{N}_h}\psi_z\Big)\cdot\nabla u_h\Big)v_h = \int_\Omega f v_h, \qquad v_h \in PC_k. \qquad (3.94)$$
Thus the equilibrium condition $\int_T \mathrm{div}\,\sigma_h = \int_T f$ is satisfied.

Next note that (3.94) holds elementwise (since $PC_k$ is a broken space), and $\mathrm{div}\,\sigma_h|_T \in P_k(T)$, so that $\|f - \mathrm{div}\,\sigma_h\|_{L_2(T)} = \|f - \Pi_{P_k(T)}f\|_{L_2(T)}$, with $\Pi_{P_k(T)}$ the $L_2(T)$ projection onto $P_k$. Thus (3.91) reduces to
$$\|\nabla(u - u_h)\|^2_{L_2(\Omega)} \le \sum_{T\in\mathcal{T}_h}\left(\|\nabla u_h + \sigma_h\|_{L_2(T)} + \frac{h_T}{\pi}\|f - \Pi_{P_k(T)}f\|_{L_2(T)}\right)^2. \qquad (3.95)$$
The latter term is a data oscillation term and is generally of higher order. If we take $k = r$, then $\|\nabla(u - u_h)\|_{L_2(\Omega)} = O(h^r)$ and $h_T\|f - \Pi_{P_k(T)}f\|_{L_2(T)} = O(h^{r+2})$, so that we expect $\|\nabla u_h + \sigma_h\|$ to dominate the estimator asymptotically. Numerical results (cf. [26]) indicate that very good correlation between errors and estimators may be obtained with this method, with efficiency indices quite close to 1 and robust with respect to polynomial degree. Because the upper bound is also guaranteed with no unknown constants, we see that these estimators have some attractive features.
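The claim that the oscillation term is of higher order can be observed numerically. A minimal 1D sketch, assuming $k = 0$ (so $\Pi_{P_0(T)}f$ is the elementwise mean) and the smooth datum $f = e^x$; both choices, and the function name, are illustrative and not from the text. Halving $h$ should reduce $\big(\sum_T (\frac{h_T}{\pi}\|f - \Pi_{P_0(T)}f\|_{L_2(T)})^2\big)^{1/2}$ by about $2^{k+2} = 4$.

```python
import numpy as np

def oscillation_k0(f, n, nq=6):
    """sqrt(sum_T (h_T/pi * ||f - Pi_0 f||_{L2(T)})^2) on a uniform mesh of (0,1), k = 0.

    Pi_0 f is the elementwise mean, i.e. the L2(T) projection onto constants.
    """
    nodes = np.linspace(0.0, 1.0, n + 1)
    gp, gw = np.polynomial.legendre.leggauss(nq)
    total = 0.0
    for k in range(n):
        a, b = nodes[k], nodes[k + 1]
        h = b - a
        xq = 0.5 * (a + b) + 0.5 * h * gp
        wq = 0.5 * h * gw
        mean = np.sum(wq * f(xq)) / h                                   # Pi_0 f on this cell
        total += (h / np.pi) ** 2 * np.sum(wq * (f(xq) - mean) ** 2)   # (h/pi)^2 ||f - Pi_0 f||^2
    return np.sqrt(total)

osc = [oscillation_k0(np.exp, n) for n in (16, 32, 64)]
orders = np.log2(np.array(osc[:-1]) / np.array(osc[1:]))
print(orders)   # both entries close to 2 = k + 2
```

For comparison, the energy error of a P1 approximation decreases only like $O(h)$, so on fine meshes the flux term $\|\nabla u_h + \sigma_h\|$ carries essentially all of the estimate.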