Advanced Fixed Point Theory for Economics

Andrew McLennan

April 8, 2014

Preface

Over two decades ago now I wrote a rather long survey of the mathematical theory of fixed points entitled Selected Topics in the Theory of Fixed Points. It had no content that could not be found elsewhere in the mathematical literature, but nonetheless some economists found it useful. Almost as long ago, I began work on the project of turning it into a proper book, and finally that project is coming to fruition. Various events over the years have reinforced my belief that the mathematics presented here will continue to influence the development of theoretical economics, and have intensified my regret about not having completed it sooner.

There is a vast literature on this topic, which has influenced me in many ways, and which cannot be described in any useful way here. Even so, I should say something about how the present work stands in relation to three other books on fixed points. Fixed Point Theorems with Applications to Economics and Game Theory by Kim Border (1985) is a complement, not a substitute, explaining various forms of the fixed point principle such as the KKMS theorem and some of the many theorems of Ky Fan, along with the concrete details of how they are actually applied in economic theory. Fixed Point Theory by Dugundji and Granas (2003) is, even more than this book, a comprehensive treatment of the topic. Its fundamental point of view (applications to nonlinear functional analysis), audience (professional mathematicians), and technical base (there is extensive use of algebraic topology) are quite different, but it is still a work with much to offer to economics. Particularly notable is the extensive and meticulous information concerning the literature and history of the subject, which is full of affection for the theory and its creators. The book that was by far the most useful to me is The Lefschetz Fixed Point Theorem by Robert Brown (1971). Again, his approach and mine have differences rooted in the nature of our audiences, and the overall objectives, but at their cores the two books are quite similar, in large part because I borrowed a great deal.

I would like to thank the many people who, over the years, have commented favorably on Selected Topics. It is a particular pleasure to acknowledge some very detailed and generous written comments by Klaus Ritzberger. This work would not have been possible without the support and affection of my families, both present and past, for which I am forever grateful.

Contents

1 Introduction and Summary
  1.1 The First Fixed Point Theorems
  1.2 "Fixing" Kakutani's Theorem
  1.3 Essential Sets of Fixed Points
  1.4 Index and Degree
    1.4.1 Manifolds
    1.4.2 The Degree
    1.4.3 The Fixed Point Index
  1.5 Topological Consequences
  1.6 Dynamical Systems

I Topological Methods

2 Planes, Polyhedra, and Polytopes
  2.1 Affine Subspaces
  2.2 Convex Sets and Cones
  2.3 Polyhedra
  2.4 Polytopes
  2.5 Polyhedral Complexes
  2.6 Graphs

3 Computing Fixed Points
  3.1 The Lemke-Howson Algorithm
  3.2 Implementation and Degeneracy Resolution
  3.3 Using Games to Find Fixed Points
  3.4 Sperner's Lemma
  3.5 The Scarf Algorithm
  3.6 Homotopy
  3.7 Remarks on Computation

4 Topologies on Spaces of Sets
  4.1 Topological Terminology
  4.2 Spaces of Closed and Compact Sets
  4.3 Vietoris' Theorem
  4.4 Hausdorff Distance
  4.5 Basic Operations on Subsets
    4.5.1 Continuity of Union
    4.5.2 Continuity of Intersection
    4.5.3 Singletons
    4.5.4 Continuity of the Cartesian Product
    4.5.5 The Action of a Function
    4.5.6 The Union of the Elements

5 Topologies on Functions and Correspondences
  5.1 Upper and Lower Semicontinuity
  5.2 The Strong Upper Topology
  5.3 The Weak Upper Topology
  5.4 The Homotopy Principle
  5.5 Continuous Functions

6 Metric Space Theory
  6.1 Paracompactness
  6.2 Partitions of Unity
  6.3 Topological Vector Spaces
  6.4 Banach and Hilbert Spaces
  6.5 Embedding Theorems
  6.6 Dugundji's Theorem

7 Retracts
  7.1 Kinoshita's Example
  7.2 Retracts
  7.3 Euclidean Neighborhood Retracts
  7.4 Absolute Neighborhood Retracts
  7.5 Absolute Retracts
  7.6 Domination

8 Essential Sets of Fixed Points
  8.1 The Fan-Glicksberg Theorem
  8.2 Convex Valued Correspondences
  8.3 Kinoshita's Theorem

9 Approximation of Correspondences
  9.1 The Approximation Result
  9.2 Extending from the Boundary of a Simplex
  9.3 Extending to All of a Simplicial Complex
  9.4 Completing the Argument

II Smooth Methods

10 Differentiable Manifolds
  10.1 Review of Multivariate Calculus
  10.2 Smooth Partitions of Unity
  10.3 Manifolds
  10.4 Smooth Maps
  10.5 Tangent Vectors and Derivatives
  10.6 Submanifolds
  10.7 Tubular Neighborhoods
  10.8 Manifolds with Boundary
  10.9 Classification of Compact 1-Manifolds

11 Sard's Theorem
  11.1 Sets of Measure Zero
  11.2 A Weak Fubini Theorem
  11.3 Sard's Theorem
  11.4 Measure Zero Subsets of Manifolds
  11.5 Genericity of Transversality

12 Degree Theory
  12.1 Orientation
  12.2 Induced Orientation
  12.3 The Degree
  12.4 Composition and Cartesian Product

13 The Fixed Point Index
  13.1 Axioms for an Index on a Single Space
  13.2 Multiple Spaces
  13.3 The Index for Euclidean Spaces
  13.4 Extension by Commutativity
  13.5 Extension by Continuity

III Applications and Extensions

14 Topological Consequences
  14.1 Euler, Lefschetz, and Eilenberg-Montgomery
  14.2 The Hopf Theorem
  14.3 More on Maps Between Spheres
  14.4 Invariance of Domain
  14.5 Essential Sets Revisited

15 Vector Fields and their Equilibria
  15.1 Euclidean Dynamical Systems
  15.2 Dynamics on a Manifold
  15.3 The Vector Field Index
  15.4 Dynamic Stability
  15.5 The Converse Lyapunov Problem
  15.6 A Necessary Condition for Stability

Chapter 1

Introduction and Summary

The Brouwer fixed point theorem states that if C is a nonempty compact convex subset of a Euclidean space and f : C → C is continuous, then f has a fixed point, which is to say that there is an x∗ ∈ C such that f(x∗) = x∗.
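Although the theorem is a pure existence statement, it is easy to see it "in action" numerically. The sketch below (in Python; the particular map f and the grid resolution are arbitrary choices made for illustration, not anything from the text) scans a grid in the unit square for a point where ‖f(x) − x‖ is small; Brouwer's theorem guarantees an exact fixed point, so by continuity the smallest residual found this way shrinks to zero as the grid is refined.

```python
# A naive illustration of Brouwer's theorem on C = [0, 1]^2.  This is not an
# efficient algorithm (Chapter 3 discusses real ones); it merely exhibits an
# approximate fixed point of one arbitrarily chosen continuous self-map f.
def f(x, y):
    return (0.5 + 0.4 * y * (1.0 - y), 0.5 + 0.4 * x * (1.0 - x))

def approximate_fixed_point(n=400):
    """Grid point of C minimizing the residual ||f(p) - p||."""
    best, best_res = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i / n, j / n
            fx, fy = f(x, y)
            res = ((fx - x) ** 2 + (fy - y) ** 2) ** 0.5
            if res < best_res:
                best, best_res = (x, y), res
    return best, best_res

print(approximate_fixed_point())   # a point with a very small residual
```

Of course this computation proves nothing; it only displays what the theorem guarantees must exist.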
The proof of this by Brouwer (1912) was one of the major events in the history of topology. Since then the study of such results, and the methods used to prove them, has flourished, undergoing radical transformations, becoming increasingly general and sophisticated, and extending its influence to diverse areas of mathematics. Around 1950, most notably through the work of Nash (1950, 1951) on noncooperative games, and the work of Arrow and Debreu (1954) on general equilibrium theory, it emerged that in economists’ most basic and general models, equilibria are fixed points. The most obvious consequence of this is that fixed point theorems provide proofs that these models are not vacuous. But fixed point theory also informs our understanding of many other issues such as comparative statics, robustness under perturbations, stability of equilibria with respect to dynamic adjustment processes, and the algorithmics and complexity of equilibrium computation. In particular, since the mid 1970’s the theory of games has been strongly influenced by refinement concepts defined largely in terms of robustness with respect to certain types of perturbations. As the range and sophistication of economic modelling has increased, more advanced mathematical tools have become relevant. Unfortunately, the mathematical literature on fixed points is largely inaccessible to economists, because it relies heavily on homology. This subject is part of the standard graduate school curriculum for mathematicians, but for outsiders it is difficult to penetrate, due to its abstract nature and the amount of material that must be absorbed at the beginning before the structure, nature, and goals of the theory begin to come into view. Many researchers in economics learn advanced topics in mathematics as a side product of their research, but unlike infinite dimensional analysis or continuous time stochastic processes, algebraic topology will not gradually achieve popularity among economic theorists through slow diffusion. Consequently economists have been, in effect, shielded from some of the mathematics that is most relevant to their discipline. This monograph presents an exposition of advanced material from the theory of fixed points that is, in several ways, suitable for graduate students and researchers in mathematical economics and related fields. In part the “fit” with the intended 2 1.1. THE FIRST FIXED POINT THEOREMS 3 audience is a matter of coverage. Economic models always involve domains that are convex, or at least contractible, so there is little coverage here of topics that only become interesting when the underlying space is more complicated. For the settings of interest, the treatment is comprehensive and maximally general, with issues related to correspondences always in the foreground. The project was originally motivated by a desire to understand the existence proofs in the literature on refinements of Nash equilibrium as applications of preexisting mathematics, and the continuing influence of this will be evident. The mathematical prerequisites are within the common background of advanced students and researchers in theoretical economics. Specifically, in addition to multivariate calculus and linear algebra, we assume that the reader is familiar with basic aspects of point-set topology. What we need from topics that may be less familiar to some (e.g., simplicial complexes, infinite dimensional linear spaces, the theory of retracts) will be explained in a self-contained manner. 
There will be no use of homological methods. The avoidance of homology is a practical necessity, but it can also be seen as a feature rather than a bug. In general, mathematical understanding is enhanced when brute calculations are replaced by logical reasoning based on conceptually meaningful definitions. To say that homology is a calculational machine is a bit simplistic, but it does have that potential in certain contexts. Avoiding it commits us to work with notions that have more direct and intuitive geometric content. (Admittedly there is a slight loss of generality, because there are acyclic—that is, homologically trivial—spaces that are not contractible, but this is unimportant because such spaces are not found “in nature.”) Thus our treatment of fixed point theory can be seen as a mature exposition that presents the theory in a natural and logical manner. In the remainder of this chapter we give a broad overview of the contents of the book. Unlike many subjects in mathematics, it is possible to understand the statements of many of the main results with much less preparation than is required to understand the proofs. Needless to say, as usual, not bothering to study the proofs has many dangers. In addition, the material in this book is, of course, closely related to various topics in theoretical economics, and in many ways quite useful preparation for further study and research. 1.1 The First Fixed Point Theorems A fixed point of a function f : X → X is an element x∗ ∈ X such that f (x∗ ) = x∗ . If X is a topological space, it is said to have the fixed point property if every continuous function from X to itself has a fixed point. The first and most famous result in our subject is Brouwer’s fixed point theorem: Theorem 1.1.1 (Brouwer (1912)). If C ⊂ Rm is nonempty, compact, and convex, then it has the fixed point property. Chapter 3 presents various proofs of this result. Although some are fairly brief, none of them can be described as truly elementary. In general, proofs of Brouwer’s 4 CHAPTER 1. INTRODUCTION AND SUMMARY theorem are closely related to algorithmic procedures for finding approximate fixed points. Chapter 3 discusses the best known general algorithm due to Scarf, a new algorithm due to the author and Rabee Tourky, and homotopy methods, which are the most popular in practice, but require differentiability. The last decade has seen major breakthroughs in computer science concerning the computational complexity of computing fixed points, with particular reference to (seemingly) simple games and general equilibrium models. These developments are sketched briefly in Section 3.7. In economics and game theory fixed point theorems are most commonly used to prove that a model has at least one equilibrium, where an equilibrium is a vector of “endogenous” variable for the model with the property that each individual agent’s predicted behavior is rational, or “utility maximizing,” if that agent regards all the other endogenous variables as fixed. In economics it is natural, and in game theory unavoidable, to consider models in which an agent might have more than one rational choice. Our first generalization of Brouwer’s theorem addresses this concern. If X and Y are sets, a correspondence F : X → Y is a function from X to the nonempty subsets of Y . (On the rare occasions when they arise, we use the term set valued mapping for a function from X to all the subsets of Y , including the empty set.) 
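A concrete example of a correspondence, and of why multiple rational choices force set values, is a best response mapping. The sketch below is my own illustrative choice, using the standard matching pennies payoffs for the player who wants to match; it returns the set of optimal pure strategies as a function of the opponent's probability q of playing Heads, and at q = 1/2 that set is not a singleton, so no function could record it.

```python
# Best responses of the "matching" player in matching pennies.  The value is a
# set of pure strategies, so q |-> best_response(q) is a correspondence.
def best_response(q, tol=1e-12):
    payoff = {"H": 2.0 * q - 1.0,   # expected payoff of H:  q*1 + (1-q)*(-1)
              "T": 1.0 - 2.0 * q}   # expected payoff of T:  q*(-1) + (1-q)*1
    best = max(payoff.values())
    return {s for s, u in payoff.items() if u >= best - tol}

print(best_response(0.7))   # {'H'}       : a unique rational choice
print(best_response(0.5))   # {'H', 'T'}  : two rational choices
```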
We will tend to regard a function as a special type of correspondence, both intuitively and in the technical sense that we will frequently blur the distinction between a function f : X → Y and the associated correspondence x 7→ {f (x)}. If Y is a topological space, F is compact valued if, for all x ∈ X, F (x) is compact. Similarly, if Y is a subset of a vector space, then F is convex valued if each F (x) is convex. The extension of Brouwer’s theorem to correspondences requires a notion of continuity for correspondences. If X and Y are topological spaces, a correspondence F : X → Y is upper semicontinuous if it is compact valued and, for each x0 ∈ X and each neighborhood V ⊂ Y of F (x0 ), there is a neighborhood U ⊂ X of x0 such that F (x) ⊂ V for all x ∈ U. It turns out that if X and Y are metric spaces and Y is compact, then F is upper semicontinuous if and only if its graph Gr(F ) := { (x, y) ∈ X × Y : y ∈ F (x) } is closed. (Proving this is a suitable exercise, if you are so inclined.) Thinking of upper semicontinuity as a matter of the graph being closed is quite natural, and in economics this condition is commonly taken as definition, as in Debreu (1959). In Chapter 5 we will develop a topology on the space of nonempty compact subsets of Y such that F is upper semicontinuous if and only if it is a continuous function relative to this topology. A fixed point of a correspondence F : X → X is a point x∗ ∈ X such that ∗ x ∈ F (x∗ ). Kakutani (1941) was motivated to prove the following theorem by the desire to provide a simple approach to the von Neumann (1928) minimax theorem, which is a fundamental result of game theory. This is the fixed point theorem that is most commonly applied in economic analysis. 1.2. “FIXING” KAKUTANI’S THEOREM 5 Theorem 1.1.2 (Kakutani’s Fixed Point Theorem). If C ⊂ Rm is nonempty, compact, and convex, and F : C → C is an upper semicontinuous convex valued correspondence, then F has a fixed point. 1.2 “Fixing” Kakutani’s Theorem Mathematicians strive to craft theorems that maximize the strength of the conclusions while minimizing the strength of the assumptions. One reason for this is obvious: a stronger theorem is a more useful theorem. More important, however, is the desire to attain a proper understanding of the principle the theorem expresses, and to achieve an expression of this principle that is unencumbered by useless clutter. When a theorem that is “too weak” is proved using methods that “happen to work” there is a strong suspicion that attempts to improve the theorem will uncover important new concepts. In the case of Brouwer’s theorem the conclusion, that the space has the fixed point property, is a purely topological assertion. The assumption that the space is convex, and in Kakutani’s theorem the assumption that the correspondence’s values are convex, are geometric conditions that seems out of character and altogether too strong. Suitable generalizations were developed after World War II. A homotopy is a continuous function h : X × [0, 1] → Y where X and Y are topological spaces. It is psychologically natural to think of the second variable in the domain as representing time, and we let ht := h(·, t) : X → Y denote “the function at time t,” so that h is a process that continuously deforms a function h0 into h1 . Another intuitive picture is that h is a continuous path in the space C(X, Y ) of continuous function from X to Y . 
As we will see in Chapter 5, this intuition can be made completely precise: when X and Y are metric spaces and X is compact, there is a topology on C(X, Y ) such that a continuous path h : [0, 1] → C(X, Y ) is the same thing as a homotopy. We say that two functions f, g : X → Y are homotopic if there is a homotopy h with h0 = f and h1 = g. This is easily seen to be an equivalence relation: symmetry and reflexivity are obvious, and to establish transitivity we observe that if e is homotopic to f and f is homotopic to g, then there is a homotopy between e and g that follows a homotopy between e and f at twice its original speed, then follows a homotopy between f and g at double the pace. The equivalence classes are called homotopy classes. A space X is contractible if the identity function IdX is homotopic to a constant function. That is, there is a homotopy c : X × [0, 1] → X such that c0 = IdX and c1 (X) is a singleton; such a homotopy is called a contraction. Convex sets are contractible. More generally, a subset X of a vector space is star-shaped if there is x∗ ∈ X (the star) such that X contains the line segment { (1 − t)x + tx∗ : 0 ≤ t ≤ 1 } between each x ∈ X and x∗ . If X is star-shaped, there is a contraction (x, t) 7→ (1 − t)x + tx∗ . 6 CHAPTER 1. INTRODUCTION AND SUMMARY It seems natural to guess that a nonempty compact contractible space has the fixed point property. Whether this is the case was an open problem for several years, but it turns out to be false. In Chapter 7 we will see an example due to Kinoshita (1953) of a nonempty compact contractible subset of R3 that does not have the fixed point property. Fixed point theory requires some additional ingredient. If X is a topological space, a subset A ⊂ X is a retract if there is a continuous function r : X → A with r(a) = a for all a ∈ A. Here we tend to think of X as a “simple” space, and the hope is that although A might seem to be more complex, or perhaps “crumpled up,” it nonetheless inherits enough of the simplicity of X. A particularly important manifestation of this is that if r : X → A is a retraction and X has the fixed point property, then so does A, because if f : A → A is continuous, then so is f ◦ r : X → A ⊂ X, so f ◦ r has a fixed point, and this fixed point necessarily lies in A and is consequently a fixed point of f . Also, a retract of a contractible space is contractible because if c : X × [0, 1] → X is a contraction of X and r : X → A ⊂ X is a retraction, then (a, t) 7→ r(c(a, t)) is a contraction of A. A set A ⊂ Rm is a Euclidean neighborhood retract (ENR) if there is an open superset U ⊂ Rm of A and a retraction r : U → A. If X and Y are metric spaces, an embedding of X in Y is a function e : X → Y that is a homeomorphism between X and e(X). That is, e is a continuous injection1 whose inverse is also continuous when e(X) has the subspace topology inherited from Y . An absolute neighborhood retract (ANR) is a separable2 metric space X such that whenever Y is a separable metric space and e : X → Y is an embedding, there is an open superset U ⊂ Y of e(X) and a retraction r : U → e(X). This definition probably seems completely unexpected, and it’s difficult to get any feeling for it right away. In Chapter 7 we’ll see that ANR’s have a simple characterization, and that many of the types of spaces that come up most naturally are ANR’s, so this condition is quite a bit less demanding than one might guess at first sight. 
In particular, it will turn out that every ENR is an ANR, so that being an ENR is an "intrinsic" property insofar as it depends on the topology of the space and not on how the space is embedded in a Euclidean space. An absolute retract (AR) is a separable metric space X such that whenever Y is a separable metric space and e : X → Y is an embedding, there is a retraction r : Y → e(X). In Chapter 7 we will prove that an ANR is an AR if and only if it is contractible.

Theorem 1.2.1. If C is a nonempty compact AR and F : C → C is an upper semicontinuous contractible valued correspondence, then F has a fixed point.

An important point is that the values of F are not required to be ANR's.

1 We will usually use the terms "injective" rather than "one-to-one," "surjective" rather than "onto," and "bijective" to indicate that a function is both injective and surjective. An injection is an injective function, a surjection is a surjective function, and a bijection is a bijective function.
2 A metric space is separable if it has a countable dense subset.

For "practical purposes" this is the maximally general topological fixed point theorem, but for mathematicians there is an additional refinement. There is a concept called acyclicity that is defined in terms of the concepts of algebraic topology. A contractible set is necessarily acyclic, but there are acyclic spaces (including compact ones) that are not contractible. The famous Eilenberg-Montgomery fixed point theorem is:

Theorem 1.2.2 (Eilenberg and Montgomery (1946)). If C is a nonempty compact AR and F : C → C is an upper semicontinuous acyclic valued correspondence, then F has a fixed point.

1.3 Essential Sets of Fixed Points

It might seem like we have already reached a satisfactory and fitting resolution of "The Fixed Point Problem," but actually (both in pure mathematics and in economics) this is just the beginning. You see, fixed points come in different flavors.

[Figure 1.1 appears here: the graph of a function f : [0, 1] → [0, 1] with two fixed points s and t.]

The figure above shows a function f : [0, 1] → [0, 1] with two fixed points, s and t. If we perturb the function slightly by adding a small positive constant, s "disappears" in the sense that the perturbed function does not have a fixed point anywhere near s, but a function close to f has a fixed point near t. More precisely, if X is a topological space and f : X → X is continuous, a fixed point x∗ of f is essential if, for any neighborhood U of x∗, there is a neighborhood V of the graph of f such that any continuous f′ : X → X whose graph is contained in V has a fixed point in U. If a fixed point is not essential, then we say that it is inessential. These concepts were introduced by Fort (1950).

There need not be an essential fixed point. The function shown in Figure 1.2 has an interval of fixed points. If we shift the function down, there will be a fixed point near the lower endpoint of this interval, and if we shift the function up there will be a fixed point near the upper endpoint. This example suggests that we might do better to work with sets of fixed points. A set S of fixed points of a function f : X → X is essential if it is closed, it has a neighborhood that contains no other fixed points, and for any neighborhood U of S, there is a neighborhood V of the graph of f such that any continuous f′ : X → X whose graph is contained in V has a fixed point in U.
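The perturbation story behind Figure 1.1 can be checked numerically. In the sketch below the function f is my own choice, constructed so that its graph touches the diagonal from above at s = 1/4 and crosses it at t = 3/4; it is not the function drawn in the figure, and the grid scan is only a crude diagnostic.

```python
# f : [0, 1] -> [0, 1] with fixed points s = 0.25 (tangential) and t = 0.75
# (transversal).  Adding a small positive constant destroys every fixed point
# near s but leaves one near t, so s is inessential while t is essential.
def f(x):
    return x + (x - 0.25) ** 2 * (0.75 - x)

def min_residual(g, center, radius=0.1, n=100000):
    """Smallest |g(x) - x| over a fine grid around `center`."""
    best = float("inf")
    for i in range(n + 1):
        x = center - radius + 2.0 * radius * i / n
        best = min(best, abs(g(x) - x))
    return best

eps = 1e-3
g = lambda x: f(x) + eps            # a small upward perturbation of f

for c in (0.25, 0.75):
    print(c, min_residual(f, c), min_residual(g, c))
# Near 0.75 both f and g have points with essentially zero residual, but near
# 0.25 the perturbed map satisfies g(x) - x >= eps, so no fixed point survives.
```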
The problem with this concept is that "large" connected sets are not of much use. For example, if X is compact and has the fixed point property, then the set of all fixed points of f is essential. It seems that we should really be interested in sets of fixed points that are either essential and connected3 or essential and minimal in the sense of not having a proper subset that is also essential.

3 We recall that a subset S of a topological space X is connected if there do not exist two disjoint open sets U1 and U2 with S ∩ U1 ≠ ∅ ≠ S ∩ U2 and S ⊂ U1 ∪ U2.

[Figure 1.2 appears here: the graph of a function on [0, 1] with an interval of fixed points.]

In Chapter 8 we will show that any essential set of fixed points contains a minimal essential set, and that minimal essential sets are connected. The theory of refinements of Nash equilibrium (e.g., Selten (1975); Myerson (1978); Kreps and Wilson (1982); Kohlberg and Mertens (1986); Mertens (1989, 1991); Govindan and Wilson (2008)) has many concepts that amount to a weakening of the notion of essential set, insofar as the set is required to be robust with respect to only certain types of perturbations of the function or correspondence. In particular, Jiang (1963) pioneered the application of the concept to game theory, defining an essential Nash equilibrium and an essential set of Nash equilibria in terms of robustness with respect to perturbations of the best response correspondence induced by perturbations of the payoffs. The mathematical foundations of such concepts are treated in Section 8.3.

1.4 Index and Degree

There are different types of essential fixed points. Figure 1.3 shows a function with three fixed points. At two of them the function starts above the diagonal and goes below it as one goes from left to right, and at the third it is the other way around. For any k it is easy to imagine a function with k fixed points of the first type and k − 1 fixed points of the second type.

[Figure 1.3 appears here: the graph of a function on [0, 1] with three fixed points.]

This phenomenon generalizes to higher dimensions. Let Dm = { x ∈ Rm : ‖x‖ ≤ 1 } and Sm−1 = { x ∈ Rm : ‖x‖ = 1 } be the m-dimensional unit disk and the (m − 1)-dimensional unit sphere, and suppose that f : Dm → Dm is a C∞ function. In the best behaved case each fixed point x∗ is in the interior Dm \ Sm−1 of the disk and regular, which means that IdRm − Df(x∗) is nonsingular, where Df(x∗) : Rm → Rm is the derivative of f at x∗. We define the index of x∗ to be 1 if the determinant of IdRm − Df(x∗) is positive and −1 if this determinant is negative. We will see that there is always one more fixed point of index 1 than there are fixed points of index −1, which is to say that the sum of the indices is 1.

What about fixed points on the boundary of the disk, or fixed points that aren't regular, or nontrivial connected sets of fixed points? What about correspondences? What happens if the domain is a possibly infinite dimensional ANR? The most challenging and significant aspect of our work will be the development of an axiomatic theory of the index that is general enough to encompass all these possibilities. The work proceeds through several stages, and we describe them in some detail now.

1.4.1 Manifolds

First of all, it makes sense to expand our perspective a bit. An m-dimensional manifold is a topological space M that resembles Rm in a neighborhood of each of its points. More precisely, for each p ∈ M there is an open U ⊂ Rm and an embedding ϕ : U → M whose image is open and contains p. Such a ϕ is a parameterization and its inverse is a coordinate chart. The most obvious examples are Rm itself and Sm.
If, in addition, N is an n-dimensional manifold, then M × N is an (m + n)dimensional manifold. Thus the torus S 1 × S 1 is a manifold, and this is just the most easily visualized member of a large class of examples. An open subset of an m-dimensional manifold is an m-dimensional manifold. A 0-dimensional manifold is just a set with the discrete topology. The empty set is a manifold of any dimension, including negative dimensions. Of course these special cases are trivial, but they come up in important contexts. A collection {ϕi : Ui → M}i∈I of parameterizations is an atlas if its images cover M. The composition ϕ−1 j ◦ ϕi (with the obvious domain of definition) is called a transition function. If, for some 1 ≤ r ≤ ∞, all the transition functions are C r functions, then the atlas is a C r atlas. An m-dimensional C r manifold is an m-dimensional manifold together with a C r atlas. The basic concepts of differential and integral calculus extend to this setting, leading to a vast range of mathematics. In our formalities we will always assume that M is a subset of a Euclidean space Rk called the ambient space, and that the parameterizations ϕi and the coordinate charts ϕ−1 are C r functions. This is a bit unprincipled—for example, i physicists see only the universe, and their discourse is more disciplined if it does not refer to some hypothetical ambient space—but this maneuver is justified by embedding theorems due to Whitney that show that it does not entail any serious loss of generality. The advantages for us are that this approach bypasses certain technical pathologies while allowing for simplified definitions, and in many settings the ambient space will prove quite handy. For example, a function f : M → N (where N is now contained in some Rℓ ) is C r for our purposes if it is C r in the standard sense: for any S ⊂ Rk a function h : S → Rℓ is C r , by definition, if there is an open W ⊂ Rk containing S and a C r function H : W → Rℓ such that h = H|S . Having an ambient space around makes it relatively easy to establish the basic objects and facts of differential calculus. Suppose that ϕi : Ui → M is a C r parameterization. If x ∈ Ui and ϕi (x) = p, the tangent space of M at p, which we denote by Tp M, is the image of Dϕi (x). This is an m-dimensional linear subspace of Rk . If f : M → N is C r , the derivative Df (p) : Tp M → Tf (p) N of f at p is the restriction to Tp M of the derivative DF (p) of any C r function F : W → Rℓ defined on an open W ⊂ Rk containing M whose restriction to M is f . (In Chapter 10 we will show that the choice of F doesn’t matter.) The chain rule holds: if, in addition, P is a p-dimensional C r manifold and g : N → P is a C r function, then g ◦ f is C r and D(g ◦ f )(p) = Dg(f (p)) ◦ Df (p) : Tp M → Tg(f (p)) P. 1.4. INDEX AND DEGREE 11 The inverse and implicit function theorems have important generalizations. The point p is a regular point of f if the image of Df (p) is all of Tf (p) N. We say that f : M → N is a C r diffeomorphism if m = n, f is a bijection, and both f and f −1 are C r . The generalized inverse function theorem asserts that if m = n, f : M → N is C r , and p is a regular point of f , then there is an open U ⊂ M containing p such that f (U) is an open subset of N and f |U : U → f (U) is a C r diffeomorphism. If 0 ≤ s ≤ m, a set S ⊂ Rk is an s-dimensional C r submanifold of M if it is an s-dimensional C r submanifold that happens to be contained in M. We say that q ∈ N is a regular value of f if every p ∈ f −1 (q) is a regular point. 
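To ground these definitions, the following sketch (using numpy; the parameterization of the upper hemisphere is an arbitrary illustrative choice) approximates the Jacobian of a parameterization of S² ⊂ R³ at a point p and checks that its column span, the tangent space T_p S², is the orthogonal complement of p. Equivalently S² = f−1(1) for f(x) = ‖x‖², and every nonzero value of f is regular, which is exactly the situation addressed by the regular value theorem stated next.

```python
import numpy as np

# Parameterization of the upper hemisphere of S^2:
#     phi(u, v) = (u, v, sqrt(1 - u^2 - v^2)),   u^2 + v^2 < 1.
def phi(u, v):
    return np.array([u, v, np.sqrt(1.0 - u * u - v * v)])

def tangent_basis(u, v, h=1e-6):
    """Columns of a numerical Jacobian D phi(u, v): a basis of T_p S^2."""
    du = (phi(u + h, v) - phi(u - h, v)) / (2.0 * h)
    dv = (phi(u, v + h) - phi(u, v - h)) / (2.0 * h)
    return np.column_stack([du, dv])

u, v = 0.3, 0.2
p = phi(u, v)
B = tangent_basis(u, v)
print(np.linalg.matrix_rank(B))   # 2: the tangent space is 2-dimensional
print(p @ B)                      # ~ [0, 0]: T_p S^2 = { w : <p, w> = 0 }
```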
The generalized implicit function theorem, which is known as the regular value theorem, asserts that if q is a regular value of f , then f −1 (q) is an (m − n)-dimensional C r submanifold of M. 1.4.2 The Degree The degree is closely related to the fixed point index, but it has its own theory, which has independent interest and significance. The approach we take here is to work with the degree up to the point where its theory is more or less complete, then translate what we have learned into the language of the fixed point index. We now need to introduce the concept of orientation. Two ordered bases v1 , . . . , vm and w1 , . . . , wm of an m-dimensional vector space have the same orientation if the determinant of the linear transformation taking each vi to wi is positive. It is easy to see that this is an equivalence relation with two equivalence classes. An oriented vector space is a finite dimensional vector space with a designated orientation whose elements are said to be positively oriented. If V and W are m-dimensional oriented vector spaces, a nonsingular linear transformation L : V → W is orientation preserving if it maps positively oriented ordered bases of V to positively oriented ordered bases of W , and otherwise it is orientation reversing. For an intuitive appreciation of this concept just look in a mirror: the linear map taking each point in the actual world to its position as seen in the mirror is orientation reversing, with right shoes turning into left shoes and such. In our discussion of degree theory nothing is lost by working with C ∞ objects rather than C r objects for general r, and smooth will be a synonym for C ∞ . An orientation for a smooth manifold M is a “continuous” specification of an orientation of each of the tangent spaces Tp M. We say that M is orientable if it has an orientation; the most famous examples of unorientable manifolds are the Möbius strip and the Klein bottle. (From a mathematical point of view 2-dimensional projective space is perhaps more fundamental, but it is difficult to visualize.) An oriented manifold is a manifold together with a designated orientation. If M and N are oriented smooth manifolds of the same dimension, f : M → N is a smooth map, and p is a regular point of f , we say that f is orientation preserving at p if Df (p) : Tp M → Tf (p) N is orientation preserving, and otherwise f is orientation reversing at p. If q is a regular value of f and f −1 (q) is finite, then the degree of f over q, denoted by deg∞ q (f ), is the number of points in −1 f (q) at which f is orientation preserving minus the number of points in f −1 (q) at which f is orientation reversing. 12 CHAPTER 1. INTRODUCTION AND SUMMARY We need to extend the degree to situations in which the target point q is not a regular value of f , and to functions that are merely continuous. Instead of being able to define the degree directly, as we did above, we will need to proceed indirectly, showing that the generalized degree is determined by certain of its properties, which we treat as axioms. The first step is to extend the concept, giving it a “local” character. For a compact C ⊂ M let ∂C = C ∩ (M \ C) be the topological boundary of C, and let int C = C \ ∂C be its interior. A smooth function f : C → N with compact domain C ⊂ M is said to be smoothly degree admissible over q ∈ N if f −1 (q) ∩ ∂C = ∅ and q is a regular value of f . 
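A one-dimensional example may help fix these definitions; the polynomial and the compact domain C = [−2, 2] below are arbitrary illustrative choices. With M = N = R carrying the standard orientation, a smooth map is orientation preserving at a regular point exactly when its derivative there is positive, and since f(∂C) = {−2, 2} misses q = 0, the pair (f, 0) is smoothly degree admissible in the sense just defined.

```python
import numpy as np

# f(x) = x^3 - 3x on C = [-2, 2], target q = 0.  The preimage is
# {-sqrt(3), 0, sqrt(3)}; f'(x) = 3x^2 - 3 is positive at the two outer points
# (orientation preserving) and negative at 0 (orientation reversing).
f = np.poly1d([1.0, 0.0, -3.0, 0.0])
df = f.deriv()
q = 0.0

assert all(abs(f(x) - q) > 0 for x in (-2.0, 2.0))       # q not hit on the boundary
preimage = [r.real for r in (f - q).roots
            if abs(r.imag) < 1e-12 and -2.0 <= r.real <= 2.0]
assert all(abs(df(x)) > 1e-9 for x in preimage)          # q is a regular value
degree = sum(1 if df(x) > 0 else -1 for x in preimage)
print(sorted(preimage), degree)                          # degree 2 - 1 = 1
```

Every q in (−2, 2) is a regular value of this f, and repeating the computation for any of them gives the same answer, a first glimpse of the invariance properties developed below.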
As above, for such a pair (f, q) we define deg^∞_q(f) to be the number of p ∈ f−1(q) at which f is orientation preserving minus the number of p ∈ f−1(q) at which f is orientation reversing. Note that deg^∞_q(f) = deg^∞_q(f|C′) whenever C′ is a compact subset of C and f−1(q) has an empty intersection with the closure of C \ C′. Also, if C = C1 ∪ C2 where C1 and C2 are compact and disjoint, then

deg^∞_q(f) = deg^∞_q(f|C1) + deg^∞_q(f|C2).

From the point of view of topology, what makes the degree important is its invariance under homotopy. If C ⊂ M is compact, a smooth homotopy h : C × [0, 1] → N is smoothly degree admissible over q if h−1(q) ∩ (∂C × [0, 1]) = ∅ and q is a regular value of h0 and h1. In this circumstance

deg^∞_q(h0) = deg^∞_q(h1).    (∗)

Figure 1.4 illustrates the intuitive character of the proof.

[Figure 1.4 appears here: the preimage h−1(q) in C × [0, 1], with its endpoints on the t = 0 and t = 1 edges labeled +1 or −1 according to orientation.]

The notion of an m-dimensional manifold with boundary is a generalization of the manifold concept in which each point in the space has a neighborhood that is homeomorphic to an open subset of the closed half space { x ∈ Rm : x1 ≥ 0 }. Aside from the half space itself, the closed disk Dm = { x ∈ Rm : ‖x‖ ≤ 1 } is perhaps the most obvious example, but for us the most important example is M × [0, 1] where M is an (m − 1)-dimensional manifold without boundary. Note that any m-dimensional manifold without boundary is (automatically and trivially) a manifold with boundary. All elements of our discussion of manifolds generalize to this setting. In particular, the generalization of the regular value theorem states that if M is an m-dimensional smooth manifold with boundary, N is an n-dimensional (boundaryless) manifold, f : M → N is smooth, and q ∈ N is a regular value of both f and the restriction of f to the boundary of M, then f−1(q) is an (m − n)-dimensional manifold with boundary, its boundary is its intersection with the boundary of M, and at each point in this intersection the tangent space of f−1(q) is not contained in the tangent space of the boundary of M. In particular, if the dimension of M is the dimension of N plus one, then f−1(q) is a 1-dimensional manifold with boundary. If, in addition, f−1(q) is compact, then it has finitely many connected components.

Suppose now that h : C × [0, 1] → N is smoothly degree admissible over q, and that q is a regular value of h. The consequences of applying the regular value theorem to the restriction of h to int C × [0, 1] are as shown in Figure 1.4: h−1(q) is a 1-dimensional manifold with boundary, its boundary is its intersection with C × {0, 1}, and h−1(q) is not tangent to C × {0, 1} at any point in this intersection. In addition h−1(q) is compact, so it has finitely many connected components, each of which is compact. A connected compact 1-dimensional manifold with boundary is either a circle or a line segment. (It will turn out that this obvious fact is surprisingly difficult to prove!) Thus each component of h−1(q) is either a circle or a line segment connecting two points in its boundary. If a line segment connects two points in C × {0}, say (p, 0) and (p′, 0), then it turns out that h0 is orientation preserving at p if and only if it is orientation reversing at p′. Similarly, if a line segment connects two points (p, 1) and (p′, 1) in C × {1}, then h1 is orientation preserving at p if and only if it is orientation reversing at p′.
On the other hand, if a line segment connects a point (p0, 0) in C × {0} to a point (p1, 1) in C × {1}, then h0 is orientation preserving at p0 if and only if h1 is orientation preserving at p1. Equation (∗) is obtained by summing these facts over the various components of h−1(q).

This completes our discussion of the proof of (∗) except for one detail: if h : C × [0, 1] → N is a smooth homotopy that is smoothly degree admissible over q, q is not necessarily a regular value of h. Nevertheless, Sard's theorem (which is the subject of Chapter 11, and a crucial ingredient of our entire approach) implies that h has regular values in any neighborhood of q, and it is also the case that deg^∞_{q′}(h0) = deg^∞_q(h0) and deg^∞_{q′}(h1) = deg^∞_q(h1) when q′ is sufficiently close to q.

It turns out that the smooth degree is completely characterized by the properties we have seen. That is, if D∞(M, N) is the set of pairs (f, q) in which f : C → N is smoothly degree admissible over q, then (f, q) ↦ deg^∞_q(f) is the unique function from D∞(M, N) to Z satisfying:

(∆1) deg^∞_q(f) = 1 for all (f, q) ∈ D∞(M, N) such that f−1(q) is a singleton {p} and f is orientation preserving at p.

(∆2) deg^∞_q(f) = ∑_{i=1}^r deg^∞_q(f|Ci) whenever (f, q) ∈ D∞(M, N), the domain of f is C, and C1, . . . , Cr are pairwise disjoint compact subsets of C such that f−1(q) ⊂ int C1 ∪ . . . ∪ int Cr.

(∆3) deg^∞_q(h0) = deg^∞_q(h1) whenever C ⊂ M is compact and the homotopy h : C × [0, 1] → N is smoothly degree admissible over q.

We note two additional properties of the smooth degree. The first is that if, in addition to M and N, M′ and N′ are m′-dimensional smooth manifolds, (f, q) ∈ D∞(M, N), and (f′, q′) ∈ D∞(M′, N′), then (f × f′, (q, q′)) ∈ D∞(M × M′, N × N′) and

deg^∞_{(q,q′)}(f × f′) = deg^∞_q(f) · deg^∞_{q′}(f′).

Since (f × f′)−1(q, q′) = f−1(q) × f′−1(q′), this boils down to a consequence of elementary facts about determinants: if (p, p′) ∈ (f × f′)−1(q, q′), then f × f′ is orientation preserving at (p, p′) if and only if f and f′ are either both orientation preserving or both orientation reversing at p and p′ respectively.

The second property is a strong form of continuity. A continuous function f : C → N with compact domain C ⊂ M is degree admissible over q ∈ N if f−1(q) ∩ ∂C = ∅. If this is the case, then there is a neighborhood U ⊂ C × N of the graph of f and a neighborhood V ⊂ N \ f(∂C) of q such that

deg^∞_{q′}(f′) = deg^∞_{q′′}(f′′)

whenever f′, f′′ : C → N are smooth functions whose graphs are contained in U, q′, q′′ ∈ V, q′ is a regular value of f′, and q′′ is a regular value of f′′. We can now define deg_q(f) to be the common value of deg^∞_{q′}(f′) for such pairs (f′, q′). Let D(M, N) be the set of pairs (f, q) in which f : C → N is a continuous function with compact domain C ⊂ M that is degree admissible over q ∈ N. The fully general form of degree theory asserts that (f, q) ↦ deg_q(f) is the unique function from D(M, N) to Z such that:

(D1) deg_q(f) = 1 for all (f, q) ∈ D(M, N) such that f is smooth, f−1(q) is a singleton {p}, and f is orientation preserving at p.

(D2) deg_q(f) = ∑_{i=1}^r deg_q(f|Ci) whenever (f, q) ∈ D(M, N), the domain of f is C, and C1, . . . , Cr are pairwise disjoint compact subsets of C such that f−1(q) ⊂ (C1 ∪ . . . ∪ Cr) \ (∂C1 ∪ . . . ∪ ∂Cr).
(D3) If (f, q) ∈ D(M, N) and C is the domain of f, then there is a neighborhood U ⊂ C × N of the graph of f and a neighborhood V ⊂ N \ f(∂C) of q such that deg_{q′}(f′) = deg_{q′′}(f′′) whenever f′, f′′ : C → N are continuous functions whose graphs are contained in U and q′, q′′ ∈ V.

1.4.3 The Fixed Point Index

Although the degree can be applied to continuous functions, and even to convex valued correspondences, it is restricted to finite dimensional manifolds. For such spaces the fixed point index is merely a reformulation of the degree. Its application to general equilibrium theory was initiated by Dierker (1972), and it figures in the analysis of the Lemke-Howson algorithm of Shapley (1974). There is also a third variant of the underlying principle, for vector fields, that is developed in Chapter 15, and which is related to the theory of dynamical systems. Hofbauer (1990) applied the vector field index to dynamic issues in evolutionary stability, and Ritzberger (1994) applies it systematically to normal form game theory. However, it turns out that the fixed point index can be generalized much further, due to the fact that, when we are discussing fixed points, the domain and the range are the same.

The general index is developed in three main stages. In order to encompass these stages in a single system of terminology and notation we take a rather abstract approach. Fix a metric space X. An index admissible correspondence for X is an upper semicontinuous correspondence F : C → X, where C ⊂ X is compact, that has no fixed points in ∂C. An index base for X is a set I of index admissible correspondences such that:

(a) f ∈ I whenever C ⊂ X is compact and f : C → X is an index admissible continuous function;

(b) F|D ∈ I whenever F : C → X is an element of I, D ⊂ C is compact, and F|D is index admissible.

Definition 1.4.1. Let I be an index base for X. An index for I is a function ΛX : I → Z satisfying:

(I1) (Normalization) If c : C → X is a constant function whose value is an element of int C, then ΛX(c) = 1.

(I2) (Additivity) If F : C → X is an element of I, C1, . . . , Cr are pairwise disjoint compact subsets of C, and FP(F) ⊂ int C1 ∪ . . . ∪ int Cr (here FP(F) denotes the set of fixed points of F), then ΛX(F) = ∑_i ΛX(F|Ci).

(I3) (Continuity) For each element F : C → X of I there is a neighborhood U ⊂ C × X of the graph of F such that ΛX(F̂) = ΛX(F) for every F̂ ∈ I whose graph is contained in U.

For each m = 0, 1, 2, . . . an index base for Rm is given by letting I^m be the set of index admissible continuous functions f : C → Rm. Of course (I1)-(I3) parallel (D1)-(D3), and it is not hard to show that there is a unique index ΛRm for I^m given by ΛRm(f) = deg_0(IdC − f).

We now extend our framework to encompass multiple spaces. An index scope S consists of a class of metric spaces SS and an index base IS(X) for each X ∈ SS such that

(a) SS contains X × X′ whenever X, X′ ∈ SS;

(b) F × F′ ∈ IS(X × X′) whenever X, X′ ∈ SS, F ∈ IS(X), and F′ ∈ IS(X′).

These conditions are imposed in order to express a property of the index that is inherited from the multiplicative property of the degree for Cartesian products. The index also has an additional property that has no analogue in degree theory. Suppose that C ⊂ Rm and C̃ ⊂ Rm̃ are compact, g : C → C̃ and g̃ : C̃ → C are continuous, and g̃ ◦ g and g ◦ g̃ are index admissible. Then

ΛRm(g̃ ◦ g) = ΛRm̃(g ◦ g̃).
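The Euclidean index just described can be computed directly at a regular fixed point x∗, where it is the sign of det(IdRm − Df(x∗)), and in the smooth regular case the two sides of the equation above reduce to signs of det(I − BA) and det(I − AB) for the derivatives A and B of the two maps. The sketch below (arbitrary illustrative maps and matrices, using numpy) checks both numerically; it is not meant to capture the full content of Proposition 13.3.2.

```python
import numpy as np

# Index of a regular fixed point: sign of det(I - Df(x*)).  The quadratic map
# below is an arbitrary example with fixed point x* = 0.
def f(x):
    return np.array([0.25 * x[0] + 0.5 * x[1] + x[0] ** 2,
                     -0.5 * x[0] + 0.25 * x[1]])

def jacobian(g, x, h=1e-6):
    return np.column_stack([(g(x + h * e) - g(x - h * e)) / (2.0 * h)
                            for e in np.eye(len(x))])

x_star = np.zeros(2)
print(int(np.sign(np.linalg.det(np.eye(2) - jacobian(f, x_star)))))   # +1

# Determinant comparison arising in the smooth regular case of commutativity:
# for A of shape (m~, m) and B of shape (m, m~), det(I_m - BA) = det(I_m~ - AB).
rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 2)), rng.standard_normal((2, 3))
print(np.isclose(np.linalg.det(np.eye(2) - B @ A),
                 np.linalg.det(np.eye(3) - A @ B)))                   # True
```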
When g and g̃ are smooth and the fixed points in question are regular, this boils down to a highly nontrivial fact of linear algebra (Proposition 13.3.2) that was unknown prior to the development of this aspect of index theory. This property turns out to be the key to moving the index up to a much higher level of generality, but before we can explain this we need to extend the setup a bit, allowing for the possibility that the images of g and g̃ are not contained in C̃ and C, but that there are compact sets D ⊂ C and D̃ ⊂ C̃ with g(D) ⊂ C̃ and g̃(D̃) ⊂ C that contain the relevant sets of fixed points. Definition 1.4.2. A commutativity configuration is a tuple (X, C, D, g, X̂, Ĉ, D̂, ĝ) where X and X̂ are metric spaces and: (a) D ⊂ C ⊂ X, D̂ ⊂ Ĉ ⊂ X̂, and C, Ĉ, D, and D̂ are compact; (b) g ∈ C(C, X̂) and ĝ ∈ C(Ĉ, X) with g(D) ⊂ int Ĉ and ĝ(D̂) ⊂ int C; (c) ĝ ◦ g|D and g ◦ ĝ|D̂ are index admissible; (d) g(F P(ĝ ◦ g|D )) = F P(g ◦ ĝ|D̂ ). After all these preparations we can finally describe the heart of the matter. Definition 1.4.3. An index for an index scope S is a specification of an index ΛX for each X ∈ SS such that: (I4) (Commutativity) If (X, C, D, g, X̂, Ĉ, D̂, ĝ) is a commutativity configuration with X, X̂ ∈ SS , (D, ĝ ◦ g|D ) ∈ IS (X), and (D̂, g ◦ ĝ|D̂ ) ∈ IS (X̂), then ΛX (ĝ ◦ g|D ) = ΛX̂ (g ◦ ĝ|D̂ ). The index is said to be multiplicative if: (M) (Multiplication) If X, X ′ ∈ SS , F ∈ IS (X), and F ′ ∈ IS (X ′ ), then ΛX×X ′ (F × F ′ ) = ΛX (F ) · Λ′X (F ′ ). 1.5. TOPOLOGICAL CONSEQUENCES 17 Let SS Ctr be the class of ANRs, and for each X ∈ SS Ctr let IS Ctr (X) be the union over compact C ⊂ X of the sets of index admissible upper semicontinuous contractible valued correspondences F : C → X. The central goal of this book is: Theorem 1.4.4. There is a unique index ΛCtr for S Ctr , which is multiplicative. The passage from the indices ΛRm to ΛCtr has two stages. The first exploits Commutativity to extend from Euclidean spaces and continuous functions to ANR’s and continuous functions. There is a significant result that is the technical basis for this. Let X be a metric space with metric d. If Y is a topological space and ε > 0, a homotopy η : Y × [0, 1] → X is an ε-homotopy if d η(y, s), η(y, t) < ε for all y ∈ Y and all 0 ≤ s, t ≤ 1. We say that h0 and h1 are ε-homotopic. For ε > 0, a topological space D ε-dominates C ⊂ X if there are continuous functions ϕ : C → D and ψ : D → X such that ψ ◦ ϕ : C → X is ε-homotopic to IdC . In Section 7.6 we show that: Theorem 1.4.5. If X is a separable ANR, C ⊂ X is compact, and ε > 0, then there is an open U ⊂ Rm , for some m, such that U is compact and ε-dominates C. The second stage passes from continuous function to contractible valued correspondences. As in the passage from the smooth degree to the continuous degree, the idea is to use approximation by functions to define the extension. The basis of this is a result of Mas-Colell (1974) that was extended to ANR’s by the author (McLennan (1991)) and is the topic of Chapter 9. Theorem 1.4.6 (Approximation Theorem). Suppose that X is a separable ANR and C and D are compact subsets of X with C ⊂ int D. Let F : D → Y be an upper semicontinuous contractible valued correspondence. 
Then for any neighborhood U of Gr(F |C ) there are: (a) a continuous f : C → Z with Gr(f ) ⊂ U; (b) a neighborhood U ′ of Gr(F ) such that, for any two continuous functions f0 , f1 : D → Y with Gr(f0 ), Gr(f1 ) ⊂ U ′ , there is a homotopy h : C × [0, 1] → Y with h0 = f0 |C , h1 = f1 |C , and Gr(ht ) ⊂ U for all 0 ≤ t ≤ 1. 1.5 Topological Consequences The final section of the book develops applications of the index. Chapter 14 presents a number of classical concepts and results from topology that are usually proved homologically. Let X be a compact ANR. The Euler characteristic of X is the index of IdX . If F : X → X is an upper semicontinuous contractible valued correspondence, the index of F is called the Lefschetz number of F . Of course Additivity implies that F has a fixed point if its Lefschetz number is not zero. The celebrated Lefschetz fixed point theorem is this assertion (usually restricted to 18 CHAPTER 1. INTRODUCTION AND SUMMARY compact manifolds and continuous functions) together with a homological characterization of the Lefschetz number. If X is contractible, then the Lefschetz number of any F : X → X is equal to the Euler characteristic of F , which is one. Thus we arrive at our version of the Eilenberg-Montgomery theorem: if X is a compact AR and F : X → X is a upper semicontinuous contractible valued correspondence, then F has a fixed point. Chapter 14 also develops many of the classical theorems concerning maps between spheres. The most basic of these is Hopf’s theorem: two continuous functions f, f ′ : S m → S m are homotopic if and only if they have the same degree, so that the degree is a “complete” homotopy invariant for maps between spheres of the same dimension. There are many other theorems concerning maps between spheres of the same dimension. Of these, one in particular has greater depth: if f : S m → S m is continuous and f (−p) = −f (p) for all p ∈ S m , then the degree of f is odd. This and its many corollaries constitute the Borsuk-Ulam theorem. Using these results, we prove the frequently useful theorem known as invariance of domain: if U ⊂ Rm is open and f : U → Rm is continuous and injective, then f (U) is open and f is a homeomorphism onto its image. If a connected set of fixed points has nonzero index, then it is essential, by virtue of Continuity. The result in Section 14.5 shows that the converse holds for convex valued correspondences with convex domains, so for the settings most commonly considered in economics the notion of essentiality does not have independent significance. But it is important to understand that this result does not imply that a component of the set of Nash equilibria of a normal form game of index zero is inessential in the sense of Jiang (1963). In fact Hauk and Hurkens (2002) provide a concrete example of an essential component of index zero. 1.6 Dynamical Systems Dynamic stability is a problematic issue for economic theory. On the one hand, particularly in complex settings, it seems that an equilibrium cannot a plausible prediction unless it can be understood as the end state of a dynamic adjustment process for which it is dynamically stable. In physics and chemistry there are explicit dynamical systems, and with respect to those stability is a well accepted principle. 
But in economics, explicit models of dynamic adjustment are systematically inconsistent with the principle of rational expectations: if a model of continuous adjustment of prices, or of mixed strategies, is understood and anticipated by the agents in the model, their behavior will exploit the process, not conform to it. Early work in general equilibrium theory (e.g., Arrow and Hurwicz (1958); Arrow et al. (1959)) found special cases, such as a single agent or two goods, in which at least one equilibrium is necessarily stable with respect to natural price adjustment processes. But Scarf (1960) produced examples showing that one could not hope for more general positive results, in the sense that “naive” dynamic adjustment processes, such as Walrasian tatonnement, can easily fail to have stable dynamics, even when there is a unique equilibrium and as few as three goods. A later stream of research (Saari and Simon (1978); Saari (1985); Williams (1985); Jordan (1987)) 1.6. DYNAMICAL SYSTEMS 19 showed that stability is informationally demanding, in the sense that an adjustment process that is guaranteed to return to equilibrium after a small perturbation requires essentially all the information in the matrix of partial derivatives of the aggregate excess demand function. On the whole there seems to be little hope of finding a theoretical basis for an assertion that some equilibrium is stable, or that a stable equilibrium exists. In his Foundations of Economic Analysis Samuelson (1947) Samuelson describes a correspondence principle, according to which the stability of an equilibrium has implications for the qualitative properties of its comparative statics. In this style of reasoning the stability of a given equilibrium is a hypothesis rather than a conclusion, so the problematic state of the existence issue is less relevant. That is, instead of claiming that some dynamical process should result in a stable equilibrium, one argues that equilibria with certain properties are not stable, so if what we observe is an equilibrium, it cannot have these properties. Proponents of such reasoning still need to wrestle with the fact that there is no canonical dynamical process. (The conceptual foundations of economic dynamics, and in particular the principle of rational expectations, were not well understood in Samuelson’s time, and his discussion would be judged today to have various weaknesses.) Here there is the possibility of arguing that although any one dynamical process might be ad hoc, the instability is common to all “reasonable” or “natural” dynamics, for example those in which price adjustment is positively related to excess demand, or that each agent’s mixed strategy adjusts in a direction that would improve her expected utility if other mixed strategies were not also adjusting. From a strictly logical point of view, such reasoning might seem suspect, but it seems quite likely that most economists find it intuitively and practically compelling. In Chapter 15 we present a necessary condition for stability of a component of the set of equilibria that was introduced into game theory by Demichelis and Ritzberger (2003). (See also Demichelis and Germano (2000).) We now give an informal description of this result, with the relevant background, and relate it to Samuelson’s correspondence principle. Let M be an m-dimensional C 2 manifold, where r ≥ 2. A vector field ζ on a set S ⊂ M is a continuous (in the obvious sense) assignment of a tangent vector ζp ∈ Tp M to each p ∈ S. 
Vector fields have many applications, but by far the most important is that if ζ is defined on an open U ⊂ M and satisfies a mild technical condition, then it determines an autonomous dynamical system: there is an open W ⊂ U × R such that for each p ∈ U, { t ∈ R : (p, t) ∈ W } is an interval containing 0, and a unique function Φ : W → U such that Φ(p, 0) = p for all p and, for each (p, t) ∈ W , the time derivative of Φ at (p, t) is ζΦ(p,t) . If W is the maximal domain admitting such a function, then Φ is the flow of ζ. A point p where ζp = 0 is an equilibrium of ζ. A set A ⊂ M is invariant if Φ(p, t) ∈ A for all p ∈ A and t ≥ 0. The ω-limit set of p ∈ M is

\bigcap_{t_0 \ge 0} \overline{\{\, \Phi(p, t) : t \ge t_0 \,\}}.

The domain of attraction of A is D(A) = { p ∈ M : the ω-limit set of p is nonempty and contained in A }. A set A ⊂ M is asymptotically stable if: (a) A is compact; (b) A is invariant; (c) D(A) is a neighborhood of A; (d) for every neighborhood Ũ of A there is a neighborhood U such that Φ(p, t) ∈ Ũ for all p ∈ U and t ≥ 0.

There is a well known sufficient condition for asymptotic stability. A function f : M → R is ζ-differentiable if the ζ-derivative

\zeta f(p) = \frac{d}{dt} f(\Phi(p, t)) \Big|_{t=0}

is defined for every p ∈ M. A continuous function L : M → [0, ∞) is a Lyapunov function for A ⊂ M if: (a) L^{-1}(0) = A; (b) L is ζ-differentiable with ζL(p) < 0 for all p ∈ M \ A; (c) for every neighborhood U of A there is an ε > 0 such that L^{-1}([0, ε]) ⊂ U. One of the oldest results in the theory of dynamical systems (Theorem 15.4.1), due to Lyapunov, is that if there is a Lyapunov function for A, then A is asymptotically stable. A converse Lyapunov theorem is a result asserting that if A is asymptotically stable, then there is a Lyapunov function for A. Roughly speaking, this is true, but there is in addition the question of what sort of smoothness conditions one may require of the Lyapunov function. The history of converse Lyapunov theorems is rather involved, and the issue was not fully resolved until the 1960s. We present one such theorem (Theorem 15.5.1) that is sufficient for our purposes.

There is a well established definition of the index of an isolated equilibrium of a vector field. We show that this extends to an axiomatically defined vector field index. The theory of the vector field index is exactly analogous to the theories of the degree and the fixed point index, and it can be characterized in terms of the fixed point index. Specifically, a vector field ζ defined on a compact C ⊂ M is index admissible if it does not have any equilibria in the boundary of C. It turns out that if ζ is defined on a neighborhood of C, and satisfies the technical condition guaranteeing the existence and uniqueness of the flow, then the vector field index of ζ is the fixed point index of Φ(·, t)|C for small negative t. (The characterization is in terms of negative time due to an unfortunate normalization axiom for the vector field index that is now traditional.) One may define the vector field index of a compact connected component of the set of equilibria to be the index of the restriction of the vector field to a small compact neighborhood of the component. The definition of asymptotic stability, and in particular condition (d), should make us suspect that there is a connection with the Euler characteristic, because for small positive t the flow Φ(·, t) will map neighborhoods of A into themselves.
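To see the Lyapunov condition in the simplest possible setting, here is a small numerical sketch. The vector field ζ(p) = −p on R^2, the candidate Lyapunov function L(p) = ‖p‖^2, and the crude Euler discretization of the flow are all illustrative choices of mine, not constructions from the text.

```python
import numpy as np

def zeta(p):
    # Illustrative vector field on R^2 with a unique equilibrium at the origin.
    return -p

def L(p):
    # Candidate Lyapunov function for A = {0}: L >= 0 and L^{-1}(0) = {0}.
    return float(np.dot(p, p))

def zeta_derivative(L, zeta, p, h=1e-6):
    # Numerical zeta-derivative: (d/dt) L(Phi(p, t)) at t = 0,
    # approximated by one tiny Euler step along the flow.
    return (L(p + h * zeta(p)) - L(p)) / h

rng = np.random.default_rng(0)
for _ in range(100):
    p = rng.normal(size=2)
    assert zeta_derivative(L, zeta, p) < 0   # condition (b) away from A

# Crude Euler integration of the flow: the trajectory should approach A = {0}.
p, dt = np.array([2.0, -1.5]), 0.01
for _ in range(2000):
    p = p + dt * zeta(p)
print(np.linalg.norm(p))   # very small, consistent with asymptotic stability of {0}
```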
The Lyapunov function given by the converse Lyapunov theorem is used in Section 15.6 to show that if A is dynamically stable and an ANR (otherwise the Euler characteristic is undefined) then the vector field index of A is (−1)^m χ(A). In particular, if A is a singleton, then A can only be stable when the vector field index of A is (−1)^m . This is the result of Demichelis and Ritzberger. The special case when A = {p0 } is a singleton is a prominent result in the theory of dynamical systems due to Krasnosel’ski and Zabreiko (1984).

We now describe the relationship between this result and qualitative properties of an equilibrium’s comparative statics. Consider the following stylized example. Let U be an open subset of Rm ; an element of U is thought of as a vector of endogenous variables. Let P be an open subset of Rn ; an element of P is thought of as a vector of exogenous parameters. Let z : U × P → Rm be a C 1 function, and let ∂x z(x, α) and ∂α z(x, α) denote the matrices of partial derivatives of the components of z with respect to the components of x and α. We think of z as a parameterized vector field on U. An equilibrium for a parameter α ∈ P is an x ∈ U such that z(x, α) = 0. Suppose that x0 is an equilibrium for α0 , and ∂x z(x0 , α0 ) is nonsingular. The implicit function theorem gives a neighborhood V of α0 and a C 1 function σ : V → U with σ(α0 ) = x0 and z(σ(α), α) = 0 for all α ∈ V . The method of comparative statics is to differentiate this equation with respect to α at α0 , then rearrange, obtaining the equation

\frac{d\sigma}{d\alpha}(\alpha_0) = -\partial_x z(x_0, \alpha_0)^{-1} \, \partial_\alpha z(x_0, \alpha_0)

describing how the endogenous variables adjust, in equilibrium, to changes in the vector of parameters. The Krasnosel’ski-Zabreiko theorem implies that if {x0 } is an asymptotically stable set for the dynamical system determined by the vector field z(·, α0 ), then the determinant of −∂x z(x0 , α0 )^{-1} is positive. This is a precise and general statement of the correspondence principle.

Part I

Topological Methods

Chapter 2

Planes, Polyhedra, and Polytopes

This chapter studies basic geometric objects defined by linear equations and inequalities. This serves two purposes, the first of which is simply to introduce basic vocabulary. Beginning with affine subspaces and half spaces, we will proceed to (closed) cones, polyhedra, and polytopes, which are polyhedra that are bounded. A rich class of well behaved spaces is obtained by combining polyhedra to form polyhedral complexes. Although this is foundational, there are nonetheless several interesting and very useful results and techniques, notably the separating hyperplane theorem, Farkas’ lemma, and barycentric subdivision.

2.1 Affine Subspaces

Throughout the rest of this chapter we work with a fixed d-dimensional real inner product space V . (Of course we are really talking about Rd , but a more abstract setting emphasizes the geometric nature of the constructions and arguments.) We assume familiarity with the concepts and results of basic linear algebra. An affine combination of y0 , . . . , yr ∈ V is a point of the form α0 y0 + · · · + αr yr where α = (α0 , . . . , αr ) is a vector of real numbers whose components sum to 1. We say that y0 , . . . , yr are affinely dependent if it is possible to represent a point as an affine combination of these points in two different ways: that is, if there are α ≠ α′ with

\sum_j \alpha_j = 1 = \sum_j \alpha'_j \quad \text{and} \quad \sum_j \alpha_j y_j = \sum_j \alpha'_j y_j .

If y0 , . . . , yr are not affinely dependent, then they are affinely independent.

Lemma 2.1.1. For any y0 , . . .
, yr ∈ V the following are equivalent: (a) y0 , . . . , yr are affinely independent; (b) y1 − y0 , . . . , yr − y0 are linearly independent; 23 24 CHAPTER 2. PLANES, POLYHEDRA, AND POLYTOPES (c) therePdo not exist β0 , . . . , βr ∈ R, not all of which are zero, with and j βj yj = 0. P j βj = 0 Proof. Suppose that y0 , . . . , P yr are affinely dependent, and let αj and αj′ be as above. P ′ If P we set βj = αj − Pαj , then j βj = 0 and j βj yj = 0, so (c) implies (a). In turn, if j βj = 0 and j βj yj = 0, then β1 (y1 − y0 ) + · · · + βr (yr − y0 ) = −(β1 + · · · + βr )y0 + β1 y1 + · · · + βr yr = 0, so y1 − y0 , . . . , yr − y0 are linearly dependent. Thus (b) implies (c). If β1 (y1 − y0 ) + · · · + βr (yr − y0 ) = 0, then for any α0 , . . . , αr with α0 + · · · + αr = 1 we can set β0 = −(β1 + · · · + βr ) and αj′ = αj + βj for j = 0, . . . , r, thereby showing that y0 , . . . , yr are affinely dependent. Thus (a) implies (b). The affine hull aff(S) of a set S ⊂ V is the set of all affine combinations of elements of S. The affine hull of S contains S as a subset, and we say that S is an affine subspace if the two sets are equal. That is, S is an affine subspace if it contains all affine combinations of its elements. Note that the intersection of two affine subspaces is an affine subspace. If A ⊂ V is an affine subspace and a0 ∈ A, then { a − a0 : a ∈ A } is a linear subspace, and the dimension dim A of A is, by definition, the dimension of this linear subspace. The codimension of A is d − dim A. A hyperplane is an affine subspace of codimension one. A (closed) half-space is a set of the form H = { v ∈ V : hv, ni ≤ β } where n is a nonzero element of V , called the normal vector of H, and β ∈ R. Of course H determines n and β only up to multiplication by a positive scalar. We say that I = { v ∈ V : hv, ni = β } is the bounding hyperplane of H. Any hyperplane is the intersection of the two half-spaces that it bounds. 2.2 Convex Sets and Cones A convex combination of y0 , . . . , yr ∈ V is a point of the form α0 y0 +· · ·+αr yr where α = (α0 , . . . , αr ) is a vector of nonnegative numbers whose components sum to 1. A set C ⊂ V is convex if it contains all convex combinations of its elements, so that (1 − t)x0 + tx1 ∈ C for all x0 , x1 ∈ C and 0 ≤ t ≤ 1. For any set S ⊂ V the convex hull conv(S) of S is the smallest convex containing S. Equivalently, it is the set of all convex combinations of elements of S. The following fact is a basic tool of geometric analysis. Theorem 2.2.1 (Separating Hyperplane Theorem). If C is a closed convex subset of V and z ∈ V \ C, then there is a half space H with C ⊂ H and z ∈ / H. 2.2. CONVEX SETS AND CONES 25 Proof. The case C = ∅ is trivial. Assuming C 6= ∅, the intersection of C with a closed ball centered at z is compact, and it is nonempty if the ball is large enough, in which case it must contain a point x0 that minimizes the distance to z over the points in this intersection. By construction this point is as close to z as any other point in C. Let n = z − x0 and β = h(x0 + z)/2, ni. Checking that hn, zi > β is a simple calculation. We claim that hx, ni ≤ hx0 , ni for all x ∈ C, which is enough to imply the desired result because hx0 , ni = β − 21 hn, ni. Aiming at a contradiction, suppose that x ∈ C and hx, ni > hx0 , ni, so that hx − x0 , z − x0 i > 0. For t ∈ R we have k(1 − t)x0 + tx − zk2 = kx0 − zk2 + 2thx0 − z, x − x0 i + t2 kx − x0 k2 , and for small positive t this is less than kx0 −zk2 , contradicting the choice of x0 . 
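Lemma 2.1.1(b) gives a mechanical test for affine independence: form the differences y1 − y0 , . . . , yr − y0 and check that they are linearly independent. Here is a minimal numpy sketch (the sample points are arbitrary illustrations of mine):

```python
import numpy as np

def affinely_independent(points):
    """Test affine independence of y0, ..., yr in R^d via Lemma 2.1.1(b):
    the differences y1 - y0, ..., yr - y0 must be linearly independent."""
    pts = np.asarray(points, dtype=float)
    diffs = pts[1:] - pts[0]          # r x d matrix of differences
    return np.linalg.matrix_rank(diffs) == len(diffs)

# Three affinely independent points in R^2 (a nondegenerate triangle) ...
print(affinely_independent([[0, 0], [1, 0], [0, 1]]))   # True
# ... and three collinear points, which are affinely dependent.
print(affinely_independent([[0, 0], [1, 1], [2, 2]]))   # False
```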
A convex cone is convex set C that is nonempty and closed under multiplication by nonnegative scalars, so that αx ∈ C for all x ∈ C and α ≥ 0. Such a cone is closed under addition: if x, y ∈ C, then x + y = 2( 21 x + 12 y) is a positive scalar multiple of a convex combination of x and y. Conversely, if a set is closed under addition and multiplication by positive scalars, then it is a cone. The dual of a convex set C is C ∗ = { n ∈ V : hx, ni ≥ 0 for all x ∈ C }. Clearly C ∗ is a convex cone, and it is closed, regardless of whether C is closed, because C ∗ is the intersection of the closed half spaces { n ∈ V : hx, ni ≥ 0 }. An intersection of closed half spaces is a closed convex cone. Farkas’ lemma is the converse of this: a closed convex cone is an intersection of closed half spaces. From a technical point of view, the theory of systems of linear inequalities is dominated by this result because a large fraction of the results about systems of linear inequalities can easily be reduced to applications of it. Theorem 2.2.2 (Farkas’ Lemma). If C is a closed convex cone, then for any b ∈ V \ C there is n ∈ C ∗ such that hn, bi < 0. Proof. The separating hyperplane theorem gives n ∈ V and β ∈ R such that hn, bi < β and hn, xi > β for all x ∈ C. Since 0 ∈ C, β < 0. There cannot be x ∈ C with hn, xi < 0 because we would have hn, αxi < β for sufficiently large α > 0, so n ∈ C∗. The recession cone of a convex set C is RC = { y ∈ V : x + αy ∈ C for all x ∈ C and α ≥ 0 }. Clearly RC is, in fact, a convex cone. Lemma 2.2.3. Suppose C is nonempty, closed, and convex. Then RC is the set of y ∈ V such that hy, ni ≤ 0 whenever H = { v ∈ V : hv, ni ≤ β } is a half space containing C, so RC is closed because it is an intersection of closed half spaces. In addition, C is bounded if and only if RC = {0}. 26 CHAPTER 2. PLANES, POLYHEDRA, AND POLYTOPES Proof. Since C 6= ∅, if y ∈ RC , then hy, ni ≤ 0 whenever H = { v ∈ V : hv, ni ≤ β } is a half space containing C. Suppose that y satisfies the latter condition and x ∈ C. Then for all α ≥ 0, x + αy is contained in every half space containing C, and the separating hyperplane theorem implies that the intersection of all such half spaces is C itself. Thus y is in RC . If RC has a nonzero element, then of course C is unbounded. Suppose that C is unbounded. Fix a point x ∈ C, and let y1 , y2, . . . be a divergent sequence in C. y −x Passing to a subsequence if need be, we can assume that kyjj −xk converges to a unit vector w. To show that w ∈ RC it suffices to observe that if H = { v : hv, ni ≤ β } is a half space containing C, then hw, ni ≤ 0 because yj − x β − hx, ni ,n ≤ → 0. kyj − xk kyj − xk The lineality space of a convex set C is LC = RC ∩ −RC = { y ∈ V : x + αy ∈ C for all x ∈ C and α ∈ R }. The lineality space is closed under addition and scalar multiplication, so it is a linear subspace of V , and in fact it is the largest linear subspace of V contained in RC . Let L⊥ C be the orthogonal complement of LC . Clearly C + LC = C, so C = (C ∩ L⊥ C ) + LC . A convex cone is said to be pointed if its lineality space is {0}. Lemma 2.2.4. If C = 6 V is a closed convex cone, then there is n ∈ C ∗ with hn, xi > 0 for all x ∈ C \ LC . Proof. For n ∈ C ∗ let Zn = { x ∈ C : hx, ni = 0 }. Let n be a point in C ∗ that minimizes the dimension of the span of Zn . Aiming at a contradiction, suppose that 0 6= x ∈ Zn \ LC . Then −x ∈ / C because x ∈ / LC , and Farkas Lemma gives an n′ ∈ C ∗ with hx, n′ i < 0. 
Then Zn+n′ ⊂ Zn ∩ Zn′ (this inclusion holds for all n, n′ ∈ C ∗ ) and the span of Zn+n′ does not contain x, so it is a proper subspace of the span of Zn . 2.3 Polyhedra A polyhedron in V is an intersection of finitely many closed half spaces. We adopt the convention that V itself is a polyhedron by virtue of being “the intersection of zero half-spaces.” Any hyperplane is the intersection of the two half-spaces it bounds, and any affine subspace is an intersection of hyperplanes, so any affine subspace is a polyhedron. The dimension of a polyhedron is the dimension of its affine hull. Fix a polyhedron P . A face of P is either the empty set, P itself, or the intersection of P with the bounding hyperplane of some half-space that contains P . Evidently any face of P 27 2.3. POLYHEDRA is itself a polyhedron. If F and F ′ are faces of P with F ′ ⊂ F , then F ′ is a face of F , because if F ′ = P ∩ I ′ where I ′ is the bounding hyperplane of a half space containing P , then that half space contains F and F ′ = F ∩ I ′ . A face is proper if it is not P itself. A facet of P is a proper face that is not a proper subset of any other proper face. An edge of P is a one dimensional face, and a vertex of P is a zero dimensional face. Properly speaking, a vertex is a singleton, but we will often blur the distinction between such a singleton and its unique element, so when we refer to the vertices of P , usually we will mean the points themselves. We say that x ∈ P is an initial point of P if there does not exist x′ ∈ P and a nonzero y ∈ RP such that x = x′ + y. If the lineality subspace of P has positive dimension, so that RP is not pointed, then there are no initial points. Proposition 2.3.1. The set of initial points of P is the union of the bounded faces of P . Proof. Let F be a face of P , so that F = P ∩ I where I is the bounding hyperplane of a half plane H containing P . Let x be a point in F . We first show that if x is noninitial, then F is unbounded. Let Let x = x′ + y for some x′ ∈ P and nonzero y ∈ RP . Since x − y and x + y are both in H, they must both be in I, so F contains the ray { x + αy : α ≥ 0, and this ray is contained in P because y ∈ RP , so F is unbounded. We now know that the union of the bounded faces is contained in the set of initial points, and we must show that if x is not contained in a bounded face, it is noninitial. We may assume that F is the smallest face containing x. Since F is unbounded there is a nonzero y ∈ RF . The ray { x − αy : α ≥ 0 } leaves P at some α ≥ 0. (Otherwise the lineality of RP has positive dimension and there are no initial points.) If α > 0, then x is noninitial, and α = 0 is impossible because it would imply that x belonged to a proper face of F . Proposition 2.3.2. If RP is pointed, then every point in P is the sum of an initial point and an element of RP . Proof. Lemma 2.2.4 gives an n ∈ V such that hy, ni > 0 for all nonzero y ∈ RP . Fix x ∈ P . Clearly K = (x − RP ) ∩ P is convex, and it is bounded because its recession cone is contained in −Rp ∩ RP = {0}. Lemma 2.2.3 implies that K is closed, hence compact. Let x′ be a point in K that minimizes hx′ , ni. Then x is a sum of x′ and a point in RP , and if x′ was not initial, so that x′ = x′′ + y where x′′ ∈ P and 0 6= y ∈ RP , then hx′′ , ni < hx′ , ni, which is impossible. Any polyhedron has a standard representation, which is a representation of the form k \ P =G∩ Hi i=1 where G is the affine hull of P and H1 , . . . , Hk are half-spaces. 
T This representation of P is minimal if it is irredundant, so that for each j, G ∩ i6=j Hi is a proper superset. Starting with any standard representation of P , we can reduce it to a minimal representation by repeatedly eliminating redundant half spaces. We now fix a minimal representation, with Hi = { v ∈ V : hv, ni i ≤ αi } and Ii the bounding hyperplane of Hi . 28 CHAPTER 2. PLANES, POLYHEDRA, AND POLYTOPES Lemma 2.3.3. P has a nonempty interior in the relative topology of G. Proof. For each i we cannot have P ⊂ Ii because that would imply that G ⊂ Ii , making Hi redundant. Therefore P must contain some xi in the interior of each Hi . If x0 is a convex combination of x1 , . . . , xk with positive weights, then x0 is contained in the interior of each Hi . T Proposition 2.3.4. For J ⊂ {1, . . . , k} let FJ = P ∩ j∈J Ij . Then FJ is a face of P , and every nonempty face of P has this form. Proof. If we choose numbers βj > 0 for all j ∈ J, then X X x, βj nj ≤ βj αj j∈J j∈J for all x ∈ P , with equality if and only if x ∈ FJ . We have displayed FJ as a face. Now let F = P ∩ H where H = { v ∈ V : hv, ni ≤ α } is a half-space containing P , and let J = { j : F ⊂ Ij }. Of course F ⊂ FJ . Aiming at a contradiction, suppose there is a point x ∈ FJ \ F . Then hx, ni i ≤ αi for all i ∈ / J and hx, nj i = αj for all j ∈ J. For each i ∈ / J there is a yi ∈ F with hyi , ni i < αi ; let y be a strict convex combination of these. Then hy, ni i < αi for all i ∈ / J and hy, nj i ≤ αj for all j ∈ J. Since x ∈ / H and y ∈ H, the ray emanating from x and passing through y leaves H at y, and consequently it must leave P at y, but continuing along this ray from y does not immediately violate any of the inequalities defining P , so this is a contradiction. This result has many worthwhile corollaries. Corollary 2.3.5. P has finitely many faces, and the intersection of any two faces is a face. Corollary 2.3.6. If F is a face of P and F ′ is a face of F , then F ′ is a face of P . T Proof. If G0 is the affine hull of F , then F =TG0 ∩ i Hi is a standard representation T ′ of F . The proposition implies that F = P ∩ I for some J, that F = F ∩ i i∈J i∈J ′ Ii T for some J ′ , and that F ′ = P ∩ i∈J∪J ′ Ii is a face of P . Corollary 2.3.7. The facets of P are F{1} , . . . , F{k} . The dimension of each F{i} is one less than the dimension of P , The facets are the only faces of P with this dimension. Proof. Minimality implies that each F{i} is a proper face, and the result above implies that F{i} cannot be a proper subset of another proper face. Thus each F{i} is a facet. For each i minimality implies that for each j 6= i there is some xj ∈ F{i} \ F{j} . Let x be a convex combination of these with positive weights, then F{i} contains a neighborhood of x in Ii , so the dimension of F{i} is the dimension of G ∩ Ii , which is one less than the dimension of P . A face F that is not a facet is a proper face of some facet, so its dimension is not greater than two less than the dimension of P . 29 2.4. POLYTOPES Now suppose that P is bounded. Any point in P that is not a vertex can be written as a convex combination of points in proper faces of P . Induction on the dimension of P proves that: Proposition 2.3.8. If P is bounded, then it is the convex hull of its set of vertices. An extreme point of a convex set is a point that is not a convex combination of other points in the set. This result immediately implies that only vertices of P can be extreme. 
In fact any vertex v is extreme: if {v} = P ∩ I where I is the bounding hyperplane of a half space H containing P , then v cannot be a convex combination of elements of P \ I. 2.4 Polytopes A polytope in V is the convex hull of a finite set of points. Polytopes were already studied in antiquity, but the subject continues to be an active area of research; Ziegler (1995) is a very accessible introduction. We have just seen that a bounded polyhedron is a polytope. The most important fact about polytopes is the converse: Theorem 2.4.1. A polytope is a polyhedron. Proof. Fix P = conv{q1 , . . . , qℓ }. The property of being a polyhedron is invariant under translations: for any x ∈ V , P is a polyhedron if and only if x + P is also a polyhedron. It is also invariant under passage to subspaces: P is a polyhedron in V if and only if it is a polyhedron in the span of P , and in any intermediate subspace. The two invariances imply that we may reduce to a situation where the dimension of P is the same as the dimension of V , and from there we may translate to make the origin of V an interior point of P . Assume this is the case. Let P ∗ = { v ∈ V : hv, pi ≤ 1 for all p ∈ P } and P ∗∗ = { u ∈ V : hu, vi ≤ 1 for all v ∈ P ∗ }. Since P is bounded and has the origin as T an interior point, P ∗ is bounded with the origin in its interior. The formula P ∗ = j { v ∈ V : hv, qj i ≤ 1 } displays P ∗ as a polyhedron, hence a polytope. This argument with P ∗ in place of P implies that P ∗∗ is a bounded polyhedron, so it suffices to show that P ∗∗ = P . The definitions immediately imply that P ⊂ P ∗∗ . Suppose that z ∈ / P . The separating hyperplane theorem gives w ∈ V and β ∈ R such that hw, zi < β and hw, pi > β for all p ∈ P . Since the origin is in P , β < 0. Therefore −w/β ∈ P ∗ , and consequently z ∈ / P ∗∗ . Wrapping things up, there is the following elegant decomposition result: Proposition 2.4.2. Any polyhedron P is the sum of a linear subspace, a pointed cone, and a polytope. 30 CHAPTER 2. PLANES, POLYHEDRA, AND POLYTOPES Proof. Let L be its lineality, and let K be a linear subspace of V that is complementary to L in the sense that K ∩ L = {0} and K + L = V . Let Q = P ∩ K. Then P = Q + L, and the lineality of Q is {0}, so RQ is pointed. Let S be the convex hull of the set of initial points of Q. Above we saw that this is the convex hull of the set of vertices of Q, so S is a polytope. Now Proposition 2.3.2 gives P = L + RQ + S. 2.5 Polyhedral Complexes A wide variety of spaces can be created by taking the union of a finite collection of polyhedra. Definition 2.5.1. A polyhedral complex is a finite set P = {P1 , . . . , Pk } of polyhedra in V such that: (a) F ∈ P whenever P ∈ P and F is a nonempty face of P ; (b) for any 1 ≤ i, j ≤ k, Pi ∩ Pj is a common (possibly empty) face of Pi and Pj . The underlying space of the complex is |P| := [ P, P ∈P and we say that P is a polyhedral subdivision of |P|. The dimension of P is the maximum dimension of any of its elements. To illustrate this concept we mention a structure that was first studied by Descartes, and that has accumulated a huge literature over the centuries . Let x1 , . . . , xn be distinct points in V . The Voronoi diagram determined by these points is P = { PJ : ∅ = 6 J ⊂ {1, . . . , n} } ∪ {∅} where PJ = { y ∈ V : ky − xj k ≤ ky − xi k for all j ∈ J and i = 1, . . . , n } is the set of points such that the xj for j ∈ J are as close to y as any of the points x1 , . . . , xn . 
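For readers who would like to see such a diagram computed, the following sketch uses scipy.spatial.Voronoi to build the diagram of a few points in the plane; the choice of library and of sample sites is mine, purely for illustration, and the cells it reports correspond to the polyhedra PJ for singletons J = {j}.

```python
import numpy as np
from scipy.spatial import Voronoi

# A few sites in the plane (illustrative data).
sites = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [3.0, 2.0], [1.0, 1.0]])
vor = Voronoi(sites)

# For each site x_j, the cell { y : |y - x_j| <= |y - x_i| for all i } is recorded
# as a list of indices into vor.vertices (-1 marks an unbounded direction).
for j, region_index in enumerate(vor.point_region):
    region = vor.regions[region_index]
    cell_vertices = [vor.vertices[v] for v in region if v != -1]
    kind = "unbounded" if -1 in region else "bounded"
    print(f"site {j}: {kind} cell with vertices {cell_vertices}")
```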
From Euclidean geometry we know that the condition ‖y − xj‖ ≤ ‖y − xi‖ determines a half space in V (a quick calculation shows that ‖y − xj‖² ≤ ‖y − xi‖² if and only if ⟨y, xj − xi⟩ ≥ ½(‖xj‖² − ‖xi‖²)) so each PJ is a polyhedron, and conditions (a) and (b) are easy consequences of Proposition 2.3.4.

Fix a polyhedral complex P. A subcomplex of P is a subset Q ⊂ P that contains all the faces of its elements, so that Q is also a polyhedral complex. If this is the case, then |Q| is a closed (because it is a finite union of closed subsets) subset of |P|. We say that P is a polytopal complex if each Pj is a polytope, in which case P is said to be a polytopal subdivision of |P|. Note that |P| is necessarily compact because it is a finite union of compact sets. A k-dimensional simplex is the convex hull of an affinely independent collection of points x0 , . . . , xk . We say that P is a simplicial complex, and that P is a simplicial subdivision of |P|, or a triangulation, if each Pj is a simplex.

We now describe a general method of subdividing a polytopal complex P into a simplicial complex Q. For each P ∈ P choose wP in the relative interior of P . Let Q be the collection of sets of the form σQ = conv({ wP : P ∈ Q }) where Q is a subset of P that is completely ordered by inclusion. We claim that Q is a simplicial complex, and that |Q| = |P|. Suppose that Q = {P0 , . . . , Pk } where Pi−1 is a proper subset of Pi for 1 ≤ i ≤ k. For each i, wP0 , . . . , wPi−1 are contained in Pi−1 , and wPi is not contained in the affine hull of Pi−1 , so wPi − wP0 is not spanned by wP1 − wP0 , . . . , wPi−1 − wP0 . By induction, wP1 − wP0 , . . . , wPk − wP0 are linearly independent. Now Lemma 2.1.1 implies that wP0 , . . . , wPk are affinely independent, so σQ is a simplex.

In addition to Q, suppose that Q′ = {P′0 , . . . , P′k′ } where P′j−1 is a proper subset of P′j for 1 ≤ j ≤ k′ . Clearly σQ∩Q′ ⊂ σQ ∩ σQ′ , and we claim that it is also the case that σQ ∩ σQ′ ⊂ σQ∩Q′ . Consider an arbitrary x ∈ σQ ∩ σQ′ . It suffices to show the desired inclusion with Q and Q′ replaced by the smallest sets Q̃ ⊂ Q and Q̃′ ⊂ Q′ such that x ∈ σQ̃ ∩ σQ̃′ , so we may assume that x is in the interior of Pk and in the interior of P′k′ , and it follows that Pk = P′k′ . In addition, the ray emanating from wPk and passing through x leaves Pk at a point y ∈ σ{P0 ,...,Pk−1} ∩ σ{P′0 ,...,P′k′−1} , and the claim follows by induction on max{k, k′ }.

We have shown that Q is a simplicial complex. Evidently |Q| ⊂ |P|. Choosing x ∈ |P| arbitrarily, let P be the smallest element of P that contains x. If x = wP , then x ∈ σ{P} , and if P is 0-dimensional then this is the only possibility. Otherwise the ray emanating from wP and passing through x intersects the boundary of P at a point y, and if y ∈ σQ , then x ∈ σQ∪{P} . By induction on the dimension of P we see that x is contained in some element of Q, so |Q| = |P|.

This construction shows that the underlying space of a polytopal complex is also the underlying space of a simplicial complex. In addition, repeating this process can give a triangulation with small simplices. The diameter of a polytope is the maximum distance between any two of its points. The mesh of a polytopal complex is the maximum of the diameters of its polytopes. Consider an ℓ-dimensional simplex P whose vertices are v0 , . . . , vℓ . The barycenter of P is
β(P) := (v0 + · · · + vℓ)/(ℓ + 1).

In the construction above, suppose that P is a simplicial complex, and that we chose wP = β(P ) for all P . We would like to bound the diameter of the simplices in the subdivision of |P|, which amounts to giving a bound on the maximum distance between the barycenters of any two nested faces. After reindexing, these can be taken to be the faces spanned by v0 , . . . , vk and v0 , . . . , vℓ where 0 ≤ k < ℓ ≤ m and m is the dimension of P. The following rather crude inequality, in which D denotes the diameter of P, is sufficient for our purposes:

\Big\| \frac{1}{k+1}(v_0 + \cdots + v_k) - \frac{1}{\ell+1}(v_0 + \cdots + v_\ell) \Big\|
= \frac{1}{(k+1)(\ell+1)} \Big\| \sum_{0 \le i \le k} \sum_{0 \le j \le \ell} (v_i - v_j) \Big\|
\le \frac{1}{(k+1)(\ell+1)} \sum_{0 \le i \le k} \sum_{0 \le j \le \ell,\, j \ne i} \| v_i - v_j \|
\le \frac{1}{(k+1)(\ell+1)} (k+1)\,\ell\, D \le \frac{m}{m+1} D.

It follows from this that the mesh of the subdivision of |P| is not greater than m/(m + 1) times the mesh of P. Since we can subdivide repeatedly:

Proposition 2.5.2. The underlying space of a polytopal complex has triangulations of arbitrarily small mesh.

Simplicial complexes can be understood in purely combinatoric terms. An abstract simplicial complex is a pair (V, Σ) where V is a finite set of vertices and Σ is a collection of subsets of V with the property that τ ∈ Σ whenever σ ∈ Σ and τ ⊂ σ. The geometric interpretation is as follows. Let { ev : v ∈ V } be the standard unit basis vectors of R^V : the v-component of ev is 1 and all other coordinates are 0. (Probably most authors would work with R^{|V|} , but our approach is simpler and formally correct insofar as Y^X is the set of functions from X to Y .) For each nonempty σ ∈ Σ let Pσ be the convex hull of { ev : v ∈ σ }, and let P∅ = ∅. The simplicial complex P(V,Σ) = { Pσ : σ ∈ Σ } is called the canonical realization of (V, Σ).

Let P be a simplicial complex, and let V be the set of vertices of P. For each P ∈ P let σP = P ∩ V be the set of vertices of P , and let Σ = { σP : P ∈ P }. It is easy to see that extending the map v ↦ ev affinely on each simplex induces a homeomorphism between |P| and |P(V,Σ)|. Thus the homeomorphism type of a simplicial complex is entirely determined by its combinatorics, i.e., the “is a face of” relation between the various simplices. Geometric simplicial complexes and abstract simplicial complexes encompass the same class of homeomorphism types of topological spaces.

Simplicial complexes are very important in topology. On the one hand a wide variety of important spaces have simplicial subdivisions, and certain limiting processes can be expressed using repeated barycentric subdivision. On the other hand, the purely combinatoric nature of an abstract simplicial complex allows combinatoric and algebraic methods to be applied. In addition the requirement that a simplicial subdivision exists rules out spaces exhibiting various sorts of pathologies and infinite complexities. A nice example of a space that does not have a simplicial subdivision is the Hawaiian earring, which is the union over all n = 1, 2, 3, . . . of the circle of radius 1/n centered at (1/n, 0) ∈ R2 .

2.6 Graphs

A graph is a one dimensional polytopal complex. That is, it consists of finitely many zero and one dimensional polytopes, with the one dimensional polytopes intersecting at common endpoints, if they intersect at all. A one dimensional polytope is just a line segment, which is a one dimensional simplex, so a graph is necessarily a simplicial complex.
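Before turning to graphs, here is a quick numerical check of the mesh bound behind Proposition 2.5.2: the sketch subdivides a single simplex barycentrically (chains of faces ordered by inclusion, with each wP taken to be the barycenter) and compares the resulting mesh with m/(m + 1) times the original diameter. It is a minimal illustration for one simplex, not the general construction for a polytopal complex.

```python
import itertools
import numpy as np

def barycentric_subdivision(vertices):
    """One barycentric subdivision of the simplex conv(vertices).
    Each maximal chain of nonempty faces F_0 < F_1 < ... < F_m yields the
    simplex spanned by their barycenters."""
    verts = [np.asarray(v, dtype=float) for v in vertices]
    m = len(verts) - 1
    simplices = []
    for order in itertools.permutations(range(m + 1)):
        # The chain {order[0]} < {order[0], order[1]} < ... < all vertices.
        chain = [order[: i + 1] for i in range(m + 1)]
        simplices.append([np.mean([verts[i] for i in face], axis=0) for face in chain])
    return simplices

def diameter(points):
    return max(np.linalg.norm(p - q) for p in points for q in points)

triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # a 2-simplex, so m = 2
mesh = max(diameter(s) for s in barycentric_subdivision(triangle))
bound = (2 / 3) * diameter([np.array(v) for v in triangle])
print(mesh, bound)   # the mesh is indeed at most (m/(m+1)) times the diameter
```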
Relative to general simplicial complexes, graphs sound pretty simple, and from the perspective of our work here this is indeed the case, but the reader should be aware that there is much more to graph theory than this. The formal study of graphs in mathematics began around the middle of the 20th century and quickly became an extremely active area of research, with numerous subfields, deep results, and various applications such as the theory of networks in economic theory. Among the numerous excellent texts in this area, Bollobás (1979) can be recommended to the beginner. This book will use no deep or advanced results about graphs. In fact, almost everything we need to know about them is given in Lemma 2.6.1 below. The main purpose of this section is simply to introduce the basic terminology of the subject, which will be used extensively. Formally, a graph1 is a triple G = (V, E) consisting of a finite set V of vertices and a set E of two element subsets of V . An element of e = {v, w} of E is called an edge, and v and w are its endpoints. Sometimes one writes vw in place of {v, w}. Two vertices are neighbors if they are the endpoints of an edge. The degree of a vertex is the cardinality of its set of neighbors. A walk in G is a sequence v0 v1 · · · vr of vertices such that vj−1 and vj are neighbors for each j = 1, . . . , r. It is a path if v0 , . . . , vr are all distinct. A path is 1 In the context of graph theory the sorts of graphs we describe here are said to be “simple,” to distinguish them from a more complicated class of graphs in which there can be loops (that is, edges whose two endpoints are the same) and multiple edges connecting a single pair of vertices. They are also said to be “undirected” to distinguish them from so-called directed graphs in which each edge is oriented, with a “source” and “target.” 34 CHAPTER 2. PLANES, POLYHEDRA, AND POLYTOPES maximal if it not contained (in the obvious sense) in a longer path. Two vertices are connected if they are the endpoints of a path. This is an equivalence relation, and a component of G is one of the graphs consisting of an equivalence class and the edges in G joining its vertices. We say that G is connected if it has only one component, so that any two vertices are connected. A walk v0 v1 · · · vr is a cycle if r ≥ 3, v0 , . . . , vr−1 are distinct, and vr = v0 . If G has no cycles, then it is said to be acyclic. A connected acyclic graph is a tree. The following simple fact is the only “result” from graph theory applied in this book. It is sufficiently obvious that there would be little point in including a proof. Lemma 2.6.1. If the degree of each of the vertices of G is at most two, then the components of G are maximal paths, cycles, and vertices with no neighbors. This simple principle underlies all the algorithms described in Chapter 3. There are an even number of endpoints of paths in G. If it is known that an odd number represent or embody a situation that is not what we are looking for, then the rest do embody what we are looking for, and in particular the number of “solutions” is odd, hence positive. If it is known that exactly one endpoint embodies what we are not looking for, and that endpoint is easily computed, then we can find a solution by beginning at that point and following the path to its other endpoint. Chapter 3 Computing Fixed Points When it was originally proved, Brouwer’s fixed point theorem was a major breakthrough, providing a resolution of several outstanding problems in topology. 
Since that time the development of mathematical infrastructure has provided access to various useful techniques, and a number of easier demonstrations have emerged, but there are no proofs that are truly simple. There is an important reason for this. The most common method of proving that some mathematical object exists is to provide an algorithm that constructs it, or some proxy such as an arbitrarily accurate approximation, but for fixed points this is problematic. Naively, one might imagine a computational strategy that tried to find an approximate fixed point by examining the value of the function at various points, eventually halting with a declaration that a certain point was a good approximation of a fixed point. For a function f : [0, 1] → [0, 1] such a strategy is feasible because if f (x) > x and f (x′ ) < x′ (as is the case if x = 0 and x′ = 1 unless one of these is a fixed point) then the intermediate value function implies that there is a fixed point between x and x′ . According to the sign of f (x′′ ) − x′′ , where x′′ = (x+ x′ )/2, we can replace x or x′ with x′′ , obtaining an interval with the same property and half the length. Iterating this procedure provides an arbitrarily fine approximation of a fixed point. In higher dimensions such a computational strategy can never provide a guarantee that the output is actually near a fixed point. To say precisely what we mean by this we need to be a bit more precise. Suppose you set out in search of a fixed point of a continuous function f : X → X (where X is nonempty, compact, and convex subset of a Euclidean space) armed with nothing more than an “oracle” that evaluates f . That is, the only computational resources you can access are the theoretical knowledge that f is continuous, and a “black box” that tells you the value of f at any point in its domain that you submit to it. An algorithm is, by definition, a computational procedure that is guaranteed to halt eventually, so our supposed algorithm for computing a fixed point necessarily halts after sampling the oracle finitely many times, say at x1 , . . . , xn , with some declaration that such-and-such is at least an approximation of a fixed point. Provided that the dimension of X is at least two, the Devil could now change the function to one that agrees with the original function at every point that was sampled, is continuous, and has no fixed points anywhere near the point designated by the algorithm. (One way to do this is 35 36 CHAPTER 3. COMPUTING FIXED POINTS to replace f with h−1 ◦ f ◦ h where h : X → X is a suitable homeomorphism satisfying h(xi ) = xi and h(f (xi )) = f (xi ) for all i = 1, . . . , n.) The algorithm necessarily processes the new function in the same way, arriving at the same conclusion, but for the new function that conclusion is erroneous. Our strategy for proving Brouwer’s fixed point theorem will, of necessity, be a bit indirect. We will prove the existence of objects that we will describe as “points that are approximately fixed.” (The exact nature of such objects will vary from one proof to the next.) An infinite sequence of such points, with the “error” of the approximation converging to zero, will have the property that each of its limit points is a fixed point. The proof that any sequence in a compact space has an accumulation point uses the axiom of choice, and in fact Brouwer’s fixed point theorem cannot be proved without it. 
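Returning for a moment to the one-dimensional bisection strategy described above, a minimal sketch looks as follows (the particular function and tolerance are illustrative choices of mine):

```python
def bisect_fixed_point(f, tol=1e-10):
    """Bracket a fixed point of a continuous f: [0,1] -> [0,1] by bisection,
    using the sign of f(x) - x as in the intermediate value argument above."""
    lo, hi = 0.0, 1.0
    if f(lo) == lo:
        return lo
    if f(hi) == hi:
        return hi
    # Here f(lo) > lo and f(hi) < hi, so a fixed point lies in [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) >= mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative f with fixed point 0.65; any continuous self-map of [0,1] works.
print(bisect_fixed_point(lambda x: (x * x + 0.3) ** 0.5 - 0.2))
```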
The axiom of choice was rather controversial when it emerged, with constructivists (Brouwer himself became one late in life) arguing that mathematics should only consider objects whose definitions are, in effect, algorithms for computing the object in question, or at least a succession of finer and finer approximations. It turns out that this is quite restrictive, so the ‘should’ of the last sentence becomes quite puritanical, at least in comparison with the rich mathematics allowed by a broader set of allowed definitions and accepted axioms, and constructivism has almost completely faded out in recent decades. This chapter studies two algorithmic ideas for computing points that are approximate fixed. One of these uses an algorithm for computing a Nash equilibrium of a two person game. The second may be viewed as a matter of approximating the given function or correspondence with an approximation that is piecewise linear in the sense that its graph is a polyhedral complex. In both cases the algorithm traverses a path of edges in a polyhedral complex, and in the final section we explain recent advances in computer science concerning such algorithms and the problems they solve. 3.1 The Lemke-Howson Algorithm In a two person game each of the two players is required to choose an element from a set of strategies, without being informed of the other player’s choice, and each player’s payoff depends jointly on the pair of strategies chosen. A pair consisting of a strategy for each agent is a Nash equilibrium if neither agent can do better by switching to some other strategy. The “mixed extension” is the derived two person game with the same two players in which each player’s set of strategies is the set of probability measures on that player’s set of strategies in the original game. Payoffs in the mixed extension are computed by taking expectations. In a sense, our primary concern in this section and the next is to show that when the sets of strategies in the given game are finite, the mixed extension necessarily has a Nash equilibrium. But we will actually do something quite a bit more interesting and significant, by providing an algorithm that computes a Nash equilibrium. We will soon see that the existence result is a special case of the Kakutani fixed point theorem. But actually this case is not so “special” because we will eventually 37 3.1. THE LEMKE-HOWSON ALGORITHM see that two person games can be used to approximate quite general fixed point problems. Formally, a finite two person game consists of: (a) nonempty finite sets S = {s1 , . . . , sm } and T = {t1 , . . . , tn } of pure strategies for the two agents, who will be called agent 1 and agent 2; (b) payoff functions u, v : S × T → R. Elements of S × T are called pure strategy profiles. A pure Nash equilibrium is a pure strategy profile (s, t) such that u(s′ , t) ≤ u(s, t) for all s′ ∈ S and v(s, t′ ) ≤ v(s, t) for all t′ ∈ T . To define the mixed extension we need notational conventions for probability measures on finite sets. For each k = 0, 1, 2, . . . let ∆k−1 = { ρ ∈ Rk+ : ρ1 + · · · + ρk = 1 } be the k − 1 dimensional simplex. We will typically think of this as the set of probability measures on a set with k elements indexed by the integers 1, . . . , k. In particular, let S = ∆m−1 and T = ∆n−1 ; elements of these sets are called mixed strategies for agents 1 and 2 respectively. 
Abusing notation, we will frequently identify pure strategies si ∈ S and tj ∈ T with the mixed strategies in S and T that assign all probability to i and j. An element of S × T is called a mixed strategy profile. We let u and v also denote the bilinear extensions of the given payoff functions to S × T , so the expected payoffs resulting from a mixed strategy profile (σ, τ ) ∈ S × T are

u(\sigma, \tau) = \sum_{i=1}^m \sum_{j=1}^n u(s_i, t_j) \sigma_i \tau_j \qquad \text{and} \qquad v(\sigma, \tau) = \sum_{i=1}^m \sum_{j=1}^n v(s_i, t_j) \sigma_i \tau_j

respectively. A (mixed) Nash equilibrium is a mixed strategy profile (σ, τ ) ∈ S × T such that each agent is maximizing her expected payoff, taking the other agent’s mixed strategy as given, so that u(σ′ , τ ) ≤ u(σ, τ ) for all σ′ ∈ S and v(σ, τ′ ) ≤ v(σ, τ ) for all τ′ ∈ T .

The algebraic expressions for expected payoffs given above are rather bulky. There is a way to “lighten” our notation that also allows linear algebra to be applied. Let A and B be the m × n matrices with entries aij = u(si , tj ) and bij = v(si , tj ). Treating mixed strategies as column vectors, we have u(σ, τ ) = σ^T Aτ and v(σ, τ ) = σ^T Bτ , so that (σ, τ ) is a Nash equilibrium if σ′^T Aτ ≤ σ^T Aτ for all σ′ ∈ S and σ^T Bτ′ ≤ σ^T Bτ for all τ′ ∈ T . The set of Nash equilibria can be viewed as the set of fixed points of an upper semicontinuous convex valued correspondence β : S × T → S × T where β(σ, τ ) = β1 (τ ) × β2 (σ) is given by

\beta_1(\tau) = \operatorname{argmax}_{\sigma' \in S} \, \sigma'^T A \tau \qquad \text{and} \qquad \beta_2(\sigma) = \operatorname{argmax}_{\tau' \in T} \, \sigma^T B \tau' .

A concrete example may help to fix ideas. Suppose that m = n = 3, with

A = \begin{pmatrix} 3 & 3 & 4 \\ 4 & 3 & 3 \\ 3 & 4 & 3 \end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix} 4 & 5 & 2 \\ 4 & 2 & 5 \\ 5 & 4 & 2 \end{pmatrix} .

These payoffs determine the divisions of S and T , according to best responses, shown in Figure 3.1 below.

[Figure 3.1: the subdivisions of S and T into best response regions.]

Specifically, for any σ ∈ S, β2 (σ) is the set of probability measures that assign all probability to pure strategies whose associated regions in S contain σ in their closure, and similarly for β1 (τ ). With a little bit of work you should have no difficulty verifying that the divisions of S and T are as pictured, but the discussion uses only the qualitative information shown in the figure, so you can skip this chore if you like.

Because the number of pure strategies is quite small, we can use exhaustive search to find all Nash equilibria. For games in which each pure strategy has a unique best response, a relatively quick way to find all pure Nash equilibria is to start with an arbitrary pure strategy and follow the sequence of pure best responses until it visits a pure strategy a second time. The last two strategies on the path constitute a Nash equilibrium if they are best responses to each other, and none of the preceding strategies is part of a pure Nash equilibrium. If there are any pure strategies that were not reached, we can repeat the process starting at one of them, continuing until all pure strategies have been examined. For this example, starting at s1 gives the cycle

s1 → t2 → s3 → t1 → s2 → t3 → s1 ,

so there are no pure Nash equilibria. A similar procedure can be used to find Nash equilibria in which each agent mixes over two pure strategies.
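Before doing so, here is a short script implementing the pure best-response search just described, using the payoff matrices A and B of the example as reconstructed above; it assumes, as in the example, that every pure strategy has a unique best response.

```python
import numpy as np

# Payoff matrices of the running example (as reconstructed above).
A = np.array([[3, 3, 4], [4, 3, 3], [3, 4, 3]])   # agent 1
B = np.array([[4, 5, 2], [4, 2, 5], [5, 4, 2]])   # agent 2

def best_response_cycle(start_row=0):
    """Follow pure best responses starting from a pure strategy of agent 1,
    alternating s_i -> best t_j against s_i -> best s_i' against t_j -> ...,
    and stop as soon as a pure strategy is visited a second time."""
    visited, i = [], start_row
    while ("s", i) not in visited:
        visited.append(("s", i))
        j = int(np.argmax(B[i]))        # agent 2's unique best response to s_i
        if ("t", j) in visited:
            break
        visited.append(("t", j))
        i = int(np.argmax(A[:, j]))     # agent 1's unique best response to t_j
    return visited

print(best_response_cycle())
# [('s', 0), ('t', 1), ('s', 2), ('t', 0), ('s', 1), ('t', 2)]: the cycle
# s1 -> t2 -> s3 -> t1 -> s2 -> t3 -> s1, so there is no pure Nash equilibrium.
```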
If we consider s1 and s2 , we see that there are two mixtures that allow agent 2 to mix over two pure strategies, and we will need to consider both of them, so things are a bit more complicated than they were for pure strategies because the process “branches.” Suppose that agent 1 mixes over s1 and s2 in the proportion that makes t1 and t2 best responses. Agent 2 has a mixture of t1 and t2 that makes s2 and s3 best responses. There is a mixture of s2 and s3 that makes t1 and t3 best responses, and a certain mixture τ∗ of t1 and t3 makes s1 and s2 best responses. The only hope for continuing this path in a way that might lead to a Nash equilibrium is to now consider the mixture σ∗ of s1 and s2 that makes t1 and t3 best responses, and indeed, (σ∗ , τ∗ ) is a Nash equilibrium.

We haven’t yet considered the possibility that agent 1 might mix over s1 and s3 , nor have we examined what might happen if agent 2 mixes over t2 and t3 . There is a mixture of s1 and s3 that allows agent 2 to mix over t1 and t2 , which is a possibility we have already considered, and there is a mixture of t2 and t3 that allows agent 1 to mix over s1 and s3 , which we also analyzed above. Therefore there are no additional Nash equilibria in which both agents mix over two pure strategies. Could there be a Nash equilibrium in which one of the agents mixes over all three pure strategies? Agent 2 does have one mixed strategy that allows agent 1 to mix freely, but this mixed strategy assigns positive probability to all pure strategies (such a mixed strategy is said to be totally mixed) so it is not a best response to any of agent 1’s mixed strategies, and we can conclude that there is no Nash equilibrium of this sort. Thus (σ∗ , τ∗ ) is the only Nash equilibrium.

This sort of analysis quickly becomes extremely tedious as the game becomes larger. In addition, the fact that we are able to find all Nash equilibria in this way does not prove that there is always something to find. Before continuing we reformulate Nash equilibrium using a simple principle with numerous repercussions, namely that a mixed strategy maximizes expected utility if and only if it assigns all probability to pure strategies that maximize expected utility. To understand this formally it suffices to note that agent 1’s problem is to maximize

u(\sigma, \tau) = \sigma^T A \tau = \sum_{i=1}^m \sigma_i \sum_{j=1}^n a_{ij} \tau_j

subject to the constraints σi ≥ 0 for all i and \sum_{i=1}^m \sigma_i = 1, taking τ as given. From this it follows that:

Lemma 3.1.1. A mixed strategy profile (σ, τ ) is a Nash equilibrium if and only if:

(a) for each i = 1, . . . , m, either σi = 0 or \sum_{j=1}^n a_{ij} \tau_j \ge \sum_{j=1}^n a_{i'j} \tau_j for all i′ = 1, . . . , m;

(b) for each j = 1, . . . , n, either τj = 0 or \sum_{i=1}^m b_{ij} \sigma_i \ge \sum_{i=1}^m b_{ij'} \sigma_i for all j′ = 1, . . . , n.

For each of the m + n conditions there are two possibilities, so there are 2^{m+n} cases. For each of these cases the intuition derived from counting equations and unknowns suggests that the set of solutions of the conditions given in Lemma 3.1.1 will typically be zero dimensional, which is to say that it is a finite set of points. Thus we expect that the set of Nash equilibria will typically be finite. The Lemke-Howson algorithm is based on the hope that if we relax one of the conditions above, say the one saying that either σ1 = 0 or agent 1’s first pure strategy is a best response, then we may expect that the resulting set will be one dimensional.
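Before turning to that relaxation, note that Lemma 3.1.1 translates directly into a finite check. The sketch below verifies conditions (a) and (b) for a candidate profile; the candidate (σ∗, τ∗) = ((1/3, 2/3, 0), (1/2, 0, 1/2)) is my own computation for the example’s matrices, not a value stated explicitly in the text.

```python
import numpy as np

A = np.array([[3, 3, 4], [4, 3, 3], [3, 4, 3]])
B = np.array([[4, 5, 2], [4, 2, 5], [5, 4, 2]])

def is_nash(sigma, tau, A, B, tol=1e-9):
    """Check conditions (a) and (b) of Lemma 3.1.1: every pure strategy played
    with positive probability must attain the maximal expected payoff."""
    payoffs_1 = A @ tau       # expected payoff of each s_i against tau
    payoffs_2 = B.T @ sigma   # expected payoff of each t_j against sigma
    a = all(s <= tol or p >= payoffs_1.max() - tol for s, p in zip(sigma, payoffs_1))
    b = all(t <= tol or p >= payoffs_2.max() - tol for t, p in zip(tau, payoffs_2))
    return a and b

sigma_star = np.array([1/3, 2/3, 0.0])
tau_star = np.array([1/2, 0.0, 1/2])
print(is_nash(sigma_star, tau_star, A, B))                            # True
print(is_nash(np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), A, B))    # (s1, t2): False
```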
Specifically, we let M be the set of pairs (σ, τ ) ∈ S × T satisfying: P P (a) for each i = 2, . . . , m, either σi = 0 or nj=1 aij τj ≥ nj=1 ai′ j τj for all i′ = 1, . . . , m; Pm P ′ b σ ≥ (b) for each j = 1, . . . , n, either τj = 0 or m ij i i=1 bij ′ σi for all j = i=1 1, . . . , n. For the rest of the section we will assume that M is 1-dimensional, and that it does not contain any point satisfying more than m + n of the 2(m + n − 1) conditions “σi = 0,” “strategy i is optimal,” “τj = 0,” and “strategy j is optimal,” for 2 ≤ i ≤ m and 1 ≤ j ≤ n. For our example there is a path in M that follows the path (s1 , t2 ) −→ (A, t2 ) −→ (A, B) −→ (C, B) −→ (C, t1) −→ (D, t1 ) −→ (D, E). This path alternates between the moves in S and the moves in T shown in Figure 3.2 below: s2 t2 b b t3 2 Db s3 B 5 t1 C 4 3 t2 s1 b 1 s2 A b s3 b b t1 s1 6 E b t3 Figure 3.2 Let’s look at this path in detail. The best response to s1 is t2 , so (s1 , t2 ) ∈ M. The best response to t2 is s3 , so there is an edge in M leading away from (s1 , t2 ) that increases the probability of s3 until (A, t2 ) is reached. We can’t continue further in this direction because t2 would cease to be a best response. However, t1 becomes a 41 3.1. THE LEMKE-HOWSON ALGORITHM best response at A, so there is the possibility of holding A fixed and moving away from t2 along the edge of T between t1 and t2 . We can’t continue in this way past B because s3 would no longer be a best response. However, at B both s2 and s3 are best responses, so the conditions defining M place no constraints on agent 1’s mixed strategy. Therefore we can move away from (A, B) by holding B fixed and moving into the interior of S in a way that obeys the constraints on agent 2’s mixed strategy, which are that t1 and t2 are best responses. This edge bumps into the boundary of S at C. Since the probability of s3 is now zero, we are no longer required to have it be a best response, so we can continue from B along the edge of T until we arrive at t1 . Since the probability of t2 is now zero, we can move away from C along the edge between s1 and s2 until we arrive at D. Since t3 is now a best response, we can move away from t1 along the edge between t1 and t3 until we arrive at E. As we saw above, (D, E) = (σ ∗ , τ ∗ ) is a Nash equilibrium. We now explain how this works in general. If Y is a proper subset of {1, . . . , m} and D is a nonempty subset of {1, . . . , n}, let SY (D) = { σ ∈ S : σi = 0 for all i ∈ Y and D ⊂ argmax j=1,...,n X i bij σi } be the set of mixed strategies for agent 1 that assign zero probability to every pure strategy in Y and make every pure strategy in D a best response. Evidently SY (D) is a polytope. It is now time to say what “typically” means. The matrix B is said to be in Lemke-Howson general position if, for all Y and D, SY (D) is either empty or (m − |D| − |Y |)-dimensional. That is, SY (D) has the dimensions one would expect by counting equations and unknowns. In particular, if m < |D| + |Y |, then SY (D) is certainly empty. Similarly, if Z is a proper subset of {1, . . . , n} and C is a nonempty subset of {1, . . . , m}, let TZ (C) = { τ ∈ T : τj = 0 for all j ∈ Z and C ⊂ argmax i=1,...,m X j aij τj }. The matrix A is said to be in Lemke-Howson general position if, for all Z and C, TZ (C) is either empty or (n − |C| − |Z|)-dimensional. Through the remainder of this section we assume that A and B are in Lemke-Howson general position. 
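The sets SY (D) and TZ (C) are easy to probe numerically. The helper below (its name and the 0-based indexing are mine, for illustration only) reports, for a given σ, the set Y = { i : σi = 0 } and the set D of agent 2’s best responses, so that σ ∈ SY (D).

```python
import numpy as np

B = np.array([[4, 5, 2], [4, 2, 5], [5, 4, 2]])   # example payoffs, as reconstructed above

def labels_for_sigma(sigma, B, tol=1e-9):
    """Return (Y, D) with sigma in S_Y(D): the pure strategies of agent 1 given
    zero probability, and agent 2's best responses to sigma (0-based indices)."""
    payoffs_2 = B.T @ sigma
    Y = {i for i, s in enumerate(sigma) if s <= tol}
    D = {j for j, p in enumerate(payoffs_2) if p >= payoffs_2.max() - tol}
    return Y, D

# sigma* from the example: s3 is unused and both t1 and t3 are best responses.
print(labels_for_sigma(np.array([1/3, 2/3, 0.0]), B))   # ({2}, {0, 2})
```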
The set of Nash equilibria is the union of the cartesian products SY (D) × TZ (C) over all quadruples (Y, D, Z, C) with Y ∪ C = {1, . . . , m} and Z ∪ D = {1, . . . , n}. The general position assumption implies that if such a product is nonempty, then |Y | + |C| = m and |Z| + |D| = n, so that Y and C are disjoint, as are Z and D, and SY (D) × TZ (C) is zero dimensional, i.e., a singleton. Thus the general position assumption implies that there are finitely many equilibria. In addition, we now have [ M= SY (D) × TZ (C) (∗) where the union is over all quadruples (Y, D, Z, C) such that: 42 CHAPTER 3. COMPUTING FIXED POINTS (a) Y and Z are proper subsets of {1, . . . , m} and {1, . . . , n}; (b) C and D are nonempty subsets of {1, . . . , m} and {1, . . . , n}; (c) {2, . . . , m} ⊂ Y ∪ C; (d) {1, . . . , n} = Z ∪ D; (e) SY (D) and TZ (C) are nonempty. A quadruple (Y, D, Z, C) satisfying these conditions is said to be qualified. A vertex quadruple is a qualified quadruple (Y, D, Z, C) such that SY (D) × TZ (C) is 0-dimensional. It is the starting point of the algorithm if Y = {2, . . . , m}, and it is a Nash equilibrium if 1 ∈ Y ∪ C. An edge quadruple is a qualified quadruple (Y, D, Z, C) such that SY (D) × TZ (C) is 1-dimensional. A vertex quadruple (Y ′ , D ′ , Z ′ , C ′) is an endpoint of this edge quadruple if Y ⊂ Y ′ , D ⊂ D ′ , Z ⊂ Z ′ , and C ⊂ C ′ . It is easy to see that the edge quadruple has two endpoints: if SY (D) is 1-dimensional, then it has two endpoints SY ′ (D ′ ) and SY ′′ (D ′′ ), in which case (Y ′ , D ′ , Z, C) and (Y ′′ , D ′′, Z, C) are the two endpoints of (Y, C, Z, D), and similarly if TZ (C) is q-dimensional. Evidently M is a graph. The picture we would like to establish is that it is a union of loops, paths whose endpoints are the Nash equilibria and the starting point of the algorithm, and possibly an isolated point if the starting point of the algorithm happens to be a Nash equilibrium. If this is the case we can find a Nash equilibrium by following the path leading away from the starting point until we reach its other endpoint, which is necessarily a Nash equilibrium. Put another way, we would like to show that a vertex quadruple is an endpoint of zero, one, or two edge quadruples, and: (i) if it is an endpoint of no edge quadruples, then it is both the starting point of the algorithm and a Nash equilibrium; (ii) if it is an endpoint of one edge quadruple, then it is either the starting point of the algorithm, but not a Nash equilibrium, or a Nash equilibrium, but not the starting point of the algorithm; (iii) if it is an endpoint of two edge quadruples, then it is neither the starting point of the algorithm nor a Nash equilibrium. So, suppose that (Y, D, Z, C) is a vertex quadruple. There are two main cases to consider, the first of which is that it is a Nash equilibrium, so that 1 ∈ Y ∪ C. If 1 ∈ Y , then (Y \ {1}, D, Z, C) is the only quadruple that could be an edge quadruple that has (Y, D, Z, C) as an endpoint, and it is in fact such a quadruple: (a)-(d) hold obviously, and SY \{1} (D) is nonempty because SY (D) is a nonempty subset. If 1 ∈ C, then (Y \ {1}, D, Z, C) is the only quadruple that could be an edge quadruple that has (Y, D, Z, C) as an endpoint, and the same logic shows that it is except when C = {1}, in which case Y = {2, . . . , m}, i.e., (Y, D, Z, C) is the starting point of the algorithm. Summarizing, if (Y, D, Z, C) is a Nash equilibrium vertex quadruple, it is an endpoint of precisely one edge quadruple except when it 3.1. 
THE LEMKE-HOWSON ALGORITHM 43 is the starting point of the algorithm, in which case it is not an endpoint of any edge quadruple. Now suppose that (Y, D, Z, C) is not a Nash equilibrium. Since SY (D) and TZ (C) are 0-dimensional, |D| + |Y | = m and |C| + |Z| = n, so, in view of (e), one of the two intersections Y ∩ C and Z ∩ D is a singleton while the other is empty. First suppose that Z ∩ D = {j}. Then (Y, D, Z \ {j}, C) and (Y, D \ {j}, Z, C) are the only quadruples that might be edge quadruples that have (Y, D, Z, C) as an endpoint, and in fact both are: again (a)-(d) hold obviously (except that one must note that |D| ≥ 2 because |Z ∪ D| = n, |Z| < n, and |Z ∩ D| = 1) and SY (D \ {j}) and TZ\{j} (C) are both nonempty because SY (D) and TZ (C) are nonempty subsets. On the other hand, if Y ∩ C = {i}, then (Y \ {i}, D, Z, C) and (Y, D, Z, C \ {i}) are the only quadruples that might be edge quadruples that have (Y, D, Z, C) as an endpoint. By the logic above, (Y \ {i}, D, Z, C) certainly is, and (Y, D, Z, C \ {i}) is if C 6= {i}, and not otherwise. When C = {i} we have Y ∪ C = {2, . . . , m} and Y ∩ C = {i} = C, so Y = {2, . . . , m}, which is to say that (Y, D, Z, C) is the starting point of the algorithm. In sum, if (Y, D, Z, C) is not a Nash equilibrium, it is an endpoint of precisely two edge quadruples except when it is the starting point of the algorithm, in which case is an endpoint of precisely one edge quadruple. Taken together, these observations verify (i)-(iii), and complete the formal verification of the main properties of the Lemke-Howson algorithm. Two aspects of the procedure are worth noting. First, when SY (D) × TZ (C) is a vertex that is an endpoint of two edges, the two edges are either SY \{i} (D) × TZ (C) and SY (D) × TZ (C \ {i}) for some i or SY (D) × TZ\{j} (C) and SY (D \ {j}) × TZ (C) for some j. In both cases one of the edges is the cartesian product of a line segment in S and a point in T while the other is the cartesian product of a point in S and a line segment in T . Geometrically, the algorithm alternates between motion in S and motion in T . Second, although our discussion has singled out the first pure strategy of agent 1, this was arbitrary, and any pure strategy of either player could be designated for this role. It is quite possible that different choices will lead to different equilibria. In addition, although the algorithm was described in terms of starting at this pure strategy and its best response, the path following procedure can be started at any endpoint of a path in M. In particular, having computed a Nash equilibrium using one designated pure strategy, we can then switch to a different designated pure strategy and follow the path, for the new designated pure strategy, going away from the equilibrium. This path may go to the starting point of the algorithm for the new designated pure strategy, but it is also quite possible that it leads to a Nash equilibrium that cannot be reached directly by the algorithm using any designated pure strategy. Equilibria that can be reached by repeated applications of this maneuver are said to be accessible. A famous example due to Robert Wilson (reported in Shapley (1974))) shows that there can be inaccessible equilibria even in games with a surprisingly small number of pure strategies. 44 3.2 CHAPTER 3. 
COMPUTING FIXED POINTS Implementation and Degeneracy Resolution We have described the Lemke-Howson algorithm geometrically, in terms that a human can picture, but that it not quite the same thing as providing a description in terms of concrete, fully elaborated, algebraic operations. This section provides such a description. In addition, our discussion to this point has assumed a game in Lemke-Howson general position. In order to prove that any game has a Nash equilibrium it suffices to show that games in general position are dense in the set of pairs (A, B) of m × n matrices, because it is easy to see that if (Ar , B r ) is a sequence converging to (A, B), and for each r we have a Nash equilibrium (σ r , τ r ) of (the game with payoff matrices) (Ar , B r ), then along some subsequence we have (σ r τ r ) → (σ, τ ), and (σ, τ ) is a Nash equilibrium of (A, B). However, we will do something quite a bit more elegant and useful, providing a refinement of the LemkeHowson algorithm that works even for games that are not in Lemke-Howson general position. The formulation of the Nash equilibrium problem we have been working with so far is a matter of finding u∗ , v ∗ ∈ R, s′ , σ ′ ∈ Rm , and t′ , τ ′ ∈ Rn such that: Aτ ′ + s′ = u∗ em , B T σ ′ + t′ = v ∗ en , hs′ , σ ′ i = 0 = ht′ , τ ′ i, hσ ′ , em i = 1 = hτ ′ , en i, s′ , σ ′ ≥ 0 ∈ Rm , t′ , τ ′ ≥ 0 ∈ Rn . The set of Nash equilibria is unaffected if we add a constant to every entry in a column of A, or to every entry of a row of B. Therefore we may assume that all the entries of A and B are positive, and will do so henceforth. Now the equilibrium utilities u∗ and v ∗ are necessarily positive, so we can divide in the system above, obtaining the system Aτ + s = em , B T σ + t = en , hs, σi = 0 = ht, τ i, s, σ ≥ 0 ∈ Rm , t, τ ≥ 0 ∈ Rn together with the formulas hσ, em i = 1/v ∗ and hτ, en i = 1/u∗ for computing equilibrium expected payoffs. The components of s and t are called slack variables. This new system is not quite equivalent to the one above because the one above in effect requires that σ and τ each have some positive components. The new system has another solution that does not come from a Nash equilibrium, namely σ = 0, τ = 0, s = em , and t = en . It is called the extraneous solution. To see that this is the only new solution consider that if σ = 0, then t = en , so that ht, τ i = 0 implies τ = 0, and similarly τ = 0 implies that σ = 0. We now wish to see the geometry of the Lemke-Howson algorithm in the new coordinate system. Let S ∗ = { σ ∈ Rm : σ ≥ 0 and B T σ ≤ en } and T ∗ = { τ ∈ Rn : τ ≥ 0 and Aτ ≤ em }. P There is a bijection σ 7→ σ/ i σi between the points on the upper surface of S ∗ , namely those for which some component of en − B T σ is zero, and the points of S, and similarly for T ∗ and T . For the game studied in the last section the polytopes S ∗ and T ∗ are shown in Figure 3.3 below. Note that the best response regions in Figure 3.1 have become facets. 45 3.2. IMPLEMENTATION AND DEGENERACY RESOLUTION τ2 σ2 S∗ T∗ t3 s3 t1 τ3 s1 σ3 s2 t2 σ1 τ1 Figure 3.3 We now transport the Lemke-Howson algorithm to this framework. Let M ∗ be the set of (σ, τ ) ∈ S ∗ × T ∗ such that, when we set s = em − Aτ and t = en − B T σ, we have (a) for each i = 2, . . . , m, either σi = 0 or si = 0; (b) for each j = 1, . . . , n, either τj = 0 or tj = 0. For our running example we can follow a path in M ∗ from (0, 0) to the image of the Nash equilibrium, as shown in Figure 3.4. 
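The membership test defining M∗ is easy to state computationally. The following is a minimal sketch (Python with numpy; the function name and the numerical tolerance are ours, and the test is meant only to make the conditions concrete, not to describe any particular implementation). Here σ and τ are the unnormalized variables, i.e., candidate points of S∗ and T∗.

```python
import numpy as np

def in_M_star(A, B, sigma, tau, tol=1e-9):
    """Check whether (sigma, tau) lies in the set M* described above.

    A and B are the m x n payoff matrices (assumed to have positive entries),
    sigma is a candidate point of S* and tau a candidate point of T*.
    """
    m, n = A.shape
    s = np.ones(m) - A @ tau          # slack in  A tau + s = e_m
    t = np.ones(n) - B.T @ sigma      # slack in  B^T sigma + t = e_n
    # membership in the polytopes S* and T* (all four vectors nonnegative)
    if (sigma < -tol).any() or (tau < -tol).any() or (s < -tol).any() or (t < -tol).any():
        return False
    # (a) for i = 2, ..., m either sigma_i = 0 or s_i = 0
    cond_a = all(min(sigma[i], s[i]) <= tol for i in range(1, m))
    # (b) for j = 1, ..., n either tau_j = 0 or t_j = 0
    cond_b = all(min(tau[j], t[j]) <= tol for j in range(n))
    return cond_a and cond_b
```

A point of M∗ other than (0, 0) at which the remaining condition (σ1 = 0 or s1 = 0) also holds corresponds to a Nash equilibrium, recovered by rescaling σ and τ so that their components sum to one.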
This path has a couple more edges than the one in Figure 3.2, but there is the advantage of starting at (0, 0), which is a bit more canonical. If we set ℓ = m + n,

C = [ 0    A ]
    [ B^T  0 ] ,   q = eℓ ,   y = (σ, τ),   and   x = (s, t),

the system above is equivalent to

Cy + x = q,   ⟨x, y⟩ = 0,   x, y ≥ 0 ∈ Rℓ.   (∗)

This is called the linear complementarity problem. It arises in a variety of other settings, and is very extensively studied. The framework of the linear complementarity problem is simpler conceptually and notationally, and it allows somewhat greater generality, so we will work with it for the remainder of this section.

Let P = { (x, y) ∈ Rℓ × Rℓ : x ≥ 0, y ≥ 0, and Cy + x = q }. We will assume that all the components of q are positive, that all the entries of C are nonnegative, and that each row of C has at least one positive entry, so that P is bounded and thus a polytope. In general a d-dimensional polytope is said to be simple if each of its vertices is in exactly d facets. The condition that generalizes the general position assumption on A and B is that P is simple. Let the projection of P onto the second copy of Rℓ be Q = { y ∈ Rℓ : y ≥ 0 and Cy ≤ q }. If C = [0 A; B^T 0] and q = eℓ , then Q = S∗ × T∗, and each edge of Q is either the cartesian product of a vertex of S∗ and an edge of T∗ or the cartesian product of an edge of S∗ and a vertex of T∗.

Figure 3.4

Our problem is to find an (x, y) ∈ P with (x, y) ≠ (q, 0) satisfying the "complementary slackness condition" ⟨x, y⟩ = 0. The algorithm follows the path starting at (x, y) = (q, 0) in

M∗∗ = { (x, y) ∈ P : x2 y2 + · · · + xℓ yℓ = 0 }.

The equation x2 y2 + · · · + xℓ yℓ = 0 encodes the condition that for each j = 2, . . . , ℓ, either xj = 0 or yj = 0. Suppose we are at a vertex (x, y) of P satisfying this condition, but not x1 y1 = 0. Since P is simple, exactly ℓ of the variables x2 , . . . , xℓ , y2 , . . . , yℓ vanish, so there is some i such that xi = 0 = yi. The portion of P where xi ≥ 0 and the other ℓ − 1 variables vanish is an edge of P whose other endpoint is the first point where one of the ℓ variables that are positive at (x, y) vanishes. Again, since P is simple, precisely one of those variables vanishes there.

How should we describe moving from one vertex to the next algebraically? Consider specifically the move away from (q, 0). Observe that P is the graph of the function y ↦ q − Cy from Q to Rℓ. We explicitly write out the system of equations describing this function:

x1 = q1 − c11 y1 − · · · − c1ℓ yℓ ,
 ⋮
xi = qi − ci1 y1 − · · · − ciℓ yℓ ,
 ⋮
xℓ = qℓ − cℓ1 y1 − · · · − cℓℓ yℓ .

As we increase y1 , holding 0 = y2 = · · · = yℓ , the constraint we bump into first is the one requiring xi ≥ 0 for the i for which qi /ci1 is minimal. If i = 1, then the point we arrived at is a solution and the algorithm halts, so we may suppose that i ≥ 2. We now want to describe P as the graph of a function with domain in the xi , y2 , . . . , yℓ coordinate subspace, and x1 , . . . , xi−1 , y1 , xi+1 , . . . , xℓ as the variables parameterizing the range. To this end we rewrite the ith equation as

y1 = (1/ci1) qi − (1/ci1) xi − (ci2 /ci1) y2 − · · · − (ciℓ /ci1) yℓ .

Replacing the first equation above with this, and substituting it into the other equations, gives

x1 = q1 − (c11 /ci1) qi − (−c11 /ci1) xi − (c12 − c11 ci2 /ci1) y2 − · · · − (c1ℓ − c11 ciℓ /ci1) yℓ ,
 ⋮
y1 = (1/ci1) qi − (1/ci1) xi − (ci2 /ci1) y2 − · · · − (ciℓ /ci1) yℓ ,
 ⋮
xℓ = qℓ − (cℓ1 /ci1) qi − (−cℓ1 /ci1) xi − (cℓ2 − cℓ1 ci2 /ci1) y2 − · · · − (cℓℓ − cℓ1 ciℓ /ci1) yℓ .

This is not exactly a thing of beauty, but it evidently has the same form as what we started with. The data of the algorithm consists of a tableau [q′, C′], a list describing how the rows and the last ℓ columns of the tableau correspond to the original variables of the problem, and the variable that vanished when we arrived at the corresponding vertex. If this variable is either x1 or y1 we are done. Otherwise the data is updated by letting the variable that is complementary to this one increase, finding the next variable that will vanish when we do so, then updating the list and the tableau appropriately. This process is called pivoting.

We can now describe how the algorithm works in the degenerate case when P is not necessarily simple. From a conceptual point of view, our method of handling degenerate problems is to deform them slightly, so that they become nondegenerate, but in the end we will have only a combinatoric rule for choosing the next pivot variable. Let L = { (x, y) ∈ Rℓ × Rℓ : Cy + x = q }, let α1 , . . . , αℓ , β1 , . . . , βℓ be distinct positive integers, and for ε > 0 let

Pε = { (x, y) ∈ L : xi ≥ −ε^αi and yi ≥ −ε^βi for all i = 1, . . . , ℓ }.

If (x, y) is a vertex of Pε , then there are ℓ variables, which we will describe as "free variables," whose corresponding equations xi = −ε^αi and yi = −ε^βi determine (x, y) as the unique member of L satisfying them. At the point in L where these equations are satisfied, the other variables can be written as affine functions of the free variables, and thus as polynomial functions of ε. Because the αi and βi are all different, there are only finitely many values of ε such that any of the other variables vanish at this vertex. Because there are finitely many ℓ-element subsets of the 2ℓ variables, it follows that Pε is simple for all but finitely many values of ε. In particular, for all ε in some interval (0, ε̄) the combinatoric structure of Pε will be independent of ε. In addition, we do not actually need to work in Pε because the pivoting procedure, applied to the polytope Pε for such ε, will follow a well defined path that can be described in terms of a combinatoric procedure for choosing the next pivot variable.

To see what we mean by this, consider the problem of finding which xi first goes below −ε^αi as we go out along the ray y1 ≥ −ε^β1 , y2 = −ε^β2 , . . . , yℓ = −ε^βℓ . This is basically a process of elimination. If ci1 ≤ 0, then increasing y1 never leads to a violation of the ith constraint, so we can begin by eliminating all those i for which ci1 is not positive. Among the remaining i, the problem is to find the i for which

(1/ci1) qi + (1/ci1) ε^αi + (ci2 /ci1) ε^β2 + · · · + (ciℓ /ci1) ε^βℓ

is smallest for small ε > 0. The next step is to eliminate all i for which qi /ci1 is not minimal. For each i that remains the expression

(1/ci1) ε^αi + (ci2 /ci1) ε^β2 + · · · + (ciℓ /ci1) ε^βℓ

has a dominant term, namely the term, among those with nonzero coefficients, whose exponent is smallest.
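Concretely, a single pivot of the tableau [q, C] can be sketched as follows (Python, with exact arithmetic via Fraction; the function names and bookkeeping conventions are ours, and the ratio test shown assumes the nondegenerate case in which the minimizing row is unique). The ordering of dominant terms described next is what replaces this simple ratio test when there are ties.

```python
from fractions import Fraction

def ratio_test(q, C, k):
    """Row index of the first constraint violated as the column-k variable increases:
    the minimizer of q[r] / C[r][k] over rows with C[r][k] > 0 (assumed unique)."""
    rows = [r for r in range(len(q)) if C[r][k] > 0]
    return min(rows, key=lambda r: q[r] / C[r][k])

def pivot(q, C, i, k):
    """One pivot of the tableau: the column-k variable enters and the row-i variable
    leaves.  The system  (row variable r) = q[r] - sum_j C[r][j] * (column variable j)
    is rewritten in the same form, with the old row-i and column-k variables
    exchanging roles; which original variable carries each row/column label is
    bookkeeping left to the caller.  Entries are assumed to be Fractions."""
    ell = len(q)
    piv = C[i][k]
    new_qi = q[i] / piv
    new_row = [C[i][j] / piv for j in range(ell)]
    new_row[k] = Fraction(1) / piv        # coefficient of the leaving variable
    for r in range(ell):
        if r == i:
            continue
        f = C[r][k]
        q[r] -= f * new_qi
        for j in range(ell):
            if j != k:
                C[r][j] -= f * new_row[j]
        C[r][k] = -f / piv
    q[i], C[i] = new_qi, new_row
```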
The dominant terms are ordered according to their values for small ε > 0: (a) terms with positive coefficients are greater than terms with negative coefficients; (b) among terms with positive coefficients, those with smaller exponents are greater than terms with larger exponents, and if two terms have equal exponents they are ordered according to the coefficients; 3.3. USING GAMES TO FIND FIXED POINTS 49 (c) among terms with negative coefficients, those with larger exponents are greater than terms with smaller exponents, and if two terms have equal exponents they are ordered according to the coefficients. We now eliminate all i for which the dominant term is not minimal. All remaining i have the same dominant term, and we continue by subtracting off this term and comparing the resulting expressions in a similar manner, repeating until only one i remains. This process does necessarily continue until only one i remains, because if other terms of the expressions above fail to distinguish between two possibilities, eventually there will be a comparison involving the terms εαi /ci1 , and the exponents α1 , . . . , αℓ , β1 , . . . , βℓ are distinct. Let’s review the situation. We have given an algorithm that finds a solution of the linear complementarity problem (∗) that is different from (q, 0). The assumptions that insure that the algorithm works are that q ≥ 0 and that P is a polytope. In particular, these assumptions are satisfied when the linear complementarity problem is derived from a two person game with positive payoffs, in which case any solution other than (q, 0) corresponds to a Nash equilibrium. Therefore any two person game with positive payoffs has a Nash equilibrium, but since the equilibrium conditions are unaffected by adding a constant to a player’s payoffs, in fact we have now shown that any two person game has a Nash equilibrium. There are additional issues that arise in connection with implementing the algorithm, since computers cannot do exact arithmetic on arbitrary real numbers. One possibility is to require that the entries of q and C lie in a set of numbers for which exact arithmetic is possible—usually the rationals, but there are other possibilities, at least theoretically. Alternatively, one may work with floating point numbers, which is more practical, but also more demanding because there are issues associated with round-off error, and in particular its accumulation as the number of pivots increases. The sort of pivoting we have studied here also underlies the simplex algorithm for linear programming, and the same sorts of ideas are applied to resolve degeneracy. Numerical analysis for linear programming has a huge amount of theory, much of which is applicable to the Lemke-Howson algorithm, but it is far beyond our scope. 3.3 Using Games to Find Fixed Points It is surprisingly easy to use the existence of equilibrium in two person games to prove Kakutani’s fixed point theorem in full generality. The key idea has a simple description. Fix a nonempty compact convex X ⊂ Rd , and let F : X → X be a (not necessarily convex valued or upper semicontinuous) correspondence with compact values. We can define a two person game with strategy sets S = T = X by setting ( 0, s 6= t, u(s, t) = − min ks − xk2 and v(s, t) = x∈F (t) 1, s = t. If (s, t) is a Nash equilibrium, then s ∈ F (t) and t = s, so s = t is a fixed point. Conversely, if x is a fixed point, then (x, x) is a Nash equilibrium. 50 CHAPTER 3. 
Of course this observation does not prove anything, but it does point in a useful direction. Let x1 , . . . , xn , y1 , . . . , yn ∈ X be given. We can define a finite two person game with n × n payoff matrices A = (aij) and B = (bij) by setting

aij = −‖xi − yj‖²   and   bij = 1 if i = j, bij = 0 if i ≠ j.

Let (σ, τ) ∈ ∆n−1 × ∆n−1 be a mixed strategy profile. Clearly τ is a best response to σ if and only if it assigns all probability to the strategies that are assigned maximum probability by σ, which is to say that τj > 0 implies that σj ≥ σi for all i. Understanding when σ is a best response to τ requires a brief calculation. Let z = Σj τj yj . For each i we have

Σj aij τj = −Σj τj ‖xi − yj‖² = −Σj τj ⟨xi − yj , xi − yj⟩
          = −Σj τj ⟨xi , xi⟩ + 2 Σj τj ⟨xi , yj⟩ − Σj τj ⟨yj , yj⟩
          = −⟨xi , xi⟩ + 2⟨xi , z⟩ − ⟨z, z⟩ + C = −‖xi − z‖² + C,

where C = ‖z‖² − Σj τj ‖yj‖² is a quantity that does not depend on i. Therefore σ is a best response to τ if and only if it assigns all probability to those i with xi as close to z as possible.

If y1 ∈ F (x1), . . . , yn ∈ F (xn), then there is a sense in which a Nash equilibrium may be regarded as a "point that is approximately fixed." We are going to make this precise, thereby proving Kakutani's fixed point theorem. Assume now that F is upper semicontinuous with convex values. Define sequences x1 , x2 , . . . and y1 , y2 , . . . inductively as follows. Choose x1 arbitrarily, and let y1 be an element of F (x1). Supposing that x1 , . . . , xn and y1 , . . . , yn have already been determined, let (σn, τn) be a Nash equilibrium of the two person game with payoff matrices An = (anij) and Bn = (bnij), where anij = −‖xi − yj‖² and bnij is 1 if i = j and 0 otherwise. Let xn+1 = Σj τjn yj , and choose yn+1 ∈ F (xn+1).

Let x∗ be an accumulation point of the sequence {xn}. To show that x∗ is a fixed point of F it suffices to show that it is an element of the closure of any convex neighborhood V of F (x∗). Choose δ > 0 such that F (x) ⊂ V for all x ∈ Uδ (x∗). Consider an n such that xn+1 = Σj τjn yj ∈ Uδ/3 (x∗) and at least one of x1 , . . . , xn is also in this ball. Then the points in x1 , . . . , xn that are closest to xn+1 are in U2δ/3 (xn+1) ⊂ Uδ (x∗). If τjn > 0, then σjn is maximal and hence positive, so xj is one of these closest points and therefore yj ∈ F (xj) ⊂ V. Thus xn+1 is a convex combination of points in V, and is therefore in V. Therefore x∗ is in the closure of the set of xn that lie in V, and thus in the closure of V.

In addition to proving the Kakutani fixed point theorem, we have accumulated all the components of an algorithm for computing points that are approximately fixed for a continuous function f : X → X. Specifically, for any error tolerance ε > 0 we compute the sequences x1 , x2 , . . . and y1 , y2 , . . . with f in place of F, halting when ‖xn+1 − f (xn+1)‖ < ε. The argument above shows that this is, in fact, an algorithm, in the sense that it is guaranteed to halt eventually. This algorithm is quite new. Code implementing it exists, and the initial impression is that it performs quite well. But it has not been extensively tested.
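The best response characterization above, and the identity Σj aij τj = −‖xi − z‖² + C on which it rests, are easy to check numerically. The following is a minimal sketch (Python with numpy; the function name, the tolerance, and the random data are ours) of the test that accepts a candidate equilibrium of the derived game, together with a numerical verification of the identity.

```python
import numpy as np

def is_aux_equilibrium(x_pts, y_pts, sigma, tau, tol=1e-9):
    """Test the characterization derived above: tau must put all probability on the
    indices where sigma is maximal, and sigma must put all probability on the x_i
    closest to z = sum_j tau_j y_j.  x_pts and y_pts are arrays of shape (n, d)."""
    z = tau @ y_pts
    dist = np.linalg.norm(x_pts - z, axis=1)
    tau_ok = np.all(sigma[tau > tol] >= sigma.max() - tol)
    sigma_ok = np.all(dist[sigma > tol] <= dist.min() + tol)
    return tau_ok and sigma_ok

# sanity check of the identity  sum_j a_ij tau_j = -||x_i - z||^2 + C  on random data
rng = np.random.default_rng(0)
n, d = 5, 3
x_pts, y_pts = rng.random((n, d)), rng.random((n, d))
tau = rng.random(n); tau /= tau.sum()
A = -np.array([[np.sum((x_pts[i] - y_pts[j]) ** 2) for j in range(n)] for i in range(n)])
z = tau @ y_pts
C = z @ z - tau @ np.sum(y_pts ** 2, axis=1)
assert np.allclose(A @ tau, -np.sum((x_pts - z) ** 2, axis=1) + C)
```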
There is one more idea that may have some algorithmic interest. As before, we consider points x1 , . . . , xn , y1 , . . . , yn ∈ Rd. Define a correspondence Φ : Rd → Rd by letting Φ(z) be the convex hull of { yj : j ∈ argmini ‖z − xi‖ }. (Evidently this construction is closely related to the Voronoi diagram determined by x1 , . . . , xn . Recall that this is the polyhedral decomposition of Rd whose nonempty polyhedra are the sets PJ = { z ∈ Rd : J ⊂ argmini ‖z − xi‖ } where ∅ ≠ J ⊂ {1, . . . , n}.) Clearly Φ is upper semicontinuous and convex valued. Suppose that z is a fixed point of this correspondence. Then z is a convex combination Σj τj yj with τj = 0 if j ∉ argmini ‖z − xi‖. Let J = { j : τj > 0 }. If σi = 1/|J| when i ∈ J and σi = 0 when i ∉ J, then (σ, τ) is a Nash equilibrium of the game derived from x1 , . . . , xn , y1 , . . . , yn . Conversely, if (σ, τ) is a Nash equilibrium of this game, then Σj τj yj is a fixed point of Φ. In a sense, the algorithm described above approximates the given correspondence F with a correspondence of a particularly simple type.

We may project the path of the Lemke-Howson algorithm, in its application to the game derived from x1 , . . . , xn , y1 , . . . , yn , into this setting. Define Φ1 : Rd → Rd by letting Φ1 (z) be the convex hull of { yi : i ∈ {1} ∪ argmini ‖z − xi‖ }. Suppose that (σ, τ) is an element of the set M defined in Section 3.1, so that all the conditions of Nash equilibrium are satisfied except that it may be the case that σ1 > 0 even if the first pure strategy is not optimal. Let J = { j : τj > 0 }, and let z = Σj τj yj . Then J ⊂ { i : σi > 0 } ⊂ {1} ∪ argminj ‖z − xj‖, so z ∈ Φ1 (z). Conversely, suppose z is a fixed point of Φ1 , and let J = argminj ‖z − xj‖. Then z = Σj τj yj for some τ ∈ ∆n−1 with τj = 0 for all j ∉ {1} ∪ J. If we let σ be the element of ∆n−1 such that σi = 1/|{1} ∪ J| if i ∈ {1} ∪ J and σi = 0 otherwise, then (σ, τ) ∈ M.

If n is large one might guess that there is a sense in which operating in Rd might be less burdensome than working in ∆n−1 × ∆n−1 , but it seems to be difficult to devise algorithms that take concrete advantage of this. Nonetheless this setup does give a picture of what the Lemke-Howson algorithm is doing that has interesting implications. For example, if there is no point in Rd that is equidistant from more than d + 1 of the points x1 , . . . , xn , then there is no point (σ, τ) ∈ M with σi > 0 for more than d + 2 indices. This gives a useful upper bound on the number of pivots of the Lemke-Howson algorithm.

3.4 Sperner's Lemma

Sperner's lemma is the traditional method of proving Brouwer's fixed point theorem without developing the machinery of algebraic topology. It dates from the late 1920's, which was a period during which the methods developed by Poincaré and Brouwer were being recast in algebraic terms. Most of our work will take place in ∆d−1 . Let P be a triangulation of ∆d−1 . For k = 0, . . . , d − 1 let P k be the set of k-dimensional elements of P. Let V = P 0 be the set of vertices of P, and fix a function ℓ : V → {1, . . . , d}. We say that ℓ is a labelling for P, and we call ℓ(v) the label of v. If ℓ(v) ≠ i for all v ∈ V with vi = 0, then ℓ is a Sperner labelling. Let e1 , . . . , ed be the standard unit basis vectors of Rd. Then ℓ is a Sperner labelling if ℓ(v) ∈ { i1 , . . . , ik } whenever v is contained in the convex hull of ei1 , . . . , eik . We say that σ ∈ P d−1 with vertex set {v1 , . . . , vd} is completely labelled if {ℓ(v1), . . . , ℓ(vd)} = {1, . . . , d}.

Figure 3.5

Theorem 3.4.1 (Sperner's Lemma). If ℓ is a Sperner labelling, then the number of completely labelled simplices is odd.

Before proving this, let's see why it's important:

Proof of Brouwer's Theorem.
Let f : ∆d−1 → ∆d−1 be a continuous function. Proposition 2.5.2 implies that there is a sequence P1 , P2 , . . . of triangulations whose meshes converge to zero. For each r = 1, 2, . . . let V r be the set of vertices of Pr . If any of the elements of V r is a fixed point we are done, and otherwise we can define ℓr : V r → {0, . . . , d} by letting ℓr (v) be the smallest index i such that vi > fi (v). Evidently ℓr is a Sperner labelling, so there is a completely labelled simplex with vertices v1r , . . . , vdr where ℓr (vir ) = i. Passing to a subsequence, we may assume that the sequences vi1 , vi2 , . . . have a common limit x. For each i we have fi (x) = lim fi (v r ) ≤ lim vir = xi , and P i fi (x) = 1 = P i xi , so f (x) = x. We will give two proofs of Sperner’s lemma. The first of these uses facts about volume, and in this sense is less elementary than the second (which is given in the next section) but it quickly gives both an intuition for why the result is true and an important refinement. We fix an affine isometry1 A : H d−1 → Rd−1 such that D = det A(e2 ) − A(e1 ), . . . , A(ed ) − A(e1 ) > 0. 1 If (X, dX ) and (Y, dY ) are metric spaces, a function ι : X → Y is an isometry if dY (ι(x), ι(x′ )) = dX (x, x′ ) for all x, x′ ∈ X. 53 3.4. SPERNER’S LEMMA (We regard the determinant as a function of (d − 1)-tuples of elements of Rd−1 be identifying the tuple with the matrix with those columns.) A theorem of Euclid is that the volume of a pyramid is one third of the product of the height and the area of the base. The straightforward2 generalization of this to arbitrary dimensions implies that d!1 D is the volume of ∆d−1 . For each v ∈ V there is an associated function v : [0, 1] → ∆d−1 given by v(t) = (1 − t)v + teℓ(v) . Consider a simplex σ ∈ P d−1 that is the convex hull of v1 , . . . , vd ∈ V , where these vertices are indexed in such a way that det A(v2 ) − A(v1 ), . . . , A(vd ) − A(v1 ) > 0. We define a function pσ : [0, 1] → R by setting pσ (t) = 1 det A(v2 (t)) − A(v1 (t)), . . . , A(vd (t)) − A(v1 (t)) . d! For 0 ≤ t ≤ 1 let σ(t) be the convex hull of v1 (t), . . . , vd (t). Then pσ (t) is the volume of σ(t) when t is small. We have 1 pσ (1) = det A(eℓ(v2 ) ) − A(eℓ(v1 ) ), . . . , A(eℓ(vd ) ) − A(eℓ(v1 ) ) . d! If σ is not completely labelled, then pσ (1) = 0 because some A(eℓ(vi ) ) − A(eℓ(v1 ) ) is zero or two of them are equal. If σ is completely labelled, then we say that the labelling is orientation preserving on σ if pσ (1) > 0, in which case pσ (1) = d!1 D, and orientation reversing on σ if pσ (1) < 0, in which case pσ (1) = − d!1 D. Let p : [0, 1] → R be the sum X p(t) = pσ (t). σ∈P d−1 Elementary properties of the determinant imply that each pσ and p are polynomial functions. For sufficiently small t the simplices σ(t) are the (d − 1)-dimensional simplices of a triangulation of ∆d−1 .3 Therefore p(t) is d!1 D for small t. Since p is a 2 Actually, it is straightforward if you know integration, but Gauss regarded this as “too heavy” a tool, expressing a wish for a more elementary theory of the volume of polytopes. The third of Hilbert’s famous problems asks whether it is possible, for any two polytopes of equal volume, to triangulate the first in such a way that the pieces can be reassembled to give the second. This was resolved negatively by Hilbert’s student Max Dehn within a year of Hilbert’s lecture laying out the problems, and it remains the case today that there is no truly elementary theory of the volumes of polytopes. 
In line with this, our discussion presumes basic facts about d-dimensional measure of polytopes in Rd that are very well understood by people with no formal mathematical training, but which cannot be justified formally without appealing to relatively advanced theories of measure and integration. 3 This is visually obvious, and a formal proof would be tedious, so we provide only a sketch. Suppose that for each v ∈ V we have a path connected neighborhood Uv of v in the interior of the smallest face of ∆d−1 containing v, and this system of neighborhoods satisfies the condition that for any simplex in P, say with vertices v1 , . . . , vk , if v1′ ∈ Uv1 , . . . , vk′ ∈ Uvk , then v1′ , . . . , vk′ are affinely independent. We claim that a simplicial complex obtained by replacing each v with some element of Uv is a triangulation of ∆d−1 ; note that this can be proved by moving one vertex at a time along a path. Finally observe that because ℓ is a Sperner labelling, for each v and 0 ≤ t < 1, v(t) is contained in the interior of the smallest face of ∆d−1 containing v. 54 CHAPTER 3. COMPUTING FIXED POINTS polynomial function of t, it follows that it is constant, and in particular p(1) = We have established the following refinement of Sperner’s lemma: 1 D. d! Theorem 3.4.2. If ℓ is a Sperner labelling, then the number of σ ∈ P d−1 such that ℓ is orientation preserving on σ is one greater than the number of σ ∈ P d−1 such that ℓ is orientation reversing on σ. One of our major themes is that fixed points where the function or correspondence reverses orientation are different from those where orientation is preserved. Much of what follows is aimed at keeping track of this difference in increasingly general settings. 3.5 The Scarf Algorithm The traditional proof of Sperner’s lemma is an induction on dimension, using path following in a graph with maximal degree two to show that if the result is true in dimension d − 2, then it is also true in dimension d − 1. In the late 1960’s and early 1970’s Herbert Scarf and his coworkers pointed out that the graphs in the various dimensions can be combined into a single graph with maximal degree two that has an obvious vertex whose degree is either zero or one. If the labelling is derived from a function f : ∆d−1 → ∆d−1 in the manner described in the proof of Brouwer’s fixed point theorem in Section 3.4, then following the path in this graph from this starting point to the other endpoint amounts to an algorithm for finding a point that is approximately fixed for f . Our exposition will follow this history, first presenting the inductive argument, then combining the graphs in the various dimensions into a single graph that supports the algorithm. As before, we are given a triangulation P of ∆d−1 and a Sperner labelling ℓ : V → {1, . . . , d} where V = P 0 = {v1 , . . . , vm } is the set of vertices. For each k = 0, . . . , d − 1 a k-dimensional simplex σ ∈ P d with vertices vi1 , . . . , vik+1 is said to be k-almost completely labelled if {1, . . . , k} ⊂ {ℓ(vi1 ), . . . , ℓ(vik+1 )}, and it is k-completely labelled if {ℓ(vi1 ), . . . , ℓ(vik+1 )} = {1, . . . , k + 1}. Note that a k-completely labelled simplex is k-almost completely labelled. What we were calling completely labelled simplices in the last section are now (d − 1)completely labelled simplices. Suppose that σ ∈ P d−1 is (d − 1)-almost completely labelled. 
If it is (d − 1)completely labelled, then it has precisely one facet that is (d−2)-completely labelled, namely the facet that does not include the vertex with label d. If σ is not (d − 1)completely labelled, then it has two vertices with the same label, and the facets opposite these vertices are its (d − 2)-completely labelled facets, so it has precisely two such facets. For k = 0, . . . , d − 2 let ∆k ⊂ ∆d−1 be the convex hull of e1 , . . . , ek+1 . If one of the (d − 2)-completely labelled facets of σ is contained in the boundary of ∆d−1 , 55 3.5. THE SCARF ALGORITHM then it must be contained in ∆d−2 because the labelling is Sperner. (Every other facet of ∆d−1 lacks one of the labels 1, . . . , d − 1.) When σ has two such facets, it is not possible that ∆d−2 contains both of them, of course, because σ is the convex hull of these facets. Suppose now that τ ∈ P d−2 is (d − 2)-completely labelled. Any element of P d−1 that has it as a facet is necessarily (d −1)-almost completely labelled. If τ intersects the interior of ∆d−1 , then it is a facet of two elements of P d−1 . On the other hand, if it is contained in the boundary of ∆d−1 , then it must be contained in ∆d−2 because ℓ is a Sperner labelling, and it is a facet of precisely one element of P d−1 . We define a graph Γd−1 = (V d−1 , E d−1 ) in which V d−1 be the set of (d−1)-almost completely labelled elements of P d−1 , by declaring that two elements of V d−1 are the endpoints of an edge in E d−1 if their intersection is a (d − 2)-completely labelled element of P d−2 . Let σ be an element of V d−1 . Our remarks above imply that if σ is (d − 1)-completely labelled, then it is an endpoint of no edges if its (d − 2)completely labelled facet is contained in ∆d−2 , and otherwise it is an endpoint of exactly one edge. On the other hand, if σ is not (d − 1)-completely labelled, then it is an endpoint of precisely on edge if one of its (d − 2)-completely labelled facets is contained in ∆d−2 , and otherwise it is an endpoint of exactly two edges. Thus Γd−1 has maximum degree two, so it is a union of isolated points, paths, and loops. The isolated points are the (d−1)-completely labelled simplices whose (d−2)completely labelled facets are contained in ∆d−2 . The endpoints of paths are the (d−1)-completely labelled simplices whose (d−2)-completely labelled facets are not contained in ∆d−2 and the (d − 1)-almost completely labelled simplices that are not completely labelled and have a (d−2)-completely labelled facet in ∆d−2 . Combining this information, we find that the sum of the number of (d − 1)-completely labelled simplices and the number of (d − 2)-completely labelled simplices contained in ∆d−2 is even, because every isolated point is associated with one element of each set, and every path has two endpoints. If there are an odd number of (d − 2)-completely labelled simplices contained in ∆d−2 , then there are necessarily an odd number of (d − 1)-completely labelled simplices. 3 b 3 b 3 b 3 1 b bb bb 1 3 b 1 1 1 b 2 bb bb bb 2 bb bb bb 1 2 b 1 bb bb bb bb bb 1 2 2 1 2 b 2 Figure 3.6 Of course for each k = 0, . . . , d − 2 the set of simplices in P that lie in ∆k 56 CHAPTER 3. COMPUTING FIXED POINTS constitute a simplicial subdivision of ∆k , and it is easy to see that the restriction of the labelling to the vertices that lie in ∆k is a Sperner labelling for that subdivision. Thus Sperner’s lemma follows from induction if we can establish it when d − 1 = 0. 
In this case ∆d−1 = ∆0 is a 0-dimensional simplex (i.e., a point) and the elements of the triangulation P are necessarily this simplex and the empty set. The simplex is 0-completely labelled, because 0 is the only available label, so the number of 0-completely labelled simplices is odd, as desired. Figure 3.6 shows the simplices in Γ2 for the labelling of Figure 3.5. In order to describe the Scarf algorithm we combine the graphs developed at each stage of the inductive process to create a single graph with a path from a known starting point to a (d − 1)-completely labelled simplex. Let V k be the set of k-almost completely labelled simplices contained in ∆k . Define a graph Γk = (V k , E k ) by specifying that two elements of V k are the endpoints of an edge in E k if their intersection is a (k − 1)-completely labelled element of P k−1 . For each k = 1, . . . , d −1, let F k be the set of unordered pairs {τ, σ} where τ ∈ V k−1 , σ ∈ V k , and τ is a facet of σ. Define a graph Γ = (V, E) by setting V = V 0 ∪ · · · ∪ V d−1 and E = E 0 ∪ F 1 ∪ E 1 ∪ · · · ∪ E d−2 ∪ F d−1 ∪ E d−1 . In our analysis above we saw that the number of neighbors of σ ∈ V k in Γk is two except that this number is reduced by one if σ has a facet in V k−1 , and it is also reduced by one if σ is k-completely labelled. If 1 ≤ k ≤ d − 2, then the first of these conditions is precisely the circumstance in which σ is an endpoint of an edge in F k , and the second is precisely the circumstance in which σ is an endpoint of an edge in F k+1. Therefore every element of V 1 ∪ · · · ∪ V d−2 has precisely two neighbors in Γ. Provided that d ≥ 1, every completely labelled simplex in V d−1 has precisely one neighbor in Γ, and every d-almost completely labelled simplex in V d−1 that is not completely labelled has two neighbors in Γ that are associated with its two (d−1)-completely labelled facets. Again provided that d ≥ 1, the unique element of V 0 has exactly one neighbor in V 1 . Thus the completely labelled elements of V d−1 and the unique element of V 0 each have one neighbor in Γ, and every other element of V has exactly two neighbors in Γ. Consequently the path in Γ that begins at the unique element of V 0 ends at a completely labelled element of V d−1 . Figure 3.7 shows the simplices in Γ for the labelling of Figure 3.5, which include points and line segments in addition to those shown in Figure 3.6. Conceptually, the Scarf algorithm is the process of following this path. An actual implementation requires a computational description of a triangulation of ∆d−1 . That is, there must be a triangulation and an algorithm such that if we are given a k-simplex in of this simplex in ∆k , the algorithm will compute the (k + 1)simplex in ∆k+1 that has the given simplex as a facet (provided that k < d − 1) and if we are given a vertex of the given simplex, the algorithm will return the other k-simplex in ∆k that shares the facet of the given simplex opposite the given vertex (provided that this facet is not contained in the boundary of ∆k ). In addition, we need an algorithm that computes the label of a given vertex; typically this would be derived from an algorithm for computing a given function f : ∆d−1 → ∆d−1 , as in the proof of Brouwer’s theorem. Given these resources, if we are at an element of 57 3.5. THE SCARF ALGORITHM V, we can compute the simplices of its neighbors in Γ and the labels of the vertices of these simplices. 
If we remember which of these neighbors we were at prior to arriving at the current element of V, then the next step in the algorithm is to go to the other neighbor. Such a step along the path of the algorithm is called a pivot. 3 b 3 b 3 b 3 1 b bb bb 1 3 b 1 1 1 b 2 bb bb bb 2 bb bb bb 1 2 bb 1 b bbb bb b bbb bbb bb b 1 2 2 1 2 b 2 Figure 3.7 At this point we remark on a few aspects of the Scarf algorithm, and later we will compare it with various alternatives. The first point is that it necessarily moves through ∆d−1 rather slowly. Consider a k-almost completely labelled simplex σ. Each pivot of the algorithm drops one of the vertices of the current simplex, possibly adding a new vertex, or possibly dropping down to a lower dimensional face. Therefore a minimum of k pivots are required before one can possibly arrive at a simplex that has no vertex in common with σ. If the grid is fine, the algorithm will certainly require many pivots to arrive a fixed point far from the algorithm’s starting point. This suggests the following strategy. We first apply the Scarf algorithm to a coarse given triangulation of ∆d−1 , thereby arriving at a completely labelled simplex that is hopefully a rough approximation of a fixed point. We then subdivide the given triangulation of ∆d−1 , using barycentric subdivision or some other method. If we could somehow “restart” the algorithm in the fine triangulation, near the completely labelled simplex in the coarse triangulation, it might typically be the case that the algorithm did not have to go very far to find a completely labelled simplex in the fine triangulation. Restart methods do exist (see, e.g., Merrill (1972), Kuhn and MacKinnon (1975), and van der Laan and Talman (1979)) but it remains the case that the Scarf algorithm has not proved to be very useful in practice, perhaps due in part to its difficulties with high dimensional problems. There is one more feature of the Scarf algorithm that is worth mentioning. In our description of the algorithm the ordering of the vertices plays an explicit role, and can easily make a difference to the outcome. If one wishes to find more than one completely labelled simplex, or perhaps as many as possible, or perhaps even all of them, there is the following strategy. Having followed the algorithm for the given ordering of the indices to its terminus, now proceed from that completely labelled simplex in the graph Γ′ associated with some different ordering. This might lead 58 CHAPTER 3. COMPUTING FIXED POINTS back to the starting point of the algorithm in Γ′ , but it is also quite possible that it might lead to some completely labelled simplex that cannot be reached directly by the algorithm under any ordering of the indices. A completely labelled simplex σ is accessible if it is reachable by the algorithm in this more general sense: there is path going to σ from the starting point of the algorithm for some ordering of the indices, along a path that is a union of maximal paths of the various graphs Γ′ for the various orderings of the indices. 3.6 Homotopy Let f : X → X be a continuous function, and let x0 be an element of X. We let h : X × [0, 1] → X be the homotopy h(x, t) = (1 − t)x0 + tf (x). Here we think of the variable t at time, and let ht = h(·, t) : X → X be the function “at time t.” In this way we imagine deforming the constant function with value x0 at time zero into the function f at time one. Let g : X × [0, 1] → X be the function g(x, t) = h(x, t) − x. 
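To anticipate the path-following procedure described in the next few paragraphs, here is a minimal numerical sketch (Python with numpy) of predictor-corrector steps along Z = g−1(0) for this homotopy, with a one dimensional X and an arbitrary smooth f chosen purely for illustration; the step size, the finite-difference Jacobian, and the function names are all ours, and nothing here is tuned for robustness.

```python
import numpy as np

def jacobian(g, z, eps=1e-7):
    """Forward-difference approximation of Dg(z); g maps R^{d+1} -> R^d."""
    gz = g(z)
    J = np.empty((gz.size, z.size))
    for j in range(z.size):
        dz = np.zeros_like(z); dz[j] = eps
        J[:, j] = (g(z + dz) - gz) / eps
    return J

def predictor_corrector_step(g, z, v_prev, step=0.05, newton_iters=5):
    """One predictor-corrector step along Z = g^{-1}(0).  v_prev is the previous
    tangent direction, used only to keep moving forward rather than doubling back."""
    J = jacobian(g, z)
    _, _, vt = np.linalg.svd(J)
    v = vt[-1]                       # unit vector spanning the kernel of Dg(z)
    if v @ v_prev < 0:               # orient consistently with the previous step
        v = -v
    z1 = z + step * v                # predictor
    for _ in range(newton_iters):    # corrector: Newton steps orthogonal to v
        J1 = jacobian(g, z1)
        M = np.vstack([J1, v])       # solve  J1 w = -g(z1)  subject to  v . w = 0
        rhs = np.concatenate([-g(z1), [0.0]])
        z1 = z1 + np.linalg.solve(M, rhs)
    return z1, v

# example: X = [0, 1] (d = 1), x0 = 0.2, and an arbitrary smooth f : [0,1] -> [0,1]
f = lambda x: 0.5 * (1.0 + np.sin(3.0 * x))
x0 = 0.2
g = lambda z: np.array([(1 - z[1]) * x0 + z[1] * f(z[0]) - z[0]])   # g(x, t)
z, v = np.array([x0, 0.0]), np.array([0.0, 1.0])
for _ in range(200):                 # generous cap on the number of steps
    z, v = predictor_corrector_step(g, z, v)
    if z[1] >= 1.0:                  # reached the X x {1} end of the path
        break
# the last step may overshoot t = 1 slightly; shrinking the step near t = 1 refines this
print("approximate fixed point:", z[0], " f(x*) =", f(z[0]))
```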
The idea of the homotopy method is to follow a path in Z = g −1 (0) starting at (x0 , 0) until we reach a point of the form (x∗ , 1). As a practical matter it is necessary to assume that f is C 1 , so that h and g are C 1 . It is also necessary to assume that the derivative of g has full rank at every point of Z, and that the derivative of the map x 7→ f (x) − x has full rank at each of the fixed points of f . As we will see later in the book, there is a sense in which this is typically the case, so that these assumptions are mild. With these assumptions Z will be a union of finitely many curves. Some of these curves will be loops, while others will have two endpoints in X × {0, 1}. In particular, the other endpoint of the curve beginning at (x0 , 0) cannot be in X × {0}, because there is only one point in Z ∩ (X × {0}), so it must be (x∗ , 1) for some fixed point x∗ of f . We now have to tell the computer how to follow this path. The standard computational implementation of curve following is called the predictor-corrector method. Suppose we are at a point z0 = (x, t) ∈ Z. We first need to compute a vector v that is tangent to Z at z0 . Algebraicly this amounts to finding a nonzero linear combination of the columns of the matrix of Dg(z0 ) that vanishes. For this it suffices to express one of the columns as a linear combination of the others, and, roughly speaking, the Gram-Schmidt process can be used to do this. We can divide any vector we obtain this way by its norm, so that v becomes a unit vector. There is a parameter of the procedure called the step size that is a number ∆ > 0, and the “predictor” part of the process is completed by passing to the point z1 = z0 + ∆v. The “corrector” part of the process uses the Newton method to pass from z1 to a new point in Z, or at least very close to it. The first step is to find a vector w1 that is orthogonal to v such that g(z1 ) + Dg(z1)w1 = 0. To do this we can use the Gram-Schmidt process to find a basis for the orthogonal complement of v, compute the matrix M of the derivative of g with respect to this basis, compute the inverse of M, and then set w1 = −M −1 g(z1 ). We then set z2 = z1 + w1 , find a vector w2 3.7. REMARKS ON COMPUTATION 59 orthogonal to v such that g(z2 ) + Dg(z2 )w2 = 0, set z3 = z2 + w2 , and continue in this manner until g(zn ) is acceptably small. The net effect of the predictor followed by the corrector is to move us from one point on Z to another a bit further down. By repeating this one can go from one end of the curve to the other. Probably the reader has sensed that the description above is a high level overview that glides past many issues. In fact it is difficult to regard the homotopy method as an actual algorithm, in the sense of having precisely defined inputs and being guaranteed to eventually halt at an output of the promised sort. One issue is that the procedure might accidentally hop from one component of Z to another, particularly if ∆ is large. There are various things that might be done about this, for instance trying to detect a likely failure and starting over with a smaller ∆, but these issues, and the details of round off error that are common to all numerical software, are really in the realm of engineering rather than computational theory. As a practical matter, the homotopy method is highly successful, and is used to solve systems of equations from a wide variety of application domains. 3.7 Remarks on Computation We have now seen three algorithms for computing points that are approximately fixed. 
How good are these, practically and theoretically? The first algorithm we saw, in Section 3.3, is new. It is simple, and can be applied to a wide variety of settings. Code now exists, but there has been little testing or practical experience. The Scarf algorithm has not lived up to the hopes it raised when it was first developed, and is not used in practical computation. Homotopy methods are restricted to problems that are smooth. As we mentioned above, within this domain they have an extensive track record with considerable success. More generally, what can we reasonably hope for from an algorithm that computes points that are approximately fixed, and what sort of theoretical concepts can we bring to bear on these issues? These question has been the focus of important recent advances in theoretical computer science, and in this section we give a brief description of these developments. The discussion presumes little in the way of prior background in computer science, and is quite superficial—a full exposition of this material is far beyond our scope. Interested readers can learn much more from the cited references, and from textbooks such as Papadimitriou (1994a) and Arora and Boaz (2007). Theoretical analyses of algorithms must begin with a formal model of computation. The standard model is the Turing machine, which consists of a processor with finitely many states connected by an input-output device to a unbounded one dimensional storage medium that records data in cells, on each of which one can write an element of a finite alphabet that includes a distinguished character ‘blank.’ At the beginning of the computation the processor is in a particular state, the storage medium has a finitely many cells that are not blank, and the input-output device is positioned at a particular cell in storage. In each step of the computation the character at the input-output device’s location is read. The Turing machine is essentially defined by functions that take state-datum pairs as their arguments and 60 CHAPTER 3. COMPUTING FIXED POINTS compute: • the next state of the processor, • a bit that will be written at the current location of the input-output device (overwriting the bit that was just read) and • a motion (forward, back, stay put) of the input-output device. The computation ends when it reaches a particular state of the machine called “Halt.” Once that happens, the data in the storage device is regarded as the output of the computation. As you might imagine, an analysis based on a concrete and detailed description of the operation of a Turing machince can be quite tedious. Fortunately, it is rarely necessary. Historically, other models of computation were proposed, but were subsequently found to be equivalent to the Turing model, and the Church-Turing thesis is the hypothesis that all “reasonable” models of computation are equivalent, in the sense that they all yield the same notion of what it means for something to be “computable.” This is a metamathematical assertion: it can never be proved, and a refutation would not be logical, but would instead be primarily a social phenomenon, consisting of researchers shifting their focus to some inequivalent model. Once we have the notion of a Turing machine, we can define an algorithm to be a Turing machine that eventually halts, for any input state of the storage device. 
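The following is a minimal sketch of the model just described (Python; the representation of the storage medium as a dictionary, the state names, and the step bound are ours). The toy machine shown simply flips each bit of its input and halts when it reaches a blank cell.

```python
def run_turing_machine(delta, tape, state="start", head=0, max_steps=10_000):
    """Simulate the machine described above.  `delta` maps (state, symbol) pairs to
    (next_state, symbol_to_write, motion) with motion in {-1, 0, +1}; cells that have
    never been written hold the blank symbol ' '.  Returns the tape contents once
    the state 'Halt' is reached."""
    cells = dict(enumerate(tape))          # unbounded storage, sparsely represented
    for _ in range(max_steps):
        if state == "Halt":
            break
        symbol = cells.get(head, " ")
        state, write, move = delta[(state, symbol)]
        cells[head] = write
        head += move
    return "".join(cells[i] for i in sorted(cells)).strip()

# a toy machine that flips every bit of its input and halts at the first blank
flip = {("start", "0"): ("start", "1", +1),
        ("start", "1"): ("start", "0", +1),
        ("start", " "): ("Halt", " ", 0)}
print(run_turing_machine(flip, "100110"))   # -> 011001
```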
A subtle distinction is possible here: a Turing machine that always halts is not necessarily the same thing as a Turing machine that can be proved to halt, regardless of the input. In fact one of the most important early theorems of computer science is that there is no algorithm that has, as input, a description of a Turing machine and a particular input, and decides whether the Turing machine with that input will eventually halt. As a practical matter, one almost always works with algorithms that can easily be proved to be such, in the sense that it is obvious that they eventually halt. A computational problem is a rule that associates a nonempty set of outputs with each input, where the set of possible inputs and outputs is the set of pairs consisting of a position of the input-output device and a state of the storage medium in which there are finitely many nonblank cells. (Almost always the inputs of interest are formatted in some way, and this definition implicitly makes checking the validity of the input part of the problem.) A computational problem is computable if there is an algorithm that passes from each input to one of the acceptable outputs. The distinction between computational problems that are computable and those that are not is fundamental, with many interesting and important aspects, but in our discussion here we will focus exclusively on problems that are known to be computable. For us the most important distinctions is between those computable computational problems that are “easy” and those that are “hard,” where the definitions of these terms remain to be specified. In order to be theoretically useful, the easiness/hardness distinction should not depend on the architecture of a particular machine or the technology of a particular era. In addition, it should be robust, at least in the sense that a composition of two easy computational problems, where 3.7. REMARKS ON COMPUTATION 61 the output of the first is the input of the second, should also be easy, and possibly in other senses as well. For these reasons, looking at the running time of an algorithm on a particular input is not very useful. Instead, it is more informative to think about how the resources (time and memory) consumed by a computation increase as the size of the input grows. In theoretical computer science, the most useful distinction is between algorithms whose worst case running time is bounded by a polynomial function of the size of the output, and algorithms that do not have this property. The class of computational problems that have polynomial time algorithms is denoted by P. If the set of possible inputs of a computational problem is finite, then the problem is trivially in P, and in fact we will only consider computational problems with infinite sets of inputs. There are many kinds of computational problems, e.g., sorting, function evaluation, optimization, etc. For us the most important types are decision problems , which require a yes or no answer to a well posed question, and search problems, which require an instance of some sort of object or a verification that no such object exists. An important example of a decision problem is Clique: given a simple undirected graph G and an integer k, determine whether G has a clique with k nodes, where a clique is a collection of vertices such that G has an edge between any two of them. An example of a search problem is to actually find such a clique or to certify that no such clique exists. 
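For example (a brute-force sketch in Python; the encoding of the graph as a set of edge pairs and the function names are ours), checking a proposed k-clique takes time polynomial in the size of the graph, while the naive search below examines exponentially many candidate subsets in the worst case.

```python
from itertools import combinations

def is_clique(edges, vertices):
    """Polynomial-time check of a certificate: every pair in `vertices` is an edge."""
    return all((u, v) in edges or (v, u) in edges for u, v in combinations(vertices, 2))

def find_clique(n, edges, k):
    """Brute-force search over every k-element subset of {0, ..., n-1}."""
    for cand in combinations(range(n), k):
        if is_clique(edges, cand):
            return cand          # a witness for a "Yes" answer to the decision problem
    return None                  # certifies that no k-clique exists

edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
print(find_clique(4, edges, 3))  # -> (0, 1, 2)
```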
There is a particularly important class of decision problems called NP, which stands for “nondeterministic polynomial time.” Originally NP was thought of as the class of decision problems for which a Turing machine that chose its next state randomly has a positive probability of showing that the answer is “Yes” when this is the case. For example, if a graph has a k-clique, an algorithm that simply guesses which elements constitute the clique has a positive probability of stumbling onto some k-clique. The more modern way of thinking about NP is that it is the class of decision problems for which a “Yes” answer has a certificate or witness that can be verified in polynomial time. In the case of Clique an actual k-clique is such a witness. Factorization of integers is another algorithmic issue which easily generates decision problems—for example, does a given number have a prime factor whose first digit is 3?—that are in NP because a prime factorization is a witness for them. (One of the historic recent advances in mathematics is the discovery of a polynomial time algorithm for testing whether a number is prime. Thus it is possible to verify the primality of the elements of a factorization in polynomial time.) An even larger computational class is EXP, which is the class of computational problems that have algorithms with running times that are bounded above by a function of the form exp(p(s)), where s is the size of the problem and p is a polynomial function. Instead of using time to define a computational class, we can also use space, i.e., memory; PSPACE is the class of computational problems that have algorithms that use an amount of memory that is bounded by a polynomial function of the size of the input. The sizes of the certificates for a problem in NP are necessarily bounded by some polynomial function of the size of the input, and the problem can be solved by trying all possible certificates not exceeding this bound, so any problem in NP is also in PSPACE. In turn, the number of processor state-memory state pairs during the run of a program using polynomially bounded 62 CHAPTER 3. COMPUTING FIXED POINTS memory an exponential function of the polynomial, so any problem in PSPACE is also in EXP. Thus P ⊂ NP ⊂ PSPACE ⊂ EXP. Computational classes can also be defined in relation to an oracle which is assumed to perform some computation. The example of interest to us is an oracle that evaluates a continuous function f : X → X. How hard is it to find a point that is approximately fixed using such an oracle? Hirsch et al. (1989) showed that any algorithm that does this has an exponential worst case running time, because some functions require exponentially many calls to the oracle. Once you commit to an algorithm, the Devil can devise a function for which your algorithm will make exponentially many calls to the oracle before finding an approximate fixed point. An important aspect of this result is that the oracle is assumed to be the only source of information about the function. In practice the function is specified by code, and in principle an algorithm could inspect the code and use what it learned to speed things up. For linear functions, and certain other special classes of functions, this is a useful approach, but it seems quite farfetched to imagine that a fully general algorithm could do this fruitfully. 
At the same time it is hard to imagine how we might prove that this is impossible, so we arrive at the conclusion that even though we do not quite have a theorem, finding fixed points almost certainly has exponential worst case complexity. Even if finding fixed points is, in full generality, quite hard, it might still be the case that certain types of fixed point problems are easier. Consider, in particular, finding a Nash equilibrium of a two person game. Savani and von Stengel (2006) (see also McLennan and Tourky (2010)) showed that the Lemke-Howson algorithm has exponential worst case running time, but the algorithm is in many ways similar to the simplex algorithm for linear programming, not least because both algorithms tend to work rather well in practice. The simplex algorithm was shown by Klee and Minty (1972) to have exponential case running time, but later polynomial time algorithms were developed by Khachian (1979) and Karmarkar (1984). Whether or not finding a Nash equilibrium of a two person game is in P was one of the outstanding open problems of computer science for over a decade. Additional concepts are required in order to explain how this issue was resolved. A technique called reduction can be used to show that some computational problems are at least as hard as others, in a precise sense. Suppose that A and B are two computational problems, and we have two algorithms, guaranteed to run in polynomial time, the first of which converts the input encoding an instance of problem A into the input encoding an instance of problem B, and the second of which converts the desired output for the derived instance of problem B into the desired output for the given instance of problem A. Then problem B is at least as hard as problem A because one can easily turn an algorithm for problem B into an algorithm for problem A that is “as good,” in any sense that is invariant under these sorts of polynomial time transformations. A problem is complete for a class of computational problems if it is at least as hard, in this sense, as any other member of the class. One of the reasons that NP is so important is there are numerous NP-complete problems, many of which arise 3.7. REMARKS ON COMPUTATION 63 naturally; Clique is one of them. One of the most famous problems in contemporary mathematics is to determine whether NP is contained in P. This question boils down to deciding whether Clique (or any other NP-complete problem) has a polynomial time algorithm. This is thought to be highly unlikely, both because a lot of effort has gone into designing algorithms for these problems, and because the existence of such an algorithm would have remarkable consequences. It should be mentioned that this problem is, to some extent at least, an emblematic representative of numerous open questions in computer science that have a similar character. In fact, one of the implicit conventions of the discipline is to regard a computational problem as hard if, after some considerable effort, people haven’t been able to figure out whether it is hard or easy. For any decision problem in NP there is an associated search problem, namely to find a witness for an affirmative answer or verify that the answer is negative. For Clique this means not only showing that a clique of size k exists, but actually producing one. The class of search problems associated with decision problems is called FNP. 
(The ‘F’ stands for “function.”) For Clique the search problem is not much harder than the decision problem, in the following sense: if we had a polynomial time algorithm for the decision problem, we could apply it to the graph with various vertices removed, repeatedly narrowing the focus until we found the desired clique, thereby solving the search problem is polynomial time. However, there is a particular class of problems for which the search problem is potentially quite hard, even though the decision problem is trivial because the answer is known to be yes. This class of search problems is called TFNP. (The ’T’ stands for “total.”) There are some “trivial” decision problems that give rise to quite famous problems in this class: • “Does a integer have a prime factorization?” Testing primality can now be done in polynomial time, but there is still no polynomial time algorithm for factoring. • “Given a set of positive integers {a1 , . . . , an } with ai < 2n /n for all i, do there exist two different subsets with the same sum?” There are 2n different subsets, and the sum of any one of them is less than 2n − n + 1, so the pigeonhole principle implies that the answer is certainly yes. • “Does a two person game have sets of pure strategies for the agents that are the supports4 of a Nash equilibrium?” Verifying that a pair of sets are the support of a Nash equilibrium is a computation involving linear algebra and a small number of inequality verifications that can be performed in polynomial time. Problems involving a function defined on some large space must be specified with a bit more care, because if the function is given by listing its values, then the problem is easy, relative to the size of the input, because the input is huge. Instead, one takes the input to be a Turing machine that computes (in polynomial time) the value of the function at any point in the space. 4 The support of a mixed strategy is the set of pure strategies that are assigned positive probability. 64 CHAPTER 3. COMPUTING FIXED POINTS • “Given a Turing machine that computes a real valued function at every vertex of a graph, is there a vertex where the function’s value is at least as large as the function’s value at any of the vertex’ neighbors in the graph?” Since the graph is finite, the function has a global maximum and therefore at least one local maximum. • “Given a Turing machine that computes the value of a Sperner labelling at any vertex in a triangulation of the simplex, does there exist a completely labelled subsimplex?” Mainly because the class of problems in NP that always have a positive answer is defined in terms of a property of the outputs, rather than a property of the inputs (but also in part because factoring seems so different from the other problems) experts expect that TFNP does not contain any problems that are complete for the class. In view of this, trying to study the class as a whole is unlikely to be very fruitful. Instead, it makes sense to define and study coherent subclasses, and Papadimitriou (1994b) advocates defining subclasses in terms of the proof that a solution exists. Thus PPP (“polynomial pigeonhole principle”) is (roughly) the class of problems for which existence is guaranteed by the pigeonhole principle, and PLS (“polynomial local search”) is (again roughly) the set of problems requesting a local maximum of a real valued function defined on a graph by a Turing machine. 
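The pigeonhole example above (two different subsets with the same sum) is easy to make concrete. The sketch below (Python; brute force, with illustrative data satisfying ai < 2^n/n) searches for such a pair; the pigeonhole principle guarantees that it succeeds, but nothing in that argument suggests a way of succeeding in polynomial time.

```python
from itertools import combinations

def equal_sum_subsets(a):
    """Given positive integers a_1, ..., a_n with a_i < 2**n / n, return two different
    subsets (as tuples of indices) with the same sum.  The pigeonhole principle
    guarantees that they exist; this brute-force search may take exponential time."""
    n = len(a)
    seen = {}
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            s = sum(a[i] for i in subset)
            if s in seen:
                return seen[s], subset
            seen[s] = subset
    return None   # cannot happen when a_i < 2**n / n

print(equal_sum_subsets([2, 3, 4, 5, 6]))   # -> ((3,), (0, 1)), i.e. {5} and {2, 3}
```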
For us the most important subclass of TFNP is PPAD (“polynomial parity argument directed”) which is defined by abstracting certain features of the algorithms we have seen in this chapter. The computational problem EOTL (“end of the line”) is defined by a Turing machine that defines a directed graph5 of maximal degree two in a space that may, without loss of generality, be taken to be the set {0, 1}k of bit strings of length k, where k is bounded by a polynomial function of the size of the input. For each v ∈ {0, 1}k the Turing machine specifies whether v is a vertex in the graph. If it is, the Turing machine computes its predecessor, if it has one, and its successor, if it has one. When it exists, the predecessor of v must be a vertex, and its successor must be v. Similarly, when v has a successor, it must be a vertex, and its predecessor must be v. Finally, we require that (0, . . . , 0) is a vertex that has a successor but no predecessor. The problem is to find another “leaf” of the graph, by which we mean either a vertex with a predecessor but no successor, or a vertex with a successor but no predecessor. Of course the existence of such a leaf follows from Lemma 2.6.1, generalized in the obvious way to handle directed graphs. The class of computational problems that have reductions to EOTL is PPAD (“polynomial parity problem directed”). The Lemke-Howson algorithm passes from a two person game to an instance of EOTL, then solves it by following the path in the graph to its other endpoint. Similarly, the Scarf algorithm has as input the algorithms for navigating in a triangulation of ∆d−1 and generating the labels of the vertices, and if follows a path in a graph from one endpoint to another. (It would be difficult to describe homotopy in exactly these terms, but there is an obvious sense in which it has this character.) 5 A directed graph is a pair G = (V, E) where V is a finite set of vertices and E is a finite set of ordered pairs of elements of V . That is, in a directed graph each edge has a source and a target. 3.7. REMARKS ON COMPUTATION 65 There is a rather subtle point that is worth mentioning here. In our descriptions of Lemke-Howson, Scarf, and homotopy, we implicitly assumed that the algorithm used its memory of where it had been to decide which direction to go in the graph, but the definition of EOTL requires that the graph be directed, which means in effect that if we begin at any point on the path, we can use local information to decide which of the two directions in the graph constitutes forward motion. It turns out that each of our three algorithms has this property; a proper explanation of this would require more information about orientation than we have developed at this point. The class of problems that can be reduced to the computational problem that has the same features as EOTL, except that the graph is undirected, is PPA. Despite the close resemblance to PPAD, the theoretical properties of the two classes differ in important ways. In a series of rapid developments in 2005 and 2006 (Daskalakis et al. (2006); Chen and Deng (2006b,a)) it was shown that computing a Nash equilibrium of a two player game is PPAD-complete, and also that the two dimensional Sperner problem is PPAD-complete. This means that computing a Nash equilibrium of a two player game is almost certainly hard, in the sense that there is no polynomial time algorithm for the problem, because computing general fixed points is almost certainly hard. 
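The "obvious" algorithm for EOTL, which is in essence what the Lemke-Howson and Scarf algorithms do in their respective graphs, simply follows the path that begins at (0, . . . , 0) until it terminates. The Python sketch below makes this explicit; the successor function stands in for the Turing machine in the definition, and the three bit instance is only a toy. The point of the theory, of course, is that the path may have length that is exponential in k, so this is not a polynomial time algorithm.

def other_end_of_the_line(successor, source):
    # Follow successor pointers from the distinguished source until a sink is reached.
    # The walk cannot revisit a vertex, since a revisited vertex would have two
    # predecessors, so it terminates at a vertex with no successor: a second leaf.
    v = source
    while True:
        w = successor(v)
        if w is None:
            return v
        v = w

# A toy instance on bit strings of length 3: the single directed path 000 -> 001 -> 011 -> 111.
path = ["000", "001", "011", "111"]
succ = {a: b for a, b in zip(path, path[1:])}
print(other_end_of_the_line(succ.get, "000"))   # prints 111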
Since this breakthrough many other computational problems have been shown to be PPAD-complete, including finding Walrasian equilibria in seemingly quite simple exchange economies. In various senses the difficulty does not go away if we relax the problem, asking for a point that is ε-approximately fixed for an ε that is significantly greater than zero. The current state of theory presents a contrast between theoretical concepts that classify even quite simple fixed point problems as intractable, and algorithms that often produce useful results in a reasonable amount of time. A recent result presents an even more intense contrast. The computational problem OEOTL has the same given data as EOTL, but now the goal is to find the other end of the path beginning at (0, . . . , 0), and not just any second leaf of the graph. Goldberg et al. (2011) show that OEOTL is PSPACE-complete, even though the Lemke-Howson algorithm, the Scarf algorithm, and many specific instances of homotopy procedures can be recrafted as algorithms for OEOTL. Recent developments have led to a rich and highly interesting theory explaining why the problem of finding an approximate fixed point is intractable, in the sense that there is almost certainly no algorithm that always finds an approximate fixed point in a small amount of time. What is missing at this point are more tolerant theoretical concepts that give an account of why the algorithms that exist are as useful as they in fact are, and how they might be compared with each other, and with theoretical ideals that have not yet been shown to be far out of reach.

Chapter 4 Topologies on Spaces of Sets

The theories of the degree and the index involve a certain kind of continuity with respect to the function or correspondence in question, so we need to develop topologies on spaces of functions and correspondences. The main idea is that one correspondence is close to another if its graph is close to the graph of the second correspondence, so we need to have topologies on spaces of subsets of a given space. In this chapter we study such spaces of sets, and in the next chapter we apply these results to spaces of functions and correspondences. There are three basic set theoretic operations that are used to construct new functions or correspondences from given ones, namely restriction to a subdomain, cartesian products, and composition, and our agenda here is to develop continuity results for elementary operations on sets that will eventually support continuity results for those operations. To begin with, Section 4.1 reviews some basic properties of topological spaces that hold automatically in the case of metric spaces. In Section 4.2 we define topologies on spaces of compact and closed subsets of a general topological space. Section 4.3 presents a nice result due to Vietoris which asserts that for one of these topologies the space of nonempty compact subsets of a compact space is compact. Economists commonly encounter this in the context of a metric space, in which case the topology is induced by the Hausdorff distance; Section 4.4 clarifies the connection. In Section 4.5 we study the continuity properties of basic operations for these spaces. Our treatment is largely drawn from Michael (1951) which contains a great deal of additional information about these topologies.

4.1 Topological Terminology

Up to this point the only topological spaces we have encountered have been subsets of Euclidean spaces.
Now it will be possible that X lacks some of the properties of metric spaces, in part because we may ultimately be interested in some spaces that are not metrizable, but also in order to clarify the logic underlying our result. Throughout this chapter we work with a fixed topological space X. We say that X is: (a) a T1 -space if, for each x ∈ X, {x} is closed; 66 4.2. SPACES OF CLOSED AND COMPACT SETS 67 (b) Hausdorff if any two distinct points have disjoint neighborhoods; (c) regular if every neighborhood of a point contains a closed neighborhood of that point; (d) normal if, for any two disjoint closed sets C and D, there are disjoint open sets U and V with C ⊂ U and D ⊂ V . In a Hausdorff space the complement of a point is a neighborhood of every other point, so a Hausdorff space is T1 . It is an easy exercise to show that a metric space is normal and T1 . Evidently a normal T1 space is Hausdorff and regular. A collection B of subsets of X is a base of a topology if the open sets are all unions of elements of B. Note that B is a base of a topology if and only if all the elements of B are open and the open sets are those U ⊂ X such that for every x ∈ U there there is a V ∈ B with x ∈ V ⊂ U. We say that B is a subbase of the topology if the open sets are the unions of finite intersections of elements of B. Equivalently, each element of B is open and for each open U and x ∈ U there are V1 , . . . , Vk ∈ B such that x ∈ V1 ∩ · · · ∩ Vk ⊂ U. It is often easy to define or describe a topology by specifying a subbase—in which case we way that the topology of X is generated by B—so we should understand what properties a collection B of subsets of X has to have in order for this to work. Evidently the collection of all unions of finite intersections of elements of B is closed under finite intersection and arbitrary union. We may agree, as a matter of convention if you like, that the empty set is a finite intersection of elements of B. Then the only real requirement is that the union of all elements of B is X, so that X itself is closed. 4.2 Spaces of Closed and Compact Sets There will be a number of topologies, and in order to define them we need the corresponding subbases. For each open U ⊂ X let: • ŨU = { K ⊂ U : K is compact }; • UU = ŨU \ {∅}; • VU = { K ⊂ X : K is compact and K ∩ U 6= ∅ }; • ŨU0 = { C ⊂ U : C is closed }; • UU0 = ŨU0 \ {∅}; • VU0 = { C ⊂ X : C is closed and C ∩ U 6= ∅ }. We now have the following spaces: • K̃(X) is the space of compact subsets of X endowed with the topology generated by the subbase { ŨU : U ⊂ X is open }. 68 CHAPTER 4. TOPOLOGIES ON SPACES OF SETS • K(X) is the space of nonempty compact subsets of X endowed with the subspace topology inherited from K̃(X). • H(X) is the space of nonempty compact subsets of X endowed with the topology generated by the subbase { UU : U ⊂ X is open } ∪ { VU : U ⊂ X is open }. • K̃0 (X) is the space of closed subsets of X endowed with the topology generated by the base { ŨU0 : U ⊂ X is open }. • K0 (X) is the space of nonempty closed subsets of X endowed with the subspace topology inherited from K̃0 (X). • H0 (X) is the space of nonempty closed subsets of X endowed with the topology generated by the subbase { UU0 : U ⊂ X is open } ∪ { VU0 : U ⊂ X is open }. The topologies of H(X) and H0 (X) are both called the Vietoris topology. Roughly, a neighborhood of K in K̃(X) or K(X) consists of those K ′ that are close to K in the sense that every point in K ′ is close to some point of K. 
A neighborhood of K ∈ H(X) consists of those K ′ that are close in this sense, and also in the sense that every point in K is close to some point of K ′ . Similar remarks pertain to K̃0 (X), K0 (X), and H0 (X). Section 4.4 develops these intuitions precisely when X is a metric space. Compact subsets of Hausdorff spaces are closed, so “for practical purposes” (i.e., when X is Hausdorff) every compact set is closed. In this case K̃(X), K(X), and H(X) have the subspace topologies induced by the topologies of K̃0 (X), K0 (X), and H0 (X). Of course it is always the case that K(X) and K0 (X) have the subspace topologies induced by K̃(X) and K̃0 (X) respectively. It is easy to see that { UU : U ⊂ X is open } is a base for K(X) and { UU0 : U ⊂ X is open } is a base for K0 (X). Also, for any open U1 , . . . , Uk we have ŨU1 ∩ . . . ∩ ŨUk = ŨU1 ∩...∩Uk , and similarly for UU , ŨU0 , and UU0 , so the subbases of K̃(X), K(X), K̃0 (X), and K0 (X) are actually bases. 4.3 Vietoris’ Theorem An interesting fact, which was proved already in Vietoris (1923), and which is applied from time to time in mathematical economics, is that H(X) is compact whenever X is compact. We begin the argument with a technical lemma. Lemma 4.3.1. If X has a subbase such that any cover of X by elements of the subbase has a finite subcover, then X is compact. 69 4.4. HAUSDORFF DISTANCE Proof. Say that a set is basic if it is a finite intersection of elements of the subbasis. Any open cover is refined by the collection of basic sets that are subsets of its elements. If a refinement of an open cover has a finite subcover, then so does the cover, so it suffices to show that any open cover of X by basic sets has a finite subcover. A collection of open covers is a chain if it is completely ordered by inclusion: for any two covers in the chain, the first is a subset of the second or vice versa. If each open cover in a chain consists of basic sets, and has no finite subcover, then the union of the elements of the chain also has these properties (any finite subset of the union is contained in some member of the chain) so Zorn’s lemma implies that if there is one open cover with these properties, then there is a maximal such cover, say {Uα : α ∈ A}. Suppose, for some β ∈ A, that Uβ = V1 ∩ . . . ∩ Vn where V1 , . . . , Vn are in the subbasis. If, for each i = 1, . . . , n, {Uα : α ∈ A} ∪ {Vi } has a finite subcover Ci , then each Ci \ {Vi } covers X \ Vi , so (C1 \ {V1 }) ∪ . . . ∪ (Cn \ {Vn }) ∪ {Uβ } is a finite subcover from {Uα : α ∈ A}. Therefore there is at least one i such that {Uα : α ∈ A}∪{Vi } has no finite subcover, and maximality implies that Vi is already in the cover. This argument shows that each element Uβ of the cover is contained in a subbasic set that is also in the cover, so the subbasic sets in {Uα : α ∈ A} cover X, and by hypothesis there must be a finite subcover after all. Theorem 4.3.2. If X is compact, then H(X) is compact. Proof. Suppose that { UUα : α ∈SA} ∪ { VVβ : β ∈ B} is an open cover of H(X) by subbasic sets. Let D := X \ β Vβ ; since D is closed and X is compact, D is compact. We may assume that D is nonempty because otherwise X = Vβ1 ∪. . .∪Vβn for some β1 , . . . , βn , in which case H(X) = VVβ1 ∪ . . . ∪ VVβn . In addition, D must be contained in some Uα because otherwise D would not be an element of any UUα or any VVβ . But then {Uα } ∪ {Vβ : β ∈ B} has a finite subcover, so, for some β1 , . . . , βn , we have H(X) = UUα ∪ VVβ1 ∪ . . . ∪ VVβn . 
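When X is finite every subset is compact and X is compact, so Theorem 4.3.2 holds trivially, and the constructions of Section 4.2 can be computed by brute force. The following Python sketch generates the topology determined by a subbase (by closing under finite intersections and then under unions, which suffices in the finite case) and applies it to the Vietoris subbase of H(X) when X is the two point discrete space; the function names and the inefficient closure procedure are of course only illustrative conveniences.

from itertools import combinations

def close_under(op, sets):
    # Repeatedly apply a binary operation until no new sets appear.
    family = set(sets)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(family), 2):
            c = op(a, b)
            if c not in family:
                family.add(c)
                changed = True
    return family

def topology_from_subbase(space, subbase):
    # The open sets are the unions of finite intersections of subbase elements.
    sub = {frozenset(s) for s in subbase} | {frozenset(space), frozenset()}
    base = close_under(frozenset.intersection, sub)
    return close_under(frozenset.union, base)

X = frozenset({0, 1})
opens_X = [frozenset(), frozenset({0}), frozenset({1}), X]    # the discrete topology on X
points = [frozenset({0}), frozenset({1}), X]                  # the nonempty (compact) subsets of X
subbase = []
for U in opens_X:
    subbase.append(frozenset(K for K in points if K <= U))    # U_U: compacta contained in U
    subbase.append(frozenset(K for K in points if K & U))     # V_U: compacta that meet U
print(len(topology_from_subbase(points, subbase)))            # prints 8

The output is 8: every subset of the three point space H(X) is open, so the Vietoris topology of H(X) is discrete here, as one would expect when X itself is discrete.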
4.4 Hausdorff Distance

Economists sometimes encounter spaces of compact subsets of a metric space, which are frequently topologized with the Hausdorff metric. In this section we clarify the relationship between that approach and the spaces introduced above. Suppose that X is a metric space with metric d. For nonempty compact sets K, L ⊂ X let

δK (K, L) := max_{x∈K} min_{y∈L} d(x, y).

Then for any K and ε > 0 we have

{ L : δK (L, K) < ε } = { L : L ⊂ Uε (K) } = UUε (K) .   (∗)

On the other hand, whenever K ⊂ U with K compact and U open there is some ε > 0 such that Uε (K) ⊂ U (otherwise we could take sequences x1 , x2 , . . . in K and y1 , y2 , . . . in X \ U with d(xi , yi ) → 0, then take convergent subsequences) so { L : δK (L, K) < ε } ⊂ UU . Thus:

Lemma 4.4.1. When X is a metric space, the sets of the form { L : δK (L, K) < ε } constitute a base of the topology of K(X).

The Hausdorff distance between nonempty compact sets K, L ⊂ X is δH (K, L) := max{δK (K, L), δK (L, K)}. This is a metric. Specifically, it is evident that δH (K, L) = δH (L, K), and that δH (K, L) = 0 if and only if K = L. If M is a third compact set, then δK (K, M) ≤ δK (K, L) + δK (L, M), from which it follows easily that the Hausdorff distance satisfies the triangle inequality. There is now an ambiguity in our notation, insofar as Uε (L) might refer either to the union of the ε-balls around the various points of L or to the set of compact sets whose Hausdorff distance from L is less than ε. Unless stated otherwise, we will always interpret it in the first way, as a set of points and not as a set of sets.

Proposition 4.4.2. The Hausdorff distance induces the Vietoris topology on H(X).

Proof. Fix a nonempty compact K. We will show that any neighborhood of K in one topology contains a neighborhood in the other topology. First consider some ε > 0. Choose x1 , . . . , xn ∈ K such that K ⊂ ⋃i Uε/2 (xi ). If L ∩ Uε/2 (xi ) ≠ ∅ for all i, then δK (K, L) < ε, so, in view of (∗),

K ∈ UUε (K) ∩ VUε/2 (x1 ) ∩ . . . ∩ VUε/2 (xn ) ⊂ { L : δH (K, L) < ε }.

We now show that any element of our subbasis for the Vietoris topology that contains K contains { L : δH (K, L) < ε } for some ε > 0. If U is an open set containing K, then (as we argued above) Uε (K) ⊂ U for some ε > 0, so that

K ∈ { L : δH (L, K) < ε } ⊂ { L : δK (L, K) < ε } ⊂ UU .

If V is open with K ∩ V ≠ ∅, then we can choose x ∈ K ∩ V and ε > 0 small enough that Uε (x) ⊂ V . Then

K ∈ { L : δH (K, L) < ε } ⊂ { L : δK (K, L) < ε } ⊂ VV .

4.5 Basic Operations on Subsets

In this section we develop certain basic properties of the topologies defined in Section 4.2. To achieve a more unified presentation, it will be useful to let T denote a generic element of {K̃, K, H, K̃0 , K0 , H0 }. That is, T (X) will denote one of the spaces K̃(X), K(X), H(X), K̃0 (X), K0 (X), and H0 (X), with the range of allowed interpretations indicated in each context. Similarly, W will denote a generic element of {Ũ, U, V, Ũ 0 , U 0 , V 0 }. We will frequently apply the following simple fact.

Lemma 4.5.1. If Y is a second topological space, f : Y → X is a function, and B is a subbase for X such that f −1 (V ) is open for every V ∈ B, then f is continuous.

Proof. For any sets S1 , . . . , Sk ⊂ X we have f −1 (S1 ∩ · · · ∩ Sk ) = f −1 (S1 ) ∩ · · · ∩ f −1 (Sk ), and for any collection {Ti }i∈I of subsets of X we have f −1 (⋃i Ti ) = ⋃i f −1 (Ti ).
Thus the preimage of a union of finite intersections of elements of B is open, because it is a union of finite intersections of open subsets of Y . 4.5.1 Continuity of Union The function taking a pair of sets to their union is as well behaved as one might hope. Lemma 4.5.2. For any T ∈ {K̃, K, H, K̃0 , K0 , H0 } the function υ : (K1 , K2 ) 7→ K1 ∪ K2 is a continuous function from T (X) × T (X) to T (X). Proof. Applying Lemma 4.5.1, it suffices to show that preimages of subbasic open sets are open. For T ∈ {K̃, K, K̃0 , K0 } it suffices to note that υ −1(WU ) = WU × WU for all four W ∈ {Ũ, U, Ũ 0 , U 0 }. For T ∈ {H, H0 } we also need to observe that υ −1(WU ) = (WU × H(X)) ∪ (H(X) × WU ) for both W ∈ {V, V 0 }. 4.5.2 Continuity of Intersection Simple examples show that intersection is not a continuous operation for the topologies H and H0 , so the only issues here concern K̃, K, K̃0 , and K0 . Lemma 4.5.3. If A ⊂ X is closed, the function K 7→ K ∩ A from K̃A (X) to K̃(A) 0 and the function C 7→ C ∩ A from K̃A (X) to K̃0 (A) are continuous. Proof. If V ⊂ A is open, then the set of compact K such that K ∩ A ⊂ V is UV ∪(X\A) . This establishes the first asserted continuity, and a similar argument establishes the second. 72 CHAPTER 4. TOPOLOGIES ON SPACES OF SETS 0 For a nonempty closed set A ⊂ X let KA (X) and KA (X) be the sets of compact and closed subsets of X that have nonempty intersection with A. Since the topologies of K(X) and K0 are the subspace topologies inherited from K̃(X) and K̃0 (X), last result has the following immediate consequence. Lemma 4.5.4. The function K 7→ K ∩ A from KA (X) to K(A) and the function 0 C 7→ C ∩ A from KA (X) to K0 (A) are continuous. Joint continuity of the map (C, D) 7→ C ∩ D requires an additional hypothesis. Lemma 4.5.5. If X is a normal space, then ι : (C, D) 7→ C ∩ D is a continuous function from K̃0 (X) × K̃0 (X) to K̃0 (X). If, in addition, X is a T1 space, then ι : K̃(X) × K̃(X) → K̃(X) is continuous. Proof. By Lemma 4.5.1 it suffices to show that, for any open U ⊂ X, ι−1 (UU0 ) is open. For any (C, D) in this set normality implies that there are disjoint open sets V and W containing C \ U and D \ U respectively. Then (U ∪ V ) ∩ (U ∪ W ) = U, so (C, D) ∈ (UU0 ∪V × UU0 ∪W ) ∩ I 0 (X) ⊂ ι−1 (UU0 ). If X is also T1 , it is a Hausdorff space, so compact sets are closed. Therefore ι : K̃(X) × K̃(X) → K̃(X) is continuous because its domain and range have the subspace topologies inherited from K̃0 (X) × K̃0 (X) and K̃0 (X). Let I(X) (resp. I 0 (X)) be the set of pairs (K, L) of compact (resp. closed) subsets of X such that K ∩ L 6= ∅, endowed with the topology it inherits from the product topology of K(X) × K(X) (resp. K0 (X) × K0 (X)). The relevant topologies are relative topologies obtained from the spaces in the last result, so: Lemma 4.5.6. If X is a normal space, then ι : (C, D) 7→ C ∩ D is a continuous function from I 0 (X) to K0 (X). If, in addition, X is a T1 space, then ι : I(X) → K(X) is continuous. 4.5.3 Singletons Lemma 4.5.7. The function η : x 7→ {x} is a continuous function from X to T (X) when T ∈ {K, H}. If, in addition, X is a T1 -space, then it is continuous when T ∈ {K0 , H0 }. Proof. Singletons are always compact, so for any open U we have η −1 (UU ) = η −1 (VU ) = U. If X is T1 , then singletons are closed, so η −1 (UU0 ) = η −1 (VU0 ) = U. 4.5.4 Continuity of the Cartesian Product In addition to X, we now let Y be another given topological space. 
A simple example shows that the cartesian product π 0 : (C, D) 7→ C × D is not a continuous function from H0 (X) × H0 (Y ) to H0 (X × Y ). Suppose X = Y = R, (C, D) = (X, {0}), and W = { (x, y) : |y| < (1 + x2 )−1 }. 4.5. BASIC OPERATIONS ON SUBSETS 73 It is easy to see that there is no neighborhood V ⊂ H0 (Y ) of D such that π 0 (C, D ′ ) ∈ UW (that is, R × D ′ ⊂ W ) for all D ′ ∈ V . For compact sets there are positive results. In preparation for them we recall a basic fact about the product topology. Lemma 4.5.8. If K ⊂ X and L ⊂ Y are compact, and W ⊂ X × Y is a neighborhood of K × L, then there are neighborhoods U of K and V of L such that U × V ⊂ W. Proof. By the definition of the product topology, for each (x, y) ∈ K × L there are neighborhoods U(x,y) and V(x,y) of x and y such that S U(x,y) × V(x,y) ⊂ W . For each x ∈ KT we can find y1 , . . . , yn such that L ⊂ Vx := j V(x,yj ) , and S we can then let UxT:= j U(x,yj ) . Now choose x1 , . . . , xm such that K ⊂ U := i Uxi , and let V := i Vxi . Proposition 4.5.9. For T ∈ {K̃, K, H} the function π : (K, L) 7→ K × L is a continuous function from T (X) × T (Y ) to T (X × Y ). Proof. Let K ⊂ X and L ⊂ Y be compact. If W is a neighborhood of K × L and U and V are open neighborhoods of K and L with U × V ⊂ W , then (K, L) ∈ UU × UV ⊂ π −1 (UW ). By Lemma 4.5.1, this establishes the asserted continuity when T ∈ {K̃, K}. To demonstrate continuity when T = H we must also show that π −1 (VW ) is open in H(X) × H(Y ) whenever W ⊂ X × Y is open. Suppose that (K × L) ∩ W 6= ∅. Choose (x, y) ∈ (K × L) ∩ W , and choose open neighborhoods U and V of x and y with U × V ⊂ W . Then K × L ∈ VU × VV ⊂ π −1 (VW ). 4.5.5 The Action of a Function Now fix a continuous function f : X → Y . Then f maps compact sets to compact sets while f −1 (D) is closed whenever D ⊂ Y is closed. The first of these operations is as well behaved as one might hope. Lemma 4.5.10. If T ∈ {K̃, K, H}, then φf : K 7→ f (K) is a continuous function from T (X) to T (Y ). Proof. Preimages of subbasic open sets are open: for any open V ⊂ Y we have φ−1 f (WV ) = Wf −1 (V ) for all W ∈ {Ũ, U, V}. There is the following consequence for closed sets. Lemma 4.5.11. If X is compact, Y is Hausdorff, and T ∈ {K̃, K, H}, then φf : K 7→ f (K) is a continuous function from T 0 (X) to T 0 (Y ). 74 CHAPTER 4. TOPOLOGIES ON SPACES OF SETS Proof. Recall that a closed subset of a compact space X is compact1 so that T 0 (X) ⊂ T (X). As we mentioned earlier, T 0 (X) has the relative topologies induced by the topology of T (X), so the last result implies that φf is a continuous function from T 0 (X) to T (Y ). The proof is completed by recalling that a compact subset of a Hausdorff space Y is closed2 , so that T (Y ) ⊂ T 0 (Y ). Since preimages of closed sets are closed, there is a well defined function ψf : D 7→ f −1 (D) from K̃0 (Y ) to K̃0 (X). We need an additional hypothesis to guarantee that it is continuous. Recall that a function is closed if it is continuous and maps closed sets to closed sets. Lemma 4.5.12. If f is a closed map, then ψf : D 7→ f −1 (D) is a continuous function from K̃0 (Y ) to K̃0 (X). Proof. For an open U ⊂ X, we claim that ψf−1 (UU0 ) = UY0 \f (X\U ) . First of all, Y \ f (X \ U) is open because f is a closed map. If D ⊂ Y \ f (X \ U) is closed, then f −1 (D) is a closed subset of U. Thus UY0 \f (X\U ) ⊂ ψf−1 (UU0 ). On the other hand, if D ⊂ Y is closed and f −1 (D) ⊂ U, then D ∩ f (X \ U) = ∅. Thus ψf−1 (UU0 ) ⊂ UY0 \f (X\U ) . 
Of course if f is closed and surjective, then ψf restricts to a continuous map from K0 (Y ) to K0 (X). When X is compact and Y is Hausdorff, any continuous f : X → Y is closed, because any closed subset of X is compact, so its image is compact and consequently closed. Here is an example illustrating how the assumption that f is closed is indispensable. Example 4.5.13. Suppose 0 < ε < π, let X = (−ε, 2π + ε) and Y = { z ∈ C : |z| = 1 }, and let f : X → Y be the function f (t) := eit . The function ψf : D 7→ f −1 (D) is discontinuous at D0 = { eit : ε ≤ t ≤ 2π − ε } because for any open V containing D0 there are closed D ⊂ V such that f −1 (D) includes points far from f −1 (D0 ) = [ε, 2π − ε]. 4.5.6 The Union of the Elements Whenever we have a set of subsets ofSsome space, we can take the union of its elements. For any open U ⊂ X we have K∈UU K = U because for each x ∈ U, {x} is compact. Since the sets UU are a base for the topology of K(X), it follows that the union of all elements of an open subset of K(X) is open. If U and V1 , . . . , Vk are open, then UU ∩ VV1 ∩ · · · ∩ VVk = ∅ if there is some j with U ∩ Vj = ∅, and otherwise {x, y1 , . . . , yk } ∈ UU ∩ VV1 ∩ · · · ∩ VVk whenever x ∈ U and y1 ∈ V1 ∩ U, . . . , yk ∈ Vk ∩ U, so the union of all K ∈ UU ∩ VV1 ∩ · · · ∩ VVk is again U. Therefore the union of all the elements of an open 1 Proof: an open cover of the subset, together with its complement, is an open cover of the space, any finite subcover of which yields a finite subcover of the subset. 2 Proof: fixing a point y in the complement of the compact set K, for each x ∈ K there are disjoint neighborhoods of Ux of x and Vx of y, {Ux } is an open cover of K, and if Ux1 , . . . , Uxn is a finite subcover, then Vx1 ∩ . . . ∩ Vxn is a neighborhood of y that does not intersect K. 4.5. BASIC OPERATIONS ON SUBSETS 75 subset of H(X) is open. If X is either T1 or regular, then similar logic shows that for either T ∈ {K0 , H0 } the union of the elements of an open subset of T (X) is open. If a subset C of H(X) or H0 (X) is compact, then it is automatically compact in the coarser topology of K(X) or K0 (X). Therefore the following two results imply the analogous claims for the H(X) and H0 (X), which are already interesting. S Lemma 4.5.14. If S ⊂ K(X) is compact, then L := K∈S K is compact. Proof. Let {Uα : α ∈ A} be an open cover of L. For each K ∈ S let VK be the union of the elements of some finite subcover. Then K ∈ UVK , so { UVK :SK ∈ S } is an open cover of S; let UVK1 , . . . , UVKr be a finite subcover. Then L ⊂ ri=1 VKi , and the various sets from {Uα } that were united to form the VKi are the desired finite subcover of L. S Lemma 4.5.15. If X is regular and S ⊂ K0 (X) is compact, then D := C∈S C is closed. Proof. We will show that X \ D is open; let x be a point in this set. Each element of S is a closed set that does not contain x, so (since X is regular) it is an element 0 of UX\N for some closed neighborhood N of x. Since S is compact we have S ⊂ 0 0 UX\N1 ∪ . . . ∪ UX\N for some N1 , . . . , Nk . Then N1 ∩ . . . ∩ Nk is a neighborhood k of x that does not intersect any element of S, so x is in the interior of X \ D as desired. Chapter 5 Topologies on Functions and Correspondences In order to study of robustness of fixed points, or sets of fixed points, with respect to perturbations of the function or correspondence, one must specify topologies on the relevant spaces of functions and correspondences. 
We do this by identifying a function or correspondence with its graph, so that the topologies from the last chapter can be invoked. The definitions of upper and lower semicontinuity, and their basic properties, are given in Section 5.1. There are two topologies on the space of upper semicontinuous correspondences from X to Y . The strong upper topology, which is defined and discussed in Section 5.2, turns out to be rather poorly behaved, and the weak upper topology, which is usually at least as coarse, is presented in Section 5.3. When X is compact the strong upper topology coincides with the weak upper topology. We will frequently appeal to a perspective in which a homotopy h : X × [0, 1] → Y is understood as a continuous function t 7→ ht from [0, 1] to the space of continuous functions from X to Y . Section 5.4 presents the underlying principle in full generality for correspondences. The specializations to functions of the strong and weak upper topologies are known as the strong topology and the weak topology respectively. If X is regular, then the weak topology coincides with the compactopen topology, and when X is compact the strong and weak topologies coincide. Section 5.5 discusses these matters, and presents some results for functions that are not consequences of more general results pertaining to correspondences. The strong upper topology plays an important role in the development of the topic, and its definition provides an important characterization of the weak upper topology when the domain is compact, but it does not have any independent significance. Throughout the rest of the book, barring an explicit counterindication, the space of upper semicontinuous correspondences from X to Y will be endowed with the weak upper topology, and the space of continuous functions from X to Y will be endowed with the weak topology. 76 5.1. UPPER AND LOWER SEMICONTINUITY 5.1 77 Upper and Lower Semicontinuity Let X and Y be topological spaces. Recall that a correspondence F : X → Y maps each x ∈ X to a nonempty F (x) ⊂ Y . The graph of F is Gr(F ) = { (x, y) ∈ X × Y : y ∈ F (x) }. If each F (x) is compact (closed, convex, etc.) then F is compact valued (closed valued, convex valued, etc.). We say that F is upper semicontinuous if it is compact valued and, for any x ∈ X and open set V ⊂ Y containing F (x), there is a neighborhood U of x such that F (x′ ) ⊂ V for all x′ ∈ U. When F is compact valued, it is upper semi-continuous if and only if F −1 (UV ) is a open whenever V ⊂ Y is open. Thus: Lemma 5.1.1. A compact valued correspondence F : X → Y is upper semicontinuous if and only if it is continuous when regarded as a function from X to K(Y ). In economics literature the graph being closed in X × Y is sometimes presented as the definition of upper semicontinuity. Useful intuitions and simple arguments flow from this point of view, so we should understand precisely when it is justified. Proposition 5.1.2. If F is upper semicontinuous and Y is a Hausdorff space, then Gr(F ) is closed. Proof. We show that the complement of the graph is open. Suppose (x, y) ∈ / Gr(F ). Since Y is Hausdorff, y and each point z ∈ F (x) have disjoint neighborhoods Vz and Wz . Since F (x) is compact, F (x) ⊂ Wz1 ∪ · · · ∪ Wzk for some z1 , . . . , zk . Then V := Vz1 ∩ · · · ∩ Vzk and W := Wz1 ∪ · · · ∪ Wzk are disjoint neighborhoods of y and F (x) respectively. If U is a neighborhood of x with F (x′ ) ⊂ W for all x′ ∈ U, then U × V is a neighborhood (x, y) that does not intersect Gr(F ). 
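For a concrete illustration of the definition, consider the correspondence F : R → R with F (x) = [0, 1] for x ≤ 0 and F (x) = {0} for x > 0. It is compact valued and upper semicontinuous: away from 0 it is locally constant, and any open V containing F (0) = [0, 1] contains F (x′ ) for every x′ . Its graph is closed, as Proposition 5.1.2 requires, and by Lemma 5.1.1 it is continuous when regarded as a function from R to K(R). By contrast the correspondence G with G(x) = {0} for x ≤ 0 and G(x) = [0, 1] for x > 0 is not upper semicontinuous at 0, and, consistent with Proposition 5.1.2, its graph is not closed: (0, 1) is in the closure of the graph but not in the graph.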
If Y is not compact, then a compact valued correspondence F : X → Y with a closed graph need not be upper semicontinuous. For example, suppose X = Y = R, F (0) = {0}, and F (t) = {1/t} when t 6= 0. Proposition 5.1.3. If Y is compact and Gr(F ) is closed, then F is upper semicontinuous. Proof. Fix x ∈ X. Since (X × Y ) \ Gr(F ) is open, for each y ∈ Y \ V we can choose neighborhoods S Uy of x and Vy of y such that (Uy × Vy ) ∩ Gr(F ) = ∅. In particular, Y \ F (x) = y∈Y \F (x) Vy is open, so F (x) is closed and therefore compact. Thus F is compact valued. Now fix an open neighborhood V of F (x). Since Y \ V is a closed subset of a compact space, hence compact, there are y1 , . . . , yk such that Y \ V ⊂ Vy1 ∪. . . ∪Vyk . Then F (x′ ) ⊂ V for all x′ ∈ Uy1 ∩ . . . ∩ Uyk . Proposition 5.1.4. If F is upper semicontinuous and X is compact, then Gr(F ) is compact. 78CHAPTER 5. TOPOLOGIES ON FUNCTIONS AND CORRESPONDENCES Proof. We have the following implications of earlier results: • Lemma 4.5.7 implies that the function x 7→ {x} ∈ K(X) is continuous; • Lemma 5.1.1 implies that F is continuous, as a function from X to K(Y ); • Proposition 4.5.9 states that (K, L) 7→ K × L is a continuous function from K(X) × K(Y ) to K(X × Y ). Together these imply that F̃ : x 7→ {x} × F (x) is continuous, as a function from X to K(X × Y ). Since X is compact, it follows that S F̃ (X) is a compact subset of K(X × Y ), so Lemma 4.5.14 implies that Gr(F ) = x∈X F̃ (x) is compact. We say that F is lower semicontinuous if, for each x ∈ X, y ∈ F (x), and neighborhood V of y, there is a neighborhood U of x such that F (x′ ) ∩ V 6= ∅ for all x′ ∈ U. If F is both upper and lower semi-continuous, then it is said to be continuous. When F is compact valued, it is lower semicontinuous if and only if F −1 (VV ) is open whenever V ⊂ Y is open. Combining this with Lemma 5.1.1 gives: Lemma 5.1.5. A compact valued correspondence F : X → Y is continuous if and only if it is continuous when regarded as a function from X to H(Y ). 5.2 The Strong Upper Topology Let X and Y be topological spaces with Y Hausdorff, and let U(X, Y ) be the set of upper semicontinuous correspondences from X to Y . Proposition 5.1.2 insures that the graph of each F ∈ U(X, Y ) is closed, so there is an embedding F 7→ Gr(F ) of U(X, Y ) in K0 (X × Y ). The strong upper topology is the topology induced by this embedding when the image has the subspace topology. Let US (X, Y ) be U(X, Y ) endowed with this topology. Since {UV0 : V ⊂ X × Y is open } is a subbase for K0 (X × Y ), there is a subbase of US (X, Y ) consisting of the sets of the form { F : Gr(F ) ⊂ V }. Naturally the following result is quite important. Theorem 5.2.1. If Y is a Hausdorff space and X is a compact subset of Y , then F P : US (X, Y ) → K̃(X) is continuous. Proof. Since Y is Hausdorff, X and ∆ = { (x, x) : x ∈ X } are closed subsets of Y and X × Y respectively. For each F ∈ US (X, Y ), F P(F ) is the projection of Gr(F ) ∩ ∆ onto the first coordinate. Since Gr(F ) is compact (Proposition 5.1.4) so is Gr(F )∩∆, and the projection is continuous, so F P(F ) is compact. The definition of the strong topology implies that Gr(F ) is a continuous function of F . Since ∆ is closed in X × Y , Lemma 4.5.3 implies that Gr(F ) ∩ ∆ is a continuous function of F , after which Lemma 4.5.10 implies that F P(F ) is a continuous function of F . 5.2. 
THE STRONG UPPER TOPOLOGY 79 The basic operations for combining given correspondences to create new correspondences are restriction to a subset of the domain, cartesian products, and composition. We now study the continuity of these constructions. Lemma 5.2.2. If A is a closed subset of X, then the map F 7→ F |A is continuous as a function from US (X, Y ) to US (A, Y ). Proof. Since A × Y is a closed subset of X × Y , continuity as a function from US (X, Y ) to US (A, Y )—that is, continuity of Gr(F ) 7→ Gr(F ) ∩ (A × Y )—follows immediately from Lemma 4.5.4. An additional hypothesis is required to obtain continuity of restriction to a compact subset of the domain, but in this case we obtain a kind of joint continuity. Lemma 5.2.3. If X is regular, then the map (F, K) 7→ Gr(F |K ) is a continuous function from US (X, Y ) × K(X) to K(X × Y ). In particular, for any fixed K the map F 7→ F |K is a continuous function from US (X, Y ) to US (K, Y ). Proof. Fix F ∈ US (X, Y ), K ∈ K(X), and an open neighborhood W of Gr(F |K ). For each x ∈ K Lemma 4.5.8 gives neighborhoods Ux of x and Vx of F (x) with Ux × Vx ⊂ W . Choose x1 , . . . , xk such that U := Ux1 ∪ . . . ∪ Uxk contains K. Since X is regular, each point in K has a closed neighborhood contained in U, and the interiors of finitely many of these cover K, so K has a closed neighborhood C contained in U. Let W ′ := (Ux1 × Vx1 ) ∪ . . . ∪ (Uxk × Vxk ) ∪ ((X \ C) × Y ). Then (K, Gr(F )) ∈ Uint C × UW ′ , and whenever (K ′ , Gr(F ′ )) ∈ Uint C × UW ′ we have Gr(F ′|K ′ ) ⊂ W ′ ∩ (C × Y ) ⊂ (Ux1 × Vx1 ) ∪ . . . ∪ (Uxk × Vxk ) ⊂ W. Let X ′ and Y ′ be two other topological spaces with Y ′ Hausdorff. Since the map (C, D) 7→ C × D is not a continuous operation on closed sets, we should not expect the function (F, F ′ ) 7→ F ×F ′ from US (X, Y )×US (X ′ , Y ′ ) to US (X×X ′ , Y ×Y ′ ) to be continuous, and indeed, after giving the matter a bit of thought, the reader should be able to construct a neighborhood of the graph of the function (x, x′ ) 7→ (0, 0) that shows that the map (F, F ′) 7→ F × F ′ from US (R, R) × US (R, R) to US (R2 , R2 ) is not continuous. We now turn our attention to composition. Suppose that, in addition to X and Y , we have a third topological space Z that is Hausdorff. (We continue to assume that Y is Hausdorff.) We can define a composition operation from (F, G) 7→ G ◦ F from U(X, Y ) × U(Y, Z) to U(X, Z) by letting [ G(F (x)) := F (y). y∈F (x) That is, G(F (x)) is the projection onto Z of Gr(G|F (x)), which is compact by Proposition 5.1.4, so G(F (x)) is compact. Thus G ◦ F is compact valued. To show 80CHAPTER 5. TOPOLOGIES ON FUNCTIONS AND CORRESPONDENCES that G◦F is upper semicontinuous, consider an x ∈ X, and let W be a neighborhood ′ of G(F (x)). For each y ∈ F (x) S there is open neighborhood Vy such that G(y ) ⊂ W ′ for all y ∈ Vy . Setting V := y∈F (x) Vy , we have G(y) ⊂ W for all y ∈ V . If U is a neighborhood of x such that F (x′ ) ⊂ V for all x′ ∈ U, then G(F (x′ )) ⊂ W for all x′ ∈ U. We can also define G ◦ F to be the correspondence whose graph is πX×Z ((Gr(F ) × Z) ∩ (X × Gr(G))) where πX×Z : X × Y × Z → X × Z is the projection. This definition involves set operations that are not continuous, so we should suspect that (F, G) 7→ G ◦ F is not a continuous function from US (X, Y ) × US (Y, Z) to US (X, Z). For a concrete example let X = Y = Z = R, and let f and g be the constant function with value zero. 
If U and V are neighborhoods of the graph of f and g, there are δ, ε > 0 such that (−δ, δ) × (−ε, ε) ⊂ V , and consequently the set of g ′ ◦ f ′ with Gr(f ′ ) ⊂ U and Gr(g ′ ) ⊂ V contains the set of all constant functions with values in (−ε, ε), but of course there are neighborhoods of the graph of g ◦ f that do not contain this set of functions for any ε. 5.3 The Weak Upper Topology As in the last section, X and Y are topological spaces with Y Hausdorff. There is another topology on U(X, Y ) that is in certain ways more natural and better behaved than the strong upper topology. Recall that if {Bi }i∈I is a collection of topological spaces and { fi : A → Bi }i∈I is a collection of functions, the quotient topology on A induced by this data is the coarsest topology such that each fi is continuous. The weak upper topology on U(X, Y ) is the quotient topology induced by the functions F 7→ F |K ∈ US (K, Y ) for compact K ⊂ X. Since a function is continuous if and only if the preimage of every subbasic subset of the range is open, a subbase for the weak upper topology is given by the sets of the form { F : Gr(F |K ) ⊂ V } where K ⊂ X is compact and V is a (relatively) open subset of K × Y . Let UW (X, Y ) be U(X, Y ) endowed with the weak upper topology. As in the last section, we study the continuity of basic constructions. Lemma 5.3.1. If A is a closed subset of X, then the map F 7→ F |A is continuous as a function from UW (X, Y ) to UW (A, Y ). Proof. If A has the quotient topology induced by { fi : A → Bi }i∈I , then a function g : Z → A is continuous if each composition fi ◦ g is continuous. (The sets of the form fi−1 (Vi ), where Vi ⊂ Bi is open, constitute a subbase of the quotient topology, so this follows from Lemma 4.5.1.) To show that the composition F 7→ F |A 7→ F |K is continuous whenever K is a compact subset of A we simply observe that K is compact as a subset of X, so this follows directly from the definition of the topology of UW (X, Y ). Lemma 5.3.2. If every compact set in X is closed (e.g., because X is Hausdorff ) then the topology of UW (X, Y ) is at least as coarse as the topology of US (X, Y ). If, in addition, X is itself compact, then the two topologies coincide. 5.3. THE WEAK UPPER TOPOLOGY 81 Proof. We need to show that the identity map from US (X, Y ) to UW (X, Y ) is continuous, which is to say that for any given compact K ⊂ X, the map Gr(F ) → Gr(F |K ) = Gr(F ) ∩ (K × Y ) is continuous. This follows from Lemma 5.3.1 because K × Y is closed in X × Y whenever K is compact. If X is compact, the continuity of the identity map from UW (X, Y ) to US (X, Y ) follows directly from the definition of the weak upper topology. There is a useful variant of Lemma 5.2.3. Lemma 5.3.3. If X is normal, Hausdorff, and locally compact, then the function (K, F ) 7→ Gr(F |K ) is a continuous function from K(X) × UW (X, Y ) to K(X × Y ). Proof. We will demonstrate continuity at a given point (K, F ) in the domain. Local compactness implies that there is a compact neighborhood C of K. The map F ′ 7→ F ′ |C from U(X, Y ) to US (C, Y ) is a continuous function by virtue of the definition of the topology of U(X, Y ). Therefore Lemma 5.2.3 implies that the composition (K ′ , F ′ ) → (K ′ , F ′ |C ) → Gr(F |K ′ ) is continuous, and of course it agrees with the function in question on a neighborhood of (K, F ). In contrast with the strong upper topology, for the weak upper topology cartesian products and composition are well behaved. Let X ′ and Y ′ be two other spaces with Y ′ Hausdorff. 
Lemma 5.3.4. If X and X ′ are Hausdorff, then the function (F, F ′ ) 7→ F × F ′ from UW (X, Y ) × UW (X ′ , Y ′ ) to UW (X × X ′ , Y × Y ′ ) is continuous. Proof. First suppose that X and X ′ are compact. Then, by Proposition 5.1.4, the graphs of upper semicontinuous functions with these domains are compact, and continuity of the function (F, F ′) 7→ F × F ′ from US (X, Y ) × US (X ′ , Y ′ ) to US (X × X ′ , Y × Y ′ ) follows from Proposition 4.5.9. Because UW (X × X ′ , Y × Y ′ ) has the quotient topology, to establish the general case we need to show that (F, F ′ ) 7→ F × F ′ |C is a continuous function from UW (X, Y ) × UW (X ′ , Y ′ ) to US (C, Y × Y ′ ) whenever C ⊂ X × X ′ is compact. Let K and K ′ be the projections of C onto X and X ′ respectively; of course these sets are compact. The map in question is the composition (F, F ′ ) → (F |K , F ′ |K ′ ) → F |K × F ′ |K ′ → (F |K × F ′ |K ′ )|C . The continuity of the second map has already been established, and the continuity of the first and third maps follows from Lemma 5.3.1, because compact subsets of Hausdorff spaces are closed and products of Hausdorff spaces are Hausdorff1 . Suppose that, in addition to X and Y , we have a third topological space Z that is Hausdorff. Lemma 5.3.5. If K ⊂ X is compact, Y is normal and locally compact, and X × Y × Z is normal, then (F, G) 7→ Gr(G ◦ F |K ) is a continuous function from UW (X, Y ) × UW (Y, Z) to K(X × Z). 1 I do not know if the compact subsets of X × X ′ are closed when X and X ′ are compact spaces whose compact subsets are closed. 82CHAPTER 5. TOPOLOGIES ON FUNCTIONS AND CORRESPONDENCES Proof. The map F 7→ Gr(F |K ) is a continuous function from UW (X, Y ) to K(X ×Y ) by virtue of the definition of the weak upper topology, and the natural projection of X × Y onto Y is continuous, so Lemma 4.5.10 implies that im(F |K ) is a continuous function of (K, F ). Since Y is normal and locally compact, Lemma 5.3.3 implies that (F, G) 7→ Gr(G|im(F |K ) ) is a continuous function from UW (X, Y ) × UW (Y, Z) to K(X × Z), and again (F, G) 7→ im(G|im(F |K ) ) is also continuous. The continuity of cartesian products of compact sets (Proposition 4.5.9) now implies that Gr(F |K ) × im(G|im(F |K ) ) and K × Gr(G|im(F |K ) ) are continuous functions of (K, F, G). Since X is T1 while Y and Z are Hausdorff, X × Y × Z is T1 , so Lemma 4.5.6 implies that the intersection { (x, y, z) : x ∈ K, y ∈ F (x), and z ∈ G(y) } of these two sets is a continuous function of (K, F, G), and Gr(G ◦ F |K ) is the projection of this set onto X × Z, so the claim follows from another application of Lemma 4.5.10. As we explained in the proof of Lemma 5.3.1, the continuity of (F, G) 7→ G◦F |K for each compact K ⊂ X implies that (F, G) 7→ G ◦ F is continuous when the range has the weak upper topology, so: Proposition 5.3.6. If X is T1 , Y is normal and locally compact, and X × Y × Z is normal, then (F, G) 7→ G ◦ F is a continuous function from UW (X, Y ) × UW (Y, Z) to UW (X, Z). 5.4 The Homotopy Principle Let X, Y , and Z be topological spaces with Z Hausdorff, and fix a compact valued correspondence F : X × Y → Z. For each x ∈ X let Fx : Y → Z be the derived correspondence y 7→ F (x, y). Motivated by homotopies, we study the relationship between the following two conditions: (a) x 7→ Fx is a continuous function from X to US (Y, Z); (b) F is upper semi-continuous. If F : X × Y → Z is upper semicontinuous, then x 7→ Fx will not necessarily be continuous without some additional hypothesis. 
For example, let X = Y = Z = R, and suppose that F (0, y) = {0} for all y ∈ Y . Without F being in any sense poorly behaved, it can easily happen that for x arbitrarily close to 0 the graph of Fx is not contained in { (y, z) : |z| < (1 + y 2 )−1 }. Lemma 5.4.1. If Y is compact and F is upper semicontinuous, then x 7→ Fx is a continuous function from X to US (Y, Z). 5.5. CONTINUOUS FUNCTIONS 83 Proof. For x ∈ X let F̃x : Y → Y × Z be the correspondence F̃x (y) := {y} × Fx (y). Clearly F̃x is compact valued and continuous as a function from Y to K(Y × Z). Since Y isScompact, the image of F̃x is compact, so Lemma 4.5.14 implies that Gr(Fx ) = y∈Y F̃x (y) is compact, and Lemma 4.5.15 implies that it is closed. Since Z is a Hausdorff space, Proposition 5.1.2 implies that Gr(F ) is closed. Now Proposition 5.1.3 implies that x 7→ Gr(Fx ) is upper semicontinuous, which is the same (by Lemma 5.1.1) as it being a continuous function from X to K(Y × Z). But since Gr(Fx ) is closed for all x, this is the same as it being a continuous function from X to K0 (Y × Z), and in view of the definition of the topology of US (Y, Z), this is the same as x 7→ Fx being continuous. Lemma 5.4.2. If Y is regular and x 7→ Fx is a continuous function from X to US (Y, Z), then F is upper semicontinuous. Proof. Fix (x, y) ∈ X × Y and a neighborhood W ⊂ Z of F (x, y). Since Fx is upper semicontinuous, there is neighborhood V of y such that F (x, y ′) ⊂ W for all y ′ ∈ V . Applying the regularity of Y , let Ṽ be a closed neighborhood of y contained in V . Since x 7→ Fx is continuous, there is a neighborhood U ⊂ X of x such that Gr(Fx′ ) ⊂ (V × W ) ∪ ((Y \ Ṽ ) × Z) for all x′ ∈ U. Then F (x′ , y ′) ⊂ W for all (x′ , y ′ ) ∈ U × Ṽ . For the sake of easier reference we combine the last two results. Theorem 5.4.3. If Y is regular and compact, then F is upper semicontinuous if and only if x 7→ Fx is a continuous function from X to US (Y, Z). 5.5 Continuous Functions If X and Y are topological spaces with Y Hausdorff, CS (X, Y ) and CW (X, Y ) will denote the space of continuous functions with the topologies induced by the inclusions of C(X, Y ) in US (X, Y ) and UW (X, Y ). In connection with continuous functions, these topologies are know as the strong topology and weak topology respectively. Most of the properties of interest are automatic corollaries of our earlier work; this section contains a few odds and ends that are specific to functions. If K ⊂ X is compact and V ⊂ Y is open, let CK,V be the set of continuous functions f such that f (K) ⊂ V . The compact-open topology is the topology generated by the subbasis { CK,V : K ⊂ X is compact, V ⊂ Y is open }, and CCO (X, Y ) will denote the space of continuous functions from X to Y endowed with this topology. The set of correspondences F : X → Y with Gr(F |K ) ⊂ K × V is open in UW (X, Y ), so the compact-open topology is always at least as coarse as the topology inherited from UW (X, Y ). Proposition 5.5.1. Suppose X is regular. Then the compact-open topology coincides with the weak topology. 84CHAPTER 5. TOPOLOGIES ON FUNCTIONS AND CORRESPONDENCES Proof. What this means concretely is that whenever we are given a compact K ⊂ X, an open set W ⊂ K × Y , and a continuous f : X → Y with Gr(f |K ) ⊂ W , we can find a compact-open neighborhood of f whose elements f ′ satisfy Gr(f ′ |K ) ⊂ W . For each x ∈ K the definition of the product topology gives open sets Ux ⊂ K and Vx ⊂ Y such that (x, f (x)) ∈ Ux × Vx ⊂ W . 
Since f is continuous, by replacing Ux with a smaller open neighborhood if necessary, we may assume that f (Ux ) ⊂ Vx . Since X is regular, x has a closed neighborhood Cx ⊂ Ux , and Cx is compact because it is a closed subset of a compact set. Then f ∈ CCx ,Vx for each x. We can find x1 , . . . , xn such that K = Cx1 ∪ . . . ∪ Cxn , and clearly Gr(f ′ |K ) ⊂ W whenever f ′ ∈ CCx1 ,Vx1 ∩ . . . ∩ CCxn ,Vxn . For functions there is a special result concerning continuity of composition. Proposition 5.5.2. If X is compact and f : X → Y is continuous, then g 7→ g ◦ f is a continuous function from CCO (Y, Z) → CCO (X, Z). Proof. In view of the subbasis for the strong topology, it suffices to show, for a given continuous g : Y → Z and an open V ⊂ X × Z containing the graph of g ◦ f , that N = { (y, z) ∈ Y × Z : f −1 (y) × {z} ⊂ V } is a neighborhood of the graph of g. If not, then some point (y, g(y)) is an accumulation point of points of the form (f (x′ ), z) where (x′ , z) ∈ / V . Since X is compact, it cannot be the case that for each x ∈ X there are neighborhoods A of x and B of (y, g(y)) such that { (x′ , z) ∈ (A × Z) \ V : (f (x′ ), z) ∈ B } = ∅. Therefore there is some x ∈ X such that for any neighborhoods A of x and B of (y, g(y)) there is some x′ ∈ A and z such that (x′ , z) ∈ / V and (f (x′ ), z) ∈ B. Evidently f (x) = y. To obtain a contradiction choose neighborhoods A of x and W of g(y) such that A × W ⊂ V , and set B = Y × W . The following simple result, which does not depend on any additional assumptions on the spaces, is sometimes just what we need. Proposition 5.5.3. If g : Y → Z is continuous, then f 7→ g ◦ f is a continuous function from CS (X, Y ) to CS (X, Z). Proof. If U ⊂ X × Z is open, then so is (IdX × g)−1(U). Chapter 6 Metric Space Theory In this chapter we develop some advanced results concerning metric spaces. An important tool, partitions of unity, exist for locally finite open covers of a normal space: this is shown in Section 6.2. But sometimes we will be given a local cover that is not necessarily locally finite, so we need to know that any open cover has a locally finite refinement. A space is paracompact if this is the case. Paracompactess is studied in Section 6.1; the fact that metric spaces are paracompact will be quite important. Section 6.3 describes most of the rather small amount we will need to know about topological vector spaces. Of these, the most important for us are the locally convex spaces, which have many desirable properties. One of the larger themes of this study is that the concepts and results of fixed point theory extend naturally to this level of generality, but not further. Two important types of topological vector spaces, Banach spaces and Hilbert spaces, are introduced in Section 6.4. Results showing that metric spaces can be embedded in such linear spaces are given in Section 6.5. Section 6.6 presents an infinite dimensional generalization of the Tietze extension theorem due to Dugundji. 6.1 Paracompactness Fix a topological space X. A family {Sα }α∈A of subsets of X is locally finite if every x ∈ X has a neighborhood W such that there are only finitely many α with W ∩ Sα 6= ∅. If {Uα }α∈A is a cover of X, a second cover {Vβ }β∈B is a refinement of {Uα }α∈A if each Vβ is a subset of some Uα . The space X is paracompact if every open cover is refined by an open cover that is locally finite. This section is devoted to the proof of: Theorem 6.1.1. A metric space is paracompact. This result is due to Stone (1948). 
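Before turning to the proof, it may help to see these definitions at work in the simplest setting. The open cover { (−n, n) : n = 1, 2, . . . } of R is not locally finite, since every member contains 0, so every neighborhood of 0 meets infinitely many members. It is refined by the open cover { (k − 1, k + 1) : k ∈ Z }, which is locally finite: each (k − 1, k + 1) is contained in (−n, n) whenever n ≥ |k| + 1, and for any x the interval (x − 1/2, x + 1/2) meets (k − 1, k + 1) only if |x − k| < 3/2, which holds for at most three integers k.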
At first the proofs of this result were rather complex, but eventually Rudin (1969) found a brief and simple argument. A well ordering of a set Z is a complete ordering ≤ such that any nonempty A ⊂ Z has a least element. That any set Z has a well ordering is the assertion of the well ordering theorem, which is a simple consequence of Zorn's lemma. Let O be the set of all pairs (Z′, ≤′) where Z′ ⊂ Z and ≤′ is a well ordering of Z′. We order O by specifying that (Z′, ≤′) ⪯ (Z″, ≤″) if Z′ ⊂ Z″, ≤′ is the restriction of ≤″ to Z′, and z′ ≤″ z″ for all z′ ∈ Z′ and z″ ∈ Z″ \ Z′. Any chain in O has an upper bound in O (just take the union of all the sets and all the orderings) so Zorn's lemma implies that O has a maximal element (Z∗, ≤∗). If there were a z ∈ Z \ Z∗ we could extend ≤∗ to a well ordering of Z∗ ∪ {z} by specifying that every element of Z∗ is less than z. This would contradict maximality, so we must have Z∗ = Z. (The axiom of choice, Zorn's lemma, and the well ordering theorem are actually equivalent; cf. Kelley (1955).)

Proof of Theorem 6.1.1. Let {Uα}α∈A be an open cover of X where A is a well ordered set. We define sets Vαn for α ∈ A and n = 1, 2, . . ., inductively (over n) as follows: let Vαn be the union of the balls U_{2^{-n}}(x) for those x such that:

(a) α is the least element of A such that x ∈ Uα;

(b) x ∉ ⋃_{j<n, β∈A} Vβj;

(c) U_{3·2^{-n}}(x) ⊂ Uα.

For each x there is a least α such that x ∈ Uα and an n large enough that (c) holds, so x ∈ Vαn unless x ∈ Vβj for some β and j < n. Thus {Vαn} is a cover of X, and of course each Vαn is open and contained in Uα, so it is a refinement of {Uα}. To prove that the cover is locally finite we fix x, let α be the least element of A such that x ∈ Vαn for some n, and choose j such that U_{2^{-j}}(x) ⊂ Vαn. We claim that U_{2^{-n-j}}(x) intersects only finitely many Vβi. If i > j and y satisfies (a)-(c) with β and i in place of α and n, then U_{2^{-n-j}}(x) ∩ U_{2^{-i}}(y) = ∅ because U_{2^{-j}}(x) ⊂ Vαn, y ∉ Vαn, and n + j, i ≥ j + 1. Therefore U_{2^{-n-j}}(x) ∩ Vβi = ∅. For i ≤ j we will show that there is at most one β such that U_{2^{-n-j}}(x) intersects Vβi. Suppose that y and z are points satisfying (a)-(c) for β and γ, with i in place of n. Without loss of generality β precedes γ. Then U_{3·2^{-i}}(y) ⊂ Uβ, z ∉ Uβ, and n + j > i, so U_{2^{-n-j}}(x) cannot intersect both U_{2^{-i}}(y) and U_{2^{-i}}(z). Since this is the case for all y and z, U_{2^{-n-j}}(x) cannot intersect both Vβi and Vγi.

6.2 Partitions of Unity

We continue to work with a fixed topological space X. This section's central concept is:

Definition 6.2.1. Let {Uα}α∈A be a locally finite open cover of X. A partition of unity subordinate to {Uα} is a collection of continuous functions {ψα : X → [0, 1]} such that ψα(x) = 0 whenever x ∉ Uα and ∑_{α∈A} ψα(x) = 1 for each x.

The most common use of a partition of unity is to construct a global function or correspondence with particular properties. Typically locally defined functions or correspondences are given or can be shown to exist, and the global object is constructed by taking a "convex combination" of the local objects, with weights that vary continuously. Of course to apply this method one must have results guaranteeing that suitable partitions of unity exist. Our goal in this section is:

Theorem 6.2.2. For any locally finite open cover {Uα}α∈A of a normal space X there is a partition of unity subordinate to {Uα}.
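When X is a metric space the functions demanded by Theorem 6.2.2 can be written down directly, because the distance from x to the complement of Uα is a continuous function of x that is positive on Uα and zero off it, so Urysohn's lemma (below) is not needed. The following Python sketch carries this out for X = [0, 1] and a particular two element open cover; the cover and the helper names are merely illustrative choices, and the final normalization is the same one used in the proof of the theorem given below.

def dist_to_complement(x, a, b):
    # Distance from x to [0, 1] \ (a, b); the cover element is U = [0, 1] ∩ (a, b).
    return max(0.0, min(x - a, b - x))

cover = [(-1.0, 0.6), (0.4, 2.0)]   # U_1 = [0, 0.6) and U_2 = (0.4, 1], which together cover [0, 1]

def partition_of_unity(x):
    phis = [dist_to_complement(x, a, b) for a, b in cover]
    total = sum(phis)               # positive, because every x lies in some cover element
    return [phi / total for phi in phis]

for x in (0.0, 0.5, 1.0):
    print(x, partition_of_unity(x)) # each list is nonnegative and sums to 1

Each ψα defined this way is continuous, vanishes outside Uα, and the values sum to 1, which is exactly what Definition 6.2.1 requires.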
A basic tool used in the constructive proof of this result, and many others, is: Lemma 6.2.3 (Urysohn’s Lemma). If X is a normal space and C ⊂ U ⊂ X with C closed and U open, then there is a continuous function ϕ : X → [0, 1] with ϕ(x) = 0 for all x ∈ C and ϕ(x) = 1 for all x ∈ X \ U. Proof. Since X is normal, whenever C ′ ⊂ U ′ , with C ′ closed and U ′ open, there exist a closed C ′′ and an open U ′′ such that C ′ ⊂ U ′′ , X \ U ′ ⊂ X \ C ′′ , and U ′′ ∩ (X \ C ′′ ) = ∅, which is to say that C ′ ⊂ U ′′ ⊂ C ′′ ⊂ U ′ . Let C0 := C and U1 := U. Choose an open U1/2 and a closed C1/2 with C0 ⊂ U1/2 ⊂ C1/2 ⊂ U1 . Choose an open U1/4 and a closed C1/4 with C0 ⊂ U1/4 ⊂ C1/4 ⊂ U1/2 , and choose an open U3/4 and a closed C3/4 with C1/2 ⊂ U3/4 ⊂ C3/4 ⊂ U1 . Continuing in this fashion, we obtain a system of open sets Ur and a system of closed sets Cr for rationals r ∈ [0, 1] of the form k/2m (except that C1 and U0 are undefined) with Ur ⊂ Cr ⊂ Us ⊂ Cs whenever r < s. For x ∈ X let ( S inf{ r : x ∈ Cr }, x ∈ r Cr ϕ(x) := 1, otherwise. Clearly ϕ(x) = 0 for all x ∈ C and ϕ(x) = 1 for all x ∈ X \ U. Any open subset of [0, 1] is a union of finite intersections of sets of the form [0, a) and (b, 1], where 0 < a, b < 1, and [ [ ϕ−1 [0, a) = Ur and ϕ−1 (b, 1] = (X \ Cr ) r<a r>b are open, so ϕ is continuous. Below we will apply Urysohn’s lemma to a closed subset of each element of a locally finite open cover. We will need X to be covered by these closed sets, as per the next result. Proposition 6.2.4. If X is a normal space and {Uα }α∈A is a locally finite cover of X, then there is an open cover {Vα }α∈A such that for each α, the closure of Vα is contained in Uα . Proof. A partial thinning of {Uα }α∈A is a function F from a subset B of A to the open sets of X such that: (a) for each β ∈ B, the closure of F (β) is contained in Uβ ; S S (b) β∈B F (β) ∪ α∈A\B Uα = X. Our goal is to find such an F with B = A. The partial thinnings can be partially ordered as follows: F < G if the domain of F is a proper subset of the domain of G and F and G agree on this set. We will show that this ordering has maximal elements, and that the domain of a maximal element is all of A. 88 CHAPTER 6. METRIC SPACE THEORY Let {Fι }ι∈I be a chain of partial thinnings. That is, for all distinctS ι, ι′ ∈ I, either Fι < Fι′ or Fι′ < Fι . Let the domain of each Fι be Bι , let B := ι Bι , and for β ∈ B let F (β) be the common value of Fι (β) for those ι with β ∈ Bι . For each x ∈ X there is some ι with Fι (β) = F (β) for all β ∈ B such that x ∈ Uβ because there are only finitely many α with x ∈ Uα . Therefore F satisfies (b). We have shown that any chain of partial thinnings has an upper bound, so Zorn’s lemma implies that the set of all partial thinnings has a maximal element. If F is a partial thinning with domain B and α′ ∈ A \ B, then [ [ Uα F (β) ∪ X\ β∈B α∈A\B,α6=α′ is a closed subset of Uα , so it has an open superset Vα′ whose closure is contained in Uα . We can define a partial thinning G with domain B∪{α′ } by setting G(α′ ) := Vα′ and G(β) := F (β) for β ∈ B. Therefore F cannot be maximal unless its domain is all of A. Proof of Theorem 6.2.2. The result above gives a closed cover {Cα }α∈A of X with Cα ⊂ Uα for each α. For each α let ϕα : X → [0, 1] be continuous with ϕα (x) = 0 P for all x ∈ X \ Uα and ϕα (x) = 1 for all x ∈ Cα . Then α ϕα is well defined and continuous everywhere since {Uα } is locally finite, and it is positive everywhere since {Cα } covers X. For each α ∈ A set ϕα . 
6.3 Topological Vector Spaces

Since we wish to develop fixed point theory in as much generality as is reasonably possible, infinite dimensional vector spaces will inevitably appear at some point. In addition, these spaces will frequently be employed as tools of analysis. The result in the next section refers to such spaces, so this is a good point at which to cover the basic definitions and elementary results.

A topological vector space V is a vector space over the real numbers (other fields of scalars, in particular the complex numbers, play an important role in functional analysis, but have no applications in this book) that is endowed with a topology that makes addition and scalar multiplication continuous, and makes {0} a closed set. Topological vector spaces, and maps between them, are the objects studied in functional analysis. Over the last few decades functional analysis has grown into a huge body of mathematics; it is fortunate that our work here does not require much more than the most basic definitions and facts.

We now lay out elementary properties of V. For any given w ∈ V the maps v ↦ v + w and v ↦ v − w are continuous, hence mutually inverse homeomorphisms. That is, the topology of V is translation invariant. In particular, the topology of V is completely determined by a neighborhood base of the origin, which simplifies many proofs. The following facts are basic.

Lemma 6.3.1. If C ⊂ V is convex, then so is its closure.

Proof. Aiming at a contradiction, suppose that v = (1 − t)v_0 + tv_1 is not in the closure of C even though v_0 and v_1 are in the closure of C and 0 < t < 1. Let U be a neighborhood of v that does not intersect C. The continuity of addition and scalar multiplication implies that there are neighborhoods U_0 and U_1 of v_0 and v_1 such that (1 − t)v_0′ + tv_1′ ∈ U for all v_0′ ∈ U_0 and v_1′ ∈ U_1. Since U_0 and U_1 contain points of C, this contradicts the convexity of C.

Lemma 6.3.2. If A is a neighborhood of the origin, then there is a closed neighborhood of the origin U such that U + U ⊂ A.

Proof. Continuity of addition implies that there are neighborhoods of the origin B_1, B_2 with B_1 + B_2 ⊂ A, and replacing these with their intersection gives a neighborhood B of the origin such that B + B ⊂ A. If w is in the closure of B, then w − B is a neighborhood of w, so it intersects B; that is, there is a b ∈ B with w − b ∈ B, so w ∈ B + B. Thus the closure of B is contained in B + B ⊂ A. Applying this argument with B in place of A gives a closed neighborhood U of the origin with U ⊂ B, and then U + U ⊂ B + B ⊂ A.

We can now establish the separation properties of V.

Lemma 6.3.3. V is a regular T1 space, and consequently a Hausdorff space.

Proof. Since {0} is closed, translation invariance implies that V is T1. Translation invariance also implies that to prove regularity it suffices to show that any neighborhood of the origin, say A, contains a closed neighborhood, and this is part of what the last result asserts. As has been pointed out earlier, a simple and obvious argument shows that a regular T1 space is Hausdorff.

We can say slightly more in this direction:

Lemma 6.3.4. If K ⊂ V is compact and U is a neighborhood of K, then there is a closed neighborhood W of the origin such that K + W ⊂ U.

Proof. For each v ∈ K Lemma 6.3.2 gives a closed neighborhood W_v of the origin (which may be taken to be convex if V is locally convex) such that v + W_v + W_v ⊂ U. Then there are v_1, ..., v_n such that v_1 + W_{v_1}, ..., v_n + W_{v_n} is a cover of K. Let W := W_{v_1} ∩ ... ∩ W_{v_n}. For any v ∈ K there is an i such that v ∈ v_i + W_{v_i}, so that v + W ⊂ v_i + W_{v_i} + W_{v_i} ⊂ U.
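In a normed space the conclusion of Lemma 6.3.4 can be obtained by a direct compactness computation: provided U is not all of V, one may take W to be the closed ball whose radius is half the distance from K to the complement of U. The sketch below is an informal numerical check of this special case (the sets K and U and all names are hypothetical choices for the illustration), not a substitute for the topological argument above.

```python
import numpy as np

rng = np.random.default_rng(0)

# K: the unit circle in R^2 (compact); U: the open annulus 0.5 < |x| < 1.5, a neighborhood of K.
K = np.array([[np.cos(t), np.sin(t)] for t in np.linspace(0, 2 * np.pi, 400)])

def in_U(x):
    r = np.linalg.norm(x)
    return 0.5 < r < 1.5

def dist_to_complement_of_U(x):
    """For this particular U the distance to R^2 \\ U has a closed form."""
    r = np.linalg.norm(x)
    return min(r - 0.5, 1.5 - r)

# delta > 0 by compactness of K; W is the closed ball of radius delta / 2.
delta = min(dist_to_complement_of_U(x) for x in K)
W_radius = delta / 2

# Spot check: sampled points of K + W stay inside U.
for _ in range(2000):
    x = K[rng.integers(len(K))]
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)
    w = W_radius * rng.random() * direction        # a point of the ball W
    assert in_U(x + w)
print("delta =", round(delta, 3), "; K + W contained in U confirmed on the sample")
```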
A topological vector space is locally convex if every neighborhood of the origin contains a convex neighborhood. In several ways the theory of fixed points developed in this book depends on local convexity, so for the most part locally convex topological vector spaces represent the outer limits of generality considered here. 90 CHAPTER 6. METRIC SPACE THEORY Lemma 6.3.5. If V is locally convex and A is a neighborhood of the origin, then there is closed convex neighborhood of the origin W such that W + W ⊂ A. Proof. Lemma 6.3.2 gives a closed neighborhood U of the origin such that U + U ⊂ A, the definition of local convexity gives a convex neighborhood of the origin W that is contained in U. If we replace W with its closure, it will still be convex due to Lemma 6.3.1. 6.4 Banach and Hilbert Spaces We now describe two important types of locally convex spaces. A norm on V is a function k · k : V → R≥ such that: (a) kvk = 0 if and only if v = 0; (b) kαvk = |α| · kvk for all α ∈ R and v ∈ V ; (c) kv + wk ≤ kvk + kwk for all v, w ∈ V . Condition (c) implies that the function (v, w) 7→ kv − wk is a metric on V , and we endow V with the associated topology. Condition (a) implies that {0} is closed because every other point has a neighborhood that does contain the origin. Conditions (b) and (c) give the calculations kα′ v ′ − αvk ≤ kα′ v ′ − α′ vk + kα′ v − αvk = |α′| · kv ′ − vk + |α′ − α| · kvk and k(v ′ + w ′ ) − (v + w)k ≤ kv ′ − vk + kw ′ − wk, which are easily seen to imply that scalar multiplication and addition are continuous. A vector space endowed with a norm and the associated metric and topology is called a normed space. For a normed space the calculation k(1 − α)v + αwk ≤ k(1 − α)vk + kαwk = (1 − α)kvk + αkwk ≤ max{kvk, kwk} shows that for any ε > 0, the open ball of radius ε centered at the origin is convex. The open ball of radius ε centered at any other point is the translation of this ball, so a normed space is locally convex. A sequence {vm } in a topological vector space V is a Cauchy sequence if, for each neighborhood A of the origin, there is an integer N such that vm − vn ∈ A for all m, n ≥ N. The space V is complete if its Cauchy sequences are all convergent. A Banach space is a complete normed space. For the most part there is little reason to consider topological vector spaces that are not complete except insofar as they occur as subspaces of complete spaces. The reason for this is that any topological vector space V can be embedded in a complete space V whose elements are equivalence classes of Cauchy sequences, where two Cauchy sequence {vm } and {wn } are equivalent if, for each neighborhood A of the origin, there is an integer N such that vm − wn ∈ A for all m, n ≥ N. (This 6.4. BANACH AND HILBERT SPACES 91 relation is clearly reflexive and symmetric. To see that it is transitive, suppose {uℓ } is equivalent to {vm } which is in turn equivalent to {wn }. For any neighborhood A of the origin the continuity of addition implies that there are neighborhoods B, C of the origin such that B + C ⊂ A. There is N such that uℓ −vm ∈ B and vm −wn ∈ C for all ℓ, m, n ≥ N, whence uℓ − wn ∈ A.) Denote the equivalence class of {vm } by [vm ]. The vector operations have the obvious definitions: [vm ] + [wn ] := [vm + wm ] and α[vm ] := [αvm ]. The open sets of V are the sets of the form { [vm ] : vm ∈ A for all large m } where A ⊂ V is open. (It is easy to see that the condition “vm ∈ A for all large m” does not depend on the choice of representative {vm } of [vm ].) 
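As an informal aside, the equivalence-class construction can be made concrete in a toy setting. The sketch below uses the rational numbers with the usual absolute value, which is a metric rather than vector space instance of the same idea, and represents points of the completion by Cauchy sequences with exact rational terms; the operations mirror the termwise definitions [v_m] + [w_m] := [v_m + w_m] and α[v_m] := [αv_m] given above. All names are hypothetical, and the "equivalence" test is only a finite-stage approximation.

```python
from fractions import Fraction
import math

def newton_sqrt2(n):
    """Newton iterates for sqrt(2): a Cauchy sequence in Q with no limit in Q."""
    x = Fraction(2)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def decimal_sqrt2(n):
    """Rounded decimal expansions of sqrt(2): an equivalent Cauchy sequence."""
    return Fraction(round(math.sqrt(2) * 10 ** n), 10 ** n)

def const(q):
    return lambda n: Fraction(q)

def add(s, t):                 # [v_m] + [w_m] := [v_m + w_m]
    return lambda n: s(n) + t(n)

def scale(a, s):               # a [v_m] := [a v_m]
    return lambda n: Fraction(a) * s(n)

def equivalent(s, t, eps=Fraction(1, 10 ** 6), stage=8):
    """A crude stand-in for equivalence: compare the sequences at a late stage."""
    return abs(s(stage) - t(stage)) < eps

x = newton_sqrt2                      # represents sqrt(2) in the completion
y = add(scale(2, x), const(1))        # represents 2*sqrt(2) + 1
print(float(x(6)), float(y(6)))       # approximately 1.414213562 and 3.828427125
print(equivalent(x, decimal_sqrt2))   # True: two representatives of the same point
```

Nothing later depends on this sketch; it is only meant to make the construction above feel concrete.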
A complete justification of this definition would require verifications of the vector space axioms, the axioms for a topological space, the continuity of addition and scalar multiplication, and that {0} is a closed set. Instead of elaborating, we simply assert that the reader who treats this as an exercise will find it entirely straightforward. A similar construction can be used to embed any metric space in a “completion” in which all Cauchy sequences (in the metric sense) are convergent. As in the finite dimensional case, the best behaved normed spaces have inner products. An inner product on a vector space V is a function h·, ·i : V × V → R that is symmetric, bilinear, and positive definite: (i) hv, wi = hw, vi for all v, w ∈ V ; (ii) hαv + v ′ , wi = αhv, wi + hv ′ , wi for all v, v ′ , w ∈ V and α ∈ R; (iii) hv, vi ≥ 0 for all v ∈ V , with equality if and only if v = 0. We would like to define a norm by setting kvk := hv, vi1/2. This evidently satisfies (a) and (b) of the definition of a norm. The verification of (c) begins with the computation 0 ≤ hv, viw − hv, wiv, hv, viw − hv, wiv = hv, vi hv, vihw, wi − hv, wi2 , which implies the Cauchy-Schwartz inequality: hv, wi ≤ kvk · kwk for all v, w ∈ V . This holds with equality if v = 0 or hv, viw − hv, wiv, which is the case if and only if w is a scalar multiple of v, and otherwise the inequality is strict. The Cauchy-Schwartz inequality implies the inequality in the calculation kv + wk2 = hv + w, v + wi = kvk2 + 2hv, wi + kwk2 ≤ (kvk + kwk)2 , which implies (c) and completes the verification and k · k is a norm. A vector space endowed with an inner product and the associated norm and topology is called an inner product space. A Hilbert space is a complete inner product space. Up to linear isometry there is only one separable2 Hilbert space. Let H := { s = (s1 , s2 , . . .) ∈ R∞ : s21 + s22 + · · · < ∞ } 2 Recall that a metric space is separable if it contains a countable set of points whose closure is the entire space. 92 CHAPTER 6. METRIC SPACE THEORY P be the Hilbert space of square summable sequences. Let hs, ti := i si ti be the usual inner product; the Cauchy-Schwartz inequality implies that this sum is convergent. For any Cauchy sequence in H and for each i, the sequence of ith components is Cauchy, and the element of R∞ whose ith component is the limit of this sequence is easily shown to be the limit in H of the given sequence. Thus H is complete. The set of points with only finitely many nonzero components, all of which are rational, is a countable dense subset, so H is separable. We wish to show that any separable Hilbert space is linearly isomorphic to H, so let V be a separable Hilbert space, and let {v1 , v2 , . . . } be a countable dense subset. The span of this set is also dense, of course. Using the Gram-Schmidt process, we may pass from this set to a countable sequence w1 , w2, . . . of orthnormal vectors that has the same span. It is now easy to show that s 7→ s1 w1 + s2 w2 + · · · is a linear isometry between H and V . 6.5 EmbeddingTheorems An important technique is to endow metric spaces with geometric structures by embedding them in normed spaces. Let (X, d) be a metric space, and let C(X) be the space of bounded continuous real valued functions on X. This is, of course, a vector space under pointwise addition and scalar multiplication. We endow C(X) with the norm kf k∞ = sup |f (x)|. x∈X Lemma 6.5.1. C(X) is a Banach space. Proof. The verification that k · k∞ is actually a norm is elementary and left to the reader. 
To prove completeness suppose that {fn } is a Cauchy sequence. This sequence has a pointwise limit f because each {fn (x)} is Cauchy, and we need to prove that f is continuous. Fix x ∈ X and ε > 0. There is an m such that kfm −fn k < ε/3 for all n ≥ m, and there is a δ > 0 such that |fm (x′ ) −fm (x)| < ε/3 for all x′ ∈ Uδ (x). For such x′ we have |f (x′ ) − f (x)| ≤ |f (x′ ) − fm (x′ )| + |fm (x′ ) + fm (x)| + |fm (x) − f (x)| < ε. Theorem 6.5.2 (Kuratowski (1935), Wojdyslawski (1939)). X is homeomorphic to a relatively closed subset of a convex subset of C(X). If X is complete, then it is homeomorphic to a closed subset of C(X). Proof. For each x ∈ X let fx ∈ C(X) be the function fx (y) := min{1, d(x, y)}; the map h : x 7→ fx is evidently an injection from X to C(X). For any x, y ∈ X we have kfx −fy k∞ = sup | min{1, d(x, z)}−min{1, d(y, z)}| ≤ sup |d(x, z)−d(y, z)| ≤ d(x, y), z z so h is continuous. On the other hand, if {xn } is a sequence such that fxn → fx , then min{1, d(xn , x)} = |fxn (x) − fx (x)| ≤ kfxn − fx k∞ → 0, so xn → x. Thus the inverse of h is continuous, so h is a homeomorphism. 93 6.6. DUGUNDJI’S THEOREM Now suppose that fxn converges to an element f = of h(X). We have kfxn − f k∞ → 0 and Pk i=1 λi fyi of the convex hull kfxn − f k∞ ≥ |fxn (xn ) − f (xn )| = |f (xn )|, so f (xn ) → 0. For each i we have 0 ≤ fyi (xn ) ≤ f (xn )/λi → 0, which implies that xn → yi , whence f = fy1 = · · · = fyk ∈ h(X). Thus h(X) is closed in the relative topology of its convex hull. Now suppose that X is complete, and that {xn } is a sequence such that fxn → f . Then as above, min{1, d(xm , xn )} ≤ kfxm − fxn k∞ , and {fxn } is a Cauchy sequence, so {xn } is also Cauchy and has a limit x. Above we saw that fxn → fx , so fx = f . Thus h(X) is closed in C(X). The so-called Hilbert cube is I ∞ := { s ∈ H : |si | ≤ 1/i for all i = 1, 2, . . . }. For separable metric spaces we have the following refinement of Theorem 6.5.2. Theorem 6.5.3. (Urysohn) If (X, d) is a separable metric space, there is an embedding ι : X → I ∞ . Proof. Let { x1 , x2 , . . . } be a countable dense subset of X. Define ι : X → I ∞ by setting ιi (x) := min{d(x, xi ), 1/i}. Clearly ι is a continuous injection. To show that the inverse is continuous, suppose that {xj } is a sequence with ι(xj ) → ι(x). If it is not the case that xj → x, then there is a neighborhood U that (perhaps after passing to a subsequence) does not have any elements of the sequence. Choose xi in that neighborhood. The sequence of numbers min{d(xj , xi ), 1/i} is bounded below by a positive number, contrary to the assumption that ι(xj ) → ι(x). 6.6 Dugundji’s Theorem The well known Tietze extension theorem asserts that if a topological space X is normal and f : A → [0, 1] is continuous, where A ⊂ X is closed, then f has a continuous extension to all of X. A map into a finite dimensional Euclidean space is continuous if its component functions are each continuous, so Tietze’s theorem is adequate for finite dimensional applications. Mostly, however, we will work with spaces that are potentially infinite dimensional, for which we will need the following variant due to Dugundji (1951). Theorem 6.6.1. If A is a closed subset of a metric space (X, d), Y is a locally convex topological vector space, and f : A → Y is continuous, then there is a continuous extension f : X → Y whose image is contained in the convex hull of f (A). 94 CHAPTER 6. METRIC SPACE THEORY Proof. The sets Ud(x,A)/2 (x) are open and cover X \ A. 
Theorem 6.1.1 implies the existence of an open locally finite refinement {Wα }α∈I . Theorem 6.2.2 implies the existence of a partition of unity {ϕα }α∈I subordinate to {Wα }α∈I . For each α choose aα ∈ A with d(aα , Wα ) < 2d(A, Wα ), and define the extension by setting X f (x) := ϕα (x)f (aα ) (x ∈ X \ A). α∈I Clearly f is continuous at every point of X \ A and at every interior point of A. Let a be a point in the boundary of A, let U be a neighborhood of f (a), which we may assume to be convex, and choose δ > 0 small enough that f (a′ ) ∈ U whenever a′ ∈ Uδ (a) ∩ A. Consider x ∈ Uδ/7 (a) ∩ (X \ A). For any α such that x ∈ Wα and x′ such that Wα ⊂ Ud(x′ ,A)/2 (x′ ) we have d(aα , Wα ) ≥ d(aα , x′ ) − d(x′ , A)/2 ≥ d(aα , x′ ) − d(x′ , aα )/2 = d(aα , x′ )/2 and d(x′ , x) ≤ d(x′ , A)/2 ≤ d(Wα , A) ≤ d(Wα , aα ), so d(aα , x) ≤ d(aα , x′ ) + d(x′ , x) ≤ 3d(aα , Wα ) ≤ 6d(A, Wα ) ≤ 6d(a, x). Thus d(aα , a) ≤ d(aα , x)+d(x, a) ≤ 7d(x, a) < δ whenever x ∈ Wα , so f (x) ∈ U. Chapter 7 Retracts This chapter begins with Kinoshita’s example of a compact contractible space that does not have the fixed point property. The example is elegant, but also rather complex, and nothing later depends on it, so it can be postponed until the reader is in the mood for a mathematical treat. The point is that fixed point theory depends on some additional condition over and above compactness and contractibility. After that we develop the required material from the theory of retracts. We first describe retracts in general, and then briefly discuss Euclidean neighborhood retracts, which are retracts of open subsets of Euclidean spaces. This concept is quite general, encompassing simplicial complexes and (as we will see later) smooth manifolds. The central concept of the chapter is the notion of an absolute neighborhood retract (ANR) which is a metrizable space whose image, under any embedding as a closed subset of a metric space, is a retract of some neighborhood of itself. The two key characterization results are that an open subset of a convex subset of a locally convex linear space is an absolute neighborhood retract, and that an ANR can be embedded in a normed linear space as a retract of an open subset of a convex set. An absolute retract (AR) is a space that is a retract of any metric space it is embedded in as a closed subset. It turns out that the ARs are precisely the contractible ANRs. The extension of fixed point theory to infinite dimensional settings ultimately depends on “approximating” the setting with finite dimensional objects. Section 7.6 provides one of the key results in this direction. 7.1 Kinoshita’s Example This example came to be known as the “tin can with a roll of toilet paper.” As you will see, this description is apt, but does not do justice to the example’s beauty and ingenuity. Polar coordinates facilitate the description. Let P = [0, ∞) × R, with (r, θ) ∈ P identified with the point (r cos θ, r sin θ). The unit circle and the open unit disk are C = { (r, θ) : r = 1 } and 95 D = { (r, t) : r < 1 }. 96 CHAPTER 7. RETRACTS Let ρ : [0, ∞) → [0, 1) be a homeomorphism, let s : [0, ∞) → P be the function s(t) := (ρ(t), t), and let S = { s(t) : t ≥ 0 }. Then S is a curve that spirals out from the origin, approaching C asymptotically. The space of the example is X = (C × [0, 1]) ∪ (D × {0}) ∪ (S × [0, 1]) ⊂ R3 . Here C × [0, 1] is the cylindrical side of the tin can, D × {0} is its base, and S × [0, 1] is the roll of toilet paper. 
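A concrete choice of ρ makes the example easy to sample or draw. The sketch below takes ρ(t) = t/(1 + t), which is one admissible homeomorphism among many (nothing in the argument depends on this choice), generates points of the three pieces of X, and confirms numerically that the spiral S approaches C.

```python
import numpy as np

rho = lambda t: t / (1.0 + t)        # a homeomorphism from [0, infinity) onto [0, 1)

def spiral(t):
    """The point s(t) = (rho(t), t), written in Cartesian coordinates."""
    return np.array([rho(t) * np.cos(t), rho(t) * np.sin(t)])

def sample_X(n=1500, seed=0):
    """Random points from the three pieces of X = (C x [0,1]) u (D x {0}) u (S x [0,1])."""
    rng = np.random.default_rng(seed)
    pts = []
    for _ in range(n):
        piece = rng.integers(3)
        t, th, z = rng.exponential(5.0), rng.uniform(0, 2 * np.pi), rng.uniform()
        if piece == 0:                                   # side of the tin can
            pts.append([np.cos(th), np.sin(th), z])
        elif piece == 1:                                 # base of the tin can
            pts.append([rho(t) * np.cos(th), rho(t) * np.sin(th), 0.0])
        else:                                            # the roll of toilet paper
            x, y = spiral(t)
            pts.append([x, y, z])
    return np.array(pts)

print(sample_X().shape)                                  # (1500, 3)
for t in [1.0, 10.0, 100.0, 1000.0]:
    print(t, 1.0 - np.linalg.norm(spiral(t)))            # the gap 1/(1 + t) shrinks to 0
```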
Evidently X is closed, hence compact, and there is an obvious contraction of X that first pushes the cylinder of the tin can and the toilet paper down onto the closed unit disk and then contracts the disk to the origin. We are now going to define functions f1 : C × [0, 1] → X, f2 : D × {0} → X, f3 : S × [0, 1] → X which combine to form a continuous function f : X → X with no fixed points. Fix a number ε > 0 that is not an integral multiple of 2π; imagining that ε is small may help to visualize f as a motion of X. Also, fix a continuous function κ : [0, 1] → [0, 1] with κ(0) = 0, κ(1) = 1, and κ(z) > z for all 0 < z < 1. The first function is given by the formula f1 (1, θ, z) := (1, θ − (1 − 2z)ε, κ(z)). This is evidently well defined and continuous. The point (1, θ, z) cannot be fixed because κ(z) = z implies that z = 0 or z = 1 and ε is not a multiple of 2π. Observe that D = { (ρ(t), θ) : t ≥ 0, θ ∈ R }. The second function is ( (0, 0, 1 − t/ε), 0 ≤ t ≤ ε, f2 (ρ(t), θ, 0) := (ρ(t − ε), θ − ε, 0), ε ≤ t. This is well defined because ρ is invertible and the two formulas give the origin as the image when t = ε. It is continuous because it is continuous on the two subdomains, which are closed and cover D. It does not have any fixed points because the coordinate of f2 (ρ(t), θ, 0) is less than ρ(t) except when t = 0, and f2 (ρ(0), θ, 0) = (0, 0, 1). The third function is ( (s((t + ε)z), 1 − (1 − κ(z))t/ε), 0 ≤ t ≤ ε, f3 (s(t), z) := (s(t − (1 − 2z)ε), κ(z)), ε ≤ t. This is well defined because s is invertible and the two formulas give (s(2εz), κ(z)) as the image when t = ε. It is continuous because it is continuous on the two subdomains, which are closed and cover S × [0, 1]. Since f2 (s(t), 0) = f3 (s(t), 0) for all t, f2 and f3 combine to define a continuous function on the union of their domains. Can (s(t), z) be a fixed point of f3 ? If t < ε, then the equation z = 1 − (1 − κ(z))t/ε 7.2. RETRACTS 97 is equivalent to (1 − κ(z))t = (1 − z)ε, which is impossible if z < 1 due to the conditions on κ. When t < ε and z = 1, we have s(t) 6= s(t + ε) because s is injective. On the other hand, when t ≥ ε the equation κ(z) = z implies that either z = 0, in which case s(t) 6= s(t − ε), or z = 1, in which case s(t) 6= s(t + ε). We have now shown that f is well defined and has no fixed points, and that it is continuous on (S ×[0, 1])∪(D×{0}) and on C ×[0, 1]. To complete the verification of continuity, first consider a sequence {(ρ(ti ), θi , 0)} in D × {0} converging to (1, θ, 0). Clearly f2 (ρ(ti ), θi , 0) = (ρ(ti − ε), θi − ε, 0) → (1, θ − ε, 0) = f1 (1, θ, 0). Now consider a sequence {(s(ti ), zi )} converging to a point (1, θ, z). In order for f to be continuous it must be the case that f3 (s(ti ), zi ) = (s(ti − (1 − 2zi )ε), κ(zi )) → (1, θ − (1 − 2z)ε, κ(z)) = f1 (1, θ, z). Since s(ti ) → (1, θ) means precisely that ti → ∞ and ti mod 2π → θ mod 2π, again this is clear. 7.2 Retracts This section prepares for later material by presenting general facts about retractions and retracts. Let X be a metric space, and let A be a subset of X such that there is a continuous function r : X → A with r(a) = a for all a ∈ A. We say that A is a retract of X and that r is a retraction. Many desirable properties that X might have are inherited by A. Lemma 7.2.1. If X has the fixed point property, then A has the fixed point property. Proof. If f : A → A is continuous, then f ◦ r necessarily has a fixed point, say a∗ , which must be in A, so that a∗ = f (r(a∗ )) = f (a∗ ) is also a fixed point of f . Lemma 7.2.2. 
If X is contractible, then A is contractible. Proof. If c : X × [0, 1] → X is a contraction X, then so is (a, t) 7→ r(c(a, t)). Lemma 7.2.3. If X is connected, then A is connected. Proof. We show that if A is not connected, then X is not connected. If U1 and U2 are nonempty open subsets of A with U1 ∩ U2 = ∅ and U1 ∪ U2 = A, then r −1 (U1 ) and r −1 (U2 ) are nonempty open subsets of X with r −1 (U1 ) ∩ r −1 (U2 ) = ∅ and r −1 (U1 ) ∪ r −1 (U2 ) = X. Here are two basic observations that are too obvious to prove. Lemma 7.2.4. If s : A → B is a second retraction, then s ◦ r : X → B is a retraction, so B is a retract of X. Lemma 7.2.5. If A ⊂ Y ⊂ X, then the restriction of r to Y is a retraction, so A is a retract of Y . 98 CHAPTER 7. RETRACTS We say that A is a neighborhood retract in X if A is a retract of an open U ⊂ X. We note two other simple facts, the first of which is an obvious consequence of the last result: Lemma 7.2.6. Suppose that A is not connected: there are disjoint open sets U1 , U2 ⊂ X such that A ⊂ U1 ∪ U2 with A1 := A ∩ U1 and A2 := A ∩ U2 both nonempty. Then A is a neighborhood retract in X if and only if both A1 and A2 are neighborhood retracts in X. Lemma 7.2.7. If A is a neighborhood retract in X and B is a neighborhood retract in A, then B is a neighborhood retract in X. Proof. Let r : U → A and s : V → B be retractions, where U is a neighborhood of A and V ⊂ A is a neighborhood of B in the relative topology of A. The definition of the relative topology implies that there is a neighborhood W ⊂ X of B such that V = A ∩ W . Then U ∩ W is a neighborhood of B in X, and the composition of s with the restriction of r to U ∩ W is a retraction onto B. A set A ⊂ X is locally closed if it is the intersection of an open set and a closed set. Equivalently, it is an open subset of a closed set, or a closed subset of an open set. Lemma 7.2.8. A neighborhood retract is locally closed. Proof. If U ⊂ X is open and r : U → A is a retraction, A is a closed subset of U because it is the set of fixed points of r. This terminology ‘locally closed’ is further explained by: Lemma 7.2.9. If X is a topological space and A ⊂ X, then A is locally closed if and only if each point x ∈ A has a neighborhood U such that U ∩ A is closed in U. Proof. If A = U ∩ C where U is open and C is closed, then U is a neighborhood of each x ∈ A, and A is closed in U. On the other hand suppose that each x ∈ A has a neighborhood Ux suchSthat Ux ∩ A S is closed in Ux ,Swhich is to say that Ux ∩ A = Ux ∩ A. Then A = x (Ux ∩ A) = x (Ux ∩ A) = x Ux ∩ A. Corollary 7.2.10. If X is locally compact, a set A ⊂ X is locally closed if and only if each x ∈ A has a compact neighborhood. Proof. If A = U ∩ C, x ∈ A, and K is a compact neighborhood of x contained in U, then K ∩ C is a compact neighborhood in A. On the other hand, if x ∈ A and K is a compact neighborhood of x in A, then K = A ∩ V for some neighborhood V of x in X. Let U be the interior of V . Then U ∩ A = U ∩ K is closed in U. This shows that if every point in K has a compact neighborhood, then the condition in the last result holds. 7.3. EUCLIDEAN NEIGHBORHOOD RETRACTS 7.3 99 Euclidean Neighborhood Retracts A Euclidean neighborhood retract (ENR) is a topological space that is homeomorphic to a neighborhood retract of a Euclidean space. If a subset of a Euclidean space is homeomorphic to an ENR, then it is a neighborhood retract: Proposition 7.3.1. Suppose that U ⊂ Rm is open, r : U → A is a retraction, B ⊂ Rn , and h : A → B is a homeomorphism. 
Then B is a neighborhood retract. Proof. Since A is locally closed and Rm is locally compact, each point in A has a closed neighborhood that contains a compact neighborhood. Having a compact neighborhood is an intrinsic property, so every point in B has such a neighborhood, and Corollary 7.2.10 implies that B is locally closed. Let V ⊂ Rn be an open set that has B as a closed subset. The Tietze extension theorem gives an extension of h−1 to a map j : V → Rm . After replacing V with j −1 (U), V is still an open set that contains B, and h ◦ r ◦ j : V → B is a retraction. Note that every locally closed set A = U ∩ C ⊂ Rm is homeomorphic to a closed subset of Rm+1 , by virtue of the embedding x 7→ (x, d(x, Rm \ U)−1 ), where d(x, Rm \ U) is the distance from x to the nearest point not in U. Thus a sufficient condition for X to be an ENR is that it is homeomorphic to a neighborhood retract of a Euclidean space, but a necessary condition is that it homeomorphic to a closed neighborhood retract of a Euclidean space. In order to expand the scope of fixed point theory, it is desirable to show that many types of spaces are ENR’s. Eventually we will see that a smooth submanifold of a Euclidean space is an ENR. At this point we can show that simplicial complexes have this property. Lemma 7.3.2. If K ′ = (V ′ , C ′ ) is a subcomplex of a simplicial complex K = (V, C), then |K ′ | is a neighborhood retract in |K|. Proof. To begin with suppose that there are simplices of positive dimension in K that are not in K ′ . Let σ be such a simplex of maximal dimension, and let β be the barycenter of |σ|. Then |K| \ {β} is a neighborhood of |K| \ int |σ|, and there is a retraction r of the former set onto the latter that is the identity on the latter, of course, and which maps (1 − t)x + tβ to x whenever x ∈ |∂σ| and 0 < t < 1. Iterating this construction and applying Lemma 7.2.7 above, we find that there is a neighborhood retract of |K| consisting of |K ′ | and finitely many isolated points. Now Lemma 7.2.6 implies that |K ′ | is a neighborhood retract in |K|. Proposition 7.3.3. If K = (V, C) is a simplicial complex, then |K| is an ENR. Proof. Let ∆ be the convex hull of the set of unit basis vectors in R|V | . After repeated barycentric subdivision of ∆ there is a (|V | − 1)-dimensional simplex σ in the interior of ∆. (This is a consequence of Proposition 2.5.2.) Identifying the vertices of σ with the elements of V leads to an embedding of |K| as a subcomplex of this subdivision, after which we can apply the result above. 100 CHAPTER 7. RETRACTS Giving an example of a closed subset of a Euclidean space that is not an ENR is a bit more difficult. Eventually we will see that a contractible ENR has the fixed point property, from which it follows that Kinoshita’s example is not an ENR. A simpler example is the Hawaiian earring H, which is the union over all n = 1, 2, . . . of the circle of radius 1/n centered at (1/n, 0). If there was a retraction r : U → H of a neighborhood U of H, then for small n the entire disk of radius 1/n centered at (1/n, 0) would be contained in U, and we would have a violation of the following result, which is actually a quite common method of applying the fixed point principle. Theorem 7.3.4 (No Retraction Theorem). If D n is the closed unit disk centered at the origin in Rn , and S n−1 is its boundary, then there does not exist a continuous r : D n → Rn \ D n with r(x) = x for all s ∈ S n−1 . Proof. 
Suppose that such an r exists, and let g : D n → S n−1 be the function that takes each x ∈ S n−1 to itself and takes each x ∈ D n \ S n−1 to the point where the line segment between r(x) and x intersects S n−1 . An easy argument shows that g is continuous at each x ∈ D n \ S n−1 , and another easy argument shows that g is continuous at each x ∈ S n−1 , so g is continuous. If a : S n−1 → S n−1 is the antipodal map a(x) = −x, then a ◦ g gives a map from D n to itself that does not have a fixed point, contradicting Brouwer’s fixed point theorem. 7.4 Absolute Neighborhood Retracts A metric space A is an absolute neighborhood retract (ANR) if h(A) is a neighborhood retract whenever X is a metric space, h : A → X is an embedding, and h(A) is closed. This definition is evidently modelled on the description of ENR’s we arrived at in the last section, with ‘metric space’ in place of ‘Euclidean space.’ We saw above that if A ⊂ Rm is a neighborhood retract, then any homeomorphic image of A in another Euclidean space is also a neighborhood retract, and some such homeomorphic image is a closed subset of the Euclidean space. Thus a natural, and at least potentially more restrictive, extension of the concept is obtained by defining an ANR to be a space A such that h(A) is a neighborhood retract whenever h : A → X is an embedding of A is a metric space X, even if h(A) is not closed. There is a second sense in which the definition is weaker than it might be. A topological space is completely metrizable if its topology can be induced by a complete metric. Since an ENR is homeomorphic to a closed subset of a Euclidean space, an ENR is completely metrizable. Problem 6K of Kelley (1955) shows that a topological space A is completely metrizable if and only if, whenever h : A → X is an embedding of A in a metric space X, h(A) is a Gδ . The set of rational numbers is an example of a space that is metrizable, but not completely metrizable, because it is T not a Gδ as a subset of R. To see this observe that the set of irrational numbers is r∈Q R \ {r}, so if Q was a countable intersection of open sets, then ∅ would be a countable intersection of open sets, contrary to the Baire category theorem (p. 200 of Kelley (1955)). The next result shows that the union of { eπir : r ∈ Q } with the open unit disk in C is an ANR, but this space is not completely metrizable, so it is not an ENR. Thus there are finite dimensional ANR’s that are not ENR’s. 7.4. ABSOLUTE NEIGHBORHOOD RETRACTS 101 By choosing the least restrictive definition we strengthen the various results below. However, these complexities are irrelevant to compact ANR’s, which are, for the most part, the only ANR’s that will figure in our work going forward. Of course the homeomorphic image h(A) of a compact metric space A in any metric space is compact and consequently closed, and of course h(A) is also complete. At first blush being an ANR might sound like a remarkable property that can only be possessed by quite special spaces, but this is not the case at all. Although ANR’s cannot exhibit the “infinitely detailed features” of the tin can with a roll of toilet paper, the concept is not very restrictive, at least in comparison with other concepts that might serve as an hypothesis of a fixed point theorem. Proposition 7.4.1. A metric space A is an ANR if it (or its homeomorphic image) is a retract of an open subset of a convex subset of a locally convex linear space. Proof. Let r : U → A be a retraction, where U is an open subset of a convex set C. 
Suppose h : A → X maps A homeomorphically onto a closed subset h(A) of a metric space X. Dugundji’s theorem implies that h−1 : h(A) → U has a continuous extension j : X → C. Then V = j −1 (U) is a neighborhood of h(A), and h ◦ r ◦ j|V : V → h(A) is a retraction. Corollary 7.4.2. An ENR is an ANR. The proposition above gives a sufficient condition for a space to be an ANR. There is a somewhat stronger necessary condition. Proposition 7.4.3. If A is an ANR, then there is a homeomorphic image of A that is a retract of an open subset of a convex subset of Banach space. Proof. Theorem 6.5.2 gives a map h : A → Z, where Z is a Banach space, such that h maps A homeomorphically onto h(A) and h(A) is closed in the relative topology of its convex hull C. Since A is an ANR, there is a relatively open U ⊂ C and a retraction r : U → h(A). Since compact metric spaces are separable, compact ANR’s satisfy a more demanding embedding condition than the one given by Proposition 7.4.3. Proposition 7.4.4. If A is a compact ANR, then there exists an embedding ι : A → I ∞ such that ι(A) is a neighborhood retract in I ∞ . Proof. Urysohn’s Theorem guarantees the existence of an embedding of A in I ∞ . Since A is compact, h(A) is closed in I ∞ , and since A is an ANR, h(A) is a neighborhood retract in I ∞ . The simplicity of an open subset of I ∞ is the ultimate source of the utility of ANR’s in the theory of fixed points. To exploit this simplicity we need analytic tools that bring it to the surface. Fix a compact metric space (X, d), and let ∆ = { (x, x) : x ∈ X } be the diagonal in X × X. We say that (X, d) is uniformly locally contractible if, for any neighborhood V ⊂ X × X of ∆ there is a neighborhood W of ∆ and a map γ : W × [0, 1] → X such that: 102 CHAPTER 7. RETRACTS (a) γ(x, x′ , 0) = x′ and γ(x, x′ , 1) = x for all (x, x′ ) ∈ W ; (b) γ(x, x, t) = x for all x ∈ X and t ∈ [0, 1]; (c) (x, γ(x, x′ , t)) ∈ V for all (x, x′ ) ∈ W and t ∈ [0, 1]. Proposition 7.4.5. A compact ANR A is uniformly locally contractible. Proof. By Proposition 7.4.4 we may assume that A ⊂ I ∞ , and that there is a retraction r : U → A where U ⊂ I ∞ is open. Fix a neighborhood V ⊂ A × A of the diagonal, and let Ṽ = (IdA × r)−1 (V ) ⊂ A × U. The distance from x to the nearest point in I ∞ \ U is a positive continuous function on A, which attains its minimum since A is compact, so there is some δ > 0 such that ∆δ = { (x, x′ ) ∈ A × U : kx′ − xk < δ } ⊂ Ṽ . Let W = ∆δ ∩ (A × A), and let γ : W × [0, 1] → A be the function γ(x, x′ , t) = r(tx + (1 − t)x′ ). Evidently γ has all the required properties. A topological space X is locally path connected if, for each x ∈ X, each neighborhood Y of x contains a neighborhood U such that for any x0 , x1 ∈ U there is a continuous path γ : [0, 1] → Y with γ(0) = x0 and γ(1) = x1 . At first sight this seems less straightforward than requiring that any neighborhood of x contain a pathwise connected neighborhood, but the weaker condition given by the definition is sometimes much easier to verify, and it usually has whatever implications are desired. Corollary 7.4.6. A compact ANR A is locally path connected. Proof. The last result (with V = A × A) gives a neighborhood W ⊂ A × A of the diagonal and a function γ : W × [0, 1] → A satisfying (a) and (b). Fix x ∈ A, and let Y be a neighborhood of x. There is a neighborhood U of x such that U × U ⊂ W and γ(U × U × [0, 1]) ⊂ Y . 
(Combining (b) and the continuity of γ, for each t ∈ [0, 1] there is a neighborhood Ut and εt > 0 such that Ut × Ut ⊂ W and γ(Ut × Ut × (tS− εt , t + εt )) ⊂ Y . Since [0,T 1] is compact there are t1 , . . . , tk such that [0, 1] ⊂ i (ti − εti , ti + εti ). Let U = i Uti .) Then for any x0 , x1 ∈ U, t 7→ γ(x1 , x0 , t) is a path in Y going from x0 to x1 . 7.5 Absolute Retracts A metric space A is an absolute retract (AR) if h(A) is a retract of X whenever X is a metric space, h : A → X is an embedding, and h(A) is closed. Of course an AR is an ANR. Below we will see that an ANR is an AR if and only if it is contractible, so compact convex sets are AR’s. Eventually (Theorem 14.1.5) we will show that nonempty compact AR’s have the fixed point property. In this sense AR’s fulfill our goal of replacing the assumption of a convex domain in Kakutani’s theorem with a topological condition. The embedding conditions characterizing AR’s parallel those for ANR’s, with some simplifications. 7.5. ABSOLUTE RETRACTS 103 Proposition 7.5.1. If a metric space A is a retract of a convex subset C of a locally convex linear space, then it is an ANR. Proof. Suppose h : A → X maps A homeomorphically onto a closed subset h(A) of a metric space X. Dugundji’s theorem implies that h−1 : h(A) → C has a continuous extension j : X → C. Let r : C → A be a retraction. Then q := h ◦ r ◦ j is a retraction of X onto h(A). Proposition 7.5.2. If A is an AR, then there is a homeomorphic image of A that is a retract of a convex subset of a Banach space. Proof. Theorem 6.5.2 gives a map h : A → Z, where Z is a Banach space, such that h maps A homeomorphically onto h(A) and h(A) is closed in the relative topology of its convex hull C. Since A is an AR, there is a retraction r : C → h(A). The remainder of the section proves: Proposition 7.5.3. An ANR is an AR if and only if it is contractible. In preparation for the proof we introduce an important concept of general topology. A pair of topological spaces X, A with A ⊂ X are said to have the homotopy extension property with respect to the class ANR if, whenever: (a) Y is an ANR, (b) f : X → Y is continuous, (c) η : A × [0, 1] → Y is a homotopy, and (d) η(·, 0) = f |A , there is a continuous η : X × [0, 1] → Y with η(·, 0) = f and η|A×[0,1] = η. Proposition 7.5.4. If X is a metric space and A is a closed subset of X, then X and A have the homotopy extension property with respect to ANR’s. We separate out one of the larger steps in the argument. Lemma 7.5.5. Let X be a metric space, let A be a closed subset of X, and let Z := (X × {0}) ∪ (A × [0, 1]). Then for every neighborhood V ⊂ X × [0, 1] of Z there is a map ϕ : X × [0, 1] → V that agrees with the identity on Z. Proof. For each (a, t) ∈ A × [0, 1] choose a product neighborhood U(a,t) × (t − ε(a,t) , t + ε(a,t) ) ⊂ V where U(a,t) ⊂ X is open and ε > 0. For any particular a the cover of {a}×[0, 1] has a finite subcover, and the intersection of S its first cartesian factors is a neighborhood Ua of a with Ua × [0, 1] ⊂ V . Let U := a Ua . Thus there is a neighborhood U of A such that U × [0, 1] ⊂ V . Urysohn’s lemma gives a function α : X → [0, 1] with α(x) = 0 for all x ∈ X \ U and α(a) = 1 for all a ∈ A, and the function ϕ(x, t) := (x, α(x)t) satisfies the required conditions. 104 CHAPTER 7. RETRACTS Proof of Proposition 7.5.4. Let Y , f : X → Y , and h : A × [0, 1] → Y satisfy (a)(d) above. 
By Theorem 6.5.2 we may assume without loss of generality that Y is contained in a Banach space S, and is a relatively closed subset of its convex hull C. Let Z := (X ×{0})∪(A×[0, 1]), and define g : Z → Y by setting g(x, 0) = f (x) and g(a, t) = h(a, t). Dugundji’s theorem implies that there is a continuous extension g : X × [0, 1] → C of g. Let W ⊂ C be a neighborhood of Y for which there is a retraction r : W → Y , let V := g −1 (W ), and let ϕ : X × [0, 1] → V be a continuous map that is the identity on Z, as per the result above. Clearly η := r ◦ g ◦ ϕ has the indicated properties. We now return to the characterization of AR’s. Proof of Proposition 7.5.3. Let A be an ANR. By Theorem 6.5.2 we may embed A as a relatively closed subset of a convex subset C of a Banach space. If A is an AR, then it is a retract of C. A convex set is contractible, and a retract of a contractible set is contractible (Lemma 7.2.2) so A is contractible. Suppose that A is conractible. By Proposition 7.5.1 it suffices to show that A is a retract of C. Let c : A×[0, 1] → A be a contraction, and let a1 be the “final value” a1 , by which we mean that c(a, 1) = a1 for all a ∈ A. Set Z := (C ×{0})∪(A×[0, 1]), and define f : Z → A by setting f (x, 0) := a1 for x ∈ C and f (a, t) := c(a, 1 − t) for (a, t) ∈ A × [0, 1]. Proposition 7.5.4 implies the existence of a continuous extension f : C × [0, 1] → A. Now r := f (·, 1) : C → A is the desired retraction. 7.6 Domination In our development of the fixed point index an important idea will be to pass from a theory for certain simple or elementary spaces to a theory for more general spaces by showing that every space of the latter type can be “approximated” by a simpler space, in the sense of the following definitions. Fix a metric space (X, d). Definition 7.6.1. If Y is a topological space and ε > 0, a homotopy η : Y ×[0, 1] → X is an ε-homotopy if d η(y, s), η(y, t) < ε for all y ∈ Y and all 0 ≤ s, t ≤ 1. We say that η0 and η1 are ε-homotopic. Definition 7.6.2. For ε > 0, a topological space D ε-dominates C ⊂ X if there are continuous functions ϕ : C → D and ψ : D → X such that ψ ◦ ϕ : C → X is ε-homotopic to IdC . This section’s main result is: Theorem 7.6.3 (Domination Theorem). If X is a separable ANR and C ⊂ X is compact, then for any ε > 0 there is a simplicial complex that ε-dominates C. Proof. If C = ∅, then for any ε > 0 it is ε-dominated by ∅, which we consider to be a simplicial complex. Similarly, if C is a singleton, then for any ε > 0 it is ε-dominated by the simplicial complex consisting of a single point. Therefore we may assume that C has more than one point. 105 7.6. DOMINATION In view of Proposition 7.4.3 we may assume that X is a retract of an open set U of a convex subset S of a Banach space. Let r : U → X be the retraction, and let d be the metric on S derived from the norm of the Banach space. Fix ε > 0 small enough that C is not contained in the ε/2-ball around any of its points. Let r : U → X be a retraction of a neighborhood onto X. For x ∈ C let ρ(x) := 12 d x, S \ r −1 (Uε/2 (x) ∩ X) . Choose x1 , . . . , xn ∈ C such that U1 := Uρ(x1 ) (x1 ), . . . , Un := Uρ(xn ) (xn ) is an open cover of C. Let e1 , . . . , en be the standard unit basis vectors of Rn . The nerve of the open cover is [ [ N(U1 ,...,Un) = conv({ ej : x ∈ Uj }) = conv(ej1 , . . . , ejk ). x∈X Vj1 ∩...∩Vjk 6=∅ Of course it is a (geometric) simplicial complex. There are functions α1 , . . . , αn : C → [0, 1] given by d(x, X \ Ui ) αi (x) := Pn . 
j=1 d(x, X \ Uj ) Of course the denominator is always positive, so these functions are well defined and continuous. There is a continuous function ϕ : C → N(U1 ,...,Un ) given by ϕ(x) := n X αj (x)ej . j=1 We would like to define a function ψ : N(U1 ,...,Un ) → X be setting ψ n X j=1 αj ej = r n X j=1 αj xj . Pn Consider a point y = j=1 αj ej ∈ N(U1 ,...,Un ) . Let j1 , . . . , jk be the indices j such that αj > 0, ordered so that ρ(xj1 ) ≥ max{ρ(xj2 ), . . . , ρ(xjk )}. Let Tp B := U2ρ(xj1 ) (xj1 ). The definition of N(U1 ,...,Un ) implies that there is a point z ∈ h=0 Ujh . For all h = 1, . . . , k we have xjh ∈ B because d(z, xjh ) < ρ(xjh ) ≤ ρ(xj1 ). Now note that B ⊂ r −1 (Uε/2 (xj1 ) ∩ X) ⊂ U. P Since B is convex, it contains kh=1 αjh xjh , so ψ is well defined. Now we would like to define a homotopy η : C × [0, 1] → X by setting X η(x, t) = r (1 − t) αj (x)xj + tx , j 106 CHAPTER 7. RETRACTS so suppose that y = ϕ(x) for some x ∈ C. Then x ∈ Uj1 ∩ . . . ∩ Ujk . In particular B := U2ρ(xj1 ) (xj1 ) ⊃ Uρ(xj1 ) (xj1 ) = Uj1 , so B contains x. Again, since B is convex P P it contains the line segment between x and nj=1 αj (x)xj = kh=1 αjh xjh , so η is well defined. Evidently η is continuous with η0 = ψ ◦ ϕ and η1 = IdC . In addition, since B ⊂ U we have η(x, t) ∈ r(B) ⊂ Uε/2 (xj1 ) ⊂ Uε (x) for all 0 ≤ t ≤ 1. Sometimes we will need the following variant. Theorem 7.6.4. If X is a separable ANR and C ⊂ X is compact, then for any ε > 0 there is an open U ⊂ Rm , for some m, such that U is compact and ε-dominates C. Proof. Fixing ε > 0, let P ⊂ Rm be a simplicial complex that ε-dominates C by virtue of the maps ϕ : C → P and ψ : P → X. Since P is an ENR (Proposition 7.3.3) it is a neighborhood retract. Let r : U ′ → P be a retraction of a neighborhood. For sufficiently small ε > 0 the closed ε-ball around P is contained in U ′ . Let U be the open ε-ball around P . Of course U is compact. Let ϕ′ : C → U be ϕ interpreted as a function with range U , and let ψ ′ = ψ ◦ r : U → X. Since ψ ′ ◦ ϕ′ = ψ ◦ ϕ, C is ε-dominated by U . Chapter 8 Essential Sets of Fixed Points Figure 2.1 shows a function f : [0, 1] → [0, 1] with two fixed points, s and t. Intuitively, they are qualitatively different, in that a small perturbation of f can result in a function that has no fixed points near s, but this is not the case for t. This distinction was recognized by Fort (1950) who described s as inessential, while t is said to be essential. 1 b b 0 0 s t 1 Figure 1.1 In game theory one often deals with correspondences with sets of fixed points that are infinite, and include continua such as submanifolds. As we will see, the definition proposed by Fort can be extended to sets of fixed points rather easily: roughly, a set of fixed points is essential if every neighborhood of it contains fixed points of every “sufficiently close” perturbation of the given correspondence. (Here one needs to be careful, because in the standard terminology of game theory, following Jiang (1963), essential Nash equilibria, and essential sets of Nash equilibria, are defined in terms of perturbations of the payoffs. This is a form of Q-robustness, which is studied in Section 8.3.) But it is easy to show that the set of all fixed 107 108 CHAPTER 8. ESSENTIAL SETS OF FIXED POINTS points is essential, so some additional condition must be imposed before essential sets can be used to distinguish some fixed points from others. The condition that works well, at least from a mathematical viewpoint, is connectedness. 
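Fort's distinction is easy to reproduce numerically. In the sketch below, which is a hypothetical one dimensional example rather than the function drawn in the figure, the graph of f crosses the diagonal transversally at one fixed point and is tangent to it at another; shifting f upward by a small constant eliminates the tangential fixed point while the transversal one survives. This is exactly the behaviour that the definitions of essential and inessential fixed points are meant to capture.

```python
import numpy as np

delta  = 0.01
f      = lambda x: x + (x - 0.25) ** 2 * (0.75 - x)   # maps [0, 1] into [0, 1]
f_pert = lambda x: f(x) + delta                       # still maps [0, 1] into [0, 1] for small delta

# f has two fixed points: at x = 0.25 the graph touches the diagonal from above
# (tangential), and at x = 0.75 it crosses the diagonal transversally.

xs = np.linspace(0.0, 1.0, 200001)

def has_fixed_point_near(g, center, radius=0.1, tol=1e-4):
    """True if |g(x) - x| < tol at some grid point within radius of center."""
    window = xs[np.abs(xs - center) <= radius]
    return bool(np.any(np.abs(g(window) - window) < tol))

for center in (0.25, 0.75):
    print(f"near x = {center}: f has a fixed point: {has_fixed_point_near(f, center)}; "
          f"the perturbed map has one: {has_fixed_point_near(f_pert, center)}")
# Near 0.25 the fixed point disappears after the perturbation (inessential), while
# near 0.75 a fixed point survives, merely moving slightly (essential).
```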
This chapter’s main result, Theorem 8.3.2, which is due to Kinoshita (1953), asserts that minimal (in the sense of set inclusion) essential sets are connected. The proof has the following outline. Let K be a minimal essential set of an upper semicontinuous convex valued correspondence F : X → X, where X is a compact, convex subset of a locally convex toplogical vector space. Suppose that K is disconnected, so there are disjoint open sets U1 , U2 such that K1 := K ∩ U1 and K2 := K ∩ U2 are nonempty and K1 ∪ K2 = K. Since K is minimal, K1 and K2 are not essential, so there are perturbations F1 and F2 of F such that each Fi has no fixed points near Ki . Let α1 , α2 : X → [0, 1] be continuous functions such that each αi vanishes outside Ui and is identically 1 near Ki , and let α : X → [0, 1] be the function α(x) := 1 − α1 (x) − α2 (x). Then α, α1 , α2 is a partition of unity subordinate to the open cover X \ K, U1 , U2 . The correspondence x 7→ α(x)F (x) + α1 (x)F1 (x) + α2 (x)F2 (x) is then a perturbation of F that has no fixed points near K, which contradicts the assumption that K is essential. Much of this chapter is concerned with filling in the technical details of this argument. Turning to our particular concerns, Section 8.1 gives the Fan-Glicksberg theorem, which is the extension of the Kakutani fixed point theorem to infinite dimensional sets. Section 8.2 shows that convex valued correspondences can be approximated by functions, and defines convex combinations of convex valued correspondences, with continuously varying weights. Section 8.3 then states and proves Kinoshita’s theorem, which implies that minimal connected sets exist. There remains the matter of proving that minimal essential sets actually exist, which is also handled in Section 8.3. 8.1 The Fan-Glicksberg Theorem We now extend the Kakutani fixed point theorem to correspondences with infinite dimensional domains. The result below was proved independently by Fan (1952) and Glicksberg (1952) using quite similar methods; our proof is perhaps a bit closer to Fan’s. In a sense the result was already known, since it can be derived from the Eilenberg-Montgomery theorem, but the proof below is much simpler. Theorem 8.1.1 (Fan, Glicksberg). If V is a locally convex topological vector space, X ⊂ V is nonempty, convex, and compact, and F : X → X is an upper semicontinuous convex valued correspondence, then F has a fixed point. We treat two technical points separately: Lemma 8.1.2. If V is a (not necessarily locally convex) topological vector space and K, C ⊂ V with K compact and C closed, then K + C is closed. 109 8.1. THE FAN-GLICKSBERG THEOREM Proof. We will show that the compliment is open. Let y be a point of V that is not in K + C. For each x ∈ K, translation invariance of the topology of V implies that x + C is closed, so Lemma 6.3.2 gives a neighborhood Wx of the origin such that (y + Wx + Wx ) ∩ (x + C) = ∅. Since we can replace Wx with −Wx ∩ Wx , we may assume that −Wx = Wx , so that (y + Wx ) ∩ (x + C + Wx ) = ∅. Choose x1 , . . . , xk such that the sets xi + Wxi cover K, and let W = Wx1 ∩ . . . ∩ Wxk . Now [ (y + W ) ∩ (K + C) ⊂ (y + W ) ∩ (xi + C + Wxi ) i ⊂ [ i (y + Wxi ) ∩ (xi + C + Wxi ) = ∅. Lemma 8.1.3. If V is a (not necessarily locally convex) topological vector space and K, C, U ⊂ V with K compact, C closed, U open, and C ∩ K ⊂ U, then there is a neighborhood of the origin W such that (C + W ) ∩ K ⊂ U. Proof. Let L := K \ U. Our goal is to find a neighborhood of the origin W such that (C + W ) ∩ L = ∅. 
Since C is closed, for each x ∈ L there is (by Lemma 6.3.2) a neighborhood Wx of the origin such that (x + Wx + Wx ) ∩ C = ∅. We can replace Wx with −Wx ∩ Wx , so we may insist that −Wx = Wx . As a closed subset of K, L is compact, so there are x1 , . . . , xk such that the sets xi + Wxi cover L. Let W := Wx1 ∩ . . . ∩ Wxk . Then W = −W , so if (C + W ) ∩ L is nonempty, then so is C ∩ (L + W ), but [ [ L+W ⊂ xi + Wxi + W ⊂ xi + Wxi + Wxi . i i Proof of Theorem 8.1.1. Let U be a closed convex neighborhood of the origin. (Lemma 6.3.4 implies that such a U exists.) Let FU : X → X be the correspondence FU (x) := (F (x) + U) ∩ X. Evidently FU (x) is nonempty and convex, and the first of the two results above implies that it is a closed subset of X, so it is compact. To show that FU is upper semicontinuous we consider a particular x and a neighborhood T of FU (x). The second of the two results above implies that there is a neighborhood W of the origin such that (F (x) + U + W ) ∩ X ⊂ T . Since F is upper semicontinuous there is a neighborhood A of x such that F (x′ ) ⊂ F (x) + W for all x′ ∈ A, and for such an x′ we have FU (x′ ) = (F (x′ ) + U) ∩ X ⊂ (F (x) + W + U) ∩ X ⊂ T. Since X is compact, there are finitely many points x1 , . . . , xk ∈ X such that x1 + U, . . . , xk + U is a cover of X. Let C be the convex hull of these points. Define G : C → C by setting G(x) = FU (x) ∩ C; since G(x) contains some xi , it is nonempty, and of course it is convex. Since C is the image of the continuous function (α1 , . . . , αk ) 7→ α1 x1 +· · ·+αk xk from the (k−1)-dimensional simplex, it is compact, 110 CHAPTER 8. ESSENTIAL SETS OF FIXED POINTS and consequently closed because V is Hausdorff. Since Gr(G) = Gr(FU ) ∩ (C × C) is closed, G is upper semicontinuous. Therefore G satisfies the hypothesis of the Kakutani fixed point theorem and has a nonempty set of fixed points. Any fixed point of G is a fixed point of FU , so the set FU of fixed points of FU is nonempty. Of course it is also closed in X, hence compact. The collection of compact sets { FU : U is a closed convex neighborhood of the origin } has the finite intersection property because FU1 ∩...Uk ⊂ FU1 ∩ . . . ∩ FUk , so its intersection is nonempty. Suppose that x∗ is an element of this intersection. If x∗ was not an element of F (x∗ ) there would be a closed neighborhood U of the origin such that (x∗ − U) ∩ F (x∗ ) = ∅, which contradicts x∗ ∈ FU , so x∗ is a fixed point of F . 8.2 Convex Valued Correspondences Let X be a topological space, and let Y be a subset of a topological vector space V . Then Con(X, Y ) is the set of upper semicontinuous convex valued correspondences from X to Y . Let ConS (X, Y ) denote this set endowed with the relative topology inherited from US (X, Y ), which was defined in Section 5.2. This section treats two topological issues that are particular to convex valued correspondences: a) approximation by continuous functions; b) the continuity of the process by which they are recombined using convex combinations and partitions of unity. The following result is a variant, for convex valued correspondences, of the approximation theorem (Theorem 9.1.1) that is the subject of the next chapter. Proposition 8.2.1. If X is a metric space, V is locally convex, and Y is either open or convex, then C(X, Y ) is dense in ConS (X, Y ). Proof. Fix F ∈ Con(X, Y ) and a neighborhood U ⊂ X × Y of Gr(F ). Our goal is to produce a continuous function f : X → Y with Gr(f ) ⊂ U. Consider a particular x ∈ X. 
For each y ∈ F (x) there is a neighborhood Tx,y of x and (by Lemma 6.3.2) a neighborhood Wx,y of the origin in V such that Tx,y × (y + Wx,y + Wx,y ) ⊂ U. If Y is open we can also require that y + Wx,y + Wx,y ⊂ Y . The compactness of F (x) T implies that there T are y1 , . . . , yk such that the yi + Wx,yi cover F (x). Setting Tx = i Tx,yi and Wx = i Wx,yi , we have Tx ×(F (x)+Wx ) ⊂ U and F (x)+Wx ⊂ Y if Y is open. Since V is locally convex, we may assume that Wx is convex because we can replace it with a smaller convex neighborhood. Upper semicontinuity gives a δx > 0 such that Uδx (x) ⊂ Tx and F (x′ ) ⊂ F (x) + Wx for all x′ ∈ Uδx (x). Since metric spaces are paracompact there is a locally finite open cover {Tα }α∈A of X that refines {Uδx /2 (x)}x∈X . For each α ∈ A choose xα such that Tα ⊂ Uδα /2 (xα ), where δα := δxα , and choose yα ∈ F (xα ). Since metric spaces are 8.2. CONVEX VALUED CORRESPONDENCES 111 normal, Theorem 6.2.2 gives a partition of unity {ψα } subordinate to {Tα }α∈A . Let f : X → V be the function X f (x) := ψα (x)yα . α∈A Fixing x ∈ X, let α1 , . . . , αn be the α such that ψα (x) > 0. After renumbering we may assume that δα1 ≥ δαi for all i = 2, . . . , n. For each such i we have xαi ∈ Uδαi /2 (x) ⊂ Uδα1 (xα1 ), so that yαi ∈ F (xα1 ) + Wxα1 . Since F (xα1 ) + Wxα1 is convex we have (x, f (x)) ∈ Uδα1 (xα1 ) × (F (xα1 ) + Wxα1 ) ⊂ U. Note that f (x) is contained in Y either because Y is convex or because F (xα1 ) + Wxα1 ⊂ Y . Since x was arbitrary, we have shown that Gr(f ) ⊂ U. We now study correspondences constructed from given correspondences by taking a convex combination, where the weights are given by a partition of unity. Let X be a compact metric space and let V be a topological vector space. Since addition and scalar multiplication are continuous, Proposition 4.5.9 and Lemma 4.5.10 imply that the composition (α, K) 7→ {α} × L 7→ αK = { αv : v ∈ K } (∗) and the Minkowski sum (K, L) 7→ K × L 7→ K + L := { v + w : v ∈ K, w ∈ L } (∗∗) are continuous functions from R × K(V ) and K(V ) × K(V ) to K(V ). These operations define continuous functions on the corresponding spaces of functions and correspondences. Let CS (X) denote the space CS (X, R) defined in Section 5.5. Lemma 8.2.2. The function (ψ, F ) 7→ ψF from CS (X)×ConS (X, V ) to ConS (X, V ) is continuous. Proof. To produce a contradiction suppose the assertion is false. Then there is a directed set (D, <) and a convergent net, say {(ψ d , F d )}d∈D with limit (ψ, F ), such that ψ d F d 6→ ψF . Failure of convergence means that there is a neighborhood W ⊂ X × V of Gr(ψF ) such that (after choosing a subnet) for every d there are points xd ∈ X and y d ∈ F d (xd ) such that (xd , ψ d (xd )y d) ∈ / W. Taking a further subnet, we may assume that xd → x and ψ d (xd ) → α. For each y ∈ F (x) there are neighborhoods Ty and Uy of x and y such T that Ty × UyS⊂ W . Let Uy1 , . . . , Uym be a finite subcover of F (x), and set T := j Tyj and U := j Uyj . Then T and U are neighborhoods of x and F (x) such that T × U ⊂ W . The continuity of (∗) and (∗∗) implies that there are neighborhoods A of α and U of F (x) such that α′ K ⊂ U whenever α′ ∈ A and K ⊂ U. By replacing T with a smaller neighborhood of x if need be, we can insure that ψ(x′ ) ∈ A and F (x′ ) ⊂ U for all x′ ∈ T . Then the set of (ψ ′ , F ′ ) such that ψ ′ (x′ ) ∈ A and F ′ (x′ ) ⊂ U for all 112 CHAPTER 8. 
ESSENTIAL SETS OF FIXED POINTS x′ ∈ T is a neighborhood of (ψ, F ), so when d is “large” we will have (ψ d , F d ) in this neighborhood and xd ∈ T , which implies that {xd } × ψ d (xd )F d (xd ) ⊂ T × U ⊂ W. This contradicts our supposition, so the proof is complete. The proof of the following follows the same pattern, and is left to the reader. Lemma 8.2.3. The function (F1 , F2 ) 7→ ψF1 + F2 from ConS (X, V ) × ConS (X, V ) to ConS (X, V ) is continuous. If ψ1 , . . . , ψk is a partition of unity subordinate to this cover and F1 , . . . , Fk ∈ Con(X, V ), then each Fi may be regarded as a continuous function from X to K(V ), so we may define a new continuous function from X to K(V ) by setting (ψ1 F1 + · · · + ψk Fk )(x) := ψ1 (x)F1 (x) + · · · + ψk (x)Fk (x). A continuous function from X to K(V ) is the same thing as an upper semicontinuous compact valued correspondence, so we may regard ψ1 F1 + · · · + ψk Fk as an element of Con(X, V ). Let PU k (X) be the space of k-element partitions of unity ψ1 , . . . , ψk of X. We endow PU k (X) with the relative topology it inherits as a subspace of CS (X)k . The last two results now imply: Proposition 8.2.4. The function (ψ1 , . . . , ψk , F1 , . . . , Fk ) 7→ ψ1 F1 + · · · + ψk Fk from PU k (X) × ConS (X, V )k to ConS (X, V ) is continuous. 8.3 Kinoshita’s Theorem Let X be a compact convex subset of a locally convex topological vector space, and fix a particular F ∈ Con(X, X). Definition 8.3.1. A set K ⊂ F P(F ) is an essential set of fixed points of F if it is compact and for any open U ⊃ K there is a neighborhood V ⊂ ConS (X, X) of F such that F P(F ′ ) ∩ U 6= ∅ for all F ′ ∈ V . The following result from Kinoshita (1952) is a key element of the theory of essential sets. Theorem 8.3.2. (Kinoshita) If K ⊂ F P(F ) is essential and K1 , . . . , Kk is a partition of K into disjoint compact sets, then some Kj is essential. Proof. Suppose that no Kj is essential. Then for each j = 1, . . . , k there is a neighborhood Uj of Kj such that for every neighborhood Vj ⊂ ConS (X, X) there is an Fj ∈ Vj with no fixed points in Uj . Replacing the Uj with smaller neighborhoods if need be, we can assume that they are pairwise disjoint. Let U be a neighborhood of X \(U1 ∪. . .∪Uk ) whose closure does not intersect K. A compact Hausdorff space is 8.3. KINOSHITA’S THEOREM 113 normal, so Theorem 6.2.2 implies the existence of a partition of unity ϕ1 , . . . , ϕk , ϕ : X → [0, 1] subordinate to the open cover U1 , . . . , Uk , U. Let V ⊂ ConS (X, X) be a neighborhood of F . Proposition 8.2.4 implies that there are neighborhoods V1 , . . . , Vk ⊂ ConS (X, X) of F such that ϕ1 F1 + · · · + ϕk Fk + ϕF ∈ V whenever F1 ∈ V1 , . . . , Fk ∈ Vk . For each j we can choose a Fj ∈ Vj that has no fixed points in Uj . Then ϕ1 F1 + · · · + ϕk Fk + ϕF has no fixed points in X \ U because on each Uj \ U it agrees with Fj . Since X \ U is a neighborhood of K and V was arbitrary, this contradicts the assumption that K is essential. Recall that a topological space is connected if it is not the union of two disjoint nonempty open sets. A subset of a topological space is connected if the relative topology makes it a connected space. Corollary 8.3.3. A minimal essential set is connected. Proof. Let K be an essential set. If K is not connected, then there are disjoint open sets U1 , U2 such that K ⊂ U1 ∪ U2 and K1 := K ∩ U1 and K2 := K ∩ U2 are both nonempty. Since K1 and K2 are closed subsets of K, they are compact, so Kinoshita’s theorem implies that either K1 or K2 is essential. 
Consequently K cannot be minimal. Naturally we would like to know whether minimal essential sets exist. Because of important applications in game theory, we will develop the analysis in the context of a slightly more general concept. Definition 8.3.4. A pointed space is a pair (A, a0 ) where A is a topological space and a0 ∈ A. A pointed map f : (A, a0 ) → (B, b0 ) between pointed spaces is a continuous function f : A → B with f (a0 ) = b0 . Definition 8.3.5. Suppose (A, a0 ) is a pointed space and Q : (A, a0 ) → (ConS (X, X), F ) is a pointed map. A nonempty compact set K ⊂ F P(F ) is Q-robust if, for every neighborhood V ⊂ X of K, there is a neighborhood U ⊂ A of a0 such that F P(Q(a)) ∩ V 6= ∅ for all a ∈ U. A set of fixed points is essential if and only if it is Id(ConS (X,X),F ) -robust. At the other extreme, if Q is a constant function, so that Q(a) = F for all a, then any nonempty compact K ⊂ F P(F ) is Q-robust. The weakening of the notion of an essential set provided by this definition is useful when certain perturbations of F are thought to be more relevant than others, or when the perturbations of F are derived from perturbations of the parameter a in a neighborhood of a0 . Some of the most important refinements of the Nash equilibrium concept have this form. In particular, Jiang (1963) defines essential Nash equilibria, and essential sets of Nash equilibria, in terms of perturbations of the game’s payoffs, while Kohlberg and Mertens (1986) define stable sets of Nash equilibria in terms of those perturbations of the payoffs that are induced by the trembles of Selten (1975). Lemma 8.3.6. F P(F ) is Q-robust. 114 CHAPTER 8. ESSENTIAL SETS OF FIXED POINTS Proof. The continuity of F P (Theorem 5.2.1) implies that for any neighborhood V ⊂ X of F P(F ) there is a neighborhood U ⊂ A of a0 such that F P(Q(a)) ⊂ V for all a ∈ U. The Fan-Glicksberg fixed point theorem implies that F P(Q(a)) is nonempty. This result shows that if our goal is to discriminate between some fixed points and others, these concepts must be strengthened in some way. The two main methods for doing this are to require either connectedness or minimality. Definition 8.3.7. A nonempty compact set K ⊂ F P(F ) is a minimal Q-robust set if it is Q-robust and minimal in the class of such sets: K is Q-robust and no proper subset is Q-robust. A minimal connected Q-robust set is a connected Q-robust set that does not contain a proper subset that is connected and Q-robust. In general a minimal Q-robust set need not be connected. For example, if (A, a0 ) = ((−1, 1), 0) and Q(a)(t) = argmaxt∈[0,1] at (so that F (t) = [0, 1] for all t) then F P(Q(a)) is {0} if a < 0 and it is {1} if a > 0, so the only minimal Q-robust set is {0, 1}. In view of this one must be careful to distinguish between a minimal connected Q-robust set and a minimal Q-robust set that happens to be connected. Theorem 8.3.8. If K ⊂ F P(F ) is a Q-robust set, then it contains a minimal Q-robust set, and if K is a connected Q-robust set, then it contains a minimal connected Q-robust set. Proof. Let C be the set of Q-robust sets that are contained in K. We order this set by reverse inclusion, so that our goal is to show that C has a maximal element. This follows from Zorn’s lemma if we can show that any completely ordered subset O has an upper bound in C. The finite intersection property implies that the intersection of all elements of O is nonempty; let K∞ be this intersection. 
If K∞ is not Qrobust, then there is a neighborhood V of K∞ such that every neighborhood U of a0 contains a point a such that Q(a) has no fixed points in V . If L ∈ O, we cannot have L ⊂ V because L is Q-robust, but now { L \ V : L ∈ O } is a collection of compact sets with the finite intersection property, so it has a nonempty intersection that is contained in K∞ but disjoint from V . Of course this is absurd. The argument for connected Q-robust sets follows the same lines, except that in addition to showing that K∞ is Q-robust, we must also show that it is connected. If not there are disjoint open sets V1 and V2 such that K∞ ⊂ V1 ∪ V2 and K∞ ∩ V1 6= ∅= 6 K∞ ∩ V2 . For each L ∈ O we have L ∩ V1 6= ∅ = 6 L ∩ V2 , so L \ (V1 ∪ V2 ) must be nonempty because L is connected. As above, { L \ (V1 ∪ V2 ) : L ∈ O } has a nonempty intersection that is contained in K∞ but disjoint from V1 ∪ V2 , which is impossible. Chapter 9 Approximation of Correspondences In extending fixed point theory from functions to correspondences, an important method is to show that continuous functions are dense in the space of correspondences, so that any correspondence can be approximated by a function. In the last chapter we saw such a result (Theorem 8.2.1) for convex valued correspondences, but much greater care and ingenuity is required by the arguments showing that contractible valued correspondences have good approximations. This chapter states and proves the key result in this direction. This result was proved in the Euclidean case by Mas-Colell (1974) and extended to ANR’s by the author in McLennan (1991). 9.1 The Approximation Result Our main result can be stated rather easily. We now fix ANR’s X and Y . We assume throughout this chapter that X is separable, in order to be able to invoke the domination theorem. Theorem 9.1.1 (Approximation Theorem). Suppose that C and D are compact subsets of X with C ⊂ int D. Let F : D → Y be an upper semicontinuous contractible valued correspondence. Then for any neighborhood U of Gr(F |C ) there are: (a) a continuous f : C → Z with Gr(f ) ⊂ U; (b) a neighborhood U ′ of Gr(F ) such that, for any two continuous functions f0 , f1 : D → Y with Gr(f0 ), Gr(f1 ) ⊂ U ′ , there is a homotopy h : C × [0, 1] → Y with h0 = f0 |C , h1 = f1 |C , and Gr(ht ) ⊂ U for all 0 ≤ t ≤ 1. Roughly, (a) is an existence result, while (b) is uniqueness up to effective equivalence. Here, and later in the book, things would be much simpler if we could have C = D. More precisely, it would be nice to drop the assumption that C ⊂ int D. This may be possible (that is, I do not know a relevant counterexample) but a proof would certainly involve quite different methods. 115 116 CHAPTER 9. APPROXIMATION OF CORRESPONDENCES The following is an initial indication of the significance of this result. Theorem 9.1.2. If X is a compact ANR with the fixed point property, then any upper semicontinuous contractible valued correspondence F : X → X has a fixed point. Proof. In the last result let Y = X and C = D = X. Endow X with a metric dX . For each j = 1, 2, . . . let Uj := { (x′ , y ′ ) ∈ X × X : dX (x, x′ ) + dX (y, y ′) < 1/j } for some (x, y) ∈ Gr(F ), let fj : X → X be a continuous function with Gr(fj ) ⊂ Uj , let zj be a fixed point of fj , and let (x′j , yj′ ) be a point in Gr(F ) with dX (x′j , zj ) + dX (yj′ , zj ) < 1/j. Passing to convergent subsequences, we find that the common limit of the sequences {x′j }, {yj′ }, and {zj } is a fixed point of F . 
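The limiting argument in this proof is easy to visualize in one dimension. The following Python sketch is purely illustrative: the correspondence F, the approximating functions f_j, and the bisection routine are choices made here for the example, not constructions taken from the text. It exhibits fixed points of graph-approximations of F converging to the unique fixed point of F.

# Numerical sketch of the limiting argument in the proof of Theorem 9.1.2,
# in the simplest case X = [0, 1].

def F(x):
    """An upper semicontinuous correspondence on [0,1] with interval
    (hence contractible) values; its only fixed point is x = 0.5."""
    if x < 0.5:
        return (0.6, 0.9)
    if x > 0.5:
        return (0.1, 0.35)
    return (0.1, 0.9)

def make_approximation(delta):
    """A continuous f whose graph lies within delta of Gr(F)."""
    def f(x):
        if x <= 0.5 - delta:
            return 0.75
        if x >= 0.5 + delta:
            return 0.225
        # linear interpolation across the jump at x = 0.5
        t = (x - (0.5 - delta)) / (2 * delta)
        return 0.75 + t * (0.225 - 0.75)
    return f

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on g(x) = f(x) - x; here g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for j in (10, 100, 1000, 10000):
    f_j = make_approximation(1.0 / j)
    z_j = fixed_point(f_j)
    print(f"1/j = {1.0/j:<8g}  fixed point of f_j = {z_j:.6f}")
# The z_j converge to 0.5, the unique fixed point of F, mirroring the
# subsequence argument at the end of the proof.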
Much later, applying Theorem 9.1.1, we will show that a nonempty compact contractible ANR has the fixed point property. 9.2 Extending from the Boundary of a Simplex The proof of Theorem 9.1.1 begins with a concrete geometric construction that is given in this section. In subsequent sections we will transport this result to increasingly general settings, eventually arriving at our objective. We now fix a locally convex topological vector space T and a convex Q ⊂ T . A subset Z of a vector space is balanced if λz ∈ Z whenever z ∈ Z and |λ| ≤ 1. Since T is locally convex, every neighborhood of the origin contains a convex neighborhood U, and U ∩ −U is a neighborhood that is convex and balanced. Working with balanced neighborhoods of the origin allows us to not keep track of the difference between a neighborhood and its negation. Proposition 9.2.1. Let A and B be convex balanced neighborhoods of the origin in T with 2A ⊂ B. Suppose S ⊂ Q is compact and c : S×[0, 1] → S is a contraction for which there is a δ > 0 such that c(s, t) − c(s′ , t′ ) ∈ B for all (s, t), (s′, t′ ) ∈ S × [0, 1] with s − s′ ∈ 3A and |t − t′ | < δ. Let L be a simplex. Then any continuous f ′ : ∂L → (S + A) ∩ Q has a continuous extension f : L → (S + B) ∩ Q. Proof. Let β be the barycenter of L. We define “polar coordinate” functions y : L \ {β} → ∂L and t : L \ {β} → [0, 1) implicitly by requiring that (1 − t(x))y(x) + t(x)β = x. Let L1 = t−1 ([0, 31 ]), L2 = t−1 ([ 13 , 32 ]), L3 = t−1 ([ 32 , 1)) ∪ {β}. We first define f at points in L2 , then extend to L1 and L3 . 9.2. EXTENDING FROM THE BOUNDARY OF A SIMPLEX 117 Let d be a metric on L. Since f ′ , t(·), and y(·) are continuous, and L2 is compact, for some sufficiently small λ > 0 it is the case that f ′ (y(x)) − f ′ (y(x′)) ∈ A and |t(x) − t(x′ )| < 31 δ for all x, x′ ∈ L2 such that d(x, x′ ) < λ. There is a polyhedral subdivision of L2 whose cells are the sets y −1 (F ) ∩ t−1 ( 31 ), y −1 (F ) ∩ L2 , y −1(F ) ∩ t−1 ( 32 ) for the various faces F of L. Proposition 2.5.2 implies that repeated barycentric subdivision of this polyhedral complex results eventually in a simplicial subdivision of L2 whose mesh is less than λ. For each vertex v of this subdivision choose s(v) ∈ (f ′ (y(v)) + A) ∩ S, and set f (v) := c(s(v), 3t(v) − 1). Let ∆ be a simplex of the subdivision of L2 with vertices v1 , . . . , vr . We define f on ∆ by linear interpolation on ∆: if x = α1 v1 + · · · + αr vr , then f (x) := α1 f (v1 ) + · · · + αr f (vr ). This definition does not depend on the choice of ∆ if x is contained in more than one simplex, it is continuous on each ∆, and the simplices are a finite closed cover of L2 , so f is continuous. Suppose that v and v ′ are two vertices of ∆, so they are the endpoints of an edge. We have d(v, v ′) < λ, so f ′ (y(v)) − f ′ (y(v ′)) ∈ A and |t(v) − t(v ′ )| < 31 δ. In addition, s(v) − f ′ (y(v)) and f ′ (y(v ′)) − s(v ′ ) are elements of A, so s(v) − s(v ′ ) ∈ 3A and |(3t(v) − 1) − (3t(v ′ ) − 1)| < δ, from which it follows, by hypothesis, that f (v) − f (v ′ ) ∈ B. Consider a point x = α1 v1 + · · · + αr vr ∈ ∆. Since f (v1 ) ∈ S and f (x) − f (v1 ) = r X j=1 αj (f (vj ) − f (v1 )) is a convex combination of the vectors f (vj ) − f (v1 ) for the vertices vj of ∆, we have f (x) ∈ (f (v1 ) + B) ∩ Q ⊂ (S + B) ∩ Q. Thus f (L2 ) ⊂ (S + B) ∩ Q. We now define f on L1 by setting f (x) := (1 − 3t(x))f ′ (y(x)) + 3t(x)f ( 31 β + 32 y(x)). Since f is continuous on L2 , this formula defines a continuous function. 
Suppose that 2 y(x) + 13 β = α1 v1 + · · · + αr vr 3 as above. Consider a particular vj . Above we showed that f ( 32 y(x) + 13 β) ∈ (s(vj ) + B) + Q. 118 CHAPTER 9. APPROXIMATION OF CORRESPONDENCES The point s(vj ) was chosen with f ′ (y(vj )) − s(vj ) ∈ A, and f ′ (y(x)) − f ′ (y(vj )) ∈ A because d( 32 y(x) + 13 β, y(vj )) < λ, so f ′ (y(x)) ∈ (s(vj ) + 2A) ∩ Q ⊂ (s(vj ) + B) ∩ Q. Since f (x) is a convex combination of f ′ (y(x)) and f ( 32 y(x) + 13 β) we have f (x) ∈ (s(vj ) + B) ∩ Q ⊂ (S + B) ∩ Q. Thus f (L1 ) ⊂ (S + B) ∩ Q. Let z be the point S is contracted to by c: c(S, 1) = {z}. We define f on L3 by setting f (x) := z. Of course this is a continuous function whose image is contained in S ⊂ (S + B) ∩ Q. If x ∈ L1 ∩ L2 , then t(x) = 31 and 23 y(x) + 31 β = x, so the formula defining f on L1 agrees with the definition of f for elements of L2 at x. If v is a vertex of the subdivision of L2 contained in L2 ∩ L3 , then t(v) = 23 , so that the definition of f on L2 gives f (v) = c(s(v), 3t(v) − 1) = z. If x ∈ L2 ∩ L3 , then L2 ∩ L3 contains any simplex of the subdivision of L2 that has x as an element, and the definition of f on L2 gives f (x) = z. Thus this definition agrees with the definition of f on L2 at points in L2 ∩ L3 . Thus f is well defined and continuous. 9.3 Extending to All of a Simplicial Complex As above, Q is a convex subset of T , and we now fix a relatively open Z ⊂ Q. We also fix a simplicial complex K and a subcomplex J. Proposition 9.3.1. Let F : K → Z be an upper semicontinuous contractible valued correspondence. Then for any neighborhood W ⊂ K × Z of Gr(F ) there is a neighborhood W ′ of Gr(F |J ) such that any continuous f ′ : J → Z with Gr(f ′ ) ⊂ W ′ has a continuous extension f : K → Z with Gr(f ) ⊂ W . The main argument will employ two technical results, the first of which will also be applied in the next section. Recall that an ANR can be embedded in a normed space (Proposition 7.4.3) so it is metrizable. Lemma 9.3.2. Let X be an ANR, let F : X → Z be an upper semicontinuous correspondence with metric d, and let V ⊂ X × Z be a neighborhood of Gr(F ). For any x ∈ X there is δ > 0 and a neighborhood B of the origin in Z such that Uδ (x) × ((F (x) + B) ∩ Z) ⊂ V . Proof. By the definition of the product topology, for every z ∈ F (x) there exist δz > 0 and an open neighborhood Az ⊂ Z of the origin in T such that Uδz (x) × ((z + Az ) ∩ Z) ⊂ V, and the continuity of addition in T implies that there is a neighborhood Bz of the origin with Bz + Bz ⊂ Az . Since F (x) is compact there are z1 , . . . ,T zK such that z1 + Bz1 , . . . , zk + Bzk is a cover of F (x). Let δ := minj δzj and B := j Bzj . 9.3. EXTENDING TO ALL OF A SIMPLICIAL COMPLEX 119 Lemma 9.3.3. Let U1 , . . . , Un be a cover of a metric space X by open sets, none of which are X itself. For each y ∈ X let ry = max sup{ ε > 0 : Uε (y) ⊂ Ui }, i:y∈Ui and let Vy be an open subset of U(√5−2)ry (y) that contains y. Then for all y, y ′ ∈ X, if Vy ∩ Vy′ 6= ∅, then Vy′ ⊂ Ury (y). √ √ Proof. Let α = 5−2 and β = 3− 5. Suppose Vy ∩Vy′ 6= ∅. The distance from y to any point in Vy′ cannot exceed α(ry + 2ry′ ), so if Vy′ is not contained in Ury (y), then α(ry +2ry′ ) > ry , which boils down to 2αry′ > βry . Let iy′ be one of the indices such that Ury′ (y) ⊂ Uiy . We claim that x ∈ Uiy′ because ry′ > α(ry + ry′ ), which reduces to βry′ > αry . A quick computation verifies that β/2α > α/β, so this follows from the inequality above. 
Since y ∈ Uiy′ , and the distance from y to y ′ is less than α(ry +ry′ ), we have ry > ry′ −α(ry +ry′ ), which reduces to (α−1)r √ y > βry′ . Together this inequality and the one above imply that 2α/β > (3 − 5)/(α − 1), but one may easily compute that in fact these two quantities are equal. This contradiction completes the proof. Proof of Proposition 9.3.1. Let m be the largest dimension of any simplex in K that is not in J. The main idea is to use induction on m, but one of the methods used in the construction is subdivision of K, and the formulation of the induction hypothesis must be sensitive to this. Precisely, we will show that for each k = 0, . . . , m there is a neighborhood Wk ⊂ W of Gr(F ) and a simplicial subdivision of K such that if Hk is the union of J with the k-skeleton of some further subdivision, then any f ′ : J → Z with Gr(f ′ ) ⊂ Wk has an extension f : Hk → Z with Gr(f ) ⊂ W . For k = 0 the claim is obvious: we can let W0 = W and take K itself without any further subdivision. By induction we may assume that the claim has already been established with k − 1 in place of k. That is, there is a neighborhood Wk−1 ⊂ W of Gr(F ) and a simplicial subdivision of K such that if Hk−1 is the union of J with the (k − 1)-skeleton of some further subdivision, then any f ′ : J → Z with Gr(f ′ ) ⊂ Wk−1 has an extension f : Hk−1 → Z with Gr(f ) ⊂ W . We now develop two open coverings of K. Consider a particular x ∈ K. Fix a contraction cx : F (x) × [0, 1] → F (x). Lemma 9.3.2 allows us to choose a convex balanced neighborhood Bx of the origin in T and δx > 0 such that Ux × (F (x) + Bx ) ∩ Z ⊂ Wk−1 where Ux := Uδx (x). By choosing Bx sufficiently small we can also have (F (x) + Bx ) ∩ Q ⊂ Z. Since cx is continuous, we can choose a convex balanced neighborhood Ax of the origin in T and a number δx > 0 such that cx (z ′ , t′ ) ∈ cx (z, t)+Bx for all (z, t), (z ′ , t′ ) ∈ F (x) × [0, 1] such that z ′ − z ∈ 3Ax and |t′ − t| < δx . Replacing Ax with a smaller convex neighborhood if need be, we may assume that 2Ax ⊂ Bx . Since F is upper semicontinuous and δx may be replaced by a smaller positive number, we can 120 CHAPTER 9. APPROXIMATION OF CORRESPONDENCES ′ insure that F (x′ ) ⊂ F (x) + 21 Ax whenever Tnx ∈ Ux . Choose x1 , . . . , xn such that Ux1 , . . . , Uxn is a covering of K. Let A := i=1 Axi . The second open covering of K is finer. For each y ∈ K let ry = max sup{ ε > 0 : Uε (y) ⊂ Ui }. i:y∈Ui The upper semicontinuity of F implies that each y has an open neighborhood Vy such that F (y ′) ⊂ F (y) + 12 A for all y ′ ∈ Uεy (y). We can replace Vy with a smaller neighborhood to bring about Vy ⊂ U(√5−2)ry (y). Choose y1 , . . . , yp ∈ K such that Vy1 , . . . , Vyp cover K. Set Wk := p [ j=1 Vyj × ((F (yj ) + 12 A) ∩ Z). Evidently Gr(F ) ⊂ Wk . We have Wk ⊂ Wk−1 because for each j there is some i such that Vyj ⊂ Uxi and (F (yj ) + 21 A) ∩ Z ⊂ ((F (xi ) + 21 Axi ) + 12 A) ∩ Z ⊂ (F (xi ) + Axi ) ∩ Z. Starting with the subdivision of K obtained at stage k − 1, by Proposition 2.5.2 repeated barycentric subdivision leads eventually to a subdivision of K with each simplex contained in some Vyj . Let Hk be the union of J with the k-skeleton of some further subdivision, and fix a continuous f ′ : J → Z with Gr(f ′ ) ⊂ Wk . By the induction hypothesis there is an extension f of f ′ to the (k − 1)-skeleton of the further subdivision. 
Since extensions to each of the k-simplices that are in Hk but not in J combine to give the desired sort of extension, it suffices to show that there is an extension to a single such k-simplex L. By construction there is a j such that L ⊂ Vyj . Let J be the set of j ′ such that Vyj ∩ Vyj′ 6= ∅. There is some Xi with Vyj′ ⊂ Uxi for all j ′ ∈ J, either because all of K is contained in a single Xi or as an application of the lemma above. The conditions imposed on our construction imply that [ [ F (yj ′ ) + 21 Axi ⊂ F (xi ) + Axi . f (∂L) ⊂ F (yj ′ ) + 21 A ⊂ j ′ ∈J j ′ ∈J Now Lemma 9.2.1, with Ax , Bx , δx , F (x), and f |∂L in place of A, B, δ, S, and f ′ , gives a continuous extension f : L → Z with f (L) ⊂ (F (xi ) + Bxi ) ∩ Q, and by construction this set is contained in Z. The proof is complete. 9.4 Completing the Argument The next step is a result in which the domains are subsets of the ANR X. Proposition 9.4.1. Suppose that C ⊂ D ⊂ X where C and D are compact with C ⊂ int D. Let F : D → Z be an upper semicontinuous contractible valued correspondence. Then for any neighborhood V of Gr(F |C ) there exist: (a) a continuous f : C → Z with Gr(f ) ⊂ V ; 9.4. COMPLETING THE ARGUMENT 121 (b) a neighborhood V ′ of Gr(F ) such that for any two functions f0 , f1 : D → Z with Gr(f0 ), Gr(f1 ) ⊂ V ′ there is a homotopy h : C×[0, 1] → Z with h0 = f0 |C , h1 = f1 |C , and Gr(ht ) ⊂ V for all 0 ≤ t ≤ 1. The passage from this to the main result is straightforward. Proof of Theorem 9.1.1. Recall (Proposition 7.4.1) that an ANR is a retract of a relatively open subset of a convex subset of a locally convex space. In particular, we now fix a locally convex space T , an open subset Z of a convex subset of T , and a retraction r : Z → Y . Let i : Y → Z be the inclusion. Let V := (IdX × r)−1 (U). Proposition 9.4.1(a) implies that there is a continuous f ′ : C → Z with Gr(f ′ ) ⊂ V , and setting f := r ◦ f ′ verifies (a) of Theorem 9.1.1. Let V ′ ⊂ V be a neighborhood of Gr(i ◦ F ) with the property asserted by Proposition 9.4.1(b). Let U ′ := (IdX × i)−1 (V ′ ). Suppose that f0 , f1 : D → Y with Gr(f0 ), Gr(f1 ) ⊂ U ′ . Then there is a homotopy h : C × [0, 1] → Z with h0 = i ◦ f0 |C , h1 = i ◦ f1 |C , and Gr(ht ) ⊂ V for all 0 ≤ t ≤ 1, so that r ◦ h0 = f0 |C , r ◦ h1 = f1 |C , and Gr(r ◦ ht ) ⊂ U for all 0 ≤ t ≤ 1. This confirms (b) of Theorem 9.1.1. The proof of Proposition 9.4.1 depends on two more technical lemmas. Below d denotes a metric for X. For the two lemmas below an upper semicontinuous correspondence F : X → Z is given. Lemma 9.4.2. Suppose that C ⊂ X is compact, and V ⊂ C × Z is a neighborhood of Gr(F |C ). Then there is ε > 0 and a neighborhood Ṽ of Gr(F ) such that [ Uε (x) × {z} ⊂ V. (x,z)∈Ṽ Proof. For each x ∈ C Lemma 9.3.2 allows us to choose δx > 0 and a neighborhood Ax of F (x) such that Uδx (x) × Ax ⊂ V . Replacing δx with a smaller number if need be, we may assume without loss of generality that F (x′ ) ⊂ Ax for all x′ ∈ Uδx (x). Choose x1 , . . . , xH such that Uδx1 /2 (x1 ), . . . , UδxH /2 (xH ) cover C. Let ε := min{δxi /2}, and set [ Ṽ := Uδxi /2 (xi ) × Axi . i Lemma 9.4.3. Suppose that f : S → X is a continuous function, where S is a compact metric space. If U is a neighborhood of Gr(F ◦ f ), then there is a neighborhood V of Gr(F ) such that (f × IdZ )−1 (V ) ⊂ U. 122 CHAPTER 9. APPROXIMATION OF CORRESPONDENCES Proof. Consider a particular x ∈ X. 
Applying Lemma 9.3.2, for any s ∈ f −1 (x) we can choose a neighborhood Ns of s and a neighborhood As ⊂ Y of F (x) such that Ns × As ⊂ U. Since f −1 (s) is compact, there are s1 , . . . , sℓ such that Ns1 , . . . , Nsℓ cover f −1 (s). Let A := As1 ∩ . . . ∩ Asℓ , and let W be a neighborhood of x small that f −1 (W ) ⊂ Ns1 ∪ . . . ∪ Nsℓ and F (x′ ) ⊂ A for all x′ ∈ W . (Such a W must exist because S is compact and F is upper semicontinuous.) Then [ (f × IdY )−1 (W × A) ⊂ Nsi × A ⊂ U. i Since x was arbitrary, this establishes the claim. Proof of Proposition 9.4.1. Lemma 9.4.2 gives a neighborhood V ′′ of Gr(F ) and ε > 0 such that [ Uε (x) × {z} ⊂ V. (x,z)∈V ′′ After replacing ε with a smaller number, Uε (C) is contained in the interior of D. Because X is separable, the domination theorem (Theorem 7.6.3) implies that there is a simplicial complex K that ε-dominates D by virtue of the maps ϕ : D → K and ψ : K → X. Let W ′′ := (ψ × IdZ )−1 (V ′′ ). Since ψ ◦ ϕ is ε-homotopic to IdD we have ϕ(C) ⊂ ψ −1 (Uε (C)). Since ϕ(C) is compact and ψ −1 (Uε (C)) is open, Proposition 2.5.2 implies that after repeated subdivisions of K the subcomplex H consisting of all simplices that intersect ϕ(C) will satisfy ψ(H) ⊂ Uε (C). Since W ′′ is a neighborhood of Gr(F ◦ψ|H ), Proposition 9.3.1 implies the existence of a function f ′ : H → Z with Gr(f ′ ) ⊂ W ′′ . Let f := f ′ ◦ ϕ|C . Then Gr(f ) ⊂ V , which verifies (a), because [ (ϕ|C × IdZ )−1 (W ′′ ) = ((ψ ◦ ϕ|C ) × IdZ )−1 (V ′′ ) ⊂ Uε (x) × {z} ⊂ V. (∗) (x,z)∈V ′′ Turning to (b), let G : H × [0, 1] → Z be the correspondence G(z, t) = F (ψ(z)). We apply Proposition 9.3.1, with G, H × [0, 1], W ′′ × [0, 1], and H × {0, 1} in place of F , K, W , and J respectively, obtaining neighborhoods W0′ , W1′ ⊂ W ′′ of Gr(F ◦ψ|H ) such that for any continuous functions f0′ , f1′ : H → Z with Gr(f0′ ) ⊂ W0′ and Gr(f1′ ) ⊂ W1′ , there is a homotopy h′ : H × [0, 1] → Z with h′0 = f0′ , h′1 = f1′ , and Gr(h′t ) ⊂ W ′′ for all t. Let W ′ = W0′ ∩ W1′ . Lemma 9.4.3 implies that there is a neighborhood V ′ of Gr(F ) such that (ψ|H × IdZ )−1 (V ′ ) ⊂ W ′. Replacing V ′ with V ′ ∩ V ′′ if need be, we may assume that V ′ ⊂ V ′′ . Now consider continuous f0 , f1 : D → Z with Gr(f0 ), Gr(f1 ) ⊂ V ′ . We have Gr(f0 ◦ ψ|H ), Gr(f1 ◦ ψ|H ) ⊂ W ′ . Therefore there is a homotopy j : H×[0, 1] → Z with j0 = f0 ◦ψ|H , j1 = f1 ◦ψ|H , and Gr(jt ) ⊂ W ′′ for all t. Let h′′ : C ×[0, 1] → Z be the homotopy h′′ (x, t) = j(ϕ(x), t). In view of (∗) we have Gr(h′′t ) ⊂ (ϕ|C × IdZ )−1 (W ′′ ) ⊂ V 9.4. COMPLETING THE ARGUMENT 123 for all t. Of course h′′0 = f0 ◦ ψ ◦ ϕ|C and h′′1 = f1 ◦ ψ ◦ ϕ|C . We now construct a homotopy h′ : C×[0, 1] → Z with h′0 = f0 |C , h′1 = f0 ◦ψ◦ϕ|C , and Gr(h′t ) ⊂ V for all t. Let η : D × [0, 1] → X be an ε-homotopy with η0 = IdD and η1 = ψ ◦ ϕ, and define h′ by h′ (x, t) := f0 (η(x, t)). Then h′ has the desired endpoints, and for all (x, t) in the domain of h′ we have (x, h′t (x)) ∈ V because d(x, η(x, t)) < ε and (η(x, t), h′t (x)) = (η(x, t), f0 (η(x, t))) ∈ V ′ ⊂ V ′′ . Similarly, there is a homotopy h′′′ : C × [0, 1] → Z with h′′′ 0 = f0 ◦ ψ ◦ ϕ|C , ′′′ ′′′ h1 = f1 |C , and Gr(ht ) ⊂ V for all t. To complete the proof of (b) we construct a homotopy h by setting ht = h′3t for 0 ≤ t ≤ 1/3, ht = h′′3t−1 for 1/3 ≤ t ≤ 2/3, and ht = h′′′ 3t−2 for 2/3 ≤ t ≤ 1. 
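The last step of this proof, running three homotopies in succession by rescaling time, is a completely general construction, and the following Python sketch records it. The straight-line homotopies and the maps f0, f_middle, f1 are illustrative stand-ins chosen here; they are not the objects of the proof.

# A minimal sketch of the time rescaling used at the end of the proof of
# Proposition 9.4.1: each of three homotopies is run at triple speed on a
# third of the time interval.

def concatenate(h1, h2, h3):
    """Return h with h(x, 0) = h1(x, 0) and h(x, 1) = h3(x, 1).

    Continuity at t = 1/3 and t = 2/3 requires that the endpoints match:
    h1(x, 1) = h2(x, 0) and h2(x, 1) = h3(x, 0), exactly as in the text,
    where consecutive homotopies share an endpoint."""
    def h(x, t):
        if t <= 1/3:
            return h1(x, 3 * t)
        if t <= 2/3:
            return h2(x, 3 * t - 1)
        return h3(x, 3 * t - 2)
    return h

def straight_line(f, g):
    """Straight-line homotopy between real-valued maps f and g."""
    return lambda x, t: (1 - t) * f(x) + t * g(x)

# Illustrative maps on [0, 1] playing the roles of the three endpoints.
f0       = lambda x: x ** 2
f_middle = lambda x: 0.5 * x
f1       = lambda x: 1 - x

h = concatenate(straight_line(f0, f_middle),
                straight_line(f_middle, f1),
                straight_line(f1, f1))

assert abs(h(0.3, 0.0) - f0(0.3)) < 1e-9
assert abs(h(0.3, 1.0) - f1(0.3)) < 1e-9
# Values agree from both sides at the junction t = 1/3.
assert abs(h(0.3, 1/3) - f_middle(0.3)) < 1e-9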
Part II

Smooth Methods

Chapter 10

Differentiable Manifolds

This chapter introduces the basic concepts of differential topology: ‘manifold,’ ‘tangent vector,’ ‘smooth map,’ ‘derivative.’ If these concepts are new to you, you will probably be relieved to learn that these are just the basic concepts of multivariate differential calculus, with a critical difference. In multivariate calculus you are handed a coordinate system, and a geometry, when you walk in the door, and everything is a calculation within that given Euclidean space. But many of the applications of multivariate calculus take place in spaces like the sphere, or the physical universe, whose geometry is not Euclidean. The theory of manifolds provides a language for the concepts of differential calculus that is in many ways more natural, because it does not presume a Euclidean setting. Roughly, this has two aspects:

• In differential topology spaces that are locally homeomorphic to Euclidean spaces are defined, and we then impose structure that allows us to talk about differentiation of functions between such spaces. The concepts of interest to differential topology per se are those that are “invariant under diffeomorphism,” much as topology is sometimes defined as “rubber sheet geometry,” namely the study of those properties of spaces that don’t change when the space is bent or stretched.

• The second step is to impose local notions of angle and distance at each point of a manifold. With this additional structure the entire range of geometric issues can be addressed. This vast subject is called differential geometry.

For us differential topology will be primarily a tool that we will use to set up an environment in which issues related to fixed points have a particularly simple and tractable structure. We will only scratch its surface, and differential geometry will not figure in our work at all. The aim of this chapter is to provide only as much information as we will need later, in the simplest and most concrete manner possible. Thus our treatment of the subject is in various ways terse and incomplete, even as an introduction to this topic, which has had an important influence on economic theory. Milnor (1965) and Guillemin and Pollack (1974) are recommended to those who would like to learn a bit more, and at a somewhat higher level Hirsch (1976) is more comprehensive, but still quite accessible.

10.1 Review of Multivariate Calculus

We begin with a quick review of the most important facts of multivariate differential calculus. Let f : U → Rn be a function where U ⊂ Rm is open. Recall that if r ≥ 1 is an integer, we say that f is C r if all partial derivatives of order ≤ r are defined and continuous. For reasons that will become evident in the next paragraph, it can be useful to extend this notation to include r = 0, with C 0 interpreted as a synonym for “continuous.” We say that f is C ∞ if it is C r for all finite r. An order of differentiability is either a nonnegative integer r or ∞, and we write 2 ≤ r ≤ ∞, for example, to indicate that r is such an object, within the given bounds. If f is C 1 , then f is differentiable: for each x ∈ U and ε > 0 there is δ > 0 such that

‖f (x′ ) − f (x) − Df (x)(x′ − x)‖ ≤ ε‖x′ − x‖

for all x′ ∈ U with ‖x′ − x‖ < δ, where the derivative of f at x is the linear function Df (x) : Rm → Rn given by the matrix of first partial derivatives at x.
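This defining inequality is easy to test numerically: the error of the linear approximation should vanish faster than ‖x′ − x‖. The following Python sketch does this for an illustrative C ∞ map chosen here; neither the map nor the computation is taken from the text.

# The error of the first-order approximation, divided by ||x' - x||,
# tends to zero as x' approaches x.
import numpy as np

def f(v):
    x, y = v
    return np.array([x * y, np.sin(x) + y ** 2])

def Df(v):
    """Exact Jacobian of f, the matrix of first partial derivatives."""
    x, y = v
    return np.array([[y, x],
                     [np.cos(x), 2 * y]])

x = np.array([0.7, -0.3])
direction = np.array([0.6, 0.8])          # a unit vector
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    xp = x + h * direction
    error = np.linalg.norm(f(xp) - f(x) - Df(x) @ (xp - x))
    # error / ||x' - x|| shrinks roughly in proportion to h, since f is C^2
    print(f"h = {h:g}   error / ||x' - x|| = {error / h:.3e}")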
If f is C r , then the function Df : U → L(Rm , Rn ) is C r−1 if we identify L(Rm , Rn ) with the space Rn×m of n × m matrices. The reader is expected to know the standard facts of elementary calculus, especially that addition and multiplication are C ∞ , so that functions built up from these operations (e.g., linear functions and matrix multiplication) are known to be C ∞ . There are three basic operations used to construct new C r functions from give functions. The first is restriction of the function to an open subset of its domain, which requires no comment because the derivative is unaffected. The second is forming the cartesian product of two functions: if f1 : U → Rn1 and f2 : U → Rn2 are functions, we define f1 × f2 : U → Rn1 +n2 to be the function x 7→ (f1 (x), f2 (x)). Evidently f1 × f2 is C r if and only if f1 and f2 are C r , and when this is the case we have D(f1 × f2 ) = Df1 × Df2 . The third operation is composition. The most important theorem of multivariate calculus is the chain rule: if U ⊂ Rm and V ⊂ Rn are open and f : U → V and g : V → Rp are C 1 , then g ◦ f is C 1 and D(g ◦ f )(x) = Dg(f (x)) ◦ Df (x) for all x ∈ U. Of course the composition of two C 0 functions is C 0 . Arguing inductively, suppose we have already shown that the composition of two C r−1 functions is C r−1 . If f and g are C r , then Dg ◦ f is C r−1 , and we can apply the result above about cartesian products, then the chain rule, to the composition x 7→ (Dg(f (x)), Df (x)) 7→ Dg(f (x)) ◦ Df (x) to show that D(g ◦ f ) is C r−1 , so that g ◦ f is C r . 10.1. REVIEW OF MULTIVARIATE CALCULUS 127 Often the domain and range of the pertinent functions are presented to us as vector spaces without a given or preferred coordinate system, so it is important to observe that we can use the chain rule to achieve definitions that are independent of the coordinate systems. Let X and Y be m- and n-dimensional vector spaces. (In this chapter all vector spaces are finite dimensional, with R as the field of scalars.) Let c : X → Rm and d : Y → Rn be linear isomorphisms. If U ⊂ X is open, we can say that a function f : U → Y is C r , by definition, if d ◦ f ◦ c−1 : c(U) → Rk is C r , and if this is the case and x ∈ U, then we can define the derivative of f at x to be Df (x) = d−1 ◦ D(d ◦ f ◦ c−1 )(c(x)) ◦ c ∈ L(X, Y ). Using the chain rule, one can easily verify that these definitions do not depend on the choice of c and d. In addition, the chain rule given above can be used to show that this “coordinate free” definition also satisfies a chain rule. Let Z be a third p-dimensional vector space. Then if V ⊂ Y is open, g : V → Z is C r , and f (U) ⊂ V , then g ◦ f is C r and D(g ◦ f ) = Dg ◦ Df . Sometimes we will deal with functions whose domains are not open, and we need to define what it means for such a function to be C r . Let S be a subset of X of any sort whatsoever. If Y is another vector space and f : S → Y is a function, then f is C r by definition if there is an open U ⊂ X containing S and a C r function F : U → Y such that f = F |S . Evidently being C r isn’t the same thing as having a well defined derivative at each point in the domain! Note that the identity function on S is always C r , and the chain rule implies that compositions of C r functions are C r . Those who are familiar with the category concept will recognize that there is a category of subsets of finite dimensional vector spaces and C r maps between them. 
(If you haven’t heard of categories it would certainly be a good idea to learn a bit about them, but what happens later won’t depend on this language.) We now state coordinate free versions of the inverse and implicit function theorems. Since you are expected to know the usual, coordinate dependent, formulations of these results, and it is obvious that these imply the statements below, we give no proofs. Theorem 10.1.1 (Inverse Function Theorem). If n = m (that is, X and Y are both m-dimensional) U ⊂ X is open, f : U → Y is C r , x ∈ U, and Df (x) is nonsingular, then there is an open V ⊂ U containing x such that f |V is injective, f (V ) is open in Y , and (f |V )−1 is C r . Suppose that U ⊂ X × Y is open and f : U → Z is a function. If f is C 1 , then, at a point (x, y) ∈ U, we can define “partial derivatives” Dx f (x, y) ∈ L(X, Z) and Dy f (x, y) ∈ L(Y, Z) to be the derivatives of the functions f (·, y) : { x ∈ X : (x, y) ∈ U } → Z and f (x, ·) : { y ∈ Y : (x, y) ∈ U } → Z at x and y respectively. Theorem 10.1.2 (Implicit Function Theorem). Suppose that p = n. (That is Y and Z have the same dimension.) If U ⊂ X × Y is open, f : U → Z is C r , 128 CHAPTER 10. DIFFERENTIABLE MANIFOLDS (x0 , y0 ) ∈ U, f (x0 , y0 ) = z0 , and Dy f (x0 , y0 ) is nonsingular, then there is an open V ⊂ X containing x0 , an open W ⊂ U containing (x0 , y0 ), and a C r function g : V → Y such that g(x0 ) = y0 and { (x, g(x)) : x ∈ V } = { (x, y) ∈ W : f (x, y) = z0 }. In addition Dg(x0 ) = −Dy f (x0 , y0 )−1 ◦ Dx f (x0 , y0 ). We will sometimes encounter settings in which the decomposition of the domain into a cartesian product is not given. Suppose that T is a fourth vector space, U ⊂ T is open, t0 ∈ U, f : U → Z is C r , and Df (t0 ) : T → Z is surjective. Let Y be a linear subspace of T of the same dimension as Z such that Df (t0 )|Y is surjective, and let X be a complementary linear subspace: X ∩ Y = {0} and X + Y = T . If we identify T with X × Y , then the assumptions of the result above hold. We will understand the implicit function theorem as extending in the obvious way to this setting. 10.2 Smooth Partitions of Unity A common problem in differentiable topology is the passage from local to global. That is, one is given or can prove the existence of objects that are defined locally in a neighborhood of each point, and one wishes to construct a global object with the same properties. A common and simple method of doing so is to take convex combinations, where the weights in the convex combination vary smoothly. This section develops the technology underlying this sort of argument, then develops some illustrative and useful applications. Fix a finite dimensional vector space X. Definition 10.2.1. Suppose that {Uα }α∈A is a collection of open subsets of X, S U = α Uα , and 0 ≤ r ≤ ∞. A C r partition of unity for U subordinate to {Uα } is a collection {ϕβ : X → [0, 1]}β∈B of C r functions such that: (a) for each β the closure of Vβ = { x ∈ X : ϕβ (x) > 0 } is contained in some Uα ; (b) {Vβ } is locally finite (as a cover of U); P (c) β ϕβ (x) = 1 for each x ∈ U. The first order of business is to show that such partitions of unity exist. The key idea is the following ingenious construction. Lemma 10.2.2. There is a C ∞ function γ : R → R with γ(t) = 0 for all t ≤ 0 and γ(t) > 0 for all t > 0. Proof. Let γ(t) := ( 0, t ≤ 0, −1/t e , t > 0. 129 10.2. 
SMOOTH PARTITIONS OF UNITY Standard facts of elementary calculus can be combined inductively to show that for each r ≥ 1 there is a polynomial Pr such that γ (r) (t) is Pr (1/t)e−1/t if t > 0. Since the exponential function dominates any polynomial, it follows that γ (r) (t)/t → 0 as t → 0, so that each γ (r) is differentiable at 0 with γ (r+1) (0) = 0. Thus γ is C ∞ . Note that for any open rectangle x 7→ Y i Qm i=1 (ai , bi ) ⊂ Rm the function γ(xi − ai )γ(bi − xi ) is C ∞ , positive everywhere in the rectangle, and zero everywhere else. S Lemma 10.2.3. If {Uα } is a collection of open subsets of Rm and U = α Uα , then U has a locally finite (relative to U) covering by open rectangles, each of whose closures in contained in some Uα . Proof. For any integer j ≥ 0 and vector k = (k1 , . . . , km ) with integer components let Qj,k = m Y i=1 (ki − 1)/2j , (ki + 1)/2j and Q′j,k = m Y i=1 (ki − 2)/2j , (ki + 3)/2j . The cover consists of those Qj,k such that the closure of Qj,k is contained in some Uα and, if j > 0, there is no α such that the closure of Q′j,k is contained in Uα . Consider a point x ∈ U. The last requirement implies that x has a neighborhood that intersects only finitely many cubes in the collection, which is to say that the collection is locally finite. For any j the Qj,k cover Rm , so there is some k such that x ∈ Qj,k , and if j is sufficiently small, then the closure of Qj,k is contained in some Uα . If Qj,k is not in the collection, then the closure of Q′j,k is contained in some Uα . Define k ′ by letting ki′ be ki /2 or (ki + 1)/2 according to whether ki is even or odd. Then Qj,k ⊂ Qj−1,k′ ⊂ Q′j,k . Repeating this leads eventually to an element of the collection that contains x, so the collection is indeed a cover of U. Imposing a coordinate system on X, then combining the observations above, proves that: ∞ Theorem 10.2.4. For S any collection {Uα }α∈A of open subsets of X there is a C partition of unity for α Uα subordinate to {Uα }. For future reference we mention a special case that comes up frequently: Corollary 10.2.5. If U ⊂ X is open and C0 and C1 are disjoint closed subsets of U, then there is a C ∞ function α : U → [0, 1] with α(x) = 0 for all x ∈ C0 and α(x) = 1 for all x ∈ C1 . Proof. Let {ϕ0 , ϕ1 } be a C ∞ partition of unity subordinate to the open cover {U \ C1 , U \ C0 }, and set α = ϕ1 . 130 CHAPTER 10. DIFFERENTIABLE MANIFOLDS Now let Y be a second vector space. As a first application we consider a problem that arises in connection with the definition in the last section of what it means for a C r function f : S → Y on a general domain S ⊂ X to be C r . We say that f is locally C r if each x ∈ S has a neighborhood Ux ⊂ X that is the domain of a C r function Fx : Ux → Y with Fx |S∩Ux = f |S∩Ux . This seems like the “conceptually correct” definition of what it means for a function to be C r , because this should be a local property that can be checked by looking at a neighborhood of an arbitrary point in the function’s domain. A C r function is locally C r , obviously. Fortunately the converse holds, so that the definition we have given agrees with the one that is conceptually correct. (In addition, it will often be pleasant to apply the given definition because it is simpler!) Proposition 10.2.6. If S ⊂ X and f : S → Y is locally C r , then f is C r . ∞ Proof. Let S {Fx : Ux → Y }x∈S be as above. Let {ϕβ }β∈B be a C partition of unity for U = x Ux subordinate to {Ux }. 
For each β chooseP an xβ such that the closure of { x : ϕβ (x) > 0 } is contained in Uxβ , and let F := β ϕβ · Fxβ : U → Y . Then F is C r because each point in U has a neighborhood in which it is a finite sum of C r functions. For x ∈ S we have X X F (x) = ϕβ (x) · Fxβ (x) = ϕβ (x) · f (x) = f (x). β β Here is another useful result applying a partition of unity. Proposition 10.2.7. For any S ⊂ X, C ∞ (S, Y ) is dense in CS (S, Y ). Proof. Fix a continuous f : S → Y and an open W ⊂ S × Y containing the graph of f . Our goal is to find a C ∞ function from S to Y whose graph is also contained in W . For each p ∈ S choose a neighborhood Up of p and εp > 0 small enough that f (Up ∩ S) ⊂ Uεp (f (p)) and (Up ∩ S) × U2εp (f (p)) ⊂ W. S Let U = p∈W Up . Let {ϕβ }β∈B be a C ∞ partition of unity for U subordinate to {Up }p∈S . For each β let Vβ = { x : ϕβ (x) > 0 }, choose some pβ such that Vβ ⊂ Upβ , P and let Uβ = Upβ and εβ = εpβ . Let f˜ : U → Y be the function x 7→ β ϕβ (x)·f (pβ ). Since {Vβ } is locally finite, f˜ : U → Y is C ∞ , so f˜|S is C ∞ . We still need to show that the graph of f˜|S is contained in W . Consider some p ∈ S. Of those β with ϕβ (p) > 0, let α be one of those for which εβ is maximal. Of course p ∈ Upα , and f˜(p) ∈ U2εα (f (pα )) because for any other β such that ϕβ (p) > 0 we have kf (pβ ) − f (pα )k ≤ kf (pβ ) − f (p)k + kf (p) − f (pα )k < 2εα . Therefore (p, f˜(p)) ∈ Upα × U2εα (f (pα )) ⊂ W . 10.3. MANIFOLDS 10.3 131 Manifolds The maneuver we saw in Section 10.1—passing from a calculus of functions between Euclidean spaces to a calculus of functions between vector spaces—was accomplished not by fully “eliminating” the coordinate systems of the domain and range, but instead by showing that the “real” meaning of the derivative would not change if we replaced those coordinate systems by any others. The definition of a C r manifold, and of a C r function between such manifolds, is a more radical and far reaching application of this idea. A manifold is an object like the sphere, the torus, and so forth, that “looks like” a Euclidean space in a neighborhood of any point, but which may have different sorts of large scale structure. We first of all need to specify what “looks like” means, and this will depend on a degree of differentiability. Fix an m-dimensional vector space X, an open U ⊂ X, and a degree of differentiability 0 ≤ r ≤ ∞. Recall that if A and B are topological spaces, a function e : A → B is an embedding if it is continuous and injective, and its inverse is continuous when e(A) has the subspace topology. Concretely, e maps open sets of A to open subsets of e(A). Note that the restriction of an embedding to any open subset of the domain is also an embedding. Lemma 10.3.1. If U ⊂ X is open and ϕ : U → Rk is a C r embedding such that for all x ∈ U the rank of Dϕ(x) is m, then ϕ−1 is a C r function. Proof. By Proposition 10.2.6 it suffices to show that ϕ−1 is locally C r . Fix a point p in the image of ϕ, let x = ϕ−1 (p), let X ′ be the image of Dϕ(x), and let π : Rk → X ′ be the orthogonal projection. Since ϕ is an immersion, X ′ is m-dimensional, and the rank of D(π ◦ ϕ)(x) = π ◦ Dϕ(x) is m. The inverse function theorem implies that the restriction of π ◦ ϕ to some open subset of Ũ containing x has a C r inverse. Now the chain rule implies that ϕ−1 |ϕ(Ũ ) = (π ◦ ϕ|Ũ )−1 ◦ π|ϕ(Ũ ) is C r . Definition 10.3.2. 
A set M ⊂ Rk is an m-dimensional C r manifold if, for each p ∈ M, there is a C r embedding ϕ : U → M, where U is an open subset of an m-dimensional vector space, such that for all x ∈ U the rank of Dϕ(x) is m and ϕ(M) is a relatively open subset of M that contains p. We say that ϕ is a C r parameterization for M and ϕ−1 is a C r coordinate chart for M. A collection {ϕi }i∈I of C r parameterizations for M whose images cover M is called a C r atlas for M. Although the definition above makes sense when r = 0, we will have no use for this case because there are certain pathologies that we wish to avoid. Among other things, the beautiful example known as the Alexander horned sphere (Alexander (1924)) shows that a C 0 manifold may have what is known as a wild embedding in a Euclidean space. From this point on we assume that r ≥ 1. There are many “obvious” examples of C r manifolds such as spheres, the torus, etc. In analytic work one should bear in mind the most basic examples: (i) A set S ⊂ Rk is discrete if each p ∈ S has a neighborhood W such that S ∩ W = {p}. A discrete set is a 0-dimensional C r manifold. 132 CHAPTER 10. DIFFERENTIABLE MANIFOLDS (ii) Any open subset (including the empty set) of an m-dimensional affine subspace of Rk is an m-dimensional C r manifold. More generally, an open subset of an m-dimensional C r manifold is itself an m-dimensional C r manifold. (iii) If U ⊂ Rm is open and φ : U → Rk−m is C r , then the graph Gr(φ) := { (x, φ(x)) : x ∈ U } ⊂ Rk of φ is an m-dimensional C r manifold, because ϕ : x 7→ (x, φ(x)) is a C r parameterization. 10.4 Smooth Maps Let M ⊂ Rk be an m-dimensional C r manifold, and let N ⊂ Rℓ be an ndimensional C r manifold. We have already defined what it means for a function f : M → N is C r to be C r : there is an open W ⊂ Rk that contains M and a C r function F : W → Rℓ such that F |M = f . The following characterization of this condition is technically useful and conceptually important. Proposition 10.4.1. For a function f : M → N the following are equivalent: (a) f is C r ; (b) for each p ∈ M there are C r parameterizations ϕ : U → M and ψ : V → N such that p ∈ ϕ(U), f (ϕ(U)) ⊂ ψ(V ), and ψ −1 ◦ f ◦ ϕ is a C r function; (c) ψ −1 ◦ f ◦ ϕ is a C r function whenever ϕ : U → M and ψ : V → N are C r parameterizations such that f (ϕ(U)) ⊂ ψ(V ). Proof. Because compositions of C r functions are C r , (a) implies (c), and since each point in a manifold is contained in the image of a C r parameterization, it is clear that (c) implies (b). Fix a point p ∈ M and C r parameterizations ϕ : U → M and ψ : V → N with p ∈ ϕ(U) and f (ϕ(U)) ⊂ ψ(V ). Lemma 10.3.1 implies that ϕ−1 and ψ −1 are C r , so ψ ◦ (ψ −1 ◦ f ◦ ϕ) ◦ ψ −1 is C r on its domain of definition. Since p was arbitrary, we have shown that f is locally C r , and Proposition 10.2.6 implies that f is C r . Thus (b) implies (a). There is a more abstract approach to differential topology (which is followed in Hirsch (1976)) in which an m-dimensional C r manifold is a topological space M together with a collection { ϕα : Uα → M }α∈A , where each ϕα is a homeomorphism betweenS an open subset Uα of an m-dimensional vector space and an open subset r of M, α ϕα (Uα ) = M, and for any α, α′ ∈ A, ϕ−1 α′ ◦ ϕα is C on its domain of definition. If N with collection { ψβ : Vβ :→ N } is an n-dimensional C r manifold, a function f : M → N is C r by definition if, for all α and β, ψβ−1 ◦ f ◦ ϕα is a C r function on its domain of definition. 
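Before comparing the two approaches further, it may help to compute with the concrete definition. The following Python sketch works through example (iii) above for an illustrative map φ chosen here (it is not an example from the text): it approximates Dϕ for the graph parameterization ϕ(x) = (x, φ(x)) by finite differences and confirms that its rank is m = 2, as Definition 10.3.2 requires.

# Example (iii): the graph of a smooth map phi : R^2 -> R is a
# 2-dimensional manifold in R^3, parameterized by varphi(x) = (x, phi(x)).
import numpy as np

def phi(x):
    return np.sin(x[0]) * np.cos(x[1])

def varphi(x):
    """The graph parameterization x |-> (x, phi(x))."""
    return np.array([x[0], x[1], phi(x)])

def jacobian(g, x, h=1e-6):
    """Forward-difference approximation of Dg(x)."""
    gx = g(x)
    cols = []
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        cols.append((g(x + e) - gx) / h)
    return np.column_stack(cols)

for point in (np.array([0.0, 0.0]), np.array([1.2, -0.4]), np.array([2.0, 3.0])):
    J = jacobian(varphi, point)           # a 3 x 2 matrix
    print(point, "rank of Dvarphi =", np.linalg.matrix_rank(J))
# The rank is 2 at every point, as the definition requires; the first two
# rows of Dvarphi form the identity, which is what makes varphi an embedding
# with continuous inverse (x, phi(x)) |-> x.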
The abstract approach is preferable from a conceptual point of view; for example, we can’t see some Rk that contains the physical universe, so our physical theories should avoid reference to such an Rk if possible. (Sometimes Rk is called 10.5. TANGENT VECTORS AND DERIVATIVES 133 the ambient space.) However, in the abstract approach there are certain technical difficulties that must be overcome just to get acceptable definitions. In addition, the Whitney embedding theorems (cf. Hirsch (1976)) show that, under assumptions that are satisfied in almost all applications, a manifold satisfying the abstract definition can be embedded in some Rk , so our approach is not less general in any important sense. From a technical point of view, the assumed embedding of M in Rk is extremely useful because it automatically imposing conditions such as metrizability and thus paracompactness, and it allows certain constructions that simplify many proofs. There is a category of C r manifolds and C r maps between them. (This can be proved from the definitions, or we can just observe that this category can be obtained from the category of subsets of finite dimensional vector spaces and C r maps between them by restricting the objects.) The notion of isomorphism for this category is: Definition 10.4.2. A function f : M → N is a C r -diffeomorphism if f is a bijection and f and f −1 are both C r . If such an f exists we say that M and N are C r diffeomorphic. If M and N are C r diffeomorphic we will, for the most part, regard them as two different “realizations” of “the same” object. In this sense the spirit of the definition of a C r manifold is that the particular embedding of M in Rk is of no importance, and k itself is immaterial. 10.5 Tangent Vectors and Derivatives There are many notions of “derivative” in mathematics, but invariably the term refers to a linear approximation of a function that is accurate “up to first order.” The first step in defining the derivative of a C r map between manifolds is to specify the vector spaces that serve as the linear approximation’s domain and range. Fix an m-dimensional C r manifold M ⊂ Rk . Throughout this section, when we refer to a C r parameterization ϕ : U → M, it will be understood that U is an open subset of the m-dimensional vector space X. Definition 10.5.1. If ϕ : U → M is a C 1 parameterization and p = ϕ(x), then the tangent space of M at p is the image of this linear transformation Dϕ(x) : X → Rk . We should check that this does not depend on the choice of ϕ. If ϕ′ : U ′ → M is a second C 1 parameterization with ϕ′ (x′ ) = p, then the chain rule gives Dϕ′ (x′ ) = Dϕ(x)◦D(ϕ−1 ◦ϕ′ )(x′ ), so the image of Dϕ′ (x′ ) is contained in the image of Dϕ(x). We can combine the tangent spaces at the various points of M: Definition 10.5.2. The tangent bundle of M is [ T M := {p} × Tp M ⊂ Rk × Rk . p∈M 134 CHAPTER 10. DIFFERENTIABLE MANIFOLDS For a C r parameterization ϕ : U → M for M we define Tϕ : U × X → { (p, v) ∈ T M : p ∈ ϕ(U) } ⊂ T M by setting Tϕ (x, w) := (ϕ(x), Dϕ(x)w). Lemma 10.5.3. If r ≥ 2, then Tϕ is a C r−1 parameterization for T M. Proof. It is easy to see that Tϕ is a C r−1 immersion, and that it is injective. The inverse function theorem implies that its inverse is continuous. Every p ∈ M is contained in the image of some C r parameterization ϕ, and for every v ∈ Tp M, (p, v) is in the image of Tϕ , so the images of the Tϕ cover T M. Thus: Proposition 10.5.4. If r ≥ 2, then T M is a C r−1 manifold. 
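Definition 10.5.1 is also easy to illustrate numerically. In the following Python sketch the manifold is the unit sphere S 2 with its usual angular parameterization; the sphere, this parameterization, and the fact that Tp S 2 is the plane orthogonal to p are standard facts assumed here for the illustration rather than taken from the text. The columns of Dϕ(x) span the tangent space, and the code checks that they are independent and orthogonal to the point of tangency.

# The tangent space at p = varphi(u) is the image of Dvarphi(u).
import numpy as np

def varphi(u):
    """Parameterization of (part of) the unit sphere by angles (theta, psi)."""
    theta, psi = u
    return np.array([np.cos(theta) * np.sin(psi),
                     np.sin(theta) * np.sin(psi),
                     np.cos(psi)])

def Dvarphi(u):
    """Exact 3 x 2 Jacobian; its columns span the tangent plane at varphi(u)."""
    theta, psi = u
    return np.array([[-np.sin(theta) * np.sin(psi),  np.cos(theta) * np.cos(psi)],
                     [ np.cos(theta) * np.sin(psi),  np.sin(theta) * np.cos(psi)],
                     [ 0.0,                         -np.sin(psi)]])

u = np.array([0.8, 1.1])                  # away from the poles, so the rank is 2
p = varphi(u)
J = Dvarphi(u)
print("rank of Dvarphi:", np.linalg.matrix_rank(J))      # 2, the dimension of the sphere
print("columns against p:", np.round(p @ J, 12))         # both inner products vanish
# For the sphere the tangent space at p is the plane orthogonal to p, so every
# vector in the image of Dvarphi(u) is perpendicular to p = varphi(u).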
Fix a second C r manifold N ⊂ Rℓ , which we assume to be n-dimensional, and a C r function f : M → N. Definition 10.5.5. If F is a C 1 extension of f to a neighborhood of p, the derivative of f at p is the linear function Df (p) = DF (p)|Tp M : Tp M → Tf (p) N. We need to show that this definition does not depend on the choice of extension F . Let ϕ : U → M be a C r parameterization whose image is a neighborhood of p, let x = ϕ−1 (p), and observe that, for any v ∈ Tp M, there is some w ∈ Rm such that v = Dϕ(x)w, so that DF (p)v = DF (p)(Dϕ(x)w) = D(F ◦ ϕ)(x)w = D(f ◦ ϕ)(x)w. We also need to show that the image of Df (p) is, in fact, contained in Tf (p) N. Let ψ : V → N be a C r parameterization of a neighborhood of f (p). The last equation shows that the image of Df (p) is contained in the image of D(f ◦ ϕ)(x) = D(ψ ◦ ψ −1 ◦ f ◦ ϕ)(x) = Dψ(ψ −1 (f (p))) ◦ D(ψ −1 ◦ f ◦ ϕ), so the image of Df (p) is contained in the image of Dψ −1 (ψ(f (p)), which is Tf (p) N. Naturally the chain rule is the most important basic result about the derivative. We expect that many readers have seen the following result, and at worst it is a suitable exercise, following from the chain rule of multivariable calculus without trickery, so we give no proof. Proposition 10.5.6. If M ⊂ Rk , N ⊂ Rℓ , and P ⊂ Rm are C 1 manifolds, and f : M → N and g : N → P are C 1 maps, then, at each p ∈ M, D(g ◦ f )(p) = Dg(f (p)) ◦ Df (p). We can combine the derivatives defined at the various points of M: 10.5. TANGENT VECTORS AND DERIVATIVES 135 Definition 10.5.7. The derivative of f is the function T f : T M → T N given by T f (p, v) := (f (p), Df (p)v). These objects have the expected properties: Proposition 10.5.8. If r ≥ 2, then T f is a C r−1 function. Proof. Each (p, v) ∈ T M is in the image of Tϕ for some C r parameterization ϕ whose image contains p. The chain rule implies that T f ◦ Tϕ : (x, w) 7→ f (ϕ(x)), D(f ◦ ϕ)(x)w , which is a C r−1 function. We have verified that T f satisfies (c) of Proposition 10.4.1. Proposition 10.5.9. T IdM = IdT M . Proof. Since IdRk is a C ∞ extension of IdM , we clearly have DIdM (p) = IdTp M for each p ∈ M. The claim now follows directly from the definition of T IdM . Proposition 10.5.10. If M, N, and P are C r manifolds and f : M → N and g : N → P are C r functions, then T (g ◦ f ) = T g ◦ T f . Proof. Using Proposition 10.5.6 we compute that T g(T f (p, v)) = T g(f (p), Df (p)v) = (g(f (p)), Dg(f (p))Df (p)v) = (g(f (p)), D(g ◦ f )(p)v) = T (g ◦ f )(p, v). For the categorically minded we mention that Proposition 10.5.4 and the last three results can be summarized very succinctly by saying that if r ≥ 2, then T is a functor from the category of C r manifolds and C r maps between them to the category of C r−1 manifolds and C r−1 maps between them. Again, we will not use this language later, so in a sense you do not need to know what a functor is, but categorical concepts and terminology are pervasive in modern mathematics, so it would certainly be a good idea to learn the basic definitions. Let’s relate the definitions above to more elementary notions of differentiation. Consider a C 1 function f : (a, b) → M and a point t ∈ (a, b). Formally Df (t) is a linear function from Tt (a, b) to Tf (t) M, but thinking about things in this way is usually rather cumbersome. Of course Tt (a, b) is just a copy of R, and we define f ′ (t) = Df (t)1 ∈ Tf (t) M, where 1 is the element of Tt (A, b) corresponding to 1 ∈ R. 
When M is an open subset of R we simplify further by treating f ′ (t) as a number under the identification of Tf (t) M with R. In this way we recover the concept of the derivative as we first learned it in elementary calculus. 136 10.6 CHAPTER 10. DIFFERENTIABLE MANIFOLDS Submanifolds For almost any kind of mathematical object, we pay special attention to subsets, or perhaps “substructures” of other sorts, that share the structural properties of the object. One only has to imagine a smooth curve on the surface of a sphere to see that such substructures of manifolds arise naturally. Fix a degree of differentiability 1 ≤ r ≤ ∞. If M ⊂ Rk is an m-dimensional C r manifold, N is an n-dimensional that is also embedded in Rk , and N ⊂ M, then N is a C r submanifold of M. The integer m − n is called the codimension of N in M. The reader can certainly imagine a host of examples, so we only mention one that might easily be overlooked because it is so trivial: any open subset of M is a C r manifold. Conversely, any codimension zero submanifold of M is just an open subset. Evidently submanifolds of codimension zero are not in themselves particularly interesting, but of course they occur frequently. Submanifolds arise naturally as images of smooth maps, and as solution sets of systems of equations. We now discuss these two points of view at length, arriving eventually at an important characterization result. Let M ⊂ Rk and N ⊂ Rℓ be C r manifolds that are m- and n-dimensional respectively, and let f : M → N be a C r function. We say that p ∈ M is: (a) an immersion point of f if Df (p) : Tp M → Tf (p) N is injective; (b) a submersion point of f if Df (p) is surjective; (c) a diffeomorphism point of f is Df (p) is a bijection. There are now a number of technical results. Collectively their proofs display the inverse function and the implicit function theorem as the linchpins of the analysis supporting this subject. Proposition 10.6.1. If p is an immersion point of f , then there is a neighborhood V of p such that f (V ) is an m-dimensional C r submanifold of N. In addition Df (p) : Tp M → Tf (p) f (V ) is a linear isomorphism Proof. Let ϕ : U → M be a C r parameterization for M whose image contains p, and let x = ϕ−1 (p). The continuity of the derivative implies that there is a neighborhood U ′ of x such that for all x′ ∈ U ′ the rank of D(f ◦ ϕ)(x′ ) is m. Let X ⊂ Rℓ be the image of Df (p), and let π : Rℓ → X be the orthogonal projection. Possibly after replacing U ′ with a suitable smaller neighborhood of x, the inverse function theorem implies that π ◦ f ◦ ϕ|U ′ is invertible. Let V = ϕ(U ′ ). Now f ◦ ϕ|U ′ is an embedding because its inverse is (π ◦ f ◦ ϕ|U ′ )−1 ◦ π. Lemma 10.3.1 implies that the inverse of f is also C r , so, for every x′ ∈ U ′ the rank of D(f ◦ ϕ)(x′ ) is m, so f (V ) = f (ϕ(U ′ )) satisfies Definition 10.3.2. The final assertion follows from Df (p) being injective while Tp M and Tf (p) (f (V ) are both m-dimensional. Proposition 10.6.2. If p is a submersion point of f , then there is a neighborhood U of p such that f −1 (f (p)) ∩ U is a (m − n)-dimensional C r submanifold of M. In addition Tp f −1 (q) = ker Df (p). 10.6. SUBMANIFOLDS 137 Proof. Let ϕ : U → M be a C r parameterization whose image is an open neighborhood of p, let w0 = ϕ−1 (p), and let ψ : Z → Rn be a C r coordinate chart for an open neighborhood Z ⊂ N of f (p). Without loss of generality we may assume that f (ϕ(U)) ⊂ Z. 
Since Dϕ(w0 ) and Dψ(f (p)) are bijections, D(ψ ◦ f ◦ ϕ)(w0 ) = Dψ(f (p)) ◦ Df (p) ◦ Dϕ(w0) is surjective, and the vector space containing U can be decomposed as X × Y where Y is n dimensional and Dy (ψ ◦ f ◦ ϕ)(w0 ) is nonsingular. Let w0 = (x0 , y0 ). The implicit function theorem gives an open neighborhood V ⊂ X containing x0 , an open W ⊂ U containing w0 , and a C r function g : V → Y such that g(x0 ) = y0 and { (x, g(x)) : x ∈ V } = { w ∈ W : f (ϕ(w)) = f (p) }. Then { ϕ(x, g(x)) : x ∈ V } = f −1 (f (p)) ∩ ϕ(W ) is a neighborhood of p in f −1 (f (p)), and x 7→ ϕ(x, g(x)) is a C r embedding because its inverse is the composition of ϕ−1 with the projection (x, y) 7→ x. We obviously have Tp f −1 (q) ⊂ ker Df (p), and the two vector spaces have the same dimension. Proposition 10.6.3. If p is a diffeomorphism point of f , then there is a neighborhood W of p such that f (W ) is a neighborhood of f (p) and f |W : W → f (W ) is a C r diffeomorphism. Proof. Let ϕ : U → M be a C r parameterization of a neighborhood of p, let x = ϕ−1 (p), and let ψ : V → N be a C r parameterization of a neighborhood of f (p). Then D(ψ −1 ◦ f ◦ ϕ)(x) = Dψ −1 (f (p)) ◦ Df (p) ◦ Dϕ(x) is nonsingular, so the inverse function theorem implies that, after replacing U and V with smaller open sets containing x and ψ −1 (f (p)), ψ −1 ◦ f ◦ ϕ is invertible with C r inverse. Let W = ϕ(U). We now have (f |W )−1 = ϕ ◦ (ψ −1 ◦ f ◦ ϕ)−1 ◦ ψ −1 , which is C r . Now let P be a p-dimensional C r submanifold of N. The following is the technical basis of the subsequent characterization theorem. Lemma 10.6.4. If q ∈ P then: (a) There is a neighborhood V ⊂ P , a p-dimensional C r manifold M, a C r function f : M → P , a p ∈ f −1 (q) that is an immersion point of f , and a neighborhood U of P , such that f (U) = V . (b) There is a neighborhood Z ⊂ N of q, an (n − p)-dimensional C r manifold M, and a C r function f : Z → M such q is a submersion point of f and f −1 (f (q)) = P ∩ Z. 138 CHAPTER 10. DIFFERENTIABLE MANIFOLDS Proof. Let ϕ : U → P be a C r parameterization for P whose image contains q. Taking f = ϕ verifies (a). Let w = ϕ−1 (q). Let ψ : V → N be a C r parameterization for N whose image contains q. Then the rank of D(ψ −1 ◦ ϕ)(w) is p, so the vector space containing V can be decomposed as X × Y where X is the image of D(ψ −1 ◦ ϕ)(w). Let πX : X × Y → X and πY : X × X → Y be the projections (x, y) 7→ x and (x, y) 7→ y respectively. The inverse function implies that, after replacing U with a smaller neighborhood of w, πX ◦ ψ −1 ◦ ϕ is a C r diffeomorphism between U and −1 an open W ⊂ X. Since we can replace V with V ∩ πX (W ), we may assume that πX (V ) ⊂ W . Let Z = ψ(V ), and let f = πY ◦ ψ −1 − πY ◦ ψ −1 ◦ ϕ ◦ (πX ◦ ψ −1 ◦ ϕ)−1 ◦ πX ◦ ψ −1 : Z → Y. Evidently every point of V is a submersion point of πY − ψ −1 ◦ ϕ ◦ (πX ◦ ψ −1 ◦ ϕ)−1 ◦ πX , so every point of Z is a submersion point of f . If q ′ ∈ P ∩ Z, then q ′ = ϕ(w ′ ) for some w ′ ∈ U, so f (q ′ ) = 0. On the other hand, suppose f (q ′) = 0, and let q ′′ be the image of q ′ under the map ϕ ◦ (πX ◦ ψ −1 ◦ ϕ)−1 ◦ πX ◦ ψ −1 . Then πX (ψ −1 (q ′ )) = πX (ψ −1 (q ′′ )) and πY (ψ −1 (q ′ )) = πY (ψ −1 (q ′′ )), so q ′′ = q ′ and thus q ′ ∈ P . Thus f −1 (f (q)) = P ∩ Z. Theorem 10.6.5. Let N be a C r manifold. For P ⊂ N the following are equivalent: (a) P is a p-dimensional C r submanifold of M. 
(b) For every q ∈ P there is a relatively open neighborhood V ⊂ P , a p-dimensional C r manifold M, a C r function f : M → P , a p ∈ f −1 (q) that is an immersion point of f , and a neighborhood U of p, such that f (U) = V .

(c) For every q ∈ P there is a neighborhood Z ⊂ N of q, an (n − p)-dimensional C r manifold M, and a C r function f : Z → M such that q is a submersion point of f and f −1 (f (q)) = P ∩ Z.

Proof. The last result asserts that (a) implies (b) and (c), Proposition 10.6.1 implies that (b) implies (a), and Proposition 10.6.2 implies that (c) implies (a).

Let M ⊂ Rk and N ⊂ Rℓ be an m-dimensional and an n-dimensional C r manifold, and let f : M → N be a C r function. We say that f is an immersion if every p ∈ M is an immersion point of f . It is a submersion if every p ∈ M is a submersion point, and it is a local diffeomorphism if every p ∈ M is a diffeomorphism point. There are now some important results that derive submanifolds from functions.

Theorem 10.6.6. If f : M → N is a C r immersion, and an embedding, then f (M) is an m-dimensional C r submanifold of N.

Proof. We need to show that any q ∈ f (M) has a neighborhood in f (M) that is an m-dimensional C r manifold. Proposition 10.6.1 implies that any p ∈ M has an open neighborhood V such that f (V ) is an m-dimensional C r submanifold of N. Since f is an embedding, f (V ) is a neighborhood of f (p) in f (M).

A submersion point of f is also said to be a regular point of f . If p is not a regular point of f , then it is a critical point of f . A point q ∈ N is a critical value of f if some preimage of q is a critical point, and if q is not a critical value, then it is a regular value. Note the following paradoxical aspect of this terminology: if q is not a value of f , in the sense that f −1 (q) = ∅, then q is automatically a regular value of f .

Theorem 10.6.7 (Regular Value Theorem). If q is a regular value of f , then f −1 (q) is an (m − n)-dimensional submanifold of M.

Proof. This is an immediate consequence of Proposition 10.6.2.

This result has an important generalization. Let P ⊂ N be a p-dimensional C r submanifold.

Definition 10.6.8. The function f is transversal to P along S ⊂ M if, for all p ∈ f −1 (P ) ∩ S, im Df (p) + Tf (p) P = Tf (p) N. We write f ⋔S P to indicate that this is the case, and when S = M we simply write f ⋔ P .

Theorem 10.6.9 (Transversality Theorem). If f ⋔ P , then f −1 (P ) is an (m − n + p)-dimensional C r submanifold of M. For each p ∈ f −1 (P ), Tp f −1 (P ) = Df (p)−1 (Tf (p) P ).

Proof. Fix p ∈ f −1 (P ). (If f −1 (P ) = ∅, then all claims hold trivially.) We use the characterization of a C r submanifold given by Theorem 10.6.5: since P is a submanifold of N, there is a neighborhood W ⊂ N of f (p) and a C r function Ψ : W → Rn−p such that DΨ(f (p)) has rank n − p and P ∩ W = Ψ−1 (0). Let V = f −1 (W ) and Φ = Ψ ◦ f |V . Of course V is open, Φ is C r , and f −1 (P ) ∩ V = Φ−1 (0). We compute that

im DΦ(p) = DΨ(f (p))(im Df (p)) = DΨ(f (p))(im Df (p) + ker DΨ(f (p))) = DΨ(f (p))(im Df (p) + Tf (p) P ) = DΨ(f (p))(Tf (p) N) = Rn−p .

(The third equality follows from the final assertion of Proposition 10.6.2, and the fourth is the transversality assumption.) Thus p is a submersion point of Φ. Since p is an arbitrary point of f −1 (P ) the claim follows from Theorem 10.6.5. We now have

Tp f −1 (P ) = ker DΦ(p) = ker(DΨ(f (p)) ◦ Df (p)) = Df (p)−1 (ker DΨ(f (p))) = Df (p)−1 (Tf (p) P )

where the first and last equalities are from Proposition 10.6.2.
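As a concrete illustration of the regular value theorem, consider f : R3 → R, f (x) = x1^2 + x2^2 + x3^2, for which every q > 0 is a regular value, so that f −1 (q) is a sphere, a (3 − 1)-dimensional submanifold whose tangent space at p is ker Df (p). The following Python sketch (a hypothetical numerical aside, not part of the text) checks these facts at randomly chosen points.

    import numpy as np

    # Illustrative check of the regular value theorem for
    # f(x) = x_1^2 + x_2^2 + x_3^2, whose derivative at x is the 1x3 matrix 2x.
    def f(x):
        return np.dot(x, x)

    def Df(x):
        return 2.0 * x

    rng = np.random.default_rng(0)
    q = 1.0
    for _ in range(5):
        p = rng.normal(size=3)
        p *= np.sqrt(q) / np.linalg.norm(p)   # a point of f^{-1}(q)
        assert np.linalg.norm(Df(p)) > 0      # Df(p) has rank 1, so p is a regular point
        v = np.cross(p, rng.normal(size=3))   # a vector tangent to the sphere at p
        assert abs(Df(p) @ v) < 1e-10         # tangent vectors lie in ker Df(p)
    print("q = 1 is a regular value of f, and T_p f^{-1}(q) = ker Df(p)")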
10.7 Tubular Neighborhoods

Fix a degree of differentiability r ≥ 2 and an n-dimensional C r manifold N ⊂ Rℓ . For each q ∈ N let νq N be the orthogonal complement of Tq N. The normal bundle of N is

νN = ⋃q∈N {q} × νq N.

Proposition 10.7.1. νN is an ℓ-dimensional C r−1 submanifold of N × Rℓ .

Proof. Let ϕ : U → Rℓ be a C r parameterization for N. Let Z : U × Rℓ → Rn be the function

Z(x, w) = (⟨Dϕ(x)e1 , w⟩, . . . , ⟨Dϕ(x)en , w⟩)

where e1 , . . . , en is the standard basis for Rn . Clearly Z is C r−1 , and for every (x, w) in its domain the rank of DZ(x, w) is n. Therefore the regular value theorem implies that Z −1 (0) = { (x, w) ∈ U × Rℓ : (ϕ(x), w) ∈ νN } is an ℓ-dimensional C r−1 manifold. Since (x, w) 7→ (ϕ(x), w) and (q, w) 7→ (ϕ−1 (q), w) are inverse C r−1 bijections between Z −1 (0) and νN ∩ (ϕ(U) × Rℓ ), the first of these maps is a C r−1 embedding, which implies (Theorem 10.6.6) that the latter set is a C r−1 manifold. Of course these sets cover νN because the images of C r parameterizations cover N.

Like the tangent bundle, the normal bundle attaches a vector space of a certain dimension to each point of N. (The general term for such a construct is a vector bundle.) The zero section of νN is { (q, 0) : q ∈ N }. There are maps π : (q, v) 7→ q and σ : (q, v) 7→ q + v from N × Rℓ to N and Rℓ respectively. Let π T = π|T N , π ν = π|νN , σ T = σ|T N , and σ ν = σ|νN . For a continuous function ρ : N → (0, ∞) let Uρ = { (q, v) ∈ νN : ‖v‖ < ρ(q) }, and let σρν = σ ν |Uρ . The main topic of this section is the following result and its many applications.

Theorem 10.7.2 (Tubular Neighborhood Theorem). There is a continuous ρ : N → (0, ∞) such that σρν is a C r−1 diffeomorphism onto its image, which is a neighborhood of N.

The inverse function theorem implies that each (q, 0) in the zero section has a neighborhood that is mapped C r−1 diffeomorphically by σ ν onto a neighborhood of q in Rℓ . The methods used to produce a suitable neighborhood of the zero section with this property are topological and quite technical, in spite of their elementary character.

Lemma 10.7.3. If (X, d) and (Y, e) are metric spaces, f : X → Y is continuous, S is a subset of X such that f |S is an embedding, and for each s ∈ S the restriction of f to some neighborhood Ns of s is an embedding, then there is an open U such that S ⊂ U ⊂ ⋃s Ns and f |U is an embedding.

Proof. For s ∈ S let δ(s) be one half of the supremum of the set of ε > 0 such that Uε (s) ⊂ Ns and f |Uε (s) is an embedding. The restriction of an embedding to any subset of its domain is an embedding, which implies that δ is continuous. Since f |S is an embedding, its inverse is continuous. In conjunction with the continuity of δ and d, this implies that for each s ∈ S there is a ζs > 0 such that

d(s, s′ ) < min{ δ(s) − (1/2)δ(s′ ), δ(s′ ) − (1/2)δ(s) }     (∗)

for all s′ ∈ S with e(f (s), f (s′ )) ≤ ζs . For each s choose an open Us ⊂ X such that s ∈ Us ⊂ Uδ(s)/2 (s) and f (Us ) ⊂ Uζs /3 (f (s)). Let U = ⋃s∈S Us . We will show that f |U is injective with continuous inverse.

Consider s, s′ ∈ S and y, y ′ ∈ Y with e(f (s), y) < ζs /3 and e(f (s′ ), y ′ ) < ζs′ /3. We claim that if y = y ′ , then (∗) holds: otherwise e(f (s), f (s′ )) > ζs , ζs′ , so that

e(y, y ′ ) ≥ e(f (s), f (s′ )) − e(f (s), y) − e(f (s′ ), y ′ ) > ((1/2)e(f (s), f (s′ )) − ζs /3) + ((1/2)e(f (s), f (s′ )) − ζs′ /3) ≥ (1/6)(ζs + ζs′ ).
In particular, if f (x) = y = y ′ = f (x′ ) for some x ∈ Us and x′ ∈ Us′ , then 1 δ(s′ ) + d(s, s′ ) ≤ δ(s) and thus 2 Us′ ⊂ Uδ(s′ )/2 (s′ ) ⊂ Uδ(s′ )/2+d(s,s′ ) (s) ⊂ Uδ(s) (s). We have x ∈ Us , x′ ∈ Us′ , and Us , Us′ ⊂ Uδ(s) (s), and f |Uδ(s) (s) is injective, so it follows that x = x′ . We have shown that f |U is injective. We now need to show that the image of any open subset of U is open in the relative topology of f (U). Fix a particular s ∈ S. In view of the definition of U, it suffices to show that if Vs ⊂ Us is open, then f (Vs ) is relatively open. The restriction of f to Uδ(s) (s) is an embedding, so there is an open Zs ⊂ Y such that f (Vs ) = f (Uδ(s) (s)) ∩ Zs . Since f (Vs ) ⊂ f (Us ) ⊂ Uζs /3 (f (s)) we have f (Vs ) = f (U) ∩ Uζs /3 (f (s)) ∩ Zs ∩ f (Uδ(s) (s)). Above we showed that if Uζs /3 (f (s)) ∩ Uζs′ /3 (f (s′ )) is nonempty, then (∗) holds. Therefore f (U) ∩ Uζs /3 (f (s)) is contained in the union of the f (Us′ ) for those s′ such that 12 δ(s′ ) + d(s, s′ ) < δ(s), and for each such s′ we have Us′ ⊂ Uδ(s′ )/2 (s′ ) ⊂ Uδ(s) (s). Therefore f (U) ∩ Uζs /3 (f (s)) ⊂ f (Uδ(s) (s)), and consequently f (Vs ) = f (U) ∩ Uζs /3 (f (s)) ∩ Zs , so f (Vs ) is relatively open in f (U). Lemma 10.7.4. If (X, d) is a metric space, S ⊂ X, and U is an open set containing S, then there is a continuous δ : S → (0, ∞) such that for all s ∈ S, Uδ(s) (s) ⊂ U. 142 CHAPTER 10. DIFFERENTIABLE MANIFOLDS Proof. For each s ∈ S let βs = sup{ ε > 0 : Uε (s) ⊂ U }. Since X is paracompact (Theorem 6.1.1) there is a locally finite refinement {Vα }α∈A of {Uβs (s)}s∈S . Theorem 6.2.2 gives a partition of unity {ϕα } subordinate to {Vα }. The claim holds trivially if there is some α with Vα = X; otherwise for each α let δα : S → [0, ∞) be the function δα (s) P = inf x∈X\Vα d(s, x), which is of course continuous, and define δ by setting δ(s) := α ϕα (s)δα (s). If s ∈ S, s ∈ Vα , and δα′ (s) ≤ δα (s) for all other α′ such that s ∈ Vα′ , then Uδ(s) (s) ⊂ Uδα (s) (s) ⊂ Vα ⊂ Uβs′ (s′ ) ⊂ U for some s′ , so Uδ(s) (s) ⊂ U. The two lemmas above combine to imply that: Proposition 10.7.5. If (X, d) and (Y, e) are metric spaces, f : X → Y is continuous, S is a subset of X such that f |S is an embedding, and for each s ∈ S the restriction of f to some neighborhood Ns of s is an embedding, then there is a continuous ρ : S → (0, ∞) such that Uρ(s) (s) ⊂ Ns for all s and the restriction of S f to s∈S Uρ(s) (s) is an embedding. Proof of the Tubular Neighborhood Theorem. The inverse function theorem implies that each point (q, 0) in the zero section of νN has a neighborhood Nq such that then σρν is an embedding, σ ν |Ns is a C r−1 diffeomorphism. If ρ is in the last result, S r−1 and its inverse is C differentiable because Uρ ⊂ q Nq . We now develop several applications of the tubular neighborhood theorem. Let M be an m-dimensional C r ∂-manifold. Theorem 10.7.6. For any S ⊂ M, C r−1 (S, N) is dense in CS (S, N). Proof. Proposition 10.2.7 implies that C r−1 (S, Vρ ) is dense in CS (S, Vρ ), and Proposition 5.5.3 implies that f 7→ π ν ◦ σρ−1 ◦ f is continuous. Recall that a topological space X is locally path connected if, for each x ∈ X, each neighborhood U of x contains a neighborhood V such that for any x0 , x1 ∈ V there is a continuous path γ : [0, 1] → U with γ(0) = x0 and γ(1) = x1 . For an open subset of a locally convex topological vector space, local path connectedness is automatic: any neighborhood of a point contains a convex neighborhood. Theorem 10.7.7. For any S ⊂ M, CS (S, N) is locally path connected. 
Proof. Fix a neighborhood U ⊂ CS (S, N) of a continuous f : S → N. The definition of the strong topology implies that there is an open W ⊂ S ×N such that f ∈ { f ′ ∈ C(S, N) : Gr(f ′ ) ⊂ W } ⊂ U. Lemma 10.7.4 implies that there is a continuous λ : N → (0, ∞) such that Uλ(y) (y) ⊂ Vρ for all y ∈ N and (x, π(σρ−1 (z))) ∈ W for all x ∈ S and z ∈ Uλ(f (x)) (f (x)). Let W ′ = { (x, y) ∈ W : y ∈ Uλ(f (x)) (f (x)) }. For f0 , f1 ∈ C(S, N) with Gr(f0 ), Gr(f1 ) ⊂ W ′ we define h by setting h(x, t) = π ν σρ−1 ((1 − t)f0 (x) + tf1 (x)) . If f0 and f1 are C r , so that they are the restrictions to S of C r functions defined on open supersets of S, then this formula defines a C r extension of h to an open superset of S × [0, 1], so that h is C r . 143 10.8. MANIFOLDS WITH BOUNDARY Proposition 10.7.8. There is a continuous function λ : N → (0, ∞) and a C r−1 function κ : Vλ → N, where Vλ = { (q, v) ∈ T N : kvk < λ(q) }, such that the function κ̃ : (q, v) 7→ (q, κ(q, v)) is a C r−1 diffeomorphism between Vλ and a neighborhood of the diagonal in N × N. Proof. Let ρ and Uρ be as in the tubular neighborhood theorem. For each q ∈ N there is a neighborhood Nq of (q, 0) ∈ Tq N such that σ T (Nq ) is contained in σρν (Uρ ). Let [ [ Nq → N and κ̃ = π T × κ : Nq → N × N. κ = π ν ◦ (σρν )−1 ◦ σ T : q q It is easy to see (and not hard to compute formally using the chain rule) that Dκ̃(q, 0) = IdTq N × IdTq N under the natural identification of T(q,0) (T N) with Tq N × Tq N. The inverse function theorem implies that after replacing Nq with a smaller neighborhood of (q, 0), the restriction of κ̃ to Nq is a diffeomorphism onto its image. We can now proceed as in the proof of the tubular neighborhood theorem. The following construction simulates convex combination. Proposition 10.7.9. There is a neighborhood W of the diagonal in N × N and a continuous function c : W × [0, 1] → N such that: (a) c((q, q ′), 0) = q for all (q, q ′) ∈ W ; (b) c((q, q ′), 1) = q ′ for all (q, q ′ ) ∈ W ; (c) c((q, q), t) = q for all q ∈ N and all t. Proof. The tubular neighborhood gives an open neighborhood U of the zero section in νN such that if σ : νN → Rk is the map σ(q, v) = q + v, then σ|U is a homeomorphism between U and σ(U). Let π : νN → N be the projection for the normal bundle. Let W = { (q, q ′) ∈ N × N : (1 − t)q + tq ′ ∈ σ(U) for all 0 ≤ t ≤ 1 }, and for (q, q ′) ∈ W and 0 ≤ t ≤ 1 let c((q, q ′), t) = π (σ|U )−1 ((1 − t)q + tq ′ ) . 10.8 Manifolds with Boundary Let X be an m-dimensional vector space, and let H be a closed half space of X. In the same way that manifolds were “modeled” on open subsets of X, manifolds with boundary are “modeled” on open subsets of H. Examples of ∂-manifolds include the m-dimensional unit disk D m := { x ∈ Rm : kxk ≤ 1 }, 144 CHAPTER 10. DIFFERENTIABLE MANIFOLDS the annulus { x ∈ R2 : 1 ≤ x ≤ 2 }, and of course H itself. Since we will be very concerned with homotopies, a particularly important example is M × [0, 1] where M is a manifold (without boundary). Thus it is not surprising that we need to extend our formalism in this direction. What actually seems more surprising is the infrequency with which one needs to refer to “manifolds with corners,” which are spaces that are “modeled” on the nonnegative orthant of Rm . There is a technical point that we need to discuss. 
If U ⊂ H is open and f : U → Y is C 1 , where Y is another vector space, then the derivative Df (x) is defined at any x ∈ U, including those in the boundary of H, in the sense that all C 1 extensions f˜ : Ũ → Y of f to open (in X) sets Ũ with Ũ ∩ H = U have the same derivative at x. This is fairly easy to prove by showing that if w ∈ X and the ray rw = { x + tw : t ≥ 0 } from x “goes into” H, then the derivative of f˜ along rw is determined by f , and that the set of such w spans X. We won’t belabor the point by formalizing this argument. The following definitions parallel those of the last section. If U ⊂ H is open and ϕ : U → Y is a function, we say that ϕ is a C r ∂-immersion if it is C r and the rank of Dϕ(x) is m for all x ∈ U. If, in addition, ϕ is a homeomorphism between U and ϕ(U), then we say that ϕ is a C r ∂-embedding. Definition 10.8.1. If M ⊂ Rk , an m-dimensional C r ∂-parameterization for M is a C r ∂-embedding ϕ : U → M, where U ⊂ H is open and ϕ(U) is a relatively open subset of M. If each p ∈ M is contained in the image of a C r parameterization for M, then M is an m-dimensional C r manifold with boundary. We will often write “∂-manifold” in place of the cumbersome phrase “manifold with boundary.” Fix an m-dimensional C r ∂-manifold M ⊂ Rk . We say that p ∈ M is a boundary point of M if there a C r ∂-parameterization of M that maps a point in the boundary of H to p. If any C r parameterization of a neighborhood of p has this property, then all do; this is best understood as a consequence of invariance of domain (Theorem 14.4.4) which is most commonly proved using algebraic topology. Invariance of domain is quite intuitive, and eventually we will be able to establish it, but in the meantime there arises the question of whether our avoidance of results derived from algebraic topology is “pure.” One way of handling this is to read the definition of a ∂-manifold as specifying which points are in the boundary. That is, a ∂manifold is defined to be a subset of Rk together with an atlas of m-dimensional C r parameterizations {ϕi }i∈I such that each ϕ−1 j ◦ ϕi maps points in the boundary of H to points in the boundary and points in the interior to points in the interior. In order for this to be rigorous it is necessary to check that all the constructions in our proofs preserve this feature, but this will be clear throughout. With this point cleared up, the boundary of M is well defined; we denote this subset by ∂M. Note that ∂M automatically inherits a system of coordinate systems that display it as an (m − 1)-dimensional C r manifold (without boundary). Naturally our analytic work will be facilitated by characterizations of ∂-manifolds that are somewhat easier to verify than the definition. Lemma 10.8.2. For M ⊂ Rk the following are equivalent: 10.8. MANIFOLDS WITH BOUNDARY 145 (a) M is an m-dimensional ∂-manifold; (b) for each p ∈ M there is a neighborhood W ⊂ M, an m-dimensional C r manifold (without boundary) W̃ , and a C r function h : W̃ → R such that W = h−1 ([0, ∞)) and Dh(p) 6= 0. Proof. Fix p ∈ M. If (a) holds then there is a C r ∂-embedding ϕ : U → M, where U ⊂ H is open and ϕ(U) is a relatively open subset of M. After composing with an affine function, we have assume that H = { x ∈ Rm : xm ≥ 0 }. Let ϕ̃ : Ũ → Rk be a C r extension of ϕ to an open (in Rm ) superset of U. After replacing Ũ with a smaller neighbohrood of ϕ−1 (p) it will be the case that ϕ is a C r embedding, and we may replace U with its intersection with this smaller neighborhood. 
To verify (b) we set W̃ = ϕ̃(Ũ ) and W = ϕ(U), and we let h be the last component function of ϕ̃−1 . Now suppose that W , W̃ , and h are as in (b). Let ψ̃ : Ṽ → W̃ be a C r parameterization for W̃ whose image contains p, and let x̃ = ψ̃ −1 (p). Since Dh(p) 6= ψ̃) 0 there is some i such that ∂(h◦ (x̃) 6= 0; after reindexing we may assume that ∂xi m i = m. Let η : W̃ → R be the function η(x) = x1 , . . . , xm−1 , h(ψ̃(x)) . Examination of the matrix of partial derivatives shows that Dη(x̃) is nonsingular, so, by the inverse function, after replacing W̃ with a smaller neighborhood of x̃, we may assume that η is a C r embedding. Let Ũ = η(Ṽ ), U = Ũ ∩H, ϕ̃ = ψ̃ ◦η −1 : Ũ → W̃ , and ϕ = ϕ̃|U : U → W . Evidently ϕ is a C r ∂-parameterization for M. The following consequence is obvious, but is still worth mentioning because it will have important applications. Proposition 10.8.3. If M is an m-dimensional C r manifold, f : M → R is C r , and a is a regular value of f , then f −1 ([a, ∞)) is an m-dimensional C r ∂-manifold. The definitions of tangent spaces, tangent manifolds, and derivatives, are only slightly different from what we saw earlier. Suppose that M ⊂ Rk is an mdimensional C r ∂-manifold, ϕ : U → M is a C r ∂-parameterization, x ∈ U, and ϕ(x) = p. The definition of a C r function gives a C r extension ϕ̃ : Ũ → Rk of ϕ to an open (in Rm ) superset of U, and we define Tp M to be the image of D ϕ̃(x). (Of course there is no difficulty showing that D ϕ̃(x) does not depend on the choice of extension ϕ̃.) As before, the tangent manifold of M is [ TM = {p} × Tp M. p∈M Let πT M : T M → M be the natural projection π : (p, v) 7→ p. We wish to show that T M is a C r−1 ∂-manifold. To this end define Tϕ : U × m R → πT−1M (U) by setting Tϕ (x, w) = (ϕ(x), D ϕ̃(x)w). If r ≥ 2, then Tϕ is an injective C r−1 ∂-immersion whose image is open in T M, so it is a C r ∂-embedding. Since T M is covered by the images of maps such as Tϕ , it is indeed a C r−1 ∂manifold. 146 CHAPTER 10. DIFFERENTIABLE MANIFOLDS If N ⊂ Rℓ is an n-dimensional C r ∂-manifold and f : M → N is a C r map, then the definitions of Df (p) : Tp M → Tf (p) N for p ∈ M and T f : T M → T N, and the main properties, are what we saw earlier, with only technical differences in the explanation. In particular, T extends to a functor from the category C r ∂-manifolds and C r maps to the category of C r−1 ∂-manifolds and C r−1 maps. We also need to reconsider the notion of a submanifold. One can of course define a C r ∂-submanifold of M to be a C r ∂-manifold that happens to be contained in M, but the submanifolds of interest to us satisfy additional conditions. Any point in the submanifold that lies in ∂M should be a boundary point of the submanifold, and we don’t want the submanifold to be tangent to ∂M at such a point. Definition 10.8.4. If M is a C r ∂-manifold, a subset P is a neat C r ∂-submanifold if it is a C r ∂-manifold, ∂P = P ∩∂M, and for each p ∈ ∂P we have Tp P +Tp ∂M = Tp M. The reason this is the relevant notion has to do with transversality. Suppose that M is a C r ∂-manifold, N is a C r manifold, without boundary, P is a C r submanifold of N, and f : M → N is C r . We say that f is transversal to P along S ⊂ M, and write f ⋔S P , if f |M \∂M ⋔S\∂M P and f |∂M ⋔S∩∂M P . As above, when S = M we write f ⋔ P . The transversality theorem generalizes as follows: Proposition 10.8.5. If f : M → N is a C r function that is transversal to P , then f −1 (P ) is a neat C r submanifold of M with ∂f −1 (P ) = f −1 (P ) ∩ ∂M. Proof. 
We need to show that a neighborhood of a point p ∈ f −1 (P ) has the required properties. If p ∈ M \ ∂M, this follows from Theorem 10.6.9, so suppose that p ∈ ∂M. Lemma 10.8.2 implies that there is a neighborhood W ⊂ M of p, an m-dimensional C r manifold W̃ , and a C r function h : W̃ → R such that W = h−1 ([0, ∞)), h(p) = 0, and Dh(p) ≠ 0. Let f˜ : W̃ → N be a C r extension of f |W .1 We may assume that f˜ is transverse to P , so the transversality theorem implies that f˜−1 (P ) is a C r submanifold of W̃ .

Since f˜ and f |∂M are both transverse to P , there must be a v ∈ Tp M \ Tp ∂M such that D f˜(p)v ∈ Tf (p) P . This implies two things. First, since v ∉ ker Dh(p) = Tp ∂M and f −1 (P ) ∩ W = f˜−1 (P ) ∩ W ∩ h−1 ([0, ∞)), Lemma 10.8.2 implies that f −1 (P ) ∩ W is a C r ∂-manifold in a neighborhood of p. Second, the transversality theorem implies that Tp f −1 (P ) contains v, so we have Tp f −1 (P ) + Tp ∂M = Tp M.

1 If ψ : V → N is a C r parameterization for N whose image contains f (W ), then ψ −1 has a C r extension, because that is what it means for a function on a possibly nonopen domain to be C r , and this extension can be composed with ψ to give f˜.

10.9 Classification of Compact 1-Manifolds

In order to study the behavior of fixed points under homotopy, we will need to understand the structure of h−1 (q) when M and N are manifolds of the same dimension, h : M × [0, 1] → N is a C r homotopy, and q is a regular value of h. The transversality theorem implies that h−1 (q) is a 1-dimensional C r ∂-manifold, so our first step is the following result.

Proposition 10.9.1. A nonempty compact connected 1-dimensional C r manifold is C r diffeomorphic to the circle C = { (x, y) ∈ R2 : x2 + y2 = 1 }. A compact connected 1-dimensional C r ∂-manifold with nonempty boundary is C r diffeomorphic to [0, 1].

Of course no one has any doubts about this being true. If there is anything to learn from the following technical lemma and the subsequent argument, it can only concern technique. Readers who skip this will not be at any disadvantage.

Lemma 10.9.2. Suppose that a < b and c < d, and that there is an increasing C r diffeomorphism f : (a, b) → (c, d). Then for sufficiently large Q ∈ R there is an increasing C r diffeomorphism λ : (a, b) → (a − Q, d) such that λ(s) = s − Q for all s in some interval (a, a + δ) and λ(s) = f (s) for all s in some interval (b − ε, b).

Proof. Lemma 10.2.2 presented a C ∞ function γ : R → [0, ∞) with γ(t) = 0 for all t ≤ 0 and γ ′ (t) > 0 for all t > 0. Setting

κ(s) = γ(s − a − δ) / (γ(s − a − δ) + γ(b − ε − s))

for sufficiently small δ, ε > 0 gives a C ∞ function κ : (a, b) → [0, 1] with κ(s) = 0 for all s ∈ (a, a + δ), κ(s) = 1 for all s ∈ (b − ε, b), and κ′ (s) > 0 for all s such that 0 < κ(s) < 1. For any real number Q we can define λ : (a, b) → R by setting λ(s) = (1 − κ(s))(s − Q) + κ(s)f (s). Clearly this will be satisfactory if λ′ (s) > 0 for all s. A brief calculation gives

λ′ (s) = 1 + κ(s)(f ′ (s) − 1) + κ′ (s)(Q + f (s) − s) = (1 − κ(s))(1 − f ′ (s)) + f ′ (s) + κ′ (s)(Q + f (s) − s).

If Q is larger than the supremum of s − f (s) over (a, b), then λ′ (s) > 0 when κ(s) is close to 0 or 1. Since those s for which this is not the case will be contained in a compact interval on which κ′ is positive and continuous, hence bounded below by a positive constant, if Q is sufficiently large then λ′ (s) > 0 for all s.
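The construction in this proof is easy to carry out numerically. The sketch below (a hypothetical illustration, not from the text) takes f (s) = 2s on (a, b) = (0, 1), uses the standard bump γ(t) = e^(−1/t) for t > 0 and γ(t) = 0 otherwise as a stand-in for the function provided by Lemma 10.2.2, and verifies on a grid that for a large Q the resulting λ is increasing, agrees with s − Q near a, and agrees with f near b.

    import numpy as np

    # Hypothetical numerical check of the gluing construction in Lemma 10.9.2.
    a, b, delta, eps, Q = 0.0, 1.0, 0.1, 0.1, 5.0
    f = lambda s: 2.0 * s                       # an increasing diffeomorphism (0,1) -> (0,2)

    def gamma(t):                               # an assumed bump of the kind in Lemma 10.2.2
        return np.where(t > 0, np.exp(-1.0 / np.maximum(t, 1e-12)), 0.0)

    def kappa(s):
        g1, g2 = gamma(s - a - delta), gamma(b - eps - s)
        return g1 / (g1 + g2)

    def lam(s):
        return (1 - kappa(s)) * (s - Q) + kappa(s) * f(s)

    s = np.linspace(a + 1e-3, b - 1e-3, 2001)
    assert np.all(np.diff(lam(s)) > 0)                                  # lambda is increasing
    assert np.allclose(lam(s[s < a + delta]), s[s < a + delta] - Q)     # lambda(s) = s - Q near a
    assert np.allclose(lam(s[s > b - eps]), f(s[s > b - eps]))          # lambda(s) = f(s) near b
    print("lambda glues the translation s - Q to f monotonically")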
Proof of Proposition 10.9.1. Let M be a nonempty compact connected 1-dimensional C r manifold. We can pass from a C r atlas for M to a C r atlas whose elements all have connected domains by taking the restrictions of each element of the atlas to the connected components of its domain. To be concrete, we will assume that the domains of the parameterizations are connected subsets of R, i.e., open intervals. Since we can pass from a parameterization with unbounded domain to a countable collection of restrictions to bounded domains, we may assume that all domains are bounded. Since M is compact, any atlas has a finite subset that is also an atlas. We now have an atlas of the form { ϕ1 : (a1 , b1 ) → M, . . . , ϕK : (aK , bK ) → M }. Finally, we may assume that K is minimal. Since M is compact, K > 1.

Let p be a limit point of ϕ1 (s) as s → b1 . If p were in the image of ϕ1 , say p = ϕ1 (s1 ), then the image of a neighborhood of s1 would be a neighborhood of p, and points close to b1 would be mapped to this neighborhood, contradicting the injectivity of ϕ1 . Therefore p is not in the image of ϕ1 . After reindexing, we may assume that p is in the image of ϕ2 , say p = ϕ2 (t2 ). Fix ε > 0 small enough that [t2 − ε, t2 + ε] ⊂ (a2 , b2 ). Since ϕ2 ((t2 − ε, t2 + ε)) and M \ ϕ2 ([t2 − ε, t2 + ε]) are open and disjoint, and there are at most two s such that ϕ1 (s) = ϕ2 (t2 ± ε), there is some δ > 0 such that ϕ1 ((b1 − δ, b1 )) ⊂ ϕ2 ((t2 − ε, t2 + ε)). Then f = ϕ2 −1 ◦ ϕ1 |(b1 −δ,b1 ) is a C r diffeomorphism. The intermediate value theorem implies that it is monotonic. Without loss of generality (we could replace ϕ2 with t 7→ ϕ2 (−t)) we may assume that it is increasing. Of course lims→b1 f (s) = t2 .

The last result implies that there is some real number Q and an increasing C r diffeomorphism λ : (b1 − δ, b1 ) → (b1 − δ − Q, t2 ) such that λ(s) = s − Q for all s near b1 − δ and λ(s) = f (s) for all s near b1 . We can now define ϕ : (a1 − Q, b2 ) → M by setting

ϕ(s) = ϕ1 (s + Q) if s ≤ b1 − δ − Q,   ϕ(s) = ϕ1 (λ−1 (s)) if b1 − δ − Q < s < t2 ,   ϕ(s) = ϕ2 (s) if s ≥ t2 .

We have λ−1 (s) = s + Q for all s in a neighborhood of b1 − δ − Q and ϕ(s) = ϕ2 (s) for all s close to t2 . Therefore ϕ is a C r function. Each point in its domain has a neighborhood such that the restriction of ϕ to that neighborhood is a C r parameterization for M, which implies that it maps open sets to open sets. If it were injective, it would be a C r parameterization whose image was the union of the images of ϕ1 and ϕ2 , which would contradict the minimality of K. Therefore ϕ is not injective.

Since ϕ1 and ϕ2 are injective, there must be s < b1 − δ − Q such that ϕ(s) = ϕ(s′ ) for some s′ > t2 . Let s0 be the supremum of such s. If ϕ(s0 ) = ϕ(s′ ) for some s′ > t2 , then the restrictions of ϕ to neighborhoods of s0 and s′ would both map diffeomorphically onto some neighborhood of this point, which would contradict the definition of s0 . Therefore ϕ(s0 ) is in the closure of ϕ((t2 , b2 )), but is not an element of this set, so it must be lims′ →b2 ϕ(s′ ). Arguments similar to those given above imply that there are α, β > 0 such that the images of ϕ|(b2 −α,b2 ) and ϕ|(s0 −β,s0 ) are the same, and the C r diffeomorphism g = (ϕ|(s0 −β,s0 ) )−1 ◦ ϕ|(b2 −α,b2 ) is increasing. Applying the lemma above again, there is a real number R and an increasing C r diffeomorphism λ : (b2 − α, b2 ) → (b2 − α − R, s0 ) such that λ(s) = s − R for s near b2 − α and λ(s) = g(s) for s near b2 . We now define ψ : [s0 , s0 + R) → M by setting

ψ(s) = ϕ(s) if s0 ≤ s ≤ b2 − α,   ψ(s) = ϕ(λ−1 (s − R)) if b2 − α < s < s0 + R.
Then ψ agrees with ϕ near b2 − α, so it is C r , and it agrees with ϕ(s − R) near s0 + R, so it can be construed as a C r function from the circle (thought of R modulo 10.9. CLASSIFICATION OF COMPACT 1-MANIFOLDS 149 R) to M. This function is easily seen to be injective, and it maps open sets to open sets, so its image is open, but also compact, hence closed. Since M is connected, its image must be all of M, so we have constructed te desired C r diffeomorphism between the circle and M. The argument for a compact connected one dimensional C r ∂-manifold with nonempty boundary is similar, but somewhat simpler, so we leave it to the reader. Although it will not figure in the work here, the reader should certainly be aware that the analogous issues for higher dimensions are extremely important in topology, and mathematical culture more generally. In general, a classification of some type of mathematical object is a description of all the isomorphism classes (for whatever is the appropriate notion of isomorphism) of the object in question. The result above classifies compact connected 1-dimensional C r manifolds. The problem of classifying oriented surfaces (2-dimensional manifolds) was first considered in a paper of Möbius in 1870. The classification of all compact connected surfaces was correctly stated by van Dyke in 1888. This result was proved for surfaces that can be triangulated by Dehn and Heegaard in 1907, and in 1925 Rado showed that any surface can be triangulated. After some missteps, Poincaré formulated a fundamental problem for the the classification of 3-manifolds: is a simply connected compact 3-manifold necessarily homeomorphic to S 3 ? (A topological space X is simply connected if it is connected and any continuous function f : S 1 = { (x, y) ∈ R2 : x2 + y 2 = 1 } → X has a continuous extension F : D 2 = { (x, y) ∈ R2 : x2 + y 2 ≤ 1 } → X.) Although Poincaré did not express a strong view, this became known as the Poincaré conjecture, and over the course of the 20th century, as it resisted solution and the four color theorem and Fermat’s last theorem were proved, it became perhaps the most famous open problem in mathematics. Curiously, the analogous theorems for higher dimensions were proved first, by Smale in 1961 for dimensions five and higher, and by Freedman in 1982 for dimension four. Finally in late 2002 and 2003 Perelman posted three papers on the internet that sketched a proof of the original conjecture. Over the next three years three different teams of two mathematicians set about filling in the details of the argument. In the middle of 2006 each of the teams posted a (book length) paper giving a complete argument. Although Perelman’s papers were quite terse, and many details needed to be filled in, all three teams agreed that all gaps in his argument were minor. Chapter 11 Sard’s Theorem The results concerning existence and uniqueness of systems of linear equations have been well established for a long time, of course. In the late 19th century Walras recognized that the system describing economic equilibria had (after recognizing the redundant equation now known as Walras’ law) the same number of equations and free variables, which suggested that “typically” economic equilibria should be isolated and also robust, in the sense that the endogenous variables will vary continuously with the underlying parameters in some neighborhood of the initial point. 
It was several decades before methods for making these ideas precise were established in mathematics, and then several more decades elapsed before they were imported into theoretical economics.

The original versions of what is now known as Sard's theorem appeared during the 1930s. There followed a process of evolution, both in the generality of the result and in the method of proof, that culminated in the version due to Federer (see Section 11.3). Our treatment here is primarily based on Milnor (1965), fleshed out with some arguments from Sternberg (1983), which (in its first edition) seems to have been Milnor's primary source. While not completely general, this version of the result is adequate for all of the applications in economic theory to date, many of which are extremely important.

Suppose 1 ≤ r ≤ ∞, and let f : U → Rn be a C r function, where U ⊂ Rm is open. If f (x) = y and Df (x) has rank n, then the implicit function theorem (Theorem 10.1.2) implies that, in a neighborhood of x, f −1 (y) can be thought of as the graph of a C r function. Intuition developed by looking at low dimensional examples suggests that for "typical" values of y this pleasant situation will prevail at all elements of f −1 (y), but even in the case m = n = 1 one can see that there can be a countable infinity of exceptional y. Thus the difficulty in formulating this idea precisely is that we need a suitable notion of a "small" subset of Rn . This problem was solved by the theory of Lebesgue measure, which explains the relatively late date at which the result first appeared. Measure theory has rather complex foundations, so it is preferable that it not be a prerequisite. Thus it is fortunate that only the notion of a set of measure zero is required. Section 11.1 defines this notion and establishes its basic properties.

One of the most important results in measure theory is Fubini's theorem, which, roughly speaking, allows functions to be integrated one variable at a time. Section 11.2 develops a Fubini-like result for sets of measure zero. With these elements in place, it becomes possible to state and prove Sard's theorem in Section 11.3. Section 11.4 explains how to extend the result to maps between sufficiently smooth manifolds. The application of Sard's theorem that is most important in the larger scheme of this book is given in Section 11.5. The overall idea is to show that any map between manifolds can be approximated by one that is transversal to a given submanifold of the range.

11.1 Sets of Measure Zero

For each n there is a positive constant such that the volume of a ball in Rn is that constant times r^n , where r is the radius of the ball. Without knowing very much about the constant, we can still say that sets satisfying the following definition are "small."

Definition 11.1.1. A set S ⊂ Rm has measure zero if, for any ε > 0, there is a sequence {(xj , rj )}∞j=1 in Rm × (0, 1) such that

S ⊂ ⋃j Urj (xj )   and   Σj rj^m < ε.
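For instance, any countable set has measure zero, because the radii can be chosen to shrink so fast that the sum in the definition falls below any prescribed ε. The following sketch (a hypothetical numerical aside, not part of the text) carries this out for the first thousand points of a countable subset of R2.

    import numpy as np

    # Hypothetical illustration of Definition 11.1.1: cover points x_1, x_2, ...
    # of a countable set by balls U_{r_j}(x_j) with r_j^m = eps / 2^(j+1), so that
    # every point is covered while sum_j r_j^m = eps/2 < eps.
    m, eps = 2, 1e-3
    rng = np.random.default_rng(0)
    points = rng.normal(size=(1000, m))            # the first points of a countable set
    j = np.arange(1, len(points) + 1)
    r = (eps / 2.0 ** (j + 1)) ** (1.0 / m)        # radii shrinking geometrically
    assert np.all((0 < r) & (r < 1))               # radii lie in (0, 1) as required
    assert (r ** m).sum() < eps                    # total "volume" stays below eps
    print("sum of r_j^m =", (r ** m).sum(), "< eps =", eps)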
Of course we can use different sets, such as cubes, as a measure of whether a set has measure zero. Specifically, if we can find a covering of S by balls of radius rj with Σj rj^m < ε, then there is a covering by cubes of side length 2rj with Σj (2rj)^m < 2^m ε, and if we can find a covering of S by cubes of side lengths 2ℓj with Σj (2ℓj)^m < ε, then there is a covering by balls of radius √m ℓj with Σj (√m ℓj)^m < (√m/2)^m ε. We can also use rectangles ∏i=1,...,m [ai , bi ], because we can cover such a rectangle with a collection of cubes of almost the same total volume; from the point of view of our methodology it is important to recognize that we "know" this as a fact of arithmetic (and in particular the distributive law) rather than as prior knowledge concerning volume. The rest of this section develops a few basic facts.

The following property of sets of measure zero occurs frequently in proofs.

Lemma 11.1.2. If S1 , S2 , . . . ⊂ Rm are sets of measure zero, then S1 ∪ S2 ∪ . . . has measure zero.

Proof. For given ε take the union of a countable cover of S1 by rectangles of total volume < ε/2, a countable cover of S2 by rectangles of total volume < ε/4, etc.

It is intuitively obvious that a set of measure zero cannot have a nonempty interior, but our methodology requires that we "forget" everything we know about volume, using only arithmetic to prove it.

Lemma 11.1.3. If S has measure zero, its interior is empty, so its complement is dense.

Proof. Suppose that, on the contrary, S has a nonempty interior. Then it contains a closed cube C, say of side length 2ℓ. Fixing ε > 0, suppose that S has a covering by cubes of side length 2ℓj with Σj (2ℓj)^m < ε. Then it has a covering by open cubes Cj of side length 3ℓj , and there is a finite subcover of C. For some large integer K, consider all "standard cubes" of the form ∏j=1,...,m [ij /K, (ij + 1)/K]. For each cube in our finite subcover, let Dj be the union of all such standard cubes contained in Cj , and let nj be the number of such cubes. Let D be the union of all standard cubes containing a point in C, and let n be the number of them. Simply as a matter of counting (that is to say, without reference to any theory of volume) we have nj /K^m ≤ (3ℓj)^m and n/K^m ≥ (2ℓ)^m . If K is sufficiently large, then D ⊂ ⋃j Dj , so that n ≤ Σj nj and

(2ℓ)^m ≤ n/K^m ≤ Σj nj /K^m ≤ Σj (3ℓj)^m < (3/2)^m ε,

so that ε > (4ℓ/3)^m cannot be arbitrarily small.

The next result implies that the notion of a set of measure zero is invariant under C 1 changes of coordinates. In the proof of Theorem 11.3.1 we will use this flexibility to choose coordinate systems with useful properties. In addition, this fact is the key to the definition of sets of measure zero in manifolds. Recall that if L : Rm → Rm is a linear transformation, then the operator norm of L is ‖L‖ = max‖v‖=1 ‖L(v)‖.

Lemma 11.1.4. If U ⊂ Rm is open, f : U → Rm is C 1 , and S ⊂ U has measure zero, then f (S) has measure zero.

Proof. Let C ⊂ U be a closed cube. Since U can be covered by countably many such cubes (e.g., all cubes contained in U with rational centers and rational side lengths) it suffices to show that f (S ∩ C) has measure zero. Let B := maxx∈C ‖Df (x)‖. For any x, y ∈ C we have

‖f (x) − f (y)‖ = ‖ ∫0^1 Df ((1 − t)x + ty)(y − x) dt ‖ ≤ ∫0^1 ‖Df ((1 − t)x + ty)‖ ‖y − x‖ dt ≤ B‖y − x‖.

If {(xj , rj )}∞j=1 is a sequence such that

S ∩ C ⊂ ⋃j Urj (xj )   and   Σj rj^m < ε,

then

f (S ∩ C) ⊂ ⋃j UBrj (f (xj ))   and   Σj (Brj)^m < B^m ε.

11.2
This is true, by virtue of Fubini’s theorem, but we do not have the means to prove it in full generality. Fortunately all we will need going forward is a special case. Proposition 11.2.1. If S ⊂ Rm is locally closed, then S has measure zero if and only if P (S) has measure zero. We will prove this in several steps. Lemma 11.2.2. If C ⊂ Rm is compact, then C has measure zero if and only if P (C) has measure zero. Proof of Proposition 11.2.1. Suppose that S = C ∩ U where C is closed and U is open. Let A1 , A2 , . . . be a countable collection of compact rectangles that cover U. Then the following are equivalent: (a) S has measure zero; (b) each C ∩ Aj has measure zero; (c) each P (C ∩ Aj ) has measure zero; (d) P (S) has measure zero. Specifically, S Lemma 11.1.2 implies that (a) and (b) are equivalent, and also that P (S) = j P (C ∩ Aj ), after which the equivalence of (c) and (d) follows from a third application of the result. The equivalence of (b) and (c) follows from the lemma above. We now need to prove Lemma Q 11.2.2. Fix a compact set C, which we assume is contained in the rectangle m i=1 [ai , bi ]. For each δ > 0 let Pδ (C) be the set of t such that C(t) cannot be covered by a finite collection of open rectangles whose total (m − 1)-dimensional volume is less than δ. Lemma 11.2.3. For each δ > 0, Pδ (C) is closed. Proof. If t is in the complement of Pδ (C), then any collection of open rectangles that cover C(t) also covers C(t′ ) for t′ sufficiently close to t, because C is compact. The next two results are the two implications of Lemma 11.2.2. Lemma 11.2.4. If P (C) has measure zero, then C has measure zero. 154 CHAPTER 11. SARD’S THEOREM Proof. Fix ε > 0, and choose δ < ε/2(b1 − a1 ). Since Pδ (C) ⊂ P (C), it has one dimensional measure zero, and since it is closed, hence compact, it can be covered by the union J of finitely many open intervals of total length ε/2(b2 − a2 ) · · · (bm − am ). In this way { x ∈ C : x1 ∈ J } is covered by a union of open rectangles of total volume ≤ ε/2. For each t ∈ / J we can choose a finite union of rectangles in Rm−1 of total volume less than δ that covers C(t), and these will also cover C(t′ ) for all t′ in some open interval around t. Since [a1 , b1 ] \ J is compact, it is covered by a finite collection of such intervals, and it is evident that we can construct a cover of { x ∈ C : x1 ∈ /J} of total volume less than ε/2. Lemma 11.2.5. If C has measure zero, then P (C) has measure zero. S Proof. Since P (C) = n=1,2,... P1/n (C), it suffices to show that Pδ (C) has measure zero for any δ > 0. For any ε > 0 there is a covering of C by finitely many rectangles of total volume less than ε. For each t there is an induced covering C(t) be a finite collection of rectangles, and there is an induced covering of [a1 , b1 ]. The total length of intervals with induced coverings of total volume greater than δ cannot exceed ε/δ. 11.3 Sard’s Theorem We now come to this chapter’s central result. Recall that a critical point of a C function is a point in the domain at which the rank of the derivative is less than the dimension of the range, and a critical value is a point in the range that is the image of a critical point. 1 Theorem 11.3.1. If U ⊂ Rm is open and f : U → Rn is a C r function, where r > max{m − n, 0}, then the set of critical values of f has measure zero. Proof. If n = 0, then f has no critical points and therefore no critical values. If m = 0, then U is either a single point or the null set, and if n > 0 its image has measure zero. 
Therefore we may assume that m, n > 0. Since r > m − n implies both r > (m − 1) − (n − 1) and r > (m − 1) − n, by induction we may assume that the claim has been established with (m, n) replaced by either (m − 1, n − 1) or (m − 1, n).

Let C be the set of critical points of f . For i = 1, . . . , r let Ci be the set of points in U at which all partial derivatives of f up to order i vanish. It suffices to show that:

(a) f (C \ C1 ) has measure zero;
(b) f (Ci \ Ci+1 ) has measure zero for all i = 1, . . . , r − 1;
(c) f (Cr ) has measure zero.

Proof of (a): We will show that each x ∈ C \ C1 has a neighborhood V such that f (V ∩ C) has measure zero. This suffices because C \ C1 is an open subset of a closed set, so it is covered by countably many compact sets, each of which is covered by finitely many such neighborhoods, and consequently it has a countable cover by such neighborhoods.

After reindexing we may assume that ∂f1 /∂x1 (x) ≠ 0. Let V be a neighborhood of x in which ∂f1 /∂x1 does not vanish. Let h : V → Rm be the function h(x) := (f1 (x), x2 , . . . , xm ). The matrix of partial derivatives of h at x has first row (∂f1 /∂x1 (x), ∂f1 /∂x2 (x), . . . , ∂f1 /∂xm (x)), and its remaining rows are the corresponding rows of the m × m identity matrix, so the inverse function theorem implies that, after replacing V with a smaller neighborhood of x, h is a diffeomorphism onto its image. The chain rule implies that the critical values of f are the critical values of g = f ◦ h−1 , so we can replace f with g, and g has the additional property that g1 (z) = z1 for all z in its domain. The upshot of this argument is that we may assume without loss of generality that f1 (x) = x1 for all x ∈ V .

For each t ∈ R let V t := { w ∈ Rm−1 : (t, w) ∈ V }, let f t : V t → Rn−1 be the function f t (w) := (f2 (t, w), . . . , fn (t, w)), and let C t be the set of critical points of f t . The matrix of partial derivatives of f at x ∈ V has first row (1, 0, . . . , 0) and, for i = 2, . . . , n, i-th row (∂fi /∂x1 (x), ∂fi /∂x2 (x), . . . , ∂fi /∂xm (x)), so x is a critical point of f if and only if (x2 , . . . , xm ) is a critical point of f x1 , and consequently

C ∩ V = ⋃t {t} × C t   and   f (C ∩ V ) = ⋃t {t} × f t (C t ).

Since the result is known to be true with (m, n) replaced by (m − 1, n − 1), each f t (C t ) has (n − 1)-dimensional measure zero. In addition, the continuity of the relevant partial derivatives implies that C \ C1 is locally closed, so Proposition 11.2.1 implies that f (C ∩ V ) has measure zero.

Proof of (b): As above, it is enough to show that an arbitrary x ∈ Ci \ Ci+1 has a neighborhood V such that f (Ci ∩ V ) has measure zero. Choose an (i + 1)-st order partial derivative ∂ i+1 fj /∂xs1 · · · ∂xsi ∂xsi+1 of some component fj of f that does not vanish at x. Define h : U → Rm by h(x) := (∂ i fj /∂xs1 · · · ∂xsi (x), x2 , . . . , xm ). After reindexing we may assume that si+1 = 1, so that the matrix of partial derivatives of h at x is triangular with nonzero diagonal entries. By the inverse function theorem the restriction of h to some neighborhood V of x is a C ∞ diffeomorphism. Let g := f ◦ (h|V )−1 . Then h(V ∩ Ci ) ⊂ {0} × Rm−1 . Let g0 : { y ∈ Rm−1 : (0, y) ∈ h(V ) } → Rn be the map g0 (y) = g(0, y). Then f (V ∩ (Ci \ Ci+1 )) is contained in the set of critical values of g0 , and the latter set has measure zero because the result is already known when (m, n) is replaced by (m − 1, n).

Proof of (c): Since U can be covered by countably many compact cubes, it suffices to show that f (Cr ∩ I) has measure zero whenever I ⊂ U is a compact cube.
Since I is compact and the partials of f of order r are continuous, Taylor's theorem implies that for every ε > 0 there is δ > 0 such that ‖f (x + h) − f (x)‖ ≤ ε‖h‖^r whenever x, x + h ∈ I with x ∈ Cr and ‖h‖ < δ. Let L be the side length of I. For each integer d > 0 divide I into d^m subcubes of side length L/d. The diameter of such a subcube is √m L/d. If this quantity is less than δ and the subcube contains a point x ∈ Cr , then its image is contained in a cube of side length 2ε(√m L/d)^r centered at f (x). There are d^m subcubes of I, each one of which may or may not contain a point in Cr , so for large d, f (Cr ∩ I) is contained in a finite union of cubes of total volume at most 2^n (√m L)^{rn} ε^n d^{m−nr} . Now observe that nr ≥ m: either m < n and r ≥ 1, or m ≥ n and nr ≥ n(m − n + 1) = (n − 1)(m − n) + m ≥ m. Therefore f (Cr ∩ I) is contained in a finite union of cubes of total volume at most 2^n (√m L)^{rn} ε^n , and ε may be arbitrarily small.

Instead of worrying about just which degree of differentiability is the smallest that allows all required applications of Sard's theorem, in the remainder of the book we will, for the most part, work with objects that are smooth, where smooth is a synonym for C ∞ . This will result in no loss of generality, since for the most part the arguments depend on the existence of smooth objects, which will follow from Proposition 10.2.7. However, in Chapter 15 the given objects may, in applications, be only C 1 , but Sard's theorem will still be applicable because the domain and range have the same dimension. It is perhaps worth mentioning that for this particular case there is a simpler proof, which can be found on p. 72 of Spivak (1965).

We briefly describe the most general and powerful version of Sard's theorem, which depends on a more general notion of dimension.

Definition 11.3.2. For α > 0, a set S ⊂ Rk has α-dimensional Hausdorff measure zero if, for any ε > 0, there is a sequence {(xj , δj )}∞j=1 such that

S ⊂ ⋃j Uδj (xj )   and   Σj δj^α < ε.

Note that this definition makes perfect sense even if α is not an integer! Let U ⊂ Rm be open, and let f : U → Rn be a C r function. For 0 ≤ p < m let Rp be the set of points x ∈ U such that the rank of Df (x) is less than or equal to p. The most general and sophisticated version of Sard's theorem, due to Federer, states that f (Rp ) has α-dimensional Hausdorff measure zero for all α > p + (m − p)/r. A beautiful informal introduction to the circle of ideas surrounding these concepts, which is the branch of analysis called geometric measure theory, is given by Morgan (1988). The proof itself is in Section 3.4 of Federer (1969). This reference also gives a complete set of counterexamples showing this result to be best possible.

11.4 Measure Zero Subsets of Manifolds

In most books Sard's theorem is presented as a result concerning maps between Euclidean spaces, as in the last section, with relatively little attention to the extension to maps between manifolds. Certainly this extension is intuitively obvious, and there are no real surprises or subtleties in the details, which are laid out in this section.

Definition 11.4.1. If M ⊂ Rk is an m-dimensional C 1 manifold, then S ⊂ M has m-dimensional measure zero if ϕ−1 (S) has measure zero whenever U ⊂ Rm is open and ϕ : U → M is a C 1 parameterization.

In order for this to be sensible, it should be the case that ϕ(S) has measure zero whenever ϕ : U → M is a C 1 parameterization and S ⊂ U has measure zero.
That is, it must be the case that if ϕ′ : U ′ → M is another C 1 parameterization, then ϕ′ −1 (ϕ(S)) has measure zero. This follows from the application of Lemma 11.1.4 to ϕ′ −1 ◦ ϕ. Clearly the basic properties of sets of measure zero in Euclidean spaces—the complement of a set of measure zero is dense, and countable unions of sets of measure zero have measure zero—extend, by straightforward verifications, to subsets of manifolds of measure zero. Since uncountable unions of sets of measure zero need not have measure zero, the following fact about manifolds (as we have defined them, namely submanifolds of Euclidean spaces) is comforting, even if the definition above makes it superfluous. Lemma 11.4.2. If M ⊂ Rk is an m-dimensional C 1 manifold, then M is covered by the images of a countable system of parameterizations {ϕj : Uj → M}j=1,2,.... Proof. If p ∈ M and ϕ : U → M is a C r parameterization with p ∈ ϕ(U), then there is an open set W ⊂ Rk such that ϕ(U) = M ∩ W . Of course there is an open ball B of rational radius whose center has rational coordinates with p ∈ B ⊂ W , and we may replace ϕ with its restriction to ϕ−1 (B). Now the claim follows from the fact that there are countably many balls in Rk of rational radii centered at points with rational coordinates. The “conceptually correct” version of Sard’s theorem is an easy consequence of the Euclidean special case. 158 CHAPTER 11. SARD’S THEOREM Theorem 11.4.3. (Morse-Sard Theorem) If f : M → N is a smooth map, where M and N are smooth manifolds, then the set of critical values of f has measure zero. Proof. Let C be the set of critical points of f . In view of the last result it suffices to show that f (C ∩ϕ(U)) has measure zero whenever ϕ : U → M is a parameterization for M. That is, we need to show that ψ −1 (f (C ∩ ϕ(U))) has measure zero whenever ψ : V → N is a parameterization for N. But ψ −1 (f (C ∩ ϕ(U))) is the set of critical values of ψ −1 ◦ f ◦ ϕ, so this follows from Theorem 11.3.1. 11.5 Genericity of Transversality Intuitively, it should be unlikely that two smooth curves in 3-dimensional space intersect. If they happen to, it should be possible to undo the intersection by “perturbing” one of the curves slightly. Similarly, a smooth curve and a smooth surface in 3-space should intersect at isolated points, and again one expects that a small perturbation can bring about this situation if it is not the case initially. Sard’s theorem can help us express this intuition precisely. Let M and N be m- and n-dimensional smooth manifolds, and let P be a pdimensional smooth submanifold of N. Recall that a smooth function f : M → N is transversal to P if Df (p)(Tp M) + Tf (p) P = Tf (p) N for every p ∈ f −1 (P ). The conceptual point studied in this section is that smooth functions from M to N that are transversal to P are plentiful, in the sense that any continuous function can be approximated by such a map. This is expressed more precisely by the following result. Proposition 11.5.1. If f : M → N is a continuous function and A ⊂ M × N is a neighborhood of Gr(f ), then there is a smooth function f ′ : M → N that is transverse to P with Gr(f ′ ) ⊂ A. In some applications the approximation will be required to satisfy some restriction. A vector field on a set S ⊂ M is a continuous function ζ : S → T M such that π ◦ ζ = IdM , where π : T M → M is the projection. We can write ζ(p) = (p, ζp ) where ζp ∈ Tp M. Thus a vector field on S attaches a tangent vector ζp to each p ∈ S, in a continuous manner. 
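As a simple concrete example (a hypothetical aside, not from the text), the map p 7→ (p, ζp ) with ζp = (−p2 , p1 ) is a vector field on the circle M = { p ∈ R2 : ‖p‖ = 1 }: each ζp is tangent to the circle at p, and composing with the projection returns p.

    import numpy as np

    # Hypothetical example: a nowhere vanishing vector field on the unit circle.
    def zeta(p):
        return p, np.array([-p[1], p[0]])      # (p, zeta_p) with zeta_p in T_p M

    rng = np.random.default_rng(1)
    for t in rng.uniform(0, 2 * np.pi, size=10):
        p = np.array([np.cos(t), np.sin(t)])
        base, v = zeta(p)
        assert np.allclose(base, p)            # pi(zeta(p)) = p
        assert abs(np.dot(v, p)) < 1e-12       # v is orthogonal to p, i.e. tangent to the circle
    print("zeta assigns a tangent vector to each point of the circle")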
The zero section of T M is M × {0} ⊂ T M. If we like we can think of it as the image of the vector field that is identically zero, and of course it is also an m-dimensional smooth submanifold of T M. Proposition 11.5.2. If ζ is a vector field on M and A ⊂ T M is an open neighborhood of { (p, ζp ) : p ∈ M }, then there is a smooth vector field ζ ′ such that { (p, ζp′ ) : p ∈ M } ⊂ A and ζ ′ is transverse to the zero section of T M. To obtain a framework that is sufficiently general we introduce an s-dimensional smooth manifold S and a smooth submersion h : N → S. We now fix a continuous function f : M → N such that h ◦ f is a smooth submersion. Our main result, which has the two results above as corollaries, is: 11.5. GENERICITY OF TRANSVERSALITY 159 Theorem 11.5.3. If A ⊂ M × N is an open neighborhood of Gr(f ), then there is a smooth function f ′ : M → N with Gr(f ′ ) ⊂ A, h ◦ f ′ = h ◦ f , and f ′ ⋔ P . The first proposition above is the special case of this in which S is a point. To obtain the second proposition we set S = M and let h be the natural projection T M → M. The rest of this section is devoted to the proof of Theorem 11.5.3. The proof is a matter of repeated modifying f on sets that are small enough that we can conduct the construction in a fully Euclidean setting. In the following result, which describes the local modification, there is an open domain U ⊂ Rm and a compact “target” set K ⊂ U. There is a closed set C ⊂ U and an open neighborhood Y of C on which the desired tranversality already holds. We wish to modify the function so that the desired transversality holds on a possibly smaller neighborhood of C, and also on a neighborhood of K. However, when we apply this result U will be, in effect, an open subset of M, and in order to preserve the properties of the given function at points in the boundary of U in M it will need to be the case that the function is unchanged outside of a given compact neighborhood of K. Collectively these requirements create a significant burden of complexity. Proposition 11.5.4. Let U be an open subset of Rm , suppose that C ⊂ W ⊂ U with C relatively closed and W open, let K be a compact subset of U, and let Z be an open neighborhood of K whose closure is a compact subset of U. Suppose that g = (g s , g n−s) : U → Rn = Rs × Rn−s is a continuous function, g|W is smooth, and g s is a smooth submersion. Let P be a p-dimensional smooth submanifold of Rn , and suppose that g|C ⋔ P . Let A ⊂ U ×Rn be a neighborhood of the graph of g. Then there is an open Z ′ ⊂ Z containing U and a continuous function g̃ n−s : U → Rn−s such that, setting g̃ = (g s , g̃ n−s ) : U → Rn , we have: (a) Gr(g̃) ⊂ A; (b) g̃|U \Z = g|U \Z ; (c) g̃|W ∪Z ′ is smooth; (d) g̃ ⋔C∪K P . We first explain how this result can be used to prove Theorem 11.5.3. The next result describes how the local modification of f looks in context. Let ψ : V → N be a smooth parameterization. We say that ψ is aligned with h if h(ψ(y)) is independent of ys+1, . . . , yn . Lemma 11.5.5. Suppose that C ⊂ W ⊂ M with C closed and W open, f |W is smooth, and f ⋔C P . Let A ⊂ M × N be an open set that contains the graph of f . Suppose that ϕ : U → M and ψ : V → N are smooth parameterizations with f (ϕ(U)) ⊂ ψ(V ) and ψ aligned with h. Suppose that K ⊂ ϕ(U) is compact, and Z is an open subset of ϕ(U) whose closure is compact and contained in ϕ(U). Then there is an open Z ′ ⊂ Z containing K and a continuous function f˜ : M → N such that: 160 CHAPTER 11. 
SARD’S THEOREM (a) Gr(f˜) ⊂ A; (b) f˜|M \Z = f |M \Z ; (c) f˜|W ∪Z ′ is smooth; (d) f˜ ⋔C∪K P . Proof. Let P̃ = ψ −1 (P ), g = ψ −1 ◦ f ◦ ϕ, C̃ = ϕ−1 (C), à = { (x, y) ∈ U × V : (ϕ(x), ψ(y)) ∈ A }, W̃ = ϕ−1 (W ), K̃ = ϕ−1 (K), Z̃ = ϕ−1 (Z). Let Z̃ ′ and g̃ be the set and function whose existence is guaranteed by the last result. Set Z ′ = ϕ(Z̃ ′ ), and define f˜ by specifying that f˜ agrees with f on M \ ϕ(U), and f˜|ϕ(U ) = ψ ◦ g̃ ◦ ϕ−1 . Clearly f˜ has all the desired properties. In order to apply this we need to have an ample supply of smooth parameterizations for N that are aligned with h. Lemma 11.5.6. Each point q ∈ f (M) is contained in the image of a smooth parameterization that is aligned with h. Proof. Let ψ̃ : Ṽ → N be any smooth parameterization whose image contains q, and let y = ψ̃ −1 (q). Let σ : W → S be a smooth parameterization whose image contains h(q); we can replace Ṽ with a smaller open set containing y, so we may assume that h(ψ̃(Ṽ )) ⊂ σ(W ). Since h ◦ f is a submersion, the rank of Dh(q) is s, and consequently the rank of D(σ −1 ◦ h ◦ ψ̃)(y) is also s. After some reindexing, y is a regular point of ′ , . . . , yn′ ). θ : y ′ 7→ (σ −1 (h(ψ̃(y ′ ))), ys+1 Applying the inverse function theorem, a smooth parameterization whose image contains q that is aligned with h is given by letting ψ be the restriction of ψ̃ ◦ θ−1 to some neighborhood of (σ −1 (h(q)), ys+1, . . . , yn ). Proof of Theorem 11.5.3. Any open subset of M is a union of open subsets whose closures are compact. In view of this fact and the last result, M is covered by the sets ϕ(U) where ϕ : U → M is a smooth parameterization with f (ϕ(U)) ⊂ ψ(V ) for some smooth parameterization ψ : V → N that is aligned with h, and the closure of ϕ(U) is compact. Since M is paracompact, there is a locally finite cover by the images of such parameterizations, and since M is separable, this cover is countable. That is, there is a sequence ϕ1 : U1 → M, ϕ2 : U2 → M, . . . whose images cover M, such that for each i, the closure of ϕi (Ui ) is compact and there is a smooth parameterization ψi : Vi → N that is aligned with h such that f (ϕi (Ui )) ⊂ ψi (Vi ). We claim that there is a sequence K1 , K2 , . . . of compact subsets of M that cover M, with Ki ⊂ ϕi (Ui ) for each i. For p ∈ M let δ(p) be the maximum ε such that 11.5. GENERICITY OF TRANSVERSALITY 161 Uε (p) ⊂ ϕi (Ui ) for some i, and let ip be an integer that attains the maximum. Then δ : M → (0, ∞) is a continuous function. For each i let δi := minp∈ϕi (Ui ) δ(p), and let Ki = { p ∈ ϕi (Ui ) : Uδi (p) ⊂ ϕi (Ui ) }. Clearly Ki is a closed subset of ϕi (Ui ), so it is compact. For any p ∈ M we have p ∈ Kip , so the sets K1 , K2 , . . . cover M. Let C0 = ∅, and for each positive i let Ci = K1 ∪ . . . ∪ Ki . Let f0 = f . Suppose for some i = 1, 2, . . . that we have already constructed a neighborhood Wi−1 of Ci−1 and a continuous function fi−1 : M → N with Gr(fi−1 ) ⊂ A such that f |Wi−1 is smooth and f ⋔Ci−1 P . Let Zi be an open subset of ϕi (Ui ) that contains Ki , and whose closure is compact and contained in ϕi (Ui ). Now Lemma 11.5.5 gives an open Zi′ ⊂ Zi containing Ki and a continuous function fi : M → N with Gr(fi ) ⊂ A such that fWi−1 ∪Zi′ is smooth, fi |M \Zi = fi−1 |M \Zi , and fi ⋔Ci P . Set Wi = Wi−1 ∪ Zi′ . Evidently this constructive process can be extended to all i. For each i, ϕi (Ui ) intersects only finitely many ϕj (Uj ), so the sequence f1 |ϕi (Ui ) , f2 |ϕi (Ui ) , . . . is unchanging after some point. Thus the sequence f1 , f2 , . . . 
has a well defined limit that is smooth and transversal to P, and whose graph is contained in A. We now turn to the proof of Proposition 11.5.4. The main idea is to select a suitable member from a family of perturbations of g. The following lemma isolates the step in the argument that uses Sard’s theorem. Lemma 11.5.7. If U ⊂ Rm and B ⊂ Rn−s are open, P is a p-dimensional smooth submanifold of Rn, and G : U × B → Rn is smooth and transversal to P, then for almost every b ∈ B the function gb = G(·, b) : U → Rn is transversal to P. Proof. Let Q = G−1(P). By the transversality theorem, Q is a smooth manifold, of dimension (m + (n − s)) − (n − p) = m + p − s. Let π be the natural projection U × B → B. Sard’s theorem implies that almost every b ∈ B is a regular value of π|Q. Fix such a b. We will show that gb is transversal to P. Fix x ∈ gb−1(P), set q = gb(x), and choose some y ∈ Tq Rn. Since G is transversal to P there is a u ∈ T(x,b)(U × B) such that y is the sum of DG(x, b)u and an element of Tq P. Let u = (v, w) where v ∈ Rm and w ∈ Rn−s. Since (x, b) is a regular point of π|Q, there is a u′ ∈ T(x,b)Q such that Dπ|Q(x, b)u′ = −w. Let u′ = (v′, −w). Then Tq P contains DG(x, b)u′, so it contains DG(x, b)u − y + DG(x, b)u′ = DG(x, b)(v + v′, 0) − y = Dgb(x)(v + v′) − y. Thus y is the sum of Dgb(x)(v + v′) and an element of Tq P, as desired. Proof of Proposition 11.5.4. For x ∈ U let α̃(x) be the supremum of the set of αx > 0 such that (x′, y) ∈ A whenever x′ ∈ Uαx(x) and y ∈ Uαx(g(x′)). Clearly α̃ is continuous and positive, so (e.g., Proposition 10.2.7 applied to ½α̃) there is a smooth function α : U → (0, ∞) such that 0 < α(x) < α̃(x) for all x. There is a neighborhood Y ⊂ U of C such that g|Y is smooth with g|Y ⋔ P. Let Y′ be an open subset of U with C ⊂ Y′ and the closure of Y′ contained in Y. Corollary 10.2.5 gives a smooth function β : U → [0, 1] that vanishes on Y′ and is identically equal to one on U \ Y. Let B be the open unit disk centered at the origin in Rn−s, and let G : U × B → Rn be the smooth function G(x, b) = (gs(x), gn−s(x) + α(x)β(x)b). For any (x, b) the image of DG(x, b) contains the image of Dg(x), so, since g|Y ⋔ P, we have G ⋔Y×B P. Since gs is a submersion, at every (x, b) such that β(x) > 0 the image of DG(x, b) is all of Rn, so G ⋔(U\Y)×B P. Therefore G ⋔ P. The last result implies that for some b ∈ B, gb = G(·, b) is transversal to P. Evidently gb agrees with g on Y′. Let Z′ be an open subset of U with K ⊂ Z′ and the closure of Z′ contained in Z. Corollary 10.2.5 gives a smooth γ : U → [0, 1] that is identically one on Z′ and vanishes on U \ Z. Define g̃ by setting g̃(x) = (gs(x), γ(x)gbn−s(x) + (1 − γ(x))gn−s(x)). Clearly Gr(g̃) ⊂ A, g̃ is smooth on W ∪ Z′, and g̃ agrees with g on U \ Z. Moreover, g̃ agrees with g on Y′ and with gb on Z′, so g̃ ⋔Y′∪Z′ P. Chapter 12 Degree Theory Orientation is an intuitively familiar phenomenon, modelling, among other things, the fact that there is no way to turn a left shoe into a right shoe by rotating it, but the mirror image of a left shoe is a right shoe. Consider that when you look at a mirror there is a coordinate system in which the map taking each point to its mirror image is the linear transformation (x1, x2, x3) ↦ (−x1, x2, x3). It turns out that the critical feature of this transformation is that its determinant is negative.
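As a minimal numerical illustration of this point (a Python/NumPy sketch that is not part of the text), the mirror map above has determinant −1, a rotation has determinant +1, and composing two mirror maps restores orientation because the determinant is multiplicative.

```python
import numpy as np

# The mirror map (x1, x2, x3) -> (-x1, x2, x3) in the coordinate system above.
reflection = np.diag([-1.0, 1.0, 1.0])
print(np.linalg.det(reflection))              # -1.0: orientation reversing

# A rotation about the x3-axis, by contrast, has determinant +1.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
print(np.linalg.det(rotation))                # 1.0: orientation preserving

# Two reflections compose to an orientation preserving map, since the
# determinant of a composition is the product of the determinants.
print(np.linalg.det(reflection @ reflection)) # 1.0
```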
Section 12.1 describes the formalism used to impose an orientation on a vector space and consistently on the tangent spaces of the points of a manifold, when this is possible. Section 12.2 discusses two senses in which an orientation on a given object induces a derived orientation: a) an orientation on a ∂-manifold induces an orientation of its boundary; b) given a smooth map between two manifolds of the same dimension, an orientation of the tangent space of a regular point in the domain induces an orientation of the tangent space of that point’s image. If both manifolds are oriented, we can define a sense in which the map is orientation preserving or orientation reversing by comparing the induced orientation of the tangent space of the image point with its given orientation. In Section 12.3 we first define the smooth degree of a smooth (where “smooth” now means C ∞ ) map over a regular value in the range to be the number of preimages of the point at which the map is orientation preserving minus the number of points at which it is orientation reversing. Although the degree for smooth functions provides the correct geometric intuition, it is insufficiently general. The desired generalization is achieved by approximating a continuous function with smooth functions, and showing that any two sufficiently accurate approximations are homotopic, so that such approximations can be used to define the degree of the given continuous function. However, instead of working directly with such a definition, it turns out that an axiomatic characterization is more useful. 12.1 Orientation The intuition underlying orientation is simple enough, but the formalism is a bit heavy, with the main definitions expressed as equivalence classes of an equivalence relation. We assume prior familiarity with the main facts about determinants of matrices. 163 164 CHAPTER 12. DEGREE THEORY No doubt most readers are well aware that a linear automorphism (that is, a linear transformation from a vector space to itself) has a determinant. What we mean by this is that the determinant of the matrix representing the transformation does not depend on the choice of coordinate system. Concretely, if L and L′ are the matrices of the transformation in two coordinate systems, then there is a matrix U (expressing the change of coordinates) such that L′ = U −1 LU, so that |L′ | = |U −1 LU| = |U|−1 |L| |U| = |L|. Let X be an m-dimensional vector space. An ordered basis of X is an ordered m-tuple (v1 , . . . , vm ) of linearly independent vectors in X. Mostly we will omit the parentheses, writing v1 , . . . , vm when the interpretation is clear. If v1 , . . . , vm ′ and v1′ , . . . , vm are ordered bases, we say that they have the same orientation if ′ the determinant |L| of the linear map L taking v1 7→ v1′ , . . . , vm 7→ vm is positive, and otherwise they have the opposite orientation. To verify that “has the same orientation as” is an equivalence relation we observe that it is reflexive because the determinant of the identity matrix is positive, symmetric because the determinant of L−1 is 1/|L|, and transitive because the determinant of the composition of two linear functions is the product of their determinants. ′ The last fact also implies that if v1 , . . . , vm and v1′ , . . . , vm have the opposite ′′ ′′ ′ ′ orientation, and v1 , . . . , vm and v1 , . . . , vm also have the opposite orientation, then ′′ v1 , . . . , vm and v1′′ , . . . , vm must have the same orientation, so there are precisely two equivalence classes. 
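As a concrete check of the preceding discussion, the following sketch (Python with NumPy; it is not part of the text, and the helper name same_orientation is ours) classifies randomly generated ordered bases of R3 by the sign of the determinant and confirms that “has the same orientation as” holds exactly within the two resulting classes, so that there are indeed precisely two equivalence classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def same_orientation(V, W):
    """V, W: m x m matrices whose columns are ordered bases of R^m.
    The linear map L with L v_i = w_i has matrix W @ inv(V), and
    det(W @ inv(V)) = det(W) / det(V), so only the two signs matter."""
    L = W @ np.linalg.inv(V)
    return np.linalg.det(L) > 0

# Sample random ordered bases (a random matrix is invertible with probability 1).
bases = [rng.standard_normal((3, 3)) for _ in range(200)]

# Partition by the sign of the determinant; "same orientation" should hold
# exactly within each part, giving precisely two equivalence classes.
positive = [V for V in bases if np.linalg.det(V) > 0]
negative = [V for V in bases if np.linalg.det(V) < 0]
assert all(same_orientation(positive[0], V) for V in positive)
assert all(same_orientation(negative[0], V) for V in negative)
assert not same_orientation(positive[0], negative[0])
print(len(positive), len(negative))
```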
An orientation for X is one of these equivalence classes. An oriented vector space is a vector space for which one of the two orientations has been specified. An ordered basis of an oriented vector space is said to be positively oriented (negatively oriented) if it is (not) an element of the specified orientation. Since the determinant is continuous, each orientation is an open subset of the set of ordered bases of X. The two orientations are disjoint, and their union is the entire set of ordered bases, so each path component of the space of ordered bases is contained in one of the two orientations. If the space of ordered bases had more than two path components, it would be possible to develop an invariant that was more sophisticated than orientation. But this is not the case. Proposition 12.1.1. Each orientation of X is path connected. Proof. Fix a “standard” basis e1, . . . , em and some ordered basis v1, . . . , vm. We will show that there is a path in the space of ordered bases from v1, . . . , vm to either e1, e2, . . . , em or −e1, e2, . . . , em. Thus the space of ordered bases has at most two path components, and since each orientation is a nonempty union of path components, each must be a path component. If i ≠ j, then for any t ∈ R the determinant of the linear transformation taking v1, . . . , vm to v1, . . . , vi + tvj, . . . , vm is one, so varying t gives a continuous path in the space of ordered bases. Combining such paths, we can find a path from v1, . . . , vm to w1, . . . , wm where wi = Σj bij ej with bij ≠ 0 for all i and j. Beginning at w1, . . . , wm, such paths can be combined to eliminate all off diagonal coefficients, arriving at an ordered basis of the form c1e1, . . . , cmem. From here we can continuously rescale the coefficients, arriving at an ordered basis d1e1, . . . , dmem with di = ±1 for all i. For any ordered basis v1, . . . , vm and any i = 1, . . . , m − 1 there is a path θ ↦ (v1, . . . , vi−1, cos θ vi + sin θ vi+1, cos θ vi+1 − sin θ vi, vi+2, . . . , vm) from [0, π] to the space of ordered bases. Evidently such paths can be combined to construct a path from d1e1, . . . , dmem to ±e1, e2, . . . , em. This result has a second interpretation. The general linear group of X is the group GL(X) of all nonsingular linear transformations L : X → X, with composition as the group operation. The identity component of GL(X) is the subgroup GL+(X) of linear transformations with positive determinant. If we fix a particular basis e1, . . . , em there is a bijection L ↔ (Le1, . . . , Lem) between GL(X) and the set of ordered bases of X, which gives the following version of the last result. Corollary 12.1.2. GL+(X) is path connected. We wish to extend the notion of orientation to ∂-manifolds. Let M ⊂ Rk be an m-dimensional smooth ∂-manifold. Roughly, an orientation of M is a “continuous” assignment of orientations to the tangent spaces at the various points of M. One way to do this is to require that if ϕ : U → M is a smooth parameterization, where U is a connected open subset of X, and (v1, . . . , vm) is an ordered basis of X, then the bases (Dϕ(x)v1, . . . , Dϕ(x)vm) are all either positively oriented or negatively oriented. The method we adopt is a bit more concrete, and its explanation is a bit long winded, but the tools we obtain will be useful later. A path in M is a continuous function γ : [a, b] → M, where a < b. Fix such a γ.
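Before turning to vector fields along γ, here is a small numerical sketch (Python with NumPy; not part of the text, and the helper names are ours) of the mechanism behind the proof of Proposition 12.1.1: each elementary move used there — a shear vi ↦ vi + t vj, a rescaling of a basis vector by a positive constant, and a rotation in the plane of two consecutive basis vectors — has positive determinant, so no path built from such moves can leave an orientation.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.standard_normal((4, 4))          # columns = an ordered basis of R^4
sign0 = np.sign(np.linalg.det(V))

# (i) Shear: replace v_i by v_i + t v_j (i != j).  Determinant is unchanged.
def shear(V, i, j, t):
    W = V.copy()
    W[:, i] = W[:, i] + t * W[:, j]
    return W

# (ii) Rescale a column by a positive factor: the sign of the determinant is kept.
def rescale(V, i, c):
    assert c > 0
    W = V.copy()
    W[:, i] = c * W[:, i]
    return W

# (iii) Rotate in the plane of columns i and i+1: determinant is unchanged.
def rotate(V, i, theta):
    W = V.copy()
    vi, vj = V[:, i].copy(), V[:, i + 1].copy()
    W[:, i]     = np.cos(theta) * vi + np.sin(theta) * vj
    W[:, i + 1] = np.cos(theta) * vj - np.sin(theta) * vi
    return W

# Apply a random string of such moves; the orientation class never changes,
# which is the invariant driving the proof of Proposition 12.1.1.
W = V
for _ in range(50):
    k = rng.integers(3)
    if k == 0:
        i, j = rng.choice(4, size=2, replace=False)
        W = shear(W, i, j, rng.standard_normal())
    elif k == 1:
        W = rescale(W, rng.integers(4), float(rng.uniform(0.2, 5.0)))
    else:
        W = rotate(W, rng.integers(3), float(rng.uniform(0, np.pi)))
    assert np.sign(np.linalg.det(W)) == sign0
print("orientation class preserved along the path")
```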
A vector field along γ is a continuous function from [a, b] to Rk that maps each t to an element of Tγ(t)M. A moving frame along γ is an m-tuple v = (v1, . . . , vm) of vector fields along γ such that for each t, v(t) = (v1(t), . . . , vm(t)) is a basis of Tγ(t)M. More generally, for h = 0, . . . , m a moving h-frame along γ is an h-tuple v = (v1, . . . , vh) of vector fields along γ such that for each t, v1(t), . . . , vh(t) are linearly independent. We need to know that moving frames exist in a variety of circumstances. Proposition 12.1.3. For any h = 0, . . . , m − 1, any moving h-frame v = (v1, . . . , vh) along γ, and any vh+1 ∈ Tγ(a)M such that v1(a), . . . , vh(a), vh+1 are linearly independent, there is a vector field vh+1 along γ such that vh+1(a) = vh+1 and (v1, . . . , vh, vh+1) is a moving (h + 1)-frame for γ. There are two parts to the argument, the first of which is concrete and geometric. Lemma 12.1.4. If η : [a, b] → Rm is a path in Rm, then for any h = 0, . . . , m − 1, any moving h-frame u = (u1, . . . , uh) along η, and any uh+1 ∈ Rm such that u1(a), . . . , uh(a), uh+1 are linearly independent, there is a vector field uh+1 along η such that uh+1(a) = uh+1 and (u1, . . . , uh, uh+1) is a moving (h + 1)-frame for η. Proof. If v1, . . . , vh, w ∈ Rm and v1, . . . , vh are linearly independent, let π(v1, . . . , vh, w) = w − Σi βi vi be the projection of w onto the orthogonal complement of the span of v1, . . . , vh. The numbers βi are the solution of the linear system Σi βi⟨vi, vj⟩ = ⟨w, vj⟩, so the continuity of matrix inversion implies that π is a continuous function. First suppose that uh+1 is a unit vector that is orthogonal to u1(a), . . . , uh(a). Let s̄ be the least upper bound of the set of s in [a, b] such that there is a continuous uh+1 : [a, s] → Rm with uh+1(a) = uh+1, uh+1(t) orthogonal to u1(t), . . . , uh(t), and ∥uh+1(t)∥ = 1 for all t. The set of pairs (s, uh+1(s)) for such functions has a point of the form (s̄, vh+1) in its closure. The continuity of the inner product implies that vh+1 is a unit vector that is orthogonal to u1(s̄), . . . , uh(s̄). The continuity of π implies that there is an ε > 0 and a neighborhood U of vh+1, which we may insist is convex, such that π(u1(t), . . . , uh(t), u) ≠ 0 for all t ∈ [s̄ − ε, s̄ + ε] ∩ [a, b] and all u ∈ U. We can choose s ∈ [s̄ − ε, s̄) and a function uh+1 : [a, s] → Rm satisfying the conditions above with uh+1(s) ∈ U. We extend uh+1 to all of [a, min{s̄ + ε, b}] by setting uh+1(t) = ũh+1(t)/∥ũh+1(t)∥, where ũh+1(t) = π(u1(t), . . . , uh(t), ((s̄ − t)/(s̄ − s))uh+1(s) + ((t − s)/(s̄ − s))vh+1) for s ≤ t ≤ s̄, and ũh+1(t) = π(u1(t), . . . , uh(t), vh+1) for s̄ ≤ t ≤ min{s̄ + ε, b}. Then uh+1 contradicts the definition of s̄ if s̄ < b, and for s̄ = b it provides a satisfactory function. To prove the general case we write uh+1 = Σi αi ui(a) + βu′h+1 where u′h+1 is a unit vector that is orthogonal to u1(a), . . . , uh(a). If we now let u′h+1 denote the vector field constructed in the last paragraph with u′h+1 in place of uh+1, then we can let uh+1 = Σi αi ui + βu′h+1. The general result is obtained by applying this in the context of a finite collection of parameterizations that cover γ. Proof of Proposition 12.1.3. There are a = t0 < t1 < · · · < tJ−1 < tJ = b such that for each j = 1, . . . , J, the image of γ|[tj−1,tj] is contained in the image of a smooth parameterization. We may assume that J = 1 because the general case can obviously be obtained from J applications of this special case.
Thus there is a smooth parameterization ϕ : U → M whose image contains the image of γ. Let ψ = ϕ−1, let η := ψ ◦ γ, let uh+1 = Dψ(γ(a))vh+1(a), and define a moving h-frame u along η by setting u1(t) := Dψ(γ(t))v1(t), . . . , uh(t) := Dψ(γ(t))vh(t). The last result gives a uh+1 : [a, b] → Rm such that uh+1(a) = uh+1 and (u1, . . . , uh, uh+1) is a moving (h + 1)-frame along η. We define vh+1 on [a, b] by setting vh+1(t) = Dϕ(η(t))uh+1(t). Corollary 12.1.5. For any basis v1, . . . , vm of Tγ(a)M there is a moving frame v along γ such that vh(a) = vh for all h. If the ordered basis v′1, . . . , v′m of Tγ(b)M has the same orientation as v1(b), . . . , vm(b), then there is a moving frame v′ along γ such that v′h(a) = vh and v′h(b) = v′h for all h. Proof. The first assertion is obtained by applying the Proposition m times. To prove the second we regard GL(Rm) as a group of matrices and let ρ : [a, b] → GL+(Rm) be a path with ρ(a) the identity matrix and ρ(b) the matrix such that Σj ρij(b)vj(b) = v′i for all i, as per Corollary 12.1.2. Define v′ by setting v′i(t) = Σj ρij(t)vj(t). Given a moving frame v and an orientation of Tγ(a)M, there is an induced orientation of Tγ(b)M defined by requiring that v(b) is positively oriented if and only if v(a) is positively oriented. The last result implies that it is always possible to induce an orientation in this way, because a moving frame always exists, and the next result asserts that the induced orientation does not depend on the choice of moving frame, so there is a well defined orientation of Tγ(b)M that is induced by γ and an orientation of Tγ(a)M. Lemma 12.1.6. If v and ṽ are two moving frames along a path γ : [a, b] → M, then v(a) and ṽ(a) have the same orientation if and only if v(b) and ṽ(b) have the same orientation. Proof. For a ≤ t ≤ b let A(t) = (aij(t)) be the matrix such that ṽi(t) = Σj aij(t)vj(t). Then A is continuous, and the determinant is continuous, so t ↦ |A(t)| is a continuous function that never vanishes, and consequently |A(a)| > 0 if and only if |A(b)| > 0. If γ(b) = γ(a) and a given orientation of Tγ(a)M = Tγ(b)M differs from the one induced by the given orientation and γ, then we say that γ is an orientation reversing loop. Suppose that M has no orientation reversing loops. For any choice of a “base point” p0 in each path component of M and any specification of an orientation of each Tp0M, there is an induced orientation of TpM for each p ∈ M defined by requiring that whenever γ : [a, b] → M is a continuous path with γ(a) = p0, the orientation of Tγ(b)M is the one induced by γ and the given orientation of Tp0M. If γ′ : [a′, b′] → M is a second path with γ′(a′) = γ(a) and γ′(b′) = γ(b), then for any given orientation of Tγ(a)M the orientations of Tγ(b)M induced by γ and γ′ must be the same, because otherwise following γ, then backtracking along γ′, would be an orientation reversing loop. Thus, in the absence of orientation reversing loops, an orientation of Tp0M induces an orientation at every p in the path component of p0. We have arrived at the following collection of concepts. Definition 12.1.7. An orientation for M is an assignment of an orientation to each tangent space TpM such that for every moving frame v along a path γ : [a, b] → M, v(a) is a positively oriented basis of Tγ(a)M if and only if v(b) is a positively oriented basis of Tγ(b)M. We say that M is orientable if it has an orientation.
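The notion of an orientation reversing loop can be made very concrete. The following sketch (Python with NumPy; it is not part of the text, and the parameterization of the Möbius band is the standard embedding in R3, chosen only for illustration) transports the coordinate moving frame around the center circle of a Möbius band and, for comparison, around a circle on a cylinder. The loop closes up, so the initial and final frames are two bases of the same tangent plane, and the sign of the change of basis determinant detects whether the loop is orientation reversing.

```python
import numpy as np

# Coordinate tangent frame along the center circle of the Moebius band
#   Phi(u, v) = ((1 + v cos(u/2)) cos u, (1 + v cos(u/2)) sin u, v sin(u/2)),
# evaluated at v = 0; this is a moving frame along the loop gamma(u) = Phi(u, 0).
def moebius_frame(u):
    du = np.array([-np.sin(u), np.cos(u), 0.0])                                   # dPhi/du at v = 0
    dv = np.array([np.cos(u/2)*np.cos(u), np.cos(u/2)*np.sin(u), np.sin(u/2)])    # dPhi/dv at v = 0
    return np.column_stack([du, dv])

# The analogous frame on the cylinder (same circle, but no half twist).
def cylinder_frame(u):
    du = np.array([-np.sin(u), np.cos(u), 0.0])
    dv = np.array([0.0, 0.0, 1.0])
    return np.column_stack([du, dv])

def loop_sign(frame):
    """gamma(0) = gamma(2*pi), so frame(0) and frame(2*pi) are two bases of the
    same tangent plane; the sign of the change-of-basis determinant tells us
    whether the loop is orientation reversing."""
    F0, F1 = frame(0.0), frame(2.0 * np.pi)
    A, *_ = np.linalg.lstsq(F0, F1, rcond=None)   # solve F0 @ A = F1
    return np.sign(np.linalg.det(A))

print(loop_sign(moebius_frame))    # -1.0: an orientation reversing loop
print(loop_sign(cylinder_frame))   # +1.0: no reversal on the cylinder
```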
An oriented ∂-manifold is a ∂-manifold with a specified orientation. If p is a point in an oriented ∂-manifold M, we say that an ordered basis (x1, . . . , xm) of TpM is positively oriented if it is a member of the orientation of TpM specified by the orientation of M; otherwise it is negatively oriented. For any orientation of M there is an opposite orientation obtained by reversing the orientation of each TpM. Our discussion above has the following summary: Proposition 12.1.8. Exactly one of the two situations occurs: (a) M has an orientation reversing loop. (b) Each path component of M has two orientations, and any specification of an orientation for each path component of M determines an orientation of M. Probably you already know that the Möbius strip is the best known example of a ∂-manifold that is not orientable, while the Klein bottle is the best known example of a manifold that is not orientable. From several points of view two dimensional projective space is a more fundamental example of a manifold that is not orientable, but it is more difficult to visualize. (If you are unfamiliar with any of these spaces you should do a quick web search.) 12.2 Induced Orientation An orientation on a manifold induces an orientation on an open subset, obviously. More interesting is the orientation induced on ∂M by an orientation on a ∂-manifold M. We are also interested in how an orientation at a point in the domain of a smooth map between manifolds of equal dimension induces an orientation on the tangent space of the image point. As we will see, this generalizes to the image point being in an oriented submanifold whose codimension is the dimension of the domain. As before we work with an m-dimensional smooth ∂-manifold M with a given orientation. Consider a point p ∈ ∂M and a basis v1, . . . , vm of TpM with v2, . . . , vm ∈ Tp∂M. Of course v2, . . . , vm is a basis of Tp∂M. There is a visually obvious sense in which v1 is either “inward pointing” or “outward pointing” that is made precise by using a parameterization ϕ : U → M (where U ⊂ H is open) to determine whether the first component of Dϕ−1(p)v1 is positive or negative. Note that the sets of inward and outward pointing vectors are both convex. Our convention will be that an orientation of TpM induces an orientation of Tp∂M according to the rule that if v1, . . . , vm is positively oriented and v1 is outward pointing, then v2, . . . , vm is positively oriented. Does our definition of the induced orientation make sense? There are two issues to address. First, we need to show that if v1, . . . , vm and v′1, . . . , v′m are two bases of TpM with the properties described in the definition above, so that either could be used to define the induced orientation of Tp∂M, then they give the same induced orientation. Suppose that v1 and v′1 are both outward pointing. Since the set of outward pointing vectors is convex, t ↦ ((1 − t)v′1 + tv1, v′2, . . . , v′m) is a path in the space of bases of TpM, so v′1, . . . , v′m and v1, v′2, . . . , v′m have the same orientation. The first row of the matrix A of the linear transformation taking v1, v2, . . . , vm to v1, v′2, . . . , v′m (concretely, v′i = Σj aij vj) is (1, 0, . . . , 0), so the determinant of A is the same as the determinant of its lower right hand (m − 1) × (m − 1) submatrix, which is the matrix of the linear transformation taking v2, . . . , vm to v′2, . . . , v′m. Therefore v1, . . .
, vm has the same orientation as ′ ′ ′ ′ v1 , v2 . . . , vm and v1 , . . . , vm if and only if v2 , . . . , vm has the same orientation as ′ v2′ , . . . , vm . We also need to check that what we have defined as the induced orientation of ∂M is, in fact, an orientation. Consider a path γ : [a, b] → ∂M. Corollary 12.1.5 gives a moving frame (v2 , . . . , vm ) for ∂M along γ, and Proposition 12.1.3 implies that it extends to a moving frame (v1 , . . . , vm ) for M along γ. Suppose that v1 (a) is outward pointing. By continuity, it must be the case that for all t, v1 (t) is outward pointing. If we assume that v1 (a), . . . , vm (a) is positively oriented, for the given orientation, then v2 (a), . . . , vm (a) is positively oriented, for the induced orientation. In addition, v1 (b), . . . , vm (b) is positively oriented, for the given orientation, so, as desired, v2 (b), . . . , vm (b) is positively oriented, both with respect to the induced orientation and with respect to the orientation induced by γ and v2 (a), . . . , vm (a). Now suppose that M and N are two m-dimensional oriented smooth manifolds, now without boundary, and that f : M → N is a smooth function. If p is a regular point of f , we say that f is orientation preserving at p if Df (p) maps positively oriented bases of Tp M to positively oriented bases of Tf (p) N; otherwise f is ′ orientation reversing at p. This makes sense because if v1 , . . . , vm and v1′ , . . . , vm are two bases of Tp M, then the matrix of the linear transformation taking each vi to vi′ is the same as the matrix of the linear transformation taking each Df (p)vi to Df (p)vi′ . We can generalize this in a way that does not play a very large role in later developments, but does provide some additional illumination at little cost. Suppose that M is an oriented m-dimensional smooth ∂-manifold, N is an oriented n-dimensional boundaryless manifold, P is an oriented (n − m)-dimensional submanifold of N, and f : M → N is a smooth map that is transversal to P . We say that f is positively oriented relative to P at a point p ∈ f −1 (P ) if Df (p)v1 , . . . , Df (p)vm, w1 , . . . , wn−m is a positively oriented ordered basis of Tf (p) N whenever v1 , . . . , vm is a positively oriented ordered basis of Tp M and w1 , . . . , wn−m is a positively oriented ordered basis of Tf (p) P . It is easily checked that whether or not this is the case does not depend on the choice of positively oriented ordered bases v1 , . . . , vm and Tp M and w1 , . . . , wn−m . When this is not the case we say that f is negatively oriented relative to P at p. Now, in addition, suppose that f −1 (P ) is finite. The oriented intersection number I(f, P ) is the number of points in f −1 (P ) at which f is positively oriented 170 CHAPTER 12. DEGREE THEORY relative to P minus the number of points at which f is negatively oriented relative to P . An idea of critical importance for the entire project is that under natural and relevant conditions this number is a homotopy invariant. This corresponds to the special case of the following result in which M is the cartesian product of a boundaryless manifold and [0, 1]. Theorem 12.2.1. Suppose that M is an (m+1)-dimensional oriented smooth manifold, N is an n-dimensional smooth manifold, P is a compact (n − m)-dimensional smooth submanifold of N and f : M → N is a smooth function that is transverse to P with f −1 (P ) compact. Then I(f |∂M , P ) = 0. Proof. Proposition 10.8.5 implies that f −1 (P ) is a neat smooth ∂-submanifold of M. 
Since f −1 (P ) is compact, it has finitely many connected components, and Proposition 10.9.1 implies that each of these is either a loop or a line segment. Recalling the definition of neatness, we see that the elements of f −1 (P ) ∩ ∂M are the endpoints of the line segments. Fix one of the line segments. It suffices to show that f |∂M is positively oriented relative to P at one endpoint and negatively oriented relative to P at the other. The line segment is a smooth ∂-manifold, and by gluing together smooth parameterizations of open subsets, using a partition of unity, we can construct a smooth path γ : [a, b] → M that traverses it, with nonzero derivative everywhere. Let v1 (t) = Dγ(t)1 for all t. (Here 1 is thought of as an element of Tt [a, b] under the identification of this space with R.) Neatness implies that v1 (a) is inward pointing and v1 (b) is outward pointing. Let v2 , . . . , vm+1 be a basis of Tγ(a) ∂M. Proposition 12.1.3 implies that v1 extends to a moving frame v1 , . . . , vm+1 along γ with v2 (a) = v2 , . . . , vm+1 (a) = vm+1 . We have vj (b) = vj′ + αj v1 (b) (j = 2, . . . , m + 1) ′ of Tγ(b) ∂M and scalars α2 , . . . , αm+1 . We can replace for some basis v2′ , . . . , vm+1 v with the moving frame given by Corollary 12.1.5 applied to the ordered basis ′ v1 (b), v2′ , . . . , vm+1 of Tγ(b) M, so we may assume that v2 (b), . . . , vm+1 (b) ∈ Tγ(b) ∂M. Then v1 (a), . . . , vm+1 (a) is a positively oriented basis of Tγ(a) M if and only if v1 (b), . . . , vm+1 (b) is a positively oriented basis of Tγ(b) M. Since v1 (a) is inward pointing and v1 (b) is outward pointing, v2 (a), . . . , vm+1 (a) is a positively oriented basis of Tγ(a) ∂M if and only if v2 (b), . . . , vm+1 (b) is a negatively oriented basis of Tγ(b) M. Proposition 12.1.3 implies that there is a moving frame w1 , . . . , wn−m along f ◦γ : [a, b] → P . As we have defined orientation, w1 (a), . . . , wn−m (a) is a positively oriented basis of Tf (γ(a)) P if and only if w1 (b), . . . , wn−m (b) is a positively oriented basis of Tf (γ(b)) P , and Df (p)v2 (a), . . . , Df (p)vm+1 (a), w1 (a), . . . , wn−m (a) is a positively oriented basis of Tf (γ(a)) N if and only if Df (p)v2 (b), . . . , Df (p)vm+1 (b), w1 (b), . . . , wn−m (b) 171 12.3. THE DEGREE is a positively oriented basis of Tf (γ(b)) N. Combining all this, we conclude that f |∂M is positively oriented relative to P at γ(a) if and only if it is and negatively oriented relative to P at γ(b), which is the desired result. 12.3 The Degree Let M and N be m-dimensional smooth manifolds. We can restrict a smooth f : M → N to a subset of the domain and consider the degree of the restricted function over some point in the range. The axioms characterizing the degree express relations between the degrees of the various restrictions. In order to get a “clean” theory we need to consider subdomains that are compact, and which have no preimages in their topological boundaries. (Intuitively, a preimage that is in the boundary of a compact subset is neither clearly inside the domain nor unambiguously outside it.) For a compact C ⊂ M let ∂C = C \ (M \ C) be the topological boundary of C. Definition 12.3.1. A continuous function f : C → N with compact domain C ⊂ M is degree admissible over q ∈ N if f −1 (q) ∩ ∂C = ∅. If, in addition, f is smooth and q is a regular value of f , then f is smoothly degree admissible over q. Let D(M, N) be the set of pairs (f, q) in which f : C → N is a continuous function with compact domain C ⊂ M that is degree admissible over q ∈ N. 
Let D ∞ (M, N) be the set of (f, q) ∈ D(M, N) such that f is smoothly degree admissible over q. Definition 12.3.2. If C ⊂ M is compact, a homotopy h : C × [0, 1] → N is degree admissible over q if , for each t, ht is degree admissible over q. We say that h is smoothly degree admissible over q if, in addition, h is smooth and h0 and h1 are smoothly degree admissible over q. Proposition 12.3.3. There is a unique function deg∞ : D ∞ (M, N) → Z, taking (f, q) to deg∞ q (f ), such that: ∞ −1 (∆1) deg∞ (q) is a singleton {p} q (f ) = 1 for all (f, q) ∈ D (M, N) such that f and f is orientation preserving at p. Pr ∞ ∞ (∆2) deg∞ q (f ) = i=1 degq (f |Ci ) whenever (f, q) ∈ D (M, N), the domain of f is C, and C1 , . . . , Cr are pairwise disjoint compact subsets of C such that f −1 (q) ⊂ C1 ∪ . . . ∪ Cr \ (∂C1 ∪ . . . ∪ ∂Cr ). ∞ (∆3) deg∞ q (h0 ) = deg q (h1 ) whenever C ⊂ M is compact and the homotopy h : C × [0, 1] → N is smoothly degree admissible over q. −1 Concretely, deg∞ (q) at which f is orientation preservq (f ) is the number of p ∈ f −1 ing minus the number of p ∈ f (q) at which f is orientation reversing. 172 CHAPTER 12. DEGREE THEORY Proof. For (f, q) ∈ D(M, N) the inverse function theorem implies that each p ∈ f −1 (q) has a neighborhood that contains no other element of f −1 (q), and since U is −1 compact it follows that f −1 (q) is finite. Let deg∞ (q) q (f ) be the number of p ∈ f −1 at which f is orientation preserving minus the number of p ∈ f (q) at which f is orientation reversing. Clearly deg∞ satisfies (∆1) and (∆2). Suppose that h : C × [0, 1] → N is smoothly degree admissible over q. Let V be a neighborhood of q such that for all q′ ∈ V : (a) h−1 (q ′ ) ⊂ U × [0, 1]; (b) q ′ is a regular value of h0 and h1 ; ∞ ∞ ∞ (c) deg∞ q ′ (h0 ) = deg q (h0 ) and deg q ′ (h1 ) = deg q (h1 ). Sard’s theorem implies that some q ′ ∈ V is a regular value of h. In view of (a) we can apply Theorem 12.2.1, concluding that the degree of h|∂(U ×[0,1]) = h|U ×{0,1} over q ′ is zero. Since the orientation of M × {0} induced by M × [0, 1] is the opposite ∞ of the induced orientation of M × {1}, this implies that deg∞ q ′ (h0 ) − degq ′ (h1 ) = 0, ∞ from which it follows that deg∞ q (h0 ) = degq (h1 ). We have verified (∆3). It remains to demonstrate uniqueness. In view of (∆2), this reduces to showing uniqueness for (f, q) ∈ D ∞ (M, N) such that f −1 (q) = {p} is a singleton. If f is orientation preserving at p, this is a consequence of (∆1), so we assume that f is orientation reversing at p. The constructions in the remainder of the proof are easy to understand, but tedious to elaborate in detail, so we only explain the main ideas. Using the path connectedness of each orientation (Proposition 12.1.1) and an obvious homotopy between an f that has p as a regular point and its linear approximation, with respect to some coordinate systems for the domain and range, one can show that (∆3) implies that deg∞ q (f ) does not depend on the particular orientation reversing f . Using one of the bump functions constructed after Lemma 10.2.2, one can easily construct a smooth homotopy j : M × [0, 1] → M such that j0 = IdM , each jt is a smooth diffeomorphism, and j1 (p) is any point in some neighborhood of p. Applying (∆3) to h = f ◦ j, we find that deg∞ q (f ) does not depend on which point (within some neighborhood of p) is mapped to q. 
The final construction is a homotopy between the given f and a function f ′ that has three preimages of q near p, with f ′ being orientation reversing at two of them and orientation preserving at the third. In view of the other conclusions we have reached, (∆3) implies that ∞ deg∞ q (f ) = 2 degq (f ) + 1. In preparation for the next result we show that deg∞ is continuous in a rather strong sense. Proposition 12.3.4. If C ⊂ M is compact, f : C → N is continuous, and q ∈ N \ f (∂C), then are neighborhoods Z ⊂ C(C, N) of f and V ⊂ N \ f (∂C) of q such that ′ ′′ deg∞ q ′ (f ) = deg q ′′ (f ) 173 12.3. THE DEGREE whenever f ′ , f ′′ ∈ Z ∩ C ∞ (C, N), q ′ , q ′′ ∈ V , q ′ is a regular value of f ′ , and q ′′ is a regular value of f ′′ . Proof. Let V be an open disk in N that contains q with V ⊂ N \ f (∂C). Then Z ′ = { f ′ ∈ C(C, N) : f (∂C) ⊂ N \ V } is an open subset of C(C, N), and Theorem 10.7.7 gives an open Z ⊂ Z ′ containing f such that for any f ′ , f ′′ ∈ Z ∩C ∞ (C, N) there is a smooth homotopy h : C ×[0, 1] → N with h0 = f ′ , h1 = f ′′ , and ht ∈ Z ′ for all t, which implies that h is a degree ∞ ′′ ′′′ ′ admissible homotopy, so (∆3) implies that deg∞ q ′′′ (f ) = deg q ′′′ (f ) whenever q ∈ V is a regular point of both f ′ and f ′′ . Since Sard’s theorem implies that such a q ′′′ exists, it now suffices to show that ∞ ′ ′ ∞ ′ ′′ ′ deg∞ q ′ (f ) = degq ′′ (f ) whenever f ∈ Z ∩C (C, N) and q , q ∈ V are regular values of f ′ . Let j : N × [0, 1] → N be a smooth function with the following properties: (a) j0 = IdN ; (b) each jt is a smooth diffeomorphism; (c) j(y, t) = y for all y ∈ N \ V and all t; (d) j1 (q ′ ) = q ′′ . (Construction of such a j, using the techniques of Section 10.2, is left as an exercise.) Clearly jt (q ′ ) is a regular value of jt ◦ f for all t, so the concrete characterization of ′ deg∞ implies that deg∞ jt (q ′ ) (jt ◦ f ) is locally constant as a function of t. Since the ∞ ′ ′ unit interval is connected, it follows that deg∞ q ′ (f ) = deg q ′′ (j1 ◦ f ). On the other hand jt ◦ f ′ ∈ Z ′ for all t, so the homotopy (y, t) 7→ j(f ′(y), t) is smoothly degree ∞ ′ ′ admissible over q ′′ , and (∆3) implies that deg∞ q ′′ (j1 ◦ f ) = deg q ′′ (f ). The theory of the degree is completed by extending the degree to continuous functions, dropping the regularity condition. Theorem 12.3.5. There is a unique function deg : D(M, N) → Z, taking (f, q) to degq (f ), such that: (D1) degq (f ) = 1 for all (f, q) ∈ D(M, N) such that f is smooth, f −1 (q) is a singleton {p}, and f is orientation preserving at p. P (D2) degq (f ) = ri=1 degq (f |Ci ) whenever (f, q) ∈ D(M, N), the domain of f is C, and C1 , . . . , Cr are pairwise disjoint compact subsets of U such that f −1 (q) ⊂ C1 ∪ . . . ∪ Cr \ (∂C1 ∪ . . . ∪ ∂Cr ). (D3) If (f, q) ∈ D(M, N) and C is the domain of f , there is a neighborhood A ⊂ C(C, N) × N of (f, q) such that degq′ (f ′ ) = degq (f ) for all (f ′ , q ′ ) ∈ A. 174 CHAPTER 12. DEGREE THEORY Proof. We claim that if deg : D(M, N) → Z satisfies (D1)-(D3), then its restriction to D ∞ (M, N) satisfies (∆1)-(∆3). For (∆1) and (∆2) this is automatic. Suppose that C ⊂ M is compact and h : U × [0, 1] → N is a smoothly degree admissible homotopy over q. Such a homotopy may be regarded as a continuous function from [0, 1] to C(U , N). Therefore (D3) implies that degq (ht ) is a locally constant function of t, and since [0, 1] is connected, it must be constant. Thus (∆3) holds. 
Proposition 11.5.1 implies that for any (f, q) ∈ D(M, N) the set of smooth f ′ : M → N that have q as a regular value is dense at f . In conjunction with Proposition 12.3.4, this implies that the only possibility consistent with (D3) is to ′ ′ ′ ∞ ′ ′ set degq (f ) = deg∞ q ′ (f ) for (f , q ) ∈ D (M, N) with f and q close to f and q. This establishes uniqueness, and Proposition 12.3.4 also implies that the definition is unambiguous. It is easy to see that (D1) and (D2) follow from (∆1) and (∆2), and (D3) is automatic. Since (D2) implies that the degree of f over q is the sum of the degrees of the restrictions of f to the various connected components of the domain of f , it makes sense to study the degree of the restriction of f to a single component. For this reason, when studying the degree one almost always assumes that M is connected. (In applications of the degree this may fail to be the case, of course.) The image of a connected set under a continuous mapping is connected, so if M is connected and f : M → N is continuous, its image is contained in one of the connected components of N. Therefore it also makes sense to assume that N is connected. So, assume that N is connected, and that f : M → N is continuous. We have (M, f, q) ∈ D(M, N) for all q ∈ N, and (D3) asserts that degq (f ) is continuous as a function of q. Since Z has the discrete topology, this means that it is a locally constant function, and since N is connected, it is in fact constant. That is, degq (f ) does not depend on q; when this is the case we will simply write deg(f ), and we speak of the degree of f without any mention of a point in N. 12.4 Composition and Cartesian Product In Chapter 5 we emphasized restriction to a subdomain, composition, and cartesian products, as the basic set theoretic methods for constructing new functions from ones that are given. The bahevior of the degree under restriction to a subdomain is already expressed by (D3), and in this section we study the behavior of the degree under composition and products. In both cases the result is given by multiplication, reflecting basic properties of the determinant. Proposition 12.4.1. If M, N, and P are oriented m-dimensional smooth manifolds, C ⊂ M and D ⊂ N are compact, f : C → N and g : D → P are continuous, g is degree admissible over r ∈ P , and g −1 (r) is contained in one of the connected components of N \ f (∂C), then for any q ∈ g −1(r) we have degr (g ◦ f ) = degq (f ) · degr (g). Proof. Since C ∞ (C, N) and C ∞ (D, P ) are dense in C(C, N) and C(D, P ) (Theorem 10.7.6) and composition is a continuous operation (Proposition 5.3.6) the continuity 175 12.4. COMPOSITION AND CARTESIAN PRODUCT property (D3) of the degree implies that is suffices to prove the claim when f and g are smooth. Sard’s theorem implies that there are points r arbitrarily near r that are regular values of both g and g ◦ f , and Proposition 12.3.4 implies that the relevant degrees are unaffected if r is replaced by such a point, so we may assume that r has these regularity properties. For q ∈ g −1 (r) let sg (q) be 1 or −1 according to whether g is orientation preserving or orientation reversing at q. For p ∈ (g ◦ f )−1 (q) define sf (p) and sg◦f (p) similarly. In view of the chain rule and the definition of orientation preservation and reversal, sg◦f (p) = sg (f (p))sf (p). Therefore X X X deg(g ◦ f ) = sg (f (p))sf (p) = sg (q) sf (p) p∈(g◦f )−1 (r) = q∈g −1 (r) X p∈g −1 (q) sg (q) degq (f ). 
q∈g −1 (r) Since g −1 (r) is contained in a single connected component of N \f (∂C), P Proposition −1 12.3.4 implies that degq (f ) is the same for all q ∈ g (r), and q∈g−1 (r) sg (q) = degr (g). The hypotheses of the last result are rather stringent, which makes it rather artificial. For topologists the following special case is the main point of interest. Corollary 12.4.2. If M, N, and P are compact oriented m-dimensional smooth manifolds, N is connected, and f : M → N and g : N → P are continuous, then deg(g ◦ f ) = deg(f ) · deg(g). For cartesian products the situation is much simpler. Proposition 12.4.3. Suppose that M and N are oriented m-dimensional smooth manifolds, M ′ and N ′ are oriented m′ -dimensional smooth manifolds, C ⊂ M and C ′ ⊂ M are compact, and f : C → N and f ′ : C ′ → N ′ are index admissible over q and q ′ respectively. Then deg(q,q′ ) (f × f ′ ) = degq (f ) · degq′ (f ′ ). Proof. For reasons explained in other proofs above, we may assume that f and f ′ are smooth and that q and q ′ are regular values of f and f ′ . For p ∈ f −1 (r) let sf (p) be 1 or −1 according to whether f is orientation preserving or orientation reversing at p, and define sf ′ (p′ ) for p′ ∈ f ′ −1 (q ′ ) similarly. Since the determinant of a block diagonal matrix is the product of the determinants of the blocks, f × f ′ is orientation preserving or orientation reversing at (p, p′ ) according to whether sp (f )sp′ (f ′ ) is positive or negative, so X sp (f )sp′ (f ′ ) deg(q,q′ ) (f × f ′ ) = (p,p′ )∈(f ×f ′ )−1 (q,q ′ ) = X p∈f −1 (q) sp (f ) · X p′ ∈f ′ −1 (q ′ ) sp′ (f ′ ) = degq (f ) · degq′ (f ′ ). Chapter 13 The Fixed Point Index We now take up the theory of the fixed point index. For continuous functions defined on compact subsets of Euclidean spaces it is no more than a different rendering of the theory of the degree; this perspective is developed in Section 13.1. But we will see that it extends to a much higher level of generality, because the domain and the range of the function or correspondence have the same topology. Concretely, there is a property called Commutativity that relates the indices of the two compositions ĝ ◦ g and g ◦ ĝ where g : C → X̂ and ĝ : Ĉ → X are continuous, and other natural restrictions on this data (that will give rise to a quite cumbersome definition) are satisfied. This property requires that we extend our framework to allow comparison across spaces. Section 13.2 introduces the necessary abstractions and verifies that Commutativity is indeed satisfied in the smooth case. It turns out that this boils down to a fact of linear algebra that came as a surprise when this theory was developed. When we extended the degree from smooth to continuous functions, we showed that continuous functions could be approximated by smooth ones, and that this gave a definition of the degree for continuous functions that made sense and was uniquely characterized by certain axioms. In somewhat the same way Commutativity will be used, in Section 13.4, to extend from Euclidean spaces to separable ANR’s, as per the ideas developed in Section 7.6. The argument is lengthy, technically dense, and in several ways the culmination of our work to this point. The Continuity axiom is then used in Section 13.5 to extend the index to contractible valued correspondences. The underlying idea is the one used to extend from smooth to continuous functions: approximate and show that the resulting definition is consistent and satisfies all properties. 
Again, there are many verifications, and the argument is rather dense. Multiplication is an additional property of the index that describes its behavior in connection with cartesian products. For continuous functions on subsets of Euclidean spaces it is a direct consequence of Proposition 12.4.3. At higher levels of generality it is, in principle, a consequence of the axioms, because those axioms characterize the index uniquely, but an argument deriving Multiplication from the other axioms is not known. Therefore we carry Multiplication along as an additional property that is extended from one level of generality to the next along with everything else. 13.1 Axioms for an Index on a Single Space The axiom system for the fixed point index is introduced in two stages. This section presents the first group of axioms, which describe the properties of the index that concern a single space. Fix a metric space X. For a compact C ⊂ X let ∂C be its topological boundary and let int C = C \ ∂C be its interior. Definition 13.1.1. An index admissible correspondence for X is an upper semicontinuous correspondence F : C → X, where C ⊂ X is compact, that has no fixed points in ∂C. There will be various indices, according to which sorts of correspondences are considered. The next definition expresses the common properties of their domains. Definition 13.1.2. An index base for X is a set I of index admissible correspondences F : C → X such that: (a) f ∈ I whenever C ⊂ X is compact and f : C → X is an index admissible continuous function; (b) F|D ∈ I whenever F : C → X is an element of I, D ⊂ C is compact, and F|D is index admissible. For each integer m ≥ 0 an index base for Rm is given by letting I m be the set of index admissible continuous functions f : C → Rm. We can now state the first batch of axioms. Definition 13.1.3. Let I be an index base for X. An index for I is a function ΛX : I → Z satisfying: (I1) (Normalization) If c : C → X is a constant function whose value is an element of int C, then ΛX(c) = 1. (In the literature this condition is sometimes described as “Weak Normalization,” in contrast with a stronger condition defined in terms of homology.) (I2) (Additivity) If F : C → X is an element of I, C1, . . . , Cr are pairwise disjoint compact subsets of C, and FP(F) ⊂ int C1 ∪ . . . ∪ int Cr, then ΛX(F) = Σi ΛX(F|Ci). (I3) (Continuity) For each element F : C → X of I there is a neighborhood A ⊂ U(C, X) of F such that ΛX(F̂) = ΛX(F) for all F̂ ∈ A ∩ I. A proper appreciation of Continuity depends on the following immediate consequence of Theorem 5.2.1. Proposition 13.1.4. If C ⊂ X is compact, then { F ∈ U(C, X) : F is index admissible } is an open subset of U(C, X). Proposition 13.1.5. For each m = 1, 2, . . . there is a unique index ΛRm for I m given by ΛRm(f) = deg0(IdC − f). Proof. Observe that if C ⊂ Rm is compact, then f : C → Rm is index admissible if and only if IdC − f is degree admissible over the origin. Now (I1)-(I3) follow directly from (D1)-(D3). To prove uniqueness suppose that Λ̃m is an index for I m. For (g, q) ∈ D(Rm, Rm) let dq(g) = Λ̃m(IdC − (g − q)), where C is the domain of g. It is straightforward to show that d satisfies (D1)-(D3), so it must be the degree, and consequently Λ̃m = ΛRm. As we explain now, invariance under homotopy is subsumed by Continuity.
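A concrete illustration of Proposition 13.1.5 may be useful before continuing. For a smooth index admissible f whose fixed points p are regular in the sense that 1 is not an eigenvalue of Df(p), the concrete description of the smooth degree in Section 12.3 gives ΛRm(f) as the sum over the fixed points of the signs of |I − Df(p)|. The following sketch (Python with NumPy; it is not part of the text) computes two such indices.

```python
import numpy as np

# Fixed point index on C = [-2, 2] of f(x) = x**3, computed as
# Lambda(f) = deg_0(Id - f): the sum over fixed points p of sign(1 - f'(p)).
# No fixed point lies on the boundary {-2, 2}, so f is index admissible.
f  = lambda x: x**3
df = lambda x: 3 * x**2
fixed_points = np.array([-1.0, 0.0, 1.0])          # solutions of f(x) = x in (-2, 2)
assert all(abs(f(p) - p) < 1e-12 for p in fixed_points)
index = int(sum(np.sign(1.0 - df(p)) for p in fixed_points))
print(index)    # -1

# A planar example: f(x, y) = (y/2, x/2) has the single fixed point (0, 0),
# and its index is the sign of det(I - Df(0, 0)).
Df = np.array([[0.0, 0.5],
               [0.5, 0.0]])
print(int(np.sign(np.linalg.det(np.eye(2) - Df))))   # +1
```

In the one dimensional example the index is −1 even though there are three fixed points; this sort of cancellation is exactly what Additivity keeps track of.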
However, homotopies will still be important in our work, so we have the following definition and result. Definition 13.1.6. For a compact C ⊂ X a homotopy h : C × [0, 1] → X is index admissible if each ht is index admissible. Proposition 13.1.7. If I is an index base for X and ΛX is an index for I, then ΛX (h0 ) = ΛX (h1 ) whenever h : [0, 1] → C(C, X) is an index admissible homotopy. Proof. Continuity implies that ΛX (ht ) is a locally constant function of t, and [0, 1] is connected. We will refer to this result as the homotopy!principle. 13.2 Multiple Spaces We now introduced two properties of the index that involve comparison across different spaces. When we define an abstract notion of an index satisfying these conditions, we need to require that the set of spaces is closed under the operations that are involved in these conditions, so we require that the sets of spaces and correspondences are closed under cartesian products. Definition 13.2.1. An index scope S consists of a class of metric spaces SS and an index base IS (X) for each X ∈ SS such that (a) SS contains X × X ′ whenever X, X ′ ∈ SS ; (b) F × F ′ ∈ IS (X × X ′ ) whenever X, X ′ ∈ SS , F ∈ IS (X), and F ′ ∈ IS (X ′ ). 179 13.2. MULTIPLE SPACES Our first index scope S 0 has the collection of spaces SS 0 = {R0 , R1 , R2 , . . .} with IS 0 (Rm ) = I m for each m. Of course (b) is satisfied by identifying Rm × Rn with Rm+n . To understand the motivation for the following definition, first suppose that X, X̂ ∈ SS , and that g : X → X̂ and g : X̂ → X are continuous. In this circumstance it will be the case that ĝ ◦ g and g ◦ ĝ have the same index. We would like to develop this idea in greater generality, for functions g : C → X̂ and ĝ : Ĉ → X, but for our purposes it is too restrictive to require that g(C) ⊂ Ĉ and ĝ(Ĉ) ⊂ C. In this way we arrive at the following definition. Definition 13.2.2. A commutativity configuration is a tuple (X, D, E, g, X̂, D̂, Ê, ĝ) where X and X̂ are metric spaces and: (a) E ⊂ D ⊂ X, Ê ⊂ D̂ ⊂ X̂, and D, D̂, E, and Ê are compact; (b) g ∈ C(D, X̂) and ĝ ∈ C(D̂, X) with g(E) ⊂ int D̂ and ĝ(Ê) ⊂ int D; (c) ĝ ◦ g|E and g ◦ ĝ|Ê are index admissible; (d) g(F P(ĝ ◦ g|E )) = F P(g ◦ ĝ|Ê ). Before going forward, we should think through the details of what (d) means. If x is a fixed point of ĝ ◦ g|E , then g(x) is certainly a fixed point of ĝ ◦ g, so it is a fixed point of g ◦ ĝ|Ê if and only if g(x) ∈ Ê. Thus the inclusion g(F P(ĝ ◦ g|E )) ⊂ F P(g ◦ ĝ|Ê ) holds if and only if g(F P(ĝ ◦ g|E )) ⊂ Ê. (∗) On the other hand, if x̂ is a fixed point of g ◦ ĝ|Ê , then it is in the image of g and ĝ(x̂) is a fixed point of ĝ ◦ g that is mapped to x̂ by g, so it is contained in g(F P(ĝ ◦ g|E )) if and only if ĝ(x̂) ∈ E. Therefore the inclusion F P(g ◦ ĝ|Ê ) ⊂ g(F P(ĝ ◦ g|E )) holds if and only if ĝ(F P(g ◦ ĝ|Ê )) ⊂ E. (∗∗) Thus (d) holds if and only if both (∗) and (∗∗) hold, and by symmetry this is the case if and only if ĝ(F P(g ◦ ĝ|Ê )) = F P(ĝ ◦ g|E )). Definition 13.2.3. An index for an index scope S is a specification of an index ΛX for each X ∈ SS such that: (I4) (Commutativity) If (X, D, E, g, X̂, D̂, Ê, ĝ) is a commutativity configuration with X, X̂ ∈ SS , (E, ĝ ◦ g|E ) ∈ IS (X), and (Ê, g ◦ ĝ|Ê ) ∈ IS (X̂), then ΛX (ĝ ◦ g|E ) = ΛX̂ (g ◦ ĝ|Ê ). 180 CHAPTER 13. THE FIXED POINT INDEX The index is said to be multiplicative if: (M) (Multiplication) If X, X ′ ∈ SS , F ∈ IS (X), and F ′ ∈ IS (X ′ ), then ΛX×X ′ (F × F ′ ) = ΛX (F ) · Λ′X (F ′ ). 
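To see what Commutativity says in the simplest possible setting, take X = R and X̂ = R2 and let g and ĝ be linear, so that each composition has 0 as its only fixed point. The sketch below (Python with NumPy; it is not part of the text) computes the two indices via the sign-of-determinant formula of the previous sections; they agree because |I − LK| = |I − KL|, which is exactly the linear algebra fact recorded in Proposition 13.3.2 below.

```python
import numpy as np

# Commutativity for linear g : R -> R^2 and ghat : R^2 -> R.  Both compositions
# have the single fixed point 0, and the indices are the signs of
# det(I - D(ghat o g)(0)) = det(I - LK) and det(I - D(g o ghat)(0)) = det(I - KL).
K = np.array([[2.0],
              [1.0]])          # matrix of g    (R -> R^2)
L = np.array([[1.0, 1.0]])     # matrix of ghat (R^2 -> R)

index_on_R  = int(np.sign(np.linalg.det(np.eye(1) - L @ K)))   # index of ghat o g on R
index_on_R2 = int(np.sign(np.linalg.det(np.eye(2) - K @ L)))   # index of g o ghat on R^2
print(index_on_R, index_on_R2)          # -1 -1

# The identity behind this, det(I_m - LK) = det(I_n - KL), holds for arbitrary
# rectangular K (n x m) and L (m x n); a random spot check:
rng = np.random.default_rng(2)
Kr, Lr = rng.standard_normal((5, 3)), rng.standard_normal((3, 5))
print(np.isclose(np.linalg.det(np.eye(3) - Lr @ Kr),
                 np.linalg.det(np.eye(5) - Kr @ Lr)))           # True
```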
We can now state the result that has been the main objective of all our work. Let SS Ctr be the class of separable absolute neighborhood retracts, and for each X ∈ SS Ctr let IS Ctr(X) be the union over compact C ⊂ X of the sets of index admissible upper semicontinuous contractible valued correspondences F : C → X. Since cartesian products of contractible valued correspondences are contractible valued, we have defined an index scope S Ctr. Theorem 13.2.4. There is a unique index ΛCtr for S Ctr, which is multiplicative. 13.3 The Index for Euclidean Spaces The method of proof of Theorem 13.2.4 is to first establish an index in a quite restricted setting, then show that it has unique extensions, first to an intermediate index scope, and then to S Ctr. Our goal in the remainder of this section is to prove: Theorem 13.3.1. There is a unique index Λ0 for the index scope S 0 given by setting Λ0Rm = ΛRm for each m, and Λ0 is multiplicative. Insofar as continuous functions can be approximated by smooth ones with regular fixed points, and we can use Additivity to focus on a single fixed point, the verification of Commutativity will boil down to the following fact of linear algebra, which is not at all obvious, and was not known prior to the discovery of its relevance to the theory of fixed points. Proposition 13.3.2 (Jacobson (1953) pp. 103–106). Suppose K : V → W and L : W → V are linear transformations, where V and W are vector spaces of dimensions m and n respectively over an arbitrary field. Suppose m ≤ n. Then the characteristic polynomials κKL and κLK of KL and LK are related by the equation κKL(λ) = λn−m κLK(λ). In particular, κLK(1) = |IdV − LK| = |IdW − KL| = κKL(1). Proof. We can decompose V and W as direct sums V = V1 ⊕ V2 ⊕ V3 ⊕ V4 and W = W1 ⊕ W2 ⊕ W3 ⊕ W4 where V1 = ker K ∩ im L, V1 ⊕ V2 = im L, V1 ⊕ V3 = ker K, and similarly for W. With suitably chosen bases the matrices of K and L have the forms

    [ 0  K12  0  K14 ]         [ 0  L12  0  L14 ]
    [ 0  K22  0  K24 ]   and   [ 0  L22  0  L24 ]
    [ 0   0   0   0  ]         [ 0   0   0   0  ]
    [ 0   0   0   0  ]         [ 0   0   0   0  ]

Computing the product of these matrices, we find that κKL(λ) is the determinant of

    [ λI   −K12L22        0    −K12L24 ]
    [ 0    λI − K22L22    0    −K22L24 ]
    [ 0    0              λI   0       ]
    [ 0    0              0    λI      ]

Using elementary facts about determinants, this reduces to κKL(λ) = λn−k |λI − K22L22|, where k = dim V2 = dim W2. In effect this reduces the proof to the special case V2 = V and W2 = W, i.e. K and L are isomorphisms. But this case follows from the computation |λIdV − LK| = |L−1| · |λIdV − LK| · |L| = |L−1(λIdV − LK)L| = |λIdW − KL|. Lemma 13.3.4 states that if we fix the sets X, D, E, X̂, D̂, Ê, then the set of pairs (g, ĝ) giving a commutativity configuration is open. This is simple and unsurprising, but without the spadework we did in Chapter 5 the proof would be a tedious slog. We extract one piece of the argument in order to be able to refer to it later. Lemma 13.3.3. If X and X̂ are metric spaces, E ⊂ D ⊂ X, and Ê ⊂ D̂ ⊂ X̂, where D, D̂, E, and Ê are compact, then (g, ĝ) ↦ ĝ ◦ g|E is a continuous function from { g ∈ C(D, X̂) : g(E) ⊂ D̂ } × C(D̂, X) to C(E, X). Proof. Lemma 5.3.1 implies that the function g ↦ g|E is continuous, after which Proposition 5.3.6 implies that (g, ĝ) ↦ ĝ ◦ g|E is continuous. Lemma 13.3.4. If X and X̂ are metric spaces, E ⊂ D ⊂ X, Ê ⊂ D̂ ⊂ X̂, and D, D̂, E, and Ê are open with compact closure, then the set of (g, ĝ) ∈ C(D, X̂) × C(D̂, X) such that (X, D, E, g, X̂, D̂, Ê, ĝ) is a commutativity configuration is an open subset of C(D, X̂) × C(D̂, X). Proof.
Lemma 4.5.10 implies that the set of (g, ĝ) such that g(E) ⊂ int D̂ and ĝ(Ê) ⊂ int D is an open subset of C(D, X̂) × C(D̂, X). The lemma above implies that the functions (g, ĝ) 7→ ĝ ◦ g|E and (g, ĝ) 7→ g ◦ ĝ|Ê are continuous, and Proposition 13.1.4 implies that the set of (g, ĝ) satisfying part (c) of Definition 13.2.2 is open. In view of the discussion in the last section, a pair (g, ĝ) that satisfies (a)-(c) of Definition 13.2.2 will also satisfy (d) if and only if g(F P(ĝ ◦ g|E )) ⊂ int Ê and ĝ(F P(g ◦ ĝ|Ê )) ⊂ int E. Since (g, ĝ) 7→ ĝ ◦g|E and (g, ĝ) 7→ g ◦ ĝ|Ê are continuous, Theorem 5.2.1 and Lemma 4.5.10 imply that the set of such pairs is open. 182 CHAPTER 13. THE FIXED POINT INDEX Proof of Theorem 13.3.1. Uniqueness and (I1)-(I3) follow from Proposition 13.1.5, so, we only need to prove that (I4) and (M) are satisfied. Suppose that (Rm , D, E, g, Rm̂, D̂, Ê, ĝ) is a commutativity configuration. Lemma 13.3.4 states that it remains a commutativity configuration if g and ĝ are replaced by functions in any sufficiently small neighborhood, and Lemma 13.3.3 implies that ĝ ◦ g|E and g ◦ ĝ|Ê are continuous functions of (g, ĝ), so, since we already know that (I3) holds, it suffices to prove the equation of (I4) after such a replacement. Since the smooth functions are dense in C(D, Rm̂ ) and C(D̂, Rm ) (Proposition 10.2.7) we may assume that g and ĝ are smooth. In addition, Sard’s theorem implies that the regular values of IdE − ĝ ◦ g|E are dense, so after perturbing ĝ by adding an arbitrarily small constant, we can make it the case that 0 is a regular value. In the same way we can add a small constant to g to make 0 a regular value of IdÊ −g ◦ ĝ|Ê , and if the constant is small enough it will still be the case that 0 is a regular value of IdE − ĝ ◦ g|E . Let x1 , . . . , xr be the fixed points of ĝ ◦ g|E , and for each i let x̂i = g(xi ). Then x̂1 , . . . , x̂r are the fixed points of g◦ĝ|Ê . Let D1 , . . . , Dr be pairwise disjoint compact subsets of E with xi ∈ int Di , and let D̂1 , . . . , D̂r be pairwise disjoint open subsets of Ê with x̂i ∈ int D̂i . For each i let Ei be a compact subset of g −1(int D̂i ) with xi ∈ int Ei , and let Êi be a compact subset of ĝ −1(int Di ) with x̂i ∈ int Êi . It is easy to check that each (Rm , Di , Ei , gi , Rm̂ , D̂i , Êi , ĝi ) is a commutativity configuration. Recalling the relationship between the index and the degree, we have ΛRm (ĝ ◦ g|Ei ) = ΛRm̂ (g ◦ ĝ|Êi ) because Proposition 13.3.2 gives |I − D(ĝ ◦ g)(xi )| = |I − Dĝ(x̂i )Dg(xi )| = |I − Dg(xi )Dĝ(x̂i )| = |I − D(g ◦ ĝ)(x̂i )|. Applying Additivity to sum over i gives the equality asserted by (I4). ′ Turning to (M), suppose that C ⊂ Rm and C ′ ⊂ Rm are compact and f : C → ′ Rm and f ′ : C ′ → Rm are index admissible. Then Proposition 12.4.3 gives ΛRm+m′ (f × f ′ ) = deg(0,0) (IdC×C ′ − f × f ′ ) = deg(0,0) ((IdC − f ) × (IdC ′ − f ′ )) = deg0 (IdC − f ) · deg0 (IdC ′ − f ′ ) = ΛRm (f ) · ΛRm′ (f ′ ). 13.4 Extension by Commutativity The extension of the fixed point index to absolute neighborhood retracts was first achieved by Felix Browder in his Ph.D. thesis Browder (1948), using the extension method described in this section. This extension method is, perhaps, the most important application of Commutativity, but Commutativity is also sometimes useful in applications of the index, which should not be particularly surprising since the underlying fact of linear algebra it embodies (Proposition 13.3.2) is already nontrivial. 183 13.4. 
EXTENSION BY COMMUTATIVITY ˆ We Throughout this section we will work with two fixed index scopes S and S. say that S subsumes Ŝ if, for every X ∈ SŜ , we have X ∈ SS and IŜ (X) ⊂ IS (X). If this is the case, and Λ is an index for S, then the restriction (in the obvious sense) of Λ to Ŝ is an index for Ŝ. (It is easy to check that this is an automatic consequence of the definition of an index.) If Λ̂ is an index for Ŝ, then an extension to S is an index for S whose restriction to Ŝ is Λ̂. If f : C → X is in IS (X), a narrowing of focus for f is a pair (D, E) of compact subsets of int C such that F P(f ) ⊂ int E, E ∪ f (E) ⊂ int D, and D ∪ f (D) ⊂ int C. For such a pair let ε(D,E) be the minimum of: • d E ∪ f (E), X \ (int D) , • d D ∪ f (D), X \ (int C) ; • the supremum of the set of ε > 0 such that d(x′ , f (x′ )) > 2ε whenever x ∈ C \ (int E), x′ ∈ C, and d(x, x′ ) < ε, where d is the given metric for X. (Of course X has many metrics that give the same topology. In contexts such as this we will implicitly assume that one has been selected.) Since f is continuous and admissible, narrowings of focus for f exist: continuity implies the existence of an open neighborhood V of F P(f ) satisfying V ∪ f (V ) ⊂ int C. Repeating this observation gives an open neighborhood W of F P(f ) satisfying W ∪ f (W ) ⊂ V , and we can let D = V and E = W . ˆ ε)-domination of C is Let C be a compact subset of a metric space X. An (S, a quadruple (X̂, Ĉ, ϕ, ψ) in which X̂ ∈ SŜ , Ĉ is an compact subset of X̂, ϕ : C → Ĉ and ψ : Ĉ → X are continuous functions, and ψ ◦ ϕ is ε-homotopic to IdC . We say that Ŝ dominates S if, for each X ∈ SS , each compact C ⊂ X, and each ε > 0, there is an (Ŝ, ε)-domination of C. This section’s main result is: Theorem 13.4.1. If Ŝ dominates S and Λ̂ is an index for Ŝ, then there is an index Λ for S that is defined by setting ΛX (f ) = Λ̂X̂ (ϕ ◦ f ◦ ψ|ψ−1 (E) ) (†) whenever X ∈ SS , f : C → X is an element of IS (X), (D, E) is a narrowing of ˆ ε)-domination of C. If, in addifocus for f , ε < ε(D,E) , and (X̂, Ĉ, ϕ, ψ) is an (S, ˆ then Λ is the unique extension of Λ̂ to S. If Λ̂ is multiplicative, tion, S subsumes S, then so is Λ. Let SS ANR be the class of compact absolute neighborhood retracts, and for each X ∈ SS ANR let IS ANR (X) be the union over open C ⊂ X of the sets of index admissible functions in C(C, X). These definitions specify an index scope S ANR because SS ANR is closed under formation of finite cartesian products, and f × f ′ ∈ IS ANR (X × X̂) whenever X, X̂ ∈ SS ANR , f ∈ IS ANR (X), and f ′ ∈ IS ANR (X̂). 184 CHAPTER 13. THE FIXED POINT INDEX Theorem 13.4.2. There is a unique index ΛANR for S ANR that extends Λ0 , and ΛANR is multiplicative. Proof. Theorem 7.6.4 implies that S 0 dominates S ANR , and S ANR evidently subsumes S 0 . The rest of this section is devoted to the proof of Theorem 13.4.1. Before proceeding, the reader should be warned that this is, perhaps, the most difficult argument in this book. Certainly it is the most cumbersome, from the point of view of the burden of notation, because the set up used to extend the index is complex, and then several verifications are required in that setting. To make the expressions somewhat more compact, from this point forward we will frequently drop the symbol for composition, for instance writing ψϕ rather than ψ ◦ ϕ. Lemma 13.4.3. Suppose X ∈ SŜ , f : C → X is in IŜ (X), (D, E) is a narrowing of focus for f , 0 < ε < ε(D,E) , and (X̂, Ĉ, ϕ, ψ) is an (Ŝ, ε)-domination of U. Let D̂ = ψ −1 (D) and Ê = ψ −1 (E). 
Then (X, D, E, ϕf |D , X̂, D̂, Ê, ψ|D̂ ) is a commutativity configuration. Proof. We need to verify (a)-(d) of Definition 13.2.2. We have E ⊂ D ⊂ X with D and E compact, so Ê ⊂ D̂ ⊂ X̂. In addition D̂ and Ê are closed because ψ is continuous, so they are compact because they are subsets of Ĉ. Thus (a) holds. Of course ϕf |D and ψ|D̂ are continuous. We have ψ(ϕ(f (E))) ⊂ Uε (f (E)) ⊂ int D, so ϕ(f (E)) ⊂ ψ −1 (int D) ⊂ int D̂. In addition, ψ(Ê) ⊂ E ⊂ int D. Thus (b) holds. If x ∈ D \ int (E), then d(x, f (x)) > 2ε(D,E) > 2ε and d(f (x), ψ(ϕ(f (x)))) < ε, so x cannot be a fixed point of ψϕf . Thus F P(ψϕf |E ) ⊂ int E. If x̂ ∈ D̂ is a fixed point of ϕf ψ|D̂ , then ψ(x̂) is a fixed point of ψϕf |D , so F P(ϕf ψ|D̂ ) ⊂ ψ −1 (int E) ⊂ int Ê. Thus (c) holds. We now establish (∗) and (∗∗). We have ψ(ϕ(f (F P(ψϕf |D )))) = F P(ψϕf |D ) ⊂ int E, so ϕ(f (F P(ψϕf |D ))) ⊂ ψ −1 (int E) ⊂ Ê, and F P(ϕf ψ|Ê ) ⊂ int Ê, so ψ(F P(ϕf ψ|D̂ )) ⊂ ψ(int Ê) ⊂ E. Thus (d) holds. From this point forward we assume that there is a given index Λ̂ for Ŝ. In order for our proposed definition of Λ to be workable it needs to be the case that the definition of the derived index does not depend on the choice of narrowing or domination, and it turns out that proving this will be a substantial part of the overall effort. The argument is divided into a harder part proving a special case and a reduction of the general case to this. 185 13.4. EXTENSION BY COMMUTATIVITY Lemma 13.4.4. Let X be an element of SŜ , and let f : C → X be an element of IŜ (X). Suppose that (D, E) is a narrowing of focus for f , 0 < ε1 , ε2 < ε(D,E) , ˆ ε2 )and (X̂1 , Ĉ1 , ϕ1 , ψ1 ) and (X̂2 , Ĉ2 , ϕ2 , ψ2 ) are an (Ŝ, ε1 )-domination and an (S, domination of C. Set D̂1 = ψ1−1 (D), Ê1 = ψ1−1 (E), D̂2 = ψ2−1 (D), Ê2 = ψ2−1 (E). Then Λ̂X̂1 (ϕ1 f ψ1 |Ê1 ) = Λ̂X̂2 (ϕ2 f ψ2 |Ê2 ). Proof. The definition of domination gives an ε-homotopy h : C × [0, 1] → X with h0 = IdC and h1 = ψ1 ϕ1 and a ε-homotopy j : C × [0, 1] → X be an ε-homotopy with j0 = IdC and j1 = ψ2 ϕ2 . We will show that: (a) the homotopy t 7→ ϕ1 jt f ψ1 |Ê1 is well defined and index admissible; (b) the homotopy t 7→ ϕ2 f ht ψ2 |Ê2 is well defined and index admissible; (c) (X̂1 , D̂1 , Ê1 , ϕ2 f ψ1 , X̂2 , D̂2 , Ê2 , ϕ1 ψ2 ) is a commutativity configuration. The claim follows from the computation Λ̂X̂1 (ϕ1 f ψ1 |Ê1 ) = Λ̂X̂1 (ϕ1 ψ2 ϕ2 f ψ1 |Ê1 ) = Λ̂X̂2 (ϕ2 f ψ1 ϕ1 ψ2 |Ê2 ) = Λ̂X̂2 (ϕ2 f ψ2 |Ê2 ). Specifically, in view of (a) and (b) the first and third equalities follows from the homotopy principle, while (c) permits an application of Commutativity that gives the second equality. For each t the composition ϕ1 jt f ψ1 |Ê1 is well defined because ψ1 (Ê1 ) ⊂ E and jt (f (E)) ⊂ Uε (f (E)) ⊂ D ⊂ C. In order to show that this homotopy is index admissible, we assume (aiming at a contradiction) that for some 0 ≤ t ≤ 1, y1 ∈ ∂ Ê1 is a fixed point of ϕ1 jt f ψ1 . Then ψ1 (y1 ) is a fixed point of ψ1 ϕ1 jt f . The definition of Ê1 and the continuity of ψ1 imply that ψ1 (y1 ) ∈ ∂E, so that d f ψ1 (y1 ), ψ1 (y1 ) ≥ 2ε(D,E) > 2ε, but d ψ1 ϕ1 jt f ψ1 (y1 ), f ψ1 (y1 ) = d h1 jt f ψ1 (y1 ), f ψ1 (y1 ) ≤ d h1 jt f ψ1 (y1 ), jt f ψ1 (y1 ) + d jt f ψ1 (y1 ), f ψ1 (y1) < 2ε, so this is impossible. We have established (a) and (by symmetry) (b). To establish (c) we need to verify (a)-(d) of Definition 13.2.2. Evidently Ê1 ⊂ D̂1 ⊂ X̂1 and Ê2 ⊂ D̂2 ⊂ X̂2 with D̂1 , Ê1 , D̂2 , and Ê2 compact, so (a) holds. 
We have f(ψ1(D̂1)) ⊂ f(D) ⊂ C, so ψ1(D̂1) is contained in the domain of ϕ2, and ψ2(D̂2) ⊂ D ⊂ C, so ψ2(D̂2) is contained in the domain of ϕ1. Thus ϕ2fψ1 and ϕ1ψ2 are well defined, and of course they are continuous. In addition,
ψ2ϕ2fψ1(Ê1) ⊂ ψ2ϕ2f(E) ⊂ Uε(f(E)) ⊂ int D and ψ1ϕ1ψ2(Ê2) ⊂ Uε(ψ2(Ê2)) ⊂ Uε(E) ⊂ int D,
so ϕ2fψ1(Ê1) ⊂ ψ2−1(D) ⊂ int D̂2 and ϕ1(ψ2(Ê2)) ⊂ ψ1−1(int D) = int D̂1. Thus (b) holds.
Above we showed that ϕ1ψ2ϕ2fψ1|Ê1 = ϕ1j1fψ1|Ê1 and ϕ2fψ1ϕ1ψ2|Ê2 = ϕ2fh1ψ2|Ê2 are index admissible. That is, (c) holds.
Suppose that y1 ∈ F P(ϕ1ψ2ϕ2fψ1|Ê1) and y2 = ϕ2fψ1(y1). Then ψ2(y2) is a fixed point of ψ2ϕ2fψ1ϕ1. The definition of ε(D,E) implies that this is impossible if ψ2(y2) ∉ E, so y2 ∈ ψ2−1(E) = Ê2. Now suppose that y2 ∈ F P(ϕ2fψ1ϕ1ψ2|Ê2) and y1 = ϕ1ψ2(y2). Then ψ1(y1) is a fixed point of ψ1ϕ1ψ2ϕ2f, so ψ1(y1) ∈ E and y1 ∈ Ê1. We have shown that ϕ2fψ1(F P(ϕ1ψ2ϕ2fψ1|Ê1)) ⊂ Ê2 and ϕ1ψ2(F P(ϕ2fψ1ϕ1ψ2|Ê2)) ⊂ Ê1, which is to say that (∗) and (∗∗) hold, which implies (d), completing the proof.
The hypotheses of the next result are mostly somewhat more general, but we now need to assume that Ŝ dominates S.
Lemma 13.4.5. Assume that Ŝ dominates S. Let X be an element of SS, and let f : C → X be an element of IS(X). Suppose (D1, E1) and (D2, E2) are narrowings of focus for f, 0 < ε1 < ε(D1,E1), 0 < ε2 < ε(D2,E2), and (X̂1, Ĉ1, ϕ1, ψ1) and (X̂2, Ĉ2, ϕ2, ψ2) are an (Ŝ, ε1)-domination and an (Ŝ, ε2)-domination of C. Set D̂1 = ψ1−1(D1), Ê1 = ψ1−1(E1), D̂2 = ψ2−1(D2), Ê2 = ψ2−1(E2). Then Λ̂X̂1(ϕ1fψ1|Ê1) = Λ̂X̂2(ϕ2fψ2|Ê2).
Proof. It suffices to show this when D1 ⊂ D2 and E1 ⊂ E2, because then the general case follows from two applications in which first D1 and E1, and then D2 and E2, are replaced by D1 ∩ D2 and E1 ∩ E2. The assumption that Ŝ dominates S guarantees the existence of an (Ŝ, ε′2)-domination of C for arbitrarily small ε′2, and if we apply the lemma above to this domination and the given one we find that it suffices to prove the result with the given domination replaced by this one. This means that we may assume that ε2 is as small as need be, and in particular we may assume that ε2 < ε(D1,E1). Now Additivity implies that Λ̂X̂2(ϕ2fψ2|Ê2) = Λ̂X̂2(ϕ2fψ2|ψ2−1(E1)), which means that it suffices to establish the result with D2 and E2 replaced by D1 and E1, which is the case established in the lemma above.
Proof of Theorem 13.4.1. Since Ŝ dominates S, the objects used to define Λ exist, and the last result implies that the definition of Λ does not depend on the choice of (D, E), ε, and (X̂, Ĉ, ϕ, ψ). We now verify that Λ satisfies (I1)-(I4) and (M). For the proofs of (I1)-(I3) we fix a particular X ∈ SS and an f : C → X in IS(X), and we let (D, E), ε, and (X̂, Ĉ, ϕ, ψ) be as in the hypotheses.
Normalization: If f is a constant function, then so is ϕfψ, so Normalization for Λ̂ gives ΛX(f) = Λ̂X̂(ϕfψ) = 1.
Additivity: Suppose that F P(f) ⊂ int C1 ∪ . . . ∪ int Cr where C1, . . . , Cr ⊂ C are compact and pairwise disjoint. For each j = 1, . . . , r choose open sets Dj ⊂ D ∩ Cj and Ej ⊂ E ∩ Cj such that (Dj, Ej) is a narrowing of focus for (Cj, f|Cj). In view of Lemma 13.4.5 we may assume that ε < ε(Dj,Ej) for all j. It is easy to see that for each j, (X̂, Ĉ, ϕ|Cj, ψ) is an (Ŝ, ε)-domination of Cj. For each j let Ej′ = ψ−1(Ej). Additivity for Λ̂ gives
ΛX(f) = Λ̂X̂(ϕfψ|Ê) = Σj Λ̂X̂(ϕfψ|Ej′) = Σj ΛX(f|Cj).
Continuity: It is easy to see that if f′ : C → X is sufficiently close to f, then (D, E) is a narrowing of focus for (C, f′), and (X̂, Ĉ, ϕ, ψ) is an (Ŝ, ε)-domination of C. Since f′ ↦ ϕf′ψ is continuous (Propositions 5.5.2 and 5.5.3), Continuity for Λ̂ gives ΛX(f) = Λ̂X̂(ϕfψ) = Λ̂X̂(ϕf′ψ) = ΛX(f′) when f′ is sufficiently close to f.
Commutativity: Suppose that (X1, C1, D1, g1, X2, C2, D2, g2) is a commutativity configuration with X1, X2 ∈ SS. Replacing D1 and D2 with smaller open neighborhoods of F P(g2g1) and F P(g1g2) if need be, we may assume that D1 ∪ g2g1(D1) ⊂ C1 and D2 ∪ g1g2(D2) ⊂ C2. Choose open sets E1 and E2 with
F P(g2g1) ⊂ E1, E1 ∪ g2g1(E1) ⊂ D1, F P(g1g2) ⊂ E2, E2 ∪ g1g2(E2) ⊂ D2.
For any positive ε1 < ε(D1,E1) and ε2 < ε(D2,E2), Lemma 13.4.5 implies that there is an (Ŝ, ε1)-domination (X̂1, Ĉ1, ϕ1, ψ1) of C1 and an (Ŝ, ε2)-domination (X̂2, Ĉ2, ϕ2, ψ2) of C2. Let D̂1 = ψ1−1(D1), Ê1 = ψ1−1(E1), D̂2 = ψ2−1(D2), Ê2 = ψ2−1(E2). Let h : C1 × [0, 1] → X1 be an ε1-homotopy with h0 = IdC1 and h1 = ψ1ϕ1, and let j : C2 × [0, 1] → X2 be an ε2-homotopy with j0 = IdC2 and j1 = ψ2ϕ2. The desired result will follow from the calculation
ΛX1(g2g1) = Λ̂X̂1(ϕ1g2g1ψ1|Ê1) = Λ̂X̂1(ϕ1g2ψ2ϕ2g1ψ1|Ê1) = Λ̂X̂2(ϕ2g1ψ1ϕ1g2ψ2|Ê2) = Λ̂X̂2(ϕ2g1g2ψ2|Ê2) = ΛX2(g1g2).
Here the first and fifth equality are from the definition of Λ, the second and fourth are implied by Continuity for Λ̂, and the third is from Commutativity for Λ̂. In order for this to work it must be the case that all the compositions in this calculation are well defined, in the sense that the image of the first function is contained in the domain of the second function, the homotopies t ↦ ϕ1g2jtg1ψ1|Ê1 and t ↦ ϕ2g1htg2ψ2|Ê2 are index admissible, and (X̂1, D̂1, Ê1, ϕ2g1ψ1|D̂1, X̂2, D̂2, Ê2, ϕ1g2ψ2|D̂2) is a commutativity configuration. Clearly this will be the case when ε1 and ε2 are sufficiently small.
Multiplication: We now consider X1, X2 ∈ SS, f1 : C1 → X1 in IS(X1), and f2 : C2 → X2 in IS(X2). For each i = 1, 2 let (Di, Ei) be a narrowing of focus for (Ci, fi), and let (X̂i, Ĉi, ϕi, ψi) be an (Ŝ, εi)-domination of Ci, where εi < ε(Di,Ei). The definition of an index scope requires that X1 × X2 ∈ SS and (C1 × C2, f1 × f2) ∈ IS(X1 × X2). Clearly (D1 × D2, E1 × E2) is a narrowing of focus for (C1 × C2, f1 × f2). If d1 and d2 are given metrics for X1 and X2 respectively, endow X1 × X2 with the metric ((x1, x2), (y1, y2)) ↦ max{d1(x1, y1), d2(x2, y2)}. Let ε = max{ε1, ε2}. Then (X̂1 × X̂2, Ĉ1 × Ĉ2, ϕ1 × ϕ2, ψ1 × ψ2) is an (Ŝ, ε)-domination of C1 × C2. It is also easy to check that ε(D1×D2,E1×E2) ≥ max{ε(D1,E1), ε(D2,E2)}, so ε < ε(D1×D2,E1×E2). Therefore Lemma 13.4.3 implies the validity of the first equality in
ΛX1×X2(f1 × f2) = Λ̂X̂1×X̂2((ϕ1f1ψ1 × ϕ2f2ψ2)|Ê1×Ê2) = Λ̂X̂1(ϕ1f1ψ1|Ê1) · Λ̂X̂2(ϕ2f2ψ2|Ê2) = ΛX1(f1) · ΛX2(f2),
while the second equality is an application of Multiplication for Λ̂ and the third is the definition of Λ.
We now prove that if S subsumes Ŝ, then Λ is the unique extension of Λ̂ to S. Consider X ∈ SŜ and (C, f) ∈ IŜ(X). For any ε > 0, (X, C, IdC, IdC) is an (Ŝ, ε)-domination of C.
For any narrowing of focus (D, E) equation (†) gives an (S, ΛX (f ) = Λ̂X (f |E ) and Additivity for Λ̂ gives Λ̂X (f |E ) = Λ̂X (f ). Thus Λ extends Λ̂. Two indices for S that restrict to Λ̂ necessarily agree everywhere because, by Continuity and Commutativity, (†) holds in the circumstances described in the statement of Theorem 13.4.1. 13.5. EXTENSION BY CONTINUITY 13.5 189 Extension by Continuity This section extends the index from continuous functions to upper semicontinuous contractible valued correspondences. As in the last section, we describe the extension process abstractly, thereby emphasizing the aspects of the situation that drive the argument. Definition 13.5.1. If I and Î are index bases for a compact metric space X, we say that Î approximates I if: (E1) If C, D ⊂ X are open with D ⊂ C, then Î ∩ C(D, X) is dense in { F |D : F ∈ I ∩ U(C, X) and F |D is index admissable }. (E2) If C, D ⊂ X are open with D ⊂ C, F ∈ I ∩ U(C, X), and A ⊂ C × X is a neighborhood of Gr(F ), then there is a neighborhood B ⊂ D × X of Gr(F |D ) such that any two functions f, f ′ ∈ C(D, X) with Gr(f ), Gr(f ′ ) ⊂ B are the endpoints of a homotopy h : [0, 1] → C(D, X) with Gr(ht ) ⊂ A for all t. It would be simpler if, in (E1) and (E2), we could have V = U, but unfortunately Theorem 9.1.1 is not strong enough to justify working with such a definition. Definition 13.5.2. If S and Ŝ are two index scopes with SŜ = SS , then Sˆ approximates S) if, for each X ∈ SŜ , IŜ (X) approximates IS (X), and (E3) If (X, C, D, g, X ′, C ′ , D ′ , g ′) is a commutativity configuration such that X, X ′ ∈ SS , g ′ ◦ g ∈ IS (X), and g ◦ g ′ ∈ IS (X ′ ), and S ⊂ C(C, X ′) and S ′ ⊂ C(C ′ , X) are neighborhoods of g and g ′, then there exist γ ∈ S and γ ′ ∈ S ′ such that γ ′ ◦ γ|D ∈ IŜ (X) and γ ◦ γ ′ |D′ ∈ IŜ (X ′ ). Theorem 13.5.3. Suppose that Sˆ approximates S, and Λ̂ is an index for Ŝ. For each X ∈ SŜ let ΛX be the extension of Λ̂X to IS (X) given by the last result. Then the system Λ of maps ΛX is an index for S. If, in addition, S subsumes Ŝ, then Λ is the unique extension of Λ̂ to S. Evidently S Ctr subsumes S ANR . The constant functions in S ANR and S Ctr are the same, of course, and Theorem 9.1.1 implies that (E1) and (E2) are satisfied when S = S Ctr and Ŝ = S ANR . Therefore Theorem 13.2.4 follows from Theorem 13.4.2 and the last result. The remainder of this section is devoted to the proof of Theorem 13.5.3. The overall structure of our work here is similar to what we saw in the last section. We are given an index Λ̂ for an index base Î, and we wish to use this to define an index for another index base I. In this case Continuity is the axiom that does the heavy lifting. Assumption (E1) states that every element of the second base can be approximated by an element of the first base. Therefore we can define the index of an element of the second base to be the index of sufficiently fine approximations by elements of the first base, provided these all agree, and assumption (E2), in conjunction with Continuity, implies that this is the case. Having defined the index on the second base, we must verify that it satisfies the axioms. This phase is broken down into two parts. The following result verifies that the axioms pertaining to a single index base hold. 190 CHAPTER 13. THE FIXED POINT INDEX Proposition 13.5.4. Suppose I and Î are index bases for a compact metric space X, and Î approximates I. 
Then for any index Λ̂X : Î → Z there is a unique index ΛX : I → Z such that for each open C ⊂ X with compact closure, each F ∈ I ∩U(C, X), and each open D with F P(F ) ⊂ D and D ⊂ C, there is a neighborhood E ⊂ U(D, X) of F |D such that ΛX (F ) = Λ̂X (f ) for all f ∈ E ∩ C(D, X) ∩ Î. Proof. Fix C, F ∈ I ∩ U(C, X), and D as in the hypotheses. Then F |D is index admissable, hence an element of I because I is an index base. Applying (E2), let B ⊂ D × X be a neighborhood of Gr(F |D ) such that for any f, f ′ ∈ Î ∩ C(D, X) with Gr(f ), Gr(f ′ ) ⊂ B there is a homotopy h : [0, 1] → C(D, X) with h0 = f , h1 = f ′ , and Gr(ht ) ⊂ (C × X) \ { (x, x) : x ∈ C \ D } for all t. Since F has no fixed points in C \ D, the right hand side is a neighborhood of Gr(FD ). We define ΛX by setting ΛX (F ) := Λ̂X (f ) for any such f . We first have to show that this definition makes sense. First, (E1) implies that { F ′ ∈ U(D, X) : Gr(F ′ ) ⊂ B } ∩ C(D, X) ∩ Î = 6 ∅, and Continuity implies that this definition does not depend on the choice of f . Since A and B can be replaced by smaller open sets, it does not depend on the choice of A and B. We must also show that it does not depend on the choice of D. So, let D̃ be another open set with D̃ ⊂ C and F P(F ) ∩ (C \ D̃) = ∅. Then F P(F ) ⊂ D ∩ D̃. The desired result follows if we can show that it holds when D and D̃ are replaced by D and D ∩ D̃ and also when D and D̃ are replaced by D ∩ D̃ and D̃. Therefore we may assume that D̃ ⊂ D. Let B̃ ⊂ D̃ × X be a neighborhood of Gr(F |D̃ ) such that for any f, f ′ ∈ Î ∩ C(D̃, X) with Gr(f ), Gr(f ′ ) ⊂ B̃ there is a homotopy h : [0, 1] → C(D̃, X) with h0 = f , h1 = f ′ , and Gr(ht ) ⊂ (C × X) \ { (x, x) : x ∈ C \ D̃ } for all t. Since restriction to a compact subdomain is a continuous operation (Lemma 5.3.1) we may replace B with a smaller neighborhood of F |D to obtain Gr(f |D̃ ) ⊂ B ′ whenever Gr(f ) ⊂ B. For such an f Additivity gives Λ̂X (f ) = Λ̂X (f |D̃ ) as desired. It remains to show that (I1)-(I3) are satisfied. Normalization: If c is a constant function, we can take c itself as the approximation used to define ΛX (c), so Normalization for ΛX follows from Normalization for Λ̂X . Additivity: Consider F ∈ I with domain C. Let C1 , . . . , Cr be disjoint open subsets of C whose union contains F P(F ). Let D1 , . . . , Dr be open subsets of C with D1 ⊂ 191 13.5. EXTENSION BY CONTINUITY C1 , . . . , Dr ⊂ Cr and F P(F ) ⊂ D1 , . . . , Dr . For each i = 1, . . . , r let Bi be a neighborhood of Gr(F |Di ) such that ΛX (F |Ci ) = Λ̂X (fi ) whenever fi ∈ Î ∩C(Di , X) with Gr(fi ) ⊂ Bi . Let D := D1 ∪ . . . ∪ Dr , and let B be a neighborhood of F |D such that ΛX (F |C ) = Λ̂X (f ) whenever f ∈ Î ∩ C(D, X) with Gr(f ) ⊂ B. Since restriction to a compact subdomain is a continuous operation (Lemma 5.3.1) B may be chosen so that, for all i, Gr(f |Di ) ⊂ Bi whenever Gr(f ) ⊂ B. For any f ∈ Î ∩ C(D, X) with Gr(f ) ⊂ B we now have X X ΛX (F ) = Λ̂X (f |D ) = Λ̂X (f |Di ) = ΛX (F |Ci ). i i Continuity: Suppose that C ⊂ X is open with compact closure, D ⊂ C is open with D ⊂ C, F ∈ I with F P(F ) ⊂ D, and B is a neighborhood of Gr(F |D ) with ΛX (F ) = Λ̂X (f ) for all f ∈ Î ∩ C(D, X) with Gr(f ) ⊂ B. Then the set of F ′ ∈ I ∩ U(C, X) such that F ′ |D ∈ B and F P(F ′ ) ⊂ D is a neighborhood of F , and for every such F ′ we have ΛX (F ′ ) = ΛX (F ). 
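The following one-dimensional sketch illustrates the mechanism just verified: every sufficiently fine continuous approximation of an index admissible correspondence receives the same index, which is then assigned to the correspondence itself. The Kakutani-style correspondence, the interval, and the elementary one-dimensional degree formula used below are illustrative choices and not part of the formal development.

```python
# Sketch: all fine continuous approximations of the correspondence
#   F(x) = {1} for x < 1/2,  [0,1] at x = 1/2,  {0} for x > 1/2
# have the same index over a neighborhood of its fixed point 1/2,
# so that common value can serve as the index of F.  Assumes numpy.
import numpy as np

def f_delta(x, delta):
    """Continuous piecewise-linear approximation of F, interpolating
    from 1 to 0 across the interval [1/2 - delta, 1/2 + delta]."""
    return float(np.clip((0.5 + delta - x) / (2 * delta), 0.0, 1.0))

def index_over_interval(f, a, b):
    """1-d degree of Id - f over (a, b): half the difference of the
    signs of x - f(x) at the endpoints (no fixed point at a or b)."""
    return int(np.sign(b - f(b)) - np.sign(a - f(a))) // 2

for delta in [0.2, 0.05, 0.01, 0.001]:
    print(delta, index_over_interval(lambda x: f_delta(x, delta), 0.25, 0.75))
# Every approximation yields index 1, independently of delta.
```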
The remainder of the argument shows that the extension procedure described above results in an index satisfying (I4) and (M) when it is used to define extensions for all spaces in an index scope. Proof of Theorem 13.5.3. We begin by noting that when S subsumes Ŝ, Continuity for Λ̂ implies both that Λ is an extension of Λ̂ and that any extension must satisfy the condition used to define Λ, so Λ is the unique extension. In view of the last result, it is only necessary to verify that Λ satisfies (I4) and (M). The argument is based on the description of Λ given in the first paragraph of the proof of the last result. Commutativity: Suppose that (X, C, D, g, X ′, C ′ , D ′ , g) is a commutativity configuration with g ′ ◦ g ∈ IS (X) and g ◦ g ′ ∈ IS (X ′ ). Lemma 13.3.4 implies that there are neighborhoods S ⊂ C(C, X ′) and S ′ ⊂ C(C ′ , X) such that (X, C, D, γ, X ′, C ′ , D ′ , γ ′ ) is a commutativity configuration for all γ ∈ S and γ ′ ∈ S ′ . Let B ⊂ C(D, X) and B ′ ⊂ C(D ′ , X ′) be neighborhoods of g ′ ◦ g|D and g ◦ g ′|D′ , respectively, such that Λ(g ′ ◦ g|D ) = Λ̂(f ) and Λ(g ◦ g ′|D′′ ) = Λ̂(f ′ ) whenever f ∈ B ∩ IŜ (X) and f ′ ∈ C ∩ IŜ (X ′ ). The continuity of restriction and composition (Lemma 5.3.1 and Proposition 5.3.6) implies the existence of neighborhoods T ⊂ C(C, X ′) of g and T ′ ⊂ C(D ′ , X) of g ′ such that γ ′ ◦ γ|D ∈ B and γ ◦ γ ′ |D′ ∈ B ′ whenever γ ∈ T and γ ′ ∈ T ′ . Applying (E3), we may choose γ ∈ S ∩ T and γ ′ ∈ S ′ ∩ T ′ such that γ ′ ◦ γ|D ∈ IŜ (X) and γ ◦ γ ′ |D′ ∈ IŜ (X ′ ). Now Commutativity for Λ̂ gives ΛX (g ′ ◦ g|D ) = Λ̂X (γ ′ ◦ γ|D ) = Λ̂X ′ (γ ◦ γ ′ |D′ ) = ΛX ′ (g ◦ g ′ |D′ ). Multiplication: For spaces X, X ′ ∈ SS and open C ⊂ X and C ′ ⊂ X ′ with compact closure consider F ∈ IS (X) ∩ U(C, X) and F ′ ∈ IS (X ′ ) ∩ U(C ′ , X ′ ). 192 CHAPTER 13. THE FIXED POINT INDEX Then the definition of an index scope implies that F ×F ′ ∈ IS (X×X ′ ). Choose open sets D and D ′ with F P(F ) ⊂ D, D ⊂ C, F P(F ′ ) ⊂ D ′ , and D ′ ⊂ C ′ . As above, we can find neighborhoods B ⊂ U(D, X), B ′ ⊂ U(D ′ , X ′ ), and D ⊂ U(D×D ′ , X ×X ′ ), of F |D , F ′ |D′ , and (F × F ′ )|D×D′ respectively, such that ΛX (F ) = Λ̂X (f ) for all f ∈ B ∩ IŜ (X), ΛX ′ (F ′ ) = Λ̂X ′ (f ′ ) for all f ′ ∈ B ′ ∩ IŜ (X ′ ), and ΛX×X ′ (F × F ′ ) = Λ̂X×X ′ (j) for all j ∈ D ∩ IŜ (X × X ′ ). Since the formation of cartesian products of correspondences is a continuous operation (this is Lemma 5.3.4) we may replace B and B ′ with smaller neighborhoods to obtain F̃ × F̃ ′ ∈ D for all F̃ ∈ B and F̃ ′ ∈ B ′ . Assumption (E1) implies that there are f ∈ B ∩ IŜ (X) ∩ C(D, X) and f ′ ∈ B ′ ∩ IŜ (X ′ ) ∩ C(D ′ , X ′ ). The definition of an index scope implies that f ×f ′ ∈ IŜ (X ×X ′), and Multiplication (I4) for Λ̂ now gives ΛX×X ′ (F × F ′ ) = Λ̂X×X ′ (f × f ′ ) = Λ̂X (f ) · Λ̂X ′ (f ′ ) = ΛX (F ) · ΛX ′ (F ′ ). Part III Applications and Extensions 193 Chapter 14 Topological Consequences This chapter is a relaxing and refreshing change of pace. Instead of working very hard to slowly build up a toolbox of techniques and specific facts, we are going to harvest the fruits of our earlier efforts, using the axiomatic description of the fixed point index, and other major results, to quickly derive a number of quite famous results. In Section 14.1 we define the Euler characteristic, relate it to the Lefschetz fixed point theorem, and then describe the Eilenberg-Montgomery as a special case. 
For two general compact manifolds, the degree of a map from one to the other is a rather crude invariant, in comparison with many others that topologists have defined. Nevertheless, when the range is the m-dimensional sphere, the degree is already a “complete” invariant in the sense that it classifies functions up to homotopy: if M is a compact m-dimensional manifold that is connected, and f and f ′ are functions from M to the m-sphere of the same degree, then f and f ′ are homotopic. This famous theorem, due to Hopf, is the subject of Section 14.2. Section 12.4 proves a simple result asserting that the degree of a composition of two functions is the products of their degrees. Section 14.3 presents several other results concerning fixed points and antipodal maps of a map from a sphere to itself. Some of these are immediate consequences of index theory and the Hopf theorem, but the Borsuk-Ulam theorem requires a substantial proof, so it should be thought of as a significant independent fact of topology. It has many consequences, including the fact that spheres of different dimensions are not homeomorphic. In Section 14.4 we state and prove the theorem known as invariance of domain. It asserts that if U ⊂ Rm is open, and f : U → Rm is continuous and injective, then the image of f is open, and the inverse is continuous. One may think of this as a purely topological version of the inverse function theorem, but from the technical point of view it is much deeper. If a connected set of fixed points has a nonzero index, it is essential. This raises the question of whether a connected set of fixed points of index zero is necessarily inessential. Section 14.5 presents two results of this sort. 194 14.1. EULER, LEFSCHETZ, AND EILENBERG-MONTGOMERY 14.1 195 Euler, Lefschetz, and Eilenberg-Montgomery The definition of the Euler characteristic, and Euler’s use of it in the analyses of various problems, is often described as the historical starting point of topology as a branch of mathematics. In popular expositions the Euler characteristic of a 2dimensional manifold M is usually defined by the formula χ(M) := V −E +F where V , E, and F are the numbers of vertices, edges, and 2-simplices in a triangulation of M. Our definition is: Definition 14.1.1. The Euler characteristic χ(X) of a compact ANR X is ΛX (IdX ). Here is a sketch of a proof that our definition of χ(M) agrees with Euler’s when M is a triangulated compact 2-manifold. We deform the identity function slightly, achieving a function f : M → M defined as follows. Each vertex of the triangulation is mapped to itself by f . Each barycenter of an edge is mapped to itself, and the points on the edge between the barycenter and either of the vertices of the edge are moved toward the barycenter. Each barycenter of a two dimensional simplex is mapped to itself. If x is a point on the boundary of the 2-simplex, the line segment between x and the barycenter is mapped to the line segment between f (x) and the barycenter, with points on the interior of the line segment pushed toward the barycenter, relative to the affine mapping. It is easy to see that the only fixed points of f are the vertices and the barycenters of the edges and 2-simplices. Euler’s formula follows once we show that the index of a vertex is +1, the index of the barycenter of an edge is −1, and the index of the barycenter of a 2-simplex is +1. 
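As a purely illustrative aside (not part of the argument), the combinatorial formula can be tabulated directly from a face list; both triangulations of the 2-sphere below, the boundaries of the tetrahedron and of the octahedron, give V − E + F = 2, the Euler characteristic of S².

```python
# Sketch: compute V - E + F for two standard triangulations of S^2.
from itertools import combinations

def euler_characteristic(faces):
    vertices = {v for f in faces for v in f}
    edges = {frozenset(e) for f in faces for e in combinations(f, 2)}
    return len(vertices) - len(edges) + len(faces)

tetrahedron = list(combinations(range(4), 3))              # 4 triangles
octahedron = [(0, 2, 4), (0, 4, 3), (0, 3, 5), (0, 5, 2),  # poles 0 and 1,
              (1, 2, 4), (1, 4, 3), (1, 3, 5), (1, 5, 2)]  # equator 2-4-3-5

print(euler_characteristic(tetrahedron))  # 2
print(euler_characteristic(octahedron))   # 2
```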
We will not give a detailed argument to this effect; very roughly it corresponds to the intuition that f is “expansive” at each vertex, “compressive” at the barycenter of each 2-simplex, and expansive in one direction and compressive in another at the barycenter of an edge. Although Euler could not have expressed the idea in modern language, he certainly understood that the Euler characteristic is important because it is a topological invariant. Theorem 14.1.2. If X and X ′ are homeomorphic compact ANR’s, then χ(X) = χ(X ′ ). Proof. For any homeomorphism h : X → X ′ , Commutativity implies that χ(X) = ΛX (IdX ) = ΛX (IdX ◦ h−1 ◦ h) = ΛX ′ (h ◦ IdX ◦ h−1 ) = ΛX ′ (IdX ′ ) = χ(X ′ ). The analytic method implicit in Euler’s definition—pass from a topological space (e.g., a compact surface) to a discrete object (in this case a triangulation) that can be analyzed combinatorically and quantitatively—has of course been extremely fruitful. But as a method of proving that the Euler characteristic is a topological invariant, it fails in a spectacular manner. There is first of all the question of 196 CHAPTER 14. TOPOLOGICAL CONSEQUENCES whether a triangulation exists. That a two dimensional compact manifold is triangulable was not proved until the 1920’s, by Rado. In the 1950’s Bing and Moise proved that compact three dimensional manifolds are triangulable, and a stream of research during this same general period showed that smooth manifolds are triangulable, but in general a compact manifold need not have a triangulation. For simplicial complexes topological invariance would follow from invariance under subdivision, which can be proved combinatorically, and the Hauptvermutung, which was the conjecture that any two simplicial complexes that are homeomorphic have subdivisions that are combinatorically isomorphic. This conjecture was formulated by Steinitz and Tietze in 1908, but in 1961 Milnor presented a counterexample, and in the late 1960’s it was shown to be false even for triangulable manifolds. The Lefschetz fixed point theorem is a generalization Brouwer’s theorem that was developed by Lefschetz for compact manifolds in Lefschetz (1923, 1926) and extended by him to manifolds with boundary in Lefschetz (1927). Using quite different methods, Hopf extended the result to simplicial complexes in Hopf (1928). Definition 14.1.3. If X is a compact ANR and F : X → X is an upper semicontinuous contractible valued correspondence, the Lefschetz number of F is ΛX (F ). Theorem 14.1.4. If X is a compact ANR, F : X → X is an upper semicontinuous contractible valued correspondence and ΛX (F ) 6= 0, then F P(F ) 6= ∅. Proof. When F P(F ) = ∅ two applications of Additivity give Λ(F |∅ ) = Λ(F ) = Λ(F |∅ ) + Λ(F |∅). In Lefschetz’ originally formulation the Lefschetz number of a function was defined using algebraic topology. Thus one may view the Lefschetz fixed point theorem as a combination of the result above and a formula expressing the Lefschetz number in terms of homology. In the Kakutani fixed point theorem, the hypothesis that the correspondence is convex valued cries out for generalization, because convexity is not a topological concept that is preserved by homeomorphisms of the space. The Eilenberg-Montgomery theorem asserts that if X is a compact acyclic ANR, and F : X → X is an upper semicontinuous acyclic valued correspondence, then F has a fixed point. 
Unfortunately it would take many pages to define acyclicity, so we will simply say that acyclicity is a property that is invariant under homeomorphism, and is weaker than contractibility. The known examples of spaces that are acyclic but not contractible are not objects one would expect to encounter "in nature," so it seems farfetched that the additional strength of the Eilenberg-Montgomery theorem, beyond that of the result below, will ever figure in economic analysis.
Theorem 14.1.5. If X is a nonempty compact absolute retract and F : X → X is an upper semicontinuous contractible valued correspondence, then F has a fixed point.
Proof. Recall (Proposition 7.5.3) that an absolute retract is an ANR that is contractible. Theorem 9.1.1 implies that F can be approximated in the sense of Continuity by a continuous function, so ΛX(F) = ΛX(f) for some continuous f : X → X. Let c : X × [0, 1] → X be a contraction. Then (x, t) ↦ c(f(x), t) (or (x, t) ↦ f(c(x, t))) is a homotopy between f and a constant function, so the homotopy principle and Normalization imply that ΛX(f) = 1. Now the claim follows from the last result.
14.2 The Hopf Theorem
Two functions that are homotopic may differ in their quantitative features, but from the perspective of topology these differences are uninteresting. Two functions that are not homotopic differ in some qualitative way that one may hope to characterize in terms of discrete objects. A homotopy invariant may be thought of as a function whose domain is the set of homotopy classes; equivalently, it may be thought of as a mapping from a space of functions that is constant on each homotopy class. A fundamental method of topology is to define and study homotopy invariants. The degree is an example: for compact manifolds M and N of the same dimension it assigns an integer to each continuous f : M → N, and if f and f′ are homotopic, then they have the same degree. There are a great many other homotopy invariants, whose systematic study is far beyond our scope.
In the study of such invariants, one is naturally interested in settings in which some invariant (or collection of invariants) gives a complete classification, in the sense that if two functions are not homotopic, then the invariant assigns different values to them. The prototypical result of this sort, due to Hopf, asserts that the degree is a complete invariant when N is the m-sphere.
Theorem 14.2.1 (Hopf). If M is an m-dimensional compact connected smooth manifold, then two maps f, f′ : M → S m are homotopic if and only if deg(f) = deg(f′).
We provide a rather informal sketch of the proof. Since the ideas in the argument are geometric, and easily visualized, this should be completely convincing, and little would be gained by adding more formal details of particular constructions. We already know that two homotopic functions have the same degree, so our goal is to show that two functions of the same degree are homotopic.
Consider a particular f : M → S m. The results of Section 10.7 imply that CS(M, S m) is locally path connected, and that C ∞(M, S m) is dense in this space, so f is homotopic to a smooth function. Suppose that f is smooth, and that q is a regular value of f. (The existence of such a q follows from Sard's theorem.) The inverse function theorem implies that if D is a sufficiently small disk in S m centered at q, then f −1(D) is a collection of pairwise disjoint disks, each containing one element of f −1(q). Let q− be the antipode of q in S m.
(This is −q when S m is the unit sphere centered at the origin in Rm+1.) Let j : S m × [0, 1] → S m be a homotopy with j0 = IdS m that stretches D until it covers S m, so that j1 maps the boundary of D and everything outside D to q−. Then f = j0 ◦ f is homotopic to j1 ◦ f.
We have shown that the f we started with is homotopic to a function with the following description: there are finitely many pairwise disjoint disks in M, everything outside the interiors of these disks is mapped to q−, and each disk is mapped bijectively (except that all points in the boundary are mapped to q−) to S m. We shall leave the peculiarities of the case m = 1 to the reader: when m ≥ 2, it is visually obvious that homotopies can be used to move these disks around freely, so that two maps satisfying this description are homotopic if they have the same number of disks mapped onto S m in an orientation preserving manner and the same number of disks in which the mapping is orientation reversing.
The final step in the argument is to show that a disk in which the orientation is positive and a disk in which the orientation is negative can be "cancelled," so that the map is homotopic to a map satisfying the description above, but with one fewer disk of each type. Repeating this cancellation, we eventually arrive at a map in which the mapping is either orientation preserving in all disks or orientation reversing in all disks. Thus any map is homotopic to a map of this form, and any two such maps with the same number of disks of the same orientation are homotopic. Since the number of disks is the absolute value of the degree, and the maps are orientation preserving or orientation reversing according to whether the degree is positive or negative, we conclude that maps of the same degree are homotopic.
For the cancellation step it is best to adopt a concrete model of the domain and range. We will think of S m as the unit disk D m = { x ∈ Rm : ‖x‖ ≤ 1 } with the boundary points identified with a single point, which will continue to be denoted by q−. We will think of Rm as representing an open subset of M containing two disks that are mapped with opposite orientation. Let e1 = (1, 0, . . . , 0) ∈ Rm. After sliding the disks around, expanding or contracting them, and revising the maps on their interiors, we can achieve the following specific f : Rm → S m:
f(x) = x − e1,                                        if ‖x − e1‖ < 1,
f(x) = x − (−e1) − 2(⟨x, e1⟩ − ⟨−e1, e1⟩)e1,           if ‖x − (−e1)‖ < 1,
f(x) = q−,                                            otherwise.
Visually, f maps the unit disk centered at e1 to S m preserving orientation, it maps the unit disk centered at −e1 reversing orientation, and everything else goes to q−. We now have the following homotopy:
ht(x) = x − (1 − 2t)e1,                               if ‖x − (1 − 2t)e1‖ < 1 and x1 ≥ 0,
ht(x) = x − (1 − 2t)e1 − 2⟨x, e1⟩e1,                   if ‖x − (1 − 2t)(−e1)‖ < 1 and x1 ≤ 0,
ht(x) = q−,                                           otherwise.
Of course the first two expressions agree when x1 = 0, so this is well defined and continuous, and h1(x) = q− for all x.
In preparation for an application of the Hopf theorem, we introduce an important concept from topology. If X is a topological space and A ⊂ X, the pair (X, A) has the homotopy extension property if, for any topological space Y and any function g : (X × {0}) ∪ (A × [0, 1]) → Y, there is a homotopy h : X × [0, 1] → Y such that h is an extension of g: h(x, 0) = g(x, 0) for all x ∈ X and h(x, t) = g(x, t) for all (x, t) ∈ A × [0, 1].
Lemma 14.2.2.
The pair (X, A) has the homotopy extension property if and only if (X × {0}) ∪ (A × [0, 1]) is a retract of X × [0, 1].
Proof. If (X, A) has the homotopy extension property, then the inclusion map from (X × {0}) ∪ (A × [0, 1]) to X × [0, 1] has a continuous extension to all of X × [0, 1], which is to say that there is a retraction. On the other hand, if r is a retraction, then for any g there is a continuous extension h = g ◦ r.
We will only be concerned with the example given by the next result, but it is worth noting that this concept takes on greater power when one realizes that (X, A) has the homotopy extension property whenever X is a simplicial complex and A is a subcomplex. It is easy to prove this if there is only one simplex σ in X that is not in A; either the boundary of σ is contained in A, in which case there is an argument like the proof of the following, or it isn't, and another very simple construction works. The general case follows from induction because if (X, A) and (A, B) have the homotopy extension property, then so does (X, B). To show this suppose that g : (X × {0}) ∪ (B × [0, 1]) → Y is given. There is a continuous extension h : A × [0, 1] → Y of the restriction of g to (A × {0}) ∪ (B × [0, 1]). The extension of h to all of (X × {0}) ∪ (A × [0, 1]) defined by setting h|X×{0} = g|X×{0} is continuous because it is continuous on X × {0} and A × [0, 1], both of which are closed subsets of X × [0, 1] (here the requirement that A is closed finally shows up) and since (X, A) has the homotopy extension property this h can be further extended to all of X × [0, 1].
Lemma 14.2.3. The pair (D m, S m−1) has the homotopy extension property.
Proof. There is an obvious retraction r : D m × [0, 1] → (D m × {0}) ∪ (S m−1 × [0, 1]) defined by projecting radially from (0, 2) ∈ Rm × R.
We now relate the degree of a map from D m to Rm with what may be thought of as the "winding number" of the restriction of the map to S m−1.
Theorem 14.2.4. If f : D m → Rm is continuous, 0 ∉ f(S m−1), and f̃ : S m−1 → S m−1 is the function x ↦ f(x)/‖f(x)‖, then deg0(f) = deg(f̃).
Proof. For k ∈ Z let fk : D m → Rm be the map (r cos θ, r sin θ, x3, . . . , xm) ↦ (r cos kθ, r sin kθ, x3, . . . , xm). It is easy to see that deg0(fk) = k = deg(fk|S m−1). Now let k = deg(f̃). The Hopf theorem implies that there is a homotopy h̃ : S m−1 × [0, 1] → S m−1 with h̃0 = f̃ and h̃1 = fk|S m−1. Let h : S m−1 × [0, 1] → Rm be the homotopy with h0 = f|S m−1 and h1 = fk|S m−1 given by
h(x, t) = ((1 − t)‖f(x)‖ + t) h̃(x, t),
and extend this to g : (D m × {0}) ∪ (S m−1 × [0, 1]) → Rm by setting g(x, 0) = f(x). The last result implies that g extends to a homotopy j : D m × [0, 1] → Rm. There is an additional homotopy ℓ : D m × [0, 1] → Rm with ℓ0 = j1 and ℓ1 = fk given by setting ℓ(x, t) = (1 − t)j1(x) + tfk(x). Note that ℓt|S m−1 = fk|S m−1 for all t. The invariance of degree under degree admissible homotopy now implies that
deg(f̃) = k = deg0(fk) = deg0(j1) = deg0(j0) = deg0(f).
14.3 More on Maps Between Spheres
Insofar as spheres are the simplest "nontrivial" (where, in effect, this means noncontractible) topological spaces, it is entirely natural that mathematicians would quickly investigate the application of degree and index theory to these spaces, and to maps between them. There are many results coming out of this research, some of which are quite famous.
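Before proceeding, here is a small numerical illustration of Theorem 14.2.4 in the plane (m = 2), where the degree of the normalized boundary map is the classical winding number. The test map (a polynomial in the complex coordinate with known degree 3 over 0) and the sample size are arbitrary choices made for the sketch.

```python
# Sketch: deg_0(f) for f : D^2 -> R^2 with 0 not in f(S^1) computed as the
# winding number of the boundary map around 0, by summing angle increments.
import numpy as np

def winding_number(f, samples=2000):
    """Winding number of t -> f(e^{2*pi*i*t}) around the origin."""
    t = np.linspace(0.0, 1.0, samples, endpoint=False)
    values = f(np.exp(2j * np.pi * t))
    angles = np.angle(values)
    increments = np.diff(np.concatenate([angles, angles[:1]]))
    increments = (increments + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]
    return int(round(increments.sum() / (2 * np.pi)))

f = lambda z: z**3 + 0.2 * z - 0.1     # no zeros on |z| = 1, three zeros inside
print(winding_number(f))               # 3, the degree of f over 0
```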
Our discussion combines some purely topological reasoning with analysis based on concrete examples, and for the latter it is best to agree that S m := { x ∈ Rm+1 : ‖x‖ = 1 }. Some of our arguments involve induction on m, and for this purpose we will regard S m−1 as a subset of S m by setting S m−1 = { x ∈ S m : xm+1 = 0 }.
Let am : S m → S m be the function am(x) = −x. Two points x, y ∈ S m are said to be antipodal if y = am(x). Regarded topologically, am is a fixed point free local diffeomorphism whose composition with itself is IdS m, and one should expect that all the topological results below involving am and antipodal points should depend only on these properties, but we will not try to demonstrate this (the subject is huge, and our coverage is cursory) instead treating am as an entirely concrete object.
Let Em = { (x, y) ∈ S m × S m : y ≠ am(x) }. There is a continuous function rm : Em × [0, 1] → S m given by
rm(x, y, t) := (tx + (1 − t)y)/‖tx + (1 − t)y‖.
Proposition 14.3.1. Suppose f, f′ : S m → S n are continuous. If they do not map any point to a pair of antipodal points—that is, f′(p) ≠ an(f(p)) for all p ∈ S m—then f and f′ are homotopic.
Proof. Specifically, there is the homotopy h(x, t) = rn(f(x), f′(x), t).
Consider a continuous function f : S m → S n. If m < n, then f is homotopic to a constant map, and thus rather uninteresting. To see this, first note that the smooth functions are dense in C(S m, S n), and a sufficiently nearby function does not map any point to the antipode of its image under f, so f is homotopic to a smooth function. So, suppose that f is smooth. By Sard's theorem, the regular values of f are dense, and since n > m, a regular value is a y ∈ S n with f −1(y) = ∅. We now have the homotopy h(x, t) = rn(f(x), an(y), t).
When m > n, on the other hand, the analysis of the homotopy classes of maps from S m to S n is a very difficult topic that has been worked out for many specific values of m and n, but not in general. We will only discuss the case of m = n, for which the most basic question is the relation between the index and the degree.
Theorem 14.3.2. If f : S m → S m is continuous, then Λ(f) = 1 + (−1)m deg(f).
Proof. Hopf's theorem (Theorem 14.2.1) implies that two maps from S m to itself are homotopic if they have the same degree, and the index is a homotopy invariant, so it suffices to determine the relationship between the degree and index for a specific instance of a map of each possible degree.
We begin with m = 1. For d ∈ Z let f1,d : S 1 → S 1 be the function f1,d(cos θ, sin θ) := (cos dθ, sin dθ). If d > 0, then the preimage of (1, 0) under f1,d consists of d points at which f1,d is orientation preserving; when d = 0 there are points in S 1 that are not in the image of f1,0; and if d < 0, then the preimage of (1, 0) under f1,d consists of −d points at which f1,d is orientation reversing. Therefore deg(f1,d) = d. Now observe that f1,1 is homotopic to a map without fixed points, while for d ≠ 1 the fixed points of f1,d are the points
(cos(2πk/(d − 1)), sin(2πk/(d − 1)))    (k = 0, . . . , |d − 1| − 1).
If d > 1, then motion in the domain is translated by f1,d into more rapid motion in the range, so the index of each fixed point is −1. When d < 1, f1,d translates motion in the domain into motion in the opposite direction in the range, so the index of each fixed point is 1. Combining these facts, we conclude that Λ(f1,d) = 1 − d, which establishes the result when m = 1.
Let em+1 = (0, . . . , 0, 1) ∈ Rm+1.
Then S m = { αx + βem+1 : x ∈ S m−1 , α ≥ 0, α2 + β 2 = 1 }. 202 CHAPTER 14. TOPOLOGICAL CONSEQUENCES We define fm,d inductively by the formula fm,d αx + βem+1 = αfm−1,−d (x) − βem+1 . If fm−1,−d is orientation preserving (reversing) at x ∈ S m−1 , then fm,d is clearly orientation reversing (preserving) at x, so deg(fm,d ) = − deg(fm−1,−d ). Therefore, by induction, deg(fm,d ) = d. The fixed points of fm,d are evidently the fixed points of fm−1,−d . Fix such an x. Computing in a local coordinate system, one may easily show that the index of x, as a fixed point of fm,d , is the same as the index of x as a fixed point of fm−1,−d , so Λ(fm,d ) = Λ(fm−1,−d ). By induction, Λ(fm,d ) = Λ(fm−1,−d ) = 1 + (−1)m−1 deg(fm−1,−d ) = 1 + (−1)m deg(fm,d ). Corollary 14.3.3. If a map f : S m → S m has no fixed points, then deg(f ) = (−1)m+1 . If f does not map any point to its antipode, which is to say that am ◦ f has no fixed points, then deg(f ) = 1. Consequently, if f does not map any point either to itself or its antipode, then m is odd. Proof. The first claim follows from Λ(f ) = 0 and the result above. In particular, am has no fixed points, so deg(am ) = (−1)m+1 . The second result now follows from the multiplicative property of the degree of a composition (Corollary 12.4.2): (−1)m+1 = deg(am ◦ f ) = deg(am ) · deg(f ) = (−1)m+1 deg(f ). Proposition 14.3.4. If the map f : S m → S m never maps antipodal points to antipodal points—that is, am (f (p)) 6= f (am (p)) for all p ∈ S m —then deg(f ) is even. If m is even, then deg(f ) = 0. Proof. The homotopy h : S m × [0, 1] → S m given by h(p, t) := rm (f (p), f (am (p)), t) shows that f and f ◦ am are homotopic, whence deg(f ) = deg(f ◦ am ). Corollary 12.4.2 and Corollary 14.3.3 give deg(f ) = deg(f ◦ am ) = deg(f ) deg(am ) = (−1)m+1 deg(f ), and when m is even it follows that deg(f ) = 0. Since f is homotopic to a nearby smooth function, we may assume that it is smooth, in which case each ht is also smooth. Sard’s theorem implies that each ht has regular values, and since h1/2 = h1/2 ◦ am , any regular value of h1/2 has an even number of preimages. The sum of an even number of elements of {1, −1} is even, so it follows that deg(f ) = deg(h1/2 ) is even. Combining this result with the first assertion of Corollary 14.3.3 gives a result that was actually applied to the theory of general economic equilibrium by Hart and Kuhn (1975): 14.3. MORE ON MAPS BETWEEN SPHERES 203 Corollary 14.3.5. Any map f : S m → S m either has a fixed point or a point p such that f (am (p)) = am (f (p)). Of course am extends to the map x 7→ −x from Rm+1 to itself, and in appropriate contexts we will understand it in this sense. If D ⊂ Rm+1 satisfies am (D) = D, a map f : D → Rn+1 is said to be antipodal if f ◦ am |D = an ◦ f. An antipodal map f : S m → S m induces a map from m-dimensional projective space to itself. If you think about it for a bit, you should be able to see that a map from m-dimensional projective space to itself is induced by such an f if and only if it maps orientation reversing loops to orientation reversing loops. The next result seems to be naturally paired with Proposition 14.3.4, but it is actually much deeper. Theorem 14.3.6. If a map f : S m → S m is antipodal, then its degree is odd. Proof. There are smooth maps arbitrarily close to f . For such an f ′ the map p 7→ rm (f ′ (p), −f ′ (−p), 21 ) is well defined, smooth, antipodal, and close to f , so it is homotopic to f and has the same degree. 
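The following computation is a bookkeeping check of Theorem 14.3.2 for the circle maps f1,d used in its proof. The local index sign(1 − d) assigned to each fixed point is taken from the discussion above, so this is an illustration of the index count rather than an independent verification.

```python
# Sketch: for f_{1,d}(cos t, sin t) = (cos dt, sin dt), the sum of the local
# fixed point indices equals 1 + (-1)^1 * deg(f_{1,d}) = 1 - d.  Assumes numpy.
import numpy as np

def lefschetz_number(d):
    if d == 1:
        return 0                    # f_{1,1} = Id is homotopic to a fixed point free map
    k = np.arange(abs(d - 1))       # fixed points t_k = 2*pi*k / (d - 1)
    indices = np.sign(1 - d) * np.ones_like(k)   # each local index is sign(1 - d)
    return int(indices.sum())

for d in range(-3, 5):
    assert lefschetz_number(d) == 1 - d
print([lefschetz_number(d) for d in range(-3, 5)])   # [4, 3, 2, 1, 0, -1, -2, -3]
```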
Evidently it suffices to prove the claim with f replaced by this map, so we may assume that f is smooth. Sard’s theorem implies that there is a regular value of f , say q. After rotating S m we may assume that q = (0, . . . , 0, 1) and −q = (0, . . . , 0, −1) are the North and South poles of S m . We would like to assume that (f −1 (q) ∪ f −1 (−q)) ∩ S m−1 = ∅, and we can bring this about by replacing f with f ◦ h where h : S m → S m is an antipodal diffeomorphism than perturbs neighborhoods of the points in f −1 (q) ∪ f −1 (−q) while leaving points far away from these points fixed. (Such an h can easily be constructed using the methods of Section 10.2.) Since a sum of numbers drawn from {−1, 1} is even or odd according to whether the number of summands is even or odd, our goal reduces to showing that f −1 (q) has an odd number of elements. When m = 0 this is established by considering the two antipode preserving maps from S 0 to itself. Proceeding inductively, suppose the result has been established when m is replaced by m − 1. For p ∈ S m , p ∈ f −1 (q) if and only if −p ∈ f −1 (−q), because f is antipodal, so the number of elements of f −1 (q) ∪ f −1 (−q) is twice the number of elements of f −1 (q). Let S+m := { p ∈ S m : pm+1 ≥ 0 } and S−m := { p ∈ S m : pm+1 ≤ 0 } be the Northern and Southern hemispheres of S m . Then p ∈ S+m if and only if −p ∈ S−m , so S+m contains half the elements of f −1 (q) ∪ f −1 (−q). Thus it suffices to show that (f −1 (q) ∪ f −1 (−q)) ∩ S+m has an odd number of elements. 204 CHAPTER 14. TOPOLOGICAL CONSEQUENCES For ε > 0 consider the small open and closed disks Dε := { p ∈ S m : pm+1 > 1 − ε } and D ε := { p ∈ S m : pm+1 ≥ 1 − ε } centered at the North pole. Since f is antipode preserving, −q is also a regular value of f . In view of the inverse function theorem, f −1 (Dε ∪ −D ε ) is a disjoint union of diffeomorphic images of D ε , and none of these intersect S m−1 if ε is sufficiently small. Concretely, for each p ∈ f −1 (q) ∪ f −1 (−q) the component C p of f −1 (D ε ∪ −D ε ) containing p is mapped diffeomorphically by f to either Dε or −D ε , and the various C p are disjoint from each other and S m−1 . Therefore we wish to show that f −1 (D ε ∪ −D ε ) ∩ S+m has an odd number of components. Let M = S+m \ f −1 (Dε ∪ −Dε ). Clearly M is a compact m-dimensional smooth ∂-manifold. Each point in S m \ {q, −q} has a unique representation of the form αy + βq where y ∈ S m−1 , 0 < α ≤ 1, and α2 + β 2 = 1. Let j : S m \ {q, −q} → S m−1 be the function j αy + βq := y, and let g := j ◦ f |M : M → S m−1 . Sard’s theorem implies that some q ∗ ∈ S m−1 is a regular value of both g and g|∂M . Theorem 12.2.1 implies that degq∗ (g|∂M ) = 0, so (g|∂M )−1 (q ∗ ) has an even number of elements. Evidently g maps the boundary of each Cp diffeomorphically onto S m−1 , so each such boundary contains exactly one element of (g|∂M )−1 (q ∗ ). In addition, j maps antipodal points of S m \ {q, −q} to antipodal ponts of S m−1 , so g|S m−1 is antipodal, and our induction hypothesis implies that (g|∂M )−1 (q ∗ ) ∩ S m−1 has an odd number of elements. Therefore the number of components of f −1 (D ε ∪ −D ε ) contained in S+m is odd, as desired. The hypotheses can be weakened: Corollary 14.3.7. If the map f : S m → S m satisfies f (−p) 6= f (p) for all p, then the degree of f is odd. Proof. This will follow from the last result once we have shown that f is homotopic to an antipodal map. Let h : S m × [0, 1] → S m be the homotopy h(p, t) = rm (f (p), −f (−p), 2t). 
The hypothesis implies that this is well defined, and h1 is antipodal. This result has a wealth of geometric consequences. Theorem 14.3.8 (Borsuk-Ulam Theorem). The following are true: (a) If f : S m → Rm is continuous, then there is a p ∈ S m such that f (p) = f (am (p)). (b) If f : S m → Rm is continuous and antipodal, then there is a p ∈ S m such that f (p) = 0. (c) There is no continuous antipodal f : S m → S m−1 . (d) There is no continuous g : D m = { (y1, . . . , ym , 0) ∈ Rm+1 : kyk ≤ 1 } → S m−1 such that g|S m−1 is antipodal. 14.3. MORE ON MAPS BETWEEN SPHERES 205 (e) Any cover F1 , . . . , Fm+1 of S m by m + 1 closed sets has a least one set that contains a pair of antipodal points. (f ) Any cover U1 , . . . , Um+1 of S m by m + 1 open sets has a least one set that contains a pair of antipodal points. Proof. We think of Rm as S m with a point removed, so a continuous f : S m → Rm amounts to a function from S m to itself that is not surjective, and whose degree is consequently zero. Now (a) follows from the last result. Suppose that f : S m → Rm is continuous and f (p) = f (−p). If f is also antipodal, then f (−p) = −f (p) so f (p) = 0. Thus (a) implies (b). Obviously (b) implies (c). Let π : p 7→ (p1 , . . . , pm , 0) be the standard projection from Rm+1 to Rm . As in the proof of Theorem 14.3.6 let S+m and S−m be the Northern and Southern hemispheres of S m . If g : D m → S m−1 was continuous and antipodal, we could define a continuous and antipodal f : S m → S m−1 by setting ( g(π(p)), p ∈ S+m , f (p) = g(π(am (p))), p ∈ S−m . Thus (c) implies (d). Suppose that F1 , . . . , Fm+1 is a cover of S m by closed sets. Define f : S m → Rm by setting f (p) = d(x, F1 ), . . . , d(x, Fm ) where d(x, x′ ) = kx − x′ k is the usual metric for Rm+1 . Suppose that f (p) = f (−p) = y. If yi = 0, then p, −p ∈ Fi , and if all the components of y are nonzero, then p, −p ∈ Fm+1 . Thus (a) implies (e). Suppose U1 , . . . , Um+1 is a cover of S m by open sets and ε > 0. For i = 1, . . . , m+ 1 set Fi := { p ∈ S m : d(p, S m \ Ui ) ≥ ε }. Then each Fi is a closed subset of Ui , and these sets cover S m if ε is sufficiently small. Thus (e) implies (f). In the argument above we showed that (a) ⇒ (b) ⇒ (c) ⇒ (d) and (a) ⇒ (e) ⇒ (f). There are also easy arguments for the implications (d) ⇒ (c) ⇒ (b) ⇒ (a) and (f) ⇒ (e) ⇒ (c), so (a)-(f) are equivalent in the sense of each being an elementary consequence of each other. The proofs that (d) ⇒ (c) and (c) ⇒ (b) are obvious and can be safely left to the reader. To show that (b) ⇒ (a), for a given continuous f : S m → Rm we apply (b) to f − f ◦ am . To show that (f) ⇒ (e) observe that if F1 , . . . , Fm+1 are closed and cover S m , then for each n the sets U1/n (Fi ) are open and cover S m , so there is a pn with pn , −pn ∈ U1/n (Fi ) for some i. Any limit point of the sequence {pn } has the desired property. The proof that (e) ⇒ (c) is more interesting. Consider an m-simplex that is embedded in D m with the origin in its interior. Let F1 , . . . , Fm+1 be the radial projections of the facets of the simplex onto S m−1 . These sets are closed and cover S m−1 , and since each facet is separated from the origin by a hyperplane, each Fi does not contain an antipodal pair of points. If f : S m → S m−1 is continuous, then f −1 (F1 ), . . . , f −1 (Fm+1 ) are a cover of S m by closed sets, and (e) implies the 206 CHAPTER 14. TOPOLOGICAL CONSEQUENCES existence of p, −p ∈ f −1 (Fi ) for some i. 
If f was also antipodal, then f (p), f (−p) = −f (p) ∈ Fi , which is impossible. As a consequence of the Borsuk-Ulam theorem, the following “obvious” fact is actually highly nontrivial. Theorem 14.3.9. Spheres of different dimensions are not homeomorphic. Proof. If k < m then, since S k can be embedded in Rm , part (a) of the Borsuk-Ulam theorem implies that a continuous function from S m to S k cannot be injective. 14.4 Invariance of Domain The main result of this section, invariance of domain, is a famous result with numerous applications. It can be thought of as a purely topological version of the inverse function theorem. However, before that we give an important consequences of the Borsuk-Ulam theorem for Euclidean spaces. Theorem 14.4.1. Euclidean spaces of different dimensions are not homeomorphic. Proof. If k 6= m and f : Rk → Rm was a homeomorphism, for any sequence {xj } in Rk with {xj } → ∞ the sequence {f (xj )} could not have a convergent subsequence, so kf (xj )k → ∞. Identifying Rk and Rm with S k \ {ptk } and S m \ {ptm }, the extension of f to S k given by setting f (ptk ) = ptm would be continuous, with a continuous inverse, contrary to the last result. The next two lemmas develop the proof of this section’s main result. Lemma 14.4.2. Suppose S+m is the Northern hemisphere of S m , f : S+m → S m is a map such that f |S m−1 is antipodal, and p ∈ S+m \ S m−1 is a point such that −p ∈ / f (S+m ) and p ∈ / f (S m−1 ). Then degp (f ) is odd. Proof. Let f˜ : S m → S m be the extension of f given by setting f˜(p) = −f (−p) when pm+1 < 0. Clearly f˜ is continuous and antipodal, so its degree is odd. The hypotheses imply that f˜−1 (p) ⊂ S+m \ S m−1 , and that f is degree admissible over p, so Additivity implies that degp (f ) = degp (f˜). Lemma 14.4.3. If f : D m → Rm is injective, then degf (0) (f ) is odd, and f (D m ) includes a neighborhood of f (0). Proof. Replacing f with x 7→ f (x) − f (0), we may assume that f (0) = 0. Let h : D m × [0, 1] → Rm be the homotopy x h(x, t) := f ( 1+t ) − f ( −tx ). 1+t Of course h0 = f and h1 is antipodal. If ht (x) = 0 then, because f is injective, x = −tx, so that x = 0. Therefore h is a degree admissible homotopy over zero, so deg0 (h0 ) = deg0 (h1 ), and the last result implies that deg0 (h1 ) is odd, so deg0 (h0 ) = deg0 (f ) is odd. The Continuity property of the degree implies that degy (f ) is odd for all y in some neighborhood of f (0). Since, by Additivity, degy (f ) = 0 whenever y∈ / f (D m ), we conclude that f (D m ) contains a neighborhood of 0. 14.5. ESSENTIAL SETS REVISITED 207 The next result is quite famous, being commonly regarded as one of the major accomplishments of algebraic topology. As the elementary nature of the assertion suggests, it is applied quite frequently. Theorem 14.4.4 (Invariance of Domain). If U ⊂ Rm is open and f : U → Rm is continuous and injective, then f (U) is open and f is a homeomorphism onto its image. Proof. The last result can be applied to a closed disk surrounding any point in the domain, so for any open V ⊂ U, f (V ) is open. Thus f −1 is continuous. 14.5 Essential Sets Revisited Let X be a compact ANR, let C ⊂ X be compact, and let f : C → X be an index admissible function. If Λ(f ) 6= 0, then the set of fixed points is essential. What about a converse? More specifically, if Λ(f ) = 0, then of course f may have fixed points, but is f necessarily index admissible homotopic to a function without fixed points? 
When C = X, so that Λ(f) = L(f), this question amounts to a request for conditions under which a converse of the Lefschetz fixed point theorem holds. We can also ask whether a somewhat more demanding condition holds: does every neighborhood of f in C(C, X) contain a function without any fixed points?
If C is not connected, the answers to these questions are obtained by combining the answers obtained when this question is applied to the restrictions of f to the various connected components of C, so we should assume that C is connected. If C1, . . . , Cr are pairwise disjoint subsets of C with F P(f) contained in the interior of C1 ∪ . . . ∪ Cr, then Λ(f) = Σi Λ(f|Ci), and of course when r > 1 it can easily happen that Λ(f|Ci) ≠ 0 for some i even though the sum is zero. Therefore we should assume that F P(f) is also connected. Our goal is to develop conditions under which a connected set of fixed points with index zero can be "perturbed away," in the sense that there is a nearby function or correspondence with no fixed points near that set.
Without additional assumptions, there is little hope of achieving positive answers. For the general situation in which the space is an ANR, the techniques we develop below would lead eventually to composing a perturbation with a retraction, and it is difficult to prevent the retraction from introducing undesired fixed points. An approach to this issue for simplicial complexes is developed in Ch. VIII of Brown (1971). Our attention is restricted to the following settings: a) X is a "well behaved" subset of a smooth manifold; b) X is a compact convex subset of a Euclidean space. The gist of the argument used to prove these results is to first approximate with a smooth function that has only regular fixed points, which are necessarily finite and can be organized in pairs of opposite index, then perturb to eliminate each pair.
Proposition 14.5.1. If g : D m → Rm is continuous, 0 ∉ g(S m−1), and deg0(g) = 0, then there is a continuous ĝ : D m → Rm \ {0} with ĝ|S m−1 = g|S m−1.
Proof. Let g̃ : S m−1 → S m−1 be the function g̃(x) = g(x)/‖g(x)‖. Theorem 14.2.4 implies that deg(g̃) = 0, so the Hopf theorem implies that there is a homotopy h : S m−1 × [0, 1] → S m−1 with h0 = g̃ and h1 a constant function. For (x, t) ∈ S m−1 × [0, 1] we set
ĝ(tx) = (t‖g(x)‖ + (1 − t)) h1−t(x).
Recall that Proposition 10.7.8 gives a continuous function λ : M → (0, ∞) and a C r−1 function κ : Vλ → M, where Vλ = { (p, v) ∈ RM : kvk < λ(p) }, such that κ(p, 0) = p for all p ∈ M and κ̃ = π × κ : Vλ → M × M is a C r−1 embedding, where π : T M → M is the projection. Let Ṽλ = κ̃(Vλ ). Let Y0 = { p ∈ C : (p, f (p)) ∈ Ṽλ }; of course this is an open set containing F P(f ). Let Y1 and Y2 be open sets such that F P(f ) ⊂ Y2 , Y 2 ⊂ Y1 , Y 1 ⊂ Y0 , and Y2 is path connected. (Such a Y2 can be constructed by taking a finite union of images of D m under C r parameterizations.) We can define a vector field ζ on a neighborhood of Y0 by setting ζ(p) = κ̃−1 (p, f (p)). Proposition 11.5.2 and Corollary 10.2.5 combine to imply that there is a vector field ζ̃ on Y0 with image contained in κ̃−1 (W ) that agrees with ζ on Y0 \ Y1 , is C r−1 on Y2 , and has only regular equilibria, all of which are in Y2 . The number of equilibria is necessarily finite, and we may assume that, among all the vector fields on Y0 that agree with ζ on Y0 \ Y1 , are C r−1 on Y2 , and have only regular equilibria in Y2 , ζ̃ minimizes this number. If ζ̃ has no equilibria, then we may define a continuous function f˜ : C → X without any fixed points whose graph is contained in W by setting f˜(p) = κ(ζ̃(p)) if p ∈ Y0 and setting f˜(p) = f (p) otherwise. Aiming at a contradiction, suppose that ζ̃ has equilibria. Since the index is zero, there must be two equilibria of opposite index, say p0 and p1 , and it suffices to show that we can further perturb ζ̃ in a way that eliminates both of them. There is a C r embedding γ : (−ε, 1 + ε) → Upp with γ(0) = p0 and γ(1) = p1 . 14.5. ESSENTIAL SETS REVISITED 209 (This is obvious, but painful to prove formally, and in addition the case m = 1 requires special treatment. A formal verification would do little to improve the reader’s understanding, so we omit the details.) Applying the tubular neighborhood theorem, this path can be used to construct a C r parameterization ϕ : Z → U where Z ⊂ Rm is a a neighborhood of D m . Let g : Z → Rm be defined by setting g(x) = Dϕ(x)−1 ζϕ(x) . Proposition 14.5.1 gives a continuous function ĝ : Z → Rm \ {0} that agrees with g on the closure of Z \ D m . We extend ĝ to all of Z by setting ĝ(x) = g(x) if x ∈ / D m . Define a new vector field ζ̂ on ϕ(Z) by setting ζ̂(p) = Dϕ(ϕ−1 (p))ĝ(ϕ−1 (p)). There are two final technical points. In order to insure that ζ(p) ∈ κ̃−1 (W ) for all p we can first multiplying ĝ by a C r function β : D m → (0, 1] that is identically 1 on Z \ D m and close to zero in the interior of D m outside of some neighborhood of S m−1 . We can also use Proposition 11.5.2 and Corollary 10.2.5 to further perturb ζ̂ to make is C r−1 without introducing any additional equilibria. This completes the construction, thereby arriving at a contradiction that completes the proof. Economic applications call for a version of the result for correspondences. Ideally one would like to encompass contractible valued correspondences in the setting of a manifold, but the methods used here are not suitable. Instead we are restricted to convex valued correspondences, and thus to settings where convexity is defined. Theorem 14.5.3. If X ⊂ Rm is compact and convex, C ⊂ X is compact, F : C → X is an index admissible upper semicontinuous convex valued correspondence, Λ(F ) = 0, and F P(F ) is connected, then F is inessential. 
Caution: The analogous result does not hold for essential sets of Nash equilibria, which are defined by Jiang (1963) in terms of perturbations of the game’s payoffs. Hauk and Hurkens (2002) give an example of a game with a component of the set of Nash equilibria that has index zero but is robust with respect to perturbations of payoffs. Proof. Let W ⊂ C ×X be an open set containing the graph of F . We will show that there is a continuous f : C → X with Gr(f ) ⊂ W and F P(f ) = ∅. Let x0 be a point in the interior of X, let h : X ×[0, 1] → X be the contraction h(x, t) = (1−t)x+tx0 , and for t ∈ [0, 1] let ht ◦F be the correspondence x 7→ ht (F (x)). This correspondence is obviously upper semicontinuous and convex valued, and Gr(ht ◦F ) ⊂ W for small t > 0, so it suffices to prove the result with F replaced by ht ◦F for such t. Therefore we may assume that the image of F is contained in the interior of X. For each x ∈ F P(F ) we choose convex neighborhoods Yx ⊂ C of x and Zx ⊂ X of F (x) such that Yx ⊂ Zx and Yx ×Zx ⊂ W . Choose x1 , . . . , xk such that F P(F ) ⊂ Yx1 ∪ . . . ∪ Yxk , and let Y 0 = Y x1 ∪ . . . ∪ Y xk and Z0 = (Yx1 × Zx1 ) ∪ . . . ∪ (Yxk × Zxk ). Note that for all (x, y) ∈ Z0 , Z0 contains the line segment { (x, (1 − t)y + tx) }. Let Y1 and Y2 be open subsets of C with F P(F ) ⊂ Y2 , Y 2 ⊂ Y1 , Y 1 ⊂ Y0 , and Y2 is 210 CHAPTER 14. TOPOLOGICAL CONSEQUENCES path connected. Let α : C → [0, 1] be a C ∞ function that is identically one on Y 2 and identically zero on C \ Y1 . Let W0 = Z0 ∪ (W ∩ ((C \ Y1 ) × X)) \ { (x, x) : x ∈ C \ Y2 }. This is an open set that contains the graph of F , so Proposition 10.2.7 implies that there is a C ∞ function f : C → X with Gr(f ) ⊂ W0 that has only regular fixed points. We assume that among all functions with these properties, f is minimal for the number of fixed points. There is some ε > 0 such that {x} × Uε (x) ⊂ Z0 for all x ∈ Y 2 . For any δ ∈ (0, 1] the function f ′ : x 7→ (1 − α(x))f (x) + α(x + δ(f (x) − x)) is C ∞ , its graph is contained in W0 , and it has only regular fixed points. If δ > 0 is sufficiently small, then f ′ (x) ∈ Uε (x) for all x ∈ Y 2 . Therefore we may assume that f (x) ∈ Uε (x) for all x ∈ Y 2 . Define a function ζ : Y2 → Rm by setting ζ(x) = f (x) − x. Aiming at a contradiction, suppose that ζ has zeros. Since the Λ(f ) = 0, there must be two zeros of opposite index, say x0 and x1 . As in the last proof, there is a C r embedding γ : (−ε, 1 + ε) → Y2 with γ(0) = x0 and γ(1) = x1 . Applying the tubular neighborhood theorem, this path can be used to construct a C ∞ parameterization ϕ : T → Y2 where T ⊂ Rm is a neighborhood of D m . Let g : T → Rm be defined by setting g(x) = Dϕ(x)−1 ζϕ(x) . Proposition 14.5.1 gives a continuous function ĝ : T → Rm \ {0} that agrees with g on the closure of T \ D m . We extend ĝ to all of T by setting ĝ(x) = g(x) if x ∈ / D m . Define a new vector field ζ̂ on ϕ(T ) by setting ζ̂(p) = Dϕ(ϕ−1 (p))ĝ(ϕ−1 (p)). There are two final technical points. In order to insure that kζ̂(x)k < ε for all p we can first multiply ĝ by a C ∞ function β : D m → (0, 1] that is identically 1 on T \ D m and close to zero in the interior of D m outside of some neighborhood of S m−1 . We can also use Proposition 11.5.2 and Corollary 10.2.5 to further perturb ζ̂ to make it C ∞ without introducing any additional zeros. We can now define a function f ′ : C → X by setting f (x) = x + ζ̂(x) if x ∈ T and f ′ (x) = f (x) otherwise. 
Since f′ has all the properties of f, and two fewer fixed points, this is a contradiction, and the proof is complete.

Chapter 15 Vector Fields and their Equilibria

Under mild technical conditions, explained in Sections 15.1 and 15.2, a vector field ζ on a manifold M determines a dynamical system. That is, there is a function Φ : W → M, where W ⊂ M × R is a neighborhood of M × {0}, such that the derivative of Φ at (p, t) ∈ W, with respect to time, is ζ_{Φ(p,t)}. In this final chapter we develop the relationship between the fixed point index and the stability of rest points, and sets of rest points, of such a dynamical system.

In addition to the degree and the fixed point index, there is a third expression of the underlying mathematical principle for vector fields. In Section 15.3 we present an axiomatic description of the vector field index, paralleling our axiom systems for the degree and fixed point index. Existence and uniqueness are established by showing that the vector field index of ζ|_C, for suitable compact C ⊂ M, agrees with the fixed point index of Φ(·, t)|_C for small negative t. Since we are primarily interested in forward stability, it is more to the point to say that the fixed point index of Φ(·, t)|_C for small positive t agrees with the vector field index of −ζ|_C.

The notion of stability we focus on, asymptotic stability, has a rather complicated definition, but the intuition is simple: a compact set A is asymptotically stable if the trajectory of each point in some neighborhood of A is eventually drawn into, and remains inside, arbitrarily small neighborhoods of A. In order to use the fixed point index to study stability, we need to find some neighborhood of such an A that is mapped into itself by Φ(·, t) for small positive t. The tool we use to achieve this is the converse Lyapunov theorem, which asserts that if A is asymptotically stable, then there is a Lyapunov function for ζ that is defined on a neighborhood of A. Unlike the better known Lyapunov theorem, which asserts that the existence of a Lyapunov function implies asymptotic stability, the converse Lyapunov theorem is a more recent and difficult result. We prove a version of it that is sufficient for our needs in Section 15.5.

Once all this background material is in place, it will not take long to prove the culminating result, that if A is asymptotically stable and an ANR, then the vector field index of −ζ is the Euler characteristic of A. This was proved in the context of a game theoretic model by Demichelis and Ritzberger (2003). The special case of A being a singleton is a prominent result in the theory of dynamical systems, due to Krasnosel'ski and Zabreiko (1984): if an isolated rest point is asymptotically stable for ζ, then the vector field index of that point for −ζ is 1.

Paul Samuelson advocated a "correspondence principle" in two papers, Samuelson (1941, 1942), and his famous book Foundations of Economic Analysis, Samuelson (1947). The idea is that the stability of an economic equilibrium, with respect to natural dynamics of adjustment to equilibrium, implies certain qualitative properties of the equilibrium's comparative statics.
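Before the correspondence principle is developed further, the relation just described between the time-t maps of the flow and the vector field index of −ζ can be checked numerically in the simplest possible case, a linear field ζ(x) = Ax on R² whose equilibrium is asymptotically stable. This is a sketch, assuming numpy and scipy are available; it uses the standard facts, invoked later in the chapter, that the index of a regular fixed point p of a smooth map g is sign det(I − Dg(p)) and that the index of a regular equilibrium of a vector field is the sign of the determinant of its derivative there.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0,  2.0],
              [-2.0, -1.0]])           # eigenvalues -1 +/- 2i, so 0 is asymptotically stable

t = 1e-3                               # a small positive time
Phi_t = expm(t * A)                    # the time-t map of the flow is x -> e^{tA} x

fixed_point_index  = int(np.sign(np.linalg.det(np.eye(2) - Phi_t)))  # index of 0 as a fixed point of Phi(., t)
vector_field_index = int(np.sign(np.linalg.det(-A)))                 # index of 0 as an equilibrium of -zeta

print(fixed_point_index, vector_field_index)   # both 1, consistent with chi({0}) = 1
```

The two signs agree because det(I − e^{tA}) = t^m det(−A)(1 + O(t)) for small t > 0, so for small positive t the fixed point index of the time-t map at a regular equilibrium has the sign of det(−A).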
There are 1-dimensional settings in which this idea is regarded as natural and compelling, but Samuelson's writings discuss many examples without formulating it as a general theorem, and its nature and status in higher dimensions have not been well understood; Echenique (2008) provides a concise summary of the state of knowledge and related literature. The book concludes with an explanation of how the Krasnosel'ski-Zabreiko theorem allows the correspondence principle to be formulated in a precise and general way.

15.1 Euclidean Dynamical Systems

We begin with a review of the theory of ordinary differential equations in Euclidean space. Let U ⊂ R^m be open, and let z : U → R^m be a function, thought of as a vector field. A trajectory of z is a C^1 function γ : (a, b) → U such that γ′(s) = z_{γ(s)} for all s. Without additional assumptions the dynamics associated with z need not be deterministic: there can be more than one trajectory for the vector field satisfying an initial condition that specifies the position of the trajectory at a particular moment. For example, suppose that m = 1, U = R, and

z(x) = 0 if x ≤ 0, and z(x) = 2√x if x > 0.

Then for any s_0 there is a trajectory γ_{s_0} : R → U given by

γ_{s_0}(s) = 0 if s ≤ s_0, and γ_{s_0}(s) = (s − s_0)² if s > s_0.

For most purposes this sort of indeterminacy is unsatisfactory, so we need to find a condition that implies that for any initial condition there is a unique trajectory. Let (X, d) and (X′, d′) be metric spaces. A function f : X → X′ is Lipschitz if there is a constant L > 0 such that d′(f(x), f(y)) ≤ L d(x, y) for all x, y ∈ X. We say that f is locally Lipschitz if each x ∈ X has a neighborhood U such that f|_U is Lipschitz. The basic existence-uniqueness result for ordinary differential equations is:

Theorem 15.1.1 (Picard-Lindelöf Theorem). Suppose that U ⊂ R^m is open, z : U → R^m is locally Lipschitz, and C ⊂ U is compact. Then for sufficiently small ε > 0 there is a unique function F : C × (−ε, ε) → U such that for each x ∈ C, F(x, 0) = x and F(x, ·) is a trajectory of z. In addition F is continuous, and if z is C^s (1 ≤ s ≤ ∞) then so is F.

Due to its fundamental character, a detailed proof would be out of place here, but we will briefly describe the central ideas of two methods. First, for any Δ > 0 one can define a piecewise linear approximate solution going forward in time by setting F_Δ(x, 0) = x and inductively applying the equation

F_Δ(x, t) = F_Δ(x, kΔ) + (t − kΔ) · z(F_Δ(x, kΔ)) for kΔ < t ≤ (k + 1)Δ.

Concrete calculations show that this collection of functions has a limit as Δ → 0, that this limit is continuous and is a trajectory of z with the given initial condition, and also that any such trajectory is a limit of this collection. These calculations give precise information concerning the accuracy of the numerical scheme for computing approximate solutions described by this approach.

The second proof scheme uses a fixed point theorem. It considers the mapping F ↦ F̃ given by the equation

F̃(x, t) = x + ∫_0^t z(F(x, s)) ds.

This defines a function from C(C × [−ε, ε], U) to C(C × [−ε, ε], R^m). As usual, the range is endowed with the supremum norm. A calculation shows that if ε is sufficiently small, then the restriction of this function to a certain neighborhood of the function (x, t) ↦ x is actually a contraction. Since C(C × [−ε, ε], R^m) is a complete metric space, the contraction mapping theorem gives a unique fixed point.
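A minimal numerical sketch of the second scheme may be helpful; it is not part of the proof. For the simplest possible data, a single initial condition x_0 and the field z(x) = x on R, the iterates of the map F ↦ F̃ converge rapidly in the supremum norm to the exact trajectory x_0 e^s. The sketch assumes numpy, and the integral is approximated by the trapezoid rule on a grid.

```python
import numpy as np

# Picard iteration for x'(s) = z(x(s)), x(0) = x0, on [0, eps], with z(x) = x,
# whose exact trajectory is x0 * exp(s); F is stored by its values on a grid.
z = lambda x: x
x0, eps, n = 1.0, 0.5, 2001
s = np.linspace(0.0, eps, n)
ds = s[1] - s[0]

F = np.full(n, x0)                       # starting guess: the constant trajectory
for k in range(8):
    increments = 0.5 * (z(F)[1:] + z(F)[:-1]) * ds
    F_new = x0 + np.concatenate(([0.0], np.cumsum(increments)))   # the map F -> F~
    print(k, np.abs(F_new - F).max())    # successive sup-norm distances shrink geometrically
    F = F_new

print("sup-norm error vs exact solution:", np.abs(F - x0 * np.exp(s)).max())
```

Here the Lipschitz constant of z is 1 and the interval has length 1/2, so the contraction factor is at most 1/2, which is why a handful of iterations already reproduces the exponential to high accuracy.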
Additional details can be found in Chapter 5 of Spivak (1979) and Chapter 8 of Hirsch and Smale (1974). 15.2 Dynamics on a Manifold Throughout this chapter we will work with a fixed order of differentiability 2 ≤ r ≤ ∞ and an m-dimensional C r manifold M ⊂ Rk . Recall that if S is a subset of M, a vector field on S is a continuous function ζ : S → T M such that π ◦ ζ = IdS , where π : T M → M is the projection (p, v) 7→ p. We write ζ(p) = (p, ζp ), so that ζ is thought of as “attaching” a tangent vector ζp to each p ∈ S, in a continuous manner. A trajectory of ζ is a C 1 function γ : (a, b) → S such that γ ′ (s) = ζγ(s) for all s. We wish to transport the Picard-Lindelöf theorem to M. To this end, we study how vector fields and their associated dynamical systems are transformed by changes of coordinates. In addition to the vector field ζ on M, suppose that N ⊂ Rℓ is a second C r manifold and h : M → N is a C r diffeomorphism. Let η be the vector field on N defined by ηq = Dh(h−1 (q))ζh−1(q) . (∗) This formula preserves the dynamics: Lemma 15.2.1. The curve γ : (a, b) → M is a trajectory of ζ if and only if h ◦ γ is a trajectory of η. 214 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA Proof. For each s the chain rule gives (h ◦ γ)′ (s) = Dh(γ(s))γ ′ (s) and Dh(γ(s)) is a linear isomorphism because h is a diffeomorphism. In our main application of this result N will be an open subset of Rm , to which Theorem 15.1.1 can be applied. The given ζ will be locally Lipschitz, and it should follow that η is also locally Lipschitz. Insofar as M is a subset of Rk , T M inherits a metric, which gives meaning to the assumption that ζ is locally Lipschitz, but this is a technical artifice, and it would be troubling if our concepts depended on this metric in an important way. One consequence of the results below is that different embeddings of M in a Euclidean space give rise to the same class of locally Lipschitz vector fields. Lemma 15.2.2. If U ⊂ Rm is open and f : U → Rn is C 1 , then f is locally Lipschitz. Proof. Consider a point x ∈ U. There is an ε > 0 such that the closed ball B of radius ε centered at x is contained in U. Let L := maxy∈B kDf (y)k. Since B is convex, for any y, z, ∈ B we have Z 1 kf (z) − f (y)k = k Df (y + t(z − y))(z − y) dtk 0 ≤ Z 0 1 kDf (y + t(z − y))k · kz − yk dt ≤ Lkz − yk. Lemma 15.2.3. A composition of two Lipschitz functions is Lipschitz, and a composition of two locally Lipschitz functions is locally Lipschitz. Proof. Suppose that f : X → X ′ is Lipschitz, with Lipschitz constant L, that (X ′′ , d′′ ) is a third metric space, and that g : X ′ → X ′′ is Lipschitz with Lipschitz constant M. Then d′′ (g(f (x)), g(f (y))) ≤ Md′ (f (x), f (y)) ≤ LMd(x, y) for all x, y ∈ X, so g ◦ f is Lipschitz with Lipschitz constant LM. Now suppose that f and g are only locally Lipschitz. For any x ∈ X there is a neighborhood U of x such that f |U is Lipschitz and a neighborhood V of f (x) such that g|V is Lipschitz. Then f |U ∩f −1 (V ) is Lipschitz, and, by continuity, U ∩ f −1 (V ) is a neighborhood of x. Thus g ◦ f is locally Lipschitz. In preparation for the next result we note the following immediate consequences of equation (∗): Dh(p)ζp = ηh(p) and Dh−1 (q)ηq = ζh−1 (q) for all p ∈ M and q ∈ h(M). We also note that for everything we have done up to this point it is enough that r ≥ 1, but the following result depends on r being at least 2. 15.2. DYNAMICS ON A MANIFOLD 215 Lemma 15.2.4. ζ is locally Lipschitz if and only if η is locally Lipschitz. Proof. 
Suppose that ζ is locally Lipschitz. For any p, p′ ∈ M we have kηh(p) − ηh(p′ ) k = kDh(p)(ζp − ζp′ ) + (Dh(p) − Dh(p′ ))ζp′ k ≤ kDh(p)k · kζp − ζp′ k + kDh(p) − Dh(p′ ))k · kζp′ k. Any p0 ∈ M has a neighborhood M0 ⊂ M such that ζ|M0 is Lipschitz, say with Lipschitz constant L1 , and there are constants C1 , C2 > 0 such that kDh(p)k ≤ C1 and kζp k ≤ C2 for all p ∈ M0 . Since h is C 2 , Dh is C 1 and consequently locally Lipschitz, so we can choose M0 such that Dh|M0 is Lipschitz, say with Lipschitz constant L2 . Then η ◦ h|M0 is Lipschitz with Lipschitz constant C1 L1 + C2 L2 . Now suppose that η is locally Lipschitz. By the definition of a C r function, there is an open W ⊂ Rk containing h(M) and a C r function Ψ : W → Rm whose restriction to h(M) is h−1 . Replacing W with Ψ−1 (M), we may assume that the image of Ψ is contained in M. We extend η to W be setting η = η ◦ h ◦ Ψ. This is locally Lipschitz because η is locally Lipschitz and h ◦ Ψ is C 1 . The remainder of the proof follows the pattern of the first part, with ζ in place of η, η in place of ζ, and Ψ in place of h. With the preparations complete, we can now place the Picard-Lindelöf theorem in a general setting. Theorem 15.2.5. Suppose that ζ is locally Lipschitz and C ⊂ M is compact. Then for sufficiently small ε > 0 there is a unique function Φ : C × (−ε, ε) → U such that for all p ∈ C, Φ(p, 0) = p and Φ(p, t) is a trajectory for ζ. In addition Φ is continuous, and if ζ is C s (1 ≤ s ≤ r) then so is Φ. Proof. We can cover C with the interiors of a finite collection K1 , . . . , Kr of compact subsets, each of which is contained in the image of some C r parameterization ϕi : Ui → M. For each i let zi be the vector field on Ui derived from ζ and ϕ−1 i , as −1 described above, and let Fi : ϕ (Ki ) × (−εi , εi ) → Ui be the function given by Theorem 15.1.1. Then the function Φi : Ki × (−εi , εi ) → M given by Φi (p, t) = ϕi (Fi (ϕ−1 i (p), t)) inherits the continuity and smoothness properties of Fi , and for each p ∈ Ki , Φi (p, 0) = p and Φi (p, ·) is a trajectory for ζ. If ε ≤ min{ε1, . . . , εr }, then we must have Φ(p, t) = Φi (p, t) whenever p ∈ Ki , so Φ is unique if it exists. In fact Φ unambiguously defined by this condition: if p ∈ Ki ∩ Kj , then ϕ−1 i ◦ Φj (p, ·) is −1 trajectory for zi , so it agrees with Fi (ϕi (p), ·), and thus Φi (p, ·) and Φj (p, ·) agree on (−ε, ε). Taking a union of the interiors of the sets C × (−ε, ε) gives an open W ′ ∈ M × R such that: (a) for each p, { t : (p, t) ∈ W ′ } is an interval containing 0; (b) there is a unique function Φ′ : C × (−ε, ε) → U such that for all p ∈ C, Φ′ (p, 0) = p and Φ′ (p, ·) is a trajectory for ζ. 216 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA If W ′′ and Φ′′ is a second pair with these properties, then W ′ ∪ W ′′ satisfies (a), and uniqueness implies that Φ′ and Φ′′ agree on W ′ ∩ W ′′ , so the function on W ′ ∪ W ′′ that agrees with Φ′ on W ′ and with Φ′′ on W ′′ satisfies (b). In fact his logic extends to any, possibly infinite, collection of pairs. Applying it to the collection of all such pairs shows that there is a maximal W satisfying (a), called the flow domain of ζ, such that there is a unique Φ : W → M satisfying (b), which is called the flow of ζ. Since the flow agrees, in a neighborhood of any point, with a function derived (by change of time) from one of those given by Theorem 15.2.5, it is continuous, and it is C s (1 ≤ s ≤ r) if ζ is C s . The vector field ζ is said to be complete if W = M × R. 
When this is the case each Φ(·, t) : M → M is a homeomorphism (or C s diffeomorphism is ζ is C s ) with inverse Φ(·, −t), and t 7→ Φ(·, t) is a homomorphism from R (thought of as a group) to the space of homeomorphisms (or C s diffeomorphisms) between M and itself. It is important to understand that when ζ is not complete, it is because there are trajectories that “go to ∞ in finite time.” One way of making this rigorous is to define the notion of “going to ∞” as a matter of eventually being outside any compact set. Suppose that Ip = (a, b), where b < ∞, and C ⊂ M is compact. If we had Φ(p, tn ) ∈ C for all n, where {tn } is a sequence in (a, b) converging to b, then after passing to a subsequence we would have Φ(p, tn ) → q for some q ∈ C, and we could used the method of the last proof to show that (p, b) ∈ W . 15.3 The Vector Field Index If S ⊂ M and ζ is a vector field on S, an equilibrium of ζ is a point p ∈ S such that ζ(p) = 0 ∈ Tp M. Intuitively, an equilibrium is a rest point of the dynamical system defined by ζ in the sense that the constant function with value p is a trajectory. The axiomatic description of the vector field index resembles the corresponding descriptions of the degree and the fixed point index. If C ⊂ M is compact, a continuous vector field ζ on C is index admissible if it has no equilibria in ∂C. (As before, int C is the topological interior of C, and ∂C = C \int C is its topological boundary.) Let V(M) be the set of index admissible vector fields ζ : C → T M where C is compact. Definition 15.3.1. A vector field index for M is a function ind : V(M) → Z, ζ 7→ ind(ζ), satisfying: (V1) ind(ζ) = 1 for all ζ ∈ V(M) with domain C such that there is a C r parameterization ϕ : V → M with C ⊂ ϕ(V ), ϕ−1 (C) = D m , and Dϕ(x)−1 ζϕ(x) = x for all x ∈ D m = { x ∈ Rm : kxk ≤ 1 }. Ps (V2) ind(ζ) = i=1 ind(ζ|Ci ) whenever ζ ∈ V(M), C is the domain of ζ, and C1 , . . . , Cs are pairwise disjoint compact subsets of C such that ζ has no equilibria in C \ (int C1 ∪ . . . ∪ int Cs ). (V3) For each ζ ∈ V(M) with domain C there is a neighborhood A ⊂ T M of Gr(ζ) such that ind(ζ ′ ) = ind(ζ) for all vector fields ζ ′ on C with Gr(ζ ′) ⊂ A. 15.3. THE VECTOR FIELD INDEX 217 A vector field homotopy on S is a continuous function η : S × [0, 1] → T M such that π(η(p, t)) = p for all (p, t), which is to say that each ηt = η(·, t) : S → T M is a vector field on S. A vector field homotopy η on C is index admissible if each ηt is index admissible. If ind(·) is a vector field index, then ind(ηt ) is locally constant as a function of t, hence constant because [0, 1] is connected, so ind(η0 ) = ind(η1 ). Our analysis of the vector field index relates it to the fixed point index. Theorem 15.3.2. There is a unique index for M. If ζ ∈ V(M) has an extension to a neighborhood of its domain that is locally Lipschitz and Φ is the flow of this extension, then ind(ζ) = Λ(Φ(·, −t)|C ) = (−1)m Λ(Φ(·, t)|C ) for all sufficiently small positive t. Equivalently, ind(−ζ) = Λ(Φ(·, t)|C ) for small positive t. Remark: In the theory of dynamical systems we are more interested in the future than the past. In particular, forward stability is of much greater interest than backward stability, even though the symmetry t 7→ −t makes the study of one equivalent to the study of the other. From this point of view it seems that it would have been preferable to define the vector field index with (V1) replaced by the normalization requiring that the vector field x 7→ −x ∈ Tx Rm has index 1. 
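The reduction carried out below, approximating by a field whose equilibria are all regular and adding up local indices, can be illustrated numerically in the plane. The sketch assumes numpy and uses two standard planar facts that are not part of the axiomatic development here: the index of a regular equilibrium is the sign of the determinant of the derivative of the field there, and the index over a disk equals the winding number of the field along the boundary circle. The particular field ζ(x, y) = (4x² − 1, y) is chosen only for illustration; it has two regular equilibria of opposite index inside the unit disk and none on the boundary, so by (V2) its index over the disk is zero.

```python
import numpy as np

zeta = lambda x, y: (4.0 * x**2 - 1.0, y)   # equilibria at (+/- 1/2, 0), none on the unit circle

# local indices: sign of the determinant of the derivative at each regular equilibrium
for p in [(-0.5, 0.0), (0.5, 0.0)]:
    J = np.array([[8.0 * p[0], 0.0],
                  [0.0,        1.0]])        # derivative of zeta at p
    print(p, int(np.sign(np.linalg.det(J))))  # -1 and +1

# index over the unit disk, computed as the winding number of zeta along the boundary
theta = np.linspace(0.0, 2.0 * np.pi, 4001)
vx, vy = zeta(np.cos(theta), np.sin(theta))
angles = np.unwrap(np.arctan2(vy, vx))
print("index over the disk:", int(round((angles[-1] - angles[0]) / (2.0 * np.pi))))  # 0 = (-1) + 1
```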
The remainder of this section is devoted to the proof of Theorem 15.3.2. Fix ζ ∈ V(M) with domain C. The first order of business is to show that ζ can be approximated by a well enough behaved vector field that is defined on a neighborhood of C. Since C is compact, it is covered by the interiors of a finite collection K1 , . . . , Kk of compact sets, with each Ki contained in an open Vi that is the image of a C r parameterization ϕi . Each ϕi induces an isomorphism between T Vi and Vi × Rm , so that the Tietze extension theorem implies that there is a vector field on Vi that agrees with ζ on C∩Vi . There is a partition of unity {λi } for K1 ∪. . .∪K S k subordinate to the cover V1 , . . . , Vk , and we may define an extension of ζ to V = i Vi by setting X ζ(p) = λi (p)ζi (p). p∈Vi Suppose 2 ≤ r ≤ ∞. We will need to show that ζ can be approximated by a vector field that is locally Lipschitz, but in fact we can approximate with a C r−1 vector field. In the setting of the last paragraph, we may assume that the partition of unity {λi } is C r . Proposition 10.2.7 allows us to approximate each vector field x 7→ Dϕi (x)−1 (ζi,ϕi(x) ) on ϕi (Vi ) with a C r vector field ξi , and p 7→ Dϕi (ϕ−1 i (p))ξi,ϕ−1 i (p) is then a C r−1 vector field ζ̃i on Vi that approximates ζi . The vector field ζ̃ on V given by X ζ̃(p) = λi (p)ζ̃i (p) p∈Vi 218 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA is a C r−1 vector field that approximates ζ. Actually, we wish to approximate ζ with a C r−1 vector field satisfying an additional regularity condition. Recall that T(p,0) (T M) = Tp M × Tp M, and let π2 : Tp M × Tp M → Tp M, π2 (v, w) = w, be the projection onto the second component. We say that p is a regular equilibrium of ζ̃ if p is an equilibrium of ζ̃ and π2 ◦ D ζ̃(p) is nonsingular. (Intuitively, the derivative at p of the map q 7→ ζ̃q has rank m.) We need the following local result. Lemma 15.3.3. Suppose that K ⊂ V ⊂ V ⊂ U ⊂ Rm with U and V open and K and V compact, and λ : U → [0, 1] is a C r−1 (2 ≤ r ≤ ∞) function with λ(x) = 1 whenever x ∈ K and λ(x) = 0 whenever x ∈ / V . Let D be a closed subset of U, and let f : U → Rm be a C r−1 function whose zeros in D are all regular. Then any neighborhood of the origin in Rm contains a y such that all the zeros of fy : x 7→ f (x) + λ(x)y in D ∪ K are regular. Proof. The equidimensional case of Sard’s theorem implies that the set of regular values of f |V is dense, and if −y is a regular value of f |V , then all the zeros of fy |K are regular. If the claim is false, there must be a sequence yn → 0 such that for each n there is a xn ∈ V ∩ D such that xn is a singular zero of fyn . But V ∩ D is compact, so the sequence {xn } must have a limit point, which is a singular zero of f |D by continuity, contrary to assumption. PnUsing the last result, we first choose a perturbation ζ̂1 of ζ̃1 such that λ1 ζ̂1 + i=2 λi ζ̃i has no zeros in K1 . Working inductively, we then choose perturbations ζ̂2 , . . . , ζ̂n of ζ̃2 , . . . , ζ̃n one at a time, in such a way that for each i, i X h=1 λi ζ̂i + n X λi ζ̃i h=i+1 has only P regular equilibria in Di−1 ∪ Ki = (K1 ∪ . . . ∪ Ki−1 ) ∪ Ki . At the end of this ζ̂ = i λi ζ̂i is an approximation of ζ̃ that has only regular equilibria in C. We can now explain the proof that the vector field index is unique. In view of (V3), if it exists, the vector field index is determined by its values on those ζ ∈ V(M) that are C r−1 and have only regular equilibria. 
Applying (V2), we find that the vector field index is in fact fully determined by its values on the ζ that are C r−1 and have a single regular equilibria. For such ζ the main ideas here are essentially the ones that were developed in connection with our analysis of orientation, so we only sketch them briefly and informally. By the logic of that analysis, there is a C r parameterization ϕ : V → M whose image contains the unique equilibrium, such that either x 7→ Dϕ(x)−1 ζϕ(x) is admissibly homotopic to either the vector field x 7→ x or the vector field x 7→ (−x1 , x2 , . . . , xm ). In the first of these two situations, the index is determined by (V1). In addition, one can easily define an admissible homotopy transforming this situation into one in which there are three regular equilibria, two of which are of the first type and one of which is of the second type. Combining (V1) and (V2), we 219 15.3. THE VECTOR FIELD INDEX find that the index of the equilibrium of the second type is −1, so the vector field index is indeed uniquely determined by the axioms. We still need to construct the index. One way to proceed would be to define the vector field index to be the index of nearby smooth approximations with regular equilibria. This is possible, but the key step, namely showing that different approximations give the same result, would duplicate work done in earlier chapters. Instead we will define the vector field index using the characterization in terms of the fixed point index given in the statement of Theorem 15.3.2, after which the axioms for the fixed point index will imply (V1)-(V3). We need the following technical fact. Lemma 15.3.4. If C ⊂ M is compact, ζ is a locally Lipschitz vector field defined on a neighborhood U of C, Φ is the flow of ζ, and ζ(p) 6= 0 for all p ∈ C, then there is ε > 0 such that Φ(p, t) 6= p for all (p, t) ∈ C × ((−ε, 0) ∪ (0, ε)). Proof. We have C = K1 ∪ . . . ∪ Kk where each Ki is compact and contained in the domain Wi of a C r parameterization ϕi . It suffices to prove the claim with C replaced with Ki , so we may assume that C is contained in the image of a C r parameterization. We can use Lemma 15.2.1 to move the problem to the domain of the parameterization, so we may assume that U is an open subset of Rm and that ζ is Lipschitz, say with Lipschitz constant L. Let V be a neighborhood of C such that V is a compact subset of U, and let M := max kζ(p)k and m := min kζ(p)k. p∈V p∈V Let ε > 0 be small enough that: a) V × (−ε, ε) is contained in the flow domain W ζ of ζ; b) Φ(C × (−ε, ε)) ⊂ V ; c) LMε < m. We claim that kΦ(p, t) − pk < M|t| (∗) for all (p, t) ∈ C × (−ε, ε). For any (p, s) ∈ C × (−ε, ε) and v ∈ Rm we have d hΦ(p, s) − p, vi = hζΦ(p,s), vi ≤ kζΦ(p,s) k · kvk ≤ Mkvk. ds Therefore the intermediate value theorem implies that |hΦ(p, t) − p, vi| ≤ M|t| · kvk. Since v may be any unit vector, (∗) follows this. Now suppose that Φ(p, t) = p for some (p, t) ∈ C × ((−ε, 0) ∪ (0, ε)). Rolle’s theorem implies that there is some s between 0 and t such that 0= d hΦ(p, s) ds − p, ζpi = hζΦ(p,s) , ζp i, but among the vectors that are orthogonal to ζp , the origin is the closest, and kζΦ(p,s) − ζp k ≤ LkΦ(p, s) − pk ≤ LM|s| < LMε < m < kζp k, so this is impossible. 220 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA We now define the vector field index of the pair (U, ζ) to be Λ(Φζ̃ (·, t)|C ), where ζ̃ is a nearby C r−1 vector field, for all sufficiently small negative t. 
Since such a ζ̃ is vector field admissible, the last result (applied to ∂C) implies that Φζ̃ (·, t)|C is index admissible for all small negative t, and it also (by Homotopy) implies that the choice of t does not affect the definition. We must also show that the choice of ζ̃ does not matter. Certainly there is a neighborhood such that for ζ̃0 and ζ̃1 in this neighborhood and all s ∈ [0, 1], ζ̃s = (1 − s)ζ̃0 + sζ̃1 is index admissible. In addition, that Φζ̃s (p, t) is jointly continuous as a function of (p, t, s) follows from Theorem 15.2.5 applied to the vector field (p, s) 7→ (ζ̃s (p), 0) ∈ T(p,s) (M × (−δ, 1 + δ)) on M × (−δ, 1 + δ), where δ is a suitable small positive number. Therefore Continuity for the fixed point index implies that Λ(Φζ̃0 (·, t)|C ) = Λ(Φζ̃1 (·, t)|C ). We now have to verify that our definition satisfies (V1)-(V3). But the result established in the last paragraph immediately implies (V3). Of course (V2) follows directly from the Additivity property of the fixed point index. Finally, the flow of the vector field ζ(x) = x on Rm is Φ(x, t) = et x, so for small negative t there is an index admissible homotopy between Φ(·, t)|Dm and the constant map x 7→ 0, so (V1) follows from Continuity and Normalization for the fixed point index. All that remains of the proof of Theorem 15.3.2 is to show that ζ is locally Lipschitz and defined in a neighborhood of C, then ind(ζ) = (−1)m Λ(Φζ (·, t)|C ) for sufficiently small positive t. Since we can approximate ζ with a vector field that is C r−1 and has only regular equilibria, by (V2) it suffices to prove this when C is a single regular equilibrium. If ζ is one of the two vector fields x 7→ x ∈ Tx Rm or x 7→ (−x1 , x2 , . . . , xm ) ∈ Tx Rm on Rm , then Φζ (x, −t) = −Φζ (x, t) for all x and t, so the result follows from the relationship between the index and the determinant of the derivative. 15.4 Dynamic Stability If an equilibrium of a dynamical system is perturbed, does the system necessarily return to the equilibrium, or can it wander far away from the starting point? Such questions are of obvious importance for physical systems. In economics the notion of dynamic adjustment to equilibrium is problematic, because if the dynamics of adjustment are understood by the agents in the model, they will usually not adjust their strategies in the predicted way. Nonetheless economists would generally agree that an equilibrium may or may not be empirically plausible according to whether there are some “natural” or “reasonable” dynamics for which it is stable. In this section we study a basic stability notion, and show that a sufficient condition for it is the existence of a Lyapunov function. As before we work with a locally Lipschitz vector field ζ on a C r manifold M where r ≥ 2. Let W be the flow domain of ζ, and let Φ be the flow. 221 15.4. DYNAMIC STABILITY One of the earliest and most useful tools for understanding stability was introduced by Lyapunov toward the end of the 19th century. A function f : M → R is ζ-differentiable if the ζ-derivative ζf (p) = d f (Φ(p, t))|t=0 dt is defined for every p ∈ M. A continuous function L : M → [0, ∞) is a Lyapunov function for A ⊂ M if: (a) L−1 (0) = A; (b) L is ζ-differentiable with ζL(p) < 0 for all p ∈ M \ A; (c) for every neighborhood U of A there is an ε > 0 such that L−1 ([0, ε]) ⊂ U. The existence of a Lyapunov function implies quite a bit. A set A ⊂ M is invariant if A × [0, ∞) ⊂ W and Φ(p, t) ∈ A for all p ∈ A and t ≥ 0. The ω-limit set of p ∈ M is \ { Φ(p, t) : t ≥ t0 }. 
t0 ≥0 The domain of attraction of A is D(A) = { p ∈ M : the ω-limit set of p is nonempty and contained in A }. A set A ⊂ M is asymptotically stable if: (a) A is compact; (b) A is invariant; (c) D(A) is a neighborhood of A; (d) for every neighborhood Ũ of A there is a neighborhood U such that Φ(p, t) ∈ Ũ for all p ∈ U and t ≥ 0. Asymptotic stability is a local property, in the sense A is asymptotically stable if and only if it is asymptotically stable for the restriction of ζ to any given neighborhood of A; this is mostly automatic, but to verify (c) for the restriction one needs to combine (c) and (d) for the given vector field. Theorem 15.4.1 (Lyapunov (1992)). If A is compact and L is a Lyapunov function for A, then A is asymptotically stable. Proof. If L(Φ(p, t)) > 0 for some p ∈ A and t > 0, the intermediate value theorem would give a t′ ∈ [0, t] with 0< d d L(Φ(p, t))|t=t′ = L(Φ(Φ(p, t′ ), t))|t=0 , dt dt contrary to (b). Therefore A = L−1 (0) is invariant. 222 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA Let K be a compact neighborhood of A, choose ε > 0 such that L−1 ([0, ε]) ⊂ K, and consider a point p ∈ L−1 ([0, ε]). Since L(Φ(p, t)) is weakly decreasing, { Φ(p, t) : t ≥ 0 } ⊂ L−1 ([0, ε]) ⊂ K, so the ω-limit of p is a subset of K. Since K is compact and the ω-limit of is by definition the intersection of nested nonempty closed subsets, the ω-limit is nonempty. To show that the ω-limit of p is contained in A, consider some q ∈ / A, and fix a t > 0. Since L is continuous there are neighborhoods of V0 of q and Vt of Φ(q, t) such that L(q ′ ) > L(q ′′ ) for all q ′ ∈ V0 and q ′′ ∈ Vt . Since Φ is continuous, we can choose V0 small enough that Φ(q ′ , t) ∈ Vt for all q ′ ∈ V0 . The significance of this is that if the trajectory of p ever entered V0 it would continue to Vt , and it could not then return to V0 because L(Φ(p, t)) is a decreasing function of t, so q is not in the ω-limit of p. We have shown that L−1 ([0, ε)) ⊂ D(A), so D(A) is a neighborhood of A. Now consider an neighborhood Ũ of A. We want a neighborhood U such that Φ(U, t) ⊂ Ũ for all t ≥ 0, and it suffices to set U = L−1 ([0, δ)) for some δ > 0 such that L−1 ([0, δ)) ⊂ Ũ . If there was no such δ there would be a sequence {pn } in L−1 ([0, ε]) \ Ũ with L(pn ) → 0. Since this sequence would be contained in K, it would have limit points, which would be in A, by (a), but also in K \ Ũ . Of course this is impossible. 15.5 The Converse Lyapunov Problem A converse Lyapunov theorem is a result asserting that if a set is asymptotically stable, then there is a Lyapunov function defined on a neighborhood of the set. The history of converse Lyapunov theorems is sketched by Nadzieja (1990). Briefly, after several partial results, the problem was completely solved by Wilson (1969), who showed that one could require the Lyapunov function to be C ∞ when the given manifold is C ∞ . Since we do not need such a refined result, we will follow the simpler treatment given by Nadzieja (1990). Let M, ζ, W , and Φ be as in the last section. This section’s goal is: Theorem 15.5.1. If A is asymptotically stable, then (after replacing M with a suitable neighborhood of A) there is a Lyapunov function for A. The construction requires that the vector field be complete, and that certain other conditions hold, so we begin by explaining how the desired situation can be achieved on some neighborhood of A. Let U ⊂ D(A) be an open neighborhood of A whose closure (as a subset of Rk ) is contained in M. 
For any metric on M (e.g., the one induced by the inclusion in Rk ) the infimum of the distance from a point p ∈ U to a point in M \ U is a positive continuous function on U, so Proposition 10.2.7 implies that there is a C r function α : U → (0, ∞) such that for each p ∈ U, 1/α(p) is less than the distance from p to any point in M \ U. Let M̃ be the graph of α: M̃ = { (p, α(p)) : p ∈ U } ⊂ U × R ⊂ Rk+1 . The closed subsets of M̃ are the subsets that are closed in Rk+1 : 15.5. THE CONVERSE LYAPUNOV PROBLEM 223 Lemma 15.5.2. M̃ is a closed subset of Rk+1. Proof. Suppose a sequence {(pn , hn )} in M̃ converges to (p, h). Then p ∈ M, and it must be in U because otherwise hn = α(pn ) → ∞. Continuity implies that h = α(p), so (p, h) ∈ M̃ . Using the map IdM × α : U → M̃ , we defined a transformed vector field: ζ̃(p,α(p)) = D(IdM × h)(p)ζp . Since IdM × α is C r , Lemma 15.2.4 implies that ζ̃ is a locally Lipschitz vector field on M̃ . Let Φ̃ be the flow of ζ̃. Using the chain rule, it is easy to show that Φ̃((p, h), t) = Φ(p, t), α(Φ(p, t)) for all (p, t) in the flow domain of ζ. Since asymptotic stability is a local property, à = { (p, α(p)) : p ∈ A } is asymptotically stable for ζ̃. We now wish to slow the dynamics, to prevent trajectories from going to ∞ in finite time. Another application of Proposition 10.2.7 gives a C r function β : M̃ → (0, ∞) with β(p, h) < 1/kζ̃(p, h)k for all (p, h) ∈ M̃ . Define a vector field ζ̂ on M̃ by setting ζ̂(p, h) = β(p, h)ζ̃(p, h), and let Φ̂ be the flow of ζ̂. For (p, t) such that (p, α(p), t) is in the flow domain of ζ̂ let Z t B(p, t) = β(Φ̂(p, α(p)), s) ds. 0 The chain rule computation i d h Φ̃ p, α(p), B(p, t) = β(Φ̂(p, α(p), t))ζ̃Φ̃(p,α(p),B(p,t)) dt shows that t 7→ Φ̃(p, α(p), B(p, t)) is a trajectory for ζ̂, so Φ̂(p, α(p), t) = Φ̃ p, α(p), B(p, t) . This has two important consequences. The first is that the speed of a trajectory of ζ̂ is never greater than one, so the final component of Φ̂(p, α(p), t) cannot go to ∞ in finite (forward or backward) time. In view of our remarks at the end of Section 15.2, ζ̂ is complete. The second point is that since β is bounded below on any compact set, if { Φ̃(p, α(p), t) : t ≥ 0 } is bounded, then Φ̂(p, ·) traverses the entire trajectory of ζ̃ beginning at (p, α(p)). It follows that à is asymptotically stable for ζ̂. Note that if L̂ is a Lyapunov function for ζ̂ and Ã, then it is also a Lyapunov function for ζ̃ and Ã, and setting L(p) = L̂(p, α(p)) gives a Lyapunov function for ζ|U and A. Therefore it suffices to establish the claim with M and ζ replaced by M̃ and ζ̂. The upshot of the discussion to this point is as follows. We may assume that ζ is complete, and that the domain of attraction of A is all of M. We may also assume that M has a metric d that is complete—that is, any Cauchy sequence converges—so a sequence {pn } that is eventually outside of each compact subset of M diverges in the sense that d(p, pn ) → ∞ for any p ∈ M. The next four results are technical preparations for the main argument. 224 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA Lemma 15.5.3. Let K be a compact subset of M. For any neighborhood Ũ of A there is a neighborhood V of K and a number T such that Φ(p, t) ∈ Ũ whenever p ∈ V and t ≥ T . Proof. The asymptotic stability of A implies that A has a neighborhood U such that Φ(U, t) ⊂ Ũ for all t ≥ 0. The domain of attraction of A is all of M, so for each p ∈ K there is tp such that Φ(p, tp ) ∈ U, and the continuity of Φ implies that Φ(p′ , tp ) ∈ U for all p′ in some neighborhood of p. 
Since K is compact, it has a finite open cover V1 , . . . , Vk such that for each i there is some ti such that Φ(p, t) ∈ U whenever p ∈ Vi and t ≥ ti . Set V = V1 ∪ . . . ∪ Vk and T = max{t1 , . . . , tk }. Lemma 15.5.4. If {(pn , tn )} is a sequence in W = M × R such that the closure of {pn } does not intersect A, and {Φ(pn , tn )} is bounded, then the sequence {tn } is bounded below. Proof. Let Ũ be a neighborhood of A that does not contain any element of {pn } Since {Φ(pn , tn )} is bounded, it is contained in a compact set, so the last result gives a T such that Φ(Φ(pn , tn ), t) = Φ(pn , tn + t) ∈ Ũ for all t ≥ T . For all n we have tn > −T because otherwise pn = Φ(pn , 0) ∈ Ũ. Lemma 15.5.5. For all p ∈ M \ A, d(Φ(p, t), p) → ∞ as t → −∞. Proof. Otherwise there is a p and sequence {tn } with tn → −∞ such that {Φ(p, tn )} is bounded and consequently contained in a compact set. The last result implies that this is impossible. Let ℓ : M → [0, ∞) be the function ℓ(p) = inf d(Φ(p, t), A). t≤0 If p ∈ A, then ℓ(p) = 0. If p ∈ / A, then Φ(p, t) ∈ / A for all t ≤ 0 because t is invariant, and the last result implies that ℓ(p) > 0. Lemma 15.5.6. ℓ is continuous. Proof. Since ℓ(p) ≤ d(p, A), ℓ is continuous at points in A. Suppose that {pn } is a sequence converging to a point p ∈ / A. The last result implies that there are t ≤ 0 and tn ≤ 0 for each n such that ℓ(p) = d(Φ(p, t), A) and ℓ(pn ) = d(Φ(pn , tn ), A). The continuity of Φ and d gives lim sup ℓ(pn ) ≤ lim sup d(Φ(pn , t), pn ) = ℓ(p). n n On the other hand d(Φ(pn , tn ), A) ≤ d(pn , A), so the sequence Φ(pn , tn ) is bounded, and Lemma 15.5.4 implies that {tn } is bounded below. Passing to a subsequence, we may suppose that tn → t′ , so that ℓ(p) ≤ d(Φ(p, t′ ), A) = lim d(Φ(pn , tn ), A) = lim inf ℓ(pn ). n Thus ℓ(pn ) → ℓ(p). n 225 15.5. THE CONVERSE LYAPUNOV PROBLEM We are now ready for the main construction. Let L : M → [0, ∞) be defined by Z ∞ L(p) = ℓ(Φ(p, s)) exp(−s) ds. 0 The rest of the argument verifies that L is, in fact, a Lyapunov function. Since A is invariant, L(p) = 0 if p ∈ A. If p ∈ / A, then L(p) > 0 because ℓ(p) > 0. To show that L is continuous at an arbitrary p ∈ M we observe that for any ε > 0 there is a T such that ℓ(Φ(p, T )) < ε/2. Since Φ is continuous we have ℓ(Φ(p′ , T )) < ε/2 and |ℓ(Φ(p′ , t)) − ℓ(Φ(p, t))| < ε/2 for all p′ in some neighborhood of p and all t ∈ [0, T ], so that Z T ′ ℓ(Φ(p′ , s)) − ℓ(Φ(p, s)) exp(−s) ds − |L(p ) − L(p)| ≤ 0 Z ∞ T ′ ′ ℓ(Φ(p , s)) exp(−s) ds − Z ∞ T ℓ(Φ(p, s)) exp(−s) ds < ε for all p in this neighborhood. To show that L is ζ-differentiable, and to compute its derivative, we observe that Z ∞ Z ∞ L(Φ(p, t)) = Φ(p, t + s) exp(−s) ds = exp(t) Φ(p, s) exp(−s) ds, 0 t so that L(Φ(p, t)) − L(p) = (exp(t) − 1) Z ∞ t Φ(p, s) exp(−s) ds − Dividing by t and taking the limit as t → 0 gives Z t ℓ(Φ(p, t)) exp(−s) ds. 0 ζL(p) = L(p) − ℓ(p). Note that L(p) < ℓ(p) Z ∞ exp(s) ds = ℓ(p) 0 because ℓ(Φ(p, ·)) is weakly decreasing with limt→∞ ℓ(Φ(p, t)) = 0. Therefore ζL(p) < 0 when p ∈ / A. We need one more technical result. Lemma 15.5.7. If {(pn , tn )} is a sequence such that d(pn , A) → ∞ and there is a number T such that tn < T for all n, then d(Φ(pn , tn ), A) → ∞. Proof. Suppose not. After passing to a subsequence there is a B > 0 such that d(Φ(pn , tn ), A) < B for all n, so the sequence {Φ(pn , tn )} is contained in a compact set K. 
Since the domain of attraction of A is all of M, Φ is continuous, and K is compact, for any ε > 0 there is some S such that d(Φ(p, t), A) < ε whenever p ∈ K and t > S. The function p 7→ d(Φ(p, t), A) is continuous, hence bounded on the compact set K × [−T, S], so it is bounded on all of K × [−T, ∞). But this is impossible because −tn > −T and d(Φ(Φ(pn , tn ), −tn ), A) = d(pn , A) → ∞. 226 CHAPTER 15. VECTOR FIELDS AND THEIR EQUILIBRIA It remains to show that if U is open and contains A, then there is an ε > 0 such that L−1 ([0, ε]) ⊂ U. The alternative is that there is some sequence {pn } in M \ U with L(pn ) → 0. Since L is continuous and positive on M \ U, the sequence must eventually be outside any compact set. For each n we can choose tn ≤ 1 such that ℓ(Φ(pn , 1)) = d(Φ(pn , tn ), A), and the last result implies that ℓ(Φ(pn , 1)) → ∞, so Z 1 Z 1 L(pn ) ≥ ℓ(Φ(pn , t)) exp(−t) dt ≥ ℓ(Φ(pn , t)) exp(−t) dt → ∞. 0 0 This contradiction completes the proof that L is a Lyapunov function, so the proof of Theorem 15.5.1 is complete. 15.6 A Necessary Condition for Stability This section establishes the relationship between asymptotic stability and the vector field index. Let M, ζ, and Φ be as before. If A is a compact set of equilibria for ζ that has a compact index admissible neighborhood C that contains no other equilibria of ζ, then ind(ζC ) is the same for all such C; we denote this common value of the index by indζ (A). Theorem 15.6.1. If A is an ANR that is asymptotically stable, then ind−ζ (A) = χ(A). Proof. From the last section we know that (after restricting to some neighborhood of A) there is a Lyapunov function L for ζ. For some ε > 0, Aε = L−1 ([0, ε]) is compact. Using the flow, it is not hard to show that Aε is a retract of Aε′ for some ε′ > ε, and that Aε′ is a neighborhood of Aε , so Aε is an ANR. For each t > 0, Φ(·, t)|Aε maps Aε to itself, and is homotopic to the identity, so χ(Aε ) = Λ(Φ(·, t)|Aε ) = (−1)m ind(ζ|Aε ). Since A is an ANR, there is a retraction r : C → A, where C is a neighborhood of A. By taking ε small we may insure that Aε ⊂ C, and we may then replace C with ε, so we may assume the domain of r is actually Aε . If i : A → C is the inclusion, then Commutativity gives χ(A) = Λ(r ◦ i) = Λ(i ◦ r) = Λ(r), so it suffices to show that if t > 0, then Λ(Φ(·, t)|C ) = Λ(r). Let W ⊂ M × M be a neighborhood of the diagonal for which there is “convex combination” function c : W ×[0, 1] → M as per Proposition 10.7.9. We claim that if T is sufficiently large, then there is an index admissible homotopy h : Aε ×[0, 1] → Aε between IdAε and r given by 0 ≤ t ≤ 31 , Φ(p, 3tT ), h(p, t) = c((Φ(p, T ), r(Φ(p, T ))), 3(t − 13 )), 31 ≤ t ≤ 23 , 2 ≤ t ≤ 1. r(Φ(p, 3(1 − t)T )), 3 15.6. A NECESSARY CONDITION FOR STABILITY 227 This works because there is some neighborhood U of A such that c((p, r(p)), t) is defined and in the interior of Aε for all p ∈ U and all 0 ≤ t ≤ 1, and Φ(Aε , T ) ⊂ U if T is sufficiently large. The following special case is a prominent result in the theory of dynamical systems. Corollary 15.6.2 (Krasnosel’ski and Zabreiko (1984)). If {p0 } is asymptotically stable, then ind−ζ ({p0 }) = 1. Physical equilibrium concepts are usually rest points of explicit dynamical systems, for which the notion of stability is easily understood. For economic models, dynamic adjustment to equilibrium is a concept that goes back to Walras’ notion of tatonnement, but such adjustment is conceptually problematic. 
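Theorem 15.6.1 can be illustrated numerically in a case where χ(A) ≠ 1 before the discussion of adjustment dynamics continues. For the planar field with r′ = r(1 − r²) and θ′ = 1, the unit circle A is an asymptotically stable limit cycle, and χ(S¹) = 0. The sketch assumes numpy and uses two standard planar facts that are not part of the text's argument: the index over a disk is the winding number of the field along its boundary, so the index over an annulus containing A is a difference of two winding numbers, and negating a planar field does not change winding numbers, so the computation gives the index of −ζ as well as of ζ.

```python
import numpy as np

def zeta(x, y):
    # in polar coordinates: r' = r(1 - r^2), theta' = 1; the unit circle is a stable limit cycle
    r2 = x**2 + y**2
    return x * (1.0 - r2) - y, y * (1.0 - r2) + x

def winding(radius, n=4001):
    # winding number of zeta along the circle of the given radius (no equilibria lie on it)
    t = np.linspace(0.0, 2.0 * np.pi, n)
    vx, vy = zeta(radius * np.cos(t), radius * np.sin(t))
    a = np.unwrap(np.arctan2(vy, vx))
    return int(round((a[-1] - a[0]) / (2.0 * np.pi)))

# index of -zeta over the annulus 1/2 <= r <= 3/2, which contains A and no equilibria
print(winding(1.5) - winding(0.5))   # 0, the Euler characteristic of the circle
```

By contrast, for the asymptotically stable singleton of Corollary 15.6.2 the same computation over a small circle around the rest point would give 1.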
If there is gradual adjustment of prices, or gradual adjustment of mixed strategies, and the agents understand and expect this, then instead of conforming to such dynamics the agents will exploit and undermine them. For this reason there are, to a rough approximation, no accepted theoretical foundations for a prediction that an economic or strategic equilibrium is dynamically stable.

Paul Samuelson (1941, 1942, 1947) advocated a correspondence principle, according to which dynamical stability of an equilibrium has implications for the qualitative properties of the equilibrium's comparative statics. Samuelson's writings consider many particular models, but he never formulated the correspondence principle as a precise and general theorem, and the economics profession's understanding of it has languished, being largely restricted to 1-dimensional cases; see Echenique (2008) for a succinct summary. However, it is possible to pass quickly from the Krasnosel'ski-Zabreiko theorem to a general formulation of the correspondence principle, as we now explain.

Let U ⊂ R^m be open, let P be a space of parameter values that is an open subset of R^n, and let z : U × P → R^m be a C^1 function that we understand as a parameterized vector field. (Working in a Euclidean setting allows us to avoid discussing differentiation of vector fields on manifolds, which is a very substantial topic.) For (x, α) ∈ U × P let ∂_x z(x, α) and ∂_α z(x, α) denote the matrices of partial derivatives of the components of z with respect to the components of x and α respectively. We consider a point (x_0, α_0) with z(x_0, α_0) = 0 such that ∂_x z(x_0, α_0) is nonsingular. The implicit function theorem implies that there is a neighborhood V of α_0 and a C^1 function σ : V → U such that σ(α_0) = x_0 and z(σ(α), α) = 0 for all α ∈ V. The method of comparative statics is to differentiate this equation with respect to α, using the chain rule, and then rearrange, arriving at

(dσ/dα)(α_0) = −∂_x z(x_0, α_0)^{−1} · ∂_α z(x_0, α_0).

The last result implies that if {x_0} is asymptotically stable for the vector field z(·, α_0), then the determinant of −∂_x z(x_0, α_0) is positive, as is the determinant of its inverse. When m = 1 this says that (dσ/dα)(α_0) is a positive scalar multiple of ∂_α z(x_0, α_0). When m > 1 it says that the transformation mapping ∂_α z(x_0, α_0) to (dσ/dα)(α_0) is orientation preserving, which is still a qualitative property of the comparative statics, though of course its intuitive and conceptual significance is less immediate. (It is sometimes argued, e.g., pp. 320-1 of Arrow and Hahn (1971), that the correspondence principle has no consequences beyond the 1-dimensional case, but this does not seem quite right. In higher dimensions it still provides a qualitative restriction on the comparative statics. It is true that the restriction provides only one bit of information, so by itself it is unlikely to be useful, but one should still expect the correspondence principle to have some useful consequences in particular models, in combination with various auxiliary hypotheses.)

We conclude with some comments on the status of the correspondence principle as a foundational element of economic analysis. First of all, the fact that our current understanding of adjustment to equilibrium gives little reason to expect an equilibrium to be stable is of limited relevance, because in the correspondence principle stability is an hypothesis, not a conclusion.
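Before the discussion of the status of stability as a hypothesis continues, here is a minimal numerical check of the comparative statics formula derived above. The parameterized field z(x, α) = α − x³ is purely hypothetical and one-dimensional; at (x_0, α_0) = (1, 1) the equilibrium is asymptotically stable for the dynamics x′ = z(x, α_0), the implicit function σ is α ↦ α^{1/3}, and the formula predicts σ′(α_0) = 1/3, a positive multiple of ∂_α z, as the correspondence principle requires. The sketch assumes numpy.

```python
import numpy as np

# hypothetical parameterized field: z(x, alpha) = alpha - x^3
z     = lambda x, a: a - x**3
dz_dx = lambda x, a: -3.0 * x**2
dz_da = lambda x, a: 1.0

x0, a0 = 1.0, 1.0                                   # z(x0, a0) = 0 and dz_dx(x0, a0) = -3 is nonsingular

predicted = -dz_da(x0, a0) / dz_dx(x0, a0)          # the comparative statics formula: 1/3

# finite-difference check: solve z(x, a0 + h) = 0 by Newton's method and compare slopes
h, x = 1e-6, x0
for _ in range(50):
    x -= z(x, a0 + h) / dz_dx(x, a0 + h)
print("predicted slope:", predicted, " finite difference:", (x - x0) / h)

# asymptotic stability of x0 forces det(-dz_dx) > 0, the sign restriction of the principle
print("det(-dz_dx) =", -dz_dx(x0, a0))              # 3 > 0
```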
That is, we observe an equilibrium that persists over time, and is consequently stable with respect to whatever mechanism brings about reequilibration after small disturbances. This is given. In general equilibrium theory and noncooperative game theory, and in a multitude of particular economic models, equilibrium is implicitly defined as a rest point of some process according to which, in response to a failure of an equilibrium condition, some agent would change her behavior in pursuit of higher utility. Such a definition brings with it some sense of “natural” dynamics, e.g., the various prices each adjusting in the direction of excess demand, or each agent adjusting her mixed strategy in some direction that would be improving if others were not also adjusting. The Krasnosel’ski-Zabreiko theorem will typically imply that if ind−z(·,α0 ) ({x0 }) 6= 1, then x0 is not stable for any dynamic process that is natural in this sense. Logically, this leaves open the possibility that the actual dynamic process is unnatural, which seemingly requires some sort of coordination on the part of the various agents, or perhaps that it is much more complicated than we are imagining. Almost certainly most economists would regard these possibilities as far fetched. In this sense the correspondence principle is not less reliable or well founded than other basic principles of our imprecise and uncertain science. Bibliography Alexander, J. W. (1924). An example of a simply-connected surface bounding a region which is not simply-connected. Proceedings of the National Academy of Sciences, 10:8–10. Arora, S. and Boaz, B. (2007). Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge. Arrow, K., Block, H. D., and Hurwicz, L. (1959). On the stability of the competitive equilibrium, II. Econometrica, 27:82–109. Arrow, K. and Debreu, G. (1954). Existence of an equilibrium for a competitive economy. Econometrica, 22:265–290. Arrow, K. and Hurwicz, L. (1958). On the stability of the competitive equilibrium, I. Econometrica, 26:522–552. Arrow, K. J. and Hahn, F. H. (1971). General Competitive Analysis. Holden Day, San Francisco. Bollobás, B. (1979). Graph Theory: an Introductory Course. Springer-Verlag, New York. Border, K. C. (1985). Fixed point theorems with applications to economics and game theory. Cambridge University Press, Cambridge. Brouwer, L. E. J. (1912). Uber Abbildung von Mannigfaltikeiten. Mathematiche Annalen, 71:97–115. Browder, F. (1948). The Topological Fixed Point Theory and its Applications to Functional Analysis. PhD thesis, Princeton University. Brown, R. (1971). The Lefschetz Fixed Point Theorem. Scott Foresman and Co., Glenview, IL. Chen, X. and Deng, X. (2006a). On the complexity of 2D discrete fixed point problem. In Proceedings of the 33th International Colloquium on Automata, Languages, and Programming, pages 489–500. Chen, X. and Deng, X. (2006b). Settling the complexity of two-player Nash equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 261–272. 229 230 BIBLIOGRAPHY Daskalakis, C., Goldberg, P., and Papadimitriou, C. (2006). The complexity of computing a Nash equilibrium. In Proceedings of the 38th ACM Symposium on the Theory of Computing. Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium. Wiley & Sons, inc., New York. Demichelis, S. and Germano, F. (2000). On the indices of zeros of Nash fields. Journal of Economic Theory, 94:192–217. Demichelis, S. and Ritzberger, K. (2003). 
From evolutionary to strategic stability. Journal of Economic Theory, 113:51–75. Dierker, E. (1972). Two remarks on the number of equilibria of an economy. Econometrica, 40:951–953. Dugundji, J. (1951). An extension of tietze’s theorem. Pacific Journal of Mathematics, 1:353–367. Dugundji, J. and Granas, A. (2003). Fixed Point Theory. Springer-Verlag, New York. Echenique, F. (2008). The correspondence principle. In Durlauf, S. and Blume, L., editors, The New Palgrave Dictionary of Economics (Second Edition). Palgrave Macmillan, New York. Eilenberg, S. and Montgomery, D. (1946). Fixed-point theorems for multivalued transformations. American Journal of Mathematics, 68:214–222. Fan, K. (1952). Fixed point and minimax theorems in locally convex linear spaces. Proceedings of the National Academy of Sciences, 38:121–126. Federer, H. (1969). Geometric Measure Theory. Springer, New York. Fort, M. (1950). Essential and nonessential fixed points. American Journal of Mathematics, 72:315–322. Glicksberg, I. (1952). A further generalization of the Kakutani fixed point theorem with applications to Nash equilibrium. Proceedings of the American Mathematical Society, 3:170–174. Goldberg, P., Papadimitriou, C., and Savani, R. (2011). The complexity of the homotopy method, equilibrium selection, and Lemke-Howson solutions. In Proceedings of the 52nd Annual IEEE Symposium on the Foundations of Computer Science. Govindan, S. and Wilson, R. (2008). Nash equilibrium, refinements of. In Durlauf, S. and Blume, L., editors, The New Palgrave Dictionary of Economics (Second Edition). Palgrave Macmillan, New York. BIBLIOGRAPHY 231 Guillemin, V. and Pollack, A. (1974). Differential Topology. Springer-Verlag, New York. Hart, O. and Kuhn, H. (1975). A proof of the existence of equilibrium without the free disposal assumption. J. of Mathematical Economics, 2:335–343. Hauk, E. and Hurkens, S. (2002). On forward induction and evolutionary and strategic stability. Journal of Economic Theory, 106:66–90. Hirsch, M. (1976). Differential Topology. Springer-Verlag, New York. Hirsch, M., Papadimitriou, C., and Vavasis, S. (1989). Exponential lower bounds for finding Brouwer fixed points. Journal of Complexity, 5:379–416. Hirsch, M. and Smale, S. (1974). Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, Orlando. Hofbauer, J. (1990). An index theorem for dissipative semiflows. Rocky Mountain Journal of Mathematics, 20:1017–1031. Hopf, H. (1928). A new proof of the Lefschetz formula on invariant points. Proceedings of the National Academy of Sciences, USA, 14:149–153. Jacobson, N. (1953). Lectures in Abstract Algebra. D. van Norstrand Inc., Princeton. Jiang, J.-h. (1963). Essential component of the set of fixed points of the multivalued mappings and its application to the theory of games. Scientia Sinica, 12:951–964. Jordan, J. S. (1987). The informational requirement of local stability in decentralized allocation mechanisms. In Groves, T., Radner, R., and Reiter, S., editors, Information, Incentives, and Economic Mechanisms: Essays in Honor of Leonid Hurwicz, pages 183–212. University of Minnesota Press, Minneapolis. Kakutani, S. (1941). A generalization of Brouwer’s fixed point theorem. Duke Mathematical Journal, 8:457–459. Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. In Proceedings of the 16th ACM Symposium on Theory of Computing, STOC ’84, pages 302–311, NewYork, NY, USA. ACM. Kelley, J. (1955). General Topology. Springer Verlag, New York. Khachian, L. (1979). 
Khachian, L. (1979). A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 20:191–194.
Kinoshita, S. (1952). On essential components of the set of fixed points. Osaka Mathematical Journal, 4:19–22.
Kinoshita, S. (1953). On some contractible continua without the fixed point property. Fundamenta Mathematicae, 40:96–98.
Klee, V. and Minty, G. (1972). How good is the simplex algorithm? In Shisha, O., editor, Inequalities III. Academic Press, New York.
Kohlberg, E. and Mertens, J.-F. (1986). On the strategic stability of equilibria. Econometrica, 54:1003–1038.
Krasnosel’ski, M. A. and Zabreiko, P. P. (1984). Geometric Methods of Nonlinear Analysis. Springer-Verlag, Berlin.
Kreps, D. and Wilson, R. (1982). Sequential equilibrium. Econometrica, 50:863–894.
Kuhn, H. and MacKinnon, J. (1975). Sandwich method for finding fixed points. Journal of Optimization Theory and Applications, 17:189–204.
Kuratowski, K. (1935). Quelques problèmes concernant les espaces métriques non-séparables. Fundamenta Mathematicae, 25:534–545.
Lefschetz, S. (1923). Continuous transformations of manifolds. Proceedings of the National Academy of Sciences, 9:90–93.
Lefschetz, S. (1926). Intersections and transformations of complexes and manifolds. Transactions of the American Mathematical Society, 28:1–49.
Lefschetz, S. (1927). Manifolds with a boundary and their transformations. Transactions of the American Mathematical Society, 29:429–462.
Lyapunov, A. (1992). The General Problem of the Stability of Motion. Taylor and Francis, London.
Mas-Colell, A. (1974). A note on a theorem of F. Browder. Mathematical Programming, 6:229–233.
McLennan, A. (1991). Approximation of contractible valued correspondences by functions. Journal of Mathematical Economics, 20:591–598.
McLennan, A. and Tourky, R. (2010). Imitation games and computation. Games and Economic Behavior, 70:4–11.
Merrill, O. (1972). Applications and Extensions of an Algorithm that Computes Fixed Points of Certain Upper Semi-continuous Point to Set Mappings. PhD thesis, University of Michigan, Ann Arbor, MI.
Mertens, J.-F. (1989). Stable equilibria—a reformulation, Part I: Definition and basic properties. Mathematics of Operations Research, 14:575–625.
Mertens, J.-F. (1991). Stable equilibria—a reformulation, Part II: Discussion of the definition and further results. Mathematics of Operations Research, 16:694–753.
Michael, E. (1951). Topologies on spaces of subsets. Transactions of the American Mathematical Society, 71:152–182.
Milnor, J. (1965). Topology from the Differentiable Viewpoint. University Press of Virginia, Charlottesville.
Morgan, F. (1988). Geometric Measure Theory: A Beginner’s Guide. Academic Press, New York.
Myerson, R. (1978). Refinements of the Nash equilibrium concept. International Journal of Game Theory, 7:73–80.
Nadzieja, T. (1990). Construction of a smooth Lyapunov function for an asymptotically stable set. Czechoslovak Mathematical Journal, 40:195–199.
Nash, J. (1950). Non-cooperative Games. PhD thesis, Mathematics Department, Princeton University.
Nash, J. (1951). Non-cooperative games. Annals of Mathematics, 54:286–295.
Papadimitriou, C. H. (1994a). Computational Complexity. Addison Wesley Longman, New York.
Papadimitriou, C. H. (1994b). On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48:498–532.
Ritzberger, K. (1994). The theory of normal form games from the differentiable viewpoint. International Journal of Game Theory, 23:201–236.
Rudin, M. E. (1969). A new proof that metric spaces are paracompact. Proceedings of the American Mathematical Society, 20:603.
Saari, D. G. (1985). Iterative price mechanisms. Econometrica, 53:1117–1133.
Saari, D. G. and Simon, C. P. (1978). Effective price mechanisms. Econometrica, 46:1097–1125.
Samuelson, P. (1947). Foundations of Economic Analysis. Harvard University Press.
Samuelson, P. A. (1941). The stability of equilibrium: Comparative statics and dynamics. Econometrica, 9:97–120.
Samuelson, P. A. (1942). The stability of equilibrium: Linear and nonlinear systems. Econometrica, 10:1–25.
Savani, R. and von Stengel, B. (2006). Hard-to-solve bimatrix games. Econometrica, 74:397–429.
Scarf, H. (1960). Some examples of global instability of the competitive equilibrium. International Economic Review, 1:157–172.
Selten, R. (1975). Re-examination of the perfectness concept for equilibrium points of extensive games. International Journal of Game Theory, 4:25–55.
Shapley, L. S. (1974). A note on the Lemke-Howson algorithm. Mathematical Programming Study, 1:175–189.
Spivak, M. (1965). Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Benjamin, New York.
Spivak, M. (1979). A Comprehensive Introduction to Differential Geometry, volume 1. Publish or Perish, 2nd edition.
Sternberg, S. (1983). Lectures on Differential Geometry. Chelsea Publishing Company, New York, 2nd edition.
Stone, A. H. (1948). Paracompactness and product spaces. Bulletin of the American Mathematical Society, 54:977–982.
van der Laan, G. and Talman, A. (1979). A restart algorithm for computing fixed points without an extra dimension. Mathematical Programming, 17:74–84.
Vietoris, L. (1923). Bereiche zweiter Ordnung. Monatshefte für Mathematik und Physik, 33:49–62.
von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–320.
Williams, S. R. (1985). Necessary and sufficient conditions for the existence of a locally stable message process. Journal of Economic Theory, 35:127–154.
Wilson, F. W. (1969). Smoothing derivatives of functions and applications. Transactions of the American Mathematical Society, 139:413–428.
Wojdyslawski, M. (1939). Rétractes absolus et hyperspaces des continus. Fundamenta Mathematicae, 32:184–192.
Ziegler, G. M. (1995). Lectures on Polytopes. Springer-Verlag, New York.

Index

C^r, 126
C^r ∂-embedding, 144
C^r ∂-immersion, 144
C^r atlas, 10
C^r function, 127
C^r manifold, 10, 131
C^r submanifold, 11, 136
Q-robust set, 113
  minimal, 114
  minimal connected, 114
T_1-space, 66
ω-limit set, 19, 221
∂-parameterization, 144
ε-domination, 17, 104
ε-homotopy, 17, 104
EXP, 61
FNP, 63
NP, 61
PLS (polynomial local search), 64
PPAD, 64
PPA, 65
PPP (polynomial pigeonhole principle), 64
PSPACE, 61
P, 61
TFNP, 63
Clique, 61
EOTL (end of the line), 64
OEOTL (other end of the line), 65
absolute neighborhood retract, 6, 100
absolute retract, 6, 102
acyclic, 34
affine
  combination, 23
  dependence, 23
  hull, 24
  independence, 23
  subspace, 24
Alexander horned sphere, 131
algorithm, 60
ambient space, 10, 133
annulus, 144
antipodal function, 203
antipodal points, 200
approximates, 189
Arrow, Kenneth, 2
asymptotic stability, 20, 221
atlas, 10, 131
axiom of choice, 36
balanced set, 116
Banach space, 90
barycenter, 32
base of a topology, 67
bijection, 6
Bing, R. H., 196
Border, Kim, i
Borsuk, Karol, 18
Borsuk-Ulam theorem, 18, 204
bounding hyperplane, 24
Brouwer’s fixed point theorem, 3
Brouwer, Luitzen, 2
Brown, Robert, i
category, 135
Cauchy sequence, 90
Cauchy-Schwarz inequality, 91
certificate, 61
Church-Turing thesis, 60
closed function, 74
codimension, 24, 136
commutativity configuration, 16, 179
compact-open topology, 83
complete invariant, 194
complete metric space, 90
complete vector field, 216
completely metrizable, 100
component of a graph, 34
computational problem, 60
  complete for a class, 62
  computable, 60
  decision, 61
  search, 61
connected
  graph, 34
  space, 8, 113, 165
continuous, 78
contractible, 5
contraction, 5
converse Lyapunov theorem, 20, 222
convex, 24
  combination, 24
  cone, 25
  hull, 24
coordinate chart, 10, 131
correspondence, 4, 77
  closed valued, 77
  compact valued, 4
  convex valued, 4, 77
  graph of, 77
  lower semicontinuous, 78
  upper semicontinuous, 77
correspondence principle, 19, 212
critical point, 139, 154
critical value, 139, 154
cycle, 34
Debreu, Gerard, 2
degree, 11, 33, 174
degree admissible
  function, 12, 14, 171
  homotopy, 12, 171
Dehn, Max, 149
Demichelis, Stefano, 19
derivative, 126, 134, 135
derivative along a vector field, 20
Descartes, René, 30
deterministic, 212
diameter, 32
diffeomorphism, 11, 133
diffeomorphism point, 136
differentiable, 126
differentiation along a vector field, 221
dimension
  of a polyhedron, 26
  of a polytopal complex, 30
  of an affine subspace, 24
directed graph, 64
discrete set, 131
domain of attraction, 19, 221
domination, 183
dual, 25
Dugundji, James, i, 93
edge, 27, 33
Eilenberg, Samuel, 7, 18
Eilenberg-Montgomery theorem, 196
embedding, 6, 131
endpoint, 33, 42
equilibrium, 19, 21
equilibrium of a vector field, 216
  regular, 218
essential
  fixed point, 7
  Nash equilibrium, 8
  set of fixed points, 8, 112
  set of Nash equilibria, 8
Euclidean neighborhood retract, 6, 99
Euler characteristic, 17, 20, 195
expected payoffs, 37
extension of an index, 183
extraneous solution, 44
extreme point, 29
face, 26
  proper, 27
facet, 27
family of sets
  locally finite, 85
  refinement of, 85
Federer, Herbert, 150
Fermat’s last theorem, 149
fixed point, 3, 4
fixed point property, 3, 6
flow, 19, 216
flow domain, 216
Fort, M. K., 107
four color theorem, 149
Freedman, Michael, 149
Fubini’s theorem, 150
functor, 135
general position, 41
general linear group, 165
Granas, Andrzej, i
graph, 4, 33
half-space, 24
Hauptvermutung, 196
Hausdorff distance, 70
Hausdorff measure zero, 156
Hausdorff space, 67
have the same orientation, 11, 164
Hawaiian earring, 33, 100
Heegaard, Poul, 149
Hilbert cube, 93
Hilbert space, 91
homology, 2, 3, 177, 196
homotopy, 5, 58–59
  class, 5
  extension property, 198
  invariant, 18, 197
  principle, 178
homotopy extension property, 103
Hopf’s theorem, 18, 197–198
Hopf, Heinz, 18, 196
hyperplane, 24
identity component, 165
immersion, 138
immersion point, 136
implicit function theorem, 127
index, 9, 15, 16, 177, 179
index admissible
  correspondence, 15, 177
  homotopy, 178
  vector field, 20
index base, 15, 177
index scope, 15, 178
inessential fixed point, 7
initial point, 27
injection, 6
inner product, 91
inner product space, 91
invariance of domain, 18, 207
invariant, 19
invariant set, 221
inverse function theorem, 127
isometry, 52
Kakutani, Shizuo, 4
Kinoshita, Shin’ichi, 6, 95, 108
labelling, 51
Lefschetz fixed point theorem, 17, 196
Lefschetz number, 17, 196
Lefschetz, Solomon, 17, 196
Lemke-Howson algorithm, 36–49, 62, 64
Lebesgue measure, 150
lineality space, 26
linear complementarity problem, 45
Lipschitz, 212
local diffeomorphism, 138
locally C^r, 130
locally closed set, 98
locally Lipschitz, 212
locally path connected space, 102, 142
lower semicontinuous, 78
Lyapunov function, 221
Lyapunov function for A ⊂ M, 20
Lyapunov theorem, 221
Lyapunov, Aleksandr, 20
manifold, 10
  C^r, 131
manifold with boundary, 13, 144
Mas-Colell, Andreu, 17, 115
maximal, 34
measure theory, 150
measure zero, 151, 157
mesh, 32
Milnor, John, 150, 196
Minkowski sum, 111
Moise, Edwin E., 196
Montgomery, Deane, 7, 18
Morse-Sard theorem, 157
moving frame, 165
multiplicative, 16, 180
Möbius, August Ferdinand, 149
narrowing of focus, 183
Nash equilibrium
  accessible, 43
  mixed, 37
  pure, 37
  refinements of, 8
Nash, John, 2
negatively oriented, 164, 168
negatively oriented relative to P, 169
neighborhood retract, 98
neighbors, 33
nerve of an open cover, 105
no retraction theorem, 100
norm, 90
normal bundle, 140
normal space, 67
normal vector, 24
normed space, 90
opposite orientation, 164, 168
oracle, 62
order of differentiability, 126
ordered basis, 164
orientable, 11, 168
orientation, 163–171
orientation preserving, 11, 53, 169
orientation reversing, 11, 53, 169
orientation reversing loop, 167
oriented ∂-manifold, 168
oriented intersection number, 169
oriented manifold, 11
oriented vector space, 11, 164
paracompact space, 85
parameterization, 10, 131
partition of unity, 86
  C^r, 128
path, 33, 165
path connected space, 164
payoff functions, 37
Perelman, Grigori, 149
Picard-Lindelöf theorem, 212, 215
pivot, 57
pivoting, 48
Poincaré conjecture, 149
Poincaré, Henri, 149
pointed cone, 26
pointed map, 113
pointed space, 113
polyhedral complex, 30
polyhedral subdivision, 30
polyhedron, 26
  minimal representation of, 27
  standard representation of, 27
polytopal complex, 30
polytopal subdivision, 30
polytope, 29
  simple, 46
positively oriented, 11, 164, 168
positively oriented relative to P, 169
predictor-corrector method, 58
prime factorization, 63
quadruple edge, 42
qualified, 42
vertex, 42
quotient topology, 80
Rado, Tibor, 149, 196
recession cone, 25
reduction, 62
regular fixed point, 9
regular point, 11, 139
regular space, 67
regular value, 11, 139
retract, 6, 97
retraction, 97
Ritzberger, Klaus, i, 19
Samuelson, Paul, 19
Sard’s theorem, 182, 218
Scarf algorithm, 56
Scarf, Herbert, 18
separable, 6
separable metric space, 91
separating hyperplane theorem, 24
set valued mapping, 4
simplex, 31
  accessible completely labelled, 58
  almost completely labelled, 54
  completely labelled, 52
simplicial complex, 31
  abstract, 32
  canonical realization, 32
simplicial subdivision, 31
simply connected, 149
slack variables, 44
slice of a set, 153
Smale, Stephen, 149
smooth, 11, 156
Sperner labelling, 51
star-shaped, 5
Steinitz, Ernst, 196
step size, 58
Sternberg, Shlomo, 150
strategy
  mixed, 37
  pure, 37
  totally mixed, 39
strategy profile
  mixed, 37
  pure, 37
strong topology, 83
strong upper topology, 78
subbase of a topology, 67
subcomplex, 30
submanifold, 11
  neat, 146
submersion, 138
submersion point, 136
subsumes, 183
support of a mixed strategy, 63
surjection, 6
tableau, 47
tangent bundle, 133
tangent space, 10, 133
tatonnement, 18
Tietze, Heinrich, 196
topological space, 66
topological vector space, 88
  locally convex, 89
torus, 10
trajectory, 212, 213
transition function, 10
translation invariant topology, 88
transversal, 139, 146, 158
tree, 34
triangulation, 31
tubular neighborhood theorem, 140
Turing machine, 59
two person game, 37
Ulam, Stanislaw, 18
uniformly locally contractible metric space, 101
upper semicontinuous, 4, 77
Urysohn’s lemma, 87
van Dyke, Walther, 149
vector bundle, 140
vector field, 19, 158, 213
  along a curve, 165
  index admissible, 216
vector field homotopy, 217
  index admissible, 217
vector field index, 216
vertex, 27, 32
vertices, 33
  connected, 34
Vietoris topology, 68
Vietoris, Leopold, 66
von Neumann, John, 4
Voronoi diagram, 30
walk, 33
weak topology, 83
weak upper topology, 80
well ordering, 85
well ordering theorem, 85
Whitney embedding theorems, 133
Whitney, Hassler, 10
wild embedding, 131
witness, 61
zero section, 140, 158