Linear Algebra I

Peter Philip*

Lecture Notes
Originally Created for the Class of Winter Semester 2018/2019 at LMU Munich
Includes Subsequent Corrections and Revisions†

March 5, 2023

* E-Mail: philip@math.lmu.de
† Resources used in the preparation of this text include [Kun80, Str08].

Contents

1 Foundations: Mathematical Logic and Set Theory
  1.1 Introductory Remarks
  1.2 Propositional Calculus
    1.2.1 Statements
    1.2.2 Logical Operators
    1.2.3 Rules
  1.3 Set Theory
  1.4 Predicate Calculus

2 Functions and Relations
  2.1 Functions
  2.2 Relations
    2.2.1 Definition and Properties
    2.2.2 Order Relations
    2.2.3 Equivalence Relations

3 Natural Numbers, Induction, and the Size of Sets
  3.1 Induction and Recursion
  3.2 Cardinality: The Size of Sets
    3.2.1 Definition and General Properties
    3.2.2 Finite Sets
    3.2.3 Countable Sets

4 Basic Algebraic Structures
  4.1 Magmas and Groups
  4.2 Rings and Fields

5 Vector Spaces
  5.1 Vector Spaces and Subspaces
  5.2 Linear Independence, Basis, Dimension

6 Linear Maps
  6.1 Basic Properties and Examples
  6.2 Quotient Spaces
  6.3 Vector Spaces of Linear Maps

7 Matrices
  7.1 Definition and Arithmetic
  7.2 Matrices as Representations of Linear Maps
  7.3 Rank and Transpose
  7.4 Special Types of Matrices
  7.5 Blockwise Matrix Multiplication

8 Linear Systems
  8.1 General Setting
  8.2 Abstract Solution Theory
  8.3 Finding Solutions
    8.3.1 Echelon Form, Back Substitution
    8.3.2 Elementary Row Operations, Variable Substitution
    8.3.3 Gaussian Elimination
  8.4 LU Decomposition
    8.4.1 Definition and Motivation
    8.4.2 Elementary Row Operations Via Matrix Multiplication
    8.4.3 Existence
    8.4.4 Algorithm

A Axiomatic Set Theory
  A.1 Motivation, Russell's Antinomy
  A.2 Set-Theoretic Formulas
  A.3 The Axioms of Zermelo-Fraenkel Set Theory
    A.3.1 Existence, Extensionality, Comprehension
    A.3.2 Classes
    A.3.3 Pairing, Union, Replacement
    A.3.4 Infinity, Ordinals, Natural Numbers
    A.3.5 Power Set
    A.3.6 Foundation
  A.4 The Axiom of Choice
  A.5 Cardinality

B Associativity and Commutativity
  B.1 Associativity
  B.2 Commutativity

C Groups

D Number Theory

E Vector Spaces
  E.1 Cardinality and Dimension
  E.2 Cartesian Products

F Source Code of Algorithms Implemented in R
  F.1 Linear Systems
    F.1.1 Tools
    F.1.2 Back and Forward Substitution
    F.1.3 Gaussian Elimination and LU Decomposition
    F.1.4 Examples

References

1 Foundations: Mathematical Logic and Set Theory

1.1 Introductory Remarks

The task of mathematics is to establish the truth or falsehood of (formalizable) statements using rigorous logic, and to provide methods for the solution of classes of (e.g. applied) problems, ideally including rigorous logical proofs verifying the validity of the methods (proofs that the method under consideration will, indeed, provide a correct solution).

The topic of this class is linear algebra, a subfield of the field of algebra. Algebra in the sense of this class is also called abstract algebra and constitutes the study of mathematical objects and rules for combining them (one sometimes says that these rules form a structure on the underlying set of objects).
An important task of algebra is the solution of equations, where the equations are formulated in some set of objects with a given structure. In linear algebra, the sets of objects are so-called vector spaces (the objects being called vectors) and the structure consists of an addition, assigning two vectors v, w their sum vector v + w, together with a scalar multiplication, assigning a scalar (from a so-called scalar field, more about this later) λ and a vector v the product vector λv. In linear algebra, one is especially interested in solving linear equations, i.e. equations of the form A(x) = b, where A is a linear function, i.e. a function satisfying A(λv + µw) = λA(v) + µA(w) for all vectors v, w and all scalars λ, µ.

Before we can properly define and study vector spaces and linear equations, some preparatory work is still needed. In modern mathematics, the objects under investigation are almost always so-called sets. So one aims at deriving (i.e. proving) true (and interesting and useful) statements about sets from other statements about sets known or assumed to be true. Such a derivation or proof means applying logical rules that guarantee the truth of the derived (i.e. proved) statement.

However, unfortunately, a proper definition of the notion of set is not easy, and neither is an appropriate treatment of logic and proof theory. Here, we will only be able to briefly touch on the bare necessities from logic and set theory needed to proceed to the core matter of this class. We begin with logic in Sec. 1.2, followed by set theory in Sec. 1.3, combining both in Sec. 1.4. The interested student can find an introductory presentation of axiomatic set theory in Appendix A and should consider taking a separate class on set theory, logic, and proof theory at a later time.

1.2 Propositional Calculus

1.2.1 Statements

Mathematical logic is a large field in its own right and, as indicated above, a thorough introduction is beyond the scope of this class – the interested reader may refer to [EFT07], [Kun12], and references therein. Here, we will just introduce some basic concepts using common English (rather than formal symbolic languages – a concept touched on in Sec. A.2 of the Appendix and more thoroughly explained in books like [EFT07]).

As mentioned before, mathematics establishes the truth or falsehood of statements. By a statement or proposition we mean any sentence (any sequence of symbols) that can reasonably be assigned a truth value, i.e. a value of either true, abbreviated T, or false, abbreviated F. The following example illustrates the difference between statements and sentences that are not statements:

Example 1.1. (a) Sentences that are statements:

Every dog is an animal. (T)
Every animal is a dog. (F)
The number 4 is odd. (F)
2 + 3 = 5. (T)
√2 < 0. (F)
x + 1 > 0 holds for each natural number x. (T)

(b) Sentences that are not statements:

Let's study calculus!
Who are you?
3 · 5 + 7.
x + 1 > 0.
All natural numbers are green.

The fourth sentence in Ex. 1.1(b) is not a statement, as it can not be said to be either true or false without any further knowledge on x. The fifth sentence in Ex. 1.1(b) is not a statement as it lacks any meaning and can, hence, not be either true or false. It would become a statement if given a definition of what it means for a natural number to be green.
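Since the algorithms of Appendix F are implemented in R, we will occasionally illustrate concepts of this class with small R snippets. The following minimal sketch evaluates some of the statements of Ex. 1.1(a) as R expressions; note that a machine check over a finite range can suggest, but never prove, a statement quantified over all natural numbers:

```R
# Truth values of some statements from Ex. 1.1(a), with R's logicals
# TRUE/FALSE playing the roles of T and F:
2 + 3 == 5             # TRUE:  "2 + 3 = 5"
sqrt(2) < 0            # FALSE: "sqrt(2) < 0"
4 %% 2 == 1            # FALSE: "the number 4 is odd"
all((1:1000) + 1 > 0)  # TRUE on the finite range 1..1000 only; the original
                       # statement quantifies over all of N and needs a proof
```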
1.2.2 Logical Operators

The next step now is to combine statements into new statements using logical operators, where the truth value of the combined statements depends on the truth values of the original statements and on the type of logical operator facilitating the combination.

The simplest logical operator is negation, denoted ¬. It is actually a so-called unary operator, i.e. it does not combine statements, but is merely applied to one statement. For example, if A stands for the statement "Every dog is an animal.", then ¬A stands for the statement "Not every dog is an animal."; and if B stands for the statement "The number 4 is odd.", then ¬B stands for the statement "The number 4 is not odd.", which can also be expressed as "The number 4 is even."

To completely understand the action of a logical operator, one usually writes what is known as a truth table. For negation, the truth table is

A   ¬A
T   F
F   T                                                        (1.1)

that means if the input statement A is true, then the output statement ¬A is false; if the input statement A is false, then the output statement ¬A is true.

We now proceed to discuss binary logical operators, i.e. logical operators combining precisely two statements. The following four operators are essential for mathematical reasoning:

Conjunction: A and B, usually denoted A ∧ B.
Disjunction: A or B, usually denoted A ∨ B.
Implication: A implies B, usually denoted A ⇒ B.
Equivalence: A is equivalent to B, usually denoted A ⇔ B.

Here is the corresponding truth table:

A   B   A ∧ B   A ∨ B   A ⇒ B   A ⇔ B
T   T     T       T       T       T
T   F     F       T       F       F
F   T     F       T       T       F
F   F     F       F       T       T                          (1.2)

When first seen, some of the assignments of truth values in (1.2) might not be completely intuitive, due to the fact that logical operators are often used somewhat differently in common English. Let us consider each of the four logical operators of (1.2) in sequence. For the use in subsequent examples, let A1, ..., A6 denote the six statements from Ex. 1.1(a).

Conjunction: Most likely the easiest of the four, basically identical to common language use: A ∧ B is true if, and only if, both A and B are true. For example, using Ex. 1.1(a), A1 ∧ A4 is the statement "Every dog is an animal and 2 + 3 = 5.", which is true since both A1 and A4 are true. On the other hand, A1 ∧ A3 is the statement "Every dog is an animal and the number 4 is odd.", which is false, since A3 is false.

Disjunction: The disjunction A ∨ B is true if, and only if, at least one of the statements A, B is true. Here one already has to be a bit careful – A ∨ B defines the inclusive or, whereas "or" in common English is often understood to mean the exclusive or (which is false if both input statements are true). For example, using Ex. 1.1(a), A1 ∨ A4 is the statement "Every dog is an animal or 2 + 3 = 5.", which is true since both A1 and A4 are true. The statement A1 ∨ A3, i.e. "Every dog is an animal or the number 4 is odd." is also true, since A1 is true. However, the statement A2 ∨ A5, i.e. "Every animal is a dog or √2 < 0." is false, as both A2 and A5 are false.

As you will have noted in the above examples, logical operators can be applied to combine statements that have no obvious contents relation.
While this might seem strange, introducing contents-related restrictions is unnecessary as well as undesirable, since it is often not clear which seemingly unrelated statements might suddenly appear in a common context in the future. The same occurs when considering implications and equivalences, where it might seem even more obscure at first.

Implication: Instead of A implies B, one also says if A then B, B is a consequence of A, B is concluded or inferred from A, A is sufficient for B, or B is necessary for A. The implication A ⇒ B is always true, except if A is true and B is false. At first glance, it might be surprising that A ⇒ B is defined to be true for A false and B true, however, there are many examples of incorrect statements implying correct statements. For instance, squaring the (false) equality of integers −1 = 1 yields the (true) equality of integers 1 = 1. However, as with conjunction and disjunction, it is perfectly valid to combine statements without any obvious context relation: For example, using Ex. 1.1(a), the statement A1 ⇒ A6, i.e. "Every dog is an animal implies x + 1 > 0 holds for each natural number x." is true, since A6 is true, whereas the statement A4 ⇒ A2, i.e. "2 + 3 = 5 implies every animal is a dog." is false, as A4 is true and A2 is false. Of course, the implication A ⇒ B is not really useful in situations where the truth values of both A and B are already known. Rather, in a typical application, one tries to establish the truth of A to prove the truth of B (a strategy that will fail if A happens to be false).

Example 1.2. Suppose we know Sasha to be a member of a group of students, taking a class in Linear Algebra. Then the statement A "Sasha has taken a class in Linear Algebra before." implies the statement B "There is at least one student in the group who has taken the class before." A priori, we might not know if Sasha has taken the Linear Algebra class before, but if we can establish that Sasha has, indeed, taken the class before, then we also know B to be true. If we find Sasha to be taking the class for the first time, then we do not know whether B is true or false.

—

Equivalence: A ⇔ B means A is true if, and only if, B is true. Once again, using input statements from Ex. 1.1(a), we see that A1 ⇔ A4, i.e. "Every dog is an animal is equivalent to 2 + 3 = 5.", is true as well as A2 ⇔ A3, i.e. "Every animal is a dog is equivalent to the number 4 is odd.". On the other hand, A4 ⇔ A5, i.e. "2 + 3 = 5 is equivalent to √2 < 0.", is false. Analogous to the situation of implications, A ⇔ B is not really useful if the truth values of both A and B are known a priori, but can be a powerful tool to prove B to be true or false by establishing the truth value of A. It is obviously more powerful than the implication as illustrated by the following example (compare with Ex. 1.2):

Example 1.3. Suppose we know Sasha has obtained the highest score among the students registered for the Linear Algebra class. Then the statement A "Sasha has taken the Linear Algebra class before." is equivalent to the statement B "The student with the highest score has taken the class before." As in Ex. 1.2, if we can establish Sasha to have taken the class before, then we also know B to be true. However, in contrast to Ex. 1.2, if we find Sasha to have taken the class for the first time, then we know B to be false.
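Anticipating Rem. 1.4 below (T coded as 1, F as 0), the truth table (1.2) can be computed mechanically. The following R sketch builds it from R's built-in logical operators; implication and equivalence are not built in, so we define them via the hypothetical helpers implies and iff (for implication, compare Th. 1.11(a) below):

```R
# All four assignments of truth values to A and B:
tt <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

implies <- function(a, b) !a | b   # A => B, cf. Th. 1.11(a)
iff     <- function(a, b) a == b   # A <=> B

# Reproduce the truth table (1.2):
cbind(tt,
      and = tt$A & tt$B,
      or  = tt$A | tt$B,
      imp = implies(tt$A, tt$B),
      eqv = iff(tt$A, tt$B))
```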
Remark 1.4. In computer science, the truth value T is often coded as 1 and the truth value F is often coded as 0.

1.2.3 Rules

Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not statements in the sense of Sec. 1.2.1, as they contain the statement variables (also known as propositional variables) A or B. However, the expressions become statements if all statement variables are substituted with actual statements. We will call expressions of this form propositional formulas. Moreover, if a truth value is assigned to each statement variable of a propositional formula, then this uniquely determines the truth value of the formula. In other words, the truth value of the propositional formula can be calculated from the respective truth values of its statement variables – a first justification for the name propositional calculus.

Example 1.5. (a) Consider the propositional formula (A ∧ B) ∨ (¬B). Suppose A is true and B is false. The truth value of the formula is obtained according to the following truth table:

A   B   A ∧ B   ¬B   (A ∧ B) ∨ (¬B)
T   F     F     T          T                                  (1.3)

(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle, has the remarkable property that its truth value is T for every possible choice of truth values for A:

A   ¬A   A ∨ (¬A)
T   F       T
F   T       T                                                 (1.4)

Formulas with this property are of particular importance.

Definition 1.6. A propositional formula is called a tautology or universally true if, and only if, its truth value is T for all possible assignments of truth values to all the statement variables it contains.

Notation 1.7. We write φ(A1, ..., An) if, and only if, the propositional formula φ contains precisely the n statement variables A1, ..., An.

Definition 1.8. The propositional formulas φ(A1, ..., An) and ψ(A1, ..., An) are called equivalent if, and only if, φ(A1, ..., An) ⇔ ψ(A1, ..., An) is a tautology.

Lemma 1.9. The propositional formulas φ(A1, ..., An) and ψ(A1, ..., An) are equivalent if, and only if, they have the same truth value for all possible assignments of truth values to A1, ..., An.

Proof. If φ(A1, ..., An) and ψ(A1, ..., An) are equivalent and Ai is assigned the truth value ti, i = 1, ..., n, then φ(A1, ..., An) ⇔ ψ(A1, ..., An) being a tautology implies it has truth value T. From (1.2) we see that either φ(A1, ..., An) and ψ(A1, ..., An) both have truth value T or they both have truth value F. If, on the other hand, we know φ(A1, ..., An) and ψ(A1, ..., An) have the same truth value for all possible assignments of truth values to A1, ..., An, then, given such an assignment, either φ(A1, ..., An) and ψ(A1, ..., An) both have truth value T or both have truth value F, i.e. φ(A1, ..., An) ⇔ ψ(A1, ..., An) has truth value T in each case, showing it is a tautology.

For all logical purposes, two equivalent formulas are exactly the same – it does not matter if one uses one or the other. The following theorem provides some important equivalences of propositional formulas.
As too many parentheses tend to make formulas less readable, we first introduce some precedence conventions for logical operators:

Convention 1.10. ¬ takes precedence over ∧, ∨, which take precedence over ⇒, ⇔. So, for example, (A ∨ ¬B ⇒ ¬B ∧ ¬A) ⇔ ¬C ∧ (A ∨ ¬D) is the same as ((A ∨ (¬B)) ⇒ ((¬B) ∧ (¬A))) ⇔ ((¬C) ∧ (A ∨ (¬D))).

Theorem 1.11. (a) (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define implication via negation and disjunction.

(b) (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if, A is both necessary and sufficient for B. One also calls the implication B ⇒ A the converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if, both A ⇒ B and its converse hold true.

(c) Commutativity of Conjunction: A ∧ B ⇔ B ∧ A.

(d) Commutativity of Disjunction: A ∨ B ⇔ B ∨ A.

(e) Associativity of Conjunction: (A ∧ B) ∧ C ⇔ A ∧ (B ∧ C).

(f) Associativity of Disjunction: (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C).

(g) Distributivity I: A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C).

(h) Distributivity II: A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C).

(i) De Morgan's Law I: ¬(A ∧ B) ⇔ ¬A ∨ ¬B.

(j) De Morgan's Law II: ¬(A ∨ B) ⇔ ¬A ∧ ¬B.

(k) Double Negative: ¬¬A ⇔ A.

(l) Contraposition: (A ⇒ B) ⇔ (¬B ⇒ ¬A).

Proof. Each equivalence is proved by providing a truth table and using Lem. 1.9.

(a):

A   B   ¬A   A ⇒ B   ¬A ∨ B
T   T   F      T       T
T   F   F      F       F
F   T   T      T       T
F   F   T      T       T

(b) – (h): Exercise.

(i):

A   B   ¬A   ¬B   A ∧ B   ¬(A ∧ B)   ¬A ∨ ¬B
T   T   F    F      T         F          F
T   F   F    T      F         T          T
F   T   T    F      F         T          T
F   F   T    T      F         T          T

(j): Exercise.

(k):

A   ¬A   ¬¬A
T   F     T
F   T     F

(l):

A   B   ¬A   ¬B   A ⇒ B   ¬B ⇒ ¬A
T   T   F    F      T        T
T   F   F    T      F        F
F   T   T    F      T        T
F   F   T    T      T        T

Having checked all the rules completes the proof of the theorem.

The importance of the rules provided by Th. 1.11 lies in their providing proof techniques, i.e. methods for establishing the truth of statements from statements known or assumed to be true. The rules of Th. 1.11 will be used frequently in proofs throughout this class.
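Lemma 1.9 also suggests a mechanical way to verify the rules of Th. 1.11: enumerate all assignments of truth values and check that the claimed equivalence holds for each of them. The following R sketch does this with a hypothetical helper is_tautology, applied to De Morgan's law Th. 1.11(i) and to contraposition Th. 1.11(l):

```R
# Check whether a propositional formula in n variables is a tautology by
# enumerating all 2^n assignments of truth values (cf. Lem. 1.9):
is_tautology <- function(formula, nvars) {
  grid <- expand.grid(rep(list(c(TRUE, FALSE)), nvars))
  all(apply(grid, 1, function(v) do.call(formula, unname(as.list(v)))))
}

# Th. 1.11(i), De Morgan I: !(A & B) <=> (!A | !B)
is_tautology(function(A, B) (!(A & B)) == (!A | !B), 2)          # TRUE

# Th. 1.11(l), contraposition, writing => as !A | B (cf. Th. 1.11(a)):
is_tautology(function(A, B) ((!A) | B) == ((!(!B)) | (!A)), 2)   # TRUE
```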
Remark 1.12. Another important proof technique is the so-called proof by contradiction, also called indirect proof. It is based on the observation, called the principle of contradiction, that A ∧ ¬A is always false:

A   ¬A   A ∧ ¬A
T   F      F
F   T      F                                                  (1.5)

Thus, one possibility of proving a statement B to be true is to show ¬B ⇒ (A ∧ ¬A) for some arbitrary statement A. Since the right-hand side of the implication is false, the left-hand side must also be false, proving B is true.

—

Two more rules we will use regularly in subsequent proofs are the so-called transitivity of implication and the transitivity of equivalence (we will encounter equivalence again in the context of relations in Sec. 2.2 below). In preparation for the transitivity rules, we generalize implication to propositional formulas:

Definition 1.13. In generalization of the implication operator defined in (1.2), we say the propositional formula φ(A1, ..., An) implies the propositional formula ψ(A1, ..., An) (denoted φ(A1, ..., An) ⇒ ψ(A1, ..., An)) if, and only if, each assignment of truth values to the A1, ..., An that makes φ(A1, ..., An) true, makes ψ(A1, ..., An) true as well.

Theorem 1.14. (a) Transitivity of Implication: (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C).

(b) Transitivity of Equivalence: (A ⇔ B) ∧ (B ⇔ C) ⇒ (A ⇔ C).

Proof. According to Def. 1.13, the rules can be verified by providing truth tables that show that, for all possible assignments of truth values to the propositional formulas on the left-hand side of the implications, either the left-hand side is false or both sides are true.

(a):

A   B   C   A ⇒ B   B ⇒ C   (A ⇒ B) ∧ (B ⇒ C)   A ⇒ C
T   T   T     T       T             T              T
T   F   T     F       T             F              T
F   T   T     T       T             T              T
F   F   T     T       T             T              T
T   T   F     T       F             F              F
T   F   F     F       T             F              F
F   T   F     T       F             F              T
F   F   F     T       T             T              T

(b):

A   B   C   A ⇔ B   B ⇔ C   (A ⇔ B) ∧ (B ⇔ C)   A ⇔ C
T   T   T     T       T             T              T
T   F   T     F       F             F              T
F   T   T     F       T             F              F
F   F   T     T       F             F              F
T   T   F     T       F             F              F
T   F   F     F       T             F              F
F   T   F     F       F             F              T
F   F   F     T       T             T              T

Having checked both rules, the proof is complete.

Definition and Remark 1.15. A proof of the statement B is a finite sequence of statements A1, A2, ..., An such that A1 is true; for 1 ≤ i < n, Ai implies Ai+1; and An implies B. If there exists a proof for B, then Th. 1.14(a) guarantees that B is true.¹

¹ Actually, more generally, a proof of the statement B is given by a finite sequence of statements A1, A2, ..., An such that A1 is true; the logical disjunction A1 ∨ · · · ∨ Ai implies Ai+1 for 1 ≤ i < n; and A1 ∨ · · · ∨ An implies B. It is then still correct that the existence of a proof of B guarantees B to be true.

Remark 1.16. Principle of Duality: In Th. 1.11, there are several pairs of rules that have an analogous form: (c) and (d), (e) and (f), (g) and (h), (i) and (j). These analogies are due to the general law called the principle of duality: If φ(A1, ..., An) ⇒ ψ(A1, ..., An) and only the operators ∧, ∨, ¬ occur in φ and ψ, then the reverse implication Φ(A1, ..., An) ⇐ Ψ(A1, ..., An) holds, where one obtains Φ from φ and Ψ from ψ by replacing each ∧ with ∨ and each ∨ with ∧. In particular, if, instead of an implication, we start with an equivalence (as in the examples from Th. 1.11), then we obtain another equivalence.

1.3 Set Theory

In the previous section, we have had a first glance at statements and corresponding truth values. In the present section, we will move our focus to the objects such statements are about. Reviewing Example 1.1(a), and recalling that this is a mathematics class rather than one in zoology, the first two statements of Example 1.1(a) are less relevant for us than statements 3–6. As in these examples, we will nearly always be interested in statements involving numbers or collections of numbers or collections of such collections etc.

In modern mathematics, the term one usually uses instead of "collection" is "set". In 1895, Georg Cantor defined a set as "any collection into a whole M of definite and separate objects m of our intuition or our thought". The objects m are called the elements of the set M. As explained in Appendix A, without restrictions and refinements, Cantor's set theory is not free of contradictions and, thus, not viable to be used in the foundation of mathematics. Axiomatic set theory provides these necessary restrictions and refinements and an introductory treatment can also be found in Appendix A. However, it is possible to follow and understand the rest of this class without having studied Appendix A.

Notation 1.17. We write m ∈ M for the statement "m is an element of the set M".

Definition 1.18. The sets M and N are equal, denoted M = N, if, and only if, M and N have precisely the same elements.
—

Definition 1.18 means we know everything about a set M if, and only if, we know all its elements.

Definition 1.19. The set with no elements is called the empty set; it is denoted by the symbol ∅.

Example 1.20. For finite sets, we can simply write down all their elements, for example,

A := {0},  B := {0, 17.5},  C := {5, 1, 5, 3},  D := {3, 5, 1},  E := {√2, 2, −2},

where the symbolism ":=" is to be read as "is defined to be equal to". Note C = D, since both sets contain precisely the same elements. In particular, the order in which the elements are written down plays no role and a set does not change if an element is written down more than once. If a set has many elements, instead of writing down all its elements, one might use abbreviations such as F := {−4, −2, ..., 20, 22, 24}, where one has to make sure the meaning of the dots is clear from the context.

Definition 1.21. The set A is called a subset of the set B (denoted A ⊆ B and also referred to as the inclusion of A in B) if, and only if, every element of A is also an element of B (one sometimes also calls B a superset of A and writes B ⊇ A). Please note that A = B is allowed in the above definition of a subset. If A ⊆ B and A ≠ B, then A is called a strict subset of B, denoted A ⊊ B. If B is a set and P(x) is a statement about an element x of B (i.e., for each x ∈ B, P(x) is either true or false), then we can define a subset A of B by writing

A := {x ∈ B : P(x)}.                                          (1.6)

This notation is supposed to mean that the set A consists precisely of those elements of B such that P(x) is true (has the truth value T in the language of Sec. 1.2).

Example 1.22. (a) For each set A, one has A ⊆ A and ∅ ⊆ A.

(b) If A ⊆ B, then A = {x ∈ B : x ∈ A}.

(c) We have {3} ⊆ {6.7, 3, 0}. Letting A := {−10, −8, ..., 8, 10}, we have {−2, 0, 2} = {x ∈ A : x³ ∈ A}, ∅ = {x ∈ A : x + 21 ∈ A}.

Remark 1.23. As a consequence of Def. 1.18, the sets A and B are equal if, and only if, one has both inclusions, namely A ⊆ B and B ⊆ A. Thus, when proving the equality of sets, one often divides the proof into two parts, first proving one inclusion, then the other.
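The set-builder notation (1.6) has a direct computational analogue: filtering a finite set, stored as a vector, by a predicate. A minimal R sketch, using the set A of Ex. 1.22(c) as the universe:

```R
B <- seq(-10, 10, by = 2)   # the set A = {-10, -8, ..., 8, 10} of Ex. 1.22(c)

# A := {x in B : P(x)}, here with the predicate P(x) <=> x^3 in B:
A <- B[B^3 %in% B]          # yields -2, 0, 2, as in Ex. 1.22(c)

# Subset test A ⊆ B (Def. 1.21): every element of A is an element of B
is_subset <- function(A, B) all(A %in% B)
is_subset(A, B)             # TRUE
```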
Definition 1.24. (a) The intersection of the sets A and B, denoted A ∩ B, consists of all elements that are in A and in B. The sets A, B are said to be disjoint if, and only if, A ∩ B = ∅.

(b) The union of the sets A and B, denoted A ∪ B, consists of all elements that are in A or in B (as in the logical disjunction in (1.2), the or is meant nonexclusively). If A and B are disjoint, one sometimes writes A ∪̇ B and speaks of the disjoint union of A and B.

(c) The difference of the sets A and B, denoted A \ B (read "A minus B" or "A without B"), consists of all elements of A that are not elements of B, i.e. A \ B := {x ∈ A : x ∉ B}. If B is a subset of a given set A (sometimes called the universe in this context), then A \ B is also called the complement of B with respect to A. In that case, one also writes Bᶜ := A \ B (note that this notation suppresses the dependence on A).

Example 1.25. (a) Examples of Intersections:

{1, 2, 3} ∩ {3, 4, 5} = {3},                                  (1.7a)
{√2} ∩ {1, 2, ..., 10} = ∅,                                   (1.7b)
{−1, 2, −3, 4, 5} ∩ {−10, −9, ..., −1} ∩ {−1, 7, −3} = {−1, −3}.   (1.7c)

(b) Examples of Unions:

{1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5},                      (1.8a)
{1, 2, 3} ∪̇ {4, 5} = {1, 2, 3, 4, 5},                         (1.8b)
{−1, 2, −3, 4, 5} ∪ {−99, −98, ..., −1} ∪ {−1, 7, −3} = {−99, −98, ..., −2, −1, 2, 4, 5, 7}.   (1.8c)

(c) Examples of Differences:

{1, 2, 3} \ {3, 4, 5} = {1, 2},                               (1.9a)
{1, 2, 3} \ {3, 2, 1, √5} = ∅,                                (1.9b)
{−10, −9, ..., 9, 10} \ {0} = {−10, −9, ..., −1} ∪ {1, 2, ..., 9, 10}.   (1.9c)

With respect to the universe {1, 2, 3, 4, 5}, we have

{1, 2, 3}ᶜ = {4, 5};                                          (1.9d)

with respect to the universe {0, 1, ..., 20}, we have

{1, 2, 3}ᶜ = {0} ∪ {4, 5, ..., 20}.                           (1.9e)

As mentioned earlier, it will often be unavoidable to consider sets of sets. Here are first examples: {∅, {0}, {0, 1}}, {{0, 1}, {1, 2}}.

Definition 1.26. Given a set A, the set of all subsets of A is called the power set of A, denoted P(A) (for reasons explained later (cf. Prop. 2.18), the power set is sometimes also denoted as 2^A).

Example 1.27. Examples of Power Sets:

P(∅) = {∅},                                                   (1.10a)
P({0}) = {∅, {0}},                                            (1.10b)
P(P({0})) = P({∅, {0}}) = {∅, {∅}, {{0}}, {∅, {0}}}.          (1.10c)

—

So far, we have restricted our set-theoretic examples to finite sets. However, not surprisingly, many sets of interest to us will be infinite (we will have to postpone a mathematically precise definition of finite and infinite to Sec. 3.2). We will now introduce the most simple infinite set.

Definition 1.28. The set N := {1, 2, 3, ...} is called the set of natural numbers (for a more rigorous construction of N, based on the axioms of axiomatic set theory, see Sec. A.3.4 of the Appendix, where Th. A.46 shows N to be, indeed, infinite). Moreover, we define N0 := {0} ∪ N.

—

The following theorem compiles important set-theoretic rules:

Theorem 1.29. Let A, B, C, U be sets.

(a) Commutativity of Intersections: A ∩ B = B ∩ A.

(b) Commutativity of Unions: A ∪ B = B ∪ A.

(c) Associativity of Intersections: (A ∩ B) ∩ C = A ∩ (B ∩ C).

(d) Associativity of Unions: (A ∪ B) ∪ C = A ∪ (B ∪ C).

(e) Distributivity I: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(f) Distributivity II: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(g) De Morgan's Law I: U \ (A ∩ B) = (U \ A) ∪ (U \ B).

(h) De Morgan's Law II: U \ (A ∪ B) = (U \ A) ∩ (U \ B).

(i) Double Complement: If A ⊆ U, then U \ (U \ A) = A.

Proof. In each case, the proof results from the corresponding rule of Th. 1.11:

(a): x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B ⇔ x ∈ B ∧ x ∈ A ⇔ x ∈ B ∩ A, where the middle equivalence holds by Th. 1.11(c).

(g): Under the general assumption of x ∈ U, we have the following equivalences:

x ∈ U \ (A ∩ B) ⇔ ¬(x ∈ A ∩ B) ⇔ ¬(x ∈ A ∧ x ∈ B)
⇔ ¬(x ∈ A) ∨ ¬(x ∈ B)   (by Th. 1.11(i))
⇔ x ∈ U \ A ∨ x ∈ U \ B ⇔ x ∈ (U \ A) ∪ (U \ B).

The proofs of the remaining rules are left as an exercise.

Remark 1.30. The correspondence between Th. 1.11 and Th. 1.29 is no coincidence. One can actually prove that, starting with an equivalence of propositional formulas φ(A1, ..., An) ⇔ ψ(A1, ..., An), where both formulas contain only the operators ∧, ∨, ¬, one obtains a set-theoretic rule (stating an equality of sets) by reinterpreting all statement variables A1, ..., An as variables for sets, all subsets of a universe U, and replacing ∧ by ∩, ∨ by ∪, and ¬ by U \ (if there are no multiple negations, then we do not need the hypothesis that A1, ..., An are subsets of U). The procedure also works in the opposite direction – one can start with a set-theoretic formula for an equality of sets and translate it into two equivalent propositional formulas.
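For finite sets stored as vectors, R's built-in union, intersect, and setdiff realize the operations of Def. 1.24, and the identification behind the notation 2^A (cf. Prop. 2.18 below) yields a short power-set construction. A sketch:

```R
A <- c(1, 2, 3); B <- c(3, 4, 5)
union(A, B)       # {1, 2, 3, 4, 5}, cf. (1.8a)
intersect(A, B)   # {3},             cf. (1.7a)
setdiff(A, B)     # {1, 2},          cf. (1.9a)

# Power set of a finite set (Def. 1.26), built by successive doubling:
# each element x of A either joins a subset or does not:
power_set <- function(A) {
  subsets <- list(numeric(0))   # start with the empty set
  for (x in A) subsets <- c(subsets, lapply(subsets, function(s) c(s, x)))
  subsets
}
length(power_set(c(1, 2, 3)))   # 8 = 2^3 subsets
```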
1.4 Predicate Calculus

Now that we have introduced sets in the previous section, we have to return to the subject of mathematical logic once more. As it turns out, propositional calculus, which we discussed in Sec. 1.2, does not quite suffice to develop the theory of calculus (nor most other mathematical theories). The reason is that we need to consider statements such as

x + 1 > 0 holds for each natural number x. (T)                (1.11a)
All real numbers are positive. (F)                            (1.11b)
There exists a natural number bigger than 10. (T)             (1.11c)
There exists a real number x such that x² = −1. (F)           (1.11d)
For all natural numbers n, there exists a natural number bigger than n. (T)   (1.11e)

That means we are interested in statements involving universal quantification via the quantifier "for all" (one also often uses "for each" or "for every" instead), existential quantification via the quantifier "there exists", or both. The quantifier of universal quantification is denoted by ∀ and the quantifier of existential quantification is denoted by ∃. Using these symbols as well as N and R to denote the sets of natural and real numbers, respectively, we can restate (1.11) as

∀_{x∈N} x + 1 > 0. (T)                                        (1.12a)
∀_{x∈R} x > 0. (F)                                            (1.12b)
∃_{n∈N} n > 10. (T)                                           (1.12c)
∃_{x∈R} x² = −1. (F)                                          (1.12d)
∀_{n∈N} ∃_{m∈N} m > n. (T)                                    (1.12e)

Definition 1.31. A universal statement has the form

∀_{x∈A} P(x),                                                 (1.13a)

whereas an existential statement has the form

∃_{x∈A} P(x).                                                 (1.13b)

In (1.13), A denotes a set and P(x) is a sentence involving the variable x, a so-called predicate of x, that becomes a statement (i.e. becomes either true or false) if x is substituted with any concrete element of the set A (in particular, P(x) is allowed to contain further quantifiers, but it must not contain any other quantifier involving x – one says x must be a free variable in P(x), not bound by any quantifier in P(x)). The universal statement (1.13a) has the truth value T if, and only if, P(x) has the truth value T for all elements x ∈ A; the existential statement (1.13b) has the truth value T if, and only if, P(x) has the truth value T for at least one element x ∈ A.

Remark 1.32. Some people prefer to write ⋀_{x∈A} instead of ∀_{x∈A} and ⋁_{x∈A} instead of ∃_{x∈A}. Even though this notation has the advantage of emphasizing that the universal statement can be interpreted as a big logical conjunction and the existential statement can be interpreted as a big logical disjunction, it is significantly less common. So we will stick to ∀ and ∃ in this class.

Remark 1.33. According to Def. 1.31, the existential statement (1.13b) is true if, and only if, P(x) is true for at least one x ∈ A. So if there is precisely one such x, then (1.13b) is true; and if there are several different x ∈ A such that P(x) is true, then (1.13b) is still true.
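Over a finite set, the quantifiers of Def. 1.31 correspond to R's all() and any(): a universal statement is a big conjunction and an existential statement is a big disjunction (cf. Rem. 1.32). A small sketch on a finite range, which also illustrates why truncating N is not harmless:

```R
A <- 1:100                     # a finite stand-in for N

all(A + 1 > 0)                 # TRUE: "for all x in A, x + 1 > 0", cf. (1.12a)
any(A > 10)                    # TRUE: "there exists x in A with x > 10", cf. (1.12c)

# Nested quantifiers, cf. (1.12e), checked on the finite range only:
# "for each n in A there exists m in A with m > n" fails at n = 100,
# so truncating N changes the truth value of the statement:
all(sapply(A, function(n) any(A > n)))   # FALSE on 1:100, though (1.12e) is true in N
```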
Uniqueness statements are often of particular importance, and one sometimes writes

∃!_{x∈A} P(x)                                                 (1.14)

for the statement "there exists a unique x ∈ A such that P(x) is true". This notation can be defined as an abbreviation for

∃_{x∈A} (P(x) ∧ ∀_{y∈A} (P(y) ⇒ x = y)).                      (1.15)

Example 1.34. Here are some examples of uniqueness statements:

∃!_{n∈N} n > 10. (F)                                          (1.16a)
∃!_{n∈N} 12 > n > 10. (T)                                     (1.16b)
∃!_{n∈N} 11 > n > 10. (F)                                     (1.16c)
∃!_{x∈R} x² = −1. (F)                                         (1.16d)
∃!_{x∈R} x² = 1. (F)                                          (1.16e)
∃!_{x∈R} x² = 0. (T)                                          (1.16f)

Remark 1.35. As for propositional calculus, we also have some important rules for predicate calculus:

(a) Consider the negation of a universal statement, ¬∀_{x∈A} P(x), which is true if, and only if, P(x) does not hold for each x ∈ A, i.e. if, and only if, there exists at least one x ∈ A such that P(x) is false (such that ¬P(x) is true). We have just proved the rule

¬∀_{x∈A} P(x) ⇔ ∃_{x∈A} ¬P(x).                                (1.17a)

Similarly, consider the negation of an existential statement. We claim the corresponding rule is

¬∃_{x∈A} P(x) ⇔ ∀_{x∈A} ¬P(x).                                (1.17b)

Indeed, we can prove (1.17b) from (1.17a):

¬∃_{x∈A} P(x) ⇔ ¬∃_{x∈A} ¬¬P(x)   (by Th. 1.11(k))
⇔ ¬¬∀_{x∈A} ¬P(x)   (by (1.17a))
⇔ ∀_{x∈A} ¬P(x).   (by Th. 1.11(k))                           (1.18)

One can interpret (1.17) as a generalization of De Morgan's laws Th. 1.11(i),(j). One can actually generalize (1.17) even a bit more: If a statement starts with several quantifiers, then one negates the statement by replacing each ∀ with ∃ and vice versa plus negating the predicate after the quantifiers (see the example in (1.21e) below).

(b) If A, B are sets and P(x, y) denotes a predicate of both x and y, then ∀_{x∈A} ∀_{y∈B} P(x, y) and ∀_{y∈B} ∀_{x∈A} P(x, y) both hold true if, and only if, P(x, y) holds true for each x ∈ A and each y ∈ B, i.e. the order of two consecutive universal quantifiers does not matter:

∀_{x∈A} ∀_{y∈B} P(x, y) ⇔ ∀_{y∈B} ∀_{x∈A} P(x, y).            (1.19a)

In the same way, we obtain the following rule:

∃_{x∈A} ∃_{y∈B} P(x, y) ⇔ ∃_{y∈B} ∃_{x∈A} P(x, y).            (1.19b)

If A = B, one also uses abbreviations of the form

∀_{x,y∈A} P(x, y) for ∀_{x∈A} ∀_{y∈A} P(x, y),                (1.20a)
∃_{x,y∈A} P(x, y) for ∃_{x∈A} ∃_{y∈A} P(x, y).                (1.20b)

Generalizing rules (1.19), we can always commute identical quantifiers. Caveat: Quantifiers that are not identical must not be commuted (see Ex. 1.36(d) below).

Example 1.36. (a) Negation of universal and existential statements:

Negation of (1.12a): ∃_{x∈N} x + 1 ≤ 0, with x + 1 ≤ 0 ⇔ ¬(x + 1 > 0). (F)   (1.21a)
Negation of (1.12b): ∃_{x∈R} x ≤ 0. (T)                       (1.21b)
Negation of (1.12c): ∀_{n∈N} n ≤ 10. (F)                      (1.21c)
Negation of (1.12d): ∀_{x∈R} x² ≠ −1. (T)                     (1.21d)
Negation of (1.12e): ∃_{n∈N} ∀_{m∈N} m ≤ n. (F)               (1.21e)

(b) As a more complicated example, consider the negation of the uniqueness statement (1.14), i.e. of (1.15):

¬∃!_{x∈A} P(x)
⇔ ¬∃_{x∈A} (P(x) ∧ ∀_{y∈A} (P(y) ⇒ x = y))
⇔ ∀_{x∈A} ¬(P(x) ∧ ∀_{y∈A} (¬P(y) ∨ x = y))   (by (1.17b), Th. 1.11(a))
⇔ ∀_{x∈A} (¬P(x) ∨ ¬∀_{y∈A} (¬P(y) ∨ x = y))   (by Th. 1.11(i))
⇔ ∀_{x∈A} (¬P(x) ∨ ∃_{y∈A} ¬(¬P(y) ∨ x = y))   (by (1.17a))
⇔ ∀_{x∈A} (¬P(x) ∨ ∃_{y∈A} (P(y) ∧ x ≠ y))   (by Th. 1.11(j),(k))
⇔ ∀_{x∈A} (P(x) ⇒ ∃_{y∈A} (P(y) ∧ x ≠ y)).   (by Th. 1.11(a))   (1.22)
So how do we decode the expression we have obtained at the end? It states that if P(x) holds for some x ∈ A, then there must be at least a second, different element y ∈ A such that P(y) is true. This is, indeed, precisely the negation of ∃!_{x∈A} P(x).

(c) Identical quantifiers commute:

∀_{x∈R} ∀_{n∈N} x^{2n} ≥ 0 ⇔ ∀_{n∈N} ∀_{x∈R} x^{2n} ≥ 0,      (1.23a)
∀_{x∈R} ∃_{n∈N} ∃_{y∈R} ny > x² ⇔ ∀_{x∈R} ∃_{y∈R} ∃_{n∈N} ny > x².   (1.23b)

(d) The following example shows that different quantifiers do, in general, not commute (i.e. do not yield equivalent statements when commuted): While the statement

∀_{x∈R} ∃_{y∈R} y > x                                         (1.24a)

is true (for each real number x, there is a bigger real number y, e.g. y := x + 1 will do the job), the statement

∃_{y∈R} ∀_{x∈R} y > x                                         (1.24b)

is false (for example, since y > y is false). In particular, (1.24a) and (1.24b) are not equivalent.

(e) Even though (1.14) provides useful notation, it is better not to think of ∃! as a quantifier. It is really just an abbreviation for (1.15), and it behaves very differently from ∃ and ∀: The following examples show that, in general, ∃! commutes neither with ∃, nor with itself:

∃_{n∈N} ∃!_{m∈N} m < n ⇎ ∃!_{m∈N} ∃_{n∈N} m < n

(the statement on the left is true, as one can choose n = 2, but the statement on the right is false, as ∃_{n∈N} m < n holds for every m ∈ N). Similarly,

∃!_{n∈N} ∃!_{m∈N} m < n ⇎ ∃!_{m∈N} ∃!_{n∈N} m < n

(the statement on the left is still true and the statement on the right is still false (there is no m ∈ N such that ∃!_{n∈N} m < n)).

Remark 1.37. One can make the following observations regarding the strategy for proving universal and existential statements:

(a) To prove that ∀_{x∈A} P(x) is true, one must check the truth of P(x) for every element x ∈ A – examples are not enough!

(b) To prove that ∀_{x∈A} P(x) is false, it suffices to find one x ∈ A such that P(x) is false – such an x is then called a counterexample and one counterexample is always enough to prove ∀_{x∈A} P(x) is false!

(c) To prove that ∃_{x∈A} P(x) is true, it suffices to find one x ∈ A such that P(x) is true – such an x is then called an example and one example is always enough to prove ∃_{x∈A} P(x) is true!

—

The subfield of mathematical logic dealing with quantified statements is called predicate calculus. In general, one does not restrict the quantified variables to range only over elements of sets (as we have done above). Again, we refer to [EFT07] for a deeper treatment of the subject.

As an application of quantified statements, let us generalize the notion of union and intersection:

Definition 1.38. Let I ≠ ∅ be a nonempty set, usually called an index set in the present context. For each i ∈ I, let Ai denote a set (some or all of the Ai can be identical).

(a) The intersection

⋂_{i∈I} Ai := {x : ∀_{i∈I} x ∈ Ai}                            (1.25a)

consists of all elements x that belong to every Ai.

(b) The union

⋃_{i∈I} Ai := {x : ∃_{i∈I} x ∈ Ai}                            (1.25b)

consists of all elements x that belong to at least one Ai. The union is called disjoint if, and only if, for each i, j ∈ I, i ≠ j implies Ai ∩ Aj = ∅.
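For a finite index set, the intersections and unions of Def. 1.38 can be computed by folding the corresponding two-set operations over a list of sets with Reduce. A sketch reproducing (1.26b) and (1.26d) of Ex. 1.40 below, restricted to the finitely many indices 1, ..., 5:

```R
# The family (A_n), A_n = {1, 2, ..., n}, over the finite index set I = 1:5:
A <- lapply(1:5, function(n) 1:n)

Reduce(intersect, A)   # {1}:             cf. (1.26b)
Reduce(union, A)       # {1, 2, 3, 4, 5}: the union over the finite index set,
                       # a finite approximation of (1.26d)
```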
Proposition 1.39. Let I ≠ ∅ be an index set, let M denote a set, and, for each i ∈ I, let Ai denote a set. The following set-theoretic rules hold:

(a) (⋂_{i∈I} Ai) ∩ M = ⋂_{i∈I} (Ai ∩ M).

(b) (⋃_{i∈I} Ai) ∪ M = ⋃_{i∈I} (Ai ∪ M).

(c) (⋂_{i∈I} Ai) ∪ M = ⋂_{i∈I} (Ai ∪ M).

(d) (⋃_{i∈I} Ai) ∩ M = ⋃_{i∈I} (Ai ∩ M).

(e) M \ ⋂_{i∈I} Ai = ⋃_{i∈I} (M \ Ai).

(f) M \ ⋃_{i∈I} Ai = ⋂_{i∈I} (M \ Ai).

Proof. We prove (c) and (e) and leave the remaining proofs as an exercise.

(c):

x ∈ (⋂_{i∈I} Ai) ∪ M ⇔ x ∈ M ∨ ∀_{i∈I} x ∈ Ai ⇔(∗) ∀_{i∈I} (x ∈ Ai ∨ x ∈ M) ⇔ x ∈ ⋂_{i∈I} (Ai ∪ M).

To justify the equivalence at (∗), we make use of Th. 1.11(b) and verify ⇒ and ⇐. For ⇒, note that the truth of x ∈ M implies x ∈ Ai ∨ x ∈ M is true for each i ∈ I. If x ∈ Ai is true for each i ∈ I, then x ∈ Ai ∨ x ∈ M is still true for each i ∈ I. To verify ⇐, note that the existence of i ∈ I such that x ∈ M implies the truth of x ∈ M ∨ ∀_{i∈I} x ∈ Ai. If x ∈ M is false for each i ∈ I, then x ∈ Ai must be true for each i ∈ I, showing x ∈ M ∨ ∀_{i∈I} x ∈ Ai is true also in this case.

(e):

x ∈ M \ ⋂_{i∈I} Ai ⇔ x ∈ M ∧ ¬∀_{i∈I} x ∈ Ai ⇔ x ∈ M ∧ ∃_{i∈I} x ∉ Ai
⇔ ∃_{i∈I} x ∈ M \ Ai ⇔ x ∈ ⋃_{i∈I} (M \ Ai),

completing the proof.

Example 1.40. We have the following identities of sets:

⋂_{x∈R} N = N,                                                (1.26a)
⋂_{n∈N} {1, 2, ..., n} = {1},                                 (1.26b)
⋃_{x∈R} N = N,                                                (1.26c)
⋃_{n∈N} {1, 2, ..., n} = N,                                   (1.26d)
N \ ⋃_{n∈N} {2n} = {1, 3, 5, ...} = ⋂_{n∈N} (N \ {2n}).       (1.26e)

Comparing with the notation of Def. 1.38, in (1.26a), for example, we have I = R and Ai = N for each i ∈ I (where, in (1.26a), we have written x instead of i). Similarly, in (1.26b), we have I = N and An = {1, 2, ..., n} for each n ∈ I.

2 Functions and Relations

2.1 Functions

Definition 2.1. Let A, B be sets. Given x ∈ A, y ∈ B, the set

(x, y) := {{x}, {x, y}}                                       (2.1)

is called the ordered pair (often shortened to just pair) consisting of x and y. The set of all such pairs is called the Cartesian product A × B, i.e.

A × B := {(x, y) : x ∈ A ∧ y ∈ B}.                            (2.2)

Example 2.2. Let A be a set. Then

A × ∅ = ∅ × A = ∅,                                            (2.3a)
{1, 2} × {1, 2, 3} = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}   (2.3b)
≠ {1, 2, 3} × {1, 2} = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}.   (2.3c)

Also note that, for x ≠ y,

(x, y) = {{x}, {x, y}} ≠ {{y}, {x, y}} = (y, x).              (2.4)
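For finite sets, the Cartesian product of Def. 2.1 is exactly what R's expand.grid enumerates, and the order sensitivity seen in (2.3b)/(2.3c) and (2.4) is visible in the output. A sketch:

```R
# A x B as the set of ordered pairs (x, y) with x in A, y in B (Def. 2.1):
A <- 1:2; B <- 1:3
AxB <- expand.grid(x = A, y = B)   # the 6 pairs of (2.3b)
BxA <- expand.grid(x = B, y = A)   # the 6 different pairs of (2.3c)
nrow(AxB); nrow(BxA)               # both 6, but the two pair sets differ

# Ordered pairs are order-sensitive, cf. (2.4): (1, 2) != (2, 1)
identical(c(1, 2), c(2, 1))        # FALSE
```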
Definition 2.3. Given sets A, B, a function or map f is an assignment rule that assigns to each x ∈ A a unique y ∈ B. One then also writes f(x) for the element y. The set A is called the domain of f, denoted dom(f), and B is called the codomain of f, denoted codom(f). The information about a map f can be concisely summarized by the notation

f : A −→ B,  x ↦ f(x),                                        (2.5)

where x ↦ f(x) is called the assignment rule for f, f(x) is called the image of x, and x is called a preimage of f(x) (the image must be unique, but there might be several preimages). The set

graph(f) := {(x, y) ∈ A × B : y = f(x)}                       (2.6)

is called the graph of f (not to be confused with pictures visualizing the function f, which are also called graph of f). If one wants to be completely precise, then one identifies the function f with the ordered triple (A, B, graph(f)). The set of all functions with domain A and codomain B is denoted by F(A, B) or B^A, i.e.

F(A, B) := B^A := {(f : A −→ B) : A = dom(f) ∧ B = codom(f)}.   (2.7)

Caveat: Some authors reserve the word map for continuous functions, but we use function and map synonymously.

Definition 2.4. Let A, B be sets and f : A −→ B a function.

(a) If T is a subset of A, then

f(T) := {f(x) ∈ B : x ∈ T}                                    (2.8)

is called the image of T under f.

(b) If U is a subset of B, then

f⁻¹(U) := {x ∈ A : f(x) ∈ U}                                  (2.9)

is called the preimage or inverse image of U under f.

(c) f is called injective or one-to-one if, and only if, every y ∈ B has at most one preimage, i.e. if, and only if, the preimage of {y} has at most one element:

f injective ⇔ ∀_{y∈B} (f⁻¹({y}) = ∅ ∨ ∃!_{x∈A} f(x) = y)
            ⇔ ∀_{x1,x2∈A} (x1 ≠ x2 ⇒ f(x1) ≠ f(x2)).          (2.10)

(d) f is called surjective or onto if, and only if, every element of the codomain of f has a preimage:

f surjective ⇔ ∀_{y∈B} ∃_{x∈A} y = f(x) ⇔ ∀_{y∈B} f⁻¹({y}) ≠ ∅.   (2.11)

(e) f is called bijective if, and only if, f is injective and surjective.

Example 2.5. Examples of Functions:

f : {1, 2, 3, 4, 5} −→ {1, 2, 3, 4, 5},  f(x) := −x + 6,      (2.12a)
g : N −→ N,  g(n) := 2n,                                      (2.12b)
h : N −→ {2, 4, 6, ...},  h(n) := 2n,                         (2.12c)
h̃ : N −→ {2, 4, 6, ...},  h̃(n) := n for n even, n + 1 for n odd,   (2.12d)
G : N −→ R,  G(n) := n/(n + 1),                               (2.12e)
F : P(N) −→ P(P(N)),  F(A) := P(A).                           (2.12f)

Instead of f(x) := −x + 6 in (2.12a), one can also write x ↦ −x + 6 and analogously in the other cases. Also note that, in the strict sense, the functions g and h are different, since their codomains are different (however, using Def. 2.4(a), they have the same image in the sense that g(N) = h(N)). Furthermore,

f({1, 2}) = {5, 4} = f⁻¹({1, 2}),  h̃⁻¹({2, 4, 6}) = {1, 2, 3, 4, 5, 6},   (2.13)

f is bijective; g is injective, but not surjective; h is bijective; h̃ is surjective, but not injective. Can you figure out if G and F are injective and/or surjective?

Example 2.6. (a) For each nonempty set A, the map Id : A −→ A, Id(x) := x, is called the identity on A. If one needs to emphasize that Id operates on A, then one also writes Id_A instead of Id. The identity is clearly bijective.

(b) Let A, B be nonempty sets. A map f : A −→ B is called constant if, and only if, there exists c ∈ B such that f(x) = c for each x ∈ A. In that case, one also writes f ≡ c, which can be read as "f is identically equal to c". If f ≡ c, ∅ ≠ T ⊆ A, and U ⊆ B, then

f(T) = {c},  f⁻¹(U) = A for c ∈ U, ∅ for c ∉ U.               (2.14)

Moreover, f is injective if, and only if, A consists of precisely one element; f is surjective if, and only if, B = {c}.

(c) Given A ⊆ X, the map

ι : A −→ X,  ι(x) := x,                                       (2.15)

is called inclusion (also embedding or imbedding). An inclusion is always injective; it is surjective if, and only if, A = X, i.e. if, and only if, it is the identity on A.

(d) Given A ⊆ X and a map f : X −→ B, the map g : A −→ B, g(x) = f(x), is called the restriction of f to A; f is called the extension of g to X. In this situation, one also uses the notation f↾A for g (some authors prefer the notation f|_A or f|A).
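For maps between finite sets, the conditions (2.10) and (2.11) can be checked by brute force. A minimal sketch for the map f of (2.12a), using the hypothetical helpers is_injective and is_surjective:

```R
# f from (2.12a) on the finite domain and codomain {1, ..., 5}:
dom <- 1:5; codom <- 1:5
f <- function(x) -x + 6

# Injective (2.10): distinct arguments have distinct images
is_injective  <- function(f, dom) !anyDuplicated(sapply(dom, f))
# Surjective (2.11): every element of the codomain has a preimage
is_surjective <- function(f, dom, codom) all(codom %in% sapply(dom, f))

is_injective(f, dom)            # TRUE
is_surjective(f, dom, codom)    # TRUE, so f is bijective
```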
Theorem 2.7. Let f : A −→ B be a map, let ∅ ≠ I be an index set, and assume S, T, Si, i ∈ I, are subsets of A, whereas U, V, Ui, i ∈ I, are subsets of B. Then we have the following rules concerning functions and set-theoretic operations:

f(S ∩ T) ⊆ f(S) ∩ f(T),                                       (2.16a)
f(⋂_{i∈I} Si) ⊆ ⋂_{i∈I} f(Si),                                (2.16b)
f(S ∪ T) = f(S) ∪ f(T),                                       (2.16c)
f(⋃_{i∈I} Si) = ⋃_{i∈I} f(Si),                                (2.16d)
f⁻¹(U ∩ V) = f⁻¹(U) ∩ f⁻¹(V),                                 (2.16e)
f⁻¹(⋂_{i∈I} Ui) = ⋂_{i∈I} f⁻¹(Ui),                            (2.16f)
f⁻¹(U ∪ V) = f⁻¹(U) ∪ f⁻¹(V),                                 (2.16g)
f⁻¹(⋃_{i∈I} Ui) = ⋃_{i∈I} f⁻¹(Ui),                            (2.16h)
f(f⁻¹(U)) ⊆ U,  f⁻¹(f(S)) ⊇ S,                                (2.16i)
f⁻¹(U \ V) = f⁻¹(U) \ f⁻¹(V).                                 (2.16j)

Proof. We prove (2.16b) (which includes (2.16a) as a special case) and the second part of (2.16i), and leave the remaining cases as exercises. For (2.16b), one argues

y ∈ f(⋂_{i∈I} Si) ⇔ ∃_{x∈A} (∀_{i∈I} x ∈ Si ∧ y = f(x)) ⇒ ∀_{i∈I} y ∈ f(Si) ⇔ y ∈ ⋂_{i∈I} f(Si).

The observation

x ∈ S ⇒ f(x) ∈ f(S) ⇔ x ∈ f⁻¹(f(S))

establishes the second part of (2.16i).

It is an exercise to find counterexamples that show one can not, in general, replace the four subset symbols in (2.16) by equalities (it is possible to find examples with sets that have at most 2 elements).

Definition 2.8. The composition of maps f and g with f : A −→ B, g : C −→ D, and f(A) ⊆ C is defined to be the map

g ∘ f : A −→ D,  (g ∘ f)(x) := g(f(x)).                       (2.17)

The expression g ∘ f is read as "g after f" or "g composed with f".

Example 2.9. Consider the maps

f : N −→ R,  n ↦ n²,                                          (2.18a)
g : N −→ R,  n ↦ 2n.                                          (2.18b)

We obtain f(N) = {1, 4, 9, ...} ⊆ dom(g), g(N) = {2, 4, 6, ...} ⊆ dom(f), and the compositions

(g ∘ f) : N −→ R,  (g ∘ f)(n) = g(n²) = 2n²,                  (2.19a)
(f ∘ g) : N −→ R,  (f ∘ g)(n) = f(2n) = 4n²,                  (2.19b)

showing that composing functions is, in general, not commutative, even if the involved functions have the same domain and the same codomain.
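The non-commutativity observed in Ex. 2.9 is easy to see numerically. A sketch composing the maps f and g of (2.18) in R:

```R
f <- function(n) n^2    # (2.18a)
g <- function(n) 2 * n  # (2.18b)

g_after_f <- function(n) g(f(n))   # (g o f)(n) = 2 n^2, cf. (2.19a)
f_after_g <- function(n) f(g(n))   # (f o g)(n) = 4 n^2, cf. (2.19b)

n <- 1:5
rbind(g_after_f(n), f_after_g(n))  # the rows differ: composition is not commutative
```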
Proposition 2.10. Consider maps f : A −→ B, g : C −→ D, h : E −→ F, satisfying f(A) ⊆ C and g(C) ⊆ E.

(a) Associativity of Compositions:

h ∘ (g ∘ f) = (h ∘ g) ∘ f.                                    (2.20)

(b) One has the following law for forming preimages:

∀_{W∈P(D)} (g ∘ f)⁻¹(W) = f⁻¹(g⁻¹(W)).                        (2.21)

Proof. (a): Both h ∘ (g ∘ f) and (h ∘ g) ∘ f map A into F. So it just remains to prove (h ∘ (g ∘ f))(x) = ((h ∘ g) ∘ f)(x) for each x ∈ A. One computes, for each x ∈ A,

(h ∘ (g ∘ f))(x) = h((g ∘ f)(x)) = h(g(f(x))) = (h ∘ g)(f(x)) = ((h ∘ g) ∘ f)(x),   (2.22)

establishing the case.

(b): Exercise.

Definition 2.11. A function g : B −→ A is called a right inverse (resp. left inverse) of a function f : A −→ B if, and only if, f ∘ g = Id_B (resp. g ∘ f = Id_A). Moreover, g is called an inverse of f if, and only if, it is both a right and a left inverse. If g is an inverse of f, then one also writes f⁻¹ instead of g. The map f is called (right, left) invertible if, and only if, there exists a (right, left) inverse for f.

Example 2.12. (a) Consider the map

f : N −→ N,  f(n) := 2n.                                      (2.23a)

The maps

g1 : N −→ N,  g1(n) := n/2 for n even, 1 for n odd,           (2.23b)
g2 : N −→ N,  g2(n) := n/2 for n even, 2 for n odd,           (2.23c)

both constitute left inverses of f. It follows from Th. 2.13(c) below that f does not have a right inverse.

(b) Consider the map

f : N −→ N,  f(n) := n/2 for n even, (n + 1)/2 for n odd.     (2.24a)

The maps

g1 : N −→ N,  g1(n) := 2n,                                    (2.24b)
g2 : N −→ N,  g2(n) := 2n − 1,                                (2.24c)

both constitute right inverses of f. It follows from Th. 2.13(c) below that f does not have a left inverse.

(c) The map

f : N −→ N,  f(n) := n − 1 for n even, n + 1 for n odd,       (2.25a)

is its own inverse, i.e. f⁻¹ = f. For the map

g : N −→ N,  g(n) := 2 for n = 1, 3 for n = 2, 1 for n = 3, n for n ∉ {1, 2, 3},   (2.25b)

the inverse is

g⁻¹ : N −→ N,  g⁻¹(n) := 3 for n = 1, 1 for n = 2, 2 for n = 3, n for n ∉ {1, 2, 3}.   (2.25c)

While Examples 2.12(a),(b) show that left and right inverses are usually not unique, they are unique provided f is bijective (see Th. 2.13(c)).

Theorem 2.13. Let A, B be nonempty sets.

(a) f : A −→ B is right invertible if, and only if, f is surjective (where the implication "⇐" makes use of the axiom of choice (AC), see Appendix A.4).

(b) f : A −→ B is left invertible if, and only if, f is injective.

(c) f : A −→ B is invertible if, and only if, f is bijective. In this case, the right inverse and the left inverse are unique and both identical to the inverse.

Proof. (a): If f is surjective, then, for each y ∈ B, there exists x_y ∈ f⁻¹({y}) such that f(x_y) = y. By AC, we can define the choice function

g : B −→ A,  g(y) := x_y.                                     (2.26)

Then, for each y ∈ B, f(g(y)) = y, showing g is a right inverse of f. Conversely, if g : B −→ A is a right inverse of f, then, for each y ∈ B, it is y = f(g(y)), showing that g(y) ∈ A is a preimage of y, i.e. f is surjective.

(b): Fix a ∈ A. If f is injective, then, for each y ∈ B with f⁻¹({y}) ≠ ∅, let x_y denote the unique element in A satisfying f(x_y) = y. Define

g : B −→ A,  g(y) := x_y for f⁻¹({y}) ≠ ∅, a otherwise.       (2.27)

Then, for each x ∈ A, g(f(x)) = x, showing g is a left inverse of f. Conversely, if g : B −→ A is a left inverse of f and x1, x2 ∈ A with f(x1) = f(x2) = y, then x1 = (g ∘ f)(x1) = g(f(x1)) = g(f(x2)) = (g ∘ f)(x2) = x2, showing y has precisely one preimage and f is injective.

The first part of (c) follows immediately by combining (a) and (b) (and, actually, without using AC, since, if f is both injective and surjective, then, for each y ∈ B, the element x_y ∈ f⁻¹({y}) is unique, and (2.26) can be defined without AC). It merely remains to verify the uniqueness of right and left inverse for bijective maps. So let g be a left inverse of f, let h be a right inverse of f, and let f⁻¹ be an inverse of f. Then, for each y ∈ B,

g(y) = (g ∘ (f ∘ f⁻¹))(y) = ((g ∘ f) ∘ f⁻¹)(y) = f⁻¹(y),      (2.28a)
h(y) = ((f⁻¹ ∘ f) ∘ h)(y) = (f⁻¹ ∘ (f ∘ h))(y) = f⁻¹(y),      (2.28b)

thereby proving the uniqueness of left and right inverse for bijective maps.

Theorem 2.14. Consider maps f : A −→ B, g : B −→ C. If f and g are both injective (resp. both surjective, both bijective), then so is g ∘ f. Moreover, in the bijective case, one has

(g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.                                        (2.29)

Proof. Exercise.

Definition 2.15. (a) Given an index set I and a set A, a map f : I −→ A is sometimes called a family (of elements in A), and is denoted in the form f = (a_i)_{i∈I} with a_i := f(i). When using this representation, one often does not even specify f and A, especially if the a_i are themselves sets.

(b) A sequence in a set A is a family of elements in A, where the index set is the set of natural numbers N. In this case, one writes (a_n)_{n∈N} or (a1, a2, ...). More generally, a family is called a sequence, given a bijective map between the index set I and a subset of N.
(c) Given a family of sets (A_i)_{i∈I}, we define the Cartesian product of the A_i to be the set of functions
∏_{i∈I} A_i := { f : I −→ ⋃_{j∈I} A_j :  ∀_{i∈I} f(i) ∈ A_i }.   (2.30)
If I has precisely n elements with n ∈ N, then the elements of the Cartesian product ∏_{i∈I} A_i are called (ordered) n-tuples, (ordered) triples for n = 3.

Example 2.16. (a) Using the notion of family, we can now say that the intersection ⋂_{i∈I} A_i and union ⋃_{i∈I} A_i as defined in Def. 1.38 are the intersection and union of the family of sets (A_i)_{i∈I}, respectively. As a concrete example, let us revisit (1.26b), where we have
(A_n)_{n∈N},   A_n := {1, 2, ..., n},   ⋂_{n∈N} A_n = {1}.   (2.31)

(b) Examples of Sequences:
Sequence in {0, 1}:   (1, 0, 1, 0, 1, 0, ...),   (2.32a)
Sequence in N:   (n²)_{n∈N} = (1, 4, 9, 16, 25, ...),   (2.32b)
Sequence in R:   ((−1)ⁿ √n)_{n∈N} = (−1, √2, −√3, ...),   (2.32c)
Sequence in R:   (1/n)_{n∈N} = (1, 1/2, 1/3, ...),   (2.32d)
Finite Sequence in P(N):   ({3, 2, 1}, {2, 1}, {1}, ∅).   (2.32e)

(c) The Cartesian product ∏_{i∈I} A_i, where all sets A_i = A, is the same as A^I, the set of all functions from I into A. So, for example, ∏_{n∈N} R = R^N is the set of all sequences in R. If I = {1, 2, ..., n} with n ∈ N, then
∏_{i∈I} A = A^{{1,2,...,n}} =: ∏_{i=1}^{n} A =: Aⁿ   (2.33)
is the set of all n-tuples with entries from A.

—

In the following, we explain the common notation 2^A for the power set P(A) of a set A. It is related to a natural identification between subsets and their corresponding characteristic functions.

Definition 2.17. Let A be a set and let B ⊆ A be a subset of A. Then
χ_B : A −→ {0, 1},   χ_B(x) := { 1 if x ∈ B, 0 if x ∉ B },   (2.34)
is called the characteristic function of the set B (with respect to the universe A). One also finds the notations 1_B and 𝟙_B instead of χ_B (note that all these notations suppress the dependence of the characteristic function on the universe A).

Proposition 2.18. Let A be a set. Then the map
χ : P(A) −→ {0, 1}^A,   χ(B) := χ_B,   (2.35)
is bijective (recall that P(A) denotes the power set of A and {0, 1}^A denotes the set of all functions from A into {0, 1}).

Proof. χ is injective: Let B, C ∈ P(A) with B ≠ C. By possibly switching the names of B and C, we may assume there exists x ∈ B such that x ∉ C. Then χ_B(x) = 1, whereas χ_C(x) = 0, showing χ(B) ≠ χ(C), proving χ is injective. χ is surjective: Let f : A −→ {0, 1} be an arbitrary function and define B := {x ∈ A : f(x) = 1}. Then χ(B) = χ_B = f, proving χ is surjective.

Proposition 2.18 allows one to identify the sets P(A) and {0, 1}^A via the bijective map χ. This fact, together with the common practice of set theory to identify the number 2 with the set {0, 1}, explains the notation 2^A for P(A).
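The identification of Prop. 2.18 is easy to make concrete on a finite universe, where a subset of A corresponds to a 0/1-vector indexed by A. A minimal R sketch (the universe {1,...,5} and the helper name chi are ours, chosen purely for illustration):

    # Characteristic functions over the finite universe A = {1,...,5}.
    A <- 1:5
    chi <- function(B) as.integer(A %in% B)  # chi_B as a 0/1 vector indexed by A

    chi(c(2, 4))   # 0 1 0 1 0: the characteristic function of B = {2,4}
    # Surjectivity in Prop. 2.18: recover B from a given f = chi_B.
    f <- c(0, 1, 0, 1, 0)
    A[f == 1]      # 2 4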
2.2 Relations

2.2.1 Definition and Properties

Definition 2.19. Given sets A and B, a relation is a subset R of A × B (if one wants to be completely precise, a relation is an ordered triple (A, B, R), where R ⊆ A × B). If A = B, then we call R a relation on A. One says that a ∈ A and b ∈ B are related according to the relation R if, and only if, (a, b) ∈ R. In this context, one usually writes a R b instead of (a, b) ∈ R.

Example 2.20. (a) The relations we are probably most familiar with are = and ≤. The relation R of equality, usually denoted =, makes sense on every nonempty set A:
R := ∆(A) := {(x, x) ∈ A × A : x ∈ A}.   (2.36)
The set ∆(A) is called the diagonal of the Cartesian product, i.e., as a subset of A × A, the relation of equality is identical to the diagonal:
x = y ⇔ x R y ⇔ (x, y) ∈ R = ∆(A).   (2.37)
Similarly, the relation ≤ on R is identical to the set
R_≤ := {(x, y) ∈ R² : x ≤ y}.   (2.38)

(b) Every function f : A −→ B is a relation, namely the relation
R_f = {(x, y) ∈ A × B : y = f(x)} = graph(f).   (2.39)
Conversely, if B ≠ ∅, then every relation R ⊆ A × B uniquely corresponds to the function
f_R : A −→ P(B),   f_R(x) = {y ∈ B : x R y}.   (2.40)

Definition 2.21. Let R be a relation on the set A.

(a) R is called reflexive if, and only if,
∀_{x∈A} x R x,   (2.41)
i.e. if, and only if, every element is related to itself.

(b) R is called symmetric if, and only if,
∀_{x,y∈A} (x R y ⇒ y R x),   (2.42)
i.e. if, and only if, x is related to y if, and only if, y is related to x.

(c) R is called antisymmetric if, and only if,
∀_{x,y∈A} ((x R y ∧ y R x) ⇒ x = y),   (2.43)
i.e. if, and only if, the only possibility for x to be related to y at the same time that y is related to x is in the case x = y.

(d) R is called transitive if, and only if,
∀_{x,y,z∈A} ((x R y ∧ y R z) ⇒ x R z),   (2.44)
i.e. if, and only if, the relatedness of x and y together with the relatedness of y and z implies the relatedness of x and z.

Example 2.22. The relations = and ≤ on R (or N) are reflexive, antisymmetric, and transitive; = is also symmetric, whereas ≤ is not; < is antisymmetric (since x < y ∧ y < x is always false) and transitive, but neither reflexive nor symmetric. The relation
R := {(x, y) ∈ N² : (x, y are both even) ∨ (x, y are both odd)}   (2.45)
on N is not antisymmetric, but reflexive, symmetric, and transitive. The relation
S := {(x, y) ∈ N² : y = x²}   (2.46)
is not transitive (for example, 2 S 4 and 4 S 16, but not 2 S 16), not reflexive, and not symmetric; it is only antisymmetric.
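For a relation on a small finite set, the properties of Def. 2.21 can be checked mechanically from the logical matrix of the relation. The sketch below does this in R for the even/odd relation (2.45), restricted to {1,...,6}; the restriction and the variable names are ours.

    # Relation R on A = {1,...,6} as a logical matrix M: M[x, y] is TRUE iff x R y.
    A <- 1:6
    M <- outer(A, A, function(x, y) (x %% 2) == (y %% 2))  # relation (2.45)

    reflexive  <- all(diag(M))
    symmetric  <- all(M == t(M))
    antisym    <- all(!(M & t(M)) | outer(A, A, "=="))
    transitive <- all((M %*% M > 0) <= M)  # x R z via some y must already be in M

    c(reflexive = reflexive, symmetric = symmetric,
      antisymmetric = antisym, transitive = transitive)
    # TRUE TRUE FALSE TRUE, as stated in Example 2.22.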
2.2.2 Order Relations

Definition 2.23. A relation R on a set A is called a partial order if, and only if, R is reflexive, antisymmetric, and transitive. If R is a partial order, then one usually writes x ≤ y instead of x R y. A partial order ≤ is called a total or linear order if, and only if, for each x, y ∈ A, one has x ≤ y or y ≤ x.

Notation 2.24. Given a (partial or total) order ≤ on A ≠ ∅, we write x < y if, and only if, x ≤ y and x ≠ y, calling < the strict order corresponding to ≤ (note that the strict order is never a partial order).

Definition 2.25. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A.

(a) x ∈ A is called lower (resp. upper) bound for B if, and only if, x ≤ b (resp. b ≤ x) for each b ∈ B. Moreover, B is called bounded from below (resp. from above) if, and only if, there exists a lower (resp. upper) bound for B; B is called bounded if, and only if, it is bounded from above and from below.

(b) x ∈ B is called minimum or just min (resp. maximum or max) of B if, and only if, x is a lower (resp. upper) bound for B. One writes x = min B if x is a minimum and x = max B if x is a maximum.

(c) A maximum of the set of lower bounds of B (i.e. a largest lower bound) is called infimum of B, denoted inf B; a minimum of the set of upper bounds of B (i.e. a smallest upper bound) is called supremum of B, denoted sup B.

Example 2.26. (a) For each A ⊆ R, the usual relation ≤ defines a total order on A. For A = R, we see that N has 0 and 1 as lower bounds with 1 = min N = inf N. On the other hand, N is unbounded from above. The set M := {1, 2, 3} is bounded with min M = 1, max M = 3. The positive real numbers R⁺ := {x ∈ R : x > 0} have inf R⁺ = 0, but they do not have a minimum (if x > 0, then 0 < x/2 < x).

(b) Consider A := N × N. Then
(m₁, m₂) ≤ (n₁, n₂) ⇔ m₁ ≤ n₁ ∧ m₂ ≤ n₂   (2.47)
defines a partial order on A that is not a total order (for example, neither (1, 2) ≤ (2, 1) nor (2, 1) ≤ (1, 2)). For the set
B := {(1, 1), (2, 1), (1, 2)},   (2.48)
we have inf B = min B = (1, 1); B does not have a max, but sup B = (2, 2) (if (m, n) ∈ A is an upper bound for B, then (2, 1) ≤ (m, n) implies 2 ≤ m and (1, 2) ≤ (m, n) implies 2 ≤ n, i.e. (2, 2) ≤ (m, n); since (2, 2) is clearly an upper bound for B, we have proved sup B = (2, 2)). A different order on A is the so-called lexicographic order, defined by
(m₁, m₂) ≤ (n₁, n₂) ⇔ m₁ < n₁ ∨ (m₁ = n₁ ∧ m₂ ≤ n₂).   (2.49)
In contrast to the order from (2.47), the lexicographic order does define a total order on A.
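The two orders of Example 2.26(b) differ precisely in totality, which one can probe by comparing pairs in R; the helper names prod_le and lex_le below are ad hoc.

    # Example 2.26(b): two orders on N x N, each pair given as c(m1, m2).
    prod_le <- function(a, b) a[1] <= b[1] && a[2] <= b[2]                   # (2.47)
    lex_le  <- function(a, b) a[1] < b[1] || (a[1] == b[1] && a[2] <= b[2])  # (2.49)

    x <- c(1, 2); y <- c(2, 1)
    c(prod_le(x, y), prod_le(y, x))  # FALSE FALSE: x, y incomparable in (2.47)
    c(lex_le(x, y), lex_le(y, x))    # TRUE FALSE: the lexicographic order is total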
Lemma 2.27. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. Then the relation ≥, defined by
x ≥ y :⇔ y ≤ x,   (2.50)
is also a partial order on A. Moreover, using obvious notation, we have, for each x ∈ A,
x ≤-lower bound for B ⇔ x ≥-upper bound for B,   (2.51a)
x ≤-upper bound for B ⇔ x ≥-lower bound for B,   (2.51b)
x = min_≤ B ⇔ x = max_≥ B,   (2.51c)
x = max_≤ B ⇔ x = min_≥ B,   (2.51d)
x = inf_≤ B ⇔ x = sup_≥ B,   (2.51e)
x = sup_≤ B ⇔ x = inf_≥ B.   (2.51f)

Proof. Reflexivity, antisymmetry, and transitivity of ≤ clearly imply the respective properties for ≥. Moreover,
x ≤-lower bound for B ⇔ ∀_{b∈B} x ≤ b ⇔ ∀_{b∈B} b ≥ x ⇔ x ≥-upper bound for B,
proving (2.51a). Analogously, we obtain (2.51b). Next, (2.51c) and (2.51d) are implied by (2.51a) and (2.51b), respectively. Finally, (2.51e) is proved by
x = inf_≤ B ⇔ x = max_≤ {y ∈ A : y ≤-lower bound for B} ⇔ x = min_≥ {y ∈ A : y ≥-upper bound for B} ⇔ x = sup_≥ B,
and (2.51f) follows analogously.

Proposition 2.28. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. The elements max B, min B, sup B, inf B are all unique, provided they exist.

Proof. Exercise.

Definition 2.29. Let A, B be nonempty sets with partial orders, both denoted by ≤ (even though they might be different). A function f : A −→ B is called (strictly) isotone, order-preserving, or increasing if, and only if,
∀_{x,y∈A} (x < y ⇒ f(x) ≤ f(y)) (resp. f(x) < f(y));   (2.52a)
f is called (strictly) antitone, order-reversing, or decreasing if, and only if,
∀_{x,y∈A} (x < y ⇒ f(x) ≥ f(y)) (resp. f(x) > f(y)).   (2.52b)
Functions that are (strictly) isotone or antitone are called (strictly) monotone.

Proposition 2.30. Let A, B be nonempty sets with partial orders, both denoted by ≤.

(a) A (strictly) isotone function f : A −→ B becomes a (strictly) antitone function and vice versa if precisely one of the relations ≤ is replaced by ≥.

(b) If the order ≤ on A is total and f : A −→ B is strictly isotone or strictly antitone, then f is one-to-one.

(c) If the order ≤ on A is total and f : A −→ B is invertible and strictly isotone (resp. antitone), then f⁻¹ is also strictly isotone (resp. antitone).

Proof. (a) is immediate from (2.52).

(b): Due to (a), it suffices to consider the case that f is strictly isotone. If f is strictly isotone and x ≠ y, then x < y or y < x, since the order on A is total. Thus, f(x) < f(y) or f(y) < f(x), i.e. f(x) ≠ f(y) in every case, showing f is one-to-one.

(c): Again, due to (a), it suffices to consider the isotone case. If u, v ∈ B are such that u < v, then u = f(f⁻¹(u)), v = f(f⁻¹(v)), and the strict isotonicity of f imply f⁻¹(u) < f⁻¹(v) (we are using that the order on A is total – otherwise, f⁻¹(u) and f⁻¹(v) need not be comparable).

Example 2.31. (a) f : N −→ N, f(n) := 2n, is strictly increasing; every constant map on N is both increasing and decreasing, but not strictly increasing or decreasing. All maps occurring in (2.25) are neither increasing nor decreasing.

(b) The map f : R −→ R, f(x) := −2x, is invertible and strictly decreasing, and so is f⁻¹ : R −→ R, f⁻¹(x) := −x/2.

(c) The following counterexamples show that the assertions of Prop. 2.30(b),(c) are no longer correct if one does not assume the order on A to be total. Let A be the set from (2.48) (where it had been called B) with the (nontotal) order from (2.47). The map
f : A −→ N,   f(1, 1) := 1,   f(1, 2) := 2,   f(2, 1) := 2,
is strictly isotone, but not one-to-one. The map
f : A −→ {1, 2, 3},   f(1, 1) := 1,   f(1, 2) := 2,   f(2, 1) := 3,
is strictly isotone and invertible; however, f⁻¹ is not isotone (since 2 < 3, but f⁻¹(2) = (1, 2) and f⁻¹(3) = (2, 1) are not comparable, i.e. f⁻¹(2) ≤ f⁻¹(3) is not true).

2.2.3 Equivalence Relations

Definition 2.32. Let R be a relation on a set A.

(a) R is called an equivalence relation if, and only if, R is reflexive, symmetric, and transitive. If R is an equivalence relation, then one often writes x ∼ y instead of x R y.

(b) Let ∼ := R be an equivalence relation on A. For each x ∈ A, define
[x] := {y ∈ A : x ∼ y}   (2.53)
and call [x] the equivalence class of x. Moreover, each y ∈ [x] is called a representative of [x]. The set of all equivalence classes A/∼ := {[x] : x ∈ A} is called the quotient set of A by ∼, and the map
π : A −→ A/∼,   x ↦ [x],   (2.54)
is called the corresponding quotient map, canonical map, or canonical projection.

—

The following Th. 2.33 shows that the equivalence classes of an equivalence relation on a nonempty set A decompose A into disjoint sets and, conversely, that a given decomposition of a nonempty set A into disjoint nonempty sets A_i, i ∈ I, gives rise to a unique equivalence relation ∼ on A such that the A_i are precisely the equivalence classes corresponding to ∼:
Theorem 2.33. Let A be a nonempty set.

(a) Given a disjoint union A = ⋃̇_{i∈I} A_i with every A_i ≠ ∅ (a so-called decomposition of A), an equivalence relation on A is defined by
x ∼ y :⇔ ∃_{i∈I} (x ∈ A_i ∧ y ∈ A_i).   (2.55)
Moreover, for the equivalence classes given by ∼, one then has
∀_{x∈A} ∀_{i∈I} (x ∈ A_i ⇔ A_i = [x]).   (2.56)

(b) Given an equivalence relation ∼ on a nonempty set A, the equivalence classes given by ∼ form a decomposition of A: One has
∀_{x,y∈A} (([x] = [y] ⇔ x ∼ y) ∧ ([x] ∩ [y] = ∅ ⇔ ¬(x ∼ y)))   (2.57)
and
A = ⋃̇_{i∈I} A_i,   (2.58)
where I := A/∼ is the quotient set of Def. 2.32(b) and A_i := i for each i ∈ I.

Proof. (a): That ∼ is symmetric is immediate from (2.55). If x ∈ A, then, as A is the union of the A_i, there exists i ∈ I with x ∈ A_i, showing x ∼ x, i.e. ∼ is reflexive. If x, y, z ∈ A with x ∼ y and y ∼ z, then there exist i, j ∈ I with x, y ∈ A_i and y, z ∈ A_j. Then y ∈ A_i ∩ A_j, implying i = j (as the union is disjoint) and x ∼ z, showing ∼ to be transitive as well. Thus, we have shown ∼ to be an equivalence relation. Now consider x ∈ A and i ∈ I. If A_i = [x], then x ∼ x implies x ∈ [x] = A_i. Conversely, assume x ∈ A_i. Then
y ∈ A_i ⇔ x ∼ y ⇔ y ∈ [x]
(the first equivalence holding by (2.55)), proving A_i = [x]. Hence, we have verified (2.56) and (a).

(b): Let x, y ∈ A. If [x] = [y], then (as y ∼ y) y ∈ [y] = [x], implying x ∼ y. Conversely, assume x ∼ y. Then z ∈ [y] implies y ∼ z, implying x ∼ z (since ∼ is transitive) and, thus, z ∈ [x] and [y] ⊆ [x]. From this and the symmetry of ∼, we also obtain that x ∼ y implies y ∼ x, which implies [x] ⊆ [y]. Altogether, we have [x] = [y]. Thus, we have established the first equivalence of (2.57). In consequence, we also have [x] ≠ [y] ⇔ ¬(x ∼ y). To prove the second equivalence of (2.57), we now show [x] ≠ [y] ⇔ [x] ∩ [y] = ∅: If [x] ∩ [y] = ∅, then [x] = [y] could only hold for [x] = [y] = ∅. However, x ∈ [x] and y ∈ [y], showing [x] ≠ [y]. For the converse, we argue via contraposition and assume z ∈ [x] ∩ [y]. Then x ∼ z and y ∼ z and, by symmetry and transitivity of ∼, x ∼ y and [x] = [y], proving the second equivalence of (2.57). It remains to verify (2.58). From (2.57), we know the elements of A/∼ to be disjoint. On the other hand, if x ∈ A, then x ∈ [x] ∈ A/∼, showing A to be the union of the A_i, thereby proving (2.58) and the theorem.

Example 2.34. (a) The equality relation = is an equivalence relation on each A ≠ ∅, where, for each x ∈ A, one has [x] = {x}.

(b) The relation R defined in (2.45) is an equivalence relation on N. Here, R yields precisely two equivalence classes, one consisting of all even numbers and one consisting of all odd numbers.

Remark 2.35. If ∼ is an equivalence relation on a nonempty set A, then, clearly, the quotient map π : A −→ A/∼, x ↦ [x], is always surjective. It is injective (and, thus, bijective) if, and only if, every equivalence class has precisely one element, i.e. if, and only if, ∼ is the equality relation =.
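On a finite set, the correspondence of Th. 2.33 between equivalence relations and decompositions is directly computable: grouping elements by a class label recovers the partition. A short R sketch for the even/odd relation of Example 2.34(b), truncated to {1,...,10} (the truncation is ours, purely for illustration):

    # Equivalence classes of the relation (2.45) on the finite piece {1,...,10}.
    A <- 1:10
    label <- A %% 2      # x ~ y if, and only if, label[x] == label[y]
    split(A, label)      # $`0`: 2 4 6 8 10   $`1`: 1 3 5 7 9
    # The two classes are disjoint and their union is A, as in Th. 2.33(b).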
Example 2.36. (a) An important application of equivalence relations and quotient sets is the construction of the set Z = {..., −2, −1, 0, 1, 2, ...} of integers from the set N₀ of natural numbers (including 0) of Def. 1.28: One actually defines Z as the quotient set with respect to the following equivalence relation ∼ on N₀ × N₀, where the relation ∼ on N₀ × N₀ is defined² by
(a, b) ∼ (c, d) :⇔ a + d = b + c.   (2.59)
We verify that (2.59) does, indeed, define an equivalence relation on N₀ × N₀: If a, b ∈ N₀, then a + b = b + a shows (a, b) ∼ (a, b), proving ∼ to be reflexive. If a, b, c, d ∈ N₀, then
(a, b) ∼ (c, d) ⇒ a + d = b + c ⇒ c + b = d + a ⇒ (c, d) ∼ (a, b),
proving ∼ to be symmetric. If a, b, c, d, e, f ∈ N₀, then
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (a + d = b + c ∧ c + f = d + e) ⇒ a + d + c + f = b + c + d + e ⇒ a + f = b + e ⇒ (a, b) ∼ (e, f),
proving ∼ to be transitive and an equivalence relation. Thus, we can, indeed, define
Z := (N₀ × N₀)/∼ = {[(a, b)] : (a, b) ∈ N₀ × N₀}.   (2.60)
To simplify notation, in the following, we will write
[a, b] := [(a, b)]   (2.61)
for the equivalence class of (a, b) with respect to ∼. The map
ι : N₀ −→ Z,   ι(n) := [n, 0],   (2.62)
is injective (since ι(m) = [m, 0] = ι(n) = [n, 0] implies m + 0 = 0 + n, i.e. m = n). It is customary to identify N₀ with ι(N₀), as this usually does not cause any confusion. One then just writes n instead of [n, 0] and −n instead of [0, n] = −[n, 0] (we will come back to the addition on Z later, and then this equation will make more sense, cf. Th. 4.15).

²In (2.59), we employ the usual addition on N₀. Its mathematically precise definition is actually somewhat tedious and, at this stage, we do not even have all the necessary prerequisites in place, yet. Still, it should be possible to follow the present example, using one’s intuitive, informal understanding of the addition on N₀. The precise definition can be found in [Phi16, Sec. D.1] and interested readers might want to study the definition in [Phi16, Sec. D.1] once we have introduced the notions of induction and recursion.

(b) Having constructed the set of integers Z in (a), in a next step, one can perform a similar construction to obtain the set of rational numbers Q. One defines Q as the quotient set with respect to the following equivalence relation ∼ on Z × (Z \ {0}), where the relation ∼ on Z × (Z \ {0}) is defined³ by
(a, b) ∼ (c, d) :⇔ a · d = b · c.   (2.63)
Noting that (2.63) is precisely the same as (2.59) if + is replaced by ·, the proof from (a) also shows that (2.63) does, indeed, define an equivalence relation on Z × (Z \ {0}): One merely replaces each + with · and each N₀ with Z or Z \ {0}, respectively. The only modification needed occurs for 0 ∈ {a, c, e} in the proof of transitivity (in this case, the proof of (a) yields adcf = 0 = bcde, which does not imply af = be), where one now argues, for a = 0,
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (ad = 0 = bc ∧ cf = de) ⇒ c = 0 (since b ≠ 0) ⇒ e = 0 (since d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f);
for c = 0,
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (ad = 0 = bc ∧ cf = 0 = de) ⇒ a = e = 0 (since d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f);
and, for e = 0,
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (ad = bc ∧ cf = 0 = de) ⇒ c = 0 (since f ≠ 0) ⇒ a = 0 (since d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f).
Thus, we can, indeed, define
Q := (Z × (Z \ {0}))/∼ = {[(a, b)] : (a, b) ∈ Z × (Z \ {0})}.   (2.64)
As is common, we will write
a/b := [(a, b)]   (2.65)
for the equivalence class of (a, b) with respect to ∼. The map
ι : Z −→ Q,   ι(k) := k/1,   (2.66)
is injective (since ι(k) = k/1 = ι(l) = l/1 implies k · 1 = l · 1, i.e. k = l). It is customary to identify Z with ι(Z), as this usually does not cause any confusion. One then just writes k instead of k/1.

³In (2.63), we employ the usual multiplication on Z, as we used addition on N₀ in (2.59) above, this time appealing to the reader’s intuitive, informal understanding of multiplication on Z. We will actually provide a mathematically precise definition of multiplication on Z later in this class in Ex. 4.33, making use of the definition of Z given in (a) (readers who do not want to wait can, e.g., consult [Phi16, Def. D.16]).
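The pair representation of Example 2.36(b) can be played with directly: represent a rational as an integer pair (a, b), b ≠ 0, and test equivalence by cross-multiplication as in (2.63). A minimal R sketch; the function name rat_eq is ours:

    # Rationals as integer pairs (a, b) with b != 0, per Example 2.36(b).
    rat_eq <- function(p, q) p[1] * q[2] == p[2] * q[1]  # (2.63): a*d == b*c

    rat_eq(c(1, 2), c(3, 6))   # TRUE: 1/2 and 3/6 represent the same class
    rat_eq(c(1, 2), c(2, 3))   # FALSE
    # Reflexivity, symmetry, and transitivity can be spot-checked the same way.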
—

While the set of real numbers R can now also be constructed from the set Q, making use of the notions of equivalence relation and quotient set, the construction is more complicated, and it also makes use of additional notions from the field of Analysis. Thus, this construction is not within the scope of this class, and the interested reader is referred to [Phi16, Sec. D.5].

3 Natural Numbers, Induction, and the Size of Sets

3.1 Induction and Recursion

One of the most useful proof techniques is the method of induction – it is used in situations where one needs to verify the truth of statements φ(n) for each n ∈ N, i.e. the truth of the statement
∀_{n∈N} φ(n).   (3.1)
Induction is based on the fact that N satisfies the so-called Peano axioms:

P1: N contains a special element called one, denoted 1.

P2: There exists an injective map S : N −→ N \ {1}, called the successor function (for each n ∈ N, S(n) is called the successor of n).

P3: If a subset A of N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A, then A is equal to N. Written as a formula, the third axiom is:
∀_{A∈P(N)} ((1 ∈ A ∧ S(A) ⊆ A) ⇒ A = N).

Remark 3.1. In Def. 1.28, we had introduced the natural numbers N := {1, 2, 3, ...}. The successor function is S(n) = n + 1. In axiomatic set theory, one starts with the Peano axioms and shows that the axioms of set theory allow the construction of a set N which satisfies the Peano axioms. One then defines 2 := S(1), 3 := S(2), ..., n + 1 := S(n). The interested reader can find more details in [Phi16, Sec. D.1].

Theorem 3.2 (Principle of Induction). Suppose, for each n ∈ N, φ(n) is a statement (i.e. a predicate of n in the language of Def. 1.31). If (a) and (b) both hold, where

(a) φ(1) is true,

(b) ∀_{n∈N} (φ(n) ⇒ φ(n + 1)),

then (3.1) is true, i.e. φ(n) is true for every n ∈ N.

Proof. Let A := {n ∈ N : φ(n)}. We have to show A = N. Since 1 ∈ A by (a), and
n ∈ A ⇒ φ(n) ⇒ φ(n + 1) ⇒ S(n) = n + 1 ∈ A   (3.2)
(the second implication by (b)), i.e. S(A) ⊆ A, the Peano axiom P3 implies A = N.

Remark 3.3. To prove some φ(n) for each n ∈ N by induction according to Th. 3.2 consists of the following two steps:

(a) Prove φ(1), the so-called base case.

(b) Perform the inductive step, i.e. prove that φ(n) (the induction hypothesis) implies φ(n + 1).

Example 3.4. We use induction to prove the statement
∀_{n∈N} 1 + 2 + ··· + n = n(n + 1)/2,   (3.3)
where we denote the statement 1 + 2 + ··· + n = n(n + 1)/2 by φ(n).
Base Case (n = 1): 1 = (1 · 2)/2 holds, i.e. φ(1) is true.
Induction Hypothesis: Assume φ(n), i.e. 1 + 2 + ··· + n = n(n + 1)/2.
Induction Step: One computes, using φ(n),
1 + 2 + ··· + n + (n + 1) = n(n + 1)/2 + n + 1 = (n(n + 1) + 2n + 2)/2 = (n² + 3n + 2)/2 = (n + 1)(n + 2)/2,   (3.4)
i.e. φ(n + 1) holds and the induction is complete.

Corollary 3.5. Theorem 3.2 remains true if (b) is replaced by
∀_{n∈N} ((∀_{1≤m≤n} φ(m)) ⇒ φ(n + 1)).   (3.5)

Proof. If, for each n ∈ N, we use ψ(n) to denote ∀_{1≤m≤n} φ(m), then (3.5) is equivalent to ∀_{n∈N} (ψ(n) ⇒ ψ(n + 1)), i.e. to Th. 3.2(b) with φ replaced by ψ. Thus, Th. 3.2 implies ψ(n) holds true for each n ∈ N, i.e. φ(n) holds true for each n ∈ N.
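Induction is a proof technique, not an algorithm, but closed formulas such as (3.3) are conveniently spot-checked numerically, e.g. in R (a check on a finite range, of course not a proof):

    # Spot-check of (3.3): 1 + 2 + ... + n == n(n+1)/2 for the first few n.
    n <- 1:20
    all(cumsum(n) == n * (n + 1) / 2)   # TRUE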
Corollary 3.6. Let I be an index set. Suppose, for each i ∈ I, φ(i) is a statement. If there is a bijective map f : N −→ I and (a) and (b) both hold, where

(a) φ(f(1)) is true,

(b) ∀_{n∈N} (φ(f(n)) ⇒ φ(f(n + 1))),

then φ(i) is true for every i ∈ I. Finite Induction: The above assertion remains true if f : {1, ..., m} −→ I is bijective for some m ∈ N and N in (b) is replaced by {1, ..., m − 1}.

Proof. If, for each n ∈ N, we use ψ(n) to denote φ(f(n)), then Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, we have n := f⁻¹(i) ∈ N with f(n) = i, showing that φ(i) = φ(f(n)) = ψ(n) is true. For the finite induction, let ψ(n) denote (n ≤ m ∧ φ(f(n))) ∨ n > m. Then, for 1 ≤ n < m, we have ψ(n) ⇒ ψ(n + 1) due to (b). For n ≥ m, we also have ψ(n) ⇒ ψ(n + 1) due to n ≥ m ⇒ n + 1 > m. Thus, Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, it is n := f⁻¹(i) ∈ {1, ..., m} with f(n) = i. Since n ≤ m ∧ ψ(n) ⇒ φ(f(n)), we obtain that φ(i) is true.

Apart from providing a widely employable proof technique, the most important application of Th. 3.2 is the possibility to define sequences (i.e. functions with domain N, cf. Def. 2.15(b)) inductively, using so-called recursion:

Theorem 3.7 (Recursion Theorem). Let A be a nonempty set and x ∈ A. Given a sequence of functions (f_n)_{n∈N}, where f_n : Aⁿ −→ A, there exists a unique sequence (x_n)_{n∈N} in A satisfying the following two conditions:

(i) x₁ = x.

(ii) ∀_{n∈N} x_{n+1} = f_n(x₁, ..., x_n).

The same holds if N is replaced by an index set I as in Cor. 3.6.

Proof. To prove uniqueness, let (x_n)_{n∈N} and (y_n)_{n∈N} be sequences in A, both satisfying (i) and (ii), i.e.
x₁ = y₁ = x   (3.6a)
and
∀_{n∈N} (x_{n+1} = f_n(x₁, ..., x_n) ∧ y_{n+1} = f_n(y₁, ..., y_n)).   (3.6b)
We prove by induction (in the form of Cor. 3.5) that (x_n)_{n∈N} = (y_n)_{n∈N}, i.e.
∀_{n∈N} x_n = y_n,   (3.7)
where we denote the statement x_n = y_n by φ(n).
Base Case (n = 1): φ(1) is true according to (3.6a).
Induction Hypothesis: Assume φ(m) for each m ∈ {1, ..., n}, i.e. x_m = y_m holds for each m ∈ {1, ..., n}.
Induction Step: One computes, using (3.6b) and φ(1), ..., φ(n),
x_{n+1} = f_n(x₁, ..., x_n) = f_n(y₁, ..., y_n) = y_{n+1},   (3.8)
i.e. φ(n + 1) holds and the induction is complete.

To prove existence, we have to show that there is a function F : N −→ A such that the following two conditions hold:
F(1) = x,   (3.9a)
∀_{n∈N} F(n + 1) = f_n(F(1), ..., F(n)).   (3.9b)
To this end, let
F := { B ⊆ N × A :  (1, x) ∈ B  ∧  ∀_{n∈N, (1,a₁),...,(n,a_n)∈B} (n + 1, f_n(a₁, ..., a_n)) ∈ B }   (3.10)
and
G := ⋂_{B∈F} B.   (3.11)
Note that G is well-defined, as N × A ∈ F. Also, clearly, G ∈ F. We would like to define F such that G = graph(F). For this to be possible, we will show, by induction,
∀_{n∈N} ∃!_{x_n∈A} (n, x_n) ∈ G,   (3.12)
where we denote the statement ∃!_{x_n∈A} (n, x_n) ∈ G by φ(n).
Base Case (n = 1): From the definition of G, we know (1, x) ∈ G. If (1, a) ∈ G with a ≠ x, then H := G \ {(1, a)} ∈ F, implying G ⊆ H in contradiction to (1, a) ∉ H. This shows a = x and proves φ(1).
Induction Hypothesis: Assume φ(m) for each m ∈ {1, ..., n}.
Induction Step: From the induction hypothesis, we know
∃!_{(x₁,...,x_n)∈Aⁿ} (1, x₁), ..., (n, x_n) ∈ G.
Thus, if we let x_{n+1} := f_n(x₁, ..., x_n), then (n + 1, x_{n+1}) ∈ G by the definition of G. If (n + 1, a) ∈ G with a ≠ x_{n+1}, then H := G \ {(n + 1, a)} ∈ F (using the uniqueness of the (1, x₁), ..., (n, x_n) ∈ G), implying G ⊆ H in contradiction to (n + 1, a) ∉ H. This shows a = x_{n+1}, proves φ(n + 1), and completes the induction.

Due to (3.12), we can now define F : N −→ A, F(n) := x_n, and the definition of G then guarantees the validity of (3.9).
Example 3.8. In many applications of Th. 3.7, one has functions g_n : A −→ A and uses
∀_{n∈N} f_n : Aⁿ −→ A,   f_n(x₁, ..., x_n) := g_n(x_n).   (3.13)
Here are some important concrete examples:

(a) The factorial function F : N₀ −→ N, n ↦ n!, is defined recursively by
0! := 1,   1! := 1,   ∀_{n∈N} (n + 1)! := (n + 1) · n!,   (3.14a)
i.e. we have A = N and g_n(x) := (n + 1) · x. So we obtain
(n!)_{n∈N₀} = (1, 1, 2, 6, 24, 120, ...).   (3.14b)

(b) Summation Symbol: On A = R (or, more generally, on every set A, where an addition + : A × A −→ A is defined), define recursively, for each given (possibly finite) sequence (a₁, a₂, ...) in A:
∑_{i=1}^{1} a_i := a₁,   ∑_{i=1}^{n+1} a_i := a_{n+1} + ∑_{i=1}^{n} a_i for n ≥ 1,   (3.15a)
i.e. we have
g_n : A −→ A,   g_n(x) := x + a_{n+1}.   (3.15b)
In (3.15a), one can also use other symbols for i, except a and n; for a finite sequence, n needs to be less than the maximal index of the finite sequence. More generally, if I is an index set and φ : {1, ..., n} −→ I a bijective map, then define
∑_{i∈I} a_i := ∑_{i=1}^{n} a_{φ(i)}.   (3.15c)
The commutativity of addition implies that the definition in (3.15c) is actually independent of the chosen bijective map φ (cf. Th. B.4 in the Appendix). Also define
∑_{i∈∅} a_i := 0   (3.15d)
(for a general A, 0 is meant to be an element such that a + 0 = 0 + a = a for each a ∈ A, and we can even define this if 0 ∉ A).

(c) Product Symbol: On A = R (or, more generally, on every set A, where a multiplication · : A × A −→ A is defined), define recursively, for each given (possibly finite) sequence (a₁, a₂, ...) in A:
∏_{i=1}^{1} a_i := a₁,   ∏_{i=1}^{n+1} a_i := a_{n+1} · ∏_{i=1}^{n} a_i for n ≥ 1,   (3.16a)
i.e. we have
g_n : A −→ A,   g_n(x) := a_{n+1} · x.   (3.16b)
In (3.16a), one can also use other symbols for i, except a and n; for a finite sequence, n needs to be less than the maximal index of the finite sequence. More generally, if I is an index set and φ : {1, ..., n} −→ I a bijective map, then define
∏_{i∈I} a_i := ∏_{i=1}^{n} a_{φ(i)}.   (3.16c)
The commutativity of multiplication implies that the definition in (3.16c) is actually independent of the chosen bijective map φ (cf. Th. B.4 in the Appendix); however, we will see later that, for a general multiplication on a set A, commutativity will not always hold (an important example will be matrix multiplication), and, in that case, the definition in (3.16c) does, in general, depend on the chosen bijective map φ. Also define
∏_{i∈∅} a_i := 1   (3.16d)
(for a general A, 1 is meant to be an element such that a · 1 = 1 · a = a for each a ∈ A, and we can even define this if 1 ∉ A).

Example 3.9. As an (academic) example where, in each step, the recursive definition does depend on all previously computed values, consider the sequence (x_n)_{n∈N}, defined by
x₁ := 1,   ∀_{n∈N} x_{n+1} := (1/n) ∏_{i=1}^{n} x_i,
i.e. by setting A := Q and
f_n : Aⁿ −→ A,   f_n(x₁, ..., x_n) := (1/n) ∏_{i=1}^{n} x_i.
One obtains
x₁ = 1,   x₂ = f₁(1) = 1,   x₃ = f₂(1, 1) = 1/2,   x₄ = f₃(1, 1, 1/2) = 1/6,
x₅ = f₄(1, 1, 1/2, 1/6) = 1/48,   x₆ = f₅(1, 1, 1/2, 1/6, 1/48) = 1/2880,   ...

—

In the above recursive definitions, we have always explicitly specified A and the g_n or f_n. However, in the literature, as well as in the rest of this class, most of the time, the g_n or f_n are not provided explicitly.
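The recursion of Example 3.9 translates directly into a loop that keeps the whole history (x₁, ..., x_n), in line with Th. 3.7; a short R sketch:

    # Example 3.9: x_1 = 1, x_{n+1} = (1/n) * prod(x_1, ..., x_n).
    x <- 1
    for (n in 1:5) x <- c(x, prod(x) / n)
    x   # 1  1  1/2  1/6  1/48  1/2880  (printed as decimals)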
3.2 Cardinality: The Size of Sets

Cardinality measures the size of sets. For a finite set A, it is precisely the number of elements in A. For an infinite set, it classifies the set’s degree or level of infinity (it turns out that not all infinite sets have the same size).

3.2.1 Definition and General Properties

Definition 3.10. (a) The sets A, B are defined to have the same cardinality or the same size if, and only if, there exists a bijective map ϕ : A −→ B. According to Th. 3.11 below, this defines an equivalence relation on every set of sets.

(b) The cardinality of a set A is n ∈ N (denoted #A = n) if, and only if, there exists a bijective map ϕ : A −→ {1, ..., n}. The cardinality of ∅ is defined as 0, i.e. #∅ := 0. A set A is called finite if, and only if, there exists n ∈ N₀ such that #A = n; A is called infinite if, and only if, A is not finite, denoted #A = ∞ (in the strict sense, this is an abuse of notation, since ∞ is not a cardinality – for example, #N = ∞ and #P(N) = ∞, but N and P(N) do not have the same cardinality, since the power set P(A) is always strictly bigger than A (see Th. 3.16 below) – #A = ∞ is merely an abbreviation for the statement “A is infinite”). The interested student finds additional material regarding characterizations of infinite sets in Th. A.54 of the Appendix.

(c) The set A is called countable if, and only if, A is finite or A has the same cardinality as N. Otherwise, A is called uncountable.

Theorem 3.11. Let M be a set of sets. Then the relation ∼ on M, defined by
A ∼ B :⇔ A and B have the same cardinality,
constitutes an equivalence relation on M.

Proof. According to Def. 2.32, we have to prove that ∼ is reflexive, symmetric, and transitive. According to Def. 3.10(a), A ∼ B holds for A, B ∈ M if, and only if, there exists a bijective map f : A −→ B. Thus, since the identity Id : A −→ A is bijective, A ∼ A, showing ∼ is reflexive. If A ∼ B, then there exists a bijective map f : A −→ B, and f⁻¹ is a bijective map f⁻¹ : B −→ A, showing B ∼ A and that ∼ is symmetric. If A ∼ B and B ∼ C, then there are bijective maps f : A −→ B and g : B −→ C. Then, according to Th. 2.14, the composition (g ◦ f) : A −→ C is also bijective, proving A ∼ C and that ∼ is transitive.

Theorem 3.12 (Schröder-Bernstein). Let A, B be sets. The following statements are equivalent (even without assuming the axiom of choice):

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

We will base the proof of the Schröder-Bernstein theorem on the following lemma (for an alternative proof, see [Phi16, Th. A.56]):

Lemma 3.13. Let A be a set. Consider P(A) to be endowed with the partial order given by set inclusion, i.e., for each X, Y ∈ P(A), X ≤ Y if, and only if, X ⊆ Y. If F : P(A) −→ P(A) is isotone with respect to that order, then F has a fixed point, i.e. F(X₀) = X₀ for some X₀ ∈ P(A).

Proof. Define
A := {X ∈ P(A) : F(X) ⊆ X},   X₀ := ⋂_{X∈A} X
(X₀ is well-defined, since F(A) ⊆ A, i.e. A ∈ A and A ≠ ∅). Suppose X ∈ A, i.e. F(X) ⊆ X and X₀ ⊆ X. Then F(X₀) ⊆ F(X) ⊆ X due to the isotonicity of F. Thus, F(X₀) ⊆ X for every X ∈ A, i.e. F(X₀) ⊆ X₀.
Using the isotonicity of F again shows F(F(X₀)) ⊆ F(X₀), implying F(X₀) ∈ A and, thus, X₀ ⊆ F(X₀), i.e. F(X₀) = X₀, as desired.

Proof of Th. 3.12. (i) trivially implies (ii), as one can simply set f := φ and g := φ⁻¹. It remains to show (ii) implies (i). Thus, let f : A −→ B and g : B −→ A be injective. To apply Lem. 3.13, define
F : P(A) −→ P(A),   F(X) := A \ g(B \ f(X)),
and note
X ⊆ Y ⊆ A ⇒ f(X) ⊆ f(Y) ⇒ B \ f(Y) ⊆ B \ f(X) ⇒ g(B \ f(Y)) ⊆ g(B \ f(X)) ⇒ F(X) ⊆ F(Y).
Thus, by Lem. 3.13, F has a fixed point X₀. We claim that a bijection is obtained via setting
φ : A −→ B,   φ(x) := { f(x) for x ∈ X₀, g⁻¹(x) for x ∉ X₀ }.
First, φ is well-defined, since x ∉ X₀ = F(X₀) implies x ∈ g(B \ f(X₀)). To verify that φ is injective, let x, y ∈ A, x ≠ y. If x, y ∈ X₀, then φ(x) ≠ φ(y), as f is injective. If x, y ∈ A \ X₀, then φ(x) ≠ φ(y), as g⁻¹ is well-defined. If x ∈ X₀ and y ∉ X₀, then φ(x) ∈ f(X₀) and φ(y) ∈ B \ f(X₀), once again implying φ(x) ≠ φ(y). It remains to prove surjectivity. If b ∈ f(X₀), then φ(f⁻¹(b)) = b. If b ∈ B \ f(X₀), then g(b) ∉ X₀ = F(X₀), i.e. φ(g(b)) = b, showing φ to be surjective.

Theorem 3.14. Let A, B be nonempty sets. Then the following statements are equivalent (where the implication “(ii) ⇒ (i)” makes use of the axiom of choice (AC)):

(i) There exists an injective map f : A −→ B.

(ii) There exists a surjective map g : B −→ A.

Proof. According to Th. 2.13(b), (i) is equivalent to f having a left inverse g : B −→ A (i.e. g ◦ f = Id_A), which is equivalent to g having a right inverse, which, according to Th. 2.13(a), is equivalent to (ii) (AC is used in the proof of Th. 2.13(a) to show each surjective map has a right inverse).

Corollary 3.15. Let A, B be nonempty sets. Using AC, we can expand the two equivalent statements of Th. 3.12 to the following list of equivalent statements:

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

(iii) There exist a surjective map f : A −→ B and a surjective map g : B −→ A.

(iv) There exist an injective map f₁ : A −→ B and a surjective map f₂ : A −→ B.

(v) There exist an injective map g₁ : B −→ A and a surjective map g₂ : B −→ A.

Proof. The equivalences are an immediate consequence of combining Th. 3.12 with Th. 3.14.
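On finite sets, the fixed point of Lem. 3.13 can be found by simply iterating F (the iterates from A descend and must stabilize, by isotonicity), after which the bijection φ from the proof of Th. 3.12 can be written down. The R sketch below does this for two small sets; the injections f, g and all names are ours, and note that on finite sets the two injections already force #A = #B, so the example is necessarily modest.

    # Schroeder-Bernstein on finite sets: iterate F(X) = A \ g(B \ f(X))
    # to a fixed point X0, then assemble the bijection phi.
    A <- 1:4; B <- 11:14
    f <- function(x) x + 10                  # injective f: A -> B
    g <- function(y) c(2, 1, 3, 4)[y - 10]   # injective g: B -> A
    Fmap <- function(X) setdiff(A, g(setdiff(B, f(X))))

    X0 <- A
    repeat {                                 # descending iteration stabilizes
      X1 <- Fmap(X0)
      if (setequal(X1, X0)) break
      X0 <- X1
    }
    g_inv <- function(x) B[match(x, g(B))]   # inverse of g on its image
    phi <- function(x) if (x %in% X0) f(x) else g_inv(x)
    sapply(A, phi)                           # a bijection A -> B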
Theorem 3.16. Let A be a set. There can never exist a surjective map from A onto P(A) (in this sense, the size of P(A) is always strictly bigger than the size of A; in particular, A and P(A) can never have the same size).

Proof. If A = ∅, then there is nothing to prove. For nonempty A, the idea is to conduct a proof by contradiction. To this end, assume there does exist a surjective map f : A −→ P(A) and define
B := {x ∈ A : x ∉ f(x)}.
Now B is a subset of A, i.e. B ∈ P(A), and the assumption that f is surjective implies the existence of a ∈ A such that f(a) = B. If a ∈ B, then a ∉ f(a) = B, i.e. a ∈ B implies a ∈ B ∧ ¬(a ∈ B), so that the principle of contradiction tells us a ∉ B must be true. However, a ∉ B implies a ∈ f(a) = B, i.e., this time, the principle of contradiction tells us a ∈ B must be true. In conclusion, we have shown that our original assumption that there exists a surjective map f : A −→ P(A) implies a ∈ B ∧ ¬(a ∈ B), i.e., according to the principle of contradiction, no surjective map from A onto P(A) can exist.

3.2.2 Finite Sets

While many results concerning cardinalities of finite sets seem intuitively clear, as a mathematician, one still has to provide a rigorous proof. Providing such proofs in the present section also provides us with the opportunity to see more examples of induction proofs. We begin by showing finite cardinalities to be uniquely determined. The key is the following theorem:

Theorem 3.17. If m, n ∈ N and the map f : {1, ..., m} −→ {1, ..., n} is bijective, then m = n.

Proof. We conduct the proof via induction on m.
Base Case (m = 1): If m = 1, then the surjectivity of f implies n = 1.
Induction Step: Let m > 1. From the bijective map f, we define the map
g : {1, ..., m} −→ {1, ..., n},   g(x) := { n for x = m, f(m) for x = f⁻¹(n), f(x) otherwise }.
Then g is bijective, since it is the composition g = h ◦ f of the bijective map f with the bijective map h : {1, ..., n} −→ {1, ..., n} that interchanges f(m) and n and leaves all other elements fixed. Thus, the restriction g↾_{{1,...,m−1}} : {1, ..., m − 1} −→ {1, ..., n − 1} must also be bijective, such that the induction hypothesis yields m − 1 = n − 1, which, in turn, implies m = n, as desired.

Corollary 3.18. Let m, n ∈ N and let A be a set. If #A = m and #A = n, then m = n.

Proof. If #A = m, then, according to Def. 3.10(b), there exists a bijective map f : A −→ {1, ..., m}. Analogously, if #A = n, then there exists a bijective map g : A −→ {1, ..., n}. In consequence, we have the bijective map (g ◦ f⁻¹) : {1, ..., m} −→ {1, ..., n}, such that Th. 3.17 yields m = n.
Theorem 3.19. Let A ≠ ∅ be a finite set.

(a) If B ⊆ A with A ≠ B, then B is finite with #B < #A.

(b) If a ∈ A, then #(A \ {a}) = #A − 1.

Proof. For #A = n ∈ N, we use induction to prove (a) and (b) simultaneously, i.e. we show
∀_{n∈N} (#A = n ⇒ ∀_{B∈P(A)\{A}} ∀_{a∈A} (#B ∈ {0, ..., n − 1} ∧ #(A \ {a}) = n − 1)),
where we denote the statement after ∀_{n∈N} by φ(n).
Base Case (n = 1): In this case, A has precisely one element, i.e. B = A \ {a} = ∅, and #∅ = 0 = n − 1 proves φ(1).
Induction Step: For the induction hypothesis, we assume φ(n) to be true, i.e. we assume (a) and (b) hold for each A with #A = n. We have to prove φ(n + 1), i.e. we consider A with #A = n + 1. From #A = n + 1, we conclude the existence of a bijective map ϕ : A −→ {1, ..., n + 1}. We have to construct a bijective map ψ : A \ {a} −→ {1, ..., n}. To this end, set k := ϕ(a) and define the auxiliary function
f : {1, ..., n + 1} −→ {1, ..., n + 1},   f(x) := { n + 1 for x = k, k for x = n + 1, x for x ∉ {k, n + 1} }.
Then f ◦ ϕ : A −→ {1, ..., n + 1} is bijective by Th. 2.14, and (f ◦ ϕ)(a) = f(ϕ(a)) = f(k) = n + 1. Thus, the restriction ψ := (f ◦ ϕ)↾_{A\{a}} is the desired bijective map ψ : A \ {a} −→ {1, ..., n}, proving #(A \ {a}) = n. It remains to consider a strict subset B of A. Since B is a strict subset of A, there exists a ∈ A \ B. Thus, B ⊆ A \ {a} and, as we have already shown #(A \ {a}) = n, the induction hypothesis applies and yields that B is finite with #B ≤ #(A \ {a}) = n, i.e. #B ∈ {0, ..., n}, proving φ(n + 1), thereby completing the induction.

Theorem 3.20. For #A = #B = n ∈ N and f : A −→ B, the following statements are equivalent:

(i) f is injective.

(ii) f is surjective.

(iii) f is bijective.

Proof. Exercise.

Lemma 3.21. For each finite set A (i.e. #A = n ∈ N₀) and each B ⊆ A, one has #(A \ B) = #A − #B.

Proof. For B = ∅, the assertion is true, since #(A \ B) = #A = #A − 0 = #A − #B. For B ≠ ∅, the proof is conducted over the size of B, i.e. as a finite induction (cf. Cor. 3.6) over the set {1, ..., n}, showing
∀_{m∈{1,...,n}} (#B = m ⇒ #(A \ B) = #A − #B),
where we denote the statement after ∀ by φ(m).
Base Case (m = 1): φ(1) is precisely the statement provided by Th. 3.19(b).
Induction Step: For the induction hypothesis, we assume φ(m) with 1 ≤ m < n. To prove φ(m + 1), consider B ⊆ A with #B = m + 1. Fix an element b ∈ B and set B₁ := B \ {b}. Then #B₁ = m by Th. 3.19(b), A \ B = (A \ B₁) \ {b}, and we compute, using Th. 3.19(b) and φ(m),
#(A \ B) = #((A \ B₁) \ {b}) = #(A \ B₁) − 1 = #A − #B₁ − 1 = #A − #B,
proving φ(m + 1) and completing the induction.

Theorem 3.22. If A, B are finite sets, then
#(A ∪ B) = #A + #B − #(A ∩ B).

Proof. The assertion is clearly true if A or B is empty. If A and B are nonempty, then there exist m, n ∈ N such that #A = m and #B = n, i.e. there are bijective maps f : A −→ {1, ..., m} and g : B −→ {1, ..., n}. We first consider the case A ∩ B = ∅. We need to construct a bijective map h : A ∪ B −→ {1, ..., m + n}. To this end, we define
h : A ∪ B −→ {1, ..., m + n},   h(x) := { f(x) for x ∈ A, g(x) + m for x ∈ B }.
The bijectivity of f and g clearly implies the bijectivity of h, proving #(A ∪ B) = m + n = #A + #B. Finally, we consider the case of arbitrary A, B. Since A ∪ B = A ∪̇ (B \ A) and B \ A = B \ (A ∩ B), we can compute, using Lem. 3.21,
#(A ∪ B) = #(A ∪̇ (B \ A)) = #A + #(B \ A) = #A + #(B \ (A ∩ B)) = #A + #B − #(A ∩ B),
thereby establishing the case.

Theorem 3.23. If (A₁, ..., A_n), n ∈ N, is a finite sequence of finite sets, then
#(∏_{i=1}^{n} A_i) = #(A₁ × ··· × A_n) = ∏_{i=1}^{n} #A_i.   (3.17)

Proof. If at least one A_i is empty, then (3.17) is true, since both sides are 0. The case where all A_i are nonempty is proved by induction over n, i.e. we know k_i := #A_i ∈ N for each i ∈ {1, ..., n} and show by induction
∀_{n∈N} #(∏_{i=1}^{n} A_i) = ∏_{i=1}^{n} k_i,
where we denote the statement after ∀_{n∈N} by φ(n).
Base Case (n = 1): #(∏_{i=1}^{1} A_i) = #A₁ = k₁ = ∏_{i=1}^{1} k_i, i.e. φ(1) holds.
Induction Step: From the induction hypothesis φ(n), we obtain a bijective map ϕ : A −→ {1, ..., N}, where A := ∏_{i=1}^{n} A_i and N := ∏_{i=1}^{n} k_i. To prove φ(n + 1), we need to construct a bijective map h : A × A_{n+1} −→ {1, ..., N · k_{n+1}}. Since #A_{n+1} = k_{n+1}, there exists a bijective map f : A_{n+1} −→ {1, ..., k_{n+1}}. We define
h : A × A_{n+1} −→ {1, ..., N · k_{n+1}},   h(a₁, ..., a_n, a_{n+1}) := (f(a_{n+1}) − 1) · N + ϕ(a₁, ..., a_n).
Since ϕ and f are bijective, and since every m ∈ {1, ..., N · k_{n+1}} has a unique representation in the form m = a · N + r with a ∈ {0, ..., k_{n+1} − 1} and r ∈ {1, ..., N} (see Th. D.1 in the Appendix), h is also bijective. This proves φ(n + 1) and completes the induction.
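Both counting rules are easy to sanity-check on concrete finite sets in R (union, intersect, and expand.grid are base R; the sample sets are ours):

    # Th. 3.22 and Th. 3.23 on concrete sets.
    A <- c(1, 2, 3, 4); B <- c(3, 4, 5)
    length(union(A, B)) == length(A) + length(B) - length(intersect(A, B))  # TRUE

    A1 <- 1:2; A2 <- c("a", "b", "c")
    nrow(expand.grid(A1, A2)) == length(A1) * length(A2)   # TRUE: 6 pairs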
Theorem 3.24. For each finite set A with #A = n ∈ N₀, one has #P(A) = 2ⁿ.

Proof. The proof is conducted by induction, showing
∀_{n∈N₀} (#A = n ⇒ #P(A) = 2ⁿ),
where we denote the statement after ∀_{n∈N₀} by φ(n).
Base Case (n = 0): For n = 0, we have A = ∅, i.e. P(A) = {∅}. Thus, #P(A) = 1 = 2⁰, proving φ(0).
Induction Step: Assume φ(n) and consider A with #A = n + 1. Then A contains at least one element a. For B := A \ {a}, we then know #B = n from Th. 3.19(b). Moreover, setting
M := {C ∪ {a} : C ∈ P(B)},
we have the disjoint decomposition P(A) = P(B) ∪̇ M. As the map ϕ : P(B) −→ M, ϕ(C) := C ∪ {a}, is clearly bijective, P(B) and M have the same cardinality. Thus, using Th. 3.22 and φ(n),
#P(A) = #P(B) + #M = #P(B) + #P(B) = 2 · 2ⁿ = 2ⁿ⁺¹,
thereby proving φ(n + 1) and completing the induction.

3.2.3 Countable Sets

In this section, we present a number of important results regarding the natural numbers and countability.

Theorem 3.25. (a) Every nonempty finite subset of a totally ordered set has a minimum and a maximum.

(b) Every nonempty subset of N has a minimum.

Proof. Exercise.

Proposition 3.26. Every subset A of N is countable.

Proof. Since ∅ is countable, we may assume A ≠ ∅. From Th. 3.25(b), we know that every nonempty subset of N has a min. We recursively define a sequence in A by
a₁ := min A,   a_{n+1} := { min A_n if A_n := A \ {a_i : 1 ≤ i ≤ n} ≠ ∅, a_n if A_n = ∅ }.
This sequence is the same as the function f : N −→ A, f(n) = a_n. An easy induction shows that, for each n ∈ N, a_n ≠ a_{n+1} implies the restriction f↾_{{1,...,n+1}} is injective. Thus, if there exists n ∈ N such that a_n = a_{n+1}, then f↾_{{1,...,k}} : {1, ..., k} −→ A is bijective, where k := min{n ∈ N : a_n = a_{n+1}}, showing A is finite, i.e. countable. If there does not exist n ∈ N with a_n = a_{n+1}, then f is injective. Another easy induction shows that, for each n ∈ N, f({1, ..., n}) ⊇ {k ∈ A : k ≤ n}, showing f is also surjective, proving A is countable.

Proposition 3.27. For each set A ≠ ∅, the following three statements are equivalent:

(i) A is countable.

(ii) There exists an injective map f : A −→ N.

(iii) There exists a surjective map g : N −→ A.

Proof. Directly from the definition of countable in Def. 3.10(c), one obtains (i) ⇒ (ii) and (i) ⇒ (iii). To prove (ii) ⇒ (i), let f : A −→ N be injective. Then f : A −→ f(A) is bijective, and, since f(A) ⊆ N, f(A) is countable by Prop. 3.26, proving A is countable as well. To prove (iii) ⇒ (i), let g : N −→ A be surjective. Then g has a right inverse f : A −→ N. One can obtain this from Th. 2.13(a), but, here, we can actually construct f without the axiom of choice: For a ∈ A, let f(a) := min g⁻¹({a}) (recall Th. 3.25(b)). Then, clearly, g ◦ f = Id_A. But this means g is a left inverse for f, showing f is injective according to Th. 2.13(b). Then A is countable by an application of (ii).
Theorem 3.28. If (A₁, ..., A_n), n ∈ N, is a finite family of countable sets, then ∏_{i=1}^{n} A_i is countable.

Proof. We first consider the special case n = 2 with A₁ = A₂ = N and show that the map
ϕ : N × N −→ N,   ϕ(m, n) := 2ᵐ · 3ⁿ,
is injective: If ϕ(m, n) = ϕ(p, q), then 2ᵐ · 3ⁿ = 2ᵖ · 3^q. Moreover, m ≤ p or p ≤ m. If m ≤ p, then 3ⁿ = 2^{p−m} · 3^q. Since 3ⁿ is odd, 2^{p−m} · 3^q must also be odd, implying p − m = 0, i.e. m = p. Moreover, we now have 3ⁿ = 3^q, implying n = q, showing (m, n) = (p, q), i.e. ϕ is injective.

We now come back to the general case stated in the theorem. If at least one of the A_i is empty, then the product is empty. So it remains to consider the case where all A_i are nonempty. The proof is conducted by induction, showing
∀_{n∈N} (∏_{i=1}^{n} A_i is countable),
where we denote the statement after ∀_{n∈N} by φ(n).
Base Case (n = 1): φ(1) is merely the hypothesis that A₁ is countable.
Induction Step: Assuming φ(n), Prop. 3.27(ii) provides injective maps f₁ : ∏_{i=1}^{n} A_i −→ N and f₂ : A_{n+1} −→ N. To prove φ(n + 1), we provide an injective map h : ∏_{i=1}^{n+1} A_i −→ N: Define
h : ∏_{i=1}^{n+1} A_i −→ N,   h(a₁, ..., a_n, a_{n+1}) := ϕ(f₁(a₁, ..., a_n), f₂(a_{n+1})).
The injectivity of f₁, f₂, and ϕ clearly implies the injectivity of h, thereby proving φ(n + 1) and completing the induction.

Theorem 3.29. If (A_i)_{i∈I} is a countable family of countable sets (i.e. ∅ ≠ I is countable and each A_i, i ∈ I, is countable), then the union A := ⋃_{i∈I} A_i is also countable (this result makes use of AC, cf. Rem. 3.30 below).

Proof. It suffices to consider the case that all A_i are nonempty. Moreover, according to Prop. 3.27(iii), it suffices to construct a surjective map ϕ : N −→ A. Also according to Prop. 3.27(iii), the countability of I and of the A_i provides us with surjective maps f : N −→ I and g_i : N −→ A_i (here AC is used to select each g_i from the set of all surjective maps from N onto A_i). Define
F : N × N −→ A,   F(m, n) := g_{f(m)}(n).
Then F is surjective: Given x ∈ A, there exists i ∈ I such that x ∈ A_i. Since f is surjective, there is m ∈ N satisfying f(m) = i. Moreover, since g_i is surjective, there exists n ∈ N with g_i(n) = x. Then F(m, n) = g_i(n) = x, verifying that F is surjective. As N × N is countable by Th. 3.28, there exists a surjective map h : N −→ N × N. Thus, F ◦ h is the desired surjective map from N onto A.

Remark 3.30. The axiom of choice is, indeed, essential for the proof of Th. 3.29. It is shown in [Jec73, Th. 10.6] that it is consistent with the axioms of ZF (i.e. with the axioms of Sec. A.3 of the Appendix) that, e.g., the uncountable sets P(N) and R (cf. [Phi16, Th. F.2]) are countable unions of countable sets.
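The injection ϕ(m, n) = 2ᵐ · 3ⁿ from the proof of Th. 3.28 can be tabulated in R to see that no value repeats (a check on a finite range, not a proof):

    # phi(m, n) = 2^m * 3^n on {1,...,6}^2: all 36 values are distinct.
    grid <- expand.grid(m = 1:6, n = 1:6)
    phi <- 2^grid$m * 3^grid$n
    anyDuplicated(phi) == 0   # TRUE, consistent with injectivity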
4 Basic Algebraic Structures

We are now in a position to conduct some actual algebra, that means we will start to study sets A that come with a composition law
◦ : A × A −→ A,   (a, b) ↦ a ◦ b,
assumed to satisfy a number of rules, so-called axioms. One goal is to prove further rules as a consequence of the axioms. A main benefit of this method is the fact that these proved rules are then known to hold in every structure satisfying the axioms. Perhaps the simplest structure that still occurs in a vast number of interesting places throughout mathematics is that of a group (see Def. 4.5(b) below), examples including the set of real numbers R with the usual addition as well as R \ {0} with the usual multiplication. So every rule we will prove as a consequence of the group axioms will hold in these two groups as well as in countless other groups. We note that vector spaces, the structures of central interest to linear algebra, are, in particular, always groups, so that everything we prove about groups in the following is, in particular, true in every vector space.

4.1 Magmas and Groups

Definition 4.1. Let A be a nonempty set. A map
◦ : A × A −→ A,   (x, y) ↦ x ◦ y,   (4.1)
is called a composition on A; the pair (A, ◦) is then called a magma. It is also common to call a composition a multiplication and to write x · y or just xy instead of x ◦ y. In this situation, the product symbol ∏ is defined as in Ex. 3.8(c). If the composition is called an addition and x + y is written instead of x ◦ y, then it is assumed that it is commutative, that means it satisfies the law of Def. 4.2(a) below (a multiplication might or might not satisfy commutativity). For an addition, the summation symbol ∑ is defined as in Ex. 3.8(b).

Definition 4.2. Let (A, ◦) be a magma.

(a) ◦ is called commutative or abelian if, and only if, x ◦ y = y ◦ x holds for all x, y ∈ A.

(b) ◦ is called associative if, and only if, x ◦ (y ◦ z) = (x ◦ y) ◦ z holds for all x, y, z ∈ A.

(c) An element e ∈ A is called left neutral (resp. right neutral) if, and only if,
∀_{x∈A} e ◦ x = x (resp. x ◦ e = x).   (4.2)
e ∈ A is called neutral if, and only if, it is both left neutral and right neutral.

—

If associativity holds for each triple of elements, then it also holds for each finite tuple of elements; if the composition is associative and commutativity holds for each pair of elements, then it also holds for each finite tuple of elements (see Th. B.3 and Th. B.4 in the Appendix).

Lemma 4.3. Let (A, ◦) be a magma with l, r ∈ A. If l is left neutral and r is right neutral, then l = r (in particular, A contains at most one neutral element).

Proof. As l is left neutral and r is right neutral, we obtain r = l ◦ r = l.

Notation 4.4. If a composition on a set A is written as a multiplication ·, then it is common to write a neutral element as 1 and to also call it a unit or one. If a (commutative) composition on a set A is written as an addition +, then it is common to write a neutral element as 0 and to also call it zero.

Definition 4.5. Let (A, ◦) be a magma.

(a) (A, ◦) is called a semigroup if, and only if, ◦ is associative.

(b) (A, ◦) is called a group if, and only if, it is a semigroup with the additional property that A contains a right neutral⁴ element e ∈ A and, for each x ∈ A, there exists an inverse element x′ ∈ A, i.e. an element x′ ∈ A such that
x ◦ x′ = e.   (4.3)
A group is called commutative or abelian if, and only if, ◦ is commutative.

⁴We will see in Th. 4.6(a) that the remaining properties of a group then force this right neutral element to be a neutral element as well. However, not requiring e to be neutral here has the advantage that we know right away that we only need to prove the existence of a right neutral element to show some structure is a group.
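For a finite magma, the properties of Def. 4.2 can be verified directly from the composition table. The R sketch below checks commutativity, associativity, and neutrality for addition modulo 4 on {0, 1, 2, 3}; the example and all helper names are ours.

    # Addition mod 4 on A = {0,1,2,3}: table tab with tab[i+1, j+1] = (i + j) mod 4.
    A <- 0:3
    tab <- outer(A, A, function(x, y) (x + y) %% 4)
    op <- function(x, y) tab[x + 1, y + 1]

    commutative <- all(tab == t(tab))
    associative <- all(sapply(A, function(x) sapply(A, function(y)
      sapply(A, function(z) op(x, op(y, z)) == op(op(x, y), z)))))
    neutral <- A[sapply(A, function(e)
      all(op(e, A) == A) && all(op(A, e) == A))]
    list(commutative = commutative, associative = associative, neutral = neutral)
    # TRUE, TRUE, 0: a commutative semigroup with neutral element 0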
—

Before considering examples of magmas (A, ◦) in Ex. 4.9 below, we prove some first rules and introduce some more notation.

Theorem 4.6. Let (G, ◦) be a group with right neutral element e ∈ G. Then the following statements and rules are valid in G:

(a) e is a neutral element (thus, in particular, it is uniquely determined according to Lem. 4.3).

(b) If x, a ∈ G, then x ◦ a = e if, and only if, a ◦ x = e (i.e. an element is right inverse if, and only if, it is left inverse). Moreover, inverse elements are unique (for each x ∈ G, the unique inverse is then denoted by x⁻¹).

(c) (x⁻¹)⁻¹ = x holds for each x ∈ G.

(d) y⁻¹ ◦ x⁻¹ = (x ◦ y)⁻¹ holds for each x, y ∈ G.

(e) Cancellation Laws: For each x, y, a ∈ G, one has
x ◦ a = y ◦ a ⇒ x = y,   (4.4a)
a ◦ x = a ◦ y ⇒ x = y.   (4.4b)

Proof. (a): Let x ∈ G. By Def. 4.5(b), there exists y ∈ G such that x ◦ y = e and, in turn, z ∈ G such that y ◦ z = e. Thus, as e is right neutral,
e ◦ z = (x ◦ y) ◦ z = x ◦ (y ◦ z) = x ◦ e = x,
implying
x = e ◦ z = (e ◦ e) ◦ z = e ◦ (e ◦ z) = e ◦ x,
showing e to be left neutral as well.

(b): Given a, choose b ∈ G such that a ◦ b = e. If x ◦ a = e, then
e = a ◦ b = (a ◦ e) ◦ b = (a ◦ (x ◦ a)) ◦ b = a ◦ ((x ◦ a) ◦ b) = a ◦ (x ◦ (a ◦ b)) = a ◦ (x ◦ e) = a ◦ x.
Interchanging the roles of x and a now proves the converse. To prove uniqueness, let a, b be inverses to x. Then a = a ◦ e = a ◦ x ◦ b = e ◦ b = b.

(c): x⁻¹ ◦ x = e holds according to (b) and shows that x is the inverse to x⁻¹. Thus, (x⁻¹)⁻¹ = x, as claimed.

(d) is due to y⁻¹ ◦ x⁻¹ ◦ x ◦ y = y⁻¹ ◦ e ◦ y = e.

(e): If x ◦ a = y ◦ a, then x = x ◦ a ◦ a⁻¹ = y ◦ a ◦ a⁻¹ = y, as claimed; if a ◦ x = a ◦ y, then x = a⁻¹ ◦ a ◦ x = a⁻¹ ◦ a ◦ y = y as well.

Notation 4.7. Exponentiation with Integer Exponents: Let (A, ·) be a magma with neutral element 1 ∈ A. Define recursively for each x ∈ A and each n ∈ N₀:
x⁰ := 1,   ∀_{n∈N₀} xⁿ⁺¹ := x · xⁿ.   (4.5a)
Moreover, if (A, ·) constitutes a group, then also define for each x ∈ A and each n ∈ N:
x⁻ⁿ := (x⁻¹)ⁿ.   (4.5b)

Theorem 4.8. Exponentiation Rules: Let (G, ·) be a semigroup with neutral element 1 ∈ G. Let x, y ∈ G. Then the following rules hold for each m, n ∈ N₀. If (G, ·) is a group, then the rules even hold for each m, n ∈ Z.

(a) x^{m+n} = x^m · x^n.

(b) (x^m)^n = x^{mn}.

(c) If the composition is also commutative, then x^n y^n = (xy)^n.

Proof. (a): First, we fix n ∈ N₀ and prove the statement for each m ∈ N₀ by induction: The base case (m = 0) is x^n = x^n, which is true. For the induction step, we compute, using (4.5a) and the induction hypothesis,
x^{m+1+n} = x · x^{m+n} = x · x^m · x^n = x^{m+1} x^n,
completing the induction step. Now assume G to be a group. Consider m ≥ 0 and n < 0. If m + n ≥ 0, then, using (4.5b) and what we have already shown,
x^m x^n = x^m (x⁻¹)^{−n} = x^{m+n} x^{−n} (x⁻¹)^{−n} = x^{m+n}.
Similarly, if m + n < 0, then
x^m x^n = x^m (x⁻¹)^{−n} = x^m (x⁻¹)^m (x⁻¹)^{−n−m} = x^{m+n}.
The case m < 0, n ≥ 0 is treated completely analogously. It just remains to consider m < 0 and n < 0. In this case,
x^{m+n} = x^{−(−m−n)} = (x⁻¹)^{−m−n} = (x⁻¹)^{−m} · (x⁻¹)^{−n} = x^m · x^n.

(b): First, we prove the statement for each n ∈ N₀ by induction (for m < 0, we assume G to be a group): The base case (n = 0) is (x^m)⁰ = 1 = x⁰, which is true. For the induction step, we compute, using (4.5a), the induction hypothesis, and (a),
(x^m)^{n+1} = x^m · (x^m)^n = x^m · x^{mn} = x^{mn+m} = x^{m(n+1)},
completing the induction step. Now, let G be a group and n < 0. We already know (x^m)⁻¹ = x^{−m}. Thus, using (4.5b) and what we have already shown,
(x^m)^n = ((x^m)⁻¹)^{−n} = (x^{−m})^{−n} = x^{(−m)(−n)} = x^{mn}.

(c): For n ∈ N₀, the statement is proved by induction: The base case (n = 0) is x⁰ y⁰ = 1 = (xy)⁰, which is true. For the induction step, we compute, using (4.5a), the induction hypothesis, and commutativity,
x^{n+1} y^{n+1} = x · x^n · y · y^n = xy · (xy)^n = (xy)^{n+1},
completing the induction step. If G is a group and n < 0, then, using (4.5b), Th. 4.6(d), and what we have already shown,
x^n y^n = (x⁻¹)^{−n} (y⁻¹)^{−n} = (x⁻¹ y⁻¹)^{−n} = ((xy)⁻¹)^{−n} = (xy)^n,
which completes the proof.
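Notation 4.7 is itself a recursive definition in the sense of Th. 3.7, and it can be mirrored for a concrete group, say (Q \ {0}, ·), by a short R function; pow below is ours, and for n < 0 it follows (4.5b).

    # x^n for x in the group (Q \ {0}, *), defined recursively as in (4.5a)/(4.5b).
    pow <- function(x, n) {
      if (n == 0) return(1)               # x^0 := 1
      if (n < 0) return(pow(1 / x, -n))   # x^(-n) := (x^(-1))^n, per (4.5b)
      x * pow(x, n - 1)                   # x^(n+1) := x * x^n, per (4.5a)
    }
    c(pow(2, 5), pow(2, -3), pow(2, 3) * pow(2, -3))  # 32  0.125  1 (cf. Th. 4.8(a))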
Example 4.9. (a) Let M be a set. Then (P(M), ∩) and (P(M), ∪) are magmas with neutral elements, where M is neutral for ∩ and ∅ is neutral for ∪. From Th. 1.29, we also know ∩ and ∪ to be associative and commutative, showing (P(M), ∩) and (P(M), ∪) to form abelian semigroups. However, if M ≠ ∅, then neither structure forms a group, due to the lack of inverse elements: If N ⊊ M, then there does not exist B ⊆ M such that N ∩ B = M; if ∅ ≠ N ⊆ M, then there does not exist B ⊆ M such that N ∪ B = ∅. If M = ∅, then P(M) = {∅} and (P(M), ∩) and (P(M), ∪) both constitute the group with one element (∅ being inverse to itself in both cases).

(b) Let M be a set and define

S_M := {(π : M −→ M) : π is bijective}, (4.6)

where the elements of S_M are also called permutations on M. Then (S_M, ◦), where ◦ is given by composition of maps according to Def. 2.8, forms a group, the so-called symmetric group or permutation group on M. We verify (S_M, ◦) to be a group: ◦ is well-defined on S_M, since the composition of bijective maps is bijective by Th. 2.14. Moreover, ◦ is associative by Th. 2.10(a). The neutral element is given by the identity map Id : M −→ M, Id(a) := a, and, if π ∈ S_M, then its inverse map π^{-1} is also its inverse element in the group S_M. We claim that ◦ is commutative on S_M if, and only if, #M ≤ 2: While commutativity is clear for #M ≤ 2, if {a, b, c} ⊆ M with #{a, b, c} = 3, then we may define f, g ∈ S_M by letting

f(x) := { a for x = a; c for x = b; b for x = c; x for x ∉ {a, b, c} },  g(x) := { b for x = a; a for x = b; c for x = c; x for x ∉ {a, b, c} }, (4.7)

and obtain (f ◦ g)(a) = c ≠ b = (g ◦ f)(a), showing ◦ is not commutative. The case M = {1, . . . , n}, n ∈ N, is often of particular interest and, in this case, one also writes S_n for S_M.

(c) (N0, +) and (N0, ·), where + and · denote the usual addition and multiplication on N0, respectively (see [Phi16, Def. D.1] for a definition via recursion), form commutative semigroups (see [Phi16, Th. D.2]), but no groups, due to the lack of inverse elements. From N0, one can then construct the sets of integers Z (see Ex. 2.36(a)), rational numbers Q (see Ex. 2.36(b)), real numbers R (see [Phi16, Def. D.36]), and complex numbers C (see [Phi16, Def. 5.1]). One can extend + and · to each of these sets (see Th. 4.15, Ex. 4.33, and Ex. 4.39 below, [Phi16, Def. D.36], and [Phi16, Def. 5.1]) and show that (Z, +), (Q, +), (R, +), (C, +) are commutative groups (see Th. 4.15, Ex. 4.33(b), and Ex. 4.39 below, [Phi16, Th. D.39], and [Phi16, Th. 5.2]), as are (Q \ {0}, ·), (R \ {0}, ·), (C \ {0}, ·) (see Ex. 4.39 below, [Phi16, Th. D.39], and [Phi16, Th. 5.2]).

(d) Let (A, ◦) be a magma. We define another magma (P(A), ◦) by letting

∀ B, C ∈ P(A) : B ◦ C := {b ◦ c : b ∈ B ∧ c ∈ C} ⊆ A. (4.8)

The case where B or C is a singleton set is of particular interest, and one then uses the following simplified notation:

∀ B ∈ P(A) ∀ a ∈ A : a ◦ B := {a} ◦ B, B ◦ a := B ◦ {a}. (4.9)

Sets of the form a ◦ B are called left cosets, sets of the form B ◦ a are called right cosets. If (A, ◦) is commutative, then B ◦ C = C ◦ B for each B, C ∈ P(A), since a = b ◦ c with b ∈ B, c ∈ C if, and only if, a = c ◦ b. In the same manner, one sees that, if ◦ is associative on A, then it is also associative on P(A). If e ∈ A is left (resp. right) neutral, then, clearly, {e} is left (resp. right) neutral for ◦ on P(A).
In general, one cannot expect (P(A), ◦) to be a group, even if (A, ◦) is a group, due to the lack of inverse elements: For example, while (Z, +) is a group, (P(Z), +) is not: The set B := {0, 1} has no inverse: If C were inverse to B, then −1 ∈ C, implying −1 = 0 + (−1) ∈ B + C, i.e. B + C ≠ {0}, in contradiction to C being inverse to B.

(e) Let (A, ·) be a magma and let M be a set. We consider the set F(M, A) = A^M of functions mapping M into A. We make A^M into a magma by defining · pointwise on A^M: Define

∀ f, g ∈ A^M : f · g : M −→ A, (f · g)(x) := f(x) · g(x). (4.10)

We then have

(A, ·) commutative ⇒ (A^M, ·) commutative, (4.11a)
(A, ·) associative ⇒ (A^M, ·) associative, (4.11b)
e ∈ A left/right neutral ⇒ f ∈ A^M, f ≡ e, left/right neutral, (4.11c)
(A, ·) group ⇒ (A^M, ·) group, (4.11d)

where (4.11a) – (4.11c) are immediate from (4.10). To verify (4.11d), let (A, ·) be a group with e ∈ A being neutral. If f ∈ A^M, then let g : M −→ A, g(x) := (f(x))^{-1}. Then

∀ x ∈ M : (f · g)(x) = f(x) · (f(x))^{-1} = e,

showing g to be inverse to f in A^M, i.e. (A^M, ·) forms a group.
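Returning to Ex. 4.9(b): for M = {1, 2, 3}, a permutation can be stored as the integer vector of its values, and composition of maps then becomes indexing. The following R sketch (the names compose, f, g are ad hoc; f, g encode the maps of (4.7) with (a, b, c) = (1, 2, 3)) reproduces the non-commutativity computation:

    compose <- function(f, g) f[g]    # (f ∘ g)(x) = f(g(x)), realized by indexing
    f <- c(1, 3, 2)                   # f: a ↦ a, b ↦ c, c ↦ b
    g <- c(2, 1, 3)                   # g: a ↦ b, b ↦ a, c ↦ c
    compose(f, g)[1]                  # 3, i.e. (f ∘ g)(a) = c ...
    compose(g, f)[1]                  # 2, i.e. (g ∘ f)(a) = b, so S_3 is not abelian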
—

The, perhaps, most important notion when studying structures is that of the composition-respecting map. The technical term for such a map is homomorphism, which we define next:

Definition 4.10. Let (A, ◦) and (B, ◦) be magmas (caveat: to simplify notation, we denote both compositions by ◦; however, they might even denote different compositions on the same set A = B). A map φ : A −→ B is called homomorphism if, and only if,

∀ x, y ∈ A : φ(x ◦ y) = φ(x) ◦ φ(y). (4.12)

If φ is a homomorphism, then it is called monomorphism if, and only if, it is injective; epimorphism if, and only if, it is surjective; isomorphism if, and only if, it is bijective; endomorphism if, and only if, (A, ◦) = (B, ◦); automorphism if, and only if, it is both endomorphism and isomorphism. Moreover, (A, ◦) and (B, ◦) are called isomorphic (denoted (A, ◦) ≅ (B, ◦)) if, and only if, there exists an isomorphism φ : A −→ B. If A and B have neutral elements e ∈ A, e′ ∈ B, then a homomorphism φ : A −→ B is called unital if, and only if, φ(e) = e′.

—

If (A, ◦) and (B, ◦) are isomorphic, then, from an algebraic perspective, the two structures are the same, except that the elements of the underlying set have been “renamed” (where the isomorphism provides the assignment rule for obtaining the new names).

Proposition 4.11. Let (A, ◦), (B, ◦), (C, ◦) be magmas.

(a) If φ : A −→ B and ψ : B −→ C are homomorphisms, then so is ψ ◦ φ. If φ, ψ are unital, so is ψ ◦ φ.

(b) If φ : A −→ B is an isomorphism, then φ^{-1} is an isomorphism as well.

(c) Let φ : A −→ B be an epimorphism. Then the following implications hold:

(A, ◦) commutative ⇒ (B, ◦) commutative,
(A, ◦) semigroup ⇒ (B, ◦) semigroup,
(A, ◦) group ⇒ (B, ◦) group.

If φ is even an isomorphism, then the above implications become equivalences.

(d) Let φ : A −→ B be an epimorphism. If e ∈ A is neutral, then B contains a neutral element and φ is unital.

(e) If (A, ◦), (B, ◦) are groups and φ : A −→ B is a homomorphism, then φ is unital and φ(a^{-1}) = (φ(a))^{-1} for each a ∈ A.

Proof. (a): If φ and ψ are homomorphisms and x, y ∈ A, then

(ψ ◦ φ)(x ◦ y) = ψ(φ(x) ◦ φ(y)) = ψ(φ(x)) ◦ ψ(φ(y)),

showing ψ ◦ φ to be a homomorphism. If e ∈ A, e′ ∈ B, e′′ ∈ C are neutral and φ and ψ are unital, then (ψ ◦ φ)(e) = ψ(e′) = e′′, showing ψ ◦ φ to be unital.

(b): If φ is an isomorphism and x, y ∈ B, then

φ^{-1}(x ◦ y) = φ^{-1}(φ(φ^{-1}(x)) ◦ φ(φ^{-1}(y))) = φ^{-1}(φ(φ^{-1}(x) ◦ φ^{-1}(y))) = φ^{-1}(x) ◦ φ^{-1}(y),

showing φ^{-1} to be an isomorphism.

(c): Exercise.

(d): It suffices to show e′ := φ(e) is neutral. Let b ∈ B and a ∈ A such that φ(a) = b. Then

b ◦ e′ = φ(a) ◦ φ(e) = φ(a ◦ e) = φ(a) = b,
e′ ◦ b = φ(e) ◦ φ(a) = φ(e ◦ a) = φ(a) = b,

proving e′ to be neutral as desired.

(e): Let (A, ◦), (B, ◦) be groups with neutral elements e ∈ A, e′ ∈ B, and let φ : A −→ B be a homomorphism. Then

φ(e) ◦ e′ = φ(e) = φ(e ◦ e) = φ(e) ◦ φ(e).

Applying (φ(e))^{-1} to both sides of the above equality proves φ(e) = e′. Moreover, for each a ∈ A,

φ(a^{-1}) ◦ φ(a) = φ(a^{-1} ◦ a) = φ(e) = e′,

proving (e).

Example 4.12. (a) If (A, ◦) is a magma, then it is immediate that the identity Id : A −→ A is an automorphism.

(b) If (A, ◦), (B, ◦) are magmas, where e ∈ B is neutral, then the constant map φ ≡ e is a homomorphism: Indeed,

∀ x, y ∈ A : φ(x ◦ y) = e = e ◦ e = φ(x) ◦ φ(y).

(c) If (A, ·) is a semigroup with neutral element 1 ∈ A, then, for each fixed a ∈ A,

φ_a : N0 −→ A, φ_a(n) := a^n, (4.13)

is a homomorphism from (N0, +) into (A, ·); if (A, ·) is a group, we may extend φ_a to Z and it becomes a homomorphism from (Z, +) into (A, ·): These statements are immediate from Th. 4.8(a). Note that, if A = N0, Z, Q, R, C and (A, ·) = (A, +), then φ_a(n) = na.

(d) If (G, ·) is an abelian group, then, for each fixed k ∈ Z,

φ_k : G −→ G, φ_k(a) := a^k, (4.14)

is an endomorphism due to Th. 4.8(c) (the case k = −1, where a ↦ a^{-1}, is of particular interest). Note that, if G = Z, Q, R, C and (G, ·) = (G, +), then φ_k(a) = ka.

Notation 4.13. A composition ◦ on a nonempty finite set A can be defined by means of a composition table (also known as a multiplication table or Cayley table): For each element a ∈ A, the table has a row labeled by a and it has a column labeled by a; if a, b ∈ A, then the entry at the intersection of the row labeled a and the column labeled b is a ◦ b (see Ex. 4.14 below for examples).

Example 4.14. (a) The most trivial groups are the group with one element and the group with two elements, which can be defined by the following Cayley tables (where e, a are distinct elements):

◦ | e        ◦ | e a
e | e        e | e a
             a | a e

Both structures are abelian groups, which is immediate for the group with one element and is easily verified for the group with two elements. We will see later that they are isomorphic to the groups we will denote Z1 and Z2 (which, in light of Prop. 4.11(c), shows again that the structure with two elements is a group).

(b) The following examples show that, in non-groups, left and right neutral elements need not be unique: For a ≠ b, let

◦ | a b      ◦ | a b
a | a b      a | a a
b | a b      b | b b

One easily verifies these structures to be associative, but they are not groups: The magma on the left has no right neutral element; the magma on the right does not provide inverses for both a and b.
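Composition tables in the sense of Not. 4.13 are easily produced by machine. As a small illustration, the following R sketch (variable names are ad hoc) builds the Cayley table of the two-element group of Ex. 4.14(a), realized on {0, 1} with addition modulo 2:

    G <- c(0, 1)
    cayley <- outer(G, G, function(x, y) (x + y) %% 2)  # entry in row x, column y is x ◦ y
    dimnames(cayley) <- list(G, G)
    print(cayley)
    #     0 1
    #   0 0 1
    #   1 1 0

With e = 0 and a = 1, the table of Ex. 4.14(a) reappears: 0 is neutral and a ◦ a = e.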
Theorem 4.15. Let (A, +) be an abelian semigroup with neutral element 0 ∈ A and assume (A, +) to satisfy the cancellation laws (4.4) (with ◦ replaced by +). Consider the quotient set

G := (A × A)/∼ = {[(a, b)] : (a, b) ∈ A × A} (4.15)

with respect to the equivalence relation defined by

(a, b) ∼ (c, d) :⇔ a + d = b + c. (4.16)

Introducing the simplified notation [a, b] := [(a, b)], we define addition and subtraction on G by

+ : G × G −→ G, ([a, b], [c, d]) ↦ [a, b] + [c, d] := [a + c, b + d], (4.17a)
− : G × G −→ G, ([a, b], [c, d]) ↦ [a, b] − [c, d] := [a, b] + [d, c]. (4.17b)

Then addition and subtraction on G are well-defined, (G, +) forms an abelian group, where [0, 0] is the neutral element, [b, a] is the inverse element of [a, b] for each a, b ∈ A, and, denoting the inverse element of [a, b] by −[a, b] in the usual way, [a, b] − [c, d] = [a, b] + (−[c, d]) for each a, b, c, d ∈ A. Moreover, the map

ι : A −→ G, ι(a) := [a, 0], (4.18)

is a monomorphism, where it is customary to identify A with ι(A). One then just writes a instead of [a, 0] and −a instead of [0, a] = −[a, 0]. The most important application is the case A = N0, which yields that addition on Z = G forms an abelian group (cf. Ex. 2.36(a), [Phi16, Th. D.2], [Phi16, Th. D.7(c)]).

Proof. If one reexamines Ex. 2.36(a), replacing N0 by A and Z by G, then one sees it shows ∼ to constitute an equivalence relation, where the proof makes use of commutativity, associativity, and the cancellation law. To verify + and − are well-defined on G, we need to check the above definitions do not depend on the chosen representatives, i.e.

∀ a, b, c, d, ã, b̃, c̃, d̃ ∈ A : ([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃]) ⇒ [a + c, b + d] = [ã + c̃, b̃ + d̃] (4.19a)

and

∀ a, b, c, d, ã, b̃, c̃, d̃ ∈ A : ([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃]) ⇒ [a, b] − [c, d] = [ã, b̃] − [c̃, d̃]. (4.19b)

Indeed, [a, b] = [ã, b̃] means a + b̃ = b + ã, and [c, d] = [c̃, d̃] means c + d̃ = d + c̃, implying a + c + b̃ + d̃ = b + ã + d + c̃, i.e. [a + c, b + d] = [ã + c̃, b̃ + d̃], proving (4.19a). Now (4.19b) is just (4.17b) combined with (4.19a). To verify commutativity and associativity on G, let a, b, c, d, e, f ∈ A. Then

[a, b] + [c, d] = [a + c, b + d] = [c + a, d + b] = [c, d] + [a, b],

proving commutativity, and

([a, b] + [c, d]) + [e, f] = [a + c, b + d] + [e, f] = [(a + c) + e, (b + d) + f] = [a + (c + e), b + (d + f)] = [a, b] + [c + e, d + f] = [a, b] + ([c, d] + [e, f]),

proving associativity. For every a, b ∈ A, one obtains

[a, b] + [0, 0] = [a + 0, b + 0] = [a, b],

proving neutrality of [0, 0], whereas

[a, b] + [b, a] = [a + b, b + a] = [a + b, a + b] = [0, 0]

(since (a + b, a + b) ∼ (0, 0)) shows [b, a] = −[a, b]. Now [a, b] − [c, d] = [a, b] + (−[c, d]) is immediate from (4.17b). The map ι is injective as, for each a, b ∈ A, ι(a) = [a, 0] = ι(b) = [b, 0] implies a + 0 = 0 + b, i.e. a = b. The map ι is a homomorphism as

∀ a, b ∈ A : ι(a + b) = [a + b, 0] = [a, 0] + [b, 0] = ι(a) + ι(b),

completing the proof of the theorem.

Definition 4.16. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. We call U a subgroup of G (denoted U ≤ G) if, and only if, (U, ◦) forms a group, where the composition on U is the restriction of the composition on G, i.e. ◦↾_{U×U}.

Theorem 4.17. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. Then U ≤ G if, and only if, (i) and (ii) hold, where

(i) For each u, v ∈ U, one has u ◦ v ∈ U.

(ii) For each u ∈ U, one has u^{-1} ∈ U.

If U is finite, then U ≤ G is already equivalent to (i).

Proof. Exercise.
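For finite U, the closure condition (i) of Th. 4.17 can be tested mechanically. A brief R sketch (the helper name closed is ours), checking two subsets of (Z_6, +), i.e. of {0, . . . , 5} with addition modulo 6:

    closed <- function(S) all(outer(S, S, function(x, y) (x + y) %% 6) %in% S)
    closed(c(0, 2, 4))   # TRUE: closed, hence a subgroup by the finite case of Th. 4.17
    closed(c(0, 1))      # FALSE: 1 + 1 = 2 lies outside {0, 1}

Note that such a test suffices only because the subsets are finite; by Ex. 4.18(c) below, closure alone does not suffice for infinite subsets.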
Example 4.18. (a) If (G, ◦) is an arbitrary group with neutral element e ∈ G, then it is immediate from Th. 4.17 that {e} and G are always subgroups.

(b) We use Th. 4.17 to verify that, for each k ∈ Z, kZ = {kl : l ∈ Z} is a subgroup of (Z, +): As 0 ∈ kZ, kZ ≠ ∅. If l1, l2 ∈ Z, then kl1 + kl2 = k(l1 + l2) ∈ kZ. If l ∈ Z, then −kl = k(−l) ∈ kZ. Thus, Th. 4.17 applies, showing kZ ≤ Z.

(c) As N is no subgroup of Z, even though Th. 4.17(i) holds for N, we see that, in general, Th. 4.17(ii) cannot be omitted for infinite subsets.

(d) If (G, ◦) is a group with neutral element e ∈ G, I ≠ ∅ is an index set, and (U_i)_{i∈I} is a family of subgroups, then U := ⋂_{i∈I} U_i is a subgroup as well: Indeed, e ∈ U, since e ∈ U_i for each i ∈ I. If a, b ∈ U, then a, b ∈ U_i for each i ∈ I, implying a ◦ b ∈ U_i for each i ∈ I, i.e. a ◦ b ∈ U. Moreover, a^{-1} ∈ U_i for each i ∈ I as well, showing a^{-1} ∈ U. Thus, U is a subgroup by Th. 4.17.

(e) In contrast to intersections, unions of subgroups are not necessarily subgroups (however, cf. (f) below): We know from (b) that 2Z and 3Z are subgroups of (Z, +). However, 2 ∈ 2Z, 3 ∈ 3Z, but 5 = 2 + 3 ∉ (2Z) ∪ (3Z), showing (2Z) ∪ (3Z) is not a subgroup of (Z, +).

(f) While (e) shows that unions of subgroups are not necessarily subgroups, if (G, ◦) is a group with neutral element e ∈ G, I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (U_i)_{i∈I} is an increasing family of subgroups (i.e., for each i, j ∈ I with i ≤ j, one has U_i ⊆ U_j), then U := ⋃_{i∈I} U_i is a subgroup as well: Indeed, if i ∈ I, then e ∈ U_i ⊆ U. If a, b ∈ U, then there exist i, j ∈ I such that a ∈ U_i and b ∈ U_j. If k ∈ I with i, j ≤ k, then a ◦ b ∈ U_k ⊆ U. Moreover, a^{-1} ∈ U_i ⊆ U. Thus, U is a subgroup by Th. 4.17.

(g) One can show that every group G is isomorphic to a subgroup of the permutation group S_G (see Sec. C of the Appendix, in particular, Cayley's Th. C.2).

Definition 4.19. Let G and H be groups and let φ : G −→ H be a homomorphism. Let e′ ∈ H be neutral. Define the sets

ker φ := φ^{-1}({e′}) = {a ∈ G : φ(a) = e′}, (4.20a)
Im φ := φ(G) = {φ(a) : a ∈ G}, (4.20b)

where ker φ is called the kernel of φ and Im φ is called the image of φ.

Theorem 4.20. Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism. Let e ∈ G and e′ ∈ H be the respective neutral elements. Also assume U ≤ G, V ≤ H. Then the following statements hold true:

(a) φ^{-1}(V) ≤ G.

(b) ker φ ≤ G.

(c) Im φ ≤ H.

(d) φ(U) ≤ H.

(e) φ is a monomorphism (i.e. injective) if, and only if, ker φ = {e}.

(f) It holds true that

∀ a ∈ G : φ^{-1}({φ(a)}) = a(ker φ) = (ker φ)a,

i.e. the nonempty preimages of elements of H are cosets of the kernel of φ (cf. Ex. 4.9(d)).

Proof. (a): Since V ≤ H, we have e′ ∈ V. From Prop. 4.11(e), we know φ(e) = e′, i.e. e ∈ φ^{-1}(V) and φ^{-1}(V) ≠ ∅. If a, b ∈ φ^{-1}(V), then φ(a), φ(b) ∈ V and, since V ≤ H,

φ(ab) = φ(a)φ(b) ∈ V,  φ(a^{-1}) = (φ(a))^{-1} ∈ V,

where Prop. 4.11(e) was used for the second equality, showing ab ∈ φ^{-1}(V) and a^{-1} ∈ φ^{-1}(V). Thus, φ^{-1}(V) ≤ G by Th. 4.17.

(b) is now immediate by applying (a) with V := {e′} ≤ H.
(c): As φ(e) = e′, we have e′ ∈ Im φ and Im φ ≠ ∅. If x, y ∈ Im φ, then there exist a, b ∈ G such that φ(a) = x, φ(b) = y. Thus,

φ(ab) = φ(a)φ(b) = xy,  φ(a^{-1}) = (φ(a))^{-1} = x^{-1},

showing xy ∈ Im φ and x^{-1} ∈ Im φ. Once again, Im φ ≤ H now follows from Th. 4.17.

(d) is now immediate by applying (c) with φ replaced by φ↾_U (then φ(U) = Im φ↾_U ≤ H).

(e): If φ is injective, then e is the unique preimage of e′, showing ker φ = {e}. Conversely, assume ker φ = {e}. If a, b ∈ G with φ(a) = φ(b), then φ(a^{-1}b) = (φ(a))^{-1}φ(b) = e′, showing a^{-1}b ∈ ker φ, i.e. a^{-1}b = e, i.e. b = a, and φ is injective.

(f): Fix a ∈ G. Suppose b ∈ φ^{-1}({φ(a)}). Then φ(a) = φ(b), implying φ(a^{-1}b) = (φ(a))^{-1}φ(b) = e′ and a^{-1}b ∈ ker φ. Thus, b ∈ a(ker φ) and φ^{-1}({φ(a)}) ⊆ a(ker φ). Similarly, φ(ba^{-1}) = φ(b)(φ(a))^{-1} = e′, showing b ∈ (ker φ)a and φ^{-1}({φ(a)}) ⊆ (ker φ)a. Conversely, suppose b ∈ a(ker φ) (resp. b ∈ (ker φ)a). Then a^{-1}b ∈ ker φ (resp. ba^{-1} ∈ ker φ), implying e′ = φ(a^{-1}b) = φ(a^{-1})φ(b) (resp. e′ = φ(ba^{-1}) = φ(b)φ(a^{-1})). In both cases, we conclude φ(a) = φ(b), i.e. b ∈ φ^{-1}({φ(a)}). Thus, a(ker φ) ⊆ φ^{-1}({φ(a)}) as well as (ker φ)a ⊆ φ^{-1}({φ(a)}), thereby establishing the case.

Example 4.21. According to Ex. 4.12(c), if k ∈ Z, then the map φ_k : Z −→ Z, φ_k(l) := kl, is a homomorphism of (Z, +) into itself. Clearly, Im φ_k = kZ, showing, once again, kZ ≤ Z (cf. Ex. 4.18(b)). For another example, let G := {e, a} be the group with two elements of Ex. 4.14(a). According to Ex. 4.12(c), the map φ : Z −→ G, φ(n) := a^n, is a homomorphism. Clearly, φ is an epimorphism with ker φ = 2Z (i.e. the set of all even numbers).

—

In Ex. 4.9(d), we defined cosets for a given subset of a magma. For a group G, cosets of subgroups are of particular interest: We will see in Prop. 4.22 that, if U ≤ G, then the cosets of U form a partition of G, giving rise to an equivalence relation on G according to Th. 2.33(a). Thus, the cosets form the quotient set with respect to this equivalence relation and, if the subgroup is sufficiently benign (namely what we will call a normal subgroup in Def. 4.23 below), then we can make this quotient set itself into another group, the so-called quotient group or factor group G/U (cf. Def. 4.26 and Th. 4.27(a) below).

Proposition 4.22. Let (G, ·) be a group, U ≤ G. Then

G = ⋃_{a∈G} aU,  ∀ a, b ∈ G : (aU = bU ∨ aU ∩ bU = ∅), (4.21a)
G = ⋃_{a∈G} Ua,  ∀ a, b ∈ G : (Ua = Ub ∨ Ua ∩ Ub = ∅), (4.21b)

i.e. both the left cosets and the right cosets of U form decompositions of G.

Proof. We conduct the proof for left cosets – the proof for right cosets is completely analogous. For each a ∈ G, we have a ∈ aU (since e ∈ U), already showing G = ⋃_{a∈G} aU. Now let a, b, x ∈ G with x ∈ aU ∩ bU. We need to prove aU = bU. From x ∈ aU ∩ bU, we obtain

∃ u1, u2 ∈ U : x = au1 = bu2. (4.22)

Now let au ∈ aU (i.e. u ∈ U). As (4.22) implies a = bu2u1^{-1}, we obtain au = bu2u1^{-1}u ∈ bU, using u2u1^{-1}u ∈ U, due to U being a subgroup. This shows aU ⊆ bU. Analogously, if bu ∈ bU, then bu = au1u2^{-1}u ∈ aU, where we used b = au1u2^{-1} due to (4.22) and u1u2^{-1}u ∈ U. This shows bU ⊆ aU, completing the proof.

Definition 4.23. Let (G, ·) be a group with subgroup U ≤ G. Then U is called normal if, and only if,

∀ a ∈ G : aU = Ua, (4.23)

i.e. if, and only if, left and right cosets with respect to U are the same.
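Th. 4.20(f) and Def. 4.23 can be made tangible for the canonical map onto Z_n (treated in detail in Ex. 4.28(a) below), restricted to a finite window of Z. A small R sketch (phi and window are ad hoc names):

    phi <- function(a) a %% 4      # a ↦ a mod 4, a representative-level version of Z −→ Z_4
    window <- -8:8                 # a finite piece of Z
    split(window, phi(window))     # groups the window by the value of phi
    # $`0`: -8 -4  0  4  8         <- a piece of ker phi = 4Z
    # $`1`: -7 -3  1  5            <- a piece of the coset 1 + 4Z, cf. Th. 4.20(f)
    # ... and similarly for 2 + 4Z and 3 + 4Z

Since (Z, +) is abelian, left and right cosets here agree automatically, i.e. 4Z is normal in the sense of Def. 4.23.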
Theorem 4.24. Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism.

(a) If V is a normal subgroup of H, then U := φ^{-1}(V) is a normal subgroup of G.

(b) ker φ is a normal subgroup of G.

Proof. (a): As we already know from Th. 4.20(a) that U is a subgroup of G, it merely remains to show U is normal. If a ∈ G and u ∈ U, then φ(u) ∈ V. Since V is normal, we have φ(a)V = Vφ(a). Thus, there exists v ∈ V such that φ(a)φ(u) = vφ(a), namely

v = φ(a)φ(u)(φ(a))^{-1} = φ(a)φ(u)φ(a^{-1}) = φ(aua^{-1}),

showing u′ := aua^{-1} ∈ U. Thus, au = aua^{-1}a = u′a, proving aU = Ua, as desired.

(b) is now immediate by applying (a) with V := {e′}, where e′ denotes the neutral element of H. Alternatively, one also obtains ker φ to be a normal subgroup of G as a direct consequence of Th. 4.20(f).

Example 4.25. (a) If G is an abelian group, then every subgroup of G is normal.

(b) In Ex. 4.9(b), we defined the symmetric groups S2 and S3. Clearly, the map

φ : S2 −→ S3, f ↦ φ(f), φ(f)(n) := { f(n) for n ∈ {1, 2}; 3 for n = 3 },

constitutes a monomorphism. By identifying S2 with φ(S2), we may consider S2 as a subgroup of S3. We claim that S2 is not a normal subgroup of S3. Indeed, one checks that

(23)S2 = (23){Id, (12)} = {(23), (132)} ≠ S2(23) = {(23), (123)},

where we made use of a notation commonly used for permutations (we will need to study it more thoroughly later): (12) is the map that permutes 1 and 2 and leaves 3 fixed, (23) permutes 2 and 3 and leaves 1 fixed, (132) maps 1 into 3, 3 into 2, 2 into 1, (123) maps 1 into 2, 2 into 3, 3 into 1.
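The coset computation of Ex. 4.25(b) can be replayed in R with the vector representation of permutations and the compose function from the sketch after Ex. 4.9 (repeated here so the snippet is self-contained; all names are ad hoc):

    compose <- function(f, g) f[g]         # (f ∘ g)(x) = f(g(x))
    id  <- c(1, 2, 3)                      # Id
    s12 <- c(2, 1, 3)                      # (12)
    s23 <- c(1, 3, 2)                      # (23)
    left  <- list(compose(s23, id), compose(s23, s12))   # (23)S2 = {(23), (132)}
    right <- list(compose(id, s23), compose(s12, s23))   # S2(23) = {(23), (123)}
    identical(left, right)                 # FALSE; indeed the two cosets differ even as sets

This reproduces (23)S2 ≠ S2(23), so S2 is not normal in S3.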
Definition 4.26. Let (G, ·) be a group and let N ≤ G be a normal subgroup. According to Prop. 4.22, the cosets of N form a partition of G. According to Th. 2.33(a), the cosets are precisely the equivalence classes of the equivalence relation ∼ on G, defined by

a ∼ b :⇔ aN = bN.

Define

G/N := G/∼,

i.e. G/N is the set of all cosets of N. We define a composition on G/N by letting

∀ a, b ∈ G : (aN) · (bN) := abN (4.24)

(note that we already know the forming of cosets to be associative by Ex. 4.9(d)). We call (G/N, ·) the quotient group of G by N or the factor group of G with respect to N (cf. Th. 4.27(a) below).

Theorem 4.27. (a) Let (G, ·) be a group and let N ≤ G be a normal subgroup. Then (4.24) well-defines a composition on G/N that makes G/N into a group with neutral element N. Moreover, the map

φ_N : G −→ G/N, φ_N(a) := aN, (4.25)

is an epimorphism, called the canonical or natural homomorphism from G onto G/N (comparing with Def. 2.32(b), we see that φ_N is precisely the quotient map with respect to the equivalence relation ∼ of Def. 4.26).

(b) Isomorphism Theorem: If (G, ·) and (H, ·) are groups and φ : G −→ H is a homomorphism, then

G/ker φ ≅ Im φ (4.26)

(recall from Th. 4.24(b) that ker φ is always a normal subgroup of G). More precisely, the map

f : G/ker φ −→ Im φ, f(a ker φ) := φ(a), (4.27)

is well-defined and constitutes an isomorphism. If f_e : G −→ G/ker φ denotes the natural epimorphism according to (a) and ι : Im φ −→ H, ι(a) := a, denotes the embedding, then f_m : G/ker φ −→ H, f_m := ι ◦ f, is a monomorphism such that

φ = f_m ◦ f_e. (4.28)

Proof. (a): To see that (4.24) well-defines a composition on G/N, we need to show that the definition is independent of the chosen coset representatives a, b: To this end, suppose a, b, a′, b′ ∈ G are such that aN = a′N and bN = b′N. We need to show abN = a′b′N. There exist n_a, n_b ∈ N such that a′ = an_a, b′ = bn_b. Since N is a normal subgroup of G, we have bN = Nb and there exists n ∈ N such that n_a b = bn. Then, as nn_b N = N, we obtain

a′b′N = an_a bn_b N = abnn_b N = abN,

as needed. That φ_N is surjective is immediate from (4.25). Moreover, the computation

∀ a, b ∈ G : φ_N(ab) = abN = (aN) · (bN) = φ_N(a) · φ_N(b)

verifies φ_N to be a homomorphism. That (G/N, ·) forms a group is now an immediate consequence of Prop. 4.11(c),(d).

(b): We begin the proof by showing f is well-defined, i.e. that f(a ker φ) does not depend on the chosen coset representative a ∈ G: Let e′ ∈ H be neutral and suppose a′ ∈ G is such that a′ ker φ = a ker φ. Then there exists x ∈ ker φ such that a′ = ax, implying

f(a′ ker φ) = φ(a′) = φ(ax) = φ(a)φ(x) = φ(a) · e′ = φ(a) = f(a ker φ),

as desired. To prove f to be a homomorphism, let a, b ∈ G. Then

f((a ker φ) · (b ker φ)) = f(ab ker φ) = φ(ab) = φ(a)φ(b) = f(a ker φ)f(b ker φ),

i.e. f is a homomorphism. If x ∈ Im φ, then there exists a ∈ G with x = φ(a). Thus, f(a ker φ) = φ(a) = x, showing f to be surjective. Now suppose a ∈ G is such that f(a ker φ) = e′. Then φ(a) = e′, i.e. a ∈ ker φ. Thus, a ker φ = ker φ, showing ker f = {ker φ}, i.e. f is injective, completing the proof that f is an isomorphism. Since f and ι are monomorphisms, so is f_m = ι ◦ f. Moreover, if a ∈ G, then

(f_m ◦ f_e)(a) = f_m(a ker φ) = f(a ker φ) = φ(a),

thereby proving (4.28).

Example 4.28. (a) Consider (Z, +) and let n ∈ N. We know from Ex. 4.18(b) that nZ ≤ Z. As (Z, +) is also commutative, nZ is a normal subgroup of Z and we can form the quotient group

Z_n := Z/nZ. (4.29)

The elements of Z_n are the cosets r + nZ, r ∈ Z. For r ∈ Z, we have

r̄ := r + nZ = {r + mn : m ∈ Z}.

Note that r + mn + nZ = r + nZ. For integers k, l ∈ Z, one says k is congruent to l modulo n if, and only if, there exists m ∈ Z such that k − l = mn and, in this case, one writes k ≡ l (mod n). If k = r + mn with m ∈ Z, then k − r = mn, implying

r + nZ = {k ∈ Z : k ≡ r (mod n)}.

For this reason, one also calls the elements of Z_n congruence classes. If k = r + mn as above, one also says that r is the residue (or remainder) when dividing k by n and, thus, one also calls the elements of Z_n residue classes. Here, the canonical homomorphism of Th. 4.27(a) takes the form

φ : Z −→ Z_n, φ(r) := r̄ = r + nZ.

We also note that

Z = ⋃_{r=0}^{n−1} r̄. (4.30)

Now consider G := {0, . . . , n − 1} and define an addition ⊕ on G by letting

∀ r, s ∈ G : r ⊕ s := { r + s for r + s ≤ n − 1; r + s − n for n ≤ r + s ≤ 2n − 2 },

which is commutative due to (Z, +) being commutative. We claim (G, ⊕) ≅ (Z_n, +) due to the isomorphism f : G −→ Z_n, f(r) := r̄. If r, s ∈ G, then

f(r ⊕ s) = f(r + s) = (r + s) + nZ = r̄ + s̄ = f(r) + f(s) for r + s ≤ n − 1,
f(r ⊕ s) = f(r + s − n) = (r + s − n) + nZ = (r + s) + nZ = r̄ + s̄ = f(r) + f(s) for n ≤ r + s ≤ 2n − 2,

showing f to be a homomorphism. Moreover, if r ≠ s with r, s ∈ G, then r̄ ≠ s̄, i.e. f is injective, while (4.30) shows f to be surjective, completing the proof that f is an isomorphism.
In particular, (G, ⊕) is a group, due to Prop. 4.11(c). Even though it constitutes a slight abuse of notation, we will tend to use the isomorphism f to identify G with Z_n, where we write simply + instead of ⊕ and write r instead of r̄, identifying r̄ with its standard representative r ∈ r̄. The Cayley tables for (Z1, +) and (Z2, +) are

+ | 0        + | 0 1
0 | 0        0 | 0 1
             1 | 1 0

respectively. Comparing with Ex. 4.14(a), we see that φ1 : Z1 −→ {e}, φ1(0) := e, and φ2 : Z2 −→ {e, a}, φ2(0) := e, φ2(1) := a, are isomorphisms.

(b) Let m, n ∈ N. As an example for an application of the isomorphism theorem of Th. 4.27(b), we show mZ/mnZ ≅ Z_n: Consider the map

φ : mZ −→ Z_n, φ(mr) := r + nZ.

Then φ is a homomorphism, since

∀ r, s ∈ Z : φ(mr + ms) = φ(m(r + s)) = r + s + nZ = (r + nZ) + (s + nZ) = φ(mr) + φ(ms).

Clearly, φ is surjective, i.e. Im φ = Z_n. Moreover,

mr ∈ ker φ ⇔ r ∈ nZ ⇔ (∃ k ∈ Z : r = kn) ⇔ mr = mkn ∈ mnZ,

showing ker φ = mnZ. In consequence, mZ/mnZ ≅ Z_n holds due to Th. 4.27(b).
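The well-definedness behind Ex. 4.28(a) can also be probed numerically: shifting the representatives of r̄ and s̄ by multiples of n must not change the sum of the cosets. A small R sketch (all names ad hoc, a spot check rather than a proof):

    n <- 4
    r <- 3; s <- 2
    reps_r <- r + n * (-2:2)                  # several representatives of r + nZ
    reps_s <- s + n * (-2:2)                  # several representatives of s + nZ
    sums <- outer(reps_r, reps_s, `+`) %% n   # (r' + s') mod n for all choices
    unique(as.vector(sums))                   # a single value: (r + s) mod n = 1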
4.2 Rings and Fields

While groups are already sufficiently rich to give rise to the vast algebraic discipline called group theory, before we can define vector spaces, we still need to consider structures called rings and fields (where fields are rings that are especially benign). Rings are richer and, in general, more complicated than groups, as they always come with two compositions instead of just one.

Definition 4.29. Let R be a nonempty set with two composition maps

+ : R × R −→ R, (x, y) ↦ x + y,
· : R × R −→ R, (x, y) ↦ x · y (4.31)

(+ is called addition and · is called multiplication; as before, one often writes xy instead of x · y). Then (R, +, ·) (or just R, if + and · are understood) is called a ring if, and only if, the following three conditions are satisfied:

(i) (R, +) is a commutative group (its neutral element is denoted by 0).

(ii) Multiplication is associative.

(iii) Distributivity:

∀ x, y, z ∈ R : x · (y + z) = x · y + x · z, (4.32a)
∀ x, y, z ∈ R : (y + z) · x = y · x + z · x. (4.32b)

A ring R is called commutative if, and only if, its multiplication is commutative. Moreover, a ring is called a ring with unity if, and only if, R contains a neutral element with respect to multiplication (i.e. there is 1 ∈ R such that 1 · x = x · 1 = x for each x ∈ R) – some authors always require a ring to have a neutral element with respect to multiplication. Finally, (R, +, ·) is called a field if, and only if, it is a ring and, in addition, (R \ {0}, ·) constitutes a commutative group (its neutral element is denoted by 1). For each x, y in a ring R, define y − x := y + (−x), where −x is the additive inverse of x. If R is a field, then, for x ≠ 0, also define the fractions y/x := yx^{-1} with numerator y and denominator x, where x^{-1} denotes the multiplicative inverse of x.

—

Like group theory, ring theory and field theory are vast and important subdisciplines of algebra. Here we will merely give a brief introduction to some elementary notions and results, before proceeding to vector spaces, which are defined building on the notion of a field. Before we come to examples of rings and fields, we will prove a number of basic rules. However, it might be useful to already know that, with the usual addition and multiplication, Z is a ring (but not a field), and Q, R, C all are fields.

Theorem 4.30. The following statements and rules are valid in every ring (R, +, ·) (let 0 denote the additive neutral element and let x, y, z ∈ R):

(a) x · 0 = 0 = 0 · x.

(b) x(−y) = −(xy) = (−x)y.

(c) (−x)(−y) = xy.

(d) x(y − z) = xy − xz, (y − z)x = yx − zx.

Proof. (a): Using (4.32a), one computes

x · 0 + x · 0 = x · (0 + 0) = x · 0 = 0 + x · 0,

i.e. x · 0 = 0 follows, since (R, +) is a group (cancel x · 0). The second equality follows analogously using (4.32b).

(b): xy + x(−y) = x(y − y) = x · 0 = 0, where we used (4.32a) and (a). This shows x(−y) is the additive inverse to xy. The second equality follows analogously using (4.32b).

(c): xy = −(−(xy)) = −(x(−y)) = (−x)(−y), where (b) was used twice.

(d): x(y − z) = x(y + (−z)) = xy + x(−z) = xy − xz and (y − z)x = (y + (−z))x = yx + (−z)x = yx − zx.

Theorem 4.31. The following statements and rules are valid in every field (F, +, ·):

(a) xy = 0 ⇒ x = 0 ∨ y = 0.

(b) Rules for Fractions:

a/c + b/d = (ad + bc)/(cd),  (a/c) · (b/d) = (ab)/(cd),  (a/c)/(b/d) = (ad)/(bc),

where all denominators are assumed ≠ 0.

Proof. (a): If xy = 0 and x ≠ 0, then y = 1 · y = x^{-1}xy = x^{-1} · 0 = 0.

(b): One computes

a/c + b/d = ac^{-1} + bd^{-1} = add^{-1}c^{-1} + bcc^{-1}d^{-1} = (ad + bc)(cd)^{-1} = (ad + bc)/(cd)

and

(a/c) · (b/d) = ac^{-1}bd^{-1} = ab(cd)^{-1} = (ab)/(cd)

and

(a/c)/(b/d) = ac^{-1}(bd^{-1})^{-1} = ac^{-1}b^{-1}d = ad(bc)^{-1} = (ad)/(bc),

completing the proof.

Definition and Remark 4.32. Let (R, +, ·) be a ring and a ∈ R. Then a is called a left (resp. right) zero divisor if, and only if, there exists x ∈ R \ {0} such that ax = 0 (resp. xa = 0). If a ≠ 0 is a zero divisor, then it is called a nonzero or nontrivial zero divisor. According to Th. 4.30(a), 0 is always a zero divisor, except for R = {0}. According to Th. 4.31(a), in a field, there do not exist any nontrivial zero divisors. However, in general, rings can have nontrivial zero divisors (see, e.g., Ex. 4.38 and Ex. 4.42(a) below).

Example 4.33 (Ring of Integers Z). Even though we have occasionally already used multiplication on Z in examples, so far, we have not provided a mathematically precise definition. The present example will remedy this situation. Recall from Ex. 2.36(a) and Th. 4.15 the definition of Z as a set of equivalence classes of elements of N0 × N0, with addition (and subtraction) on Z defined according to (4.17). To obtain the expected laws of arithmetic, multiplication on Z needs to be defined such that (a − b) · (c − d) = (ac + bd) − (ad + bc), which leads to the following definition: Multiplication on Z is defined by

· : Z × Z −→ Z, ([a, b], [c, d]) ↦ [a, b] · [c, d] := [ac + bd, ad + bc]. (4.33)

(a) It is an exercise to show the definition in (4.33) does not depend on the chosen representatives, i.e.

∀ a, b, c, d, ã, b̃, c̃, d̃ ∈ N0 : ([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃]) ⇒ [ac + bd, ad + bc] = [ãc̃ + b̃d̃, ãd̃ + b̃c̃]. (4.34)

(b) It is an exercise to show (Z, +, ·) is a commutative ring with unity, where [1, 0] is the neutral element of multiplication, and that there are no nontrivial zero divisors, i.e.

∀ a, b, c, d ∈ N0 : ([a, b] · [c, d] = [ac + bd, ad + bc] = [0, 0] ⇒ [a, b] = [0, 0] ∨ [c, d] = [0, 0]). (4.35)

(c) (Z, +, ·) is not a field since, e.g., there is no multiplicative inverse for 2 ∈ Z: While 0 · 2 = 0, one has n · 2 ≥ 2 for each n ∈ N and −n · 2 ≤ −2 for each n ∈ N, showing k · 2 ≠ 1 for each k ∈ Z.
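The pair encoding of Ex. 4.33 (think of [a, b] as a − b) is concrete enough to test by machine. The following R sketch checks, for sample pairs, that the product (4.33) is independent of the chosen representatives; the helpers equiv and mult and the sample values are ours:

    equiv <- function(p, q) p[1] + q[2] == p[2] + q[1]    # (a,b) ~ (c,d) :⇔ a + d = b + c
    mult  <- function(p, q) c(p[1]*q[1] + p[2]*q[2],      # (4.33): [a,b] · [c,d]
                              p[1]*q[2] + p[2]*q[1])      #   := [ac + bd, ad + bc]
    p <- c(5, 3); p2 <- c(9, 7)      # two representatives of the class "2"
    q <- c(1, 4); q2 <- c(0, 3)      # two representatives of the class "-3"
    equiv(mult(p, q), mult(p2, q2))  # TRUE: both products represent "-6"

A sample check of this kind illustrates (4.34); the general statement is the stated exercise.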
Definition 4.34. (a) Let (R, +, ·) be a ring (resp. a field), ∅ ≠ U ⊆ R. We call U a subring (resp. a subfield) of R if, and only if, (U, +, ·) forms a ring (resp. a field), where the compositions on U are the respective restrictions of the compositions on R, i.e. +↾_{U×U} and ·↾_{U×U}.

(b) Let (R, +, ·) and (S, +, ·) be rings (caveat: to simplify notation, we use the same symbols to denote the compositions on R and S, respectively; however, they might even denote different compositions on the same set R = S). A map φ : R −→ S is called ring homomorphism if, and only if, φ is a homomorphism in the sense of Def. 4.10 with respect to both + and · (caveat: Ex. 4.36(a) below shows that a ring homomorphism is not necessarily unital with respect to multiplication). The notions ring monomorphism, epimorphism, isomorphism, endomorphism, automorphism are then defined as in Def. 4.10. Moreover, (R, +, ·) and (S, +, ·) are called isomorphic (denoted (R, +, ·) ≅ (S, +, ·)) if, and only if, there exists a ring isomorphism φ : R −→ S.

Theorem 4.35. (a) Let (R, +, ·) be a ring, ∅ ≠ U ⊆ R. Then U is a subring of R if, and only if, (i) and (ii) hold, where

(i) (U, +) is a subgroup of (R, +).

(ii) For each a, b ∈ U, one has ab ∈ U.

(b) Let (F, +, ·) be a field, ∅ ≠ U ⊆ F. Then U is a subfield of F if, and only if, (i) and (ii) hold, where

(i) (U, +) is a subgroup of (F, +).

(ii) (U \ {0}, ·) is a subgroup of (F \ {0}, ·).

Proof. In both cases, it is clear that (i) and (ii) are necessary. That (i) and (ii) are also sufficient in both cases is due to the fact that, if associativity (resp. commutativity, resp. distributivity) holds on R (resp. F), then it is immediate that the same property holds on U.

Example 4.36. (a) Clearly, the trivial ring ({0}, +, ·) (≅ (Z1, +, ·), see Ex. 4.38 below) is a subring of every ring (it is not a field, since {0} \ {0} = ∅ is not a group). If R and S are arbitrary rings, then, clearly, the constant map φ0 : R −→ S, φ0 ≡ 0, is always a ring homomorphism (this shows that a ring homomorphism is not necessarily unital with respect to multiplication). Also note that Th. 4.30(a) implies that any ring that has 0 = 1 (in the sense that the neutral elements for addition and multiplication are the same) has necessarily precisely one element, i.e. it is (isomorphic to) ({0}, +, ·).

(b) Let n ∈ N. It is clear from Th. 4.35(a) that nZ is a subring of Z (note that, for n > 1, nZ is not a ring with unity). Moreover, for n > 1, φ : Z −→ nZ, φ(k) := nk, is not a ring homomorphism, since, e.g., φ(1 · 1) = n ≠ n² = φ(1)φ(1).

(c) If R, S are arbitrary rings and φ : R −→ S is a ring homomorphism, then it is clear from Th. 4.35(a) that Im φ is a subring of S. Moreover, ker φ := φ^{-1}({0}) is a subring of R: This is also due to Th. 4.35(a), since (ker φ, +) ≤ (R, +) and, if a, b ∈ ker φ, then φ(ab) = φ(a)φ(b) = 0, showing ab ∈ ker φ. However, if R and S are rings with unity and φ(1) = 1, then ker φ is a ring with unity if, and only if, S = {0}: Indeed, if 1 ∈ ker φ, then 0 = φ(1) = 1 in S.

(d) Let (R, +, ·) be a ring with unity.
We claim that the group homomorphism φ1 of Ex. 4.12(c) extends to a unital ring homomorphism

φ1 : Z −→ R, φ1(n) := { n · 1 = ∑_{i=1}^{n} 1 for n ≥ 1; 0 for n = 0; −((−n) · 1) for n ≤ −1 }

(note that this is precisely the definition from Not. 4.7 (setting x := 1 in Not. 4.7), except that, in Not. 4.7, the composition was written as multiplication, i.e. x^n of Not. 4.7 here becomes n · x = ∑_{i=1}^{n} x): Indeed,

∀ m, n ∈ Z : φ1(mn) = (mn) · 1 = n · (m · 1) = ∑_{i=1}^{n} φ1(m) = φ1(m) ∑_{i=1}^{n} 1 = φ1(m)φ1(n),

where the second equality holds by Th. 4.8(b).

(e) Let (R, +, ·) be a ring (resp. a ring with unity, resp. a field), let I ≠ ∅ be an index set, and let (U_i)_{i∈I} be a family of subrings (resp. of subrings with unity, resp. of subfields). It is then an exercise to show U := ⋂_{i∈I} U_i is a subring (resp. a subring with unity, resp. a subfield).

(f) In contrast to intersections, unions of rings are not necessarily rings (however, cf. (g) below): We know from (b) that 2Z and 3Z are subrings of Z. But we already noted in Ex. 4.18(e) that (2Z) ∪ (3Z) is not even a subgroup of (Z, +). To show that the union of two fields is not necessarily a field (not even a ring, actually) needs slightly more work: We use that Q is a subfield of R (cf. Rem. 4.40 below) and show that, for each x ∈ R with x² ∈ Q,

Q(x) := {r + sx : r, s ∈ Q}

is a subfield of R (clearly, Q(x) = Q if, and only if, x ∈ Q): Setting r = s = 0 shows 0 ∈ Q(x). Setting r = 1, s = 0 shows 1 ∈ Q(x). If r1, s1, r2, s2 ∈ Q, then

r1 + s1x + r2 + s2x = r1 + r2 + (s1 + s2)x ∈ Q(x),
(r1 + s1x)(r2 + s2x) = r1r2 + s1s2x² + (s1r2 + r1s2)x ∈ Q(x),
−(r1 + s1x) = −r1 − s1x ∈ Q(x),
(r1 + s1x)^{-1} = (r1 − s1x)/(r1² − s1²x²) ∈ Q(x),

showing Q(x) to be a subfield of R by Th. 4.35(b). However, e.g., U := Q(√2) ∪ Q(√3) is not even a ring, since α := √2 + √3 ∉ U: α ∉ Q(√2) since, otherwise, there are r, s ∈ Q with α = r + s√2, i.e. √3 = r + (s − 1)√2, and 3 = r² + 2r(s − 1)√2 + 2(s − 1)², implying the false statement

√3 = 0 ∨ √3 ∈ Q ∨ √(3/2) ∈ Q ∨ √2 ∈ Q.

Switching the roles of √2 and √3 shows α ∉ Q(√3), i.e. α ∉ U.

(g) While (f) shows that unions of subrings (or even subfields) are not necessarily subrings, if (R, +, ·) is a ring (resp. a ring with unity, resp. a field), I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (U_i)_{i∈I} is an increasing family of subrings (resp. of subrings with unity, resp. of subfields) (i.e., for each i, j ∈ I with i ≤ j, one has U_i ⊆ U_j), then U := ⋃_{i∈I} U_i is a subring (resp. a subring with unity, resp. a subfield) as well: Indeed, according to Ex. 4.18(f), (U, +) ≤ (R, +). If a, b ∈ U, then there exist i, j ∈ I such that a ∈ U_i and b ∈ U_j. If k ∈ I with i, j ≤ k, then ab ∈ U_k ⊆ U, i.e. U is a ring by Th. 4.35(a). If 1 ∈ U_i for each i ∈ I, then 1 ∈ U. If each (U_i \ {0}, ·) is a group, then (U \ {0}, ·) is a group by Ex. 4.18(f), showing U to be a subfield by Th. 4.35(b).

—

We can extend Prop. 4.11(c) to rings and fields:

Proposition 4.37. Let A be a nonempty set with compositions + and ·, and let B be a nonempty set with compositions + and · (caveat: to simplify notation, we use the same symbols to denote the compositions on A and B, respectively; however, they might even denote different compositions on the same set A = B).
If φ : A −→ B is an epimorphism with respect to both + and ·, then

(A, +, ·) satisfies (4.32a) ⇒ (B, +, ·) satisfies (4.32a),
(A, +, ·) satisfies (4.32b) ⇒ (B, +, ·) satisfies (4.32b),
(A, +, ·) ring ⇒ (B, +, ·) ring,
(A, +, ·) ring with unity ⇒ (B, +, ·) ring with unity,
(A, +, ·) field ⇒ (B, +, ·) field, provided B ≠ {0}.

If φ is even an isomorphism with respect to both + and ·, then the above implications become equivalences.

Proof. Let b1, b2, b3 ∈ B with preimages a1, a2, a3 ∈ A, respectively. If (A, +, ·) satisfies (4.32a), then

b1 · (b2 + b3) = φ(a1) · (φ(a2) + φ(a3)) = φ(a1 · (a2 + a3)) = φ(a1 · a2 + a1 · a3) = φ(a1) · φ(a2) + φ(a1) · φ(a3) = b1 · b2 + b1 · b3,

showing (B, +, ·) satisfies (4.32a) as well. If (A, +, ·) satisfies (4.32b), then

(b2 + b3) · b1 = (φ(a2) + φ(a3)) · φ(a1) = φ((a2 + a3) · a1) = φ(a2 · a1 + a3 · a1) = φ(a2) · φ(a1) + φ(a3) · φ(a1) = b2 · b1 + b3 · b1,

showing (B, +, ·) satisfies (4.32b) as well. If (A, +, ·) is a ring, then we know from Prop. 4.11(c) that (B, +) is a commutative group and that (B, ·) is associative, showing that (B, +, ·) is a ring as well. If (A, +, ·) is a ring with unity, then the above and Prop. 4.11(d) imply (B, +, ·) to be a ring with unity as well. If (A, +, ·) is a field, then, as B \ {0} ≠ ∅, (B \ {0}, ·) must also be a group, i.e. (B, +, ·) is a field. Finally, if φ is an isomorphism, then the implications become equivalences, as both φ and φ^{-1} are epimorphisms by Prop. 4.11(b) (and since φ(0) = 0).

Example 4.38. Let n ∈ N. In Ex. 4.28(a), we considered the group (Z_n, +), where Z_n was defined as the quotient group Z_n := Z/nZ. We now want to define a multiplication on Z_n that makes Z_n into a ring with unity (and into a field if, and only if, n is prime, cf. Def. D.2(b)). This is accomplished by letting

∀ r, s ∈ Z : r̄ · s̄ = (r + nZ) · (s + nZ) := (rs) + nZ. (4.36)

First, we check that the definition does not depend on the chosen representatives: Let r1, r2, s1, s2 ∈ Z. If r̄1 = r̄2 and s̄1 = s̄2, then there exist m_r, m_s ∈ Z such that r1 − r2 = m_r n and s1 − s2 = m_s n, implying

r̄1 · s̄1 = r1s1 + nZ = (r2 + m_r n)(s2 + m_s n) + nZ = r2s2 + (m_r s2 + m_s r2 + m_r m_s n)n + nZ = r2s2 + nZ = r̄2 · s̄2,

as desired. We already know the canonical homomorphism φ : Z −→ Z_n, φ(r) = r̄ = r + nZ, to be a homomorphism with respect to addition. Since

∀ r, s ∈ Z : φ(rs) = (rs) + nZ = r̄ · s̄ = φ(r) · φ(s),

it is also a homomorphism with respect to the above-defined multiplication. Since φ is also surjective, (Z_n, +, ·) is a ring with unity by Prop. 4.37. Now suppose that n is not prime. If n = 1, then #Z1 = 1, i.e. it is (isomorphic to) the trivial ring of Ex. 4.36(a). If n = ab with a, b ∈ N, 1 < a, b < n, then ā · b̄ = n̄ = 0̄, showing ā and b̄ to be nontrivial zero divisors (in Z_n). In particular, (Z_n, +, ·) is not a field. Now consider the case that n is prime. If r ∈ Z and r̄ ≠ 0̄, then gcd(r, n) = 1 (cf. Def. D.2(c)) and, by (D.6) of the Appendix, there exist x, y ∈ Z with xn + yr = 1. Then

ȳ · r̄ = (yr) + nZ = (1 − xn) + nZ = 1̄,

showing ȳ to be the multiplicative inverse to r̄. Thus, (Z_n \ {0̄}, ·) is a group and (Z_n, +, ·) is a field.
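Before completing the example, note that the coefficients x, y with xn + yr = 1, and hence the inverse ȳ, can be computed with the extended Euclidean algorithm underlying (D.6). A minimal sketch in R (the helpers ext_gcd and mod_inverse are our own names, not part of the formal text):

    ext_gcd <- function(a, b) {            # returns g = gcd(a, b) and x, y with x·a + y·b = g
      if (b == 0) return(list(g = a, x = 1, y = 0))
      r <- ext_gcd(b, a %% b)
      list(g = r$g, x = r$y, y = r$x - (a %/% b) * r$y)
    }
    mod_inverse <- function(r, n) {
      e <- ext_gcd(r %% n, n)
      stopifnot(e$g == 1)                  # r̄ is invertible iff gcd(r, n) = 1
      e$x %% n
    }
    mod_inverse(3, 7)                      # 5, since 3 · 5 = 15 ≡ 1 (mod 7)

For prime n, gcd(r, n) = 1 holds for every r̄ ≠ 0̄, so the stopifnot check never fails, mirroring the field property just proved.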
For the rest of the example, we, once again, allow an arbitrary n ∈ N. In Ex. 4.28(a), we showed that, letting G := {0, . . . , n − 1}, we had (G, +) ≅ (Z_n, +). We now want to extend this isomorphism to a ring isomorphism. To this end, define a multiplication ⊗ on G by letting

∀ r, s ∈ G : r ⊗ s := p, where rs = qn + p with q, p ∈ N0 and 0 ≤ p < n

(cf. Th. D.1), which is commutative due to multiplication on Z being commutative. We claim the additive isomorphism f : G −→ Z_n, f(r) := r̄, of Ex. 4.28(a) to be a multiplicative isomorphism as well: If r, s ∈ G, then, using q, p as above,

f(r ⊗ s) = f(p) = p̄ = (rs − qn) + nZ = (rs) + nZ = r̄ · s̄ = f(r)f(s),

showing f to be a multiplicative homomorphism. As we know f to be bijective, (G, +, ⊗) is a ring by Prop. 4.37, yielding (G, +, ⊗) ≅ (Z_n, +, ·). As stated in Ex. 4.28(a) in the context of (Z_n, +), one tends to use the isomorphism f to identify G with Z_n. One does the same in the context of (Z_n, +, ·), also simply writing · for the multiplication on G.

Example 4.39 (Field of Rational Numbers Q). In Ex. 2.36(b), we defined the set of rational numbers Q as the quotient set of Z × (Z \ {0}) with respect to the equivalence relation defined by

(a, b) ∼ (c, d) :⇔ a · d = b · c.

We now want to define “the usual” addition and multiplication on Q and then verify that these make Q into a field (recall the notation a/b := [(a, b)]): Addition on Q is defined by

+ : Q × Q −→ Q, (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd). (4.37)

Multiplication on Q is defined by

· : Q × Q −→ Q, (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd). (4.38)

We will now show that (Q, +, ·) forms a field, where 0/1 and 1/1 are the neutral elements with respect to addition and multiplication, respectively, (−a)/b is the additive inverse to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0. We already know from Ex. 2.36(b) that the map

ι : Z −→ Q, ι(k) := k/1,

is injective. We will now also show it is a unital ring monomorphism, i.e. it satisfies ι(1) = 1/1 and

∀ k, l ∈ Z : ι(k + l) = ι(k) + ι(l), (4.39a)
∀ k, l ∈ Z : ι(kl) = ι(k) · ι(l). (4.39b)

We begin by showing that the definitions in (4.37) and (4.38) do not depend on the chosen representatives, i.e.

∀ a, c, ã, c̃ ∈ Z ∀ b, d, b̃, d̃ ∈ Z \ {0} : (a/b = ã/b̃ ∧ c/d = c̃/d̃) ⇒ (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃) (4.40)

and

∀ a, c, ã, c̃ ∈ Z ∀ b, d, b̃, d̃ ∈ Z \ {0} : (a/b = ã/b̃ ∧ c/d = c̃/d̃) ⇒ (ac)/(bd) = (ãc̃)/(b̃d̃). (4.41)

Furthermore, we verify that the results of both addition and multiplication are always elements of Q.

(4.40) and (4.41): a/b = ã/b̃ means ab̃ = ãb, and c/d = c̃/d̃ means cd̃ = c̃d, implying

(ad + bc)b̃d̃ = bd(ãd̃ + b̃c̃), i.e. (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃),

and

acb̃d̃ = bdãc̃, i.e. (ac)/(bd) = (ãc̃)/(b̃d̃).

That the results of both addition and multiplication are always elements of Q follows from (4.35), i.e. from the fact that Z has no nontrivial zero divisors. In particular, if b, d ≠ 0, then bd ≠ 0, showing (ad + bc)/(bd) ∈ Q and (ac)/(bd) ∈ Q.

Next, we verify + and · to be commutative and associative on Q: Let a, c, e ∈ Z and b, d, f ∈ Z \ {0}. Then, using commutativity on Z, we compute

c/d + a/b = (cb + da)/(db) = (ad + bc)/(bd) = a/b + c/d,
(c/d) · (a/b) = (ca)/(db) = (ac)/(bd) = (a/b) · (c/d),

showing commutativity on Q.
Using associativity and distributivity on Z, we compute

a/b + (c/d + e/f) = a/b + (cf + de)/(df) = (adf + b(cf + de))/(bdf) = ((ad + bc)f + bde)/(bdf) = (ad + bc)/(bd) + e/f = (a/b + c/d) + e/f,
(a/b) · ((c/d) · (e/f)) = (a(ce))/(b(df)) = ((ac)e)/((bd)f) = ((a/b) · (c/d)) · (e/f),

showing associativity on Q. We proceed to checking distributivity on Q: Using commutativity, associativity, and distributivity on Z, we compute

(a/b) · (c/d + e/f) = (a(cf + de))/(bdf) = (acf + dae)/(bdf) = (acbf + bdae)/(bdbf) = (ac)/(bd) + (ae)/(bf) = (a/b) · (c/d) + (a/b) · (e/f),

proving distributivity on Q. We now check the claims regarding neutral and inverse elements:

a/b + 0/1 = (a · 1 + b · 0)/(b · 1) = a/b,
a/b + (−a)/b = (ab + b(−a))/b² = ((a − a)b)/b² = 0/b² = 0/1,
(a/b) · (1/1) = (a · 1)/(b · 1) = a/b,
(a/b) · (b/a) = (ab)/(ba) = 1/1.

Thus, (Q, +, ·) is a ring and (Q \ {0}, ·) is a group, implying Q to be a field. Finally,

ι(k) + ι(l) = k/1 + l/1 = (k · 1 + 1 · l)/(1 · 1) = (k + l)/1 = ι(k + l),
ι(k) · ι(l) = (k/1) · (l/1) = (kl)/1 = ι(kl),

proving (4.39).

Remark 4.40. With the usual addition and multiplication, (R, +, ·) and (C, +, ·) are fields (see [Phi16, Th. D.39] and [Phi16, Th. 5.2], respectively). In particular, R is a subfield of C, and Q is a subfield of both R and C.
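The arithmetic of Ex. 4.39 is easy to mirror in code, keeping fractions unreduced as pairs and comparing them via the defining relation ad = bc. A brief R sketch (the function names are ours):

    frac_eq   <- function(p, q) p[1] * q[2] == p[2] * q[1]            # a/b = c/d :⇔ ad = bc
    frac_add  <- function(p, q) c(p[1]*q[2] + p[2]*q[1], p[2]*q[2])   # (4.37)
    frac_mult <- function(p, q) c(p[1]*q[1], p[2]*q[2])               # (4.38)
    half <- c(1, 2); half2 <- c(3, 6)                # two representatives of 1/2
    frac_eq(frac_add(half, c(1, 3)), frac_add(half2, c(1, 3)))  # TRUE: both equal 5/6
    frac_eq(frac_mult(c(2, 3), c(3, 2)), c(1, 1))               # TRUE: (2/3) · (3/2) = 1/1

The first TRUE is a sample instance of the representative-independence (4.40); the second illustrates the multiplicative inverse b/a of a/b.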
Definition and Remark 4.41. Let (R, +, ·) be a ring with unity. We call x ∈ R invertible if, and only if, there exists x̄ ∈ R such that xx̄ = x̄x = 1. We call (R*, ·), where

R* := {x ∈ R : x invertible},

the group of invertible elements of R. We verify (R*, ·) to be a group: If x, y ∈ R*, then xyȳx̄ = 1 and ȳx̄xy = 1, showing xy ∈ R*. Moreover, R* inherits associativity from R, 1 ∈ R*, and each x ∈ R* has a multiplicative inverse by definition of R*, proving (R*, ·) to be a group. We also note that, if R ≠ {0}, then 0 ∉ R* and (R*, +) is not a group (in particular, R* is then not a subring of R).

Example 4.42. (a) Let (R, +, ·) be a ring and let M be a set. As in Ex. 4.9(e), we can define + and · pointwise on R^M, i.e.

∀ f, g ∈ R^M : f + g : M −→ R, (f + g)(x) := f(x) + g(x),
              f · g : M −→ R, (f · g)(x) := f(x) · g(x).

As with commutativity and associativity, it is immediate that R^M inherits distributivity from R. Thus, (R^M, +, ·) is a ring as well; if (R, +, ·) is a ring with unity, then so is (R^M, +, ·), where f ≡ 1 is neutral for · in R^M, and, using the notation of Def. and Rem. 4.41, (R^M)* = (R*)^M. We point out that, if M and R both have at least two distinct elements, then R^M always has nontrivial zero divisors, even if R has none: Let x, y ∈ M with x ≠ y, and 0 ≠ a ∈ R. Define

f_x, f_y : M −→ R, f_x(z) := { a if z = x; 0 if z ≠ x }, f_y(z) := { a if z = y; 0 if z ≠ y }.

Then f_x ≢ 0, f_y ≢ 0, but f_x · f_y ≡ 0.

(b) Let (G, +) be an abelian group and let End(G) denote the set of (group) endomorphisms of G. Define addition and multiplication on End(G) by letting, for each f, g ∈ End(G),

(f + g) : G −→ G, (f + g)(a) := f(a) + g(a), (4.42a)
(f · g) : G −→ G, (f · g)(a) := f(g(a)). (4.42b)

We claim that (End(G), +, ·) forms a ring with unity (the so-called ring of endomorphisms on G): First, we check + to be well-defined (which is clear for ·): For each a, b ∈ G, we compute, using commutativity of + on G,

(f + g)(a + b) = f(a + b) + g(a + b) = f(a) + f(b) + g(a) + g(b) = (f + g)(a) + (f + g)(b),

showing f + g ∈ End(G). As we then know (End(G), +) ≤ (F(G, G), +), we know · to be associative, and Id is clearly neutral for ·, it remains to check distributivity: To this end, let f, g, h ∈ End(G). Then, for each a ∈ G,

(f · (g + h))(a) = f(g(a) + h(a)) = (fg)(a) + (fh)(a) = (fg + fh)(a),
((g + h) · f)(a) = (g + h)(f(a)) = (gf)(a) + (hf)(a) = (gf + hf)(a),

proving distributivity and that (End(G), +, ·) forms a ring with unity. Moreover, the set Aut(G) := End(G) ∩ S_G of automorphisms on G is a subgroup of the symmetric group S_G and, comparing with Def. and Rem. 4.41, we also have Aut(G) = End(G)*.

Definition and Remark 4.43. Let (F, +, ·) be a field. We recall the unital ring homomorphism φ1 : Z −→ F from Ex. 4.36(d), where

∀ n ∈ N : φ1(n) = ∑_{i=1}^{n} 1 =: n · 1.

We say that the field F has characteristic 0 (denoted char F = 0) if, and only if, n · 1 ≠ 0 for each n ∈ N. Otherwise, we define the field's characteristic to be the number

char F := min{n ∈ N : n · 1 = 0}.

With this definition, we have char Q = char R = char C = 0 and, for each prime number p, char Z_p = p.

—

The examples above show that 0 and each prime number occur as characteristics of fields. It is not difficult to prove that no other numbers can occur:

Theorem 4.44. Let F be a field. If char F ≠ 0, then p := char F is a prime number.

Proof. Suppose p = ab with a, b ∈ N. As before, we apply the unital ring homomorphism φ1 : Z −→ F of Ex. 4.36(d). Since φ1 is a homomorphism, we obtain

0 = φ1(p) = φ1(ab) = φ1(a) · φ1(b).

As the field F has no nontrivial zero divisors, we must have φ1(a) = 0 or φ1(b) = 0. But p = min{n ∈ N : φ1(n) = 0}, implying a = p or b = p, showing p to be prime.

Lemma 4.45. If (F, +, ·) is a field and K ⊆ F is a subfield, then char K = char F (in particular, if char F = 0 (e.g. F = Q or F = R or F = C), then F does not have any finite subfields).

Proof. Suppose K ⊆ F is a subfield. If char K = n ∈ N, then ∑_{i=1}^{n} 1 = 0 in K and, thus, in F, since addition on K is the restriction of the addition on F. If char K = 0, then ∑_{i=1}^{n} 1 ≠ 0 in K for each n ∈ N. Thus, again, the same must hold in F, since addition on K is the restriction of the addition on F.

Remark and Definition 4.46. Let (F, +, ·) be a field. Let P be the intersection of all subfields of F. As a consequence of Ex. 4.36(e), P is a subfield of F. The field P is called the prime field of F and is, clearly, the smallest subfield contained in F. One can show that P ≅ Z_p if, and only if, char F = p ∈ N (these prime fields are often also denoted by F_p in the algebraic literature), and P ≅ Q if, and only if, char F = 0 (cf. [Phi19, Th. D.14]).

5 Vector Spaces

5.1 Vector Spaces and Subspaces

Definition 5.1. Let F be a field and let V be a nonempty set with two maps

+ : V × V −→ V, (x, y) ↦ x + y,
· : F × V −→ V, (λ, x) ↦ λ · x (5.1)

(+ is called (vector) addition and · is called scalar multiplication; as with other multiplications before, one often writes λx instead of λ · x – take care not to confuse the vector addition on V with the addition on F and, likewise, not to confuse the scalar multiplication with the multiplication on F; the symbol + is used for both additions and · is used for both multiplications, but you can always determine from the context which addition or multiplication is meant).
Then V is called a vector space or a linear space over F (sometimes also an F-vector space) if, and only if, the following conditions are satisfied:

(i) (V, +) is a commutative group. The neutral element with respect to + is denoted 0 (do not confuse 0 ∈ V with 0 ∈ F – once again, the same symbol is used for different objects (both objects only coincide for F = V)).

(ii) Distributivity:

∀ λ ∈ F ∀ x, y ∈ V : λ(x + y) = λx + λy, (5.2a)
∀ λ, µ ∈ F ∀ x ∈ V : (λ + µ)x = λx + µx. (5.2b)

(iii) Compatibility between Multiplication on F and Scalar Multiplication:

∀ λ, µ ∈ F ∀ x ∈ V : (λµ)x = λ(µx). (5.3)

(iv) The neutral element with respect to the multiplication on F is also neutral with respect to the scalar multiplication:

∀ x ∈ V : 1x = x. (5.4)

If V is a vector space over F, then one calls the elements of V vectors and the elements of F scalars.

Example 5.2. (a) Every field F is a vector space over itself if one uses the field addition in F as the vector addition and the field multiplication in F as the scalar multiplication (as important special cases, we obtain that R is a vector space over R and C is a vector space over C): All the vector space laws are immediate consequences of the corresponding field laws: Def. 5.1(i) holds as every field is a commutative group with respect to addition; Def. 5.1(ii) follows from field distributivity and multiplicative commutativity on F; Def. 5.1(iii) is merely the multiplicative associativity on F; and Def. 5.1(iv) holds, since scalar multiplication coincides with field multiplication on F.

(b) The reasoning in (a) actually shows that every field F is a vector space over every subfield E ⊆ F. In particular, R is a vector space over Q.

(c) In a further generalization of (b), let V be a vector space over the field F and let E ⊆ F be a subfield of F. Then V is also a vector space over E: If the laws of Def. 5.1 hold for all elements of F, then, in particular, they also hold for all elements of E.

(d) If A is any nonempty set, F is a field, and Y is a vector space over the field F, then we can make V := F(A, Y) = Y^A (the set of functions from A into Y) into a vector space over F by defining for each f, g : A −→ Y:

(f + g) : A −→ Y, (f + g)(x) := f(x) + g(x), (5.5a)
(λ · f) : A −→ Y, (λ · f)(x) := λ · f(x) for each λ ∈ F. (5.5b)

To verify that (V, +, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i), i.e. by showing (V, +) is a commutative group: Since (5.5a) is precisely the pointwise definition of Ex. 4.9(e), we already know (V, +) to be a commutative group due to (4.11d). To verify Def. 5.1(ii), one computes

∀ λ ∈ F ∀ f, g ∈ V ∀ x ∈ A : (λ(f + g))(x) = λ(f(x) + g(x)) =(∗) λf(x) + λg(x) = (λf + λg)(x),

where distributivity in the vector space Y was used at (∗), proving λ(f + g) = λf + λg and, thus, (5.2a). Similarly,

∀ λ, µ ∈ F ∀ f ∈ V ∀ x ∈ A : ((λ + µ)f)(x) = (λ + µ)f(x) =(∗) λf(x) + µf(x) = (λf + µf)(x),

where, once more, distributivity in the vector space Y was used at (∗), proving (λ + µ)f = λf + µf and, thus, (5.2b). The proof of Def. 5.1(iii) is given by

∀ λ, µ ∈ F ∀ f ∈ V ∀ x ∈ A : ((λµ)f)(x) = (λµ)f(x) =(∗) λ(µf(x)) = (λ(µf))(x),

where the validity of Def. 5.1(iii) in the vector space Y was used at (∗).
(e) Let F be a field (e.g. F = R or F = C) and n ∈ N. For x = (x_1, ..., x_n) ∈ F^n, y = (y_1, ..., y_n) ∈ F^n, and λ ∈ F, define componentwise addition

x + y := (x_1 + y_1, ..., x_n + y_n),   (5.6a)

and componentwise scalar multiplication

λx := (λx_1, ..., λx_n).   (5.6b)

Then (F^n, +, ·) constitutes a vector space over F. The validity of Def. 5.1(i) – Def. 5.1(iv) can easily be verified directly, but (F^n, +, ·) can also be seen as a special case of (d) with A = {1, ..., n} and Y = F: Recall that, according to Ex. 2.16(c), F^n = F^{{1,...,n}} is the set of functions from {1, ..., n} into F. Then x = (x_1, ..., x_n) ∈ F^n is the same as the function f : {1, ..., n} −→ F, f(j) = x_j. Thus, (5.6a) is, indeed, the same as (5.5a), and (5.6b) is, indeed, the same as (5.5b).

Proposition 5.3. The following rules hold in each vector space V over the field F (here, even though we will usually use the same symbol for both objects, we write 0 for the additive neutral element in F and 0⃗ for the additive neutral element in V):

(a) λ · 0⃗ = 0⃗ for each λ ∈ F.

(b) 0 · v = 0⃗ for each v ∈ V.

(c) λ · (−v) = (−λ) · v = −(λ · v) for each λ ∈ F and each v ∈ V.

(d) If λ · v = 0⃗ with λ ∈ F and v ∈ V, then λ = 0 or v = 0⃗.

Proof. Exercise.

Definition 5.4. Let (V, +, ·) be a vector space over the field F, ∅ ≠ U ⊆ V. We call U a subspace of V if, and only if, (U, +, ·) forms a vector space over F with respect to the operations + and · it inherits from V, i.e. with respect to +↾_{U×U} and ·↾_{F×U}.

Theorem 5.5. Let (V, +, ·) be a vector space over the field F, ∅ ≠ U ⊆ V. Then the following statements are equivalent:

(i) U is a subspace of V.

(ii) One has

∀ x,y∈U   x + y ∈ U,   (5.7a)
∀ λ∈F ∀ x∈U   λx ∈ U.   (5.7b)

(iii) One has

∀ λ,µ∈F ∀ x,y∈U   λx + µy ∈ U.   (5.8)

Proof. From the definition of a vector space, it is immediate that (i) implies (ii) and (iii). "(iii)⇒(ii)": One obtains (5.7a) from (5.8) by setting λ := µ := 1, and one obtains (5.7b) from (5.8) by setting y := 0 (and µ arbitrary, e.g., µ := 1). "(ii)⇒(i)": Clearly, U inherits commutativity and distributivity as well as the validity of (5.3) and (5.4) from V. Moreover, if x ∈ U, then setting λ := −1 in (5.7b) shows −x ∈ U. This, together with (5.7a), proves (U, +) to be a subgroup of (V, +), thereby completing the proof of (i).
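In R, numeric vectors carry exactly the componentwise operations (5.6) of R^n, so the rules of Prop. 5.3 can at least be spot-checked numerically (a minimal sketch, not, of course, a proof):

    x <- c(1, -2, 4); y <- c(0, 5, 1)   # elements of R^3
    lambda <- 3
    x + y              # componentwise addition (5.6a)
    lambda * x         # componentwise scalar multiplication (5.6b)
    0 * x              # Prop. 5.3(b): the zero vector
    (-1) * x == -x     # Prop. 5.3(c): TRUE TRUE TRUE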
Example 5.6. (a) As a consequence of Th. 5.5, if V is a vector space over the field F, then {0} ⊆ V is always a subspace of V (sometimes called the trivial or the zero vector space over F).

(b) Q is not a subspace of R if R is considered as a vector space over R (for example, √2 · 2 ∉ Q). However, Q is a subspace of R if R is considered as a vector space over Q.

(c) Consider the vector space V := R^3 over R and let U := {(x, y, z) ∈ V : 3x − y + 7z = 0}. We use Th. 5.5 to show U is a subspace of V: Since (0, 0, 0) ∈ U, we have U ≠ ∅. Moreover, if u_1 = (x_1, y_1, z_1) ∈ U, u_2 = (x_2, y_2, z_2) ∈ U, and λ, µ ∈ R, then λu_1 + µu_2 = (λx_1 + µx_2, λy_1 + µy_2, λz_1 + µz_2) and

3(λx_1 + µx_2) − (λy_1 + µy_2) + 7(λz_1 + µz_2) = λ(3x_1 − y_1 + 7z_1) + µ(3x_2 − y_2 + 7z_2) = λ · 0 + µ · 0 = 0,

using u_1, u_2 ∈ U in the second-to-last equality, showing λu_1 + µu_2 ∈ U and proving U to be a subspace of V.

(d) It is customary to write K if K may stand for R or C, and, from now on, we will occasionally use this notation. From Ex. 5.2(d), we know that, for each A ≠ ∅, F(A, K) constitutes a vector space over K. Thus, as a consequence of Th. 5.5, a subset of F(A, K) is a vector space over K if, and only if, it is closed under addition and scalar multiplication. By using results from Calculus, we obtain the following examples:

(i) The set P(K) of all polynomial functions mapping from K into K is a vector space over K by [Phi16, Rem. 6.4]; for each n ∈ N_0, the set P_n(K) of all such polynomial functions of degree at most n is also a vector space over K by [Phi16, Rem. (6.4a),(6.4b)].

(ii) If ∅ ≠ M ⊆ C, then the set of continuous functions from M into K, i.e. C(M, K), is a vector space over K by [Phi16, Th. 7.38].

(iii) If a, b ∈ R ∪ {−∞, ∞} and a < b, then the set of differentiable functions from I := ]a, b[ into K is a vector space over K by [Phi16, Th. 9.7(a),(b)]. Moreover, [Phi16, Th. 9.7(a),(b)] also implies that, for each k ∈ N, the set of k times differentiable functions from I into K is a vector space over K, and so is each set C^k(I, K) of k times continuously differentiable functions ([Phi16, Th. 7.38] is also needed for the last conclusion). The set C^∞(I, K) := ∩_{k∈N} C^k(I, K) is also a vector space over K by Th. 5.7(a) below.

Theorem 5.7. Let V be a vector space over the field F.

(a) Let I ≠ ∅ be an index set and (U_i)_{i∈I} a family of subspaces of V. Then the intersection U := ∩_{i∈I} U_i is also a subspace of V.

(b) In contrast to intersections, unions of subspaces are almost never subspaces (however, cf. (c) below). More precisely, if U_1 and U_2 are subspaces of V, then

U_1 ∪ U_2 is a subspace of V ⇔ U_1 ⊆ U_2 ∨ U_2 ⊆ U_1.   (5.9)

(c) While (b) shows that unions of subspaces are not necessarily subspaces, if I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (U_i)_{i∈I} is an increasing family of subspaces (i.e., for each i, j ∈ I with i ≤ j, one has U_i ⊆ U_j), then U := ∪_{i∈I} U_i is a subspace as well.

Proof. (a): Since 0 ∈ U, we have U ≠ ∅. If x, y ∈ U and λ, µ ∈ F, then λx + µy ∈ U_i for each i ∈ I, implying λx + µy ∈ U. Thus, U is a subspace of V by Th. 5.5.

(b): Exercise.

(c): Since 0 ∈ U_i ⊆ U, we have U ≠ ∅. If x, y ∈ U and λ, µ ∈ F, then there exist i, j ∈ I such that x ∈ U_i and y ∈ U_j. If k ∈ I with i, j ≤ k, then λx + µy ∈ U_k ⊆ U. Thus, U is a subspace of V by Th. 5.5.

While the union of two subspaces U_1, U_2 is typically not a subspace, Th. 5.7(a) implies there is always a smallest subspace containing U_1 ∪ U_2, and we will see below that this subspace is always the so-called sum U_1 + U_2 of the two subspaces (this is a special case of Prop. 5.11). As it turns out, the sum of vector spaces is closely connected with two other essential notions of vector space theory, namely the notions of linear combination and span, which we define first:
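A quick numeric illustration of Th. 5.7(b), sketched in R (membership in each coordinate axis of R^2 is tested by a vanishing component; the helper functions are ours):

    # U1 = x-axis, U2 = y-axis in R^2; both are subspaces, their union is not.
    in_U1 <- function(v) v[2] == 0
    in_U2 <- function(v) v[1] == 0
    u1 <- c(1, 0); u2 <- c(0, 1)    # u1 in U1, u2 in U2
    s <- u1 + u2                    # s = (1, 1)
    in_U1(s) || in_U2(s)            # FALSE: the union is not closed under +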
Definition 5.8. Let V be a vector space over the field F.

(a) Let n ∈ N and v_1, ..., v_n ∈ V. A vector v ∈ V is called a linear combination of v_1, ..., v_n if, and only if, there exist λ_1, ..., λ_n ∈ F (often called coefficients in this context) such that

v = Σ_{i=1}^n λ_i v_i.   (5.10)

(b) Let A ⊆ V and let

U := {U ∈ P(V) : A ⊆ U ∧ U is a subspace of V}.   (5.11)

Then the set

⟨A⟩ := span A := ∩_{U∈U} U   (5.12)

is called the span of A. Moreover, A is called a spanning set or a generating set of V if, and only if, ⟨A⟩ = V.

Proposition 5.9. Let V be a vector space over the field F and A ⊆ V.

(a) ⟨A⟩ is a subspace of V, namely the smallest subspace of V containing A.

(b) If A = ∅, then ⟨A⟩ = {0}; if A ≠ ∅, then ⟨A⟩ is the set of all linear combinations of elements from A, i.e.

⟨A⟩ = { Σ_{i=1}^n λ_i a_i : n ∈ N ∧ λ_1, ..., λ_n ∈ F ∧ a_1, ..., a_n ∈ A }.   (5.13)

(c) If A ⊆ B ⊆ V, then ⟨A⟩ ⊆ ⟨B⟩.

(d) A = ⟨A⟩ if, and only if, A is a subspace of V.

(e) ⟨⟨A⟩⟩ = ⟨A⟩.

Proof. (a) is immediate from Th. 5.7(a).

(b): For the case A = ∅, note that {0} is a subspace of V, and that {0} is contained in every subspace of V. For A ≠ ∅, let W denote the right-hand side of (5.13), and recall from (5.12) that ⟨A⟩ is the intersection of all subspaces U of V that contain A. If U is a subspace of V and A ⊆ U, then W ⊆ U, since U is closed under vector addition and scalar multiplication, showing W ⊆ ⟨A⟩. On the other hand, W is clearly a subspace of V that contains A, showing ⟨A⟩ ⊆ W, completing the proof of ⟨A⟩ = W.

(c) is immediate from (b).

(d): If A = ⟨A⟩, then A is a subspace by (a). For the converse, while it is clear that A ⊆ ⟨A⟩ always holds, if A is a subspace, then A ∈ U, where U is as in (5.11), implying ⟨A⟩ ⊆ A.

(e) now follows by combining (d) with (a).
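In the finite-dimensional numeric setting, (5.13) turns span membership into a rank computation: v ∈ ⟨{a_1, ..., a_n}⟩ in R^m if, and only if, appending v to the matrix with columns a_i does not increase the rank. A sketch in R (qr()$rank computes the matrix rank):

    A <- cbind(c(1, 0, 3), c(2, 1, 0))    # a1, a2 as columns of a 3x2 matrix
    v <- 3 * A[, 1] - 2 * A[, 2]          # a linear combination, cf. (5.10)
    w <- c(0, 0, 1)                       # not in the span
    qr(A)$rank == qr(cbind(A, v))$rank    # TRUE:  v in <{a1, a2}>
    qr(A)$rank == qr(cbind(A, w))$rank    # FALSE: w not in <{a1, a2}>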
Definition 5.10. Let V be a vector space over the field F, let I be an index set and let (U_i)_{i∈I} be a family of subspaces of V. For I = ∅, define

Σ_{i∈I} U_i := {0}.   (5.14a)

If #I = n ∈ N, then let φ : {1, ..., n} −→ I be a bijection and define

Σ_{i∈I} U_i := Σ_{k=1}^n U_{φ(k)} := { Σ_{k=1}^n u_k : u_k ∈ U_{φ(k)} for each k ∈ {1, ..., n} },   (5.14b)

where the commutativity of addition guarantees the definition is independent of the chosen bijection (note that this definition is consistent with the definition in Ex. 4.9(d)). For a general, not necessarily finite I, let J := {J ⊆ I : J is finite} and define

Σ_{i∈I} U_i := ∪_{J∈J} Σ_{j∈J} U_j.   (5.14c)

In each case, we call Σ_{i∈I} U_i the sum of the subspaces U_i, i ∈ I.

Proposition 5.11. Let V be a vector space over the field F, let I be an index set and let (U_i)_{i∈I} be a family of subspaces of V. Then

Σ_{i∈I} U_i = ⟨ ∪_{i∈I} U_i ⟩   (5.15)

(in particular, Σ_{i∈I} U_i is always a subspace of V).

Proof. Let A := Σ_{i∈I} U_i, B := ⟨∪_{i∈I} U_i⟩. If I = ∅, then both sides of (5.15) are {0}, i.e. the statement holds in this case. Now assume I ≠ ∅. If v ∈ A, then there are i_1, ..., i_n ∈ I, n ∈ N, such that v = Σ_{k=1}^n u_{i_k}, u_{i_k} ∈ U_{i_k}, showing v ∈ B, since B is a subspace of V. Thus, A ⊆ B. To prove B ⊆ A, it suffices to show A is a subspace of V that contains each U_i, i ∈ I. If i ∈ I and v ∈ U_i, then v ∈ A is clear from (5.14c) (set J := {i}). Let v, w ∈ A, where v is as above and there exist j_1, ..., j_m ∈ I, m ∈ N, such that w = Σ_{k=1}^m w_{j_k}, w_{j_k} ∈ U_{j_k}. Then v + w = Σ_{k=1}^n u_{i_k} + Σ_{k=1}^m w_{j_k} ∈ A and, for each λ ∈ F, λv = Σ_{k=1}^n (λu_{i_k}) ∈ A, proving A to be a subspace of V. Thus, B ⊆ A, thereby concluding the proof.

5.2 Linear Independence, Basis, Dimension

The notions introduced in the present section are of central importance to the theory of vector spaces. In particular, we will see that the structure of many interesting vector spaces (e.g. K^n) is particularly simple, as every vector can be written as a linear combination of a fixed finite set of linearly independent vectors (we will see this is the case if, and only if, the vector space is finite-dimensional, cf. Th. 5.23 below).

Definition 5.12. Let V be a vector space over the field F.

(a) A vector v ∈ V is called linearly dependent on a subset U of V (or on the vectors in U) if, and only if, v = 0 or there exist n ∈ N and u_1, ..., u_n ∈ U such that v is a linear combination of u_1, ..., u_n. Otherwise, v is called linearly independent of U.

(b) A subset U of V is called linearly independent if, and only if, whenever 0 ∈ V is written as a linear combination of distinct elements of U, then all coefficients must be 0 ∈ F, i.e. if, and only if,

( n ∈ N ∧ W ⊆ U ∧ #W = n ∧ Σ_{u∈W} λ_u u = 0 ∧ ∀ u∈W λ_u ∈ F ) ⇒ ∀ u∈W λ_u = 0.   (5.16a)

Occasionally, one also wants to have the notion available for families of vectors instead of sets, and one calls a family (u_i)_{i∈I} of vectors in V linearly independent if, and only if,

( n ∈ N ∧ J ⊆ I ∧ #J = n ∧ Σ_{j∈J} λ_j u_j = 0 ∧ ∀ j∈J λ_j ∈ F ) ⇒ ∀ j∈J λ_j = 0.   (5.16b)

Sets and families that are not linearly independent are called linearly dependent.

Example 5.13. Let V be a vector space over the field F.

(a) ∅ is linearly independent: Indeed, if U = ∅, then the left side of the implication in (5.16a) is always false (since W ⊆ U means #W = 0), i.e. the implication is true. Moreover, by Def. 5.12(a), v ∈ V is linearly dependent on ∅ if, and only if, v = 0 (this is also consistent with Σ_{u∈∅} λ_u u = 0, cf. (3.15d)).

(b) If 0 ∈ U ⊆ V, then U is linearly dependent (in particular, {0} is linearly dependent), due to 1 · 0 = 0.

(c) If 0 ≠ v ∈ V and λ ∈ F with λv = 0, then λ = 0 by Prop. 5.3(d), showing {v} to be linearly independent. However, the family (v, v) is always linearly dependent, since 1v + (−1)v = 0 (also for v = 0). Moreover, 1v = v also shows that every v ∈ V is linearly dependent on itself.

(d) The following example is of some general importance: Let I be a set and V = F^I (cf. Ex. 5.2(d),(e)). Define

∀ i∈I   e_i : I −→ F,   e_i(j) := δ_ij := 1 if i = j, 0 if i ≠ j.   (5.17)

Then δ is known as the Kronecker delta (which can be seen as a function δ : I × I −→ {0, 1}). If n ∈ N and V = F^n, i.e. I = {1, ..., n}, then one often writes the e_i in the form e_1 = (1, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_{n−1} = (0, ..., 0, 1, 0), e_n = (0, ..., 0, 1); for example, in R^3 (or, more generally, in F^3), e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1). Returning to the case of a general I, we check that the set

B := {e_i : i ∈ I}   (5.18)

is linearly independent: Assume

Σ_{j∈J} λ_j e_j = 0 ∈ V,

where J ⊆ I, #J = n ∈ N, and λ_j ∈ F for each j ∈ J. Recalling that 0 ∈ V is the function that is constantly equal to 0 ∈ F, we obtain

∀ k∈J   0 = ( Σ_{j∈J} λ_j e_j )(k) = Σ_{j∈J} λ_j e_j(k) = Σ_{j∈J} λ_j δ_jk = λ_k,

thereby proving the linear independence of B.
(e) For a Calculus application, we let

∀ k∈N_0   f_k : R −→ K,   f_k(x) := e^{kx},

and show that U := {f_k : k ∈ N_0} is linearly independent as a subset of the vector space over K of differentiable functions from R into K (cf. Ex. 5.6(d)(iii)), by showing each set U_n := {f_k : k ∈ {0, ..., n}}, n ∈ N_0, is linearly independent, using an induction on n: Since, for n = 0, f_0 ≡ 1, the base case holds. Now let n ≥ 1 and assume λ_0, ..., λ_n ∈ K are such that

∀ x∈R   Σ_{k=0}^n λ_k e^{kx} = 0.   (5.19)

Multiplying (5.19) by n yields

∀ x∈R   Σ_{k=0}^n n λ_k e^{kx} = 0,   (5.20)

whereas differentiating (5.19) with respect to x yields

∀ x∈R   Σ_{k=0}^n k λ_k e^{kx} = 0.   (5.21)

By subtracting (5.21) from (5.20), we then obtain

∀ x∈R   Σ_{k=0}^{n−1} (n − k) λ_k e^{kx} = 0,

implying λ_0 = ... = λ_{n−1} = 0 due to the induction hypothesis. Using this in (5.19) then yields

∀ x∈R   λ_n e^{nx} = 0

and, thus, λ_n = 0 as well, completing the induction proof.

Proposition 5.14. Let V be a vector space over the field F and U ⊆ V.

(a) U is linearly dependent if, and only if, there exists u_0 ∈ U such that u_0 is linearly dependent on U \ {u_0}.

(b) If U is linearly dependent and U ⊆ M ⊆ V, then M is linearly dependent as well.

(c) If U is linearly independent and M ⊆ U, then M is linearly independent as well.

(d) Let U be linearly independent. If U_1, U_2 ⊆ U and V_1 := ⟨U_1⟩, V_2 := ⟨U_2⟩, then V_1 ∩ V_2 = ⟨U_1 ∩ U_2⟩.

Proof. (a): Suppose U is linearly dependent. Then there exists W ⊆ U, #W = n ∈ N, such that Σ_{u∈W} λ_u u = 0 with λ_u ∈ F, and there exists u_0 ∈ W with λ_{u_0} ≠ 0. Then

u_0 = −λ_{u_0}^{−1} Σ_{u∈W\{u_0}} λ_u u = Σ_{u∈W\{u_0}} (−λ_{u_0}^{−1} λ_u) u,

showing u_0 to be linearly dependent on U \ {u_0}. Conversely, if u_0 ∈ U is linearly dependent on U \ {u_0}, then u_0 = 0, in which case U is linearly dependent according to Ex. 5.13(b), or u_0 ≠ 0, in which case there exist n ∈ N, distinct u_1, ..., u_n ∈ U \ {u_0}, and λ_1, ..., λ_n ∈ F such that

u_0 = Σ_{i=1}^n λ_i u_i   ⇒   −u_0 + Σ_{i=1}^n λ_i u_i = 0,

showing U to be linearly dependent, since the coefficient of u_0 is −1 ≠ 0.

(b) and (c) are now both immediate from (a).

(d): Since V_1 ∩ V_2 is a vector space with U_1 ∩ U_2 ⊆ V_1 ∩ V_2, we already have ⟨U_1 ∩ U_2⟩ ⊆ V_1 ∩ V_2. Now let u ∈ V_1 ∩ V_2. Then there exist distinct z_1, ..., z_k ∈ U_1 ∩ U_2, distinct x_{k+1}, ..., x_{k+n} ∈ U_1 \ U_2, and distinct y_{k+1}, ..., y_{k+m} ∈ U_2 \ U_1 with k ∈ N_0, m, n ∈ N, as well as λ_1, ..., λ_{k+n}, µ_1, ..., µ_{k+m} ∈ F such that

u = Σ_{i=1}^k λ_i z_i + Σ_{i=k+1}^{k+n} λ_i x_i = Σ_{i=1}^k µ_i z_i + Σ_{i=k+1}^{k+m} µ_i y_i,

implying

0 = Σ_{i=1}^k (λ_i − µ_i) z_i + Σ_{i=k+1}^{k+n} λ_i x_i − Σ_{i=k+1}^{k+m} µ_i y_i.

The linear independence of U then yields λ_1 = µ_1, ..., λ_k = µ_k as well as λ_{k+1} = ... = λ_{k+n} = µ_{k+1} = ... = µ_{k+m} = 0. Thus, in particular, u ∈ ⟨U_1 ∩ U_2⟩, proving V_1 ∩ V_2 ⊆ ⟨U_1 ∩ U_2⟩ as desired.

Definition 5.15. Let V be a vector space over the field F and B ⊆ V. Then B is called a basis of V if, and only if, B is a generating set for V (i.e. V = ⟨B⟩) that is also linearly independent.
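For finitely many vectors in R^m, the independence requirement of Def. 5.15 can be tested numerically: the vectors (as matrix columns) are linearly independent if, and only if, the matrix has full column rank. A sketch in R (the function name is_indep is ours):

    is_indep <- function(M) qr(M)$rank == ncol(M)
    is_indep(cbind(c(1, 0, 0), c(1, 1, 0)))   # TRUE
    is_indep(cbind(c(1, 2, 3), c(2, 4, 6)))   # FALSE: second column = 2 * first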
Example 5.16. (a) Due to Ex. 5.13(a), in each vector space V over a field F, we have that ∅ is linearly independent with ⟨∅⟩ = {0}, i.e. ∅ is a basis of the trivial space {0}.

(b) If one considers a field F as a vector space over itself, then, clearly, every {λ} with 0 ≠ λ ∈ F is a basis.

(c) We continue Ex. 5.13(d), where we showed that, given a field F and a set I, B = {e_i : i ∈ I} with the e_i defined by (5.17) constitutes a linearly independent subset of the vector space F^I over F. We will now show that ⟨B⟩ = F^I_fin (i.e. B is a basis of F^I_fin), where F^I_fin denotes the set of functions f : I −→ F such that there exists a finite set I_f ⊆ I satisfying

f(i) = 0 for each i ∈ I \ I_f,   (5.22a)
f(i) ≠ 0 for each i ∈ I_f   (5.22b)

(then F^I_fin = F^I if, and only if, I is finite (for example, F^{{1,...,n}}_fin = F^n for n ∈ N); but, in general, F^I_fin is a strict subset of F^I). We first show that F^I_fin is always a subspace of F^I: Indeed, if f, g ∈ F^I_fin and λ ∈ F, then I_{λf} = I_f for λ ≠ 0, I_{λf} = ∅ for λ = 0, and I_{f+g} ⊆ I_f ∪ I_g, i.e. λf ∈ F^I_fin and f + g ∈ F^I_fin. We also note B ⊆ F^I_fin, since I_{e_i} = {i} for each i ∈ I. To see ⟨B⟩ = F^I_fin, we merely note that

∀ f∈F^I_fin   f = Σ_{i∈I_f} f(i) e_i.   (5.23)

Thus, B is a basis of F^I_fin as claimed – B is often referred to as the standard basis of F^I_fin. In particular, we have shown that, for each n ∈ N, the set {e_j : j = 1, ..., n}, where e_1 = (1, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, ..., 0, 1), forms a basis of F^n (of R^n if F = R and of C^n if F = C). If I = N, then F^I_fin is the space of all sequences in F that have only finitely many nonzero entries. We will see later that, actually, every vector space is isomorphic to some suitable F^I_fin (cf. Rem. 5.25 and Th. 6.9).

(d) In Ex. 5.6(d)(i), we noted that the set P(K) of polynomial functions with coefficients in K is a vector space over K. We show the set B := {e_j : j ∈ N_0} of monomial functions e_j : K −→ K, e_j(x) := x^j, to be a basis of P(K): While ⟨B⟩ = P(K) is immediate from the definition of P(K), linear independence of B can be shown similarly to the linear independence of the exponential functions in Ex. 5.13(e): As in that example, we use the differentiability of the considered functions (here, the e_j) to show each set U_n := {e_j : j ∈ {0, ..., n}}, n ∈ N_0, is linearly independent, using an induction on n: Since, for n = 0, e_0 ≡ 1, the base case holds. Now let n ≥ 1 and assume λ_0, ..., λ_n ∈ K are such that

∀ x∈R   Σ_{j=0}^n λ_j x^j = 0.   (5.24)

Differentiating (5.24) with respect to x yields

∀ x∈R   Σ_{j=1}^n j λ_j x^{j−1} = 0,

implying λ_1 = ... = λ_n = 0 due to the induction hypothesis. Using this in (5.24) then yields λ_0 = λ_0 x^0 = 0, completing the induction proof. We will see later that P(K) is, actually, isomorphic to K^{N_0}_fin.
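In R, the standard basis vectors of R^n from Ex. 5.16(c) are simply the columns of the identity matrix, and (5.23) becomes a matrix-vector product; a small sketch:

    n <- 4
    E <- diag(n)        # column E[, i] is the standard basis vector e_i
    x <- c(2, -1, 0, 5)
    drop(E %*% x)       # sum_i x_i e_i recovers x, cf. (5.23)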
Theorem 5.17. Let V be a vector space over the field F and B ⊆ V. Then the following statements (i) – (iii) are equivalent:

(i) B is a basis of V.

(ii) B is a maximal linearly independent subset of V, i.e. B is linearly independent and each set A ⊆ V with B ⊊ A is linearly dependent.

(iii) B is a minimal generating set for V, i.e. ⟨B⟩ = V and ⟨A⟩ ⊊ V for each A ⊊ B.

Proof. "(i)⇒(ii)": Let B ⊆ A ⊆ V and a ∈ A \ B. Since ⟨B⟩ = V, there exist λ_1, ..., λ_n ∈ F, b_1, ..., b_n ∈ B, n ∈ N, such that

a = Σ_{i=1}^n λ_i b_i   ⇒   (−1)a + Σ_{i=1}^n λ_i b_i = 0,

showing A to be linearly dependent.

"(ii)⇒(i)": Suppose B to be a maximal linearly independent subset of V. We need to show ⟨B⟩ = V. Since B ⊆ ⟨B⟩, let v ∈ V \ B. Then A := {v} ∪ B is linearly dependent, i.e.

∃ W⊆A ( #W = n ∈ N ∧ Σ_{u∈W} λ_u u = 0 ∧ ∀ u∈W λ_u ∈ F ∧ ∃ u∈W λ_u ≠ 0 ),   (5.25)

where the linear independence of B implies v ∈ W and λ_v ≠ 0. Thus,

v = − Σ_{u∈W\{v}} λ_v^{−1} λ_u u,

showing v ∈ ⟨B⟩, since W \ {v} ⊆ B.

"(i)⇒(iii)": Let A ⊊ B and b ∈ B \ A. Since B is linearly independent, Prop. 5.14(a) implies b not to be a linear combination of elements in A, showing ⟨A⟩ ⊊ V.

"(iii)⇒(i)": We need to show that a minimal generating set for V is linearly independent. Arguing by contraposition, we let B be a generating set for V that is linearly dependent and show B is not minimal: Since B is linearly dependent, (5.25) holds with A replaced by B and there exists b_0 ∈ W with λ_{b_0} ≠ 0, yielding

b_0 = − Σ_{u∈W\{b_0}} λ_{b_0}^{−1} λ_u u.   (5.26)

We show A := B \ {b_0} to be a generating set for V: If v ∈ V, since ⟨B⟩ = V, there exist λ_0, λ_1, ..., λ_n ∈ F and b_1, ..., b_n ∈ A, n ∈ N, such that, using (5.26),

v = λ_0 b_0 + Σ_{i=1}^n λ_i b_i = − Σ_{u∈W\{b_0}} λ_0 λ_{b_0}^{−1} λ_u u + Σ_{i=1}^n λ_i b_i.

Since u ∈ W \ {b_0} ⊆ A and b_1, ..., b_n ∈ A, this proves A to be a generating set for V, i.e. B is not minimal.

Proposition 5.18. Let V be a vector space over the field F and U ⊆ V a finite subset, #U = n ∈ N, u_0 ∈ U, v_0 := Σ_{u∈U} λ_u u with λ_u ∈ F for each u ∈ U and λ_{u_0} ≠ 0, Ũ := {v_0} ∪ (U \ {u_0}).

(a) If ⟨U⟩ = V, then ⟨Ũ⟩ = V.

(b) If U is linearly independent, then so is Ũ.

(c) If U is a basis of V, then so is Ũ.

Proof. Under the hypothesis, we have

u_0 = λ_{u_0}^{−1} v_0 − λ_{u_0}^{−1} Σ_{u∈U\{u_0}} λ_u u.   (5.27)

(a): If v ∈ V, then there exist µ_u ∈ F, u ∈ U, such that, using (5.27),

v = Σ_{u∈U} µ_u u = µ_{u_0} u_0 + Σ_{u∈U\{u_0}} µ_u u = µ_{u_0} λ_{u_0}^{−1} v_0 + Σ_{u∈U\{u_0}} (µ_u − µ_{u_0} λ_{u_0}^{−1} λ_u) u,

showing ⟨Ũ⟩ = V.

(b): Suppose µ_0 ∈ F and µ_u ∈ F, u ∈ U \ {u_0}, are such that, using the definition of v_0,

0 = µ_0 v_0 + Σ_{u∈U\{u_0}} µ_u u = µ_0 λ_{u_0} u_0 + Σ_{u∈U\{u_0}} (µ_0 λ_u + µ_u) u.

Since U is linearly independent, we obtain µ_0 λ_{u_0} = 0 = µ_0 λ_u + µ_u for each u ∈ U \ {u_0}. Since λ_{u_0} ≠ 0, we have µ_0 = 0, then also implying µ_u = 0 for each u ∈ U \ {u_0}, showing Ũ to be linearly independent.

(c) is now immediate from combining (a) and (b).

Theorem 5.19 (Coordinates). Let V be a vector space over the field F and assume B ⊆ V is a basis of V. Then each vector v ∈ V has unique coordinates with respect to the basis B, i.e., for each v ∈ V, there exists a unique finite subset B_v of B and a unique map c : B_v −→ F \ {0} such that

v = Σ_{b∈B_v} c(b) b.   (5.28)

Note that, for v = 0, one has B_v = ∅, c is the empty map, and (5.28) becomes 0 = Σ_{b∈∅} c(b) b.

Proof. The existence of B_v and the map c follows from the fact that the basis B is a generating set, ⟨B⟩ = V. For the uniqueness proof, consider finite sets B_v, B̃_v ⊆ B and maps c : B_v −→ F \ {0}, c̃ : B̃_v −→ F \ {0} such that

v = Σ_{b∈B_v} c(b) b = Σ_{b∈B̃_v} c̃(b) b.   (5.29)

Extend both c and c̃ to A := B_v ∪ B̃_v by letting c(b) := 0 for b ∈ B̃_v \ B_v and c̃(b) := 0 for b ∈ B_v \ B̃_v. Then

0 = Σ_{b∈A} (c(b) − c̃(b)) b,   (5.30)

such that the linear independence of A implies c(b) = c̃(b) for each b ∈ A, which, in turn, implies B_v = B̃_v and c = c̃.
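Numerically, for a basis b_1, ..., b_n of R^n (columns of a matrix B), the coordinates of Th. 5.19 are the unique solution of a linear system; a sketch in R:

    B <- cbind(c(1, 1), c(1, -1))           # a basis of R^2 as columns
    v <- c(3, 1)
    lambda <- solve(B, v)                   # coordinates: c(2, 1)
    all.equal(drop(B %*% lambda), v)        # TRUE: v = 2*b1 + 1*b2, cf. (5.28)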
Remark 5.20. If the basis B of V has finitely many elements, then one often enumerates the elements, B = {b_1, ..., b_n}, n = #B ∈ N, and writes λ_i := c(b_i) for b_i ∈ B_v, λ_i := 0 for b_i ∉ B_v, such that (5.28) takes the perhaps more familiar form

v = Σ_{i=1}^n λ_i b_i.   (5.31)

— In Ex. 5.16(c), we have seen that the vector space F^n, n ∈ N, over F has the finite basis {e_1, ..., e_n} and, more generally, that, for each set I, B = {e_i : i ∈ I} is a basis for the vector space F^I_fin over F. Since, clearly, B and I have the same cardinality (via the bijective map i ↦ e_i), this shows that, for each set I, there exists a vector space whose basis has the same cardinality as I. In Th. 5.23 below, we will show one of the fundamental results of vector space theory, namely that every vector space has a basis and that two bases of the same vector space always have the same cardinality (which is referred to as the dimension of the vector space, cf. Def. 5.24). As with some previous results of this class, the proof of Th. 5.23 makes use of the axiom of choice (AC) of Appendix A.4, which postulates, for each nonempty set M, whose elements are all nonempty sets, the existence of a choice function, that means a function that assigns, to each M ∈ M, an element m ∈ M. Somewhat surprisingly, AC can not be proved (or disproved) from ZF, i.e. from the remaining standard axioms of set theory (see Appendix A.4 for a reference). Previously, when AC was used, choice functions were employed directly. However, in the proof of Th. 5.23, AC enters in the form of Zorn's lemma. While Zorn's lemma turns out to be equivalent to AC, the equivalence is not obvious (see Th. A.52(iii)). However, Zorn's lemma provides an important technique for conducting existence proofs that is encountered frequently throughout mathematics, and the proof of existence of a basis in a vector space is the standard place to first encounter and learn this technique (as it turns out, the existence of bases in vector spaces is, actually, also equivalent to AC – see the end of the proof of Th. A.52 for a reference). The following Def. 5.21 prepares the statement of Zorn's lemma:

Definition 5.21 (Same as Part of Appendix Def. A.50). Let X be a set and let ≤ be a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists no x ∈ X such that m < x (note that a maximal element does not have to be a max and that a maximal element is not necessarily unique).

(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by ≤.

Theorem 5.22 (Zorn's Lemma). Let X be a nonempty partially ordered set. If every chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in X (such chains with upper bounds are sometimes called inductive), then X contains a maximal element (cf. Def. 5.21(a)).

Proof. According to Th. A.52(iii), Zorn's lemma is equivalent to the axiom of choice.

Theorem 5.23 (Bases). Let V be a vector space over the field F.

(a) If U ⊆ V is linearly independent, then there exists a basis of V that contains U.

(b) V has a basis B ⊆ V.

(c) Bases of V have a unique cardinality, i.e. if B ⊆ V and B̃ ⊆ V are both bases of V, then there exists a bijective map φ : B −→ B̃. In particular, if #B = n ∈ N_0, then #B̃ = n.

(d) If B is a basis of V and U ⊆ V is linearly independent, then there exists C ⊆ B such that B̃ := U ∪̇ C is a basis of V. In particular, if #B = n ∈ N, B = {b_1, ..., b_n}, then #U = m ∈ N with m ≤ n, and, if U = {u_1, ..., u_m}, then there exist distinct u_{m+1}, ..., u_n ∈ B such that B̃ = {u_1, ..., u_n} is a basis of V.
Proof. If V is generated by finitely many vectors, then the proof does not need Zorn's lemma and, thus, we treat this case first: If V = {0}, then (a) – (d) clearly hold. Thus, we also assume V ≠ {0}. To prove (d), assume B = {b_1, ..., b_n} is a basis of V with #B = n ∈ N. We first show, by induction on m ∈ {1, ..., n}, that, if U is linearly independent, #U = m, U = {u_1, ..., u_m}, then there exist distinct u_{m+1}, ..., u_n ∈ B such that B̃ = {u_1, ..., u_n} is a basis of V: If m = 1, we write u_1 = Σ_{i=1}^n λ_i b_i with λ_i ∈ F. Since u_1 ≠ 0, there exists i_0 ∈ {1, ..., n} such that λ_{i_0} ≠ 0. Then B̃ := {u_1} ∪ (B \ {b_{i_0}}) is a basis of V by Prop. 5.18(c), thereby establishing the base case. If 1 < m ≤ n, then, by induction hypothesis, there exist distinct c_m, c_{m+1}, ..., c_n ∈ B such that {u_1, ..., u_{m−1}} ∪ {c_m, c_{m+1}, ..., c_n} forms a basis of V. Then there exist µ_i ∈ F such that

u_m = Σ_{i=1}^{m−1} µ_i u_i + Σ_{i=m}^n µ_i c_i.

There then must exist i_0 ∈ {m, ..., n} such that µ_{i_0} ≠ 0, since, otherwise, U were linearly dependent. Thus, again by Prop. 5.18(c), B̃ := U ∪ ({c_m, c_{m+1}, ..., c_n} \ {c_{i_0}}) forms a basis of V, completing the induction. Now suppose m > n. Then the above shows {u_1, ..., u_n} to form a basis of V, in contradiction to the linear independence of U. This then also implies that U can not be infinite. Thus, we always have #U = m ≤ n and #(B \ {u_{m+1}, ..., u_n}) = m, completing the proof of (d). Now if U is not merely linearly independent, but itself a basis of V, then we can switch the roles of B and U, obtaining m = n, proving (c). Since we assume V to be generated by finitely many vectors, there exists C ⊆ V with ⟨C⟩ = V and #C = k ∈ N. Then C is either a basis or it is not minimal, in which case there exists c ∈ C such that C̃ := C \ {c} still generates V, #C̃ = k − 1. Proceeding inductively, we can remove vectors until we obtain a basis of V, proving (b). Combining (b) with (d) now proves (a).

General Case: Here, we prove (a) first, making use of Zorn's lemma: Given a linearly independent set U ⊆ V, define

M := {M ⊆ V : U ⊆ M and M is linearly independent}   (5.32)

and note that set inclusion ⊆ constitutes a partial order on M. Moreover, U ∈ M, i.e. M ≠ ∅. Let C ⊆ M be a chain, i.e. assume C ≠ ∅ to be totally ordered by ⊆. Define

C_0 := ∪_{C∈C} C.   (5.33)

It is then immediate that C ∈ C implies C ⊆ C_0, i.e. C_0 is an upper bound for C. Moreover, C_0 ∈ M: Indeed, if C ∈ C, then U ⊆ C ⊆ C_0. To verify C_0 is linearly independent, let c_1, ..., c_n ∈ C_0 be distinct and λ_1, ..., λ_n ∈ F, n ∈ N, with

0 = Σ_{i=1}^n λ_i c_i.

Then, for each i ∈ {1, ..., n}, there exists C_i ∈ C with c_i ∈ C_i. Since

D := {C_1, ..., C_n} ⊆ C

is totally ordered by ⊆, by Th. 3.25(a), there exists i_0 ∈ {1, ..., n} such that C_{i_0} = max D, i.e. C_i ⊆ C_{i_0} for each i ∈ {1, ..., n}, implying c_i ∈ C_{i_0} for each i ∈ {1, ..., n} as well. As C_{i_0} is linearly independent, λ_1 = ... = λ_n = 0, showing C_0 is linearly independent, as desired. Having, thus, verified all hypotheses of Zorn's lemma, Th. 5.22 provides a maximal element B ∈ M.
By Th. 5.17(ii), B is a basis of V. As U ⊆ B also holds, this proves (a) and, in particular, (b).

To prove (c), let B_1, B_2 be bases of V. As we have shown above that (c) holds, provided B_1 or B_2 is finite, we now assume B_1 and B_2 both to be infinite. We show the existence of an injective map φ_1 : B_2 −→ B_1: To this end, for each v ∈ B_1, let B_{2,v} be the finite subset of B_2 given by Th. 5.19, consisting of all b ∈ B_2 such that the coordinate of v with respect to b is nonzero. Let ψ_v : B_{2,v} −→ {1, ..., #B_{2,v}} be bijective. Also define

E := ∪_{v∈B_1} B_{2,v}.

Then, since V = ⟨B_1⟩, we also have V = ⟨E⟩. Thus, as E ⊆ B_2 and the basis B_2 is a minimal generating set, we obtain E = B_2. In particular, for each b ∈ B_2, we may choose some v(b) ∈ B_1 such that b ∈ B_{2,v(b)}. The map

f : B_2 −→ B_1 × N,   f(b) := (v(b), ψ_{v(b)}(b)),

is then, clearly, well-defined and injective. Moreover, Th. A.57 of the Appendix provides us with a bijective map φ_{B_1} : B_1 × N −→ B_1. In consequence, φ_1 : B_2 −→ B_1, φ_1 := φ_{B_1} ∘ f, is injective as well. Since B_1, B_2 were arbitrary bases, we can interchange the roles of B_1 and B_2 to also obtain an injective map φ_2 : B_1 −→ B_2. According to the Schröder-Bernstein Th. 3.12, there then also exists a bijective map φ : B_1 −→ B_2, completing the proof of (c).

To prove (d), let B be a basis of V and let U ⊆ V be linearly independent. Analogously to the proof of (a), apply Zorn's lemma, but this time with the set

M := {M ⊆ V : M = U ∪̇ C, C ⊆ B, M is linearly independent},   (5.34)

such that the maximal element B̃ ∈ M, obtained from Th. 5.22, has the form B̃ = U ∪̇ C with C ⊆ B. We claim B̃ to be a basis of V: Linear independence is clear, since B̃ ∈ M, and it only remains to check ⟨B̃⟩ = V. Due to the maximality of B̃, if b ∈ B, there exist a finite set B̃_b ⊆ B̃ and λ_{bu} ∈ F, u ∈ B̃_b, such that

b = Σ_{u∈B̃_b} λ_{bu} u.   (5.35)

Now let v ∈ V be arbitrary. As B is a basis of V, there exist b_1, ..., b_n ∈ B and λ_1, ..., λ_n ∈ F, n ∈ N, such that, using (5.35),

v = Σ_{i=1}^n λ_i b_i = Σ_{i=1}^n Σ_{u∈B̃_{b_i}} λ_i λ_{b_i u} u,

proving ⟨B̃⟩ = V as desired.

Definition 5.24. According to Th. 5.23, for each vector space V over a field F, the cardinality of its bases is unique. This unique cardinality is called the dimension of V and is denoted dim V. If dim V < ∞ (i.e. dim V ∈ N_0), then V is called finite-dimensional, otherwise infinite-dimensional.

Remark 5.25. Let F be a field. In Ex. 5.16(c), we saw that B = {e_i : i ∈ I} with

∀ i∈I   e_i : I −→ F,   e_i(j) = δ_ij := 1 if i = j, 0 if i ≠ j,

is a basis of F^I_fin. Since #B = #I, we have shown

dim F^I_fin = #I,   (5.36)

and, in particular, for I = {1, ..., n}, n ∈ N,

dim F^n = dim R^n = dim C^n = n.   (5.37)

We will see in Th. 6.9 below that, in a certain sense, F^I_fin is the only vector space of dimension #I over F. In particular, for n ∈ N, one can think of F^n as the standard model of an n-dimensional vector space over F.
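The finitely generated part of the proof of Th. 5.23(b) (drop redundant generators until a minimal generating set remains) is easy to mimic numerically: the following R sketch keeps a column of a spanning matrix only if it enlarges the span, measured by the rank. The function name extract_basis is ours and merely illustrative.

    extract_basis <- function(C) {
      keep <- integer(0)
      for (j in seq_len(ncol(C))) {
        if (qr(C[, c(keep, j), drop = FALSE])$rank > length(keep))
          keep <- c(keep, j)      # column j is not in the span of those kept
      }
      C[, keep, drop = FALSE]     # a basis of the column span
    }
    C <- cbind(c(1, 0), c(2, 0), c(0, 1))   # spans R^2, first two dependent
    extract_basis(C)                        # keeps columns (1,0) and (0,1)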
Remark 5.26. Bases of vector spaces are especially useful if they are much smaller than the vector space itself, as in the case of finite-dimensional vector spaces over infinite fields, R^n and C^n being among the most important examples. For infinite-dimensional vector spaces V, one often has dim V = #V (cf. Th. E.1 and Cor. E.2 in the Appendix), such that vector space bases are not as helpful in such situations. On the other hand, infinite-dimensional vector spaces might come with some additional structure, providing a more useful notion of basis (e.g., in infinite-dimensional Hilbert spaces, one typically uses so-called orthonormal bases rather than vector space bases). — In the rest of the section, we investigate relations between bases and subspaces, and we introduce the related notions of direct complement and direct sum.

Theorem 5.27. Let V be a vector space over the field F and let U be a subspace of V.

(a) If B_U is a basis of U, then there exists a basis B of V with B_U ⊆ B.

(b) dim U ≤ dim V, i.e. if B_U is a basis of U and B is a basis of V, then there exists an injective map φ : B_U −→ B.

(c) There exists a subspace U′ of V such that V = U + U′ and U ∩ U′ = {0}.

(d) If dim V = n ∈ N_0, then

dim U = dim V ⇔ U = V.

Proof. (a): Let B_U be a basis of U. Since B_U is a linearly independent subset of V, we can employ Th. 5.23(a) to obtain C ⊆ V such that B := B_U ∪̇ C is a basis of V.

(b): Let B_U and B be bases of U and V, respectively. As in the proof of (a), choose C ⊆ V such that B̃ := B_U ∪̇ C is a basis of V. According to Th. 5.23(c), there exists a bijective map φ_V : B̃ −→ B. Then the map φ := φ_V ↾_{B_U} is still injective.

(c): Once again, let B_U be a basis of U and choose C ⊆ V such that B := B_U ∪̇ C is a basis of V. Define U′ := ⟨C⟩. Since B = B_U ∪̇ C is a basis of V, it is immediate that V = U + U′. Now suppose u ∈ U ∩ U′. Then there exist λ_1, ..., λ_m, µ_1, ..., µ_n ∈ F with m, n ∈ N, as well as distinct b_1, ..., b_m ∈ B_U and distinct c_1, ..., c_n ∈ C such that

u = Σ_{i=1}^m λ_i b_i = Σ_{i=1}^n µ_i c_i   ⇒   0 = Σ_{i=1}^m λ_i b_i − Σ_{i=1}^n µ_i c_i.

Since {b_1, ..., b_m, c_1, ..., c_n} ⊆ B is linearly independent, this implies 0 = λ_1 = ... = λ_m = µ_1 = ... = µ_n and u = 0.

(d): Let dim U = dim V = n ∈ N_0. If B_U is a basis of U, then #B_U = n and, by (a), B_U must also be a basis of V, implying U = ⟨B_U⟩ = V.

Definition 5.28. Let V be a vector space over the field F and let U, W be subspaces of V. Then W is called a direct complement of U if, and only if, V = U + W and U ∩ W = {0}. In that case, we also write V = U ⊕ W and we say that V is the direct sum of U and W. — Caveat: While set-theoretic complements A \ B are uniquely determined by A and B, nontrivial direct complements of vector subspaces are never unique, as shown in Prop. 5.29 below.
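Before the formal statement, a quick numeric illustration of this non-uniqueness in R (we use that two spanning vectors of combined rank 2 give U + W = R^2 and, by the dimension formula of Th. 5.30(c) below, U ∩ W = {0}):

    u <- c(1, 0); w <- c(0, 1); w_tilde <- c(1, 1)
    qr(cbind(u, w))$rank         # 2: span{(0,1)} is a direct complement of span{(1,0)}
    qr(cbind(u, w_tilde))$rank   # 2: so is span{(1,1)}, a different one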
Proposition 5.29. Let V be a vector space over the field F and let U, W be subspaces of V such that V = U ⊕ W. If there exist 0 ≠ u ∈ U and 0 ≠ w ∈ W, then there exists a subspace W̃ ≠ W of V with V = U ⊕ W̃.

Proof. According to Th. 5.23(d), there exist bases B_U of U and B_W of W with u ∈ B_U and w ∈ B_W. Then v := u + w ∉ W, since, otherwise, u = (u + w) − w ∈ W, in contradiction to U ∩ W = {0}. Let B̃ := {v} ∪ (B_W \ {w}) and W̃ := ⟨B̃⟩. Suppose x ∈ U ∩ W̃. Then there exist distinct w_1, ..., w_n ∈ B_W \ {w} and λ_0, λ_1, ..., λ_n ∈ F, n ∈ N, such that

x = λ_0(u + w) + Σ_{i=1}^n λ_i w_i.

Since x ∈ U, we also obtain x − λ_0 u ∈ U. Also

x − λ_0 u = λ_0 w + Σ_{i=1}^n λ_i w_i ∈ W,

implying

0 = x − λ_0 u = λ_0 w + Σ_{i=1}^n λ_i w_i.

However, w, w_1, ..., w_n ∈ B_W are linearly independent, yielding 0 = λ_0 = ... = λ_n and, thus, x = 0, showing U ∩ W̃ = {0}. To obtain V = U ⊕ W̃, it remains to show w ∈ U + W̃ (since V = U ⊕ W, U ⊆ U + W̃, and B_W \ {w} ⊆ W̃ ⊆ U + W̃). Since w = −u + (u + w) ∈ U + W̃, the proof is complete.

Theorem 5.30. Let V be a vector space over the field F and let U, W be subspaces of V. Moreover, let B_U, B_W, B_+, B_∩ be bases of U, W, U + W, U ∩ W, respectively.

(a) U + W = ⟨B_U ∪ B_W⟩.

(b) If U ∩ W = {0}, then B_U ∪̇ B_W is a basis of U ⊕ W.

(c) It holds that

dim(U + W) + dim(U ∩ W) = dim U + dim W,   (5.38)

by which we mean that there exists a bijective map

φ : (B_U × {0}) ∪̇ (B_W × {1}) −→ (B_+ × {0}) ∪̇ (B_∩ × {1}).

(d) Suppose B := B_U ∪̇ B_W is a basis of V. Then V = U ⊕ W.

Proof. (a): Let B := B_U ∪ B_W. If v = u + w with u ∈ U, w ∈ W, then there exist distinct u_1, ..., u_n ∈ B_U, distinct w_1, ..., w_m ∈ B_W (m, n ∈ N), and λ_1, ..., λ_n, µ_1, ..., µ_m ∈ F such that

v = u + w = Σ_{i=1}^n λ_i u_i + Σ_{i=1}^m µ_i w_i,   (5.39)

showing U + W = ⟨B⟩.

(b): Let B := B_U ∪ B_W. We know U ⊕ W = ⟨B⟩ from (a). Now consider (5.39) with u + w = 0. Then u = −w ∈ U ∩ W = {0}, i.e. u = 0. Then w = 0 as well and 0 = λ_1 = ... = λ_n = µ_1 = ... = µ_m, due to the linear independence of the u_1, ..., u_n and the linear independence of the w_1, ..., w_m. In consequence, B is linearly independent as well and, thus, a basis of U ⊕ W.

(c): According to Th. 5.27(a), we may choose B_U and B_W such that B_∩ ⊆ B_U and B_∩ ⊆ B_W. Let B_0 := B_U \ B_∩, U_0 := ⟨B_0⟩, B_1 := B_W \ B_∩, U_1 := ⟨B_1⟩. Then B := B_U ∪ B_W = B_0 ∪̇ B_1 ∪̇ B_∩ is linearly independent: Consider, once again, the situation of the proof of (b), with v = 0 and, this time, with u_1, ..., u_n ∈ B_0. Again, we obtain u = −w ∈ W, i.e., using Prop. 5.14(d),

u ∈ U_0 ∩ (U ∩ W) = ⟨B_0⟩ ∩ ⟨B_∩⟩ = ⟨B_0 ∩ B_∩⟩ = {0}.

Thus, 0 = λ_1 = ... = λ_n, due to the linear independence of B_0. This then implies w = 0 and 0 = µ_1 = ... = µ_m due to the linear independence of B_W, yielding the claimed linear independence of B. According to (a), we also have U + W = ⟨B⟩, i.e. B is a basis of U + W. Now we are in a position to define

φ : (B_U × {0}) ∪̇ (B_W × {1}) −→ (B × {0}) ∪̇ (B_∩ × {1}),
φ(b, α) := (b, 0) for b ∈ B_U and α = 0;  (b, 0) for b ∈ B_1 and α = 1;  (b, 1) for b ∈ B_∩ and α = 1,

which is, clearly, bijective. Bearing in mind Th. 5.23(c), this proves (c).

(d): As B = B_U ∪̇ B_W is a basis of V, we can write each v ∈ V as v = u + w with u ∈ U and w ∈ W, showing V = U + W. On the other hand, using Prop. 5.14(d) and B_U ∩ B_W = ∅,

U ∩ W = ⟨B_U⟩ ∩ ⟨B_W⟩ = ⟨B_U ∩ B_W⟩ = {0},

proving V = U ⊕ W.
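A numeric spot-check of (5.38) in R (assuming the spanning columns of each matrix are linearly independent, so column counts are dimensions; dim(U + W) is the rank of the combined columns, and dim(U ∩ W) then follows from (5.38)):

    BU <- cbind(c(1, 0, 0), c(0, 1, 0))       # U = x-y plane in R^3
    BW <- cbind(c(0, 1, 0), c(0, 0, 1))       # W = y-z plane in R^3
    dim_sum <- qr(cbind(BU, BW))$rank         # dim(U + W) = 3
    dim_cap <- ncol(BU) + ncol(BW) - dim_sum  # dim(U intersect W) = 1, the y-axis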
6 Linear Maps

6.1 Basic Properties and Examples

In previous sections, we have studied structure-preserving maps, i.e. homomorphisms, in the context of magmas (in particular, groups) and rings (in particular, fields), cf. Def. 4.10 and Def. 4.34(b). In the context of vector spaces, the homomorphisms are so-called linear maps, which we proceed to define and study next.

Definition 6.1. Let V and W be vector spaces over the field F. A map A : V −→ W is called a vector space homomorphism or F-linear (or merely linear if the field F is understood) if, and only if,

∀ v_1,v_2∈V   A(v_1 + v_2) = A(v_1) + A(v_2),   (6.1a)
∀ λ∈F ∀ v∈V   A(λv) = λA(v),   (6.1b)

or, equivalently, if, and only if,

∀ λ,µ∈F ∀ v_1,v_2∈V   A(λv_1 + µv_2) = λA(v_1) + µA(v_2)   (6.2)

(note that, in general, vector addition and scalar multiplication will be different on the left-hand sides and right-hand sides of the above equations). Thus, A is linear if, and only if, it is a group homomorphism with respect to addition that also satisfies (6.1b). In particular, if A is linear, then ker A and Im A are defined by Def. 4.19, i.e.

ker A = A^{−1}{0} = {v ∈ V : A(v) = 0},   (6.3a)
Im A = A(V) = {A(v) : v ∈ V}.   (6.3b)

We denote the set of all F-linear maps from V into W by L(V, W). The notions vector space (or linear) monomorphism, epimorphism, isomorphism, endomorphism, automorphism are then defined as in Def. 4.10. Moreover, V and W are called isomorphic (denoted V ≅ W) if, and only if, there exists a vector space isomorphism A : V −→ W. — Before providing examples of linear maps in Ex. 6.7 below, we first provide and study some basic properties of such maps.

Notation 6.2. For the composition of linear maps A and B, we often write BA instead of B ∘ A. For the application of a linear map, we also often write Av instead of A(v).

Proposition 6.3. Let V, W, X be vector spaces over the field F.

(a) If A : V −→ W and B : W −→ X are linear, then so is BA.

(b) If A : V −→ W is a linear isomorphism, then A^{−1} is a linear isomorphism as well (i.e. A^{−1} is not only bijective, but also linear).

(c) Let A ∈ L(V, W). If U_W is a subspace of W and U_V is a subspace of V, then A^{−1}(U_W) is a subspace of V and A(U_V) is a subspace of W. The most important special cases are that ker A = A^{−1}({0}) is a subspace of V and Im A = A(V) is a subspace of W.

(d) A ∈ L(V, W) is injective if, and only if, ker A = {0}.

Proof. (a): We note BA to be a homomorphism with respect to addition by Prop. 4.11(a). Moreover, if v ∈ V and λ ∈ F, then

(BA)(λv) = B(A(λv)) = B(λ(Av)) = λ(B(Av)) = λ((BA)(v)),

proving the linearity of BA.

(b): A^{−1} is a homomorphism with respect to addition by Prop. 4.11(b). Moreover, if w ∈ W and λ ∈ F, then

A^{−1}(λw) = A^{−1}( λ A(A^{−1}w) ) = A^{−1}( A(λ(A^{−1}w)) ) = λ(A^{−1}w),

proving the linearity of A^{−1}.

(c): According to Th. 4.20(a),(d), A^{−1}(U_W) is a subgroup of V and A(U_V) is a subgroup of W. Since 0 ∈ U_W, 0 ∈ A^{−1}(U_W). Moreover, if λ ∈ F and v ∈ A^{−1}(U_W), then Av ∈ U_W and, thus, A(λv) = λ(Av) ∈ U_W, since U_W is a subspace of W. This shows λv ∈ A^{−1}(U_W) and that A^{−1}(U_W) is a subspace of V. If w ∈ Im A, then there exists v ∈ V with Av = w. Thus, if λ ∈ F, then A(λv) = λ(Av) = λw, showing λw ∈ Im A and that Im A is a subspace of W. In particular, A(U_V) = Im(A↾_{U_V}) is a subspace of W.

(d) is merely a restatement of Th. 4.20(e) for the current situation.

Proposition 6.4. Let V be a vector space over the field F and let W be a set with compositions + : W × W −→ W and · : F × W −→ W. If A : V −→ W is an epimorphism with respect to + that also satisfies (6.1b), then W (with + and ·) is a vector space over F as well.

Proof. Exercise.
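In the finite-dimensional numeric setting, matrices provide the prototypical linear maps (this is made precise in Ex. 6.7(a) below); a small R sketch, spot-checking (6.2) for one pair of inputs and testing the kernel criterion of Prop. 6.3(d) via the column rank:

    M <- rbind(c(1, 2, 0),
               c(-1, 3, 1))        # A : R^3 -> R^2, A(v) = M v
    A <- function(v) drop(M %*% v)
    v1 <- c(1, 0, 2); v2 <- c(-1, 4, 1)
    all.equal(A(3 * v1 - 2 * v2), 3 * A(v1) - 2 * A(v2))   # TRUE, cf. (6.2)
    qr(M)$rank == ncol(M)   # FALSE: ker A != {0}, so A is not injective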
Proposition 6.5. Let V and W be vector spaces over the field F, and let A : V −→ W be linear.

(a) A is injective if, and only if, for each linearly independent subset S of V, A(S) is a linearly independent subset of W.

(b) A is surjective if, and only if, for each generating subset S of V, A(S) is a generating subset of W.

(c) A is bijective if, and only if, for each basis B of V, A(B) is a basis of W.

Proof. Exercise.

Theorem 6.6. Let V and W be vector spaces over the field F. Then each linear map A : V −→ W is uniquely determined by its values on a basis of V. More precisely, if B is a basis of V, (w_b)_{b∈B} is a family in W, and, for each v ∈ V, B_v and c_v : B_v −→ F \ {0} are as in Th. 5.19 (we now write c_v instead of c to underline the dependence of c on v), then the map

A : V −→ W,   A(v) = A( Σ_{b∈B_v} c_v(b) b ) := Σ_{b∈B_v} c_v(b) w_b,   (6.4)

is linear, and Ã ∈ L(V, W) with

∀ b∈B   Ã(b) = w_b   (6.5)

implies A = Ã.

Proof. We first verify A is linear. Let v ∈ V and λ ∈ F. If λ = 0, then A(λv) = A(0) = 0 = λA(v). If λ ≠ 0, then B_{λv} = B_v, c_{λv} = λc_v, and

A(λv) = A( Σ_{b∈B_{λv}} c_{λv}(b) b ) = Σ_{b∈B_v} λ c_v(b) w_b = λ Σ_{b∈B_v} c_v(b) w_b = λA(v).   (6.6a)

Now let u, v ∈ V. If u = 0, then A(u + v) = A(v) = 0 + A(v) = A(u) + A(v), and analogously if v = 0. So assume u, v ≠ 0. If u + v = 0, then v = −u and A(u + v) = A(0) = 0 = A(u) + A(−u) = A(u) + A(v). If u + v ≠ 0, then B_{u+v} ⊆ B_u ∪ B_v and

A(u + v) = A( Σ_{b∈B_{u+v}} c_{u+v}(b) b ) = Σ_{b∈B_{u+v}} c_{u+v}(b) w_b = Σ_{b∈B_u} c_u(b) w_b + Σ_{b∈B_v} c_v(b) w_b = A(u) + A(v).   (6.6b)

If v ∈ V and B_v and c_v are as before, then the linearity of Ã and (6.5) imply

Ã(v) = Ã( Σ_{b∈B_v} c_v(b) b ) = Σ_{b∈B_v} c_v(b) Ã(b) = Σ_{b∈B_v} c_v(b) w_b = A(v).   (6.7)

Since (6.7) establishes Ã = A, the proof is complete.

Example 6.7. (a) Let V and W be finite-dimensional vector spaces over the field F, dim V = n, dim W = m, where m, n ∈ N_0. From Th. 6.6, we obtain a very useful characterization of the linear maps between V and W: If V = {0} or W = {0}, then A ≡ 0 is the only element of L(V, W). Now let b_1, ..., b_n ∈ V form a basis of V and let c_1, ..., c_m ∈ W form a basis of W. Given A ∈ L(V, W), define

∀ i∈{1,...,n}   w_i := A(b_i).

Then, by Th. 5.19, there exists a unique family (a_{ji})_{(j,i)∈{1,...,m}×{1,...,n}} in F such that

∀ i∈{1,...,n}   w_i = A(b_i) = Σ_{j=1}^m a_{ji} c_j.   (6.8)

Thus, given v = Σ_{i=1}^n λ_i b_i ∈ V, λ_1, ..., λ_n ∈ F, we find

A(v) = A( Σ_{i=1}^n λ_i b_i ) = Σ_{i=1}^n λ_i w_i = Σ_{i=1}^n Σ_{j=1}^m λ_i a_{ji} c_j = Σ_{j=1}^m ( Σ_{i=1}^n a_{ji} λ_i ) c_j = Σ_{j=1}^m µ_j c_j,

where

µ_j := Σ_{i=1}^n a_{ji} λ_i.   (6.9)

Thus, from Th. 6.6, we conclude that the map assigning to each vector v = Σ_{i=1}^n λ_i b_i ∈ V the vector Σ_{j=1}^m µ_j c_j with µ_j := Σ_{i=1}^n a_{ji} λ_i is precisely the unique linear map A : V −→ W satisfying (6.8).
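In R, (6.9) is literally a matrix-vector product of the coefficient family (a_{ji}) with the coordinate vector; a sketch (the bases remain implicit, only coordinates are computed):

    a <- rbind(c(1, 0, 2),
               c(3, -1, 0))        # a_ji: j = 1,2 (basis of W), i = 1,2,3 (basis of V)
    lambda <- c(1, 1, 1)           # coordinates of v w.r.t. b_1, b_2, b_3
    mu <- drop(a %*% lambda)       # coordinates of A(v) w.r.t. c_1, c_2, cf. (6.9)
    mu                             # c(3, 2)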
(b) Let V be a vector space over the field F, let B be a basis of V. Given v ∈ V, let B_v and c_v : B_v −→ F \ {0} be as in Th. 5.19. For each c ∈ B, define

π_c : V −→ F,   π_c(v) := c_v(c) for c ∈ B_v,  0 for c ∉ B_v,

calling π_c the projection onto the coordinate with respect to c. Comparing with (6.4), we see that π_c is precisely the linear map A : V −→ W := F determined by the family (w_b)_{b∈B} in F with

∀ b∈B   w_b := δ_bc := 1 for b = c,  0 for b ≠ c,   (6.10)

as this yields

∀ v∈V   A(v) = A( Σ_{b∈B_v} c_v(b) b ) = Σ_{b∈B_v} c_v(b) w_b = Σ_{b∈B_v} c_v(b) δ_bc = c_v(c) for c ∈ B_v,  0 for c ∉ B_v.

In particular, π_c is linear. Moreover, for each c ∈ B, we have

Im π_c = F,   ker π_c = ⟨B \ {c}⟩,   V = ⟨c⟩ ⊕ ker π_c.   (6.11)

(c) Let I be a nonempty set and let Y be a vector space over the field F. We then know from Ex. 5.2(d) that V := F(I, Y) = Y^I is a vector space over F. For each i ∈ I, define

π_i : V −→ Y,   π_i(f) := f(i),

calling π_i the projection onto the ith coordinate (note that, for I = {1, ..., n}, n ∈ N, Y = F, one has V = F^n, π_i(v_1, ..., v_n) = v_i). We verify π_i to be linear: Let λ ∈ F and f, g ∈ V. Then

π_i(λf) = (λf)(i) = λ(f(i)) = λπ_i(f),
π_i(f + g) = (f + g)(i) = f(i) + g(i) = π_i(f) + π_i(g),

proving π_i to be linear. Moreover, for each i ∈ I, we have

Im π_i = Y,   ker π_i = {(f : I −→ Y) : f(i) = 0} ≅ Y^{I\{i}},   V = Y ⊕ ker π_i,   (6.12)

where, for the last equality, we identified Y ≅ {(f : I −→ Y) : f(j) = 0 for each j ≠ i}. A generalization of the present example to general Cartesian products of vector spaces can be found in Ex. E.4 of the Appendix. To investigate the relation between the present projections and the projections of (b), let Y = F. We know from Ex. 5.16(c) that B = {e_i : i ∈ I} with

∀ i∈I   e_i : I −→ F,   e_i(j) = δ_ij := 1 if i = j, 0 if i ≠ j,

is a basis of F^I_fin, which is a subspace of F^I, such that the π_i are defined on F^I_fin. Now, if f = Σ_{j=1}^n λ_{i_j} e_{i_j} with λ_{i_1}, ..., λ_{i_n} ∈ F; i_1, ..., i_n ∈ I; n ∈ N, and π_{e_{i_k}} is defined as in (b) (k ∈ {1, ..., n}), then

π_{i_k}(f) = f(i_k) = Σ_{j=1}^n λ_{i_j} e_{i_j}(i_k) = Σ_{j=1}^n λ_{i_j} δ_jk = λ_{i_k} = π_{e_{i_k}}(f),   (6.13)

showing π_{i_k} ↾_{F^I_fin} = π_{e_{i_k}}.

(d) We consider the set of complex numbers C as a vector space over R with basis {1, i}. Then the map of complex conjugation

A : C −→ C,   A(z) = A(x + iy) = z̄ = x − iy,

is R-linear, since, for each z = x + iy, w = u + iv ∈ C,

conj(z + w) = x + u − i(y + v) = z̄ + w̄   ∧   conj(zw) = xu − yv − (xv + yu)i = (x − iy)(u − iv) = z̄ w̄,

and z̄ = z for z ∈ R. As A is bijective with A = A^{−1}, A is even an automorphism. However, complex conjugation is not C-linear (now considering C as a vector space over itself), since, e.g., conj(i · 1) = −i ≠ i · conj(1) = i.
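In coordinates with respect to the R-basis {1, i}, the conjugation of Ex. 6.7(d) acts as the matrix diag(1, −1); a quick R sketch:

    conj_matrix <- diag(c(1, -1))
    z <- c(3, 4)                  # the number 3 + 4i as coordinates (x, y)
    drop(conj_matrix %*% z)       # c(3, -4), i.e. 3 - 4i
    # C-linearity fails: conj(i * 1) = -i, whereas i * conj(1) = i.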
(e) By using results from Calculus, we obtain the following examples:

(i) Let V be the set of convergent sequences in K. According to [Phi16, Th. 7.13(a)], scalar multiples of convergent sequences are convergent and sums of convergent sequences are convergent, showing V to be a subspace of K^N, the vector space over K of all sequences in K. If (z_n)_{n∈N} and (w_n)_{n∈N} are elements of V and λ ∈ K, then, using [Phi16, Th. 7.13(a)] once again, we know

lim_{n→∞} (λz_n) = λ lim_{n→∞} z_n,   lim_{n→∞} (z_n + w_n) = lim_{n→∞} z_n + lim_{n→∞} w_n,

showing the map

A : V −→ K,   A((z_n)_{n∈N}) := lim_{n→∞} z_n,   (6.14)

to be linear. Moreover, we have

Im A = K,   ker A = {(z_n)_{n∈N} ∈ V : lim_{n→∞} z_n = 0},   V = ⟨(1)_{n∈N}⟩ ⊕ ker A,   (6.15)

where the last equality is due to the fact that, if (z_n)_{n∈N} ∈ V with lim_{n→∞} z_n = λ, then lim_{n→∞} (z_n − λ · 1) = 0, i.e. (z_n)_{n∈N} − λ(1)_{n∈N} ∈ ker A.

(ii) Let a, b ∈ R, a < b, I := ]a, b[, and let V := {(f : I −→ K) : f is differentiable}. According to [Phi16, Th. 9.7], scalar multiples of differentiable functions are differentiable and sums of differentiable functions are differentiable, showing V to be a subspace of K^I, the vector space over K of all functions from I into K. Using [Phi16, Th. 9.7] again, we know the map

A : V −→ K^I,   A(f) := f′,   (6.16)

to be linear. While ker A = {f ∈ V : f constant} (e.g. due to the fundamental theorem of calculus in the form [Phi16, Th. 10.20(b)]), Im A is not so simple to characterize (cf. [Phi17, Th. 2.11]).

(iii) Let a, b ∈ R, a ≤ b, I := [a, b], and let W := R(I, K) be the set of all K-valued Riemann integrable functions on I. According to [Phi16, Th. 10.11(a)], scalar multiples of Riemann integrable functions are Riemann integrable and sums of Riemann integrable functions are Riemann integrable, showing W to be another subspace of K^I. Using [Phi16, Th. 10.11(a)] again, we know the map

B : W −→ K,   B(f) := ∫_I f,   (6.17)

to be linear. Moreover, for a < b, we have

Im B = K,   ker B = { f ∈ W : ∫_I f = 0 },   W = ⟨1⟩ ⊕ ker B,   (6.18)

where the last equality is due to the fact that, if f ∈ W with ∫_I f = λ, then ∫_I (f − λ/(b−a) · 1) = 0, i.e. f − λ/(b−a) · 1 ∈ ker B.

— For some of the linear maps investigated in Ex. 6.7 above, we have provided the respective kernels, images, and direct complements of the kernels. The examples suggest relations between the dimensions of domain, image, and kernel of a linear map, which are stated and proved in the following theorem:

Theorem 6.8 (Dimension Formulas). Let V and W be vector spaces over the field F, and let A : V −→ W be linear. Moreover, let B_V be a basis of V, let B_ker be a basis of ker A, let B_Im be a basis of Im A, and let B_W be a basis of W.

(a) If U is a direct complement of ker A (i.e. if V = U ⊕ ker A), then A↾_U : U −→ Im A is bijective. In consequence, dim V = dim ker A + dim Im A, i.e. there exists a bijective map φ : B_V −→ (B_ker × {0}) ∪̇ (B_Im × {1}).

(b) dim Im A ≤ dim V, i.e. there exists an injective map from B_Im into B_V.

(c) dim Im A ≤ dim W, i.e. there exists an injective map from B_Im into B_W.

Proof. (a): If u ∈ ker A↾_U, then u ∈ U ∩ ker A, i.e. u = 0, i.e. A↾_U is injective. If w ∈ Im A and Av = w, then v = v_0 + u with v_0 ∈ ker A and u ∈ U. Then Au = A(v) − A(v_0) = w − 0 = w, showing A↾_U is surjective. According to Th. 5.30(b), if B_U is a basis of U, then B := B_ker ∪̇ B_U is a basis of V. Moreover, by Prop. 6.5(c), A(B_U) is a basis of Im A. Thus, by Th. 5.23(c), it suffices to show there exists a bijective map ψ : B −→ (B_ker × {0}) ∪̇ (A(B_U) × {1}). If we define

ψ : B −→ (B_ker × {0}) ∪̇ (A(B_U) × {1}),   ψ(b) := (b, 0) for b ∈ B_ker,  (A(b), 1) for b ∈ B_U,

then ψ is bijective due to the bijectivity of A↾_U.

(b) is immediate from (a).

(c) is due to Th. 5.27(b), since Im A is a subspace of W.
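For A given by a real matrix, Th. 6.8(a) is the familiar rank-nullity identity; a numeric sketch in R (dim Im A is the rank):

    M <- rbind(c(1, 2, 3),
               c(2, 4, 6))          # a rank-1 matrix, A : R^3 -> R^2
    n <- ncol(M)                    # dim V = 3
    rank_M <- qr(M)$rank            # dim Im A = 1
    dim_ker <- n - rank_M           # dim ker A = 2
    n == dim_ker + rank_M           # TRUE, cf. Th. 6.8(a)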
The following theorem is one of the central results of Linear Algebra: It says that vector spaces are essentially determined by the size of their bases:

Theorem 6.9. Let V and W be vector spaces over the field F. Then V ≅ W (i.e. V and W are isomorphic) if, and only if, dim V = dim W.

Proof. Suppose dim V = dim W. If B_V is a basis of V and B_W is a basis of W, then there exists a bijective map φ : B_V −→ B_W. According to Th. 6.6, φ defines a unique linear map A : V −→ W with A(b) = φ(b) for each b ∈ B_V. More precisely, letting, once again, for each v ∈ V, B_v and c_v : B_v −→ F \ {0} be as in Th. 5.19 (writing c_v instead of c to underline the dependence of c on v),

∀ v∈V   A(v) = A( Σ_{b∈B_v} c_v(b) b ) = Σ_{b∈B_v} c_v(b) φ(b).   (6.19)

It remains to show A is bijective. If v ≠ 0, then B_v ≠ ∅ and A(v) = Σ_{b∈B_v} c_v(b) φ(b) ≠ 0, since c_v(b) ≠ 0 and {φ(b) : b ∈ B_v} ⊆ B_W is linearly independent, showing A is injective by Prop. 6.3(d). If w ∈ W, then there exist a finite set B̃_w ⊆ B_W and c̃_w : B̃_w −→ F \ {0} such that w = Σ_{b̃∈B̃_w} c̃_w(b̃) b̃. Then, since φ^{−1}(b̃) ∈ B_V and A ∈ L(V, W),

A( Σ_{b̃∈B̃_w} c̃_w(b̃) φ^{−1}(b̃) ) = Σ_{b̃∈B̃_w} c̃_w(b̃) A(φ^{−1}(b̃)) = Σ_{b̃∈B̃_w} c̃_w(b̃) φ(φ^{−1}(b̃)) = Σ_{b̃∈B̃_w} c̃_w(b̃) b̃ = w,   (6.20)

showing Im A = W, completing the proof that A is bijective.

If A : V −→ W is a linear isomorphism and B is a basis for V, then, by Prop. 6.5(c), A(B) is a basis for W. As A is bijective, so is A↾_B, showing dim V = #B = #A(B) = dim W as claimed.

Theorem 6.10. Let V and W be finite-dimensional vector spaces over the field F, dim V = dim W = n ∈ N_0. Then, given A ∈ L(V, W), the following three statements are equivalent:

(i) A is an isomorphism.

(ii) A is an epimorphism.

(iii) A is a monomorphism.

Proof. It suffices to prove the equivalence between (ii) and (iii).

"(ii)⇒(iii)": If A is an epimorphism, then W = Im A, implying, due to dim V < ∞ and Th. 6.8(a),

dim ker A = dim V − dim Im A = dim V − dim W = n − n = 0,

showing ker A = {0}, i.e. A is a monomorphism.

"(iii)⇒(ii)": If A is a monomorphism, then ker A = {0}. Thus, by Th. 6.8(a),

n = dim W = dim V = dim ker A + dim Im A = 0 + dim Im A = dim Im A.

From Th. 5.27(d), we then obtain W = Im A, i.e. A is an epimorphism.

Example 6.11. The present example shows that the analogue of Th. 6.10 does not hold for infinite-dimensional vector spaces: Let F be a field and V := F^N (the vector space of sequences in F – the example actually still works in precisely the same way over the simpler space F^N_fin). Define the maps

R : V −→ V,   R(λ_1, λ_2, ...) := (0, λ_1, λ_2, ...),
L : V −→ V,   L(λ_1, λ_2, ...) := (λ_2, λ_3, ...)

(R is called the right shift operator, L is called the left shift operator). Clearly, R and L are linear and L ∘ R = Id, i.e. L is a left inverse for R, R is a right inverse for L. Thus, according to Th. 2.13, R is injective and L is surjective (which is also easily seen directly from the respective definitions of R and L). However,

Im R = {(f : N −→ F) : f(1) = 0} ≠ V   (e.g. (1, 0, 0, ...) ∉ Im R),
ker L = {(f : N −→ F) : f(k) = 0 for k ≠ 1} ≠ {0}   (e.g. 0 ≠ (1, 0, 0, ...) ∈ ker L),

showing R is not surjective and L is not injective.
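The shift operators of Ex. 6.11 are easy to play with in R on finite truncations of sequences (a sketch; the tail behavior of infinite sequences is, of course, not modeled, and the function names are ours):

    shift_right <- function(x) c(0, x)        # (x1, x2, ...) -> (0, x1, x2, ...)
    shift_left  <- function(x) x[-1]          # (x1, x2, ...) -> (x2, x3, ...)
    x <- c(1, 2, 3)
    identical(shift_left(shift_right(x)), x)  # TRUE:  L(R(x)) = x
    identical(shift_right(shift_left(x)), x)  # FALSE: the first entry is lost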
the results do not depend on the chosen representatives of the respective cosets. (b) The natural (group) epimorphism of Th. 4.27(a), φU : V −→ V /U, φU (v) := v + U, (6.22) φU (v + w) = φU (v) + φU (w), (6.23a) satisfies ∀ v,w∈V ∀ λ∈F ∀ v∈V φU (λv) = λ φU (v). (6.23b) (c) V /U with the compositions of (a) forms a vector space over F and φU of (b) constitutes a linear epimorphism. Proof. (a): The composition + is well-defined by Th. 4.27(a). To verify that · is welldefined as well, let v, w ∈ V , λ ∈ F , and assume v + U = w + U . We need to show λv + U = λw + U . If x ∈ λv + U , then there exists u1 ∈ U such that x = λv + u1 . Since v + U = w + U , there then exists u2 ∈ U such that v + u1 = w + u2 and v = w + u2 − u1 . Thus, x = λv+u1 = λw+λ(u2 −u1 )+u1 . Since U is a subspace of V , λ(u2 −u1 )+u1 ∈ U , showing x ∈ λw + U and λv + U ⊆ λw + U . As we can switch the roles of v and w in the above argument, we also have λw + U ⊆ λv + U and λv + U = λw + U , as desired. (b): (6.23a) holds, as φU is a homomorphism with respect to + by Th. 4.27(a). To verify (6.23b), let λ ∈ F , v ∈ V . Then φU (λv) = λv + U (6.21b) = λ(v + U ) = λφU (v), establishing the case. (c) is now immediate from (b) and Prop. 6.4. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 6 LINEAR MAPS 124 One can sometimes obtain information about a vector space V by studying a subspace U and the corresponding quotient space V /U , both of which are “smaller” spaces in the sense of the following Cor. 6.13: Corollary 6.13. Let V be a vector space over the field F and let U be a subspace of V . Let φ : V −→ V /U , φ(v) := v + U , denote the natural epimorphism. (a) If B is a basis of V and BU ⊆ B is a basis of U , then B/ := {b + U : b ∈ B \ BU } is a basis of V /U . (b) Let BU be a basis of U and let W be another subspace of V with basis BW . If φ↾W : W −→ V /U is bijective, then BU ∪˙ BW forms a basis of V . (c) dim V = dim U + dim V /U (cf. Th. 6.8(a)). Proof. (a): Letting W := hB\BU i, we have V = U ⊕W by Th. 5.30(d). Since U = ker φ, we have V = ker φ ⊕ W and φ ↾W : W −→ V /U is bijective by Th. 6.8(a) and, thus, B/ = φ↾W (B \ BU ) is a basis of V /U . (b): Let v ∈ V . Then there exists w ∈ W with v + U = φ(v) = φ(w) = w + U , implying v − w ∈ U . Thus, v = w + v − w ∈ W + U , showing V = W + U . On the other hand, if v ∈ U ∩ W , then φ(v) = v + U = U = φ(0) and, thus, v = 0, showing V = W ⊕ U . Then BU ∪˙ BW forms a basis of V by Th. 5.30(b). (c): Since U = ker φ and V /U = Im φ, (c) is immediate from Th. 6.8(a). From Def. 4.26 and Th. 4.27(a), we already know that the elements of V /U are precisely the equivalence classes of the equivalence relation defined by v ∼ w :⇔ v + U = w + U ⇔ v − w ∈ U . (6.24) ∀ v,w∈V Here, in the context of vector spaces, it turns out that the equivalence relations on V that define quotient spaces, are precisely so-called linear equivalence relations: Definition 6.14. Let V be a vector space over the field F and let R ⊆ V × V be a relation on V . Then R is called linear if, and only if, it has the following two properties: (i) If v, w, x, y ∈ V with vRw and xRy, then (v + x)R(w + y). (ii) If v, w ∈ V and λ ∈ F with vRw, then (λv)R(λw). Proposition 6.15. Let V be a vector space over the field F . (a) If U is a subspace of V , then the equivalence relation defined by (6.24) is linear. 
AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 125 6 LINEAR MAPS (b) Let ∼ be a linear equivalence relation on V . Then U := {v ∈ V : v ∼ 0} is a subspace of V and ∼ satisfies (6.24). Proof. (a): The linearity of ∼ defined by (6.24) is precisely the assertion that the compositions + and · of Th. 6.12(a) are well-defined. (b): We have 0 ∈ U , as ∼ is reflexive. If v, w ∈ U and λ ∈ F , then the linearity of ∼ implies v + w ∼ 0 + 0 = 0 and λv ∼ λ0 = 0, i.e. v + w ∈ U and λv ∈ U , showing U to be a subspace. Now suppose v, w ∈ V with v ∼ w. Then the linearity of ∼ yields v −w ∼ 0, i.e. v − w ∈ U . If x = v + u with u ∈ U , then x = w + (x − w) = w + (v + u − w) ∈ w + U , showing v + U ⊆ w + U . As ∼ is symmetric, we then have w + U ⊆ v + U as well. Conversely, if v, w ∈ V with v + U = w + U , then v − w ∈ U , i.e. v − w ∼ 0 and the linearity of ∼ implies v ∼ w, proving ∼ satisfies (6.24). In Th. 4.27(b), we proved the isomorphism theorem for groups: If G and H are groups and φ : G −→ H is a homomorphism, then G/ ker φ ∼ = Im φ. Since vector spaces V and W are, in particular, groups, if A : V −→ W is linear, then V / ker A ∼ = Im A as groups, and it is natural to ask, whether V / ker A and Im A are isomorphic as vector spaces. In Th. 6.16(a) below we see this, indeed, to be the case. Theorem 6.16 (Isomorphism Theorem). Let V and X be vector spaces over the field F , let U, W be subspaces of V . (a) If A : V −→ X is linear, then V / ker A ∼ = Im A. (6.25) More precisely, the map f : V / ker A −→ Im A, f (v + ker A) := A(v), (6.26) is well-defined and constitutes an linear isomorphism. If fe : V −→ V / ker A denotes the natural epimorphism and ι : Im A −→ X, ι(v) := v, denotes the embedding, then fm : V / ker A −→ X, fm := ι ◦ f , is a linear monomorphism such that A = fm ◦ fe . (6.27) (b) One has (U + W )/W ∼ = U/(U ∩ W ). (6.28) (c) If U is a subspace of W , then (V /U )/(W/U ) ∼ = V /W. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 (6.29) 126 6 LINEAR MAPS Proof. (a): All assertions, except the linearity assertions, were already proved in Th. 4.27(b). Moreover, fe is linear by Th. 6.12(b). Thus, it merely remains to show f λ(v + ker A) = λ f (v + ker A) for each λ ∈ F and each v ∈ V (then fm is linear, as both f and ι are linear). Indeed, if λ ∈ F and v ∈ V , then f λ(v + ker A) = f (λv + ker A) = A(λv) = λ A(v) = λ f (v + ker A), as desired. (b): According to (a), it suffices to show that A : U + W −→ U/(U ∩ W ), A(u + w) := u + (U ∩ W ) for u ∈ U , w ∈ W (6.30) well-defines a linear map with ker A = W . To verify that A is well-defined, let u, u1 ∈ U and w, w1 ∈ W with u + w = u1 + w1 . Then u − u1 = w1 − w ∈ U ∩ W and, thus, A(u + w) = u + (U ∩ W ) = u1 + w1 − w + (U ∩ W ) = u1 + (U ∩ W ) = A(u1 + w1 ), proving A to be well-defined by (6.30). To prove linearity, we no longer assume u + w = u1 + w1 and calculate A (u + w) + (u1 + w1 ) = A (u + u1 ) + (w + w1 ) = u + u1 + (U ∩ W ) = u + (U ∩ W ) + u1 + (U ∩ W ) = A(u + w) + A(u1 + w1 ), as well as, for each λ ∈ F , A λ(u + w) = A(λu + λw) = λu + (U ∩ W ) = λ u + (U ∩ W ) = λ A(u + w), as needed. If w ∈ W , then A(w) = A(0+w) = U ∩W , showing w ∈ ker A and W ⊆ ker A. Conversely, let u + w ∈ ker A, u ∈ U , w ∈ W . Then A(u + w) = u + (U ∩ W ) = U ∩ W , i.e. u ∈ U ∩ W and u + w ∈ W , showing ker A ⊆ W . (c): Exercise. 6.3 Vector Spaces of Linear Maps Definition 6.17. Let V and W be vector spaces over the field F . 
We define an addition and a scalar multiplication on L(V, W ) by (A + B) : V −→ W, (λ · A) : V −→ W, (A + B)(x) := A(x) + B(x), (λ · A)(x) := λ · A(x) for each λ ∈ F . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 (6.31a) (6.31b) 127 6 LINEAR MAPS Theorem 6.18. Let V and W be vector spaces over the field F . The addition and scalar multiplication on L(V, W ) given by (6.31) are well-defined in the sense that, if A, B ∈ L(V, W ) and λ ∈ F , then A + B ∈ L(V, W ) and λA ∈ L(V, W ). Moreover, with the operations defined in (6.31), L(V, W ) forms a vector space over F . Proof. Exercise. Theorem 6.19. Let V and W be vector spaces over the field F , let BV and BW be bases of V and W , respectively. Using Th. 6.6, define maps Avw ∈ L(V, W ) by letting ( w for v = ṽ, ∀ Avw (ṽ) := (6.32) w,v,ṽ∈BW ×BV ×BV 0 for v 6= ṽ. Define B := Avw : (v, w) ∈ BV × BW . (6.33) (a) B is linearly independent. (b) If V is finite-dimensional, dim V = n ∈ N, BV = {v1 , . . . , vn }, then B constitutes a basis for L(V, W ). If, in addition, dim W = m ∈ N, BW = {w1 , . . . , wm }, then we can write dim L(V, W ) = dim V · dim W = n · m. (6.34) (c) If dim V = ∞ and dim W ≥ 1, then hBi ( L(V, W ) and, in particular, B is not a basis of L(V, W ). Proof. (a): Let N, M ∈ N. Let v1 , . . . , vN ∈ BV be distinct and let w1 , . . . , wM ∈ BW be distinct as well. If λji ∈ F , (j, i) ∈ {1, . . . , M } × {1, . . . , N }, are such that A := M X N X λji Avi wj = 0, j=1 i=1 then, for each k ∈ {1, . . . , N }, 0 = A(vk ) = M X N X j=1 i=1 λji Avi wj (vk ) = M X N X j=1 i=1 λji δik wj = M X λjk wj j=1 implies λ1k = · · · = λM k = 0 due to the linear independence of the wj . As this holds for each k ∈ {1, . . . , N }, we have established the linear independence of B. (b): According to (a), it remains to show hBi = L(V, W ). Let A P ∈ L(V, W ) and i ∈ {1, . . . , n}. Then there exists a finite set Bi ⊆ BW such that Avi = w∈Bi λw w with AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 128 6 LINEAR MAPS λw ∈ F . Now let w1 , . . . , wM , M ∈ N, be an enumeration of the finite set B1 ∪ · · · ∪ Bn . Then there exist aji ∈ F , (j, i) ∈ {1, . . . , M } × {1, . . . , n}, such that ∀ i∈{1,...,n} Letting L := ∀ k∈{1,...,n} PM Pn j=1 A(vi ) = M X aji wj . j=1 i=1 aji Avi wj , we claim A = L. Indeed, L(vk ) = M X n X j=1 i=1 aji Avi wj (vk ) = M X n X aji δik wj = j=1 i=1 M X ajk wj = A(vk ), j=1 proving L = A by Th. 6.6. Since L ∈ hBi, the proof of (b) is complete. (c): As dim W ≥ 1, there exists w ∈ BW . If A ∈ hBi, then {v ∈ BV : Av 6= 0} is finite. Thus, if BV is infinite, then the map A ∈ L(V, W ) with A(v) := w for each v ∈ BV is not in hBi, proving (c). Lemma 6.20. Let V, W, X be vector spaces over the field F . Recalling Not. 6.2, we have: (a) If A ∈ L(W, X) and B, C ∈ L(V, W ), then A(B + C) = AB + AC. (6.35) (b) If A, B ∈ L(W, X) and C ∈ L(V, W ), then (A + B)C = AC + BC. (6.36) Proof. (a): For each v ∈ V , we calculate A(B + C) v = A(Bv + Cv) = A(Bv) + A(Cv) = (AB)v + (AC)v = (AB + AC)v, proving (6.35). (b): For each v ∈ V , we calculate (A + B)C v = (A + B)(Cv) = A(Cv) + B(Cv) = (AC)v + (BC)v = (AC + BC)v, proving (6.36). Theorem 6.21. Let V be a vector space over the field F . Then R := L(V, V ) constitutes a ring with unity with respect to pointwise addition and map composition as multiplication. If dim V > 1, then R is not commutative and it has nontrivial divisors of 0. 
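Before giving the proof, both phenomena can be made concrete: the following R sketch (R is the language of Appendix F; the functions are an ad-hoc transcription of the maps A_1, A_2 constructed in the proof below, for V = R^2 with basis vectors e_1 = v_1, e_2 = v_2) exhibits noncommutativity and nontrivial divisors of 0:

    A1 <- function(v) c(0, v[1])   # A1: e1 |-> e2, e2 |-> 0 (linear)
    A2 <- function(v) c(v[2], 0)   # A2: e2 |-> e1, e1 |-> 0 (linear)
    e1 <- c(1, 0)
    A1(A2(e1))   # (A1 A2) e1 = (0, 0), the zero vector
    A2(A1(e1))   # (A2 A1) e1 = e1, so A1 A2 != A2 A1
    A1(A1(e1))   # A1^2 = 0 here (and on e2 as well): a divisor of 0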
Proof. Since (V, +) is a commutative group, R is a subring (with unity Id ∈ R) of the ring with unity of group endomorphisms End(V) of Ex. 4.42(b) (that R constitutes a ring can, alternatively, also be obtained from Lem. 6.20 with V = W = X). Now let v_1, v_2 ∈ V be linearly independent and B a basis of V with v_1, v_2 ∈ B. Define A_1, A_2 ∈ R by letting, for v ∈ B,
\[
A_1 v := \begin{cases} v_2 & \text{for } v = v_1, \\ 0 & \text{otherwise}, \end{cases} \qquad
A_2 v := \begin{cases} v_1 & \text{for } v = v_2, \\ 0 & \text{otherwise}. \end{cases} \qquad (6.37)
\]
Then A_1 A_2 v_1 = 0, but A_2 A_1 v_1 = A_2 v_2 = v_1, showing A_1 A_2 ≠ A_2 A_1. Moreover, A_1^2 v_1 = A_1 v_2 = 0 and A_2^2 v_2 = A_2 v_1 = 0, showing A_1^2 ≡ 0 and A_2^2 ≡ 0. In particular, both A_1 and A_2 are nontrivial divisors of 0.

Definition 6.22. Let V be a vector space over the field F.

(a) A linear endomorphism A ∈ L(V, V) is called regular if, and only if, it is an automorphism (i.e. if, and only if, it is bijective/invertible). Otherwise, A is called singular.

(b) The set GL(V) := {A ∈ L(V, V) : A is regular} = L(V, V)^* (cf. Def. and Rem. 4.41) is called the general linear group of V (cf. Cor. 6.23 below).

Corollary 6.23. Let V be a vector space over the field F. Then the general linear group GL(V) is, indeed, a group with respect to map composition. For V ≠ {0}, it is not a group with respect to addition (note 0 ∉ GL(V) in this case). If dim V ≥ 3, or if F ≠ {0, 1} and dim V ≥ 2, then the group GL(V) is not commutative.

Proof. Since L(V, V) is a ring with unity, we know GL(V) = L(V, V)^* to be a group by Def. and Rem. 4.41 – since the composition of linear maps is linear, GL(V) is also a subgroup of the symmetric group S_V (cf. Ex. 4.9(b)). If dim V ≥ 3, then the noncommutativity of GL(V) follows as in Ex. 4.9(b), where the elements a, b, c of (4.7) are now taken as distinct elements of a basis of V. Now let v_1, v_2 ∈ V be linearly independent and B a basis of V with v_1, v_2 ∈ B. Define A_1, A_2 ∈ L(V, V) by letting, for v ∈ B,
\[
A_1 v := \begin{cases} v_2 & \text{for } v = v_1, \\ v_1 & \text{for } v = v_2, \\ v & \text{otherwise}, \end{cases} \qquad
A_2 v := \begin{cases} -v_1 & \text{for } v = v_1, \\ v_2 & \text{for } v = v_2, \\ v & \text{otherwise}. \end{cases} \qquad (6.38)
\]
Clearly, A_1, A_2 ∈ GL(V) with A_1^{-1} = A_1 and A_2^{-1} = A_2. Moreover, A_1 A_2 v_1 = −A_1 v_1 = −v_2, A_2 A_1 v_1 = A_2 v_2 = v_2. If F ≠ {0, 1}, then v_2 ≠ −v_2, showing A_1 A_2 ≠ A_2 A_1.

7 Matrices

7.1 Definition and Arithmetic

Matrices provide a convenient representation for linear maps A between finite-dimensional vector spaces V and W. Recall the basis {A_{v_i w_j} : (j, i) ∈ {1, …, m} × {1, …, n}} of L(V, W) that, in Th. 6.19, was shown to arise from bases {v_1, …, v_n} and {w_1, …, w_m} of V and W, respectively; m, n ∈ N. Thus, each A ∈ L(V, W) can be written in the form
\[
A = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ji}\, A_{v_i w_j}, \qquad (7.1)
\]
with coordinates (a_{ji})_{(j,i)∈{1,…,m}×{1,…,n}} in F (also cf. Ex. 6.7(a)). This motivates the following definition of matrices, where, however, instead of F, we allow an arbitrary nonempty set S, as it turns out to be sometimes useful to have matrices with entries that are not necessarily elements of a field.

Definition 7.1. Let S be a nonempty set and m, n ∈ N. A family (a_{ji})_{(j,i)∈{1,…,m}×{1,…,n}} in S is called an m-by-n or an m × n matrix over S, where m × n is called the size, dimension or type of the matrix. The a_{ji} are called the entries or elements of the matrix. One also writes just (a_{ji}) instead of (a_{ji})_{(j,i)∈{1,…,m}×{1,…,n}} if the size of the matrix is understood. One usually thinks of the m × n matrix (a_{ji}) as the rectangular array
\[
(a_{ji}) = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \qquad (7.2)
\]
with m rows and n columns. One therefore also calls 1 × n matrices row vectors and m × 1 matrices column vectors, and one calls n × n matrices quadratic. One calls the elements a_{kk} the (main) diagonal elements of (a_{ji}) and one also says that these elements lie on the (main) diagonal of the matrix and that they form the (main) diagonal of the matrix. The set of all m × n matrices over S is denoted by M(m, n, S), and for the set of all quadratic n × n matrices, one uses the abbreviation M(n, S) := M(n, n, S).
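In R (the language used for the algorithms of Appendix F), a matrix in the sense of Def. 7.1 can be entered directly; the following minimal sketch shows the 2 × 3 matrix reused in Ex. 7.3 below, together with access to single entries, rows, and columns (cf. Rem. 7.4(b) below):

    # A 2 x 3 matrix over R, entered row by row.
    A <- matrix(c(1, -1,  0,
                  2,  0, -2), nrow = 2, ncol = 3, byrow = TRUE)
    dim(A)     # its size (type): 2 3
    A[2, 3]    # the entry a_{23} = -2
    A[2, ]     # the 2nd row
    A[, 3]     # the 3rd column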
Definition 7.2 (Matrix Arithmetic). Let F be a field and m, n, l ∈ N. Let S be a ring or a vector space over F (e.g. S = F).

(a) Matrix Addition: For m × n matrices (a_{ji}) and (b_{ji}) over S, define the sum
\[
(a_{ji}) + (b_{ji}) := (a_{ji} + b_{ji}) \in M(m, n, S). \qquad (7.3)
\]

(b) Scalar Multiplication: Let (a_{ji}) be an m × n matrix over S. If S is a ring, then let λ ∈ S; if S is a vector space over F, then let λ ∈ F. Define
\[
\lambda\,(a_{ji}) := (\lambda\, a_{ji}) \in M(m, n, S). \qquad (7.4)
\]

(c) Matrix Multiplication: Let S be a ring. For each m × n matrix (a_{ji}) and each n × l matrix (b_{ji}) over S, define the product
\[
(a_{ji})(b_{ji}) := \Big( \sum_{k=1}^{n} a_{jk} b_{ki} \Big)_{(j,i) \in \{1,\dots,m\} \times \{1,\dots,l\}} \in M(m, l, S), \qquad (7.5)
\]
i.e. the product of an m × n matrix and an n × l matrix is an m × l matrix (cf. Th. 7.13 below).

Example 7.3. As an example of matrix multiplication, we compute
\[
\begin{pmatrix} 1 & -1 & 0 \\ 2 & 0 & -2 \end{pmatrix}
\begin{pmatrix} -3 & 0 \\ 1 & 1 \\ 0 & -1 \end{pmatrix}
= \begin{pmatrix} 1 \cdot (-3) + (-1) \cdot 1 + 0 \cdot 0 & 1 \cdot 0 + (-1) \cdot 1 + 0 \cdot (-1) \\ 2 \cdot (-3) + 0 \cdot 1 + (-2) \cdot 0 & 2 \cdot 0 + 0 \cdot 1 + (-2) \cdot (-1) \end{pmatrix}
= \begin{pmatrix} -4 & -1 \\ -6 & 2 \end{pmatrix}.
\]

Remark 7.4. Let S be a nonempty set and m, n ∈ N.

(a) An m × n matrix A := (a_{ji})_{(j,i)∈{1,…,m}×{1,…,n}} is defined as a family in S, i.e., recalling Def. 2.15(a), A is defined as the function A : {1, …, m} × {1, …, n} −→ S, A(j, i) = a_{ji}; and M(m, n, S) = F({1, …, m} × {1, …, n}, S) (so we notice that matrices are nothing new in terms of objects, but just a new way of thinking about functions from {1, …, m} × {1, …, n} into S, that turns out to be convenient in certain contexts). Thus, if F is a field and S = Y is a vector space over F, then the operations defined in Def. 7.2(a),(b) are precisely the same operations that were defined in (5.5) and M(m, n, Y) is a vector space according to Ex. 5.2(d). Clearly, the map
\[
I : M(m, n, Y) \longrightarrow Y^{m \cdot n}, \quad (a_{ji}) \mapsto (y_1, \dots, y_{m \cdot n}), \qquad (7.6)
\]
where y_k = a_{ji} if, and only if, k = (j − 1) · n + i, constitutes a linear isomorphism. For Y = F, other important linear isomorphisms between M(m, n, F) and vector spaces of linear maps will be provided in Th. 7.10(a) below.

(b) Let A := (a_{ji})_{(j,i)∈{1,…,m}×{1,…,n}} be an m × n matrix over S. Then, for each i ∈ {1, …, n}, the ith column of A,
\[
c_i := c_i^A := \begin{pmatrix} a_{1i} \\ \vdots \\ a_{mi} \end{pmatrix},
\]
can be considered both as an m × 1 matrix and as an element of S^m. In particular, if S = F is a field, then one calls c_i the ith column vector of A. Analogously, for each j ∈ {1, …, m}, the jth row of A,
\[
r_j := r_j^A := \begin{pmatrix} a_{j1} & \dots & a_{jn} \end{pmatrix},
\]
can be considered both as a 1 × n matrix and as an element of S^n.
In particular, if S = F is a field, then one calls rj the jth row vector of A. — It can sometimes be useful to think of matrix multiplication in terms of the matrix columns and rows of Rem. 7.4(b) in a number of different ways, as compiled in the following lemma: Lemma 7.5. Let S be a ring, let l, m, n ∈ N, let A := (aji ) be an m × n matrix over A A A S, and let B := (bji ) be an n × l matrix over S. Moreover, let cA 1 , . . . , cn and r1 , . . . , rm B B B denote the columns and rows of A, respectively, let cB 1 , . . . , cl and r1 , . . . , rn denote the columns and rows of B (cf. Rem. 7.4(b)), i.e. r1B r1A .. . . . cB . . . cA B = (bji ) = cB = ... . A = (aji ) = cA 1 1 n = . , l A rnB rm (a) Consider v1 .. v := . ∈ M(n, 1, S) ∼ = S n, vn Then Av = n X k=1 w := w1 . . . wn ∈ M(1, n, S) ∼ = S n. cA k vk ∈ M(m, 1, S), wB = n X k=1 wk rkB ∈ M(1, l, S), where we wrote cA k vk , as we do not assume S to be a commutative ring – if S is commutative (e.g. a field), then the more familiar form vk cA k is also admissible. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 133 7 MATRICES (b) Columnwise Matrix Multiplication: One has ∀ i∈{1,...,l} cAB = A cB i i . (c) Rowwise Matrix Multiplication: One has ∀ j∈{1,...,m} rjAB = rjA B. Proof. (a): Let x := Av, y := wB. We then have a1k x1 .. y = y1 . . . yl , x = ... , cA k = . , amk xm rkB = bk1 . . . bkl , yielding ∀ j∈{1,...,m} ∀ i∈{1,...,l} xj = n X ajk vk = yi = cA k vk k=1 k=1 n X n X wk bki = k=1 n X ! wk rkB k=1 , j ! , i thereby proving (a). (b): For each i ∈ {1, . . . , l}, j ∈ {1, . . . , m}, one computes (cAB i )j = (AB)ji = n X ajk bki = (A cB i )j , k=1 proving (b). (c): For each j ∈ {1, . . . , m}, i ∈ {1, . . . , l}, one computes (rjAB )i = (AB)ji = n X ajk bki = (rjA B)i , k=1 proving (c). Example 7.6. Using the matrices from Ex. 7.3, we have −3 0 −1 1 1 −1 0 −4 1 ·0 ·1+ · (−3) + = = −2 0 2 2 0 −2 −6 0 AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 134 7 MATRICES and −3 0 1 = 1 · −3 0 + (−1) · 1 1 + 0 · 0 −1 . −4 −1 = 1 −1 0 1 0 −1 Theorem 7.7. Let S be a ring. (a) Matrix multiplication of matrices over S is associative whenever all relevant multiplications are defined. More precisely, if A = (aji ) is an m × n matrix, B = (bji ) is an n × l, and C = (cji ) is an l × p matrix, then (AB)C = A(BC). (7.7) (b) Matrix multiplication of matrices over S is distributive whenever all relevant multiplications are defined. More precisely, if A = (aji ) is an m × n matrix, B = (bji ) and C = (cji ) are n × l matrices, and D = (dji ) is an l × p matrix, then A(B + C) = AB + AC, (B + C)D = BD + CD. (7.8a) (7.8b) (c) For each n ∈ N, M(n, S) is a ring. If S is a ring with unity, then M(n, S) is a ring with unity as well, where the additive neutral element is the zero matrix 0 ∈ M(n, S) (all entries are zero) and the multiplicative neutral element of M(n, S) is the so-called identity matrix Idn := (δji ), ( 1 1 for i = j, (7.9) Idn = . . . , δji := 0 for i 6= j, 1 where the usual meaning of such notation is that all omitted entries are 0. We call GLn (S) := M(n, S)∗ , i.e. the group of invertible matrices over S, general linear group (we will see in Th. 7.10(a) below that, if S = F is a field and V is a vector space of dimension n over F , then GLn (F ) is isomorphic to the general linear group GL(V ) of Def. 6.22(b)). Analogous to Def. 6.22(a), we call the matrices in GLn (S) regular and the matrices in M(n, S) \ GLn (S) singular. Proof. 
(a): One has m × p matrices (AB)C = (dji ) and A(BC) = (eji ), where, using associativity and distributivity in S, one computes ! ! l l X n l n n X X X X X bkα cαi = eji , dji = ajk bkα cαi = ajk ajk bkα cαi = α=1 k=1 α=1 k=1 k=1 α=1 AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 135 7 MATRICES thereby proving (7.7). (b): Exercise. (c): While M(n, S) is a ring due to (a), (b), and Ex. 4.9(e) (which says that (M(n, S), +) is a commutative group), we check the neutrality of Idn : For each A = (aji ) ∈ M(n, S), we obtain ! ! n n X X ajk δki = A Idn , δjk aki = (aji ) = A = Idn A = k=1 k=1 thereby establishing the case. Caveat 7.8. Matrix multiplication of matrices over a ring S is, in general, not commutative, even if S is commutative: If A is an m × n matrix and B is an n × l matrix with m 6= l, then BA is not even defined. If m = l, but m 6= n, then AB has dimension m × m, but BA has different dimension, namely n × n. And even if m = n = l > 1, then commutativity is, in general, not true – for example, if S is a ring with unity and 0 6= 1 ∈ S, then λ 0 ... 0 1 0 ... 0 1 1 ... 1 0 0 . . . 0 1 0 . . . 0 0 0 . . . 0 (7.10a) .. .. .. .. .. .. .. .. = .. .. .. .. , . . . . . . . . . . . . 0 0 ... 0 1 0 ... 0 0 0 ... 0 1 1 ... 1 1 1 ... 1 1 0 ... 0 1 0 . . . 0 0 0 . . . 0 1 1 . . . 1 (7.10b) .. .. .. .. .. .. .. .. = .. .. .. .. . . . . . . . . . . . . . 1 1 ... 1 0 0 ... 0 1 0 ... 0 Note that λ = m for S = R or S = Z, but, in general, λ will depend on S, e.g. for S = {0, 1}, one obtains λ = m mod 2. 7.2 Matrices as Representations of Linear Maps Remark 7.9. Coming back to the situation Pn discussed at the beginning of the previous section above, resulting in (7.1), let v = i=1 λi vi ∈ V with λ1 , . . . , λn ∈ F . Then (also cf. the calculation in (6.9)) n n m X n n m X X X X (6.32) X A(v) = λi A(vi ) = λi ajk Avk wj (vi ) = λi aji wj i=1 = m X j=1 i=1 n X i=1 j=1 k=1 i=1 ! aji λi wj . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 j=1 (7.11) 136 7 MATRICES Thus, if we represent v by a column vector ṽ (an n×1 matrix) containing its coordinates λ1 , . . . , λn with respect to the basis {v1 , . . . , vn } and A(v) by a column vector w̃ (an m×1 matrix) containing its coordinates with respect to the basis {w1 , . . . , wm }, then (7.11) shows λ1 .. (7.12) w̃ = M ṽ = M . , where M := (aji ) ∈ M(m, n, F ). λn — For finite-dimensional vector spaces, the precise relationship between linear maps, bases, and matrices is provided by the following theorem: Theorem 7.10. Let V and W be finite-dimensional vector spaces over the field F , let BV := {v1 , . . . , vn } and BW := {w1 , . . . , wm } be bases of V and W , respectively; m, n ∈ N, n = dim V , m = dim W . (a) The map I := I(BV , BW ), I : L(V, W ) −→ M(m, n, F ), A 7→ (aji ), (7.13) where the aji are given by (7.1), i.e. by A= m X n X aji Avi wj , j=1 i=1 constitutes a linear isomorphism. Moreover, in the special case V = W , vi = wi for i ∈ {1, . . . , n}, the map I also constitutes a ring isomorphism I : L(V, V ) ∼ = M(n, F ); its restriction to GL(V ) then constitutes a group isomorphism I : GL(V ) ∼ = GLn (F ). (b) Let aji ∈ F , j ∈ {1, . . . , m}, i ∈ {1, . . . , n}. Moreover, let A ∈ L(V, W ). Then (7.1) holds if, and only if, ∀ i∈{1,...,n} Avi = m X aji wj . (7.14) j=1 Proof. (a): According to Th. 6.19, Avi wj : (j, i) ∈ {1, .. . , m} × {1, . . . , n} forms a basis of L(V, W ). 
Thus, to every family of coordinates aji : (j, i) ∈ {1, . . . , m} × AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 137 7 MATRICES {1, . . . , n} in F , (7.1) defines a unique element of L(V, W ), i.e. I is bijective. It remains to verify that I is linear. To this end, let λ, µ ∈ F and A, B ∈ L(V, W ) with A= B= m X n X j=1 i=1 m X n X aji Avi wj , (aji ) = I(A) ∈ M(m, n, F ), bji Avi wj , (bji ) = I(B) ∈ M(m, n, F ). j=1 i=1 Then λA + µB = λ m X n X aji Avi wj + µ j=1 i=1 m X n X bji Avi wj = j=1 i=1 m X n X (λaji + µbji )Avi wj , j=1 i=1 showing I(λA + µB) = (λaji + µbji ) = λ(aji ) + µ(bji ) = λI(A) + µI(B), proving the linearity of I. Now let V = W and vi = wi for each i ∈ {1, . . . , n}. We claim n X n n X X BA = cji Avi vj , where cji = bjk aki : (7.15) j=1 i=1 k=1 Indeed, for each l ∈ {1, . . . , n}, one calculates ! ! n n n X n n n X X X X X akl vk = aki Avi vk vl = B akl vk bji Avi vj (BA)vl = B = k=1 i=1 n n XX bjk akl vj = j=1 k=1 n X cjl vj = j=1 k=1 n X n X j=1 i=1 k=1 cji Avi vj vl , j=1 i=1 thereby proving (7.15). Thus, I(BA) = n X k=1 bjk aki ! = (bji )(aji ) = I(B)I(A), proving I to be a ring isomorphism. Since, clearly, I(GL(V )) = GLn (F ), the proof of (a) is complete. (b): If (7.1) holds, then the calculation (7.11) (with λk := δik ) proves (7.14). Conversely, assume (7.14). It was then shown in the proof of Th. 6.19(b) that (7.1) must hold. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 138 7 MATRICES Definition and Remark 7.11. In the situation of Th. 7.10, for each A ∈ L(V, W ), one calls the matrix I(A) = (aji ) ∈ M(m, n, F ) the (transformation) matrix corresponding to A with respect to the basis {v1 , . . . , vn } of V and the basis {w1 , . . . , wm } of W . If the bases are understood, then one often tends to identify the map with its corresponding matrix. However, as I(A) depends on the bases, identifying A and I(A) is only admissible as long as one keeps the bases of V and W fixed! Moreover, if one represents matrices as rectangular arrays as in (7.2) (which one usually does), then one actually considers the basis vectors of {v1 , . . . , vn } and {w1 , . . . , wm } as ordered from 1 to n (resp. m), i.e. I(A) actually depends on the so-called ordered bases (v1 , . . . , vn ) and (w1 , . . . , wm ) (ordered bases are tuples rather than sets and the matrix corresponding to A changes if the order of the basis vectors changes). P Similarly, we had seen in (7.12) that it can be useful to identify a vector v = ni=1 λi vi with its coordinates (λ1 , . . . , λn ), typically represented as an n × 1 matrix (a column vector, as in (7.12)) or a 1 × n matrix (a row vector). Obviously, this identification is also only admissible as long as the basis {v1 , . . . , vn } and its order is kept fixed. Example 7.12. Let V := Q4 with ordered basis BV := (v1 , v2 , v3 , v4 ) and let W := Q2 with ordered basis BW := (w1 , w2 ). Moreover, assume A ∈ L(V, W ) satisfies Av1 Av2 Av3 Av4 = w1 = 2w1 − w2 = 3w1 = 4w1 − 2w2 . According to Th. 7.10(a),(b), the coefficients on the right-hand side of the above equations provide the columns of the matrix representing A: The matrix I(A) of A with respect to BV and BW (I according to Th. 7.10(a)) is 1 2 3 4 . 
I(A) = 0 −1 0 −2 If we switch v1 and v4 in BV to obtain BV′ := (v4 , v2 , v3 , v1 ), and we switch w1 and w2 in ′ ′ BW to obtain BW := (w2 , w1 ), then the matrix I ′ (A) of A with respect to BV′ and BW is obtained from I(A) by switching columns 1 and 4 as well as rows 1 and 2, resulting in −2 −1 0 0 ′ . I (A) = 4 2 3 1 Suppose, we want to determine Av for the vector v := −v1 + 3v2 − 2v3 + v4 . Then, AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 139 7 MATRICES according to (7.12), we can do that via the matrix multiplication −1 −1 3 3 3 1 2 3 4 , . = I(A) = −5 −2 0 −1 0 −2 −2 1 1 obtaining Av = 3w1 − 5w2 . — The following Th. 7.13 is the justification for defining matrix multiplication according to Def. 7.2(c). Theorem 7.13. Let F be a field, let n, m, l ∈ N, and let V, W, X be finite-dimensional vector spaces over F , dim V = n, dim W = m, dim X = l, with ordered bases, BV := (v1 , . . . , vn ), BW := (w1 , . . . , wm ), and BX := (x1 , . . . , xl ), respectively. If A ∈ L(V, W ), B ∈ L(W, X), M = (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with respect to BV and BW , and N = (bji ) ∈ M(l, P m, F ) is the matrix corresponding to B with respect to BW and BX , then N M = ( m k=1 bjk aki ) ∈ M(l, n, F ) is the matrix corresponding to BA with respect to BV and BX . Proof. For each i ∈ {1, . . . , n}, one computes ! m m l m X X X X aki B(wk ) = (BA)(vi ) = B A(vi ) = B aki bjk xj aki wk = k=1 k=1 = l X m X j=1 k=1 bjk aki xj = l X j=1 m X k=1 k=1 ! bjk aki xj , j=1 (7.16) Pm proving N M = ( k=1 bjk aki ) is the matrix corresponding to BA with respect to the bases {v1 , . . . , vn } and {x1 , . . . , xl }. As we have seen, given A ∈ L(V, W ), the matrix I(A) representing A depends on the chosen (ordered) bases of V and W . It will be one goal of Linear Algebra II to find methods to determine bases of V and W such that the form of I(A) becomes particularly simple. In the following Th. 7.14, we show how I(A) changes for given basis transitions for V and W , respectively. Theorem 7.14. Let V and W be finite-dimensional vector spaces over the field F , m, n ∈ N, dim V = n, dim W = m. Let BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) be ′ ordered bases of V and W , respectively. Moreover, let BV′ := (v1′ , . . . , vn′ ) and BW := AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 140 7 MATRICES ′ (w1′ , . . . , wm ) also be ordered bases of V and W , respectively, and let cji , dji , fji ∈ F be such that ∀ vi′ = ∀ wi′ = i∈{1,...,n} i∈{1,...,m} n X j=1 m X cji vj , (7.17a) fji wj , wi = j=1 m X dji wj′ , (7.17b) j=1 where (dji ) = (fji )−1 ∈ GLm (F ) (we then call (cji ) ∈ GLn (F ) and (fji ) ∈ GLm (F ) the transition matrices corresponding to the basis transitions from BV to BV′ and from BW ′ to BW , respectively). If A ∈ L(V, W ) has I(BV , BW )(A) = (aji ) ∈ M(m, n, F ) with I(BV , BW ) as in Th. 7.10(a), then ′ I(BV′ , BW )(A) = (eji ), where (eji ) = (dji )(aji )(cji ) = (fji )−1 (aji )(cji ). (7.18) In particular, in the special case, where V = W , vi = wi , vi′ = wi′ for each i ∈ {1, . . . , n}, one has I(BV′ , BV′ )(A) = (eji ), where (eji ) = (cji )−1 (aji )(cji ). (7.19) Proof. For each i ∈ {1, . . . , n}, we compute ! m n m m n n n X X X X X X (7.14) X ′ djk wj′ cli vl = akl akl wk = Avi = A cli Avl = cli cli l=1 = m X j=1 n X l=1 l=1 m X k=1 djk akl l=1 ! k=1 l=1 k=1 j=1 ! cli wj′ , i.e. (7.18) holds in consequence of Th. 7.10(b),(a). Example 7.15. 
We explicitly write the transformations and matrices of Th. 7.14 for the situation of Ex. 7.12: There we had V := Q4 with ordered bases BV := (v1 , v2 , v3 , v4 ), ′ BV′ := (v4 , v2 , v3 , v1 ), and W := Q2 with ordered bases BW := (w1 , w2 ), BW := (w2 , w1 ). Thus, v1′ = v4 0 0 0 1 0 1 0 0 v2′ = v2 (c ) = ji 0 0 1 0 v3′ = v3 v4′ = v1 , 1 0 0 0 and w2 w1′ = ′ w2 = w1 , (fji ) = 0 1 1 0 AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 141 7 MATRICES and w1 = w2′ ′ w2 = w1 , (dji ) = 0 1 . 1 0 Moreover, A ∈ L(V, W ) in Ex. 7.12 was such that 1 2 3 4 . (aji ) = I(BV , BW )(A) = 0 −1 0 −2 Thus, according to (7.18), ′ )(A) = I(BV′ , BW 0 1 = 0 1 0 0 0 1 1 2 3 4 1 0 1 0 0 0 −1 0 −2 0 0 1 0 0 1 0 0 0 −2 −1 0 0 4 2 3 1 1 , = 4 2 3 1 −2 −1 0 0 0 which is precisely I ′ (A) of Ex. 7.12. 7.3 Rank and Transpose Definition 7.16. Let F be a field and m, n ∈ N. (a) Let V and W be vector spaces over F and A ∈ L(V, W ). Then rk(A) := dim Im A is called the rank of A. (b) Let (aji )(j,i)∈{1,...,m}×{1,...,n} ∈ M(m, n, F ) and consider the column vectors and row vectors of (aji ) as elements of the vector spaces F m and F n , respectively (cf. Rem. 7.4(b)). Then + * a1i .. (7.20a) rk(aji ) := dim . : i ∈ {1, . . . , n} a mi is called the column rank or just the rank of (aji ). Analogously, Dn oE aj1 . . . ajn : j ∈ {1, . . . , m} rrk(aji ) := dim is called the row rank of (aji ). — AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 (7.20b) 142 7 MATRICES A main goal of the present section is to show that, if V and W are finite-dimensional vector spaces, A ∈ L(V, W ), and (aji ) is the matrix representing A with respect to chosen ordered bases of V and W , respectively, then rk(A) = rk(aji ) = rrk(aji ) (cf. Th. 7.23 below). Theorem 7.17. Let F be a field, m, n ∈ N. (a) Let V, W be finite-dimensional vector spaces over F , dim V = n, dim W = m, where BV := (v1 , . . . , vn ) is an ordered basis of V and BW := (w1 , . . . , wm ) is an ordered basis of W . If A ∈ L(V, W ) and (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with respect to BV and BW , then rk(A) = rk(aji ). (b) If the matrices (cji ) ∈ M(n, n, F ) and (dji ) ∈ M(m, m, F ) are regular and (aji ) ∈ M(m, n, F ), then rk(aji ) = rk (dji )(aji )(cji ) . Proof. (a) is basically due to Lem. 7.5(a) and Th. 6.9: Let B : W −→ F m be the linear isomorphism with B(wj ) = ej , where {e1 , . . . , em } is the standard basis of F m (cf. Ex. 5.16(c)). Then rk(A) = dim Im A = dim B(Im A) = dim Im(BA) = rk(BA). Since ∀ i∈{1,...,n} m X (BA)vi = B aji wj j=1 ! (7.21) a1i = ... = cA i , ami A we have Im(BA) = h{cA 1 , . . . , cn }i, which, together with (7.21), proves (a). (b): Let BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) be ordered bases of V := F n and W := F m , respectively. Moreover, let A ∈ L(V, W ) be defined by (7.14), i.e. by ∀ i∈{1,...,n} Avi = m X aji wj . j=1 Then (aji ) is the matrix representing A with respect to BV and BW and rk(A) = rk(aji ) ′ ′ ) also be ordered bases of V := (w1′ , . . . , wm by (a). Now let BV′ := (v1′ , . . . , vn′ ) and BW and W , respectively, such that (7.17) holds, i.e. ∀ i∈{1,...,n} vi′ = n X j=1 cji vj , ∀ i∈{1,...,m} wi = m X dji wj′ j=1 AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 143 7 MATRICES (such bases exist, since (cji ) and (dji ) are regular), then, according to Th. 
7.14, the ′ matrix (eji ) := (dji )(aji )(cji ) represents A with respect to BV′ and BW . Thus, (a) yields rk(eji ) = rk(A) = rk(aji ), as desired. Theorem 7.18. Let V, W be finite-dimensional vector spaces over the field F , dim V = n, dim W = m, (m, n) ∈ N2 , A ∈ L(V, W ) with r := rk(A). Then there exists ordered bases BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) of V and W , respectively, such that the matrix (aji ) ∈ M(m, n, F ) corresponding to A with respect to BV and BW satisfies ( 1 for j = i, 1 ≤ j ≤ r, Idr 0 . aji = i.e. (aji ) = 0 0 0 otherwise, Proof. According to Th. 6.8(a), we know n = dim V = dim ker A + dim Im A = dim ker A + r. Thus, we can choose an ordered basis BV = (v1 , . . . , vn ) of V , such that (vr+1 , . . . , vn ) is a basis of ker A. For each i ∈ {1, . . . , r}, let wi := Avi . Then (w1 , . . . , wr ) is a basis of Im A and there exist wr+1 , . . . , wm ∈ W such that BW = (w1 , . . . , wm ) constitutes a basis of W . Since ( wi for 1 ≤ i ≤ r, ∀ Avi = i∈{1,...,n} 0 for i > r, (aji ) has the desired form. As a tool for showing rk(aji ) = rrk(aji ), we will now introduce the transpose of a matrix, which is also of general interest. While, here, we are mostly interested in using the transpose of a matrix for a matrix representing a linear map, the concept makes sense in the more general situation of Def. 7.1: Definition 7.19. Let S be a nonempty set, A := (aji )(j,i)∈{1,...,m}×{1,...,n} ∈ M(m, n, S), and m, n ∈ N. Then we define the transpose of A, denoted At , by At := (atji )(j,i)∈{1,...,n}×{1,...,m} ∈ M(n, m, S), where ∀ (j,i)∈{1,...,n}×{1,...,m} atji := aij . (7.22) Thus, if A is an m×n matrix, then its transpose is an n×m matrix, where one obtains At from A by switching rows and columns: A is the map f : {1, . . . , m} × {1, . . . , n} −→ S, f (j, i) = aji , and its transpose At is the map f t : {1, . . . , n} × {1, . . . , m} −→ S, f t (j, i) = atji = f (i, j) = aij . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 144 7 MATRICES Example 7.20. As examples of forming transposes, consider 1 0 t t 2 −1 1 2 3 4 1 3 1 2 = , = 3 0 . 0 −1 0 −2 2 4 3 4 4 −2 Remark 7.21. Let S be a nonempty set, m, n ∈ N. It is immediate from Def. 7.19 that the map A 7→ At is bijective from M(m, n, S) onto M(n, m, S), where ∀ A∈M(m,n,S) (At )t = A. (7.23) Theorem 7.22. Let m, n, l ∈ N. (a) Let S be a commutative ring. If A ∈ M(m, n, S) and B ∈ M(n, l, S), then (AB)t = B t At . (7.24) If A ∈ GLn (S), then At ∈ GLn (S) and (At )−1 = (A−1 )t . (7.25) (b) Let F be a field. Then the map I : M(m, n, F ) −→ M(n, m, F ), A 7→ At , (7.26) constitutes a linear isomorphism. Proof. Exercise. Theorem 7.23. Let V, W be finite-dimensional vector spaces over the field F , dim V = n, dim W = m, with ordered bases BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ), respectively. If A ∈ L(V, W ) and (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with respect to BV and BW , then rk(A) = rk(aji ) = rrk(aji ). Proof. As the first equality was already shown in Th. 7.17(a), it merely remains to show rk(aji ) = rrk(aji ). Let r := rk(A). According to Th. 7.18 and Th. 7.14, there exist regular matrices (xji ) ∈ GLm (F ), (yji ) ∈ GLn (F ) such that (7.24) Idr 0 Idr 0 t t t (xji )(aji )(yji ) = . ⇒ (yji ) (aji ) (xji ) = 0 0 0 0 AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 7 MATRICES Since (xji )t and (yji )t are regular by Th. 7.22(a), we may use Th. 
7.17(b) to obtain Idr 0 t t t t = r, rrk(aji ) = rk(aji ) = rk (yji ) (aji ) (xji ) = rk 0 0 as desired. 145 Remark 7.24. Let V, W be finite-dimensional vector spaces over the field F , dim V = n, dim W = m, where m, n ∈ N. Moreover, let BV = (v1 , . . . , vn ) and BW = (w1 , . . . , wm ) be ordered bases of V and W , respectively, let A ∈ L(V, W ), and let (aji ) ∈ M(m, n, F ) be the matrix corresponding to A with respect to BV and BW . Using the transpose matrix (aji )t ∈ M(n, m, F ), one can now also define a transpose At of the map A. However, a subtlety arises related to basis transitions and, given λ1 , . . . , λm ∈ F , representing coordinates of w ∈ W with respect to BW , one can not simply define At by applying (aji )t to the column vector containing λ1 , . . . , λm , as the result would, in general, depend on BV and BW . Instead, one obtains At by left-multiplying (aji ) with the row vector containing λ1 , . . . , λm : At w := λ1 . . . λm (aji ). Even though the resulting coordinate values are the same that one obtains from computing (aji )t (λ1 . . . λm )t , the difference appears when considering basis transitions. In consequence, algebraically, one obtains the transpose At to be a map from the dual W ′ of W into the dual V ′ of V . Here, we will not go into further details, as we do not want to explain the concept of dual spaces at this time. 7.4 Special Types of Matrices In the present section, we investigate types of matrices that posses particularly simple structures, often, but not always, due to many entries being 0. For many of the sets that we introduce in the following, there does not seem to be a standard notation, at least not within Linear Algebra. However, as these sets do appear very frequently, it seems inconvenient and cumbersome not to introduce suitable notation. Definition 7.25. Let n ∈ N and let S be a set containing elements denoted 0 and 1 (usually, S will be a ring with unity or even a field, but we want to emphasize that the present definition makes sense without any structure on the set S). (a) We call a matrix A = (aji ) ∈ M(n, S) diagonal if, and only if, aji = 0 for j 6= i, i.e. if, and only if, all nondiagonal entries of A are 0. We define Dn (S) := A ∈ M(n, S) : A is diagonal . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 146 7 MATRICES If (s1 , . . . , sn ) ∈ S n , then we define diag(s1 , . . . , sn ) := D = (dji ) ∈ Dn (S), dji := ( sj 0 for j = i, for j 6= i. (b) A matrix A = (aji ) ∈ M(n, S) is called upper triangular or right triangular (resp. lower triangular or left triangular) if, and only if, aji = 0 for each i, j ∈ {1, . . . , n} such that j > i (resp. j < i), i.e. if, and only if, all nonzero entries of A are above/right (resp. below/left) of the diagonal. A triangular matrix A is called strict if, and only if, aii = 0 for each i ∈ {1, . . . , n}; it is called unipotent if, and only if, aii = 1 for each i ∈ {1, . . . , n}. We define5 BUn (S) := A ∈ M(n, S) : A is upper triangular , BLn (S) := A ∈ M(n, S) : A is lower triangular , BU0n (S) := A ∈ M(n, S) : A is strict upper triangular , BL0n (S) := A ∈ M(n, S) : A is strict lower triangular , BU1n (S) := A ∈ BUn (S) : A is unipotent , BL1n (S) := A ∈ BLn (S) : A is unipotent . (c) A matrix A = (aji ) ∈ M(n, S) is called symmetric if, and only if, At = A, i.e. if aji = aij for each i, j ∈ {1, . . . , n}. We define Symn (S) := A ∈ M(n, S) : A is symmetric . Proposition 7.26. Let S be a ring and m, n ∈ N. Let s1 , . . . 
, smax{m,n} ∈ S. (a) Let D1 := diag(s1 , . . . , sm ) ∈ Dm (S), D2 := diag(s1 , . . . , sn ) ∈ Dn (S), A = (aji ) ∈ M(m, n, S). Then left multiplication of A by D1 multiplies the j-th row of A by sj ; right multiplication of A by D2 multiplies the i-th column of A by si : s1 r1A r1A (7.27a) D1 A = diag(s1 , . . . , sm ) ... = ... , A A s m rm rm A A . . . cA AD2 = cA (7.27b) 1 n diag(s1 , . . . , sn ) = c1 s1 . . . cn sn . (b) Dn (S) is a subring of M(n, S) (subring with unity if S is a ring with unity). If S is a field, then Dn (S) is a vector subspace of M(n, S). 5 The ‘B’ in the notation comes from the theory of so-called Lie algebras, where one finds, at least for suitable S, that BUn (S) and BLn (S) form so-called Borel subalgebras of the algebra M(n, S). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 147 7 MATRICES (c) If S is a ring with unity, then D := diag(s1 , . . . , sn ) ∈ GLn (S) if, and only if, each si is invertible, i.e. if, and only if, si ∈ S ∗ for each i ∈ {1, . . . , n}. In that case, one −1 ∗ has D−1 = diag(s−1 1 , . . . , sn ) ∈ GLn (S). Moreover, Dn (S ) = Dn (S) ∩ GLn (S) is a subgroup of GLn (S). Proof. (a): If (dji ) := D1 , (eji ) := D2 , (xji ) := D1 A and (yji ) := AD2 , then, for each (j, i) ∈ {1, . . . , m} × {1, . . . , n}, xji = n X djk aki = sj aji , yji = k=1 m X ajk eki = aji si . k=1 (b): As a consequence of (a), we have, for a1 , . . . , an , b1 , . . . , bn ∈ S, diag(a1 , . . . , an ) diag(b1 , . . . , bn ) = diag(a1 b1 , . . . , an bn ) ∈ Dn (S). (7.28) As A, B ∈ Dn (S), clearly, implies A + B ∈ Dn (S) and −A ∈ Dn (S), we have shown Dn (S) to be a subring of M(n, S), where, if 1 ∈ S, then Idn ∈ Dn (S). If λ ∈ S, A ∈ Dn (S), then, clearly, λA ∈ Dn (S), showing that, for S being a field, Dn (S) is a vector subspace of M(n, S). (c) follows from (7.28) and the fact that (S ∗ , ·) is a group according to Def. and Rem. 4.41. Proposition 7.27. Let S be a ring and n ∈ N. (a) Let A := (aji ) ∈ M(n, S) and B := (bji ) ∈ M(n, S). If A, B ∈ BUn (S) or A, B ∈ BLn (S), and C := (cji ) = AB, then cii = aii bii for each i ∈ {1, . . . , n}. (b) BUn (S), BLn (S), BU0n (S), BL0n (S) are subrings of M(n, S) ( BUn (S), BLn (S) are subrings with unity if S is a ring with unity). If S is a field, then BUn (S), BLn (S), BU0n (S), BL0n (S) are vector subspaces of M(n, S). (c) Let S be a ring with unity. If B := (bji ) ∈ BUn (S) ∩ GLn (S) and A := (aji ) = B −1 , then A ∈ BUn (S) with for i < j, 0 −1 for i = j, aji = bii (7.29a) Pi−1 −1 − recursively for i > j. k=1 ajk bki bii If B := (bji ) ∈ BLn (S) ∩ GLn (S) and A := (aji ) = B −1 , then A ∈ BLn (S) with for j < i, 0 −1 for j = i, aji = bii (7.29b) Pj −1 − recursively for j > i. k=i+1 ajk bki bii AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 148 7 MATRICES (d) If S is a ring with unity and A := (aji ) ∈ BUn (S) ∪ BLn (S), then A ∈ GLn (S) if, and only if, each aii is invertible. Moreover, BU1n (S), BL1n (S) as well as BUn (S) ∩ GLn (S), BLn (S) ∩ GLn (S) are subgroups of GLn (S). Proof. (a): For A, B ∈ BUn (S), we compute =0 n i−1 =0 X X z}|{ z}|{ aik bki + aii bii + aik bki = aii bii , cii = k=i+1 k=1 while, for A, B ∈ BLn (S), we compute cii = i−1 X k=1 =0 =0 n X z}|{ z}|{ aik bki +aii bii + aik bki = aii bii . k=i+1 (b): Let A, B ∈ BUn (S) and C := (cji ) = AB. 
We obtain C ∈ BUn (S), since, if j > i, then i n X X cji = ajk bki + ajk bki = 0 k=1 k=i+1 (the first sum equals 0 since k ≤ i < j and A is upper triangular, the second sum equals 0 since k > i and B is upper triangular). Now let A, B ∈ BLn (S) and C := (cji ) := AB. We obtain C ∈ BLn (S), since, if j < i, then cji = i−1 X k=1 ajk bki + n X ajk bki = 0 k=i (the first sum equals 0 since k < i and B is lower triangular, the second sum equals 0 since j < i ≤ k and A is lower triangular). Now let M ∈ {BUn (S), BLn (S), BU0n (S), BL0n (S)}. Then the above and (a) show AB ∈ M for A, B ∈ M. As A, B ∈ M, clearly, implies A + B ∈ M and −A ∈ M, we have shown M to be a subring of M(n, S), where, if 1 ∈ S, then Idn ∈ BUn (S) ∩ BLn (S). If λ ∈ S, A ∈ M, then, clearly, λA ∈ M, showing that, for S being a field, M is a vector subspace of M(n, S). (c): If B ∈ BUn (S) ∩ GLn (S), then each bii is invertible by (a) and (b). Now let A := (aji ) be defined by (7.29a), C := (cji ) := AB. We already know C ∈ BUn (S) according to (b). Moreover, cii = 1 follows from (a) and, for j < i, cji = i X k=j ajk bki = aji bii + i−1 X ajk bki = 0 k=j AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 149 7 MATRICES by the recursive part of (7.29a), completing the proof of C = Idn . If B ∈ BLn (S) ∩ GLn (S), then each bii is invertible by (a) and (b). Now let A := (aji ) be defined by (7.29b), C := (cji ) := AB. We already know C ∈ BLn (S) according to (b). Moreover, cii = 1 follows from (a) and, for j > i, cji = j X ajk bki = aji bii + j X ajk bki = 0 k=i+1 k=i by the recursive part of (7.29b), completing the proof of C = Idn . (d) is now merely a corollary of suitable parts of (a),(b),(c). Example 7.28. Let 1 0 0 B := 4 2 0 . 6 5 3 We use (7.29b) to compute A := (aji ) = B −1 : 1 a22 = , 2 1 a33 = , 3 5 1 5 a32 = −a33 b32 /b22 = − · = − , 3 2 6 4 a21 = −a22 b21 /b11 = − = −2, 2 a11 = 1, 5·4 6 a31 = −(a32 b21 + a33 b31 )/b11 = − − + 6 3 4 = . 3 Thus, 1 AB = −2 4 3 0 1 2 − 65 0 1 0 0 1 0 0 0 . 4 2 0 = −2 + 42 1 0 = Id3 . 4 1 − 5·4 + 36 − 5·2 + 35 1 6 5 3 3 3 6 6 Proposition 7.29. Let F be a field and n ∈ N. (a) Symn (F ) is a vector subspace of M(n, F ). (b) If A ∈ Symn (F ) ∩ GLn (F ), then A−1 ∈ Symn (F ) ∩ GLn (F ). Proof. (a): If A, B ∈ Symn (F ) and λ ∈ F , then (A + B)t = At + B t = A + B and (λA)t = λAt = λA, showing A + B ∈ Symn (F ) and λA ∈ Symn (F ), i.e. Symn (F ) is a vector subspace of M(n, F ). (b): If A ∈ Symn (F ) ∩ GLn (F ), then (A−1 )t = (At )−1 = A−1 , proving A−1 ∈ Symn (F ) ∩ GLn (F ). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 7 MATRICES 150 Definition 7.30. Let S be a group, n ∈ N, A ∈ M(n, S). (a) A is called skew-symmetric if, and only if, At = −A. Define Skewn (S) := A ∈ M(n, S) : A is skew-symmetric . (b) Assume F := S is a field with char F 6= 2 (i.e. 2 := 1 + 1 6= 0 in F ). We then call Asym := 21 (A + At ) the symmetric part of A and Askew := 12 (A − At ) the skewsymmetric part of A. Proposition 7.31. Let F be a field, n ∈ N. (a) Skewn (F ) is a vector subspace of M(n, F ). (b) If char F 6= 2, then Skewn (F ) ∩ Symn (F ) = {0}. (c) If A ∈ Skewn (F ) ∩ GLn (F ), then A−1 ∈ Skewn (F ) ∩ GLn (F )6 . Proof. (a): If A, B ∈ Skewn (F ) and λ ∈ F , then (A + B)t = At + B t = −A − B = −(A + B) and (λA)t = λAt = λ(−A) = −(λA), showing A + B ∈ Skewn (F ) and λA ∈ Skewn (F ), i.e. Skewn (F ) is a vector subspace of M(n, F ). (b): Let A ∈ Skewn (F ) ∩ Symn (F ). Then −A = At = A, i.e. 
2A = 0, implying 2 = 0 or A = 0. Thus, char F = 2 or A = 0. (c): If A ∈ Skewn (F ) ∩ GLn (F ), then (A−1 )t = (At )−1 = −A−1 , proving A−1 ∈ Symn (F ) ∩ GLn (F ). Proposition 7.32. Let F be a field with char F 6= 2, n ∈ N, A ∈ M(n, F ). (a) The symmetric part Asym = 21 (A + At ) of A is symmetric, the skew-symmetric part Askew = 21 (A − At ) of A is skew-symmetric. (b) A can be uniquely decomposed into its symmetric and skew-symmetric parts, i.e. A = Asym + Askew and, if A = S + B with S ∈ Symn (F ) and B ∈ Skewn (F ), then S = Asym and B = Askew . (c) A is symmetric if, and only if, A = Asym ; A is skew-symmetric if, and only if, A = Askew . 6 Using the determinant det A and results from Linear Algebra II, we can show that A−1 ∈ Skewn (F ) ∩ GLn (F ) can only exist for char F = 2 or n even: If A ∈ Skewn (F ), then det A = det(At ) = det(−A) = (−1)n det A, where we have used [Phi19, Cor. 4.24(b),(d)]. Thus, for n odd, 2 det A = 0, implying 2 = 0 (i.e. char F = 2) or det A = 0 (i.e. A ∈ / GLn (F ) by [Phi19, Cor. 4.24(h)]). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 151 7 MATRICES Proof. (a): Asym is symmetric due to 1 1 Atsym = (A + At )t = (At + A) = Asym . 2 2 Askew is skew-symmetric due to 1 1 Atskew = (A − At )t = (At − A) = −Askew . 2 2 (b): While Asym + Askew = 12 A + 21 At + 21 A − 12 At = A is immediate; if A = S + B with S ∈ Symn (F ) and B ∈ Skewn (F ), then Asym + Askew = A = S + B ⇒ Asym − S = B − Askew , where Asym − S ∈ Symn (F ) by Prop. 7.29(a) and B − Askew ∈ Skewn (F ) by Prop. 7.31(a), showing Asym = S and Askew = B by Prop. 7.31(b). (c): If A = Asym , then A is symmetric by (a); if A = Askew , then A is skew-symmetric by (b). If A is symmetric, then Askew = A − Asym ∈ Skewn (F ) ∩ Symn (F ) = {0}, showing A = Asym . If A is skew-symmetric, then Asym = A−Askew ∈ Skewn (F )∩Symn (F ) = {0}, showing A = Askew . 7.5 Blockwise Matrix Multiplication It is sometimes useful that one can carry out matrix multiplication in a blockwise fashion, i.e. by partitioning the entries of a matrix into submatrices and then performing a matrix multiplication for new matrices that have the submatrices as their entries, see Th. 7.34 below. To formulate and proof Th. 7.34, we need to define matrices and their multiplication, where the entries are allowed to be indexed by more general index sets: Definition 7.33. Let S be a ring. Let J, I be finite index sets, #J, #I ∈ N. We then call each family (aji )(j,i)∈J×I in S a J × I matrix over S, denoting the set of all such matrices by M(J, I, S). If K is another index set with #K ∈ N, then, for each J × K matrix (aji ) and each K × I matrix (bji ) over S, define the product ! X ajk bki . (7.30) (aji )(bji ) := k∈K (j,i)∈J×I Theorem 7.34. Let S be a ring. Let J, I, K be finite index sets, #J, #I, #K ∈ N. Now assume, we have disjoint partitions of J, I, K into nonempty sets: [ [ [ ˙ ˙ ˙ J= Jα , I = Iβ , K = Kγ , (7.31) α∈{1,...,M } β∈{1,...,N } γ∈{1,...,L} AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 152 7 MATRICES where M, N, L ∈ N, ∀ α∈{1,...,M } mα := #Jα ∈ N, ∀ β∈{1,...,N } nβ := #Iβ ∈ N, ∀ γ∈{1,...,M } lγ := #Kγ ∈ N. Let A := (aji ) be a J × K matrix over S and let B := (bji ) be a K × I matrix over S. Define the following submatrices of A and B, respectively: ∀ Aαγ := (aji )(j,i)∈Jα ×Kγ , (7.32a) ∀ Bγβ := (bji )(j,i)∈Kγ ×Iβ . (7.32b) (α,γ)∈{1,...,M }×{1,...,L} (γ,β)∈{1,...,L}×{1,...,N } Then, for each (α, γ, β) ∈ {1, . . . 
, M} × {1, …, L} × {1, …, N}, A_{αγ} is a J_α × K_γ matrix and B_{γβ} is a K_γ × I_β matrix. Thus, we can define
\[
(C_{\alpha\beta}) := (A_{\alpha\gamma})(B_{\gamma\beta}) := \Big( \sum_{\gamma=1}^{L} A_{\alpha\gamma} B_{\gamma\beta} \Big)_{(\alpha,\beta) \in \{1,\dots,M\} \times \{1,\dots,N\}}. \qquad (7.32c)
\]
We claim that
\[
C := (c_{ji}) := AB = (C_{\alpha\beta}) \qquad (7.33)
\]
in the sense that
\[
\forall_{(\alpha,\beta) \in \{1,\dots,M\} \times \{1,\dots,N\}} \quad \forall_{(j,i) \in J_\alpha \times I_\beta} \quad c_{ji} = (C_{\alpha\beta})_{ji}. \qquad (7.34)
\]

Proof. Let (α, β) ∈ {1, …, M} × {1, …, N} and (j, i) ∈ J_α × I_β. Then, according to (7.32) and since K is the disjoint union of the K_γ, we have
\[
(C_{\alpha\beta})_{ji} = \sum_{\gamma=1}^{L} \sum_{k \in K_\gamma} a_{jk} b_{ki} = \sum_{k \in K} a_{jk} b_{ki} = c_{ji},
\]
proving (7.34) and the theorem.

Example 7.35. Let S be a ring with unity and A, B, C, D ∈ GL_2(S). Then we can use Th. 7.34 to perform the following multiplication of a 4 × 6 matrix by a 6 × 4 matrix:
\[
\begin{pmatrix} A & B & 0 \\ 0 & C & D \end{pmatrix}
\begin{pmatrix} A^{-1} & 0 \\ B^{-1} & C^{-1} \\ 0 & D^{-1} \end{pmatrix}
= \begin{pmatrix} 2\,\mathrm{Id}_2 & BC^{-1} \\ CB^{-1} & 2\,\mathrm{Id}_2 \end{pmatrix}.
\]

8 Linear Systems

8.1 General Setting

Let F be a field. A linear equation over F has the form
\[
\sum_{k=1}^{n} a_k x_k = b,
\]
where n ∈ N; b ∈ F and the a_1, …, a_n ∈ F are given; and one is interested in determining the “unknowns” x_1, …, x_n ∈ F such that the equation holds. The equation is called linear, as b is desired to be a linear combination of the a_1, …, a_n. A linear system is now a finite set of linear equations that the x_1, …, x_n ∈ F need to satisfy simultaneously:
\[
\forall_{j \in \{1,\dots,m\}} \quad \sum_{k=1}^{n} a_{jk} x_k = b_j, \qquad (8.1)
\]
where m, n ∈ N; b_1, …, b_m ∈ F and the a_{ji} ∈ F, j ∈ {1, …, m}, i ∈ {1, …, n}, are given. One observes that (8.1) can be concisely written as Ax = b, using matrix multiplication, giving rise to the following definition:

Definition 8.1. Let F be a field. Given a matrix A ∈ M(m, n, F), m, n ∈ N, and b ∈ M(m, 1, F) ≅ F^m, the equation
\[
Ax = b \qquad (8.2)
\]
is called a linear system for the unknown x ∈ M(n, 1, F) ≅ F^n. The matrix one obtains by adding b as the (n + 1)th column to A is called the augmented matrix of the linear system. It is denoted by (A|b). The linear system (8.2) is called homogeneous for b = 0 and inhomogeneous for b ≠ 0. By
\[
L(A|b) := \{x \in F^n : Ax = b\}, \qquad (8.3)
\]
we denote the set of solutions to (8.2).

Example 8.2. While linear systems arise from many applications, we merely provide a few examples, illustrating the importance of linear systems for problems from inside the subject of linear algebra.

(a) Let v_1 := (1, 2, 0, 3), v_2 := (0, 3, 2, 1), v_3 := (1, 1, 1, 1). The question if the set M := {v_1, v_2, v_3} is a linearly dependent subset of R^4 is equivalent to asking if there exist x_1, x_2, x_3 ∈ R, not all 0, such that x_1 v_1 + x_2 v_2 + x_3 v_3 = 0, which is equivalent to the question if the linear system
\[
\begin{aligned}
x_1 \;+\; \phantom{3x_2 \;+\;} x_3 &= 0 \\
2x_1 + 3x_2 + x_3 &= 0 \\
2x_2 + x_3 &= 0 \\
3x_1 + x_2 + x_3 &= 0
\end{aligned}
\]
has a solution x = (x_1, x_2, x_3) ≠ (0, 0, 0). The linear system can be written as
\[
\begin{pmatrix} 1 & 0 & 1 \\ 2 & 3 & 1 \\ 0 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The augmented matrix of the linear system is
\[
(A|b) := \left( \begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 2 & 3 & 1 & 0 \\ 0 & 2 & 1 & 0 \\ 3 & 1 & 1 & 0 \end{array} \right).
\]

(b) Let b_1 := (1, 2, 2, 1). The question if b_1 ∈ ⟨{v_1, v_2, v_3}⟩ with v_1, v_2, v_3 as in (a) is equivalent to the question whether the linear system
\[
Ax = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 1 \end{pmatrix}, \qquad A := \begin{pmatrix} 1 & 0 & 1 \\ 2 & 3 & 1 \\ 0 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix},
\]
has a solution.

(c) Let n ∈ N. The problem of finding an inverse to an n × n matrix A ∈ M(n, n, F) is equivalent to solving the n linear systems
\[
Av_1 = e_1, \quad \dots, \quad Av_n = e_n, \qquad (8.4)
\]
where e_1, …, e_n are the standard (column) basis vectors. Then the v_k are obviously the column vectors of A^{-1}.
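The questions of Ex. 8.2 can be checked numerically in R (the language of Appendix F); using qr() for ranks and qr.solve()/solve() for solving is one standard way to do this, not the algorithm developed in this chapter. A minimal sketch for (a) and (b):

    # Ex. 8.2(a),(b): A has v1, v2, v3 as its columns.
    A <- cbind(c(1, 2, 0, 3),    # v1
               c(0, 3, 2, 1),    # v2
               c(1, 1, 1, 1))    # v3
    qr(A)$rank == ncol(A)        # TRUE: A x = 0 has only x = 0, i.e. the
                                 # set M is linearly independent
    b1 <- c(1, 2, 2, 1)
    qr(cbind(A, b1))$rank        # also 3 = rk(A): A x = b1 is solvable
    qr.solve(A, b1)              # -0.2  0.4  1.2, i.e.
                                 # b1 = -(1/5) v1 + (2/5) v2 + (6/5) v3
    # For (c): if M is square and invertible, solve(M) returns M^{-1},
    # solving the n systems (8.4) at once.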
8.2 Abstract Solution Theory

Before describing, in Sec. 8.3 below, how one can systematically determine the set of solutions to a linear system, we apply some of our general results from the theory of vector spaces, linear maps, and matrices to obtain criteria for the existence and uniqueness of solutions to linear systems.

Remark 8.3. Let F be a field and m, n ∈ N. If A ∈ M(m, n, F), then A can be interpreted as the linear map
\[
L_A : F^n \longrightarrow F^m, \quad L_A(x) := Ax, \qquad (8.5)
\]
where x ∈ F^n is identified with a column vector in M(n, 1, F) and the column vector Ax ∈ M(m, 1, F) is identified with an element of F^m. In view of this fact, given an n-dimensional vector space V over F, an m-dimensional vector space W over F, a linear map L ∈ L(V, W), and b ∈ W, we call
\[
Lx = b \qquad (8.6)
\]
a linear system or merely a linear equation for the unknown vector x ∈ V and we write L(L|b) for its set of solutions. If V = F^n, W = F^m, and the matrix A ∈ M(m, n, F) corresponds to L with respect to the respective standard bases of F^n and F^m (cf. Ex. 5.16(c)), then (8.6) is identical to (8.2) and
\[
L(A|b) = L(L|b) = L^{-1}\{b\}, \qquad (8.7)
\]
i.e. the set of solutions is precisely the preimage of the set {b} under L. In consequence, we then have
\[
L(A|0) = L(L|0) = \ker L, \qquad (8.8)
\]
\[
\dim \ker L \overset{\text{Th. 6.8(a)}}{=} \dim F^n - \dim \operatorname{Im} L = n - \operatorname{rk}(A), \qquad (8.9)
\]
and, more generally,
\[
b \notin \operatorname{Im} L \quad \Rightarrow \quad L(A|b) = L(L|b) = \emptyset, \qquad (8.10a)
\]
\[
b \in \operatorname{Im} L \quad \overset{\text{Th. 4.20(f)}}{\Rightarrow} \quad \forall_{x_0 \in L(L|b)} \quad L(A|b) = L(L|b) = x_0 + \ker L = x_0 + L(L|0). \qquad (8.10b)
\]
One can also express (8.10b) by saying one obtains all solutions to the inhomogeneous system Lx = b by adding a particular solution of the inhomogeneous system to the set of all solutions to the homogeneous system Lx = 0.

Theorem 8.4 (Existence of Solutions). Let F be a field and m, n ∈ N, let L ∈ L(F^n, F^m), and let A ∈ M(m, n, F) correspond to L with respect to the respective standard bases of F^n and F^m.

(a) Given b ∈ F^m, the following statements are equivalent:

(i) L(A|b) = L(L|b) ≠ ∅.

(ii) b ∈ Im L.

(iii) b ∈ ⟨{c_1^A, …, c_n^A}⟩, where the c_i^A denote the column vectors of A.

(iv) rk(A) = rk(A|b).

(b) The following statements are equivalent:

(i) L(A|b) = L(L|b) ≠ ∅ holds for each b ∈ F^m.

(ii) L is surjective.

(iii) rk(A) = m.

Proof. (a): (i) implies (ii) by (8.7). Since A corresponds to L with respect to the respective standard bases of F^n and F^m, (ii) implies (iii) by Lem. 7.5(a). (iii) implies (iv), since
\[
\operatorname{rk}(A|b) = \dim \langle \{c_1^A, \dots, c_n^A, b\} \rangle \overset{\text{(iii)}}{=} \dim \langle \{c_1^A, \dots, c_n^A\} \rangle = \operatorname{rk}(A).
\]
(iv) implies (i): If rk(A) = rk(A|b), then b is linearly dependent on c_1^A, …, c_n^A, which, by Lem. 7.5(a), yields the existence of x ∈ F^n with Ax = Σ_{k=1}^{n} x_k c_k^A = b.

(b): The equivalence between (i) and (ii) is merely the definition of L being surjective. Moreover, the equivalences
\[
\operatorname{rk}(A) = m \quad \Leftrightarrow \quad \dim \operatorname{Im} L = m \quad \Leftrightarrow \quad \operatorname{Im} L = F^m \quad \Leftrightarrow \quad L \text{ surjective}
\]
prove the equivalence between (ii) and (iii).
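The solvability criterion of Th. 8.4(a)(iv) and the structure of the solution set described in (8.10b) can be illustrated in R (hypothetical example data; the kernel vector is the one computed after Th. 6.8):

    A  <- matrix(c(1, 2, 0, 3,
                   0, 3, 2, 1,
                   1, 1, 1, 1), nrow = 3, byrow = TRUE)
    x0 <- c(1, 1, 1, 0)                  # a chosen particular solution
    b  <- A %*% x0                       # so L(A|b) is nonempty by construction
    qr(cbind(A, b))$rank == qr(A)$rank   # TRUE, matching Th. 8.4(a)(iv)
    k  <- c(-1, -1, 1, 1)                # k spans ker A = L(A|0)
    max(abs(A %*% (x0 + 5 * k) - b))     # 0: x0 + 5k also solves A x = b,
                                         # cf. (8.10b)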
Theorem 8.5 (Uniqueness of Solutions). Consider the situation of Th. 8.4. Given b ∈ Im L, the following statements are equivalent:

(i) #L(A|b) = #L(L|b) = 1, i.e. Lx = b has a unique solution.

(ii) L(A|0) = L(L|0) = {0}, i.e. the homogeneous system Lx = 0 has only the so-called trivial solution x = 0.

(iii) rk(A) = n.

Proof. Since b ∈ Im L, there exists x_0 ∈ F^n with L(x_0) = b, i.e. x_0 ∈ L(L|b). "(i)⇔(ii)": Since L(L|b) = x_0 + ker L by (8.10b), (i) is equivalent to L(A|0) = ker L = {0}. "(ii)⇔(iii)": According to Th. 6.8(a), we know
$$ \operatorname{rk}(A) = \dim\operatorname{Im} L = \dim F^n - \dim\ker L = n - \dim\ker L. $$
Thus, L(A|0) = ker L = {0} is equivalent to rk(A) = n.

Corollary 8.6. Consider the situation of Th. 8.4 with m = n. Then the following statements are equivalent:

(i) There exists b ∈ F^n such that Lx = b has a unique solution.

(ii) L(A|b) = L(L|b) ≠ ∅ holds for each b ∈ F^n.

(iii) The homogeneous system Lx = 0 has only the so-called trivial solution x = 0.

(iv) rk(A) = n.

(v) A and L are invertible.

Proof. "(iii)⇔(iv)": The equivalence is obtained by using b := 0 in Th. 8.5. "(iii)⇔(v)": Since m = n, L is bijective if, and only if, it is injective by Th. 6.10. "(ii)⇔(v)": Since m = n, L is bijective if, and only if, it is surjective by Th. 6.10. "(i)⇔(iii)" is another consequence of Th. 8.5.

8.3 Finding Solutions

While the results of Sec. 8.2 provide some valuable information regarding the solutions of linear systems, in general, they do not help much to solve concrete systems, especially if the systems consist of many equations with many variables. To remedy this situation, in the present section, we will investigate methods to systematically solve linear systems.

8.3.1 Echelon Form, Back Substitution

We recall from (8.1) and Def. 8.1 that, given a field F and m, n ∈ N, we are considering linear systems
$$ \forall_{j\in\{1,\dots,m\}}\quad \sum_{k=1}^{n} a_{jk}x_k = b_j, $$
which we can write in the form Ax = b with A = (a_{ji}) ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ F^m, where x and b are interpreted as column vectors, and (A|b) ∈ M(m, n + 1, F), where b is added as a last column to A, is called the augmented matrix of the linear system. The goal is to determine the set of solutions L(A|b) = {x ∈ F^n : Ax = b}. We first investigate a situation where determining L(A|b) is particularly easy, namely where (A|b) has so-called echelon form:

Definition 8.7. Let S be a set with 0 ∈ S and A = (a_{ji}) ∈ M(m, n, S), m, n ∈ N. For each row, i.e. for each j ∈ {1, . . . , m}, let ν(j) ∈ {1, . . . , n} be the smallest index k such that a_{jk} ≠ 0, and ν(j) := n + 1 if the jth row consists entirely of zeros. Then A is said to be in (row) echelon form if, and only if, for each j ∈ {2, . . . , m}, one has ν(j) > ν(j − 1) or ν(j) = n + 1. Thus, A is in echelon form if, and only if, it looks as follows:
$$ A = \begin{pmatrix} 0\,\cdots\,0 & \boxed{*} & * & * & \cdots & \cdots & * \\ 0\,\cdots\,0 & 0\,\cdots\,0 & \boxed{*} & * & \cdots & \cdots & * \\ 0\,\cdots\,0 & 0\,\cdots\,0 & 0\,\cdots\,0 & \boxed{*} & * & \cdots & * \\ & & & & \ddots & & \vdots \end{pmatrix} \tag{8.11} $$
The first nonzero element in each row is called a pivot element (in (8.11), the positions of the pivot elements are marked by squares). The columns c_k^A containing a pivot element are called pivot columns, and k ∈ {1, . . . , n} is then called a pivot index; an index k ∈ {1, . . . , n} that is not a pivot index is called a free index. Let I_p^A, I_f^A ⊆ {1, . . . , n} denote the sets of pivot indices and free indices, respectively. If A represents a linear system, then the variables corresponding to pivot columns are called pivot variables. All remaining variables are called free variables.
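The map ν and the condition of Def. 8.7 translate directly into a few lines of R; the following is a minimal sketch with a hypothetical helper name (the exact comparison with 0 is unproblematic for the small integer examples used here):

# Echelon test per Def. 8.7: nu[j] = column of the first nonzero entry of
# row j (n + 1 for a zero row); require nu strictly increasing or nu[j] = n + 1.
isEchelon <- function(A) {
  n <- ncol(A)
  nu <- apply(A, 1, function(r) {
    p <- which(r != 0)[1]
    if (is.na(p)) n + 1 else p
  })
  all(diff(nu) > 0 | nu[-1] == n + 1)
}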
Example 8.8. The following matrix over R is in echelon form:
$$ A := \begin{pmatrix} 0 & 0 & 3 & 3 & 0 & -1 & 0 & 3 \\ 0 & 0 & 0 & 4 & 0 & 2 & -3 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}. $$
It remains in echelon form if one adds zero rows at the bottom. For the linear system Ax = b, the variables x_3, x_4, x_7 are pivot variables, whereas x_1, x_2, x_5, x_6, x_8 are free variables.

Theorem 8.9. Let F be a field and m, n ∈ N. Let A ∈ M(m, n, F), b ∈ M(m, 1, F) ≅ F^m, and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be in echelon form.

(a) Then the following statements are equivalent:

(i) Ax = b has at least one solution, i.e. L(A|b) ≠ ∅.

(ii) rk(A) = rk(A|b).

(iii) The final column b in the augmented matrix (A|b) contains no pivot elements (i.e. if there is a zero row in A, then the corresponding entry of b also vanishes).

(b) Let L_A : F^n → F^m be the linear map associated with A according to Rem. 8.3. Noting A to be in echelon form as well and using the notation from Def. 8.7, one has
$$ \dim\ker L_A = \dim L(A|0) = \#I_f^A, \qquad \dim\operatorname{Im} L_A = \operatorname{rk} A = \#I_p^A, $$
i.e. the dimension of the kernel of A is given by the number of free variables, whereas the dimension of the image of A is given by the number of pivot variables.

Proof. (a): The equivalence between (i) and (ii) was already shown in Th. 8.4. "(ii)⇔(iii)": As both A and (A|b) are in echelon form, their respective ranks are, clearly, given by their respective numbers of nonzero rows. However, A and (A|b) have the same number of nonzero rows if, and only if, b contains no pivot elements.

(b): As A is in echelon form, its number of nonzero rows (i.e. its rank) is precisely the number of pivot indices, proving dim Im L_A = rk A = #I_p^A. Since #I_p^A + #I_f^A = n = dim ker L_A + dim Im L_A, the claimed dim ker L_A = #I_f^A now also follows.
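For the matrix of Ex. 8.8, Th. 8.9(b) is easily cross-checked in R (a sketch; qr()$rank estimates the rank numerically):

# A from Ex. 8.8: rank = #pivot indices = 3, dim ker = #free indices = 5.
A <- rbind(c(0, 0, 3, 3, 0, -1,  0, 3),
           c(0, 0, 0, 4, 0,  2, -3, 2),
           c(0, 0, 0, 0, 0,  0,  1, 1))
qr(A)$rank            # 3 = #{3, 4, 7}, the pivot indices
ncol(A) - qr(A)$rank  # 5 = dim ker L_A, the number of free variables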
If the augmented matrix (A|b) of the linear system Ax = b is in echelon form and such that its final column b contains no pivot element, then the linear system can be solved using so-called back substitution: One obtains a parameterized representation of L(A|b) as follows: Starting at the bottom with the first nonzero row, one solves each row for the corresponding pivot variable, in each step substituting the expressions for pivot variables that were obtained in previous steps. The free variables are treated as parameters. A particular element of L(A|b) can be obtained by setting all free variables to 0. We now provide a precise formulation of this algorithm:

Algorithm 8.10 (Back Substitution). Let F be a field and m, n ∈ N. Let 0 ≠ A = (a_{ji}) ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ F^m and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be in echelon form and assume its final column b to contain no pivot element. As before, let I_p^A, I_f^A ⊆ {1, . . . , n} denote the sets of pivot indices and free indices, respectively. Also note I_p^A ≠ ∅, whereas I_f^A might be empty. Let i_1 < · · · < i_N be an enumeration of the elements of I_p^A, where N ∈ {1, . . . , n}. As A has echelon form, for each k ∈ {1, . . . , N}, a_{k,i_k} is the pivot element occurring in pivot column c_{i_k}^A (i.e. the pivot element is in the k-th row, and this also implies N ≤ m).

The algorithm of back substitution⁷ defines a family (α_{kl})_{(k,l)∈J×I} in F, J := I_p^A, I := {0} ∪ I_f^A, such that
$$ \forall_{k\in\{1,\dots,N\},\ 0<l<i_k}\quad \alpha_{i_k l} = 0 \tag{8.12a} $$
and
$$ \forall_{k\in\{1,\dots,N\}}\quad x_{i_k} = \alpha_{i_k 0} + \sum_{l\in I:\ l>i_k} \alpha_{i_k l}\,x_l, \qquad \alpha_{i_N 0} = a_{N i_N}^{-1} b_N, \tag{8.12b} $$
where, for l > i_k, the α_{i_k l} are defined recursively over k = N, . . . , 1 as follows: Assuming the α_{i_j l} have already been defined for each j > k such that (8.12b) holds, solve the k-th equation of Ax = b for x_{i_k}:
$$ \sum_{i=i_k}^{n} a_{ki}x_i = b_k \quad\Rightarrow\quad x_{i_k} = a_{k i_k}^{-1} b_k - a_{k i_k}^{-1} \sum_{i=i_k+1}^{n} a_{ki}x_i. \tag{8.13} $$
All variables x_i in the last equation have indices i > i_k and, inductively, we use (8.12b) for i_j > i_k to replace all pivot variables in this equation for x_{i_k}. The resulting expression for x_{i_k} has the form (8.12b) with α_{i_k l} ∈ F, as desired. From (α_{kl}), we now define column vectors
$$ v_i := \begin{pmatrix} v_{1i} \\ \vdots \\ v_{ni} \end{pmatrix} \ \text{for } i \in I_f^A, \qquad s := \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix}, $$
where
$$ \forall_{j\in\{1,\dots,n\}}\quad s_j := \begin{cases} \alpha_{i_k 0} & \text{for } j = i_k \in I_p^A, \\ 0 & \text{for } j \in I_f^A, \end{cases} \qquad \forall_{(j,i)\in\{1,\dots,n\}\times I_f^A}\quad v_{ji} := \begin{cases} \alpha_{i_k i} & \text{for } j = i_k \in I_p^A, \\ 1 & \text{for } j = i, \\ 0 & \text{for } j \in I_f^A \setminus \{i\}. \end{cases} $$

⁷For F ∈ {R, C}, you can find code for an R implementation using back substitution (and its analogue called forward substitution) to solve linear systems Ax = b in case of triangular A with unique solutions in Sec. F.1.2 of the Appendix.

Theorem 8.11. Consider the situation of Alg. 8.10 and let L_A : F^n → F^m be the linear map associated with A according to Rem. 8.3.

(a) Let x = (x_1, . . . , x_n)^t ∈ F^n be a column vector. Then x ∈ L(A|b) if, and only if, the x_{i_k}, k ∈ {1, . . . , N}, satisfy the recursion over k = N, . . . , 1 given by (8.13) or, equivalently, by (8.12b). In particular, x ∈ ker L_A = L(A|0) if, and only if, the x_{i_k} satisfy the respective recursion with b_1 = · · · = b_N = 0.

(b) The set
$$ B := \{v_i : i \in I_f^A\} \tag{8.14} $$
forms a basis of ker L_A = L(A|0), and
$$ L(A|b) = s + \ker L_A. \tag{8.15} $$

Proof. (a) holds, as the implication in (8.13) is, clearly, an equivalence.

(b): As we already know dim ker L_A = #I_f^A from Th. 8.9(b), to verify B is a basis of ker L_A, it suffices to show B is linearly independent and B ⊆ ker L_A. If I_f^A = ∅, then there is nothing to prove. Otherwise, if (λ_i)_{i∈I_f^A} is a family in F such that
$$ 0 = \sum_{i\in I_f^A} \lambda_i v_i, $$
then
$$ \forall_{i\in I_f^A}\quad 0 = \sum_{j\in I_f^A} \lambda_j v_{ji} = \sum_{j\in I_f^A} \lambda_j \delta_{ji} = \lambda_i, $$
showing the linear independence of B. To show B ⊆ ker L_A, fix i ∈ I_f^A. To prove v_i ∈ ker L_A, according to (a), we need to show
$$ \forall_{k\in\{1,\dots,N\}}\quad v_{i_k i} = \sum_{l\in I_f^A:\ l>i_k} \alpha_{i_k l}\,v_{li}. $$
Indeed, we have
$$ v_{i_k i} = \alpha_{i_k i}, \qquad \sum_{l\in I_f^A:\ l>i_k} \alpha_{i_k l}\,v_{li} = \sum_{l\in I_f^A:\ l>i_k} \alpha_{i_k l}\,\delta_{li}, $$
which holds, since, if I_k := {l ∈ I_f^A : l > i_k} = ∅, then the sum is empty and, thus, 0, with v_{i_k i} = 0 as well, due to i < i_k; and, if I_k ≠ ∅, then, due to the Kronecker δ, the sum evaluates to α_{i_k i} as well. To prove (8.15), according to (8.10b), it suffices to show s ∈ L(A|b) and, by (a), this is equivalent to
$$ \forall_{k\in\{1,\dots,N\}}\quad s_{i_k} = \alpha_{i_k 0} + \sum_{l\in I_f^A:\ l>i_k} \alpha_{i_k l}\,s_l. $$
However, the validity of these equations is immediate from the definition of s, which, in particular, implies the sums to vanish (s_l = 0 for each l ∈ I_f^A).
Example 8.12. Consider the linear system Ax = b, where
$$ A := \begin{pmatrix} 1 & 2 & -1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \in M(4, 6, \mathbb{R}), \qquad b := \begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \end{pmatrix} \in \mathbb{R}^4. $$
Since (A|b) is in echelon form and b does not have any pivot elements, the set of solutions L(A|b) is nonempty, and it can be obtained using back substitution according to Alg. 8.10: Using the same notation as in Alg. 8.10, we have m = 4, n = 6, I_p^A = {1, 3, 5}, N = 3, I_f^A = {2, 4, 6}. In particular, we have I_p^A = {i_1, i_2, i_3} with i_1 = 1, i_2 = 3, i_3 = 5, and the recursion for the x_{i_k}, α_{i_k l} is
$$ \begin{aligned} x_5 &= 3 - x_6, \\ x_3 &= 2 - x_6 + x_5 - x_4 = 5 - 2x_6 - x_4, \\ x_1 &= 1 - x_6 - 3x_4 + x_3 - 2x_2 = 6 - 3x_6 - 4x_4 - 2x_2. \end{aligned} $$
Thus, the α_{i_k l} are given by
$$ \begin{aligned} \alpha_{5,0} &= 3, & \alpha_{5,2} &= 0, & \alpha_{5,4} &= 0, & \alpha_{5,6} &= -1, \\ \alpha_{3,0} &= 5, & \alpha_{3,2} &= 0, & \alpha_{3,4} &= -1, & \alpha_{3,6} &= -2, \\ \alpha_{1,0} &= 6, & \alpha_{1,2} &= -2, & \alpha_{1,4} &= -4, & \alpha_{1,6} &= -3, \end{aligned} $$
and, from Th. 8.11(b), we obtain
$$ L(A|b) = s + \ker L_A = \left\{ \begin{pmatrix} 6 \\ 0 \\ 5 \\ 0 \\ 3 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -4 \\ 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_6 \begin{pmatrix} -3 \\ 0 \\ -2 \\ 0 \\ -1 \\ 1 \end{pmatrix} : x_2, x_4, x_6 \in \mathbb{R} \right\}. $$
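The result of Ex. 8.12 is easy to verify mechanically, e.g. in R (a sketch; since all data are small integers, exact comparison is safe here):

# Check of Ex. 8.12: s solves Ax = b, and the v_i solve Ax = 0.
A <- rbind(c(1, 2, -1, 3, 0, 1),
           c(0, 0,  1, 1, -1, 1),
           c(0, 0,  0, 0,  1, 1),
           c(0, 0,  0, 0,  0, 0))
b <- c(1, 2, 3, 0)
s  <- c(6, 0, 5, 0, 3, 0)
v2 <- c(-2, 1, 0, 0, 0, 0)
v4 <- c(-4, 0, -1, 1, 0, 0)
v6 <- c(-3, 0, -2, 0, -1, 1)
stopifnot(all(A %*% s == b), all(A %*% cbind(v2, v4, v6) == 0))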
8.3.2 Elementary Row Operations, Variable Substitution

As seen in the previous section, if the augmented matrix (A|b) of the linear system Ax = b is in echelon form, then the set of solutions can be obtained via back substitution. Thus, it is desirable to transform the augmented matrix of a linear system into echelon form without changing the set of solutions. This can be accomplished in finitely many steps by the so-called Gaussian elimination algorithm of Alg. 8.17 below (cf. Th. 8.19), using elementary row operations and variable substitutions, where variable substitutions actually turn out to constitute particular elementary row operations.

Definition 8.13. Let F be a field, m, n ∈ N, and A ∈ M(m, n, F). Let r_1, . . . , r_m denote the rows of A. The following three operations, which transform A into another m × n matrix A_er over F, are known as elementary row operations:

(a) Row Switching: Switching two rows r_i and r_j, where i, j ∈ {1, . . . , m}.

(b) Row Multiplication: Replacing a row r_i by some nonzero multiple λr_i of that row, i ∈ {1, . . . , m}, λ ∈ F \ {0}.

(c) Row Addition: Replacing a row r_i by the sum r_i + λr_j of that row and a multiple of another row, (i, j) ∈ {1, . . . , m}², i ≠ j, λ ∈ F.

Definition and Remark 8.14. Let F be a field. Consider a linear system
$$ \forall_{j\in\{1,\dots,m\}}\quad \sum_{k=1}^{n} a_{jk}x_k = b_j, $$
where m, n ∈ N; b_1, . . . , b_m ∈ F; and a_{ji} ∈ F, j ∈ {1, . . . , m}, i ∈ {1, . . . , n}. For a_{ji} ≠ 0, one has
$$ x_i = a_{ji}^{-1} b_j - a_{ji}^{-1} \sum_{k=1,\,k\neq i}^{n} a_{jk}x_k, \tag{8.16} $$
and a variable substitution means replacing the l-th equation \(\sum_{k=1}^{n} a_{lk}x_k = b_l\) of the system by the equation where the variable x_i has been substituted using (8.16) with some j ≠ l, i.e. by the equation
$$ \sum_{k=1}^{i-1} a_{lk}x_k + a_{li}\left(a_{ji}^{-1} b_j - a_{ji}^{-1} \sum_{k=1,\,k\neq i}^{n} a_{jk}x_k\right) + \sum_{k=i+1}^{n} a_{lk}x_k = b_l, $$
which, after combining the coefficients of the same variables, reads
$$ \sum_{k=1}^{i-1} (a_{lk} - a_{li}a_{ji}^{-1}a_{jk})\,x_k + 0\cdot x_i + \sum_{k=i+1}^{n} (a_{lk} - a_{li}a_{ji}^{-1}a_{jk})\,x_k = b_l - a_{li}a_{ji}^{-1}b_j. \tag{8.17} $$
Comparing with Def. 8.13(c), we see that variable substitution is a particular case of row addition: One obtains (8.17) by replacing row r_l of (A|b) by the sum r_l − a_{li}a_{ji}^{-1} r_j.

Theorem 8.15. Let F be a field and m, n ∈ N. Let A = (a_{ji}) ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ F^m and consider the linear system Ax = b. Applying elementary row operations (and, in particular, variable substitutions as in Def. and Rem. 8.14) to the augmented matrix (A|b) of the linear system Ax = b does not change the system's set of solutions, i.e. if (A_er|b_er) is the new matrix obtained from applying an elementary row operation according to Def. 8.13 to (A|b), then L(A|b) = L(A_er|b_er).

Proof. For each of the three elementary row operations, it is immediate from the arithmetic laws holding in the field F that, for each x ∈ F^n, one has x ∈ L(A|b) if, and only if, x ∈ L(A_er|b_er) (where the inverse operation of row switching is merely switching rows r_i and r_j again, the inverse operation of row multiplication by λ ∈ F \ {0} is row multiplication by λ^{-1}, and the inverse operation of row addition r_i ↦ r_i + λr_j, i ≠ j, is r_i ↦ r_i − λr_j).

Corollary 8.16. Let F be a field, A = (a_{ji}) ∈ M(m, n, F), m, n ∈ N. Applying elementary row operations to A does not change the rank of A, i.e. if A_er is the new matrix obtained from applying an elementary row operation according to Def. 8.13 to A, then rk A = rk A_er.

Proof. Let L_A : F^n → F^m and L_{A_er} : F^n → F^m be the linear maps associated with A and A_er, respectively, according to Rem. 8.3. According to Th. 8.15, we have ker L_A = L(A|0) = L(A_er|0) = ker L_{A_er}. Thus, according to (8.9), we have rk A = n − dim ker L_A = n − dim ker L_{A_er} = rk A_er.

8.3.3 Gaussian Elimination

Algorithm 8.17 (Gaussian Elimination). Let F be a field and m, n ∈ N. Let A = (a_{ji}) ∈ M(m, n, F). The Gaussian elimination algorithm is the following procedure that, starting with A, recursively applies elementary row operations of Def. 8.13: Let A^{(1)} := A, r(1) := 1. For each k ≥ 1, as long as r(k) < m and k ≤ n, the Gaussian elimination algorithm transforms A^{(k)} = (a_{ji}^{(k)}) ∈ M(m, n, F) into A^{(k+1)} = (a_{ji}^{(k+1)}) ∈ M(m, n, F) and r(k) ∈ N into r(k + 1) ∈ N by performing precisely one of the following actions⁸:

(a) If a_{r(k),k}^{(k)} ≠ 0, then, for each i ∈ {r(k) + 1, . . . , m}, replace the ith row by the ith row plus −a_{ik}^{(k)}/a_{r(k),k}^{(k)} times the r(k)th row⁹. Set r(k + 1) := r(k) + 1.

(b) If a_{r(k),k}^{(k)} = 0 and there exists i ∈ {r(k) + 1, . . . , m} such that a_{ik}^{(k)} ≠ 0, then one chooses such an i ∈ {r(k) + 1, . . . , m} and switches the ith with the r(k)th row. One then proceeds as in (a).

(c) If a_{ik}^{(k)} = 0 for each i ∈ {r(k), . . . , m}, then set A^{(k+1)} := A^{(k)} and r(k + 1) := r(k).

⁸For F ∈ {R, C}, you can find code for an R implementation of Alg. 8.17 in function gaussEl in Sec. F.1.3 of the Appendix: More precisely, function gaussEl uses the augmented version of the Gaussian elimination algorithm of Sec. 8.4.4 below to compute a so-called LU decomposition of A, and, in modification of Alg. 8.17(b), the so-called column maximum strategy is applied for row switching (implemented in function gaussElStep): One switches rows r and i ≥ r such that the pivot element in row i (before the switch) has maximum absolute value (cf. [Phi23, Def. 5.5]).

⁹Comparing with Def. and Rem. 8.14, we observe this to be a variable substitution (in the case where the matrix A represents the augmented matrix of a linear system), substituting the variable x_k in the ith row, using the r(k)th row to replace x_k.
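The three cases (a)-(c) translate into a compact R sketch (a hypothetical helper, without the column-maximum pivoting used by gaussEl in the Appendix; exact zero tests are fine for small rational examples):

# Sketch of Alg. 8.17: returns the echelon form produced by cases (a)-(c).
gaussElim <- function(A) {
  m <- nrow(A); n <- ncol(A); r <- 1
  for (k in 1:n) {
    if (r >= m) break                    # the algorithm runs while r(k) < m
    p <- which(A[r:m, k] != 0)[1]        # first row i >= r(k) with a_ik != 0
    if (is.na(p)) next                   # case (c): nothing to eliminate
    i <- r + p - 1
    if (i != r) A[c(r, i), ] <- A[c(i, r), ]   # case (b): row switch
    for (j in (r + 1):m)                       # case (a): eliminate below
      A[j, ] <- A[j, ] - (A[j, k] / A[r, k]) * A[r, ]
    r <- r + 1
  }
  A
}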
Remark 8.18. Note that the Gaussian elimination algorithm of Alg. 8.17 stops after at most n steps. Moreover, in its kth step, Alg. 8.17 can only manipulate elements that have row number at least r(k) and column number at least k (the claim with respect to the column numbers follows since a_{ij}^{(k)} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k − 1}, cf. the proof of the following Th. 8.19).

Theorem 8.19. Let F be a field and m, n ∈ N. Let A = (a_{ji}) ∈ M(m, n, F). The Gaussian elimination algorithm of Alg. 8.17 yields a matrix Ã ∈ M(m, n, F) in echelon form. More precisely, if r(k) = m or k = n + 1, then Ã := A^{(k)} is in echelon form. Moreover, rk Ã = rk A and, if A = (B|b) and Ã = (B̃|b̃) represent the augmented matrices of the linear systems Bx = b and B̃x = b̃, respectively, then L(B|b) = L(B̃|b̃).

Proof. Let N ∈ {1, . . . , n + 1} be the maximal k occurring during the Gaussian elimination algorithm, i.e. Ã := A^{(N)}. To prove that Ã is in echelon form, we show by induction on k ∈ {1, . . . , N} that the first k − 1 columns of A^{(k)} are in echelon form, as well as the first r(k) rows of A^{(k)}, with a_{ij}^{(k)} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k − 1}. For k = 1, the assertion is trivially true. By induction, we assume the assertion for k < N and prove it for k + 1 (for k = 1, we do not assume anything). As already stated in Rem. 8.18 above, the kth step of Alg. 8.17 does not change the first r(k) − 1 rows of A^{(k)}, and it does not change the first k − 1 columns of A^{(k)}, since (by induction hypothesis) a_{ij}^{(k)} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k − 1}. Moreover, if we are in the case of Alg. 8.17(a), then
$$ \forall_{i\in\{r(k)+1,\dots,m\}}\quad a_{ik}^{(k+1)} = a_{ik}^{(k)} - \frac{a_{ik}^{(k)}}{a_{r(k),k}^{(k)}}\, a_{r(k),k}^{(k)} = 0. $$
In each case, after the application of (a), (b), or (c) of Alg. 8.17, a_{ij}^{(k+1)} = 0 for each (i, j) ∈ {r(k + 1), . . . , m} × {1, . . . , k}. We have to show that the first r(k + 1) rows of A^{(k+1)} are in echelon form. For Alg. 8.17(c), it is r(k + 1) = r(k) and there is nothing to prove. For Alg. 8.17(a),(b), we know a_{r(k+1),j}^{(k+1)} = 0 for each j ∈ {1, . . . , k}, while a_{r(k),k}^{(k+1)} ≠ 0, showing that the first r(k + 1) rows of A^{(k+1)} are in echelon form in each case. Thus, we know that the first r(k + 1) rows of A^{(k+1)} are in echelon form, and all elements in the first k columns of A^{(k+1)} below row r(k + 1) are zero, showing that the first k columns of A^{(k+1)} are also in echelon form. As, at the end of the Gaussian elimination algorithm, r(k) = m or k = n + 1, we have shown Ã to be in echelon form. That rk Ã = rk A is an immediate consequence of Cor. 8.16 combined with a simple induction. If A = (B|b) and Ã = (B̃|b̃) represent the augmented matrices of the linear systems Bx = b and B̃x = b̃, respectively, then L(B|b) = L(B̃|b̃) is an immediate consequence of Th. 8.15 combined with a simple induction.

Remark 8.20. To avoid mistakes, especially when performing the Gaussian elimination algorithm manually, it is advisable to check the row sums after each step. It obviously suffices to consider how row sums are changed if (a) of Alg. 8.17 is carried out. Moreover, let i ∈ {r(k) + 1, . . . , m}, as only rows with row numbers i ∈ {r(k) + 1, . . . , m} can be changed by (a) in the kth step. If \(s_i^{(k)} = \sum_{j=1}^{n} a_{ij}^{(k)}\) is the sum of the ith row before (a) has been carried out in the kth step, then
$$ s_i^{(k+1)} = \sum_{j=1}^{n} a_{ij}^{(k+1)} = \sum_{j=1}^{n} \left( a_{ij}^{(k)} - \frac{a_{ik}^{(k)}}{a_{r(k),k}^{(k)}}\, a_{r(k),j}^{(k)} \right) = s_i^{(k)} - \frac{a_{ik}^{(k)}}{a_{r(k),k}^{(k)}}\, s_{r(k)}^{(k)} \tag{8.18} $$
must be the sum of the ith row after (a) has been carried out in the kth step.

Remark 8.21. Theorem 8.19 shows that the Gaussian elimination algorithm of Alg. 8.17 provides a reliable systematic way of solving linear systems (when combined with the back substitution Alg. 8.10). It has a few simple rules, which make it easy to implement as computer code. However, the following Ex. 8.22 illustrates that Alg. 8.17 is not always the most efficient way to obtain a solution and, especially when solving small linear systems manually, it often makes sense to deviate from it. In the presence of roundoff errors, there can also be good reasons to deviate from Alg. 8.17, which are typically addressed in classes on Numerical Analysis (see, e.g., [Phi23, Sec. 5]).

Example 8.22. Over the field R, consider the linear system
$$ Ax = \begin{pmatrix} 6 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \\ 0 \end{pmatrix} =: b, \tag{8.19} $$
which we can also write in the form
$$ \begin{aligned} 6x_1 \phantom{{}-x_2} + x_3 &= 4 \\ 2x_2 &= 6 \\ x_1 - x_2 - x_3 &= 0. \end{aligned} $$
One can quickly solve this linear system without making use of Alg. 8.17 and Alg. 8.10: The second equation yields x_2 = 3, which, plugged into the third equation, yields x_1 − x_3 = 3. Adding this to the first equation yields 7x_1 = 7, i.e. x_1 = 1. Thus, the first equation then provides x_3 = 4 − 6 = −2 and we obtain
$$ L(A|b) = \left\{ \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \right\}. \tag{8.20} $$
To illustrate Alg. 8.17 and Alg. 8.10, we solve (8.19) once again, first using Alg. 8.17 to transform the system to echelon form, then using Alg. 8.10 to obtain the solution: For each k ∈ {1, 2, 3}, we provide (A^{(k)}|b^{(k)}), r(k), and the row sums s_i^{(k)}, i ∈ {1, 2, 3}:
$$ (A^{(1)}|b^{(1)}) := (A|b) = \left(\begin{array}{ccc|c} 6 & 0 & 1 & 4 \\ 0 & 2 & 0 & 6 \\ 1 & -1 & -1 & 0 \end{array}\right), \quad r(1) := 1, \quad s_1^{(1)} = 11,\ s_2^{(1)} = 8,\ s_3^{(1)} = -1. $$
For k = 2, we apply Alg. 8.17(a), where we replace the third row of (A^{(1)}|b^{(1)}) by the third row plus (−1/6) times the first row to obtain
$$ (A^{(2)}|b^{(2)}) := \left(\begin{array}{ccc|c} 6 & 0 & 1 & 4 \\ 0 & 2 & 0 & 6 \\ 0 & -1 & -\tfrac{7}{6} & -\tfrac{2}{3} \end{array}\right), \quad r(2) := r(1) + 1 = 2, $$
$$ s_1^{(2)} = 11, \quad s_2^{(2)} = 8, \quad s_3^{(2)} = s_3^{(1)} - \tfrac{1}{6}\, s_1^{(1)} = -1 - \tfrac{11}{6} = -\tfrac{17}{6}. $$
For k = 3, we apply Alg. 8.17(a) again, where we replace the third row of (A^{(2)}|b^{(2)}) by the third row plus 1/2 times the second row to obtain
$$ (A^{(3)}|b^{(3)}) := \left(\begin{array}{ccc|c} 6 & 0 & 1 & 4 \\ 0 & 2 & 0 & 6 \\ 0 & 0 & -\tfrac{7}{6} & \tfrac{7}{3} \end{array}\right), \quad r(3) := r(2) + 1 = 3, $$
$$ s_1^{(3)} = 11, \quad s_2^{(3)} = 8, \quad s_3^{(3)} = s_3^{(2)} + \tfrac{1}{2}\, s_2^{(2)} = -\tfrac{17}{6} + 4 = \tfrac{7}{6}. $$
Since r(3) = 3, Alg. 8.17 is done and (Ã|b̃) := (A^{(3)}|b^{(3)}) is in echelon form¹⁰. In particular, we see 3 = rk Ã = rk(Ã|b̃) = rk(A|b), i.e. Ax = b has a unique solution, which we now determine via the back substitution Alg. 8.10: We obtain
$$ x_3 = -\frac{6}{7}\cdot\frac{7}{3} = -2, \qquad x_2 = \frac{6}{2} = 3, \qquad x_1 = \frac{1}{6}\,(4 - x_3) = 1, $$
recovering (8.20)¹¹.

¹⁰You can find R code that computes (Ã|b̃) from (A|b) in Sec. F.1.4 of the Appendix: Matrix (A|b) is stored in A0 and (Ã|b̃) is computed as part of an LU decomposition of (A|b), using the augmented version of the Gaussian elimination algorithm of Sec. 8.4.4 below ((Ã|b̃) is given by component $Atilde of the LU decomposition of A0).

¹¹You can find R code that computes the solution x = (x_1, x_2, x_3) in Sec. F.1.4 of the Appendix.
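With the gaussElim sketch given after Alg. 8.17 above, Ex. 8.22 can also be replayed mechanically:

# Echelon form (Atilde | btilde) of Ex. 8.22 via the sketch above:
A <- rbind(c(6, 0, 1), c(0, 2, 0), c(1, -1, -1))
b <- c(4, 6, 0)
gaussElim(cbind(A, b))   # rows (6,0,1|4), (0,2,0|6), (0,0,-7/6|7/3)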
Example 8.23. (a) In Ex. 8.2(b), we saw that, in R⁴, the question if b ∈ ⟨{v_1, v_2, v_3}⟩ with b := (1, 2, 2, 1) and v_1 := (1, 2, 0, 3), v_2 := (0, 3, 2, 1), v_3 := (1, 1, 1, 1) is equivalent to determining if the linear system Ax = b with augmented matrix
$$ (A|b) := \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 2 & 3 & 1 & 2 \\ 0 & 2 & 1 & 2 \\ 3 & 1 & 1 & 1 \end{array}\right) $$
has a solution. Let us answer this question by transforming the system to echelon form via elementary row operations: Replacing the 2nd row by the 2nd row plus (−2) times the 1st row, and replacing the 4th row by the 4th row plus (−3) times the 1st row, yields
$$ \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 2 & 1 & 2 \\ 0 & 1 & -2 & -2 \end{array}\right). $$
Switching rows 2 and 4 yields
$$ \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -2 & -2 \\ 0 & 2 & 1 & 2 \\ 0 & 3 & -1 & 0 \end{array}\right). $$
Replacing the 3rd row by the 3rd row plus (−2) times the 2nd row, and replacing the 4th row by the 4th row plus (−3) times the 2nd row, yields
$$ \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -2 & -2 \\ 0 & 0 & 5 & 6 \\ 0 & 0 & 5 & 6 \end{array}\right). $$
Replacing the 4th row by the 4th row plus (−1) times the 3rd row yields
$$ (\tilde A|\tilde b) := \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -2 & -2 \\ 0 & 0 & 5 & 6 \\ 0 & 0 & 0 & 0 \end{array}\right). $$
Thus, 3 = rk A = rk Ã = rk(Ã|b̃) = rk(A|b), showing Ax = b to have a unique solution and b ∈ ⟨{v_1, v_2, v_3}⟩.

(b) In Ex. 8.2(a), we saw that the question if the set M := {v_1, v_2, v_3} with v_1, v_2, v_3 as in (a) is a linearly dependent subset of R⁴ can be equivalently posed as the question if the linear system Ax = 0, with the same matrix A as in (a), has a solution x = (x_1, x_2, x_3) ≠ (0, 0, 0). From (a), we know rk A = rk Ã = 3, implying L(A|0) = {0}, so M is linearly independent.

8.4 LU Decomposition

8.4.1 Definition and Motivation

If one needs to solve linear systems Ax = b with the same matrix A, but varying b (as, e.g., in the problem of finding an inverse to an n × n matrix A, cf. Ex. 8.2(c)), then it is not efficient to separately transform each (A|b) into echelon form. A more efficient way aims at decomposing A into simpler matrices L and U, i.e. A = LU, then facilitating the simpler solution of Ax = LUx = b when varying b (i.e. without having to transform (A|b) into echelon form for each new b), cf. Rem. 8.25 below.

Definition 8.24. Let S be a ring with unity and A ∈ M(m, n, S) with m, n ∈ N. A decomposition
$$ A = L\tilde A \tag{8.21a} $$
is called an LU decomposition of A if, and only if, L ∈ BL¹_m(S) (i.e. L is unipotent lower triangular) and Ã ∈ M(m, n, S) is in echelon form. If A is an m × m matrix, then Ã ∈ BU_m(S) (i.e. Ã is upper triangular) and (8.21a) is an LU decomposition in the strict sense, which is emphasized by writing U := Ã:
$$ A = LU. \tag{8.21b} $$

Remark 8.25. Let F be a field. Suppose the goal is to solve linear systems Ax = b with fixed matrix A ∈ M(m, n, F) and varying vector b ∈ F^m. Moreover, suppose one knows an LU decomposition A = LÃ. Then
$$ x \in L(A|b) \quad\Leftrightarrow\quad Ax = L\tilde Ax = b \quad\Leftrightarrow\quad \tilde Ax = L^{-1}b \quad\Leftrightarrow\quad x \in L(\tilde A|L^{-1}b). \tag{8.22} $$
Thus, solving Ax = LÃx = b is equivalent to solving Lz = b for z and then solving Ãx = z = L^{-1}b for x. Since Ã is in echelon form, Ãx = z = L^{-1}b can be solved via the back substitution Alg. 8.10; and, since L is lower triangular, Lz = b can be solved via an analogous version of Alg. 8.10, called forward substitution¹².
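The two substitutions are only a few lines each in R; the following is a minimal sketch with hypothetical helper names (assuming nonzero pivots and a unique solution; the fully general implementations used in these notes are in Sec. F.1.2 of the Appendix), applied to LU factors of the matrix A of Ex. 8.22, with the multipliers 1/6 and −1/2 read off from the elimination steps performed there:

# Back substitution (Alg. 8.10 for triangular U) and forward substitution.
backSubst <- function(U, z) {
  n <- ncol(U); x <- numeric(n)
  for (k in n:1) {
    rest <- if (k < n) sum(U[k, (k + 1):n] * x[(k + 1):n]) else 0
    x[k] <- (z[k] - rest) / U[k, k]
  }
  x
}
forwardSubst <- function(L, b) {
  n <- ncol(L); z <- numeric(n)
  for (k in 1:n) {
    rest <- if (k > 1) sum(L[k, 1:(k - 1)] * z[1:(k - 1)]) else 0
    z[k] <- (b[k] - rest) / L[k, k]
  }
  z
}
# LU factors of A from Ex. 8.22; solve Lz = b, then solve U x = z:
L <- rbind(c(1, 0, 0), c(0, 1, 0), c(1/6, -1/2, 1))
U <- rbind(c(6, 0, 1), c(0, 2, 0), c(0, 0, -7/6))
x <- backSubst(U, forwardSubst(L, c(4, 6, 0)))  # x = (1, 3, -2), cf. (8.20)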
To obtain the entire set L(A|b), one uses L(A|0) = L(Ã|0) and L(A|b) = x0 + L(A|0) for each 12 You can find code for an R implementation that uses the described strategy to solve Ax = b over R and C given an LU decomposition of A in function gaussElSolveFromLU in Sec. F.1.3 of the Appendix. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 170 8 LINEAR SYSTEMS x0 ∈ L(A|b), where one finds x0 as described above and L(Ã|0) via the back substitution Alg. 8.10. Even though the described strategy works fine in many situations, it can fail in the presence of roundoff errors (we will come back to this issue in Rem. 8.36 below). Example 8.26. Unfortunately, not every matrix has an LU decomposition: Consider, for example 0 1 ∈ M(2, 2, R). A = (aji ) := 1 1 Clearly, rk A = 2, i.e. A is regular. Suppose, A has an LU decomposition, i.e. A = LU with L = (lji ) unipotent lower triangular and U = (uji ) upper triangular. But then 0 = a11 = l11 u11 = u11 , showing the first column of U to be 0, i.e. U is singular, in contradiction to A being regular. — Even though the previous example shows not every matrix has an LU decomposition, we will see in Th. 8.34 below that every matrix does have an LU decomposition, provided one is first allowed to permute its rows in a suitable manner. 8.4.2 Elementary Row Operations Via Matrix Multiplication The LU decomposition of Th. 8.34 below can be obtained, with virtually no extra effort, by performing the Gaussian elimination Alg. 8.17. The proof will be founded on the observation that each elementary row operation can be accomplished via left multiplication with a suitable matrix: For row multiplication, one multiplies from the left with a diagonal matrix, for row addition, one multiplies from the left with a socalled Frobenius matrix or with the transpose of such a matrix (cf. Cor. 8.29 below); for row switching, one multiplies from the left with a so-called permutation matrix (cf. Cor. 8.33(a) below) – where, for Alg. 8.17 and the proof of Th. 8.34, we only need row switching and row addition by means of a Frobenius matrix (not its transpose). Definition 8.27. Let n ∈ N and let S be a ring with unity. We call a unipotent lower triangular matrix A = (aji ) ∈ BL1n (S) a Frobenius matrix of index k, k ∈ {1, . . . , n}, if, and only if, aji = 0 for each i ∈ / {j, k}, i.e. if, and only if, it has the following form: 1 ... 1 ak+1,k 1 A= (8.23) . ak+2,k 1 .. ... . an,k 1 AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 171 8 LINEAR SYSTEMS We define Frokn (S) := A ∈ M(n, S) : A is Frobenius of index k . Proposition 8.28. Let S be a ring with unity, n ∈ N. (a) If m ∈ N, A = (aji ) ∈ M(m, n, S), B = (bji ) ∈ Frokm (S), k ∈ {1, . . . , m}, 1 ... 1 bk+1,k 1 B= , b 1 k+2,k .. ... . bm,k 1 and C = (cji ) = BA ∈ M(m, n, S), then ( m X aji for each j ∈ {1, . . . , k}, cji = bjα aαi = bjk aki + aji for each j ∈ {k + 1, . . . , m}. α=1 (8.24) (b) Let F := S be a field and A = (aji ) ∈ M(m, n, F ), m ∈ N. Moreover, let A(k) 7→ A(k+1) be the matrix transformation occurring in the kth step of the Gaussian elimination Alg. 8.17 and assume the kth step is being performed by Alg. 8.17(a), i.e. by, for each i ∈ {r(k) + 1, . . . , m}, replacing the ith row by the ith row plus (k) (k) −aik /ar(k),k times the r(k)th row, then A(k+1) = Lk A(k) , r(k) Lk ∈ From (F ), (8.25a) where 1 ... 1 −lr(k)+1,r(k) 1 Lk = , −lr(k)+2,r(k) 1 .. ... . −lm,r(k) 1 (k) (k) li,r(k) := aik /ar(k),k . 
AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 (8.25b) 172 8 LINEAR SYSTEMS (c) Let k ∈ {1, . . . , n}. If L = (lji ) ∈ Frokn (S), then L−1 ∈ Frokn (S). More precisely, 1 1 ... ... 1 1 lk+1,k 1 −lk+1,k 1 L= ⇒ L−1 = . lk+2,k 1 −lk+2,k 1 . . . . .. .. .. .. ln,k 1 −ln,k 1 (d) If 1 l21 L = (lji ) = .. . 0 1 .. . ... ... ... 0 0 1 .. ∈ BLn (S) . ln1 ln2 . . . 1 is an arbitrary unipotent lower triangular matrix, then it is the product of the following n − 1 Frobenius matrices: L = L̃1 · · · L̃n−1 , where 1 ... 1 lk+1,k 1 L̃k := ∈ Frokn (S) lk+2,k 1 . . .. .. ln,k 1 for each k ∈ {1, . . . , n − 1}. Proof. (a): Formula (8.24) is immediate from the definition of matrix multiplication. (b) follows by applying (a) with B := Lk and A := A(k) . (c): Applying (a) with A := L and B := L−1 , where L−1 is defined as in the statement of (c), we can use (8.24) to compute C = BA: ( a = 0 for each j ∈ {1, . . . , k}, i 6= j, ji 1 for each j ∈ {1, . . . , k}, i = j, (8.24) cji = for each j ∈ {k + 1, . . . , n}, i 6= j, k, −ljk · 0 + 0 = 0 bjk aki + aji = −ljk · 1 + ljk = 0 for each j ∈ {k + 1, . . . , n}, i = k, −ljk · 0 + 1 = 1 for each j ∈ {k + 1, . . . , n}, i = j, AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 173 8 LINEAR SYSTEMS showing C = Id, thereby establishing the case. (d): We proof by induction on k = n − 1, . . . , 1 that 1 0 . .. n−1 0 Y L̃α = 0 α=k 0 . .. 0 ... ... 1 0 1 0 lk+1,k ... . 0 lk+2,k . . 1 .. .. ... ln−1,n−2 1 ... . . . . . 0 lnk . . . ln,n−2 ln,n−1 1 =: Ak . For k = n − 1, there is nothing to prove, since L̃n−1 = An−1 by definition. So let 1 ≤ k < n − 1. We have to show that C := L̃k Ak+1 = Ak . Letting A := Ak+1 and letting B := L̃k , we can use (8.24) to compute C := BA: ( 0 for each j ∈ {1, . . . , k}, j 6= i, a = ji 1 for each j ∈ {1, . . . , k}, j = i, (8.24) ( cji = ljk · 0 + aji = aji for each j ∈ {k + 1, . . . , n}, i 6= k, b a + a = jk ki ji l ·1+0=l for each j ∈ {k + 1, . . . , n}, i = k, jk jk showing C = Ak , thereby establishing the case. Corollary 8.29. Consider the situation of Def. 8.13, i.e. let F be a field, m, n ∈ N, and A ∈ M(m, n, F ) with rows r1 , . . . , rm . (a) The elementary row operation of row multiplication, i.e. of replacing row ri by some nonzero multiple λri of that row, i ∈ {1, . . . , m}, λ ∈ F \{0}, can be accomplished by left-multiplying A by the diagonal matrix Dλ := diag(d1 , . . . , dm ) ∈ Dm (F ), where ( λ for j = i, dj := ∀ j∈{1,...,m} 1 for j 6= i. (b) The elementary row operation of row addition, i.e. of replacing row ri by the sum ri + λrk of that row and a multiple of another row, (i, k) ∈ {1, . . . , m}2 , i 6= k, λ ∈ F , can be accomplished, for i > k, by left-multiplying A by the Frobenius AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 174 8 LINEAR SYSTEMS matrix Fλ ∈ Frokm (F ), 1 ... 1 fk+1,k 1 Fλ = , fk+2,k 1 . . .. .. fm,k 1 ∀ j∈{k+1,...,m} fj,k := ( λ 0 for j = i, for j 6= i, and, for i < k, by left-multiplying A by the transpose of the Frobenius matrix Fλ ∈ Froim (F ), where Fλ is as above, except with i and k switched. Proof. (a) is immediate from (7.27a). (b): For i > k, we apply Prop. 8.28(a) with B := Fλ , obtaining for (cji ) := Fλ A, for each j ∈ {1, . . . , k}, ajl cjl = fjk akl + ajl = λakl + ajl for j = i, fjk akl + ajl = ajl for each j ∈ {k + 1, . . . , m}, j 6= i, thereby establishing the case. 
For i < k, we obtain, for (bji ) := (Fλ )t , (cji ) := (Fλ )t A, ( m X ajl for each j ∈ {1, . . . , m}, j 6= i, cjl = bjα aαl = ajl + bjk akl = ajl + λakl for j = i, α=1 once again, establishing the case. Definition 8.30. Let n ∈ N and recall the definition of the symmetric group Sn from Ex. 4.9(b). Moreover, let F be a field (actually, the following definition still makes sense as long as F is a set containing elements 0 and 1). For each permutation π ∈ Sn , we define an n × n matrix Pπ := etπ(1) etπ(2) . . . etπ(n) ∈ M(n, F ), (8.26) where the ei denote the standard unit (row) vectors of F n . The matrix Pπ is called the permutation matrix (in M(n, F )) associated with π. We define Pern (F ) := Pπ ∈ M(n, F ) : π ∈ Sn . Proposition 8.31. Let n ∈ N and let F be a field. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 8 LINEAR SYSTEMS 175 (a) If π ∈ Sn and the columns of Pπ ∈ Pern (F ) are given according to (8.26), then the rows of Pπ are given according to the inverse permutation π −1 , eπ−1 (1) eπ−1 (2) t Pπ = (8.27) . . . , (Pπ ) = Pπ−1 . eπ−1 (n) (b) P := (pji ) ∈ M(n, F ) is a permutation matrix if, and only if, each row and each column of P have precisely one entry 1 and all other entries of P are 0. (c) Left multiplication of a matrix A ∈ M(n, m, F ), m ∈ N, by a permutation matrix Pπ ∈ Pern (F ), permutes the rows of A according to π, — rπ(1) — — r1 — — r2 — — rπ(2) — Pπ , = .. .. — . — — . — — rπ(n) — — rn — which follows from the special case eπ−1 (1) eπ−1 (1) · eti eπ−1 (2) t eπ−1 (2) · eti = etπ(i) , Pπ eti = . . . ei = ... t eπ−1 (n) eπ−1 (n) · ei that holds for each i ∈ {1, . . . , n}. (d) Right multiplication of a matrix A ∈ M(m, n, F ), m ∈ N, by a permutation matrix Pπ ∈ Pern (F ) permutes the columns of A according to π −1 , | | ... | | | ... | c1 c2 . . . cn Pπ = cπ−1 (1) cπ−1 (2) . . . cπ−1 (n) , | | ... | | | ... | which follows from the special case ei Pπ = ei etπ(1) etπ(2) . . . etπ(n) = ei · etπ(1) ei · etπ(2) . . . ei · etπ(n) = eπ−1 (i) , that holds for each i ∈ {1, . . . , n}. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 176 8 LINEAR SYSTEMS (e) For each π, σ ∈ Sn , one has Pπ Pσ = Pπ◦σ , (8.28) such that I : Sn −→ Pern (F ), I(π) := Pπ , constitutes a group isomorphism. In particular, Pern (F ) is a subgroup of GLn (F ) and ∀ P −1 = P t . (8.29) P ∈Pern (F ) Proof. (a): Consider the ith row of (pji ) := Pπ . Then pji = 1 if, and only if, j = π(i) (since pji also belongs to the ith column). Thus, pji = 1 if, and only if, i = π −1 (j), which means that the jth row is eπ−1 (j) . (b): Suppose P = Pπ , π ∈ Sn , n ∈ N, is a permutation matrix. Then (8.26) implies that each column of P has precisely one 1 and all other elements are 0; (8.27) implies that each row of P has precisely one 1 and all other elements are 0. Conversely, suppose that P ∈ M(n, F ) is such that each row and each column of P have precisely one entry 1 and all other entries of P are 0. Define a map π : {1, . . . , n} −→ {1, . . . , n} by letting π(i) be such that pπ(i),i = 1. Then P = etπ(1) etπ(2) . . . etπ(n) , and it just remains to show that π is a permutation. It suffices to show that π is surjective. To that end, let k ∈ {1, . . . , n}. Then there is a unique i ∈ {1, . . . , n} such that pki = 1. But then, according to the definition of π, π(i) = k, showing that π is surjective and P = Pπ . (c): Note that, for A = (aji ) ∈ M(n, m, F ) A= m X n X i=1 j=1 aji 0 ... 
etj |{z} ith column ... 0 . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 177 8 LINEAR SYSTEMS Thus, Pπ A = m X n X aji Pπ i=1 j=1 etj |{z} 0 ... ... 0 ith column eπ−1 (1) t eπ−1 (2) 0 . . . e . . . 0 j aji = |{z} ... ith column i=1 j=1 eπ−1 (n) 0 . . . eπ−1 (1) · ej . . . 0 0 . . . eπ−1 (2) · ej . . . 0 m X n X . . . = aji 0 . . . eπ−1 (n) · ej . . . 0 i=1 j=1 | {z } n m X X = m X n X aji i=1 j=1 0 ... ith column etπ(j) ... |{z} ith column which establishes the case. 0 , (d): We have t | | ... | c1 c2 . . . cn Pπ | | ... | (8.27) = = proving (d). — cπ−1 (1) — — c1 — — c2 — — cπ−1 (2) — (c) Pπ−1 = .. .. — . — — . — — cπ−1 (n) — — cn — t | | ... | cπ−1 (1) cπ−1 (2) . . . cπ−1 (n) , | | ... | (e): For π, σ ∈ Sn , we compute eπ−1 (1) eπ−1 (2) t eσ(1) etσ(2) . . . etσ(n) Pπ Pσ = ... eπ−1 (n) = et(π◦σ)(1) et(π◦σ)(2) . . . et(π◦σ)(n) = Pπ◦σ , (∗) showing (8.28), where, at “(∗)”, we used that ∀ i,j∈{1,...,n} π −1 (i) = σ(j) ⇔ i = (π ◦ σ)(j). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 178 8 LINEAR SYSTEMS Both surjectivity and injectivity of I are clear from Def. 8.30 and, then, Pern (F ) must be a group by Prop. 4.11(c). Combining (8.28) with (b) proves (8.29) and, thus, also shows Pern (F ) to be a subgroup of GLn (F ). Definition and Remark 8.32. Let n ∈ N and let F be a field. A permutation matrix Pτ ∈ Pern (F ), corresponding to a transposition τ = (ji) ∈ Sn (τ permutes i and j and leaves all other elements fixed), is called a transposition matrix and is denoted by Pji := Pτ . The transposition matrix Pji has the form 1 .. . 1 1 0 1 . . .. Pji = (8.30) 1 1 0 1 ... 1 Since every permutation is the composition of a finite number of transpositions (which we will prove in Linear Algebra II), it is implied by Prop. 8.31(e) that every permutation matrix is the product of a finite number of transposition matrices. It is immediate from Prop. 8.31(c),(d) that left multiplication of a matrix A ∈ M(n, m), m ∈ N, by Pji ∈ Pern (F ) switches the ith and jth row of A, whereas right multiplication of A ∈ M(m, n) by Pji ∈ Pern (F ) switches the ith and jth column of A. Corollary 8.33. Consider the situation of Def. 8.13, i.e. let F be a field, m, n ∈ N, and A ∈ M(m, n, F ) with rows r1 , . . . , rm . (a) The elementary row operation of row switching, i.e. of switching rows ri and rj , where i, j ∈ {1, . . . , m}, can be accomplished by left-multiplying A by the transposition matrix Pij ∈ Perm (F ). (b) Let A(k) 7→ A(k+1) be the matrix transformation occurring in the kth step of the Gaussian elimination Alg. 8.17 and assume the kth step is being performed by Alg. 8.17(b), i.e. by first switching the ith row with the r(k)th row and then, for each i ∈ (k) (k) {r(k)+1, . . . , m}, replacing the (new) ith row by the (new) ith row plus −aik /ar(k),k times the (new) r(k)th row, then A(k+1) = Lk Pi,r(k) A(k) AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 (8.31) 179 8 LINEAR SYSTEMS r(k) with transposition matrix Pi,r(k) ∈ Perm (F ) and with Lk ∈ From (F ) being the Frobenius matrix of index r(k) defined in (8.25b). Proof. (a) is immediate from Prop. 8.31(c) (cf. Def. and Rem. (8.32)). (b) is due to (a) combined with Prop. 8.28(b). 8.4.3 Existence Theorem 8.34. Let F be a field and A ∈ M(m, n, F ) with m, n ∈ N. Then there exists a permutation matrix P ∈ Perm (F ) such that P A has an LU decomposition in the sense of Def. 8.24. 
More precisely, if à ∈ M(m, n, F ) is the matrix in echelon form resulting at the end of the Gaussian elimination Alg. 8.17 applied to A, then there exist a permutation matrix P ∈ Perm (F ) and a unipotent lower triangular matrix L ∈ BL1m (F ) such that P A = LÃ. (8.32) It is an LU decomposition in the strict sense (i.e. with à ∈ BUm (F )) for A being quadratic (i.e. m × m). If no row switches occurred during the application of Alg. 8.17, then P = Idm and A itself has an LU decomposition. If A ∈ GLn (F ) has an LU decomposition, then its LU decomposition is unique. Proof. Let N ∈ {0, . . . , n} be the final number of steps that is needed to perform the Gaussian elimination Alg. 8.17. If N = 0, then A consists of precisely one row and there is nothing to prove (set L := P := (1)). If N ≥ 1, then let A(1) := A and, recursively, for each k ∈ {1, . . . , N }, let A(k+1) be the matrix obtained in the kth step of the Gaussian elimination algorithm. If Alg. 8.17(b) is used in the kth step, then, according to Cor. 8.33(b), A(k+1) = Lk Pk A(k) , (8.33) where Pk := Pi,r(k) ∈ Perm (F ) is the transposition matrix that switches rows r(k) and i, r(k) while Lk ∈ From (F ) is the Frobenius matrix of index r(k) given by (8.25b). If (a) or (c) of Alg. 8.17 is used in the kth step, then (8.33) also holds, namely with Pk := Idm for (a) and with Lk := Pk := Idm for (c). By induction, (8.33) implies à = A(N +1) = LN PN · · · L1 P1 A. (8.34) To show that we can transform (8.34) into (8.32), we first rewrite the right-hand side of (8.34) taking into account Pk−1 = Pk for each of the transposition matrices Pk : à = LN (PN LN −1 PN )(PN PN −1 LN −2 PN −1 PN )(PN PN −1 PN −2 · · · L1 P2 · · · PN )PN PN −1 · · · P2 P1 A. | {z } | | {z } {z } Idm Idm Idm (8.35) AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 180 8 LINEAR SYSTEMS Defining L′N := LN , L′k := PN PN −1 · · · Pk+1 Lk Pk+1 · · · PN −1 PN for each k ∈ {1, . . . , N − 1}, (8.36) (8.35) takes the form à = L′N · · · L′1 PN PN −1 · · · P2 P1 A. (8.37) We now observe that the L′k are still Frobenius matrices of index r(k), except that the entries lj,r(k) of Lk with j > r(k) have been permuted according to PN PN −1 · · · Pk+1 : This follows since 1 1 .. .. . . 1 1 .. .. ... ... . . Pji Pji = , li,r(k) 1 lj,r(k) 1 . . . . .. .. .. .. lj,r(k) 1 li,r(k) 1 .. .. ... ... . . as left multiplication by Pji switches the ith and jth row, switching li,r(k) and lj,r(k) and moving the corresponding 1’s off the diagonal, whereas right multiplication by Pji switches ith and jth column, moving both 1’s back onto the diagonal while leaving the r(k)th column unchanged. Finally, using that each (L′k )−1 , according to Prop. 8.28(c), exists and is also a Frobenius matrix, (8.37) becomes Là = P A (8.38) with P := PN PN −1 · · · P2 P1 , L := (L′1 )−1 · · · (L′N )−1 . (8.39) Comparing with Prop. 8.28(d) shows that L = (L′1 )−1 · · · (L′N )−1 is a unipotent lower triangular matrix, thereby establishing (8.32). Note: From Prop. 8.28(d) and the definition of the Lk , we actually know all entries of L explicitly: Every nonzero entry of the r(k)th column is given by (8.25b). This will be used when formulating the LU decomposition algorithm in Sec. 8.4.4 below. If A is invertible, then so is U := à = L−1 A. If A is invertible, and we have LU decompositions L1 U1 = A = L2 U2 , then we obtain U1 U2−1 = L−1 1 L2 =: E. 
As both upper triangular and unipotent lower triangular matrices are closed under matrix multiplication, the matrix E is unipotent AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 181 8 LINEAR SYSTEMS and both upper and lower triangular, showing E = Idm . This, in turn, yields U1 = U2 as well as L1 = L2 , i.e. the claimed uniqueness of the LU decomposition for invertible A. Remark 8.35. Note that, even for invertible A ∈ GLn (F ), the triple (P, L, Ã) of (8.32) is, in general, not unique. For example 1 0 1 1 1 1 1 0 , = 0 1 1 0 0 −1 1 1 | {z } | {z } | {z } | {z } P1 A L1 Ã1 1 1 1 0 1 0 0 1 . = 1 0 0 1 1 1 1 0 | {z } | {z } | {z } | {z } P2 A L2 Ã2 Remark 8.36. Let F be a field. Suppose the goal is to solve linear systems Ax = b with fixed matrix A ∈ M(m, n, F ) and varying vector b ∈ F n . Moreover, suppose one knows an LU decomposition P A = Là with permutation matrix P ∈ Perm (F ). One can then proceed as in Rem. 8.25 above, except with A replaced by P A and b replaced by P b (you can find code for an R implementation that uses the described strategy to solve Ax = b over R and C given an LU decomposition of A in function gaussElSolveFromLU in Sec. F.1.3 of the Appendix). As already indicated in Rem. 8.25, in the presence of roundoff errors, such errors might be magnified by the use of an LU decomposition. If one works over the fields R or C, then there are other decompositions (e.g. the so-called QR decomposition) that tend to behave more benignly in the presence of roundoff errors (cf., e.g., [Phi23, Rem. 5.14] and [Phi23, Sec. 5.4.3]). 8.4.4 Algorithm Let us crystallize the proof of Th. 8.34 into an algorithm to actually compute the matrices L, P , and à occurring in decomposition (8.32) of a given A ∈ M(m, n, F ). As in the proof of Th. 8.19, let N ∈ {1, . . . , n + 1} be the maximal k occurring during the Gaussian elimination Alg. 8.17 applied to A. For F ∈ {R, C}, you can find code for an R implementation of the following algorithm in function gaussEl in Sec. F.1.3 of the Appendix. (a) Algorithm for Ã: Starting with A(1) := A, the final step of Alg. 8.17 yields the matrix à := A(N ) ∈ M(m, n, F ) in echelon form. (b) Algorithm for P : Starting with P (1) := Idm , in the kth step of Alg. 8.17, define P (k+1) := Pk P (k) , where Pk := Pi,r(k) if rows r(k) and i are switched according to Alg. 8.17(b), and Pk := Idm , otherwise. In the last step, one obtains P := P (N ) . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 182 8 LINEAR SYSTEMS (c) Algorithm for L: Starting with L(1) := 0 (zero matrix), we obtain L(k+1) from L(k) in the kth step of Alg. 8.17 as follows: For Alg. 8.17(c), set L(k+1) := L(k) . If rows r(k) and i are switched according to Alg. 8.17(b), then we first switch rows r(k) and i in L(k) to obtain some L̃(k) (this conforms to the definition of the L′k in (8.36), we will come back to this point below). For Alg. 8.17(a) and for the elimination step of Alg. 8.17(b), we first copy all elements of L(k) (resp. L̃(k) ) to L(k+1) , but then change (k+1) (k) (k) the elements of the r(k)th column according to (8.25b): Set li,r(k) := aik /ar(k),k for each i ∈ {r(k) + 1, . . . , m}. In the last step, we obtain L from L(N ) by setting all elements on the diagonal to 1 (postponing this step to the end avoids worrying about the diagonal elements when switching rows earlier). To see that this procedure does, indeed, provide the correct L, we go back to the proof of Th. 
8.34: From (8.38), (8.39), and (8.36), it follows that the r(k)th column of L is precisely the r(k)th column of L−1 k permuted according to PN · · · Pk+1 . This is precisely what is described in (c) above: We start with the r(k)th column of L−1 k (k+1) (k) (k) by setting li,r(k) := aik /ar(k),k and then apply Pk+1 , . . . , PN during the remaining steps. Remark 8.37. Note that in the kth step of the algorithm for Ã, we eliminate all elements of the kth column below the row with number r(k), while in the kth step of the algorithm for L, we populate elements of the r(k)th column below the diagonal for the first time. Thus, when implementing the algorithm in practice, one can save memory capacity by storing the new elements for L at the locations of the previously eliminated elements of A. This strategy is sometimes called compact storage. Example 8.38. Let us determine the LU decomposition (8.32) (i.e. the matrices P , L, and Ã) for the matrix 1 4 2 3 0 0 1 4 A := 2 6 3 1 ∈ M(4, 4, R). 1 2 1 0 According to the algorithm described above, we start by initializing A(1) := A, P (1) := Id4 , L(1) := 0, r(1) := 1 (where the function r is the one introduced in the Gaussian elimination Alg. 8.17). We AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 183 8 LINEAR SYSTEMS use Alg. 8.17(a) to eliminate in the first column, obtaining P (2) = P (1) = Id4 , 1 4 2 3 0 0 1 4 A(2) = 0 −2 −1 −5 , 0 −2 −1 −3 0 0 L(2) = 2 1 0 0 0 0 0 0 0 0 0 0 , 0 0 r(2) = r(1) + 1 = 2. We now need to apply Alg. 8.17(b) and we switch rows 2 and 3 using P23 : 1 0 0 0 0 0 1 0 P (3) = P23 P (2) = 0 1 0 0 , 0 0 0 1 1 4 2 3 0 0 0 0 2 0 0 0 , P23 A(2) = 0 −2 −1 −5 . P23 L(2) = 0 0 0 0 0 0 1 4 0 −2 −1 −3 1 0 0 0 Eliminating the second column yields 1 4 2 3 0 0 0 0 0 −2 −1 −5 2 0 0 0 (3) , L(3) = 0 0 0 0 , A = 0 0 1 4 0 0 0 2 1 1 0 0 r(3) = r(2) + 1 = 3. Accidentally, elimination of the third column does not require any additional work and we have 1 0 0 0 1 0 0 0 2 1 0 0 0 0 1 0 (4) (3) P = P (4) = P (3) = 0 1 0 0 , L = L , L = 0 0 1 0 , 1 1 0 1 0 0 0 1 1 4 2 3 0 −2 −1 −5 , r(4) = r(3) + 1 = 4. à = A(4) = A(3) = 0 0 1 4 0 0 0 2 Recall that L is obtained from L(4) by setting the diagonal values to 1. One checks that, indeed, P A = LÃ. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 184 8 LINEAR SYSTEMS Example 8.39. We remain in the situation of the previous Ex. 8.38. We see that rk A = rk à = 4, showing A to be invertible. In Ex. 8.2(c), we observed that we obtain the columns vk of A−1 by solving the linear systems Av1 = e1 , . . . , Avn = en , where n = 4 in the present case and where e1 , . . . , en denote the standard (column) basis vectors. We now determine the vk using the LU decomposition of Ex. 8.38 together with the strategies described in Rem. 8.36 and Rem. 8.25: First, with 1 0 0 0 2 1 0 0 L= 0 0 1 0 1 1 0 1 we solve Lz1 = P e1 = e1 , Lz2 = P e2 = e3 , Lz3 = P e3 = e2 , Lz4 = P e4 = e4 , using forward substitution: z11 = 1, z21 = 0, z31 = 0, z41 = 0, z12 = −2z11 = −2, z22 = 0, z32 = 1 − 2z31 = 1, z42 = 0, z13 = 0, z23 = 1, z33 = 0, z43 = 0, obtaining the column vectors 0 1 0 −2 z1 = 0 , z2 = 1 , 1 0 Next, with z14 = −z11 − z12 = −1 + 2 = 1, z24 = 0, z34 = −z31 − z32 = −1, z44 = 1, 0 1 z3 = 0 , −1 0 0 z4 = 0 . 
1 1 4 2 3 0 −2 −1 −5 à = 0 0 1 4 0 0 0 2 we solve Ãv1 = z1 , Ãv2 = z2 , Ãv3 = z3 , Ãv4 = z4 , AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 185 A AXIOMATIC SET THEORY using backward substitution: v14 = 1 , 2 3 1 (−2 + v13 + 5v14 ) = , 2 4 1 1 (0 + v23 + 5v24 ) = − , v22 = − 2 2 1 1 v32 = − (1 + v33 + 5v34 ) = − , 2 4 1 1 v42 = − (0 + v43 + 5v44 ) = − , 2 4 v13 = 0 − 4v14 = −2, v12 = v23 = 1 − 4v24 = 1, v24 = 0, 1 v34 = − , 2 1 v44 = , 2 v33 = 0 − 4v34 = 2, v43 = 0 − 4v44 = −2, obtaining the column vectors 1 − 0 − 1 v2 = 2 , 1 0 2 3 v1 = 4 , −2 1 2 Thus, v11 = 1 − 4v12 − 2v13 − 3v14 = 1 , 2 v21 = 0 − 4v22 − 2v23 − 3v24 = 0, 3 v31 = 0 − 4v32 − 2v33 − 3v34 = − , 2 7 v41 = 0 − 4v42 − 2v43 − 3v44 = , 2 3 −2 − 1 v3 = 4 , 2 − 12 7 2 − 1 v4 = 4 . −2 1 2 2 0 −6 14 1 3 −2 −1 −1 . A−1 = · 8 −8 4 −8 4 2 0 −2 2 One verifies 1 0 AA−1 = 2 1 4 0 6 2 2 1 3 1 3 1 2 0 −6 14 3 −2 −1 −1 0 1 4 · · = 1 4 −8 4 8 −8 0 0 0 2 0 −2 2 0 1 0 0 0 0 1 0 0 0 . 0 1 You can find R code that computes A−1 via the above strategy (i.e. by first computing an LU decomposition of A) in Sec. F.1.4 of the Appendix. A Axiomatic Set Theory A.1 Motivation, Russell’s Antinomy As it turns out, naive set theory, founded on the definition of a set according to Cantor (as stated at the beginning of Sec. 1.3) is not suitable to be used in the foundation of mathematics. The problem lies in the possibility of obtaining contradictions such as Russell’s antinomy, after Bertrand Russell, who described it in 1901. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 A AXIOMATIC SET THEORY 186 Russell’s antinomy is obtained when considering the set X of all sets that do not contain themselves as an element: When asking the question if X ∈ X, one obtains the contradiction that X ∈ X ⇔ X ∈ / X: Suppose X ∈ X. Then X is a set that contains itself. But X was defined to contain only sets that do not contain themselves, i.e. X ∈ / X. So suppose X ∈ / X. Then X is a set that does not contain itself. Thus, by the definition of X, X ∈ X. Perhaps you think Russell’s construction is rather academic, but it is easily translated into a practical situation. Consider a library. The catalog C of the library should contain all the library’s books. Since the catalog itself is a book of the library, it should occur as an entry in the catalog. So there can be catalogs such as C that have themselves as an entry and there can be other catalogs that do not have themselves as an entry. Now one might want to have a catalog X of all catalogs that do not have themselves as an entry. As in Russell’s antinomy, one is led to the contradiction that the catalog X must have itself as an entry if, and only if, it does not have itself as an entry. One can construct arbitrarily many versions, which we will not do. Just one more: Consider a small town with a barber, who, each day, shaves all inhabitants, who do not shave themselves. The poor barber now faces a terrible dilemma: He will have to shave himself if, and only if, he does not shave himself. To avoid contradictions such as Russell’s antinomy, axiomatic set theory restricts the construction of sets via so-called axioms, as we will see below. A.2 Set-Theoretic Formulas The contradiction of Russell’s antinomy is related to Cantor’s sets not being hierarchical. Another source of contradictions in naive set theory is the imprecise nature of informal languages such as English. 
In (1.6), we said that A := {x ∈ B : P (x)} defines a subset of B if P (x) is a statement about an element x of B. Now take B := N := {1, 2, . . . } to be the set of the natural numbers and let P (x) := “The number x can be defined by fifty English words or less”. (A.1) Then A is a finite subset of N, since there are only finitely many English words (if you think there might be infinitely many English words, just restrict yourself to the words contained in some concrete dictionary). Then there is a smallest natural number n that is not in A. But then n is the smallest natural number that can not be defined by AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 A AXIOMATIC SET THEORY 187 fifty English words or less, which, actually, defines n by less than fifty English words, in contradiction to n ∈ / A. To avoid contradictions of this type, we require P (x) to be a so-called set-theoretic formula. Definition A.1. (a) The language of set theory consists precisely of the following symbols: ∧, ¬, ∃, (, ), ∈, =, vj , where j = 1, 2, . . . . (b) A set-theoretic formula is a finite string of symbols from the above language of set theory that can be built using the following recursive rules: (i) vi ∈ vj is a set-theoretic formula for i, j = 1, 2, . . . . (ii) vi = vj is a set-theoretic formula for i, j = 1, 2, . . . . (iii) If φ and ψ are set-theoretic formulas, then (φ) ∧ (ψ) is a set-theoretic formula. (iv) If φ is a set-theoretic formulas, then ¬(φ) is a set-theoretic formula. (v) If φ is a set-theoretic formulas, then ∃vj (φ) is a set-theoretic formula for j = 1, 2, . . . . Example A.2. Examples of set-theoretic formulas are (v3 ∈ v5 ) ∧ (¬(v2 = v3 )), ∃v1 (¬(v1 = v1 )); examples of symbol strings that are not set-theoretic formulas are v1 ∈ v2 ∈ v3 , ∃∃¬, and ∈ v3 ∃. Remark A.3. It is noted that, for a given finite string of symbols, a computer can, in principle, check in finitely many steps, if the string constitutes a set-theoretic formula or not. The symbols that can occur in a set-theoretic formula are to be interpreted as follows: The variables v1 , v2 , . . . are variables for sets. The symbols ∧ and ¬ are to be interpreted as the logical operators of conjunction and negation as described in Sec. 1.2.2. Similarly, ∃ stands for an existential quantifier as in Sec. 1.4: The statement ∃vj (φ) means “there exists a set vj that has the property φ”. Parentheses ( and ) are used to make clear the scope of the logical symbols ∃, ∧, ¬. Where the symbol ∈ occurs, it is interpreted to mean that the set to the left of ∈ is contained as an element in the set to the right of ∈. Similarly, = is interpreted to mean that the sets occurring to the left and to the right of = are equal. Remark A.4. A disadvantage of set-theoretic formulas as defined in Def. A.1 is that they quickly become lengthy and unreadable (at least to the human eye). To make formulas more readable and concise, one introduces additional symbols and notation. Formally, additional symbols and notation are always to be interpreted as abbreviations or transcriptions of actual set-theoretic formulas. For example, we use the rules of Th. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 188 A AXIOMATIC SET THEORY 1.11 to define the additional logical symbols ∨, ⇒, ⇔ as abbreviations: (φ) ∨ (ψ) (φ) ⇒ (ψ) (φ) ⇔ (ψ) is short for is short for is short for ¬((¬(φ)) ∧ (¬(ψ))) (¬(φ)) ∨ (ψ) ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.11(j)), (cf. 
Remark A.4. A disadvantage of set-theoretic formulas as defined in Def. A.1 is that they quickly become lengthy and unreadable (at least to the human eye). To make formulas more readable and concise, one introduces additional symbols and notation. Formally, additional symbols and notation are always to be interpreted as abbreviations or transcriptions of actual set-theoretic formulas. For example, we use the rules of Th. 1.11 to define the additional logical symbols ∨, ⇒, ⇔ as abbreviations:

(φ) ∨ (ψ) is short for ¬((¬(φ)) ∧ (¬(ψ))) (cf. Th. 1.11(j)), (A.2a)
(φ) ⇒ (ψ) is short for (¬(φ)) ∨ (ψ) (cf. Th. 1.11(a)), (A.2b)
(φ) ⇔ (ψ) is short for ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.11(b)). (A.2c)

Similarly, we use (1.17a) to define the universal quantifier:

∀v_j (φ) is short for ¬(∃v_j (¬(φ))). (A.2d)

Further abbreviations and transcriptions are obtained from omitting parentheses if it is clear from the context and/or from Convention 1.10 where to put them in, by writing variables bound by quantifiers under the respective quantifiers (as in Sec. 1.4), and by using other symbols than v_j for set variables. For example, ∀x (φ ⇒ ψ) transcribes ¬(∃v_1 (¬((¬(φ)) ∨ (ψ)))). Moreover,

v_i ≠ v_j is short for ¬(v_i = v_j); v_i ∉ v_j is short for ¬(v_i ∈ v_j). (A.2e)

Remark A.5. Even though axiomatic set theory requires the use of set-theoretic formulas as described above, the systematic study of formal symbolic languages is the subject of the field of mathematical logic and is beyond the scope of this class (see, e.g., [EFT07]). In Def. and Rem. 1.15, we defined a proof of statement B from statement A_1 as a finite sequence of statements A_1, A_2, ..., A_n such that, for 1 ≤ i < n, A_i implies A_{i+1}, and A_n implies B. In the field of proof theory, also beyond the scope of this class, such proofs are formalized via a finite set of rules that can be applied to (set-theoretic) formulas (see, e.g., [EFT07, Sec. IV], [Kun12, Sec. II]). Once proofs have been formalized in this way, one can, in principle, mechanically check whether a given sequence of symbols does, indeed, constitute a valid proof (without even having to understand the actual meaning of the statements). Indeed, several different computer programs have been devised that can be used for automatic proof checking, for example Coq [Wik22a], HOL Light [Wik21], Isabelle [Wik22b] and Lean [Wik22c], to name just a few.

A.3 The Axioms of Zermelo-Fraenkel Set Theory

Axiomatic set theory seems to provide a solid and consistent foundation for conducting mathematics, and most mathematicians have accepted it as the basis of their everyday work. However, there do remain some deep, difficult, and subtle philosophical issues regarding the foundation of logic and mathematics (see, e.g., [Kun12, Sec. 0, Sec. III]).

Definition and Remark A.6. An axiom is a statement that is assumed to be true without any formal logical justification. The most basic axioms (for example, the standard axioms of set theory) are taken to be justified by common sense or some underlying philosophy. However, on a less fundamental (and less philosophical) level, it is a common mathematical strategy to state a number of axioms (for example, the axioms defining the mathematical structure called a group), and then to study the logical consequences of these axioms (for example, group theory studies the statements that are true for all groups as a consequence of the group axioms). For a given system of axioms, the question whether there exists an object satisfying all the axioms in the system (i.e. whether the system of axioms is consistent, i.e. free of contradictions) can be extremely difficult to answer. —

We are now in a position to formulate and discuss the axioms of axiomatic set theory. More precisely, we will present the axioms of Zermelo-Fraenkel set theory, usually abbreviated as ZF, which are Axiom 0 – Axiom 8 below.
While there exist various set theories in the literature, each set theory defined by some collection of axioms, the axioms of ZFC, consisting of the axioms of ZF plus the axiom of choice (Axiom 9, see Sec. A.4 below), are used as the foundation of mathematics currently accepted by most mathematicians.

A.3.1 Existence, Extensionality, Comprehension

Axiom 0 Existence:

∃X (X = X).

Recall that this is just meant to be a more readable transcription of the set-theoretic formula ∃v_1 (v_1 = v_1). The axiom of existence states that there exists (at least one) set X.

In Def. 1.18, two sets are defined to be equal if, and only if, they contain precisely the same elements. In axiomatic set theory, this is guaranteed by the axiom of extensionality:

Axiom 1 Extensionality:

∀X ∀Y ( ∀z (z ∈ X ⇔ z ∈ Y) ⇒ X = Y ).

Following [Kun12], we assume that the substitution property of equality is part of the underlying logic, i.e. if X = Y, then X can be substituted for Y and vice versa without changing the truth value of a (set-theoretic) formula. In particular, this yields the converse to extensionality:

∀X ∀Y ( X = Y ⇒ ∀z (z ∈ X ⇔ z ∈ Y) ).

Before we discuss further consequences of extensionality, we would like to have the existence of the empty set. However, Axioms 0 and 1 do not suffice to prove the existence of an empty set (see [Kun12, I.6.3]). This, rather, needs the additional axiom of comprehension. More precisely, in the case of comprehension, we do not have a single axiom, but a scheme of infinitely many axioms, one for each set-theoretic formula. Its formulation makes use of the following definition:

Definition A.7. One obtains the universal closure of a set-theoretic formula φ by writing ∀v_j in front of φ for each variable v_j that occurs as a free variable in φ (recall from Def. 1.31 that v_j is free in φ if, and only if, it is not bound by a quantifier in φ).

Axiom 2 Comprehension Scheme: For each set-theoretic formula φ, not containing Y as a free variable, the universal closure of

∃Y ∀x ( x ∈ Y ⇔ (x ∈ X ∧ φ) )

is an axiom. Thus, the comprehension scheme states that, given the set X, there exists (at least one) set Y, containing precisely the elements of X that have the property φ.

Remark A.8. Comprehension does not provide uniqueness. However, if both Y and Y′ are sets containing precisely the elements of X that have the property φ, then

∀x ( x ∈ Y ⇔ (x ∈ X ∧ φ) ⇔ x ∈ Y′ ),

and, then, extensionality implies Y = Y′. Thus, due to extensionality, the set Y given by comprehension is unique, justifying the notation

{x : x ∈ X ∧ φ} := {x ∈ X : φ} := Y (A.3)

(this is the axiomatic justification for (1.6)).

Theorem A.9. There exists a unique empty set (which we denote by ∅ or by 0 – it is common to identify the empty set with the number zero in axiomatic set theory).

Proof. Axiom 0 provides the existence of a set X. Then comprehension allows us to define the empty set by 0 := ∅ := {x ∈ X : x ≠ x}, where, as explained in Rem. A.8, extensionality guarantees uniqueness.
Remark A.10. In Rem. A.4 we said that every formula with additional symbols and notation is to be regarded as an abbreviation or transcription of a set-theoretic formula as defined in Def. A.1(b). Thus, formulas containing symbols for defined sets (e.g. 0 or ∅ for the empty set) are to be regarded as abbreviations for formulas without such symbols. Some logical subtleties arise from the fact that there is some ambiguity in the way such abbreviations can be resolved: For example, 0 ∈ X can abbreviate either

ψ : ∃y ( φ(y) ∧ y ∈ X ) or χ : ∀y ( φ(y) ⇒ y ∈ X ), where φ(y) stands for ∀v (v ∉ y).

Then ψ and χ are equivalent if ∃!y φ(y) is true (e.g., if Axioms 0 – 2 hold), but they can be nonequivalent otherwise (see the discussion between Lem. 2.9 and Lem. 2.10 in [Kun80]). —

At first glance, the role played by the free variables in φ, which are allowed to occur in Axiom 2, might seem a bit obscure. So let us consider examples to illustrate that allowing free variables (i.e. set parameters) in comprehension is quite natural:

Example A.11. (a) Suppose φ in comprehension is the formula x ∈ Z (having Z as a free variable). Then the set given by the resulting axiom is merely the intersection of X and Z:

X ∩ Z := {x ∈ X : φ} = {x ∈ X : x ∈ Z}.

(b) Note that it is even allowed for φ in comprehension to have X as a free variable, so one can let φ be the formula ∃u (x ∈ u ∧ u ∈ X) to define the set

X* := { x ∈ X : ∃u (x ∈ u ∧ u ∈ X) }.

Then, if 0 := ∅, 1 := {0}, 2 := {0, 1}, we obtain 2* = {0} = 1. —

It is a consequence of extensionality that the mathematical universe consists of sets and only of sets: Suppose there were other objects in the mathematical universe, for example a cow C and a monkey M (or any other object without elements, other than the empty set) – allowing such objects to be considered sets would mean that our set-theoretic variables v_j were allowed to be a cow or a monkey as well. However, extensionality then implies the false statement C = M = ∅, thereby excluding cows and monkeys from the mathematical universe. Similarly, {C} and {M} (or any other object that contains a non-set) can not be inside the mathematical universe: Indeed, otherwise we had

∀x ( x ∈ {C} ⇔ x ∈ {M} )

(as C and M are non-sets) and, by extensionality, {C} = {M} were true, in contradiction to a set with a cow inside not being the same as a set with a monkey inside. Thus, we see that all objects of the mathematical universe must be so-called hereditary sets, i.e. sets all of whose elements (thinking of the elements as being the children of the sets) are also sets.

A.3.2 Classes

As we need to avoid contradictions such as Russell's antinomy, we must not require the existence of a set {x : φ} for each set-theoretic formula φ. However, it can still be useful to think of a "collection" of all sets having the property φ. Such collections are commonly called classes:

Definition A.12. (a) If φ is a set-theoretic formula, then we call {x : φ} a class, namely the class of all sets that have the property φ (typically, φ will have x as a free variable).

(b) If φ is a set-theoretic formula, then we say the class {x : φ} exists (as a set) if, and only if,

∃X ∀x ( x ∈ X ⇔ φ ) (A.4)

is true. Then X is actually unique by extensionality and we identify X with the class {x : φ}. If (A.4) is false, then {x : φ} is called a proper class (and the usual interpretation is that the class is in some sense "too large" to be a set).

Example A.13. (a) Due to Russell's antinomy of Sec. A.1, we know that R := {x : x ∉ x} forms a proper class.
(b) The universal class of all sets, V := {x : x = x}, is a proper class. Once again, this is related to Russell's antinomy: If V were a set, then

R = {x : x ∉ x} = {x : x = x ∧ x ∉ x} = {x : x ∈ V ∧ x ∉ x}

would also be a set by comprehension. However, this is in contradiction to R being a proper class by (a).

Remark A.14. From the perspective of formal logic, statements involving proper classes are to be regarded as abbreviations for statements without proper classes. For example, it turns out that the class G of all sets forming a group is a proper class. But we might write G ∈ G as an abbreviation for the statement "The set G is a group."

A.3.3 Pairing, Union, Replacement

Axioms 0 – 2 are still consistent with the empty set being the only set in existence (see [Kun12, I.6.13]). The next axiom provides the existence of nonempty sets:

Axiom 3 Pairing:

∀x ∀y ∃Z (x ∈ Z ∧ y ∈ Z).

Thus, the pairing axiom states that, for all sets x and y, there exists a set Z that contains x and y as elements. In consequence of the pairing axiom, the sets

0 := ∅, (A.5a)
1 := {0}, (A.5b)
2 := {0, 1} (A.5c)

all exist. More generally, we may define:

Definition A.15. If x, y are sets and Z is given by the pairing axiom, then we call

(a) {x, y} := {u ∈ Z : u = x ∨ u = y} the unordered pair given by x and y,

(b) {x} := {x, x} the singleton set given by x,

(c) (x, y) := {{x}, {x, y}} the ordered pair given by x and y (cf. Def. 2.1). —

We can now show that ordered pairs behave as expected:

Lemma A.16. The following holds true:

∀x,y,x′,y′ ( (x, y) = (x′, y′) ⇔ (x = x′) ∧ (y = y′) ).

Proof. "⇐" is merely

(x, y) = {{x}, {x, y}} = {{x′}, {x′, y′}} = (x′, y′), using x = x′ and y = y′.

"⇒" is done by distinguishing two cases: If x = y, then {{x}} = (x, y) = (x′, y′) = {{x′}, {x′, y′}}. Then, by extensionality, we first get {x} = {x′} = {x′, y′}, followed by x = x′ = y′, establishing the case. If x ≠ y, then {{x}, {x, y}} = (x, y) = (x′, y′) = {{x′}, {x′, y′}}, where, by extensionality, {x} ≠ {x, y} ≠ {x′}. Thus, using extensionality again, {x} = {x′} and x = x′. Next, we conclude {x, y} = {x′, y′} = {x, y′}, and a last application of extensionality yields y = y′.

While we now have the existence of the infinitely many different sets 0, {0}, {{0}}, ..., we are not, yet, able to form sets containing more than two elements. This is remedied by the following axiom:

Axiom 4 Union:

∀M ∃Y ∀x ∀X ( (x ∈ X ∧ X ∈ M) ⇒ x ∈ Y ).

Thus, the union axiom states that, for each set of sets M, there exists a set Y containing all elements of elements of M.

Definition A.17. (a) If M is a set and Y is given by the union axiom, then define

⋃ M := ⋃_{X∈M} X := { x ∈ Y : ∃X∈M (x ∈ X) }.

(b) If X and Y are sets, then define X ∪ Y := ⋃ {X, Y}.

(c) If x, y, z are sets, then define {x, y, z} := {x, y} ∪ {z}.

Remark A.18. (a) The definition of set-theoretic unions as

⋃_{i∈I} A_i := { x : ∃i∈I (x ∈ A_i) }

in (1.25b) will be equivalent to the definition in Def. A.17(a) if we are allowed to form the set M := {A_i : i ∈ I}.
If I is a set and A_i is a set for each i ∈ I, then M as above will be a set by Axiom 5 below (the axiom of replacement).

(b) In contrast to unions, intersections can be obtained directly from comprehension, without the introduction of an additional axiom: For example,

X ∩ Y := {x ∈ X : x ∈ Y}, ⋂_{i∈I} A_i := { x ∈ A_{i_0} : ∀i∈I (x ∈ A_i) },

where i_0 ∈ I ≠ ∅ is an arbitrary fixed element of I.

(c) The union

⋃ ∅ = ⋃_{X∈∅} X = ⋃_{i∈∅} A_i = ∅

is the empty set – in particular, a set. However,

⋂ ∅ = { x : ∀X∈∅ (x ∈ X) } = V = { x : ∀i∈∅ (x ∈ A_i) } = ⋂_{i∈∅} A_i,

i.e. the intersection over the empty set is the class of all sets – in particular, a proper class and not a set.

Definition A.19. We define the successor function x ↦ S(x) := x ∪ {x} (for each set x). Thus, recalling (A.5), we have 1 = S(0), 2 = S(1); and we can define 3 := S(2), .... In general, we call the set S(x) the successor of the set x. —

In Def. 2.3 and Def. 2.19, respectively, we define functions and relations in the usual manner, making use of the Cartesian product A × B of two sets A and B, which, according to (2.2), consists of all ordered pairs (x, y), where x ∈ A and y ∈ B. However, Axioms 0 – 4 are not sufficient to justify the existence of Cartesian products. To obtain Cartesian products, we employ the axiom of replacement. Analogous to the axiom of comprehension, the following axiom of replacement actually consists of a scheme of infinitely many axioms, one for each set-theoretic formula:

Axiom 5 Replacement Scheme: For each set-theoretic formula φ, not containing Y as a free variable, the universal closure of

( ∀x∈X ∃!y φ ) ⇒ ∃Y ∀x∈X ∃y∈Y φ

is an axiom. Thus, the replacement scheme states that if, for each x ∈ X, there exists a unique y having the property φ (where, in general, φ will depend on x), then there exists a set Y that, for each x ∈ X, contains this y with property φ. One can view this as obtaining Y by replacing each x ∈ X by the corresponding y = y(x).

Theorem A.20. If A and B are sets, then the Cartesian product of A and B, i.e. the class

A × B := { x : ∃a∈A ∃b∈B (x = (a, b)) },

exists as a set.

Proof. For each a ∈ A, we can use replacement with X := B and φ := φ_a being the formula y = (a, x) to obtain the existence of the set

{a} × B := {(a, x) : x ∈ B} (A.6a)

(in the usual way, comprehension and extensionality were used as well). Analogously, using replacement again with X := A and φ being the formula y = {x} × B, we obtain the existence of the set

M := {{x} × B : x ∈ A}. (A.6b)

In a final step, the union axiom now shows

⋃ M = ⋃_{a∈A} ({a} × B) = A × B (A.6c)

to be a set as well.

A.3.4 Infinity, Ordinals, Natural Numbers

The following axiom of infinity guarantees the existence of infinite sets (e.g., it will allow us to define the set of natural numbers N, which is infinite by Th. A.46 below).

Axiom 6 Infinity:

∃X ( 0 ∈ X ∧ ∀x∈X (x ∪ {x} ∈ X) ).

Thus, the infinity axiom states the existence of a set X containing ∅ (identified with the number 0) and, for each of its elements x, its successor S(x) = x ∪ {x}. In preparation for our official definition of N in Def. A.41 below, we will study so-called ordinals, which are special sets also of further interest to the field of set theory (the natural numbers will turn out to be precisely the finite ordinals).
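Before introducing the required order-theoretic notions, here is a brief concrete aside: the successor function of Def. A.19 can be modeled by representing hereditary sets as nested R lists. This is a toy sketch for illustration only (R lists are ordered and sets are not, but for building the first few successors of 0 the analogy is harmless):

    # Model a set as a list of its elements; the empty set is the empty list.
    zero <- list()
    S <- function(x) c(x, list(x))   # S(x) = x ∪ {x}

    one   <- S(zero)   # {0}
    two   <- S(one)    # {0, 1}
    three <- S(two)    # {0, 1, 2}

    # Each of these sets has as many elements as the number it represents:
    c(length(zero), length(one), length(two), length(three))   # 0 1 2 3
    identical(two[[2]], one)   # TRUE, reflecting 1 ∈ 2

This already hints at what is proved below: each natural number is precisely the set of its predecessors.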
We also need some notions from the theory of relations, in particular, order relations (cf. Def. 2.19 and Def. 2.23). Definition A.21. Let R be a relation on a set X. (a) R is called asymmetric if, and only if, xRy ⇒ ¬(yRx) , ∀ x,y∈X (A.7) i.e. if x is related to y only if y is not related to x. (b) R is called a strict partial order if, and only if, R is asymmetric and transitive. It is noted that this is consistent with Not. 2.24, since, recalling the notation ∆(X) := {(x, x) : x ∈ X}, R is a partial order on X if, and only if, R \ ∆(X) is a strict partial order on X. We extend the notions lower/upper bound, min, max, inf, sup of Def. 2.25 to strict partial orders R by applying them to R ∪∆(X): We call x ∈ X a lower bound of Y ⊆ X with respect to R if, and only if, x is a lower bound of Y with respect to R ∪ ∆(X), and analogous for the other notions. (c) A strict partial order R is called a strict total order or a strict linear order if, and only if, for each x, y ∈ X, one has x = y or xRy or yRx. (d) R is called a (strict) well-order if, and only if, R is a (strict) total order and every nonempty subset of X has a min with respect to R (for example, the usual ≤ constitutes a well-order on N (see, e.g., [Phi16, Th. D.5]), but not on R (e.g., R+ does not have a min)). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 198 A AXIOMATIC SET THEORY (e) If Y ⊆ X, then the relation on Y defined by xSy :⇔ xRy is called the restriction of R to Y , denoted S = R↾Y (usually, one still writes R for the restriction). Lemma A.22. Let R be a relation on a set X and Y ⊆ X. (a) If R is transitive, then R↾Y is transitive. (b) If R is reflexive, then R↾Y is reflexive. (c) If R is antisymmetric, then R↾Y is antisymmetric. (d) If R is asymmetric, then R↾Y is asymmetric. (e) If R is a (strict) partial order, then R↾Y is a (strict) partial order. (f ) If R is a (strict) total order, then R↾Y is a (strict) total order. (g) If R is a (strict) well-order, then R↾Y is a (strict) well-order. Proof. (a): If a, b, c ∈ Y with aRb and bRc, then aRc, since a, b, c ∈ X and R is transitive on X. (b): If a ∈ Y , then a ∈ X and aRa, since R is reflexive on X. (c): If a, b ∈ Y with aRb and bRa, then a = b, since a, b ∈ X and R is antisymmetric on X. (d): If a, b ∈ Y with aRb, then ¬bRa, since a, b ∈ X and R is asymmetric on X. (e) follows by combining (a) – (d). (f): If a, b ∈ Y with a = b and ¬aRb, then bRa, since a, b ∈ X and R is total on X. Combining this with (e) yields (f). (g): Due to (f), it merely remains to show that every nonempty subset Z ⊆ Y has a min. However, since Z ⊆ X and R is a well-order on X, there is m ∈ Z such that m is a min for R on X, implying m to be a min for R on Y as well. Remark A.23. Since the universal class V is not a set, ∈ is not a relation in the sense of Def. 2.19. It can be considered as a “class relation”, i.e. a subclass of V × V, but it is a proper class. However, ∈ does constitute a relation in the sense of Def. 2.19 on each set X (recalling that each element of X must be a set as well). More precisely, if X is a set, then so is R∈ := {(x, y) ∈ X × X : x ∈ y}. (A.8a) AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 199 A AXIOMATIC SET THEORY Then ∀ (x, y) ∈ R∈ x,y∈X ⇔ x ∈ y. (A.8b) Definition A.24. A set X is called transitive if, and only if, every element of X is also a subset of X: ∀ x ⊆ X. (A.9a) x∈X Clearly, (A.9a) is equivalent to ∀ x,y x∈y ∧ y∈X ⇒ x∈X . 
(A.9b) Lemma A.25. If X, Y are transitive sets, then X ∩ Y is a transitive set. Proof. If x ∈ X ∩ Y and y ∈ x, then y ∈ X (since X is transitive) and y ∈ Y (since Y is transitive). Thus y ∈ X ∩ Y , showing X ∩ Y is transitive. Definition A.26. (a) A set α is called an ordinal number or just an ordinal if, and only if, α is transitive and ∈ constitutes a strict well-order on α. An ordinal α is called a successor ordinal if, and only if, there exists an ordinal β such that α = S(β), where S is the successor function of Def. A.19. An ordinal α 6= 0 is called a limit ordinal if, and only if, it is not a successor ordinal. We denote the class of all ordinals by ON (it is a proper class by Cor. A.33 below). (b) We define ∀ (α < β :⇔ α ∈ β), (A.10a) ∀ (α ≤ β :⇔ α < β ∨ α = β). (A.10b) α,β∈ON α,β∈ON Example A.27. Using (A.5), 0 = ∅ is an ordinal, and 1 = S(0), 2 = S(1) are both successor ordinals (in Prop. A.43, we will identify N0 as the smallest limit ordinal). Even though X := {1} and Y := {0, 2} are well-ordered by ∈, they are not ordinals, since they are not transitive sets: 1 ∈ X, but 1 6⊆ X (since 0 ∈ 1, but 0 ∈ / X); similarly, 1 ∈ 2 ∈ Y , but 1 ∈ / Y. Lemma A.28. No ordinal contains itself, i.e. ∀ α∈ON α∈ / α. Proof. If α is an ordinal, then ∈ is a strict order on α. Due to asymmetry of strict orders, x ∈ x can not be true for any element of α, implying that α ∈ α can not be true. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 200 A AXIOMATIC SET THEORY Proposition A.29. Every element of an ordinal is an ordinal, i.e. X ∈ α ⇒ X ∈ ON ∀ α∈ON (in other words, ON is a transitive class). Proof. Let α ∈ ON and X ∈ α. Since α is transitive, we have X ⊆ α. As ∈ is a strict well-order on α, it must also be a strict well-order on X by Lem. A.22(g). In consequence, it only remains to prove that X is transitive as well. To this end, let x ∈ X. Then x ∈ α, as α is transitive. If y ∈ x, then, using transitivity of α again, y ∈ α. Now y ∈ X, as ∈ is transitive on α, proving x ⊆ X, i.e. X is transitive. Proposition A.30. If α, β ∈ ON, then X := α ∩ β ∈ ON (we will see in Th. A.35(a) below that, actually, α ∩ β = min{α, β}). Proof. X is transitive by Lem. A.25, and, since X ⊆ α, ∈ is a strict well-order on X by Lem. A.22(g). Proposition A.31. On the class ON, the relation ≤ (as defined in (A.10)) is the same as the relation ⊆, i.e. ∀ α ≤ β ⇔ α ⊆ β ⇔ (α ∈ β ∨ α = β) . (A.11) α,β∈ON Proof. Let α, β ∈ ON. Assume α ≤ β. If α = β, then α ⊆ β. If α ∈ β, then α ⊆ β, since β is transitive. Conversely, assume α ⊆ β and α 6= β. We have to show α ∈ β. To this end, we set X := β \ α. Then X 6= ∅ and, as ∈ well-orders β, we can let m := min X. We will show m = α (note that this will complete the proof, due to α = m ∈ X ⊆ β). If µ ∈ m, then µ ∈ β (since m ∈ β and β is transitive) and µ ∈ / X (since m = min X), implying µ ∈ α (since X = β \ α) and, thus, m ⊆ α. Seeking a contradiction, assume m 6= α. Then there must be some γ ∈ α \ m ⊆ α ⊆ β. In consequence γ, m ∈ β. As γ ∈ / m and ∈ is a total order on β, we must have either m = γ or m ∈ γ. However, m 6= γ, since γ ∈ α and m ∈ / α (as m ∈ X). So it must be m ∈ γ ∈ α, implying m ∈ α, as β is transitive. This contradiction proves m = α and establishes the proposition. Theorem A.32. The class ON is well-ordered by ∈, i.e. (i) ∈ is transitive on ON: ∀ α,β,γ∈ON α<β ∧ β<γ ⇒ α<γ . 
AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 201 A AXIOMATIC SET THEORY (ii) ∈ is asymmetric on ON: ∀ α,β∈ON α < β ⇒ ¬(β < α) . (iii) Ordinals are always comparable: ∀ α<β ∨ β<α ∨ α=β . α,β∈ON (iv) Every nonempty set of ordinals has a min. Proof. (i) is clear, as γ is a transitive set. (ii): If α, β ∈ ON, then α ∈ β ∈ α implies α ∈ α by (i), which is a contradiction to Lem. A.28. (iii): Let γ := α ∩ β. Then γ ∈ ON by Prop. A.30. Thus γ⊆α ∧ γ⊆β Lem. A.31 ⇒ (γ ∈ α ∨ γ = α) ∧ (γ ∈ β ∨ γ = β). (A.12) If γ ∈ α and γ ∈ β, then γ ∈ α ∩ β = γ, in contradiction to Lem. A.28. Thus, by (A.12), γ = α or γ = β. If γ = α, then α ⊆ β. If γ = β, then β ⊆ α, completing the proof of (iii). (iv): Let X be a nonempty set of ordinals and consider α ∈ X. If α = min X, then we are already done. Otherwise, Y := α ∩ X = {β ∈ X : β ∈ α} 6= ∅. Since α is well-ordered by ∈, there is m := min Y . If β ∈ X, then either β < α or α ≤ β by (iii). If β < α, then β ∈ Y and m ≤ β. If α ≤ β, then m < α ≤ β. Thus, m = min X, proving (iv). Corollary A.33. ON is a proper class (i.e. there is no set containing all the ordinals). Proof. If there is a set X containing all ordinals, then, by comprehension, β := ON = {α ∈ X : α is an ordinal} must be a set as well. But then Prop. A.29 says that the set β is transitive and Th. A.32 yields that the set β is well-ordered by ∈, implying β to be an ordinal, i.e. β ∈ β in contradiction to Lem. A.28. Corollary A.34. For each set X of ordinals, we have: (a) X is well-ordered by ∈. (b) X is an ordinal if, and only if, X is transitive. Note: A transitive set of ordinals X is sometimes called an initial segment of ON, since, here, transitivity can be restated in the form ∀ ∀ α<β ⇒ α∈X . (A.13) α∈ON β∈X AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 202 A AXIOMATIC SET THEORY Proof. (a) is a simple consequence of Th. A.32(i)-(iv). (b) is immediate from (a). Theorem A.35. Let X be a nonempty set of ordinals. T (a) Then γ := X is an ordinal, namely γ = min X. In particular, if α, β ∈ ON, then min{α, β} = α ∩ β. S (b) Then δ := X is an ordinal, namely δ = sup X. In particular, if α, β ∈ ON, then max{α, β} = α ∪ β. Proof. (a): Let m := min X. Then γ ⊆ m, since m ∈ X. Conversely, if α ∈ X, then m ≤ α implies m ⊆ α by Prop. A.31, i.e. m ⊆ γ. Thus, m = γ, proving (a). (b): To show δ ∈ ON, we need to show δ is transitive (then δ is an ordinal by Cor. A.34(b)). If α ∈ δ, then there is β ∈ X such that α ∈ β. Thus, if γ ∈ α, then γ ∈ β, since β is transitive. As γ ∈ β implies γ ∈ δ, we see that δ is transitive, as needed. It remains to show δ = sup X. If α ∈ X, then α ⊆ δ, i.e. α ≤ δ, showing δ to be an upper bound for X. Now let u ∈ ON be an arbitrary upper bound for X, i.e. ∀ α∈X α ⊆ u. Thus, δ ⊆ u, i.e. δ ≤ u, proving δ = sup X. Next, we obtain some results regarding the successor function of Def. A.19 in the context of ordinals. Lemma A.36. We have ∀ α∈ON x, y ∈ S(α) ∧ x ∈ y ⇒ x 6= α . Proof. Seeking a contradiction, we reason as follows: α∈α / x = α ⇒ y 6= α y∈S(α) ⇒ y∈α α transitive ⇒ x∈y y ⊆ α ⇒ α ∈ α. This contradiction to α ∈ / α yields x 6= α, concluding the proof. Proposition A.37. For each α ∈ ON, the following holds: (a) S(α) ∈ ON. (b) α < S(α). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 203 A AXIOMATIC SET THEORY (c) For each ordinal β, β < S(α) holds if, and only if, β ≤ α. 
(d) For each ordinal β, if β < α, then S(β) < S(α). (e) For each ordinal β, if S(β) < S(α), then β < α. Proof. (a): Due to Prop. A.29, S(α) is a set of ordinals. Thus, by Cor. A.34(b), it merely remains to prove that S(α) is transitive. Let x ∈ S(α). If x = α, then x = α ⊆ α ∪ {α} = S(α). If x 6= α, then x ∈ α and, since α is transitive, this implies x ⊆ α ⊆ S(α), showing S(α) to be transitive, thereby completing the proof of (a). (b) holds, as α ∈ S(α) holds by the definition of S(α). (c) is clear, since, for each ordinal β, β < S(α) ⇔ β ∈ S(α) ⇔ β ∈ α ∨ β = α ⇔ β ≤ α. (d): If β < α, then S(β) = β ∪ {β} ⊆ α, i.e. S(β) ≤ α < S(α). (e) follows from (d) using contraposition: If ¬(β < α), then β = α or α < β, implying S(β) = S(α) or S(α) < S(β), i.e. ¬(S(β) < S(α)). We now proceed to define the natural numbers: Definition A.38. An ordinal n is called a natural number if, and only if, m ≤ n ⇒ m = 0 ∨ m is successor ordinal . n 6= 0 ∧ ∀ m∈ON Proposition A.39. If n = 0 or n is a natural number, then S(n) is a natural number and every element of n is a natural number or 0. Proof. Suppose n is 0 or a natural number. If m ∈ n, then m is an ordinal by Prop. A.29. Suppose m 6= 0 and k ∈ m. Then k ∈ n, since n is transitive. Since n is a natural number, k = 0 or k is a successor ordinal. Thus, m is a natural number. It remains to show that S(n) is a natural number. By definition, S(n) = n ∪ {n} 6= 0. Moreover, S(n) ∈ ON by Prop. A.37(a), and, thus, S(n) is a successor ordinal. If m ∈ S(n), then m ≤ n, implying m = 0 or m is a successor ordinal, completing the proof that S(n) is a natural number. Theorem A.40 (Principle of Induction). If X is a set satisfying 0∈X ∧ ∀ x∈X S(x) ∈ X, then X contains 0 and all natural numbers. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 (A.14) A AXIOMATIC SET THEORY 204 Proof. Let X be a set satisfying (A.14). Then 0 ∈ X is immediate. Let n be a natural number and, seeking a contradiction, assume n ∈ / X. Consider N := S(n)\X. According to Prop. A.39, S(n) is a natural number and all nonzero elements of S(n) are natural numbers. Since N ⊆ S(n) and 0 ∈ X, 0 ∈ / N and all elements of N must be natural numbers. As n ∈ N , N 6= 0. Since S(n) is well-ordered by ∈ and 0 6= N ⊆ S(n), N must have a min m ∈ N , 0 6= m ≤ n. Since m is a natural number, there must be k such that m = S(k). Then k < m, implying k ∈ / N . On the other hand k < m ∧ m ≤ n ⇒ k ≤ n ⇒ k ∈ S(n). Thus, k ∈ X, implying m = S(k) ∈ X, in contradiction to m ∈ N . This contradiction proves n ∈ X, thereby establishing the case. Definition A.41. If the set X is given by the axiom of infinity, then we use comprehension to define the set N0 := {n ∈ X : n = 0 ∨ n is a natural number} and note N0 to be unique by extensionality. We also denote N := N0 \ {0}. In set theory, it is also very common to use the symbol ω for the set N0 . Corollary A.42. N0 is the set of all natural numbers and 0, i.e. ∀ n ∈ N0 ⇔ n = 0 ∨ n is a natural number . n Proof. “⇒” is clear from Def. A.41 and “⇐” is due to Th. A.40. Proposition A.43. ω = N0 is the smallest limit ordinal. Proof. Since ω is a set of ordinals and ω is transitive by Prop. A.39, ω is an ordinal by Cor. A.34(b). Moreover ω 6= 0, since 0 ∈ ω; and ω is not a successor ordinal (if ω = S(n) = n ∪ {n}, then n ∈ ω and S(n) ∈ ω by Prop. A.39, in contradiction to ω = S(n)), implying it is a limit ordinal. To see that ω is the smallest limit ordinal, let α ∈ ON, α < ω. 
Then α ∈ ω, that means α = 0 or α is a natural number (in particular, a successor ordinal). In the following Th. A.44, we will prove that N satisfies the Peano axioms P1 – P3 of Sec. 3.1 (if one prefers, one can show the same for N0 , where 0 takes over the role of 1). Theorem A.44. The set of natural numbers N satisfies the Peano axioms P1 – P3 of Sec. 3.1. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 A AXIOMATIC SET THEORY 205 Proof. For P1 and P2, we have to show that, for each n ∈ N, one has S(n) ∈ N \ {1} and that S(m) 6= S(n) for each m, n ∈ N, m 6= n. Let n ∈ N. Then S(n) ∈ N by Prop. A.39. If S(n) = 1, then n < S(n) = 1 by Prop. A.37(b), i.e. n = 0, in contradiction to n ∈ N. If m, n ∈ N with m 6= n, then S(m) 6= S(n) is due to Prop. A.37(d). To prove P3, suppose A ⊆ N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A. We need to show A = N (i.e. N ⊆ A, as A ⊆ N is assumed). Let X := A ∪ {0}. Then X satisfies (A.14) and Th. A.40 yields N0 ⊆ X. Thus, if n ∈ N, then n ∈ X \ {0} = A, showing N ⊆ A. Notation A.45. For each n ∈ N0 , we introduce the notation n + 1 := S(n) (more generally, one also defines α + 1 := S(α) for each ordinal α). Theorem A.46. Let n ∈ N0 . Then A := N0 \ n is infinite (see Def. 3.10(b)). In particular, N0 and N = N0 \ {0} = N0 \ 1 are infinite. Proof. Since n ∈ / n, we have n ∈ A 6= ∅. Thus, if A were finite, then there were a bijection f : A −→ Am := {1, . . . , m} = {k ∈ N : k ≤ m} for some m ∈ N. However, we will show by induction on m ∈ N that there is no injective map f : A −→ Am . Since S(n) ∈ / n, we have S(n) ∈ A. Thus, if f : A −→ A1 = {1}, then f (n) = f (S(n)), showing that f is not injective and proving the cases m = 1. For the induction step, we proceed by contraposition and show that the existence of an injective map f : A −→ Am+1 , m ∈ N, (cf. Not. A.45) implies the existence of an injective map g : A −→ Am . To this end, let m ∈ N and f : A −→ Am+1 be injective. If m + 1 ∈ / f (A), then f itself is an injective map into Am . If m + 1 ∈ f (A), then there is a unique a ∈ A such that f (a) = m + 1. Define ( f (k) for k < a, g : A −→ Am , g(k) := (A.15) f (k + 1) for a ≤ k. Then g is well-defined: If k ∈ A and a ≤ k, then k +1 ∈ A\{a}, and, since f is injective, g does, indeed, map into Am . We verify g to be injective: If k, l ∈ A, k < l, then also k < l + 1 and k + 1 6= l + 1 (by Peano axiom P2 – k + 1 < l + 1 then also follows, but we do not make use of that here). In each case, g(k) 6= g(l), proving g to be injective. For more basic information regarding ordinals see, e.g., [Kun12, Sec. I.8]. A.3.5 Power Set There is one more basic construction principle for sets that is not covered by Axioms 0 – 6, namely the formation of power sets. This needs another axiom: AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 206 A AXIOMATIC SET THEORY Axiom 7 Power Set: Y ⊆X ⇒ Y ∈M . ∀ ∃ ∀ X M Y Thus, the power set axiom states that, for each set X, there exists a set M that contains all subsets Y of X as elements. Definition A.47. If X is a set and M is given by the power set axiom, then we call P(X) := {Y ∈ M : Y ⊆ X} the power set of X. Another common notation for P(X) is 2X (cf. Prop. 2.18). A.3.6 Foundation Foundation is, perhaps, the least important of the axioms in ZF. It basically cleanses the mathematical universe of unnecessary “clutter”, i.e. of certain pathological sets that are of no importance to standard mathematics anyway. 
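Before stating the foundation axiom, here is a small concrete illustration of the power set of Def. A.47 for finite sets, using the correspondence between P(X) and {0, 1}^X of Prop. 2.18 (each subset is selected by a vector of 0/1 flags). A minimal sketch; the axiom itself, of course, concerns arbitrary sets:

    # All subsets of a finite set x: the j-th bit of 'mask' decides
    # whether x[j] is included (cf. the bijection of Prop. 2.18).
    powerset <- function(x) {
      n <- length(x)
      lapply(0:(2^n - 1),
             function(mask) x[bitwAnd(mask, 2^(seq_len(n) - 1)) > 0])
    }

    str(powerset(c("a", "b", "c")))  # 8 subsets, from character(0) to {a,b,c}
    length(powerset(1:4))            # 16 = 2^4, matching the notation 2^X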
Axiom 8 Foundation:

∀X ( ∃x (x ∈ X) ⇒ ∃x∈X ¬∃z (z ∈ x ∧ z ∈ X) ).

Thus, the foundation axiom states that every nonempty set X contains an element x that is disjoint from X.

Theorem A.48. Due to the foundation axiom, the ∈ relation can have no cycles, i.e. there do not exist sets x_1, x_2, ..., x_n, n ∈ N, such that

x_1 ∈ x_2 ∈ · · · ∈ x_n ∈ x_1. (A.16a)

In particular, sets can not be members of themselves:

¬∃x (x ∈ x). (A.16b)

Proof. If there were sets x_1, x_2, ..., x_n, n ∈ N, such that (A.16a) were true, then, by using the pairing axiom and the union axiom, we could form the set X := {x_1, ..., x_n}. Then, in contradiction to the foundation axiom, X ∩ x_i ≠ ∅ for each i = 1, ..., n: Indeed, x_n ∈ X ∩ x_1, and x_{i−1} ∈ X ∩ x_i for each i = 2, ..., n.

For a detailed explanation why "sets" forbidden by foundation do not occur in standard mathematics anyway, see, e.g., [Kun12, Sec. I.14].

A.4 The Axiom of Choice

In addition to the axioms of ZF discussed in the previous section, there is one more axiom, namely the axiom of choice (AC), that, together with ZF, makes up ZFC, the axiom system at the basis of current standard mathematics. Even though AC is used and accepted by most mathematicians, it does have the reputation of being somewhat less "natural". Thus, many mathematicians try to avoid the use of AC where possible, and it is often pointed out explicitly if a result depends on the use of AC (but this practice is by no means consistent, neither in the literature nor in this class, and one might sometimes be surprised which seemingly harmless result does actually depend on AC in some subtle, nonobvious way). We will now state the axiom:

Axiom 9 Axiom of Choice (AC):

∀M ( ∅ ∉ M ⇒ ∃ f : M −→ ⋃_{N∈M} N such that ∀M∈M (f(M) ∈ M) ).

Thus, the axiom of choice postulates, for each nonempty set M, whose elements are all nonempty sets, the existence of a choice function, that means a function that assigns, to each M ∈ M, an element m ∈ M.

Example A.49. For example, the axiom of choice postulates, for each nonempty set A, the existence of a choice function on P(A) \ {∅} that assigns each subset of A one of its elements. —

The axiom of choice is remarkable since, at first glance, it seems so natural that one can hardly believe it is not provable from the axioms of ZF. However, one can actually show that it is neither provable nor disprovable from ZF (see, e.g., [Jec73, Th. 3.5, Th. 5.16] – such a result is called an independence proof; see [Kun80] for further material). If you want to convince yourself that the existence of choice functions is, indeed, a tricky matter, try to define a choice function on P(R) \ {∅} without AC (but do not spend too much time on it – one can show this is actually impossible to accomplish). Theorem A.52 below provides several important equivalences of AC. Its statement and proof need some preparation. We start by introducing some more relevant notions from the theory of partial orders:

Definition A.50. Let X be a set and let ≤ be a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists no x ∈ X such that m < x (note that a maximal element does not have to be a max and that a maximal element is not necessarily unique).
AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 208 A AXIOMATIC SET THEORY (b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by ≤. Moreover, a chain C is called maximal if, and only if, no strict superset Y of C (i.e. no Y ⊆ X such that C ( Y ) is a chain. — The following lemma is a bit technical and will be used to prove the implication AC ⇒ (ii) in Th. A.52 (other proofs in the literature often make use of so-called transfinite recursion, but that would mean further developing the theory of ordinals, and we will not pursue this route in this class). Lemma A.51. Let X be a set and let ∅ 6= M ⊆ P(X) be a nonempty set of subsets of X. We let M be partially ordered by inclusion, i.e. setting A ≤ B :⇔ A ⊆ B for each A, B ∈ M. Moreover, define [ [ S := S (A.17) ∀ S⊆M and assume ∀ C⊆M S∈S [ C is a chain ⇒ C∈M . If the function g : M −→ M has the property that M ⊆ g(M ) ∧ #(g(M ) \ M ) ≤ 1 , ∀ M ∈M (A.18) (A.19) then g has a fixed point, i.e. ∃ M ∈M g(M ) = M. (A.20) Proof. Fix some arbitrary M0 ∈ M. We call T ⊆ M an M0 -tower if, and only if, T satisfies the following three properties (i) M0 ∈ T . (ii) If C ⊆ T is a chain, then S C ∈T. (iii) If M ∈ T , then g(M ) ∈ T . Let T := {T ⊆ M : T is an M0 -tower}. If T1 := {M ∈ M : M0 ⊆ M }, then, clearly, T1 is an M0 -tower and, T in particular, T 6= ∅. Next, we note that the intersection of all M0 -towers, i.e. T0 := T ∈T T , is also an M0 -tower. Clearly, no strict subset of T0 can be an M0 -tower and M ∈ T0 ⇒ M ∈ T1 ⇒ M0 ⊆ M. (A.21) AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 209 A AXIOMATIC SET THEORY The main work of the rest of the proof consists of showing that T0 is a chain. To show T0 to be a chain, define Γ := M ∈ T0 : ∀ (M ⊆ N ∨ N ⊆ M ) . (A.22) N ∈T0 We intend to show that Γ = T0 by verifying that Γ is an M0 -tower. As an intermediate step, we define ∀ M ∈Γ Φ(M ) := {N ∈ T0 : N ⊆ M ∨ g(M ) ⊆ N } and also show each Φ(M ) to be an M0 -tower. Actually, Γ and each Φ(M )Ssatisfy (i) due to (A.21). To verify Γ satisfies (ii), let C ⊆ Γ be a chain and U := C. Then U ∈ T0 , since T0 satisfies (ii). If N ∈ T0 , and C ⊆ N for each C ∈ C, then U ⊆ N . If N ∈ T0 , and there is C ∈ C such that C 6⊆ N , then N ⊆ C (since C ∈ Γ), i.e. N ⊆ U , showing U ∈ Γ and Γ satisfyingS(ii). Now, let M ∈ Γ. To verify Φ(M ) satisfies (ii), let C ⊆ Φ(M ) be a chain and U := C. Then U ∈ T0 , since T0 satisfies (ii). If U ⊆ M , then U ∈ Φ(M ) as desired. If U 6⊆ M , then there is x ∈ U such that x ∈ / M . Thus, there is C ∈ C such that x ∈ C and g(M ) ⊆ C (since C ∈ Φ(M )), i.e. g(M ) ⊆ U , showing U ∈ Φ(M ) also in this case, and Φ(M ) satisfies (ii). We will verify that Φ(M ) satisfies (iii) next. For this purpose, fix N ∈ Φ(M ). We need to show g(N ) ∈ Φ(M ). We already know g(N ) ∈ T0 , as T0 satisfies (iii). As N ∈ Φ(M ), we can now distinguish three cases. Case 1: N ( M . In this case, we cannot have M ( g(N ) (otherwise, #(g(N ) \ N ) ≥ 2 in contradiction to (A.19)). Thus, g(N ) ⊆ M (since M ∈ Γ), showing g(N ) ∈ Φ(M ). Case 2: N = M . Then g(N ) = g(M ) ∈ Φ(M ) (since g(M ) ∈ T0 and g(M ) ⊆ g(M )). Case 3: g(M ) ⊆ N . Then g(M ) ⊆ g(N ) by (A.19), again showing g(N ) ∈ Φ(M ). Thus, we have verified that Φ(M ) satisfies (iii) and, therefore, is an M0 -tower. Then, by the definition of T0 , we have T0 ⊆ Φ(M ). As we also have Φ(M ) ⊆ T0 (from the definition of Φ(M )), we have shown ∀ Φ(M ) = T0 . 
M ∈Γ As a consequence, if N ∈ T0 and M ∈ Γ, then N ∈ Φ(M ) and this means N ⊆ M ⊆ g(M ) or g(M ) ⊆ N , i.e. each N ∈ T0 is comparable to g(M ), showing g(M ) ∈ Γ and Γ satisfying (iii), completing the proof that Γ is an M0 -tower. As with the Φ(M ) above, we conclude Γ = T0 , as desired. To conclude the proof of the lemma, we note Γ = T0 implies T0 is a chain. We claim that [ M := T0 satisfies (A.20): Indeed, M ∈ T0 , since T0 satisfies (ii). Then g(M ) ∈ T0 , since T0 satisfies (iii). We then conclude g(M ) ⊆ M from the definition of M . As we always have M ⊆ g(M ) by (A.19), we have established g(M ) = M and proved the lemma. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 210 A AXIOMATIC SET THEORY Theorem A.52 (Equivalences to the Axiom of Choice). The following statements (i) – (v) are equivalent to the axiom of choice (as stated as Axiom 9 above). Q (i) Every Cartesian product i∈I Ai of nonempty sets Ai , where I is a nonempty index set, is nonempty (cf. Def. 2.15(c)). (ii) Hausdorff’s Maximality Principle: Every nonempty partially ordered set X contains a maximal chain (i.e. a maximal totally ordered subset). (iii) Zorn’s Lemma: Let X be a nonempty partially ordered set. If every chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in X (such chains with upper bounds are sometimes called inductive), then X contains a maximal element (cf. Def. A.50(a)). (iv) Zermelo’s Well-Ordering Theorem: Every set can be well-ordered (recall the definition of a well-order from Def. A.21(d)). (v) Every vector space V over a field F has a basis B ⊆ V . Proof. “(i) ⇔ AC”: Assume (i). Given a nonempty set of nonempty sets M, let I := M Q and, for each M ∈ M, let AM := M . If f ∈ M ∈I AM , then, according to Def. 2.15(c), for each M ∈ I = M, one has f (M ) ∈ AM = M , proving AC holds. Conversely, assume AC. Consider a family (Ai )i∈I such that I S 6= ∅ and eachSAi 6= ∅. Let M := {Ai : i ∈ I}. Then, by AC, there is a map g : M −→ N ∈M N = j∈I Aj such that g(M ) ∈ M for each M ∈ M. Then we can define [ f : I −→ Aj , f (i) := g(Ai ) ∈ Ai , j∈I to prove (i). Next, we will show AC ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ AC. “AC ⇒ (ii)”: Assume AC and let X be a nonempty partially ordered set. Let M be the set of all chains in X (i.e. the set of all nonempty totally ordered subsets of X). Then ∅∈ / M and M 6= ∅ (since X 6= ∅ and {x} ∈ M for each x ∈ X). Moreover, M satisfies the hypothesis of Lem. A.51, since, if C ⊆ M is a chain of totally ordered subsets of X, S then C is a totally ordered subset of X, i.e. in M (here we have used the notation of (A.17); also note that we are dealing with two different types of chains here, namely those with respect to the order on X and those with respect to the order given by ⊆ on M). Let f : P(X) \ {∅} −→ X be a choice function given by AC, i.e. such that ∀ Y ∈P(X)\{∅} f (Y ) ∈ Y. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 211 A AXIOMATIC SET THEORY As an auxiliary notation, we set ∀ M ∈M M ∗ := x ∈ X \ M : M ∪ {x} ∈ M . With the intention of applying Lem. A.51, we define ( M ∪ {f (M ∗ )} if M ∗ 6= ∅, g : M −→ M, g(M ) := M if M ∗ = ∅. Since g clearly satisfies (A.19), Lem. A.51 applies, providing an M ∈ M such that g(M ) = M . Thus, M ∗ = ∅, i.e. M is a maximal chain, proving (ii). “(ii) ⇒ (iii)”: Assume (ii). To prove Zorn’s lemma, let X be a nonempty set, partially ordered by ≤, such that every chain C ⊆ X has an upper bound. 
Due to Hausdorff's maximality principle, there exists a maximal chain C ⊆ X. Let m ∈ X be an upper bound for the maximal chain C. We claim that m is a maximal element: Indeed, if there were x ∈ X such that m < x, then x ∉ C (since m is an upper bound for C) and C ∪ {x} would constitute a strict superset of C that is also a chain, contradicting the maximality of C.

"(iii) ⇒ (iv)": Assume (iii) and let X be a nonempty set. We need to construct a well-order on X. Let W be the set of all well-orders on subsets of X, i.e.

W := { (Y, W) : Y ⊆ X ∧ W ⊆ Y × Y ⊆ X × X is a well-order on Y }.

We define a partial order ≤ on W by setting, for each (Y, W), (Y′, W′) ∈ W,

(Y, W) ≤ (Y′, W′) :⇔ Y ⊆ Y′ ∧ W = W′↾_Y ∧ ( y ∈ Y, y′ ∈ Y′, y′ W′ y ⇒ y′ ∈ Y )

(recall the definition of the restriction of a relation from Def. A.21(e)). To apply Zorn's lemma to (W, ≤), we need to check that every chain C ⊆ W has an upper bound. To this end, if C ⊆ W is a chain, let

U_C := (Y_C, W_C), where Y_C := ⋃_{(Y,W)∈C} Y, W_C := ⋃_{(Y,W)∈C} W.

We need to verify U_C ∈ W: If a W_C b, then there is (Y, W) ∈ C such that a W b. In particular, (a, b) ∈ Y × Y ⊆ Y_C × Y_C, showing W_C to be a relation on Y_C. Clearly, W_C is a total order on Y_C (one just uses that, if a, b ∈ Y_C, then, as C is a chain, there is (Y, W) ∈ C such that a, b ∈ Y, and W = W_C↾_Y is a total order on Y). To see that W_C is a well-order on Y_C, let ∅ ≠ A ⊆ Y_C. If a ∈ A, then there is (Y, W) ∈ C such that a ∈ Y. Since W = W_C↾_Y is a well-order on Y, we can let m := min(Y ∩ A). We claim that m = min A as well: Let b ∈ A. Then there is (B, U) ∈ C such that b ∈ B. If B ⊆ Y, then b ∈ Y ∩ A and m W b. If Y ⊆ B, then m, b ∈ B. If m U b, then we are done. If b U m, then b ∈ Y (since (Y, W) ≤ (B, U)), i.e., again, b ∈ Y ∩ A and m W b (actually m = b in this case), proving m = min A. This completes the proof that W_C is a well-order on Y_C and, thus, shows U_C ∈ W. Next, we check U_C to be an upper bound for C: If (Y, W) ∈ C, then Y ⊆ Y_C and W = W_C↾_Y are immediate. If y ∈ Y, y′ ∈ Y_C, and y′ W_C y, then y′ ∈ Y (otherwise, y′ ∈ A with (A, U) ∈ C, (Y, W) ≤ (A, U), y′ U y, in contradiction to y′ ∉ Y). Thus, (Y, W) ≤ U_C, showing U_C to be an upper bound for C. By Zorn's lemma, we conclude that W contains a maximal element (M, W_M). But then M = X and W_M is the desired well-order on X: Indeed, if there is x ∈ X \ M, then we can let Y := M ∪ {x} and define, for each a, b ∈ Y,

a W b :⇔ (a, b ∈ M ∧ a W_M b) ∨ b = x.

Then (Y, W) ∈ W with (M, W_M) < (Y, W), in contradiction to the maximality of (M, W_M).

"(iv) ⇒ AC": Assume (iv). Given a nonempty set of nonempty sets M, let X := ⋃_{M∈M} M. By (iv), there exists a well-order R on X. Then every nonempty Y ⊆ X has a unique min. As every M ∈ M is a nonempty subset of X, we can define a choice function f : M −→ X, f(M) := min M ∈ M, proving AC.

"(v) ⇔ AC": That every vector space has a basis is proved in Th. 5.23 by use of Zorn's lemma. That, conversely, (v) implies AC was first shown in [Bla84], but the proof needs more algebraic tools than we have available in this class. There are also some set-theoretic subtleties that distinguish this equivalence from the ones provided in (i) – (iv), cf. Rem. A.53 below.

Remark A.53. What is actually proved in [Bla84] is that Th. A.52(v) implies the axiom of multiple choice (AMC), which states

∀M ( ∅ ∉ M ⇒ ∃ f : M −→ P(⋃_{N∈M} N) such that ∀M∈M ( f(M) ⊆ M ∧ 0 < #f(M) < ∞ ) ),
i.e., for each nonempty set M, whose elements are all nonempty sets, there exists a function that assigns, to each M ∈ M, a finite subset of M. Neither the proof that (i) – (iv) of Th. A.52 are each equivalent to AC nor the proof in [Bla84] that Th. A.52(v) implies AMC makes use of the axiom of foundation (Axiom 8 above). However, the proof that AMC implies AC does require the axiom of foundation (cf., e.g., [Hal17, Ch. 6]).

A.5 Cardinality

The following theorem provides two interesting, and sometimes useful, characterizations of infinite sets:

Theorem A.54. Let A be a set. Using the axiom of choice (AC) of Sec. A.4, the following statements (i) – (iii) are equivalent. More precisely, (ii) and (iii) are equivalent even without AC (a set A is sometimes called Dedekind-infinite if, and only if, it satisfies (iii)), and (iii) implies (i) without AC, but AC is needed to show that (i) implies (ii), (iii).

(i) A is infinite.

(ii) There exist M ⊆ A and a bijective map f : M −→ N.

(iii) There exist a strict subset B ⊊ A and a bijective map g : A −→ B.

One sometimes expresses the equivalence between (i) and (ii) by saying that a set is infinite if, and only if, it contains a copy of the natural numbers. The property stated in (iii) might seem strange at first, but infinite sets are, indeed, precisely those identical in size to some of their strict subsets (as an example, think of the natural bijection n ↦ 2n between all natural numbers and the even numbers).

Proof. We first prove, without AC, the equivalence between (ii) and (iii).

"(ii) ⇒ (iii)": Let E denote the even numbers. Then E ⊊ N and h : N −→ E, h(n) := 2n, is a bijection, showing that (iii) holds for the natural numbers. According to (ii), there exist M ⊆ A and a bijective map f : M −→ N. Define B := (A \ M) ∪̇ f^{-1}(E) and

h : A −→ B, h(x) := x for x ∈ A \ M, h(x) := (f^{-1} ◦ h ◦ f)(x) for x ∈ M, (A.23)

where the h occurring on the right-hand side is the bijection h : N −→ E from above. Then B ⊊ A, since B does not contain the elements of M that are mapped to odd numbers under f. Still, h is bijective, since h↾_{A\M} = Id_{A\M} and h↾_M = f^{-1} ◦ h ◦ f is the composition of the bijective maps f, h, and f^{-1}↾_E : E −→ f^{-1}(E).

"(iii) ⇒ (ii)": As (iii) is assumed, there exist B ⊆ A, a ∈ A \ B, and a bijective map g : A −→ B. Set

M := { a_n := g^n(a) : n ∈ N }.

We show that a_n ≠ a_m for each m, n ∈ N with m ≠ n: Indeed, suppose m, n ∈ N with n > m and a_n = a_m. Then, since g is bijective, we can apply g^{-1} m times to a_n = a_m to obtain a = (g^{-1})^m(a_m) = (g^{-1})^m(a_n) = g^{n−m}(a). Since l := n − m ≥ 1, we have a = g(g^{l−1}(a)), in contradiction to a ∈ A \ B. Thus, all the a_n ∈ M are distinct and we can define f : M −→ N, f(a_n) := n, which is clearly bijective, proving (ii).

"(iii) ⇒ (i)": The proof is conducted by contraposition, i.e. we assume A to be finite and prove that (iii) does not hold. If A = ∅, then there is nothing to prove. If ∅ ≠ A is finite, then, by Def. 3.10(b), there exist n ∈ N and a bijective map f : A −→ {1, ..., n}. If B ⊊ A, then, according to Th. 3.19(a), there exist m ∈ N_0, m < n, and a bijective map h : B −→ {1, ..., m}. If there were a bijective map g : A −→ B, then h ◦ g ◦ f^{-1} were a bijective map from {1, ..., n} onto {1, ..., m} with m < n, in contradiction to Th. 3.17.
“(i) ⇒ (ii)”: Inductively, we construct a strictly increasing sequence M1 ⊆ M2 ⊆ . . . of subsets Mn of A, n ∈ N, and a sequence of functions fn : Mn −→ {1, . . . , n}, satisfying ∀ n∈N ∀ m,n∈N fn is bijective, m ≤ n ⇒ f n ↾M m = f m : (A.24a) (A.24b) Since A 6= ∅, there exists m1 ∈ A. Set M1 := {m1 } and f1 : M1 −→ {1}, f1 (m1 ) := 1. Then M1 ⊆ A and f1 bijective are trivially clear. Now let n ∈ N and suppose M1 , . . . , Mn and f1 , . . . , fn satisfying (A.24) have already been constructed. Since A is infinite, there must be mn+1 ∈ A\Mn (otherwise Mn = A and the bijectivity of fn : Mn −→ {1, . . . , n} shows A is finite with #A = n; AC is used to select the mn+1 ∈ A \ Mn ). Set Mn+1 := Mn ∪ {mn+1 } and ( fn (x) for x ∈ Mn , (A.25) fn+1 : Mn+1 −→ {1, . . . , n + 1}, fn+1 (x) := n + 1 for x = mn+1 . Then the bijectivity of fn implies the bijectivity of fn+1 , and, since fn+1 ↾Mn = fn holds by definition of fn+1 , the implication m≤n+1 ⇒ fn+1 ↾Mm = fm holds true as well. An induction also shows Mn = {m1 , . . . , mn } and fn (mn ) = n for each n ∈ N. We now define [ M := Mn = {mn : n ∈ N}, f : M −→ N, f (mn ) := fn (mn ) = n. (A.26) n∈N Clearly, M ⊆ A, and f is bijective with f −1 : N −→ M , f −1 (n) = mn . AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 215 A AXIOMATIC SET THEORY We now proceed to prove some rules regarding cardinality: Theorem A.55. Let A, B be sets. Then #A ≤ #B ∨ #B ≤ #A, (A.27) i.e. there exists a bijective map φA : A −→ N with N ⊆ B or a bijective map φB : M −→ B, M ⊆ A (this result makes use of AC). Proof. To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set n o M := (M, N, f ) : M ⊆ A, N ⊆ B, f : M −→ N is bijective (A.28) by letting (M, N, f ) ≤ (U, V, g) :⇔ M ⊆ U, g↾M = f. (A.29) Then M contains the empty map, (∅, ∅, ∅) ∈ M, i.e. S M 6= ∅. Every chain C ⊆ M has an upper bound, namely (MC , fC ) with MC := (M,N,f )∈C M and fC (x) := f (x), where (M, N, f ) ∈ C is chosen such that x ∈ M (since C is a chain, the value of fC (x) does not actually depend on S the choice of (M, N, f ) ∈ C and is, thus, well-defined). Clearly, MC ⊆ A and NC := (M,N,f )∈C N ⊆ B. We need to verify fC : MC −→ NC to be bijective. If x, y ∈ MC , then, since C is a chain, there exists (M, N, f ) ∈ C with x, y ∈ M . As f is injective, we have, for x 6= y, that fC (x) = f (x) 6= f (y) = fC (y), showing fC to be injective as well. If z ∈ NC , then z ∈ N for some (f, M, N ) ∈ C. Since f : M −→ N is surjective, there exists x ∈ M ⊆ MC with fC (x) = f (x) = z, showing fC to be surjective as well, proving (MC , NC , fC ) ∈ M. To see (MC , NC , fC ) to be an upper bound for C, note that the definition of (MC , NC , fC ) immediately implies M ⊆ MC for each (M, N, f ) ∈ C and fC ↾M = f for each (M, N, f ) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (Mmax , Nmax , fmax ) ∈ M. If a ∈ A \ Mmax and b ∈ B \ Nmax , then ( b for x = a, f : {a} ∪˙ Mmax −→ {b} ∪˙ Nmax , f (x) := fmax (x) for x ∈ Mmax is a bijective extension of fmax . Thus, the maximality of (Mmax , Nmax , fmax ) implies Mmax = A or Nmax = B, completing the proof. Theorem A.56. Let A, B be sets, let A be infinite and assume there exists an injective map φB : B −→ A. Then #(A ∪ B) = #A, (A.30) i.e. there exists a bijective map, mapping A onto A ∪ B (this result makes use of AC). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 216 A AXIOMATIC SET THEORY Proof. 
Since the map a 7→ a always maps A injectively into A ∪ B, it remains to show the existence of an injective map φ : A ∪ B −→ A (then (A.30) holds due to the Schröder-Bernstein Th. 3.12). Possibly replacing B with B \ A, we may also assume A and B to be disjoint, without loss of generality. Thus, let M := A ∪˙ B. We apply Zorn’s lemma of Th. A.52(iii) to M := A ⊆ P(A) : ∀ #X = #N ∧ ∀ X 6= Y ⇒ X ∩ Y = ∅ , (A.31) X∈A X,Y ∈A partially ordered by set inclusion ⊆ (each element of M constitutes a partition of some subset of A into infinite, countable subsets). Then M 6= ∅ by Th. A.54(ii). In the usual way, if C ⊆ M is a chain, then the union of all elements of C is an element of M that constitutes an upper bound for C. Thus, Zorn’s lemma applies, yielding a maximal element Amax ∈ M. The maximality of Amax then implies [ F := A \ C C∈Amax to be finite. Replacing some fixed C ∈ Amax with C ∪ F , we may assume Amax to be a partition of A. For each X ∈ Amax , we have X ⊆ A and a bijective map φX : N −→ X. Thus, the map φ0 : N × Amax −→ A, φ0 (n, X) := φX (n), is bijective as well. Letting N0 := {n ∈ N : n even} and N1 := {n ∈ N : n odd}, we obtain ˙ 1 × Amax ). N × Amax = (N0 × Amax ) ∪(N Moreover, there exist bijective maps ψ0 : N×Amax −→ N0 ×Amax and ψ1 : N×Amax −→ N1 × Amax . Thus, we can define ( (φ0 ◦ ψ0 ◦ φ−1 for x ∈ A, 0 )(x) φ : M := A ∪˙ B −→ A, φ(x) := −1 (φ0 ◦ ψ1 ◦ φ0 ◦ φB )(x) for x ∈ B, which is, clearly, injective. Theorem A.57. Let A, B be nonempty sets, let A be infinite and assume there exists an injective map φB : B −→ A. Then #(A × B) = #A, (A.32) i.e. there exists a bijective map, mapping A onto A × B (this result makes use of AC). AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 217 A AXIOMATIC SET THEORY Proof. Since the map (a, b) 7→ (a, φB (b)) is injective from A × B into A × A, and, for fixed b0 ∈ B, the map a 7→ (a, b0 ) is injective from A into A × B, it suffices to show the existence of an injective map from A × A into A (then (A.30) holds due to the Schröder-Bernstein Th. 3.12). However, we will actually directly prove the existence of a bijective map from A onto A × A: To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set n o M := (D, f ) : D ⊆ A, f : D −→ D × D is bijective (A.33) by letting (D, f ) ≤ (E, g) :⇔ D ⊆ E, φ̃ : A −→ A × A, φ̃ := (φ, φ) ◦ fmax ◦ φ−1 , g↾D = f. (A.34) Then M 6= ∅, since there exists a bijective map f : N −→ N × N by Th. 3.28 and a bijective map between some D ⊆ A and N by STh. A.54(ii). Every chain C ⊆ M has an upper bound, namely (DC , fC ) with DC := (D,f )∈C D and fC (x) := f (x), where (D, f ) ∈ C is chosen such that x ∈ D (since C is a chain, the value of fC (x) does not actually depend on the choice of (D, f ) ∈ C and is, thus, well-defined). Clearly, DC ⊆ A, and we merely need to verify fC : DC −→ DC × DC to be bijective. If x, y ∈ DC , then, since C is a chain, there exists (D, f ) ∈ C with x, y ∈ D. As f is injective, we have fC (x) = f (x) 6= f (y) = fC (y), showing fC to be injective as well. If (x, y) ∈ DC × DC and (D, f ) ∈ C with x, y ∈ D as before, then the surjectivity of f : D −→ D × D yields z ∈ D ⊆ DC with fC (z) = f (z) = (x, y), showing fC to be surjective as well, proving (DC , fC ) ∈ M. To see (DC , fC ) to be an upper bound for C, note that the definition of (DC , fC ) immediately implies D ⊆ DC for each (D, f ) ∈ C and fC ↾D = f for each (D, f ) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (Dmax , fmax ) ∈ M. 
Theorem A.57. Let A, B be nonempty sets, let A be infinite and assume there exists an injective map φ_B : B −→ A. Then

#(A × B) = #A, (A.32)

i.e. there exists a bijective map, mapping A onto A × B (this result makes use of AC).

Proof. Since the map (a, b) ↦ (a, φ_B(b)) is injective from A × B into A × A, and, for fixed b_0 ∈ B, the map a ↦ (a, b_0) is injective from A into A × B, it suffices to show the existence of an injective map from A × A into A (then (A.32) holds due to the Schröder-Bernstein Th. 3.12). However, we will actually directly prove the existence of a bijective map from A onto A × A: To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set

M := {(D, f) : D ⊆ A, f : D −→ D × D is bijective} (A.33)

by letting

(D, f) ≤ (E, g)  :⇔  D ⊆ E ∧ g↾D = f. (A.34)

Then M ≠ ∅, since there exists a bijective map f : ℕ −→ ℕ × ℕ by Th. 3.28 and a bijective map between some D ⊆ A and ℕ by Th. A.54(ii). Every chain C ⊆ M has an upper bound, namely (D_C, f_C) with D_C := ⋃_{(D,f)∈C} D and f_C(x) := f(x), where (D, f) ∈ C is chosen such that x ∈ D (since C is a chain, the value of f_C(x) does not actually depend on the choice of (D, f) ∈ C and is, thus, well-defined). Clearly, D_C ⊆ A, and we merely need to verify f_C : D_C −→ D_C × D_C to be bijective. If x, y ∈ D_C with x ≠ y, then, since C is a chain, there exists (D, f) ∈ C with x, y ∈ D. As f is injective, we have f_C(x) = f(x) ≠ f(y) = f_C(y), showing f_C to be injective as well. If (x, y) ∈ D_C × D_C and (D, f) ∈ C with x, y ∈ D as before, then the surjectivity of f : D −→ D × D yields z ∈ D ⊆ D_C with f_C(z) = f(z) = (x, y), showing f_C to be surjective as well, proving (D_C, f_C) ∈ M. To see (D_C, f_C) to be an upper bound for C, note that the definition of (D_C, f_C) immediately implies D ⊆ D_C and f_C↾D = f for each (D, f) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (D_max, f_max) ∈ M.

We would like to prove D_max = A. However, in general, this can not be expected to hold: For example, if A = ℕ and ℕ ∖ D is finite, then (D, f) ∈ M is always maximal, since, for each D ⊊ E ⊆ A, the set (E × E) ∖ (D × D) is infinite, while E ∖ D is finite, so f does not have a bijective extension g : E −→ E × E. What we can prove, though, is the existence of a bijective map φ : D_max −→ A, which suffices, since the map

φ̃ : A −→ A × A,  φ̃ := (φ, φ) ∘ f_max ∘ φ⁻¹,

is then also bijective.

It remains to establish the existence of the bijective map φ : D_max −→ A. Arguing via contraposition, we assume the nonexistence of a bijective φ and show (D_max, f_max) is not maximal: Let E := A ∖ D_max. Then A = D_max ∪̇ E. According to Th. A.56, there does not exist an injective map from E into D_max (otherwise, Th. A.56 would yield #A = #(D_max ∪ E) = #D_max, i.e. a bijective φ : D_max −→ A after all). Then, according to Th. A.55, there exists a bijective map φ_D : D_max −→ N, N ⊆ E. Define

P := (N × N) ∪̇ (N × D_max) ∪̇ (D_max × N).

Since φ_D : D_max −→ N is bijective and f_max : D_max −→ D_max × D_max is bijective, we conclude

#(N × N) = #(N × D_max) = #(D_max × N) = #D_max = #N.

Thus, #P = #D_max = #N by Th. A.56 and, in particular, there exists a bijection f_N : N −→ P. In consequence, we can combine the bijections f_max : D_max −→ D_max × D_max and f_N : N −→ P to obtain a bijection

f : N ∪̇ D_max −→ P ∪̇ (D_max × D_max) = (D_max ∪̇ N) × (D_max ∪̇ N),

which is a bijective extension of f_max, i.e. (N ∪̇ D_max, f) ∈ M and (D_max, f_max) is not maximal. Thus, the maximality of (D_max, f_max) implies the existence of a bijective φ : D_max −→ A as desired.

Corollary A.58. Let n ∈ ℕ and let A be an infinite set. Then

#Aⁿ = #A, (A.35)

i.e. there exists a bijective map, mapping A onto Aⁿ (this result makes use of AC).

Proof. The claimed rule (A.35) follows via a simple induction from Th. A.57, since, for each n ∈ ℕ with n ≥ 2, the map

φ : Aⁿ −→ Aⁿ⁻¹ × A,  φ(a_1, …, a_n) := ((a_1, …, a_{n−1}), a_n),

is, clearly, bijective.

Theorem A.59. Let A be a set and let P_fin(A) denote the set of finite subsets of A.

(a) One has

#P_fin(A) = #2^A_fin, (A.36)

where 2^A_fin := {0, 1}^A_fin is defined as in Ex. 5.16(c) and a bijection between P_fin(A) and {0, 1}^A_fin is given by restricting the map χ : P(A) −→ {0, 1}^A, χ(B) := χ_B, of Prop. 2.18 to P_fin(A).

(b) If A is infinite, then

∀ n ∈ ℕ : #P_n(A) = #P_fin(A) = #A, (A.37)

i.e. there exists a bijective map, mapping A onto P_fin(A), as well as a bijective map, mapping A onto P_n(A), where P_n(A) denotes the set of subsets of A with precisely n elements (both results of (b) make use of AC).

Proof. (a): χ↾P_fin(A) is injective, since χ is injective. If f ∈ {0, 1}^A_fin, then B_f := {a ∈ A : f(a) ≠ 0} ∈ P_fin(A) with f = χ_{B_f}, showing χ↾P_fin(A) to be surjective onto {0, 1}^A_fin.

(b): Let n ∈ ℕ and let a_1, …, a_{n+1} ∈ A be n+1 distinct elements of A. Define

ι_n : A −→ P_n(A),  ι_n(a) := {a} ∪ {a_1, …, a_{n−1}} for a ∉ {a_1, …, a_{n−1}},  ι_n(a) := {a_1, …, a_{n+1}} ∖ {a} for a ∈ {a_1, …, a_{n−1}}.

Then, clearly, each ι_n is injective as a map into P_n(A) and, due to P_n(A) ⊆ P_fin(A), also as a map into P_fin(A). Thus, it remains to show the existence of injective maps φ_n : P_n(A) −→ A and φ : P_fin(A) −→ A (then (A.37) holds due to the Schröder-Bernstein Th. 3.12). Fix n ∈ ℕ. Due to Cor. A.58, it suffices to define an injective map φ̃_n : P_n(A) −→ Aⁿ.
For each B ∈ P_n(A), let φ_B : {1, …, n} −→ B be bijective and define

φ̃_n : P_n(A) −→ Aⁿ,  φ̃_n(B) := φ_B.

If B, C ∈ P_n(A) with x ∈ B ∖ C, then x ∈ φ_B({1, …, n}), but x ∉ φ_C({1, …, n}), showing φ_B ≠ φ_C, i.e. φ̃_n is injective. To obtain φ, it suffices to define an injective map φ̃ : P_fin(A) ∖ {∅} −→ A × ℕ, as there exists a bijective map from A × ℕ onto A by Th. A.57. Since

P_fin(A) ∖ {∅} = ⋃̇_{n∈ℕ} P_n(A), (A.38)

we may define

φ̃ : P_fin(A) ∖ {∅} −→ A × ℕ,  φ̃(B) := (φ_n(B), n) for B ∈ P_n(A),

which is, clearly, injective, due to (A.38) and the injectivity of the φ_n (the single remaining element ∅ of P_fin(A) is then accommodated via #(A ∪̇ {∅}) = #A, cf. Th. A.56).

Theorem A.60. Let A, B be sets. If 2 ≤ #A (i.e. there exists an injective map ι_2 : {0, 1} −→ A), #A ≤ #B (i.e. there exists an injective map ι_A : A −→ B), and B is infinite, then

#A^B = #2^B = #P(B), (A.39a)
#A^B_fin = #2^B_fin = #P_fin(B) = #B, (A.39b)

i.e. there exist bijective maps φ_1 : A^B −→ {0, 1}^B and φ_2 : {0, 1}^B −→ P(B), as well as bijective maps φ_{1,f} : A^B_fin −→ {0, 1}^B_fin and φ_{2,f} : {0, 1}^B_fin −→ P_fin(B) (this result makes use of AC). For the purposes of (A.39b), we introduce the notation 0 := ι_2(0) ∈ A, so that both A^B_fin and 2^B_fin make sense according to the definition in Ex. 5.16(c).

Proof. The existence of φ_2 was already established in Prop. 2.18, the existence of φ_{2,f} in Th. A.59(a). Thus, according to the Schröder-Bernstein Th. 3.12, for the existence of φ_1, it suffices to show the existence of injective maps f_1 : {0, 1}^B −→ A^B, f_2 : A^B −→ B^B, f_3 : B^B −→ P(B × B), as well as a bijective map g : P(B × B) −→ P(B), which can be stated concisely as

#2^B ≤ #A^B ≤ #B^B ≤ #P(B × B) = #P(B). (A.40a)

Analogously, for the existence of φ_{1,f}, it suffices to show the existence of injective maps f_{1,f} : {0, 1}^B_fin −→ A^B_fin, f_{2,f} : A^B_fin −→ B^B_fin, f_{3,f} : B^B_fin −→ P_fin(B × B), as well as a bijective map g_f : P_fin(B × B) −→ P_fin(B), which can be stated concisely as

#2^B_fin ≤ #A^B_fin ≤ #B^B_fin ≤ #P_fin(B × B) = #P_fin(B) (A.40b)

(for B^B_fin to make sense according to the definition in Ex. 5.16(c), we now also introduce the notation 0 := ι_A(ι_2(0)) ∈ B). Clearly, the maps

f_1 : {0, 1}^B −→ A^B,  f_1(α)(b) := ι_2(α(b)),
f_2 : A^B −→ B^B,  f_2(α)(b) := ι_A(α(b)),
f_3 : B^B −→ P(B × B),  f_3(α) := {(x, y) ∈ B × B : y = α(x)},

are, indeed, injective. Then the restrictions

f_{1,f} : {0, 1}^B_fin −→ A^B_fin,  f_{1,f} := f_1↾{0,1}^B_fin,
f_{2,f} : A^B_fin −→ B^B_fin,  f_{2,f} := f_2↾A^B_fin,

are well-defined and injective as well. While the restriction of f_3 does not work, as it does not map into P_fin(B × B), we can use the injective map

f_{3,f} : B^B_fin −→ P_fin(B × B),  f_{3,f}(α) := {(x, y) ∈ B × B : y = α(x) ≠ 0}.

From Th. A.57, we know the existence of a bijective map ψ : B × B −→ B, implying

g : P(B × B) −→ P(B),  g(C) := ψ(C) = {ψ(x, y) : (x, y) ∈ C},

to be bijective as well, thereby proving (A.40a) and (A.39a). Then the restriction

g_f : P_fin(B × B) −→ P_fin(B),  g_f := g↾P_fin(B×B),

is well-defined and also bijective, proving (A.40b) and (A.39b).

B General Forms of the Laws of Associativity and Commutativity

B.1 Associativity

In the literature, the general law of associativity is often stated in the form that a_1 a_2 ⋯ a_n gives the same result “for every admissible way of inserting parentheses into a_1 a_2 ⋯ a_n”, but a completely precise formulation of what that actually means seems to be rare.
As a warm-up, we first prove a special case of the general law:

Proposition B.1. Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). Then

∀ n ∈ ℕ, n ≥ 2, ∀ a_1, …, a_n ∈ A, ∀ k ∈ {2, …, n} :  ∏_{i=1}^{n} a_i = (∏_{i=k}^{n} a_i) · (∏_{i=1}^{k−1} a_i), (B.1)

where the product symbol is defined according to (3.16a).

Proof. The assumed associativity means the validity of

∀ a, b, c ∈ A : (ab)c = a(bc). (B.2)

If k = n, then (B.1) is immediate from (3.16a). For 2 ≤ k < n, we prove (B.1) by induction on n: For the base case, n = 2, there is nothing to prove. For n > 2, one computes

(∏_{i=k}^{n} a_i) · (∏_{i=1}^{k−1} a_i)
  = (a_n · ∏_{i=k}^{n−1} a_i) · (∏_{i=1}^{k−1} a_i)   (by (3.16a))
  = a_n · ((∏_{i=k}^{n−1} a_i) · (∏_{i=1}^{k−1} a_i))   (by (B.2))
  = a_n · ∏_{i=1}^{n−1} a_i   (by the induction hypothesis)
  = ∏_{i=1}^{n} a_i   (by (3.16a)), (B.3)

completing the induction and the proof of the proposition.

The difficulty in stating the general form of the law of associativity lies in giving a precise definition of what one means by “an admissible way of inserting parentheses into a_1 a_2 ⋯ a_n”. So how does one actually proceed to calculate the value of a_1 a_2 ⋯ a_n, given that parentheses have been inserted in an admissible way? The answer is that one does it in n − 1 steps, where, in each step, one combines two juxtaposed elements, consistent with the inserted parentheses. There can still be some ambiguity: For example, for (a_1 a_2)(a_3 (a_4 a_5)), one has the freedom of first combining a_1, a_2, or of first combining a_4, a_5. In consequence, our general law of associativity will show that, for each admissible sequence of n − 1 directives for combining two juxtaposed elements, the final result is the same (under the hypothesis that (B.2) holds). This still needs some preparatory work.

In the following, one might see it as a slight notational inconvenience that we have defined ∏_{i=1}^{n} a_i as a_n ⋯ a_1 rather than a_1 ⋯ a_n. For this reason, we will enumerate the elements to be combined by composition from right to left rather than from left to right.

Definition and Remark B.2. Let A be a nonempty set with a composition · : A × A −→ A, let n ∈ ℕ, n ≥ 2, and let I be a totally ordered index set, #I = n, I = {i_1, …, i_n} with i_1 < ⋯ < i_n. Moreover, let F := (a_{i_n}, …, a_{i_1}) be a family of n elements of A.

(a) An admissible composition directive (for combining two juxtaposed elements of the family) is an index i_k ∈ I with 1 ≤ k ≤ n − 1. It transforms the family F into the family G := (a_{i_n}, …, a_{i_{k+1}} a_{i_k}, …, a_{i_1}). In other words, G = (b_j)_{j∈J}, where J := I ∖ {i_{k+1}}, b_j = a_j for each j ∈ J ∖ {i_k}, and b_{i_k} = a_{i_{k+1}} a_{i_k}. We can write this transformation as two maps

F ↦ δ⁽¹⁾_{i_k}(F) := G = (a_{i_n}, …, a_{i_{k+1}} a_{i_k}, …, a_{i_1}) = (b_j)_{j∈J}, (B.4a)
I ↦ δ⁽²⁾_{i_k}(I) := J = I ∖ {i_{k+1}}. (B.4b)

Thus, an application of an admissible composition directive reduces the length of the family and the number of indices by one.

(b) Recursively, we define (finite) sequences of families, index sets, and indices as follows:

F_n := F,  I_n := I, (B.5a)
∀ α ∈ {2, …, n} :  F_{α−1} := δ⁽¹⁾_{j_α}(F_α),  I_{α−1} := δ⁽²⁾_{j_α}(I_α),  where j_α ∈ I_α ∖ {max I_α}. (B.5b)

The corresponding sequence of indices D := (j_n, …, j_2) in I is called an admissible evaluation directive. Clearly,

∀ α ∈ {1, …, n} : #I_α = α, i.e. F_α has length α. (B.6)

In particular, I_1 = {j_2} = {i_1} (where the second equality follows from (B.4b)), F_1 = (a), and we call

D(F) := a (B.7)

the result of the admissible evaluation directive D applied to F.
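Before stating the general law, a small R sketch may help to make Def. and Rem. B.2 concrete (this is an added illustration, not part of the notes’ downloadable code of Appendix F; the names evalDirective and allDirectives are hypothetical). For simplicity, the family is stored left to right and a composition directive is encoded by the position k at which the neighbors at positions k and k+1 are combined; for an associative, non-commutative composition (string concatenation), every admissible evaluation directive then yields the same result, as Th. B.3 below asserts:

# Apply an evaluation directive (a sequence of positions) to a family:
evalDirective <- function(fam, dirs, comp)
{
  for (k in dirs) {                       # apply directives in order
    fam <- c(if (k > 1) fam[1:(k-1)],     # elements left of position k
             comp(fam[k], fam[k+1]),      # combine two juxtaposed elements
             if (k+1 < length(fam)) fam[(k+2):length(fam)])
  }
  fam  # family of length 1: the result D(F)
}
# Enumerate all admissible evaluation directives for a family of length n:
allDirectives <- function(n)
{
  if (n == 1) return(list(integer(0)))
  res <- list()
  for (k in 1:(n-1))
    for (d in allDirectives(n-1)) res <- c(res, list(c(k, d)))
  res
}
fam <- c("a", "b", "c", "d")
results <- sapply(allDirectives(4), evalDirective, fam = fam, comp = paste0)
unique(results)  # a single value "abcd", independent of the directive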
Theorem B.3 (General Law of Associativity). Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). Let n ∈ ℕ, n ≥ 2, and let I be a totally ordered index set, #I = n, I = {i_1, …, i_n} with i_1 < ⋯ < i_n. Moreover, let F := (a_{i_n}, …, a_{i_1}) be a family of n elements of A. Then, for each admissible evaluation directive as defined in Def. and Rem. B.2(b), the result is the same, namely

D(F) = ∏_{k=1}^{n} a_{i_k}. (B.8)

Proof. We conduct the proof via induction on n. For n = 2, there is only one possible directive and there is nothing to prove. For n = 3, there are only two possible directives and (B.2) guarantees that they yield the same result. For the induction step, let n > 3. As in Def. and Rem. B.2(b), we write D = (j_n, …, j_2) and obtain some I_2 = {i_1, i_m}, 1 < m ≤ n, as the corresponding penultimate index set. Depending on i_m, we partition (j_n, …, j_3) as follows: Set

J_1 := {k ∈ {3, …, n} : j_k < i_m},  J_2 := {k ∈ {3, …, n} : j_k ≥ i_m}. (B.9)

Then, for k ∈ J_1, j_k is a composition directive to combine two elements to the right of a_{i_m} and, for k ∈ J_2, j_k is a composition directive to combine two elements to the left of a_{i_m}. Moreover, J_1 and J_2 might or might not be the empty set: If J_1 = ∅, then j_k ≠ i_1 for each k ∈ {3, …, n}, implying i_m = i_2 (the index i_2 can only be removed from the index set by the directive i_1); if J_2 = ∅, then, in each of the n − 2 steps to obtain I_2, an i_k with k < m was removed from I, implying i_m = i_n (in particular, as n ≠ 2, J_1 and J_2 cannot both be empty).

If J_1 ≠ ∅, then D_1 := (j_k)_{k∈J_1} is an admissible evaluation directive for (a_{i_{m−1}}, …, a_{i_1}) – this follows from

j_k ∈ K ⊆ {i_1, …, i_{m−1}}  ⇒  δ⁽²⁾_{j_k}(K) ⊆ K ⊆ {i_1, …, i_{m−1}}. (B.10)

Since m − 1 < n, the induction hypothesis applies and yields

D_1(a_{i_{m−1}}, …, a_{i_1}) = ∏_{k=1}^{m−1} a_{i_k}. (B.11)

Analogously, if J_2 ≠ ∅, then D_2 := (j_k)_{k∈J_2} is an admissible evaluation directive for (a_{i_n}, …, a_{i_m}) – this follows from

j_k ∈ K ⊆ {i_m, …, i_n}  ⇒  δ⁽²⁾_{j_k}(K) ⊆ K ⊆ {i_m, …, i_n}. (B.12)

Since m > 1, the induction hypothesis applies and yields

D_2(a_{i_n}, …, a_{i_m}) = ∏_{k=m}^{n} a_{i_k}. (B.13)

Thus, if J_1 ≠ ∅ and J_2 ≠ ∅, then we obtain

D(F) = D_2(a_{i_n}, …, a_{i_m}) · D_1(a_{i_{m−1}}, …, a_{i_1}) = (∏_{k=m}^{n} a_{i_k}) · (∏_{k=1}^{m−1} a_{i_k}) = ∏_{k=1}^{n} a_{i_k} (B.14)

(using j_2 = i_1 for the first equality and Prop. B.1 for the last), as desired. If J_1 = ∅, then, as explained above, i_m = i_2. Thus, in this case,

D(F) = D_2(a_{i_n}, …, a_{i_2}) · a_{i_1} = (∏_{k=2}^{n} a_{i_k}) · a_{i_1} = ∏_{k=1}^{n} a_{i_k} (B.15)

(again using j_2 = i_1 and Prop. B.1), as needed. Finally, if J_2 = ∅, then, as explained above, i_m = i_n. Thus, in this case,

D(F) = a_{i_n} · D_1(a_{i_{n−1}}, …, a_{i_1}) = ∏_{k=1}^{n} a_{i_k}, (B.16)

again, as desired, and completing the induction.

B.2 Commutativity

In the present section, we will generalize the law of commutativity ab = ba to a finite number of factors, provided the composition is also associative.

Theorem B.4 (General Law of Commutativity). Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). If the composition is commutative, i.e. if

∀ a, b ∈ A : ab = ba, (B.17)

then

∀ n ∈ ℕ ∀ π ∈ S_n ∀ a_1, …, a_n ∈ A :  ∏_{i=1}^{n} a_i = ∏_{i=1}^{n} a_{π(i)}, (B.18)

where S_n is the set of bijective maps on {1, …, n} (cf. Ex. 4.9(b)).
Proof. We conduct the proof via induction on n: For n = 1, there is nothing to prove and, for n = 2, (B.18) is the same as (B.17). So let n > 2. As π is bijective, we may define k := π⁻¹(n). Then

∏_{i=1}^{n} a_{π(i)} = (∏_{i=k+1}^{n} a_{π(i)}) · a_{π(k)} · (∏_{i=1}^{k−1} a_{π(i)})   (by Th. B.3)
  = a_{π(k)} · (∏_{i=k+1}^{n} a_{π(i)}) · (∏_{i=1}^{k−1} a_{π(i)})   (by (B.17)). (B.19)

Define the bijective map

ϕ : {1, …, n−1} −→ {1, …, n−1},  ϕ(j) := π(j) for 1 ≤ j ≤ k−1,  ϕ(j) := π(j+1) for k ≤ j ≤ n−1 (B.20)

(where the bijectivity of π implies the bijectivity of ϕ). Then

∏_{i=1}^{n} a_{π(i)} = a_n · (∏_{i=k}^{n−1} a_{ϕ(i)}) · (∏_{i=1}^{k−1} a_{ϕ(i)})   (by (B.19) and a_{π(k)} = a_n)
  = a_n · ∏_{i=1}^{n−1} a_{ϕ(i)}   (by Prop. B.1)
  = a_n · ∏_{i=1}^{n−1} a_i   (by the induction hypothesis)
  = ∏_{i=1}^{n} a_i, (B.21)

completing the induction proof.

The following example shows that, if the composition is not associative, then, in general, (B.17) does not imply (B.18):

Example B.5. Let A := {a, b, c} with #A = 3 (i.e. the elements a, b, c are all distinct). Let the composition · on A be defined according to the composition table

 ·   a  b  c
 a   b  b  b
 b   b  b  a
 c   b  a  a

(the entry in row x and column y being xy). Then, clearly, · is commutative. However, · is not associative, since, e.g.,

(cb)a = aa = b ≠ a = cb = c(ba), (B.22)

and (B.18) does not hold, since, e.g.,

a(bc) = aa = b ≠ a = cb = c(ba). (B.23)
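The composition table can also be checked by machine; the following R snippet (an added illustration, not part of the notes’ code files) encodes the table and reproduces (B.22) and (B.23):

elems <- c("a", "b", "c")
tab <- matrix(c("b","b","b",    # row a
                "b","b","a",    # row b
                "b","a","a"),   # row c
              nrow = 3, byrow = TRUE, dimnames = list(elems, elems))
comp <- function(x, y) tab[x, y]
# Commutativity holds for all pairs:
all(outer(elems, elems, Vectorize(function(x, y) comp(x,y) == comp(y,x))))
# Associativity fails, as in (B.22):
comp(comp("c","b"), "a")   # "b"
comp("c", comp("b","a"))   # "a"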
C Groups

In Ex. 4.9(b), we defined the symmetric group S_M of bijective maps from M into M (i.e. of so-called permutations). It is a remarkable result that every group G is isomorphic to a subgroup of the permutation group S_G and the proof is surprisingly simple (cf. Th. C.2 below). On the other hand, this result seems to have surprisingly few useful applications, partly due to the fact that S_G is usually much bigger (and, thus, usually more difficult to study) than G (cf. Prop. C.4 below). We start with a preparatory lemma:

Lemma C.1. Let M, N be sets and let ψ : M −→ N be bijective. Then the symmetric groups S_M and S_N are isomorphic.

Proof. Exercise.

Theorem C.2 (Cayley). Let (G, ·) be a group. Then G is isomorphic to a subgroup of the symmetric group (S_G, ∘) of permutations on G. In particular, every finite group with n elements is isomorphic to a subgroup of S_n.

Proof. Exercise. Hint: Show that

φ : G −→ S_G,  a ↦ f_a,  f_a(x) := ax, (C.1)

defines a monomorphism and use Lem. C.1.

Notation C.3. Let M, N be sets. Define S(M, N) := {(f : M −→ N) : f bijective}.

Proposition C.4. Let M, N be sets with #M = #N = n ∈ ℕ_0, S := S(M, N) (cf. Not. C.3). Then #S = n!; in particular, #S_M = n!.

Proof. We conduct the proof via induction: If n = 0, then S contains precisely the empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a}, N = {b}, then S contains precisely the map f : M −→ N, f(a) = b, and #S = 1 = 1! is true. For the induction step, fix n ∈ ℕ and assume #M = #N = n + 1. Let a ∈ M and

A := ⋃_{b∈N} S(M ∖ {a}, N ∖ {b}). (C.2)

Since the union in (C.2) is finite and disjoint, one has

#A = Σ_{b∈N} #S(M ∖ {a}, N ∖ {b}) = Σ_{b∈N} n! = (n + 1) · n! = (n + 1)! (C.3)

(using the induction hypothesis for the second equality). Thus, it suffices to show

φ : S −→ A,  φ(f) : M ∖ {a} −→ N ∖ {f(a)},  φ(f) := f↾M∖{a}, (C.4)

is well-defined and bijective. If f : M −→ N is bijective, then f↾M∖{a} : M ∖ {a} −→ N ∖ {f(a)} is bijective as well, i.e. φ is well-defined. Suppose f, g ∈ S with f ≠ g. If f(a) ≠ g(a), then φ(f) ≠ φ(g), as they have different codomains. If f(a) = g(a), then there exists x ∈ M ∖ {a} with f(x) ≠ g(x), implying

φ(f)(x) = f(x) ≠ g(x) = φ(g)(x),

i.e., once again, φ(f) ≠ φ(g). Thus, φ is injective. Now let h ∈ S(M ∖ {a}, N ∖ {b}) for some b ∈ N. Letting

f : M −→ N,  f(x) := b for x = a,  f(x) := h(x) for x ≠ a,

we have φ(f) = h, showing φ to be surjective as well.

D Number Theory

The algebraic discipline called number theory studies properties of the ring of integers (and similar algebraic structures).

Theorem D.1 (Remainder Theorem). For each pair of numbers (a, b) ∈ ℕ², there exists a unique pair of numbers (q, r) ∈ ℕ_0² satisfying the two conditions a = qb + r and 0 ≤ r < b.

Proof. Existence: Define

q := max{n ∈ ℕ_0 : nb ≤ a}, (D.1a)
r := a − qb. (D.1b)

Then q ∈ ℕ_0 by definition and (D.1b) immediately yields a = qb + r as well as r ∈ ℤ. Moreover, from (D.1a), qb ≤ a = qb + r, i.e. 0 ≤ r, in particular, r ∈ ℕ_0. Since (D.1a) also implies (q + 1)b > a = qb + r, we also have b > r as required.

Uniqueness: Suppose (q_1, r_1) ∈ ℕ_0² satisfies the two conditions a = q_1 b + r_1 and 0 ≤ r_1 < b. Then q_1 b = a − r_1 ≤ a and (q_1 + 1)b = a − r_1 + b > a, showing q_1 = max{n ∈ ℕ_0 : nb ≤ a} = q. This, in turn, implies r_1 = a − q_1 b = a − qb = r, thereby establishing uniqueness.
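As a small illustration (added here; not part of the notes’ code files of Appendix F): in R, the operators %/% and %% compute precisely the pair (q, r) of Th. D.1:

a <- 27; b <- 6
q <- a %/% b            # q = 4 = max{n : n*b <= a}
r <- a %% b             # r = 3
a == q*b + r            # TRUE
(0 <= r) && (r < b)     # TRUE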
Definition D.2. (a) Let a, k ∈ ℤ. We define a to be a divisor of k (and also say that a divides k, denoted a | k) if, and only if, there exists b ∈ ℤ such that k = ab. If a is no divisor of k, then we write a ∤ k.

(b) A number p ∈ ℕ, p ≥ 2, is called prime if, and only if, 1 and p are its only positive divisors.

(c) Let M be a nonempty set of integers. If M ≠ {0}, then the number

gcd(M) := max{n ∈ ℕ : ∀ k ∈ M : n | k} (D.2)

is called the greatest common divisor of the numbers in M. If M = {a, b}, then one also writes gcd(a, b) := gcd(M). The numbers in M are called relatively prime or mutually prime if, and only if, gcd(M) = 1. If M is finite and 0 ∉ M, then the number

lcm(M) := min{n ∈ ℕ : ∀ k ∈ M : k | n} (D.3)

is called the least common multiple of the numbers in M. If M = {a, b}, then one also writes lcm(a, b) := lcm(M).

Remark D.3. Let M be a nonempty set of integers. If M ≠ {0}, then gcd(M) is well-defined, since, for each k ∈ ℤ, 1 | k, and, if 0 ≠ k ∈ M, then gcd(M) ≤ |k|. If M is finite and 0 ∉ M, then lcm(M) is well-defined, since max{|k| : k ∈ M} ≤ lcm(M) ≤ ∏_{k∈M} |k|. Some examples are gcd(ℤ) = 1, gcd(3ℤ) = 3, lcm(2, 3) = 6, lcm(4, 8) = 8, gcd({8, 12, 20}) = 4, lcm({8, 12, 20}) = 120.

Theorem D.4 (Bézout’s Lemma). If a, b ∈ ℤ with {a, b} ≠ {0} and d := gcd(a, b), then

{xa + yb : x, y ∈ ℤ} = dℤ. (D.4)

In particular,

∃ x, y ∈ ℤ : xa + yb = d, (D.5)

which is known as Bézout’s identity. An important special case is

gcd(a, b) = 1  ⇒  ∃ x, y ∈ ℤ : xa + yb = 1. (D.6)

Proof. Clearly, it suffices to prove (D.4). To prove (D.4), we show

S := {xa + yb : x, y ∈ ℤ} ∩ ℕ = dℕ. (D.7)

Since {a, b} ≠ {0}, we may assume a ≠ 0 (otherwise, interchange the roles of a and b in the following). By setting y := 0 and x := ±1, we see that S contains a or −a, i.e. S ≠ ∅. Let d̃ := min S. Then there exist s, t ∈ ℤ with sa + tb = d̃. It remains to show d̃ = gcd(a, b) = d: We apply Th. D.1 to obtain (q, r) ∈ ℕ_0² with

|a| = q d̃ + r ∧ 0 ≤ r < d̃.

Then, letting s̃ := s for a = |a| and s̃ := −s for a = −|a|, we have

r = |a| − q d̃ = |a| − q(|a| s̃ + bt) = |a|(1 − q s̃) − bqt ∈ S ∪ {0}.

Since d̃ = min S and r < d̃, this yields r = 0 and d̃ | a. Using b instead of a in the above reasoning yields d̃ | b as well, showing d̃ to be a common divisor of a and b. Now let c ∈ ℕ be an arbitrary common divisor of a and b. Then there exist u, v ∈ ℤ such that a = cu and b = cv, implying

d̃ = as + bt = cus + cvt = c(us + vt).

In consequence, c | d̃, implying c ≤ d̃, showing d̃ = gcd(a, b) = d, as desired. Finally, every element of S is a multiple of d (as d | a and d | b), and, conversely, nd = n(sa + tb) = (ns)a + (nt)b ∈ S for each n ∈ ℕ, proving (D.7) and, thus, (D.4).

Theorem D.5 (Euclid’s Lemma). Let a, b ∈ ℤ and n ∈ ℕ.

(a) If n and a are relatively prime (i.e. gcd(n, a) = 1), then

n | ab  ⇒  n | b.

(b) If n is prime, then

n | ab  ⇒  (n | a ∨ n | b).

Proof. (a): As gcd(n, a) = 1, from (D.6), we obtain x, y ∈ ℤ such that xn + ya = 1. Multiplying this equation by b yields xnb + yab = b. Moreover, by hypothesis, there exists c ∈ ℤ with ab = nc, implying n(xb + yc) = b, i.e. n | b as claimed.

(b): Suppose n is prime, n | ab, and n ∤ a. As n is prime, the only positive divisors of n are 1 and n. As n ∤ a, we know gcd(n, a) = 1, implying n | b by (a).

Theorem D.6 (Fundamental Theorem of Arithmetic). Every n ∈ ℕ, n ≥ 2, has a unique factorization into prime numbers (unique, except for the order of the primes). More precisely, given n as above, there exists a unique function k_n : P_n −→ ℕ such that P_n ⊆ {p ∈ ℕ : p prime} and

n = ∏_{p∈P_n} p^{k_n(p)} (D.8)

(then, clearly, P_n is necessarily finite).

Proof. Existence: This is a simple induction proof: The base case is n = 2. As 2 is clearly prime, we can let P_2 := {2} and k_2(2) := 1. For the induction step, let n > 2. If n is prime, then, as in the base case, let P_n := {n} and k_n(n) := 1. If n is not prime, then there exist a, b ∈ ℕ with 1 < a ≤ b < n and n = ab. By induction hypothesis, there exist functions k_a : P_a −→ ℕ, k_b : P_b −→ ℕ, such that P_a, P_b ⊆ {p ∈ ℕ : p prime} and

a = ∏_{p∈P_a} p^{k_a(p)},  b = ∏_{p∈P_b} p^{k_b(p)}.

Letting P_n := P_a ∪ P_b and

k_n : P_n −→ ℕ,  k_n(p) := k_a(p) for p ∈ P_a ∖ P_b,  k_n(p) := k_b(p) for p ∈ P_b ∖ P_a,  k_n(p) := k_a(p) + k_b(p) for p ∈ P_a ∩ P_b,

we obtain

n = ab = (∏_{p∈P_a} p^{k_a(p)}) (∏_{p∈P_b} p^{k_b(p)}) = (∏_{p∈P_a∖P_b} p^{k_a(p)}) (∏_{p∈P_b∖P_a} p^{k_b(p)}) (∏_{p∈P_a∩P_b} p^{k_a(p)+k_b(p)}) = ∏_{p∈P_n} p^{k_n(p)},

thereby completing the induction proof.

Uniqueness: Here we will make use of Th. D.5(b). As for existence, the proof is via induction on n. Since n = 2 has precisely the positive divisors 1 and 2, uniqueness for the base case is clear. In the same way, uniqueness is clear for every prime n > 2. Thus, for the induction step, it only remains to consider the case, where n > 2 is not prime. Suppose, we have P, Q ⊆ {p ∈ ℕ : p prime} and k : P −→ ℕ, l : Q −→ ℕ, such that

n = ∏_{p∈P} p^{k(p)} = ∏_{q∈Q} q^{l(q)}.

Let p_0 ∈ P. Since p_0 is prime and p_0 | n = ∏_{q∈Q} q^{l(q)}, Th. D.5(b) implies p_0 | q_0 for some q_0 ∈ Q. As p_0 and q_0 are both prime, this implies p_0 = q_0 and

m := n/p_0 = p_0^{k(p_0)−1} ∏_{p∈P∖{p_0}} p^{k(p)} = p_0^{l(p_0)−1} ∏_{q∈Q∖{p_0}} q^{l(q)}.

Since n is not prime, we know m ≥ 2. As we know m < n, we may use the induction hypothesis to conclude P = Q, k(p) = l(p) for each p ∈ P ∖ {p_0}, and k(p_0) − 1 = l(p_0) − 1, i.e. k = l as desired.
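For illustration, the existence part of the proof translates into a naive factorization algorithm; the following R sketch (added here, not part of the notes’ code files; the function name primeFactors is hypothetical) computes the exponents k_n(p) of (D.8) by repeated trial division:

primeFactors <- function(n)
{
  stopifnot(n == round(n), n >= 2)
  fac <- integer(0)                   # multiset of prime factors of n
  d <- 2
  while (d * d <= n) {
    while (n %% d == 0) { fac <- c(fac, d); n <- n %/% d }
    d <- d + 1
  }
  if (n > 1) fac <- c(fac, n)         # leftover factor is prime
  table(fac)                          # the exponents k_n(p) of (D.8)
}
primeFactors(360)   # primes 2, 3, 5 with exponents 3, 2, 1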
Define p := ∏_{q∈P} q and p̃ := p + 1. Then p̃ ∉ P, since p̃ > q for each q ∈ P. Thus, if p̃ is prime, we are already done. If p̃ is not prime, then there exists a prime n such that n | p̃ (this is immediate from Th. D.6, but also follows easily without Th. D.6, since p̃ can only have finitely many factors a with 1 < a < p̃). If n ∈ P, then there is a ∈ ℕ such that na = p. As there is also b ∈ ℕ such that nb = p̃, we have 1 = p̃ − p = nb − na = n(b − a), implying n = b − a = 1, in contradiction to 1 ∉ P. Thus, n ∉ P, completing the proof.

E Vector Spaces

E.1 Cardinality and Dimension

Theorem E.1. Let V be a vector space over the field F and let B be a basis of V.

(a) #V = #F^B_fin, i.e. there exists a bijective map φ : V −→ F^B_fin, where F^B_fin is defined as in Ex. 5.16(c).

(b) If B is infinite and #F ≤ #B (i.e. there exists an injective map φ_F : F −→ B), then

#V = #F^B_fin = #2^B_fin = #P_fin(B) = #B, (E.1)

i.e. there exist bijective functions

φ_1 : V −→ F^B_fin,  φ_2 : F^B_fin −→ {0, 1}^B_fin,  φ_3 : {0, 1}^B_fin −→ P_fin(B),  φ_4 : P_fin(B) −→ B.

Proof. (a): Given v ∈ V and b ∈ B, let c_v(b) denote the coordinate of v with respect to b according to Th. 5.19. Then

φ : V −→ F^B_fin,  φ(v)(b) := c_v(b),

is well-defined and bijective by Th. 5.19.

(b): The map φ_1 exists according to (a), whereas φ_2, φ_3, φ_4 exist according to Th. A.60.

Corollary E.2. Let F be a field and let M be an infinite set. If #F ≤ #M (i.e. there exists an injective map φ_F : F −→ M), then

dim F^M = #2^M (as vector space over F), (E.2)

i.e. if B is a basis of F^M, then there exists a bijective map φ : B −→ P(M). In particular, we have

dim ℝ^ℝ = #2^ℝ (as vector space over ℝ), (E.3a)
dim (ℤ_2)^ℕ = #2^ℕ = #ℝ (as vector space over ℤ_2), (E.3b)

where #2^ℕ = #ℝ holds by [Phi16, Th. F.2].

Proof. Let B be a basis of F^M. Since F^M_fin is a subspace of F^M and dim F^M_fin = #M by (5.36), there exists an injective map φ_M : M −→ B by Th. 5.27(b). In consequence, the map (φ_M ∘ φ_F) : F −→ B is injective as well. Thus, Th. E.1(b) applies and proves (E.2). For (E.3a), we apply (E.2) with F = M = ℝ; for (E.3b), we apply (E.2) with F = ℤ_2 and M = ℕ.

E.2 Cartesian Products

Example E.3. In generalization of Ex. 5.2(d), if I is a nonempty index set, F is a field, and (Y_i)_{i∈I} is a family of vector spaces over F, then we make the Cartesian product V := ∏_{i∈I} Y_i into a vector space over F by defining, for each a := (a_i)_{i∈I}, b := (b_i)_{i∈I} ∈ V:

(a + b)_i := a_i + b_i, (E.4a)
(λa)_i := λa_i for each λ ∈ F. (E.4b)

To verify that (V, +, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i), i.e. by showing (V, +) is a commutative group: Let n = (n_i)_{i∈I} ∈ V, n_i := 0 ∈ Y_i. Then

∀ i ∈ I ∀ a = (a_i)_{i∈I} ∈ V :  (a + n)_i = a_i + 0 = a_i,

i.e. a + n = a, showing n is the additive neutral element of V. Next, given a = (a_i)_{i∈I} ∈ V, define ā = (ā_i)_{i∈I}, ā_i := −a_i. Then

∀ i ∈ I :  (a + ā)_i = a_i + (−a_i) = 0,

i.e. a + ā = n = 0, showing ā to be the additive inverse element of a (as usual, one then writes −a instead of ā). Associativity is verified since

∀ (a_i)_{i∈I}, (b_i)_{i∈I}, (c_i)_{i∈I} ∈ V ∀ i ∈ I :  ((a + b) + c)_i = (a_i + b_i) + c_i =(∗) a_i + (b_i + c_i) = (a + (b + c))_i,

where the associativity of addition in the vector space Y_i was used at (∗). Likewise, commutativity is verified since

∀ (a_i)_{i∈I}, (b_i)_{i∈I} ∈ V ∀ i ∈ I :  (a + b)_i = a_i + b_i =(∗) b_i + a_i = (b + a)_i,

where the commutativity of addition in the vector space Y_i was used at (∗). This completes the proof that (V, +) is a commutative group. To verify Def. 5.1(ii), one computes

∀ λ ∈ F ∀ (a_i)_{i∈I}, (b_i)_{i∈I} ∈ V ∀ i ∈ I :  (λ(a + b))_i = λ(a_i + b_i) =(∗) λa_i + λb_i = (λa + λb)_i,

where distributivity in the vector space Y_i was used at (∗), proving λ(a + b) = λa + λb. Similarly,

∀ λ, µ ∈ F ∀ (a_i)_{i∈I} ∈ V ∀ i ∈ I :  ((λ + µ)a)_i = (λ + µ)a_i =(∗) λa_i + µa_i = (λa + µa)_i,

where, once more, distributivity in the vector space Y_i was used at (∗), proving (λ + µ)a = λa + µa. The proof of Def. 5.1(iii) is given by

∀ λ, µ ∈ F ∀ (a_i)_{i∈I} ∈ V ∀ i ∈ I :  ((λµ)a)_i = (λµ)a_i =(∗) λ(µa_i) = (λ(µa))_i,

where the validity of Def. 5.1(iii) in the vector space Y_i was used at (∗). Finally, to prove Def. 5.1(iv), one computes

∀ (a_i)_{i∈I} ∈ V ∀ i ∈ I :  (1 · a)_i = 1 · a_i =(∗) a_i,

where the validity of Def. 5.1(iv) in the vector space Y_i was used at (∗).
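For illustration (added here, not part of the notes’ code files): in the special case I = {1, …, n} and Y_i = ℝ for all i ∈ I, the product vector space V is ℝⁿ, and R’s componentwise vector arithmetic implements precisely (E.4a) and (E.4b); coordinate subsetting then realizes the projections of Ex. E.4 below:

a <- c(1, 2, 3); b <- c(10, 20, 30); lambda <- 2
a + b         # (E.4a): componentwise addition, 11 22 33
lambda * a    # (E.4b): componentwise scalar multiplication, 2 4 6
J <- c(1, 3)                # projection onto the coordinates in J
(a + b)[J] == a[J] + b[J]   # TRUE TRUE: the projection is linear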
Example E.4. In generalization of Ex. 6.7(c), we consider general projections, defined on a general Cartesian product of vector spaces as in Ex. E.3 above: Let I be a nonempty index set, let (Y_i)_{i∈I} be a family of vector spaces over the field F, and let V := ∏_{i∈I} Y_i be the Cartesian product vector space as defined in Ex. E.3. For each ∅ ≠ J ⊆ I, define

π_J : V −→ V_J := ∏_{i∈J} Y_i,  π_J((a_i)_{i∈I}) := (a_i)_{i∈J},

calling π_J the projection onto the coordinates in J. We verify π_J to be linear: Let λ ∈ F and a := (a_i)_{i∈I}, b := (b_i)_{i∈I} ∈ V. Then

π_J(λa) = (λa_i)_{i∈J} = λ(a_i)_{i∈J} = λπ_J(a),
π_J(a + b) = (a_i + b_i)_{i∈J} = (a_i)_{i∈J} + (b_i)_{i∈J} = π_J(a) + π_J(b),

proving π_J to be linear. Moreover, for each ∅ ≠ J ⊆ I, we have

Im π_J = V_J,  ker π_J = {(a_i)_{i∈I} : a_i = 0 for each i ∈ J} ≅ V_{I∖J},  V = V_J ⊕ ker π_J, (E.5)

where, for the last equality, we identified V_J ≅ {(a_i)_{i∈I} : a_i = 0 for each i ∉ J}.

F Source Code of Algorithms Implemented in R

In the following, we present the source code of implementations for a number of algorithms considered in the main text. All implementations are coded in R (cf. [R C20]).
F.1 Linear Systems

F.1.1 Tools

The following code is available for download at
https://www.math.lmu.de/~philip/code/LA1/linSysTools.r

# Provide functions that are used as tools in the context
# of linear systems:

# Define a function that extracts the tail end of a vector:
vecTail <- function(k,v)
{
  index <- 1:length(v)
  v[index >= k]
}

# Define a function that extracts the front part of a vector:
vecFront <- function(k,v)
{
  index <- 1:length(v)
  v[index <= k]
}

# Define a function that extracts the rows of the upper
# triangular part of a matrix and returns them as a list
# of vectors:
matUpTriRowsList <- function(mat)
{
  m <- nrow(mat)
  indexList <- as.list(1:m)
  # Store rows of mat in a list:
  rowList <- as.list(as.data.frame(t(mat)))
  mapply(vecTail, indexList, rowList, SIMPLIFY=FALSE)
}

# Define a function that extracts the rows of the lower
# triangular part of a matrix and returns them as a list
# of vectors:
matLowTriRowsList <- function(mat)
{
  m <- nrow(mat)
  indexList <- as.list(1:m)
  # Store rows of mat in a list:
  rowList <- as.list(as.data.frame(t(mat)))
  mapply(vecFront, indexList, rowList, SIMPLIFY=FALSE)
}
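A short interactive check of the tool functions (an added illustration, assuming the file above has been downloaded into the working directory; this snippet is not part of the file itself):

source("linSysTools.r")
vecTail(3, c(10, 20, 30, 40))    # 30 40
vecFront(2, c(10, 20, 30, 40))   # 10 20
A <- matrix(1:9, 3, 3)
matUpTriRowsList(A)   # list of the rows (1,4,7), (5,8), (9)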
F.1.2 Back and Forward Substitution

The following constitutes code for an R implementation that solves linear systems Ax = b with unique solution over ℝ and ℂ via back substitution (cf. Alg. 8.10) in case of given upper triangular matrix A and via forward substitution in case of given lower triangular matrix A, providing the respective functions backSub and forwardSub. The used functions matUpTriRowsList and matLowTriRowsList can be found in Sec. F.1.1 above. The following code is available for download at
https://www.math.lmu.de/~philip/code/LA1/linSysBackAndForSub.r

# Define functions to solve a triangular linear system:
# For an upper triangular matrix via back substitution;
# for a lower triangular matrix via forward substitution:

# Load code from file linSysTools.r to provide functions
# matUpTriRowsList and matLowTriRowsList:
source("linSysTools.r")

backSubStep <- function(xvec,row)
{
  l <- length(row)
  index <- 1:l
  akkinv <- 1/row[1]
  b <- row[l]
  if(l==2)
  {
    xnew <- akkinv*b
  }
  else if (l>2)
  {
    rowpart <- row[1<index & index<l]
    xnew <- akkinv*b-akkinv*sum(rowpart*xvec)
  }
  c(xnew,xvec)  # Return augmented solution vector.
}

backSub <- function(upTriMat,b)
{
  # Create augmented matrix of linear system Ax=b:
  Ab <- cbind(upTriMat,b)
  # Create and store list that contains the rows of the upper
  # triangular part of the augmented matrix in reverse order:
  rowList <- rev(matUpTriRowsList(Ab))
  # Create an empty vector to use as initial value in the following
  # call (such that, in the initial step, backSubStep is applied
  # to the empty vector and the first vector in rowList):
  emptyVec <- vector("numeric",0)
  # The following call successively applies backSubStep to the
  # vectors in rowList, in each step using the result of the previous
  # step as backSubStep’s first argument:
  Reduce(backSubStep, rowList, init=emptyVec)
}

# For technical reasons, the component b of the right-hand
# side of the linear system is stored in the first entry
# of row (cf. definition of function forwardSub below):
forwardSubStep <- function(xvec,row)
{
  l <- length(row)
  index <- 1:l
  akkinv <- 1/row[l]
  b <- row[1]
  if(l==2)
  {
    xnew <- akkinv*b
  }
  else if (l>2)
  {
    rowpart <- row[1<index & index<l]
    xnew <- akkinv*b-akkinv*sum(rowpart*xvec)
  }
  c(xvec,xnew)  # Return augmented solution vector.
}

forwardSub <- function(lowTriMat,b)
{
  # Create and store list that contains the rows of the lower
  # triangular part of lowTriMat:
  rowList <- matLowTriRowsList(lowTriMat)
  # Adjoin the entries of b as new first elements to the
  # rows in rowList:
  rowList <- mapply(c,as.list(as.vector(b)),rowList,SIMPLIFY=FALSE)
  # Create an empty vector to use as initial value in the following
  # call (such that, in the initial step, forwardSubStep is applied
  # to the empty vector and the first vector in rowList):
  emptyVec <- vector("numeric",0)
  # The following call successively applies forwardSubStep to the
  # vectors in rowList, in each step using the result of the previous
  # step as forwardSubStep’s first argument:
  Reduce(forwardSubStep, rowList, init=emptyVec)
}
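A small usage example (added here; not part of the downloadable file), solving an upper triangular system Rx = b via backSub:

source("linSysBackAndForSub.r")
R <- matrix(c(2, 1, -1,
              0, 3,  2,
              0, 0,  4), 3, 3, byrow = TRUE)
b <- c(2, 10, 8)
x <- backSub(R, b)
x           # 1 2 2
R %*% x     # reproduces b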
F.1.3 Gaussian Elimination and LU Decomposition

The following constitutes code for an R implementation that uses the Gaussian elimination Alg. 8.17 to solve linear systems Ax = b over ℝ and ℂ by transforming the system into an equivalent system in echelon form. More precisely, function gaussEl uses the augmented version of the Gaussian elimination algorithm of Sec. 8.4.4 to compute an LU decomposition of A, and, in modification of Alg. 8.17(b), the so-called column maximum strategy is applied for row switching (implemented in function gaussElStep): One switches rows r and i ≥ r such that the pivot element in row i (before the switch) has maximum absolute value (cf. [Phi23, Def. 5.5]). Function gaussElSolve applies gaussEl to the augmented matrix (A|b) to solve Ax = b, whereas function gaussElSolveFromLU uses a precomputed LU decomposition of A together with the strategies described in Rem. 8.36 and Rem. 8.25 to solve Ax = b. Function matInverseViaLU applies gaussElSolveFromLU to compute the inverse matrix of invertible A. The used functions backSub and forwardSub can be found in Sec. F.1.2 above. The following code is available for download at
https://www.math.lmu.de/~philip/code/LA1/linSysGaussAndLU.r

# Define functions to apply the Gaussian elimination algorithm
# and to compute LU decompositions:

# Load code from file linSysBackAndForSub.r to provide functions
# backSub and forwardSub:
source("linSysBackAndForSub.r")

# Define a function that, given matrix mat, performs one step of
# Gaussian elimination in column with number c, assuming upper r
# rows are already in echelon form and mat[r,c]!=0, by replacing
# the i-th row with the i-th row plus (-mat[i,c]/mat[r,c]) times
# the r-th row:
gaussElColStep <- function(mat,r,c,i)
{
  fac <- -mat[i,c]/mat[r,c]
  m <- mat
  m[i,] <- m[i,]+fac*m[r,]
  m  # Return matrix m.
}

# Define a function that, given matrix mat, performs Gaussian
# elimination in column with number c, assuming upper r
# rows are already in echelon form and mat[r,c]!=0:
gaussElCol <- function(mat,r,c)
{
  index <- (r+1):nrow(mat)
  # The following call successively applies gaussElColStep
  # to the rows with numbers r+1 through nrow(mat), in each
  # step using the result of the previous step as
  # gaussElColStep’s first argument - the syntax of the call
  # is slightly complicated by the fact that Reduce needs to
  # be applied to a binary function:
  Reduce(function(...) gaussElColStep(...,r=r,c=c), index, init=mat)
}

# Define a function that, given a list PLAandr, consisting of
# a matrix Atilde, a matrix L, a matrix P, and a row
# number r, performs one step of the Gaussian elimination
# algorithm on Atilde, namely elimination in column with
# number c (if necessary), assuming the upper r rows are
# already in echelon form. Row switching is
# performed according to the so-called column maximum
# strategy, where one switches rows r and i>=r such that
# the pivot element in row i (before the switch) has
# maximum absolute value.
# Together with Atilde, the function also updates the
# matrices L and P, performing one step in the determination
# of an LU decomposition of the form P*A = L*Atilde:
gaussElStep <- function(PLAandr,c)
{
  PLAandrNew <- PLAandr
  r <- PLAandr$r
  m <- nrow(PLAandr$Atilde)
  if(r<m)
  {
    Atilde <- PLAandr$Atilde
    L <- PLAandr$L
    index <- r:m
    imax <- which.max(abs(Atilde[index,c]))+r-1
    ival <- Atilde[imax,c]
    if(ival!=0)
    {
      # Increase row index in PLAandrNew:
      PLAandrNew$r <- r+1
      if(r!=imax)
      {
        P <- PLAandr$P
        # Switch r-th row with imax-th row:
        Atilde[c(r,imax),] <- Atilde[c(imax,r),]
        P[c(r,imax),] <- P[c(imax,r),]
        L[c(r,imax),] <- L[c(imax,r),]
        PLAandrNew$P <- P
      }
      # Eliminate in column with number c of Atilde,
      # store result matrix in PLAandrNew and update L
      # accordingly (this needs to be done first):
      L[(r+1):m,r] <- (1/Atilde[r,c])*Atilde[(r+1):m,c]
      PLAandrNew$Atilde <- gaussElCol(Atilde,r,c)
      PLAandrNew$L <- L
    }
  }
  PLAandrNew  # Return PLAandrNew.
}
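# (Added editorial remark, not contained in the downloadable file:)
# Throughout the Reduce-iteration in gaussEl below, each call of
# gaussElStep preserves the invariant
#     P %*% A = (L + Id) %*% Atilde,
# where A is gaussEl's input matrix and the diagonal of L is still
# zero during the iteration; gaussEl's final statement
# res$L <- res$L + diag(m) makes the unipotent factor explicit,
# yielding the decomposition P*A = L*Atilde of Sec. 8.4.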
# Define a function that, given matrix A, performs Gaussian
# elimination, determining an LU decomposition of the form
# P*A = L*Atilde, where Atilde is in echelon form,
# L is a unipotent lower triangular matrix, and
# P is a permutation matrix:
gaussEl <- function(A)
{
  m <- nrow(A)
  index <- 1:ncol(A)
  PLAandr <- list(Atilde=A,P=diag(m),L=0*diag(m),r=1)
  # The following call successively applies gaussElStep
  # with c running through the entries of index, in each
  # step using the result of the previous step as
  # gaussElStep’s first argument:
  res <- Reduce(gaussElStep, index, init=PLAandr)
  res$L <- res$L+diag(m)  # Put 1’s on diagonal of L.
  res  # Return result list.
}

# Define a function that, given matrix A and a right-hand side
# vector b, attempts to solve the linear system Ax=b for x,
# using Gaussian elimination, returning solution vector x:
gaussElSolve <- function(A,b)
{
  # Create augmented matrix of linear system Ax=b:
  mat <- cbind(A,b)
  # Perform Gaussian elimination to obtain upper triangular
  # matrix E in echelon form:
  E <- gaussEl(mat)$Atilde
  # Extract upper triangular matrix R and new right-hand
  # side bnew for back substitution:
  ncolR <- ncol(E)-1
  R <- E[,1:ncolR]
  bnew <- E[,ncol(E)]
  # Solve Rx=bnew using back substitution:
  backSub(R,bnew)
}

# Define a function that, given a list containing an LU
# decomposition P*A=L*Atilde of the matrix A and a
# right-hand side vector b, attempts to solve the linear
# system Ax=b by first solving Lz=Pb for z via forwardSub,
# then solving Atilde*x=z for x via backSub,
# returning solution vector x:
gaussElSolveFromLU <- function(LU,b)
{
  z <- forwardSub(LU$L,LU$P%*%b)
  backSub(LU$Atilde,z)
}

# Define a function that, given a list containing an LU
# decomposition P*A=L*Atilde of an invertible matrix A,
# computes and returns the inverse of A:
matInverseFromLU <- function(LU)
{
  n <- ncol(LU$L)
  # Create list of n standard unit vectors:
  rhs <- as.list(as.data.frame(diag(n)))
  matrix(unlist(lapply(rhs,gaussElSolveFromLU,LU=LU)),n,n)
}

# Define a function that, given an invertible matrix A,
# computes and returns the inverse of A, first computing an
# LU decomposition of A:
matInverseViaLU <- function(A)
{
  LU <- gaussEl(A)
  matInverseFromLU(LU)
}
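As a quick consistency check (an added illustration, not part of the downloadable files), one may verify the decomposition property P·A = L·Ã numerically for a random matrix:

source("linSysGaussAndLU.r")
set.seed(1)
A <- matrix(rnorm(16), 4, 4)
LU <- gaussEl(A)
max(abs(LU$P %*% A - LU$L %*% LU$Atilde))   # ~ 1e-16, i.e. zero up to rounding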
The following code is available for download at https://www.math.lmu.de/~philip/code/LA1/linSysTest.r # Testing some functions for the solution of linear systems: # Load code from file linSysGaussAndLU.r: source("linSysGaussAndLU.r") A0 <- matrix(c(6,0,1,0,2,-1,1,0,-1,4,6,0),3,4) print("A0 =") print(A0) LU <- gaussEl(A0) print("LU decompostion of A0") print(LU) A <- matrix(c(6,0,1,0,2,-1,1,0,-1),3,3) print("A =") print(A) b <- c(4,6,0) b <- t(t(b)) print("b =") print(b) x <- gaussElSolve(A,b) x <- t(t(x)) print("Ax=b is solved by") print("x =") print(x) A1 <- matrix(c(1,0,2,1,4,0,6,2,2,1,3,1,3,4,1,0),4,4) print("A1 =") print(A1) LU <- gaussEl(A1) A1invLU <- matInverseFromLU(LU) print("Inverse of A1 via LU:") AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 REFERENCES 243 print("4*A1invLU =") print(4*A1invLU) print("A1*A1invLU =") print(A1%*%A1invLU) b <- c(-1,2,-3,4) b <- t(t(b)) print("b =") print(b) xlr <- gaussElSolveFromLU(LU,b) xlr <- t(t(xlr)) x <- xlr # x <- gaussElSolve(A1,b) # x <- t(t(x)) print("A1*x=b is solved by (found via LU)") print("x =") print(x) References [Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary Mathematics 31 (1984), 31–33. [EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die mathematische Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007 (German). [Hal17] Lorenz J. Halbeisen. Combinatorial Set Theory, 2nd ed. Springer Monographs in Mathematics, Springer, Cham, Switzerland, 2017. [Jec73] T. Jech. The Axiom of Choice. North-Holland, Amsterdam, 1973. [Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of Mathematics, Vol. 102, North-Holland, Amsterdam, 1980. [Kun12] Kenneth Kunen. The Foundations of Mathematics. Studies in Logic, Vol. 19, College Publications, London, 2012. AMS Open Math Notes: Works in Progress; Reference # OMN:202109.111304; Last Revised: 2023-03-05 11:01:10 REFERENCES 244 [Phi16] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU Munich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306. [Phi17] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNot es/philipPeter_FunctionalAnalysis.pdf. [Phi19] P. Philip. Linear Algebra II. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2019, AMS Open Math Notes Ref. # OMN:202109.111305, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111305. [Phi23] P. Philip. Numerical Mathematics I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2022/2023, AMS Open Math Notes Ref. # OMN:202204.111317, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111317. [R C20] R Core Team. R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2020, https:// www.R-project.org/. [Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Mathematik, Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German). [Wik21] Wikipedia Contributors. HOL Light — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=HOL_Light, 2021, [Online; accessed 5-February-2023]. [Wik22a] Wikipedia Contributors. Coq — Wikipedia, The Free Encyclopedia. 
References

[Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary Mathematics 31 (1984), 31–33.

[EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die mathematische Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007 (German).

[Hal17] Lorenz J. Halbeisen. Combinatorial Set Theory, 2nd ed. Springer Monographs in Mathematics, Springer, Cham, Switzerland, 2017.

[Jec73] T. Jech. The Axiom of Choice. North-Holland, Amsterdam, 1973.

[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of Mathematics, Vol. 102, North-Holland, Amsterdam, 1980.

[Kun12] Kenneth Kunen. The Foundations of Mathematics. Studies in Logic, Vol. 19, College Publications, London, 2012.

[Phi16] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU Munich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306.

[Phi17] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_FunctionalAnalysis.pdf.

[Phi19] P. Philip. Linear Algebra II. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2019, AMS Open Math Notes Ref. # OMN:202109.111305, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111305.

[Phi23] P. Philip. Numerical Mathematics I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2022/2023, AMS Open Math Notes Ref. # OMN:202204.111317, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111317.

[R C20] R Core Team. R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2020, https://www.R-project.org/.

[Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Mathematik, Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German).

[Wik21] Wikipedia Contributors. HOL Light — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=HOL_Light, 2021, [Online; accessed 5-February-2023].

[Wik22a] Wikipedia Contributors. Coq — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Coq, 2022, [Online; accessed 5-February-2023].

[Wik22b] Wikipedia Contributors. Isabelle (proof assistant) — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Isabelle_(proof_assistant), 2022, [Online; accessed 5-February-2023].

[Wik22c] Wikipedia Contributors. Lean (proof assistant) — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Lean_(proof_assistant), 2022, [Online; accessed 5-February-2023].