Linear Algebra: A Modern Introduction
Fourth Edition
David Poole, Trent University
Cengage Learning
Australia • Brazil • Mexico • Singapore • United Kingdom • United States

Linear Algebra: A Modern Introduction, 4th Edition
David Poole

Product Director: Liz Covello
Product Team Manager: Richard Stratton
Content Developer: Laura Wheel
Product Assistant: Danielle Hallock
Media Developer: Andrew Coppola
Content Project Manager: Alison Eigel Zade
Senior Art Director: Linda May
Manufacturing Planner: Doug Bertke
Rights Acquisition Specialist: Shalice Shah-Caldwell
Production Service & Compositor: MPS Limited
Text Designer: Leonard Massiglia
Cover Designer: Chris Miller
Cover & Interior design Image: Image Source/Getty Images

© 2015, 2011, 2006 Cengage Learning
WCN: 02-200-201
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions.
Further permissions questions can be emailed to permissionrequest@cengage.com.

Library of Congress Control Number: 2013944173
ISBN-13: 978-1-285-46324-7
ISBN-10: 1-285-46324-2

Cengage Learning
200 First Stamford Place, 4th Floor
Stamford, CT 06902
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at international.cengage.com/region.
Cengage Learning products are represented in Canada by Nelson Education, Ltd.
For your course and learning solutions, visit www.cengage.com.
Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com.
Instructors: Please visit login.cengage.com and log in to access instructor-specific resources.

Printed in the United States of America
1 2 3 4 5 6 7 17 16 15 14 13

Dedicated to the memory of Peter Hilton, who was an exemplary mathematician, educator, and citizen: a unit vector in every sense.
Contents

Preface
To the Instructor
To the Student

Chapter 1 Vectors
1.0 Introduction: The Racetrack Game
1.1 The Geometry and Algebra of Vectors
1.2 Length and Angle: The Dot Product
Exploration: Vectors and Geometry
1.3 Lines and Planes
Exploration: The Cross Product
Writing Project: The Origins of the Dot Product and Cross Product
1.4 Applications: Force Vectors
Chapter Review

Chapter 2 Systems of Linear Equations
2.0 Introduction: Triviality
2.1 Introduction to Systems of Linear Equations
2.2 Direct Methods for Solving Linear Systems
Writing Project: A History of Gaussian Elimination
Explorations: Lies My Computer Told Me; Partial Pivoting; Counting Operations: An Introduction to the Analysis of Algorithms
2.3 Spanning Sets and Linear Independence
2.4 Applications: Allocation of Resources; Balancing Chemical Equations; Network Analysis; Electrical Networks; Linear Economic Models; Finite Linear Games
Vignette: The Global Positioning System
2.5 Iterative Methods for Solving Linear Systems
Chapter Review

Chapter 3 Matrices
3.0 Introduction: Matrices in Action
3.1 Matrix Operations
3.2 Matrix Algebra
3.3 The Inverse of a Matrix
3.4 The LU Factorization
3.5 Subspaces, Basis, Dimension, and Rank
3.6 Introduction to Linear Transformations
Vignette: Robotics
3.7 Applications: Markov Chains; Linear Economic Models; Population Growth; Graphs and Digraphs
Chapter Review

Chapter 4 Eigenvalues and Eigenvectors
4.0 Introduction: A Dynamical System on Graphs
4.1 Introduction to Eigenvalues and Eigenvectors
4.2 Determinants
Writing Project: Which Came First: The Matrix or the Determinant?
Vignette: Lewis Carroll's Condensation Method
Exploration: Geometric Applications of Determinants
4.3 Eigenvalues and Eigenvectors of n × n Matrices
Writing Project: The History of Eigenvalues
4.4 Similarity and Diagonalization
4.5 Iterative Methods for Computing Eigenvalues
4.6 Applications and the Perron-Frobenius Theorem: Markov Chains; Population Growth; The Perron-Frobenius Theorem; Linear Recurrence Relations; Systems of Linear Differential Equations; Discrete Linear Dynamical Systems
Vignette: Ranking Sports Teams and Searching the Internet
Chapter Review

Chapter 5 Orthogonality
5.0 Introduction: Shadows on a Wall
5.1 Orthogonality in ℝⁿ
5.2 Orthogonal Complements and Orthogonal Projections
5.3 The Gram-Schmidt Process and the QR Factorization
Explorations: The Modified QR Factorization; Approximating Eigenvalues with the QR Algorithm
5.4 Orthogonal Diagonalization of Symmetric Matrices
5.5 Applications: Quadratic Forms; Graphing Quadratic Equations
Chapter Review

Chapter 6 Vector Spaces
6.0 Introduction: Fibonacci in (Vector) Space
6.1 Vector Spaces and Subspaces
Writing Project: The Rise of Vector Spaces
6.2 Linear Independence, Basis, and Dimension
Exploration: Magic Squares
6.3 Change of Basis
6.4 Linear Transformations
6.5 The Kernel and Range of a Linear Transformation
6.6 The Matrix of a Linear Transformation
Exploration: Tilings, Lattices, and the Crystallographic Restriction
6.7 Applications: Homogeneous Linear Differential Equations
Chapter Review

Chapter 7 Distance and Approximation
7.0 Introduction: Taxicab Geometry
7.1 Inner Product Spaces
Explorations: Vectors and Matrices with Complex Entries; Geometric Inequalities and Optimization Problems
7.2 Norms and Distance Functions
7.3 Least Squares Approximation
7.4 The Singular Value Decomposition
Vignette: Digital Image Compression
7.5 Applications: Approximation of Functions
Chapter Review

Chapter 8 Codes (online only)
8.1 Code Vectors
Vignette: The Codabar System
8.2 Error-Correcting Codes
8.3 Dual Codes
8.4 Linear Codes
8.5 The Minimum Distance of a Code

Appendix A Mathematical Notation and Methods of Proof
Appendix B Mathematical Induction
Appendix C Complex Numbers
Appendix D Polynomials
Appendix E Technology Bytes (online only)
Answers to Selected Odd-Numbered Exercises
Index

Preface

"The last thing one knows when writing a book is what to put first."
Blaise Pascal, Pensées, 1670

For more on the recommendations of the Linear Algebra Curriculum Study Group, see The College Mathematics Journal 24 (1993), 41-46.

The fourth edition of Linear Algebra: A Modern Introduction preserves the approach and features that users found to be strengths of the previous editions. However, I have streamlined the text somewhat, added numerous clarifications, and freshened up the exercises. I want students to see linear algebra as an exciting subject and to appreciate its tremendous usefulness. At the same time, I want to help them master the basic concepts and techniques of linear algebra that they will need in other courses, both in mathematics and in other disciplines. I also want students to appreciate the interplay of theoretical, applied, and numerical mathematics that pervades the subject. This book is designed for use in an introductory one- or two-semester course sequence in linear algebra.
First and foremost, it is intended for students, and I have tried my best to write the book so that students not only will find it readable but also will want to read it. As in the first three editions, I have taken into account the reality that students taking introductory linear algebra are likely to come from a variety of disciplines. In addition to mathematics majors, there are apt to be majors from engineering, physics, chemistry, computer science, biology, environmental science, geography, economics, psychology, business, and education, as well as other students taking the course as an elective or to fulfill degree requirements. Accordingly, the book balances theory and applications, is written in a conversational style yet is fully rigorous, and combines a traditional presentation with concern for student-centered learning.

There is no such thing as a universally best learning style. In any class, there will be some students who work well independently and others who work best in groups; some who prefer lecture-based learning and others who thrive in a workshop setting, doing explorations; some who enjoy algebraic manipulations, some who are adept at numerical calculations (with and without a computer), and some who exhibit strong geometric intuition. In this edition, I continue to present material in a variety of ways (algebraically, geometrically, numerically, and verbally) so that all types of learners can find a path to follow. I have also attempted to present the theoretical, computational, and applied topics in a flexible yet integrated way. In doing so, it is my hope that all students will be exposed to the many sides of linear algebra.

This book is compatible with the recommendations of the Linear Algebra Curriculum Study Group. From a pedagogical point of view, there is no doubt that for most students concrete examples should precede abstraction. I have taken this approach here.
I also believe strongly that linear algebra is essentially about vectors and that students need to see vectors first (in a concrete setting) in order to gain some geometric insight. Moreover, introducing vectors early allows students to see how systems of linear equations arise naturally from geometric problems. Matrices then arise equally naturally as coefficient matrices of linear systems and as agents of change (linear transformations). This sets the stage for eigenvectors and orthogonal projections, both of which are best understood geometrically. The dart that appears on the cover of this book symbolizes a vector and reflects my conviction that geometric understanding should precede computational techniques.

I have tried to limit the number of theorems in the text. For the most part, results labeled as theorems either will be used later in the text or summarize preceding work. Interesting results that are not central to the book have been included as exercises or explorations. For example, the cross product of vectors is discussed only in explorations (in Chapters 1 and 4). Unlike most linear algebra textbooks, this book has no chapter on determinants. The essential results are all in Section 4.2, with other interesting material contained in an exploration. The book is, however, comprehensive for an introductory text. Wherever possible, I have included elementary and accessible proofs of theorems in order to avoid having to say, "The proof of this result is beyond the scope of this text." The result is, I hope, a work that is self-contained.

I have not been stingy with the applications: There are many more in the book than can be covered in a single course. However, it is important that students see the impressive range of problems to which linear algebra can be applied. I have included some modern material on finite linear algebra and coding theory that is not normally found in an introductory linear algebra text.
There are also several impressive real-world applications of linear algebra and one item of historical, if not practical, interest; these applications are presented as self-contained "vignettes."

I hope that instructors will enjoy teaching from this book. More important, I hope that students using the book will come away with an appreciation of the beauty, power, and tremendous utility of linear algebra and that they will have fun along the way.

What's New in the Fourth Edition

The overall structure and style of Linear Algebra: A Modern Introduction remain the same in the fourth edition. Here is a summary of what is new:

• The applications to coding theory have been moved to the new online Chapter 8.
• To further engage students, five writing projects have been added to the exercise sets. These projects give students a chance to research and write about aspects of the history and development of linear algebra. The explorations, vignettes, and many of the applications provide additional material for student projects.
• There are over 200 new or revised exercises. In response to reviewers' comments, there is now a full proof of the Cauchy-Schwarz Inequality in Chapter 1 in the form of a guided exercise.
• I have made numerous small changes in wording to improve the clarity or accuracy of the exposition. Also, several definitions have been made more explicit by giving them their own definition boxes, and a few results have been highlighted by labeling them as theorems.
• All existing ancillaries have been updated.

See pages 49, 82, 283, 301, 443.

Features

Clear Writing Style

The text is written in a simple, direct, conversational style. As much as possible, I have used "mathematical English" rather than relying excessively on mathematical notation. However, all proofs that are given are fully rigorous, and Appendix A contains an introduction to mathematical notation for those who wish to streamline their own writing.
Concrete examples almost always precede theorems, which are then followed by further examples and applications. This flow, from specific to general and back again, is consistent throughout the book.

Key Concepts Introduced Early

Many students encounter difficulty in linear algebra when the course moves from the computational (solving systems of linear equations, manipulating vectors and matrices) to the theoretical (spanning sets, linear independence, subspaces, basis, and dimension). This book introduces all of the key concepts of linear algebra early, in a concrete setting, before revisiting them in full generality. Vector concepts such as dot product, length, orthogonality, and projection are first discussed in Chapter 1 in the concrete setting of ℝ² and ℝ³ before the more general notions of inner product, norm, and orthogonal projection appear in Chapters 5 and 7. Similarly, spanning sets and linear independence are given a concrete treatment in Chapter 2 prior to their generalization to vector spaces in Chapter 6. The fundamental concepts of subspace, basis, and dimension appear first in Chapter 3 when the row, column, and null spaces of a matrix are introduced; it is not until Chapter 6 that these ideas are given a general treatment. In Chapter 4, eigenvalues and eigenvectors are introduced and explored for 2 × 2 matrices before their n × n counterparts appear. By the beginning of Chapter 4, all of the key concepts of linear algebra have been introduced, with concrete, computational examples to support them. When these ideas appear in full generality later in the book, students have had time to get used to them and, hence, are not so intimidated by them.

Emphasis on Vectors and Geometry

In keeping with the philosophy that linear algebra is primarily about vectors, this book stresses geometric intuition. Accordingly, the first chapter is about vectors, and it develops many concepts that will appear repeatedly throughout the text.
Concepts such as orthogonality, projection, and linear combination are all found in Chapter 1, as is a comprehensive treatment of lines and planes in ℝ³ that provides essential insight into the solution of systems of linear equations. This emphasis on vectors, geometry, and visualization is found throughout the text. Linear transformations are introduced as matrix transformations in Chapter 3, with many geometric examples, before general linear transformations are covered in Chapter 6. In Chapter 4, eigenvalues are introduced with "eigenpictures" as a visual aid. The proof of Perron's Theorem is given first heuristically and then formally, in both cases using a geometric argument. The geometry of linear dynamical systems reinforces and summarizes the material on eigenvalues and eigenvectors. In Chapter 5, orthogonal projections, orthogonal complements of subspaces, and the Gram-Schmidt Process are all presented in the concrete setting of ℝ³ before being generalized to ℝⁿ and, in Chapter 7, to inner product spaces. The nature of the singular value decomposition is also explained informally in Chapter 7 via a geometric argument. Of the more than 300 figures in the text, over 200 are devoted to fostering a geometric understanding of linear algebra.

Explorations

See pages 1, 136, 427, 529.
See pages 32, 286, 460, 515, 543, 547.
See pages 83, 84, 85, 396, 398.

The introduction to each chapter is a guided exploration (Section 0) in which students are invited to discover, individually or in groups, some aspect of the upcoming chapter. For example, "The Racetrack Game" introduces vectors, "Matrices in Action" introduces matrix multiplication and linear transformations, "Fibonacci in (Vector) Space" touches on vector space concepts, and "Taxicab Geometry" sets up generalized norms and distance functions.
Additional explorations found throughout the book include applications of vectors and determinants to geometry, an investigation of 3 × 3 magic squares, a study of symmetry via the tilings of M. C. Escher, an introduction to complex linear algebra, and optimization problems using geometric inequalities. There are also explorations that introduce important numerical considerations and the analysis of algorithms. Having students do some of these explorations is one way of encouraging them to become active learners and to give them "ownership" over a small part of the course.

Applications

See pages 623, 641.
See pages 121, 226, 356, 607, 626.

The book contains an abundant selection of applications chosen from a broad range of disciplines, including mathematics, computer science, physics, chemistry, engineering, biology, business, economics, psychology, geography, and sociology. Noteworthy among these is a strong treatment of coding theory, from error-detecting codes (such as International Standard Book Numbers) to sophisticated error-correcting codes (such as the Reed-Muller code that was used to transmit satellite photos from space). Additionally, there are five "vignettes" that briefly showcase some very modern applications of linear algebra: the Global Positioning System (GPS), robotics, Internet search engines, digital image compression, and the Codabar System.

Examples and Exercises

See pages 248, 359, 526, 588.

There are over 400 examples in this book, most worked in greater detail than is customary in an introductory linear algebra textbook. This level of detail is in keeping with the philosophy that students should want (and be able) to read a textbook. Accordingly, it is not intended that all of these examples be covered in class; many can be assigned for individual or group study, possibly as part of a project.
Most examples have at least one counterpart exercise so that students can try out the skills covered in the example before exploring generalizations.

There are over 2000 exercises, more than in most textbooks at a similar level. Answers to most of the computational odd-numbered exercises can be found in the back of the book. Instructors will find an abundance of exercises from which to select homework assignments. The exercises in each section are graduated, progressing from the routine to the challenging. Exercises range from those intended for hand computation to those requiring the use of a calculator or computer algebra system, and from theoretical and numerical exercises to conceptual exercises. Many of the examples and exercises use actual data compiled from real-world situations. For example, there are problems on modeling the growth of caribou and seal populations, radiocarbon dating of the Stonehenge monument, and predicting major league baseball players' salaries. Working such problems reinforces the fact that linear algebra is a valuable tool for modeling real-life problems.

Additional exercises appear in the form of a review after each chapter. In each set, there are 10 true/false questions designed to test conceptual understanding, followed by 19 computational and theoretical exercises that summarize the main concepts and techniques of that chapter.

Biographical Sketches and Etymological Notes

See page 34.

It is important that students learn something about the history of mathematics and come to see it as a social and cultural endeavor as well as a scientific one. Accordingly, the text contains short biographical sketches about many of the mathematicians who contributed to the development of linear algebra. I hope that these will help to put a human face on the subject and give students another way of relating to the material.
I have found that many students feel alienated from mathematics because the terminology makes no sense to them; it is simply a collection of words to be learned. To help overcome this problem, I have included short etymological notes that give the origins of many of the terms used in linear algebra. (For example, why do we use the word normal to refer to a vector that is perpendicular to a plane?)

Margin Icons

The margins of the book contain several icons whose purpose is to alert the reader in various ways. Calculus is not a prerequisite for this book, but linear algebra has many interesting and important applications to calculus. The calculus icon denotes an example or exercise that requires calculus. (This material can be omitted if not everyone in the class has had at least one semester of calculus. Alternatively, this material can be assigned as projects.) The complex-numbers icon denotes an example or exercise involving complex numbers. (For students unfamiliar with complex numbers, Appendix C contains all the background material that is needed.) The CAS icon indicates that a computer algebra system (such as Maple, Mathematica, or MATLAB) or a calculator with matrix capabilities (such as almost any graphing calculator) is required (or at least very useful) for solving the example or exercise.

In an effort to help students learn how to read and use this textbook most effectively, I have noted various places where the reader is advised to pause. These may be places where a calculation is needed, part of a proof must be supplied, a claim should be verified, or some extra thought is required. A pause icon appears in the margin at such places; the message is "Slow down. Get out your pencil. Think about this."

Technology

This book can be used successfully whether or not students have access to technology.
However, calculators with matrix capabilities and computer algebra systems are now commonplace and, properly used, can enrich the learning experience as well as help with tedious calculations. In this text, I take the point of view that students need to master all of the basic techniques of linear algebra by solving by hand examples that are not too computationally difficult. Technology may then be used (in whole or in part) to solve subsequent examples and applications and to apply techniques that rely on earlier ones. For example, when systems of linear equations are first introduced, detailed solutions are provided; later, solutions are simply given, and the reader is expected to verify them. This is a good place to use some form of technology. Likewise, when applications use data that make hand calculation impractical, use technology. All of the numerical methods that are discussed depend on the use of technology.

With the aid of technology, students can explore linear algebra in some exciting ways and discover much for themselves. For example, if one of the coefficients of a linear system is replaced by a parameter, how much variability is there in the solutions? How does changing a single entry of a matrix affect its eigenvalues?

This book is not a tutorial on technology, and in places where technology can be used, I have not specified a particular type of technology. The student companion website that accompanies this book offers an online appendix called Technology Bytes that gives instructions for solving a selection of examples from each chapter using Maple, Mathematica, and MATLAB. By imitating these examples, students can do further calculations and explorations using whichever CAS they have and exploit the power of these systems to help with the exercises throughout the book, particularly those marked with the CAS icon.
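Since the text deliberately leaves the choice of technology open, here is one possible way to explore the question about changing a single matrix entry. This is a minimal sketch in Python (Python and the function name eigenvalues_2x2 are our own assumptions, not part of the book's Maple, Mathematica, or MATLAB materials). It restricts attention to 2 × 2 matrices, whose eigenvalues are the roots of the characteristic polynomial λ² - (a + d)λ + (ad - bc), so no numerical library is needed.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Return the two eigenvalues of the matrix [[a, b], [c, d]].

    They are the roots of the characteristic polynomial
    lambda^2 - (a + d) * lambda + (a*d - b*c) = 0.
    """
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt also handles a negative discriminant (complex eigenvalues).
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


# Vary the single entry c of [[1, 1], [c, 1]] and watch the eigenvalues move.
for c in [1.0, 0.25, 0.0, -0.25, -1.0]:
    l1, l2 = eigenvalues_2x2(1.0, 1.0, c, 1.0)
    print(f"c = {c:5.2f}: eigenvalues {l1:.3f} and {l2:.3f}")
```

For the matrix [[1, 1], [c, 1]] the eigenvalues are 1 ± √c, so as c decreases through 0 the two real eigenvalues merge into a repeated eigenvalue and then split into a complex conjugate pair.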
The website also contains data sets and computer code in Maple, Mathematica, and MATLAB formats keyed to many exercises and examples in the text. Students and instructors can import these directly into their CAS to save typing and eliminate errors.

Finite and Numerical Linear Algebra

See pages 83, 84, 124, 180, 311, 392, 555, 561, 568, 590.
See pages 319, 563, 600.

The text covers two aspects of linear algebra that are scarcely ever mentioned together: finite linear algebra and numerical linear algebra. By introducing modular arithmetic early, I have been able to make finite linear algebra (more properly, "linear algebra over finite fields," although I do not use that phrase) a recurring theme throughout the book. This approach provides access to the material on coding theory in Chapter 8 (online). There is also an application to finite linear games in Section 2.4 that students really enjoy. In addition to being exposed to the applications of finite linear algebra, mathematics majors will benefit from seeing the material on finite fields, because they are likely to encounter it in such courses as discrete mathematics, abstract algebra, and number theory.

All students should be aware that in practice, it is impossible to arrive at exact solutions of large-scale problems in linear algebra. Exposure to some of the techniques of numerical linear algebra will provide an indication of how to obtain highly accurate approximate solutions. Some of the numerical topics included in the book are roundoff error and partial pivoting, iterative methods for solving linear systems and computing eigenvalues, the LU and QR factorizations, matrix norms and condition numbers, least squares approximation, and the singular value decomposition.
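To see why roundoff error and partial pivoting go together, here is a minimal Python sketch (the language and the function name solve_2x2 are our own choices, not the book's) that solves a 2 × 2 system by Gaussian elimination in ordinary double-precision arithmetic, with and without partial pivoting. The tiny coefficient 1e-20 is an assumption chosen purely to make the roundoff visible.

```python
def solve_2x2(A, b, pivot=True):
    """Solve the 2 x 2 system A x = b by Gaussian elimination.

    With pivot=True, use partial pivoting: swap the two rows when the
    lower entry of column 1 is larger in absolute value, so the pivot
    is the entry of largest absolute value in its column.
    """
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    if pivot and abs(a21) > abs(a11):
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    m = a21 / a11             # multiplier for the row operation R2 - m * R1
    a22 = a22 - m * a12
    b2 = b2 - m * b1
    y = b2 / a22              # back substitution
    x = (b1 - a12 * y) / a11
    return x, y


A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
# Exactly, x = 1 / (1 - 1e-20) and y = (1 - 2e-20) / (1 - 1e-20), both very close to 1.
print(solve_2x2(A, b, pivot=False))  # (0.0, 1.0): x is completely wrong
print(solve_2x2(A, b, pivot=True))   # (1.0, 1.0)
```

Without pivoting, the multiplier is about 1e20, so the 1 and 2 in the second row are lost to rounding and the computed x collapses to 0. Choosing the larger entry as pivot keeps the multiplier at most 1 in absolute value, and the computed solution is accurate to machine precision.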
The inclusion of numerical linear algebra also brings up some interesting and important issues that are completely absent from the theory of linear algebra, such as pivoting strategies, the condition of a linear system, and the convergence of iterative methods. This book not only raises these questions but also shows how one might approach them. Gerschgorin disks, matrix norms, and the singular values of a matrix, discussed in Chapters 4 and 7, are useful in this regard.

Appendices

Appendix A contains an overview of mathematical notation and methods of proof, and Appendix B discusses mathematical induction. All students will benefit from these sections, but those with a mathematically oriented major may wish to pay particular attention to them. Some of the examples in these appendices are uncommon (for instance, Example B.6 in Appendix B) and underscore the power of the methods. Appendix C is an introduction to complex numbers. For students familiar with these results, this appendix can serve as a useful reference; for others, this section contains everything they need to know for those parts of the text that use complex numbers. Appendix D is about polynomials. I have found that many students require a refresher about these facts. Most students will be unfamiliar with Descartes's Rule of Signs; it is used in Chapter 4 to explain the behavior of the eigenvalues of Leslie matrices.

Short answers to most of the odd-numbered computational exercises are given at the end of the book. Exercise sets to accompany Appendixes A, B, C, and D are available on the companion website, along with their odd-numbered answers.
Ancillaries

For Instructors

Enhanced WebAssign®
Printed Access Card: 978-1-285-85829-6
Online Access Code: 978-1-285-85827-2

Exclusively from Cengage Learning, Enhanced WebAssign combines the exceptional mathematics content that you know and love with the most powerful online homework solution, WebAssign. Enhanced WebAssign engages students with immediate feedback, rich tutorial content, and interactive, fully customizable eBooks (YouBook), helping students to develop a deeper conceptual understanding of their subject matter. Flexible assignment options give instructors the ability to release assignments conditionally based on students' prerequisite assignment scores. Visit us at www.cengage.com/ewa to learn more.

Cengage Learning Testing Powered by Cognero

Cengage Learning Testing Powered by Cognero is a flexible, online system that allows you to author, edit, and manage test bank content from multiple Cengage Learning solutions; create multiple test versions in an instant; and deliver tests from your LMS, your classroom, or wherever you want.

Complete Solutions Manual

The Complete Solutions Manual provides detailed solutions to all exercises in the text, including Exploration and Chapter Review exercises. The Complete Solutions Manual is available online.

Instructor's Guide

This online guide enhances the text with valuable teaching resources such as group work projects, teaching tips, interesting exam questions, examples and extra material for lectures, and other items designed to reduce the instructor's preparation time and make linear algebra class an exciting and interactive experience. For each section of the text, the Instructor's Guide includes suggested time and emphasis, points to stress, questions for discussion, lecture materials and examples, technology tips, student projects, group work with solutions, sample assignments, and suggested test questions.
Solution Builder
www.cengage.com/solutionbuilder

Solution Builder provides full instructor solutions to all exercises in the text, including those in the explorations and chapter reviews, in a convenient online format. Solution Builder allows instructors to create customized, secure PDF printouts of solutions matched exactly to the exercises assigned for class.

*Access Cognero and additional instructor resources online at login.cengage.com.

For Students

Student Solutions Manual (ISBN-13: 978-1-285-84195-3)

The Student Solutions Manual and Study Guide includes detailed solutions to all odd-numbered exercises and selected even-numbered exercises; section and chapter summaries of symbols, definitions, and theorems; and study tips and hints. Complex exercises are explored through a question-and-answer format designed to deepen understanding. Challenging and entertaining problems that further explore selected exercises are also included.

Enhanced WebAssign®
Printed Access Card: 978-1-285-85829-6
Online Access Code: 978-1-285-85827-2

Enhanced WebAssign (assigned by the instructor) provides you with instant feedback on homework assignments. This online homework system is easy to use and includes helpful links to textbook sections, video examples, and problem-specific tutorials.

CengageBrain.com

To access additional course materials and companion resources, please visit www.cengagebrain.com. At the CengageBrain.com home page, search for the ISBN of your title (from the back cover of your book) using the search box at the top of the page. This will take you to the product page where free companion resources can be found.

Acknowledgments

The reviewers of the previous edition of this text contributed valuable and often insightful comments about the book. I am grateful for the time each of them took to do this.
Their judgement and helpful suggestions have contributed greatly to the development and success of this book, and I would like to thank them personally: Jamey Bass, City College of San Francisco; Olga Brezhneva, Miami University; Karen Clark, The College of New Jersey; Marek Elzanowski, Portland State University; Christopher Francisco, Oklahoma State University; Brian Jue, California State University, Stanislaus; Alexander Kheyfits, Bronx Community College/CUNY; Henry Krieger, Harvey Mudd College; Rosanna Pearlstein, Michigan State

course" in determinants contains all the essential material students need, including an optional but elementary proof of the Laplace Expansion Theorem. The vignette "Lewis Carroll's Condensation Method" presents a historically interesting, alternative method of calculating determinants that students may find appealing. The exploration "Geometric Applications of Determinants" makes a nice project that contains several interesting and useful results. (Alternatively, instructors who wish to give more detailed coverage to determinants may choose to cover some of this exploration in class.) The basic theory of eigenvalues and eigenvectors is found in Section 4.3, and Section 4.4 deals with the important topic of diagonalization. Example 4.29 on powers of matrices is worth covering in class. The power method and its variants, discussed in Section 4.5, are optional, but all students should be aware of the method, and an applied course should cover it in detail. Gerschgorin's Disk Theorem can be covered independently of the rest of Section 4.5. Markov chains and the Leslie model of population growth reappear in Section 4.6.
Although the proof of Perron's Theorem is optional, the theorem itself (like the stronger Perron-Frobenius Theorem) should at least be mentioned because it explains why we should expect a unique positive eigenvalue with a corresponding positive eigenvector in these applications. The applications on recurrence relations and differential equations connect linear algebra to discrete mathematics and calculus, respectively. The matrix exponential can be covered if your class has a good calculus background. The final topic of discrete linear dynamical systems revisits and summarizes many of the ideas in Chapter 4, looking at them in a new, geometric light. Students will enjoy reading how eigenvectors can be used to help rank sports teams and websites. This vignette can easily be extended to a project or enrichment activity.

Chapter 5: Orthogonality
The introductory exploration, "Shadows on a Wall," is mathematics at its best: it takes a known concept (projection of a vector onto another vector) and generalizes it in a useful way (projection of a vector onto a subspace, such as a plane), while uncovering some previously unobserved properties. Section 5.1 contains the basic results about orthogonal and orthonormal sets of vectors that will be used repeatedly from here on. In particular, orthogonal matrices should be stressed. In Section 5.2, two concepts from Chapter 1 are generalized: the orthogonal complement of a subspace and the orthogonal projection of a vector onto a subspace. The Orthogonal Decomposition Theorem is important here and helps to set up the Gram-Schmidt Process. Also note the quick proof of the Rank Theorem. The Gram-Schmidt Process is detailed in Section 5.3, along with the extremely important QR factorization. The two explorations that follow outline how the QR factorization is computed in practice and how it can be used to approximate eigenvalues.
Section 5.4 on orthogonal diagonalization of (real) symmetric matrices is needed for the applications that follow. It also contains the Spectral Theorem, one of the highlights of the theory of linear algebra. The applications in Section 5.5 are quadratic forms and graphing quadratic equations. I always include at least the second of these in my course because it extends what students already know about conic sections.

Chapter 6: Vector Spaces
The Fibonacci sequence reappears in Section 6.0, although it is not important that students have seen it before (Section 4.6). The purpose of this exploration is to show

University; William Sullivan, Portland State University; Matthias Weber, Indiana University.

I am indebted to a great many people who have, over the years, influenced my views about linear algebra and the teaching of mathematics in general. First, I would like to thank collectively the participants in the education and special linear algebra sessions at meetings of the Mathematical Association of America and the Canadian Mathematical Society. I have also learned much from participation in the Canadian Mathematics Education Study Group and the Canadian Mathematics Education Forum. I especially want to thank Ed Barbeau, Bill Higginson, Richard Hoshino, John Grant McLaughlin, Eric Muller, Morris Orzech, Bill Ralph, Pat Rogers, Peter Taylor, and Walter Whiteley, whose advice and inspiration contributed greatly to the philosophy and style of this book. My gratitude as well to Robert Rogers, who developed the student and instructor solutions, as well as the excellent study guide content. Special thanks go to Jim Stewart for his ongoing support and advice. Joe Rotman and his lovely book A First Course in Abstract Algebra inspired the etymological notes in this book, and I relied heavily on Steven Schwartzman's The Words of Mathematics when compiling these notes.
I thank Art Benjamin for introducing me to the Codabar system and Joe Grcar for clarifying aspects of the history of Gaussian elimination. My colleagues Marcus Pivato and Reem Yassawi provided useful information about dynamical systems. As always, I am grateful to my students for asking good questions and providing me with the feedback necessary to become a better teacher.

I sincerely thank all of the people who have been involved in the production of this book. Jitendra Kumar and the team at MPS Limited did an amazing job producing the fourth edition. I thank Christine Sabooni for doing a thorough copyedit. Most of all, it has been a delight to work with the entire editorial, marketing, and production teams at Cengage Learning: Richard Stratton, Molly Taylor, Laura Wheel, Cynthia Ashton, Danielle Hallock, Andrew Coppola, Alison Eigel Zade, and Janay Pryor. They offered sound advice about changes and additions, provided assistance when I needed it, but let me write the book I wanted to write. I am fortunate to have worked with them, as well as the staffs on the first through third editions.

As always, I thank my family for their love, support, and understanding. Without them, this book would not have been possible.

David Poole
dpoole@trentu.ca

To the Instructor

"Would you tell me, please, which way I ought to go from here?"
"That depends a good deal on where you want to get to," said the Cat.
Lewis Carroll, Alice's Adventures in Wonderland, 1865

This text was written with flexibility in mind. It is intended for use in a one- or two-semester course with 36 lectures per semester. The range of topics and applications makes it suitable for a variety of audiences and types of courses. However, there is more material in the book than can be covered in class, even in a two-semester course. After the following overview of the text are some brief suggestions for ways to use the book.
An Overview of the Text

Chapter 1: Vectors
The racetrack game in Section 1.0 serves to introduce vectors in an informal way. (It's also quite a lot of fun to play!) Vectors are then formally introduced from both algebraic and geometric points of view. The operations of addition and scalar multiplication and their properties are first developed in the concrete settings of $\mathbb{R}^2$ and $\mathbb{R}^3$ before being generalized to $\mathbb{R}^n$. Modular arithmetic and finite linear algebra are also introduced. Section 1.2 defines the dot product of vectors and the related notions of length, angle, and orthogonality. The very important concept of (orthogonal) projection is developed here; it will reappear in Chapters 5 and 7. The exploration "Vectors and Geometry" shows how vector methods can be used to prove certain results in Euclidean geometry. Section 1.3 is a basic but thorough introduction to lines and planes in $\mathbb{R}^2$ and $\mathbb{R}^3$. This section is crucial for understanding the geometric significance of the solution of linear systems in Chapter 2. Note that the cross product of vectors in $\mathbb{R}^3$ is left as an exploration. The chapter concludes with an application to force vectors.

Chapter 2: Systems of Linear Equations
The introduction to this chapter serves to illustrate that there is more than one way to think of the solution to a system of linear equations. Sections 2.1 and 2.2 develop the main computational tool for solving linear systems: row reduction of matrices (Gaussian and Gauss-Jordan elimination). Nearly all subsequent computational methods in the book depend on this. The Rank Theorem appears here for the first time; it shows up again, in more generality, in Chapters 3, 5, and 6. Section 2.3 is very important; it introduces the fundamental notions of spanning sets and linear independence of vectors.
Do not rush through this material. Section 2.4 contains six applications from which instructors can choose depending on the time available and the interests of the class. The vignette on the Global Positioning System provides another application that students will enjoy. The iterative methods in Section 2.5 will be optional for many courses but are essential for a course with an applied/numerical focus. The three explorations in this chapter are related in that they all deal with aspects of the use of computers to solve linear systems. All students should at least be made aware of these issues.

Chapter 3: Matrices
This chapter contains some of the most important ideas in the book. It is a long chapter, but the early material can be covered fairly quickly, with extra time allowed for the crucial material in Section 3.5. Section 3.0 is an exploration that introduces the notion of a linear transformation: the idea that matrices are not just static objects but rather a type of function, transforming vectors into other vectors. All of the basic facts about matrices, matrix operations, and their properties are found in the first two sections. The material on partitioned matrices and the multiple representations of the matrix product is worth stressing, because it is used repeatedly in subsequent sections. The Fundamental Theorem of Invertible Matrices in Section 3.3 is very important and will appear several more times as new characterizations of invertibility are presented. Section 3.4 discusses the very important LU factorization of a matrix. If this topic is not covered in class, it is worth assigning as a project or discussing in a workshop. The point of Section 3.5 is to present many of the key concepts of linear algebra (subspace, basis, dimension, and rank) in the concrete setting of matrices before students see them in full generality.
Although the examples in this section are all familiar, it is important that students get used to the new terminology and, in particular, understand what the notion of a basis means. The geometric treatment of linear transformations in Section 3.6 is intended to smooth the transition to general linear transformations in Chapter 6. The example of a projection is particularly important because it will reappear in Chapter 5. The vignette on robotic arms is a concrete demonstration of composition of linear (and affine) transformations. There are four applications from which to choose in Section 3.7. Either Markov chains or the Leslie model of population growth should be covered so that they can be used again in Chapter 4, where their behavior will be explained.

Chapter 4: Eigenvalues and Eigenvectors
The introduction Section 4.0 presents an interesting dynamical system involving graphs. This exploration introduces the notion of an eigenvector and foreshadows the power method in Section 4.5. In keeping with the geometric emphasis of the book, Section 4.1 contains the novel feature of "eigenpictures" as a way of visualizing the eigenvectors of 2 × 2 matrices. Determinants appear in Section 4.2, motivated by their use in finding the characteristic polynomials of small matrices. This "crash

that familiar vector space concepts (Section 3.5) can be used fruitfully in a new setting. Because all of the main ideas of vector spaces have already been introduced in Chapters 1-3, students should find Sections 6.1 and 6.2 fairly familiar. The emphasis here should be on using the vector space axioms to prove properties rather than relying on computational techniques. When discussing change of basis in Section 6.3, it is helpful to show students how to use the notation to remember how the construction works. Ultimately, the Gauss-Jordan method is the most efficient here. Sections 6.4 and 6.5 on linear transformations are important. The examples are related to previous results on matrices (and matrix transformations). In particular, it is important to stress that the kernel and range of a linear transformation generalize the null space and column space of a matrix. Section 6.6 puts forth the notion that (almost) all linear transformations are essentially matrix transformations. This builds on the information in Section 3.6, so students should not find it terribly surprising. However, the examples should be worked carefully. The connection between change of basis and similarity of matrices is noteworthy. The exploration "Tilings, Lattices, and the Crystallographic Restriction" is an impressive application of change of basis. The connection with the artwork of M. C. Escher makes it all the more interesting. The applications in Section 6.7 build on previous ones and can be included as time and interest permit.

Chapter 7: Distance and Approximation
Section 7.0 opens with the entertaining "Taxicab Geometry" exploration. Its purpose is to set up the material on generalized norms and distance functions (metrics) that follows. Inner product spaces are discussed in Section 7.1; the emphasis here should be on the examples and using the axioms. The exploration "Vectors and Matrices with Complex Entries" shows how the concepts of dot product, symmetric matrix, orthogonal matrix, and orthogonal diagonalization can be extended from real to complex vector spaces. The following exploration, "Geometric Inequalities and Optimization Problems," is one that students typically enjoy. (They will have fun seeing how many "calculus" problems can be solved without using calculus at all!) Section 7.2 covers generalized vector and matrix norms and shows how the condition number of a matrix is related to the notion of ill-conditioned linear systems explored in Chapter 2.
Least squares approximation (Section 7.3) is an important application of linear algebra in many other disciplines. The Best Approximation Theorem and the Least Squares Theorem are important, but their proofs are intuitively clear. Spend time here on the examples; a few should suffice. Section 7.4 presents the singular value decomposition, one of the most impressive applications of linear algebra. If your course gets this far, you will be amply rewarded. Not only does the SVD tie together many notions discussed previously; it also affords some new (and quite powerful) applications. If a CAS is available, the vignette on digital image compression is worth presenting; it is a visually impressive display of the power of linear algebra and a fitting culmination to the course. The further applications in Section 7.5 can be chosen according to the time available and the interests of the class.

Chapter 8: Codes
This online chapter contains applications of linear algebra to the theory of codes. Section 8.1 begins with a discussion of how vectors can be used to design error-detecting codes such as the familiar Universal Product Code (UPC) and International Standard Book Number (ISBN). This topic only requires knowledge of Chapter 1. The vignette on the Codabar system used in credit and bank cards is an excellent classroom presentation that can even be used to introduce Section 8.1. Once students are familiar with matrix operations, Section 8.2 describes how codes can be designed to correct as well as detect errors. The Hamming codes introduced here are perhaps the most famous examples of such error-correcting codes. Dual codes, discussed in Section 8.3, are an important way of constructing new codes from old ones. The notion of orthogonal complement, introduced in Chapter 5, is the prerequisite concept here.
The most important, and most widely used, class of codes is the class of linear codes that is defined in Section 8.4. The notions of subspace, basis, and dimension are key here. The powerful Reed-Muller codes used by NASA spacecraft are important examples of linear codes. Our discussion of codes concludes in Section 8.5 with the definition of the minimum distance of a code and the role it plays in determining the error-correcting capability of the code.

How to Use the Book
Students find the book easy to read, so I usually have them read a section before I cover the material in class. That way, I can spend class time highlighting the most important concepts, dealing with topics students find difficult, working examples, and discussing applications. I do not attempt to cover all of the material from the assigned reading in class. This approach enables me to keep the pace of the course fairly brisk, slowing down for those sections that students typically find challenging.

In a two-semester course, it is possible to cover the entire book, including a reasonable selection of applications. For extra flexibility, you might omit some of the topics (for example, give only a brief treatment of numerical linear algebra), thereby freeing up time for more in-depth coverage of the remaining topics, more applications, or some of the explorations. In an honors mathematics course that emphasizes proofs, much of the material in Chapters 1-3 can be covered quickly. Chapter 6 can then be covered in conjunction with Sections 3.5 and 3.6, and Chapter 7 can be integrated into Chapter 5. I would be sure to assign the explorations in Chapters 1, 4, 6, and 7 for such a class.

For a one-semester course, the nature of the course and the audience will determine which topics to include. Three possible courses are described below and on the following page.
The basic course, described first, has fewer than 36 hours suggested, allowing time for extra topics, in-class review, and tests. The other two courses build on the basic course but are still quite flexible.

A Basic Course
A course designed for mathematics majors and students from other disciplines is outlined on the next page. This course does not mention general vector spaces at all (all concepts are treated in a concrete setting) and is very light on proofs. Still, it is a thorough introduction to linear algebra.

Section: 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3, 3.5
Number of Lectures: 1-1.5, 1-1.5, 0.5-1, 1-2, 1-2, 1-2, 2, 2

Section: 3.6, 4.1, 4.2, 4.3, 4.4, 5.1, 5.2, 5.3, 5.4, 7.3
Number of Lectures: 1-2, 1, 2, 1, 1-2, 1-1.5, 1-1.5, 0.5, 1, 2

Total: 23-30 lectures

Because the students in a course such as this one represent a wide variety of disciplines, I would suggest using much of the remaining lecture time for applications. In my course, I do code vectors in Section 8.1, which students really seem to like, and at least one application from each of Chapters 2-5. Other applications can be assigned as projects, along with as many of the explorations as desired. There is also sufficient lecture time available to cover some of the theory in detail.

A Course with a Computational Emphasis
For a course with a computational emphasis, the basic course outlined on the previous page can be supplemented with the sections of the text dealing with numerical linear algebra. In such a course, I would cover part or all of Sections 2.5, 3.4, 4.5, 5.3, 7.2, and 7.4, ending with the singular value decomposition. The explorations in Chapters 2 and 5 are particularly well suited to such a course, as are almost any of the applications.

A Course for Students Who Have Already Studied Some Linear Algebra
Some courses will be aimed at students who have already encountered the basic principles of linear algebra in other courses.
For example, a college algebra course will often include an introduction to systems of linear equations, matrices, and determinants; a multivariable calculus course will almost certainly contain material on vectors, lines, and planes. For students who have seen such topics already, much early material can be omitted and replaced with a quick review. Depending on the background of the class, it may be possible to skim over the material in the basic course up to Section 3.3 in about six lectures. If the class has a significant number of mathematics majors (and especially if this is the only linear algebra course they will take), I would be sure to cover Sections 6.1-6.5, 7.1, and 7.4 and as many applications as time permits. If the course has science majors (but not mathematics majors), I would cover Sections 6.1 and 7.1 and a broader selection of applications, being sure to include the material on differential equations and approximation of functions. If computer science students or engineers are prominently represented, I would try to do as much of the material on codes and numerical linear algebra as I could.

There are many other types of courses that can successfully use this text. I hope that you find it useful for your course and that you enjoy using it.

To the Student

"Where shall I begin, please your Majesty?" he asked.
"Begin at the beginning," the King said, gravely, "and go on till you come to the end: then stop."
Lewis Carroll, Alice's Adventures in Wonderland, 1865

Linear algebra is an exciting subject. It is full of interesting results, applications to other disciplines, and connections to other areas of mathematics. The Student Solutions Manual and Study Guide contains detailed advice on how best to use this book; following are some general suggestions.

Linear algebra has several sides: There are computational techniques, concepts, and applications.
One of the goals of this book is to help you master all of these facets of the subject and to see the interplay among them. Consequently, it is important that you read and understand each section of the text before you attempt the exercises in that section. If you read only examples that are related to exercises that have been assigned as homework, you will miss much. Make sure you understand the definitions of terms and the meaning of theorems. Don't worry if you have to read something more than once before you understand it. Have a pencil and calculator with you as you read. Stop to work out examples for yourself or to fill in missing calculations. An icon in the margin indicates a place where you should pause and think over what you have read so far.

Answers to most odd-numbered computational exercises are in the back of the book. Resist the temptation to look up an answer before you have completed a question. And remember that even if your answer differs from the one in the back, you may still be right; there is more than one correct way to express some of the solutions. For example, a value of $1/\sqrt{2}$ can also be expressed as $\sqrt{2}/2$, and the set of all scalar multiples of the vector $[3, 1/2]$ is the same as the set of all scalar multiples of $[6, 1]$.

As you encounter new concepts, try to relate them to examples that you know. Write out proofs and solutions to exercises in a logical, connected way, using complete sentences. Read back what you have written to see whether it makes sense. Better yet, if you can, have a friend in the class read what you have written. If it doesn't make sense to another person, chances are that it doesn't make sense, period.

You will find that a calculator with matrix capabilities or a computer algebra system is useful. These tools can help you to check your own hand calculations and are indispensable for some problems involving tedious computations.
Technology also enables you to explore aspects of linear algebra on your own. You can play "what if?" games: What if I change one of the entries in this vector? What if this matrix is of a different size? Can I force the solution to be what I would like it to be by changing something? To signal places in the text or exercises where the use of technology is recommended, I have placed the icon CAS in the margin. The companion website that accompanies this book contains computer code working out selected exercises from the book using Maple, Mathematica, and MATLAB, as well as Technology Bytes, an appendix providing much additional advice about the use of technology in linear algebra.

You are about to embark on a journey through linear algebra. Think of this book as your travel guide. Are you ready? Let's go!

Vectors

Here they come pouring out of the blue.
Little arrows for me and for you.
Albert Hammond and Mike Hazelwood, "Little Arrows," Dutchess Music/BMI, 1968

1.0 Introduction: The Racetrack Game

Many measurable quantities, such as length, area, volume, mass, and temperature, can be completely described by specifying their magnitude. Other quantities, such as velocity, force, and acceleration, require both a magnitude and a direction for their description. These quantities are vectors. For example, wind velocity is a vector consisting of wind speed and direction, such as 10 km/h southwest. Geometrically, vectors are often represented as arrows or directed line segments.

Although the idea of a vector was introduced in the 19th century, its usefulness in applications, particularly those in the physical sciences, was not realized until the 20th century. More recently, vectors have found applications in computer science, statistics, economics, and the life and social sciences. We will consider some of these many applications throughout this book.
This chapter introduces vectors and begins to consider some of their geometric and algebraic properties. We begin, though, with a simple game that introduces some of the key ideas. [You may even wish to play it with a friend during those (very rare!) dull moments in linear algebra class.]

The game is played on graph paper. A track, with a starting line and a finish line, is drawn on the paper. The track can be of any length and shape, so long as it is wide enough to accommodate all of the players. For this example, we will have two players (let's call them Ann and Bert) who use different colored pens to represent their cars or bicycles or whatever they are going to race around the track. (Let's think of Ann and Bert as cyclists.)

Ann and Bert each begin by drawing a dot on the starting line at a grid point on the graph paper. They take turns moving to a new grid point, subject to the following rules:

1. Each new grid point and the line segment connecting it to the previous grid point must lie entirely within the track.
2. No two players may occupy the same grid point on the same turn. (This is the "no collisions" rule.)
3. Each new move is related to the previous move as follows: If a player moves $a$ units horizontally and $b$ units vertically on one move, then on the next move he or she must move between $a - 1$ and $a + 1$ units horizontally and between $b - 1$ and $b + 1$ units vertically. In other words, if the second move is $c$ units horizontally and $d$ units vertically, then $|a - c| \le 1$ and $|b - d| \le 1$. (This is the "acceleration/deceleration" rule.) Note that this rule forces the first move to be 1 unit vertically and/or 1 unit horizontally.

William Rowan Hamilton (1805-1865)

A player who collides with another player or leaves the track is eliminated. The winner is the first player to cross the finish line. If more than one player crosses the finish line on the same turn, the one who goes farthest past the finish line is the winner.
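Rule 3 is mechanical enough to check with a few lines of code. The sketch below is an illustration only, not part of the game as stated: it assumes moves are stored as integer pairs (a, b), uses the hypothetical helper names `is_legal_next` and `candidate_moves`, and ignores rules 1 and 2 (staying on the track and avoiding collisions). Treating a player on the starting line as having just made the move [0, 0] reproduces the note that the first move must be 1 unit vertically and/or 1 unit horizontally.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (horizontal units, vertical units)


def is_legal_next(prev: Move, nxt: Move) -> bool:
    """Rule 3: move (c, d) may follow move (a, b) iff |a - c| <= 1 and |b - d| <= 1."""
    (a, b), (c, d) = prev, nxt
    return abs(a - c) <= 1 and abs(b - d) <= 1


def candidate_moves(prev: Move) -> List[Move]:
    """All nine moves that rule 3 allows after prev (rules 1 and 2 are NOT checked)."""
    a, b = prev
    return [(a + dc, b + dd) for dc in (-1, 0, 1) for dd in (-1, 0, 1)]


# Assumption: a player on the starting line is treated as having just moved [0, 0].
# Excluding the "stay put" move leaves the eight possible first moves.
first_moves = [m for m in candidate_moves((0, 0)) if m != (0, 0)]
print(first_moves)
```

Each candidate is only a possibility; whether it can actually be played still depends on the shape of the track and on where the other players are.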
In the sample game shown in Figure 1.1, Ann was the winner. Bert accelerated too quickly and had difficulty negotiating the turn at the top of the track.

To understand rule 3, consider Ann's third and fourth moves. On her third move, she went 1 unit horizontally and 3 units vertically. On her fourth move, her options were to move 0 to 2 units horizontally and 2 to 4 units vertically. (Notice that some of these combinations would have placed her outside the track.) She chose to move 2 units in each direction.

The Irish mathematician William Rowan Hamilton used vector concepts in his study of complex numbers and their generalization, the quaternions.

Figure 1.1 A sample game of racetrack

Problem 1 Play a few games of racetrack.

Problem 2 Is it possible for Bert to win this race by choosing a different sequence of moves?

Problem 3 Use the notation [a, b] to denote a move that is a units horizontally and b units vertically. (Either a or b or both may be negative.) If move [3, 4] has just been made, draw on graph paper all the grid points that could possibly be reached on the next move.

Problem 4 What is the net effect of two successive moves? In other words, if you move [a, b] and then [c, d], how far horizontally and vertically will you have moved altogether?

Problem 5 Write out Ann's sequence of moves using the [a, b] notation. Suppose she begins at the origin (0, 0) on the coordinate axes. Explain how you can find the coordinates of the grid point corresponding to each of her moves without looking at the graph paper. If the axes were drawn differently, so that Ann's starting point was not the origin but the point (2, 3), what would the coordinates of her final point be?

Although simple, this game introduces several ideas that will be useful in our study of vectors.
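Problems 4 and 5 both come down to adding moves component by component. The sketch below assumes moves and grid points are integer pairs and uses hypothetical helper names. Because the figure is not reproduced here, Ann's first two moves, [1, 1] and [1, 2], are invented for illustration; only her third move [1, 3] and fourth move [2, 2] come from the text. The code computes the net effect of two moves and the grid point reached after each move from a given starting point.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def net_effect(m1: Pair, m2: Pair) -> Pair:
    """Problem 4: moving [a, b] and then [c, d] is the same as moving [a + c, b + d]."""
    return (m1[0] + m2[0], m1[1] + m2[1])


def grid_points(start: Pair, moves: List[Pair]) -> List[Pair]:
    """Problem 5: running sums of the moves give the grid point after each move."""
    x, y = start
    points = [(x, y)]
    for a, b in moves:
        x, y = x + a, y + b
        points.append((x, y))
    return points


# Hypothetical opening: the first two moves are invented (they obey rule 3);
# the last two are Ann's third and fourth moves from the text.
moves = [(1, 1), (1, 2), (1, 3), (2, 2)]
print(net_effect((1, 3), (2, 2)))      # (3, 5)
print(grid_points((0, 0), moves))      # [(0, 0), (1, 1), (2, 3), (3, 6), (5, 8)]
print(grid_points((2, 3), moves)[-1])  # (7, 11)
```

Moving the starting point from (0, 0) to (2, 3) shifts every grid point on the path, including the final one, by exactly (2, 3); the moves themselves do not change.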
The next three sections consider vectors from geometric and algebraic viewpoints, beginning, as in the racetrack game, in the plane.

1.1 The Geometry and Algebra of Vectors

Vectors in the Plane

The Cartesian plane is named after the French philosopher and mathematician René Descartes (1596-1650), whose introduction of coordinates allowed geometric problems to be handled using algebraic techniques.

The word vector comes from the Latin root meaning "to carry." A vector is formed when a point is displaced (or "carried off") a given distance in a given direction. Viewed another way, vectors "carry" two pieces of information: their length and their direction.

When writing vectors by hand, it is difficult to indicate boldface. Some people prefer to write $\vec{v}$ for the vector denoted in print by $\mathbf{v}$, but in most cases it is fine to use an ordinary lowercase v. It will usually be clear from the context when the letter denotes a vector.

We begin by considering the Cartesian plane with the familiar x- and y-axes. A vector is a directed line segment that corresponds to a displacement from one point A to another point B; see Figure 1.2. The vector from A to B is denoted by $\overrightarrow{AB}$; the point A is called its initial point, or tail, and the point B is called its terminal point, or head. Often, a vector is simply denoted by a single boldface, lowercase letter such as $\mathbf{v}$.

The set of all points in the plane corresponds to the set of all vectors whose tails are at the origin O. To each point A, there corresponds the vector $\mathbf{a} = \overrightarrow{OA}$; to each vector $\mathbf{a}$ with tail at O, there corresponds its head A. (Vectors of this form are sometimes called position vectors.)

It is natural to represent such vectors using coordinates. For example, in Figure 1.3, A = (3, 2) and we write the vector $\mathbf{a} = \overrightarrow{OA} = [3, 2]$ using square brackets.
Similarly, the other vectors in Figure 1.3 are b = [-1, 3] and c = [2, -1]. The individual coordinates (3 and 2 in the case of a) are called the components of the vector. A vector is sometimes said to be an ordered pair of real numbers. The order is important since, for example, $[3, 2] \neq [2, 3]$. In general, two vectors are equal if and only if their corresponding components are equal. Thus, [x, y] = [1, 5] implies that x = 1 and y = 5.

The word component is derived from the Latin words co, meaning "together with," and ponere, meaning "to put." Thus, a vector is "put together" out of its components.

It is frequently convenient to use column vectors instead of (or in addition to) row vectors. Another representation of [3, 2] is $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$. (The important point is that the components are ordered.) In later chapters, you will see that column vectors are somewhat better from a computational point of view; for now, try to get used to both representations.

It may occur to you that we cannot really draw the vector $[0, 0] = \overrightarrow{OO}$ from the origin to itself. Nevertheless, it is a perfectly good vector and has a special name: the zero vector. The zero vector is denoted by 0.

The set of all vectors with two components is denoted by $\mathbb{R}^2$ (where $\mathbb{R}$ denotes the set of real numbers from which the components of vectors in $\mathbb{R}^2$ are chosen). Thus, [-1, 3.5] and $[\sqrt{2}, \pi]$ are both in $\mathbb{R}^2$.

$\mathbb{R}^2$ is pronounced "r two."

Thinking back to the racetrack game, let's try to connect all of these ideas to vectors whose tails are not at the origin. The etymological origin of the word vector in the verb "to carry" provides a clue. The vector [3, 2] may be interpreted as follows: Starting at the origin O, travel 3 units to the right, then 2 units up, finishing at P. The same displacement may be applied with other initial points. Figure 1.4 shows two equivalent displacements, represented by the vectors $\overrightarrow{AB}$ and $\overrightarrow{CD}$.
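The points just made (components are ordered, equality is componentwise, and one displacement can start from any initial point) carry over directly to code. A minimal Python sketch, assuming a vector in $\mathbb{R}^2$ is stored as a tuple of numbers; the helper name `displace` and the second starting point are hypothetical choices for illustration.

```python
def displace(point, vector):
    """Apply a displacement vector to an initial point, component by component."""
    if len(point) != len(vector):
        raise ValueError("point and vector need the same number of components")
    return tuple(p + v for p, v in zip(point, vector))


a = (3, 2)
zero = (0, 0)  # the zero vector [0, 0]

# Tuple equality compares corresponding entries in order,
# so [3, 2] and [2, 3] are different vectors.
print(a == (3, 2), a == (2, 3))  # True False

# The same displacement [3, 2] applied from two different initial points.
print(displace(zero, a))         # (3, 2): from the origin O to P
print(displace((-4, -1), a))     # (-1, 1)
```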
When vectors are referred to by their coordinates, they are being considered analytically.

We define two vectors as equal if they have the same length and the same direction. Thus, $\overrightarrow{AB} = \overrightarrow{CD}$ in Figure 1.4. (Even though they have different initial and terminal points, they represent the same displacement.) Geometrically, two vectors are equal if one can be obtained by sliding (or translating) the other parallel to itself until the two vectors coincide. In terms of components, in Figure 1.4 we have A = (3, 1) and B = (6, 3). Notice that the vector [3, 2] that records the displacement is just the difference of the respective components:
$$\overrightarrow{AB} = [3, 2] = [6 - 3, 3 - 1]$$
Similarly,
$$\overrightarrow{CD} = [-1 - (-4), 1 - (-1)] = [3, 2]$$
and thus $\overrightarrow{AB} = \overrightarrow{CD}$, as expected.

A vector such as $\overrightarrow{OP}$ with its tail at the origin is said to be in standard position. The foregoing discussion shows that every vector can be drawn as a vector in standard position. Conversely, a vector in standard position can be redrawn (by translation) so that its tail is at any point in the plane.

Example 1.1 If A = (-1, 2) and B = (3, 4), find $\overrightarrow{AB}$ and redraw it (a) in standard position and (b) with its tail at the point C = (2, -1).

Solution We compute $\overrightarrow{AB} = [3 - (-1), 4 - 2] = [4, 2]$. If $\overrightarrow{AB}$ is then translated to $\overrightarrow{CD}$, where C = (2, -1), then we must have D = (2 + 4, -1 + 2) = (6, 1). (See Figure 1.5.)

New Vectors from Old

As in the racetrack game, we often want to "follow" one vector by another. This leads to the notion of vector addition, the first basic vector operation.

If we follow u by v, we can visualize the total displacement as a third vector, denoted by u + v. In Figure 1.6, u = [1, 2] and v = [2, 2], so the net effect of following u by v is [1 + 2, 2 + 2] = [3, 4], which gives u + v.
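Finding a vector from two points is componentwise subtraction, and translating it or following it by another vector is componentwise addition. The Python sketch below reproduces the numbers from Figure 1.4, Example 1.1, and the sum [1, 2] + [2, 2]; it assumes points and vectors are both stored as tuples, and the helper names are illustrative rather than taken from the text.

```python
def vector_from(A, B):
    """Vector AB: subtract the coordinates of the tail A from those of the head B."""
    if len(A) != len(B):
        raise ValueError("points must have the same number of coordinates")
    return tuple(b - a for a, b in zip(A, B))


def add(u, v):
    """Componentwise sum; also translates a point by a vector."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return tuple(ui + vi for ui, vi in zip(u, v))


# Figure 1.4: AB and CD record the same displacement, so they are equal.
print(vector_from((3, 1), (6, 3)), vector_from((-4, -1), (-1, 1)))  # (3, 2) (3, 2)

# Example 1.1: AB from A = (-1, 2) to B = (3, 4), then translated to start at C.
AB = vector_from((-1, 2), (3, 4))
print(AB)                    # (4, 2)
print(add((2, -1), AB))      # D = (6, 1)

# Following u = [1, 2] by v = [2, 2].
print(add((1, 2), (2, 2)))   # (3, 4)
```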
In general, if $u = [u_1, u_2]$ and $v = [v_1, v_2]$, then their sum u + v is the vector
$$u + v = [u_1 + v_1, u_2 + v_2]$$
It is helpful to visualize u + v geometrically. The following rule is the geometric version of the foregoing discussion.

Figure 1.6: Vector addition

The Head-to-Tail Rule
Given vectors u and v in $\mathbb{R}^2$, translate v so that its tail coincides with the head of u. The sum u + v of u and v is the vector from the tail of u to the head of v. (See Figure 1.7.)

Figure 1.7: The head-to-tail rule

Figure 1.8: The parallelogram determined by u and v

By translating u and v parallel to themselves, we obtain a parallelogram, as shown in Figure 1.8. This parallelogram is called the parallelogram determined by u and v. It leads to an equivalent version of the head-to-tail rule for vectors in standard position.

The Parallelogram Rule
Given vectors u and v in $\mathbb{R}^2$ (in standard position), their sum u + v is the vector in standard position along the diagonal of the parallelogram determined by u and v. (See Figure 1.9.)

Figure 1.9: The parallelogram rule

Example 1.2 If u = [3, -1] and v = [1, 4], compute and draw u + v.

Solution We compute u + v = [3 + 1, -1 + 4] = [4, 3]. This vector is drawn using the head-to-tail rule in Figure 1.10(a) and using the parallelogram rule in Figure 1.10(b).

The second basic vector operation is scalar multiplication. Given a vector v and a real number c, the scalar multiple cv is the vector obtained by multiplying each component of v by c. For example, 3[-2, 4] = [-6, 12]. In general,
$$cv = c[v_1, v_2] = [cv_1, cv_2]$$
Geometrically, cv is a "scaled" version of v.

Example 1.3 If v = [-2, 4], compute and draw 2v, $\tfrac{1}{2}v$, and -2v.
Solution We calculate as follows:
$$\begin{aligned} 2v &= [2(-2), 2(4)] = [-4, 8] \\ \tfrac{1}{2}v &= [\tfrac{1}{2}(-2), \tfrac{1}{2}(4)] = [-1, 2] \\ -2v &= [-2(-2), -2(4)] = [4, -8] \end{aligned}$$
These vectors are shown in Figure 1.11.

The term scalar comes from the Latin word scala, meaning "ladder." The equally spaced rungs on a ladder suggest a scale, and in vector arithmetic, multiplication by a constant changes only the scale (or length) of a vector. Thus, constants became known as scalars.

Observe that cv has the same direction as v if c > 0 and the opposite direction if c < 0. We also see that cv is |c| times as long as v. For this reason, in the context of vectors, constants (i.e., real numbers) are referred to as scalars. As Figure 1.12 shows, when translation of vectors is taken into account, two vectors are scalar multiples of each other if and only if they are parallel.

A special case of a scalar multiple is (-1)v, which is written as -v and is called the negative of v. We can use it to define vector subtraction: The difference of u and v is the vector u - v defined by
$$u - v = u + (-v)$$
Figure 1.13 shows that u - v corresponds to the "other" diagonal of the parallelogram determined by u and v.

Figure 1.13: Vector subtraction

Example 1.4 If u = [1, 2] and v = [-3, 1], then u - v = [1 - (-3), 2 - 1] = [4, 1].

The definition of subtraction in Example 1.4 also agrees with the way we calculate a vector such as $\overrightarrow{AB}$. If the points A and B correspond to the vectors a and b in standard position, then $\overrightarrow{AB} = b - a$, as shown in Figure 1.14. [Observe that the head-to-tail rule applied to this diagram gives the equation a + (b - a) = b. If we had accidentally drawn b - a with its head at A instead of at B, the diagram would have read b + (b - a) = a, which is clearly wrong!
More will be said about algebraic expressions involving vectors later in this section.]

Vectors in $\mathbb{R}^3$

Everything we have just done extends easily to three dimensions. The set of all ordered triples of real numbers is denoted by $\mathbb{R}^3$. Points and vectors are located using three mutually perpendicular coordinate axes that meet at the origin O. A point such as A = (1, 2, 3) can be located as follows: First travel 1 unit along the x-axis, then move 2 units parallel to the y-axis, and finally move 3 units parallel to the z-axis. The corresponding vector a = [1, 2, 3] is then $\overrightarrow{OA}$, as shown in Figure 1.15.

Another way to visualize vector a in $\mathbb{R}^3$ is to construct a box whose six sides are determined by the three coordinate planes (the xy-, xz-, and yz-planes) and by three planes through the point (1, 2, 3) parallel to the coordinate planes. The vector [1, 2, 3] then corresponds to the diagonal from the origin to the opposite corner of the box (see Figure 1.16).

The "componentwise" definitions of vector addition and scalar multiplication are extended to $\mathbb{R}^3$ in an obvious way.

Vectors in $\mathbb{R}^n$

In general, we define $\mathbb{R}^n$ as the set of all ordered n-tuples of real numbers written as row or column vectors. Thus, a vector v in $\mathbb{R}^n$ is of the form
$$v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
The individual entries of v are its components; $v_i$ is called the ith component.

We extend the definitions of vector addition and scalar multiplication to $\mathbb{R}^n$ in the obvious way: If $u = [u_1, u_2, \ldots, u_n]$ and $v = [v_1, v_2, \ldots, v_n]$, the ith component of u + v is $u_i + v_i$ and the ith component of cv is just $cv_i$.

Since in $\mathbb{R}^n$ we can no longer draw pictures of vectors, it is important to be able to calculate with vectors. We must be careful not to assume that vector arithmetic will be similar to the arithmetic of real numbers.
Often it is, and the algebraic calculations we do with vectors are similar to those we would do with scalars. But, in later sections, we will encounter situations where vector algebra is quite unlike our previous experience with real numbers. So it is important to verify any algebraic properties before attempting to use them.

One such property is commutativity of addition: u + v = v + u for vectors u and v. This is certainly true in $\mathbb{R}^2$. Geometrically, the head-to-tail rule shows that both u + v and v + u are the main diagonals of the parallelogram determined by u and v. (The parallelogram rule also reflects this symmetry; see Figure 1.17.)

Note that Figure 1.17 is simply an illustration of the property u + v = v + u. It is not a proof, since it does not cover every possible case. For example, we must also include the cases where u = v, u = -v, and u = 0. (What would diagrams for these cases look like?) For this reason, an algebraic proof is needed. However, it is just as easy to give a proof that is valid in $\mathbb{R}^n$ as to give one that is valid in $\mathbb{R}^2$.

The following theorem summarizes the algebraic properties of vector addition and scalar multiplication in $\mathbb{R}^n$. The proofs follow from the corresponding properties of real numbers.

Theorem 1.1 Algebraic Properties of Vectors in $\mathbb{R}^n$
Let u, v, and w be vectors in $\mathbb{R}^n$ and let c and d be scalars. Then
a. u + v = v + u (Commutativity)
b. (u + v) + w = u + (v + w) (Associativity)
c. u + 0 = u
d. u + (-u) = 0
e. c(u + v) = cu + cv (Distributivity)
f. (c + d)u = cu + du (Distributivity)
g. c(du) = (cd)u
h. 1u = u

Remarks

The word theorem is derived from the Greek word theorema, which in turn comes from a word meaning "to look at." Thus, a theorem is based on the insights we have when we look at examples and extract from them properties that we try to prove hold in general.
Similarly, when we understand something in mathematics (the proof of a theorem, for example), we often say, "I see."

• Properties (c) and (d) together with the commutativity property (a) imply that 0 + u = u and -u + u = 0 as well.
• If we read the distributivity properties (e) and (f) from right to left, they say that we can factor a common scalar or a common vector from a sum.

Proof We prove properties (a) and (b) and leave the proofs of the remaining properties as exercises. Let $u = [u_1, u_2, \ldots, u_n]$, $v = [v_1, v_2, \ldots, v_n]$, and $w = [w_1, w_2, \ldots, w_n]$.

(a)
$$\begin{aligned} u + v &= [u_1, u_2, \ldots, u_n] + [v_1, v_2, \ldots, v_n] \\ &= [u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n] \\ &= [v_1 + u_1, v_2 + u_2, \ldots, v_n + u_n] \\ &= [v_1, v_2, \ldots, v_n] + [u_1, u_2, \ldots, u_n] \\ &= v + u \end{aligned}$$
The second and fourth equalities are by the definition of vector addition, and the third equality is by the commutativity of addition of real numbers.

(b) Figure 1.18 illustrates associativity in $\mathbb{R}^2$. Algebraically, we have
$$\begin{aligned} (u + v) + w &= ([u_1, \ldots, u_n] + [v_1, \ldots, v_n]) + [w_1, \ldots, w_n] \\ &= [u_1 + v_1, \ldots, u_n + v_n] + [w_1, \ldots, w_n] \\ &= [(u_1 + v_1) + w_1, \ldots, (u_n + v_n) + w_n] \\ &= [u_1 + (v_1 + w_1), \ldots, u_n + (v_n + w_n)] \\ &= [u_1, \ldots, u_n] + [v_1 + w_1, \ldots, v_n + w_n] \\ &= [u_1, \ldots, u_n] + ([v_1, \ldots, v_n] + [w_1, \ldots, w_n]) \\ &= u + (v + w) \end{aligned}$$
The fourth equality is by the associativity of addition of real numbers. Note the careful use of parentheses.

By property (b) of Theorem 1.1, we may unambiguously write u + v + w without parentheses, since we may group the summands in whichever way we please. By (a), we may also rearrange the summands (for example, as w + u + v) if we choose. Likewise, sums of four or more vectors can be calculated without regard to order or grouping. In general, if $v_1, v_2, \ldots, v_k$ are vectors in $\mathbb{R}^n$, we will write such sums without parentheses:
$$v_1 + v_2 + \cdots + v_k$$

The next example illustrates the use of Theorem 1.1 in performing algebraic calculations with vectors.
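Theorem 1.1 is proved algebraically above, but a numerical spot check is still a handy way to catch slips when these properties are used in calculations. The Python sketch below tests properties (a) through (h) on random integer vectors in $\mathbb{R}^n$. Integers keep every equality test exact (floating-point rounding could make a true identity fail), and passing the check is evidence, not a proof, for exactly the reason given earlier about Figure 1.17. All function names are illustrative.

```python
import random


def add(u, v):
    """Componentwise sum of two vectors in R^n stored as equal-length tuples."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return tuple(ui + vi for ui, vi in zip(u, v))


def scale(c, v):
    """Scalar multiple cv: multiply every component of v by c."""
    return tuple(c * vi for vi in v)


def spot_check_theorem_1_1(n=5, trials=200, seed=0):
    """Test properties (a)-(h) of Theorem 1.1 on random integer vectors.

    Returns True if every sampled case satisfies every property.
    """
    rng = random.Random(seed)

    def rand_vec():
        return tuple(rng.randint(-9, 9) for _ in range(n))

    zero = (0,) * n
    for _ in range(trials):
        u, v, w = rand_vec(), rand_vec(), rand_vec()
        c, d = rng.randint(-9, 9), rng.randint(-9, 9)
        checks = [
            add(u, v) == add(v, u),                                # (a)
            add(add(u, v), w) == add(u, add(v, w)),                # (b)
            add(u, zero) == u,                                     # (c)
            add(u, scale(-1, u)) == zero,                          # (d)
            scale(c, add(u, v)) == add(scale(c, u), scale(c, v)),  # (e)
            scale(c + d, u) == add(scale(c, u), scale(d, u)),      # (f)
            scale(c, scale(d, u)) == scale(c * d, u),              # (g)
            scale(1, u) == u,                                      # (h)
        ]
        if not all(checks):
            return False
    return True


print(spot_check_theorem_1_1())  # True
```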
Example 1.5 Let a, b, and x denote vectors in $\mathbb{R}^n$.
(a) Simplify 3a + (5b - 2a) + 2(b - a).
(b) If 5x - a = 2(a + 2x), solve for x in terms of a.

Solution We will give both solutions in detail, with reference to all of the properties in Theorem 1.1 that we use. It is good practice to justify all steps the first few times you do this type of calculation. Once you are comfortable with the vector properties, though, it is acceptable to leave out some of the intermediate steps to save time and space.

(a) We begin by inserting parentheses.
$$\begin{aligned} 3a + (5b - 2a) + 2(b - a) &= (3a + (5b - 2a)) + 2(b - a) \\ &= (3a + (-2a + 5b)) + (2b - 2a) && \text{(a), (e)} \\ &= ((3a + (-2a)) + 5b) + (2b - 2a) && \text{(b)} \\ &= ((3 + (-2))a + 5b) + (2b - 2a) && \text{(f)} \\ &= (1a + 5b) + (2b - 2a) \\ &= ((a + 5b) + 2b) - 2a && \text{(b), (h)} \\ &= (a + (5b + 2b)) - 2a && \text{(b)} \\ &= (a + (5 + 2)b) - 2a && \text{(f)} \\ &= (7b + a) - 2a && \text{(a)} \\ &= 7b + (a - 2a) && \text{(b)} \\ &= 7b + (1 - 2)a && \text{(f), (h)} \\ &= 7b + (-1)a \\ &= 7b - a \end{aligned}$$
You can see why we will agree to omit some of these steps! In practice, it is acceptable to simplify this sequence of steps as
$$3a + (5b - 2a) + 2(b - a) = 3a + 5b - 2a + 2b - 2a = (3a - 2a - 2a) + (5b + 2b) = -a + 7b$$
or even to do most of the calculation mentally.

(b) In detail, we have
$$\begin{aligned} 5x - a &= 2(a + 2x) \\ 5x - a &= 2a + 2(2x) && \text{(e)} \\ 5x - a &= 2a + (2 \cdot 2)x && \text{(g)} \\ 5x - a &= 2a + 4x \\ (5x - a) - 4x &= (2a + 4x) - 4x \\ (-a + 5x) - 4x &= 2a + (4x - 4x) && \text{(a), (b)} \\ -a + (5x - 4x) &= 2a + 0 && \text{(b), (d)} \\ -a + (5 - 4)x &= 2a && \text{(f), (c)} \\ -a + (1)x &= 2a \\ a + (-a + x) &= a + 2a && \text{(h)} \\ (a + (-a)) + x &= (1 + 2)a && \text{(b), (f)} \\ 0 + x &= 3a && \text{(d)} \\ x &= 3a && \text{(c)} \end{aligned}$$
Again, we will usually omit most of these steps.

Linear Combinations and Coordinates

A vector that is a sum of scalar multiples of other vectors is said to be a linear combination of those vectors. The formal definition follows.

Definition A vector v is a linear combination of vectors $v_1, v_2, \ldots, v_k$ if there are scalars $c_1, c_2, \ldots, c_k$ such that $v = c_1v_1 + c_2v_2 + \cdots + c_kv_k$.
The scalars $c_1, c_2, \ldots, c_k$ are called the coefficients of the linear combination.

Example 1.6

Remark Determining whether a given vector is a linear combination of other vectors is a problem we will address in Chapter 2.

In $\mathbb{R}^2$, it is possible to depict linear combinations of two (nonparallel) vectors quite conveniently.

Example 1.7 Let u and v be the vectors shown in Figure 1.19. We can use u and v to locate a new set of axes (in the same way that $e_1 = [1, 0]$ and $e_2 = [0, 1]$ locate the standard coordinate axes). We can use these new axes to determine a coordinate grid that will let us easily locate linear combinations of u and v.

As Figure 1.19 shows, w can be located by starting at the origin and traveling -u followed by 2v. That is,
$$w = -u + 2v$$
We say that the coordinates of w with respect to u and v are -1 and 2. (Note that this is just another way of thinking of the coefficients of the linear combination.) It follows that w = [-1, 3]. (Observe that -1 and 3 are the coordinates of w with respect to $e_1$ and $e_2$.)

Switching from the standard coordinate axes to alternative ones is a useful idea. It has applications in chemistry and geology, since molecular and crystalline structures often do not fall onto a rectangular grid. It is an idea that we will encounter repeatedly in this book.

Binary Vectors and Modular Arithmetic

We will also encounter a type of vector that has no geometric interpretation, at least not using Euclidean geometry. Computers represent data in terms of 0s and 1s (which can be interpreted as off/on, closed/open, false/true, or no/yes). Binary vectors are vectors each of whose components is a 0 or a 1. As we will see in Chapter 8, such vectors arise naturally in the study of many types of codes.

In this setting, the usual rules of arithmetic must be modified, since the result of each calculation involving scalars must be a 0 or a 1.
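On a computer, this modified arithmetic amounts to ordinary integer arithmetic followed by keeping only the remainder after division by 2. The Python sketch below takes the modulus m as a parameter, so the same helpers cover $\mathbb{Z}_2$ here and the more general $\mathbb{Z}_m$ introduced later in this section; vectors are assumed to be tuples of integers, and the function names are illustrative.

```python
from itertools import product


def add_mod(u, v, m=2):
    """Componentwise sum of two vectors with entries in Z_m."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return tuple((ui + vi) % m for ui, vi in zip(u, v))


def scale_mod(c, v, m=2):
    """Scalar multiple cv with all arithmetic reduced modulo m."""
    return tuple((c * vi) % m for vi in v)


def tables(m=2):
    """Addition and multiplication tables for Z_m as nested lists."""
    plus = [[(a + b) % m for b in range(m)] for a in range(m)]
    times = [[(a * b) % m for b in range(m)] for a in range(m)]
    return plus, times


def all_vectors(n, m=2):
    """Every vector of length n with entries in Z_m (binary vectors when m = 2)."""
    return list(product(range(m), repeat=n))


print(tables(2))                                  # ([[0, 1], [1, 0]], [[0, 0], [0, 1]])
print(add_mod((1, 1, 0, 1, 0), (0, 1, 1, 1, 0)))  # (1, 0, 1, 0, 0)
print(all_vectors(2))                             # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With `m=3`, the same functions reproduce the ternary sum in Example 1.14, and `3548 % 3` gives the remainder 2 found in Example 1.12. Python's `%` operator returns a value in {0, 1, ..., m - 1} for any integer when m is positive (for example, `-2 % 3` is 1), which matches the clock picture of arithmetic modulo 3.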
The modified rules for addition and multiplication are given below.
$$\begin{array}{c|cc} + & 0 & 1 \\ \hline 0 & 0 & 1 \\ 1 & 1 & 0 \end{array} \qquad \begin{array}{c|cc} \cdot & 0 & 1 \\ \hline 0 & 0 & 0 \\ 1 & 0 & 1 \end{array}$$
The only curiosity here is the rule that 1 + 1 = 0. This is not as strange as it appears; if we replace 0 with the word "even" and 1 with the word "odd," these tables simply summarize the familiar parity rules for the addition and multiplication of even and odd integers. For example, 1 + 1 = 0 expresses the fact that the sum of two odd integers is an even integer. With these rules, our set of scalars {0, 1} is denoted by $\mathbb{Z}_2$ and is called the set of integers modulo 2.

Example 1.8 In $\mathbb{Z}_2$, 1 + 1 + 0 + 1 = 1 and 1 + 1 + 1 + 1 = 0. (These calculations illustrate the parity rules again: The sum of three odds and an even is odd; the sum of four odds is even.)

With $\mathbb{Z}_2$ as our set of scalars, we now extend the above rules to vectors. The set of all n-tuples of 0s and 1s (with all arithmetic performed modulo 2) is denoted by $\mathbb{Z}_2^n$. The vectors in $\mathbb{Z}_2^n$ are called binary vectors of length n.

We are using the term length differently from the way we used it in $\mathbb{R}^n$. This should not be confusing, since there is no geometric notion of length for binary vectors.

Example 1.9 The vectors in $\mathbb{Z}_2^2$ are [0, 0], [0, 1], [1, 0], and [1, 1]. (How many vectors does $\mathbb{Z}_2^n$ contain, in general?)

Example 1.10 Let u = [1, 1, 0, 1, 0] and v = [0, 1, 1, 1, 0] be two binary vectors of length 5. Find u + v.

Solution The calculation of u + v takes place over $\mathbb{Z}_2$, so we have
$$u + v = [1, 1, 0, 1, 0] + [0, 1, 1, 1, 0] = [1 + 0, 1 + 1, 0 + 1, 1 + 1, 0 + 0] = [1, 0, 1, 0, 0]$$

It is possible to generalize what we have just done for binary vectors to vectors whose components are taken from a finite set {0, 1, 2, ..., k} for $k \ge 2$. To do so, we must first extend the idea of binary arithmetic.

Example 1.11 The integers modulo 3 is the set $\mathbb{Z}_3 = \{0, 1, 2\}$ with addition and multiplication given by the following tables:
$$\begin{array}{c|ccc} + & 0 & 1 & 2 \\ \hline 0 & 0 & 1 & 2 \\ 1 & 1 & 2 & 0 \\ 2 & 2 & 0 & 1 \end{array} \qquad \begin{array}{c|ccc} \cdot & 0 & 1 & 2 \\ \hline 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 2 \\ 2 & 0 & 2 & 1 \end{array}$$
Observe that the result of each addition and multiplication belongs to the set {0, 1, 2}; we say that $\mathbb{Z}_3$ is closed with respect to the operations of addition and multiplication. It is perhaps easiest to think of this set in terms of a 3-hour clock with 0, 1, and 2 on its face, as shown in Figure 1.20.

The calculation 1 + 2 = 0 translates as follows: 2 hours after 1 o'clock, it is 0 o'clock. Just as 24:00 and 12:00 are the same on a 12-hour clock, so 3 and 0 are equivalent on this 3-hour clock. Likewise, all multiples of 3 (positive and negative) are equivalent to 0 here; 1 is equivalent to any number that is 1 more than a multiple of 3 (such as -2, 4, and 7); and 2 is equivalent to any number that is 2 more than a multiple of 3 (such as -1, 5, and 8). We can visualize the number line as wrapping around a circle, as shown in Figure 1.21.

Figure 1.20: Arithmetic modulo 3

Example 1.12 To what is 3548 equivalent in $\mathbb{Z}_3$?

Solution This is the same as asking where 3548 lies on our 3-hour clock. The key is to calculate how far this number is from the nearest (smaller) multiple of 3; that is, we need to know the remainder when 3548 is divided by 3. By long division, we find that $3548 = 3 \cdot 1182 + 2$, so the remainder is 2. Therefore, 3548 is equivalent to 2 in $\mathbb{Z}_3$.

In courses in abstract algebra and number theory, which explore this concept in greater detail, the above equivalence is often written as 3548 = 2 (mod 3) or $3548 \equiv 2 \pmod{3}$, where $\equiv$ is read "is congruent to." We will not use this notation or terminology here.

Example 1.13 In $\mathbb{Z}_3$, calculate 2 + 2 + 1 + 2.

Solution 1 We use the same ideas as in Example 1.12.
The ordinary sum is 2 + 2 + 1 + 2 = 7, which is 1 more than 6, so division by 3 leaves a remainder of 1. Thus, 2 + 2 + 1 + 2 = 1 in $\mathbb{Z}_3$.

Solution 2 A better way to perform this calculation is to do it step by step entirely in $\mathbb{Z}_3$.
$$2 + 2 + 1 + 2 = (2 + 2) + 1 + 2 = 1 + 1 + 2 = (1 + 1) + 2 = 2 + 2 = 1$$
Here we have used parentheses to group the terms we have chosen to combine. We could speed things up by simultaneously combining the first two and the last two terms:
$$(2 + 2) + (1 + 2) = 1 + 0 = 1$$

Repeated multiplication can be handled similarly. The idea is to use the addition and multiplication tables to reduce the result of each calculation to 0, 1, or 2.

Extending these ideas to vectors is straightforward.

Example 1.14 In $\mathbb{Z}_3^5$, let u = [2, 2, 0, 1, 2] and v = [1, 2, 2, 2, 1]. Then
$$\begin{aligned} u + v &= [2, 2, 0, 1, 2] + [1, 2, 2, 2, 1] \\ &= [2 + 1, 2 + 2, 0 + 2, 1 + 2, 2 + 1] \\ &= [0, 1, 2, 0, 0] \end{aligned}$$
Vectors in $\mathbb{Z}_3^5$ are referred to as ternary vectors of length 5.

In general, we have the set $\mathbb{Z}_m = \{0, 1, 2, \ldots, m - 1\}$ of integers modulo m (corresponding to an m-hour clock, as shown in Figure 1.22). A vector of length n whose entries are in $\mathbb{Z}_m$ is called an m-ary vector of length n. The set of all m-ary vectors of length n is denoted by $\mathbb{Z}_m^n$.

Figure 1.22: Arithmetic modulo m

Exercises 1.1

1. Draw the following vectors in standard position in $\mathbb{R}^2$:
(a) a = [3, 0] (b) b = [2, 3] (c) c = […] (d) d = [3, -2]
2. Draw the vectors in Exercise 1 with their tails at the point (2, -3).
3. Draw the following vectors in standard position in $\mathbb{R}^3$:
(a) a = [0, 2, 0] (b) b = [3, 2, 1] (c) c = [1, -2, 1] (d) d = [-1, -1, -2]
4. If the vectors in Exercise 3 are translated so that their heads are at the point (3, 2, 1), find the points that correspond to their tails.
5. For each of the following pairs of points, draw the vector $\overrightarrow{AB}$. Then compute and redraw $\overrightarrow{AB}$ as a vector in standard position.
(a) A = (1, -1), B = (4, 2) (b) A = (0, -2), B = (2, -1)
(c) A = (2, …), B = (…, 3) (d) A = (…, …), B = (…, …)
6. A hiker walks 4 km north and then 5 km northeast. Draw displacement vectors representing the hiker's trip and draw a vector that represents the hiker's net displacement from the starting point.

Exercises 7-10 refer to the vectors in Exercise 1. Compute the indicated vectors and also show how the results can be obtained geometrically.
7. a + b 8. b - c 9. d - c 10. a + d

Exercises 11 and 12 refer to the vectors in Exercise 3. Compute the indicated vectors.
11. 2a + 3c 12. 3b - 2c + d

13. Find the components of the vectors u, v, u + v, and u - v, where u and v are as shown in Figure 1.23.
14. In Figure 1.24, A, B, C, D, E, and F are the vertices of a regular hexagon centered at the origin. Express each of the following vectors in terms of $a = \overrightarrow{OA}$ and $b = \overrightarrow{OB}$:
(a) … (b) $\overrightarrow{BC}$ (c) … (d) … (e) … (f) $\overrightarrow{BC}$ + … + …

In Exercises 15 and 16, simplify the given vector expression. Indicate which properties in Theorem 1.1 you use.
15. 2(a - 3b) + 3(2b + a)
16. -3(a - c) + 2(a + 2b) + 3(c - b)

In Exercises 17 and 18, solve for the vector x in terms of the vectors a and b.
17. x - a = 2(x - 2a)
18. x + 2a - b = 3(x + a) - 2(2a - b)

In Exercises 19 and 20, draw the coordinate axes relative to u and v and locate w.
19. u = […], v = […], w = 2u + 3v
20. u = […], v = […], w = -u - 2v

In Exercises 21 and 22, draw the standard coordinate axes on the same diagram as the axes relative to u and v. Use these to find w as a linear combination of u and v.
21. u = […], v = […], w = […]
22. u = […], v = […], w = […]

23. Draw diagrams to illustrate properties (d) and (e) of Theorem 1.1.
24. Give algebraic proofs of properties (d) through (g) of Theorem 1.1.

In Exercises 25-28, u and v are binary vectors. Find u + v in each case.
25. u = […], v = […]
27. u = [1, 0, 1, 1], v = [1, 1, 1, 1]
28. u = [1, 1, 0, 1, 0], v = [0, 1, 1, 1, 0]

29. Write out the addition and multiplication tables for $\mathbb{Z}_4$.
30. Write out the addition and multiplication tables for $\mathbb{Z}_5$.

In Exercises 31-43, perform the indicated calculations.
31. 2 + 2 + 2 in $\mathbb{Z}_3$
32. 2 · 2 · 2 in $\mathbb{Z}_3$
33. 2(2 + 1 + 2) in $\mathbb{Z}_3$
34. 3 + 1 + 2 + 3 in $\mathbb{Z}_4$
35. 2 · 3 · 2 in $\mathbb{Z}_4$
36. 3(3 + 3 + 2) in $\mathbb{Z}_4$
37. 2 + 1 + 2 + 2 + 1 in $\mathbb{Z}_3$, $\mathbb{Z}_4$, and $\mathbb{Z}_5$
38. (3 + 4)(3 + 2 + 4 + 2) in $\mathbb{Z}_5$
39. 8(6 + 4 + 3) in $\mathbb{Z}_9$
40. $2^{100}$ in $\mathbb{Z}_{11}$
41. [2, 1, 2] + [2, 0, 1] in $\mathbb{Z}_3^3$
42. 2[2, 2, 1] in $\mathbb{Z}_3^3$
43. 2([3, 1, 1, 2] + [3, 3, 2, 1]) in $\mathbb{Z}_4^4$ and $\mathbb{Z}_5^4$

In Exercises 44-55, solve the given equation or indicate that there is no solution.
44. x + 3 = 2 in $\mathbb{Z}_5$
45. x + 5 = 1 in $\mathbb{Z}_6$
46. 2x = 1 in $\mathbb{Z}_3$
47. 2x = 1 in $\mathbb{Z}_4$
48. 2x = 1 in $\mathbb{Z}_5$
49. 3x = 4 in $\mathbb{Z}_5$
50. 3x = 4 in $\mathbb{Z}_6$
51. 6x = 5 in $\mathbb{Z}_5$
52. 8x = 9 in $\mathbb{Z}_{11}$
53. 2x + 3 = 2 in $\mathbb{Z}_5$
54. 4x + 5 = 2 in $\mathbb{Z}_6$
55. 6x + 3 = 1 in $\mathbb{Z}_5$
56. (a) For which values of a does x + a = 0 have a solution in $\mathbb{Z}_5$?
(b) For which values of a and b does x + a = b have a solution in $\mathbb{Z}_6$?
(c) For which values of a, b, and m does x + a = b have a solution in $\mathbb{Z}_m$?
57. (a) For which values of a does ax = 1 have a solution in $\mathbb{Z}_5$?
(b) For which values of a does ax = 1 have a solution in $\mathbb{Z}_6$?
(c) For which values of a and m does ax = 1 have a solution in $\mathbb{Z}_m$?

1.2 Length and Angle: The Dot Product

It is quite easy to reformulate the familiar geometric concepts of length, distance, and angle in terms of vectors. Doing so will allow us to use these important and powerful ideas in settings more general than $\mathbb{R}^2$ and $\mathbb{R}^3$. In subsequent chapters, these simple geometric tools will be used to solve a wide variety of problems arising in applications, even when there is no geometry apparent at all!
The Dot Product

The vector versions of length, distance, and angle can all be described using the notion of the dot product of two vectors.

Definition If
$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
then the dot product u · v of u and v is defined by
$$u \cdot v = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$

In words, u · v is the sum of the products of the corresponding components of u and v. It is important to note a couple of things about this "product" that we have just defined: First, u and v must have the same number of components. Second, the dot product u · v is a number, not another vector. (This is why u · v is sometimes called the scalar product of u and v.) The dot product of vectors in $\mathbb{R}^n$ is a special and important case of the more general notion of inner product, which we will explore in Chapter 7.

Example 1.15 Compute u · v when u = [1, 2, -3] and v = [-3, 5, 2].

Solution $u \cdot v = 1 \cdot (-3) + 2 \cdot 5 + (-3) \cdot 2 = 1$

Notice that if we had calculated v · u in Example 1.15, we would have computed
$$v \cdot u = (-3) \cdot 1 + 5 \cdot 2 + 2 \cdot (-3) = 1$$
That u · v = v · u in general is clear, since the individual products of the components commute. This commutativity property is one of the properties of the dot product that we will use repeatedly. The main properties of the dot product are summarized in Theorem 1.2.

Theorem 1.2 Let u, v, and w be vectors in $\mathbb{R}^n$ and let c be a scalar. Then
a. u · v = v · u (Commutativity)
b. u · (v + w) = u · v + u · w (Distributivity)
c. (cu) · v = c(u · v)
d. $u \cdot u \ge 0$ and u · u = 0 if and only if u = 0

Proof We prove (a) and (c) and leave proof of the remaining properties for the exercises.

(a) Applying the definition of dot product to u · v and v · u, we obtain
$$u \cdot v = u_1v_1 + u_2v_2 + \cdots + u_nv_n = v_1u_1 + v_2u_2 + \cdots + v_nu_n = v \cdot u$$
where the middle equality follows from the fact that multiplication of real numbers is commutative.

(c) Using the definitions of scalar multiplication and dot product, we have
$$\begin{aligned} (cu) \cdot v &= [cu_1, cu_2, \ldots, cu_n] \cdot [v_1, v_2, \ldots, v_n] \\ &= cu_1v_1 + cu_2v_2 + \cdots + cu_nv_n \\ &= c(u_1v_1 + u_2v_2 + \cdots + u_nv_n) \\ &= c(u \cdot v) \end{aligned}$$

Remarks
• Property (b) can be read from right to left, in which case it says that we can factor out a common vector u from a sum of dot products. This property also has a "right-handed" analogue that follows from properties (b) and (a) together: (v + w) · u = v · u + w · u.
• Property (c) can be extended to give u · (cv) = c(u · v) (Exercise 58). This extended version of (c) essentially says that in taking a scalar multiple of a dot product of vectors, the scalar can first be combined with whichever vector is more convenient. For example,
$$\left(\tfrac{1}{2}[-1, -3, 2]\right) \cdot [6, -4, 0] = [-1, -3, 2] \cdot \left(\tfrac{1}{2}[6, -4, 0]\right) = [-1, -3, 2] \cdot [3, -2, 0] = 3$$
With this approach we avoid introducing fractions into the vectors, as the original grouping would have.
• The second part of (d) uses the logical connective if and only if. Appendix A discusses this phrase in more detail, but for the moment let us just note that the wording signals a double implication, namely,
if u = 0, then u · u = 0 and if u · u = 0, then u = 0

Theorem 1.2 shows that aspects of the algebra of vectors resemble the algebra of numbers. The next example shows that we can sometimes find vector analogues of familiar identities.

Example 1.16 Prove that (u + v) · (u + v) = u · u + 2(u · v) + v · v for all vectors u and v in $\mathbb{R}^n$.

Solution
$$\begin{aligned} (u + v) \cdot (u + v) &= (u + v) \cdot u + (u + v) \cdot v \\ &= u \cdot u + v \cdot u + u \cdot v + v \cdot v \\ &= u \cdot u + u \cdot v + u \cdot v + v \cdot v \\ &= u \cdot u + 2(u \cdot v) + v \cdot v \end{aligned}$$
(Identify the parts of Theorem 1.2 that were used at each step.)

Length

To see how the dot product plays a role in the calculation of lengths, recall how lengths are computed in the plane. The Theorem of Pythagoras is all we need.
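The definition of u · v translates directly into a sum over paired components. This Python sketch checks Example 1.15, the remark about moving a scalar onto the more convenient vector, and the identity of Example 1.16 for one pair of vectors; it also shows that for v = [a, b], v · v equals $a^2 + b^2$, the quantity under the square root in Pythagoras' Theorem. `Fraction` from the standard library keeps the scalar 1/2 exact; the helper names are illustrative.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product: the sum of products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of components")
    return sum(ui * vi for ui, vi in zip(u, v))


def scale(c, v):
    """Scalar multiple cv."""
    return tuple(c * vi for vi in v)


def add(u, v):
    """Componentwise sum u + v."""
    return tuple(ui + vi for ui, vi in zip(u, v))


u, v = (1, 2, -3), (-3, 5, 2)
print(dot(u, v), dot(v, u))  # 1 1  (Example 1.15 and commutativity)

# Property (c) and its extension: the scalar 1/2 can go on either vector.
half = Fraction(1, 2)
print(dot(scale(half, (-1, -3, 2)), (6, -4, 0)))  # 3
print(dot((-1, -3, 2), scale(half, (6, -4, 0))))  # 3

# Example 1.16 checked for this particular u and v (a check, not a proof).
s = add(u, v)
print(dot(s, s) == dot(u, u) + 2 * dot(u, v) + dot(v, v))  # True

# For v = [a, b], v . v = a^2 + b^2.
print(dot((3, 4), (3, 4)))  # 25
```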
In $\mathbb{R}^2$, the length of the vector v = [a, b] is the distance from the origin to the point (a, b), which, by Pythagoras' Theorem, is given by $\sqrt{a^2 + b^2}$, as in Figure 1.25. Observe that $a^2 + b^2 = v \cdot v$. This leads to the following definition.

Definition The length (or norm) of a vector $v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ in $\mathbb{R}^n$ is the nonnegative scalar $\|v\|$ defined by
$$\|v\| = \sqrt{v \cdot v} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
In words, the length of a vector is the square root of the sum of the squares of its components. Note that the square root of v · v is always defined, since $v \cdot v \ge 0$ by Theorem 1.2(d). Note also that the definition can be rewritten to give $\|v\|^2 = v \cdot v$, which will be useful in proving further properties of the dot product and lengths of vectors.

Example 1.17 $\|[2, 3]\| = \sqrt{2^2 + 3^2} = \sqrt{13}$

Theorem 1.3 lists some of the main properties of vector length.

Theorem 1.3 Let v be a vector in $\mathbb{R}^n$ and let c be a scalar. Then
a. $\|v\| = 0$ if and only if v = 0
b. $\|cv\| = |c|\,\|v\|$

Proof Property (a) follows immediately from Theorem 1.2(d). To show (b), we have
$$\|cv\|^2 = (cv) \cdot (cv) = c^2(v \cdot v) = c^2\|v\|^2$$
using Theorem 1.2(c). Taking square roots of both sides, using the fact that $\sqrt{c^2} = |c|$ for any real number c, gives the result.

A vector of length 1 is called a unit vector. In $\mathbb{R}^2$, the set of all unit vectors can be identified with the unit circle, the circle of radius 1 centered at the origin (see Figure 1.26). Given any nonzero vector v, we can always find a unit vector in the same direction as v by dividing v by its own length (or, equivalently, multiplying by $1/\|v\|$). We can show this algebraically by using property (b) of Theorem 1.3 above: If $u = (1/\|v\|)v$, then
$$\|u\| = \left\|\frac{1}{\|v\|}v\right\| = \left|\frac{1}{\|v\|}\right|\|v\| = \frac{1}{\|v\|}\|v\| = 1$$
and u is in the same direction as v, since $1/\|v\|$ is a positive scalar.
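Length and normalization follow the same pattern: compute v · v, take a square root, and divide. The Python sketch below uses floating-point arithmetic from the standard `math` module, so results are compared with `math.isclose` rather than `==`; the function names are illustrative, and refusing to normalize the zero vector mirrors the text's restriction to nonzero v.

```python
import math


def norm(v):
    """Length ||v|| = sqrt(v . v)."""
    return math.sqrt(sum(vi * vi for vi in v))


def normalize(v):
    """Unit vector (1/||v||) v in the direction of a nonzero vector v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(vi / length for vi in v)


print(norm((2, 3)), math.sqrt(13))  # the same value twice (Example 1.17)

u = normalize((2, -1, 3))
print(u)                            # 2, -1, and 3 each divided by sqrt(14)
print(math.isclose(norm(u), 1.0))   # True

# Theorem 1.3(b): ||cv|| = |c| ||v||, here with c = -2.
c, v = -2, (2, -1, 3)
print(math.isclose(norm(tuple(c * vi for vi in v)), abs(c) * norm(v)))  # True
```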
Finding a unit vector in the same direction is often referred to as normalizing a vector (see Figure 1.27).

[Figure 1.26: Unit vectors in $\mathbb{R}^2$] [Figure 1.27: Normalizing a vector]

Example 1.18 In $\mathbb{R}^2$, let $\mathbf{e}_1 = \begin{bmatrix}1\\0\end{bmatrix}$ and $\mathbf{e}_2 = \begin{bmatrix}0\\1\end{bmatrix}$. Then $\mathbf{e}_1$ and $\mathbf{e}_2$ are unit vectors, since the sum of the squares of their components is 1 in each case. Similarly, in $\mathbb{R}^3$, we can construct unit vectors
$$\mathbf{e}_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix}0\\1\\0\end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix}0\\0\\1\end{bmatrix}$$
Observe in Figure 1.28 that these vectors serve to locate the positive coordinate axes in $\mathbb{R}^2$ and $\mathbb{R}^3$.

[Figure 1.28: Standard unit vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$]

In general, in $\mathbb{R}^n$, we define unit vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$, where $\mathbf{e}_i$ has 1 in its $i$th component and zeros elsewhere. These vectors arise repeatedly in linear algebra and are called the standard unit vectors.

Example 1.19 Normalize the vector $\mathbf{v} = \begin{bmatrix}2\\-1\\3\end{bmatrix}$.

Solution $\|\mathbf{v}\| = \sqrt{2^2 + (-1)^2 + 3^2} = \sqrt{14}$, so a unit vector in the same direction as $\mathbf{v}$ is given by
$$\mathbf{u} = (1/\|\mathbf{v}\|)\mathbf{v} = (1/\sqrt{14})\begin{bmatrix}2\\-1\\3\end{bmatrix} = \begin{bmatrix}2/\sqrt{14}\\-1/\sqrt{14}\\3/\sqrt{14}\end{bmatrix}$$

Since property (b) of Theorem 1.3 describes how length behaves with respect to scalar multiplication, natural curiosity suggests that we ask whether length and vector addition are compatible. It would be nice if we had an identity such as $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$, but for almost any choice of vectors $\mathbf{u}$ and $\mathbf{v}$ this turns out to be false. [See Exercise 52(a).] However, all is not lost, for it turns out that if we replace the $=$ sign by $\le$, the resulting inequality is true. The proof of this famous and important result, the Triangle Inequality, relies on another important inequality, the Cauchy-Schwarz Inequality, which we will prove and discuss in more detail in Chapter 7.

Theorem 1.4 The Cauchy-Schwarz Inequality
For all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$,
$$|\mathbf{u}\cdot\mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$$
See Exercises 71 and 72 for algebraic and geometric approaches to the proof of this inequality.

In $\mathbb{R}^2$ or $\mathbb{R}^3$, where we can use geometry, it is clear from a diagram such as Figure 1.29 that $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ for all vectors $\mathbf{u}$ and $\mathbf{v}$. We now show that this is true more generally.

[Figure 1.29: The Triangle Inequality]

Theorem 1.5 The Triangle Inequality
For all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$,
$$\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$$

Proof Since both sides of the inequality are nonnegative, showing that the square of the left-hand side is less than or equal to the square of the right-hand side is equivalent to proving the theorem. (Why?) We compute
$$\begin{aligned} \|\mathbf{u}+\mathbf{v}\|^2 &= (\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}+\mathbf{v}) \\ &= \mathbf{u}\cdot\mathbf{u} + 2(\mathbf{u}\cdot\mathbf{v}) + \mathbf{v}\cdot\mathbf{v} && \text{By Example 1.16} \\ &\le \|\mathbf{u}\|^2 + 2|\mathbf{u}\cdot\mathbf{v}| + \|\mathbf{v}\|^2 \\ &\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 && \text{By Cauchy-Schwarz} \\ &= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2 \end{aligned}$$
as required.

Distance

The distance between two vectors is the direct analogue of the distance between two points on the real number line or two points in the Cartesian plane. On the number line (Figure 1.30), the distance between the numbers $a$ and $b$ is given by $|a - b|$. (Taking the absolute value ensures that we do not need to know which of $a$ or $b$ is larger.) This distance is also equal to $\sqrt{(a - b)^2}$, and its two-dimensional generalization is the familiar formula for the distance $d$ between points $(a_1, a_2)$ and $(b_1, b_2)$, namely,
$$d = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$$

[Figure 1.30: $d = |a - b| = |-2 - 3| = 5$]

In terms of vectors, if $\mathbf{a} = \begin{bmatrix}a_1\\a_2\end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix}b_1\\b_2\end{bmatrix}$, then $d$ is just the length of $\mathbf{a} - \mathbf{b}$, as shown in Figure 1.31. This is the basis for the next definition.

[Figure 1.31: $d = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2} = \|\mathbf{a} - \mathbf{b}\|$]

Definition The distance $d(\mathbf{u}, \mathbf{v})$ between vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ is defined by
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$$
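A small numerical sketch of the distance definition, together with a spot check of the Cauchy-Schwarz Inequality and the Triangle Inequality on one pair of vectors. A check on sample vectors only illustrates the theorems; it does not prove them. Vectors are plain Python lists, the helper names are hypothetical, and the sample vectors `u` and `v` are arbitrary choices that do not come from the text.

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Length ||v|| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))


def distance(u, v):
    """d(u, v) = ||u - v||, the length of the difference vector."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return norm([a - b for a, b in zip(u, v)])


# Figure 1.30, viewed as vectors in R^1: d = |-2 - 3| = 5
print(distance([-2], [3]))  # 5.0

# Arbitrary sample vectors in R^3
u, v = [1, 2, 3], [-4, 0, 5]
u_plus_v = [a + b for a, b in zip(u, v)]

# Cauchy-Schwarz: |u . v| <= ||u|| ||v||
print(abs(dot(u, v)) <= norm(u) * norm(v))  # True
# Triangle Inequality: ||u + v|| <= ||u|| + ||v||
print(norm(u_plus_v) <= norm(u) + norm(v))  # True
# Distance is symmetric, d(u, v) = d(v, u) (compare Exercise 55)
print(distance(u, v) == distance(v, u))  # True
```

Here $\mathbf{u} - \mathbf{v} = [5, 2, -2]$, so $d(\mathbf{u}, \mathbf{v}) = \sqrt{33}$; swapping the arguments only flips the signs of the components, which squaring removes.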
Example 1.20 Find the distance between $\mathbf{u} = \begin{bmatrix}\sqrt{2}\\1\\-1\end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix}0\\2\\-2\end{bmatrix}$.

Solution We compute $\mathbf{u} - \mathbf{v} = \begin{bmatrix}\sqrt{2}\\-1\\1\end{bmatrix}$, so
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\sqrt{2})^2 + (-1)^2 + 1^2} = \sqrt{4} = 2$$

Angles

The dot product can also be used to calculate the angle between a pair of vectors. In $\mathbb{R}^2$ or $\mathbb{R}^3$, the angle between the nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ will refer to the angle $\theta$ determined by these vectors that satisfies $0 \le \theta \le 180^\circ$ (see Figure 1.32).

[Figure 1.32: The angle between $\mathbf{u}$ and $\mathbf{v}$]

In Figure 1.33, consider the triangle with sides $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$, where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$. Applying the law of cosines to this triangle yields
$$\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$$
Expanding the left-hand side and using $\|\mathbf{v}\|^2 = \mathbf{v}\cdot\mathbf{v}$ several times, we obtain
$$\|\mathbf{u}\|^2 - 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$$
which, after simplification, leaves us with $\mathbf{u}\cdot\mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$. From this we obtain the following formula for the cosine of the angle $\theta$ between nonzero vectors $\mathbf{u}$ and $\mathbf{v}$. We state it as a definition.

Definition For nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$,
$$\cos\theta = \frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$$

Example 1.21 Compute the angle between the vectors $\mathbf{u} = [2, 1, -2]$ and $\mathbf{v} = [1, 1, 1]$.

Solution We calculate $\mathbf{u}\cdot\mathbf{v} = 2\cdot1 + 1\cdot1 + (-2)\cdot1 = 1$, $\|\mathbf{u}\| = \sqrt{2^2 + 1^2 + (-2)^2} = \sqrt{9} = 3$, and $\|\mathbf{v}\| = \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3}$. Therefore, $\cos\theta = 1/(3\sqrt{3})$, so $\theta = \cos^{-1}\left(1/(3\sqrt{3})\right) \approx 1.377$ radians, or approximately $78.9^\circ$.

Example 1.22 Compute the angle between the diagonals on two adjacent faces of a cube.

Solution The dimensions of the cube do not matter, so we will work with a cube with sides of length 1. Orient the cube relative to the coordinate axes in $\mathbb{R}^3$, as shown in Figure 1.34, and take the two side diagonals to be the vectors $[1, 0, 1]$ and $[0, 1, 1]$.
Then the angle $\theta$ between these vectors satisfies
$$\cos\theta = \frac{1\cdot0 + 0\cdot1 + 1\cdot1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2}$$
from which it follows that the required angle is $\pi/3$ radians, or $60^\circ$.

(Actually, we don't need to do any calculations at all to get this answer. If we draw a third side diagonal joining the vertices at $(1, 0, 1)$ and $(0, 1, 1)$, we get an equilateral triangle, since all of the side diagonals are of equal length. The angle we want is one of the angles of this triangle and therefore measures $60^\circ$. Sometimes, a little insight can save a lot of calculation; in this case, it gives a nice check on our work!)

Remarks
• As this discussion shows, we usually will have to settle for an approximation to the angle between two vectors. However, when the angle is one of the so-called special angles ($0^\circ$, $30^\circ$, $45^\circ$, $60^\circ$, $90^\circ$, or an integer multiple of these), we should be able to recognize its cosine (Table 1.1) and thus give the corresponding angle exactly. In all other cases, we will use a calculator or computer to approximate the desired angle by means of the inverse cosine function.

Table 1.1 Cosines of Special Angles
$$\begin{array}{c|ccccc} \theta & 0^\circ & 30^\circ & 45^\circ & 60^\circ & 90^\circ \\ \hline \cos\theta & \frac{\sqrt{4}}{2} = 1 & \frac{\sqrt{3}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{1}}{2} = \frac{1}{2} & \frac{\sqrt{0}}{2} = 0 \end{array}$$

• The derivation of the formula for the cosine of the angle between two vectors is valid only in $\mathbb{R}^2$ or $\mathbb{R}^3$, since it depends on a geometric fact: the law of cosines. In $\mathbb{R}^n$, for $n > 3$, the formula can be taken as a definition instead. This makes sense, since the Cauchy-Schwarz Inequality implies that $\left|\dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right| \le 1$, so $\dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$ ranges from $-1$ to $1$, just as the cosine function does.

Orthogonal Vectors

The word orthogonal is derived from the Greek words orthos, meaning "upright," and gonia, meaning "angle." Hence, orthogonal literally means "right-angled." The Latin equivalent is rectangular.

The concept of perpendicularity is fundamental to geometry.
Anyone studying geometry quickly realizes the importance and usefulness of right angles. We now generalize the idea of perpendicularity to vectors in $\mathbb{R}^n$, where it is called orthogonality.

In $\mathbb{R}^2$ or $\mathbb{R}^3$, two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ are perpendicular if the angle $\theta$ between them is a right angle, that is, if $\theta = \pi/2$ radians, or $90^\circ$. Thus, $\dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \cos 90^\circ = 0$, and it follows that $\mathbf{u}\cdot\mathbf{v} = 0$. This motivates the following definition.

Definition Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are orthogonal to each other if $\mathbf{u}\cdot\mathbf{v} = 0$.

Since $\mathbf{0}\cdot\mathbf{v} = 0$ for every vector $\mathbf{v}$ in $\mathbb{R}^n$, the zero vector is orthogonal to every vector.

Example 1.23 In $\mathbb{R}^3$, $\mathbf{u} = [1, 1, -2]$ and $\mathbf{v} = [3, 1, 2]$ are orthogonal, since $\mathbf{u}\cdot\mathbf{v} = 3 + 1 - 4 = 0$.

Using the notion of orthogonality, we get an easy proof of Pythagoras' Theorem, valid in $\mathbb{R}^n$.

Theorem 1.6 Pythagoras' Theorem
For all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$,
$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$
if and only if $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

Proof From Example 1.16, we have $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$. It follows immediately that $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$ if and only if $\mathbf{u}\cdot\mathbf{v} = 0$. See Figure 1.35.

The concept of orthogonality is one of the most important and useful in linear algebra, and it often arises in surprising ways. Chapter 5 contains a detailed treatment of the topic, but we will encounter it many times before then. One problem in which it clearly plays a role is finding the distance from a point to a line, where "dropping a perpendicular" is a familiar step.

Projections

We now consider the problem of finding the distance from a point to a line in the context of vectors. As you will see, this technique leads to an important concept: the projection of a vector onto another vector.
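Before turning to projections, the angle formula and the orthogonality test from the preceding subsections can be checked numerically against Examples 1.21, 1.22, and 1.23 and against Theorem 1.6. This is a sketch with hypothetical helper names and plain Python lists for vectors. The clamp to $[-1, 1]$ is an added safeguard, not part of the text: the Cauchy-Schwarz Inequality keeps the exact ratio in that range, but floating-point rounding can push it slightly outside, where `math.acos` would raise `ValueError`. The orthogonality test uses exact comparison and is meant for integer entries; floating-point inputs would need a tolerance.

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Length ||v|| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))


def angle(u, v):
    """Angle in [0, pi] between nonzero u and v, from cos(theta) = u.v / (||u|| ||v||)."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    c = dot(u, v) / (nu * nv)
    # Clamp only absorbs rounding error; mathematically -1 <= c <= 1 already.
    return math.acos(max(-1.0, min(1.0, c)))


def is_orthogonal(u, v):
    """u and v are orthogonal when u . v = 0 (exact test for integer entries)."""
    return dot(u, v) == 0


# Example 1.21: about 1.377 radians, or 78.9 degrees
theta = angle([2, 1, -2], [1, 1, 1])
print(round(theta, 3), round(math.degrees(theta), 1))  # 1.377 78.9

# Example 1.22: diagonals of adjacent cube faces meet at pi/3
print(math.isclose(angle([1, 0, 1], [0, 1, 1]), math.pi / 3))  # True

# Example 1.23 and Theorem 1.6: for orthogonal u, v, ||u + v||^2 = ||u||^2 + ||v||^2
u, v = [1, 1, -2], [3, 1, 2]
s = [a + b for a, b in zip(u, v)]
print(is_orthogonal(u, v), dot(s, s) == dot(u, u) + dot(v, v))  # True True
```

Comparing squared lengths through `dot` keeps the Pythagoras check in exact integer arithmetic ($20 = 6 + 14$), avoiding the rounding that taking square roots would introduce.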
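As Figure 1.36 shows, the problem of finding the distance from a point $B$ to a line $\ell$ (in $\mathbb{R}^2$ or $\mathbb{R}^3$) reduces to the problem of finding the length of the perpendicular line segment $PB$ or, equivalently, the length of the vector $\overrightarrow{PB}$. If we choose a point $A$ on $\ell$, then, in the right-angled triangle $\triangle APB$, the other two vectors are the leg $\overrightarrow{AP}$ and the hypotenuse $\overrightarrow{AB}$. $\overrightarrow{AP}$ is called the projection of $\overrightarrow{AB}$ onto the line $\ell$. We will now look at this situation in terms of vectors.

[Figure 1.36: The distance from a point to a line]

Consider two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$. Let $\mathbf{p}$ be the vector obtained by dropping a perpendicular from the head of $\mathbf{v}$ onto $\mathbf{u}$ and let $\theta$ be the angle between $\mathbf{u}$ and $\mathbf{v}$, as shown in Figure 1.37. Then clearly $\mathbf{p} = \|\mathbf{p}\|\,\hat{\mathbf{u}}$, where $\hat{\mathbf{u}} = (1/\|\mathbf{u}\|)\mathbf{u}$ is the unit vector in the direction of $\mathbf{u}$. Moreover, elementary trigonometry gives $\|\mathbf{p}\| = \|\mathbf{v}\|\cos\theta$, and we know that $\cos\theta = \dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$. Thus, after substitution, we obtain
$$\mathbf{p} = \|\mathbf{v}\|\left(\frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)\left(\frac{1}{\|\mathbf{u}\|}\right)\mathbf{u} = \left(\frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|^2}\right)\mathbf{u} = \left(\frac{\mathbf{u}\cdot\mathbf{v}}{\mathbf{u}\cdot\mathbf{u}}\right)\mathbf{u}$$

[Figure 1.37: The projection of $\mathbf{v}$ onto $\mathbf{u}$]

This is the formula we want, and it is the basis of the following definition for vectors in $\mathbb{R}^n$.

Definition If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{R}^n$ and $\mathbf{u} \ne \mathbf{0}$, then the projection of $\mathbf{v}$ onto $\mathbf{u}$ is the vector $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ defined by
$$\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = \left(\frac{\mathbf{u}\cdot\mathbf{v}}{\mathbf{u}\cdot\mathbf{u}}\right)\mathbf{u}$$

An alternative way to derive this formula is described in Exercise 73.

Remarks
• The term projection comes from the idea of projecting an image onto a wall (with a slide projector, for example). Imagine a beam of light with rays parallel to each other and perpendicular to $\mathbf{u}$ shining down on $\mathbf{v}$. The projection of $\mathbf{v}$ onto $\mathbf{u}$ is just the shadow cast, or projected, by $\mathbf{v}$ onto $\mathbf{u}$.
• It may be helpful to think of $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ as a function with variable $\mathbf{v}$. Then the variable $\mathbf{v}$ occurs only once on the right-hand side of the definition.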
Also, it is helpful to remember Figure 1.38, which reminds us that $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is a scalar multiple of the vector $\mathbf{u}$ (not $\mathbf{v}$).
• Although in our derivation of the definition of $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ we required $\mathbf{v}$ as well as $\mathbf{u}$ to be nonzero (why?), it is clear from the geometry that the projection of the zero vector onto $\mathbf{u}$ is $\mathbf{0}$. The definition is in agreement with this, since $\left(\dfrac{\mathbf{u}\cdot\mathbf{0}}{\mathbf{u}\cdot\mathbf{u}}\right)\mathbf{u} = 0\mathbf{u} = \mathbf{0}$.
• If the angle between $\mathbf{u}$ and $\mathbf{v}$ is obtuse, as in Figure 1.38, then $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ will be in the opposite direction from $\mathbf{u}$; that is, $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ will be a negative scalar multiple of $\mathbf{u}$.
• If $\mathbf{u}$ is a unit vector then $\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = (\mathbf{u}\cdot\mathbf{v})\mathbf{u}$. (Why?)

Example 1.24 Find the projection of $\mathbf{v}$ onto $\mathbf{u}$ in each case.
(a) $\mathbf{v} = \begin{bmatrix}-1\\3\end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix}2\\1\end{bmatrix}$
(b) $\mathbf{v} = \begin{bmatrix}1\\2\\3\end{bmatrix}$ and $\mathbf{u} = \mathbf{e}_3$
(c) $\mathbf{v} = \begin{bmatrix}1\\2\\3\end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix}1/2\\1/2\\1/\sqrt{2}\end{bmatrix}$

Solution (a) We compute $\mathbf{u}\cdot\mathbf{v} = 2\cdot(-1) + 1\cdot3 = 1$ and $\mathbf{u}\cdot\mathbf{u} = 2\cdot2 + 1\cdot1 = 5$, so
$$\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = \left(\frac{\mathbf{u}\cdot\mathbf{v}}{\mathbf{u}\cdot\mathbf{u}}\right)\mathbf{u} = \frac{1}{5}\begin{bmatrix}2\\1\end{bmatrix} = \begin{bmatrix}2/5\\1/5\end{bmatrix}$$
(b) Since $\mathbf{e}_3$ is a unit vector,
$$\operatorname{proj}_{\mathbf{e}_3}(\mathbf{v}) = (\mathbf{e}_3\cdot\mathbf{v})\mathbf{e}_3 = 3\mathbf{e}_3 = \begin{bmatrix}0\\0\\3\end{bmatrix}$$
(c) We see that $\|\mathbf{u}\| = \sqrt{\tfrac14 + \tfrac14 + \tfrac12} = 1$. Thus,
$$\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = (\mathbf{u}\cdot\mathbf{v})\mathbf{u} = \left(\frac12 + 1 + \frac{3}{\sqrt2}\right)\begin{bmatrix}1/2\\1/2\\1/\sqrt2\end{bmatrix} = \frac{3(1 + \sqrt2)}{2}\begin{bmatrix}1/2\\1/2\\1/\sqrt2\end{bmatrix}$$

Exercises 1.2

In Exercises 1-6, find $\mathbf{u}\cdot\mathbf{v}$.
5. $\mathbf{u} = [1, \sqrt2, \sqrt3, 0]$, $\mathbf{v} = [4, -\sqrt2, 0, -5]$
6. $\mathbf{u} = [1.12, -3.25, 2.07, -1.83]$, $\mathbf{v} = [-2.29, 1.72, 4.33, -1.54]$

In Exercises 7-12, find $\|\mathbf{u}\|$ for the given exercise, and give a unit vector in the direction of $\mathbf{u}$.
7. Exercise 1   8. Exercise 2   9. Exercise 3   10. Exercise 4   11. Exercise 5   12. Exercise 6

In Exercises 13-16, find the distance $d(\mathbf{u}, \mathbf{v})$ between $\mathbf{u}$ and $\mathbf{v}$ in the given exercise.
13. Exercise 1   14. Exercise 2   15. Exercise 3   16. Exercise 4

17. If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $\mathbb{R}^n$, $n \ge 2$, and $c$ is a scalar, explain why the following expressions make no sense:
(a) $\|\mathbf{u}\cdot\mathbf{v}\|$   (b) $\mathbf{u}\cdot\mathbf{v} + \mathbf{w}$   (c) $\mathbf{u}\cdot(\mathbf{v}\cdot\mathbf{w})$   (d) $c\cdot(\mathbf{u} + \mathbf{w})$

In Exercises 18-23, determine whether the angle between $\mathbf{u}$ and $\mathbf{v}$ is acute, obtuse, or a right angle.
20. $\mathbf{u} = [4, 3, -1]$, $\mathbf{v} = [1, -1, 1]$
21. (CAS) $\mathbf{u} = [0.9, 2.1, 1.2]$, $\mathbf{v} = [-4.5, 2.6, -0.8]$
22. $\mathbf{u} = [1, 2, 3, 4]$, $\mathbf{v} = [-3, 1, 2, -2]$
23. $\mathbf{u} = [1, 2, 3, 4]$, $\mathbf{v} = [5, 6, 7, 8]$

In Exercises 24-29, find the angle between $\mathbf{u}$ and $\mathbf{v}$ in the given exercise.
24. Exercise 18   25. Exercise 19   26. Exercise 20   27. Exercise 21   28. Exercise 22   29. Exercise 23

30. Let $A = (-3, 2)$, $B = (1, 0)$, and $C = (4, 6)$. Prove that $\triangle ABC$ is a right-angled triangle.
31. Let $A = (1, 1, -1)$, $B = (-3, 2, -2)$, and $C = (2, 2, -4)$. Prove that $\triangle ABC$ is a right-angled triangle.
32. Find the angle between a diagonal of a cube and an adjacent edge.
33. A cube has four diagonals. Show that no two of them are perpendicular.
34. A parallelogram has diagonals determined by the vectors $\mathbf{d}_1 = [\ldots]$ and $\mathbf{d}_2 = [\ldots]$. Show that the parallelogram is a rhombus (all sides of equal length) and determine the side length.
35. The rectangle $ABCD$ has vertices at $A = (1, 2, 3)$, $B = (3, 6, -2)$, and $C = (0, 5, -4)$. Determine the coordinates of vertex $D$.
36. An airplane heading due east has a velocity of 200 miles per hour. A wind is blowing from the north at 40 miles per hour. What is the resultant velocity of the airplane?
37. A boat heads north across a river at a rate of 4 miles per hour. If the current is flowing east at a rate of 3 miles per hour, find the resultant velocity of the boat.
38. Ann is driving a motorboat across a river that is 2 km wide. The boat has a speed of 20 km/h in still water, and the current in the river is flowing at 5 km/h. Ann heads out from one bank of the river for a dock directly across from her on the opposite bank. She drives the boat in a direction perpendicular to the current.
(a) How far downstream from the dock will Ann land?
(b) How long will it take Ann to cross the river?
39. Bert can swim at a rate of 2 miles per hour in still water. The current in a river is flowing at a rate of 1 mile per hour.
If Bert wants to swim across the river to a point directly opposite, at what angle to the bank of the river must he swim?

In Exercises 40-45, find the projection of $\mathbf{v}$ onto $\mathbf{u}$. Draw a sketch in Exercises 40 and 41.
45. (CAS) $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$

Figure 1.39 suggests two ways in which vectors may be used to compute the area of a triangle. The area $A$ of the triangle in part (a) is given by $\tfrac12\|\mathbf{u}\|\,\|\mathbf{v} - \operatorname{proj}_{\mathbf{u}}(\mathbf{v})\|$, and part (b) suggests the trigonometric form of the area of a triangle:
$$A = \tfrac12\|\mathbf{u}\|\,\|\mathbf{v}\|\sin\theta$$
(We can use the identity $\sin\theta = \sqrt{1 - \cos^2\theta}$ to find $\sin\theta$.)

In Exercises 46 and 47, compute the area of the triangle with the given vertices using both methods.
46. $A = (1, -1)$, $B = (2, 2)$, $C = (4, 0)$
47. $A = (3, -1, 4)$, $B = (4, -2, 6)$, $C = (5, 0, 2)$

In Exercises 48 and 49, find all values of the scalar $k$ for which the two vectors are orthogonal.
48. $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$
49. $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$
50. Describe all vectors $\mathbf{v} = \begin{bmatrix}x\\y\end{bmatrix}$ that are orthogonal to $\mathbf{u} = [\ldots]$.
51. Describe all vectors $\mathbf{v} = \begin{bmatrix}x\\y\end{bmatrix}$ that are orthogonal to $\mathbf{u} = [\ldots]$.
52. Under what conditions are the following true for vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^2$ or $\mathbb{R}^3$?
(a) $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$   (b) $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| - \|\mathbf{v}\|$
53. Prove Theorem 1.2(b).
54. Prove Theorem 1.2(d).

In Exercises 55-57, prove the stated property of distance between vectors.
55. $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$ for all vectors $\mathbf{u}$ and $\mathbf{v}$
56. $d(\mathbf{u}, \mathbf{w}) \le d(\mathbf{u}, \mathbf{v}) + d(\mathbf{v}, \mathbf{w})$ for all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$
57. $d(\mathbf{u}, \mathbf{v}) = 0$ if and only if $\mathbf{u} = \mathbf{v}$
58. Prove that $\mathbf{u}\cdot c\mathbf{v} = c(\mathbf{u}\cdot\mathbf{v})$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ and all scalars $c$.
59. Prove that $\|\mathbf{u} - \mathbf{v}\| \ge \|\mathbf{u}\| - \|\mathbf{v}\|$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$. [Hint: Replace $\mathbf{u}$ by $\mathbf{u} - \mathbf{v}$ in the Triangle Inequality.]
60. Suppose we know that $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\cdot\mathbf{w}$. Does it follow that $\mathbf{v} = \mathbf{w}$?
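If it does, give a proof that is valid in $\mathbb{R}^n$; otherwise, give a counterexample (i.e., a specific set of vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ for which $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\cdot\mathbf{w}$ but $\mathbf{v} \ne \mathbf{w}$).
61. Prove that $(\mathbf{u} + \mathbf{v})\cdot(\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - \|\mathbf{v}\|^2$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$.
62. (a) Prove that $\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\|\mathbf{u}\|^2 + 2\|\mathbf{v}\|^2$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$.
(b) Draw a diagram showing $\mathbf{u}$, $\mathbf{v}$, $\mathbf{u} + \mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ in $\mathbb{R}^2$ and use (a) to deduce a result about parallelograms.
63. Prove that $\mathbf{u}\cdot\mathbf{v} = \tfrac14\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac14\|\mathbf{u} - \mathbf{v}\|^2$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$.
64. (a) Prove that $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u} - \mathbf{v}\|$ if and only if $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.
(b) Draw a diagram showing $\mathbf{u}$, $\mathbf{v}$, $\mathbf{u} + \mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ in $\mathbb{R}^2$ and use (a) to deduce a result about parallelograms.
65. (a) Prove that $\mathbf{u} + \mathbf{v}$ and $\mathbf{u} - \mathbf{v}$ are orthogonal in $\mathbb{R}^n$ if and only if $\|\mathbf{u}\| = \|\mathbf{v}\|$.
(b) Draw a diagram showing $\mathbf{u}$, $\mathbf{v}$, $\mathbf{u} + \mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ in $\mathbb{R}^2$ and use (a) to deduce a result about parallelograms.
66. If $\|\mathbf{u}\| = 2$, $\|\mathbf{v}\| = \sqrt3$, and $\mathbf{u}\cdot\mathbf{v} = 1$, find $\|\mathbf{u} + \mathbf{v}\|$.
67. Show that there are no vectors $\mathbf{u}$ and $\mathbf{v}$ such that $\|\mathbf{u}\| = 1$, $\|\mathbf{v}\| = 2$, and $\mathbf{u}\cdot\mathbf{v} = 3$.
68. (a) Prove that if $\mathbf{u}$ is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$, then $\mathbf{u}$ is orthogonal to $\mathbf{v} + \mathbf{w}$.
(b) Prove that if $\mathbf{u}$ is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$, then $\mathbf{u}$ is orthogonal to $s\mathbf{v} + t\mathbf{w}$ for all scalars $s$ and $t$.
69. Prove that $\mathbf{u}$ is orthogonal to $\mathbf{v} - \operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, where $\mathbf{u} \ne \mathbf{0}$.
70. (a) Prove that $\operatorname{proj}_{\mathbf{u}}(\operatorname{proj}_{\mathbf{u}}(\mathbf{v})) = \operatorname{proj}_{\mathbf{u}}(\mathbf{v})$.
(b) Prove that $\operatorname{proj}_{\mathbf{u}}(\mathbf{v} - \operatorname{proj}_{\mathbf{u}}(\mathbf{v})) = \mathbf{0}$.
(c) Explain (a) and (b) geometrically.
71. The Cauchy-Schwarz Inequality $|\mathbf{u}\cdot\mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$ is equivalent to the inequality we get by squaring both sides: $(\mathbf{u}\cdot\mathbf{v})^2 \le \|\mathbf{u}\|^2\|\mathbf{v}\|^2$.
(a) In $\mathbb{R}^2$, with $\mathbf{u} = \begin{bmatrix}u_1\\u_2\end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix}v_1\\v_2\end{bmatrix}$, this becomes
$$(u_1v_1 + u_2v_2)^2 \le (u_1^2 + u_2^2)(v_1^2 + v_2^2)$$
Prove this algebraically. [Hint: Subtract the left-hand side from the right-hand side and show that the difference must necessarily be nonnegative.]
(b) Prove the analogue of (a) in $\mathbb{R}^3$.
72. Figure 1.40 shows that, in $\mathbb{R}^2$ or $\mathbb{R}^3$, $\|\operatorname{proj}_{\mathbf{u}}(\mathbf{v})\| \le \|\mathbf{v}\|$.
(a) Prove that this inequality is true in general. [Hint: Prove that $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is orthogonal to $\mathbf{v} - \operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ and use Pythagoras' Theorem.]
(b) Prove that the inequality $\|\operatorname{proj}_{\mathbf{u}}(\mathbf{v})\| \le \|\mathbf{v}\|$ is equivalent to the Cauchy-Schwarz Inequality.
73. Use the fact that $\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = c\mathbf{u}$ for some scalar $c$, together with Figure 1.41, to find $c$ and thereby derive the formula for $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$.
74. Using mathematical induction, prove the following generalization of the Triangle Inequality:
$$\|\mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n\| \le \|\mathbf{v}_1\| + \|\mathbf{v}_2\| + \cdots + \|\mathbf{v}_n\|$$
for all $n \ge 1$.

Exploration: Vectors and Geometry

Many results in plane Euclidean geometry can be proved using vector techniques. For example, in Theorem 1.6, we used vectors to prove Pythagoras' Theorem. In this exploration, we will use vectors to develop proofs for some other theorems from Euclidean geometry.

As an introduction to the notation and the basic approach, consider the following easy example.

Example 1.25 Give a vector description of the midpoint $M$ of a line segment $AB$.

Solution We first convert everything to vector notation. If $O$ denotes the origin and $P$ is a point, let $\mathbf{p}$ be the vector $\overrightarrow{OP}$. In this situation, $\mathbf{a} = \overrightarrow{OA}$, $\mathbf{b} = \overrightarrow{OB}$, $\mathbf{m} = \overrightarrow{OM}$, and $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = \mathbf{b} - \mathbf{a}$ (Figure 1.42). Now, since $M$ is the midpoint of $AB$, we have
$$\mathbf{m} - \mathbf{a} = \overrightarrow{AM} = \tfrac12\overrightarrow{AB} = \tfrac12(\mathbf{b} - \mathbf{a})$$
so
$$\mathbf{m} = \mathbf{a} + \tfrac12(\mathbf{b} - \mathbf{a}) = \tfrac12(\mathbf{a} + \mathbf{b})$$

[Figure 1.42: The midpoint of $AB$]

1. Give a vector description of the point $P$ that is one-third of the way from $A$ to $B$ on the line segment $AB$. Generalize.
2. Prove that the line segment joining the midpoints of two sides of a triangle is parallel to the third side and half as long.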
(In vector notation, prove that $\overrightarrow{PQ} = \tfrac12\overrightarrow{AB}$ in Figure 1.43.)
3. Prove that the quadrilateral $PQRS$ (Figure 1.44), whose vertices are the midpoints of the sides of an arbitrary quadrilateral $ABCD$, is a parallelogram.
4. A median of a triangle is a line segment from a vertex to the midpoint of the opposite side (Figure 1.45). Prove that the three medians of any triangle are concurrent (i.e., they have a common point of intersection) at a point $G$ that is two-thirds of the distance from each vertex to the midpoint of the opposite side. [Hint: In Figure 1.46, show that the point that is two-thirds of the distance from $A$ to $P$ is given by $\tfrac13(\mathbf{a} + \mathbf{b} + \mathbf{c})$. Then show that $\tfrac13(\mathbf{a} + \mathbf{b} + \mathbf{c})$ is two-thirds of the distance from $B$ to $Q$ and two-thirds of the distance from $C$ to $R$.] The point $G$ in Figure 1.46 is called the centroid of the triangle.

[Figure 1.45: A median] [Figure 1.46: The centroid]

5. An altitude of a triangle is a line segment from a vertex that is perpendicular to the opposite side (Figure 1.47). Prove that the three altitudes of a triangle are concurrent. [Hint: Let $H$ be the point of intersection of the altitudes from $A$ and $B$ in Figure 1.48. Prove that $\overrightarrow{CH}$ is orthogonal to $\overrightarrow{AB}$.] The point $H$ in Figure 1.48 is called the orthocenter of the triangle.
6. A perpendicular bisector of a line segment is a line through the midpoint of the segment, perpendicular to the segment (Figure 1.49). Prove that the perpendicular bisectors of the three sides of a triangle are concurrent. [Hint: Let $K$ be the point of intersection of the perpendicular bisectors of $AC$ and $BC$ in Figure 1.50. Prove that $\overrightarrow{RK}$ is orthogonal to $\overrightarrow{AB}$.] The point $K$ in Figure 1.50 is called the circumcenter of the triangle.

[Figure 1.47: An altitude] [Figure 1.48: The orthocenter] [Figure 1.49: A perpendicular bisector]

7. Let $A$ and $B$ be the endpoints of a diameter of a circle. If $C$ is any point on the circle, prove that $\angle ACB$ is a right angle.
[Hint: In Figure 1.51, let $O$ be the center of the circle. Express everything in terms of $\mathbf{a}$ and $\mathbf{c}$ and show that $\overrightarrow{AC}$ is orthogonal to $\overrightarrow{BC}$.]
8. Prove that the line segments joining the midpoints of opposite sides of a quadrilateral bisect each other (Figure 1.52).

[Figure 1.50: The circumcenter]

1.3 Lines and Planes

We are all familiar with the equation of a line in the Cartesian plane. We now want to consider lines in $\mathbb{R}^2$ from a vector point of view. The insights we obtain from this approach will allow us to generalize to lines in $\mathbb{R}^3$ and then to planes in $\mathbb{R}^3$. Much of the linear algebra we will consider in later chapters has its origins in the simple geometry of lines and planes; the ability to visualize these and to think geometrically about a problem will serve you well.

Lines in $\mathbb{R}^2$ and $\mathbb{R}^3$

In the $xy$-plane, the general form of the equation of a line is $ax + by = c$. If $b \ne 0$, then the equation can be rewritten as $y = -(a/b)x + c/b$, which has the form $y = mx + k$. [This is the slope-intercept form; $m$ is the slope of the line, and the point with coordinates $(0, k)$ is its $y$-intercept.] To get vectors into the picture, let's consider an example.

Example 1.26 The line $\ell$ with equation $2x + y = 0$ is shown in Figure 1.53. It is a line with slope $-2$ passing through the origin. The left-hand side of the equation is in the form of a dot product; in fact, if we let $\mathbf{n} = \begin{bmatrix}2\\1\end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix}x\\y\end{bmatrix}$, then the equation becomes $\mathbf{n}\cdot\mathbf{x} = 0$. The vector $\mathbf{n}$ is perpendicular to the line, that is, it is orthogonal to any vector $\mathbf{x}$ that is parallel to the line (Figure 1.54), and it is called a normal vector to the line. The equation $\mathbf{n}\cdot\mathbf{x} = 0$ is the normal form of the equation of $\ell$.

Another way to think about this line is to imagine a particle moving along the line.
Suppose the particle is initially at the origin at time $t = 0$ and it moves along the line in such a way that its $x$-coordinate changes 1 unit per second. Then at $t = 1$ the particle is at $(1, -2)$, at $t = 1.5$ it is at $(1.5, -3)$, and, if we allow negative values of $t$ (i.e., we consider where the particle was in the past), at $t = -2$ it is (or was) at $(-2, 4)$. This movement is illustrated in Figure 1.55. In general, if $x = t$, then $y = -2t$, and we may write this relationship in vector form as
$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}t\\-2t\end{bmatrix} = t\begin{bmatrix}1\\-2\end{bmatrix}$$
What is the significance of the vector $\mathbf{d} = \begin{bmatrix}1\\-2\end{bmatrix}$? It is a particular vector parallel to $\ell$, called a direction vector for the line. As shown in Figure 1.56, we may write the equation of $\ell$ as $\mathbf{x} = t\mathbf{d}$. This is the vector form of the equation of the line.

[Figure 1.53: The line $2x + y = 0$] [Figure 1.54: A normal vector $\mathbf{n}$] [Figure 1.56: A direction vector $\mathbf{d}$]

The Latin word norma refers to a carpenter's square, used for drawing right angles. Thus, a normal vector is one that is perpendicular to something else, usually a plane.

If the line does not pass through the origin, then we must modify things slightly.

Example 1.27 Consider the line $\ell$ with equation $2x + y = 5$ (Figure 1.57). This is just the line from Example 1.26 shifted upward 5 units. It also has slope $-2$, but its $y$-intercept is the point $(0, 5)$. It is clear that the vectors $\mathbf{d}$ and $\mathbf{n}$ from Example 1.26 are, respectively, a direction vector and a normal vector for this line too. Thus, $\mathbf{n}$ is orthogonal to every vector that is parallel to $\ell$. The point $P = (1, 3)$ is on $\ell$. If $X = (x, y)$ represents a general point on $\ell$, then the vector $\overrightarrow{PX} = \mathbf{x} - \mathbf{p}$ is parallel to $\ell$ and $\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0$ (see Figure 1.58). Simplified, we have $\mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$.

As a check, we compute
$$\mathbf{n}\cdot\mathbf{x} = \begin{bmatrix}2\\1\end{bmatrix}\cdot\begin{bmatrix}x\\y\end{bmatrix} = 2x + y \quad\text{and}\quad \mathbf{n}\cdot\mathbf{p} = \begin{bmatrix}2\\1\end{bmatrix}\cdot\begin{bmatrix}1\\3\end{bmatrix} = 5$$
Thus, the normal form $\mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$ is just a different representation of the general form of the equation of the line.
(Note that in Example 1.26, $\mathbf{p}$ was the zero vector, so $\mathbf{n}\cdot\mathbf{p} = 0$ gave the right-hand side of the equation.)

[Figure 1.57: The line $2x + y = 5$] [Figure 1.58: $\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0$]

These results lead to the following definition.

Definition The normal form of the equation of a line $\ell$ in $\mathbb{R}^2$ is
$$\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0 \quad\text{or}\quad \mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$$
where $\mathbf{p}$ is a specific point on $\ell$ and $\mathbf{n} \ne \mathbf{0}$ is a normal vector for $\ell$.
The general form of the equation of $\ell$ is $ax + by = c$, where $\mathbf{n} = \begin{bmatrix}a\\b\end{bmatrix}$ is a normal vector for $\ell$.

Continuing with Example 1.27, let us now find the vector form of the equation of $\ell$. Note that, for each choice of $X$, $\mathbf{x} - \mathbf{p}$ must be parallel to, and thus a multiple of, the direction vector $\mathbf{d}$. That is, $\mathbf{x} - \mathbf{p} = t\mathbf{d}$ or $\mathbf{x} = \mathbf{p} + t\mathbf{d}$ for some scalar $t$. In terms of components, we have
$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\3\end{bmatrix} + t\begin{bmatrix}1\\-2\end{bmatrix} \qquad (1)$$
or
$$x = 1 + t, \quad y = 3 - 2t \qquad (2)$$
Equation (1) is the vector form of the equation of $\ell$, and the componentwise Equations (2) are called parametric equations of the line. The variable $t$ is called a parameter.

The word parameter and the corresponding adjective parametric come from the Greek words para, meaning "alongside," and metron, meaning "measure." Mathematically speaking, a parameter is a variable in terms of which other variables are expressed: a new "measure" placed alongside old ones.

How does all of this generalize to $\mathbb{R}^3$? Observe that the vector and parametric forms of the equations of a line carry over perfectly. The notion of the slope of a line in $\mathbb{R}^2$, which is difficult to generalize to three dimensions, is replaced by the more convenient notion of a direction vector, leading to the following definition.

Definition The vector form of the equation of a line $\ell$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ is
$$\mathbf{x} = \mathbf{p} + t\mathbf{d}$$
where $\mathbf{p}$ is a specific point on $\ell$ and $\mathbf{d} \ne \mathbf{0}$ is a direction vector for $\ell$.
The equations corresponding to the components of the vector form of the equation are called parametric equations of $\ell$.
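The equivalence of the normal form and the vector form can be seen numerically. The sketch below, with hypothetical helper names and plain Python lists for vectors, generates points $\mathbf{x} = \mathbf{p} + t\mathbf{d}$ on the line of Example 1.27 and confirms that each one satisfies the normal form $\mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$, that is, $2x + y = 5$. Integer parameter values keep every comparison exact.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def line_point(p, d, t):
    """Point x = p + t d on the line through p with direction vector d."""
    return [pi + t * di for pi, di in zip(p, d)]


# Example 1.27: normal vector n, point P = (1, 3), direction vector d
n = [2, 1]
p = [1, 3]
d = [1, -2]

# The direction vector is orthogonal to the normal vector.
print(dot(n, d))  # 0

# Vector form (1), equivalently parametric form (2): x = 1 + t, y = 3 - 2t
for t in [-2, -1, 0, 1, 2]:
    x = line_point(p, d, t)
    # Every generated point satisfies the normal form n . x = n . p = 5.
    assert dot(n, x) == dot(n, p) == 5
    print(t, x)
```

Because `line_point` works componentwise, the same helper applies unchanged to lines in $\mathbb{R}^3$, which is exactly why the vector form carries over to three dimensions while the slope does not.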
We will often abbreviate this terminology slightly, referring simply to the general, normal, vector, and parametric equations of a line or plane.

Example 1.28 Find vector and parametric equations of the line in $\mathbb{R}^3$ through the point $P = (1, 2, -1)$, parallel to the vector $\mathbf{d} = \begin{bmatrix}5\\-1\\3\end{bmatrix}$.

Solution The vector equation $\mathbf{x} = \mathbf{p} + t\mathbf{d}$ is
$$\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\2\\-1\end{bmatrix} + t\begin{bmatrix}5\\-1\\3\end{bmatrix}$$
The parametric form is
$$x = 1 + 5t, \quad y = 2 - t, \quad z = -1 + 3t$$

Remarks
• The vector and parametric forms of the equation of a given line $\ell$ are not unique; in fact, there are infinitely many, since we may use any point on $\ell$ to determine $\mathbf{p}$ and any direction vector for $\ell$. However, all direction vectors are clearly multiples of each other.
In Example 1.28, $(6, 1, 2)$ is another point on the line (take $t = 1$), and $\begin{bmatrix}10\\-2\\6\end{bmatrix}$ is another direction vector. Therefore,
$$\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}6\\1\\2\end{bmatrix} + s\begin{bmatrix}10\\-2\\6\end{bmatrix}$$
gives a different (but equivalent) vector equation for the line. The relationship between the two parameters $s$ and $t$ can be found by comparing the parametric equations: For a given point $(x, y, z)$ on $\ell$, we have
$$\begin{aligned} x &= 1 + 5t = 6 + 10s \\ y &= 2 - t = 1 - 2s \\ z &= -1 + 3t = 2 + 6s \end{aligned}$$
implying that
$$\begin{aligned} -10s + 5t &= 5 \\ 2s - t &= -1 \\ -6s + 3t &= 3 \end{aligned}$$
Each of these equations reduces to $t = 1 + 2s$.
• Intuitively, we know that a line is a one-dimensional object. The idea of "dimension" will be clarified in Chapters 3 and 6, but for the moment observe that this idea appears to agree with the fact that the vector form of the equation of a line requires one parameter.

Example 1.29 One often hears the expression "two points determine a line." Find a vector equation of the line $\ell$ in $\mathbb{R}^3$ determined by the points $P = (-1, 5, 0)$ and $Q = (2, 1, 1)$.

Solution We may choose any point on $\ell$ for $\mathbf{p}$, so we will use $P$ ($Q$ would also be fine). A convenient direction vector is $\mathbf{d} = \overrightarrow{PQ} = \begin{bmatrix}3\\-4\\1\end{bmatrix}$ (or any scalar multiple of this). Thus, we obtain
$$\mathbf{x} = \mathbf{p} + t\mathbf{d} = \begin{bmatrix}-1\\5\\0\end{bmatrix} + t\begin{bmatrix}3\\-4\\1\end{bmatrix}$$

Planes in $\mathbb{R}^3$

[Figure 1.59: $\mathbf{n}$ is orthogonal to infinitely many vectors] [Figure 1.60: $\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0$]
The next question we should ask ourselves is: How does the general form of the equation of a line generalize to $\mathbb{R}^3$? We might reasonably guess that if $ax + by = c$ is the general form of the equation of a line in $\mathbb{R}^2$, then $ax + by + cz = d$ might represent a line in $\mathbb{R}^3$. In normal form, this equation would be $\mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$, where $\mathbf{n}$ is a normal vector to the line and $\mathbf{p}$ corresponds to a point on the line.

To see if this is a reasonable hypothesis, let's think about the special case of the equation $ax + by + cz = 0$. In normal form, it becomes $\mathbf{n}\cdot\mathbf{x} = 0$, where $\mathbf{n} = \begin{bmatrix}a\\b\\c\end{bmatrix}$. However, the set of all vectors $\mathbf{x}$ that satisfy this equation is the set of all vectors orthogonal to $\mathbf{n}$. As shown in Figure 1.59, vectors in infinitely many directions have this property, determining a family of parallel planes. So our guess was incorrect: It appears that $ax + by + cz = d$ is the equation of a plane, not a line, in $\mathbb{R}^3$.

Let's make this finding more precise. Every plane $\mathcal{P}$ in $\mathbb{R}^3$ can be determined by specifying a point $\mathbf{p}$ on $\mathcal{P}$ and a nonzero vector $\mathbf{n}$ normal to $\mathcal{P}$ (Figure 1.60). Thus, if $\mathbf{x}$ represents an arbitrary point on $\mathcal{P}$, we have $\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0$ or $\mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$. If $\mathbf{n} = \begin{bmatrix}a\\b\\c\end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix}x\\y\\z\end{bmatrix}$, then, in terms of components, the equation becomes $ax + by + cz = d$ (where $d = \mathbf{n}\cdot\mathbf{p}$).

Definition The normal form of the equation of a plane $\mathcal{P}$ in $\mathbb{R}^3$ is
$$\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0 \quad\text{or}\quad \mathbf{n}\cdot\mathbf{x} = \mathbf{n}\cdot\mathbf{p}$$
where $\mathbf{p}$ is a specific point on $\mathcal{P}$ and $\mathbf{n} \ne \mathbf{0}$ is a normal vector for $\mathcal{P}$.
The general form of the equation of $\mathcal{P}$ is $ax + by + cz = d$, where $\mathbf{n} = \begin{bmatrix}a\\b\\c\end{bmatrix}$ is a normal vector for $\mathcal{P}$.

Note that any nonzero scalar multiple of a normal vector for a plane is another normal vector.

Example 1.30 Find the normal and general forms of the equation of the plane that contains the point $P = (6, 0, 1)$ and has normal vector $\mathbf{n} = \begin{bmatrix}1\\2\\3\end{bmatrix}$.

Solution With $\mathbf{p} = \begin{bmatrix}6\\0\\1\end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix}x\\y\\z\end{bmatrix}$,
[H we havr n · p � 1 · 6 + 2 ' 0 � 3 · 1 � 9, 'o the normal equation n · x = n · p becomes the general equation x + 2y + 3z = 9 . .+ Geometrically, it is clear that parallel planes have the same normal vector(s) . Thus, their general equations have left-hand sides that are multiples of each other. So, for example, 2x + 4y + 6z = 10 is the general equation of a plane that is parallel to the plane in Example 1 .30, since we may rewrite the equation as x + 2y + 3z = 5-from which we see that the two planes have the same normal vector n. (Note that the planes do not coincide, since the right-hand sides of their equations are distinct.) We may also express the equation of a plane in vector or parametric form. To do so, we observe that a plane can also be determined by specifying one of its points P (by the vector p ) and two direction vectors u and v parallel to the plane (but not parallel to each other). As Figure 1 .6 1 shows, given any point X in the plane (located ,<= I x tv - p = s u + tv -x SU figure 1 . 6 1 X - p = SU + t v by x), we can always find appropriate multiples s u and tv of the direction vectors such that x - p = su + tv or x = p + s u + tv. If we write this equation componentwise, we obtain parametric equations for the plane. The vector form of the equation of a plane <lP in IR 3 is X = p + SU + tv where p i s a point o n <lP and u and v are direction vectors fo r <lP ( u and v are non zero and parallel to <lP , but not parallel to each other) . The equations corresponding to the components of the vector form of the equation are called parametric equations of <lP . Definition Chapter 40 1 Vectors Exa m p l e 1 . 3 1 Find vector and parametric equations for the plane in Example 1 .30. We need to find two direction vectors. We have one point P = (6, 0, 1 ) in the plane; if we can find two other points Q and R in <!J', then the vectors PQ and PR can serve as direction vectors (unless by bad luck they happen to be parallel!). 
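By trial and error, we observe that $Q = (9, 0, 0)$ and $R = (3, 3, 0)$ both satisfy the general equation $x + 2y + 3z = 9$ and so lie in the plane. Then we compute

$$\mathbf{u} = \overrightarrow{PQ} = \mathbf{q} - \mathbf{p} = \begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix} \quad \text{and} \quad \mathbf{v} = \overrightarrow{PR} = \mathbf{r} - \mathbf{p} = \begin{bmatrix} -3 \\ 3 \\ -1 \end{bmatrix}$$

which, since they are not scalar multiples of each other, will serve as direction vectors. Therefore, we have the vector equation of $\mathcal{P}$,

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 1 \end{bmatrix} + s\begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix} + t\begin{bmatrix} -3 \\ 3 \\ -1 \end{bmatrix}$$

and the corresponding parametric equations,

$$x = 6 + 3s - 3t, \qquad y = 3t, \qquad z = 1 - s - t$$

[What would have happened had we chosen $R = (0, 0, 3)$?]

Remarks

A plane is a two-dimensional object, and its equation, in vector or parametric form, requires two parameters.

As Figure 1.59 shows, given a point $P$ and a nonzero vector $\mathbf{n}$ in $\mathbb{R}^3$, there are infinitely many lines through $P$ with $\mathbf{n}$ as a normal vector. However, $P$ and two nonparallel normal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ do serve to locate a line $\ell$ uniquely, since $\ell$ must then be the line through $P$ that is perpendicular to the plane with equation $\mathbf{x} = \mathbf{p} + s\mathbf{n}_1 + t\mathbf{n}_2$ (Figure 1.62). Thus, a line in $\mathbb{R}^3$ can also be specified by a pair of equations

$$\begin{aligned} a_1x + b_1y + c_1z &= d_1 \\ a_2x + b_2y + c_2z &= d_2 \end{aligned}$$

one corresponding to each normal vector. But since these equations correspond to a pair of nonparallel planes (why nonparallel?), this is just the description of a line as the intersection of two nonparallel planes (Figure 1.63). Algebraically, the line consists of all points $(x, y, z)$ that simultaneously satisfy both equations. We will explore this concept further in Chapter 2 when we discuss the solution of systems of linear equations.

[Figure 1.62: Two normals determine a line]
[Figure 1.63: The intersection of two planes is a line]

Tables 1.2 and 1.3 summarize the information presented so far about the equations of lines and planes.

Table 1.2 Equations of Lines in $\mathbb{R}^2$

| Normal Form | General Form | Vector Form | Parametric Form |
|---|---|---|---|
| $\mathbf{n} \cdot \mathbf{x} = \mathbf{n} \cdot \mathbf{p}$ | $ax + by = c$ | $\mathbf{x} = \mathbf{p} + t\mathbf{d}$ | $x = p_1 + td_1$, $y = p_2 + td_2$ |

Table 1.3 Lines and Planes in $\mathbb{R}^3$

| | Normal Form | General Form | Vector Form | Parametric Form |
|---|---|---|---|---|
| Lines | $\mathbf{n}_1 \cdot \mathbf{x} = \mathbf{n}_1 \cdot \mathbf{p}_1$, $\mathbf{n}_2 \cdot \mathbf{x} = \mathbf{n}_2 \cdot \mathbf{p}_2$ | $a_1x + b_1y + c_1z = d_1$, $a_2x + b_2y + c_2z = d_2$ | $\mathbf{x} = \mathbf{p} + t\mathbf{d}$ | $x = p_1 + td_1$, $y = p_2 + td_2$, $z = p_3 + td_3$ |
| Planes | $\mathbf{n} \cdot \mathbf{x} = \mathbf{n} \cdot \mathbf{p}$ | $ax + by + cz = d$ | $\mathbf{x} = \mathbf{p} + s\mathbf{u} + t\mathbf{v}$ | $x = p_1 + su_1 + tv_1$, $y = p_2 + su_2 + tv_2$, $z = p_3 + su_3 + tv_3$ |

Observe once again that a single (general) equation describes a line in $\mathbb{R}^2$ but a plane in $\mathbb{R}^3$. [In higher dimensions, an object (line, plane, etc.) determined by a single equation of this type is usually called a hyperplane.] The relationship among the dimension of the object, the number of equations required, and the dimension of the space is given by the "balancing formula":

$$(\text{dimension of the object}) + (\text{number of general equations}) = \text{dimension of the space}$$

The higher the dimension of the object, the fewer equations it needs. For example, a plane in $\mathbb{R}^3$ is two-dimensional, requires one general equation, and lives in a three-dimensional space: $2 + 1 = 3$. A line in $\mathbb{R}^3$ is one-dimensional and so needs $3 - 1 = 2$ equations. Note that the dimension of the object also agrees with the number of parameters in its vector or parametric form. Notions of "dimension" will be clarified in Chapters 3 and 6, but for the time being, these intuitive observations will serve us well.

We can now find the distance from a point to a line or a plane by combining the results of Section 1.2 with the results from this section.

Example 1.32

Find the distance from the point $B = (1, 0, 2)$ to the line $\ell$ through the point $A = (3, 1, 1)$ with direction vector $\mathbf{d} = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$.

Solution As we have already determined, we need to calculate the length of $\overrightarrow{PB}$, where $P$ is the point on $\ell$ at the foot of the perpendicular from $B$. If we label $\mathbf{v} = \overrightarrow{AB}$, then $\overrightarrow{AP} = \operatorname{proj}_{\mathbf{d}}(\mathbf{v})$ and $\overrightarrow{PB} = \mathbf{v} - \operatorname{proj}_{\mathbf{d}}(\mathbf{v})$ (see Figure 1.64). We do the necessary calculations in several steps.

[Figure 1.64: $d(B, \ell) = \|\mathbf{v} - \operatorname{proj}_{\mathbf{d}}(\mathbf{v})\|$]

Step 1: $\mathbf{v} = \overrightarrow{AB} = \mathbf{b} - \mathbf{a} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} - \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ -1 \\ 1 \end{bmatrix}$

Step 2: The projection of $\mathbf{v}$ onto $\mathbf{d}$ is

$$\operatorname{proj}_{\mathbf{d}}(\mathbf{v}) = \left(\frac{\mathbf{d} \cdot \mathbf{v}}{\mathbf{d} \cdot \mathbf{d}}\right)\mathbf{d} = \left(\frac{(-1)(-2) + 1 \cdot (-1) + 0 \cdot 1}{(-1)^2 + 1^2 + 0^2}\right)\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} \\ \tfrac{1}{2} \\ 0 \end{bmatrix}$$

Step 3: The vector we want is

$$\mathbf{v} - \operatorname{proj}_{\mathbf{d}}(\mathbf{v}) = \begin{bmatrix} -2 \\ -1 \\ 1 \end{bmatrix} - \begin{bmatrix} -\tfrac{1}{2} \\ \tfrac{1}{2} \\ 0 \end{bmatrix} = \begin{bmatrix} -\tfrac{3}{2} \\ -\tfrac{3}{2} \\ 1 \end{bmatrix}$$

Step 4: The distance $d(B, \ell)$ from $B$ to $\ell$ is

$$\|\mathbf{v} - \operatorname{proj}_{\mathbf{d}}(\mathbf{v})\| = \left\|\begin{bmatrix} -\tfrac{3}{2} \\ -\tfrac{3}{2} \\ 1 \end{bmatrix}\right\|$$

Using Theorem 1.3(b) to simplify the calculation, we have

$$\|\mathbf{v} - \operatorname{proj}_{\mathbf{d}}(\mathbf{v})\| = \frac{1}{2}\left\|\begin{bmatrix} -3 \\ -3 \\ 2 \end{bmatrix}\right\| = \frac{1}{2}\sqrt{9 + 9 + 4} = \frac{1}{2}\sqrt{22}$$

Note In terms of our earlier notation, $d(B, \ell) = d(\mathbf{v}, \operatorname{proj}_{\mathbf{d}}(\mathbf{v}))$.

In the case where the line $\ell$ is in $\mathbb{R}^2$ and its equation has the general form $ax + by = c$, the distance $d(B, \ell)$ from $B = (x_0, y_0)$ is given by the formula

$$d(B, \ell) = \frac{|ax_0 + by_0 - c|}{\sqrt{a^2 + b^2}} \tag{3}$$

You are invited to prove this formula in Exercise 39.

Example 1.33

Find the distance from the point $B = (1, 0, 2)$ to the plane $\mathcal{P}$ whose general equation is $x + y - z = 1$.

Solution In this case, we need to calculate the length of $\overrightarrow{PB}$, where $P$ is the point on $\mathcal{P}$ at the foot of the perpendicular from $B$. As Figure 1.65 shows, if $A$ is any point on $\mathcal{P}$ and we situate the normal vector $\mathbf{n} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ of $\mathcal{P}$ so that its tail is at $A$, then we need to find the length of the projection of $\overrightarrow{AB}$ onto $\mathbf{n}$. Again we do the necessary calculations in steps.

[Figure 1.65: $d(B, \mathcal{P}) = \|\operatorname{proj}_{\mathbf{n}}(\overrightarrow{AB})\|$]

Step 1: By trial and error, we find any point whose coordinates satisfy the equation $x + y - z = 1$. $A = (1, 0, 0)$ will do.

Step 2: Set $\mathbf{v} = \overrightarrow{AB} = \mathbf{b} - \mathbf{a} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$

Step 3: The projection of $\mathbf{v}$ onto $\mathbf{n}$ is

$$\operatorname{proj}_{\mathbf{n}}(\mathbf{v}) = \left(\frac{\mathbf{n} \cdot \mathbf{v}}{\mathbf{n} \cdot \mathbf{n}}\right)\mathbf{n} = \left(\frac{1 \cdot 0 + 1 \cdot 0 - 1 \cdot 2}{1 + 1 + (-1)^2}\right)\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = -\frac{2}{3}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{3} \\ -\tfrac{2}{3} \\ \tfrac{2}{3} \end{bmatrix}$$

Step 4: The distance $d(B, \mathcal{P})$ from $B$ to $\mathcal{P}$ is

$$\|\operatorname{proj}_{\mathbf{n}}(\mathbf{v})\| = \left|-\frac{2}{3}\right| \left\|\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}\right\| = \frac{2}{3}\sqrt{3}$$

In general, the distance $d(B, \mathcal{P})$ from the point $B = (x_0, y_0, z_0)$ to the plane whose general equation is $ax + by + cz = d$ is given by the formula

$$d(B, \mathcal{P}) = \frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}} \tag{4}$$

You will be asked to derive this formula in Exercise 40.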
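The step-by-step calculations of Examples 1.32 and 1.33 translate directly into a few lines of code. The sketch below is a minimal illustration in Python using only the standard library; it assumes vectors are plain lists of numbers, and the helper names (`sub`, `proj`, `dist_point_line`, `dist_point_plane`, and so on) are hypothetical rather than anything defined in the text. `Fraction` keeps the projection coefficient exact, so the intermediate vectors match the hand computation.

```python
from fractions import Fraction
import math


def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def scale(c, v):
    """Scalar multiple c * v."""
    return [c * a for a in v]


def norm(v):
    """Length ||v||, returned as a float."""
    return math.sqrt(dot(v, v))


def proj(d, v):
    """Projection of v onto the nonzero vector d: ((d . v) / (d . d)) d.
    The coefficient is a Fraction, so integer input gives exact output."""
    return scale(Fraction(dot(d, v), dot(d, d)), d)


def dist_point_line(b, a, d):
    """Distance from point B to the line through A with direction d (Example 1.32)."""
    v = sub(b, a)              # Step 1: v = AB
    p = proj(d, v)             # Step 2: proj_d(v)
    return norm(sub(v, p))     # Steps 3 and 4: ||v - proj_d(v)||


def dist_point_plane(b, n, rhs):
    """Distance from point B to the plane n . x = rhs, via formula (4)."""
    return abs(dot(n, b) - rhs) / norm(n)


print(proj([-1, 1, 0], [-2, -1, 1]))  # [Fraction(-1, 2), Fraction(1, 2), Fraction(0, 1)]
print(round(dist_point_line([1, 0, 2], [3, 1, 1], [-1, 1, 0]), 4))  # 2.3452
print(round(dist_point_plane([1, 0, 2], [1, 1, -1], 1), 4))         # 1.1547
```

The two distances are $\tfrac{1}{2}\sqrt{22} \approx 2.3452$ and $\tfrac{2}{3}\sqrt{3} \approx 1.1547$, in agreement with Examples 1.32 and 1.33. Using formula (4) for the plane avoids the trial-and-error search for a point $A$ in Step 1; the projection route gives the same value for any choice of $A$ on the plane.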
Exercises 1.3

In Exercises 1 and 2, write the equation of the line passing through $P$ with normal vector $\mathbf{n}$ in (a) normal form and (b) general form.

1. $P = (0, 0)$, $\mathbf{n} = [\ldots]$
2. $P = (1, 2)$, $\mathbf{n} = [\ldots]$

In Exercises 3-6, write the equation of the line passing through $P$ with direction vector $\mathbf{d}$ in (a) vector form and (b) parametric form.

3. $P = (1, 0)$, $\mathbf{d} = [\ldots]$
4. $P = (-4, 4)$, $\mathbf{d} = [\ldots]$
5. $P = (0, 0, 0)$, $\mathbf{d} = [\ldots]$
6. $P = (3, 0, 2)$, $\mathbf{d} = [\ldots]$

In Exercises 7 and 8, write the equation of the plane passing through $P$ with normal vector $\mathbf{n}$ in (a) normal form and (b) general form.

7. $P = (0, 1, 0)$, $\mathbf{n} = [\ldots]$
8. $P = (3, 0, -2)$, $\mathbf{n} = [\ldots]$

In Exercises 9 and 10, write the equation of the plane passing through $P$ with direction vectors $\mathbf{u}$ and $\mathbf{v}$ in (a) vector form and (b) parametric form.

9. $P = (0, 0, 0)$, $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$
10. $P = (6, -4, -3)$, $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$

In Exercises 11 and 12, give the vector equation of the line passing through $P$ and $Q$.

11. $P = (1, -2)$, $Q = (3, 0)$
12. $P = (0, 1, -1)$, $Q = (-2, 1, 3)$

In Exercises 13 and 14, give the vector equation of the plane passing through $P$, $Q$, and $R$.

13. $P = (1, 1, 1)$, $Q = (4, 0, 2)$, $R = (0, 1, -1)$
14. $P = (1, 1, 0)$, $Q = (1, 0, 1)$, $R = (0, 1, 1)$

15. Find parametric equations and an equation in vector form for the lines in $\mathbb{R}^2$ with the following equations:
(a) $y = 3x - 1$ (b) $3x + 2y = 5$

16. Consider the vector equation $\mathbf{x} = \mathbf{p} + t(\mathbf{q} - \mathbf{p})$, where $\mathbf{p}$ and $\mathbf{q}$ correspond to distinct points $P$ and $Q$ in $\mathbb{R}^2$ or $\mathbb{R}^3$.
(a) Show that this equation describes the line segment $\overline{PQ}$ as $t$ varies from 0 to 1.
(b) For which value of $t$ is $\mathbf{x}$ the midpoint of $\overline{PQ}$, and what is $\mathbf{x}$ in this case?
(c) Find the midpoint of $\overline{PQ}$ when $P = (2, -3)$ and $Q = (0, 1)$.
(d) Find the midpoint of $\overline{PQ}$ when $P = (1, 0, 1)$ and $Q = (4, 1, -2)$.
(e) Find the two points that divide $\overline{PQ}$ in part (c) into three equal parts.
(f) Find the two points that divide $\overline{PQ}$ in part (d) into three equal parts.

17. Suggest a "vector proof" of the fact that, in $\mathbb{R}^2$, two lines with slopes $m_1$ and $m_2$ are perpendicular if and only if $m_1 m_2 = -1$.

18. The line $\ell$ passes through the point $P = (1, -1, 1)$ and has direction vector $\mathbf{d} = [\ldots]$. For each of the following planes $\mathcal{P}$, determine whether $\ell$ and $\mathcal{P}$ are parallel, perpendicular, or neither:
(a) $2x + 3y - z = 1$ (b) $4x - y + 5z = 0$ (c) $x - y - z = 3$ (d) $4x + 6y - 2z = 0$

19. The plane $\mathcal{P}_1$ has the equation $4x - y + 5z = 2$. For each of the planes $\mathcal{P}$ in Exercise 18, determine whether $\mathcal{P}_1$ and $\mathcal{P}$ are parallel, perpendicular, or neither.

20. Find the vector form of the equation of the line in $\mathbb{R}^2$ that passes through $P = (2, -1)$ and is perpendicular to the line with general equation $2x - 3y = 1$.

21. Find the vector form of the equation of the line in $\mathbb{R}^2$ that passes through $P = (2, -1)$ and is parallel to the line with general equation $2x - 3y = 1$.

22. Find the vector form of the equation of the line in $\mathbb{R}^3$ that passes through $P = (-1, 0, 3)$ and is perpendicular to the plane with general equation $x - 3y + 2z = 5$.

23. Find the vector form of the equation of the line in $\mathbb{R}^3$ that passes through $P = (-1, 0, 3)$ and is parallel to the line with parametric equations $x = 1 - t$, $y = 2 + 3t$, $z = -2 - t$.

24. Find the normal form of the equation of the plane that passes through $P = (0, -2, 5)$ and is parallel to the plane with general equation $6x - y + 2z = 3$.

25. A cube has vertices at the eight points $(x, y, z)$, where each of $x$, $y$, and $z$ is either 0 or 1. (See Figure 1.34.)
(a) Find the general equations of the planes that determine the six faces (sides) of the cube.
(b) Find the general equation of the plane that contains the diagonal from the origin to $(1, 1, 1)$ and is perpendicular to the $xy$-plane.
(c) Find the general equation of the plane that contains the side diagonals referred to in Example 1.22.

26. Find the equation of the set of all points that are equidistant from the points $P = (1, 0, -2)$ and $Q = (5, 2, 4)$.

In Exercises 27 and 28, find the distance from the point $Q$ to the line $\ell$.

27. $Q = (2, 2)$, $\ell$ with equation $\mathbf{x} = [\ldots] + t[\ldots]$
28. $Q = (0, 1, 0)$, $\ell$ with equation $\mathbf{x} = [\ldots] + t[\ldots]$

In Exercises 29 and 30, find the distance from the point $Q$ to the plane $\mathcal{P}$.

29. $Q = (2, 2, 2)$, $\mathcal{P}$ with equation $x + y - z = 0$
30. $Q = (0, 0, 0)$, $\mathcal{P}$ with equation $x - 2y + 2z = 1$

Figure 1.66 suggests a way to use vectors to locate the point $R$ on $\ell$ that is closest to $Q$.

[Figure 1.66: $\mathbf{r} = \mathbf{p} + \overrightarrow{PR}$]

31. Find the point $R$ on $\ell$ that is closest to $Q$ in Exercise 27.
32. Find the point $R$ on $\ell$ that is closest to $Q$ in Exercise 28.

Figure 1.67 suggests a way to use vectors to locate the point $R$ on $\mathcal{P}$ that is closest to $Q$.

[Figure 1.67: $\mathbf{r} = \mathbf{p} + \overrightarrow{PQ} + \overrightarrow{QR}$]

33. Find the point $R$ on $\mathcal{P}$ that is closest to $Q$ in Exercise 29.
34. Find the point $R$ on $\mathcal{P}$ that is closest to $Q$ in Exercise 30.

In Exercises 35 and 36, find the distance between the parallel lines.

35. $\mathbf{x} = [\ldots] + s[\ldots]$ and $\mathbf{x} = [\ldots] + t[\ldots]$
36. $\mathbf{x} = [\ldots] + s[\ldots]$ and $\mathbf{x} = [\ldots] + t[\ldots]$

In Exercises 37 and 38, find the distance between the parallel planes.

37. $2x + y - 2z = 0$ and $2x + y - 2z = 5$
38. $x + y + z = 1$ and $x + y + z = 3$

39. Prove Equation (3) on page 43.
40. Prove Equation (4) on page 44.
41. Prove that, in $\mathbb{R}^2$, the distance between parallel lines with equations $\mathbf{n} \cdot \mathbf{x} = c_1$ and $\mathbf{n} \cdot \mathbf{x} = c_2$ is given by $\dfrac{|c_1 - c_2|}{\|\mathbf{n}\|}$.
42. Prove that the distance between parallel planes with equations $\mathbf{n} \cdot \mathbf{x} = d_1$ and $\mathbf{n} \cdot \mathbf{x} = d_2$ is given by $\dfrac{|d_1 - d_2|}{\|\mathbf{n}\|}$.

If two nonparallel planes $\mathcal{P}_1$ and $\mathcal{P}_2$ have normal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ and $\theta$ is the angle between $\mathbf{n}_1$ and $\mathbf{n}_2$, then we define the angle between $\mathcal{P}_1$ and $\mathcal{P}_2$ to be either $\theta$ or $180^\circ - \theta$, whichever is an acute angle (Figure 1.68).

[Figure 1.68: The angle between two planes, $\theta$ or $180^\circ - \theta$]

In Exercises 43-44, find the acute angle between the planes with the given equations.

43. $x + y + z = 0$ and $2x + y - 2z = 0$
44. $3x - y + 2z = 5$ and $x + 4y - z = 2$

In Exercises 45-46, show that the plane and line with the given equations intersect, and then find the acute angle of intersection between them.

45. The plane given by $x + y + 2z = 0$ and the line given by $x = 2 + t$, $y = 1 - 2t$, $z = 3 + t$
46. The plane given by $4x - y - z = 6$ and the line given by $x = t$, $y = 1 + 2t$, $z = 2 + 3t$

Exercises 47-48 explore one approach to the problem of finding the projection of a vector onto a plane. As Figure 1.69 shows, if $\mathcal{P}$ is a plane through the origin in $\mathbb{R}^3$ with normal vector $\mathbf{n}$, and $\mathbf{v}$ is a vector in $\mathbb{R}^3$, then $\mathbf{p} = \operatorname{proj}_{\mathcal{P}}(\mathbf{v})$ is a vector in $\mathcal{P}$ such that $\mathbf{v} - c\mathbf{n} = \mathbf{p}$ for some scalar $c$.

[Figure 1.69: Projection onto a plane]

47. Using the fact that $\mathbf{n}$ is orthogonal to every vector in $\mathcal{P}$ (and hence to $\mathbf{p}$), solve for $c$ and thereby find an expression for $\mathbf{p}$ in terms of $\mathbf{v}$ and $\mathbf{n}$.
48. Use the method of Exercise 47 to find the projection of $\mathbf{v} = [\ldots]$ onto the planes with the following equations:
(a) $x + y + z = 0$ (b) $3x - y + z = 0$ (c) $x - 2z = 0$ (d) $2x - 3y + z = 0$

Exploration: The Cross Product

It would be convenient if we could easily convert the vector form $\mathbf{x} = \mathbf{p} + s\mathbf{u} + t\mathbf{v}$ of the equation of a plane to the normal form $\mathbf{n} \cdot \mathbf{x} = \mathbf{n} \cdot \mathbf{p}$. What we need is a process that, given two nonparallel vectors $\mathbf{u}$ and $\mathbf{v}$, produces a third vector $\mathbf{n}$ that is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$. One approach is to use a construction known as the cross product of vectors. Only valid in $\mathbb{R}^3$, it is defined as follows:

Definition The cross product of $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ is the vector $\mathbf{u} \times \mathbf{v}$ defined by

$$\mathbf{u} \times \mathbf{v} = \begin{bmatrix} u_2v_3 - u_3v_2 \\ u_3v_1 - u_1v_3 \\ u_1v_2 - u_2v_1 \end{bmatrix}$$

A shortcut that can help you remember how to calculate the cross product of two vectors is illustrated below.
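Under each complete vector, write the first two components of that vector. Ignoring the two components on the top line, consider each block of four: subtract the products of the components connected by dashed lines from the products of the components connected by solid lines. (It helps to notice that the first component of $\mathbf{u} \times \mathbf{v}$ has no 1s as subscripts, the second has no 2s, and the third has no 3s.)

$$\begin{array}{cc} u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \\ u_1 & v_1 \\ u_2 & v_2 \end{array} \qquad \begin{array}{l} u_2v_3 - u_3v_2 \\ u_3v_1 - u_1v_3 \\ u_1v_2 - u_2v_1 \end{array}$$

In this layout, the block formed by rows 2 and 3 gives the first component, rows 3 and 4 give the second, and rows 4 and 5 give the third; in each block, the solid-line product runs from upper left to lower right and the dashed-line product from upper right to lower left.

The following problems briefly explore the cross product.

1. Compute $\mathbf{u} \times \mathbf{v}$.

[Figure 1.70: $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} \times \mathbf{v}$]

2. Show that $\mathbf{e}_1 \times \mathbf{e}_2 = \mathbf{e}_3$, $\mathbf{e}_2 \times \mathbf{e}_3 = \mathbf{e}_1$, and $\mathbf{e}_3 \times \mathbf{e}_1 = \mathbf{e}_2$.
3. Using the definition of a cross product, prove that $\mathbf{u} \times \mathbf{v}$ (as shown in Figure 1.70) is orthogonal to $\mathbf{u}$ and $\mathbf{v}$.
4. Use the cross product to help find the normal form of the equation of the plane.
(a) The plane passing through $P = (1, 0, -2)$, parallel to $\mathbf{u} = [\ldots]$ and $\mathbf{v} = [\ldots]$
(b) The plane passing through $P = (0, -1, 1)$, $Q = (2, 0, 2)$, and $R = (1, 2, -1)$
5. Prove the following properties of the cross product:
(a) $\mathbf{v} \times \mathbf{u} = -(\mathbf{u} \times \mathbf{v})$ (b) $\mathbf{u} \times \mathbf{0} = \mathbf{0}$ (c) $\mathbf{u} \times \mathbf{u} = \mathbf{0}$ (d) $\mathbf{u} \times k\mathbf{v} = k(\mathbf{u} \times \mathbf{v})$ (e) $\mathbf{u} \times k\mathbf{u} = \mathbf{0}$ (f) $\mathbf{u} \times (\mathbf{v} + \mathbf{w}) = \mathbf{u} \times \mathbf{v} + \mathbf{u} \times \mathbf{w}$
6. Prove the following properties of the cross product:
(a) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) \cdot \mathbf{w}$ (b) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{w}$ (c) $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$
7. Redo Problems 2 and 3, this time making use of Problems 5 and 6.
8. Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^3$ and let $\theta$ be the angle between $\mathbf{u}$ and $\mathbf{v}$.
(a) Prove that $\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\|\|\mathbf{v}\|\sin\theta$. [Hint: Use Problem 6(c).]
(b) Prove that the area $A$ of the triangle determined by $\mathbf{u}$ and $\mathbf{v}$ (as shown in Figure 1.71) is given by $A = \tfrac{1}{2}\|\mathbf{u} \times \mathbf{v}\|$.
(c) Use the result in part (b) to compute the area of the triangle with vertices $A = (1, 2, 1)$, $B = (2, 1, 0)$, and $C = (5, -1, 3)$.

[Figure 1.71: The triangle determined by $\mathbf{u}$ and $\mathbf{v}$]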
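The component definition of $\mathbf{u} \times \mathbf{v}$ is easy to turn into code, and doing so gives exactly the conversion from vector form to normal form that motivated this Exploration. The Python sketch below is illustrative only: the function names are hypothetical, vectors are assumed to be 3-element lists, and a zero cross product is treated as the signal that the direction vectors are parallel (for nonzero $\mathbf{u}$ and $\mathbf{v}$, $\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\|\|\mathbf{v}\|\sin\theta$ is zero exactly in that case).

```python
def cross(u, v):
    """Cross product u x v in R^3, computed from the component definition."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return [u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1]


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def normal_form(p, u, v):
    """Turn the vector form x = p + s*u + t*v into the pair (n, n . p),
    so that the plane is n . x = n . p with n = u x v."""
    n = cross(u, v)
    if not any(n):
        # u x v = 0: u and v are parallel (or one is zero), so no plane is determined.
        raise ValueError("direction vectors are parallel")
    return n, dot(n, p)


# Example 1.31: p = [6, 0, 1], u = [3, 0, -1], v = [-3, 3, -1]
n, rhs = normal_form([6, 0, 1], [3, 0, -1], [-3, 3, -1])
print(n, rhs)  # [3, 6, 9] 27
```

The result is the equation $3x + 6y + 9z = 27$; dividing by 3 gives $x + 2y + 3z = 9$, the general equation found in Example 1.30, so the two examples agree. The function returns an unnormalized normal vector; any nonzero multiple would serve equally well.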
Writing Project: The Origins of the Dot Product and Cross Product

The notations for dot and cross product that we use today were introduced in the late 19th century by Josiah Willard Gibbs, a professor of mathematical physics at Yale University. Edwin B. Wilson was a graduate student in Gibbs's class, and he later wrote up his class notes, expanded upon them, and had them published in 1901, with Gibbs's blessing, as Vector Analysis: A Text-Book for the Use of Students of Mathematics and Physics. However, the concepts of dot and cross product arose earlier and went by various other names and notations.

Write a report on the evolution of the names and notations for the dot product and cross product.

1. Florian Cajori, A History of Mathematical Notations (New York: Dover, 1993).
2. J. Willard Gibbs and Edwin Bidwell Wilson, Vector Analysis: A Text-Book for the Use of Students of Mathematics and Physics (New York: Charles Scribner's Sons, 1901). Available online at http://archive.org/details/117714283.
3. Ivor Grattan-Guinness, Companion Encyclopedia of the History and Philosophy of the Mathematical Sciences (London: Routledge, 2013).

1.4 Applications

Force Vectors

Force is defined as the product of mass and acceleration. The force of gravity on an object is its mass times the acceleration due to gravity, which on Earth is approximately 9.8 m/s². Thus, a 1 kg mass exerts a downward force of 1 kg × 9.8 m/s², or 9.8 kg·m/s². The unit kg·m/s² is called a newton (N), so the force exerted by a 1 kg mass is 9.8 N.

We can use vectors to model force. For example, a wind blowing at 30 km/h in a westerly direction or the Earth's gravity acting on a 1 kg mass with a force of 9.8 newtons downward are each best represented by vectors, since they each consist of a magnitude and a direction.

It is often the case that multiple forces act on an object.
In such situations, the net result of all the forces acting together is a single force called the resultant, which is simply the vector sum of the individual forces (Figure 1.72). When several forces act on an object, it is possible that the resultant force is zero. In this case, the forces cancel and the object does not accelerate in any direction (an object at rest stays at rest), and we say that it is in equilibrium. When an object is in equilibrium and the force vectors acting on it are arranged head-to-tail, the result is a closed polygon (Figure 1.73).

[Figure 1.72: The resultant of two forces]
[Figure 1.73: Equilibrium]

Example 1.34

Ann and Bert are trying to roll a rock out of the way. Ann pushes with a force of 20 N in a northerly direction while Bert pushes with a force of 40 N in an easterly direction.
(a) What is the resultant force on the rock?
(b) Carla is trying to prevent Ann and Bert from moving the rock. What force must Carla apply to keep the rock in equilibrium?

Solution (a) Figure 1.74 shows the position of the two forces. Using the parallelogram rule, we add the two forces to get the resultant $\mathbf{r}$ as shown. By Pythagoras' Theorem, we see that $\|\mathbf{r}\| = \sqrt{20^2 + 40^2} = \sqrt{2000} \approx 44.72$ N. For the direction of $\mathbf{r}$, we calculate the angle $\theta$ between $\mathbf{r}$ and Bert's easterly force. We find that $\sin\theta = 20/\|\mathbf{r}\| \approx 0.447$, so $\theta \approx 26.57^\circ$.

[Figure 1.74: The resultant of two forces]

(b) If we denote the forces exerted by Ann, Bert, and Carla by $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, respectively, then we require $\mathbf{a} + \mathbf{b} + \mathbf{c} = \mathbf{0}$. Therefore $\mathbf{c} = -(\mathbf{a} + \mathbf{b}) = -\mathbf{r}$, so Carla needs to exert a force of approximately 44.72 N in the direction opposite to $\mathbf{r}$.

Often, we are interested in decomposing a force vector into other vectors whose resultant is the given vector. This process is called resolving a vector into components. In two dimensions, we wish to resolve a vector into two components.
However, there are infinitely many ways to do this; the most useful will be to resolve the vector into two orthogonal components. (Chapters 5 and 7 explore this idea more generally.) This is usually done by introducing coordinate axes and by choosing the components so that one is parallel to the $x$-axis and the other to the $y$-axis. These components are usually referred to as the horizontal and vertical components, respectively. In Figure 1.75, $\mathbf{f}$ is the given vector and $\mathbf{f}_x$ and $\mathbf{f}_y$ are its horizontal and vertical components.

[Figure 1.75: Resolving a vector into components]

Example 1.35

Ann pulls on the handle of a wagon with a force of 100 N. If the handle makes an angle of 20° with the horizontal, what is the force that tends to pull the wagon forward and what force tends to lift it off the ground?

Solution Figure 1.76 shows the situation and the vector diagram that we need to consider. We see that

$$\|\mathbf{f}_x\| = \|\mathbf{f}\|\cos 20^\circ \quad \text{and} \quad \|\mathbf{f}_y\| = \|\mathbf{f}\|\sin 20^\circ$$

Thus, $\|\mathbf{f}_x\| \approx 100(0.9397) = 93.97$ and $\|\mathbf{f}_y\| \approx 100(0.3420) = 34.20$. So the wagon is pulled forward with a force of approximately 93.97 N and it tends to lift off the ground with a force of approximately 34.20 N.

[Figure 1.76]

We solve the next example using two different methods. The first solution considers a triangle of forces in equilibrium; the second solution uses resolution of forces into components.

Example 1.36

Figure 1.77 shows a painting that has been hung from the ceiling by two wires. If the painting has a mass of 5 kg and if the two wires make angles of 45 and 60 degrees with the ceiling, determine the tension in each wire.

[Figure 1.77]

Solution 1 We assume that the painting is in equilibrium. Then the two wires must supply enough upward force to balance the downward force of gravity. Gravity exerts a downward force of $5 \times 9.8 = 49$ N on the painting, so the two wires must collectively pull upward with 49 N of force.
Let $\mathbf{f}_1$ and $\mathbf{f}_2$ denote the tensions in the wires and let $\mathbf{r}$ be their resultant (Figure 1.78). It follows that $\|\mathbf{r}\| = 49$ since we are in equilibrium. Using the law of sines, we have

$$\frac{\|\mathbf{f}_1\|}{\sin 45^\circ} = \frac{\|\mathbf{f}_2\|}{\sin 30^\circ} = \frac{\|\mathbf{r}\|}{\sin 105^\circ}$$

so

$$\|\mathbf{f}_1\| = \frac{\|\mathbf{r}\|\sin 45^\circ}{\sin 105^\circ} \approx \frac{49(0.7071)}{0.9659} \approx 35.87 \quad \text{and} \quad \|\mathbf{f}_2\| = \frac{\|\mathbf{r}\|\sin 30^\circ}{\sin 105^\circ} \approx \frac{49(0.5)}{0.9659} \approx 25.36$$

Therefore, the tensions in the wires are approximately 35.87 N and 25.36 N.

[Figure 1.78]

Solution 2 We resolve $\mathbf{f}_1$ and $\mathbf{f}_2$ into horizontal and vertical components, say, $\mathbf{f}_1 = \mathbf{h}_1 + \mathbf{v}_1$ and $\mathbf{f}_2 = \mathbf{h}_2 + \mathbf{v}_2$, and note that, as above, there is a downward force of 49 N (Figure 1.79). It follows that

$$\|\mathbf{h}_1\| = \|\mathbf{f}_1\|\cos 60^\circ = \frac{\|\mathbf{f}_1\|}{2}, \qquad \|\mathbf{v}_1\| = \|\mathbf{f}_1\|\sin 60^\circ = \frac{\sqrt{3}\,\|\mathbf{f}_1\|}{2}$$

$$\|\mathbf{h}_2\| = \|\mathbf{f}_2\|\cos 45^\circ = \frac{\|\mathbf{f}_2\|}{\sqrt{2}}, \qquad \|\mathbf{v}_2\| = \|\mathbf{f}_2\|\sin 45^\circ = \frac{\|\mathbf{f}_2\|}{\sqrt{2}}$$

Since the painting is in equilibrium, the horizontal components must balance, as must the vertical components. Therefore, $\|\mathbf{h}_1\| = \|\mathbf{h}_2\|$ and $\|\mathbf{v}_1\| + \|\mathbf{v}_2\| = 49$, from which it follows that

$$\|\mathbf{f}_1\| = \frac{2}{\sqrt{2}}\|\mathbf{f}_2\| = \sqrt{2}\,\|\mathbf{f}_2\| \quad \text{and} \quad \frac{\sqrt{3}}{2}\|\mathbf{f}_1\| + \frac{1}{\sqrt{2}}\|\mathbf{f}_2\| = 49$$

Substituting the first of these equations into the second equation yields

$$\frac{\sqrt{3}}{\sqrt{2}}\|\mathbf{f}_2\| + \frac{1}{\sqrt{2}}\|\mathbf{f}_2\| = 49, \quad \text{or} \quad \|\mathbf{f}_2\| = \frac{49\sqrt{2}}{1 + \sqrt{3}} \approx 25.36$$

Thus, $\|\mathbf{f}_1\| = \sqrt{2}\,\|\mathbf{f}_2\| \approx 35.87$, so the tensions in the wires are approximately 35.87 N and 25.36 N, as before.

[Figure 1.79]

Exercises 1.4

Force Vectors

In Exercises 1-6, determine the resultant of the given forces.

1. $\mathbf{f}_1$ acting due north with a magnitude of 12 N and $\mathbf{f}_2$ acting due east with a magnitude of 5 N
2. $\mathbf{f}_1$ acting due west with a magnitude of 15 N and $\mathbf{f}_2$ acting due south with a magnitude of 20 N
3. $\mathbf{f}_1$ acting with a magnitude of 8 N and $\mathbf{f}_2$ acting at an angle of 60° to $\mathbf{f}_1$ with a magnitude of 8 N
4. $\mathbf{f}_1$ acting with a magnitude of 4 N and $\mathbf{f}_2$ acting at an angle of 135° to $\mathbf{f}_1$ with a magnitude of 6 N
5. $\mathbf{f}_1$ acting due east with a magnitude of 2 N, $\mathbf{f}_2$ acting due west with a magnitude of 6 N, and $\mathbf{f}_3$ acting at an angle of 60° to $\mathbf{f}_1$ with a magnitude of 4 N
6. $\mathbf{f}_1$ acting due east with a magnitude of 10 N, $\mathbf{f}_2$ acting due north with a magnitude of 13 N, $\mathbf{f}_3$ acting due west with a magnitude of 5 N, and $\mathbf{f}_4$ acting due south with a magnitude of 8 N
7. Resolve a force of 10 N into two forces perpendicular to each other so that one component makes an angle of 60° with the 10 N force.
8. A 10 kg block lies on a ramp that is inclined at an angle of 30° (Figure 1.80). Assuming there is no friction, what force, parallel to the ramp, must be applied to keep the block from sliding down the ramp?

[Figure 1.80]

9. A tow truck is towing a car. The tension in the tow cable is 1500 N and the cable makes a 45° angle with the horizontal, as shown in Figure 1.81. What is the vertical force that tends to lift the car off the ground?

[Figure 1.81: Tow cable with $\|\mathbf{f}\| = 1500$ N at 45° to the horizontal]

10. A lawn mower has a mass of 30 kg. It is being pushed with a force of 100 N. If the handle of the lawn mower makes an angle of 45° with the ground, what is the horizontal component of the force that is causing the mower to move forward?
11. A sign hanging outside Joe's Diner has a mass of 50 kg (Figure 1.82). If the supporting cable makes an angle of 60° with the wall of the building, determine the tension in the cable.

[Figure 1.82]

12. A sign hanging in the window of Joe's Diner has a mass of 1 kg. If the supporting strings each make an angle of 45° with the sign and the supporting hooks are at the same height (Figure 1.83), find the tension in each string.

[Figure 1.83: Sign reading "OPEN FOR BUSINESS" hung by strings $\mathbf{f}_1$ and $\mathbf{f}_2$ at 45°]

13. A painting with a mass of 15 kg is suspended by two wires from hooks on a ceiling. If the wires have lengths of 15 cm and 20 cm and the distance between the hooks is 25 cm, find the tension in each wire.
14. A painting with a mass of 20 kg is suspended by two wires from a ceiling. If the wires make angles of 30° and 45° with the ceiling, find the tension in each wire.
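The computations in Examples 1.34-1.36 come down to adding vectors and solving a small linear system, so they are easy to check numerically. The Python sketch below is one way to do it under stated assumptions: forces are 2-tuples $(x, y)$ in newtons with east (or horizontal) as the positive $x$-direction and north (or up) as the positive $y$-direction, angles are in degrees, and the function names are hypothetical. For Example 1.36 it follows the component method of Solution 2, writing the two balance conditions as a 2 × 2 system in the unknown tensions and solving it with Cramer's rule, a technique this sketch brings in from outside the text.

```python
import math


def resultant(*forces):
    """Vector sum of 2-D forces given as (x, y) tuples."""
    return (sum(f[0] for f in forces), sum(f[1] for f in forces))


def components(magnitude, angle_deg):
    """Horizontal and vertical components of a force at angle_deg above the horizontal."""
    theta = math.radians(angle_deg)
    return magnitude * math.cos(theta), magnitude * math.sin(theta)


def two_wire_tensions(weight, alpha_deg, beta_deg):
    """Tensions (T1, T2) in two wires that make angles alpha and beta with a
    horizontal ceiling and together support the given weight.

    Balance of components (as in Solution 2 of Example 1.36):
        T1*cos(alpha) - T2*cos(beta) = 0         (horizontal)
        T1*sin(alpha) + T2*sin(beta) = weight    (vertical)
    Solved by Cramer's rule; the determinant equals sin(alpha + beta).
    """
    a, b = math.radians(alpha_deg), math.radians(beta_deg)
    det = math.cos(a) * math.sin(b) + math.sin(a) * math.cos(b)
    t1 = weight * math.cos(b) / det
    t2 = weight * math.cos(a) / det
    return t1, t2


# Example 1.34: Ann pushes 20 N north, Bert pushes 40 N east.
r = resultant((0, 20), (40, 0))
print(round(math.hypot(*r), 2), round(math.degrees(math.atan2(r[1], r[0])), 2))  # 44.72 26.57

# Example 1.35: 100 N at 20 degrees above the horizontal.
fx, fy = components(100, 20)
print(round(fx, 2), round(fy, 2))  # 93.97 34.2

# Example 1.36: 49 N painting, wires at 60 and 45 degrees to the ceiling.
t1, t2 = two_wire_tensions(49, 60, 45)
print(round(t1, 2), round(t2, 2))  # 35.87 25.36
```

Carla's balancing force in Example 1.34(b) is then simply `(-r[0], -r[1])`. Because $\cos 45^\circ = \sin 45^\circ$ and $\cos 60^\circ = \sin 30^\circ$, the formulas in `two_wire_tensions` reduce to the law-of-sines expressions of Solution 1, which is why both solutions give the same tensions.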
Chapter Review

Key Definitions and Concepts

algebraic properties of vectors, 10
angle between vectors, 24
binary vector, 13
Cauchy-Schwarz Inequality, 22
cross product, 48
direction vector, 35
distance between vectors, 23
dot product, 18
equation of a line, 36
equation of a plane, 38-39
head-to-tail rule, 6
integers modulo $m$ ($\mathbb{Z}_m$), 14-16
length (norm) of a vector, 20
linear combination of vectors, 12
$m$-ary vector, 16
normal vector, 34, 38
orthogonal vectors, 26
parallel vectors, 8
parallelogram rule, 6
projection of a vector onto a vector, 27
Pythagoras' Theorem, 26
scalar multiplication, 7
standard unit vectors, 22
Triangle Inequality, 22
unit vector, 21
vector, 3
vector addition, 5
zero vector, 4

Review Questions

1. Mark each of the following statements true or false:
(a) For vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^n$, if $\mathbf{u} + \mathbf{w} = \mathbf{v} + \mathbf{w}$, then $\mathbf{u} = \mathbf{v}$.
(b) For vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^n$, if $\mathbf{u} \cdot \mathbf{w} = \mathbf{v} \cdot \mathbf{w}$, then $\mathbf{u} = \mathbf{v}$.
(c) For vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^3$, if $\mathbf{u}$ is orthogonal to $\mathbf{v}$, and $\mathbf{v}$ is orthogonal to $\mathbf{w}$, then $\mathbf{u}$ is orthogonal to $\mathbf{w}$.
(d) In $\mathbb{R}^3$, if a line $\ell$ is parallel to a plane $\mathcal{P}$, then a direction vector $\mathbf{d}$ for $\ell$ is parallel to a normal vector $\mathbf{n}$ for $\mathcal{P}$.
(e) In $\mathbb{R}^3$, if a line $\ell$ is perpendicular to a plane $\mathcal{P}$, then a direction vector $\mathbf{d}$ for $\ell$ is parallel to a normal vector $\mathbf{n}$ for $\mathcal{P}$.
(f) In $\mathbb{R}^3$, if two planes are not parallel, then they must intersect in a line.
(g) In $\mathbb{R}^3$, if two lines are not parallel, then they must intersect in a point.
(h) If $\mathbf{v}$ is a binary vector such that $\mathbf{v} \cdot \mathbf{v} = 0$, then $\mathbf{v} = \mathbf{0}$.
(i) In $\mathbb{Z}_5$, if $ab = 0$ then either $a = 0$ or $b = 0$.
(j) In $\mathbb{Z}_6$, if $ab = 0$ then either $a = 0$ or $b = 0$.
2. If $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$, and the vector $4\mathbf{u} + \mathbf{v}$ is drawn with its tail at the point $(10, 10)$, find the coordinates of the point at the head of $4\mathbf{u} + \mathbf{v}$.
3. If $\mathbf{u} = [\ldots]$, $\mathbf{v} = [\ldots]$, and $2\mathbf{x} + \mathbf{u} = 3(\mathbf{x} - \mathbf{v})$, solve for $\mathbf{x}$.
4. Let $A$, $B$, $C$, and $D$ be the vertices of a square centered at the origin $O$, labeled in clockwise order. If $\mathbf{a} = \overrightarrow{OA}$ and $\mathbf{b} = \overrightarrow{OB}$, find $\overrightarrow{BC}$ in terms of $\mathbf{a}$ and $\mathbf{b}$.
5. Find the angle between the vectors $[-1, 1, 2]$ and $[2, 1, -1]$.
6. Find the projection of $\mathbf{v} = [\ldots]$ onto $\mathbf{u} = [\ldots]$.
7. Find a unit vector in the $xy$-plane that is orthogonal to $[\ldots]$.
8. Find the general equation of the plane through the point $(1, 1, 1)$ that is perpendicular to the line with parametric equations $x = 2 - t$, $y = 3 + 2t$, $z = -1 + t$.
9. Find the general equation of the plane through the point $(3, 2, 5)$ that is parallel to the plane whose general equation is $2x + 3y - z = 0$.
10. Find the general equation of the plane through the points $A(1, 1, 0)$, $B(1, 0, 1)$, and $C(0, 1, 2)$.
11. Find the area of the triangle with vertices $A(1, 1, 0)$, $B(1, 0, 1)$, and $C(0, 1, 2)$.
12. Find the midpoint of the line segment between $A = (5, 1, -2)$ and $B = (3, -7, 0)$.
13. Why are there no vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ such that $\|\mathbf{u}\| = 2$, $\|\mathbf{v}\| = 3$, and $\mathbf{u} \cdot \mathbf{v} = -7$?
14. Find the distance from the point $(3, 2, 5)$ to the plane whose general equation is $2x + 3y - z = 0$.
15. Find the distance from the point $(3, 2, 5)$ to the line with parametric equations $x = t$, $y = 1 + t$, $z = 2 + t$.
16. Compute $3 - (2 + 4)^3(4 + 3)^2$ in $\mathbb{Z}_5$.
17. If possible, solve $3(x + 2) = 5$ in $\mathbb{Z}_7$.
18. If possible, solve $3(x + 2) = 5$ in $\mathbb{Z}_9$.
19. Compute $[2, 1, 3, 3] \cdot [3, 4, 4, 2]$ in $\mathbb{Z}_5^4$.
20. Let $\mathbf{u} = [1, 1, 1, 0]$ in $\mathbb{Z}_2^4$. How many binary vectors $\mathbf{v}$ satisfy $\mathbf{u} \cdot \mathbf{v} = 0$?

Chapter 2 Systems of Linear Equations

The world was full of equations. . . . There must be an answer for everything, if only you knew how to set forth the questions.
Anne Tyler, The Accidental Tourist (Alfred A. Knopf, 1985), p. 235

2.0 Introduction: Triviality

The word trivial is derived from the Latin root tri ("three") and the Latin word via ("road"). Thus, speaking literally, a triviality is a place where three roads meet. This common meeting point gives rise to the other, more familiar meaning of trivial: commonplace, ordinary, or insignificant.
In medieval universities, the trivium consisted of the three "common" subjects (grammar, rhetoric, and logic) that were taught before the quadrivium (arithmetic, geometry, music, and astronomy). The "three roads" that made up the trivium were the beginning of the liberal arts.

In this section, we begin to examine systems of linear equations. The same system of equations can be viewed in three different, yet equally important, ways; these will be our three roads, all leading to the same solution. You will need to get used to this threefold way of viewing systems of linear equations, so that it becomes commonplace (trivial!) for you.

The system of equations we are going to consider is
$$\begin{aligned} 2x + y &= 8 \\ x - 3y &= -3 \end{aligned}$$

Problem 1 Draw the two lines represented by these equations. What is their point of intersection?

Problem 2 Consider the vectors $\mathbf{u} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$. Draw the coordinate grid determined by $\mathbf{u}$ and $\mathbf{v}$. [Hint: Lightly draw the standard coordinate grid first and use it as an aid in drawing the new one.]

Problem 3 On the $\mathbf{u}$-$\mathbf{v}$ grid, find the coordinates of $\mathbf{w} = \begin{bmatrix} 8 \\ -3 \end{bmatrix}$.

Problem 4 Another way to state Problem 3 is to ask for the coefficients $x$ and $y$ for which $x\mathbf{u} + y\mathbf{v} = \mathbf{w}$. Write out the two equations to which this vector equation is equivalent (one for each component). What do you observe?

Problem 5 Return now to the lines you drew for Problem 1. We will refer to the line whose equation is $2x + y = 8$ as line 1 and the line whose equation is $x - 3y = -3$ as line 2. Plot the point $(0, 0)$ on your graph from Problem 1 and label it $P_0$. Draw a horizontal line segment from $P_0$ to line 1 and label this new point $P_1$. Next draw a vertical line segment from $P_1$ to line 2 and label this point $P_2$. Now draw a horizontal line segment from $P_2$ to line 1, obtaining point $P_3$. Continue in this fashion, drawing vertical segments to line 2 followed by horizontal segments to line 1. What appears to be happening?

Problem 6 Using a calculator with two-decimal-place accuracy, find the (approximate) coordinates of the points $P_1, P_2, P_3, \dots, P_6$. (You will find it helpful to first solve the first equation for $x$ in terms of $y$ and the second equation for $y$ in terms of $x$.) Record your results in Table 2.1, writing the $x$- and $y$-coordinates of each point separately.

Table 2.1
| Point | $x$ | $y$ |
|---|---|---|
| $P_0$ | 0 | 0 |
| $P_1$ | | |
| $P_2$ | | |
| $P_3$ | | |
| $P_4$ | | |
| $P_5$ | | |
| $P_6$ | | |

The results of these problems show that the task of "solving" a system of linear equations may be viewed in several ways. Repeat the process described in the problems with the following systems of equations:
(a) $4x - 2y = 0$, $x + 2y = 5$
(b) $3x + 2y = 9$, $x + 3y = 10$
(c) $x + y = 5$, $x - y = 3$
(d) $x + 2y = 4$, $2x - y = 3$
Are all of your observations from Problems 1-6 still valid for these examples? Note any similarities or differences. In this chapter, we will explore these ideas in more detail.

2.1 Introduction to Systems of Linear Equations
Recall that the general equation of a line in $\mathbb{R}^2$ is of the form
$$ax + by = c$$
and that the general equation of a plane in $\mathbb{R}^3$ is of the form
$$ax + by + cz = d$$
Equations of this form are called linear equations.

Definition A linear equation in the $n$ variables $x_1, x_2, \dots, x_n$ is an equation that can be written in the form
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$$
where the coefficients $a_1, a_2, \dots, a_n$ and the constant term $b$ are constants.

Example 2.1 The following equations are linear:
$$3x - 4y = -1 \qquad r - \tfrac{1}{2}s - \tfrac{15}{3}t = 9 \qquad x_1 + 5x_2 = 3 - x_3 + 2x_4$$
$$\sqrt{2}\,x + \tfrac{\pi}{4}\,y - \left(\sin\tfrac{\pi}{5}\right)z = 1 \qquad 3.2x_1 - 0.01x_2 = 4.6$$
Observe that the third equation is linear because it can be rewritten in the form $x_1 + 5x_2 + x_3 - 2x_4 = 3$. It is also important to note that, although in these examples (and in most applications) the coefficients and constant terms are real numbers, in some examples and applications they will be complex numbers or members of $\mathbb{Z}_p$ for some prime number $p$.
The following equations are not linear:
$$xy + 2z = 1 \qquad x_1^2 - x_2^3 = 3 \qquad \frac{x}{y} + z = 2 \qquad \sqrt{2}\,x + \tfrac{\pi}{4}\,y - \sin\!\left(\tfrac{\pi}{5}z\right) = 1$$
Thus, linear equations do not contain products, reciprocals, or other functions of the variables; the variables occur only to the first power and are multiplied only by constants. Pay particular attention to the fourth example in each list: Why is it that the fourth equation in the first list is linear but the fourth equation in the second list is not?

A solution of a linear equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$ is a vector $[s_1, s_2, \dots, s_n]$ whose components satisfy the equation when we substitute $x_1 = s_1, x_2 = s_2, \dots, x_n = s_n$.

Example 2.2 (a) $[5, 4]$ is a solution of $3x - 4y = -1$ because, when we substitute $x = 5$ and $y = 4$, the equation is satisfied: $3(5) - 4(4) = -1$. $[1, 1]$ is another solution. In general, the solutions simply correspond to the points on the line determined by the given equation. Thus, setting $x = t$ and solving for $y$, we see that the complete set of solutions can be written in the parametric form $[t, \tfrac{1}{4} + \tfrac{3}{4}t]$. (We could also set $y$ equal to some parameter, say $s$, and solve for $x$ instead; the two parametric solutions would look different but would be equivalent. Try this.)

(b) The linear equation $x_1 - x_2 + 2x_3 = 3$ has $[3, 0, 0]$, $[0, 1, 2]$, and $[6, 1, -1]$ as specific solutions. The complete set of solutions corresponds to the set of points in the plane determined by the given equation. If we set $x_2 = s$ and $x_3 = t$, then a parametric solution is given by $[3 + s - 2t, s, t]$. (Which values of $s$ and $t$ produce the three specific solutions above?)

A system of linear equations is a finite set of linear equations, each with the same variables. A solution of a system of linear equations is a vector that is simultaneously a solution of each equation in the system. The solution set of a system of linear equations is the set of all solutions of the system.
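To make the definition of a solution concrete, the following minimal Python sketch substitutes a vector into a linear equation and tests the resulting equality. It is our own illustration, not part of the text, and the helper name `satisfies` is hypothetical. The sketch uses exact rational arithmetic from Python's standard `fractions` module. It checks the specific solutions in Example 2.2(a) and spot-checks both parametric solutions in Example 2.2 for a few parameter values. A spot check like this is evidence, not a proof.

```python
from fractions import Fraction

def satisfies(coeffs, b, s):
    """Return True if substituting x_i = s_i makes a_1 x_1 + ... + a_n x_n equal b."""
    return sum(a * x for a, x in zip(coeffs, s)) == b

# Example 2.2(a): 3x - 4y = -1
print(satisfies([3, -4], -1, [5, 4]))  # True
print(satisfies([3, -4], -1, [1, 1]))  # True

# Parametric solution [t, 1/4 + (3/4)t]; Fractions keep the arithmetic exact
for t in [Fraction(-2), Fraction(0), Fraction(7, 3)]:
    assert satisfies([3, -4], -1, [t, Fraction(1, 4) + Fraction(3, 4) * t])

# Example 2.2(b): x1 - x2 + 2x3 = 3 with parametric solution [3 + s - 2t, s, t]
for s, t in [(2, 5), (-1, 3), (0, 4)]:
    assert satisfies([1, -1, 2], 3, [3 + s - 2 * t, s, t])
```

The test is an exact equality. If the coefficients were floating-point numbers such as 3.2 or 0.01, roundoff could make it fail even for a true solution. This is why the sketch uses integers and `Fraction` values.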
We will refer to the process of finding the solution set of a system of linear equations as "solving the system."

Example 2.3 The system
$$\begin{aligned} 2x - y &= 3 \\ x + 3y &= 5 \end{aligned}$$
has $[2, 1]$ as a solution, since it is a solution of both equations. On the other hand, $[1, -1]$ is not a solution of the system, since it satisfies only the first equation.

Example 2.4 Solve the following systems of linear equations:
(a) $x - y = 1$, $x + y = 3$
(b) $x - y = 2$, $2x - 2y = 4$
(c) $x - y = 1$, $x - y = 3$

Solution (a) Adding the two equations together gives $2x = 4$, so $x = 2$, from which we find that $y = 1$. A quick check confirms that $[2, 1]$ is indeed a solution of both equations. That this is the only solution can be seen by observing that this solution corresponds to the (unique) point of intersection $(2, 1)$ of the lines with equations $x - y = 1$ and $x + y = 3$, as shown in Figure 2.1(a). Thus, $[2, 1]$ is a unique solution.

(b) The second equation in this system is just twice the first, so the solutions are the solutions of the first equation alone, namely, the points on the line $x - y = 2$. These can be represented parametrically as $[2 + t, t]$. Thus, this system has infinitely many solutions [Figure 2.1(b)].

(c) Two numbers $x$ and $y$ cannot simultaneously have a difference of 1 and 3. Hence, this system has no solutions. (A more algebraic approach might be to subtract the second equation from the first, yielding the absurd conclusion $0 = -2$.) As Figure 2.1(c) shows, the lines for the equations are parallel in this case.

[Figure 2.1, panels (a), (b), (c): the lines of the three systems in Example 2.4]

A system of linear equations is called consistent if it has at least one solution. A system with no solutions is called inconsistent. Even though they are small, the three systems in Example 2.4 illustrate the only three possibilities for the number of solutions of a system of linear equations with real coefficients.
We will prove later that these same three possibilities hold for any system of linear equations over the real numbers.

A system of linear equations with real coefficients has either
(a) a unique solution (a consistent system) or
(b) infinitely many solutions (a consistent system) or
(c) no solutions (an inconsistent system).

Solving a System of Linear Equations
Two linear systems are called equivalent if they have the same solution sets. For example,
$$\begin{aligned} x - y &= 1 \\ x + y &= 3 \end{aligned} \qquad \text{and} \qquad \begin{aligned} x - y &= 1 \\ y &= 1 \end{aligned}$$
are equivalent, since they both have the unique solution $[2, 1]$. (Check this.)

Our approach to solving a system of linear equations is to transform the given system into an equivalent one that is easier to solve. The triangular pattern of the second example above (in which the second equation has one fewer variable than the first) is what we will aim for.

Example 2.5 Solve the system
$$\begin{aligned} x - y - z &= 2 \\ y + 3z &= 5 \\ 5z &= 10 \end{aligned}$$
Solution Starting from the last equation and working backward, we find successively that $z = 2$, $y = 5 - 3(2) = -1$, and $x = 2 + (-1) + 2 = 3$. So the unique solution is $[3, -1, 2]$.

The procedure used to solve Example 2.5 is called back substitution. We now turn to the general strategy for transforming a given system into an equivalent one that can be solved easily by back substitution. This process will be described in greater detail in the next section; for now, we will simply observe it in action in a single example.

Example 2.6 Solve the system
$$\begin{aligned} x - y - z &= 2 \\ 3x - 3y + 2z &= 16 \\ 2x - y + z &= 9 \end{aligned}$$
Solution To transform this system into one that exhibits the triangular structure of Example 2.5, we first need to eliminate the variable $x$ from Equations 2 and 3. Observe that subtracting appropriate multiples of Equation 1 from Equations 2 and 3 will do the trick.
Next, observe that we are operating on the coefficients, not on the variables, so we can save ourselves some writing if we record the coefficients and constant terms in the matrix
$$\left[\begin{array}{rrr|r} 1 & -1 & -1 & 2 \\ 3 & -3 & 2 & 16 \\ 2 & -1 & 1 & 9 \end{array}\right]$$
where the first three columns contain the coefficients of the variables in order, the final column contains the constant terms, and the vertical bar serves to remind us of the equal signs in the equations. This matrix is called the augmented matrix of the system.

The word matrix is derived from the Latin word mater, meaning "mother." When the suffix -ix is added, the meaning becomes "womb." Just as a womb surrounds a fetus, the brackets of a matrix surround its entries, and just as the womb gives rise to a baby, a matrix gives rise to certain types of functions called linear transformations. A matrix with $m$ rows and $n$ columns is called an $m \times n$ matrix (pronounced "$m$ by $n$"). The plural of matrix is matrices, not "matrixes."

There are various ways to convert the given system into one with the triangular pattern we are after. The steps we will use here are closest in spirit to the more general method described in the next section. We will perform the sequence of operations on the given system and simultaneously on the corresponding augmented matrix. We begin by eliminating $x$ from Equations 2 and 3.
$$\begin{aligned} x - y - z &= 2 \\ 3x - 3y + 2z &= 16 \\ 2x - y + z &= 9 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & -1 & -1 & 2 \\ 3 & -3 & 2 & 16 \\ 2 & -1 & 1 & 9 \end{array}\right]$$
Subtract 3 times the first equation from the second equation (equivalently, subtract 3 times the first row from the second row):
$$\begin{aligned} x - y - z &= 2 \\ 5z &= 10 \\ 2x - y + z &= 9 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & -1 & -1 & 2 \\ 0 & 0 & 5 & 10 \\ 2 & -1 & 1 & 9 \end{array}\right]$$
Subtract 2 times the first equation from the third equation (equivalently, subtract 2 times the first row from the third row):
$$\begin{aligned} x - y - z &= 2 \\ 5z &= 10 \\ y + 3z &= 5 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & -1 & -1 & 2 \\ 0 & 0 & 5 & 10 \\ 0 & 1 & 3 & 5 \end{array}\right]$$
Interchange Equations 2 and 3 (equivalently, interchange rows 2 and 3):
$$\begin{aligned} x - y - z &= 2 \\ y + 3z &= 5 \\ 5z &= 10 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & -1 & -1 & 2 \\ 0 & 1 & 3 & 5 \\ 0 & 0 & 5 & 10 \end{array}\right]$$
This is the same system that we solved using back substitution in Example 2.5, where we found that the solution was $[3, -1, 2]$. This is therefore also the solution to the system given in this example. Why? The calculations above show that any solution of the given system is also a solution of the final one. But since the steps we just performed are reversible, we could recover the original system, starting with the final system. (How?) So any solution of the final system is also a solution of the given one. Thus, the systems are equivalent (as are all of the ones obtained in the intermediate steps above). Moreover, we might just as well work with matrices instead of equations, since it is a simple matter to reinsert the variables before proceeding with the back substitution. (Working with matrices is the subject of the next section.)

Remark Calculators with matrix capabilities and computer algebra systems can facilitate solving systems of linear equations, particularly when the systems are large or have coefficients that are not "nice," as is often the case in real-life applications. As always, though, you should do as many examples as you can with pencil and paper until you are comfortable with the techniques. Even if a calculator or CAS is called for, think about how you would do the calculations manually before doing anything. After you have an answer, be sure to think about whether it is reasonable.
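Once you can carry out the steps of Example 2.6 by hand, it is instructive to see how mechanical they are. The following minimal Python sketch is our own illustration, not the author's method. The names `eliminate_and_solve` and `ex_2_6` are hypothetical. The function works column by column. It finds a row with a nonzero entry in the current column and interchanges rows if necessary. It then subtracts multiples of that row from the rows below and finishes with back substitution. All arithmetic uses exact `Fraction` values. The sketch assumes a square system with a unique solution and raises `ValueError` otherwise. The next section develops the systematic method properly.

```python
from fractions import Fraction

def eliminate_and_solve(M):
    """Solve a square system from its augmented matrix M by elimination
    followed by back substitution, using exact Fraction arithmetic.

    Assumes the system has a unique solution; raises ValueError otherwise.
    """
    A = [[Fraction(v) for v in row] for row in M]   # work on an exact copy
    n = len(A)
    for c in range(n):
        # Find a row at or below row c with a nonzero entry in column c.
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            raise ValueError("no unique solution")
        A[c], A[p] = A[p], A[c]                      # interchange rows if needed
        for r in range(c + 1, n):                    # subtract multiples of row c
            k = A[r][c] / A[c][c]
            A[r] = [a - k * b for a, b in zip(A[r], A[c])]
    # Back substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        rhs = A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = rhs / A[i][i]
    return x

# Augmented matrix of Example 2.6
ex_2_6 = [[1, -1, -1, 2],
          [3, -3, 2, 16],
          [2, -1, 1, 9]]
print(eliminate_and_solve(ex_2_6))  # [Fraction(3, 1), Fraction(-1, 1), Fraction(2, 1)]
```

The sketch interchanges rows only when the current entry is zero, and it never rescales a row. On this system it therefore performs exactly the three operations shown above: $R_2 - 3R_1$, $R_3 - 2R_1$, then $R_2 \leftrightarrow R_3$.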
Do not be misled into thinking that technology will always give you the answer faster or more easily than calculating by hand. Sometimes it may not give you the answer at all! Roundoff errors associated with the floating-point arithmetic used by calculators and computers can cause serious problems and lead to wildly wrong answers to some problems. See Exploration: Lies My Computer Told Me for a glimpse of the problem. (You've been warned!)

Exercises 2.1
In Exercises 1-6, determine which equations are linear equations in the variables $x$, $y$, and $z$. If any equation is not linear, explain why not.
1. $x - \pi y + \sqrt[3]{5}\,z = 0$
2. $x^2 + y^2 + z^2 = 1$
3. $x^{-1} + 7y + z = \sin\left(\frac{\pi}{9}\right)$
4. $2x - xy - 5z = 0$
5. $3\cos x - 4y + z = \sqrt{3}$
6. $(\cos 3)x - 4y + z = \sqrt{3}$

In Exercises 7-10, find a linear equation that has the same solution set as the given equation (possibly with some restrictions on the variables).
7. $2x + y = 7 - 3y$
8. $\dfrac{x^2 - y^2}{x - y} = 1$
9. $\dfrac{1}{x} + \dfrac{1}{y} = \dfrac{4}{xy}$
10. $\log_{10} x - \log_{10} y = 2$

In Exercises 11-14, find the solution set of each equation.
11. $3x - 6y = 0$
12. $2x_1 + 3x_2 = 5$
13. $x + 2y + 3z = 4$

In Exercises 15-18, draw graphs corresponding to the given linear systems. Determine geometrically whether each system has a unique solution, infinitely many solutions, or no solution. Then solve each system algebraically to confirm your answer.
15. $x + y = 0$, $2x + y = 3$
16. $x - 2y = 7$, $3x + y = 7$
17. $3x - 6y = 3$, $-x + 2y = 1$
18. $0.10x - 0.05y = 0.20$, $-0.06x + 0.03y = -0.12$

In Exercises 19-24, solve the given system by back substitution.
19. $x - 2y = 1$, $y = 3$
20. $2u - 3v = 5$, $2v = 6$
21. $x - y + z = 0$, $2y - z = 1$, $3z = -1$
22. $x_1 + 2x_2 + 3x_3 = 0$, $-5x_2 + 2x_3 = 0$, $4x_3 = 0$
23. $x_1 + x_2 - x_3 - x_4 = 1$, $x_2 + x_3 + x_4 = 0$, $x_3 - x_4 = 0$, $x_4 = 1$
24. $x - 3y + z = 5$, $y - 2z = -1$

The systems in Exercises 25 and 26 exhibit a "lower triangular" pattern that makes them easy to solve by forward substitution.
(We will encounter forward substitution again in Chapter 3.) Solve these systems.
25. $x = 2$, $2x + y = -3$, $-3x - 4y + z = -10$
26. $x_1 = -1$, $-\tfrac{1}{2}x_1 + x_2 = 5$, $\tfrac{3}{2}x_1 + 2x_2 + x_3 = 7$

Find the augmented matrices of the linear systems in Exercises 27-30.
27. $x - y = 0$, $2x + y = 3$
28. $2x_1 + 3x_2 - x_3 = 1$, $x_1 + x_3 = 0$, $-x_1 + 2x_2 - 2x_3 = 0$
29. $x + 5y = -1$, $-x + y = -5$, $2x + 4y = 4$
30. $a - 2b + d = 2$, $-a + b - c - 3d = 1$

In Exercises 31 and 32, find a system of linear equations that has the given matrix as its augmented matrix.
31. $[\,\dots\,]$
32. $[\,\dots\,]$

For Exercises 33-38, solve the linear systems in the given exercises.
33. Exercise 27
34. Exercise 28
35. Exercise 29
36. Exercise 30
37. Exercise 31
38. Exercise 32

39. (a) Find a system of two linear equations in the variables $x$ and $y$ whose solution set is given by the parametric equations $x = t$ and $y = 3 - 2t$.
(b) Find another parametric solution to the system in part (a) in which the parameter is $s$ and $y = s$.
40. (a) Find a system of two linear equations in the variables $x_1$, $x_2$, and $x_3$ whose solution set is given by the parametric equations $x_1 = t$, $x_2 = 1 + t$, and $x_3 = 2 - t$.
(b) Find another parametric solution to the system in part (a) in which the parameter is $s$ and $x_3 = s$.

In Exercises 41-44, the systems of equations are nonlinear. Find substitutions (changes of variables) that convert each system into a linear system and use this linear system to help solve the given system.
41. $\dfrac{2}{x} + \dfrac{3}{y} = 0$, $\dfrac{3}{x} + \dfrac{4}{y} = 1$
42. $x^2 + 2y^2 = 6$, $x^2 - y^2 = 3$
43. $\tan x - 2\sin y = 2$, $\tan x - \sin y + \cos z = 2$, $\sin y - \cos z = -1$
44. $-2^a + 2(3^b) = 1$, $3(2^a) - 4(3^b) = 1$

2.2 Direct Methods for Solving Linear Systems
In this section, we will look at a general, systematic procedure for solving a system of linear equations.
This procedure is based on the idea of reducing the augmented matrix of the given system to a form that can then be solved by back substitution. The method is direct in the sense that it leads directly to the solution (if one exists) in a finite number of steps. In Section 2.5, we will consider some indirect methods that work in a completely different way.

Matrices and Echelon Form
There are two important matrices associated with a linear system. The coefficient matrix contains the coefficients of the variables, and the augmented matrix (which we have already encountered) is the coefficient matrix augmented by an extra column containing the constant terms. For the system
$$\begin{aligned} 2x + y - z &= 3 \\ x + 5z &= 1 \\ -x + 3y - 2z &= 0 \end{aligned}$$
the coefficient matrix is
$$\begin{bmatrix} 2 & 1 & -1 \\ 1 & 0 & 5 \\ -1 & 3 & -2 \end{bmatrix}$$
and the augmented matrix is
$$\left[\begin{array}{rrr|r} 2 & 1 & -1 & 3 \\ 1 & 0 & 5 & 1 \\ -1 & 3 & -2 & 0 \end{array}\right]$$
Note that if a variable is missing (as $y$ is in the second equation), its coefficient 0 is entered in the appropriate position in the matrix. If we denote the coefficient matrix of a linear system by $A$ and the column vector of constant terms by $\mathbf{b}$, then the form of the augmented matrix is $[A \mid \mathbf{b}]$.

In solving a linear system, it will not always be possible to reduce the coefficient matrix to triangular form, as we did in Example 2.6. However, we can always achieve a staircase pattern in the nonzero entries of the final matrix.

The word echelon comes from the Latin word scala, meaning "ladder" or "stairs." The French word for "ladder," échelle, is also derived from this Latin base. A matrix in echelon form exhibits a staircase pattern.

Definition A matrix is in row echelon form if it satisfies the following properties:
1. Any rows consisting entirely of zeros are at the bottom.
2. In each nonzero row, the first nonzero entry (called the leading entry) is in a column to the left of any leading entries below it.

Note that these properties guarantee that the leading entries form a staircase pattern.
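The two defining properties can be checked by tracking the column index of each row's leading entry. The following is a minimal Python sketch under our own conventions. The function names are hypothetical, and columns are indexed from 0. A matrix passes exactly when no nonzero row comes after a zero row and the leading-entry indices strictly increase down the nonzero rows.

```python
def leading_index(row):
    """Column index of the first nonzero entry of row, or None for a zero row."""
    for j, a in enumerate(row):
        if a != 0:
            return j
    return None

def is_row_echelon(M):
    """Check the two properties in the definition of row echelon form."""
    last_lead = -1
    seen_zero_row = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
        elif seen_zero_row or lead <= last_lead:
            # A nonzero row below a zero row violates property 1;
            # a leading entry not strictly right of the one above violates property 2.
            return False
        else:
            last_lead = lead
    return True

# The coefficient matrix of the system above is not in row echelon form.
print(is_row_echelon([[2, 1, -1], [1, 0, 5], [-1, 3, -2]]))  # False
print(is_row_echelon([[1, 2, 3], [0, 0, 4], [0, 0, 0]]))     # True
print(is_row_echelon([[1, 2], [0, 0], [0, 3]]))              # False: zero row not at the bottom
```

As in the definition, the check does not require the leading entries to equal 1.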
In particular, in any column containing a leading entry, all entries below the leading entry are zero, as the following examples illustrate.

Example 2.7 The following matrices are in row echelon form:
$$\begin{bmatrix} 2 & 4 & 1 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 5 \\ 0 & 0 & 4 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 & 2 & 1 \\ 0 & 0 & 1 & 3 \end{bmatrix} \qquad [\,\dots\,]$$
If a matrix in row echelon form is actually the augmented matrix of a linear system, the system is quite easy to solve by back substitution alone.

Example 2.8 Assuming that each of the matrices in Example 2.7 is an augmented matrix, write out the corresponding systems of linear equations and solve them.

Solution We first remind ourselves that the last column in an augmented matrix is the vector of constant terms. The first matrix then corresponds to the system
$$\begin{aligned} 2x_1 + 4x_2 &= 1 \\ -x_2 &= 2 \end{aligned}$$
(Notice that we have dropped the last equation $0 = 0$, or $0x_1 + 0x_2 = 0$, which is clearly satisfied for any values of $x_1$ and $x_2$.) Back substitution gives $x_2 = -2$ and then $2x_1 = 1 - 4(-2) = 9$, so $x_1 = \tfrac{9}{2}$. The solution is $[\tfrac{9}{2}, -2]$.

The second matrix has the corresponding system
$$\begin{aligned} x_1 &= 1 \\ x_2 &= 5 \\ 0 &= 4 \end{aligned}$$
The last equation represents $0x_1 + 0x_2 = 4$, which clearly has no solutions. Therefore, the system has no solutions. Similarly, the system corresponding to the fourth matrix has no solutions. For the system corresponding to the third matrix, we have
$$\begin{aligned} x_1 + x_2 + 2x_3 &= 1 \\ x_3 &= 3 \end{aligned}$$
so $x_1 = 1 - 2(3) - x_2 = -5 - x_2$. There are infinitely many solutions, since we may assign $x_2$ any value $t$ to get the parametric solution $[-5 - t, t, 3]$.

Elementary Row Operations
We now describe the procedure by which any matrix can be reduced to a matrix in row echelon form. The allowable operations, called elementary row operations, correspond to the operations that can be performed on a system of linear equations to transform it into an equivalent system.

Definition The following elementary row operations can be performed on a matrix:
1. Interchange two rows.
2. Multiply a row by a nonzero constant.
3. Add a multiple of a row to another row.

Observe that dividing a row by a nonzero constant is implied in the above definition, since, for example, dividing a row by 2 is the same as multiplying it by $\tfrac{1}{2}$. Similarly, subtracting a multiple of one row from another row is the same as adding a negative multiple of one row to another row.

We will use the following shorthand notation for the three elementary row operations:
1. $R_i \leftrightarrow R_j$ means interchange rows $i$ and $j$.
2. $kR_i$ means multiply row $i$ by $k$.
3. $R_i + kR_j$ means add $k$ times row $j$ to row $i$ (and replace row $i$ with the result).

The process of applying elementary row operations to bring a matrix into row echelon form is called row reduction.

Example 2.9 Reduce the following matrix to echelon form:
$$\begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 2 & 4 & 0 & 0 & 2 \\ 2 & 3 & 2 & 1 & 5 \\ -1 & 1 & 3 & 6 & 5 \end{bmatrix}$$
Solution We work column by column, from left to right and from top to bottom. The strategy is to create a leading entry in a column and then use it to create zeros below it. The entry chosen to become a leading entry is called a pivot, and this phase of the process is called pivoting. Although not strictly necessary, it is often convenient to use the second elementary row operation to make each leading entry a 1.

We begin by introducing zeros into the first column below the leading 1 in the first row:
$$\xrightarrow[R_4 + R_1]{R_2 - 2R_1,\ R_3 - 2R_1} \begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 0 & 0 & 8 & 8 & -8 \\ 0 & -1 & 10 & 9 & -5 \\ 0 & 3 & -1 & 2 & 10 \end{bmatrix}$$
The first column is now as we want it, so the next thing to do is to create a leading entry in the second row, aiming for the staircase pattern of echelon form. In this case, we do this by interchanging rows. (We could also add row 3 or row 4 to row 2.)
$$\xrightarrow{R_2 \leftrightarrow R_3} \begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 0 & -1 & 10 & 9 & -5 \\ 0 & 0 & 8 & 8 & -8 \\ 0 & 3 & -1 & 2 & 10 \end{bmatrix}$$
The pivot this time was $-1$.
We now create a zero at the bottom of column 2, using the leading entry $-1$ in row 2:
$$\xrightarrow{R_4 + 3R_2} \begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 0 & -1 & 10 & 9 & -5 \\ 0 & 0 & 8 & 8 & -8 \\ 0 & 0 & 29 & 29 & -5 \end{bmatrix}$$
Column 2 is now done. Noting that we already have a leading entry in column 3, we just pivot on the 8 to introduce a zero below it. This is easiest if we first divide row 3 by 8:
$$\xrightarrow{\frac{1}{8}R_3} \begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 0 & -1 & 10 & 9 & -5 \\ 0 & 0 & 1 & 1 & -1 \\ 0 & 0 & 29 & 29 & -5 \end{bmatrix}$$
Now we use the leading 1 in row 3 to create a zero below it:
$$\xrightarrow{R_4 - 29R_3} \begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 0 & -1 & 10 & 9 & -5 \\ 0 & 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 & 24 \end{bmatrix}$$
With this final step, we have reduced our matrix to echelon form.

Remarks
• The row echelon form of a matrix is not unique. (Find a different row echelon form for the matrix in Example 2.9.)
• The leading entry in each row is used to create the zeros below it.
• The pivots are not necessarily the entries that are originally in the positions eventually occupied by the leading entries. In Example 2.9, the pivots were 1, -1, 8, and 24. The original matrix had 1, 4, 2, and 5 in those positions on the "staircase."
• Once we have pivoted and introduced zeros below the leading entry in a column, that column does not change. In other words, the row echelon form emerges from left to right, top to bottom.

Elementary row operations are reversible; that is, they can be "undone." Thus, if some elementary row operation converts $A$ into $B$, there is also an elementary row operation that converts $B$ into $A$. (See Exercises 15 and 16.)

Definition Matrices $A$ and $B$ are row equivalent if there is a sequence of elementary row operations that converts $A$ into $B$.

The matrices in Example 2.9,
$$\begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 2 & 4 & 0 & 0 & 2 \\ 2 & 3 & 2 & 1 & 5 \\ -1 & 1 & 3 & 6 & 5 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & -4 & -4 & 5 \\ 0 & -1 & 10 & 9 & -5 \\ 0 & 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 & 24 \end{bmatrix}$$
are row equivalent. In general, though, how can we tell whether two matrices are row equivalent?

Theorem 2.1 Matrices $A$ and $B$ are row equivalent if and only if they can be reduced to the same row echelon form.
Proof If $A$ and $B$ are row equivalent, then further row operations will reduce $B$ (and therefore $A$) to the (same) row echelon form. Conversely, if $A$ and $B$ have the same row echelon form $R$, then, via elementary row operations, we can convert $A$ into $R$ and $B$ into $R$. Reversing the latter sequence of operations, we can convert $R$ into $B$, and therefore the sequence $A \to R \to B$ achieves the desired effect.

Remark In practice, Theorem 2.1 is easiest to use if $R$ is the reduced row echelon form of $A$ and $B$, as defined on page 73. See Exercises 17 and 18.

Gaussian Elimination
When row reduction is applied to the augmented matrix of a system of linear equations, we create an equivalent system that can be solved by back substitution. The entire process is known as Gaussian elimination.

Gaussian Elimination
1. Write the augmented matrix of the system of linear equations.
2. Use elementary row operations to reduce the augmented matrix to row echelon form.
3. Using back substitution, solve the equivalent system that corresponds to the row-reduced matrix.

Remark When performed by hand, step 2 of Gaussian elimination allows quite a bit of choice. Here are some useful guidelines:
(a) Locate the leftmost column that is not all zeros.
(b) Create a leading entry at the top of this column. (It will usually be easiest if you make this a leading 1. See Exercise 22.)
(c) Use the leading entry to create zeros below it.
(d) Cover up the row containing the leading entry, and go back to step (a) to repeat the procedure on the remaining submatrix. Stop when the entire matrix is in row echelon form.

Example 2.10 Solve the system
$$\begin{aligned} 2x_2 + 3x_3 &= 8 \\ 2x_1 + 3x_2 + x_3 &= 5 \\ x_1 - x_2 - 2x_3 &= -5 \end{aligned}$$
Solution The augmented matrix is
$$\left[\begin{array}{rrr|r} 0 & 2 & 3 & 8 \\ 2 & 3 & 1 & 5 \\ 1 & -1 & -2 & -5 \end{array}\right]$$
We proceed to reduce this matrix to row echelon form, following the guidelines given for step 2 of the process. The first nonzero column is column 1.
Carl Friedrich Gauss (1777-1855) is generally considered to be one of the three greatest mathematicians of all time, along with Archimedes and Newton. He is often called the "prince of mathematicians," a nickname that he richly deserves. A child prodigy, Gauss reportedly could do arithmetic before he could talk. At the age of 3, he corrected an error in his father's calculations for the company payroll, and as a young student, he found the formula $n(n + 1)/2$ for the sum of the first $n$ natural numbers. When he was 19, he proved that a 17-sided polygon could be constructed using only a straightedge and a compass, and at the age of 21, he proved, in his doctoral dissertation, that every polynomial of degree $n$ with real or complex coefficients has exactly $n$ zeros, counting multiple zeros (the Fundamental Theorem of Algebra). Gauss's 1801 publication Disquisitiones Arithmeticae is generally considered to be the foundation of modern number theory, but he made contributions to nearly every branch of mathematics as well as to statistics, physics, astronomy, and surveying. Gauss did not publish all of his findings, probably because he was too critical of his own work. He also did not like to teach and was often critical of other mathematicians, perhaps because he discovered, but did not publish, their results before they did. The method called Gaussian elimination was known to the Chinese in the third century B.C. and was well known by Gauss's time. The method bears Gauss's name because of his use of it in a paper in which he solved a system of linear equations to describe the orbit of an asteroid.

We begin by creating a leading entry at the top of this column; interchanging rows 1 and 3 is the best way to achieve this.
$$\left[\begin{array}{rrr|r} 0 & 2 & 3 & 8 \\ 2 & 3 & 1 & 5 \\ 1 & -1 & -2 & -5 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_3} \left[\begin{array}{rrr|r} 1 & -1 & -2 & -5 \\ 2 & 3 & 1 & 5 \\ 0 & 2 & 3 & 8 \end{array}\right]$$
We now create a second zero in the first column, using the leading 1:
$$\xrightarrow{R_2 - 2R_1} \left[\begin{array}{rrr|r} 1 & -1 & -2 & -5 \\ 0 & 5 & 5 & 15 \\ 0 & 2 & 3 & 8 \end{array}\right]$$
We now cover up the first row and repeat the procedure. The second column is the first nonzero column of the submatrix. Multiplying row 2 by $\tfrac{1}{5}$ will create a leading 1.
$$\xrightarrow{\frac{1}{5}R_2} \left[\begin{array}{rrr|r} 1 & -1 & -2 & -5 \\ 0 & 1 & 1 & 3 \\ 0 & 2 & 3 & 8 \end{array}\right]$$
We now need another zero at the bottom of column 2:
$$\xrightarrow{R_3 - 2R_2} \left[\begin{array}{rrr|r} 1 & -1 & -2 & -5 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & 1 & 2 \end{array}\right]$$
The augmented matrix is now in row echelon form, and we move to step 3. The corresponding system is
$$\begin{aligned} x_1 - x_2 - 2x_3 &= -5 \\ x_2 + x_3 &= 3 \\ x_3 &= 2 \end{aligned}$$
and back substitution gives $x_3 = 2$, then $x_2 = 3 - x_3 = 3 - 2 = 1$, and finally $x_1 = -5 + x_2 + 2x_3 = -5 + 1 + 4 = 0$. We write the solution in vector form as
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$$
(We are going to write the vector solutions of linear systems as column vectors from now on. The reason for this will become clear in Chapter 3.)

Example 2.11 Solve the system
$$\begin{aligned} w - x - y + 2z &= 1 \\ 2w - 2x - y + 3z &= 3 \\ -w + x - y &= -3 \end{aligned}$$
Solution The augmented matrix is
$$\left[\begin{array}{rrrr|r} 1 & -1 & -1 & 2 & 1 \\ 2 & -2 & -1 & 3 & 3 \\ -1 & 1 & -1 & 0 & -3 \end{array}\right]$$
which can be row reduced as follows:
$$\xrightarrow[R_3 + R_1]{R_2 - 2R_1} \left[\begin{array}{rrrr|r} 1 & -1 & -1 & 2 & 1 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 0 & -2 & 2 & -2 \end{array}\right] \xrightarrow{R_3 + 2R_2} \left[\begin{array}{rrrr|r} 1 & -1 & -1 & 2 & 1 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
The associated system is now
$$\begin{aligned} w - x - y + 2z &= 1 \\ y - z &= 1 \end{aligned}$$
which has infinitely many solutions. There is more than one way to assign parameters, but we will proceed to use back substitution, writing the variables corresponding to the leading entries (the leading variables) in terms of the other variables (the free variables). In this case, the leading variables are $w$ and $y$, and the free variables are $x$ and $z$. Thus, $y = 1 + z$, and from this we obtain
$$w = 1 + x + y - 2z = 1 + x + (1 + z) - 2z = 2 + x - z$$
If we assign parameters $x = s$ and $z = t$, the solution can be written in vector form as
$$\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 + s - t \\ s \\ 1 + t \\ t \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 1 \\ 0 \end{bmatrix} + s\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 1 \\ 1 \end{bmatrix}$$

Example 2.11 highlights a very important property: In a consistent system, the free variables are just the variables that are not leading variables. Since the number of leading variables is the number of nonzero rows in the row echelon form of the coefficient matrix, we can predict the number of free variables (parameters) before we find the explicit solution using back substitution. In Chapter 3, we will prove that, although the row echelon form of a matrix is not unique, the number of nonzero rows is the same in all row echelon forms of a given matrix. Thus, it makes sense to give a name to this number.

Definition The rank of a matrix is the number of nonzero rows in its row echelon form.

We will denote the rank of a matrix $A$ by $\operatorname{rank}(A)$. In Example 2.10, the rank of the coefficient matrix is 3, and in Example 2.11, the rank of the coefficient matrix is 2.

The observations we have just made justify the following theorem, which we will prove in more generality in Chapters 3 and 6.

Theorem 2.2 The Rank Theorem
Let $A$ be the coefficient matrix of a system of linear equations with $n$ variables. If the system is consistent, then
$$\text{number of free variables} = n - \operatorname{rank}(A)$$

Thus, in Example 2.10, we have $3 - 3 = 0$ free variables (in other words, a unique solution), and in Example 2.11, we have $4 - 2 = 2$ free variables, as we found.

Wilhelm Jordan (1842-1899) was a German professor of geodesy whose contribution to solving linear systems was a systematic method of back substitution closely related to the method described here.

Example 2.12 Solve the system
$$\begin{aligned} x_1 - x_2 + 2x_3 &= 3 \\ x_1 + 2x_2 - x_3 &= -3 \\ 2x_2 - 2x_3 &= 1 \end{aligned}$$
Solution When we row reduce the augmented matrix, we have
$$\left[\begin{array}{rrr|r} 1 & -1 & 2 & 3 \\ 1 & 2 & -1 & -3 \\ 0 & 2 & -2 & 1 \end{array}\right] \xrightarrow{R_2 - R_1} \left[\begin{array}{rrr|r} 1 & -1 & 2 & 3 \\ 0 & 3 & -3 & -6 \\ 0 & 2 & -2 & 1 \end{array}\right] \xrightarrow{\frac{1}{3}R_2} \left[\begin{array}{rrr|r} 1 & -1 & 2 & 3 \\ 0 & 1 & -1 & -2 \\ 0 & 2 & -2 & 1 \end{array}\right] \xrightarrow{R_3 - 2R_2} \left[\begin{array}{rrr|r} 1 & -1 & 2 & 3 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 5 \end{array}\right]$$
leading to the impossible equation $0 = 5$.
(We could also have performed $R_3 - \frac{2}{3}R_2$ as the second elementary row operation, which would have given us the same contradiction but a different row echelon form.) Thus, the system has no solutions; it is inconsistent.

Gauss-Jordan Elimination

A modification of Gaussian elimination greatly simplifies the back substitution phase and is particularly helpful when calculations are being done by hand on a system with infinitely many solutions. This variant, known as Gauss-Jordan elimination, relies on reducing the augmented matrix even further.

Definition. A matrix is in reduced row echelon form if it satisfies the following properties:
1. It is in row echelon form.
2. The leading entry in each nonzero row is a 1 (called a leading 1).
3. Each column containing a leading 1 has zeros everywhere else.

The following matrix is in reduced row echelon form:
$$\begin{bmatrix} 1 & 2 & 0 & 0 & -3 & 1 & 0\\ 0 & 0 & 1 & 0 & 4 & -1 & 0\\ 0 & 0 & 0 & 1 & 3 & -2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

For $2 \times 2$ matrices, the possible reduced row echelon forms are
$$\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & *\\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0\\ 0 & 0 \end{bmatrix}$$
where $*$ can be any number.

It is clear that after a matrix has been reduced to echelon form, further elementary row operations will bring it to reduced row echelon form. What is not clear (although intuition may suggest it) is that, unlike the row echelon form, the reduced row echelon form of a matrix is unique.

For a short proof that the reduced row echelon form of a matrix is unique, see the article by Thomas Yuster, "The Reduced Row Echelon Form of a Matrix Is Unique: A Simple Proof," in the March 1984 issue of Mathematics Magazine (vol. 57, no. 2, pp. 93–94).

In Gauss-Jordan elimination, we proceed as in Gaussian elimination but reduce the augmented matrix to reduced row echelon form.

Gauss-Jordan Elimination
1. Write the augmented matrix of the system of linear equations.
2. Use elementary row operations to reduce the augmented matrix to reduced row echelon form.
3. If the resulting system is consistent, solve for the leading variables in terms of any remaining free variables.

Example 2.13. Solve the system in Example 2.11 by Gauss-Jordan elimination.

Solution. The reduction proceeds as it did in Example 2.11 until we reach the echelon form:
$$\left[\begin{array}{rrrr|r} 1 & -1 & -1 & 2 & 1\\ 0 & 0 & 1 & -1 & 1\\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
We now must create a zero above the leading 1 in the second row, third column. We do this by adding row 2 to row 1 to obtain
$$\left[\begin{array}{rrrr|r} 1 & -1 & 0 & 1 & 2\\ 0 & 0 & 1 & -1 & 1\\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
The system has now been reduced to
$$\begin{aligned} w - x + z &= 2\\ y - z &= 1 \end{aligned}$$
It is now much easier to solve for the leading variables:
$$w = 2 + x - z \quad\text{and}\quad y = 1 + z$$
If we assign parameters $x = s$ and $z = t$ as before, the solution can be written in vector form as
$$\begin{bmatrix} w\\ x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 2 + s - t\\ s\\ 1 + t\\ t \end{bmatrix} = \begin{bmatrix} 2\\ 0\\ 1\\ 0 \end{bmatrix} + s\begin{bmatrix} 1\\ 1\\ 0\\ 0 \end{bmatrix} + t\begin{bmatrix} -1\\ 0\\ 1\\ 1 \end{bmatrix}$$

Remark. From a computational point of view, it is more efficient (in the sense that it requires fewer calculations) to first reduce the matrix to row echelon form and then, working from right to left, make each leading entry a 1 and create zeros above these leading 1s. However, for manual calculation, you will find it easier to just work from left to right and create the leading 1s and the zeros in their columns as you go.

Let's return to the geometry that brought us to this point. Just as systems of linear equations in two variables correspond to lines in $\mathbb{R}^2$, so linear equations in three variables correspond to planes in $\mathbb{R}^3$. In fact, many questions about lines and planes can be answered by solving an appropriate linear system.

Example 2.14. Find the line of intersection of the planes $x + 2y - z = 3$ and $2x + 3y + z = 1$.

Solution. First, observe that there will be a line of intersection, since the normal vectors of the two planes, $[1, 2, -1]$ and $[2, 3, 1]$, are not parallel.
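Before finishing Example 2.14 by hand, it may help to see the Gauss-Jordan procedure and the Rank Theorem written out as a short program. What follows is a minimal Python sketch, not a production solver: it assumes exact rational arithmetic through the standard `fractions` module, and the names `rref` and `rank` are our own illustrative choices. Like the hand method recommended in the Remark, it works from left to right, creating each leading 1 and then the zeros above and below it.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact arithmetic.

    Returns (R, pivot_cols): the reduced row echelon form of `rows` and
    the indices of the columns that contain leading 1s.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_cols = []
    r = 0  # row that will receive the next leading 1
    for c in range(n_cols):
        # look for a nonzero entry in column c, at or below row r
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading 1 in column c
        m[r], m[p] = m[p], m[r]                  # R_r <-> R_p
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # (1/lead) R_r gives a leading 1
        for i in range(n_rows):                  # zeros above and below it
            if i != r and m[i][c] != 0:
                k = m[i][c]
                m[i] = [a - k * b for a, b in zip(m[i], m[r])]  # R_i - k R_r
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivot_cols

def rank(rows):
    """Number of nonzero rows in the (reduced) row echelon form."""
    return len(rref(rows)[1])
```

Applied to the echelon form reached in Example 2.13, `rref` returns the reduced form found there, and the Rank Theorem gives $4 - 2 = 2$ free variables. For the augmented matrix of Example 2.12, the coefficient matrix has rank 2 but the augmented matrix has rank 3: the last column receives a leading 1, i.e., a row $[0\ 0\ 0 \mid 1]$ expressing the impossible equation $0 = 1$ (a scaled form of $0 = 5$). The same function reproduces the reduced matrix of Example 2.14 and the values $s = \frac{5}{4}$, $t = \frac{3}{4}$ of Example 2.15.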
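The points that lie in the intersection of the two planes correspond to the points in the solution set of the system
$$\begin{aligned} x + 2y - z &= 3\\ 2x + 3y + z &= 1 \end{aligned}$$
Gauss-Jordan elimination applied to the augmented matrix yields
$$\left[\begin{array}{rrr|r} 1 & 2 & -1 & 3\\ 2 & 3 & 1 & 1 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{rrr|r} 1 & 2 & -1 & 3\\ 0 & -1 & 3 & -5 \end{array}\right] \xrightarrow{-R_2} \left[\begin{array}{rrr|r} 1 & 2 & -1 & 3\\ 0 & 1 & -3 & 5 \end{array}\right] \xrightarrow{R_1 - 2R_2} \left[\begin{array}{rrr|r} 1 & 0 & 5 & -7\\ 0 & 1 & -3 & 5 \end{array}\right]$$
Replacing variables, we have
$$\begin{aligned} x + 5z &= -7\\ y - 3z &= 5 \end{aligned}$$
We set the free variable $z$ equal to a parameter $t$ and thus obtain the parametric equations of the line of intersection of the two planes:
$$x = -7 - 5t, \qquad y = 5 + 3t, \qquad z = t$$
In vector form, the equation is
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} -7\\ 5\\ 0 \end{bmatrix} + t\begin{bmatrix} -5\\ 3\\ 1 \end{bmatrix}$$
See Figure 2.2.

Figure 2.2: The intersection of two planes

Example 2.15. Let $\mathbf{p} = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$, $\mathbf{q} = \begin{bmatrix} 0\\ 2\\ 1 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} 3\\ -1\\ -1 \end{bmatrix}$. Determine whether the lines $\mathbf{x} = \mathbf{p} + t\mathbf{u}$ and $\mathbf{x} = \mathbf{q} + t\mathbf{v}$ intersect and, if so, find their point of intersection.

Solution. We need to be careful here. Although $t$ has been used as the parameter in the equations of both lines, the lines are independent and therefore so are their parameters. Let's use a different parameter, say $s$, for the first line, so its equation becomes $\mathbf{x} = \mathbf{p} + s\mathbf{u}$. If the lines intersect, then we want to find an $\mathbf{x} = [x, y, z]$ that satisfies both equations simultaneously. That is, we want $\mathbf{x} = \mathbf{p} + s\mathbf{u} = \mathbf{q} + t\mathbf{v}$, or $s\mathbf{u} - t\mathbf{v} = \mathbf{q} - \mathbf{p}$. Substituting the given $\mathbf{p}$, $\mathbf{q}$, $\mathbf{u}$, and $\mathbf{v}$, we obtain the equations
$$\begin{aligned} s - 3t &= -1\\ s + t &= 2\\ s + t &= 2 \end{aligned}$$
whose solution is easily found to be $s = \frac{5}{4}$, $t = \frac{3}{4}$. The point of intersection is therefore
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix} + \frac{5}{4}\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 9/4\\ 5/4\\ 1/4 \end{bmatrix}$$
See Figure 2.3. (Check that substituting $t = \frac{3}{4}$ in the other equation gives the same point.)

Figure 2.3: Two intersecting lines

Remark. In $\mathbb{R}^3$, it is possible for two lines to intersect in a point, to be parallel, or to do neither. Nonparallel lines that do not intersect are called skew lines.

Homogeneous Systems

We have seen that every system of linear equations has either no solution, a unique solution, or infinitely many solutions. However, there is one type of system that always has at least one solution.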
Definition. A system of linear equations is called homogeneous if the constant term in each equation is zero.

In other words, a homogeneous system has an augmented matrix of the form $[A \mid \mathbf{0}]$. The following system is homogeneous:
$$\begin{aligned} 2x + 3y - z &= 0\\ -x + 5y + 2z &= 0 \end{aligned}$$

Since a homogeneous system cannot have no solution (forgive the double negative!), it will have either a unique solution (namely, the zero, or trivial, solution) or infinitely many solutions. The next theorem says that the latter case must occur if the number of variables is greater than the number of equations.

Theorem 2.3. If $[A \mid \mathbf{0}]$ is a homogeneous system of $m$ linear equations with $n$ variables, where $m < n$, then the system has infinitely many solutions.

Proof. Since the system has at least the zero solution, it is consistent. Also, $\operatorname{rank}(A) \le m$ (why?). By the Rank Theorem, we have
$$\text{number of free variables} = n - \operatorname{rank}(A) \ge n - m > 0$$
So there is at least one free variable and, hence, there are infinitely many solutions.

Note. Theorem 2.3 says nothing about the case where $m \ge n$. Exercise 44 asks you to give examples to show that, in this case, there can be either a unique solution or infinitely many solutions.

Linear Systems over $\mathbb{Z}_p$

$\mathbb{R}$ and $\mathbb{Z}_p$ are examples of fields. The set of rational numbers $\mathbb{Q}$ and the set of complex numbers $\mathbb{C}$ are other examples. Fields are covered in detail in courses in abstract algebra.

Thus far, all of the linear systems we have encountered have involved real numbers, and the solutions have accordingly been vectors in some $\mathbb{R}^n$. We have seen how other number systems arise, notably $\mathbb{Z}_p$. When $p$ is a prime number, $\mathbb{Z}_p$ behaves in many respects like $\mathbb{R}$; in particular, we can add, subtract, multiply, and divide (by nonzero numbers).
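Because every nonzero element of $\mathbb{Z}_p$ has a multiplicative inverse when $p$ is prime, the Gauss-Jordan steps carry over unchanged, provided every result is reduced modulo $p$. The sketch below is illustrative only: `rref_mod_p` is a hypothetical name, and it assumes Python 3.8 or later, where `pow(a, -1, p)` returns the inverse of `a` modulo `p`. It does not check that `p` is prime; if it is not, a pivot without an inverse makes `pow` raise `ValueError` (compare Exercise 60).

```python
def rref_mod_p(rows, p):
    """Gauss-Jordan elimination over Z_p (p assumed prime).

    Returns (R, pivot_cols) with every entry in 0..p-1.
    """
    m = [[x % p for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_cols, r = [], 0
    for c in range(n_cols):
        piv = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        inv = pow(m[r][c], -1, p)        # exists because p is prime
        m[r] = [(inv * x) % p for x in m[r]]   # leading 1 by multiplication
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                k = m[i][c]
                # R_i + (p - k) R_r: subtraction done with addition mod p
                m[i] = [(a + (p - k) * b) % p for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivot_cols
```

For the system of Example 2.16, `rref_mod_p([[1, 2, 1, 0], [1, 0, 1, 2], [0, 1, 2, 1]], 3)` returns a reduced matrix whose last column reads 1, 2, 1, that is, $x_1 = 1$, $x_2 = 2$, $x_3 = 1$. Note that, as in the text, only addition and multiplication are used.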
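Thus, we can also solve systems of linear equations when the variables and coefficients belong to some $\mathbb{Z}_p$. In such instances, we refer to solving a system over $\mathbb{Z}_p$.

For example, the linear equation $x_1 + x_2 + x_3 = 1$, when viewed as an equation over $\mathbb{Z}_2$, has exactly four solutions:
$$\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix},\quad \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix},\quad \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix},\quad\text{and}\quad \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$
(where the last solution arises because $1 + 1 + 1 = 1$ in $\mathbb{Z}_2$).

When we view the equation $x_1 + x_2 + x_3 = 1$ over $\mathbb{Z}_3$, the solutions are
$$\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}, \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}, \begin{bmatrix} 2\\ 2\\ 0 \end{bmatrix}, \begin{bmatrix} 2\\ 0\\ 2 \end{bmatrix}, \begin{bmatrix} 0\\ 2\\ 2 \end{bmatrix}, \begin{bmatrix} 1\\ 1\\ 2 \end{bmatrix}, \begin{bmatrix} 1\\ 2\\ 1 \end{bmatrix}, \begin{bmatrix} 2\\ 1\\ 1 \end{bmatrix}$$
(Check these.)

But we need not use trial-and-error methods; row reduction of augmented matrices works just as well over $\mathbb{Z}_p$ as over $\mathbb{R}$.

Example 2.16. Solve the following system of linear equations over $\mathbb{Z}_3$:
$$\begin{aligned} x_1 + 2x_2 + x_3 &= 0\\ x_1 + x_3 &= 2\\ x_2 + 2x_3 &= 1 \end{aligned}$$

Solution. The first thing to note in examples like this one is that subtraction and division are not needed; we can accomplish the same effects using addition and multiplication. (This, however, requires that we be working over $\mathbb{Z}_p$, where $p$ is a prime; see Exercise 60 at the end of this section and Exercise 57 in Section 1.1.) We row reduce the augmented matrix of the system, using calculations modulo 3.
$$\left[\begin{array}{rrr|r} 1 & 2 & 1 & 0\\ 1 & 0 & 1 & 2\\ 0 & 1 & 2 & 1 \end{array}\right] \xrightarrow{R_2 + 2R_1} \left[\begin{array}{rrr|r} 1 & 2 & 1 & 0\\ 0 & 1 & 0 & 2\\ 0 & 1 & 2 & 1 \end{array}\right] \xrightarrow[R_3 + 2R_2]{R_1 + R_2} \left[\begin{array}{rrr|r} 1 & 0 & 1 & 2\\ 0 & 1 & 0 & 2\\ 0 & 0 & 2 & 2 \end{array}\right] \xrightarrow{2R_3} \left[\begin{array}{rrr|r} 1 & 0 & 1 & 2\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 1 \end{array}\right] \xrightarrow{R_1 + 2R_3} \left[\begin{array}{rrr|r} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 1 \end{array}\right]$$
Thus, the solution is $x_1 = 1$, $x_2 = 2$, $x_3 = 1$.

Example 2.17. Solve the following system of linear equations over $\mathbb{Z}_2$:
$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 1\\ x_1 + x_2 &= 1\\ x_2 + x_3 &= 0\\ x_3 + x_4 &= 0\\ x_1 + x_4 &= 1 \end{aligned}$$

Solution. The row reduction proceeds as follows:
$$\left[\begin{array}{rrrr|r} 1&1&1&1&1\\ 1&1&0&0&1\\ 0&1&1&0&0\\ 0&0&1&1&0\\ 1&0&0&1&1 \end{array}\right] \xrightarrow[R_5 + R_1]{R_2 + R_1} \left[\begin{array}{rrrr|r} 1&1&1&1&1\\ 0&0&1&1&0\\ 0&1&1&0&0\\ 0&0&1&1&0\\ 0&1&1&0&0 \end{array}\right] \xrightarrow{R_2 \leftrightarrow R_3} \left[\begin{array}{rrrr|r} 1&1&1&1&1\\ 0&1&1&0&0\\ 0&0&1&1&0\\ 0&0&1&1&0\\ 0&1&1&0&0 \end{array}\right]$$
$$\xrightarrow[R_5 + R_2]{R_1 + R_2} \left[\begin{array}{rrrr|r} 1&0&0&1&1\\ 0&1&1&0&0\\ 0&0&1&1&0\\ 0&0&1&1&0\\ 0&0&0&0&0 \end{array}\right] \xrightarrow[R_4 + R_3]{R_2 + R_3} \left[\begin{array}{rrrr|r} 1&0&0&1&1\\ 0&1&0&1&0\\ 0&0&1&1&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{array}\right]$$
Therefore, we have
$$\begin{aligned} x_1 + x_4 &= 1\\ x_2 + x_4 &= 0\\ x_3 + x_4 &= 0 \end{aligned}$$
Setting the free variable $x_4 = t$ yields
$$\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix} = \begin{bmatrix} 1 + t\\ t\\ t\\ t \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 0\\ 0 \end{bmatrix} + t\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix}$$
Since $t$ can take on the two values 0 and 1,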
there are exactly two solutions:
$$\begin{bmatrix} 1\\ 0\\ 0\\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0\\ 1\\ 1\\ 1 \end{bmatrix}$$

Remark. For linear systems over $\mathbb{Z}_p$, there can never be infinitely many solutions. (Why not?) Rather, when there is more than one solution, the number of solutions is finite and is a function of the number of free variables and $p$. (See Exercise 59.)

Exercises 2.2

In Exercises 1–8, determine whether the given matrix is in row echelon form. If it is, state whether it is also in reduced row echelon form.
1.–8. […]

In Exercises 9–14, use elementary row operations to reduce the given matrix to (a) row echelon form and (b) reduced row echelon form.
9.–14. […]

15. Reverse the elementary row operations used in Example 2.9 to show that we can convert […] into […].
16. In general, what is the elementary row operation that "undoes" each of the three elementary row operations $R_i \leftrightarrow R_j$, $kR_i$, and $R_i + kR_j$?

In Exercises 17 and 18, show that the given matrices are row equivalent and find a sequence of elementary row operations that will convert $A$ into $B$.
17. $A = […]$, $B = […]$
18. $A = […]$, $B = […]$
19. What is wrong with the following "proof" that every matrix with at least two rows is row equivalent to a matrix with a zero row? Perform $R_2 + R_1$ and $R_1 + R_2$. Now rows 1 and 2 are identical. Now perform $R_2 - R_1$ to obtain a row of zeros in the second row.
20. What is the net effect of performing the following sequence of elementary row operations on a matrix (with at least two rows)?
$$R_2 + R_1, \quad R_1 - R_2, \quad R_2 + R_1, \quad -R_1$$
21. Students frequently perform the following type of calculation to introduce a zero into a matrix: […] However, $3R_2 - 2R_1$ is not an elementary row operation. Why not? Show how to achieve the same result using elementary row operations.
22. Consider the matrix $A = […]$. Show that any of the three types of elementary row operations can be used to create a leading 1 at the top of the first column. Which do you prefer and why?
23. What is the rank of each of the matrices in Exercises 1–8?
24. What are the possible reduced row echelon forms of $3 \times 3$ matrices?

In Exercises 25–34, solve the given system of equations using either Gaussian or Gauss-Jordan elimination.
25. $x_1 + 2x_2 - 3x_3 = 9$, $2x_1 - x_2 + x_3 = 0$, $4x_1 - x_2 + x_3 = 4$
26. $x - y + z = 0$, $-x + 3y + z = 5$, $3x + y + 7z = 2$
27. $x_1 - 3x_2 - 2x_3 = 0$, $-x_1 + 2x_2 + x_3 = 0$, $2x_1 + 4x_2 + 6x_3 = 0$
28. $2w + 3x - y + 4z = 1$, $3w - x + z = 1$, $3w - 4x + y - z = 2$
29. $2r + s = 3$, $4r + s = 7$, $2r + 5s = -1$
30. $-x_1 + 3x_2 - 2x_3 + 4x_4 = 0$, $2x_1 - 6x_2 + x_3 - 2x_4 = -3$, $x_1 - 3x_2 + 4x_3 - 8x_4 = 2$
31. $\frac{1}{2}x_1 + x_2 - x_3 - 6x_4 = 2$, $\frac{1}{6}x_1 + \frac{1}{2}x_2 - 3x_4 + x_5 = -1$, $\frac{1}{3}x_1 - 2x_3 - 4x_5 = 8$
32. $\sqrt{2}x + y + 2z = 1$, $\sqrt{2}y - 3z = -\sqrt{2}$, $-y + \sqrt{2}z = 1$
33. $w + x + 2y + z = 1$, $w - x - y + z = 0$, $x + y = -1$, $w + x + z = 2$
34. $a + b + c + d = 4$, $a + 2b + 3c + 4d = 10$, $a + 3b + 6c + 10d = 20$, $a + 4b + 10c + 20d = 35$

In Exercises 35–38, determine by inspection (i.e., without performing any calculations) whether a linear system with the given augmented matrix has a unique solution, infinitely many solutions, or no solution. Justify your answers.
35.–38. […]

39. Show that if $ad - bc \ne 0$, then the system
$$\begin{aligned} ax + by &= r\\ cx + dy &= s \end{aligned}$$
has a unique solution.

In Exercises 40–43, for what value(s) of $k$, if any, will the systems have (a) no solution, (b) a unique solution, and (c) infinitely many solutions?
40. $kx + 2y = 3$, $2x - 4y = -6$
41. $x + ky = 1$, $kx + y = 1$
42. $x - 2y + 3z = 2$, $x + y + z = k$, $2x - y + 4z = k^2$
43. $x + y + kz = 1$, $x + ky + z = 1$, $kx + y + z = -2$
44. Give examples of homogeneous systems of $m$ linear equations in $n$ variables with $m = n$ and with $m > n$ that have (a) infinitely many solutions and (b) a unique solution.

In Exercises 45 and 46, find the line of intersection of the given planes.
45. $3x + 2y + z = -1$ and $2x - y + 4z = 5$
46. $4x + y + z = 0$ and $2x - y + 3z = 2$
47. (a) Give an example of three planes that have a common line of intersection (Figure 2.4).
(b) Give an example of three planes that intersect in pairs but have no common point of intersection (Figure 2.5).
(c) Give an example of three planes, exactly two of which are parallel (Figure 2.6).
(d) Give an example of three planes that intersect in a single point (Figure 2.7).

In Exercises 48 and 49, determine whether the lines $\mathbf{x} = \mathbf{p} + s\mathbf{u}$ and $\mathbf{x} = \mathbf{q} + t\mathbf{v}$ intersect and, if they do, find their point of intersection.
48. $\mathbf{p} = […]$, $\mathbf{q} = […]$, $\mathbf{u} = […]$, $\mathbf{v} = […]$
49. $\mathbf{p} = […]$, $\mathbf{q} = […]$, $\mathbf{u} = […]$, $\mathbf{v} = […]$
50. Let $\mathbf{p} = […]$, $\mathbf{u} = […]$, and $\mathbf{v} = […]$. Describe all points $Q = (a, b, c)$ such that the line through $Q$ with direction vector $\mathbf{v}$ intersects the line with equation $\mathbf{x} = \mathbf{p} + s\mathbf{u}$.
51. Recall that the cross product of two vectors $\mathbf{u}$ and $\mathbf{v}$ is a vector $\mathbf{u} \times \mathbf{v}$ that is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$. (See Exploration: The Cross Product in Chapter 1.) If $\mathbf{u} = [u_1, u_2, u_3]$ and $\mathbf{v} = [v_1, v_2, v_3]$, show that there are infinitely many vectors $\mathbf{x}$ that simultaneously satisfy $\mathbf{u} \cdot \mathbf{x} = 0$ and $\mathbf{v} \cdot \mathbf{x} = 0$ and that all are multiples of
$$\mathbf{u} \times \mathbf{v} = \begin{bmatrix} u_2v_3 - u_3v_2\\ u_3v_1 - u_1v_3\\ u_1v_2 - u_2v_1 \end{bmatrix}$$
52. Let $\mathbf{p} = […]$, $\mathbf{q} = […]$, $\mathbf{u} = […]$, and $\mathbf{v} = […]$. Show that the lines $\mathbf{x} = \mathbf{p} + s\mathbf{u}$ and $\mathbf{x} = \mathbf{q} + t\mathbf{v}$ are skew lines. Find vector equations of a pair of parallel planes, one containing each line.

In Exercises 53–58, solve the systems of linear equations over the indicated $\mathbb{Z}_p$.
53. $x + 2y = 1$, $x + y = 2$ over $\mathbb{Z}_3$
54. $x + y = 1$, $y + z = 0$, $x + z = 1$ over $\mathbb{Z}_2$
55. $x + y = 1$, $y + z = 0$, $x + z = 1$ over $\mathbb{Z}_3$
56. $3x + 2y = 1$, $x + 4y = 1$ over $\mathbb{Z}_5$
57. $3x + 2y = 1$, $x + 4y = 1$ over $\mathbb{Z}_7$
58. $x_1 + 4x_4 = 1$, $x_1 + 2x_2 + 4x_3 = 3$, $2x_1 + 2x_2 + x_4 = 1$, $x_1 + 3x_3 = 2$ over $\mathbb{Z}_5$
59. Prove the following corollary to the Rank Theorem: Let $A$ be an $m \times n$ matrix with entries in $\mathbb{Z}_p$. Any consistent system of linear equations with coefficient matrix $A$ has exactly $p^{\,n - \operatorname{rank}(A)}$ solutions over $\mathbb{Z}_p$.
60. When $p$ is not prime, extra care is needed in solving a linear system (or, indeed, any equation) over $\mathbb{Z}_p$. Using Gaussian elimination, solve the following system over $\mathbb{Z}_6$. What complications arise?
$$\begin{aligned} 2x + 3y &= 4\\ 4x + 3y &= 2 \end{aligned}$$

Writing Project: A History of Gaussian Elimination

As noted in the biographical sketch of Gauss in this section, Gauss did not actually "invent" the method known as Gaussian elimination. It was known in some form as early as the third century B.C. and appears in the mathematical writings of cultures throughout Europe and Asia. Write a report on the history of elimination methods for solving systems of linear equations. What role did Gauss actually play in this history, and why is his name attached to the method?
1. S. Athloen and R. McLaughlin, "Gauss-Jordan Reduction: A Brief History," American Mathematical Monthly 94 (1987), pp. 130–142.
2. Joseph F. Grcar, "Mathematicians of Gaussian Elimination," Notices of the AMS 58, no. 6 (2011), pp. 782–792. (Available online at http://www.ams.org/notices/201106/index.html)
3. Roger Hart, The Chinese Roots of Linear Algebra (Baltimore: Johns Hopkins University Press, 2011).
4. Victor J. Katz, A History of Mathematics: An Introduction, Third Edition (Reading, MA: Addison Wesley Longman, 2008).

Exploration: Lies My Computer Told Me

Computers and calculators store real numbers in floating-point form. For example, 2001 is stored as $0.2001 \times 10^4$, and $-0.00063$ is stored as $-0.63 \times 10^{-3}$.
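These storage conventions can be imitated with Python's standard `decimal` module, which lets us fix the number of significant digits and choose between rounding and truncation. The sketch below is only an illustration, not a model of any particular calculator: `to_sig_digits` and `solve_pair` are hypothetical helper names. `ROUND_HALF_EVEN` implements the "solitary 5" rule described next, and `ROUND_DOWN` truncates. `solve_pair` solves the system of Problem 1 exactly for a given coefficient $c$ on $y$, so it isolates the effect of rounding that single coefficient (Problem 2 additionally rounds every intermediate step).

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN, ROUND_DOWN
from fractions import Fraction

def to_sig_digits(x, d, rounding=ROUND_HALF_EVEN):
    """Keep d significant digits of x.

    ROUND_HALF_EVEN sends a solitary-5 tie to an even last digit;
    ROUND_DOWN truncates (simply drops the extra digits).
    """
    ctx = Context(prec=d, rounding=rounding)
    return ctx.plus(Decimal(str(x)))   # plus() applies the context's precision

def solve_pair(c):
    """Exact solution of x + y = 0, x + c*y = 1.

    Subtracting the first equation from the second gives (c - 1) y = 1.
    """
    y = Fraction(1) / (Fraction(c) - 1)
    return -y, y
```

With these, `to_sig_digits("3.141592654", 5)` gives `3.1416` while truncation gives `3.1415`, and rounding $801/800 = 1.00125$ to five significant digits gives `1.0012`. Solving exactly with $c = 801/800$ gives $y = 800$, but with $c = 1.0012$ it gives $y = 2500/3 \approx 833.3$, a large change caused by an error of only 0.00005 in one coefficient.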
In general, the floating-point form of a number is $\pm M \times 10^k$, where $k$ is an integer and the mantissa $M$ is a (decimal) real number that satisfies $0.1 \le M < 1$. The maximum number of decimal places that can be stored in the mantissa depends on the computer, calculator, or computer algebra system. If the maximum number of decimal places that can be stored is $d$, we say that there are $d$ significant digits. Many calculators store 8 or 12 significant digits; computers can store more but still are subject to a limit. Any digits that are not stored are either omitted (in which case we say that the number has been truncated) or used to round the number to $d$ significant digits. For example, $\pi = 3.141592654$, and its floating-point form is $0.3141592654 \times 10^1$. In a computer that truncates to five significant digits, $\pi$ would be stored as $0.31415 \times 10^1$ (and displayed as 3.1415); a computer that rounds to five significant digits would store $\pi$ as $0.31416 \times 10^1$ (and display 3.1416). When the dropped digit is a solitary 5, the last remaining digit is rounded so that it becomes even. Thus, rounded to two significant digits, 0.735 becomes 0.74 while 0.725 becomes 0.72.

Whenever truncation or rounding occurs, a roundoff error is introduced, which can have a dramatic effect on the calculations. The more operations that are performed, the more the error accumulates. Sometimes, unfortunately, there is nothing we can do about this. This exploration illustrates this phenomenon with very simple systems of linear equations.

1. Solve the following system of linear equations exactly (that is, work with rational numbers throughout the calculations).
$$\begin{aligned} x + y &= 0\\ x + \tfrac{801}{800}y &= 1 \end{aligned}$$
2. As a decimal, $\frac{801}{800} = 1.00125$, so, rounded to five significant digits, the system becomes
$$\begin{aligned} x + y &= 0\\ x + 1.0012y &= 1 \end{aligned}$$
Using your calculator or CAS, solve this system, rounding the result of every calculation to five significant digits.
3. Solve the system two more times, rounding first to four significant digits and then to three significant digits. What happens?
4. Clearly, a very small roundoff error (less than or equal to 0.00125) can result in very large errors in the solution. Explain why geometrically. (Think about the graphs of the various linear systems you solved in Problems 1–3.)

Systems such as the one you just worked with are called ill-conditioned. They are extremely sensitive to roundoff errors, and there is not much we can do about it. We will encounter ill-conditioned systems again in Chapters 3 and 7. Here is another example to experiment with:
$$\begin{aligned} 4.552x + 7.083y &= 1.931\\ 1.731x + 2.693y &= 2.001 \end{aligned}$$
Play around with various numbers of significant digits to see what happens, starting with eight significant digits (if you can).

Exploration: Partial Pivoting

In Exploration: Lies My Computer Told Me, we saw that ill-conditioned linear systems can cause trouble when roundoff error occurs. In this exploration, you will discover another way in which linear systems are sensitive to roundoff error and see that very small changes in the coefficients can lead to huge inaccuracies in the solution. Fortunately, there is something that can be done to minimize or even eliminate this problem (unlike the problem with ill-conditioned systems).

1. (a) Solve the single linear equation $0.00021x = 1$ for $x$.
(b) Suppose your calculator can carry only four decimal places. The equation will be rounded to $0.0002x = 1$. Solve this equation.
The difference between the answers in parts (a) and (b) can be thought of as the effect of an error of 0.00001 on the solution of the given equation.
2. Now extend this idea to a system of linear equations.
(a) With Gaussian elimination, solve the linear system
$$\begin{aligned} 0.400x + 99.6y &= 100\\ 75.3x - 45.3y &= 30.0 \end{aligned}$$
using three significant digits. Begin by pivoting on 0.400 and take each calculation to three significant digits.
You should obtain the "solution" $x = -1.00$, $y = 1.01$. Check that the actual solution is $x = 1.00$, $y = 1.00$. This is a huge error: 200% in the $x$ value! Can you discover what caused it?
(b) Solve the system in part (a) again, this time interchanging the two equations (or, equivalently, the two rows of its augmented matrix) and pivoting on 75.3. Again, take each calculation to three significant digits. What is the solution this time?

The moral of the story is that, when using Gaussian or Gauss-Jordan elimination to obtain a numerical solution to a system of linear equations (i.e., a decimal approximation), you should choose the pivots with care. Specifically, at each pivoting step, choose from among all possible pivots in a column the entry with the largest absolute value. Use row interchanges to bring this element into the correct position and use it to create zeros where needed in the column. This strategy is known as partial pivoting.

3. Solve the following systems by Gaussian elimination, first without and then with partial pivoting. Take each calculation to three significant digits. (The exact solutions are given.)
(a)
$$\begin{aligned} 0.001x + 0.995y &= 1.00\\ -10.2x + 1.00y &= -50.0 \end{aligned}$$
Exact solution: $\begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} 5.00\\ 1.00 \end{bmatrix}$
(b)
$$\begin{aligned} 10x - 7y &= 7\\ -3x + 2.09y + 6z &= 3.91\\ 5x - y + 5z &= 6 \end{aligned}$$
Exact solution: $\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 0.00\\ -1.00\\ 1.00 \end{bmatrix}$

Exploration: Counting Operations: An Introduction to the Analysis of Algorithms

Gaussian and Gauss-Jordan elimination are examples of algorithms: systematic procedures designed to implement a particular task, in this case the row reduction of the augmented matrix of a system of linear equations. Algorithms are particularly well suited to computer implementation, but not all algorithms are created equal. Apart from the speed, memory, and other attributes of the computer system on which they are running, some algorithms are faster than others.
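To make "faster" concrete under the counting rules set out below (only multiplications and divisions count, no row interchanges are needed, and we count the worst case), here is a Python sketch that performs Gaussian elimination and tallies those operations. It is illustrative only: both function names are hypothetical, and `predicted_ops` simply encodes the formula for $T(n)$ that Problem 3(d) asks you to derive, so the sketch lets you check that formula numerically rather than prove it.

```python
from fractions import Fraction

def gaussian_elimination_op_count(aug):
    """Solve an n x (n+1) augmented system by Gaussian elimination with
    back substitution, counting multiplications and divisions.

    Assumes no row interchanges are needed (every pivot is nonzero).
    Returns (solution, ops).
    """
    m = [[Fraction(x) for x in row] for row in aug]
    n = len(m)
    ops = 0
    for k in range(n):
        pivot = m[k][k]
        # leading 1: divide the entries to the right of the pivot
        for j in range(k + 1, n + 1):
            m[k][j] /= pivot
            ops += 1
        m[k][k] = Fraction(1)  # the 1 itself is not computed, so not counted
        # zeros below the leading 1 (counted even if a factor is 0: worst case)
        for i in range(k + 1, n):
            factor = m[i][k]
            for j in range(k + 1, n + 1):
                m[i][j] -= factor * m[k][j]
                ops += 1
            m[i][k] = Fraction(0)  # the zero itself is not counted
    # back substitution on the row echelon form with leading 1s
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n]
        for j in range(i + 1, n):
            s -= m[i][j] * x[j]
            ops += 1
        x[i] = s
    return x, ops

def predicted_ops(n):
    """T(n) = n^3/3 + n^2 - n/3, written over a common denominator."""
    return (n**3 + 3 * n**2 - n) // 3
```

For example, running it on the row echelon form in Problem 1 below, `[[1, 2, 3, 4], [0, 1, -1, 0], [0, 0, 1, 1]]`, returns the solution $[-1, 1, 1]$ together with 17 operations, which agrees with $T(3) = 9 + 9 - 1 = 17$.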
One measure of the so-called complexity of an algorithm (a measure of its efficiency, or ability to perform its task in a reasonable number of steps) is the number of basic operations it performs as a function of the number of variables that are input. Let's examine this proposition in the case of the two algorithms we have for solving a linear system: Gaussian and Gauss-Jordan elimination. For our purposes, the basic operations are multiplication and division; we will assume that all other operations are performed much more rapidly and can be ignored. (This is a reasonable assumption, but we will not attempt to justify it.) We will consider only systems of equations with square coefficient matrices, so, if the coefficient matrix is $n \times n$, the number of input variables is $n$. Thus, our task is to find the number of operations performed by Gaussian and Gauss-Jordan elimination as a function of $n$. Furthermore, we will not worry about special cases that may arise, but rather establish the worst case that can arise: when the algorithm takes as long as possible. Since this will give us an estimate of the time it will take a computer to perform the algorithm (if we know how long it takes a computer to perform a single operation), we will denote the number of operations performed by an algorithm by $T(n)$. We will typically be interested in $T(n)$ for large values of $n$, so comparing this function for different algorithms will allow us to determine which will take less time to execute.

Abu Ja'far Muhammad ibn Musa al-Khwarizmi (c. 780–850) was a Persian mathematician whose book Hisab al-jabr w'al muqabalah (c. 825) described the use of Hindu-Arabic numerals and the rules of basic arithmetic. The second word of the book's title gives rise to the English word algebra, and the word algorithm is derived from al-Khwarizmi's name.

1. Consider the augmented matrix
$$[A \mid \mathbf{b}] = \left[\begin{array}{rrr|r} 2 & 4 & 6 & 8\\ 3 & 9 & 6 & 12\\ -1 & 1 & -1 & 1 \end{array}\right]$$
Count the number of operations required to bring $[A \mid \mathbf{b}]$ to the row echelon form
$$\left[\begin{array}{rrr|r} 1 & 2 & 3 & 4\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & 1 \end{array}\right]$$
(By "operation" we mean a multiplication or a division.) Now count the number of operations needed to complete the back substitution phase of Gaussian elimination. Record the total number of operations.
2. Count the number of operations needed to perform Gauss-Jordan elimination, that is, to reduce $[A \mid \mathbf{b}]$ to its reduced row echelon form (where the zeros are introduced into each column immediately after the leading 1 is created in that column). What do your answers suggest about the relative efficiency of the two algorithms?

We will now attempt to analyze the algorithms in a general, systematic way. Suppose the augmented matrix $[A \mid \mathbf{b}]$ arises from a linear system with $n$ equations and $n$ variables; thus, $[A \mid \mathbf{b}]$ is $n \times (n + 1)$:
$$[A \mid \mathbf{b}] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1\\ a_{21} & a_{22} & \cdots & a_{2n} & b_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{array}\right]$$
We will assume that row interchanges are never needed, that is, that we can always create a leading 1 from a pivot by dividing by the pivot.

3. (a) Show that $n$ operations are needed to create the first leading 1:
$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1\\ a_{21} & a_{22} & \cdots & a_{2n} & b_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & * & \cdots & * & *\\ a_{21} & a_{22} & \cdots & a_{2n} & b_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{array}\right]$$
(Why don't we need to count an operation for the creation of the leading 1?) Now show that $n$ operations are needed to obtain the first zero in column 1:
$$\longrightarrow \left[\begin{array}{cccc|c} 1 & * & \cdots & * & *\\ 0 & * & \cdots & * & *\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{array}\right]$$
(Why don't we need to count an operation for the creation of the zero itself?) When the first column has been "swept out," we have the matrix
$$\left[\begin{array}{cccc|c} 1 & * & \cdots & * & *\\ 0 & * & \cdots & * & *\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & * & \cdots & * & * \end{array}\right]$$
Show that the total number of operations needed up to this point is $n + (n - 1)n$.
(b) Show that the total number of operations needed to reach the row echelon form
$$\left[\begin{array}{cccc|c} 1 & * & \cdots & * & *\\ 0 & 1 & \cdots & * & *\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & * \end{array}\right]$$
is
$$[n + (n - 1)n] + [(n - 1) + (n - 2)(n - 1)] + [(n - 2) + (n - 3)(n - 2)] + \cdots + [2 + 1 \cdot 2] + 1$$
which simplifies to
$$n^2 + (n - 1)^2 + \cdots + 2^2 + 1^2$$
(c) Show that the number of operations needed to complete the back substitution phase is
$$1 + 2 + \cdots + (n - 1)$$
(d) Using summation formulas for the sums in parts (b) and (c) (see Exercises 51 and 52 in Section 2.4 and Appendix B), show that the total number of operations, $T(n)$, performed by Gaussian elimination is
$$T(n) = \tfrac{1}{3}n^3 + n^2 - \tfrac{1}{3}n$$
Since every polynomial function is dominated by its leading term for large values of the variable, we see that $T(n) \approx \frac{1}{3}n^3$ for large values of $n$.
4. Show that Gauss-Jordan elimination has $T(n) \approx \frac{1}{2}n^3$ total operations if we create zeros above and below the leading 1s as we go. (This shows that, for large systems of linear equations, Gaussian elimination is faster than this version of Gauss-Jordan elimination.)

2.3 Spanning Sets and Linear Independence

The second of the three roads in our "trivium" is concerned with linear combinations of vectors. We have seen that we can view solving a system of linear equations as asking whether a certain vector is a linear combination of certain other vectors. We explore this idea in more detail in this section. It leads to some very important concepts, which we will encounter repeatedly in later chapters.

Spanning Sets of Vectors

We can now easily answer the question raised in Section 1.1: When is a given vector a linear combination of other given vectors?

Example 2.18.
(a) Is the vector $\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$ a linear combination of the vectors $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$?
(b) Is $\begin{bmatrix} 2\\ 3\\ 4 \end{bmatrix}$ a linear combination of the vectors $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$?

Solution. (a) We want to find scalars $x$ and $y$ such that
$$x\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix} + y\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$$
Expanding, we obtain the system
$$\begin{aligned} x - y &= 1\\ y &= 2\\ 3x - 3y &= 3 \end{aligned}$$
whose augmented matrix is
$$\left[\begin{array}{rr|r} 1 & -1 & 1\\ 0 & 1 & 2\\ 3 & -3 & 3 \end{array}\right]$$
(Observe that the columns of the augmented matrix are just the given vectors; notice the order of the vectors, in particular, which vector is the constant vector.)
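The reduced row echelon form of this matrix is
$$\left[\begin{array}{rr|r} 1 & 0 & 3\\ 0 & 1 & 2\\ 0 & 0 & 0 \end{array}\right]$$
(Verify this.) So the solution is $x = 3$, $y = 2$, and the corresponding linear combination is
$$3\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix} + 2\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$$
(b) Utilizing our observation in part (a), we obtain a linear system whose augmented matrix is
$$\left[\begin{array}{rr|r} 1 & -1 & 2\\ 0 & 1 & 3\\ 3 & -3 & 4 \end{array}\right]$$
which reduces to
$$\left[\begin{array}{rr|r} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right]$$
revealing that the system has no solution. Thus, in this case, $\begin{bmatrix} 2\\ 3\\ 4 \end{bmatrix}$ is not a linear combination of $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$.

The notion of a spanning set is intimately connected with the solution of linear systems. Look back at Example 2.18. There we saw that a system with augmented matrix $[A \mid \mathbf{b}]$ has a solution precisely when $\mathbf{b}$ is a linear combination of the columns of $A$. This is a general fact, summarized in the next theorem.

Theorem 2.4. A system of linear equations with augmented matrix $[A \mid \mathbf{b}]$ is consistent if and only if $\mathbf{b}$ is a linear combination of the columns of $A$.

Let's revisit Example 2.4, interpreting it in light of Theorem 2.4.
(a) The system
$$\begin{aligned} x - y &= 1\\ x + y &= 3 \end{aligned}$$
has the unique solution $x = 2$, $y = 1$. Thus,
$$2\begin{bmatrix} 1\\ 1 \end{bmatrix} + \begin{bmatrix} -1\\ 1 \end{bmatrix} = \begin{bmatrix} 1\\ 3 \end{bmatrix}$$
See Figure 2.8(a).
(b) The system
$$\begin{aligned} x - y &= 2\\ 2x - 2y &= 4 \end{aligned}$$
has infinitely many solutions of the form $x = 2 + t$, $y = t$. This implies that
$$(2 + t)\begin{bmatrix} 1\\ 2 \end{bmatrix} + t\begin{bmatrix} -1\\ -2 \end{bmatrix} = \begin{bmatrix} 2\\ 4 \end{bmatrix}$$
for all values of $t$. Geometrically, the vectors $\begin{bmatrix} 1\\ 2 \end{bmatrix}$, $\begin{bmatrix} -1\\ -2 \end{bmatrix}$, and $\begin{bmatrix} 2\\ 4 \end{bmatrix}$ are all parallel and so all lie along the same line through the origin [see Figure 2.8(b)].
(c) The system
$$\begin{aligned} x - y &= 1\\ x - y &= 3 \end{aligned}$$
has no solutions, so there are no values of $x$ and $y$ that satisfy
$$x\begin{bmatrix} 1\\ 1 \end{bmatrix} + y\begin{bmatrix} -1\\ -1 \end{bmatrix} = \begin{bmatrix} 1\\ 3 \end{bmatrix}$$
In this case, $\begin{bmatrix} 1\\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1\\ -1 \end{bmatrix}$ are parallel, but $\begin{bmatrix} 1\\ 3 \end{bmatrix}$ does not lie along the same line through the origin [see Figure 2.8(c)].

Figure 2.8: panels (a), (b), and (c)

We will often be interested in the collection of all linear combinations of a given set of vectors.

Definition. If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a set of vectors in $\mathbb{R}^n$, then the set of all linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ is called the span of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ and is denoted by $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k)$ or $\operatorname{span}(S)$. If $\operatorname{span}(S) = \mathbb{R}^n$, then $S$ is called a spanning set for $\mathbb{R}^n$.

Example 2.19. Show that $\mathbb{R}^2 = \operatorname{span}\left(\begin{bmatrix} 2\\ -1 \end{bmatrix}, \begin{bmatrix} 1\\ 3 \end{bmatrix}\right)$.

Solution. We need to show that an arbitrary vector $\begin{bmatrix} a\\ b \end{bmatrix}$ can be written as a linear combination of $\begin{bmatrix} 2\\ -1 \end{bmatrix}$ and $\begin{bmatrix} 1\\ 3 \end{bmatrix}$; that is, we must show that the equation
$$x\begin{bmatrix} 2\\ -1 \end{bmatrix} + y\begin{bmatrix} 1\\ 3 \end{bmatrix} = \begin{bmatrix} a\\ b \end{bmatrix}$$
can always be solved for $x$ and $y$ (in terms of $a$ and $b$), regardless of the values of $a$ and $b$.

The augmented matrix is $\left[\begin{array}{rr|c} 2 & 1 & a\\ -1 & 3 & b \end{array}\right]$, and row reduction produces
$$\left[\begin{array}{rr|c} 2 & 1 & a\\ -1 & 3 & b \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_2} \left[\begin{array}{rr|c} -1 & 3 & b\\ 2 & 1 & a \end{array}\right] \xrightarrow{R_2 + 2R_1} \left[\begin{array}{rr|c} -1 & 3 & b\\ 0 & 7 & a + 2b \end{array}\right]$$
at which point it is clear that the system has a (unique) solution. (Why?) If we continue, we obtain
$$\xrightarrow{\frac{1}{7}R_2} \left[\begin{array}{rr|c} -1 & 3 & b\\ 0 & 1 & (a + 2b)/7 \end{array}\right] \xrightarrow{R_1 - 3R_2} \left[\begin{array}{rr|c} -1 & 0 & (b - 3a)/7\\ 0 & 1 & (a + 2b)/7 \end{array}\right] \xrightarrow{-R_1} \left[\begin{array}{rr|c} 1 & 0 & (3a - b)/7\\ 0 & 1 & (a + 2b)/7 \end{array}\right]$$
from which we see that $x = (3a - b)/7$ and $y = (a + 2b)/7$. Thus, for any choice of $a$ and $b$, we have
$$\left(\frac{3a - b}{7}\right)\begin{bmatrix} 2\\ -1 \end{bmatrix} + \left(\frac{a + 2b}{7}\right)\begin{bmatrix} 1\\ 3 \end{bmatrix} = \begin{bmatrix} a\\ b \end{bmatrix}$$
(Check this.)

Remark. It is also true that $\mathbb{R}^2$ is spanned by $\begin{bmatrix} 2\\ -1 \end{bmatrix}$, $\begin{bmatrix} 1\\ 3 \end{bmatrix}$, and a third vector $\mathbf{w}$ in $\mathbb{R}^2$: if, given $\begin{bmatrix} a\\ b \end{bmatrix}$, we can find $x$ and $y$ such that $x\begin{bmatrix} 2\\ -1 \end{bmatrix} + y\begin{bmatrix} 1\\ 3 \end{bmatrix} = \begin{bmatrix} a\\ b \end{bmatrix}$, then we also have $x\begin{bmatrix} 2\\ -1 \end{bmatrix} + y\begin{bmatrix} 1\\ 3 \end{bmatrix} + 0\mathbf{w} = \begin{bmatrix} a\\ b \end{bmatrix}$. In fact, any set of vectors that contains a spanning set for $\mathbb{R}^2$ will also be a spanning set for $\mathbb{R}^2$ (see Exercise 20).

The next example is an important (easy) case of a spanning set. We will encounter versions of this example many times.

Example 2.20. Let $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ be the standard unit vectors in $\mathbb{R}^3$. Then for any vector $\begin{bmatrix} x\\ y\\ z \end{bmatrix}$, we have
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = x\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} + y\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} + z\begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3$$
Thus, $\mathbb{R}^3 = \operatorname{span}(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$. You should have no difficulty seeing that, in general, $\mathbb{R}^n = \operatorname{span}(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$.

When the span of a set of vectors in $\mathbb{R}^n$ is not all of $\mathbb{R}^n$, it is reasonable to ask for a description of the vectors' span.

Example 2.21. Find the span of $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$.

Solution. Thinking geometrically, we can see that the set of all linear combinations of $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$ is just the plane through the origin with $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$ as direction vectors (Figure 2.9).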
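The vector equation of this plane is
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = s\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix} + t\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$$
which is just another way of saying that $\begin{bmatrix} x\\ y\\ z \end{bmatrix}$ is in the span of $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$.

Figure 2.9: Two nonparallel vectors span a plane

Suppose we want to obtain the general equation of this plane. There are several ways to proceed. One is to use the fact that the equation $ax + by + cz = 0$ must be satisfied by the points $(1, 0, 3)$ and $(-1, 1, -3)$ determined by the direction vectors. Substitution then leads to a system of equations in $a$, $b$, and $c$. (See Exercise 17.) Another method is to use the system of equations arising from the vector equation:
$$\begin{aligned} s - t &= x\\ t &= y\\ 3s - 3t &= z \end{aligned}$$
If we row reduce the augmented matrix, we obtain
$$\left[\begin{array}{rr|c} 1 & -1 & x\\ 0 & 1 & y\\ 3 & -3 & z \end{array}\right] \xrightarrow{R_3 - 3R_1} \left[\begin{array}{rr|c} 1 & -1 & x\\ 0 & 1 & y\\ 0 & 0 & z - 3x \end{array}\right]$$
Now we know that this system is consistent, since $\begin{bmatrix} x\\ y\\ z \end{bmatrix}$ is in the span of $\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix}$ by assumption. So we must have $z - 3x = 0$ (or $3x - z = 0$, in more standard form), giving us the general equation we seek.

Remark. A normal vector to the plane in this example is also given by the cross product
$$\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix} \times \begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix} = \begin{bmatrix} -3\\ 0\\ 1 \end{bmatrix}$$

Linear Independence

In Example 2.18, we found that
$$3\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix} + 2\begin{bmatrix} -1\\ 1\\ -3 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$$
Let's abbreviate this equation as $3\mathbf{u} + 2\mathbf{v} = \mathbf{w}$. The vector $\mathbf{w}$ "depends" on $\mathbf{u}$ and $\mathbf{v}$ in the sense that it is a linear combination of them. We say that a set of vectors is linearly dependent if one of them can be written as a linear combination of the others. Note that we also have $\mathbf{u} = -\frac{2}{3}\mathbf{v} + \frac{1}{3}\mathbf{w}$ and $\mathbf{v} = -\frac{3}{2}\mathbf{u} + \frac{1}{2}\mathbf{w}$. To get around the question of which vector to express in terms of the rest, the formal definition is stated as follows:

Definition. A set of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ is linearly dependent if there are scalars $c_1, c_2, \ldots, c_k$, at least one of which is not zero, such that
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$$
A set of vectors that is not linearly dependent is called linearly independent.

Remarks
• In the definition of linear dependence, the requirement that at least one of the scalars $c_1, c_2, \ldots, c_k$ must be nonzero allows for the possibility that some may be zero.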
• In the example above, $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are linearly dependent, since $3\mathbf{u} + 2\mathbf{v} - \mathbf{w} = \mathbf{0}$ and, in fact, all of the scalars are nonzero. On the other hand, three vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ with $\mathbf{a} = 2\mathbf{b}$ are linearly dependent, since $1\mathbf{a} - 2\mathbf{b} + 0\mathbf{c} = \mathbf{0}$ and at least one (in fact, two) of the three scalars $1$, $-2$, and $0$ is nonzero. (Note that the actual dependence arises simply from the fact that the first two vectors are multiples.) (See Exercise 44.)
• Since $0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_k = \mathbf{0}$ for any vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$, linear dependence essentially says that the zero vector can be expressed as a nontrivial linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. Thus, linear independence means that the zero vector can be expressed as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ only in the trivial way: $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$ only if $c_1 = 0, c_2 = 0, \ldots, c_k = 0$.

The relationship between the intuitive notion of dependence and the formal definition is given in the next theorem. Happily, the two notions are equivalent!

Theorem 2.5
Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ in $\mathbb{R}^n$ are linearly dependent if and only if at least one of the vectors can be expressed as a linear combination of the others.

Proof
If one of the vectors, say $\mathbf{v}_1$, is a linear combination of the others, then there are scalars $c_2, \ldots, c_m$ such that $\mathbf{v}_1 = c_2\mathbf{v}_2 + \cdots + c_m\mathbf{v}_m$. Rearranging, we obtain $\mathbf{v}_1 - c_2\mathbf{v}_2 - \cdots - c_m\mathbf{v}_m = \mathbf{0}$, which implies that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent, since at least one of the scalars (namely, the coefficient 1 of $\mathbf{v}_1$) is nonzero.

Conversely, suppose that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent. Then there are scalars $c_1, c_2, \ldots, c_m$, not all zero, such that $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_m\mathbf{v}_m = \mathbf{0}$. Suppose $c_1 \neq 0$. Then
$$c_1\mathbf{v}_1 = -c_2\mathbf{v}_2 - \cdots - c_m\mathbf{v}_m$$
and we may multiply both sides by $1/c_1$ to obtain $\mathbf{v}_1$ as a linear combination of the other vectors:
$$\mathbf{v}_1 = -\left(\frac{c_2}{c_1}\right)\mathbf{v}_2 - \cdots - \left(\frac{c_m}{c_1}\right)\mathbf{v}_m$$

Note
It may appear as if we are cheating a bit in this proof.
After all, we cannot be sure that $\mathbf{v}_1$ is a linear combination of the other vectors, nor that $c_1$ is nonzero. However, the argument is analogous for some other vector $\mathbf{v}_i$ or for a different scalar $c_j$. Alternatively, we can just relabel things so that they work out as in the above proof. In a situation like this, a mathematician might begin by saying, "without loss of generality, we may assume that $\mathbf{v}_1$ is a linear combination of the other vectors" and then proceed as above.

Example 2.22
Any set of vectors containing the zero vector is linearly dependent. For if $\mathbf{0}, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are in $\mathbb{R}^n$, then we can find a nontrivial combination of the form $c_1\mathbf{0} + c_2\mathbf{v}_2 + \cdots + c_m\mathbf{v}_m = \mathbf{0}$ by setting $c_1 = 1$ and $c_2 = c_3 = \cdots = c_m = 0$.

Example 2.23
Determine whether the following sets of vectors are linearly independent:
(a) … and …
(b) $\begin{bmatrix}1\\1\\0\end{bmatrix}$, $\begin{bmatrix}0\\1\\1\end{bmatrix}$, and $\begin{bmatrix}1\\0\\1\end{bmatrix}$
(c) …, …, and …
(d) $\begin{bmatrix}1\\2\\0\end{bmatrix}$, $\begin{bmatrix}1\\1\\-1\end{bmatrix}$, and $\begin{bmatrix}1\\4\\2\end{bmatrix}$

Solution
In answering any question of this type, it is a good idea to see if you can determine by inspection whether one vector is a linear combination of the others. A little thought may save a lot of computation!

(a) The only way two vectors can be linearly dependent is if one is a multiple of the other. (Why?) These two vectors are clearly not multiples, so they are linearly independent.

(b) There is no obvious dependence relation here, so we try to find scalars $c_1$, $c_2$, $c_3$ such that
$$c_1\begin{bmatrix}1\\1\\0\end{bmatrix} + c_2\begin{bmatrix}0\\1\\1\end{bmatrix} + c_3\begin{bmatrix}1\\0\\1\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
The corresponding linear system is
$$\begin{aligned} c_1 + c_3 &= 0 \\ c_1 + c_2 &= 0 \\ c_2 + c_3 &= 0 \end{aligned}$$
and the augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{array}\right]$$
Once again, we make the fundamental observation that the columns of the coefficient matrix are just the vectors in question! The reduced row echelon form is
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right]$$
(check this), so $c_1 = 0$, $c_2 = 0$, $c_3 = 0$. Thus, the given vectors are linearly independent.

(c) A little reflection reveals that one of these vectors is a linear combination of the other two, so the three vectors are linearly dependent. [Set up a linear system as in part (b) to check this algebraically.]
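(d) Once again, we observe no obvious dependence, so we proceed directly to reduce a homogeneous linear system whose augmented matrix has as its columns the given vectors:
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 2 & 1 & 4 & 0 \\ 0 & -1 & 2 & 0 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & -1 & 2 & 0 \\ 0 & -1 & 2 & 0 \end{array}\right] \xrightarrow{R_3 - R_2} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & -1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] \xrightarrow[R_1 - R_2]{-R_2} \left[\begin{array}{ccc|c} 1 & 0 & 3 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
If we let the scalars be $c_1$, $c_2$, and $c_3$, we have
$$\begin{aligned} c_1 + 3c_3 &= 0 \\ c_2 - 2c_3 &= 0 \end{aligned}$$
from which we see that the system has infinitely many solutions. In particular, there must be a nonzero solution, so the given vectors are linearly dependent. If we continue, we can describe these solutions exactly: $c_1 = -3c_3$ and $c_2 = 2c_3$. Thus, for any nonzero value of $c_3$, we have the linear dependence relation
$$-3c_3\begin{bmatrix}1\\2\\0\end{bmatrix} + 2c_3\begin{bmatrix}1\\1\\-1\end{bmatrix} + c_3\begin{bmatrix}1\\4\\2\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
(Once again, check that this is correct.)

We summarize this procedure for testing for linear independence as a theorem.

Theorem 2.6
Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be (column) vectors in $\mathbb{R}^n$ and let $A$ be the $n \times m$ matrix $[\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_m]$ with these vectors as its columns. Then $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent if and only if the homogeneous linear system with augmented matrix $[A \mid \mathbf{0}]$ has a nontrivial solution.

Proof
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent if and only if there are scalars $c_1, c_2, \ldots, c_m$, not all zero, such that $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_m\mathbf{v}_m = \mathbf{0}$. By Theorem 2.4, this is equivalent to saying that the nonzero vector $\begin{bmatrix}c_1\\ \vdots\\ c_m\end{bmatrix}$ is a solution of the system whose augmented matrix is $[\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_m \mid \mathbf{0}]$.

Example 2.24
The standard unit vectors $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are linearly independent in $\mathbb{R}^3$, since the system with augmented matrix $[\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3 \mid \mathbf{0}]$ is already in the reduced row echelon form
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right]$$
and so clearly has only the trivial solution. In general, we see that $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ will be linearly independent in $\mathbb{R}^n$.

Performing elementary row operations on a matrix constructs linear combinations of the rows. We can use this fact to come up with another way to test vectors for linear independence.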
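The column test of Theorem 2.6 is mechanical enough to hand to a computer. The following is a minimal sketch in Python, not a reference implementation: it assumes exact rational arithmetic via the standard `fractions` module, and the helper names `rref` and `dependence_relation` are our own. It places the vectors in the columns of $A$, row reduces, and, if some column has no pivot, reads off one nontrivial solution of $[A \mid \mathbf{0}]$ by setting that free variable to 1.

```python
from fractions import Fraction


def rref(rows):
    """Reduce a matrix (list of lists) to reduced row echelon form using
    exact Fraction arithmetic.  Returns (reduced_matrix, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: c is a free column
        m[r], m[p] = m[p], m[r]           # interchange rows r and p
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]    # scale so the pivot becomes 1
        for i in range(n_rows):           # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivots


def dependence_relation(vectors):
    """Theorem 2.6 as code: make the vectors the columns of A and solve
    [A | 0].  Returns a nontrivial solution [c_1, ..., c_m] if one exists,
    or None when only the trivial solution exists (the vectors are
    linearly independent).  The zero column of [A | 0] never changes under
    row operations, so it is enough to reduce A itself."""
    n, k = len(vectors[0]), len(vectors)
    a = [[vectors[j][i] for j in range(k)] for i in range(n)]
    reduced, pivots = rref(a)
    free = [j for j in range(k) if j not in pivots]
    if not free:
        return None
    # Set one free variable to 1 (all other free variables to 0) and
    # read each leading variable off its row of the RREF.
    f = free[0]
    c = [Fraction(0)] * k
    c[f] = Fraction(1)
    for row, p in enumerate(pivots):
        c[p] = -reduced[row][f]
    return c
```

For the vectors of Example 2.23(d), `dependence_relation` returns values equal to `[-3, 2, 1]`, the dependence relation of that example with $c_3 = 1$; for Example 2.23(b) and for $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ it returns `None`. Fractions are used instead of floating-point numbers so that a pivot counts as zero only when it is exactly zero.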
Example 2.25
Consider the three vectors of Example 2.23(d) as row vectors:
$$[1, 2, 0], \quad [1, 1, -1], \quad \text{and} \quad [1, 4, 2]$$
We construct a matrix with these vectors as its rows and proceed to reduce it to echelon form. Each time a row changes, we denote the new row by adding a prime symbol:
$$\begin{bmatrix} 1 & 2 & 0 \\ 1 & 1 & -1 \\ 1 & 4 & 2 \end{bmatrix} \xrightarrow[R_3' = R_3 - R_1]{R_2' = R_2 - R_1} \begin{bmatrix} 1 & 2 & 0 \\ 0 & -1 & -1 \\ 0 & 2 & 2 \end{bmatrix} \xrightarrow{R_3'' = R_3' + 2R_2'} \begin{bmatrix} 1 & 2 & 0 \\ 0 & -1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$
From this we see that
$$\mathbf{0} = R_3'' = R_3' + 2R_2' = (R_3 - R_1) + 2(R_2 - R_1) = -3R_1 + 2R_2 + R_3$$
or, in terms of the original vectors,
$$-3[1, 2, 0] + 2[1, 1, -1] + [1, 4, 2] = [0, 0, 0]$$
[Notice that this approach corresponds to taking $c_3 = 1$ in the solution to Example 2.23(d).]

Thus, the rows of a matrix will be linearly dependent if elementary row operations can be used to create a zero row. We summarize this finding as follows:

Theorem 2.7
Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be (row) vectors in $\mathbb{R}^n$ and let $A$ be the $m \times n$ matrix $\begin{bmatrix}\mathbf{v}_1\\ \mathbf{v}_2\\ \vdots\\ \mathbf{v}_m\end{bmatrix}$ with these vectors as its rows. Then $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent if and only if $\operatorname{rank}(A) < m$.

Proof
Assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent. Then, by Theorem 2.5, at least one of the vectors can be written as a linear combination of the others. We relabel the vectors, if necessary, so that we can write $\mathbf{v}_m = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{m-1}\mathbf{v}_{m-1}$. Then the elementary row operations $R_m - c_1R_1, R_m - c_2R_2, \ldots, R_m - c_{m-1}R_{m-1}$ applied to $A$ will create a zero row in row $m$. Thus, $\operatorname{rank}(A) < m$.

Conversely, assume that $\operatorname{rank}(A) < m$. Then there is some sequence of row operations that will create a zero row. A successive substitution argument analogous to that used in Example 2.25 can be used to show that $\mathbf{0}$ is a nontrivial linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Thus, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent.

In some situations, we can deduce that a set of vectors is linearly dependent without doing any work. One such situation is when the zero vector is in the set (as in Example 2.22).
Another is when there are "too many" vectors to be independent. The following theorem summarizes this case. (We will see a sharper version of this result in Chapter 6.)

Theorem 2.8
Any set of $m$ vectors in $\mathbb{R}^n$ is linearly dependent if $m > n$.

Proof
Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be (column) vectors in $\mathbb{R}^n$ and let $A$ be the $n \times m$ matrix $[\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_m]$ with these vectors as its columns. By Theorem 2.6, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent if and only if the homogeneous linear system with augmented matrix $[A \mid \mathbf{0}]$ has a nontrivial solution. But, according to Theorem 2.3, this will always be the case if $A$ has more columns than rows; it is the case here, since the number of columns $m$ is greater than the number of rows $n$.

Example 2.26
The vectors …, …, and … are linearly dependent, since there cannot be more than two linearly independent vectors in $\mathbb{R}^2$. (Note that if we want to find the actual dependence relation among these three vectors, we must solve the homogeneous system whose coefficient matrix has the given vectors as columns. Do this!)

Exercises 2.3

In Exercises 1–6, determine if the vector $\mathbf{v}$ is a linear combination of the remaining vectors.
1.–6. …

In Exercises 7 and 8, determine if the vector $\mathbf{b}$ is in the span of the columns of the matrix $A$.
7.–8. …

9. Show that $\mathbb{R}^2 = \operatorname{span}(\ldots)$.
10. Show that $\mathbb{R}^2 = \operatorname{span}(\ldots)$.
11. Show that $\mathbb{R}^3 = \operatorname{span}(\ldots)$.
12. Show that $\mathbb{R}^3 = \operatorname{span}(\ldots)$.

In Exercises 13–16, describe the span of the given vectors (a) geometrically and (b) algebraically.
13.–16. …

17. The general equation of the plane that contains the points $(1, 0, 3)$, $(-1, 1, -3)$, and the origin is of the form $ax + by + cz = 0$. Solve for $a$, $b$, and $c$.
18. Prove that $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are all in $\operatorname{span}(\mathbf{u}, \mathbf{v}, \mathbf{w})$.
19. Prove that $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are all in $\operatorname{span}(\mathbf{u}, \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} + \mathbf{w})$.
20. (a) Prove that if $\mathbf{u}_1, \ldots, \mathbf{u}_m$ are vectors in $\mathbb{R}^n$, $S = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$, and $T = \{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$, then $\operatorname{span}(S) \subseteq \operatorname{span}(T)$. [Hint: Rephrase this question in terms of linear combinations.]
(b) Deduce that if $\mathbb{R}^n = \operatorname{span}(S)$, then $\mathbb{R}^n = \operatorname{span}(T)$ also.
21. (a) Suppose that vector $\mathbf{w}$ is a linear combination of vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ and that each $\mathbf{u}_i$ is a linear combination of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$. Prove that $\mathbf{w}$ is a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_m$ and therefore $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_k) \subseteq \operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_m)$.
(b) In part (a), suppose in addition that each $\mathbf{v}_j$ is also a linear combination of $\mathbf{u}_1, \ldots, \mathbf{u}_k$. Prove that $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_k) = \operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_m)$.
(c) Use the result of part (b) to prove that $\mathbb{R}^3 = \operatorname{span}(\ldots)$. [Hint: We know that $\mathbb{R}^3 = \operatorname{span}(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$.]

Use the method of Example 2.23 and Theorem 2.6 to determine if the sets of vectors in Exercises 22–31 are linearly independent. If, for any of these, the answer can be determined by inspection (i.e., without calculation), state why. For any sets that are linearly dependent, find a dependence relationship among the vectors.
22.–31. …

In Exercises 32–41, determine if the sets of vectors in the given exercise are linearly independent by converting the vectors to row vectors and using the method of Example 2.25 and Theorem 2.7. For any sets that are linearly dependent, find a dependence relationship among the vectors.
32. Exercise 22   33. Exercise 23   34. Exercise 24   35. Exercise 25   36. Exercise 26
37. Exercise 27   38. Exercise 28   39. Exercise 29   40. Exercise 30   41. Exercise 31

42. (a) If the columns of an $n \times n$ matrix $A$ are linearly independent as vectors in $\mathbb{R}^n$, what is the rank of $A$? Explain.
(b) If the rows of an $n \times n$ matrix $A$ are linearly independent as vectors in $\mathbb{R}^n$, what is the rank of $A$? Explain.
43. (a) If vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are linearly independent, will $\mathbf{u} + \mathbf{v}$, $\mathbf{v} + \mathbf{w}$, and $\mathbf{u} + \mathbf{w}$ also be linearly independent? Justify your answer.
(b) If vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are linearly independent, will $\mathbf{u} - \mathbf{v}$, $\mathbf{v} - \mathbf{w}$, and $\mathbf{u} - \mathbf{w}$ also be linearly independent? Justify your answer.
44. Prove that two vectors are linearly dependent if and only if one is a scalar multiple of the other. [Hint: Separately consider the case where one of the vectors is $\mathbf{0}$.]
45. Give a "row vector proof" of Theorem 2.8.
46. Prove that every subset of a linearly independent set is linearly independent.
47. Suppose that $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k, \mathbf{v}\}$ is a set of vectors in some $\mathbb{R}^n$ and that $\mathbf{v}$ is a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_k$. If $S' = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, prove that $\operatorname{span}(S) = \operatorname{span}(S')$. [Hint: Exercise 21(b) is helpful here.]
48. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be a linearly independent set of vectors in $\mathbb{R}^n$, and let $\mathbf{v}$ be a vector in $\mathbb{R}^n$. Suppose that $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$ with $c_1 \neq 0$. Prove that $\{\mathbf{v}, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly independent.

2.4 Applications

There are too many applications of systems of linear equations to do them justice in a single section. This section will introduce a few applications, to illustrate the diverse settings in which they arise.

Allocation of Resources

A great many applications of systems of linear equations involve allocating limited resources subject to a set of constraints.

Example 2.27
A biologist has placed three strains of bacteria (denoted I, II, and III) in a test tube, where they will feed on three different food sources (A, B, and C).
Each day 2300 units of A, 800 units of B, and 1500 units of C are placed in the test tube, and each bacterium consumes a certain number of units of each food per day, as shown in Table 2.2. How many bacteria of each strain can coexist in the test tube and consume all of the food?

Table 2.2
              Bacteria Strain I   Bacteria Strain II   Bacteria Strain III
Food A                2                   2                    4
Food B                1                   2                    0
Food C                1                   3                    1

Solution
Let $x_1$, $x_2$, and $x_3$ be the numbers of bacteria of strains I, II, and III, respectively. Since each of the $x_1$ bacteria of strain I consumes 2 units of A per day, strain I consumes a total of $2x_1$ units per day. Similarly, strains II and III consume a total of $2x_2$ and $4x_3$ units of food A daily. Since we want to use up all of the 2300 units of A, we have the equation
$$2x_1 + 2x_2 + 4x_3 = 2300$$
Likewise, we obtain equations corresponding to the consumption of B and C:
$$\begin{aligned} x_1 + 2x_2 &= 800 \\ x_1 + 3x_2 + x_3 &= 1500 \end{aligned}$$
Thus, we have a system of three linear equations in three variables. Row reduction of the corresponding augmented matrix gives
$$\left[\begin{array}{ccc|c} 2 & 2 & 4 & 2300 \\ 1 & 2 & 0 & 800 \\ 1 & 3 & 1 & 1500 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 100 \\ 0 & 1 & 0 & 350 \\ 0 & 0 & 1 & 350 \end{array}\right]$$
Therefore, $x_1 = 100$, $x_2 = 350$, and $x_3 = 350$. The biologist should place 100 bacteria of strain I and 350 of each of strains II and III in the test tube if she wants all the food to be consumed.

Example 2.28
Repeat Example 2.27, using the data on daily consumption of food (units per day) shown in Table 2.3. Assume this time that 1500 units of A, 3000 units of B, and 4500 units of C are placed in the test tube each day.

Table 2.3
              Bacteria Strain I   Bacteria Strain II   Bacteria Strain III
Food A                1                   1                    1
Food B                1                   2                    3
Food C                1                   3                    5

Solution
Let $x_1$, $x_2$, and $x_3$ again be the numbers of bacteria of each type.
The augmented matrix for the resulting linear system and the corresponding reduced echelon form are
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 1500 \\ 1 & 2 & 3 & 3000 \\ 1 & 3 & 5 & 4500 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 1500 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
We see that in this case we have more than one solution, given by
$$\begin{aligned} x_1 - x_3 &= 0 \\ x_2 + 2x_3 &= 1500 \end{aligned}$$
Letting $x_3 = t$, we obtain $x_1 = t$, $x_2 = 1500 - 2t$, and $x_3 = t$. In any applied problem, we must be careful to interpret solutions properly. Certainly the number of bacteria cannot be negative. Therefore, $t \geq 0$ and $1500 - 2t \geq 0$. The latter inequality implies that $t \leq 750$, so we have $0 \leq t \leq 750$. Presumably the number of bacteria must be a whole number, so there are exactly 751 values of $t$ that satisfy the inequality. Thus, our 751 solutions are of the form
$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}t\\1500 - 2t\\t\end{bmatrix}$$
one for each integer value of $t$ such that $0 \leq t \leq 750$. (So, although mathematically this system has infinitely many solutions, physically there are only finitely many.)

Balancing Chemical Equations

When a chemical reaction occurs, certain molecules (the reactants) combine to form new molecules (the products). A balanced chemical equation is an algebraic equation that gives the relative numbers of reactants and products in the reaction and has the same number of atoms of each type on the left- and right-hand sides. The equation is usually written with the reactants on the left, the products on the right, and an arrow in between to show the direction of the reaction.

For example, for the reaction in which hydrogen gas ($\mathrm{H}_2$) and oxygen ($\mathrm{O}_2$) combine to form water ($\mathrm{H}_2\mathrm{O}$), a balanced chemical equation is
$$2\mathrm{H}_2 + \mathrm{O}_2 \longrightarrow 2\mathrm{H}_2\mathrm{O}$$
indicating that two molecules of hydrogen combine with one molecule of oxygen to form two molecules of water. Observe that the equation is balanced, since there are four hydrogen atoms and two oxygen atoms on each side.
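Checking that a proposed equation is balanced is a matter of counting atoms on each side. Here is a small Python sketch of that check; encoding each molecule as a dictionary of element counts, and the names `atom_counts` and `is_balanced`, are our own illustrative choices rather than any chemistry library's API.

```python
from collections import Counter


def atom_counts(side):
    """Total atoms on one side of a reaction.  `side` is a list of
    (coefficient, molecule) pairs, where a molecule is a dict mapping
    element symbols to the number of atoms of that element."""
    total = Counter()
    for coeff, molecule in side:
        for element, n in molecule.items():
            total[element] += coeff * n
    return total


def is_balanced(reactants, products):
    """Balanced means every element occurs the same number of times on
    the left- and right-hand sides."""
    return atom_counts(reactants) == atom_counts(products)


H2 = {"H": 2}
O2 = {"O": 2}
H2O = {"H": 2, "O": 1}

print(is_balanced([(2, H2), (1, O2)], [(2, H2O)]))  # True:  2H2 + O2 -> 2H2O
print(is_balanced([(1, H2), (1, O2)], [(1, H2O)]))  # False: H2 + O2 -> H2O
```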
Note that there will never be a unique balanced equation for a reaction, since any positive integer multiple of a balanced equation will also be balanced. For example, $6\mathrm{H}_2 + 3\mathrm{O}_2 \longrightarrow 6\mathrm{H}_2\mathrm{O}$ is also balanced. Therefore, we usually look for the simplest balanced equation for a given reaction.

While trial and error will often work in simple examples, the process of balancing chemical equations really involves solving a homogeneous system of linear equations, so we can use the techniques we have developed to remove the guesswork.

Example 2.29
The combustion of ammonia ($\mathrm{NH}_3$) in oxygen produces nitrogen ($\mathrm{N}_2$) and water. Find a balanced chemical equation for this reaction.

Solution
If we denote the numbers of molecules of ammonia, oxygen, nitrogen, and water by $w$, $x$, $y$, and $z$, respectively, then we are seeking an equation of the form
$$w\mathrm{NH}_3 + x\mathrm{O}_2 \longrightarrow y\mathrm{N}_2 + z\mathrm{H}_2\mathrm{O}$$
Comparing the numbers of nitrogen, hydrogen, and oxygen atoms in the reactants and products, we obtain three linear equations:
Nitrogen: $w = 2y$
Hydrogen: $3w = 2z$
Oxygen: $2x = z$
Rewriting these equations in standard form gives us a homogeneous system of three linear equations in four variables. [Notice that Theorem 2.3 guarantees that such a system will have (infinitely many) nontrivial solutions.] We reduce the corresponding augmented matrix by Gauss-Jordan elimination.
$$\begin{aligned} w - 2y &= 0 \\ 3w - 2z &= 0 \\ 2x - z &= 0 \end{aligned} \quad\longrightarrow\quad \left[\begin{array}{cccc|c} 1 & 0 & -2 & 0 & 0 \\ 3 & 0 & 0 & -2 & 0 \\ 0 & 2 & 0 & -1 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 0 & 0 & -\frac{2}{3} & 0 \\ 0 & 1 & 0 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 & -\frac{1}{3} & 0 \end{array}\right]$$
Thus, $w = \frac{2}{3}z$, $x = \frac{1}{2}z$, and $y = \frac{1}{3}z$. The smallest positive value of $z$ that will produce integer values for all four variables is the least common denominator of the fractions $\frac{2}{3}$, $\frac{1}{2}$, and $\frac{1}{3}$, namely 6, which gives $w = 4$, $x = 3$, $y = 2$, and $z = 6$. Therefore, the balanced chemical equation is
$$4\mathrm{NH}_3 + 3\mathrm{O}_2 \longrightarrow 2\mathrm{N}_2 + 6\mathrm{H}_2\mathrm{O}$$

Network Analysis

Many practical situations give rise to networks: transportation networks, communications networks, and economic networks, to name a few.
Of particular interest are the possible flows through networks. For example, vehicles flow through a network of roads, information flows through a data network, and goods and services flow through an economic network.

For us, a network will consist of a finite number of nodes (also called junctions or vertices) connected by a series of directed edges known as branches or arcs. Each branch will be labeled with a flow that represents the amount of some commodity that can flow along or through that branch in the indicated direction. (Think of cars traveling along a network of one-way streets.) The fundamental rule governing flow through a network is conservation of flow:

At each node, the flow in equals the flow out.

Figure 2.10: Flow at a node: $f_1 + f_2 = 50$.

Figure 2.10 shows a portion of a network, with two branches entering a node and two leaving. The conservation of flow rule implies that the total incoming flow, $f_1 + f_2$ units, must match the total outgoing flow, $20 + 30$ units. Thus, we have the linear equation $f_1 + f_2 = 50$ corresponding to this node.

We can analyze the flow through an entire network by constructing such equations and solving the resulting system of linear equations.

Example 2.30
Describe the possible flows through the network of water pipes shown in Figure 2.11, where flow is measured in liters per minute.

Figure 2.11: A network of water pipes with nodes A, B, C, and D.

Solution
At each node, we write out the equation that represents the conservation of flow there. We then rewrite each equation with the variables on the left and the constant on the right, to get a linear system in standard form.
$$\begin{aligned} &\text{Node A:} & 15 &= f_1 + f_4 & &\longrightarrow & f_1 + f_4 &= 15 \\ &\text{Node B:} & f_1 &= f_2 + 10 & &\longrightarrow & f_1 - f_2 &= 10 \\ &\text{Node C:} & f_2 + f_3 + 5 &= 30 & &\longrightarrow & f_2 + f_3 &= 25 \\ &\text{Node D:} & f_4 + 20 &= f_3 & &\longrightarrow & f_3 - f_4 &= 20 \end{aligned}$$
Using Gauss-Jordan elimination, we reduce the augmented matrix:
$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & 1 & 15 \\ 1 & -1 & 0 & 0 & 10 \\ 0 & 1 & 1 & 0 & 25 \\ 0 & 0 & 1 & -1 & 20 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 0 & 0 & 1 & 15 \\ 0 & 1 & 0 & 1 & 5 \\ 0 & 0 & 1 & -1 & 20 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
(Check this.) We see that there is one free variable, $f_4$, so we have infinitely many solutions. Setting $f_4 = t$ and expressing the leading variables in terms of $f_4$, we obtain
$$\begin{aligned} f_1 &= 15 - t \\ f_2 &= 5 - t \\ f_3 &= 20 + t \\ f_4 &= t \end{aligned}$$
These equations describe all possible flows and allow us to analyze the network. For example, we see that if we control the flow on branch AD so that $t = 5$ L/min, then the other flows are $f_1 = 10$, $f_2 = 0$, and $f_3 = 25$.

We can do even better: We can find the minimum and maximum possible flows on each branch. Each of the flows must be nonnegative. Examining the first and second equations in turn, we see that $t \leq 15$ (otherwise $f_1$ would be negative) and $t \leq 5$ (otherwise $f_2$ would be negative). The second of these inequalities is more restrictive than the first, so we must use it. The third equation contributes no further restrictions on our parameter $t$, and the fourth gives $t = f_4 \geq 0$, so we have deduced that $0 \leq t \leq 5$. Combining this result with the four equations, we see that
$$10 \leq f_1 \leq 15, \qquad 0 \leq f_2 \leq 5, \qquad 20 \leq f_3 \leq 25, \qquad 0 \leq f_4 \leq 5$$
We now have a complete description of the possible flows through this network.

Electrical Networks

Electrical networks are a specialized type of network providing information about power sources, such as batteries, and devices powered by these sources, such as light bulbs or motors. A power source "forces" a current of electrons to flow through the network, where it encounters various resistors, each of which requires that a certain amount of force be applied in order for the current to flow through it.

The fundamental law of electricity is Ohm's law, which states exactly how much force $E$ is needed to drive a current $I$ through a resistor with resistance $R$.

Ohm's Law
force = resistance × current, or $E = RI$

Force is measured in volts, resistance in ohms, and current in amperes (or amps, for short).
Thus, in terms of these units, Ohm's law becomes "volts = ohms × amps," and it tells us what the "voltage drop" is when a current passes through a resistor, that is, how much voltage is used up.

Current flows out of the positive terminal of a battery and flows back into the negative terminal, traveling around one or more closed circuits in the process. In a diagram of an electrical network, batteries are represented by a pair of parallel vertical bars (where the positive terminal is the longer vertical bar) and resistors are represented by a zigzag line.

The following two laws, whose discovery we owe to Kirchhoff, govern electrical networks. The first is a "conservation of flow" law at each node; the second is a "balancing of voltage" law around each circuit.

Kirchhoff's Laws
Current Law (nodes): The sum of the currents flowing into any node is equal to the sum of the currents flowing out of that node.
Voltage Law (circuits): The sum of the voltage drops around any circuit is equal to the total voltage around the circuit (provided by the batteries).

Figure 2.12 illustrates Kirchhoff's laws. In part (a), the current law gives $I_1 = I_2 + I_3$ (or $I_1 - I_2 - I_3 = 0$, as we will write it); part (b) gives $4I = 10$, where we have used Ohm's law to compute the voltage drop $4I$ at the resistor.

Figure 2.12: Kirchhoff's laws: (a) currents at a node; (b) a circuit with a 10-volt battery and a 4-ohm resistor.

Using Kirchhoff's laws, we can set up a system of linear equations that will allow us to determine the currents in an electrical network.

Example 2.31
Determine the currents $I_1$, $I_2$, and $I_3$ in the electrical network shown in Figure 2.13.

Figure 2.13: A network with two batteries (8 volts and 16 volts) and four resistors.

Solution
This network has two batteries and four resistors. Current $I_1$ flows through the top branch BCA, current $I_2$ flows across the middle branch AB, and current $I_3$ flows through the bottom branch BDA. At node A, the current law gives $I_1 + I_3 = I_2$, or
$$I_1 - I_2 + I_3 = 0$$
(Observe that we get the same equation at node B.)

Next we apply the voltage law for each circuit. For the circuit CABC, the voltage drops at the resistors are $2I_1$, $I_2$, and $2I_1$. Thus, we have the equation
$$4I_1 + I_2 = 8$$
Similarly, for the circuit DABD, we obtain
$$I_2 + 4I_3 = 16$$
(Notice that there is actually a third circuit, CADBC, if we "go against the flow." In this case, we must treat the voltages and resistances on the "reversed" paths as negative. Doing so gives $2I_1 + 2I_1 - 4I_3 = 8 - 16 = -8$, or $4I_1 - 4I_3 = -8$, which we observe is just the difference of the voltage equations for the other two circuits. Thus, we can omit this equation, as it contributes no new information. On the other hand, including it does no harm.)

We now have a system of three linear equations in three variables:
$$\begin{aligned} I_1 - I_2 + I_3 &= 0 \\ 4I_1 + I_2 &= 8 \\ I_2 + 4I_3 &= 16 \end{aligned}$$
Gauss-Jordan elimination produces
$$\left[\begin{array}{ccc|c} 1 & -1 & 1 & 0 \\ 4 & 1 & 0 & 8 \\ 0 & 1 & 4 & 16 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
Hence, the currents are $I_1 = 1$ amp, $I_2 = 4$ amps, and $I_3 = 3$ amps.

Remark
In some electrical networks, the currents may have fractional values or may even be negative. A negative value simply means that the current in the corresponding branch flows in the direction opposite that shown on the network diagram.

Example 2.32
The network shown in Figure 2.14 has a single power source A and five resistors. Find the currents $I, I_1, \ldots, I_5$. This is an example of what is known in electrical engineering as a Wheatstone bridge circuit.

Figure 2.14: A bridge circuit.

Solution
Kirchhoff's current law gives the following equations at the four nodes:
$$\begin{aligned} &\text{Node B:} & I - I_1 - I_4 &= 0 \\ &\text{Node C:} & I_1 - I_2 - I_3 &= 0 \\ &\text{Node D:} & I - I_2 - I_5 &= 0 \\ &\text{Node E:} & I_3 + I_4 - I_5 &= 0 \end{aligned}$$
For the three basic circuits, the voltage law gives
$$\begin{aligned} &\text{Circuit ABEDA:} & I_4 + 2I_5 &= 10 \\ &\text{Circuit BCEB:} & 2I_1 + 2I_3 - I_4 &= 0 \\ &\text{Circuit CDEC:} & I_2 - 2I_5 - 2I_3 &= 0 \end{aligned}$$
(Observe that branch DAB has no resistor and therefore no voltage drop; thus, there is no $I$ term in the equation for circuit ABEDA. Note also that we had to change signs three times because we went "against the current." This poses no problem, since we will let the sign of the answer determine the direction of current flow.)

We now have a system of seven equations in six variables. Row reduction gives
$$\left[\begin{array}{cccccc|c} 1 & -1 & 0 & 0 & -1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 0 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 & 10 \\ 0 & 2 & 0 & 2 & -1 & 0 & 0 \\ 0 & 0 & 1 & -2 & 0 & -2 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{cccccc|c} 1 & 0 & 0 & 0 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
(Use your calculator or CAS to check this.) Thus, the solution (in amps) is $I = 7$, $I_1 = I_5 = 3$, $I_2 = I_4 = 4$, and $I_3 = -1$. The significance of the negative value here is that the current through branch CE is flowing in the direction opposite that marked on the diagram.

Remark
There is only one power source in this example, so the single 10-volt battery sends a current of 7 amps through the network. If we substitute these values into Ohm's law, $E = RI$, we get $10 = 7R$ or $R = \frac{10}{7}$. Thus, the entire network behaves as if there were a single $\frac{10}{7}$-ohm resistor. This value is called the effective resistance ($R_{\text{eff}}$) of the network.

Linear Economic Models

An economy is a very complex system with many interrelationships among the various sectors of the economy and the goods and services they produce and consume. Determining optimal prices and levels of production subject to desired economic goals requires sophisticated mathematical models.
Linear algebra has proven to be a powerful tool in developing and analyzing such economic models.

In this section, we introduce two models based on the work of Harvard economist Wassily Leontief in the 1930s. His methods, often referred to as input-output analysis, are now standard tools in mathematical economics and are used by cities, corporations, and entire countries for economic planning and forecasting.

We begin with a simple example.

Example 2.33
The economy of a region consists of three industries, or sectors: service, electricity, and oil production. For simplicity, we assume that each industry produces a single commodity (goods or services) in a given year and that income (output) is generated from the sale of this commodity. Each industry purchases commodities from the other industries, including itself, in order to generate its output. No commodities are purchased from outside the region and no output is sold outside the region. Furthermore, for each industry, we assume that production exactly equals consumption (output equals input, income equals expenditure). In this sense, this is a closed economy that is in equilibrium. Table 2.4 summarizes how much of each industry's output is consumed by each industry.

Wassily Leontief (1906–1999) was born in St. Petersburg, Russia. He studied at the University of Leningrad and received his Ph.D. from the University of Berlin. He emigrated to the United States in 1931, teaching at Harvard University and later at New York University. In 1932, Leontief began compiling data for the monumental task of conducting an input-output analysis of the United States economy, the results of which were published in 1941. He was also an early user of computers, which he needed to solve the large-scale linear systems in his models. For his pioneering work, Leontief was awarded the Nobel Prize in Economics in 1973.

Table 2.4
                                     Produced by (output)
                              Service    Electricity    Oil
Consumed by    Service          1/4          1/3        1/2
(input)        Electricity      1/4          1/3        1/4
               Oil              1/2          1/3        1/4

From the first column of the table, we see that the service industry consumes 1/4 of its own output, electricity consumes another 1/4, and the oil industry uses 1/2 of the service industry's output. The other two columns have similar interpretations. Notice that the sum of each column is 1, indicating that all of the output of each industry is consumed.

Let $x_1$, $x_2$, and $x_3$ denote the annual output (income) of the service, electricity, and oil industries, respectively, in millions of dollars. Since consumption corresponds to expenditure, the service industry spends $\frac{1}{4}x_1$ on its own commodity, $\frac{1}{3}x_2$ on electricity, and $\frac{1}{2}x_3$ on oil. This means that the service industry's total annual expenditure is $\frac{1}{4}x_1 + \frac{1}{3}x_2 + \frac{1}{2}x_3$. Since the economy is in equilibrium, the service industry's expenditure must equal its annual income $x_1$. This gives the first of the following equations; the other two equations are obtained by analyzing the expenditures of the electricity and oil industries.
$$\begin{aligned} &\text{Service:} & x_1 &= \tfrac{1}{4}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{2}x_3 \\ &\text{Electricity:} & x_2 &= \tfrac{1}{4}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{4}x_3 \\ &\text{Oil:} & x_3 &= \tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{4}x_3 \end{aligned}$$
Rearranging each equation, we obtain a homogeneous system of linear equations, which we then solve. (Check this!)
$$\begin{aligned} -\tfrac{3}{4}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{2}x_3 &= 0 \\ \tfrac{1}{4}x_1 - \tfrac{2}{3}x_2 + \tfrac{1}{4}x_3 &= 0 \\ \tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 - \tfrac{3}{4}x_3 &= 0 \end{aligned} \quad\longrightarrow\quad \left[\begin{array}{ccc|c} -\frac{3}{4} & \frac{1}{3} & \frac{1}{2} & 0 \\ \frac{1}{4} & -\frac{2}{3} & \frac{1}{4} & 0 \\ \frac{1}{2} & \frac{1}{3} & -\frac{3}{4} & 0 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & -\frac{3}{4} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
Setting $x_3 = t$, we find that $x_1 = t$ and $x_2 = \frac{3}{4}t$. Thus, we see that the relative outputs of the service, electricity, and oil industries need to be in the ratios $x_1 : x_2 : x_3 = 4 : 3 : 4$ for the economy to be in equilibrium.

Remarks
• The last example illustrates what is commonly called the Leontief closed model.
• Since output corresponds to income, we can also think of $x_1$, $x_2$, and $x_3$ as the prices of the three commodities.
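In matrix terms, the equilibrium condition of Example 2.33 says $\mathbf{x} = C\mathbf{x}$, where $C$ is Table 2.4 read as a matrix, so we need a nonzero solution of the homogeneous system $(I - C)\mathbf{x} = \mathbf{0}$. The Python sketch below performs that Gauss-Jordan elimination with exact fractions and scales the solution to whole numbers. It assumes Python 3.9 or later (for `math.lcm` and multi-argument `math.gcd`) and a solution set with exactly one free variable; the names `C` and `equilibrium_ratio` are ours.

```python
from fractions import Fraction as F
from math import gcd, lcm

# Table 2.4 read as a matrix: C[i][j] is the fraction of industry j's
# output consumed by industry i (order: service, electricity, oil).
C = [[F(1, 4), F(1, 3), F(1, 2)],
     [F(1, 4), F(1, 3), F(1, 4)],
     [F(1, 2), F(1, 3), F(1, 4)]]


def equilibrium_ratio(C):
    """Closed Leontief model: find a nonzero x with x = Cx, i.e.
    (I - C)x = 0, scaled to the smallest whole numbers.  Assumes the
    solution set has exactly one free variable, as in Example 2.33."""
    n = len(C)
    m = [[(1 if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)]
    pivots, r = [], 0
    for c in range(n):                    # Gauss-Jordan elimination
        p = next((i for i in range(r, n) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(n):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [j for j in range(n) if j not in pivots]
    if len(free) != 1:
        raise ValueError("expected exactly one free variable")
    x = [F(0)] * n
    x[free[0]] = F(1)                     # set the free variable to 1 ...
    for row, p in enumerate(pivots):      # ... and solve for the others
        x[p] = -m[row][free[0]]
    d = lcm(*(v.denominator for v in x))  # clear denominators
    ints = [int(v * d) for v in x]
    g = gcd(*ints)                        # remove any common factor
    return [v // g for v in ints]


print(equilibrium_ratio(C))  # [4, 3, 4]
```

The printed ratio matches $x_1 : x_2 : x_3 = 4 : 3 : 4$. The function raises an error instead of guessing when the solution set does not have exactly one free variable.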
We now modify the model in Example 2.33 to accommodate an open economy, one in which there is an external as well as an internal demand for the commodities that are produced. Not surprisingly, this version is called the Leontief open model.

Example 2.34
Consider the three industries of Example 2.33 but with consumption given by Table 2.5. We see that, of the commodities produced by the service industry, 20% are consumed by the service industry, 40% by the electricity industry, and 10% by the oil industry. Thus, only 70% of the service industry's output is consumed by this economy. This means that the service industry's output (income) exceeds its input (expenditure). We say that the service industry is productive. Likewise, the oil industry is productive but the electricity industry is nonproductive. (This is reflected in the fact that the sums of the first and third columns are less than 1 but the sum of the second column is equal to 1.) The excess output may be applied to satisfy an external demand.

Table 2.5
                                  Produced by (output)
                            Service   Electricity   Oil
Consumed by    Service        0.20       0.50       0.10
(input)        Electricity    0.40       0.20       0.20
               Oil            0.10       0.30       0.30

For example, suppose there is an annual external demand (in millions of dollars) for 10, 10, and 30 from the service, electricity, and oil industries, respectively. Then, equating expenditures (internal demand and external demand) with income (output), we obtain the following equations (output = internal demand + external demand):

\[
\begin{aligned}
\text{Service:}\quad x_1 &= 0.2x_1 + 0.5x_2 + 0.1x_3 + 10\\
\text{Electricity:}\quad x_2 &= 0.4x_1 + 0.2x_2 + 0.2x_3 + 10\\
\text{Oil:}\quad x_3 &= 0.1x_1 + 0.3x_2 + 0.3x_3 + 30
\end{aligned}
\]

Rearranging, we obtain the following linear system and augmented matrix:

\[
\begin{aligned}
0.8x_1 - 0.5x_2 - 0.1x_3 &= 10\\
-0.4x_1 + 0.8x_2 - 0.2x_3 &= 10\\
-0.1x_1 - 0.3x_2 + 0.7x_3 &= 30
\end{aligned}
\quad\longrightarrow\quad
\left[\begin{array}{ccc|c}
0.8 & -0.5 & -0.1 & 10\\
-0.4 & 0.8 & -0.2 & 10\\
-0.1 & -0.3 & 0.7 & 30
\end{array}\right]
\]

Row reduction (using a CAS) yields

\[
\left[\begin{array}{ccc|c}
1 & 0 & 0 & 61.74\\
0 & 1 & 0 & 63.04\\
0 & 0 & 1 & 78.70
\end{array}\right]
\]

from which we see that the service, electricity, and oil industries must have an annual production of $61.74, $63.04, and $78.70 (million), respectively, in order to meet both the internal and external demand for their commodities.

We will revisit these models in Section 3.7.

Finite Linear Games

There are many situations in which we must consider a physical system that has only a finite number of states. Sometimes these states can be altered by applying certain processes, each of which produces finitely many outcomes. For example, a light bulb can be on or off and a switch can change the state of the light bulb from on to off and vice versa. Digital systems that arise in computer science are often of this type. More frivolously, many computer games feature puzzles in which a certain device must be manipulated by various switches to produce a desired outcome. The finiteness of such situations is perfectly suited to analysis using modular arithmetic, and often linear systems over some \(\mathbb{Z}_p\) play a role. Problems involving this type of situation are often called finite linear games.

Example 2.35
A row of five lights is controlled by five switches. Each switch changes the state (on or off) of the light directly above it and the states of the lights immediately adjacent to the left and right. For example, if the first and third lights are on, as in Figure 2.15(a), then pushing switch A changes the state of the system to that shown in Figure 2.15(b). If we next push switch C, then the result is the state shown in Figure 2.15(c).

[Figure 2.15: (a) initial state, (b) after pushing switch A, (c) after then pushing switch C; five lights above switches A, B, C, D, E]

Suppose that initially all the lights are off. Can we push the switches in some order so that only the first, third, and fifth lights will be on?
Can we push the switches in some order so that only the first light will be on?

Solution
The on/off nature of this problem suggests that binary notation will be helpful and that we should work with \(\mathbb{Z}_2\). Accordingly, we represent the states of the five lights by a vector in \(\mathbb{Z}_2^5\), where 0 represents off and 1 represents on. Thus, for example, the vector \([0, 1, 1, 0, 0]^T\) corresponds to Figure 2.15(b).

We may also use vectors in \(\mathbb{Z}_2^5\) to represent the action of each switch. If a switch changes the state of a light, the corresponding component is a 1; otherwise, it is 0. With this convention, the actions of the five switches are given by

\[
\mathbf{a} = \begin{bmatrix}1\\1\\0\\0\\0\end{bmatrix},\quad
\mathbf{b} = \begin{bmatrix}1\\1\\1\\0\\0\end{bmatrix},\quad
\mathbf{c} = \begin{bmatrix}0\\1\\1\\1\\0\end{bmatrix},\quad
\mathbf{d} = \begin{bmatrix}0\\0\\1\\1\\1\end{bmatrix},\quad
\mathbf{e} = \begin{bmatrix}0\\0\\0\\1\\1\end{bmatrix}
\]

The situation depicted in Figure 2.15(a) corresponds to the initial state \(\mathbf{s} = [1, 0, 1, 0, 0]^T\) followed by \(\mathbf{a}\). It is the vector sum (in \(\mathbb{Z}_2^5\))

\[ \mathbf{s} + \mathbf{a} = [0, 1, 1, 0, 0]^T \]

Observe that this result agrees with Figure 2.15(b).

Starting with any initial configuration \(\mathbf{s}\), suppose we push the switches in the order A, C, D, A, C, B. This corresponds to the vector sum \(\mathbf{s} + \mathbf{a} + \mathbf{c} + \mathbf{d} + \mathbf{a} + \mathbf{c} + \mathbf{b}\). But in \(\mathbb{Z}_2^5\), addition is commutative, so we have

\[ \mathbf{s} + \mathbf{a} + \mathbf{c} + \mathbf{d} + \mathbf{a} + \mathbf{c} + \mathbf{b} = \mathbf{s} + 2\mathbf{a} + \mathbf{b} + 2\mathbf{c} + \mathbf{d} = \mathbf{s} + \mathbf{b} + \mathbf{d} \]

where we have used the fact that \(2 = 0\) in \(\mathbb{Z}_2\). Thus, we would achieve the same result by pushing only B and D, and the order does not matter. (Check that this is correct.) Hence, in this example, we do not need to push any switch more than once.

So, to see if we can achieve a target configuration \(\mathbf{t}\) starting from an initial configuration \(\mathbf{s}\), we need to determine whether there are scalars \(x_1, \ldots, x_5\) in \(\mathbb{Z}_2\) such that

\[ \mathbf{s} + x_1\mathbf{a} + x_2\mathbf{b} + \cdots + x_5\mathbf{e} = \mathbf{t} \]

In other words, we need to solve (if possible) the linear system over \(\mathbb{Z}_2\) that corresponds to the vector equation

\[ x_1\mathbf{a} + x_2\mathbf{b} + \cdots + x_5\mathbf{e} = \mathbf{t} - \mathbf{s} = \mathbf{t} + \mathbf{s} \]

(subtraction and addition coincide in \(\mathbb{Z}_2\), since \(-1 = 1\) there).

In this case, \(\mathbf{s} = \mathbf{0}\) and our first target configuration is \(\mathbf{t} = [1, 0, 1, 0, 1]^T\). The augmented matrix of this system has the given vectors as columns:

\[
\left[\begin{array}{ccccc|c}
1 & 1 & 0 & 0 & 0 & 1\\
1 & 1 & 1 & 0 & 0 & 0\\
0 & 1 & 1 & 1 & 0 & 1\\
0 & 0 & 1 & 1 & 1 & 0\\
0 & 0 & 0 & 1 & 1 & 1
\end{array}\right]
\]

We reduce it over \(\mathbb{Z}_2\) to obtain

\[
\left[\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]

Thus, \(x_5\) is a free variable. Hence, there are exactly two solutions (corresponding to \(x_5 = 0\) and \(x_5 = 1\)). Solving for the other variables in terms of \(x_5\), we obtain

\[ x_1 = x_5,\qquad x_2 = 1 + x_5,\qquad x_3 = 1,\qquad x_4 = 1 + x_5 \]

So, when \(x_5 = 0\) and \(x_5 = 1\), we have the solutions

\[ [x_1, x_2, x_3, x_4, x_5]^T = [0, 1, 1, 1, 0]^T \quad\text{and}\quad [x_1, x_2, x_3, x_4, x_5]^T = [1, 0, 1, 0, 1]^T \]

respectively. (Check that these both work.)

Similarly, in the second case, we have \(\mathbf{t} = [1, 0, 0, 0, 0]^T\). The augmented matrix reduces as follows:

\[
\left[\begin{array}{ccccc|c}
1 & 1 & 0 & 0 & 0 & 1\\
1 & 1 & 1 & 0 & 0 & 0\\
0 & 1 & 1 & 1 & 0 & 0\\
0 & 0 & 1 & 1 & 1 & 0\\
0 & 0 & 0 & 1 & 1 & 0
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\]

showing that there is no solution in this case; that is, it is impossible to start with all of the lights off and turn only the first light on.

Example 2.35 shows the power of linear algebra. Even though we might have found out by trial and error that there was no solution, checking all possible ways to push the switches would have been extremely tedious. We might also have missed the fact that no switch need ever be pushed more than once.

Example 2.36
Consider a row with only three lights, each of which can be off, light blue, or dark blue. Below the lights are three switches, A, B, and C, each of which changes the states of particular lights to the next state, in the order shown in Figure 2.16. Switch A changes the states of the first two lights, switch B all three lights, and switch C the last two lights.

[Figure 2.16: the cycle of light states off → light blue → dark blue → off, with switches A, B, C]
If all three lights are initially off, is it possible to push the switches in some order so that the lights are off, light blue, and dark blue, in that order (as in Figure 2.17)?

Solution
Whereas Example 2.35 involved \(\mathbb{Z}_2\), this one clearly (is it clear?) involves \(\mathbb{Z}_3\). Accordingly, the switches correspond to the vectors

\[ \mathbf{a} = \begin{bmatrix}1\\1\\0\end{bmatrix},\quad \mathbf{b} = \begin{bmatrix}1\\1\\1\end{bmatrix},\quad \mathbf{c} = \begin{bmatrix}0\\1\\1\end{bmatrix} \]

in \(\mathbb{Z}_3^3\), and the final configuration we are aiming for is \(\mathbf{t} = [0, 1, 2]^T\). (Off is 0, light blue is 1, and dark blue is 2.) We wish to find scalars \(x_1, x_2, x_3\) in \(\mathbb{Z}_3\) such that

\[ x_1\mathbf{a} + x_2\mathbf{b} + x_3\mathbf{c} = \mathbf{t} \]

(where \(x_i\) represents the number of times the \(i\)th switch is pushed). This equation gives rise to the augmented matrix \([\mathbf{a}\ \mathbf{b}\ \mathbf{c} \mid \mathbf{t}]\), which reduces over \(\mathbb{Z}_3\) as follows:

\[
\left[\begin{array}{ccc|c}
1 & 1 & 0 & 0\\
1 & 1 & 1 & 1\\
0 & 1 & 1 & 2
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & 0 & 1\\
0 & 0 & 1 & 1
\end{array}\right]
\]

Hence, there is a unique solution: \(x_1 = 2\), \(x_2 = 1\), \(x_3 = 1\). In other words, we must push switch A twice and the other two switches once each. (Check this.)

Exercises 2.4

Allocation of Resources

1. Suppose that, in Example 2.27, 400 units of food A, 600 units of B, and 600 units of C are placed in the test tube each day and the data on daily food consumption by the bacteria (in units per day) are as shown in Table 2.6. How many bacteria of each strain can coexist in the test tube and consume all of the food?

Table 2.6 (columns: Food A, Food B, Food C)
Bacteria Strain I      1  2  1
Bacteria Strain II     2  1
Bacteria Strain III    0  1  2

2. Suppose that in Example 2.27, 400 units of food A, 500 units of B, and 600 units of C are placed in the test tube each day and the data on daily food consumption by the bacteria (in units per day) are as shown in Table 2.7. How many bacteria of each strain can coexist in the test tube and consume all of the food?

Table 2.7 (columns: Food A, Food B, Food C)
Bacteria Strain I      2  1
Bacteria Strain II     2
Bacteria Strain III    0  3  1

3. A florist offers three sizes of flower arrangements containing roses, daisies, and chrysanthemums. Each small arrangement contains one rose, three daisies, and three chrysanthemums. Each medium arrangement contains two roses, four daisies, and six chrysanthemums. Each large arrangement contains four roses, eight daisies, and six chrysanthemums. One day, the florist noted that she used a total of 24 roses, 50 daisies, and 48 chrysanthemums in filling orders for these three types of arrangements. How many arrangements of each type did she make?

4. (a) In your pocket you have some nickels, dimes, and quarters. There are 20 coins altogether and exactly twice as many dimes as nickels. The total value of the coins is $3.00. Find the number of coins of each type.
(b) Find all possible combinations of 20 coins (nickels, dimes, and quarters) that will make exactly $3.00.

5. A coffee merchant sells three blends of coffee. A bag of the house blend contains 300 grams of Colombian beans and 200 grams of French roast beans. A bag of the special blend contains 200 grams of Colombian beans, 200 grams of Kenyan beans, and 100 grams of French roast beans. A bag of the gourmet blend contains 100 grams of Colombian beans, 200 grams of Kenyan beans, and 200 grams of French roast beans. The merchant has on hand 30 kilograms of Colombian beans, 15 kilograms of Kenyan beans, and 25 kilograms of French roast beans. If he wishes to use up all of the beans, how many bags of each type of blend can be made?

6. Redo Exercise 5, assuming that the house blend contains 300 grams of Colombian beans, 50 grams of Kenyan beans, and 150 grams of French roast beans and the gourmet blend contains 100 grams of Colombian beans, 350 grams of Kenyan beans, and 50 grams of French roast beans. This time the merchant has on hand 30 kilograms of Colombian beans, 15 kilograms of Kenyan beans, and 15 kilograms of French roast beans. Suppose one bag of the house blend produces a profit of $0.50, one bag of the special blend produces a profit of $1.50, and one bag of the gourmet blend produces a profit of $2.00. How many bags of each type should the merchant prepare if he wants to use up all of the beans and maximize his profit? What is the maximum profit?

Balancing Chemical Equations

In Exercises 7-14, balance the chemical equation for each reaction.
7. FeS2 + O2 → Fe2O3 + SO2
8. CO2 + H2O → C6H12O6 + O2 (This reaction takes place when a green plant converts carbon dioxide and water to glucose and oxygen during photosynthesis.)
9. C4H10 + O2 → CO2 + H2O (This reaction occurs when butane, C4H10, burns in the presence of oxygen to form carbon dioxide and water.)
10. C7H6O2 + O2 → H2O + CO2
11. C5H11OH + O2 → H2O + CO2 (This equation represents the combustion of amyl alcohol.)
12. HClO4 + P4O10 → H3PO4 + Cl2O7
13. Na2CO3 + C + N2 → NaCN + CO
14. (CAS) C2H2Cl4 + Ca(OH)2 → C2HCl3 + CaCl2 + H2O

Network Analysis

15. Figure 2.18 shows a network of water pipes with flows measured in liters per minute.
(a) Set up and solve a system of linear equations to find the possible flows.
(b) If the flow through AB is restricted to 5 L/min, what will the flows through the other two branches be?
(c) What are the minimum and maximum possible flows through each branch?
(d) We have been assuming that flow is always positive. What would negative flow mean, assuming we allowed it? Give an illustration for this example.

[Figure 2.18: pipe network with junctions A, B, C]

16. The downtown core of Gotham City consists of one-way streets, and the traffic flow has been measured at each intersection. For the city block shown in Figure 2.19, the numbers represent the average numbers of vehicles per minute entering and leaving intersections A, B, C, and D during business hours.
(a) Set up and solve a system of linear equations to find the possible flows \(f_1, \ldots, f_4\).
(b) If traffic is regulated on CD so that \(f_4 = 10\) vehicles per minute, what will the average flows on the other streets be?
(c) What are the minimum and maximum possible flows on each street?
(d) How would the solution change if all of the directions were reversed?

[Figure 2.19: traffic flows around a city block with intersections A, B, C, D]

17. A network of irrigation ditches is shown in Figure 2.20, with flows measured in thousands of liters per day.
(a) Set up and solve a system of linear equations to find the possible flows \(f_1, \ldots, f_5\).
(b) Suppose DC is closed. What range of flow will need to be maintained through DB?
(c) From Figure 2.20 it is clear that DB cannot be closed. (Why not?) How does your solution in part (a) show this?
(d) From your solution in part (a), determine the minimum and maximum flows through DB.

[Figure 2.20: network of irrigation ditches with junctions A, B, C, D]

18. (a) Set up and solve a system of linear equations to find the possible flows in the network shown in Figure 2.21.
(b) Is it possible for \(f_1 = 100\) and \(f_6 = 150\)? [Answer this question first with reference to your solution in part (a) and then directly from Figure 2.21.]
(c) If \(f_4 = 0\), what will the range of flow be on each of the other branches?

[Figure 2.21: flow network for Exercise 18]

Electrical Networks

For Exercises 19 and 20, determine the currents for the given electrical networks.

19. [Circuit diagram: 8-volt and 13-volt sources; resistors of 1 ohm, 1 ohm, and 4 ohms; currents \(I_1\), \(I_2\), \(I_3\)]

20. [Circuit diagram: 5-volt source; resistors of 1 ohm, 2 ohms, and 4 ohms; currents \(I_1\), \(I_2\), \(I_3\)]

21. (a) Find the currents \(I, I_1, \ldots, I_5\) in the bridge circuit in Figure 2.22.
(b) Find the effective resistance of this network.
(c) Can you change the resistance in branch BC (but leave everything else unchanged) so that the current through branch CE becomes 0?
[Figure 2.22: bridge circuit with a 14-volt source and resistors of 1 and 2 ohms; currents \(I, I_1, \ldots, I_5\)]

22. The networks in parts (a) and (b) of Figure 2.23 show two resistors coupled in series and in parallel, respectively. We wish to find a general formula for the effective resistance of each network, that is, find \(R_{\text{eff}}\) such that \(E = R_{\text{eff}} I\).
(a) Show that the effective resistance \(R_{\text{eff}}\) of a network with two resistors coupled in series [Figure 2.23(a)] is given by
\[ R_{\text{eff}} = R_1 + R_2 \]
(b) Show that the effective resistance \(R_{\text{eff}}\) of a network with two resistors coupled in parallel [Figure 2.23(b)] is given by
\[ R_{\text{eff}} = \frac{1}{\dfrac{1}{R_1} + \dfrac{1}{R_2}} \]

[Figure 2.23: Resistors in series and in parallel]

Linear Economic Models

23. Consider a simple economy with just two industries: farming and manufacturing. Farming consumes 1/2 of the food and 1/3 of the manufactured goods. Manufacturing consumes 1/2 of the food and 2/3 of the manufactured goods. Assuming the economy is closed and in equilibrium, find the relative outputs of the farming and manufacturing industries.

24. Suppose the coal and steel industries form a closed economy. Every $1 produced by the coal industry requires $0.30 of coal and $0.70 of steel. Every $1 produced by steel requires $0.80 of coal and $0.20 of steel. Find the annual production (output) of coal and steel if the total annual production is $20 million.

25. A painter, a plumber, and an electrician enter into a cooperative arrangement in which each of them agrees to work for himself/herself and the other two for a total of 10 hours per week according to the schedule shown in Table 2.8. For tax purposes, each person must establish a value for his/her services. They agree to do this so that they each come out even, that is, so that the total amount paid out by each person equals the amount he/she receives. What hourly rate should each person charge if the rates are all whole numbers between $30 and $60 per hour?

Table 2.8 (hours per week; Supplier: Painter, Plumber, Electrician; Consumer: Painter, Plumber, Electrician)
Entries: 2 4 4 5 1 4 5 4

26. Four neighbors, each with a vegetable garden, agree to share their produce. One will grow beans (B), one will grow lettuce (L), one will grow tomatoes (T), and one will grow zucchini (Z). Table 2.9 shows what fraction of each crop each neighbor will receive. What prices should the neighbors charge for their crops if each person is to break even and the lowest-priced crop has a value of $50?

Table 2.9
                      Producer
                  B      L      T      Z
Consumer   B      0     1/4    1/8    1/6
           L     1/2    1/4    1/4    1/6
           T     1/4    1/4    1/2    1/3
           Z     1/4    1/4    1/8    1/3

27. Suppose the coal and steel industries form an open economy. Every $1 produced by the coal industry requires $0.15 of coal and $0.20 of steel. Every $1 produced by steel requires $0.25 of coal and $0.10 of steel. Suppose that there is an annual outside demand for $45 million of coal and $124 million of steel.
(a) How much should each industry produce to satisfy the demands?
(b) If the demand for coal decreases by $5 million per year while the demand for steel increases by $6 million per year, how should the coal and steel industries adjust their production?

28. In Gotham City, the departments of Administration (A), Health (H), and Transportation (T) are interdependent. For every dollar's worth of services they produce, each department uses a certain amount of the services produced by the other departments and itself, as shown in Table 2.10. Suppose that, during the year, other city departments require $1 million in Administrative services, $1.2 million in Health services, and $0.8 million in Transportation services. What does the annual dollar value of the services produced by each department need to be in order to meet the demands?

Table 2.10
                   Department
               A        H        T
Buy    A     $0.20     0.20     0.10
       H      0.10     0.20     0.10
       T      0.20     0.30     0.40

Finite Linear Games

29. (a) In Example 2.35, suppose all the lights are initially off.
Can we push the switches in some order so that only the second and fourth lights will be on?
(b) Can we push the switches in some order so that only the second light will be on?

30. (a) In Example 2.35, suppose the fourth light is initially on and the other four lights are off. Can we push the switches in some order so that only the second and fourth lights will be on?
(b) Can we push the switches in some order so that only the second light will be on?

31. In Example 2.35, describe all possible configurations of lights that can be obtained if we start with all the lights off.

32. (a) In Example 2.36, suppose that all of the lights are initially off. Show that it is possible to push the switches in some order so that the lights are off, dark blue, and light blue, in that order.
(b) Show that it is possible to push the switches in some order so that the lights are light blue, off, and light blue, in that order.
(c) Prove that any configuration of the three lights can be achieved.

33. Suppose the lights in Example 2.35 can be off, light blue, or dark blue and the switches work as described in Example 2.36. (That is, the switches control the same lights as in Example 2.35 but cycle through the colors as in Example 2.36.) Show that it is possible to start with all of the lights off and push the switches in some order so that the lights are dark blue, light blue, dark blue, light blue, and dark blue, in that order.

34. For Exercise 33, describe all possible configurations of lights that can be obtained, starting with all the lights off.

35. (CAS) Nine squares, each one either black or white, are arranged in a 3 × 3 grid. Figure 2.24 shows one possible arrangement. When touched, each square changes its own state and the states of some of its neighbors (black → white and white → black). Figure 2.25 shows how the state changes work. (Touching the square whose number is circled causes the states of the squares marked * to change.) The object of the game is to turn all nine squares black. [Exercises 35 and 36 are adapted from puzzles that can be found in the interactive CD-ROM game The Seventh Guest (Trilobyte Software/Virgin Games, 1992).]
(a) If the initial configuration is the one shown in Figure 2.24, show that the game can be won and describe a winning sequence of moves.
(b) Prove that the game can always be won, no matter what the initial configuration.

[Figure 2.24: The nine squares puzzle]
[Figure 2.25: State changes for the nine squares puzzle]

36. (CAS) Consider a variation on the nine squares puzzle. The game is the same as that described in Exercise 35 except that there are three possible states for each square: white, gray, or black. The squares change as shown in Figure 2.25, but now the state changes follow the cycle white → gray → black → white. Show how the winning all-black configuration can be achieved from the initial configuration shown in Figure 2.26.

[Figure 2.26: The nine squares puzzle with more states]

Miscellaneous Problems

In Exercises 37-53, set up and solve an appropriate system of linear equations to answer the questions.

37. Grace is three times as old as Hans, but in 5 years she will be twice as old as Hans is then. How old are they now?

38. The sum of Annie's, Bert's, and Chris's ages is 60. Annie is older than Bert by the same number of years that Bert is older than Chris. When Bert is as old as Annie is now, Annie will be three times as old as Chris is now. What are their ages?

The preceding two problems are typical of those found in popular books of mathematical puzzles. However, they have their origins in antiquity.
A Babylonian clay tablet that survives from about 300 B.C. contains the following problem.

39. There are two fields whose total area is 1800 square yards. One field produces grain at the rate of 2/3 bushel per square yard; the other field produces grain at the rate of 1/2 bushel per square yard. If the total yield is 1100 bushels, what is the size of each field?

Over 2000 years ago, the Chinese developed methods for solving systems of linear equations, including a version of Gaussian elimination that did not become well known in Europe until the 19th century. (There is no evidence that Gauss was aware of the Chinese methods when he developed what we now call Gaussian elimination. However, it is clear that the Chinese knew the essence of the method, even though they did not justify its use.) The following problem is taken from the Chinese text Jiuzhang suanshu (Nine Chapters in the Mathematical Art), written during the early Han Dynasty, about 200 B.C.

40. There are three types of corn. Three bundles of the first type, two of the second, and one of the third make 39 measures. Two bundles of the first type, three of the second, and one of the third make 34 measures. And one bundle of the first type, two of the second, and three of the third make 26 measures. How many measures of corn are contained in one bundle of each type?

41. Describe all possible values of a, b, c, and d that will make each of the following a valid addition table. [Problems 41-44 are based on the article "An Application of Matrix Theory" by Paul Glaister in The Mathematics Teacher, 85 (1992), pp. 220-223.]
(a), (b) [addition tables; the legible entries read: + | a b, row c | 3 6, row d | 4 5]

42. What conditions on w, x, y, and z will guarantee that we can find a, b, c, and d so that the following is a valid addition table?

43. Describe all possible values of a, b, c, d, e, and f that will make each of the following a valid addition table.
(a) + | d e f, with rows a, b, c; entries 3 2 1 5 4 3 4 3
(b) + | d e f, with rows a, b, c; entries 1 2 3 3 4 5 4 5 6

44. Generalizing Exercise 42, find conditions on the entries of a 3 × 3 addition table that will guarantee that we can solve for a, b, c, d, e, and f as previously.

45. From elementary geometry we know that there is a unique straight line through any two points in a plane. Less well known is the fact that there is a unique parabola through any three noncollinear points in a plane. For each set of points below, find a parabola with an equation of the form \(y = ax^2 + bx + c\) that passes through the given points. (Sketch the resulting parabola to check the validity of your answer.)
(a) (0, 1), (-1, 4), and (2, 1)
(b) (-3, 1), (-2, 2), and (-1, 5)

46. Through any three noncollinear points there also passes a unique circle. Find the circles (whose general equations are of the form \(x^2 + y^2 + ax + by + c = 0\)) that pass through the sets of points in Exercise 45. (To check the validity of your answer, find the center and radius of each circle and draw a sketch.)

The process of adding rational functions (ratios of polynomials) by placing them over a common denominator is the analogue of adding rational numbers. The reverse process of taking a rational function apart by writing it as a sum of simpler rational functions is useful in several areas of mathematics; for example, it arises in calculus when we need to integrate a rational function and in discrete mathematics when we use generating functions to solve recurrence relations. The decomposition of a rational function as a sum of partial fractions leads to a system of linear equations. In Exercises 47-50, find the partial fraction decomposition of the given form. (The capital letters denote constants.)

47. \[ \frac{3x + 1}{x^2 + 2x - 3} = \frac{A}{x - 1} + \frac{B}{x + 3} \]

48. \[ \frac{x^2 - 3x + 3}{x^3 + 2x^2 + x} = \frac{A}{x} + \frac{B}{x + 1} + \frac{C}{(x + 1)^2} \]

49. \[ \frac{x - 1}{(x + 1)(x^2 + 1)(x^2 + 4)} = \frac{A}{x + 1} + \frac{Bx + C}{x^2 + 1} + \frac{Dx + E}{x^2 + 4} \]

50. \[ \frac{x^3 + x + 1}{x(x - 1)(x^2 + x + 1)(x^2 + 1)^3} = \frac{A}{x} + \frac{B}{x - 1} + \frac{Cx + D}{x^2 + x + 1} + \frac{Ex + F}{x^2 + 1} + \frac{Gx + H}{(x^2 + 1)^2} + \frac{Ix + J}{(x^2 + 1)^3} \]

Following are two useful formulas for the sums of powers of consecutive natural numbers:

\[ 1 + 2 + \cdots + n = \frac{n(n + 1)}{2} \]

and

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6} \]

The validity of these formulas for all values of \(n \ge 1\) (or even \(n \ge 0\)) can be established using mathematical induction (see Appendix B). One way to make an educated guess as to what the formulas are, though, is to observe that we can rewrite the two formulas above as

\[ \tfrac{1}{2}n^2 + \tfrac{1}{2}n \quad\text{and}\quad \tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n \]

respectively. This leads to the conjecture that the sum of \(p\)th powers of the first \(n\) natural numbers is a polynomial of degree \(p + 1\) in the variable \(n\).

51. Assuming that \(1 + 2 + \cdots + n = an^2 + bn + c\), find a, b, and c by substituting three values for n and thereby obtaining a system of linear equations in a, b, and c.

52. Assume that \(1^2 + 2^2 + \cdots + n^2 = an^3 + bn^2 + cn + d\). Find a, b, c, and d. [Hint: It is legitimate to use n = 0. What is the left-hand side in that case?]

53. Show that \(1^3 + 2^3 + \cdots + n^3 = \left(n(n + 1)/2\right)^2\).

Vignette: The Global Positioning System

This application is based on the article "An Underdetermined Linear System for GPS" by Dan Kalman in The College Mathematics Journal, 33 (2002), pp. 384-390. For a more in-depth treatment of the ideas introduced here, see G. Strang and K. Borre, Linear Algebra, Geodesy, and GPS (Wellesley-Cambridge Press, MA, 1997).

The Global Positioning System (GPS) is used in a variety of situations for determining geographical locations. The military, surveyors, airlines, shipping companies, and hikers all make use of it.
GPS technology is becoming so commonplace that some automobiles, cellular phones, and various handheld devices are now equipped with it.

The basic idea of GPS is a variant on three-dimensional triangulation: A point on Earth's surface is uniquely determined by knowing its distances from three other points. Here the point we wish to determine is the location of the GPS receiver, the other points are satellites, and the distances are computed using the travel times of radio signals from the satellites to the receiver.

We will assume that Earth is a sphere on which we impose an xyz-coordinate system with Earth centered at the origin and with the positive z-axis running through the north pole and fixed relative to Earth.

For simplicity, let's take one unit to be equal to the radius of Earth. Thus Earth's surface becomes the unit sphere with equation \(x^2 + y^2 + z^2 = 1\). Time will be measured in hundredths of a second. GPS finds distances by knowing how long it takes a radio signal to get from one point to another. For this we need to know the speed of light, which is approximately equal to 0.47 (Earth radii per hundredth of a second).

Let's imagine that you are a hiker lost in the woods at point (x, y, z) at some time t. You don't know where you are, and furthermore, you have no watch, so you don't know what time it is. However, you have your GPS device, and it receives simultaneous signals from four satellites, giving their positions and times as shown in Table 2.11. (Distances are measured in Earth radii and time in hundredths of a second past midnight.)

Table 2.11 Satellite Data
Satellite    Position               Time
1            (1.11, 2.55, 2.14)     1.29
2            (2.87, 0.00, 1.43)     1.31
3            (0.00, 1.08, 2.29)     2.75
4            (1.54, 1.01, 1.23)     4.06

Let (x, y, z) be your position, and let t be the time when the signals arrive. The goal is to solve for x, y, z, and t. Your distance from Satellite 1 can be computed as follows.
The signal, traveling at a speed of 0.47 Earth radii per \(10^{-2}\) second, was sent at time 1.29 and arrived at time t, so it took \(t - 1.29\) hundredths of a second to reach you. Distance equals velocity multiplied by (elapsed) time, so

\[ d = 0.47(t - 1.29) \]

We can also express d in terms of (x, y, z) and the satellite's position (1.11, 2.55, 2.14) using the distance formula:

\[ d = \sqrt{(x - 1.11)^2 + (y - 2.55)^2 + (z - 2.14)^2} \]

Combining these results leads to the equation

\[ (x - 1.11)^2 + (y - 2.55)^2 + (z - 2.14)^2 = 0.47^2(t - 1.29)^2 \tag{1} \]

Expanding, simplifying, and rearranging, we find that Equation (1) becomes

\[ 2.22x + 5.10y + 4.28z - 0.57t = x^2 + y^2 + z^2 - 0.22t^2 + 11.95 \]

Similarly, we can derive a corresponding equation for each of the other three satellites. We end up with a system of four equations in x, y, z, and t:

\[
\begin{aligned}
2.22x + 5.10y + 4.28z - 0.57t &= x^2 + y^2 + z^2 - 0.22t^2 + 11.95\\
5.74x + 2.86z - 0.58t &= x^2 + y^2 + z^2 - 0.22t^2 + 9.90\\
2.16y + 4.58z - 1.21t &= x^2 + y^2 + z^2 - 0.22t^2 + 4.74\\
3.08x + 2.02y + 2.46z - 1.79t &= x^2 + y^2 + z^2 - 0.22t^2 + 1.26
\end{aligned}
\]

These are not linear equations, but the nonlinear terms are the same in each equation. If we subtract the first equation from each of the other three equations, we obtain a linear system:

\[
\begin{aligned}
3.52x - 5.10y - 1.42z - 0.01t &= -2.05\\
-2.22x - 2.94y + 0.30z - 0.64t &= -7.21\\
0.86x - 3.08y - 1.82z - 1.22t &= -10.69
\end{aligned}
\]

The augmented matrix row reduces as

\[
\left[\begin{array}{cccc|c}
3.52 & -5.10 & -1.42 & -0.01 & -2.05\\
-2.22 & -2.94 & 0.30 & -0.64 & -7.21\\
0.86 & -3.08 & -1.82 & -1.22 & -10.69
\end{array}\right]
\longrightarrow
\left[\begin{array}{cccc|c}
1 & 0 & 0 & 0.36 & 2.97\\
0 & 1 & 0 & 0.03 & 0.81\\
0 & 0 & 1 & 0.79 & 5.91
\end{array}\right]
\]

from which we see that

\[ x = 2.97 - 0.36t,\qquad y = 0.81 - 0.03t,\qquad z = 5.91 - 0.79t \tag{2} \]

with t free. Substituting these equations into (1), we obtain

\[ (2.97 - 0.36t - 1.11)^2 + (0.81 - 0.03t - 2.55)^2 + (5.91 - 0.79t - 2.14)^2 = 0.47^2(t - 1.29)^2 \]

which simplifies to the quadratic equation

\[ 0.54t^2 - 6.65t + 20.32 = 0 \]

There are two solutions:

\[ t = 6.74 \quad\text{and}\quad t = 5.60 \]

Substituting into (2), we find that the first solution corresponds to (x, y, z) = (0.55, 0.61, 0.56) and the second solution to (x, y, z) = (0.96, 0.65, 1.46). The second solution is clearly not on the unit sphere (Earth), so we reject it. The first solution produces \(x^2 + y^2 + z^2 = 0.99\), so we are satisfied that, within acceptable roundoff error, we have located your coordinates as (0.55, 0.61, 0.56).

In practice, GPS takes significantly more factors into account, such as the fact that Earth's surface is not exactly spherical, so additional refinements are needed involving such techniques as least squares approximation (see Chapter 7). In addition, the results of the GPS calculation are converted from rectangular (Cartesian) coordinates into latitude and longitude, an interesting exercise in itself and one involving yet other branches of mathematics.

Iterative Methods for Solving Linear Systems

The direct methods for solving linear systems, using elementary row operations, lead to exact solutions in many cases but are subject to errors due to roundoff and other factors, as we have seen. The third road in our "trivium" takes us down quite a different path indeed. In this section, we explore methods that proceed iteratively by successively generating sequences of vectors that approach a solution to a linear system. In many instances (such as when the coefficient matrix is sparse, that is, contains many zero entries), iterative methods can be faster and more accurate than direct methods. Also, iterative methods can be stopped whenever the approximate solution they generate is sufficiently accurate. In addition, iterative methods often benefit from inaccuracy: Roundoff error can actually accelerate their convergence toward a solution.
We will explore two iterative methods for solving linear systems: Jacobi's method and a refinement of it, the Gauss-Seidel method. In all examples, we will be considering linear systems with the same number of variables as equations, and we will assume that there is a unique solution. Our interest is in finding this solution using iterative methods.

Carl Gustav Jacobi (1804-1851) was a German mathematician who made important contributions to many fields of mathematics and physics, including geometry, number theory, analysis, mechanics, and fluid dynamics. Although much of his work was in applied mathematics, Jacobi believed in the importance of doing mathematics for its own sake. A fine teacher, he held positions at the Universities of Berlin and Königsberg and was one of the most famous mathematicians in Europe.

Example 2.37 Consider the system

$$\begin{aligned} 7x_1 - x_2 &= 5 \\ 3x_1 - 5x_2 &= -7 \end{aligned}$$

Jacobi's method begins with solving the first equation for $x_1$ and the second equation for $x_2$, to obtain

$$x_1 = \frac{5 + x_2}{7}, \qquad x_2 = \frac{7 + 3x_1}{5} \qquad (1)$$

We now need an initial approximation to the solution. It turns out that it does not matter what this initial approximation is, so we might as well take $x_1 = 0$, $x_2 = 0$. We use these values in Equations (1) to get new values of $x_1$ and $x_2$:

$$x_1 = \frac{5 + 0}{7} = \frac{5}{7} \approx 0.714, \qquad x_2 = \frac{7 + 3 \cdot 0}{5} = \frac{7}{5} = 1.400$$

Now we substitute these values into (1) to get

$$x_1 = \frac{5 + 1.4}{7} \approx 0.914, \qquad x_2 = \frac{7 + 3 \cdot \frac{5}{7}}{5} \approx 1.829$$

(written to three decimal places). We repeat this process (using the old values of $x_2$ and $x_1$ to get the new values of $x_1$ and $x_2$), producing the sequence of approximations given in Table 2.12.

Table 2.12
$$\begin{array}{c|ccccccc} n & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline x_1 & 0 & 0.714 & 0.914 & 0.976 & 0.993 & 0.998 & 0.999 \\ x_2 & 0 & 1.400 & 1.829 & 1.949 & 1.985 & 1.996 & 1.999 \end{array}$$

The successive vectors $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ are called iterates, so, for example, when n = 4, the fourth iterate is $\begin{bmatrix} 0.993 \\ 1.985 \end{bmatrix}$. We can see that the iterates in this example are approaching $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, which is the exact solution of the given system. (Check this.) We say in this case that Jacobi's method converges.

Jacobi's method calculates the successive iterates in a two-variable system according to the crisscross pattern shown in Table 2.13.

Table 2.13 [Diagram: the crisscross pattern of Jacobi's method for n = 0 to 3]

The Gauss-Seidel method is named after C. F. Gauss and Philipp Ludwig von Seidel (1821-1896). Seidel worked in analysis, probability theory, astronomy, and optics. Unfortunately, he suffered from eye problems and retired at a young age. The paper in which he described the method now known as Gauss-Seidel was published in 1874. Gauss, it seems, was unaware of the method!

Before we consider Jacobi's method in the general case, we will look at a modification of it that often converges faster to the solution. The Gauss-Seidel method is the same as the Jacobi method except that we use each new value as soon as we can. So in our example, we begin by calculating $x_1 = (5 + 0)/7 = \frac{5}{7} \approx 0.714$ as before, but we now use this value of $x_1$ to get the next value of $x_2$:

$$x_2 = \frac{7 + 3 \cdot \frac{5}{7}}{5} = \frac{64}{35} \approx 1.829$$

We then use this value of $x_2$ to recalculate $x_1$, and so on. The iterates this time are shown in Table 2.14.

Table 2.14
$$\begin{array}{c|cccccc} n & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline x_1 & 0 & 0.714 & 0.976 & 0.998 & 1.000 & 1.000 \\ x_2 & 0 & 1.829 & 1.985 & 1.999 & 2.000 & 2.000 \end{array}$$

We observe that the Gauss-Seidel method has converged faster to the solution. The iterates this time are calculated according to the zigzag pattern shown in Table 2.15.

Table 2.15 [Diagram: the zigzag pattern of the Gauss-Seidel method for n = 0 to 3]

The Gauss-Seidel method also has a nice geometric interpretation in the case of two variables. We can think of $x_1$ and $x_2$ as the coordinates of points in the plane. Our starting point is the point corresponding to our initial approximation, (0, 0).
Our first calculation gives $x_1 = \frac{5}{7}$, so we move to the point $(\frac{5}{7}, 0) \approx (0.714, 0)$. Then we compute $x_2 = \frac{64}{35} \approx 1.829$, which moves us to the point $(\frac{5}{7}, \frac{64}{35}) \approx (0.714, 1.829)$. Continuing in this fashion, our calculations from the Gauss-Seidel method give rise to a sequence of points, each one differing from the preceding point in exactly one coordinate. If we plot the lines $7x_1 - x_2 = 5$ and $3x_1 - 5x_2 = -7$ corresponding to the two given equations, we find that the points calculated above fall alternately on the two lines, as shown in Figure 2.27. Moreover, they approach the point of intersection of the lines, which corresponds to the solution of the system of equations. This is what convergence means!

Figure 2.27 Converging iterates

The general cases of the two methods are analogous. Given a system of n linear equations in n variables,

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned} \qquad (2)$$

we solve the first equation for $x_1$, the second for $x_2$, and so on. Then, beginning with an initial approximation, we use these new equations to iteratively update each variable. Jacobi's method uses all of the values at the kth iteration to compute the (k + 1)st iterate, whereas the Gauss-Seidel method always uses the most recent value of each variable in every calculation. Example 2.39 later illustrates the Gauss-Seidel method in a three-variable problem.

At this point, you should have some questions and concerns about these iterative methods. (Do you?) Several come to mind: Must these methods converge? If not, when do they converge? If they converge, must they converge to the solution? The answer to the first question is no, as Example 2.38 illustrates.

Example 2.38 Apply the Gauss-Seidel method to the system

$$\begin{aligned} x_1 - x_2 &= 1 \\ 2x_1 + x_2 &= 5 \end{aligned}$$

with initial approximation $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

Solution
We rearrange the equations to get

$$x_1 = 1 + x_2, \qquad x_2 = 5 - 2x_1$$

The first few iterates are given in Table 2.16. (Check these.) The actual solution to the given system is $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. Clearly, the iterates in Table 2.16 are not approaching this point; Figure 2.28 shows this divergence graphically.

Table 2.16
$$\begin{array}{c|cccccc} n & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline x_1 & 0 & 1 & 4 & -2 & 10 & -14 \\ x_2 & 0 & 3 & -3 & 9 & -15 & 33 \end{array}$$

Figure 2.28 Diverging iterates

So when do these iterative methods converge? Unfortunately, the answer to this question is rather tricky. We will answer it completely in Chapter 7, but for now we will give a partial answer, without proof.

Let A be the n × n matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

We say that A is strictly diagonally dominant if

$$\begin{aligned} |a_{11}| &> |a_{12}| + |a_{13}| + \cdots + |a_{1n}| \\ |a_{22}| &> |a_{21}| + |a_{23}| + \cdots + |a_{2n}| \\ &\;\;\vdots \\ |a_{nn}| &> |a_{n1}| + |a_{n2}| + \cdots + |a_{n,n-1}| \end{aligned}$$

That is, the absolute value of each diagonal entry $a_{11}, a_{22}, \ldots, a_{nn}$ is greater than the sum of the absolute values of the remaining entries in that row.

Theorem 2.9 If a system of n linear equations in n variables has a strictly diagonally dominant coefficient matrix, then it has a unique solution and both the Jacobi and the Gauss-Seidel method converge to it.

Remark Be warned! This theorem is a one-way implication. The fact that a system is not strictly diagonally dominant does not mean that the iterative methods diverge. They may or may not converge. (See Exercises 15-19.) Indeed, there are examples in which one of the methods converges and the other diverges. However, if either of these methods converges, then it must converge to the solution; it cannot converge to some other point.

Theorem 2.10 If the Jacobi or the Gauss-Seidel method converges for a system of n linear equations in n variables, then it must converge to the solution of the system.
Proof We will illustrate the idea behind the proof by sketching it out for the case of Jacobi's method, using the system of equations in Example 2.37. The general proof is similar.

Convergence means that "as iterations increase, the values of the iterates get closer and closer to a limiting value." This means that $x_1$ and $x_2$ converge to r and s, respectively, as shown in Table 2.17.

Table 2.17
$$\begin{array}{c|cccccc} n & \cdots & k & k+1 & k+2 & \cdots \\ \hline x_1 & \cdots & r & r & r & \cdots \\ x_2 & \cdots & s & s & s & \cdots \end{array}$$

We must prove that $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} r \\ s \end{bmatrix}$ is the solution of the system of equations. In other words, at the (k + 1)st iteration, the values of $x_1$ and $x_2$ must stay the same as at the kth iteration. But the calculations give $x_1 = (5 + x_2)/7 = (5 + s)/7$ and $x_2 = (7 + 3x_1)/5 = (7 + 3r)/5$. Therefore,

$$\frac{5 + s}{7} = r \quad \text{and} \quad \frac{7 + 3r}{5} = s$$

Rearranging, we see that

$$\begin{aligned} 7r - s &= 5 \\ 3r - 5s &= -7 \end{aligned}$$

Thus, $x_1 = r$, $x_2 = s$ satisfy the original equations, as required.

By now you may be wondering: If iterative methods don't always converge to the solution, what good are they? Why don't we just use Gaussian elimination? First, we have seen that Gaussian elimination is sensitive to roundoff errors, and this sensitivity can lead to inaccurate or even wildly wrong answers. Also, even if Gaussian elimination does not go astray, we cannot improve on a solution once we have found it. For example, if we use Gaussian elimination to calculate a solution to two decimal places, there is no way to obtain the solution to four decimal places except to start over again and work with increased accuracy. In contrast, we can achieve additional accuracy with iterative methods simply by doing more iterations. For large systems, particularly those with sparse coefficient matrices, iterative methods can be much faster than direct methods when implemented on a computer. In many applications, the systems that arise are strictly diagonally dominant, and thus iterative methods are guaranteed to converge.
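The update rules above can be made concrete with a short sketch. The Python code below assumes NumPy is available; the function names `jacobi`, `gauss_seidel`, and `is_strictly_diagonally_dominant`, and the stopping rule (stop when two successive iterates agree within a tolerance in every variable, as in the exercises), are illustrative choices, not something prescribed by the text. In both methods equation i is solved for $x_i$, giving $x_i = \left(b_i - \sum_{j \ne i} a_{ij}x_j\right)/a_{ii}$; Jacobi evaluates the right-hand side using only the previous iterate, while Gauss-Seidel overwrites each $x_i$ as soon as it is computed. The calls at the bottom rerun Examples 2.37 and 2.38 and the heated-plate system of Example 2.39.

```python
import numpy as np


def is_strictly_diagonally_dominant(A):
    """True if |a_ii| exceeds the sum of |a_ij| (j != i) in every row (Theorem 2.9)."""
    A = np.abs(np.asarray(A, dtype=float))
    diag = np.diag(A)
    return bool(np.all(diag > A.sum(axis=1) - diag))


def jacobi(A, b, x0=None, tol=1e-3, max_iter=100):
    """Jacobi's method: every component of the new iterate uses only the old iterate."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    d = np.diag(A)
    R = A - np.diag(d)  # off-diagonal part of A
    iterates = [x.copy()]
    for _ in range(max_iter):
        x_new = (b - R @ x) / d  # x_i = (b_i - sum_{j != i} a_ij x_j) / a_ii
        iterates.append(x_new)
        if np.max(np.abs(x_new - x)) < tol:  # successive iterates agree within tol
            return x_new, iterates
        x = x_new
    return x, iterates  # tolerance not reached within max_iter


def gauss_seidel(A, b, x0=None, tol=1e-3, max_iter=100):
    """Gauss-Seidel: each new x_i is used as soon as it has been computed."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            # x[:i] already holds values from this sweep; x[i+1:] from the last one
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        iterates.append(x.copy())
        if np.max(np.abs(x - x_old)) < tol:
            return x, iterates
    return x, iterates


# Example 2.37: strictly diagonally dominant, so both methods converge to (1, 2).
A1, b1 = [[7, -1], [3, -5]], [5, -7]
x_jac, it_jac = jacobi(A1, b1)
x_gs, it_gs = gauss_seidel(A1, b1)
print(len(it_jac) - 1, len(it_gs) - 1)  # iterations used: 7 for Jacobi, 5 for Gauss-Seidel

# Example 2.38: not diagonally dominant; Gauss-Seidel diverges from (0, 0).
A2, b2 = [[1, -1], [2, 1]], [1, 5]
_, it_div = gauss_seidel(A2, b2, max_iter=5)

# Example 2.39 (heated plate): Equations (3) rewritten as a 3 x 3 system.
A3, b3 = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]], [250, 50, 200]
t, it_plate = gauss_seidel(A3, b3)
```

With this tolerance the Jacobi iterates for Example 2.37 reproduce Table 2.12 and the Gauss-Seidel iterates reproduce Table 2.14, and `it_div` reproduces Table 2.16. A production solver would more likely use a relative tolerance and a sparse matrix format; the dense loops here simply mirror the hand calculations.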
The next example illustrates one such application.

Example 2.39 Suppose we heat each edge of a metal plate to a constant temperature, as shown in Figure 2.29.

Figure 2.29 A heated metal plate

Eventually the temperature at the interior points will reach equilibrium, where the following property can be shown to hold:

The temperature at each interior point P on a plate is the average of the temperatures on the circumference of any circle centered at P inside the plate (Figure 2.30).

To apply this property in an actual example requires techniques from calculus. As an alternative, we can approximate the situation by overlaying the plate with a grid, or mesh, that has a finite number of interior points, as shown in Figure 2.31.

Figure 2.31 The discrete version of the heated plate problem

The discrete analogue of the averaging property governing equilibrium temperatures is stated as follows:

The temperature at each interior point P is the average of the temperatures at the points adjacent to P.

For the example shown in Figure 2.31, there are three interior points, and each is adjacent to four other points. Let the equilibrium temperatures of the interior points be $t_1$, $t_2$, and $t_3$, as shown. Then, by the temperature-averaging property, we have

$$\begin{aligned} t_1 &= \frac{100 + 100 + t_2 + 50}{4} \\ t_2 &= \frac{t_1 + t_3 + 0 + 50}{4} \\ t_3 &= \frac{100 + 100 + 0 + t_2}{4} \end{aligned} \qquad (3)$$

or

$$\begin{aligned} 4t_1 - t_2 &= 250 \\ -t_1 + 4t_2 - t_3 &= 50 \\ -t_2 + 4t_3 &= 200 \end{aligned}$$

Notice that this system is strictly diagonally dominant. Notice also that Equations (3) are in the form required for Jacobi or Gauss-Seidel iteration. With an initial approximation of $t_1 = 0$, $t_2 = 0$, $t_3 = 0$, the Gauss-Seidel method gives the following iterates.

Iteration 1:
$$\begin{aligned} t_1 &= \frac{100 + 100 + 0 + 50}{4} = 62.5 \\ t_2 &= \frac{62.5 + 0 + 0 + 50}{4} = 28.125 \\ t_3 &= \frac{100 + 100 + 0 + 28.125}{4} = 57.031 \end{aligned}$$

Iteration 2:
$$\begin{aligned} t_1 &= \frac{100 + 100 + 28.125 + 50}{4} = 69.531 \\ t_2 &= \frac{69.531 + 57.031 + 0 + 50}{4} = 44.141 \\ t_3 &= \frac{100 + 100 + 0 + 44.141}{4} = 61.035 \end{aligned}$$

Continuing, we find the iterates listed in Table 2.18. We work with five-significant-digit accuracy and stop when two successive iterates agree within 0.001 in all variables. Thus, the equilibrium temperatures at the interior points are (to an accuracy of 0.001) $t_1 = 74.107$, $t_2 = 46.429$, and $t_3 = 61.607$. (Check these calculations.)

By using a finer grid (with more interior points), we can get as precise information as we like about the equilibrium temperatures at various points on the plate.

Table 2.18
$$\begin{array}{c|cccccc} n & 0 & 1 & 2 & 3 & \cdots & 7 & 8 \\ \hline t_1 & 0 & 62.500 & 69.531 & 73.535 & \cdots & 74.107 & 74.107 \\ t_2 & 0 & 28.125 & 44.141 & 46.143 & \cdots & 46.429 & 46.429 \\ t_3 & 0 & 57.031 & 61.035 & 61.536 & \cdots & 61.607 & 61.607 \end{array}$$

Exercises 2.5

In Exercises 1-6, apply Jacobi's method to the given system. Take the zero vector as the initial approximation and work with four-significant-digit accuracy until two successive iterates agree within 0.001 in each variable. In each case, compare your answer with the exact solution found using any direct method you like.

1. $7x_1 - x_2 = 6$; $x_1 - 5x_2 = -4$
2. $2x_1 + x_2 = 5$; $x_1 - x_2 = 1$
3. $4.5x_1 - 0.5x_2 = 1$; $x_1 - 3.5x_2 = -1$
4. $20x_1 + x_2 - x_3 = 17$; $x_1 - 10x_2 + x_3 = 13$; $-x_1 + x_2 + 10x_3 = 18$
5. $3x_1 + x_2 = 1$; $x_1 + 4x_2 + x_3 = 1$; $x_2 + 3x_3 = 1$
6. $3x_1 - x_2 = 1$; $-x_1 + 3x_2 - x_3 = 0$; $-x_2 + 3x_3 - x_4 = \ldots$; $-x_3 + 3x_4 = 1$

In Exercises 7-12, repeat the given exercise using the Gauss-Seidel method. Take the zero vector as the initial approximation and work with four-significant-digit accuracy until two successive iterates agree within 0.001 in each variable. Compare the number of iterations required by the Jacobi and Gauss-Seidel methods to reach such an approximate solution.

7. Exercise 1
8. Exercise 2
9. Exercise 3
10. Exercise 4
11. Exercise 5
12. Exercise 6

In Exercises 13 and 14, draw diagrams to illustrate the convergence of the Gauss-Seidel method with the given system.

13. The system in Exercise 1
14. The system in Exercise 2

In Exercises 15 and 16, compute the first four iterates, using the zero vector as the initial approximation, to show that the Gauss-Seidel method diverges. Then show that the equations can be rearranged to give a strictly diagonally dominant coefficient matrix, and apply the Gauss-Seidel method to obtain an approximate solution that is accurate to within 0.001.

15. $x_1 - 2x_2 = 3$; $3x_1 + 2x_2 = 1$
16. $x_1 - 4x_2 + 2x_3 = 2$; $2x_2 + 4x_3 = 1$; $6x_1 - x_2 - 2x_3 = 1$

17. Draw a diagram to illustrate the divergence of the Gauss-Seidel method in Exercise 15.

In Exercises 18 and 19, the coefficient matrix is not strictly diagonally dominant, nor can the equations be rearranged to make it so. However, both the Jacobi and the Gauss-Seidel method converge anyway. Demonstrate that this is true of the Gauss-Seidel method, starting with the zero vector as the initial approximation and obtaining a solution that is accurate to within 0.01.

18. $-4x_1 + 5x_2 = 14$; $x_1 - 3x_2 = -7$
19. $5x_1 - 2x_2 + 3x_3 = -8$; $x_1 + 4x_2 - 4x_3 = 102$; $-2x_1 - 2x_2 + 4x_3 = -90$
20. Continue performing iterations in Exercise 18 to obtain a solution that is accurate to within 0.001.
21. Continue performing iterations in Exercise 19 to obtain a solution that is accurate to within 0.001.

In Exercises 22-24, the metal plate has the constant temperatures shown on its boundaries. Find the equilibrium temperature at each of the indicated interior points by setting up a system of linear equations and applying either the Jacobi or the Gauss-Seidel method. Obtain a solution that is accurate to within 0.001.

22. [Figure: plate with boundary temperatures of 0° and 50°]
23. [Figure: plate with boundary temperatures of 0° and 100°; interior points $t_1$, $t_2$, $t_3$]
24. [Figure: plate with boundary temperatures of 0°, 20°, 40°, and 100°; interior points $t_1$, $t_2$, $t_3$, $t_4$]

In Exercises 25 and 26, we refine the grids used in Exercises 22 and 24 to obtain more accurate information about the equilibrium temperatures at interior points of the plates. Obtain solutions that are accurate to within 0.001, using either the Jacobi or the Gauss-Seidel method.

25. [Figure: refined grid on the plate of Exercise 22, boundary temperatures of 0° and 50°; interior points $t_1, t_2, \ldots$]
26. [Figure: refined grid on the plate of Exercise 24, boundary temperatures of 0°, 20°, 40°, and 100°; interior points $t_1$ through $t_{16}$]

Exercises 27 and 28 demonstrate that sometimes, if we are lucky, the form of an iterative problem may allow us to use a little insight to obtain an exact solution.

27. A narrow strip of paper 1 unit long is placed along a number line so that its ends are at 0 and 1. The paper is folded in half, right end over left, so that its ends are now at 0 and $\frac{1}{2}$. Next, it is folded in half again, this time left end over right, so that its ends are at $\frac{1}{4}$ and $\frac{1}{2}$. Figure 2.32 shows this process. We continue folding the paper in half, alternating right-over-left and left-over-right. If we could continue indefinitely, it is clear that the ends of the paper would converge to a point. It is this point that we want to find.
(a) Let $x_1$ correspond to the left-hand end of the paper and $x_2$ to the right-hand end. Make a table with the first six values of $[x_1, x_2]$ and plot the corresponding points on $x_1, x_2$ coordinate axes.
(b) Find two linear equations of the form $x_2 = ax_1 + b$ and $x_1 = cx_2 + d$ that determine the new values of the endpoints at each iteration. Draw the corresponding lines on your coordinate axes and show that this diagram would result from applying the Gauss-Seidel method to the system of linear equations you have found. (Your diagram should resemble Figure 2.27 on page 126.)
(c) Switching to decimal representation, continue applying the Gauss-Seidel method to approximate the point to which the ends of the paper are converging to within 0.001 accuracy.
(d) Solve the system of equations exactly and compare your answers.

28. An ant is standing on a number line at point A. It walks halfway to point B and turns around. Then it walks halfway back to point A, turns around again, and walks halfway to point B. It continues to do this indefinitely. Let point A be at 0 and point B be at 1. The ant's walk is made up of a sequence of overlapping line segments. Let $x_1$ record the positions of the left-hand endpoints of these segments and $x_2$ their right-hand endpoints. (Thus, we begin with $x_1 = 0$ and $x_2 = \frac{1}{2}$. Then we have $x_1 = \frac{1}{4}$ and $x_2 = \frac{1}{2}$, and so on.) Figure 2.33 shows the start of the ant's walk.
(a) Make a table with the first six values of $[x_1, x_2]$ and plot the corresponding points on $x_1, x_2$ coordinate axes.
(b) Find two linear equations of the form $x_2 = ax_1 + b$ and $x_1 = cx_2 + d$ that determine the new values of the endpoints at each iteration. Draw the corresponding lines on your coordinate axes and show that this diagram would result from applying the Gauss-Seidel method to the system of linear equations you have found. (Your diagram should resemble Figure 2.27 on page 126.)
(c) Switching to decimal representation, continue applying the Gauss-Seidel method to approximate the values to which $x_1$ and $x_2$ are converging to within 0.001 accuracy.
(d) Solve the system of equations exactly and compare your answers. Interpret your results.

Figure 2.32 Folding a strip of paper
Figure 2.33 The ant's walk
Chapter Review

Key Definitions and Concepts
augmented matrix
back substitution
coefficient matrix
consistent system
convergence
divergence
elementary row operations
free variable
Gauss-Jordan elimination
Gauss-Seidel method
Gaussian elimination
homogeneous system
inconsistent system
iterate
Jacobi's method
leading variable (leading 1)
linear equation
linearly dependent vectors
linearly independent vectors
pivot
rank of a matrix
Rank Theorem
reduced row echelon form
row echelon form
row equivalent matrices
span of a set of vectors
spanning set
system of linear equations

Review Questions

1. Mark each of the following statements true or false:
(a) Every system of linear equations has a solution.
(b) Every homogeneous system of linear equations has a solution.
(c) If a system of linear equations has more variables than equations, then it has infinitely many solutions.
(d) If a system of linear equations has more equations than variables, then it has no solution.
(e) Determining whether $\mathbf{b}$ is in $\text{span}(\mathbf{a}_1, \ldots, \mathbf{a}_n)$ is equivalent to determining whether the system $[A \mid \mathbf{b}]$ is consistent, where $A = [\mathbf{a}_1 \cdots \mathbf{a}_n]$.
(f) In $\mathbb{R}^3$, $\text{span}(\mathbf{u}, \mathbf{v})$ is always a plane through the origin.
(g) In $\mathbb{R}^3$, if nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ are not parallel, then they are linearly independent.
(h) In $\mathbb{R}^3$, if a set of vectors can be drawn head to tail, one after the other so that a closed path (polygon) is formed, then the vectors are linearly dependent.
(i) If a set of vectors has the property that no two vectors in the set are scalar multiples of one another, then the set of vectors is linearly independent.
(j) If there are more vectors in a set of vectors than the number of entries in each vector, then the set of vectors is linearly dependent.
2. Find the rank of the matrix … .
3. Solve the linear system
$$\begin{aligned} x + y - 2z &= 4 \\ x + 3y - z &= 7 \\ 2x + y - 5z &= 7 \end{aligned}$$
4. Solve the linear system
$$\begin{aligned} -3w + 8x - 18y + z &= 35 \\ w + 2x - 4y &= 11 \\ w + 3x - 7y + z &= 10 \end{aligned}$$
5. Solve the linear system
$$\begin{aligned} 2x + 3y &= 4 \\ x + 2y &= 3 \end{aligned}$$
over $\mathbb{Z}_7$.
6. Solve the linear system
$$\begin{aligned} 3x + 2y &= 1 \\ x + 4y &= 2 \end{aligned}$$
over $\mathbb{Z}_5$.
7. For what value(s) of k is the linear system with augmented matrix … inconsistent?
8. Find parametric equations for the line of intersection of the planes $x + 2y + 3z = 4$ and $5x + 6y + 7z = 8$.
9. Find the point of intersection of the following lines, if it exists. …
11. Find the general equation of the plane spanned by … and … .
12. Determine whether … are linearly independent.
13. Determine whether $\mathbb{R}^3 = \text{span}(\mathbf{u}, \mathbf{v}, \mathbf{w})$ if:
(a) $\mathbf{u} = \ldots$, $\mathbf{v} = \ldots$, $\mathbf{w} = \ldots$
(b) $\mathbf{u} = \ldots$, $\mathbf{v} = \ldots$, $\mathbf{w} = \ldots$
14. Let $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ be linearly independent vectors in $\mathbb{R}^3$, and let $A = [\mathbf{a}_1\ \mathbf{a}_2\ \mathbf{a}_3]$. Which of the following statements are true?
(a) The reduced row echelon form of A is $I_3$.
(b) The rank of A is 3.
(c) The system $[A \mid \mathbf{b}]$ has a unique solution for any vector $\mathbf{b}$ in $\mathbb{R}^3$.
(d) (a), (b), and (c) are all true.
(e) (a) and (b) are both true, but not (c).
15. Let $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ be linearly dependent vectors in $\mathbb{R}^3$, not all zero, and let $A = [\mathbf{a}_1\ \mathbf{a}_2\ \mathbf{a}_3]$. What are the possible values of the rank of A?
16. What is the maximum rank of a 5 × 3 matrix? What is the minimum rank of a 5 × 3 matrix?
17. Show that if $\mathbf{u}$ and $\mathbf{v}$ are linearly independent vectors, then so are $\mathbf{u} + \mathbf{v}$ and $\mathbf{u} - \mathbf{v}$.
18. Show that $\text{span}(\mathbf{u}, \mathbf{v}) = \text{span}(\mathbf{u}, \mathbf{u} + \mathbf{v})$ for any vectors $\mathbf{u}$ and $\mathbf{v}$.
19. In order for a linear system with augmented matrix $[A \mid \mathbf{b}]$ to be consistent, what must be true about the ranks of A and $[A \mid \mathbf{b}]$?
20. Are the matrices … and … row equivalent? Why or why not?

3 Matrices

We [Halmos and Kaplansky] share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury.
Irving Kaplansky, in Paul Halmos: Celebrating 50 Years of Mathematics, J. H. Ewing and F. W. Gehring, eds. (Springer-Verlag, 1991), p. 88

3.0 Introduction: Matrices in Action

In this chapter, we will study matrices in their own right. We have already used matrices, in the form of augmented matrices, to record information about and to help streamline calculations involving systems of linear equations. Now you will see that matrices have algebraic properties of their own, which enable us to calculate with them, subject to the rules of matrix algebra. Furthermore, you will observe that matrices are not static objects, recording information and data; rather, they represent certain types of functions that "act" on vectors, transforming them into other vectors. These "matrix transformations" will begin to play a key role in our study of linear algebra and will shed new light on what you have already learned about vectors and systems of linear equations. Furthermore, matrices arise in many forms other than augmented matrices; we will explore some of the many applications of matrices at the end of this chapter.

In this section, we will consider a few simple examples to illustrate how matrices can transform vectors. In the process, you will get your first glimpse of "matrix arithmetic."

Consider the equations

$$\begin{aligned} y_1 &= x_1 + 2x_2 \\ y_2 &= 3x_2 \end{aligned} \qquad (1)$$

We can view these equations as describing a transformation of the vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ into the vector $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$. If we denote the matrix of coefficients of the right-hand side by F, then $F = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$, and we can rewrite the transformation as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

or, more succinctly, $\mathbf{y} = F\mathbf{x}$. [Think of this expression as analogous to the functional notation $y = f(x)$ you are used to: $\mathbf{x}$ is the independent "variable" here, $\mathbf{y}$ is the dependent "variable," and F is the name of the "function."]

Thus, if $\mathbf{x} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$, then Equations (1) give

$$\begin{aligned} y_1 &= -2 + 2 \cdot 1 = 0 \\ y_2 &= 3 \cdot 1 = 3 \end{aligned} \qquad \text{or} \qquad \mathbf{y} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$$

We can write this expression as

$$\begin{bmatrix} 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} -2 \\ 1 \end{bmatrix}$$

Problem 1 Compute $F\mathbf{x}$ for the following vectors $\mathbf{x}$:
(a) $\mathbf{x} = \ldots$ (b) $\mathbf{x} = \ldots$ (c) $\mathbf{x} = \ldots$ (d) $\mathbf{x} = \ldots$

Problem 2 The heads of the four vectors $\mathbf{x}$ in Problem 1 locate the four corners of a square in the $x_1x_2$ plane. Draw this square and label its corners A, B, C, and D, corresponding to parts (a), (b), (c), and (d) of Problem 1.

On separate coordinate axes (labeled $y_1$ and $y_2$), draw the four points determined by $F\mathbf{x}$ in Problem 1. Label these points A′, B′, C′, and D′. Let's make the (reasonable) assumption that the line segment AB is transformed into the line segment A′B′, and likewise for the other three sides of the square ABCD. What geometric figure is represented by A′B′C′D′?

Problem 3 The center of square ABCD is the origin $\mathbf{0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. What is the center of A′B′C′D′? What algebraic calculation confirms this?

Now consider the equations

$$\begin{aligned} z_1 &= y_1 - y_2 \\ z_2 &= -2y_1 \end{aligned} \qquad (2)$$

that transform a vector $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ into the vector $\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$. We can abbreviate this transformation as $\mathbf{z} = G\mathbf{y}$, where

$$G = \begin{bmatrix} 1 & -1 \\ -2 & 0 \end{bmatrix}$$

Problem 4 We are going to find out how G transforms the figure A′B′C′D′. Compute $G\mathbf{y}$ for each of the four vectors $\mathbf{y}$ that you computed in Problem 1. [That is, compute $\mathbf{z} = G(F\mathbf{x})$. You may recognize this expression as being analogous to the composition of functions with which you are familiar.] Call the corresponding points A″, B″, C″, and D″, and sketch the figure A″B″C″D″ on $z_1z_2$ coordinate axes.

Problem 5 By substituting Equations (1) into Equations (2), obtain equations for $z_1$ and $z_2$ in terms of $x_1$ and $x_2$. If we denote the matrix of these equations by H, then we have $\mathbf{z} = H\mathbf{x}$. Since we also have $\mathbf{z} = GF\mathbf{x}$, it is reasonable to write

$$H = GF$$

Can you see how the entries of H are related to the entries of F and G?

Problem 6 Let's do the above process the other way around: First transform the square ABCD, using G, to obtain figure A*B*C*D*.
Then transform the resulting figure, using F, to obtain A**B**C**D**. [Note: Don't worry about the "variables" $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ here. Simply substitute the coordinates of A, B, C, and D into Equations (2) and then substitute the results into Equations (1).] Are A**B**C**D** and A″B″C″D″ the same? What does this tell you about the order in which we perform the transformations F and G?

Problem 7 Repeat Problem 5 with general matrices

$$F = \begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix}, \quad G = \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}, \quad \text{and} \quad H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}$$

That is, if Equations (1) and Equations (2) have coefficients as specified by F and G, find the entries of H in terms of the entries of F and G. The result will be a formula for the "product" H = GF.

Problem 8 Repeat Problems 1-6 with the following matrices. (Your formula from Problem 7 may help to speed up the algebraic calculations.) Note any similarities or differences that you think are significant.
(a) F = …, G = …
(b) F = …, G = …
(c) F = …, G = …
(d) F = …, G = …

3.1 Matrix Operations

Although we have already encountered matrices, we begin by stating a formal definition and recording some facts for future reference.

Definition A matrix is a rectangular array of numbers called the entries, or elements, of the matrix.

Although numbers will usually be chosen from the set $\mathbb{R}$ of real numbers, they may also be taken from the set $\mathbb{C}$ of complex numbers or from $\mathbb{Z}_p$, where p is prime.

Technically, there is a distinction between row/column matrices and vectors, but we will not belabor this distinction. We will, however, distinguish between row matrices/vectors and column matrices/vectors. This distinction is important, at the very least, for algebraic computations, as we will demonstrate.
The following are all examples of matrices: … , $[7]$

The size of a matrix is a description of the numbers of rows and columns it has. A matrix is called m × n (pronounced "m by n") if it has m rows and n columns. Thus, the examples above are matrices of sizes 2 × 2, 2 × 3, 3 × 1, 1 × 4, 3 × 3, and 1 × 1, respectively. A 1 × m matrix is called a row matrix (or row vector), and an n × 1 matrix is called a column matrix (or column vector).

We use double-subscript notation to refer to the entries of a matrix A. The entry of A in row i and column j is denoted by $a_{ij}$. Thus, if A = … , then $a_{13} = -1$ and $a_{22} = 5$. (The notation $A_{ij}$ is sometimes used interchangeably with $a_{ij}$.) We can therefore compactly denote a matrix A by $[a_{ij}]$ (or $[a_{ij}]_{m \times n}$ if it is important to specify the size of A, although the size will usually be clear from the context).

With this notation, a general m × n matrix A has the form

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

If the columns of A are the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$, then we may represent A as

$$A = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n]$$

If the rows of A are $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_m$, then we may represent A as

$$A = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \\ \vdots \\ \mathbf{A}_m \end{bmatrix}$$

The diagonal entries of A are $a_{11}, a_{22}, a_{33}, \ldots$, and if m = n (that is, if A has the same number of rows as columns), then A is called a square matrix. A square matrix whose nondiagonal entries are all zero is called a diagonal matrix. A diagonal matrix all of whose diagonal entries are the same is called a scalar matrix. If the scalar on the diagonal is 1, the scalar matrix is called an identity matrix.

For example, let A, B, and C be the matrices … , and let

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The diagonal entries of A are 2 and 4, but A is not square; B is a square matrix of size 2 × 2 with diagonal entries 3 and 5; C is a diagonal matrix; D is a 3 × 3 identity matrix. The n × n identity matrix is denoted by $I_n$ (or simply I if its size is understood).
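The vocabulary above maps directly onto array operations. The sketch below assumes NumPy; the predicate names `is_square`, `is_diagonal`, `is_scalar_matrix`, and `is_identity` are hypothetical helpers written for illustration, and the matrix `M` is an arbitrary example, not one from the text. Note that NumPy counts rows and columns from 0, so the text's entry $a_{ij}$ is `M[i - 1, j - 1]`.

```python
import numpy as np


def is_square(A):
    """Same number of rows as columns (m = n)."""
    return A.ndim == 2 and A.shape[0] == A.shape[1]


def is_diagonal(A):
    """Square matrix whose nondiagonal entries are all zero."""
    return is_square(A) and np.array_equal(A, np.diag(np.diag(A)))


def is_scalar_matrix(A):
    """Diagonal matrix whose diagonal entries are all the same."""
    return is_diagonal(A) and bool(np.all(np.diag(A) == A[0, 0]))


def is_identity(A):
    """Scalar matrix with 1 on the diagonal, i.e. I_n."""
    return is_square(A) and np.array_equal(A, np.eye(A.shape[0]))


M = np.array([[1, 2, 3],
              [4, 5, 6]])      # an arbitrary 2 x 3 matrix (not square)
m, n = M.shape                  # m = 2 rows, n = 3 columns
a_13 = M[0, 2]                  # the text's a_13, shifted to 0-based indices
row_2 = M[1, :]                 # second row, A_2 in the text's notation
col_3 = M[:, 2]                 # third column, a_3 in the text's notation
diag_entries = np.diag(M)       # a_11, a_22 (defined even when M is not square)

# Row and column matrices with the same entries still have different sizes.
R = np.array([[1, 4, 3]])       # 1 x 3 row matrix
C = np.array([[1], [4], [3]])   # 3 x 1 column matrix
print(R.shape, C.shape, np.array_equal(R, C))
```

`np.array_equal` returns `False` whenever the shapes differ, which matches the convention that a row matrix and a column matrix with the same entries are different matrices.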
Since we can view matrices as generalizations of vectors, many of the conventions and operations for vectors carry through to matrices in an obvious way. Indeed, matrices can and should be thought of as being made up of both row and column vectors.

Two matrices are equal if they have the same size and if their corresponding entries are equal. Thus, if $A = [a_{ij}]_{m \times n}$ and $B = [b_{ij}]_{r \times s}$, then $A = B$ if and only if $m = r$ and $n = s$ and $a_{ij} = b_{ij}$ for all $i$ and $j$.

Example 3.1 Consider the matrices

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 0 \\ 5 & 3 \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 2 & 0 & x \\ 5 & 3 & y \end{bmatrix}$$

Neither $A$ nor $B$ can be equal to $C$ (no matter what the values of $x$ and $y$), since $A$ and $B$ are $2 \times 2$ matrices and $C$ is $2 \times 3$. However, $A = B$ if and only if $a = 2$, $b = 0$, $c = 5$, and $d = 3$.

Example 3.2 Consider the matrices

$$R = [1 \ 4 \ 3] \quad \text{and} \quad C = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}$$

Despite the fact that $R$ and $C$ have the same entries in the same order, $R \neq C$ since $R$ is $1 \times 3$ and $C$ is $3 \times 1$. (If we read $R$ and $C$ aloud, they both sound the same: "one, four, three.") Thus, our distinction between row matrices/vectors and column matrices/vectors is an important one.

Matrix Addition and Scalar Multiplication

Generalizing from vector addition, we define matrix addition componentwise. If $A = [a_{ij}]$ and $B = [b_{ij}]$ are $m \times n$ matrices, their sum $A + B$ is the $m \times n$ matrix obtained by adding the corresponding entries. Thus,

$$A + B = [a_{ij} + b_{ij}]$$

[We could equally well have defined $A + B$ in terms of vector addition by specifying that each column (or row) of $A + B$ is the sum of the corresponding columns (or rows) of $A$ and $B$.] If $A$ and $B$ are not the same size, then $A + B$ is not defined.

Example 3.3 Let

$$A = \begin{bmatrix} 1 & 4 & 0 \\ -2 & 6 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} -3 & 1 & -1 \\ 3 & 0 & 2 \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}$$

Then

$$A + B = \begin{bmatrix} -2 & 5 & -1 \\ 1 & 6 & 7 \end{bmatrix}$$

but neither $A + C$ nor $B + C$ is defined.

The componentwise definition of scalar multiplication will come as no surprise. If $A$ is an $m \times n$ matrix and $c$ is a scalar, then the scalar multiple $cA$ is the $m \times n$ matrix obtained by multiplying each entry of $A$ by $c$.
More formally, we have

$$cA = c[a_{ij}] = [c\,a_{ij}]$$

[In terms of vectors, we could equivalently stipulate that each column (or row) of $cA$ is $c$ times the corresponding column (or row) of $A$.]

Example 3.4 For matrix $A$ in Example 3.3,

$$2A = \begin{bmatrix} 2 & 8 & 0 \\ -4 & 12 & 10 \end{bmatrix}, \quad \tfrac{1}{2}A = \begin{bmatrix} \tfrac{1}{2} & 2 & 0 \\ -1 & 3 & \tfrac{5}{2} \end{bmatrix}, \quad \text{and} \quad (-1)A = \begin{bmatrix} -1 & -4 & 0 \\ 2 & -6 & -5 \end{bmatrix}$$

The matrix $(-1)A$ is written as $-A$ and called the negative of $A$. As with vectors, we can use this fact to define the difference of two matrices: If $A$ and $B$ are the same size, then

$$A - B = A + (-B)$$

Example 3.5 For matrices $A$ and $B$ in Example 3.3,

$$A - B = \begin{bmatrix} 1 & 4 & 0 \\ -2 & 6 & 5 \end{bmatrix} - \begin{bmatrix} -3 & 1 & -1 \\ 3 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 3 & 1 \\ -5 & 6 & 3 \end{bmatrix}$$

A matrix all of whose entries are zero is called a zero matrix and denoted by $O$, or by $O_{m \times n}$ if it is important to specify its size. It should be clear that if $A$ is any matrix and $O$ is the zero matrix of the same size, then

$$A + O = A = O + A \quad \text{and} \quad A - A = O = -A + A$$

Matrix Multiplication

Mathematicians are sometimes like Lewis Carroll's Humpty Dumpty: "When I use a word," Humpty Dumpty said, "it means just what I choose it to mean, neither more nor less" (from Through the Looking Glass).

The Introduction in Section 3.0 suggested that there is a "product" of matrices that is analogous to the composition of functions. We now make this notion more precise. The definition we are about to give generalizes what you should have discovered in Problems 5 and 7 in Section 3.0.

Unlike the definitions of matrix addition and scalar multiplication, the definition of the product of two matrices is not a componentwise definition. Of course, there is nothing to stop us from defining a product of matrices in a componentwise fashion. Unfortunately, such a definition has few applications and is not as "natural" as the one we now give.

Definition If $A$ is an $m \times n$ matrix and $B$ is an $n \times r$ matrix, then the product $C = AB$ is an $m \times r$ matrix. The $(i, j)$ entry of the product is computed as follows:

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$

Remarks

• Notice that $A$ and $B$ need not be the same size. However, the number of columns of $A$ must be the same as the number of rows of $B$.
If we write the sizes of $A$, $B$, and $AB$ in order, we can see at a glance whether this requirement is satisfied. Moreover, we can predict the size of the product before doing any calculations. The number of rows of $AB$ is the same as the number of rows of $A$, and the number of columns of $AB$ is the same as the number of columns of $B$, as shown below:

$$\underset{m \times n}{A} \ \ \underset{n \times r}{B} \ = \ \underset{m \times r}{AB}$$

The inner sizes must match, and the outer sizes give the size of $AB$.

• The formula for the entries of the product looks like a dot product, and indeed it is. It says that the $(i, j)$ entry of the matrix $AB$ is the dot product of the $i$th row of $A$ and the $j$th column of $B$:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1r} \\ b_{21} & \cdots & b_{2j} & \cdots & b_{2r} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{nr} \end{bmatrix}$$

Notice the pattern of subscripts in the expression $c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$. The "outer subscripts" on each $ab$ term in the sum are always $i$ and $j$. The "inner subscripts" always agree and increase from 1 to $n$. We see this pattern clearly if we write $c_{ij}$ using summation notation:

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$

Example 3.6 Compute $AB$ if

$$A = \begin{bmatrix} 1 & 3 & -1 \\ -2 & -1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -4 & 0 & 3 & -1 \\ 5 & -2 & -1 & 1 \\ -1 & 2 & 0 & 6 \end{bmatrix}$$

Solution Since $A$ is $2 \times 3$ and $B$ is $3 \times 4$, the product $AB$ is defined and will be a $2 \times 4$ matrix. The first row of the product $C = AB$ is computed by taking the dot product of the first row of $A$ with each of the columns of $B$ in turn.
Thus,

$$\begin{aligned} c_{11} &= 1(-4) + 3(5) + (-1)(-1) = 12 \\ c_{12} &= 1(0) + 3(-2) + (-1)(2) = -8 \\ c_{13} &= 1(3) + 3(-1) + (-1)(0) = 0 \\ c_{14} &= 1(-1) + 3(1) + (-1)(6) = -4 \end{aligned}$$

The second row of $C$ is computed by taking the dot product of the second row of $A$ with each of the columns of $B$ in turn:

$$\begin{aligned} c_{21} &= (-2)(-4) + (-1)(5) + (1)(-1) = 2 \\ c_{22} &= (-2)(0) + (-1)(-2) + (1)(2) = 4 \\ c_{23} &= (-2)(3) + (-1)(-1) + (1)(0) = -5 \\ c_{24} &= (-2)(-1) + (-1)(1) + (1)(6) = 7 \end{aligned}$$

Thus, the product matrix is given by

$$AB = \begin{bmatrix} 12 & -8 & 0 & -4 \\ 2 & 4 & -5 & 7 \end{bmatrix}$$

With a little practice, you should be able to do these calculations mentally without writing out all of the details as we have done here. For more complicated examples, a calculator with matrix capabilities or a computer algebra system is preferable.

Before we go further, we will consider two examples that justify our chosen definition of matrix multiplication.

Example 3.7 Ann and Bert are planning to go shopping for fruit for the next week. They each want to buy some apples, oranges, and grapefruit, but in differing amounts. Table 3.1 lists what they intend to buy. There are two fruit markets nearby, Sam's and Theo's, and their prices are given in Table 3.2. How much will it cost Ann and Bert to do their shopping at each of the two markets?

Table 3.1 (number of each fruit)
              Ann    Bert
Apples         6       4
Grapefruit     3       8
Oranges       10       5

Table 3.2 (price per fruit, in dollars)
              Sam's   Theo's
Apple         0.10    0.15
Grapefruit    0.40    0.30
Orange        0.10    0.20

Solution If Ann shops at Sam's, she will spend

$$6(0.10) + 3(0.40) + 10(0.10) = \$2.80$$

If she shops at Theo's, she will spend

$$6(0.15) + 3(0.30) + 10(0.20) = \$3.80$$

Bert will spend $4(0.10) + 8(0.40) + 5(0.10) = \$4.10$ at Sam's and $4(0.15) + 8(0.30) + 5(0.20) = \$4.00$ at Theo's. (Presumably, Ann will shop at Sam's while Bert goes to Theo's.)

The "dot product form" of these calculations suggests that matrix multiplication is at work here.
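The entry formula $c_{ij} = \sum_{k} a_{ik}b_{kj}$ translates almost word for word into code. The sketch below stores matrices as Python lists of rows. `mat_mul` is a hypothetical helper name. The sketch reproduces the product from Example 3.6 and the shopping costs just computed.

```python
def mat_mul(A, B):
    """Return AB, where c_ij = sum over k of a_ik * b_kj.

    Raises ValueError unless the number of columns of A equals
    the number of rows of B.
    """
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError(f"cannot multiply {m}x{n} by {len(B)}x{len(B[0])}")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(m)]


# Example 3.6: a 2x3 matrix times a 3x4 matrix gives a 2x4 matrix.
A = [[1, 3, -1],
     [-2, -1, 1]]
B = [[-4, 0, 3, -1],
     [5, -2, -1, 1],
     [-1, 2, 0, 6]]
print(mat_mul(A, B))  # [[12, -8, 0, -4], [2, 4, -5, 7]]

# Example 3.7: demand (rows: Ann, Bert) times prices (columns: Sam's, Theo's).
D = [[6, 3, 10],
     [4, 8, 5]]
P = [[0.10, 0.15],
     [0.40, 0.30],
     [0.10, 0.20]]
costs = mat_mul(D, P)
print([[round(c, 2) for c in row] for row in costs])  # [[2.8, 3.8], [4.1, 4.0]]
```

The size check mirrors the first Remark: the product exists only when the inner sizes agree. Rounding is needed only because decimal prices are not exact in binary floating point.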
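If we organize the given information into a demand matrix $D$ and a price matrix $P$, we have

$$D = \begin{bmatrix} 6 & 3 & 10 \\ 4 & 8 & 5 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0.10 & 0.15 \\ 0.40 & 0.30 \\ 0.10 & 0.20 \end{bmatrix}$$

The calculations above are equivalent to computing the product

$$DP = \begin{bmatrix} 6 & 3 & 10 \\ 4 & 8 & 5 \end{bmatrix} \begin{bmatrix} 0.10 & 0.15 \\ 0.40 & 0.30 \\ 0.10 & 0.20 \end{bmatrix} = \begin{bmatrix} 2.80 & 3.80 \\ 4.10 & 4.00 \end{bmatrix}$$

Thus, the product matrix $DP$ tells us how much each person's purchases will cost at each store (Table 3.3).

Table 3.3 (total cost, in dollars)
          Sam's   Theo's
Ann       2.80    3.80
Bert      4.10    4.00

Example 3.8 Consider the linear system

$$\begin{aligned} x_1 - 2x_2 + 3x_3 &= 5 \\ -x_1 + 3x_2 + x_3 &= 1 \\ 2x_1 - x_2 + 4x_3 &= 14 \end{aligned} \tag{1}$$

Observe that the left-hand side arises from the matrix product

$$\begin{bmatrix} 1 & -2 & 3 \\ -1 & 3 & 1 \\ 2 & -1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

so the system (1) can be written as

$$\begin{bmatrix} 1 & -2 & 3 \\ -1 & 3 & 1 \\ 2 & -1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ 14 \end{bmatrix}$$

or $A\mathbf{x} = \mathbf{b}$. Here $A$ is the coefficient matrix, $\mathbf{x}$ is the (column) vector of variables, and $\mathbf{b}$ is the (column) vector of constant terms.

You should have no difficulty seeing that every linear system can be written in the form $A\mathbf{x} = \mathbf{b}$. In fact, the notation $[A \mid \mathbf{b}]$ for the augmented matrix of a linear system is just shorthand for the matrix equation $A\mathbf{x} = \mathbf{b}$. This form will prove to be a tremendously useful way of expressing a system of linear equations, and we will exploit it often from here on.

Combining this insight with Theorem 2.4, we see that $A\mathbf{x} = \mathbf{b}$ has a solution if and only if $\mathbf{b}$ is a linear combination of the columns of $A$.

There is another fact about matrix operations that will also prove to be quite useful. Multiplication of a matrix by a standard unit vector can be used to "pick out" or "reproduce" a column or row of a matrix. Let

$$A = \begin{bmatrix} 4 & 2 & 1 \\ 0 & 5 & -1 \end{bmatrix}$$

and consider the products $A\mathbf{e}_3$ and $\mathbf{e}_2A$, with the unit vectors $\mathbf{e}_3$ and $\mathbf{e}_2$ chosen so that the products make sense. Thus,

$$A\mathbf{e}_3 = \begin{bmatrix} 4 & 2 & 1 \\ 0 & 5 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad \text{and} \quad \mathbf{e}_2A = [0 \ 1] \begin{bmatrix} 4 & 2 & 1 \\ 0 & 5 & -1 \end{bmatrix} = [0 \ 5 \ -1]$$

Notice that $A\mathbf{e}_3$ gives us the third column of $A$, and $\mathbf{e}_2A$ gives us the second row of $A$. We record the general result as a theorem.

Theorem 3.1 Let $A$ be an $m \times n$ matrix, $\mathbf{e}_i$ a $1 \times m$ standard unit vector, and $\mathbf{e}_j$ an $n \times 1$ standard unit vector. Then

a. $\mathbf{e}_iA$ is the $i$th row of $A$, and
b. $A\mathbf{e}_j$ is the $j$th column of $A$.

Proof We prove (b) and leave proving (a) as Exercise 41. If $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are the columns of $A$, then the product $A\mathbf{e}_j$ can be written

$$A\mathbf{e}_j = 0\mathbf{a}_1 + 0\mathbf{a}_2 + \cdots + 1\mathbf{a}_j + \cdots + 0\mathbf{a}_n = \mathbf{a}_j$$

We could also prove (b) by direct calculation:

$$A\mathbf{e}_j = \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{m1} & \cdots & a_{mj} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$

since the 1 in $\mathbf{e}_j$ is the $j$th entry.

Partitioned Matrices

It will often be convenient to regard a matrix as being composed of a number of smaller submatrices. By introducing vertical and horizontal lines into a matrix, we can partition it into blocks. There is a natural way to partition many matrices, particularly those arising in certain applications. For example, consider the matrix

$$A = \begin{bmatrix} 1 & 0 & 0 & 2 & -1 \\ 0 & 1 & 0 & 1 & 3 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 7 & 2 \end{bmatrix}$$

It seems natural to partition $A$ as

$$A = \left[\begin{array}{ccc|cc} 1 & 0 & 0 & 2 & -1 \\ 0 & 1 & 0 & 1 & 3 \\ 0 & 0 & 1 & 4 & 0 \\ \hline 0 & 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 7 & 2 \end{array}\right] = \begin{bmatrix} I & B \\ O & C \end{bmatrix}$$

where $I$ is the $3 \times 3$ identity matrix, $B$ is $3 \times 2$, $O$ is the $2 \times 3$ zero matrix, and $C$ is $2 \times 2$. In this way, we can view $A$ as a $2 \times 2$ matrix whose entries are themselves matrices.

When matrices are being multiplied, there is often an advantage to be gained by viewing them as partitioned matrices. This frequently reveals underlying structures. It also often speeds up computation, especially when the matrices are large and have many blocks of zeros. It turns out that the multiplication of partitioned matrices is just like ordinary matrix multiplication.

We begin by considering some special cases of partitioned matrices. Each gives rise to a different way of viewing the product of two matrices.

Suppose $A$ is $m \times n$ and $B$ is $n \times r$, so the product $AB$ exists. If we partition $B$ in terms of its column vectors, as $B = [\mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_r]$, then

$$AB = A[\mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_r] = [A\mathbf{b}_1 \mid A\mathbf{b}_2 \mid \cdots \mid A\mathbf{b}_r]$$

This result is an immediate consequence of the definition of matrix multiplication. The form on the right is called the matrix-column representation of the product.

Example 3.9 If

$$A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 & -1 \\ 1 & 2 \\ 3 & 0 \end{bmatrix}$$

then

$$A\mathbf{b}_1 = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 13 \\ 2 \end{bmatrix} \quad \text{and} \quad A\mathbf{b}_2 = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$$

Therefore, $AB = [A\mathbf{b}_1 \mid A\mathbf{b}_2] = \begin{bmatrix} 13 & 5 \\ 2 & -2 \end{bmatrix}$. (Check by ordinary matrix multiplication.)

Remark Observe that the matrix-column representation of $AB$ allows us to write each column of $AB$ as a linear combination of the columns of $A$, with entries from $B$ as the coefficients. For example,

$$\begin{bmatrix} 13 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 3 \\ -1 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

(See Exercises 23 and 26.)

Suppose $A$ is $m \times n$ and $B$ is $n \times r$, so the product $AB$ exists. If we partition $A$ in terms of its row vectors $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_m$, then

$$AB = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \\ \vdots \\ \mathbf{A}_m \end{bmatrix} B = \begin{bmatrix} \mathbf{A}_1B \\ \mathbf{A}_2B \\ \vdots \\ \mathbf{A}_mB \end{bmatrix}$$

Once again, this result is a direct consequence of the definition of matrix multiplication. The form on the right is called the row-matrix representation of the product.

Example 3.10 Using the row-matrix representation, compute $AB$ for the matrices in Example 3.9.

Solution We compute

$$\mathbf{A}_1B = [1 \ 3 \ 2] \begin{bmatrix} 4 & -1 \\ 1 & 2 \\ 3 & 0 \end{bmatrix} = [13 \ 5] \quad \text{and} \quad \mathbf{A}_2B = [0 \ -1 \ 1] \begin{bmatrix} 4 & -1 \\ 1 & 2 \\ 3 & 0 \end{bmatrix} = [2 \ -2]$$

Therefore,

$$AB = \begin{bmatrix} \mathbf{A}_1B \\ \mathbf{A}_2B \end{bmatrix} = \begin{bmatrix} 13 & 5 \\ 2 & -2 \end{bmatrix}$$

as before.

The definition of the matrix product $AB$ uses the natural partition of $A$ into rows and $B$ into columns. This form might well be called the row-column representation of the product. We can also partition $A$ into columns and $B$ into rows; this form is called the column-row representation of the product. In this case, we have

$$A = [\mathbf{a}_1 \mid \mathbf{a}_2 \mid \cdots \mid \mathbf{a}_n] \quad \text{and} \quad B = \begin{bmatrix} \mathbf{B}_1 \\ \mathbf{B}_2 \\ \vdots \\ \mathbf{B}_n \end{bmatrix}$$

so

$$AB = [\mathbf{a}_1 \mid \mathbf{a}_2 \mid \cdots \mid \mathbf{a}_n] \begin{bmatrix} \mathbf{B}_1 \\ \mathbf{B}_2 \\ \vdots \\ \mathbf{B}_n \end{bmatrix} = \mathbf{a}_1\mathbf{B}_1 + \mathbf{a}_2\mathbf{B}_2 + \cdots + \mathbf{a}_n\mathbf{B}_n \tag{2}$$

Notice that the sum resembles a dot product expansion. The difference is that the individual terms are matrices, not scalars. Let's make sure that this makes sense. Each term $\mathbf{a}_i\mathbf{B}_i$ is the product of an $m \times 1$ and a $1 \times r$ matrix. Thus, each $\mathbf{a}_i\mathbf{B}_i$ is an $m \times r$ matrix, which is the same size as $AB$. The products $\mathbf{a}_i\mathbf{B}_i$ are called outer products, and (2) is called the outer product expansion of $AB$.

Example 3.11 Compute the outer product expansion of $AB$ for the matrices in Example 3.9.

Solution We have

$$\mathbf{a}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \mathbf{a}_2 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \ \mathbf{a}_3 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{B}_1 = [4 \ -1], \ \mathbf{B}_2 = [1 \ 2], \ \mathbf{B}_3 = [3 \ 0]$$

The outer products are

$$\mathbf{a}_1\mathbf{B}_1 = \begin{bmatrix} 4 & -1 \\ 0 & 0 \end{bmatrix}, \quad \mathbf{a}_2\mathbf{B}_2 = \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix}, \quad \text{and} \quad \mathbf{a}_3\mathbf{B}_3 = \begin{bmatrix} 6 & 0 \\ 3 & 0 \end{bmatrix}$$

(Observe that computing each outer product is exactly like filling in a multiplication table.)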
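The three ways of viewing $AB$ described above can be compared side by side. This sketch computes $AB$ four ways: by rows and columns, column by column, row by row, and as a sum of outer products. The function names are hypothetical. It uses the matrices of Example 3.9 as read here: $A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & -1 \\ 1 & 2 \\ 3 & 0 \end{bmatrix}$.

```python
def mat_mul(A, B):
    """Row-column form: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matrix_column_form(A, B):
    """Column j of AB is A times column j of B."""
    cols = [mat_mul(A, [[b] for b in col]) for col in zip(*B)]
    # Each element of cols is an m x 1 matrix; stitch them side by side.
    return [[c[i][0] for c in cols] for i in range(len(A))]


def row_matrix_form(A, B):
    """Row i of AB is row i of A times B."""
    return [mat_mul([row], B)[0] for row in A]


def outer_product_expansion(A, B):
    """AB = a_1 B_1 + ... + a_n B_n: column k of A times row k of B, summed."""
    m, r = len(A), len(B[0])
    total = [[0] * r for _ in range(m)]
    for k in range(len(B)):
        # The outer product a_k B_k is filled in like a multiplication table.
        for i in range(m):
            for j in range(r):
                total[i][j] += A[i][k] * B[k][j]
    return total


A = [[1, 3, 2],
     [0, -1, 1]]
B = [[4, -1],
     [1, 2],
     [3, 0]]
for form in (mat_mul, matrix_column_form, row_matrix_form, outer_product_expansion):
    print(form.__name__, form(A, B))  # each result is [[13, 5], [2, -2]]
```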
Therefore, the outer product expansion of $AB$ is

$$\mathbf{a}_1\mathbf{B}_1 + \mathbf{a}_2\mathbf{B}_2 + \mathbf{a}_3\mathbf{B}_3 = \begin{bmatrix} 4 & -1 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix} + \begin{bmatrix} 6 & 0 \\ 3 & 0 \end{bmatrix} = \begin{bmatrix} 13 & 5 \\ 2 & -2 \end{bmatrix} = AB$$

We will make use of the outer product expansion in Chapter 5, when we discuss the Spectral Theorem, and in Chapter 7, when we discuss the singular value decomposition.

Each of the foregoing partitions is a special case of partitioning in general. A matrix $A$ is said to be partitioned if horizontal and vertical lines have been introduced, subdividing $A$ into submatrices called blocks. Partitioning allows $A$ to be written as a matrix whose entries are its blocks. For example,

$$A = \left[\begin{array}{ccc|cc} 1 & 0 & 0 & 2 & -1 \\ 0 & 1 & 0 & 1 & 3 \\ 0 & 0 & 1 & 4 & 0 \\ \hline 0 & 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 7 & 2 \end{array}\right] \quad \text{and} \quad B = \left[\begin{array}{cc|cc|c} 4 & 3 & 1 & 2 & 1 \\ -1 & 2 & 2 & 1 & 1 \\ 1 & -5 & 3 & 3 & 1 \\ \hline 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & 3 \end{array}\right]$$

are partitioned matrices. They have the block structures

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \end{bmatrix}$$

If two matrices are the same size and have been partitioned in the same way, it is clear that they can be added and multiplied by scalars block by block. Less obvious is the fact that, with suitable partitioning, matrices can be multiplied blockwise as well. The next example illustrates this process.

Example 3.12 Consider the matrices $A$ and $B$ above. If we ignore for the moment the fact that their entries are matrices, then $A$ appears to be a $2 \times 2$ matrix and $B$ a $2 \times 3$ matrix. Their product should thus be a $2 \times 3$ matrix given by

$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} & A_{11}B_{13} + A_{12}B_{23} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} & A_{21}B_{13} + A_{22}B_{23} \end{bmatrix}$$

But all of the products in this calculation are actually matrix products, so we need to make sure that they are all defined. A quick check reveals that this is indeed the case. The numbers of columns in the blocks of $A$ (3 and 2) match the numbers of rows in the blocks of $B$. The matrices $A$ and $B$ are said to be partitioned conformably for block multiplication.
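Block multiplication can be checked numerically. The sketch below forms each block of $AB$ as $\sum_K A_{IK}B_{KJ}$ and compares the result with ordinary multiplication. `block_multiply` and its block-size arguments are hypothetical names introduced here. It uses the partitioned $5 \times 5$ matrices of this example as read here. The row and column blocks of $A$ both have sizes 3, 2. The row blocks of $B$ have sizes 3, 2, and its column blocks have sizes 2, 2, 1.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mat_add(A, B):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def ranges(sizes):
    """Turn block sizes such as [3, 2] into index ranges range(0, 3), range(3, 5)."""
    out, start = [], 0
    for s in sizes:
        out.append(range(start, start + s))
        start += s
    return out


def sub(M, rows, cols):
    """Extract the block of M with the given row and column index ranges."""
    return [[M[i][j] for j in cols] for i in rows]


def block_multiply(A, B, row_sizes, inner_sizes, col_sizes):
    """Compute AB block by block: (AB)_IJ = sum over K of A_IK B_KJ.

    inner_sizes must describe both the column blocks of A and the row
    blocks of B; this is what "partitioned conformably" requires.
    """
    result = [[0] * len(B[0]) for _ in A]
    for I in ranges(row_sizes):
        for J in ranges(col_sizes):
            block = [[0] * len(J) for _ in I]
            for K in ranges(inner_sizes):
                block = mat_add(block, mat_mul(sub(A, I, K), sub(B, K, J)))
            for p, i in enumerate(I):  # copy the finished block into place
                for q, j in enumerate(J):
                    result[i][j] = block[p][q]
    return result


A = [[1, 0, 0, 2, -1],
     [0, 1, 0, 1, 3],
     [0, 0, 1, 4, 0],
     [0, 0, 0, 1, 7],
     [0, 0, 0, 7, 2]]
B = [[4, 3, 1, 2, 1],
     [-1, 2, 2, 1, 1],
     [1, -5, 3, 3, 1],
     [1, 0, 0, 0, 2],
     [0, 1, 0, 0, 3]]
AB = block_multiply(A, B, [3, 2], [3, 2], [2, 2, 1])
print(AB == mat_mul(A, B))  # True
```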
Carrying out the calculations indicated gives us the product $AB$ in partitioned form. For the upper-left block,

$$A_{11}B_{11} + A_{12}B_{21} = I_3B_{11} + A_{12}I_2 = B_{11} + A_{12} = \begin{bmatrix} 4 & 3 \\ -1 & 2 \\ 1 & -5 \end{bmatrix} + \begin{bmatrix} 2 & -1 \\ 1 & 3 \\ 4 & 0 \end{bmatrix} = \begin{bmatrix} 6 & 2 \\ 0 & 5 \\ 5 & -5 \end{bmatrix}$$

When some of the blocks are zero matrices or identity matrices, as is the case here, these calculations can be done quite quickly. The calculations for the other five blocks of $AB$ are similar. Check that the result is

$$AB = \left[\begin{array}{cc|cc|c} 6 & 2 & 1 & 2 & 2 \\ 0 & 5 & 2 & 1 & 12 \\ 5 & -5 & 3 & 3 & 9 \\ \hline 1 & 7 & 0 & 0 & 23 \\ 7 & 2 & 0 & 0 & 20 \end{array}\right]$$

(Observe that the block in the upper-left corner is the result of our calculations above.) Check that you obtain the same answer by multiplying $A$ by $B$ in the usual way.

Matrix Powers

When $A$ and $B$ are two $n \times n$ matrices, their product $AB$ will also be an $n \times n$ matrix. A special case occurs when $A = B$. It makes sense to define $A^2 = AA$ and, in general, to define $A^k$ as

$$A^k = \underbrace{AA \cdots A}_{k \text{ factors}}$$

if $k$ is a positive integer. Thus, $A^1 = A$, and it is convenient to define $A^0 = I_n$.

Before making too many assumptions, we should ask ourselves to what extent matrix powers behave like powers of real numbers. The following properties follow immediately from the definitions we have just given. They are the matrix analogues of the corresponding properties for powers of real numbers.

If $A$ is a square matrix and $r$ and $s$ are nonnegative integers, then

1. $A^rA^s = A^{r+s}$
2. $(A^r)^s = A^{rs}$

In Section 3.3, we will extend the definition and properties to include negative integer powers.

Example 3.13 (a) If $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, then

$$A^2 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}, \quad A^3 = A^2A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}$$

and, in general,

$$A^n = \begin{bmatrix} 2^{n-1} & 2^{n-1} \\ 2^{n-1} & 2^{n-1} \end{bmatrix} \quad \text{for all } n \geq 1$$

The above statement is an infinite collection of statements, one for each natural number $n$, so it can be proved by mathematical induction. (Appendix B gives a brief review of mathematical induction.)
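Before the induction argument, a quick numerical spot-check can make the claimed pattern plausible. It proves nothing beyond the cases tried. This sketch uses a hypothetical `mat_pow` helper that follows the definition $A^k = AA\cdots A$ with $A^0 = I_n$, multiplying one factor at a time. Faster methods such as repeated squaring exist but are not needed here.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_pow(A, k):
    """A^k for a square matrix A and an integer k >= 0, with A^0 = I."""
    if k < 0:
        raise ValueError("negative powers are not defined yet")
    result = identity(len(A))
    for _ in range(k):
        result = mat_mul(result, A)
    return result


A = [[1, 1],
     [1, 1]]
# Spot-check A^n = [[2^(n-1), 2^(n-1)], [2^(n-1), 2^(n-1)]] for n = 1, ..., 10.
for n in range(1, 11):
    p = 2 ** (n - 1)
    assert mat_pow(A, n) == [[p, p], [p, p]]

# The exponent laws A^r A^s = A^(r+s) and (A^r)^s = A^(rs), for r = 2, s = 3.
assert mat_mul(mat_pow(A, 2), mat_pow(A, 3)) == mat_pow(A, 5)
assert mat_pow(mat_pow(A, 2), 3) == mat_pow(A, 6)
print(mat_pow(A, 4))  # [[8, 8], [8, 8]]
```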
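The basis step is to prove that the formula holds for $n = 1$. In this case,

$$A^1 = \begin{bmatrix} 2^{1-1} & 2^{1-1} \\ 2^{1-1} & 2^{1-1} \end{bmatrix} = \begin{bmatrix} 2^0 & 2^0 \\ 2^0 & 2^0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = A$$

as required.

The induction hypothesis is to assume that

$$A^k = \begin{bmatrix} 2^{k-1} & 2^{k-1} \\ 2^{k-1} & 2^{k-1} \end{bmatrix}$$

for some integer $k \geq 1$. The induction step is to prove that the formula holds for $n = k + 1$. Using the definition of matrix powers and the induction hypothesis, we compute

$$A^{k+1} = A^kA = \begin{bmatrix} 2^{k-1} & 2^{k-1} \\ 2^{k-1} & 2^{k-1} \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2^{k-1} + 2^{k-1} & 2^{k-1} + 2^{k-1} \\ 2^{k-1} + 2^{k-1} & 2^{k-1} + 2^{k-1} \end{bmatrix} = \begin{bmatrix} 2^k & 2^k \\ 2^k & 2^k \end{bmatrix} = \begin{bmatrix} 2^{(k+1)-1} & 2^{(k+1)-1} \\ 2^{(k+1)-1} & 2^{(k+1)-1} \end{bmatrix}$$

Thus, the formula holds for all $n \geq 1$ by the principle of mathematical induction.

(b) If $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then

$$B^2 = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

Continuing, we find

$$B^3 = B^2B = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \quad \text{and} \quad B^4 = B^3B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Thus, $B^5 = B$, and the sequence of powers of $B$ repeats in a cycle of four:

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \ldots$$

The Transpose of a Matrix

Thus far, all of the matrix operations we have defined are analogous to operations on real numbers, although they may not always behave in the same way. The next operation has no such analogue.

Definition The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^T$ obtained by interchanging the rows and columns of $A$. That is, the $i$th column of $A^T$ is the $i$th row of $A$ for all $i$.

Example 3.14 Let $A$ and $B$ be the matrices given here (their entries are illegible in this copy), and let $C = [5 \ -1 \ 2]$. Then their transposes are $A^T$, $B^T$, and

$$C^T = \begin{bmatrix} 5 \\ -1 \\ 2 \end{bmatrix}$$

The transpose is sometimes used to give an alternative definition of the dot product of two vectors in terms of matrix multiplication. If

$$\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

then

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n = [u_1 \ u_2 \ \cdots \ u_n] \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \mathbf{u}^T\mathbf{v}$$

A useful alternative definition of the transpose is given componentwise:

$$(A^T)_{ij} = A_{ji} \quad \text{for all } i \text{ and } j$$

In words, the entry in row $i$ and column $j$ of $A^T$ is the same as the entry in row $j$ and column $i$ of $A$.

The transpose is also used to define a very important type of square matrix: a symmetric matrix.

Definition A square matrix $A$ is symmetric if $A^T = A$, that is, if $A$ is equal to its own transpose.

Example 3.15 Let $A$ and $B$ be the matrices given here (their entries are not reliably legible in this copy). Then $A$ is symmetric, since $A^T = A$; but $B$ is not symmetric, since $B^T \neq B$.

A symmetric matrix has the property that it is its own "mirror image" across its main diagonal. Figure 3.1 illustrates this property for a $3 \times 3$ matrix. The corresponding shapes represent equal entries; the diagonal entries (those on the dashed line) are arbitrary.

[Figure 3.1: A symmetric matrix]

A componentwise definition of a symmetric matrix is also useful. It is simply the algebraic description of the "reflection" property.

A square matrix $A$ is symmetric if and only if $A_{ij} = A_{ji}$ for all $i$ and $j$.

Exercises 3.1

Let $A$, $B$, $C$, $D$, $E$, and $F$ be the matrices given here. Most entries are illegible in this copy; $E = [4 \ 2]$.

In Exercises 1-16, compute the indicated matrices (if possible).

1. $A + 2D$
2. $3D - 2A$
3. $B - C$
4. $C - B^T$
5. $AB$
6. $BD$
7. $D + BC$
8. $BB^T$
9. $E(AF)$
10. $F(DF)$
11. $FE$
12. $EF$
13. $B^TC^T - (CB)^T$
14. $DA - AD$
15. $A^3$
16. $(I_2 - D)^2$
17. Give an example of a nonzero $2 \times 2$ matrix $A$ such that $A^2 = O$.
18. Let $A$ be the $2 \times 2$ matrix given here (its entries are illegible in this copy). Find $2 \times 2$ matrices $B$ and $C$ such that $AB = AC$ but $B \neq C$.
19. A factory manufactures three products (doohickies, gizmos, and widgets) and ships them to two warehouses for storage. The number of units of each product shipped to each warehouse is given by the matrix

$$A = \begin{bmatrix} 200 & 75 \\ 150 & 100 \\ 100 & 125 \end{bmatrix}$$

where $a_{ij}$ is the number of units of product $i$ sent to warehouse $j$, and the products are taken in alphabetical order. The cost of shipping one unit of each product by truck is \$1.50 per doohickey, \$1.00 per gizmo, and \$2.00 per widget. The corresponding unit costs to ship by train are \$1.75, \$1.50, and \$1.00. Organize these costs into a matrix $B$. Then use matrix multiplication to show how the factory can compare the cost of shipping its products to each of the two warehouses by truck and by train.

20.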
Referring to Exercise 19, suppose that the unit cost of distributing the products to stores is the same for each product but varies by warehouse because of the distances involved. It costs \$0.75 to distribute one unit from warehouse 1 and \$1.00 to distribute one unit from warehouse 2. Organize these costs into a matrix $C$. Then use matrix multiplication to compute the total cost of distributing each product.

In Exercises 21-22, write the given system of linear equations as a matrix equation of the form $A\mathbf{x} = \mathbf{b}$.

21. $$\begin{aligned} x_1 - 2x_2 + 3x_3 &= 0 \\ 2x_1 + x_2 - 5x_3 &= 4 \end{aligned}$$

22. $$\begin{aligned} -x_1 + 2x_3 &= 1 \\ x_1 - x_2 &= -2 \\ x_2 + x_3 &= -1 \end{aligned}$$

In Exercises 23-28, let $A$ and $B$ be the matrices given here (their entries are illegible in this copy).

23. Use the matrix-column representation of the product to write each column of $AB$ as a linear combination of the columns of $A$.
24. Use the row-matrix representation of the product to write each row of $AB$ as a linear combination of the rows of $B$.
25. Compute the outer product expansion of $AB$.
26. Use the matrix-column representation of the product to write each column of $BA$ as a linear combination of the columns of $B$.
27. Use the row-matrix representation of the product to write each row of $BA$ as a linear combination of the rows of $A$.
28. Compute the outer product expansion of $BA$.

In Exercises 29 and 30, assume that the product $AB$ makes sense.

29. Prove that if the columns of $B$ are linearly dependent, then so are the columns of $AB$.
30. Prove that if the rows of $A$ are linearly dependent, then so are the rows of $AB$.

In Exercises 31-34, compute $AB$ by block multiplication, using the indicated partitioning.

31.-34. [Partitioned matrices $A$ and $B$; the entries are illegible in this copy.]

35. Let $A$ be the matrix given here (its entries are illegible in this copy).
(a) Compute $A^2, A^3, \ldots, A^7$.
(b) What is $A^{2015}$? Why?
36. Let $B$ be the matrix given here (its entries are illegible in this copy). Find, with justification, $B^{2015}$.
37. Let $A$ be the matrix given here (its entries are illegible in this copy).
Find a formula for $A^n$ ($n \geq 1$) and verify your formula using mathematical induction.
38. Let $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.
(a) Show that $A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix}$.
(b) Prove, by mathematical induction, that
$$A^n = \begin{bmatrix} \cos n\theta & -\sin n\theta \\ \sin n\theta & \cos n\theta \end{bmatrix} \quad \text{for } n \geq 1$$
39. In each of the following, find the $4 \times 4$ matrix $A = [a_{ij}]$ that satisfies the given condition:
(a) $a_{ij} = (-1)^{i+j}$
(b) $a_{ij} = j - i$
(c) $a_{ij} = (i - 1)^j$
(d) $a_{ij} = \sin\left(\dfrac{(i + j - 1)\pi}{4}\right)$
40. In each of the following, find the $6 \times 6$ matrix $A = [a_{ij}]$ that satisfies the given condition:
(a) $a_{ij} = \begin{cases} i + j & \text{if } i \leq j \\ 0 & \text{if } i > j \end{cases}$
(b) $a_{ij} = \begin{cases} 1 & \text{if } |i - j| \leq 1 \\ 0 & \text{if } |i - j| > 1 \end{cases}$
(c) $a_{ij} = \begin{cases} 1 & \text{if } 6 \leq i + j \leq 8 \\ 0 & \text{otherwise} \end{cases}$
41. Prove Theorem 3.1(a).

3.2 Matrix Algebra

In some ways, the arithmetic of matrices generalizes that of vectors. We do not expect any surprises with respect to addition and scalar multiplication, and indeed there are none. This will allow us to extend to matrices several concepts that we are already familiar with from our work with vectors. In particular, linear combinations, spanning sets, and linear independence carry over to matrices with no difficulty.

However, matrices have other operations, such as matrix multiplication, that vectors do not possess. We should not expect matrix multiplication to behave like multiplication of real numbers unless we can prove that it does; in fact, it does not. In this section, we summarize and prove some of the main properties of matrix operations and begin to develop an algebra of matrices.

Properties of Addition and Scalar Multiplication

All of the algebraic properties of addition and scalar multiplication for vectors (Theorem 1.1) carry over to matrices. For completeness, we summarize these properties in the next theorem.

Theorem 3.2 Algebraic Properties of Matrix Addition and Scalar Multiplication
Let $A$, $B$, and $C$ be matrices of the same size and let $c$ and $d$ be scalars. Then
a. $A + B = B + A$ (Commutativity)
b. $(A + B) + C = A + (B + C)$ (Associativity)
c. $A + O = A$
d. $A + (-A) = O$
e. $c(A + B) = cA + cB$ (Distributivity)
f. $(c + d)A = cA + dA$ (Distributivity)
g. $c(dA) = (cd)A$
h. $1A = A$

The proofs of these properties are direct analogues of the corresponding proofs of the vector properties and are left as exercises. Likewise, the comments following Theorem 1.1 are equally valid here. You should have no difficulty using these properties to perform algebraic manipulations with matrices. (Review Example 1.5 and see Exercises 17 and 18 at the end of this section.)

The associativity property allows us to unambiguously combine scalar multiplication and addition without parentheses. If $A$, $B$, and $C$ are matrices of the same size, then

$$(2A + 3B) - C = 2A + (3B - C)$$

and so we can simply write $2A + 3B - C$. Generally, then, if $A_1, A_2, \ldots, A_k$ are matrices of the same size and $c_1, c_2, \ldots, c_k$ are scalars, we may form the linear combination

$$c_1A_1 + c_2A_2 + \cdots + c_kA_k$$

We will refer to $c_1, c_2, \ldots, c_k$ as the coefficients of the linear combination. We can now ask and answer questions about linear combinations of matrices.

Example 3.16 Let $A_1 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, $A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and $A_3 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.

(a) Is $B = \begin{bmatrix} 1 & 4 \\ 2 & 1 \end{bmatrix}$ a linear combination of $A_1$, $A_2$, and $A_3$?
(b) Is $C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ a linear combination of $A_1$, $A_2$, and $A_3$?

Solution (a) We want to find scalars $c_1$, $c_2$, and $c_3$ such that $c_1A_1 + c_2A_2 + c_3A_3 = B$. Thus,

$$c_1\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + c_2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 2 & 1 \end{bmatrix}$$

The left-hand side of this equation can be rewritten as

$$\begin{bmatrix} c_2 + c_3 & c_1 + c_3 \\ -c_1 + c_3 & c_2 + c_3 \end{bmatrix}$$

Comparing entries and using the definition of matrix equality, we have four linear equations:

$$\begin{aligned} c_2 + c_3 &= 1 \\ c_1 + c_3 &= 4 \\ -c_1 + c_3 &= 2 \\ c_2 + c_3 &= 1 \end{aligned}$$

Gauss-Jordan elimination easily gives

$$\left[\begin{array}{ccc|c} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 4 \\ -1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

(check this!), so $c_1 = 1$, $c_2 = -2$, and $c_3 = 3$. Thus, $A_1 - 2A_2 + 3A_3 = B$, which can be easily checked.
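Part (a) can be reproduced mechanically. Read each matrix's entries left to right and top to bottom, use the resulting vectors as the columns of an augmented matrix, and row reduce. The Python sketch below does this with exact `fractions.Fraction` arithmetic. The helper names are hypothetical. The sketch assumes that when a solution exists it is unique, as it is here, and raises an error otherwise.

```python
from fractions import Fraction


def flatten(M):
    """Read the entries of M left to right, top to bottom."""
    return [x for row in M for x in row]


def rref(M):
    """Reduced row echelon form of M, computed with exact fractions."""
    R = [[Fraction(x) for x in row] for row in M]
    pivot_row = 0
    for c in range(len(R[0])):
        pivot = next((r for r in range(pivot_row, len(R)) if R[r][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        R[pivot_row], R[pivot] = R[pivot], R[pivot_row]
        lead = R[pivot_row][c]
        R[pivot_row] = [x / lead for x in R[pivot_row]]
        for r in range(len(R)):
            if r != pivot_row and R[r][c] != 0:
                factor = R[r][c]
                R[r] = [x - factor * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == len(R):
            break
    return R


def coefficients(mats, target):
    """Return [c_1, ..., c_k] with c_1 M_1 + ... + c_k M_k = target, or None.

    Assumes the solution, if one exists, is unique.
    """
    k = len(mats)
    cols = [flatten(M) for M in mats]
    aug = [[col[i] for col in cols] + [t] for i, t in enumerate(flatten(target))]
    R = rref(aug)
    if any(all(x == 0 for x in row[:k]) and row[k] != 0 for row in R):
        return None  # a row [0 ... 0 | nonzero]: the system is inconsistent
    if len(R) < k or any(R[i][i] != 1 for i in range(k)):
        raise ValueError("solution is not unique")
    return [R[i][k] for i in range(k)]


A1 = [[0, 1], [-1, 0]]
A2 = [[1, 0], [0, 1]]
A3 = [[1, 1], [1, 1]]
print(coefficients([A1, A2, A3], [[1, 4], [2, 1]]))
# [Fraction(1, 1), Fraction(-2, 1), Fraction(3, 1)]
print(coefficients([A1, A2, A3], [[1, 2], [3, 4]]))  # None
```

Exact fractions avoid the rounding issues that floating-point elimination can introduce when deciding whether an entry is zero.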
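(b) This time we want to solve

$$c_1\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + c_2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

Proceeding as in part (a), we obtain the linear system

$$\begin{aligned} c_2 + c_3 &= 1 \\ c_1 + c_3 &= 2 \\ -c_1 + c_3 &= 3 \\ c_2 + c_3 &= 4 \end{aligned}$$

Row reduction gives

$$\left[\begin{array}{ccc|c} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 \\ -1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 4 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 5 \\ 0 & 0 & 0 & 3 \end{array}\right]$$

We need go no further: the last row implies that there is no solution. Therefore, in this case, $C$ is not a linear combination of $A_1$, $A_2$, and $A_3$.

Remark Observe that the columns of the augmented matrix contain the entries of the matrices we are given. If we read the entries of each matrix from left to right and top to bottom, we get the order in which the entries appear in the columns of the augmented matrix. For example, we read $A_1$ as "$0, 1, -1, 0$," which corresponds to the first column of the augmented matrix. It is as if we simply "straightened out" the given matrices into column vectors. Thus, we would have ended up with exactly the same system of linear equations as in part (a) if we had asked

Is $\begin{bmatrix} 1 \\ 4 \\ 2 \\ 1 \end{bmatrix}$ a linear combination of $\begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$?

We will encounter such parallels repeatedly from now on. In Chapter 6, we will explore them in more detail.

We can define the span of a set of matrices to be the set of all linear combinations of the matrices.

Example 3.17 Describe the span of the matrices $A_1$, $A_2$, and $A_3$ in Example 3.16.

Solution One way to do this is simply to write out a general linear combination of $A_1$, $A_2$, and $A_3$. Thus,

$$c_1A_1 + c_2A_2 + c_3A_3 = c_1\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + c_2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} c_2 + c_3 & c_1 + c_3 \\ -c_1 + c_3 & c_2 + c_3 \end{bmatrix}$$

This is analogous to the parametric representation of a plane. But suppose we want to know when the matrix $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$ is in $\text{span}(A_1, A_2, A_3)$. From the representation above, we know that it is when

$$\begin{bmatrix} c_2 + c_3 & c_1 + c_3 \\ -c_1 + c_3 & c_2 + c_3 \end{bmatrix} = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$$

for some choice of scalars $c_1$, $c_2$, $c_3$. This gives rise to a system of linear equations whose left-hand side is exactly the same as in Example 3.16 but whose right-hand side is general. The augmented matrix of this system is

$$\left[\begin{array}{ccc|c} 0 & 1 & 1 & w \\ 1 & 0 & 1 & x \\ -1 & 0 & 1 & y \\ 0 & 1 & 1 & z \end{array}\right]$$

and row reduction produces

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{1}{2}x - \frac{1}{2}y \\ 0 & 1 & 0 & -\frac{1}{2}x - \frac{1}{2}y + w \\ 0 & 0 & 1 & \frac{1}{2}x + \frac{1}{2}y \\ 0 & 0 & 0 & w - z \end{array}\right]$$

(Check this carefully.)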
The only restriction comes from the last row, where clearly we must have $w - z = 0$ in order to have a solution. Thus, the span of $A_1$, $A_2$, and $A_3$ consists of all matrices $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$ for which $w = z$. That is,

$$\text{span}(A_1, A_2, A_3) = \left\{ \begin{bmatrix} w & x \\ y & w \end{bmatrix} \right\}$$

Note If we had known this before attempting Example 3.16, we would have seen immediately that $B = \begin{bmatrix} 1 & 4 \\ 2 & 1 \end{bmatrix}$ is a linear combination of $A_1$, $A_2$, and $A_3$, since it has the necessary form (take $w = 1$, $x = 4$, and $y = 2$). But $C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ cannot be a linear combination of $A_1$, $A_2$, and $A_3$, since it does not have the proper form ($1 \neq 4$).

Linear independence also makes sense for matrices. We say that matrices $A_1, A_2, \ldots, A_k$ of the same size are linearly independent if the only solution of the equation

$$c_1A_1 + c_2A_2 + \cdots + c_kA_k = O \tag{1}$$

is the trivial one: $c_1 = c_2 = \cdots = c_k = 0$. If there are nontrivial coefficients that satisfy (1), then $A_1, A_2, \ldots, A_k$ are called linearly dependent.

Example 3.18 Determine whether the matrices $A_1$, $A_2$, and $A_3$ in Example 3.16 are linearly independent.

Solution We want to solve the equation $c_1A_1 + c_2A_2 + c_3A_3 = O$. Writing out the matrices, we have

$$c_1\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + c_2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + c_3\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

This time we get a homogeneous linear system whose left-hand side is the same as in Examples 3.16 and 3.17. (Are you starting to spot a pattern yet?) The augmented matrix row reduces to give

$$\left[\begin{array}{ccc|c} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

Thus, $c_1 = c_2 = c_3 = 0$, and we conclude that the matrices $A_1$, $A_2$, and $A_3$ are linearly independent.

Properties of Matrix Multiplication

Whenever we encounter a new operation, such as matrix multiplication, we must be careful not to assume too much about it. It would be nice if matrix multiplication behaved like multiplication of real numbers. Although in many respects it does, there are some significant differences.

Example 3.19 Consider the matrices $A$ and $B$ given here (their entries are illegible in this copy). Multiplying gives the products $AB$ and $BA$, and comparing them entry by entry shows that they differ. Thus, $AB \neq BA$.
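The sketch below illustrates the phenomenon with a pair of $2 \times 2$ matrices chosen here purely for illustration. They are hypothetical and not necessarily those of Example 3.19: $A = \begin{bmatrix} 2 & 4 \\ -1 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. The sketch shows that $AB \neq BA$, and also that this nonzero $A$ has $A^2 = O$.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Illustrative matrices chosen for this sketch (hypothetical example data).
A = [[2, 4],
     [-1, -2]]
B = [[1, 0],
     [1, 1]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)             # [[6, 4], [-3, -2]]
print(BA)             # [[2, 4], [1, 2]]
print(AB == BA)       # False: the order of the factors matters
print(mat_mul(A, A))  # [[0, 0], [0, 0]], although A itself is not zero
```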
So, in contrast to multiplication of real numbers, matrix multiplication is not commutative: the order of the factors in a product matters!

It is easy to check that $A^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ (do so!). So, for matrices, the equation $A^2 = O$ does not imply that $A = O$. This is unlike the situation for real numbers, where the equation $x^2 = 0$ has only $x = 0$ as a solution.

However gloomy things might appear after the last example, the situation is not really bad at all. You just need to get used to working with matrices and to constantly remind yourself that they are not numbers. The next theorem summarizes the main properties of matrix multiplication.

Theorem 3.3 Properties of Matrix Multiplication
Let $A$, $B$, and $C$ be matrices (whose sizes are such that the indicated operations can be performed) and let $k$ be a scalar. Then
a. $A(BC) = (AB)C$ (Associativity)
b. $A(B + C) = AB + AC$ (Left distributivity)
c. $(A + B)C = AC + BC$ (Right distributivity)
d. $k(AB) = (kA)B = A(kB)$
e. $I_mA = A = AI_n$ if $A$ is $m \times n$ (Multiplicative identity)

Proof We prove (b) and half of (e). We defer the proof of property (a) until Section 3.6. The remaining properties are considered in the exercises.

(b) To prove $A(B + C) = AB + AC$, we let the rows of $A$ be denoted by $\mathbf{A}_i$ and the columns of $B$ and $C$ by $\mathbf{b}_j$ and $\mathbf{c}_j$. Then the $j$th column of $B + C$ is $\mathbf{b}_j + \mathbf{c}_j$, since addition is defined componentwise. Thus,

$$\begin{aligned} [A(B + C)]_{ij} &= \mathbf{A}_i \cdot (\mathbf{b}_j + \mathbf{c}_j) \\ &= \mathbf{A}_i \cdot \mathbf{b}_j + \mathbf{A}_i \cdot \mathbf{c}_j \\ &= (AB)_{ij} + (AC)_{ij} \\ &= (AB + AC)_{ij} \end{aligned}$$

Since this is true for all $i$ and $j$, we must have $A(B + C) = AB + AC$.

(e) To prove $AI_n = A$, we note that the identity matrix $I_n$ can be column-partitioned as

$$I_n = [\mathbf{e}_1 \mid \mathbf{e}_2 \mid \cdots \mid \mathbf{e}_n]$$

where $\mathbf{e}_i$ is a standard unit vector. Therefore,

$$AI_n = [A\mathbf{e}_1 \mid A\mathbf{e}_2 \mid \cdots \mid A\mathbf{e}_n] = [\mathbf{a}_1 \mid \mathbf{a}_2 \mid \cdots \mid \mathbf{a}_n] = A$$

by Theorem 3.1(b).

We can use these properties to further explore how closely matrix multiplication resembles multiplication of real numbers.

Example 3.20 If $A$ and $B$ are square matrices of the same size, is $(A + B)^2 = A^2 + 2AB + B^2$?

Solution Using properties of matrix multiplication, we compute

$$\begin{aligned} (A + B)^2 &= (A + B)(A + B) \\ &= (A + B)A + (A + B)B && \text{by left distributivity} \\ &= A^2 + BA + AB + B^2 && \text{by right distributivity} \end{aligned}$$

Therefore, $(A + B)^2 = A^2 + 2AB + B^2$ if and only if $A^2 + BA + AB + B^2 = A^2 + 2AB + B^2$. Subtracting $A^2$ and $B^2$ from both sides gives $BA + AB = 2AB$. Subtracting $AB$ from both sides gives $BA = AB$. Thus, $(A + B)^2 = A^2 + 2AB + B^2$ if and only if $A$ and $B$ commute. (Can you give an example of such a pair of matrices? Can you find two matrices that do not satisfy this property?)

Properties of the Transpose

Theorem 3.4 Properties of the Transpose
Let $A$ and $B$ be matrices (whose sizes are such that the indicated operations can be performed) and let $k$ be a scalar. Then
a. $(A^T)^T = A$
b. $(A + B)^T = A^T + B^T$
c. $(kA)^T = k(A^T)$
d. $(AB)^T = B^TA^T$
e. $(A^r)^T = (A^T)^r$ for all nonnegative integers $r$

Proof Properties (a)-(c) are intuitively clear and straightforward to prove (see Exercise 30). Proving property (e) is a good exercise in mathematical induction (see Exercise 31). We will prove (d), since it is not what you might have expected. [Would you have suspected that $(AB)^T = A^TB^T$ might be true?]

First, if $A$ is $m \times n$ and $B$ is $n \times r$, then $B^T$ is $r \times n$ and $A^T$ is $n \times m$. Thus, the product $B^TA^T$ is defined and is $r \times m$. Since $AB$ is $m \times r$, $(AB)^T$ is $r \times m$, and so $(AB)^T$ and $B^TA^T$ have the same size. We must now prove that their corresponding entries are equal.

We denote the $i$th row of a matrix $X$ by $\text{row}_i(X)$ and its $j$th column by $\text{col}_j(X)$. Using these conventions, we see that

$$\begin{aligned} [(AB)^T]_{ij} &= (AB)_{ji} \\ &= \text{row}_j(A) \cdot \text{col}_i(B) \\ &= \text{col}_j(A^T) \cdot \text{row}_i(B^T) \\ &= \text{row}_i(B^T) \cdot \text{col}_j(A^T) \\ &= [B^TA^T]_{ij} \end{aligned}$$

(Note that we have used the definition of matrix multiplication, the definition of the transpose, and the fact that the dot product is commutative.)
Since i and j are arbitrary, this result implies that $(AB)^T = B^TA^T$.

Remark Properties (b) and (d) of Theorem 3.4 can be generalized to sums and products of finitely many matrices:
$$(A_1 + A_2 + \cdots + A_k)^T = A_1^T + A_2^T + \cdots + A_k^T \quad\text{and}\quad (A_1A_2\cdots A_k)^T = A_k^T\cdots A_2^TA_1^T$$
assuming that the sizes of the matrices are such that all of the operations can be performed. You are asked to prove these facts by mathematical induction in Exercises 32 and 33.

Example 3.21 Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -1 & 0 \\ 3 & 2 & 1 \end{bmatrix}$$
Then $A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$, so $A + A^T = \begin{bmatrix} 2 & 5 \\ 5 & 8 \end{bmatrix}$, a symmetric matrix. We have
$$B^T = \begin{bmatrix} 1 & 3 \\ -1 & 2 \\ 0 & 1 \end{bmatrix}, \quad\text{so}\quad BB^T = \begin{bmatrix} 2 & 1 \\ 1 & 14 \end{bmatrix} \quad\text{and}\quad B^TB = \begin{bmatrix} 10 & 5 & 3 \\ 5 & 5 & 2 \\ 3 & 2 & 1 \end{bmatrix}$$
Thus, both $BB^T$ and $B^TB$ are symmetric, even though B is not even square! (Check that $AA^T$ and $A^TA$ are also symmetric.)

The next theorem says that the results of Example 3.21 are true in general.

Theorem 3.5
a. If A is a square matrix, then $A + A^T$ is a symmetric matrix.
b. For any matrix A, $AA^T$ and $A^TA$ are symmetric matrices.

Proof We prove (a) and leave proving (b) as Exercise 34. We simply check that
$$(A + A^T)^T = A^T + (A^T)^T = A^T + A = A + A^T$$
(using properties of the transpose and the commutativity of matrix addition). Thus, $A + A^T$ is equal to its own transpose and so, by definition, is symmetric.

Exercises 3.2

In Exercises 1-4, solve the equation for X, given that A = [ … ] and B = [ … ].
1. $X - 2A + 3B = O$
2. $2X = A - B$
3. $2(A + 2B) = 3X$
4. $2(A - B + X) = 3(X - A)$

In Exercises 5-8, write B as a linear combination of the other matrices, if possible.
5. B = [ … ], $A_1$ = [ … ], $A_2$ = [ … ]
6. B = [ … ], $A_1$ = [ … ], $A_2$ = [ … ], $A_3$ = [ … ]
7. B = [ … ], $A_1$ = [ … ], $A_2$ = [ … ], $A_3$ = [ … ]
8. B = [ … ], $A_1$ = [ … ], $A_2$ = [ … ], $A_3$ = [ … ], $A_4$ = [ … ]

In Exercises 9-12, find the general form of the span of the indicated matrices, as in Example 3.17.
9. $\operatorname{span}(A_1, A_2)$ in Exercise 5
10. $\operatorname{span}(A_1, A_2, A_3)$ in Exercise 6
11. $\operatorname{span}(A_1, A_2, A_3)$ in Exercise 7
12. $\operatorname{span}(A_1, A_2, A_3, A_4)$ in Exercise 8

In Exercises 13-16, determine whether the given matrices are linearly independent.
13. [ … ]
14. [ … ]
15. [ … ]
16. [ … ]

17. Prove Theorem 3.2(a)-(d).
18. Prove Theorem 3.2(e)-(h).
19. Prove Theorem 3.3(c).
20. Prove Theorem 3.3(d).
21. Prove the half of Theorem 3.3(e) that was not proved in the text.
22. Prove that, for square matrices A and B, $AB = BA$ if and only if $(A - B)(A + B) = A^2 - B^2$.

In Exercises 23-25, if $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, find conditions on a, b, c, and d such that $AB = BA$.
23. A = [ … ]
24. A = [ … ]
25. A = [ … ]
26. Find conditions on a, b, c, and d such that $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ commutes with both [ … ] and [ … ].
27. Find conditions on a, b, c, and d such that $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ commutes with every $2 \times 2$ matrix.
28. Prove that if AB and BA are both defined, then AB and BA are both square matrices.

A square matrix is called upper triangular if all of the entries below the main diagonal are zero. Thus, the form of an upper triangular matrix is
$$\begin{bmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & * \end{bmatrix}$$
where the entries marked * are arbitrary. A more formal definition of such a matrix $A = [a_{ij}]$ is that $a_{ij} = 0$ if $i > j$.
29. Prove that the product of two upper triangular $n \times n$ matrices is upper triangular.
30. Prove Theorem 3.4(a)-(c).
31. Prove Theorem 3.4(e).
32. Using induction, prove that for all $n \ge 1$, $(A_1 + A_2 + \cdots + A_n)^T = A_1^T + A_2^T + \cdots + A_n^T$.
33. Using induction, prove that for all $n \ge 1$, $(A_1A_2\cdots A_n)^T = A_n^T\cdots A_2^TA_1^T$.
34. Prove Theorem 3.5(b).
35. (a) Prove that if A and B are symmetric $n \times n$ matrices, then so is $A + B$.
(b) Prove that if A is a symmetric $n \times n$ matrix, then so is kA for any scalar k.
36. (a) Give an example to show that if A and B are symmetric $n \times n$ matrices, then AB need not be symmetric.
(b) Prove that if A and B are symmetric $n \times n$ matrices, then AB is symmetric if and only if $AB = BA$.

A square matrix is called skew-symmetric if $A^T = -A$.
37. Which of the following matrices are skew-symmetric?
(a) [ … ] (b) [ … ] (c) [ … ] (d) [ … ]
38. Give a componentwise definition of a skew-symmetric matrix.
39. Prove that the main diagonal of a skew-symmetric matrix must consist entirely of zeros.
40. Prove that if A and B are skew-symmetric $n \times n$ matrices, then so is $A + B$.
41. If A and B are skew-symmetric $2 \times 2$ matrices, under what conditions is AB skew-symmetric?
42. Prove that if A is an $n \times n$ matrix, then $A - A^T$ is skew-symmetric.
43. (a) Prove that any square matrix A can be written as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Consider Theorem 3.5 and Exercise 42.]
(b) Illustrate part (a) for the matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$.

The trace of an $n \times n$ matrix $A = [a_{ij}]$ is the sum of the entries on its main diagonal and is denoted by tr(A). That is,
$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}$$
44. If A and B are $n \times n$ matrices, prove the following properties of the trace:
(a) $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$
(b) $\operatorname{tr}(kA) = k\operatorname{tr}(A)$, where k is a scalar
45. Prove that if A and B are $n \times n$ matrices, then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
46. If A is any matrix, to what is $\operatorname{tr}(AA^T)$ equal?
47. Show that there are no $2 \times 2$ matrices A and B such that $AB - BA = I_2$.

3.3 The Inverse of a Matrix

In this section, we return to the matrix description $A\mathbf{x} = \mathbf{b}$ of a system of linear equations and look for ways to use matrix algebra to solve the system. By way of analogy, consider the equation $ax = b$, where a, b, and x represent real numbers and we want to solve for x.
We can quickly figure out that we want $x = b/a$ as the solution, but we must remind ourselves that this is true only if $a \neq 0$. Proceeding more slowly, assuming that $a \neq 0$, we will reach the solution by the following sequence of steps:
$$ax = b \;\Rightarrow\; \frac{1}{a}(ax) = \frac{1}{a}(b) \;\Rightarrow\; \left(\frac{1}{a}a\right)x = \frac{b}{a} \;\Rightarrow\; 1 \cdot x = \frac{b}{a} \;\Rightarrow\; x = \frac{b}{a}$$
(This example shows how much we do in our head and how many properties of arithmetic and algebra we take for granted!)

To imitate this procedure for the matrix equation $A\mathbf{x} = \mathbf{b}$, what do we need? We need to find a matrix $A'$ (analogous to $1/a$) such that $A'A = I$, an identity matrix (analogous to 1). If such a matrix exists (analogous to the requirement that $a \neq 0$), then we can do the following sequence of calculations:
$$A\mathbf{x} = \mathbf{b} \;\Rightarrow\; A'(A\mathbf{x}) = A'\mathbf{b} \;\Rightarrow\; (A'A)\mathbf{x} = A'\mathbf{b} \;\Rightarrow\; I\mathbf{x} = A'\mathbf{b} \;\Rightarrow\; \mathbf{x} = A'\mathbf{b}$$
(Why would each of these steps be justified?)

Our goal in this section is to determine precisely when we can find such a matrix $A'$. In fact, we are going to insist on a bit more: We want not only $A'A = I$ but also $AA' = I$. This requirement forces A and $A'$ to be square matrices. (Why?)

Definition If A is an $n \times n$ matrix, an inverse of A is an $n \times n$ matrix $A'$ with the property that
$$AA' = I \quad\text{and}\quad A'A = I$$
where $I = I_n$ is the $n \times n$ identity matrix. If such an $A'$ exists, then A is called invertible.

Example 3.22 If $A = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix}$, then $A' = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix}$ is an inverse of A, since
$$AA' = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad A'A = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Example 3.23 Show that the following matrices are not invertible:
(a) $O = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ (b) $B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$

Solution (a) It is easy to see that the zero matrix O does not have an inverse. If it did, then there would be a matrix $O'$ such that $OO' = I = O'O$. But the product of the zero matrix with any other matrix is the zero matrix, and so $OO'$ could never equal the identity matrix I.
(Notice that this proof makes no reference to the size of the matrices and so is true for $n \times n$ matrices in general.)

(b) Suppose B has an inverse $B' = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$. The equation $BB' = I$ gives
$$\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
from which we get the equations
$$\begin{aligned} w + 2y &= 1 \\ x + 2z &= 0 \\ 2w + 4y &= 0 \\ 2x + 4z &= 1 \end{aligned}$$
Subtracting twice the first equation from the third yields $0 = -2$, which is clearly absurd. Thus, there is no solution. (Row reduction gives the same result but is not really needed here.) We deduce that no such matrix $B'$ exists; that is, B is not invertible. (In fact, it does not even have an inverse that works on one side, let alone two!)

Remarks
• Even though we have seen that matrix multiplication is not, in general, commutative, $A'$ (if it exists) must satisfy $A'A = AA'$.
• The examples above raise two questions, which we will answer in this section:
(1) How can we know when a matrix has an inverse?
(2) If a matrix does have an inverse, how can we find it?
• We have not ruled out the possibility that a matrix A might have more than one inverse. The next theorem assures us that this cannot happen.

Theorem 3.6 If A is an invertible matrix, then its inverse is unique.

Proof In mathematics, a standard way to show that there is just one of something is to show that there cannot be more than one. So, suppose that A has two inverses, say, $A'$ and $A''$. Then
$$AA' = I = A'A \quad\text{and}\quad AA'' = I = A''A$$
Thus,
$$A' = A'I = A'(AA'') = (A'A)A'' = IA'' = A''$$
Hence, $A' = A''$, and the inverse is unique.

Thanks to this theorem, we can now refer to the inverse of an invertible matrix. From now on, when A is invertible, we will denote its (unique) inverse by $A^{-1}$ (pronounced "A inverse").

Warning Do not be tempted to write $A^{-1} = \dfrac{1}{A}$! There is no such operation as "division by a matrix." Even if there were, how on earth could we divide the scalar 1 by the matrix A?
If you ever feel tempted to "divide" by a matrix, what you really want to do is multiply by its inverse.

We can now complete the analogy that we set up at the beginning of this section.

Theorem 3.7 If A is an invertible $n \times n$ matrix, then the system of linear equations given by $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$ for any $\mathbf{b}$ in $\mathbb{R}^n$.

Proof Theorem 3.7 essentially formalizes the observation we made at the beginning of this section. We will go through it again, a little more carefully this time. We are asked to prove two things: that $A\mathbf{x} = \mathbf{b}$ has a solution and that it has only one solution. (In mathematics, such a proof is called an "existence and uniqueness" proof.)

To show that a solution exists, we need only verify that $\mathbf{x} = A^{-1}\mathbf{b}$ works. We check that
$$A(A^{-1}\mathbf{b}) = (AA^{-1})\mathbf{b} = I\mathbf{b} = \mathbf{b}$$
So $A^{-1}\mathbf{b}$ satisfies the equation $A\mathbf{x} = \mathbf{b}$, and hence there is at least this solution.

To show that this solution is unique, suppose $\mathbf{y}$ is another solution. Then $A\mathbf{y} = \mathbf{b}$, and multiplying both sides of the equation by $A^{-1}$ on the left, we obtain the chain of implications
$$A^{-1}(A\mathbf{y}) = A^{-1}\mathbf{b} \;\Rightarrow\; (A^{-1}A)\mathbf{y} = A^{-1}\mathbf{b} \;\Rightarrow\; I\mathbf{y} = A^{-1}\mathbf{b} \;\Rightarrow\; \mathbf{y} = A^{-1}\mathbf{b}$$
Thus, $\mathbf{y}$ is the same solution as before, and therefore the solution is unique.

So, returning to the questions we raised in the Remarks before Theorem 3.6, how can we tell if a matrix is invertible and how can we find its inverse when it is invertible? We will give a general procedure shortly, but the situation for $2 \times 2$ matrices is sufficiently simple to warrant being singled out.

Theorem 3.8 If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then A is invertible if $ad - bc \neq 0$, in which case
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
If $ad - bc = 0$, then A is not invertible.

The expression $ad - bc$ is called the determinant of A, denoted det A. The formula for the inverse of $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ (when it exists) is thus $\dfrac{1}{\det A}$ times the matrix obtained by interchanging the entries on the main diagonal and changing the signs on the other two entries.
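The formula of Theorem 3.8, together with Theorem 3.7, translates directly into a short program. The Python sketch below is an illustration only and not part of the text: the helper names det2, inverse2, matmul, and solve2 are hypothetical, matrices are assumed to be stored as lists of rows, and the standard-library Fraction type keeps the arithmetic exact, so entries such as 3/2 are not rounded.

```python
from fractions import Fraction


def det2(A):
    """Determinant ad - bc of a 2 x 2 matrix stored as [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c


def inverse2(A):
    """Inverse of a 2 x 2 matrix by the formula of Theorem 3.8.

    Returns None when det A = 0 (A is not invertible). Entries of the
    result are Fractions, so no rounding occurs.
    """
    (a, b), (c, d) = A
    det = det2(A)
    if det == 0:
        return None
    k = Fraction(1) / det  # the scalar 1/det A
    # interchange the diagonal entries, negate the other two, scale by 1/det A
    return [[k * d, -k * b],
            [-k * c, k * a]]


def matmul(X, Y):
    """Product XY of matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def solve2(A, b):
    """Unique solution x = A^{-1} b of Ax = b (Theorem 3.7), or None."""
    A_inv = inverse2(A)
    if A_inv is None:
        return None
    column = matmul(A_inv, [[v] for v in b])  # b as a 2 x 1 matrix
    return [row[0] for row in column]
```

For instance, for the matrix $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ that appears in Example 3.24, inverse2 returns the entries $-2, 1, \tfrac{3}{2}, -\tfrac{1}{2}$, and solve2 with right-hand side $(3, -2)$ returns $x = -8$, $y = \tfrac{11}{2}$; for a matrix with determinant 0, inverse2 returns None. As the Remark after Example 3.25 points out, this approach is reasonable only for $2 \times 2$ systems; elimination is the better general tool.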
In addition to giving this formula, Theorem 3.8 says that a $2 \times 2$ matrix A is invertible if and only if $\det A \neq 0$. We will see in Chapter 4 that the determinant can be defined for all square matrices and that this result remains true, although there is no simple formula for the inverse of larger square matrices.

Proof Suppose that $\det A = ad - bc \neq 0$. Then
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} ad - bc & -ab + ba \\ cd - dc & -cb + da \end{bmatrix} = \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix} = \det A \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Similarly,
$$\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det A \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Since $\det A \neq 0$, we can multiply both sides of each equation by $1/\det A$ to obtain
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\left(\frac{1}{\det A}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad \left(\frac{1}{\det A}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\right)\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
[Note that we have used property (d) of Theorem 3.3.] Thus, the matrix
$$\frac{1}{\det A}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
satisfies the definition of an inverse, so A is invertible. Since the inverse of A is unique, by Theorem 3.6, we must have
$$A^{-1} = \frac{1}{\det A}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Conversely, assume that $ad - bc = 0$. We will consider separately the cases where $a \neq 0$ and where $a = 0$. If $a \neq 0$, then $d = bc/a$, so the matrix can be written as
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ ac/a & bc/a \end{bmatrix} = \begin{bmatrix} a & b \\ ka & kb \end{bmatrix}$$
where $k = c/a$. In other words, the second row of A is a multiple of the first. Referring to Example 3.23(b), we see that if A has an inverse $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$, then
$$\begin{bmatrix} a & b \\ ka & kb \end{bmatrix}\begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
and the corresponding system of linear equations
$$\begin{aligned} aw + by &= 1 \\ ax + bz &= 0 \\ kaw + kby &= 0 \\ kax + kbz &= 1 \end{aligned}$$
has no solution. (Why?)

If $a = 0$, then $ad - bc = 0$ implies that $bc = 0$, and therefore either b or c is 0. Thus, A is of the form
$$\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}$$
In the first case, $\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix}\begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ * & * \end{bmatrix} \neq \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. Similarly, $\begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}$ cannot have an inverse. (Verify this.)

Consequently, if $ad - bc = 0$, then A is not invertible.

Example 3.24 Find the inverses of $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 12 & -15 \\ 4 & -5 \end{bmatrix}$, if they exist.

Solution We have $\det A = 1(4) - 2(3) = -2 \neq 0$, so A is invertible, with
$$A^{-1} = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{bmatrix}$$
(Check this.) On the other hand, $\det B = 12(-5) - (-15)(4) = 0$, so B is not invertible.

Example 3.25 Use the inverse of the coefficient matrix to solve the linear system
$$\begin{aligned} x + 2y &= 3 \\ 3x + 4y &= -2 \end{aligned}$$

Solution The coefficient matrix is the matrix $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, whose inverse we computed in Example 3.24. By Theorem 3.7, $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$. Here we have $\mathbf{b} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$; thus, the solution to the given system is
$$\mathbf{x} = \begin{bmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} -8 \\ \tfrac{11}{2} \end{bmatrix}$$

Remark Solving a linear system $A\mathbf{x} = \mathbf{b}$ via $\mathbf{x} = A^{-1}\mathbf{b}$ would appear to be a good method. Unfortunately, except for $2 \times 2$ coefficient matrices and matrices with certain special forms, it is almost always faster to use Gaussian or Gauss-Jordan elimination to find the solution directly. (See Exercise 13.) Furthermore, the technique of Example 3.25 works only when the coefficient matrix is square and invertible, while elimination methods can always be applied.

Properties of Invertible Matrices

The following theorem records some of the most important properties of invertible matrices.

Theorem 3.9
a. If A is an invertible matrix, then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
b. If A is an invertible matrix and c is a nonzero scalar, then cA is an invertible matrix and $(cA)^{-1} = \dfrac{1}{c}A^{-1}$.
c. If A and B are invertible matrices of the same size, then AB is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
d. If A is an invertible matrix, then $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.
e. If A is an invertible matrix, then $A^n$ is invertible for all nonnegative integers n and $(A^n)^{-1} = (A^{-1})^n$.

Proof We will prove properties (a), (c), and (e), leaving properties (b) and (d) to be proven in Exercises 14 and 15.

(a) To show that $A^{-1}$ is invertible, we must argue that there is a matrix X such that
$$A^{-1}X = I = XA^{-1}$$
But A certainly satisfies these equations in place of X, so $A^{-1}$ is invertible and A is an inverse of $A^{-1}$. Since inverses are unique, this means that $(A^{-1})^{-1} = A$.
(c) Here we must show that there is a matrix X such that
$$(AB)X = I = X(AB)$$
The claim is that substituting $B^{-1}A^{-1}$ for X works. We check that
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$
where we have used associativity to shift the parentheses. Similarly, $(B^{-1}A^{-1})(AB) = I$ (check!), so AB is invertible and its inverse is $B^{-1}A^{-1}$.

(e) The basic idea here is easy enough. For example, when $n = 2$, we have
$$A^2(A^{-1})^2 = AAA^{-1}A^{-1} = AIA^{-1} = AA^{-1} = I$$
Similarly, $(A^{-1})^2A^2 = I$. Thus, $(A^{-1})^2$ is the inverse of $A^2$. It is not difficult to see that a similar argument works for any higher integer value of n. However, mathematical induction is the way to carry out the proof.

The basis step is when $n = 0$, in which case we are being asked to prove that $A^0$ is invertible and that
$$(A^0)^{-1} = (A^{-1})^0$$
This is the same as showing that I is invertible and that $I^{-1} = I$, which is clearly true. (Why? See Exercise 16.)

Now we assume that the result is true when $n = k$, where k is a specific nonnegative integer. That is, the induction hypothesis is to assume that $A^k$ is invertible and that
$$(A^k)^{-1} = (A^{-1})^k$$
The induction step requires that we prove that $A^{k+1}$ is invertible and that $(A^{k+1})^{-1} = (A^{-1})^{k+1}$. Now we know from (c) that $A^{k+1} = AA^k$ is invertible, since A and (by hypothesis) $A^k$ are both invertible. Moreover,
$$\begin{aligned}(A^{-1})^{k+1} &= (A^{-1})^kA^{-1} \\ &= (A^k)^{-1}A^{-1} && \text{by the induction hypothesis} \\ &= (AA^k)^{-1} && \text{by property (c)} \\ &= (A^{k+1})^{-1}\end{aligned}$$
Therefore, $A^n$ is invertible for all nonnegative integers n, and $(A^n)^{-1} = (A^{-1})^n$ by the principle of mathematical induction.

Remarks
• While all of the properties of Theorem 3.9 are useful, (c) is the one you should highlight. It is perhaps the most important algebraic property of matrix inverses. It is also the one that is easiest to get wrong. In Exercise 17, you are asked to give a counterexample to show that, contrary to what we might like, $(AB)^{-1} \neq A^{-1}B^{-1}$ in general.
The correct property, $(AB)^{-1} = B^{-1}A^{-1}$, is sometimes called the socks-and-shoes rule, because, although we put our socks on before our shoes, we take them off in the reverse order.
• Property (c) generalizes to products of finitely many invertible matrices: If $A_1, A_2, \dots, A_n$ are invertible matrices of the same size, then $A_1A_2\cdots A_n$ is invertible and
$$(A_1A_2\cdots A_n)^{-1} = A_n^{-1}\cdots A_2^{-1}A_1^{-1}$$
(See Exercise 18.) Thus, we can state: The inverse of a product of invertible matrices is the product of their inverses in the reverse order.
• Since, for real numbers, $\dfrac{1}{a + b} \neq \dfrac{1}{a} + \dfrac{1}{b}$, we should not expect that, for square matrices, $(A + B)^{-1} = A^{-1} + B^{-1}$ (and, indeed, this is not true in general; see Exercise 19). In fact, except for special matrices, there is no formula for $(A + B)^{-1}$.
• Property (e) allows us to define negative integer powers of an invertible matrix:

Definition If A is an invertible matrix and n is a positive integer, then $A^{-n}$ is defined by
$$A^{-n} = (A^{-1})^n = (A^n)^{-1}$$

With this definition, it can be shown that the rules for exponentiation, $A^rA^s = A^{r+s}$ and $(A^r)^s = A^{rs}$, hold for all integers r and s, provided A is invertible.

One use of the algebraic properties of matrices is to help solve equations involving matrices. The next example illustrates the process. Note that we must pay particular attention to the order of the matrices in the product.

Example 3.26 Solve the following matrix equation for X (assuming that the matrices involved are such that all of the indicated operations are defined):
$$A^{-1}(BX)^{-1} = (A^{-1}B^3)^2$$

Solution There are many ways to proceed here. One solution is
$$\begin{aligned}
A^{-1}(BX)^{-1} = (A^{-1}B^3)^2 &\;\Rightarrow\; ((BX)A)^{-1} = (A^{-1}B^3)^2 \\
&\;\Rightarrow\; [((BX)A)^{-1}]^{-1} = [(A^{-1}B^3)^2]^{-1} \\
&\;\Rightarrow\; (BX)A = [(A^{-1}B^3)(A^{-1}B^3)]^{-1} \\
&\;\Rightarrow\; (BX)A = B^{-3}(A^{-1})^{-1}B^{-3}(A^{-1})^{-1} \\
&\;\Rightarrow\; BXA = B^{-3}AB^{-3}A \\
&\;\Rightarrow\; B^{-1}BXAA^{-1} = B^{-1}B^{-3}AB^{-3}AA^{-1} \\
&\;\Rightarrow\; IXI = B^{-4}AB^{-3}I \\
&\;\Rightarrow\; X = B^{-4}AB^{-3}
\end{aligned}$$
(Can you justify each step?) Note the careful use of Theorem 3.9(c) and the expansion of $(A^{-1}B^3)^2$. We have also made liberal use of the associativity of matrix multiplication to simplify the placement (or elimination) of parentheses.

Elementary Matrices

We are going to use matrix multiplication to take a different perspective on the row reduction of matrices. In the process, you will discover many new and important insights into the nature of invertible matrices.

If we take
$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
and multiply a matrix A with three rows by E on the left, we find that EA is A with its second and third rows interchanged. In other words, multiplying A by E (on the left) has the same effect as interchanging rows 2 and 3 of A. What is significant about E? It is simply the matrix we obtain by applying the same elementary row operation, $R_2 \leftrightarrow R_3$, to the identity matrix $I_3$. It turns out that this always works.

Definition An elementary matrix is any matrix that can be obtained by performing an elementary row operation on an identity matrix.

Since there are three types of elementary row operations, there are three corresponding types of elementary matrices. Here are some more elementary matrices.

Example 3.27 Let
$$E_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -2 & 0 & 1 \end{bmatrix}$$
Each of these matrices has been obtained from the identity matrix $I_4$ by applying a single elementary row operation. The matrix $E_1$ corresponds to $3R_2$, $E_2$ to $R_1 \leftrightarrow R_3$, and $E_3$ to $R_4 - 2R_2$. Observe that when we left-multiply a $4 \times n$ matrix by one of these elementary matrices, the corresponding elementary row operation is performed on the matrix.
For example, if
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}$$
then
$$E_1A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{21} & 3a_{22} & 3a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}, \quad E_2A = \begin{bmatrix} a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}$$
and
$$E_3A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} - 2a_{21} & a_{42} - 2a_{22} & a_{43} - 2a_{23} \end{bmatrix}$$

Example 3.27 and Exercises 24-30 should convince you that any elementary row operation on any matrix can be accomplished by left-multiplying by a suitable elementary matrix. We record this fact as a theorem, the proof of which is omitted.

Theorem 3.10 Let E be the elementary matrix obtained by performing an elementary row operation on $I_n$. If the same elementary row operation is performed on an $n \times r$ matrix A, the result is the same as the matrix EA.

Remark From a computational point of view, it is not a good idea to use elementary matrices to perform elementary row operations; just do them directly. However, elementary matrices can provide some valuable insights into invertible matrices and the solution of systems of linear equations.

We have already observed that every elementary row operation can be "undone," or "reversed." This same observation applied to elementary matrices shows us that they are invertible.

Example 3.28 Let
$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad\text{and}\quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$$
Then $E_1$ corresponds to $R_2 \leftrightarrow R_3$, which is undone by doing $R_2 \leftrightarrow R_3$ again. Thus, $E_1^{-1} = E_1$. (Check by showing that $E_1^2 = E_1E_1 = I$.) The matrix $E_2$ comes from $4R_2$, which is undone by performing $\tfrac{1}{4}R_2$. Thus,
$$E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
which can be easily checked. Finally, $E_3$ corresponds to the elementary row operation $R_3 - 2R_1$, which can be undone by the elementary row operation $R_3 + 2R_1$. So, in this case,
$$E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$$
(Again, it is easy to check this by confirming that the product of this matrix and $E_3$, in both orders, is I.)

Notice that not only is each elementary matrix invertible, but its inverse is another elementary matrix of the same type.
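The observations of Theorem 3.10 and Example 3.28 can be checked mechanically. The Python sketch below is illustrative only: the helpers identity, matmul, swap, scale, and add_multiple are hypothetical names, matrices are assumed to be lists of rows with exact Fraction entries, and rows are indexed from 0 (so $R_2$ in the text is index 1). Each elementary matrix is built, as in the definition, by applying one row operation to $I_3$, and each inverse by applying the operation that undoes it.

```python
from fractions import Fraction


def identity(n):
    """n x n identity matrix with exact Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Product XY of matrices stored as lists of rows."""
    return [[sum((X[i][t] * Y[t][j] for t in range(len(Y))), Fraction(0))
             for j in range(len(Y[0]))] for i in range(len(X))]


# The three types of elementary row operation. Each returns a new matrix.
def swap(M, i, j):
    """R_i <-> R_j"""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale(M, i, k):
    """k R_i (k must be nonzero)"""
    M = [row[:] for row in M]
    M[i] = [k * x for x in M[i]]
    return M


def add_multiple(M, i, j, k):
    """R_i + k R_j"""
    M = [row[:] for row in M]
    M[i] = [x + k * y for x, y in zip(M[i], M[j])]
    return M


# Elementary matrices of Example 3.28: one row operation applied to I_3.
I3 = identity(3)
E1 = swap(I3, 1, 2)              # R_2 <-> R_3
E2 = scale(I3, 1, 4)             # 4 R_2
E3 = add_multiple(I3, 2, 0, -2)  # R_3 - 2 R_1

# Inverses: apply the operation that undoes each one to I_3.
E1_inv = swap(I3, 1, 2)                # R_2 <-> R_3 again
E2_inv = scale(I3, 1, Fraction(1, 4))  # (1/4) R_2
E3_inv = add_multiple(I3, 2, 0, 2)     # R_3 + 2 R_1
```

Left-multiplying any matrix with three rows by E1, E2, or E3 gives the same result as calling swap, scale, or add_multiple on it directly, and each product E1 E1_inv, E2 E2_inv, E3 E3_inv (in either order) is $I_3$. The sketch builds the matrices only to exhibit these facts; in line with the Remark above, actual row reduction applies the operations directly.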
We record this finding as the next theorem.

Theorem 3.11 Each elementary matrix is invertible, and its inverse is an elementary matrix of the same type.

The Fundamental Theorem of Invertible Matrices

We are now in a position to prove one of the main results in this book: a set of equivalent characterizations of what it means for a matrix to be invertible. In a sense, much of linear algebra is connected to this theorem, either in the development of these characterizations or in their application. As you might expect, given this introduction, we will use this theorem a great deal. Make it your friend!

We refer to Theorem 3.12 as the first version of the Fundamental Theorem, since we will add to it in subsequent chapters. You are reminded that, when we say that a set of statements about a matrix A are equivalent, we mean that, for a given A, the statements are either all true or all false.

Theorem 3.12 The Fundamental Theorem of Invertible Matrices: Version 1
Let A be an $n \times n$ matrix. The following statements are equivalent:
a. A is invertible.
b. $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$ in $\mathbb{R}^n$.
c. $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
d. The reduced row echelon form of A is $I_n$.
e. A is a product of elementary matrices.

Proof We will establish the theorem by proving the circular chain of implications
$$(a) \Rightarrow (b) \Rightarrow (c) \Rightarrow (d) \Rightarrow (e) \Rightarrow (a)$$

(a) ⇒ (b) We have already shown that if A is invertible, then $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$ for any $\mathbf{b}$ in $\mathbb{R}^n$ (Theorem 3.7).

(b) ⇒ (c) Assume that $A\mathbf{x} = \mathbf{b}$ has a unique solution for any $\mathbf{b}$ in $\mathbb{R}^n$. This implies, in particular, that $A\mathbf{x} = \mathbf{0}$ has a unique solution. But a homogeneous system $A\mathbf{x} = \mathbf{0}$ always has $\mathbf{x} = \mathbf{0}$ as one solution. So in this case, $\mathbf{x} = \mathbf{0}$ must be the solution.

(c) ⇒ (d) Suppose that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. The corresponding system of equations is
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0 \end{aligned}$$
and we are assuming that its solution is
$$x_1 = 0, \quad x_2 = 0, \quad \dots, \quad x_n = 0$$
In other words, Gauss-Jordan elimination applied to the augmented matrix of the system gives
$$[A \mid \mathbf{0}] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{array}\right] = [I_n \mid \mathbf{0}]$$
Thus, the reduced row echelon form of A is $I_n$.

(d) ⇒ (e) If we assume that the reduced row echelon form of A is $I_n$, then A can be reduced to $I_n$ using a finite sequence of elementary row operations. By Theorem 3.10, each one of these elementary row operations can be achieved by left-multiplying by an appropriate elementary matrix. If the appropriate sequence of elementary matrices is $E_1, E_2, \dots, E_k$ (in that order), then we have
$$E_k \cdots E_2E_1A = I_n$$
According to Theorem 3.11, these elementary matrices are all invertible. Therefore, so is their product, and we have
$$A = (E_k \cdots E_2E_1)^{-1}I_n = (E_k \cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1} \cdots E_k^{-1}$$
Again, each $E_i^{-1}$ is another elementary matrix, by Theorem 3.11, so we have written A as a product of elementary matrices, as required.

(e) ⇒ (a) If A is a product of elementary matrices, then A is invertible, since elementary matrices are invertible and products of invertible matrices are invertible.

Example 3.29 If possible, express $A = \begin{bmatrix} 2 & 3 \\ 1 & 3 \end{bmatrix}$ as a product of elementary matrices.

Solution We row reduce A as follows:
$$A = \begin{bmatrix} 2 & 3 \\ 1 & 3 \end{bmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix} \xrightarrow{R_2 - 2R_1} \begin{bmatrix} 1 & 3 \\ 0 & -3 \end{bmatrix} \xrightarrow{R_1 + R_2} \begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix} \xrightarrow{-\frac{1}{3}R_2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2$$
Thus, the reduced row echelon form of A is the identity matrix, so the Fundamental Theorem assures us that A is invertible and can be written as a product of elementary matrices. We have $E_4E_3E_2E_1A = I$, where
$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad E_4 = \begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{3} \end{bmatrix}$$
are the elementary matrices corresponding to the four elementary row operations used to reduce A to I. As in the proof of the theorem, we have
$$A = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}$$
as required.

Remark Because the sequence of elementary row operations that transforms A into I is not unique, neither is the representation of A as a product of elementary matrices.
(Find a different way to express A as a product of elementary matrices.)

"Never bring a cannon on stage in Act I unless you intend to fire it by the last act."
- Anton Chekhov

The Fundamental Theorem is surprisingly powerful. To illustrate its power, we consider two of its consequences. The first is that, although the definition of an invertible matrix states that a matrix A is invertible if there is a matrix B such that both $AB = I$ and $BA = I$ are satisfied, we need only check one of these equations. Thus, we can cut our work in half!

Theorem 3.13 Let A be a square matrix. If B is a square matrix such that either $AB = I$ or $BA = I$, then A is invertible and $B = A^{-1}$.

Proof Suppose $BA = I$. Consider the equation $A\mathbf{x} = \mathbf{0}$. Left-multiplying by B, we have $BA\mathbf{x} = B\mathbf{0}$. This implies that $\mathbf{x} = I\mathbf{x} = \mathbf{0}$. Thus, the system represented by $A\mathbf{x} = \mathbf{0}$ has the unique solution $\mathbf{x} = \mathbf{0}$. From the equivalence of (c) and (a) in the Fundamental Theorem, we know that A is invertible. (That is, $A^{-1}$ exists and satisfies $AA^{-1} = I = A^{-1}A$.) If we now right-multiply both sides of $BA = I$ by $A^{-1}$, we obtain
$$BAA^{-1} = IA^{-1} \;\Rightarrow\; BI = A^{-1} \;\Rightarrow\; B = A^{-1}$$
(The proof in the case of $AB = I$ is left as Exercise 41.)

The next consequence of the Fundamental Theorem is the basis for an efficient method of computing the inverse of a matrix.

Theorem 3.14 Let A be a square matrix. If a sequence of elementary row operations reduces A to I, then the same sequence of elementary row operations transforms I into $A^{-1}$.

Proof If A is row equivalent to I, then we can achieve the reduction by left-multiplying by a sequence $E_1, E_2, \dots, E_k$ of elementary matrices. Therefore, we have $E_k \cdots E_2E_1A = I$. Setting $B = E_k \cdots E_2E_1$ gives $BA = I$. By Theorem 3.13, A is invertible and $A^{-1} = B$. Now applying the same sequence of elementary row operations to I is equivalent to left-multiplying I by $E_k \cdots E_2E_1 = B$.
The result is Ek · · · E2E 1 I = BI = B = A - 1 • • • Thus, I is transformed into A - i by the same sequence of elementary row operations. The Gauss-Jordan Melhod for Compuling lhe Inverse We can perform row operations on A and I simultaneously by constructing a "super­ augmented matrix" [A I I ] . Theorem 3 . 1 4 shows that if A is row equivalent to I [which, by the Fundamental Theorem (d) � (a), means that A is invertible] , then elementary row operations will yield If A cannot be reduced to I, then the Fundamental Theorem guarantees us that A is not invertible. The procedure just described is simply Gauss-Jordan elimination performed on an n X 2n, instead of an n X (n + 1), augmented matrix. Another way to view this pro­ cedure is to look at the problem of finding A -I as solving the matrix equation AX = In for an n X n matrix X. (This is sufficient, by the Fundamental Theorem, since a right inverse of A must be a two-sided inverse.) If we denote the columns of X by x1 , . . . , Xn' then this matrix equation is equivalent to solving for the columns of X, one at a time. Since the columns of In are the standard unit vectors e 1 , . . . , en , we thus have n systems of linear equations, all with coefficient matrix A: Since the same sequence of row operations is needed to bring A to reduced row echelon form in each case, the augmented matrices for these systems, [A I e 1 ] , . . . , [A I e n ] , can be combined as We now apply row operations to try to reduce A to In , which, if successful, will simul­ taneously solve for the columns of A - i , transforming In into A - i . We illustrate this use of Gauss-Jordan elimination with three examples. Exa m p l e 3 . 3 0 Find the inverse of A= if it exists. 
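$$A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 2 & 4 \\ 1 & 3 & -3 \end{bmatrix}$$

Solution Gauss-Jordan elimination produces

$$[A \mid I] = \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 2 & 2 & 4 & 0 & 1 & 0 \\ 1 & 3 & -3 & 0 & 0 & 1 \end{array}\right] \xrightarrow[R_3 - R_1]{R_2 - 2R_1} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & -2 & 6 & -2 & 1 & 0 \\ 0 & 1 & -2 & -1 & 0 & 1 \end{array}\right]$$

$$\xrightarrow{-\frac{1}{2}R_2} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -3 & 1 & -\frac{1}{2} & 0 \\ 0 & 1 & -2 & -1 & 0 & 1 \end{array}\right] \xrightarrow{R_3 - R_2} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -3 & 1 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 & -2 & \frac{1}{2} & 1 \end{array}\right]$$

$$\xrightarrow[R_2 + 3R_3]{R_1 + R_3} \left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -1 & \frac{1}{2} & 1 \\ 0 & 1 & 0 & -5 & 1 & 3 \\ 0 & 0 & 1 & -2 & \frac{1}{2} & 1 \end{array}\right] \xrightarrow{R_1 - 2R_2} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 9 & -\frac{3}{2} & -5 \\ 0 & 1 & 0 & -5 & 1 & 3 \\ 0 & 0 & 1 & -2 & \frac{1}{2} & 1 \end{array}\right]$$

Therefore,

$$A^{-1} = \begin{bmatrix} 9 & -\frac{3}{2} & -5 \\ -5 & 1 & 3 \\ -2 & \frac{1}{2} & 1 \end{bmatrix}$$

(You should always check that $AA^{-1} = I$ by direct multiplication. By Theorem 3.13, we do not need to check that $A^{-1}A = I$ too.)

Remark Notice that we have used the variant of Gauss-Jordan elimination that first introduces all of the zeros below the leading 1s, from left to right and top to bottom, and then creates zeros above the leading 1s, from right to left and bottom to top. This approach saves on calculations, as we noted in Chapter 2, but you may find it easier, when working by hand, to create all of the zeros in each column as you go. The answer, of course, will be the same.

Example 3.31 Find the inverse of

$$A = \begin{bmatrix} 2 & 1 & -4 \\ -4 & -1 & 6 \\ -2 & 2 & -2 \end{bmatrix}$$

if it exists.

Solution We proceed as in Example 3.30, adjoining the identity matrix to A and then trying to manipulate $[A \mid I]$ into $[I \mid A^{-1}]$.

$$[A \mid I] = \left[\begin{array}{ccc|ccc} 2 & 1 & -4 & 1 & 0 & 0 \\ -4 & -1 & 6 & 0 & 1 & 0 \\ -2 & 2 & -2 & 0 & 0 & 1 \end{array}\right] \xrightarrow[R_3 + R_1]{R_2 + 2R_1} \left[\begin{array}{ccc|ccc} 2 & 1 & -4 & 1 & 0 & 0 \\ 0 & 1 & -2 & 2 & 1 & 0 \\ 0 & 3 & -6 & 1 & 0 & 1 \end{array}\right] \xrightarrow{R_3 - 3R_2} \left[\begin{array}{ccc|ccc} 2 & 1 & -4 & 1 & 0 & 0 \\ 0 & 1 & -2 & 2 & 1 & 0 \\ 0 & 0 & 0 & -5 & -3 & 1 \end{array}\right]$$

At this point, we see that it is not possible to reduce A to I, since there is a row of zeros on the left-hand side of the augmented matrix. Consequently, A is not invertible.

As the next example illustrates, everything works the same way over $\mathbb{Z}_p$, where p is prime.

Example 3.32 Find the inverse of

$$A = \begin{bmatrix} 2 & 2 \\ 2 & 0 \end{bmatrix}$$

if it exists, over $\mathbb{Z}_3$.

Solution 1 We use the Gauss-Jordan method, remembering that all calculations are in $\mathbb{Z}_3$.

$$[A \mid I] = \left[\begin{array}{cc|cc} 2 & 2 & 1 & 0 \\ 2 & 0 & 0 & 1 \end{array}\right] \xrightarrow{2R_1} \left[\begin{array}{cc|cc} 1 & 1 & 2 & 0 \\ 2 & 0 & 0 & 1 \end{array}\right] \xrightarrow{R_2 + R_1} \left[\begin{array}{cc|cc} 1 & 1 & 2 & 0 \\ 0 & 1 & 2 & 1 \end{array}\right] \xrightarrow{R_1 + 2R_2} \left[\begin{array}{cc|cc} 1 & 0 & 0 & 2 \\ 0 & 1 & 2 & 1 \end{array}\right]$$

Thus,

$$A^{-1} = \begin{bmatrix} 0 & 2 \\ 2 & 1 \end{bmatrix}$$

and it is easy to check that, over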
$\mathbb{Z}_3$, $AA^{-1} = I$.

Solution 2 Since A is a $2 \times 2$ matrix, we can also compute $A^{-1}$ using the formula given in Theorem 3.8. The determinant of A is

$$\det A = 2(0) - 2(2) = 0 - 1 = -1 = 2 \quad \text{in } \mathbb{Z}_3$$

(since $2(2) = 1$ and $2 + 1 = 0$ in $\mathbb{Z}_3$). Thus, $A^{-1}$ exists and is given by the formula in Theorem 3.8. We must be careful here, though, since the formula introduces the "fraction" $1/\det A$ and there are no fractions in $\mathbb{Z}_3$. We must use multiplicative inverses rather than division. Instead of $1/\det A = 1/2$, we use $2^{-1}$; that is, we find the number x that satisfies the equation $2x = 1$ in $\mathbb{Z}_3$. It is easy to see that $x = 2$ is the solution we want: In $\mathbb{Z}_3$, $2^{-1} = 2$, since $2(2) = 1$. The formula for $A^{-1}$ now becomes

$$A^{-1} = 2^{-1}\begin{bmatrix} 0 & -2 \\ -2 & 2 \end{bmatrix} = 2\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 2 \\ 2 & 1 \end{bmatrix}$$

which agrees with our previous solution.

Exercises 3.3

In Exercises 1–10, find the inverse of the given matrix (if it exists) using Theorem 3.8.

1.–5. […]
6. $\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$
7. $\begin{bmatrix} -1.5 & -4.2 \\ 0.5 & 2.4 \end{bmatrix}$
8. $\begin{bmatrix} 3.55 & 0.25 \\ 8.52 & 0.60 \end{bmatrix}$
9. […]
10. $\begin{bmatrix} 1/a & 1/b \\ 1/c & 1/d \end{bmatrix}$, where neither a, b, c, nor d is 0

In Exercises 11 and 12, solve the given system using the method of Example 3.25.

11. $2x + y = -1$, $5x + 3y = 2$
12. $x_1 - x_2 = 1$, $2x_1 + x_2 = 2$
13. Let $A = […]$, $\mathbf{b}_1 = […]$, $\mathbf{b}_2 = […]$, and $\mathbf{b}_3 = […]$.
(a) Find $A^{-1}$ and use it to solve the three systems $A\mathbf{x} = \mathbf{b}_1$, $A\mathbf{x} = \mathbf{b}_2$, and $A\mathbf{x} = \mathbf{b}_3$.
(b) Solve all three systems at the same time by row reducing the augmented matrix $[A \mid \mathbf{b}_1\ \mathbf{b}_2\ \mathbf{b}_3]$ using Gauss-Jordan elimination.
(c) Carefully count the total number of individual multiplications that you performed in (a) and in (b). You should discover that, even for this $2 \times 2$ example, one method uses fewer operations. For larger systems, the difference is even more pronounced, and this explains why computer systems do not use one of these methods to solve linear systems.
14. Prove Theorem 3.9(b).
15. Prove Theorem 3.9(d).
16. Prove that the $n \times n$ identity matrix $I_n$ is invertible and that $I_n^{-1} = I_n$.
17. (a) Give a counterexample to show that $(AB)^{-1} \neq A^{-1}B^{-1}$ in general.
(b) Under what conditions on A and B is $(AB)^{-1} = A^{-1}B^{-1}$? Prove your assertion.
18. By induction, prove that if $A_1, A_2, \ldots, A_n$ are invertible matrices of the same size, then the product $A_1A_2 \cdots A_n$ is invertible and $(A_1A_2 \cdots A_n)^{-1} = A_n^{-1} \cdots A_2^{-1}A_1^{-1}$.
19. Give a counterexample to show that $(A + B)^{-1} \neq A^{-1} + B^{-1}$ in general.

In Exercises 20–23, solve the given matrix equation for X. Simplify your answers as much as possible. (In the words of Albert Einstein, "Everything should be made as simple as possible, but not simpler.") Assume that all matrices are invertible.

20. $XA^2 = A^{-1}$
21. $AXB = (BA)^2$
22. $(A^{-1}X)^{-1} = A(B^{-2}A)^{-1}$
23. $ABXA^{-1}B^{-1} = I + A$

In Exercises 24–30, let $A = […]$, $B = […]$, $C = […]$, and $D = […]$. In each case, find an elementary matrix E that satisfies the given equation.

24. $EA = B$
25. $EB = A$
26. $EA = C$
27. $EC = A$
28. $EC = D$
29. $ED = C$
30. Is there an elementary matrix E such that $EA = D$? Why or why not?

In Exercises 31–38, find the inverse of the given elementary matrix.

31.–38. […]

In Exercises 39 and 40, find a sequence of elementary matrices $E_1, E_2, \ldots, E_k$ such that $E_k \cdots E_2E_1A = I$. Use this sequence to write both A and $A^{-1}$ as products of elementary matrices.

39. $A = […]$
40. $A = […]$
41. Prove Theorem 3.13 for the case of $AB = I$.
42. (a) Prove that if A is invertible and $AB = O$, then $B = O$.
(b) Give a counterexample to show that the result in part (a) may fail if A is not invertible.
43. (a) Prove that if A is invertible and $BA = CA$, then $B = C$.
(b) Give a counterexample to show that the result in part (a) may fail if A is not invertible.
44. A square matrix A is called idempotent if $A^2 = A$. (The word idempotent comes from the Latin idem, meaning "same," and potere, meaning "to have power." Thus, something that is idempotent has the "same power" when squared.)
(a) Find three idempotent $2 \times 2$ matrices.
(b) Prove that the only invertible idempotent $n \times n$ matrix is the identity matrix.
45. Show that if A is a square matrix that satisfies the equation $A^2 - 2A + I = O$, then $A^{-1} = 2I - A$.
46. Prove that if a symmetric matrix is invertible, then its inverse is symmetric also.
47. Prove that if A and B are square matrices and AB is invertible, then both A and B are invertible.

In Exercises 48–63, use the Gauss-Jordan method to find the inverse of the given matrix (if it exists).

48.–63. […]

Partitioning large square matrices can sometimes make their inverses easier to compute, particularly if the blocks have a nice form. In Exercises 64–68, verify by block multiplication that the inverse of a matrix, if partitioned as shown, is as claimed. (Assume that all inverses exist as needed.)

64. […]
65. $\begin{bmatrix} O & B \\ C & I \end{bmatrix}^{-1} = \begin{bmatrix} -(BC)^{-1} & (BC)^{-1}B \\ C(BC)^{-1} & I - C(BC)^{-1}B \end{bmatrix}$
66. $\begin{bmatrix} I & B \\ C & I \end{bmatrix}^{-1} = \begin{bmatrix} (I - BC)^{-1} & -(I - BC)^{-1}B \\ -C(I - BC)^{-1} & I + C(I - BC)^{-1}B \end{bmatrix}$
67. $\begin{bmatrix} O & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} -(BD^{-1}C)^{-1} & (BD^{-1}C)^{-1}BD^{-1} \\ D^{-1}C(BD^{-1}C)^{-1} & D^{-1} - D^{-1}C(BD^{-1}C)^{-1}BD^{-1} \end{bmatrix}$
68. $\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} P & Q \\ R & S \end{bmatrix}$, where $P = (A - BD^{-1}C)^{-1}$, $Q = -PBD^{-1}$, $R = -D^{-1}CP$, and $S = D^{-1} + D^{-1}CPBD^{-1}$

In Exercises 69–72, partition the given matrix so that you can apply one of the formulas from Exercises 64–68, and then calculate the inverse using that formula.

69. […]
70. The matrix in Exercise 58
71. […]
72. […]

3.4 The LU Factorization

Just as it is natural (and illuminating) to factor a natural number into a product of other natural numbers (for example, $30 = 2 \cdot 3 \cdot 5$), it is also frequently helpful to factor matrices as products of other matrices. Any representation of a matrix as a product of two or more other matrices is called a matrix factorization. For example,

$$\begin{bmatrix} 3 & -1 \\ 9 & -5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ 0 & -2 \end{bmatrix}$$

is a matrix factorization.

Needless to say, some factorizations are more useful than others. In this section, we introduce a matrix factorization that arises in the solution of systems of linear equations by Gaussian elimination and is particularly well suited to computer implementation. In subsequent chapters, we will encounter other equally useful matrix factorizations. Indeed, the topic is a rich one, and entire books and courses have been devoted to it.

Consider a system of linear equations of the form $A\mathbf{x} = \mathbf{b}$, where A is an $n \times n$ matrix. Our goal is to show that Gaussian elimination implicitly factors A into a product of matrices that then enable us to solve the given system (and any other system with the same coefficient matrix) easily. The following example illustrates the basic idea.

Example 3.33 Let

$$A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & -1 & 3 \\ -2 & 5 & 5 \end{bmatrix}$$

Row reduction of A proceeds as follows:

$$A \xrightarrow{R_2 - 2R_1} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ -2 & 5 & 5 \end{bmatrix} \xrightarrow{R_3 + R_1} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ 0 & 6 & 8 \end{bmatrix} \xrightarrow{R_3 + 2R_2} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ 0 & 0 & 2 \end{bmatrix} = U \qquad (1)$$

The three elementary matrices $E_1, E_2, E_3$ that accomplish this reduction of A to echelon form U are (in order):

$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}$$

Hence,

$$E_3E_2E_1A = U$$

Solving for A, we get

$$A = E_1^{-1}E_2^{-1}E_3^{-1}U = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} U = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -2 & 1 \end{bmatrix} U = LU$$

Thus, A can be factored as $A = LU$, where U is an upper triangular matrix (see the exercises for Section 3.2) and L is unit lower triangular. That is, L has the form

$$L = \begin{bmatrix} 1 & 0 & 0 \\ * & 1 & 0 \\ * & * & 1 \end{bmatrix}$$

with zeros above and 1s on the main diagonal.

The LU factorization was introduced in 1948 by the great English mathematician Alan M. Turing (1912–1954) in a paper entitled "Rounding-off Errors in Matrix Processes" (Quarterly Journal of Mechanics and Applied Mathematics, 1 (1948), pp. 287–308). During World War II, Turing was instrumental in cracking the German "Enigma" code. However, he is best known for his work in mathematical logic that laid the theoretical groundwork for the development of the digital computer and the modern field of artificial intelligence. The "Turing test" that he proposed in 1950 is still used as one of the benchmarks in addressing the question of whether a computer can be considered "intelligent."

The preceding example motivates the following definition.

Definition Let A be a square matrix. A factorization of A as $A = LU$, where L is unit lower triangular and U is upper triangular, is called an LU factorization of A.

Remarks
• Observe that the matrix A in Example 3.33 had an LU factorization because no row interchanges were needed in the row reduction of A. Hence, all of the elementary matrices that arose were unit lower triangular. Thus, L was guaranteed to be unit lower triangular because inverses and products of unit lower triangular matrices are also unit lower triangular. (See Exercises 29 and 30.) If a zero had appeared in a pivot position at any step, we would have had to swap rows to get a nonzero pivot. This would have resulted in L no longer being unit lower triangular. We will comment further on this observation below. (Can you find a matrix for which row interchanges will be necessary?)
• The notion of an LU factorization can be generalized to nonsquare matrices by simply requiring U to be a matrix in row echelon form. (See Exercises 13 and 14.)
• Some books define an LU factorization of a square matrix A to be any factorization $A = LU$, where L is lower triangular and U is upper triangular.

The first remark above is essentially a proof of the following theorem.

Theorem 3.15 If A is a square matrix that can be reduced to row echelon form without using any row interchanges, then A has an LU factorization.

To see why the LU factorization is useful, consider a linear system $A\mathbf{x} = \mathbf{b}$, where the coefficient matrix has an LU factorization $A = LU$. We can rewrite the system $A\mathbf{x} = \mathbf{b}$ as $LU\mathbf{x} = \mathbf{b}$ or $L(U\mathbf{x}) = \mathbf{b}$. If we now define $\mathbf{y} = U\mathbf{x}$, then we can solve for $\mathbf{x}$ in two stages:

1. Solve $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$ by forward substitution (see Exercises 25 and 26 in Section 2.1).
2. Solve $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ by back substitution.

Each of these linear systems is straightforward to solve because the coefficient matrices L and U are both triangular. The next example illustrates the method.

Example 3.34 Use an LU factorization of $A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & -1 & 3 \\ -2 & 5 & 5 \end{bmatrix}$ to solve $A\mathbf{x} = \mathbf{b}$, where $\mathbf{b} = \begin{bmatrix} 1 \\ -4 \\ 9 \end{bmatrix}$.

Solution In Example 3.33, we found that

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ 0 & 0 & 2 \end{bmatrix} = LU$$

As outlined above, to solve $A\mathbf{x} = \mathbf{b}$ (which is the same as $L(U\mathbf{x}) = \mathbf{b}$), we first solve $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$. This is just the linear system

$$\begin{aligned} y_1 &= 1 \\ 2y_1 + y_2 &= -4 \\ -y_1 - 2y_2 + y_3 &= 9 \end{aligned}$$

Forward substitution (that is, working from top to bottom) yields

$$y_1 = 1, \qquad y_2 = -4 - 2y_1 = -6, \qquad y_3 = 9 + y_1 + 2y_2 = -2$$

Thus, $\mathbf{y} = \begin{bmatrix} 1 \\ -6 \\ -2 \end{bmatrix}$, and we now solve $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$. This linear system is

$$\begin{aligned} 2x_1 + x_2 + 3x_3 &= 1 \\ -3x_2 - 3x_3 &= -6 \\ 2x_3 &= -2 \end{aligned}$$

and back substitution quickly produces

$$x_3 = -1, \qquad -3x_2 = -6 + 3x_3 = -9 \text{ so that } x_2 = 3, \qquad 2x_1 = 1 - x_2 - 3x_3 = 1 \text{ so that } x_1 = \tfrac{1}{2}$$

Therefore, the solution to the given system $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \begin{bmatrix} \frac{1}{2} \\ 3 \\ -1 \end{bmatrix}$.

An Easy Way to Find LU Factorizations

In Example 3.33, we computed the matrix L as a product of elementary matrices. Fortunately, L can be computed directly from the row reduction process without our needing to compute elementary matrices at all. Remember that we are assuming that A can be reduced to row echelon form without using any row interchanges. If this is the case, then the entire row reduction process can be done using only elementary row operations of the form $R_i - kR_j$.
(Why do we not need to use the remaining elementary row operation, multiplying a row by a nonzero scalar?) In the operation $R_i - kR_j$, we will refer to the scalar k as the multiplier.

In Example 3.33, the elementary row operations that were used were, in order,

$$\begin{aligned} &R_2 - 2R_1 && (\text{multiplier} = 2) \\ &R_3 + R_1 = R_3 - (-1)R_1 && (\text{multiplier} = -1) \\ &R_3 + 2R_2 = R_3 - (-2)R_2 && (\text{multiplier} = -2) \end{aligned}$$

The multipliers are precisely the entries of L that are below its diagonal! Indeed,

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -2 & 1 \end{bmatrix}$$

and $L_{21} = 2$, $L_{31} = -1$, and $L_{32} = -2$. Notice that the elementary row operation $R_i - kR_j$ has its multiplier k placed in the $(i, j)$ entry of L.

Example 3.35 Find an LU factorization of

$$A = \begin{bmatrix} 3 & 1 & 3 & -4 \\ 6 & 4 & 8 & -10 \\ 3 & 2 & 5 & -1 \\ -9 & 5 & -2 & -4 \end{bmatrix}$$

Solution Reducing A to row echelon form, we have

$$A \xrightarrow{R_2 - 2R_1,\ R_3 - R_1,\ R_4 - (-3)R_1} \begin{bmatrix} 3 & 1 & 3 & -4 \\ 0 & 2 & 2 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 8 & 7 & -16 \end{bmatrix} \xrightarrow{R_3 - \frac{1}{2}R_2,\ R_4 - 4R_2} \begin{bmatrix} 3 & 1 & 3 & -4 \\ 0 & 2 & 2 & -2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & -1 & -8 \end{bmatrix} \xrightarrow{R_4 - (-1)R_3} \begin{bmatrix} 3 & 1 & 3 & -4 \\ 0 & 2 & 2 & -2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & -4 \end{bmatrix} = U$$

The first three multipliers are 2, 1, and $-3$, and these go into the subdiagonal entries of the first column of L. So, thus far,

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & * & 1 & 0 \\ -3 & * & * & 1 \end{bmatrix}$$

The next two multipliers are $\frac{1}{2}$ and 4, so we continue to fill out L:

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & \frac{1}{2} & 1 & 0 \\ -3 & 4 & * & 1 \end{bmatrix}$$

The final multiplier, $-1$, replaces the last $*$ in L to give

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & \frac{1}{2} & 1 & 0 \\ -3 & 4 & -1 & 1 \end{bmatrix}$$

Thus, an LU factorization of A is

$$A = \begin{bmatrix} 3 & 1 & 3 & -4 \\ 6 & 4 & 8 & -10 \\ 3 & 2 & 5 & -1 \\ -9 & 5 & -2 & -4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & \frac{1}{2} & 1 & 0 \\ -3 & 4 & -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 & 3 & -4 \\ 0 & 2 & 2 & -2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & -4 \end{bmatrix} = LU$$

as is easily checked.

Remarks
• In applying this method, it is important to note that the elementary row operations $R_i - kR_j$ must be performed from top to bottom within each column (using the diagonal entry as the pivot), and column by column from left to right. To illustrate what can go wrong if we do not obey these rules, consider the following row reduction:

$$A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 1 & 1 \\ 2 & 2 & 1 \end{bmatrix} \xrightarrow{R_3 - 2R_2} \begin{bmatrix} 1 & 2 & 2 \\ 1 & 1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \xrightarrow{R_2 - R_1} \begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -1 \\ 0 & 0 & -1 \end{bmatrix} = U$$

This time the multipliers would be placed in L as follows: $L_{32} = 2$, $L_{21} = 1$. We would get

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}$$

but $LU \neq A$. (Check this! Find a correct LU factorization of A.)
• An alternative way to construct L is to observe that the multipliers can be obtained directly from the matrices obtained at the intermediate steps of the row reduction process. In Example 3.33, examine the pivots and the corresponding columns of the matrices that arise in the row reduction

$$A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & -1 & 3 \\ -2 & 5 & 5 \end{bmatrix} \longrightarrow A_1 = \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ 0 & 6 & 8 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ 0 & 0 & 2 \end{bmatrix} = U$$

The first pivot is 2, which occurs in the first column of A. Dividing the entries of this column vector that are on or below the diagonal by the pivot produces

$$\frac{1}{2}\begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$$

The next pivot is $-3$, which occurs in the second column of $A_1$. Dividing the entries of this column vector that are on or below the diagonal by the pivot, we obtain

$$\frac{1}{-3}\begin{bmatrix} -3 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

The final pivot (which we did not need to use) is 2, in the third column of U. Dividing the entries of this column vector that are on or below the diagonal by the pivot, we obtain

$$\frac{1}{2}\begin{bmatrix} 2 \end{bmatrix} = \begin{bmatrix} 1 \end{bmatrix}$$

If we place the resulting three column vectors side by side in a matrix, we have

$$\begin{bmatrix} 1 & & \\ 2 & 1 & \\ -1 & -2 & 1 \end{bmatrix}$$

which is exactly L once the above-diagonal entries are filled with zeros.

In Chapter 2, we remarked that the row echelon form of a matrix is not unique. However, if an invertible matrix A has an LU factorization $A = LU$, then this factorization is unique.

Theorem 3.16 If A is an invertible matrix that has an LU factorization, then L and U are unique.

Proof Suppose $A = LU$ and $A = L_1U_1$ are two LU factorizations of A. Then $LU = L_1U_1$, where L and $L_1$ are unit lower triangular and U and $U_1$ are upper triangular. In fact, U and $U_1$ are two (possibly different) row echelon forms of A. By Exercise 30, $L_1$ is invertible. Because A is invertible, its reduced row echelon form is an identity matrix I by the Fundamental Theorem of Invertible Matrices. Hence U also row reduces to I (why?) and so U is invertible also.
Therefore,

$$L_1^{-1}(LU)U^{-1} = L_1^{-1}(L_1U_1)U^{-1} \quad \text{so} \quad (L_1^{-1}L)(UU^{-1}) = (L_1^{-1}L_1)(U_1U^{-1})$$

Hence,

$$(L_1^{-1}L)I = I(U_1U^{-1}) \quad \text{so} \quad L_1^{-1}L = U_1U^{-1}$$

But $L_1^{-1}L$ is unit lower triangular by Exercise 29, and $U_1U^{-1}$ is upper triangular. (Why?) It follows that $L_1^{-1}L = U_1U^{-1}$ is both unit lower triangular and upper triangular. The only such matrix is the identity matrix, so $L_1^{-1}L = I$ and $U_1U^{-1} = I$. It follows that $L = L_1$ and $U = U_1$, so the LU factorization of A is unique.

The $P^TLU$ Factorization

We now explore the problem of adapting the LU factorization to handle cases where row interchanges are necessary during Gaussian elimination. Consider the matrix

$$A = […]$$

A straightforward row reduction produces

$$A \longrightarrow B = […]$$

which is not an upper triangular matrix. However, we can easily convert this into upper triangular form by swapping rows 2 and 3 of B to get

$$B \xrightarrow{R_2 \leftrightarrow R_3} U = […]$$

Alternatively, we can swap rows 2 and 3 of A first. To this end, let P be the elementary matrix

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

corresponding to interchanging rows 2 and 3, and let E be the product of the elementary matrices that then reduce PA to U (so that $E^{-1} = L$ is unit lower triangular). Thus $EPA = U$, so $A = (EP)^{-1}U = P^{-1}E^{-1}U = P^{-1}LU$.

Now this handles only the case of a single row interchange. In general, P will be the product $P = P_k \cdots P_2P_1$ of all the row interchange matrices $P_1, P_2, \ldots, P_k$ (where $P_1$ is performed first, and so on). Such a matrix P is called a permutation matrix. Observe that a permutation matrix arises from permuting the rows of an identity matrix in some order. For example, the following are all permutation matrices:

$$[…]$$

Fortunately, the inverse of a permutation matrix is easy to compute; in fact, no calculations are needed at all!

Theorem 3.17 If P is a permutation matrix, then $P^{-1} = P^T$.

Proof We must show that $P^TP = I$.
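But the ith row of $P^T$ is the same as the ith column of P, and these are both equal to the same standard unit vector $\mathbf{e}$, because P is a permutation matrix. So

$$(P^TP)_{ii} = (i\text{th row of } P^T)(i\text{th column of } P) = \mathbf{e}^T\mathbf{e} = \mathbf{e} \cdot \mathbf{e} = 1$$

This shows that the diagonal entries of $P^TP$ are all 1s. On the other hand, if $j \neq i$, then the jth column of P is a different standard unit vector from $\mathbf{e}$, say $\mathbf{e}'$. Thus, a typical off-diagonal entry of $P^TP$ is given by

$$(P^TP)_{ij} = (i\text{th row of } P^T)(j\text{th column of } P) = \mathbf{e}^T\mathbf{e}' = \mathbf{e} \cdot \mathbf{e}' = 0$$

Hence $P^TP$ is an identity matrix, as we wished to show.

Thus, in general, we can factor a square matrix A as $A = P^{-1}LU = P^TLU$.

Definition Let A be a square matrix. A factorization of A as $A = P^TLU$, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular, is called a $P^TLU$ factorization of A.

Example 3.36 Find a $P^TLU$ factorization of

$$A = \begin{bmatrix} 0 & 0 & 6 \\ 1 & 2 & 3 \\ 2 & 1 & 4 \end{bmatrix}$$

Solution First we reduce A to row echelon form. Clearly, we need at least one row interchange.

$$A \xrightarrow{R_1 \leftrightarrow R_2} \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 6 \\ 2 & 1 & 4 \end{bmatrix} \xrightarrow{R_3 - 2R_1} \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 6 \\ 0 & -3 & -2 \end{bmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -2 \\ 0 & 0 & 6 \end{bmatrix}$$

We have used two row interchanges ($R_1 \leftrightarrow R_2$ and then $R_2 \leftrightarrow R_3$), so the required permutation matrix is

$$P = P_2P_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$

We now find an LU factorization of PA.

$$PA = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 6 \\ 1 & 2 & 3 \\ 2 & 1 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 0 & 0 & 6 \end{bmatrix} \xrightarrow{R_2 - 2R_1} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -2 \\ 0 & 0 & 6 \end{bmatrix} = U$$

Hence $L_{21} = 2$, and so

$$A = P^TLU = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -2 \\ 0 & 0 & 6 \end{bmatrix}$$

The discussion above justifies the following theorem.

Theorem 3.18 Every square matrix has a $P^TLU$ factorization.

Remark Even for an invertible matrix, the $P^TLU$ factorization is not unique. In Example 3.36, a single row interchange $R_1 \leftrightarrow R_3$ also would have worked, leading to a different P. However, once P has been determined, L and U are unique.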
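The elimination-with-row-interchanges procedure of Example 3.36 can be sketched in Python using exact rational arithmetic. This is only an illustration under stated assumptions. The function names ptlu, solve_ptlu, matmul, and transpose are hypothetical helpers, not part of any library. As in the hand computation, a row interchange is made only when the current pivot is zero; this is not the partial pivoting a CAS would typically use, so a CAS may return a different P, L, and U. The solver rewrites $P^TLU\mathbf{x} = \mathbf{b}$ as $LU\mathbf{x} = P\mathbf{b}$ and then applies forward and back substitution as in Example 3.34.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Return the matrix product XY (both stored as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def ptlu(A):
    """Return (P, L, U) with PA = LU, so that A = P^T L U.

    Rows are interchanged only when the current pivot is zero,
    mirroring the hand computation in Example 3.36.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))  # perm[i] = index of the row of A now in position i
    for j in range(n):
        pivot = next((r for r in range(j, n) if U[r][j] != 0), None)
        if pivot is None:
            continue  # nothing to eliminate in this column
        if pivot != j:
            # Interchange the rows of U, the multipliers already stored in L,
            # and the bookkeeping list that will become P.
            U[j], U[pivot] = U[pivot], U[j]
            L[j], L[pivot] = L[pivot], L[j]
            perm[j], perm[pivot] = perm[pivot], perm[j]
        for i in range(j + 1, n):
            k = U[i][j] / U[j][j]  # multiplier for the operation R_i - k R_j
            L[i][j] = k
            U[i] = [a - k * b for a, b in zip(U[i], U[j])]
    for i in range(n):
        L[i][i] = Fraction(1)
    P = [[1 if c == perm[r] else 0 for c in range(n)] for r in range(n)]
    return P, L, U


def solve_ptlu(P, L, U, b):
    """Solve Ax = b from A = P^T L U by solving LUx = Pb.

    Assumes U has nonzero diagonal entries (that is, A is invertible).
    """
    n = len(b)
    pb = [sum(P[i][k] * b[k] for k in range(n)) for i in range(n)]
    y = []
    for i in range(n):  # forward substitution; L has 1s on its diagonal
        y.append(pb[i] - sum(L[i][k] * y[k] for k in range(i)))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


A = [[0, 0, 6], [1, 2, 3], [2, 1, 4]]  # matrix of Example 3.36
P, L, U = ptlu(A)
print(P)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(matmul(transpose(P), P) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # Theorem 3.17
print(matmul(transpose(P), matmul(L, U)) == A)                      # A = P^T L U
```

Applied to the matrix of Example 3.36, the sketch reproduces the P, L, and U found there. Swapping the already-recorded multipliers in L together with the rows of U is what keeps $PA = LU$ valid after each interchange; when no interchange is ever needed, P is the identity and the result is an ordinary LU factorization, as in Example 3.35.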
Computational Considerations

If A is $n \times n$, then the total number of operations (multiplications and divisions) required to solve a linear system $A\mathbf{x} = \mathbf{b}$ using an LU factorization of A is $T(n) \approx n^3/3$, the same as is required for Gaussian elimination. (See the Exploration "Counting Operations," in Chapter 2.) This is hardly surprising since the forward elimination phase produces the LU factorization in $\approx n^3/3$ steps, whereas both forward and backward substitution require $\approx n^2/2$ steps. Therefore, for large values of n, the $n^3/3$ term is dominant. From this point of view, then, Gaussian elimination and the LU factorization are equivalent.

However, the LU factorization has other advantages:

• From a storage point of view, the LU factorization is very compact because we can overwrite the entries of A with the entries of L and U as they are computed. In Example 3.33, we found that

$$A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & -1 & 3 \\ -2 & 5 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -3 & -3 \\ 0 & 0 & 2 \end{bmatrix} = LU$$

This can be stored as

$$\begin{bmatrix} 2 & 1 & 3 \\ 2 & -3 & -3 \\ -1 & -2 & 2 \end{bmatrix}$$

with the entries placed in the order (1,1), (1,2), (1,3), (2,1), (3,1), (2,2), (2,3), (3,2), (3,3). In other words, the subdiagonal entries of A are replaced by the corresponding multipliers. (Check that this works!)
• Once an LU factorization of A has been computed, it can be used to solve as many linear systems of the form $A\mathbf{x} = \mathbf{b}$ as we like. We just need to apply the method of Example 3.34, varying the vector $\mathbf{b}$ each time.
• For matrices with certain special forms, especially those with a large number of zeros (so-called "sparse" matrices) concentrated off the diagonal, there are methods that will simplify the computation of an LU factorization. In these cases, this method is faster than Gaussian elimination in solving $A\mathbf{x} = \mathbf{b}$.
• For an invertible matrix A, an LU factorization of A can be used to find $A^{-1}$, if necessary. Moreover, this can be done in such a way that it simultaneously yields a factorization of $A^{-1}$.
(See Exercises 15–18.)

Remark If you have a CAS (such as MATLAB) that has the LU factorization built in, you may notice some differences between your hand calculations and the computer output. This is because most CASs will automatically try to perform partial pivoting to reduce roundoff errors. (See the Exploration "Partial Pivoting," in Chapter 2.) Turing's paper is an extended discussion of such errors in the context of matrix factorizations.

This section has served to introduce one of the most useful matrix factorizations. In subsequent chapters, we will encounter other equally useful factorizations.

Exercises 3.4

In Exercises 1–6, solve the system $A\mathbf{x} = \mathbf{b}$ using the given LU factorization of A.

1.–6. […]

In Exercises 7–12, find an LU factorization of the given matrix.

7.–12. […]

Generalize the definition of LU factorization to nonsquare matrices by simply requiring U to be a matrix in row echelon form. With this modification, find an LU factorization of the matrices in Exercises 13 and 14.

13.–14. […]

For an invertible matrix with an LU factorization $A = LU$, both L and U will be invertible and $A^{-1} = U^{-1}L^{-1}$. In Exercises 15 and 16, find $L^{-1}$, $U^{-1}$, and $A^{-1}$ for the given matrix.

15. A in Exercise 1
16. A in Exercise 4

The inverse of a matrix can also be computed by solving several systems of equations using the method of Example 3.34. For an $n \times n$ matrix A, to find its inverse we need to solve $AX = I_n$ for the $n \times n$ matrix X. Writing this equation as $A[\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_n] = [\mathbf{e}_1\ \mathbf{e}_2\ \cdots\ \mathbf{e}_n]$, using the matrix-column form of AX, we see that we need to solve n systems of linear equations: $A\mathbf{x}_1 = \mathbf{e}_1$, $A\mathbf{x}_2 = \mathbf{e}_2$, $\ldots$, $A\mathbf{x}_n = \mathbf{e}_n$. Moreover, we can use the factorization $A = LU$ to solve each one of these systems.

In Exercises 17 and 18, use the approach just outlined to find $A^{-1}$ for the given matrix. Compare with the method of Exercises 15 and 16.

17. A in Exercise 1
18. A in Exercise 4

In Exercises 19–22, write the given permutation matrix as a product of elementary (row interchange) matrices.

19.–22. […]

In Exercises 23–25, find a $P^TLU$ factorization of the given matrix A.

23.–25. […]

26. Prove that there are exactly $n!$ $n \times n$ permutation matrices.

In Exercises 27–28, solve the system $A\mathbf{x} = \mathbf{b}$ using the given factorization $A = P^TLU$. Because $PP^T = I$, $P^TLU\mathbf{x} = \mathbf{b}$ can be rewritten as $LU\mathbf{x} = P\mathbf{b}$. This system can then be solved using the method of Example 3.34.

27.–28. […]

29. Prove that a product of unit lower triangular matrices is unit lower triangular.
30. Prove that every unit lower triangular matrix is invertible and that its inverse is also unit lower triangular.

An LDU factorization of a square matrix A is a factorization $A = LDU$, where L is a unit lower triangular matrix, D is a diagonal matrix, and U is a unit upper triangular matrix (upper triangular with 1s on its diagonal). In Exercises 31 and 32, find an LDU factorization of A.

31. A in Exercise 1
32. A in Exercise 4
33. If A is symmetric and invertible and has an LDU factorization, show that $U = L^T$.
34.
If A is symmetric and invertible and $A = LDL^T$ (with L unit lower triangular and D diagonal), prove that this factorization is unique. That is, prove that if we also have $A = L_1D_1L_1^T$ (with $L_1$ unit lower triangular and $D_1$ diagonal), then $L = L_1$ and $D = D_1$.

3.5 Subspaces, Basis, Dimension, and Rank

[Figure 3.2: coordinate axes x, y, z, with the vector 2u + v]

This section introduces perhaps the most important ideas in the entire book. We have already seen that there is an interplay between geometry and algebra: We can often use geometric intuition and reasoning to obtain algebraic results, and the power of algebra will often allow us to extend our findings well beyond the geometric settings in which they first arose.

In our study of vectors, we have already encountered all of the concepts in this section informally. Here, we will start to become more formal by giving definitions for the key ideas. As you'll see, the notion of a subspace is simply an algebraic generalization of the geometric examples of lines and planes through the origin. The fundamental concept of a basis for a subspace is then derived from the idea of direction vectors for such lines and planes. The concept of a basis will allow us to give a precise definition of dimension that agrees with an intuitive, geometric idea of the term, yet is flexible enough to allow generalization to other settings.

You will also begin to see that these ideas shed more light on what you already know about matrices and the solution of systems of linear equations. In Chapter 6, we will encounter all of these fundamental ideas again, in more detail. Consider this section a "getting to know you" session.

A plane through the origin in $\mathbb{R}^3$ "looks like" a copy of $\mathbb{R}^2$. Intuitively, we would agree that they are both "two-dimensional." Pressed further, we might also say that any calculation that can be done with vectors in $\mathbb{R}^2$ can also be done in a plane through the origin.
In particular, we can add and take scalar multiples (and, more generally, form linear combinations) of vectors in such a plane, and the results are other vectors in the same plane. We say that, like $\mathbb{R}^2$, a plane through the origin is closed with respect to the operations of addition and scalar multiplication. (See Figure 3.2.)

But are the vectors in this plane two- or three-dimensional objects? We might argue that they are three-dimensional because they live in $\mathbb{R}^3$ and therefore have three components. On the other hand, they can be described as a linear combination of just two vectors (direction vectors for the plane) and so are two-dimensional objects living in a two-dimensional plane. The notion of a subspace is the key to resolving this conundrum.

Definition A subspace of $\mathbb{R}^n$ is any collection S of vectors in $\mathbb{R}^n$ such that:
1. The zero vector $\mathbf{0}$ is in S.
2. If $\mathbf{u}$ and $\mathbf{v}$ are in S, then $\mathbf{u} + \mathbf{v}$ is in S. (S is closed under addition.)
3. If $\mathbf{u}$ is in S and c is a scalar, then $c\mathbf{u}$ is in S. (S is closed under scalar multiplication.)

We could have combined properties (2) and (3) and required, equivalently, that S be closed under linear combinations: If $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are in S and $c_1, c_2, \ldots, c_k$ are scalars, then $c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_k\mathbf{u}_k$ is in S.

Example 3.37 Every line and plane through the origin in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$. It should be clear geometrically that properties (1) through (3) are satisfied. Here is an algebraic proof in the case of a plane through the origin. You are asked to give the corresponding proof for a line in Exercise 9.

Let $\mathcal{P}$ be a plane through the origin with direction vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. Hence, $\mathcal{P} = \operatorname{span}(\mathbf{v}_1, \mathbf{v}_2)$. The zero vector $\mathbf{0}$ is in $\mathcal{P}$, since $\mathbf{0} = 0\mathbf{v}_1 + 0\mathbf{v}_2$. Now let

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 \quad \text{and} \quad \mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2$$

be two vectors in $\mathcal{P}$. Then

$$\mathbf{u} + \mathbf{v} = (c_1\mathbf{v}_1 + c_2\mathbf{v}_2) + (d_1\mathbf{v}_1 + d_2\mathbf{v}_2) = (c_1 + d_1)\mathbf{v}_1 + (c_2 + d_2)\mathbf{v}_2$$

Thus, $\mathbf{u} + \mathbf{v}$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ and so is in $\mathcal{P}$. Now let c be a scalar. Then

$$c\mathbf{u} = c(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = (cc_1)\mathbf{v}_1 + (cc_2)\mathbf{v}_2$$

which shows that $c\mathbf{u}$ is also a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ and is therefore in $\mathcal{P}$.
We have shown that $\mathcal{P}$ satisfies properties (1) through (3) and hence is a subspace of $\mathbb{R}^3$.

If you look carefully at the details of Example 3.37, you will notice that the fact that $\mathbf{v}_1$ and $\mathbf{v}_2$ were vectors in $\mathbb{R}^3$ played no role at all in the verification of the properties. Thus, the algebraic method we used should generalize beyond $\mathbb{R}^3$ and apply in situations where we can no longer visualize the geometry. It does. Moreover, the method of Example 3.37 can serve as a "template" in more general settings. When we generalize Example 3.37 to the span of an arbitrary set of vectors in any $\mathbb{R}^n$, the result is important enough to be called a theorem.

Theorem 3.19 Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be vectors in $\mathbb{R}^n$. Then $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k)$ is a subspace of $\mathbb{R}^n$.

Proof Let $S = \operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k)$. To check property (1) of the definition, we simply observe that the zero vector $\mathbf{0}$ is in S, since $\mathbf{0} = 0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_k$.

Now let

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \quad \text{and} \quad \mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_k\mathbf{v}_k$$

be two vectors in S. Then

$$\mathbf{u} + \mathbf{v} = (c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k) + (d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_k\mathbf{v}_k) = (c_1 + d_1)\mathbf{v}_1 + (c_2 + d_2)\mathbf{v}_2 + \cdots + (c_k + d_k)\mathbf{v}_k$$

Thus, $\mathbf{u} + \mathbf{v}$ is a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ and so is in S. This verifies property (2).

To show property (3), let c be a scalar. Then

$$c\mathbf{u} = c(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k) = (cc_1)\mathbf{v}_1 + (cc_2)\mathbf{v}_2 + \cdots + (cc_k)\mathbf{v}_k$$

which shows that $c\mathbf{u}$ is also a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ and is therefore in S. We have shown that S satisfies properties (1) through (3) and hence is a subspace of $\mathbb{R}^n$.

We will refer to $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k)$ as the subspace spanned by $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. We will often be able to save a lot of work by recognizing when Theorem 3.19 can be applied.

Example 3.38 Show that the set of all vectors $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ that satisfy the conditions $x = 3y$ and $z = -2y$ forms a subspace of $\mathbb{R}^3$.

Solution Substituting the two conditions into $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ yields

$$\begin{bmatrix} 3y \\ y \\ -2y \end{bmatrix} = y\begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}$$

Since y is arbitrary, the given set of vectors is $\operatorname{span}\left(\begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}\right)$ and is thus a subspace of $\mathbb{R}^3$, by Theorem 3.19.

Geometrically, the set of vectors in Example 3.38 represents the line through the origin in $\mathbb{R}^3$ with direction vector $\begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}$.

Example 3.39 Determine whether the set of all vectors $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ that satisfy the conditions $x = 3y + 1$ and $z = -2y$ is a subspace of $\mathbb{R}^3$.

Solution This time, we have all vectors of the form

$$\begin{bmatrix} 3y + 1 \\ y \\ -2y \end{bmatrix}$$

The zero vector is not of this form. (Why not? Try solving $\begin{bmatrix} 3y + 1 \\ y \\ -2y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$.) Hence, property (1) does not hold, so this set cannot be a subspace of $\mathbb{R}^3$.

Example 3.40 Determine whether the set of all vectors $\begin{bmatrix} x \\ y \end{bmatrix}$, where $y = x^2$, is a subspace of $\mathbb{R}^2$.

Solution These are the vectors of the form $\begin{bmatrix} x \\ x^2 \end{bmatrix}$; call this set S. This time $\mathbf{0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ belongs to S (take $x = 0$), so property (1) holds. Let $\mathbf{u} = \begin{bmatrix} x_1 \\ x_1^2 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} x_2 \\ x_2^2 \end{bmatrix}$ be in S. Then

$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} x_1 + x_2 \\ x_1^2 + x_2^2 \end{bmatrix}$$

which, in general, is not in S, since it does not have the correct form; that is, $x_1^2 + x_2^2 \neq (x_1 + x_2)^2$. To be specific, we look for a counterexample. If

$$\mathbf{u} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{v} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$

then both $\mathbf{u}$ and $\mathbf{v}$ are in S, but their sum $\mathbf{u} + \mathbf{v} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$ is not in S since $5 \neq 3^2$. Thus, property (2) fails and S is not a subspace of $\mathbb{R}^2$.

Remark In order for a set S to be a subspace of some $\mathbb{R}^n$, we must prove that properties (1) through (3) hold in general. However, for S to fail to be a subspace of $\mathbb{R}^n$, it is enough to show that one of the three properties fails to hold. The easiest course is usually to find a single, specific counterexample to illustrate the failure of the property. Once you have done so, there is no need to consider the other properties.

Subspaces Associated with Matrices

A great many examples of subspaces arise in the context of matrices.
We have already encountered the most important of these in Chapter 2; we now revisit them with the notion of a subspace in mind.

Definition Let $A$ be an $m \times n$ matrix.
1. The row space of $A$ is the subspace $\operatorname{row}(A)$ of $\mathbb{R}^n$ spanned by the rows of $A$.
2. The column space of $A$ is the subspace $\operatorname{col}(A)$ of $\mathbb{R}^m$ spanned by the columns of $A$.

Remark Observe that, by Example 3.9 and the Remark that follows it, $\operatorname{col}(A)$ consists precisely of all vectors of the form $A\mathbf{x}$ where $\mathbf{x}$ is in $\mathbb{R}^n$.

Example 3.41 Consider the matrix
$$A = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 3 & -3 \end{bmatrix}$$
(a) Determine whether $\mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is in the column space of $A$.
(b) Determine whether $\mathbf{w} = [4 \;\; 5]$ is in the row space of $A$.
(c) Describe $\operatorname{row}(A)$ and $\operatorname{col}(A)$.

Solution (a) By Theorem 2.4 and the discussion preceding it, $\mathbf{b}$ is a linear combination of the columns of $A$ if and only if the linear system $A\mathbf{x} = \mathbf{b}$ is consistent. We row reduce the augmented matrix as follows:
$$\left[\begin{array}{rr|r} 1 & -1 & 1 \\ 0 & 1 & 2 \\ 3 & -3 & 3 \end{array}\right] \longrightarrow \left[\begin{array}{rr|r} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right]$$
Thus, the system is consistent (and, in fact, has a unique solution). Therefore, $\mathbf{b}$ is in $\operatorname{col}(A)$. (This example is just Example 2.18, phrased in the terminology of this section.)

(b) As we also saw in Section 2.3, elementary row operations simply create linear combinations of the rows of a matrix. That is, they produce vectors only in the row space of the matrix. If the vector $\mathbf{w}$ is in $\operatorname{row}(A)$, then $\mathbf{w}$ is a linear combination of the rows of $A$, so if we augment $A$ by $\mathbf{w}$ as $\begin{bmatrix} A \\ \mathbf{w} \end{bmatrix}$, it will be possible to apply elementary row operations to this augmented matrix to reduce it to the form $\begin{bmatrix} A' \\ \mathbf{0} \end{bmatrix}$ using only elementary row operations of the form $R_i + kR_j$, where $i > j$; in other words, working from top to bottom in each column. (Why?) In this example, we have
$$\begin{bmatrix} A \\ \mathbf{w} \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 3 & -3 \\ 4 & 5 \end{bmatrix} \xrightarrow[R_4 - 4R_1]{R_3 - 3R_1} \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 0 & 0 \\ 0 & 9 \end{bmatrix} \xrightarrow{R_4 - 9R_2} \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
Therefore, $\mathbf{w}$ is a linear combination of the rows of $A$ (in fact, these calculations show that $\mathbf{w} = 4[1 \;\; -1] + 9[0 \;\; 1]$; how?), and thus $\mathbf{w}$ is in $\operatorname{row}(A)$.
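As a cross-check on parts (a) and (b), both membership questions can also be phrased in terms of the rank from Chapter 2 (the number of nonzero rows in a row echelon form): $\mathbf{b}$ is in $\operatorname{col}(A)$ exactly when appending $\mathbf{b}$ as an extra column does not increase the rank (that is, $A\mathbf{x} = \mathbf{b}$ is consistent), and $\mathbf{w}$ is in $\operatorname{row}(A)$ exactly when stacking $\mathbf{w}$ under $A$ does not increase the rank. The Python sketch below is a minimal illustration under stated assumptions: matrices are stored as lists of rows, arithmetic is exact via the standard `fractions.Fraction` type, and the helper names `rank`, `in_col_space`, and `in_row_space` are hypothetical, not from any library.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by exact Gaussian elimination (rows: list of lists)."""
    m = [[Fraction(x) for x in row] for row in rows]
    if not m:
        return 0
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # look for a nonzero entry in column c at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        # create zeros below the pivot with operations R_i - k R_r
        for i in range(r + 1, len(m)):
            k = m[i][c] / m[r][c]
            m[i] = [a - k * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def in_col_space(A, b):
    """b is in col(A) iff [A | b] has the same rank as A (Ax = b is consistent)."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

def in_row_space(A, w):
    """w is in row(A) iff stacking w under A does not increase the rank."""
    return rank(A + [w]) == rank(A)

A = [[1, -1],
     [0, 1],
     [3, -3]]
print(in_col_space(A, [1, 2, 3]))  # True  (part (a))
print(in_row_space(A, [4, 5]))     # True  (part (b))
print(in_col_space(A, [1, 2, 4]))  # False
```

The last call returns `False` because $[1 \;\; 2 \;\; 4]^T$ does not satisfy $3x - z = 0$, the equation of $\operatorname{col}(A)$ found in part (c). Exact fractions are used so that a zero pivot is recognized as exactly zero rather than as a small floating-point residue.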
(c) It is easy to check that, for any vector $\mathbf{w} = [x \;\; y]$, the augmented matrix $\begin{bmatrix} A \\ \mathbf{w} \end{bmatrix}$ reduces to $\begin{bmatrix} A' \\ \mathbf{0} \end{bmatrix}$ in a similar fashion. Therefore, every vector in $\mathbb{R}^2$ is in $\operatorname{row}(A)$, and so $\operatorname{row}(A) = \mathbb{R}^2$.

Finding $\operatorname{col}(A)$ is identical to solving Example 2.21, wherein we determined that it coincides with the plane (through the origin) in $\mathbb{R}^3$ with equation $3x - z = 0$. (We will discover other ways to answer this type of question shortly.)

Remark We could also have answered part (b) and the first part of part (c) by observing that any question about the rows of $A$ is the corresponding question about the columns of $A^T$. So, for example, $\mathbf{w}$ is in $\operatorname{row}(A)$ if and only if $\mathbf{w}^T$ is in $\operatorname{col}(A^T)$. This is true if and only if the system $A^T\mathbf{x} = \mathbf{w}^T$ is consistent. We can now proceed as in part (a). (See Exercises 21–24.)

The observations we have made about the relationship between elementary row operations and the row space are summarized in the following theorem.

Theorem 3.20 Let $B$ be any matrix that is row equivalent to a matrix $A$. Then $\operatorname{row}(B) = \operatorname{row}(A)$.

Proof The matrix $A$ can be transformed into $B$ by a sequence of row operations. Consequently, the rows of $B$ are linear combinations of the rows of $A$; hence, linear combinations of the rows of $B$ are linear combinations of the rows of $A$. (See Exercise 21 in Section 2.3.) It follows that $\operatorname{row}(B) \subseteq \operatorname{row}(A)$.

On the other hand, reversing these row operations transforms $B$ into $A$. Therefore, the above argument shows that $\operatorname{row}(A) \subseteq \operatorname{row}(B)$. Combining these results, we have $\operatorname{row}(A) = \operatorname{row}(B)$.

There is another important subspace that we have already encountered: the set of solutions of a homogeneous system of linear equations. It is easy to prove that this subspace satisfies the three subspace properties.

Theorem 3.21 Let $A$ be an $m \times n$ matrix and let $N$ be the set of solutions of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$. Then $N$ is a subspace of $\mathbb{R}^n$.
Proof [Note that $\mathbf{x}$ must be a (column) vector in $\mathbb{R}^n$ in order for $A\mathbf{x}$ to be defined and that $\mathbf{0} = \mathbf{0}_m$ is the zero vector in $\mathbb{R}^m$.] Since $A\mathbf{0}_n = \mathbf{0}_m$, $\mathbf{0}_n$ is in $N$. Now let $\mathbf{u}$ and $\mathbf{v}$ be in $N$. Therefore, $A\mathbf{u} = \mathbf{0}$ and $A\mathbf{v} = \mathbf{0}$. It follows that
$$A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0} + \mathbf{0} = \mathbf{0}$$
Hence, $\mathbf{u} + \mathbf{v}$ is in $N$. Finally, for any scalar $c$,
$$A(c\mathbf{u}) = c(A\mathbf{u}) = c\mathbf{0} = \mathbf{0}$$
and therefore $c\mathbf{u}$ is also in $N$. It follows that $N$ is a subspace of $\mathbb{R}^n$.

Definition Let $A$ be an $m \times n$ matrix. The null space of $A$ is the subspace of $\mathbb{R}^n$ consisting of solutions of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$. It is denoted by $\operatorname{null}(A)$.

The fact that the null space of a matrix is a subspace allows us to prove what intuition and examples have led us to understand about the solutions of linear systems: They have either no solution, a unique solution, or infinitely many solutions.

Theorem 3.22 Let $A$ be a matrix whose entries are real numbers. For any system of linear equations $A\mathbf{x} = \mathbf{b}$, exactly one of the following is true:
a. There is no solution.
b. There is a unique solution.
c. There are infinitely many solutions.

At first glance, it is not entirely clear how we should proceed to prove this theorem. A little reflection should persuade you that what we are really being asked to prove is that if (a) and (b) are not true, then (c) is the only other possibility. That is, if there is more than one solution, then there cannot be just two or even finitely many, but there must be infinitely many.

Proof If the system $A\mathbf{x} = \mathbf{b}$ has either no solutions or exactly one solution, we are done. Assume, then, that there are at least two distinct solutions of $A\mathbf{x} = \mathbf{b}$, say, $\mathbf{x}_1$ and $\mathbf{x}_2$. Thus,
$$A\mathbf{x}_1 = \mathbf{b} \quad\text{and}\quad A\mathbf{x}_2 = \mathbf{b}$$
with $\mathbf{x}_1 \neq \mathbf{x}_2$. It follows that
$$A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}$$
Set $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2$. Then $\mathbf{x}_0 \neq \mathbf{0}$ and $A\mathbf{x}_0 = \mathbf{0}$. Hence, the null space of $A$ is nontrivial, and since $\operatorname{null}(A)$ is closed under scalar multiplication, $c\mathbf{x}_0$ is in $\operatorname{null}(A)$ for every scalar $c$.
Consequently, the null space of $A$ contains infinitely many vectors (since it contains at least every vector of the form $c\mathbf{x}_0$, and these are distinct for distinct $c$ because $\mathbf{x}_0 \neq \mathbf{0}$). Now, consider the (infinitely many) vectors of the form $\mathbf{x}_1 + c\mathbf{x}_0$, as $c$ varies through the set of real numbers. We have
$$A(\mathbf{x}_1 + c\mathbf{x}_0) = A\mathbf{x}_1 + cA\mathbf{x}_0 = \mathbf{b} + c\mathbf{0} = \mathbf{b}$$
Therefore, there are infinitely many solutions of the equation $A\mathbf{x} = \mathbf{b}$.

Basis

We can extract a bit more from the intuitive idea that subspaces are generalizations of planes through the origin in $\mathbb{R}^3$. A plane is spanned by any two vectors that are parallel to the plane but are not parallel to each other. In algebraic parlance, two such vectors span the plane and are linearly independent. Fewer than two vectors will not work; more than two vectors are not necessary. This is the essence of a basis for a subspace.

Definition A basis for a subspace $S$ of $\mathbb{R}^n$ is a set of vectors in $S$ that
1. spans $S$ and
2. is linearly independent.

Example 3.42 In Section 2.3, we saw that the standard unit vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in $\mathbb{R}^n$ are linearly independent and span $\mathbb{R}^n$. Therefore, they form a basis for $\mathbb{R}^n$, called the standard basis.

Example 3.43 In Example 2.19, we showed that $\mathbb{R}^2 = \operatorname{span}\left(\begin{bmatrix} 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}\right)$. Since $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ are also linearly independent (as they are not multiples), they form a basis for $\mathbb{R}^2$.

A subspace can (and will) have more than one basis. For example, we have just seen that $\mathbb{R}^2$ has the standard basis $\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}$ and the basis $\left\{\begin{bmatrix} 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}\right\}$. However, we will prove shortly that the number of vectors in a basis for a given subspace will always be the same.

Example 3.44 Find a basis for $S = \operatorname{span}(\mathbf{u}, \mathbf{v}, \mathbf{w})$, where
$$\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}, \quad\text{and}\quad \mathbf{w} = \begin{bmatrix} 0 \\ -5 \\ 1 \end{bmatrix}$$
Solution The vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ already span $S$, so they will be a basis for $S$ if they are also linearly independent. It is easy to determine that they are not; indeed, $\mathbf{w} = 2\mathbf{u} - 3\mathbf{v}$.
Therefore, we can ignore $\mathbf{w}$, since any linear combination involving $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ can be rewritten to involve $\mathbf{u}$ and $\mathbf{v}$ alone. (Also see Exercise 47 in Section 2.3.) This implies that $S = \operatorname{span}(\mathbf{u}, \mathbf{v}, \mathbf{w}) = \operatorname{span}(\mathbf{u}, \mathbf{v})$, and since $\mathbf{u}$ and $\mathbf{v}$ are certainly linearly independent (why?), they form a basis for $S$. (Geometrically, this means that $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ all lie in the same plane and $\mathbf{u}$ and $\mathbf{v}$ can serve as a set of direction vectors for this plane.)

Example 3.45 Find a basis for the row space of
$$A = \begin{bmatrix} 1 & 1 & 3 & 1 & 6 \\ 2 & -1 & 0 & 1 & -1 \\ -3 & 2 & 1 & -2 & 1 \\ 4 & 1 & 6 & 1 & 3 \end{bmatrix}$$
Solution The reduced row echelon form of $A$ is
$$R = \begin{bmatrix} 1 & 0 & 1 & 0 & -1 \\ 0 & 1 & 2 & 0 & 3 \\ 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
By Theorem 3.20, $\operatorname{row}(A) = \operatorname{row}(R)$, so it is enough to find a basis for the row space of $R$. But $\operatorname{row}(R)$ is clearly spanned by its nonzero rows, and it is easy to check that the staircase pattern forces the first three rows of $R$ to be linearly independent. (This is a general fact, one that you will need to establish to prove Exercise 33.) Therefore, a basis for the row space of $A$ is
$$\{[1 \;\; 0 \;\; 1 \;\; 0 \;\; -1],\; [0 \;\; 1 \;\; 2 \;\; 0 \;\; 3],\; [0 \;\; 0 \;\; 0 \;\; 1 \;\; 4]\}$$
We can use the method of Example 3.45 to find a basis for the subspace spanned by a given set of vectors.

Example 3.46 Rework Example 3.44 using the method from Example 3.45.

Solution We transpose $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ to get row vectors and then form a matrix with these vectors as its rows:
$$B = \begin{bmatrix} 3 & -1 & 5 \\ 2 & 1 & 3 \\ 0 & -5 & 1 \end{bmatrix}$$
Proceeding as in Example 3.45, we reduce $B$ to its reduced row echelon form
$$\begin{bmatrix} 1 & 0 & \frac{8}{5} \\ 0 & 1 & -\frac{1}{5} \\ 0 & 0 & 0 \end{bmatrix}$$
and use the nonzero row vectors as a basis for the row space. Since we started with column vectors, we must transpose again. Thus, a basis for $\operatorname{span}(\mathbf{u}, \mathbf{v}, \mathbf{w})$ is
$$\left\{\begin{bmatrix} 1 \\ 0 \\ \frac{8}{5} \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -\frac{1}{5} \end{bmatrix}\right\}$$
Remarks
• In fact, we do not need to go all the way to reduced row echelon form; row echelon form is far enough. If $U$ is a row echelon form of $A$, then the nonzero row vectors of $U$ will form a basis for $\operatorname{row}(A)$ (see Exercise 33). This approach has the advantage of (often) allowing us to avoid fractions.
In Example 3.46, $B$ can be reduced to
$$\begin{bmatrix} 3 & -1 & 5 \\ 0 & 5 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$
which gives us the basis
$$\left\{\begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}, \begin{bmatrix} 0 \\ 5 \\ -1 \end{bmatrix}\right\}$$
for $\operatorname{span}(\mathbf{u}, \mathbf{v}, \mathbf{w})$.
• Observe that the methods used in Example 3.44, Example 3.46, and the Remark above will generally produce different bases.

We now turn to the problem of finding a basis for the column space of a matrix $A$. One method is simply to transpose the matrix. The column vectors of $A$ become the row vectors of $A^T$, and we can apply the method of Example 3.45 to find a basis for $\operatorname{row}(A^T)$. Transposing these vectors then gives us a basis for $\operatorname{col}(A)$. (You are asked to do this in Exercises 21–24.) This approach, however, requires performing a new set of row operations on $A^T$.

Instead, we prefer to take an approach that allows us to use the row reduced form of $A$ that we have already computed. Recall that a product $A\mathbf{x}$ of a matrix and a vector corresponds to a linear combination of the columns of $A$ with the entries of $\mathbf{x}$ as coefficients. Thus, a nontrivial solution to $A\mathbf{x} = \mathbf{0}$ represents a dependence relation among the columns of $A$. Since elementary row operations do not affect the solution set, if $A$ is row equivalent to $R$, the columns of $A$ have the same dependence relationships as the columns of $R$. This important observation is the basis (no pun intended!) for the technique we now use to find a basis for $\operatorname{col}(A)$.

Example 3.47 Find a basis for the column space of the matrix from Example 3.45,
$$A = \begin{bmatrix} 1 & 1 & 3 & 1 & 6 \\ 2 & -1 & 0 & 1 & -1 \\ -3 & 2 & 1 & -2 & 1 \\ 4 & 1 & 6 & 1 & 3 \end{bmatrix}$$
Solution Let $\mathbf{a}_i$ denote a column vector of $A$ and let $\mathbf{r}_i$ denote a column vector of the reduced echelon form
$$R = \begin{bmatrix} 1 & 0 & 1 & 0 & -1 \\ 0 & 1 & 2 & 0 & 3 \\ 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
We can quickly see by inspection that $\mathbf{r}_3 = \mathbf{r}_1 + 2\mathbf{r}_2$ and $\mathbf{r}_5 = -\mathbf{r}_1 + 3\mathbf{r}_2 + 4\mathbf{r}_4$. (Check that, as predicted, the corresponding column vectors of $A$ satisfy the same dependence relations.) Thus, $\mathbf{r}_3$ and $\mathbf{r}_5$ contribute nothing to $\operatorname{col}(R)$. The remaining column vectors, $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_4$, are linearly independent, since they are just standard unit vectors.
The corresponding statements are therefore true of the column vectors of $A$. Thus, among the column vectors of $A$, we eliminate the dependent ones ($\mathbf{a}_3$ and $\mathbf{a}_5$), and the remaining ones will be linearly independent and hence form a basis for $\operatorname{col}(A)$. What is the fastest way to find this basis? Use the columns of $A$ that correspond to the columns of $R$ containing the leading 1s. A basis for $\operatorname{col}(A)$ is
$$\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_4\} = \left\{\begin{bmatrix} 1 \\ 2 \\ -3 \\ 4 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -2 \\ 1 \end{bmatrix}\right\}$$

Warning Elementary row operations change the column space! In our example, $\operatorname{col}(A) \neq \operatorname{col}(R)$, since every vector in $\operatorname{col}(R)$ has its fourth component equal to 0 but this is certainly not true of $\operatorname{col}(A)$. So we must go back to the original matrix $A$ to get the column vectors for a basis of $\operatorname{col}(A)$. To be specific, in Example 3.47, $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_4$ do not form a basis for the column space of $A$.

Example 3.48 Find a basis for the null space of matrix $A$ from Example 3.47.

Solution There is really nothing new here except the terminology. We simply have to find and describe the solutions of the homogeneous system $A\mathbf{x} = \mathbf{0}$. We have already computed the reduced row echelon form $R$ of $A$, so all that remains to be done in Gauss-Jordan elimination is to solve for the leading variables in terms of the free variables. The final augmented matrix is
$$[R \mid \mathbf{0}] = \left[\begin{array}{rrrrr|r} 1 & 0 & 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
If
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}$$
then the leading 1s are in columns 1, 2, and 4, so we solve for $x_1$, $x_2$, and $x_4$ in terms of the free variables $x_3$ and $x_5$. We get $x_1 = -x_3 + x_5$, $x_2 = -2x_3 - 3x_5$, and $x_4 = -4x_5$. Setting $x_3 = s$ and $x_5 = t$, we obtain
$$\mathbf{x} = \begin{bmatrix} -s + t \\ -2s - 3t \\ s \\ -4t \\ t \end{bmatrix} = s\begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ -3 \\ 0 \\ -4 \\ 1 \end{bmatrix} = s\mathbf{u} + t\mathbf{v}$$
Thus, $\mathbf{u}$ and $\mathbf{v}$ span $\operatorname{null}(A)$, and since they are linearly independent, they form a basis for $\operatorname{null}(A)$.

Following is a summary of the most effective procedure to use to find bases for the row space, the column space, and the null space of a matrix $A$.
1. Find the reduced row echelon form $R$ of $A$.
2. Use the nonzero row vectors of $R$ (containing the leading 1s) to form a basis for $\operatorname{row}(A)$.
3. Use the column vectors of $A$ that correspond to the columns of $R$ containing the leading 1s (the pivot columns) to form a basis for $\operatorname{col}(A)$.
4. Solve for the leading variables of $R\mathbf{x} = \mathbf{0}$ in terms of the free variables, set the free variables equal to parameters, substitute back into $\mathbf{x}$, and write the result as a linear combination of $f$ vectors (where $f$ is the number of free variables). These $f$ vectors form a basis for $\operatorname{null}(A)$.

If we do not need to find the null space, then it is faster to simply reduce $A$ to row echelon form to find bases for the row and column spaces. Steps 2 and 3 above remain valid (with the substitution of the word "pivots" for "leading 1s").

Dimension and Rank

We have observed that although a subspace will have different bases, each basis has the same number of vectors. This fundamental fact will be of vital importance from here on in this book.

Theorem 3.23 The Basis Theorem
Let $S$ be a subspace of $\mathbb{R}^n$. Then any two bases for $S$ have the same number of vectors.

Sherlock Holmes noted, "When you have eliminated the impossible, whatever remains, however improbable, must be the truth" (from The Sign of Four by Sir Arthur Conan Doyle).

Proof Let $\mathcal{B} = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ and $\mathcal{C} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s\}$ be bases for $S$. We need to prove that $r = s$. We do so by showing that neither of the other two possibilities, $r < s$ or $r > s$, can occur.

Suppose that $r < s$. We will show that this forces $\mathcal{C}$ to be a linearly dependent set of vectors. To this end, let
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_s\mathbf{v}_s = \mathbf{0} \qquad (1)$$
Since $\mathcal{B}$ is a basis for $S$, we can write each $\mathbf{v}_i$ as a linear combination of the elements $\mathbf{u}_j$:
$$\begin{aligned} \mathbf{v}_1 &= a_{11}\mathbf{u}_1 + a_{12}\mathbf{u}_2 + \cdots + a_{1r}\mathbf{u}_r \\ \mathbf{v}_2 &= a_{21}\mathbf{u}_1 + a_{22}\mathbf{u}_2 + \cdots + a_{2r}\mathbf{u}_r \\ &\;\;\vdots \\ \mathbf{v}_s &= a_{s1}\mathbf{u}_1 + a_{s2}\mathbf{u}_2 + \cdots + a_{sr}\mathbf{u}_r \end{aligned} \qquad (2)$$
Substituting the Equations (2) into Equation (1), we obtain
$$c_1(a_{11}\mathbf{u}_1 + \cdots + a_{1r}\mathbf{u}_r) + c_2(a_{21}\mathbf{u}_1 + \cdots + a_{2r}\mathbf{u}_r) + \cdots + c_s(a_{s1}\mathbf{u}_1 + \cdots + a_{sr}\mathbf{u}_r) = \mathbf{0}$$
Regrouping, we have
$$(c_1a_{11} + c_2a_{21} + \cdots + c_sa_{s1})\mathbf{u}_1 + (c_1a_{12} + c_2a_{22} + \cdots + c_sa_{s2})\mathbf{u}_2 + \cdots + (c_1a_{1r} + c_2a_{2r} + \cdots + c_sa_{sr})\mathbf{u}_r = \mathbf{0}$$
Now, since $\mathcal{B}$ is a basis, the $\mathbf{u}_j$'s are linearly independent. So each of the expressions in parentheses must be zero:
$$\begin{aligned} c_1a_{11} + c_2a_{21} + \cdots + c_sa_{s1} &= 0 \\ c_1a_{12} + c_2a_{22} + \cdots + c_sa_{s2} &= 0 \\ &\;\;\vdots \\ c_1a_{1r} + c_2a_{2r} + \cdots + c_sa_{sr} &= 0 \end{aligned}$$
This is a homogeneous system of $r$ linear equations in the $s$ variables $c_1, c_2, \ldots, c_s$. (The fact that the variables appear to the left of the coefficients makes no difference.) Since $r < s$, we know from Theorem 2.3 that there are infinitely many solutions. In particular, there is a nontrivial solution, giving a nontrivial dependence relation in Equation (1). Thus, $\mathcal{C}$ is a linearly dependent set of vectors. But this finding contradicts the fact that $\mathcal{C}$ was given to be a basis and hence linearly independent. We conclude that $r < s$ is not possible. Similarly (interchanging the roles of $\mathcal{B}$ and $\mathcal{C}$), we find that $r > s$ leads to a contradiction. Hence, we must have $r = s$, as desired.

Since all bases for a given subspace must have the same number of vectors, we can attach a name to this number.

Definition If $S$ is a subspace of $\mathbb{R}^n$, then the number of vectors in a basis for $S$ is called the dimension of $S$, denoted $\dim S$.

Remark The set $\{\mathbf{0}\}$ consisting of the zero vector alone is always a subspace of $\mathbb{R}^n$. (Why?) Yet any set containing the zero vector (and, in particular, $\{\mathbf{0}\}$) is linearly dependent, so $\{\mathbf{0}\}$ cannot have a basis. We define $\dim\{\mathbf{0}\}$ to be 0.

Example 3.49 Since the standard basis for $\mathbb{R}^n$ has $n$ vectors, $\dim \mathbb{R}^n = n$. (Note that this result agrees with our intuitive understanding of dimension for $n \le 3$.)

Example 3.50 In Examples 3.45 through 3.48, we found that $\operatorname{row}(A)$ has a basis with three vectors, $\operatorname{col}(A)$ has a basis with three vectors, and $\operatorname{null}(A)$ has a basis with two vectors. Hence, $\dim(\operatorname{row}(A)) = 3$, $\dim(\operatorname{col}(A)) = 3$, and $\dim(\operatorname{null}(A)) = 2$.

A single example is not enough on which to speculate, but the fact that the row and column spaces in Example 3.50 have the same dimension is no accident.
Nor is the fact that the sum of $\dim(\operatorname{col}(A))$ and $\dim(\operatorname{null}(A))$ is 5, the number of columns of $A$. We now prove that these relationships are true in general.

Theorem 3.24 The row and column spaces of a matrix $A$ have the same dimension.

Proof Let $R$ be the reduced row echelon form of $A$. By Theorem 3.20, $\operatorname{row}(A) = \operatorname{row}(R)$, so
$$\dim(\operatorname{row}(A)) = \dim(\operatorname{row}(R)) = \text{number of nonzero rows of } R = \text{number of leading 1s of } R$$
Let this number be called $r$.

Now $\operatorname{col}(A) \neq \operatorname{col}(R)$ in general, but the columns of $A$ and $R$ have the same dependence relationships. Therefore, $\dim(\operatorname{col}(A)) = \dim(\operatorname{col}(R))$. Since there are $r$ leading 1s, $R$ has $r$ columns that are standard unit vectors, $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_r$. (These will be vectors in $\mathbb{R}^m$ if $A$ and $R$ are $m \times n$ matrices.) These $r$ vectors are linearly independent, and the remaining columns of $R$ are linear combinations of them. Thus, $\dim(\operatorname{col}(R)) = r$. It follows that $\dim(\operatorname{row}(A)) = r = \dim(\operatorname{col}(A))$, as we wished to prove.

The rank of a matrix was first defined in 1878 by Georg Frobenius (1849–1917), although he defined it using determinants and not as we have done here. (See Chapter 4.) Frobenius was a German mathematician who received his doctorate from and later taught at the University of Berlin. Best known for his contributions to group theory, Frobenius used matrices in his work on group representations.

Definition The rank of a matrix $A$ is the dimension of its row and column spaces and is denoted by $\operatorname{rank}(A)$.

For Example 3.50, we can thus write $\operatorname{rank}(A) = 3$.

Remarks
• The preceding definition agrees with the more informal definition of rank that was introduced in Chapter 2. The advantage of our new definition is that it is much more flexible.
• The rank of a matrix simultaneously gives us information about linear dependence among the row vectors of the matrix and among its column vectors. In particular, it tells us the number of rows and columns that are linearly independent (and this number is the same in each case!).
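The four-step summary procedure given earlier (reduced row echelon form, nonzero rows, pivot columns of the original matrix, one null space vector per free variable) is mechanical enough to sketch in a few lines. The Python below is a minimal illustration, assuming exact `fractions.Fraction` arithmetic and matrices stored as lists of rows; `rref` and `fundamental_bases` are hypothetical helper names. Applied to the matrix of Examples 3.45 through 3.48, it should reproduce the bases found there, the dimensions 3, 3, and 2 of Example 3.50, and the equality of the numbers of pivots of $A$ and $A^T$ that Theorem 3.24 predicts.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the reduced row echelon form of `rows` and its pivot column indices."""
    R = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(R), len(R[0])
    pivots = []
    r = 0
    for c in range(ncols):
        if r == nrows:
            break
        p = next((i for i in range(r, nrows) if R[i][c] != 0), None)
        if p is None:
            continue                       # no leading 1 here: x_c is a free variable
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]    # scale so the leading entry is 1
        for i in range(nrows):             # clear the rest of column c (Gauss-Jordan)
            if i != r and R[i][c] != 0:
                k = R[i][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def fundamental_bases(A):
    """Bases for row(A), col(A), null(A), following the four-step summary."""
    R, pivots = rref(A)
    n = len(A[0])
    row_basis = R[:len(pivots)]                          # step 2: nonzero rows of R
    col_basis = [[row[c] for row in A] for c in pivots]  # step 3: pivot columns of A itself
    null_basis = []
    for f in (c for c in range(n) if c not in pivots):   # step 4: one vector per free variable
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                               # this free variable = 1, others = 0
        for i, p in enumerate(pivots):
            x[p] = -R[i][f]                              # solve row i of Rx = 0 for x_p
        null_basis.append(x)
    return row_basis, col_basis, null_basis

A = [[1, 1, 3, 1, 6],
     [2, -1, 0, 1, -1],
     [-3, 2, 1, -2, 1],
     [4, 1, 6, 1, 3]]
row_b, col_b, null_b = fundamental_bases(A)
print(len(row_b), len(col_b), len(null_b))   # 3 3 2
At = [list(col) for col in zip(*A)]
print(len(rref(At)[1]) == len(rref(A)[1]))   # True: rank(A^T) = rank(A)
```

Note that `col_basis` is built from the columns of the original $A$, not of $R$, in keeping with the Warning after Example 3.47; and `len(col_b) + len(null_b)` equals 5, the number of columns of $A$, as observed just above.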
Since the row vectors of $A$ are the column vectors of $A^T$, Theorem 3.24 has the following immediate corollary.

Theorem 3.25 For any matrix $A$,
$$\operatorname{rank}(A^T) = \operatorname{rank}(A)$$
Proof We have
$$\operatorname{rank}(A^T) = \dim(\operatorname{col}(A^T)) = \dim(\operatorname{row}(A)) = \operatorname{rank}(A)$$

Definition The nullity of a matrix $A$ is the dimension of its null space and is denoted by $\operatorname{nullity}(A)$.

In other words, $\operatorname{nullity}(A)$ is the dimension of the solution space of $A\mathbf{x} = \mathbf{0}$, which is the same as the number of free variables in the solution. We can now revisit the Rank Theorem (Theorem 2.2), rephrasing it in terms of our new definitions.

Theorem 3.26 The Rank Theorem
If $A$ is an $m \times n$ matrix, then
$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n$$
Proof Let $R$ be the reduced row echelon form of $A$, and suppose that $\operatorname{rank}(A) = r$. Then $R$ has $r$ leading 1s, so there are $r$ leading variables and $n - r$ free variables in the solution to $A\mathbf{x} = \mathbf{0}$. Since $\dim(\operatorname{null}(A)) = n - r$, we have
$$\operatorname{rank}(A) + \operatorname{nullity}(A) = r + (n - r) = n$$

Often, when we need to know the nullity of a matrix, we do not need to know the actual solution of $A\mathbf{x} = \mathbf{0}$. The Rank Theorem is extremely useful in such situations, as the following example illustrates.

Example 3.51 Find the nullity of each of the following matrices:
$$M = \begin{bmatrix} 2 & 3 \\ 1 & 5 \\ 4 & 7 \\ 3 & 6 \end{bmatrix} \quad\text{and}\quad N = \begin{bmatrix} 2 & 1 & -2 & -1 \\ 4 & 4 & -3 & 1 \\ 2 & 7 & 1 & 8 \end{bmatrix}$$
Solution Since the two columns of $M$ are clearly linearly independent, $\operatorname{rank}(M) = 2$. Thus, by the Rank Theorem, $\operatorname{nullity}(M) = 2 - \operatorname{rank}(M) = 2 - 2 = 0$.

There is no obvious dependence among the rows or columns of $N$, so we apply row operations to reduce it to
$$\begin{bmatrix} 2 & 1 & -2 & -1 \\ 0 & 2 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
We have reduced the matrix far enough (we do not need reduced row echelon form here, since we are not looking for a basis for the null space). We see that there are only two nonzero rows, so $\operatorname{rank}(N) = 2$. Hence, $\operatorname{nullity}(N) = 4 - \operatorname{rank}(N) = 4 - 2 = 2$.

The results of this section allow us to extend the Fundamental Theorem of Invertible Matrices (Theorem 3.12).

Theorem 3.27 The Fundamental Theorem of Invertible Matrices: Version 2
Let $A$ be an $n \times n$ matrix. The following statements are equivalent:
a. $A$ is invertible.
b. $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$ in $\mathbb{R}^n$.
c. $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
d. The reduced row echelon form of $A$ is $I_n$.
e. $A$ is a product of elementary matrices.
f. $\operatorname{rank}(A) = n$
g. $\operatorname{nullity}(A) = 0$
h. The column vectors of $A$ are linearly independent.
i. The column vectors of $A$ span $\mathbb{R}^n$.
j. The column vectors of $A$ form a basis for $\mathbb{R}^n$.
k. The row vectors of $A$ are linearly independent.
l. The row vectors of $A$ span $\mathbb{R}^n$.
m. The row vectors of $A$ form a basis for $\mathbb{R}^n$.

The nullity of a matrix was defined in 1884 by James Joseph Sylvester (1814–1897), who was interested in invariants: properties of matrices that do not change under certain types of transformations. Born in England, Sylvester became the second president of the London Mathematical Society. In 1878, while teaching at Johns Hopkins University in Baltimore, he founded the American Journal of Mathematics, the first mathematical journal in the United States.

Proof We have already established the equivalence of (a) through (e). It remains to be shown that statements (f) to (m) are equivalent to the first five statements.

(f) $\Leftrightarrow$ (g) Since $\operatorname{rank}(A) + \operatorname{nullity}(A) = n$ when $A$ is an $n \times n$ matrix, it follows from the Rank Theorem that $\operatorname{rank}(A) = n$ if and only if $\operatorname{nullity}(A) = 0$.

(f) $\Rightarrow$ (d) $\Rightarrow$ (c) $\Rightarrow$ (h) If $\operatorname{rank}(A) = n$, then the reduced row echelon form of $A$ has $n$ leading 1s and so is $I_n$. From (d) $\Rightarrow$ (c) we know that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, which implies that the column vectors of $A$ are linearly independent, since $A\mathbf{x}$ is just a linear combination of the column vectors of $A$.

(h) $\Rightarrow$ (i) If the column vectors of $A$ are linearly independent, then $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. Thus, by (c) $\Rightarrow$ (b), $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$ in $\mathbb{R}^n$.
This means that every vector $\mathbf{b}$ in $\mathbb{R}^n$ can be written as a linear combination of the column vectors of $A$, establishing (i).

(i) $\Rightarrow$ (j) If the column vectors of $A$ span $\mathbb{R}^n$, then $\operatorname{col}(A) = \mathbb{R}^n$ by definition, so $\operatorname{rank}(A) = \dim(\operatorname{col}(A)) = n$. This is (f), and we have already established that (f) $\Rightarrow$ (h). We conclude that the column vectors of $A$ are linearly independent and so form a basis for $\mathbb{R}^n$, since, by assumption, they also span $\mathbb{R}^n$.

(j) $\Rightarrow$ (f) If the column vectors of $A$ form a basis for $\mathbb{R}^n$, then, in particular, they are linearly independent. It follows that the reduced row echelon form of $A$ contains $n$ leading 1s, and thus $\operatorname{rank}(A) = n$.

The above discussion shows that (f) $\Rightarrow$ (d) $\Rightarrow$ (c) $\Rightarrow$ (h) $\Rightarrow$ (i) $\Rightarrow$ (j) $\Rightarrow$ (f) $\Leftrightarrow$ (g). Now recall that, by Theorem 3.25, $\operatorname{rank}(A^T) = \operatorname{rank}(A)$, so what we have just proved gives us the corresponding results about the column vectors of $A^T$. These are then results about the row vectors of $A$, bringing (k), (l), and (m) into the network of equivalences and completing the proof.

Theorems such as the Fundamental Theorem are not merely of theoretical interest. They are tremendous labor-saving devices as well. The Fundamental Theorem has already allowed us to cut in half the work needed to check that two square matrices are inverses. It also simplifies the task of showing that certain sets of vectors are bases for $\mathbb{R}^n$. Indeed, when we have a set of $n$ vectors in $\mathbb{R}^n$, that set will be a basis for $\mathbb{R}^n$ if either of the necessary properties of linear independence or spanning set is true. The next example shows how easy the calculations can be.

Example 3.52 Show that the three given vectors form a basis for $\mathbb{R}^3$.

Solution According to the Fundamental Theorem, the vectors will form a basis for $\mathbb{R}^3$ if and only if a matrix $A$ with these vectors as its columns (or rows) has rank 3.
We perform just enough row operations on $A$ to determine this, reducing it to a row echelon form. We see that $A$ has rank 3, so the given vectors are a basis for $\mathbb{R}^3$ by the equivalence of (f) and (j).

The next theorem is an application of both the Rank Theorem and the Fundamental Theorem. We will require this result in Chapters 5 and 7.

Theorem 3.28 Let $A$ be an $m \times n$ matrix. Then:
a. $\operatorname{rank}(A^TA) = \operatorname{rank}(A)$
b. The $n \times n$ matrix $A^TA$ is invertible if and only if $\operatorname{rank}(A) = n$.

Proof (a) Since $A^TA$ is $n \times n$, it has the same number of columns as $A$. The Rank Theorem then tells us that
$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n = \operatorname{rank}(A^TA) + \operatorname{nullity}(A^TA)$$
Hence, to show that $\operatorname{rank}(A) = \operatorname{rank}(A^TA)$, it is enough to show that $\operatorname{nullity}(A) = \operatorname{nullity}(A^TA)$. We will do so by establishing that the null spaces of $A$ and $A^TA$ are the same.

To this end, let $\mathbf{x}$ be in $\operatorname{null}(A)$ so that $A\mathbf{x} = \mathbf{0}$. Then $A^TA\mathbf{x} = A^T\mathbf{0} = \mathbf{0}$, and thus $\mathbf{x}$ is in $\operatorname{null}(A^TA)$. Conversely, let $\mathbf{x}$ be in $\operatorname{null}(A^TA)$. Then $A^TA\mathbf{x} = \mathbf{0}$, so $\mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{0} = 0$. But then
$$\|A\mathbf{x}\|^2 = (A\mathbf{x}) \cdot (A\mathbf{x}) = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = 0$$
and hence $A\mathbf{x} = \mathbf{0}$, by Theorem 1.2(d). Therefore, $\mathbf{x}$ is in $\operatorname{null}(A)$, so $\operatorname{null}(A) = \operatorname{null}(A^TA)$, as required.

(b) By the Fundamental Theorem, the $n \times n$ matrix $A^TA$ is invertible if and only if $\operatorname{rank}(A^TA) = n$. But, by (a), this is so if and only if $\operatorname{rank}(A) = n$.

Coordinates

We now return to one of the questions posed at the very beginning of this section: How should we view vectors in $\mathbb{R}^3$ that live in a plane through the origin? Are they two-dimensional or three-dimensional? The notions of basis and dimension will help clarify things.

A plane through the origin is a two-dimensional subspace of $\mathbb{R}^3$, with any set of two direction vectors serving as a basis. Basis vectors locate coordinate axes in the plane/subspace, in turn allowing us to view the plane as a "copy" of $\mathbb{R}^2$. Before we illustrate this approach, we prove a theorem guaranteeing that "coordinates" that arise in this way are unique.

Theorem 3.29 Let $S$ be a subspace of $\mathbb{R}^n$ and let $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a basis for $S$. For every vector $\mathbf{v}$ in $S$, there is exactly one way to write $\mathbf{v}$ as a linear combination of the basis vectors in $\mathcal{B}$:
$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
Proof Since $\mathcal{B}$ is a basis, it spans $S$, so $\mathbf{v}$ can be written in at least one way as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. Let one of these linear combinations be
$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
Our task is to show that this is the only way to write $\mathbf{v}$ as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$. To this end, suppose that we also have
$$\mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_k\mathbf{v}_k$$
Then
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_k\mathbf{v}_k$$
Rearranging (using properties of vector algebra), we obtain
$$(c_1 - d_1)\mathbf{v}_1 + (c_2 - d_2)\mathbf{v}_2 + \cdots + (c_k - d_k)\mathbf{v}_k = \mathbf{0}$$
Since $\mathcal{B}$ is a basis, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent. Therefore,
$$c_1 - d_1 = c_2 - d_2 = \cdots = c_k - d_k = 0$$
In other words, $c_1 = d_1, c_2 = d_2, \ldots, c_k = d_k$, and the two linear combinations are actually the same. Thus, there is exactly one way to write $\mathbf{v}$ as a linear combination of the basis vectors in $\mathcal{B}$.

Definition Let $S$ be a subspace of $\mathbb{R}^n$ and let $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a basis for $S$. Let $\mathbf{v}$ be a vector in $S$, and write $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$. Then $c_1, c_2, \ldots, c_k$ are called the coordinates of $\mathbf{v}$ with respect to $\mathcal{B}$, and the column vector
$$[\mathbf{v}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix}$$
is called the coordinate vector of $\mathbf{v}$ with respect to $\mathcal{B}$.

Example 3.53 Let $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard basis for $\mathbb{R}^3$. Find the coordinate vector of
$$\mathbf{v} = \begin{bmatrix} 2 \\ 7 \\ 4 \end{bmatrix}$$
with respect to $\mathcal{E}$.

Solution Since $\mathbf{v} = 2\mathbf{e}_1 + 7\mathbf{e}_2 + 4\mathbf{e}_3$,
$$[\mathbf{v}]_{\mathcal{E}} = \begin{bmatrix} 2 \\ 7 \\ 4 \end{bmatrix}$$
It should be clear that the coordinate vector of every (column) vector in $\mathbb{R}^n$ with respect to the standard basis is just the vector itself.

Example 3.54 In Example 3.44, we saw that $\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 0 \\ -5 \\ 1 \end{bmatrix}$ are three vectors in the same subspace (plane through the origin) $S$ of $\mathbb{R}^3$ and that $\mathcal{B} = \{\mathbf{u}, \mathbf{v}\}$ is a basis for $S$. Since $\mathbf{w} = 2\mathbf{u} - 3\mathbf{v}$, we have
$$[\mathbf{w}]_{\mathcal{B}} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$$
See Figure 3.3.
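Finding a coordinate vector amounts to solving the linear system $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{v}$, which by Theorem 3.29 has exactly one solution whenever $\mathbf{v}$ is in $S$. The Python sketch below illustrates this under stated assumptions: exact `fractions.Fraction` arithmetic, vectors as plain lists, and a hypothetical helper name `coordinates`. It row reduces the augmented matrix $[\mathbf{v}_1 \; \cdots \; \mathbf{v}_k \mid \mathbf{v}]$, returns `None` when the system is inconsistent (so $\mathbf{v}$ is not in $\operatorname{span}(\mathcal{B})$), and raises an error if the given vectors are not linearly independent. It should reproduce $[\mathbf{w}]_{\mathcal{B}}$ from Example 3.54 and $[\mathbf{v}]_{\mathcal{E}}$ from Example 3.53.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form with exact arithmetic; returns (R, pivot columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(ncols):
        if r == nrows:
            break
        p = next((i for i in range(r, nrows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(nrows):
            if i != r and R[i][c] != 0:
                k = R[i][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def coordinates(basis, v):
    """Coordinate vector [v]_B for a basis given as a list of vectors.

    Returns None if v is not in span(basis)."""
    k = len(basis)
    # augmented matrix [v_1 ... v_k | v]: one row per component
    aug = [[b[i] for b in basis] + [v[i]] for i in range(len(v))]
    R, pivots = rref(aug)
    if k in pivots:
        return None            # pivot in the last column: the system is inconsistent
    if pivots != list(range(k)):
        raise ValueError("the given vectors are not linearly independent")
    return [R[i][k] for i in range(k)]   # row i of R reads c_i = R[i][k]

u, v, w = [3, -1, 5], [2, 1, 3], [0, -5, 1]
print(coordinates([u, v], w) == [2, -3])                  # True (Example 3.54)
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(coordinates(E, [2, 7, 4]) == [2, 7, 4])             # True (Example 3.53)
print(coordinates([u, v], [0, 0, 1]))                     # None: not in the plane S
```

The independence check matters because Theorem 3.29 guarantees uniqueness only for a basis; with a dependent list such as $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ there would be infinitely many ways to write a vector of $S$.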
[Figure 3.3: The coordinates of a vector with respect to a basis ($w = 2u - 3v$)]

Exercises 3.5

In Exercises 1–4, let $S$ be the collection of vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$ that satisfy the given property. In each case, either prove that $S$ forms a subspace of $\mathbb{R}^2$ or give a counterexample to show that it does not.
1. $x = 0$
2. $x \ge 0$, $y \ge 0$
3. $y = 2x$
4. $xy \ge 0$

In Exercises 5–8, let $S$ be the collection of vectors $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in $\mathbb{R}^3$ that satisfy the given property. In each case, either prove that $S$ forms a subspace of $\mathbb{R}^3$ or give a counterexample to show that it does not.
5. $x = y = z$
6. $z = 2x$, $y = 0$
7. $x - y + z = 1$
8. $|x - y| = |y - z|$

9. Prove that every line through the origin in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$.
10. Suppose $S$ consists of all points in $\mathbb{R}^2$ that are on the $x$-axis or the $y$-axis (or both). ($S$ is called the union of the two axes.) Is $S$ a subspace of $\mathbb{R}^2$? Why or why not?

In Exercises 11 and 12, determine whether $b$ is in col($A$) and whether $w$ is in row($A$), as in Example 3.41.
11. $A = […]$, $b = […]$, $w = […]$
12. $A = […]$, $b = […]$, $w = […]$
13. In Exercise 11, determine whether $w$ is in row($A$), using the method described in the Remark following Example 3.41.
14. In Exercise 12, determine whether $w$ is in row($A$), using the method described in the Remark following Example 3.41.
15. If $A$ is the matrix in Exercise 11, is $v = […]$ in null($A$)?
16. If $A$ is the matrix in Exercise 12, is $v = […]$ in null($A$)?

In Exercises 17–20, give bases for row($A$), col($A$), and null($A$).
17. $A = […]$
18. $A = […]$
19. $A = […]$
20. $A = […]$

In Exercises 21–24, find bases for row($A$) and col($A$) in the given exercises using $A^T$.
21. Exercise 17
22. Exercise 18
23. Exercise 19
24. Exercise 20
25. Explain carefully why your answers to Exercises 17 and 21 are both correct even though there appear to be differences.
26. Explain carefully why your answers to Exercises 18 and 22 are both correct even though there appear to be differences.

In Exercises 27–30, find a basis for the span of the given vectors.
27. […]
28. […]
29. $[2\ {-3}\ 1]$, $[1\ {-1}\ 0]$, $[4\ {-4}\ 1]$
30. $[0\ 1\ {-2}\ 1]$, […], $[2\ 1\ 5\ 1]$

For Exercises 31 and 32, find bases for the spans of the vectors in the given exercises from among the vectors themselves.
31. Exercise 29
32. Exercise 30
33. Prove that if $R$ is a matrix in echelon form, then a basis for row($R$) consists of the nonzero rows of $R$.
34. Prove that if the columns of $A$ are linearly independent, then they must form a basis for col($A$).

For Exercises 35–38, give the rank and the nullity of the matrices in the given exercises.
35. Exercise 17
36. Exercise 18
37. Exercise 19
38. Exercise 20
39. If $A$ is a $3 \times 5$ matrix, explain why the columns of $A$ must be linearly dependent.
40. If $A$ is a $4 \times 2$ matrix, explain why the rows of $A$ must be linearly dependent.
41. If $A$ is a $3 \times 5$ matrix, what are the possible values of nullity($A$)?
42. If $A$ is a $4 \times 2$ matrix, what are the possible values of nullity($A$)?

In Exercises 43 and 44, find all possible values of rank($A$) as $a$ varies.
43. $A = […]$
44. $A = […]$

Answer Exercises 45–48 by considering the matrix with the given vectors as its columns.
45. Do […] form a basis for […]?
46. Do […] form a basis for […]?
47. Do […] form a basis for […]?
48. Do […] form a basis for […]?
49. Do […] form a basis for $\mathbb{Z}_2^3$?
50. Do […] form a basis for $\mathbb{Z}_3^3$?

In Exercises 51 and 52, show that $w$ is in span($\mathcal{B}$) and find the coordinate vector $[w]_{\mathcal{B}}$.

In Exercises 53–56, compute the rank and nullity of the given matrices over the indicated $\mathbb{Z}_p$.
53. […]
54. […]
55. […]
56. […]

57. If $A$ is $m \times n$, prove that every vector in null($A$) is orthogonal to every vector in row($A$).
58. If $A$ and $B$ are $n \times n$ matrices of rank $n$, prove that $AB$ has rank $n$.
59. (a) Prove that $\mathrm{rank}(AB) \le \mathrm{rank}(B)$. [Hint: Review Exercise 29 in Section 3.1.]
(b) Give an example in which $\mathrm{rank}(AB) < \mathrm{rank}(B)$.
60. (a) Prove that $\mathrm{rank}(AB) \le \mathrm{rank}(A)$. [Hint: Review Exercise 30 in Section 3.1 or use transposes and Exercise 59(a).]
(b) Give an example in which $\mathrm{rank}(AB) < \mathrm{rank}(A)$.
61. (a) Prove that if $U$ is invertible, then $\mathrm{rank}(UA) = \mathrm{rank}(A)$. [Hint: $A = U^{-1}(UA)$.]
(b) Prove that if $V$ is invertible, then $\mathrm{rank}(AV) = \mathrm{rank}(A)$.
62. Prove that an $m \times n$ matrix $A$ has rank 1 if and only if $A$ can be written as the outer product $uv^T$ of a vector $u$ in $\mathbb{R}^m$ and $v$ in $\mathbb{R}^n$.
63. If an $m \times n$ matrix $A$ has rank $r$, prove that $A$ can be written as the sum of $r$ matrices, each of which has rank 1. [Hint: Find a way to use Exercise 62.]
64. Prove that, for $m \times n$ matrices $A$ and $B$, $\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$.
65. Let $A$ be an $n \times n$ matrix such that $A^2 = O$. Prove that $\mathrm{rank}(A) \le n/2$. [Hint: Show that $\mathrm{col}(A) \subseteq \mathrm{null}(A)$ and use the Rank Theorem.]
66. Let $A$ be a skew-symmetric $n \times n$ matrix. (See page 162.)
(a) Prove that $x^TAx = 0$ for all $x$ in $\mathbb{R}^n$.
(b) Prove that $I + A$ is invertible. [Hint: Show that $\mathrm{null}(I + A) = \{0\}$.]

3.6 Introduction to Linear Transformations

In this section, we begin to explore one of the themes from the introduction to this chapter. There we saw that matrices can be used to transform vectors, acting as a type of "function" of the form $w = T(v)$, where the independent variable $v$ and the dependent variable $w$ are vectors. We will make this notion more precise now and look at several examples of such matrix transformations, leading to the concept of a linear transformation, a powerful idea that we will encounter repeatedly from here on.
We begin by recalling some of the basic concepts associated with functions. You will be familiar with most of these ideas from other courses in which you encountered functions of the form $f: \mathbb{R} \to \mathbb{R}$ [such as $f(x) = x^2$] that transform real numbers into real numbers. What is new here is that vectors are involved and we are interested only in functions that are "compatible" with the vector operations of addition and scalar multiplication.

Consider an example. Let
$$A = \begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Then
$$Av = \begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix} = w$$
This shows that $A$ transforms $v$ into $w = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}$.

We can describe this transformation more generally. The matrix equation
$$\begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix}$$
gives a formula that shows how $A$ transforms an arbitrary vector $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$ into the vector $\begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix}$ in $\mathbb{R}^3$. We denote this transformation by $T_A$ and write
$$T_A\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix}$$
(Although technically sloppy, omitting the parentheses in definitions such as this one is a common convention that saves some writing. The description of $T_A$ becomes
$$T_A\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix}$$
with this convention.)

With this example in mind, we now consider some terminology. A transformation (or mapping or function) $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a rule that assigns to each vector $v$ in $\mathbb{R}^n$ a unique vector $T(v)$ in $\mathbb{R}^m$. The domain of $T$ is $\mathbb{R}^n$, and the codomain of $T$ is $\mathbb{R}^m$. We indicate this by writing $T: \mathbb{R}^n \to \mathbb{R}^m$. For a vector $v$ in the domain of $T$, the vector $T(v)$ in the codomain is called the image of $v$ under (the action of) $T$. The set of all possible images $T(v)$ (as $v$ varies throughout the domain of $T$) is called the range of $T$.

In our example, the domain of $T_A$ is $\mathbb{R}^2$ and its codomain is $\mathbb{R}^3$, so we write $T_A: \mathbb{R}^2 \to \mathbb{R}^3$. The image of $v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ is $w = T_A(v) = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}$. What is the range of $T_A$?
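It consists of all vectors in the codomain $\mathbb{R}^3$ that are of the form
$$T_A\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix} = x\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + y\begin{bmatrix} 0 \\ -1 \\ 4 \end{bmatrix}$$
which describes the set of all linear combinations of the column vectors $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ -1 \\ 4 \end{bmatrix}$ of $A$. In other words, the range of $T_A$ is the column space of $A$! (We will have more to say about this later; for now we'll simply note it as an interesting observation.) Geometrically, this shows that the range of $T_A$ is the plane through the origin in $\mathbb{R}^3$ with direction vectors given by the column vectors of $A$. Notice that the range of $T_A$ is strictly smaller than the codomain of $T_A$.

Linear Transformations

The example $T_A$ above is a special case of a more general type of transformation called a linear transformation. We will consider the general definition in Chapter 6, but the essence of it is that these are the transformations that "preserve" the vector operations of addition and scalar multiplication.

Definition A transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is called a linear transformation if
1. $T(u + v) = T(u) + T(v)$ for all $u$ and $v$ in $\mathbb{R}^n$ and
2. $T(cv) = cT(v)$ for all $v$ in $\mathbb{R}^n$ and all scalars $c$.

Example 3.55 Consider once again the transformation $T: \mathbb{R}^2 \to \mathbb{R}^3$ defined by
$$T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix}$$
Let's check that $T$ is a linear transformation. To verify (1), we let
$$u = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$$
Then
$$T(u + v) = T\begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ 2(x_1 + x_2) - (y_1 + y_2) \\ 3(x_1 + x_2) + 4(y_1 + y_2) \end{bmatrix} = \begin{bmatrix} x_1 \\ 2x_1 - y_1 \\ 3x_1 + 4y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ 2x_2 - y_2 \\ 3x_2 + 4y_2 \end{bmatrix} = T(u) + T(v)$$
To show (2), we let $v = \begin{bmatrix} x \\ y \end{bmatrix}$ and let $c$ be a scalar. Then
$$T(cv) = T\begin{bmatrix} cx \\ cy \end{bmatrix} = \begin{bmatrix} cx \\ 2(cx) - (cy) \\ 3(cx) + 4(cy) \end{bmatrix} = \begin{bmatrix} cx \\ c(2x - y) \\ c(3x + 4y) \end{bmatrix} = c\begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix} = cT(v)$$
Thus, $T$ is a linear transformation.

Remark The definition of a linear transformation can be streamlined by combining (1) and (2) as shown below.

$T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if
$$T(c_1v_1 + c_2v_2) = c_1T(v_1) + c_2T(v_2) \quad\text{for all } v_1, v_2 \text{ in } \mathbb{R}^n \text{ and scalars } c_1, c_2$$

In Exercise 53, you will be asked to show that the statement above is equivalent to the original definition. In practice, this equivalent formulation can save some writing. Try it!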
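The running example can be checked numerically. The sketch below, in plain Python with matrices stored as lists of rows (the helper names `mat_vec`, `add`, and `scale` are hypothetical), computes the image $T_A(v)$, confirms that an image is the corresponding combination of the columns of $A$, and spot-checks conditions (1) and (2). Passing checks on sample inputs only illustrate linearity; Example 3.55 is what proves it.

```python
def mat_vec(A, x):
    """Return the matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    return [c * a for a in v]


# The matrix of the running example, so T_A[x, y] = [x, 2x - y, 3x + 4y].
A = [[1, 0], [2, -1], [3, 4]]


def T(v):
    return mat_vec(A, v)


print(T([1, -1]))   # [1, 3, -1], the image w of v = [1, -1]

# T[x, y] = x*(column 1 of A) + y*(column 2 of A), so every image lies in col(A).
x, y = 2, 3
assert T([x, y]) == add(scale(x, [1, 2, 3]), scale(y, [0, -1, 4]))

# Spot-check conditions (1) and (2) of the definition on sample inputs.
u, v, c = [3, 5], [-2, 7], 4
assert T(add(u, v)) == add(T(u), T(v))
assert T(scale(c, v)) == scale(c, T(v))
```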
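Although the linear transformation $T$ in Example 3.55 originally arose as a matrix transformation $T_A$, it is a simple matter to recover the matrix $A$ from the definition of $T$ given in the example. We observe that
$$T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 2x - y \\ 3x + 4y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
so $T = T_A$, where $A = \begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix}$. (Notice that when the variables $x$ and $y$ are lined up, the matrix $A$ is just their coefficient matrix.)

Recognizing that a transformation is a matrix transformation is important, since, as the next theorem shows, all matrix transformations are linear transformations.

Theorem 3.30 Let $A$ be an $m \times n$ matrix. Then the matrix transformation $T_A: \mathbb{R}^n \to \mathbb{R}^m$ defined by
$$T_A(x) = Ax \quad (\text{for } x \text{ in } \mathbb{R}^n)$$
is a linear transformation.

Proof Let $u$ and $v$ be vectors in $\mathbb{R}^n$ and let $c$ be a scalar. Then
$$T_A(u + v) = A(u + v) = Au + Av = T_A(u) + T_A(v)$$
and
$$T_A(cv) = A(cv) = c(Av) = cT_A(v)$$
Hence, $T_A$ is a linear transformation.

Example 3.56 Let $F: \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that sends each point to its reflection in the $x$-axis. Show that $F$ is a linear transformation.

Solution From Figure 3.4, it is clear that $F$ sends the point $(x, y)$ to the point $(x, -y)$. Thus, we may write
$$F\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$$
We could proceed to check that $F$ is linear, as in Example 3.55 (this one is even easier to check!), but it is faster to observe that
$$\begin{bmatrix} x \\ -y \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
Therefore, $F\begin{bmatrix} x \\ y \end{bmatrix} = A\begin{bmatrix} x \\ y \end{bmatrix}$, where $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, so $F$ is a matrix transformation. It now follows, by Theorem 3.30, that $F$ is a linear transformation.

[Figure 3.4: Reflection in the $x$-axis; $(1, 2)$ is sent to $(1, -2)$ and $(x, y)$ to $(x, -y)$]

Example 3.57 Let $R: \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that rotates each point $90^\circ$ counterclockwise about the origin. Show that $R$ is a linear transformation.

Solution As Figure 3.5 shows, $R$ sends the point $(x, y)$ to the point $(-y, x)$. Thus, we have
$$R\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix} = x\begin{bmatrix} 0 \\ 1 \end{bmatrix} + y\begin{bmatrix} -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
Hence, $R$ is a matrix transformation and is therefore linear.

[Figure 3.5: A $90^\circ$ rotation]

Observe that if we multiply a matrix by standard basis vectors, we obtain the columns of the matrix.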
For example,
$$\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} a \\ c \\ e \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} b \\ d \\ f \end{bmatrix}$$
We can use this observation to show that every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ arises as a matrix transformation.

Theorem 3.31 Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then $T$ is a matrix transformation. More specifically, $T = T_A$, where $A$ is the $m \times n$ matrix
$$A = [\,T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)\,]$$

Proof Let $e_1, e_2, \ldots, e_n$ be the standard basis vectors in $\mathbb{R}^n$ and let $x$ be a vector in $\mathbb{R}^n$. We can write $x = x_1e_1 + x_2e_2 + \cdots + x_ne_n$ (where the $x_i$'s are the components of $x$). We also know that $T(e_1), T(e_2), \ldots, T(e_n)$ are (column) vectors in $\mathbb{R}^m$. Let $A = [\,T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)\,]$ be the $m \times n$ matrix with these vectors as its columns. Then
$$T(x) = T(x_1e_1 + x_2e_2 + \cdots + x_ne_n) = x_1T(e_1) + x_2T(e_2) + \cdots + x_nT(e_n) = [\,T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)\,]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = Ax$$
as required.

The matrix $A$ in Theorem 3.31 is called the standard matrix of the linear transformation $T$.

Example 3.58 Show that a rotation about the origin through an angle $\theta$ defines a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$ and find its standard matrix.

Solution Let $R_\theta$ be the rotation. We will give a geometric argument to establish the fact that $R_\theta$ is linear. Let $u$ and $v$ be vectors in $\mathbb{R}^2$. If they are not parallel, then Figure 3.6(a) shows the parallelogram rule that determines $u + v$. If we now apply $R_\theta$, the entire parallelogram is rotated through the angle $\theta$, as shown in Figure 3.6(b). But the diagonal of this parallelogram must be $R_\theta(u) + R_\theta(v)$, again by the parallelogram rule. Hence, $R_\theta(u + v) = R_\theta(u) + R_\theta(v)$. (What happens if $u$ and $v$ are parallel?)

[Figure 3.6: (a) The parallelogram rule for $u + v$; (b) the parallelogram rotated through $\theta$, with diagonal $R_\theta(u) + R_\theta(v)$]

Similarly, if we apply $R_\theta$ to $v$ and $cv$, we obtain $R_\theta(v)$ and $R_\theta(cv)$, as shown in Figure 3.7. But since the rotation does not affect lengths, we must then have $R_\theta(cv) = cR_\theta(v)$, as required. (Draw diagrams for the cases $0 < c < 1$, $-1 < c < 0$, and $c < -1$.)
[Figure 3.7: Rotating $v$ and $cv$ through $\theta$]

Therefore, $R_\theta$ is a linear transformation. According to Theorem 3.31, we can find its matrix by determining its effect on the standard basis vectors $e_1$ and $e_2$ of $\mathbb{R}^2$. Now, as Figure 3.8 shows,
$$R_\theta\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

[Figure 3.8: $R_\theta(e_1) = (\cos\theta, \sin\theta)$]

We can find $R_\theta\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ similarly, but it is faster to observe that $R_\theta\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ must be perpendicular (counterclockwise) to $R_\theta\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and so, by Example 3.57,
$$R_\theta\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}$$
(Figure 3.9). Therefore, the standard matrix of $R_\theta$ is
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

[Figure 3.9: $R_\theta(e_2)$]

The result of Example 3.58 can now be used to compute the effect of any rotation. For example, suppose we wish to rotate the point $(2, -1)$ through $60^\circ$ about the origin. (The convention is that a positive angle corresponds to a counterclockwise rotation, while a negative angle is clockwise.) Since $\cos 60^\circ = 1/2$ and $\sin 60^\circ = \sqrt{3}/2$, we compute
$$R_{60}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \cos 60^\circ & -\sin 60^\circ \\ \sin 60^\circ & \cos 60^\circ \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} (2 + \sqrt{3})/2 \\ (2\sqrt{3} - 1)/2 \end{bmatrix}$$
Thus, the image of the point $(2, -1)$ under this rotation is the point $\left((2 + \sqrt{3})/2, (2\sqrt{3} - 1)/2\right) \approx (1.87, 1.23)$, as shown in Figure 3.10.

[Figure 3.10: A $60^\circ$ rotation of $(2, -1)$]

Example 3.59 (a) Show that the transformation $P: \mathbb{R}^2 \to \mathbb{R}^2$ that projects a point onto the $x$-axis is a linear transformation and find its standard matrix.
(b) More generally, if $\ell$ is a line through the origin in $\mathbb{R}^2$, show that the transformation $P_\ell: \mathbb{R}^2 \to \mathbb{R}^2$ that projects a point onto $\ell$ is a linear transformation and find its standard matrix.

Solution (a) As Figure 3.11 shows, $P$ sends the point $(x, y)$ to the point $(x, 0)$. Thus,
$$P\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
It follows that $P$ is a matrix transformation (and hence a linear transformation) with standard matrix $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.

[Figure 3.11: A projection, sending $(x, y)$ to $(x, 0)$]
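Theorem 3.31 and the rotation computation above translate directly into code. The following sketch builds a standard matrix column by column from the images $T(e_j)$, checks it on the reflection and rotation of Examples 3.56 and 3.57, and then reproduces the $60^\circ$ rotation of $(2, -1)$. It assumes plain Python lists for matrices; `standard_matrix`, `reflect_x`, `rotate_90`, and `rotation_matrix` are hypothetical helper names, and the transformation passed in is assumed to be linear (the construction does not check this).

```python
import math


def standard_matrix(T, n):
    """Return [T] = [T(e_1) | ... | T(e_n)] as a list of rows (Theorem 3.31)."""
    cols = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*cols)]   # transpose columns into rows


def reflect_x(v):          # Example 3.56: (x, y) -> (x, -y)
    x, y = v
    return [x, -y]


def rotate_90(v):          # Example 3.57: (x, y) -> (-y, x)
    x, y = v
    return [-y, x]


print(standard_matrix(reflect_x, 2))   # [[1, 0], [0, -1]]
print(standard_matrix(rotate_90, 2))   # [[0, -1], [1, 0]]


def rotation_matrix(theta_deg):
    """Standard matrix of the counterclockwise rotation R_theta (Example 3.58)."""
    t = math.radians(theta_deg)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Rotate (2, -1) through 60 degrees and compare with the exact answer.
x, y = mat_vec(rotation_matrix(60), [2, -1])
assert math.isclose(x, (2 + math.sqrt(3)) / 2)
assert math.isclose(y, (2 * math.sqrt(3) - 1) / 2)
print(round(x, 2), round(y, 2))        # 1.87 1.23
```

Floating-point trigonometry gives approximate values, which is why the check uses `math.isclose` rather than exact equality.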
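(b) Let the line $\ell$ have direction vector $d$ and let $v$ be an arbitrary vector. Then $P_\ell$ is given by $\mathrm{proj}_d(v)$, the projection of $v$ onto $d$, which you'll recall from Section 1.2 has the formula
$$\mathrm{proj}_d(v) = \left(\frac{d \cdot v}{d \cdot d}\right)d$$
Thus, to show that $P_\ell$ is linear, we proceed as follows:
$$P_\ell(u + v) = \left(\frac{d \cdot (u + v)}{d \cdot d}\right)d = \left(\frac{d \cdot u + d \cdot v}{d \cdot d}\right)d = \left(\frac{d \cdot u}{d \cdot d} + \frac{d \cdot v}{d \cdot d}\right)d = \left(\frac{d \cdot u}{d \cdot d}\right)d + \left(\frac{d \cdot v}{d \cdot d}\right)d = P_\ell(u) + P_\ell(v)$$
Similarly, $P_\ell(cv) = cP_\ell(v)$ for any scalar $c$ (Exercise 52). Hence, $P_\ell$ is a linear transformation.

To find the standard matrix of $P_\ell$, we apply Theorem 3.31. If we let $d = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$, then
$$P_\ell(e_1) = \left(\frac{d_1}{d_1^2 + d_2^2}\right)\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} d_1^2/(d_1^2 + d_2^2) \\ d_1d_2/(d_1^2 + d_2^2) \end{bmatrix}$$
and
$$P_\ell(e_2) = \left(\frac{d_2}{d_1^2 + d_2^2}\right)\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} d_1d_2/(d_1^2 + d_2^2) \\ d_2^2/(d_1^2 + d_2^2) \end{bmatrix}$$
Thus, the standard matrix of the projection is
$$A = \begin{bmatrix} d_1^2/(d_1^2 + d_2^2) & d_1d_2/(d_1^2 + d_2^2) \\ d_1d_2/(d_1^2 + d_2^2) & d_2^2/(d_1^2 + d_2^2) \end{bmatrix} = \frac{1}{d_1^2 + d_2^2}\begin{bmatrix} d_1^2 & d_1d_2 \\ d_1d_2 & d_2^2 \end{bmatrix}$$
As a check, note that in part (a) we could take $d = e_1$ as a direction vector for the $x$-axis. Therefore, $d_1 = 1$ and $d_2 = 0$, and we obtain $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, as before.

New Linear Transformations from Old

If $T: \mathbb{R}^m \to \mathbb{R}^n$ and $S: \mathbb{R}^n \to \mathbb{R}^p$ are linear transformations, then we may follow $T$ by $S$ to form the composition of the two transformations, denoted $S \circ T$. Notice that, in order for $S \circ T$ to make sense, the codomain of $T$ and the domain of $S$ must match (in this case, they are both $\mathbb{R}^n$) and the resulting composite transformation $S \circ T$ goes from the domain of $T$ to the codomain of $S$ (in this case, $S \circ T: \mathbb{R}^m \to \mathbb{R}^p$). Figure 3.12 shows schematically how this composition works. The formal definition of composition of transformations is taken directly from this figure and is the same as the corresponding definition of composition of ordinary functions:
$$(S \circ T)(v) = S(T(v))$$
Of course, we would like $S \circ T$ to be a linear transformation too, and happily we find that it is.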
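Returning to Example 3.59(b), the projection formula can be evaluated exactly with rational arithmetic. This sketch assumes a nonzero direction vector $d = [d_1, d_2]$ with integer entries; the direction $[3, 4]$ used below is a made-up example, and `projection_matrix` and `mat_mul` are hypothetical helper names. It recovers part (a) for $d = e_1$ and confirms two expected properties: $d$ itself is left fixed, and projecting twice gives the same result as projecting once.

```python
from fractions import Fraction


def projection_matrix(d1, d2):
    """Standard matrix of projection onto the line through 0 with direction [d1, d2].

    Implements (1 / (d1^2 + d2^2)) * [[d1^2, d1*d2], [d1*d2, d2^2]]; assumes d != 0.
    """
    s = Fraction(d1 * d1 + d2 * d2)
    return [[d1 * d1 / s, d1 * d2 / s],
            [d1 * d2 / s, d2 * d2 / s]]


def mat_mul(A, B):
    """Product AB of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Part (a): d = e_1 recovers the projection onto the x-axis.
print(projection_matrix(1, 0) == [[1, 0], [0, 0]])   # True

# A hypothetical direction vector d = [3, 4].
P = projection_matrix(3, 4)
print(P[0])   # [Fraction(9, 25), Fraction(12, 25)]

# d is left fixed, and projecting twice changes nothing (P times P equals P).
assert [sum(a * b for a, b in zip(row, [3, 4])) for row in P] == [3, 4]
assert mat_mul(P, P) == P
```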
We can demonstrate this by showing that $S \circ T$ satisfies the definition of a linear transformation (which we will do in Chapter 6), but, since for the time being we are assuming that linear transformations and matrix transformations are the same thing, it is enough to show that $S \circ T$ is a matrix transformation. We will use the notation $[T]$ for the standard matrix of a linear transformation $T$.

[Figure 3.12: The composition of transformations: $T$ sends $v$ in $\mathbb{R}^m$ to $T(v)$ in $\mathbb{R}^n$, and $S$ sends $T(v)$ to $S(T(v)) = (S \circ T)(v)$ in $\mathbb{R}^p$]

Theorem 3.32 Let $T: \mathbb{R}^m \to \mathbb{R}^n$ and $S: \mathbb{R}^n \to \mathbb{R}^p$ be linear transformations. Then $S \circ T: \mathbb{R}^m \to \mathbb{R}^p$ is a linear transformation. Moreover, their standard matrices are related by
$$[S \circ T] = [S][T]$$

Proof Let $[S] = A$ and $[T] = B$. (Notice that $A$ is $p \times n$ and $B$ is $n \times m$.) If $v$ is a vector in $\mathbb{R}^m$, then we simply compute
$$(S \circ T)(v) = S(T(v)) = S(Bv) = A(Bv) = (AB)v$$
(Notice here that the dimensions of $A$ and $B$ guarantee that the product $AB$ makes sense.) Thus, we see that the effect of $S \circ T$ is to multiply vectors by $AB$, from which it follows immediately that $S \circ T$ is a matrix (hence, linear) transformation with $[S \circ T] = [S][T]$.

Isn't this a great result? Say it in words: "The matrix of the composite is the product of the matrices." What a lovely formula!

Example 3.60 Consider the linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^3$ from Example 3.55, defined by
$$T\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 2x_1 - x_2 \\ 3x_1 + 4x_2 \end{bmatrix}$$
and the linear transformation $S: \mathbb{R}^3 \to \mathbb{R}^4$ defined by
$$S\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 2y_1 + y_3 \\ 3y_2 - y_3 \\ y_1 - y_2 \\ y_1 + y_2 + y_3 \end{bmatrix}$$

Solution We see that the standard matrices are
$$[S] = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 3 & -1 \\ 1 & -1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad [T] = \begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix}$$
so Theorem 3.32 gives
$$[S \circ T] = [S][T] = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 3 & -1 \\ 1 & -1 & 0 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 4 \\ 3 & -7 \\ -1 & 1 \\ 6 & 3 \end{bmatrix}$$
It follows that
$$(S \circ T)\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 & 4 \\ 3 & -7 \\ -1 & 1 \\ 6 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5x_1 + 4x_2 \\ 3x_1 - 7x_2 \\ -x_1 + x_2 \\ 6x_1 + 3x_2 \end{bmatrix}$$
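The product $[S][T]$ in Example 3.60 can be reproduced with a short matrix-multiplication routine. This minimal sketch stores matrices as Python lists of rows (`mat_mul` and `mat_vec` are hypothetical helper names). It also compares applying $T$ and then $S$ to one sample vector against multiplying that vector by $[S][T]$, which illustrates Theorem 3.32 on a single input without proving it.

```python
def mat_mul(A, B):
    """Product AB of a (p x n) and an (n x m) matrix given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("the codomain of T must match the domain of S")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


S = [[2, 0, 1], [0, 3, -1], [1, -1, 0], [1, 1, 1]]   # [S], a 4 x 3 matrix
T = [[1, 0], [2, -1], [3, 4]]                          # [T], a 3 x 2 matrix
ST = mat_mul(S, T)
print(ST)   # [[5, 4], [3, -7], [-1, 1], [6, 3]]

# Applying T and then S to a sample vector agrees with multiplying by [S][T].
v = [2, -5]
assert mat_vec(S, mat_vec(T, v)) == mat_vec(ST, v)
```

The dimension check in `mat_mul` mirrors the requirement that the codomain of $T$ match the domain of $S$.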
(In Exercise 29, you will be asked to check this result by setting
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = T\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 2x_1 - x_2 \\ 3x_1 + 4x_2 \end{bmatrix}$$
and substituting these values into the definition of $S$, thereby calculating $(S \circ T)\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ directly.)

Example 3.61 Find the standard matrix of the transformation that first rotates a point $90^\circ$ counterclockwise about the origin and then reflects the result in the $x$-axis.

Solution The rotation $R$ and the reflection $F$ were discussed in Examples 3.57 and 3.56, respectively, where we found their standard matrices to be
$$[R] = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad [F] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$
It follows that the composition $F \circ R$ has for its matrix
$$[F \circ R] = [F][R] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$$
(Check that this result is correct by considering the effect of $F \circ R$ on the standard basis vectors $e_1$ and $e_2$. Note the importance of the order of the transformations: $R$ is performed before $F$, but we write $F \circ R$. In this case, $R \circ F$ also makes sense. Is $R \circ F = F \circ R$?)

Inverses of Linear Transformations

Consider the effect of a $90^\circ$ counterclockwise rotation about the origin followed by a $90^\circ$ clockwise rotation about the origin. Clearly this leaves every point in $\mathbb{R}^2$ unchanged. If we denote these transformations by $R_{90}$ and $R_{-90}$ (remember that a negative angle measure corresponds to clockwise direction), then we may express this as $(R_{90} \circ R_{-90})(v) = v$ for every $v$ in $\mathbb{R}^2$. Note that, in this case, if we perform the transformations in the other order, we get the same end result: $(R_{-90} \circ R_{90})(v) = v$ for every $v$ in $\mathbb{R}^2$.

Thus, $R_{90} \circ R_{-90}$ (and $R_{-90} \circ R_{90}$ too) is a linear transformation that leaves every vector in $\mathbb{R}^2$ unchanged. Such a transformation is called an identity transformation. Generally, we have one such transformation for every $\mathbb{R}^n$, namely, $I: \mathbb{R}^n \to \mathbb{R}^n$ such that $I(v) = v$ for every $v$ in $\mathbb{R}^n$. (If it is important to keep track of the dimension of the space, we might write $I_n$ for clarity.)
So, with this notation, we have $R_{90} \circ R_{-90} = I = R_{-90} \circ R_{90}$. A pair of transformations that are related to each other in this way are called inverse transformations.

Definition Let $S$ and $T$ be linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^n$. Then $S$ and $T$ are inverse transformations if $S \circ T = I_n$ and $T \circ S = I_n$.

Remark Since this definition is symmetric with respect to $S$ and $T$, we will say that, when this situation occurs, $S$ is the inverse of $T$ and $T$ is the inverse of $S$. Furthermore, we will say that $S$ and $T$ are invertible.

In terms of matrices, we see immediately that if $S$ and $T$ are inverse transformations, then $[S][T] = [S \circ T] = [I] = I$, where the last $I$ is the identity matrix. (Why is the standard matrix of the identity transformation the identity matrix?) We must also have $[T][S] = [T \circ S] = [I] = I$. This shows that $[S]$ and $[T]$ are inverse matrices. It shows something more: If a linear transformation $T$ is invertible, then its standard matrix $[T]$ must be invertible, and since matrix inverses are unique, this means that the inverse of $T$ is also unique. Therefore, we can unambiguously use the notation $T^{-1}$ to refer to the inverse of $T$. Thus, we can rewrite the above equations as $[T][T^{-1}] = I = [T^{-1}][T]$, showing that the matrix of $T^{-1}$ is the inverse matrix of $[T]$. We have just proved the following theorem.

Theorem 3.33 Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation. Then its standard matrix $[T]$ is an invertible matrix, and
$$[T^{-1}] = [T]^{-1}$$

Remark Say this one in words too: "The matrix of the inverse is the inverse of the matrix." Fabulous!

Example 3.62 Find the standard matrix of a $60^\circ$ clockwise rotation about the origin in $\mathbb{R}^2$.

Solution Earlier we computed the matrix of a $60^\circ$ counterclockwise rotation about the origin to be
$$[R_{60}] = \begin{bmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{bmatrix}$$
Since a $60^\circ$ clockwise rotation is the inverse of a $60^\circ$ counterclockwise rotation, we can apply Theorem 3.33 to obtain
$$[R_{-60}] = [(R_{60})^{-1}] = \begin{bmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{bmatrix}^{-1} = \begin{bmatrix} 1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix}$$
(Check the calculation of the matrix inverse. The fastest way is to use the $2 \times 2$ shortcut from Theorem 3.8. Also, check that the resulting matrix has the right effect on the standard basis in $\mathbb{R}^2$ by drawing a diagram.)

Example 3.63 Determine whether projection onto the $x$-axis is an invertible transformation, and if it is, find its inverse.

Solution The standard matrix of this projection $P$ is $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, which is not invertible since its determinant is 0. Hence, $P$ is not invertible either.

Remark Figure 3.13 gives some idea why $P$ in Example 3.63 is not invertible. The projection "collapses" $\mathbb{R}^2$ onto the $x$-axis. For $P$ to be invertible, we would have to have a way of "undoing" it, to recover the point $(a, b)$ we started with. However, there are infinitely many candidates for the image of $(a, 0)$ under such a hypothetical "inverse." Which one should we use? We cannot simply say that $P^{-1}$ must send $(a, 0)$ to $(a, b)$, since this cannot be a definition when we have no way of knowing what $b$ should be. (See Exercise 42.)

[Figure 3.13: Projections are not invertible; $(a, b)$, $(a, b')$, and $(a, b'')$ all project to $(a, 0)$]

Associativity

Theorem 3.3(a) in Section 3.2 stated the associativity property for matrix multiplication: $A(BC) = (AB)C$. (If you didn't try to prove it then, do so now. Even with all matrices restricted to $2 \times 2$, you will get some feeling for the notational complexity involved in an "elementwise" proof, which should make you appreciate the proof we are about to give.)
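Examples 3.62 and 3.63, and the associativity identity $A(BC) = (AB)C$ just recalled, can each be spot-checked numerically. The sketch below implements the $2 \times 2$ inverse shortcut mentioned in Example 3.62; the helper names are hypothetical, and the sample integer matrices used for the associativity check are made up. A passing check on particular matrices is an illustration, not a substitute for the proof that follows.

```python
import math


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] by the shortcut (1/(ad - bc)) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is 0, so the matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def rotation_matrix(theta_deg):
    t = math.radians(theta_deg)
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def close(A, B):
    """Entrywise comparison with a small tolerance for floating-point error."""
    return all(math.isclose(x, y, abs_tol=1e-12)
               for ra, rb in zip(A, B) for x, y in zip(ra, rb))


# Example 3.62: [R_-60] is the inverse of [R_60].
assert close(inverse_2x2(rotation_matrix(60)), rotation_matrix(-60))

# Example 3.63: the projection onto the x-axis has determinant 0.
try:
    inverse_2x2([[1, 0], [0, 0]])
except ValueError as err:
    print(err)   # determinant is 0, so the matrix is not invertible

# Associativity A(BC) = (AB)C on sample integer matrices.
A, B, C = [[1, 2], [3, 4]], [[0, 1], [-1, 2]], [[5, -1], [2, 3]]
assert mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C)
```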
Our approach to the proof is via linear transformations. We have seen that every $m \times n$ matrix $A$ gives rise to a linear transformation $T_A: \mathbb{R}^n \to \mathbb{R}^m$; conversely, every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ has a corresponding $m \times n$ matrix $[T]$. The two correspondences are inversely related; that is, given $A$, $[T_A] = A$, and given $T$, $T_{[T]} = T$. Let $R = T_A$, $S = T_B$, and $T = T_C$. Then, by Theorem 3.32,
$$A(BC) = (AB)C \quad\text{if and only if}\quad R \circ (S \circ T) = (R \circ S) \circ T$$
We now prove the latter identity. Let $x$ be in the domain of $T$ [and hence in the domain of both $R \circ (S \circ T)$ and $(R \circ S) \circ T$; why?]. To prove that $R \circ (S \circ T) = (R \circ S) \circ T$, it is enough to prove that they have the same effect on $x$. By repeated application of the definition of composition, we have
$$(R \circ (S \circ T))(x) = R((S \circ T)(x)) = R(S(T(x))) = (R \circ S)(T(x)) = ((R \circ S) \circ T)(x)$$
as required. (Carefully check how the definition of composition has been used four times.)

This section has served as an introduction to linear transformations. In Chapter 6, we will take a more detailed and more general look at these transformations. The exercises that follow also contain some additional explorations of this important concept.

Exercises 3.6

1. Let $T_A: \mathbb{R}^2 \to \mathbb{R}^2$ be the matrix transformation corresponding to $A = […]$. Find $T_A(u)$ and $T_A(v)$, where $u = […]$ and $v = […]$.
2. Let $T_A: \mathbb{R}^2 \to \mathbb{R}^3$ be the matrix transformation corresponding to $A = […]$. Find $T_A(u)$ and $T_A(v)$, where $u = […]$ and $v = […]$.

In Exercises 3–6, prove that the given transformation is a linear transformation, using the definition (or the Remark following Example 3.55).

In Exercises 7–10, give a counterexample to show that the given transformation is not a linear transformation.
7. $T\begin{bmatrix} x \\ y \end{bmatrix} = […]$
8. $T\begin{bmatrix} x \\ y \end{bmatrix} = […]$
9. $T\begin{bmatrix} x \\ y \end{bmatrix} = […]$
10. $T\begin{bmatrix} x \\ y \end{bmatrix} = […]$

In Exercises 11–14, find the standard matrix of the linear transformation in the given exercise.
11. Exercise 3
12. Exercise 4
13. Exercise 5
14. Exercise 6

In Exercises 15–18, show that the given transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$ is linear by showing that it is a matrix transformation.
15. $F$ reflects a vector in the $y$-axis.
16. $R$ rotates a vector $45^\circ$ counterclockwise about the origin.
17. $D$ stretches a vector by a factor of 2 in the $x$-component and a factor of 3 in the $y$-component.
18. $P$ projects a vector onto the line $y = x$.
19. The three types of elementary matrices give rise to five types of $2 \times 2$ matrices with one of the following forms:
$$\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}, \qquad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$$
Each of these elementary matrices corresponds to a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$. Draw pictures to illustrate the effect of each one on the unit square with vertices at $(0, 0)$, $(1, 0)$, $(0, 1)$, and $(1, 1)$.

In Exercises 20–25, find the standard matrix of the given linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$.
20. Counterclockwise rotation through $120^\circ$ about the origin
21. Clockwise rotation through $30^\circ$ about the origin
22. Projection onto the line $y = 2x$
23. Projection onto the line $y = -x$
24. Reflection in the line $y = x$
25. Reflection in the line $y = -x$
26. Let $\ell$ be a line through the origin in $\mathbb{R}^2$, $P_\ell$ the linear transformation that projects a vector onto $\ell$, and $F_\ell$ the transformation that reflects a vector in $\ell$.
(a) Draw diagrams to show that $F_\ell$ is linear.
(b) Figure 3.14 suggests a way to find the matrix of $F_\ell$, using the fact that the diagonals of a parallelogram bisect each other. Prove that $F_\ell(x) = 2P_\ell(x) - x$, and use this result to show that the standard matrix of $F_\ell$ is
$$\frac{1}{d_1^2 + d_2^2}\begin{bmatrix} d_1^2 - d_2^2 & 2d_1d_2 \\ 2d_1d_2 & -d_1^2 + d_2^2 \end{bmatrix}$$
(where the direction vector of $\ell$ is $d = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$).
(c) If the angle between $\ell$ and the positive $x$-axis is $\theta$, show that the matrix of $F_\ell$ is
$$\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$$

[Figure 3.14: $x$, $P_\ell(x)$, and $F_\ell(x)$ as vertices and diagonal of a parallelogram]

In Exercises 27 and 28, apply part (b) or (c) of Exercise 26 to find the standard matrix of the transformation.
27. Reflection in the line $y = 2x$
28. Reflection in the line $y = \sqrt{3}x$
29. Check the formula for $S \circ T$ in Example 3.60, by performing the suggested direct substitution.

In Exercises 30–35, verify Theorem 3.32 by finding the matrix of $S \circ T$ (a) by direct substitution and (b) by matrix multiplication of $[S][T]$.

In Exercises 36–39, find the standard matrix of the composite transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$.
36. Counterclockwise rotation through $60^\circ$, followed by reflection in the line $y = x$
37. Reflection in the $y$-axis, followed by clockwise rotation through $30^\circ$
38. Clockwise rotation through $45^\circ$, followed by projection onto the $y$-axis, followed by clockwise rotation through $45^\circ$
39. Reflection in the line $y = x$, followed by counterclockwise rotation through $30^\circ$, followed by reflection in the line $y = -x$

In Exercises 40–43, use matrices to prove the given statements about transformations from $\mathbb{R}^2$ to $\mathbb{R}^2$.
40. If $R_\theta$ denotes a rotation (about the origin) through the angle $\theta$, then $R_\alpha \circ R_\beta = R_{\alpha + \beta}$.
41. If $\theta$ is the angle between lines $\ell$ and $m$ (through the origin), then $F_m \circ F_\ell = R_{\pm 2\theta}$. (See Exercise 26.)
42. (a) If $P$ is a projection, then $P \circ P = P$.
(b) The matrix of a projection can never be invertible.
43. If $\ell$, $m$, and $n$ are three lines through the origin, then $F_n \circ F_m \circ F_\ell$ is also a reflection in a line through the origin.
44. Let $T$ be a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$ (or from $\mathbb{R}^3$ to $\mathbb{R}^3$). Prove that $T$ maps a straight line to a straight line or a point. [Hint: Use the vector form of the equation of a line.]
45. Let $T$ be a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$ (or from $\mathbb{R}^3$ to $\mathbb{R}^3$). Prove that $T$ maps parallel lines to parallel lines, a single line, a pair of points, or a single point.

In Exercises 46–51, let $ABCD$ be the square with vertices $(-1, 1)$, $(1, 1)$, $(1, -1)$, and $(-1, -1)$. Use the results in Exercises 44 and 45 to find and draw the image of $ABCD$ under the given transformation.
46. $T$ in Exercise 3
47. $D$ in Exercise 17
48. $P$ in Exercise 18
49. The projection in Exercise 22
50. $T$ in Exercise 31
51. The transformation in Exercise 37
52. Prove that $P_\ell(cv) = cP_\ell(v)$ for any scalar $c$ [Example 3.59(b)].
53. Prove that $T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if and only if
$$T(c_1v_1 + c_2v_2) = c_1T(v_1) + c_2T(v_2)$$
for all $v_1, v_2$ in $\mathbb{R}^n$ and scalars $c_1, c_2$.
54. Prove that (as noted at the beginning of this section) the range of a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is the column space of its matrix $[T]$.
55. If $A$ is an invertible $2 \times 2$ matrix, what does the Fundamental Theorem of Invertible Matrices assert about the corresponding linear transformation $T_A$ in light of Exercise 19?

Vignette: Robotics

In 1981, the U.S. Space Shuttle Columbia blasted off equipped with a device called the Shuttle Remote Manipulator System (SRMS). This robotic arm, known as Canadarm, has proved to be a vital tool in all subsequent space shuttle missions, providing strong, yet precise and delicate handling of its payloads (see Figure 3.15).

Canadarm has been used to place satellites into their proper orbit and to retrieve malfunctioning ones for repair, and it has also performed critical repairs to the shuttle itself. Notably, the robotic arm was instrumental in the successful repair of the Hubble Space Telescope. Since 1998, Canadarm has played an important role in the assembly and operation of the International Space Station.

[Figure 3.15: Canadarm]

A robotic arm consists of a series of links of fixed length connected at joints where they can rotate.
Each link can therefore rotate in space, or (through the effect of the other links) be translated parallel to itself, or move by a combination (composition) of rotations and translations. Before we can design a mathematical model for a robotic arm, we need to understand how rotations and translations work in composition. To simplify matters, we will assume that our arm is in $\mathbb{R}^2$.

In Section 3.6, we saw that a rotation $R$ about the origin through an angle $\theta$ is a linear transformation with matrix $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ (Figure 3.16(a)). If $v = \begin{bmatrix} a \\ b \end{bmatrix}$, then a translation along $v$ is the transformation
$$T(x) = x + v \quad\text{or, equivalently,}\quad T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + a \\ y + b \end{bmatrix}$$
(Figure 3.16(b)).

[Figure 3.16: (a) Rotation, $x \mapsto R(x)$; (b) Translation, $x \mapsto T(x) = x + v$]

Unfortunately, translation is not a linear transformation, because $T(0) \ne 0$. However, there is a trick that will get us around this problem. We can represent the vector $x = \begin{bmatrix} x \\ y \end{bmatrix}$ as the vector $\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$ in $\mathbb{R}^3$. This is called representing $x$ in homogeneous coordinates. Then the matrix multiplication
$$\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + a \\ y + b \\ 1 \end{bmatrix}$$
represents the translated vector $T(x)$ in homogeneous coordinates.

We can treat rotations in homogeneous coordinates too. The matrix multiplication
$$\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \\ 1 \end{bmatrix}$$
represents the rotated vector $R(x)$ in homogeneous coordinates. The composition $T \circ R$ that gives the rotation $R$ followed by the translation $T$ is now represented by the product
$$\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & a \\ \sin\theta & \cos\theta & b \\ 0 & 0 & 1 \end{bmatrix}$$
(Note that $R \circ T \ne T \circ R$.)

To model a robotic arm, we give each link its own coordinate system (called a frame) and examine how one link moves in relation to those to which it is directly connected. To be specific, we let the coordinate axes for the link $A_i$ be $x_i$ and $y_i$, with the $x_i$-axis aligned with the link. The length of $A_i$ is denoted by $a_i$, and the angle between $x_i$ and $x_{i-1}$ is denoted by $\theta_i$.
The joint between $A_i$ and $A_{i-1}$ is at the point $(0, 0)$ relative to $A_i$ and at $(a_{i-1}, 0)$ relative to $A_{i-1}$. Hence, relative to $A_{i-1}$, the coordinate system for $A_i$ has been rotated through $\theta_i$ and then translated along $\begin{bmatrix} a_{i-1} \\ 0 \end{bmatrix}$ (Figure 3.17). This transformation is represented in homogeneous coordinates by the matrix
$$T_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i & a_{i-1} \\ \sin\theta_i & \cos\theta_i & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

[Figure 3.17]

To give a specific example, consider Figure 3.18(a). It shows an arm with three links. $A_1$ is in its initial position, and each of the other two links has been rotated $45^\circ$ from the previous link. We will take the length of each link to be 2 units. Figure 3.18(b) shows $A_3$ in its initial frame. The transformation
$$T_3 = \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ & 2 \\ \sin 45^\circ & \cos 45^\circ & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 2 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
causes a rotation of $45^\circ$ and then a translation by 2 units. As shown in Figure 3.18(c), this places $A_3$ in its appropriate position relative to $A_2$'s frame. Next, the transformation
$$T_2 = \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ & 2 \\ \sin 45^\circ & \cos 45^\circ & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 2 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
is applied to the previous result. This places both $A_3$ and $A_2$ in their correct position relative to $A_1$, as shown in Figure 3.18(d). Normally, a third transformation $T_1$ (a rotation) would be applied to the previous result. In our case, however, $T_1$ is the identity transformation because $A_1$ stays in its initial position.

[Figure 3.18: (a) A three-link chain; (b) $A_3$ in its initial frame; (c) $T_3$ puts $A_3$ in $A_2$'s initial frame; (d) $T_2T_3$ puts $A_3$ in $A_1$'s initial frame]

Typically, we want to know the coordinates of the end (the "hand") of the robotic arm, given the length and angle parameters. This is known as forward kinematics. Following the above sequence of calculations and referring to Figure 3.18, we need to determine where the point $(2, 0)$ ends up after $T_3$ and $T_2$ are applied. Thus, the arm's hand is at
$$T_2T_3\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 2 + \sqrt{2} \\ 1 & 0 & \sqrt{2} \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 + \sqrt{2} \\ 2 + \sqrt{2} \\ 1 \end{bmatrix}$$
This represents the point $(2 + \sqrt{2}, 2 + \sqrt{2})$ in homogeneous coordinates. It is easily checked from Figure 3.18(a) that this is correct.
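The homogeneous-coordinate bookkeeping above can be checked numerically. The following is a minimal Python sketch using only the standard library. The helper names `matmul`, `rotation`, `translation`, `link_transform` and `apply` are our own and do not come from the text. The sketch builds $T_i$ from a joint angle and the previous link length, composes $T_2T_3$, and applies it to $(2, 0)$. It also shows that a rotation and a translation generally do not commute.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rotation(theta):
    """Rotation about the origin through theta radians, in homogeneous form."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translation(a, b):
    """Translation along the vector (a, b), in homogeneous form."""
    return [[1.0, 0.0, a], [0.0, 1.0, b], [0.0, 0.0, 1.0]]

def link_transform(theta_i, a_prev):
    """T_i: rotate through theta_i, then translate along (a_{i-1}, 0)."""
    return matmul(translation(a_prev, 0.0), rotation(theta_i))

def apply(M, point):
    """Apply a 3x3 homogeneous matrix to the 2D point (x, y)."""
    column = matmul(M, [[point[0]], [point[1]], [1.0]])
    return (column[0][0], column[1][0])

# Three links of length 2; A_2 and A_3 are each rotated 45 degrees from the
# previous link, and T_1 is the identity because A_1 does not move.
T3 = link_transform(math.radians(45), 2.0)
T2 = link_transform(math.radians(45), 2.0)
hand = apply(matmul(T2, T3), (2.0, 0.0))   # end of A_3 is (2, 0) in its own frame

# Composition order matters: rotate-then-translate differs from the reverse.
R = rotation(math.radians(90))
T = translation(1.0, 0.0)
rotate_then_translate = apply(matmul(T, R), (1.0, 0.0))
translate_then_rotate = apply(matmul(R, T), (1.0, 0.0))

if __name__ == "__main__":
    print("hand position:", hand)
    print("T o R:", rotate_then_translate, " R o T:", translate_then_rotate)
```

Running it prints a hand position of approximately (3.414, 3.414), that is, $(2 + \sqrt{2}, 2 + \sqrt{2})$. For the two composition orders, it sends $(1, 0)$ to $(1, 1)$ and $(0, 2)$ respectively, up to floating-point rounding.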
The methods used in this example generalize to robotic arms in three dimensions, although in $\mathbb{R}^3$ there are more degrees of freedom and hence more variables. The method of homogeneous coordinates is also useful in other applications, notably computer graphics.

3.7 Applications

Markov Chains

A market research team is conducting a controlled survey to determine people's preferences in toothpaste. The sample consists of 200 people, each of whom is asked to try two brands of toothpaste over a period of several months. Based on the responses to the survey, the research team compiles the following statistics about toothpaste preferences.

Of those using Brand A in any month, 70% continue to use it the following month, while 30% switch to Brand B. Of those using Brand B in any month, 80% continue to use it the following month, while 20% switch to Brand A. These findings are summarized in Figure 3.19, in which the percentages have been converted into decimals; we will think of them as probabilities.

[Figure 3.19: Transition diagram. A to A: 0.70; A to B: 0.30; B to B: 0.80; B to A: 0.20]

[Margin note: Andrei A. Markov (1856-1922) was a Russian mathematician who studied and later taught at the University of St. Petersburg. He was interested in number theory, analysis, and the theory of continued fractions, a recently developed field that Markov applied to probability theory. Markov was also interested in poetry, and one of the uses to which he put Markov chains was the analysis of patterns in poems and other literary texts.]

Figure 3.19 is a simple example of a (finite) Markov chain. It represents an evolving process consisting of a finite number of states. At each step or point in time, the process may be in any one of the states. At the next step, the process can remain in its present state or switch to one of the other states. The state to which the process moves at the next step, and the probability of its doing so, depend only on the present state and not on the past history of the process. These probabilities are called transition probabilities and are assumed to be constants (that is, the probability of moving from state $i$ to state $j$ is always the same).

Example 3.64

In the toothpaste survey described above, there are just two states (using Brand A and using Brand B), and the transition probabilities are those indicated in Figure 3.19. Suppose that, when the survey begins, 120 people are using Brand A and 80 people are using Brand B. How many people will be using each brand 1 month later? 2 months later?

Solution The number of Brand A users after 1 month will be 70% of those initially using Brand A (those who remain loyal to Brand A) plus 20% of the Brand B users (those who switch from B to A):
$$0.70(120) + 0.20(80) = 100$$
Similarly, the number of Brand B users after 1 month will be a combination of those who switch to Brand B and those who continue to use it:
$$0.30(120) + 0.80(80) = 100$$
We can summarize these two equations in a single matrix equation:
$$\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 120 \\ 80 \end{bmatrix} = \begin{bmatrix} 100 \\ 100 \end{bmatrix}$$
Let's call the matrix $P$ and label the vectors $\mathbf{x}_0 = \begin{bmatrix} 120 \\ 80 \end{bmatrix}$ and $\mathbf{x}_1 = \begin{bmatrix} 100 \\ 100 \end{bmatrix}$. (Note that the components of each vector are the numbers of Brand A and Brand B users, in that order, after the number of months indicated by the subscript.) Thus, we have $\mathbf{x}_1 = P\mathbf{x}_0$.

Extending the notation, let $\mathbf{x}_k$ be the vector whose components record the distribution of toothpaste users after $k$ months. To determine the number of users of each brand after 2 months have elapsed, we simply apply the same reasoning, starting with $\mathbf{x}_1$ instead of $\mathbf{x}_0$. We obtain
$$\mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 100 \\ 100 \end{bmatrix} = \begin{bmatrix} 90 \\ 110 \end{bmatrix}$$
from which we see that there are now 90 Brand A users and 110 Brand B users.
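The update $\mathbf{x}_{k+1} = P\mathbf{x}_k$ is a single matrix-vector product, so the two months of Example 3.64 can be reproduced in a few lines. The following is a minimal Python sketch that uses exact fractions, so the user counts come out as whole numbers. The function name `step` is our own choice.

```python
from fractions import Fraction as F

# Transition matrix of Example 3.64: column j holds the probabilities of
# moving from state j (0 = Brand A, 1 = Brand B) to each next state.
P = [[F(70, 100), F(20, 100)],
     [F(30, 100), F(80, 100)]]

def step(P, x):
    """Return P x for a square matrix P (list of rows) and a vector x."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]

x0 = [F(120), F(80)]   # 120 Brand A users, 80 Brand B users
x1 = step(P, x0)
x2 = step(P, x1)

if __name__ == "__main__":
    print("after 1 month:", [int(v) for v in x1])
    print("after 2 months:", [int(v) for v in x2])
```

It prints `[100, 100]` and `[90, 110]`, matching the hand calculation.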
The vectors $\mathbf{x}_k$ in Example 3.64 are called the state vectors of the Markov chain, and the matrix $P$ is called its transition matrix. We have just seen that a Markov chain satisfies the relation
$$\mathbf{x}_{k+1} = P\mathbf{x}_k \quad\text{for } k = 0, 1, 2, \ldots$$
It follows that we can compute an arbitrary state vector iteratively once we know $\mathbf{x}_0$ and $P$. In other words, a Markov chain is completely determined by its transition probabilities and its initial state.

Remarks

• Suppose, in Example 3.64, we wanted to keep track not of the actual numbers of toothpaste users but, rather, of the relative numbers using each brand. We could convert the data into percentages or fractions by dividing by 200, the total number of users. Thus, we would start with
$$\mathbf{x}_0 = \begin{bmatrix} 120/200 \\ 80/200 \end{bmatrix} = \begin{bmatrix} 0.60 \\ 0.40 \end{bmatrix}$$
to reflect the fact that, initially, the Brand A-Brand B split is 60%-40%. Check by direct calculation that $P\mathbf{x}_0 = \begin{bmatrix} 0.50 \\ 0.50 \end{bmatrix}$, which can then be taken as $\mathbf{x}_1$ (in agreement with the 50-50 split we computed above). Vectors such as these, with nonnegative components that add up to 1, are called probability vectors.

• Observe how the transition probabilities are arranged within the transition matrix $P$. We can think of the columns as being labeled with the present states and the rows as being labeled with the next states:
$$\begin{array}{c|cc} \text{Next} \backslash \text{Present} & A & B \\ \hline A & 0.70 & 0.20 \\ B & 0.30 & 0.80 \end{array}$$
Note also that the columns of $P$ are probability vectors. Any square matrix with this property is called a stochastic matrix.

[Margin note: The word stochastic comes from the Greek adjective stokhastikos, meaning "capable of aiming" (or guessing). It has come to be applied to anything that is governed by the laws of probability, in the sense that probability makes predictions about the likelihood of things happening. In probability theory, "stochastic processes" form a generalization of Markov chains.]

We can realize the deterministic nature of Markov chains in another way.
Note that we can write
$$\mathbf{x}_2 = P\mathbf{x}_1 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0$$
and, in general,
$$\mathbf{x}_k = P^k\mathbf{x}_0 \quad\text{for } k = 0, 1, 2, \ldots$$
This leads us to examine the powers of a transition matrix. In Example 3.64, we have
$$P^2 = \begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix} = \begin{bmatrix} 0.55 & 0.30 \\ 0.45 & 0.70 \end{bmatrix}$$

[Figure 3.20: Tree diagram for two transitions starting from Brand A: A then A then A (0.49); A then A then B (0.21*); A then B then A (0.06); A then B then B (0.24*)]

What are we to make of the entries of this matrix? The first thing to observe is that $P^2$ is another stochastic matrix, since its columns sum to 1. (You are asked to prove this in Exercise 14.) Could it be that $P^2$ is also a transition matrix of some kind? Consider one of its entries, say, $(P^2)_{21} = 0.45$. The tree diagram in Figure 3.20 clarifies where this entry came from. There are four possible state changes that can occur over 2 months. These correspond to the four branches (or paths) of length 2 in the tree.

Someone who initially is using Brand A can end up using Brand B 2 months later in two different ways (marked * in the figure). The person can continue to use A after 1 month and then switch to B (with probability $0.7(0.3) = 0.21$). Alternatively, the person can switch to B after 1 month and then stay with B (with probability $0.3(0.8) = 0.24$). The sum of these probabilities gives an overall probability of 0.45. Observe that these calculations are exactly what we do when we compute $(P^2)_{21}$:
$$(P^2)_{21} = p_{21}p_{11} + p_{22}p_{21} = 0.30(0.70) + 0.80(0.30) = 0.45$$
It follows that $(P^2)_{21} = 0.45$ represents the probability of moving from state 1 (Brand A) to state 2 (Brand B) in two transitions. (Note that the order of the subscripts is the reverse of what you might have guessed.) The argument can be generalized to show that

$(P^k)_{ij}$ is the probability of moving from state $j$ to state $i$ in $k$ transitions.

In Example 3.64, what will happen to the distribution of toothpaste users in the long run? Let's work with probability vectors as state vectors.
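Continuing our calculations (rounding to three decimal places), we find
$$\mathbf{x}_0 = \begin{bmatrix} 0.60 \\ 0.40 \end{bmatrix}, \quad \mathbf{x}_1 = \begin{bmatrix} 0.50 \\ 0.50 \end{bmatrix}, \quad \mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 0.50 \\ 0.50 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix},$$
$$\mathbf{x}_3 = P\mathbf{x}_2 = \begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.425 \\ 0.575 \end{bmatrix}, \quad \mathbf{x}_4 = \begin{bmatrix} 0.412 \\ 0.588 \end{bmatrix}, \quad \mathbf{x}_5 = \begin{bmatrix} 0.406 \\ 0.594 \end{bmatrix},$$
$$\mathbf{x}_6 = \begin{bmatrix} 0.403 \\ 0.597 \end{bmatrix}, \quad \mathbf{x}_7 = \begin{bmatrix} 0.402 \\ 0.598 \end{bmatrix}, \quad \mathbf{x}_8 = \begin{bmatrix} 0.401 \\ 0.599 \end{bmatrix}, \quad \mathbf{x}_9 = \begin{bmatrix} 0.400 \\ 0.600 \end{bmatrix}, \quad \mathbf{x}_{10} = \begin{bmatrix} 0.400 \\ 0.600 \end{bmatrix}$$
and so on. It appears that the state vectors approach (or converge to) the vector $\begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$. This implies that eventually 40% of the toothpaste users in the survey will be using Brand A and 60% will be using Brand B. Indeed, it is easy to check that, once this distribution is reached, it will never change. We simply compute
$$\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$$

A state vector $\mathbf{x}$ with the property that $P\mathbf{x} = \mathbf{x}$ is called a steady state vector. In Chapter 4 we will study steady state vectors further. Every Markov chain has at least one, and for chains like this one (whose transition matrix has all positive entries) it is unique. For now, let's accept this and see how we can find such a vector without doing any iterations at all.

We begin by rewriting the matrix equation $P\mathbf{x} = \mathbf{x}$ as $P\mathbf{x} = I\mathbf{x}$, which can in turn be rewritten as $(I - P)\mathbf{x} = \mathbf{0}$. This is just a homogeneous system of linear equations with coefficient matrix $I - P$, so the augmented matrix is $[I - P \mid \mathbf{0}]$. In Example 3.64, we have
$$[I - P \mid \mathbf{0}] = \left[\begin{array}{cc|c} 1 - 0.70 & -0.20 & 0 \\ -0.30 & 1 - 0.80 & 0 \end{array}\right] = \left[\begin{array}{cc|c} 0.30 & -0.20 & 0 \\ -0.30 & 0.20 & 0 \end{array}\right]$$
which reduces to
$$\left[\begin{array}{cc|c} 1 & -2/3 & 0 \\ 0 & 0 & 0 \end{array}\right]$$
So, if our steady state vector is $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, then $x_2$ is a free variable and the parametric solution is
$$x_1 = \tfrac{2}{3}t, \quad x_2 = t$$
If we require $\mathbf{x}$ to be a probability vector, then we must have
$$1 = x_1 + x_2 = \tfrac{2}{3}t + t = \tfrac{5}{3}t$$
Therefore, $x_2 = t = \tfrac{3}{5} = 0.6$ and $x_1 = \tfrac{2}{5} = 0.4$, so $\mathbf{x} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$, in agreement with our iterative calculations above. (If we require $\mathbf{x}$ to contain the actual distribution, then in this example we must have $x_1 + x_2 = 200$, from which it follows that $\mathbf{x} = \begin{bmatrix} 80 \\ 120 \end{bmatrix}$.)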
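Both routes to the steady state (iterating, or solving $(I - P)\mathbf{x} = \mathbf{0}$ directly) can be sketched in Python with exact fractions. The helper names `is_stochastic`, `matmul`, `step`, `iterate` and `steady_state` are our own. The solver uses one particular technique: it replaces one redundant equation of $(I - P)\mathbf{x} = \mathbf{0}$ by $x_1 + \cdots + x_n = 1$ and then applies Gauss-Jordan elimination. This assumes the steady state is unique, as it is for the toothpaste chain.

```python
from fractions import Fraction as F

# Transition matrix of Example 3.64 (states: Brand A, Brand B).
P = [[F(7, 10), F(2, 10)],
     [F(3, 10), F(8, 10)]]

def is_stochastic(M):
    """Square, nonnegative, and every column sums to 1."""
    n = len(M)
    return (all(len(row) == n for row in M)
            and all(v >= 0 for row in M for v in row)
            and all(sum(M[i][j] for i in range(n)) == 1 for j in range(n)))

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def step(P, x):
    """One transition of the chain: return P x."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]

def iterate(P, x0, k):
    """Return [x_0, x_1, ..., x_k], where x_{m+1} = P x_m."""
    states = [list(x0)]
    for _ in range(k):
        states.append(step(P, states[-1]))
    return states

def steady_state(P):
    """Probability vector x with P x = x, by Gauss-Jordan elimination.

    The rows of I - P add up to the zero row, so one equation is redundant;
    it is replaced by x_1 + ... + x_n = 1.  Assumes the steady state is
    unique; otherwise the pivot search fails with StopIteration.
    """
    n = len(P)
    M = [[(1 if i == j else 0) - P[i][j] for j in range(n)] + [F(0)]
         for i in range(n)]
    M[-1] = [F(1)] * n + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

P2 = matmul(P, P)
states = iterate(P, [F(6, 10), F(4, 10)], 10)
x_star = steady_state(P)

if __name__ == "__main__":
    print("P stochastic:", is_stochastic(P), " P^2 stochastic:", is_stochastic(P2))
    print("(P^2)_21 =", P2[1][0])
    print("x_10 ~", [round(float(v), 3) for v in states[10]])
    print("steady state:", x_star)
```

The same `steady_state` function applies unchanged to chains with more states, such as the three-compartment chain in Example 3.65.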
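Example 3.65

A psychologist places a rat in a cage with three compartments, as shown in Figure 3.21. The rat has been trained to select a door at random whenever a bell is rung and to move through it into the next compartment.

(a) If the rat is initially in compartment 1, what is the probability that it will be in compartment 2 after the bell has rung twice? three times?
(b) In the long run, what proportion of its time will the rat spend in each compartment?

[Figure 3.21: A cage with three compartments]

Solution Let $P = [p_{ij}]$ be the transition matrix for this Markov chain. Then
$$p_{21} = p_{31} = \tfrac{1}{2}, \quad p_{12} = p_{13} = \tfrac{1}{3}, \quad p_{32} = p_{23} = \tfrac{2}{3}, \quad\text{and}\quad p_{11} = p_{22} = p_{33} = 0$$
(Why? Remember that $p_{ij}$ is the probability of moving from $j$ to $i$.) Therefore,
$$P = \begin{bmatrix} 0 & 1/3 & 1/3 \\ 1/2 & 0 & 2/3 \\ 1/2 & 2/3 & 0 \end{bmatrix}$$
and the initial state vector is
$$\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$
(a) After one ring of the bell, we have
$$\mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} 0 & 1/3 & 1/3 \\ 1/2 & 0 & 2/3 \\ 1/2 & 2/3 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1/2 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0.5 \\ 0.5 \end{bmatrix}$$
Continuing (rounding to three decimal places), we find
$$\mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} \approx \begin{bmatrix} 0.333 \\ 0.333 \\ 0.333 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_3 = P\mathbf{x}_2 = \begin{bmatrix} 2/9 \\ 7/18 \\ 7/18 \end{bmatrix} \approx \begin{bmatrix} 0.222 \\ 0.389 \\ 0.389 \end{bmatrix}$$
Therefore, after two rings, the probability that the rat is in compartment 2 is $\tfrac{1}{3} \approx 0.333$. After three rings, the probability that the rat is in compartment 2 is $\tfrac{7}{18} \approx 0.389$. [Note that these questions could also be answered by computing $(P^2)_{21}$ and $(P^3)_{21}$.]

(b) This question is asking for the steady state vector $\mathbf{x}$ as a probability vector. As we saw above, $\mathbf{x}$ must be in the null space of $I - P$, so we proceed to solve the system
$$[I - P \mid \mathbf{0}] = \left[\begin{array}{ccc|c} 1 & -1/3 & -1/3 & 0 \\ -1/2 & 1 & -2/3 & 0 \\ -1/2 & -2/3 & 1 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & -2/3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
Hence, if $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, then $x_3 = t$ is free, $x_1 = \tfrac{2}{3}t$ and $x_2 = t$. Since $\mathbf{x}$ must be a probability vector, we need $1 = x_1 + x_2 + x_3 = \tfrac{8}{3}t$. Thus, $t = \tfrac{3}{8}$ and
$$\mathbf{x} = \begin{bmatrix} 1/4 \\ 3/8 \\ 3/8 \end{bmatrix}$$
This tells us that, in the long run, the rat spends $\tfrac{1}{4}$ of its time in compartment 1 and $\tfrac{3}{8}$ of its time in each of the other two compartments.

Linear Economic Models

We now revisit the economic models that we first encountered in Section 2.4 and recast these models in terms of matrices. Example 2.33 illustrated the Leontief closed model.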
The system of equations we needed to solve there can be written in matrix form as the equation $E\mathbf{x} = \mathbf{x}$. Here $E$ is the $3 \times 3$ matrix of coefficients of that system and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$. The matrix $E$ is called an exchange matrix, and the vector $\mathbf{x}$ is called a price vector. In general, if $E = [e_{ij}]$, then $e_{ij}$ represents the fraction (or percentage) of industry $j$'s output that is consumed by industry $i$, and $x_i$ is the price charged by industry $i$ for its output.

In a closed economy, the sum of each column of $E$ is 1. Since the entries of $E$ are also nonnegative, $E$ is a stochastic matrix. The problem of finding a solution to the equation
$$E\mathbf{x} = \mathbf{x} \tag{1}$$
is therefore precisely the same as the problem of finding the steady state vector of a Markov chain! Thus, to find a price vector $\mathbf{x}$ that satisfies $E\mathbf{x} = \mathbf{x}$, we solve the equivalent homogeneous equation $(I - E)\mathbf{x} = \mathbf{0}$. There will always be infinitely many solutions. We seek a solution where the prices are all nonnegative and at least one price is positive.

The Leontief open model is more interesting. In Example 2.34, we needed to solve the system
$$\begin{aligned} x_1 &= 0.2x_1 + 0.5x_2 + 0.1x_3 + 10 \\ x_2 &= 0.4x_1 + 0.2x_2 + 0.2x_3 + 10 \\ x_3 &= 0.1x_1 + 0.3x_2 + 0.3x_3 + 30 \end{aligned}$$
In matrix form, we have
$$\mathbf{x} = C\mathbf{x} + \mathbf{d} \quad\text{or}\quad (I - C)\mathbf{x} = \mathbf{d} \tag{2}$$
where
$$C = \begin{bmatrix} 0.2 & 0.5 & 0.1 \\ 0.4 & 0.2 & 0.2 \\ 0.1 & 0.3 & 0.3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} 10 \\ 10 \\ 30 \end{bmatrix}$$
The matrix $C$ is called the consumption matrix, $\mathbf{x}$ is the production vector, and $\mathbf{d}$ is the demand vector. In general, let $C = [c_{ij}]$, $\mathbf{x} = [x_i]$ and $\mathbf{d} = [d_i]$. Then $c_{ij}$ represents the dollar value of industry $i$'s output that is needed to produce one dollar's worth of industry $j$'s output. The entry $x_i$ is the dollar value (price) of industry $i$'s output, and $d_i$ is the dollar value of the external demand for industry $i$'s output. Once again, we are interested in finding a production vector $\mathbf{x}$ with nonnegative entries such that at least one entry is positive. We call such a vector $\mathbf{x}$ a feasible solution.
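Equation (2) is an ordinary linear system. For the open model, we can therefore solve $(I - C)\mathbf{x} = \mathbf{d}$ and then check whether the solution is feasible by inspecting the signs of its entries. The following is a minimal Python sketch in exact arithmetic, using the consumption matrix and demand vector of Example 2.34. The helper names `solve` and `is_feasible` are our own, and `solve` assumes that the coefficient matrix is invertible.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [list(map(F, row)) + [F(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def is_feasible(x):
    """Nonnegative entries with at least one positive entry."""
    return all(v >= 0 for v in x) and any(v > 0 for v in x)

# Consumption matrix and demand vector from Example 2.34.
C = [[F("0.2"), F("0.5"), F("0.1")],
     [F("0.4"), F("0.2"), F("0.2")],
     [F("0.1"), F("0.3"), F("0.3")]]
d = [10, 10, 30]

I_minus_C = [[(1 if i == j else 0) - C[i][j] for j in range(3)] for i in range(3)]
x = solve(I_minus_C, d)

if __name__ == "__main__":
    print("production vector:", [round(float(v), 2) for v in x])
    print("feasible:", is_feasible(x))
```

For these data it prints `[61.74, 63.04, 78.7]` and reports the solution as feasible.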
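Example 3.66

Determine whether there is a solution to the Leontief open model determined by the following consumption matrices:
$$\text{(a)}\ C = \begin{bmatrix} 1/4 & 1/3 \\ 1/2 & 1/3 \end{bmatrix} \qquad \text{(b)}\ C = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 2/3 \end{bmatrix}$$

Solution (a) We have
$$I - C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1/4 & 1/3 \\ 1/2 & 1/3 \end{bmatrix} = \begin{bmatrix} 3/4 & -1/3 \\ -1/2 & 2/3 \end{bmatrix}$$
so the equation $(I - C)\mathbf{x} = \mathbf{d}$ becomes
$$\begin{bmatrix} 3/4 & -1/3 \\ -1/2 & 2/3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$$
In practice, we would row reduce the corresponding augmented matrix to determine a solution. In this case, however, it is instructive to notice that the coefficient matrix $I - C$ is invertible and then to apply Theorem 3.7. We compute
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3/4 & -1/3 \\ -1/2 & 2/3 \end{bmatrix}^{-1}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3/2 & 9/4 \end{bmatrix}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$$
Since $d_1, d_2 \geq 0$ and all entries of $(I - C)^{-1}$ are nonnegative, $x_1$ and $x_2$ are nonnegative too. Thus, we can find a feasible solution for any nonzero demand vector.

(b) In this case,
$$I - C = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/3 \end{bmatrix} \quad\text{so that}\quad (I - C)^{-1} = \begin{bmatrix} -4 & -6 \\ -6 & -6 \end{bmatrix}$$
and
$$\mathbf{x} = (I - C)^{-1}\mathbf{d} = \begin{bmatrix} -4 & -6 \\ -6 & -6 \end{bmatrix}\mathbf{d}$$
Since all entries of $(I - C)^{-1}$ are negative, this will not produce a feasible solution for any nonzero demand vector $\mathbf{d} \geq \mathbf{0}$.

Motivated by Example 3.66, we have the following definition. (For two $m \times n$ matrices $A = [a_{ij}]$ and $B = [b_{ij}]$, we will write $A \geq B$ if $a_{ij} \geq b_{ij}$ for all $i$ and $j$. Similarly, we may define $A > B$, $A \leq B$, and so on. A matrix $A$ is called nonnegative if $A \geq O$ and positive if $A > O$.)

Definition A consumption matrix $C$ is called productive if $I - C$ is invertible and $(I - C)^{-1} \geq O$.

We now give three results that provide criteria for a consumption matrix to be productive.

Theorem 3.34 Let $C$ be a consumption matrix. Then $C$ is productive if and only if there exists a production vector $\mathbf{x} \geq \mathbf{0}$ such that $\mathbf{x} > C\mathbf{x}$.

Proof Assume that $C$ is productive. Then $I - C$ is invertible and $(I - C)^{-1} \geq O$. Let
$$\mathbf{j} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$
Then $\mathbf{x} = (I - C)^{-1}\mathbf{j} \geq \mathbf{0}$ and $(I - C)\mathbf{x} = \mathbf{j} > \mathbf{0}$. Thus, $\mathbf{x} - C\mathbf{x} > \mathbf{0}$ or, equivalently, $\mathbf{x} > C\mathbf{x}$.

Conversely, assume that there exists a vector $\mathbf{x} \geq \mathbf{0}$ such that $\mathbf{x} > C\mathbf{x}$. Since $C \geq O$ and $C \neq O$, we have $\mathbf{x} > \mathbf{0}$ by Exercise 35.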
Furthermore, there must exist a real number $\lambda$ with $0 < \lambda < 1$ such that $C\mathbf{x} < \lambda\mathbf{x}$. (Since $\mathbf{x} > \mathbf{0}$ and $C\mathbf{x} < \mathbf{x}$, any $\lambda$ with $0 < \lambda < 1$ that exceeds every ratio $(C\mathbf{x})_i / x_i$ will do.) But then
$$C^2\mathbf{x} = C(C\mathbf{x}) \leq C(\lambda\mathbf{x}) = \lambda(C\mathbf{x}) < \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$
By induction, it can be shown that $\mathbf{0} \leq C^n\mathbf{x} < \lambda^n\mathbf{x}$ for all $n \geq 1$. (Write out the details of this induction proof.) Since $0 < \lambda < 1$, $\lambda^n$ approaches 0 as $n$ gets large. Therefore, as $n \to \infty$, $\lambda^n\mathbf{x} \to \mathbf{0}$ and hence $C^n\mathbf{x} \to \mathbf{0}$. Since $\mathbf{x} > \mathbf{0}$, we must have $C^n \to O$ as $n \to \infty$.

Now consider the matrix equation
$$(I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n$$
As $n \to \infty$, $C^n \to O$, so we have
$$(I - C)(I + C + C^2 + \cdots) = I - O = I$$
Therefore, $I - C$ is invertible, with its inverse given by the infinite matrix series $I + C + C^2 + \cdots$. Since all the terms in this series are nonnegative, we also have
$$(I - C)^{-1} = I + C + C^2 + \cdots \geq O$$
Hence, $C$ is productive.

Remarks

• The infinite series $I + C + C^2 + \cdots$ is the matrix analogue of the geometric series $1 + x + x^2 + \cdots$. You may be familiar with the fact that, for $|x| < 1$, $1 + x + x^2 + \cdots = 1/(1 - x)$.

• Since the vector $C\mathbf{x}$ represents the amounts consumed by each industry, the inequality $\mathbf{x} > C\mathbf{x}$ means that there is some level of production at which each industry produces more than it consumes.

• For an alternative approach to the first part of the proof of Theorem 3.34, see Exercise 42 in Section 4.6.

Corollary 3.35 Let $C$ be a consumption matrix. If the sum of each row of $C$ is less than 1, then $C$ is productive.

Proof If
$$\mathbf{x} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$
then $C\mathbf{x}$ is a vector consisting of the row sums of $C$. If each row sum of $C$ is less than 1, then the condition $\mathbf{x} > C\mathbf{x}$ is satisfied. Hence, $C$ is productive, by Theorem 3.34.

[Margin note: The word corollary comes from the Latin word corollarium, which refers to a garland given as a reward. Thus, a corollary is a little extra reward that follows from a theorem.]

Corollary 3.36 Let $C$ be a consumption matrix. If the sum of each column of $C$ is less than 1, then $C$ is productive.
Proof If each column sum of $C$ is less than 1, then each row sum of $C^T$ is less than 1. Hence, $C^T$ is productive, by Corollary 3.35. Therefore, by Theorems 3.9(d) and 3.4,
$$\left((I - C)^{-1}\right)^T = \left((I - C)^T\right)^{-1} = (I - C^T)^{-1} \geq O$$
It follows that $(I - C)^{-1} \geq O$ too and, thus, $C$ is productive.

You are asked to give alternative proofs of Corollaries 3.35 and 3.36 in Exercise 52 of Section 7.2.

It follows from the definition of a consumption matrix that the sum of column $j$ is the total dollar value of all the inputs needed to produce one dollar's worth of industry $j$'s output. If this sum is less than 1, then industry $j$'s income exceeds its expenditures, and we say that such an industry is profitable. Corollary 3.36 can therefore be rephrased to state that a consumption matrix is productive if all industries are profitable.

Population Growth

[Margin note: P. H. Leslie, "On the Use of Matrices in Certain Population Mathematics," Biometrika 33 (1945), pp. 183-212.]

One of the most popular models of population growth is a matrix-based model, first introduced by P. H. Leslie in 1945. The Leslie model describes the growth of the female portion of a population, which is assumed to have a maximum lifespan. The females are divided into age classes, all of which span an equal number of years. Using data about the average birthrates and survival probabilities of each class, the model can then determine the growth of the population over time.

Example 3.67

A certain species of German beetle, the Vollmar-Wasserman beetle (or VW beetle, for short), lives for at most 3 years. We divide the female VW beetles into three age classes of 1 year each: youths (0-1 year), juveniles (1-2 years), and adults (2-3 years). The youths do not lay eggs. Each juvenile produces an average of four female beetles, and each adult produces an average of three females.
The survival rate for youths is 50% (that is, the probability of a youth's surviving to become a juvenile is 0.5), and the survival rate for juveniles is 25%. Suppose we begin with a population of 100 female VW beetles: 40 youths, 40 juveniles, and 20 adults. Predict the beetle population for each of the next 5 years.

Solution After 1 year, the number of youths will be the number produced during that year:
$$0 \times 40 + 4 \times 40 + 3 \times 20 = 220$$
The number of juveniles will simply be the number of youths that have survived:
$$40 \times 0.5 = 20$$
Likewise, the number of adults will be the number of juveniles that have survived:
$$40 \times 0.25 = 10$$
We can combine these into a single matrix equation
$$\begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 40 \\ 40 \\ 20 \end{bmatrix} = \begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix}$$
or $L\mathbf{x}_0 = \mathbf{x}_1$. Here $\mathbf{x}_0 = \begin{bmatrix} 40 \\ 40 \\ 20 \end{bmatrix}$ is the initial population distribution vector, and $\mathbf{x}_1 = \begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix}$ is the distribution after 1 year. The structure of the equation is exactly the same as for Markov chains: $\mathbf{x}_{k+1} = L\mathbf{x}_k$ for $k = 0, 1, 2, \ldots$ (although the interpretation is quite different). It follows that we can compute successive population distribution vectors iteratively. (It also follows that $\mathbf{x}_k = L^k\mathbf{x}_0$ for $k = 0, 1, 2, \ldots$, as for Markov chains, but we will not use this fact here.) We compute
$$\mathbf{x}_2 = L\mathbf{x}_1 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 110 \\ 110 \\ 5 \end{bmatrix}$$
$$\mathbf{x}_3 = L\mathbf{x}_2 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 110 \\ 110 \\ 5 \end{bmatrix} = \begin{bmatrix} 455 \\ 55 \\ 27.5 \end{bmatrix}$$
$$\mathbf{x}_4 = L\mathbf{x}_3 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 455 \\ 55 \\ 27.5 \end{bmatrix} = \begin{bmatrix} 302.5 \\ 227.5 \\ 13.75 \end{bmatrix}$$
$$\mathbf{x}_5 = L\mathbf{x}_4 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 302.5 \\ 227.5 \\ 13.75 \end{bmatrix} = \begin{bmatrix} 951.25 \\ 151.25 \\ 56.875 \end{bmatrix}$$
Therefore, the model predicts that after 5 years there will be approximately 951 young female VW beetles, 151 juveniles, and 57 adults. (Note: You could argue that we should have rounded to the nearest integer at each step, for example, to 28 adults after step 3, which would have affected the subsequent iterations. We elected not to do this, since the calculations are only approximations anyway and it is much easier to use a calculator or CAS if you do not round as you go.)
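The iteration $\mathbf{x}_{k+1} = L\mathbf{x}_k$ is the same loop as for a Markov chain, only with a different matrix. The following is a minimal Python sketch; the function names `leslie_step` and `proportions` are our own. It reproduces $\mathbf{x}_1$ through $\mathbf{x}_5$ exactly, using fractions so that values such as 56.875 are not rounded. It also normalizes the distribution vector every year, so that the share of each age class can be followed over a longer horizon.

```python
from fractions import Fraction as F

# Leslie matrix of Example 3.67: birth parameters in the first row,
# survival probabilities on the subdiagonal.
L = [[F(0), F(4), F(3)],
     [F(1, 2), F(0), F(0)],
     [F(0), F(1, 4), F(0)]]

def leslie_step(L, x):
    """Return L x, the age distribution one year later."""
    return [sum(L[i][j] * x[j] for j in range(len(x))) for i in range(len(L))]

def proportions(x):
    """Divide a distribution vector by the sum of its components."""
    total = sum(x)
    return [v / total for v in x]

# Exact iteration for the first 5 years, starting from 40 youths,
# 40 juveniles, and 20 adults.
history = [[F(40), F(40), F(20)]]
for _ in range(5):
    history.append(leslie_step(L, history[-1]))

# Long-run class shares: renormalize every year (in floating point) so the
# rapidly growing totals are never formed.
share = [float(v) for v in proportions(history[0])]
for _ in range(200):
    share = proportions(leslie_step(L, share))

if __name__ == "__main__":
    for k, xk in enumerate(history):
        print(f"year {k}:", [float(v) for v in xk], "total", float(sum(xk)))
    print("class shares after 200 years:", [round(v, 4) for v in share])
```

The yearly totals printed are 100, 250, 225 and so on. The class shares printed at the end settle at about 0.72, 0.24, and 0.04.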
The matrix $L$ in Example 3.67 is called a Leslie matrix. In general, if we have a population with $n$ age classes of equal duration, $L$ will be an $n \times n$ matrix with the following structure:
$$L = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_{n-1} & b_n \\ s_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & s_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & s_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & s_{n-1} & 0 \end{bmatrix}$$
Here, $b_1, b_2, \ldots$ are the birth parameters ($b_i$ = the average number of females produced by each female in class $i$). The numbers $s_1, s_2, \ldots$ are the survival probabilities ($s_i$ = the probability that a female in class $i$ survives into class $i + 1$).

What are we to make of our calculations? Overall, the beetle population appears to be increasing. There are some fluctuations, however, such as a decrease from 250 to 225 from year 1 to year 2. Figure 3.22 shows the change in the population in each of the three age classes and clearly shows the growth, with fluctuations.

[Figure 3.22: Population in each age class (youths, juveniles, adults) versus time (in years), years 0 to 10]

[Figure 3.23: Percent of population in each age class (youths, juveniles, adults) versus time (in years), years 0 to 20]

If, instead of plotting the actual population, we plot the relative population in each class, a different pattern emerges. To do this, we need to compute the fraction of the population in each age class in each year. That is, we need to divide each distribution vector by the sum of its components. For example, after 1 year, we have
$$\frac{1}{250}\mathbf{x}_1 = \frac{1}{250}\begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 0.88 \\ 0.08 \\ 0.04 \end{bmatrix}$$
This tells us that 88% of the population consists of youths, 8% is juveniles, and 4% is adults. If we plot this type of data over time, we get a graph like the one in Figure 3.23. It shows clearly that the proportion of the population in each class is approaching a steady state. It turns out that the steady state vector in this example is
$$\begin{bmatrix} 0.72 \\ 0.24 \\ 0.04 \end{bmatrix}$$
That is, in the long run, 72% of the population will be youths, 24% juveniles, and 4% adults. (In other words, the population is distributed among the three age classes in the ratio 18 : 6 : 1.)
We will see how to determine this ratio exactly in Chapter 4.

Graphs and Digraphs

There are many situations in which it is important to be able to model the interrelationships among a finite set of objects. For example, we might wish to describe various types of networks, such as roads connecting towns, airline routes connecting cities, or communication links connecting satellites. We might also wish to describe relationships among groups or individuals, such as friendship relationships in a society, predator-prey relationships in an ecosystem, or dominance relationships in a sport. Graphs are ideally suited to modeling such networks and relationships, and it turns out that matrices are a useful tool in their study.

A graph consists of a finite set of points (called vertices) and a finite set of edges, each of which connects two (not necessarily distinct) vertices. We say that two vertices are adjacent if they are the endpoints of an edge. Figure 3.24 shows an example of the same graph drawn in two different ways. The graphs are the "same" in the sense that all we care about are the adjacency relationships that identify the edges.

[Figure 3.24: Two representations of the same graph, with vertices A, B, C, D]

[Margin note: The term vertex (vertices is the plural) comes from the Latin verb vertere, which means "to turn." In the context of graphs (and geometry), a vertex is a corner, a point where an edge "turns" into a different edge.]

We can record the essential information about a graph in a matrix and use matrix algebra to help us answer certain questions about the graph. This is particularly useful if the graphs are large, since computers can handle the calculations very quickly.

Definition If $G$ is a graph with $n$ vertices, then its adjacency matrix is the $n \times n$ matrix $A$ [or $A(G)$] defined by
$$a_{ij} = \begin{cases} 1 & \text{if there is an edge between vertices } i \text{ and } j \\ 0 & \text{otherwise} \end{cases}$$

Figure 3.25 shows a graph and its associated adjacency matrix.

[Figure 3.25: A graph on vertices $v_1, v_2, v_3, v_4$ with adjacency matrix $A$]
$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

Observe that the adjacency matrix of a graph is necessarily a symmetric matrix. (Why?) Notice also that a diagonal entry $a_{ii}$ of $A$ is zero unless there is a loop at vertex $i$.

Remark In some situations, a graph may have more than one edge between a pair of vertices. In such cases, it may make sense to modify the definition of the adjacency matrix so that $a_{ij}$ equals the number of edges between vertices $i$ and $j$.

We define a path in a graph to be a sequence of edges that allows us to travel from one vertex to another continuously. The length of a path is the number of edges it contains, and we will refer to a path with $k$ edges as a $k$-path. For example, in the graph of Figure 3.25, $v_1v_3v_2v_1$ is a 3-path, and $v_4v_1v_2v_2v_1v_3$ is a 5-path. Notice that the first of these is closed (it begins and ends at the same vertex); such a path is called a circuit. The second uses the edge between $v_1$ and $v_2$ twice. A path that does not include the same edge more than once is called a simple path.

We can use the powers of a graph's adjacency matrix to give us information about the paths of various lengths in the graph. Consider the square of the adjacency matrix in Figure 3.25:
$$A^2 = \begin{bmatrix} 3 & 2 & 1 & 0 \\ 2 & 3 & 2 & 1 \\ 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$
What do the entries of $A^2$ represent? Look at the $(2, 3)$ entry. From the definition of matrix multiplication, we know that
$$(A^2)_{23} = a_{21}a_{13} + a_{22}a_{23} + a_{23}a_{33} + a_{24}a_{43}$$
This expression can be nonzero only if at least one of the products $a_{2k}a_{k3}$ that make up the sum is nonzero. But $a_{2k}a_{k3}$ is nonzero if and only if both $a_{2k}$ and $a_{k3}$ are nonzero. This means that there is an edge between $v_2$ and $v_k$ as well as an edge between $v_k$ and $v_3$. Thus, there will be a 2-path between vertices 2 and 3 (via vertex $k$). In our example, this happens for $k = 1$ and for $k = 2$, so
$$(A^2)_{23} = a_{21}a_{13} + a_{22}a_{23} + a_{23}a_{33} + a_{24}a_{43} = 1 \cdot 1 + 1 \cdot 1 + 1 \cdot 0 + 0 \cdot 0 = 2$$
This tells us that there are two 2-paths between vertices 2 and 3. (Check to see that the remaining entries of $A^2$ correctly give 2-paths between the corresponding vertices.) The argument we have just given can be generalized to yield the following result, whose proof we leave as Exercise 72.

If $A$ is the adjacency matrix of a graph $G$, then the $(i, j)$ entry of $A^k$ is equal to the number of $k$-paths between vertices $i$ and $j$.
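Counting $k$-paths therefore reduces to repeated matrix multiplication. The following is a short Python sketch using only the standard library; `matmul`, `matpow` and `adjacency_matrix` are our own helpers. It builds the adjacency matrix of Figure 3.25 from an edge list, where the loop at $v_2$ contributes a diagonal 1. The edge list is our transcription of the graph in that figure. The sketch then computes $A^2$ and $A^3$.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    """A**k for k >= 1 by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

def adjacency_matrix(n, edges):
    """Adjacency matrix of an undirected graph on vertices 1..n.

    A loop (i, i) sets a_ii = 1; every other edge sets a_ij = a_ji = 1.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1
    return A

# Edges of the graph in Figure 3.25, including the loop at v2.
edges = [(1, 2), (1, 3), (1, 4), (2, 2), (2, 3)]
A = adjacency_matrix(4, edges)
A2 = matpow(A, 2)
A3 = matpow(A, 3)

if __name__ == "__main__":
    print("A^2 =", A2)
    print("number of 2-paths between v2 and v3:", A2[1][2])
```

The printed $(2, 3)$ entry is 2, matching the two 2-paths found above. Because Python lists are 0-indexed, entry $(i, j)$ of the text is `A2[i - 1][j - 1]` in the code.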
Example 3.68

How many 3-paths are there between $v_1$ and $v_2$ in Figure 3.25?

Solution We need the $(1, 2)$ entry of $A^3$, which is the dot product of row 1 of $A^2$ and column 2 of $A$. The calculation gives
$$(A^3)_{12} = 3 \cdot 1 + 2 \cdot 1 + 1 \cdot 1 + 0 \cdot 0 = 6$$
so there are six 3-paths between vertices 1 and 2, which can be easily checked.

In many applications that we can model with a graph, the vertices are ordered by some type of relation that imposes a direction on the edges. For example, directed edges might be used to represent one-way routes in a graph that models a transportation network. They might also represent predator-prey relationships in a graph modeling an ecosystem. A graph with directed edges is called a digraph. Figure 3.26 shows an example.

[Figure 3.26: A digraph]

An easy modification to the definition of adjacency matrices allows us to use them with digraphs.

Definition If $G$ is a digraph with $n$ vertices, then its adjacency matrix is the $n \times n$ matrix $A$ [or $A(G)$] defined by
$$a_{ij} = \begin{cases} 1 & \text{if there is an edge from vertex } i \text{ to vertex } j \\ 0 & \text{otherwise} \end{cases}$$

Thus, the adjacency matrix for the digraph in Figure 3.26 is [matrix shown with Figure 3.26].

Not surprisingly, the adjacency matrix of a digraph is not symmetric in general. (When would it be?) You should have no difficulty seeing that $A^k$ now contains the number of directed $k$-paths between vertices, where we insist that all edges along a path flow in the same direction. (See Exercise 72.) The next example gives an application of this idea.

Example 3.69

Five tennis players (Djokovic, Federer, Nadal, Roddick, and Safin) compete in a round-robin tournament in which each player plays every other player once. The digraph in Figure 3.27 summarizes the results. A directed edge from vertex $i$ to vertex $j$ means that player $i$ defeated player $j$. (A digraph in which there is exactly one directed edge between every pair of vertices is called a tournament.)

[Figure 3.27: A tournament on the vertices D, F, N, R, S]

The adjacency matrix for the digraph in Figure 3.27 is
N 0 0 A= 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 wher e thiecalorlyd.erThusof th, eFedervertiecrescor(arndesphence toherorwowsandandcolcoluumnmns offor exampl is deteer.mined alphabetSuppos onds t ayers, basofewid nons fortheeachresulpltsaofyerth. eiObsr matervcehesthat. Onethe waynumberto dooftwiehiwesnmis wieachgshth tplbeoaryertaonkcounthadtheifistvhjueespltnumber ng roew; equivalently, the vector containing all tthheersouwmsuofmstheisentgivreniesbyin tthhee prcororductespondiwher 2 2, A) Aj, j= 1 1 1 1 1 Section 3. 7 Applications 245 In our case, we have 0 1 0 3 01 00 01 1 01 23 00 00 01 00 01 11 which produces the followinFigrrsat:nkinDjg:okovic, Federer (tie) Second: Nadal Thi r d : Roddi c k, Safi n ( t i e ) e thdefe pleatayered sFederwho etire, dheindesthiesrrvaesnkifinrsgtequalplace.lyRoddi strong?ckDjwoulokovid usc mie tghhte same arguettyhpeat sofinarceArgheument the tiee hewitbeath SafiNadal n. However ,defeat Safin ecould d arotghueertsh; atfurhetherhasmtorweo, "ihendimiregctht" notvictteoorbrtiheeatsakbecaus , who Roddick has only indirect victory (over Safin, who then defeatSineced Nadal ) . grcororup,esptondsheinnotatogrioaonup2-ofpofatinhtdiieinrsettcthherewidie nmaygsrasph,eemsnotsobemorwea canple ausyerusefulewhot.hMore defesquareoverateedof, alanthl teihnadjediotraehcency ctervis icntma­othrye tofrixth. Toe matcomput rix e bothwhiwincshandare giinvdienrebyct wins for each player, we need the row sums 00 01 01 1 01 00 21 1 21 001 000 00 001 00 001 001 00 01 020 0 01 22 22 23 11 87 01 001 011 02 201 11 23 Thus Unfor, tweunatwoulely,dthriasnkapprtheoachplayeris nots as guarfolloaws:nteDjedotkovio brecak, Federall tieers,.Nadal, Safin, Roddick. Aj = two one A + A2, (A + A 2 )j + = 6 I Exercises 3 . 1 M a rkov C h a i n s [ 0.0.55 0.0.73 ] [ 0.0.55 ] In Exercises 1 -4, let P be the transition mabe trix for a Markov chain with two states. Let x0 the initial state vector for the population. = = 1. 2. 3. 4. 
1. Compute $\mathbf{x}_1$ and $\mathbf{x}_2$.
2. What proportion of the state 1 population will be in state 2 after two steps?
3. What proportion of the state 2 population will be in state 2 after two steps?
4. Find the steady state vector.

In Exercises 5–8, let $P = [\;\dots\;]$ be the transition matrix for a Markov chain with three states. Let $\mathbf{x}_0 = [\;\dots\;]$ be the initial state vector for the population.

5. Compute $\mathbf{x}_1$ and $\mathbf{x}_2$.
6. What proportion of the state 1 population will be in state 1 after two steps?
7. What proportion of the state 2 population will be in state 3 after two steps?
8. Find the steady state vector.
9. Suppose that the weather in a particular region behaves according to a Markov chain. Specifically, suppose that the probability that tomorrow will be a wet day is 0.662 if today is wet and 0.250 if today is dry. The probability that tomorrow will be a dry day is 0.750 if today is dry and 0.338 if today is wet. [This exercise is based on an actual study of rainfall in Tel Aviv over a 27-year period. See K. R. Gabriel and J. Neumann, "A Markov Chain Model for Daily Rainfall Occurrence at Tel Aviv," Quarterly Journal of the Royal Meteorological Society, 88 (1962), pp. 90–95.]
(a) Write down the transition matrix for this Markov chain.
(b) If Monday is a dry day, what is the probability that Wednesday will be wet?
(c) In the long run, what will the distribution of wet and dry days be?
10. Data have been accumulated on the heights of children relative to their parents. Suppose that the probabilities that a tall parent will have a tall, medium-height, or short child are 0.6, 0.2, and 0.2, respectively; the probabilities that a medium-height parent will have a tall, medium-height, or short child are 0.1, 0.7, and 0.2, respectively; and the probabilities that a short parent will have a tall, medium-height, or short child are 0.2, 0.4, and 0.4, respectively.
(a) Write down the transition matrix for this Markov chain.
(b) What is the probability that a short person will have a tall grandchild?
(c) If 20% of the current population is tall, 50% is of medium height, and 30% is short, what will the distribution be in three generations?
(d) What proportion of the population will be tall, of medium height, and short in the long run?
11. A study of piñon (pine) nut crops in the American southwest from 1940 to 1947 hypothesized that nut production followed a Markov chain. [See D. H. Thomas, "A Computer Simulation Model of Great Basin Shoshonean Subsistence and Settlement Patterns," in D. L. Clarke, ed., Models in Archaeology (London, 1972).] The data suggested that if one year's crop was good, then the probabilities that the following year's crop would be good, fair, or poor were 0.08, 0.07, and 0.85, respectively; if one year's crop was fair, then the probabilities that the following year's crop would be good, fair, or poor were 0.09, 0.11, and 0.80, respectively; if one year's crop was poor, then the probabilities that the following year's crop would be good, fair, or poor were 0.11, 0.05, and 0.84, respectively.
(a) Write down the transition matrix for this Markov chain.
(b) If the piñon nut crop was good in 1940, find the probabilities of a good crop in the years 1941 through 1945.
(c) In the long run, what proportion of the crops will be good, fair, and poor?
12. Robots have been programmed to traverse the maze shown in Figure 3.28 and at each junction randomly choose which way to go.

Figure 3.28

(a) Construct the transition matrix for the Markov chain that models this situation.
(b) Suppose we start with 15 robots at each junction. Find the steady state distribution of robots. (Assume that it takes each robot the same amount of time to travel between two adjacent junctions.)
13. Let $\mathbf{j}$ denote a row vector consisting entirely of 1s. Prove that a nonnegative matrix $P$ is a stochastic matrix if and only if $\mathbf{j}P = \mathbf{j}$.
14. (a) Show that the product of two $2 \times 2$ stochastic matrices is also a stochastic matrix.
(b) Prove that the product of two $n \times n$ stochastic matrices is also a stochastic matrix.
(c) If a $2 \times 2$ stochastic matrix $P$
is invertible, prove that $P^{-1}$ is also a stochastic matrix.

Suppose we want to know the average (or expected) number of steps it will take to go from state $i$ to state $j$ in a Markov chain. It can be shown that the following computation answers this question: Delete the $j$th row and the $j$th column of the transition matrix $P$ to get a new matrix $Q$. (Keep the rows and columns of $Q$ labeled as they were in $P$.) The expected number of steps from state $i$ to state $j$ is given by the sum of the entries in the column of $(I - Q)^{-1}$ labeled $i$.

15. In Exercise 9, if Monday is a dry day, what is the expected number of days until a wet day?
16. In Exercise 10, what is the expected number of generations until a short person has a tall descendant?
17. In Exercise 11, if the piñon nut crop is fair one year, what is the expected number of years until a good crop occurs?
18. In Exercise 12, starting from each of the other junctions, what is the expected number of moves until a robot reaches junction 4?

Linear Economic Models

In Exercises 19–26, determine which of the matrices are exchange matrices. For those that are exchange matrices, find a nonnegative price vector that satisfies Equation (1).
19.–26. [ … ]

In Exercises 27–30, determine whether the given consumption matrix is productive.
27.–30. [ … ]

In Exercises 31–34, a consumption matrix $C$ and a demand vector $\mathbf{d}$ are given. In each case, find a feasible production vector $\mathbf{x}$ that satisfies Equation (2).
31.–34. $C = [\;\dots\;]$, $\mathbf{d} = [\;\dots\;]$

35. Let $A$ be an $n \times n$ matrix, $A \ge O$. Suppose that $A\mathbf{x} < \mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$, $\mathbf{x} \ge \mathbf{0}$. Prove that $\mathbf{x} > \mathbf{0}$.
36. Let $A$, $B$, $C$, and $D$ be $n \times n$ matrices and $\mathbf{x}$ and $\mathbf{y}$ be vectors in $\mathbb{R}^n$. Prove the following inequalities:
(a) If $A \ge B \ge O$ and $C \ge D \ge O$, then $AC \ge BD \ge O$.
(b) If $A > B$ and $\mathbf{x} \ge \mathbf{0}$, $\mathbf{x} \ne \mathbf{0}$, then $A\mathbf{x} > B\mathbf{x}$.

Population Growth

37. A population with two age classes has a Leslie matrix $L = [\;\dots\;]$. If the initial population vector is $\mathbf{x}_0 = [\;\dots\;]$, compute $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$.
38. A population with three age classes has a Leslie matrix $L = [\;\dots\;]$. If the initial population vector is $\mathbf{x}_0 = [\;\dots\;]$, compute $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$.
39. A population with three age classes has a Leslie matrix $L = [\;\dots\;]$. If the initial population vector is $\mathbf{x}_0 = \begin{bmatrix}100\\100\\100\end{bmatrix}$, compute $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$.
40. A population with four age classes has a Leslie matrix $L = [\;\dots\;]$. If the initial population vector is $\mathbf{x}_0 = [\;\dots\;]$, compute $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$.
41. A certain species with two age classes of 1 year's duration has a survival probability of 80% from class 1 to class 2. Empirical evidence shows that, on average, each female gives birth to five females per year. Thus, two possible Leslie matrices are
$$L_1 = \begin{bmatrix}0&5\\0.8&0\end{bmatrix} \quad\text{and}\quad L_2 = \begin{bmatrix}4&1\\0.8&0\end{bmatrix}$$
(a) Starting with $\mathbf{x}_0 = [\;\dots\;]$, compute $\mathbf{x}_1, \dots, \mathbf{x}_{10}$ in each case.
(b) For each case, plot the relative size of each age class over time (as in Figure 3.23). What do your graphs suggest?
42. Suppose the Leslie matrix for the VW beetle is $L = [\;\dots\;]$. Starting with an arbitrary $\mathbf{x}_0$, determine the behavior of this population.
43. Suppose the Leslie matrix for the VW beetle is $L = \begin{bmatrix}0&0&20\\s&0&0\\0&0.5&0\end{bmatrix}$. Investigate the effect of varying the survival probability $s$ of the young beetles.
44. (CAS) Woodland caribou are found primarily in the western provinces of Canada and the American northwest. The average life span of a female is about 14 years. The birth and survival rates for each age bracket are given in Table 3.4, which shows that caribou cows do not give birth at all during their first 2 years and give birth to about one calf per year during their middle years. The mortality rate for young calves is very high.
Table 3.4
Age (years)   Birth Rate   Survival Rate
0–2           0.0          0.3
2–4           0.4          …
4–6           1.8          0.9
6–8           1.8          0.9
8–10          1.8          0.9
10–12         1.6          0.6
12–14         0.6          0.0

Table 3.5 lists the number of woodland caribou in Jasper National Park in Alberta in 1990. Using a CAS, predict the caribou population for 1992 and 1994. Then project the population for the years 2010 and 2020. What do you conclude? (What assumptions does this model make, and how could it be improved?)

Table 3.5 Woodland Caribou Population in Jasper National Park
Age (years)   Number (1990)
0–2           10
2–4           2
4–6           8
6–8           5
8–10          12
10–12         0
12–14         …
Source: World Wildlife Fund Canada

Graphs and Digraphs

In Exercises 45–48, determine the adjacency matrix of the given graph.
45.–48. [Figures: graphs on vertices $v_1, \dots, v_5$]

In Exercises 49–52, draw a graph that has the given adjacency matrix.
49.–52. [ … ]

In Exercises 53–56, determine the adjacency matrix of the given digraph.
53.–56. [Figures: digraphs on vertices $v_1, \dots, v_5$]

In Exercises 57–60, draw a digraph that has the given adjacency matrix.
57.–60. [ … ]

In Exercises 61–68, use powers of adjacency matrices to determine the number of paths of the specified length between the given vertices.
61. Exercise 50, length 2, $v_1$ and $v_2$
62. Exercise 52, length 2, $v_1$ and $v_2$
63. Exercise 50, length 3, $v_1$ and $v_3$
64. Exercise 52, length 4, $v_2$ and $v_2$
65. Exercise 57, length 2, $v_1$ to $v_3$
66. Exercise 57, length 3, $v_4$ to $v_1$
67. Exercise 60, length 3, $v_4$ to $v_1$
68. Exercise 60, length 4, $v_1$ to $v_4$
69. Let $A$ be the adjacency matrix of a graph $G$.
(a) If row $i$ of $A$ is all zeros, what does this imply about $G$?
(b) If column $j$ of $A$ is all zeros, what does this imply about $G$?
70. Let $A$ be the adjacency matrix of a digraph $D$.
(a) If row $i$ of $A^2$ is all zeros, what does this imply about $D$?
(b) If column $j$ of $A^2$ is all zeros, what does this imply about $D$?
71. Figure 3.29 is the digraph of a tournament with six players, $P_1$ to $P_6$. Using adjacency matrices, rank the players first by determining wins only and then by using the notion of combined wins and indirect wins, as in Example 3.69.

Figure 3.29

72. Figure 3.30 is a digraph representing a food web in a small ecosystem. A directed edge from $a$ to $b$ indicates that $a$ has $b$ as a source of food. Construct the adjacency matrix $A$ for this digraph and use it to answer the following questions.

Figure 3.30 (food web; vertices include Bird, Fish, and Rodent)

(a) Which species has the most direct sources of food? How does $A$ show this?
(b) Which species is a direct source of food for the most other species? How does $A$ show this?
(c) If $a$ eats $b$ and $b$ eats $c$, we say that $a$ has $c$ as an indirect source of food. How can we use $A$ to determine which species has the most indirect food sources? Which species has the most direct and indirect food sources combined?
(d) Suppose that pollutants kill the plants in this food web, and we want to determine the effect this change will have on the ecosystem. Construct a new adjacency matrix $A^*$ from $A$ by deleting the row and column corresponding to plants. Repeat parts (a) to (c) and determine which species are the most and least affected by the change.
(e) What will the long-term effects of the pollution be? What matrix calculations will show this?
73. Five people are all connected by e-mail. Whenever one of them hears a juicy piece of gossip, he or she passes it along by e-mailing it to someone else in the group according to Table 3.6.

Table 3.6
Sender   Recipients
Ann      Carla, Ehaz
Bert     Carla, Dana
Carla    Ehaz
Dana     Ann, Carla
Ehaz     Bert

(a) Draw the digraph that models this "gossip network" and find its adjacency matrix $A$.
(b) Define a step as the time it takes a person to e-mail everyone on his or her list. (Thus, in one step, gossip gets from Ann to both Carla and Ehaz.) If Bert hears a rumor, how many steps will it take for everyone else to hear the rumor? What matrix calculation reveals this?
(c) If Ann hears a rumor, how many steps will it take for everyone else to hear the rumor? What matrix calculation reveals this?
(d) In general, if $A$ is the adjacency matrix of a digraph, how can we tell if vertex $i$ is connected to vertex $j$ by a path (of some length)?
[The gossip network in this exercise is reminiscent of the notion of "six degrees of separation" (found in the play and film by that name), which suggests that any two people in the world are connected by a path of acquaintances whose length is at most 6. The game "Six Degrees of Kevin Bacon" more frivolously asserts that all actors are connected to the actor Kevin Bacon in such a way.]
74. Let $A$ be the adjacency matrix of a graph $G$.
(a) By induction, prove that for all $n \ge 1$, the $(i, j)$ entry of $A^n$ is equal to the number of $n$-paths between vertices $i$ and $j$.
(b) How do the statement and proof in part (a) have to be modified if $G$ is a digraph?
75. If $A$ is the adjacency matrix of a digraph $G$, what does the $(i, j)$ entry of $AA^T$ represent if $i \ne j$?

A graph is called bipartite if its vertices can be subdivided into two sets $U$ and $V$ such that every edge has one endpoint in $U$ and the other endpoint in $V$. For example, the graph in Exercise 48 is bipartite with $U = \{v_1, v_2, v_3\}$ and $V = \{v_4, v_5\}$. In Exercises 76–79, determine whether a graph with the given adjacency matrix is bipartite.
76. The adjacency matrix in Exercise 49
77. The adjacency matrix in Exercise 52
78. The adjacency matrix in Exercise 51
79. $A = [\;\dots\;]$
80. (a) Prove that a graph is bipartite if and only if its vertices can be labeled so that its adjacency matrix can be partitioned as
$$A = \begin{bmatrix} O & B \\ B^T & O \end{bmatrix}$$
(b) Using the result in part (a), prove that a bipartite graph has no circuits of odd length.
Chapter Review

Key Definitions and Concepts
basis, 198
Basis Theorem, 202
column matrix (vector), 138
column space of a matrix, 195
composition of linear transformations, 219
coordinate vector with respect to a basis, 208
diagonal matrix, 139
dimension, 203
elementary matrix, 170
Fundamental Theorem of Invertible Matrices, 172, 206
identity matrix, 139
inverse of a linear transformation, 221
inverse of a square matrix, 163
linear combination of matrices, 154
linear dependence/independence of matrices, 157
linear transformation, 213
LU factorization, 181
matrix, 138
matrix addition, 140
matrix factorization, 180
matrix multiplication, 141
matrix powers, 149
negative of a matrix, 140
null space of a matrix, 197
nullity of a matrix, 204
outer product, 147
partitioned matrices (block multiplication), 145, 148
permutation matrix, 187
properties of matrix algebra, 154, 158, 159, 167
rank of a matrix, 204
Rank Theorem, 205
representations of matrix products, 146–148
row matrix (vector), 138
row space of a matrix, 195
scalar multiple of a matrix, 140
span of a set of matrices, 156
square matrix, 139
standard matrix of a linear transformation, 216
subspace, 192
symmetric matrix, 151
transpose of a matrix, 151
zero matrix, 141

Review Questions

1. Mark each of the following statements true or false:
(a) For any matrix $A$, both $AA^T$ and $A^TA$ are defined.
(b) If $A$ and $B$ are matrices such that $AB = O$ and $A \ne O$, then $B = O$.
(c) If $A$, $B$, and $X$ are invertible matrices such that $XA = B$, then $X = A^{-1}B$.
(d) The inverse of an elementary matrix is an elementary matrix.
(e) The transpose of an elementary matrix is an elementary matrix.
(f) The product of two elementary matrices is an elementary matrix.
(g) If $A$ is an $m \times n$ matrix, then the null space of $A$ is a subspace of $\mathbb{R}^n$.
(h) Every plane in $\mathbb{R}^3$ is a two-dimensional subspace of $\mathbb{R}^3$.
(i) The transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(\mathbf{x}) = -\mathbf{x}$ is a linear transformation.
(j) If $T: \mathbb{R}^4 \to \mathbb{R}^5$ is a linear transformation, then there is a $4 \times 5$ matrix $A$ such that $T(\mathbf{x}) = A\mathbf{x}$ for all vectors $\mathbf{x}$ in the domain of $T$.

In Exercises 2–7, let $A = [\;\dots\;]$ and $B = [\;\dots\;]$. Compute the indicated matrices, if possible.
2. $A^2B$
3. $A^TB^2$
4. $B^TA^{-1}B$
5. $(BB^T)^{-1}$
6. $(B^TB)^{-1}$
7. The outer product expansion of $AA^T$
8. If $A$ is a matrix such that $A^{-1} = \begin{bmatrix}1/2&-1\\-3/2&4\end{bmatrix}$, find $A$.
9. If $A = [\;\dots\;]$ and $X$ is a matrix such that $AX = [\;\dots\;]$, find $X$.
10. If possible, express the matrix $A = [\;\dots\;]$ as a product of elementary matrices.
11. If $A$ is a square matrix such that $A^3 = O$, show that $(I - A)^{-1} = I + A + A^2$.
12. Find an $LU$ factorization of $A = [\;\dots\;]$.
13. Find bases for the row space, column space, and null space of $A = [\;\dots\;]$.
14. Suppose matrices $A$ and $B$ are row equivalent. Do they have the same row space? Why or why not? Do $A$ and $B$ have the same column space? Why or why not?
15. If $A$ is an invertible matrix, explain why $A$ and $A^T$ must have the same null space. Is this true if $A$ is a noninvertible square matrix? Explain.
16. If $A$ is a square matrix whose rows add up to the zero vector, explain why $A$ cannot be invertible.
17. Let $A$ be an $m \times n$ matrix with linearly independent columns. Explain why $A^TA$ must be an invertible matrix. Must $AA^T$ also be invertible? Explain.
18. Find a linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ such that $T([\;\dots\;]) = [\;\dots\;]$ and $T([\;\dots\;]) = [\;\dots\;]$.
19. Find the standard matrix of the linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ that corresponds to a counterclockwise rotation of $45^\circ$ about the origin followed by a projection onto the line $y = -2x$.
20. Suppose that $T: \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation and suppose that $\mathbf{v}$ is a vector such that $T(\mathbf{v}) \ne \mathbf{0}$ but $T^2(\mathbf{v}) = \mathbf{0}$ (where $T^2 = T \circ T$). Prove that $\mathbf{v}$ and $T(\mathbf{v})$ are linearly independent.
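The matrix computations of Section 3.7 (counting paths with powers of an adjacency matrix in Examples 3.67 and 3.68, ranking by wins plus indirect wins in Example 3.69, and the expected-number-of-steps recipe stated before Exercise 15) are easy to check by machine. The plain-Python sketch below is one way to do so; it is our own illustration, not part of the text, and every function name in it is hypothetical. The matrix `G` is an assumption: since Figure 3.25 is not reproduced here, we use an adjacency matrix that agrees with every entry used in the calculations of Examples 3.67 and 3.68. Vertices and states are indexed from 0, and `P` is column-stochastic, as in the text.

```python
from fractions import Fraction


def matmul(A, B):
    """Ordinary matrix product of lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matpow(A, k):
    """A^k for k >= 0 by repeated multiplication."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, A)
    return R


def wins_and_indirect_wins(A):
    """Row sums of A + A^2: direct wins plus indirect wins (2-paths)."""
    A2 = matmul(A, A)
    return [sum(r1) + sum(r2) for r1, r2 in zip(A, A2)]


def inverse(M):
    """Exact Gauss-Jordan inverse with Fractions; M is assumed invertible."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def expected_steps(P, i, j):
    """Expected number of steps from state i to state j (0-based indices).

    P is column-stochastic: P[r][c] is the probability of moving from state c
    to state r. Delete row j and column j to get Q, then sum the column of
    (I - Q)^{-1} that corresponds to state i.
    """
    keep = [k for k in range(len(P)) if k != j]
    n = len(keep)
    Q = [[Fraction(P[r][c]) for c in keep] for r in keep]
    N = inverse([[int(r == c) - Q[r][c] for c in range(n)] for r in range(n)])
    col = keep.index(i)
    return sum(N[r][col] for r in range(n))


# Assumed adjacency matrix for Figure 3.25 (consistent with Examples 3.67-3.68).
G = [[0, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]

# Tournament of Example 3.69; rows/columns: Djokovic, Federer, Nadal, Roddick, Safin.
T = [[0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1],
     [1, 0, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 1, 0, 0]]

# Two-state chain of Exercises 1-4.
P = [[Fraction(1, 2), Fraction(3, 10)],
     [Fraction(1, 2), Fraction(7, 10)]]

if __name__ == "__main__":
    print("(A^2)_23 =", matpow(G, 2)[1][2])   # 2-paths between v2 and v3
    print("(A^3)_12 =", matpow(G, 3)[0][1])   # 3-paths between v1 and v2
    print("wins:", [sum(row) for row in T])
    print("wins + indirect wins:", wins_and_indirect_wins(T))
    print("expected steps, state 1 -> state 2:", expected_steps(P, 0, 1))
```

For the chain of Exercises 1–4, `expected_steps(P, 0, 1)` returns 2: from state 1 the chain moves to state 2 with probability 0.5 at every step, so on average two steps are needed. Exact fractions are used so that the results can be compared directly with hand calculations.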
Eigenvalues and Eigenvectors

Almost every combination of the adjectives proper, latent, characteristic, eigen and secular, with the nouns root, number, and value, has been used in the literature for what we call a proper value.
Paul R. Halmos, Finite Dimensional Vector Spaces (2nd edition), Van Nostrand, 1958, p. 102

4.0 Introduction: A Dynamical System on Graphs

We saw in the last chapter that iterating matrix multiplication often produces interesting results. Both Markov chains and the Leslie model of population growth exhibit steady states in certain situations. One of the goals of this chapter is to help you understand such behavior. First we will look at another iterative process, or dynamical system, that uses matrices. (In the problems that follow, you will find it helpful to use a CAS or a calculator with matrix capabilities to facilitate the computations.)

Our example involves graphs (see Section 3.7). A complete graph is any graph in which every vertex is adjacent to every other vertex. If a complete graph has $n$ vertices, it is denoted by $K_n$. For example, Figure 4.1 shows a representation of $K_4$.

Problem 1 Pick any vector $\mathbf{x}$ in $\mathbb{R}^4$ with nonnegative entries and label the vertices of $K_4$ with the components of $\mathbf{x}$, so that $v_1$ is labeled with $x_1$, and so on. Compute the adjacency matrix $A$ of $K_4$ and relabel the vertices of the graph with the corresponding components of $A\mathbf{x}$. Try this for several vectors $\mathbf{x}$ and explain, in terms of the graph, how the new labels can be determined from the old labels.

Problem 2 Now iterate the process in Problem 1. That is, for a given choice of $\mathbf{x}$, relabel the vertices as described above and then apply $A$ again (and again, and again) until a pattern emerges. Since components of the vectors themselves will get quite large, we will scale them by dividing each vector by its largest component after each iteration. Thus, if a computation results in the vector
$$\begin{bmatrix}4\\2\\1\\1\end{bmatrix} \quad\text{we will replace it by}\quad \frac{1}{4}\begin{bmatrix}4\\2\\1\\1\end{bmatrix} = \begin{bmatrix}1\\0.5\\0.25\\0.25\end{bmatrix}$$
Note that this process guarantees that the largest component of each vector will now be 1. Do this for $K_4$, then $K_3$ and $K_5$. Use at least 10 iterations and two-decimal-place accuracy. What appears to be happening?

Figure 4.1 $K_4$

Problem 3 You should have noticed that, in each case, the labeling vector is approaching a certain vector (a steady state label!). Label the vertices of the complete graphs with this steady state vector and apply the adjacency matrix $A$ one more time (without scaling). What is the relationship between the new labels and the old ones?

Problem 4 Make a conjecture about the general case $K_n$. What is the steady state label? What happens if we label $K_n$ with the steady state vector and apply the adjacency matrix $A$ without scaling?

Problem 5 The Petersen graph is shown in Figure 4.2. Repeat the process in Problems 1 through 3 with this graph.

Figure 4.2 The Petersen graph

We will now explore the process with some other classes of graphs to see if they behave the same way. The cycle $C_n$ is the graph with $n$ vertices arranged in a cyclic fashion. For example, $C_5$ is the graph shown in Figure 4.3.

Figure 4.3 $C_5$

Problem 6 Repeat the process of Problems 1 through 3 with cycles $C_n$ for various odd values of $n$ and make a conjecture about the general case.

Problem 7 Repeat Problem 6 with even values of $n$. What happens?

A complete bipartite graph is a bipartite graph (see Exercises 76–80 in Section 3.7) whose vertices can be partitioned into sets $U$ and $V$ such that every vertex in $U$ is adjacent to every vertex in $V$, and vice versa. If $U$ and $V$ each have $n$ vertices, then the graph is denoted by $K_{n,n}$. For example, $K_{3,3}$ is the graph in Figure 4.4.

Figure 4.4 $K_{3,3}$

Problem 8 Repeat the process of Problems 1 through 3 with complete bipartite graphs $K_{n,n}$ for various values of $n$. What happens?

By the end of this chapter, you will be in a position to explain the observations you have made in this Introduction.
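The text suggests a CAS for these problems; as an alternative, here is a minimal plain-Python sketch of the relabel-and-scale process of Problems 1 and 2. It is our own illustration (the function names are hypothetical): it builds adjacency matrices for $K_n$ and $C_n$, applies $A$, divides by the largest component, and records every labeling so you can watch for the patterns asked about in Problems 2–7. The Petersen graph and $K_{n,n}$ would need their own adjacency matrices, passed in the same way.

```python
def complete_graph(n):
    """Adjacency matrix of K_n: every off-diagonal entry is 1."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def cycle_graph(n):
    """Adjacency matrix of C_n (n >= 3): vertex i is adjacent to i-1 and i+1 mod n."""
    return [[1 if (j - i) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def iterate_and_scale(A, x, steps=10):
    """Relabel with A x, then divide by the largest component; repeat.

    Returns the list of labelings x_0, x_1, ..., x_steps.
    Assumes x has nonnegative entries and is not the zero vector.
    """
    history = [list(x)]
    for _ in range(steps):
        y = mat_vec(A, x)
        m = max(y)
        if m <= 0:
            raise ValueError("relabeling produced no positive component")
        x = [v / m for v in y]
        history.append(x)
    return history


if __name__ == "__main__":
    # Problem 2 for K_4, starting from the labeling [4, 2, 1, 1].
    for k, x in enumerate(iterate_and_scale(complete_graph(4), [4, 2, 1, 1])):
        print(k, [round(v, 2) for v in x])
```

Rounding to two decimal places in the printout matches the accuracy requested in Problem 2; the stored labelings keep full floating-point precision.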
4.1 Introduction to Eigenvalues and Eigenvectors

In Chapter 3, we encountered the notion of a steady state vector in the context of two applications: Markov chains and the Leslie model of population growth. For a Markov chain with transition matrix $P$, a steady state vector $\mathbf{x}$ had the property that $P\mathbf{x} = \mathbf{x}$; for a Leslie matrix $L$, a steady state growth vector was a vector $\mathbf{x}$ such that $L\mathbf{x} = r\mathbf{x}$, where $r$ represented the steady state growth rate. For example, we saw that
$$\begin{bmatrix}0.7&0.2\\0.3&0.8\end{bmatrix}\begin{bmatrix}0.4\\0.6\end{bmatrix} = \begin{bmatrix}0.4\\0.6\end{bmatrix} \quad\text{and}\quad \begin{bmatrix}0&4&3\\0.5&0&0\\0&0.25&0\end{bmatrix}\begin{bmatrix}18\\6\\1\end{bmatrix} = 1.5\begin{bmatrix}18\\6\\1\end{bmatrix}$$

In this chapter, we investigate this phenomenon more generally. That is, for a square matrix $A$, we ask whether there exist nonzero vectors $\mathbf{x}$ such that $A\mathbf{x}$ is just a scalar multiple of $\mathbf{x}$. This is the eigenvalue problem, and it is one of the most central problems in linear algebra. It has applications throughout mathematics and in many other fields as well.

Definition Let $A$ be an $n \times n$ matrix. A scalar $\lambda$ is called an eigenvalue of $A$ if there is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \lambda\mathbf{x}$. Such a vector $\mathbf{x}$ is called an eigenvector of $A$ corresponding to $\lambda$.

The German adjective eigen means "own" or "characteristic of." Eigenvalues and eigenvectors are characteristic of a matrix in the sense that they contain important information about the nature of the matrix. The letter $\lambda$ (lambda), the Greek equivalent of the English letter L, is used for eigenvalues because at one time they were also known as latent values. The prefix eigen is pronounced "EYE-gun."

Example 4.1 Show that $\mathbf{x} = \begin{bmatrix}1\\1\end{bmatrix}$ is an eigenvector of $A = \begin{bmatrix}3&1\\1&3\end{bmatrix}$ and find the corresponding eigenvalue.

Solution We compute
$$A\mathbf{x} = \begin{bmatrix}3&1\\1&3\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}4\\4\end{bmatrix} = 4\begin{bmatrix}1\\1\end{bmatrix} = 4\mathbf{x}$$
from which it follows that $\mathbf{x}$ is an eigenvector of $A$ corresponding to the eigenvalue 4.
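The definition can be checked mechanically: compute $A\mathbf{x}$ and test whether it is a scalar multiple of $\mathbf{x}$. The short Python sketch below is our own helper (the name `eigenvalue_for` is hypothetical); it uses exact fractions to avoid rounding, reads off the only possible $\lambda$ from a nonzero component of $\mathbf{x}$, and then confirms that every component agrees.

```python
from fractions import Fraction


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def eigenvalue_for(A, x):
    """Return lambda if A x = lambda x for the nonzero vector x; otherwise None."""
    if all(v == 0 for v in x):
        raise ValueError("an eigenvector must be nonzero")
    Ax = mat_vec(A, x)
    k = next(i for i, v in enumerate(x) if v != 0)  # a nonzero component fixes lambda
    lam = Fraction(Ax[k]) / Fraction(x[k])
    if all(Fraction(a) == lam * v for a, v in zip(Ax, x)):
        return lam
    return None


if __name__ == "__main__":
    print(eigenvalue_for([[3, 1], [1, 3]], [1, 1]))  # Example 4.1
```

Applied to the vector $[1, 0]$ instead, the helper returns `None`, because $A[1, 0]^T = [3, 1]^T$ is not parallel to $[1, 0]^T$.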
Example 4.2 Show that 5 is an eigenvalue of $A = \begin{bmatrix}1&2\\4&3\end{bmatrix}$ and determine all eigenvectors corresponding to this eigenvalue.

Solution We must show that there is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = 5\mathbf{x}$. But this equation is equivalent to the equation $(A - 5I)\mathbf{x} = \mathbf{0}$, so we need to compute the null space of the matrix $A - 5I$. We find that
$$A - 5I = \begin{bmatrix}1&2\\4&3\end{bmatrix} - \begin{bmatrix}5&0\\0&5\end{bmatrix} = \begin{bmatrix}-4&2\\4&-2\end{bmatrix}$$
Since the columns of this matrix are clearly linearly dependent, the Fundamental Theorem of Invertible Matrices implies that its null space is nonzero. Thus, $A\mathbf{x} = 5\mathbf{x}$ has a nontrivial solution, so 5 is an eigenvalue of $A$. We find its eigenvectors by computing the null space:
$$[A - 5I \mid \mathbf{0}] = \left[\begin{array}{rr|r}-4&2&0\\4&-2&0\end{array}\right] \longrightarrow \left[\begin{array}{rr|r}1&-\tfrac12&0\\0&0&0\end{array}\right]$$
Thus, if $\mathbf{x} = \begin{bmatrix}x_1\\x_2\end{bmatrix}$ is an eigenvector corresponding to the eigenvalue 5, it satisfies $x_1 - \tfrac12 x_2 = 0$, or $x_1 = \tfrac12 x_2$, so these eigenvectors are of the form
$$\mathbf{x} = \begin{bmatrix}\tfrac12 t\\ t\end{bmatrix}$$
That is, they are the nonzero multiples of $\begin{bmatrix}\tfrac12\\1\end{bmatrix}$ (or, equivalently, the nonzero multiples of $\begin{bmatrix}1\\2\end{bmatrix}$).

The set of all eigenvectors corresponding to an eigenvalue $\lambda$ of an $n \times n$ matrix $A$ is just the set of nonzero vectors in the null space of $A - \lambda I$. It follows that this set of eigenvectors, together with the zero vector in $\mathbb{R}^n$, is the null space of $A - \lambda I$.

Definition Let $A$ be an $n \times n$ matrix and let $\lambda$ be an eigenvalue of $A$. The collection of all eigenvectors corresponding to $\lambda$, together with the zero vector, is called the eigenspace of $\lambda$ and is denoted by $E_\lambda$.

Therefore, in Example 4.2, $E_5 = \left\{ t\begin{bmatrix}1\\2\end{bmatrix} \right\}$.

Example 4.3 Show that $\lambda = 6$ is an eigenvalue of $A = \begin{bmatrix}7&1&-2\\-3&3&6\\2&2&2\end{bmatrix}$ and find a basis for its eigenspace.

Solution As in Example 4.2, we compute the null space of $A - 6I$. Row reduction produces
$$A - 6I = \begin{bmatrix}1&1&-2\\-3&-3&6\\2&2&-4\end{bmatrix} \longrightarrow \begin{bmatrix}1&1&-2\\0&0&0\\0&0&0\end{bmatrix}$$
from which we see that the null space of $A - 6I$ is nonzero. Hence, 6 is an eigenvalue of $A$, and the eigenvectors corresponding to this eigenvalue satisfy $x_1 + x_2 - 2x_3 = 0$, or $x_1 = -x_2 + 2x_3$. It follows that
$$E_6 = \left\{ \begin{bmatrix}-x_2 + 2x_3\\x_2\\x_3\end{bmatrix} \right\} = \left\{ x_2\begin{bmatrix}-1\\1\\0\end{bmatrix} + x_3\begin{bmatrix}2\\0\\1\end{bmatrix} \right\} = \operatorname{span}\left(\begin{bmatrix}-1\\1\\0\end{bmatrix}, \begin{bmatrix}2\\0\\1\end{bmatrix}\right)$$
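Examples 4.2 and 4.3 both find an eigenspace as the null space of $A - \lambda I$. A minimal Python sketch of that computation follows; it is our own illustration (the names `rref` and `eigenspace_basis` are hypothetical). It row reduces $A - \lambda I$ with exact fractions and builds one basis vector per free variable, just as in the hand calculations. An empty result means the null space is trivial, so $\lambda$ is not an eigenvalue.

```python
from fractions import Fraction


def rref(M):
    """Reduced row echelon form with exact Fractions; returns (R, pivot_columns)."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: free variable
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [v / piv for v in R[r]]    # make the pivot 1
        for i in range(rows):             # clear the rest of the column
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def eigenspace_basis(A, lam):
    """Basis of E_lambda = null(A - lambda I), one vector per free variable."""
    n = len(A)
    M = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    R, pivots = rref(M)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                # set this free variable to 1, others to 0
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][f]            # solve for the pivot variables
        basis.append(v)
    return basis


if __name__ == "__main__":
    print(eigenspace_basis([[1, 2], [4, 3]], 5))                      # Example 4.2
    print(eigenspace_basis([[7, 1, -2], [-3, 3, 6], [2, 2, 2]], 6))   # Example 4.3
```

For Example 4.3 the sketch returns the vectors $[-1, 1, 0]$ and $[2, 0, 1]$; for Example 4.2 it returns $[\tfrac12, 1]$, a multiple of $[1, 2]$. Exact arithmetic matters here: with floating-point entries, a rounding error could make a zero pivot look nonzero and hide the eigenspace.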
In $\mathbb{R}^2$, we can give a geometric interpretation of the notion of an eigenvector. The equation $A\mathbf{x} = \lambda\mathbf{x}$ says that the vectors $A\mathbf{x}$ and $\mathbf{x}$ are parallel. Thus, $\mathbf{x}$ is an eigenvector of $A$ if and only if $A$ transforms $\mathbf{x}$ into a parallel vector [or, equivalently, if and only if $T_A(\mathbf{x})$ is parallel to $\mathbf{x}$, where $T_A$ is the matrix transformation corresponding to $A$].

Example 4.4 Find the eigenvectors and eigenvalues of $A = \begin{bmatrix}1&0\\0&-1\end{bmatrix}$ geometrically.

Solution We recognize that $A$ is the matrix of a reflection $F$ in the $x$-axis (see Example 3.56). The only vectors that $F$ maps parallel to themselves are vectors parallel to the $y$-axis (i.e., multiples of $\begin{bmatrix}0\\1\end{bmatrix}$), which are reversed (eigenvalue $-1$), and vectors parallel to the $x$-axis (i.e., multiples of $\begin{bmatrix}1\\0\end{bmatrix}$), which are sent to themselves (eigenvalue 1) (see Figure 4.5). Accordingly, $\lambda = -1$ and $\lambda = 1$ are the eigenvalues of $A$, and the corresponding eigenspaces are
$$E_{-1} = \operatorname{span}\left(\begin{bmatrix}0\\1\end{bmatrix}\right) \quad\text{and}\quad E_1 = \operatorname{span}\left(\begin{bmatrix}1\\0\end{bmatrix}\right)$$

Figure 4.5 The eigenvectors of a reflection

Another way to think of eigenvectors geometrically is to draw $\mathbf{x}$ and $A\mathbf{x}$ head-to-tail. Then $\mathbf{x}$ will be an eigenvector of $A$ if and only if $\mathbf{x}$ and $A\mathbf{x}$ are aligned in a straight line. In Figure 4.6, $\mathbf{x}$ is an eigenvector of $A$ but $\mathbf{y}$ is not.

Figure 4.6

If $\mathbf{x}$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, then so is any nonzero multiple of $\mathbf{x}$. So, if we want to search for eigenvectors geometrically, we need only consider the effect of $A$ on unit vectors. Figure 4.7(a) shows what happens when we transform unit vectors with the matrix $A = \begin{bmatrix}3&1\\1&3\end{bmatrix}$ of Example 4.1 and display the results head-to-tail, as in Figure 4.6. We can see that the vector $\mathbf{x} = \begin{bmatrix}1/\sqrt{2}\\1/\sqrt{2}\end{bmatrix}$ is an eigenvector, but we also notice that there appears to be an eigenvector in the second quadrant. Indeed, this is the case, and it turns out to be the vector $\begin{bmatrix}-1/\sqrt{2}\\1/\sqrt{2}\end{bmatrix}$.

The discussion is based on the article "Eigenpictures: Picturing the Eigenvector Problem" by Steven Schonefeld in The College Mathematics Journal 26 (1996), pp. 316–319.

Figure 4.7 (a) and (b)
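The eigenpicture idea can be turned into a simple numerical search: scan unit vectors $\mathbf{x} = (\cos t, \sin t)$ and look for directions in which $\mathbf{x}$ and $A\mathbf{x}$ line up, that is, where the $2 \times 2$ determinant of the matrix with columns $\mathbf{x}$ and $A\mathbf{x}$ vanishes. The Python sketch below is our own approximation (the sampling resolution and the name `eigen_directions` are arbitrary choices). It only locates directions, and only for $2 \times 2$ matrices; it is not a substitute for the algebraic method developed in the text.

```python
import math


def eigen_directions(A, samples=3600):
    """Angles in degrees, in [0, 180), of unit vectors x with A x parallel to x.

    x and A x are parallel exactly when det[x | A x] = x1*(Ax)2 - x2*(Ax)1 = 0.
    Only half a turn is scanned, because x and -x give the same direction.
    Roots are located by sign changes plus linear interpolation.
    """
    def f(t):
        x1, x2 = math.cos(t), math.sin(t)
        y1 = A[0][0] * x1 + A[0][1] * x2
        y2 = A[1][0] * x1 + A[1][1] * x2
        return x1 * y2 - x2 * y1

    roots = []
    for k in range(samples):
        t0, t1 = k * math.pi / samples, (k + 1) * math.pi / samples
        f0, f1 = f(t0), f(t1)
        if f0 == 0:
            roots.append(math.degrees(t0))
        elif f0 * f1 < 0:
            roots.append(math.degrees(t0 + (t1 - t0) * f0 / (f0 - f1)))
    return roots


if __name__ == "__main__":
    print(eigen_directions([[3, 1], [1, 3]]))    # matrix of Example 4.1
    print(eigen_directions([[0, -1], [1, 0]]))   # a 90-degree rotation
```

For the matrix of Example 4.1 the search reports directions near $45^\circ$ and $135^\circ$, matching the two eigenvectors visible in Figure 4.7(a); for a $90^\circ$ rotation it reports none. A direction where the determinant touches zero without changing sign would be missed by this sign-change test, and a matrix such as $I$, for which every direction works, would report every sample.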
In Figure 4.7(b), we see what happens when we use the matrix $A = [\;\dots\;]$. There are no eigenvectors at all!

We now know how to find eigenvectors once we have the corresponding eigenvalues, and we have a geometric interpretation of them, but one question remains: How do we first find the eigenvalues of a given matrix? The key is the observation that $\lambda$ is an eigenvalue of $A$ if and only if the null space of $A - \lambda I$ is nontrivial.

Recall from Section 3.3 that the determinant of a $2 \times 2$ matrix $A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$ is the expression $\det A = ad - bc$, and $A$ is invertible if and only if $\det A$ is nonzero. Furthermore, the Fundamental Theorem of Invertible Matrices guarantees that a matrix has a nontrivial null space if and only if it is noninvertible, hence, if and only if its determinant is zero. Putting these facts together, we see that (for $2 \times 2$ matrices at least) $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$. This fact characterizes eigenvalues, and we will soon generalize it to square matrices of arbitrary size. For the moment, though, let's see how to use it with $2 \times 2$ matrices.

Example 4.5 Find all of the eigenvalues and corresponding eigenvectors of the matrix $A = \begin{bmatrix}3&1\\1&3\end{bmatrix}$ from Example 4.1.

Solution The preceding remarks show that we must find all solutions $\lambda$ of the equation $\det(A - \lambda I) = 0$. Since
$$\det(A - \lambda I) = \det\begin{bmatrix}3-\lambda&1\\1&3-\lambda\end{bmatrix} = (3-\lambda)(3-\lambda) - 1 = \lambda^2 - 6\lambda + 8$$
we need to solve the quadratic equation $\lambda^2 - 6\lambda + 8 = 0$. The solutions to this equation are easily found to be $\lambda = 4$ and $\lambda = 2$. These are therefore the eigenvalues of $A$.

To find the eigenvectors corresponding to the eigenvalue $\lambda = 4$, we compute the null space of $A - 4I$. We find
$$[A - 4I \mid \mathbf{0}] = \left[\begin{array}{rr|r}-1&1&0\\1&-1&0\end{array}\right] \longrightarrow \left[\begin{array}{rr|r}1&-1&0\\0&0&0\end{array}\right]$$
from which it follows that $\mathbf{x} = \begin{bmatrix}x_1\\x_2\end{bmatrix}$ is an eigenvector corresponding to $\lambda = 4$ if and only if $x_1 - x_2 = 0$ or $x_1 = x_2$. Hence, the eigenspace
$$E_4 = \left\{\begin{bmatrix}x_2\\x_2\end{bmatrix}\right\} = \left\{x_2\begin{bmatrix}1\\1\end{bmatrix}\right\} = \operatorname{span}\left(\begin{bmatrix}1\\1\end{bmatrix}\right)$$
Similarly, for $\lambda = 2$, we have
$$[A - 2I \mid \mathbf{0}] = \left[\begin{array}{rr|r}1&1&0\\1&1&0\end{array}\right] \longrightarrow \left[\begin{array}{rr|r}1&1&0\\0&0&0\end{array}\right]$$
so $\mathbf{y} = \begin{bmatrix}y_1\\y_2\end{bmatrix}$ is an eigenvector corresponding to $\lambda = 2$ if and only if $y_1 + y_2 = 0$ or $y_1 = -y_2$. Thus, the eigenspace
$$E_2 = \left\{\begin{bmatrix}-y_2\\y_2\end{bmatrix}\right\} = \left\{y_2\begin{bmatrix}-1\\1\end{bmatrix}\right\} = \operatorname{span}\left(\begin{bmatrix}-1\\1\end{bmatrix}\right)$$

Figure 4.8 shows graphically how the eigenvectors of $A$ are transformed when multiplied by $A$: an eigenvector $\mathbf{x}$ in the eigenspace $E_4$ is transformed into $4\mathbf{x}$, and an eigenvector $\mathbf{y}$ in the eigenspace $E_2$ is transformed into $2\mathbf{y}$. As Figure 4.7(a) shows, the eigenvectors of $A$ are the only vectors in $\mathbb{R}^2$ that are transformed into scalar multiples of themselves when multiplied by $A$.

Figure 4.8 How $A$ transforms eigenvectors

Remark You will recall that a polynomial equation with real coefficients (such as the quadratic equation in Example 4.5) need not have real roots; it may have complex roots. (See Appendix C.) It is also possible to compute eigenvalues and eigenvectors when the entries of a matrix come from $\mathbb{Z}_p$, where $p$ is prime. Thus, it is important to specify the setting we intend to work in before we set out to compute the eigenvalues of a matrix. However, unless otherwise specified, the eigenvalues of a matrix whose entries are real numbers will be assumed to be real as well.

Example 4.6 Interpret the matrix in Example 4.5 as a matrix over $\mathbb{Z}_3$ and find its eigenvalues in that field.

Solution The solution proceeds exactly as above, except we work modulo 3. Hence, the quadratic equation $\lambda^2 - 6\lambda + 8 = 0$ becomes $\lambda^2 + 2 = 0$. This equation is the same as $\lambda^2 = -2 = 1$, giving $\lambda = 1$ and $\lambda = -1 = 2$ as the eigenvalues in $\mathbb{Z}_3$. (Check that the same answer would be obtained by first reducing $A$ modulo 3 to obtain $\begin{bmatrix}0&1\\1&0\end{bmatrix}$ and then working with this matrix.)

Example 4.7 Find the eigenvalues of $A = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$ (a) over $\mathbb{R}$ and (b) over the complex numbers $\mathbb{C}$.

Solution We must solve the equation
$$0 = \det(A - \lambda I) = \det\begin{bmatrix}-\lambda&-1\\1&-\lambda\end{bmatrix} = \lambda^2 + 1$$
(a) Over $\mathbb{R}$, there are no solutions, so $A$ has no real eigenvalues.
(b) Over $\mathbb{C}$, the solutions are $\lambda = i$ and $\lambda = -i$. (See Appendix C.)

In the next section, we will extend the notion of determinant from $2 \times 2$ to $n \times n$ matrices, which in turn will allow us to find the eigenvalues of arbitrary square matrices. (In fact, this isn't quite true, but we will at least be able to find a polynomial equation that the eigenvalues of a given matrix must satisfy.)

Exercises 4.1

In Exercises 1–6, show that $\mathbf{v}$ is an eigenvector of $A$ and find the corresponding eigenvalue.
1.–6. $A = [\;\dots\;]$, $\mathbf{v} = [\;\dots\;]$

In Exercises 7–12, show that $\lambda$ is an eigenvalue of $A$ and find one eigenvector corresponding to this eigenvalue.
7.–12. $A = [\;\dots\;]$, $\lambda = \dots$

In Exercises 13–18, find the eigenvalues and eigenvectors of $A$ geometrically.
13. $A = \begin{bmatrix}-1&0\\0&1\end{bmatrix}$ (reflection in the $y$-axis)
14. $A = \begin{bmatrix}0&1\\1&0\end{bmatrix}$ (reflection in the line $y = x$)
15. $A = \begin{bmatrix}1&0\\0&0\end{bmatrix}$ (projection onto the $x$-axis)
16. $A = [\;\dots\;]$ (projection onto the line through the origin with direction vector $[\;\dots\;]$)
17. $A = \begin{bmatrix}2&0\\0&3\end{bmatrix}$ (stretching by a factor of 2 horizontally and a factor of 3 vertically)
18. $A = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$ (counterclockwise rotation of $90^\circ$ about the origin)

In Exercises 19–22, the unit vectors $\mathbf{x}$ in $\mathbb{R}^2$ and their images $A\mathbf{x}$ under the action of a $2 \times 2$ matrix $A$ are drawn head-to-tail, as in Figure 4.7. Estimate the eigenvectors and eigenvalues of $A$ from each "eigenpicture."
19.–22. [Figures: eigenpictures]

In Exercises 23–26, use the method of Example 4.5 to find all of the eigenvalues of the matrix $A$. Give bases for each of the corresponding eigenspaces. Illustrate the eigenspaces and the effect of multiplying eigenvectors by $A$ as in Figure 4.8.
23.–26. $A = [\;\dots\;]$

In Exercises 27–30, find all of the eigenvalues of the matrix $A$ over the complex numbers $\mathbb{C}$. Give bases for each of the corresponding eigenspaces.
27.–30. $A = [\;\dots\;]$

In Exercises 31–34, find all of the eigenvalues of the matrix $A$ over the indicated $\mathbb{Z}_p$.
31. $A = [\;\dots\;]$ over $\mathbb{Z}_3$
32. $A = [\;\dots\;]$ over $\mathbb{Z}_3$
33. $A = [\;\dots\;]$ over $\mathbb{Z}_5$
34. $A = [\;\dots\;]$ over $\mathbb{Z}_5$
35. (a) Show that the eigenvalues of the $2 \times 2$ matrix $A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$ are the solutions of the quadratic equation $\lambda^2 - \operatorname{tr}(A)\lambda + \det A = 0$, where $\operatorname{tr}(A)$ is the trace of $A$. (See page 162.)
(b) Show that the eigenvalues of the matrix $A$ in part (a) are
$$\lambda = \tfrac12\left(a + d \pm \sqrt{(a - d)^2 + 4bc}\right)$$
(c) Show that the trace and determinant of the matrix $A$ in part (a) are given by $\operatorname{tr}(A) = \lambda_1 + \lambda_2$ and $\det A = \lambda_1\lambda_2$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$.
36. Consider again the matrix $A$ in Exercise 35. Give conditions on $a$, $b$, $c$, and $d$ such that $A$ has (a) two distinct real eigenvalues, (b) one real eigenvalue, and (c) no real eigenvalues.
37. Show that the eigenvalues of the upper triangular matrix $A = \begin{bmatrix}a&b\\0&d\end{bmatrix}$ are $\lambda = a$ and $\lambda = d$, and find the corresponding eigenspaces.
38. Let $a$ and $b$ be real numbers. Find the eigenvalues and corresponding eigenspaces of $A = [\;\dots\;]$ over the complex numbers.

4.2 Determinants

Historically, determinants preceded matrices, a curious fact in light of the way linear algebra is taught today, with matrices before determinants. Nevertheless, determinants arose independently of matrices in the solution of many practical problems, and the theory of determinants was well developed almost two centuries before matrices were deemed worthy of study in and of themselves. A snapshot of the history of determinants is presented at the end of this section.
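Recall that the determinant of the $2 \times 2$ matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is
$$\det A = a_{11}a_{22} - a_{12}a_{21}$$
We first encountered this expression when we determined ways to compute the inverse of a matrix. In particular, we found that
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$
The determinant of a matrix $A$ is sometimes also denoted by $|A|$, so for the $2 \times 2$ matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ we may also write
$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

Warning This notation for the determinant is reminiscent of absolute value notation. It is easy to mistake $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$, the notation for determinant, for $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, the notation for the matrix itself. Do not confuse these. Fortunately, it will usually be clear from the context which is intended.

We define the determinant of a $1 \times 1$ matrix $A = [a]$ to be
$$\det A = |a| = a$$
(Note that we really have to be careful with notation here: $|a|$ does not denote the absolute value of $a$ in this case.) How then should we define the determinant of a $3 \times 3$ matrix? If you ask your CAS for the inverse of
$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$
the answer will be equivalent to
$$A^{-1} = \frac{1}{\Delta}\begin{bmatrix} ei - fh & ch - bi & bf - ce \\ fg - di & ai - cg & cd - af \\ dh - eg & bg - ah & ae - bd \end{bmatrix}$$
where $\Delta = aei - afh - bdi + bfg + cdh - ceg$. Observe that
$$\begin{aligned} \Delta &= aei - afh - bdi + bfg + cdh - ceg \\ &= a(ei - fh) - b(di - fg) + c(dh - eg) \\ &= a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix} \end{aligned}$$
and that each of the entries in the matrix portion of $A^{-1}$ appears to be the determinant of a $2 \times 2$ submatrix of $A$. In fact, this is true, and it is the basis of the definition of the determinant of a $3 \times 3$ matrix. The definition is recursive in the sense that the determinant of a $3 \times 3$ matrix is defined in terms of determinants of $2 \times 2$ matrices.

Definition Let $A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$. Then the determinant of $A$ is the scalar
$$\det A = |A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \qquad (1)$$

Notice that each of the $2 \times 2$ determinants is obtained by deleting the row and column of $A$ that contain the entry the determinant is being multiplied by.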
For example, the first summand is $a_{11}$ multiplied by the determinant of the submatrix obtained by deleting row 1 and column 1. Notice also that the plus and minus signs alternate in Equation (1). If we denote by $A_{ij}$ the submatrix of a matrix $A$ obtained by deleting row $i$ and column $j$, then we may abbreviate Equation (1) as
$$\det A = a_{11}\det A_{11} - a_{12}\det A_{12} + a_{13}\det A_{13} = \sum_{j=1}^{3} (-1)^{1+j}a_{1j}\det A_{1j}$$
For any square matrix $A$, $\det A_{ij}$ is called the $(i, j)$-minor of $A$.

Example 4.8 Compute the determinant of
$$A = \begin{bmatrix} 5 & -3 & 2 \\ 1 & 0 & 2 \\ 2 & -1 & 3 \end{bmatrix}$$

Solution We compute
$$\begin{aligned} \det A &= 5\begin{vmatrix} 0 & 2 \\ -1 & 3 \end{vmatrix} - (-3)\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} + 2\begin{vmatrix} 1 & 0 \\ 2 & -1 \end{vmatrix} \\ &= 5(0 - (-2)) + 3(3 - 4) + 2(-1 - 0) \\ &= 5(2) + 3(-1) + 2(-1) = 5 \end{aligned}$$

With a little practice, you should find that you can easily work out $2 \times 2$ determinants in your head. Writing out the second line in the above solution is then unnecessary.

Another method for calculating the determinant of a $3 \times 3$ matrix is analogous to the method for calculating the determinant of a $2 \times 2$ matrix. Copy the first two columns of $A$ to the right of the matrix and take the products of the elements on the six diagonals of the resulting array. Attach plus signs to the products from the downward-sloping diagonals and attach minus signs to the products from the upward-sloping diagonals.
$$\begin{array}{ccc|cc} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} \end{array} \qquad (2)$$
This method gives
$$a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{31}a_{22}a_{13} - a_{32}a_{23}a_{11} - a_{33}a_{21}a_{12}$$
In Exercise 19, you are asked to check that this result agrees with that from Equation (1) for a $3 \times 3$ determinant.

Example 4.9 Calculate the determinant of the matrix in Example 4.8 using the method shown in (2).

Solution We adjoin to $A$ its first two columns and compute the six indicated products:
$$\begin{array}{ccc|cc} 5 & -3 & 2 & 5 & -3 \\ 1 & 0 & 2 & 1 & 0 \\ 2 & -1 & 3 & 2 & -1 \end{array}$$
The downward-sloping diagonals give the products $0$, $-12$, and $-2$; the upward-sloping diagonals give $0$, $-10$, and $-9$. Adding the three downward products and subtracting the three upward products gives
$$\det A = 0 + (-12) + (-2) - 0 - (-10) - (-9) = 5$$
as before.

Warning We are about to define determinants for arbitrary square matrices. However, there is no analogue of the method in Example 4.9 for larger matrices. It is valid only for $3 \times 3$ matrices.
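Both $3 \times 3$ methods can be compared directly in code. This is a minimal Python sketch, assuming matrices are stored as lists of rows with 0-based indices, so the sign $(-1)^{1+j}$ of Equation (1) becomes `(-1) ** j`; the function names are hypothetical. It evaluates the matrix of Example 4.8 both ways.

```python
def det2(m):
    """2x2 determinant: a11*a22 - a12*a21."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3_cofactor(a):
    """Equation (1): cofactor expansion of a 3x3 determinant along row 1."""
    total = 0
    for j in range(3):
        # Minor A_1j: delete the first row and column j.
        minor = [[a[r][c] for c in range(3) if c != j] for r in (1, 2)]
        total += (-1) ** j * a[0][j] * det2(minor)
    return total


def det3_diagonals(a):
    """Method (2): downward diagonal products minus upward ones (3x3 only)."""
    # Wrapping column indices mod 3 plays the role of copying the first two columns.
    down = sum(a[0][j] * a[1][(j + 1) % 3] * a[2][(j + 2) % 3] for j in range(3))
    up = sum(a[2][j] * a[1][(j + 1) % 3] * a[0][(j + 2) % 3] for j in range(3))
    return down - up


A = [[5, -3, 2],
     [1, 0, 2],
     [2, -1, 3]]                              # the matrix of Example 4.8
print(det3_cofactor(A), det3_diagonals(A))    # 5 5
```

The wrap-around indexing in `det3_diagonals` only works because a $3 \times 3$ array has exactly three diagonals in each direction, which is why the method does not generalize.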
Determinants of $n \times n$ Matrices

The definition of the determinant of a $3 \times 3$ matrix extends naturally to arbitrary square matrices.

Definition Let $A = [a_{ij}]$ be an $n \times n$ matrix, where $n \ge 2$. Then the determinant of $A$ is the scalar
$$\det A = |A| = a_{11}\det A_{11} - a_{12}\det A_{12} + \cdots + (-1)^{1+n}a_{1n}\det A_{1n} = \sum_{j=1}^{n} (-1)^{1+j}a_{1j}\det A_{1j} \qquad (3)$$

It is convenient to combine a minor with its plus or minus sign. To this end, we define the $(i, j)$-cofactor of $A$ to be
$$C_{ij} = (-1)^{i+j}\det A_{ij}$$
With this notation, definition (3) becomes
$$\det A = \sum_{j=1}^{n} a_{1j}C_{1j} \qquad (4)$$
Exercise 20 asks you to check that this definition correctly gives the formula for the determinant of a $2 \times 2$ matrix when $n = 2$.

Definition (4) is often referred to as cofactor expansion along the first row. It is an amazing fact that we get exactly the same result by expanding along any row (or even any column)! We summarize this fact as a theorem but defer the proof until the end of this section (since it is somewhat lengthy and would interrupt our discussion if we were to present it here).

Theorem 4.1 The Laplace Expansion Theorem
The determinant of an $n \times n$ matrix $A = [a_{ij}]$, where $n \ge 2$, can be computed as
$$\det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} = \sum_{j=1}^{n} a_{ij}C_{ij} \qquad (5)$$
(which is the cofactor expansion along the $i$th row) and also as
$$\det A = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} = \sum_{i=1}^{n} a_{ij}C_{ij} \qquad (6)$$
(the cofactor expansion along the $j$th column).

Since $C_{ij} = (-1)^{i+j}\det A_{ij}$, each cofactor is plus or minus the corresponding minor, with the correct sign given by the term $(-1)^{i+j}$. A quick way to determine whether the sign is $+$ or $-$ is to remember that the signs form a "checkerboard" pattern:
$$\begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$

Example 4.10 Compute the determinant of the matrix
$$A = \begin{bmatrix} 5 & -3 & 2 \\ 1 & 0 & 2 \\ 2 & -1 & 3 \end{bmatrix}$$
by (a) cofactor expansion along the third row and (b) cofactor expansion along the second column.
Solution (a) We compute
$$\begin{aligned} \det A &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} \\ &= 2\begin{vmatrix} -3 & 2 \\ 0 & 2 \end{vmatrix} - (-1)\begin{vmatrix} 5 & 2 \\ 1 & 2 \end{vmatrix} + 3\begin{vmatrix} 5 & -3 \\ 1 & 0 \end{vmatrix} \\ &= 2(-6) + 8 + 3(3) = 5 \end{aligned}$$
(b) In this case, we have
$$\begin{aligned} \det A &= a_{12}C_{12} + a_{22}C_{22} + a_{32}C_{32} \\ &= -(-3)\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} + 0\begin{vmatrix} 5 & 2 \\ 2 & 3 \end{vmatrix} - (-1)\begin{vmatrix} 5 & 2 \\ 1 & 2 \end{vmatrix} \\ &= 3(-1) + 0 + 8 = 5 \end{aligned}$$

Pierre Simon Laplace (1749–1827) was born in Normandy, France, and was expected to become a clergyman until his mathematical talents were noticed at school. He made many important contributions to calculus, probability, and astronomy. He was an examiner of the young Napoleon Bonaparte at the Royal Artillery Corps and later, when Napoleon was in power, served briefly as Minister of the Interior and then Chancellor of the Senate. Laplace was granted the title of Count of the Empire in 1806 and received the title of Marquis de Laplace in 1817.

Notice that in part (b) of Example 4.10 we needed to do fewer calculations than in part (a) because we were expanding along a column that contained a zero entry, namely, $a_{22}$; therefore, we did not need to compute $C_{22}$. It follows that the Laplace Expansion Theorem is most useful when the matrix contains a row or column with lots of zeros, since, by choosing to expand along that row or column, we minimize the number of cofactors we need to compute.

Example 4.11 Compute the determinant of $A = [\;\dots\;]$.

Solution First, notice that column 3 has only one nonzero entry; we should therefore expand along this column. Next, note that the $+/-$ pattern assigns a minus sign to the entry $a_{23} = 2$. Thus, we have
$$\begin{aligned} \det A &= a_{13}C_{13} + a_{23}C_{23} + a_{33}C_{33} + a_{43}C_{43} \\ &= 0(C_{13}) + 2C_{23} + 0(C_{33}) + 0(C_{43}) \\ &= -2\begin{vmatrix} 2 & -3 & 1 \\ 1 & -1 & 3 \\ -2 & 1 & 0 \end{vmatrix} \end{aligned}$$
We now continue by expanding along the third row of the determinant above (the third column would also be a good choice) to get
$$\begin{aligned} \det A &= -2\left(-2\begin{vmatrix} -3 & 1 \\ -1 & 3 \end{vmatrix} - \begin{vmatrix} 2 & 1 \\ 1 & 3 \end{vmatrix}\right) \\ &= -2(-2(-8) - 5) = -2(11) = -22 \end{aligned}$$
(Note that the $+/-$ pattern for the $3 \times 3$ minor is not that of the original matrix but that of a $3 \times 3$ matrix in general.)
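The Laplace Expansion Theorem translates into a short recursive function. The sketch below (plain Python, 0-based indices, hypothetical function names) defines the determinant by first-row expansion as in Definition (4), then expands along an arbitrary row or column as in Equations (5) and (6), skipping zero entries just as Examples 4.10 and 4.11 do. Because it recomputes minors recursively, its cost grows like $n!$, so it is meant only for small matrices.

```python
def minor(a, i, j):
    """Submatrix A_ij: delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def cofactor(a, i, j):
    """C_ij = (-1)^(i+j) det A_ij; 0-based i + j has the same parity as 1-based."""
    return (-1) ** (i + j) * det(minor(a, i, j))


def det(a):
    """Definition (4): cofactor expansion along the first row; det [a] = a."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(len(a)))


def expand_row(a, i):
    """Equation (5): expansion along row i, skipping zero entries."""
    return sum(a[i][j] * cofactor(a, i, j) for j in range(len(a)) if a[i][j] != 0)


def expand_col(a, j):
    """Equation (6): expansion along column j, skipping zero entries."""
    return sum(a[i][j] * cofactor(a, i, j) for i in range(len(a)) if a[i][j] != 0)


A = [[5, -3, 2], [1, 0, 2], [2, -1, 3]]      # Example 4.10
print(expand_row(A, 2), expand_col(A, 1))   # third row, second column: 5 5

M = [[2, -3, 1], [1, -1, 3], [-2, 1, 0]]     # the 3x3 minor in Example 4.11
print(-2 * expand_row(M, 2))                # -22
```

Whichever row or column is chosen, the value is the same; the choice only changes how many cofactors must actually be computed.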
The Laplace expansion is particularly useful when the matrix is (upper or lower) triangular.

Example 4.12 Compute the determinant of
$$A = \begin{bmatrix} 2 & -3 & 1 & 0 & 4 \\ 0 & 3 & 2 & 5 & 7 \\ 0 & 0 & 1 & 6 & 0 \\ 0 & 0 & 0 & 5 & 2 \\ 0 & 0 & 0 & 0 & -1 \end{bmatrix}$$

Solution We expand along the first column to get
$$\det A = 2\begin{vmatrix} 3 & 2 & 5 & 7 \\ 0 & 1 & 6 & 0 \\ 0 & 0 & 5 & 2 \\ 0 & 0 & 0 & -1 \end{vmatrix}$$
(We have omitted all cofactors corresponding to zero entries.) Now we expand along the first column again:
$$\det A = 2 \cdot 3\begin{vmatrix} 1 & 6 & 0 \\ 0 & 5 & 2 \\ 0 & 0 & -1 \end{vmatrix}$$
Continuing to expand along the first column, we complete the calculation:
$$\det A = 2 \cdot 3 \cdot 1\begin{vmatrix} 5 & 2 \\ 0 & -1 \end{vmatrix} = 2 \cdot 3 \cdot 1 \cdot (5(-1) - 2 \cdot 0) = 2 \cdot 3 \cdot 1 \cdot 5 \cdot (-1) = -30$$

Example 4.12 should convince you that the determinant of a triangular matrix is the product of its diagonal entries. You are asked to give a proof of this fact in Exercise 21. We record the result as a theorem.

Theorem 4.2 The determinant of a triangular matrix is the product of the entries on its main diagonal. Specifically, if $A = [a_{ij}]$ is an $n \times n$ triangular matrix, then
$$\det A = a_{11}a_{22}\cdots a_{nn}$$

Note In general (that is, unless the matrix is triangular or has some other special form), computing a determinant by cofactor expansion is not efficient. For example, the determinant of a $3 \times 3$ matrix has $6 = 3!$ summands, each requiring two multiplications, and then five additions and subtractions are needed to finish off the calculations. For an $n \times n$ matrix, there will be $n!$ summands, each with $n - 1$ multiplications, and then $n! - 1$ additions and subtractions. The total number of operations is thus
$$T(n) = (n - 1)n! + n! - 1 > n!$$

Even the fastest of supercomputers cannot calculate the determinant of a moderately large matrix using cofactor expansion. To illustrate: Suppose we needed to calculate a $50 \times 50$ determinant. (Matrices much larger than $50 \times 50$ are used to store the data from digital images such as those transmitted over the Internet or taken by a digital camera.)
To calculate the determinant directly would require, in general, more than $50!$ operations, and $50! \approx 3 \times 10^{64}$. If we had a computer that could perform a trillion ($10^{12}$) operations per second, it would take approximately $3 \times 10^{52}$ seconds, or almost $10^{45}$ years, to finish the calculations. To put this in perspective, consider that astronomers estimate the age of the universe to be at least 10 billion ($10^{10}$) years. Thus, on even a very fast supercomputer, calculating a $50 \times 50$ determinant by cofactor expansion would take more than $10^{30}$ times the age of the universe!

Fortunately, there are better methods, and we now turn to developing more computationally effective means of finding determinants. First, we need to look at some of the properties of determinants.

Properties of Determinants

The most efficient way to compute determinants is to use row reduction. However, not every elementary row operation leaves the determinant of a matrix unchanged. The next theorem summarizes the main properties you need to understand in order to use row reduction effectively.

Theorem 4.3 Let $A = [a_{ij}]$ be a square matrix.
a. If $A$ has a zero row (column), then $\det A = 0$.
b. If $B$ is obtained by interchanging two rows (columns) of $A$, then $\det B = -\det A$.
c. If $A$ has two identical rows (columns), then $\det A = 0$.
d. If $B$ is obtained by multiplying a row (column) of $A$ by $k$, then $\det B = k\det A$.
e. If $A$, $B$, and $C$ are identical except that the $i$th row (column) of $C$ is the sum of the $i$th rows (columns) of $A$ and $B$, then $\det C = \det A + \det B$.
f. If $B$ is obtained by adding a multiple of one row (column) of $A$ to another row (column), then $\det B = \det A$.

Proof We will prove (b) as Lemma 4.14 at the end of this section. The proofs of properties (a) and (f) are left as exercises. We will prove the remaining properties in terms of rows; the corresponding proofs for columns are analogous.
(c) If $A$ has two identical rows, swap them to obtain the matrix $B$. Clearly, $B = A$, so $\det B = \det A$. On the other hand, by (b), $\det B = -\det A$. Therefore, $\det A = -\det A$, so $\det A = 0$.

(d) Suppose row $i$ of $A$ is multiplied by $k$ to produce $B$; that is, $b_{ij} = ka_{ij}$ for $j = 1, \ldots, n$. Since the cofactors $C_{ij}$ of the elements in the $i$th rows of $A$ and $B$ are identical (why?), expanding along the $i$th row of $B$ gives
$$\det B = \sum_{j=1}^{n} b_{ij}C_{ij} = \sum_{j=1}^{n} ka_{ij}C_{ij} = k\sum_{j=1}^{n} a_{ij}C_{ij} = k\det A$$

(e) As in (d), the cofactors $C_{ij}$ of the elements in the $i$th rows of $A$, $B$, and $C$ are identical. Moreover, $c_{ij} = a_{ij} + b_{ij}$ for $j = 1, \ldots, n$. We expand along the $i$th row of $C$ to obtain
$$\det C = \sum_{j=1}^{n} c_{ij}C_{ij} = \sum_{j=1}^{n} (a_{ij} + b_{ij})C_{ij} = \sum_{j=1}^{n} a_{ij}C_{ij} + \sum_{j=1}^{n} b_{ij}C_{ij} = \det A + \det B$$

Notice that properties (b), (d), and (f) are related to elementary row operations. Since the echelon form of a square matrix is necessarily upper triangular, we can combine these properties with Theorem 4.2 to calculate determinants efficiently. (See Exploration: Counting Operations in Chapter 2, which shows that row reduction of an $n \times n$ matrix uses on the order of $n^3$ operations, far fewer than the $n!$ needed for cofactor expansion.) The next examples illustrate the computation of determinants using row reduction.

Example 4.13 Compute $\det A$ if
(a) $A = \begin{bmatrix} 2 & 3 & -1 \\ 0 & 5 & 3 \\ -4 & -6 & 2 \end{bmatrix}$  (b) $A = \begin{bmatrix} 0 & 2 & -4 & 5 \\ 3 & 0 & -3 & 6 \\ 2 & 4 & 5 & 7 \\ 5 & -1 & -3 & 1 \end{bmatrix}$

Solution (a) Using property (f) and then property (a), we have
$$\det A = \begin{vmatrix} 2 & 3 & -1 \\ 0 & 5 & 3 \\ -4 & -6 & 2 \end{vmatrix} \overset{R_3 + 2R_1}{=} \begin{vmatrix} 2 & 3 & -1 \\ 0 & 5 & 3 \\ 0 & 0 & 0 \end{vmatrix} = 0$$
(b) We reduce $A$ to echelon form as follows (there are other possible ways to do this):
$$\begin{aligned}
\det A &= \begin{vmatrix} 0 & 2 & -4 & 5 \\ 3 & 0 & -3 & 6 \\ 2 & 4 & 5 & 7 \\ 5 & -1 & -3 & 1 \end{vmatrix}
\overset{R_1 \leftrightarrow R_2}{=} -\begin{vmatrix} 3 & 0 & -3 & 6 \\ 0 & 2 & -4 & 5 \\ 2 & 4 & 5 & 7 \\ 5 & -1 & -3 & 1 \end{vmatrix} \\
&\overset{R_1/3}{=} -3\begin{vmatrix} 1 & 0 & -1 & 2 \\ 0 & 2 & -4 & 5 \\ 2 & 4 & 5 & 7 \\ 5 & -1 & -3 & 1 \end{vmatrix}
\overset{\substack{R_3 - 2R_1 \\ R_4 - 5R_1}}{=} -3\begin{vmatrix} 1 & 0 & -1 & 2 \\ 0 & 2 & -4 & 5 \\ 0 & 4 & 7 & 3 \\ 0 & -1 & 2 & -9 \end{vmatrix} \\
&\overset{R_2 \leftrightarrow R_4}{=} -(-3)\begin{vmatrix} 1 & 0 & -1 & 2 \\ 0 & -1 & 2 & -9 \\ 0 & 4 & 7 & 3 \\ 0 & 2 & -4 & 5 \end{vmatrix}
\overset{\substack{R_3 + 4R_2 \\ R_4 + 2R_2}}{=} 3\begin{vmatrix} 1 & 0 & -1 & 2 \\ 0 & -1 & 2 & -9 \\ 0 & 0 & 15 & -33 \\ 0 & 0 & 0 & -13 \end{vmatrix} \\
&= 3 \cdot 1 \cdot (-1) \cdot 15 \cdot (-13) = 585
\end{aligned}$$

Remark By Theorem 4.3, we can also use elementary column operations in the process of computing determinants, and we can "mix and match" elementary row and column operations. For example, in Example 4.13(a), we could have started by adding column 3 to column 1 to create a leading 1 in the upper left-hand corner. In fact, the method we used was faster, but in other examples column operations may speed up the calculations. Keep this in mind when you work determinants by hand.

Determinants of Elementary Matrices

Recall from Section 3.3 that an elementary matrix results from performing an elementary row operation on an identity matrix. Setting $A = I_n$ in Theorem 4.3 yields the following theorem.

Theorem 4.4 Let $E$ be an $n \times n$ elementary matrix.
a. If $E$ results from interchanging two rows of $I_n$, then $\det E = -1$.
b. If $E$ results from multiplying one row of $I_n$ by $k$, then $\det E = k$.
c. If $E$ results from adding a multiple of one row of $I_n$ to another row, then $\det E = 1$.

Proof Since $\det I_n = 1$, applying (b), (d), and (f) of Theorem 4.3 immediately gives (a), (b), and (c), respectively, of Theorem 4.4.

The word lemma is derived from the Greek verb lambanein, which means "to grasp." In mathematics, a lemma is a "helper theorem" that we "grasp hold of" and use to prove another, usually more important, theorem.

Next, recall that multiplying a matrix $B$ by an elementary matrix on the left performs the corresponding elementary row operation on $B$. We can therefore rephrase (b), (d), and (f) of Theorem 4.3 succinctly as the following lemma, the proof of which is straightforward and is left as Exercise 43.

Lemma 4.5 Let $B$ be an $n \times n$ matrix and let $E$ be an $n \times n$ elementary matrix. Then
$$\det(EB) = (\det E)(\det B)$$

We can use Lemma 4.5 to prove the main theorem of this section: a characterization of invertibility in terms of determinants.
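The row-reduction strategy of Example 4.13 can be sketched in Python as follows. This is a minimal illustration, not the procedure used by hand above: it assumes exact arithmetic with the standard `fractions` module, uses only row interchanges (property (b)) and row replacements (property (f)), and never rescales a row, so its intermediate steps need not match Example 4.13(b). The function name is hypothetical.

```python
from fractions import Fraction


def det_row_reduce(a):
    """Determinant via reduction to upper triangular (echelon) form.

    A row swap flips the sign (Theorem 4.3(b)), adding a multiple of one row
    to another leaves det unchanged (Theorem 4.3(f)), and the triangular
    result's determinant is the product of its diagonal (Theorem 4.2).
    """
    m = [[Fraction(x) for x in row] for row in a]    # exact rational arithmetic
    n = len(m)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            # The remaining lower-right block has a zero first column, so det = 0.
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign                               # property (b)
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]   # property (f)
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]
    return result


A = [[2, 3, -1], [0, 5, 3], [-4, -6, 2]]                          # Example 4.13(a)
B = [[0, 2, -4, 5], [3, 0, -3, 6], [2, 4, 5, 7], [5, -1, -3, 1]]  # Example 4.13(b)
print(det_row_reduce(A), det_row_reduce(B))   # 0 585
```

The elimination loop uses on the order of $n^3$ arithmetic operations, in contrast with the $n!$ growth of cofactor expansion. A zero result means some pivot was missing, which is exactly the singular case characterized in Theorem 4.6.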
Theorem 4.6 A square matrix $A$ is invertible if and only if $\det A \ne 0$.

Proof Let $A$ be an $n \times n$ matrix and let $R$ be the reduced row echelon form of $A$. We will show first that $\det A \ne 0$ if and only if $\det R \ne 0$. Let $E_1, E_2, \ldots, E_r$ be the elementary matrices corresponding to the elementary row operations that reduce $A$ to $R$. Then
$$E_r \cdots E_2E_1A = R$$
Taking determinants of both sides and repeatedly applying Lemma 4.5, we obtain
$$(\det E_r)\cdots(\det E_2)(\det E_1)(\det A) = \det R$$
By Theorem 4.4, the determinants of all the elementary matrices are nonzero. We conclude that $\det A \ne 0$ if and only if $\det R \ne 0$.

Now suppose that $A$ is invertible. Then, by the Fundamental Theorem of Invertible Matrices, $R = I_n$, so $\det R = 1 \ne 0$. Hence, $\det A \ne 0$ also. Conversely, if $\det A \ne 0$, then $\det R \ne 0$, so $R$ cannot contain a zero row, by Theorem 4.3(a). It follows that $R$ must be $I_n$ (why?), so $A$ is invertible, by the Fundamental Theorem again.

Determinants and Matrix Operations

Let's now try to determine what relationship, if any, exists between determinants and some of the basic matrix operations. Specifically, we would like to find formulas for $\det(kA)$, $\det(A + B)$, $\det(AB)$, $\det(A^{-1})$, and $\det(A^T)$ in terms of $\det A$ and $\det B$.

Theorem 4.3(d) does not say that $\det(kA) = k\det A$. The correct relationship between scalar multiplication and determinants is given by the following theorem.

Theorem 4.7 If $A$ is an $n \times n$ matrix, then
$$\det(kA) = k^n\det A$$

You are asked to give a proof of this theorem in Exercise 44.

Unfortunately, there is no simple formula for $\det(A + B)$, and in general, $\det(A + B) \ne \det A + \det B$. (Find two $2 \times 2$ matrices that verify this.) It therefore comes as a pleasant surprise to find out that determinants are quite compatible with matrix multiplication. Indeed, we have the following nice formula due to Cauchy.
Augustin Louis Cauchy (1789–1857) was born in Paris and studied engineering but switched to mathematics because of poor health. A brilliant and prolific mathematician, he published over 700 papers, many on quite difficult problems. His name can be found on many theorems and definitions in differential equations, infinite series, probability theory, algebra, and physics. He is noted for introducing rigor into calculus, laying the foundation for the branch of mathematics known as analysis. Politically conservative, Cauchy was a royalist, and in 1830 he followed Charles X into exile. He returned to France in 1838 but did not return to his post at the Sorbonne until the university dropped its requirement that faculty swear an oath of loyalty to the new king.

Theorem 4.8 If $A$ and $B$ are $n \times n$ matrices, then
$$\det(AB) = (\det A)(\det B)$$

Proof We consider two cases: $A$ invertible and $A$ not invertible. If $A$ is invertible, then, by the Fundamental Theorem of Invertible Matrices, it can be written as a product of elementary matrices, say,
$$A = E_1E_2\cdots E_k$$
Then $AB = E_1E_2\cdots E_kB$, so $k$ applications of Lemma 4.5 give
$$\det(AB) = \det(E_1E_2\cdots E_kB) = (\det E_1)(\det E_2)\cdots(\det E_k)(\det B)$$
Continuing to apply Lemma 4.5, we obtain
$$\det(AB) = \det(E_1E_2\cdots E_k)\det B = (\det A)(\det B)$$
On the other hand, if $A$ is not invertible, then neither is $AB$, by Exercise 47 in Section 3.3. Thus, by Theorem 4.6, $\det A = 0$ and $\det(AB) = 0$. Consequently, $\det(AB) = (\det A)(\det B)$, since both sides are zero.

Example 4.14 Applying Theorem 4.8 to $A = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 1 \\ 2 & 1 \end{bmatrix}$, we find that
$$AB = \begin{bmatrix} 12 & 3 \\ 16 & 5 \end{bmatrix}$$
and that $\det A = 4$, $\det B = 3$, and $\det(AB) = 12 = 4 \cdot 3 = (\det A)(\det B)$, as claimed. (Check these assertions!)

The next theorem gives a nice relationship between the determinant of an invertible matrix and the determinant of its inverse.

Theorem 4.9 If $A$ is invertible, then
$$\det(A^{-1}) = \frac{1}{\det A}$$

Proof Since $A$ is invertible, $AA^{-1} = I$, so $\det(AA^{-1}) = \det I = 1$.
Hence, $(\det A)(\det A^{-1}) = 1$, by Theorem 4.8, and since $\det A \ne 0$ (why?), dividing by $\det A$ yields the result.

Example 4.15 Verify Theorem 4.9 for the matrix $A$ of Example 4.14.

Solution We compute
$$A^{-1} = \frac{1}{4}\begin{bmatrix} 3 & -1 \\ -2 & 2 \end{bmatrix} = \begin{bmatrix} \frac{3}{4} & -\frac{1}{4} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
so
$$\det A^{-1} = \left(\frac{3}{4}\right)\left(\frac{1}{2}\right) - \left(-\frac{1}{4}\right)\left(-\frac{1}{2}\right) = \frac{3}{8} - \frac{1}{8} = \frac{1}{4} = \frac{1}{\det A}$$

Remark The beauty of Theorem 4.9 is that sometimes we do not need to know what the inverse of a matrix is, but only that it exists, or to know what its determinant is. For the matrix $A$ in the last two examples, once we know that $\det A = 4 \ne 0$, we immediately can deduce that $A$ is invertible and that $\det A^{-1} = \frac{1}{4}$ without actually computing $A^{-1}$.

We now relate the determinant of a matrix $A$ to that of its transpose $A^T$. Since the rows of $A^T$ are just the columns of $A$, evaluating $\det A^T$ by expanding along the first row is identical to evaluating $\det A$ by expanding along its first column, which the Laplace Expansion Theorem allows us to do. Thus, we have the following result.

Theorem 4.10 For any square matrix $A$,
$$\det A = \det A^T$$

Gabriel Cramer (1704–1752) was a Swiss mathematician. The rule that bears his name was published in 1750, in his treatise Introduction to the Analysis of Algebraic Curves. As early as 1730, however, special cases of the formula were known to other mathematicians, including the Scotsman Colin Maclaurin (1698–1746), perhaps the greatest of the British mathematicians who were the "successors of Newton."

Cramer's Rule and the Adjoint

In this section, we derive two useful formulas relating determinants to the solution of linear systems and the inverse of a matrix. The first of these, Cramer's Rule, gives a formula for describing the solution of certain systems of $n$ linear equations in $n$ variables entirely in terms of determinants. While this result is of little practical use beyond $2 \times 2$ systems, it is of great theoretical importance.

We will need some new notation for this result and its proof.
For an $n \times n$ matrix $A$ and a vector $\mathbf{b}$ in $\mathbb{R}^n$, let $A_i(\mathbf{b})$ denote the matrix obtained by replacing the $i$th column of $A$ by $\mathbf{b}$. That is,
$$A_i(\mathbf{b}) = [\mathbf{a}_1 \;\cdots\; \mathbf{b} \;\cdots\; \mathbf{a}_n] \qquad (\mathbf{b} \text{ in column } i)$$

Theorem 4.11 Cramer's Rule
Let $A$ be an invertible $n \times n$ matrix and let $\mathbf{b}$ be a vector in $\mathbb{R}^n$. Then the unique solution $\mathbf{x}$ of the system $A\mathbf{x} = \mathbf{b}$ is given by
$$x_i = \frac{\det(A_i(\mathbf{b}))}{\det A} \quad \text{for } i = 1, \ldots, n$$

Proof The columns of the identity matrix $I = I_n$ are the standard unit vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$. If $A\mathbf{x} = \mathbf{b}$, then
$$AI_i(\mathbf{x}) = A[\mathbf{e}_1 \;\cdots\; \mathbf{x} \;\cdots\; \mathbf{e}_n] = [A\mathbf{e}_1 \;\cdots\; A\mathbf{x} \;\cdots\; A\mathbf{e}_n] = [\mathbf{a}_1 \;\cdots\; \mathbf{b} \;\cdots\; \mathbf{a}_n] = A_i(\mathbf{b})$$
Therefore, by Theorem 4.8,
$$(\det A)(\det I_i(\mathbf{x})) = \det(AI_i(\mathbf{x})) = \det(A_i(\mathbf{b}))$$
Now
$$\det I_i(\mathbf{x}) = \begin{vmatrix} 1 & 0 & \cdots & x_1 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & x_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & x_i & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & x_{n-1} & \cdots & 1 & 0 \\ 0 & 0 & \cdots & x_n & \cdots & 0 & 1 \end{vmatrix} = x_i$$
as can be seen by expanding along the $i$th row. Thus, $(\det A)x_i = \det(A_i(\mathbf{b}))$, and the result follows by dividing by $\det A$ (which is nonzero, since $A$ is invertible).

Example 4.16 Use Cramer's Rule to solve the system
$$\begin{aligned} x_1 + 2x_2 &= 2 \\ -x_1 + 4x_2 &= 1 \end{aligned}$$

Solution We compute
$$\det A = \begin{vmatrix} 1 & 2 \\ -1 & 4 \end{vmatrix} = 6, \quad \det(A_1(\mathbf{b})) = \begin{vmatrix} 2 & 2 \\ 1 & 4 \end{vmatrix} = 6, \quad \text{and} \quad \det(A_2(\mathbf{b})) = \begin{vmatrix} 1 & 2 \\ -1 & 1 \end{vmatrix} = 3$$
By Cramer's Rule,
$$x_1 = \frac{\det(A_1(\mathbf{b}))}{\det A} = \frac{6}{6} = 1 \quad \text{and} \quad x_2 = \frac{\det(A_2(\mathbf{b}))}{\det A} = \frac{3}{6} = \frac{1}{2}$$

Remark As noted previously, Cramer's Rule is computationally inefficient for all but small systems of linear equations because it involves the calculation of many determinants. The effort expended to compute just one of these determinants, using even the most efficient method, would be better spent using Gaussian elimination to solve the system directly.

The final result of this section is a formula for the inverse of a matrix in terms of determinants. This formula was hinted at by the formula for the inverse of a $3 \times 3$ matrix, which was given without proof at the beginning of this section. Thus, we have come full circle. Let's discover the formula for ourselves.
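Cramer's Rule (Theorem 4.11) is short to code for small systems. The sketch below is a minimal Python illustration, assuming integer or rational entries, cofactor expansion for the determinants, and the standard `fractions` module for exact quotients; the function names are hypothetical. It reproduces Example 4.16, and its last function applies the rule to each system $A\mathbf{x}_j = \mathbf{e}_j$, which is the column-by-column idea behind a determinant formula for $A^{-1}$.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def replace_column(a, i, b):
    """A_i(b): a copy of A whose column i is replaced by the vector b."""
    return [row[:i] + [b_k] + row[i + 1:] for row, b_k in zip(a, b)]


def cramer(a, b):
    """Theorem 4.11: x_i = det(A_i(b)) / det A, valid only when det A != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's Rule needs an invertible matrix (det A != 0)")
    return [Fraction(det(replace_column(a, i, b)), d) for i in range(len(a))]


def inverse_by_cramer(a):
    """Solve A x_j = e_j for each j; the solutions are the columns of A^{-1}."""
    n = len(a)
    columns = [cramer(a, [1 if k == j else 0 for k in range(n)]) for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]   # columns -> rows


# Example 4.16: x1 + 2x2 = 2, -x1 + 4x2 = 1
print(cramer([[1, 2], [-1, 4]], [2, 1]))   # [Fraction(1, 1), Fraction(1, 2)]
```

As the Remark above warns, this approach computes $n + 1$ determinants per system and is only sensible for very small $n$; Gaussian elimination is the practical choice.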
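If $A$ is an invertible $n \times n$ matrix, its inverse is the (unique) matrix $X$ that satisfies the equation $AX = I$. Solving for $X$ one column at a time, let $\mathbf{x}_j$ be the $j$th column of $X$. That is,
$$\mathbf{x}_j = \begin{bmatrix} x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj} \end{bmatrix}$$
Therefore, $A\mathbf{x}_j = \mathbf{e}_j$, and by Cramer's Rule,
$$x_{ij} = \frac{\det(A_i(\mathbf{e}_j))}{\det A}$$
However, since the $i$th column of $A_i(\mathbf{e}_j)$ is $\mathbf{e}_j$,
$$\det(A_i(\mathbf{e}_j)) = \begin{vmatrix} a_{11} & a_{12} & \cdots & 0 & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & 0 & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{j1} & a_{j2} & \cdots & 1 & \cdots & a_{jn} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 & \cdots & a_{nn} \end{vmatrix} = (-1)^{j+i}\det A_{ji} = C_{ji}$$
which is the $(j, i)$-cofactor of $A$. It follows that $x_{ij} = (1/\det A)C_{ji}$, so $A^{-1} = X = (1/\det A)[C_{ji}] = (1/\det A)[C_{ij}]^T$. In words, the inverse of $A$ is the transpose of the matrix of cofactors of $A$, divided by the determinant of $A$.

The matrix
$$[C_{ji}] = [C_{ij}]^T = \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}$$
is called the adjoint (or adjugate) of $A$ and is denoted by $\operatorname{adj} A$. The result we have just proved can be stated as follows.

Theorem 4.12 Let $A$ be an invertible $n \times n$ matrix. Then
$$A^{-1} = \frac{1}{\det A}\operatorname{adj} A$$

Example 4.17 Use the adjoint method to compute the inverse of
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 2 & 4 \\ 1 & 3 & -3 \end{bmatrix}$$

Solution We compute $\det A = -2$ and the nine cofactors
$$\begin{aligned}
C_{11} &= +\begin{vmatrix} 2 & 4 \\ 3 & -3 \end{vmatrix} = -18 & C_{12} &= -\begin{vmatrix} 2 & 4 \\ 1 & -3 \end{vmatrix} = 10 & C_{13} &= +\begin{vmatrix} 2 & 2 \\ 1 & 3 \end{vmatrix} = 4 \\
C_{21} &= -\begin{vmatrix} 2 & -1 \\ 3 & -3 \end{vmatrix} = 3 & C_{22} &= +\begin{vmatrix} 1 & -1 \\ 1 & -3 \end{vmatrix} = -2 & C_{23} &= -\begin{vmatrix} 1 & 2 \\ 1 & 3 \end{vmatrix} = -1 \\
C_{31} &= +\begin{vmatrix} 2 & -1 \\ 2 & 4 \end{vmatrix} = 10 & C_{32} &= -\begin{vmatrix} 1 & -1 \\ 2 & 4 \end{vmatrix} = -6 & C_{33} &= +\begin{vmatrix} 1 & 2 \\ 2 & 2 \end{vmatrix} = -2
\end{aligned}$$
The adjoint is the transpose of the matrix of cofactors, namely,
$$\operatorname{adj} A = \begin{bmatrix} -18 & 10 & 4 \\ 3 & -2 & -1 \\ 10 & -6 & -2 \end{bmatrix}^T = \begin{bmatrix} -18 & 3 & 10 \\ 10 & -2 & -6 \\ 4 & -1 & -2 \end{bmatrix}$$
Then
$$A^{-1} = \frac{1}{\det A}\operatorname{adj} A = -\frac{1}{2}\begin{bmatrix} -18 & 3 & 10 \\ 10 & -2 & -6 \\ 4 & -1 & -2 \end{bmatrix} = \begin{bmatrix} 9 & -\frac{3}{2} & -5 \\ -5 & 1 & 3 \\ -2 & \frac{1}{2} & 1 \end{bmatrix}$$
which is the same answer we obtained (with less work) in Example 3.30.

Proof of the Laplace Expansion Theorem

Unfortunately, there is no short, easy proof of the Laplace Expansion Theorem. The proof we give has the merit of being relatively straightforward. We break it down into several steps, the first of which is to prove that cofactor expansion along the first row of a matrix is the same as cofactor expansion along the first column.

Lemma 4.13 Let $A$ be an $n \times n$ matrix.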
Then
$$a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n} = \det A = a_{11}C_{11} + a_{21}C_{21} + \cdots + a_{n1}C_{n1} \qquad (7)$$

Proof We prove this lemma by induction on $n$. For $n = 1$, the result is trivial. Now assume that the result is true for $(n - 1) \times (n - 1)$ matrices; this is our induction hypothesis. Note that, by the definition of cofactor (or minor), all of the terms containing $a_{11}$ are accounted for by the summand $a_{11}C_{11}$. We can therefore ignore terms containing $a_{11}$.

The $i$th summand on the right-hand side of Equation (7) is $a_{i1}C_{i1} = a_{i1}(-1)^{i+1}\det A_{i1}$. Now we expand $\det A_{i1}$ along the first row:
$$\det A_{i1} = \begin{vmatrix} a_{12} & a_{13} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i-1,2} & a_{i-1,3} & \cdots & a_{i-1,j} & \cdots & a_{i-1,n} \\ a_{i+1,2} & a_{i+1,3} & \cdots & a_{i+1,j} & \cdots & a_{i+1,n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n2} & a_{n3} & \cdots & a_{nj} & \cdots & a_{nn} \end{vmatrix}$$
The $j$th term in this expansion of $\det A_{i1}$ is $a_{1j}(-1)^{1+(j-1)}\det A_{1i,1j}$, where the notation $A_{kl,rs}$ denotes the submatrix of $A$ obtained by deleting rows $k$ and $l$ and columns $r$ and $s$. Combining these, we see that the term containing $a_{i1}a_{1j}$ on the right-hand side of Equation (7) is
$$a_{i1}(-1)^{i+1}a_{1j}(-1)^{1+(j-1)}\det A_{1i,1j} = (-1)^{i+j+1}a_{i1}a_{1j}\det A_{1i,1j}$$

What is the term containing $a_{i1}a_{1j}$ on the left-hand side of Equation (7)? The factor $a_{1j}$ occurs in the $j$th summand, $a_{1j}C_{1j} = a_{1j}(-1)^{1+j}\det A_{1j}$. By the induction hypothesis, we can expand $\det A_{1j}$ along its first column:
$$\det A_{1j} = \begin{vmatrix} a_{21} & \cdots & a_{2,j-1} & a_{2,j+1} & \cdots & a_{2n} \\ a_{31} & \cdots & a_{3,j-1} & a_{3,j+1} & \cdots & a_{3n} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{i1} & \cdots & a_{i,j-1} & a_{i,j+1} & \cdots & a_{in} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn} \end{vmatrix}$$
The $i$th term in this expansion of $\det A_{1j}$ is $a_{i1}(-1)^{(i-1)+1}\det A_{1i,1j}$, so the term containing $a_{i1}a_{1j}$ on the left-hand side of Equation (7) is
$$a_{1j}(-1)^{1+j}a_{i1}(-1)^{(i-1)+1}\det A_{1i,1j} = (-1)^{i+j+1}a_{i1}a_{1j}\det A_{1i,1j}$$
which establishes that the left- and right-hand sides of Equation (7) are equivalent.

Next, we prove property (b) of Theorem 4.3.

Lemma 4.14 Let $A$ be an $n \times n$ matrix and let $B$ be obtained by interchanging any two rows (columns) of $A$.
Then
$$\det B = -\det A$$

Proof Once again, the proof is by induction on $n$. The result can be easily checked when $n = 2$, so assume that it is true for $(n - 1) \times (n - 1)$ matrices. We will prove that the result is true for $n \times n$ matrices. First, we prove that it holds when two adjacent rows of $A$ are interchanged, say, rows $r$ and $r + 1$.

By Lemma 4.13, we can evaluate $\det B$ by cofactor expansion along its first column. The $i$th term in this expansion is $(-1)^{1+i}b_{i1}\det B_{i1}$. If $i \ne r$ and $i \ne r + 1$, then $b_{i1} = a_{i1}$ and $B_{i1}$ is an $(n - 1) \times (n - 1)$ submatrix that is identical to $A_{i1}$ except that two adjacent rows have been interchanged:
$$B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{r+1,1} & a_{r+1,2} & \cdots & a_{r+1,n} \\ a_{r1} & a_{r2} & \cdots & a_{rn} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
Thus, by the induction hypothesis, $\det B_{i1} = -\det A_{i1}$ if $i \ne r$ and $i \ne r + 1$.

If $i = r$, then $b_{i1} = a_{r+1,1}$ and $B_{i1} = A_{r+1,1}$ (row $r$ of $B$ is row $r + 1$ of $A$). Therefore, the $r$th summand in $\det B$ is
$$(-1)^{r+1}b_{r1}\det B_{r1} = (-1)^{r+1}a_{r+1,1}\det A_{r+1,1} = -(-1)^{(r+1)+1}a_{r+1,1}\det A_{r+1,1}$$
Similarly, if $i = r + 1$, then $b_{i1} = a_{r1}$, $B_{i1} = A_{r1}$, and the $(r + 1)$st summand in $\det B$ is
$$(-1)^{(r+1)+1}b_{r+1,1}\det B_{r+1,1} = (-1)^{r}a_{r1}\det A_{r1} = -(-1)^{r+1}a_{r1}\det A_{r1}$$
In other words, the $r$th and $(r + 1)$st terms in the first column cofactor expansion of $\det B$ are the negatives of the $(r + 1)$st and $r$th terms, respectively, in the first column cofactor expansion of $\det A$.

Substituting all of these results into $\det B$ and using Lemma 4.13 again, we obtain
$$\begin{aligned}
\det B &= \sum_{i=1}^{n} (-1)^{i+1}b_{i1}\det B_{i1} \\
&= \sum_{\substack{i=1 \\ i \ne r, r+1}}^{n} (-1)^{i+1}b_{i1}\det B_{i1} + (-1)^{r+1}b_{r1}\det B_{r1} + (-1)^{(r+1)+1}b_{r+1,1}\det B_{r+1,1} \\
&= \sum_{\substack{i=1 \\ i \ne r, r+1}}^{n} (-1)^{i+1}a_{i1}(-\det A_{i1}) - (-1)^{(r+1)+1}a_{r+1,1}\det A_{r+1,1} - (-1)^{r+1}a_{r1}\det A_{r1} \\
&= -\sum_{i=1}^{n} (-1)^{i+1}a_{i1}\det A_{i1} \\
&= -\det A
\end{aligned}$$
This proves the result for $n \times n$ matrices if adjacent rows are interchanged. To see that it holds for arbitrary row interchanges, we need only note that, for example, rows $r$ and $s$, where $r < s$, can be swapped by performing $2(s - r) - 1$ interchanges of adjacent rows (see Exercise 67). Since the number of interchanges is odd and each one changes the sign of the determinant, the net effect is a change of sign, as desired.

The proof for column interchanges is analogous, except that we expand along row 1 instead of along column 1.

We can now prove the Laplace Expansion Theorem.

Proof of Theorem 4.1 Let $B$ be the matrix obtained by moving row $i$ of $A$ to the top, using $i - 1$ interchanges of adjacent rows. By Lemma 4.14, $\det B = (-1)^{i-1}\det A$. But $b_{1j} = a_{ij}$ and $B_{1j} = A_{ij}$ for $j = 1, \ldots, n$:
$$B = \begin{bmatrix} a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\ a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i-1,1} & \cdots & a_{i-1,j} & \cdots & a_{i-1,n} \\ a_{i+1,1} & \cdots & a_{i+1,j} & \cdots & a_{i+1,n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix}$$
Thus,
$$\det A = (-1)^{i-1}\det B = (-1)^{i-1}\sum_{j=1}^{n} (-1)^{1+j}b_{1j}\det B_{1j} = (-1)^{i-1}\sum_{j=1}^{n} (-1)^{1+j}a_{ij}\det A_{ij} = \sum_{j=1}^{n} (-1)^{i+j}a_{ij}\det A_{ij}$$
which gives the formula for cofactor expansion along row $i$.

The proof for column expansion is similar, invoking Lemma 4.13 so that we can use column expansion instead of row expansion (see Exercise 68).

A Brief History of Determinants

Takakazu Seki Kowa (1642–1708), a self-taught child prodigy, was descended from a family of samurai warriors.
In addition to discovering determinants, he wrote about diophantine equations, magic squares, and Bernoulli numbers (before Bernoulli) and quite likely made discoveries in calculus.

As noted at the beginning of this section, the history of determinants predates that of matrices. Indeed, determinants were first introduced, independently, by Seki in 1683 and Leibniz in 1693. In 1748, determinants appeared in Maclaurin's Treatise on Algebra, which included a treatment of Cramer's Rule up to the $4 \times 4$ case. In 1750, Cramer himself proved the general case of his rule, applying it to curve fitting, and in 1772, Laplace gave a proof of his expansion theorem.

The term determinant was not coined until 1801, when it was used by Gauss. Cauchy made the first use of determinants in the modern sense in 1812. Cauchy, in fact, was responsible for developing much of the early theory of determinants, including several important results that we have mentioned: the product rule for determinants, the characteristic polynomial, and the notion of a diagonalizable matrix. Determinants did not become widely known until 1841, when Jacobi popularized them, albeit in the context of functions of several variables, such as are encountered in a multivariable calculus course. (These types of determinants were called "Jacobians" by Sylvester around 1850, a term that is still used today.)

Gottfried Wilhelm von Leibniz (1646-1716) was born in Leipzig and studied law, theology, philosophy, and mathematics. He is probably best known for developing (with Newton, independently) the main ideas of differential and integral calculus. However, his contributions to other branches of mathematics are also impressive. He developed the notion of a determinant, knew versions of Cramer's Rule and the Laplace Expansion Theorem before others were given credit for them, and laid the foundation for matrix theory through work he did on quadratic forms.
Leibniz also was the first to develop the binary system of arithmetic. He believed in the importance of good notation and, along with the familiar notation for derivatives and integrals, introduced a form of subscript notation for the coefficients of a linear system that is essentially the notation we use today.

By the late 19th century, the theory of determinants had developed to the stage that entire books were devoted to it, including Dodgson's An Elementary Treatise on Determinants in 1867 and Thomas Muir's monumental five-volume work, which appeared in the early 20th century. While their history is fascinating, today determinants are of theoretical more than practical interest. Cramer's Rule is a hopelessly inefficient method for solving a system of linear equations, and numerical methods have replaced any use of determinants in the computation of eigenvalues. Determinants are used, however, to give students an initial understanding of the characteristic polynomial (as in Sections 4.1 and 4.3).

Exercises 4.2

Compute the determinants in Exercises 1-6 using cofactor expansion along the first row and along the first column.

1 0 3 1. 5 0 2 1 3. - 1 0 -1 0 2 3 5. 2 3 1 2 3 0 1 2. 2 3 -1 3 0 -1 -1 -2 0 1 1 0 4. 1 0 0 2 3 6. 4 5 6 7 8 9

Compute the determinants in Exercises 7-15 using cofactor expansion along any row or column that seems convenient.

8. 2 3 1 0 -2 -1 1 2 0 tan e - sin e cos e 12. b c d 0 e 0 -1 0 3 5 2 6 0 0 4 2 1 0 0 0 a 0 0 b c 15. 0 d e f g h cos e sin e 10. 0 cos e 0 sin e 0 a 0 a b 0 11. 0 a b a 0 b 13. 5 2 2 7. - 1 2 3 0 0 1 3 -4 9. 2 - 2 4 -1 0 j 2 1 14. 0 2 0 3 0 2 -1 0 -1 2 4 -3

In Exercises 16-18, compute the indicated $3 \times 3$ determinants using the method of Example 4.9.
16. The determinant in Exercise 6
17. The determinant in Exercise 8
18. The determinant in Exercise 11
19. Verify that the method indicated in (2) agrees with Equation (1) for a $3 \times 3$ determinant.
20. Verify that definition (4) agrees with the definition of a $2 \times 2$ determinant when $n = 2$.
21. Prove Theorem 4.2. [Hint: A proof by induction would be appropriate here.]

In Exercises 22-25, evaluate the given determinant using elementary row and/or column operations and Theorem 4.3 to reduce the matrix to row echelon form.
22. The determinant in Exercise 1
23. The determinant in Exercise 9
24. The determinant in Exercise 13
25. The determinant in Exercise 14

In Exercises 26-34, use properties of determinants to evaluate the given determinant by inspection. Explain your reasoning.

1 -2 2 1 0 3 27. 0 - 2 5 0 4 0 0 1 0 28. 0 5 2 3 -1 4 29. 2 3 30. 0 4 1 6 4 4 1 31. -2 0 5 4 1 0 32. 0 0 0 -3 33. 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 34. 1 1 0 0 1 0 0 1 1 26. 3 0 2 2 2 -1 3 -3 5 -4 -2 2 3 -2 1 2 0 0 0 0 0 0 0 4 0 0

Find the determinants in Exercises 35-40, assuming that

$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = 4$$

2a 2b 2c 2a b/ 3 - c 36. 2 d e/ 3 -f e f g h 2g h/ 3 - i a-c b c d e f 37. a b c 38. d - f e f g- i h g h 2c b a 39. 2f e d 2i h g a + 2g b + 2h c + 2i 40. 3d + 2g 3e + 2h 3f + 2i h g 35. d

41. Prove Theorem 4.3(a).
42. Prove Theorem 4.3(f).
43. Prove Lemma 4.5.
44. Prove Theorem 4.7.

In Exercises 45 and 46, use Theorem 4.6 to find all values of $k$ for which $A$ is invertible.

-k k+1 45. A = -8 k 1 k 46. A � [1 � : [ � �] 2 k

In Exercises 47-52, assume that $A$ and $B$ are $n \times n$ matrices with $\det A = 3$ and $\det B = -2$. Find the indicated determinants.
47. $\det(AB)$
48. $\det(A^2)$
49. $\det(B^{-1}A)$
50. $\det(2A)$
51. $\det(3B^T)$
52. $\det(AA^T)$

In Exercises 53-56, $A$ and $B$ are $n \times n$ matrices.
53. Prove that $\det(AB) = \det(BA)$.
54. If $B$ is invertible, prove that $\det(B^{-1}AB) = \det(A)$.
55. If $A$ is idempotent (that is, $A^2 = A$), find all possible values of $\det(A)$.
56. A square matrix $A$ is called nilpotent if $A^m = O$ for some $m > 1$.
(The word nilpotent comes from the Latin nil, meaning "nothing," and potere, meaning "to have power." A nilpotent matrix is thus one that becomes "nothing," that is, the zero matrix, when raised to some power.) Find all possible values of $\det(A)$ if $A$ is nilpotent.

In Exercises 57-60, use Cramer's Rule to solve the given linear system.
57. $x + y = 1$, $x - y = 2$
58. $2x - y = 5$, $x + 3y = -1$
59. $2x + y + 3z = 1$, $y + z = 1$, $z = 1$
60. $x + y - z = 1$, $x + y + z = 2$, $x - y = 3$

In Exercises 61-64, use Theorem 4.12 to compute the inverse of the coefficient matrix for the given exercise.
61. Exercise 57
62. Exercise 58
63. Exercise 59
64. Exercise 60

65. If $A$ is an invertible $n \times n$ matrix, show that $\operatorname{adj} A$ is also invertible and that

$$(\operatorname{adj} A)^{-1} = \frac{1}{\det A} A = \operatorname{adj}(A^{-1})$$

66. If $A$ is an $n \times n$ matrix, prove that $\det(\operatorname{adj} A) = (\det A)^{n-1}$.
67. Verify that if $r < s$, then rows $r$ and $s$ of a matrix can be interchanged by performing $2(s - r) - 1$ interchanges of adjacent rows.
68. Prove that the Laplace Expansion Theorem holds for column expansion along the $j$th column.
69. Let $A$ be a square matrix that can be partitioned as

$$A = \begin{bmatrix} P & Q \\ O & S \end{bmatrix}$$

where $P$ and $S$ are square matrices. Such a matrix is said to be in block (upper) triangular form. Prove that $\det A = (\det P)(\det S)$. [Hint: Try a proof by induction on the number of rows of $P$.]
70. (a) Give an example to show that if $A$ can be partitioned as

$$A = \begin{bmatrix} P & Q \\ R & S \end{bmatrix}$$

where $P$, $Q$, $R$, and $S$ are all square, then it is not necessarily true that $\det A = (\det P)(\det S) - (\det Q)(\det R)$.
(b) Assume that $A$ is partitioned as in part (a) and that $P$ is invertible. Let

$$B = \begin{bmatrix} P^{-1} & O \\ -RP^{-1} & I \end{bmatrix}$$

Compute $\det(BA)$ using Exercise 69 and use the result to show that

$$\det A = \det P \, \det(S - RP^{-1}Q)$$

[The matrix $S - RP^{-1}Q$ is called the Schur complement of $P$ in $A$, after Issai Schur (1875-1941), who was born in Belarus but spent most of his life in Germany.
He is known mainly for his fundamental work on the representation theory of groups, but he also worked in number theory, analysis, and other areas.]
(c) Assume that $A$ is partitioned as in part (a), that $P$ is invertible, and that $PR = RP$. Prove that $\det A = \det(PS - RQ)$.

Writing Project

Which Came First: The Matrix or the Determinant?

The way in which matrices and determinants are taught today (matrices before determinants) bears little resemblance to the way these topics developed historically. There is a brief history of determinants at the end of Section 4.2.

Write a report on the history of matrices and determinants. How did the notations used for each evolve over time? Who were some of the key mathematicians involved and what were their contributions?

1. Florian Cajori, A History of Mathematical Notations (New York: Dover, 1993).
2. Howard Eves, An Introduction to the History of Mathematics, Sixth Edition (Philadelphia: Saunders College Publishing, 1990).
3. Victor J. Katz, A History of Mathematics: An Introduction, Third Edition (Reading, MA: Addison Wesley Longman, 2008).
4. Eberhard Knobloch, "Determinants," in Ivor Grattan-Guinness, ed., Companion Encyclopedia of the History and Philosophy of the Mathematical Sciences (London: Routledge, 2013).

Vignette

Lewis Carroll's Condensation Method

In 1866, Charles Dodgson, better known by his pseudonym Lewis Carroll, published his only mathematical research paper. In it, he described a "new and brief method" for computing determinants, which he called "condensation." Although not well known today and rendered obsolete by numerical methods for evaluating determinants, the condensation method is very useful for hand calculation. When calculators or computer algebra systems are not available, many students find condensation to be their method of choice. It requires only the ability to compute $2 \times 2$ determinants.

We require the following terminology.
Definition If $A$ is an $n \times n$ matrix with $n \geq 3$, the interior of $A$, denoted $\operatorname{int}(A)$, is the $(n-2) \times (n-2)$ matrix obtained by deleting the first row, last row, first column, and last column of $A$.

Charles Lutwidge Dodgson (1832-1898) is much better known by his pen name, Lewis Carroll, under which he wrote Alice's Adventures in Wonderland and Through the Looking Glass. He also wrote several mathematics books and collections of logic puzzles.

This vignette is based on the article "Lewis Carroll's Condensation Method for Evaluating Determinants" by Adrian Rice and Eve Torrence in Math Horizons, November 2006, pp. 12-15. For further details of the condensation method, see David M. Bressoud, Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture, MAA Spectrum Series (Cambridge University Press, 1999).

We will illustrate the condensation method for the $5 \times 5$ matrix

0 2 3 -1 2 2 3 1 -4 A = 1 2 -1 2 1 3 1 - 1 2 -2 -4 2 0

Begin by setting $A_0$ equal to the $6 \times 6$ matrix all of whose entries are 1. Then, we set $A_1 = A$. It is useful to imagine $A_0$ as the base of a pyramid with $A_1$ centered on top of $A_0$. We are going to add successively smaller and smaller layers to the pyramid until we reach a $1 \times 1$ matrix at the top; this will contain $\det A$ (Figure 4.9).

Figure 4.9

Next, we "condense" $A_1$ into a $4 \times 4$ matrix $A_1'$ whose entries are the determinants of the $2 \times 2$ submatrices of $A_1$ formed from consecutive rows and consecutive columns:

3 -� 1 1 -� � I I � �I 1 1 2 I � 3 -�1 � � 1 1 1 1 1 1 2 2 � I � � A; = 2 - - 2 1 1 I 3 - 11 1 1 - 11 - 21 1 1 - 1 2 1 1 2 - � I 1 1 -! � 1 1 1 -� 1 1 -� � I I� -� 1 2 [ -j 11 7 -1 1 -7 1 5 -1 - : -4 6 1

Now we divide each entry of $A_1'$ by the corresponding entry of $\operatorname{int}(A_0)$ to get matrix $A_2$. Since $A_0$ is all 1s, this means $A_2 = A_1'$. We repeat the procedure, constructing $A_2'$ from the $2 \times 2$ submatrices of $A_2$ and then dividing each entry of $A_2'$ by the corresponding entry of $\operatorname{int}(A_1)$, and so on. We obtain:

1 - � �l I � - ; 1 1 - ; - : 1 ] A� = 1 : _; 1 1 _; � I I � !
I , 1 1), 1 - A' 4 1 1 11 11 60/3 36/2 - 4/ - 1 �= ��� l = [ �� �� -- 2927 ] , 26/2 12 4 - 27 - 29 26 13 1 = [ - 42 - 94 ] , 350 - 96 30 1 8 1 8 - 29 4 13 1 12 4 3 1 20 30 1 8 20 18 - 27 - 29 ]=[ ] [ A � = [ 1 : -�� I ] = A5 = =

$$A_4 = \begin{bmatrix} -42/7 & -96/(-1) \\ -94/1 & 350/5 \end{bmatrix} = \begin{bmatrix} -6 & 96 \\ -94 & 70 \end{bmatrix}, \qquad A_5' = \begin{bmatrix} 8604 \end{bmatrix}, \qquad A_5 = \begin{bmatrix} 8604/18 \end{bmatrix} = \begin{bmatrix} 478 \end{bmatrix}$$

As can be checked by other methods, $\det A = 478$. In general, for an $n \times n$ matrix $A$, the condensation method will produce a $1 \times 1$ matrix $A_n$ containing $\det A$. Clearly, the method breaks down if the interior of any of the $A_i$'s contains a zero, since we would then be trying to divide by zero. However, careful use of elementary row and column operations can be used to eliminate the zeros so that we can proceed.

Exploration

Geometric Applications of Determinants

This exploration will reveal some of the amazing applications of determinants to geometry. In particular, we will see that determinants are closely related to area and volume formulas and can be used to produce the equations of lines, planes, and certain other curves. Most of these ideas arose when the theory of determinants was being developed as a subject in its own right.

The Cross Product

Recall from Exploration: The Cross Product in Chapter 1 that the cross product of $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ is the vector $\mathbf{u} \times \mathbf{v}$ defined by

$$\mathbf{u} \times \mathbf{v} = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{bmatrix}$$

If we write this cross product as $(u_2 v_3 - u_3 v_2)\mathbf{e}_1 - (u_1 v_3 - u_3 v_1)\mathbf{e}_2 + (u_1 v_2 - u_2 v_1)\mathbf{e}_3$, where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are the standard basis vectors, then we see that the form of this formula is

$$\mathbf{u} \times \mathbf{v} = \det \begin{bmatrix} \mathbf{e}_1 & u_1 & v_1 \\ \mathbf{e}_2 & u_2 & v_2 \\ \mathbf{e}_3 & u_3 & v_3 \end{bmatrix}$$

if we expand along the first column. (This is not a proper determinant, of course, since $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are vectors, not scalars; however, it gives a useful way of remembering the somewhat awkward cross product formula. It also lets us use properties of determinants to verify some of the properties of the cross product.)

Now let's revisit some of the exercises from Chapter 1.
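The first-column expansion of the formal determinant translates directly into code. The following is a minimal Python sketch, assuming vectors are plain 3-tuples of numbers; the function names `det2`, `cross`, `dot`, and `det3_columns` are illustrative only and do not come from any library. Each component of u × v is the signed 2 × 2 minor that multiplies e1, e2, or e3, and the `__main__` block spot-checks Problem 2 and Problem 3(g) on a single numerical example (a sanity check, not a proof).

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def cross(u, v):
    """u x v read off from the formal determinant
    det [[e1, u1, v1], [e2, u2, v2], [e3, u3, v3]],
    expanded along its first column with signs +, -, +."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (
        det2(u2, v2, u3, v3),   # coefficient of e1: delete row 1 and column 1
        -det2(u1, v1, u3, v3),  # coefficient of e2: delete row 2 and column 1
        det2(u1, v1, u2, v2),   # coefficient of e3: delete row 3 and column 1
    )


def dot(u, v):
    """Ordinary dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def det3_columns(u, v, w):
    """det [u v w] with u, v, w as columns, by cofactor expansion along column 1."""
    return (u[0] * det2(v[1], w[1], v[2], w[2])
            - u[1] * det2(v[0], w[0], v[2], w[2])
            + u[2] * det2(v[0], w[0], v[1], w[1]))


if __name__ == "__main__":
    u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
    print(cross(u, v))            # (-3, 6, -3)
    print(dot(u, cross(v, w)))    # -3, the triple scalar product u . (v x w)
    print(det3_columns(u, v, w))  # -3, agreeing with Problem 2
    print(dot(cross(u, v), w))    # -3, agreeing with Problem 3(g)
```

Swapping the arguments of `cross` swaps columns 2 and 3 of the formal determinant, so by Lemma 4.14 every component changes sign; that is Problem 3(a) seen from the determinant side.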
1. Use the determinant version of the cross product to compute $\mathbf{u} \times \mathbf{v}$.

[} HJ [ l m [ [ i l n =: lm [:l [ :: l [ :l ( a) u � � (b) u � u� v� ( d) u � (o)

2. If $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$, show that

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix}$$

3. Use properties of determinants (and Problem 2 above, if necessary) to prove the given property of the cross product.
(a) $\mathbf{v} \times \mathbf{u} = -(\mathbf{u} \times \mathbf{v})$
(b) $\mathbf{u} \times \mathbf{0} = \mathbf{0}$
(c) $\mathbf{u} \times \mathbf{u} = \mathbf{0}$
(d) $\mathbf{u} \times k\mathbf{v} = k(\mathbf{u} \times \mathbf{v})$
(e) $\mathbf{u} \times (\mathbf{v} + \mathbf{w}) = \mathbf{u} \times \mathbf{v} + \mathbf{u} \times \mathbf{w}$
(f) $\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = 0$ and $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = 0$
(g) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) \cdot \mathbf{w}$ (the triple scalar product identity)

Area and Volume

We can now give a geometric interpretation of the determinants of $2 \times 2$ and $3 \times 3$ matrices. Recall that if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{R}^3$, then the area $A$ of the parallelogram determined by these vectors is given by $A = \|\mathbf{u} \times \mathbf{v}\|$. (See Exploration: The Cross Product in Chapter 1.)

4. Let $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$. Show that the area $A$ of the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$ is given by

$$A = \left| \det \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix} \right|$$

[Hint: Write $\mathbf{u}$ and $\mathbf{v}$ as $\begin{bmatrix} u_1 \\ u_2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix}$.]

Figure 4.10

5. Derive the area formula in Problem 4 geometrically, using Figure 4.10 as a guide. [Hint: Subtract areas from the large rectangle until the parallelogram remains.] Where does the absolute value sign come from in this case?

6. Find the area of the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$.
( a) u = [ � ], v [ - � ] = (b) u = [!l [�] v=

Figure 4.11

Generalizing from Problems 4-6, consider a parallelepiped, a three-dimensional solid resembling a "slanted" brick, whose six faces are all parallelograms with opposite faces parallel and congruent (Figure 4.11). Its volume is given by the area of its base times its height.

7. Prove that the volume $V$ of the parallelepiped determined by $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ is given by the absolute value of the determinant of the $3 \times 3$ matrix $[\mathbf{u}\ \mathbf{v}\ \mathbf{w}]$ with $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ as its columns. [Hint: From Figure 4.11 you can see that the height $h$ can be expressed as $h = \|\mathbf{u}\| \cos\theta$, where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v} \times \mathbf{w}$. Use this fact to show that $V = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|$ and apply the result of Problem 2.]

Figure 4.12

8. Show that the volume $V$ of the tetrahedron determined by $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ (Figure 4.12) is given by

$$V = \tfrac{1}{6} |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|$$

[Hint: From geometry, we know that the volume of such a solid is $V = \tfrac{1}{3}(\text{area of the base})(\text{height})$.]

Now let's view these geometric interpretations from a transformational point of view. Let $A$ be a $2 \times 2$ matrix and let $P$ be the parallelogram determined by the vectors $\mathbf{u}$ and $\mathbf{v}$. We will consider the effect of the matrix transformation $T_A$ on the area of $P$. Let $T_A(P)$ denote the parallelogram determined by $T_A(\mathbf{u}) = A\mathbf{u}$ and $T_A(\mathbf{v}) = A\mathbf{v}$.

9. Prove that the area of $T_A(P)$ is given by $|\det A|(\text{area of } P)$.

10. Let $A$ be a $3 \times 3$ matrix and let $P$ be the parallelepiped determined by the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$. Let $T_A(P)$ denote the parallelepiped determined by $T_A(\mathbf{u}) = A\mathbf{u}$, $T_A(\mathbf{v}) = A\mathbf{v}$, and $T_A(\mathbf{w}) = A\mathbf{w}$. Prove that the volume of $T_A(P)$ is given by $|\det A|(\text{volume of } P)$.

The preceding problems illustrate that the determinant of a matrix captures what the corresponding matrix transformation does to the area or volume of figures upon which the transformation acts. (Although we have considered only certain types of figures, the result is perfectly general and can be made rigorous. We will not do so here.)

Lines and Planes

Suppose we are given two distinct points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane. There is a unique line passing through these points, and its equation is of the form

$$ax + by + c = 0$$

Since the two given points are on this line, their coordinates satisfy this equation. Thus,

$$\begin{aligned} ax_1 + by_1 + c &= 0 \\ ax_2 + by_2 + c &= 0 \end{aligned}$$

The three equations together can be viewed as a system of linear equations in the variables $a$, $b$, and $c$.
Since there is a nontrivial solution (i.e., the line exists), the coefficient matrix cannot be invertible, by the Fundamental Theorem of Invertible Matrices. Consequently, its determinant must be zero, by Theorem 4.6. Expanding this determinant gives the equation of the line.

The equation of the line through the points $(x_1, y_1)$ and $(x_2, y_2)$ is given by

$$\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0$$

11. Use the method described above to find the equation of the line through the given points.
(a) $(2, 3)$ and $(-1, 0)$
(b) $(1, 2)$ and $(4, 3)$

12. Prove that the three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ are collinear (lie on the same line) if and only if

$$\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = 0$$

13. Show that the equation of the plane through the three noncollinear points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, and $(x_3, y_3, z_3)$ is given by

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0$$

What happens if the three points are collinear? [Hint: Explain what happens when row reduction is used to evaluate the determinant.]

14. Prove that the four points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, and $(x_4, y_4, z_4)$ are coplanar (lie in the same plane) if and only if

$$\begin{vmatrix} x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \\ x_4 & y_4 & z_4 & 1 \end{vmatrix} = 0$$

Curve Fitting

Figure 4.13

When data arising from experimentation take the form of points $(x, y)$ that can be plotted in the plane, it is often of interest to find a relationship between the variables $x$ and $y$. Ideally, we would like to find a function whose graph passes through all of the points. Sometimes all we want is an approximation (see Section 7.3), but exact results are also possible in certain situations.

15. From Figure 4.13 it appears as though we may be able to find a parabola passing through the points $A(-1, 10)$, $B(0, 5)$, and $C(3, 2)$. The equation of such a parabola is of the form $y = a + bx + cx^2$. By substituting the given points into this equation, set up a system of three linear equations in the variables $a$, $b$, and $c$.
Without solving the system, use Theorem 4.6 to argue that it must have a unique solution. Then solve the system to find the equation of the parabola in Figure 4.13.

16. Use the method of Problem 15 to find the polynomials of degree at most 2 that pass through the following sets of points.
(a) $A(1, -1)$, $B(2, 4)$, $C(3, 3)$
(b) $A(-1, -3)$, $B(1, -1)$, $C(3, 1)$

17. Generalizing from Problems 15 and 16, suppose $a_1$, $a_2$, and $a_3$ are distinct real numbers. For any real numbers $b_1$, $b_2$, and $b_3$, we want to show that there is a unique quadratic with equation of the form $y = a + bx + cx^2$ passing through the points $(a_1, b_1)$, $(a_2, b_2)$, and $(a_3, b_3)$. Do this by demonstrating that the coefficient matrix of the associated linear system has the determinant

$$\begin{vmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{vmatrix} = (a_2 - a_1)(a_3 - a_1)(a_3 - a_2)$$

which is necessarily nonzero. (Why?)

18. Let $a_1$, $a_2$, $a_3$, and $a_4$ be distinct real numbers. Show that

$$\begin{vmatrix} 1 & a_1 & a_1^2 & a_1^3 \\ 1 & a_2 & a_2^2 & a_2^3 \\ 1 & a_3 & a_3^2 & a_3^3 \\ 1 & a_4 & a_4^2 & a_4^3 \end{vmatrix} = (a_2 - a_1)(a_3 - a_1)(a_4 - a_1)(a_3 - a_2)(a_4 - a_2)(a_4 - a_3) \neq 0$$

For any real numbers $b_1$, $b_2$, $b_3$, and $b_4$, use this result to prove that there is a unique cubic with equation $y = a + bx + cx^2 + dx^3$ passing through the four points $(a_1, b_1)$, $(a_2, b_2)$, $(a_3, b_3)$, and $(a_4, b_4)$. (Do not actually solve for $a$, $b$, $c$, and $d$.)

19. Let $a_1, a_2, \ldots, a_n$ be $n$ real numbers. Prove that

$$\begin{vmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\ 1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^{n-1} \end{vmatrix} = \prod_{1 \leq i < j \leq n} (a_j - a_i)$$

where $\prod_{1 \leq i < j \leq n} (a_j - a_i)$ means the product of all terms of the form $(a_j - a_i)$, where $i < j$ and both $i$ and $j$ are between 1 and $n$. [The determinant of a matrix of this form (or its transpose) is called a Vandermonde determinant, named after the French mathematician A. T. Vandermonde (1735-1796).]
Deduce that for any $n$ points in the plane whose $x$-coordinates are all distinct, there is a unique polynomial of degree at most $n - 1$ whose graph passes through the given points.

Eigenvalues and Eigenvectors of n × n Matrices

Now that we have defined the determinant of an $n \times n$ matrix, we can continue our discussion of eigenvalues and eigenvectors in a general context. Recall from Section 4.1 that $\lambda$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is noninvertible. By Theorem 4.6, this is true if and only if $\det(A - \lambda I) = 0$. To summarize:

The eigenvalues of a square matrix $A$ are precisely the solutions $\lambda$ of the equation

$$\det(A - \lambda I) = 0$$

When we expand $\det(A - \lambda I)$, we get a polynomial in $\lambda$, called the characteristic polynomial of $A$. The equation $\det(A - \lambda I) = 0$ is called the characteristic equation of $A$. For example, if $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, its characteristic polynomial is

$$\det(A - \lambda I) = \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = (a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)$$

If $A$ is $n \times n$, its characteristic polynomial will be of degree $n$. According to the Fundamental Theorem of Algebra (see Appendix D), a polynomial of degree $n$ with real or complex coefficients has at most $n$ distinct roots. Applying this fact to the characteristic polynomial, we see that an $n \times n$ matrix with real or complex entries has at most $n$ distinct eigenvalues.

Let's summarize the procedure we will follow (for now) to find the eigenvalues and eigenvectors (eigenspaces) of a matrix.

Let $A$ be an $n \times n$ matrix.
1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.
2. Find the eigenvalues of $A$ by solving the characteristic equation $\det(A - \lambda I) = 0$ for $\lambda$.
3. For each eigenvalue $\lambda$, find the null space of the matrix $A - \lambda I$. This is the eigenspace $E_\lambda$, the nonzero vectors of which are the eigenvectors of $A$ corresponding to $\lambda$.
4. Find a basis for each eigenspace.

Example 4.18

Find the eigenvalues and the corresponding eigenspaces of

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & -5 & 4 \end{bmatrix}$$

Solution We follow the procedure outlined previously. The characteristic polynomial is

$$\det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 & 0 \\ 0 & -\lambda & 1 \\ 2 & -5 & 4 - \lambda \end{vmatrix} = -\lambda \begin{vmatrix} -\lambda & 1 \\ -5 & 4 - \lambda \end{vmatrix} - \begin{vmatrix} 0 & 1 \\ 2 & 4 - \lambda \end{vmatrix} = -\lambda(\lambda^2 - 4\lambda + 5) - (-2) = -\lambda^3 + 4\lambda^2 - 5\lambda + 2$$

To find the eigenvalues, we need to solve the characteristic equation $\det(A - \lambda I) = 0$ for $\lambda$. The characteristic polynomial factors as $-(\lambda - 1)^2(\lambda - 2)$. (The Factor Theorem is helpful here; see Appendix D.) Thus, the characteristic equation is $-(\lambda - 1)^2(\lambda - 2) = 0$, which clearly has solutions $\lambda = 1$ and $\lambda = 2$. Since $\lambda = 1$ is a multiple root and $\lambda = 2$ is a simple root, let us label them $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 2$.

To find the eigenvectors corresponding to $\lambda_1 = \lambda_2 = 1$, we find the null space of

$$A - 1I = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 2 & -5 & 3 \end{bmatrix}$$

Row reduction produces

$$[A - I \mid \mathbf{0}] = \left[\begin{array}{rrr|r} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 2 & -5 & 3 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

(We knew in advance that we must get at least one zero row. Why?) Thus, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ is in the eigenspace $E_1$ if and only if $x_1 - x_3 = 0$ and $x_2 - x_3 = 0$. Setting the free variable $x_3 = t$, we see that $x_1 = t$ and $x_2 = t$, from which it follows that

$$E_1 = \left\{ \begin{bmatrix} t \\ t \\ t \end{bmatrix} \right\} = \left\{ t \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left( \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right)$$

To find the eigenvectors corresponding to $\lambda_3 = 2$, we find the null space of $A - 2I$ by row reduction:

$$[A - 2I \mid \mathbf{0}] = \left[\begin{array}{rrr|r} -2 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 2 & -5 & 2 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & -\frac{1}{4} & 0 \\ 0 & 1 & -\frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

So $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ is in the eigenspace $E_2$ if and only if $x_1 = \frac{1}{4}x_3$ and $x_2 = \frac{1}{2}x_3$. Setting the free variable $x_3 = t$, we have

$$E_2 = \left\{ \begin{bmatrix} \frac{1}{4}t \\ \frac{1}{2}t \\ t \end{bmatrix} \right\} = \left\{ t \begin{bmatrix} \frac{1}{4} \\ \frac{1}{2} \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left( \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} \right)$$

where we have cleared denominators in the basis by multiplying through by the least common denominator 4. (Why is this permissible?)

Remark Notice that in Example 4.18, $A$ is a $3 \times 3$ matrix but has only two distinct eigenvalues. However, if we count multiplicities, $A$ has exactly three eigenvalues ($\lambda = 1$ twice and $\lambda = 2$ once). This is what the Fundamental Theorem of Algebra guarantees.
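Let us define the algebraic multiplicity of an eigenvalue to be its multiplicity as a root of the characteristic equation. Thus, $\lambda = 1$ has algebraic multiplicity 2 and $\lambda = 2$ has algebraic multiplicity 1.

Next notice that each eigenspace has a basis consisting of just one vector. In other words, $\dim E_1 = \dim E_2 = 1$. Let us define the geometric multiplicity of an eigenvalue $\lambda$ to be $\dim E_\lambda$, the dimension of its corresponding eigenspace. As you will see in Section 4.4, a comparison of these two notions of multiplicity is important.

Example 4.19

Find the eigenvalues and the corresponding eigenspaces of

$$A = \begin{bmatrix} -1 & 0 & 1 \\ 3 & 0 & -3 \\ 1 & 0 & -1 \end{bmatrix}$$

Solution The characteristic equation is

$$0 = \det(A - \lambda I) = \begin{vmatrix} -1 - \lambda & 0 & 1 \\ 3 & -\lambda & -3 \\ 1 & 0 & -1 - \lambda \end{vmatrix} = -\lambda \begin{vmatrix} -1 - \lambda & 1 \\ 1 & -1 - \lambda \end{vmatrix} = -\lambda(\lambda^2 + 2\lambda) = -\lambda^2(\lambda + 2)$$

Hence, the eigenvalues are $\lambda_1 = \lambda_2 = 0$ and $\lambda_3 = -2$. Thus, the eigenvalue 0 has algebraic multiplicity 2 and the eigenvalue $-2$ has algebraic multiplicity 1.

For $\lambda_1 = \lambda_2 = 0$, we compute

$$[A - 0I \mid \mathbf{0}] = [A \mid \mathbf{0}] = \left[\begin{array}{rrr|r} -1 & 0 & 1 & 0 \\ 3 & 0 & -3 & 0 \\ 1 & 0 & -1 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

from which it follows that an eigenvector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ in $E_0$ satisfies $x_1 = x_3$. Therefore, both $x_2$ and $x_3$ are free. Setting $x_2 = s$ and $x_3 = t$, we have

$$E_0 = \left\{ \begin{bmatrix} t \\ s \\ t \end{bmatrix} \right\} = \left\{ s \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left( \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right)$$

For $\lambda_3 = -2$,

$$[A - (-2)I \mid \mathbf{0}] = [A + 2I \mid \mathbf{0}] = \left[\begin{array}{rrr|r} 1 & 0 & 1 & 0 \\ 3 & 2 & -3 & 0 \\ 1 & 0 & 1 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & 1 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

so $x_3 = t$ is free and $x_1 = -x_3 = -t$ and $x_2 = 3x_3 = 3t$. Consequently,

$$E_{-2} = \left\{ \begin{bmatrix} -t \\ 3t \\ t \end{bmatrix} \right\} = \left\{ t \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left( \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix} \right)$$

It follows that $\lambda_1 = \lambda_2 = 0$ has geometric multiplicity 2 and $\lambda_3 = -2$ has geometric multiplicity 1. (Note that the algebraic multiplicity equals the geometric multiplicity for each eigenvalue.)

In some situations, the eigenvalues of a matrix are very easy to find. If $A$ is a triangular matrix, then so is $A - \lambda I$, and Theorem 4.2 says that $\det(A - \lambda I)$ is just the product of the diagonal entries. This implies that the characteristic equation of a triangular matrix is

$$(a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda) = 0$$

from which it follows immediately that the eigenvalues are $\lambda_1 = a_{11}$, $\lambda_2 = a_{22}$, $\ldots$, $\lambda_n = a_{nn}$.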
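Step 1 of the procedure for finding eigenvalues, and the triangular-matrix observation just made, can both be checked by running cofactor expansion (Theorem 4.1, here always along row 1) on the matrix $A - \lambda I$, whose entries are polynomials in $\lambda$. The following is a minimal pure-Python sketch, assuming each polynomial is stored as a list of exact `Fraction` coefficients in ascending powers of $\lambda$; all function names are hypothetical. Because cofactor expansion takes on the order of $n!$ operations, this is suitable for small examples only, in line with the remark at the end of Section 4.2 that numerical methods, not determinants, are used to compute eigenvalues in practice.

```python
from fractions import Fraction


def trim(p):
    """Remove trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def poly_add(p, q):
    """Sum of two polynomials given as ascending coefficient lists."""
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def poly_mul(p, q):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_eval(p, x):
    """Evaluate p at x by Horner's rule."""
    result = Fraction(0)
    for c in reversed(p):
        result = result * x + c
    return result


def det(M):
    """Determinant of a square matrix whose entries are polynomials,
    by cofactor expansion along row 1."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = [Fraction(0)]
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # delete row 1, column j+1
        sign = 1 if j % 2 == 0 else -1  # (-1)^(1 + (j+1)) for the 0-based index j
        term = poly_mul([Fraction(sign)], poly_mul(M[0][j], det(minor)))
        total = poly_add(total, term)
    return total


def char_poly(A):
    """Ascending coefficients of det(A - lambda*I) for a square matrix A of numbers."""
    n = len(A)
    M = [[[Fraction(A[i][j]), Fraction(-1)] if i == j else [Fraction(A[i][j])]
          for j in range(n)] for i in range(n)]
    return det(M)


if __name__ == "__main__":
    A18 = [[0, 1, 0], [0, 0, 1], [2, -5, 4]]   # Example 4.18
    p = char_poly(A18)
    print(p)  # coefficients 2, -5, 4, -1, i.e. -λ³ + 4λ² - 5λ + 2
    print([poly_eval(p, lam) for lam in (1, 2)])  # both zero: λ = 1 and λ = 2 are eigenvalues
```

Expanding along row 1 every time is simply the easiest choice to code; Theorem 4.1 guarantees that any other row gives the same polynomial. For a triangular input such as [[2, 5, 7], [0, 3, 1], [0, 0, 4]], the result is the product $(2 - \lambda)(3 - \lambda)(4 - \lambda)$, as the paragraph above predicts.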
We summarize this result as a theorem and illustrate it with an example.

Theorem 4.15 The eigenvalues of a triangular matrix are the entries on its main diagonal.

Example 4.20

The eigenvalues of the triangular matrix $A$ are $\lambda_1 = 2$, $\lambda_2 = 1$, $\lambda_3 = 3$, and $\lambda_4 = -2$, by Theorem 4.15. Indeed, the characteristic polynomial is just $(2 - \lambda)(1 - \lambda)(3 - \lambda)(-2 - \lambda)$.

Note that diagonal matrices are a special case of Theorem 4.15. In fact, a diagonal matrix is both upper and lower triangular.

Eigenvalues capture much important information about the behavior of a matrix. Once we know the eigenvalues of a matrix, we can deduce a great many things without doing any more work. The next theorem is one of the most important in this regard.

Theorem 4.16 A square matrix $A$ is invertible if and only if 0 is not an eigenvalue of $A$.

Proof Let $A$ be a square matrix. By Theorem 4.6, $A$ is invertible if and only if $\det A \neq 0$. But $\det A \neq 0$ is equivalent to $\det(A - 0I) \neq 0$, which says that 0 is not a root of the characteristic equation of $A$ (i.e., 0 is not an eigenvalue of $A$).

We can now extend the Fundamental Theorem of Invertible Matrices to include results we have proved in this chapter.

Theorem 4.17 The Fundamental Theorem of Invertible Matrices: Version 3

Let $A$ be an $n \times n$ matrix. The following statements are equivalent:
a. $A$ is invertible.
b. $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$ in $\mathbb{R}^n$.
c. $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
d. The reduced row echelon form of $A$ is $I_n$.
e. $A$ is a product of elementary matrices.
f. $\operatorname{rank}(A) = n$
g. $\operatorname{nullity}(A) = 0$
h. The column vectors of $A$ are linearly independent.
i. The column vectors of $A$ span $\mathbb{R}^n$.
j. The column vectors of $A$ form a basis for $\mathbb{R}^n$.
k. The row vectors of $A$ are linearly independent.
l. The row vectors of $A$ span $\mathbb{R}^n$.
m. The row vectors of $A$ form a basis for $\mathbb{R}^n$.
n. $\det A \neq 0$
o. 0 is not an eigenvalue of $A$.
Proof The equivalence (a) $\Leftrightarrow$ (n) is Theorem 4.6, and we just proved (a) $\Leftrightarrow$ (o) in Theorem 4.16.

There are nice formulas for the eigenvalues of the powers and inverses of a matrix.

Theorem 4.18 Let $A$ be a square matrix with eigenvalue $\lambda$ and corresponding eigenvector $\mathbf{x}$.
a. For any positive integer $n$, $\lambda^n$ is an eigenvalue of $A^n$ with corresponding eigenvector $\mathbf{x}$.
b. If $A$ is invertible, then $1/\lambda$ is an eigenvalue of $A^{-1}$ with corresponding eigenvector $\mathbf{x}$.
c. If $A$ is invertible, then for any integer $n$, $\lambda^n$ is an eigenvalue of $A^n$ with corresponding eigenvector $\mathbf{x}$.

Proof We are given that $A\mathbf{x} = \lambda\mathbf{x}$.

(a) We proceed by induction on $n$. For $n = 1$, the result is just what has been given. Assume the result is true for $n = k$. That is, assume that, for some positive integer $k$, $A^k\mathbf{x} = \lambda^k\mathbf{x}$. We must now prove the result for $n = k + 1$. But

$$A^{k+1}\mathbf{x} = A(A^k\mathbf{x}) = A(\lambda^k\mathbf{x})$$

by the induction hypothesis. Using property (d) of Theorem 3.3, we have

$$A(\lambda^k\mathbf{x}) = \lambda^k(A\mathbf{x}) = \lambda^k(\lambda\mathbf{x}) = \lambda^{k+1}\mathbf{x}$$

Thus, $A^{k+1}\mathbf{x} = \lambda^{k+1}\mathbf{x}$, as required. By induction, the result is true for all integers $n \geq 1$.

(b) You are asked to prove this property in Exercise 13.

(c) You are asked to prove this property in Exercise 14.

The next example shows one application of this theorem.

Example 4.21

Compute $\begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}^{10} \begin{bmatrix} 5 \\ 1 \end{bmatrix}$.

Solution Let $A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$; then what we want to find is $A^{10}\mathbf{x}$. The eigenvalues of $A$ are $\lambda_1 = -1$ and $\lambda_2 = 2$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. That is,

$$A\mathbf{v}_1 = -\mathbf{v}_1 \quad \text{and} \quad A\mathbf{v}_2 = 2\mathbf{v}_2$$

(Check this.) Since $\{\mathbf{v}_1, \mathbf{v}_2\}$ forms a basis for $\mathbb{R}^2$ (why?), we can write $\mathbf{x}$ as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$. Indeed, as is easily checked, $\mathbf{x} = 3\mathbf{v}_1 + 2\mathbf{v}_2$.
Therefore, using Theorem 4.18(a), we have

$$\begin{aligned}
A^{10}\mathbf{x} &= A^{10}(3\mathbf{v}_1 + 2\mathbf{v}_2) = 3(A^{10}\mathbf{v}_1) + 2(A^{10}\mathbf{v}_2) \\
&= 3\lambda_1^{10}\mathbf{v}_1 + 2\lambda_2^{10}\mathbf{v}_2 \\
&= 3(-1)^{10}\begin{bmatrix} 1 \\ -1 \end{bmatrix} + 2(2^{10})\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 + 2^{11} \\ -3 + 2^{12} \end{bmatrix} = \begin{bmatrix} 2051 \\ 4093 \end{bmatrix}
\end{aligned}$$

This is certainly a lot easier than computing $A^{10}$ first; in fact, there are no matrix multiplications at all!

When it can be used, the method of Example 4.21 is quite general. We summarize it as the following theorem, which you are asked to prove in Exercise 42.

Theorem 4.19 Suppose the $n \times n$ matrix $A$ has eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. If $\mathbf{x}$ is a vector in $\mathbb{R}^n$ that can be expressed as a linear combination of these eigenvectors, say,

$$\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_m\mathbf{v}_m$$

then, for any positive integer $k$ (and, if $A$ is invertible, for any integer $k$),

$$A^k\mathbf{x} = c_1\lambda_1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 + \cdots + c_m\lambda_m^k\mathbf{v}_m$$

Warning The catch here is the "if" in the second sentence. There is absolutely no guarantee that such a linear combination is possible. The best possible situation would be if there were a basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$; we will explore this possibility further in the next section. As a step in that direction, however, we have the following theorem, which states that eigenvectors corresponding to distinct eigenvalues are linearly independent.

Theorem 4.20 Let $A$ be an $n \times n$ matrix and let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be distinct eigenvalues of $A$ with corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Then $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly independent.

Proof The proof is indirect. We will assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent and show that this assumption leads to a contradiction.

If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are linearly dependent, then one of these vectors must be expressible as a linear combination of the previous ones. Let $\mathbf{v}_{k+1}$ be the first of the vectors $\mathbf{v}_i$ that can be so expressed. In other words, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent, but there are scalars $c_1, c_2, \ldots, c_k$ such that

$$\mathbf{v}_{k+1} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \qquad (1)$$
Multiplying both sides of Equation (1) by $A$ from the left and using the fact that $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$ for each $i$, we have
$$\lambda_{k+1}\mathbf{v}_{k+1} = A\mathbf{v}_{k+1} = A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k) = c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 + \cdots + c_kA\mathbf{v}_k = c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_k\lambda_k\mathbf{v}_k \qquad (2)$$
Now we multiply both sides of Equation (1) by $\lambda_{k+1}$ to get
$$\lambda_{k+1}\mathbf{v}_{k+1} = c_1\lambda_{k+1}\mathbf{v}_1 + c_2\lambda_{k+1}\mathbf{v}_2 + \cdots + c_k\lambda_{k+1}\mathbf{v}_k \qquad (3)$$
When we subtract Equation (3) from Equation (2), we obtain
$$\mathbf{0} = c_1(\lambda_1 - \lambda_{k+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{k+1})\mathbf{v}_2 + \cdots + c_k(\lambda_k - \lambda_{k+1})\mathbf{v}_k$$
The linear independence of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ implies that
$$c_1(\lambda_1 - \lambda_{k+1}) = c_2(\lambda_2 - \lambda_{k+1}) = \cdots = c_k(\lambda_k - \lambda_{k+1}) = 0$$
Since the eigenvalues $\lambda_i$ are all distinct, the terms in parentheses $(\lambda_i - \lambda_{k+1})$, $i = 1, \dots, k$, are all nonzero. Hence, $c_1 = c_2 = \cdots = c_k = 0$. This implies that
$$\mathbf{v}_{k+1} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = 0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_k = \mathbf{0}$$
which is impossible, since the eigenvector $\mathbf{v}_{k+1}$ cannot be zero. Thus, we have a contradiction, which means that our assumption that $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$ are linearly dependent is false. It follows that $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$ must be linearly independent.

Exercises 4.3

In Exercises 1–12, compute (a) the characteristic polynomial of $A$, (b) the eigenvalues of $A$, (c) a basis for each eigenspace of $A$, and (d) the algebraic and geometric multiplicity of each eigenvalue.
1.–6. [matrices illegible in source]
13. Prove Theorem 4.18(b).
14. Prove Theorem 4.18(c). [Hint: Combine the proofs of parts (a) and (b) and see the fourth Remark following Theorem 3.9 (page 169).]

In Exercises 15 and 16, $A$ is a $2 \times 2$ matrix with eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ corresponding to eigenvalues $\lambda_1$ and $\lambda_2 = 2$, respectively, and $\mathbf{x}$ is a given vector. [The entries of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{x}$ and the value of $\lambda_1$ are illegible in source.]
15. Find $A^{10}\mathbf{x}$.
16. Find $A^k\mathbf{x}$. What happens as $k$ becomes large (i.e., $k \to \infty$)?
In Exercises 17 and 18, $A$ is a $3 \times 3$ matrix with eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ corresponding to eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3 = 1$, respectively, and $\mathbf{x}$ is a given vector. [The entries of the vectors and the values of $\lambda_1$ and $\lambda_2$ are illegible in source.]
17. Find $A^{20}\mathbf{x}$.
18. Find $A^k\mathbf{x}$. What happens as $k$ becomes large (i.e., $k \to \infty$)?
19. (a) Show that, for any square matrix $A$, $A^T$ and $A$ have the same characteristic polynomial and hence the same eigenvalues.
(b) Give an example of a $2 \times 2$ matrix $A$ for which $A^T$ and $A$ have different eigenspaces.
20. Let $A$ be a nilpotent matrix (that is, $A^m = O$ for some $m > 1$). Show that $\lambda = 0$ is the only eigenvalue of $A$.
21. Let $A$ be an idempotent matrix (that is, $A^2 = A$). Show that $\lambda = 0$ and $\lambda = 1$ are the only possible eigenvalues of $A$.
22. If $\mathbf{v}$ is an eigenvector of $A$ with corresponding eigenvalue $\lambda$ and $c$ is a scalar, show that $\mathbf{v}$ is an eigenvector of $A - cI$ with corresponding eigenvalue $\lambda - c$.
23. (a) Find the eigenvalues and eigenspaces of $A$ = [matrix illegible in source].
(b) Using Theorem 4.18 and Exercise 22, find the eigenvalues and eigenspaces of $A^{-1}$, $A - 2I$, and $A + 2I$.
24. Let $A$ and $B$ be $n \times n$ matrices with eigenvalues $\lambda$ and $\mu$, respectively.
(a) Give an example to show that $\lambda + \mu$ need not be an eigenvalue of $A + B$.
(b) Give an example to show that $\lambda\mu$ need not be an eigenvalue of $AB$.
(c) Suppose $\lambda$ and $\mu$ correspond to the same eigenvector $\mathbf{x}$. Show that, in this case, $\lambda + \mu$ is an eigenvalue of $A + B$ and $\lambda\mu$ is an eigenvalue of $AB$.
25. If $A$ and $B$ are two row equivalent matrices, do they necessarily have the same eigenvalues? Either prove that they do or give a counterexample.

Let $p(x)$ be the polynomial
$$p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$
The companion matrix of $p(x)$ is the $n \times n$ matrix
$$C(p) = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \qquad (4)$$

26. Find the companion matrix of $p(x) = x^2 - 7x + 12$ and then find the characteristic polynomial of $C(p)$.
27. Find the companion matrix of $p(x) = x^3 + 3x^2 - 4x + 12$ and then find the characteristic polynomial of $C(p)$.
28. (a) Show that the companion matrix $C(p)$ of $p(x) = x^2 + ax + b$ has characteristic polynomial $\lambda^2 + a\lambda + b$.
(b) Show that if $\lambda$ is an eigenvalue of the companion matrix $C(p)$ in part (a), then $\begin{bmatrix} \lambda \\ 1 \end{bmatrix}$ is an eigenvector of $C(p)$ corresponding to $\lambda$.
29. (a) Show that the companion matrix $C(p)$ of $p(x) = x^3 + ax^2 + bx + c$ has characteristic polynomial $-(\lambda^3 + a\lambda^2 + b\lambda + c)$.
(b) Show that if $\lambda$ is an eigenvalue of the companion matrix $C(p)$ in part (a), then $\begin{bmatrix} \lambda^2 \\ \lambda \\ 1 \end{bmatrix}$ is an eigenvector of $C(p)$ corresponding to $\lambda$.
30. Construct a nontriangular $2 \times 2$ matrix with eigenvalues 2 and 5. [Hint: Use Exercise 28.]
31. Construct a nontriangular $3 \times 3$ matrix with eigenvalues $-2$, 1, and 3. [Hint: Use Exercise 29.]
32. (a) Use mathematical induction to prove that, for $n \ge 2$, the companion matrix $C(p)$ of $p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ has characteristic polynomial $(-1)^n p(\lambda)$. [Hint: Expand by cofactors along the last column. You may find it helpful to introduce the polynomial $q(x) = (p(x) - a_0)/x$.]
(b) Show that if $\lambda$ is an eigenvalue of the companion matrix $C(p)$ in Equation (4), then an eigenvector corresponding to $\lambda$ is given by
$$\begin{bmatrix} \lambda^{n-1} \\ \lambda^{n-2} \\ \vdots \\ \lambda \\ 1 \end{bmatrix}$$

If $p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ and $A$ is a square matrix, we can define a square matrix $p(A)$ by
$$p(A) = A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I$$
An important theorem in advanced linear algebra says that if $c_A(\lambda)$ is the characteristic polynomial of the matrix $A$, then $c_A(A) = O$ (in words, every matrix satisfies its characteristic equation). This is the celebrated Cayley–Hamilton Theorem, named after Arthur Cayley (1821–1895) and Sir William Rowan Hamilton (see page 2). Cayley proved this theorem in 1858. Hamilton discovered it, independently, in his work on quaternions, a generalization of the complex numbers.
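Equation (4) and the Cayley–Hamilton Theorem can be tried out on a small case. The following minimal Python sketch is our own illustration. The polynomial $p(x) = x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)$ and all helper names are assumptions chosen for the demo; they are not taken from the exercises. The sketch does three things:
- it builds $C(p)$ with the layout of Equation (4);
- it checks that $[\lambda^2, \lambda, 1]^T$ is an eigenvector for each root $\lambda$;
- it evaluates $p(C(p))$ by Horner's rule.

Since the characteristic polynomial of this $C(p)$ is $-p(\lambda)$ (Exercise 29), the Cayley–Hamilton Theorem predicts the zero matrix.

```python
def companion(coeffs):
    """Companion matrix of p(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0.

    coeffs = [a_{n-1}, ..., a_1, a_0]; the first row holds their negatives
    and ones sit on the subdiagonal, as in Equation (4).
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    C[0] = [-a for a in coeffs]
    for i in range(1, n):
        C[i][i - 1] = 1
    return C


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def poly_of_matrix(coeffs, A):
    """Evaluate p(A) = A^n + a_{n-1} A^{n-1} + ... + a_0 I by Horner's rule."""
    n = len(A)
    result = identity(n)              # leading coefficient is 1
    for a in coeffs:                  # result <- result * A + a * I
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += a
    return result


coeffs = [-6, 11, -6]                 # p(x) = x^3 - 6x^2 + 11x - 6
C = companion(coeffs)                 # [[6, -11, 6], [1, 0, 0], [0, 1, 0]]

# Each root lam of p gives the eigenvector [lam^2, lam, 1] (compare Exercise 29(b)).
for lam in (1, 2, 3):
    v = [[lam**2], [lam], [1]]
    assert mat_mul(C, v) == [[lam * entry[0]] for entry in v]

print(poly_of_matrix(coeffs, C))      # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```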
33. Verify the Cayley–Hamilton Theorem for $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$. That is, find the characteristic polynomial $c_A(\lambda)$ of $A$ and show that $c_A(A) = O$.
34. Verify the Cayley–Hamilton Theorem for $A$ = [$3 \times 3$ matrix illegible in source].

The Cayley–Hamilton Theorem can be used to calculate powers and inverses of matrices. For example, if $A$ is a $2 \times 2$ matrix with characteristic polynomial $c_A(\lambda) = \lambda^2 + a\lambda + b$, then $A^2 + aA + bI = O$, so
$$A^2 = -aA - bI$$
and
$$A^3 = AA^2 = A(-aA - bI) = -aA^2 - bA = -a(-aA - bI) - bA = (a^2 - b)A + abI$$
It is easy to see that by continuing in this fashion we can express any positive power of $A$ as a linear combination of $I$ and $A$. From $A^2 + aA + bI = O$, we also obtain $A(A + aI) = -bI$, so
$$A^{-1} = -\frac{1}{b}A - \frac{a}{b}I$$
provided $b \ne 0$.

35. For the matrix $A$ in Exercise 33, use the Cayley–Hamilton Theorem to compute $A^2$, $A^3$, and $A^4$ by expressing each as a linear combination of $I$ and $A$.
36. For the matrix $A$ in Exercise 34, use the Cayley–Hamilton Theorem to compute $A^3$ and $A^4$ by expressing each as a linear combination of $I$, $A$, and $A^2$.
37. For the matrix $A$ in Exercise 33, use the Cayley–Hamilton Theorem to compute $A^{-1}$ and $A^{-2}$ by expressing each as a linear combination of $I$ and $A$.
38. For the matrix $A$ in Exercise 34, use the Cayley–Hamilton Theorem to compute $A^{-1}$ and $A^{-2}$ by expressing each as a linear combination of $I$, $A$, and $A^2$.
39. Show that if the square matrix $A$ can be partitioned as
$$A = \begin{bmatrix} P & Q \\ O & S \end{bmatrix}$$
where $P$ and $S$ are square matrices, then the characteristic polynomial of $A$ is $c_A(\lambda) = c_P(\lambda)c_S(\lambda)$. [Hint: Use Exercise 69 in Section 4.2.]
40. Let $\lambda_1, \lambda_2, \dots, \lambda_n$ be a complete set of eigenvalues (repetitions included) of the $n \times n$ matrix $A$. Prove that
$$\det(A) = \lambda_1\lambda_2\cdots\lambda_n \quad\text{and}\quad \operatorname{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$
[Hint: The characteristic polynomial of $A$ factors as
$$\det(A - \lambda I) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$$
Find the constant term and the coefficient of $\lambda^{n-1}$ on the left and right sides of this equation.]
41. Let $A$ and $B$ be $n \times n$ matrices. Prove that the sum of all the eigenvalues of $A + B$ is the sum of all the eigenvalues of $A$ and $B$ individually. Prove that the product of all the eigenvalues of $AB$ is the product of all the eigenvalues of $A$ and $B$ individually. (Compare this exercise with Exercise 24.)
42. Prove Theorem 4.19.

Writing Project: The History of Eigenvalues
Like much of linear algebra, the way the topic of eigenvalues is taught today does not correspond to its historical development. Eigenvalues arose out of problems in systems of differential equations before the concept of a matrix was even formulated.
Write a report on the historical development of eigenvalues. Describe the types of mathematical problems in which they originally arose. Who were some of the key mathematicians involved with these problems? How did the terminology for eigenvalues change over time?
1. Thomas Hawkins, "Cauchy and the Spectral Theory of Matrices," Historia Mathematica 2 (1975), pp. 1–29.
2. Victor J. Katz, A History of Mathematics: An Introduction, Third Edition (Reading, MA: Addison Wesley Longman, 2008).
3. Morris Kline, Mathematical Thought from Ancient to Modern Times (Oxford: Oxford University Press, 1972).

4.4 Similarity and Diagonalization

As you saw in the last section, triangular and diagonal matrices are nice in the sense that their eigenvalues are transparently displayed. It would be pleasant if we could relate a given square matrix to a triangular or diagonal one in such a way that they had exactly the same eigenvalues. Of course, we already know one procedure for converting a square matrix into triangular form: Gaussian elimination. Unfortunately, this process does not preserve the eigenvalues of the matrix.
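A minimal Python sketch shows concretely why Gaussian elimination does not help here. The helper `real_eigenvalues_2x2` is our own and assumes the eigenvalues are real, which holds for both matrices used. The sketch compares the eigenvalues of the matrix of Example 4.21 with those of the triangular matrix produced from it by a single row interchange.

```python
import math


def real_eigenvalues_2x2(M):
    """Real roots of det(M - lambda*I) = lambda^2 - tr(M)*lambda + det(M).

    Assumes the discriminant tr(M)^2 - 4 det(M) is nonnegative.
    """
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4 * det)
    return sorted([(tr - root) / 2, (tr + root) / 2])


A = [[0, 1], [2, 1]]            # the matrix of Example 4.21
U = [A[1], A[0]]                # one row interchange already gives a triangular matrix
print(real_eigenvalues_2x2(A))  # [-1.0, 2.0]
print(real_eigenvalues_2x2(U))  # [1.0, 2.0]: the eigenvalue -1 has been lost
```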
In this section, we consider a different sort of transformation of a matrix that does behave well with respect to eigenvalues.

Similar Matrices

Definition Let $A$ and $B$ be $n \times n$ matrices. We say that $A$ is similar to $B$ if there is an invertible $n \times n$ matrix $P$ such that $P^{-1}AP = B$. If $A$ is similar to $B$, we write $A \sim B$.

Remarks
• If $A \sim B$, we can write, equivalently, that $A = PBP^{-1}$ or $AP = PB$.
• Similarity is a relation on square matrices in the same sense that "less than or equal to" is a relation on the integers. Note that there is a direction (or order) implicit in the definition. Just as $a \le b$ does not necessarily imply $b \le a$, we should not assume that $A \sim B$ implies $B \sim A$. (In fact, this is true, as we will prove in the next theorem, but it does not follow immediately from the definition.)
• The matrix $P$ depends on $A$ and $B$. It is not unique for a given pair of similar matrices $A$ and $B$. To see this, simply take $A = B = I$, in which case $I \sim I$, since $P^{-1}IP = I$ for any invertible matrix $P$.

Example 4.22
Let $A = \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ -2 & -1 \end{bmatrix}$. Then $A \sim B$, since
$$\begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ -1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -2 & -1 \end{bmatrix}$$
Thus, $AP = PB$ with $P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$, which is invertible because $\det P = 2 \ne 0$. (Note that it is not necessary to compute $P^{-1}$. See the first Remark before Example 4.22.)

Theorem 4.21
Let $A$, $B$, and $C$ be $n \times n$ matrices.
a. $A \sim A$.
b. If $A \sim B$, then $B \sim A$.
c. If $A \sim B$ and $B \sim C$, then $A \sim C$.

Proof (a) This property follows from the fact that $I^{-1}AI = A$.
(b) If $A \sim B$, then $P^{-1}AP = B$ for some invertible matrix $P$. As noted in the first Remark following the definition of similarity, this is equivalent to $PBP^{-1} = A$. Setting $Q = P^{-1}$, we have $Q^{-1}BQ = (P^{-1})^{-1}BP^{-1} = PBP^{-1} = A$. Therefore, by definition, $B \sim A$.
(c) You are asked to prove property (c) in Exercise 30.

Remark Any relation satisfying the three properties of Theorem 4.21 is called an equivalence relation.
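Example 4.22 and the proof of Theorem 4.21(b) can be checked with exact arithmetic. The following minimal Python sketch uses `fractions.Fraction` so that $P^{-1}$ has exact entries. The helper names `mat_mul` and `inverse_2x2` are our own. The sketch confirms $AP = PB$, then forms $Q = P^{-1}$ and confirms $Q^{-1}BQ = A$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of an invertible 2x2 matrix, with exact Fraction entries."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, 2], [0, -1]]
B = [[1, 0], [-2, -1]]
P = [[1, -1], [1, 1]]

# Example 4.22: AP = PB, and det P = 2, so P is invertible and A ~ B.
assert mat_mul(A, P) == mat_mul(P, B) == [[3, 1], [-1, -1]]

# Theorem 4.21(b): with Q = P^{-1}, Q^{-1} B Q = A, so B ~ A.
Q = inverse_2x2(P)
print(mat_mul(mat_mul(inverse_2x2(Q), B), Q) == A)   # True
```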
Equivalence relations arise frequently in mathematics, and objects that are related via an equivalence relation usually share important properties. We are about to see that this is true of similar matrices.

Theorem 4.22
Let $A$ and $B$ be $n \times n$ matrices with $A \sim B$. Then
a. $\det A = \det B$.
b. $A$ is invertible if and only if $B$ is invertible.
c. $A$ and $B$ have the same rank.
d. $A$ and $B$ have the same characteristic polynomial.
e. $A$ and $B$ have the same eigenvalues.
f. $A^m \sim B^m$ for all integers $m \ge 0$.
g. If $A$ is invertible, then $A^m \sim B^m$ for all integers $m$.

Proof We prove (a) and (d) and leave the remaining properties as exercises. If $A \sim B$, then $P^{-1}AP = B$ for some invertible matrix $P$.
(a) Taking determinants of both sides, we have
$$\det B = \det(P^{-1}AP) = (\det P^{-1})(\det A)(\det P) = \left(\frac{1}{\det P}\right)(\det A)(\det P) = \det A$$
(d) The characteristic polynomial of $B$ is
$$\det(B - \lambda I) = \det(P^{-1}AP - \lambda I) = \det(P^{-1}AP - \lambda P^{-1}IP) = \det(P^{-1}AP - P^{-1}(\lambda I)P) = \det(P^{-1}(A - \lambda I)P) = \det(A - \lambda I)$$
with the last step following as in (a). Thus, $\det(B - \lambda I) = \det(A - \lambda I)$; that is, the characteristic polynomials of $B$ and $A$ are the same.

Remark Two matrices may have properties (a) through (e) (and more) in common and yet still not be similar. For example, take $A = I$, the $2 \times 2$ identity matrix, and a matrix $B \ne I$ [entries illegible in source]. Both have determinant 1 and rank 2, both are invertible, and both have characteristic polynomial $(1 - \lambda)^2$ and eigenvalues $\lambda_1 = \lambda_2 = 1$. But $A$ is not similar to $B$, since $P^{-1}AP = P^{-1}IP = I \ne B$ for any invertible matrix $P$.

Theorem 4.22 is most useful in showing that two matrices are not similar, since $A$ and $B$ cannot be similar if any of properties (a) through (e) fails.

Example 4.23
(a) The matrices $A$ and $B$ of this part [entries illegible in source] are not similar, since $\det A = -3$ but $\det B = 3$.
(b) $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 3 & -1 \end{bmatrix}$ are not similar, since the characteristic polynomial of $A$ is $\lambda^2 - 3\lambda - 4$ while that of $B$ is $\lambda^2 - 4$. (Check this.)
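For a $2 \times 2$ matrix, $\det(M - \lambda I) = \lambda^2 - \operatorname{tr}(M)\lambda + \det M$, so the "(Check this.)" above reduces to a trace and a determinant. The following minimal Python sketch relies on that fact. The helper name `char_poly_2x2` and the second matrix in the last check are our own choices. That last check illustrates the Remark: matching characteristic polynomials do not imply similarity.

```python
def char_poly_2x2(M):
    """Coefficients (1, -tr M, det M) of det(M - lambda*I) for a 2x2 matrix M."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)


A = [[1, 3], [2, 2]]
B = [[1, 1], [3, -1]]
print(char_poly_2x2(A))   # (1, -3, -4), i.e. lambda^2 - 3 lambda - 4
print(char_poly_2x2(B))   # (1, 0, -4),  i.e. lambda^2 - 4

# The converse fails: I and [[1, 1], [0, 1]] (our own choice of B != I) share
# the polynomial (1 - lambda)^2, yet I is similar only to itself.
assert char_poly_2x2([[1, 0], [0, 1]]) == char_poly_2x2([[1, 1], [0, 1]]) == (1, -2, 1)
```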
Note that $A$ and $B$ do have the same determinant and rank, however.

Diagonalization

The best possible situation is when a square matrix is similar to a diagonal matrix. As you are about to see, whether a matrix is diagonalizable is closely related to the eigenvalues and eigenvectors of the matrix.

Definition An $n \times n$ matrix $A$ is diagonalizable if there is a diagonal matrix $D$ such that $A$ is similar to $D$, that is, if there is an invertible $n \times n$ matrix $P$ such that $P^{-1}AP = D$.

Example 4.24
$A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ is diagonalizable since, if $P = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix}$ and $D = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$, then $P^{-1}AP = D$, as can be easily checked. (Actually, it is faster to check the equivalent statement $AP = PD$, since it does not require finding $P^{-1}$.)

Example 4.24 raises the question of where matrices $P$ and $D$ came from. Observe that the diagonal entries 4 and $-1$ of $D$ are the eigenvalues of $A$, since they are the roots of its characteristic polynomial, which we found in Example 4.23(b). The origin of matrix $P$ is less obvious, but, as we are about to demonstrate, its entries are obtained from the eigenvectors of $A$. Theorem 4.23 makes this connection precise.

Theorem 4.23
Let $A$ be an $n \times n$ matrix. Then $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.
More precisely, there exist an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$ if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A$ and the diagonal entries of $D$ are the eigenvalues of $A$ corresponding to the eigenvectors in $P$ in the same order.

Proof Suppose first that $A$ is similar to the diagonal matrix $D$ via $P^{-1}AP = D$ or, equivalently, $AP = PD$. Let the columns of $P$ be $\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n$ and let the diagonal entries of $D$ be $\lambda_1, \lambda_2, \dots, \lambda_n$. Then
$$A\begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \cdots & \mathbf{p}_n \end{bmatrix} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \cdots & \mathbf{p}_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \qquad (1)$$
or
$$\begin{bmatrix} A\mathbf{p}_1 & A\mathbf{p}_2 & \cdots & A\mathbf{p}_n \end{bmatrix} = \begin{bmatrix} \lambda_1\mathbf{p}_1 & \lambda_2\mathbf{p}_2 & \cdots & \lambda_n\mathbf{p}_n \end{bmatrix} \qquad (2)$$
where the right-hand side is just the column-row representation of the product $PD$. Equating columns, we have
$$A\mathbf{p}_1 = \lambda_1\mathbf{p}_1,\quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2,\quad \dots,\quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n$$
Since $P$ is invertible, its columns are linearly independent (and so, in particular, nonzero), by the Fundamental Theorem of Invertible Matrices. This proves that the column vectors of $P$ are eigenvectors of $A$ whose corresponding eigenvalues are the diagonal entries of $D$ in the same order.
Conversely, if $A$ has $n$ linearly independent eigenvectors $\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n$ with corresponding eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, respectively, then
$$A\mathbf{p}_1 = \lambda_1\mathbf{p}_1,\quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2,\quad \dots,\quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n$$
This implies Equation (2) above, which is equivalent to Equation (1). Consequently, if we take $P$ to be the $n \times n$ matrix with columns $\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n$, then Equation (1) becomes $AP = PD$. Since the columns of $P$ are linearly independent, the Fundamental Theorem of Invertible Matrices implies that $P$ is invertible, so $P^{-1}AP = D$; that is, $A$ is diagonalizable.

Example 4.25
If possible, find a matrix $P$ that diagonalizes
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & -5 & 4 \end{bmatrix}$$
Solution We studied this matrix in Example 4.18, where we discovered that it has eigenvalues $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 2$. The eigenspaces have the following bases:
For $\lambda_1 = \lambda_2 = 1$, $E_1$ has basis $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.
For $\lambda_3 = 2$, $E_2$ has basis $\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}$.
Since all other eigenvectors are just multiples of one of these two basis vectors, there cannot be three linearly independent eigenvectors. By Theorem 4.23, therefore, $A$ is not diagonalizable.

Example 4.26
If possible, find a matrix $P$ that diagonalizes
$$A = \begin{bmatrix} -1 & 0 & 1 \\ 3 & 0 & -3 \\ 1 & 0 & -1 \end{bmatrix}$$
Solution This is the matrix of Example 4.19. There, we found that the eigenvalues of $A$ are $\lambda_1 = \lambda_2 = 0$ and $\lambda_3 = -2$, with the following bases for the eigenspaces:
For $\lambda_1 = \lambda_2 = 0$, $E_0$ has basis $\mathbf{p}_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{p}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$.
For $\lambda_3 = -2$, $E_{-2}$ has basis $\mathbf{p}_3 = \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix}$.
It is straightforward to check that these three vectors are linearly independent.
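The claim that $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$ are linearly independent, and the check $AP = PD$ suggested in Example 4.24, can both be done mechanically. The following minimal Python sketch is our own illustration with hypothetical helper names. It computes $\det P$ by Gaussian elimination over exact fractions; a nonzero value means the columns are independent. It then compares $AP$ with $PD$, which avoids computing $P^{-1}$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det(M):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row interchange flips the sign
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return sign * result


def columns_to_matrix(*cols):
    return [list(row) for row in zip(*cols)]


def diag(*entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


# Example 4.24
A2, P2 = [[1, 3], [2, 2]], [[1, 3], [1, -2]]
assert mat_mul(A2, P2) == mat_mul(P2, diag(4, -1))

# Example 4.26
A = [[-1, 0, 1], [3, 0, -3], [1, 0, -1]]
P = columns_to_matrix([0, 1, 0], [1, 0, 1], [-1, 3, 1])   # columns p1, p2, p3
D = diag(0, 0, -2)
print(det(P))                            # -2, nonzero, so p1, p2, p3 are independent
assert mat_mul(A, P) == mat_mul(P, D)    # AP = PD, hence P^{-1} A P = D
```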
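Thus, if we take
$$P = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & 3 \\ 0 & 1 & 1 \end{bmatrix}$$
then $P$ is invertible. Furthermore,
$$P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -2 \end{bmatrix} = D$$
as can be easily checked. (If you are checking by hand, it is much easier to check the equivalent equation $AP = PD$.)

Remarks
• When there are enough eigenvectors, they can be placed into the columns of $P$ in any order. However, the eigenvalues will come up on the diagonal of $D$ in the same order as their corresponding eigenvectors in $P$. For example, if we had chosen
$$P = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_3 & \mathbf{p}_2 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 1 \\ 1 & 3 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$
then we would have found
$$P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
• In Example 4.26, you were asked to check that the eigenvectors $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ were linearly independent. Was it necessary to check this? We knew that $\{\mathbf{p}_1, \mathbf{p}_2\}$ was linearly independent, since it was a basis for the eigenspace $E_0$. We also knew that the sets $\{\mathbf{p}_1, \mathbf{p}_3\}$ and $\{\mathbf{p}_2, \mathbf{p}_3\}$ were linearly independent, by Theorem 4.20. But we could not conclude from this information that $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ was linearly independent. The next theorem, however, guarantees that linear independence is preserved when the bases of different eigenspaces are combined.

Theorem 4.24
Let $A$ be an $n \times n$ matrix and let $\lambda_1, \lambda_2, \dots, \lambda_k$ be distinct eigenvalues of $A$. If $\mathcal{B}_i$ is a basis for the eigenspace $E_{\lambda_i}$, then
$$\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2 \cup \cdots \cup \mathcal{B}_k$$
(i.e., the total collection of basis vectors for all of the eigenspaces) is linearly independent.

Proof Let $\mathcal{B}_i = \{\mathbf{v}_{i1}, \mathbf{v}_{i2}, \dots, \mathbf{v}_{in_i}\}$ for $i = 1, \dots, k$. We have to show that
$$\mathcal{B} = \{\mathbf{v}_{11}, \dots, \mathbf{v}_{1n_1}, \mathbf{v}_{21}, \dots, \mathbf{v}_{2n_2}, \dots, \mathbf{v}_{k1}, \dots, \mathbf{v}_{kn_k}\}$$
is linearly independent. Suppose some linear combination of these vectors is the zero vector, say,
$$(c_{11}\mathbf{v}_{11} + \cdots + c_{1n_1}\mathbf{v}_{1n_1}) + (c_{21}\mathbf{v}_{21} + \cdots + c_{2n_2}\mathbf{v}_{2n_2}) + \cdots + (c_{k1}\mathbf{v}_{k1} + \cdots + c_{kn_k}\mathbf{v}_{kn_k}) = \mathbf{0} \qquad (3)$$
Denoting the sums in parentheses by $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$, we can write Equation (3) as
$$\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k = \mathbf{0} \qquad (4)$$
Now each $\mathbf{x}_i$ is in $E_{\lambda_i}$ (why?) and so either is an eigenvector corresponding to $\lambda_i$ or is $\mathbf{0}$. Suppose some $\mathbf{x}_i$ were nonzero. Since the eigenvalues $\lambda_i$ are distinct, the nonzero $\mathbf{x}_i$ would be eigenvectors corresponding to distinct eigenvalues and hence linearly independent, by Theorem 4.20.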
Yet Equation ( 4) is a linear depen­ dence relationship; this is a contradiction. We conclude that Equation (3) must be trivial; that is, all of its coefficients are zero. Hence, B is linearly independent. There is one case in which diagonalizability is automatic: an n X n matrix with n distinct eigenvalues. Theorem 4 . 2 5 If A is an n X n matrix with n distinct eigenvalues, then A is diagonalizable. Let v1, v2 , . . . , vn be eigenvectors corresponding to the n distinct eigenvalues of A. (Why could there not be more than n such eigenvectors?) By Theorem 4.20, v1 , v2 , . . . , vn are linearly independent, so, by Theorem 4.23, A is diagonalizable. Proof � Exa m p l e 4 . 2 1 The matrix has eigenvalues A 1 2, A 2 5, and A 3 - 1 , by Theorem 4.15. Since these are three distinct eigenvalues for a 3 X 3 matrix, A is diagonalizable, by Theorem 4.25. (If we actu­ 1 ally require a matrix P such that p - AP is diagonal, we must still compute bases for the eigenspaces, as in Example 4. 1 9 and Example 4.26 above.) = = = 4 Section 4.4 Similarity and Diagonalization 301 The final theorem of this section is an important result that characterizes diagonaliz­ able matrices in terms of the two notions of multiplicity that were introduced following Example 4. 1 8. It gives precise conditions under which an n X n matrix can be diago­ nalized, even when it has fewer than n eigenvalues, as in Example 4.26. We first prove a lemma that holds whether or not a matrix is diagonalizable. Le m m a 4 . 2 6 A If is an n X n matrix, then the geometric multiplicity of each eigenvalue is less than or equal to its algebraic multiplicity. A A Q Suppose 1 is an eigenvalue of with geometric multiplicity p; that is, dim EA , = p. Specifically, let EA , have basis B 1 = {v1 , v2 , , vp }. Let be any invertible n X n matrix having v1 , v2 , , vp as its first p columns - say, Proof Q ( Y1 . • = or, as a partitioned matrix, · · Yp · Let C U Yp + l Q [U VJ = where is p X n. 
Since the columns of also have . . • . . . . Yn J i are eigenvectors corresponding to A 1 , AU A U. = 1 CU= CV= DU= DV �!.l:!!._l-�!.l:Y] LA[�_11-D�_f!U:_�-D�!.l:AVYJ [�-1�!'.L!DAV ��-�-] Q- 1AQ [�lD ] A [ U :' V J [-DAU!DAV 1 ( ) ( ) det Q A Q A P V A AI A 1 Q- 1AQ - AI) A, Q- 1AQ, A1 A A ... , Ak. A A from which we obtain IP , = = 0, and 0, = In- p · Therefore, = = By Exercise 69 in Section 4.2, it follows that = We det ( DA 0 I) (5) But det ( is the characteristic polynomial of which is the same as the characteristic polynomial of by Theorem 4.22(d) . Thus, Equation (5) implies that the algebraic multiplicity of is at least p, its geometric multiplicity. Theorem 4 . 2 1 The Diagonalization Theorem Let be an n X n matrix whose distinct eigenvalues are 1 , A 2 , statements are equivalent: The following a. is diagonalizable. b. The union B of the bases of the eigenspaces of (as in Theorem 4.24) contains n vectors. c. The algebraic multiplicity of each eigenvalue equals its geometric multiplicity. 308 Chapter 4 Eigenvalues and Eigenvectors a)=? (ebm) I4.f2A3.isIfdiagonalof thieszable eie,gtenvect hen it ohasrs corlirneearspondly intdependent eigenvec­ ttohrsen, bycont(Theor o t h e ei g enval ueare A;ly, a i n s at l e as t vect o r s . ( W e al r e ady know t h at t h es e vect o r s ar e l i n ng that micontghtaprinesventat letashemt vectfromorbeis. nButg a, basby Theor is for emis4.th2at4, tinhdependent eyis amilingearht lnoty; tihnesdependent ponlanyitt.h) iThus, (multbip)l=?icit(yc)ofLetA; bethe geometBy setLemma ricinmul4.tihence, p26,licid;ty ioft contA; beforaidn;s==exactdi1,m...ly andvectNowloerst t.ashesualmegebrthaatic property (b) holds. 
Then we also have iplicities ofy, the eiButgenvalIt foluloeswsofthAati·sd·j1·ust tdh2e degr=· · ·eesiofndcekth=tehchare suamcteofristth·iec· pol·algyebrnomiaiwhic amullcofhtA-namel implies that (6) Uscanindeduce g Lemmathat4.each26 agaisummand n, we knowin Equatthat ion-(6)d;is2:zer0 ofor; that=is1,, ...= d; from whi1, ...ch we fo r = ity has andd1 thde2 geometridck mul= tip1 licity d; are equal(c)for==?eachvect(a) Ioeifrtgsh,envalwhie alcguhebrearA;aeicoflinmulA,earttliyhpenliincdependent Thus,etmhes4.e2ar3.e linearly independent eigenvectors of A, and A is ,dibyagonalTheorizeablm e4., 2by4.Theor n Proof � B; n; n; n; B B n !R n ; :s m; m;. m1 + m 2 + + mk n Exa m p l e 4 . 2 8 n EA ; , k. i n, + mk + EA ; + + m1 + m 2 + + m k, m; i B n m; + + n. m; · · · , k, m + m2 + + , k. · · · (a) The matrix A = [ 2� - �5 4� i from Example 4. 1 8 has two distinct eigenvalues, A2 1but= Ageomet 2 = 1 andric mulA3 t=ipl2.icSiitynce1, Athies einotgenvaldiagonalue A1iz=ablAe2, by= 1thhase DialaggonalebraiiczatmuliontipTheo­licity rem. (See also Example-4.25.) (b) The matrix A = [ �l 0� - �1 ] fromExample 4. 1 9also hastwodistincteigenvaltipluicesit,yA2,1 =andA2 t=he0eiandgenvalA3 =ue--2.2Thehas eialggenval uc eand0 hasgeomet algebrriacicmulandtigeomet ricThusmul­, ebr a i p l i c i t y 1. tfihnidis matngsriinxExampl diagonale 4.i2z6.abl) e, by the Diagonalization Theorem. (This agrees with our tionWeof tconcl he powerude sthofis asectmatiornixwi. th an application of diagonalization to the computa­ Compute A10 if A = [ � � ]. In Example 4.2 1, we found that this matrix ]has eigenvalues]. A1 = - 1 and A2 = 2, with corresponding eigenvectors = [ _ � and = [ � It follows - is Exa m p l e 4 . 
2 9 i Solution v1 v2 Section 4.4 Similarity and Diagonalization 309 P-(fro1AmPany= Done, wherof ae number of theorems in this section) that A1 is diagonalizable and l J and D = [ - 0 2O J 2 Solvi1.ng for A, we have A = PDP- 1 and, by Theorem 4.22(f), An = PDnP - 1 for all Since [ ( -01 r 2On J we have An = PDnp- 1 = [ _ � � J [ ( -01 r �n J [ _ � � r l [ _ � � J [ ( -Ol )n 20n J [iI - iJ 2( - 1 )" + 2" H r' + 2· l [ 2( - 1 )•: + 2•+• r - 1r: + 2·+• Sisent ce=we10wertoefionlndy asked for A10, this is more than we needed. But now we can simply 10 ( - 1 ) 1 + 2 10 2 1 � ) ( + 1 2 2 AlO = [ 2( - 1 ) 1 + 2 1 :3 + 2 " ] � [68234 68343 ] 3 n 2: n (- !)' I Exercises 4 . 4 A A = [! � l = [� �J 2. A = [ - 42 6l J , [ 3 - l J 1 3.A = [ : 02 ! J .B � [- : 403 � ] 4.A � [ : - 21 - : J B � [ : 011 � ] In Exercises 1 -4, show that and B are not similar matrices. 1. B B= -5 7 P- 1AP = D. AA [ - 21 - � J [ � - � J [ � � J = [� �J 1 - 11 1 0 0 [! 1 -I 1 1 �m 0 J 0 0 � [: 0 -� l In Exercises 5-7, a diagonalization of the matrix is given in the form List the eigenvalues of and bases for the corresponding eigenspaces. 5. 6. l 6 -2 3 -3 Chapter 4 Eigenvalues and Eigenvectors 310 0 3 [ - ! -�I23 03 �m _ : ] 02 � [ � 0 - �l A A = -3 8.A = [ � � ] A = [ �] 0 IO.A = [ � 03 � ] A = [ : il 2 0 12.A = [ � 20 : J 13. A = H 0 il 02 00 03 02 14.A = 0 3 ] 15. A � 0 - 2 ] ll 0 0 ; l� 0 0 j [ -- 43 :r [ - � �r [ 4 - r6 [ � �r " � r � [� n [i 0 0 [ � - 2 ff [: 0 J A A = [� �] 24.A = [ � �] 0 26.A = [ � � ] A = [i 0 �] I 8 3 4 3 -3 7. 1 -1 -3 In Exercises 8-15, determine whether is diagonalizable and, if so, find an invertible matrix P and a diagonal matrix D such that P 1 P D. - 9. -1 11. 1 In Exercises 1 6-23, use the method of Example 4.29 to compute the indicated power of the matrix. 1 7. 1 6. 18. 1 9. -1 21. 20. 22. 1 1 -1 -1 23. In Exercises 24-29, find all (real) values ofkfor which is diagonalizable. 25. 27. 1 Pr o ve Theor e m 4. 2 ( c ) . l 31. PrProoveve Theor e m 4. 2 2( b ) . 
Theoreemm 4.4.222(2(ce)).. PrProoveve Theor Theor e m 4. 2 2( f ) . PrIfAoveandTheorB areemin4.ver22(tibgl)e. matrices, show that AB and BAProvearethsiatmiiflaAr. and Bare similar matrices, then tr(A) Sect= tri(oBn) .3.2.] Find a way to use Exercise 45 from A B = B. 38.A = [ � - � l B = [ � � ] 39.A = [ : -- 23 ],B = [ -l 4l ] 2 40.A = [ : - 20 : J B � [ : 22 -- 4S i 2 0 2 [ 3 41. A = [ : 0 � ] ,B = : 45 - � l PrProoveve tthhatat iiffAA iiss disimagonal ilar toizB,ablteh,ensoAisTAisT.similar to BT. matrix. Prove that ifA is diago­ nalLetProiveAzablbetheat,ansoifinAisverAis taibdileagonal eimatgenvalrix isucale letdhena A is of thiezablforemmatA rix wi(tShuchonlya one eivectLetgenvalAorands iufesandB. bePronlovey tihfABatmatA=andriBA.cesB, eachhavewithtehsamedisteiingctences. PruoesveofthAatandthe Balarge­e brtLetheaiAscamulme.andtiBplbeicitsiiems ioflarthmate eirgienval 30. 32. 33. 34. 35. 36. 37. [Hint: In general, it is difficult to show that two matrices are simi­ lar. However, if two similar matrices are diagonalizable, the task becomes easier. In Exercises 38-41, show that and are similar by showing that they are similar to the same diagonal matrix. Then find an invertible matrix P such that P - 1AP -6 1 -1 -1 42. 43. 44. - 45. 46. 47. A, i. scalar matrix.) nXn = AI. n Section 4.5 Iterative Methods for Computing Eigenvalues 48. ove that it ivects notorposs sible toifinnIRd6tshurchee ltihnatearly metLetthe sraicme.andmulBtibeplicsiiShow tmieislaofrtmaththate,reiiifcgesBenval. Prouvees tofhatthtandhene geo­everB arye iPrndependent and eivectgenvect o r of Bi s of t h e for m for s o me ei g en­ Ihf e eiisgdiensagonal izable, whatandare the dimensions of o r of t p aces Preverovey eithgatenvalif uies ofa diaigonal ierzableormattrhixensuchistihdatem­ 52.LetA = [ : �] . 
s ei t h potLetentbe(thaatniilsp,otent matrix (that is, for some Prove that is diagonalizable if and is not diagonalizable if Pr o ve t h at i f i s di a gonal i z abl e , t h en mus t beSupposthe zere tohatmatrisixa. matrix with characteristic Find two examples totdemons trateorthmayat ifnot be h en may diagonalizable. ( ( polynomial A 50. 51. v A.] A A m > 1). (a) A = P - 1AP, p - 1v [Hint: 49. 311 A A 2 = A). A 0 1, v1 , v2 , v3 Av1 = v1 , Av2 = v2 , Av3 = v3 . (b) A E2 ? E_ 1 , E 1 , A (a) Am = 0 4bc > 0 (a - d) 2 + (a - d) 2 + A 4 bc < 0. A (b) A 6X6 cA (,\) = ( 1 + ,\) 1 - ,\) 2 2 - ,\) 3 • (a - d) 2 + 4bc = 0, A Iterative M e t h o d s t o r C o m p ut i n g E i g e n v a l u e s Inematician 1824, theNielsNorwegian math­ Henri k Abel (1802-1829) proved thatpolynomial a general fiequation fth-degreeis not(quintic) that is,inthere isofnoitformula for its roots terms s coefficients that uses onlsubtraction, y the operations ofi­ addition, mul t i p l cati on,Indivision, andwrittaki ning 1830 nth roots. a paper ten and published posthumously in 1846, the French mathematician Evariste Galois (1811-1832) gave alished moreconditions complete theory that estab­ under which an arbitrary polynomial equation can bewassolinstrumental ved by radicals.in establishing Galois's work the branchhis approach of algebratocalled equations is now knownpolynomial as solvable by radicals; group theory; Galois theory. Atto stohlivsepoithnetchar, theaonlcteyrismettic hequatod weion.haveHowever for comput iengartehesevereigenvalal pruoesbleofmsa wimatthritxhiiss , t h er metit depends hod thatonrenderthe comput it impraatctioincalofina aldetl butermsimnantall exampl esis. 
Thea verfiyrsttiprme-oblconsemuismithnatg , whi c h pra polocesynomi s for alalrequat ge matiorn,ices.andThetherseecondare noprforoblmemulaiss tforhatstohlveicharng polactyenomiristicaequat ioinonsis l equat ofquadrdegrateiec forhigmherulathandan its(panalolynomi al.sThusof degr, weeesare forcanded to can be solvedeigusenvaling uthese o gues) t praalctariecalquiprteoblseensms.itivUnfor tuundoff nately,ermetrorhandods arfore tapprhereoforxime unratineglitahbleer. oots of a polin mosyInomi e t o r o cteristoirc fipolrstyandnomithaenl alustogetinghterhisandeigenvect take aodir tffoefirenntd apprthe coronach,streead,spapprondiweobypas xingmeiatgsienval ntgheancharue.eigIaenvect one such method that is based onn athsiismsplecte iioten,rawetivewitelchniexplque.ore several variations on 2, 3, 4 4 approximate Thethat power metgenvalhodueapplthatiesistloaranger in absmatolutreixvalthuate hasthanaall of the other eigenvalues. i s , an ei Onandthuee.otthherenhand,is athmate domirix nwiantth eieiForggenval envalexampluuese, es,inifcea matrixandhas eihasgenvalno 2domiues n2ant eigenval meta sheodquenceproceedsof vectiteorrastitvhelatyconver to produceges toa tshequence ofpondiscalnargseithgatenvec­con­ vertor gThees tthopower and e cor r e s sume metthathtod.he matrix is diagonaliezable. The following theorForemsiismtplheicbasity,isweforwithl easpower The Power Method nXn 4 = 1 -41 > l -31 - 4, - 3, 3, 4 v1 , ,\ 1 dominant eigenvector. dominant eigenvalue A 1 - - 4, - 3, 1, I l l. 131 3, -4 A 312 Chapter 4 Eigenvalues and Eigenvectors Theorem 4 . 2 8 an o vectdioargonalsuichzabltheatmattherisxequence with domiof vectnantoreis xgenval u e Then t h er e exiLetsts abenonzer defi n ed by k , approaches a dominant eigenvector of We may assume that the eigenvalues2: of have2: · ·been· 2: labeled so that be),ththeeycorforersmpondia basngiseifogrenvectConsors. Siequent nce ly, we...can, vwrn aritee linearaslya ilLetinndependent ( w hy? 
Now $x_1 = Ax_0$, $x_2 = Ax_1 = A(Ax_0) = A^2x_0$, $x_3 = Ax_2 = A(A^2x_0) = A^3x_0$, and, generally,
$$x_k = A^kx_0 \quad \text{for } k \ge 1$$
As we saw in Example 4.21,
$$A^kx_0 = c_1\lambda_1^kv_1 + c_2\lambda_2^kv_2 + \cdots + c_n\lambda_n^kv_n = \lambda_1^k\left(c_1v_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^kv_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^kv_n\right) \qquad (1)$$
where we have used the fact that $\lambda_1 \ne 0$. The fact that $\lambda_1$ is the dominant eigenvalue means that each of the fractions $\lambda_2/\lambda_1, \lambda_3/\lambda_1, \ldots, \lambda_n/\lambda_1$ is less than 1 in absolute value. Thus,
$$\left(\frac{\lambda_2}{\lambda_1}\right)^k, \left(\frac{\lambda_3}{\lambda_1}\right)^k, \ldots, \left(\frac{\lambda_n}{\lambda_1}\right)^k$$
all go to zero as $k \to \infty$. It follows that
$$x_k = A^kx_0 \to \lambda_1^kc_1v_1 \quad \text{as } k \to \infty \qquad (2)$$
Now, since $\lambda_1 \ne 0$ and $v_1 \ne 0$, $x_k$ is approaching a nonzero multiple of $v_1$ (that is, an eigenvector corresponding to $\lambda_1$) provided $c_1 \ne 0$. (This is the required condition on the initial vector $x_0$: It must have a nonzero component $c_1$ in the direction of the dominant eigenvector $v_1$.)

Example 4.30  Approximate the dominant eigenvector of $A = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}$ using the method of Theorem 4.28.

Solution  We will take $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ as the initial vector. Then
$$x_1 = Ax_0 = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad x_2 = Ax_1 = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$$
We continue in this fashion to obtain the values of $x_k$ in Table 4.1.

Table 4.1
k    x_k           r_k     l_k
0    [1, 0]        –       1.00
1    [1, 2]        0.50    3.00
2    [3, 2]        1.50    1.67
3    [5, 6]        0.83    2.20
4    [11, 10]      1.10    1.91
5    [21, 22]      0.95    2.05
6    [43, 42]      1.02    1.98
7    [85, 86]      0.99    2.01
8    [171, 170]    1.01    –

[Figure 4.14]
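As a quick check on Table 4.1, the following minimal sketch repeats the unscaled iteration $x_k = Ax_{k-1}$ of Theorem 4.28 for the matrix of Example 4.30, using exact integer arithmetic. The helper names `mat_vec` and `power_iterates` are our own, not the text's. It prints $r_k$ (first component of $x_k$ over its second) and $l_k$ (first component of $x_{k+1}$ over that of $x_k$).

```python
def mat_vec(A, x):
    """Return the product A x for a matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def power_iterates(A, x0, k):
    """Return [x_0, x_1, ..., x_k] where x_j = A x_{j-1} (no scaling)."""
    xs = [list(x0)]
    for _ in range(k):
        xs.append(mat_vec(A, xs[-1]))
    return xs


A = [[1, 1],
     [2, 0]]                      # the matrix of Example 4.30
xs = power_iterates(A, [1, 0], 8)

for k, x in enumerate(xs):
    # r_k: first component of x_k divided by its second component
    r = f"{x[0] / x[1]:.2f}" if x[1] != 0 else "-"
    # l_k: first component of x_{k+1} divided by that of x_k
    l = f"{xs[k + 1][0] / x[0]:.2f}" if k + 1 < len(xs) else "-"
    print(k, x, r, l)
```

Python integers have arbitrary precision, so the rapid growth of the components is harmless here. In fixed-precision floating point the same growth is a real drawback, and that is what scaling addresses.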
Figure 4.14 shows what is happening geometrically. We know that the eigenspace for the dominant eigenvector will have dimension 1. (Why? See Exercise 46.) Therefore, it is a line through the origin in $\mathbb{R}^2$. The first few iterates $x_k$ are shown along with the directions they determine. It appears as though the iterates are converging on the line whose direction vector is $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. To confirm that this is the dominant eigenvector we seek, we need only observe that the ratio $r_k$ of the first to the second component of $x_k$ gets very close to 1 as $k$ increases. The $r_k$ column of Table 4.1 gives these values, and you can see clearly that $r_k$ is indeed approaching 1. We deduce that a dominant eigenvector of $A$ is $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Once we have found a dominant eigenvector, how can we find the corresponding dominant eigenvalue? One approach is to observe that if $x_k$ is approximately a dominant eigenvector of $A$ for the dominant eigenvalue $\lambda_1$, then
$$x_{k+1} = Ax_k \approx \lambda_1x_k$$
It follows that the ratio $l_k$ of the first component of $x_{k+1}$ to that of $x_k$ will approach $\lambda_1$ as $k$ increases. Table 4.1 gives the values of $l_k$, and you can see that they are approaching 2, which is the dominant eigenvalue.

There is a drawback to the method of Example 4.30: The components of the iterates $x_k$ get very large very quickly and can cause significant roundoff errors. To avoid this drawback, we can multiply each iterate by some scalar that reduces the magnitude of its components. Since scalar multiples of the iterates $x_k$ will still converge to a dominant eigenvector, this approach is acceptable. There are various ways to accomplish it. One is to normalize each $x_k$ by dividing it by $\|x_k\|$ (i.e., to make each iterate a unit vector). An easier method, and the one we will use, is to divide each $x_k$ by the component with the maximum absolute value, so that the largest component is now 1. This method is called scaling. Thus, if $m_k$ denotes the component of $x_k$ with the maximum absolute value, we will replace $x_k$ by $y_k = (1/m_k)x_k$.

We illustrate this approach with the calculations from Example 4.30. For $x_0$, there is nothing to do, since $m_0 = 1$.
Hence,
$$y_0 = x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
We then compute $x_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ as before, but now we scale with $m_1 = 2$ to get
$$y_1 = \left(\frac{1}{2}\right)x_1 = \left(\frac{1}{2}\right)\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}$$
Now the calculations change. We take
$$x_2 = Ay_1 = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 0.5 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$
and scale to get
$$y_2 = \left(\frac{1}{1.5}\right)\begin{bmatrix} 1.5 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0.67 \end{bmatrix}$$
The next few calculations are summarized in Table 4.2.

Table 4.2
k    x_k             y_k           m_k
0    [1, 0]          [1, 0]        1
1    [1, 2]          [0.5, 1]      2
2    [1.5, 1]        [1, 0.67]     1.5
3    [1.67, 2]       [0.83, 1]     2
4    [1.83, 1.67]    [1, 0.91]     1.83
5    [1.91, 2]       [0.95, 1]     2
6    [1.95, 1.91]    [1, 0.98]     1.95
7    [1.98, 2]       [0.99, 1]     2
8    [1.99, 1.98]    [1, 0.99]     1.99

You can now see clearly that the sequence of vectors $y_k$ is converging to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, a dominant eigenvector. Moreover, the sequence of scalars $m_k$ converges to the corresponding dominant eigenvalue $\lambda_1 = 2$.

This method, called the power method, is summarized below.

The Power Method
Let $A$ be a diagonalizable n × n matrix with a corresponding dominant eigenvalue $\lambda_1$.
1. Let $x_0 = y_0$ be any initial vector in $\mathbb{R}^n$ whose largest component is 1.
2. Repeat the following steps for $k = 1, 2, \ldots$:
   (a) Compute $x_k = Ay_{k-1}$.
   (b) Let $m_k$ be the component of $x_k$ with the largest absolute value.
   (c) Set $y_k = (1/m_k)x_k$.
For most choices of $x_0$, $m_k$ converges to the dominant eigenvalue $\lambda_1$ and $y_k$ converges to a dominant eigenvector.

Example 4.31  Use the power method to approximate the dominant eigenvalue and a dominant eigenvector of
$$A = \begin{bmatrix} 0 & 5 & -6 \\ -4 & 12 & -12 \\ -2 & -2 & 10 \end{bmatrix}$$

Solution  Taking as our initial vector $x_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, we compute the entries in Table 4.3.

Table 4.3
k    x_k                        y_k                   m_k
0    [1, 1, 1]                  [1, 1, 1]             1
1    [−1, −4, 6]                [−0.17, −0.67, 1]     6
2    [−9.33, −19.33, 11.67]     [0.48, 1, −0.60]      −19.33
3    [8.62, 17.31, −9.00]       [0.50, 1, −0.52]      17.31
4    [8.12, 16.25, −8.20]       [0.50, 1, −0.50]      16.25
5    [8.03, 16.05, −8.04]       [0.50, 1, −0.50]      16.05
6    [8.01, 16.01, −8.01]       [0.50, 1, −0.50]      16.01
7    [8.00, 16.00, −8.00]       [0.50, 1, −0.50]      16.00
8    [8.00, 16.00, −8.00]       [0.50, 1, −0.50]      16.00

You can see that the vectors $y_k$ are approaching $\begin{bmatrix} 0.5 \\ 1 \\ -0.5 \end{bmatrix}$ and the scalars $m_k$ are approaching 16. This suggests that they are, respectively, a dominant eigenvector and the dominant eigenvalue of $A$.
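The boxed algorithm translates almost line for line into code. The sketch below is a minimal pure-Python version; the function name `power_method` is our own. It follows steps 1 and 2(a)–(c) but runs a fixed number of iterations rather than testing for convergence, and it applies the method to the matrix of Example 4.31. When several components tie for the largest absolute value, Python's `max` returns the first one, which is an arbitrary but harmless choice.

```python
def mat_vec(A, x):
    """Return the product A x for a matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def power_method(A, x0, iterations):
    """Scaled power method; returns the lists [m_1, ...] and [y_1, ...].

    x0 is assumed to have largest component 1 (step 1 of the method).
    """
    y = list(x0)
    ms, ys = [], []
    for _ in range(iterations):
        x = mat_vec(A, y)              # step 2(a): x_k = A y_{k-1}
        m = max(x, key=abs)            # step 2(b): entry of largest absolute value
        y = [xi / m for xi in x]       # step 2(c): y_k = (1/m_k) x_k
        ms.append(m)
        ys.append(y)
    return ms, ys


A = [[0, 5, -6],
     [-4, 12, -12],
     [-2, -2, 10]]                     # the matrix of Example 4.31
ms, ys = power_method(A, [1, 1, 1], 8)
print([round(m, 2) for m in ms])       # compare with the m_k column of Table 4.3
print([round(v, 2) for v in ys[-1]])   # compare with y_8
```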
Remarks
• If the initial vector $x_0$ has a zero component in the direction of the dominant eigenvector $v_1$ (i.e., if $c_1 = 0$ in the proof of Theorem 4.28), then the power method will not converge to a dominant eigenvector. However, it is quite likely that during the calculation of the subsequent iterates, at some point roundoff error will produce an $x_k$ with a nonzero component in the direction of $v_1$. The power method will then start to converge to a multiple of $v_1$. (This is one instance where roundoff errors actually help!)
• The power method still works when there is a repeated dominant eigenvalue, or even when the matrix is not diagonalizable, under certain conditions. Details may be found in most modern textbooks on numerical analysis. (See Exercises 21–24.)
• For some matrices the power method converges rapidly to a dominant eigenvector, while for others the convergence may be quite slow. A careful look at the proof of Theorem 4.28 reveals why. Since $|\lambda_2/\lambda_1| \ge |\lambda_3/\lambda_1| \ge \cdots \ge |\lambda_n/\lambda_1|$, if $|\lambda_2/\lambda_1|$ is close to zero, then $(\lambda_2/\lambda_1)^k, \ldots, (\lambda_n/\lambda_1)^k$ will all approach zero rapidly. Equation (2) then shows that $x_k = A^kx_0$ will approach $\lambda_1^kc_1v_1$ rapidly too.
As an illustration, consider Example 4.31. The eigenvalues are 16, 4, and 2, so $\lambda_2/\lambda_1 = 4/16 = 0.25$. Since $0.25^7 \approx 0.00006$, by the seventh iteration we should have close to four-decimal-place accuracy. This is exactly what we saw.
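The third Remark predicts that the error shrinks by roughly the factor $|\lambda_2/\lambda_1|$ per iteration. The short sketch below tests this for the matrix of Example 4.31, whose eigenvalues are 16, 4, and 2. It is a minimal illustration that reruns the scaled iteration from the box, records $|m_k - 16|$, and prints the ratios of successive errors, which should settle near $4/16 = 0.25$.

```python
A = [[0, 5, -6],
     [-4, 12, -12],
     [-2, -2, 10]]          # Example 4.31; eigenvalues 16, 4, 2
LAMBDA1 = 16                # dominant eigenvalue

y = [1.0, 1.0, 1.0]
errors = []
for _ in range(12):
    x = [sum(a * yi for a, yi in zip(row, y)) for row in A]   # x_k = A y_{k-1}
    m = max(x, key=abs)                                       # scaling factor m_k
    y = [xi / m for xi in x]                                  # y_k
    errors.append(abs(m - LAMBDA1))

# ratio of consecutive errors |m_{k+1} - 16| / |m_k - 16|
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print([round(r, 3) for r in ratios[-5:]])
```

The first few ratios are distorted by the initial steps and by the component along the eigenvalue 2. The Remark describes only the asymptotic factor.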
John William Strutt (1842–1919), Baron Rayleigh, was a British physicist who made major contributions to the fields of acoustics and optics. In 1871, he gave the first correct explanation of why the sky is blue, and in 1895, he discovered the inert gas argon, for which discovery he received the Nobel Prize in 1904. Rayleigh was president of the Royal Society from 1905 to 1908 and became chancellor of Cambridge University in 1908. He used Rayleigh quotients in an 1873 paper on vibrating systems and later in his book The Theory of Sound.

There is an alternative way to estimate the dominant eigenvalue $\lambda_1$ of a matrix $A$ in conjunction with the power method. First, observe that if $Ax = \lambda_1x$, then
$$\frac{(Ax)\cdot x}{x\cdot x} = \frac{(\lambda_1x)\cdot x}{x\cdot x} = \frac{\lambda_1(x\cdot x)}{x\cdot x} = \lambda_1$$
The expression $R(x) = ((Ax)\cdot x)/(x\cdot x)$ is called a Rayleigh quotient. As we compute the iterates $x_k$, the successive Rayleigh quotients $R(x_k)$ should approach $\lambda_1$. In fact, for symmetric matrices, the Rayleigh quotient method is about twice as fast as the scaling factor method. (See Exercises 17–20.)

The Shifted Power Method and the Inverse Power Method

The power method can help us approximate the dominant eigenvalue of a matrix, but what should we do if we want the other eigenvalues? Fortunately, there are several variations of the power method that can be applied. The shifted power method uses the observation that, if $\lambda$ is an eigenvalue of $A$, then $\lambda - \alpha$ is an eigenvalue of $A - \alpha I$ for any scalar $\alpha$ (Exercise 22 in Section 4.3). Thus, if $\lambda_1$ is the dominant eigenvalue of $A$, the eigenvalues of $A - \lambda_1I$ will be $0, \lambda_2 - \lambda_1, \lambda_3 - \lambda_1, \ldots, \lambda_n - \lambda_1$. We can then apply the power method to compute $\lambda_2 - \lambda_1$, and from this value we can find $\lambda_2$. Repeating this process will allow us to compute all of the eigenvalues.

Example 4.32  Use the shifted power method to compute the second eigenvalue of the matrix $A = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}$ from Example 4.30.

Solution  In Example 4.30, we found that $\lambda_1 = 2$. To find $\lambda_2$, we apply the power method to
$$A - 2I = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}$$
We take $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, but other choices will also work. The calculations are summarized in Table 4.4.

Table 4.4
k    x_k          y_k          m_k
0    [1, 0]       [1, 0]       1
1    [−1, 2]      [−0.5, 1]    2
2    [1.5, −3]    [−0.5, 1]    −3
3    [1.5, −3]    [−0.5, 1]    −3
4    [1.5, −3]    [−0.5, 1]    −3

Our choice of $x_0$ has produced the eigenvalue −3 after only two iterations. Therefore, $\lambda_2 - \lambda_1 = -3$, so $\lambda_2 = \lambda_1 - 3 = 2 - 3 = -1$ is the second eigenvalue of $A$.
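Both ideas in this part of the section are easy to try numerically. The sketch below computes the Rayleigh quotient $R(y_k)$ of a scaled power-method iterate for the matrix of Example 4.30. It then runs the shifted power method of Example 4.32 on $A - 2I$, assuming, as the example does, that $\lambda_1 = 2$ is already known. The helper names are hypothetical. This $A$ is not symmetric, so the sketch shows only that $R(y_k)$ approaches $\lambda_1$, not the faster convergence stated for symmetric matrices.

```python
def mat_vec(A, x):
    """Return the product A x for a matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rayleigh(A, x):
    """Rayleigh quotient R(x) = ((A x) . x) / (x . x)."""
    return dot(mat_vec(A, x), x) / dot(x, x)


def power_method(A, x0, iterations):
    """Scaled power method; returns the last scaling factor m_k and y_k."""
    y, m = list(x0), None
    for _ in range(iterations):
        x = mat_vec(A, y)
        m = max(x, key=abs)
        y = [xi / m for xi in x]
    return m, y


A = [[1, 1],
     [2, 0]]                            # Example 4.30
m, y = power_method(A, [1, 0], 20)
print("m_20 =", m, " R(y_20) =", rayleigh(A, y))

# Shifted power method (Example 4.32): power method on A - lambda1 * I.
lambda1 = 2
shifted = [[A[i][j] - (lambda1 if i == j else 0) for j in range(2)]
           for i in range(2)]
m_shift, _ = power_method(shifted, [1, 0], 4)
print("lambda2 - lambda1 =", m_shift, " so lambda2 =", lambda1 + m_shift)
```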
Recall from property (b) of Theorem 4.18 that, if $A$ is invertible with eigenvalue $\lambda$, then $A^{-1}$ has eigenvalue $1/\lambda$. Therefore, if we apply the power method to $A^{-1}$, its dominant eigenvalue will be the reciprocal of the smallest (in magnitude) eigenvalue of $A$. To use this inverse power method, we follow the same steps as in the power method, except that in step 2(a) we compute $x_k = A^{-1}y_{k-1}$. (In practice, we don't actually compute $A^{-1}$ explicitly; instead, we solve the equivalent equation $Ax_k = y_{k-1}$ for $x_k$ using Gaussian elimination. This turns out to be faster.)

Example 4.33  Use the inverse power method to compute the second eigenvalue of the matrix $A = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}$ from Example 4.30.

Solution  We start, as in Example 4.30, with $x_0 = y_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. To solve $Ax_1 = y_0$, we use row reduction:
$$[A \mid y_0] = \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 0 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 1 \end{array}\right]$$
Hence, $x_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so $y_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Then we get $x_2$ from $Ax_2 = y_1$:
$$[A \mid y_1] = \left[\begin{array}{cc|c} 1 & 1 & 0 \\ 2 & 0 & 1 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 1 & 0 & 0.5 \\ 0 & 1 & -0.5 \end{array}\right]$$
Hence, $x_2 = \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix}$, and, by scaling, we get $y_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Continuing, we get the values shown in Table 4.5, where the values $m_k$ are converging to −1. Thus, the smallest eigenvalue of $A$ is the reciprocal of −1 (which is also −1). This agrees with our previous finding in Example 4.32.

Table 4.5
k    x_k             y_k           m_k
0    [1, 0]          [1, 0]        1
1    [0, 1]          [0, 1]        1
2    [0.5, −0.5]     [1, −1]       0.5
3    [−0.5, 1.5]     [−0.33, 1]    1.5
4    [0.5, −0.83]    [−0.6, 1]     −0.83
5    [0.5, −1.1]     [−0.45, 1]    −1.1
6    [0.5, −0.95]    [−0.52, 1]    −0.95
7    [0.5, −1.02]    [−0.49, 1]    −1.02
8    [0.5, −0.99]    [−0.51, 1]    −0.99
9    [0.5, −1.01]    [−0.50, 1]    −1.01
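As the text notes, the inverse power method never forms $A^{-1}$. Each step solves $Ax_k = y_{k-1}$ instead. The sketch below does this with Gaussian elimination and applies it to the matrix of Example 4.33. The helper names are hypothetical, and partial pivoting is a standard safeguard we have added, not something the example requires. For simplicity the sketch re-eliminates at every step; a practical implementation would factor $A$ once (for example, as an LU factorization) and reuse the factors.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]          # bring the pivot row up
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def inverse_power_method(A, x0, iterations):
    """Power method applied to A^{-1}, without ever computing A^{-1}."""
    y, m = list(x0), None
    for _ in range(iterations):
        x = solve(A, y)               # step 2(a) becomes: solve A x_k = y_{k-1}
        m = max(x, key=abs)
        y = [xi / m for xi in x]
    return m, y


A = [[1, 1],
     [2, 0]]                          # Example 4.33
m, y = inverse_power_method(A, [1, 0], 30)
print("m_30 =", m, " smallest eigenvalue of A is about", 1 / m)
```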
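The Shifted Inverse Power Method

The most versatile of the variants of the power method is one that combines the two just mentioned. It can be used to find an approximation for any eigenvalue, provided we have a close approximation to that eigenvalue. In other words, if a scalar $\alpha$ is given, the shifted inverse power method will find the eigenvalue $\lambda$ of $A$ that is closest to $\alpha$.

If $\lambda$ is an eigenvalue of $A$ and $\alpha$ is not an eigenvalue of $A$, then $A - \alpha I$ is invertible and $1/(\lambda - \alpha)$ is an eigenvalue of $(A - \alpha I)^{-1}$. (See Exercise 45.) If $\alpha$ is closer to $\lambda$ than to any other eigenvalue of $A$, then $1/(\lambda - \alpha)$ will be the dominant eigenvalue of $(A - \alpha I)^{-1}$. In fact, if $\alpha$ is very close to $\lambda$, then $1/(\lambda - \alpha)$ will be much bigger in magnitude than the next eigenvalue, so (as noted in the third Remark following Example 4.31) the convergence will be very rapid.

Example 4.34  Use the shifted inverse power method to approximate the eigenvalue of
$$A = \begin{bmatrix} 0 & 5 & -6 \\ -4 & 12 & -12 \\ -2 & -2 & 10 \end{bmatrix}$$
that is closest to 5.

Solution  Shifting, we have
$$A - 5I = \begin{bmatrix} -5 & 5 & -6 \\ -4 & 7 & -12 \\ -2 & -2 & 5 \end{bmatrix}$$
Now we apply the inverse power method with $x_0 = y_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. We solve $(A - 5I)x_1 = y_0$ for $x_1$:
$$[A - 5I \mid y_0] = \left[\begin{array}{ccc|c} -5 & 5 & -6 & 1 \\ -4 & 7 & -12 & 1 \\ -2 & -2 & 5 & 1 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & -0.61 \\ 0 & 1 & 0 & -0.88 \\ 0 & 0 & 1 & -0.39 \end{array}\right]$$
This gives
$$x_1 = \begin{bmatrix} -0.61 \\ -0.88 \\ -0.39 \end{bmatrix}, \quad m_1 = -0.88, \quad \text{and} \quad y_1 = \frac{1}{-0.88}x_1 = \begin{bmatrix} 0.69 \\ 1 \\ 0.45 \end{bmatrix}$$
We continue in this fashion to obtain the values in Table 4.6, from which we deduce that the eigenvalue of $A$ closest to 5 is approximately $5 + 1/m_7 \approx 5 + 1/(-1) = 4$, which, in fact, is exact.

Table 4.6
k    x_k                        y_k                 m_k
0    [1, 1, 1]                  [1, 1, 1]           1
1    [−0.61, −0.88, −0.39]      [0.69, 1, 0.45]     −0.88
2    [−0.41, −0.69, −0.35]      [0.59, 1, 0.51]     −0.69
3    [−0.47, −0.89, −0.44]      [0.53, 1, 0.50]     −0.89
4    [−0.49, −0.95, −0.48]      [0.51, 1, 0.50]     −0.95
5    [−0.50, −0.98, −0.49]      [0.50, 1, 0.50]     −0.98
6    [−0.50, −0.99, −0.50]      [0.50, 1, 0.50]     −0.99
7    [−0.50, −1.00, −0.50]      [0.50, 1, 0.50]     −1.00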
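Combining the shift with the inverse iteration gives a short routine. The sketch below reproduces Example 4.34 with $\alpha = 5$ and recovers the eigenvalue as $\alpha + 1/m_k$, since $m_k$ approaches $1/(\lambda - \alpha)$. `shifted_inverse_power` is a hypothetical name. `solve` is a small Gaussian-elimination helper with partial pivoting, included so the block runs on its own.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def shifted_inverse_power(A, alpha, x0, iterations):
    """Approximate the eigenvalue of A closest to alpha and an eigenvector."""
    n = len(A)
    B = [[A[i][j] - (alpha if i == j else 0) for j in range(n)] for i in range(n)]
    y, m = list(x0), None
    for _ in range(iterations):
        x = solve(B, y)                 # solve (A - alpha I) x_k = y_{k-1}
        m = max(x, key=abs)
        y = [xi / m for xi in x]
    return alpha + 1 / m, y             # m_k -> 1/(lambda - alpha)


A = [[0, 5, -6],
     [-4, 12, -12],
     [-2, -2, 10]]                      # Example 4.34
lam, v = shifted_inverse_power(A, 5, [1, 1, 1], 30)
print("eigenvalue closest to 5:", lam)
print("eigenvector:", [round(c, 4) for c in v])
```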
The power method and its variants represent only one approach to the computation of eigenvalues. In Chapter 5, we will discuss another method based on the QR factorization of a matrix. For a more complete treatment of this topic, you can consult almost any textbook on numerical methods.

Gerschgorin's Theorem

In this section, we have discussed several variations on the power method for approximating the eigenvalues of a matrix. All of these methods are iterative, and the speed with which they converge depends on the choice of initial vector. If only we had some "inside information" about the location of the eigenvalues of a given matrix, then we could make a judicious choice of the initial vector and perhaps speed up the convergence of the iterative process.

Fortunately, there is a way to estimate the location of the eigenvalues of any matrix. Gerschgorin's Disk Theorem states that the eigenvalues of a (real or complex) n × n matrix all lie inside the union of n circular disks in the complex plane.

We owe this theorem to the Russian mathematician S. Gerschgorin (1901–1933), who stated it in 1931. It did not receive much attention until 1949, when it was resurrected by Olga Taussky-Todd in a note she published in the American Mathematical Monthly.
Definition  Let $A = [a_{ij}]$ be a (real or complex) n × n matrix, and let $r_i$ denote the sum of the absolute values of the off-diagonal entries in the ith row of $A$; that is, $r_i = \sum_{j \ne i} |a_{ij}|$. The ith Gerschgorin disk is the circular disk $D_i$ in the complex plane with center $a_{ii}$ and radius $r_i$. That is,
$$D_i = \{z \in \mathbb{C} : |z - a_{ii}| \le r_i\}$$

Olga Taussky-Todd (1906–1995) was born in Olmütz in the Austro-Hungarian Empire (now Olomouc in the Czech Republic). She received her doctorate in number theory from the University of Vienna in 1930. During World War II, she worked for the National Physical Laboratory in London, where she investigated the problem of flutter in the wings of supersonic aircraft. Although the problem involved differential equations, the stability of an aircraft depended on the eigenvalues of a related matrix. Taussky-Todd remembered Gerschgorin's Theorem from her graduate studies in Vienna and was able to use it to simplify the otherwise laborious computations needed to determine the eigenvalues relevant to the flutter problem. Taussky-Todd moved to the United States in 1947, and ten years later she became the first woman appointed to the California Institute of Technology. In her career, she produced over 200 publications and received numerous awards. She was instrumental in the development of the branch of mathematics now known as matrix theory.

Example 4.35  Sketch the Gerschgorin disks and the eigenvalues for the following matrices:
(a) $A = \begin{bmatrix} 2 & 1 \\ 2 & -3 \end{bmatrix}$  (b) $A = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$

Solution  (a) The two Gerschgorin disks are centered at 2 and −3 with radii 1 and 2, respectively. The characteristic polynomial of $A$ is $\lambda^2 + \lambda - 8$, so the eigenvalues are
$$\lambda = \left(-1 \pm \sqrt{1^2 - 4(-8)}\right)/2 \approx 2.37,\ -3.37$$
Figure 4.15 shows that the eigenvalues are contained within the two Gerschgorin disks.
(b) The two Gerschgorin disks are centered at 1 and 3 with radii $|-3| = 3$ and 2, respectively.
The characteristic polynomial of $A$ is $\lambda^2 - 4\lambda + 9$, so the eigenvalues are
$$\lambda = \left(4 \pm \sqrt{(-4)^2 - 4(9)}\right)/2 = 2 \pm i\sqrt{5} \approx 2 + 2.24i,\ 2 - 2.24i$$
Figure 4.16 plots the location of the eigenvalues relative to the Gerschgorin disks.

[Figure 4.15]
[Figure 4.16]

As Example 4.35 suggests, the eigenvalues of a matrix are contained within its Gerschgorin disks. The next theorem verifies that this is so.

Theorem 4.29  Gerschgorin's Disk Theorem
Let $A$ be an n × n (real or complex) matrix. Then every eigenvalue of $A$ is contained within a Gerschgorin disk.

Proof  Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$. Let $x_i$ be the entry of $x$ with the largest absolute value, and hence nonzero. (Why?) Then $Ax = \lambda x$, the ith row of which is
$$\sum_{j=1}^{n} a_{ij}x_j = \lambda x_i$$
Rearranging, we have
$$(\lambda - a_{ii})x_i = \sum_{j \ne i} a_{ij}x_j \quad \text{or} \quad \lambda - a_{ii} = \frac{\sum_{j \ne i} a_{ij}x_j}{x_i}$$
because $x_i \ne 0$. Taking absolute values and using properties of absolute value (see Appendix C), we obtain
$$|\lambda - a_{ii}| = \frac{\left|\sum_{j \ne i} a_{ij}x_j\right|}{|x_i|} \le \frac{\sum_{j \ne i}|a_{ij}||x_j|}{|x_i|} \le \frac{\sum_{j \ne i}|a_{ij}||x_i|}{|x_i|} = \sum_{j \ne i}|a_{ij}| = r_i$$
because $|x_j| \le |x_i|$ for $j \ne i$. This establishes that the eigenvalue $\lambda$ is contained within the Gerschgorin disk centered at $a_{ii}$ with radius $r_i$.
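Theorem 4.29 is easy to check numerically on Example 4.35. The sketch below builds the row disks $D_i$ and computes the eigenvalues of each 2 × 2 matrix by applying the quadratic formula to $\lambda^2 - (\operatorname{tr}A)\lambda + \det A$. It then tests whether each eigenvalue lies in at least one disk. It uses plain Python with the standard `cmath` module; the function names are our own, and a small tolerance guards against floating-point rounding.

```python
import cmath


def gerschgorin_disks(A):
    """Return a list of (center a_ii, radius r_i) for the row disks of A."""
    return [(A[i][i], sum(abs(a) for j, a in enumerate(row) if j != i))
            for i, row in enumerate(A)]


def eigenvalues_2x2(A):
    """Roots of lambda^2 - (trace A) lambda + det A via the quadratic formula."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]


def in_some_disk(z, disks, tol=1e-12):
    """True if z lies in at least one disk |z - center| <= radius."""
    return any(abs(z - center) <= radius + tol for center, radius in disks)


examples = {"(a)": [[2, 1], [2, -3]],
            "(b)": [[1, -3], [2, 3]]}
for label, A in examples.items():
    disks = gerschgorin_disks(A)
    eigs = eigenvalues_2x2(A)
    print(label, "disks:", disks,
          "eigenvalues:", [complex(round(e.real, 2), round(e.imag, 2)) for e in eigs],
          "all inside:", all(in_some_disk(e, disks) for e in eigs))
```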
Remarks
• There is a corresponding version of the preceding theorem for Gerschgorin disks whose radius is the sum of the absolute values of the off-diagonal entries in the ith column of $A$.
• It can be shown that if k of the Gerschgorin disks are disjoint from the other disks, then exactly k eigenvalues (counted according to algebraic multiplicity) are contained within the union of these k disks. In particular, if a single disk is disjoint from the other disks, then it must contain exactly one eigenvalue of the matrix. Example 4.35(a) illustrates this.
• Note that in Example 4.35(a), 0 is not contained in a Gerschgorin disk; that is, 0 is not an eigenvalue of $A$. Hence, without any further computation, we can deduce that the matrix is invertible by Theorem 4.16. This observation is particularly useful when applied to larger matrices, because the Gerschgorin disks can be determined directly from the entries of the matrix.

Example 4.36  Consider the matrix
$$A = \begin{bmatrix} 2 & 1 & 0 \\ \tfrac{1}{2} & 6 & \tfrac{1}{2} \\ 2 & 0 & 8 \end{bmatrix}$$
Gerschgorin's Theorem tells us that the eigenvalues of $A$ are contained within three disks centered at 2, 6, and 8 with radii 1, 1, and 2, respectively. See Figure 4.17(a). Because the first disk is disjoint from the other two, it must contain exactly one eigenvalue, by the second Remark after Theorem 4.29. Because the characteristic polynomial of $A$ has real coefficients, if it has complex roots (i.e., eigenvalues of $A$), they must occur in conjugate pairs. (See Appendix D.) Hence there is a unique real eigenvalue between 1 and 3, and the union of the other two disks contains two (possibly complex) eigenvalues whose real parts lie between 5 and 10.

On the other hand, the first Remark after Theorem 4.29 tells us that the same three eigenvalues of $A$ are contained in disks centered at 2, 6, and 8 with radii 5/2, 1, and 1/2, respectively. See Figure 4.17(b). These disks are mutually disjoint, so each contains a single (and hence real) eigenvalue. Combining these results, we deduce that $A$ has three real eigenvalues, one in each of the intervals [1, 3], [5, 7], and [7.5, 8.5]. (Compute the actual eigenvalues of $A$ to verify this.)

[Figure 4.17 (a), (b)]
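For Example 4.36, the row and column disks can be computed the same way. The final claim can also be checked without finding the eigenvalues exactly. If $\det(A - \lambda I)$ changes sign between the endpoints of an interval, the Intermediate Value Theorem guarantees a real eigenvalue inside it. The sketch below does both in exact arithmetic with the standard `fractions` module. The 3 × 3 cofactor-expansion determinant is written out for this example only.

```python
from fractions import Fraction as F

A = [[F(2), F(1), F(0)],
     [F(1, 2), F(6), F(1, 2)],
     [F(2), F(0), F(8)]]                      # the matrix of Example 4.36
n = len(A)

# Row disks (Theorem 4.29) and column disks (first Remark after it).
row_disks = [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]
col_disks = [(A[j][j], sum(abs(A[i][j]) for i in range(n) if i != j)) for j in range(n)]
print("row disks:", [(float(c), float(r)) for c, r in row_disks])
print("column disks:", [(float(c), float(r)) for c, r in col_disks])


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def char_poly(lam):
    """Evaluate det(A - lam I) exactly."""
    return det3([[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)])


# A sign change of det(A - lam I) on [a, b] forces a real eigenvalue in [a, b].
intervals = [(F(1), F(3)), (F(5), F(7)), (F(15, 2), F(17, 2))]
for a, b in intervals:
    print(f"[{float(a)}, {float(b)}]:", char_poly(a) * char_poly(b) < 0)
```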
Exercises 4.5

In Exercises 1–4, a matrix $A$ is given along with an iterate $x_5$, produced as in Example 4.30.
(a) Use these data to approximate a dominant eigenvector whose first component is 1 and a corresponding dominant eigenvalue. (Use three-decimal-place accuracy.)
(b) Compare your approximate eigenvalue in part (a) with the actual dominant eigenvalue.

1. $A = \begin{bmatrix} 1 & 2 \\ 5 & 4 \end{bmatrix}$, $x_5 = \begin{bmatrix} 4443 \\ 11109 \end{bmatrix}$
2. $A = \begin{bmatrix} 7 & 4 \\ -3 & -1 \end{bmatrix}$, $x_5 = \begin{bmatrix} 7811 \\ -3904 \end{bmatrix}$
3. $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$, $x_5 = \begin{bmatrix} 144 \\ 89 \end{bmatrix}$
4. $A = \begin{bmatrix} 1.5 & 0.5 \\ 2.0 & 3.0 \end{bmatrix}$, $x_5 = \begin{bmatrix} 60.625 \\ 239.500 \end{bmatrix}$

In Exercises 5–8, a matrix $A$ is given along with an iterate $x_k$, produced using the power method, as in Example 4.31.
(a) Approximate the dominant eigenvalue and eigenvector by computing the corresponding $m_k$ and $y_k$.
(b) Verify that you have approximated an eigenvalue and an eigenvector of $A$ by comparing $Ay_k$ with $m_ky_k$.

5. $A = \begin{bmatrix} 2 & -3 \\ -3 & 10 \end{bmatrix}$, $x_5 = \begin{bmatrix} -3.667 \\ 11.001 \end{bmatrix}$
6.–8. [illegible]

In Exercises 9–14, use the power method to approximate the dominant eigenvalue and eigenvector of $A$. Use the given initial vector $x_0$, the specified number of iterations $k$, and three-decimal-place accuracy.

9.–14. [illegible]

In Exercises 15 and 16, use the power method to approximate the dominant eigenvalue and eigenvector of $A$ to two-decimal-place accuracy. Choose any initial vector you like (but keep the first Remark after Example 4.31 in mind!) and apply the method until the digit in the second decimal place of the iterates stops changing.

15.–16. [illegible]

Rayleigh quotients are described on page 316. In Exercises 17–20, to see how the Rayleigh quotient method approximates the dominant eigenvalue more rapidly than the ordinary power method, compute the successive Rayleigh quotients $R(x_i)$ for $i = 1, \ldots, k$ for the matrix $A$ in the given exercise.

17. Exercise 11    18. Exercise 12    19. Exercise 13    20. Exercise 14

The matrices in Exercises 21–24 either are not diagonalizable or do not have a dominant eigenvalue (or both). Apply the power method anyway with the given initial vector $x_0$, performing eight iterations in each case. Compute the exact eigenvalues and eigenvectors and explain what is happening.

21.–24. [illegible]

In Exercises 25–28, the power method does not converge to the dominant eigenvalue and eigenvector. Verify this, using the given initial vector $x_0$. Compute the exact eigenvalues and eigenvectors and explain what is happening.

25.–28. [illegible]

In Exercises 29–32, apply the shifted power method to approximate the second eigenvalue of the matrix $A$ in the given exercise. Use the given initial vector $x_0$, $k$ iterations, and three-decimal-place accuracy.

29. Exercise 9    30. Exercise 10    31.–32. [illegible]

In Exercises 33–36, apply the inverse power method to approximate, for the matrix $A$ in the given exercise, the eigenvalue that is smallest in magnitude. Use the given initial vector $x_0$, $k$ iterations, and three-decimal-place accuracy.

33. Exercise 9    34. Exercise 10    35.–36. [illegible]

In Exercises 37–40, use the shifted inverse power method to approximate, for the matrix $A$ in the given exercise, the eigenvalue closest to $\alpha$.

37. Exercise 9, $\alpha = 0$    38. Exercise 12, $\alpha = 0$    39. Exercise 7, $\alpha = 5$    40. Exercise 13, $\alpha = -2$

Exercise 32 in Section 4.3 demonstrates that every polynomial is (plus or minus) the characteristic polynomial of its own companion matrix. Therefore, the roots of a polynomial $p$ are the eigenvalues of $C(p)$. Hence, we can use the methods of this section to approximate the roots of any polynomial when exact results are not readily available. In Exercises 41–44, apply the shifted inverse power method to the companion matrix $C(p)$ of $p$ to approximate the root of $p$ closest to $\alpha$ to three decimal places.

41. $p(x) = x^2 + 2x - 2$, $\alpha = 0$
42. $p(x) = x^2 - x - 3$, $\alpha = 2$
43. $p(x) = x^3 - 2x^2 + 1$, $\alpha = 0$
44. $p(x) = x^3 - 5x^2 + x + 1$, $\alpha = 5$

45. Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$. If $\alpha \ne \lambda$ and $\alpha$ is not an eigenvalue of $A$, show that $1/(\lambda - \alpha)$ is an eigenvalue of $(A - \alpha I)^{-1}$ with corresponding eigenvector $x$. (Why must $A - \alpha I$ be invertible?)
46. If $A$ has a dominant eigenvalue $\lambda_1$, prove that the eigenspace $E_{\lambda_1}$ is one-dimensional.

In Exercises 47–50, draw the Gerschgorin disks for the given matrix.

47.–50. [illegible]

51. A square matrix is strictly diagonally dominant if the absolute value of each diagonal entry is greater than the sum of the absolute values of the remaining entries in that row. (See Section 2.5.) Use Gerschgorin's Disk Theorem to prove that a strictly diagonally dominant matrix must be invertible. [Hint: See the third Remark after Theorem 4.29.]
52. If $A$ is an n × n matrix, let $\|A\|$ denote the maximum of the sums of the absolute values of the rows of $A$; that is, $\|A\| = \max_{1 \le i \le n}\left(\sum_{j=1}^{n} |a_{ij}|\right)$. (See Section 7.2.) Prove that if $\lambda$ is an eigenvalue of $A$, then $|\lambda| \le \|A\|$.
53. Let $\lambda$ be an eigenvalue of a stochastic matrix $A$ (see Section 3.7). Prove that $|\lambda| \le 1$. [Hint: Apply Exercise 52 to $A^T$.]
54. Prove that the eigenvalues of $A$ = [illegible] are all real, and locate each of these eigenvalues within a closed interval on the real line.

4.6 Applications and the Perron–Frobenius Theorem
In this section, we will explore several applications of eigenvalues and eigenvectors. We begin by revisiting some applications from previous chapters.

Markov Chains

Section 3.7 introduced Markov chains and made several observations about the transition matrices associated with them. In particular, we observed that the transition matrix $P$ of a Markov chain has a steady state vector $x$. That is, there is a vector $x$ such that $Px = x$. This is equivalent to saying that $P$ has 1 as an eigenvalue. We are now in a position to prove this fact.

Theorem 4.30  If $P$ is the n × n transition matrix of a Markov chain, then 1 is an eigenvalue of $P$.

Proof  Recall that every transition matrix is stochastic; hence, each of its columns sums to 1. Therefore, if $j$ is a row vector consisting of n 1s, then $jP = j$. (See Exercise 13 in Section 3.7.) Taking transposes, we have
$$P^Tj^T = (jP)^T = j^T$$
which implies that $j^T$ is an eigenvector of $P^T$ with corresponding eigenvalue 1. By Exercise 19 in Section 4.3, $P$ and $P^T$ have the same eigenvalues, so 1 is also an eigenvalue of $P$.

In fact, much more is true. For most transition matrices, every eigenvalue $\lambda$ satisfies $|\lambda| \le 1$ and the eigenvalue 1 is dominant; that is, if $\lambda \ne 1$, then $|\lambda| < 1$. We need the following two definitions: A matrix is called positive if all of its entries are positive, and a square matrix is called regular if some power of it is positive. For example, $A$ = [illegible] is positive but $B$ = [illegible] is not. However, $B$ is regular, since $B^2$ = [illegible] is positive.

Theorem 4.31  Let $P$ be an n × n transition matrix with eigenvalue $\lambda$.
a. $|\lambda| \le 1$
b. If $P$ is regular and $\lambda \ne 1$, then $|\lambda| < 1$.

Proof  As in Theorem 4.30, the trick to proving this theorem is to use the fact that $P^T$ has the same eigenvalues as $P$.
(a) Let $x$ be an eigenvector of $P^T$ corresponding to $\lambda$, and let $x_k$ be the component of $x$ with the largest absolute value $m$. Then $|x_i| \le |x_k| = m$ for $i = 1, 2, \ldots, n$. Comparing the kth components of the equation $P^Tx = \lambda x$, we have
$$p_{1k}x_1 + p_{2k}x_2 + \cdots + p_{nk}x_n = \lambda x_k$$
(Remember that the rows of $P^T$ are the columns of $P$.) Taking absolute values, we obtain
$$\begin{aligned}
|\lambda|m = |\lambda||x_k| = |\lambda x_k| &= |p_{1k}x_1 + p_{2k}x_2 + \cdots + p_{nk}x_n| \\
&\le |p_{1k}x_1| + |p_{2k}x_2| + \cdots + |p_{nk}x_n| \\
&= p_{1k}|x_1| + p_{2k}|x_2| + \cdots + p_{nk}|x_n| \\
&\le p_{1k}m + p_{2k}m + \cdots + p_{nk}m \\
&= (p_{1k} + p_{2k} + \cdots + p_{nk})m = m
\end{aligned} \qquad (1)$$
The first inequality follows from the Triangle Inequality in $\mathbb{R}$, and the last equality comes from the fact that the rows of $P^T$ sum to 1. Thus, $|\lambda|m \le m$. After dividing by $m$ (which is positive, since $x \ne 0$), we have $|\lambda| \le 1$, as desired.

(b) We will prove the equivalent implication: If $|\lambda| = 1$, then $\lambda = 1$. First, we show that it is true when $P$ (and therefore $P^T$) is a positive matrix. If $|\lambda| = 1$, then all of the inequalities in Equations (1) are actually equalities. In particular,
$$p_{1k}|x_1| + p_{2k}|x_2| + \cdots + p_{nk}|x_n| = p_{1k}m + p_{2k}m + \cdots + p_{nk}m$$
Equivalently,
$$p_{1k}(m - |x_1|) + p_{2k}(m - |x_2|) + \cdots + p_{nk}(m - |x_n|) = 0 \qquad (2)$$
Now, since $P$ is positive, $p_{ik} > 0$ for $i = 1, 2, \ldots, n$. Also, $m - |x_i| \ge 0$ for $i = 1, 2, \ldots, n$. Therefore, each summand in Equation (2) must be zero, and this can happen only if $|x_i| = m$ for $i = 1, 2, \ldots, n$. Furthermore, we get equality in the Triangle Inequality in $\mathbb{R}$ if and only if all of the summands are positive or all are negative; in other words, the $p_{ik}x_i$'s all have the same sign. This implies that
$$x = \begin{bmatrix} m \\ m \\ \vdots \\ m \end{bmatrix} = mj^T \quad \text{or} \quad x = \begin{bmatrix} -m \\ -m \\ \vdots \\ -m \end{bmatrix} = -mj^T$$
where $j$ is the row vector of n 1s, as in Theorem 4.30. Thus, in either case, the eigenspace of $P^T$ corresponding to $\lambda$ is $E_\lambda = \operatorname{span}(j^T)$. But, using the proof of Theorem 4.30, we see that $j^T = P^Tj^T = \lambda j^T$, and, comparing components, we find that $\lambda = 1$. This handles the case where $P$ is positive.

If $P$ is regular, then some power of $P$ is positive, say, $P^k$. It follows that $P^{k+1}$ must also be positive. (Why?) Since $\lambda^k$ and $\lambda^{k+1}$ are eigenvalues of $P^k$ and $P^{k+1}$, respectively, by Theorem 4.18, we have just proved that $\lambda^k = \lambda^{k+1} = 1$. Therefore, $\lambda^k(\lambda - 1) = 0$, which implies that $\lambda = 1$, since $\lambda = 0$ is impossible if $|\lambda| = 1$.

We can now explain some of the behavior of Markov chains that we observed in Chapter 3. In Example 3.64, we saw that for the transition matrix
$$P = \begin{bmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{bmatrix}$$
and initial state vector $x_0 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$, the state vectors $x_k$ converge to the vector $x = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$, a steady state vector for $P$ (i.e., $Px = x$).
We are going to prove that for regular Markov chains, this always happens. Indeed, we will prove much more. Recall that the state vectors $x_k$ satisfy $x_k = P^kx_0$. Let's investigate what happens to the powers $P^k$ as $k$ becomes large.

Example 4.37  The transition matrix $P = \begin{bmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{bmatrix}$ has characteristic equation
$$0 = \det(P - \lambda I) = \begin{vmatrix} 0.7 - \lambda & 0.2 \\ 0.3 & 0.8 - \lambda \end{vmatrix} = \lambda^2 - 1.5\lambda + 0.5 = (\lambda - 1)(\lambda - 0.5)$$
so its eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0.5$. (Note that, thanks to Theorems 4.30 and 4.31, we knew in advance that 1 would be an eigenvalue and the other eigenvalue would be less than 1 in absolute value. However, we still needed to compute $\lambda_2$.) The eigenspaces are
$$E_1 = \operatorname{span}\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) \quad \text{and} \quad E_{0.5} = \operatorname{span}\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$$
So, taking $Q = \begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}$, we know that $Q^{-1}PQ = \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} = D$. From the method used in Example 4.29 in Section 4.4, we have
$$P^k = QD^kQ^{-1} = \begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} 1^k & 0 \\ 0 & (0.5)^k \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}^{-1}$$
Now, as $k \to \infty$, $(0.5)^k \to 0$, so
$$D^k \to \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
and
$$P^k \to \begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0.2 & 0.2 \\ 0.6 & -0.4 \end{bmatrix} = \begin{bmatrix} 0.4 & 0.4 \\ 0.6 & 0.6 \end{bmatrix}$$
(Observe that the columns of this "limit matrix" are identical and each is a steady state vector for $P$.) Now let $x_0 = \begin{bmatrix} a \\ b \end{bmatrix}$ be any initial probability vector (i.e., $a + b = 1$). Then
$$x_k = P^kx_0 \to \begin{bmatrix} 0.4 & 0.4 \\ 0.6 & 0.6 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0.4a + 0.4b \\ 0.6a + 0.6b \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$$
Not only does this explain what we saw in Example 3.64, it also tells us that the state vectors $x_k$ will converge to the steady state vector $x = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$ for any choice of $x_0$!
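The limit in Example 4.37 can be observed directly by multiplying out powers of $P$. The sketch below forms $P^{50}$ using a hypothetical helper `mat_mul`. $P^{50}$ should match the limit matrix to many decimal places, because the eigenvalue 0.5 contributes only $(0.5)^{50}$. The sketch then applies $P^{50}$ to the initial state of Example 3.64 and to one other probability vector, chosen only for illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


P = [[0.7, 0.2],
     [0.3, 0.8]]                 # transition matrix of Example 4.37

Pk = P
for _ in range(49):              # Pk becomes P^50
    Pk = mat_mul(Pk, P)
print([[round(v, 6) for v in row] for row in Pk])

for x0 in ([0.6, 0.4], [0.1, 0.9]):          # two initial probability vectors
    xk = [sum(Pk[i][j] * x0[j] for j in range(2)) for i in range(2)]
    print(x0, "->", [round(v, 6) for v in xk])
```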
There is nothing special about Example 4.37. The next theorem shows that this type of behavior always occurs with regular transition matrices. Before we can present the theorem, we need the following lemma.

Lemma 4.32  Let $P$ be a regular n × n transition matrix. If $P$ is diagonalizable, then the dominant eigenvalue $\lambda_1 = 1$ has algebraic multiplicity 1.

Proof  The eigenvalues of $P$ and $P^T$ are the same. From the proof of Theorem 4.31(b), $\lambda_1 = 1$ has geometric multiplicity 1 as an eigenvalue of $P^T$. Since $P$ is diagonalizable, so is $P^T$, by Exercise 41 in Section 4.4. Hence, the eigenvalue $\lambda_1 = 1$ has algebraic multiplicity 1, by the Diagonalization Theorem.

Theorem 4.33  Let $P$ be a regular n × n transition matrix. Then as $k \to \infty$, $P^k$ approaches an n × n matrix $L$ whose columns are identical, each equal to the same vector $x$. This vector $x$ is a steady state probability vector for $P$.

Proof  To simplify the proof, we will consider only the case where $P$ is diagonalizable. The theorem is true, however, without this assumption. (See Finite Markov Chains by J. G. Kemeny and J. L. Snell (New York: Springer-Verlag, 1976).)

We diagonalize $P$ as $Q^{-1}PQ = D$ or, equivalently, $P = QDQ^{-1}$, where
$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
From Theorems 4.30 and 4.31, we know that each eigenvalue $\lambda_i$ either is 1 or satisfies $|\lambda_i| < 1$. Hence, as $k \to \infty$, $\lambda_i^k$ approaches 1 or 0 for $i = 1, \ldots, n$. It follows that $D^k$ approaches a diagonal matrix, say, $D^*$, each of whose diagonal entries is 1 or 0. Thus, $P^k = QD^kQ^{-1}$ approaches $L = QD^*Q^{-1}$. We write
$$\lim_{k\to\infty} P^k = L$$

We are taking some liberties with the notion of a limit. Nevertheless, these steps should be intuitively clear. Rigorous proofs follow from the properties of limits, which you may have encountered in a calculus course. Rather than get sidetracked with a discussion of matrix limits, we will omit the proofs.

Observe that
$$PL = P\lim_{k\to\infty}P^k = \lim_{k\to\infty}PP^k = \lim_{k\to\infty}P^{k+1} = L$$
Therefore, each column of $L$ is an eigenvector of $P$ corresponding to $\lambda = 1$. To see that each of these columns is a probability vector (i.e., $L$ is a stochastic matrix), we need only observe that, if $j$ is the row vector with n 1s, then
$$jL = j\lim_{k\to\infty}P^k = \lim_{k\to\infty}jP^k = \lim_{k\to\infty}j = j$$
since $P^k$ is a stochastic matrix, by Exercise 14 in Section 3.7. Exercise 13 in Section 3.7 now implies that $L$ is stochastic.
We need only show that the columns of $L$ are identical. The $i$th column of $L$ is just $L\mathbf{e}_i$, where $\mathbf{e}_i$ is the $i$th standard basis vector. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be eigenvectors of $P$ forming a basis of $\mathbb{R}^n$, with $\mathbf{v}_1$ corresponding to $\lambda_1 = 1$. Write
$$\mathbf{e}_i = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$
for scalars $c_1, c_2, \ldots, c_n$. Then, by Theorem 4.19,
$$P^k\mathbf{e}_i = c_1 1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 + \cdots + c_n\lambda_n^k\mathbf{v}_n.$$
By Lemma 4.32, $\lambda_j \ne 1$ for $j \ne 1$, so, by Theorem 4.31(b), $|\lambda_j| < 1$ for $j \ne 1$. Hence, $\lambda_j^k \to 0$ as $k \to \infty$, for $j \ne 1$. It follows that
$$L\mathbf{e}_i = \lim_{k\to\infty}P^k\mathbf{e}_i = c_1\mathbf{v}_1.$$
In other words, column $i$ of $L$ is an eigenvector corresponding to $\lambda_1 = 1$. But we have shown that the columns of $L$ are probability vectors, so $L\mathbf{e}_i$ is the unique multiple of $\mathbf{v}_1$ whose components sum to 1. Since this is true for each column of $L$, it implies that all of the columns of $L$ are identical, each equal to this vector $\mathbf{x}$.

Remark
Since $L$ is a stochastic matrix, we can interpret it as the long range transition matrix of the Markov chain. That is, $L_{ij}$ represents the probability of being in state $i$, having started from state $j$, if the transitions were to continue indefinitely. The fact that the columns of $L$ are identical says that the starting state does not matter, as the next example illustrates.

Example 4.38
Recall the rat in a box from Example 3.65. The transition matrix was
$$P = \begin{bmatrix}0 & \tfrac13 & \tfrac13\\ \tfrac12 & 0 & \tfrac23\\ \tfrac12 & \tfrac23 & 0\end{bmatrix}.$$
We determined that the steady state probability vector was
$$\mathbf{x} = \begin{bmatrix}\tfrac14\\ \tfrac38\\ \tfrac38\end{bmatrix}.$$
Hence, the powers of $P$ approach
$$L = \begin{bmatrix}\tfrac14 & \tfrac14 & \tfrac14\\ \tfrac38 & \tfrac38 & \tfrac38\\ \tfrac38 & \tfrac38 & \tfrac38\end{bmatrix} = \begin{bmatrix}0.250 & 0.250 & 0.250\\0.375 & 0.375 & 0.375\\0.375 & 0.375 & 0.375\end{bmatrix},$$
from which we can see that the rat will eventually spend 25% of its time in compartment 1 and 37.5% of its time in each of the other two compartments.

We conclude our discussion of regular Markov chains by proving that the steady state vector $\mathbf{x}$ is independent of the initial state. The proof is easily adapted to cover the case of state vectors whose components sum to an arbitrary constant, say, $s$. In the exercises, you are asked to prove some other properties of regular Markov chains.
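As a numerical companion to Examples 4.37 and 4.38, the sketch below (plain Python with our own helper names `mat_mul`, `mat_pow`, and `mat_vec`; it is not part of the text) raises each transition matrix to a large power and checks that every column settles on the steady state vector. It also applies the limit to one particular initial vector, previewing the next theorem. The exponents 60 and 200 are arbitrary choices, large enough that the second eigenvalues (0.5 and, for the rat, −2/3) have died away. Floating point is used, so results are approximate.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_pow(A, k):
    """A**k for a square matrix A and integer k >= 0, by repeated squaring."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = A
    while k > 0:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


def mat_vec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Example 4.37: the powers P^k should approach [[0.4, 0.4], [0.6, 0.6]].
P2 = [[0.7, 0.2],
      [0.3, 0.8]]
L2 = mat_pow(P2, 60)

# Example 4.38 (rat in a box): every column should approach [1/4, 3/8, 3/8].
P3 = [[0.0, 1 / 3, 1 / 3],
      [0.5, 0.0, 2 / 3],
      [0.5, 2 / 3, 0.0]]
L3 = mat_pow(P3, 200)

# Starting with the rat certainly in compartment 1, the limit of P^k x_0
# is still the steady state vector.
xk = mat_vec(L3, [1.0, 0.0, 0.0])

for row in L2 + L3:
    print([round(v, 6) for v in row])
print([round(v, 6) for v in xk])
```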
Theorem 4.34
Let $P$ be a regular $n \times n$ transition matrix, with $\mathbf{x}$ the steady state probability vector for $P$, as in Theorem 4.33. Then, for any initial probability vector $\mathbf{x}_0$, the sequence of iterates $\mathbf{x}_k$ approaches $\mathbf{x}$.

Proof Let
$$\mathbf{x}_0 = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix},$$
where $x_1 + x_2 + \cdots + x_n = 1$. Since $\mathbf{x}_k = P^k\mathbf{x}_0$, we must show that $\lim_{k\to\infty}P^k\mathbf{x}_0 = \mathbf{x}$. Now, by Theorem 4.33, the long range transition matrix is $L = [\mathbf{x}\ \mathbf{x}\ \cdots\ \mathbf{x}]$ and $\lim_{k\to\infty}P^k = L$. Therefore,
$$\lim_{k\to\infty}P^k\mathbf{x}_0 = \left(\lim_{k\to\infty}P^k\right)\mathbf{x}_0 = L\mathbf{x}_0 = [\mathbf{x}\ \mathbf{x}\ \cdots\ \mathbf{x}]\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix} = x_1\mathbf{x} + x_2\mathbf{x} + \cdots + x_n\mathbf{x} = (x_1 + x_2 + \cdots + x_n)\mathbf{x} = \mathbf{x}.$$

Population Growth
We return to the Leslie model of population growth, which we first explored in Section 3.7. In Example 3.67 in that section, we saw that, for the Leslie matrix
$$L = \begin{bmatrix}0 & 4 & 3\\0.5 & 0 & 0\\0 & 0.25 & 0\end{bmatrix},$$
iterates of the population vectors began to approach a multiple of the vector
$$\mathbf{x} = \begin{bmatrix}18\\6\\1\end{bmatrix}.$$
In other words, the three age classes of this population eventually ended up in the ratio 18:6:1. Moreover, once this state is reached, it is stable, since the ratios for the following year are given by
$$L\mathbf{x} = \begin{bmatrix}27\\9\\1.5\end{bmatrix} = 1.5\mathbf{x},$$
and the components are still in the ratio $27 : 9 : 1.5 = 18 : 6 : 1$. Observe that 1.5 represents the growth rate of this population when it has reached its steady state.

We can now recognize that $\mathbf{x}$ is an eigenvector of $L$ corresponding to the eigenvalue $\lambda = 1.5$. Thus, the steady state growth rate is a positive eigenvalue of $L$, and an eigenvector corresponding to this eigenvalue represents the relative sizes of the age classes when the steady state has been reached. We can compute these directly, without having to iterate as we did before.

Example 4.39
Find the steady state growth rate and the corresponding ratios between the age classes for the Leslie matrix $L$ above.

Solution We need to find all positive eigenvalues and corresponding eigenvectors of $L$. The characteristic polynomial of $L$ is
$$\det(L - \lambda I) = \begin{vmatrix}-\lambda & 4 & 3\\0.5 & -\lambda & 0\\0 & 0.25 & -\lambda\end{vmatrix} = -\lambda^3 + 2\lambda + 0.375,$$
so we must solve $-\lambda^3 + 2\lambda + 0.375 = 0$ or, equivalently, $8\lambda^3 - 16\lambda - 3 = 0$. Factoring, we have
$$(2\lambda - 3)(4\lambda^2 + 6\lambda + 1) = 0.$$
(See Appendix D.) Since the second factor has only the roots $(-3 + \sqrt5)/4 \approx -0.19$ and $(-3 - \sqrt5)/4 \approx -1.31$, the only positive root of this equation is $\lambda = \tfrac32 = 1.5$. The corresponding eigenvectors are in the null space of $L - 1.5I$, which we find by row reduction:
$$[L - 1.5I \mid \mathbf{0}] = \left[\begin{array}{ccc|c}-1.5 & 4 & 3 & 0\\0.5 & -1.5 & 0 & 0\\0 & 0.25 & -1.5 & 0\end{array}\right] \longrightarrow \left[\begin{array}{ccc|c}1 & 0 & -18 & 0\\0 & 1 & -6 & 0\\0 & 0 & 0 & 0\end{array}\right].$$
Thus, if $\mathbf{x} = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$ is an eigenvector corresponding to $\lambda = 1.5$, it satisfies $x_1 = 18x_3$ and $x_2 = 6x_3$. That is,
$$E_{1.5} = \left\{\begin{bmatrix}18x_3\\6x_3\\x_3\end{bmatrix}\right\} = \operatorname{span}\left(\begin{bmatrix}18\\6\\1\end{bmatrix}\right).$$
Hence, the steady state growth rate is 1.5, and when this rate has been reached, the age classes are in the ratio 18:6:1, as we saw before.

In Example 4.39, there was only one candidate for the steady state growth rate: the unique positive eigenvalue of $L$. But what would we have done if $L$ had had more than one positive eigenvalue or none? We were also apparently fortunate that there was a corresponding eigenvector all of whose components were positive, which allowed us to relate these components to the size of the population. We can prove that this situation is not accidental; that is, every Leslie matrix has exactly one positive eigenvalue and a corresponding eigenvector with positive components.

Recall that the form of a Leslie matrix is
$$L = \begin{bmatrix}b_1 & b_2 & b_3 & \cdots & b_{n-1} & b_n\\s_1 & 0 & 0 & \cdots & 0 & 0\\0 & s_2 & 0 & \cdots & 0 & 0\\0 & 0 & s_3 & \cdots & 0 & 0\\\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\0 & 0 & 0 & \cdots & s_{n-1} & 0\end{bmatrix} \tag{3}$$
Since the entries $s_j$ represent survival probabilities, we will assume that they are all nonzero (otherwise, the population would rapidly die out). We will also assume that at least one of the birth parameters $b_i$ is nonzero (otherwise, there would be no births and, again, the population would die out). With these standing assumptions, we can now prove the assertion we made above as a theorem.

Theorem 4.35
Every Leslie matrix has a unique positive eigenvalue and a corresponding eigenvector with positive components.
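Before the proof, here is a quick numerical look at Example 4.39 from the iterative side. The sketch below (plain Python; `mat_vec` and `leslie_power_method` are our own names, and the starting vector [1, 1, 1] and the 500 steps are arbitrary choices) repeatedly applies $L$, rescaling so the entries sum to 1. The one-step growth factor should settle near 1.5 and the age classes near the ratio 18:6:1. This is simply the power method applied to $L$.

```python
# Leslie matrix of Example 4.39.
L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]


def mat_vec(M, x):
    """Matrix-vector product Mx."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def leslie_power_method(M, x0, steps=500):
    """Iterate x <- Mx, rescaling each time so the entries sum to 1.

    Returns (rate, x): rate is the factor by which the total population
    grows in one more step, and x is the rescaled population vector.
    """
    x = list(x0)
    for _ in range(steps):
        y = mat_vec(M, x)
        total = sum(y)
        x = [v / total for v in y]
    rate = sum(mat_vec(M, x)) / sum(x)
    return rate, x


rate, x = leslie_power_method(L, [1.0, 1.0, 1.0])
ratios = [v / x[2] for v in x]       # scale so the oldest age class is 1
print(round(rate, 6), [round(r, 6) for r in ratios])

# Direct check that [18, 6, 1] is an eigenvector for the eigenvalue 1.5.
print(mat_vec(L, [18, 6, 1]))        # [27.0, 9.0, 1.5]
```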
Proof Let $L$ be as in Equation (3). The characteristic polynomial of $L$ is
$$c_L(\lambda) = \det(L - \lambda I) = (-1)^n\left(\lambda^n - b_1\lambda^{n-1} - b_2s_1\lambda^{n-2} - b_3s_1s_2\lambda^{n-3} - \cdots - b_ns_1s_2\cdots s_{n-1}\right) = (-1)^nf(\lambda).$$
(You are asked to prove this in Exercise 16.) Therefore, the eigenvalues of $L$ are the roots of $f(\lambda)$. Since at least one of the birth parameters $b_i$ is positive and all of the survival probabilities $s_j$ are positive, the coefficients of $f(\lambda)$ change sign exactly once. By Descartes's Rule of Signs (Appendix D), therefore, $f(\lambda)$ has exactly one positive root. Let us call it $\lambda_1$.

By direct calculation, we can check that an eigenvector corresponding to $\lambda_1$ is
$$\mathbf{x}_1 = \begin{bmatrix}1\\s_1/\lambda_1\\s_1s_2/\lambda_1^2\\s_1s_2s_3/\lambda_1^3\\\vdots\\s_1s_2s_3\cdots s_{n-1}/\lambda_1^{n-1}\end{bmatrix}.$$
(You are asked to prove this in Exercise 18.) Clearly, all of the components of $\mathbf{x}_1$ are positive.

In fact, more is true. With the additional requirement that two consecutive birth parameters $b_i$ and $b_{i+1}$ are positive, it turns out that the unique positive eigenvalue $\lambda_1$ of $L$ is dominant; that is, every other (real or complex) eigenvalue $\lambda$ of $L$ satisfies $|\lambda| < \lambda_1$. (It is beyond the scope of this book to prove this result, but a partial proof is outlined in Exercise 27 for readers who are familiar with the algebra of complex numbers.) This explains why we get convergence to a steady state vector when we iterate the population vectors: It is just the power method working for us!

The Perron-Frobenius Theorem
In the previous two applications, Markov chains and Leslie matrices, we saw that the eigenvalue of interest was positive and dominant. Moreover, there was a corresponding eigenvector with positive components. It turns out that a remarkable theorem guarantees that this will be the case for a large class of matrices, including many of the ones we have been considering. The first version of this theorem is for positive matrices. First, we need some terminology and notation.
Let's agree to refer to a vector as positive if all of its components are positive. For two $m \times n$ matrices $A = [a_{ij}]$ and $B = [b_{ij}]$, we will write $A \ge B$ if $a_{ij} \ge b_{ij}$ for all $i$ and $j$. (Similar definitions will apply for $A > B$, $A \le B$, and so on.) Thus, a positive vector $\mathbf{x}$ satisfies $\mathbf{x} > \mathbf{0}$. Let us define $|A| = [|a_{ij}|]$ to be the matrix of the absolute values of the entries of $A$.

Oskar Perron (1880–1975) was a German mathematician who did work in many fields of mathematics, including analysis, differential equations, algebra, geometry, and number theory. Perron's Theorem was published in 1907 in a paper on continued fractions.

Theorem 4.36 Perron's Theorem
Let $A$ be a positive $n \times n$ matrix. Then $A$ has a real eigenvalue $\lambda_1$ with the following properties:
a. $\lambda_1 > 0$
b. $\lambda_1$ has a corresponding positive eigenvector.
c. If $\lambda$ is any other eigenvalue of $A$, then $|\lambda| \le \lambda_1$.

Intuitively, we can see why the first two statements should be true. Consider the case of a $2 \times 2$ positive matrix $A$. The corresponding matrix transformation maps the first quadrant of the plane properly into itself, since all components are positive. If we repeatedly allow $A$ to act on the images we get, they necessarily converge toward some ray in the first quadrant (Figure 4.18). A direction vector for this ray will be a positive vector $\mathbf{x}$, which must be mapped into some positive multiple of itself (say, $\lambda_1$), since $A$ leaves the ray fixed. In other words, $A\mathbf{x} = \lambda_1\mathbf{x}$, with $\mathbf{x}$ and $\lambda_1$ both positive.

Proof For some nonzero vectors $\mathbf{x}$, $A\mathbf{x} \ge \lambda\mathbf{x}$ for some scalar $\lambda$. When this happens, then $A(k\mathbf{x}) \ge \lambda(k\mathbf{x})$ for all $k > 0$; thus, we need only consider unit vectors $\mathbf{x}$. In Chapter 7, we will see that $A$ maps the set of all unit vectors in $\mathbb{R}^n$ (the unit sphere) into a "generalized ellipsoid." So, as $\mathbf{x}$ ranges over the nonnegative vectors on this unit sphere, there will be a maximum value of $\lambda$ such that $A\mathbf{x} \ge \lambda\mathbf{x}$. (See Figure 4.19.) Denote this number by $\lambda_1$ and the corresponding unit vector by $\mathbf{x}_1$.

[Figure 4.18]
[Figure 4.19]
We now show that $A\mathbf{x}_1 = \lambda_1\mathbf{x}_1$. If not, then $A\mathbf{x}_1 \ge \lambda_1\mathbf{x}_1$ with $A\mathbf{x}_1 \ne \lambda_1\mathbf{x}_1$, and, applying $A$ again, we obtain
$$A(A\mathbf{x}_1) > A(\lambda_1\mathbf{x}_1) = \lambda_1(A\mathbf{x}_1),$$
where the inequality becomes strict because $A$ is positive and $A\mathbf{x}_1 - \lambda_1\mathbf{x}_1$ is a nonzero nonnegative vector. (See Exercise 40 and Section 3.7, Exercise 36.) But then $\mathbf{y} = (1/\|A\mathbf{x}_1\|)A\mathbf{x}_1$ is a unit vector that satisfies $A\mathbf{y} > \lambda_1\mathbf{y}$, so there will be some $\lambda_2 > \lambda_1$ such that $A\mathbf{y} \ge \lambda_2\mathbf{y}$. This contradicts the fact that $\lambda_1$ was the maximum value with this property. Consequently, it must be the case that $A\mathbf{x}_1 = \lambda_1\mathbf{x}_1$; that is, $\lambda_1$ is an eigenvalue of $A$.

Now $A$ is positive and $\mathbf{x}_1$ is a nonzero nonnegative vector, so $\lambda_1\mathbf{x}_1 = A\mathbf{x}_1 > \mathbf{0}$. This means that $\lambda_1 > 0$ and $\mathbf{x}_1 > \mathbf{0}$, which completes the proof of (a) and (b).

To prove (c), suppose $\lambda$ is any other (real or complex) eigenvalue of $A$ with corresponding eigenvector $\mathbf{z}$. Then $A\mathbf{z} = \lambda\mathbf{z}$, and, taking absolute values, we have
$$A|\mathbf{z}| = |A||\mathbf{z}| \ge |A\mathbf{z}| = |\lambda\mathbf{z}| = |\lambda||\mathbf{z}|, \tag{4}$$
where the middle inequality follows from the Triangle Inequality. (See Exercise 40.) Since $|\mathbf{z}| \ge \mathbf{0}$ and $\mathbf{z} \ne \mathbf{0}$, the unit vector $\mathbf{u}$ in the direction of $|\mathbf{z}|$ is nonnegative and satisfies $A\mathbf{u} \ge |\lambda|\mathbf{u}$. By the maximality of $\lambda_1$ from the first part of this proof, we must have $|\lambda| \le \lambda_1$.

In fact, it turns out that $\lambda_1$ is dominant, so $|\lambda| < \lambda_1$ for any eigenvalue $\lambda \ne \lambda_1$. It is also the case that $\lambda_1$ has algebraic, and hence geometric, multiplicity 1. We will not prove these facts.

Perron's Theorem can be generalized from positive to certain nonnegative matrices. Frobenius did so in 1912. The result requires a technical condition on the matrix. A square matrix $A$ is called reducible if, subject to some permutation of the rows and the same permutation of the columns, $A$ can be written in block form as
$$\begin{bmatrix}B & C\\O & D\end{bmatrix},$$
where $B$ and $D$ are square. Equivalently, $A$ is reducible if there is some permutation matrix $P$ such that
$$PAP^T = \begin{bmatrix}B & C\\O & D\end{bmatrix}.$$
(See page 87.) For example, the matrix $A = [\ \cdots\ ]$ is reducible, since interchanging two of its rows, and then the same two columns, produces a matrix of this block form.
(This is just $PAP^T$, where $P$ is the permutation matrix obtained by performing the same interchange on the rows of $I$. Check this!)

A square matrix that is not reducible is called irreducible. If $A^k > O$ for some $k$, then $A$ is called primitive. For example, every regular Markov chain has a primitive transition matrix, by definition. It is not hard to show that every primitive matrix is irreducible. (Do you see why? Try showing the contrapositive of this.)

Theorem 4.37 The Perron-Frobenius Theorem
Let $A$ be an irreducible nonnegative $n \times n$ matrix. Then $A$ has a real eigenvalue $\lambda_1$ with the following properties:
a. $\lambda_1 > 0$
b. $\lambda_1$ has a corresponding positive eigenvector.
c. If $\lambda$ is any other eigenvalue of $A$, then $|\lambda| \le \lambda_1$. If $A$ is primitive, then this inequality is strict.
d. If $\lambda$ is an eigenvalue of $A$ such that $|\lambda| = \lambda_1$, then $\lambda$ is a (complex) root of an equation $\lambda^k - \lambda_1^k = 0$ for some positive integer $k \le n$.
e. $\lambda_1$ has algebraic multiplicity 1.

The interested reader can find a proof of the Perron-Frobenius Theorem in many texts on nonnegative matrices or matrix analysis. (See, for example, Matrix Analysis by R. A. Horn and C. R. Johnson (Cambridge, England: Cambridge University Press, 1985).) The eigenvalue $\lambda_1$ is often called the Perron root of $A$, and a corresponding probability eigenvector (which is necessarily unique) is called the Perron eigenvector of $A$.

Linear Recurrence Relations
The Fibonacci numbers are the numbers in the sequence $0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots$, where, after the first two terms, each new term is obtained by summing the two terms preceding it. If we denote the $n$th Fibonacci number by $f_n$, then this sequence is completely defined by the equations $f_0 = 0$, $f_1 = 1$, and, for $n \ge 2$,
$$f_n = f_{n-1} + f_{n-2}.$$
This last equation is an example of a linear recurrence relation. We will return to the Fibonacci numbers, but first we will consider linear recurrence relations somewhat more generally.
Leonardo of Pisa (1170–1250) is better known by his nickname, Fibonacci, which means "son of Bonaccio." He wrote a number of important books, many of which have survived, including Liber abaci and Liber quadratorum. The Fibonacci sequence appears as the solution to a problem in Liber abaci: "A certain man put a pair of rabbits in a place surrounded on all sides by a wall. How many pairs of rabbits can be produced from that pair in a year if it is supposed that every month each pair begets a new pair which from the second month on becomes productive?" The name Fibonacci numbers was given to the terms of this sequence by the French mathematician Edouard Lucas (1842–1891).

Definition
Let $(x_n) = (x_0, x_1, x_2, \ldots)$ be a sequence of numbers that is defined as follows:
1. $x_0 = a_0, x_1 = a_1, \ldots, x_{k-1} = a_{k-1}$, where $a_0, a_1, \ldots, a_{k-1}$ are scalars.
2. For all $n \ge k$, $x_n = c_1x_{n-1} + c_2x_{n-2} + \cdots + c_kx_{n-k}$, where $c_1, c_2, \ldots, c_k$ are scalars.
If $c_k \ne 0$, the equation in (2) is called a linear recurrence relation of order $k$. The equations in (1) are referred to as the initial conditions of the recurrence.

Thus, the Fibonacci numbers satisfy a linear recurrence relation of order 2.

Remarks
• If, in order to define the $n$th term in a recurrence relation, we require the $(n - k)$th term but no term before it, then the recurrence relation has order $k$.
• The number of initial conditions is the order of the recurrence relation.
• It is not necessary that the first term of the sequence be called $x_0$. We could start at $x_1$ or anywhere else.
• It is possible to have even more general linear recurrence relations by allowing the coefficients $c_i$ to be functions rather than scalars and by allowing an extra, isolated coefficient, which may also be a function. We will not consider such recurrences here.

Example 4.40
Consider the sequence $(x_n)$ defined by the initial conditions $x_1 = 1$, $x_2 = 5$ and the recurrence relation $x_n = 5x_{n-1} - 6x_{n-2}$ for $n \ge 3$. Write out the first five terms of this sequence.

Solution We are given the first two terms. We use the recurrence relation to calculate the next three terms. We have
$$\begin{aligned}x_3 &= 5x_2 - 6x_1 = 5 \cdot 5 - 6 \cdot 1 = 19\\x_4 &= 5x_3 - 6x_2 = 5 \cdot 19 - 6 \cdot 5 = 65\\x_5 &= 5x_4 - 6x_3 = 5 \cdot 65 - 6 \cdot 19 = 211\end{aligned}$$
so the sequence begins $1, 5, 19, 65, 211, \ldots$.
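The Definition translates directly into a loop. The sketch below (plain Python; `recurrence_terms` is our own hypothetical helper, not from the text) takes the initial conditions and the coefficients $c_1, \ldots, c_k$ and generates terms one at a time, exactly as in Example 4.40. It insists on one initial condition per coefficient, matching the remark that the number of initial conditions equals the order. Integer inputs keep the arithmetic exact.

```python
def recurrence_terms(initial, coeffs, count):
    """First `count` terms of x_n = c_1 x_{n-1} + ... + c_k x_{n-k}.

    `initial` holds the k initial conditions in order, and `coeffs`
    holds c_1, ..., c_k, so both lists must have the same length k.
    """
    k = len(coeffs)
    if len(initial) != k:
        raise ValueError("need exactly k initial conditions")
    terms = list(initial)
    while len(terms) < count:
        # c_1 multiplies the most recent term, c_k the oldest one used.
        nxt = sum(c * terms[-i] for i, c in enumerate(coeffs, start=1))
        terms.append(nxt)
    return terms[:count]


print(recurrence_terms([1, 5], [5, -6], 5))    # [1, 5, 19, 65, 211]
print(recurrence_terms([0, 1], [1, 1], 10))    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```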
Clearly, if we were interested in, say, the 100th term of the sequence in Example 4.40, then the approach used in that example would be rather tedious, since we would have to apply the recurrence relation 98 times. It would be nice if we could find an explicit formula for $x_n$ as a function of $n$. We refer to finding such a formula as solving the recurrence relation. We will illustrate the process with the sequence from Example 4.40.

To begin, we rewrite the recurrence relation as a matrix equation. Let
$$A = \begin{bmatrix}5 & -6\\1 & 0\end{bmatrix}$$
and introduce vectors $\mathbf{x}_n = \begin{bmatrix}x_n\\x_{n-1}\end{bmatrix}$ for $n \ge 2$. Thus, $\mathbf{x}_2 = \begin{bmatrix}x_2\\x_1\end{bmatrix} = \begin{bmatrix}5\\1\end{bmatrix}$, $\mathbf{x}_3 = \begin{bmatrix}x_3\\x_2\end{bmatrix} = \begin{bmatrix}19\\5\end{bmatrix}$, $\mathbf{x}_4 = \begin{bmatrix}x_4\\x_3\end{bmatrix} = \begin{bmatrix}65\\19\end{bmatrix}$, and so on. Now observe that, for $n \ge 3$, we have
$$A\mathbf{x}_{n-1} = \begin{bmatrix}5 & -6\\1 & 0\end{bmatrix}\begin{bmatrix}x_{n-1}\\x_{n-2}\end{bmatrix} = \begin{bmatrix}5x_{n-1} - 6x_{n-2}\\x_{n-1}\end{bmatrix} = \begin{bmatrix}x_n\\x_{n-1}\end{bmatrix} = \mathbf{x}_n.$$
Notice that this is the same type of equation we encountered with Markov chains and Leslie matrices. As in those cases, we can write
$$\mathbf{x}_n = A\mathbf{x}_{n-1} = A^2\mathbf{x}_{n-2} = \cdots = A^{n-2}\mathbf{x}_2.$$
We now use the technique of Example 4.29 to compute the powers of $A$. The characteristic equation of $A$ is
$$\lambda^2 - 5\lambda + 6 = 0,$$
from which we find that the eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = 2$. (Notice that the form of the characteristic equation follows that of the recurrence relation. If we write the recurrence as $x_n - 5x_{n-1} + 6x_{n-2} = 0$, it is apparent that the coefficients are exactly the same!) The corresponding eigenspaces are
$$E_3 = \operatorname{span}\left(\begin{bmatrix}3\\1\end{bmatrix}\right) \quad\text{and}\quad E_2 = \operatorname{span}\left(\begin{bmatrix}2\\1\end{bmatrix}\right).$$
Setting $P = \begin{bmatrix}3 & 2\\1 & 1\end{bmatrix}$, we know that $P^{-1}AP = D = \begin{bmatrix}3 & 0\\0 & 2\end{bmatrix}$. Then $A = PDP^{-1}$ and
$$A^k = PD^kP^{-1} = \begin{bmatrix}3 & 2\\1 & 1\end{bmatrix}\begin{bmatrix}3^k & 0\\0 & 2^k\end{bmatrix}\begin{bmatrix}1 & -2\\-1 & 3\end{bmatrix} = \begin{bmatrix}3^{k+1} - 2^{k+1} & -2(3^{k+1}) + 3(2^{k+1})\\3^k - 2^k & -2(3^k) + 3(2^k)\end{bmatrix}.$$
It now follows that
$$\begin{bmatrix}x_n\\x_{n-1}\end{bmatrix} = \mathbf{x}_n = A^{n-2}\mathbf{x}_2 = \begin{bmatrix}3^{n-1} - 2^{n-1} & -2(3^{n-1}) + 3(2^{n-1})\\3^{n-2} - 2^{n-2} & -2(3^{n-2}) + 3(2^{n-2})\end{bmatrix}\begin{bmatrix}5\\1\end{bmatrix} = \begin{bmatrix}3^n - 2^n\\3^{n-1} - 2^{n-1}\end{bmatrix},$$
from which we read off the solution $x_n = 3^n - 2^n$.
(To check our work, we could plug in $n = 1, 2, \ldots, 5$ to verify that this formula gives the same terms that we calculated using the recurrence relation. Try it!)

Observe that $x_n$ is a linear combination of powers of the eigenvalues. This is necessarily the case as long as the eigenvalues are distinct [as Theorem 4.38(a) will make explicit]. Using this observation, we can save ourselves some work. Once we have computed the eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$, we can immediately write
$$x_n = c_13^n + c_22^n,$$
where $c_1$ and $c_2$ are to be determined. Using the initial conditions, we have
$$1 = x_1 = c_13^1 + c_22^1 = 3c_1 + 2c_2$$
when $n = 1$ and
$$5 = x_2 = c_13^2 + c_22^2 = 9c_1 + 4c_2$$
when $n = 2$. We now solve the system
$$\begin{aligned}3c_1 + 2c_2 &= 1\\9c_1 + 4c_2 &= 5\end{aligned}$$
for $c_1$ and $c_2$ to obtain $c_1 = 1$ and $c_2 = -1$. Thus, $x_n = 3^n - 2^n$, as before. This is the method we will use in practice. We now illustrate its use to find an explicit formula for the Fibonacci numbers.

Example 4.41
Solve the Fibonacci recurrence $f_0 = 0$, $f_1 = 1$, and $f_n = f_{n-1} + f_{n-2}$ for $n \ge 2$.

Solution Writing the recurrence as $f_n - f_{n-1} - f_{n-2} = 0$, we see that the characteristic equation is $\lambda^2 - \lambda - 1 = 0$, so the eigenvalues are
$$\lambda_1 = \frac{1 + \sqrt5}{2} \quad\text{and}\quad \lambda_2 = \frac{1 - \sqrt5}{2}.$$
It follows from the discussion above that the solution to the recurrence relation has the form
$$f_n = c_1\lambda_1^n + c_2\lambda_2^n = c_1\left(\frac{1 + \sqrt5}{2}\right)^n + c_2\left(\frac{1 - \sqrt5}{2}\right)^n$$
for some scalars $c_1$ and $c_2$. Using the initial conditions, we find
$$0 = f_0 = c_1\lambda_1^0 + c_2\lambda_2^0 = c_1 + c_2$$
and
$$1 = f_1 = c_1\lambda_1^1 + c_2\lambda_2^1 = c_1\left(\frac{1 + \sqrt5}{2}\right) + c_2\left(\frac{1 - \sqrt5}{2}\right).$$
Solving for $c_1$ and $c_2$, we obtain $c_1 = 1/\sqrt5$ and $c_2 = -1/\sqrt5$. Hence, an explicit formula for the $n$th Fibonacci number is
$$f_n = \frac{1}{\sqrt5}\left(\frac{1 + \sqrt5}{2}\right)^n - \frac{1}{\sqrt5}\left(\frac{1 - \sqrt5}{2}\right)^n. \tag{5}$$

Jacques Binet (1786–1856) made contributions to matrix theory, number theory, physics, and astronomy. He discovered the rule for matrix multiplication in 1812. Binet's formula for the Fibonacci numbers is actually due to Euler, who published it in 1765; however, it was forgotten until Binet published his version in 1843. Like Cauchy, Binet was a royalist, and he lost his university position when Charles X abdicated in 1830. He received many honors for his work, including his election, in 1843, to the Académie des Sciences.
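Both closed forms derived above can be checked mechanically. The sketch below (plain Python, our own helper names) compares three routes for Example 4.40: direct iteration, the matrix-power formula $\mathbf{x}_n = A^{n-2}\mathbf{x}_2$, and $x_n = 3^n - 2^n$. It then compares formula (5) with the Fibonacci recurrence. The matrix and integer work is exact. Formula (5) is evaluated in floating point and rounded, so we only trust it for moderate $n$. The cutoff $n \le 40$ is our own conservative choice.

```python
import math


def mat_mul2(A, B):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]]


def mat_pow2(A, k):
    """A**k for a 2x2 matrix by repeated multiplication (k >= 0)."""
    R = [[1, 0], [0, 1]]
    for _ in range(k):
        R = mat_mul2(R, A)
    return R


def x_by_matrix(n):
    """x_n read from the top entry of A^(n-2) x_2, A = [[5, -6], [1, 0]], x_2 = [5, 1]."""
    M = mat_pow2([[5, -6], [1, 0]], n - 2)
    return M[0][0] * 5 + M[0][1] * 1


def binet(n):
    """Formula (5), evaluated in floating point and rounded to an integer."""
    s5 = math.sqrt(5)
    return round(((1 + s5) / 2) ** n / s5 - ((1 - s5) / 2) ** n / s5)


# Direct iteration of Example 4.40 (x_1 = 1, x_2 = 5).
xs = {1: 1, 2: 5}
for n in range(3, 31):
    xs[n] = 5 * xs[n - 1] - 6 * xs[n - 2]

# Direct iteration of the Fibonacci recurrence (f_0 = 0, f_1 = 1).
fib = [0, 1]
while len(fib) <= 40:
    fib.append(fib[-1] + fib[-2])

matrix_ok = all(x_by_matrix(n) == xs[n] == 3**n - 2**n for n in range(2, 31))
binet_ok = all(binet(n) == fib[n] for n in range(41))
print(matrix_ok, binet_ok)    # True True
```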
Formula (5) is a remarkable formula, because it is defined in terms of the irrational number $\sqrt5$, yet the Fibonacci numbers are all integers! Try plugging in a few values of $n$ to see how the $\sqrt5$ terms cancel out to leave the integer values $f_n$. Formula (5) is known as Binet's formula.

The method we have just outlined works for any second order linear recurrence relation whose associated eigenvalues are all distinct. When there is a repeated eigenvalue, the technique must be modified, since the diagonalization method we used may no longer work. The next theorem summarizes both situations.

Theorem 4.38
Let $x_n = ax_{n-1} + bx_{n-2}$ be a recurrence relation that is satisfied by a sequence $(x_n)$. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the associated characteristic equation $\lambda^2 - a\lambda - b = 0$.
a. If $\lambda_1 \ne \lambda_2$, then $x_n = c_1\lambda_1^n + c_2\lambda_2^n$ for some scalars $c_1$ and $c_2$.
b. If $\lambda_1 = \lambda_2 = \lambda$, then $x_n = c_1\lambda^n + c_2n\lambda^n$ for some scalars $c_1$ and $c_2$.
In either case, $c_1$ and $c_2$ can be determined using the initial conditions.

Proof (a) Generalizing our discussion above, we can write the recurrence as $\mathbf{x}_n = A\mathbf{x}_{n-1}$, where
$$\mathbf{x}_n = \begin{bmatrix}x_n\\x_{n-1}\end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix}a & b\\1 & 0\end{bmatrix}.$$
Since $A$ has distinct eigenvalues, it can be diagonalized. The rest of the details are left for Exercise 53.

(b) We will show that $x_n = c_1\lambda^n + c_2n\lambda^n$ satisfies the recurrence relation $x_n = ax_{n-1} + bx_{n-2}$ or, equivalently,
$$x_n - ax_{n-1} - bx_{n-2} = 0 \tag{6}$$
if $\lambda^2 - a\lambda - b = 0$. Since
$$x_{n-1} = c_1\lambda^{n-1} + c_2(n-1)\lambda^{n-1} \quad\text{and}\quad x_{n-2} = c_1\lambda^{n-2} + c_2(n-2)\lambda^{n-2},$$
substitution into Equation (6) yields
$$\begin{aligned}x_n - ax_{n-1} - bx_{n-2} &= (c_1\lambda^n + c_2n\lambda^n) - a\left(c_1\lambda^{n-1} + c_2(n-1)\lambda^{n-1}\right) - b\left(c_1\lambda^{n-2} + c_2(n-2)\lambda^{n-2}\right)\\&= c_1(\lambda^n - a\lambda^{n-1} - b\lambda^{n-2}) + c_2\left(n\lambda^n - a(n-1)\lambda^{n-1} - b(n-2)\lambda^{n-2}\right)\\&= c_1\lambda^{n-2}(\lambda^2 - a\lambda - b) + c_2n\lambda^{n-2}(\lambda^2 - a\lambda - b) + c_2\lambda^{n-2}(a\lambda + 2b)\\&= c_1\lambda^{n-2}(0) + c_2n\lambda^{n-2}(0) + c_2\lambda^{n-2}(a\lambda + 2b)\\&= c_2\lambda^{n-2}(a\lambda + 2b).\end{aligned}$$
But since $\lambda$ is a double root of $\lambda^2 - a\lambda - b = 0$, we must have $a^2 + 4b = 0$ and $\lambda = a/2$, using the quadratic formula. Consequently, $a\lambda + 2b = a^2/2 + 2b = -4b/2 + 2b = 0$, so $x_n - ax_{n-1} - bx_{n-2} = 0$.
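Suppose the initial conditions are $x_0 = r$ and $x_1 = s$. Then, in either (a) or (b) there is a unique solution for $c_1$ and $c_2$. (See Exercise 54.)

Example 4.42
Solve the recurrence relation $x_0 = 1$, $x_1 = 6$, and $x_n = 6x_{n-1} - 9x_{n-2}$ for $n \ge 2$.

Solution The characteristic equation is $\lambda^2 - 6\lambda + 9 = 0$, which has $\lambda = 3$ as a double root. By Theorem 4.38(b), we must have $x_n = c_13^n + c_2n3^n = (c_1 + c_2n)3^n$. Since $1 = x_0 = c_1$ and $6 = x_1 = (c_1 + c_2)3$, we find that $c_2 = 1$, so
$$x_n = (1 + n)3^n.$$

The techniques outlined in Theorem 4.38 can be extended to higher order recurrence relations. We state, without proof, the general result.

Theorem 4.39
Let $x_n = a_{m-1}x_{n-1} + a_{m-2}x_{n-2} + \cdots + a_0x_{n-m}$ be a recurrence relation of order $m$ that is satisfied by a sequence $(x_n)$. Suppose the associated characteristic polynomial
$$\lambda^m - a_{m-1}\lambda^{m-1} - a_{m-2}\lambda^{m-2} - \cdots - a_0$$
factors as $(\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2}\cdots(\lambda - \lambda_k)^{m_k}$, where $m_1 + m_2 + \cdots + m_k = m$. Then $x_n$ has the form
$$x_n = \left(c_{11}\lambda_1^n + c_{12}n\lambda_1^n + c_{13}n^2\lambda_1^n + \cdots + c_{1m_1}n^{m_1-1}\lambda_1^n\right) + \cdots + \left(c_{k1}\lambda_k^n + c_{k2}n\lambda_k^n + c_{k3}n^2\lambda_k^n + \cdots + c_{km_k}n^{m_k-1}\lambda_k^n\right).$$

Systems of Linear Differential Equations
In calculus, you learn that if $x = x(t)$ is a differentiable function satisfying a differential equation of the form $x' = kx$, where $k$ is a constant, then the general solution is $x = Ce^{kt}$, where $C$ is a constant. If an initial condition $x(0) = x_0$ is specified, then, by substituting $t = 0$ in the general solution, we find that $C = x_0$. Hence, the unique solution to the differential equation that satisfies the initial condition is
$$x = x_0e^{kt}.$$
Suppose we have $n$ differentiable functions of $t$, say, $x_1, x_2, \ldots, x_n$, that satisfy a system of differential equations
$$\begin{aligned}x_1' &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\x_2' &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\&\ \,\vdots\\x_n' &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n\end{aligned}$$
We can write this system in matrix form as $\mathbf{x}' = A\mathbf{x}$, where
$$\mathbf{x}(t) = \begin{bmatrix}x_1(t)\\x_2(t)\\\vdots\\x_n(t)\end{bmatrix}, \quad \mathbf{x}'(t) = \begin{bmatrix}x_1'(t)\\x_2'(t)\\\vdots\\x_n'(t)\end{bmatrix}, \quad\text{and}\quad A = \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\a_{21} & a_{22} & \cdots & a_{2n}\\\vdots & \vdots & \ddots & \vdots\\a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}.$$
Now we can use matrix methods to help us find the solution.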
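First, we make a useful observation. Suppose we want to solve the following system of differential equations:
$$\begin{aligned}x_1' &= 2x_1\\x_2' &= 5x_2\end{aligned}$$
Each equation can be solved separately, as above, to give the solutions
$$\begin{aligned}x_1 &= C_1e^{2t}\\x_2 &= C_2e^{5t}\end{aligned}$$
where $C_1$ and $C_2$ are constants. Notice that, in matrix form, our equation $\mathbf{x}' = A\mathbf{x}$ has a diagonal coefficient matrix
$$A = \begin{bmatrix}2 & 0\\0 & 5\end{bmatrix}$$
and the eigenvalues 2 and 5 occur in the exponentials $e^{2t}$ and $e^{5t}$ of the solution. This suggests that, for an arbitrary system, we should start by diagonalizing the coefficient matrix if possible.

Example 4.43
Solve the following system of differential equations:
$$\begin{aligned}x_1' &= x_1 + 2x_2\\x_2' &= 3x_1 + 2x_2\end{aligned}$$

Solution Here the coefficient matrix is $A = \begin{bmatrix}1 & 2\\3 & 2\end{bmatrix}$, and we find that the eigenvalues are $\lambda_1 = 4$ and $\lambda_2 = -1$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix}2\\3\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}-1\\1\end{bmatrix}$, respectively. Therefore, $A$ is diagonalizable, and the matrix $P$ that does the job is
$$P = [\mathbf{v}_1\ \mathbf{v}_2] = \begin{bmatrix}2 & -1\\3 & 1\end{bmatrix}.$$
We know that
$$P^{-1}AP = \begin{bmatrix}4 & 0\\0 & -1\end{bmatrix} = D.$$
Let $\mathbf{x} = P\mathbf{y}$ (so that $\mathbf{x}' = P\mathbf{y}'$) and substitute these results into the original equation $\mathbf{x}' = A\mathbf{x}$ to get $P\mathbf{y}' = AP\mathbf{y}$ or, equivalently,
$$\mathbf{y}' = P^{-1}AP\mathbf{y} = D\mathbf{y}.$$
This is just the system
$$\begin{aligned}y_1' &= 4y_1\\y_2' &= -y_2\end{aligned}$$
whose general solution is
$$\begin{aligned}y_1 &= C_1e^{4t}\\y_2 &= C_2e^{-t}\end{aligned} \quad\text{or}\quad \mathbf{y} = \begin{bmatrix}C_1e^{4t}\\C_2e^{-t}\end{bmatrix}.$$
To find $\mathbf{x}$, we just compute
$$\mathbf{x} = P\mathbf{y} = \begin{bmatrix}2 & -1\\3 & 1\end{bmatrix}\begin{bmatrix}C_1e^{4t}\\C_2e^{-t}\end{bmatrix} = \begin{bmatrix}2C_1e^{4t} - C_2e^{-t}\\3C_1e^{4t} + C_2e^{-t}\end{bmatrix},$$
so $x_1 = 2C_1e^{4t} - C_2e^{-t}$ and $x_2 = 3C_1e^{4t} + C_2e^{-t}$. (Check that these values satisfy the given system.)

Remark
Observe that we could also express the solution in Example 4.43 as
$$\mathbf{x} = C_1e^{4t}\begin{bmatrix}2\\3\end{bmatrix} + C_2e^{-t}\begin{bmatrix}-1\\1\end{bmatrix} = C_1e^{4t}\mathbf{v}_1 + C_2e^{-t}\mathbf{v}_2.$$
This technique generalizes easily to $n \times n$ systems where the coefficient matrix is diagonalizable. The next theorem, whose proof is left as an exercise, summarizes the situation.

Theorem 4.40
Let $A$ be an $n \times n$ diagonalizable matrix and let $P = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_n]$ be such that
$$P^{-1}AP = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\0 & \lambda_2 & \cdots & 0\\\vdots & \vdots & \ddots & \vdots\\0 & 0 & \cdots & \lambda_n\end{bmatrix}.$$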
Then the general solution to the system $\mathbf{x}' = A\mathbf{x}$ is
$$\mathbf{x} = C_1e^{\lambda_1t}\mathbf{v}_1 + C_2e^{\lambda_2t}\mathbf{v}_2 + \cdots + C_ne^{\lambda_nt}\mathbf{v}_n.$$

The next example involves a biological model in which two species live in the same ecosystem. It is reasonable to assume that the growth rate of each species depends on the sizes of both populations. (Of course, there are other factors that govern growth, but we will keep our model simple by ignoring these.) If $x_1(t)$ and $x_2(t)$ denote the sizes of the two populations at time $t$, then $x_1'(t)$ and $x_2'(t)$ are their rates of growth at time $t$. Our model is of the form
$$\begin{aligned}x_1'(t) &= ax_1(t) + bx_2(t)\\x_2'(t) &= cx_1(t) + dx_2(t)\end{aligned}$$
where the coefficients $a$, $b$, $c$, and $d$ depend on the conditions.

Example 4.44
Raccoons and squirrels inhabit the same ecosystem and compete with each other for food, water, and space. Let the raccoon and squirrel populations at time $t$ years be given by $r(t)$ and $s(t)$, respectively. In the absence of squirrels, the raccoon growth rate is $r'(t) = 2.5r(t)$, but when squirrels are present, the competition slows the raccoon growth rate to $r'(t) = 2.5r(t) - s(t)$. The squirrel population is similarly affected by the raccoons. In the absence of raccoons, the growth rate of the squirrel population is $s'(t) = 2.5s(t)$, and the population growth rate for squirrels when they are sharing the ecosystem with raccoons is $s'(t) = -0.25r(t) + 2.5s(t)$. Suppose that initially there are 60 raccoons and 60 squirrels in the ecosystem. Determine what happens to these two populations.

Solution Our system is $\mathbf{x}' = A\mathbf{x}$, where
$$\mathbf{x} = \mathbf{x}(t) = \begin{bmatrix}r(t)\\s(t)\end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix}2.5 & -1.0\\-0.25 & 2.5\end{bmatrix}.$$
The eigenvalues of $A$ are $\lambda_1 = 3$ and $\lambda_2 = 2$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix}-2\\1\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}2\\1\end{bmatrix}$. By Theorem 4.40, the general solution to our system is
$$\mathbf{x}(t) = C_1e^{3t}\mathbf{v}_1 + C_2e^{2t}\mathbf{v}_2 = C_1e^{3t}\begin{bmatrix}-2\\1\end{bmatrix} + C_2e^{2t}\begin{bmatrix}2\\1\end{bmatrix}. \tag{7}$$
The initial population vector is $\mathbf{x}(0) = \begin{bmatrix}r(0)\\s(0)\end{bmatrix} = \begin{bmatrix}60\\60\end{bmatrix}$, so, setting $t = 0$ in Equation (7), we have
$$C_1\begin{bmatrix}-2\\1\end{bmatrix} + C_2\begin{bmatrix}2\\1\end{bmatrix} = \begin{bmatrix}60\\60\end{bmatrix}.$$
Solving this equation, we find $C_1 = 15$ and $C_2 = 45$. Hence,
$$\mathbf{x}(t) = 15e^{3t}\begin{bmatrix}-2\\1\end{bmatrix} + 45e^{2t}\begin{bmatrix}2\\1\end{bmatrix},$$
from which we find that $r(t) = -30e^{3t} + 90e^{2t}$ and $s(t) = 15e^{3t} + 45e^{2t}$. Figure 4.20 shows the graphs of these two functions, and you can see clearly that the raccoon population dies out after a little more than 1 year. (Can you determine exactly when it dies out?)

[Figure 4.20: Raccoon and squirrel populations]

We now consider a similar example, in which one species is a source of food for the other. Such a model is called a predator-prey model. Once again, our model will be drastically oversimplified in order to illustrate its main features.

Example 4.45
Robins and worms cohabit an ecosystem. The robins eat the worms, which are their only source of food. The robin and worm populations at time $t$ years are denoted by $r(t)$ and $w(t)$, respectively, and the equations governing the growth of the two populations are
$$\begin{aligned}r'(t) &= w(t) - 12\\w'(t) &= -r(t) + 10\end{aligned} \tag{8}$$
If initially 6 robins and 20 worms occupy the ecosystem, determine the behavior of the two populations over time.

Solution The first thing we notice about this example is the presence of the extra constants, $-12$ and $10$, in the two equations. Fortunately, we can get rid of them with a simple change of variables. If we let $r(t) = x(t) + 10$ and $w(t) = y(t) + 12$, then $r'(t) = x'(t)$ and $w'(t) = y'(t)$. Substituting into Equations (8), we have
$$\begin{aligned}x'(t) &= y(t)\\y'(t) &= -x(t)\end{aligned} \tag{9}$$
which is easier to work with. Equations (9) have the form $\mathbf{x}' = A\mathbf{x}$, where $A = \begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix}$. Our new initial conditions are
$$x(0) = r(0) - 10 = 6 - 10 = -4 \quad\text{and}\quad y(0) = w(0) - 12 = 20 - 12 = 8,$$
so $\mathbf{x}(0) = \begin{bmatrix}-4\\8\end{bmatrix}$.

Proceeding as in the last example, we find the eigenvalues and eigenvectors of $A$.
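For the $2 \times 2$ coefficient matrices in Examples 4.43 to 4.45, the eigenvalues are the roots of $\lambda^2 - (\operatorname{tr}A)\lambda + \det A = 0$. The sketch below (plain Python; `eig2` is our own hypothetical helper) applies the quadratic formula using the standard `cmath` module. With `cmath`, a negative discriminant produces complex roots instead of an error. That is exactly the situation about to arise for system (9).

```python
import cmath


def eig2(A):
    """Eigenvalues of a 2x2 matrix, from lambda^2 - (tr A) lambda + det A = 0.

    cmath.sqrt returns a complex square root, so the function also works
    when the discriminant is negative and the eigenvalues are complex.
    """
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


print(eig2([[1, 2], [3, 2]]))              # Example 4.43: 4 and -1
print(eig2([[2.5, -1.0], [-0.25, 2.5]]))   # Example 4.44: 3 and 2
print(eig2([[0, 1], [-1, 0]]))             # Example 4.45, system (9): i and -i
```

The results are Python complex numbers even when the eigenvalues are real (for example, `(4+0j)`). That keeps a single code path for every case.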
The characteristic polynomial is $\lambda^2 + 1$, which has no real roots. What should we do? We have no choice but to use the complex roots, which are $\lambda_1 = i$ and $\lambda_2 = -i$. The corresponding eigenvectors are also complex, namely, $\mathbf{v}_1 = \begin{bmatrix}1\\i\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}1\\-i\end{bmatrix}$. By Theorem 4.40, our solution has the form
$$\mathbf{x}(t) = C_1e^{it}\mathbf{v}_1 + C_2e^{-it}\mathbf{v}_2 = C_1e^{it}\begin{bmatrix}1\\i\end{bmatrix} + C_2e^{-it}\begin{bmatrix}1\\-i\end{bmatrix}.$$
From $\mathbf{x}(0) = \begin{bmatrix}-4\\8\end{bmatrix}$, we get
$$C_1\begin{bmatrix}1\\i\end{bmatrix} + C_2\begin{bmatrix}1\\-i\end{bmatrix} = \begin{bmatrix}-4\\8\end{bmatrix},$$
whose solution is $C_1 = -2 - 4i$ and $C_2 = -2 + 4i$. So the solution to system (9) is
$$\mathbf{x}(t) = (-2 - 4i)e^{it}\begin{bmatrix}1\\i\end{bmatrix} + (-2 + 4i)e^{-it}\begin{bmatrix}1\\-i\end{bmatrix}.$$
What are we to make of this solution? Robins and worms inhabit a real world, yet our solution involves complex numbers! Fearlessly proceeding, we apply Euler's formula
$$e^{it} = \cos t + i\sin t$$
(Appendix C) to get $e^{-it} = \cos(-t) + i\sin(-t) = \cos t - i\sin t$.

[Cartoon: CALVIN AND HOBBES © 1988 Watterson. Reprinted with permission of UNIVERSAL PRESS SYNDICATE. All rights reserved.]

Substituting, we have
$$\begin{aligned}\mathbf{x}(t) &= (-2 - 4i)(\cos t + i\sin t)\begin{bmatrix}1\\i\end{bmatrix} + (-2 + 4i)(\cos t - i\sin t)\begin{bmatrix}1\\-i\end{bmatrix}\\&= \begin{bmatrix}(-2\cos t + 4\sin t) + i(-4\cos t - 2\sin t)\\(4\cos t + 2\sin t) + i(-2\cos t + 4\sin t)\end{bmatrix} + \begin{bmatrix}(-2\cos t + 4\sin t) + i(4\cos t + 2\sin t)\\(4\cos t + 2\sin t) + i(2\cos t - 4\sin t)\end{bmatrix}\\&= \begin{bmatrix}-4\cos t + 8\sin t\\8\cos t + 4\sin t\end{bmatrix}.\end{aligned}$$
This gives $x(t) = -4\cos t + 8\sin t$ and $y(t) = 8\cos t + 4\sin t$. Putting everything in terms of our original variables, we conclude that
$$r(t) = x(t) + 10 = -4\cos t + 8\sin t + 10$$
and
$$w(t) = y(t) + 12 = 8\cos t + 4\sin t + 12.$$

[Figure 4.21: Robin and worm populations]
[Figure 4.22]

So our solution is real after all! The graphs of $r(t)$ and $w(t)$ in Figure 4.21 show that the two populations oscillate periodically. As the robin population increases, the worm population starts to decrease, but as the robins' only food source diminishes, their numbers start to decline as well. As the predators disappear, the worm population begins to recover.
As its food supply increases, so does the robin population, and the cycle repeats itself. This oscillation is typical of examples in which the eigenvalues are complex.

Plotting robins, worms, and time on separate axes, as in Figure 4.22, clearly reveals the cyclic nature of the two populations.

We conclude this section by looking at what we have done from a different point of view. If $x = x(t)$ is a differentiable function of $t$, then the general solution of the ordinary differential equation $x' = ax$ is $x = ce^{at}$, where $c$ is a scalar. The systems of linear differential equations we have been considering have the form $\mathbf{x}' = A\mathbf{x}$, so if we simply plowed ahead without thinking, we might be tempted to deduce that the solution would be $\mathbf{x} = e^{At}\mathbf{c}$, where $\mathbf{c}$ is a vector. But what on earth could this mean? On the right-hand side, we have the number $e$ raised to the power of a matrix. This appears to be nonsense, yet you will see that there is a way to make sense of it.

Let's start by considering the expression $e^A$. In calculus, you learn that the function $e^x$ has a power series expansion
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
that converges for every real number $x$. By analogy, let us define
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$
The right-hand side is just defined in terms of powers of $A$, and it can be shown that it converges for any real matrix $A$. So now $e^A$ is a matrix, called the exponential of $A$. But how can we compute $e^A$ or $e^{At}$? For diagonal matrices, it is easy.

Example 4.46
Compute $e^{Dt}$ for $D = \begin{bmatrix}4 & 0\\0 & -1\end{bmatrix}$.

Solution From the definition, we have
$$\begin{aligned}e^{Dt} &= I + Dt + \frac{(Dt)^2}{2!} + \frac{(Dt)^3}{3!} + \cdots\\&= \begin{bmatrix}1 & 0\\0 & 1\end{bmatrix} + \begin{bmatrix}4t & 0\\0 & -t\end{bmatrix} + \frac{1}{2!}\begin{bmatrix}(4t)^2 & 0\\0 & (-t)^2\end{bmatrix} + \frac{1}{3!}\begin{bmatrix}(4t)^3 & 0\\0 & (-t)^3\end{bmatrix} + \cdots\\&= \begin{bmatrix}1 + (4t) + \frac{1}{2!}(4t)^2 + \frac{1}{3!}(4t)^3 + \cdots & 0\\0 & 1 + (-t) + \frac{1}{2!}(-t)^2 + \frac{1}{3!}(-t)^3 + \cdots\end{bmatrix}\\&= \begin{bmatrix}e^{4t} & 0\\0 & e^{-t}\end{bmatrix}.\end{aligned}$$

The matrix exponential is also nice if $A$ is diagonalizable.

Example 4.47
Compute $e^A$ for $A = \begin{bmatrix}1 & 2\\3 & 2\end{bmatrix}$.

Solution In Example 4.43, we found the eigenvalues of $A$ to be $\lambda_1 = 4$ and $\lambda_2 = -1$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix}2\\3\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}-1\\1\end{bmatrix}$, respectively.
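Hence, with $P = [\mathbf{v}_1\ \mathbf{v}_2] = \begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}$, we have $P^{-1}AP = D = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$. Since $A = PDP^{-1}$, we have $A^k = PD^kP^{-1}$, so
$$\begin{aligned}
e^A &= I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \\
&= PIP^{-1} + PDP^{-1} + \frac{1}{2!}PD^2P^{-1} + \frac{1}{3!}PD^3P^{-1} + \cdots \\
&= P\left(I + D + \frac{D^2}{2!} + \frac{D^3}{3!} + \cdots\right)P^{-1} \\
&= Pe^DP^{-1} \\
&= \begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} e^4 & 0 \\ 0 & e^{-1} \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}^{-1} \\
&= \frac{1}{5}\begin{bmatrix} 2e^4 + 3e^{-1} & 2e^4 - 2e^{-1} \\ 3e^4 - 3e^{-1} & 3e^4 + 2e^{-1} \end{bmatrix}.
\end{aligned}$$

We are now in a position to show that our bold (and seemingly foolish) guess at an "exponential" solution of $\mathbf{x}' = A\mathbf{x}$ was not so far off after all!

Theorem 4.41
Let $A$ be an $n \times n$ diagonalizable matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Then the general solution to the system $\mathbf{x}' = A\mathbf{x}$ is $\mathbf{x} = e^{At}\mathbf{c}$, where $\mathbf{c}$ is an arbitrary constant vector. If an initial condition $\mathbf{x}(0)$ is specified, then $\mathbf{c} = \mathbf{x}(0)$.

Proof Let $P$ diagonalize $A$. Then $A = PDP^{-1}$, and, as in Example 4.47, $e^{At} = Pe^{Dt}P^{-1}$. Hence, we need to check that $\mathbf{x}' = A\mathbf{x}$ is satisfied by $\mathbf{x} = Pe^{Dt}P^{-1}\mathbf{c}$. Now, everything is constant except for $e^{Dt}$, so
$$\mathbf{x}' = \frac{d\mathbf{x}}{dt} = P\,\frac{d}{dt}\left(e^{Dt}\right)P^{-1}\mathbf{c}. \tag{10}$$
If $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, then $e^{Dt} = \operatorname{diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_n t})$. Taking derivatives, we have
$$\frac{d}{dt}\left(e^{Dt}\right) = \begin{bmatrix} \lambda_1 e^{\lambda_1 t} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n e^{\lambda_n t} \end{bmatrix} = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} e^{\lambda_1 t} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & e^{\lambda_n t} \end{bmatrix} = De^{Dt}.$$
Substituting this result into Equation (10), we obtain
$$\mathbf{x}' = PDe^{Dt}P^{-1}\mathbf{c} = PDP^{-1}Pe^{Dt}P^{-1}\mathbf{c} = (PDP^{-1})(Pe^{Dt}P^{-1})\mathbf{c} = Ae^{At}\mathbf{c} = A\mathbf{x},$$
as required.

The last statement follows easily from the fact that if $\mathbf{x} = \mathbf{x}(t) = e^{At}\mathbf{c}$, then
$$\mathbf{x}(0) = e^{A \cdot 0}\mathbf{c} = e^{O}\mathbf{c} = I\mathbf{c} = \mathbf{c},$$
since $e^{O} = I$. (Why?)

In fact, Theorem 4.41 is true even if $A$ is not diagonalizable, but we will not prove this. Computing matrix exponentials for nondiagonalizable matrices requires the Jordan normal form of a matrix. This topic may be found in more advanced linear algebra texts (see, for example, Linear Algebra by S. H. Friedberg, A. J. Insel, and L. E. Spence, Englewood Cliffs, NJ: Prentice-Hall, 1979). Ideally, this short digression has served to illustrate the power of mathematics to generalize and the value of creative thinking. Matrix exponentials turn out to be very important tools in many applications of linear algebra, both theoretical and applied.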
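The computation of $e^A$ can be done in two ways: by summing the defining series, or by using the shortcut $e^A = Pe^DP^{-1}$. The two can be compared numerically. The standard-library Python sketch below truncates the series after a fixed number of terms. That cutoff is an assumption that is ample for these small matrices, not a general-purpose algorithm, and the helper names are hypothetical. The sketch checks the truncated series against:

- the closed form of Example 4.47,
- the diagonal case of Example 4.46,
- the conclusion of Theorem 4.41, using a finite-difference derivative.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def expm_series(A, terms=40):
    """Approximate e^A = I + A + A^2/2! + ... using the first `terms` terms."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def max_rel_diff(X, Y):
    """Largest entrywise difference, relative to max(1, |Y_ij|)."""
    return max(abs(x - y) / max(1.0, abs(y)) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


# Example 4.47: A = [[1, 2], [3, 2]] has eigenvalues 4 and -1.
A = [[1.0, 2.0], [3.0, 2.0]]
e4, em1 = math.exp(4), math.exp(-1)
closed_form = [[(2 * e4 + 3 * em1) / 5, (2 * e4 - 2 * em1) / 5],
               [(3 * e4 - 3 * em1) / 5, (3 * e4 + 2 * em1) / 5]]
print(max_rel_diff(expm_series(A), closed_form) < 1e-12)  # True

# Example 4.46 with t = 0.5: e^{Dt} should equal diag(e^{4t}, e^{-t}).
t = 0.5
Dt = [[4 * t, 0.0], [0.0, -t]]
print(max_rel_diff(expm_series(Dt), [[math.exp(4 * t), 0.0], [0.0, math.exp(-t)]]) < 1e-12)  # True


# Theorem 4.41: x(t) = e^{At} c should satisfy x' = Ax and x(0) = c.
def x_of_t(t, c=(1.0, 0.0)):
    E = expm_series([[a * t for a in row] for row in A])
    return [E[0][0] * c[0] + E[0][1] * c[1], E[1][0] * c[0] + E[1][1] * c[1]]


h, t0 = 1e-5, 0.3
deriv = [(p - m) / (2 * h) for p, m in zip(x_of_t(t0 + h), x_of_t(t0 - h))]
x_mid = x_of_t(t0)
Ax = [A[0][0] * x_mid[0] + A[0][1] * x_mid[1], A[1][0] * x_mid[0] + A[1][1] * x_mid[1]]
print(max(abs(d - v) for d, v in zip(deriv, Ax)) < 1e-5)  # True
```

In practice, library routines such as SciPy's `scipy.linalg.expm` use more robust methods than plain series truncation. The series is used here only because it mirrors the definition in the text.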
Discrete Linear Dynamical Systems

We conclude this chapter as we began it, by looking at dynamical systems. Markov chains and the Leslie model of population growth are examples of discrete linear dynamical systems. Each can be described by a matrix equation of the form
$$\mathbf{x}_{k+1} = A\mathbf{x}_k,$$
where the vector $\mathbf{x}_k$ records the state of the system at "time" $k$ and $A$ is a square matrix. As we have seen, the long-term behavior of these systems is related to the eigenvalues and eigenvectors of the matrix $A$. The power method exploits the iterative nature of such dynamical systems to approximate eigenvalues and eigenvectors. The Perron-Frobenius Theorem gives specialized information about the long-term behavior of a discrete linear dynamical system whose coefficient matrix $A$ is nonnegative.

When $A$ is a $2 \times 2$ matrix, we can describe the evolution of a dynamical system geometrically. The equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ is really an infinite collection of equations. Beginning with an initial vector $\mathbf{x}_0$, we have:
$$\mathbf{x}_1 = A\mathbf{x}_0, \quad \mathbf{x}_2 = A\mathbf{x}_1, \quad \mathbf{x}_3 = A\mathbf{x}_2, \quad \ldots$$
The set $\{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\}$ is called a trajectory of the system. (For graphical purposes, we will identify each vector in a trajectory with its head so that we can plot it as a point.) Note that $\mathbf{x}_k = A^k\mathbf{x}_0$.

Example 4.48
Let $A = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.8 \end{bmatrix}$. For the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, plot the first five points in the trajectory with initial vector (a) $\mathbf{x}_0 = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$. Do the same for the initial vectors (b), (c), and (d), whose trajectories appear in Figure 4.23.

Solution (a) We compute
$$\mathbf{x}_1 = A\mathbf{x}_0 = \begin{bmatrix} 2.5 \\ 0 \end{bmatrix}, \quad \mathbf{x}_2 = A\mathbf{x}_1 = \begin{bmatrix} 1.25 \\ 0 \end{bmatrix}, \quad \mathbf{x}_3 = A\mathbf{x}_2 = \begin{bmatrix} 0.625 \\ 0 \end{bmatrix}, \quad \mathbf{x}_4 = A\mathbf{x}_3 = \begin{bmatrix} 0.3125 \\ 0 \end{bmatrix}.$$
These are plotted in Figure 4.23, and the points are connected to highlight the trajectory. Similar calculations produce the trajectories marked (b), (c), and (d) in Figure 4.23.

[Figure 4.23: Trajectories (a) to (d) for Example 4.48]
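To make the iteration concrete, the following sketch generates the first points of a trajectory of $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for the matrix of Example 4.48. It uses plain Python, and the helper names are illustrative. The starting vector $(4, 4)$ in the second part is a hypothetical choice, not taken from the text. It shows a trajectory that does not start on the $x$-axis bending toward the $y$-axis.

```python
def step(A, x):
    """One step of x_{k+1} = A x_k for a 2 x 2 matrix A."""
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def trajectory(A, x0, n_points):
    """Return the first n_points vectors x_0, x_1, ... of the trajectory."""
    points = [tuple(x0)]
    while len(points) < n_points:
        points.append(step(A, points[-1]))
    return points


A = [[0.5, 0.0], [0.0, 0.8]]
print(trajectory(A, (5.0, 0.0), 5))
# [(5.0, 0.0), (2.5, 0.0), (1.25, 0.0), (0.625, 0.0), (0.3125, 0.0)]

# Hypothetical start off the x-axis: the ratio x_k / y_k equals (0.5 / 0.8)^k,
# so the points shrink toward 0 while hugging the y-axis.
pts = trajectory(A, (4.0, 4.0), 21)
print(pts[-1][0] / pts[-1][1])  # roughly 8e-05
```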
In Example 4.48, every trajectory converges to $\mathbf{0}$. The origin is called an attractor in this case. We can understand why this is so from Theorem 4.19. The matrix $A$ in Example 4.48 has eigenvectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ corresponding to its eigenvalues 0.5 and 0.8, respectively. (Check this.) Accordingly, for any initial vector
$$\mathbf{x}_0 = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
we have
$$\mathbf{x}_k = A^k\mathbf{x}_0 = c_1(0.5)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(0.8)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
Because $(0.5)^k$ and $(0.8)^k$ both approach zero as $k$ gets large, $\mathbf{x}_k$ approaches $\mathbf{0}$ for any choice of $\mathbf{x}_0$. In addition, we know from Theorem 4.28 that 0.8 is the dominant eigenvalue of $A$. Therefore $\mathbf{x}_k$ will approach a multiple of the corresponding eigenvector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ as long as $c_2 \neq 0$, where $c_2$ is the coefficient of $\mathbf{x}_0$ corresponding to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. More precisely, the direction of $\mathbf{x}_k$ approaches that of this eigenvector. In other words, all trajectories except those that begin on the $x$-axis (where $c_2 = 0$) will approach the $y$-axis, as Figure 4.23 shows.

Example 4.49
Discuss the behavior of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ corresponding to the matrix $A = \begin{bmatrix} 0.65 & -0.15 \\ -0.15 & 0.65 \end{bmatrix}$.

Solution The eigenvalues of $A$ are 0.5 and 0.8. The corresponding eigenvectors are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. (Check this.) Hence, for an initial vector $\mathbf{x}_0 = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, we have
$$\mathbf{x}_k = A^k\mathbf{x}_0 = c_1(0.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.8)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
Once again the origin is an attractor, because $\mathbf{x}_k$ approaches $\mathbf{0}$ for any choice of $\mathbf{x}_0$. If $c_2 \neq 0$, the trajectory will approach the line through the origin with direction vector $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$. Several such trajectories are shown in Figure 4.24. The vectors $\mathbf{x}_0$ where $c_2 = 0$ are on the line through the origin with direction vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. The corresponding trajectory in this case follows this line into the origin.

[Figure 4.24: Trajectories for Example 4.49]
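The eigenvector decomposition used in Example 4.49 can be checked directly. The idea is to iterate $A$ and compare the result with $c_1(0.5)^k[1, 1]^T + c_2(0.8)^k[-1, 1]^T$. In the sketch below, the initial vector $(3, 1)$ is a hypothetical choice with $c_2 \neq 0$. Its coordinates follow from solving $\mathbf{x}_0 = c_1[1, 1]^T + c_2[-1, 1]^T$, which gives $c_1 = (x + y)/2$ and $c_2 = (y - x)/2$.

```python
A = [[0.65, -0.15], [-0.15, 0.65]]


def step(A, x):
    """One step of x_{k+1} = A x_k for a 2 x 2 matrix A."""
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def coefficients(x0):
    """Solve x0 = c1 [1, 1] + c2 [-1, 1] for (c1, c2)."""
    x, y = x0
    return (x + y) / 2, (y - x) / 2


def closed_form(x0, k):
    """x_k = c1 (0.5)^k [1, 1] + c2 (0.8)^k [-1, 1]."""
    c1, c2 = coefficients(x0)
    a, b = c1 * 0.5 ** k, c2 * 0.8 ** k
    return (a - b, a + b)


x0 = (3.0, 1.0)  # hypothetical initial vector; here c1 = 2, c2 = -1
x = x0
for k in range(1, 31):
    x = step(A, x)
    cf = closed_form(x0, k)
    assert abs(x[0] - cf[0]) < 1e-12 and abs(x[1] - cf[1]) < 1e-12

# Since c2 != 0, x_k lines up with the direction [-1, 1], so x/y approaches -1.
print(x[0] / x[1])  # close to -1
```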
Example 4.50
Discuss the behavior of the dynamical systems $\mathbf{x}_{k+1} = A\mathbf{x}_k$ corresponding to the following matrices:
(a) $A = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$

Solution (a) The eigenvalues of $A$ are 5 and 3. The corresponding eigenvectors are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$, respectively. Hence, for an initial vector $\mathbf{x}_0 = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -1 \end{bmatrix}$, we have
$$\mathbf{x}_k = A^k\mathbf{x}_0 = c_1 5^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 3^k\begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
As $k$ becomes large, so do both $5^k$ and $3^k$. Hence, $\mathbf{x}_k$ tends away from the origin. The dominant eigenvalue 5 has corresponding eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Therefore all trajectories for which $c_1 \neq 0$ will eventually end up in the first or the third quadrant. Trajectories with $c_1 = 0$ start and stay on the line $y = -x$, whose direction vector is $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. See Figure 4.25(a).

(b) In this example, the eigenvalues are 1.5 and 0.5. The corresponding eigenvectors are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. Hence,
$$\mathbf{x}_k = c_1(1.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.5)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
If $c_1 = 0$, then $\mathbf{x}_k = c_2(0.5)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ as $k \to \infty$. But if $c_1 \neq 0$, then $\|\mathbf{x}_k\| \to \infty$, and such trajectories asymptotically approach the line $y = x$. See Figure 4.25(b).

[Figure 4.25: Trajectories for Example 4.50, panels (a) and (b)]

In Example 4.50(a), all points that start out near the origin become increasingly large in magnitude, because $|\lambda| > 1$ for both eigenvalues. In this case $\mathbf{0}$ is called a repeller. In Example 4.50(b), $\mathbf{0}$ is called a saddle point, because the origin attracts points in some directions and repels points in other directions. In this case, $|\lambda_1| < 1$ and $|\lambda_2| > 1$. The next example shows what can happen when the eigenvalues of a real $2 \times 2$ matrix are complex (and hence conjugates of one another).

Example 4.51
Plot a trajectory of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for each of the following matrices:
(a) $A = \begin{bmatrix} 0.5 & -0.5 \\ 0.5 & 0.5 \end{bmatrix}$ (b) $A = \begin{bmatrix} 0.2 & -1.2 \\ 0.6 & 1.4 \end{bmatrix}$

Solution The trajectories are shown in Figure 4.26(a) and (b), respectively. Note that (a) is a trajectory spiraling into the origin, whereas (b) appears to follow an elliptical orbit.

[Figure 4.26: Trajectories for Example 4.51, panels (a) and (b)]
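In Examples 4.48 to 4.51, the behavior of the trajectories near the origin is decided by the moduli of the eigenvalues. That classification can be written as a small function. This is an illustrative helper, not a procedure from the text. It computes the eigenvalues of a $2 \times 2$ matrix from its trace and determinant, and uses a tolerance `tol` (an assumption) to decide when a modulus counts as 1. The spiral labels use the terminology the section introduces for complex eigenvalues: spiral attractor ($|\lambda| < 1$), spiral repeller ($|\lambda| > 1$), and orbital center ($|\lambda| = 1$).

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of lambda^2 - (trace) lambda + det = 0, as complex numbers."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def classify_origin(A, tol=1e-9):
    """Classify 0 for x_{k+1} = A x_k from the moduli of the eigenvalues of A."""
    l1, l2 = eigenvalues_2x2(A)
    m1, m2 = sorted((abs(l1), abs(l2)))
    if abs(l1.imag) > tol:  # complex conjugate pair, so both moduli agree
        if m2 < 1 - tol:
            return "spiral attractor"
        if m2 > 1 + tol:
            return "spiral repeller"
        return "orbital center"
    if m2 < 1 - tol:
        return "attractor"
    if m1 > 1 + tol:
        return "repeller"
    if m1 < 1 - tol and m2 > 1 + tol:
        return "saddle point"
    return "none of these"


examples = {
    "4.48": [[0.5, 0.0], [0.0, 0.8]],
    "4.49": [[0.65, -0.15], [-0.15, 0.65]],
    "4.50(a)": [[4.0, 1.0], [1.0, 4.0]],
    "4.50(b)": [[1.0, 0.5], [0.5, 1.0]],
    "4.51(a)": [[0.5, -0.5], [0.5, 0.5]],
    "4.51(b)": [[0.2, -1.2], [0.6, 1.4]],
}
for name, M in examples.items():
    print(name, classify_origin(M))
# 4.48 attractor
# 4.49 attractor
# 4.50(a) repeller
# 4.50(b) saddle point
# 4.51(a) spiral attractor
# 4.51(b) orbital center
```

The tolerance matters for Example 4.51(b): there $|\lambda| = 1$ exactly in theory, but it is only approximately 1 in floating point.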
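The following theorem explains the spiral behavior of the trajectory in Example 4.51(a).

Theorem 4.42
Let $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$. The eigenvalues of $A$ are $\lambda = a \pm bi$. If $a$ and $b$ are not both zero, then $A$ can be factored as
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$
where $r = |\lambda| = \sqrt{a^2 + b^2}$ and $\theta$ is the principal argument of $a + bi$.

Proof The eigenvalues of $A$ are
$$\lambda = \tfrac{1}{2}\left(2a \pm \sqrt{4(-b^2)}\right) = \tfrac{1}{2}\left(2a \pm 2|b|\sqrt{-1}\right) = a \pm |b|i = a \pm bi$$
by Exercise 35(b) in Section 4.1. Figure 4.27 displays $a + bi$, $r$, and $\theta$. It follows that
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = r\begin{bmatrix} a/r & -b/r \\ b/r & a/r \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

[Figure 4.27]

Remark Geometrically, Theorem 4.42 implies the following when $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \neq O$. The linear transformation $T(\mathbf{x}) = A\mathbf{x}$ is the composition of a rotation $R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ through the angle $\theta$, followed by a scaling $S = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}$ with factor $r$ (Figure 4.28). In Example 4.51(a), the eigenvalues are $\lambda = 0.5 \pm 0.5i$. So $r = |\lambda| = \sqrt{2}/2 \approx 0.707 < 1$, and hence the trajectories all spiral inward toward $\mathbf{0}$.

[Figure 4.28: A rotation followed by a scaling, $A\mathbf{x} = SR\mathbf{x}$]

The next theorem shows that, in general, a real $2 \times 2$ matrix with complex eigenvalues is similar to a matrix of the form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$. For a complex vector
$$\mathbf{x} = \begin{bmatrix} z \\ w \end{bmatrix} = \begin{bmatrix} a + bi \\ c + di \end{bmatrix},$$
we define the real part, $\operatorname{Re}\mathbf{x}$, and the imaginary part, $\operatorname{Im}\mathbf{x}$, of $\mathbf{x}$ to be
$$\operatorname{Re}\mathbf{x} = \begin{bmatrix} \operatorname{Re} z \\ \operatorname{Re} w \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix} \quad\text{and}\quad \operatorname{Im}\mathbf{x} = \begin{bmatrix} \operatorname{Im} z \\ \operatorname{Im} w \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}.$$

Theorem 4.43
Let $A$ be a real $2 \times 2$ matrix with a complex eigenvalue $\lambda = a - bi$ (where $b \neq 0$) and corresponding eigenvector $\mathbf{x}$. Then the matrix $P = [\operatorname{Re}\mathbf{x}\ \ \operatorname{Im}\mathbf{x}]$ is invertible and
$$A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1}.$$

Proof Let $\mathbf{x} = \mathbf{u} + \mathbf{v}i$, so that $\operatorname{Re}\mathbf{x} = \mathbf{u}$ and $\operatorname{Im}\mathbf{x} = \mathbf{v}$. From $A\mathbf{x} = \lambda\mathbf{x}$, we have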
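$$A\mathbf{u} + A\mathbf{v}i = A\mathbf{x} = \lambda\mathbf{x} = (a - bi)(\mathbf{u} + \mathbf{v}i) = a\mathbf{u} + a\mathbf{v}i - b\mathbf{u}i + b\mathbf{v} = (a\mathbf{u} + b\mathbf{v}) + (-b\mathbf{u} + a\mathbf{v})i.$$
Equating real and imaginary parts, we obtain
$$A\mathbf{u} = a\mathbf{u} + b\mathbf{v} \quad\text{and}\quad A\mathbf{v} = -b\mathbf{u} + a\mathbf{v}.$$
Now $P = [\mathbf{u} \mid \mathbf{v}]$, so
$$P\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = [\mathbf{u} \mid \mathbf{v}]\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = [a\mathbf{u} + b\mathbf{v} \mid -b\mathbf{u} + a\mathbf{v}] = [A\mathbf{u} \mid A\mathbf{v}] = A[\mathbf{u} \mid \mathbf{v}] = AP.$$
To show that $P$ is invertible, it is enough to show that $\mathbf{u}$ and $\mathbf{v}$ are linearly independent.

First, neither $\mathbf{u}$ nor $\mathbf{v}$ is $\mathbf{0}$. Otherwise $\mathbf{x}$ would be $\mathbf{u}$ or $\mathbf{v}i$, that is, $1$ or $i$ times a nonzero real vector $\mathbf{w}$ with $A\mathbf{w} = \lambda\mathbf{w}$. That is impossible for a real matrix $A$ and a nonreal $\lambda$. So if $\mathbf{u}$ and $\mathbf{v}$ were not linearly independent, it would follow that $\mathbf{v} = k\mathbf{u}$ for some nonzero real scalar $k$. Thus,
$$\mathbf{x} = \mathbf{u} + \mathbf{v}i = \mathbf{u} + k\mathbf{u}i = (1 + ki)\mathbf{u}.$$
Now, because $A$ is real, $A\mathbf{x} = \lambda\mathbf{x}$ implies that
$$A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}}.$$
So $\bar{\mathbf{x}} = \mathbf{u} - \mathbf{v}i$ is an eigenvector corresponding to the other eigenvalue $\bar{\lambda} = a + bi$. But
$$\bar{\mathbf{x}} = \overline{(1 + ki)\mathbf{u}} = (1 - ki)\mathbf{u},$$
because $\mathbf{u}$ is a real vector. Hence, the eigenvectors $\mathbf{x}$ and $\bar{\mathbf{x}}$ of $A$ are both nonzero multiples of $\mathbf{u}$ and therefore are multiples of one another. This is impossible, because eigenvectors corresponding to distinct eigenvalues must be linearly independent by Theorem 4.20. (This theorem is valid over the complex numbers as well as the real numbers.) This contradiction implies that $\mathbf{u}$ and $\mathbf{v}$ are linearly independent, and hence $P$ is invertible. It now follows that
$$A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1}.$$

Theorem 4.43 serves to explain Example 4.51(b). The eigenvalues of $A = \begin{bmatrix} 0.2 & -1.2 \\ 0.6 & 1.4 \end{bmatrix}$ are $0.8 \pm 0.6i$. For $\lambda = 0.8 - 0.6i$, a corresponding eigenvector is
$$\mathbf{x} = \begin{bmatrix} -1 - i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \end{bmatrix}i.$$
From Theorem 4.43, it follows that for $P = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix}$ and $C = \begin{bmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{bmatrix}$, we have
$$A = PCP^{-1} \quad\text{and}\quad P^{-1}AP = C.$$
For the given dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, we perform a change of variable. Let
$$\mathbf{x}_k = P\mathbf{y}_k \quad (\text{or, equivalently, } \mathbf{y}_k = P^{-1}\mathbf{x}_k).$$
Then $P\mathbf{y}_{k+1} = \mathbf{x}_{k+1} = A\mathbf{x}_k = AP\mathbf{y}_k$, so
$$\mathbf{y}_{k+1} = P^{-1}AP\mathbf{y}_k = C\mathbf{y}_k.$$
Now $C$ has the same eigenvalues as $A$ (why?), and $|0.8 \pm 0.6i| = 1$. Thus, by Theorem 4.42, the dynamical system $\mathbf{y}_{k+1} = C\mathbf{y}_k$ simply rotates the points in every trajectory in a circle about the origin.

To determine a trajectory of the dynamical system in Example 4.51(b), we iteratively apply the linear transformation $T(\mathbf{x}) = A\mathbf{x} = PCP^{-1}\mathbf{x}$. This transformation can be thought of as a composition of three steps:

1. a change of variable ($\mathbf{x}$ to $\mathbf{y}$),
2. the rotation determined by $C$,
3. the reverse change of variable ($\mathbf{y}$ back to $\mathbf{x}$).

We will encounter this idea again in the application to graphing quadratic equations in Section 5.5 and, more generally, as "change of basis" in Section 6.3. In Exercise 74 in Section 5.5, you will show that the trajectory in Example 4.51(b) is indeed an ellipse, as it appears to be from Figure 4.26(b).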
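The proof of Theorem 4.43 is constructive, so it can be followed step by step in code. The standard-library Python sketch below does this; the function name is hypothetical. It:

- takes the eigenvalue $\lambda = a - bi$ with $b > 0$,
- builds an eigenvector from the second row of $A - \lambda I$ (assuming $a_{21} \neq 0$; a general routine would need another branch),
- forms $P = [\operatorname{Re}\mathbf{x}\ \ \operatorname{Im}\mathbf{x}]$ and $C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$,
- checks that $AP = PC$.

For Example 4.51(b) it should produce $P = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix}$ and $C = \begin{bmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{bmatrix}$. It also recovers the scaling factor $r$ and rotation angle $\theta$ of Theorem 4.42.

```python
import cmath
import math


def real_canonical_form(A):
    """For a real 2x2 matrix A with non-real eigenvalues, return (P, C, lam)
    with P = [Re x  Im x], C = [[a, -b], [b, a]] and A P = P C, where
    lam = a - bi (b > 0) and x is an eigenvector for lam.
    Assumes A[1][0] != 0, so the second row of A - lam I yields an eigenvector."""
    (p, q), (s, u) = A
    tr, det = p + u, p * u - q * s
    disc = cmath.sqrt(tr * tr - 4 * det)  # negative real input -> positive imaginary root
    lam = (tr - disc) / 2                 # a - bi with b > 0
    # Second row of (A - lam I) x = 0:  s * x1 + (u - lam) * x2 = 0; take x2 = 1.
    x1, x2 = -(u - lam) / s, complex(1, 0)
    P = [[x1.real, x1.imag], [x2.real, x2.imag]]
    a, b = lam.real, -lam.imag
    C = [[a, -b], [b, a]]
    return P, C, lam


def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


A = [[0.2, -1.2], [0.6, 1.4]]  # Example 4.51(b)
P, C, lam = real_canonical_form(A)
print([[round(v, 10) for v in row] for row in P])  # [[-1.0, -1.0], [1.0, 0.0]]
print([[round(v, 10) for v in row] for row in C])  # [[0.8, -0.6], [0.6, 0.8]]

AP, PC = matmul(A, P), matmul(P, C)
print(max(abs(AP[i][j] - PC[i][j]) for i in range(2) for j in range(2)) < 1e-12)  # True

# Theorem 4.42 applied to C: scaling factor r = |lam| and angle theta = arg(a + bi).
r, theta = abs(lam), math.atan2(C[1][0], C[0][0])
print(round(r, 12), round(theta, 4))  # 1.0 0.6435
```

Because $r = 1$ here, $C$ is a pure rotation through about 0.6435 radians. This is consistent with the closed, elliptical trajectory described for Example 4.51(b).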
To summarize: Suppose a real $2 \times 2$ matrix $A$ has complex eigenvalues $\lambda = a \pm bi$. Then the trajectories of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ behave as follows:

- They spiral inward if $|\lambda| < 1$ ($\mathbf{0}$ is a spiral attractor).
- They spiral outward if $|\lambda| > 1$ ($\mathbf{0}$ is a spiral repeller).
- They lie on a closed orbit if $|\lambda| = 1$ ($\mathbf{0}$ is an orbital center).

Vignette
Ranking Sports Teams and Searching the Internet

In any competitive sports league, it is not necessarily a straightforward process to rank the players or teams. Counting wins and losses alone overlooks the possibility that one team may accumulate a large number of victories against weak teams, while another team may have fewer victories but all of them against strong teams. Which of these teams is better? How should we compare two teams that never play one another? Should points scored be taken into account? Points against?

Despite these complexities, the ranking of athletes and sports teams has become a commonplace and much-anticipated feature in the media. For example, there are various annual rankings of U.S. college football and basketball teams, and golfers and tennis players are also ranked internationally. There are many copyrighted schemes used to produce such rankings, but we can gain some insight into how to approach the problem by using the ideas from this chapter.

To establish the basic idea, let's revisit Example 3.69. Five tennis players play one another in a round-robin tournament. Wins and losses are recorded in the form of a digraph, in which a directed edge from $i$ to $j$ indicates that player $i$ defeats player $j$. The corresponding adjacency matrix $A$ therefore has $a_{ij} = 1$ if player $i$ defeats player $j$ and has $a_{ij} = 0$ otherwise:
$$A = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$

[Figure: digraph of the tournament results for players 1 to 5]
We would like to associate a ranking $r_i$ with player $i$ in such a way that $r_i > r_j$ indicates that player $i$ is ranked more highly than player $j$. For this purpose, let's require that the $r_i$'s be probabilities, that is, $0 \le r_i \le 1$ for all $i$ and $r_1 + r_2 + r_3 + r_4 + r_5 = 1$. We then organize the rankings in a ranking vector
$$\mathbf{r} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix}.$$
Furthermore, let's insist that player $i$'s ranking should be proportional to the sum of the rankings of the players defeated by player $i$. For example, player 1 defeated players 2, 4, and 5, so we want
$$r_1 = \alpha(r_2 + r_4 + r_5),$$
where $\alpha$ is the constant of proportionality. Writing out similar equations for the other players produces the following system:
$$\begin{aligned} r_1 &= \alpha(r_2 + r_4 + r_5) \\ r_2 &= \alpha(r_3 + r_4 + r_5) \\ r_3 &= \alpha(r_1 + r_4) \\ r_4 &= \alpha r_5 \\ r_5 &= \alpha r_3 \end{aligned}$$
Observe that we can write this system in matrix form as
$$\begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix} = \alpha\begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix} \quad\text{or}\quad \mathbf{r} = \alpha A\mathbf{r}.$$
Equivalently, the ranking vector $\mathbf{r}$ must satisfy $A\mathbf{r} = \frac{1}{\alpha}\mathbf{r}$. In other words, $\mathbf{r}$ is an eigenvector of $A$ corresponding to the eigenvalue $1/\alpha$!

Furthermore, $A$ is a primitive nonnegative matrix. So the Perron-Frobenius Theorem guarantees that there is a unique ranking vector $\mathbf{r}$, namely the positive eigenvector whose entries sum to 1. In this example, the ranking vector turns out to be
$$\mathbf{r} = \begin{bmatrix} 0.29 \\ 0.27 \\ 0.22 \\ 0.08 \\ 0.14 \end{bmatrix},$$
so we would rank the players in the order 1, 2, 3, 5, 4.

By modifying the matrix $A$, it is possible to take into account many of the complexities mentioned in the opening paragraph. However, this simple example has served to indicate one useful approach to the problem of ranking teams.
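The ranking vector can be approximated with the power method from Section 4.5. The following sketch uses standard-library Python; `perron_vector` and its iteration count are illustrative assumptions. It repeatedly multiplies by $A$ and rescales so that the entries sum to 1. Because $A$ is primitive, the iteration converges to the Perron eigenvector, and the sum of the entries of $A\mathbf{r}$ then estimates the Perron root $1/\alpha$.

```python
def perron_vector(M, iterations=1000):
    """Power iteration for a nonnegative primitive matrix M.
    Returns (lam, r): r is scaled so its entries sum to 1, and lam = sum(M r)
    estimates the Perron root."""
    n = len(M)
    r = [1.0 / n] * n  # positive starting vector
    for _ in range(iterations):
        y = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        r = [v / total for v in y]
    lam = sum(sum(M[i][j] * r[j] for j in range(n)) for i in range(n))
    return lam, r


# Adjacency matrix of the tournament: A[i][j] = 1 if player i+1 beat player j+1.
A = [
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
lam, r = perron_vector(A)
print([round(v, 2) for v in r])  # [0.29, 0.27, 0.22, 0.08, 0.14]
order = sorted(range(1, 6), key=lambda i: -r[i - 1])
print(order)  # [1, 2, 3, 5, 4]
print(round(lam, 3))  # 1.659
```

Normalizing by the sum rather than by the largest entry keeps each iterate a probability vector, which matches the requirement $r_1 + \cdots + r_5 = 1$ in the text.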
The same idea can be used to understand how an Internet search engine such as Google works. Older search engines used to return the results of a search unordered. Useful sites would often be buried among irrelevant ones, and much scrolling was often needed to uncover what you were looking for. By contrast, Google returns search results ordered according to their likely relevance. Thus, a method for ranking websites is needed.

Instead of teams playing one another, we now have websites linking to one another. We can once again use a digraph to model the situation, only now an edge from $i$ to $j$ indicates that website $i$ links to (or refers to) website $j$. For the sports team digraph, incoming directed edges are bad (they indicate losses). For the Internet digraph, incoming directed edges are good (they indicate links from other sites). In this setting, we want the ranking of website $i$ to be proportional to the sum of the rankings of all the websites that link to $i$. Suppose the tournament digraph above represents just five websites. Then, for example, we have
$$r_4 = \alpha(r_1 + r_2 + r_3).$$
It is easy to see that we now want to use the transpose of the adjacency matrix of the digraph. Therefore, the ranking vector $\mathbf{r}$ must satisfy
$$A^T\mathbf{r} = \frac{1}{\alpha}\mathbf{r},$$
and will thus be the Perron eigenvector of $A^T$. In this example, we obtain
$$A^T = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{r} = \begin{bmatrix} 0.14 \\ 0.08 \\ 0.22 \\ 0.27 \\ 0.29 \end{bmatrix}.$$
So a search that turns up these five sites would list them in the order 5, 4, 3, 1, 2.

Google actually uses a variant of the method described here, and it computes the ranking vector via an iterative method very similar to the power method (Section 4.5).

Exercises 4.6

Markov Chains

Which of the stochastic matrices in Exercises 1-6 are regular?

1.-6. [Matrices illegible in the source.]

In Exercises 7-9, $P$ is the transition matrix of a regular Markov chain. Find the long range transition matrix $L$ of $P$.

7.-9. [Matrices illegible in the source.]

10. Prove that the steady state probability vector of a regular Markov chain is unique. [Hint: Use Theorem 4.33 or Theorem 4.34.]

Population Growth

In Exercises 11-14, calculate the positive eigenvalue and a corresponding positive eigenvector of the Leslie matrix $L$.

11.-14. [Matrices illegible in the source.]

15. If a Leslie matrix has a unique positive eigenvalue $\lambda_1$, what is the significance for the population if $\lambda_1 > 1$? If $\lambda_1 < 1$? If $\lambda_1 = 1$?

16. Verify that the characteristic polynomial of the Leslie matrix $L$ in Equation (3) is
$$c_L(\lambda) = (-1)^n\left(\lambda^n - b_1\lambda^{n-1} - b_2s_1\lambda^{n-2} - b_3s_1s_2\lambda^{n-3} - \cdots - b_ns_1s_2\cdots s_{n-1}\right).$$
[Hint: Use mathematical induction and expand $\det(L - \lambda I)$ along the last column.]

17. If all of the survival rates $s_i$ are nonzero, let
$$P = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & s_1 & 0 & \cdots & 0 \\ 0 & 0 & s_1s_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & s_1s_2\cdots s_{n-1} \end{bmatrix}.$$
Compute $P^{-1}LP$ and use it to find the characteristic polynomial of $L$. [Hint: Refer to Exercise 32 in Section 4.3.]

18. Verify that an eigenvector of $L$ corresponding to $\lambda_1$ is
$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ s_1/\lambda_1 \\ s_1s_2/\lambda_1^2 \\ s_1s_2s_3/\lambda_1^3 \\ \vdots \\ s_1s_2s_3\cdots s_{n-1}/\lambda_1^{n-1} \end{bmatrix}.$$
[Hint: Combine Exercise 17 above with Exercise 32 in Section 4.3 and Exercise 46 in Section 4.4.]

In Exercises 19-21, compute the steady state growth rate of the population with the Leslie matrix $L$ from the given exercise. Then use Exercise 18 to help find the corresponding distribution of the age classes.

19. Exercise 39 in Section 3.7
20. Exercise 40 in Section 3.7
21. Exercise 44 in Section 3.7

22. Many species of seal have suffered from commercial hunting. They have been killed for their skin, blubber, and meat. The fur trade, in particular, reduced some seal populations to the point of extinction. Today, the greatest threats to seal populations are:

- decline of fish stocks due to overfishing,
- pollution,
- disturbance of habitat,
- entanglement in marine debris,
- culling by fishery owners.

Some seals have been declared endangered species; other species are carefully managed. Table 4.7 gives the birth and survival rates for the northern fur seal, divided into 2-year age classes. [The data are based on A. E. York and J. R. Hartley, "Pup Production Following Harvest of Female Northern Fur Seals," Canadian Journal of Fisheries and Aquatic Science, 38 (1981), pp. 84-90.]
(a) Construct the Leslie matrix $L$ for these data and compute the positive eigenvalue and a corresponding positive eigenvector.
(b) In the long run, what percentage of seals will be in each age class, and what will the growth rate be?

Table 4.7
Age (years)    Birth Rate    Survival Rate
0-2            0.00          0.91
2-4            0.02          0.88
4-6            0.70          0.85
6-8            1.53          0.80
8-10           1.67          0.74
10-12          1.65          0.67
12-14          1.56          0.59
14-16          1.45          0.49
16-18          1.22          0.38
18-20          0.91          0.27
20-22          0.70          0.17
22-24          0.22          0.15
24-26          0.00          0.00

23. Let $L$ be a Leslie matrix with a unique positive eigenvalue $\lambda_1$. Show that if $\lambda$ is any other (real or complex) eigenvalue of $L$, then $|\lambda| \le \lambda_1$. [Hint: Write $\lambda = r(\cos\theta + i\sin\theta)$ and substitute it into the equation $g(\lambda) = 1$, as in part (b) of Exercise 26. Use De Moivre's Theorem and then take the real part of both sides. The Triangle Inequality should prove useful.]

24. A sustainable harvesting policy is a procedure that allows a certain fraction of a population (represented by a population distribution vector $\mathbf{x}$) to be harvested so that the population returns to $\mathbf{x}$ after one time interval (where a time interval is the length of one age class). If $h$ is the fraction of each age class that is harvested, then we can express the harvesting procedure mathematically as follows. If we start with a population vector $\mathbf{x}$, after one time interval we have $L\mathbf{x}$. Harvesting removes $hL\mathbf{x}$, leaving
$$L\mathbf{x} - hL\mathbf{x} = (1 - h)L\mathbf{x}.$$
Sustainability requires that $(1 - h)L\mathbf{x} = \mathbf{x}$.
(a) Show that if $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix $L$ and $h$ is the sustainable harvesting ratio, then $h = 1 - 1/\lambda_1$.
(b) Find the sustainable harvesting ratio for the woodland caribou in Exercise 44 in Section 3.7.
(c) Using the data in Exercise 44 in Section 3.7, reduce the caribou herd according to your answer to part (b). Verify that the population returns to its original level after one time interval.

25. Find the sustainable harvesting ratio for the seal in Exercise 22. (Conservationists have had to harvest seal populations when overfishing has reduced the available food supply to the point where the seals are in danger of starvation.)

26. Exercise 23 shows that the long-run behavior of a population can be determined directly from the entries of its Leslie matrix. The net reproduction rate of a population is defined as
$$r = b_1 + b_2s_1 + b_3s_1s_2 + \cdots + b_ns_1s_2\cdots s_{n-1},$$
where the $b_i$ are the birth rates and the $s_j$ are the survival rates for the population.
(a) Explain why $r$ can be interpreted as the average number of daughters born to a single female over her lifetime.
(b) Show that $r = 1$ if and only if $\lambda_1 = 1$. (This represents zero population growth.) [Hint: Let
$$g(\lambda) = \frac{b_1}{\lambda} + \frac{b_2s_1}{\lambda^2} + \frac{b_3s_1s_2}{\lambda^3} + \cdots + \frac{b_ns_1s_2\cdots s_{n-1}}{\lambda^n}.$$
Show that $\lambda$ is an eigenvalue of $L$ if and only if $g(\lambda) = 1$.]
(c) Assuming that there is a unique positive eigenvalue $\lambda_1$, show that $r < 1$ if and only if the population is decreasing, and $r > 1$ if and only if the population is increasing.
The Perron-Frobenius Theorem

In Exercises 28-31, find the Perron root and the corresponding Perron eigenvector of $A$.

28.-31. [Matrices illegible in the source.]

It can be shown that a nonnegative $n \times n$ matrix is irreducible if and only if $(I + A)^{n-1} > O$. In Exercises 32-35, use this criterion to determine whether the matrix $A$ is irreducible. If $A$ is reducible, find a permutation of its rows and columns that puts $A$ into the block form
$$\begin{bmatrix} B & C \\ O & D \end{bmatrix}.$$

32.-35. [Matrices illegible in the source.]

36. (a) If $A$ is the adjacency matrix of a graph $G$, show that $A$ is irreducible if and only if $G$ is connected. (A graph is connected if there is a path between every pair of vertices.)
(b) Which of the graphs in Section 4.0 have an irreducible adjacency matrix? Which have a primitive adjacency matrix?

37. Let $G$ be a bipartite graph with adjacency matrix $A$.
(a) Show that $A$ is not primitive.
(b) Show that if $\lambda$ is an eigenvalue of $A$, so is $-\lambda$. [Hint: Use Exercise 80 in Section 3.7, and partition an eigenvector for $\lambda$ so that it is compatible with this partitioning of $A$. Use this partitioning to find an eigenvector for $-\lambda$.]

38. A graph is called $k$-regular if $k$ edges meet at each vertex. Let $G$ be a $k$-regular graph.
(a) Show that the adjacency matrix $A$ of $G$ has $\lambda = k$ as an eigenvalue. [Hint: Adapt Theorem 4.30.]
(b) Show that if $A$ is primitive, then the other eigenvalues are all less than $k$ in absolute value. [Hint: Adapt Theorem 4.31.]

39. Explain the results of your exploration in Section 4.0 in light of Exercises 36-38 and Section 4.5.

In Exercise 40, the absolute value of a matrix $A = [a_{ij}]$ is defined to be the matrix $|A| = [|a_{ij}|]$.

40. Let $A$ and $B$ be $n \times n$ matrices, $\mathbf{x}$ a vector in $\mathbb{R}^n$, and $c$ a scalar. Prove the following matrix inequalities:
(a) $|cA| = |c|\,|A|$
(b) $|A + B| \le |A| + |B|$
(c) $|A\mathbf{x}| \le |A|\,|\mathbf{x}|$
(d) $|AB| \le |A|\,|B|$

41. Prove that a $2 \times 2$ matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is reducible if and only if $a_{12} = 0$ or $a_{21} = 0$.

42. Let $A$ be a nonnegative, irreducible matrix such that $I - A$ is invertible and $(I - A)^{-1} \ge O$. Let $\lambda_1$ and $\mathbf{v}_1$ be the Perron root and Perron eigenvector of $A$.
(a) Prove that $0 < \lambda_1 < 1$. [Hint: Apply Exercise … in Section 4.3 and Theorem 4.18(b).]
(b) Deduce from (a) that $\mathbf{v}_1 > A\mathbf{v}_1$.

Linear Recurrence Relations

In Exercises 43-46, write out the first six terms of the sequence defined by the recurrence relation with the given initial conditions.

43. $x_0 = 1$, $x_n = 2x_{n-1}$ for $n \ge 1$
44. $a_1 = 128$, $a_n = a_{n-1}/2$ for $n \ge 2$
45. $y_0 = 0$, $y_1 = 1$, $y_n = y_{n-1} - y_{n-2}$ for $n \ge 2$
46. $b_0 = 1$, $b_1 = 1$, $b_n = 2b_{n-1} + b_{n-2}$ for $n \ge 2$

In Exercises 47-52, solve the recurrence relation with the given initial conditions.

47. $x_0 = 0$, $x_1 = 5$, $x_n = 3x_{n-1} + 4x_{n-2}$ for $n \ge 2$
48. $x_0 = 0$, $x_1 = 1$, $x_n = 4x_{n-1} - 3x_{n-2}$ for $n \ge 2$
49.-51. [Illegible in the source.]
52. The recurrence relation in Exercise 45. Show that your solution agrees with the answer to Exercise 45.

53. Complete the proof of Theorem 4.38(a) by showing that if the recurrence relation $x_n = ax_{n-1} + bx_{n-2}$ has distinct eigenvalues $\lambda_1 \neq \lambda_2$, then the solution will be of the form
$$x_n = c_1\lambda_1^n + c_2\lambda_2^n.$$
[Hint: Show that the method of Example 4.40 works in general.]

54. (a) Show that for any choice of initial conditions $x_0 = r$ and $x_1 = s$, the scalars $c_1$ and $c_2$ can be found, as stated in Theorem 4.38(a) and (b).
(b) If the initial conditions are $x_0 = 0$ and $x_1 = 1$ and the eigenvalues $\lambda_1$ and $\lambda_2$ are distinct, show that
$$x_n = \frac{1}{\lambda_1 - \lambda_2}\left(\lambda_1^n - \lambda_2^n\right).$$

55. The Fibonacci recurrence $f_n = f_{n-1} + f_{n-2}$ has the associated matrix equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$, where
$$\mathbf{x}_n = \begin{bmatrix} f_n \\ f_{n-1} \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.$$
(a) With $f_0 = 0$ and $f_1 = 1$, use mathematical induction to prove that
$$A^n = \begin{bmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{bmatrix}$$
for all $n \ge 1$.
(b) Using part (a), prove that $f_{n+1}f_{n-1} - f_n^2 = (-1)^n$ for all $n \ge 1$. [This is called Cassini's Identity, after the astronomer Giovanni Domenico Cassini (1625-1712). Cassini was born in Italy but, on the invitation of Louis XIV, moved in 1669 to France, where he became director of the Paris Observatory. He became a French citizen and adopted the French version of his name: Jean-Dominique Cassini. Mathematics was one of Cassini's many interests other than astronomy. Cassini's Identity was published in 1680 in a paper submitted to the Royal Academy of Sciences in Paris.]
(c) An $8 \times 8$ checkerboard can be dissected as shown in Figure 4.29(a) and the pieces reassembled to form the rectangle in Figure 4.29(b). The area of the square is 64 square units, but the rectangle's area is 65 square units! Where did the extra square come from? [Hint: What does this have to do with the Fibonacci sequence?]

[Figure 4.29: (a) the dissected checkerboard; (b) the reassembled rectangle]

56. You have a supply of three kinds of tiles: two kinds of $1 \times 2$ tiles and one kind of $1 \times 1$ tile, as shown in Figure 4.30. Let $t_n$ be the number of different ways to cover a $1 \times n$ rectangle with these tiles. For example, Figure 4.31 shows that $t_3 = 5$.
(a) Find $t_1, \ldots, t_5$. (Does $t_0$ make any sense? If so, what is it?)
(b) Set up a second order recurrence relation for $t_n$.
(c) Using $t_1$ and $t_2$ as the initial conditions, solve the recurrence relation in part (b). Check your answer against the data in part (a).

[Figure 4.30]
[Figure 4.31: The five ways to tile a $1 \times 3$ rectangle]

57. You have a supply of $1 \times 2$ dominoes with which to cover a $2 \times n$ rectangle. Let $d_n$ be the number of different ways to cover the rectangle. For example, Figure 4.32 shows that $d_3 = 3$.
(a) Find $d_1, \ldots, d_5$. (Does $d_0$ make any sense? If so, what is it?)
(b) Set up a second order recurrence relation for $d_n$.
(c) Using $d_1$ and $d_2$ as the initial conditions, solve the recurrence relation in part (b). Check your answer against the data in part (a).

[Figure 4.32: The three ways to cover a $2 \times 3$ rectangle with $1 \times 2$ dominoes]

58. In Example 4.41, find eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ corresponding to $\lambda_1 = \frac{1 + \sqrt{5}}{2}$ and $\lambda_2 = \frac{1 - \sqrt{5}}{2}$. With $\mathbf{x}_k = \begin{bmatrix} f_k \\ f_{k-1} \end{bmatrix}$, verify formula (2) in Section 4.5. That is, show that, for some scalar $c_1$,
$$\lim_{k \to \infty} \frac{\mathbf{x}_k}{\lambda_1^k} = c_1\mathbf{v}_1.$$

Systems of Linear Differential Equations

In Exercises 59-64, find the general solution to the given system of differential equations. Then find the specific solution that satisfies the initial conditions. (Consider all functions to be functions of $t$.)

59.-64. [Systems illegible in the source.]

65. A scientist places two strains of bacteria, X and Y, in a petri dish. Initially, there are 400 of X and 500 of Y. The two bacteria compete for food and space but do not feed on each other. If $x = x(t)$ and $y = y(t)$ are the numbers of the strains at time $t$ days, the growth rates of the two populations are given by the system
$$\begin{aligned} x' &= 1.2x - 0.2y \\ y' &= -0.2x + 1.5y \end{aligned}$$
(a) Determine what happens to these two populations by solving the system of differential equations.
(b) Explore the effect of changing the initial populations by letting $x(0) = a$ and $y(0) = b$. Describe what happens to the populations in terms of $a$ and $b$.

66. Two species, X and Y, live in a symbiotic relationship. That is, neither species can survive on its own, and each depends on the other for its survival. Initially, there are 15 of X and 10 of Y. If $x = x(t)$ and $y = y(t)$ are the sizes of the populations at time $t$ months, the growth rates of the two populations are given by the system
$$\begin{aligned} x' &= \ldots + 0.4y \\ y' &= 0.4x - 0.2y \end{aligned}$$
Determine what happens to these two populations.

In Exercises 67 and 68, species X preys on species Y. The sizes of the populations are represented by $x = x(t)$ and $y = y(t)$. The growth rate of each population is governed by the system of differential equations $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$, where $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ and $\mathbf{b}$ is a constant vector. Determine what happens to the two populations for the given $A$ and $\mathbf{b}$ and initial conditions $\mathbf{x}(0)$. (First show that there are constants $a$ and $b$ such that the substitutions $x = u + a$ and $y = v + b$ convert the system into an equivalent one with no constant terms.)

67.-68. [Data illegible in the source.]

69. Let $x = x(t)$ be a twice-differentiable function, and consider the second order differential equation
$$x'' + ax' + bx = 0. \tag{11}$$
(a) Show that the change of variables $y = x'$ and $z = x$ allows Equation (11) to be written as a system of two linear differential equations in $y$ and $z$.
(b) Show that the characteristic equation of the system in part (a) is $\lambda^2 + a\lambda + b = 0$.

70. Show that there is a change of variables that converts the $n$th order differential equation
$$x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1x' + a_0x = 0$$
into a system of $n$ linear differential equations whose coefficient matrix is the companion matrix $C(p)$ of the polynomial $p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$. [The notation $x^{(k)}$ denotes the $k$th derivative of $x$. See Exercises 26-32 in Section 4.3 for the definition of a companion matrix.]

In Exercises 71 and 72, use Exercise 69 to find the general solution of the given equation.

71. $x'' - 5x' + 6x = 0$
72. $x'' + 4x' + 3x = 0$

In Exercises 73-76, solve the system of differential equations in the given exercise using Theorem 4.41.

73. Exercise 59
74. Exercise 60
75. Exercise 63
76. Exercise 64

Discrete Linear Dynamical Systems

In Exercises 77-84, consider the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$.
(a) Compute and plot $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ for $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
(b) Compute and plot $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ for $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
(c) Using eigenvalues and eigenvectors, classify the origin as an attractor, repeller, saddle point, or none of these.
(d) Sketch several typical trajectories of the system.

77.-84. [Matrices illegible in the source.]

In Exercises 85-88, the given matrix is of the form $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$. In each case, $A$ can be factored as the product of a scaling matrix and a rotation matrix. Find the scaling factor $r$ and the angle $\theta$ of rotation. Sketch the first four points of the trajectory for the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ with the given $\mathbf{x}_0$, and classify the origin as a spiral attractor, spiral repeller, or orbital center.

85.-88. [Matrices illegible in the source.]

In Exercises 89-92, find an invertible matrix $P$ and a matrix $C$ of the form $C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ such that $A = PCP^{-1}$. Sketch the first six points of the trajectory for the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ with the given $\mathbf{x}_0$, and classify the origin as a spiral attractor, spiral repeller, or orbital center.

89.-92. [Matrices illegible in the source.]

Chapter Review

Key Definitions and Concepts
adjoint of a matrix
algebraic multiplicity of an eigenvalue
characteristic equation
characteristic polynomial
cofactor expansion
Cramer's Rule
determinant
diagonalizable matrix
eigenvalue
eigenvector
eigenspace
Fundamental Theorem of Invertible Matrices
geometric multiplicity of an eigenvalue
Gerschgorin disk
Gerschgorin's Disk Theorem
Laplace Expansion Theorem
power method (and its variants)
properties of determinants
similar matrices

Review Questions

1. Mark each of the following statements true or false:
(a) For all square matrices $A$, $\det(-A) = -\det A$.
(b) If $A$ and $B$ are $n \times n$ matrices, then $\det(AB) = \det(BA)$.
(c) If $A$ and $B$ are $n \times n$ matrices whose columns are the same but in different orders, then $\det B = -\det A$.
(d) If $A$ is invertible, then $\det(A^{-1}) = \det A^T$.
(e) If 0 is the only eigenvalue of a square matrix $A$, then $A$ is the zero matrix.
(f) Two eigenvectors corresponding to the same eigenvalue must be linearly dependent.
(g) If an $n \times n$ matrix has $n$ distinct eigenvalues, then it must be diagonalizable.
(h) If an $n \times n$ matrix is diagonalizable, then it must have $n$ distinct eigenvalues.
(i) Similar matrices have the same eigenvectors.
(j) If $A$ and $B$ are two $n \times n$ matrices with the same reduced row echelon form, then $A$ is similar to $B$.

2. Let $A$ be the given matrix [entries illegible in the source].
(a) Compute $\det A$ by cofactor expansion along any row or column.
(b) Compute $\det A$ by first reducing $A$ to triangular form.

3. If $\det\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = 3$, find $\det\begin{bmatrix} 3d & 2e - 4f & f \\ 3a & 2b - 4c & c \\ 3g & 2h - 4i & i \end{bmatrix}$.

4. Let $A$ and $B$ be $4 \times 4$ matrices with $\det A = 2$ and $\det B = \ldots$. Find $\det C$ for the indicated matrix $C$:
(a) $C = (AB)^{-1}$
(b) $C = A^2B(3A^T)$

5. If $A$ is a skew-symmetric $n \times n$ matrix and $n$ is odd, prove that $\det A = 0$.

6. Find all values of $k$ for which the given determinant is 0 [matrix illegible in the source].

In Questions 7 and 8, show that $\mathbf{x}$ is an eigenvector of $A$ and find the corresponding eigenvalue.

7.-8. [Matrices illegible in the source.]

9. Let $A$ be the given $3 \times 3$ matrix [entries illegible in the source].
(a) Find the characteristic polynomial of $A$.
(b) Find all of the eigenvalues of $A$.
(c) Find a basis for each of the eigenspaces of $A$.
(d) Determine whether $A$ is diagonalizable. If $A$ is not diagonalizable, explain why not. If $A$ is diagonalizable, find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.

10. If $A$ is a $3 \times 3$ diagonalizable matrix with eigenvalues 2, 3, and 4, find $\det A$.

11. $A$ is a $2 \times 2$ matrix with eigenvalues $\lambda_1 = \frac{1}{2}$ and $\lambda_2 = -1$ and corresponding eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$. [The eigenvectors and the quantity to be found are illegible in the source.]

12. If $A$ is a diagonalizable $n \times n$ matrix and all of its eigenvalues satisfy $|\lambda| < 1$, prove that $A^n$ approaches the zero matrix as $n$ gets large.

In Questions 13-15, determine, with reasons, whether $A$ is similar to $B$. If $A \sim B$, give an invertible matrix $P$ such that $P^{-1}AP = B$.

13.-15. [Matrices illegible in the source.]

16. Let $A$ be the given matrix [entries illegible in the source]. Find all values of $k$ for which:
(a) $A$ has eigenvalues 3 and $-1$.
(b) $A$ has an eigenvalue with algebraic multiplicity 2.
(c) $A$ has no real eigenvalues.

17. If $A^{\ldots} = A$ [exponent illegible in the source], what are the possible eigenvalues of $A$?

18. If a square matrix $A$ has two equal rows, why must $A$ have 0 as one of its eigenvalues?

19. If $\mathbf{x}$ is an eigenvector of $A$ with eigenvalue $\lambda = 3$, show that $\mathbf{x}$ is an eigenvector of $A^2 - 5A + 2I$. What is the corresponding eigenvalue?

20. If $A$ is similar to $B$ with $P^{-1}AP = B$, and $\mathbf{x}$ is an eigenvector of $A$, show that $P^{-1}\mathbf{x}$ is an eigenvector of $B$.

Orthogonality
If = 3, find 14.A = [ � � ] ,B = [ � � ] Let<let AB and= -B�.beFind <let matC forricteshewiintdihc<lateet dAmat= 2riandx C: B � �] i : J . [ [ 1 � 2 � T C= ( A B) ( b )C=A B ( 3 A ) prIfAoveis tahsatkew-<letsAymmet = ric matrix and2 is odd, Let AA has= [ei� genval�] . Fiunesd3alandl values of for which: Find all values of for which 2 = (b) AA hashas noan eiregalenvaleigenvalue wiutesh .algebraic multiplicity 2. x A IIffAa sq=uarA,e whatmatriarex Athashe postwosibequalle eigrenvalows, uwhyes ofmusA? t A have as one of i t s ei g enval u es ? ,A x = [ � ] = [! �] Ihf atxixis ans aleisgoenvect oenvect r ofAowir ofthAeigenval ue 2I.= What 3, showis t an ei g SA ] t h e cor r e s p ondi n g ei g enval u e? 1 H J [ �: - 32 vectIf A oisr sofimA,ilasrhowto BthwiatthP -P1-x iAs Pan=eiBand genvectxiosr ofanB.eigen­ Review Questions 1. (a) 9. nXn (c) 0 ( a) nXn d) (c) (d) i) 0 (e) (f) (g) nXn (h) nXn n (i) n 5. 6. 11. X 4, 12. 1 A2 - 1, I A I < 1, n 5 7 9 11 In Questions 13-15, determine, with reasons, whether is similar to If � give an invertible matrix such that (h) 4. X D. nXn (a) 3. 10. D Vz (j) 2. 0 13. A a b c d e f g h 4X4 ( a) 0. k 3d 2e - 4f f 3a 2b - 4c c . 3g 2h - 4i 15. A � nXn -1 1 k 1 4 k2 n ( a) 0. 17. 18. 7. 19. s. F A� k 16. In Questions 7 and 8, show that is an eigenvector of and find the corresponding eigenvalue. - 60 - 45 15 18 - 40 � 20. ( c) - 1. 3 0 2 + A O rthogona l ity . . . 
. . . that sprightly Scot of Scots, Douglas, that runs a-horseback up a hill perpendicular
William Shakespeare, Henry IV, Part I, Act II, Scene IV

5.0 Introduction: Shadows on a Wall

In this chapter, we will extend the notion of orthogonal projection that we encountered first in Chapter 1 and then again in Chapter 3. Until now, we have discussed only projection onto a single vector (or, equivalently, the one-dimensional subspace spanned by that vector). In this section, we will see if we can find the analogous formulas for projection onto a plane in $\mathbb{R}^3$. Figure 5.1 shows what happens, for example, when parallel light rays create a shadow on a wall. A similar process occurs when a three-dimensional object is displayed on a two-dimensional screen, such as a computer monitor. Later in this chapter, we will consider these ideas in full generality.

Figure 5.1 Shadows on a wall are projections

To begin, let's take another look at what we already know about projections. In Section 3.6, we showed that, in $\mathbb{R}^2$, the standard matrix of a projection onto the line through the origin with direction vector $\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$ is
$$P = \frac{1}{d_1^2 + d_2^2}\begin{bmatrix} d_1^2 & d_1d_2 \\ d_1d_2 & d_2^2 \end{bmatrix} = \begin{bmatrix} d_1^2/(d_1^2 + d_2^2) & d_1d_2/(d_1^2 + d_2^2) \\ d_1d_2/(d_1^2 + d_2^2) & d_2^2/(d_1^2 + d_2^2) \end{bmatrix}$$
Hence, the projection of the vector $\mathbf{v}$ onto this line is just $P\mathbf{v}$.

Problem 1 Show that $P$ can be written in the equivalent form
$$P = \begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix}$$
(What does $\theta$ represent here?)

Problem 2 Show that $P$ can also be written in the form $P = \mathbf{u}\mathbf{u}^T$, where $\mathbf{u}$ is a unit vector in the direction of $\mathbf{d}$.

Problem 3 Using Problem 2, find $P$ and then find the projection of $\mathbf{v} = [\,\dots\,]$ onto the lines with the following unit direction vectors:
(a) $\mathbf{u} = [\,\dots\,]$ (b) $\mathbf{u} = [\,\dots\,]$ (c) $\mathbf{u} = [\,\dots\,]$

Problem 4 Using the form $P = \mathbf{u}\mathbf{u}^T$, show that (a) $P^T = P$ (i.e., $P$ is symmetric) and (b) $P^2 = P$ (i.e., $P$ is idempotent).

Problem 5 Explain why, if $P$ is a $2 \times 2$ projection matrix, the line onto which it projects vectors is the column space of $P$.

Now we will move into $\mathbb{R}^3$ and consider projections onto planes through the origin. We will explore several approaches. Figure 5.2 shows one way to proceed. If $\mathcal{P}$ is a plane through the origin in $\mathbb{R}^3$ with normal vector $\mathbf{n}$ and if $\mathbf{v}$ is a vector in $\mathbb{R}^3$, then $\mathbf{p} = \operatorname{proj}_{\mathcal{P}}(\mathbf{v})$ is a vector in $\mathcal{P}$ such that $\mathbf{v} - c\mathbf{n} = \mathbf{p}$ for some scalar $c$.

Figure 5.2 Projection onto a plane

Problem 6 Using the fact that $\mathbf{n}$ is orthogonal to every vector in $\mathcal{P}$, solve $\mathbf{v} - c\mathbf{n} = \mathbf{p}$ for $c$ to find an expression for $\mathbf{p}$ in terms of $\mathbf{v}$ and $\mathbf{n}$.

Problem 7 Use the method of Problem 6 to find the projection of $\mathbf{v} = [\,\dots\,]$ onto the planes with the following equations:
(a) $x + y + z = 0$ (b) $x - 2z = 0$ (c) $2x - 3y + z = 0$

Another approach to the problem of finding the projection of a vector onto a plane is suggested by Figure 5.3. We can decompose the projection of $\mathbf{v}$ onto $\mathcal{P}$ into the sum of its projections onto the direction vectors for $\mathcal{P}$. This works only if the direction vectors are orthogonal unit vectors. Accordingly, let $\mathbf{u}_1$ and $\mathbf{u}_2$ be direction vectors for $\mathcal{P}$ with the property that
$$\|\mathbf{u}_1\| = \|\mathbf{u}_2\| = 1 \quad\text{and}\quad \mathbf{u}_1 \cdot \mathbf{u}_2 = 0$$

Figure 5.3 $\mathbf{p} = \mathbf{p}_1 + \mathbf{p}_2$

By Problem 2, the projections of $\mathbf{v}$ onto $\mathbf{u}_1$ and $\mathbf{u}_2$ are
$$\mathbf{p}_1 = \mathbf{u}_1\mathbf{u}_1^T\mathbf{v} \quad\text{and}\quad \mathbf{p}_2 = \mathbf{u}_2\mathbf{u}_2^T\mathbf{v}$$
respectively. To show that $\mathbf{p}_1 + \mathbf{p}_2$ gives the projection of $\mathbf{v}$ onto $\mathcal{P}$, we need to show that $\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)$ is orthogonal to $\mathcal{P}$. It is enough to show that $\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)$ is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$. (Why?)
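The decomposition just described can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the plane $x + y + z = 0$, the orthonormal pair $\mathbf{u}_1 = (1, -1, 0)/\sqrt{2}$ and $\mathbf{u}_2 = (1, 1, -2)/\sqrt{6}$, and the vector $\mathbf{v} = (1, 2, 3)$ are choices made here for the example, not data from the text. It uses the identity $\mathbf{u}\mathbf{u}^T\mathbf{v} = (\mathbf{u} \cdot \mathbf{v})\mathbf{u}$ to form $\mathbf{p}_1$ and $\mathbf{p}_2$, then checks that $\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)$ is orthogonal to both direction vectors.

```python
import math

def dot(a, b):
    """Dot product of two vectors stored as lists."""
    return sum(x * y for x, y in zip(a, b))

def scale(c, a):
    return [c * x for x in a]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

# Illustrative orthonormal direction vectors for the plane x + y + z = 0
u1 = scale(1 / math.sqrt(2), [1, -1, 0])
u2 = scale(1 / math.sqrt(6), [1, 1, -2])
v = [1, 2, 3]

# u u^T v equals (u . v) u, so each p_i is the projection onto one unit vector
p1 = scale(dot(u1, v), u1)
p2 = scale(dot(u2, v), u2)
p = add(p1, p2)

# The residual v - p should be orthogonal to both u1 and u2
residual = sub(v, p)
print("p =", p)
print("u1 . residual =", dot(u1, residual))
print("u2 . residual =", dot(u2, residual))
```

Up to floating-point rounding, $\mathbf{p}$ comes out as $(-1, 0, 1)$, which agrees with subtracting the normal component $\frac{\mathbf{v} \cdot \mathbf{n}}{\mathbf{n} \cdot \mathbf{n}}\mathbf{n}$ for $\mathbf{n} = (1, 1, 1)$, the approach of Problem 6.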
Problem 8 Show that $\mathbf{u}_1 \cdot (\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)) = 0$ and $\mathbf{u}_2 \cdot (\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)) = 0$. [Hint: Use the alternative form of the dot product, $\mathbf{x}^T\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$, together with the fact that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal unit vectors.]

It follows from Problem 8 and the comments preceding it that the matrix of the orthogonal projection onto the subspace $\mathcal{P}$ of $\mathbb{R}^3$ spanned by orthogonal unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ is
$$P = \mathbf{u}_1\mathbf{u}_1^T + \mathbf{u}_2\mathbf{u}_2^T \qquad (1)$$

Problem 9 Repeat Problem 7, using the formula for $P$ given by Equation (1). Use the same $\mathbf{v}$ and use $\mathbf{u}_1$ and $\mathbf{u}_2$, as indicated below. (First, verify that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal unit vectors in the given plane.)
(a) $x + y + z = 0$ with $\mathbf{u}_1 = [\,\dots\,]$ and $\mathbf{u}_2 = [\,\dots\,]$
(b) $x - 2z = 0$ with $\mathbf{u}_1 = \begin{bmatrix} 2/\sqrt{5} \\ 0 \\ 1/\sqrt{5} \end{bmatrix}$ and $\mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$
(c) $2x - 3y + z = 0$ with $\mathbf{u}_1 = [\,\dots\,]$ and $\mathbf{u}_2 = [\,\dots\,]$

Problem 10 Show that the matrix $P$ of a projection onto a plane in $\mathbb{R}^3$ given by Equation (1) satisfies properties (a) and (b) of Problem 4.

Problem 11 Show that the matrix of a projection onto a plane in $\mathbb{R}^3$ can be expressed as $P = AA^T$ for some $3 \times 2$ matrix $A$. [Hint: Show that Equation (1) is an outer product expansion.]

Problem 12 Show that if $P$ is the matrix of a projection onto a plane in $\mathbb{R}^3$, then $\operatorname{rank}(P) = 2$.

In this chapter, we will look at the concepts of orthogonality and orthogonal projection in greater detail. We will see that the ideas introduced in this section can be generalized and that they have many important applications.

5.1 Orthogonality in $\mathbb{R}^n$

In this section, we will generalize the notion of orthogonality of vectors in $\mathbb{R}^n$ from two vectors to sets of vectors. In doing so, we will see that two properties make the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ of $\mathbb{R}^n$ easy to work with: First, any two distinct vectors in the set are orthogonal. Second, each vector in the set is a unit vector. These two properties lead us to the notion of orthogonal bases and orthonormal bases, concepts that we will be able to fruitfully apply to a variety of applications.
Orthogonal and Orthonormal Sets of Vectors

Definition A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in $\mathbb{R}^n$ is called an orthogonal set if all pairs of distinct vectors in the set are orthogonal, that is, if
$$\mathbf{v}_i \cdot \mathbf{v}_j = 0 \quad \text{whenever } i \neq j \text{ for } i, j = 1, 2, \ldots, k$$

The standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ of $\mathbb{R}^n$ is an orthogonal set, as is any subset of it. As the first example illustrates, there are many other possibilities.

Example 5.1 Show that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set in $\mathbb{R}^3$ if
$$\mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$$

Solution We must show that every pair of vectors from this set is orthogonal. This is true, since
$$\begin{aligned} \mathbf{v}_1 \cdot \mathbf{v}_2 &= 2(0) + 1(1) + (-1)(1) = 0 \\ \mathbf{v}_2 \cdot \mathbf{v}_3 &= 0(1) + 1(-1) + (1)(1) = 0 \\ \mathbf{v}_1 \cdot \mathbf{v}_3 &= 2(1) + 1(-1) + (-1)(1) = 0 \end{aligned}$$

Geometrically, the vectors in Example 5.1 are mutually perpendicular, as Figure 5.4 shows.

Figure 5.4 An orthogonal set of vectors

One of the main advantages of working with orthogonal sets of vectors is that they are necessarily linearly independent, as Theorem 5.1 shows.

Theorem 5.1 If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is an orthogonal set of nonzero vectors in $\mathbb{R}^n$, then these vectors are linearly independent.

Proof If $c_1, \ldots, c_k$ are scalars such that $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$, then
$$(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) \cdot \mathbf{v}_i = \mathbf{0} \cdot \mathbf{v}_i = 0$$
or, equivalently,
$$c_1(\mathbf{v}_1 \cdot \mathbf{v}_i) + \cdots + c_i(\mathbf{v}_i \cdot \mathbf{v}_i) + \cdots + c_k(\mathbf{v}_k \cdot \mathbf{v}_i) = 0 \qquad (1)$$
Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is an orthogonal set, all of the dot products in Equation (1) are zero, except $\mathbf{v}_i \cdot \mathbf{v}_i$. Thus, Equation (1) reduces to
$$c_i(\mathbf{v}_i \cdot \mathbf{v}_i) = 0$$
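Now, $\mathbf{v}_i \cdot \mathbf{v}_i \neq 0$ because $\mathbf{v}_i \neq \mathbf{0}$ by hypothesis. So we must have $c_i = 0$. The fact that this is true for all $i = 1, \ldots, k$ implies that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set.

Remark Thanks to Theorem 5.1, we know that if a set of vectors is orthogonal, it is automatically linearly independent. For example, we can immediately deduce that the three vectors in Example 5.1 are linearly independent. Contrast this approach with the work needed to establish their linear independence directly!

Definition An orthogonal basis for a subspace $W$ of $\mathbb{R}^n$ is a basis of $W$ that is an orthogonal set.

Example 5.2 The vectors
$$\mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$$
from Example 5.1 are orthogonal and, hence, linearly independent. Since any three linearly independent vectors in $\mathbb{R}^3$ form a basis for $\mathbb{R}^3$, by the Fundamental Theorem of Invertible Matrices, it follows that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal basis for $\mathbb{R}^3$.

Remark In Example 5.2, suppose only the orthogonal vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ were given and you were asked to find a third vector $\mathbf{v}_3$ to make $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ an orthogonal basis for $\mathbb{R}^3$. One way to do this is to remember that in $\mathbb{R}^3$, the cross product of two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ is orthogonal to each of them. (See Exploration: The Cross Product in Chapter 1.) Hence we may take
$$\mathbf{v}_3 = \mathbf{v}_1 \times \mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} \times \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \\ 2 \end{bmatrix}$$
Note that the resulting vector is a multiple of the vector $\mathbf{v}_3$ in Example 5.1, as it must be.

Example 5.3 Find an orthogonal basis for the subspace $W$ of $\mathbb{R}^3$ given by
$$W = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} : x - y + 2z = 0 \right\}$$

Solution Section 5.3 gives a general procedure for problems of this sort. For now, we will find the orthogonal basis by brute force. The subspace $W$ is a plane through the origin in $\mathbb{R}^3$. From the equation of the plane, we have $x = y - 2z$, so $W$ consists of vectors of the form
$$\begin{bmatrix} y - 2z \\ y \\ z \end{bmatrix} = y\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}$$
It follows that $\mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}$ are a basis for $W$, but they are not orthogonal. It suffices to find another nonzero vector in $W$ that is orthogonal to either one of these.

Suppose $\mathbf{w} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ is a vector in $W$ that is orthogonal to $\mathbf{u}$. Then $x - y + 2z = 0$, since $\mathbf{w}$ is in the plane $W$. Since $\mathbf{u} \cdot \mathbf{w} = 0$, we also have $x + y = 0$. Solving the linear system
$$\begin{aligned} x - y + 2z &= 0 \\ x + y &= 0 \end{aligned}$$
we find that $x = -z$ and $y = z$. (Check this.) Thus, any nonzero vector $\mathbf{w}$ of the form
$$\mathbf{w} = \begin{bmatrix} -z \\ z \\ z \end{bmatrix}$$
will do.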
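To be specific, we could take $\mathbf{w} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$. It is easy to check that $\{\mathbf{u}, \mathbf{w}\}$ is an orthogonal set in $W$ and, hence, an orthogonal basis for $W$, since $\dim W = 2$.

Another advantage of working with an orthogonal basis is that the coordinates of a vector with respect to such a basis are easy to compute. Indeed, there is a formula for these coordinates, as the following theorem establishes.

Theorem 5.2 Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be an orthogonal basis for a subspace $W$ of $\mathbb{R}^n$ and let $\mathbf{w}$ be any vector in $W$. Then the unique scalars $c_1, \ldots, c_k$ such that
$$\mathbf{w} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$$
are given by
$$c_i = \frac{\mathbf{w} \cdot \mathbf{v}_i}{\mathbf{v}_i \cdot \mathbf{v}_i} \quad \text{for } i = 1, \ldots, k$$

Proof Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a basis for $W$, we know that there are unique scalars $c_1, \ldots, c_k$ such that $\mathbf{w} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$ (from Theorem 3.29). To establish the formula for $c_i$, we take the dot product of this linear combination with $\mathbf{v}_i$ to obtain
$$\begin{aligned} \mathbf{w} \cdot \mathbf{v}_i &= (c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) \cdot \mathbf{v}_i \\ &= c_1(\mathbf{v}_1 \cdot \mathbf{v}_i) + \cdots + c_i(\mathbf{v}_i \cdot \mathbf{v}_i) + \cdots + c_k(\mathbf{v}_k \cdot \mathbf{v}_i) \\ &= c_i(\mathbf{v}_i \cdot \mathbf{v}_i) \end{aligned}$$
since $\mathbf{v}_j \cdot \mathbf{v}_i = 0$ for $j \neq i$. Since $\mathbf{v}_i \neq \mathbf{0}$, $\mathbf{v}_i \cdot \mathbf{v}_i \neq 0$. Dividing by $\mathbf{v}_i \cdot \mathbf{v}_i$, we obtain the desired result.

Example 5.4 Find the coordinates of $\mathbf{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ with respect to the orthogonal basis $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ of Examples 5.1 and 5.2.

Solution Using Theorem 5.2, we compute
$$\begin{aligned} c_1 &= \frac{\mathbf{w} \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} = \frac{2 + 2 - 3}{4 + 1 + 1} = \frac{1}{6} \\ c_2 &= \frac{\mathbf{w} \cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2} = \frac{0 + 2 + 3}{0 + 1 + 1} = \frac{5}{2} \\ c_3 &= \frac{\mathbf{w} \cdot \mathbf{v}_3}{\mathbf{v}_3 \cdot \mathbf{v}_3} = \frac{1 - 2 + 3}{1 + 1 + 1} = \frac{2}{3} \end{aligned}$$
Thus,
$$\mathbf{w} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \tfrac{1}{6}\mathbf{v}_1 + \tfrac{5}{2}\mathbf{v}_2 + \tfrac{2}{3}\mathbf{v}_3$$
(Check this.) With the notation introduced in Section 3.5, we can also write the above equation as
$$[\mathbf{w}]_{\mathcal{B}} = \begin{bmatrix} 1/6 \\ 5/2 \\ 2/3 \end{bmatrix}$$

Compare the procedure in Example 5.4 with the work required to find these coordinates directly and you should start to appreciate the value of orthogonal bases.

As noted at the beginning of this section, the other property of the standard basis is that each standard basis vector is a unit vector. Combining this property with orthogonality, we have the following definition.

Definition A set of vectors in $\mathbb{R}^n$ is an orthonormal set if it is an orthogonal set of unit vectors. An orthonormal basis for a subspace $W$ of $\mathbb{R}^n$ is a basis of $W$ that is an orthonormal set.

Remark If $S = \{\mathbf{q}_1, \ldots, \mathbf{q}_k\}$ is an orthonormal set of vectors, then $\mathbf{q}_i \cdot \mathbf{q}_j = 0$ for $i \neq j$ and $\|\mathbf{q}_i\| = 1$. The fact that each $\mathbf{q}_i$ is a unit vector is equivalent to $\mathbf{q}_i \cdot \mathbf{q}_i = 1$. It follows that we can summarize the statement that $S$ is orthonormal as
$$\mathbf{q}_i \cdot \mathbf{q}_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$

Example 5.5 Show that $S = \{\mathbf{q}_1, \mathbf{q}_2\}$ is an orthonormal set in $\mathbb{R}^3$ if
$$\mathbf{q}_1 = \begin{bmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \quad\text{and}\quad \mathbf{q}_2 = \begin{bmatrix} 1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}$$

Solution We check that
$$\begin{aligned} \mathbf{q}_1 \cdot \mathbf{q}_2 &= 1/\sqrt{18} - 2/\sqrt{18} + 1/\sqrt{18} = 0 \\ \mathbf{q}_1 \cdot \mathbf{q}_1 &= 1/3 + 1/3 + 1/3 = 1 \\ \mathbf{q}_2 \cdot \mathbf{q}_2 &= 1/6 + 4/6 + 1/6 = 1 \end{aligned}$$

If we have an orthogonal set, we can easily obtain an orthonormal set from it: We simply normalize each vector.

Example 5.6 Construct an orthonormal basis for $\mathbb{R}^3$ from the vectors in Example 5.1.

Solution Since we already know that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are an orthogonal basis, we normalize them to get
$$\mathbf{q}_1 = \frac{1}{\|\mathbf{v}_1\|}\mathbf{v}_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2/\sqrt{6} \\ 1/\sqrt{6} \\ -1/\sqrt{6} \end{bmatrix}$$
$$\mathbf{q}_2 = \frac{1}{\|\mathbf{v}_2\|}\mathbf{v}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$
$$\mathbf{q}_3 = \frac{1}{\|\mathbf{v}_3\|}\mathbf{v}_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$$
Then $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$ is an orthonormal basis for $\mathbb{R}^3$.

Since any orthonormal set of vectors is, in particular, orthogonal, it is linearly independent, by Theorem 5.1. If we have an orthonormal basis, Theorem 5.2 becomes even simpler.

Theorem 5.3 Let $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k\}$ be an orthonormal basis for a subspace $W$ of $\mathbb{R}^n$ and let $\mathbf{w}$ be any vector in $W$. Then
$$\mathbf{w} = (\mathbf{w} \cdot \mathbf{q}_1)\mathbf{q}_1 + (\mathbf{w} \cdot \mathbf{q}_2)\mathbf{q}_2 + \cdots + (\mathbf{w} \cdot \mathbf{q}_k)\mathbf{q}_k$$
and this representation is unique.

Proof Apply Theorem 5.2 and use the fact that $\mathbf{q}_i \cdot \mathbf{q}_i = 1$ for $i = 1, \ldots, k$.

Orthogonal Matrices

Matrices whose columns form an orthonormal set arise frequently in applications, as you will see in Section 5.5. Such matrices have several attractive properties, which we now examine.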
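Theorem 5.3 fits naturally with such matrices. If the orthonormal vectors $\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3$ of Example 5.6 are stacked as the columns of a matrix $Q$, then the $i$th entry of $Q^T\mathbf{w}$ is $\mathbf{q}_i \cdot \mathbf{w}$, so all the coefficients in Theorem 5.3 come from one matrix-vector product. The minimal Python sketch below assumes $\mathbf{w} = [1, 2, 3]^T$, the vector of Example 5.4. The helper names are ad hoc, and floating-point arithmetic stands in for exact square roots.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, x):
    return [dot(row, x) for row in M]

# Orthonormal basis from Example 5.6 (normalized v1, v2, v3 of Example 5.1)
q1 = [2 / math.sqrt(6), 1 / math.sqrt(6), -1 / math.sqrt(6)]
q2 = [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)]
q3 = [1 / math.sqrt(3), -1 / math.sqrt(3), 1 / math.sqrt(3)]

# Q has q1, q2, q3 as its columns, so the rows of Q^T are q1, q2, q3
Q = transpose([q1, q2, q3])
w = [1, 2, 3]

# Entry i of Q^T w is w . q_i, the coefficient in Theorem 5.3
coords = matvec(transpose(Q), w)

# Rebuild w as (w . q1) q1 + (w . q2) q2 + (w . q3) q3
rebuilt = [sum(c * q[j] for c, q in zip(coords, [q1, q2, q3])) for j in range(3)]

print("coordinates:", coords)
print("rebuilt w:", rebuilt)
```

Because $\mathbf{q}_i = \mathbf{v}_i/\|\mathbf{v}_i\|$, dividing each coordinate by $\|\mathbf{v}_i\|$ recovers the coordinates $1/6$, $5/2$, $2/3$ found in Example 5.4.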
Theorem 5.4 The columns of an $m \times n$ matrix $Q$ form an orthonormal set if and only if $Q^TQ = I_n$.

Proof We need to show that
$$(Q^TQ)_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
Let $\mathbf{q}_i$ denote the $i$th column of $Q$ (and, hence, the $i$th row of $Q^T$). Since the $(i, j)$ entry of $Q^TQ$ is the dot product of the $i$th row of $Q^T$ and the $j$th column of $Q$, it follows that
$$(Q^TQ)_{ij} = \mathbf{q}_i \cdot \mathbf{q}_j \qquad (2)$$
by the definition of matrix multiplication.
Now the columns of $Q$ form an orthonormal set if and only if
$$\mathbf{q}_i \cdot \mathbf{q}_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
which, by Equation (2), holds if and only if
$$(Q^TQ)_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
This completes the proof.

If the matrix $Q$ in Theorem 5.4 is a square matrix, it has a special name.

Definition An $n \times n$ matrix $Q$ whose columns form an orthonormal set is called an orthogonal matrix.

"Orthogonal matrix" is an unfortunate bit of terminology. "Orthonormal matrix" would clearly be a better term, but it is not standard. Moreover, there is no term for a nonsquare matrix with orthonormal columns.

The most important fact about orthogonal matrices is given by the next theorem.

Theorem 5.5 A square matrix $Q$ is orthogonal if and only if $Q^{-1} = Q^T$.

Proof By Theorem 5.4, $Q$ is orthogonal if and only if $Q^TQ = I$. This is true if and only if $Q$ is invertible and $Q^{-1} = Q^T$, by Theorem 3.13.

Example 5.7 Show that the following matrices are orthogonal and find their inverses:
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Solution The columns of $A$ are just the standard basis vectors for $\mathbb{R}^3$, which are clearly orthonormal. Hence, $A$ is orthogonal and
$$A^{-1} = A^T = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
For $B$, we check directly that
$$B^TB = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta\sin\theta + \sin\theta\cos\theta \\ -\sin\theta\cos\theta + \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$
Therefore, $B$ is orthogonal, by Theorem 5.5, and
$$B^{-1} = B^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
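Example 5.7 can also be confirmed numerically with the test of Theorem 5.4, which checks whether $Q^TQ = I$. The Python sketch below is illustrative only. The helper `is_orthogonal`, the tolerance, and the angle $\theta = 0.7$ radians are choices made here, not part of the text. The tolerance is needed because $\cos^2\theta + \sin^2\theta$ is not always exactly 1 in floating point.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def is_orthogonal(Q, tol=1e-12):
    """True if Q^T Q equals the identity to within tol (the test of Theorem 5.4)."""
    n = len(Q[0])
    P = matmul(transpose(Q), Q)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

theta = 0.7  # arbitrary angle; any value should work
B = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

for name, M in (("A", A), ("B", B)):
    M_inv = transpose(M)  # Theorem 5.5: the inverse is the transpose
    print(name, "orthogonal:", is_orthogonal(M))
    print(name, "times its transpose:", matmul(M, M_inv))
```

A matrix such as $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ fails the test because its columns are not orthogonal.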
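Remark Matrix $A$ in Example 5.7 is an example of a permutation matrix, a matrix obtained by permuting the columns of an identity matrix. In general, any permutation matrix is orthogonal (see Exercise 25). Matrix $B$ is the matrix of a rotation through the angle $\theta$ in $\mathbb{R}^2$. Any rotation has the property that it is a length-preserving transformation (known as an isometry in geometry). The next theorem shows that every orthogonal matrix transformation is an isometry. Orthogonal matrices also preserve dot products. In fact, orthogonal matrices are characterized by either one of these properties.

The word isometry literally means "length preserving," since it is derived from the Greek roots isos ("equal") and metron ("measure").

Theorem 5.6 Let $Q$ be an $n \times n$ matrix. The following statements are equivalent:
a. $Q$ is orthogonal.
b. $\|Q\mathbf{x}\| = \|\mathbf{x}\|$ for every $\mathbf{x}$ in $\mathbb{R}^n$.
c. $Q\mathbf{x} \cdot Q\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for every $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.

Proof We will prove that (a) $\Rightarrow$ (c) $\Rightarrow$ (b) $\Rightarrow$ (a). To do so, we will need to make use of the fact that if $\mathbf{x}$ and $\mathbf{y}$ are (column) vectors in $\mathbb{R}^n$, then $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$.

(a) $\Rightarrow$ (c) Assume that $Q$ is orthogonal. Then $Q^TQ = I$, and we have
$$Q\mathbf{x} \cdot Q\mathbf{y} = (Q\mathbf{x})^TQ\mathbf{y} = \mathbf{x}^TQ^TQ\mathbf{y} = \mathbf{x}^TI\mathbf{y} = \mathbf{x}^T\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$$

(c) $\Rightarrow$ (b) Assume that $Q\mathbf{x} \cdot Q\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for every $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$. Then, taking $\mathbf{y} = \mathbf{x}$, we have $Q\mathbf{x} \cdot Q\mathbf{x} = \mathbf{x} \cdot \mathbf{x}$, so $\|Q\mathbf{x}\| = \sqrt{Q\mathbf{x} \cdot Q\mathbf{x}} = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \|\mathbf{x}\|$.

(b) $\Rightarrow$ (a) Assume that property (b) holds and let $\mathbf{q}_i$ denote the $i$th column of $Q$. Using Exercise 63 in Section 1.2 and property (b), we have
$$\begin{aligned} \mathbf{x} \cdot \mathbf{y} &= \tfrac{1}{4}\left(\|\mathbf{x} + \mathbf{y}\|^2 - \|\mathbf{x} - \mathbf{y}\|^2\right) \\ &= \tfrac{1}{4}\left(\|Q(\mathbf{x} + \mathbf{y})\|^2 - \|Q(\mathbf{x} - \mathbf{y})\|^2\right) \\ &= \tfrac{1}{4}\left(\|Q\mathbf{x} + Q\mathbf{y}\|^2 - \|Q\mathbf{x} - Q\mathbf{y}\|^2\right) \\ &= Q\mathbf{x} \cdot Q\mathbf{y} \end{aligned}$$
for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$. [This shows that (b) $\Rightarrow$ (c).]
Now if $\mathbf{e}_i$ is the $i$th standard basis vector, then $\mathbf{q}_i = Q\mathbf{e}_i$. Consequently,
$$\mathbf{q}_i \cdot \mathbf{q}_j = Q\mathbf{e}_i \cdot Q\mathbf{e}_j = \mathbf{e}_i \cdot \mathbf{e}_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
Thus, the columns of $Q$ form an orthonormal set, so $Q$ is an orthogonal matrix.

Looking at the orthogonal matrices $A$ and $B$ in Example 5.7, you may notice that not only do their columns form orthonormal sets, so do their rows. In fact, every orthogonal matrix has this property, as the next theorem shows.

Theorem 5.7 If $Q$ is an orthogonal matrix, then its rows form an orthonormal set.

Proof From Theorem 5.5, we know that $Q^{-1} = Q^T$. Therefore,
$$(Q^T)^{-1} = (Q^{-1})^{-1} = Q = (Q^T)^T$$
so $Q^T$ is an orthogonal matrix.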
Thus, the columns of $Q^T$, which are just the rows of $Q$, form an orthonormal set.

The final theorem in this section lists some other properties of orthogonal matrices.

Theorem 5.8 Let $Q$ be an orthogonal matrix.
a. $Q^{-1}$ is orthogonal.
b. $\det Q = \pm 1$
c. If $\lambda$ is an eigenvalue of $Q$, then $|\lambda| = 1$.
d. If $Q_1$ and $Q_2$ are orthogonal $n \times n$ matrices, then so is $Q_1Q_2$.

Proof We will prove property (c) and leave the proofs of the remaining properties as exercises.
(c) Let $\lambda$ be an eigenvalue of $Q$ with corresponding eigenvector $\mathbf{v}$. Then $Q\mathbf{v} = \lambda\mathbf{v}$, and, using Theorem 5.6(b), we have
$$\|\mathbf{v}\| = \|Q\mathbf{v}\| = \|\lambda\mathbf{v}\| = |\lambda|\,\|\mathbf{v}\|$$
Since $\|\mathbf{v}\| \neq 0$, this implies that $|\lambda| = 1$.

Remark Property (c) holds even for complex eigenvalues. The matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ is orthogonal with eigenvalues $i$ and $-i$, both of which have absolute value 1.

Exercises 5.1

In Exercises 1-6, determine which sets of vectors are orthogonal.

In Exercises 7-10, show that the given vectors form an orthogonal basis for $\mathbb{R}^2$ or $\mathbb{R}^3$. Then use Theorem 5.2 to express $\mathbf{w}$ as a linear combination of these basis vectors. Give the coordinate vector $[\mathbf{w}]_{\mathcal{B}}$ of $\mathbf{w}$ with respect to the basis $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$ of $\mathbb{R}^2$ or $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ of $\mathbb{R}^3$.

In Exercises 11-15, determine whether the given orthogonal set of vectors is orthonormal. If it is not, normalize the vectors to form an orthonormal set.

In Exercises 16-21, determine whether the given matrix is orthogonal. If it is, find its inverse.
22. Prove Theorem 5.8(a).
23. Prove Theorem 5.8(b).
24. Prove Theorem 5.8(d).
25. Prove that every permutation matrix is orthogonal.
26. If $Q$ is an orthogonal matrix, prove that any matrix obtained by rearranging the rows of $Q$ is also orthogonal.
27. Let $Q$ be an orthogonal $2 \times 2$ matrix and let $\mathbf{x}$ and $\mathbf{y}$ be vectors in $\mathbb{R}^2$. If $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$, prove that the angle between $Q\mathbf{x}$ and $Q\mathbf{y}$ is also $\theta$. (This proves that the linear transformations defined by orthogonal matrices are angle-preserving in $\mathbb{R}^2$, a fact that is true in general.)
28. (a) Prove that an orthogonal $2 \times 2$ matrix must have the form
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} a & b \\ b & -a \end{bmatrix}$$
where $\begin{bmatrix} a \\ b \end{bmatrix}$ is a unit vector.
(b) Using part (a), show that every orthogonal $2 \times 2$ matrix is of the form
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$
where $0 \le \theta < 2\pi$.
(c) Show that every orthogonal $2 \times 2$ matrix corresponds to either a rotation or a reflection in $\mathbb{R}^2$.
(d) Show that an orthogonal $2 \times 2$ matrix $Q$ corresponds to a rotation in $\mathbb{R}^2$ if $\det Q = 1$ and a reflection in $\mathbb{R}^2$ if $\det Q = -1$.

In Exercises 29-32, use Exercise 28 to determine whether the given orthogonal matrix represents a rotation or a reflection. If it is a rotation, give the angle of rotation; if it is a reflection, give the line of reflection.

33. Let $A$ and $B$ be $n \times n$ orthogonal matrices.
(a) Prove that $A(A^T + B^T)B = A + B$.
(b) Use part (a) to prove that, if $\det A + \det B = 0$, then $A + B$ is not invertible.
34. Let $\mathbf{x}$ be a unit vector in $\mathbb{R}^n$. Partition $\mathbf{x}$ as
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \mathbf{y} \end{bmatrix}$$
Let
$$Q = \begin{bmatrix} x_1 & \mathbf{y}^T \\ \mathbf{y} & I - \left(\dfrac{1}{1 - x_1}\right)\mathbf{y}\mathbf{y}^T \end{bmatrix}$$
Prove that $Q$ is orthogonal. (This procedure gives a quick method for finding an orthonormal basis for $\mathbb{R}^n$ with a prescribed first vector $\mathbf{x}$, a construction that is frequently useful in applications.)
35. Prove that if an upper triangular matrix is orthogonal, then it must be a diagonal matrix.
36. Prove that if $n > m$, then there is no $m \times n$ matrix $A$ such that $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.
37. Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for $\mathbb{R}^n$.
(a) Prove that, for any $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$,
$$\mathbf{x} \cdot \mathbf{y} = (\mathbf{x} \cdot \mathbf{v}_1)(\mathbf{y} \cdot \mathbf{v}_1) + (\mathbf{x} \cdot \mathbf{v}_2)(\mathbf{y} \cdot \mathbf{v}_2) + \cdots + (\mathbf{x} \cdot \mathbf{v}_n)(\mathbf{y} \cdot \mathbf{v}_n)$$
(This identity is called Parseval's Identity.)
(b) What does Parseval's Identity imply about the relationship between the dot products $\mathbf{x} \cdot \mathbf{y}$ and $[\mathbf{x}]_{\mathcal{B}} \cdot [\mathbf{y}]_{\mathcal{B}}$?

5.2 Orthogonal Complements and Orthogonal Projections

In this section, we generalize two concepts that we encountered in Chapter 1. The notion of a normal vector to a plane will be extended to orthogonal complements, and the projection of one vector onto another will give rise to the concept of orthogonal projection onto a subspace.

Orthogonal Complements

A normal vector $\mathbf{n}$ to a plane is orthogonal to every vector in that plane. If the plane passes through the origin, then it is a subspace $W$ of $\mathbb{R}^3$, as is $\operatorname{span}(\mathbf{n})$. Hence, we have two subspaces of $\mathbb{R}^3$ with the property that every vector of one is orthogonal to every vector of the other. This is the idea behind the following definition.

Definition Let $W$ be a subspace of $\mathbb{R}^n$. We say that a vector $\mathbf{v}$ in $\mathbb{R}^n$ is orthogonal to $W$ if $\mathbf{v}$ is orthogonal to every vector in $W$. The set of all vectors that are orthogonal to $W$ is called the orthogonal complement of $W$, denoted $W^\perp$. That is,
$$W^\perp = \{\mathbf{v} \text{ in } \mathbb{R}^n : \mathbf{v} \cdot \mathbf{w} = 0 \text{ for all } \mathbf{w} \text{ in } W\}$$

$W^\perp$ is pronounced "W perp."

Example 5.8 If $W$ is a plane through the origin in $\mathbb{R}^3$ and $\ell$ is the line through the origin perpendicular to $W$ (i.e., parallel to the normal vector to $W$), then every vector $\mathbf{v}$ on $\ell$ is orthogonal to every vector $\mathbf{w}$ in $W$ and, conversely, every vector orthogonal to $W$ is parallel to the normal vector; hence, $\ell = W^\perp$. Moreover, $W$ consists precisely of those vectors $\mathbf{w}$ that are orthogonal to every $\mathbf{v}$ on $\ell$; hence, we also have $W = \ell^\perp$. Figure 5.5 illustrates this situation.

Figure 5.5 $\ell = W^\perp$ and $W = \ell^\perp$
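Example 5.8 can be made concrete with the plane $x - y + 2z = 0$ of Example 5.3. That example found the spanning vectors $[1, 1, 0]^T$ and $[-2, 0, 1]^T$, and the plane's normal vector is $[1, -1, 2]^T$. The helper `in_perp` below is a hypothetical name used only for this sketch. It tests a vector against a spanning set of $W$ rather than against every vector in $W$. Property (d) of Theorem 5.9, stated next in the text, is what justifies this shortcut. Exact `Fraction` arithmetic avoids rounding issues.

```python
from fractions import Fraction

def dot(a, b):
    return sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))

def in_perp(v, spanning):
    """Hypothetical helper: True if v is orthogonal to every vector in `spanning`."""
    return all(dot(v, w) == 0 for w in spanning)

# Spanning vectors for W: x - y + 2z = 0 (from Example 5.3)
W_span = [[1, 1, 0], [-2, 0, 1]]
normal = [1, -1, 2]

# Multiples of the normal vector lie on the line ell and should belong to W-perp
print([in_perp([t * c for c in normal], W_span) for t in (-2, 1, 3)])

# A nonzero vector lying in W itself is not in W-perp
print(in_perp([1, 1, 0], W_span))
```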
In Example 5.8, the orthogonal complement of a subspace turned out to be another subspace. Also, the complement of the complement of a subspace was the original subspace. These properties are true in general and are proved as properties (a) and (b) of Theorem 5.9. Properties (c) and (d) will also be useful. (Recall that the intersection $A \cap B$ of sets $A$ and $B$ consists of their common elements. See Appendix A.)

Theorem 5.9 Let $W$ be a subspace of $\mathbb{R}^n$.
a. $W^\perp$ is a subspace of $\mathbb{R}^n$.
b. $(W^\perp)^\perp = W$
c. $W \cap W^\perp = \{\mathbf{0}\}$
d. If $W = \operatorname{span}(\mathbf{w}_1, \ldots, \mathbf{w}_k)$, then $\mathbf{v}$ is in $W^\perp$ if and only if $\mathbf{v} \cdot \mathbf{w}_i = 0$ for all $i = 1, \ldots, k$.

Proof (a) Since $\mathbf{0} \cdot \mathbf{w} = 0$ for all $\mathbf{w}$ in $W$, $\mathbf{0}$ is in $W^\perp$. Let $\mathbf{u}$ and $\mathbf{v}$ be in $W^\perp$ and let $c$ be a scalar. Then $\mathbf{u} \cdot \mathbf{w} = \mathbf{v} \cdot \mathbf{w} = 0$ for all $\mathbf{w}$ in $W$. Therefore,
$$(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w} = 0 + 0 = 0$$
so $\mathbf{u} + \mathbf{v}$ is in $W^\perp$. We also have
$$(c\mathbf{u}) \cdot \mathbf{w} = c(\mathbf{u} \cdot \mathbf{w}) = c(0) = 0$$
from which we see that $c\mathbf{u}$ is in $W^\perp$. It follows that $W^\perp$ is a subspace of $\mathbb{R}^n$.
(b) We will prove this property as Corollary 5.12.
(c) You are asked to prove this property in Exercise 23.
(d) You are asked to prove this property in Exercise 24.

We can now express some fundamental relationships involving the subspaces associated with an $m \times n$ matrix.

Theorem 5.10 Let $A$ be an $m \times n$ matrix. Then the orthogonal complement of the row space of $A$ is the null space of $A$, and the orthogonal complement of the column space of $A$ is the null space of $A^T$:
$$(\operatorname{row}(A))^\perp = \operatorname{null}(A) \quad\text{and}\quad (\operatorname{col}(A))^\perp = \operatorname{null}(A^T)$$

Proof If $\mathbf{x}$ is a vector in $\mathbb{R}^n$, then $\mathbf{x}$ is in $(\operatorname{row}(A))^\perp$ if and only if $\mathbf{x}$ is orthogonal to every row of $A$. But this is true if and only if $A\mathbf{x} = \mathbf{0}$, which is equivalent to $\mathbf{x}$ being in $\operatorname{null}(A)$, so we have established the first identity. To prove the second identity, we simply replace $A$ by $A^T$ and use the fact that $\operatorname{row}(A^T) = \operatorname{col}(A)$.

Thus, an $m \times n$ matrix has four subspaces: $\operatorname{row}(A)$, $\operatorname{null}(A)$, $\operatorname{col}(A)$, and $\operatorname{null}(A^T)$. The first two are orthogonal complements in $\mathbb{R}^n$, and the last two are orthogonal complements in $\mathbb{R}^m$.
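Theorem 5.10 can be checked by computing null spaces and testing dot products. The following is a minimal sketch, not a general-purpose library: `rref` and `null_space` are ad hoc helpers that use exact `Fraction` arithmetic, and the test matrix is the $4 \times 5$ matrix $A$ used in Example 5.9. The sketch verifies that every null-space vector of $A$ is orthogonal to every row of $A$, and that every null-space vector of $A^T$ is orthogonal to every column of $A$.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form with exact arithmetic; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        # Find a nonzero entry in column c at or below row r
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        # Clear every other entry in the pivot column
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def null_space(M):
    """One basis vector per free variable, read off from the RREF."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for i, p in enumerate(pivots):
            x[p] = -R[i][f]
        basis.append(x)
    return basis

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

A = [[1, 1, 3, 1, 6],
     [2, -1, 0, 1, -1],
     [-3, 2, 1, -2, 1],
     [4, 1, 6, 1, 3]]

N = null_space(A)              # basis for null(A)
NT = null_space(transpose(A))  # basis for null(A^T)

# Theorem 5.10: null(A) is orthogonal to the rows, null(A^T) to the columns
print(all(dot(row, x) == 0 for row in A for x in N))
print(all(dot(col, y) == 0 for col in transpose(A) for y in NT))
print([[int(t) for t in x] for x in N], [[int(t) for t in y] for y in NT])
```

The null-space bases produced this way agree with the ones found by hand in Example 5.9. The pivot count plus the number of null-space vectors equals 5, which anticipates the Rank Theorem at the end of this section.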
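Figure 5.6 The four fundamental subspaces

The $m \times n$ matrix $A$ defines a linear transformation from $\mathbb{R}^n$ into $\mathbb{R}^m$ whose range is $\operatorname{col}(A)$. Moreover, this transformation sends $\operatorname{null}(A)$ to $\mathbf{0}$ in $\mathbb{R}^m$. Figure 5.6 illustrates these ideas schematically. These four subspaces are called the fundamental subspaces of the $m \times n$ matrix $A$.

Example 5.9 Find bases for the four fundamental subspaces of
$$A = \begin{bmatrix} 1 & 1 & 3 & 1 & 6 \\ 2 & -1 & 0 & 1 & -1 \\ -3 & 2 & 1 & -2 & 1 \\ 4 & 1 & 6 & 1 & 3 \end{bmatrix}$$
and verify Theorem 5.10.

Solution In Examples 3.45, 3.47, and 3.48, we computed bases for the row space, column space, and null space of $A$. We found that $\operatorname{row}(A) = \operatorname{span}(\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3)$, where
$$\mathbf{u}_1 = [1\ \ 0\ \ 1\ \ 0\ \ -1], \quad \mathbf{u}_2 = [0\ \ 1\ \ 2\ \ 0\ \ 3], \quad \mathbf{u}_3 = [0\ \ 0\ \ 0\ \ 1\ \ 4]$$
Also, $\operatorname{null}(A) = \operatorname{span}(\mathbf{x}_1, \mathbf{x}_2)$, where
$$\mathbf{x}_1 = \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} 1 \\ -3 \\ 0 \\ -4 \\ 1 \end{bmatrix}$$
To show that $(\operatorname{row}(A))^\perp = \operatorname{null}(A)$, it is enough to show that every $\mathbf{u}_i$ is orthogonal to each $\mathbf{x}_j$, which is an easy exercise. (Why is this sufficient?)

The column space of $A$ is $\operatorname{col}(A) = \operatorname{span}(\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3)$, where
$$\mathbf{a}_1 = \begin{bmatrix} 1 \\ 2 \\ -3 \\ 4 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 1 \\ 1 \\ -2 \\ 1 \end{bmatrix}$$
We still need to compute the null space of $A^T$. Row reduction produces
$$[A^T \mid \mathbf{0}] = \left[\begin{array}{cccc|c} 1 & 2 & -3 & 4 & 0 \\ 1 & -1 & 2 & 1 & 0 \\ 3 & 0 & 1 & 6 & 0 \\ 1 & 1 & -2 & 1 & 0 \\ 6 & -1 & 1 & 3 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 6 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
So, if $\mathbf{y}$ is in the null space of $A^T$, then $y_1 = -y_4$, $y_2 = -6y_4$, and $y_3 = -3y_4$. It follows that
$$\operatorname{null}(A^T) = \left\{ \begin{bmatrix} -y_4 \\ -6y_4 \\ -3y_4 \\ y_4 \end{bmatrix} \right\} = \operatorname{span}\left( \begin{bmatrix} -1 \\ -6 \\ -3 \\ 1 \end{bmatrix} \right)$$
and it is easy to check that this vector is orthogonal to $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$.

The method of Example 5.9 is easily adapted to other situations.

Example 5.10 Let $W$ be the subspace of $\mathbb{R}^5$ spanned by
$$\mathbf{w}_1 = \begin{bmatrix} 1 \\ -3 \\ 5 \\ 0 \\ 5 \end{bmatrix}, \quad \mathbf{w}_2 = \begin{bmatrix} -1 \\ 1 \\ 2 \\ -2 \\ 3 \end{bmatrix}, \quad \mathbf{w}_3 = \begin{bmatrix} 0 \\ -1 \\ 4 \\ -1 \\ 5 \end{bmatrix}$$
Find a basis for $W^\perp$.

Solution The subspace $W$ spanned by $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ is the same as the column space of
$$A = \begin{bmatrix} 1 & -1 & 0 \\ -3 & 1 & -1 \\ 5 & 2 & 4 \\ 0 & -2 & -1 \\ 5 & 3 & 5 \end{bmatrix}$$
Therefore, by Theorem 5.10, $W^\perp = (\operatorname{col}(A))^\perp = \operatorname{null}(A^T)$, and we may proceed as in the previous example.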
We compute
$$[A^T \mid \mathbf{0}] = \left[\begin{array}{ccccc|c} 1 & -3 & 5 & 0 & 5 & 0 \\ -1 & 1 & 2 & -2 & 3 & 0 \\ 0 & -1 & 4 & -1 & 5 & 0 \end{array}\right] \longrightarrow \left[\begin{array}{ccccc|c} 1 & 0 & 0 & 3 & 4 & 0 \\ 0 & 1 & 0 & 1 & 3 & 0 \\ 0 & 0 & 1 & 0 & 2 & 0 \end{array}\right]$$
Hence, $\mathbf{y}$ is in $W^\perp$ if and only if $y_1 = -3y_4 - 4y_5$, $y_2 = -y_4 - 3y_5$, and $y_3 = -2y_5$. It follows that
$$W^\perp = \left\{ \begin{bmatrix} -3y_4 - 4y_5 \\ -y_4 - 3y_5 \\ -2y_5 \\ y_4 \\ y_5 \end{bmatrix} \right\} = \operatorname{span}\left( \begin{bmatrix} -3 \\ -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -4 \\ -3 \\ -2 \\ 0 \\ 1 \end{bmatrix} \right)$$
and these two vectors form a basis for $W^\perp$.

Orthogonal Projections

Recall that, in $\mathbb{R}^2$, the projection of a vector $\mathbf{v}$ onto a nonzero vector $\mathbf{u}$ is given by
$$\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = \left(\frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{u} \cdot \mathbf{u}}\right)\mathbf{u}$$
Furthermore, the vector $\operatorname{perp}_{\mathbf{u}}(\mathbf{v}) = \mathbf{v} - \operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is orthogonal to $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$, and we can decompose $\mathbf{v}$ as
$$\mathbf{v} = \operatorname{proj}_{\mathbf{u}}(\mathbf{v}) + \operatorname{perp}_{\mathbf{u}}(\mathbf{v})$$
as shown in Figure 5.7.

Figure 5.7 $\mathbf{v} = \operatorname{proj}_{\mathbf{u}}(\mathbf{v}) + \operatorname{perp}_{\mathbf{u}}(\mathbf{v})$

If we let $W = \operatorname{span}(\mathbf{u})$, then $\mathbf{w} = \operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is in $W$ and $\mathbf{w}^\perp = \operatorname{perp}_{\mathbf{u}}(\mathbf{v})$ is in $W^\perp$. We therefore have a way of "decomposing" $\mathbf{v}$ into the sum of two vectors, one from $W$ and the other orthogonal to $W$, namely, $\mathbf{v} = \mathbf{w} + \mathbf{w}^\perp$. We now generalize this idea to $\mathbb{R}^n$.

Definition Let $W$ be a subspace of $\mathbb{R}^n$ and let $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ be an orthogonal basis for $W$. For any vector $\mathbf{v}$ in $\mathbb{R}^n$, the orthogonal projection of $\mathbf{v}$ onto $W$ is defined as
$$\operatorname{proj}_W(\mathbf{v}) = \left(\frac{\mathbf{u}_1 \cdot \mathbf{v}}{\mathbf{u}_1 \cdot \mathbf{u}_1}\right)\mathbf{u}_1 + \cdots + \left(\frac{\mathbf{u}_k \cdot \mathbf{v}}{\mathbf{u}_k \cdot \mathbf{u}_k}\right)\mathbf{u}_k$$
The component of $\mathbf{v}$ orthogonal to $W$ is the vector
$$\operatorname{perp}_W(\mathbf{v}) = \mathbf{v} - \operatorname{proj}_W(\mathbf{v})$$

Each summand in the definition of $\operatorname{proj}_W(\mathbf{v})$ is also a projection onto a single vector (or, equivalently, onto the one-dimensional subspace spanned by it, in our previous terminology).
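The definition translates directly into code. The sketch below is a straightforward reading of the formula, not an optimized routine. It assumes the basis passed in really is orthogonal and contains no zero vectors; the function does not check this. It uses the orthogonal basis $\{[1, 1, 0]^T, [-1, 1, 1]^T\}$ of the plane $x - y + 2z = 0$ found in Example 5.3 and an illustrative vector $\mathbf{v} = (3, -1, 2)$. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def dot(a, b):
    return sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))

def proj_W(v, orth_basis):
    """Orthogonal projection of v onto W = span(orth_basis).
    Assumes the basis vectors are nonzero and mutually orthogonal."""
    result = [Fraction(0)] * len(v)
    for u in orth_basis:
        coeff = dot(u, v) / dot(u, u)   # (u . v) / (u . u)
        result = [r + coeff * x for r, x in zip(result, u)]
    return result

def perp_W(v, orth_basis):
    """Component of v orthogonal to W, namely v - proj_W(v)."""
    return [Fraction(x) - p for x, p in zip(v, proj_W(v, orth_basis))]

# Orthogonal basis for the plane x - y + 2z = 0 found in Example 5.3
basis = [[1, 1, 0], [-1, 1, 1]]
v = [3, -1, 2]  # an illustrative vector

w = proj_W(v, basis)
w_perp = perp_W(v, basis)
print("proj_W(v) =", [str(t) for t in w])
print("perp_W(v) =", [str(t) for t in w_perp])
```

For this $\mathbf{v}$, the component $\operatorname{perp}_W(\mathbf{v})$ is a scalar multiple of the plane's normal vector $(1, -1, 2)$, as it should be.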
• Thus, oforthogonal projections onto mutually orthogo­ nal one-The ddefiisimdecompos ensnitioionnalaboveesdubsinspteoacesemsa suoftmo W.depend on t h e choi c e of or t h ogonal bas i s ; t h at i s , differentFor" prnow,oj w(lve)t'sandbe perconta dipffew(ntervewint) . tForbash antiusexampl nat{u;e,ly..., eth,. iu£s }isfornotWthwoule casde,appear as we wito lgisvoeona "prove. w Exa m p l e 5 . 1 1 Let W be tbe plone in with equation and let v � [ : ] . Find tbe orthogonal projection ofv onto Wand the component of v orthogonal to W. In Example we found an orthogonal basis for W. Taking II' Solution we have x - y + 2z � 0, 5.3, ll1 " V = 2 U1 • U1 = 2 Uz • V = - 2 Uz • Uz = 3 - 384 Chapter 5 Orthogonality Therefore, proj w (v) = ( ) + ( ) � i[ � l - f : J Ul and perpw(v) = v - proj w (v) = [ - �3 ] [ -i� ] [ -�; ] nce it sattoisW,fiessitnhcee equat is equalIt isleasy easy tyotsoeseetehatthatprperojwp(vw() vis)iins orW,thsiogonal it is aioscnalofartmulhe pltiapne.le ofIt thrnornal vootm [ - : l to W. igmo u1 · v -- u 1 U 1 . U1 ( S" F u2 · v - u2 U U z. z 5.9.) perpw(v) v projw(v) + perpw(v) Figure 5 . 9 = Theorem 5 . 1 1 respTheect tnexto a suthbseorpaceemandshowsits torhatthweogonalcan alcomplwaysefimentnd a.decomposition of a vector with ubspacew_j_ofin W_j_andsuchletthvatbe a vector in Then there are unique vectLet oWrsbewian sWand v = w + w_j_ We need to show two things: that such a decomposition and that it is for Let orthogonal basis w =Toprowsjhwow+(vw_j_)exiands=telprence,t ow_j_j wwe(=v)perchoos p+ perw(epvanw) .(Then v) = proj w(v) + (v - proj w(v) = v The Orthogonal Decomposition Theorem IW Proof unique. !R n . exists {u1, . . . , uk} W Section 5.2 Orthogonal Complements and Orthogonal Projections 385 Clu1,ear. .ly. ,, wuk. Toprshojoww (vth) atis wlin. W,is insinWlce. ,ititiiss aenough linear combi nattihoatn wlof.thiseorbasthiogonal s vectotros t o s h ow each of the basis vectors U;, by Theorem 5.9(d) . We compute U; wl. 
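$$\begin{aligned}
\mathbf u_i \cdot \mathbf w^\perp &= \mathbf u_i \cdot \operatorname{perp}_W(\mathbf v) = \mathbf u_i \cdot (\mathbf v - \operatorname{proj}_W(\mathbf v))\\
&= \mathbf u_i \cdot \left(\mathbf v - \left(\frac{\mathbf u_1\cdot\mathbf v}{\mathbf u_1\cdot\mathbf u_1}\right)\mathbf u_1 - \cdots - \left(\frac{\mathbf u_k\cdot\mathbf v}{\mathbf u_k\cdot\mathbf u_k}\right)\mathbf u_k\right)\\
&= \mathbf u_i\cdot\mathbf v - \left(\frac{\mathbf u_1\cdot\mathbf v}{\mathbf u_1\cdot\mathbf u_1}\right)(\mathbf u_i\cdot\mathbf u_1) - \cdots - \left(\frac{\mathbf u_i\cdot\mathbf v}{\mathbf u_i\cdot\mathbf u_i}\right)(\mathbf u_i\cdot\mathbf u_i) - \cdots - \left(\frac{\mathbf u_k\cdot\mathbf v}{\mathbf u_k\cdot\mathbf u_k}\right)(\mathbf u_i\cdot\mathbf u_k)\\
&= \mathbf u_i\cdot\mathbf v - 0 - \cdots - \mathbf u_i\cdot\mathbf v - \cdots - 0 = 0
\end{aligned}$$

since $\mathbf u_i \cdot \mathbf u_j = 0$ for $j \ne i$. This proves that $\mathbf w^\perp$ is in $W^\perp$ and completes the existence part of the proof.

To show the uniqueness of this decomposition, suppose we have another decomposition $\mathbf v = \mathbf w_1 + \mathbf w_1^\perp$, where $\mathbf w_1$ is in $W$ and $\mathbf w_1^\perp$ is in $W^\perp$. Then $\mathbf w + \mathbf w^\perp = \mathbf w_1 + \mathbf w_1^\perp$, so

$$\mathbf w - \mathbf w_1 = \mathbf w_1^\perp - \mathbf w^\perp$$

But since $\mathbf w - \mathbf w_1$ is in $W$ and $\mathbf w_1^\perp - \mathbf w^\perp$ is in $W^\perp$ (because these are subspaces), we know that this common vector is in $W \cap W^\perp = \{\mathbf 0\}$ [using Theorem 5.9(c)]. Thus,

$$\mathbf w - \mathbf w_1 = \mathbf w_1^\perp - \mathbf w^\perp = \mathbf 0$$

so $\mathbf w_1 = \mathbf w$ and $\mathbf w_1^\perp = \mathbf w^\perp$.

Example 5.11 illustrates the Orthogonal Decomposition Theorem. When $W$ is the subspace of $\mathbb R^3$ given by the plane with equation $x - y + 2z = 0$, the orthogonal decomposition of $\mathbf v = [3, -1, 2]$ with respect to $W$ is $\mathbf v = \mathbf w + \mathbf w^\perp$, where

$$\mathbf w = \operatorname{proj}_W(\mathbf v) = \begin{bmatrix}5/3\\1/3\\-2/3\end{bmatrix} \quad\text{and}\quad \mathbf w^\perp = \operatorname{perp}_W(\mathbf v) = \begin{bmatrix}4/3\\-4/3\\8/3\end{bmatrix}$$

The uniqueness of the orthogonal decomposition guarantees that the definitions of $\operatorname{proj}_W(\mathbf v)$ and $\operatorname{perp}_W(\mathbf v)$ do not depend on the choice of orthogonal basis. The Orthogonal Decomposition Theorem also allows us to prove property (b) of Theorem 5.9. We state that property here as a corollary to the Orthogonal Decomposition Theorem.

Corollary 5.12  If $W$ is a subspace of $\mathbb R^n$, then $(W^\perp)^\perp = W$.

Proof  If $\mathbf w$ is in $W$ and $\mathbf x$ is in $W^\perp$, then $\mathbf w \cdot \mathbf x = 0$. But this now implies that $\mathbf w$ is in $(W^\perp)^\perp$. Hence, $W \subseteq (W^\perp)^\perp$. Now let $\mathbf v$ be in $(W^\perp)^\perp$. By Theorem 5.11, we can write $\mathbf v = \mathbf w + \mathbf w^\perp$ for (unique) vectors $\mathbf w$ in $W$ and $\mathbf w^\perp$ in $W^\perp$. But now

$$0 = \mathbf v \cdot \mathbf w^\perp = (\mathbf w + \mathbf w^\perp)\cdot\mathbf w^\perp = \mathbf w\cdot\mathbf w^\perp + \mathbf w^\perp\cdot\mathbf w^\perp = 0 + \mathbf w^\perp\cdot\mathbf w^\perp = \mathbf w^\perp\cdot\mathbf w^\perp$$

so $\mathbf w^\perp = \mathbf 0$. Therefore, $\mathbf v = \mathbf w + \mathbf w^\perp = \mathbf w$, and thus $\mathbf v$ is in $W$. This shows that $(W^\perp)^\perp \subseteq W$ and, since the reverse inclusion is also true, we conclude that $(W^\perp)^\perp = W$, as required.

There is also a nice relationship between the dimensions of $W$ and $W^\perp$, expressed in Theorem 5.13.

Theorem 5.13  If $W$ is a subspace of $\mathbb R^n$, then

$$\dim W + \dim W^\perp = n$$

Proof  Let $\{\mathbf u_1, \ldots, \mathbf u_k\}$ be an orthogonal basis for $W$ and let $\{\mathbf v_1, \ldots, \mathbf v_l\}$ be an orthogonal basis for $W^\perp$. Then $\dim W = k$ and $\dim W^\perp = l$. Let $\mathcal B = \{\mathbf u_1, \ldots, \mathbf u_k, \mathbf v_1, \ldots, \mathbf v_l\}$. We claim that $\mathcal B$ is an orthogonal basis for $\mathbb R^n$.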
We first note thatU,; 'siVnj ce eachforiis in Wand.. .,kandj each vj is in W_L , Thus,, iBf viiss ana vectortohrogonal shete and,Orthhence, iDecompos s linearly iintidependent , bytelTheor emat v Next i n t ogonal o n Theor e m s us t h wtion ofw_Lthfoe rvectsomeorswiU; andn Wandw_L canw_LbeinwrW_Lit .enSinascea wlincanear combi be wrint atenioasn ofa litnheare vectcombiors vnj,a­v andcan beso wris aitbaseniass fora linearItcombi follodiwsnmatthiwoatnkofdithme vectw_Ldimors in orB. Therefore, B spans also As a l o vel y bonus , when we appl y t h i s r e s u l t t o t h e fundament a l s u bs p aces of a matCororilxa, rwey get a quick proof of the Rank Theorem (Theorem restated here as C Proof = + 0= · = 0. Theorem 5 . 1 3 = · + · = + = = + · = 0. 5. 1 1, =0+ · = · C !R n , = + Proof {u 1 , = !R n . u; =0 n = l. = {u 1 , = 1, = 1, . . . , l 5. 1 . = !R n , + !R n . +l= + !R n !R n , = n 3.26), 5. 14. C o r o l l a rv 5 . 1 4 = · The Rank Theorem IfA is an matrix, thenrank(A) nullity(A) em so diNotm eWInthTheor rat awenk(emgetA) anda counttdiakemeW_LrWpart riodnulw(entlAiitty)y.(Then Aby) . tTheakiW_LnrgesuWlnult folll(coloAws.)(,Aby) Theor [and therefore W_L null (AT )J : rank(A) nullity(A mX n + Proof = 5.13, = = n = = 5.10, = = + r) = m Section 5.2 Orthogonal Complements and Orthogonal Projections 381 have, weilhaveustranotted essometablisofhedthtehatadvanteveryagessubsofpaceworkinganwiorth­ orthogonal thSectogonaliobasnsbasis, enorsand. However ven a mete hodThesfor conse is turuesctarineg tshuechsuabjbasectisof(etxcept parsecttiioculn. ar exampleshave, suchweasgiExampl he nextin 5.1 5.2 has 5.3). I Exercises 5 . 2 In Exercises 1 -6, find the orthogonal complement W_j_ of W and give a basis for W_j_ . 1. W = 2. W = 3. w � 4. W � 5. W � 6. W � { [;] } } { [;] { [�l ) { [:J ) { [�l ) { [�] � +. ) : : 2x - y = 0 3x + 4y = 0 x+y-z� 0 2x - y + 3z � 0 x � I, y � - I, z � 3t x � t, y � z � 21 In Exercises 7 and 8, find bases for the row space and null space of A. 
Verify that every vector in $\operatorname{row}(A)$ is orthogonal to every vector in $\operatorname{null}(A)$.
(The matrices for Exercises 7 and 8 are illegible in this copy.)

In Exercises 9 and 10, find bases for the column space of $A$ and the null space of $A^T$ for the given exercise. Verify that every vector in $\operatorname{col}(A)$ is orthogonal to every vector in $\operatorname{null}(A^T)$.
9. Exercise 7
10. Exercise 8

In Exercises 11–14, let $W$ be the subspace spanned by the given vectors. Find a basis for $W^\perp$.
(The vectors for Exercises 11–14 are illegible in this copy.)

In Exercises 15–18, find the orthogonal projection of $\mathbf v$ onto the subspace $W$ spanned by the vectors $\mathbf u_i$. (You may assume that the vectors $\mathbf u_i$ are orthogonal.)
(The vectors for Exercises 15–18 are illegible in this copy.)

In Exercises 19–22, find the orthogonal decomposition of $\mathbf v$ with respect to $W$.
(The vectors for Exercises 19–22 are illegible in this copy.)

23. Prove Theorem 5.9(c).
24. Prove Theorem 5.9(d).
25. Let $W$ be a subspace of $\mathbb R^n$ and $\mathbf v$ a vector in $\mathbb R^n$. Suppose that $\mathbf w$ and $\mathbf w'$ are orthogonal vectors with $\mathbf w$ in $W$ and that $\mathbf v = \mathbf w + \mathbf w'$. Is it necessarily true that $\mathbf w'$ is in $W^\perp$? Either prove that it is true or find a counterexample.
26. Let $\{\mathbf v_1, \ldots, \mathbf v_n\}$ be an orthogonal basis for $\mathbb R^n$ and let $W = \operatorname{span}(\mathbf v_1, \ldots, \mathbf v_k)$. Is it necessarily true that $W^\perp = \operatorname{span}(\mathbf v_{k+1}, \ldots, \mathbf v_n)$? Either prove that it is true or find a counterexample.

In Exercises 27–29, let $W$ be a subspace of $\mathbb R^n$, and let $\mathbf x$ be a vector in $\mathbb R^n$.
27. Prove that $\mathbf x$ is in $W$ if and only if $\operatorname{proj}_W(\mathbf x) = \mathbf x$.
28. Prove that $\mathbf x$ is orthogonal to $W$ if and only if $\operatorname{proj}_W(\mathbf x) = \mathbf 0$.
29. Prove that $\operatorname{proj}_W(\operatorname{proj}_W(\mathbf x)) = \operatorname{proj}_W(\mathbf x)$.
30. (a) Let $S = \{\mathbf v_1, \ldots, \mathbf v_k\}$ be an orthonormal set in $\mathbb R^n$, and let $\mathbf x$ be a vector in $\mathbb R^n$. Prove that
$$\|\mathbf x\|^2 \ge |\mathbf x\cdot\mathbf v_1|^2 + |\mathbf x\cdot\mathbf v_2|^2 + \cdots + |\mathbf x\cdot\mathbf v_k|^2$$
(This inequality is called Bessel's Inequality.)
(b) Prove that Bessel's Inequality is an equality if and only if $\mathbf x$ is in $\operatorname{span}(S)$.
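The computation that opens this passage (finding $W^\perp$ as a null space) and Theorem 5.13 can be checked mechanically. The sketch below is our own illustration, not part of the text: it assumes exact rational arithmetic via Python's standard `fractions` module, and the helper names `rref` and `orthogonal_complement` are hypothetical. It treats $W$ as the row space of the matrix whose rows span $W$ (the matrix $A^T$ of that example), so $W^\perp$ is that matrix's null space.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the reduced row echelon form of `rows`, computed
    with exact Fraction arithmetic, and the list of pivot column indices."""
    R = [[Fraction(c) for c in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]  # scale the pivot to 1
        for i in range(m):
            if i != r and R[i][c] != 0:  # clear the rest of column c
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def orthogonal_complement(spanning):
    """Basis for W-perp, where W = span(spanning) is a subspace of R^n.

    W is the row space of the matrix whose rows are the spanning vectors,
    so W-perp is that matrix's null space: one basis vector per free variable.
    """
    R, pivots = rref(spanning)
    n = len(spanning[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)          # set this free variable to 1
        for row, pc in zip(R, pivots):
            x[pc] = -row[f]         # solve for each pivot variable
        basis.append(x)
    return basis


# The subspace W of R^5 spanned by the rows of A^T in the example above.
w1 = [1, -3, 5, 0, 5]
w2 = [-1, 1, 2, -2, 3]
w3 = [0, -1, 4, -1, 5]
perp_basis = orthogonal_complement([w1, w2, w3])
print([[str(c) for c in x] for x in perp_basis])
# [['-3', '-1', '0', '1', '0'], ['-4', '-3', '-2', '0', '1']]

# Theorem 5.13: dim W + dim W-perp = n (here 3 + 2 = 5).
_, pivots = rref([w1, w2, w3])
print(len(pivots) + len(perp_basis))  # 5
```

Exact fractions are used so that the output matches the hand computation entry for entry; with floating-point numbers, deciding whether an entry "is zero" would need a tolerance.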
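5.3 The Gram-Schmidt Process and the QR Factorization

In this section, we present a simple method for constructing an orthogonal (or orthonormal) basis for any subspace of $\mathbb R^n$. This method will then lead us to one of the most useful of all matrix factorizations.

The Gram-Schmidt Process

We would like to be able to find an orthogonal basis for a subspace $W$ of $\mathbb R^n$. The idea is to begin with an arbitrary basis $\{\mathbf x_1, \ldots, \mathbf x_k\}$ for $W$ and to "orthogonalize" it one vector at a time. We will illustrate the basic construction with the subspace $W$ from Example 5.3.

Example 5.12  Let $W = \operatorname{span}(\mathbf x_1, \mathbf x_2)$, where

$$\mathbf x_1 = \begin{bmatrix}1\\1\\0\end{bmatrix} \quad\text{and}\quad \mathbf x_2 = \begin{bmatrix}-2\\0\\1\end{bmatrix}$$

Construct an orthogonal basis for $W$.

Solution  Starting with $\mathbf x_1$, we get a second vector that is orthogonal to it by taking the component of $\mathbf x_2$ orthogonal to $\mathbf x_1$ (Figure 5.10).

[Figure 5.10: Constructing $\mathbf v_2$ orthogonal to $\mathbf x_1$]

Algebraically, we set $\mathbf v_1 = \mathbf x_1$, so

$$\mathbf v_2 = \operatorname{perp}_{\mathbf x_1}(\mathbf x_2) = \mathbf x_2 - \operatorname{proj}_{\mathbf x_1}(\mathbf x_2) = \mathbf x_2 - \left(\frac{\mathbf x_1\cdot\mathbf x_2}{\mathbf x_1\cdot\mathbf x_1}\right)\mathbf x_1 = \begin{bmatrix}-2\\0\\1\end{bmatrix} - \left(\frac{-2}{2}\right)\begin{bmatrix}1\\1\\0\end{bmatrix} = \begin{bmatrix}-1\\1\\1\end{bmatrix}$$

Then $\{\mathbf v_1, \mathbf v_2\}$ is an orthogonal set of vectors in $W$. Hence, $\{\mathbf v_1, \mathbf v_2\}$ is a linearly independent set and therefore a basis for $W$, since $\dim W = 2$.

Remark  Observe that this method depends on the order of the original basis vectors. In Example 5.12, if we had taken

$$\mathbf x_1 = \begin{bmatrix}-2\\0\\1\end{bmatrix} \quad\text{and}\quad \mathbf x_2 = \begin{bmatrix}1\\1\\0\end{bmatrix}$$

we would have obtained a different orthogonal basis for $W$. (Verify this.)

The generalization of this method to more than two vectors begins as in Example 5.12. Then the process is to iteratively construct the components of subsequent vectors orthogonal to all of the vectors that have already been constructed. The method is known as the Gram-Schmidt Process.

Theorem 5.15  The Gram-Schmidt Process
Let $\{\mathbf x_1, \ldots, \mathbf x_k\}$ be a basis for a subspace $W$ of $\mathbb R^n$ and define the following:

$$\begin{aligned}
\mathbf v_1 &= \mathbf x_1, & W_1 &= \operatorname{span}(\mathbf x_1)\\
\mathbf v_2 &= \mathbf x_2 - \left(\frac{\mathbf v_1\cdot\mathbf x_2}{\mathbf v_1\cdot\mathbf v_1}\right)\mathbf v_1, & W_2 &= \operatorname{span}(\mathbf x_1, \mathbf x_2)\\
\mathbf v_3 &= \mathbf x_3 - \left(\frac{\mathbf v_1\cdot\mathbf x_3}{\mathbf v_1\cdot\mathbf v_1}\right)\mathbf v_1 - \left(\frac{\mathbf v_2\cdot\mathbf x_3}{\mathbf v_2\cdot\mathbf v_2}\right)\mathbf v_2, & W_3 &= \operatorname{span}(\mathbf x_1, \mathbf x_2, \mathbf x_3)\\
&\;\;\vdots\\
\mathbf v_k &= \mathbf x_k - \left(\frac{\mathbf v_1\cdot\mathbf x_k}{\mathbf v_1\cdot\mathbf v_1}\right)\mathbf v_1 - \cdots - \left(\frac{\mathbf v_{k-1}\cdot\mathbf x_k}{\mathbf v_{k-1}\cdot\mathbf v_{k-1}}\right)\mathbf v_{k-1}, & W_k &= \operatorname{span}(\mathbf x_1, \ldots, \mathbf x_k)
\end{aligned}$$

Then for each $i = 1, \ldots, k$, $\{\mathbf v_1, \ldots, \mathbf v_i\}$ is an orthogonal basis for $W_i$. In particular, $\{\mathbf v_1, \ldots, \mathbf v_k\}$ is an orthogonal basis for $W$.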
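Theorem 5.15 translates almost line for line into code. The following is a minimal sketch of our own (not the text's), assuming exact arithmetic with Python's `fractions.Fraction` so that hand computations such as Example 5.12 are reproduced exactly; the function names are illustrative. Each new vector is $\mathbf x_i$ minus its projections onto the vectors already built, as in the theorem.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product u . v of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(xs):
    """Classical Gram-Schmidt (Theorem 5.15) in exact arithmetic.

    Given a basis x_1, ..., x_k, return v_1, ..., v_k with
    v_i = x_i - sum over j < i of (v_j . x_i)/(v_j . v_j) v_j.
    """
    vs = []
    for x in xs:
        x = [Fraction(c) for c in x]
        v = list(x)
        for w in vs:
            coeff = dot(w, x) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        if all(c == 0 for c in v):
            # As in the proof of Theorem 5.15, this happens only if x lies in
            # the span of the earlier vectors, i.e. the input was dependent.
            raise ValueError("input vectors are not linearly independent")
        vs.append(v)
    return vs


# Example 5.12
x1 = [1, 1, 0]
x2 = [-2, 0, 1]
v1, v2 = gram_schmidt([x1, x2])
print([str(c) for c in v2])  # ['-1', '1', '1']

# The Remark after Example 5.12: reversing the order changes the basis.
w1, w2 = gram_schmidt([x2, x1])
print([str(c) for c in w2])  # ['1/5', '1', '2/5']
```

The second call confirms the Remark: starting from $[-2, 0, 1]$ instead produces the orthogonal basis $\{[-2, 0, 1], [1/5, 1, 2/5]\}$ for the same plane.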
Jørgen Pedersen Gram (1850-1916) was a Danish actuary (insurance statistician) who was interested in the science of measurement. He first published the process that bears his name in an 1883 paper on least squares. Erhard Schmidt (1876-1959) was a German mathematician who studied under the great David Hilbert and is considered one of the founders of the branch of mathematics known as functional analysis. His contribution to the Gram-Schmidt Process came in a 1907 paper on integral equations, in which he wrote out the details of the method more explicitly than Gram had done.

Stated succinctly, Theorem 5.15 says that every subspace of $\mathbb R^n$ has an orthogonal basis, and it gives an algorithm for constructing such a basis.

Proof  We will prove by induction that, for each $i = 1, \ldots, k$, $\{\mathbf v_1, \ldots, \mathbf v_i\}$ is an orthogonal basis for $W_i$. Since $\mathbf v_1 = \mathbf x_1$, clearly $\{\mathbf v_1\}$ is an (orthogonal) basis for $W_1 = \operatorname{span}(\mathbf x_1)$. Now assume that, for some $i < k$, $\{\mathbf v_1, \ldots, \mathbf v_i\}$ is an orthogonal basis for $W_i$. Then

$$\mathbf v_{i+1} = \mathbf x_{i+1} - \left(\frac{\mathbf v_1\cdot\mathbf x_{i+1}}{\mathbf v_1\cdot\mathbf v_1}\right)\mathbf v_1 - \left(\frac{\mathbf v_2\cdot\mathbf x_{i+1}}{\mathbf v_2\cdot\mathbf v_2}\right)\mathbf v_2 - \cdots - \left(\frac{\mathbf v_i\cdot\mathbf x_{i+1}}{\mathbf v_i\cdot\mathbf v_i}\right)\mathbf v_i$$

By the induction hypothesis, $\{\mathbf v_1, \ldots, \mathbf v_i\}$ is an orthogonal basis for $\operatorname{span}(\mathbf x_1, \ldots, \mathbf x_i) = W_i$. Hence,

$$\mathbf v_{i+1} = \mathbf x_{i+1} - \operatorname{proj}_{W_i}(\mathbf x_{i+1}) = \operatorname{perp}_{W_i}(\mathbf x_{i+1})$$

So, by the Orthogonal Decomposition Theorem, $\mathbf v_{i+1}$ is orthogonal to $W_i$. By definition, $\mathbf v_1, \ldots, \mathbf v_i$ are linear combinations of $\mathbf x_1, \ldots, \mathbf x_i$ and, hence, are in $W_i$; and $\mathbf v_{i+1}$, a linear combination of $\mathbf x_{i+1}$ and vectors in $W_i$, is in $W_{i+1}$. Therefore, $\{\mathbf v_1, \ldots, \mathbf v_{i+1}\}$ is an orthogonal set of vectors in $W_{i+1}$.

Moreover, $\mathbf v_{i+1} \ne \mathbf 0$, since otherwise $\mathbf x_{i+1} = \operatorname{proj}_{W_i}(\mathbf x_{i+1})$, which in turn implies that $\mathbf x_{i+1}$ is in $W_i$. But this is impossible, since $W_i = \operatorname{span}(\mathbf x_1, \ldots, \mathbf x_i)$ and $\{\mathbf x_1, \ldots, \mathbf x_{i+1}\}$ is linearly independent. (Why?) We conclude that $\{\mathbf v_1, \ldots, \mathbf v_{i+1}\}$ is a set of $i + 1$ linearly independent vectors in $W_{i+1}$. Consequently, $\{\mathbf v_1, \ldots, \mathbf v_{i+1}\}$ is a basis for $W_{i+1}$, since $\dim W_{i+1} = i + 1$. This completes the proof.
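The proof identifies each new vector as $\mathbf v_{i+1} = \operatorname{perp}_{W_i}(\mathbf x_{i+1})$, so every Gram-Schmidt step is an orthogonal projection in the sense of Section 5.2. The following Python sketch (our own illustration; the function names are hypothetical) implements $\operatorname{proj}_W$ and $\operatorname{perp}_W$ directly from their definition, assuming the supplied basis for $W$ is orthogonal, and checks them against Example 5.11. It also projects onto a second orthogonal basis of the same plane, $\{[-2, 0, 1], [1, 5, 2]\}$ (our choice, not the text's), to illustrate that the result does not depend on the basis.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product u . v of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def proj(orthogonal_basis, v):
    """proj_W(v) = sum over i of (u_i . v)/(u_i . u_i) u_i.

    Assumes the vectors in `orthogonal_basis` are nonzero and mutually
    orthogonal; the formula is not valid for a non-orthogonal basis.
    """
    v = [Fraction(c) for c in v]
    p = [Fraction(0)] * len(v)
    for u in orthogonal_basis:
        coeff = dot(u, v) / dot(u, u)
        p = [pi + coeff * ui for pi, ui in zip(p, u)]
    return p


def perp(orthogonal_basis, v):
    """perp_W(v) = v - proj_W(v), the component of v orthogonal to W."""
    return [vi - pi for vi, pi in zip(v, proj(orthogonal_basis, v))]


# Example 5.11: W is the plane x - y + 2z = 0.
u1, u2 = [1, 1, 0], [-1, 1, 1]
v = [3, -1, 2]
w = proj([u1, u2], v)
w_perp = perp([u1, u2], v)
print([str(c) for c in w])       # ['5/3', '1/3', '-2/3']
print([str(c) for c in w_perp])  # ['4/3', '-4/3', '8/3']

# A different orthogonal basis for the same plane gives the same projection.
print(proj([[-2, 0, 1], [1, 5, 2]], v) == w)  # True
```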
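If we require an orthonormal basis for $W$, we simply need to normalize the orthogonal vectors produced by the Gram-Schmidt Process. That is, for each $i$, we replace $\mathbf v_i$ by the unit vector $\mathbf q_i = (1/\|\mathbf v_i\|)\mathbf v_i$.

Example 5.13  Apply the Gram-Schmidt Process to construct an orthonormal basis for the subspace $W = \operatorname{span}(\mathbf x_1, \mathbf x_2, \mathbf x_3)$ of $\mathbb R^4$, where

$$\mathbf x_1 = \begin{bmatrix}1\\-1\\-1\\1\end{bmatrix}, \quad \mathbf x_2 = \begin{bmatrix}2\\1\\0\\1\end{bmatrix}, \quad \mathbf x_3 = \begin{bmatrix}2\\2\\1\\2\end{bmatrix}$$

Solution  First we note that $\{\mathbf x_1, \mathbf x_2, \mathbf x_3\}$ is a linearly independent set, so it forms a basis for $W$. We begin by setting $\mathbf v_1 = \mathbf x_1$. Next, we compute the component of $\mathbf x_2$ orthogonal to $W_1 = \operatorname{span}(\mathbf v_1)$:

$$\mathbf v_2 = \operatorname{perp}_{W_1}(\mathbf x_2) = \mathbf x_2 - \left(\frac{\mathbf v_1\cdot\mathbf x_2}{\mathbf v_1\cdot\mathbf v_1}\right)\mathbf v_1 = \begin{bmatrix}2\\1\\0\\1\end{bmatrix} - \frac{2}{4}\begin{bmatrix}1\\-1\\-1\\1\end{bmatrix} = \begin{bmatrix}3/2\\3/2\\1/2\\1/2\end{bmatrix}$$

For hand calculations, it is a good idea to "scale" $\mathbf v_2$ at this point to eliminate fractions. When we are finished, we can rescale the orthogonal set we are constructing to obtain an orthonormal set; thus, we can replace each $\mathbf v_i$ by any convenient scalar multiple without affecting the final result. Accordingly, we replace $\mathbf v_2$ by

$$\mathbf v_2' = 2\mathbf v_2 = \begin{bmatrix}3\\3\\1\\1\end{bmatrix}$$

We now find the component of $\mathbf x_3$ orthogonal to

$$W_2 = \operatorname{span}(\mathbf x_1, \mathbf x_2) = \operatorname{span}(\mathbf v_1, \mathbf v_2) = \operatorname{span}(\mathbf v_1, \mathbf v_2')$$

using the orthogonal basis $\{\mathbf v_1, \mathbf v_2'\}$:

$$\mathbf v_3 = \operatorname{perp}_{W_2}(\mathbf x_3) = \mathbf x_3 - \left(\frac{\mathbf v_1\cdot\mathbf x_3}{\mathbf v_1\cdot\mathbf v_1}\right)\mathbf v_1 - \left(\frac{\mathbf v_2'\cdot\mathbf x_3}{\mathbf v_2'\cdot\mathbf v_2'}\right)\mathbf v_2' = \begin{bmatrix}2\\2\\1\\2\end{bmatrix} - \frac{1}{4}\begin{bmatrix}1\\-1\\-1\\1\end{bmatrix} - \frac{15}{20}\begin{bmatrix}3\\3\\1\\1\end{bmatrix} = \begin{bmatrix}-1/2\\0\\1/2\\1\end{bmatrix}$$

Again, we rescale and use

$$\mathbf v_3' = 2\mathbf v_3 = \begin{bmatrix}-1\\0\\1\\2\end{bmatrix}$$

We now have an orthogonal basis $\{\mathbf v_1, \mathbf v_2', \mathbf v_3'\}$ for $W$. (Check to make sure that these vectors are orthogonal.) To obtain an orthonormal basis, we normalize each vector:

$$\mathbf q_1 = \frac{1}{\|\mathbf v_1\|}\mathbf v_1 = \begin{bmatrix}1/2\\-1/2\\-1/2\\1/2\end{bmatrix}, \quad \mathbf q_2 = \frac{1}{\|\mathbf v_2'\|}\mathbf v_2' = \frac{1}{\sqrt{20}}\begin{bmatrix}3\\3\\1\\1\end{bmatrix} = \begin{bmatrix}3\sqrt5/10\\3\sqrt5/10\\\sqrt5/10\\\sqrt5/10\end{bmatrix}, \quad \mathbf q_3 = \frac{1}{\|\mathbf v_3'\|}\mathbf v_3' = \frac{1}{\sqrt6}\begin{bmatrix}-1\\0\\1\\2\end{bmatrix} = \begin{bmatrix}-\sqrt6/6\\0\\\sqrt6/6\\\sqrt6/3\end{bmatrix}$$

Then $\{\mathbf q_1, \mathbf q_2, \mathbf q_3\}$ is an orthonormal basis for $W$.

One of the important uses of the Gram-Schmidt Process is to construct an orthogonal basis that contains a specified vector. The next example illustrates this application.

Example 5.14  Find an orthogonal basis for $\mathbb R^3$ that contains the vector $\mathbf v_1$ [vector illegible in this copy].

Solution  We first find any basis for $\mathbb R^3$ containing $\mathbf v_1$. If we take $\mathbf x_2$ and $\mathbf x_3$ [vectors illegible in this copy], then $\{\mathbf v_1, \mathbf x_2, \mathbf x_3\}$ is clearly a basis for $\mathbb R^3$. (Why?) We now apply the Gram-Schmidt Process to this basis to obtain $\mathbf v_2$ (rescaled to $\mathbf v_2'$) and finally $\mathbf v_3$ (rescaled to $\mathbf v_3'$) [computations illegible in this copy]. Then $\{\mathbf v_1, \mathbf v_2', \mathbf v_3'\}$ is an orthogonal basis for $\mathbb R^3$ that contains $\mathbf v_1$.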
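For Example 5.13 and the technique of Example 5.14, here is a floating-point sketch (again our own, with hypothetical function names). `orthonormal_basis` runs Gram-Schmidt and then normalizes, as described above. `extend_to_orthogonal_basis` appends the standard basis vectors after a given vector and skips any that Gram-Schmidt reduces to (nearly) zero; which standard vectors end up being used is our choice and need not match the text's. The vector $[1, 2, 3]$ passed to it below is purely illustrative. Because this version uses floating-point numbers, in general it agrees with exact answers only up to roundoff.

```python
import math


def dot(u, v):
    """Dot product u . v of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(xs):
    """Gram-Schmidt on a linearly independent list, then q_i = v_i / ||v_i||."""
    vs = []
    for x in xs:
        v = [float(c) for c in x]
        for w in vs:
            coeff = dot(w, x) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return [[c / math.sqrt(dot(v, v)) for c in v] for v in vs]


def extend_to_orthogonal_basis(v, tol=1e-12):
    """Orthogonal basis of R^n whose first vector is v (v must be nonzero).

    Tries e_1, ..., e_n after v; an e_j whose Gram-Schmidt remainder is
    (nearly) zero already lies in the span built so far and is skipped.
    """
    n = len(v)
    vs = [[float(c) for c in v]]
    for j in range(n):
        if len(vs) == n:
            break
        e = [1.0 if i == j else 0.0 for i in range(n)]
        w = list(e)
        for u in vs:
            coeff = dot(u, e) / dot(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        if math.sqrt(dot(w, w)) > tol:
            vs.append(w)
    return vs


# Example 5.13
q1, q2, q3 = orthonormal_basis([[1, -1, -1, 1], [2, 1, 0, 1], [2, 2, 1, 2]])
print([round(c, 4) for c in q3])  # [-0.4082, 0.0, 0.4082, 0.8165]

# Example 5.14-style extension, for an illustrative vector.
basis = extend_to_orthogonal_basis([1, 2, 3])
```

As the Remark that follows explains, production code would normally prefer the Modified Gram-Schmidt Process or a QR factorization routine over this textbook form.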
Similarly, given a unit vector, we can find an orthonormal basis that contains it by using the preceding method and then normalizing the resulting orthogonal vectors.

Remark  When the Gram-Schmidt Process is implemented on a computer, there is almost always some roundoff error, leading to a loss of orthogonality in the vectors $\mathbf q_i$. To avoid this loss of orthogonality, some modifications are usually made. The vectors $\mathbf v_i$ are normalized as soon as they are computed, rather than at the end, to give the vectors $\mathbf q_i$, and as each $\mathbf q_i$ is computed, the remaining vectors $\mathbf x_j$ are modified to be orthogonal to $\mathbf q_i$. This procedure is known as the Modified Gram-Schmidt Process. In practice, however, a version of the QR factorization is used to compute orthonormal bases.

The QR Factorization

If $A$ is an $m \times n$ matrix with linearly independent columns (requiring that $m \ge n$), then applying the Gram-Schmidt Process to these columns yields a very useful factorization of $A$ into the product of a matrix $Q$ with orthonormal columns and an upper triangular matrix $R$. This is the QR factorization, and it has applications to the numerical approximation of eigenvalues, which we explore at the end of this section, and to the problem of least squares approximation, which we discuss in Chapter 7.

To see how the QR factorization arises, let $\mathbf a_1, \ldots, \mathbf a_n$ be the (linearly independent) columns of $A$ and let $\mathbf q_1, \ldots, \mathbf q_n$ be the orthonormal vectors obtained by applying the Gram-Schmidt Process to $A$ with normalizations. From Theorem 5.15, we know that, for each $i = 1, \ldots, n$,

$$W_i = \operatorname{span}(\mathbf a_1, \ldots, \mathbf a_i) = \operatorname{span}(\mathbf q_1, \ldots, \mathbf q_i)$$

Therefore, there are scalars $r_{1i}, r_{2i}, \ldots, r_{ii}$ such that

$$\mathbf a_i = r_{1i}\mathbf q_1 + r_{2i}\mathbf q_2 + \cdots + r_{ii}\mathbf q_i \quad\text{for } i = 1, \ldots, n$$

That is,

$$\begin{aligned}
\mathbf a_1 &= r_{11}\mathbf q_1\\
\mathbf a_2 &= r_{12}\mathbf q_1 + r_{22}\mathbf q_2\\
&\;\;\vdots\\
\mathbf a_n &= r_{1n}\mathbf q_1 + r_{2n}\mathbf q_2 + \cdots + r_{nn}\mathbf q_n
\end{aligned}$$

which can be written in matrix form as

$$A = [\mathbf a_1 \; \mathbf a_2 \; \cdots \; \mathbf a_n] = [\mathbf q_1 \; \mathbf q_2 \; \cdots \; \mathbf q_n]\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n}\\ 0 & r_{22} & \cdots & r_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & r_{nn}\end{bmatrix} = QR$$

Clearly, the matrix $Q$ has orthonormal columns. It is also the case that the diagonal entries of $R$ are all nonzero.
To see this, observe that if $r_{ii} = 0$, then $\mathbf a_i$ is a linear combination of $\mathbf q_1, \ldots, \mathbf q_{i-1}$ and, hence, is in $W_{i-1}$. But then $\mathbf a_i$ would be a linear combination of $\mathbf a_1, \ldots, \mathbf a_{i-1}$, which is impossible, since $\mathbf a_1, \ldots, \mathbf a_i$ are linearly independent. We conclude that $r_{ii} \ne 0$ for $i = 1, \ldots, n$. Since $R$ is upper triangular, it follows that it must be invertible. (See Exercise 23.)

We have proved the following theorem.

Theorem 5.16  The QR Factorization
Let $A$ be an $m \times n$ matrix with linearly independent columns. Then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an invertible upper triangular matrix.

Remarks
• We can also arrange for the diagonal entries of $R$ to be positive. If any $r_{ii} < 0$, simply replace $\mathbf q_i$ by $-\mathbf q_i$ and the $i$th row of $R$ (which contains $r_{ii}$) by its negative; the product $QR$ is unchanged.
• The requirement that $A$ have linearly independent columns is a necessary one. To prove this, suppose that $A$ is an $m \times n$ matrix that has a QR factorization, as in Theorem 5.16. Then, since $R$ is invertible, we have $Q = AR^{-1}$. Hence, $\operatorname{rank}(Q) = \operatorname{rank}(A)$, by Exercise 61 in Section 3.5. But $\operatorname{rank}(Q) = n$, since its columns are orthonormal and, therefore, linearly independent. So $\operatorname{rank}(A) = n$ too, and consequently the columns of $A$ are linearly independent, by the Fundamental Theorem.
• The QR factorization can be extended to arbitrary matrices in a slightly modified form. If $A$ is $m \times n$, it is possible to find a sequence of orthogonal matrices $Q_1, Q_2, \ldots, Q_{m-1}$ such that $Q_{m-1}\cdots Q_2Q_1A$ is an upper triangular $m \times n$ matrix $R$. Then $A = QR$, where $Q = (Q_{m-1}\cdots Q_2Q_1)^{-1}$ is an orthogonal matrix. We will examine this approach in Exploration: The Modified QR Factorization.

Example 5.15  Find a QR factorization of

$$A = \begin{bmatrix}1 & 2 & 2\\ -1 & 1 & 2\\ -1 & 0 & 1\\ 1 & 1 & 2\end{bmatrix}$$

Solution  The columns of $A$ are just the vectors from Example 5.13.
The orthonormal basis for $\operatorname{col}(A)$ produced by the Gram-Schmidt Process was

$$\mathbf q_1 = \begin{bmatrix}1/2\\-1/2\\-1/2\\1/2\end{bmatrix}, \quad \mathbf q_2 = \begin{bmatrix}3\sqrt5/10\\3\sqrt5/10\\\sqrt5/10\\\sqrt5/10\end{bmatrix}, \quad \mathbf q_3 = \begin{bmatrix}-\sqrt6/6\\0\\\sqrt6/6\\\sqrt6/3\end{bmatrix}$$

so

$$Q = [\mathbf q_1 \; \mathbf q_2 \; \mathbf q_3] = \begin{bmatrix}1/2 & 3\sqrt5/10 & -\sqrt6/6\\ -1/2 & 3\sqrt5/10 & 0\\ -1/2 & \sqrt5/10 & \sqrt6/6\\ 1/2 & \sqrt5/10 & \sqrt6/3\end{bmatrix}$$

From Theorem 5.16, $A = QR$ for some upper triangular matrix $R$. To find $R$, we use the fact that $Q$ has orthonormal columns and, hence, $Q^TQ = I$. Therefore,

$$Q^TA = Q^TQR = IR = R$$

We compute

$$R = Q^TA = \begin{bmatrix}1/2 & -1/2 & -1/2 & 1/2\\ 3\sqrt5/10 & 3\sqrt5/10 & \sqrt5/10 & \sqrt5/10\\ -\sqrt6/6 & 0 & \sqrt6/6 & \sqrt6/3\end{bmatrix}\begin{bmatrix}1 & 2 & 2\\ -1 & 1 & 2\\ -1 & 0 & 1\\ 1 & 1 & 2\end{bmatrix} = \begin{bmatrix}2 & 1 & 1/2\\ 0 & \sqrt5 & 3\sqrt5/2\\ 0 & 0 & \sqrt6/2\end{bmatrix}$$

Exercises 5.3

In Exercises 1–4, the given vectors form a basis for $\mathbb R^2$ or $\mathbb R^3$. Apply the Gram-Schmidt Process to obtain an orthogonal basis. Then normalize this basis to obtain an orthonormal basis.
(The vectors for Exercises 1–4 are illegible in this copy.)

In Exercises 5 and 6, the given vectors form a basis for a subspace $W$ of $\mathbb R^3$ or $\mathbb R^4$. Apply the Gram-Schmidt Process to obtain an orthogonal basis for $W$.
(The vectors for Exercises 5 and 6 are illegible in this copy.)

In Exercises 7 and 8, find the orthogonal decomposition of $\mathbf v$ with respect to the subspace $W$.
7. [data illegible]
8. $\mathbf v$ = [illegible], $W$ as in Exercise 6

Use the Gram-Schmidt Process to find an orthogonal basis for the column spaces of the matrices in Exercises 9 and 10.
(The matrices for Exercises 9 and 10 are illegible in this copy.)

In Exercises 13 and 14, fill in the missing entries of $Q$ to make $Q$ an orthogonal matrix.
13. $Q = \begin{bmatrix}1/\sqrt2 & 1/\sqrt3 & *\\ 0 & 1/\sqrt3 & *\\ -1/\sqrt2 & 1/\sqrt3 & *\end{bmatrix}$
14. $Q = \begin{bmatrix}1/2 & 2/\sqrt{14} & * & *\\ 1/2 & 1/\sqrt{14} & * & *\\ 1/2 & 0 & * & *\\ 1/2 & -3/\sqrt{14} & * & *\end{bmatrix}$

In Exercises 15 and 16, find a QR factorization of the matrix in the given exercise.
15. Exercise 9
16. Exercise 10

In Exercises 17 and 18, the columns of $Q$ were obtained by applying the Gram-Schmidt Process to the columns of $A$. Find the upper triangular matrix $R$ such that $A = QR$.
(The matrices for Exercises 17 and 18 are illegible in this copy.)
11. Find an orthogonal basis for $\mathbb R^3$ that contains the vector [illegible].
12. Find an orthogonal basis for $\mathbb R^4$ that contains the vectors [illegible].
19. If $A$ is an orthogonal matrix, find a QR factorization of $A$.
20. Prove that $A$ is invertible if and only if $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular with nonzero entries on its diagonal.

In Exercises 21 and 22, use the method suggested by Exercise 20 to compute $A^{-1}$ for the matrix $A$ in the given exercise.
21. Exercise 9
22. Exercise 15

23. Let $A$ be an $m \times n$ matrix with linearly independent columns. Give an alternative proof that the upper triangular matrix $R$ in a QR factorization of $A$ must be invertible, using property (c) of the Fundamental Theorem.
24. Let $A$ be an $m \times n$ matrix with linearly independent columns and let $A = QR$ be a QR factorization of $A$. Show that $A$ and $Q$ have the same column space.

Exploration: The Modified QR Factorization

When the matrix $A$ does not have linearly independent columns, the Gram-Schmidt Process as we have stated it does not work and so cannot be used to develop a generalized QR factorization of $A$. There is a modification of the Gram-Schmidt Process that can be used, but instead we will explore a method that converts $A$ into upper triangular form one column at a time, using a sequence of orthogonal matrices. The method is analogous to that of LU factorization, in which the matrix $L$ is formed using a sequence of elementary matrices.

The first thing we need is the "orthogonal analogue" of an elementary matrix; that is, we need to know how to construct an orthogonal matrix $Q$ that will transform a given column of $A$ (call it $\mathbf x$) into the corresponding column of $R$ (call it $\mathbf y$). By Theorem 5.6, it will be necessary that $\|\mathbf x\| = \|Q\mathbf x\| = \|\mathbf y\|$.

[Figure 5.11: Reflecting $\mathbf x$ onto $\mathbf y$ in a line perpendicular to $\mathbf x - \mathbf y$]

Figure 5.11 suggests a way to proceed: We can reflect $\mathbf x$ in a line perpendicular to $\mathbf x - \mathbf y$. If

$$\mathbf u = \left(\frac{1}{\|\mathbf x - \mathbf y\|}\right)(\mathbf x - \mathbf y) = \begin{bmatrix}d_1\\d_2\end{bmatrix}$$
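is the unit vector in the direction of $\mathbf x - \mathbf y$, then $\mathbf u^\perp = \begin{bmatrix}-d_2\\d_1\end{bmatrix}$ is orthogonal to $\mathbf u$, and we can use Exercise 26 in Section 3.6 to find the standard matrix $Q$ of the reflection in the line through the origin in the direction of $\mathbf u^\perp$.

1. Show that $Q = \begin{bmatrix}1 - 2d_1^2 & -2d_1d_2\\ -2d_1d_2 & 1 - 2d_2^2\end{bmatrix} = I - 2\mathbf u\mathbf u^T$.
2. Compute $Q$ for (a) $\mathbf u$ = [illegible]  (b) $\mathbf x$ = [illegible], $\mathbf y$ = [illegible]

We can generalize the definition of $Q$ as follows. If $\mathbf u$ is any unit vector in $\mathbb R^n$, we define an $n \times n$ matrix $Q$ as

$$Q = I - 2\mathbf u\mathbf u^T$$

Such a matrix is called a Householder matrix (or an elementary reflector).

3. Prove that every Householder matrix $Q$ satisfies the following properties:
(a) $Q$ is symmetric.
(b) $Q$ is orthogonal.
(c) $Q^2 = I$
4. Prove that if $Q$ is a Householder matrix corresponding to the unit vector $\mathbf u$, then
$$Q\mathbf v = \begin{cases} -\mathbf v & \text{if } \mathbf v \text{ is in } \operatorname{span}(\mathbf u)\\ \mathbf v & \text{if } \mathbf v\cdot\mathbf u = 0\end{cases}$$

Alston Householder (1904-1993) was one of the pioneers in the field of numerical linear algebra. He was the first to present a systematic treatment of algorithms for solving problems involving linear systems. In addition to introducing the widely used Householder transformations that bear his name, he was one of the first to advocate the systematic use of norms in linear algebra. His 1964 book The Theory of Matrices in Numerical Analysis is considered a classic.

5. Compute $Q$ for $\mathbf u$ = [illegible] and verify Problems 3 and 4.
6. Let $\mathbf x \ne \mathbf y$ with $\|\mathbf x\| = \|\mathbf y\|$ and set $\mathbf u = (1/\|\mathbf x - \mathbf y\|)(\mathbf x - \mathbf y)$. Prove that the corresponding Householder matrix $Q$ satisfies $Q\mathbf x = \mathbf y$. [Hint: Apply Exercise 57 in Section 1.2 to the result in Problem 4.]
7. Find $Q$ and verify Problem 6 for $\mathbf x$ = [illegible] and $\mathbf y$ = [illegible].

We are now ready to perform the triangularization of an $m \times n$ matrix $A$, column by column.

8. Let $\mathbf x$ be the first column of $A$ and let
$$\mathbf y = \begin{bmatrix}\|\mathbf x\|\\0\\\vdots\\0\end{bmatrix}$$
Show that if $Q_1$ is the Householder matrix given by Problem 6, then $Q_1A$ is a matrix with the block form
$$Q_1A = \begin{bmatrix} * & *\\ \mathbf 0 & A_1\end{bmatrix}$$
where $A_1$ is $(m-1)\times(n-1)$.

If we repeat Problem 8 on the matrix $A_1$, we use a Householder matrix $P_2$ such that

$$P_2A_1 = \begin{bmatrix} * & *\\ \mathbf 0 & A_2\end{bmatrix}$$

where $A_2$ is $(m-2)\times(n-2)$.

9. Set $Q_2 = \begin{bmatrix}1 & \mathbf 0\\ \mathbf 0 & P_2\end{bmatrix}$. Show that $Q_2$ is an orthogonal matrix and that
$$Q_2Q_1A = \begin{bmatrix} * & * & *\\ 0 & * & *\\ \mathbf 0 & \mathbf 0 & A_2\end{bmatrix}$$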
10. Show that we can continue in this fashion to find a sequence of orthogonal matrices $Q_1, \ldots, Q_{m-1}$ such that $Q_{m-1}\cdots Q_2Q_1A = R$ is an upper triangular $m \times n$ matrix (i.e., $r_{ij} = 0$ if $i > j$).
11. Deduce that $A = QR$ with $Q = Q_1Q_2\cdots Q_{m-1}$ orthogonal.
12. Use the method of this exploration to find a QR factorization of [matrix illegible].

Exploration: Approximating Eigenvalues with the QR Algorithm

One of the best (and most widely used) methods for numerically approximating the eigenvalues of a matrix makes use of the QR factorization. The purpose of this exploration is to introduce this method, the QR algorithm, and to show it at work in a few examples. For a more complete treatment of this topic, consult any good text on numerical linear algebra. (You will find it helpful to use a CAS to perform the calculations in the problems below.)

See G. H. Golub and C. F. Van Loan, Matrix Computations (Baltimore: Johns Hopkins University Press, 1983).

Given a square matrix $A$, the first step is to factor it as $A = QR$ (using whichever method is appropriate). Then we define $A_1 = RQ$.

1. First prove that $A_1$ is similar to $A$. Then prove that $A_1$ has the same eigenvalues as $A$.
2. If $A$ = [matrix illegible], find $A_1$ and verify that it has the same eigenvalues as $A$.

Continuing the algorithm, we factor $A_1$ as $A_1 = Q_1R_1$ and set $A_2 = R_1Q_1$. Then we factor $A_2 = Q_2R_2$ and set $A_3 = R_2Q_2$, and so on. That is, for $k \ge 1$, we compute $A_k = Q_kR_k$ and then set $A_{k+1} = R_kQ_k$.

3. Prove that $A_k$ is similar to $A$ for all $k \ge 1$.
4. Continuing Problem 2, compute $A_2$, $A_3$, $A_4$, and $A_5$, using two-decimal-place accuracy. What do you notice?

It can be shown that if the eigenvalues of $A$ are all real and have distinct absolute values, then the matrices $A_k$ approach an upper triangular matrix $U$.

5. What will be true of the diagonal entries of this matrix $U$?
6. Approximate the eigenvalues of the following matrices by applying the QR algorithm.
Use two-decimal-place accuracy and perform at least five iterations.
(The matrices for parts (a)–(d) are illegible in this copy.)
7. Apply the QR algorithm to the matrix $A$ = [matrix illegible]. What happens? Why?
8. Shift the eigenvalues of the matrix in Problem 7 by replacing $A$ with $B = A + 0.9I$. Apply the QR algorithm to $B$ and then shift back by subtracting 0.9 from the (approximate) eigenvalues of $B$. Verify that this method approximates the eigenvalues of $A$.
9. Let $Q_0 = Q$ and $R_0 = R$. First show that
$$Q_0Q_1\cdots Q_{k-1}A_k = AQ_0Q_1\cdots Q_{k-1}$$
for all $k \ge 1$. Then show that
$$(Q_0Q_1\cdots Q_k)(R_k\cdots R_1R_0) = A(Q_0Q_1\cdots Q_{k-1})(R_{k-1}\cdots R_1R_0)$$
[Hint: Repeatedly use the same approach used for the first equation, working from the "inside out."] Finally, deduce that $(Q_0Q_1\cdots Q_k)(R_k\cdots R_1R_0)$ is the QR factorization of $A^{k+1}$.

5.4 Orthogonal Diagonalization of Symmetric Matrices

We saw in Chapter 4 that a square matrix with real entries will not necessarily have real eigenvalues. Indeed, the matrix $\begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}$ has complex eigenvalues $i$ and $-i$. We also discovered that not all square matrices are diagonalizable. The situation changes dramatically if we restrict our attention to real symmetric matrices. As we will show in this section, all of the eigenvalues of a real symmetric matrix are real, and such a matrix is always diagonalizable.

Recall that a symmetric matrix is one that equals its own transpose. Let's begin by studying the diagonalization process for a symmetric $2 \times 2$ matrix.

Example 5.16  If possible, diagonalize the matrix

$$A = \begin{bmatrix}1 & 2\\ 2 & -2\end{bmatrix}$$

Solution  The characteristic polynomial of $A$ is $\lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2)$, from which we see that $A$ has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$. Solving for the corresponding eigenvectors, we find

$$\mathbf v_1 = \begin{bmatrix}1\\-2\end{bmatrix} \quad\text{and}\quad \mathbf v_2 = \begin{bmatrix}2\\1\end{bmatrix}$$

respectively. So $A$ is diagonalizable, and if we set $P = [\mathbf v_1 \; \mathbf v_2]$, then we know that $P^{-1}AP = \begin{bmatrix}-3 & 0\\ 0 & 2\end{bmatrix} = D$.

However, we can do better. Observe that $\mathbf v_1$ and $\mathbf v_2$ are orthogonal.
So, if we normalize them to get the unit eigenvectors

$$\mathbf u_1 = \begin{bmatrix}1/\sqrt5\\-2/\sqrt5\end{bmatrix} \quad\text{and}\quad \mathbf u_2 = \begin{bmatrix}2/\sqrt5\\1/\sqrt5\end{bmatrix}$$

and then take

$$Q = [\mathbf u_1 \; \mathbf u_2] = \begin{bmatrix}1/\sqrt5 & 2/\sqrt5\\ -2/\sqrt5 & 1/\sqrt5\end{bmatrix}$$

we have $Q^{-1}AQ = D$ also. But now $Q$ is an orthogonal matrix, since $\{\mathbf u_1, \mathbf u_2\}$ is an orthonormal set of vectors. Therefore, $Q^{-1} = Q^T$, and we have $Q^TAQ = D$. (Note that checking is easy, since computing $Q^{-1}$ only involves taking a transpose!)

The situation in Example 5.16 is the one that interests us. It is important enough to warrant a new definition.

Definition  A square matrix $A$ is orthogonally diagonalizable if there exists an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $Q^TAQ = D$.

We are interested in finding conditions under which a matrix is orthogonally diagonalizable. Theorem 5.17 shows us where to look.

Theorem 5.17  If $A$ is orthogonally diagonalizable, then $A$ is symmetric.

Proof  If $A$ is orthogonally diagonalizable, then there exists an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $Q^TAQ = D$. Since $Q^{-1} = Q^T$, we have $Q^TQ = I = QQ^T$, so

$$QDQ^T = QQ^TAQQ^T = IAI = A$$

But then

$$A^T = (QDQ^T)^T = (Q^T)^TD^TQ^T = QDQ^T = A$$

since every diagonal matrix is symmetric. Hence, $A$ is symmetric.

Remark  Theorem 5.17 shows that the orthogonally diagonalizable matrices are all to be found among the symmetric matrices. It does not say that every symmetric matrix must be orthogonally diagonalizable. However, it is a remarkable fact that this indeed is true! Finding a proof for this amazing result will occupy us for much of the rest of this section.

We next prove that we don't need to worry about complex eigenvalues when working with symmetric matrices with real entries.

Theorem 5.18  If $A$ is a real symmetric matrix, then the eigenvalues of $A$ are real.

Recall that the complex conjugate of a complex number $z = a + bi$ is the number $\bar z = a - bi$ (see Appendix C). To show that $z$ is real, we need to show that $b = 0$.
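One way to do this is to show that $\bar z = z$, for then $bi = -bi$ (or $2bi = 0$), from which it follows that $b = 0$. We can also extend the notion of complex conjugate to vectors and matrices by, for example, defining $\bar A$ to be the matrix whose entries are the complex conjugates of the entries of $A$; that is, if $A = [a_{ij}]$, then $\bar A = [\bar a_{ij}]$. The rules for complex conjugation extend easily to matrices; in particular, we have $\overline{AB} = \bar A\,\bar B$ for compatible matrices $A$ and $B$.

Proof  Suppose that $\lambda$ is an eigenvalue of $A$ with corresponding eigenvector $\mathbf v$. Then $A\mathbf v = \lambda\mathbf v$, and, taking complex conjugates, we have $\bar A\bar{\mathbf v} = \bar\lambda\bar{\mathbf v}$. But then

$$A\bar{\mathbf v} = \bar A\bar{\mathbf v} = \bar\lambda\bar{\mathbf v}$$

since $A$ is real. Taking transposes and using the fact that $A$ is symmetric, we have

$$\bar{\mathbf v}^TA = \bar{\mathbf v}^TA^T = (A\bar{\mathbf v})^T = (\bar\lambda\bar{\mathbf v})^T = \bar\lambda\bar{\mathbf v}^T$$

Therefore,

$$\lambda(\bar{\mathbf v}^T\mathbf v) = \bar{\mathbf v}^T(\lambda\mathbf v) = \bar{\mathbf v}^T(A\mathbf v) = (\bar{\mathbf v}^TA)\mathbf v = (\bar\lambda\bar{\mathbf v}^T)\mathbf v = \bar\lambda(\bar{\mathbf v}^T\mathbf v)$$

or $(\lambda - \bar\lambda)(\bar{\mathbf v}^T\mathbf v) = 0$.

Now if $\mathbf v = \begin{bmatrix}a_1 + b_1i\\ \vdots\\ a_n + b_ni\end{bmatrix}$, then $\bar{\mathbf v} = \begin{bmatrix}a_1 - b_1i\\ \vdots\\ a_n - b_ni\end{bmatrix}$, so

$$\bar{\mathbf v}^T\mathbf v = (a_1^2 + b_1^2) + \cdots + (a_n^2 + b_n^2) \ne 0$$

since $\mathbf v \ne \mathbf 0$ (because it is an eigenvector). We conclude that $\lambda - \bar\lambda = 0$, or $\lambda = \bar\lambda$. Hence, $\lambda$ is real.

Theorem 4.20 showed that, for any square matrix, eigenvectors corresponding to distinct eigenvalues are linearly independent. For symmetric matrices, something stronger is true: Such eigenvectors are orthogonal.

Theorem 5.19  If $A$ is a symmetric matrix, then any two eigenvectors corresponding to distinct eigenvalues of $A$ are orthogonal.

Proof  Let $\mathbf v_1$ and $\mathbf v_2$ be eigenvectors corresponding to the distinct eigenvalues $\lambda_1 \ne \lambda_2$ so that $A\mathbf v_1 = \lambda_1\mathbf v_1$ and $A\mathbf v_2 = \lambda_2\mathbf v_2$. Using $A^T = A$ and the fact that $\mathbf x\cdot\mathbf y = \mathbf x^T\mathbf y$ for any two vectors $\mathbf x$ and $\mathbf y$ in $\mathbb R^n$, we have

$$\lambda_1(\mathbf v_1\cdot\mathbf v_2) = (\lambda_1\mathbf v_1)\cdot\mathbf v_2 = A\mathbf v_1\cdot\mathbf v_2 = (A\mathbf v_1)^T\mathbf v_2 = (\mathbf v_1^TA^T)\mathbf v_2 = (\mathbf v_1^TA)\mathbf v_2 = \mathbf v_1^T(A\mathbf v_2) = \mathbf v_1^T(\lambda_2\mathbf v_2) = \lambda_2(\mathbf v_1^T\mathbf v_2) = \lambda_2(\mathbf v_1\cdot\mathbf v_2)$$

Hence, $(\lambda_1 - \lambda_2)(\mathbf v_1\cdot\mathbf v_2) = 0$. But $\lambda_1 - \lambda_2 \ne 0$, so $\mathbf v_1\cdot\mathbf v_2 = 0$, as we wished to show.

Example 5.17  Verify the result of Theorem 5.19 for

$$A = \begin{bmatrix}2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2\end{bmatrix}$$

Solution  The characteristic polynomial of $A$ is $-\lambda^3 + 6\lambda^2 - 9\lambda + 4 = -(\lambda - 4)(\lambda - 1)^2$, from which it follows that the eigenvalues of $A$ are $\lambda_1 = 4$ and $\lambda_2 = 1$.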
The corresponding eigenspaces are

$$E_4 = \operatorname{span}\left(\begin{bmatrix}1\\1\\1\end{bmatrix}\right) \quad\text{and}\quad E_1 = \operatorname{span}\left(\begin{bmatrix}-1\\0\\1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix}\right)$$

(Check this.) We easily verify that

$$\begin{bmatrix}1\\1\\1\end{bmatrix}\cdot\begin{bmatrix}-1\\0\\1\end{bmatrix} = 0 \quad\text{and}\quad \begin{bmatrix}1\\1\\1\end{bmatrix}\cdot\begin{bmatrix}-1\\1\\0\end{bmatrix} = 0$$

from which it follows that every vector in $E_4$ is orthogonal to every vector in $E_1$. (Why?)

Remark  Note that $\begin{bmatrix}-1\\0\\1\end{bmatrix}\cdot\begin{bmatrix}-1\\1\\0\end{bmatrix} = 1$. Thus, eigenvectors corresponding to the same eigenvalue need not be orthogonal.

We can now prove the main result of this section. It is called the Spectral Theorem, since the set of eigenvalues of a matrix is sometimes called the spectrum of the matrix. (Technically, we should call Theorem 5.20 the Real Spectral Theorem, since there is a corresponding result for matrices with complex entries.)

Theorem 5.20  The Spectral Theorem
Let $A$ be an $n \times n$ real matrix. Then $A$ is symmetric if and only if it is orthogonally diagonalizable.

Proof  We have already proved the "if" part as Theorem 5.17. To prove the "only if" implication, we proceed by induction on $n$. For $n = 1$, there is nothing to do, since a $1 \times 1$ matrix is already in diagonal form. Now assume that every $k \times k$ real symmetric matrix with real eigenvalues is orthogonally diagonalizable. Let $n = k + 1$ and let $A$ be an $n \times n$ real symmetric matrix with real eigenvalues.

Let $\lambda_1$ be one of the eigenvalues of $A$ and let $\mathbf v_1$ be a corresponding eigenvector. Then $\mathbf v_1$ is a real vector (why?) and we can assume that $\mathbf v_1$ is a unit vector, since otherwise we can normalize it and we will still have an eigenvector corresponding to $\lambda_1$. Using the Gram-Schmidt Process, we can extend $\mathbf v_1$ to an orthonormal basis $\{\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n\}$ of $\mathbb R^n$. Now we form the matrix

$$Q_1 = [\mathbf v_1 \; \mathbf v_2 \; \cdots \; \mathbf v_n]$$

Spectrum is a Latin word meaning "image." When atoms vibrate, they emit light. And when light passes through a prism, it spreads out into a spectrum, a band of rainbow colors. Vibration frequencies correspond to the eigenvalues of a certain operator and are visible as bright lines in the spectrum of light that is emitted from a prism.
Thus, we can literally see the eigenvalues of the atom in its spectrum, and for this reason, it is appropriate that the word spectrum has come to be applied to the set of all eigenvalues of a matrix (or operator).

Then $Q_1$ is orthogonal, and

$$Q_1^TAQ_1 = \begin{bmatrix}\mathbf v_1^T\\ \mathbf v_2^T\\ \vdots\\ \mathbf v_n^T\end{bmatrix}A[\mathbf v_1 \; \mathbf v_2 \; \cdots \; \mathbf v_n] = \begin{bmatrix}\lambda_1 & *\\ \mathbf 0 & A_1\end{bmatrix} = B$$

since $\mathbf v_1^T(A\mathbf v_1) = \mathbf v_1^T(\lambda_1\mathbf v_1) = \lambda_1(\mathbf v_1\cdot\mathbf v_1) = \lambda_1$ and $\mathbf v_i^T(A\mathbf v_1) = \mathbf v_i^T(\lambda_1\mathbf v_1) = \lambda_1(\mathbf v_i\cdot\mathbf v_1) = 0$ for $i \ne 1$, because $\{\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n\}$ is an orthonormal set. But

$$B^T = (Q_1^TAQ_1)^T = Q_1^TA^T(Q_1^T)^T = Q_1^TAQ_1 = B$$

so $B$ is symmetric. Therefore, $B$ has the block form

$$B = \begin{bmatrix}\lambda_1 & \mathbf 0\\ \mathbf 0 & A_1\end{bmatrix}$$

and $A_1$ is symmetric. Furthermore, $B$ is similar to $A$ (why?), so the characteristic polynomial of $B$ is equal to the characteristic polynomial of $A$, by Theorem 4.22. By Exercise 39 in Section 4.3, the characteristic polynomial of $A_1$ divides the characteristic polynomial of $A$. It follows that the eigenvalues of $A_1$ are also eigenvalues of $A$ and, hence, are real.

In a lecture he delivered at the University of Göttingen in 1905, the German mathematician David Hilbert (1862-1943) considered linear operators acting on certain infinite-dimensional vector spaces. Out of this lecture arose the notion of a quadratic form in infinitely many variables, and it was in this context that Hilbert first used the term spectrum to mean a complete set of eigenvalues. The spaces in question are now called Hilbert spaces. Hilbert made major contributions to many areas of mathematics, among them integral equations, number theory, geometry, and the foundations of mathematics. In 1900, at the Second International Congress of Mathematicians in Paris, Hilbert gave an address entitled "The Problems of Mathematics." In it, he challenged mathematicians to solve 23 problems of fundamental importance during the coming century. Many of the problems have been solved (some were proved true, others false), and some may never be solved. Nevertheless, Hilbert's speech energized the mathematical community and is often regarded as the most influential speech ever given about mathematics.
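We also see that $A_1$ has real entries. (Why?) Thus, $A_1$ is a $k \times k$ real symmetric matrix with real eigenvalues, so the induction hypothesis applies to it. Hence, there is an orthogonal matrix $P_2$ such that $P_2^TA_1P_2$ is a diagonal matrix, say $D_1$. Now let

$$Q_2 = \begin{bmatrix}1 & O\\ O & P_2\end{bmatrix}$$

Then $Q_2$ is an orthogonal $(k+1)\times(k+1)$ matrix, and therefore so is $Q = Q_1Q_2$. Consequently,

$$Q^TAQ = (Q_1Q_2)^TA(Q_1Q_2) = (Q_2^TQ_1^T)A(Q_1Q_2) = Q_2^T(Q_1^TAQ_1)Q_2 = Q_2^TBQ_2$$

$$= \begin{bmatrix}1 & O\\ O & P_2^T\end{bmatrix}\begin{bmatrix}\lambda_1 & O\\ O & A_1\end{bmatrix}\begin{bmatrix}1 & O\\ O & P_2\end{bmatrix} = \begin{bmatrix}\lambda_1 & O\\ O & P_2^TA_1P_2\end{bmatrix} = \begin{bmatrix}\lambda_1 & O\\ O & D_1\end{bmatrix}$$

which is a diagonal matrix. This completes the induction step, and we conclude that, for all $n \geq 1$, an $n \times n$ real symmetric matrix with real eigenvalues is orthogonally diagonalizable.

Example 5.18
Orthogonally diagonalize the matrix

$$A = \begin{bmatrix}2&1&1\\1&2&1\\1&1&2\end{bmatrix}$$

Solution This is the matrix from Example 5.17. We have already found that the eigenspaces of $A$ are

$$E_4 = \operatorname{span}\left(\begin{bmatrix}1\\1\\1\end{bmatrix}\right) \quad\text{and}\quad E_1 = \operatorname{span}\left(\begin{bmatrix}-1\\0\\1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix}\right)$$

We need three orthonormal eigenvectors. First, we apply the Gram-Schmidt Process to

$$\mathbf{x}_1 = \begin{bmatrix}-1\\0\\1\end{bmatrix} \quad\text{and}\quad \mathbf{x}_2 = \begin{bmatrix}-1\\1\\0\end{bmatrix} \quad\text{to obtain}\quad \mathbf{v}_1 = \begin{bmatrix}-1\\0\\1\end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix}-\tfrac12\\1\\-\tfrac12\end{bmatrix}$$

The new vector $\mathbf{v}_2$, which has been constructed to be orthogonal to $\mathbf{v}_1$, is still in $E_1$ (why?) and so is orthogonal to $[1, 1, 1]^T$. Thus, we have three mutually orthogonal vectors, and all we need to do is normalize them and construct a matrix $Q$ with these vectors as its columns. We find that

$$Q = \begin{bmatrix}1/\sqrt3 & -1/\sqrt2 & -1/\sqrt6\\ 1/\sqrt3 & 0 & 2/\sqrt6\\ 1/\sqrt3 & 1/\sqrt2 & -1/\sqrt6\end{bmatrix}$$

and it is straightforward to verify that

$$Q^TAQ = \begin{bmatrix}4&0&0\\0&1&0\\0&0&1\end{bmatrix}$$

The Spectral Theorem allows us to write a real symmetric matrix $A$ in the form $A = QDQ^T$, where $Q$ is orthogonal and $D$ is diagonal. The diagonal entries of $D$ are just the eigenvalues of $A$, and if the columns of $Q$ are the orthonormal vectors $\mathbf{q}_1, \ldots, \mathbf{q}_n$, then, using the column-row representation of the product, we have

$$A = QDQ^T = [\,\mathbf{q}_1\ \cdots\ \mathbf{q}_n\,]\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix}\begin{bmatrix}\mathbf{q}_1^T\\ \vdots\\ \mathbf{q}_n^T\end{bmatrix} = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \lambda_2\mathbf{q}_2\mathbf{q}_2^T + \cdots + \lambda_n\mathbf{q}_n\mathbf{q}_n^T$$

This is called the spectral decomposition of $A$. Each of the terms $\lambda_i\mathbf{q}_i\mathbf{q}_i^T$ is a rank 1 matrix, by Exercise 62 in Section 3.5, and $\mathbf{q}_i\mathbf{q}_i^T$ is actually the matrix of the projection onto the subspace spanned by $\mathbf{q}_i$. (See Exercise 25.)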
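As a numerical cross-check of Example 5.18 and of the spectral decomposition, here is a minimal sketch that assumes NumPy is available (the text itself works by hand). `numpy.linalg.eigh` is meant for symmetric matrices and returns eigenvalues in ascending order with orthonormal eigenvectors as columns. Because $\lambda = 1$ is repeated, the basis it returns for $E_1$ may differ (in signs or in the choice of basis) from the $Q$ found above; any such orthonormal basis works equally well.

```python
import numpy as np

# Symmetric matrix from Examples 5.17 and 5.18
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# eigh assumes symmetric input: ascending eigenvalues, orthonormal eigenvector columns
eigenvalues, Q = np.linalg.eigh(A)
D = np.diag(eigenvalues)

# Q is orthogonal and orthogonally diagonalizes A: Q^T A Q = D
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(Q.T @ A @ Q, D)

# Spectral decomposition: A = sum over i of lambda_i * q_i q_i^T
terms = [lam * np.outer(q, q) for lam, q in zip(eigenvalues, Q.T)]
assert np.allclose(sum(terms), A)

# Each q_i q_i^T is a projection matrix: symmetric and idempotent
for q in Q.T:
    P = np.outer(q, q)
    assert np.allclose(P, P.T) and np.allclose(P @ P, P)

print(np.round(eigenvalues, 6))
```

The printed eigenvalues should be (up to rounding) 1, 1, and 4, matching the diagonal of $Q^TAQ$ above after reordering.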
Because each $\mathbf{q}_i\mathbf{q}_i^T$ is a projection matrix, the spectral decomposition is sometimes referred to as the projection form of the Spectral Theorem.

Example 5.19
Find the spectral decomposition of the matrix $A$ from Example 5.18.

Solution From Example 5.18, we have:

$$\lambda_1 = 4,\quad \lambda_2 = 1,\quad \lambda_3 = 1,\qquad \mathbf{q}_1 = \begin{bmatrix}1/\sqrt3\\1/\sqrt3\\1/\sqrt3\end{bmatrix},\quad \mathbf{q}_2 = \begin{bmatrix}-1/\sqrt2\\0\\1/\sqrt2\end{bmatrix},\quad \mathbf{q}_3 = \begin{bmatrix}-1/\sqrt6\\2/\sqrt6\\-1/\sqrt6\end{bmatrix}$$

Therefore,

$$\mathbf{q}_1\mathbf{q}_1^T = \begin{bmatrix}1/3&1/3&1/3\\1/3&1/3&1/3\\1/3&1/3&1/3\end{bmatrix},\quad \mathbf{q}_2\mathbf{q}_2^T = \begin{bmatrix}1/2&0&-1/2\\0&0&0\\-1/2&0&1/2\end{bmatrix},\quad \mathbf{q}_3\mathbf{q}_3^T = \begin{bmatrix}1/6&-1/3&1/6\\-1/3&2/3&-1/3\\1/6&-1/3&1/6\end{bmatrix}$$

so

$$A = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \lambda_2\mathbf{q}_2\mathbf{q}_2^T + \lambda_3\mathbf{q}_3\mathbf{q}_3^T = 4\begin{bmatrix}1/3&1/3&1/3\\1/3&1/3&1/3\\1/3&1/3&1/3\end{bmatrix} + \begin{bmatrix}1/2&0&-1/2\\0&0&0\\-1/2&0&1/2\end{bmatrix} + \begin{bmatrix}1/6&-1/3&1/6\\-1/3&2/3&-1/3\\1/6&-1/3&1/6\end{bmatrix}$$

which can be easily verified.

In this example, $\lambda_2 = \lambda_3$, so we could combine the last two terms $\lambda_2\mathbf{q}_2\mathbf{q}_2^T + \lambda_3\mathbf{q}_3\mathbf{q}_3^T$ to get

$$\begin{bmatrix}2/3&-1/3&-1/3\\-1/3&2/3&-1/3\\-1/3&-1/3&2/3\end{bmatrix}$$

The rank 2 matrix $\mathbf{q}_2\mathbf{q}_2^T + \mathbf{q}_3\mathbf{q}_3^T$ is the matrix of a projection onto the two-dimensional subspace (i.e., the plane) spanned by $\mathbf{q}_2$ and $\mathbf{q}_3$. (See Exercise 26.)

Observe that the spectral decomposition expresses a symmetric matrix $A$ explicitly in terms of its eigenvalues and eigenvectors. This gives us a way of constructing a matrix with given eigenvalues and (orthonormal) eigenvectors.

Example 5.20
Find a $2 \times 2$ matrix with eigenvalues $\lambda_1 = 3$ and $\lambda_2 = -2$ and corresponding eigenvectors

$$\mathbf{v}_1 = \begin{bmatrix}3\\4\end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix}-4\\3\end{bmatrix}$$

Solution We begin by normalizing the vectors to obtain an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2\}$, with

$$\mathbf{q}_1 = \begin{bmatrix}3/5\\4/5\end{bmatrix} \quad\text{and}\quad \mathbf{q}_2 = \begin{bmatrix}-4/5\\3/5\end{bmatrix}$$

Now, we compute the matrix $A$ whose spectral decomposition is

$$A = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \lambda_2\mathbf{q}_2\mathbf{q}_2^T = 3\begin{bmatrix}9/25&12/25\\12/25&16/25\end{bmatrix} - 2\begin{bmatrix}16/25&-12/25\\-12/25&9/25\end{bmatrix} = \begin{bmatrix}-1/5&12/5\\12/5&6/5\end{bmatrix}$$

It is easy to check that $A$ has the desired properties. (Do this.)

Exercises 5.4
Orthogonally diagonalize the matrices in Exercises 1-10 by finding an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $Q^TAQ = D$.
1. $A = [\ldots]$
2. $A = [\ldots]$
3. $A = [\ldots]$
4. $A = [\ldots]$
5. $A = [\ldots]$
6. $A = [\ldots]$
7. $A = [\ldots]$
8. $A = [\ldots]$
9. $A = [\ldots]$
10. $A = [\ldots]$
11. If $b \neq 0$, orthogonally diagonalize $A = \begin{bmatrix}a&b\\b&a\end{bmatrix}$.
12. If $b \neq 0$, orthogonally diagonalize $A = [\ldots]$.
13. Let $A$ and $B$ be orthogonally diagonalizable $n \times n$ matrices and let $c$ be a scalar. Use the Spectral Theorem to prove that the following matrices are orthogonally diagonalizable:
(a) $A + B$ (b) $cA$ (c) $A^2$
14. If $A$ is an invertible matrix that is orthogonally diagonalizable, show that $A^{-1}$ is orthogonally diagonalizable.
15. If $A$ and $B$ are orthogonally diagonalizable and $AB = BA$, show that $AB$ is orthogonally diagonalizable.
16. If $A$ is a symmetric matrix, show that every eigenvalue of $A$ is nonnegative if and only if $A = B^2$ for some symmetric matrix $B$.
In Exercises 17-20, find a spectral decomposition of the matrix in the given exercise.
17. Exercise 1
18. Exercise 2
19. Exercise 5
20. Exercise 8
In Exercises 21 and 22, find a symmetric $2 \times 2$ matrix with eigenvalues $\lambda_1$ and $\lambda_2$ and corresponding orthogonal eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$.
21. $\lambda_1 = -1$, $\lambda_2 = 2$, $\mathbf{v}_1 = [\ldots]$, $\mathbf{v}_2 = [\ldots]$
22. $\lambda_1 = 3$, $\lambda_2 = -3$, $\mathbf{v}_1 = [\ldots]$, $\mathbf{v}_2 = [\ldots]$
In Exercises 23 and 24, find a symmetric $3 \times 3$ matrix with eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ and corresponding orthogonal eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.
23. $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$, $\mathbf{v}_1 = [\ldots]$, $\mathbf{v}_2 = [\ldots]$, $\mathbf{v}_3 = [\ldots]$
25. Let $\mathbf{q}$ be a unit vector in $\mathbb{R}^n$ and let $W$ be the subspace spanned by $\mathbf{q}$. Show that the orthogonal projection of a vector $\mathbf{v}$ onto $W$ (as defined in Sections 1.2 and 5.2) is given by
$$\operatorname{proj}_W(\mathbf{v}) = (\mathbf{q}\mathbf{q}^T)\mathbf{v}$$
and that the matrix of this projection is thus $\mathbf{q}\mathbf{q}^T$. [Hint: Remember that, for $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, $\mathbf{x}\cdot\mathbf{y} = \mathbf{x}^T\mathbf{y}$.]
26. Let $\{\mathbf{q}_1, \ldots, \mathbf{q}_k\}$ be an orthonormal set of vectors in $\mathbb{R}^n$ and let $W$ be the subspace spanned by this set.
(a) Show that the matrix of the orthogonal projection onto $W$ is given by $P = \mathbf{q}_1\mathbf{q}_1^T + \cdots + \mathbf{q}_k\mathbf{q}_k^T$.
(b) Show that the projection matrix $P$ in part (a) is symmetric and satisfies $P^2 = P$.
(c) Let $Q = [\,\mathbf{q}_1\ \cdots\ \mathbf{q}_k\,]$ be the $n \times k$ matrix whose columns are the orthonormal basis vectors of $W$.
Show that $P = QQ^T$ and deduce that $\operatorname{rank}(P) = k$.
27. Let $A$ be an $n \times n$ real matrix, all of whose eigenvalues are real. Prove that there exist an orthogonal matrix $Q$ and an upper triangular matrix $T$ such that $Q^TAQ = T$. This very useful result is known as Schur's Triangularization Theorem. [Hint: Adapt the proof of the Spectral Theorem.]
28. Let $A$ be a nilpotent matrix (see Exercise 56 in Section 4.2). Prove that there is an orthogonal matrix $Q$ such that $Q^TAQ$ is upper triangular with zeros on its diagonal. [Hint: Use Exercise 27.]

Applications

Quadratic Forms
An expression of the form

$$ax^2 + by^2 + cxy$$

is called a quadratic form in $x$ and $y$. Similarly,

$$ax^2 + by^2 + cz^2 + dxy + exz + fyz$$

is a quadratic form in $x$, $y$, and $z$. In words, a quadratic form is a sum of terms, each of which has total degree two in the variables. Therefore, $5x^2 - 3y^2 + 2xy$ is a quadratic form, but $x^2 + y^2 + x$ is not.

We can represent quadratic forms using matrices as follows:

$$ax^2 + by^2 + cxy = [\,x\ \ y\,]\begin{bmatrix}a & c/2\\ c/2 & b\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

and

$$ax^2 + by^2 + cz^2 + dxy + exz + fyz = [\,x\ \ y\ \ z\,]\begin{bmatrix}a & d/2 & e/2\\ d/2 & b & f/2\\ e/2 & f/2 & c\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}$$

(Verify these.) Each has the form $\mathbf{x}^TA\mathbf{x}$, where the matrix $A$ is symmetric. This observation leads us to the following general definition.

Definition A quadratic form in $n$ variables is a function $f: \mathbb{R}^n \to \mathbb{R}$ of the form

$$f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$$

where $A$ is a symmetric $n \times n$ matrix and $\mathbf{x}$ is in $\mathbb{R}^n$. We refer to $A$ as the matrix associated with $f$.

Example 5.21
What is the quadratic form with associated matrix $A = \begin{bmatrix}2&-3\\-3&5\end{bmatrix}$?

Solution If $\mathbf{x} = \begin{bmatrix}x_1\\x_2\end{bmatrix}$, then

$$f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = [\,x_1\ \ x_2\,]\begin{bmatrix}2&-3\\-3&5\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = 2x_1^2 - 6x_1x_2 + 5x_2^2$$

Observe that the off-diagonal entries $a_{12} = a_{21} = -3$ of $A$ are combined to give the coefficient $-6$ of $x_1x_2$. This is true generally. We can expand a quadratic form in $n$ variables $\mathbf{x}^TA\mathbf{x}$ as follows:

$$\mathbf{x}^TA\mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + \sum_{i<j} 2a_{ij}x_ix_j$$

Thus, if $i \neq j$, the coefficient of $x_ix_j$ is $2a_{ij}$.
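The rule just derived (squared-term coefficients go on the diagonal, and each cross-product coefficient is split evenly between $a_{ij}$ and $a_{ji}$) can be sketched in a few lines. The helper names `quadratic_form_matrix` and `evaluate_form` are hypothetical, chosen for illustration, and NumPy is assumed to be available. The sketch rebuilds the matrix of Example 5.21 and checks the expansion formula above at one sample point.

```python
import numpy as np

def quadratic_form_matrix(n, square_coeffs, cross_coeffs):
    """Build the symmetric A with f(x) = x^T A x (hypothetical helper).

    square_coeffs: {i: coefficient of x_i^2}, placed at a_ii
    cross_coeffs:  {(i, j): coefficient of x_i x_j} with i < j,
                   split evenly between a_ij and a_ji
    Indices are 0-based.
    """
    A = np.zeros((n, n))
    for i, c in square_coeffs.items():
        A[i, i] = c
    for (i, j), c in cross_coeffs.items():
        A[i, j] = A[j, i] = c / 2
    return A

def evaluate_form(A, x):
    """Return x^T A x."""
    x = np.asarray(x, dtype=float)
    return x @ A @ x

# Example 5.21: f(x1, x2) = 2x1^2 - 6x1x2 + 5x2^2
A = quadratic_form_matrix(2, {0: 2, 1: 5}, {(0, 1): -6})
print(A)
print(evaluate_form(A, [1, 1]))   # 2 - 6 + 5 = 1.0

# Check: sum of a_ii x_i^2 plus sum over i < j of 2 a_ij x_i x_j equals x^T A x
x = np.array([0.5, -2.0])
n = len(x)
expanded = sum(A[i, i] * x[i] ** 2 for i in range(n)) + sum(
    2 * A[i, j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
assert np.isclose(expanded, evaluate_form(A, x))
```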
Example 5.22
Find the matrix associated with the quadratic form

$$f(x_1, x_2, x_3) = 2x_1^2 - x_2^2 + 5x_3^2 + 6x_1x_2 - 3x_1x_3$$

Solution The coefficients of the squared terms $x_i^2$ go on the diagonal as $a_{ii}$, and the coefficients of the cross-product terms $x_ix_j$ are split between $a_{ij}$ and $a_{ji}$. This gives

$$A = \begin{bmatrix}2 & 3 & -3/2\\ 3 & -1 & 0\\ -3/2 & 0 & 5\end{bmatrix}$$

so

$$f(x_1, x_2, x_3) = [\,x_1\ \ x_2\ \ x_3\,]\begin{bmatrix}2 & 3 & -3/2\\ 3 & -1 & 0\\ -3/2 & 0 & 5\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$$

as you can easily check.

In the case of a quadratic form $f(x, y)$ in two variables, the graph of $z = f(x, y)$ is a surface in $\mathbb{R}^3$. Some examples are shown in Figure 5.12.

Observe that the effect of holding $x$ or $y$ constant is to take a cross section of the graph parallel to the $yz$ or $xz$ planes, respectively. For the graphs in Figure 5.12, all of these cross sections are easy to identify. For example, in Figure 5.12(a), the cross sections we get by holding $x$ or $y$ constant are all parabolas opening upward, so $f(x, y) \geq 0$ for all values of $x$ and $y$. In Figure 5.12(c), holding $x$ constant gives parabolas opening downward and holding $y$ constant gives parabolas opening upward, producing a saddle point.

Figure 5.12 Graphs of quadratic forms $f(x, y)$: (a) $z = 2x^2 + 3y^2$; (b) $z = -2x^2 - 3y^2$; (c) $z = 2x^2 - 3y^2$; (d) $z = 2x^2$

What makes this type of analysis quite easy is the fact that these quadratic forms have no cross-product terms. The matrix associated with such a quadratic form is a diagonal matrix. For example,

$$2x^2 - 3y^2 = [\,x\ \ y\,]\begin{bmatrix}2&0\\0&-3\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

In general, the matrix of a quadratic form is a symmetric matrix, and we saw in Section 5.4 that such matrices can always be diagonalized. We will now use this fact to show that, for every quadratic form, we can eliminate the cross-product terms by means of a suitable change of variable.

Let $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ be a quadratic form in $n$ variables, with $A$ a symmetric $n \times n$ matrix.
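By the Spectral Theorem, there is an orthogonal matrix $Q$ that diagonalizes $A$; that is, $Q^TAQ = D$, where $D$ is a diagonal matrix displaying the eigenvalues of $A$. We now set

$$\mathbf{x} = Q\mathbf{y} \quad\text{or, equivalently,}\quad \mathbf{y} = Q^{-1}\mathbf{x} = Q^T\mathbf{x}$$

Substitution into the quadratic form yields

$$\mathbf{x}^TA\mathbf{x} = (Q\mathbf{y})^TA(Q\mathbf{y}) = \mathbf{y}^TQ^TAQ\mathbf{y} = \mathbf{y}^TD\mathbf{y}$$

which is a quadratic form without cross-product terms, since $D$ is diagonal. Furthermore, if the eigenvalues of $A$ are $\lambda_1, \ldots, \lambda_n$, then $Q$ can be chosen so that

$$D = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}$$

If $\mathbf{y} = [\,y_1\ \cdots\ y_n\,]^T$, then, with respect to these new variables, the quadratic form becomes

$$\mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2$$

This process is called diagonalizing a quadratic form. We have just proved the following theorem, known as the Principal Axes Theorem. (The reason for this name will become clear in the next subsection.)

Theorem 5.21 The Principal Axes Theorem
Every quadratic form can be diagonalized. Specifically, if $A$ is the $n \times n$ symmetric matrix associated with the quadratic form $\mathbf{x}^TA\mathbf{x}$ and if $Q$ is an orthogonal matrix such that $Q^TAQ = D$ is a diagonal matrix, then the change of variable $\mathbf{x} = Q\mathbf{y}$ transforms the quadratic form $\mathbf{x}^TA\mathbf{x}$ into the quadratic form $\mathbf{y}^TD\mathbf{y}$, which has no cross-product terms. If the eigenvalues of $A$ are $\lambda_1, \ldots, \lambda_n$ and $\mathbf{y} = [\,y_1\ \cdots\ y_n\,]^T$, then

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2$$

Example 5.23
Find a change of variable that transforms the quadratic form

$$f(x_1, x_2) = 5x_1^2 + 4x_1x_2 + 2x_2^2$$

into one with no cross-product terms.

Solution The matrix of $f$ is

$$A = \begin{bmatrix}5&2\\2&2\end{bmatrix}$$

with eigenvalues $\lambda_1 = 6$ and $\lambda_2 = 1$. (Check this.) Corresponding unit eigenvectors are

$$\mathbf{q}_1 = \begin{bmatrix}2/\sqrt5\\1/\sqrt5\end{bmatrix} \quad\text{and}\quad \mathbf{q}_2 = \begin{bmatrix}1/\sqrt5\\-2/\sqrt5\end{bmatrix}$$

If we set

$$Q = \begin{bmatrix}2/\sqrt5 & 1/\sqrt5\\ 1/\sqrt5 & -2/\sqrt5\end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix}6&0\\0&1\end{bmatrix}$$

then $Q^TAQ = D$. The change of variable $\mathbf{x} = Q\mathbf{y}$, where

$$\mathbf{x} = \begin{bmatrix}x_1\\x_2\end{bmatrix} \quad\text{and}\quad \mathbf{y} = \begin{bmatrix}y_1\\y_2\end{bmatrix}$$

converts $f$ into

$$f(\mathbf{y}) = f(y_1, y_2) = [\,y_1\ \ y_2\,]\begin{bmatrix}6&0\\0&1\end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix} = 6y_1^2 + y_2^2$$

The original quadratic form $\mathbf{x}^TA\mathbf{x}$ and the new one $\mathbf{y}^TD\mathbf{y}$ (referred to in the Principal Axes Theorem) are equal in the following sense.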
In Example 5.23, suppose we want to evaluate $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ at $\mathbf{x} = \begin{bmatrix}-1\\3\end{bmatrix}$. We have

$$f(-1, 3) = 5(-1)^2 + 4(-1)(3) + 2(3)^2 = 11$$

In terms of the new variables,

$$\mathbf{y} = \begin{bmatrix}y_1\\y_2\end{bmatrix} = Q^T\mathbf{x} = \begin{bmatrix}2/\sqrt5 & 1/\sqrt5\\ 1/\sqrt5 & -2/\sqrt5\end{bmatrix}\begin{bmatrix}-1\\3\end{bmatrix} = \begin{bmatrix}1/\sqrt5\\-7/\sqrt5\end{bmatrix}$$

so

$$f(y_1, y_2) = 6y_1^2 + y_2^2 = 6(1/\sqrt5)^2 + (-7/\sqrt5)^2 = 55/5 = 11$$

exactly as before.

The Principal Axes Theorem has some interesting and important consequences. We will consider two of these. The first relates to the possible values that a quadratic form can take on.

Definition A quadratic form $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ is classified as one of the following:
1. positive definite if $f(\mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$
2. positive semidefinite if $f(\mathbf{x}) \geq 0$ for all $\mathbf{x}$
3. negative definite if $f(\mathbf{x}) < 0$ for all $\mathbf{x} \neq \mathbf{0}$
4. negative semidefinite if $f(\mathbf{x}) \leq 0$ for all $\mathbf{x}$
5. indefinite if $f(\mathbf{x})$ takes on both positive and negative values

A symmetric matrix $A$ is called positive definite, positive semidefinite, negative definite, negative semidefinite, or indefinite if the associated quadratic form $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ has the corresponding property.

The quadratic forms in parts (a), (b), (c), and (d) of Figure 5.12 are positive definite, negative definite, indefinite, and positive semidefinite, respectively. The Principal Axes Theorem makes it easy to tell if a quadratic form has one of these properties.

Theorem 5.22
Let $A$ be an $n \times n$ symmetric matrix. The quadratic form $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ is
a. positive definite if and only if all of the eigenvalues of $A$ are positive.
b. positive semidefinite if and only if all of the eigenvalues of $A$ are nonnegative.
c. negative definite if and only if all of the eigenvalues of $A$ are negative.
d. negative semidefinite if and only if all of the eigenvalues of $A$ are nonpositive.
e. indefinite if and only if $A$ has both positive and negative eigenvalues.

You are asked to prove Theorem 5.22 in Exercise 27.
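Theorem 5.22 turns classification into a sign check on eigenvalues. The sketch below assumes NumPy and uses a hypothetical helper `classify_quadratic_form`; the tolerance `tol` for treating a computed eigenvalue as zero is an assumption that the exact-arithmetic theorem does not need, but floating-point input does. It reproduces the classifications of the four forms in Figure 5.12.

```python
import numpy as np

def classify_quadratic_form(A, tol=1e-10):
    """Classify f(x) = x^T A x by the signs of the eigenvalues of A (Theorem 5.22).

    Hypothetical helper: eigenvalues with |lambda| <= tol count as zero.
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric")
    lam = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    has_pos = bool(np.any(lam > tol))
    has_neg = bool(np.any(lam < -tol))
    has_zero = bool(np.any(np.abs(lam) <= tol))
    if has_pos and has_neg:
        return "indefinite"
    if has_pos:
        return "positive semidefinite" if has_zero else "positive definite"
    if has_neg:
        return "negative semidefinite" if has_zero else "negative definite"
    # Only the zero matrix reaches this point: f is identically 0,
    # which is both positive and negative semidefinite.
    return "positive and negative semidefinite"

# The four forms graphed in Figure 5.12
figure_5_12 = {
    "2x^2 + 3y^2": [[2, 0], [0, 3]],
    "-2x^2 - 3y^2": [[-2, 0], [0, -3]],
    "2x^2 - 3y^2": [[2, 0], [0, -3]],
    "2x^2": [[2, 0], [0, 0]],
}
for name, M in figure_5_12.items():
    print(f"{name}: {classify_quadratic_form(M)}")
```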
Example 5.24
Classify $f(x, y, z) = 3x^2 + 3y^2 + 3z^2 - 2xy - 2xz - 2yz$ as positive definite, negative definite, indefinite, or none of these.

Solution The matrix associated with $f$ is

$$A = \begin{bmatrix}3&-1&-1\\-1&3&-1\\-1&-1&3\end{bmatrix}$$

which has eigenvalues 1, 4, and 4. (Verify this.) Since all of these eigenvalues are positive, $f$ is a positive definite quadratic form.

If a quadratic form $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ is positive definite, then, since $f(\mathbf{0}) = 0$, the minimum value of $f(\mathbf{x})$ is 0 and it occurs at the origin. Similarly, a negative definite quadratic form has a maximum at the origin. Thus, Theorem 5.22 allows us to solve certain types of maxima/minima problems easily, without resorting to calculus. A type of problem that falls into this category is the constrained optimization problem.

It is often important to know the maximum or minimum values of a quadratic form subject to certain constraints. (Such problems arise not only in mathematics but also in statistics, physics, engineering, and economics.) We will be interested in finding the extreme values of $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ subject to the constraint that $\|\mathbf{x}\| = 1$.

In the case of a quadratic form in two variables, we can visualize what the problem means. The graph of $z = f(x, y)$ is a surface in $\mathbb{R}^3$, and the constraint $\|\mathbf{x}\| = 1$ restricts the point $(x, y)$ to the unit circle in the $xy$-plane. Thus, we are considering those points that lie simultaneously on the surface and on the unit cylinder perpendicular to the $xy$-plane. These points form a curve lying on the surface, and we want the highest and lowest points on this curve. Figure 5.13 shows this situation for the quadratic form and corresponding surface in Figure 5.12(c).

Figure 5.13 The intersection of $z = 2x^2 - 3y^2$ with the cylinder $x^2 + y^2 = 1$

In this case, the maximum and minimum values of $f(x, y) = 2x^2 - 3y^2$ (the highest and lowest points on the curve of intersection) are 2 and $-3$, respectively, which are just the eigenvalues of the associated matrix.
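The claim about $f(x, y) = 2x^2 - 3y^2$ on the unit circle can be checked by brute force: sample points $(\cos t, \sin t)$ and record the largest and smallest values of $f$. This is only an illustration under the assumption that a fine grid of angles is good enough; it is not the method of the text, which replaces sampling with an eigenvalue computation.

```python
import math

def f(x, y):
    # Quadratic form of Figure 5.12(c) and Figure 5.13
    return 2 * x**2 - 3 * y**2

# Sample n evenly spaced points (cos t, sin t) on the unit circle x^2 + y^2 = 1
n = 3600
values = [f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]

print(max(values), min(values))   # close to 2 and -3, the eigenvalues of diag(2, -3)
```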
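Theorem 5.23 shows that, subject to the constraint $\|\mathbf{x}\| = 1$, the maximum and minimum values of a quadratic form are always the largest and smallest eigenvalues of its matrix.

Theorem 5.23
Let $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ be a quadratic form with associated $n \times n$ symmetric matrix $A$. Let the eigenvalues of $A$ be $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then the following are true, subject to the constraint $\|\mathbf{x}\| = 1$:
a. $\lambda_1 \geq f(\mathbf{x}) \geq \lambda_n$
b. The maximum value of $f(\mathbf{x})$ is $\lambda_1$, and it occurs when $\mathbf{x}$ is a unit eigenvector corresponding to $\lambda_1$.
c. The minimum value of $f(\mathbf{x})$ is $\lambda_n$, and it occurs when $\mathbf{x}$ is a unit eigenvector corresponding to $\lambda_n$.

Proof As usual, we begin by orthogonally diagonalizing $A$. Accordingly, let $Q$ be an orthogonal matrix such that $Q^TAQ$ is the diagonal matrix

$$D = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}$$

Then, by the Principal Axes Theorem, the change of variable $\mathbf{x} = Q\mathbf{y}$ gives $\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y}$. Now note that $\mathbf{y} = Q^T\mathbf{x}$ implies that

$$\mathbf{y}^T\mathbf{y} = (Q^T\mathbf{x})^T(Q^T\mathbf{x}) = \mathbf{x}^T(Q^T)^TQ^T\mathbf{x} = \mathbf{x}^TQQ^T\mathbf{x} = \mathbf{x}^T\mathbf{x}$$

since $Q^T = Q^{-1}$. Hence, using $\mathbf{x}\cdot\mathbf{x} = \mathbf{x}^T\mathbf{x}$, we see that $\|\mathbf{y}\| = \sqrt{\mathbf{y}^T\mathbf{y}} = \sqrt{\mathbf{x}^T\mathbf{x}} = \|\mathbf{x}\| = 1$. Thus, if $\mathbf{x}$ is a unit vector, so is the corresponding $\mathbf{y}$, and the values of $\mathbf{x}^TA\mathbf{x}$ and $\mathbf{y}^TD\mathbf{y}$ are the same.

(a) To prove property (a), we observe that if $\mathbf{y} = [\,y_1\ \cdots\ y_n\,]^T$, then

$$f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$
$$\leq \lambda_1y_1^2 + \lambda_1y_2^2 + \cdots + \lambda_1y_n^2 = \lambda_1(y_1^2 + y_2^2 + \cdots + y_n^2) = \lambda_1\|\mathbf{y}\|^2 = \lambda_1$$

Thus, $f(\mathbf{x}) \leq \lambda_1$ for all $\mathbf{x}$ such that $\|\mathbf{x}\| = 1$. The proof that $f(\mathbf{x}) \geq \lambda_n$ is similar. (See Exercise 37.)

(b) If $\mathbf{q}_1$ is a unit eigenvector corresponding to $\lambda_1$, then $A\mathbf{q}_1 = \lambda_1\mathbf{q}_1$ and

$$f(\mathbf{q}_1) = \mathbf{q}_1^TA\mathbf{q}_1 = \mathbf{q}_1^T\lambda_1\mathbf{q}_1 = \lambda_1(\mathbf{q}_1^T\mathbf{q}_1) = \lambda_1$$

This shows that the quadratic form actually takes on the value $\lambda_1$, and so, by property (a), it is the maximum value of $f(\mathbf{x})$ and it occurs when $\mathbf{x} = \mathbf{q}_1$.

(c) You are asked to prove this property in Exercise 38.

Example 5.25
Find the maximum and minimum values of the quadratic form $f(x_1, x_2) = 5x_1^2 + 4x_1x_2 + 2x_2^2$ subject to the constraint $x_1^2 + x_2^2 = 1$, and determine values of $x_1$ and $x_2$ for which each of these occurs.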
Solution In Example 5.23, we found that $f$ has the associated eigenvalues $\lambda_1 = 6$ and $\lambda_2 = 1$, with corresponding unit eigenvectors

$$\mathbf{q}_1 = \begin{bmatrix}2/\sqrt5\\1/\sqrt5\end{bmatrix} \quad\text{and}\quad \mathbf{q}_2 = \begin{bmatrix}1/\sqrt5\\-2/\sqrt5\end{bmatrix}$$

Therefore, the maximum value of $f$ is 6 when $x_1 = 2/\sqrt5$ and $x_2 = 1/\sqrt5$. The minimum value of $f$ is 1 when $x_1 = 1/\sqrt5$ and $x_2 = -2/\sqrt5$. (Observe that these extreme values occur twice, in opposite directions, since $-\mathbf{q}_1$ and $-\mathbf{q}_2$ are also unit eigenvectors for $\lambda_1$ and $\lambda_2$, respectively.)

Graphing Quadratic Equations
The general form of a quadratic equation in two variables $x$ and $y$ is

$$ax^2 + by^2 + cxy + dx + ey + f = 0$$

where at least one of $a$, $b$, and $c$ is nonzero. The graphs of such quadratic equations are called conic sections (or conics), since they can be obtained by taking cross sections of a (double) cone (i.e., slicing it with a plane). The most important of the conic sections are the ellipses (with circles as a special case), hyperbolas, and parabolas. These are called the nondegenerate conics. Figure 5.14 shows how they arise. It is also possible for a cross section of a cone to result in a single point, a straight line, or a pair of lines. These are called degenerate conics. (See Exercises 59-64.)

The graph of a nondegenerate conic is said to be in standard position relative to the coordinate axes if its equation can be expressed in one of the forms in Figure 5.15.

Figure 5.14 The nondegenerate conics: ellipse, circle, parabola, hyperbola

Figure 5.15 Nondegenerate conics in standard position:
Ellipse or circle: $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$; $a, b > 0$ (cases $a > b$, $a = b$, $a < b$)
Hyperbola: $\dfrac{x^2}{a^2} - \dfrac{y^2}{b^2} = 1$ or $\dfrac{y^2}{b^2} - \dfrac{x^2}{a^2} = 1$; $a, b > 0$
Parabola: $y = ax^2$ ($a > 0$ or $a < 0$); $x = ay^2$ ($a > 0$ or $a < 0$)

Example 5.26
If possible, write each of the following quadratic equations in the form of a conic in standard position and identify the resulting graph.
(a) $4x^2 + 9y^2 = 36$ (b) $4x^2 - 9y^2 + 1 = 0$ (c) $4x^2 - 9y = 0$

Solution (a) The equation $4x^2 + 9y^2 = 36$ can be written in the form

$$\frac{x^2}{9} + \frac{y^2}{4} = 1$$

so its graph is an ellipse intersecting the $x$-axis at $(\pm3, 0)$ and the $y$-axis at $(0, \pm2)$.

(b) The equation $4x^2 - 9y^2 + 1 = 0$ can be written in the form

$$\frac{y^2}{1/9} - \frac{x^2}{1/4} = 1$$

so its graph is a hyperbola, opening up and down, intersecting the $y$-axis at $(0, \pm\tfrac13)$.

(c) The equation $4x^2 - 9y = 0$ can be written in the form

$$y = \frac49x^2$$

so its graph is a parabola opening upward.

If a quadratic equation contains too many terms to be written in one of the forms in Figure 5.15, then its graph is not in standard position. When there are additional terms but no $xy$ term, the graph of the conic has been translated out of standard position.

Example 5.27
Identify and graph the conic whose equation is

$$x^2 + 2y^2 - 6x + 8y + 9 = 0$$

Solution We begin by grouping the $x$ and $y$ terms separately to get

$$(x^2 - 6x) + (2y^2 + 8y) = -9 \quad\text{or}\quad (x^2 - 6x) + 2(y^2 + 4y) = -9$$

Next, we complete the squares on the two expressions in parentheses to obtain

$$(x^2 - 6x + 9) + 2(y^2 + 4y + 4) = -9 + 9 + 8 \quad\text{or}\quad (x - 3)^2 + 2(y + 2)^2 = 8$$

We now make the substitutions $x' = x - 3$ and $y' = y + 2$, turning the above equation into

$$(x')^2 + 2(y')^2 = 8 \quad\text{or}\quad \frac{(x')^2}{8} + \frac{(y')^2}{4} = 1$$

This is the equation of an ellipse in standard position in the $x'y'$ coordinate system, intersecting the $x'$-axis at $(\pm2\sqrt2, 0)$ and the $y'$-axis at $(0, \pm2)$. The origin in the $x'y'$ coordinate system is at $x = 3$, $y = -2$, so the ellipse has been translated out of standard position 3 units to the right and 2 units down. Its graph is shown in Figure 5.16.

Figure 5.16 A translated ellipse

If a quadratic equation contains a cross-product term, then it represents a conic that has been rotated.
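Returning to the translated ellipse of Example 5.27: the completing-the-squares step can be packaged as a small function. This is a sketch under the assumption that there is no $xy$ term and both squared-term coefficients are nonzero (so parabolas are excluded); the function name `complete_the_square` is hypothetical. Writing $ax^2 + dx = a(x - h)^2 - ah^2$ with $h = -d/(2a)$, and similarly for $y$, turns $ax^2 + by^2 + dx + ey + f = 0$ into $a(x - h)^2 + b(y - k)^2 = r$.

```python
def complete_the_square(a, b, d, e, f):
    """Rewrite a*x^2 + b*y^2 + d*x + e*y + f = 0 (no xy term, a and b nonzero)
    as a*(x - h)^2 + b*(y - k)^2 = r and return (h, k, r). Hypothetical helper."""
    if a == 0 or b == 0:
        raise ValueError("both squared terms are required (parabolas not handled)")
    h = -d / (2 * a)               # a*x^2 + d*x = a*(x - h)^2 - a*h^2
    k = -e / (2 * b)               # b*y^2 + e*y = b*(y - k)^2 - b*k^2
    r = a * h**2 + b * k**2 - f
    return h, k, r

# Example 5.27: x^2 + 2y^2 - 6x + 8y + 9 = 0
print(complete_the_square(1, 2, -6, 8, 9))   # (3.0, -2.0, 8.0)
```

The result $(h, k, r) = (3, -2, 8)$ matches $(x - 3)^2 + 2(y + 2)^2 = 8$, that is, the translation $x' = x - 3$, $y' = y + 2$ used in the example.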
Example 5.28
Identify and graph the conic whose equation is

$$5x^2 + 4xy + 2y^2 = 6$$

Solution The left-hand side of the equation is a quadratic form, so we can write it in matrix form as $\mathbf{x}^TA\mathbf{x} = 6$, where

$$A = \begin{bmatrix}5&2\\2&2\end{bmatrix}$$

In Example 5.23, we found that the eigenvalues of $A$ are 6 and 1, and a matrix $Q$ that orthogonally diagonalizes $A$ is

$$Q = \begin{bmatrix}2/\sqrt5 & 1/\sqrt5\\ 1/\sqrt5 & -2/\sqrt5\end{bmatrix}$$

Observe that $\det Q = -1$. In this example, we will interchange the columns of this matrix to make the determinant equal to $+1$. Then $Q$ will be the matrix of a rotation, by Exercise 28 in Section 5.1. It is always possible to rearrange the columns of an orthogonal matrix $Q$ to make its determinant equal to $+1$. (Why?) We set

$$Q = \begin{bmatrix}1/\sqrt5 & 2/\sqrt5\\ -2/\sqrt5 & 1/\sqrt5\end{bmatrix}$$

instead, so that

$$Q^TAQ = \begin{bmatrix}1&0\\0&6\end{bmatrix} = D$$

The change of variable $\mathbf{x} = Q\mathbf{x}'$ converts the given equation into the form $(\mathbf{x}')^TD\mathbf{x}' = 6$ by means of a rotation. If $\mathbf{x}' = \begin{bmatrix}x'\\y'\end{bmatrix}$, then this equation is just

$$(x')^2 + 6(y')^2 = 6 \quad\text{or}\quad \frac{(x')^2}{6} + (y')^2 = 1$$

which represents an ellipse in the $x'y'$ coordinate system.

To graph this ellipse, we need to know which vectors play the roles of $\mathbf{e}_1' = \begin{bmatrix}1\\0\end{bmatrix}$ and $\mathbf{e}_2' = \begin{bmatrix}0\\1\end{bmatrix}$ in the new coordinate system. (These two vectors locate the positions of the $x'$ and $y'$ axes.) But, from $\mathbf{x} = Q\mathbf{x}'$, we have

$$Q\mathbf{e}_1' = \begin{bmatrix}1/\sqrt5 & 2/\sqrt5\\ -2/\sqrt5 & 1/\sqrt5\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}1/\sqrt5\\-2/\sqrt5\end{bmatrix} \quad\text{and}\quad Q\mathbf{e}_2' = \begin{bmatrix}1/\sqrt5 & 2/\sqrt5\\ -2/\sqrt5 & 1/\sqrt5\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}2/\sqrt5\\1/\sqrt5\end{bmatrix}$$

These are just the columns $\mathbf{q}_1$ and $\mathbf{q}_2$ of $Q$, which are the eigenvectors of $A$! The fact that these are orthonormal vectors agrees perfectly with the fact that the change of variable is just a rotation. The graph is shown in Figure 5.17.

Figure 5.17 A rotated ellipse

You can now see why the Principal Axes Theorem is so named. If a real symmetric matrix $A$ arises as the coefficient matrix of a quadratic equation, the eigenvectors of $A$ give the directions of the principal axes of the corresponding graph.
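The rotation step of Example 5.28 can be sketched numerically, assuming NumPy is available. One difference from the text, stated plainly: instead of swapping the columns of $Q$ to make $\det Q = +1$, this sketch negates one column, which keeps each column a unit eigenvector and keeps `eigh`'s ascending eigenvalue order $[1, 6]$, the same $D$ as above. The semi-axis lengths $\sqrt{k/\lambda_i}$ assume $k$ and both eigenvalues are positive, as they are here.

```python
import numpy as np

# Example 5.28: 5x^2 + 4xy + 2y^2 = 6, i.e. x^T A x = k
A = np.array([[5.0, 2.0],
              [2.0, 2.0]])
k = 6.0

lam, Q = np.linalg.eigh(A)        # ascending eigenvalues, approximately [1, 6]

# Make det Q = +1 so that x = Q x' is a rotation (negate one column if needed)
if np.linalg.det(Q) < 0:
    Q[:, 1] = -Q[:, 1]

# In x'y' coordinates: lam[0]*(x')^2 + lam[1]*(y')^2 = k, an ellipse here.
# Semi-axis lengths along the x' and y' axes (the columns of Q):
semi_axes = np.sqrt(k / lam)
print(lam, semi_axes)             # approximately [1, 6] and [sqrt(6), 1]

# Sanity check: the end of the x' semi-axis lies on the original conic
p = Q @ np.array([semi_axes[0], 0.0])
assert np.isclose(p @ A @ p, k)
```

The semi-axes $\sqrt6$ and 1 agree with $(x')^2/6 + (y')^2 = 1$; the columns of `Q` may differ in sign from the text's $Q$, which only reverses the direction of an axis.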
It is possible for the graph of a conic to be both rotated and translated out of standard position, as illustrated in Example 5.29.

Example 5.29
Identify and graph the conic whose equation is

$$5x^2 + 4xy + 2y^2 - \frac{28}{\sqrt5}x - \frac{4}{\sqrt5}y + 4 = 0$$

Solution The strategy is to eliminate the cross-product term first. In matrix form, the equation is $\mathbf{x}^TA\mathbf{x} + B\mathbf{x} + 4 = 0$, where

$$A = \begin{bmatrix}5&2\\2&2\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}-\frac{28}{\sqrt5} & -\frac{4}{\sqrt5}\end{bmatrix}$$

The cross-product term comes from the quadratic form $\mathbf{x}^TA\mathbf{x}$, which we diagonalize as in Example 5.28 by setting $\mathbf{x} = Q\mathbf{x}'$, where

$$Q = \begin{bmatrix}1/\sqrt5 & 2/\sqrt5\\ -2/\sqrt5 & 1/\sqrt5\end{bmatrix}$$

Then, as in Example 5.28,

$$\mathbf{x}^TA\mathbf{x} = (\mathbf{x}')^TD\mathbf{x}' = (x')^2 + 6(y')^2$$

But now we also have

$$B\mathbf{x} = BQ\mathbf{x}' = \begin{bmatrix}-\frac{28}{\sqrt5} & -\frac{4}{\sqrt5}\end{bmatrix}\begin{bmatrix}1/\sqrt5 & 2/\sqrt5\\ -2/\sqrt5 & 1/\sqrt5\end{bmatrix}\begin{bmatrix}x'\\y'\end{bmatrix} = -4x' - 12y'$$

Thus, in terms of $x'$ and $y'$, the given equation becomes

$$(x')^2 + 6(y')^2 - 4x' - 12y' + 4 = 0$$

To bring the conic represented by this equation into standard position, we need to translate the $x'y'$ axes. We do so by completing the squares, as in Example 5.27. We have

$$\big((x')^2 - 4x' + 4\big) + 6\big((y')^2 - 2y' + 1\big) = -4 + 4 + 6 = 6 \quad\text{or}\quad (x' - 2)^2 + 6(y' - 1)^2 = 6$$

This gives us the translation equations

$$x'' = x' - 2 \quad\text{and}\quad y'' = y' - 1$$

In the $x''y''$ coordinate system, the equation is simply

$$(x'')^2 + 6(y'')^2 = 6$$

which is the equation of an ellipse (as in Example 5.28). We can sketch this ellipse by first rotating and then translating. The resulting graph is shown in Figure 5.18.

Figure 5.18

The general form of a quadratic equation in three variables $x$, $y$, and $z$ is

$$ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j = 0$$

where at least one of $a, b, \ldots, f$ is nonzero. The graph of such a quadratic equation is called a quadric surface (or quadric). Once again, to recognize a quadric we need to put it into standard position. Some quadrics in standard position are shown in Figure 5.19; others are obtained by permuting the variables.

Figure 5.19 Quadric surfaces:
Ellipsoid: $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} + \dfrac{z^2}{c^2} = 1$
Hyperboloid of one sheet: $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} - \dfrac{z^2}{c^2} = 1$
Hyperboloid of two sheets: $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} - \dfrac{z^2}{c^2} = -1$
Elliptic cone: $z^2 = \dfrac{x^2}{a^2} + \dfrac{y^2}{b^2}$
Elliptic paraboloid: $z = \dfrac{x^2}{a^2} + \dfrac{y^2}{b^2}$
Hyperbolic paraboloid: $z = \dfrac{x^2}{a^2} - \dfrac{y^2}{b^2}$

Example 5.30
Identify the quadric surface whose equation is

$$5x^2 + 11y^2 + 2z^2 + 16xy + 20xz - 4yz = 36$$

Solution The equation can be written in matrix form as $\mathbf{x}^TA\mathbf{x} = 36$, where

$$A = \begin{bmatrix}5&8&10\\8&11&-2\\10&-2&2\end{bmatrix}$$

We find the eigenvalues of $A$ to be 18, 9, and $-9$, with corresponding orthogonal eigenvectors

$$\begin{bmatrix}2\\2\\1\end{bmatrix},\quad \begin{bmatrix}1\\-2\\2\end{bmatrix},\quad \begin{bmatrix}2\\-1\\-2\end{bmatrix}$$

respectively. We normalize them to obtain

$$\mathbf{q}_1 = \begin{bmatrix}2/3\\2/3\\1/3\end{bmatrix},\quad \mathbf{q}_2 = \begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix},\quad \mathbf{q}_3 = \begin{bmatrix}2/3\\-1/3\\-2/3\end{bmatrix}$$

and form the orthogonal matrix

$$Q = [\,\mathbf{q}_1\ \mathbf{q}_2\ \mathbf{q}_3\,] = \begin{bmatrix}2/3 & 1/3 & 2/3\\ 2/3 & -2/3 & -1/3\\ 1/3 & 2/3 & -2/3\end{bmatrix}$$

Note that in order for $Q$ to be the matrix of a rotation, we require $\det Q = 1$, which is true in this case. (Otherwise, $\det Q = -1$, and swapping two columns changes the sign of the determinant.) Therefore,

$$Q^TAQ = D = \begin{bmatrix}18&0&0\\0&9&0\\0&0&-9\end{bmatrix}$$

and, with the change of variable $\mathbf{x} = Q\mathbf{x}'$, we get $\mathbf{x}^TA\mathbf{x} = (\mathbf{x}')^TD\mathbf{x}' = 36$, so

$$18(x')^2 + 9(y')^2 - 9(z')^2 = 36 \quad\text{or}\quad \frac{(x')^2}{2} + \frac{(y')^2}{4} - \frac{(z')^2}{4} = 1$$

From Figure 5.19, we recognize this equation as the equation of a hyperboloid of one sheet. The $x'$, $y'$, and $z'$ axes are in the directions of the eigenvectors $\mathbf{q}_1$, $\mathbf{q}_2$, and $\mathbf{q}_3$, respectively. The graph is shown in Figure 5.20.

Figure 5.20 A hyperboloid of one sheet in nonstandard position

We can also identify and graph quadrics that have been translated out of standard position using the "complete-the-squares method" of Examples 5.27 and 5.29. You will be asked to do so in the exercises.

Exercises 5.5
Quadratic Forms
In Exercises 1-6, evaluate the quadratic form $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ for the given $A$ and $\mathbf{x}$.
1. $A = [\ldots]$, $\mathbf{x} = [\ldots]$
2. $A = [\ldots]$, $\mathbf{x} = [\ldots]$
3. $A = [\ldots]$, $\mathbf{x} = [\ldots]$
4. $A = [\ldots]$, $\mathbf{x} = [\ldots]$
5. $A = [\ldots]$, $\mathbf{x} = [\ldots]$
6. $A = [\ldots]$, $\mathbf{x} = [\ldots]$
l [;] [ � � l [::] [ _ � ! l- [ ! ] u -} � [�] } � :] [ u [ : :Jx � m x= x= - 0 2 0 2 2 0 x= In Exercises 7-12, find the symmetric matrix A associated with the given quadratic form. 7. xl + 2xf + 6x 1 x2 8. X1 X2 9. 3x 2 - 3xy - y2 10. x � - x � + 8X 1 X2 - 6XzX3 1 1 . 5xl - xi + 2x� + 2x 1 x2 - 4x 1 x3 + 4x2x3 12. 2x 2 3y 2 + z 2 - 4xz - Diagonalize the quadratic forms in Exercises 13-18 by finding an orthogonal matrix Q such that the change of variable x = Qy transforms the given form into one with no cross-product terms. Give Q and the new quadratic form. 13. 2x l + 5x� 4x 1 x2 14. x 2 + 8xy + y 2 15. 7xl + x i + xj + 8X 1 X2 + 8X 1 X3 - 16XzX3 16. x l + x i + 3xj - 4x 1 x2 17. x 2 + z 2 - 2xy + 2yz 18. 2xy + 2xz + 2yz - 424 Chapter 5 Orthogonality Classify each of the quadratic forms in Exercises 1 9-26 as positive definite, positive semidefinite, negative definite, negative semidefinite, or indefinite. 20. x f + xi - 2x, x2 19. x f + 2x} 2 2 21. - 2x - 2y + 2xy 22. x 2 + y 2 + 4xy 23. 2x f + 2x i + 2x� + 2x1 x2 + 2x1 x3 + 2x2x3 24. x f + x i + x� + 2x1 x3 25. x i + x � - x � + 4x 1 x2 26. -x 2 - y 2 - z 2 - 2xy - 2xz - 2yz 27. Prove Theorem 5.22. 28. Let A = [ � �] be a symmetric 2 X 2 matrix. Prove that A is positive definite if and only if a > 0 and <let A > 0. [Hint: ax 2 + 2bxy + dy 2 = ( � y (d - �2 )y 2.] a x+ y + 29. Let B be an invertible matrix. Show that A = B TB is positive definite. 30. Let A be a positive definite symmetric matrix. Show that there exists an invertible matrix B such that A = B TB . [Hint: Use the Spectral Theorem to write A = QDQ T. Then show that D can be factored as c Tc for some invertible matrix C.] 31. Let A and B be positive definite symmetric n X n matrices and let c be a positive scalar. Show that the following matrices are positive definite. (a) cA (b) A 2 (c) A + B _ , (d) A (First show that A is necessarily invertible.) 32. Let A be a positive definite symmetric matrix. 
Show that there is a positive definite symmetric matrix $B$ such that $A = B^2$. (Such a matrix $B$ is called a square root of $A$.)
In Exercises 33-36, find the maximum and minimum values of the quadratic form $f(\mathbf{x})$ in the given exercise, subject to the constraint $\|\mathbf{x}\| = 1$, and determine the values of $\mathbf{x}$ for which these occur.
33. Exercise 20
34. Exercise 22
35. Exercise 23
36. Exercise 24
37. Finish the proof of Theorem 5.23(a).
38. Prove Theorem 5.23(c).

Graphing Quadratic Equations
In Exercises 39-44, identify the graph of the given equation.
39. $x^2 + 5y^2 = 25$
40. $x^2 - y^2 - 4 = 0$
41. $x^2 - y - 1 = 0$
42. $2x^2 + y^2 - 8 = 0$
43. $3x^2 = y^2 - 1$
44. $x = -2y^2$
In Exercises 45-50, use a translation of axes to put the conic in standard position. Identify the graph, give its equation in the translated coordinate system, and sketch the curve.
45. $x^2 + y^2 - 4x - 4y + 4 = 0$
46. $4x^2 + 2y^2 - 8x + 12y + 6 = 0$
47. $9x^2 - 4y^2 - 4y = 37$
48. $x^2 + 10x - 3y = -13$
49. $2y^2 + 4x + 8y = 0$
50. $2y^2 - 3x^2 - 18x - 20y + 11 = 0$
In Exercises 51-54, use a rotation of axes to put the conic in standard position. Identify the graph, give its equation in the rotated coordinate system, and sketch the curve.
51. $x^2 + xy + y^2 = 6$
52. $4x^2 + 10xy + 4y^2 = 9$
53. $4x^2 + 6xy - 4y^2 = 5$
54. $3x^2 - 2xy + 3y^2 = 8$
In Exercises 55-58, identify the conic with the given equation and give its equation in standard form.
55. $3x^2 - 4xy + 3y^2 - 28\sqrt2x + 22\sqrt2y + 84 = 0$
56. $6x^2 - 4xy + 9y^2 - 20x - 10y - 5 = 0$
57. $2xy + 2\sqrt2x - 1 = 0$
58. $x^2 - 2xy + y^2 + 4\sqrt2x - 4 = 0$
Sometimes the graph of a quadratic equation is a straight line, a pair of straight lines, or a single point. We refer to such a graph as a degenerate conic. It is also possible that the equation is not satisfied for any values of the variables, in which case there is no graph at all and we refer to the conic as an imaginary conic.
In Exercises 59-64, identify the conic with the given equation as either degenerate or imaginary and, where possible, sketch the graph.
59. $x^2 - y^2 = 0$
60. $x^2 + 2y^2 + 2 = 0$
61. $3x^2 + y^2 = 0$
62. $x^2 + 2xy + y^2 = 0$
63. $x^2 - 2xy + y^2 + 2\sqrt2x - 2\sqrt2y = 0$
64. $2x^2 + 2xy + 2y^2 + 2\sqrt2x - 2\sqrt2y + 6 = 0$
65. Let $A$ be a symmetric $2 \times 2$ matrix and let $k$ be a scalar. Prove that the graph of the quadratic equation $\mathbf{x}^TA\mathbf{x} = k$ is
(a) a hyperbola if $k \neq 0$ and $\det A < 0$
(b) an ellipse, circle, or imaginary conic if $k \neq 0$ and $\det A > 0$
(c) a pair of straight lines or an imaginary conic if $k \neq 0$ and $\det A = 0$
(d) a pair of straight lines or a single point if $k = 0$ and $\det A \neq 0$
(e) a straight line if $k = 0$ and $\det A = 0$
[Hint: Use the Principal Axes Theorem.]
In Exercises 66-73, identify the quadric with the given equation and give its equation in standard form.
66. $4x^2 + 4y^2 + 4z^2 + 4xy + 4xz + 4yz = 8$
67. $x^2 + y^2 + z^2 - 4yz = 1$
68. $-x^2 - y^2 - z^2 + 4xy + 4xz + 4yz = 12$
69. $2xy + z = 0$
70. $16x^2 + 100y^2 + 9z^2 - 24xz - 60x - 80z = 0$
71. $x^2 + y^2 - 2z^2 + 4xy - 2xz + 2yz - x + y + z = 0$
72. $10x^2 + 25y^2 + 10z^2 - 40xz + 20\sqrt2x + 50y + 20\sqrt2z = 15$
73. $11x^2 + 11y^2 + 14z^2 + 2xy + 8xz - 8yz - 12x + 12y + 12z = 6$
74. Let $A$ be a real $2 \times 2$ matrix with complex eigenvalues $\lambda = a \pm bi$ such that $b \neq 0$ and $|\lambda| = 1$. Prove that every trajectory of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ lies on an ellipse. [Hint: Theorem 4.43 shows that if $\mathbf{v}$ is an eigenvector corresponding to $\lambda = a - bi$, then the matrix $P = [\,\operatorname{Re}\mathbf{v}\ \ \operatorname{Im}\mathbf{v}\,]$ is invertible and $A = P\begin{bmatrix}a&-b\\b&a\end{bmatrix}P^{-1}$. Set $B = (PP^T)^{-1}$. Show that the quadratic $\mathbf{x}^TB\mathbf{x} = k$ defines an ellipse for all $k > 0$, and prove that if $\mathbf{x}$ lies on this ellipse, so does $A\mathbf{x}$.]
Chapter Review
Key Definitions and Concepts
fundamental subspaces of a matrix, 380
Gram-Schmidt Process, 389
orthogonal basis, 370
orthogonal complement of a subspace, 378
orthogonal matrix, 374
orthogonal projection, 382
orthogonal set of vectors, 369
Orthogonal Decomposition Theorem, 384
orthogonally diagonalizable matrix, 400
orthonormal basis, 372
orthonormal set of vectors, 372
properties of orthogonal matrices, 374-376
QR factorization, 393
Rank Theorem, 386
spectral decomposition, 405
Spectral Theorem, 403
Review Questions
1. Mark each of the following statements true or false:
(a) Every orthonormal set of vectors is linearly independent.
(b) Every nonzero subspace of $\mathbb{R}^n$ has an orthogonal basis.
(c) If $A$ is a square matrix with orthonormal rows, then $A$ is an orthogonal matrix.
(d) Every orthogonal matrix is invertible.
(e) If $A$ is a matrix with $\det A = 1$, then $A$ is an orthogonal matrix.
(f) If $A$ is an $m \times n$ matrix such that $(\operatorname{row}(A))^\perp = \mathbb{R}^n$, then $A$ must be the zero matrix.
(g) If $W$ is a subspace of $\mathbb{R}^n$ and $\mathbf{v}$ is a vector in $\mathbb{R}^n$ such that $\operatorname{proj}_W(\mathbf{v}) = \mathbf{0}$, then $\mathbf{v}$ must be the zero vector.
(h) If $A$ is a symmetric, orthogonal matrix, then $A^2 = I$.
(i) Every orthogonally diagonalizable matrix is invertible.
(j) Given any $n$ real numbers $\lambda_1, \ldots, \lambda_n$, there exists a symmetric $n \times n$ matrix with $\lambda_1, \ldots, \lambda_n$ as its eigenvalues.
2. Find all values of $a$ and $b$ such that [vectors illegible in source] is an orthogonal set of vectors.
3. Find the coordinate vector $[\mathbf{v}]_{\mathcal{B}}$ of $\mathbf{v} =$ [illegible in source] with respect to the orthogonal basis $\mathcal{B} =$ [illegible in source].
4. The coordinate vector of a vector $\mathbf{v}$ with respect to an orthonormal basis $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$ of $\mathbb{R}^2$ is $[\mathbf{v}]_{\mathcal{B}} = \begin{bmatrix} -3 \\ 1/2 \end{bmatrix}$. If $\mathbf{v}_1 = \begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix}$, find all possible vectors $\mathbf{v}$.
5. Show that [matrix illegible in source] is an orthogonal matrix.
6. If [matrix illegible in source] is an orthogonal matrix, find all possible values of $a$, $b$, and $c$.
7. If $Q$ is an orthogonal $n \times n$ matrix and $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is an orthonormal set in $\mathbb{R}^n$, prove that $\{Q\mathbf{v}_1, \ldots, Q\mathbf{v}_k\}$ is an orthonormal set.
8. If $Q$ is an $n \times n$ matrix such that the angles $\angle(Q\mathbf{x}, Q\mathbf{y})$ and $\angle(\mathbf{x}, \mathbf{y})$ are equal for all vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, prove that $Q$ is an orthogonal matrix.
In Questions 9-12, find a basis for $W^\perp$.
9. $W$ is the line in $\mathbb{R}^2$ with general equation $2x - 5y = 0$
10. $W$ is the line in $\mathbb{R}^3$ with parametric equations $x = t$, $y = 2t$, $z = -t$
13. Find bases for each of the four fundamental subspaces of $A =$ [matrix illegible in source].
14. Find the orthogonal decomposition of $\mathbf{v} =$ [illegible in source] with respect to [illegible in source].
15. (a) Apply the Gram-Schmidt Process to [vectors illegible in source] to find an orthogonal basis for $W = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$.
(b) Use the result of part (a) to find a QR factorization of $A =$ [matrix illegible in source].
16. Find an orthogonal basis for $\mathbb{R}^4$ that contains the vectors [illegible in source].
17. Find an orthogonal basis for the subspace $W = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} : x_1 + x_2 + x_3 + x_4 = 0\right\}$ of $\mathbb{R}^4$.
18. Let $A =$ [matrix illegible in source].
(a) Orthogonally diagonalize $A$.
(b) Give the spectral decomposition of $A$.
19. Find a symmetric matrix with eigenvalues $\lambda_1 = \lambda_2 = 1$, $\lambda_3 = -2$ and eigenspaces [illegible in source].
20. If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthonormal basis for $\mathbb{R}^n$ and $A = c_1\mathbf{v}_1\mathbf{v}_1^T + c_2\mathbf{v}_2\mathbf{v}_2^T + \cdots + c_n\mathbf{v}_n\mathbf{v}_n^T$, prove that $A$ is a symmetric matrix with eigenvalues $c_1, c_2, \ldots, c_n$ and corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.
Chapter 6
Vector Spaces
6.0 Introduction: Fibonacci in (Vector) Space
"Algebra is generous; she often gives more than is asked of her."
Jean le Rond d'Alembert (1717-1783), in Carl B. Boyer, A History of Mathematics, Wiley, 1968, p. 481
The Fibonacci sequence was introduced in Section 4.6. It is the sequence
$$0, 1, 1, 2, 3, 5, 8, 13, \ldots$$
of nonnegative integers with the property that after the first two terms, each term is the sum of the two terms preceding it. Thus $0 + 1 = 1$, $1 + 1 = 2$, $1 + 2 = 3$, $2 + 3 = 5$, and so on.
If we denote the terms of the Fibonacci sequence by $f_0, f_1, f_2, \ldots$, then the entire sequence is completely determined by specifying that
$$f_0 = 0, \quad f_1 = 1, \quad \text{and} \quad f_n = f_{n-1} + f_{n-2} \quad \text{for } n \ge 2$$
By analogy with vector notation, let's write a sequence $x_0, x_1, x_2, x_3, \ldots$ as
$$\mathbf{x} = [x_0, x_1, x_2, x_3, \ldots)$$
The Fibonacci sequence then becomes
$$\mathbf{f} = [f_0, f_1, f_2, f_3, \ldots) = [0, 1, 1, 2, \ldots)$$
We now generalize this notion.
Definition A Fibonacci-type sequence is any sequence $\mathbf{x} = [x_0, x_1, x_2, x_3, \ldots)$ such that $x_0$ and $x_1$ are real numbers and $x_n = x_{n-1} + x_{n-2}$ for $n \ge 2$.
For example, $[1, \sqrt{2}, 1 + \sqrt{2}, 1 + 2\sqrt{2}, 2 + 3\sqrt{2}, \ldots)$ is a Fibonacci-type sequence.
Problem 1 Write down the first five terms of three more Fibonacci-type sequences.
By analogy with vectors again, let's define the sum of two sequences $\mathbf{x} = [x_0, x_1, x_2, \ldots)$ and $\mathbf{y} = [y_0, y_1, y_2, \ldots)$ to be the sequence
$$\mathbf{x} + \mathbf{y} = [x_0 + y_0, x_1 + y_1, x_2 + y_2, \ldots)$$
If $c$ is a scalar, we can likewise define the scalar multiple of a sequence by
$$c\mathbf{x} = [cx_0, cx_1, cx_2, \ldots)$$
Problem 2 (a) Using your examples from Problem 1 or other examples, compute the sums of various pairs of Fibonacci-type sequences. Do the resulting sequences appear to be Fibonacci-type?
(b) Compute various scalar multiples of your Fibonacci-type sequences from Problem 1. Do the resulting sequences appear to be Fibonacci-type?
Problem 3 (a) Prove that if $\mathbf{x}$ and $\mathbf{y}$ are Fibonacci-type sequences, then so is $\mathbf{x} + \mathbf{y}$.
(b) Prove that if $\mathbf{x}$ is a Fibonacci-type sequence and $c$ is a scalar, then $c\mathbf{x}$ is also a Fibonacci-type sequence.
Let's denote the set of all Fibonacci-type sequences by Fib. Problem 3 shows that, like $\mathbb{R}^n$, Fib is closed under addition and scalar multiplication. The next exercises show that Fib has much more in common with $\mathbb{R}^n$.
Problem 4 Review the algebraic properties of vectors in Theorem 1.1. Does Fib satisfy all of these properties? What Fibonacci-type sequence plays the role of $\mathbf{0}$? For a Fibonacci-type sequence $\mathbf{x}$, what is $-\mathbf{x}$? Is $-\mathbf{x}$ also a Fibonacci-type sequence?
Problem 5 In $\mathbb{R}^n$, we have the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$. The Fibonacci sequence $\mathbf{f} = [0, 1, 1, 2, \ldots)$ can be thought of as the analogue of $\mathbf{e}_2$ because its first two terms are 0 and 1. What sequence $\mathbf{e}$ in Fib plays the role of $\mathbf{e}_1$? What about $\mathbf{e}_3, \mathbf{e}_4, \ldots$? Do these vectors have analogues in Fib?
Problem 6 Let $\mathbf{x} = [x_0, x_1, x_2, \ldots)$ be a Fibonacci-type sequence. Show that $\mathbf{x}$ is a linear combination of $\mathbf{e}$ and $\mathbf{f}$.
Problem 7 Show that $\mathbf{e}$ and $\mathbf{f}$ are linearly independent. (That is, show that if $c\mathbf{e} + d\mathbf{f} = \mathbf{0}$, then $c = d = 0$.)
Problem 8 Given your answers to Problems 6 and 7, what would be a sensible value to assign to the "dimension" of Fib? Why?
Problem 9 Are there any geometric sequences in Fib? That is, if $[1, r, r^2, r^3, \ldots)$ is a Fibonacci-type sequence, what are the possible values of $r$?
Problem 10 Find a "basis" for Fib consisting of geometric Fibonacci-type sequences.
Problem 11 Using your answer to Problem 10, give an alternative derivation of Binet's formula [formula (5) in Section 4.6]:
$$f_n = \frac{1}{\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^n - \frac{1}{\sqrt{5}}\left(\frac{1 - \sqrt{5}}{2}\right)^n$$
for the terms of the Fibonacci sequence $\mathbf{f} = [f_0, f_1, f_2, \ldots)$. [Hint: Express $\mathbf{f}$ in terms of the basis from Problem 10.]
The Lucas sequence is named after Edouard Lucas (see page 336).
The Lucas sequence is the Fibonacci-type sequence
$$\mathbf{l} = [l_0, l_1, l_2, l_3, \ldots) = [2, 1, 3, 4, \ldots)$$
Problem 12 Use the basis from Problem 10 to find an analogue of Binet's formula for the $n$th term $l_n$ of the Lucas sequence.
Problem 13 Prove that the Fibonacci and Lucas sequences are related by the identity
$$f_{n-1} + f_{n+1} = l_n \quad \text{for } n \ge 1$$
[Hint: The Fibonacci-type sequences $\mathbf{f}^{+} = [1, 1, 2, 3, \ldots)$ and $\mathbf{f}^{-} = [1, 0, 1, 1, \ldots)$ form a basis for Fib. (Why?)]
In this Introduction, we have seen that the collection Fib of all Fibonacci-type sequences behaves in many respects like $\mathbb{R}^2$, even though the "vectors" are actually infinite sequences.
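Several of the Problems above can be explored numerically before they are proved. In the Python sketch below, the helper names (`fib_type`, `is_fib_type`, `add`, `scale`) are hypothetical, and a Fibonacci-type sequence is represented only by a finite prefix of its terms. Every check therefore inspects the first `N` terms only; it is evidence, not a proof.

```python
import math


def fib_type(x0, x1, n):
    """First n terms of the Fibonacci-type sequence starting x0, x1."""
    terms = [x0, x1]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms[:n]


def is_fib_type(seq):
    """Check x_n = x_{n-1} + x_{n-2} on a finite prefix only."""
    return all(seq[i] == seq[i - 1] + seq[i - 2] for i in range(2, len(seq)))


def add(x, y):
    """Termwise sum, as defined in the Introduction."""
    return [a + b for a, b in zip(x, y)]


def scale(c, x):
    """Termwise scalar multiple."""
    return [c * a for a in x]


N = 20
f = fib_type(0, 1, N)        # the Fibonacci sequence
e = fib_type(1, 0, N)        # a candidate analogue of e1: [1, 0, 1, 1, 2, ...)
lucas = fib_type(2, 1, N)    # the Lucas sequence

x, y = fib_type(3, -2, N), fib_type(5, 7, N)
print(is_fib_type(add(x, y)), is_fib_type(scale(4, x)))   # Problem 3, spot-checked
print(add(scale(x[0], e), scale(x[1], f)) == x)            # Problem 6: x = x0*e + x1*f

# Binet's formula evaluated in floating point, rounded to the nearest integer.
phi, psi = (1 + math.sqrt(5)) / 2, (1 - math.sqrt(5)) / 2
binet = [round((phi ** n - psi ** n) / math.sqrt(5)) for n in range(N)]
print(binet == f)

# The identity of Problem 13, checked for 1 <= n <= N - 2.
print(all(f[n - 1] + f[n + 1] == lucas[n] for n in range(1, N - 1)))
```

Integer arithmetic keeps the recurrence checks exact. Binet's formula involves irrational numbers, so that one comparison rounds floating-point values, which is reliable only for moderate `N`.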
This useful analogy leads to the general notion of a vector space that is the subject of this chapter.
6.1 Vector Spaces and Subspaces
In Chapters 1 and 3, we saw that the algebra of vectors and the algebra of matrices are similar in many respects. In particular, we can add both vectors and matrices, and we can multiply both by scalars. The properties that result from these two operations (Theorem 1.1 and Theorem 3.2) are identical in both settings. In this section, we use these properties to define generalized "vectors" that arise in a wide variety of examples. By proving general theorems about these "vectors," we will therefore simultaneously be proving results about all of these examples. This is the real power of algebra: its ability to take properties from a concrete setting, like $\mathbb{R}^n$, and abstract them into a general setting.
Definition Let $V$ be a set on which two operations, called addition and scalar multiplication, have been defined. If $\mathbf{u}$ and $\mathbf{v}$ are in $V$, the sum of $\mathbf{u}$ and $\mathbf{v}$ is denoted by $\mathbf{u} + \mathbf{v}$, and if $c$ is a scalar, the scalar multiple of $\mathbf{u}$ by $c$ is denoted by $c\mathbf{u}$. If the following axioms hold for all $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$ and for all scalars $c$ and $d$, then $V$ is called a vector space and its elements are called vectors.
1. $\mathbf{u} + \mathbf{v}$ is in $V$. (Closure under addition)
2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ (Commutativity)
3. $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ (Associativity)
4. There exists an element $\mathbf{0}$ in $V$, called a zero vector, such that $\mathbf{u} + \mathbf{0} = \mathbf{u}$.
5. For each $\mathbf{u}$ in $V$, there is an element $-\mathbf{u}$ in $V$ such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$.
6. $c\mathbf{u}$ is in $V$. (Closure under scalar multiplication)
7. $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ (Distributivity)
8. $(c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}$ (Distributivity)
9. $c(d\mathbf{u}) = (cd)\mathbf{u}$
10. $1\mathbf{u} = \mathbf{u}$
The German mathematician Hermann Grassmann (1809-1877) is generally credited with first introducing the idea of a vector space (although he did not call it that) in 1844. Unfortunately, his work was very difficult to read and did not receive the attention it deserved. One person who did study it was the Italian mathematician Giuseppe Peano (1858-1932). In his 1888 book Calcolo Geometrico, Peano clarified Grassmann's earlier work and laid down the axioms for a vector space as we know them today. Peano's book is also remarkable for introducing operations on sets. His notations $\cup$, $\cap$, and $\in$ (for "union," "intersection," and "is an element of") are the ones we still use, although they were not immediately accepted by other mathematicians. Peano's axiomatic definition of a vector space also had very little influence for many years. Acceptance came in 1918, after Hermann Weyl (1885-1955) repeated it in his book Space, Time, Matter, an introduction to Einstein's general theory of relativity.
Remarks
• By "scalars" we will usually mean the real numbers. Accordingly, we should refer to $V$ as a real vector space (or a vector space over the real numbers). It is also possible for scalars to be complex numbers or to belong to $\mathbb{Z}_p$, where $p$ is prime. In these cases, $V$ is called a complex vector space or a vector space over $\mathbb{Z}_p$, respectively. Most of our examples will be real vector spaces, so we will usually omit the adjective "real." If something is referred to as a "vector space," assume that we are working over the real number system. In fact, the scalars can be chosen from any number system in which, roughly speaking, we can add, subtract, multiply, and divide according to the usual laws of arithmetic. In abstract algebra, such a number system is called a field.
• The definition of a vector space does not specify what the set $V$ consists of. Neither does it specify what the operations called "addition" and "scalar multiplication" look like. Often, they will be familiar, but they need not be. See Example 6.6 and Exercises 5-7.
We will now look at several examples of vector spaces.
In each case, we need to specify the set $V$ and the operations of addition and scalar multiplication and to verify axioms 1 through 10. We need to pay particular attention to axioms 1 and 6 (closure), axiom 4 (the existence of a zero vector in $V$), and axiom 5 (each vector in $V$ must have a negative in $V$).
Example 6.1 For any $n \ge 1$, $\mathbb{R}^n$ is a vector space with the usual operations of addition and scalar multiplication. Axioms 1 and 6 follow from the definitions of these operations, and the remaining axioms follow from Theorem 1.1.
Example 6.2 The set of all $2 \times 3$ matrices is a vector space with the usual operations of matrix addition and matrix scalar multiplication. Here the "vectors" are actually matrices. We know that the sum of two $2 \times 3$ matrices is also a $2 \times 3$ matrix and that multiplying a $2 \times 3$ matrix by a scalar gives another $2 \times 3$ matrix; hence, we have closure. The remaining axioms follow from Theorem 3.2. In particular, the zero vector $\mathbf{0}$ is the $2 \times 3$ zero matrix, and the negative of a $2 \times 3$ matrix $A$ is just the $2 \times 3$ matrix $-A$.
There is nothing special about $2 \times 3$ matrices. For any positive integers $m$ and $n$, the set of all $m \times n$ matrices forms a vector space with the usual operations of matrix addition and matrix scalar multiplication. This vector space is denoted $M_{mn}$.
Example 6.3 Let $\mathscr{P}_2$ denote the set of all polynomials of degree 2 or less with real coefficients. Define addition and scalar multiplication in the usual way. (See Appendix D.) If $p(x) = a_0 + a_1x + a_2x^2$ and $q(x) = b_0 + b_1x + b_2x^2$ are in $\mathscr{P}_2$, then
$$p(x) + q(x) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2$$
has degree at most 2 and so is in $\mathscr{P}_2$. If $c$ is a scalar, then
$$cp(x) = ca_0 + ca_1x + ca_2x^2$$
is also in $\mathscr{P}_2$. This verifies axioms 1 and 6.
The zero vector $\mathbf{0}$ is the zero polynomial, that is, the polynomial all of whose coefficients are zero.
The negative of a polynomial $p(x) = a_0 + a_1x + a_2x^2$ is the polynomial $-p(x) = -a_0 - a_1x - a_2x^2$. It is now easy to verify the remaining axioms. We will check axiom 2 and leave the others for Exercise 12. With $p(x)$ and $q(x)$ as above, we have
$$\begin{aligned} p(x) + q(x) &= (a_0 + a_1x + a_2x^2) + (b_0 + b_1x + b_2x^2) \\ &= (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 \\ &= (b_0 + a_0) + (b_1 + a_1)x + (b_2 + a_2)x^2 \\ &= (b_0 + b_1x + b_2x^2) + (a_0 + a_1x + a_2x^2) \\ &= q(x) + p(x) \end{aligned}$$
where the third equality follows from the fact that addition of real numbers is commutative.
In general, for any fixed $n \ge 0$, the set $\mathscr{P}_n$ of all polynomials of degree less than or equal to $n$ is a vector space, as is the set $\mathscr{P}$ of all polynomials.
Example 6.4 Let $\mathscr{F}$ denote the set of all real-valued functions defined on the real line. If $f$ and $g$ are two such functions and $c$ is a scalar, then $f + g$ and $cf$ are defined by
$$(f + g)(x) = f(x) + g(x) \quad\text{and}\quad (cf)(x) = cf(x)$$
In other words, the value of $f + g$ at $x$ is obtained by adding together the values of $f$ and $g$ at $x$ [Figure 6.1(a)]. Similarly, the value of $cf$ at $x$ is just the value of $f$ at $x$ multiplied by the scalar $c$ [Figure 6.1(b)]. The zero vector in $\mathscr{F}$ is the constant function $f_0$ that is identically zero; that is, $f_0(x) = 0$ for all $x$. The negative of a function $f$ is the function $-f$ defined by $(-f)(x) = -f(x)$ [Figure 6.1(c)].
Axioms 1 and 6 are obviously true. Verification of the remaining axioms is left as Exercise 13. Thus, $\mathscr{F}$ is a vector space.
Figure 6.1: The graphs of (a) $f$, $g$, and $f + g$; (b) $f$, $2f$, and $-3f$; and (c) $f$ and $-f$
In Example 6.4, we could also have considered only those functions defined on some closed interval $[a, b]$ of the real line. This approach also produces a vector space, denoted by $\mathscr{F}[a, b]$.
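Example 6.4 has a direct counterpart in Python, where functions are ordinary values. In the sketch below, the sum and scalar multiple are built pointwise, exactly as in $(f + g)(x) = f(x) + g(x)$ and $(cf)(x) = cf(x)$. The helper names and the sample points are illustrative assumptions. Comparing two functions at finitely many points can only spot-check an axiom; Exercise 13 asks for the actual verification.

```python
import math


def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(c, f):
    """Pointwise scalar multiple: (cf)(x) = c * f(x)."""
    return lambda x: c * f(x)


def negative(f):
    """The function -f, with (-f)(x) = -f(x)."""
    return lambda x: -f(x)


def zero(x):
    """The zero vector f0 of F: identically 0."""
    return 0.0


SAMPLES = [-2.0, -0.5, 0.0, 1.0, 3.7]   # illustrative sample points


def agree(f, g):
    """True if f and g agree (up to rounding) at every sample point."""
    return all(math.isclose(f(t), g(t), abs_tol=1e-12) for t in SAMPLES)


f, g = math.sin, lambda t: t ** 2
print(agree(add(f, g), add(g, f)))                                # axiom 2
print(agree(add(f, zero), f))                                     # axiom 4
print(agree(add(f, negative(f)), zero))                           # axiom 5
print(agree(scale(3, add(f, g)), add(scale(3, f), scale(3, g))))  # axiom 7
```

The closures returned by `add` and `scale` are again real-valued functions on the real line, which mirrors why axioms 1 and 6 are "obviously true" for $\mathscr{F}$.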
Example 6.5 The set $\mathbb{Z}$ of integers with the usual operations is not a vector space. To demonstrate this, it is enough to find that one of the ten axioms fails and to give a specific instance in which it fails (a counterexample). In this case, we find that we do not have closure under scalar multiplication. For example, the multiple of the integer 2 by the scalar $\frac{1}{3}$ is $\left(\frac{1}{3}\right)(2) = \frac{2}{3}$, which is not an integer. Thus, it is not true that $c\mathbf{x}$ is in $\mathbb{Z}$ for every $\mathbf{x}$ in $\mathbb{Z}$ and every scalar $c$ (i.e., axiom 6 fails).
Example 6.6 Let $V = \mathbb{R}^2$ with the usual definition of addition but the following definition of scalar multiplication: [display illegible in source]
Then, for example, [display illegible in source] so axiom 10 fails. [In fact, the other nine axioms are all true (check this), but we do not need to look into them, because $V$ has already failed to be a vector space. This example shows the value of looking ahead, rather than working through the list of axioms in the order in which they have been given.]
Example 6.7 Let $\mathbb{C}^2$ denote the set of all ordered pairs of complex numbers. Define addition and scalar multiplication as in $\mathbb{R}^2$, except here the scalars are complex numbers. For example,
$$\begin{bmatrix} 1 + i \\ 2 - 3i \end{bmatrix} + \begin{bmatrix} -3 + 2i \\ 4 \end{bmatrix} = \begin{bmatrix} -2 + 3i \\ 6 - 3i \end{bmatrix}$$
and
$$(1 - i)\begin{bmatrix} 1 + i \\ 2 - 3i \end{bmatrix} = \begin{bmatrix} (1 - i)(1 + i) \\ (1 - i)(2 - 3i) \end{bmatrix} = \begin{bmatrix} 2 \\ -1 - 5i \end{bmatrix}$$
Using properties of the complex numbers, it is straightforward to check that all ten axioms hold. Therefore, $\mathbb{C}^2$ is a complex vector space. In general, $\mathbb{C}^n$ is a complex vector space for all $n \ge 1$.
Example 6.8 If $p$ is prime, the set $\mathbb{Z}_p^n$ (with the usual definitions of addition and multiplication by scalars from $\mathbb{Z}_p$) is a vector space over $\mathbb{Z}_p$ for all $n \ge 1$.
Before we consider further examples, we state a theorem that contains some useful properties of vector spaces. It is important to note that, by proving this theorem for vector spaces in general, we are actually proving it for every specific vector space.
Theorem 6.1 Let $V$ be a vector space, $\mathbf{u}$ a vector in $V$, and $c$ a scalar.
a. $0\mathbf{u} = \mathbf{0}$
b. $c\mathbf{0} = \mathbf{0}$
c. $(-1)\mathbf{u} = -\mathbf{u}$
d. If $c\mathbf{u} = \mathbf{0}$, then $c = 0$ or $\mathbf{u} = \mathbf{0}$.
Proof We prove properties (b) and (d) and leave the proofs of the remaining properties as exercises.
(b) We have
$$c\mathbf{0} = c(\mathbf{0} + \mathbf{0}) = c\mathbf{0} + c\mathbf{0}$$
by vector space axioms 4 and 7. Adding the negative of $c\mathbf{0}$ to both sides produces
$$c\mathbf{0} + (-c\mathbf{0}) = (c\mathbf{0} + c\mathbf{0}) + (-c\mathbf{0})$$
which implies
$$\begin{aligned} \mathbf{0} &= c\mathbf{0} + (c\mathbf{0} + (-c\mathbf{0})) && \text{by axioms 5 and 3} \\ &= c\mathbf{0} + \mathbf{0} && \text{by axiom 5} \\ &= c\mathbf{0} && \text{by axiom 4} \end{aligned}$$
(d) Suppose $c\mathbf{u} = \mathbf{0}$. To show that either $c = 0$ or $\mathbf{u} = \mathbf{0}$, let's assume that $c \neq 0$. (If $c = 0$, there is nothing to prove.) Then, since $c \neq 0$, its reciprocal $1/c$ is defined, and
$$\begin{aligned} \mathbf{u} &= 1\mathbf{u} && \text{by axiom 10} \\ &= \left(\tfrac{1}{c}c\right)\mathbf{u} = \tfrac{1}{c}(c\mathbf{u}) && \text{by axiom 9} \\ &= \tfrac{1}{c}\mathbf{0} && \text{since } c\mathbf{u} = \mathbf{0} \\ &= \mathbf{0} && \text{by property (b)} \end{aligned}$$
We will write $\mathbf{u} - \mathbf{v}$ for $\mathbf{u} + (-\mathbf{v})$, thereby defining subtraction of vectors. We will also exploit the associativity property of addition to unambiguously write $\mathbf{u} + \mathbf{v} + \mathbf{w}$ for the sum of three vectors and, more generally, for a linear combination of vectors.
Subspaces
We have seen that, in $\mathbb{R}^n$, it is possible for one vector space to sit inside another one, giving rise to the notion of a subspace. For example, a plane through the origin is a subspace of $\mathbb{R}^3$. We now extend this concept to general vector spaces.
Definition A subset $W$ of a vector space $V$ is called a subspace of $V$ if $W$ is itself a vector space with the same scalars, addition, and scalar multiplication as $V$.
As in $\mathbb{R}^n$, checking to see whether a subset $W$ of a vector space $V$ is a subspace of $V$ involves testing only two of the ten vector space axioms. We prove this observation as a theorem.
Theorem 6.2 Let $V$ be a vector space and let $W$ be a nonempty subset of $V$. Then $W$ is a subspace of $V$ if and only if the following conditions hold:
a. If $\mathbf{u}$ and $\mathbf{v}$ are in $W$, then $\mathbf{u} + \mathbf{v}$ is in $W$.
b. If $\mathbf{u}$ is in $W$ and $c$ is a scalar, then $c\mathbf{u}$ is in $W$.
Proof Assume that $W$ is a subspace of $V$. Then $W$ satisfies vector space axioms 1 to 10.
In particular, axiom 1 is condition (a) and axiom 6 is condition (b).
Conversely, assume that $W$ is a nonempty subset of a vector space $V$ satisfying conditions (a) and (b). By hypothesis, axioms 1 and 6 hold. Axioms 2, 3, 7, 8, 9, and 10 hold in $W$ because they are true for all vectors in $V$ and thus are true in particular for those vectors in $W$. (We say that $W$ inherits these properties from $V$.) This leaves axioms 4 and 5 to be checked.
Since $W$ is nonempty, it contains at least one vector $\mathbf{u}$. Then condition (b) and Theorem 6.1(a) imply that $0\mathbf{u} = \mathbf{0}$ is also in $W$. This is axiom 4. If $\mathbf{u}$ is in $W$, then, by taking $c = -1$ in condition (b), we have that $-\mathbf{u} = (-1)\mathbf{u}$ is also in $W$, using Theorem 6.1(c). This is axiom 5.
Remark Since Theorem 6.2 generalizes the notion of a subspace from the context of $\mathbb{R}^n$ to general vector spaces, all of the subspaces of $\mathbb{R}^n$ that we encountered in Chapter 3 are subspaces of $\mathbb{R}^n$ in the current context. In particular, lines and planes through the origin are subspaces of $\mathbb{R}^3$.
Example 6.9 We have already shown that the set $\mathscr{P}_n$ of all polynomials with degree at most $n$ is a vector space. Hence, $\mathscr{P}_n$ is a subspace of the vector space $\mathscr{P}$ of all polynomials.
Example 6.10 Let $W$ be the set of symmetric $n \times n$ matrices. Show that $W$ is a subspace of $M_{nn}$.
Solution Clearly, $W$ is nonempty (it contains the $n \times n$ zero matrix), so we need only check conditions (a) and (b) in Theorem 6.2. Let $A$ and $B$ be in $W$ and let $c$ be a scalar. Then $A^T = A$ and $B^T = B$, from which it follows that
$$(A + B)^T = A^T + B^T = A + B$$
Therefore, $A + B$ is symmetric and, hence, is in $W$. Similarly,
$$(cA)^T = cA^T = cA$$
so $cA$ is symmetric and, thus, is in $W$. We have shown that $W$ is closed under addition and scalar multiplication. Therefore, it is a subspace of $M_{nn}$, by Theorem 6.2.
Example 6.11 Let $\mathscr{C}$ be the set of all continuous real-valued functions defined on $\mathbb{R}$ and let $\mathscr{D}$ be the set of all differentiable real-valued functions defined on $\mathbb{R}$.
Show that $\mathscr{C}$ and $\mathscr{D}$ are subspaces of $\mathscr{F}$, the vector space of all real-valued functions defined on $\mathbb{R}$.
Solution From calculus, we know that if $f$ and $g$ are continuous functions and $c$ is a scalar, then $f + g$ and $cf$ are also continuous. Hence, $\mathscr{C}$ is closed under addition and scalar multiplication and so is a subspace of $\mathscr{F}$. If $f$ and $g$ are differentiable, then so are $f + g$ and $cf$. Indeed,
$$(f + g)' = f' + g' \quad\text{and}\quad (cf)' = c(f')$$
So $\mathscr{D}$ is also closed under addition and scalar multiplication, making it a subspace of $\mathscr{F}$.
It is a theorem of calculus that every differentiable function is continuous. Consequently, $\mathscr{D}$ is contained in $\mathscr{C}$ (denoted by $\mathscr{D} \subseteq \mathscr{C}$), making $\mathscr{D}$ a subspace of $\mathscr{C}$. It is also the case that every polynomial function is differentiable, so $\mathscr{P} \subseteq \mathscr{D}$, and thus $\mathscr{P}$ is a subspace of $\mathscr{D}$. We therefore have a hierarchy of subspaces of $\mathscr{F}$, one inside the other:
$$\mathscr{P} \subseteq \mathscr{D} \subseteq \mathscr{C} \subseteq \mathscr{F}$$
This hierarchy is depicted in Figure 6.2.
Figure 6.2: The hierarchy of subspaces of $\mathscr{F}$
There are other subspaces of $\mathscr{F}$ that can be placed into this hierarchy. Some of these are explored in the exercises.
In the preceding discussion, we could have restricted our attention to functions defined on a closed interval $[a, b]$. Then the corresponding subspaces of $\mathscr{F}[a, b]$ would be $\mathscr{C}[a, b]$, $\mathscr{D}[a, b]$, and $\mathscr{P}[a, b]$.
Example 6.12 Let $S$ be the set of all functions that satisfy the differential equation
$$f'' + f = 0 \qquad (1)$$
That is, $S$ is the solution set of Equation (1). Show that $S$ is a subspace of $\mathscr{F}$.
Solution $S$ is nonempty, since the zero function clearly satisfies Equation (1). Let $f$ and $g$ be in $S$ and let $c$ be a scalar. Then
$$(f + g)'' + (f + g) = (f'' + g'') + (f + g) = (f'' + f) + (g'' + g) = 0 + 0 = 0$$
which shows that $f + g$ is in $S$. Similarly,
$$(cf)'' + cf = cf'' + cf = c(f'' + f) = c0 = 0$$
so $cf$ is also in $S$. Therefore, $S$ is closed under addition and scalar multiplication and is a subspace of $\mathscr{F}$.
The differential Equation (1) is an example of a homogeneous linear differential equation. The solution sets of such equations are always subspaces of $\mathscr{F}$. Note that in Example 6.12 we did not actually solve Equation (1) (i.e., we did not find any specific solutions, other than the zero function). We will discuss techniques for finding solutions to this type of equation in Section 6.7.
As you gain experience working with vector spaces and subspaces, you will notice that certain examples tend to resemble one another. For example, consider the vector spaces $\mathbb{R}^4$, $\mathscr{P}_3$, and $M_{22}$. Typical elements of these vector spaces are, respectively,
$$\mathbf{u} = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}, \quad p(x) = a + bx + cx^2 + dx^3, \quad\text{and}\quad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
Any calculations involving the vector space operations of addition and scalar multiplication are essentially the same in all three settings. To highlight the similarities, in the next example we will perform the necessary steps in the three vector spaces side by side.
In the words of Yogi Berra, "It's déjà vu all over again."
Example 6.13 (a) Show that the set $W$ of all vectors of the form $\begin{bmatrix} a \\ b \\ -b \\ a \end{bmatrix}$ is a subspace of $\mathbb{R}^4$.
(b) Show that the set $W$ of all polynomials of the form $a + bx - bx^2 + ax^3$ is a subspace of $\mathscr{P}_3$.
(c) Show that the set $W$ of all matrices of the form $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ is a subspace of $M_{22}$.
Solution (a) $W$ is nonempty because it contains the zero vector $\mathbf{0}$. (Take $a = b = 0$.) Let $\mathbf{u}$ and $\mathbf{v}$ be in $W$, say,
$$\mathbf{u} = \begin{bmatrix} a \\ b \\ -b \\ a \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} c \\ d \\ -d \\ c \end{bmatrix}$$
Then
$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} a + c \\ b + d \\ -(b + d) \\ a + c \end{bmatrix}$$
so $\mathbf{u} + \mathbf{v}$ is also in $W$ (because it has the right form). Similarly, if $k$ is a scalar, then
$$k\mathbf{u} = \begin{bmatrix} ka \\ kb \\ -kb \\ ka \end{bmatrix}$$
so $k\mathbf{u}$ is in $W$. Thus, $W$ is a nonempty subset of $\mathbb{R}^4$ that is closed under addition and scalar multiplication. Therefore, $W$ is a subspace of $\mathbb{R}^4$, by Theorem 6.2.
(b) $W$ is nonempty because it contains the zero polynomial. (Take $a = b = 0$.) Let $p(x)$ and $q(x)$ be in $W$, say,
$$p(x) = a + bx - bx^2 + ax^3 \quad\text{and}\quad q(x) = c + dx - dx^2 + cx^3$$
Then
$$p(x) + q(x) = (a + c) + (b + d)x - (b + d)x^2 + (a + c)x^3$$
so $p(x) + q(x)$ is also in $W$ (because it has the right form). Similarly, if $k$ is a scalar, then
$$kp(x) = ka + kbx - kbx^2 + kax^3$$
so $kp(x)$ is in $W$. Thus, $W$ is a nonempty subset of $\mathscr{P}_3$ that is closed under addition and scalar multiplication. Therefore, $W$ is a subspace of $\mathscr{P}_3$, by Theorem 6.2.
(c) $W$ is nonempty because it contains the zero matrix $O$. (Take $a = b = 0$.) Let $A$ and $B$ be in $W$, say,
$$A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} c & d \\ -d & c \end{bmatrix}$$
Then
$$A + B = \begin{bmatrix} a + c & b + d \\ -(b + d) & a + c \end{bmatrix}$$
so $A + B$ is also in $W$ (because it has the right form). Similarly, if $k$ is a scalar, then
$$kA = \begin{bmatrix} ka & kb \\ -kb & ka \end{bmatrix}$$
so $kA$ is in $W$. Thus, $W$ is a nonempty subset of $M_{22}$ that is closed under addition and scalar multiplication. Therefore, $W$ is a subspace of $M_{22}$, by Theorem 6.2.
Example 6.13 shows that it is often possible to relate examples that, on the surface, appear to have nothing in common. Consequently, we can apply our knowledge of $\mathbb{R}^n$ to polynomials, matrices, and other examples. We will encounter this idea several times in this chapter and will make it precise in Section 6.5.
Example 6.14 If $V$ is a vector space, then $V$ is clearly a subspace of itself. The set $\{\mathbf{0}\}$, consisting of only the zero vector, is also a subspace of $V$, called the zero subspace. To show this, we simply note that the two closure conditions of Theorem 6.2 are satisfied:
$$\mathbf{0} + \mathbf{0} = \mathbf{0} \quad\text{and}\quad c\mathbf{0} = \mathbf{0} \text{ for any scalar } c$$
The subspaces $\{\mathbf{0}\}$ and $V$ are called the trivial subspaces of $V$.
An examination of the proof of Theorem 6.2 reveals the following useful fact:
If $W$ is a subspace of a vector space $V$, then $W$ contains the zero vector $\mathbf{0}$ of $V$.
This fact is consistent with, and analogous to, the fact that lines and planes are subspaces of $\mathbb{R}^3$ if and only if they contain the origin. The requirement that every subspace must contain $\mathbf{0}$ is sometimes useful in showing that a set is not a subspace.
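Theorem 6.2, together with the zero-vector test, suggests a quick screening procedure for a candidate subspace $W$ of $M_{22}$. First check that the zero matrix is in $W$, then search for a sum or scalar multiple that leaves $W$. The Python sketch below does the search with random integer samples; all names in it are hypothetical. It can exhibit a counterexample, but it can never prove that $W$ is a subspace: a `None` result only means that no failure was found.

```python
import random


def zero():
    return [[0, 0], [0, 0]]


def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def scale(k, A):
    return [[k * A[i][j] for j in range(2)] for i in range(2)]


def screen(member, sample, trials=500, seed=0):
    """Search for evidence that W = {A in M22 : member(A)} is NOT a subspace.

    member(A) decides whether A is in W; sample(rng) returns a random element of W.
    Returns a description of the first failure found, or None if none was found.
    None is not a proof: Theorem 6.2 needs closure for all A, B and all scalars k.
    """
    if not member(zero()):
        return "zero matrix not in W"
    rng = random.Random(seed)
    for _ in range(trials):
        A, B = sample(rng), sample(rng)
        k = rng.randint(-5, 5)
        if not member(add(A, B)):
            return f"not closed under addition: {A} + {B}"
        if not member(scale(k, A)):
            return f"not closed under scalar multiplication: {k} * {A}"
    return None


# W from Example 6.13(c): matrices of the form [[a, b], [-b, a]].
def in_613(A):
    return A[1][0] == -A[0][1] and A[1][1] == A[0][0]


def sample_613(rng):
    a, b = rng.randint(-9, 9), rng.randint(-9, 9)
    return [[a, b], [-b, a]]


# Singular matrices (determinant 0), sampled as rank-one outer products u v^T.
def singular(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0] == 0


def sample_singular(rng):
    u = [rng.randint(-9, 9) for _ in range(2)]
    v = [rng.randint(-9, 9) for _ in range(2)]
    return [[u[0] * v[0], u[0] * v[1]], [u[1] * v[0], u[1] * v[1]]]


print(screen(in_613, sample_613))                            # no failure found
print(screen(lambda A: A[0][1] == A[0][0] + 1, sample_613))  # fails the zero test
print(screen(singular, sample_singular))                     # fails closure under +
```

The second call uses a hypothetical set whose $(1, 2)$ entry is one more than its $(1, 1)$ entry, and the third uses the singular matrices. Sampling singular matrices as outer products $\mathbf{u}\mathbf{v}^T$ is just one convenient way to stay inside that set.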
Example 6.15 Let $W$ be the set of all $2 \times 2$ matrices of the form [matrix illegible in source]. Is $W$ a subspace of $M_{22}$?
Solution Each matrix in $W$ has the property that its $(1, 2)$ entry is one more than its $(1, 1)$ entry. Since the zero matrix
$$O = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
does not have this property, it is not in $W$. Hence, $W$ is not a subspace of $M_{22}$.
Example 6.16 Let $W$ be the set of all $2 \times 2$ matrices with determinant equal to 0. Is $W$ a subspace of $M_{22}$? (Since $\det O = 0$, the zero matrix is in $W$, so the method of Example 6.15 is of no use to us.)
Solution Let $A =$ [matrix illegible in source] and $B =$ [matrix illegible in source]. Then $\det A = \det B = 0$, so $A$ and $B$ are in $W$. But $A + B =$ [matrix illegible in source], so $\det(A + B) = 1 \neq 0$, and therefore $A + B$ is not in $W$. Thus, $W$ is not closed under addition and so is not a subspace of $M_{22}$.
Spanning Sets
The notion of a spanning set of vectors carries over easily from $\mathbb{R}^n$ to general vector spaces.
Definition If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a set of vectors in a vector space $V$, then the set of all linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ is called the span of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ and is denoted by $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k)$ or $\operatorname{span}(S)$. If $V = \operatorname{span}(S)$, then $S$ is called a spanning set for $V$ and $V$ is said to be spanned by $S$.
Example 6.17 Show that the polynomials $1$, $x$, and $x^2$ span $\mathscr{P}_2$.
Solution By its very definition, a polynomial $p(x) = a + bx + cx^2$ is a linear combination of $1$, $x$, and $x^2$. Therefore, $\mathscr{P}_2 = \operatorname{span}(1, x, x^2)$.
Example 6.17 can clearly be generalized to show that $\mathscr{P}_n = \operatorname{span}(1, x, x^2, \ldots, x^n)$. However, no finite set of polynomials can possibly span $\mathscr{P}$, the vector space of all polynomials. (See Exercise 44 in Section 6.2.) But, if we allow a spanning set to be infinite, then clearly the set of all nonnegative powers of $x$ will do. That is, $\mathscr{P} = \operatorname{span}(1, x, x^2, \ldots)$.
Example 6.18 Show that $M_{23} = \operatorname{span}(E_{11}, E_{12}, E_{13}, E_{21}, E_{22}, E_{23})$, where
$$E_{11} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad E_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$
$$E_{21} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad E_{22} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad E_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
(That is, $E_{ij}$ is the matrix with a 1 in row $i$, column $j$ and zeros elsewhere.)
Solution We need only observe that
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = a_{11}E_{11} + a_{12}E_{12} + a_{13}E_{13} + a_{21}E_{21} + a_{22}E_{22} + a_{23}E_{23}$$
Extending this example, we see that, in general, $M_{mn}$ is spanned by the $mn$ matrices $E_{ij}$, where $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
Example 6.19 In $\mathscr{P}_2$, determine whether $r(x) = 1 - 4x + 6x^2$ is in $\operatorname{span}(p(x), q(x))$, where
$$p(x) = 1 - x + x^2 \quad\text{and}\quad q(x) = 2 + x - 3x^2$$
Solution We are looking for scalars $c$ and $d$ such that $cp(x) + dq(x) = r(x)$. This means that
$$c(1 - x + x^2) + d(2 + x - 3x^2) = 1 - 4x + 6x^2$$
Regrouping according to powers of $x$, we have
$$(c + 2d) + (-c + d)x + (c - 3d)x^2 = 1 - 4x + 6x^2$$
Equating the coefficients of like powers of $x$ gives
$$\begin{aligned} c + 2d &= 1 \\ -c + d &= -4 \\ c - 3d &= 6 \end{aligned}$$
which is easily solved to give $c = 3$ and $d = -1$. Therefore, $r(x) = 3p(x) - q(x)$, so $r(x)$ is in $\operatorname{span}(p(x), q(x))$. (Check this.)
Example 6.20 In $\mathscr{F}$, determine whether $\sin 2x$ is in $\operatorname{span}(\sin x, \cos x)$.
Solution We set $c\sin x + d\cos x = \sin 2x$ and try to determine $c$ and $d$ so that this equation is true. Since these are functions, the equation must be true for all values of $x$. Setting $x = 0$, we have
$$c\sin 0 + d\cos 0 = \sin 0 \quad\text{or}\quad c(0) + d(1) = 0$$
from which we see that $d = 0$. Setting $x = \pi/2$, we get
$$c\sin(\pi/2) + d\cos(\pi/2) = \sin(\pi) \quad\text{or}\quad c(1) + d(0) = 0$$
giving $c = 0$. But this implies that $\sin 2x = 0(\sin x) + 0(\cos x) = 0$ for all $x$, which is absurd, since $\sin 2x$ is not the zero function. We conclude that $\sin 2x$ is not in $\operatorname{span}(\sin x, \cos x)$.
Remark It is true that $\sin 2x$ can be written in terms of $\sin x$ and $\cos x$. For example, we have the double angle formula $\sin 2x = 2\sin x\cos x$. However, this is not a linear combination.
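The method of Example 6.19 (equate coefficients, then solve the resulting linear system) becomes mechanical once each polynomial is replaced by its list of coefficients. The Python sketch below, built around a hypothetical helper `solve_span`, runs Gauss-Jordan elimination with exact fractions on the augmented matrix. It returns one choice of coefficients, with any free variables set to 0, or `None` if the system is inconsistent. Because a $2 \times 2$ matrix can be flattened row by row into a vector with four entries, the same routine also decides span questions in $M_{22}$.

```python
from fractions import Fraction


def solve_span(vectors, target):
    """Find scalars c_1, ..., c_k with c_1*vectors[0] + ... + c_k*vectors[k-1] == target.

    Each vector is a list of coordinates (for a polynomial: its coefficients,
    constant term first). Returns a list of Fractions, or None when target is
    not in the span. Free variables, if any, are set to 0.
    """
    m, k = len(target), len(vectors)
    # Augmented matrix [v_1 ... v_k | target]: column j holds vectors[j].
    M = [[Fraction(vectors[j][i]) for j in range(k)] + [Fraction(target[i])]
         for i in range(m)]
    pivot_cols = []
    row = 0
    for col in range(k):
        if row == m:
            break
        pr = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pr is None:
            continue  # no pivot in this column: a free variable
        M[row], M[pr] = M[pr], M[row]
        piv = M[row][col]
        M[row] = [v / piv for v in M[row]]
        for r in range(m):  # clear the column above and below the pivot
            if r != row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
        pivot_cols.append(col)
        row += 1
    # Rows below the last pivot are zero on the left; a nonzero right side
    # means a row [0 ... 0 | nonzero], i.e. the system is inconsistent.
    for r in range(row, m):
        if M[r][k] != 0:
            return None
    coeffs = [Fraction(0)] * k
    for r, col in enumerate(pivot_cols):
        coeffs[col] = M[r][k]
    return coeffs


p, q = [1, -1, 1], [2, 1, -3]            # 1 - x + x^2 and 2 + x - 3x^2
print(solve_span([p, q], [1, -4, 6]))    # Example 6.19: expected c = 3, d = -1
print(solve_span([p, q], [1, 0, 0]))     # the constant polynomial 1: None
```

Exact `Fraction` arithmetic avoids the rounding issues that floating-point elimination would introduce when deciding whether a right-hand side is exactly zero.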
2 1 In M22 , describe the span of A = Solution [ � � l B = [ � � l and C = [ � �] . Every linear combination of A, B, and C is of the form [� �] + d [ � � ] + e[ � �] c+d c+e ] = [ c+ e d cA + dB + eC = c This matrix is symmetric, so span (A, B, C) is contained within the subspace of sym­ metric 2 X 2 matrices. In fact, we have equality; that is, every symmetric 2 X 2 matrix is [; �] be a symmetric 2 2 matrix. Setting [yx y ] = [ cc ++ de c +d e ] in span(A, B, C). To show this, we let X z and solving for c and d, we find that c = x - z, d = z, and e = - x + y + z. Therefore, � (Check this.) It follows that span(A, B, C) is the subspace of symmetric 2 X 2 matrices . .+ Section Vector Spaces and Subspaces 6. 1 441 As was the case in !R n , the span of a set of vectors is always a subspace of the vector space that contains them. The next theorem makes this result precise. It generalizes Theorem 3.19. Theorem 6 . 3 Let V i, v2 , • . . , vk b e vectors in a vector space V. a. span (vi, v2 , b. span (vi, v2 , • . • . • . , vk) is a subspace of V. , vk) is the smallest subspace of V that contains Vi, v2 , • . . , vk. (a) The proof of property (a) is identical to the proof of Theorem 3 . 1 9, with Proof !R n replaced by V. (b) To establish property (b ), we need to show that any subspace of V that contains Vi, v2 , , vk also contains span( vi, v2 , , vk) · Accordingly, let W be a subspace of V that contains Vi, v2 , . . . , vk. Then, since W is closed under addition and scalar multi­ plication, it contains every linear combination Ci Vi + c 2v2 + + ckvk of Vi, v2 , , vk. Therefore, span (vi, v2 , , vk) is contained in W. . • . • • • · · · . 1 • • . . . Exercises 6 . 1 In Exercises 1 - 1 1, determine whether the given set, together with the specified operations of addition and scalar multi­ plication, is a vector space. If it is not, list all of the axioms that fail to hold. 1 . 
1. The set of all vectors in $\mathbb{R}^2$ of the form $\begin{bmatrix} x \\ x \end{bmatrix}$, with the usual vector addition and scalar multiplication
2. The set of all vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$ with $x \ge 0$, $y \ge 0$ (i.e., the first quadrant), with the usual vector addition and scalar multiplication
3. The set of all vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$ with $xy \ge 0$ (i.e., the union of the first and third quadrants), with the usual vector addition and scalar multiplication
4. The set of all vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$ with $x \ge y$, with the usual vector addition and scalar multiplication
5. $\mathbb{R}^2$, with the usual addition but scalar multiplication defined by $[\ \dots\ ]$
6. $\mathbb{R}^2$, with the usual scalar multiplication but addition defined by
$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 + 1 \\ y_1 + y_2 + 1 \end{bmatrix}$$
7. The set of all positive real numbers, with addition $\oplus$ defined by $x \oplus y = xy$ and scalar multiplication $\odot$ defined by $c \odot x = x^c$
8. The set of all rational numbers, with the usual addition and multiplication
9. The set of all upper triangular $2 \times 2$ matrices, with the usual matrix addition and scalar multiplication
10. The set of all $2 \times 2$ matrices of the form $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, where $ad = 0$, with the usual matrix addition and scalar multiplication
11. The set of all skew-symmetric $n \times n$ matrices, with the usual matrix addition and scalar multiplication (see page 162)
12. Finish verifying that $\mathscr{P}_2$ is a vector space (see Example 6.3).
13. Finish verifying that $\mathscr{F}$ is a vector space (see Example 6.4).

In Exercises 14–17, determine whether the given set, together with the specified operations of addition and scalar multiplication, is a complex vector space. If it is not, list all of the axioms that fail to hold.
14. The set of all vectors in $\mathbb{C}^2$ of the form $[\ \dots\ ]$, with the usual vector addition and scalar multiplication
15. The set $M_{mn}(\mathbb{C})$ of all $m \times n$ complex matrices, with the usual matrix addition and scalar multiplication
16. The set $\mathbb{C}^2$, with the usual vector addition but scalar multiplication defined by $c\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} \bar{c}z_1 \\ \bar{c}z_2 \end{bmatrix}$
17. $\mathbb{R}^n$, with the usual vector addition and scalar multiplication

In Exercises 18–21, determine whether the given set, together with the specified operations of addition and scalar multiplication, is a vector space over the indicated $\mathbb{Z}_p$. If it is not, list all of the axioms that fail to hold.
18. The set of all vectors in $\mathbb{Z}_2^n$ with an even number of 1s, over $\mathbb{Z}_2$ with the usual vector addition and scalar multiplication
19. The set of all vectors in $\mathbb{Z}_2^n$ with an odd number of 1s, over $\mathbb{Z}_2$ with the usual vector addition and scalar multiplication
20. The set $M_{mn}(\mathbb{Z}_p)$ of all $m \times n$ matrices with entries from $\mathbb{Z}_p$, over $\mathbb{Z}_p$ with the usual matrix addition and scalar multiplication
21. $\mathbb{Z}_6$, over $\mathbb{Z}_3$ with the usual addition and multiplication (Think this one through carefully!)
22. Prove Theorem 6.1(a).
23. Prove Theorem 6.1(c).

In Exercises 24–45, use Theorem 6.2 to determine whether $W$ is a subspace of $V$.
24. $V = \dots$, $W = \dots$
25. $V = \dots$, $W = \dots$
26. $V = \mathbb{R}^3$, $W = \left\{ \begin{bmatrix} a \\ b \\ a + b + 1 \end{bmatrix} \right\}$
27. $V = \dots$, $W = \dots$
30. $V = M_{nn}$, $W = \{A \text{ in } M_{nn} : \det A = 1\}$
31. $V = M_{nn}$, $W$ is the set of diagonal $n \times n$ matrices
32. $V = M_{nn}$, $W$ is the set of idempotent $n \times n$ matrices
33. $V = M_{nn}$, $W = \{A \text{ in } M_{nn} : AB = BA\}$, where $B$ is a given (fixed) matrix
34. $V = \mathscr{P}_2$, $W = \{bx + cx^2\}$
35. $V = \mathscr{P}_2$, $W = \{a + bx + cx^2 : a + b + c = 0\}$
36. $V = \mathscr{P}_2$, $W = \{a + bx + cx^2 : abc = 0\}$
37. $V = \mathscr{P}$, $W$ is the set of all polynomials of degree 3
38. $V = \mathscr{F}$, $W = \{f \text{ in } \mathscr{F} : f(-x) = f(x)\}$
39. $V = \mathscr{F}$, $W = \{f \text{ in } \mathscr{F} : f(-x) = -f(x)\}$
40. $V = \mathscr{F}$, $W = \{f \text{ in } \mathscr{F} : f(0) = 1\}$
41. $V = \mathscr{F}$, $W = \{f \text{ in } \mathscr{F} : f(\dots) = \dots\}$
42. $V = \mathscr{F}$, $W$ is the set of all integrable functions
43. $V = \mathscr{D}$, $W = \{f \text{ in } \mathscr{D} : f'(x) \ge 0 \text{ for all } x\}$
44. $V = \mathscr{F}$, $W = \mathscr{C}^{(2)}$, the set of all functions with continuous second derivatives
45. $V = \mathscr{F}$, $W = \{f \text{ in } \mathscr{F} : \lim_{x \to 0} f(x) = \infty\}$
46. Let $V$ be a vector space with subspaces $U$ and $W$. Prove that $U \cap W$ is a subspace of $V$.
47. Let $V$ be a vector space with subspaces $U$ and $W$. Give an example with $V = \mathbb{R}^2$ to show that $U \cup W$ need not be a subspace of $V$.
48. Let $V$ be a vector space with subspaces $U$ and $W$. Define the sum of $U$ and $W$ to be
$$U + W = \{\mathbf{u} + \mathbf{w} : \mathbf{u} \text{ is in } U, \mathbf{w} \text{ is in } W\}$$
(a) If $V = \mathbb{R}^3$, $U$ is the $x$-axis, and $W$ is the $y$-axis, what is $U + W$?
(b) If $U$ and $W$ are subspaces of a vector space $V$, prove that $U + W$ is a subspace of $V$.
49. If $U$ and $V$ are vector spaces, define the Cartesian product of $U$ and $V$ to be
$$U \times V = \{(\mathbf{u}, \mathbf{v}) : \mathbf{u} \text{ is in } U \text{ and } \mathbf{v} \text{ is in } V\}$$
Prove that $U \times V$ is a vector space.
50. Let $W$ be a subspace of a vector space $V$. Prove that $\Delta = \{(\mathbf{w}, \mathbf{w}) : \mathbf{w} \text{ is in } W\}$ is a subspace of $V \times V$.

In Exercises 51 and 52, let $A = [\ \dots\ ]$ and $B = [\ \dots\ ]$. Determine whether $C$ is in $\operatorname{span}(A, B)$.
51. $C = [\ \dots\ ]$
52. $C = [\ \dots\ ]$

In Exercises 53 and 54, let $p(x) = 1 - 2x$, $q(x) = x - x^2$, and $r(x) = -2 + 3x + x^2$. Determine whether $s(x)$ is in $\operatorname{span}(p(x), q(x), r(x))$.
53. $s(x) = 3 - 5x - x^2$
54. $s(x) = 1 + x + x^2$

In Exercises 55–58, let $f(x) = \sin^2 x$ and $g(x) = \cos^2 x$. Determine whether $h(x)$ is in $\operatorname{span}(f(x), g(x))$.
55. $h(x) = 1$
56. $h(x) = \cos 2x$
57. $h(x) = \sin 2x$
58. $h(x) = \sin x$
59. Is $M_{22}$ spanned by $[\ \dots\ ]$, $[\ \dots\ ]$, $[\ \dots\ ]$, $[\ \dots\ ]$?
61. Is $\mathscr{P}_2$ spanned by $1 + x$, $x + x^2$, $1 + x^2$?
62. Is $\mathscr{P}_2$ spanned by $1 + x + 2x^2$, $2 + x + 2x^2$, $-1 + x + 2x^2$?
63. Prove that every vector space has a unique zero vector.
64. Prove that for every vector $\mathbf{v}$ in a vector space $V$, there is a unique $\mathbf{v}'$ in $V$ such that $\mathbf{v} + \mathbf{v}' = \mathbf{0}$.

Writing Project
The Rise of Vector Spaces

As noted in the sidebar on page 429, in the late 19th century, the mathematicians Hermann Grassmann and Giuseppe Peano were instrumental in introducing the idea of a vector space and the vector space axioms that we use today.
Grassmann's work had its origins in barycentric coordinates, a technique invented in 1827 by August Ferdinand Möbius (of Möbius strip fame). However, widespread acceptance of the vector space concept did not come until the early 20th century.

Write a report on the history of vector spaces. Discuss the origins of the notion of a vector space and the contributions of Grassmann and Peano. Why was the mathematical community slow to adopt these ideas, and how did acceptance come about?

1. Carl B. Boyer and Uta C. Merzbach, A History of Mathematics (Third Edition) (Hoboken, NJ: Wiley, 2011).
2. Jean-Luc Dorier, A General Outline of the Genesis of Vector Space Theory, Historia Mathematica 22 (1995), pp. 227–261.
3. Victor J. Katz, A History of Mathematics: An Introduction (Third Edition) (Reading, MA: Addison Wesley Longman, 2008).

6.2 Linear Independence, Basis, and Dimension

In this section, we extend the notions of linear independence, basis, and dimension to general vector spaces, generalizing the results of Sections 2.3 and 3.5. In most cases, the proofs of the theorems carry over; we simply replace $\mathbb{R}^n$ by the vector space $V$.

Linear Independence

Definition A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ in a vector space $V$ is linearly dependent if there are scalars $c_1, c_2, \dots, c_k$, at least one of which is not zero, such that

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$$

A set of vectors that is not linearly dependent is said to be linearly independent.

As in $\mathbb{R}^n$, $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ is linearly independent in a vector space $V$ if and only if

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0} \quad\text{implies}\quad c_1 = 0,\ c_2 = 0,\ \dots,\ c_k = 0$$

We also have the following useful alternative formulation of linear dependence.

Theorem 6.4 A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ in a vector space $V$ is linearly dependent if and only if at least one of the vectors can be expressed as a linear combination of the others.
Proof The proof is identical to that of Theorem 2.5.

As a special case of Theorem 6.4, note that a set of two vectors is linearly dependent if and only if one is a scalar multiple of the other.

Example 6.22 In $\mathscr{P}_2$, the set $\{1 + x + x^2, 1 - x + 3x^2, 1 + 3x - x^2\}$ is linearly dependent, since

$$2(1 + x + x^2) - (1 - x + 3x^2) = 1 + 3x - x^2$$

Example 6.23 In $M_{22}$, let $A = [\ \dots\ ]$, $B = [\ \dots\ ]$, and $C = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}$. Then $A + B = C$, so the set $\{A, B, C\}$ is linearly dependent.

Example 6.24 In $\mathscr{F}$, the set $\{\sin^2 x, \cos^2 x, \cos 2x\}$ is linearly dependent, since

$$\cos 2x = \cos^2 x - \sin^2 x$$

Example 6.25 Show that the set $\{1, x, x^2, \dots, x^n\}$ is linearly independent in $\mathscr{P}_n$.

Solution 1 Suppose that $c_0, c_1, \dots, c_n$ are scalars such that

$$c_0 \cdot 1 + c_1x + c_2x^2 + \cdots + c_nx^n = 0$$

Then the polynomial $p(x) = c_0 + c_1x + c_2x^2 + \cdots + c_nx^n$ is zero for all values of $x$. But a nonzero polynomial of degree at most $n$ cannot have more than $n$ zeros (see Appendix D). So $p(x)$ must be the zero polynomial, meaning that $c_0 = c_1 = c_2 = \cdots = c_n = 0$. Therefore, $\{1, x, x^2, \dots, x^n\}$ is linearly independent.

Solution 2 We begin, as in the first solution, by assuming that

$$p(x) = c_0 + c_1x + c_2x^2 + \cdots + c_nx^n = 0$$

Since this is true for all $x$, we can substitute $x = 0$ to obtain $c_0 = 0$. This leaves

$$c_1x + c_2x^2 + \cdots + c_nx^n = 0$$

Taking derivatives, we obtain

$$c_1 + 2c_2x + 3c_3x^2 + \cdots + nc_nx^{n-1} = 0$$

and setting $x = 0$, we see that $c_1 = 0$. Differentiating $2c_2x + 3c_3x^2 + \cdots + nc_nx^{n-1} = 0$ and setting $x = 0$, we find that $2c_2 = 0$, so $c_2 = 0$. Continuing in this fashion, we find that $k!\,c_k = 0$ for $k = 0, \dots, n$. Therefore, $c_0 = c_1 = c_2 = \cdots = c_n = 0$, and $\{1, x, x^2, \dots, x^n\}$ is linearly independent.
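Solution 2 rests on the identity $p^{(k)}(0) = k!\,c_k$. The Python sketch below illustrates that identity. It assumes a polynomial is stored as its coefficient list [c0, c1, ..., cn], and its function names are hypothetical. At each step the code differentiates symbolically and reads off the constant term, which recovers every coefficient. If $p$ is the zero function, then all of its derivatives vanish at 0, so every $c_k$ must be 0.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Differentiate c0 + c1*x + ... + cn*x^n, given as [c0, c1, ..., cn]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def coefficients_from_derivatives(coeffs):
    """Recover each c_k as p^(k)(0) / k!, the identity behind Solution 2."""
    recovered = []
    current = list(coeffs)
    for k in range(len(coeffs)):
        # p^(k)(0) is the constant term of the k-th derivative (0 once it is empty).
        value_at_zero = current[0] if current else 0
        recovered.append(Fraction(value_at_zero, factorial(k)))
        current = derivative(current)
    return recovered

# p(x) = 3 - 2x + 5x^3: the derivatives at 0 give back 3, -2, 0, 5
print(coefficients_from_derivatives([3, -2, 0, 5]))
# The zero polynomial: every derivative vanishes at 0, so every c_k is 0
print(coefficients_from_derivatives([0, 0, 0, 0]))
```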
Example 6.26 In $\mathscr{P}_2$, determine whether the set $\{1 + x, x + x^2, 1 + x^2\}$ is linearly independent.

Solution Let $c_1$, $c_2$, and $c_3$ be scalars such that

$$c_1(1 + x) + c_2(x + x^2) + c_3(1 + x^2) = 0$$

Then

$$(c_1 + c_3) + (c_1 + c_2)x + (c_2 + c_3)x^2 = 0$$

This implies that

$$\begin{aligned} c_1 + c_3 &= 0 \\ c_1 + c_2 &= 0 \\ c_2 + c_3 &= 0 \end{aligned}$$

the solution to which is $c_1 = c_2 = c_3 = 0$. It follows that $\{1 + x, x + x^2, 1 + x^2\}$ is linearly independent.

Remark Compare Example 6.26 with Example 2.23(b). The system of equations that arises is exactly the same. This is because of the correspondence between $\mathscr{P}_2$ and $\mathbb{R}^3$ that relates

$$1 + x \leftrightarrow \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad x + x^2 \leftrightarrow \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad 1 + x^2 \leftrightarrow \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$

and produces the columns of the coefficient matrix of the linear system that we have to solve. Thus, showing that $\{1 + x, x + x^2, 1 + x^2\}$ is linearly independent is equivalent to showing that

$$\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}$$

is linearly independent in $\mathbb{R}^3$. This can be done simply by establishing that the matrix

$$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$

has rank 3, by the Fundamental Theorem of Invertible Matrices.

Example 6.27 In $\mathscr{F}$, determine whether the set $\{\sin x, \cos x\}$ is linearly independent.

Solution The functions $f(x) = \sin x$ and $g(x) = \cos x$ are linearly dependent if and only if one of them is a scalar multiple of the other. But it is clear from their graphs that this is not the case, since, for example, any nonzero multiple of $f(x) = \sin x$ has the same zeros, none of which are zeros of $g(x) = \cos x$.

This approach may not always be appropriate to use, so we offer the following direct, more computational method. Suppose $c$ and $d$ are scalars such that

$$c \sin x + d \cos x = 0$$

Setting $x = 0$, we obtain $d = 0$, and setting $x = \pi/2$, we obtain $c = 0$. Therefore, the set $\{\sin x, \cos x\}$ is linearly independent.

Although the definitions of linear dependence and independence are phrased in terms of finite sets of vectors, we can extend the concepts to infinite sets as follows:

A set $S$ of vectors in a vector space $V$ is linearly dependent if it contains a finite subset of linearly dependent vectors. A set of vectors that is not linearly dependent is said to be linearly independent.

Note that for finite sets of vectors, this is just the original definition.
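The rank test described in the Remark after Example 6.26 is easy to automate. The sketch below assumes each polynomial in $\mathscr{P}_2$ is given by its coefficient vector [a0, a1, a2], which is the correspondence with $\mathbb{R}^3$ from that Remark. It uses exact Gaussian elimination. The names `rank` and `coefficient_matrix` are illustrative helpers. A set of k polynomials is linearly independent exactly when the resulting matrix has rank k.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (a list of rows), by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def coefficient_matrix(polys):
    """Place each coefficient vector [a0, a1, a2] as a column of the matrix."""
    return [list(row) for row in zip(*polys)]

# Example 6.26: 1 + x, x + x^2, 1 + x^2
independent = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
# Example 6.22: 1 + x + x^2, 1 - x + 3x^2, 1 + 3x - x^2
dependent = [[1, 1, 1], [1, -1, 3], [1, 3, -1]]

print(rank(coefficient_matrix(independent)))  # 3, so linearly independent
print(rank(coefficient_matrix(dependent)))    # 2, so linearly dependent
```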
Following is an example of an infinite set of linearly independent vectors.

Example 6.28 In $\mathscr{P}$, show that $S = \{1, x, x^2, \dots\}$ is linearly independent.

Solution Suppose there is a finite subset $T$ of $S$ that is linearly dependent. Let $x^m$ be the highest power of $x$ in $T$ and let $x^n$ be the lowest power of $x$ in $T$. Then there are scalars $c_n, c_{n+1}, \dots, c_m$, not all zero, such that

$$c_nx^n + c_{n+1}x^{n+1} + \cdots + c_mx^m = 0$$

(where the coefficient of any power of $x$ between $x^n$ and $x^m$ that is not in $T$ is taken to be zero). But, by an argument similar to that used in Example 6.25, this implies that $c_n = c_{n+1} = \cdots = c_m = 0$, which is a contradiction. Hence, $S$ cannot contain a finite linearly dependent subset, so it is linearly independent.

Bases

The important concept of a basis now can be extended easily to arbitrary vector spaces.

Definition A subset $\mathcal{B}$ of a vector space $V$ is a basis for $V$ if
1. $\mathcal{B}$ spans $V$ and
2. $\mathcal{B}$ is linearly independent.

Example 6.29 If $\mathbf{e}_i$ is the $i$th column of the $n \times n$ identity matrix, then $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis for $\mathbb{R}^n$, called the standard basis for $\mathbb{R}^n$.

Example 6.30 $\{1, x, x^2, \dots, x^n\}$ is a basis for $\mathscr{P}_n$, called the standard basis for $\mathscr{P}_n$.

Example 6.31 The set $\mathcal{E} = \{E_{11}, \dots, E_{1n}, E_{21}, \dots, E_{2n}, \dots, E_{m1}, \dots, E_{mn}\}$ is a basis for $M_{mn}$, where the matrices $E_{ij}$ are as defined in Example 6.18. $\mathcal{E}$ is called the standard basis for $M_{mn}$.

We have already seen that $\mathcal{E}$ spans $M_{mn}$. It is easy to show that $\mathcal{E}$ is linearly independent. (Verify this!) Hence, $\mathcal{E}$ is a basis for $M_{mn}$.

Example 6.32 Show that $\mathcal{B} = \{1 + x, x + x^2, 1 + x^2\}$ is a basis for $\mathscr{P}_2$.

Solution We have already shown that $\mathcal{B}$ is linearly independent, in Example 6.26.
To show that $\mathcal{B}$ spans $\mathscr{P}_2$, let $a + bx + cx^2$ be an arbitrary polynomial in $\mathscr{P}_2$. We must show that there are scalars $c_1$, $c_2$, and $c_3$ such that

$$c_1(1 + x) + c_2(x + x^2) + c_3(1 + x^2) = a + bx + cx^2$$

or, equivalently,

$$(c_1 + c_3) + (c_1 + c_2)x + (c_2 + c_3)x^2 = a + bx + cx^2$$

Equating coefficients of like powers of $x$, we obtain the linear system

$$\begin{aligned} c_1 + c_3 &= a \\ c_1 + c_2 &= b \\ c_2 + c_3 &= c \end{aligned}$$

which has a solution, since the coefficient matrix $\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$ has rank 3 and, hence, is invertible. (We do not need to know what the solution is; we only need to know that it exists.) Therefore, $\mathcal{B}$ is a basis for $\mathscr{P}_2$.

Remark Observe that the matrix $\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$ is the key to Example 6.32. We can immediately obtain it using the correspondence between $\mathscr{P}_2$ and $\mathbb{R}^3$, as indicated in the Remark following Example 6.26.

Example 6.33 Show that $\mathcal{B} = \{1, x, x^2, \dots\}$ is a basis for $\mathscr{P}$.

Solution In Example 6.28, we saw that $\mathcal{B}$ is linearly independent. It also spans $\mathscr{P}$, since clearly every polynomial is a linear combination of (finitely many) powers of $x$.

Example 6.34 Find bases for the three vector spaces in Example 6.13:
(a) $W_1 = \left\{ \begin{bmatrix} a \\ b \\ -b \\ a \end{bmatrix} \right\}$ (b) $W_2 = \{a + bx - bx^2 + ax^3\}$ (c) $W_3 = \left\{ \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \right\}$

Solution Once again, we will work the three examples in parallel to highlight the similarities among them. In a strong sense, they are all the same example, but it will take us until Section 6.5 to make this idea perfectly precise.

(a) Since

$$\begin{bmatrix} a \\ b \\ -b \\ a \end{bmatrix} = a\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix}$$

we have $W_1 = \operatorname{span}(\mathbf{u}, \mathbf{v})$, where $\mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix}$. Since $\{\mathbf{u}, \mathbf{v}\}$ is clearly linearly independent, it is also a basis for $W_1$.

(b) Since

$$a + bx - bx^2 + ax^3 = a(1 + x^3) + b(x - x^2)$$

we have $W_2 = \operatorname{span}(u(x), v(x))$, where $u(x) = 1 + x^3$ and $v(x) = x - x^2$. Since $\{u(x), v(x)\}$ is clearly linearly independent, it is also a basis for $W_2$.

(c) Since

$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$

we have $W_3 = \operatorname{span}(U, V)$, where $U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $V = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Since $\{U, V\}$ is clearly linearly independent, it is also a basis for $W_3$.
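Example 6.32 needs only the existence of a solution. Solving the system $c_1 + c_3 = a$, $c_1 + c_2 = b$, $c_2 + c_3 = c$ by hand gives $c_1 = (a + b - c)/2$, $c_2 = (-a + b + c)/2$, and $c_3 = (a - b + c)/2$. This explicit formula is added here only for illustration. The Python sketch below assumes coefficient lists [constant, x, x^2] and exact `Fraction` arithmetic. It computes these scalars, then rebuilds the combination to confirm that an arbitrary $a + bx + cx^2$ is reached. The names `scalars_for` and `combine` are hypothetical.

```python
from fractions import Fraction

# Basis B = {1 + x, x + x^2, 1 + x^2} of P_2, each stored as [constant, x, x^2].
B = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]

def scalars_for(a, b, c):
    """Solve c1 + c3 = a, c1 + c2 = b, c2 + c3 = c (the system in Example 6.32)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    c1 = (a + b - c) / 2
    c2 = (-a + b + c) / 2
    c3 = (a - b + c) / 2
    return c1, c2, c3

def combine(scalars, basis):
    """Coefficient list of sum(s * v) over the paired scalars s and basis vectors v."""
    return [sum(s * v[i] for s, v in zip(scalars, basis)) for i in range(3)]

scalars = scalars_for(1, 2, -1)      # the polynomial 1 + 2x - x^2
print(scalars)                       # c1 = 2, c2 = 0, c3 = -1 (as Fractions)
print(combine(scalars, B) == [1, 2, -1])  # True
```

Any triple `(a, b, c)` passed to `scalars_for` is reproduced by `combine`, which is the spanning property in concrete form.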
Coordinates

Section 3.5 introduced the idea of the coordinates of a vector with respect to a basis for subspaces of $\mathbb{R}^n$. We now extend this concept to arbitrary vector spaces.

Theorem 6.5 Let $V$ be a vector space and let $\mathcal{B}$ be a basis for $V$. For every vector $\mathbf{v}$ in $V$, there is exactly one way to write $\mathbf{v}$ as a linear combination of the basis vectors in $\mathcal{B}$.

Proof The proof is the same as the proof of Theorem 3.29. It works even if the basis $\mathcal{B}$ is infinite, since linear combinations are, by definition, finite.

The converse of Theorem 6.5 is also true. That is, if $\mathcal{B}$ is a set of vectors in a vector space $V$ with the property that every vector in $V$ can be written uniquely as a linear combination of the vectors in $\mathcal{B}$, then $\mathcal{B}$ is a basis for $V$ (see Exercise 30). In this sense, the unique representation property characterizes a basis.

Since representation of a vector with respect to a basis is unique, the next definition makes sense.

Definition Let $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a basis for a vector space $V$. Let $\mathbf{v}$ be a vector in $V$, and write $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$. Then $c_1, c_2, \dots, c_n$ are called the coordinates of $\mathbf{v}$ with respect to $\mathcal{B}$, and the column vector

$$[\mathbf{v}]_\mathcal{B} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$

is called the coordinate vector of $\mathbf{v}$ with respect to $\mathcal{B}$.

Observe that if the basis $\mathcal{B}$ of $V$ has $n$ vectors, then $[\mathbf{v}]_\mathcal{B}$ is a (column) vector in $\mathbb{R}^n$.

Example 6.35 Find the coordinate vector $[p(x)]_\mathcal{B}$ of $p(x) = 2 - 3x + 5x^2$ with respect to the standard basis $\mathcal{B} = \{1, x, x^2\}$ of $\mathscr{P}_2$.

Solution The polynomial $p(x)$ is already a linear combination of $1$, $x$, and $x^2$, so

$$[p(x)]_\mathcal{B} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$$

This is the correspondence between $\mathscr{P}_2$ and $\mathbb{R}^3$ that we remarked on after Example 6.26. It generalizes easily: the coordinate vector of a polynomial $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ with respect to the standard basis $\mathcal{B} = \{1, x, x^2, \dots, x^n\}$ is just the vector

$$[p(x)]_\mathcal{B} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

in $\mathbb{R}^{n+1}$.

Remark The order in which the basis vectors appear in $\mathcal{B}$ affects the order of the entries in a coordinate vector.
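For example, in Example 6.35, assume that the standard basis vectors are ordered as $\mathcal{B}' = \{x^2, x, 1\}$. Then the coordinate vector of $p(x) = 2 - 3x + 5x^2$ with respect to $\mathcal{B}'$ is

$$[p(x)]_{\mathcal{B}'} = \begin{bmatrix} 5 \\ -3 \\ 2 \end{bmatrix}$$

Example 6.36 Find the coordinate vector $[A]_\mathcal{B}$ of $A = \begin{bmatrix} 2 & -1 \\ 4 & 3 \end{bmatrix}$ with respect to the standard basis $\mathcal{B} = \{E_{11}, E_{12}, E_{21}, E_{22}\}$ of $M_{22}$.

Solution Since

$$A = \begin{bmatrix} 2 & -1 \\ 4 & 3 \end{bmatrix} = 2E_{11} - E_{12} + 4E_{21} + 3E_{22}$$

we have

$$[A]_\mathcal{B} = \begin{bmatrix} 2 \\ -1 \\ 4 \\ 3 \end{bmatrix}$$

This is the correspondence between $M_{22}$ and $\mathbb{R}^4$ that we noted before the introduction to Example 6.13. It too can easily be generalized to give a correspondence between $M_{mn}$ and $\mathbb{R}^{mn}$.

Example 6.37 Find the coordinate vector $[p(x)]_\mathcal{C}$ of $p(x) = 1 + 2x - x^2$ with respect to the basis $\mathcal{C} = \{1 + x, x + x^2, 1 + x^2\}$ of $\mathscr{P}_2$.

Solution We need to find $c_1$, $c_2$, and $c_3$ such that

$$c_1(1 + x) + c_2(x + x^2) + c_3(1 + x^2) = 1 + 2x - x^2$$

or, equivalently,

$$(c_1 + c_3) + (c_1 + c_2)x + (c_2 + c_3)x^2 = 1 + 2x - x^2$$

As in Example 6.32, this means we need to solve the system

$$\begin{aligned} c_1 + c_3 &= 1 \\ c_1 + c_2 &= 2 \\ c_2 + c_3 &= -1 \end{aligned}$$

whose solution is found to be $c_1 = 2$, $c_2 = 0$, $c_3 = -1$. Therefore,

$$[p(x)]_\mathcal{C} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$$

(Since this result says that $p(x) = 2(1 + x) - (1 + x^2)$, it is easy to check that it is correct.)

The next theorem shows that the process of forming coordinate vectors is compatible with the vector space operations of addition and scalar multiplication.

Theorem 6.6 Let $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a basis for a vector space $V$. Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $V$ and let $c$ be a scalar. Then
a. $[\mathbf{u} + \mathbf{v}]_\mathcal{B} = [\mathbf{u}]_\mathcal{B} + [\mathbf{v}]_\mathcal{B}$
b. $[c\mathbf{u}]_\mathcal{B} = c[\mathbf{u}]_\mathcal{B}$

Proof We begin by writing $\mathbf{u}$ and $\mathbf{v}$ in terms of the basis vectors, say, as

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \quad\text{and}\quad \mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n$$

Then, using vector space properties, we have

$$\mathbf{u} + \mathbf{v} = (c_1 + d_1)\mathbf{v}_1 + \cdots + (c_n + d_n)\mathbf{v}_n \quad\text{and}\quad c\mathbf{u} = (cc_1)\mathbf{v}_1 + \cdots + (cc_n)\mathbf{v}_n$$

and so

$$[\mathbf{u} + \mathbf{v}]_\mathcal{B} = \begin{bmatrix} c_1 + d_1 \\ c_2 + d_2 \\ \vdots \\ c_n + d_n \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = [\mathbf{u}]_\mathcal{B} + [\mathbf{v}]_\mathcal{B}$$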
and

$$[c\mathbf{u}]_\mathcal{B} = \begin{bmatrix} cc_1 \\ cc_2 \\ \vdots \\ cc_n \end{bmatrix} = c\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = c[\mathbf{u}]_\mathcal{B}$$

An easy corollary to Theorem 6.6 states that coordinate vectors preserve linear combinations:

$$[c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k]_\mathcal{B} = c_1[\mathbf{u}_1]_\mathcal{B} + \cdots + c_k[\mathbf{u}_k]_\mathcal{B} \qquad (1)$$

You are asked to prove this corollary in Exercise 31.

The most useful aspect of coordinate vectors is that they allow us to transfer information from a general vector space to $\mathbb{R}^n$, where we have the tools of Chapters 1 to 3 at our disposal. We will explore this idea in some detail in Sections 6.3 and 6.6. For now, we have the following useful theorem.

Theorem 6.7 Let $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a basis for a vector space $V$ and let $\mathbf{u}_1, \dots, \mathbf{u}_k$ be vectors in $V$. Then $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ is linearly independent in $V$ if and only if $\{[\mathbf{u}_1]_\mathcal{B}, \dots, [\mathbf{u}_k]_\mathcal{B}\}$ is linearly independent in $\mathbb{R}^n$.

Proof Assume that $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ is linearly independent in $V$ and let

$$c_1[\mathbf{u}_1]_\mathcal{B} + \cdots + c_k[\mathbf{u}_k]_\mathcal{B} = \mathbf{0}$$

in $\mathbb{R}^n$. But then we have

$$[c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k]_\mathcal{B} = \mathbf{0}$$

using Equation (1), so the coordinates of the vector $c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k$ with respect to $\mathcal{B}$ are all zero. That is,

$$c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k = 0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_n = \mathbf{0}$$

The linear independence of $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ now forces $c_1 = c_2 = \cdots = c_k = 0$, so $\{[\mathbf{u}_1]_\mathcal{B}, \dots, [\mathbf{u}_k]_\mathcal{B}\}$ is linearly independent.

The converse implication, which uses similar ideas, is left as Exercise 32.

Observe that, in the special case where $\mathbf{u}_i = \mathbf{v}_i$, we have

$$\mathbf{v}_i = 0 \cdot \mathbf{v}_1 + \cdots + 1 \cdot \mathbf{v}_i + \cdots + 0 \cdot \mathbf{v}_n$$

so $[\mathbf{v}_i]_\mathcal{B} = \mathbf{e}_i$, the $i$th standard basis vector of $\mathbb{R}^n$.
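For a non-standard basis, as in Example 6.37, a coordinate vector comes from solving a square linear system. Theorem 6.6 says that the map $\mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}$ respects addition and scalar multiplication. The sketch below checks both facts numerically for the basis $\mathcal{C} = \{1 + x, x + x^2, 1 + x^2\}$, assuming polynomials are stored as coefficient lists. `coordinates` is a hypothetical helper. It performs Gauss–Jordan elimination with exact fractions and raises `ValueError` if the given vectors are not a basis.

```python
from fractions import Fraction

def coordinates(basis, vector):
    """Return [vector]_basis, where basis holds n coefficient vectors in R^n."""
    n = len(vector)
    # Augmented matrix [v1 v2 ... vn | vector], with the basis vectors as columns.
    aug = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(vector[i])]
           for i in range(n)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        # Clear this column in every other row (Gauss-Jordan elimination).
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]

# Basis C = {1 + x, x + x^2, 1 + x^2} of P_2, as coefficient lists [constant, x, x^2].
C = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
p = [1, 2, -1]   # p(x) = 1 + 2x - x^2 (Example 6.37)
q = [2, -3, 5]   # q(x) = 2 - 3x + 5x^2 (the polynomial of Example 6.35)

print(coordinates(C, p))  # 2, 0, -1 as Fractions, i.e. p(x) = 2(1 + x) - (1 + x^2)

# Theorem 6.6: [p + q]_C = [p]_C + [q]_C and [3p]_C = 3[p]_C
p_plus_q = [a + b for a, b in zip(p, q)]
print(coordinates(C, p_plus_q) == [a + b for a, b in zip(coordinates(C, p), coordinates(C, q))])  # True
print(coordinates(C, [3 * a for a in p]) == [3 * c for c in coordinates(C, p)])  # True
```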
Dimension

The definition of dimension is the same for a vector space as for a subspace of $\mathbb{R}^n$: the number of vectors in a basis for the space. Since a vector space can have more than one basis, we need to show that this definition makes sense; that is, we need to establish that different bases for the same vector space contain the same number of vectors. Part (a) of the next theorem generalizes Theorem 2.8.

Theorem 6.8 Let $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a basis for a vector space $V$.
a. Any set of more than $n$ vectors in $V$ must be linearly dependent.
b. Any set of fewer than $n$ vectors in $V$ cannot span $V$.

Proof (a) Let $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ be a set of vectors in $V$, with $m > n$. Then $\{[\mathbf{u}_1]_\mathcal{B}, \dots, [\mathbf{u}_m]_\mathcal{B}\}$ is a set of more than $n$ vectors in $\mathbb{R}^n$ and, hence, is linearly dependent, by Theorem 2.8. This means that $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ is linearly dependent as well, by Theorem 6.7.
(b) Let $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ be a set of vectors in $V$, with $m < n$. Then $S = \{[\mathbf{u}_1]_\mathcal{B}, \dots, [\mathbf{u}_m]_\mathcal{B}\}$ is a set of fewer than $n$ vectors in $\mathbb{R}^n$. Now $\operatorname{span}(\mathbf{u}_1, \dots, \mathbf{u}_m) = V$ if and only if $\operatorname{span}(S) = \mathbb{R}^n$ (see Exercise 33). But $\operatorname{span}(S)$ is just the column space of the $n \times m$ matrix

$$A = \big[\, [\mathbf{u}_1]_\mathcal{B} \ \cdots \ [\mathbf{u}_m]_\mathcal{B} \,\big]$$

so $\dim(\operatorname{span}(S)) = \dim(\operatorname{col}(A)) \le m < n$. Hence, $S$ cannot span $\mathbb{R}^n$, so $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ does not span $V$.

Now we extend Theorem 3.23.

Theorem 6.9 The Basis Theorem
If a vector space $V$ has a basis with $n$ vectors, then every basis for $V$ has exactly $n$ vectors.

Proof The proof of Theorem 3.23 also works here, virtually word for word. However, it is easier to make use of Theorem 6.8. Let $\mathcal{B}$ be a basis for $V$ with $n$ vectors and let $\mathcal{B}'$ be another basis for $V$ with $m$ vectors. By Theorem 6.8, $m \le n$; otherwise, $\mathcal{B}'$ would be linearly dependent. Now use Theorem 6.8 with the roles of $\mathcal{B}$ and $\mathcal{B}'$ interchanged. Since $\mathcal{B}'$ is a basis of $V$ with $m$ vectors, Theorem 6.8 implies that any set of more than $m$ vectors in $V$ is linearly dependent. Hence, $n \le m$, since $\mathcal{B}$ is a basis and is, therefore, linearly independent. Since $n \le m$ and $m \le n$, we must have $n = m$, as required.

The following definition now makes sense, since the number of vectors in a (finite) basis does not depend on the choice of basis.

Definition A vector space $V$ is called finite-dimensional if it has a basis consisting of finitely many vectors. The dimension of $V$, denoted by $\dim V$, is the number of vectors in a basis for $V$.
The dimension of the zero vector space $\{\mathbf{0}\}$ is defined to be zero. A vector space that has no finite basis is called infinite-dimensional.

Example 6.38 Since the standard basis for $\mathbb{R}^n$ has $n$ vectors, $\dim \mathbb{R}^n = n$. In the case of $\mathbb{R}^3$, a one-dimensional subspace is just the span of a single nonzero vector and thus is a line through the origin. A two-dimensional subspace is spanned by its basis of two linearly independent (i.e., nonparallel) vectors and therefore is a plane through the origin. Any three linearly independent vectors must span $\mathbb{R}^3$, by the Fundamental Theorem. The subspaces of $\mathbb{R}^3$ are now completely classified according to dimension, as shown in Table 6.1.

Table 6.1
dim V    V
3        $\mathbb{R}^3$
2        Plane through the origin
1        Line through the origin
0        $\{\mathbf{0}\}$

Example 6.39 The standard basis for $\mathscr{P}_n$ contains $n + 1$ vectors (see Example 6.30), so $\dim \mathscr{P}_n = n + 1$.

Example 6.40 The standard basis for $M_{mn}$ contains $mn$ vectors (see Example 6.31), so $\dim M_{mn} = mn$.

Example 6.41 Both $\mathscr{P}$ and $\mathscr{F}$ are infinite-dimensional, since they each contain the infinite linearly independent set $\{1, x, x^2, \dots\}$ (see Exercise 44).

Example 6.42 Find the dimension of the vector space $W$ of symmetric $2 \times 2$ matrices (see Example 6.10).

Solution A symmetric $2 \times 2$ matrix is of the form

$$\begin{bmatrix} a & b \\ b & c \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

so $W$ is spanned by the set

$$S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$$

If $S$ is linearly independent, then it will be a basis for $W$. Setting

$$a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

we obtain

$$\begin{bmatrix} a & b \\ b & c \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

from which it immediately follows that $a = b = c = 0$. Hence, $S$ is linearly independent and is, therefore, a basis for $W$. We conclude that $\dim W = 3$.
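By Theorem 6.7, the linear independence of the set $S$ in Example 6.42 can also be checked through coordinate vectors in $\mathbb{R}^4$. These are taken with respect to the standard basis $\{E_{11}, E_{12}, E_{21}, E_{22}\}$ of $M_{22}$, as in Example 6.36. The minimal Python sketch below flattens each matrix and computes a rank with exact arithmetic. Its helper names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors (treated as rows), by exact elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def coords_2x2(m):
    """Coordinate vector of a 2x2 matrix w.r.t. {E11, E12, E21, E22} (Example 6.36)."""
    return [m[0][0], m[0][1], m[1][0], m[1][1]]

S = [
    [[1, 0], [0, 0]],
    [[0, 1], [1, 0]],
    [[0, 0], [0, 1]],
]
print(rank([coords_2x2(m) for m in S]))  # 3: S is linearly independent, so dim W = 3
```

Adding a nonsymmetric matrix such as $E_{12}$ to $S$ raises the rank to 4. The enlarged set is then a basis for $M_{22}$ by Theorem 6.10(c).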
The dimension of a vector space is its "magic number." Knowing the dimension of a vector space $V$ provides us with much information about $V$ and can greatly simplify the work needed in certain types of calculations, as the next few theorems and examples illustrate.

Theorem 6.10 Let $V$ be a vector space with $\dim V = n$. Then:
a. Any linearly independent set in $V$ contains at most $n$ vectors.
b. Any spanning set for $V$ contains at least $n$ vectors.
c. Any linearly independent set of exactly $n$ vectors in $V$ is a basis for $V$.
d. Any spanning set for $V$ consisting of exactly $n$ vectors is a basis for $V$.
e. Any linearly independent set in $V$ can be extended to a basis for $V$.
f. Any spanning set for $V$ can be reduced to a basis for $V$.

Proof The proofs of properties (a) and (b) follow from parts (a) and (b) of Theorem 6.8, respectively.
(c) Let $S$ be a linearly independent set of exactly $n$ vectors in $V$. If $S$ does not span $V$, then there is some vector $\mathbf{v}$ in $V$ that is not a linear combination of the vectors in $S$. Inserting $\mathbf{v}$ into $S$ produces a set $S'$ with $n + 1$ vectors that is still linearly independent (see Exercise 54). But this is impossible, by Theorem 6.8(a). We conclude that $S$ must span $V$ and therefore be a basis for $V$.
(d) Let $S$ be a spanning set for $V$ consisting of exactly $n$ vectors. If $S$ is linearly dependent, then some vector $\mathbf{v}$ in $S$ is a linear combination of the others. Throwing $\mathbf{v}$ away leaves a set $S'$ with $n - 1$ vectors that still spans $V$ (see Exercise 55). But this is impossible, by Theorem 6.8(b). We conclude that $S$ must be linearly independent and therefore be a basis for $V$.
(e) Let $S$ be a linearly independent set of vectors in $V$. If $S$ spans $V$, it is a basis for $V$ and so consists of exactly $n$ vectors, by the Basis Theorem. If $S$ does not span $V$, then, as in the proof of property (c), there is some vector $\mathbf{v}$ in $V$ that is not a linear combination of the vectors in $S$. Inserting $\mathbf{v}$ into $S$ produces a set $S'$ that is still linearly independent. If $S'$ still does not span $V$, we can repeat the process and expand it into a larger, linearly independent set. Eventually, this process must stop, since no linearly independent set in $V$ can contain more than $n$ vectors, by Theorem 6.8(a).
When the process stops, we have a linearly independent set $S^*$ that contains $S$ and also spans $V$. Therefore, $S^*$ is a basis for $V$ that extends $S$.
(f) You are asked to prove this property in Exercise 56.

You should view Theorem 6.10 as, in part, a labor-saving device. In many instances, it can dramatically decrease the amount of work needed to check that a set of vectors is linearly independent, a spanning set, or a basis.

Example 6.43 In each case, determine whether $S$ is a basis for $V$.
(a) $V = \mathscr{P}_2$, $S = \{1 + x, 2 - x + x^2, 3x - 2x^2, -1 + 3x + x^2\}$
(b) $V = M_{22}$, $S = \{[\ \dots\ ], [\ \dots\ ], [\ \dots\ ]\}$
(c) $V = \mathscr{P}_2$, $S = \{1 + x, x + x^2, 1 + x^2\}$

Solution (a) Since $\dim(\mathscr{P}_2) = 3$ and $S$ contains four vectors, $S$ is linearly dependent, by Theorem 6.10(a). Hence, $S$ is not a basis for $\mathscr{P}_2$.
(b) Since $\dim(M_{22}) = 4$ and $S$ contains three vectors, $S$ cannot span $M_{22}$, by Theorem 6.10(b). Hence, $S$ is not a basis for $M_{22}$.
(c) Since $\dim(\mathscr{P}_2) = 3$ and $S$ contains three vectors, $S$ will be a basis for $\mathscr{P}_2$ if it is linearly independent or if it spans $\mathscr{P}_2$, by Theorem 6.10(c) or (d). It is easier to show that $S$ is linearly independent; we did this in Example 6.26. Therefore, $S$ is a basis for $\mathscr{P}_2$. (This is the same problem as in Example 6.32, but see how much easier it becomes using Theorem 6.10!)

Example 6.44 Extend $\{1 + x, 1 - x\}$ to a basis for $\mathscr{P}_2$.

Solution First note that $\{1 + x, 1 - x\}$ is linearly independent. (Why?) Since $\dim(\mathscr{P}_2) = 3$, we need a third vector, one that is not linearly dependent on the first two.

We could proceed, as in the proof of Theorem 6.10(e), to find such a vector using trial and error. However, it is easier in practice to proceed in a different way. We enlarge the given set of vectors by throwing in the entire standard basis for $\mathscr{P}_2$. This gives

$$S = \{1 + x, 1 - x, 1, x, x^2\}$$
Now $S$ is linearly dependent, by Theorem 6.10(a), so we need to throw away some vectors (in this case, two). Which ones? We use Theorem 6.10(f), starting with the first vector that was added, $1$. Since $1 = \tfrac{1}{2}(1 + x) + \tfrac{1}{2}(1 - x)$, the set $\{1 + x, 1 - x, 1\}$ is linearly dependent, so we throw away $1$. Similarly, $x = \tfrac{1}{2}(1 + x) - \tfrac{1}{2}(1 - x)$, so $\{1 + x, 1 - x, x\}$ is linearly dependent also. Finally, we check that $\{1 + x, 1 - x, x^2\}$ is linearly independent. (Can you see a quick way to tell this?) Therefore, $\{1 + x, 1 - x, x^2\}$ is a basis for $\mathscr{P}_2$ that extends $\{1 + x, 1 - x\}$.

In Example 6.42, the vector space $W$ of symmetric $2 \times 2$ matrices is a subspace of the vector space $M_{22}$ of all $2 \times 2$ matrices. As we showed, $\dim W = 3 \le 4 = \dim M_{22}$. This is an example of a general result, as the final theorem of this section shows.

Theorem 6.11 Let $W$ be a subspace of a finite-dimensional vector space $V$. Then:
a. $W$ is finite-dimensional and $\dim W \le \dim V$.
b. $\dim W = \dim V$ if and only if $W = V$.

Proof (a) Let $\dim V = n$. If $W = \{\mathbf{0}\}$, then $\dim(W) = 0 \le n = \dim V$. If $W$ is nonzero, choose a nonzero vector $\mathbf{w}_1$ in $W$. If $\{\mathbf{w}_1\}$ does not span $W$, insert a vector of $W$ that is not in its span; by Exercise 54, the enlarged set is still linearly independent. Repeating this step produces larger and larger linearly independent sets of vectors in $W$, and these are also linearly independent sets in $V$. By Theorem 6.10(a), none of them can contain more than $n$ vectors, so the process must stop. When it does, we have a basis for $W$ containing at most $n$ vectors. Hence, $W$ is finite-dimensional and $\dim(W) \le n = \dim V$.
(b) If $W = V$, then certainly $\dim W = \dim V$. On the other hand, if $\dim W = \dim V = n$, then any basis $\mathcal{B}$ for $W$ consists of exactly $n$ vectors. But these are then $n$ linearly independent vectors in $V$ and, hence, a basis for $V$, by Theorem 6.10(c). Therefore, $V = \operatorname{span}(\mathcal{B}) = W$.
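The enlarge-and-prune procedure of Example 6.44 can be written as a greedy loop. The loop runs through the standard basis and keeps a vector only when it is not a linear combination of the vectors already kept, that is, only when it raises the rank. The Python sketch below is one way to code that idea, assuming coefficient lists [constant, x, x^2] for $\mathscr{P}_2$. `extend_to_basis` is a hypothetical name. Recomputing the rank at every step favors clarity over speed.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors (treated as rows), by exact elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(independent, standard_basis):
    """Append each standard basis vector that is not in the span of those kept so far."""
    basis = list(independent)
    for e in standard_basis:
        if rank(basis + [e]) > rank(basis):
            basis.append(e)
    return basis

start = [[1, 1, 0], [1, -1, 0]]               # 1 + x, 1 - x
standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 1, x, x^2
print(extend_to_basis(start, standard))  # [[1, 1, 0], [1, -1, 0], [0, 0, 1]], i.e. {1 + x, 1 - x, x^2}
```

As in the worked example, $1$ and $x$ are discarded because each is a combination of $1 + x$ and $1 - x$, and only $x^2$ is kept.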
Exercises 6.2

In Exercises 1–4, test the sets of matrices for linear independence in $M_{22}$. For those that are linearly dependent, express one of the matrices as a linear combination of the others.
4. $\{[\ \dots\ ], [\ \dots\ ], [\ \dots\ ], [\ \dots\ ]\}$

In Exercises 5–9, test the sets of polynomials for linear independence. For those that are linearly dependent, express one of the polynomials as a linear combination of the others.
5. $\{x, 1 + x\}$ in $\mathscr{P}_1$
6. $\{1 + x, 1 + x^2, 1 - x + x^2\}$ in $\mathscr{P}_2$
7. $\{x, 2x - x^2, 3x + 2x^2\}$ in $\mathscr{P}_2$
8. $\{2x, x - x^2, 1 + x^3, 2 - x^2 + x^3\}$ in $\mathscr{P}_3$
9. $\{1 - 2x, 3x + x^2 - x^3, 1 + x^2 + 2x^3, 3 + 2x + 3x^3\}$ in $\mathscr{P}_3$

In Exercises 10–14, test the sets of functions for linear independence in $\mathscr{F}$. For those that are linearly dependent, express one of the functions as a linear combination of the others.
10. $\{1, \sin x, \cos x\}$
11. $\{1, \sin^2 x, \cos^2 x\}$
12. $\{e^x, e^{-x}\}$
13. $\{1, \ln(2x), \ln(x^2)\}$
14. $\{\sin x, \sin 2x, \sin 3x\}$
15. If $f$ and $g$ are in $\mathscr{C}^{(1)}$, the vector space of all functions with continuous derivatives, then the determinant

$$W(x) = \begin{vmatrix} f(x) & g(x) \\ f'(x) & g'(x) \end{vmatrix}$$

is called the Wronskian of $f$ and $g$ [named after the Polish-French mathematician Józef Maria Hoene-Wroński (1776–1853), who worked on the theory of determinants and the philosophy of mathematics]. Show that $f$ and $g$ are linearly independent if their Wronskian is not identically zero (that is, if there is some $x$ such that $W(x) \ne 0$).
16. In general, the Wronskian of $f_1, \dots, f_n$ in $\mathscr{C}^{(n-1)}$ is the determinant

$$W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}$$

and $f_1, \dots, f_n$ are linearly independent, provided $W(x)$ is not identically zero. Repeat Exercises 10–14 using the Wronskian test.
17. Let $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ be a linearly independent set of vectors in a vector space $V$.
(a) Is $\{\mathbf{u} + \mathbf{v}, \mathbf{v} + \mathbf{w}, \mathbf{u} + \mathbf{w}\}$ linearly independent? Either prove that it is or give a counterexample to show that it is not.
(b) Is $\{\mathbf{u} - \mathbf{v}, \mathbf{v} - \mathbf{w}, \mathbf{u} - \mathbf{w}\}$ linearly independent? Either prove that it is or give a counterexample to show that it is not.

In Exercises 18–25, determine whether the set $\mathcal{B}$ is a basis for the vector space $V$.
18. $V = M_{22}$, $\mathcal{B} = \{[\ \dots\ ], [\ \dots\ ], [\ \dots\ ]\}$
19. $V = M_{22}$, $\mathcal{B} = \{[\ \dots\ ], [\ \dots\ ], [\ \dots\ ], [\ \dots\ ]\}$
20. $V = M_{22}$, $\mathcal{B} = \{\dots\}$
$V = M_{22}$, $\mathcal{B} = $ [matrices illegible in source]
22. $V = \mathscr{P}_2$, $\mathcal{B} = \{x, 1 + x, x - x^2\}$
23. $V = \mathscr{P}_2$, $\mathcal{B} = \{1 - x, 1 - x^2, x - x^2\}$
24. $V = \mathscr{P}_2$, $\mathcal{B} = \{1, 1 + 2x + 3x^2\}$
25. $V = \mathscr{P}_2$, $\mathcal{B} = \{1, 2 - x, 3 - x^2, x + 2x^2\}$
26. Find the coordinate vector of $A = $ [matrix illegible in source] with respect to the basis $\mathcal{B} = \{E_{22}, E_{21}, E_{12}, E_{11}\}$ of $M_{22}$.
27. Find the coordinate vector of $A = $ [matrix illegible in source] with respect to the basis $\mathcal{B} = $ [matrices illegible in source] of $M_{22}$.
28. Find the coordinate vector of $p(x) = 1 + 2x + 3x^2$ with respect to the basis $\mathcal{B} = \{1 + x, 1 - x, x^2\}$ of $\mathscr{P}_2$.
29. Find the coordinate vector of $p(x) = 2 - x + 3x^2$ with respect to the basis $\mathcal{B} = \{1, 1 + x, -1 + x^2\}$ of $\mathscr{P}_2$.
30. Let $\mathcal{B}$ be a set of vectors in a vector space $V$ with the property that every vector in $V$ can be written uniquely as a linear combination of the vectors in $\mathcal{B}$. Prove that $\mathcal{B}$ is a basis for $V$.
31. Let $\mathcal{B}$ be a basis for a vector space $V$, let $\mathbf{u}_1, \ldots, \mathbf{u}_k$ be vectors in $V$, and let $c_1, \ldots, c_k$ be scalars. Show that $[c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k]_{\mathcal{B}} = c_1[\mathbf{u}_1]_{\mathcal{B}} + \cdots + c_k[\mathbf{u}_k]_{\mathcal{B}}$.
32. Finish the proof of Theorem 6.7 by showing that if $\{[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_k]_{\mathcal{B}}\}$ is linearly independent in $\mathbb{R}^n$, then $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is linearly independent in $V$.
33. Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_m\}$ be a set of vectors in an $n$-dimensional vector space $V$ and let $\mathcal{B}$ be a basis for $V$. Let $S = \{[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_m]_{\mathcal{B}}\}$ be the set of coordinate vectors of $\{\mathbf{u}_1, \ldots, \mathbf{u}_m\}$ with respect to $\mathcal{B}$. Prove that $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_m) = V$ if and only if $\operatorname{span}(S) = \mathbb{R}^n$.

In Exercises 34–39, find the dimension of the vector space $V$ and give a basis for $V$.
34. $V = \{p(x) \text{ in } \mathscr{P}_2 : p(0) = 0\}$
35. $V = \{p(x) \text{ in } \mathscr{P}_2 : p(1) = 0\}$
36. $V = \{p(x) \text{ in } \mathscr{P}_2 : xp'(x) = p(x)\}$
37. $V = \{A \text{ in } M_{22} : A \text{ is upper triangular}\}$
38. $V = \{A \text{ in } M_{22} : A \text{ is skew-symmetric}\}$
39. $V = \{A \text{ in } M_{22} : AB = BA\}$, where $B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$
40.
Find a formula for the dimension of the vector space of symmetric $n \times n$ matrices.
41. Find a formula for the dimension of the vector space of skew-symmetric $n \times n$ matrices.
42. Let $U$ and $W$ be subspaces of a finite-dimensional vector space $V$. Prove Grassmann's Identity:
$$\dim(U + W) = \dim U + \dim W - \dim(U \cap W)$$
[Hint: The subspace $U + W$ is defined in Exercise 48 of Section 6.1. Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be a basis for $U \cap W$. Extend $\mathcal{B}$ to a basis $\mathcal{C}$ of $U$ and a basis $\mathcal{D}$ of $W$. Prove that $\mathcal{C} \cup \mathcal{D}$ is a basis for $U + W$.]
43. Let $U$ and $V$ be finite-dimensional vector spaces.
(a) Find a formula for $\dim(U \times V)$ in terms of $\dim U$ and $\dim V$. (See Exercise 49 in Section 6.1.)
(b) If $W$ is a subspace of $V$, show that $\dim \Delta = \dim W$, where $\Delta = \{(\mathbf{w}, \mathbf{w}) : \mathbf{w} \text{ is in } W\}$.
44. Prove that the vector space $\mathscr{P}$ is infinite-dimensional. [Hint: Suppose it has a finite basis. Show that there is some polynomial that is not a linear combination of this basis.]
45. Extend $\{1 + x, 1 + x + x^2\}$ to a basis for $\mathscr{P}_2$.
46. Extend [two matrices illegible in source] to a basis for $M_{22}$.
47. Extend [three matrices illegible in source] to a basis for $M_{22}$.
48. Extend [two matrices illegible in source] to a basis for the vector space of symmetric $2 \times 2$ matrices.
49. Find a basis for $\operatorname{span}(1, 1 + x, 2x)$ in $\mathscr{P}_1$.
50. Find a basis for $\operatorname{span}(1 - 2x, 2x - x^2, 1 - x^2, 1 + x^2)$ in $\mathscr{P}_2$.
51. Find a basis for $\operatorname{span}(1 - x, x - x^2, 1 - x^2, 1 - 2x + x^2)$ in $\mathscr{P}_2$.
52. Find a basis for the span of [four matrices illegible in source] in $M_{22}$.
53. Find a basis for $\operatorname{span}(\sin^2 x, \cos^2 x, \cos 2x)$ in $\mathscr{F}$.
54. Let $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a linearly independent set in a vector space $V$. Show that if $\mathbf{v}$ is a vector in $V$ that is not in $\operatorname{span}(S)$, then $S' = \{\mathbf{v}_1, \ldots, \mathbf{v}_n, \mathbf{v}\}$ is still linearly independent.
55. Let $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a spanning set for a vector space $V$. Show that if $\mathbf{v}_n$ is in $\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_{n-1})$, then $S' = \{\mathbf{v}_1, \ldots, \mathbf{v}_{n-1}\}$ is still a spanning set for $V$.
56. Prove Theorem 6.10(f).
57.
Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$ and let $c_1, \ldots, c_n$ be nonzero scalars. Prove that $\{c_1\mathbf{v}_1, \ldots, c_n\mathbf{v}_n\}$ is also a basis for $V$.
58. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$. Prove that $\{\mathbf{v}_1, \mathbf{v}_1 + \mathbf{v}_2, \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3, \ldots, \mathbf{v}_1 + \cdots + \mathbf{v}_n\}$ is also a basis for $V$.

Let $a_0, a_1, \ldots, a_n$ be $n + 1$ distinct real numbers. Define polynomials $p_0(x), p_1(x), \ldots, p_n(x)$ by
$$p_i(x) = \frac{(x - a_0)\cdots(x - a_{i-1})(x - a_{i+1})\cdots(x - a_n)}{(a_i - a_0)\cdots(a_i - a_{i-1})(a_i - a_{i+1})\cdots(a_i - a_n)}$$
These are called the Lagrange polynomials associated with $a_0, a_1, \ldots, a_n$. [Joseph-Louis Lagrange (1736–1813) was born in Italy but spent most of his life in Germany and France. He made important contributions to such fields as number theory, algebra, astronomy, mechanics, and the calculus of variations. In 1773, Lagrange was the first to give the volume interpretation of a determinant (see Chapter 4).]
59. (a) Compute the Lagrange polynomials associated with $a_0 = 1$, $a_1 = 2$, $a_2 = 3$.
(b) Show, in general, that
$$p_i(a_j) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$
60. (a) Prove that the set $\mathcal{B} = \{p_0(x), p_1(x), \ldots, p_n(x)\}$ of Lagrange polynomials is linearly independent in $\mathscr{P}_n$. [Hint: Set $c_0p_0(x) + \cdots + c_np_n(x) = 0$ and use Exercise 59(b).]
(b) Deduce that $\mathcal{B}$ is a basis for $\mathscr{P}_n$.
61. If $q(x)$ is an arbitrary polynomial in $\mathscr{P}_n$, it follows from Exercise 60(b) that
$$q(x) = c_0p_0(x) + \cdots + c_np_n(x) \qquad (1)$$
for some scalars $c_0, \ldots, c_n$.
(a) Show that $c_i = q(a_i)$ for $i = 0, \ldots, n$, and deduce that $q(x) = q(a_0)p_0(x) + \cdots + q(a_n)p_n(x)$ is the unique representation of $q(x)$ with respect to the basis $\mathcal{B}$.
(b) Show that for any $n + 1$ points $(a_0, c_0), (a_1, c_1), \ldots, (a_n, c_n)$ with distinct first components, the function $q(x)$ defined by Equation (1) is the unique polynomial of degree at most $n$ that passes through all of the points.
This formula is known as the Lagrange interpolation formula. (Compare this formula with Problem 19 in Exploration: Geometric Applications of Determinants in Chapter 4.)
(c) Use the Lagrange interpolation formula to find the polynomial of degree at most 2 that passes through the points
(i) $(1, 6)$, $(2, -1)$, and $(3, -2)$
(ii) $(-1, 10)$, $(0, 5)$, and $(3, 2)$
62. Use the Lagrange interpolation formula to show that if a polynomial in $\mathscr{P}_n$ has $n + 1$ zeros, then it must be the zero polynomial.
63. Find a formula for the number of invertible matrices in $M_{nn}(\mathbb{Z}_p)$. [Hint: This is the same as determining the number of different bases for $\mathbb{Z}_p^n$. (Why?) Count the number of ways to construct a basis for $\mathbb{Z}_p^n$, one vector at a time.]

Exploration: Magic Squares

The engraving shown on page 461 is Albrecht Dürer's Melancholia I (1514). Among the many mathematical artifacts in this engraving is the chart of numbers that hangs on the wall in the upper right-hand corner. (It is enlarged in the detail shown.) Such an array of numbers is known as a magic square. We can think of it as a $4 \times 4$ matrix
$$\begin{bmatrix} 16 & 3 & 2 & 13 \\ 5 & 10 & 11 & 8 \\ 9 & 6 & 7 & 12 \\ 4 & 15 & 14 & 1 \end{bmatrix}$$
Observe that the numbers in each row, in each column, and in both diagonals have the same sum: 34. Observe further that the entries are the integers $1, 2, \ldots, 16$. (Note that Dürer cleverly placed the 15 and 14 adjacent to each other in the last row, giving the date of the engraving.) These observations lead to the following definition.

Definition An $n \times n$ matrix $M$ is called a magic square if the sum of the entries is the same in each row, each column, and both diagonals. This common sum is called the weight of $M$, denoted $\operatorname{wt}(M)$. If $M$ is an $n \times n$ magic square that contains each of the entries $1, 2, \ldots, n^2$ exactly once, then $M$ is called a classical magic square.

1. If $M$ is a classical $n \times n$ magic square, show that
$$\operatorname{wt}(M) = \frac{n(n^2 + 1)}{2}$$
[Hint: Use Exercise 51 in Section 2.4.]
2. Find a classical $3 \times 3$ magic square.
Find a different one. Are your two examples related in any way?
3. Clearly, the $3 \times 3$ matrix with all entries equal to $\tfrac{1}{3}$ is a magic square with weight 1. Using your answer to Problem 2, find a $3 \times 3$ magic square with weight 1, all of whose entries are different. Describe a method for constructing a $3 \times 3$ magic square with distinct entries and weight $w$ for any real number $w$.

Let $\mathrm{Mag}_n$ denote the set of all $n \times n$ magic squares, and let $\mathrm{Mag}_n^0$ denote the set of all $n \times n$ magic squares of weight 0.
4. (a) Prove that $\mathrm{Mag}_3$ is a subspace of $M_{33}$.
(b) Prove that $\mathrm{Mag}_3^0$ is a subspace of $\mathrm{Mag}_3$.
5. Use Problems 3 and 4 to show that if $M$ is a $3 \times 3$ magic square with weight $w$, then we can write $M$ as
$$M = M_0 + kJ$$
where $M_0$ is a $3 \times 3$ magic square of weight 0, $J$ is the $3 \times 3$ matrix consisting entirely of ones, and $k$ is a scalar. What must $k$ be? [Hint: Show that $M - kJ$ is in $\mathrm{Mag}_3^0$ for an appropriate value of $k$.]

Let's try to find a way of describing all $3 \times 3$ magic squares. Let
$$M = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$
be a magic square with weight 0. The conditions on the rows, columns, and diagonals give rise to a system of eight homogeneous linear equations in the variables $a, b, \ldots, i$.
6. Write out this system of equations and solve it. [Note: Using a CAS will facilitate the calculations.]
7. Find the dimension of $\mathrm{Mag}_3^0$. [Hint: By doing a substitution, if necessary, use your solution to Problem 6 to show that $M$ can be written in the form
$$M = \begin{bmatrix} s & -s - t & t \\ -s + t & 0 & s - t \\ -t & s + t & -s \end{bmatrix}$$]
8. Find the dimension of $\mathrm{Mag}_3$. [Hint: Combine the results of Problems 5 and 7.]
9. Can you find a direct way of showing that the $(2, 2)$ entry of a $3 \times 3$ magic square with weight $w$ must be $w/3$? [Hint: Add and subtract certain rows, columns, and diagonals to leave a multiple of the central entry.]
10. Let $M$ be a $3 \times 3$ magic square of weight 0, obtained from a classical $3 \times 3$ magic square as in Problem 5. If $M$ has the form given in Problem 7, write out an equation for the sum of the squares of the entries of $M$.
Show that this is the equation of a circle in the variables $s$ and $t$, and carefully plot it. Show that there are exactly eight points $(s, t)$ on this circle with both $s$ and $t$ integers. Using Problem 8, show that these eight points give rise to eight classical $3 \times 3$ magic squares. How are these magic squares related to one another?

Section 6.3 Change of Basis

In many applications, a problem described using one coordinate system may be solved more easily by switching to a new coordinate system. This switch is usually accomplished by performing a change of variables, a process that you have probably encountered in other mathematics courses. In linear algebra, a basis provides us with a coordinate system for a vector space, via the notion of coordinate vectors. Choosing the right basis will often greatly simplify a particular problem.

For example, consider the molecular structure of zinc, shown in Figure 6.3(a). A scientist studying zinc might wish to measure the lengths of the bonds between the atoms, the angles between these bonds, and so on. Such an analysis will be greatly facilitated by introducing coordinates and making use of the tools of linear algebra. The standard basis and the associated standard $xyz$ coordinate axes are not always the best choice. As Figure 6.3(b) shows, in this case $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is probably a better choice of basis for $\mathbb{R}^3$ than the standard basis, since these vectors align nicely with the bonds between the atoms of zinc.

[Figure 6.3: (a) the molecular structure of zinc; (b) the basis $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ aligned with the bonds]

Change-of-Basis Matrices

Figure 6.4 shows two different coordinate systems for $\mathbb{R}^2$, each arising from a different basis. Figure 6.4(a) shows the coordinate system related to the basis $\mathcal{B} = \{\mathbf{u}_1, \mathbf{u}_2\}$, while Figure 6.4(b) arises from the basis $\mathcal{C} = \{\mathbf{v}_1, \mathbf{v}_2\}$ [the explicit vectors are illegible in the source]. The same vector $\mathbf{x}$ is shown relative to each coordinate system. It is clear from the diagrams that the coordinate vectors of $\mathbf{x}$ with respect to $\mathcal{B}$ and $\mathcal{C}$ are
$$[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \quad\text{and}\quad [\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} 6 \\ -1 \end{bmatrix}$$
respectively.
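The relationship between the two coordinate vectors can be checked by hand or by a short computation. The sketch below follows the direct route: build $\mathbf{x}$ from $[\mathbf{x}]_{\mathcal{B}}$, then solve a $2 \times 2$ system for $[\mathbf{x}]_{\mathcal{C}}$, using exact rational arithmetic. Because the explicit vectors are illegible in the source, the values of `u1`, `u2`, `v1`, `v2` below are an assumption, chosen only so that $\mathbf{u}_1 = -3\mathbf{v}_1 + 2\mathbf{v}_2$ and $\mathbf{u}_2 = 3\mathbf{v}_1 - \mathbf{v}_2$ (the relations used in Example 6.45). The helper names `combine` and `coords_2d` are hypothetical.

```python
from fractions import Fraction

# Assumed illustrative vectors (not read from the source): they satisfy
# u1 = -3*v1 + 2*v2 and u2 = 3*v1 - v2.
u1, u2 = (-1, 2), (2, -1)
v1, v2 = (1, 0), (1, 1)


def combine(coeffs, vectors):
    """Return the linear combination sum(c * v) of vectors in R^2."""
    return tuple(sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(2))


def coords_2d(x, b1, b2):
    """Solve c1*b1 + c2*b2 = x by Cramer's rule; b1, b2 must be a basis of R^2."""
    det = Fraction(b1[0] * b2[1] - b2[0] * b1[1])
    if det == 0:
        raise ValueError("b1 and b2 do not form a basis of R^2")
    c1 = (x[0] * b2[1] - b2[0] * x[1]) / det
    c2 = (b1[0] * x[1] - x[0] * b1[1]) / det
    return (c1, c2)


x_B = (1, 3)                   # [x]_B
x = combine(x_B, (u1, u2))     # x in standard coordinates
x_C = coords_2d(x, v1, v2)     # [x]_C, found by solving a linear system

# The same answer via the matrix whose columns are [u1]_C and [u2]_C.
P = [[-3, 3],
     [2, -1]]
x_C_via_P = tuple(sum(P[i][j] * x_B[j] for j in range(2)) for i in range(2))
print(x, x_C, x_C_via_P)
```

Both routes should give the same coordinates; the matrix route is the one the text develops next, because it reuses the same $P$ for every vector.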
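It turns out that there is a direct connection between the two coordinate vectors. One way to find the relationship is to use $[\mathbf{x}]_{\mathcal{B}}$ to calculate
$$\mathbf{x} = \mathbf{u}_1 + 3\mathbf{u}_2$$
Then we can find $[\mathbf{x}]_{\mathcal{C}}$ by writing $\mathbf{x}$ as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$.

[Figure 6.4: (a) the coordinate system arising from $\mathcal{B}$; (b) the coordinate system arising from $\mathcal{C}$]

However, there is a better way to proceed, one that will provide us with a general mechanism for such problems. We illustrate this approach in the next example.

Example 6.45 Using the bases $\mathcal{B}$ and $\mathcal{C}$ above, find $[\mathbf{x}]_{\mathcal{C}}$, given that $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$.

Solution Since $\mathbf{x} = \mathbf{u}_1 + 3\mathbf{u}_2$, writing $\mathbf{u}_1$ and $\mathbf{u}_2$ in terms of $\mathbf{v}_1$ and $\mathbf{v}_2$ will give us the required coordinates of $\mathbf{x}$ with respect to $\mathcal{C}$. We find that
$$\mathbf{u}_1 = -3\mathbf{v}_1 + 2\mathbf{v}_2 \quad\text{and}\quad \mathbf{u}_2 = 3\mathbf{v}_1 - \mathbf{v}_2$$
so
$$\mathbf{x} = (-3\mathbf{v}_1 + 2\mathbf{v}_2) + 3(3\mathbf{v}_1 - \mathbf{v}_2) = 6\mathbf{v}_1 - \mathbf{v}_2$$
This gives
$$[\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} 6 \\ -1 \end{bmatrix}$$
in agreement with Figure 6.4(b).

This method may not look any easier than the one suggested prior to Example 6.45, but it has one big advantage: we can now find $[\mathbf{y}]_{\mathcal{C}}$ from $[\mathbf{y}]_{\mathcal{B}}$ for any vector $\mathbf{y}$ in $\mathbb{R}^2$ with very little additional work. Let's look at the calculations in Example 6.45 from a different point of view. From $\mathbf{x} = \mathbf{u}_1 + 3\mathbf{u}_2$, we have
$$[\mathbf{x}]_{\mathcal{C}} = [\mathbf{u}_1 + 3\mathbf{u}_2]_{\mathcal{C}} = [\mathbf{u}_1]_{\mathcal{C}} + 3[\mathbf{u}_2]_{\mathcal{C}}$$
by Theorem 6.6. Thus,
$$[\mathbf{x}]_{\mathcal{C}} = \bigl[\,[\mathbf{u}_1]_{\mathcal{C}} \;\; [\mathbf{u}_2]_{\mathcal{C}}\,\bigr]\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} -3 & 3 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \end{bmatrix} = P[\mathbf{x}]_{\mathcal{B}}$$
where $P$ is the matrix whose columns are $[\mathbf{u}_1]_{\mathcal{C}}$ and $[\mathbf{u}_2]_{\mathcal{C}}$.

This procedure generalizes very nicely.

Definition Let $\mathcal{B} = \{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ and $\mathcal{C} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be bases for a vector space $V$. The $n \times n$ matrix whose columns are the coordinate vectors $[\mathbf{u}_1]_{\mathcal{C}}, \ldots, [\mathbf{u}_n]_{\mathcal{C}}$ of the vectors in $\mathcal{B}$ with respect to $\mathcal{C}$ is denoted by $P_{\mathcal{C}\leftarrow\mathcal{B}}$ and is called the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{C}$. That is,
$$P_{\mathcal{C}\leftarrow\mathcal{B}} = \bigl[\,[\mathbf{u}_1]_{\mathcal{C}} \;\; [\mathbf{u}_2]_{\mathcal{C}} \;\; \cdots \;\; [\mathbf{u}_n]_{\mathcal{C}}\,\bigr]$$

Think of $\mathcal{B}$ as the "old" basis and $\mathcal{C}$ as the "new" basis. Then the columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ are just the coordinate vectors obtained by writing the old basis vectors in terms of the new ones. Theorem 6.12 shows that Example 6.45 is a special case of a general result.

Theorem 6.12 Let $\mathcal{B} = \{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ and $\mathcal{C} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be bases for a vector space $V$ and let $P_{\mathcal{C}\leftarrow\mathcal{B}}$ be the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{C}$. Then
a. $P_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{x}]_{\mathcal{B}} = [\mathbf{x}]_{\mathcal{C}}$ for all $\mathbf{x}$ in $V$.
b. $P_{\mathcal{C}\leftarrow\mathcal{B}}$ is the unique matrix $P$ with the property that $P[\mathbf{x}]_{\mathcal{B}} = [\mathbf{x}]_{\mathcal{C}}$ for all $\mathbf{x}$ in $V$.
c. $P_{\mathcal{C}\leftarrow\mathcal{B}}$ is invertible and $(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1} = P_{\mathcal{B}\leftarrow\mathcal{C}}$.

Proof (a) Let $\mathbf{x}$ be in $V$ and let
$$[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$$
That is, $\mathbf{x} = c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n$. Then
$$[\mathbf{x}]_{\mathcal{C}} = [c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n]_{\mathcal{C}} = c_1[\mathbf{u}_1]_{\mathcal{C}} + \cdots + c_n[\mathbf{u}_n]_{\mathcal{C}} = \bigl[\,[\mathbf{u}_1]_{\mathcal{C}} \;\cdots\; [\mathbf{u}_n]_{\mathcal{C}}\,\bigr]\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = P_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$$

(b) Suppose that $P$ is an $n \times n$ matrix with the property that $P[\mathbf{x}]_{\mathcal{B}} = [\mathbf{x}]_{\mathcal{C}}$ for all $\mathbf{x}$ in $V$. Taking $\mathbf{x} = \mathbf{u}_i$, the $i$th basis vector in $\mathcal{B}$, we see that $[\mathbf{x}]_{\mathcal{B}} = [\mathbf{u}_i]_{\mathcal{B}} = \mathbf{e}_i$, so the $i$th column of $P$ is
$$\mathbf{p}_i = P\mathbf{e}_i = P[\mathbf{u}_i]_{\mathcal{B}} = [\mathbf{u}_i]_{\mathcal{C}}$$
which is the $i$th column of $P_{\mathcal{C}\leftarrow\mathcal{B}}$, by definition. It follows that $P = P_{\mathcal{C}\leftarrow\mathcal{B}}$.

(c) Since $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ is linearly independent in $V$, the set $\{[\mathbf{u}_1]_{\mathcal{C}}, \ldots, [\mathbf{u}_n]_{\mathcal{C}}\}$ is linearly independent in $\mathbb{R}^n$, by Theorem 6.7. Hence, $P_{\mathcal{C}\leftarrow\mathcal{B}} = [\,[\mathbf{u}_1]_{\mathcal{C}} \cdots [\mathbf{u}_n]_{\mathcal{C}}\,]$ is invertible, by the Fundamental Theorem. For all $\mathbf{x}$ in $V$, we have $P_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{x}]_{\mathcal{B}} = [\mathbf{x}]_{\mathcal{C}}$. Solving for $[\mathbf{x}]_{\mathcal{B}}$, we find that
$$[\mathbf{x}]_{\mathcal{B}} = (P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1}[\mathbf{x}]_{\mathcal{C}}$$
for all $\mathbf{x}$ in $V$. Therefore, $(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1}$ is a matrix that changes bases from $\mathcal{C}$ to $\mathcal{B}$. Thus, by the uniqueness property (b), we must have $(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1} = P_{\mathcal{B}\leftarrow\mathcal{C}}$. ∎

Remarks
• You may find it helpful to think of change of basis as a transformation (indeed, it is a linear transformation) from $\mathbb{R}^n$ to itself that simply switches from one coordinate system to another. The transformation corresponding to $P_{\mathcal{C}\leftarrow\mathcal{B}}$ accepts $[\mathbf{x}]_{\mathcal{B}}$ as input and returns $[\mathbf{x}]_{\mathcal{C}}$ as output; $(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1} = P_{\mathcal{B}\leftarrow\mathcal{C}}$ does just the opposite. Figure 6.5 gives a schematic representation of the process.

[Figure 6.5: Change of basis. Multiplication by $P_{\mathcal{C}\leftarrow\mathcal{B}}$ sends $[\mathbf{x}]_{\mathcal{B}}$ to $[\mathbf{x}]_{\mathcal{C}}$; multiplication by $P_{\mathcal{B}\leftarrow\mathcal{C}} = (P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1}$ sends $[\mathbf{x}]_{\mathcal{C}}$ back to $[\mathbf{x}]_{\mathcal{B}}$.]

• The columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ are the coordinate vectors of one basis with respect to the other basis.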
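To remember which basis is which, think of the notation $\mathcal{C} \leftarrow \mathcal{B}$ as saying "$\mathcal{B}$ in terms of $\mathcal{C}$." It is also helpful to remember that $P_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$ is a linear combination of the columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$. But since the result of this combination is $[\mathbf{x}]_{\mathcal{C}}$, the columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ must themselves be coordinate vectors with respect to $\mathcal{C}$.

Example 6.46 Find the change-of-basis matrices $P_{\mathcal{C}\leftarrow\mathcal{B}}$ and $P_{\mathcal{B}\leftarrow\mathcal{C}}$ for the bases $\mathcal{B} = \{1, x, x^2\}$ and $\mathcal{C} = \{1 + x, x + x^2, 1 + x^2\}$ of $\mathscr{P}_2$. Then find the coordinate vector of $p(x) = 1 + 2x - x^2$ with respect to $\mathcal{C}$.

Solution Changing to a standard basis is easy, so we find $P_{\mathcal{B}\leftarrow\mathcal{C}}$ first. Observe that the coordinate vectors for $\mathcal{C}$ in terms of $\mathcal{B}$ are
$$[1 + x]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad [x + x^2]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad [1 + x^2]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$
(Look back at the Remark following Example 6.26.) It follows that
$$P_{\mathcal{B}\leftarrow\mathcal{C}} = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$
To find $P_{\mathcal{C}\leftarrow\mathcal{B}}$, we could express each vector in $\mathcal{B}$ as a linear combination of the vectors in $\mathcal{C}$ (do this), but it is much easier to use the fact that $P_{\mathcal{C}\leftarrow\mathcal{B}} = (P_{\mathcal{B}\leftarrow\mathcal{C}})^{-1}$, by Theorem 6.12(c). We find that
$$P_{\mathcal{C}\leftarrow\mathcal{B}} = (P_{\mathcal{B}\leftarrow\mathcal{C}})^{-1} = \begin{bmatrix} \tfrac12 & \tfrac12 & -\tfrac12 \\ -\tfrac12 & \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 & \tfrac12 \end{bmatrix}$$
It now follows that
$$[p(x)]_{\mathcal{C}} = P_{\mathcal{C}\leftarrow\mathcal{B}}[p(x)]_{\mathcal{B}} = \begin{bmatrix} \tfrac12 & \tfrac12 & -\tfrac12 \\ -\tfrac12 & \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 & \tfrac12 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$$
which agrees with Example 6.37.

Remark If we do not need $P_{\mathcal{C}\leftarrow\mathcal{B}}$ explicitly, we can find $[p(x)]_{\mathcal{C}}$ from $[p(x)]_{\mathcal{B}}$ and $P_{\mathcal{B}\leftarrow\mathcal{C}}$ using Gaussian elimination. Row reduction produces
$$\bigl[\,P_{\mathcal{B}\leftarrow\mathcal{C}} \mid [p(x)]_{\mathcal{B}}\,\bigr] \longrightarrow \bigl[\,I \mid [p(x)]_{\mathcal{C}}\,\bigr]$$
(See the next section on using Gauss-Jordan elimination.)

It is worth repeating the observation in Example 6.46: Changing to a standard basis is easy. If $\mathcal{E}$ is the standard basis for a vector space $V$ and $\mathcal{B}$ is any other basis, then the columns of $P_{\mathcal{E}\leftarrow\mathcal{B}}$ are the coordinate vectors of $\mathcal{B}$ with respect to $\mathcal{E}$, and these are usually "visible." We make use of this observation again in the next example.

Example 6.47 In $M_{22}$, let $\mathcal{B}$ be the basis $\{E_{11}, E_{21}, E_{12}, E_{22}\}$ and let $\mathcal{C}$ be the basis $\{A, B, C, D\}$, where
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},$$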
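$$B = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
Find the change-of-basis matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ and verify that $[X]_{\mathcal{C}} = P_{\mathcal{C}\leftarrow\mathcal{B}}[X]_{\mathcal{B}}$ for $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.

Solution 1 To solve this problem directly, we must find the coordinate vectors of $\mathcal{B}$ with respect to $\mathcal{C}$. This involves solving four linear combination problems of the form $X = aA + bB + cC + dD$, where $X$ is in $\mathcal{B}$ and we must find $a$, $b$, $c$, and $d$. However, here we are lucky, since we can find the required coefficients by inspection. Clearly, $E_{11} = A$, $E_{21} = -B + C$, $E_{12} = -A + B$, and $E_{22} = -C + D$. Thus,
$$[E_{11}]_{\mathcal{C}} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad [E_{21}]_{\mathcal{C}} = \begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}, \quad [E_{12}]_{\mathcal{C}} = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad [E_{22}]_{\mathcal{C}} = \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}$$
so
$$P_{\mathcal{C}\leftarrow\mathcal{B}} = \bigl[\,[E_{11}]_{\mathcal{C}} \;\; [E_{21}]_{\mathcal{C}} \;\; [E_{12}]_{\mathcal{C}} \;\; [E_{22}]_{\mathcal{C}}\,\bigr] = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
If $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, then
$$[X]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 4 \end{bmatrix} \quad\text{and}\quad P_{\mathcal{C}\leftarrow\mathcal{B}}[X]_{\mathcal{B}} = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ -1 \\ 4 \end{bmatrix}$$
This is the coordinate vector with respect to $\mathcal{C}$ of the matrix
$$-A - B - C + 4D = -\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} - \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} - \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} + 4\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = X$$
as it should be.

Solution 2 We can compute $P_{\mathcal{C}\leftarrow\mathcal{B}}$ in a different way, as follows. As you will be asked to prove in Exercise 21, if $\mathcal{E}$ is another basis for $M_{22}$, then
$$P_{\mathcal{C}\leftarrow\mathcal{B}} = P_{\mathcal{C}\leftarrow\mathcal{E}}P_{\mathcal{E}\leftarrow\mathcal{B}} = (P_{\mathcal{E}\leftarrow\mathcal{C}})^{-1}P_{\mathcal{E}\leftarrow\mathcal{B}}$$
If $\mathcal{E}$ is the standard basis, then $P_{\mathcal{E}\leftarrow\mathcal{B}}$ and $P_{\mathcal{E}\leftarrow\mathcal{C}}$ can be found by inspection. We have
$$P_{\mathcal{E}\leftarrow\mathcal{B}} = \begin{bmatrix} 1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1 \end{bmatrix} \quad\text{and}\quad P_{\mathcal{E}\leftarrow\mathcal{C}} = \begin{bmatrix} 1&1&1&1\\0&1&1&1\\0&0&1&1\\0&0&0&1 \end{bmatrix}$$
(Do you see why?) Therefore,
$$P_{\mathcal{C}\leftarrow\mathcal{B}} = (P_{\mathcal{E}\leftarrow\mathcal{C}})^{-1}P_{\mathcal{E}\leftarrow\mathcal{B}} = \begin{bmatrix} 1&-1&0&0\\0&1&-1&0\\0&0&1&-1\\0&0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&-1&0\\0&-1&1&0\\0&1&0&-1\\0&0&0&1 \end{bmatrix}$$
which agrees with the first solution.

Remark The second method has the advantage of not requiring the computation of any linear combinations. It has the disadvantage of requiring that we find a matrix inverse. However, using a CAS will facilitate finding a matrix inverse, so in general the second method is preferable to the first. For certain problems, though, the first method may be just as easy to use. In any event, we are about to describe yet a third approach, which you may find best of all.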
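The second solution of Example 6.47 can be sketched in exact rational arithmetic with a single Gauss-Jordan routine. The function name `solve_right` is hypothetical: it row reduces $[\,L \mid R\,]$ to $[\,I \mid L^{-1}R\,]$. Passing the identity as $R$ yields $(P_{\mathcal{E}\leftarrow\mathcal{C}})^{-1}$ as in Solution 2; passing $P_{\mathcal{E}\leftarrow\mathcal{B}}$ directly returns $P_{\mathcal{C}\leftarrow\mathcal{B}}$ in one row reduction, with no explicit inverse. We assume the standard basis is ordered $E_{11}, E_{12}, E_{21}, E_{22}$, which is the ordering that reproduces the matrices in Solution 2.

```python
from fractions import Fraction


def solve_right(left, right):
    """Row reduce [left | right] to [I | X] and return X, so left times X equals right."""
    n = len(left)
    aug = [[Fraction(v) for v in row_l] + [Fraction(v) for v in row_r]
           for row_l, row_r in zip(left, right)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("left-hand matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Assumed ordering of the standard basis E: E11, E12, E21, E22.
# Columns of P_E_from_B are [E11]_E, [E21]_E, [E12]_E, [E22]_E.
P_E_from_B = [[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]]
# Columns of P_E_from_C are [A]_E, [B]_E, [C]_E, [D]_E.
P_E_from_C = [[1, 1, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]]
identity = [[int(i == j) for j in range(4)] for i in range(4)]

# Second method: invert P_E_from_C, then multiply by P_E_from_B.
P_method2 = matmul(solve_right(P_E_from_C, identity), P_E_from_B)
# Row reducing [C | B] directly gives the same matrix without an explicit inverse.
P_method3 = solve_right(P_E_from_C, P_E_from_B)

X_B = [1, 3, 2, 4]  # [X]_B for X = [[1, 2], [3, 4]]
X_C = [sum(p * x for p, x in zip(row, X_B)) for row in P_method2]
print(P_method2)
print(X_C)
```

Using `Fraction` keeps every entry exact, so the result can be compared entry by entry with the hand computation; floating-point elimination would only agree up to rounding.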
The Gauss-Jordan Method for Computing a Change-of-Basis Matrix

Finding the change-of-basis matrix to a standard basis is easy and can be done by inspection. Finding the change-of-basis matrix from a standard basis is almost as easy, but requires the calculation of a matrix inverse, as in Example 6.46. If we do it by hand, then (except for the $2 \times 2$ case) we will usually find the necessary inverse by Gauss-Jordan elimination. We now look at a modification of the Gauss-Jordan method that can be used to find the change-of-basis matrix between two nonstandard bases, as in Example 6.47.

Suppose $\mathcal{B} = \{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ and $\mathcal{C} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ are bases for a vector space $V$ and $P = P_{\mathcal{C}\leftarrow\mathcal{B}}$ is the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{C}$. The $i$th column of $P$ is
$$[\mathbf{u}_i]_{\mathcal{C}} = \begin{bmatrix} p_{1i} \\ \vdots \\ p_{ni} \end{bmatrix}$$
so $\mathbf{u}_i = p_{1i}\mathbf{v}_1 + \cdots + p_{ni}\mathbf{v}_n$. If $\mathcal{E}$ is any basis for $V$, then
$$[\mathbf{u}_i]_{\mathcal{E}} = [p_{1i}\mathbf{v}_1 + \cdots + p_{ni}\mathbf{v}_n]_{\mathcal{E}} = p_{1i}[\mathbf{v}_1]_{\mathcal{E}} + \cdots + p_{ni}[\mathbf{v}_n]_{\mathcal{E}}$$
This can be rewritten in matrix form as
$$\bigl[\,[\mathbf{v}_1]_{\mathcal{E}} \cdots [\mathbf{v}_n]_{\mathcal{E}}\,\bigr]\begin{bmatrix} p_{1i} \\ \vdots \\ p_{ni} \end{bmatrix} = [\mathbf{u}_i]_{\mathcal{E}}$$
which we can solve by applying Gauss-Jordan elimination to the augmented matrix
$$\bigl[\,[\mathbf{v}_1]_{\mathcal{E}} \cdots [\mathbf{v}_n]_{\mathcal{E}} \mid [\mathbf{u}_i]_{\mathcal{E}}\,\bigr]$$
There are $n$ such systems of equations to be solved, one for each column of $P_{\mathcal{C}\leftarrow\mathcal{B}}$, but the coefficient matrix $[\,[\mathbf{v}_1]_{\mathcal{E}} \cdots [\mathbf{v}_n]_{\mathcal{E}}\,]$ is the same in each case. Hence, we can solve all the systems simultaneously by row reducing the $n \times 2n$ augmented matrix
$$\bigl[\,[\mathbf{v}_1]_{\mathcal{E}} \cdots [\mathbf{v}_n]_{\mathcal{E}} \mid [\mathbf{u}_1]_{\mathcal{E}} \cdots [\mathbf{u}_n]_{\mathcal{E}}\,\bigr] = [\,C \mid B\,]$$
Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, so is $\{[\mathbf{v}_1]_{\mathcal{E}}, \ldots, [\mathbf{v}_n]_{\mathcal{E}}\}$, by Theorem 6.7. Therefore, the matrix $C$ whose columns are $[\mathbf{v}_1]_{\mathcal{E}}, \ldots, [\mathbf{v}_n]_{\mathcal{E}}$ has the $n \times n$ identity matrix $I$ for its reduced row echelon form, by the Fundamental Theorem. It follows that Gauss-Jordan elimination will necessarily produce
$$[\,C \mid B\,] \longrightarrow [\,I \mid P\,]$$
where $P = P_{\mathcal{C}\leftarrow\mathcal{B}}$. We have proved the following theorem.

Theorem 6.13 Let $\mathcal{B} = \{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ and $\mathcal{C} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be bases for a vector space $V$. Let $B = [\,[\mathbf{u}_1]_{\mathcal{E}} \cdots [\mathbf{u}_n]_{\mathcal{E}}\,]$ and $C = [\,[\mathbf{v}_1]_{\mathcal{E}} \cdots [\mathbf{v}_n]_{\mathcal{E}}\,]$, where $\mathcal{E}$ is any basis for $V$. Then row reduction applied to the $n \times 2n$ augmented matrix $[\,C \mid B\,]$ produces
$$[\,C \mid B\,] \longrightarrow [\,I \mid P_{\mathcal{C}\leftarrow\mathcal{B}}\,]$$

If $\mathcal{E}$ is a standard basis, this method is particularly easy to use, since in that case $B = P_{\mathcal{E}\leftarrow\mathcal{B}}$ and $C = P_{\mathcal{E}\leftarrow\mathcal{C}}$. We illustrate this method by reworking the problem in Example 6.47.

Example 6.48 Rework Example 6.47 using the Gauss-Jordan method.

Solution Taking $\mathcal{E}$ to be the standard basis for $M_{22}$, we see that
$$B = P_{\mathcal{E}\leftarrow\mathcal{B}} = \begin{bmatrix} 1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1 \end{bmatrix} \quad\text{and}\quad C = P_{\mathcal{E}\leftarrow\mathcal{C}} = \begin{bmatrix} 1&1&1&1\\0&1&1&1\\0&0&1&1\\0&0&0&1 \end{bmatrix}$$
Row reduction produces
$$[\,C \mid B\,] = \left[\begin{array}{cccc|cccc} 1&1&1&1&1&0&0&0\\0&1&1&1&0&0&1&0\\0&0&1&1&0&1&0&0\\0&0&0&1&0&0&0&1 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|cccc} 1&0&0&0&1&0&-1&0\\0&1&0&0&0&-1&1&0\\0&0&1&0&0&1&0&-1\\0&0&0&1&0&0&0&1 \end{array}\right]$$
(Verify this row reduction.) It follows that
$$P_{\mathcal{C}\leftarrow\mathcal{B}} = \begin{bmatrix} 1&0&-1&0\\0&-1&1&0\\0&1&0&-1\\0&0&0&1 \end{bmatrix}$$
as we found before.

Exercises 6.3

In Exercises 1–4:
(a) Find the coordinate vectors $[\mathbf{x}]_{\mathcal{B}}$ and $[\mathbf{x}]_{\mathcal{C}}$ of $\mathbf{x}$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$, respectively.
(b) Find the change-of-basis matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ from $\mathcal{B}$ to $\mathcal{C}$.
(c) Use your answer to part (b) to compute $[\mathbf{x}]_{\mathcal{C}}$, and compare your answer with the one found in part (a).
(d) Find the change-of-basis matrix $P_{\mathcal{B}\leftarrow\mathcal{C}}$ from $\mathcal{C}$ to $\mathcal{B}$.
(e) Use your answers to parts (c) and (d) to compute $[\mathbf{x}]_{\mathcal{B}}$, and compare your answer with the one found in part (a).
1.–4. [vectors and bases illegible in source]

In Exercises 5–8, follow the instructions for Exercises 1–4 using $p(x)$ instead of $\mathbf{x}$.
5. $p(x) = 2 - x$, $\mathcal{B} = \{1, x\}$, $\mathcal{C} = \{x, 1 + x\}$ in $\mathscr{P}_1$
6. $p(x) = 1 + 3x$, $\mathcal{B} = \{1 + x, 1 - x\}$, $\mathcal{C} = \{2x, 4\}$ in $\mathscr{P}_1$
7. $p(x) = 1 + x^2$, $\mathcal{B} = \{1 + x + x^2, x + x^2, x^2\}$, $\mathcal{C} = \{1, x, x^2\}$ in $\mathscr{P}_2$
8. $p(x) = 4 - 2x - x^2$, $\mathcal{B} = \{x, 1 + x^2, x + x^2\}$, $\mathcal{C} = \{1, 1 + x, x^2\}$ in $\mathscr{P}_2$
In Exercises 9 and 10, follow the instructions for Exercises 1–4 using $A$ instead of $\mathbf{x}$.
9. $A = $ [matrix illegible in source], $\mathcal{B} = $ the standard basis, $\mathcal{C} = $ [matrices illegible in source] in $M_{22}$
10. $A = $ [matrix illegible in source], $\mathcal{B} = $ [matrices illegible in source], $\mathcal{C} = $ [matrices illegible in source] in $M_{22}$

In Exercises 11 and 12, follow the instructions for Exercises 1–4 using $f(x)$ instead of $\mathbf{x}$.
11. $f(x) = 2\sin x - 3\cos x$, $\mathcal{B} = \{\sin x + \cos x, \cos x\}$, $\mathcal{C} = \{\sin x + \cos x, \sin x - \cos x\}$ in $\operatorname{span}(\sin x, \cos x)$
12. $f(x) = \sin x$, $\mathcal{B} = \{\sin x + \cos x, \cos x\}$, $\mathcal{C} = \{\cos x - \sin x, \sin x + \cos x\}$ in $\operatorname{span}(\sin x, \cos x)$
13. Rotate the $xy$-axes in the plane counterclockwise through an angle $\theta = 60^\circ$ to obtain new $x'y'$-axes. Use the methods of this section to find (a) the $x'y'$-coordinates of the point whose $xy$-coordinates are $(3, 2)$ and (b) the $xy$-coordinates of the point whose $x'y'$-coordinates are $(4, -4)$.
14. Repeat Exercise 13 with $\theta = 135^\circ$.
15. Let $\mathcal{B}$ and $\mathcal{C}$ be bases for $\mathbb{R}^2$. If $\mathcal{C} = $ [vectors illegible in source] and the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{C}$ is $P_{\mathcal{C}\leftarrow\mathcal{B}} = $ [matrix illegible in source], find $\mathcal{B}$.
16. Let $\mathcal{B}$ and $\mathcal{C}$ be bases for $\mathscr{P}_2$. If $\mathcal{B} = \{x, 1 + x, 1 - x + x^2\}$ and the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{C}$ is $P_{\mathcal{C}\leftarrow\mathcal{B}} = $ [matrix illegible in source], find $\mathcal{C}$.

In calculus, you learn that a Taylor polynomial of degree $n$ about $a$ is a polynomial of the form
$$p(x) = a_0 + a_1(x - a) + a_2(x - a)^2 + \cdots + a_n(x - a)^n$$
where $a_n \ne 0$. In other words, it is a polynomial that has been expanded in terms of powers of $x - a$ instead of powers of $x$. Taylor polynomials are very useful for approximating functions that are "well behaved" near $x = a$.

The set $\mathcal{B} = \{1, x - a, (x - a)^2, \ldots, (x - a)^n\}$ is a basis for $\mathscr{P}_n$ for any real number $a$. (Do you see a quick way to show this? Try using Theorem 6.7.) This fact allows us to use the techniques of this section to rewrite a polynomial as a Taylor polynomial about a given $a$.
17. Express $p(x) = 1 + 2x - 5x^2$ as a Taylor polynomial about $a = 1$.
18.
Express $p(x) = 1 + 2x - 5x^2$ as a Taylor polynomial about $a = -2$.
19. Express $p(x) = x^3$ as a Taylor polynomial about $a = -1$.
20. Express $p(x) = x^3$ as a Taylor polynomial about $a = $ [value illegible in source].
21. Let $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$ be bases for a finite-dimensional vector space $V$. Prove that
$$P_{\mathcal{D}\leftarrow\mathcal{C}}P_{\mathcal{C}\leftarrow\mathcal{B}} = P_{\mathcal{D}\leftarrow\mathcal{B}}$$
22. Let $V$ be an $n$-dimensional vector space with basis $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$. Let $P$ be an invertible $n \times n$ matrix and set
$$\mathbf{u}_i = p_{1i}\mathbf{v}_1 + \cdots + p_{ni}\mathbf{v}_n$$
for $i = 1, \ldots, n$. Prove that $\mathcal{C} = \{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ is a basis for $V$ and show that $P = P_{\mathcal{B}\leftarrow\mathcal{C}}$.

Section 6.4 Linear Transformations

We encountered linear transformations in Section 3.6 in the context of matrix transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. In this section, we extend this concept to linear transformations between arbitrary vector spaces.

Definition A linear transformation from a vector space $V$ to a vector space $W$ is a mapping $T\colon V \to W$ such that, for all $\mathbf{u}$ and $\mathbf{v}$ in $V$ and for all scalars $c$,
1. $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$
2. $T(c\mathbf{u}) = cT(\mathbf{u})$

It is straightforward to show that this definition is equivalent to the requirement that $T$ preserve all linear combinations. That is, $T\colon V \to W$ is a linear transformation if and only if
$$T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_kT(\mathbf{v}_k)$$
for all $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $V$ and scalars $c_1, \ldots, c_k$.

Example 6.49 Every matrix transformation is a linear transformation. That is, if $A$ is an $m \times n$ matrix, then the transformation $T_A\colon \mathbb{R}^n \to \mathbb{R}^m$ defined by
$$T_A(\mathbf{x}) = A\mathbf{x} \quad\text{for } \mathbf{x} \text{ in } \mathbb{R}^n$$
is a linear transformation. This is a restatement of Theorem 3.30.

Example 6.50 Define $T\colon M_{nn} \to M_{nn}$ by $T(A) = A^T$. Show that $T$ is a linear transformation.

Solution We check that, for $A$ and $B$ in $M_{nn}$ and scalars $c$,
$$T(A + B) = (A + B)^T = A^T + B^T = T(A) + T(B)$$
and
$$T(cA) = (cA)^T = cA^T = cT(A)$$
Therefore, $T$ is a linear transformation.

Example 6.51 Let $D$ be the differential operator $D\colon \mathscr{D} \to \mathscr{F}$ defined by $D(f) = f'$.
Show that $D$ is a linear transformation.

Solution Let $f$ and $g$ be differentiable functions and let $c$ be a scalar. Then, from calculus, we know that
$$D(f + g) = (f + g)' = f' + g' = D(f) + D(g)$$
and
$$D(cf) = (cf)' = cf' = cD(f)$$
Hence, $D$ is a linear transformation.

In calculus, you learn that every continuous function on $[a, b]$ is integrable. The next example shows that integration is a linear transformation.

Example 6.52 Define $S\colon \mathscr{C}[a, b] \to \mathbb{R}$ by $S(f) = \int_a^b f(x)\,dx$. Show that $S$ is a linear transformation.

Solution Let $f$ and $g$ be in $\mathscr{C}[a, b]$. Then
$$S(f + g) = \int_a^b (f + g)(x)\,dx = \int_a^b \bigl(f(x) + g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx = S(f) + S(g)$$
and
$$S(cf) = \int_a^b (cf)(x)\,dx = \int_a^b cf(x)\,dx = c\int_a^b f(x)\,dx = cS(f)$$
It follows that $S$ is linear.

Example 6.53 Show that none of the following transformations is linear:
(a) $T\colon M_{22} \to \mathbb{R}$ defined by $T(A) = \det A$
(b) $T\colon \mathbb{R} \to \mathbb{R}$ defined by $T(x) = 2^x$
(c) $T\colon \mathbb{R} \to \mathbb{R}$ defined by $T(x) = x + 1$

Solution In each case, we give a specific counterexample to show that one of the properties of a linear transformation fails to hold.
(a) Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. Then $A + B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, so
$$T(A + B) = \det(A + B) = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1$$
But
$$T(A) + T(B) = \det A + \det B = \begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} + \begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix} = 0 + 0 = 0$$
so $T(A + B) \ne T(A) + T(B)$ and $T$ is not linear.
(b) Let $x = 1$ and $y = 2$. Then
$$T(x + y) = T(3) = 2^3 = 8 \ne 6 = 2^1 + 2^2 = T(x) + T(y)$$
so $T$ is not linear.
(c) Let $x = 1$ and $y = 2$. Then
$$T(x + y) = T(3) = 3 + 1 = 4 \ne 5 = (1 + 1) + (2 + 1) = T(x) + T(y)$$
Therefore, $T$ is not linear.

Remark Example 6.53(c) shows that you need to be careful when you encounter the word "linear." As a function, $T(x) = x + 1$ is linear, since its graph is a straight line. However, it is not a linear transformation from the vector space $\mathbb{R}$ to itself, since it fails to satisfy the definition. (Which linear functions from $\mathbb{R}$ to $\mathbb{R}$ will also be linear transformations?)
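The counterexamples in Example 6.53 can also be hunted for mechanically. The sketch below uses a hypothetical helper, `find_counterexample`, that tests the two defining properties on a finite set of sample inputs and scalars. A single failure proves that a map is not linear; passing every sampled check is only evidence, never a proof, since the definition quantifies over all vectors and scalars. The sample values are arbitrary choices.

```python
import itertools
from fractions import Fraction


def find_counterexample(T, samples, scalars):
    """Look for inputs where T(x + y) != T(x) + T(y) or T(c * x) != c * T(x).

    Returns a tuple describing the first failure found, or None if every
    sampled check passed. None is only evidence of linearity, not a proof.
    """
    for x, y in itertools.product(samples, repeat=2):
        if T(x + y) != T(x) + T(y):
            return ("additivity", x, y)
    for c, x in itertools.product(scalars, samples):
        if T(c * x) != c * T(x):
            return ("homogeneity", c, x)
    return None


def exp2(x):
    return 2 ** x        # Example 6.53(b)


def shift(x):
    return x + 1         # Example 6.53(c)


def scale(x):
    return 5 * x         # a line through the origin


samples = [Fraction(n) for n in range(-3, 4)]
scalars = [Fraction(-2), Fraction(1, 2), Fraction(3)]

for name, T in [("2**x", exp2), ("x + 1", shift), ("5x", scale)]:
    print(name, find_counterexample(T, samples, scalars))
```

Exact `Fraction` inputs avoid false alarms from floating-point rounding; for example, `2 ** Fraction(-3)` evaluates exactly to `Fraction(1, 8)`. The map `scale` passes because it has the form $T(x) = mx$, which bears on the question posed in the Remark.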
There are two special linear transformations that deserve to be singled out.

Example 6.54 (a) For any vector spaces $V$ and $W$, the transformation $T_0\colon V \to W$ that maps every vector in $V$ to the zero vector in $W$ is called the zero transformation. That is,
$$T_0(\mathbf{v}) = \mathbf{0} \quad\text{for all } \mathbf{v} \text{ in } V$$
(b) For any vector space $V$, the transformation $I\colon V \to V$ that maps every vector in $V$ to itself is called the identity transformation. That is,
$$I(\mathbf{v}) = \mathbf{v} \quad\text{for all } \mathbf{v} \text{ in } V$$
(If it is important to identify the vector space $V$, we may write $I_V$ for clarity.)

The proofs that the zero and identity transformations are linear are left as easy exercises.

Properties of Linear Transformations

In Chapter 3, all linear transformations were matrix transformations, and their properties were directly related to properties of the matrices involved. The following theorem is easy to prove for matrix transformations. (Do it!) The full proof for linear transformations in general takes a bit more care, but it is still straightforward.

Theorem 6.14 Let $T\colon V \to W$ be a linear transformation. Then:
a. $T(\mathbf{0}) = \mathbf{0}$
b. $T(-\mathbf{v}) = -T(\mathbf{v})$ for all $\mathbf{v}$ in $V$.
c. $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$ for all $\mathbf{u}$ and $\mathbf{v}$ in $V$.

Proof We prove properties (a) and (c) and leave the proof of property (b) for Exercise 21.
(a) Let $\mathbf{v}$ be any vector in $V$. Then $T(\mathbf{0}) = T(0\mathbf{v}) = 0T(\mathbf{v}) = \mathbf{0}$, as required. (Can you give a reason for each step?)
(c) $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u} + (-1)\mathbf{v}) = T(\mathbf{u}) + (-1)T(\mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$ ∎

Remark Property (a) can be useful in showing that certain transformations are not linear. As an illustration, consider Example 6.53(b). If $T(x) = 2^x$, then $T(0) = 2^0 = 1 \ne 0$, so $T$ is not linear, by Theorem 6.14(a). Be warned, however, that there are lots of transformations that do map the zero vector to the zero vector but that are still not linear.
Example 6.53(a) is a case in point: The zero vector is the $2 \times 2$ zero matrix $O$, so $T(O) = \det O = 0$, but we have seen that $T(A) = \det A$ is not linear.

The most important property of a linear transformation $T\colon V \to W$ is that $T$ is completely determined by its effect on a basis for $V$. The next example shows what this means.

Example 6.55 Suppose $T$ is a linear transformation from $\mathbb{R}^2$ to $\mathscr{P}_2$ such that
$$T\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 2 - 3x + x^2 \quad\text{and}\quad T\begin{bmatrix} 2 \\ 3 \end{bmatrix} = 1 - x^2$$
Find $T\begin{bmatrix} -1 \\ 2 \end{bmatrix}$ and $T\begin{bmatrix} a \\ b \end{bmatrix}$.

Solution Since $\mathcal{B} = \left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix}\right\}$ is a basis for $\mathbb{R}^2$ (why?), every vector in $\mathbb{R}^2$ is in $\operatorname{span}(\mathcal{B})$. Solving
$$c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$
we find that $c_1 = -7$ and $c_2 = 3$. Therefore,
$$T\begin{bmatrix} -1 \\ 2 \end{bmatrix} = -7T\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 3T\begin{bmatrix} 2 \\ 3 \end{bmatrix} = -7(2 - 3x + x^2) + 3(1 - x^2) = -11 + 21x - 10x^2$$
Similarly, we discover that
$$\begin{bmatrix} a \\ b \end{bmatrix} = (3a - 2b)\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (b - a)\begin{bmatrix} 2 \\ 3 \end{bmatrix}$$
so
$$T\begin{bmatrix} a \\ b \end{bmatrix} = (3a - 2b)T\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (b - a)T\begin{bmatrix} 2 \\ 3 \end{bmatrix} = (3a - 2b)(2 - 3x + x^2) + (b - a)(1 - x^2) = (5a - 3b) + (-9a + 6b)x + (4a - 3b)x^2$$
(Note that by setting $a = -1$ and $b = 2$, we recover the solution $T\begin{bmatrix} -1 \\ 2 \end{bmatrix} = -11 + 21x - 10x^2$.)

The proof of the general theorem is quite straightforward.

Theorem 6.15 Let $T\colon V \to W$ be a linear transformation and let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a spanning set for $V$. Then $T(\mathcal{B}) = \{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\}$ spans the range of $T$.

Proof The range of $T$ is the set of all vectors in $W$ that are of the form $T(\mathbf{v})$, where $\mathbf{v}$ is in $V$. Let $T(\mathbf{v})$ be in the range of $T$. Since $\mathcal{B}$ spans $V$, there are scalars $c_1, \ldots, c_n$ such that
$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$$
Applying $T$ and using the fact that it is a linear transformation, we see that
$$T(\mathbf{v}) = T(c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n) = c_1T(\mathbf{v}_1) + \cdots + c_nT(\mathbf{v}_n)$$
In other words, $T(\mathbf{v})$ is in $\operatorname{span}(T(\mathcal{B}))$, as required. ∎

Theorem 6.15 applies, in particular, when $\mathcal{B}$ is a basis for $V$. You might guess that, in this case, $T(\mathcal{B})$ would then be a basis for the range of $T$. Unfortunately, this is not always the case. We will address this issue in Section 6.5.
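Example 6.55 can be mirrored in a few lines: once the images of the two basis vectors are stored, $T$ can be evaluated anywhere. The sketch represents a polynomial $c_0 + c_1x + c_2x^2$ as the coefficient list `[c0, c1, c2]` (our own encoding choice) and hard-codes the coordinates $c_1 = 3a - 2b$, $c_2 = b - a$ found in the example; the function name `T` is illustrative only.

```python
from fractions import Fraction

# Images of the basis vectors from Example 6.55, as coefficient lists
# [c0, c1, c2] representing c0 + c1*x + c2*x^2.
T_b1 = [2, -3, 1]    # T([1, 1]) = 2 - 3x + x^2
T_b2 = [1, 0, -1]    # T([2, 3]) = 1 - x^2


def T(a, b):
    """Evaluate T([a, b]) using only the images of the basis vectors."""
    # Coordinates of [a, b] with respect to {[1, 1], [2, 3]}:
    # [a, b] = (3a - 2b)[1, 1] + (b - a)[2, 3].
    c1, c2 = Fraction(3 * a - 2 * b), Fraction(b - a)
    # Linearity: T([a, b]) = c1*T([1, 1]) + c2*T([2, 3]), coefficient by coefficient.
    return [c1 * p + c2 * q for p, q in zip(T_b1, T_b2)]


print(T(-1, 2))   # coefficients of T([-1, 2])
print(T(1, 0))    # coefficients of T([1, 0])
```

Evaluating at $[1, 0]$ and $[0, 1]$ recovers the coefficient patterns $(5, -9, 4)$ and $(-3, 6, -3)$ in the general formula $(5a - 3b) + (-9a + 6b)x + (4a - 3b)x^2$.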
Composition of Linear Transformations

In Section 3.6, we defined the composition of matrix transformations. The definition extends to general linear transformations in an obvious way.

Definition If $T : U \to V$ and $S : V \to W$ are linear transformations, then the composition of $S$ with $T$ is the mapping $S \circ T$, defined by
$$(S \circ T)(\mathbf{u}) = S(T(\mathbf{u}))$$
where $\mathbf{u}$ is in $U$.

($S \circ T$ is read "$S$ of $T$.")

Observe that $S \circ T$ is a mapping from $U$ to $W$ (see Figure 6.6). Notice also that for the definition to make sense, the range of $T$ must be contained in the domain of $S$.

Figure 6.6 Composition of linear transformations: $\mathbf{u}$ in $U$ is sent by $T$ to $T(\mathbf{u})$ in $V$, which is sent by $S$ to $S(T(\mathbf{u})) = (S \circ T)(\mathbf{u})$ in $W$.

Example 6.56
Let $T : \mathbb{R}^2 \to \mathscr{P}_1$ and $S : \mathscr{P}_1 \to \mathscr{P}_2$ be the linear transformations defined by
$$T\begin{bmatrix}a\\b\end{bmatrix} = a + (a + b)x \quad\text{and}\quad S(p(x)) = xp(x)$$
Find $(S \circ T)\begin{bmatrix}3\\-2\end{bmatrix}$ and $(S \circ T)\begin{bmatrix}a\\b\end{bmatrix}$.

Solution We compute
$$(S \circ T)\begin{bmatrix}3\\-2\end{bmatrix} = S\left(T\begin{bmatrix}3\\-2\end{bmatrix}\right) = S(3 + (3 - 2)x) = S(3 + x) = x(3 + x) = 3x + x^2$$
and
$$(S \circ T)\begin{bmatrix}a\\b\end{bmatrix} = S\left(T\begin{bmatrix}a\\b\end{bmatrix}\right) = S(a + (a + b)x) = x(a + (a + b)x) = ax + (a + b)x^2$$

Chapter 3 showed that the composition of two matrix transformations was another matrix transformation. In general, we have the following theorem.

Theorem 6.16
If $T : U \to V$ and $S : V \to W$ are linear transformations, then $S \circ T : U \to W$ is a linear transformation.

Proof Let $\mathbf{u}$ and $\mathbf{v}$ be in $U$ and let $c$ be a scalar. Then
$$(S \circ T)(\mathbf{u} + \mathbf{v}) = S(T(\mathbf{u} + \mathbf{v})) = S(T(\mathbf{u}) + T(\mathbf{v})) = S(T(\mathbf{u})) + S(T(\mathbf{v})) = (S \circ T)(\mathbf{u}) + (S \circ T)(\mathbf{v})$$
where the second equality holds since $T$ is linear and the third since $S$ is linear, and
$$(S \circ T)(c\mathbf{u}) = S(T(c\mathbf{u})) = S(cT(\mathbf{u})) = cS(T(\mathbf{u})) = c(S \circ T)(\mathbf{u})$$
where, again, the second equality holds since $T$ is linear and the third since $S$ is linear. Therefore, $S \circ T$ is a linear transformation.

The algebraic properties of linear transformations mirror those of matrix transformations, which, in turn, are related to the algebraic properties of matrices. For example, composition of linear transformations is associative.
That is, if $R$, $S$, and $T$ are linear transformations, then
$$R \circ (S \circ T) = (R \circ S) \circ T$$
provided these compositions make sense. The proof of this property is identical to that given in Section 3.6.

The next example gives another useful (but not surprising) property of linear transformations.

Example 6.57
Let $S : U \to V$ and $T : V \to W$ be linear transformations and let $I : V \to V$ be the identity transformation. Then for every $\mathbf{v}$ in $V$, we have
$$(T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v})$$
Since $T \circ I$ and $T$ have the same value at every $\mathbf{v}$ in their domain, it follows that $T \circ I = T$. Similarly, $I \circ S = S$.

Remark The method of Example 6.57 is worth noting. Suppose we want to show that two linear transformations $T_1$ and $T_2$ (both from $V$ to $W$) are equal. It suffices to show that $T_1(\mathbf{v}) = T_2(\mathbf{v})$ for every $\mathbf{v}$ in $V$.

Further properties of linear transformations are explored in the exercises.

Inverses of Linear Transformations

Definition A linear transformation $T : V \to W$ is invertible if there is a linear transformation $T' : W \to V$ such that
$$T' \circ T = I_V \quad\text{and}\quad T \circ T' = I_W$$
In this case, $T'$ is called an inverse for $T$.

Remarks
• The domain $V$ and codomain $W$ of $T$ do not have to be the same, as they do in the case of invertible matrix transformations. However, we will see in the next section that $V$ and $W$ must be very closely related.
• The requirement that $T'$ be linear could have been omitted from this definition. For, as we will see in Theorem 6.24, if $T'$ is any mapping from $W$ to $V$ such that $T' \circ T = I_V$ and $T \circ T' = I_W$, then $T'$ is forced to be linear as well.
• If $T'$ is an inverse for $T$, then the definition implies that $T$ is an inverse for $T'$. Hence, $T'$ is invertible too.

Example 6.58
Verify that the mappings $T : \mathbb{R}^2 \to \mathscr{P}_1$ and $T' : \mathscr{P}_1 \to \mathbb{R}^2$ defined by
$$T\begin{bmatrix}a\\b\end{bmatrix} = a + (a + b)x \quad\text{and}\quad T'(c + dx) = \begin{bmatrix}c\\d - c\end{bmatrix}$$
are inverses.
Solution We compute
$$(T' \circ T)\begin{bmatrix}a\\b\end{bmatrix} = T'\left(T\begin{bmatrix}a\\b\end{bmatrix}\right) = T'(a + (a + b)x) = \begin{bmatrix}a\\(a + b) - a\end{bmatrix} = \begin{bmatrix}a\\b\end{bmatrix}$$
and
$$(T \circ T')(c + dx) = T(T'(c + dx)) = T\begin{bmatrix}c\\d - c\end{bmatrix} = c + (c + (d - c))x = c + dx$$
Hence, $T' \circ T = I_{\mathbb{R}^2}$ and $T \circ T' = I_{\mathscr{P}_1}$. Therefore, $T$ and $T'$ are inverses of each other.

As was the case for invertible matrices, inverses of linear transformations are unique if they exist. The following theorem is the analogue of Theorem 3.6.

Theorem 6.17
If $T$ is an invertible linear transformation, then its inverse is unique.

Proof The proof is the same as that of Theorem 3.6, with products of matrices replaced by compositions of linear transformations. (You are asked to complete this proof in Exercise 31.)

Thanks to Theorem 6.17, if $T$ is invertible, we can refer to the inverse of $T$. It will be denoted by $T^{-1}$ (pronounced "$T$ inverse"). In the next two sections, we will address the issue of determining when a given linear transformation is invertible and finding its inverse when it exists.

Exercises 6.4

In Exercises 1-12, determine whether $T$ is a linear transformation.
1. $T : M_{22} \to M_{22}$ defined by $T\begin{bmatrix}a & b\\c & d\end{bmatrix} = \begin{bmatrix}a + b & 0\\0 & c + d\end{bmatrix}$
2. $T : M_{22} \to M_{22}$ defined by $T\begin{bmatrix}w & x\\y & z\end{bmatrix} = \begin{bmatrix}1 & w - z\\x - y & 1\end{bmatrix}$
3. $T : M_{nn} \to M_{nn}$ defined by $T(A) = AB$, where $B$ is a fixed $n \times n$ matrix
4. $T : M_{nn} \to M_{nn}$ defined by $T(A) = AB - BA$, where $B$ is a fixed $n \times n$ matrix
5. $T : M_{nn} \to \mathbb{R}$ defined by $T(A) = \operatorname{tr}(A)$
6. $T : M_{nn} \to \mathbb{R}$ defined by $T(A) = a_{11}a_{22}\cdots a_{nn}$
7. $T : M_{nn} \to \mathbb{R}$ defined by $T(A) = \operatorname{rank}(A)$
8. $T : \mathscr{P}_2 \to \mathscr{P}_2$ defined by $T(a + bx + cx^2) = (a + 1) + (b + 1)x + (c + 1)x^2$
9. $T : \mathscr{P}_2 \to \mathscr{P}_2$ defined by $T(a + bx + cx^2) = a + b(x + 1) + b(x + 1)^2$
10. $T : \mathscr{F} \to \mathscr{F}$ defined by $T(f) = f(x^2)$
11. $T : \mathscr{F} \to \mathscr{F}$ defined by $T(f) = (f(x))^2$
12. $T : \mathscr{F} \to \mathbb{R}$ defined by $T(f) = f(c)$, where $c$ is a fixed scalar
13. Show that the transformations $S$ and $T$ in Example 6.56 are both linear.
14. Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be a linear transformation for which […]. Find […].
15. Let $T : \mathbb{R}^2 \to \mathscr{P}_2$ be a linear transformation for which $T[\ldots] = 1 - 2x$ and $T[\ldots] = x + 2x^2$. Find […].
16. Let $T : \mathscr{P}_2 \to \mathscr{P}_2$ be a linear transformation for which $T(1) = 3 - 2x$, $T(x) = 4x - x^2$, and $T(x^2) = 2 + 2x^2$. Find $T(6 + x - 4x^2)$ and $T(a + bx + cx^2)$.
17. Let $T : \mathscr{P}_2 \to \mathscr{P}_2$ be a linear transformation for which $T(1 + x) = 1 + x^2$, $T(x + x^2) = x - x^2$, and $T(1 + x^2) = 1 + x + x^2$. Find $T(4 - x + 3x^2)$ and $T(a + bx + cx^2)$.
18. Let $T : M_{22} \to \mathbb{R}$ be a linear transformation for which $T[\ldots] = 1$, $T[\ldots] = 2$, $T[\ldots] = 3$, and $T[\ldots] = 4$. Find $T[\ldots]$ and $T\begin{bmatrix}a & b\\c & d\end{bmatrix}$.
19. Let $T : M_{22} \to \mathbb{R}$ be a linear transformation. Show that there are scalars $a$, $b$, $c$, and $d$ such that $T\begin{bmatrix}w & x\\y & z\end{bmatrix} = aw + bx + cy + dz$ for all $\begin{bmatrix}w & x\\y & z\end{bmatrix}$ in $M_{22}$.
20. Show that there is no linear transformation $T : \mathbb{R}^3 \to \mathscr{P}_2$ such that $T[\ldots] = 1 + x$, $T[\ldots] = 2 - x + x^2$, and $T[\ldots] = -2 + 2x^2$.
21. Prove Theorem 6.14(b).
22. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$ and let $T : V \to V$ be a linear transformation. Prove that if $T(\mathbf{v}_1) = \mathbf{v}_1, T(\mathbf{v}_2) = \mathbf{v}_2, \ldots, T(\mathbf{v}_n) = \mathbf{v}_n$, then $T$ is the identity transformation on $V$.
23. Let $T : \mathscr{P}_n \to \mathscr{P}_n$ be a linear transformation such that $T(x^k) = kx^{k-1}$ for $k = 0, 1, \ldots, n$. Show that $T$ must be the differential operator $D$.
24. Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be vectors in a vector space $V$ and let $T : V \to W$ be a linear transformation.
(a) If $\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\}$ is linearly independent in $W$, show that $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent in $V$.
(b) Show that the converse of part (a) is false. That is, it is not necessarily true that if $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent in $V$, then $\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\}$ is linearly independent in $W$. Illustrate this with an example $T : \mathbb{R}^2 \to \mathbb{R}^2$.
25. Define linear transformations $S : \mathbb{R}^2 \to M_{22}$ and $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $S\begin{bmatrix}a\\b\end{bmatrix} = [\ldots]$ and $T\begin{bmatrix}c\\d\end{bmatrix} = [\ldots]$. Compute $(S \circ T)[\ldots]$ and $(S \circ T)\begin{bmatrix}x\\y\end{bmatrix}$. Can you compute $(T \circ S)\begin{bmatrix}x\\y\end{bmatrix}$? If so, compute it.
26. Define linear transformations $S : \mathscr{P}_1 \to \mathscr{P}_2$ and $T : \mathscr{P}_2 \to \mathscr{P}_1$ by $S(a + bx) = a + (a + b)x + 2bx^2$ and $T(a + bx + cx^2) = b + 2cx$. Compute $(S \circ T)(3 + 2x - x^2)$ and $(S \circ T)(a + bx + cx^2)$. Can you compute $(T \circ S)(a + bx)$? If so, compute it.
27. Define linear transformations $S : \mathscr{P}_n \to \mathscr{P}_n$ and $T : \mathscr{P}_n \to \mathscr{P}_n$ by $S(p(x)) = p(x + 1)$ and $T(p(x)) = p'(x)$. Find $(S \circ T)(p(x))$ and $(T \circ S)(p(x))$. [Hint: Remember the Chain Rule.]
28. Define linear transformations $S : \mathscr{P}_n \to \mathscr{P}_n$ and $T : \mathscr{P}_n \to \mathscr{P}_n$ by $S(p(x)) = p(x + 1)$ and $T(p(x)) = xp'(x)$. Find $(S \circ T)(p(x))$ and $(T \circ S)(p(x))$.
In Exercises 29 and 30, verify that $S$ and $T$ are inverses.
29. $S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $S\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}4x + y\\3x + y\end{bmatrix}$ and $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x - y\\-3x + 4y\end{bmatrix}$
30. $S : \mathscr{P}_1 \to \mathscr{P}_1$ defined by $S(a + bx) = (-4a + b) + 2ax$ and $T : \mathscr{P}_1 \to \mathscr{P}_1$ defined by $T(a + bx) = b/2 + (a + 2b)x$
31. Prove Theorem 6.17.
32. Let $T : V \to V$ be a linear transformation such that $T \circ T = I$.
(a) Show that $\{\mathbf{v}, T(\mathbf{v})\}$ is linearly dependent if and only if $T(\mathbf{v}) = \pm\mathbf{v}$.
(b) Give an example of such a linear transformation with $V = \mathbb{R}^2$.
33. Let $T : V \to V$ be a linear transformation such that $T \circ T = T$.
(a) Show that $\{\mathbf{v}, T(\mathbf{v})\}$ is linearly dependent if and only if $T(\mathbf{v}) = \mathbf{v}$ or $T(\mathbf{v}) = \mathbf{0}$.
(b) Give an example of such a linear transformation with $V = \mathbb{R}^2$.
The set of all linear transformations from a vector space $V$ to a vector space $W$ is denoted by $\mathscr{L}(V, W)$.
If $S$ and $T$ are in $\mathscr{L}(V, W)$, we can define the sum $S + T$ of $S$ and $T$ by
$$(S + T)(\mathbf{v}) = S(\mathbf{v}) + T(\mathbf{v}) \quad\text{for all } \mathbf{v} \text{ in } V$$
If $c$ is a scalar, we define the scalar multiple $cT$ of $T$ by $c$ to be
$$(cT)(\mathbf{v}) = cT(\mathbf{v}) \quad\text{for all } \mathbf{v} \text{ in } V$$
Then $S + T$ and $cT$ are both transformations from $V$ to $W$.
34. Prove that $S + T$ and $cT$ are linear transformations.
35. Prove that $\mathscr{L}(V, W)$ is a vector space with this addition and scalar multiplication.
36. Let $R$, $S$, and $T$ be linear transformations such that the following operations make sense. Prove that:
(a) $R \circ (S + T) = R \circ S + R \circ T$
(b) $c(R \circ S) = (cR) \circ S = R \circ (cS)$ for any scalar $c$

The Kernel and Range of a Linear Transformation

The null space and column space are two of the fundamental subspaces associated with a matrix. In this section, we extend these notions to the kernel and range of a linear transformation.

The word kernel is derived from the Old English word cyrnel, a form of the word corn, meaning "seed" or "grain." Like a kernel of corn, the kernel of a linear transformation is its "core" or "seed" in the sense that it carries information about many of the important properties of the transformation.

Definition Let $T : V \to W$ be a linear transformation. The kernel of $T$, denoted $\ker(T)$, is the set of all vectors in $V$ that are mapped by $T$ to $\mathbf{0}$ in $W$. That is,
$$\ker(T) = \{\mathbf{v} \text{ in } V : T(\mathbf{v}) = \mathbf{0}\}$$
The range of $T$, denoted $\operatorname{range}(T)$, is the set of all vectors in $W$ that are images of vectors in $V$ under $T$. That is,
$$\operatorname{range}(T) = \{T(\mathbf{v}) : \mathbf{v} \text{ in } V\} = \{\mathbf{w} \text{ in } W : \mathbf{w} = T(\mathbf{v}) \text{ for some } \mathbf{v} \text{ in } V\}$$

Example 6.59
Let $A$ be an $m \times n$ matrix and let $T = T_A$ be the corresponding matrix transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ defined by $T(\mathbf{v}) = A\mathbf{v}$. Then, as we saw in Chapter 3, the range of $T$ is the column space of $A$. The kernel of $T$ is
$$\ker(T) = \{\mathbf{v} \text{ in } \mathbb{R}^n : T(\mathbf{v}) = \mathbf{0}\} = \{\mathbf{v} \text{ in } \mathbb{R}^n : A\mathbf{v} = \mathbf{0}\} = \operatorname{null}(A)$$
In words, the kernel of a matrix transformation is just the null space of the corresponding matrix.
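Example 6.59 says that computing $\ker(T_A)$ is the same as computing $\operatorname{null}(A)$. As a minimal sketch (our own, not from the text), the Python below row reduces $A$ to reduced row echelon form over exact rationals with `fractions.Fraction` and returns one null-space basis vector per free variable. The helper names `rref` and `null_space` are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the reduced row echelon form of a matrix given as
    a list of rows, computed over the rationals, and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots = []
    r = 0
    for c in range(n):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]          # make the pivot equal to 1
        for i in range(m):                       # clear column c in other rows
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def null_space(A):
    """Basis for null(A) = ker(T_A): one vector per free variable."""
    R, pivots = rref(A)
    n = len(A[0])
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                       # set this free variable to 1
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]                      # solve for the pivot variables
        basis.append(v)
    return basis
```

For instance, with $A = \begin{bmatrix}1 & 2 & 3\\2 & 4 & 6\end{bmatrix}$, `null_space(A)` returns the two vectors $(-2, 1, 0)$ and $(-3, 0, 1)$ (as `Fraction` entries), so $\ker(T_A)$ is a plane in $\mathbb{R}^3$ and $\operatorname{nullity}(T_A) = 2$.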
Example 6.60
Find the kernel and range of the differential operator $D : \mathscr{P}_3 \to \mathscr{P}_2$ defined by $D(p(x)) = p'(x)$.

Solution Since $D(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2$, we have
$$\ker(D) = \{a + bx + cx^2 + dx^3 : D(a + bx + cx^2 + dx^3) = 0\} = \{a + bx + cx^2 + dx^3 : b + 2cx + 3dx^2 = 0\}$$
But $b + 2cx + 3dx^2 = 0$ if and only if $b = 2c = 3d = 0$, which implies that $b = c = d = 0$. Therefore,
$$\ker(D) = \{a + bx + cx^2 + dx^3 : b = c = d = 0\} = \{a : a \text{ in } \mathbb{R}\}$$
In other words, the kernel of $D$ is the set of constant polynomials.
The range of $D$ is all of $\mathscr{P}_2$, since every polynomial in $\mathscr{P}_2$ is the image under $D$ (i.e., the derivative) of some polynomial in $\mathscr{P}_3$. To be specific, if $a + bx + cx^2$ is in $\mathscr{P}_2$, then
$$a + bx + cx^2 = D\left(ax + \frac{b}{2}x^2 + \frac{c}{3}x^3\right)$$

Example 6.61
Let $S : \mathscr{P}_1 \to \mathbb{R}$ be the linear transformation defined by
$$S(p(x)) = \int_0^1 p(x)\,dx$$
Find the kernel and range of $S$.

Solution In detail, we have
$$S(a + bx) = \int_0^1 (a + bx)\,dx = \left[ax + \frac{b}{2}x^2\right]_0^1 = \left(a + \frac{b}{2}\right) - 0 = a + \frac{b}{2}$$
Therefore,
$$\ker(S) = \{a + bx : S(a + bx) = 0\} = \left\{a + bx : a + \frac{b}{2} = 0\right\} = \left\{a + bx : a = -\frac{b}{2}\right\} = \left\{-\frac{b}{2} + bx : b \text{ in } \mathbb{R}\right\}$$
Geometrically, $\ker(S)$ consists of all those linear polynomials whose graphs have the property that the area between the line and the $x$-axis is equally distributed above and below the axis on the interval $[0, 1]$ (see Figure 6.7).

Figure 6.7 If $y = -\frac{b}{2} + bx$, then $\int_0^1 y\,dx = 0$.

The range of $S$ is $\mathbb{R}$, since every real number can be obtained as the image under $S$ of some polynomial in $\mathscr{P}_1$. For example, if $a$ is an arbitrary real number, then
$$\int_0^1 a\,dx = \left[ax\right]_0^1 = a - 0 = a$$
so $a = S(a)$.

Example 6.62
Let $T : M_{22} \to M_{22}$ be the linear transformation defined by taking transposes: $T(A) = A^T$. Find the kernel and range of $T$.

Solution We see that
$$\ker(T) = \{A \text{ in } M_{22} : T(A) = O\} = \{A \text{ in } M_{22} : A^T = O\}$$
But if $A^T = O$, then $A = (A^T)^T = O^T = O$. It follows that $\ker(T) = \{O\}$.
Since, for any matrix $A$ in $M_{22}$, we have $A = (A^T)^T = T(A^T)$ (and $A^T$ is in $M_{22}$), we deduce that $\operatorname{range}(T) = M_{22}$.

In all of these examples, the kernel and range of a linear transformation are subspaces of the domain and codomain, respectively, of the transformation. Since we are generalizing the null space and column space of a matrix, this is perhaps not surprising. Nevertheless, we should not take anything for granted, so we need to prove that it is not a coincidence.

Theorem 6.18
Let $T : V \to W$ be a linear transformation. Then:
a. The kernel of $T$ is a subspace of $V$.
b. The range of $T$ is a subspace of $W$.

Proof (a) Since $T(\mathbf{0}) = \mathbf{0}$, the zero vector of $V$ is in $\ker(T)$, so $\ker(T)$ is nonempty. Let $\mathbf{u}$ and $\mathbf{v}$ be in $\ker(T)$ and let $c$ be a scalar. Then $T(\mathbf{u}) = T(\mathbf{v}) = \mathbf{0}$, so
$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) = \mathbf{0} + \mathbf{0} = \mathbf{0} \quad\text{and}\quad T(c\mathbf{u}) = cT(\mathbf{u}) = c\mathbf{0} = \mathbf{0}$$
Therefore, $\mathbf{u} + \mathbf{v}$ and $c\mathbf{u}$ are in $\ker(T)$, and $\ker(T)$ is a subspace of $V$.
(b) Since $\mathbf{0} = T(\mathbf{0})$, the zero vector of $W$ is in $\operatorname{range}(T)$, so $\operatorname{range}(T)$ is nonempty. Let $T(\mathbf{u})$ and $T(\mathbf{v})$ be in the range of $T$ and let $c$ be a scalar. Then $T(\mathbf{u}) + T(\mathbf{v}) = T(\mathbf{u} + \mathbf{v})$ is the image of the vector $\mathbf{u} + \mathbf{v}$. Since $\mathbf{u}$ and $\mathbf{v}$ are in $V$, so is $\mathbf{u} + \mathbf{v}$, and hence $T(\mathbf{u}) + T(\mathbf{v})$ is in $\operatorname{range}(T)$. Similarly, $cT(\mathbf{u}) = T(c\mathbf{u})$. Since $\mathbf{u}$ is in $V$, so is $c\mathbf{u}$, and hence $cT(\mathbf{u})$ is in $\operatorname{range}(T)$. Therefore, $\operatorname{range}(T)$ is a nonempty subset of $W$ that is closed under addition and scalar multiplication, and thus it is a subspace of $W$.

Figure 6.8 gives a schematic representation of the kernel and range of a linear transformation.

Figure 6.8 The kernel and range of $T : V \to W$

In Chapter 3, we defined the rank of a matrix to be the dimension of its column space and the nullity of a matrix to be the dimension of its null space. We now extend these definitions to linear transformations.

Definition Let $T : V \to W$ be a linear transformation. The rank of $T$ is the dimension of the range of $T$ and is denoted by $\operatorname{rank}(T)$. The nullity of $T$ is the dimension of the kernel of $T$ and is denoted by $\operatorname{nullity}(T)$.

Example 6.63
If $A$ is a matrix and $T = T_A$ is the matrix transformation defined by $T(\mathbf{v}) = A\mathbf{v}$, then the range and kernel of $T$ are the column space and the null space of $A$, respectively, by Example 6.59. Hence, from Section 3.5, we have
$$\operatorname{rank}(T) = \operatorname{rank}(A) \quad\text{and}\quad \operatorname{nullity}(T) = \operatorname{nullity}(A)$$

Example 6.64
Find the rank and the nullity of the linear transformation $D : \mathscr{P}_3 \to \mathscr{P}_2$ defined by $D(p(x)) = p'(x)$.

Solution In Example 6.60, we computed $\operatorname{range}(D) = \mathscr{P}_2$, so
$$\operatorname{rank}(D) = \dim \mathscr{P}_2 = 3$$
The kernel of $D$ is the set of all constant polynomials: $\ker(D) = \{a : a \text{ in } \mathbb{R}\} = \{a \cdot 1 : a \text{ in } \mathbb{R}\}$. Hence, $\{1\}$ is a basis for $\ker(D)$, so
$$\operatorname{nullity}(D) = \dim(\ker(D)) = 1$$

Example 6.65
Find the rank and the nullity of the linear transformation $S : \mathscr{P}_1 \to \mathbb{R}$ defined by
$$S(p(x)) = \int_0^1 p(x)\,dx$$

Solution From Example 6.61, $\operatorname{range}(S) = \mathbb{R}$ and $\operatorname{rank}(S) = \dim \mathbb{R} = 1$. Also,
$$\ker(S) = \left\{-\frac{b}{2} + bx : b \text{ in } \mathbb{R}\right\} = \left\{b\left(-\frac{1}{2} + x\right) : b \text{ in } \mathbb{R}\right\} = \operatorname{span}\left(-\frac{1}{2} + x\right)$$
so $\left\{-\frac{1}{2} + x\right\}$ is a basis for $\ker(S)$. Therefore, $\operatorname{nullity}(S) = \dim(\ker(S)) = 1$.

Example 6.66
Find the rank and the nullity of the linear transformation $T : M_{22} \to M_{22}$ defined by $T(A) = A^T$.

Solution In Example 6.62, we found that $\operatorname{range}(T) = M_{22}$ and $\ker(T) = \{O\}$. Hence,
$$\operatorname{rank}(T) = \dim M_{22} = 4 \quad\text{and}\quad \operatorname{nullity}(T) = \dim\{O\} = 0$$

In Chapter 3, we saw that the rank and nullity of an $m \times n$ matrix $A$ are related by the formula $\operatorname{rank}(A) + \operatorname{nullity}(A) = n$. This is the Rank Theorem (Theorem 3.26).
Since the matrix transformation $T = T_A$ has $\mathbb{R}^n$ as its domain, we could rewrite the relationship as
$$\operatorname{rank}(A) + \operatorname{nullity}(A) = \dim \mathbb{R}^n$$
This version of the Rank Theorem extends very nicely to general linear transformations, as you can see from the last three examples:
$$\operatorname{rank}(D) + \operatorname{nullity}(D) = 3 + 1 = 4 = \dim \mathscr{P}_3 \quad \text{(Example 6.64)}$$
$$\operatorname{rank}(S) + \operatorname{nullity}(S) = 1 + 1 = 2 = \dim \mathscr{P}_1 \quad \text{(Example 6.65)}$$
$$\operatorname{rank}(T) + \operatorname{nullity}(T) = 4 + 0 = 4 = \dim M_{22} \quad \text{(Example 6.66)}$$

Theorem 6.19 The Rank Theorem
Let $T : V \to W$ be a linear transformation from a finite-dimensional vector space $V$ into a vector space $W$. Then
$$\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim V$$

In the next section, you will see how to adapt the proof of Theorem 3.26 to prove this version of the result. For now, we give an alternative proof that does not use matrices.

Proof Let $\dim V = n$ and let $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be a basis for $\ker(T)$ [so that $\operatorname{nullity}(T) = \dim(\ker(T)) = k$]. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a linearly independent set, it can be extended to a basis for $V$, by Theorem 6.28. Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_k, \mathbf{v}_{k+1}, \ldots, \mathbf{v}_n\}$ be such a basis. If we can show that the set $\mathcal{C} = \{T(\mathbf{v}_{k+1}), \ldots, T(\mathbf{v}_n)\}$ is a basis for $\operatorname{range}(T)$, then we will have $\operatorname{rank}(T) = \dim(\operatorname{range}(T)) = n - k$ and thus
$$\operatorname{rank}(T) + \operatorname{nullity}(T) = k + (n - k) = n = \dim V$$
as required.

Certainly $\mathcal{C}$ is contained in the range of $T$. To show that $\mathcal{C}$ spans the range of $T$, let $T(\mathbf{v})$ be a vector in the range of $T$. Then $\mathbf{v}$ is in $V$, and since $\mathcal{B}$ is a basis for $V$, we can find scalars $c_1, \ldots, c_n$ such that
$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k + c_{k+1}\mathbf{v}_{k+1} + \cdots + c_n\mathbf{v}_n$$
Since $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are in the kernel of $T$, we have $T(\mathbf{v}_1) = \cdots = T(\mathbf{v}_k) = \mathbf{0}$, so
$$\begin{aligned} T(\mathbf{v}) &= T(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k + c_{k+1}\mathbf{v}_{k+1} + \cdots + c_n\mathbf{v}_n) \\ &= c_1T(\mathbf{v}_1) + \cdots + c_kT(\mathbf{v}_k) + c_{k+1}T(\mathbf{v}_{k+1}) + \cdots + c_nT(\mathbf{v}_n) \\ &= c_{k+1}T(\mathbf{v}_{k+1}) + \cdots + c_nT(\mathbf{v}_n) \end{aligned}$$
This shows that the range of $T$ is spanned by $\mathcal{C}$.

To show that $\mathcal{C}$ is linearly independent, suppose that there are scalars $c_{k+1}, \ldots, c_n$ such that
$$c_{k+1}T(\mathbf{v}_{k+1}) + \cdots + c_nT(\mathbf{v}_n) = \mathbf{0}$$
Then $T(c_{k+1}\mathbf{v}_{k+1} + \cdots + c_n\mathbf{v}_n) = \mathbf{0}$, which means that $c_{k+1}\mathbf{v}_{k+1} + \cdots + c_n\mathbf{v}_n$ is in the kernel of $T$ and is, hence, expressible as a linear combination of the basis vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ of $\ker(T)$, say,
$$c_{k+1}\mathbf{v}_{k+1} + \cdots + c_n\mathbf{v}_n = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$$
But now
$$c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k - c_{k+1}\mathbf{v}_{k+1} - \cdots - c_n\mathbf{v}_n = \mathbf{0}$$
and the linear independence of $\mathcal{B}$ forces $c_1 = \cdots = c_n = 0$. In particular, $c_{k+1} = \cdots = c_n = 0$, which means $\mathcal{C}$ is linearly independent.

We have shown that $\mathcal{C}$ is a basis for the range of $T$, so, by our comments above, the proof is complete.

We have verified the Rank Theorem for Examples 6.64, 6.65, and 6.66. In practice, this theorem allows us to find the rank and nullity of a linear transformation with only half the work. The following examples illustrate the process.

Example 6.67
Find the rank and nullity of the linear transformation $T : \mathscr{P}_2 \to \mathscr{P}_3$ defined by $T(p(x)) = xp(x)$. (Check that $T$ really is linear.)

Solution In detail, we have
$$T(a + bx + cx^2) = ax + bx^2 + cx^3$$
It follows that
$$\ker(T) = \{a + bx + cx^2 : T(a + bx + cx^2) = 0\} = \{a + bx + cx^2 : ax + bx^2 + cx^3 = 0\} = \{a + bx + cx^2 : a = b = c = 0\} = \{0\}$$
so we have $\operatorname{nullity}(T) = \dim(\ker(T)) = 0$. The Rank Theorem implies that
$$\operatorname{rank}(T) = \dim \mathscr{P}_2 - \operatorname{nullity}(T) = 3 - 0 = 3$$

Remark In Example 6.67, it would be just as easy to find the rank of $T$ first, since $\{x, x^2, x^3\}$ is easily seen to be a basis for the range of $T$. Usually, though, one of the two (the rank or the nullity of a linear transformation) will be easier to compute; the Rank Theorem can then be used to find the other. With practice, you will become better at knowing which way to proceed.

Example 6.68
Let $W$ be the vector space of all symmetric $2 \times 2$ matrices. Define a linear transformation $T : W \to \mathscr{P}_2$ by
$$T\begin{bmatrix}a & b\\b & c\end{bmatrix} = (a - b) + (b - c)x + (c - a)x^2$$
(Check that $T$ is linear.) Find the rank and nullity of $T$.
Solution The nullity of $T$ is easier to compute directly than the rank, so we proceed as follows:
$$\begin{aligned} \ker(T) &= \left\{\begin{bmatrix}a & b\\b & c\end{bmatrix} : T\begin{bmatrix}a & b\\b & c\end{bmatrix} = 0\right\} = \left\{\begin{bmatrix}a & b\\b & c\end{bmatrix} : (a - b) + (b - c)x + (c - a)x^2 = 0\right\} \\ &= \left\{\begin{bmatrix}a & b\\b & c\end{bmatrix} : a - b = b - c = c - a = 0\right\} = \left\{\begin{bmatrix}a & b\\b & c\end{bmatrix} : a = b = c\right\} \\ &= \left\{\begin{bmatrix}a & a\\a & a\end{bmatrix}\right\} = \operatorname{span}\left(\begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}\right) \end{aligned}$$
Therefore, $\left\{\begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}\right\}$ is a basis for the kernel of $T$, so $\operatorname{nullity}(T) = \dim(\ker(T)) = 1$. The Rank Theorem and Example 6.42 tell us that $\operatorname{rank}(T) = \dim W - \operatorname{nullity}(T) = 3 - 1 = 2$.

One-to-One and Onto Linear Transformations

We now investigate criteria for a linear transformation to be invertible. The keys to the discussion are the very important properties one-to-one and onto.

Definition A linear transformation $T : V \to W$ is called one-to-one if $T$ maps distinct vectors in $V$ to distinct vectors in $W$. If $\operatorname{range}(T) = W$, then $T$ is called onto.

Remarks
• The definition of one-to-one may be written more formally as follows:
$T : V \to W$ is one-to-one if, for all $\mathbf{u}$ and $\mathbf{v}$ in $V$, $\mathbf{u} \neq \mathbf{v}$ implies that $T(\mathbf{u}) \neq T(\mathbf{v})$.
The above statement is equivalent to the following:
$T : V \to W$ is one-to-one if, for all $\mathbf{u}$ and $\mathbf{v}$ in $V$, $T(\mathbf{u}) = T(\mathbf{v})$ implies that $\mathbf{u} = \mathbf{v}$.
Figure 6.9 illustrates these two statements.
Figure 6.9 (a) $T$ is one-to-one. (b) $T$ is not one-to-one.
• Another way to write the definition of onto is as follows:
$T : V \to W$ is onto if, for all $\mathbf{w}$ in $W$, there is at least one $\mathbf{v}$ in $V$ such that $\mathbf{w} = T(\mathbf{v})$.
In other words, given $\mathbf{w}$ in $W$, does there exist some $\mathbf{v}$ in $V$ such that $\mathbf{w} = T(\mathbf{v})$? If, for an arbitrary $\mathbf{w}$, we can solve this equation for $\mathbf{v}$, then $T$ is onto (see Figure 6.10).
Figure 6.10 (a) $T$ is onto. (b) $T$ is not onto.

Example 6.69
Which of the following linear transformations are one-to-one? onto?
(a) $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}2x\\x - y\\0\end{bmatrix}$
(b) $D : \mathscr{P}_3 \to \mathscr{P}_2$ defined by $D(p(x)) = p'(x)$
(c) $T : M_{22} \to M_{22}$ defined by $T(A) = A^T$

Solution (a) Let $T\begin{bmatrix}x_1\\y_1\end{bmatrix} = T\begin{bmatrix}x_2\\y_2\end{bmatrix}$. Then
$$\begin{bmatrix}2x_1\\x_1 - y_1\\0\end{bmatrix} = \begin{bmatrix}2x_2\\x_2 - y_2\\0\end{bmatrix}$$
so $2x_1 = 2x_2$ and $x_1 - y_1 = x_2 - y_2$. Solving these equations, we see that $x_1 = x_2$ and $y_1 = y_2$. Hence, $\begin{bmatrix}x_1\\y_1\end{bmatrix} = \begin{bmatrix}x_2\\y_2\end{bmatrix}$, so $T$ is one-to-one.
$T$ is not onto, since its range is not all of $\mathbb{R}^3$. To be specific, there is no vector in $\mathbb{R}^2$ such that $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0\\0\\1\end{bmatrix}$. (Why not?)
(b) In Example 6.60, we showed that $\operatorname{range}(D) = \mathscr{P}_2$, so $D$ is onto. $D$ is not one-to-one, since distinct polynomials in $\mathscr{P}_3$ can have the same derivative. For example, $x^3 \neq x^3 + 1$, but $D(x^3) = 3x^2 = D(x^3 + 1)$.
(c) Let $A$ and $B$ be in $M_{22}$, with $T(A) = T(B)$. Then $A^T = B^T$, so $A = (A^T)^T = (B^T)^T = B$. Hence, $T$ is one-to-one. In Example 6.62, we showed that $\operatorname{range}(T) = M_{22}$. Hence, $T$ is onto.

It turns out that there is a very simple criterion for determining whether a linear transformation is one-to-one.

Theorem 6.20
A linear transformation $T : V \to W$ is one-to-one if and only if $\ker(T) = \{\mathbf{0}\}$.

Proof Assume that $T$ is one-to-one. If $\mathbf{v}$ is in the kernel of $T$, then $T(\mathbf{v}) = \mathbf{0}$. But we also know that $T(\mathbf{0}) = \mathbf{0}$, so $T(\mathbf{v}) = T(\mathbf{0})$. Since $T$ is one-to-one, this implies that $\mathbf{v} = \mathbf{0}$, so the only vector in the kernel of $T$ is the zero vector.
Conversely, assume that $\ker(T) = \{\mathbf{0}\}$. To show that $T$ is one-to-one, let $\mathbf{u}$ and $\mathbf{v}$ be in $V$ with $T(\mathbf{u}) = T(\mathbf{v})$. Then $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v}) = \mathbf{0}$, which implies that $\mathbf{u} - \mathbf{v}$ is in the kernel of $T$. But $\ker(T) = \{\mathbf{0}\}$, so we must have $\mathbf{u} - \mathbf{v} = \mathbf{0}$ or, equivalently, $\mathbf{u} = \mathbf{v}$. This proves that $T$ is one-to-one.

Example 6.70
Show that the linear transformation $T : \mathbb{R}^2 \to \mathscr{P}_1$ defined by
$$T\begin{bmatrix}a\\b\end{bmatrix} = a + (a + b)x$$
is one-to-one and onto.

Solution If $\begin{bmatrix}a\\b\end{bmatrix}$ is in the kernel of $T$, then
$$0 = T\begin{bmatrix}a\\b\end{bmatrix} = a + (a + b)x$$
It follows that $a = 0$ and $a + b = 0$. Hence, $b = 0$, and therefore $\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$. Consequently, $\ker(T) = \left\{\begin{bmatrix}0\\0\end{bmatrix}\right\}$, and $T$ is one-to-one, by Theorem 6.20.
By the Rank Theorem,
$$\operatorname{rank}(T) = \dim \mathbb{R}^2 - \operatorname{nullity}(T) = 2 - 0 = 2$$
Therefore, the range of $T$ is a two-dimensional subspace of $\mathscr{P}_1$, and hence $\operatorname{range}(T) = \mathscr{P}_1$. It follows that $T$ is onto.

For linear transformations between two $n$-dimensional vector spaces, the properties of one-to-one and onto are closely related. Observe first that for a linear transformation $T : V \to W$, $\ker(T) = \{\mathbf{0}\}$ if and only if $\operatorname{nullity}(T) = 0$, and $T$ is onto if and only if $\operatorname{rank}(T) = \dim W$. (Why?) The proof of the next theorem essentially uses the method of Example 6.70.

Theorem 6.21
Let $\dim V = \dim W = n$. Then a linear transformation $T : V \to W$ is one-to-one if and only if it is onto.

Proof Assume that $T$ is one-to-one. Then $\operatorname{nullity}(T) = 0$ by Theorem 6.20 and the remark preceding Theorem 6.21. The Rank Theorem implies that
$$\operatorname{rank}(T) = \dim V - \operatorname{nullity}(T) = n - 0 = n$$
Therefore, $T$ is onto.
Conversely, assume that $T$ is onto. Then $\operatorname{rank}(T) = \dim W = n$. By the Rank Theorem,
$$\operatorname{nullity}(T) = \dim V - \operatorname{rank}(T) = n - n = 0$$
Hence, $\ker(T) = \{\mathbf{0}\}$, and $T$ is one-to-one.

In Section 6.4, we pointed out that if $T : V \to W$ is a linear transformation, then the image of a basis for $V$ under $T$ need not be a basis for the range of $T$. We can now give a condition that ensures that a basis for $V$ will be mapped by $T$ to a basis for $W$.

Theorem 6.22
Let $T : V \to W$ be a one-to-one linear transformation. If $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a linearly independent set in $V$, then $T(S) = \{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_k)\}$ is a linearly independent set in $W$.

Proof Let $c_1, \ldots, c_k$ be scalars such that
$$c_1T(\mathbf{v}_1) + \cdots + c_kT(\mathbf{v}_k) = \mathbf{0}$$
Then $T(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) = \mathbf{0}$, which implies that $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$ is in the kernel of $T$. But, since $T$ is one-to-one, $\ker(T) = \{\mathbf{0}\}$, by Theorem 6.20. Hence,
$$c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$$
But, since $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is linearly independent, all of the scalars $c_i$ must be $0$. Therefore, $\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_k)\}$ is linearly independent.
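Examples 6.68 and 6.70 both come down to computing the rank of a matrix that represents $T$ with respect to chosen bases. The Python sketch below is our own illustration, not the text's method. It assumes the basis $E_1 = \begin{bmatrix}1 & 0\\0 & 0\end{bmatrix}$, $E_2 = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$, $E_3 = \begin{bmatrix}0 & 0\\0 & 1\end{bmatrix}$ for $W$ (so a symmetric matrix has coordinates $(a, b, c)$), the bases $\{1, x, x^2\}$ for $\mathscr{P}_2$ and $\{1, x\}$ for $\mathscr{P}_1$, and exact rational arithmetic. The helpers `rank` and `matrix_of` and the names `T68` and `T70` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        # look for a pivot in column c at or below row r
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):       # clear the entries below the pivot
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def matrix_of(T, basis):
    """Matrix whose columns are the coordinate vectors T(b) for b in basis."""
    return [list(col) for col in zip(*(T(*b) for b in basis))]


# Example 6.68: [[a, b], [b, c]] -> (a - b) + (b - c)x + (c - a)x^2,
# written as the coefficient vector [const, x, x^2].
def T68(a, b, c):
    return [a - b, b - c, c - a]


A68 = matrix_of(T68, [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
rank68 = rank(A68)
nullity68 = 3 - rank68                       # Rank Theorem with dim W = 3


# Example 6.70: [a, b] -> a + (a + b)x, written as [const, x].
def T70(a, b):
    return [a, a + b]


A70 = matrix_of(T70, [(1, 0), (0, 1)])
rank70 = rank(A70)
nullity70 = 2 - rank70                       # Rank Theorem with dim R^2 = 2
one_to_one = nullity70 == 0                  # ker(T) = {0}, Theorem 6.20
onto = rank70 == 2                           # rank(T) = dim P1
```

Under these assumptions the sketch finds rank 2 and nullity 1 for Example 6.68, and rank 2 and nullity 0 for Example 6.70, so the latter map is both one-to-one and onto, as Theorem 6.21 predicts for spaces of equal dimension. A singular matrix such as `[[1, 2], [2, 4]]` has rank 1 and would fail both tests.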
Corollary 6.23
Let $\dim V = \dim W = n$. Then a one-to-one linear transformation $T : V \to W$ maps a basis for $V$ to a basis for $W$.

Proof Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a basis for $V$. By Theorem 6.22, $T(\mathcal{B}) = \{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\}$ is a linearly independent set in $W$, so we need only show that $T(\mathcal{B})$ spans $W$. But, by Theorem 6.15, $T(\mathcal{B})$ spans the range of $T$. Moreover, $T$ is onto, by Theorem 6.21, so $\operatorname{range}(T) = W$. Therefore, $T(\mathcal{B})$ spans $W$, which completes the proof.

Example 6.71
Let $T : \mathbb{R}^2 \to \mathscr{P}_1$ be the linear transformation from Example 6.70, defined by
$$T\begin{bmatrix}a\\b\end{bmatrix} = a + (a + b)x$$
Then, by Corollary 6.23, the standard basis $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2\}$ for $\mathbb{R}^2$ is mapped to a basis $T(\mathcal{E}) = \{T(\mathbf{e}_1), T(\mathbf{e}_2)\}$ of $\mathscr{P}_1$. We find that
$$T(\mathbf{e}_1) = T\begin{bmatrix}1\\0\end{bmatrix} = 1 + x \quad\text{and}\quad T(\mathbf{e}_2) = T\begin{bmatrix}0\\1\end{bmatrix} = x$$
It follows that $\{1 + x, x\}$ is a basis for $\mathscr{P}_1$.

We can now determine which linear transformations $T : V \to W$ are invertible.

Theorem 6.24
A linear transformation $T : V \to W$ is invertible if and only if it is one-to-one and onto.

Proof Assume that $T$ is invertible. Then there exists a linear transformation $T^{-1} : W \to V$ such that
$$T^{-1} \circ T = I_V \quad\text{and}\quad T \circ T^{-1} = I_W$$
To show that $T$ is one-to-one, let $\mathbf{v}$ be in the kernel of $T$. Then $T(\mathbf{v}) = \mathbf{0}$. Therefore,
$$T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{0}) \Rightarrow (T^{-1} \circ T)(\mathbf{v}) = \mathbf{0} \Rightarrow I(\mathbf{v}) = \mathbf{0} \Rightarrow \mathbf{v} = \mathbf{0}$$
which establishes that $\ker(T) = \{\mathbf{0}\}$. Therefore, $T$ is one-to-one, by Theorem 6.20.
To show that $T$ is onto, let $\mathbf{w}$ be in $W$ and let $\mathbf{v} = T^{-1}(\mathbf{w})$. Then
$$T(\mathbf{v}) = T(T^{-1}(\mathbf{w})) = (T \circ T^{-1})(\mathbf{w}) = I(\mathbf{w}) = \mathbf{w}$$
which shows that $\mathbf{w}$ is the image of $\mathbf{v}$ under $T$. Since $\mathbf{v}$ is in $V$, this shows that $T$ is onto.
Conversely, assume that $T$ is one-to-one and onto. This means that $\operatorname{nullity}(T) = 0$ and $\operatorname{rank}(T) = \dim W$. We need to show that there exists a linear transformation $T' : W \to V$ such that $T' \circ T = I_V$ and $T \circ T' = I_W$. Let $\mathbf{w}$ be in $W$. Since $T$ is onto, there exists some vector $\mathbf{v}$ in $V$ such that $T(\mathbf{v}) = \mathbf{w}$.
There is only one such vector $\mathbf{v}$, since, if $\mathbf{v}'$ is another vector in $V$ such that $T(\mathbf{v}') = \mathbf{w}$, then $T(\mathbf{v}) = T(\mathbf{v}')$; the fact that $T$ is one-to-one then implies that $\mathbf{v} = \mathbf{v}'$. It therefore makes sense to define a mapping $T' : W \to V$ by setting $T'(\mathbf{w}) = \mathbf{v}$. It follows that
$$(T' \circ T)(\mathbf{v}) = T'(T(\mathbf{v})) = T'(\mathbf{w}) = \mathbf{v} \quad\text{and}\quad (T \circ T')(\mathbf{w}) = T(T'(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w}$$
It then follows that $T' \circ T = I_V$ and $T \circ T' = I_W$. Now we must show that $T'$ is a linear transformation. To this end, let $\mathbf{w}_1$ and $\mathbf{w}_2$ be in $W$ and let $c_1$ and $c_2$ be scalars. As above, let $T(\mathbf{v}_1) = \mathbf{w}_1$ and $T(\mathbf{v}_2) = \mathbf{w}_2$. Then $\mathbf{v}_1 = T'(\mathbf{w}_1)$ and $\mathbf{v}_2 = T'(\mathbf{w}_2)$ and
$$T'(c_1\mathbf{w}_1 + c_2\mathbf{w}_2) = T'(c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2)) = T'(T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2)) = I(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = c_1T'(\mathbf{w}_1) + c_2T'(\mathbf{w}_2)$$
Consequently, $T'$ is linear, so, by Theorem 6.17, $T' = T^{-1}$.

Isomorphisms of Vector Spaces

We now are in a position to describe, in concrete terms, what it means for two vector spaces to be "essentially the same."

Definition A linear transformation $T : V \to W$ is called an isomorphism if it is one-to-one and onto. If $V$ and $W$ are two vector spaces such that there is an isomorphism from $V$ to $W$, then we say that $V$ is isomorphic to $W$ and write $V \cong W$.

The words isomorphism and isomorphic are derived from the Greek words isos, meaning "equal," and morph, meaning "shape." Thus, figuratively speaking, isomorphic vector spaces have "equal shapes."

Example 6.72
Show that $\mathscr{P}_{n-1}$ and $\mathbb{R}^n$ are isomorphic.

Solution The process of forming the coordinate vector of a polynomial provides us with one possible isomorphism (as we observed already in Section 6.2, although we did not use the term isomorphism there). Specifically, define $T : \mathscr{P}_{n-1} \to \mathbb{R}^n$ by $T(p(x)) = [p(x)]_{\mathcal{E}}$, where $\mathcal{E} = \{1, x, \ldots, x^{n-1}\}$ is the standard basis for $\mathscr{P}_{n-1}$. That is,
$$T(a_0 + a_1x + \cdots + a_{n-1}x^{n-1}) = \begin{bmatrix}a_0\\a_1\\\vdots\\a_{n-1}\end{bmatrix}$$
Theorem 6.6 shows that $T$ is a linear transformation.
If $p(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$ is in the kernel of $T$, then
$$\begin{bmatrix}a_0\\a_1\\\vdots\\a_{n-1}\end{bmatrix} = T(p(x)) = \begin{bmatrix}0\\0\\\vdots\\0\end{bmatrix}$$
Hence, $a_0 = a_1 = \cdots = a_{n-1} = 0$, so $p(x) = 0$. Therefore, $\ker(T) = \{0\}$, and $T$ is one-to-one. Since $\dim \mathscr{P}_{n-1} = \dim \mathbb{R}^n = n$, $T$ is also onto, by Theorem 6.21. Thus, $T$ is an isomorphism, and $\mathscr{P}_{n-1} \cong \mathbb{R}^n$.

Example 6.73
Show that $M_{mn}$ and $\mathbb{R}^{mn}$ are isomorphic.

Solution Once again, the coordinate mapping from $M_{mn}$ to $\mathbb{R}^{mn}$ (as in Example 6.36) is an isomorphism. The details of the proof are left as an exercise.

In fact, the easiest way to tell if two vector spaces are isomorphic is simply to check their dimensions, as the next theorem shows.

Theorem 6.25
Let $V$ and $W$ be two finite-dimensional vector spaces (over the same field of scalars). Then $V$ is isomorphic to $W$ if and only if $\dim V = \dim W$.

Proof Let $n = \dim V$. If $V$ is isomorphic to $W$, then there is an isomorphism $T : V \to W$. Since $T$ is one-to-one, $\operatorname{nullity}(T) = 0$. The Rank Theorem then implies that
$$\operatorname{rank}(T) = \dim V - \operatorname{nullity}(T) = n - 0 = n$$
Therefore, the range of $T$ is an $n$-dimensional subspace of $W$. But, since $T$ is onto, $W = \operatorname{range}(T)$, so $\dim W = n$, as we wished to show.
Conversely, assume that $V$ and $W$ have the same dimension, $n$. Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a basis for $V$ and let $\mathcal{C} = \{\mathbf{w}_1, \ldots, \mathbf{w}_n\}$ be a basis for $W$. We will define a linear transformation $T : V \to W$ and then show that $T$ is one-to-one and onto. An arbitrary vector $\mathbf{v}$ in $V$ can be written uniquely as a linear combination of the vectors in the basis $\mathcal{B}$, say,
$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$$
We define $T$ by
$$T(\mathbf{v}) = c_1\mathbf{w}_1 + \cdots + c_n\mathbf{w}_n$$
It is straightforward to check that $T$ is linear. (Do so.)
To see that $T$ is one-to-one, suppose $\mathbf{v}$ is in the kernel of $T$. Then
$$c_1\mathbf{w}_1 + \cdots + c_n\mathbf{w}_n = T(\mathbf{v}) = \mathbf{0}$$
and the linear independence of $\mathcal{C}$ forces $c_1 = \cdots = c_n = 0$. But then
$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n = \mathbf{0}$$
so $\ker(T) = \{\mathbf{0}\}$, meaning that $T$ is one-to-one. Since $\dim V = \dim W$, $T$ is also onto, by Theorem 6.21. Therefore, $T$ is an isomorphism, and $V \cong W$.

Example 6.74
Show that $\mathbb{R}^n$ and $\mathscr{P}_n$ are not isomorphic.
Solution Since $\dim \mathbb{R}^n = n \neq n + 1 = \dim \mathscr{P}_n$, $\mathbb{R}^n$ and $\mathscr{P}_n$ are not isomorphic, by Theorem 6.25.

Example 6.75 Let $W$ be the vector space of all symmetric $2 \times 2$ matrices. Show that $W$ is isomorphic to $\mathbb{R}^3$.

Solution In Example 6.42, we showed that $\dim W = 3$. Hence, $\dim W = \dim \mathbb{R}^3$, so $W \cong \mathbb{R}^3$, by Theorem 6.25. (There is an obvious candidate for an isomorphism $T : W \to \mathbb{R}^3$. What is it?)

Remark Our examples have all been real vector spaces, but the theorems we have proved are true for vector spaces over the complex numbers $\mathbb{C}$ or $\mathbb{Z}_p$, where $p$ is prime. For example, the vector space $M_{22}(\mathbb{Z}_2)$ of all $2 \times 2$ matrices with entries from $\mathbb{Z}_2$ has dimension 4 as a vector space over $\mathbb{Z}_2$, and hence $M_{22}(\mathbb{Z}_2) \cong \mathbb{Z}_2^4$.

Exercises 6.5

1. Let $T : M_{22} \to M_{22}$ be the linear transformation defined by $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \ldots$
(a) Which, if any, of the following matrices are in $\ker(T)$?
(i) … (ii) … (iii) $\begin{bmatrix} 3 & 0 \\ 0 & -3 \end{bmatrix}$
(b) Which, if any, of the matrices in part (a) are in $\operatorname{range}(T)$?
(c) Describe $\ker(T)$ and $\operatorname{range}(T)$.

2. Let $T : M_{22} \to \mathbb{R}$ be the linear transformation defined by $T(A) = \operatorname{tr}(A)$.
(a) Which, if any, of the following matrices are in $\ker(T)$?
(i) … (ii) … (iii) …
(b) Which, if any, of the following scalars are in $\operatorname{range}(T)$?
(i) $0$ (ii) $2$ (iii) $\sqrt{2}/2$
(c) Describe $\ker(T)$ and $\operatorname{range}(T)$.

3. Let $T : \mathscr{P}_2 \to \mathbb{R}^2$ be the linear transformation defined by $T(a + bx + cx^2) = \begin{bmatrix} a - b \\ b + c \end{bmatrix}$.
(a) Which, if any, of the following polynomials are in $\ker(T)$?
(i) $1 + x$ (ii) $x - x^2$ (iii) $1 + x - x^2$
(b) Which, if any, of the following vectors are in $\operatorname{range}(T)$?
(i) … (ii) … (iii) …
(c) Describe $\ker(T)$ and $\operatorname{range}(T)$.

4. Let $T : \mathscr{P}_2 \to \mathscr{P}_2$ be the linear transformation defined by $T(p(x)) = xp'(x)$.
(a) Which, if any, of the following polynomials are in $\ker(T)$?
(i) $1$ (ii) $x$ (iii) $x^2$
(b) Which, if any, of the polynomials in part (a) are in $\operatorname{range}(T)$?
(c) Describe $\ker(T)$ and $\operatorname{range}(T)$.

In Exercises 5–8, find bases for the kernel and range of the linear transformations $T$ in the indicated exercises. In each case, state the nullity and rank of $T$ and verify the Rank Theorem.

5. Exercise 1
6. Exercise 2
7. Exercise 3
8. Exercise 4

In Exercises 9–14, find either the nullity or the rank of $T$ and then use the Rank Theorem to find the other.

9. $T : M_{22} \to \mathbb{R}^2$ defined by $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a - b \\ c - d \end{bmatrix}$
10. $T : \mathscr{P}_2 \to \mathbb{R}^2$ defined by $T(p(x)) = \ldots$
11. $T : M_{22} \to M_{22}$ defined by $T(A) = AB$, where $B = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$
12. $T : M_{22} \to M_{22}$ defined by $T(A) = AB - BA$, where $B = \ldots$
13. $T : \mathscr{P}_2 \to \mathbb{R}$ defined by $T(p(x)) = p'(0)$
14. $T : M_{33} \to M_{33}$ defined by $T(A) = A - A^T$

In Exercises 15–20, determine whether the linear transformation $T$ is (a) one-to-one and (b) onto.

15. $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x - y \\ x + 2y \end{bmatrix}$
16. $T : \mathbb{R}^2 \to \mathscr{P}_2$ defined by $T\begin{bmatrix} a \\ b \end{bmatrix} = (a - 2b) + (3a + b)x + (a + b)x^2$
17. $T : \mathscr{P}_2 \to \mathbb{R}^3$ defined by $T(a + bx + cx^2) = \ldots$
18. $T : \mathscr{P}_2 \to \mathbb{R}^2$ defined by $T(p(x)) = \ldots$
19. $T : \mathbb{R}^3 \to M_{22}$ defined by $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a + b & b - c \\ a - b & b + c \end{bmatrix}$
20. $T : \mathbb{R}^3 \to W$ defined by $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a + b + c & b - 2c \\ b - 2c & a - c \end{bmatrix}$, where $W$ is the vector space of all symmetric $2 \times 2$ matrices

In Exercises 21–26, determine whether $V$ and $W$ are isomorphic. If they are, give an explicit isomorphism $T : V \to W$.

21. $V = D_3$ (diagonal $3 \times 3$ matrices), $W = \mathbb{R}^3$
22. $V = S_3$ (symmetric $3 \times 3$ matrices), $W = U_3$ (upper triangular $3 \times 3$ matrices)
23. $V = S_3$ (symmetric $3 \times 3$ matrices), $W = S_3'$ (skew-symmetric $3 \times 3$ matrices)
24. $V = \mathscr{P}_2$, $W = \{p(x) \text{ in } \mathscr{P}_3 : p(0) = 0\}$
25. $V = \mathbb{C}$, $W = \mathbb{R}^2$
26. $V = \{A \text{ in } M_{22} : \operatorname{tr}(A) = 0\}$, $W = \mathbb{R}^2$

27. Show that $T : \mathscr{P}_n \to \mathscr{P}_n$ defined by $T(p(x)) = p(x) + p'(x)$ is an isomorphism.
28. Show that $T : \mathscr{P}_n \to \mathscr{P}_n$ defined by $T(p(x)) = p(x - 2)$ is an isomorphism.
29. Show that $T : \mathscr{P}_n \to \mathscr{P}_n$ defined by $T(p(x)) = x^n p\left(\frac{1}{x}\right)$ is an isomorphism.
30. (a) Show that $\mathscr{C}[0, 1] \cong \mathscr{C}[2, 3]$. [Hint: Define $T : \mathscr{C}[0, 1] \to \mathscr{C}[2, 3]$ by letting $T(f)$ be the function whose value at $x$ is $(T(f))(x) = f(x - 2)$ for $x$ in $[2, 3]$.]
(b) Show that $\mathscr{C}[0, 1] \cong \mathscr{C}[a, a + 1]$ for all $a$.
31. Show that $\mathscr{C}[0, 1] \cong \mathscr{C}[0, 2]$.
32. Show that $\mathscr{C}[a, b] \cong \mathscr{C}[c, d]$ for all $a < b$ and $c < d$.
33. Let $S : V \to W$ and $T : U \to V$ be linear transformations.
(a) Prove that if $S$ and $T$ are both one-to-one, so is $S \circ T$.
(b) Prove that if $S$ and $T$ are both onto, so is $S \circ T$.
34. Let $S : V \to W$ and $T : U \to V$ be linear transformations.
(a) Prove that if $S \circ T$ is one-to-one, so is $T$.
(b) Prove that if $S \circ T$ is onto, so is $S$.
35. Let $T : V \to W$ be a linear transformation between two finite-dimensional vector spaces.
(a) Prove that if $\dim V < \dim W$, then $T$ cannot be onto.
(b) Prove that if $\dim V > \dim W$, then $T$ cannot be one-to-one.
36. Let $a_0, a_1, \ldots, a_n$ be $n + 1$ distinct real numbers. Define $T : \mathscr{P}_n \to \mathbb{R}^{n+1}$ by $T(p(x)) = \ldots$. Prove that $T$ is an isomorphism.
37. If $V$ is a finite-dimensional vector space and $T : V \to V$ is a linear transformation such that $\operatorname{rank}(T) = \operatorname{rank}(T^2)$, prove that $\operatorname{range}(T) \cap \ker(T) = \{\mathbf{0}\}$. [Hint: $T^2$ denotes $T \circ T$. Use the Rank Theorem to help show that the kernels of $T$ and $T^2$ are the same.]
38. Let $U$ and $W$ be subspaces of a finite-dimensional vector space $V$. Define $T : U \times W \to V$ by $T(\mathbf{u}, \mathbf{w}) = \mathbf{u} - \mathbf{w}$.
(a) Prove that $T$ is a linear transformation.
(b) Show that $\operatorname{range}(T) = U + W$.
(c) Show that $\ker(T) \cong U \cap W$. [Hint: See Exercise 50 in Section 6.1.]
(d) Prove Grassmann's Identity:
$$\dim(U + W) = \dim U + \dim W - \dim(U \cap W)$$
[Hint: Apply the Rank Theorem, using results (a) and (b) and Exercise 43(b) in Section 6.2.]
Section 6.6 The Matrix of a Linear Transformation

Theorem 6.15 showed that a linear transformation $T : V \to W$ is completely determined by its effect on a spanning set for $V$. In particular, if we know how $T$ acts on a basis for $V$, then we can compute $T(\mathbf{v})$ for any vector $\mathbf{v}$ in $V$. Example 6.55 illustrated the process. We implicitly used this important property of linear transformations in Theorem 3.31 to help us compute the standard matrix of a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$. In this section, we will show that every linear transformation between finite-dimensional vector spaces can be represented as a matrix transformation.

Suppose that $V$ is an $n$-dimensional vector space, $W$ is an $m$-dimensional vector space, and $T : V \to W$ is a linear transformation. Let $\mathcal{B}$ and $\mathcal{C}$ be bases for $V$ and $W$, respectively. Then the coordinate vector mapping $R(\mathbf{v}) = [\mathbf{v}]_{\mathcal{B}}$ defines an isomorphism $R : V \to \mathbb{R}^n$. At the same time, we have an isomorphism $S : W \to \mathbb{R}^m$ given by $S(\mathbf{w}) = [\mathbf{w}]_{\mathcal{C}}$, which allows us to associate the image $T(\mathbf{v})$ with the vector $[T(\mathbf{v})]_{\mathcal{C}}$ in $\mathbb{R}^m$. Figure 6.11 illustrates the relationships.

[Figure 6.11: $T$ maps $\mathbf{v}$ in $V$ to $T(\mathbf{v})$ in $W$; $R$ maps $\mathbf{v}$ to $[\mathbf{v}]_{\mathcal{B}}$ in $\mathbb{R}^n$; $S$ maps $T(\mathbf{v})$ to $[T(\mathbf{v})]_{\mathcal{C}}$ in $\mathbb{R}^m$; the composite $S \circ T \circ R^{-1}$ maps $\mathbb{R}^n$ to $\mathbb{R}^m$.]

Since $R$ is an isomorphism, it is invertible, so we may form the composite mapping

$$S \circ T \circ R^{-1} : \mathbb{R}^n \to \mathbb{R}^m$$

which maps $[\mathbf{v}]_{\mathcal{B}}$ to $[T(\mathbf{v})]_{\mathcal{C}}$. Since this mapping is a composite of linear transformations and goes from $\mathbb{R}^n$ to $\mathbb{R}^m$, we know from Chapter 3 that it is a matrix transformation. What, then, is the standard matrix of $S \circ T \circ R^{-1}$? We would like to find the $m \times n$ matrix $A$ such that $A[\mathbf{v}]_{\mathcal{B}} = (S \circ T \circ R^{-1})([\mathbf{v}]_{\mathcal{B}})$. Or, since $(S \circ T \circ R^{-1})([\mathbf{v}]_{\mathcal{B}}) = [T(\mathbf{v})]_{\mathcal{C}}$, we require

$$A[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}}$$

It turns out to be surprisingly easy to find. The basic idea is that of Theorem 3.31.
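The composite $S \circ T \circ R^{-1}$ can be made concrete in a few lines of Python. This is a minimal sketch under assumptions of our own: polynomials are stored as coefficient lists, $T$ is differentiation $\mathscr{P}_2 \to \mathscr{P}_1$, and the bases $\mathcal{B} = \{1 + x, 1 - x, x^2\}$ and $\mathcal{C} = \{1, x\}$ are chosen only for illustration. The function names `R_inv`, `T`, `S`, `composite`, and `matvec` are hypothetical helpers, not notation from the text. Feeding $\mathbf{e}_1, \ldots, \mathbf{e}_n$ through the composite and recording the outputs as columns is the Theorem 3.31 idea mentioned above.

```python
# Sketch: build the standard matrix of S o T o R^{-1} column by column.
# Assumptions (illustrative only): a0 + a1*x + a2*x^2 is stored as [a0, a1, a2];
# T = differentiation from P2 to P1; B = {1 + x, 1 - x, x^2}, C = {1, x}.

B = [[1, 1, 0], [1, -1, 0], [0, 0, 1]]  # basis B of P2 as coefficient lists
M = 2                                   # dim P1 = number of vectors in C

def R_inv(coords):
    """R^{-1}: B-coordinate vector -> polynomial c1*b1 + c2*b2 + c3*b3."""
    return [sum(c * b[k] for c, b in zip(coords, B)) for k in range(3)]

def T(p):
    """The linear map T: differentiate a coefficient list."""
    return [k * p[k] for k in range(1, len(p))]

def S(w):
    """S: polynomial in P1 -> C-coordinates (C is the monomial basis here)."""
    return (list(w) + [0] * M)[:M]

def composite(coords):
    """S o T o R^{-1}: maps [v]_B to [T(v)]_C."""
    return S(T(R_inv(coords)))

def matvec(A, x):
    """Matrix (list of rows) times vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

n = len(B)
standard_basis = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
columns = [composite(e) for e in standard_basis]   # images of e_1, ..., e_n
A = [[columns[j][i] for j in range(n)] for i in range(M)]
print(A)  # [[1, -1, 0], [0, 0, 2]]

# v = 3 + x + x^2 = 2(1 + x) + 1(1 - x) + 1(x^2), so [v]_B = [2, 1, 1]
v_B = [2, 1, 1]
print(matvec(A, v_B), composite(v_B))  # [1, 2] [1, 2]
```

Both routes give $[1, 2]$, the $\mathcal{C}$-coordinates of $T(\mathbf{v}) = 1 + 2x$. Replacing the list `B` with another basis changes only `R_inv`; the column-by-column recipe stays the same.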
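The columns of $A$ are the images of the standard basis vectors for $\mathbb{R}^n$ under $S \circ T \circ R^{-1}$. But, if $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is a basis for $V$, then

$$R(\mathbf{v}_i) = [\mathbf{v}_i]_{\mathcal{B}} = \mathbf{e}_i \quad (1 \text{ in the } i\text{th entry, } 0 \text{ elsewhere})$$

so $R^{-1}(\mathbf{e}_i) = \mathbf{v}_i$. Therefore, the $i$th column of the matrix $A$ we seek is given by

$$(S \circ T \circ R^{-1})(\mathbf{e}_i) = S(T(R^{-1}(\mathbf{e}_i))) = S(T(\mathbf{v}_i)) = [T(\mathbf{v}_i)]_{\mathcal{C}}$$

which is the coordinate vector of $T(\mathbf{v}_i)$ with respect to the basis $\mathcal{C}$ of $W$. We summarize this discussion as a theorem.

Theorem 6.26 Let $V$ and $W$ be two finite-dimensional vector spaces with bases $\mathcal{B}$ and $\mathcal{C}$, respectively, where $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$. If $T : V \to W$ is a linear transformation, then the $m \times n$ matrix $A$ defined by

$$A = \big[\, [T(\mathbf{v}_1)]_{\mathcal{C}} \mid [T(\mathbf{v}_2)]_{\mathcal{C}} \mid \cdots \mid [T(\mathbf{v}_n)]_{\mathcal{C}} \,\big]$$

satisfies

$$A[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}}$$

for every vector $\mathbf{v}$ in $V$.

The matrix $A$ in Theorem 6.26 is called the matrix of $T$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$. The relationship is illustrated below. (Recall that $T_A$ denotes multiplication by $A$.)

$$\begin{array}{ccc} \mathbf{v} & \xrightarrow{\;T\;} & T(\mathbf{v}) \\ \downarrow & & \downarrow \\ [\mathbf{v}]_{\mathcal{B}} & \xrightarrow{\;T_A\;} & A[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}} \end{array}$$

Remarks
• The matrix of a linear transformation $T$ with respect to bases $\mathcal{B}$ and $\mathcal{C}$ is sometimes denoted by $[T]_{\mathcal{C}\leftarrow\mathcal{B}}$. Note the direction of the arrow: right-to-left (not left-to-right, as for $T : V \to W$). With this notation, the final equation in Theorem 6.26 becomes

$$[T]_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}}$$

Observe that the $\mathcal{B}$s in the subscripts appear side by side and appear to "cancel" each other. In words, this equation says, "The matrix for $T$ times the coordinate vector for $\mathbf{v}$ gives the coordinate vector for $T(\mathbf{v})$." In the special case where $V = W$ and $\mathcal{B} = \mathcal{C}$, we write $[T]_{\mathcal{B}}$ (instead of $[T]_{\mathcal{B}\leftarrow\mathcal{B}}$). Theorem 6.26 then states that

$$[T]_{\mathcal{B}}[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{B}}$$

• The matrix of a linear transformation with respect to given bases is unique. That is, there is only one matrix $A$ with the property specified by Theorem 6.26, namely, that

$$A[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}} \quad \text{for every vector } \mathbf{v} \text{ in } V$$

(You are asked to prove this in Exercise 39.)

• The diagram that follows Theorem 6.26 is sometimes called a commutative diagram because we can start in the upper left-hand corner with the vector $\mathbf{v}$ and get to $[T(\mathbf{v})]_{\mathcal{C}}$ in the lower right-hand corner in two different, but equivalent, ways. If, as before, we denote the coordinate mappings that map $\mathbf{v}$ to $[\mathbf{v}]_{\mathcal{B}}$ and $\mathbf{w}$ to $[\mathbf{w}]_{\mathcal{C}}$ by $R$ and $S$, respectively, then we can summarize this "commutativity" by

$$S \circ T = T_A \circ R$$

The reason for the term commutative becomes clearer when $V = W$ and $\mathcal{B} = \mathcal{C}$, for then $R = S$ too, and we have

$$R \circ T = T_A \circ R$$

suggesting that the coordinate mapping $R$ commutes with the linear transformation $T$ (provided we use the matrix version of $T$, namely, $T_A = T_{[T]_{\mathcal{B}}}$, where it is required).

• The matrix $[T]_{\mathcal{C}\leftarrow\mathcal{B}}$ depends on the order of the vectors in the bases $\mathcal{B}$ and $\mathcal{C}$. Rearranging the vectors within either basis will affect the matrix $[T]_{\mathcal{C}\leftarrow\mathcal{B}}$. [See Example 6.77(b).]

Example 6.76 Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \ldots$ and let $\mathcal{B} = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ and $\mathcal{C} = \{\mathbf{e}_2, \mathbf{e}_1\}$ be bases for $\mathbb{R}^3$ and $\mathbb{R}^2$, respectively. Find the matrix of $T$ with respect to $\mathcal{B}$ and $\mathcal{C}$ and verify Theorem 6.26 for $\mathbf{v} = \ldots$

Solution First, we compute $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_3)$. Next, we need their coordinate vectors with respect to $\mathcal{C}$. Since $\begin{bmatrix} a \\ b \end{bmatrix} = b\mathbf{e}_2 + a\mathbf{e}_1$, we have $\left[\begin{bmatrix} a \\ b \end{bmatrix}\right]_{\mathcal{C}} = \begin{bmatrix} b \\ a \end{bmatrix}$. Therefore, the matrix of $T$ with respect to $\mathcal{B}$ and $\mathcal{C}$ is

$$A = [T]_{\mathcal{C}\leftarrow\mathcal{B}} = \big[\, [T(\mathbf{e}_1)]_{\mathcal{C}} \mid [T(\mathbf{e}_2)]_{\mathcal{C}} \mid [T(\mathbf{e}_3)]_{\mathcal{C}} \,\big] = \ldots$$

To verify Theorem 6.26 for $\mathbf{v}$, we first compute $T(\mathbf{v}) = \ldots$. Then $[\mathbf{v}]_{\mathcal{B}} = \mathbf{v}$ (since $\mathcal{B}$ is the standard basis in its usual order) and $[T(\mathbf{v})]_{\mathcal{C}}$ is $T(\mathbf{v})$ with its two entries interchanged. (Check these.) Using all of these facts, we confirm that $A[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}}$.

Example 6.77 Let $D : \mathscr{P}_3 \to \mathscr{P}_2$ be the differential operator $D(p(x)) = p'(x)$. Let $\mathcal{B} = \{1, x, x^2, x^3\}$ and $\mathcal{C} = \{1, x, x^2\}$ be bases for $\mathscr{P}_3$ and $\mathscr{P}_2$, respectively.
(a) Find the matrix $A$ of $D$ with respect to $\mathcal{B}$ and $\mathcal{C}$.
(b) Find the matrix $A'$ of $D$ with respect to $\mathcal{B}'$ and $\mathcal{C}$, where $\mathcal{B}' = \{x^3, x^2, x, 1\}$.
(c) Using part (a), compute $D(5 - x + 2x^3)$ and $D(a + bx + cx^2 + dx^3)$ to verify Theorem 6.26.

Solution First note that $D(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2$. (See Example 6.60.)

(a) Since the images of the basis $\mathcal{B}$ under $D$ are $D(1) = 0$, $D(x) = 1$, $D(x^2) = 2x$, and $D(x^3) = 3x^2$, their coordinate vectors with respect to $\mathcal{C}$ are

$$[D(1)]_{\mathcal{C}} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad [D(x)]_{\mathcal{C}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [D(x^2)]_{\mathcal{C}} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}, \quad [D(x^3)]_{\mathcal{C}} = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix}$$

Consequently,

$$A = [D]_{\mathcal{C}\leftarrow\mathcal{B}} = \big[\, [D(1)]_{\mathcal{C}} \mid [D(x)]_{\mathcal{C}} \mid [D(x^2)]_{\mathcal{C}} \mid [D(x^3)]_{\mathcal{C}} \,\big] = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$$

(b) Since the basis $\mathcal{B}'$ is just $\mathcal{B}$ in the reverse order, we see that

$$A' = [D]_{\mathcal{C}\leftarrow\mathcal{B}'} = \big[\, [D(x^3)]_{\mathcal{C}} \mid [D(x^2)]_{\mathcal{C}} \mid [D(x)]_{\mathcal{C}} \mid [D(1)]_{\mathcal{C}} \,\big] = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 \\ 3 & 0 & 0 & 0 \end{bmatrix}$$

(This shows that the order of the vectors in the bases $\mathcal{B}$ and $\mathcal{C}$ affects the matrix of a transformation with respect to these bases.)

(c) First we compute $D(5 - x + 2x^3) = -1 + 6x^2$ directly, getting the coordinate vector

$$[D(5 - x + 2x^3)]_{\mathcal{C}} = [-1 + 6x^2]_{\mathcal{C}} = \begin{bmatrix} -1 \\ 0 \\ 6 \end{bmatrix}$$

On the other hand,

$$[5 - x + 2x^3]_{\mathcal{B}} = \begin{bmatrix} 5 \\ -1 \\ 0 \\ 2 \end{bmatrix}$$

so

$$A[5 - x + 2x^3]_{\mathcal{B}} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 5 \\ -1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 6 \end{bmatrix} = [D(5 - x + 2x^3)]_{\mathcal{C}}$$

which agrees with Theorem 6.26. We leave proof of the general case as an exercise.

Since the linear transformation in Example 6.77 is easy to use directly, there is really no advantage to using the matrix of this transformation to do calculations. However, in other examples, especially large ones, the matrix approach may be simpler, as it is very well suited to computer implementation. Example 6.78 illustrates the basic idea behind this indirect approach.

Example 6.78 Let $T : \mathscr{P}_2 \to \mathscr{P}_2$ be the linear transformation defined by

$$T(p(x)) = p(2x - 1)$$

(a) Find the matrix of $T$ with respect to $\mathcal{E} = \{1, x, x^2\}$.
(b) Compute $T(3 + 2x - x^2)$ indirectly, using part (a).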
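Solution (a) We see that

$$T(1) = 1, \quad T(x) = 2x - 1, \quad T(x^2) = (2x - 1)^2 = 1 - 4x + 4x^2$$

so the coordinate vectors are

$$[T(1)]_{\mathcal{E}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [T(x)]_{\mathcal{E}} = \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix}, \quad [T(x^2)]_{\mathcal{E}} = \begin{bmatrix} 1 \\ -4 \\ 4 \end{bmatrix}$$

Therefore,

$$[T]_{\mathcal{E}} = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 2 & -4 \\ 0 & 0 & 4 \end{bmatrix}$$

(b) We apply Theorem 6.26 as follows: The coordinate vector of $p(x) = 3 + 2x - x^2$ with respect to $\mathcal{E}$ is

$$[p(x)]_{\mathcal{E}} = \begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix}$$

Therefore, by Theorem 6.26,

$$[T(3 + 2x - x^2)]_{\mathcal{E}} = [T(p(x))]_{\mathcal{E}} = [T]_{\mathcal{E}}[p(x)]_{\mathcal{E}} = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 2 & -4 \\ 0 & 0 & 4 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 8 \\ -4 \end{bmatrix}$$

It follows that $T(3 + 2x - x^2) = 0 \cdot 1 + 8 \cdot x - 4 \cdot x^2 = 8x - 4x^2$. [Verify this by computing $T(3 + 2x - x^2) = 3 + 2(2x - 1) - (2x - 1)^2$ directly.]

The matrix of a linear transformation can sometimes be used in surprising ways. Example 6.79 shows its application to a traditional calculus problem.

Example 6.79 Let $\mathscr{D}$ be the vector space of all differentiable functions. Consider the subspace $W$ of $\mathscr{D}$ given by $W = \operatorname{span}(e^{3x}, xe^{3x}, x^2e^{3x})$. Since the set $\mathcal{B} = \{e^{3x}, xe^{3x}, x^2e^{3x}\}$ is linearly independent (why?), it is a basis for $W$.
(a) Show that the differential operator $D$ maps $W$ into itself.
(b) Find the matrix of $D$ with respect to $\mathcal{B}$.
(c) Compute the derivative of $5e^{3x} + 2xe^{3x} - x^2e^{3x}$ indirectly, using Theorem 6.26, and verify it using part (a).

Solution (a) Applying $D$ to a general element of $W$, we see that

$$D(ae^{3x} + bxe^{3x} + cx^2e^{3x}) = (3a + b)e^{3x} + (3b + 2c)xe^{3x} + 3cx^2e^{3x}$$

(check this), which is again in $W$.

(b) Using the formula in part (a), we see that

$$D(e^{3x}) = 3e^{3x}, \quad D(xe^{3x}) = e^{3x} + 3xe^{3x}, \quad D(x^2e^{3x}) = 2xe^{3x} + 3x^2e^{3x}$$

so

$$[D(e^{3x})]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}, \quad [D(xe^{3x})]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}, \quad [D(x^2e^{3x})]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix}$$

It follows that

$$[D]_{\mathcal{B}} = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 2 \\ 0 & 0 & 3 \end{bmatrix}$$

(c) For $f(x) = 5e^{3x} + 2xe^{3x} - x^2e^{3x}$, we see by inspection that

$$[f(x)]_{\mathcal{B}} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}$$

Hence, by Theorem 6.26, we have

$$[D(f(x))]_{\mathcal{B}} = [D]_{\mathcal{B}}[f(x)]_{\mathcal{B}} = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 2 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 17 \\ 4 \\ -3 \end{bmatrix}$$

which, in turn, implies that $f'(x) = D(f(x)) = 17e^{3x} + 4xe^{3x} - 3x^2e^{3x}$, in agreement with the formula in part (a).

Remark The point of Example 6.79 is not that this method is easier than direct differentiation.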
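Indeed, once the formula in part (a) has been established, there is little to do. What is significant is that matrix methods can be used at all in what appears, on the surface, to be a calculus problem. We will explore this idea further in Example 6.83.

Example 6.80 Let $V$ be an $n$-dimensional vector space and let $I$ be the identity transformation on $V$. What is the matrix of $I$ with respect to bases $\mathcal{B}$ and $\mathcal{C}$ of $V$ if $\mathcal{B} = \mathcal{C}$ (including the order of the basis vectors)? What if $\mathcal{B} \neq \mathcal{C}$?

Solution Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$. Then $I(\mathbf{v}_1) = \mathbf{v}_1, \ldots, I(\mathbf{v}_n) = \mathbf{v}_n$, so

$$[I(\mathbf{v}_1)]_{\mathcal{B}} = [\mathbf{v}_1]_{\mathcal{B}} = \mathbf{e}_1, \; \ldots, \; [I(\mathbf{v}_n)]_{\mathcal{B}} = [\mathbf{v}_n]_{\mathcal{B}} = \mathbf{e}_n$$

and, if $\mathcal{B} = \mathcal{C}$,

$$[I]_{\mathcal{B}} = \big[\, [I(\mathbf{v}_1)]_{\mathcal{B}} \mid [I(\mathbf{v}_2)]_{\mathcal{B}} \mid \cdots \mid [I(\mathbf{v}_n)]_{\mathcal{B}} \,\big] = [\, \mathbf{e}_1 \mid \mathbf{e}_2 \mid \cdots \mid \mathbf{e}_n \,] = I_n$$

the $n \times n$ identity matrix. (This is what you expected, isn't it?) In the case $\mathcal{B} \neq \mathcal{C}$, we have

$$[I(\mathbf{v}_1)]_{\mathcal{C}} = [\mathbf{v}_1]_{\mathcal{C}}, \; \ldots, \; [I(\mathbf{v}_n)]_{\mathcal{C}} = [\mathbf{v}_n]_{\mathcal{C}}$$

so

$$[I]_{\mathcal{C}\leftarrow\mathcal{B}} = \big[\, [\mathbf{v}_1]_{\mathcal{C}} \mid \cdots \mid [\mathbf{v}_n]_{\mathcal{C}} \,\big] = P_{\mathcal{C}\leftarrow\mathcal{B}}$$

the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{C}$.

Matrices of Composite and Inverse Linear Transformations

We now generalize Theorems 3.32 and 3.33 to get a theorem that will allow us to easily find the inverse of a linear transformation between finite-dimensional vector spaces (if it exists).

Theorem 6.27 Let $U$, $V$, and $W$ be finite-dimensional vector spaces with bases $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$, respectively. Let $T : U \to V$ and $S : V \to W$ be linear transformations. Then

$$[S \circ T]_{\mathcal{D}\leftarrow\mathcal{B}} = [S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}$$

Remarks
• In words, this theorem says, "The matrix of the composite is the product of the matrices."
• Notice how the "inner subscripts" $\mathcal{C}$ must match and appear to cancel each other out, leaving the "outer subscripts" in the form $\mathcal{D}\leftarrow\mathcal{B}$.

Proof We will show that corresponding columns of the matrices $[S \circ T]_{\mathcal{D}\leftarrow\mathcal{B}}$ and $[S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}$ are the same. Let $\mathbf{v}_i$ be the $i$th basis vector in $\mathcal{B}$. Then the $i$th column of $[S \circ T]_{\mathcal{D}\leftarrow\mathcal{B}}$ is

$$[(S \circ T)(\mathbf{v}_i)]_{\mathcal{D}} = [S(T(\mathbf{v}_i))]_{\mathcal{D}} = [S]_{\mathcal{D}\leftarrow\mathcal{C}}[T(\mathbf{v}_i)]_{\mathcal{C}} = [S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{v}_i]_{\mathcal{B}}$$

by two applications of Theorem 6.26.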
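But $[\mathbf{v}_i]_{\mathcal{B}} = \mathbf{e}_i$ (why?), so

$$[S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{v}_i]_{\mathcal{B}} = [S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}\mathbf{e}_i$$

is the $i$th column of the matrix $[S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}$. Therefore, the $i$th columns of $[S \circ T]_{\mathcal{D}\leftarrow\mathcal{B}}$ and $[S]_{\mathcal{D}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}$ are the same, as we wished to prove.

Example 6.81 Use matrix methods to compute $(S \circ T)\begin{bmatrix} a \\ b \end{bmatrix}$ for the linear transformations $S$ and $T$ of Example 6.56.

Solution Recall that $T : \mathbb{R}^2 \to \mathscr{P}_1$ and $S : \mathscr{P}_1 \to \mathscr{P}_2$ are defined by

$$T\begin{bmatrix} a \\ b \end{bmatrix} = a + (a + b)x \quad\text{and}\quad S(a + bx) = ax + bx^2$$

Choosing the standard bases $\mathcal{E}$, $\mathcal{E}'$, and $\mathcal{E}''$ for $\mathbb{R}^2$, $\mathscr{P}_1$, and $\mathscr{P}_2$, respectively, we see that

$$[T]_{\mathcal{E}'\leftarrow\mathcal{E}} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad [S]_{\mathcal{E}''\leftarrow\mathcal{E}'} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$$

(Verify these.) By Theorem 6.27, the matrix of $S \circ T$ with respect to $\mathcal{E}$ and $\mathcal{E}''$ is

$$[S \circ T]_{\mathcal{E}''\leftarrow\mathcal{E}} = [S]_{\mathcal{E}''\leftarrow\mathcal{E}'}[T]_{\mathcal{E}'\leftarrow\mathcal{E}} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}$$

Thus, by Theorem 6.26,

$$\left[(S \circ T)\begin{bmatrix} a \\ b \end{bmatrix}\right]_{\mathcal{E}''} = [S \circ T]_{\mathcal{E}''\leftarrow\mathcal{E}}\left[\begin{bmatrix} a \\ b \end{bmatrix}\right]_{\mathcal{E}} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ a \\ a + b \end{bmatrix}$$

Consequently, $(S \circ T)\begin{bmatrix} a \\ b \end{bmatrix} = ax + (a + b)x^2$, which agrees with the solution to Example 6.56.

In Theorem 6.24, we proved that a linear transformation is invertible if and only if it is one-to-one and onto (i.e., if it is an isomorphism). When the vector spaces involved are finite-dimensional, we can use the matrix methods we have developed to find the inverse of such a linear transformation.

Theorem 6.28 Let $T : V \to W$ be a linear transformation between $n$-dimensional vector spaces $V$ and $W$ and let $\mathcal{B}$ and $\mathcal{C}$ be bases for $V$ and $W$, respectively. Then $T$ is invertible if and only if the matrix $[T]_{\mathcal{C}\leftarrow\mathcal{B}}$ is invertible. In this case,

$$([T]_{\mathcal{C}\leftarrow\mathcal{B}})^{-1} = [T^{-1}]_{\mathcal{B}\leftarrow\mathcal{C}}$$

Proof Observe that the matrices of $T$ and $T^{-1}$ (if $T$ is invertible) are $n \times n$. If $T$ is invertible, then $T^{-1} \circ T = I_V$. Applying Theorem 6.27, we have

$$I_n = [I_V]_{\mathcal{B}} = [T^{-1} \circ T]_{\mathcal{B}} = [T^{-1}]_{\mathcal{B}\leftarrow\mathcal{C}}[T]_{\mathcal{C}\leftarrow\mathcal{B}}$$

This shows that $[T]_{\mathcal{C}\leftarrow\mathcal{B}}$ is invertible and that $([T]_{\mathcal{C}\leftarrow\mathcal{B}})^{-1} = [T^{-1}]_{\mathcal{B}\leftarrow\mathcal{C}}$.

Conversely, assume that $A = [T]_{\mathcal{C}\leftarrow\mathcal{B}}$ is invertible. To show that $T$ is invertible, it is enough to show that $\ker(T) = \{\mathbf{0}\}$. (Why?) To this end, let $\mathbf{v}$ be in the kernel of $T$. Then $T(\mathbf{v}) = \mathbf{0}$, so

$$A[\mathbf{v}]_{\mathcal{B}} = [T]_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}} = [\mathbf{0}]_{\mathcal{C}} = \mathbf{0}$$

which means that $[\mathbf{v}]_{\mathcal{B}}$ is in the null space of the invertible matrix $A$.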
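By the Fundamental Theorem, this implies that $[\mathbf{v}]_{\mathcal{B}} = \mathbf{0}$, which, in turn, implies that $\mathbf{v} = \mathbf{0}$, as required.

Example 6.82 In Example 6.70, the linear transformation $T : \mathbb{R}^2 \to \mathscr{P}_1$ defined by

$$T\begin{bmatrix} a \\ b \end{bmatrix} = a + (a + b)x$$

was shown to be one-to-one and onto and hence invertible. Find $T^{-1}$.

Solution In Example 6.81, we found the matrix of $T$ with respect to the standard bases $\mathcal{E}$ and $\mathcal{E}'$ for $\mathbb{R}^2$ and $\mathscr{P}_1$, respectively, to be

$$[T]_{\mathcal{E}'\leftarrow\mathcal{E}} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$$

By Theorem 6.28, it follows that the matrix of $T^{-1}$ with respect to $\mathcal{E}'$ and $\mathcal{E}$ is

$$[T^{-1}]_{\mathcal{E}\leftarrow\mathcal{E}'} = ([T]_{\mathcal{E}'\leftarrow\mathcal{E}})^{-1} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}$$

By Theorem 6.26,

$$[T^{-1}(a + bx)]_{\mathcal{E}} = [T^{-1}]_{\mathcal{E}\leftarrow\mathcal{E}'}[a + bx]_{\mathcal{E}'} = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ b - a \end{bmatrix}$$

This means that

$$T^{-1}(a + bx) = a\mathbf{e}_1 + (b - a)\mathbf{e}_2 = \begin{bmatrix} a \\ b - a \end{bmatrix}$$

(Note that the choice of the standard basis makes this last calculation virtually irrelevant.)

The next example, a continuation of Example 6.79, shows that matrices can be used in certain integration problems in calculus. The specific integral we consider is usually evaluated in a calculus course by means of two applications of integration by parts. Contrast this approach with our method.

Example 6.83 Show that the differential operator, restricted to the subspace $W = \operatorname{span}(e^{3x}, xe^{3x}, x^2e^{3x})$ of $\mathscr{D}$, is invertible, and use this fact to find the integral

$$\int x^2e^{3x}\,dx$$

Solution In Example 6.79, we found the matrix of $D$ with respect to the basis $\mathcal{B} = \{e^{3x}, xe^{3x}, x^2e^{3x}\}$ of $W$ to be

$$[D]_{\mathcal{B}} = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 2 \\ 0 & 0 & 3 \end{bmatrix}$$

By Theorem 6.28, therefore, $D$ is invertible on $W$, and the matrix of $D^{-1}$ is

$$[D^{-1}]_{\mathcal{B}} = ([D]_{\mathcal{B}})^{-1} = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 2 \\ 0 & 0 & 3 \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{3} & -\frac{1}{9} & \frac{2}{27} \\ 0 & \frac{1}{3} & -\frac{2}{9} \\ 0 & 0 & \frac{1}{3} \end{bmatrix}$$

Since integration is antidifferentiation, this is the matrix corresponding to integration on $W$. We want to integrate the function $x^2e^{3x}$ whose coordinate vector is

$$[x^2e^{3x}]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Consequently, by Theorem 6.26,

$$\left[\int x^2e^{3x}\,dx\right]_{\mathcal{B}} = [D^{-1}(x^2e^{3x})]_{\mathcal{B}} = [D^{-1}]_{\mathcal{B}}[x^2e^{3x}]_{\mathcal{B}} = \begin{bmatrix} \frac{1}{3} & -\frac{1}{9} & \frac{2}{27} \\ 0 & \frac{1}{3} & -\frac{2}{9} \\ 0 & 0 & \frac{1}{3} \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{27} \\ -\frac{2}{9} \\ \frac{1}{3} \end{bmatrix}$$

It follows that

$$\int x^2e^{3x}\,dx = \frac{2}{27}e^{3x} - \frac{2}{9}xe^{3x} + \frac{1}{3}x^2e^{3x}$$

(To be fully correct, we need to add a constant of integration.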
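It does not show up here because we are working with linear transformations, which must send zero vectors to zero vectors, forcing the constant of integration to be zero as well.)

Warning In general, differentiation is not an invertible transformation. (See Exercise 22.) What the preceding example shows is that, suitably restricted, it sometimes is. Exercises 27–30 explore this idea further.

Change of Basis and Similarity

Suppose $T : V \to V$ is a linear transformation and $\mathcal{B}$ and $\mathcal{C}$ are two different bases for $V$. It is natural to wonder how, if at all, the matrices $[T]_{\mathcal{B}}$ and $[T]_{\mathcal{C}}$ are related. It turns out that the answer to this question is quite satisfying and relates to some questions we first considered in Chapter 4.

Figure 6.12 suggests one way to address this problem. Chasing the arrows around the diagram from the upper left-hand corner to the lower right-hand corner in two different, but equivalent, ways shows that $I \circ T = T \circ I$, something we already knew, since both are equal to $T$. However, if the "upper" version of $T$ is with respect to the basis $\mathcal{C}$ and the "lower" version is with respect to $\mathcal{B}$, then $T = I \circ T = T \circ I$ is with respect to $\mathcal{C}$ in its domain and with respect to $\mathcal{B}$ in its codomain. Thus, the matrix of $T$ in this case is $[T]_{\mathcal{B}\leftarrow\mathcal{C}}$.

[Figure 6.12: $I \circ T = T \circ I$. Top row: $T$ maps $\mathbf{v}$ to $T(\mathbf{v})$ (basis $\mathcal{C}$); bottom row: $T$ maps $\mathbf{v}$ to $T(\mathbf{v})$ (basis $\mathcal{B}$); vertical arrows: $I$.]

But

$$[T]_{\mathcal{B}\leftarrow\mathcal{C}} = [I \circ T]_{\mathcal{B}\leftarrow\mathcal{C}} = [I]_{\mathcal{B}\leftarrow\mathcal{C}}[T]_{\mathcal{C}}$$

and

$$[T]_{\mathcal{B}\leftarrow\mathcal{C}} = [T \circ I]_{\mathcal{B}\leftarrow\mathcal{C}} = [T]_{\mathcal{B}}[I]_{\mathcal{B}\leftarrow\mathcal{C}}$$

Therefore, $[I]_{\mathcal{B}\leftarrow\mathcal{C}}[T]_{\mathcal{C}} = [T]_{\mathcal{B}}[I]_{\mathcal{B}\leftarrow\mathcal{C}}$.

From Example 6.80, we know that $[I]_{\mathcal{B}\leftarrow\mathcal{C}} = P_{\mathcal{B}\leftarrow\mathcal{C}}$, the (invertible) change-of-basis matrix from $\mathcal{C}$ to $\mathcal{B}$. If we denote this matrix by $P$, then we also have

$$P^{-1} = (P_{\mathcal{B}\leftarrow\mathcal{C}})^{-1} = P_{\mathcal{C}\leftarrow\mathcal{B}}$$

With this notation,

$$P[T]_{\mathcal{C}} = [T]_{\mathcal{B}}P$$

so

$$[T]_{\mathcal{C}} = P^{-1}[T]_{\mathcal{B}}P$$

Thus, the matrices $[T]_{\mathcal{B}}$ and $[T]_{\mathcal{C}}$ are similar, in the terminology of Section 4.4. We summarize the foregoing discussion as a theorem.

Theorem 6.29 Let $V$ be a finite-dimensional vector space with bases $\mathcal{B}$ and $\mathcal{C}$ and let $T : V \to V$ be a linear transformation. Then

$$[T]_{\mathcal{C}} = P^{-1}[T]_{\mathcal{B}}P$$

where $P$ is the change-of-basis matrix from $\mathcal{C}$ to $\mathcal{B}$.

Remark As an aid in remembering that $P$ must be the change-of-basis matrix from $\mathcal{C}$ to $\mathcal{B}$, and not $\mathcal{B}$ to $\mathcal{C}$, it is instructive to look at what Theorem 6.29 says when written in full detail. As shown below, the "inner subscripts" must be the same (all $\mathcal{B}$s) and must appear to cancel, leaving the "outer subscripts," which are both $\mathcal{C}$s.

$$[T]_{\mathcal{C}\leftarrow\mathcal{C}} = P_{\mathcal{C}\leftarrow\mathcal{B}}\,[T]_{\mathcal{B}\leftarrow\mathcal{B}}\,P_{\mathcal{B}\leftarrow\mathcal{C}}$$

Theorem 6.29 is often used when we are trying to find a basis with respect to which the matrix of a linear transformation is particularly simple. For example, we can ask whether there is a basis $\mathcal{C}$ of $V$ such that the matrix $[T]_{\mathcal{C}}$ of $T : V \to V$ is a diagonal matrix. Example 6.84 illustrates this application.

Example 6.84 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + 3y \\ 2x + 2y \end{bmatrix}$$

If possible, find a basis $\mathcal{C}$ for $\mathbb{R}^2$ such that the matrix of $T$ with respect to $\mathcal{C}$ is diagonal.

Solution The matrix of $T$ with respect to the standard basis $\mathcal{E}$ is

$$[T]_{\mathcal{E}} = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$$

This matrix is diagonalizable, as we saw in Example 4.24. Indeed, if

$$P = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$$

then $P^{-1}[T]_{\mathcal{E}}P = D$. If we let $\mathcal{C}$ be the basis of $\mathbb{R}^2$ consisting of the columns of $P$, then $P$ is the change-of-basis matrix $P_{\mathcal{E}\leftarrow\mathcal{C}}$ from $\mathcal{C}$ to $\mathcal{E}$. By Theorem 6.29,

$$[T]_{\mathcal{C}} = P^{-1}[T]_{\mathcal{E}}P = D$$

so the matrix of $T$ with respect to the basis $\mathcal{C} = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -2 \end{bmatrix} \right\}$ is diagonal.

Remarks
• It is easy to check that the solution above is correct by computing $[T]_{\mathcal{C}}$ directly. We find that

$$T\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 3 \\ -2 \end{bmatrix} \quad\text{and}\quad T\begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (-1)\begin{bmatrix} 3 \\ -2 \end{bmatrix}$$

Thus, the coordinate vectors that form the columns of $[T]_{\mathcal{C}}$ are

$$\left[T\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right]_{\mathcal{C}} = \begin{bmatrix} 4 \\ 0 \end{bmatrix} \quad\text{and}\quad \left[T\begin{bmatrix} 3 \\ -2 \end{bmatrix}\right]_{\mathcal{C}} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}$$

in agreement with our solution above.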
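• The general procedure for a problem like Example 6.84 is to take the standard matrix $[T]_{\mathcal{E}}$ and determine whether it is diagonalizable by finding bases for its eigenspaces, as in Chapter 4. The solution then proceeds exactly as in the preceding example.

Example 6.84 motivates the following definition.

Definition Let $V$ be a finite-dimensional vector space and let $T : V \to V$ be a linear transformation. Then $T$ is called diagonalizable if there is a basis $\mathcal{C}$ for $V$ such that the matrix $[T]_{\mathcal{C}}$ is a diagonal matrix.

It is not hard to show that if $\mathcal{B}$ is any basis for $V$, then $T$ is diagonalizable if and only if the matrix $[T]_{\mathcal{B}}$ is diagonalizable. This is essentially what we did, for a special case, in the last example. You are asked to prove this result in general in Exercise 42.

Sometimes it is easiest to write down the matrix of a linear transformation with respect to a "nonstandard" basis. We can then reverse the process of Example 6.84 to find the standard matrix. We illustrate this idea by revisiting Example 3.59.

Example 6.85 Let $\ell$ be the line through the origin in $\mathbb{R}^2$ with direction vector $\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$. Find the standard matrix of the projection onto $\ell$.

Solution Let $T$ denote the projection. There is no harm in assuming that $\mathbf{d}$ is a unit vector (i.e., $d_1^2 + d_2^2 = 1$), since any nonzero multiple of $\mathbf{d}$ can serve as a direction vector for $\ell$. Let

$$\mathbf{d}' = \begin{bmatrix} -d_2 \\ d_1 \end{bmatrix}$$

so that $\mathbf{d}$ and $\mathbf{d}'$ are orthogonal. Since $\mathbf{d}'$ is also a unit vector, the set $\mathcal{D} = \{\mathbf{d}, \mathbf{d}'\}$ is an orthonormal basis for $\mathbb{R}^2$. As Figure 6.13 shows, $T(\mathbf{d}) = \mathbf{d}$ and $T(\mathbf{d}') = \mathbf{0}$. Therefore,

$$[T(\mathbf{d})]_{\mathcal{D}} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad [T(\mathbf{d}')]_{\mathcal{D}} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

[Figure 6.13: Projection onto $\ell$]

so

$$[T]_{\mathcal{D}} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$

The change-of-basis matrix from $\mathcal{D}$ to the standard basis $\mathcal{E}$ is

$$P_{\mathcal{E}\leftarrow\mathcal{D}} = \begin{bmatrix} d_1 & -d_2 \\ d_2 & d_1 \end{bmatrix}$$

so the change-of-basis matrix from $\mathcal{E}$ to $\mathcal{D}$ is

$$P_{\mathcal{D}\leftarrow\mathcal{E}} = (P_{\mathcal{E}\leftarrow\mathcal{D}})^{-1} = \begin{bmatrix} d_1 & d_2 \\ -d_2 & d_1 \end{bmatrix}$$

By Theorem 6.29, then, the standard matrix of $T$ is

$$[T]_{\mathcal{E}} = P_{\mathcal{E}\leftarrow\mathcal{D}}[T]_{\mathcal{D}}P_{\mathcal{D}\leftarrow\mathcal{E}} = \begin{bmatrix} d_1 & -d_2 \\ d_2 & d_1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} d_1 & d_2 \\ -d_2 & d_1 \end{bmatrix} = \begin{bmatrix} d_1^2 & d_1d_2 \\ d_1d_2 & d_2^2 \end{bmatrix}$$

which agrees with part (b) of Example 3.59.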
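Theorem 6.29 can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the helper names `matmul` and `inv2` are our own and not part of the text. It confirms $P^{-1}[T]_{\mathcal{E}}P = D$ for the data of Example 6.84 and, for one assumed unit direction $\mathbf{d} = (3/5, 4/5)$, rebuilds the projection matrix of Example 6.85 as $P_{\mathcal{E}\leftarrow\mathcal{D}}[T]_{\mathcal{D}}P_{\mathcal{D}\leftarrow\mathcal{E}}$.

```python
# Sketch: checking Theorem 6.29, [T]_C = P^{-1} [T]_B P, with exact arithmetic.
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Example 6.84: P = P_{E<-C}, whose columns are the basis C = {[1, 1], [3, -2]}.
T_E = [[1, 3], [2, 2]]
P = [[1, 3], [1, -2]]
T_C = matmul(matmul(inv2(P), T_E), P)
assert T_C == [[4, 0], [0, -1]]          # diagonal, as claimed

# Example 6.85 with an assumed unit direction d = (3/5, 4/5).
d1, d2 = Fraction(3, 5), Fraction(4, 5)
P_ED = [[d1, -d2], [d2, d1]]             # columns d and d'
T_D = [[1, 0], [0, 0]]                   # T(d) = d, T(d') = 0
T_std = matmul(matmul(P_ED, T_D), inv2(P_ED))
assert T_std == [[d1 * d1, d1 * d2], [d1 * d2, d2 * d2]]
assert matmul(T_std, T_std) == T_std     # a projection satisfies T^2 = T
print([[str(x) for x in row] for row in T_std])  # [['9/25', '12/25'], ['12/25', '16/25']]
```

Exact fractions keep the equality checks free of floating-point rounding; any other unit vector $\mathbf{d}$ can be substituted for the assumed one.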
8 6 - 1) Let T : <!P 2 ---+ <!P 2 be the linear transformation defined by T(p(x)) = p ( 2x 511 1 (a) Find the matrix of T with respect to the basis B = { l + x, - x, x 2 } of <!f 2 . (b) Show that T is diagonalizable and find a basis C for <!P 2 such that [ T ] c is a diago­ nal matrix. (a) In Example 6.78, we found that the matrix of T with respect to the standard basis £ = { l , x, x 2 } is Solution [ T] , [ � � -:l 11 -� o� [0 l � The change-of-basis matrix from B to £ is p = PE<-B = It follows that the matrix of T with respect to B is -1 - [ u -! � m 0 : J i i � ] [ - 0� �0 -:i 1, [ T] B = p- 1 [ T] EP 2 - 4 � � (Check this.) (b) The eigenvalues of [ T] E are 2, and 4 (why?), so we know from Theorem 4.25 that [ T] E is diagonalizable. Eigenvectors corresponding to these eigenvalues are [ � JTJ H l [� � :l [� � �l respectively. Therefore, setting P� - - and D � we have p- i [ T] Ep = D. Furthermore, P is the change-of-basis matrix from a basis C to £, and the columns of P are thus the coordinate vectors of C in terms of £. It follows that C = {l, + x, - 2x + x 2 } and [ T l c = D. -1 1 Chapter 6 Vector Spaces 512 The preceding ideas can be generalized to relate the matrices [ T ] C<-- B and [ T] c '<-- B ' of a linear transformation T : V ---+ W, where 13 and 13 ' are bases for V and C and C' are bases for W. (See Exercise 44.) We conclude this section by revisiting the Fundamental Theorem of Invertible Matrices and incorporating some results from this chapter. Theorem 6 . 3 0 The Fundamental Theorem of Invertible Matrices: Version 4 Let A be an n X n matrix and let T : V ---+ W be a linear transformation whose matrix [ T] C<-- B with respect to bases 13 and C of V and W, respectively, is A. The following statements are equivalent: a. A is invertible. b. Ax = b has a unique solution for every b in !R n . c. Ax = 0 has only the trivial solution. d. The reduced row echelon form of A is Iw e. A is a product of elementary matrices. f. rank(A) = n g. 
nullity(A) = 0 h. The column vectors of A are linearly independent. i. The column vectors of A span !R n . j. The column vectors of A form a basis for !R n . k. The row vectors of A are linearly independent. 1. The row vectors of A span !R n . m. The row vectors of A form a basis for !R n . n. det A * 0 o. 0 is not an eigenvalue of A. p. T is invertible. q. T is one-to-one. r. T is onto. s. ker ( T) = {O} t. range(T) = W The equivalence (q) -¢:? (s) is Theorem 6.20, and (r) -¢:? (t) is the definition of onto. Since A is n X n, we must have dim V = dim W = n. From Theorems 6.2 1 and 6.24, we get (p) -¢:? ( q) -¢:? ( r) . Finally, we connect the last five statements to the others by Theorem 6.28, which implies that (a) -¢:? (p ). Proof .. I Exercises 6 . 6 In Exercises 1 - 12, find the matrix [ T] C<-- B of the linear transformation T : V ---+ W with respect to the bases 13 and C of V and W, respectively. Verify Theorem 6.26 for the vector v by computing T(v) directly and using the theorem. 1 . T : <!P 1 ---+ <!P 1 defined by T ( a + bx) = b - ax, 13 = C = { l , x}, v = p(x) = 4 + 2x .. 2. T : <!P 1 ---+ <!P 1 defined by T(a + bx) = b - ax, 13 = { l + x, 1 - x}, C = {l, x}, v = p(x) = 4 + 2x 3. T : <!P 2 ---+ <!P 2 defined by T(p(x) ) = p(x + 2), 13 = { l , x, x 2 }, C = { l , x + 2, (x + 2 ) 2 }, v = p(x) = a + bx + cx 2 Section 6.6 The Matrix of a Linear Transformation 4. T : <J/l 2 ---+ <J/l 2 defined by T(p(x) ) = p(x + 2), B = { l , x + 2, (x + 2 ) 2 }, C = { l , x, x 2 }, v = p(x) = a + bx + cx2 5. T : <J/l 2 ---+ IR 2 defined by T(p(x) ) = B = { l , x, x 2 }, C = {e 1 , ez }, [� � � � l v = p(x) = a + bx + cx 2 6. T : <J/l 2 ---+ IR 2 defined by T(p(x) ) = B= {x2, x, l },C = { [ � ] , [ � ] } , [� � � � l v = p(x) = a + bx + cx 2 7. T : IR 2 ---+ IR 3 defined by a 2b B a r[:J [ � l � WH - � ] } . c � m H J [: l l · v � n [ �] . [ : �] 8. Repeat Exercise 7 with v = 9. T : M22 ---+ M22 defined by T(A) = A r, B = C = {E 1 1 , E 1 2 , E 2 1 , E 22 }, v = A = 10. 
Repeat Exercise 9 with $\mathcal{B} = \{E_{22}, E_{21}, E_{12}, E_{11}\}$ and $\mathcal{C} = \{E_{12}, E_{21}, E_{22}, E_{11}\}$.
11. $T: M_{22} \to M_{22}$ defined by $T(A) = AB - BA$, where $B = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, $\mathcal{B} = \mathcal{C} = \{E_{11}, E_{12}, E_{21}, E_{22}\}$, $\mathbf{v} = A =$ […]
12. $T: M_{22} \to M_{22}$ defined by $T(A) = A - A^T$, $\mathcal{B} = \mathcal{C} = \{E_{11}, E_{12}, E_{21}, E_{22}\}$, $\mathbf{v} = A =$ […]
13. Consider the subspace $W$ of $\mathcal{D}$ given by $W = \operatorname{span}(\sin x, \cos x)$.
(a) Show that the differential operator $D$ maps $W$ into itself.
(b) Find the matrix of $D$ with respect to $\mathcal{B} = \{\sin x, \cos x\}$.
(c) Compute the derivative of $f(x) = 3\sin x - 5\cos x$ indirectly, using Theorem 6.26, and verify that it agrees with $f'(x)$ as computed directly.
14. Consider the subspace $W$ of $\mathcal{D}$ given by $W = \operatorname{span}(e^{2x}, e^{-2x})$.
(a) Show that the differential operator $D$ maps $W$ into itself.
(b) Find the matrix of $D$ with respect to $\mathcal{B} = \{e^{2x}, e^{-2x}\}$.
(c) Compute the derivative of $f(x) = e^{2x} - 3e^{-2x}$ indirectly, using Theorem 6.26, and verify that it agrees with $f'(x)$ as computed directly.
15. Consider the subspace $W$ of $\mathcal{D}$ given by $W = \operatorname{span}(e^{2x}, e^{2x}\cos x, e^{2x}\sin x)$.
(a) Find the matrix of $D$ with respect to $\mathcal{B} = \{e^{2x}, e^{2x}\cos x, e^{2x}\sin x\}$.
(b) Compute the derivative of $f(x) = 3e^{2x} - e^{2x}\cos x + 2e^{2x}\sin x$ indirectly, using Theorem 6.26, and verify that it agrees with $f'(x)$ as computed directly.
16. Consider the subspace $W$ of $\mathcal{D}$ given by $W = \operatorname{span}(\cos x, \sin x, x\cos x, x\sin x)$.
(a) Find the matrix of $D$ with respect to $\mathcal{B} = \{\cos x, \sin x, x\cos x, x\sin x\}$.
(b) Compute the derivative of $f(x) = \cos x + 2x\cos x$ indirectly, using Theorem 6.26, and verify that it agrees with $f'(x)$ as computed directly.

In Exercises 17 and 18, $T: U \to V$ and $S: V \to W$ are linear transformations and $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$ are bases for $U$, $V$, and $W$, respectively. Compute $[S \circ T]_{\mathcal{D}\leftarrow\mathcal{B}}$ in two ways: (a) by finding $S \circ T$ directly and then computing its matrix and (b) by finding the matrices of $S$ and $T$ separately and using Theorem 6.27.
17. $T: \mathcal{P}_1 \to \mathbb{R}^2$ defined by $T(p(x)) = \begin{bmatrix} p(0) \\ p(1) \end{bmatrix}$, $S: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $S\begin{bmatrix} a \\ b \end{bmatrix} =$ […], $\mathcal{B} = \{1, x\}$, $\mathcal{C} = \mathcal{D} = \{\mathbf{e}_1, \mathbf{e}_2\}$
18. $T: \mathcal{P}_1 \to \mathcal{P}_2$ defined by $T(p(x)) = p(x + 1)$, $S: \mathcal{P}_2 \to \mathcal{P}_2$ defined by $S(p(x)) = p(x + 1)$, $\mathcal{B} = \{1, x\}$, $\mathcal{C} = \mathcal{D} = \{1, x, x^2\}$

In Exercises 19–26, determine whether the linear transformation $T$ is invertible by considering its matrix with respect to the standard bases. If $T$ is invertible, use Theorem 6.28 and the method of Example 6.82 to find $T^{-1}$.
19. $T$ in Exercise 1
20. $T$ in Exercise 5
21. $T$ in Exercise 3
22. $T: \mathcal{P}_2 \to \mathcal{P}_2$ defined by $T(p(x)) = p'(x)$
23. $T: \mathcal{P}_2 \to \mathcal{P}_2$ defined by $T(p(x)) = p(x) + p'(x)$
24. $T: M_{22} \to M_{22}$ defined by $T(A) = AB$, where $B =$ […]
25. $T$ in Exercise 11
26. $T$ in Exercise 12

In Exercises 27–30, use the method of Example 6.83 to evaluate the given integral.
27. $\int (\sin x - 3\cos x)\,dx$. (See Exercise 13.)
28. $\int 5e^{-2x}\,dx$. (See Exercise 14.)
29. $\int (e^{2x}\cos x - 2e^{2x}\sin x)\,dx$. (See Exercise 15.)
30. $\int (x\cos x + x\sin x)\,dx$. (See Exercise 16.)

In Exercises 31–36, a linear transformation $T: V \to V$ is given. If possible, find a basis $\mathcal{C}$ for $V$ such that the matrix $[T]_{\mathcal{C}}$ of $T$ with respect to $\mathcal{C}$ is diagonal.
31. $T: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -4b \\ a + 5b \end{bmatrix}$
32. $T: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T\begin{bmatrix} a \\ b \end{bmatrix} =$ […]
33. $T: \mathcal{P}_1 \to \mathcal{P}_1$ defined by $T(a + bx) = (4a + 2b) + (a + 3b)x$
34. $T: \mathcal{P}_2 \to \mathcal{P}_2$ defined by $T(p(x)) = p(x + 1)$
35. $T: \mathcal{P}_1 \to \mathcal{P}_1$ defined by $T(p(x)) = p(x) + xp'(x)$
36. $T: \mathcal{P}_2 \to \mathcal{P}_2$ defined by $T(p(x)) = p(3x + 2)$
37. Let $\ell$ be the line through the origin in $\mathbb{R}^2$ with direction vector $\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$. Use the method of Example 6.85 to find the standard matrix of a reflection in $\ell$.
38. Let $W$ be the plane in $\mathbb{R}^3$ with equation $x - y + 2z = 0$. Use the method of Example 6.85 to find the standard matrix of an orthogonal projection onto $W$. Verify that your answer is correct by using it to compute the orthogonal projection of $\mathbf{v}$ onto $W$, where $\mathbf{v} =$ […]. Compare your answer with Example 5.11. [Hint: Find an orthogonal decomposition of $\mathbb{R}^3$ as $\mathbb{R}^3 = W + W^\perp$ using an orthogonal basis for $W$. See Example 5.3.]
39. Let $T: V \to W$ be a linear transformation between finite-dimensional vector spaces and let $\mathcal{B}$ and $\mathcal{C}$ be bases for $V$ and $W$, respectively. Show that the matrix of $T$ with respect to $\mathcal{B}$ and $\mathcal{C}$ is unique. That is, if $A$ is a matrix such that $A[\mathbf{v}]_{\mathcal{B}} = [T(\mathbf{v})]_{\mathcal{C}}$ for all $\mathbf{v}$ in $V$, then $A = [T]_{\mathcal{C}\leftarrow\mathcal{B}}$. [Hint: Find values of $\mathbf{v}$ that will show this, one column at a time.]

In Exercises 40–45, let $T: V \to W$ be a linear transformation between finite-dimensional vector spaces $V$ and $W$. Let $\mathcal{B}$ and $\mathcal{C}$ be bases for $V$ and $W$, respectively, and let $A = [T]_{\mathcal{C}\leftarrow\mathcal{B}}$.
40. Show that $\operatorname{nullity}(T) = \operatorname{nullity}(A)$.
41. Show that $\operatorname{rank}(T) = \operatorname{rank}(A)$.
42. If $V = W$ and $\mathcal{B} = \mathcal{C}$, show that $T$ is diagonalizable if and only if $A$ is diagonalizable.
43. Use the results of this section to give a matrix-based proof of the Rank Theorem (Theorem 6.19).
44. If $\mathcal{B}'$ and $\mathcal{C}'$ are also bases for $V$ and $W$, respectively, what is the relationship between $[T]_{\mathcal{C}\leftarrow\mathcal{B}}$ and $[T]_{\mathcal{C}'\leftarrow\mathcal{B}'}$? Prove your assertion.
45. If $\dim V = n$ and $\dim W = m$, prove that $\mathcal{L}(V, W) \cong M_{mn}$. (See the exercises for Section 6.4.) [Hint: Let $\mathcal{B}$ and $\mathcal{C}$ be bases for $V$ and $W$, respectively. Show that the mapping $\varphi(T) = [T]_{\mathcal{C}\leftarrow\mathcal{B}}$, for $T$ in $\mathcal{L}(V, W)$, defines a linear transformation $\varphi: \mathcal{L}(V, W) \to M_{mn}$ that is an isomorphism.]
46. If $V$ is a vector space, then the dual space of $V$ is the vector space $V^* = \mathcal{L}(V, \mathbb{R})$. Prove that if $V$ is finite-dimensional, then $V^* \cong V$.

Exploration
Tilings, Lattices, and the Crystallographic Restriction

Repeating patterns are frequently found in nature and in art. The molecular structure of crystals often exhibits repetition, as do the tilings and mosaics found in the artwork of many cultures.
A tiling (or tessellation) is a covering of a plane by shapes that do not overlap and leave no gaps. The Dutch artist M. C. Escher (1898–1972) produced many works in which he explored the possibility of tiling a plane using fanciful shapes (Figure 6.14).

[Figure 6.14: M. C. Escher's "Symmetry Drawing E103"]

In this exploration, we will be interested in patterns such as those in Figure 6.14, which we assume to be infinite and repeating in all directions of the plane. Such a pattern has the property that it can be shifted (or translated) in at least two directions (corresponding to two linearly independent vectors) so that it appears not to have been moved at all. We say that the pattern is invariant under translations and has translational symmetry in these directions. For example, the pattern in Figure 6.14 has translational symmetry in the directions shown in Figure 6.15. If a pattern has translational symmetry in two directions, it has translational symmetry in infinitely many directions.

[Figure 6.15: Invariance under translation (M. C. Escher's "Symmetry Drawing E103")]

1. Let the two vectors shown in Figure 6.15 be denoted by $\mathbf{u}$ and $\mathbf{v}$. Show that the pattern in Figure 6.14 is invariant under translation by any integer linear combination of $\mathbf{u}$ and $\mathbf{v}$, that is, by any vector of the form $a\mathbf{u} + b\mathbf{v}$, where $a$ and $b$ are integers.

For any two linearly independent vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^2$, the set of points determined by all integer linear combinations of $\mathbf{u}$ and $\mathbf{v}$ is called a lattice. Figure 6.16 shows an example of a lattice.

[Figure 6.16: A lattice]

2. Draw the lattice corresponding to the vectors $\mathbf{u}$ and $\mathbf{v}$ of Figure 6.15.

Figure 6.14 also exhibits rotational symmetry. That is, it is possible to rotate the entire pattern about some point and have it appear unchanged. We say that it is invariant under such a rotation.

[Figure 6.17: Rotational symmetry (M. C. Escher's "Symmetry Drawing E103")]
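The definitions of a lattice and of invariance under a rotation can be checked numerically. The following Python sketch is only an illustration under stated assumptions: points are coordinate pairs, the rotation is taken about the origin (assumed to be a lattice point, as in Problem 6), and the helper names `lattice_points`, `coordinates`, and `is_rotation_invariant` and the tolerance `tol` are hypothetical choices, not part of the text. Because a rotation is linear, the sketch only tests the images of $\mathbf{u}$ and $\mathbf{v}$; their coordinates with respect to $\{\mathbf{u}, \mathbf{v}\}$ are the columns of the matrix of the rotation with respect to that basis.

```python
import math


def lattice_points(u, v, radius):
    """Return the lattice points a*u + b*v for integers a, b with |a|, |b| <= radius."""
    return [
        (a * u[0] + b * v[0], a * u[1] + b * v[1])
        for a in range(-radius, radius + 1)
        for b in range(-radius, radius + 1)
    ]


def coordinates(u, v, p):
    """Solve a*u + b*v = p for (a, b) by Cramer's rule (u, v linearly independent)."""
    det = u[0] * v[1] - u[1] * v[0]
    if abs(det) < 1e-12:
        raise ValueError("u and v must be linearly independent")
    a = (p[0] * v[1] - p[1] * v[0]) / det
    b = (u[0] * p[1] - u[1] * p[0]) / det
    return a, b


def is_rotation_invariant(u, v, theta_degrees, tol=1e-9):
    """Test whether the lattice spanned by u and v is invariant under the
    rotation about the origin through theta_degrees.

    A linear map sends a*u + b*v to a*R(u) + b*R(v), so it suffices to check
    that R(u) and R(v) have integer coordinates with respect to {u, v}.
    """
    t = math.radians(theta_degrees)
    c, s = math.cos(t), math.sin(t)
    for w in (u, v):
        # Apply the standard rotation matrix [[c, -s], [s, c]] to w.
        rotated = (c * w[0] - s * w[1], s * w[0] + c * w[1])
        a, b = coordinates(u, v, rotated)
        # Floating-point round-off means integrality is tested with a tolerance.
        if abs(a - round(a)) > tol or abs(b - round(b)) > tol:
            return False
    return True
```

For example, the square lattice with $\mathbf{u} = (1, 0)$ and $\mathbf{v} = (0, 1)$ passes the test for 90° but fails it for 45°, while the lattice with $\mathbf{u} = (1, 0)$ and $\mathbf{v} = (1/2, \sqrt{3}/2)$ passes it for 60°. A check like this examines one lattice and one angle at a time; it does not replace the general argument developed in Problems 7 to 9.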
For example, the pattern of Figure 6.14 is invariant under a rotation of 120° about the point $O$, as shown in Figure 6.17. We call $O$ a center of rotational symmetry (or a rotation center). Note that if a pattern is based on an underlying lattice, then any symmetries of the pattern must also be possessed by the lattice.
3. Explain why, if a point $O$ is a rotation center through an angle $\theta$, then it is a rotation center through every integer multiple of $\theta$. Deduce that if $0 < \theta \le 360^\circ$, then $360/\theta$ must be an integer. (If $360/\theta = n$, we say the pattern or lattice has $n$-fold rotational symmetry.)
4. What is the smallest positive angle of rotational symmetry for the lattice in Problem 2? Does the pattern in Figure 6.14 also have rotational symmetry through this angle?
5. Take various values of $\theta$ such that $0 < \theta \le 360^\circ$ and $360/\theta$ is an integer. Try to draw a lattice that has rotational symmetry through the angle $\theta$. In particular, can you draw a lattice with eight-fold rotational symmetry?

We will show that values of $\theta$ that are possible angles of rotational symmetry for a lattice are severely restricted. The technique we will use is to consider rotation transformations in terms of different bases. Accordingly, let $R_\theta$ denote a rotation about the origin through an angle $\theta$ and let $\mathcal{E}$ be the standard basis for $\mathbb{R}^2$. Then the standard matrix of $R_\theta$ is
$$[R_\theta]_{\mathcal{E}} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
6. Referring to Problems 2 and 4, take the origin to be at the tails of $\mathbf{u}$ and $\mathbf{v}$.
(a) What is the actual (i.e., numerical) value of $[R_\theta]_{\mathcal{E}}$ in this case?
(b) Let $\mathcal{B}$ be the basis $\{\mathbf{u}, \mathbf{v}\}$. Compute the matrix $[R_\theta]_{\mathcal{B}}$.
7. In general, let $\mathbf{u}$ and $\mathbf{v}$ be any two linearly independent vectors in $\mathbb{R}^2$ and suppose that the lattice determined by $\mathbf{u}$ and $\mathbf{v}$ is invariant under a rotation through an angle $\theta$. If $\mathcal{B} = \{\mathbf{u}, \mathbf{v}\}$, show that the matrix of $R_\theta$ with respect to $\mathcal{B}$ must have the form
$$[R_\theta]_{\mathcal{B}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
where $a$, $b$, $c$, and $d$ are integers.
8.
In the terminology and notation of Problem 7, show that $2\cos\theta$ must be an integer. [Hint: Use Exercise 35 in Section 4.4 and Theorem 6.29.]
9. Using Problem 8, make a list of all possible values of $\theta$, with $0 < \theta \le 360^\circ$, that can be angles of rotational symmetry of a lattice. Record the corresponding values of $n$, where $n = 360/\theta$, to show that a lattice can have $n$-fold rotational symmetry if and only if $n = 1, 2, 3, 4$, or $6$.

This result, known as the crystallographic restriction, was first proved by W. Barlow in 1894.

10. In the library or on the Internet, see whether you can find an Escher tiling for each of the five possible types of rotational symmetry, that is, where the smallest angle of rotational symmetry of the pattern is one of those specified by the crystallographic restriction.

Applications
Homogeneous Linear Differential Equations

In Exercises 69–72 in Section 4.6, we showed that if $y = y(t)$ is a twice-differentiable function that satisfies the differential equation
$$y'' + ay' + by = 0 \qquad (1)$$
then $y$ is of the form
$$y = c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t}$$
if $\lambda_1$ and $\lambda_2$ are distinct roots of the associated characteristic equation $\lambda^2 + a\lambda + b = 0$. (The case where $\lambda_1 = \lambda_2$ was left unresolved.) Example 6.12 and Exercise 20 in this section show that the set of solutions to Equation (1) forms a subspace of $\mathcal{F}$, the vector space of functions. In this section, we pursue these ideas further, paying particular attention to the role played by vector spaces, bases, and dimension.

To set the stage, we consider a simpler class of examples. A differential equation of the form
$$y' + ay = 0 \qquad (2)$$
is called a first-order, homogeneous, linear differential equation. ("First-order" refers to the fact that the highest derivative involved is a first derivative, and "homogeneous" means that the right-hand side is zero. Do you see why the equation is "linear"?)

A solution to Equation (2) is a differentiable function $y = y(t)$ that satisfies Equation (2) for all values of $t$. It is easy to check that one solution to Equation (2) is $y = e^{-at}$. (Do it.) However, we would like to describe all solutions, and this is where vector spaces come in. We have the following theorem.

Theorem 6.31: The set $S$ of all solutions to $y' + ay = 0$ is a subspace of $\mathcal{F}$.

Proof: Since the zero function certainly satisfies Equation (2), $S$ is nonempty. Let $x$ and $y$ be two differentiable functions of $t$ that are in $S$ and let $c$ be a scalar. Then
$$x' + ax = 0 \quad\text{and}\quad y' + ay = 0$$
so, using rules for differentiation, we have
$$(x + y)' + a(x + y) = x' + y' + ax + ay = (x' + ax) + (y' + ay) = 0 + 0 = 0$$
and
$$(cy)' + a(cy) = cy' + c(ay) = c(y' + ay) = c \cdot 0 = 0$$
Hence, $x + y$ and $cy$ are also in $S$, so $S$ is a subspace of $\mathcal{F}$.

Now we will show that $S$ is a one-dimensional subspace of $\mathcal{F}$ and that $\{e^{-at}\}$ is a basis. To this end, let $x = x(t)$ be in $S$. Then, for all $t$,
$$x'(t) + ax(t) = 0 \quad\text{or}\quad x'(t) = -ax(t)$$
Define a new function $z(t) = x(t)e^{at}$. Then, by the Product Rule for differentiation,
$$z'(t) = x(t)ae^{at} + x'(t)e^{at} = ax(t)e^{at} - ax(t)e^{at} = 0$$
Since $z'$ is identically zero, $z$ must be a constant function, say, $z(t) = k$. But this means that
$$x(t)e^{at} = z(t) = k \quad\text{for all } t$$
so $x(t) = ke^{-at}$. Therefore, all solutions to Equation (2) are scalar multiples of the single solution $y = e^{-at}$. We have proved the following theorem.

Theorem 6.32: If $S$ is the solution space of $y' + ay = 0$, then $\dim S = 1$ and $\{e^{-at}\}$ is a basis for $S$.

One model for population growth assumes that the growth rate of the population is proportional to the size of the population. This model works well if there are few restrictions (such as limited space, food, or the like) on growth. If the size of the population at time $t$ is $p(t)$, then the growth rate, or rate of change of the population, is its derivative $p'(t)$.
Our assumption that the growth rate of the population is proportional to its size can be written as
$$p'(t) = kp(t)$$
where $k$ is the proportionality constant. Thus, $p$ satisfies the differential equation $p' - kp = 0$, so, by Theorem 6.32,
$$p(t) = ce^{kt}$$
for some scalar $c$. The constants $c$ and $k$ are determined using experimental data.

Example 6.87
The bacterium Escherichia coli (or E. coli, for short) is commonly found in the intestines of humans and other mammals. It poses severe health risks if it escapes into the environment. Under laboratory conditions, each cell of the bacterium divides into two every 20 minutes. If we start with a single E. coli cell, how many will there be after 1 day?

[Margin note: E. coli is mentioned in Michael Crichton's novel The Andromeda Strain (New York: Dell, 1969), although the "villain" in that novel was supposedly an alien virus. In real life, E. coli contaminated the town water supply of Walkerton, Ontario, in 2000, resulting in seven deaths and causing hundreds of people to become seriously ill.]

Solution: We do not need to use differential equations to solve this problem, but we will, in order to illustrate the basic method. To determine $c$ and $k$, we use the data given in the statement of the problem. If we take 1 unit of time to be 20 minutes, then we are given that $p(0) = 1$ and $p(1) = 2$. Therefore,
$$c = c \cdot 1 = ce^{k \cdot 0} = 1 \quad\text{and}\quad 2 = ce^{k \cdot 1} = e^k$$
It follows that $k = \ln 2$, so
$$p(t) = e^{t\ln 2} = e^{\ln 2^t} = 2^t$$
After 1 day, $t = 72$, so the number of bacteria cells will be $p(72) = 2^{72} \approx 4.72 \times 10^{21}$ (see Figure 6.18).

[Figure 6.18: Exponential growth, $p(t)$ plotted against $t$]

Radioactive substances decay by emitting radiation. If $m(t)$ denotes the mass of the substance at time $t$, then the rate of decay is $m'(t)$.
Physicists have found that the rate of decay of a substance is proportional to its mass; that is,
$$m'(t) = km(t) \quad\text{or}\quad m' - km = 0$$
where $k$ is a negative constant. Applying Theorem 6.32, we have
$$m(t) = ce^{kt}$$
for some constant $c$. The time required for half of a radioactive substance to decay is called its half-life.

Example 6.88
After 5.5 days, a 100 mg sample of radon-222 decayed to 37 mg.
(a) Find a formula for $m(t)$, the mass remaining after $t$ days.
(b) What is the half-life of radon-222?
(c) When will only 10 mg remain?

Solution: (a) From $m(t) = ce^{kt}$, we have
$$100 = m(0) = ce^{k \cdot 0} = c \cdot 1 = c$$
so $m(t) = 100e^{kt}$. With time measured in days, we are given that $m(5.5) = 37$. Therefore,
$$100e^{5.5k} = 37 \quad\text{so}\quad e^{5.5k} = 0.37$$
Solving for $k$, we find
$$5.5k = \ln(0.37) \quad\text{so}\quad k = \frac{\ln(0.37)}{5.5} \approx -0.18$$
Therefore, $m(t) = 100e^{-0.18t}$.

[Figure 6.19: Radioactive decay, $m(t)$ plotted against $t$]

(b) To find the half-life of radon-222, we need the value of $t$ for which $m(t) = 50$. Solving this equation, we find
$$100e^{-0.18t} = 50 \quad\text{so}\quad e^{-0.18t} = 0.50$$
Hence,
$$-0.18t = \ln\left(\tfrac{1}{2}\right) = -\ln 2$$
and
$$t = \frac{\ln 2}{0.18} \approx 3.85$$
Thus, radon-222 has a half-life of approximately 3.85 days. (See Figure 6.19.)

(c) We need to determine the value of $t$ such that $m(t) = 10$. That is, we must solve the equation
$$100e^{-0.18t} = 10 \quad\text{or}\quad e^{-0.18t} = 0.1$$
Taking the natural logarithm of both sides yields $-0.18t = \ln 0.1$. Thus,
$$t = \frac{\ln 0.1}{-0.18} \approx 12.79$$
so 10 mg of the sample will remain after approximately 12.79 days.

[Margin note: See Linear Algebra by S. H. Friedberg, A. J. Insel, and L. E. Spence (Englewood Cliffs, NJ: Prentice-Hall, 1979).]

The solution set $S$ of the second-order differential equation $y'' + ay' + by = 0$ is also a subspace of $\mathcal{F}$ (Exercise 20), and it turns out that the dimension of $S$ is 2. Part (a) of Theorem 6.33, which extends Theorem 6.32, is implied by Theorem 4.40.
Our approach here is to use the power of vector spaces; doing so allows us to obtain part (b) of Theorem 6.33 as well, a result that we could not obtain with our previous methods.

Theorem 6.33: Let $S$ be the solution space of
$$y'' + ay' + by = 0$$
and let $\lambda_1$ and $\lambda_2$ be the roots of the characteristic equation $\lambda^2 + a\lambda + b = 0$.
a. If $\lambda_1 \neq \lambda_2$, then $\{e^{\lambda_1 t}, e^{\lambda_2 t}\}$ is a basis for $S$.
b. If $\lambda_1 = \lambda_2$, then $\{e^{\lambda_1 t}, te^{\lambda_1 t}\}$ is a basis for $S$.

Remarks
• Observe that what the theorem says, in other words, is that the solutions of $y'' + ay' + by = 0$ are of the form
$$y = c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t}$$
in the first case and
$$y = c_1 e^{\lambda_1 t} + c_2 te^{\lambda_1 t}$$
in the second case.
• Compare Theorem 6.33 with Theorem 4.38. Linear differential equations and linear recurrence relations have much in common. Although the former belong to continuous mathematics and the latter to discrete mathematics, there are many parallels.

Proof: (a) We first show that $e^{\lambda_1 t}$ and $e^{\lambda_2 t}$ are in $S$. Let $\lambda$ be any root of the characteristic equation and let $f(t) = e^{\lambda t}$. Then
$$f'(t) = \lambda e^{\lambda t} \quad\text{and}\quad f''(t) = \lambda^2 e^{\lambda t}$$
from which it follows that
$$f'' + af' + bf = \lambda^2 e^{\lambda t} + a\lambda e^{\lambda t} + be^{\lambda t} = (\lambda^2 + a\lambda + b)e^{\lambda t} = 0 \cdot e^{\lambda t} = 0$$
Therefore, $f$ is in $S$. But, since $\lambda_1$ and $\lambda_2$ are roots of the characteristic equation, this means that $e^{\lambda_1 t}$ and $e^{\lambda_2 t}$ are in $S$.

The set $\{e^{\lambda_1 t}, e^{\lambda_2 t}\}$ is also linearly independent, since if
$$c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t} = 0 \quad\text{for all } t$$
then, setting $t = 0$, we have
$$c_1 + c_2 = 0 \quad\text{or}\quad c_2 = -c_1$$
Next, we set $t = 1$ to obtain
$$c_1 e^{\lambda_1} - c_1 e^{\lambda_2} = 0 \quad\text{or}\quad c_1\left(e^{\lambda_1} - e^{\lambda_2}\right) = 0$$
But $e^{\lambda_1} - e^{\lambda_2} \neq 0$, since $e^{\lambda_1} - e^{\lambda_2} = 0$ implies that $e^{\lambda_1} = e^{\lambda_2}$, which is clearly impossible if $\lambda_1 \neq \lambda_2$. (See Figure 6.20.) We deduce that $c_1 = 0$ and, hence, $c_2 = 0$, so $\{e^{\lambda_1 t}, e^{\lambda_2 t}\}$ is linearly independent. Since $\dim S = 2$, $\{e^{\lambda_1 t}, e^{\lambda_2 t}\}$ must be a basis for $S$.

[Figure 6.20: graph of $y = e^x$]

(b) You are asked to prove this property in Exercise 21.

Example 6.89
Find all solutions of $y'' - 5y' + 6y = 0$.

Solution: The characteristic equation is $\lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3) = 0$. Thus, the roots are 2 and 3, so $\{e^{2t}, e^{3t}\}$ is a basis for the solution space. It follows that the solutions to the given equation are of the form
$$y = c_1 e^{2t} + c_2 e^{3t}$$
The constants $c_1$ and $c_2$ can be determined if additional equations, called boundary conditions, are specified.

Example 6.90
Find the solution of $y'' + 6y' + 9y = 0$ that satisfies $y(0) = 1$, $y'(0) = 0$.

Solution: The characteristic equation is $\lambda^2 + 6\lambda + 9 = (\lambda + 3)^2 = 0$, so $-3$ is a repeated root. Therefore, $\{e^{-3t}, te^{-3t}\}$ is a basis for the solution space, and the general solution is of the form
$$y = c_1 e^{-3t} + c_2 te^{-3t}$$
The first boundary condition gives
$$1 = y(0) = c_1 e^{-3 \cdot 0} + 0 = c_1$$
so $y = e^{-3t} + c_2 te^{-3t}$. Differentiating, we have
$$y' = -3e^{-3t} + c_2\left(-3te^{-3t} + e^{-3t}\right)$$
so the second boundary condition gives
$$0 = y'(0) = -3e^{-3 \cdot 0} + c_2\left(0 + e^{-3 \cdot 0}\right) = -3 + c_2$$
or $c_2 = 3$. Therefore, the required solution is
$$y = e^{-3t} + 3te^{-3t} = (1 + 3t)e^{-3t}$$
Theorem 6.33 includes the case in which the roots of the characteristic equation are complex. If $\lambda = p + qi$ is a complex root of the equation $\lambda^2 + a\lambda + b = 0$,