H1 Mathematics Textbook: 8865 Syllabus

H1 Mathematics Textbook
CHOO YAN MIN
& Answers. Covers 8865 (revised) syllabus. Includes TYS.

This version: 1st April 2017. The latest version will always be at this link.

This textbook was first completed in August 2016. Since then, only small changes (usually corrections of typos) have been made.

Page 2, Table of Contents www.EconsPhDTutor.com

Errors? Feedback? Email me! With your help, I plan to keep improving this textbook.

This book is licensed under the Creative Commons license CC-BY-NC-SA 4.0.

You are free to:
• Share: copy and redistribute the material in any medium or format.
• Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:
• Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
• NonCommercial: You may not use the material for commercial purposes.
• ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
• No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Notices: You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation. No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

Author: Choo, Yan Min. Title: H1 Mathematics Textbook. ISBN: 978-981-11-0755-9 (e-book).

The first thing to understand is that mathematics is an art.
- Paul Lockhart (2009, A Mathematician’s Lament, p. 22).

A mathematician, like a painter or a poet, is a maker of patterns. If his patterns are more permanent than theirs, it is because they are made with ideas. ... Beauty is the first test: there is no permanent place in the world for ugly mathematics.
- G.H. Hardy (1940 [1967], A Mathematician’s Apology, pp. 84-85).

The scientist does not study nature because it is useful to do so. He studies it because he takes pleasure in it, and he takes pleasure in it because it is beautiful.
- Henri Poincaré (1908 [1914], Science and Method, English trans., p. 22).

About This Book

This textbook is for Singaporean H1 Maths students. It is based exactly on the revised (8865) syllabus, which will be examined for the first time only in 2017.[1]

I assume that if you’re an H1 Maths student, you
• have passed O-Level Mathematics;
• may or may not have taken O-Level Additional Mathematics;
• are somewhat weaker or less interested in maths than the average H2 Maths student;
• want to learn or do the minimum amount of maths necessary to get an A;
• won’t be studying such subjects as mathematics, physics, engineering, or economics at university.

This textbook is thus written simply and non-rigorously. For example, there are few formal definitions or proofs.[2] Simple and non-rigorous as this textbook may be, I fully intend that a careful study of this textbook (complemented by a capable teacher) will easily earn you your A in H1 Maths.

For a comparison of H1 vs H2 maths, check out this brief 3-page document: H1 Maths vs H2 Maths: What’s the Difference? Which Should I Take?

• FREE! This book is free. But if you paid any money for it, I certainly hope your money is going to me! This book is free because:
1. It is a shameless advertising vehicle for my awesome tutoring services.
2. The marginal cost of reproducing this book is zero.

• DONATE! This book may be free, but donations are more than welcome! Donation methods in footnote.[3] It’s irrational for Homo economicus to donate. But please consider donating because:
1. You’re a nice human being [*emotional_manipulation*].
2. Your donations will encourage me and others to continue producing awesome free content for the world.

[1] The old syllabus is 8864, to be examined for the last time in 2017. It is not very different from 8865.
[2] This is in contrast to my H2 Mathematics Textbook, which is an authoritative reference that the interested H1 Maths student should look at. That H2 Mathematics Textbook covers the same topics (and more) as this textbook, but much more rigorously and thoroughly. Indeed a good deal of this H1 Mathematics Textbook is simply a diluted version of that H2 Mathematics Textbook.
[3] Singapore. POSB Savings Account 174052271 or OCBC Savings Account 5523016383 (Name: Choo Yan Min). International. Bitcoin wallet: 1GDGNAdGZhEq9pz2SaoAdLb1uu34LFwViz. PayPal ychoo@umich.edu (Name: Yan Min Choo, USD preferred because this account was set up in the US). USA. Venmo link (Name: Yanmin Choo).

• HELP ME IMPROVE THIS BOOK! Feel free to email me if:
1. There are any errors in this book. Please let me know even if it’s something as trivial as a spelling mistake or a grammatical error.
2. You have absolutely any suggestions for improvement.
3. Any part of this book is less than crystal clear.

Here’s an anecdote about Richard Feynman, the great teacher and physicist:

Feynman was once asked by a Caltech faculty member to explain why spin 1/2 particles obey Fermi-Dirac statistics. He gauged his audience perfectly and said, “I’ll prepare a freshman lecture on it.” But a few days later he returned and said, “You know, I couldn’t do it. I couldn’t reduce it to the freshman level.
That means we really don’t understand it.”

I agree: If you can’t explain something simply, you don’t understand it well enough.[4] And as a corollary, the best way to gauge whether you understand something is to see if you can explain it simply to someone else.

If at any point in this textbook, you have read the same passage a few times, tried to reason it through, and still find things confusing, then it is a failure on MY part. Please let me know and I will try to rewrite it so that it’s clearer. (There is also the possibility that I simply messed up! So please let me know if there’s anything confusing!)

I deeply value any feedback, because I’d like to keep improving this textbook for the benefit of everyone! I am very grateful to all the kind folks who’ve already written in, allowing me to rid this book of more than a few embarrassing errors.

• LyX rocks! This book was written using LyX.[5]

• Is the font size big enough? You’re probably reading this on some device. So I’ve tried to set the font sizes and stuff so that one can comfortably read this on a device as small as a seven-inch tablet. It should also be possible to read this on a phone, though somewhat less comfortably. (Please let me know if you have any feedback about this!) (I’ll probably be contacting some publishers to see if they want to do a print version of this, for anyone who prefers it in print.)

[4] This quote or some similar variant is often (mis)attributed to Einstein. But as Einstein himself once said, “73% of Einstein quotes are misattributed.”
[5] LaTeX is the typesetting program used by most economists and scientists. But LaTeX can be difficult to use. LyX is a user-friendly GUI version of LaTeX. LyX has boosted my productivity by countless hours over the years and you should use LyX too!

Tips for the Student

• Read maths slowly. Reading maths is not like reading Harry Potter. Most of Harry Potter is fluff. There is little fluff in maths.
So go slowly. Dwell upon and carefully consider every sentence in this textbook. Make sure you completely understand what each statement says and why it is true. Reading maths is very different from reading any other subject matter.

If you don’t quite understand some material, you might be tempted to move forward anyway. Don’t. In maths, later material usually builds on earlier material. So if you simply move forward, this will usually cost you more time and frustration in the long run. Better then to stop right there. Keep working on it until you “get” it. Ask a friend or a teacher for help. Feel free to even email me! (I’m always interested to know what the common points of confusion are and how I can better clear them up.)

• Examples and exercises are your best friends. So work through them.

A good stock of examples, as large as possible, is indispensable for a thorough understanding of any concept, and when I want to learn something new, I make it my first job to build one.
- Paul Halmos (1983, Google Books).

Work through all the examples and exercises. Merely moving your eyeballs is not the same as working. Working means having pencil and paper by your side and going through each example/exercise word-by-word, line-by-line.

For example, I might say something like “$x^2 - y^2 = 0$. Thus, $(x - y)(x + y) = 0$.” If it’s not obvious to you why the first sentence implies the second, stop right there and work on it until you understand why. Don’t just let your eyeballs fly over these sentences and pretend that your brain is “getting” it. I will often not bother to explain some steps, especially if they simply involve some simple algebra.

• You get a List of Formulae during the A-level exam. It’s called List of Formulae MF26. It’s available at this link (MF26). (I cannot guarantee though that your JC will give you the List during your JC common tests and exams.)

• Online Calculators

Google is probably the quickest for simple calculations. Type anything into your browser’s Google search bar and the answer will instantly show up.

Wolfram Alpha is somewhat more advanced (but also slower). Enter “sin x” for example and you’ll get graphs, the derivative, the indefinite integral, the Maclaurin series, and a bunch of other stuff you neither know nor care about.

The Derivative Calculator and the Integral Calculator are probably unbeatable for the specific purposes of differentiation and integration. Both give step-by-step solutions for anything you want to differentiate or integrate.

Here is a collection of spreadsheets I made. These spreadsheets are for doing tedious and repetitive calculations that H2 Maths students (and hence also H1 Maths students) will often encounter. As with anything I do, I welcome any feedback you may have about these spreadsheets. Perhaps in the future I will make a more attractive version of it. (Instructions: Click “Make a copy” to open up your own independent copy of this spreadsheet. Enter your input in the yellow cells. Output is produced in the blue cells. If you mess up anything, simply click the same link and “Make a copy” again.)

Use of Graphing Calculators

You are required to know how to use a graphing calculator.[6] This textbook will give only a very few examples involving graphing calculators. There is no better way of learning to use it than to play around with it yourself. By the time you sit down for your A-level exams, you should have had plenty of practice with it.

You can also use any of the seven calculators in the list below (last updated by SEAB on March 1st, 2016, PDF). But this textbook will stick with the TI-84 PLUS Silver Edition (which I’ll simply call the TI84).
(My understanding is that most students use a TI calculator and that the five approved TI calculators are pretty similar.) I’ll always start each example with the calculator freshly reset.

[6] Pretty bizarre that in this age of the smartphone, they want you to learn how to use these clunky and now-useless devices from the ’80s and ’90s. It is the equivalent of learning to program a VCR. IMHO it’d be much better to teach you some simple programming or Excel (or whatever spreadsheet program). “B-b-but ... how would such learning be tested in an exam format?” Ay, there’s the rub. In the Singapore education system, anything that cannot be “examified” is not worth learning.

Contents

About This Book (p. 6)
Tips for the Student (p. 8)
Use of Graphing Calculators (p. 10)

Part I: Functions and Graphs (p. 17)
1 Dividing By Zero (p. 18)
2 Functions (p. 19)
3 Graphs: Introduction (p. 21)
4 Graphs: Intercepts (p. 23)
5 Graphs: Turning Points (p. 25)
6 Quadratic Equations (p. 27)
7 Graphs: Asymptotes (p. 32)
8 Exponents: Laws (p. 35)
9 Exponents: Graphs (p. 36)
10 Exponential Growth and Decay (p. 38)
11 Logarithms: Introduction (p. 42)
12 Logarithms: Laws (p. 43)
13 Logarithms: Graphs (p. 45)
14 Logarithmic Growth (p. 47)
15 Graphs: Symmetry (p. 49)
16 Graphing with the TI84 (p. 51)
17 Simultaneous Equations: One Linear and One Quadratic (p. 53)
18 Solving Equations Using Your TI84 (p. 56)
19 Quadratic Inequalities (p. 57)
20 Solving Inequalities Using Your TI84 (p. 58)
21 Formulating an Equation or a System of Linear Equations from a Problem Situation (p. 61)

Part II: Calculus (p. 62)
22 Equations of Lines (p. 63)
23 The Derivative as Slope of the Tangent (p. 65)
24 Chain Rule (p. 72)
25 Increasing, Decreasing, and $f'$ (p. 74)
26 Finding Turning Points (the First Derivative Test) (p. 76)
27 Inflexion Points (p. 80)
28 Finding Max/Min Points on the TI84 (p. 85)
29 Finding the Derivative at a Point on the TI84 (p. 87)
30 Connected Rates of Change Problems (p. 88)
31 Integration as the Reverse of Differentiation (p. 90)
32 The Constant of Integration (p. 93)
33 Basic Rules of Integration (p. 94)
34 The Definite Integral as the Area Under a Graph (p. 97)
35 Area between a Curve and Lines Parallel to Axes (p. 101)
36 Area between a Curve and a Line (p. 102)
37 Area between Two Curves (p. 103)
38 Finding Definite Integrals on your TI84 (p. 104)

Part III: Probability and Statistics (p. 105)
39 How to Count: Four Principles (p. 106)
  39.1 How to Count: The Addition Principle (p. 107)
  39.2 How to Count: The Multiplication Principle (p. 110)
  39.3 How to Count: The Inclusion-Exclusion Principle (p. 114)
  39.4 How to Count: The Complements Principle (p. 116)
40 How to Count: Permutations (p. 117)
  40.1 Permutations with Repeated Elements (p. 121)
  40.2 Partial Permutations (p. 124)
  40.3 Permutations with Restrictions (p. 125)
41 How to Count: Combinations (p. 127)
  41.1 Pascal’s Triangle (p. 130)
  41.2 The Combination as Binomial Coefficient (p. 131)
  41.3 The Number of Subsets of a Set is $2^n$ (p. 134)
42 Probability: Introduction (p. 136)
  42.1 Mathematical Modelling (p. 136)
  42.2 The Experiment as a Model of Scenarios Involving Chance (p. 138)
  42.3 Mutually Exclusive Events (p. 139)
  42.4 Complementary Events (p. 140)
  42.5 The Union of Two Events (p. 141)
  42.6 The Intersection of Two Events (p. 142)
  42.7 Properties of Probabilities (p. 143)
43 Probability: Conditional Probability (p. 145)
44 Probability: Independence (p. 147)
45 Probability: Not Everything is Independent (p. 151)
46 Random Variables: Introduction (p. 153)
47 Random Variables: Probability Distribution (p. 154)
48 Random Variables: Independence (p. 158)
49 Random Variables: Expectation (p. 160)
  49.1 The Expectation Operator is Linear (p. 163)
50 Random Variables: Variance (p. 165)
  50.1 Standard Deviation (p. 171)
  50.2 Properties of the Variance Operator (p. 172)
51 The Binomial Distribution (p. 174)
  51.1 Probability Distribution of the Binomial R.V. (p. 175)
  51.2 The Mean and Variance of the Binomial Random Variable (p. 176)
52 The Continuous Uniform Distribution (p. 178)
  52.1 The Continuous Uniform Distribution (p. 178)
  52.2 Important Digression: $P(X \le k) = P(X < k)$ (p. 180)
  52.3 The Cumulative Distribution Function (CDF) (p. 181)
  52.4 The Probability Density Function (PDF) (p. 182)
53 The Normal Distribution (p. 183)
  53.1 The Normal Distribution, in General (p. 189)
  53.2 Sum of Independent Normal Random Variables (p. 198)
54 The Central Limit Theorem and The Normal Approximation (p. 202)
55 Sampling (p. 205)
  55.1 Population (p. 205)
  55.2 Population Mean and Population Variance (p. 206)
  55.3 Parameter (p. 207)
  55.4 Distribution of a Population (p. 208)
  55.5 A Random Sample (p. 209)
  55.6 Sample Mean and Sample Variance (p. 211)
  55.7 Sample Mean and Sample Variance are Unbiased Estimators (p. 217)
  55.8 The Sample Mean is a Random Variable (p. 220)
  55.9 The Distribution of the Sample Mean (p. 221)
  55.10 Non-Random Samples (p. 222)
56 Null Hypothesis Significance Testing (NHST) (p. 223)
  56.1 One-Tailed vs Two-Tailed Tests (p. 227)
  56.2 The Abuse of NHST (Optional) (p. 230)
  56.3 Common Misinterpretations of the Margin of Error (Optional) (p. 231)
  56.4 Critical Region and Critical Value (p. 234)
  56.5 Testing of a Population Mean (Small Sample, Normal Distribution, $\sigma^2$ Known) (p. 236)
  56.6 Testing of a Population Mean (Large Sample, Any Distribution, $\sigma^2$ Known) (p. 238)
  56.7 Testing of a Population Mean (Large Sample, Any Distribution, $\sigma^2$ Unknown) (p. 240)
  56.8 Formulation of Hypotheses (p. 242)
57 Correlation and Linear Regression (p. 243)
  57.1 Bivariate Data and Scatter Diagrams (p. 243)
  57.2 Product Moment Correlation Coefficient (PMCC) (p. 245)
  57.3 Correlation Does Not Imply Causation (Optional) (p. 251)
  57.4 Linear Regression (p. 252)
  57.5 Ordinary Least Squares (OLS) (p. 254)
  57.6 TI84 to Calculate the PMCC and the OLS Estimates (p. 259)
  57.7 Interpolation and Extrapolation (p. 261)
  57.8 The Higher the PMCC, the Better the Model? (p. 269)

Part IV: Ten-Year Series (p. 271)
58 Past-Year Questions for Section A: Pure Mathematics (p. 272)
59 Past-Year Questions for Section B: Prob. & Stats (p. 287)

Part V: Answers to Exercises (p. 313)
60 Answers to Exercises in Part I: Functions and Graphs (p. 314)
61 Answers to Exercises in Part II: Calculus (p. 330)
62 Answers to Exercises in Part III: Probability and Statistics (p. 340)
63 Answers to Exercises in Part IV (2006-2015 A-Level Exams) (p. 369)
  63.1 Answers for Ch. 58: Pure Mathematics (p. 369)
  63.2 Answers for Ch. 59: Probability and Statistics (p. 385)

Part I
Functions and Graphs

1 Dividing By Zero

This chapter is a brief warning against a common mistake: dividing by 0.

Students have little trouble avoiding this mistake if the divisor is obviously a big fat 0. Instead, students usually make this mistake when the divisor is an unknown constant or variable that might be 0.

Example 1. Find the values of $x$ for which $x(x - 1) = (2x - 2)(x - 1)$.

Here’s the wrong solution: “Divide both sides by $x - 1$ to get $x = 2x - 2$. So $x = 2$.”

Here’s the correct solution: “Case #1. Suppose $x - 1 = 0$. Then both sides of the given equation equal 0, so the equation is satisfied. So $x = 1$ is one possible value for which $x(x - 1) = (2x - 2)(x - 1)$. Case #2. Now suppose $x - 1 \neq 0$. So we can divide both sides by $x - 1$ to get $x = 2x - 2$. So $x = 2$. Conclusion. The two possible values of $x$ for which $x(x - 1) = (2x - 2)(x - 1)$ are $x = 1$ and $x = 2$.”

Moral of the story. Whenever you divide by a certain quantity, make sure it’s non-zero.
If you’re not sure whether it equals 0, then break up your analysis into two cases, as was done in the above example: Case #1: the quantity equals 0 (and see what happens in this case); Case #2: the quantity is non-zero (in which case you can go ahead and divide).

By the way, let’s take this opportunity to clear up another popular misconception: You may have heard that $1/0 = \infty$. This is wrong. $1/0 \neq \infty$. Instead, any non-zero number divided by 0 is undefined.[7] “Undefined” is the mathematician’s way of saying, “You haven’t told me what you are talking about. So what you are saying is meaningless.”

Exercise 1. What’s wrong with this “proof” that 1 = 0? (Answer on p. 314.)
1. Let $x$, $y$ be positive numbers such that $x = y$.
2. Square both sides: $x^2 = y^2$.
3. Rearrange: $x^2 - y^2 = 0$.
4. Factorise: $(x - y)(x + y) = 0$.
5. Divide both sides by $x - y$ to get $x + y = 0$.
6. Since $x = y$, sub $y = x$ into the above equation to get $2x = 0$.
7. Divide both sides by $2x$ to get $1 = 0$.

[7] One exception is 0/0, which is indeterminate. This means that 0/0 is sometimes undefined, but can sometimes be defined under certain circumstances.

2 Functions

Undoubtedly the most important concept in all of mathematics is that of a function – in almost every branch of modern mathematics functions turn out to be the central objects of investigation.
- Michael Spivak (1994 [2006], Calculus, p. 39).

Informally, a function is a rule that maps each input to exactly one output.

Example 2. Consider the function $f$ defined by $f(x) = x^2 + 5$. The input is any real number $x$; the corresponding output is the real number $x^2 + 5$. For example, $f(3) = 3^2 + 5 = 14$. In words, we may say either of the following equivalent statements:
• $f$ maps the input 3 to the output 14; or
• the value of $f$ at 3 is 14.

Example 3. Consider the function $g$ defined by $g(x) = x/(x^2 + 1)$. The input is any real number $x$; the corresponding output is the real number $x/(x^2 + 1)$.
For example, $g(3) = 3/(3^2 + 1) = 0.3$. In words, we may say either of the following equivalent statements:
• $g$ maps the input 3 to the output 0.3; or
• the value of $g$ at 3 is 0.3.

We will usually consider only functions whose inputs and outputs are real numbers. But in general, this need not be the case. To illustrate this point, here are two examples.

Example 4. Consider the function $h$ that maps each person’s name to the first letter of that name. So for example, $h$(Lee Kuan Yew) = L. In words, we may say either of the following equivalent statements:
• $h$ maps the input Lee Kuan Yew to the output L; or
• the value of $h$ at Lee Kuan Yew is L.
Another example: $h$(Barack Hussein Obama) = B. In words, we may say either of the following equivalent statements:
• $h$ maps the input Barack Hussein Obama to the output B; or
• the value of $h$ at Barack Hussein Obama is B.

Example 5. Consider the function $i$ that maps each building to the country of its location. So for example, $i$(Burj Khalifa) = United Arab Emirates and $i$(Petronas Towers) = Malaysia.

Exercise 2. Let $f(x) = 7x - 3$. What are $f(0)$, $f(1)$, and $f(2)$? (Answer on p. 314.)

Exercise 3. Let $g$ be the function that maps each country to its capital. What are $g$(France) and $g$(Japan)? (Answer on p. 314.)

Students frequently believe that $f(x)$ denotes a function. This is wrong. $f$ and $f(x)$ refer to two different things. $f$ denotes a function. $f(x)$ denotes the value of $f$ at $x$.

This may seem like an excessively pedantic distinction. But maths is precise and pedantic. In maths, what we mean is precisely what we say and what we say is precisely what we mean. There is never any room for ambiguity or alternative interpretations.

3 Graphs: Introduction

A point is any ordered pair $(x, y)$ of real numbers. The graph of an equation is the set of points $(x, y)$ that satisfy the equation.

Example 6. Consider the equation $y = 2x + 3$.
Its graph is the set of points $(x, y)$ that satisfy the equation $y = 2x + 3$. For example, the point $(x, y) = (0, 3)$ is in the graph of the equation $y = 2x + 3$, because $3 = 2 \cdot 0 + 3$.

We can illustrate the graph of an equation in what is called the cartesian plane. The graph of $y = 2x + 3$ is drawn below. The point (0, 3) is marked in green.

[Figure: graph of $y = 2x + 3$.]

Example 7. Consider the equation $y = x^2 - 1$. Its graph is the set of points $(x, y)$ that satisfy the equation $y = x^2 - 1$. For example, the point $(x, y) = (3, 8)$ is in the graph of the equation $y = x^2 - 1$, because $8 = 3^2 - 1$. The graph of $y = x^2 - 1$ is drawn below. The point (3, 8) is marked in green.

[Figure: graph of $y = x^2 - 1$.]

The graph of a function $f$ is the graph of the equation $y = f(x)$.

Example 8. Consider the function $f$ defined by $f(x) = x^2 + 5$. The graph of $f$ is defined to be the graph of the equation $y = f(x)$; equivalently, it is the graph of the equation $y = x^2 + 5$. The graph of $f$ is drawn below. The point (2, 9) is marked in green.

[Figure: graph of $f$.]

Example 9. Consider the function $g$ defined by $g(x) = x/(x^2 + 1)$. The graph of $g$ is defined to be the graph of the equation $y = g(x)$; equivalently, it is the graph of the equation $y = x/(x^2 + 1)$. The graph of $g$ is drawn below. The point (2, 0.4) is marked in green.

[Figure: graph of $g$.]

Exercise 4. Graph the following equations: (i) $y = 2x - 1$; (ii) $y = 1 - x^2$. Mark the point where $x = 2$. (Answer on p. 314.)

Exercise 5. Graph the following functions: (i) the function $f$ defined by $f(x) = 5 - 3x$; (ii) the function $g$ defined by $g(x) = 3x + x^2$. Mark the point where $x = 2$. (Answer on p. 315.)

4 Graphs: Intercepts

A graph may intersect the vertical axis (also known as the y-axis). The y-coordinate of any such intersection point is called a vertical intercept (or y-intercept).
A graph may intersect the horizontal axis (also known as the x-axis). The x-coordinate of any such intersection point is called a horizontal intercept (or x-intercept). Horizontal intercepts are also called zeros or roots (of the corresponding equation or function). (We’ll use the terms zeros and roots interchangeably in this textbook.)

Example 10. Graphed below is the equation $y = 2x - 1$. The graph has one horizontal intercept, 0.5, and one vertical intercept, −1. Equivalently, the graph intersects the x-axis at (0.5, 0); and the y-axis at (0, −1).

[Figure: graph of $y = 2x - 1$.]

We also call 0.5 the zero or root of the equation $y = 2x - 1$, because $2(0.5) - 1 = 0$. That is, $x = 0.5$ satisfies the equation $y = 0$ or $2x - 1 = 0$.

Example 11. Graphed below is the function $f$ defined by $f(x) = x^2 - 1$. The graph has two horizontal intercepts, −1 and 1, and one vertical intercept, −1. Equivalently, the graph intersects the x-axis at (−1, 0) and (1, 0); and the y-axis at (0, −1).

[Figure: graph of $f$.]

We also call −1 and 1 the zeros or roots of the function $f$, because $f(-1) = 0$ and $f(1) = 0$. That is, $x = -1$ or $x = 1$ satisfies the equation $f(x) = 0$.

5 Graphs: Turning Points

A turning point of a graph is either a maximum turning point or a minimum turning point.

Example 12. Graphed below are the functions $f$ and $g$ defined by $f(x) = (x + 1)^2$ and $g(x) = -(x - 1)^2$. The graph of $f$ has a minimum turning point, namely (−1, 0). The graph of $g$ has a maximum turning point, namely (1, 0).

Example 13. Graphed below is the equation $y = x^3 - 12x + 3$. This graph has a maximum turning point (−2, 19) and a minimum turning point (2, −13).

Example 14. Graphed below is the equation $y = x + 1$. This graph has neither maximum nor minimum turning points.
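
One rough way to convince yourself of the turning points claimed in Example 13 is to compare the y-value at each claimed point with the y-values a small step to either side. The Python sketch below does this. It is only a spot check under stated assumptions, not a proof: the helper names `is_local_max` and `is_local_min` and the step size `h` are made up for this illustration, and looking at just two neighbouring points cannot rule out odd behaviour in between.

```python
def y(x):
    """The equation from Example 13: y = x^3 - 12x + 3."""
    return x**3 - 12 * x + 3


def is_local_max(f, x0, h=1e-3):
    """True if f(x0) exceeds f at the two nearby points x0 - h and x0 + h."""
    return f(x0) > f(x0 - h) and f(x0) > f(x0 + h)


def is_local_min(f, x0, h=1e-3):
    """True if f(x0) is below f at the two nearby points x0 - h and x0 + h."""
    return f(x0) < f(x0 - h) and f(x0) < f(x0 + h)


def line(x):
    """The equation from Example 14: y = x + 1."""
    return x + 1


if __name__ == "__main__":
    print(y(-2), is_local_max(y, -2))   # y-value at x = -2 and the spot check
    print(y(2), is_local_min(y, 2))     # y-value at x = 2 and the spot check
    # The straight line never passes either check, whatever x we try.
    print(any(is_local_max(line, x) or is_local_min(line, x) for x in range(-5, 6)))
```

The same two helpers can be pointed at the functions in Example 12 by defining them in the same way as `y` above.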

Informally, a maximum turning point is where the y-value is greater than at all nearby points. Similarly, a minimum turning point is where the y-value is smaller than at all nearby points.

6 Quadratic Equations

Section A of the A-level exams will have some questions about quadratic equations. In theory, you should have completely mastered quadratic equations from your study of O-Level Mathematics. In practice? Probably not. So this chapter reviews quadratic equations.

Example 15. Below are the graphs of the equations $y = x^2 + 3x + 1$ (red), $y = x^2 + 2x + 1$ (blue), $y = x^2 + x + 1$ (green), $y = -x^2 + x + 1$ (red dotted), $y = -x^2 - 2x - 1$ (blue dotted), and $y = -x^2 - x - 1$ (green dotted).

Remark 1. You may have heard of the term parabola (plural: parabolae). Just so you know, the graph of a quadratic equation is an example of a parabola. But don’t worry, the word parabola will never show up on the A-level H1 Maths exam.

We now learn how to complete the square. In general, $(x + k)^2 = x^2 + 2kx + k^2$. Thus,

$$\left(x + \frac{b}{2a}\right)^2 = x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}.$$

Or rearranging:

$$x^2 + \frac{b}{a}x = \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}. \tag{1}$$

In a moment we’ll make use of (1).

Now let’s consider the quadratic equation $y = ax^2 + bx + c$. Assume that $a \neq 0$, otherwise the equation simplifies to $y = bx + c$, which is just a straight line.

We now manipulate the quadratic expression $ax^2 + bx + c$. First, we factor out $a$ (this is allowed because of our assumption that $a \neq 0$):

$$ax^2 + bx + c = a\left(x^2 + \frac{b}{a}x + \frac{c}{a}\right). \tag{2}$$

Now plug (1) into (2) to get:

$$ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2} + \frac{c}{a}\right] = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right].$$

What we just did above is called completing the square. We can now compute the zeros of the equation $y = ax^2 + bx + c$ (the square-root step requires $b^2 - 4ac \geq 0$):

$$
\begin{aligned}
ax^2 + bx + c = 0
&\iff a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right] = 0 \\
&\iff \left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2} = 0 \\
&\iff \left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \\
&\iff x + \frac{b}{2a} = \frac{\pm\sqrt{b^2 - 4ac}}{2a} \\
&\iff x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\end{aligned}
$$

This last expression solves $ax^2 + bx + c = 0$. This expression will NOT be printed in the A-Level List of Formulae! So be sure you remember it!

We can distinguish between six categories of quadratic equations, based on the signs of $a$ (the coefficient of $x^2$) and $b^2 - 4ac$ (the discriminant). Each of these six categories is illustrated in the figure below (reproduced from above).

The properties of quadratic equations are summarised in the following table and discussed on the next page.

Category                            Features
1. $a > 0$, $b^2 - 4ac > 0$         ∪-shaped. Intersects the x-axis at two points.
2. $a > 0$, $b^2 - 4ac = 0$         ∪-shaped. Just touches the x-axis at the minimum point.
3. $a > 0$, $b^2 - 4ac < 0$         ∪-shaped. Doesn’t intersect the x-axis.
4. $a < 0$, $b^2 - 4ac > 0$         ∩-shaped. Intersects the x-axis at two points.
5. $a < 0$, $b^2 - 4ac = 0$         ∩-shaped. Just touches the x-axis at the maximum point.
6. $a < 0$, $b^2 - 4ac < 0$         ∩-shaped. Doesn’t intersect the x-axis.

• The vertical intercept of the graph of a quadratic equation is always simply $c$. This is because plugging $x = 0$ into $ax^2 + bx + c$ yields $c$.

• The sign of $a$.
  – If $a > 0$, then the graph is ∪-shaped and has a minimum turning point at $x = -b/2a$.
  – Conversely, if $a < 0$, then the graph is ∩-shaped and has a maximum turning point at $x = -b/2a$.

• The sign of the discriminant $b^2 - 4ac$. This name makes sense, because the discriminant helps us discriminate between several possible cases of the equation $ax^2 + bx + c = 0$:
  – If $b^2 - 4ac > 0$, then:
    ∗ There are two real roots (or zeros or horizontal intercepts), namely
      $$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
    ∗ Moreover, we can write
      $$ax^2 + bx + c = a\left(x - \frac{-b + \sqrt{b^2 - 4ac}}{2a}\right)\left(x - \frac{-b - \sqrt{b^2 - 4ac}}{2a}\right).$$
      What we have just done is to factorise the expression $ax^2 + bx + c$. Factorisation is often a useful trick to play.
Notice that if you plug either of the roots into the right-hand side (RHS) of the above equation, you do indeed get zero, as expected.
– If $b^2 - 4ac = 0$, then:
∗ There is only one real root (or zero or horizontal intercept), namely $-b/2a$.
∗ Moreover, we can write
$$ax^2 + bx + c = a\left(x - \frac{-b}{2a}\right)^2 = a\left(x + \frac{b}{2a}\right)^2.$$
∗ Notice that if you plug $x = -b/2a$ into the RHS of the above equation, you do indeed get zero, as expected.
– If $b^2 - 4ac < 0$, then:
∗ There are no real roots (or zeros or horizontal intercepts).
∗ There is no way to factorise the expression $ax^2 + bx + c$ (at least without using complex numbers, which are not covered in H1 Maths).

Exercise 6. For each of the following equations, sketch its graph and identify its intercepts and turning points (if these exist). (a) $y = 2x^2 + x + 1$. (b) $y = -2x^2 + x + 1$. (c) $y = x^2 + 6x + 9$. (Answer on p. 316.)

Exercise 7. (Answer on p. 317.) (i) When does the quadratic equation $y = ax^2 + bx + c$ have (a) two real roots? (b) two equal roots? (c) no real roots? (ii) When is $ax^2 + bx + c$ (a) positive for all possible values of x? (b) negative for all possible values of x?

7 Graphs: Asymptotes

A horizontal asymptote is a horizontal line of the form y = a.

Example 16. The graph below has horizontal asymptote y = 2, because as x grows infinitely large (i.e. towards ∞), y grows ever closer to (but is never equal to) 2.

[Figure: graph with horizontal asymptote y = 2 as x tends to ∞.]

Example 17. The graph below has horizontal asymptote y = 2, because as x grows infinitely small (i.e. towards −∞), y grows ever closer to (but is never equal to) 2.

[Figure: graph with horizontal asymptote y = 2 as x tends to −∞.]

A vertical asymptote is a vertical line of the form x = b.

Example 18. The graph below has vertical asymptote x = 3, because as x grows ever closer to (but is never equal to) 3, y grows infinitely large (i.e. towards ∞).

[Figure: graph with vertical asymptote x = 3, with y tending to ∞.]

Example 19.
The graph below has vertical asymptote x = 3, because as x grows ever closer to (but is never equal to) 3, y grows infinitely small (i.e. towards −∞).

[Figure: graph with vertical asymptote x = 3, with y tending to −∞.]

Here are the informal definitions. A graph has a
• Horizontal asymptote y = a if: As x grows ever larger or smaller (towards ∞ or −∞), y grows ever closer to (but never equals) a.
• Vertical asymptote x = b if: As x grows ever closer to (but never equals) b, y grows ever larger or smaller (towards ∞ or −∞).

8 Exponents: Laws

For all real numbers x, we have $x^1 = x$ and $x^0 = 1$. (Footnote 8: By convention, $0^0$ is usually defined to be equal to 1; this textbook will follow this practice.)

For all real numbers x, y, a, and b (provided any denominators are non-zero):

$$\left(\frac{x}{y}\right)^a = \frac{x^a}{y^a}, \quad x^a \cdot x^b = x^{a+b}, \quad \frac{x^a}{x^b} = x^{a-b}, \quad (x^a)^b = x^{ab},$$

$$(xy)^a = x^a y^a, \quad x^{-a} = \frac{1}{x^a}, \quad a^{1/b} = \sqrt[b]{a}, \quad a^{c/b} = \sqrt[b]{a^c} = \left(\sqrt[b]{a}\right)^c.$$

Exercise 8. (Answer on p. 317.) Simplify the two expressions below.

$$\frac{5^{3x} \cdot 25^{1-x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})}, \qquad \frac{8^{x+2} - 34(2^{3x})}{\left(\sqrt{8}\right)^{2x+1}}.$$

Exercise 9. (Answer on p. 318.) Is each of the following true? (If true, explain why. If false, simply give a counterexample.) (i) $x^{(a^b)} = x^{ab}$; (ii) $(x^a)^b = x^{ab}$.

9 Exponents: Graphs

e = 2.7182818 . . . is the constant known as Euler’s number. The significance of Euler’s number will be revealed only later on, when we study calculus.

Example 20. The graphs below are of the equations $y = 2^x$, $y = 3^x$, $y = 3 \cdot 2^x$, $y = 2 \cdot 3^x$, and $y = e^x$.

[Figure: graphs of the five equations above.]

Each of these graphs has horizontal asymptote y = 0, because as x grows infinitely small (i.e. to −∞), y grows ever closer to (but never equals) 0.

Example 21. The graphs below are of the functions f, g, h, and i defined by $f(x) = 4^x$, $g(x) = 5^x$, $h(x) = 5 \cdot 4^x$, and $i(x) = 4 \cdot 5^x$.
[Figure: graphs of f, g, h, and i.]

Each of these graphs has horizontal asymptote y = 0, because as x grows infinitely small (i.e. to −∞), y grows ever closer to (but never equals) 0.

Exercise 10. Graph (on the same diagram) the following equation and function: (i) $y = 6^x$; (ii) f defined by $f(x) = 7^x$. (Answer on p. 318.)

10 Exponential Growth and Decay

Exam Tip: The topic of exponential growth and decay is new to the 8865 syllabus. So there are no TYS questions covering this topic.

Informally, a quantity exhibits exponential growth if its growth rate is increasing. (Footnote 9: More precisely, its growth rate is proportional to the current magnitude of the quantity.)

Example 22. Bacteria in a petri dish double in weight every 10 seconds. Let t be time (seconds) and let $b_t$ be the total weight (micrograms) of the bacteria in the petri dish at time t. Initially, there were 7 micrograms of bacteria. The graph below is of $b_t$ against t.

In general, a quantity $y_t$ that grows exponentially takes the following form:

$$y_t = y_0 \cdot 2^{t/d},$$

where $y_0$ is the quantity at time t = 0, and d is the number of units of time it takes for the quantity to double. So in this case, we have $b_t = 7 \cdot 2^{t/10}$.

Example 23. Rabbits in a forest double in population every 6 years. Let $r_t$ be the number of rabbits at time t (years). Initially, there were 20 rabbits. So $r_t = 20 \cdot 2^{t/6}$ (graphed below).

Example 24. The number of Singapore citizens originally from the People’s Republic of China doubles every 5 years. In the year 2000, there were 50,000 such citizens. Let $c_y$ be the total number of such citizens in the year y. The graph below is of $c_y$ against y.

Even more generally than before, a quantity $x_t$ that grows exponentially takes the following form:

$$x_t = x_T \cdot 2^{(t-T)/d},$$

where $x_T$ is the quantity at time t = T, and d is the number of units of time it takes for the quantity to double.
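The general doubling formula just stated can be tried out numerically. The following is only a minimal sketch: the function name `exponential_quantity` and its parameter names are made up for illustration (they are not from the syllabus or this textbook), and the numbers are those of Examples 22 and 23.

```python
def exponential_quantity(x_T, T, d, t):
    """Quantity at time t, given quantity x_T at time T and doubling time d.

    Hypothetical helper: it simply evaluates x_t = x_T * 2 ** ((t - T) / d).
    """
    return x_T * 2 ** ((t - T) / d)


# Example 22: 7 micrograms of bacteria at t = 0, doubling every 10 seconds.
bacteria_after_30s = exponential_quantity(7, 0, 10, 30)

# Example 23: 20 rabbits at t = 0, doubling every 6 years.
rabbits_after_12y = exponential_quantity(20, 0, 6, 12)

print(bacteria_after_30s, rabbits_after_12y)
```

After 30 seconds the bacteria have doubled three times, and after 12 years the rabbits have doubled twice, which is what the two printed values reflect.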
So in this case, we have $c_y = 50000 \cdot 2^{(y-2000)/5}$. Extrapolating, there’ll be $c_{2030}$ = 3,200,000 such citizens in the year 2030.

Example 25. The Prime Minister’s salary doubles every 12 years. In the year 2000, the PM’s annual salary was $2 million. Let $s_y$ be the PM’s annual salary (in millions of Singapore dollars) in the year y. So $s_y = 2 \cdot 2^{(y-2000)/12}$ (graphed below). Extrapolating, the PM’s salary will be $s_{2060}$ = 64 million Singapore dollars in 2060.

Exponential decay is simply negative exponential growth.

Example 26. The world population of panda bears halves every 20 years. In the year 1960, there were 64,000 panda bears. Let $p_y$ be the total number of panda bears in the year y. So $p_y = 64000 \cdot 0.5^{(y-1960)/20}$ (graphed below). Extrapolating, there will be $p_{2020}$ = 8,000 pandas in 2020.

Example 27. The number of ethnic-Malay Singapore citizens halves every 10 years. In the year 1990, there were 800,000 such citizens. Let $m_y$ be the total number of such citizens in the year y. So $m_y = 800000 \cdot 0.5^{(y-1990)/10}$ (graphed below). Extrapolating, there will be $m_{2020}$ = 100,000 such citizens in the year 2020.

Exercise 11. Let $b_y$ be the number of Singaporean billionaires in the year y. This number doubles every 7 years. In 1990, there were 4 Singaporean billionaires. (Answer on p. 319.) (i) Write down an equation that expresses $b_y$ in terms of y. (ii) Graph this equation. (iii) Extrapolating, how many Singaporean billionaires will there be in 2025?

11 Logarithms: Introduction

We define $a = \log_b c$ to be the number such that $b^a = c$. We call b the logarithmic base.

Example 28. $\log_2 8 = 3$, because $2^3 = 8$. $\log_2 16 = 4$, because $2^4 = 16$. $\log_3 9 = 2$, because $3^2 = 9$. $\log_4 2 = 0.5$, because $4^{0.5} = 2$.

We define $\lg c = \log_{10} c$.

Example 29. $\lg 100 = 2$, because $10^2 = 100$. $\lg 1000 = 3$, because $10^3 = 1000$. $\lg 10 = 1$, because $10^1 = 10$.
$\lg 1 = 0$, because $10^0 = 1$.

Remark 2. The Singapore-Cambridge A-level exams write lg c to mean the base-10 logarithm of c, so that’s what we’ll stick to. But you should know that some other writers (including most calculators) simply write log c to mean the same.

We define $\ln c = \log_e c$, where e = 2.7182818 . . . is Euler’s number.

Example 30. $\ln e = 1$, because $e^1 = e$. $\ln e^2 = 2$, because $e^2 = e^2$.

Exercise 12. (Answer on p. 319.) (i) Compute each of the following: $\ln(1/e^2)$, $\log_5 0.008$, $\lg 100000$. (ii) Given the following, find the constants a, b, and c: $\log_a 16 = 4$, $\log_b 0.25 = -1$, and $\log_c 5 = 1$. (iii) Rewrite the following equations in log form: $y = 3^x$ and $5 = p^q$. (iv) Rewrite the following equations in exponential form: $\alpha = \log_4 \beta$ and $\log_\gamma \delta = 17$.

12 Logarithms: Laws

For every positive base $x \neq 1$, we have $\log_x 1 = 0$, because $x^0 = 1$ (this was stated in our discussion of the laws of exponents). And if $c \leq 0$, then $\log_x c$ is undefined, because there is no real number a such that $x^a \leq 0$.

Fact 1. Let a, b, x, y be positive numbers. Then
(i) $\log_b b^x = x$,
(ii) $\log_b x + \log_b y = \log_b (xy)$,
(iii) $\log_b x - \log_b y = \log_b \dfrac{x}{y}$,
(iv) $\log_b x^a = a \log_b x$,
(v) $\log_b x = \dfrac{\log_a x}{\log_a b}$,
(vi) $y = \ln x \iff e^y = x$.

Proof. (Optional.)
(i) is immediate from the definition of logarithms.
(ii) By (i), $x = b^{\log_b x}$ and $y = b^{\log_b y}$. Hence, $xy = b^{\log_b x} b^{\log_b y} = b^{\log_b x + \log_b y}$. Apply $\log_b$ to both sides of this equation to get $\log_b (xy) = \log_b x + \log_b y$.
(iii) Similarly, $\dfrac{x}{y} = \dfrac{b^{\log_b x}}{b^{\log_b y}} = b^{\log_b x - \log_b y}$. Apply $\log_b$ to both sides of this equation to get $\log_b \dfrac{x}{y} = \log_b x - \log_b y$.
(iv) By (i), $x^a = \left(b^{\log_b x}\right)^a = b^{a \log_b x}$. Apply $\log_b$ to both sides of this equation to get $\log_b x^a = a \log_b x$.
(v) By (i), $x = b^{\log_b x}$. Plugging this into the RHS and using also (iv), we have
$$\frac{\log_a x}{\log_a b} = \frac{\log_a b^{\log_b x}}{\log_a b} = \frac{\log_b x \, \log_a b}{\log_a b} = \log_b x.$$
(vi) is immediate from (i). (Observe that $\ln x = \log_e x$.)
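The laws in Fact 1 can also be spot-checked numerically. The sketch below is not a proof; it merely tests each law for one arbitrary choice of numbers, using Python’s standard `math.log(value, base)`. The function name `check_log_laws` and the default numbers are assumptions chosen purely for illustration.

```python
import math


def check_log_laws(b=2.0, x=5.0, y=7.0, a=3.0, other_base=10.0):
    """Spot-check laws (i) to (vi) of Fact 1 for one choice of numbers.

    math.log(value, base) is the base-`base` logarithm; with a single
    argument it is the natural logarithm ln. Results are compared with
    math.isclose because floating-point arithmetic is only approximate.
    """
    log = math.log
    return {
        "i": math.isclose(log(b ** x, b), x),
        "ii": math.isclose(log(x, b) + log(y, b), log(x * y, b)),
        "iii": math.isclose(log(x, b) - log(y, b), log(x / y, b)),
        "iv": math.isclose(log(x ** a, b), a * log(x, b)),
        "v": math.isclose(log(x, b), log(x, other_base) / log(b, other_base)),
        "vi": math.isclose(math.exp(log(x)), x),
    }


results = check_log_laws()
print(results)
```

Passing other positive numbers (with bases different from 1) to `check_log_laws` is a quick way to convince yourself that the laws do not depend on the particular numbers chosen.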
Examples to illustrate each of the above laws:

Example 31. $\log_2 2^3 = 3$ and $\log_5 5^{20} = 20$.

Example 32. $\ln 3 + \ln 4 = \ln 12$, $\log_2 5 + \log_2 7 = \log_2 35$, and $\lg 5 + \lg 3 = \lg 15$.

Example 33. $\ln 3 - \ln 4 = \ln (3/4)$, $\log_2 5 - \log_2 7 = \log_2 (5/7)$, and $\lg 5 - \lg 3 = \lg (5/3)$.

Example 34. $\ln 3^4 = 4 \ln 3$, $\log_2 11^7 = 7 \log_2 11$, and $\lg 5^3 = 3 \lg 5$.

Example 35. $\log_2 5 = \lg 5 / \lg 2 = \ln 5 / \ln 2 = \log_3 5 / \log_3 2$. Indeed, $\log_2 5 = \log_a 5 / \log_a 2$ for any positive number $a \neq 1$.

Exercise 13. (Answer on p. 320.) (i) Simplify $\log_3 3^x$. (ii) Find x if $2 \log_a 7 + 0.25 \log_a 81 - \log_a 3 = \log_a x$, where a is a positive constant. (iii) Find y if $\ln(y - 1) + \ln y = 2$.

13 Logarithms: Graphs

Example 36. The graphs below are of the equations $y = \log_2 x$, $y = \log_3 x$, $y = \ln x$, and $y = \lg x$.

Each of these graphs crosses the horizontal axis at the point (1, 0). Moreover, each has vertical asymptote x = 0, because as x grows ever closer to (but never equals) 0, y grows infinitely small (i.e. towards −∞).

Example 37. The graphs below are of the functions f, g, and h defined by $f(x) = \log_4 x$, $g(x) = \log_5 x$, and $h(x) = \log_6 x$.

Each of these graphs crosses the horizontal axis at the point (1, 0). Moreover, each of these graphs has vertical asymptote x = 0, because as x grows ever closer to (but never equals) 0, y grows infinitely small (i.e. towards −∞).

Exercise 14. Graph (on the same diagram) the following equation and function: (i) $y = \log_7 x$; (ii) f defined by $f(x) = \log_9 x$. (Answer on p. 320.)

14 Logarithmic Growth

Exam Tip: The topic of logarithmic growth is new to the 8865 syllabus. So there are no TYS questions covering this topic.

Informally, a quantity $y_t$ exhibits logarithmic growth if its growth rate is decreasing. (Footnote 10: A bit more precisely, the growth rate is inversely proportional to the time elapsed.) It can be written as $y_t = y_0 \ln t$, where $y_0$ is a positive constant. (Since ln t is undefined at t = 0, $y_0$ is not the quantity at time 0; rather, it is the quantity at time t = e, because ln e = 1.)

Example 38.
The nth harmonic number is

$$H_n = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}.$$

For example, the first four harmonic numbers are

$$H_1 = \frac{1}{1} = 1, \quad H_2 = \frac{1}{1} + \frac{1}{2} = 1.5, \quad H_3 = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} = 1.8333\ldots, \quad H_4 = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = 2.0833\ldots$$

It turns out that harmonic numbers grow logarithmically. In particular, a graph of the harmonic numbers looks very similar to the graph of y = ln x (black dotted curve).

The harmonic numbers grow very slowly. For example, the first to exceed 10 is

$$H_{12367} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{12367} \approx 10.000043.$$

Nonetheless and remarkably, the harmonic numbers grow forever (towards ∞)! In 1968, it was shown (source) that the first harmonic number that exceeds 100 is:

$$H_{15092688622113788323693563264538101449859497}.$$

Example 39. A brain tumour initially grows rapidly, then its rate of growth slows down. Its weight $b_t$ (grams) is graphed against time t (days). The growth of the brain tumour appears logarithmic.

Example 40. Ah Kow is studying for the H1 Maths Exam. He takes a practice test every day for 30 days. On Day #1, he gets nearly 0 points. His score improves rapidly initially, but the rate of improvement slows down. The growth in Ah Kow’s test scores appears to be roughly logarithmic.

15 Graphs: Symmetry

Informally, a graph is symmetric in a line if it is unchanged even after being reflected in that line.

Example 41. The graph of $y = x^2$ is symmetric in the line x = 0 (which also happens to be the vertical axis).

[Figure: graph of $y = x^2$ with the reflection line x = 0.]

Example 42. The graph of $y = 1/x$ is symmetric in the lines y = x and y = −x.

[Figure: graph of $y = 1/x$ with the lines y = x and y = −x.]

Exercise 15. Draw the graphs of each of the following equations. (a) $y = e^x$. (b) $y = 3x + 2$.
(c) $y = 2x^2 + 1$. Identify any intercepts, turning points, asymptotes, and lines of symmetry. (Answers on pp. 321, 322, and 323.)

16 Graphing with the TI84

Here are our first examples involving a graphing calculator. As mentioned, all such examples use a TI84.

Example 43. Graph the function f defined by $f(x) = x^2$.
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n to enter “X”; then press x² to enter the squared symbol “²”.
4. Now press GRAPH and the calculator will graph the equation $y = x^2$.

[Screenshots: after Steps 1 to 4.]

Example 44. Graph the function g defined by $g(x) = \sqrt{x}$.
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.

Most buttons on the TI84 have three different roles. Simply pressing a button executes the role printed on the button itself. Pressing the blue 2ND and then a button executes the role printed in blue above the button. And pressing the green ALPHA and then a button executes the role printed in green above the button.

3. Press the blue 2ND button and then √ (which corresponds to the x² button) to enter “√(”. Next press X,T,θ,n to enter “X”. (If we’d like, we can also enter the right parenthesis ) to close the left parenthesis, but this is not necessary: the TI84 understands what you mean, even if you don’t enter the right parenthesis.)
4. Now press GRAPH and the calculator will graph the equation $y = \sqrt{x}$.

[Screenshots: after Steps 1 to 4.]

Exercise 16. Graph $y = e^x - x^2 + \sqrt{x}$ on your TI84. (Answer on p. 324.)

17 Simultaneous Equations: One Linear and One Quadratic

Example 45. Solve the following pair of simultaneous equations: $y = x + 5$ (1) and $y = x^2 - 2x + 1$ (2).

Plug (1) into (2): $x + 5 = x^2 - 2x + 1$. Rearrange to get $x^2 - 3x - 4 = 0$. We can factorise $x^2 - 3x - 4 = (x - 4)(x + 1)$.
So x = 4 or x = −1. Correspondingly, y = 9 or y = 4. So there are two solutions to the given pair of simultaneous equations, namely (x, y) = (4, 9) and (x, y) = (−1, 4).

We can also solve this using our TI84:
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n + 5 to enter “x + 5”.
4. Now press ENTER to go to the second line.
5. Press X,T,θ,n x² − 2 X,T,θ,n + 1 to enter “x² − 2x + 1”.
6. Now press the blue 2ND button and then CALC (which corresponds to the TRACE button). This brings up the CALCULATE menu.
7. Press 5 to select the “intersect” option. The TI84 now graphs both equations for you. It will now find the intersection points of the two graphs. You entered only two equations, but you could have entered more than two. So just to be clear, the TI84 is now asking you which two curves’ intersection points you want. It first asks, “First curve?”
8. Simply press ENTER to confirm that you want y = x + 5 to be your first curve. It now asks, “Second curve?” Again:
9. Simply press ENTER to confirm that you want y = x² − 2x + 1 to be your second curve. It now asks, “Guess?” You can use the arrow keys to move the blinking cursor close to where you believe an intersection point will be. Here I won’t bother moving the blinking cursor at all. Instead, I will simply
10. Press ENTER. The TI84 tells you that the nearest intersection point is (x, y) = (−1, 4).
11. To find the other intersection point, repeat steps #7 through #10, using the arrow keys as is appropriate. The TI84 tells you what the other intersection point is: it is (x, y) = (4, 9).

[Screenshots: after Steps 1 to 11.]

Example 46.
Solve the following pair of simultaneous equations: $x + y = 0$ (1) and $y = 3x^2 + x - 1$ (2).

Rearrange (1) to $y = -x$ (3). Plug (3) into (2) to get $-x = 3x^2 + x - 1$ or $0 = 3x^2 + 2x - 1$. Now use the quadratic formula:

$$x = \frac{-2 \pm \sqrt{2^2 - 4(3)(-1)}}{2(3)} = \frac{-2 \pm 4}{6} = -1, \ \frac{1}{3}.$$

Correspondingly, y = 1 or y = −1/3. So there are two solutions to the given pair of simultaneous equations, namely (x, y) = (−1, 1) and (x, y) = (1/3, −1/3).

[TI84 screenshots.]

Exercise 17. (Answer on p. 325.) Solve each of the following pairs of simultaneous equations, both with and without a graphing calculator:
(i) $x = 5y - 2$ (1) and $x^2 = y - 5x + 3.6$ (2).
(ii) $4x = 1 - y$ (1) and $2x^2 + 3 = 5x - y$ (2).

18 Solving Equations Using Your TI84

You are required to know how to use a graphing calculator to find the numerical solution of equations (including systems of linear equations).

Example 47. Solve the system of equations $y = x^4 - x^3 - 5$, $y = \ln x$.

The method we learnt above was to graph both equations and then find their intersection points. Here I’ll use another method: First combine the two equations into a third equation $y = x^4 - x^3 - 5 - \ln x$. Our goal is to find the horizontal intercepts of this equation, which in turn give the x-coordinates of the solutions to the above set of equations. Briefly, in the TI84:
1. Graph the equation $y = x^4 - x^3 - 5 - \ln x$. It looks like there is only one horizontal intercept.
2. Zoom in.
3. Find the horizontal intercept using the “zero” option.

Conclusion: There is one solution to this set of equations and its x-coordinate is 1.8658. To find the y-coordinate, we need merely plug this value of x into either of the equations in the original set of equations: $y = \ln x = \ln 1.8658 \approx 0.6237$. Altogether, this set of equations has one solution: (1.8658, 0.6237).

[Screenshots: after Steps 1 to 3.]

Exercise 18.
Using your graphing calculator, solve the following systems of equations. (a) $y = \dfrac{1}{1 + \sqrt{x}}$, $y = x^5 - x^3 + 2$. (b) $y = \dfrac{1}{1 - x^2}$, $y = x^3 + \sin x$. (Answers on p. 326.)

19 Quadratic Inequalities

Example 48. Solve $x^2 + 3x - 1 > 0$.

$x^2 + 3x - 1$ is a ∪-shaped expression. By the quadratic formula, it equals 0 if and only if

$$x = \frac{-3 \pm \sqrt{3^2 - 4(1)(-1)}}{2(1)} = \frac{-3 \pm \sqrt{13}}{2}.$$

Hence, it is positive if $x < \left(-3 - \sqrt{13}\right)/2$ or $x > \left(-3 + \sqrt{13}\right)/2$.

Example 49. Solve $2x^2 + 5x - 1 < 0$.

$2x^2 + 5x - 1$ is a ∪-shaped expression. By the quadratic formula, it equals 0 if and only if

$$x = \frac{-5 \pm \sqrt{5^2 - 4(2)(-1)}}{2(2)} = \frac{-5 \pm \sqrt{33}}{4}.$$

Hence, it is negative if $\left(-5 - \sqrt{33}\right)/4 < x < \left(-5 + \sqrt{33}\right)/4$.

Example 50. Solve $x^2 + 3 > 0$.

$x^2 + 3$ is a ∪-shaped expression. Moreover, its discriminant $b^2 - 4ac = 0^2 - 4(1)(3) = -12$ is negative. Hence it is always positive, for all values of x.

Exercise 19. (Answer on p. 327.) Solve the following inequalities. (i) $x^2 + 3x - 5 > 6 - 2x^2$. (ii) $(x - 3)(x + 5) < 1$.

20 Solving Inequalities Using Your TI84

Example 51. For what values of x is $x > \sin(0.5\pi x)$?

Rewrite the inequality as $x - \sin(0.5\pi x) > 0$. Graph $y = x - \sin(0.5\pi x)$ on your graphing calculator. Our goal is to first find the horizontal intercepts of this equation; this will let us solve $x > \sin(0.5\pi x)$.

[Screenshots: after Steps 1 to 6.]

In the TI84:
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n − SIN 0 . 5 . To enter “π”, press the blue 2ND button and then π (which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will have entered “x − sin(0.5πx)”.
4. Now press GRAPH and the calculator will graph $y = x - \sin(0.5\pi x)$. It looks like the horizontal intercepts are close to the origin. Let’s zoom in to see better.
5. Press the ZOOM button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option.
Nothing seems to happen. But now press ENTER and the TI84 will zoom in a little for you.

It looks like there are 3 horizontal intercepts. To find out what precisely they are, we’ll use the TI84’s “zero” option.

[Screenshots: after Steps 7 to 13.]

7. Press the blue 2ND button and then CALC (which corresponds to the TRACE button). This brings up the CALCULATE menu.
8. Press 2 to select the “zero” option. This brings you back to the graph, with a cursor flashing. Also, the TI84 prompts you with the question: “Left Bound?”

The TI84’s “zero” option works by having you first specify a “Left Bound” and a “Right Bound” for x. The TI84 will then check to see if there are any horizontal intercepts (i.e. values of x for which y = 0) within those bounds.

9. Using the < and > arrow keys, move the blinking cursor until it is where you want your first “Left Bound” to be. For me, I have placed it a little to the left of where I believe the leftmost horizontal intercept to be.
10. Press ENTER and you will have just entered your first “Left Bound”. The TI84 now prompts you with the question: “Right Bound?”.
11. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is where you want your first “Right Bound” to be. For me, I have placed it a little to the right of where I believe the leftmost horizontal intercept is.
12. Again press ENTER and you will have just entered your first “Right Bound”. The TI84 now asks you: “Guess?” This is just asking if you want to proceed and get the TI84 to work out where the horizontal intercept is. So go ahead and:
13. Press ENTER. The TI84 now informs you that there is a “Zero” at “x = −1”, “y = 0” and places the blinking cursor at precisely that point. This is the first horizontal intercept we’ve found.

To find each of the other 2 horizontal intercepts, just repeat steps 7 through 13. You should be able to find that they are at x = 0 and x = 1. Altogether, the 3 intercepts are x = −1, 0, 1. Based on these and what the graph looks like, we conclude:

$$x > \sin(0.5\pi x) \iff x \in (-1, 0) \cup (1, \infty).$$

Example 52. For what values of x is $x > e + \ln x$?

For this example, I won’t give the full detailed instructions of what to do on the TI84; I’ll only show a few screenshots. First, rewrite the inequality as $x - e - \ln x > 0$ and so graph $y = x - e - \ln x$ on your graphing calculator:

[Screenshots: after graphing; after zooming in and adjusting the window.]

Look for the values of x for which $x - e - \ln x = 0$. They are x = 0.0708, 4.1387:

[Screenshots: leftmost horizontal intercept; rightmost horizontal intercept.]

Based on these horizontal intercepts and what the graph looks like, we conclude: $x > e + \ln x$ if and only if $x \in (0, 0.0708) \cup (4.1387, \infty)$.

Exercise 20. Use a graphing calculator to find the values of x for which each of the following inequalities is true. (a) $x^3 - x^2 + x - 1 > e^x$. (b) $\sqrt{x} > \cos x$. (c) $\dfrac{1}{1 - x^2} > x^3 + \sin x$. (Answers on p. 328.)

21 Formulating an Equation or a System of Linear Equations from a Problem Situation

Exercise 21. (PSLE-style question.) When Apu was 40 years old, Beng was twice as old as Caleb. Today, Caleb is 28 years old and Apu is twice as old as Beng. What are the ages of Apu and Beng today? (If necessary, assume that the age of a person is always an integer and is fixed between January 1st and December 31st of each year.) (Answer on p. 329.)

Solve the next two problems without using a calculator.

Exercise 22. The points (1, 2), (3, 5), and (6, 9) satisfy the equation $y = ax^2 + bx + c$. What are a, b, and c? (Answer on p. 329.)

Exercise 23. The point (−1, 2) satisfies the equation $y = ax^2 + bx + c$. Moreover, the minimum point of the equation $y = ax^2 + bx + c$ is (0, 0). What are a, b, and c?
(Answer on p. 329.)

Part II Calculus

22 Equations of Lines

Recall that Slope = Rise / Run = “Change in y” / “Change in x”. Moreover, the line with slope m and which passes through the point (a, b) has equation y − b = m(x − a).

Example 53. The line with slope 3 and which passes through the point (1, 2) has equation y − 2 = 3(x − 1). If desired, we can rearrange this equation into a more familiar form: y = 3x − 1.

Example 54. The line with slope −1 and which passes through the point (3, −1) has equation y − (−1) = −1(x − 3). If desired, we can rearrange this equation into a more familiar form: y = −x + 2.

Example 55. The line with slope 2 and which passes through the point (0, 0) has equation y − 0 = 2(x − 0). If desired, we can rearrange this equation into a more familiar form: y = 2x.

Example 56. The line with slope 0 and which passes through the point (1, 1) has equation y − 1 = 0(x − 1). If desired, we can rearrange this equation into a more familiar form: y = 1.

23 The Derivative as Slope of the Tangent

The problem of finding the derivative is the problem of finding the slope of the tangent to a graph at a given point.

Graphed below is some function f. Pick some point A = (a, f(a)). Draw the line l which is tangent to the graph at the point A. How do we find the slope of l?

Unsure of how to proceed, we try a crude approximation. Pick some point $X_1 = (x_1, f(x_1))$ that is also on the graph. Consider the line $AX_1$. What’s its slope? Slope = Rise ÷ Run and so $AX_1$ has slope

$$\frac{f(x_1) - f(a)}{x_1 - a}.$$

This number serves as our first crude approximation of the slope of l.

How can we improve on this approximation? Simple: just pick some point $X_2 = (x_2, f(x_2))$ that is closer to A. The line $AX_2$ has slope

$$\frac{f(x_2) - f(a)}{x_2 - a}.$$

This number serves as our second, improved approximation of the slope of l.

At least in theory, we can keep repeating this procedure, by picking points that are ever closer to A. Our estimates of the slope of l will get ever better. Altogether then, we are motivated to make the following informal definition of the derivative:

The derivative of the function f at the point a is the value of the following expression

$$\frac{f(x) - f(a)}{x - a},$$

when x is “very close to, but not equal to” a.

The following proposition summarises the rules of differentiation you need to know. You don’t need to know why they work; instead, you need only blindly apply them like a monkey. For example, Rule #1 says that the function h defined by h(x) = k (where k is some constant) has derivative h′ defined by h′(x) = 0.

Proposition 1. If k is a constant, and f and g are functions with derivatives f′ and g′, then:
1. $\dfrac{d}{dx} k = 0$,
2. $\dfrac{d}{dx} x^k = kx^{k-1}$,
3. $\dfrac{d}{dx} e^x = e^x$,
4. $\dfrac{d}{dx} \ln x = \dfrac{1}{x}$,
5. $\dfrac{d}{dx} (f \pm g) = f' \pm g'$,
6. $\dfrac{d}{dx} (kf) = kf'$.

Proof. Omitted.

The statement that the derivative of the function f is the function f′ may be written compactly as:

$$\frac{df}{dx} = f' \quad \text{or} \quad \frac{df(x)}{dx} = f'(x).$$

Example 57. Graphed below (in red) is the function f defined by f(x) = 5x. Also graphed is the derivative of f. The derivative of f is itself a function, namely: the function f′ defined by f′(x) = 5. This says that the graph of f has constant slope 5 everywhere.

Example 58. Graphed below (in red) is the function g defined by $g(x) = x^2$. Also graphed is the derivative of g. The derivative of g is itself a function, namely: the function g′ defined by g′(x) = 2x. This says that the tangent to the graph of g at the point (x, g(x)) has slope 2x.

For example, the tangent at (1.5, 2.25) has slope 2x = 2(1.5) = 3. Its equation is thus y − 2.25 = 3(x − 1.5) or y = 3x − 2.25.
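The idea of ever-better approximations, and the tangent just computed, can be sketched numerically. This is only an illustration under stated assumptions: the helper names `difference_quotient` and `tangent_line` are hypothetical (not from the syllabus), and the derivative g′(x) = 2x is supplied by hand using Rule #2 rather than computed by the program.

```python
def difference_quotient(f, a, x):
    """Slope of the line through (a, f(a)) and (x, f(x)), for x != a."""
    return (f(x) - f(a)) / (x - a)


def tangent_line(f, f_prime, a):
    """Return (slope, intercept) of the tangent y = slope * x + intercept at x = a."""
    slope = f_prime(a)
    intercept = f(a) - slope * a  # rearranged from y - f(a) = slope * (x - a)
    return slope, intercept


def g(x):
    return x ** 2


def g_prime(x):
    return 2 * x  # Rule #2 of Proposition 1


# Slopes of the lines A X for points X ever closer to A = (1.5, 2.25).
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(g, 1.5, 1.5 + h))

# Slope and vertical intercept of the tangent at A.
print(tangent_line(g, g_prime, 1.5))
```

As the second point moves closer to A, the printed slopes approach 3, the slope of the tangent at (1.5, 2.25); the last line corresponds to the equation y = 3x − 2.25.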
As another example, the tangent at (−1, 1) has slope 2x = 2(−1) = −2. Its equation is thus y − 1 = −2[x − (−1)] or y = −2x − 1.

Example 59. Graphed below (in red) is the function h defined by $h(x) = x^3 - 2x^2 + 5x - 1$. Also graphed is the derivative of h. The derivative of h is itself a function, namely: the function h′ defined by $h'(x) = 3x^2 - 4x + 5$. This says that the tangent to the graph of h at the point (x, h(x)) has slope $3x^2 - 4x + 5$.

For example, the tangent at (−1, −9) has slope $3(-1)^2 - 4(-1) + 5 = 12$. Its equation is thus y − (−9) = 12[x − (−1)] or y = 12x + 3.

As another example, the tangent at (1, 3) has slope $3(1)^2 - 4(1) + 5 = 4$. Its equation is thus y − 3 = 4(x − 1) or y = 4x − 1.

Given an equation y = f(x), the derivative of y with respect to x is simply the function f′. In this context, this function may also be denoted $\dfrac{dy}{dx}$.

Example 60. Consider the equation y = 5x. The derivative of y with respect to x is $\dfrac{dy}{dx} = 5$.

Example 61. Consider the equation $y = x^2$. The derivative of y with respect to x is $\dfrac{dy}{dx} = 2x$.

Example 62. Consider the equation $y = x^3 - 2x^2 + 5x - 1$. The derivative of y with respect to x is $\dfrac{dy}{dx} = 3x^2 - 4x + 5$.

A whole load more examples of differentiation follow:

Example 63. Rule #1. The function f defined by f(x) = 7 is an example of a constant function. Its derivative is the function f′ defined by f′(x) = 0. What Rule #1 says is that the derivative of any constant function is simply the zero function (i.e. the function that maps every input to the number 0). Intuitively and graphically, this is obvious.

Example 64. Rule #1. The function g defined by g(x) = 31 is another example of a constant function. Its derivative is the function g′ defined by g′(x) = 0.

Example 65. Rule #2.
The function f defined by f(x) = x has derivative f′ defined by f′(x) = 1.

Example 66. Rule #2. The function g defined by $g(x) = x^2$ has derivative g′ defined by $g'(x) = 2x$.

Example 67. Rule #2. The function h defined by $h(x) = x^3$ has derivative h′ defined by $h'(x) = 3x^2$.

Example 68. Rule #2. The function i defined by $i(x) = x^4$ has derivative i′ defined by $i'(x) = 4x^3$.

Example 69. Rule #3. The function f defined by $f(x) = e^x$ has derivative f′ defined by $f'(x) = e^x$. That is, interestingly enough, the derivative of f is itself.

Example 70. Rule #4. The function g defined by $g(x) = \ln x$ has derivative g′ defined by $g'(x) = 1/x$.

Example 71. Rule #5. The function h defined by $h(x) = x^3 + \ln x$ has derivative h′ defined by $h'(x) = 3x^2 + 1/x$.

Example 72. Rule #5. The function h defined by $h(x) = e^x + x^4$ has derivative h′ defined by $h'(x) = e^x + 4x^3$.

Of course, Rule #5 generalises to where we’re summing up more than two functions.

Example 73. Rule #5. The function i defined by $i(x) = 15 + x + x^2$ has derivative i′ defined by $i'(x) = 1 + 2x$.

Example 74. Rule #5. The function f defined by $f(x) = 1 + x + x^2 + x^3 + x^4 + \cdots + x^{100}$ has derivative f′ defined by $f'(x) = 1 + 2x + 3x^2 + 4x^3 + \cdots + 100x^{99}$.

Example 75. Rule #6. The function f defined by f(x) = 30x has derivative f′ defined by f′(x) = 30.

Example 76. Rule #6. The function g defined by $g(x) = 7\left(1 + x + x^2 + x^3 + \cdots + x^{100}\right)$ has derivative g′ defined by $g'(x) = 7\left(1 + 2x + 3x^2 + \cdots + 100x^{99}\right)$.

Example 77. Rule #6. The function h defined by $h(x) = 4e^x$ has derivative h′ defined by $h'(x) = 4e^x$. (Interestingly, the only functions that are their own derivatives are those of the form $f(x) = ke^x$, for some constant k.)

Exercise 24. For each of the functions below, (a) compute its derivative; (b) find the equations of the tangents to the graph at the points where x = 1 and x = 2. (Answer on p. 330.)
(i) f defined by $f(x) = \ln x + e^x + x^2$.
(ii) g defined by $g(x) = 1/x + x^3 + 7e^x$.

Exercise 25. For each of the two equations below, (a) compute the derivative of y with respect to x; (b) find the equations of the tangents to the graph at the points where x = 1 and x = 2. (Answer on p. 331.)
(i) $y = 13\left(\sqrt{x} - \frac{3}{x^2}\right)$.
(ii) $y = 9e^x - x^5$.

24 Chain Rule

The Chain Rule is yet another rule of differentiation. A simple example to illustrate:

Example 78. When I add 1 g of Milo (the x-variable) to a cup of water, the volume of the water (the y-variable) increases by 2 cm³. We can write this more compactly as

$$\frac{dy}{dx} = 2 \text{ cm}^3\,\text{g}^{-1}.$$

When the volume of the water (the y-variable) increases by 1 cm³, the water level in the cup (the z-variable) rises by 0.3 cm. We can write this more compactly as

$$\frac{dz}{dy} = 0.3 \text{ cm}\,\text{cm}^{-3} = 0.3 \text{ cm}^{-2}.$$

Altogether then, when I add 1 g of Milo to a cup of water, I should expect the water level to rise by 0.6 cm. That is,

$$\frac{dz}{dx} = 0.6 \text{ cm}\,\text{g}^{-1}.$$

We got the above expression for dz/dx by making the following quick computation:

$$\frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx} = 0.3 \times 2 = 0.6 \text{ cm}\,\text{g}^{-1}.$$

In general, let x, y, and z be variables. Suppose x and z are not directly related. However, a small change in x causes a small change in y. And in turn, a small change in y causes a small change in z. Informally, the Chain Rule addresses the following question: "If there is a small unit change in x, how does z change?" The answer is this:

(The change in z caused by a small unit change in x) = (The change in z caused by a small unit change in y) × (The change in y caused by a small unit change in x).

The three quantities in this equation are, respectively, dz/dx, dz/dy, and dy/dx. The Chain Rule is thus simply this equation:

$$\frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx}.$$

Examples to illustrate how the Chain Rule is applied:

Example 79. Let f be defined by $f(x) = e^{x^3}$. Its derivative $f'$ is defined by:

$$f'(x) = \frac{d\,e^{x^3}}{dx} = \frac{d\,e^{x^3}}{d\,x^3} \cdot \frac{d\,x^3}{dx} = e^{x^3} \cdot 3x^2.$$

Another simple example:

Example 80. Let g be defined by $g(x) = \sqrt{4x - 1}$. Its derivative $g'$ is defined by:

$$g'(x) = \frac{d\sqrt{4x-1}}{dx} = \frac{d\sqrt{4x-1}}{d(4x-1)} \cdot \frac{d(4x-1)}{dx} = 0.5(4x-1)^{-0.5} \cdot 4 = 2(4x-1)^{-0.5}.$$

Here's a more complicated example, where the Chain Rule is applied twice.

Example 81. Let h be defined by $h(x) = (\ln x^2 + e^{5x+3})^3$. Its derivative $h'$ is defined by:

$$h'(x) = \frac{d(\ln x^2 + e^{5x+3})^3}{dx} = \frac{d(\ln x^2 + e^{5x+3})^3}{d(\ln x^2 + e^{5x+3})} \cdot \frac{d(\ln x^2 + e^{5x+3})}{dx}$$

$$= 3(\ln x^2 + e^{5x+3})^2 \left[\frac{d \ln x^2}{d\,x^2} \cdot \frac{d\,x^2}{dx} + \frac{d\,e^{5x+3}}{d(5x+3)} \cdot \frac{d(5x+3)}{dx}\right]$$

$$= 3(\ln x^2 + e^{5x+3})^2 \left(\frac{1}{x^2} \cdot 2x + e^{5x+3} \cdot 5\right) = 3(\ln x^2 + e^{5x+3})^2 \left(\frac{2}{x} + 5e^{5x+3}\right).$$

Exercise 26. The functions f, g, and h are defined below. Find the value of the derivative of each, at x = 0. (Answer on p. 331.)
(a) $f(x) = x^2$.
(b) $g(x) = 1 + [x - \ln(x+1)]^2$.
(c) $h(x) = (1 + [x - \ln(x+1)]^2)^3$.

25 Increasing, Decreasing, and f′

Example 82.
The function f defined by $f(x) = x^2$ has derivative $f'$ defined by $f'(x) = 2x$. For x < 0, f is (strictly) decreasing, i.e. $f'(x) < 0$. For x > 0, f is (strictly) increasing, i.e. $f'(x) > 0$. At x = 0, f is neither strictly decreasing nor strictly increasing (f is flat), i.e. $f'(0) = 0$.

A stationary point (x, f(x)) of a function f is a point at which $f'(x) = 0$. So in this example, (0, 0) is a stationary point.

Example 83. Graphed below is the function g defined by $g(x) = 3x^3 - 5x^2 + x - 7$. Its derivative $g'$ is defined by $g'(x) = 9x^2 - 10x + 1$. From what we know about quadratic equations, $g'(x) = 9x^2 - 10x + 1 = (9x - 1)(x - 1)$ is negative if 1/9 < x < 1, zero if x = 1/9 or x = 1, and positive if x < 1/9 or x > 1.

So for 1/9 < x < 1, the function g is (strictly) decreasing, i.e. $g'(x) < 0$. And for x < 1/9 or x > 1, the function g is (strictly) increasing, i.e. $g'(x) > 0$. At x = 1/9 or x = 1, g is neither strictly decreasing nor strictly increasing, i.e. $g'(1/9) = 0$ and $g'(1) = 0$. These are where the stationary points of g are.

Example 84. Graphed below is the function h defined by $h(x) = \ln x$. Its derivative $h'$ is defined by $h'(x) = 1/x$. The derivative $h'$ is always positive. So the slope of the graph of h is positive everywhere. (There are no stationary points.)

Exercise 27. Let f be defined by $f(x) = 3x^2 - 4x + 1$. (i) Sketch the graph of f. (ii) Identify where $f'(x)$ is negative, zero, and positive (equivalently, where the graph of f is decreasing, flat, and increasing). (iii) Identify the stationary points. (Answer on p. 332.)

26 Finding Turning Points (the First Derivative Test)

It turns out that every maximum and minimum turning point is a stationary point. The intuition for this is quite simple:

Example 85. Graphed below is f defined by $f(x) = -(x - 1)^2$. Here's the intuition for why $f'(1) = 0$ (i.e. why there is a stationary point at x = 1): In order for there to be a maximum turning point of f at x = 1, it must be that to its left, f is increasing; while to its right, f is decreasing. In other words, to the left of 1, $f'(x) \geq 0$. While to the right of 1, $f'(x) \leq 0$. Altogether then, we must have $f'(1) = 0$. That is, the maximum turning point must also be a stationary point.

The next exercise asks you to give a similar piece of intuition for why $g'(-1) = 0$.

Exercise 28. Explain why $g'(-1) = 0$ in the above Example. (Answer on p. 332.)

Every maximum or minimum turning point is a stationary point. However, the converse is not true: not every stationary point is a turning point. So to identify all the maximum or minimum turning points of a function, we can follow this two-step recipe:

The Simple Recipe for Finding Maximum and Minimum Turning Points.
1. Find all stationary points (i.e. where the derivative is zero). Since every turning point is a stationary point, this ensures that we do not miss out on any turning points.
2. Investigate the nature of these points. Some stationary points may be maximum or minimum turning points. However, some stationary points may be neither (for example, they may be inflexion points, to be discussed in the next chapter). So we really do need to carefully check the nature of the stationary points we've found.

For H1 Maths, checking what exactly a stationary point is usually just involves sketching the graph (either manually or using your graphing calculator).

Examples on how to use the above Simple Recipe:

Example 86. Consider f defined by $f(x) = x^2$.
1. $f'(x) = 2x$. So the only stationary point is at x = 0.
2. This is a minimum turning point, as a quick graph sketch will verify.

Example 87. Consider g defined by $g(x) = 4x^7 - 14x^4 + 28x$.
1. $g'(x) = 28x^6 - 56x^3 + 28 = 28(x^6 - 2x^3 + 1) = 28(x^3 - 1)(x^3 - 1)$.
So the only stationary point is at x = 1.
2. But this is not a turning point, as a quick graph sketch will verify.

Example 88. Consider h defined by $h(x) = 3$.
1. $h'(x) = 0$ everywhere. So every point is a stationary point.
2. However, no point is a turning point. Indeed, the graph of h is simply a horizontal line.

Example 89. Consider i defined by $i(x) = x^3 + x^2 + x + 1$.
1. $i'(x) = 3x^2 + 2x + 1$. This is a ∪-shaped quadratic expression whose discriminant is negative. So it is never the case that $i'(x) = 0$, and there are no stationary points.
2. Hence, there are no turning points either.

Exercise 29. For each of the following functions, identify any maximum and minimum turning points. (Answers on pp. 333 and 334.)
(i) f defined by $f(x) = x$.
(ii) g defined by $g(x) = 100$.
(iii) h defined by $h(x) = x^4 - 2x^2$.
(iv) i defined by $i(x) = x^3$.
(v) j defined by $j(x) = x^3 + x^2 - x + 1$.

27 Inflexion Points

• A graph is concave downwards (or simply concave) in a region if the line segment connecting any two points of the graph (in that region) is below the graph.
• A graph is concave upwards (or simply convex) in a region if the line segment connecting any two points of the graph (in that region) is above the graph.
• An inflexion point is any point where the concavity of the graph changes (either from concave downwards to concave upwards OR from concave upwards to concave downwards). [11]

Example 90. Graphed below is f defined by $f(x) = x^3$. For x < 0, the graph is concave downwards (the line segment connecting any two points is below the graph). For x > 0, it is concave upwards (the line segment connecting any two points is above the graph). (0, 0) is an inflexion point because this is where the graph changes from concave downwards to concave upwards.
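The line-segment description of concavity in Example 90 can be checked numerically. The sketch below is only an illustration, not part of the syllabus: the helper name `chord_minus_graph` is made up here, and it tests just one pair of points on each side of 0 rather than every pair.

```python
def f(x):
    """The function from Example 90."""
    return x ** 3


def chord_minus_graph(a, b):
    """Height of the chord's midpoint minus the height of the graph at the same x.

    Negative: the chord is below the graph there (concave downwards).
    Positive: the chord is above the graph there (concave upwards).
    """
    mid = (a + b) / 2
    return (f(a) + f(b)) / 2 - f(mid)


left = chord_minus_graph(-2.0, -1.0)   # two points with x < 0
right = chord_minus_graph(1.0, 2.0)    # two points with x > 0

print("x < 0:", left)    # negative, so concave downwards
print("x > 0:", right)   # positive, so concave upwards
```

The sign changes as we cross x = 0, which is consistent with (0, 0) being an inflexion point.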
The tangent line test says that a point is an inflexion point if and only if the tangent line at that point is above the graph on one side of the point and below the graph on the other side. This is illustrated in the above example.

[11] The discussion in this chapter is very brief and informal, because a proper discussion of inflexion points would be much longer. If you're really interested in what inflexion points are, please read my H2 Mathematics Textbook. (In the 2006-2015 H1 Maths exams, I can find only one 2-mark question on inflexion points; see Exercise 63.1. So it isn't terribly important, if all you care about is getting an A.) By the way, inflection would be the American spelling.

Example 91. Graphed below is g defined by $g(x) = x^5 + 5x^4 + 10x^3 + 10x^2 + 5x + 1$. For x < −1, the graph is concave downwards. And for x > −1, it is concave upwards. So the graph of g has an inflexion point at x = −1.

Example 92. Graphed below is h defined by $h(x) = x^3 - 2x^2 + 4x + 1$. For x < 2/3, the graph is concave downwards. And for x > 2/3, it is concave upwards. So the graph of h has an inflexion point at x = 2/3.

The A-level syllabuses explicitly exclude non-stationary points of inflexion. Nonetheless, there is the temptation to believe that "every inflexion point must also be a stationary point". Here's a quick counter-example to dispel this false belief:

Example 93. Graphed below is the function f defined by $f(x) = x^3 + x$. We have $f'(x) = 3x^2 + 1$. The point (0, 0) is not a stationary point because $f'(0) = 1 \neq 0$. Nonetheless, it is an inflexion point, because to the left of 0, f is concave downwards; and to the right, f is concave upwards. (We can also verify this using the tangent line test.)

The point (0, 0) is thus an example of a non-stationary point of inflexion. But don't worry, the A-level exams will ONLY ask about stationary points of inflexion.
And so for the purposes of the A-level exams, the Simple Recipe given in the previous chapter will detect not only all turning points, but also all the inflexion points that you may be asked about.

Example 94. Graphed below is the function f defined by $f(x) = x^5 + 2x^4 + x^3$.
1. Find all stationary points (i.e. where the derivative is zero). $f'(x) = 5x^4 + 8x^3 + 3x^2 = x^2(5x^2 + 8x + 3) = x^2(5x + 3)(x + 1)$. So the only stationary points are at x = −1, x = −0.6, and x = 0. These are labelled in the graph below as A, B, and C.
2. Investigate the nature of these points. A is a maximum turning point, B is a minimum turning point, and C is a stationary point of inflexion. (The graph of f actually has two points of inflexion other than C. However, they are non-stationary and you are not required to find them for the A-levels.)

Example 95. Graphed below is the function g defined by $g(x) = 9x^4 + 2x^3 - 3x^2$.
1. Find all stationary points (i.e. where the derivative is zero). $g'(x) = 36x^3 + 6x^2 - 6x = 6x(6x^2 + x - 1) = 6x(3x - 1)(2x + 1)$. So the only stationary points are at x = −1/2, x = 0, and x = 1/3. These are labelled in the graph below as A, B, and C.
2. Investigate the nature of these points. A and C are both minimum turning points; B is a maximum turning point. There are no stationary points of inflexion. (There may or may not be non-stationary points of inflexion, but you're not required to know how to find these for the A-levels.)

Example 96. Graphed below is the function h defined by $h(x) = 2x^6 - 3x^4 - 1$.
1. Find all stationary points (i.e. where the derivative is zero). $h'(x) = 12x^5 - 12x^3 = 12x^3(x^2 - 1) = 12x^3(x - 1)(x + 1)$. So the only stationary points are at x = −1, x = 0, and x = 1. These are labelled in the graph below as A, B, and C.
2. Investigate the nature of these points. A and C are both minimum turning points; B is a maximum turning point.
There are no stationary points of inflexion. (Again, there may or may not be non-stationary points of inflexion, but you're not required to know how to find these for the A-levels.)

Exercise 30. (Answer on p. 335.) For each of the following functions, find the stationary points and investigate the nature of each.
(i) f defined by $f(x) = x^3 - 3x + 1$.
(ii) g defined by $g(x) = x^3 - 3x^2 + 3x + 5$.

28 Finding Max/Min Points on the TI84

Example 97. Define f by $f(x) = x - \sin(0.5\pi x)$. Let's find the minimum point of f, in the region where 0 < x < 2.

[Screenshots: the calculator display after each of Steps 1 to 13.]

1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n − SIN 0 . 5 . To enter "π", press the blue 2ND button and then π (which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will have entered "x − sin(0.5πx)".
4. Now press GRAPH and the calculator will graph y = x − sin(0.5πx).

Let's zoom in to the region where 0 ≤ x ≤ 2.

5. Press the ZOOM button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Using the < and > arrow keys, move the cursor to where X = 1.0638298, Y = 0. Now press ENTER and the TI84 will zoom in a little, centred on the point X = 1.0638298, Y = 0.

It looks like starting at x = 0, the function is decreasing, then hits a minimum point, then keeps increasing. Our goal now is to find out what that minimum point is.

7. Press the blue 2ND button and then CALC (which corresponds to the TRACE button). This brings up the CALCULATE menu.
8. Press 3 to select the "minimum" option. This brings you back to the graph, with a cursor flashing. Also, the TI84 prompts you with the question: "Left Bound?"

The TI84's MINIMUM function works by you first choosing a "Left Bound" and a "Right Bound" for x. The TI84 will then look for the minimum point within your chosen bounds.

9. Using the < and > arrow keys, move the blinking cursor until it is where you want your "Left Bound" to be. For me, I have placed it a little to the left of where I believe the minimum point to be.
10. Press ENTER and you will have just entered your "Left Bound". The TI84 now prompts you with the question: "Right Bound?".
11. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is where you want your "Right Bound" to be. For me, I have placed it a little to the right of where I believe the minimum point to be.
12. Again press ENTER and you will have just entered your "Right Bound". The TI84 now asks you: "Guess?" This is just asking if you want to proceed and get the TI84 to work out where the minimum point is. So go ahead and:
13. Press ENTER . The TI84 now informs you that there is a "Minimum" at "X = .56066485", "Y = −.2105137" and places the cursor at precisely that point. This is our desired minimum point.

29 Finding the Derivative at a Point on the TI84

Example 98. Define f by $f(x) = e^{\sin x}$. Our goal is to find $f'(2)$ and $f'(3)$.

1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press the blue 2ND button and then e^x (which corresponds to the LN button). Then press SIN X,T,θ,n ) to enter "e^(sin x)".
4. Press the blue 2ND button and then CALC (which corresponds to the TRACE button). This brings up the CALCULATE menu.
5. Press 6 to select the "dy/dx" option. The TI84 now graphs y = e^(sin x).

Now you need only tell the TI84 at which point (x, y) you'd like it to evaluate dy/dx. So to find $f'(2)$, simply

6. Press 2 .
7. Press ENTER . You're now told that $f'(2) = -1.033116$.

To find $f'(3)$:

8.
Press the blue 2ND button and then CALC (which corresponds to the TRACE button) to again bring up the CALCULATE menu. Again press 6 to select the "dy/dx" option. The only difference now is that we press 3 . Press ENTER . You're now told that $f'(3) = -1.140038$.

[Screenshots: the calculator display after each of Steps 1 to 8.]

30 Connected Rates of Change Problems

Example 99. We unload sand onto a flat surface at a steady rate of 0.01 m³ s⁻¹. Assume the unloaded sand always forms a perfect cone whose height and base diameter are always equal. Let's find the rate at which the base area of the cone is increasing, at the instant t = 20 s.

First, recall that a cone with base radius r and height h has volume

$$V = \frac{1}{3}\pi r^2 h.$$

Since the base diameter equals the height (or h = 2r), we can rewrite this as

$$V = \frac{2}{3}\pi r^3.$$

Now differentiate the above equation with respect to t, to get

$$\frac{dV}{dt} = 2\pi r^2 \frac{dr}{dt}.$$

Let $A = \pi r^2$ be the base area. The rate at which the base area is increasing is

$$\frac{dA}{dt} = 2\pi r \frac{dr}{dt} = \frac{dV}{dt} \div r.$$

The volume of the sand is always increasing at a rate of 0.01 m³ s⁻¹. That is:

$$\frac{dV}{dt} = 0.01 \text{ m}^3\,\text{s}^{-1}.$$

So $V|_{t=20} = 20 \times 0.01 = 0.2$ m³. Hence,

$$r|_{t=20} = \left.\left(\frac{3V}{2\pi}\right)^{1/3}\right|_{t=20} = \left(\frac{0.3}{\pi}\right)^{1/3} \text{ m}.$$

Altogether then,

$$\left.\frac{dA}{dt}\right|_{t=20} = 0.01 \div \left(\frac{0.3}{\pi}\right)^{1/3} \approx 0.0219 \text{ m}^2\,\text{s}^{-1}.$$

Exercise 31. (Answer on p. 336.) Illustrated below is a cone with slant height l, base radius r, and height h. You are given that such a cone has total external surface area (excluding the base) $\pi r l$ and volume $\pi r^2 h/3$. A manufacturer wishes to manufacture a cone whose volume is fixed at 1 m³ and whose total external surface area (excluding the base) is minimised. Let's find out what its height should be, by following these steps:
(a) Express r in terms of h.
(b) Use the Pythagorean Theorem to express l in terms of r and h. Hence express l solely in terms of h.
(c) Now express the total external surface area A (excluding the base) solely in terms of h.
(d) Show that

$$\frac{dA}{dh} = \frac{3}{2} \cdot \frac{\pi - \frac{6}{h^3}}{A}.$$

Hence conclude that the only stationary point is at

$$h = \left(\frac{6}{\pi}\right)^{1/3} \approx 1.24 \text{ m}.$$

(e) Using your expression from part (c) and your graphing calculator, graph A as a function of h. Hence confirm that the stationary point we found in part (d) is indeed the minimum turning point. That is, the desired height is indeed $h = (6/\pi)^{1/3} \approx 1.24$ m.

31 Integration as the Reverse of Differentiation

If the function g is the derivative of the function f, then we may also say that f is an indefinite integral of g.

Example 100. Consider the functions f and g defined by $f(x) = x^2$ and $g(x) = 2x$. The function g is the derivative of the function f. We write:

$$\frac{df}{dx} = g \quad \text{or} \quad \frac{d}{dx}f(x) = g(x).$$

The two statements above are equivalent. Each says: "the function g is the derivative of the function f".

Conversely, the function f is an indefinite integral of the function g. We write:

$$\int g \, dx = f \quad \text{or} \quad \int g(x) \, dx = f(x).$$

The two statements above are equivalent. Each says: "the function f is an indefinite integral of the function g".

Remarks on notation:
• The symbol ∫ is called the integration sign. It is an elongated S.
• The symbol dx is called the differential of the variable x. It says that the variable of integration is x.
• The function g to be integrated is called the integrand.

One common source of confusion amongst students is a failure to grasp that x is merely a dummy variable. We can replace x with any other letter. The next example illustrates:

Example 101. Consider again the functions f and g defined by $f(x) = x^2$ and $g(x) = 2x$. The function f is an indefinite integral of the function g. We can write either

$$\int g \, dx = f \quad \text{or} \quad \int g(x) \, dx = f(x).$$

But we can equally well write any of the following:

$\int g \, da = f$ or $\int g(a) \, da = f(a)$.
$\int g \, db = f$ or $\int g(b) \, db = f(b)$.
$\int g \, dc = f$ or $\int g(c) \, dc = f(c)$.

The dummy variable is merely a place-holder for whatever input goes into the function f or g. We can use any letter for this dummy variable, be it x or a or b or c.

More examples to illustrate that integration is the reverse of differentiation:

Example 102. Consider the functions f and g defined by $f(x) = \ln x + x$ and $g(x) = \frac{1}{x} + 1$. The function g is the derivative of the function f. Conversely, the function f is an indefinite integral of the function g. We may write either

$$\int g \, dx = f \quad \text{or} \quad \int g(x) \, dx = f(x).$$

Example 103. Consider the functions f and g defined by $f(x) = e^{x^2}$ and $g(x) = 2x \cdot e^{x^2}$. The function g is the derivative of the function f. Conversely, the function f is an indefinite integral of the function g. We may write either

$$\int g \, dx = f \quad \text{or} \quad \int g(x) \, dx = f(x).$$

Exercise 32. The following is a list of functions. State which, if any, function is an indefinite integral of another function. (Answer on p. 336.)
f defined by $f(x) = 2x$,
g defined by $g(x) = 3x^2$,
h defined by $h(x) = x^3$,
i defined by $i(x) = x^2 + 2$,
j defined by $j(x) = x^3 + 1$,
k defined by $k(x) = x^2$.

32 The Constant of Integration

It turns out that every function has infinitely many indefinite integrals.

Example 104. Consider the function f defined by $f(x) = 2x$. The following functions are all indefinite integrals of f:
g defined by $g(x) = x^2$,
h defined by $h(x) = x^2 + 7$,
i defined by $i(x) = x^2 - 11$.
Indeed, f is the derivative of any function j of the form $j(x) = x^2 + C$ (where C is any constant). Thus, any such j is an indefinite integral of f. This is not terribly surprising, given that the derivative of any constant C is 0. We call C the constant of integration.

Moreover, the indefinite integral is unique up to the constant of integration. That is, if g and h are both indefinite integrals of f, then g and h must differ by only a constant.

Example 105.
Define f by $f(x) = 2x$. An indefinite integral of f is g defined by $g(x) = x^2$. If the function h is also an indefinite integral of f, then we know immediately that g and h differ by at most some constant C. That is, h must take the form $h(x) = x^2 + C$ (where C is some constant).

Altogether then:
1. Every function has infinitely many indefinite integrals.
2. Moreover, any two of these indefinite integrals differ from each other by at most some constant term.

Example 106. Define f by $f(x) = xe^x$. You are given that an indefinite integral of f is the function g defined by $g(x) = e^x(x - 1)$. Then you immediately know that:
1. Every function h of the form $h(x) = e^x(x - 1) + C$ is an indefinite integral of f.
2. Moreover, besides such functions, there are no other indefinite integrals of f.

33 Basic Rules of Integration

Below are the basic rules of integration. For example, Rule #1 says that if h is the function defined by $h(x) = k$ (where k is some constant), then every function i defined by $i(x) = kx + C$ is an indefinite integral of h. Moreover, there are no other indefinite integrals of h.

Proposition 2. Let k be any constant. Let f and g be functions with derivatives $f'$ and $g'$. Then
1. $\int k \, dx = kx + C$,
2. $\int x^k \, dx = \dfrac{x^{k+1}}{k+1} + C$,
3. $\int \dfrac{1}{x} \, dx = \ln|x| + C$,
4. $\int e^x \, dx = e^x + C$,
5. $\int (ax + b)^k \, dx = \dfrac{(ax + b)^{k+1}}{a(k+1)} + C$,
6. $\int e^{ax+b} \, dx = \dfrac{1}{a}e^{ax+b} + C$,
7. $\int f'(x) \pm g'(x) \, dx = f(x) \pm g(x) + C$,
8. $\int kf'(x) \, dx = kf(x) + C$,
where in each case, C is the constant of integration. (For Rule #2, assume k ≠ −1. And if k < 0, assume x ≠ 0. For Rule #3, assume x ≠ 0.)

Proof. To prove $\int f'(x) \, dx = f(x)$, it suffices to prove that the derivative of f is $f'$. So to prove Rule #3 (i.e. that $\int x^{-1} \, dx = \ln|x| + C$), it suffices to prove that $\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. This we now do.
First note that

$$\ln|x| + C = \begin{cases} \ln x + C, & \text{for } x > 0, \\ \ln(-x) + C, & \text{for } x < 0. \end{cases}$$

Thus,

$$\frac{d}{dx}(\ln|x| + C) = \begin{cases} \dfrac{1}{x}, & \text{for } x > 0, \\[2mm] \dfrac{-1}{-x} = \dfrac{1}{x}, & \text{for } x < 0. \end{cases}$$

And so indeed $\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. (Exercise 33 requests that you prove the remaining rules.)

Example 107. Define the functions $f'$, $g'$, and $h'$ by $f'(x) = x$, $g'(x) = x^2$, and $h'(x) = x^3$. They have indefinite integrals f, g, and h, defined by

$$f(x) = \frac{x^2}{2} + C_1, \quad g(x) = \frac{x^3}{3} + C_2, \quad \text{and} \quad h(x) = \frac{x^4}{4} + C_3,$$

where $C_1$, $C_2$, and $C_3$ are constants of integration. We may also simply write:

$$\int x \, dx = \frac{x^2}{2} + C_1, \quad \int x^2 \, dx = \frac{x^3}{3} + C_2, \quad \text{and} \quad \int x^3 \, dx = \frac{x^4}{4} + C_3.$$

Example 108. Define the functions $f'$, $g'$, and $h'$ by $f'(x) = e^x$, $g'(x) = 3e^x$, and $h'(x) = 3e^x + x^2$. They have indefinite integrals f, g, and h, defined by

$$f(x) = e^x + C_1, \quad g(x) = 3e^x + C_2, \quad \text{and} \quad h(x) = 3e^x + \frac{x^3}{3} + C_3,$$

where $C_1$, $C_2$, and $C_3$ are constants of integration. We may also simply write:

$$\int e^x \, dx = e^x + C_1, \quad \int 3e^x \, dx = 3e^x + C_2, \quad \text{and} \quad \int 3e^x + x^2 \, dx = 3e^x + \frac{x^3}{3} + C_3.$$

Example 109. Define the functions $f'$, $g'$, and $h'$ by $f'(x) = (7x + 2)^2$, $g'(x) = (7x + 2)^3$, and $h'(x) = 5(7x + 2)^3$. They have indefinite integrals f, g, and h, defined by

$$f(x) = \frac{(7x + 2)^3}{3 \cdot 7} + C_1, \quad g(x) = \frac{(7x + 2)^4}{4 \cdot 7} + C_2, \quad \text{and} \quad h(x) = \frac{5(7x + 2)^4}{4 \cdot 7} + C_3,$$

where $C_1$, $C_2$, and $C_3$ are constants of integration. We may also simply write:

$$\int (7x + 2)^2 \, dx = \frac{(7x + 2)^3}{3 \cdot 7} + C_1, \quad \int (7x + 2)^3 \, dx = \frac{(7x + 2)^4}{4 \cdot 7} + C_2, \quad \text{and} \quad \int 5(7x + 2)^3 \, dx = \frac{5(7x + 2)^4}{4 \cdot 7} + C_3.$$

Exercise 33. Complete the proof of Proposition 2. (Answer on p. 337.)

Exercise 34. Find each of the following indefinite integrals. (Don't forget to include the constant of integration. Answer on p. 337.)
(i) $\int 7x^5 - 8x^4 + 3x^2 + 2 \, dx$.
(ii) $\int e^{5x+2} - (5x + 2)^2 \, dx$.
(iii) $\int 16/x + 32x^3 \, dx$.
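Since integration is the reverse of differentiation, any answer produced by the rules in Proposition 2 can be sanity-checked by differentiating it. The sketch below does this numerically for the second integral in Example 109. It is only an illustration: the function names and the step size are choices made here, and a numerical check like this suggests, but does not prove, that the answer is right.

```python
def integrand(x):
    """g'(x) = (7x + 2)^3, from Example 109."""
    return (7 * x + 2) ** 3


def antiderivative(x):
    """g(x) = (7x + 2)^4 / (4 * 7), taking the constant of integration to be 0."""
    return (7 * x + 2) ** 4 / (4 * 7)


def numerical_derivative(func, x, step=1e-6):
    """Approximate the slope of func at x using two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)


test_points = (-1.0, 0.0, 0.5, 2.0)
errors = [
    abs(numerical_derivative(antiderivative, x) - integrand(x))
    for x in test_points
]

# Each error should be tiny: the slope of g matches g' at every test point.
for x, err in zip(test_points, errors):
    print(f"x = {x}: difference = {err:.2e}")
```

Adding any constant C to `antiderivative` leaves these differences unchanged, which is the constant of integration at work.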
34 The Definite Integral as the Area Under a Graph

Surprisingly, we can use integration to find the area under the graph of a function.

Example 110. Graphed below is the function f defined by $f(x) = 2x$. What is the shaded green area under the graph of f, between the lines x = 2 and x = 5?

We can of course find this area using primary school methods: This is a trapezium with height 3 and parallel sides 4 and 10. Hence, it has area

$$\frac{1}{2} \times \text{Height} \times (\text{Sum of parallel sides}) = \frac{1}{2} \times 3 \times (4 + 10) = 21.$$

But surprisingly enough, this area can also be found using integration. Pick any indefinite integral of f, say g defined by $g(x) = x^2$. Then the desired area is simply:

$$g(5) - g(2) = 5^2 - 2^2 = 21.$$

We can also write this area as $\int_2^5 f(x) \, dx$. This expression is called a definite integral, where

$$\int_2^5 f(x) \, dx = g(5) - g(2) = 21.$$

We sometimes also write $[g(x)]_2^5$ as shorthand for $g(5) - g(2)$.

This is an amazing "trick" for finding the area under a graph. Why it works involves something called the Fundamental Theorems of Calculus, which are beyond the scope of H1 Maths. [12] For H1 Maths, all you need know is that this amazing "trick" works and all you need do is perform this "trick" like a monkey.

[12] But see my H2 Mathematics Textbook if you're interested.

More examples:

Example 111. Let the function f be defined by f(x) = √x + 1. The definite integral $\int_1^3 f \, dx$ (simply the area under f, between 1 and 3) is highlighted in blue. Similarly, the definite integral $\int_5^8 f \, dx$ (simply the area under f, between 5 and 8) is highlighted in red.

Example 112. Consider the function g defined by $g(x) = 9x^2 + 6x + 1$. What is the area under the graph of g, between the lines x = 0 and x = 7? By the above amazing "trick", the desired area is simply:

$$\int_0^7 9x^2 + 6x + 1 \, dx = [3x^3 + 3x^2 + x]_0^7 = 3 \cdot 7^3 + 3 \cdot 7^2 + 7 - (3 \cdot 0^3 + 3 \cdot 0^2 + 0) = 1183.$$

Example 113.
Consider the function h defined by $h(x) = (5x + 2)^2$. What is the area under the graph of h, between the lines x = −1 and x = 1? By the above amazing "trick", the desired area is simply:

$$\int_{-1}^{1} (5x + 2)^2 \, dx = \left[\frac{1}{3 \cdot 5}(5x + 2)^3\right]_{-1}^{1} = \frac{1}{3 \cdot 5}(5 \cdot 1 + 2)^3 - \frac{1}{3 \cdot 5}[5 \cdot (-1) + 2]^3 = \frac{74}{3}.$$

Example 114. Consider the function i defined by $i(x) = e^x$. What is the shaded green area under the graph of i, between the lines x = 3 and x = 4? By the above amazing "trick", the desired area is simply:

$$\int_3^4 i(x) \, dx = \int_3^4 e^x \, dx = [e^x]_3^4 = e^4 - e^3 \approx 34.5.$$

Exercise 35. (Answer on p. 337.)
(i) Find the area bounded by the x-axis, the lines x = 1 and x = 2, and the graph of y = 6.
(ii) Find the area bounded by the x-axis, the lines x = −2 and x = 3, and the graph of $y = x^2 + 5x + 10$.
(iii) Find the area bounded by the x-axis, the lines x = 1 and x = 2, and the graph of y = 1/x.

35 Area between a Curve and Lines Parallel to Axes

Example 115. Find the exact area bounded by the curve $y = x^2$ and the horizontal lines y = 1 and y = 2.

It's always helpful to make a quick sketch (given below). Our desired area is labelled A below. To find a desired area, there are usually multiple methods, some quicker than others.

Method #1. The entire rectangle A + B + C + D has area $2 \times 2\sqrt{2} = 4\sqrt{2}$. B has area

$$\int_{-\sqrt{2}}^{-1} x^2 \, dx = \left[\frac{x^3}{3}\right]_{-\sqrt{2}}^{-1} = -\frac{1}{3} - \left(-\frac{2\sqrt{2}}{3}\right) = \frac{2\sqrt{2} - 1}{3}.$$

By symmetry, D has the same area as B. C has area 1 × 2. Hence, A has area

$$A + B + C + D - (B + C + D) = 4\sqrt{2} - \left(\frac{2\sqrt{2} - 1}{3} + 2 + \frac{2\sqrt{2} - 1}{3}\right) = \frac{4}{3}(2\sqrt{2} - 1).$$

Method #2. The right branch of the curve $y = x^2$ has equation $x = \sqrt{y}$. The right half of the area A is

$$\int_{y=1}^{y=2} x \, dy = \int_{y=1}^{y=2} \sqrt{y} \, dy = \frac{2}{3}\left[y^{3/2}\right]_1^2 = \frac{2}{3}(2\sqrt{2} - 1).$$

Hence, $A = \frac{4}{3}(2\sqrt{2} - 1)$.

Exercise 36. Find the exact area bounded by the curve $y = x^3$, the horizontal lines y = 1 and y = 2, and the vertical axis. (Answer on p. 338.)
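The exact answer in Example 115 can be cross-checked by adding up many thin horizontal strips, which is roughly what a graphing calculator does when it finds a definite integral. The sketch below is an illustration under assumptions made here: the function names are hypothetical, and the midpoint rule with 10,000 strips is just one reasonable choice.

```python
import math


def width(y):
    """Horizontal width of the region at height y: from x = -sqrt(y) to x = +sqrt(y)."""
    return 2 * math.sqrt(y)


def midpoint_rule(func, lower, upper, strips=10_000):
    """Approximate the integral of func from lower to upper using thin strips."""
    h = (upper - lower) / strips
    return h * sum(func(lower + (i + 0.5) * h) for i in range(strips))


approx = midpoint_rule(width, 1.0, 2.0)      # strips between y = 1 and y = 2
exact = 4 / 3 * (2 * math.sqrt(2) - 1)       # the answer found in Example 115

print(approx, exact)
```

The two printed values should agree to several decimal places, which supports (but does not replace) the exact working.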
36 Area between a Curve and a Line

Example 116. Find the area A bounded by the curve $y = x^2$ and the line $y = x + 1$.

By the quadratic formula, the curve and line intersect at the points where $x = \frac{1 \pm \sqrt{5}}{2}$. So

$$A = \int_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2} x + 1 - x^2 \, dx = \left[\frac{x^2}{2} + x - \frac{x^3}{3}\right]_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2}$$

$$= \left[\frac{(1+\sqrt{5})^2}{2 \cdot 2^2} + \frac{1+\sqrt{5}}{2} - \frac{(1+\sqrt{5})^3}{3 \cdot 2^3}\right] - \left[\frac{(1-\sqrt{5})^2}{2 \cdot 2^2} + \frac{1-\sqrt{5}}{2} - \frac{(1-\sqrt{5})^3}{3 \cdot 2^3}\right]$$

$$= \left[\frac{6+2\sqrt{5}}{8} + \frac{1+\sqrt{5}}{2} - \frac{16+8\sqrt{5}}{24}\right] - \left[\frac{6-2\sqrt{5}}{8} + \frac{1-\sqrt{5}}{2} - \frac{16-8\sqrt{5}}{24}\right]$$

$$= \left[\frac{3+\sqrt{5}}{4} + \frac{1+\sqrt{5}}{2} - \frac{2+\sqrt{5}}{3}\right] - \left[\frac{3-\sqrt{5}}{4} + \frac{1-\sqrt{5}}{2} - \frac{2-\sqrt{5}}{3}\right] = \frac{7+5\sqrt{5}}{12} - \frac{7-5\sqrt{5}}{12} = \frac{5\sqrt{5}}{6}.$$

Exercise 37. Find the exact area bounded by the curve $y = e^x$ and the lines y = 2, y = 3, and x = 0.5. (Answer on p. 339.)

37 Area between Two Curves

Example 117. Find the area A bounded by the curves $y = x^2 - 2x - 1$ and $y = 1 - x^2$.

By the quadratic formula, the curves intersect at $x = \frac{1 \pm \sqrt{5}}{2}$. So

$$A = \int_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} 1 - x^2 - (x^2 - 2x - 1) \, dx = 2\int_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} 1 - x^2 + x \, dx = 2\left[x - \frac{x^3}{3} + \frac{x^2}{2}\right]_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} = \frac{5\sqrt{5}}{3},$$

where we've simply recycled our tedious calculations from the previous example.

[Figure: the two curves, with the enclosed area labelled A.]

Exercise 38. Find the exact area bounded by the curves $y = 2 - x^2$ and $y = x^2 + 1$. (Answer on p. 339.)

38 Finding Definite Integrals on your TI84

Example 118. Use your TI84 to find the approximate area bounded by the curve $y = e^{\sin x}$ and the horizontal axis, between x = 1 and x = 2.

[Screenshots: the calculator display after each of Steps 1 to 10.]

1. Press ON to turn on your calculator.
2. Press Y= .
3. Press the blue 2ND button and then e^x (which corresponds to the LN button). Then press SIN X,T,θ,n ) ) and altogether you will have entered e^(sin x).
4. Now press GRAPH and the calculator will graph the given equation.
5. Press the blue 2ND button and then CALC (which corresponds to the TRACE button), to bring up the CALCULATE menu.
6. Press 7 to select the “∫ f(x) dx” option. This brings you back to the graph.
7. The TI84 is now prompting you for “Lower Limit?” Simply press 1 .
8. Now press ENTER and you will have told the TI84 that your lower limit is x = 1.
9. The TI84 is now similarly prompting you for “Upper Limit?” Simply press 2 .
10. Now press ENTER and you will have told the TI84 that your upper limit is x = 2.

The TI84 now tells you that our desired area (now shaded in black) is ∫ f(x) dx = 2.60466115.

Part III Probability and Statistics

Probability and Statistics accounts for 60% of the A-Level H1 Maths Exam.

39 How to Count: Four Principles

How many arrangements or permutations are there of the three letters in CAT? For example, one possible permutation of CAT is TCA.

To solve this problem, one possible method is the method of enumeration. That is, simply list out (enumerate) all the possible permutations:

ACT, ATC, CAT, CTA, TAC, TCA.

We see that there are 6 possible permutations.

Enumeration works well enough when we have just three letters, as in CAT. Indeed, enumeration is sometimes the quickest method. In contrast, the 13 letters in the word UNPREDICTABLY have 6,227,020,800 possible permutations. So enumeration is probably not practical.

To help us count more efficiently, we’ll learn about four basic principles of counting:

1. The Addition Principle (AP);
2. The Multiplication Principle (MP);
3. The Inclusion-Exclusion Principle (IEP); and
4. The Complements Principle (CP).

39.1 How to Count: The Addition Principle

The addition principle (AP) is very simple.

Example 119. For lunch today, I can either go to the food court or the hawker centre. At the food court, I have 2 choices: ramen or briyani.
At the hawker centre, I have 3 choices: bak chor mee, nasi lemak, or kway teow. Altogether then, I have 2 + 3 = 5 choices of what to eat for lunch today.

Here’s an informal statement of the AP:

The Addition Principle (AP). I have to choose a destination, out of two possible areas. At area #1, there are p possible destinations to choose from. At area #2, there are q possible destinations to choose from. The Addition Principle (AP) simply states that I have, in total, p + q different choices.

(Just so you know, the AP is sometimes also called the Second Principle of Counting or the Rule of Sum or the Disjunctive Rule.)

Of course, the AP generalises to cases where there are more than just 2 “areas”.

It may seem a little silly, but just to illustrate, let’s use the AP to tackle the CAT problem:

Example 120. Problem: How many permutations are there of the letters in the word CAT?

We can divide the possibilities into three cases:

Case #1. First letter is an A. Then the next two letters are either CT or TC: 2 possibilities.
Case #2. First letter is a C. Then the next two letters are either AT or TA: 2 possibilities.
Case #3. First letter is a T. Then the next two letters are either AC or CA: 2 possibilities.

Altogether then, by the AP, there are 2 + 2 + 2 = 6 possibilities. That is, there are 6 possible permutations of the letters in CAT. These are illustrated in the tree diagram below.

The next exercise is very simple and just to illustrate again the AP.

Exercise 39. Without retracing your steps, how many ways are there to get from the Starting Point to the River (see figure below)? (Answer on p. 340.)

Exercise 40. How many permutations are there of the letters in the word DEED? Illustrate your answer with a tree diagram similar to that given in the CAT example above. (Answer on p. 340.)
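The case-by-case count in the CAT example can be checked by machine. The short Python sketch below is only an illustration and is not part of the syllabus: the function name `count_by_first_letter` is made up for this example. It lists every arrangement of CAT, sorts the arrangements into cases by first letter, and then adds up the case sizes, exactly as the AP says.

```python
from itertools import permutations

def count_by_first_letter(word):
    """Group the distinct arrangements of `word` into cases by first letter."""
    # A set removes duplicates, so each arrangement is listed once only.
    arrangements = {"".join(p) for p in permutations(word)}
    cases = {}
    for arrangement in sorted(arrangements):
        cases.setdefault(arrangement[0], []).append(arrangement)
    return cases

cases = count_by_first_letter("CAT")
for first_letter, group in cases.items():
    print(first_letter, group)  # one case per possible first letter

# Addition Principle: the cases do not overlap, so we simply add their sizes.
total = sum(len(group) for group in cases.values())
print(total)
```

Each of the three cases contains 2 arrangements, so the printed total is 2 + 2 + 2 = 6.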
39.2 How to Count: The Multiplication Principle

Example 121. For lunch today, I can either have prata or horfun. For dinner tonight, I can have McDonald’s, KFC, or Pizza Hut. Enumeration shows that I have a total of 6 possible choices for my two meals today:

(Prata, McDonald’s), (Prata, KFC), (Prata, Pizza Hut), (Horfun, McDonald’s), (Horfun, KFC), (Horfun, Pizza Hut).

Alternatively, we can use the Multiplication Principle (MP). I have 2 choices for lunch and 3 choices for dinner. Hence, for my two meals today, I have in total 2 × 3 = 6 possible choices.

Here’s an informal statement of the MP:

The Multiplication Principle (MP). I have to choose two destinations, one from each of two possible areas. At area #1, there are p possible destinations to choose from. At area #2, there are q possible destinations to choose from. The Multiplication Principle (MP) simply states that I have, in total, p × q different choices.

(The MP is sometimes also called the Fundamental or First Principle of Counting or the Rule of Product or the Sequential Rule.)

Of course, the MP generalises to cases where there are more than just 2 “areas”. Here’s an example where we have to make 3 decisions:

Example 122. For breakfast tomorrow, I can have shark’s fin or bird’s nest (2 choices). For lunch tomorrow, I can have black pepper crab or curry fishhead (2 choices). For dinner tomorrow, I can have an apple, a banana, or a carrot (3 choices).

By the MP, for tomorrow’s meals, I have a total of 2 × 2 × 3 = 12 possible choices. We can enumerate these (I’ll use abbreviations):

(SF, BPC, A), (SF, BPC, B), (SF, BPC, C), (SF, CF, A), (SF, CF, B), (SF, CF, C), (BN, BPC, A), (BN, BPC, B), (BN, BPC, C), (BN, CF, A), (BN, CF, B), (BN, CF, C).

More examples:

Example 123.
Problem: How many four-letter words can be formed using the letters in the 26-letter alphabet?

Let’s rephrase this problem so that it is clearly in the framework of the MP. We have 4 blank spaces to be filled:

_ _ _ _
1 2 3 4

These 4 blank spaces correspond to 4 decisions to be made.

Decision #1: What letter to put in the first blank space?
Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
Decision #4: What letter to put in the fourth blank space?

How many choices have we for each decision? For Decision #1, we can put A, B, C, ..., or Z. So we have 26 choices for Decision #1. For Decision #2, we can again put A, B, C, ..., or Z. So we again have 26 choices for Decision #2. We likewise have 26 choices for Decision #3 and also 26 choices for Decision #4.

Altogether then, by the MP, there are 26 × 26 × 26 × 26 = $26^4$ = 456,976 ways to make our four decisions.

Solution: There are $26^4$ = 456,976 possible four-letter words that can be formed using the 26-letter alphabet.

Example 124. One 18-sided die has the numbers 1 through 18 printed on its sides (one number per side). Another six-sided die has the letters A, B, C, D, E, and F printed on its sides (one letter per side). We roll the two dice. How many distinct possible outcomes are there?

Again, let’s rephrase this problem in the framework of the MP. Consider 2 blank spaces:

_ _
1 2

These 2 blank spaces correspond to 2 decisions to be made.

Decision #1: What number to put in the first blank space?
Decision #2: What letter to put in the second blank space?

Again we ask: How many choices have we for each decision? For Decision #1, we can put 1, 2, 3, ..., or 18. So we have 18 choices for Decision #1. For Decision #2, we can put A, B, C, D, E, or F. So we have 6 choices for Decision #2.

Altogether then, by the MP, there are 18 × 6 = 108 ways to make our two decisions.
In other words, there are 108 possible outcomes from rolling these two dice. (If necessary, it is tedious but not difficult to enumerate them: 1A, 1B, 1C, 1D, 1E, 1F, 2A, 2B, ..., 17E, 17F, 18A, 18B, 18C, 18D, 18E, and 18F.)

Exercise 41. A club has a shortlist of 3 men for president, 5 animals for vice-president, and 10 women for club mascot. How many possible ways are there to choose the president, the vice-president, and the mascot? (Answer on p. 340.)

Exercise 42. (Answer on p. 341.) The highly-stimulating game of 4D consists of selecting a four-digit number, between 0000 and 9999 (so there are 10,000 possible numbers). Your mother tells you to go to the nearest gambling den (also known as a Singapore Pools outlet) to buy any three numbers, subject to these two conditions:

• The four digits in each number are distinct.
• Each four-digit number is distinct.

How many possible ways are there to fulfil your mother’s request?

39.3 How to Count: The Inclusion-Exclusion Principle

The Inclusion-Exclusion Principle (IEP) is another very simple principle.

Example 125. For lunch today, I can either go to the food court or the hawker centre. At the food court, I have 4 choices of cuisine: Chinese, Indian, Malay, and Western. At the hawker centre, I have 3 choices of cuisine: Chinese, Malay, and Thai.

There are 2 choices of cuisine that are common to both the food court and the hawker centre (Chinese and Malay). And so by the Inclusion-Exclusion Principle (IEP), I have in total 4 + 3 − 2 = 5 choices of cuisine. The Venn diagram below illustrates.

Why do we subtract 2? If we simply added the 4 choices available at the food court to the 3 available at the hawker centre, then we’d double-count the Chinese and Malay cuisines, which are available at both the food court and the hawker centre. And so we must subtract the 2 cuisines that are at both locations.

Example 126.
Problem: How many integers from 1 to 20 (inclusive) are divisible by 2 or 5?

There are 10 integers divisible by 2, namely 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20. There are 4 integers divisible by 5, namely 5, 10, 15, and 20. There are 2 integers divisible by BOTH 2 and 5, namely 10 and 20.

Hence, by the IEP, there are 10 + 4 − 2 = 12 integers that are divisible by either 2 or 5. (Namely, 2, 4, 5, 6, 8, 10, 12, 14, 15, 16, 18, and 20.)

Here’s an informal statement of the IEP:

The Inclusion-Exclusion Principle (IEP). I have to choose a destination, out of two possible areas. At area #1, there are p possible destinations to choose from. At area #2, there are q possible destinations to choose from. Areas #1 and #2 overlap: they have r destinations in common. The IEP simply states that I have, in total, p + q − r different choices.

Exercise 43. (Answer on p. 342.) The food court has 4 types of cuisine: Chinese, Indonesian, Korean, and Western. The hawker centre has 3: Chinese, Malay, and Western. A restaurant has 3: Chinese, Japanese, and Malay. In total, how many different types of cuisine are there? Illustrate your answer with a Venn diagram.

39.4 How to Count: The Complements Principle

The Complements Principle (CP) is another very simple principle.

Example 127. The food court has 4 types of cuisine: Chinese, Malay, Indian, and Other. I’m at the food court but don’t feel like eating Malay or Chinese. So by the Complements Principle (CP), I have 4 − 2 = 2 possible choices of cuisine (Indian and Other).

Here’s an informal statement of the CP:

The Complements Principle (CP). There are p possible destinations. I must choose one. I rule out q of the possible destinations. The Complements Principle says that I am left with p − q possible choices.

Exercise 44. There are 10 Southeast Asian countries, of which 3 (Brunei, Indonesia, and the Philippines) are not on the mainland.
How many mainland Southeast Asian countries are there that a European tourist can visit? (Answer on p. 342.)

40 How to Count: Permutations

In this chapter, we’ll use the MP to generate several more methods of counting. But first, we’ll learn about the factorial notation.

Definition 1. Let $n \in \mathbb{Z}_0^+$ (that is, let n be a non-negative integer). Then n-factorial, denoted n!, is defined by n! = n × (n − 1) × ⋅⋅⋅ × 1 for n ≥ 1 and 0! = 1.

Example 128. 0! = 1, 1! = 1, 2! = 2 × 1 = 2, 3! = 3 × 2 × 1 = 6, 4! = 4 × 3 × 2 × 1 = 24, 5! = 5 × 4 × 3 × 2 × 1 = 120.

Exercise 45. Compute 6!, 7!, and 8!. (Answer on p. 342.)

We now revisit the CAT problem, using the MP:

Example 129. Problem: How many permutations (or arrangements) are there of the three letters in the word CAT?

Let’s rephrase this problem in the framework of the MP. Consider three blank spaces:

_ _ _
1 2 3

These 3 blank spaces correspond to 3 decisions to be made.

Decision #1: What letter to put in the first blank space?
Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?

Again we ask: How many choices have we for each decision? For Decision #1, we can put C, A, or T. So we have 3 choices for Decision #1. Having already used up a letter in Decision #1, we are left with two letters. So we have 2 choices for Decision #2. Having already used up a letter in Decision #1 and another in Decision #2, we are left with just one letter. So we have only 1 choice for Decision #3.

Altogether then, by the MP, there are 3 × 2 × 1 = 3! = 6 possible ways of making our decisions. This is also the number of ways there are to arrange the three letters in the word CAT.

Let’s now try the UNPREDICTABLY problem.

Example 130. Problem: How many permutations are there of the 13 letters in the word UNPREDICTABLY?

Again, let’s rephrase this problem in the framework of the MP.
Consider 13 blank spaces:

_ _ _ _ _ _ _ _ _ _ _ _ _
1 2 3 4 5 6 7 8 9 10 11 12 13

These 13 blank spaces correspond to 13 decisions to be made.

Decision #1: What letter to put in the first blank space?
Decision #2: What letter to put in the second blank space?
...
Decision #13: What letter to put in the 13th blank space?

Again we ask: How many choices have we for each decision?

First an important note: In the word UNPREDICTABLY, no letter is repeated. (Indeed, UNPREDICTABLY is the longest “common” English word without any repeated letters.)

For Decision #1, we can put U, N, P, R, E, D, I, C, T, A, B, L, or Y. So we have 13 choices for Decision #1.

For Decision #2, having already used up a letter in Decision #1, we are left with 12 letters. So we have 12 choices for Decision #2.

For Decision #3, having already used up a letter in Decision #1 and another letter in Decision #2, we are left with 11 letters. So we have 11 choices for Decision #3.

⋮

For Decision #13, having already used up a letter in Decision #1, another in Decision #2, another in Decision #3, ..., and another in Decision #12, we are left with one letter. So we have 1 choice for Decision #13.

Altogether then, by the MP, there are 13 × 12 × ⋅⋅⋅ × 2 × 1 = 13! = 6,227,020,800 possible ways of making our decisions. This is also the number of ways there are to arrange the 13 letters in the word UNPREDICTABLY.

The next fact simply summarises what should already be obvious from the above examples:

Fact 2. There are n! possible permutations of n distinct objects.

Here is an informal proof of the above fact. Consider n empty spaces. We are to fill them with the n distinct objects.

_ _ _ . . . _
1 2 3 . . . n

For space #1, we have n possible choices. For space #2, we have n − 1 possible choices (because one object was already placed in space #1). ... And finally for space #n, we have only 1 object left and thus only 1 choice.
By the MP then, there are n × (n − 1) × ⋅⋅⋅ × 1 = n! possible ways of filling in these n spaces with the n distinct objects.

Example 131. The word COWDUNG has seven distinct letters. Hence, there are 7! = 5040 permutations of the letters in the word COWDUNG.

40.1 Permutations with Repeated Elements

In the previous section, we saw that there are 3! permutations of the three letters in the word CAT and 13! permutations of the 13 letters in the word UNPREDICTABLY. We made an important note: In each of these words, there was no repeated letter.

We now consider permutations of a set where some elements are repeated.

Example 132. How many permutations are there of the three letters in the word SEE?

A naïve application of the MP would suggest that the answer is 3! = 6. This is wrong. Enumeration shows that there are only 3 possible permutations:

EES, ESE, SEE.

To see why a naïve application of the MP fails, set up the problem in the framework of the MP. Consider 3 blank spaces:

_ _ _
1 2 3

These 3 blank spaces correspond to 3 decisions to be made.

Decision #1: What letter to put in the first blank space?
Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?

Again we ask: How many choices have we for each decision? For Decision #1, we can put E or S. So we have 2 choices for Decision #1.

But now the number of choices available for Decision #2 depends on what we chose for Decision #1! (If we chose E in Decision #1, then we again have 2 choices for Decision #2. But if instead we chose S in Decision #1, then we now have only 1 choice for Decision #2.)

This violates the implicit but important assumption in the MP that the number of choices available in one decision is independent of the choice made in the other decision. Hence, the MP does not directly apply.

(... Example continued on the next page ...)

(...
Example continued from the previous page ...)

The reason SEE has only 3 possible permutations (instead of 3! = 6) is that it contains a repeated element, namely E. But why would this make any difference?

To understand why, let’s rename the second E as Ê, so that the word SEE is now transformed into a new word SEÊ. From the three letters of this new word, we’d again have 3! = 6 possible permutations:

EÊS, ÊES, ESÊ, ÊSE, SEÊ, SÊE.

Restricting attention to the two letters EÊ, we see that there are 2! = 2 ways to permute these two letters. Hence, any single permutation (in the case where we do not distinguish between the two E’s) corresponds to 2 possible permutations (in the case where we do). The figure below illustrates how the 3 permutations of SEE correspond to the 6 permutations of SEÊ.

Hence, when we do not distinguish between the two E’s, there are only half as many possible permutations.

We next consider permutations of SASS.

Example 133. How many permutations are there of the four letters in the word SASS?

The answer is 4!/3! = 4. Let’s see why.

If we distinguish between the three S’s, perhaps by calling them S, Ŝ, and S̄, then we’d have 4! = 24 possible permutations of the letters in the word SAŜS̄.

But amongst the three S’s themselves, we have 3! = 6 possible permutations: SŜS̄, SS̄Ŝ, ŜSS̄, S̄SŜ, ŜS̄S, and S̄ŜS. So distinguishing between the three S’s increases by 6-fold the number of possible permutations.

Working backwards, the word SASS thus has one-sixth as many permutations as SAŜS̄. That is, SASS has 4!/3! = 4 possible permutations. The figure below illustrates how the 4 possible permutations of SASS correspond to the 24 possible permutations of SAŜS̄.

Exercise 46. There are 3 identical white tiles and 4 identical black tiles. How many ways are there of arranging these 7 tiles in a row? (Answer on p. 342.)
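The SEE and SASS examples can be double-checked with a few lines of Python. This is a sketch for illustration only. The function names are invented here, and the formula function assumes the natural extension of the argument above: start with n! and divide by r! once for each letter that appears r times. The text itself only works this out for words with a single repeated letter.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def count_by_enumeration(word):
    """List every ordering of the letters and keep only the distinct ones."""
    return len(set(permutations(word)))

def count_by_formula(word):
    """Start from n!, then divide by r! for each letter repeated r times."""
    total = factorial(len(word))
    for repeats in Counter(word).values():
        total //= factorial(repeats)  # undo the over-counting of identical letters
    return total

for word in ["CAT", "SEE", "SASS"]:
    print(word, count_by_enumeration(word), count_by_formula(word))
```

For each of the three words, the two counts agree: 6 for CAT, 3 for SEE, and 4 for SASS. Enumeration becomes slow for long words, which is why the formula is the more practical tool.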
40.2 Partial Permutations

Example 134. Using the 26-letter alphabet, how many 3-letter words can we form that have no repeated letters?

This, of course, is simply the problem of filling in these 3 empty spaces using 26 distinct elements.

_ _ _
1 2 3

For space #1, we have 26 possible choices. For space #2, we have 25. And for space #3, we have 24.

By the MP then, the number of ways to fill the three spaces is 26 × 25 × 24. This is also the number of three-letter words with no repeated letters.

Problems like the above example crop up often enough to motivate a new piece of notation:

Definition 2. Let n, k be positive integers with n ≥ k. Then P(n, k), read aloud as n permute k, is defined by

$$P(n, k) = \frac{n!}{(n-k)!}.$$

P(n, k) answers the following question: “Given n distinct objects and k spaces (where k ≤ n), how many ways are there to fill the k spaces?”

Just so you know, P(n, k) is also variously denoted $nPk$, $P^n_k$, ${}_nP_k$, etc., but we’ll stick solely with P(n, k) in this textbook.

Example 134 (continued from above). The number of 3-letter words without repeated letters is simply P(26, 3) = 26!/23! = 26 × 25 × 24.

Example 135. Problem: Using the 22-letter Phoenician alphabet, how many 4-letter words can we form that have no repeated letters?

This, of course, is simply the problem of filling in 4 empty spaces using 22 distinct elements. So the answer is P(22, 4) = 22!/18! = 22 × 21 × 20 × 19 words.

Exercise 47. Out of a committee of 11 members, how many ways are there to choose a president and a vice-president? (Answer on p. 342.)

40.3 Permutations with Restrictions

Example 136. At a dance party, there are 7 heterosexual married couples (and thus 14 people in total).

Problem #1. How many ways are there of arranging them in a line, with the restriction that every person is next to his or her partner?
Think of the 14 people as 7 units (each unit being a couple). There are 7! ways to arrange these 7 units in a line. Within each unit, there are 2 possible arrangements. Hence, in total, there are $7! \times 2^7$ possible arrangements.

Example 137. (I assume you’re familiar with the standard 52-card deck.)

Problem #1. Using a standard 52-card deck, how many ways are there of arranging any 3 cards in a line, with the restriction that no two cards of the same suit are next to each other?

This is the problem of filling in 3 spaces with 52 distinct objects.

_ _ _
1 2 3

For space #1, we have 52 possible choices.

For space #2, having picked a card of suit X for space #1, we must pick a card from some other suit Y. And so there are only 39 possible choices (we have three suits available, so that’s 3 × 13 = 39).

For space #3, having picked a card of suit Y for space #2, we must pick a card from some other suit Z. Note that suit Z can be the same as suit X. And so there are 38 possible choices (we have three suits available, less the card used for space #1, so that’s 3 × 13 − 1 = 38).

Altogether then, there are 52 × 39 × 38 possible arrangements.

Exercise 48. (Answer on p. 343.) There are 4 brothers and 3 sisters. In how many ways can they be arranged ...

(a) in a line, without any 2 brothers being next to each other?
(b) in a line, without any 2 sisters being next to each other?

41 How to Count: Combinations

P(n, k) is the number of ways we can fill k (ordered) spaces using n distinct objects.

In contrast, C(n, k) is the number of ways of choosing k out of n distinct objects. Equivalently, it is the same problem of filling k spaces using n distinct objects, except that now order does not matter.

Example 138.
Suppose we have a committee of 13 members and wish to select a president and a vice-president. This is equivalent to the problem of filling in 2 spaces, given 13 distinct objects.

_ _
1 2

The answer is thus simply P(13, 2) = 13 × 12 = 156.

Suppose instead that we want to choose two co-presidents. How many ways are there of doing so? This is simply the same problem as before: again we want to fill in 2 spaces, given 13 distinct objects. The only difference now is that the order of the 2 chosen objects does not matter. So the answer must be that there are P(13, 2)/2! = 78 ways of choosing the two co-presidents.

Example 139. How many ways are there of choosing 5 cards out of a standard 52-card deck?

_ _ _ _ _
1 2 3 4 5

First, how many ways are there to fill 5 spaces using 52 distinct objects (where order matters)? Answer: P(52, 5) = 52 × 51 × 50 × 49 × 48 = 311,875,200.

And so if we don’t care about order, we must adjust this number by dividing by 5! to get P(52, 5)/5! = 2,598,960.

So the answer is that to choose 5 cards out of a 52-card deck, there are 2,598,960 ways.

The above examples suggest that, in general, to choose k out of n given distinct objects, there are P(n, k)/k! possible ways. This motivates the following definition:

Definition 3. Let n, k be positive integers with n ≥ k. Then C(n, k), read aloud as n choose k, is defined by

$$C(n, k) = \frac{P(n, k)}{k!} = \frac{n!}{(n-k)!\,k!}.$$

It turns out that C(n, k) appears so often in maths that it has many alternative notations. One of the most common is $\binom{n}{k}$.

“n choose k” also has several names, such as the combination, the combinatorial number, and even the binomial coefficient. Shortly, we’ll see why the name binomial coefficient makes sense.

Exercise 49 gives an alternate expression for C(n, k) which you’ll often find very useful.

Exercise 49. Show that $C(n, k) = \dfrac{n \times (n-1) \times (n-2) \times \cdots \times (n-k+1)}{k!}$. (Answer on p. 344.)

Exercise 50.
Compute C(4, 2), C(6, 4), and C(7, 3). (Answer on p. 344.)

Exercise 51. We wish to form a basketball team, consisting of 1 centre, 2 forwards, and 2 guards. We have available 3 centres, 7 forwards, and 5 guards. How many ways are there of forming a team? (Answer on p. 344.)

Here’s a nice symmetry property:

Ways to choose k out of n distinct objects = Ways to choose n − k out of n distinct objects.

Intuitively, this property is true because choosing k out of n objects is the same as choosing which n − k out of n objects to ignore. Let’s jot down this symmetry property as a formal fact:

Fact 3. (Symmetry.) C(n, k) = C(n, n − k).

Example 140. We have a group of 100 men. 70 are needed for a task. The number of ways to choose these 70 men is:

$$C(100, 70) = \frac{100!}{30!\,70!}.$$

This is the same as the number of ways to choose the 30 men that will not be used for the task:

$$C(100, 30) = \frac{100!}{70!\,30!}.$$

41.1 Pascal’s Triangle

Pascal’s Triangle consists of a triangle of numbers. If we adopt the convention that the topmost row is row 0 and the leftmost term of each row is the 0th term, then the kth term of the nth row is the number C(n, k):

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
⋮

It turns out that beautifully enough, each term is equal to the sum of the two terms above it. The next exercise asks you to verify several instances of this:

Exercise 52. Verify the following: (a) C(1, 0) + C(1, 1) = C(2, 1); (b) C(4, 2) + C(4, 3) = C(5, 3); (c) C(17, 2) + C(17, 3) = C(18, 3). (Answer on p. 344.)

Fact 4. (Pascal’s Rule/Identity/Relation.) C(n + 1, k) = C(n, k) + C(n, k − 1).

Proof. C(n + 1, k) is the number of ways of choosing k out of n + 1 distinct objects.

Suppose we do not choose the last object, i.e. the (n + 1)th object. Then we have to choose our k objects out of the first n objects. There are C(n, k) ways of doing so.
Suppose we do choose the last object. Then we have to choose another k − 1 objects, out of the first n objects. There are C(n, k − 1) ways of doing so.

Altogether then, by the Addition Principle, there are C(n, k) + C(n, k − 1) ways of choosing k out of n + 1 distinct objects.

41.2 The Combination as Binomial Coefficient

Mathematics is the art of giving the same name to different things.
- Henri Poincaré, p. 34 in Science and Method.

Poincaré’s quote is especially true in combinatorics. In this section, we’ll learn why C(n, k) can be called the combination and also the binomial coefficient.

Verify for yourself that the following equations are true:

$$
\begin{aligned}
(1+x)^0 &= 1, \\
(1+x)^1 &= 1 + x, \\
(1+x)^2 &= 1 + 2x + x^2, \\
(1+x)^3 &= 1 + 3x + 3x^2 + x^3, \\
(1+x)^4 &= 1 + 4x + 6x^2 + 4x^3 + x^4, \\
(1+x)^5 &= 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5, \\
(1+x)^6 &= 1 + 6x + 15x^2 + 20x^3 + 15x^4 + 6x^5 + x^6, \\
(1+x)^7 &= 1 + 7x + 21x^2 + 35x^3 + 35x^4 + 21x^5 + 7x^6 + x^7. \\
&\;\;\vdots
\end{aligned}
$$

Each of the expressions on the RHS is called a binomial series. Each can also be called the binomial expansion of $(1+x)^n$.

Notice anything interesting? No? Try this exercise:

Exercise 53. Compute $\binom{7}{0}, \binom{7}{1}, \binom{7}{2}, \binom{7}{3}, \binom{7}{4}, \binom{7}{5}, \binom{7}{6}, \binom{7}{7}$. Compare these to the coefficients of the binomial expansion of $(1+x)^7$. What do you notice? (Answer on p. 345.)

It turns out that somewhat surprisingly, the coefficients of the binomial expansion of $(1+x)^n$ are simply $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$. As an additional exercise, you should verify for yourself that this is also true for n = 0 through n = 6.

There are several ways to explain why the combinatorial numbers also happen to be the binomial coefficients. Here we’ll give only the combinatorial explanation:

Consider $(1+x)^2$. Expanding, we have

$$(1+x)^2 = (1+x)(1+x) = 1 \cdot 1 + 1 \cdot x + x \cdot 1 + x \cdot x.$$

Consider the 4 terms on the right.
For 1 ⋅ 1, we “chose” 1 from the first (1 + x) and 1 from the second (1 + x). So from the two (1 + x)’s in the product, there is C(2, 0) = 1 way to choose 0 of the x’s.

For 1 ⋅ x, we “chose” 1 from the first (1 + x) and x from the second (1 + x). For x ⋅ 1, we “chose” x from the first (1 + x) and 1 from the second (1 + x). So from the two (1 + x)’s in the product, there are C(2, 1) = 2 ways to choose 1 of the x’s.

Finally, for x ⋅ x, we “chose” x from the first (1 + x) and x from the second (1 + x). So from the two (1 + x)’s in the product, there is C(2, 2) = 1 way to choose 2 of the x’s.

Altogether then, the coefficient on $x^0$ is C(2, 0) (“choose 0 of the x’s”), that on $x^1$ is C(2, 1) (“choose 1 of the x’s”), and that on $x^2$ is C(2, 2) (“choose 2 of the x’s”). That is:

$$(1+x)^2 = \binom{2}{0} x^0 + \binom{2}{1} x^1 + \binom{2}{2} x^2 = 1 + 2x + x^2.$$

Exercise 54. (Answer on p. 345.) Mimicking what was just done above, explain why

$$(1+x)^3 = \binom{3}{0} x^0 + \binom{3}{1} x^1 + \binom{3}{2} x^2 + \binom{3}{3} x^3.$$

More generally, we have

Fact 5. Let $n \in \mathbb{Z}^+$. Then

$$(x+y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2 + \cdots + \binom{n}{n} x^0 y^n.$$

41.3 The Number of Subsets of a Set is $2^n$

By plugging x = 1, y = 1 into the last fact, we see that $(1+1)^n = 2^n$ is the sum of the terms in the nth row of Pascal’s triangle:

Fact 6. Let $n \in \mathbb{Z}^+$. Then

$$2^n = \sum_{i=0}^{n} \binom{n}{i} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$$

There’s a nice combinatorial interpretation of the above fact (Poincaré’s quote at work again).

Consider the set S = {A, B}. S has $2^2 = 4$ subsets: ∅ = {}, {A}, {B}, and S = {A, B}.

Now consider the set T = {A, B, C}. T has $2^3 = 8$ subsets: ∅ = {}, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, and T = {A, B, C}.

In general, if a set has n elements, how many subsets does it have?
We can couch this in the framework of the Multiplication Principle: this is really a sequence of n decisions of whether or not to include each element in the subset. There are 2 choices for each decision. Thus, there are $2^n$ choices altogether. In other words, using a set of n elements, we can form $2^n$ subsets.

But of course, this must in turn be equal to the sum of the following:

• C(n, 0) ways to form subsets with 0 elements;
• C(n, 1) ways to form subsets with 1 element;
• C(n, 2) ways to form subsets with 2 elements;
...
• C(n, n) ways to form subsets with n elements.

Thus,

$$2^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$$

Exercise 55. Verify that $2^7 = \binom{7}{0} + \binom{7}{1} + \binom{7}{2} + \cdots + \binom{7}{7}$. (Answer on p. 345.)

Exercise 56. Using what you’ve learnt, write down the binomial expansion of $(3+x)^4$. (Answer on p. 346.)

Exercise 57. (Answer on p. 346.)

(a) The Tan family has 4 sons and the Wong family has 3 daughters. Using the sons and daughters from these two families, how many ways are there of forming 2 heterosexual couples?
(b) The Lee family has 6 sons and the Ho family has 9 daughters. Using the sons and daughters from these two families, how many ways are there of forming 5 heterosexual couples?

42 Probability: Introduction

42.1 Mathematical Modelling

All models are wrong, but some are useful.
- G.E.P. Box, p. 202 in Robustness in Statistics.

Whenever we use maths in a real-world scenario, we have some mathematical model in mind. Here’s a very simple example just to illustrate:

Example 141. We want to know how much material to purchase, in order to build a fence around a field. We might go through these steps:

1. Formulate a mathematical model: Our field is the shape of a rectangle, with length 100 m and breadth 50 m.
2. Analyse: The rectangle has perimeter 100 + 50 + 100 + 50 = 300 m.
3.
Apply the results of our analysis: We need to buy enough material to build a 300-metre long fence.

The figure below depicts how mathematical modelling works. Starting with some real-world scenario, we go through these steps:

1. Formulate a mathematical model. That is, describe the real-world scenario in mathematical language and concepts. This first step is arguably the most important. It is often subjective: not everyone will agree that your mathematical model is the most appropriate for the scenario at hand.
To use the above example, the field may not be a perfect rectangle, so some may object to your description of the field as a rectangle. Nonetheless, you may decide that all things considered, the rectangle is a good mathematical model.

2. Analyse the model. This involves using maths and the rules of logic. (A-level maths exams tend to be mostly concerned with this second step.)
In the above example, this second step simply involved computing the perimeter of the rectangle: 100 + 50 + 100 + 50 = 300 m. Of course, for the A-levels, you can expect the analysis to be more challenging than this.
Note that this second step, in contrast to the first, is supposed to be completely watertight, non-subjective, and with no room for disagreement. After all, hardly anyone reasonable could disagree that a perfect rectangle with length 100 m and breadth 50 m has perimeter 300 m.

3. Apply your results. Now apply the results of your analysis to the real-world scenario.
In the above example, pretend you’re a mathematical consultant hired by the fence-builder. Then your final report might simply say, “We recommend the purchase of 300 m worth of fence material.”
This third and last step is, like the first, subjective and open to debate. It involves your interpretation of what the results of your analysis mean (in the real world) and your recommendation of what actions to take.
For example, you find that the fence will have perimeter 300 m and thus recommend that 300 m of fence material be purchased. However, someone else, looking at the same result, might point out that the corners of the fence require additional or special material; she might thus make a slightly different recommendation.

We’ve secretly always been using mathematical modelling; we just haven’t always been terribly explicit about it. The foregoing discussion was placed here because, with probability and statistical models, we want to be especially clear that we are doing mathematical modelling.

42.2 The Experiment as a Model of Scenarios Involving Chance

Real-world scenarios often involve chance. We can model such scenarios mathematically using a mathematical object called the experiment. The experiment can be formally defined, but we shall not do so in this textbook. Instead, we’ll merely discuss the experiment informally, with the aid of examples.[13]

Example 142. A coin flip is an example of an experiment. There are two possible outcomes: H and T.

Example 143. A die roll is an example of an experiment. There are six possible outcomes: 1, 2, 3, 4, 5, and 6.

An event is simply any set of possible outcomes.

Example 144. In the die roll experiment, an example of an event is A = {1, 3, 5}. This is the event that the die roll is odd. The probability of this event occurring is 0.5. We may write P(A) = 0.5.
Another example of an event is B = {2, 4, 6}. This is the event that the die roll is even. The probability of this event occurring is 0.5. We may write P(B) = 0.5.
Another example of an event is C = {1}. This is the event that the die roll is 1. The probability of this event occurring is 1/6. We may write P(C) = 1/6.

Exercise 58. (Answer on p. 347.) For each of the following experiments, list the possible outcomes. State the probability of the given event.
(a) You pick, at random, a card from a standard 52-card deck.
The event A is the event that we get a spade.
(b) You flip two fair coins. The event B is the event that both coin-flips are the same.
(c) You roll two fair dice. The event C is the event that the dice sum to 9.

[13] See my H2 Mathematics Textbook for a thorough, rigorous, and formal discussion.

42.3 Mutually Exclusive Events

To say that two events A and B are mutually exclusive (or disjoint) is to say, informally, that:
If A occurs, this means that B cannot possibly have occurred. And if B occurs, this means that A cannot possibly have occurred.

Example 145. Consider the events A = {1, 3, 5}, B = {2, 4, 6}, and C = {1} in the die-roll experiment.
• The events A and B are mutually exclusive.
• The events B and C are mutually exclusive.
• But the events A and C are not mutually exclusive.

Example 146. We randomly pick a student from the student population. D is the event that the student is taller than 1.8 m; E is the event that the student is shorter than 1.6 m; and F is the event that the student is male.
• The events D and E are mutually exclusive.
• But the events E and F are not mutually exclusive.
• Nor are the events D and F.

Example 147. We randomly pick a car in the carpark. G is the event that the car is blue. H is the event that the car is a Mercedes-Benz. I is the event that the car has only two seats. Of the three events given, no two are mutually exclusive.

Exercise 59. We randomly pick a student from the student population. A is the event that this student has an iPhone. B is the event that this student has exactly one phone. C is the event that this student has at least two phones. (i) Are A and B mutually exclusive? (ii) A and C? (iii) B and C? (Answer on p. 347.)

42.4 Complementary Events

Let A be an event. Its complement, the event A′ (also denoted $A^c$), is the set of all outcomes other than those in A.

Example 148.
Consider the events A = {1, 2}, B = {2, 3, 5}, and C = {1} in the die-roll experiment. Their complements are A′ = {3, 4, 5, 6}, B′ = {1, 4, 6}, and C′ = {2, 3, 4, 5, 6}.

Example 149. We randomly pick a student from the student population. D is the event that the student is taller than 1.8 m. Its complement is D′, the event that the student is 1.8 m or shorter.

Example 150. We randomly pick a car in the carpark. G is the event that the car is blue. Its complement is G′, the event that the car is not blue.

Exercise 60. We randomly pick a student from the student population. A is the event that this student has exactly one phone. B is the event that this student has two phones. What are the complements A′ and B′? (Answer on p. 347.)

42.5 The Union of Two Events

Example 151. Flip three fair coins. The possible outcomes are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Let A be the event that there is at least 1 tail, B be the event that there are at least 2 heads, and C be the event that there are at least 3 tails. That is,

A = {HHT, HTH, HTT, THH, THT, TTH, TTT},
B = {HHH, HHT, HTH, THH},
C = {TTT}.

A ∪ B is the event that there is at least 1 tail OR there are at least 2 heads. A ∪ C is the event that there is at least 1 tail. B ∪ C is the event that there are at least 3 tails OR there are at least 2 heads.

A ∪ B = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
A ∪ C = A = {HHT, HTH, HTT, THH, THT, TTH, TTT},
B ∪ C = {HHH, HHT, HTH, THH, TTT}.

Exercise 61. Roll two dice. Let A be the event that the sum of the rolls is even; B be the event that it is 11 or 12; and C be the event that it is odd. Write down the probabilities of the events A, B, C, A ∪ B, A ∪ C, and B ∪ C. (Answer on p. 348.)

42.6 The Intersection of Two Events

Example 152. Flip three fair coins.
The possible outcomes are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. As before, let A be the event that there is at least 1 tail, B be the event that there are at least 2 heads, and C be the event that there are at least 3 tails.

A ∩ B is the event that there is at least 1 tail AND there are at least 2 heads. A ∩ C is the event that there are at least 3 tails. B ∩ C is the event that there are at least 3 tails AND there are at least 2 heads.

A ∩ B = {HHT, HTH, THH},
A ∩ C = C = {TTT},
B ∩ C = {}.

Note that B ∩ C is the empty event. That is, it is the event that contains no outcomes.

Exercise 62. Roll two dice. As before, let A be the event that the sum of the rolls is even; B be the event that it is 11 or 12; and C be the event that it is odd. Write down the probabilities of the events A ∩ B, A ∩ C, and B ∩ C. (Answer on p. 348.)

42.7 Properties of Probabilities

Let A and B be events. Probabilities must satisfy the following properties.

1. Non-negativity: P(A) ≥ 0.
2. Normalisation: P(S) = 1, where S is the set of all possible outcomes.
3. Sum of two mutually exclusive events: If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
4. Complements: P(A) = 1 − P($A^c$).
5. Monotonicity: If every outcome in B is also in A, then P(B) ≤ P(A).
6. Probabilities are at most 1: P(A) ≤ 1.
7. Inclusion-Exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Venn diagrams are helpful for illustrating probabilities. Those below help to illustrate four of the above properties.

Exercise 63. Illustrate each of the following two properties with a Venn diagram:
(a) “If two events A and B are mutually exclusive, then P(A ∩ B) = 0.”
(b) “Let A, B, and C be events. Then P(A ∪ B ∪ C) = P(A) + P($A^c$ ∩ B) + P($A^c$ ∩ $B^c$ ∩ C).”
(Answer on p. 348.)

43 Probability: Conditional Probability

Example 153. Flip three fair coins.
The possible outcomes are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Let A be the event that there is at least 1 tail and B be the event that there are at least 2 heads. That is,

A = {HHT, HTH, HTT, THH, THT, TTH, TTT},
B = {HHH, HHT, HTH, THH}.

Question: You are told that A occurred; what then is the probability that B also occurred?

The desired probability is called the conditional probability of B given A and is denoted P(B∣A). It is given by

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{3/8}{7/8} = \frac{3}{7}.$$

Explanation: There is probability 7/8 that A occurred. There is probability 3/8 that both A and B occurred (the outcomes HHT, HTH, and THH). Thus, given that A occurred, the probability that B also occurred is 3/7.

Example 154. A and B are events, with P(A) = 0.5, P(B) = 0.6, and P(A ∩ B) = 0.2. Hence, given that B has occurred, the probability that A has also occurred is simply 0.2/0.6 = 1/3. (The information that P(A) = 0.5 is irrelevant.) Formally:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.6} = \frac{1}{3}.$$

Exercise 64. Roll two dice. Given that the sum of the two dice rolls is 8, what is the probability that we rolled at least one even number? (Answer on p. 349.)

44 Probability: Independence

Informally, two events A and B are independent if the probability that both occur is simply the product of the probabilities that each occurs. Independence is thus analogous to the Multiplication Principle (MP) from counting. Formally:

Definition 4. Two events A, B ∈ Σ are independent if P(A ∩ B) = P(A)P(B).

There is a second, equivalent perspective of independence. Informally, two events A and B are independent if the probability that A occurs is independent of whether B has occurred. Formally:

Fact 7. Suppose P(B) ≠ 0. Then A, B are independent events ⇐⇒ P(A∣B) = P(A).

Proof. By definition of conditional probabilities, P(A∣B) = P(A ∩ B)/P(B) (equation 1). By definition of independence, P(A ∩ B) = P(A)P(B) (equation 2).
Plugging the second equation into the first, we have P(A∣B) = P(A)P(B)/P(B) = P(A), as desired.

Example 155. Flip two fair coins. Let H1 be the event that the first coin flip is Heads, that is, H1 = {HH, HT}. Analogously define T1, H2, and T2.

The intuitive idea of independence is easy to grasp. If we say that the two coin flips are independent, what we mean is that the following four conditions are true:

1. H1 and H2 are independent. (The probability that the second flip is heads is independent of whether the first flip is heads.)
2. H1 and T2 are independent. (The probability that the second flip is tails is independent of whether the first flip is heads.)
3. T1 and H2 are independent. (The probability that the second flip is heads is independent of whether the first flip is tails.)
4. T1 and T2 are independent. (The probability that the second flip is tails is independent of whether the first flip is tails.)

Formally:

1. P(H1 ∩ H2) = P({HH}) = P(H1)P(H2) = P({HH, HT}) ⋅ P({HH, TH}) = 0.5 × 0.5 = 0.25.
2. P(H1 ∩ T2) = P({HT}) = P(H1)P(T2) = P({HH, HT}) ⋅ P({HT, TT}) = 0.5 × 0.5 = 0.25.
3. P(T1 ∩ H2) = P({TH}) = P(T1)P(H2) = P({TH, TT}) ⋅ P({HH, TH}) = 0.5 × 0.5 = 0.25.
4. P(T1 ∩ T2) = P({TT}) = P(T1)P(T2) = P({TH, TT}) ⋅ P({HT, TT}) = 0.5 × 0.5 = 0.25.

Example 156. Flip a fair coin and roll a fair die. Consider the event “Heads” E1 = {H1, H2, H3, H4, H5, H6}, and the event “Roll an odd number” E2 = {H1, H3, H5, T1, T3, T5}. (Here H1 denotes the outcome “heads and a roll of 1”, and so on.) These two events E1 and E2 are independent, as we now verify:

$$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)} = \frac{3/12}{6/12} = \frac{1}{2} = P(E_1).$$

More broadly, we can even say that the coin flip and die roll are independent. Informally, this means that the outcome of the coin flip has no influence on the outcome of the die roll, and vice versa.

The idea of independence is a little tricky to illustrate on a Venn diagram. I’ll try anyway.

Example 157.
The Venn diagram below illustrates a sample space with 100 equally likely outcomes (represented by 100 small squares). The event A is highlighted in red. The event B is highlighted in blue. P(A) = 0.2 (A is made of 20 small squares). P(B) = 0.1 (B is made of 10 small squares). The event A ∩ B, coloured in green, is made of 2 small squares, so P(A ∩ B) = 0.02. We compute

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.02}{0.1} = 0.2.$$

We observe that P(A) = 0.2 = P(A∣B). And so by Fact 7, we conclude that the events A and B are independent.

Exercise 65. Symmetry of Independence. In Fact 7, we showed that “A, B independent ⇐⇒ P(A∣B) = P(A)”. Now prove that “A, B are independent events ⇐⇒ P(B∣A) = P(B).” (Answer on p. 349.)

Exercise 66. (Answer on p. 349.) An example of a transitive relation is equality: If A = B and B = C, then A = C. Another example is ≤: If A ≤ B and B ≤ C, then A ≤ C. In contrast, independence is not transitive, as this exercise will demonstrate. That is, even if A and B are independent, and B and C are independent, it may not be that A and C are also independent.
Flip two fair coins. Let H1 be the event that the first coin flip is heads, H2 be the event that the second is heads, and T1 be the event that the first flip is tails. Show that
(a) H1 and H2 are independent.
(b) H2 and T1 are independent.
(c) H1 and T1 are not independent.

45 Probability: Not Everything is Independent

The idea of independence is intuitively easy to grasp. Indeed, so much so that students often assume that “everything is independent”. This is a mistake. Unless you’re explicitly told, NEVER assume that two events are independent.

Here are two examples where the assumption of independence is plausible:

Example 158. The event “coin-flip #1 is heads” and the event “coin-flip #2 is heads” are probably independent.

Example 159.
The event “die-roll #1 is 3” and the event “die-roll #2 is 6” are probably independent.

Here are two examples where the assumption of independence is not plausible:

Example 160. The event “Google’s share price rises today” is probably not independent of the event “Apple’s share price rises today”.

Example 161. The event “it rains in Singapore today” is probably not independent of the event “it rains in Kuala Lumpur today”.

Nonetheless, the assumption of independence is frequently, and incorrectly, made even when it is implausible. One reason is that the maths is easy if we assume independence: we can simply multiply probabilities together.

Exercise 67. (Answer on p. 349.) Say the probability that a randomly-chosen person is or was an NBA player is one in a million. (This is probably about right, since there’ve only ever been 4,000 or so NBA players since the late 1940s.)
The Barry family had four players in the NBA: the father Rick Barry and three of his four sons Jon, Brent, and Drew. (The oldest son Scooter didn’t make the NBA but was still good enough to play professionally in other basketball leagues around the world.)
A journalist concludes that the probability of a Barry family ever occurring is

$$\left(\frac{1}{1{,}000{,}000}\right)^4 = \frac{1}{1{,}000{,}000{,}000{,}000{,}000{,}000{,}000{,}000}.$$

This is equal to the probability of buying a 4D number on six consecutive weeks, and winning first prize every time. Is the journalist correct?

46 Random Variables: Introduction

Informally, a random variable assigns a numerical code to each possible outcome. A bit more formally, it is a function that maps each outcome to a real number.

Example 162. Flip a fair coin. Let X be the random variable that indicates whether the coin-flip is heads. So X(H) = 1 and X(T) = 0. (A bit more formally, we say that X is the function that maps the outcome H to the number 1 and the outcome T to the number 0.)
We refer to 1 and 0 as the possible observed values of the random variable X. These correspond to the two possible outcomes of the coin-flip experiment.

Example 163. Flip three fair coins. Let Y be the random variable that counts the number of heads. So Y(TTT) = 0, Y(HTT) = Y(THT) = Y(TTH) = 1, Y(HHT) = Y(HTH) = Y(THH) = 2, and Y(HHH) = 3. We refer to 0, 1, 2, and 3 as the possible observed values of the random variable Y.
Let A be the random variable that indicates whether there are at least 2 heads. So A(HHH) = A(HHT) = A(HTH) = A(THH) = 1 and A(TTT) = A(TTH) = A(THT) = A(HTT) = 0. We refer to 1 and 0 as the possible observed values of the random variable A.

Example 164. Draw a card from a standard 52-card deck. In bridge, an ace is worth 4 high card points, a king 3, a queen 2, and a jack 1. Any other card is worth 0 points. So we might let B be the corresponding random variable, where for example B(A♥) = 4, B(J♣) = 1, and B(7♠) = 0.

Exercise 68. Let X be the random variable that is the sum of two fair die-rolls. What are the possible observed values of X? (Answer on p. 349.)

Exercise 69. Let C be the random variable that counts the total number of high card points, in any two randomly-chosen cards from a standard 52-card deck. What are the possible observed values of C? (Answer on p. 350.)

47 Random Variables: Probability Distribution

The notation X = k is shorthand for the event that contains all the outcomes s such that X(s) = k. The notations “X ≥ k”, “X > k”, “X ≤ k”, “X < k”, “a ≤ X ≤ b”, etc. are similarly defined.

Example 162 (continued from above). Recall the fair coin-flip. Let A be the event that the coin-flip is heads and B be the event that the coin-flip is tails. So P(A) = 0.5 and P(B) = 0.5.
Let X be the random variable that indicates whether the coin-flip is heads. That is, X(H) = 1 and X(T) = 0.
By our newly-introduced notation, we can also write P(X = 1) = 0.5 and P(X = 0) = 0.5. We also have P(X ≤ 1) = P(X = 0) + P(X = 1) = 1.

Example 163 (continued from above). Recall the three fair coin-flips. Let C, D, E, and F be the events that there are 0, 1, 2, and 3 heads. So P(C) = 1/8, P(D) = 3/8, P(E) = 3/8, and P(F) = 1/8.
Let Y be the random variable that counts the number of heads. By our newly-introduced notation, we can also write P(Y = 0) = 1/8, P(Y = 1) = 3/8, P(Y = 2) = 3/8, and P(Y = 3) = 1/8. We also have P(Y ≤ 2) = P(Y = 0) + P(Y = 1) + P(Y = 2) = 7/8.

Example 164 (continued from above). Recall the high card point count in bridge. Randomly choose a card from a standard 52-card deck. Let G be its high card point count. By our newly-introduced notation, we can write P(G = 0) = 9/13, P(G = 1) = 1/13, P(G = 2) = 1/13, P(G = 3) = 1/13, and P(G = 4) = 1/13. We also have P(G > 2) = P(G = 3) + P(G = 4) = 2/13.

The probability distribution (or probability law or probability mass function) of a random variable X is a complete specification of P(X = k), for all possible observed values k (of the random variable X).

In the above examples, we gave the probability distributions of several random variables. More examples of random variables and their probability distributions:

Example 165. Flip two fair coins. The four possible outcomes are HH, HT, TH, and TT. Let X indicate whether the two coin flips are the same and Y count the number of heads. That is,

X(HH) = 1, X(HT) = 0, X(TH) = 0, X(TT) = 1,
Y(HH) = 2, Y(HT) = 1, Y(TH) = 1, Y(TT) = 0.

And so the probability distribution of X is P(X = 0) = 0.5, P(X = 1) = 0.5. And the probability distribution of Y is P(Y = 0) = 0.25, P(Y = 1) = 0.5, P(Y = 2) = 0.25.

Another example:

Example 166. Pick a random card from the standard 52-card deck. The 52 possible outcomes are A♠, K♠, . . . , 2♠, A♥, K♥, . . .
, 2♥, A♦, K♦, . . . , 2♦, A♣, K♣, . . . , 2♣. Let Y indicate whether the picked card is a spade (♠). That is, Y(Any ♠) = 1, Y(Any other card) = 0. So the probability distribution of Y is:

$$P(Y = 0) = \frac{39}{52}, \qquad P(Y = 1) = \frac{13}{52}.$$

Example 167. Roll two fair dice. The 36 possible outcomes are (1, 1), . . . , (1, 6), (2, 1), . . . , (2, 6), . . . , (6, 1), . . . , (6, 6), where (a, b) records that the first die shows a and the second shows b.

Let X be the sum of the two dice, so that X((a, b)) = a + b.
[Figure: two example outcomes, with X = 7 and X = 5 respectively.]

The table below says that P(X = 2) = 1/36, because there is only one way the event X = 2 can occur. And P(X = 3) = 2/36, because there are two ways the event X = 3 can occur. You are asked to complete the table in the next exercise.

k     s such that X(s) = k     P(X = k)
2     (1, 1)                   1/36
3     (1, 2), (2, 1)           2/36
4
5
6
7
8
9
10
11
12

Exercise 70. (Continuation of the above example.) (Answer on p. 350.)
(a) Complete the above table.
Consider the event E, described in words as “the sum of the two dice is at least 10”.
(b) Write down the event E in terms of X.
(c) Calculate P(E).

48 Random Variables: Independence

Informally, two random variables are independent if knowing the value of one does not tell us anything about the value of the other.

Example 168. Flip a fair coin twice. The four possible outcomes are HH, HT, TH, TT. When we say that “the two coin-flips are independent”, what exactly do we mean by this? Let’s rephrase this statement slightly more formally.
Let A indicate whether the first coin-flip was heads and B indicate whether the second was heads. That is,

A(HH) = 1, A(HT) = 1, A(TH) = 0, A(TT) = 0,
B(HH) = 1, B(HT) = 0, B(TH) = 1, B(TT) = 0.

The following two statements are equivalent:
1. “The two coin-flips are independent.”
2.
“The random variables A and B are independent.”

Informally, the second statement says that knowing the value of A (whether the first coin-flip was heads or not) tells us absolutely nothing about the value of B (whether the second coin-flip was heads or not).
For example, if we know that A = 1, then P(B = 0) = 0.5 and P(B = 1) = 0.5. And if we know instead that A = 0, then P(B = 0) = 0.5 and P(B = 1) = 0.5. Thus, knowing whether A = 1 or A = 0 makes absolutely no difference to what we can say about B.

Formally:

Definition 5. Given random variables X and Y, we say that X and Y are independent if for all x, y, P(X = x, Y = y) = P(X = x)P(Y = y).

Example 168 (continued from above). It may be “obvious”, even without proof, that “the two coin-flips are independent”. But as an exercise, let’s formally prove that this is so, using the above formal definition.
A and B remain the random variables indicating whether the first and second coin-flips are heads (respectively). We now verify that indeed, P(A = a, B = b) = P(A = a)P(B = b) for all possible values of a and b:

P(A = a, B = b)             P(A = a)P(B = b)
P(A = 0, B = 0) = 0.25      P(A = 0)P(B = 0) = 0.5 × 0.5   ✓
P(A = 1, B = 0) = 0.25      P(A = 1)P(B = 0) = 0.5 × 0.5   ✓
P(A = 0, B = 1) = 0.25      P(A = 0)P(B = 1) = 0.5 × 0.5   ✓
P(A = 1, B = 1) = 0.25      P(A = 1)P(B = 1) = 0.5 × 0.5   ✓

The above method for proving that two random variables are independent becomes especially useful when it is not immediately “obvious” that they are independent:

Exercise 71. Flip two fair coins. Let X indicate whether the two coin flips were the same and Y count the number of heads. Are X and Y independent random variables? (Answer on p. 350.)

Earlier we warned against blithely assuming that any two events are independent. Here we can repeat this warning: Unless explicitly told (or you have a good reason), do not assume that two random variables are independent.
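When the full list of outcomes is available, Definition 5 can be checked mechanically rather than assumed. The following is a minimal sketch that repeats the verification for the random variables A and B of Example 168. The helper names `p` and `independent` are illustrative assumptions, not notation from this textbook.

```python
from fractions import Fraction
from itertools import product

# Two fair coin-flips: four equally likely outcomes.
outcomes = ["HH", "HT", "TH", "TT"]
prob = {s: Fraction(1, 4) for s in outcomes}

def A(s):
    """Indicates whether the first coin-flip was heads."""
    return 1 if s[0] == "H" else 0

def B(s):
    """Indicates whether the second coin-flip was heads."""
    return 1 if s[1] == "H" else 0

def p(event):
    """Probability of the set of outcomes s for which event(s) is true."""
    return sum(prob[s] for s in outcomes if event(s))

def independent(X, Y):
    """Check P(X = x, Y = y) = P(X = x) P(Y = y) for all observed values x, y."""
    xs = {X(s) for s in outcomes}
    ys = {Y(s) for s in outcomes}
    return all(
        p(lambda s: X(s) == x and Y(s) == y)
        == p(lambda s: X(s) == x) * p(lambda s: Y(s) == y)
        for x, y in product(xs, ys)
    )

print(independent(A, B))  # True, matching the four ticks in Example 168
print(independent(A, A))  # False: a random variable is not independent of itself here
```

Exact fractions are used instead of floating-point numbers so that the equality test in `independent` is not affected by rounding.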
The assumption of independence is a strong one. There are many scenarios where it is plausible. For example, the flips of two coins are probably independent. The rolls of two dice are probably independent.

There are, however, also many scenarios where it is not plausible. Today’s changes in the share prices of Google and Apple are probably not independent. Today’s rainfall in Singapore and in Kuala Lumpur are probably not independent.

Nonetheless, the assumption of independence is frequently, and incorrectly, made even when it is implausible. The reason is that the maths is easy if we assume independence: we can simply multiply probabilities together. Unfortunately, incorrectly assuming independence can sometimes have tragic consequences.

49 Random Variables: Expectation

Example 169. Let X be the outcome of a fair die roll. Informally, the expected value (or the mean) of X is the average expected outcome of a fair die roll.
Note that X takes on a value 1 with probability 1/6. Similarly, it takes on a value 2 with probability 1/6. Etc. Hence, the expected value of X, denoted E[X], is given by:

$$E[X] = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5.$$

On average, we expect the outcome of a fair die roll to be 3.5.

A bit more formally, the expected value or mean of a random variable X, denoted E[X], is simply a weighted average of the possible observed values of X, where the weights are simply given by the probability that the random variable takes on each possible observed value.

Given a random variable X, its mean is usually denoted $\mu_X$. If it’s obvious from the context that we’re talking about the random variable X, we drop the subscript X and simply use µ to denote the mean of X.

Example 170. Let Y be the sum of two fair die-rolls. In Exercise 70, we worked out that P(Y = 2) = 1/36, P(Y = 3) = 2/36, etc.
Thus:

$$\mu_Y = P(Y = 2) \cdot 2 + P(Y = 3) \cdot 3 + P(Y = 4) \cdot 4 + P(Y = 5) \cdot 5 + \dots + P(Y = 12) \cdot 12$$

$$= \frac{1}{36} \cdot 2 + \frac{2}{36} \cdot 3 + \frac{3}{36} \cdot 4 + \frac{4}{36} \cdot 5 + \frac{5}{36} \cdot 6 + \frac{6}{36} \cdot 7 + \frac{5}{36} \cdot 8 + \frac{4}{36} \cdot 9 + \frac{3}{36} \cdot 10 + \frac{2}{36} \cdot 11 + \frac{1}{36} \cdot 12$$

$$= \frac{2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12}{36} = \frac{252}{36} = 7.$$

Example 171. Flip two fair coins and roll two fair dice. Let X be the number of heads and Y be the number of sixes.
Problem: What is E[X + Y]?
As it turns out, it is generally true that E[X + Y] = E[X] + E[Y] (as we’ll see in the next section). So if we knew this, then the problem is very easy:

$$E[X + Y] = E[X] + E[Y] = 1 + \frac{1}{3} = \frac{4}{3}.$$

But as an exercise, let’s pretend we don’t know that E[X + Y] = E[X] + E[Y]. We thus have to work out E[X + Y] the hard way:
First, note that the possible observed values of X + Y are 0, 1, 2, 3, 4. P(X + Y = 0) is the probability of 0 heads and 0 sixes. And P(X + Y = 1) is the probability of 1 head and 0 sixes OR 0 heads and 1 six. We can compute:

$$P(X + Y = 0) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} = \frac{25}{144},$$

$$P(X + Y = 1) = \binom{2}{1} \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} + \frac{1}{2} \cdot \frac{1}{2} \cdot \binom{2}{1} \frac{5}{6} \cdot \frac{1}{6} = \frac{50}{144} + \frac{10}{144} = \frac{60}{144}.$$

You are asked to complete the rest of this problem in the exercise below.

Exercise 72. Complete the above example by following these steps: (a) Compute P(X + Y = 2). (b) Compute P(X + Y = 3). (c) Compute P(X + Y = 4). (d) Now compute E[X + Y]. (Answer on p. 351.)

Exercise 73. In the game of 4D, you pay $1 to pick any four-digit number between 0000 and 9999 (there are thus 10,000 possible choices). There are two variants of the 4D game: “big” and “small”. The prize structures are as given below. Let X be the prize received from a $1 stake in the “big” game and Y be the prize received from a $1 stake in the “small” game. (Answer on p. 352.)
(a) Write down the possible observed values of X and Y.
(b) Write down the probability distributions of X and Y.
(c) Hence find E[X] and E[Y].
(d) Which game, “big” or “small”, is expected to lose you less money?
(Source: Singapore Pools, “Rules for the 4-D Game”, Version 1.11, 17/11/15, PDF.)

49.1 The Expectation Operator is Linear

Example 172. The differentiation operator $\frac{d}{dx}$ is an example of a linear transformation, because it satisfies the following two conditions:

$$\frac{d}{dx}\left(f(x) + g(x)\right) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x), \quad \text{and} \quad \frac{d}{dx}\left(k f(x)\right) = k \frac{d}{dx} f(x).$$

A common mistake made by students is to believe that “everything is linear”. Here are two examples of operators that are not linear transformations.

Example 173. The square-root operator $\sqrt{\cdot}$ is not a linear transformation, because in general, we do not have

$$\sqrt{x + y} = \sqrt{x} + \sqrt{y}, \quad \text{or} \quad \sqrt{kx} = k\sqrt{x}.$$

Example 174. The square operator $(\cdot)^2$ is not a linear transformation, because in general, we do not have

$$(x + y)^2 = x^2 + y^2, \quad \text{or} \quad (kx)^2 = kx^2.$$

It turns out that the expectation operator E is a linear transformation. That is, if X and Y are random variables and c is a constant, then

$$E[X + Y] = E[X] + E[Y], \quad \text{and} \quad E[cX] = cE[X].$$

The expectation operator is linear. This is true even if independence is not satisfied, which makes it an especially powerful property. Example:

Example 175. I stake $100 on each of two different 4D numbers for Saturday’s drawing (“big” game). (So that’s $200 total.) Let X and Y be my winnings (excluding my original stake) from the first and second numbers (respectively).
Now, X and Y are certainly not independent because for example, if my first number wins first prize, then my second number cannot possibly also win first prize. Nonetheless, despite X and Y not being independent, the linearity of the expectation operator tells us that

E[X + Y] = E[X] + E[Y] = $65.90 + $65.90 = $131.80.
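Linearity without independence can also be checked numerically on a small case. The sketch below does not use the 4D game; instead it assumes a simpler, hypothetical pair of dependent random variables built on one fair die roll: X is the number rolled and Y indicates whether the roll is even. The function name `expectation` is likewise an illustrative assumption.

```python
from fractions import Fraction

# One fair die roll: six equally likely outcomes.
outcomes = range(1, 7)
p = Fraction(1, 6)

def expectation(f):
    """Probability-weighted average of f over the six outcomes."""
    return sum(p * f(s) for s in outcomes)

def X(s):
    """The number rolled."""
    return s

def Y(s):
    """Indicates whether the roll is even."""
    return 1 if s % 2 == 0 else 0

# X and Y are not independent: P(X = 2, Y = 1) = 1/6,
# but P(X = 2) * P(Y = 1) = 1/6 * 1/2 = 1/12.
lhs = expectation(lambda s: X(s) + Y(s))   # E[X + Y], computed directly
rhs = expectation(X) + expectation(Y)      # E[X] + E[Y]
print(lhs, rhs)  # 4 4

# Scaling by a constant: E[3X] = 3 E[X].
print(expectation(lambda s: 3 * X(s)) == 3 * expectation(X))  # True
```

Here E[X] = 7/2 and E[Y] = 1/2, so both sides equal 4 even though knowing X determines Y completely.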
50 Random Variables: Variance

Example 176. Consider a random variable X that is equally likely to take on one of 5 possible values: 0, 1, 2, 3, 4. Its mean is

$$\mu_X = \sum_k P(X = k) \cdot k = \frac{1}{5} \cdot 0 + \frac{1}{5} \cdot 1 + \frac{1}{5} \cdot 2 + \frac{1}{5} \cdot 3 + \frac{1}{5} \cdot 4 = 2.$$

Now consider another random variable Y that is equally likely to take on one of 5 possible values: −8, −3, 2, 7, 12. Coincidentally, its mean is the same:

$$\mu_Y = \sum_k P(Y = k) \cdot k = \frac{1}{5} \cdot (-8) + \frac{1}{5} \cdot (-3) + \frac{1}{5} \cdot 2 + \frac{1}{5} \cdot 7 + \frac{1}{5} \cdot 12 = 2.$$

The random variables X and Y share the same mean. However, there is an obvious difference: Y is “more spread out”.

What, precisely, do we mean when we say that one random variable is “more spread out” than another? Our goal in this section is to invent a measure of “spread-outness”. We’ll call this the variance and denote the variance of any random variable X by V[X].

It’s not at all obvious how the variance should be defined. One possibility is to define the variance as the weighted average of the deviations from the mean.

Example 176 (continued from above). (Our first proposed definition of variance.) For X, the weighted average of the deviations from the mean is

$$V[X] = \sum_k P(X = k) \cdot (k - \mu) = \frac{0 - \mu}{5} + \frac{1 - \mu}{5} + \frac{2 - \mu}{5} + \frac{3 - \mu}{5} + \frac{4 - \mu}{5}$$

$$= \frac{0 - 2}{5} + \frac{1 - 2}{5} + \frac{2 - 2}{5} + \frac{3 - 2}{5} + \frac{4 - 2}{5} = -\frac{2}{5} - \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = 0.$$

Hmm. This works out to be 0. Is that just a weird coincidence? Let’s try the same for Y:

$$V[Y] = \sum_k P(Y = k) \cdot (k - \mu) = \frac{-8 - \mu}{5} + \frac{-3 - \mu}{5} + \frac{2 - \mu}{5} + \frac{7 - \mu}{5} + \frac{12 - \mu}{5}$$

$$= \frac{-8 - 2}{5} + \frac{-3 - 2}{5} + \frac{2 - 2}{5} + \frac{7 - 2}{5} + \frac{12 - 2}{5} = -2 - 1 + 0 + 1 + 2 = 0.$$

Hmm. Again it works out to be 0. This is no mere coincidence. It turns out that $\sum_k P(X = k) \cdot (k - \mu)$ is always equal to 0. This is because

$$\sum_k P(X = k) \cdot (k - \mu) = \overbrace{\sum_k P(X = k) \cdot k}^{=\mu} - \sum_k P(X = k) \cdot \mu = \mu - \mu \sum_k P(X = k) = 0.$$
So our first proposed definition of the variance (the weighted average of the deviations from the mean) is always equal to 0. Intuitively, the reason is that the negative deviations (corresponding to those values below the mean) exactly cancel out the positive deviations (corresponding to those values above the mean).

This proposed definition is thus quite useless. We cannot use it to say things like Y is “more spread out” than X. This suggests a second approach: define the variance to be the weighted average of the absolute deviations from the mean.

Example 176 (continued from above). (Our second proposed definition of variance.) For X, the weighted average of the absolute deviations from the mean is

\[ V[X] = \sum_k P(X = k)\cdot|k - \mu| = \frac{|0-2|}{5} + \frac{|1-2|}{5} + \frac{|2-2|}{5} + \frac{|3-2|}{5} + \frac{|4-2|}{5} = \frac{2}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = \frac{6}{5}. \]

And now let’s work out the same for Y:

\[ V[Y] = \sum_k P(Y = k)\cdot|k - \mu| = \frac{|-8-2|}{5} + \frac{|-3-2|}{5} + \frac{|2-2|}{5} + \frac{|7-2|}{5} + \frac{|12-2|}{5} = 2 + 1 + 0 + 1 + 2 = 6. \]

Wonderful! So we can now use this second proposed definition of the variance to say things like “Y is more spread out than X”.

This second proposed definition seems perfectly satisfactory. Yet for some bizarre reason, it will not be our actual definition of variance. Instead, the variance will be defined as the weighted average of the squared deviations from the mean.

Example 176 (continued from above). (The actual definition of variance.) For X, the weighted average of the squared deviations from the mean is

\[ V[X] = \sum_k P(X = k)\cdot(k - \mu)^2 = \frac{(0-2)^2}{5} + \frac{(1-2)^2}{5} + \frac{(2-2)^2}{5} + \frac{(3-2)^2}{5} + \frac{(4-2)^2}{5} = \frac{4}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{4}{5} = 2. \]
And now let’s work out the same for Y:

\[ V[Y] = \sum_k P(Y = k)\cdot(k - \mu)^2 = \frac{(-8-2)^2}{5} + \frac{(-3-2)^2}{5} + \frac{(2-2)^2}{5} + \frac{(7-2)^2}{5} + \frac{(12-2)^2}{5} = 20 + 5 + 0 + 5 + 20 = 50. \]

A bit more formally, if X is a random variable and µ is its expected value, then its variance is defined to be the expected value of (X − µ)².

The variance of X is denoted V[X] or σ_X² or even more simply as σ² (if it is clear from the context that we’re talking about the variance of X). So we may write

\[ V[X] = \sigma_X^2 = E\left[(X - \mu)^2\right]. \]

So to calculate the variance, we do this: Consider all the possible values that X can take. Take the difference between these values and the mean of X. Square them. Then take the probability-weighted average of these squared numbers. More examples:

Example 177. Let the random variable X be the outcome of the roll of a fair die. We already know that µ = 3.5. Hence,

\[ V[X] = E\left[(X - \mu)^2\right] = E\left[(X - 3.5)^2\right] = P(X = 1)\cdot(1 - 3.5)^2 + P(X = 2)\cdot(2 - 3.5)^2 + \dots + P(X = 6)\cdot(6 - 3.5)^2 \]
\[ = \frac{1}{6}\left(2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2\right) = \frac{35}{12} \approx 2.92. \]

So the variance of the die roll is 35/12 ≈ 2.92. This means that the expected squared deviation of X from its mean µ = 3.5 is 35/12 ≈ 2.92.

Example 178. Roll two fair dice. Let the random variable Y be the sum of the two dice. We already know from Example 170 that µ = 7. So, using also our findings from Exercise 70,

\[ V[Y] = E\left[(Y - \mu)^2\right] = E\left[(Y - 7)^2\right] = P(Y = 2)\cdot(2 - 7)^2 + P(Y = 3)\cdot(3 - 7)^2 + \dots + P(Y = 12)\cdot(12 - 7)^2 \]
\[ = \frac{1\cdot 5^2 + 2\cdot 4^2 + 3\cdot 3^2 + 4\cdot 2^2 + 5\cdot 1^2 + 6\cdot 0^2 + 5\cdot 1^2 + 4\cdot 2^2 + 3\cdot 3^2 + 2\cdot 4^2 + 1\cdot 5^2}{36} \]
\[ = \frac{2\,(25 + 32 + 27 + 16 + 5)}{36} = \frac{210}{36} = \frac{70}{12} \approx 5.83. \]

So the variance of the sum of two dice is 70/12 ≈ 5.83. This means that on average, the square of the deviation of Y from its mean µ = 7 is 70/12 ≈ 5.83.
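The recipe above (subtract the mean, square, take the probability-weighted average) is mechanical enough to hand to a computer. Here is a minimal Python sketch, assuming a distribution is stored as a dictionary mapping each value to its probability. The helper names `mean` and `variance` are my own and are not notation used elsewhere in this textbook.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """Probability-weighted average of the values in dist (value -> probability)."""
    return sum(p * k for k, p in dist.items())

def variance(dist):
    """Probability-weighted average of the squared deviations from the mean."""
    mu = mean(dist)
    return sum(p * (k - mu) ** 2 for k, p in dist.items())

# Example 177: one fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Example 178: sum of two fair dice, built by listing all 36 equally likely rolls.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

print(variance(die))       # 35/12
print(variance(two_dice))  # 35/6, which is the same number as 70/12
```

Python reduces fractions automatically, which is why the second answer prints as 35/6 rather than 70/12.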
As the above examples suggest, calculating the variance can be tedious. Fortunately, there is a shortcut:

Fact 8. Let X be a random variable with mean µ. Then V[X] = E[X²] − µ².

Proof. Omitted.

We now redo the previous two examples using this shortcut:

Example 177 (continued from above). Let the random variable X be the outcome of the roll of a fair die. We already know that µ = 3.5. So compute

\[ E\left[X^2\right] = P(X = 1)\cdot 1^2 + P(X = 2)\cdot 2^2 + \dots + P(X = 6)\cdot 6^2 = \frac{1}{6}\left(1^2 + 2^2 + \dots + 6^2\right) = \frac{91}{6}. \]

Hence,

\[ V[X] = E\left[X^2\right] - \mu^2 = \frac{91}{6} - 3.5^2 = \frac{182}{12} - \frac{147}{12} = \frac{35}{12}. \]

Example 178 (continued from above). Let the random variable Y be the sum of two rolled dice. We already know from Example 170 that µ = 7. So, using also our findings from Exercise 70,

\[ E\left[Y^2\right] = P(Y = 2)\cdot 2^2 + P(Y = 3)\cdot 3^2 + \dots + P(Y = 12)\cdot 12^2 \]
\[ = \frac{1\cdot 2^2 + 2\cdot 3^2 + 3\cdot 4^2 + 4\cdot 5^2 + 5\cdot 6^2 + 6\cdot 7^2 + 5\cdot 8^2 + 4\cdot 9^2 + 3\cdot 10^2 + 2\cdot 11^2 + 1\cdot 12^2}{36} \]
\[ = \frac{4 + 18 + 48 + 100 + 180 + 294 + 320 + 324 + 300 + 242 + 144}{36} = \frac{1974}{36} = \frac{658}{12}. \]

Hence,

\[ V[Y] = E\left[Y^2\right] - \mu^2 = \frac{658}{12} - 7^2 = \frac{658}{12} - \frac{588}{12} = \frac{70}{12}. \]

Exercise 74. Let the random variable B be the high card point count of a randomly-chosen card from a standard 52-card deck. Find V[B]. (Answer on p. 353.)

50.1 Standard Deviation

Let X be a random variable. Then E[X] has the same unit of measure as X. In contrast, V[X] uses the squared unit.

Example 179. There are 100 dumbbells in a gym, of which 30 have weight 5 kg and the remaining 70 have weight 10 kg. Let X be the weight of a randomly-chosen dumbbell. Then the mean and variance of X are

\[ E[X] = \mu = 0.3 \times 5\ \text{kg} + 0.7 \times 10\ \text{kg} = 8.5\ \text{kg}, \]
\[ V[X] = 0.3 \times (5\ \text{kg} - 8.5\ \text{kg})^2 + 0.7 \times (10\ \text{kg} - 8.5\ \text{kg})^2 = 0.3 \times 12.25\ \text{kg}^2 + 0.7 \times 2.25\ \text{kg}^2 = 5.25\ \text{kg}^2. \]

To get a measure of “spread” that uses the original unit of measure, we simply take the square root of the variance.
This is called the standard deviation.

Definition 6. Let X be a random variable and V[X] be its variance. Then the standard deviation of X is defined as

\[ SD[X] = \sqrt{V[X]}. \]

The variance of a random variable X is often denoted σ_X² or even more simply as σ² (if it is clear from the context that we’re talking about the variance of X). Correspondingly, the standard deviation of X is often denoted σ_X or σ.

Example 179 (continued from above). We calculated the variance of X to be V[X] = σ² = 5.25 kg². Hence, the standard deviation of X is simply σ = √5.25 kg ≈ 2.29 kg.

Exercise 75. There are 100 rulers in a bookstore, of which 35 have length 20 cm and the remaining 65 have length 30 cm. Let Y be the length of a randomly-chosen ruler. Find the mean, variance, and standard deviation of Y. (Be sure to include the units of measurement. Answer on p. 353.)

50.2 Properties of the Variance Operator

1. If X is a random variable and c is a constant, then V[cX] = c²V[X].
2. If X and Y are independent random variables, then V[X + Y] = V[X] + V[Y].

With the above properties, it becomes much easier than before to find the variance of the sum of 2 dice, 3 dice, or indeed n dice.

Example 180. Let X be the outcome of a fair die-roll. We showed earlier that V[X] = 35/12.

Now roll two fair dice. Let X1 and X2 be the respective outcomes. Let Y be the sum of the two dice (i.e. Y = X1 + X2). Assuming independence, we have

\[ V[Y] = V[X_1 + X_2] = V[X_1] + V[X_2] = \frac{35}{12} + \frac{35}{12} = \frac{70}{12}. \]

Compare this quick computation to the work we did in Example 178!

Now roll three fair dice. Let X3, X4, and X5 be the respective outcomes. Let Z be the sum of the three dice (i.e. Z = X3 + X4 + X5). Again, assuming independence, we have

\[ V[Z] = V[X_3 + X_4 + X_5] = V[X_3] + V[X_4] + V[X_5] = \frac{105}{12}. \]

Again, compare this quick computation to the work we would have had to do, without this property!
Now, let A be double the outcome of a die roll (i.e. A = 2X). Note importantly that A ≠ Y. Y is the sum of two independent die rolls. In contrast, A is double the outcome of a single die roll. Indeed, we have that

\[ V[A] = V[2X] = 4V[X] = \frac{140}{12} \neq V[Y]. \]

Similarly, let B be triple the outcome of a die roll (i.e. B = 3X). Note importantly that B ≠ Z. Z is the sum of three independent die rolls. In contrast, B is triple the outcome of a single die roll. Indeed, we have that

\[ V[B] = V[3X] = 9V[X] = \frac{315}{12} \neq V[Z]. \]

It is important to remember that the second property need not hold if X and Y are not independent.

Exercise 76. The weight of a fish in a pond is a random variable with mean µ kg and variance σ² kg². (Include the units of measurement in your answer. Answer on p. 353.)
(a) If two fish are caught and the weights of these fish are independent of each other, what are the mean and variance of the total weight of the two fish?
(b) If one fish is caught and an exact clone is made of it, what are the mean and variance of the total weight of the fish and its clone?
(c) If two fish are caught and the weights of these fish are not independent of each other, what are the mean and variance of the total weight of the two fish?

51 The Binomial Distribution

Example 181. Flip 3 fair coins. Let X be the random variable that counts the number of heads. Then X is an example of a binomial random variable with parameters 3 and 1/2.

Example 182. Flip 4 fair coins. Let Y be the random variable that counts the number of heads. Then Y is an example of a binomial random variable with parameters 4 and 1/2.

Example 183. There are 10 ATMs. On any given day, each has, independently, probability 0.1 of failure. Let Z be the random variable that counts the number of failures on any given day. Then Z is an example of a binomial random variable with parameters 10 and 0.1.

Example 184.
90% of H2 Maths students pass their A-level exams. Let A be the number of passes among 2 randomly-chosen students. Then A is a binomial random variable with parameters 2 and 0.9. Let B be the number of passes among 3 randomly-chosen students. Then B is a binomial random variable with parameters 3 and 0.9.

The following three statements are entirely equivalent:
1. X is a binomial random variable with parameters n and p.
2. The random variable X has the binomial distribution with parameters n and p.
3. X ∼ B(n, p).

51.1 Probability Distribution of the Binomial R.V.

We flip a biased coin n times. On each flip, the coin has probability p of landing on heads. Let X count the number of heads. Then X is the binomial random variable with parameters n and p.

What is P(X = k)? In other words, what is the probability that there are k heads and n − k tails?

First let’s consider instead the probability that the first k coin-flips are heads and the remaining n − k coin-flips are tails. We know that the probability of a heads is p and the probability of a tails is 1 − p. Hence, by the Multiplication Principle, this probability is simply

\[ p^k (1 - p)^{n-k}. \]

The above is the probability of k heads and n − k tails, but where exactly the first k trials are successes and exactly the last n − k trials are failures. But we don’t care about where the successes are. We only care that there are k successes. And there are C(n, k) ways to have exactly k successes in n trials. Thus, our desired probability is:

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}. \]

Example 185. Let X be the number of heads when 10 fair coins are flipped. Then X ∼ B(10, 0.5). And the probability that exactly 8 coins are heads is:

\[ P(X = 8) = \binom{10}{8}\, 0.5^8\, 0.5^2 = \frac{45}{1024}. \]

Example 186. 90% of H2 Maths students pass their A-level exams. Let Y be the number of passes among 20 randomly-chosen students. Then Y ∼ B(20, 0.9).
And the probability that at least 18 pass is

\[ P(Y \ge 18) = P(Y = 18) + P(Y = 19) + P(Y = 20) = \binom{20}{18} 0.9^{18}\, 0.1^{2} + \binom{20}{19} 0.9^{19}\, 0.1^{1} + \binom{20}{20} 0.9^{20}\, 0.1^{0} \approx 0.677. \]

51.2 The Mean and Variance of the Binomial Random Variable

Example 187. Problem: Three machines each have, independently, probability 0.3 of failure. What is the expected number of failures? What is the variance of the number of failures?

Solution: Let Z ∼ B(3, 0.3) be the number of failures. Then

\[ P(Z = 1) = \binom{3}{1} 0.3^1\, 0.7^2, \qquad P(Z = 2) = \binom{3}{2} 0.3^2\, 0.7^1, \qquad P(Z = 3) = \binom{3}{3} 0.3^3\, 0.7^0. \]

Hence,

\[ E[Z] = P(Z = 1)\cdot 1 + P(Z = 2)\cdot 2 + P(Z = 3)\cdot 3 = \binom{3}{1} 0.3^1\, 0.7^2 \cdot 1 + \binom{3}{2} 0.3^2\, 0.7^1 \cdot 2 + \binom{3}{3} 0.3^3\, 0.7^0 \cdot 3 = 0.441 + 0.378 + 0.081 = 0.9. \]

That is, the expected number of failures is 0.9. Now,

\[ E\left[Z^2\right] = P(Z = 1)\cdot 1^2 + P(Z = 2)\cdot 2^2 + P(Z = 3)\cdot 3^2 = \binom{3}{1} 0.3^1\, 0.7^2 \cdot 1^2 + \binom{3}{2} 0.3^2\, 0.7^1 \cdot 2^2 + \binom{3}{3} 0.3^3\, 0.7^0 \cdot 3^2 = 0.441 + 0.756 + 0.243 = 1.44. \]

Hence,

\[ V[Z] = E\left[Z^2\right] - (E[Z])^2 = 1.44 - 0.9^2 = 0.63. \]

That is, the variance of the number of failures is 0.63.

It turns out though that there is a much quicker formula for finding the mean and variance of any binomial random variable.

Fact 9. If X ∼ B(n, p), then E[X] = np and V[X] = np(1 − p).

(You can verify that these formulae work for the last example: n = 3, p = 0.3, and thus E[Z] = np = 0.9 and V[Z] = np(1 − p) = 3 × 0.3 × 0.7 = 0.63.)

Proof. Omitted.

Exercise 77. (Answer on p. 354.) Plane engine #1 contains 20 components, each of which has probability 0.01 of failure. Plane engine #2 contains 35 components, each of which has probability 0.005 of failure. The probability that any component fails is independent of whether any other component has failed. An engine fails if and only if at least 2 of its components fail. What is the probability that both engines fail?
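The binomial formula and Fact 9 are easy to check numerically. Below is a minimal Python sketch that recomputes Examples 186 and 187. The function name `binom_pmf` is my own choice, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb  # comb(n, k) is the binomial coefficient C(n, k)

def binom_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example 186: P(Y >= 18) for Y ~ B(20, 0.9).
p_at_least_18 = sum(binom_pmf(20, 0.9, k) for k in (18, 19, 20))

# Example 187: mean and variance of Z ~ B(3, 0.3), straight from the definitions.
n, p = 3, 0.3
mean = sum(k * binom_pmf(n, p, k) for k in range(n + 1))
var = sum(k**2 * binom_pmf(n, p, k) for k in range(n + 1)) - mean**2

print(round(p_at_least_18, 3))        # 0.677
print(round(mean, 4), round(var, 4))  # 0.9 0.63
```

The last line agrees with Fact 9: np = 0.9 and np(1 − p) = 0.63. The rounding is there only because floating-point arithmetic is not exact.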
52 The Continuous Uniform Distribution

The binomial random variable is discrete, because its range of possible observed values is finite. We’ll now look instead at continuous random variables. Informally, a random variable Y is continuous if its range takes on a continuum of values.

For H1 Maths, you need only learn about one continuous random variable: the normal random variable (subject of the next chapter). Nonetheless, we’ll first look at another continuous random variable that is not in the syllabus. This is the continuous uniform random variable. It is much simpler than the normal random variable and can thus help build up your intuition of how continuous random variables work.

52.1 The Continuous Uniform Distribution

A line measuring exactly 1 metre in length is drawn on the floor. It is about to rain. Let X be the position of the first rain-drop that hits the line. X is measured as the distance (in metres) from the left-most point of the line.

So for example, if the first rain-drop hits the left-most point of the line, then x = 0. If it hits the exact midpoint of the line, then x = 0.5. And if it hits the right-most point, then x = 1.

Assume we can measure X to infinite precision. Then, assuming the first rain-drop is equally likely to hit any point of the line, we can model X as a continuous uniform random variable on [0, 1]. This says that
• The range of X is [0, 1] (the first rain-drop can hit any point along the line); and
• X is equally likely to take on any value in the interval [0, 1] (the first rain-drop is equally likely to hit any point along the line).

The following three statements are entirely equivalent:
1. X is a continuous uniform random variable on [0, 1].
2. X is a random variable with the continuous uniform distribution on [0, 1].
3. X ∼ U[0, 1].

Recall that previously with any discrete random variable Y, we could find its probability distribution.
That is, we could find P(Y = k) (the probability that Y takes on the value k). For example, if Y ∼ B(3, 0.5) modelled the number of heads in three coin-flips, then the probability that there was one heads was

\[ P(Y = 1) = \binom{3}{1} 0.5^1\, 0.5^2 = \frac{3}{8}. \]

Now, in contrast, for any continuous random variable X, strangely enough, there is zero probability that X takes on any particular value! For example, if X ∼ U[0, 1], then P(X = 0.37) = 0. That is, there is zero probability that X takes on the value of 0.37!

At first glance, this may seem strange. But remember: There are infinitely-many real numbers in the interval [0, 1]. So it makes sense to say that the probability of X taking on any particular value is zero.¹⁴

So for any continuous random variable X, it is pointless to try to write down P(X = k) for different possible values of k, because P(X = k) is always equal to zero (regardless of what k is). Instead, we shall try to write down P(a ≤ X ≤ b), for different possible values of a and b.

Now, if X ∼ U[0, 1], then the probability that X takes on values between 0.3 and 0.7 is simply 0.7 − 0.3 = 0.4. That is, P(0.3 ≤ X ≤ 0.7) = 0.7 − 0.3 = 0.4.

Similarly, the probability that X takes on values between 0.16 and 0.35 is simply 0.35 − 0.16 = 0.19. That is, P(0.16 ≤ X ≤ 0.35) = 0.35 − 0.16 = 0.19.

The above observations suggest that it may be useful to define a new concept, called the cumulative distribution function.

¹⁴ But strangely enough, zero probability is not the same thing as impossible. For example, we’d say that
• There is zero probability, but it is not impossible that X ∼ U[0, 1] takes on the value 0.37.
• There is zero probability and it is impossible that X ∼ U[0, 1] takes on the value 1.2.
(Actually, rather than use the word “impossible”, mathematicians prefer saying “almost never”, which has a precise definition.)
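If you’d like to see these claims in action, here is a rough simulation sketch in Python. It assumes that `random.random()` is an acceptable stand-in for U[0, 1]. (Strictly speaking it is not: a computer can only produce finitely many distinct numbers, and the draws lie in [0, 1) rather than [0, 1].) The seed and the number of draws are arbitrary choices of mine.

```python
import random

random.seed(0)  # fixed seed so that the run is repeatable
N = 200_000     # number of simulated rain-drops (an arbitrary choice)
draws = [random.random() for _ in range(N)]

# Fraction of draws landing in [0.3, 0.7]; the theory says 0.7 - 0.3 = 0.4.
frac_interval = sum(0.3 <= x <= 0.7 for x in draws) / N

# Number of draws exactly equal to 0.37; the theory says P(X = 0.37) = 0.
hits_exact = sum(x == 0.37 for x in draws)

print(frac_interval)
print(hits_exact)
```

The first printed number should be close to 0.4, and the second should almost certainly be 0. Neither is guaranteed exactly, because this is only a simulation.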
52.2 Important Digression: P(X ≤ k) = P(X < k)

For any continuous random variable X, we have P(X ≤ k) = P(X < k). That is, whether an inequality is strict makes no difference. The reason is that:

P(X ≤ k) = P(X < k) + P(X = k) = P(X < k) + 0 = P(X < k).

Thus, for continuous random variables, it doesn’t matter whether inequalities are strict or weak.

Example 188. Let X ∼ U[0, 1]. Then

P(0.2 ≤ X ≤ 0.5) = P(0.2 < X ≤ 0.5) = P(0.2 ≤ X < 0.5) = P(0.2 < X < 0.5).

52.3 The Cumulative Distribution Function (CDF)

Let X be a random variable. Its cumulative distribution function (CDF), denoted F_X, simply tells us the probability that X takes on values less than or equal to k, for every k ∈ R.

Example 189. Let X ∼ U[0, 1]. Let F_X be its CDF. Then we have, for example,

F_X(0.7) = P(X ≤ 0.7) = 0.7, and F_X(0.2) = P(X ≤ 0.2) = 0.2.

Example 190. Let Y ∼ U[3, 5]. This is the continuous uniform distribution on [3, 5]. It is equally likely to take on any value in the interval [3, 5]. Let F_Y be the CDF of Y. Then we have, for example,

F_Y(3.1) = P(Y ≤ 3.1) = 0.05, and F_Y(4.4) = P(Y ≤ 4.4) = 0.7.

52.4 The Probability Density Function (PDF)

Given a random variable X, its probability density function (PDF), denoted f_X, is simply defined as the derivative of its CDF F_X.¹⁵ That is,

\[ f_X = \frac{d}{dk} F_X. \]

Example 191. The PDF of X ∼ U[0, 1] (graphed below) is simply the function f_X ∶ R → R defined by f_X(k) = 1, if k ∈ [0, 1], and f_X(k) = 0, otherwise.

Recall that the area under the curve (definite integral) can be computed as the reverse process of differentiation. Hence, for any a ≤ b, the area under the PDF between a and b is precisely P(a ≤ X ≤ b).

For example, there is probability 0.25 (red area) that X takes on values between 0.5 and 0.75.
There is probability 0.1 (blue area) that X takes on values between 0.2 and 0.3.

Exercise 78. The continuous uniform random variable Y ∼ U[3, 5] is equally likely to take on values between 3 and 5, inclusive. (a) Write down its CDF F_Y. (b) Write down and graph its PDF f_Y. (c) Compute, and also illustrate on your graph, the quantities P(3.1 ≤ Y ≤ 4.6) and P(4.8 ≤ Y ≤ 4.9). (Answer on p. 354.)

¹⁵ Note that although every random variable has a CDF, not every random variable has a PDF. In particular, if the random variable’s CDF is not differentiable, then by our definition here, the random variable does not have a PDF.

53 The Normal Distribution

The standard normal (or Gaussian) random variable (SNRV) is very important. In fact, it is so important that we usually reserve the letter Z for it, and the Greek letters φ and Φ (lower- and upper-case phi) for its PDF and CDF.

The following three statements are entirely equivalent:
1. Z is a SNRV.
2. Z is a random variable with the standard normal distribution.
3. Z ∼ N(0, 1).

Here’s the formal definition:

Definition 7. Z is called a standard normal random variable (SNRV) if its PDF φ ∶ R → R is defined by:

\[ \varphi(a) = \frac{1}{\sqrt{2\pi}}\, e^{-0.5a^2}. \]

For the A-levels, you need not remember this complicated-looking PDF. Nor need you understand where it comes from.

The normal PDF is often also referred to as the bell curve, due to its resemblance to a bell (kinda).

As with the continuous uniform, for any a ≤ b, the area under the normal PDF between a and b gives us precisely P(a ≤ Z ≤ b).

For example, there is probability 0.0819 (red area) that Z takes on values between 0.5 and 0.75. There is probability 0.4593 (blue area) that Z takes on values between −1 and 0.3.

As usual, the CDF Φ ∶ R → R is defined by:

\[ \Phi(a) = P(Z \le a) = \int_{-\infty}^{a} \varphi(x)\,dx = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-0.5x^2}\,dx. \]
Unfortunately, this last integral has no simpler expression (mathematicians would say that it has no “closed-form expression”). Instead, as we’ll soon see, we have to use the so-called Z-tables (or a graphing calculator) to look up values of Φ(k).

The next fact summarises the properties of the normal distribution. Some of these properties are illustrated in the figure that follows.

Fact 10. Let Z ∼ N(0, 1) and let φ and Φ be the PDF and CDF of Z.
1. Φ(∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random variable.)
2. φ(a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising implication that no matter how large a is, there is always some non-zero probability that Z ≥ a.)
3. E[Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and compute φ(0) = 1/√(2π) ≈ 0.399.)
5. V[Z] = 1. (The variance of Z is 1.)
6. P(Z ≤ a) = P(Z < a). (We’ve already discussed this earlier. It makes no difference whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
(a) P(Z ≥ a) = P(Z ≤ −a) = Φ(−a).
(b) Since P(Z ≥ a) = 1 − P(Z ≤ a) = 1 − Φ(a), it follows that Φ(−a) = 1 − Φ(a) or, equivalently, Φ(a) = 1 − Φ(−a).
(c) Φ(0) = 1 − Φ(0) = 0.5.
8. P(−1 ≤ Z ≤ 1) = Φ(1) − Φ(−1) ≈ 0.6827. (There is probability 0.6827 that Z takes on values within 1 standard deviation of the mean.)
9. P(−2 ≤ Z ≤ 2) = Φ(2) − Φ(−2) ≈ 0.9545. (There is probability 0.9545 that Z takes on values within 2 standard deviations of the mean.)
10. P(−3 ≤ Z ≤ 3) = Φ(3) − Φ(−3) ≈ 0.9973. (There is probability 0.9973 that Z takes on values within 3 standard deviations of the mean.)
11. The PDF φ has two points of inflexion, namely at ±1. (The points of inflexion are one standard deviation away from the mean.)

Proof. Omitted.
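Several of the numbers in Fact 10 can be checked with a few lines of Python. The sketch below relies on the standard identity Φ(a) = ½[1 + erf(a/√2)], where erf is the “error function” built into Python’s `math` module. This identity is not part of the H1 syllabus and is used here only as a computational convenience. The function names `phi` and `Phi` are my own.

```python
from math import erf, exp, pi, sqrt

def phi(a):
    """Standard normal PDF, as in Definition 7."""
    return exp(-0.5 * a * a) / sqrt(2 * pi)

def Phi(a):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + erf(a / sqrt(2)))

# Items 8 to 10: probability of landing within 1, 2, 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(Phi(k) - Phi(-k), 4))  # 0.6827, 0.9545, 0.9973

print(round(phi(0), 3))         # 0.399 (item 4)
print(Phi(0))                   # 0.5   (item 7c)
print(Phi(-1.3) + Phi(1.3))     # very close to 1 (item 7b)
```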
[Figure: the standard normal PDF, with the horizontal axis running from −4 to 4.]

Example 192. Let’s use the TI84 to find Φ(2.51).
1. Press the blue 2ND button and then DISTR (which corresponds to the VARS button). This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option. The TI84 is now asking for your lower and upper bounds. Since Φ(2.51) = Φ(2.51) − Φ(−∞), your lower bound is −∞ and your upper bound is 2.51.
3. But there’s no way to enter −∞ on your TI84. So instead, you’ll enter −10^99, which is simply a very large negative number. To do so, press (-) , the blue 2ND button, EE (which corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Now to enter your upper bound. First press , (this simply demarcates your lower and upper bounds). Then enter your upper bound 2.51 by pressing 2 . 5 1 . Then press ENTER . Your TI84 says that the answer is Φ(2.51) ≈ 0.99396.

[Screenshots: the TI84 display after Step 1, after Step 2, after Step 3, and after Step 4.]

Example 193. To find Φ(−2.51), Φ(1.372), and P(−4 ≤ Z ≤ 4), the steps are very similar. So for each, I’ll simply give the screenshot from the TI84:

[Screenshots: TI84 output for Φ(−2.51), Φ(1.372), and P(−4 ≤ Z ≤ 4).]

Example 194. We’ll find Φ(2.51), Φ(−2.51), Φ(1.372), and P(−4 ≤ Z ≤ 4) using Z-tables. Refer to the Z-tables on p. 188. (These are the exact same tables that appear on the List of Formulae you’ll get during exams.)
• To find Φ(2.51), look at the row labelled 2.5 and the column labelled 1, and read off the number 0.9940. We thus have Φ(2.51) = 0.9940.
• To find Φ(−2.51), note that the table does not explicitly give values of Φ(z), if z < 0. But we can exploit the fact that the standard normal is symmetric about the mean µ = 0. This fact implies that Φ(−z) = 1 − Φ(z). Hence, Φ(−2.51) = 1 − Φ(2.51) = 0.0060.
• To find Φ(1.372), first look at the row labelled 1.3 and the column labelled 7, and read off the number 0.9147. This tells us that Φ(1.37) = 0.9147. Now look at the right end of the table (where it says “ADD”). Since the third decimal place of 1.372 is 2, we look under the column labelled 2, which tells us to ADD 3 (to the fourth decimal place). Thus, Φ(1.372) = 0.9147 + 0.0003 = 0.9150.
• To find P(−4 ≤ Z ≤ 4), the Z-tables printed are actually useless, because they only go to 2.99. So you can just write P(−4 ≤ Z ≤ 4) ≈ 1.

Exercise 79. Using both the Z-tables and your graphing calculator, find the following: (a) P(Z ≥ 1.8). (b) P(−0.351 < Z < 1.2). (Answer on p. 355.)

THE NORMAL DISTRIBUTION FUNCTION

If Z has a normal distribution with mean 0 and variance 1 then, for each value of z, the table gives the value of Φ(z), where Φ(z) = P(Z ≤ z). For negative values of z use Φ(−z) = 1 − Φ(z).

z   0      1      2      3      4      5      6      7      8      9      | ADD:  1  2  3  4  5  6  7  8  9
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 |       4  8 12 16 20 24 28 32 36
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 |       4  8 12 16 20 24 28 32 36
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 |       4  8 12 15 19 23 27 31 35
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 |       4  7 11 15 19 22 26 30 34
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 |       4  7 11 14 18 22 25 29 32
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 |       3  7 10 14 17 20 24 27 31
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 |       3  7 10 13 16 19 23 26 29
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 |       3  6  9 12 15 18 21 24 27
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 |       3  5  8 11 14 16 19 22 25
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 |       3  5  8 10 13 15 18 20 23
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 |       2  5  7  9 12 14 16 19 21
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 |       2  4  6  8 10 12 14 16 18
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 |       2  4  6  7  9 11 13 15 17
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 |       2  3  5  6  8 10 11 13 14
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 |       1  3  4  6  7  8 10 11 13
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 |       1  2  4  5  6  7  8 10 11
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 |       1  2  3  4  5  6  7  8  9
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 |       1  2  3  4  4  5  6  7  8
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 |       1  1  2  3  4  4  5  6  6
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 |       1  1  2  2  3  4  4  5  5
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 |       0  1  1  2  2  3  3  4  4
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 |       0  1  1  2  2  2  3  3  4
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 |       0  1  1  1  2  2  2  3  3
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 |       0  1  1  1  1  2  2  2  2
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 |       0  0  1  1  1  1  1  2  2
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 |       0  0  0  1  1  1  1  1  1
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 |       0  0  0  0  1  1  1  1  1
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 |       0  0  0  0  0  1  1  1  1
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 |       0  0  0  0  0  0  0  1  1
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 |       0  0  0  0  0  0  0  0  0

Critical values for the normal distribution

If Z has a normal distribution with mean 0 and variance 1 then, for each value of p, the table gives the value of z such that P(Z ≤ z) = p.

p   0.75   0.90   0.95   0.975  0.99   0.995  0.9975 0.999  0.9995
z   0.674  1.282  1.645  1.960  2.326  2.576  2.807  3.090  3.291

53.1 The Normal Distribution, in General

Let Z ∼ N(0, 1) be the SNRV and σ, µ ∈ R be constants. Consider σZ + µ, itself a random variable. We know that since E[Z] = 0 and V[Z] = 1, it follows from the properties of the mean and variance that

E[σZ + µ] = σE[Z] + µ = µ and V[σZ + µ] = σ²V[Z] = σ².

It turns out that σZ + µ is a normal random variable with mean µ and variance σ²:

Definition 8. X is called a normal random variable with mean µ and variance σ² if its PDF f_X ∶ R → R is defined by:

\[ f_X(a) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-0.5\left(\frac{a - \mu}{\sigma}\right)^2}. \]

Once again, for the A-levels, you need not remember this complicated-looking PDF. Nor need you understand where it comes from.

The following three statements are entirely equivalent:
1. X is a normal random variable with mean µ and variance σ².
2. X is a random variable with normal distribution of mean µ and variance σ².
3. X ∼ N(µ, σ²).

Example 195. The normal random variables A ∼ N(−1, 1), B ∼ N(1, 1), and C ∼ N(2, 1) have variance 1 (just like the SNRV), but non-zero means. Their PDFs are graphed below. (Included for reference is the standard normal PDF in black.)

We see that the effect of increasing the mean µ is to move the graph of the PDF rightwards. And decreasing the mean moves it leftwards.

Example 196. The normal random variables D ∼ N(0, 0.1), E ∼ N(0, 2), and F ∼ N(0, 3) have mean 0 (just like the SNRV), but non-unit variances. Their PDFs are graphed below. (Included for reference is the standard normal PDF in black.)

The effect of changing the variance σ² is this:
• The larger the variance, the “fatter” the “tails” of the PDF and the shorter the peak.
• Conversely, the smaller the variance, the “thinner” the “tails” of the PDF and the taller the peak.
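The two bullet points of Example 196 can also be seen in numbers rather than in a graph. The following Python sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later). Measuring “fatness of the tails” by P(|X| > 3) is simply my own choice of yardstick for this illustration.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

variances = (0.1, 1, 2, 3)  # D, the SNRV, E, and F of Example 196
peaks, tails = [], []

for var in variances:
    X = NormalDist(mu=0, sigma=sqrt(var))  # NormalDist takes sigma, not sigma^2
    peaks.append(X.pdf(0))                 # height of the PDF at the mean
    tails.append(2 * (1 - X.cdf(3)))       # P(|X| > 3), using symmetry

for var, peak, tail in zip(variances, peaks, tails):
    print(f"variance {var}: peak {peak:.3f}, P(|X| > 3) = {tail:.4f}")
```

As the variance goes up, the printed peak heights go down and the tail probabilities go up, which is exactly what the two bullet points say.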
Example 197. The normal random variables G ∼ N(−1, 0.1), H ∼ N(1, 2), and I ∼ N(2, 3) have non-zero means and non-unit variances. Their PDFs are graphed below. (Included for reference is the standard normal PDF in black.)

Exercise 80. Let X ∼ N(µ, σ²). Verify that if µ = 0 and σ² = 1, then for all a ∈ R, we have f_X(a) = φ(a). What can you conclude? (Answer on p. 355.)

In general, normality is preserved under linear transformations:

Fact 11. Let X ∼ N(µ, σ²) and a, b ∈ R be constants. Then aX + b ∼ N(aµ + b, a²σ²).

Proof. Omitted.

Thus, we can easily transform any normal random variable into the SNRV:

Corollary 1. If X ∼ N(µ, σ²), then (X − µ)/σ = Z ∼ N(0, 1). Equivalently, X = σZ + µ.

Proof. The next exercise asks you to prove this corollary.

Exercise 81. Using Fact 11, prove that if X ∼ N(µ, σ²), then (X − µ)/σ = Z ∼ N(0, 1). (Answer on p. 356.)

The above corollary gives us an alternative method for computing probabilities associated with normal random variables. In general, if X ∼ N(µ, σ²), then

\[ P(X \le c) = P(\sigma Z + \mu \le c) = P\left(Z \le \frac{c - \mu}{\sigma}\right) = \Phi\left(\frac{c - \mu}{\sigma}\right). \]

The properties that we listed for the SNRV also apply, with only a few modifications, to any NRV. The figure that follows illustrates.

Fact 12. Let X ∼ N(µ, σ²) and let f_X and F_X be the PDF and CDF of X.
1. F_X(∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random variable.)
2. f_X(a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising implication that no matter how large a is, there is always some non-zero probability that X ≥ a.)
3. E[X] = µ. (The mean of X is µ.)
4. The PDF f_X reaches a global maximum at the mean µ. (In fact, we can go ahead and compute f_X(µ) = 1/(σ√(2π)) ≈ 0.399/σ.)
5. V[X] = σ². (The variance of X is σ².)
6. P(X ≤ a) = P(X < a).
(We’ve already discussed this earlier. It makes no difference whether the inequality is strict. This is because P(Z = a) = 0.) 7. The PDF φ is symmetric about the mean. This has several implications: (a) P (X ≥ µ + a) = P (X ≤ µ − a) = FX (µ − a). (b) Since P (X ≥ µ + a) = 1 − P (X ≤ µ + a) = 1 − FX (µ + a), it follows that FX (µ − a) = 1 − FX (µ + a) or, equivalently, FX (µ + a) = 1 − FX (µ − a). (c) FX (µ) = 1 − FX (µ) = 0.5. 8. P (µ − σ ≤ X ≤ µ + σ) = Φ (1) − Φ (−1) ≈ 0.6827. (There is probability 0.6827 that X takes on values within 1 standard deviation of the mean.) 9. P (µ − σ ≤ X ≤ µ + σ) = Φ (2) − Φ (−2) ≈ 0.9545. (There is probability 0.9545 that X takes on values within 2 standard deviations of the mean.) 10. P (µ − σ ≤ X ≤ µ + σ) = Φ (3) − Φ (−3) ≈ 0.9973. (There is probability 0.9973 that X takes on values within 3 standard deviations of the mean.) 11. The PDF φ has two points of inflexion, namely at ±σ. (The points of inflexion are one standard deviation away from the mean.) Proof. Omitted. Page 194, Table of Contents www.EconsPhDTutor.com Page 195, Table of Contents www.EconsPhDTutor.com Example 198. Let G ∼ N(−1, 0.1), H ∼ N(1, 2), and I ∼ N(2, 3). We’ll find P (G < 2) using our TI84. The first few steps are similar to before: 1. Press the blue 2ND button and then VARS (which corresponds to the DISTR button). This brings up the DISTR menu. 2. Press 2 to select the “normalcdf” option. 3. Enter the lower bound −1099 by pressing (-) , the blue 2ND button, EE (which corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!) 4. Enter the upper bound 2 by pressing , and 2 . (Don’t press ENTER yet!!). After Step 1. After Step 2. After Step 3. After Step 4. Previously, we didn’t bother telling the TI84 our mean µ and standard deviation σ. And so by default, if we pressed ENTER at this point, the TI84 simply assumed that we wanted the SNRV Z ∼ N(0, 1). Now we’ll tell the TI84 what µ and σ are: 5. First enter the mean µ = −1. Press , (-) 1 . 
6. Now enter the standard deviation σ = √0.1 (and not the variance). Press , √ 0 . 1 ) . Finally, press ENTER . The TI84 says that P(G < 2) ≈ 1.

[Screenshots: TI84 display after Steps 5 and 6.]

Finding P(H < 2), P(I < 2), P(−1 < G < 1), P(−1 < H < 1), and P(−1 < I < 1) is similar:

[Screenshots: TI84 display for P(H < 2) and P(I < 2), P(−1 < G < 1), P(−1 < H < 1), and P(−1 < I < 1).]

Since I has mean µ = 2, we should have exactly P(I < 2) = 0.5. So here the TI84 has actually made a small error in reporting instead that P(I < 2) ≈ 0.5000000005.

Example 199. We now redo the previous two examples, but use Z-tables:

$$P(G < 2) = P\left(Z < \frac{2-\mu_G}{\sigma_G} = \frac{2-(-1)}{\sqrt{0.1}} \approx 9.4868\right) = \Phi(9.4868) \approx 1,$$

$$P(H < 2) = P\left(Z < \frac{2-\mu_H}{\sigma_H} = \frac{2-1}{\sqrt{2}} \approx 0.7071\right) = \Phi(0.7071) \approx 0.7602,$$

$$P(I < 2) = P\left(Z < \frac{2-\mu_I}{\sigma_I} = \frac{2-2}{\sqrt{3}} = 0\right) = \Phi(0) = 0.5,$$

$$P(-1 < G < 1) = P\left(0 = \frac{-1-(-1)}{\sqrt{0.1}} < Z < \frac{1-(-1)}{\sqrt{0.1}} \approx 6.3246\right) = \Phi(6.3246) - \Phi(0) \approx 1 - \Phi(0) = 0.5,$$

$$P(-1 < H < 1) = P\left(-1.4142 \approx \frac{-1-1}{\sqrt{2}} < Z < \frac{1-1}{\sqrt{2}} = 0\right) = \Phi(0) - \Phi(-1.4142) = 0.5 - [1 - \Phi(1.4142)] = \Phi(1.4142) - 0.5 \approx 0.9213 - 0.5 = 0.4213,$$

$$P(-1 < I < 1) = P\left(-1.7321 \approx \frac{-1-2}{\sqrt{3}} < Z < \frac{1-2}{\sqrt{3}} \approx -0.5774\right) = \Phi(-0.5774) - \Phi(-1.7321) = 1 - \Phi(0.5774) - [1 - \Phi(1.7321)] \approx 0.9584 - 0.7182 = 0.2402.$$

Exercise 82. Let X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2). Using both the Z-tables and your graphing calculator, find the following: (a) P(X ≥ 1) and P(Y ≥ 1). (b) P(−2 ≤ X ≤ −1.5) and P(−2 ≤ Y ≤ −1.5). (Answer on p. 356.)

53.2 Sum of Independent Normal Random Variables

Theorem 1. If X and Y are independent normal random variables, then X + Y is also a normal random variable. Moreover, X − Y is also a normal random variable.

Proof. Omitted.

We already knew from before that E[X ± Y] = E[X] ± E[Y]. Moreover, if X and Y are independent, then V[X ± Y] = V[X] + V[Y]. Thus, the above theorem implies:

Corollary 2. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent and a, b ∈ R be constants. Then

$$X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2) \quad\text{and more generally,}\quad aX + bY \sim N(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2).$$

Moreover,

$$X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2) \quad\text{and more generally,}\quad aX - bY \sim N(a\mu_X - b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2).$$

Examples:

Example 200. The weight (in kg) of a sumo wrestler is modelled by X ∼ N(200, 50). Assume that the weight of each sumo wrestler is independent of the weight of any other sumo wrestler. We randomly choose two sumo wrestlers. (a) What is the probability that their total weight is greater than 405 kg? (b) What is the probability that one is more than 10% heavier than the other?

(a) Let X1 ∼ N(200, 50) and X2 ∼ N(200, 50) be the weights of the first and second sumo wrestlers. Then X1 + X2 ∼ N(400, 100). Thus,

P(X1 + X2 > 405) = P(Z > (405 − 400)/√100) = P(Z > 0.5) = 1 − Φ(0.5) ≈ 1 − 0.6915 = 0.3085.

(b) Our goal is to find p = P(X1 > 1.1X2) + P(X2 > 1.1X1). This is the probability that the first sumo wrestler is more than 10% heavier than the second, plus the probability that the second is more than 10% heavier than the first. Of course, by symmetry, these two probabilities are equal. Thus, p = 2 × P(X1 > 1.1X2).

Now, P(X1 > 1.1X2) = P(X1 − 1.1X2 > 0). But X1 − 1.1X2 ∼ N(200 − 1.1 ⋅ 200, 50 + 1.1² ⋅ 50) = N(−20, 110.5). Thus,

P(X1 > 1.1X2) = P(X1 − 1.1X2 > 0) = P(Z > (0 − (−20))/√110.5) ≈ P(Z > 1.9026) = 1 − Φ(1.9026) ≈ 1 − 0.9714 = 0.0286.

Altogether then, p = 2P(X1 > 1.1X2) = 2 × 0.0286 = 0.0572.

Example 201. The weight (in kg) of a caught fish is modelled by X ∼ N(1, 0.4). The weight (in kg) of a caught shrimp is modelled by Y ∼ N(0.1, 0.1). Assume that the weights of any caught fish and shrimp are independent. (a) What is the probability that the total weight of 4 caught fish and 50 caught shrimp is greater than 10 kg?
(b) What is the probability that a caught fish weighs more than 9 times as much as a caught shrimp?

(a) Let S be the total weight of 4 caught fish and 50 caught shrimp. Note, importantly, that it would be wrong to write S = 4X + 50Y, because 4X + 50Y would be 4 times the weight of a single caught fish, plus 50 times the weight of a single caught shrimp. In contrast, we want S to be the sum of the weights of 4 independent fish and 50 independent shrimp. Thus, we should instead write S = X1 + X2 + X3 + X4 + Y1 + Y2 + ⋯ + Y50, where
• X1 ∼ N(1, 0.4), X2 ∼ N(1, 0.4), X3 ∼ N(1, 0.4), and X4 ∼ N(1, 0.4) are the weights of each caught fish.
• Y1 ∼ N(0.1, 0.1), Y2 ∼ N(0.1, 0.1), . . . , and Y50 ∼ N(0.1, 0.1) are the weights of each caught shrimp.

Now, S ∼ N(4 × 1 + 50 × 0.1, 4 × 0.4 + 50 × 0.1) = N(9, 6.6). (Note by the way that in contrast, 4X + 50Y ∼ N(9, 4² × 0.4 + 50² × 0.1) = N(9, 256.4), which has a rather different variance!) Thus, P(S > 10) ≈ 0.3485 (calculator).

(b) P(X > 9Y) = P(X − 9Y > 0). But X − 9Y ∼ N(1 − 9 × 0.1, 0.4 + 9² × 0.1) = N(0.1, 8.5). Thus, P(X − 9Y > 0) ≈ 0.5137 (calculator).

Exercise 83. (Answer on p. 357.) Water and electricity usage are billed, respectively, at $2 per 1,000 litres (l) and $0.30 per kilowatt-hour (kWh). Assume that each month, the amount of water used by Ahmad (and his family) at their HDB flat is normally distributed with mean 25,000 l and variance 64,000,000 l². Similarly, the amount of electricity they use is normally distributed with mean 200 kWh and variance 10,000 kWh². Assume that monthly water usage and electricity usage are independent.
(a) Find the probability that their total water and electricity utility bill in any given month exceeds $100.
(b) Find the probability that their total water and electricity utility bill in any given year exceeds $1,000.
Suppose instead that electricity usage is billed at $x per kWh.
(c) Then what is the maximum value of x such that the probability that the total utility bill in a given month exceeds $100 is 0.1 or less?

54 The Central Limit Theorem and The Normal Approximation

Suppose we have n independent random variables, each identically-distributed with mean µ ∈ R and variance σ² ∈ R. Then informally, the Central Limit Theorem (CLT) says:

If n is “large enough”, then the sum of n independent, identically-distributed random variables is well-approximated by a normal distribution.

How large is “large enough”? The most common rule-of-thumb is that n ≥ 30 is “large enough”, so that’s what we’ll use in this book, even though this is somewhat arbitrary.

Example 202. Let X be the random variable that is the sum of 100 rolls of a fair die. From our earlier work, we know that each die roll has mean 3.5 and variance 35/12. Problem: Find P(X ≥ 360) and P(X > 360).

The CLT says that since n = 100 ≥ 30 is large enough and the distribution is “nice enough” (we are assuming this), the random variable X can be approximated by the normal random variable Y ∼ N(100 × 3.5, 100 × 35/12) = N(350, 3500/12).

Now, in using Y as an approximation for X, we might be tempted to simply write P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360). Note however that X is a discrete random variable, so that P(X ≥ 360) ≠ P(X > 360). More specifically, P(X ≥ 360) = P(X = 360) + P(X > 360). In contrast, Y is a continuous random variable, so that P(Y ≥ 360) = P(Y > 360). Hence, if we simply use the approximations P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360), then implicitly we’d be saying that P(X = 360) = 0, which is blatantly false.

To correct for this, we perform the so-called continuity correction. This says that we’ll instead use the approximations P(X ≥ 360) ≈ P(Y ≥ 359.5) and P(X > 360) ≈ P(Y ≥ 360.5).
Thus, P(X ≥ 360) ≈ P(Y ≥ 359.5) ≈ 0.2890 (calculator) and P(X > 360) ≈ P(Y ≥ 360.5) ≈ 0.2693.

Continuity Correction. If X is a discrete random variable that is to be approximated by a continuous random variable Y, then
• P(X ≥ k) ≈ P(Y ≥ k − 0.5),
• P(X ≤ k) ≈ P(Y ≤ k + 0.5),
• P(X > k) ≈ P(Y > k + 0.5),
• P(X < k) ≈ P(Y < k − 0.5).

Note that if the random variable to be approximated is itself continuous, then there is no need to perform the continuity correction. This is illustrated in Exercise 85 below.

Exercise 84. Let X be the random variable that is the sum of 30 rolls of a fair die. Find P(100 ≤ X ≤ 110). (Answer on p. 358.)

Exercise 85. The weight of each Coco-Pop is independently- and identically-distributed with mean 0.1 g and variance 0.004 g². A box of Coco-Pops has exactly 5,000 Coco-Pops. It is labelled as having a net weight of 500 g. Find the probability that the actual net weight of the Coco-Pops in this box is less than or equal to 499 g. (Answer on p. 358.)

55 Sampling

55.1 Population

A population is simply any ordered set of objects we’re interested in.

Example 203. The two candidates for the 2016 Bukit Batok SMC By-Election are Dr. Chee Soon Juan and PAP Guy. It is the night of the election and voting has just closed. Our objects-of-interest are the 23,570 valid ballots cast. (A ballot is simply a piece of paper on which a vote is recorded. The words ballot and vote are often used interchangeably.)

Arrange the ballots in any arbitrary order. Let v1 = 1 if the first ballot is in favour of Dr. Chee and v1 = 0 otherwise. Similarly and more generally, for any i = 2, 3, . . . , 23570, let vi = 1 if the ith ballot is in favour of Dr. Chee and vi = 0 otherwise. Our population here is simply the ordered set P = (v1, v2, . . . , v23570). So in this example, the population is simply an ordered set of 1s and 0s.
55.2 Population Mean and Population Variance

The population mean µ is simply the average across all population values. The population variance σ² is a measure of the variation across all population values. Formally:¹⁶

Definition 9. Given a finite population P = (v1, v2, . . . , vk), the population mean µ and population variance σ² are defined by

$$\mu = \frac{v_1 + v_2 + \dots + v_k}{k} = \frac{\sum_{i=1}^{k} v_i}{k} \quad\text{and}\quad \sigma^2 = \frac{(v_1-\mu)^2 + (v_2-\mu)^2 + \dots + (v_k-\mu)^2}{k} = \frac{\sum_{i=1}^{k} (v_i-\mu)^2}{k}.$$

Example 203 (continued from above). Suppose that of the 23,570 votes, 9,142 were for Dr. Chee and the remaining against. So the vector (v1, v2, . . . , v23570) contains 9,142 1s and 14,428 0s. Then, with k = 23570, the population mean is

$$\mu = \frac{v_1 + v_2 + \dots + v_k}{k} = \frac{9142 \times 1 + 14428 \times 0}{23570} = \frac{9142}{23570} \approx 0.3879.$$

In this particular example, the population values are binary (either 0 or 1). And so we have a nice alternative interpretation: the population mean is also the population proportion. In this case, it is the proportion of the population who voted for Dr. Chee. So here the proportion of votes for Dr. Chee is about 0.3879.

The population variance is

$$\sigma^2 = \frac{(v_1-\mu)^2 + (v_2-\mu)^2 + \dots + (v_k-\mu)^2}{k} = \frac{9142 \cdot \left(1 - \frac{9142}{23570}\right)^2 + 14428 \cdot \left(0 - \frac{9142}{23570}\right)^2}{23570} \approx 0.2374.$$

As usual, the variance tells us about the degree to which the vi’s vary. Of course, in this example, we already know that the vi’s can take on only two values, 0 and 1. So the variance isn’t terribly interesting or informative in this example. In particular, it doesn’t tell us anything that the population mean didn’t already tell us (indeed, it can be shown that in this example, σ² = µ − µ²).

¹⁶ In the case of an infinite population, the definitions of µ and σ² must be adjusted slightly, but the intuition is the same.
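The population mean and variance in the by-election example can be reproduced directly from Definition 9. The snippet below is a small illustrative sketch in Python (the variable names are arbitrary); it builds the population as a list of 1s and 0s and also checks the remark that σ² = µ − µ² for binary data.

```python
# Population from the by-election example: 9,142 ones and 14,428 zeros.
population = [1] * 9142 + [0] * 14428
k = len(population)  # 23570

# Definition 9: both the mean and the variance divide by k (not k - 1).
mu = sum(population) / k
sigma2 = sum((v - mu) ** 2 for v in population) / k

print(round(mu, 4))      # 0.3879
print(round(sigma2, 4))  # 0.2374

# For a population of 0s and 1s, the variance equals mu - mu^2.
print(abs(sigma2 - (mu - mu ** 2)) < 1e-9)  # True
```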
55.3 Parameter

Informally, a parameter is some number we’re interested in and which may be calculated based on the population.

Example 203 (continued from above). A parameter we might be interested in is the population mean µ, which is also the proportion of votes in favour of Dr. Chee. (Another parameter we might be interested in is the population variance σ², but let’s ignore that for now.)

Voting has just closed. In a few hours’ time (after the vote-counting is done), we will know what exactly µ is. But right now, we still don’t know what µ is. Suppose we are impatient and want to know right away what µ might be. In other words, suppose we want to get an estimate of the true value of µ. What are some possible methods of getting a quick estimate of µ?

One possibility is to observe a random sample of 100 votes and count the proportion of these 100 votes that are in favour of Dr. Chee. So for example, say we do this and observe that 39 out of the 100 votes are for Dr. Chee. That is, we find that the observed sample mean (which in this context can also be called the observed sample proportion) is 0.39. Then we might conclude:

Based on this observed random sample of 100 votes, we estimate that µ is 0.39.

The layperson might be content with this. But the statistician digs a little deeper and asks questions such as:
• How do we know if this estimate is “good”?
• What are the criteria to determine whether an estimate is “good”?

We’ll now try to address, if only to a limited extent, these questions. But to do so, we must first precisely define terms like sample and estimate.

55.4 Distribution of a Population

Informally,¹⁷ the distribution of a population tells us
1. The range of possible values taken on by the objects in the population; and
2. The proportion of the population that takes on each possible value.

Example 203 (continued from above).
The population is P = (v1, v2, . . . , v23570), the ordered set of 23,570 ballots. Suppose that of these, 9,142 are votes for Dr. Chee (hence recorded as 1s) and the remaining 14,428 are for PAP Guy (hence recorded as 0s). Then the distribution of the population can informally be described in words as:
• A proportion 9142/23570 of the population are 1s, and
• A proportion 14428/23570 of the population are 0s.

Example 204. The population is P = (3, 4, 7, 7, 2, 3). Then the distribution of the population can informally be described in words as:
• A proportion 1/6 of the population are 2s;
• A proportion 2/6 of the population are 3s;
• A proportion 1/6 of the population are 4s; and
• A proportion 2/6 of the population are 7s.

¹⁷ Formally, we’d define the population distribution as a function. Indeed, some writers define the population itself as the distribution function.

55.5 A Random Sample

Informally, to observe a random sample of size n, we follow this procedure: Imagine the 23,570 ballots are in a single big bag.
1. Randomly pull out one ballot. Record the vote (either we write x1 = 1, if the vote was for Dr. Chee, or we write x1 = 0, if it wasn’t).
2. Put this ballot back in (this second step is why we call it sampling with replacement).
3. Repeat the above n times in total, so as to record down the values of x1, x2, . . . , xn.

We call (x1, x2, . . . , xn) an observed random sample of size n. Note that this is an ordered set (or vector) of numbers. Formally:

Definition 10. Let P be a population. Then the random vector (i.e. ordered set of random variables) (X1, X2, . . . , Xn) is a random sample of size n from the population P if
• X1, X2, . . . , Xn are independent; and
• X1, X2, . . . , Xn are identically-distributed, with the same distribution as P.

As always, we must be careful to distinguish between a function and a value taken on by the function. This table summarises.

Function | Value taken by the function
f is a function | f(x) is a possible value taken on by the function
X is a random variable | x is a possible observed value of the random variable
(X1, X2, . . . , Xn) is a random sample | (x1, x2, . . . , xn) is a possible observed random sample

An example to illustrate:

Example 203 (continued from above). To repeat, the distribution of the population P = (v1, v2, . . . , v23570) can informally be described in words as:
• 9142/23570 of the population were 1s; and
• 14428/23570 of the population were 0s.

Let X1, X2, and X3 be independent random variables, each with the same distribution as the population. That is, for each i = 1, 2, 3,

P(Xi = 0) = 14428/23570 and P(Xi = 1) = 9142/23570.

The ordered set (or vector) (X1, X2, X3) is a random sample of size 3.

An example of an observed random sample of size 3 might be (x1, x2, x3) = (1, 1, 0). This would be where we randomly sample 3 ballots (with replacement) and find that the first two are votes for Dr. Chee but the third is not. Another example of an observed random sample of size 3 might be (x1, x2, x3) = (0, 0, 0). This would be where we randomly sample 3 ballots (with replacement) and find that none of the three are for Dr. Chee.

As another example, (X1, X2, X3, X4, X5) is a random sample of size 5. An example of an observed random sample of size 5 might be (x1, x2, x3, x4, x5) = (0, 1, 0, 1, 0). This would be where we randomly sample 5 ballots (with replacement) and find that only the second and fourth are votes for Dr. Chee. Another example of an observed random sample of size 5 might be (x1, x2, x3, x4, x5) = (1, 1, 0, 1, 1). This would be where we randomly sample 5 ballots (with replacement) and find that only the third is not a vote for Dr. Chee.
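The “ballots in a bag” procedure from Section 55.5 can be imitated on a computer. What follows is only a sketch in Python: the helper name `observe_random_sample` and the seed value are arbitrary choices made for illustration, and the program's pseudo-random draws stand in for physically pulling ballots out of a bag.

```python
import random

# The population of 23,570 ballots: 1 = vote for Dr. Chee, 0 = otherwise.
population = [1] * 9142 + [0] * 14428

# A fixed (arbitrary) seed, so that repeated runs give the same draws.
rng = random.Random(2016)

def observe_random_sample(pop, n, rng):
    """Draw n values from pop with replacement and return them in order.

    Each draw picks one ballot at random and 'puts it back', so the draws
    are independent and each has the same distribution as the population.
    """
    return tuple(rng.choice(pop) for _ in range(n))

x = observe_random_sample(population, 5, rng)
print(x)  # one observed random sample of size 5: a tuple of 0s and 1s
```

Here the function plays the role of the random sample (X1, . . . , X5), while the printed tuple is one observed random sample (x1, . . . , x5). Calling the function again would, in general, give a different observed sample.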
In this textbook, we’ll be very careful to distinguish between a random sample (which is a vector of random variables) and an observed random sample (which is a vector of real numbers). This may be contrary to the practice of your teachers or indeed even the A-level exams.

55.6 Sample Mean and Sample Variance

Definition 11. Let S = (X1, X2, . . . , Xn) be a random sample of size n. Then the corresponding sample mean X̄ and the sample variance S² are the random variables defined by:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}, \qquad S^2 = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \dots + (X_n-\bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} (X_i-\bar{X})^2}{n-1}.$$

(The List of Formulae you get during exams will contain the observed sample variance.)

Note that strangely enough, the denominator of S² is n − 1, rather than n as one might expect. As we’ll see later, there is a good reason for this.

By the way, there are two other formulae for calculating the sample variance:

Fact 13. Let S = (X1, X2, . . . , Xn) be a random sample of size n. Let X̄ be the sample mean and S² be the sample variance. Let a ∈ R be a constant. Then

$$\text{(a)}\quad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[\sum_{i=1}^{n} X_i\right]^2}{n}}{n-1} \qquad\text{and}\qquad \text{(b)}\quad S^2 = \frac{\sum_{i=1}^{n} (X_i-a)^2 - \frac{\left[\sum_{i=1}^{n} (X_i-a)\right]^2}{n}}{n-1}.$$

(The List of Formulae has (a) but not (b).)

Proof. Omitted.

Once again, it is important to distinguish between
• The sample mean X̄ (a random variable) vs. the observed sample mean x̄ (a real number).
• The sample variance S² (a random variable) vs. the observed sample variance s² (a real number).

Example 203 (continued from above). Let (X1, X2, X3) be a random sample of size 3. The corresponding sample mean X̄ and sample variance S² are these random variables:

$$\bar{X} = \frac{X_1 + X_2 + X_3}{3}, \qquad S^2 = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + (X_3-\bar{X})^2}{3-1}.$$

Suppose our observed random sample of size 3 is (1, 0, 0).
Then the corresponding observed sample mean x̄ and observed sample variance s² are these real numbers:

$$\bar{x} = \frac{x_1 + x_2 + x_3}{n} = \frac{1 + 0 + 0}{3} = \frac{1}{3}, \qquad s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2}{n-1} = \frac{\left(1-\frac{1}{3}\right)^2 + \left(0-\frac{1}{3}\right)^2 + \left(0-\frac{1}{3}\right)^2}{3-1} = \frac{1}{3}.$$

Let (X1, X2, X3, X4, X5) be a random sample of size 5. The corresponding sample mean X̄ and sample variance S² are these random variables:

$$\bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}, \qquad S^2 = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \dots + (X_5-\bar{X})^2}{5-1}.$$

Suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the corresponding observed sample mean x̄ and observed sample variance s² are these real numbers:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{n} = \frac{0 + 1 + 0 + 0 + 1}{5} = \frac{2}{5} = 0.4,$$

$$s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2 + (x_4-\bar{x})^2 + (x_5-\bar{x})^2}{n-1} = \frac{\left(0-\frac{2}{5}\right)^2 + \left(1-\frac{2}{5}\right)^2 + \left(0-\frac{2}{5}\right)^2 + \left(0-\frac{2}{5}\right)^2 + \left(1-\frac{2}{5}\right)^2}{5-1} = 0.3.$$

We call a random variable an estimator if it is used to generate estimates (“guesses”) for some parameter. Example:

Example 203 (continued from above). It is the night of the election and polling has just closed. We still do not know the true proportion µ that voted for Dr. Chee. We decide to get a random sample of size 3: (X1, X2, X3). The corresponding sample mean X̄₃ = (X1 + X2 + X3)/3 shall be an estimator for µ. (Informally, an estimator is a method for generating “guesses” for some unknown parameter, in this case µ.)

This estimator is used to generate estimates (“guesses”) for µ. For every observed random sample, the estimator generates an estimate. Suppose our observed random sample of size 3 is (1, 0, 0). We calculate the corresponding observed sample mean to be x̄ = 1/3. We say that x̄ = 1/3 is an estimate for µ. (By the way, unless we are extremely lucky, it is highly unlikely that the true value of the unknown parameter µ is precisely 1/3. After all, 1/3 is merely an estimate obtained from a single observed random sample of size 3.)
Suppose instead that our observed random sample of size 3 were (0, 1, 1). Then the corresponding observed sample mean would be x̄ = 2/3. We’d instead say that x̄ = 2/3 is our estimate for µ.

There is also more than one estimator we can use. For example, suppose instead that we decide to get a random sample of size 5: (X1, X2, X3, X4, X5). We shall instead use the corresponding sample mean X̄ = (X1 + X2 + X3 + X4 + X5)/5 as our estimator for µ. And so for example suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the corresponding observed sample mean is x̄ = 0.4, and x̄ = 0.4 would be our estimate for µ.

Now, are these estimators and estimates “good” or “reliable”? How much should we trust them? These are questions that we’ll address in the next section.

A different example:

Example 205. Suppose we wish to find the average height µ (in cm) of an adult male. As a practical matter, it would be quite difficult to locate and record the height of every adult male in the world. So instead, what we might do is to randomly pick 4 adult males and record their heights. This gives us a random sample (H1, H2, H3, H4) of heights. The corresponding sample mean is the random variable H̄ = (H1 + H2 + H3 + H4)/4. H̄ shall serve as our estimator for µ.

Suppose our observed random sample is (h1, h2, h3, h4) = (178, 165, 182, 175). Then the corresponding observed sample mean is

h̄ = (h1 + h2 + h3 + h4)/n = (178 + 165 + 182 + 175)/4 = 175.

Thus, h̄ = 175 serves as an estimate (or “guess”) of the true average male height µ. Again, are the estimator H̄ and estimate h̄ = 175 “good” or “reliable”? How much should we trust them? These are questions that we’ll address in the next section.

Example 206. Let X be the random variable that is the height (in cm) of an adult female Singaporean.
Our parameters-of-interest are the true population mean µ and true population variance σ² of X. We wish to generate estimates for µ and σ². To this end, we get a random sample of size 8: (X1, X2, . . . , X8). The corresponding sample mean X̄ = (X1 + X2 + ⋯ + X8)/8 will serve as our estimator for µ. And the corresponding sample variance $S^2 = \sum_{i=1}^{8} (X_i-\bar{X})^2 / (8-1)$ will serve as our estimator for σ².

(a) Suppose our observed random sample is such that

$$\sum_{i=1}^{8} x_i = 1320 \quad\text{and}\quad \sum_{i=1}^{8} x_i^2 = 218360.$$

Then the observed sample mean x̄ and the observed sample variance s² are

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1320}{8} = 165, \qquad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{218360 - \frac{1320^2}{8}}{7} = 80.$$

And our estimates for µ and σ² are, respectively, 165 cm and 80 cm².

(b) Suppose instead our observed random sample is such that

$$\sum_{i=1}^{8} (x_i - 160) = 72 \quad\text{and}\quad \sum_{i=1}^{8} (x_i - 160)^2 = 1560.$$

Then the observed sample mean x̄ and the observed sample variance s² are

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{n} (x_i - 160 + 160)}{n} = \frac{\sum_{i=1}^{n} (x_i - 160)}{n} + 160 = \frac{72}{8} + 160 = 169,$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - 160)^2 - \frac{\left[\sum_{i=1}^{n} (x_i - 160)\right]^2}{n}}{n-1} = \frac{1560 - \frac{72^2}{8}}{7} \approx 130.3.$$

And our estimates for µ and σ² are, respectively, 169 cm and 130.3 cm².

Exercise 86. Calculate the observed sample mean and variance for the following observed random sample of size 7: (3, 14, 2, 8, 8, 6, 0). (Answer on p. 358.)

Exercise 87. (Answer on p. 358.) Let X be the random variable that is the weight (in kg) of an American. Suppose we are interested in estimating the true population mean µ and variance σ² of X. We get an observed random sample of size 10: (x1, x2, . . . , x10).
(a) Suppose you are told that $\sum_{i=1}^{10} x_i = 1885$ and $\sum_{i=1}^{10} x_i^2 = 378265$. Find the observed sample mean x̄ and observed sample variance s².
(b) Suppose you are instead told that $\sum_{i=1}^{10} (x_i - 50) = 1885$ and $\sum_{i=1}^{10} (x_i - 50)^2 = 378265$. Find the observed sample mean x̄ and observed sample variance s².
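The calculations in Examples 205 and 206 can be checked with a few lines of Python. This is a minimal sketch, not a prescribed method: the helper name `s2_from_sums` is invented here for illustration, while `mean` and `variance` are from Python's standard `statistics` module (where `variance` uses the n − 1 denominator, matching Definition 11).

```python
from statistics import mean, variance  # variance() divides by n - 1

def s2_from_sums(n, sum_x, sum_x2):
    """Observed sample variance from n, the sum of x and the sum of x^2."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Example 206(a): n = 8, sum of x = 1320, sum of x^2 = 218360.
xbar_a = 1320 / 8
s2_a = s2_from_sums(8, 1320, 218360)
print(xbar_a, s2_a)  # 165.0 80.0

# Example 206(b): the sums are of (x - 160) and (x - 160)^2. Shifting every
# value by a constant shifts the mean by that constant, but leaves the
# variance unchanged, so the same helper applies to the shifted sums.
xbar_b = 72 / 8 + 160
s2_b = s2_from_sums(8, 72, 1560)
print(xbar_b, round(s2_b, 1))  # 169.0 130.3

# Raw data: the four heights from Example 205.
heights = [178, 165, 182, 175]
print(mean(heights))      # 175
print(variance(heights))  # observed sample variance of the four heights
```

When the raw observations are available, the library functions are the simplest route; the summary-sum helper is useful when, as in exam questions, only the sums are given.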
55.7 Sample Mean and Sample Variance are Unbiased Estimators

Earlier we asked: How do we decide if an estimator and the estimates it generates are “good”? How do we know whether to trust any given estimate?

For H1 Maths, we’ll learn only about one (important) criterion for deciding whether an estimator is “good”. This is unbiasedness. Informally, an estimator is unbiased if on average, the estimator “gets it right”. Formally:

Definition 12. Let X be a random variable and θ ∈ R be a parameter (i.e. just some real number). We say that X is an unbiased estimator for θ if E[X] = θ. If x is an estimate generated by an unbiased estimator X, then we call x an unbiased estimate.

The next proposition says that the sample mean X̄ is an unbiased estimator for the population mean µ; and the sample variance S² is an unbiased estimator for the population variance σ².

Proposition 3. Let (X1, X2, . . . , Xn) be a random sample of size n drawn from a distribution with population mean µ and population variance σ². Let X̄ be the sample mean and S² be the sample variance. Then (a) E[X̄] = µ. And (b) E[S²] = σ².

Proof. You are asked to prove (a) in Exercise 89. The proof of (b) is omitted.

Proposition 3(b) is the reason why, strangely enough, we define the sample variance with n − 1 in the denominator:

$$S^2 = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \dots + (X_n-\bar{X})^2}{n-1}.$$

As defined, S² is an unbiased estimator for the population variance σ². This, then, is the reason why we define it like this. Some writers call S² the unbiased sample variance, but we shall not bother doing so. We’ll simply call S² the sample variance.

Example 203 (continued from above). (Chee Soon Juan election.) Suppose two observed random samples of size 3 are (x1, x2, x3) = (1, 0, 0) and (x1, x2, x3) = (1, 0, 1). The corresponding observed sample means are x̄1 = 1/3 and x̄2 = 2/3.
These are two possible estimates (“guesses”) of the true population proportion µ. Unless we’re extremely lucky, it’s unlikely that either of these two estimates is exactly correct. Nonetheless, what the above unbiasedness proposition tells us is this:

Suppose the unknown population mean is µ = 0.39. We draw the following 10 observed random samples of size 3 (table below). For each sample i, we calculate the corresponding observed sample mean x̄i.

Sample i | x1 | x2 | x3 | x̄i
1 | 1 | 0 | 1 | 2/3
2 | 0 | 0 | 0 | 0
3 | 0 | 1 | 0 | 1/3
4 | 1 | 0 | 0 | 1/3
5 | 0 | 1 | 1 | 2/3
6 | 1 | 0 | 0 | 1/3
7 | 0 | 0 | 0 | 0
8 | 0 | 0 | 0 | 0
9 | 0 | 0 | 1 | 1/3
10 | 1 | 1 | 0 | 2/3

Note that every estimate x̄i is wrong. Indeed, since the sample mean X̄i can only take on values 0, 1/3, 2/3, or 1, the estimates can never possibly be equal to the true µ = 0.39. Nonetheless, what the above proposition says informally is that on average, the estimate gets it correct. Formally, E[X̄] = µ = 0.39.

For a demonstration that you can play around with, try this Google spreadsheet.

Exercise 88. (Answer on p. 359.) We are interested in the weight (in kg) of Singaporeans. We have an observed random sample of size 5: (32, 88, 67, 75, 56).
(a) Find unbiased estimates for the population mean µ and variance σ² of the weights of Singaporeans. (State any assumptions you make.)
(b) What is the average weight of a Singaporean?

Exercise 89. Prove that E[X̄] = µ. (This is part (a) of Proposition 3.) (Answer on p. 359.)

Exercise 90. Suppose we flip a coin 10 times. The first 7 flips are heads and the next 3 are tails. Let 1 denote heads and 0 denote tails. (Answer on p. 360.)
(a) Write down, in formal notation, our observed random sample, the observed sample mean, and observed sample variance.
(b) Are these observed sample mean and variance unbiased estimates for the true population mean and variance?
(c) Can we conclude that this is a biased coin (i.e. the true population mean is not 0.5)?
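For a small sample size, the claims of Proposition 3 can be checked exactly rather than by simulation. The sketch below, in Python, assumes (as in the by-election example above) a population of 0s and 1s with µ = 0.39 and a random sample of size 3; it lists all 8 possible observed samples, weights each by its probability, and averages. The variable names are illustrative only.

```python
from itertools import product
from statistics import mean, variance  # variance() divides by n - 1

mu = 0.39  # assumed population proportion of 1s
n = 3      # sample size

exp_xbar = 0.0  # will hold E[sample mean]
exp_s2 = 0.0    # will hold E[sample variance]

# Enumerate every possible observed random sample of size n.
for sample in product((0, 1), repeat=n):
    # Probability of observing exactly this sample, using independence.
    prob = 1.0
    for x in sample:
        prob *= mu if x == 1 else 1 - mu
    exp_xbar += prob * mean(sample)
    exp_s2 += prob * variance(sample)

print(round(exp_xbar, 4))  # 0.39
print(round(exp_s2, 4))    # 0.2379
```

No single observed sample mean equals 0.39, yet the probability-weighted average of all of them does. Likewise the average of the observed sample variances equals σ² = µ − µ² = 0.39 × 0.61 = 0.2379, which is what dividing by n − 1 (rather than n) achieves.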
55.8 The Sample Mean is a Random Variable

This section is just to repeat, stress, and emphasise that the sample mean X̄ is itself a random variable. This is an important point. Indeed, the sample mean X̄ is both (i) a random variable; and (ii) an estimator. In contrast, an observed sample mean x̄ is both (i) a real number; and (ii) an estimate.

We’ve shown that E[X̄] = µ. This equation can be interpreted in two equivalent ways:
• The expected value of the sample mean equals the population mean µ.
• The sample mean is an unbiased estimator for the population mean µ.

We now give the variance of the sample mean. It turns out to be equal to the population variance σ², divided by the sample size n.

Fact 14. V[X̄] = σ²/n.

Proof. You are asked to prove this fact in Exercise 91.

Exercise 91. Prove Fact 14. (Hint: Note that X̄ = (1/n)(X1 + X2 + ⋯ + Xn) and X1, X2, . . . , Xn are independent.) (Answer on p. 360.)

Exercise 92. For each of the following terms, give a formal definition and an intuitive explanation. (State whether each term is a random variable or a real number.) For simplicity, you may assume that the finite population is given by P = (x1, x2, . . . , xk). (Answer on p. 361.)
(a) The population mean.
(b) The population variance.
(c) The sample mean.
(d) The sample variance.
(e) The mean of the sample mean.
(f) The variance of the sample mean.
(g) The mean of the sample variance.
(h) The observed sample mean.
(i) The observed sample variance.

55.9 The Distribution of the Sample Mean

Fact 15. Let X1, X2, . . . , Xn ∼ N(µ, σ²) be independent random variables. Then

X̄n = (X1 + X2 + ⋯ + Xn)/n ∼ N(µ, σ²/n).

Proof. Corollary 2 tells us that the sum of normal random variables is itself a normal random variable. So X1 + X2 + ⋯ + Xn is a normal random variable.
Fact 11 tells us that a linear transformation of a normal random variable is itself a normal random variable. So X̄n = (X1 + X2 + ⋯ + Xn)/n is a normal random variable. In the previous sections, we already showed that X̄n has mean µ and variance σ²/n. Altogether then, X̄n ∼ N(µ, σ²/n).

Now, suppose instead X1, X2, ..., Xn are not normally distributed. Surprisingly, a similar result still holds, thanks to the CLT. Informally, draw X1, X2, ..., Xn independently from any distribution (with mean µ and variance σ²). Then thanks to the CLT: If n is “large enough”, then X̄n = (X1 + X2 + ⋯ + Xn)/n is well-approximated by a normal distribution with mean µ and variance σ²/n.

In the next chapter, we’ll make greater use of the two results just given in this section.

55.10 Non-Random Samples

Some examples to illustrate the concept of a non-random sample:

Example 207. Suppose we’re interested in the average height of a Singaporean. The only way to know this for sure is to survey every single Singaporean. This, however, is not practical. Instead, we have only the resources to survey 100 individuals. We decide to go to a basketball court and measure the heights of 100 people there. We thereby gather an observed sample of size 100: (x1, x2, ..., x100). We find that the average individual’s height is x̄ = ∑ xi/100 = 179 cm.

Is x̄ = 179 cm an unbiased estimate of the average Singaporean’s height? Intuitively, we know that the answer is obviously no. The reason is that our observed sample of size 100 was non-random. We picked a basketball court, where the individuals are overwhelmingly (i) male; and (ii) taller than average. Our estimate x̄ = 179 cm is thus probably biased upwards.

Example 208. Suppose we’re interested in what the average Singaporean family spends on food each month. The only way to know this for sure is to survey every single family in Singapore. This, however, is not practical.
Instead, we have only the resources to survey 100 families. We decide to go to Sixth Avenue and randomly ask 100 families living there what they reckon they spend on food each month. We thereby gather an observed sample of size 100: (x1, x2, ..., x100). We find that the average family spends x̄ = ∑ xi/100 = $2,700 on food each month.

Is x̄ = $2,700 an unbiased estimate of the average monthly spending on food by a Singaporean family? Intuitively, we know that the answer is obviously no. The reason is that our observed sample of size 100 was non-random. We picked an unusually affluent neighbourhood. Our estimate x̄ = $2,700 is thus probably biased upwards.

56 Null Hypothesis Significance Testing (NHST)

Here’s a quick sketch of how Null Hypothesis Significance Testing (NHST) works:

Example 209. A piece of equipment has probability θ of breaking down. We have many pieces of the same type of equipment. Assume that the pieces of equipment break down independently of one another, each with the same probability θ.

1. Write down a null hypothesis H0. In this case, it might be “H0: θ = 0.6”.
2. Write down an alternative hypothesis HA. In this case, it might be “HA: θ < 0.6”. (This is a one-tailed test, to be explained shortly.)
3. Observe a random sample. For example, we might have an observed random sample of size 5, where only the fourth piece of equipment breaks down. And so we’d write (x1, x2, x3, x4, x5) = (0, 0, 0, 1, 0).
4. Write down a test statistic. In this case, an obvious test statistic is the sample number of failures T = X1 + X2 + X3 + X4 + X5. Our observed test statistic is thus t = x1 + x2 + x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
5. Now ask: if H0 were true, how likely is it that our test statistic would have been “at least as extreme as” that actually observed? That is, what is the probability P(Observe data at least as extreme as that observed | H0)?
The above probability is called the p-value of the observed sample. In this case, the p-value is the probability of observing a random sample where 1 or fewer pieces of equipment broke down, assuming H0: θ = 0.6 were true. That is, p = P(T ≤ t = 1 | H0).

Now, remember that T is a random variable. In fact, it’s a binomial random variable. Assuming H0 to be true, we have T ∼ B(n, θ) = B(5, 0.6). Thus,
$$p = P(T \le 1 \mid H_0) = P(T = 0 \mid H_0) + P(T = 1 \mid H_0) = \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 = 0.08704.$$

This says that if H0 were true, then the probability of observing a test statistic as extreme as the one we actually observed is only 0.08704. We might interpret this relatively small p-value as casting doubt on or providing evidence against H0.

Here is the full list of the ingredients that go into NHST.

Null Hypothesis Significance Testing (NHST)
1. Null hypothesis H0 (e.g. “this equipment has probability 0.6 of breaking down”).
2. Alternative hypothesis HA (e.g. “this equipment has probability less than 0.6 of breaking down”). The test is either one-tailed or two-tailed, depending on HA.
3. A random sample of size n: (X1, X2, ..., Xn).
4. A test statistic T (which simply maps each observed random sample to a real number).
5. The p-value of the observed sample. This is the probability that, assuming H0 were true, T takes on values that are at least “as extreme as” the actual observed test statistic t.
6. The significance level α. This is a pre-selected threshold, usually chosen to be some small value. The conventional significance levels are α = 0.1, α = 0.05, or α = 0.01.

We then conclude qualitatively that:
• A small p-value casts doubt on or provides evidence against H0.
• A large p-value fails to cast doubt on or provide evidence against H0.

In particular, if p < α, then we say that we reject H0 at the significance level α.
And if p ≥ α, then we say that we fail to reject H0 at the significance level α.

Note importantly that to reject H0 (at some significance level α) does NOT mean that H0 is false and HA is true. Similarly, failure to reject H0 does NOT mean that H0 is true and HA is false. More on this below.

Another example of NHST, now slightly more formally and carefully presented.

Example 205. (Dr. Chee election example.) Our parameter of interest is µ, the proportion of votes for Dr. Chee. We guess that Dr. Chee won only 30% of the votes. We might thus write down two competing hypotheses:

H0: µ = 0.3, HA: µ > 0.3.

We call H0 the null hypothesis and HA the alternative hypothesis. We pre-select α = 0.05 as our significance level. This is the arbitrary threshold at which we’ll say we reject (or fail to reject) H0.

We gather a random sample of 100 votes: (X1, X2, ..., X100). Our test statistic is the number of votes in favour of Dr. Chee, given by T = X1 + X2 + ⋯ + X100. Suppose that in our observed random sample (x1, x2, ..., x100), we find that 39 are in favour of Dr. Chee. Our observed test statistic is thus t = 39.

We now ask: What is the probability that, assuming H0 were true, T takes on values that are at least “as extreme as” the actual observed test statistic t? That is, what is the p-value of the observed sample?

Now, assuming H0 were true, T is a binomial random variable with parameters 100 and 0.3. That is, T ∼ B(n, µ) = B(100, 0.3). So:
$$p = P(T \ge 39 \mid H_0) = P(T = 39 \mid H_0) + P(T = 40 \mid H_0) + \dots + P(T = 100 \mid H_0)$$
$$= \binom{100}{39} 0.3^{39} \, 0.7^{61} + \binom{100}{40} 0.3^{40} \, 0.7^{60} + \dots + \binom{100}{100} 0.3^{100} \, 0.7^{0} \approx 0.03398.$$

The small p-value casts doubt on or provides evidence against H0. And since p ≈ 0.03398 < α = 0.05, we can also say that we reject H0 at the α = 0.05 significance level.

A wee bit of philosophy.
The interpretation of probability and statistics used in the A-levels (and thus also in this textbook) is called the objectivist interpretation. You needn’t know much about this, but what you should know is this:

Let θ be the parameter we’re interested in. Under the objectivist interpretation, the value of θ may be unknown, but it is fixed. This has two consequences:
1. We never speak probabilistically about θ, because θ is a fixed number. For example, we never say “θ is probably less than 0.6” or “θ has probability 0.8 of being between 0.4 and 0.7”. Such statements are nonsensical.
2. The null hypothesis, which is always written as an equality (e.g. “H0: θ = 0.6”), is almost certainly false. After all, θ can (usually) take on a continuum of values. So do NOT interpret “we fail to reject H0” to mean “H0 is true”. This is because H0 is almost certainly false.

When performing NHST, we will assiduously avoid saying things like “H0 is true”, “H0 is false”, “HA is true”, or “HA is false”. Instead, we will stick strictly to saying either “we reject H0 at the significance level α” or “we fail to reject H0 at the significance level α”. Each of these two statements has a very precise meaning. The first says that p < α. The second says that p ≥ α. Nothing more and nothing less.

Exercise 93. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level, whether the coin is biased towards heads. (Answer on p. 362.)

56.1 One-Tailed vs Two-Tailed Tests

In the previous section, all the NHST we did were one-tailed tests.¹⁸ For example, in the NHST done for Dr. Chee, we had

H0: µ = 0.3, HA: µ > 0.3.

This was a one-tailed test because the alternative hypothesis HA was that µ was to the right of 0.3. Suppose instead we changed the alternative hypothesis to:

H0: µ = 0.3, HA: µ ≠ 0.3.

Then this would be called a two-tailed test, because the alternative hypothesis HA is that µ is either to the left or to the right of 0.3.
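The one-tailed p-values from the previous section can be reproduced with a few lines of Python. This is only a sketch: the helper names (`binom_pmf`, `p_value_lower`, `p_value_upper`) are hypothetical, and it assumes, as in the two examples, that the test statistic T is binomial under H0.

```python
from math import comb


def binom_pmf(k, n, theta):
    """P(T = k) when T ~ B(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def p_value_lower(t, n, theta0):
    """One-tailed p-value P(T <= t | H0), used when HA says theta < theta0."""
    return sum(binom_pmf(k, n, theta0) for k in range(0, t + 1))


def p_value_upper(t, n, theta0):
    """One-tailed p-value P(T >= t | H0), used when HA says theta > theta0."""
    return sum(binom_pmf(k, n, theta0) for k in range(t, n + 1))


# Equipment example: n = 5, H0: theta = 0.6, HA: theta < 0.6, observed t = 1.
p_equipment = p_value_lower(1, 5, 0.6)

# Election example: n = 100, H0: mu = 0.3, HA: mu > 0.3, observed t = 39.
p_election = p_value_upper(39, 100, 0.3)

alpha = 0.05
for name, p in [("equipment", p_equipment), ("election", p_election)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.5f}, so at alpha = {alpha} we {decision}")
```

Which tail to sum over is decided by the direction of HA, not by the data: HA: θ < 0.6 sums the lower tail, while HA: µ > 0.3 sums the upper tail.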
We now repeat the examples done in the previous section, but with HA tweaked so that we instead have two-tailed tests. The difference is that the p-value is calculated differently.

¹⁸ By the way, the more common convention is to say “one-tailed” and “two-tailed” tests, rather than “one-tail” and “two-tail” tests, as is the norm in Singapore (similar to those “Close for break” signs you sometimes see). But after some consultation with my grammatical experts, I have been told that both are equally correct.

Example 209 (equipment breakdown). Everything is as before, except that we now change the alternative hypothesis:

H0: θ = 0.6, HA: θ ≠ 0.6.

Say we observe the same random sample as before: (x1, x2, x3, x4, x5) = (0, 0, 0, 1, 0). Again our test statistic is the sample number of failures T = X1 + X2 + X3 + X4 + X5. And so again our observed test statistic is t = x1 + x2 + x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.

The difference now is how the p-value (of the observed sample) is calculated. In words, the p-value gives the likelihood that our test statistic is “at least as extreme as” that actually observed, assuming H0 were true.

Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme as that actually observed” to mean the event T ≤ t = 1. Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean both the event T ≤ t = 1 and the event that T is at least as far away on the other side of E[T | H0] = 3. Since t = 1 is 2 below 3, the second event is, specifically, T ≥ 5. Altogether then, the p-value is given by
$$p = P(T \le 1 \text{ or } T \ge 5 \mid H_0) = P(T = 0 \mid H_0) + P(T = 1 \mid H_0) + P(T = 5 \mid H_0)$$
$$= \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 + \binom{5}{5} 0.6^5 \, 0.4^0 = 0.1648.$$

Since p = 0.1648 ≥ α = 0.1, we say that we fail to reject H0 at the α = 0.1 significance level. Observe that previously, under the one-tailed test, we could reject H0 at the α = 0.1 significance level, because there p = 0.08704.
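The two-tailed calculation for the equipment example can be sketched in Python as follows. The function name `two_tailed_p_value` is hypothetical, and the code assumes the convention described above: an outcome counts as “at least as extreme” if it lies at least as far from E[T | H0] = nθ0 as the observed t does, on either side.

```python
from math import comb


def binom_pmf(k, n, theta):
    """P(T = k) when T ~ B(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def two_tailed_p_value(t, n, theta0):
    """Sum P(T = k | H0) over every k at least as far from n * theta0 as t is."""
    centre = n * theta0
    distance = abs(t - centre)
    # The small tolerance guards against floating-point rounding in n * theta0.
    return sum(
        binom_pmf(k, n, theta0)
        for k in range(n + 1)
        if abs(k - centre) >= distance - 1e-9
    )


# Equipment example: n = 5, H0: theta = 0.6, HA: theta != 0.6, observed t = 1.
p_two_tailed = two_tailed_p_value(1, 5, 0.6)
print(f"two-tailed p-value = {p_two_tailed:.4f}")
print("reject H0 at alpha = 0.1?", p_two_tailed < 0.1)
```

Here the outcomes counted are k = 0, 1, and 5, matching the three terms in the formula above.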
Now, in contrast, under the two-tailed test, we fail to reject H0 at the same significance level.

In general, all else equal, the p-value for an observed random sample is greater under a two-tailed test than under a one-tailed test. Thus, under a two-tailed test, we are less likely to reject H0.

Example 205 (Dr. Chee election). We change the alternative hypothesis:

H0: µ = 0.3, HA: µ ≠ 0.3.

Say we observe the same random sample as before: (x1, x2, ..., x100), in which 39 votes were in favour of Dr. Chee. So again our observed test statistic is t = x1 + x2 + ⋯ + x100 = 39.

The difference now is how the p-value (of the observed sample) is calculated. In words, the p-value gives the likelihood that our test statistic is “at least as extreme as” that actually observed, assuming H0 were true.

Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme as that actually observed” to mean the event T ≥ t = 39. Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean both the event T ≥ t = 39 and the event that T is at least as far away on the other side of E[T | H0] = 30. The second event is, specifically, T ≤ 21. Altogether then, the p-value is given by
$$p = P(T \le 21 \text{ or } T \ge 39 \mid H_0) = 1 - P(22 \le T \le 38 \mid H_0)$$
$$= 1 - \left[ P(T = 22 \mid H_0) + P(T = 23 \mid H_0) + \dots + P(T = 38 \mid H_0) \right]$$
$$= 1 - \left[ \binom{100}{22} 0.3^{22} \, 0.7^{78} + \binom{100}{23} 0.3^{23} \, 0.7^{77} + \dots + \binom{100}{38} 0.3^{38} \, 0.7^{62} \right] \approx 0.06281.$$

Since p ≈ 0.06281 ≥ α = 0.05, we say that we fail to reject H0 at the α = 0.05 significance level. Again observe that previously, under the one-tailed test, we could reject H0 at the α = 0.05 significance level, because there p ≈ 0.03398. Now, in contrast, under the two-tailed test, we fail to reject H0 at the same significance level.

Exercise 94. We flip a coin 20 times and get 17 heads.
Test, at the 5% significance level, whether the coin is biased. (Answer on p. 362.)

56.2 The Abuse of NHST (Optional)

NHST is popular because it gives a simplistic, formulaic cookbook procedure. Moreover, its conclusion appears to be binary: either we reject H0 or we fail to reject H0.

However, NHST is widely misunderstood, misinterpreted, and misused even within scientific communities. It has long been heavily criticised. In March 2016, the American Statistical Association even issued an official policy statement on how NHST should be used!

Here I discuss only the most important, commonly-made error. We may write the p-value as

p = P(D | H0),

where D stands for the observed data and H0 stands for the null hypothesis. The p-value answers the following question: Assuming H0 were true, what’s the probability that we’d get data “at least as extreme” as those actually observed (D)?

Say we get a p-value of 0.03. We should then say simply that
• The small p-value casts doubt on or provides evidence against H0.
• If the pre-selected significance level was α = 0.05, then we may say that we reject H0 at the 5% significance level.

However, instead of merely saying the above, some researchers may instead conclude that: H0 is true with probability 0.03.

Do you see the error here? The researcher has gone from the finding that p = P(D | H0) = 0.03 to the conclusion that P(H0 | D) = 0.03. The error is the same as leaping from “A lottery ticket buyer who doesn’t cheat has a small probability q of winning” to “Jane bought a lottery ticket and won. Therefore, there is only probability q that she didn’t cheat.”

The p-value is NOT the probability that H0 is true.¹⁹ Instead, it is the probability that, assuming H0 were true, we would have gotten data “at least as extreme” as those actually observed. This is an important difference. But it is also a subtle one, which is why even researchers get confused.
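To see numerically why P(A | B) and P(B | A) can differ wildly, here is a small sketch of the lottery analogy. Every number in it is hypothetical and chosen purely for illustration (one million honest buyers, five cheaters, and the assumption that a cheater always wins); none of it comes from real data.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only to illustrate the argument.
honest_buyers = 1_000_000
cheaters = 5
q = Fraction(1, 1_000_000)    # P(win | did not cheat)
p_win_if_cheat = Fraction(1)  # assumption: a cheater always wins

total_buyers = honest_buyers + cheaters
p_honest = Fraction(honest_buyers, total_buyers)
p_cheat = Fraction(cheaters, total_buyers)

# Law of total probability, then the definition of conditional probability,
# applied to a buyer picked at random.
p_win = p_honest * q + p_cheat * p_win_if_cheat
p_honest_given_win = (p_honest * q) / p_win

print("P(win | did not cheat) =", float(q))
print("P(did not cheat | win) =", float(p_honest_given_win))
```

Under these made-up numbers, an honest buyer wins with probability one in a million, yet a randomly picked winner is honest with probability 1/6. The two conditional probabilities answer different questions, which is exactly the gap between P(D | H0) and P(H0 | D).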
¹⁹ Indeed, under the objectivist view, such a statement is nonsensical anyway, because H0 is either true or not true; it makes no sense to talk probabilistically about whether H0 is true.

56.3 Common Misinterpretations of the Margin of Error (Optional)

The sampling error or margin of error is often misinterpreted by laypersons (and journalists).

Example 210. On the night of the 2016 Bukit Batok SMC By-Election, the Elections Department announced* that based on a sample count of 900 ballots,
• Dr. Chee had won 39% of the votes.
• These sample counts have a confidence level of 95%, with a ±4% margin of error.

What does the above gobbledygook mean? Let µ be the true proportion of votes won by Dr. Chee. Let X̄ be the sample proportion and x̄ be the observed sample proportion.

It’s clear enough what the 39% means: they randomly counted 900 ballots and found (after accounting for any spoilt votes) that x̄ = 39% were in favour of Dr. Chee. What’s less clear is what the 95% confidence level and ±4% margin of error mean.

Here are three possible interpretations of what is meant. Only one is correct.

1. “With probability 0.95, µ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43).”
2. “With probability 0.95, X̄ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43).” Equivalently, suppose we repeatedly observe many random samples of size 900. Then we should find that in 95% of these observed random samples, the observed sample mean is between 0.35 and 0.43.
3. “With probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04).” We have no idea what µ is. All we can say is that with probability 0.95, the sample mean X̄ of votes for Dr. Chee is between µ − 0.04 and µ + 0.04. Equivalently, suppose we repeatedly observe many random samples of size 900. Then we should find that in 95% of these observed random samples, the observed sample mean is between µ − 0.04 and µ + 0.04.

Take a moment to understand what each of the above interpretations says.
Then decide which you think is the correct interpretation before reading on.

*Sources: The Straits Times [backup], TodayOnline [backup].

Interpretation #1 (“with probability 0.95, µ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43)”) is perhaps the one most commonly made by laypersons.* It makes two errors:
1. It is nonsensical to speak probabilistically about the proportion µ of votes won by Dr. Chee. µ is some fixed number. So either µ is in the interval (0.35, 0.43), or it isn’t. It makes no sense to speak probabilistically about whether µ is in that interval.
2. The margin of error applies around the true proportion µ, not around the observed sample proportion x̄ = 0.39.

Some “authorities” attempt** to correct Interpretation #1 by offering Interpretation #2 (“with probability 0.95, X̄ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43)”). However, Interpretation #2 is still wrong, because it still makes the second of the above two errors.

Unfortunately, the correct interpretation is also the one that says the least. It is Interpretation #3 (“with probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04)”). This interpretation says merely that if we were somehow able to repeatedly observe random samples of size 900, then we’d find that 95% of the corresponding observed sample means would be in (µ − 0.04, µ + 0.04). This isn’t saying much, because first of all, we have only one observed random sample; we do not get to repeatedly observe random samples. Secondly, this still doesn’t tell us much about µ, which is what we’re really interested in.

The correct interpretation (Interpretation #3) is the least interesting interpretation. Perhaps this explains why journalists often prefer to give an incorrect interpretation.

*E.g.
the article “Margin of Ignorance” (backup) begins by reporting poll results that Kerry-Edwards was supported by 51% of voters, while Bush-Cheney was supported by 45%. The author then ridicules other journalists for their misinterpretation of these data. (He also claims, incorrectly, that polling is based on the Central Limit Theorem.) He then triumphantly gives the “correct” explanation: “95 times out of 100 the true Kerry-Edwards number will fall between 47 and 55 and the Bush-Cheney number will fall between 41 and 49.” This, of course, is what we called incorrect Interpretation #1 above.

**Section 3 of “Erring on the Margin of Error” lists some such mistakes.

For a discussion of where the Elections Department’s ±4% margin of error comes from, please see the Appendices of my H2 Mathematics Textbook.

Journalists often try to explain what the confidence level and margin of error mean, and they almost always get it wrong.

Example 211. On the night of the 2016 Bukit Batok SMC By-Election, a website called Mothership.sg wrote: “Based on the sample count of 100 votes,* it was revealed at 9.26pm that the SDP Sec-Gen received 39 percent of votes. In other words, Chee would score 35 per cent in the worst case scenario and 43 per cent in the best case scenario.”

This is the most absurd misinterpretation of the margin of error I have ever seen.** Let’s see what the correct worst- and best-case scenarios are. Suppose that in the observed random sample of 900 votes, exactly 39% or 0.39 × 900 = 351 were votes for Dr. Chee and the remaining 549 were for PAP Guy. Then:

• Worst-case scenario: The observed random sample of 900 votes happened to contain exactly all of the votes in favour of Dr. Chee. That is, Dr. Chee won only 351 votes and PAP Guy won the remaining 23,570 − 351 = 23,219 votes. So the correct worst-case scenario is that Dr. Chee won ≈ 1.5% of the votes.
• Best-case scenario: The observed random sample of 900 votes happened to contain exactly all of the votes in favour of PAP Guy. That is, PAP Guy won only 549 votes and Dr. Chee won the remaining 23,570 − 549 = 23,021 votes. So the correct best-case scenario is that Dr. Chee won ≈ 97.7% of the votes.

These worst- and best-case scenarios are admittedly unlikely. Nonetheless, they are possible scenarios all the same. The journalist’s purported worst- and best-case scenarios are completely wrong.

*By the way, even this basic fact was wrong. The sample count was not 100 votes. Instead, it was 900 votes, consisting of 100 votes from each of 9 polling stations. Moreover, the Mothership.sg journalist failed to report the confidence level of 95%, either because he didn’t know what it meant or because he didn’t think it important. But it is important. It is pointless to inform the reader about the margin of error without also specifying the confidence level.

**You can find several misinterpretations of the margin of error collected in this academic paper: “Erring in the Margin of Error”. None is as absurdly bad as the one here.

56.4 Critical Region and Critical Value

Informally, the critical region is the set of values of the observed test statistic t for which we would reject the null hypothesis. The critical region is thus sometimes also called the rejection region. And the critical value(s) is (are) the exact value(s) of the observed test statistic t at which we are just able to reject the null hypothesis.

Example 205. (Dr. Chee election.) Say that as before, we have a one-tailed test where the two competing hypotheses are:

H0: µ = 0.3, HA: µ > 0.3.

Say that as before, we choose α = 0.05 as our significance level. Say that as before, in our observed random sample of 100 votes, 39 are in favour of Dr. Chee, so that our observed test statistic is t = 39.
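For a one-tailed binomial test such as this one, the critical value can be found by scanning upwards through the possible values of t until the p-value first drops below α. The sketch below assumes T ∼ B(n, µ0) under H0 and an upper-tailed alternative; the function names `upper_tail` and `upper_critical_value` are hypothetical.

```python
from math import comb


def upper_tail(t, n, theta0):
    """P(T >= t | H0) when T ~ B(n, theta0)."""
    return sum(
        comb(n, k) * theta0**k * (1 - theta0) ** (n - k) for k in range(t, n + 1)
    )


def upper_critical_value(n, theta0, alpha):
    """Smallest t whose upper-tail p-value is below alpha (None if there is none).

    The upper-tail probability only shrinks as t grows, so the first t that
    passes the check marks the start of the critical region.
    """
    for t in range(n + 1):
        if upper_tail(t, n, theta0) < alpha:
            return t
    return None


# Election example: n = 100, H0: mu = 0.3, HA: mu > 0.3, alpha = 0.05.
critical_value = upper_critical_value(100, 0.3, 0.05)
critical_region = list(range(critical_value, 101))
print("critical value:", critical_value)
print("critical region runs from", critical_region[0], "to", critical_region[-1])
```

A two-tailed version would need the mirror-image convention for “at least as extreme” and is not attempted here.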
We calculated that the corresponding p-value is 0.03398 and so we were able to reject H0 at the α = 0.05 significance level. We now calculate the critical region and the critical value.

We can calculate that if t = 38, then the corresponding p-value is ≈ 0.053 (you should verify this for yourself). And so we would be unable to reject H0. We thus conclude that the critical value is 39, because this is the value of t at which we are just able to reject H0. And the critical region is the set {39, 40, 41, ..., 100}. These are the values at which we’d be able to reject H0 at the α = 0.05 significance level.

Same example as above, but now two-tailed:

Example 205. (Dr. Chee election.) Say that as before, we have a two-tailed test where the two competing hypotheses are:

H0: µ = 0.3, HA: µ ≠ 0.3.

The significance level is again α = 0.05. Again, the observed random sample of 100 votes contains 39 in favour of Dr. Chee, so that our observed test statistic is t = 39. We calculated that the corresponding p-value is 0.06281 and so we failed to reject H0 at the α = 0.05 significance level.

We calculate that if t = 40, then the corresponding p-value is ≈ 0.03745 (you should verify this for yourself). Because the two-tailed p-value treats values equally far on either side of E[T | H0] = 30 alike, t = 20 has the same p-value as t = 40, and t = 21 has the same p-value as t = 39. Thus, the critical values are 20 and 40, because these are the values of t at which we are just able to reject H0. The critical region is the set {0, 1, ..., 20, 40, 41, ..., 100}. These are the values at which we’d be able to reject H0 at the α = 0.05 significance level.

Exercise 95. (Answer on p. 363.) We flip a coin 20 times. What are the critical region and critical value(s) in
(a) A test, at the 5% significance level, of whether the coin is biased towards heads.
(b) A test, at the 5% significance level, of whether the coin is biased.

56.5 Testing of a Population Mean (Small Sample, Normal Distribution, σ² Known)

Example 212.
The weight (in mg) of a grain of sand is X ∼ N(µ, 9). Our unknown parameter of interest is the true population mean µ (i.e. the true average weight of a grain of sand). Our “guess” is that µ = 5. We thus write down two competing hypotheses:

H0: µ = 5, HA: µ ≠ 5.

(Note that this is a two-sided test.)

We take a random sample of size 4: (X1, X2, X3, X4). Our test statistic is the sample mean X̄ = (X1 + X2 + X3 + X4)/4.

Our observed random sample is (x1, x2, x3, x4) = (3, 9, 11, 7). That is, we randomly pick four grains of sand that happen to have weights 3, 9, 11, and 7 mg. Then the observed test statistic is
$$\bar{x} = \frac{3 + 9 + 11 + 7}{4} = 7.5.$$

The p-value is the probability that the test statistic X̄ takes on values “at least as extreme as” our observed test statistic x̄ = 7.5, assuming H0: µ = 5 were true. Note that if H0 were true, then X̄ ∼ N(µ, σ²/n) = N(5, 9/4). Thus, the p-value is given by
$$p = P(\bar{X} \ge 7.5 \text{ or } \bar{X} \le 2.5 \mid H_0) = P(\bar{X} \ge 7.5 \mid H_0) + P(\bar{X} \le 2.5 \mid H_0)$$
$$= P\left(Z \ge \frac{7.5 - 5}{\sqrt{9/4}}\right) + P\left(Z \le \frac{2.5 - 5}{\sqrt{9/4}}\right) \approx 0.04779 + 0.04779 = 0.09558.$$

Thus, we reject H0 at the α = 0.1 significance level. However, we would fail to reject H0 at the α = 0.05 significance level.

The table below summarises the tests to use for the population mean, in different circumstances. In this section, we learnt how to handle the first case (any sample size, normal distribution, σ² known). The following sections will deal with the other two cases.

| Sample size | Distribution | σ² | Test |
|---|---|---|---|
| Any | Normal | Known | Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1) |
| Large | Any | Known | Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1) |
| Large | Any | Unknown | Z-test: (X̄ − µ)/(s/√n) ∼ N(0, 1) |
| Small | Normal | Unknown | Not in A-levels |
| Small | Non-normal | Either | Not in A-levels |

Exercise 96. The Singapore daily high temperature (in °C) can be modelled by X ∼ N(µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the true average daily high temperature).
Your friend guesses that µ = 34. You gather the following data on daily high temperatures, of 10 randomly-chosen days in 2015: (35, 35, 31, 32, 33, 34, 31, 34, 35, 34). Test your friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and alternative hypotheses.) (Answer on p. 364.)

56.6 Testing of a Population Mean (Large Sample, Any Distribution, σ² Known)

We’ll recycle the same example from the previous section. Before, we knew that X was normally distributed. Now the big difference is that we have absolutely no idea what distribution X comes from! To compensate, we require also that our random sample is “large enough”, so that the CLT-approximation can be used.

Example 213. The weight (in mg) of a grain of sand is X ∼ (µ, 9). (This says simply that X is distributed with mean µ and variance 9.) Our unknown parameter of interest is the true population mean µ (i.e. the true average weight of a grain of sand). Again, we “guess” that µ = 5. Again, we write down

H0: µ = 5, HA: µ ≠ 5.

(Note that this is, again, a two-sided test.)

This time, we’ll take a random sample of size 100: (X1, X2, ..., X100). Again, our test statistic is the sample mean X̄ = (X1 + X2 + ⋯ + X100)/100.

Recall the magic of the CLT. Even if we have absolutely no idea what distribution X is drawn from, provided n is sufficiently large, X̄ is approximately normally distributed. So here, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately, the normal distribution N(µ, σ²/n). So, if H0 were true, then we have, approximately, X̄ ∼ N(µ, σ²/n) = N(5, 9/100).

Say the observed test statistic we get is:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6.$$
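Before working out the p-value for this example, here is a sketch of the two-tailed Z-test calculation from the previous section (Example 212), where X̄ ∼ N(5, 9/4) under H0 and x̄ = 7.5. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `z_test_two_tailed` is hypothetical. The same function applies unchanged to a large sample from an unknown distribution, with the caveat that the normality of X̄ is then only a CLT approximation.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(x_bar, mu0, variance, n):
    """Return (z, p) for a two-tailed Z-test of H0: mu = mu0.

    Assumes the sample mean is (at least approximately) N(mu0, variance / n)
    under H0, where `variance` is the known population variance.
    """
    z = (x_bar - mu0) / sqrt(variance / n)
    # Both tails: P(Z >= |z|) + P(Z <= -|z|) = 2 * (1 - Phi(|z|)).
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Example 212: X ~ N(mu, 9), n = 4, observed sample mean 7.5, H0: mu = 5.
z, p = z_test_two_tailed(7.5, 5, 9, 4)
print(f"z = {z:.4f}, p = {p:.5f}")
for alpha in (0.1, 0.05):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"at alpha = {alpha}: {decision}")
```

Doubling one tail works because the standard normal distribution is symmetric about 0, so the two tail probabilities are equal.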
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as extreme as” our observed test statistic x̄ = 5.6, assuming H0: µ = 5 were true. Thus, the p-value is given by
$$p = P(\bar{X} \ge 5.6 \text{ or } \bar{X} \le 4.4 \mid H_0) = P(\bar{X} \ge 5.6 \mid H_0) + P(\bar{X} \le 4.4 \mid H_0)$$
$$\overset{\text{CLT}}{\approx} P\left(Z \ge \frac{5.6 - \mu}{\sigma/\sqrt{n}}\right) + P\left(Z \le \frac{4.4 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z \ge \frac{5.6 - 5}{\sqrt{9/100}}\right) + P\left(Z \le \frac{4.4 - 5}{\sqrt{9/100}}\right)$$
$$= P(Z \ge 2) + P(Z \le -2) \approx 0.0455.$$

Thus, we reject H0 at the α = 0.05 significance level.

Exercise 97. The Singapore daily high temperature (in °C) can be modelled by X ∼ (µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the true average daily high temperature). Your friend guesses that µ = 34. You gather the data on daily high temperatures, of 100 randomly-chosen days in 2015 and find the observed sample average temperature to be 33.4 °C. Test your friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and alternative hypotheses. Also, clearly state where you use the CLT.) (Answer on p. 364.)

56.7 Testing of a Population Mean (Large Sample, Any Distribution, σ² Unknown)

We’ll recycle the same example from the previous section. Again, we have absolutely no idea what distribution X comes from. And again, the random sample is large enough, so that the CLT can be used. But now, σ² is unknown.

This turns out to be no big deal. We can simply replace σ² with the observed unbiased sample variance s², and do the same thing as before.

Example 214. The weight (in mg) of a grain of sand is X ∼ (µ, σ²). (This says simply that X is distributed with mean µ and variance σ².) Our unknown parameter of interest is the true population mean µ (i.e. the true average weight of a grain of sand). Again, we “guess” that µ = 5. Again, we write down

H0: µ = 5, HA: µ ≠ 5.

(Note that this is, again, a two-sided test.)

Again, we take a random sample of size 100: (X1, X2, ..., X100).
Again, our test statistic is the sample mean X̄ = (X1 + X2 + ⋯ + X100)/100.

Again, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately, the normal distribution N(µ, σ²/n). So, if H0 were true, then we have, approximately, X̄ ∼ N(µ, σ²/n) = N(5, σ²/100).

Since the sample variance S² is an unbiased estimator for σ², it is plausible that we also have, approximately, X̄ ∼ N(µ, s²/n) = N(5, s²/100), where s² is the observed sample variance.

Say the observed sample mean and observed sample variance we get are:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6 \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{100} (x_i - \bar{x})^2}{n - 1} = 8.$$

Again, the p-value is the probability that our test statistic X̄ takes on values “at least as extreme as” our observed test statistic x̄ = 5.6, assuming H0: µ = 5 were true. Thus, the p-value is given by
$$p = P(\bar{X} \ge 5.6 \text{ or } \bar{X} \le 4.4 \mid H_0) = P(\bar{X} \ge 5.6 \mid H_0) + P(\bar{X} \le 4.4 \mid H_0)$$
$$\overset{\text{CLT}}{\approx} P\left(Z \ge \frac{5.6 - \mu}{s/\sqrt{n}}\right) + P\left(Z \le \frac{4.4 - \mu}{s/\sqrt{n}}\right) = P\left(Z \ge \frac{5.6 - 5}{\sqrt{8/100}}\right) + P\left(Z \le \frac{4.4 - 5}{\sqrt{8/100}}\right)$$
$$\approx P(Z \ge 2.1213) + P(Z \le -2.1213) \approx 0.03389.$$

Thus, we reject H0 at the α = 0.05 significance level.

Exercise 98. The Singapore daily high temperature (in °C) can be modelled by X ∼ (µ, σ²). Our unknown parameter of interest is the true population mean µ (i.e. the true average daily high temperature). Your friend guesses that µ = 34. You gather the data on daily high temperatures, of 100 randomly-chosen days in 2015. Your observed sample mean temperature is 33.4 °C and your observed sample variance is 11.2 °C². Test your friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and alternative hypotheses. Also, clearly state where you use the CLT.) (Answer on p. 365.)

56.8 Formulation of Hypotheses

Example 215.
We flip a coin 100 times. We get 100 heads. What can we say about the coin?

This is an open-ended question, to which there can be many different answers. Here’s the answer we’re taught to give for H1 Maths:

Let \(\mu\) be the probability that a coin-flip is heads. We formulate a pair of competing hypotheses: \(H_0: \mu = 0.5\), \(H_A: \mu \ne 0.5\). Our test statistic \(T\) is the number of heads (out of 100 coin-flips). Our observed test statistic \(t\) is 100. The corresponding p-value (note that this is a two-tailed test) is

\[
P(T \ge 100 \text{ or } T \le 0 \mid H_0) = P(T = 0 \mid H_0) + P(T = 100 \mid H_0) = \binom{100}{0} 0.5^{0}\, 0.5^{100} + \binom{100}{100} 0.5^{100}\, 0.5^{0} \approx 1.578 \times 10^{-30}.
\]

The tiny p-value may be interpreted as casting doubt on or providing evidence against \(H_0\). We note also that we can easily reject \(H_0\) at any of the conventional significance levels (\(\alpha = 0.1\), \(\alpha = 0.05\), or \(\alpha = 0.01\)).

Exercise 99. (Answer on p. 365.) We observe the weights (in kg) of a random sample of 50 Singaporeans: \((x_1, x_2, \dots, x_{50})\). We observe that \(\sum x_i / 50 = 68\) and \(\sum x_i^2 / 50 = 5000\). A friend claims that the average American is heavier than the average Singaporean. It is known that the average American weighs 75 kg. Is your friend correct? If you make any assumptions or approximations, make clear exactly where you do so. (Hint: Use Fact 13(a).)

57 Correlation and Linear Regression

57.1 Bivariate Data and Scatter Diagrams

In this chapter, we’ll be interested in the relationship between two sets of data.

Example 216. We measure the heights and weights of 10 adult male Singaporeans. Their heights (in cm) and weights (in kg) are given in this table:

i          1    2    3    4    5    6    7    8    9    10
h_i (cm)   182  165  173  155  178  174  169  160  150  190
w_i (kg)   81   70   71   53   72   75   69   60   44   80

We call \((h_i, w_i)\) observation \(i\). So for example, observation 5 is (178, 72) and observation 9 is (150, 44).

We can plot a scatter diagram of these 10 persons’ weights (vertical axis) against their heights (horizontal).
[Figure: scatter diagram of weight (kg, vertical axis) against height (cm, horizontal axis), with a black dotted line of best fit.]

The black dotted line is called a line of best fit. Shortly (section 57.4), we’ll learn how to construct this line of best fit.

The more closely the data points in the above scatter diagram lie to a straight line, the more strongly linearly-correlated are weight and height. So here with these particular data, the linear correlation between weight and height seems strong. In the next section, we’ll learn about the product moment correlation coefficient, which is a way to precisely quantify the degree to which two sets of data are linearly-correlated.

Because the line of best fit is upward-sloping, we can also say that the linear correlation is positive.

Example 217. We have data from the Clementi weather station for the daily high temperature (in °C) and daily rainfall (in mm) on 361 days in 2015. (Strangely, data were missing for four days, namely Feb 10-13.)

i          1     2     3     4     ...  361
t_i (°C)   27.3  29.5  31.1  32    ...  30.2
p_i (mm)   0     0.2   0     0     ...  12.4

We can again plot a scatter diagram of rainfall against temperature.

[Figure: scatter diagram of rainfall (mm, vertical axis) against temperature (degrees Celsius, horizontal axis), with a black dotted line of best fit.]

Again, the black dotted line is a line of best fit. The data points do not seem close to this line. Thus, it seems that the linear correlation between temperature and rainfall is weak. The line of best fit is downward-sloping and so we say that the linear correlation is negative.

Exercise 100. (Answer on p. 366.) The table below shows the prices charged (p) and the number of haircuts (q) given by 5 different barbers, during June 2016. Draw a scatter diagram with price on the horizontal axis. Plot also what you think looks like a line of best fit.
i          1    2    3     4    5
p_i ($)    8    9    4     10   8
q_i        300  250  1000  400  400

57.2 Product Moment Correlation Coefficient (PMCC)

In the previous section, we used a scatter diagram to determine if there was a plausible linear relationship between two sets of data. This, though, was a very crude method. A more precise measure of the degree to which two sets of data are linearly correlated is called the product moment correlation coefficient (PMCC). Formally:

Definition 13. Let \((x_1, x_2, \dots, x_n)\) and \((y_1, y_2, \dots, y_n)\) be two ordered sets of real numbers. Then their product moment correlation coefficient (PMCC), denoted \(r\), is the real number defined by

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}.
\]

Properties of the PMCC.

1. \(-1 \le r \le 1\).
2. We say the linear correlation is positive if \(r > 0\) and negative if \(r < 0\).
3. If \(r = 1\), then the data points lie exactly on an upward-sloping line (and we say the linear correlation is perfect).
4. If \(r = -1\), then the data points lie exactly on a downward-sloping line (and we say the linear correlation is perfect).
5. If \(r\) is close to 1, then the data points lie close to an upward-sloping line (and we say the linear correlation is very strong).
6. If \(r\) is close to \(-1\), then the data points lie close to a downward-sloping line (and we say the linear correlation is very strong).
7. If \(r\) is close to 0, then we say that the linear correlation is very weak.
8. \(r\) is merely a measure of linear correlation and nothing else. Two variables may be very closely related but not linearly-correlated. For example, data generated by the quadratic model \(y_i = x_i^2\) may have a very low \(r\).

Example 216 (continued from above). This is the height and weight example revisited.
For convenience, we reproduce the data and scatter diagram:

i          1    2    3    4    5    6    7    8    9    10
h_i (cm)   182  165  173  155  178  174  169  160  150  190
w_i (kg)   81   70   71   53   72   75   69   60   44   80

[Figure: scatter diagram of weight (kg) against height (cm), with the line of best fit.]

\[
\bar{h} = \frac{182 + 165 + 173 + 155 + 178 + 174 + 169 + 160 + 150 + 190}{10} = 169.6,
\]
\[
\bar{w} = \frac{81 + 70 + 71 + 53 + 72 + 75 + 69 + 60 + 44 + 80}{10} = 67.5,
\]
\[
\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) = (182 - \bar{h})(81 - \bar{w}) + \dots + (190 - \bar{h})(80 - \bar{w}) = 1237,
\]
\[
\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2} = \sqrt{(182 - 169.6)^2 + \dots + (190 - 169.6)^2} \approx 37.180640,
\]
\[
\sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2} = \sqrt{(81 - 67.5)^2 + \dots + (80 - 67.5)^2} \approx 35.418922,
\]
\[
\implies r = \frac{\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w})}{\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2}\,\sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2}} \approx 0.9393.
\]

As expected, \(r > 0\) (the linear correlation is positive or, equivalently, the line of best fit is upward-sloping). Moreover, \(r\) is close to 1 (the linear correlation is very strong).

Example 217 (continued from above). This is the temperature and rainfall example revisited. For convenience, we reproduce the data and scatter diagram:

i          1     2     3     4     ...  361
t_i (°C)   27.3  29.5  31.1  32    ...  30.2
p_i (mm)   0     0.2   0     0     ...  12.4

[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with the line of best fit.]

\[
\bar{t} = \frac{27.3 + 29.5 + 31.1 + 32 + \dots + 30.2}{361} \approx 31.5, \qquad \bar{p} = \frac{0 + 0.2 + 0 + 0 + \dots + 12.4}{361} \approx 5.0,
\]
\[
\implies r = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n} (t_i - \bar{t})^2}\,\sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2}} = \frac{(27.3 - 31.5)(0 - 5.0) + \dots + (30.2 - 31.5)(12.4 - 5.0)}{\sqrt{(27.3 - 31.5)^2 + \dots + (30.2 - 31.5)^2}\,\sqrt{(0 - 5.0)^2 + \dots + (12.4 - 5.0)^2}} \approx -0.1623.
\]

As expected, \(r < 0\) (the linear correlation is negative or, equivalently, the line of best fit is downward-sloping). Moreover, \(r\) is fairly close to 0 (the linear correlation is weak).

Exercise 101. Compute the PMCC between p and q, using the data below. (Answer on p. 366.)
i          1    2    3     4    5
p_i ($)    8    9    4     10   8
q_i        300  250  1000  400  400

57.3 Correlation Does Not Imply Causation (Optional)

Correlation does not imply causation. This saying has now become a cliché. That doesn’t make it any less true.

Below is an amusing but spurious correlation (source).

[Figure: chart from tylervigen.com, “US spending on science, space, and technology correlates with Suicides by hanging, strangulation and suffocation”, plotting both series for 1999 to 2009.]

The PMCC is \(r \approx 0.99789126\). So the two sets of data are almost perfectly linearly-correlated. But of course, this doesn’t mean that spending on science causes suicides or that suicides cause spending on science. More likely, the correlation is simply spurious.

A comic from xkcd:

[Figure: xkcd comic.]

57.4 Linear Regression

Example 216 (continued from above). We suspect that the heights and weights of adult male Singaporeans are linearly-correlated. We thus write down this linear model: \(w = a + bh\).

Recall the quote: “All models are wrong, but some are useful.” The model \(w = a + bh\) is unlikely to be exactly correct. But hopefully it will be useful.

We treat \(a\) and \(b\) as unknown parameters (do you expect \(b\) to be positive or negative?). Our goal is to try to get estimates for \(a\) and \(b\), from an observed random sample of height and weight data. We recycle the data from earlier. These, along with the scatter diagram, are reproduced for convenience.

i          1    2    3    4    5    6    7    8    9    10
h_i (cm)   182  165  173  155  178  174  169  160  150  190
w_i (kg)   81   70   71   53   72   75   69   60   44   80

[Figure: scatter diagram of weight (kg) against height (cm), with candidate lines of best fit.]

The basic idea of linear regression is this: Find the line that “best fits” the given data.
Drawn in the figure above are three plausible candidates for the “line of best fit”. But there can only be one line of best fit. Which is it? At the end of the day, we’ll choose the black dotted line as “the” line of best fit. But why? This will be answered in the next section.

Example 217 (continued from above). We suspect that daily rainfall and daily high temperatures for 2015 were linearly-correlated. We thus write down this linear model: \(p = a + bt\).

Again, our goal is to get estimates for the unknown parameters \(a\) and \(b\) (do you expect \(b\) to be positive or negative?). We gather the following data (recycled from before):

i          1     2     3     4     ...  361
t_i (°C)   27.3  29.5  31.1  32    ...  30.2
p_i (mm)   0     0.2   0     0     ...  12.4

We can again plot a scatter diagram of rainfall against temperature.

[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with candidate lines of best fit.]

Again, drawn in the figure above are several plausible candidates for the “line of best fit”. It turns out that the black dotted line will be “the” line of best fit.

57.5 Ordinary Least Squares (OLS)

There are different methods for determining “the” line of best fit. Each method will give a different line of best fit. The method we’ll learn in H1 Maths is the most basic and most standard method. It is called the method of ordinary least squares (OLS).

Let’s assume there is some true linear model, which may be written as \(y = a + bx\). As always, we stick to the objectivist interpretation. The parameters \(a\) and \(b\) have some true, fixed values. However, they are unknown (and may forever be unknown).

Nonetheless, we’ll try to do our best and get estimates for \(a\) and \(b\). These estimates will be denoted \(\hat{a}\) and \(\hat{b}\). And our line of best fit will then be \(y = \hat{a} + \hat{b}x\).

How do we find this line of best fit? Intuitively, this will be the line to which the data points are “as close as possible”.
But there are many ways to define the term “as close as possible”. For example, we could try to minimise the sum of the distances between the points and the line. But we shall not do this. Instead, we’ll use the method of OLS:

1. Measure the vertical distance of each data point \((x_i, y_i)\) from the line. This is called the residual and is denoted \(\hat{u}_i\).
2. Our goal is to find the line \(y = \hat{a} + \hat{b}x\) that minimises \(\sum \hat{u}_i^2\). This quantity is called the Sum of Squared Residuals (SSR).

Example 216 (height and weight example revisited). Our candidate line of best fit is \(w = \hat{a} + \hat{b}h = 65 + 0h = 65\). This is a horizontal line, which simply “predicts” that everyone’s weight is always 65 kg, regardless of their height. (This is a somewhat silly candidate line of best fit. Not surprisingly, this is not the actual line of best fit.)

[Figure: scatter diagram of weight (kg) against height (cm), with the horizontal candidate line w = 65.]

i                          1     2     3     4     5     6     7     8     9     10
h_i (cm)                   182   165   173   155   178   174   169   160   150   190
w_i (kg)                   81    70    71    53    72    75    69    60    44    80
ŵ_i (kg)                   65    65    65    65    65    65    65    65    65    65
û_i = w_i − ŵ_i (kg)       16    5     6     −12   7     10    4     −5    −21   15

The second last row of the above table gives, for each person with height \(h_i\), the corresponding predicted weight \(\hat{w}_i\) (as per our candidate line of best fit). The residual \(\hat{u}_i\) (last row) is then defined as the vertical distance between the data point and the weight predicted by the candidate line of best fit.

The SSR is

\[
\sum_{i=1}^{10} \hat{u}_i^2 = 16^2 + 5^2 + 6^2 + (-12)^2 + 7^2 + 10^2 + 4^2 + (-5)^2 + (-21)^2 + 15^2 = 1317.
\]

Can we do better than this? That is, can we find another candidate line of best fit whose SSR is smaller than 1317?

The following fact gives two formulae for \(\hat{b}\), the slope of the line of best fit. Formula (i) is printed in the List of Formulae you get during exams, but formula (ii) is not.

Fact 16. Let \((x_1, x_2, \dots, x_n)\) and \((y_1, y_2, \dots, y_n)\) be two ordered sets of data. The OLS regression line of y on x is \(y - \bar{y} = \hat{b}(x - \bar{x})\), where

\[
\text{(i)} \quad \hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \text{(ii)} \quad \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.
\]

Moreover, the regression line can also be written in the form \(y = \hat{a} + \hat{b}x\), where \(\hat{b}\) is as given above and \(\hat{a} = \bar{y} - \hat{b}\bar{x}\).

Proof. We want to find \(\hat{a}\) and \(\hat{b}\) such that the line \(y = \hat{a} + \hat{b}x\) has the smallest SSR possible.

The residual \(\hat{u}_i\) is defined as the vertical distance between \((x_i, y_i)\) and the line \(y = \hat{a} + \hat{b}x\). That is, \(\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{a} + \hat{b}x_i)\). Thus, the SSR is

\[
\sum \hat{u}_i^2 = \sum \left[ y_i - (\hat{a} + \hat{b}x_i) \right]^2.
\]

We wish to minimise the SSR, by choosing appropriate values of \(\hat{a}\) and \(\hat{b}\). This involves the following pair of first order conditions:[20]

\[
\frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2 = 0, \qquad \frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2 = 0.
\]

The remainder of the proof simply involves taking derivatives and doing the algebra. It can be found in the Appendices of my H2 Mathematics Textbook.

Remark 3. Whenever we simply say regression line or line of best fit, it may safely be assumed that we are talking about the OLS regression line.

[20] There’s a bit of hand-waving here.

Example 216 (height and weight example revisited). We already calculated

\[
\bar{h} = 169.6, \qquad \bar{w} = 67.5, \qquad \sum_{i=1}^{n} (h_i - \bar{h})^2 = 1382.4, \qquad \sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) = 1237.
\]

So,

\[
\hat{b} = \frac{\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w})}{\sum_{i=1}^{n} (h_i - \bar{h})^2} = \frac{1237}{1382.4} \approx 0.8948.
\]

Thus, the regression line is \(w - 67.5 = 0.8948(h - 169.6)\) or \(w = \hat{a} + \hat{b}h = -84.26 + 0.8948h\).

[Figure: scatter diagram of weight (kg) against height (cm), with the OLS regression line.]

i                          1     2     3     4     5     6     7     8     9     10
h_i (cm)                   182   165   173   155   178   174   169   160   150   190
w_i (kg)                   81    70    71    53    72    75    69    60    44    80
ŵ_i (kg)                   78.6  63.4  70.5  54.4  75.0  71.4  67.0  58.9  50.0  85.8
û_i = w_i − ŵ_i (kg)       2.4   6.6   0.5   −1.4  −3.0  3.6   2.0   1.1   −6.0  −5.8

The SSR for the actual line of best fit is \(\sum_{i=1}^{10} \hat{u}_i^2 = 2.4^2 + \dots + (-5.8)^2 \approx 147.6\).
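The arithmetic in this example can be checked with a few lines of Python. The sketch below is only an illustration and not part of the syllabus: the function names `ols_line` and `ssr` are made up for this purpose. It applies formula (i) of Fact 16 to the height and weight data, then compares the SSR of the OLS line with that of the horizontal candidate line w = 65.

```python
# Height (cm) and weight (kg) data from Example 216.
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]


def ols_line(xs, ys):
    """Return (a_hat, b_hat) for the OLS regression line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Formula (i) of Fact 16: sum of cross-deviations over sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def ssr(xs, ys, a, b):
    """Sum of squared residuals of the data about the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a_hat, b_hat = ols_line(heights, weights)
ssr_ols = ssr(heights, weights, a_hat, b_hat)
ssr_flat = ssr(heights, weights, 65, 0)  # the horizontal candidate line w = 65

print(f"b_hat = {b_hat:.4f}, a_hat = {a_hat:.2f}")
print(f"SSR (OLS line) = {ssr_ols:.1f}, SSR (w = 65) = {ssr_flat:.0f}")
```

Because the code keeps unrounded values of the estimates, small differences in the last decimal place from the hand calculation are possible. The SSR of the OLS line should still come out at about 147.6.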
This is much better than the SSR of 1317 that we found for the previous candidate line of best fit, which was simply a horizontal line.

Exercise 102. (a) Find the regression line of q on p, using the data below. (b) Complete the table. (c) Draw the scatter diagram, including the regression line and the corresponding residuals. (d) Compute the SSR. (Answer on p. 367.)

i                   1    2    3     4    5
p_i ($)             8    9    4     10   8
q_i                 300  250  1000  400  400
q̂_i
û_i = q_i − q̂_i

57.6 TI84 to Calculate the PMCC and the OLS Estimates

Example 218. We’ll find the PMCC and the regression line for these data:

i      1   2   3   4   5
x_i    1   7   3   11  8
y_i    14  5   6   4   4

1. Press ON to turn on your calculator.
2. Press the blue 2ND button and then CATALOG (which corresponds to the 0 button). This brings up the CATALOG menu.
3. Using the down arrow key ∨, scroll down until the cursor is on DiagnosticOn.
4. Press ENTER once. And press ENTER a second time. The TI84 now says “DONE”, telling you that the Diagnostic option has been turned on.

The above steps need only be performed once, unless you’ve just reset your calculator (as is required before each exam), in which case you have to go through them again.

5. Press STAT to bring up the STAT menu.
6. Press 1 to select the “1:Edit” option.
7. The TI84 now prompts you to enter data under the column titled “L1”. This is where you should enter the data for x, using the numeric pad and the ENTER key as is appropriate. (I omit from this step the exact buttons you should press.)
8. After entering the last entry, press the right arrow key > to go to column L2. Then enter the data for y, again using the numeric pad and the ENTER key as is appropriate.
9. Now press STAT to again bring up the STAT menu.
10. Press the right arrow key > to go to the CALC submenu.
11. Press 4 to select the “4:LinReg(ax+b)” option.
12. To tell the TI84 to go ahead and do the calculations, simply press ENTER.

The TI84 tells you that the PMCC is \(r = -0.8147656398\). The equation of the regression line of y on x is \(y = ax + b = -0.859375x + 11.75625\). (Be careful to note that the TI84 uses the symbol “a” for the coefficient of x, whereas the A-level List of Formulae uses b instead. Don’t get these mixed up!)

Exercise 103. Using your TI84, find the PMCC between q and p, and also find the regression line of q on p (see data below). Verify that your answers for this exercise are the same as those in the last two exercises. (Answer on p. 368.)

i          1    2    3     4    5
p_i ($)    8    9    4     10   8
q_i        300  250  1000  400  400

57.7 Interpolation and Extrapolation

Given any value of \(x\), we call the corresponding \(\hat{y} = \hat{b}(x - \bar{x}) + \bar{y}\) the fitted value or the predicted value.

One use of the regression line is that it can help us predict (or “guess”) the value of y, even for x for which we have no data.

Example 216 (height and weight example revisited). Say we want to guess the weight of an adult male Singaporean who is 185 cm tall. Using our regression line, we predict that his weight is \(\hat{w}_{h=185} = 0.8948 \times 185 - 84.26 \approx 81.3\) kg. This is called interpolation, because we are predicting the weight of a person whose height is between two of our observations.

Say instead we want to guess the weight of an adult male Singaporean who is 210 cm tall. Using our regression line, we predict that his weight is \(\hat{w}_{h=210} = 0.8948 \times 210 - 84.26 \approx 103.6\) kg. This is called extrapolation, because we are predicting the weight of a person whose height is beyond our rightmost observation.
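As a rough illustration, the two predictions can be reproduced in Python. This is a sketch under stated assumptions: the function `predict_weight` is made up for this purpose, and it uses the rounded estimates quoted above. It labels a prediction as interpolation when the height lies within the range of observed heights (150 cm to 190 cm) and as extrapolation otherwise, which is one reasonable reading of the definitions above rather than an official rule.

```python
# Rounded OLS estimates quoted in Example 216 (w = A_HAT + B_HAT * h).
A_HAT = -84.26
B_HAT = 0.8948

# Smallest and largest heights (cm) among the 10 observations.
H_MIN, H_MAX = 150, 190


def predict_weight(height_cm):
    """Return (predicted weight in kg, kind of prediction) for a given height."""
    fitted = A_HAT + B_HAT * height_cm
    # Assumption: anything inside the observed range counts as interpolation.
    kind = "interpolation" if H_MIN <= height_cm <= H_MAX else "extrapolation"
    return fitted, kind


for h in (185, 210):
    w_hat, kind = predict_weight(h)
    print(f"h = {h} cm: predicted weight {w_hat:.1f} kg ({kind})")
```

The code gives no warning about how reliable each number is. It only records which kind of prediction was made.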
i          1     2     3     4     5     6     7     8     9     10
h_i (cm)   182   165   173   155   178   174   169   160   150   190   185   210
w_i (kg)   81    70    71    53    72    75    69    60    44    80
ŵ_i (kg)   78.6  63.4  70.5  54.4  75.0  71.4  67.0  58.9  50.0  85.8  81.3  103.6

[Figure: scatter diagram of weight (kg) against height (cm), with the regression line extended to include the two predicted points.]

For the A-level exams, you are supposed to mindlessly and formulaically say that “Extrapolation is less reliable than interpolation”, because the former predicts what’s beyond the known observations; the latter predicts what’s between two known observations.

This, though, is not a very satisfying explanation for why extrapolation is “less reliable” than interpolation. It merely leads to another question: “Why should a prediction be more reliable if done between two known observations, than if done to the right of the right-most observation (or to the left of the left-most observation)?”

We won’t give an adequate answer to this latter question. Instead, we’ll simply give a bunch of examples to illustrate the dangers of extrapolation:

Example 219. A man on a diet weighs 115 kg in Week #1. Here’s a chart of his weight loss.

The OLS line of best fit suggests that he has been losing about 0.5 kg a week. He forgot to record his weight on Week #6. By interpolation, we “predict” that his weight that week was 112.5 kg. This is probably a reliable guess.

By extrapolation, we predict that his weight on Week #201 will be 15 kg. This guess is obviously absurd. It requires that he keeps losing 0.5 kg a week for nearly 4 years.

Example 220. A growing boy is 160 cm tall in Month #1. Here’s a chart of his growth.

The OLS line of best fit suggests that he has been growing by about 1 cm a month. He forgot to record his height in Month #6. By interpolation, we “predict” that his height that month was 165 cm. This is probably a reliable guess.

By extrapolation, we predict that his height in Month #101 will be 260 cm.
This guess is obviously absurd. It requires that he keep growing by 1 cm a month for the next 8-plus years.

Here are three colourful examples of the dangers of extrapolation from other contexts.

Example 221. Russell’s Chicken (Problems of Philosophy, 1912, Google Books link).

The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken. ... The mere fact that something has happened a certain number of times causes animals and men to expect that it will happen again. Thus our instincts certainly cause us to believe the sun will rise to-morrow, but we may be in no better a position than the chicken which unexpectedly has its neck wrung.

Example 222. The Fermat numbers are

\[
F_0 = 2^{2^0} + 1 = 3, \quad F_1 = 2^{2^1} + 1 = 5, \quad F_2 = 2^{2^2} + 1 = 17, \quad F_3 = 2^{2^3} + 1 = 257, \quad F_4 = 2^{2^4} + 1 = 65537.
\]

Remarkably, the first five Fermat numbers are all prime. This observation led Fermat to conjecture (guess) in the 17th century that all Fermat numbers are prime. This was an act of extrapolation.

Unfortunately, Fermat’s act of extrapolation was wrong. About a century later, Euler showed that \(F_5 = 2^{2^5} + 1 = 4294967297 = 641 \times 6700417\) is composite (not prime).

Today, the Fermat numbers \(F_5, F_6, \dots, F_{32}\) are all known to be composite. Indeed, it was shown in 1964 that \(F_{32}\) is composite. Over half a century later, it is not yet known if \(F_{33} = 2^{2^{33}} + 1\) is prime or composite. \(F_{33}\) is an unimaginably huge number, with 2,585,827,973 digits.

Example 223. On Ah Beng’s first day at school, he learns in Chinese class that the Chinese character for the number 1 is written as a single horizontal stroke. On his second day at school, he learns that the Chinese character for the number 2 is written as two horizontal strokes.
On his third day at school, he learns that the Chinese character for the number 3 is written as three horizontal strokes.

[Figure: the Chinese characters for 1, 2, and 3.]

After his third day at school, Ah Beng decides he’ll skip at least the next few Chinese classes, because he thinks he knows how to write the Chinese characters for the numbers 4 and above. 4 simply consists of four horizontal strokes; 5 simply consists of five horizontal strokes; etc.

Unfortunately, Ah Beng’s act of extrapolation is wrong. The characters for the numbers 4 through 10 look instead like this:

[Figure: the Chinese characters for the numbers 4 through 10.]

On the other hand, here are two historical examples of extrapolation that, to everyone’s surprise, have held up remarkably well (at least to date).

Example 224. Moore’s Law. In 1965, Gordon Moore observed that the number of components that could be crammed onto each integrated circuit doubled every year. He predicted that this rate of progress would continue at least through 1975.

In 1975, he adjusted his prediction to a more modest rate of doubling every two years. Thus far, this latter prediction has held up remarkably well, as the following graph (taken from Nature) shows.

Unfortunately, as stated in the same Nature article, it “has become increasingly obvious to everyone involved” that “Moore’s law ... is nearing its end”.

Example 225. Augustine’s Law. In 1983, Norman Augustine observed that the cost of a tactical aircraft grows four-fold every ten years. (Google Books.) This is considerably quicker than the rate at which the annual US defense budget and US Gross National Product (GNP) grow. Extrapolating, he concluded:

• In 2054, the entire annual US defense budget will be spent on a single aircraft.
• Early in the 22nd century, the entire US GNP will be spent on a single aircraft.
These seemingly-absurd conclusions were written at least partly in jest. Except so far they have been right on track. In a 2010 Economist article, Augustine was quoted as saying, “We are right on target. Unfortunately nothing has changed.” That article also presented an updated version of Augustine’s Law.

The latest F-35 fighter program is estimated to cost the US Department of Defense US$1.124 trillion. To be fair, that estimate is the cost of the entire program over its projected 60-year lifespan (through 2070); this includes R&D, the purchase of over 2,000 F-35s, and operating costs. But still, US$1.124 trillion is a mind-blowing figure.*

*Figure quoted from an April 2016 Defense News story. Note though that the estimate keeps changing.

Exercise 104. Using the data below, “predict” how many haircuts were sold in June 2016 by (a) a barber who charged $7 per haircut; and (b) a barber who charged $200 per haircut. Which prediction is an act of interpolation and which is an act of extrapolation? Which prediction do you think is more reliable? (Answer on p. 368.)

i          1    2    3     4    5
p_i ($)    8    9    4     10   8
q_i        300  250  1000  400  400

57.8 The Higher the PMCC, the Better the Model?

There are no routine statistical questions, only questionable statistical routines. - Usually attributed to David Cox.

It’s much more interesting to live not knowing than to have answers which might be wrong. - Richard Feynman (1981, YouTube).

The A-level examiners[21] want you to say, mindlessly and formulaically, that

All else equal, a model with a higher PMCC is better than a model with a lower PMCC.

Regurgitating the above sentence will earn you your full mark. But in fact, without the “all else equal” clause, it is nonsense. And since it is almost never true that “all else is equal”, it is almost always nonsense.
In every introductory course or text on statistics, one is told that the PMCC is merely a relatively-unimportant consideration, in deciding between models. Yet somehow, the A-level examiners seem to consider the PMCC an all-important consideration. Here’s a quick example to illustrate.

Example 226. (From the 2015 H2 Maths exam.) In an experiment the following information was gathered about air pressure P, measured in inches of mercury, at different heights above sea-level h, measured in feet.

h    2000  5000  10000  15000  20000  25000  30000  35000  40000  45000
P    27.8  24.9  20.6   16.9   13.8   11.1   8.89   7.04   5.52   4.28

The exam first asks us to find the PMCCs between (a) \(h\) and \(P\); (b) \(\ln h\) and \(P\); and (c) \(\sqrt{h}\) and \(P\). The answers are (a) \(r_a \approx -0.980731\); (b) \(r_b \approx -0.974800\); and (c) \(r_c \approx -0.998638\).

The A-level exam then says, “Using the most appropriate case ..., find the equation which best models air pressure at different heights.” The “correct” answer is that (c) \(P = a + b\sqrt{h}\) is the “most appropriate” model, simply because the PMCC there is the largest.

[21] See H2 Maths 9740 N2015/II/10(iii), N2014/II/8(b)(ii), N2012/II/8(v), N2011/II/8(iii), N2010/II/10(iii), and N2008/II/8(i).

But this is utter nonsense. One does not conclude that one model is “more appropriate” than another simply because its PMCC is 0.018 larger. Small measurement errors or plain bad luck could easily explain these tiny differences in PMCCs.

Moreover, even if one model has \(r = 0.9\) and another has \(r = 0.4\), it does not automatically follow that the first model is “more appropriate” than the second. In deciding which statistical model to use, there are very many considerations, of which the PMCC is a relatively-unimportant one.

In my view, the correct answer should have been this: We have far too little information to make any conclusions.
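For readers who want to check the three PMCCs quoted in this example, here is a minimal Python sketch. The helper `pmcc` is a made-up name that simply implements Definition 13. The data are those in the table above, and the natural logarithm and square root are applied to h before computing each coefficient.

```python
import math

# Data from Example 226: height h (feet) and air pressure P (inches of mercury).
h = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]


def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r_a = pmcc(h, P)                             # (a) h and P
r_b = pmcc([math.log(x) for x in h], P)      # (b) ln h and P
r_c = pmcc([math.sqrt(x) for x in h], P)     # (c) sqrt(h) and P

print(f"r_a = {r_a:.6f}, r_b = {r_b:.6f}, r_c = {r_c:.6f}")
```

The printed values should agree with those quoted above to several decimal places. All three are close to −1 and differ from one another only in the second or third decimal place.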
Sadly, in the Singapore education system, what I consider to be the correct answer would not have gotten you any marks. Instead, one is taught that there must always be one single, simplistic, formulaic, definitive, “correct” answer. This is a convenient substitute for thinking.

As it turns out, the “most correct” linear model, based on the actual barometric formula (see the last page of the Appendices in my H2 Mathematics Textbook), is actually the following:

\[
\ln P = a + b \ln\left(1 + \frac{L}{T} h\right).
\]

The constants \(L = -0.0065\) kelvin per metre (K m⁻¹) and \(T = 288.15\) kelvin (K) are, respectively, the standard temperature lapse rate (up to 11,000 m above sea level) and the standard temperature (at sea level).

The PMCC for the above model is \(r_d \approx 0.999998\), which is “better” than the cases examined above. (See this Google spreadsheet for the data and calculations.)

But again, the PMCC is merely one relatively-unimportant consideration. Our conclusion that this last model is superior to the model \(P = a + b\sqrt{h}\) is based not on the fact that \(r_d\) is 0.001 larger than \(r_c\). Instead, we are confident in this model because it was derived from physical theories. In contrast, the model \(P = a + b\sqrt{h}\) (or indeed any of the other models suggested above) is completely arbitrary and has no theoretical justification. Hence, even if the model \(P = a + b\sqrt{h}\) had a PMCC of 1, we’d still prefer this last model.

Part IV Ten-Year Series

This part lists all the questions from 2006-2015 A-Level exams, sorted into the two sections of the exam (Pure Mathematics and Statistics), and in reverse chronological order.

In the older exams, they had the habit of not distinctly numbering different parts within the same question as parts (i), (ii), etc. So I have sometimes taken the liberty of adding or modifying such numbers.

Exam Tip: Unless explicitly instructed otherwise, you are always allowed to use your graphing calculator, so use it wherever possible.
Examples of explicit instructions to avoid using your calculator include (but are not limited to): • “Without using your calculator ...” • “Use a non-calculator method ...” • “Find the exact value of ...” • “Express your answer in terms of Page 271, Table of Contents √ 3 or π.” www.EconsPhDTutor.com 58 Past-Year Questions for Section A: Pure Mathematics Exercise 105. (8864 N2015/I/1. Answer on p. 369.) Show that there are no real values of k for which 2k + (k − 4)x − 2x2 is always negative. [4] Exercise 106. (8864 N2015/I/2. Answer on p. 369.) (i) Differentiate 1 3 . [2] (2x − 1)4 2 (ii) Use a non-calculator method to find ∫ (x + 2/x) dx. [5] 0.5 Exercise 107. (8864 N2015/I/3. Answer on p. 369.) The diagram shows the curve C with equation y = 12x + 8e−2x . y 0 x (i) Use differentiation to find the exact x-coordinate of the stationary point of C. [4] (ii) Find the area of the region bounded by C, the x-axis and the lines x = 0 and x = a, where a is a positive constant. Give your answer in terms of a. [3] Page 272, Table of Contents www.EconsPhDTutor.com Exercise 108. (8864 N2015/I/4. Answer on p. 369.) The diagram shows a piece of paper DEF in the shape of an equilateral triangle of side y cm. An equilateral triangle of side x cm is removed from each corner of DEF . The perimeter of the remaining shape P QRST U is 30 cm. D x P x Q y U F R x x T S E √ (i) Show that the area, A cm2 , of P QRST U is given by A = (0.5 3) (50 + 10x − x2 ). [5] (ii) Without using a calculator, find the maximum value of A as x varies, justifying that this value is a maximum. [3] Exercise 109. (8864 N2015/I/5. Answer on p. 370.) The curve C has equation y = 0.5x − ln(x + 1). (i) Sketch the graph of C, stating the coordinates of any points of intersection with the axes and the equations of any asymptotes parallel to the y-axis. [3] (ii) Find the numerical value of the gradient of C at the point P where x = 0.5, giving your answer correct to 3 decimal places. 
[1] (iii) The normal to C at P meets the x-axis at A and the y-axis at B. Find the length of AB. [5]

Exercise 110. (8864 N2014/I/1. Answer on p. 370.) Use a non-calculator method to find the exact value of $\int_1^6 \frac{1}{\sqrt{1 + 4x}}\,dx$. [4]

Exercise 111. (8864 N2014/I/2. Answer on p. 371.) (i) Differentiate $\ln(x^2 + 4)$. [2] (ii) The curve C has equation $y = \ln(x^2 + 4)$. Show that the values of x for which the gradient of C is equal to the constant k satisfy the equation $kx^2 - 2x + 4k = 0$. [1] (iii) Find the values of k for which this equation has equal roots. [2]

Exercise 112. (8864 N2014/I/3. Answer on p. 371.) The curve C has equation $y = 1 - e^{1-2x}$. (i) Sketch the graph of C, stating the exact coordinates of any points of intersection with the axes and the equation of the asymptote. [3] (ii) Without using a calculator, find the equation of the tangent to C at the point where x = 1, giving your answer in the form y = mx + c, where m and c are exact constants. [4]

Exercise 113. (8864 N2014/I/4. Answer on p. 371.) ABCD is a rectangle in which AB = y cm and BC = 3x cm. The point E is on DA and the point G is on DC such that DEFG is a square of side x cm (see diagram). The length of BF is $2\sqrt{65}$ cm. (i) Show that $5x^2 + y^2 - 2xy = 260$. [2] (ii) Given that the perimeter of the rectangle ABCD is 60 cm, find the values of x and y. [5]

Exercise 114. (8864 N2014/I/5. Answer on p. 372.) The curve C has equation $y = x^3 + kx^2 + 7x + c$, where k and c are constants. The stationary points of C are at A and B. (i) Given that A has coordinates (1, 2), show that k = −5 and find the value of c. [4] (ii) Hence find the exact values of the coordinates of B. [3] (iii) Sketch the graph of C, stating the coordinates of any points where the curve crosses the x-axis. [2] (iv) Use a non-calculator method to find the exact area of the region bounded by C, the x-axis and the lines x = 1 and x = 2.
[3]

Exercise 115. (8864 N2013/I/1. Answer on p. 372.) Find the set of values of k for which the equation $x^2 - (k - 2)x + (2k + 1) = 0$ has no real roots. [4]

Exercise 116. (8864 N2013/I/2. Answer on p. 372.) (i) Differentiate $\ln(1 + 2x^2)$. [2] (ii) Use a non-calculator method to find the exact value of $\int_{-1}^{0} \frac{1}{(1 - 3x)^4}\,dx$. [4]

Exercise 117. (8864 N2013/I/3. Answer on p. 372.) A piece of card has the shape of a trapezium ABCE. The point D on CE is such that ABCD is a rectangle. It is given that AB = y cm, BC = 4x cm and DE = 3x cm (see diagram). The area of the card is S cm$^2$. Given that the perimeter of the card is 20 cm, (i) find an expression for S in terms of x, [3] (ii) find the maximum value of S, justifying that this value is a maximum. [3]

Exercise 118. (8864 N2013/I/4. Answer on p. 373.) The curve C has equation $y = x^3 - ax^2 + 3x + 6$, where a is a constant. (i) Find, in terms of a, the gradient of the normal to C at the point P where x = 1. [3] The normal at P passes through the point (−5, 3). (ii) Show that a satisfies the equation $a^2 - 10a + 24 = 0$ and hence find the two possible values of a. [5] (iii) For the smaller value of a, find the coordinates of the point of intersection of the normal at P and the line y = x. [2]

Exercise 119. (8864 N2013/I/5. Answer on p. 373.) (i) By taking logarithms, find the exact root of the equation $e^{2-2x} = 2e^{-x}$. [3] (ii) Use differentiation to show that the curve C with equation $y = e^{2-2x} - 2e^{-x}$ has a stationary point at $(2, -e^{-2})$. [3] (iii) Sketch C, stating the exact value of the x-coordinate of its point of intersection with the x-axis. [2] (iv) Use your calculator to find the area of the region bounded by C, the x-axis and the lines x = 0 and x = 1. [1]

Exercise 120. (8864 N2012/I/1. Answer on p. 373.) Given that $3e^{2x} = 4(e^{-2x} - 1)$, use the substitution $u = e^{2x}$ to find the exact value of x. [4]

Exercise 121.
(8864 N2012/I/2. Answer on p. 374.) The diagram shows a garden which is enclosed by a wall AH and fencing along the rest of the boundary ABCDEFGH. The angles at B, C, D, E, F and H are each right angles and EF = 20 m, BC = y m and AB = DC = DE = x m. It is given that the total length of the fencing is 100 m, and that the area of the rectangle HBCG is three times the area of the rectangle DEFG. (i) Show that $x^2 + 30x - 400 = 0$. [4] (ii) Find the length of HF. [2]

Exercise 122. (8864 N2012/I/3. Answer on p. 374.) The diagram shows the curve C with equation $y = k^2 - x^2$ and the line L with equation $y = 3k^2/4$, where k is a positive constant. (i) Find, in terms of k, the x-coordinates of the points where C and L intersect. [2] (ii) Hence find, in terms of k, the area of the finite region between C and L. [4]

Exercise 123. (8864 N2012/I/4. Answer on p. 374.) (i) Differentiate (a) $2\ln(3x + 2)$, [2] (b) $4/(2x + 1)$. [2] (ii) Without using a calculator, find the exact value of $\int_2^4 \left(\sqrt{x} - \frac{1}{\sqrt{x}}\right)^2 dx$, simplifying your answer. [5]

Exercise 124. (8864 N2012/I/5. Answer on p. 375.) The curve C has equation $y = 2^x - x^2$. (i) Sketch C, stating the coordinates of the points of intersection with the axes. [3] (ii) Find the numerical value of the gradient of C at the point where x = 1.5. Give your answer correct to 4 decimal places. [1] (iii) Hence find the equation of the tangent to C at the point where x = 1.5. Give your answer in the form y = mx + c, with m and c correct to 4 decimal places. [2] (iv) This tangent meets the y-axis at A and the line y = x at B. Find the length of AB. [4]

Exercise 125. (8864 N2011/I/1. Answer on p. 375.) Find, algebraically, the set of values of k for which $x^2 + (k - 2)x + (k + 1) > 0$ for all real values of x. [4]

Exercise 126. (8864 N2011/I/2. Answer on p. 375.)
(i) On a single diagram, sketch the graphs of $y = 2 - 0.6x$ and $y = x^2 - 1$, stating clearly the coordinates of any points of intersection with the y-axis. [2] (ii) Find the x-coordinates of the points of intersection of $y = 2 - 0.6x$ and $y = x^2 - 1$, giving your answers correct to 4 decimal places. [2] (iii) Write down as an integral an expression for the area of the region bounded by $y = 2 - 0.6x$ and $y = x^2 - 1$ and the lines x = 2 and x = 3. Evaluate this integral, giving your answer correct to 3 decimal places. [2]

Exercise 127. (8864 N2011/I/3. Answer on p. 376.) (i) Find $\int e^{3x+2}\,dx$. [2] (ii) Without using a calculator, find $\int_4^9 3\left(\sqrt{x} - \frac{1}{\sqrt{x}}\right) dx$. [4]

Exercise 128. (8864 N2011/I/4. Answer on p. 376.) The diagram shows a square piece of cardboard ABCD of side 2 m. A square of side x m is removed from each corner of ABCD. The remaining shape is now folded along PQ, QR, RS and SP to form an open rectangular box of height x m. (i) Show that the volume, V m$^3$, of the box is given by $V = 4x^3 - 8x^2 + 4x$. [3] (ii) Without using a calculator, find the maximum value of V as x varies. [5]

Exercise 129. (8864 N2011/I/5. Answer on p. 376.) The curve C has equation $y = x - \ln(2x + 1)$. O is the origin and P is the point on C for which x = 2. The normal to C at the point P meets the x-axis at A and the y-axis at B (see diagram). Without using a calculator, (i) find the exact coordinates of the minimum point on C, [3] (ii) find the exact coordinates of A and B and hence show that the exact area of triangle OAB is $(p - q\ln 5)^2/30$, where p and q are integers to be found. [8]

Exercise 130. (8864 N2010/I/1. Answer on p. 376.) Find the set of values of k for which the equation $4x^2 - 2kx + 9 = 0$ has two real distinct roots. [3]

Exercise 131. (8864 N2010/I/2. Answer on p. 376.) Find (i) $\int e^{1-2x}\,dx$, [2] (ii) $\int \frac{2}{(x + 1)^3}\,dx$. [3]

Exercise 132. (8864 N2010/I/3. Answer on p. 377.)
The equation of a curve is $y = \ln(2x - 3)$. (i) Sketch the curve, stating the exact equations of any asymptotes and the exact coordinates of any intersections with the axes. [2] (ii) Find dy/dx. [1] (iii) Hence find the equation of the normal to the curve at the point where x = 3, giving your answer in the form ax + by = c, where a and b are integers. [4]

Exercise 133. (8864 N2010/I/4. Answer on p. 377.) A window in a new building has the shape of a rectangle ABCD joined to an isosceles triangle ABE, as shown in the diagram. It is given that AB = 2x m and $AE = \frac{5}{8}AB$. The total perimeter AEBCDA of the window is 6 m. (i) Show that $AD = (3 - 9x/4)$ m. [2] The area of the window is to be as large as possible. (ii) Show that the area of the window is equal to $(6x - 15x^2/4)$ m$^2$. [3] (iii) Hence use a non-calculator method to find the maximum value of this area. [4]

Exercise 134. (8864 N2010/I/5. Answer on p. 377.) A curve C has equation $y = 6 - 4x^3 - 3x^4$. (i) Use a non-calculator method to find the coordinates of the stationary points of C. [4] (ii) Sketch C. Mark the point of inflexion with a cross. [2] (iii) Find, correct to 2 decimal places, the x-coordinates of the points where C cuts the x-axis. [2] (iv) Find $\int (6 - 4x^3 - 3x^4)\,dx$. Hence find the exact area of the region bounded by C, the x-axis and the lines x = −1 and x = 1/2. [3]

Exercise 135. (8863 N2009/I/1. Answer on p. 378.) Without using a calculator, solve the simultaneous equations $x + 2y = 3$, $x^2 + xy = 2$. [4]

Exercise 136. (8863 N2009/I/2. Answer on p. 378.) (i) Sketch the graphs of $y = \sqrt{x}$ and $y = 0.5x$ on a single diagram and write down the coordinates of the points where $y = \sqrt{x}$ and $y = 0.5x$ intersect. [2] (ii) Find $\int \sqrt{x}\,dx$ and $\int 0.5x\,dx$. [2] (iii) Without using a calculator, find the area of the region between the two graphs. [2]

Exercise 137. (8863 N2009/I/4. Answer on p. 379.)
(i) Sketch the curve $y = x - 1/x$, stating clearly the coordinates of all points of intersection with the axes. [1] (ii) Find the gradient of the normal at the point P on the curve where x = 2. [2] (iii) Find the equation of the normal at P in the form ax + by + c = 0, where a and b are integers. [3] (iv) The normal at P meets the y-axis at N and the tangent at P meets the y-axis at T. Find the area of triangle PTN. [5]

Exercise 138. (8863 N2009/I/5. Answer on p. 379.) A curve has equation $y = 2x^3 - 5x^2 - 4x + 3$. (i) Find dy/dx. Hence find the exact coordinates of the stationary points on the curve. [4] (ii) Sketch the curve, stating clearly the coordinates of all points of intersection with the axes. [3] (iii) Solve the inequality $2x^3 - 5x^2 - 4x + 3 > 0$. Hence find the exact solutions of the inequality $2e^{3x} - 5e^{2x} - 4e^x + 3 > 0$. [5]

Exercise 139. (8863 N2008/I/1. Answer on p. 380.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. Sketch the graph of $y = \sin x$ for $0 \le x \le 4\pi$. [1] It is given that α is an acute angle, and $\sin\alpha = c$. State, in terms of c, the value of (i) $\sin(2\pi + \alpha)$, [1] (ii) $\sin(3\pi + \alpha)$. [1] State, in terms of α and π, one value of x between π and 2π for which $\sin x = -c$. [1]

Exercise 140. (8863 N2008/I/2. Answer on p. 380.) The sum of two numbers x and y is 20 and the sum of their squares is 300. Given that x > y, find the exact values of x and y. [5]

Exercise 141. (8863 N2008/I/3. Answer on p. 380.) The diagram shows the graphs of $C_1: y = 2x^2$ and $C_2: y = x^2 + k^2$, where k is a positive constant. The graphs intersect at P and Q, as shown. (i) Show that the x-coordinates of P and Q are k and −k respectively. [1] (ii) Find the exact value of the area of the shaded region between $C_1$ and $C_2$. [5]

Exercise 142. (8863 N2008/I/5. Answer on p. 381.) A spot of light on a computer screen moves in a horizontal line across the screen.
At time t seconds, its distance, x mm, from the left-hand edge of the screen is given, for t ≥ 0, by $x = t^3 - 12t^2 + kt$, where k is a positive constant. (i) Find the set of values of k for which x is an increasing function of t. [5] It is now given that k = 36. (ii) Sketch the graph of x against t. [1] (iii) The screen has width 375 mm. Find the time in seconds at which the spot reaches the right-hand edge of the screen, giving your answer correct to 1 decimal place. [2]

Exercise 143. (8863 N2008/I/6. Answer on p. 381.) The diagram shows the curve C with equation $y = \ln(2x + 4)$. The point P on C has coordinates $(1, \ln 6)$. The tangent to C at P meets the x-axis at T. (i) Show that the exact x-coordinate of T is $1 - 3\ln 6$. [4] The normal to C at P meets the x-axis at N. (ii) Find the exact x-coordinate of N. [2] (iii) Find the exact area of triangle PTN. [4]

Exercise 144. (8863 N2007/I/1. Answer on p. 381.) (i) Find the numerical value of the derivative of $3^x$ when x = 2. [1] (ii) Hence find the equation of the tangent to the graph of $y = 3^x$ at the point where x = 2, giving your answer in the form y = mx + c. [2]

Exercise 145. (8863 N2007/I/3. Answer on p. 382.) (i) Sketch, for x ≥ 0, the graphs of $y = 20/(x + 2)$ and $y = 10 - x^2$ on the same axes. [2] (ii) The graphs intersect on the y-axis. Find, correct to 3 decimal places, the x-coordinate of the point of intersection for which x > 0. [1] (iii) Find $\int \frac{20}{x + 2}\,dx$ and $\int (10 - x^2)\,dx$. [3] (iv) Use your answers to parts (ii) and (iii) to find the area of the region, in the first quadrant, between the two graphs. [2]

Exercise 146. (8863 N2007/I/4. Answer on p. 382.) The diagram shows a large rectangular field surrounded by a wall. The broken lines represent fences. The corner shapes are an isosceles triangle and a square. The length of the fence bordering the triangle is x metres. (i) Explain why the area of the triangle is $0.25x^2$ m$^2$.
[2] The total length of the fences is 100 m. The total area of the triangle and the square is A m$^2$. (ii) Show that $A = 2500 - 50x + 0.5x^2$. [3] (iii) Use differentiation to find the value of x for which A is a minimum. State the corresponding minimum value of A and explain briefly how you can tell that it is a minimum rather than a maximum. [4] (iv) Find the largest value that A can take, given that 10 ≤ x ≤ 80. Show clearly how you obtain your answer. [2]

Exercise 147. (8863 N2007/I/5. Answer on p. 383.) (i) Without using a calculator, solve the simultaneous equations $y = 2x^2 + 3x + 2$ and $y = 2x + 3$. [3] (ii) Hence solve the inequality $2x^2 + 3x + 2 \ge 2x + 3$. [2] Part (iii) of this question is no longer in the 8865 (revised) syllabus, so you can skip it. (iii) Hence, using a sketch of the graph of $x = \cos\theta$, solve the inequality $2\cos^2\theta + 3\cos\theta + 2 \ge 2\cos\theta + 3$, for $0^\circ \le \theta \le 540^\circ$. [6]

Exercise 148. (8174 N2006/I/6. Answer on p. 383.) Solve (i) $e^{5x+2} = 23$, [2] (ii) $\lg(40 + y^2) = 2.5$. [3]

Exercise 149. (8174 N2006/I/7. Answer on p. 383.) The line $y = 1 - 3x$ is a tangent to the curve $x^2 + y^2 + kx + 2y + 7 = 0$. Find the possible values of the constant k. [5]

Exercise 150. (8174 N2006/I/9. Answer on p. 384.) (i) Find $\int (5x^2 - 8x)\,dx$. [2] (ii) Evaluate $\int_0^1 e^{-2x}\,dx$. [4]

Exercise 151. (8174 N2006/I/16. Answer on p. 384.) The diagram shows the line $y = -4x + 19$ intersecting the curve $y = -2x^2 + 6x + 11$ at the points A and B. Find (i) the coordinates of the points A and B, [4] (ii) the area of the shaded region. [7]

59 Past-Year Questions for Section B: Prob. & Stats

Exercise 152. (8864 N2015/I/6. Answer on p. 385.) The masses of peaches sold by a shop have a normal distribution. Over a long period of time, it is found that 20% of peaches have a mass less than 40 grams and 25% of peaches have a mass greater than 60 grams.
Find the mean and variance of the distribution. [4]

Exercise 153. (8864 N2015/I/7. Answer on p. 385.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. A college has 1200 students. Of these students, 500 are in Year One, 400 are in Year Two and 300 are in Year Three. A list of the names of all 1200 students, with the names arranged in alphabetical order, is available. A survey is to be carried out to investigate how many hours students spend playing computer games each week. (i) Describe how to obtain a systematic sample of 100 students from the list to take part in the survey. [2] (ii) State one disadvantage of using a systematic sample in this context. [1] (iii) What type of sample might it be more appropriate to use? You do not need to describe how you would obtain this sample. [1]

Exercise 154. (8864 N2015/I/8. Answer on p. 385.) Two events A and B are such that $P(A) = p$, $P(B) = 2p$, $P(A \cup B) = 0.42$ and $P(A \cap B) = 0.03$. (i) Show that p = 0.15. [1] (ii) Find $P(A \cup B')$. [3] (iii) Determine whether the events A and B′ are independent. [2]

Exercise 155. (8864 N2015/I/9. Answer on p. 385.) Kai throws a fair die 8 times. Find the probability that he obtains a six (i) exactly three times, [1] (ii) fewer than four times. [2] Lam throws a fair die 600 times. (iii) Using a suitable approximation, estimate the probability that the number of times he obtains a six is between 90 and 100 inclusive. State the mean and variance of the distribution that you use. [4]

Exercise 156. (8864 N2015/I/10. Answer on p. 386.) The height, h metres, and the weight, w kg, were recorded for a random sample of 10 members of a rowing club. The results are given in the following table.

Rower   A     B     C     D     E     F     G     H     I     J
h       1.75  1.90  1.81  1.82  1.81  1.60  1.88  1.71  1.95  1.76
w       95    102   96    98    99    90    106   92    110   93

(i) Draw a sketch of the scatter diagram for the data, as shown on your calculator.
[2] (ii) Find the product moment correlation coefficient and comment on its value in the context of the data. [2] (iii) Find the equation of the regression line of w on h and sketch this line on your scatter diagram. [2] (iv) Use the equation of your regression line to calculate an estimate of the weight of a rower whose height is 1.66 metres. Give two reasons why you would expect this estimate to be reliable. [3]

Exercise 157. (8864 N2015/I/11. Answer on p. 386.) Men and women staying at a large hotel have masses, in kg, that are normally distributed with means and standard deviations as shown in the following table.

         Mean mass   Standard deviation
Men      77          9.8
Women    62          10.6

(i) Find the probability that the mass of a man chosen at random is within ±2 kg of the mean mass of men. [2] (ii) Find the probability that the total mass of three men chosen at random is greater than the total mass of four women chosen at random. State the mean and variance of the distribution that you use. [4] The lift in the hotel has a safety limit of 460 kg. Three men and four women are chosen at random. (iii) Find the probability that they can safely travel in the lift together. State the mean and variance of the distribution that you use. [3]

Exercise 158. (8864 N2015/I/12. Answer on p. 386.) An accountancy qualification involves two separate examinations, Part I and Part II. To be successful, a student must first pass Part I and after passing Part I must then pass Part II. Students who fail Part I at the first attempt always make a second attempt. Students are allowed at most two attempts at Part I but only one attempt at Part II. The probability that a student will pass Part I, on either attempt, is 3/4. The probability that a student will pass Part II is 2/5. (i) Draw a tree diagram to represent this information. [3] (ii) Find the probability that a student chosen at random will succeed in the accountancy qualification.
[2] (iii) Find the probability that a student chosen at random will succeed in the accountancy qualification, given that the student fails Part I at the first attempt. [2] Five randomly chosen students take the qualification. (iv) Find the probability that at least two of them will be successful. [3]

Exercise 159. (8864 N2015/I/13. Answer on p. 387.) A scientist claims that the mean length of fish in a particular lake is 15.2 cm. The lengths of fish are known to have a normal distribution with standard deviation 2.1 cm. A random sample of 30 fish is selected and found to have a sample mean length of 14.5 cm. (i) Test, at the 5% significance level, whether the scientist’s claim should be rejected. [4] The lengths of a random sample of 40 fish from a second lake are summarised as follows, where x cm denotes the length of a fish in this lake.

$$\sum (x - 18) = -32, \qquad \sum (x - 18)^2 = 325.$$

(ii) Find unbiased estimates of the population mean and variance. [3] (iii) What do you understand by the term ‘unbiased estimate’? [1] The population mean length of fish from this second lake is $\mu$ cm. Using the sample data, a significance test of the null hypothesis $\mu = 18$ against the alternative hypothesis $\mu < 18$ is carried out at the α% significance level. (iv) Find the set of values of α for which the null hypothesis will be rejected. [3]

Exercise 160. (8864 N2014/I/6. Answer on p. 387.) The heights of girls in a school have a normal distribution with mean 142.2 cm and standard deviation 6 cm. Find the probability that a girl chosen at random from this school has height (i) less than 146 cm, [2] (ii) within 5 cm of the mean. [2]

Exercise 161. (8864 N2014/I/7. Answer on p. 387.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. There are 5000 households in a particular town. For each household the weekly food shopping is done either by going to the supermarket or by ordering online and having the order delivered.
The numbers using each method of shopping are recorded, according to the age, in years, of the person responsible for the shopping. The data is summarised in the following table.

                 Supermarket   Online
Under 25 years   500           1000
25-60 years      900           1600
Over 60 years    800           200

A researcher carries out a survey to investigate the amount spent on food per week. She decides to use a sample of size 100 from these households. (i) Describe how she might obtain a systematic sample. [2] (ii) Describe how she might obtain a stratified sample, identifying the strata and finding the size of the sample taken from each of the strata. [2] (iii) State, with a reason, whether a systematic sample or a stratified sample would be more appropriate in this context. [1]

Exercise 162. (8864 N2014/I/8. Answer on p. 388.) In a certain large city, the number of hours, x, spent travelling to and from work and the number of hours, y, spent watching television were recorded for a random sample of 8 people, for one particular week. The results are given in the following table.

    A     B     C     D     E     F     G     H
x   12.8  8.4   4.4   9.0   7.2   2.2   9.2   6.3
y   4.5   8.3   14.8  8.0   9.2   12.5  7.8   10.4

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2] (ii) Find the product moment correlation coefficient and comment on its value in the context of the data. [2] (iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the values of m and c correct to 4 significant figures. Sketch this line on your scatter diagram. [2] (iv) Use the equation of your regression line to estimate the number of hours of television watched by a person who spends 13.2 hours a week travelling to and from work. Comment on the reliability of your estimate. [3]

Exercise 163. (8864 N2014/I/9. Answer on p. 388.) A bakery produces two kinds of cake. One kind of cake contains fruit, and the other kind contains no fruit.
There is a constant probability that a cake contains fruit. The cakes are sold in packs of 6. Each pack has a random selection of cakes. For these packs, the mean number of cakes containing fruit is 2.4. (i) Find the probability that a pack chosen at random has (a) no cakes containing fruit, [2] (b) at most two cakes containing fruit. [1] A customer buys 8 packs of cakes for a party. (ii) Find the probability that at least 4 of these packs have at most two cakes containing fruit. [3] A supermarket stocks 150 of these packs of cakes. (iii) Using a suitable approximation, estimate the probability that more than half of the packs have at most two cakes containing fruit. You should state the mean and variance of any distribution that you use. [4]

Exercise 164. (8864 N2014/I/10. Answer on p. 388.) It is known that the lengths of leaves from beech trees in a particular forest have a population variance of 4.4 cm$^2$. Scientists believe that the mean length of leaves from beech trees in this forest is 7 cm. A random sample of 50 of these leaves has a mean length of 6.5 cm. (i) Test, at the 5% significance level, whether the population mean length of leaves from beech trees in this forest is less than 7 cm. [4] The lengths, x cm, of a random sample of 50 leaves from beech trees in another forest are summarised by $\sum x = 310.4$ and $\sum x^2 = 2209.2$. (ii) Calculate unbiased estimates of the population mean and variance. [3] A test, at the α% significance level, shows that there is sufficient evidence to suggest that the population mean length of leaves from beech trees in this second forest differs from 7 cm. (iii) Find the set of possible values of α. [4]

Exercise 165. (8864 N2014/I/11. Answer on p. 389.) A group of students are asked whether they own any of a laptop, a tablet and a games machine. The numbers owning different combinations are shown in the Venn diagram. The number of students owning none of these is x.
One of the students is chosen at random. L is the event that the student owns a laptop. T is the event that the student owns a tablet. G is the event that the student owns a games machine. (i) Write down expressions for $P(L)$ and $P(G)$ in terms of x. Given that L and G are independent, show that x = 10. [4] Using this value of x, find (ii) $P(L \cup T)$, (iii) $P(T \cap G')$, and (iv) $P(L \mid G)$. [1 mark each.] Two students from the whole group are chosen at random. (v) Find the probability that both of these students each own exactly two out of the three items (laptop, tablet, games machine). [3]

Exercise 166. (8864 N2014/I/12. Answer on p. 389.) The outputs of a certain metal, in tonnes, extracted each day from two mines, A and B, have independent normal distributions. The mean of the distribution of the daily output from A is 50 tonnes. The probability that the daily output from A is more than 75 tonnes is 0.0189. (i) Show that the variance of this distribution is 145 tonnes$^2$, correct to 3 s.f. [3] The mean and variance of the distribution of the daily output from B are 75 tonnes and 64 tonnes$^2$ respectively. B operates for seven days each week. (ii) Find the probability that in a 7-day week B produces less than 500 tonnes. [3] (iii) A operates for five days each week. Find the probability that in any particular week the output from B is more than twice the output from A. You should state the mean and variance of any distribution that you use. [5]

Exercise 167. (8864 N2013/I/6. Answer on p. 389.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. Suky is organising a pop concert. She sells 5000 tickets at $X each, 10000 tickets at $Y each and 15000 tickets at $Z each. Suky wants to find out whether those who bought the tickets thought that the price they paid was good value for money. She decides to do this by choosing a stratified random sample of size 150.
(i) Describe how Suky might choose her sample. [3] (ii) State one reason for using stratified random sampling in this context. [1]

Exercise 168. (8864 N2013/I/7. Answer on p. 390.) A particular type of electronic device is being tested to determine for how long information stored in it is retained after power has been switched off. A random sample of 250 such devices is chosen and the time, T hours, for which information is retained is measured for each one. The results obtained are summarised as follows.

$$\sum (t - 75) = 305, \qquad \sum (t - 75)^2 = 29555.$$

(i) Find unbiased estimates of the population mean and variance. [3] (ii) This type of device has previously been considered capable of retaining information for 75 hours, on average, after power is switched off, but the manufacturers now claim that information is retained for longer than this. Test at the 2.5% significance level whether the claim is justified. [4]

Exercise 169. (8864 N2013/I/8. Answer on p. 390.) A shop sells batteries in packs of 10. An advertiser claims that individual batteries each have a lifetime of at least 100 hours. The probability that an individual battery has a lifetime less than 100 hours is 0.2, independently of all other batteries. (i) Find the probability that, in a randomly chosen pack of 10 batteries, each of the batteries satisfies the advertiser’s claim. [1] Customers are satisfied if at least 8 of the batteries in a pack have a lifetime of at least 100 hours. (ii) Find the probability that a randomly chosen pack will satisfy customers. [3] A customer buys a batch of 80 packs of these batteries. (iii) Using a suitable approximation, estimate the probability that at least 75% of packs in the batch will satisfy the customer. State the mean and variance of the distribution that you use. [4]

Exercise 170. (8864 N2013/I/9. Answer on p. 390.) The ages x, in years, and the heights y, in cm, for 10 boys are given in the following table.
Boy   A     B     C     D     E     F     G     H     I     J
x     8.2   10.1  6.6   13.5  6.8   11.4  7.8   6.9   12.8  7.5
y     123   135   119   141   112   151   122   116   141   123

(i) Give a sketch for the scatter diagram for the data, as shown on your calculator. [2] (ii) Find the product moment correlation coefficient and comment on its value in the context of the data. [2] (iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the values of m and c correct to 2 decimal places. Sketch this line on your scatter diagram. [2] (iv) Use the equation of your regression line to calculate an estimate of the height of a boy whose age is 13.2 years and comment on the reliability of your estimate. [3]

Exercise 171. (8864 N2013/I/10. Answer on p. 391.) A company producing barbecue sauce claims that the mass of salt in a bottle of the sauce has a mean of 12 g. The mass of salt is known to have a normal distribution with standard deviation 0.8 g. A random sample of 20 bottles is selected. The sample mean is m g. A test at the 5% significance level is carried out on this sample, and the company’s claim is accepted. (i) Find the set of possible values of m. [5] The company launches a new variety of the sauce and claims that the mean salt content per bottle has been reduced. The mass of salt in a random sample of 40 bottles of the new variety has a mean of 11.75 g. The mass of salt still has a normal distribution with standard deviation 0.8 g. (ii) Test the company’s claim about the new variety of sauce, using a 5% significance level. [4]

Exercise 172. (8864 N2013/I/11. Answer on p. 391.) A pet shop sells two types of animal food. Type A is supplied by a manufacturer and sold in packets with the food content having a mean mass of 1 kg. The masses of the food content are normally distributed. It is known that 20% of the packets contain less than 990 g of food. (i) Find the standard deviation of the distribution.
[3] Type B animal food is mixed by the shop owner from two ingredients P and Q. One packet contains 3 scoops of ingredient P and 2 scoops of ingredient Q. The masses, in grams, of the food in scoops of ingredients P and Q have independent normal distributions with means and standard deviations as shown in the following table.

               Mean   Standard deviation
Ingredient P   240    10
Ingredient Q   145    8

(ii) Find the probability that a randomly selected packet of Type B has a mass of food less than 1 kg. State the mean and variance of any distribution that you use. [4] (iii) Find the probability that the mass of food in a randomly selected packet of Type B is more than the mass of food in a randomly selected packet of Type A. State the mean and variance of any distribution that you use. [4]

Exercise 173. (8864 N2013/I/12. Answer on p. 391.) Jai is playing a game which involves throwing a fair six-sided die. If the result is a 3, 4, 5 or 6, his score is the number shown. If the result is a 1 or a 2, he throws the die a second time and his score is the sum of the two numbers from his two throws. (i) Draw a tree diagram to represent the possible outcomes. [3] Events A and B are defined as follows: Event A: Jai’s score is 5 or 6. Event B: Jai has two throws. (ii) Show that $P(A) = 4/9$. [2] Find (iii) $P(A \cap B)$, [1] (iv) $P(A \cup B)$, [2] and (v) $P(B \mid A')$. [4]

Exercise 174. (8864 N2012/I/6. Answer on p. 392.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. (i) Describe what is meant by ‘systematic sampling’. [2] A researcher is conducting a survey in a particular town to find out how many hours adults spend on their computers each day. He decides to survey a sample of 100 adults by standing outside the main supermarket at midday and using systematic sampling. (ii) State, in this context, one advantage and one disadvantage of this procedure. [2] (iii) Describe briefly how, in this case, the researcher might choose a more appropriate systematic sample.
[1]

Exercise 175. (8864 N2012/I/7. Answer on p. 392.) Events A and B are such that P(A) = P(B) = p and P(A ∪ B) = 5/9.
(i) Given that A and B are independent, find a quadratic equation satisfied by p. [3]
(ii) Hence find the value of p and the value of P(A ∩ B). [2]

Exercise 176. (8864 N2012/I/8. Answer on p. 392.) An election was held to choose the leader of a political party.
• Candidate A received 50% of all the votes, and 60% of A’s votes were cast by males.
• Candidate B received 35% of all the votes, and 40% of B’s votes were cast by males.
• Candidate C received 15% of all the votes, and 20% of C’s votes were cast by males.
A person V, who voted in the election, is selected at random. Find the probability that V
(i) voted for A and is male, [1]
(ii) is female, [2]
(iii) voted for C, given that V is male. [2]

Exercise 177. (8864 N2012/I/9. Answer on p. 392.) A company is selling ‘Pluto’ cars. The age x, in years, and the advertised price y, in hundreds of dollars, for ten Pluto cars are given in the following table.

Car   1     2     3     4     5     6     7     8     9     10
x     5.0   4.5   6.0   5.2   5.6   6.0   3.0   2.0   7.1   7.5
y     85    90    65    72    75    70    130   150   42    42

(i) Draw a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the values of m and c correct to 2 decimal places. Sketch this line on your scatter diagram. [2]
(iv) Calculate an estimate of the advertised price of a Pluto car which is
(a) 4 years old, [2]
(b) 9 years old. [2]
(v) Comment on the reliability of each of your estimates in part (iv). [2]

Exercise 178. (8864 N2012/I/10. Answer on p. 393.) ‘Sunbrite’ plants are sold in trays of 12 plants.
For any Sunbrite plant, the probability that it flowers is 0.8, independently of all other Sunbrite plants. Find the probability that from one tray of Sunbrite plants
(i) exactly 10 will flower, [2]
(ii) fewer than 8 will flower. [2]
A gardener A buys 8 trays of Sunbrite plants.
(iii) Use a suitable approximation to estimate the probability that more than 75 plants will flower. State the mean and variance of the distribution that you use. [4]
Two other gardeners, B and C, each buy 8 trays of Sunbrite plants.
(iv) Find the probability that, for at least two of the three gardeners A, B, and C, more than 75 of their plants will flower. [3]

Exercise 179. (8864 N2012/I/11. Answer on p. 393.) A company sells balls of string. A manager claims that the average length of string in a ball is at least 300 m. To test this claim, a random sample of 100 balls of string is checked and the lengths of string per ball, x m, are summarised by ∑(x − 300) = −60 and ∑(x − 300)² = 1240.
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) Test at the 5% significance level whether the manager’s claim is valid. [5]
The manufacturing process is improved and the new population variance is known to be 12.1 m². A new random sample of 100 balls of string is chosen and the mean of this sample is k m. A test at the 10% significance level indicates that the manager’s claim is valid for this improved process.
(iii) Find the least possible value of k, giving your answer correct to 2 decimal places. [3]

Exercise 180. (8864 N2012/I/12. Answer on p. 394.) A supermarket sells two types of grapefruit, A and B. The masses, in kilograms, of the grapefruit of each type have independent normal distributions. The means and standard deviations of these distributions, and the selling prices, in $ per kilogram, are shown in the following table.

         Mean (kg)   Standard deviation (kg)   Selling price ($ per kg)
Type A   0.25        0.02                      1.50
Type B   0.35        0.03                      2.40

Stating clearly the mean and variance of all distributions that you use, find the probability that
(i) the total mass of 10 randomly chosen grapefruit of type A is less than 2.4 kg, [3]
(ii) the total mass of 6 randomly chosen grapefruit of type A is within 0.2 kg of the total mass of 5 randomly chosen grapefruit of type B. [4]
(iii) Mrs Woo buys 3 grapefruit of type A and 3 grapefruit of type B. Mr Tan buys 10 grapefruit of type A. Stating clearly the mean and variance of the distribution that you use, find the probability that Mrs Woo pays more than Mr Tan. [6]

Exercise 181. (8864 N2011/I/6. Answer on p. 394.) Independent events A and B are such that P(A) = a and P(B) = b. Given that P(A ∪ B) = 0.46 and P(A ∩ B) = 0.04, find a quadratic equation satisfied by a and hence find the possible values of P(A). [5]

Exercise 182. (8864 N2011/I/7. Answer on p. 394.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. Two thousand students travel to college either by car, by bicycle or on foot. Any given student travels by the same method each day. The numbers in each of two year-groups using each method of travel are summarised in the table below.

         Car   Bicycle   On foot
Year 1   200   400       500
Year 2   240   360       300

Researcher A carries out a survey to investigate the length of students’ journey times to college, using a random sample of 100 students.
(i) Explain what is meant in this context by the term ‘a random sample’. [2]
Researcher B decides to use stratified sampling with three strata from the combined year-groups, also using 100 students.
(ii) Identify the three strata and find the size of the sample taken from each stratum. [2]
(iii) State one advantage that stratified sampling would have compared to random sampling in this context, and state how a better stratified sample of size 100 could have been achieved, using the data in the above table.
[2]

Exercise 183. (8864 N2011/I/8. Answer on p. 395.) The air temperature T, in °C, and the altitude H, in metres, were recorded at noon on a certain day at each of 8 locations in a mountainous region. The results are summarised in the table below.

H   200   285   335   450   581   878   1225   1550
T   27    23    22    20    15    14    8      6

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context of this question. [2]
(iii) Find the equation of the regression line of T on H. Sketch this line on your scatter diagram. [2]
(iv) Calculate an estimate of the air temperature at noon at a place in the region with altitude 1000 metres. Comment on the reliability of this estimate. [2]

Exercise 184. (8864 N2011/I/9. Answer on p. 395.) A certain type of light bulb is designed to have a mean lifetime of 12,000 hours. The standard deviation of the lifetimes is 1,400 hours. Tests on a random sample of 50 bulbs from a certain batch give a mean lifetime of 11,500 hours.
(i) Test at the 1% level of significance whether this particular batch is substandard (that is, the mean lifetime of bulbs in the batch is less than 12,000 hours). [4]
Tests on a random sample of 50 bulbs from another batch give a mean lifetime of T hours. A test at the 5% level of significance does not indicate that this batch is substandard.
(ii) Obtain an equation for the least possible value of T, and solve it. [4]

Exercise 185. (8864 N2011/I/10. Answer on p. 395.) Jon attempts a puzzle in his daily newspaper each day. The probability that he will complete the puzzle on any given day is 0.8, independently of any other day.
(i) Find the probability that, in a given week of 7 days, Jon will complete the puzzle
(a) exactly 3 times, [1]
(b) at least 5 times. [2]
(ii) Find the probability that, over a period of 10 weeks, Jon completes the puzzle at least 5 times each week.
[2]
(iii) Using a suitable approximation, find the probability that, over a period of 10 weeks, Jon completes the puzzle at least 50 times in total. State the mean and variance of the approximation. [4]

Exercise 186. (8864 N2011/I/11. Answer on p. 396.) Box A contains 5 red balls, 4 green balls and 1 yellow ball. Box B contains 6 red balls and 2 green balls. One of the boxes is selected by tossing two fair coins. If both coins show heads, box A is selected and otherwise box B is selected.
(i) One ball is chosen at random from the selected box and the colour of the ball is noted.
(a) Draw a tree diagram to represent this situation. [3]
(b) Find the probability that a red ball is chosen. [2]
(c) Given that a red ball is chosen, find the probability that it comes from box A. [2]
(ii) Instead, two balls are chosen at random, without replacement, from the selected box. Find the probability that both balls are the same colour. [4]

Exercise 187. (8864 N2011/I/12. Answer on p. 396.) Boys and girls visiting a theme park have masses, in kg, that are independent and are normally distributed with means and standard deviations as shown in the following table.

        Mean   Standard deviation
Boys    60     12
Girls   50     10

(i) Find the probability that the mass of a boy chosen at random is between 50 kg and 70 kg. [2]
(ii) A boy and a girl are chosen at random. Find the probability that the mass of the boy is greater than the mass of the girl, stating clearly the mean and variance of the distribution that you use. [4]
(iii) On a ride at the theme park, trains carrying up to 5 people travel around a track. The total mass of the people on the train must not exceed the safety limit of 300 kg. Three boys and two girls are chosen at random. Find the probability that their total mass is less than 300 kg, stating clearly the mean and variance of the distribution that you use.
[4]
(iv) The track is improved and new trains carrying up to 6 people are designed. The new safety limit is L kg. Obtain the equation for L, given that it is 95% certain that 6 boys chosen at random have a total mass not exceeding L kg. Hence find L. [3]

Exercise 188. (8864 N2010/I/6. Answer on p. 396.) The events A and B are such that P(A) = 0.6, P(B) = 0.3 and P(A | B) = 0.2. Find the probability that
(i) both A and B occur, [1]
(ii) at least one of A and B occurs, [2]
(iii) exactly one of A and B occurs. [2]

Exercise 189. (8864 N2010/I/7. Answer on p. 397.) A group of students take an examination in Chemistry. A student who fails the examination at the first attempt is allowed one further attempt. For a randomly chosen student, the probability of passing the examination at the first attempt is 0.7 and the probability of passing at the second attempt is 0.9. The information is shown in the tree diagram below.
(i) Find the probability that a randomly chosen student fails the examination at both attempts. [1]
(ii) Given that a student passes the examination, find the probability that it is at the second attempt. [3]
(iii) Three students taking the examination are chosen at random. Find the probability that two of them pass at the first attempt and the other passes at the second attempt. [3]

Exercise 190. (8864 N2010/I/8. Answer on p. 397.) A college has 1,400 students in Year One, 900 students in Year Two and 700 students in Year Three. It is intended to carry out a survey to investigate how much students spend on new clothes each year.
(i) Describe how to obtain a stratified random sample of 60 students to take part in the survey. [2]
(ii) Describe, in this context, one advantage that stratified sampling has compared to simple random sampling. [1]
The amount of money spent by a student is denoted by $X.
The values for a (non-stratified) random sample of 50 students are summarised by ∑x = 10,450, ∑x² = 2,235,000. The population mean and variance of X are denoted by µ and σ² respectively.
(iii) Calculate unbiased estimates of µ and σ². [3]
A significance test of the null hypothesis µ = 200 against the alternative hypothesis µ > 200 is carried out at the 10% level of significance.
(iv) Without doing any further calculations, state two assumptions or approximations that are involved when carrying out the significance test using the above sample data. [2]

Exercise 191. (8864 N2010/I/9. Answer on p. 397.) The probability of any sunflower seed germinating when it is sown is 0.7, independently of all other sunflower seeds. Find the probability that, when 8 seeds are sown,
(i) exactly 6 will germinate, [2]
(ii) at least 6 will germinate. [2]
(iii) 60 sunflower seeds are sown. Use a suitable approximation to estimate the probability that fewer than 40 will germinate. You should state the mean and variance of the approximation. [4]

Exercise 192. (8864 N2010/I/10. Answer on p. 397.) A factory produces components for an electrical product. The masses of the components are normally distributed with standard deviation 1.2 grams. The factory owner claims that the mean mass of the components is 15 grams. A random sample of 80 components was taken and found to have a mean mass of 15.25 grams.
(i) Test the owner’s claim at the 5% level of significance. [4]
The owner purchases new machinery to produce the components, and the standard deviation remains unchanged. The owner claims the mean mass is now less than 15 grams. A new random sample of 80 components is taken.
(ii) Find the set of values within which the mean mass of this sample must lie for the owner’s new claim to be accepted at the 5% level of significance. [5]

Exercise 193. (8864 N2010/I/11. Answer on p. 398.)
(a) Eight pairs of values of variables x and y are measured. Draw a sketch of a possible scatter diagram of the data for each of the following cases: the product moment correlation coefficient is approximately
(i) 0, [1]
(ii) −0.8. [1]
(b) The monthly earnings, y thousand dollars, of 7 workers of different ages, x years, in a particular company are given in the table below.

x   18     20     22     27     35     45     55
y   2.55   2.65   2.85   3.15   4.76   5.45   6.26

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient. [1]
(iii) Find the equation of the regression line of y on x in the form y = mx + c, giving the values of m and c correct to 4 decimal places. [1]
(iv) Calculate an estimate of the monthly earnings of a 40-year-old worker. State why you would expect this to be a reliable estimate. [2]
(v) All workers are given an increase of N thousand dollars per month. Without any further calculations state any change you would expect in the values of your constants m and c found in part (iii). [2]

Exercise 194. (8864 N2010/I/12. Answer on p. 399.) Sweets of a certain brand are individually wrapped. The masses, in grams, of the unwrapped sweets and the wrappers have independent normal distributions with means and standard deviations as shown in the table below.

                   Mean   Standard deviation
Unwrapped sweets   40     3
Wrappers           4      0.5

(i) Find the probability that an individual unwrapped sweet has mass less than 36 grams. [1]
(ii) State the mean and variance of the mass of an individual wrapped sweet. Find the probability that a wrapped sweet has mass between 42 grams and 46 grams. [3]
Twelve wrapped sweets are packed together in a cardboard tube. The mass of an empty tube is normally distributed with mean 50 grams and standard deviation 5 grams. The masses of all sweets and tubes are independent.
(iii) Find the probability that the total mass of a tube containing 12 wrapped sweets is more than 600 grams, stating clearly the mean and variance of the distribution that you use. [4]
A rival company produces similar tubes of sweets. The masses of these tubes of sweets have a normal distribution. Over a long period of time, it is found that 5% of them have a mass less than 450 grams and 8% have a mass more than 550 grams.
(iv) Find the mean and variance of this distribution. [5]

Exercise 195. (8864 N2009/I/6. Answer on p. 399.) Three researchers, A, B, and C, share an office. When the office phone rings, the probabilities of the call being for each of them are as follows.
A: 0.2, B: 0.3, C: 0.5.
The probabilities of each researcher being in the office when the phone rings are as follows.
A: 0.7, B: 0.6, C: 0.8.
All the probabilities are independent. Find the probability that, when the phone rings,
(i) the call is for A and A is in the office, [1]
(ii) the researcher being called is in the office, [2]
(iii) the call is for C, given that the researcher being called is not in the office. [2]

Exercise 196. (8864 N2009/I/7. Answer on p. 399.) A and B are two events such that P(A) = 1/3, P(B) = 2/5 and P(A ∪ B) = 17/30.
(i) Find P(A ∩ B). [1]
(ii) Show that A and B are not independent. [1]
(iii) Using a Venn diagram, or otherwise, find P(A′ ∪ B). [3]

Exercise 197. (8864 N2009/I/8. Answer on p. 399.) Components in machines used in a factory wear out and need to be replaced. The lifetime of a component has a normal distribution with mean 120 days and standard deviation 18 days.
(i) Find the probability that the lifetime of a component is more than 144 days. [2]
(ii) Two components are chosen at random. Find the probability that one has a lifetime of more than 144 days and one has a lifetime of less than 144 days. [2]
A company develops a new design for the component.
The standard deviation of the lifetimes remains 18 days, but the company claims that the mean lifetime is longer than for the old components. From a random sample of 50 components of the new design, the sample mean is 124 days.
(iii) Test at the 5% level of significance whether there is sufficient evidence to support the company’s claim. [4]

Exercise 198. (8864 N2009/I/9. Answer on p. 400.) A liquid nutrient is added to the soil around the fruit trees in an orchard, with the aim of increasing the total weight of fruit produced by the trees. For each of 8 trees, the volume of liquid nutrient, x cm³, and the corresponding weight, y kg, of fruit per tree is given in the table below.

x   0      20     40     60     90     120    160    200
y   15.1   15.7   16.2   16.8   16.7   16.5   17.3   18.1

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator.
(ii) Calculate the product moment correlation coefficient and comment on its value in the context of the data. [2]
(iii) Calculate the equation of the regression line of y on x. Sketch this line on your scatter diagram. [2]
(iv) Estimate the weight of fruit on a tree when 135 cm³ of liquid nutrient is added to its soil. [1]
(v) Explain why it might be unsuitable to use the equation in part (iii) to estimate how much liquid nutrient would be needed for a tree to yield 20 kg of fruit. [1]

Exercise 199. (8864 N2009/I/10. Answer on p. 400.) Over a long period of time, it is found that 20% of candidates who take a particular piano examination fail the examination.
(i) Find the probability that, in a group of 10 randomly chosen candidates who take the examination, exactly 2 will fail. [2]
(ii) It is given that 15% of the candidates who pass the piano examination are awarded a distinction. Find the probability that, in a randomly chosen group of 10 candidates who take the examination, fewer than 2 will be awarded a distinction.
[3]
(iii) Use a suitable approximation to estimate the probability that, in a group of 50 randomly chosen candidates who take the examination, at most 12 will fail. You should state the mean and variance of the distribution used in the approximation. [4]

Exercise 200. (8864 N2009/I/11. Answer on p. 401.)
(a) An insurance company receives a large number of claims for flood damage. On a particular day the company receives 72 such claims. Because of staff shortages, it is only possible to process 8 of these claims.
Parts (a)(i) and (a)(ii) are no longer in the 8865 (revised) syllabus, so you can skip them.
(i) Describe how you would choose a systematic random sample of size 8 from the received claims. [2]
(ii) Comment on whether this method of sampling gives a better indication of the value of the 72 claims as compared to simply choosing as the sample the first 8 claims received. [1]
(b) From the claims received by the company, over a long period of time, a random sample of 120 is taken. The values of the claims, $x, are summarised by ∑(x − 1000) = 5320, ∑(x − 1000)² = 8282000.
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) What do you understand by the term ‘unbiased estimate’? [1]
(iii) The population mean is denoted by $µ. Using the sample data, a significance test of the null hypothesis µ = 1000 against the alternative hypothesis µ ≠ 1000 is carried out at the α% level of significance. Find the set of values of α for which the null hypothesis will be rejected. [5]

Exercise 201. (8864 N2009/I/12. Answer on p. 401.)
(a) The plums sold by a supermarket are graded ‘small’, ‘medium’ or ‘large’. The masses of the plums have a normal distribution. Plums with a mass less than 22 grams are graded as small, plums with a mass greater than 29 grams are graded as large and the rest are graded as medium.
Given that 30% of plums are small and 20% are large, find the mean and standard deviation of the distribution. [4]
(b) The masses, in kilograms, of apples and nectarines sold by the supermarket have independent normal distributions with means and standard deviations as shown in the following table.

             Mean   Standard deviation
Apples       0.15   0.03
Nectarines   0.07   0.02

(i) Two apples and four nectarines are chosen at random. Find the probability that the total mass of the two apples is greater than the total mass of the four nectarines. [4]
(ii) Apples cost $9 per kilogram and nectarines cost $12 per kilogram. Find the mean and the variance of the total cost of two apples and four nectarines and hence find the probability that the total cost is between $5 and $6. [5]

Exercise 202. (8864 N2008/I/7. Answer on p. 402.) An examination is marked out of 100. It is taken by a large number of candidates. The mean mark, for all candidates, is 72.1, and the standard deviation is 15.2.
(i) Give a reason why a normal distribution, with this mean and standard deviation, would not give a good approximation to the distribution of marks. [1]
(ii) A random sample of 50 of the candidates is taken. Calculate the probability that the mean mark of this sample lies between 70.0 and 75.0. [3]

Exercise 203. (8864 N2008/I/8. Answer on p. 402.) A baker makes loaves of bread. 60% of the loaves that he makes are ‘crusty’.
(i) A customer buys six randomly chosen loaves. Find the probability that exactly three of them are crusty. [2]
(ii) A market trader buys 40 randomly chosen loaves. Use a suitable approximation to find the probability that at least 20 of them are crusty. [4]
(iii) The mass of a loaf has a normal distribution with mean 1.24 kg and standard deviation σ kg. The probability that a randomly chosen loaf has mass less than 1 kg is 0.04. Find the value of σ. [3]

Exercise 204. (8864 N2008/I/9. Answer on p. 402.)
Two children, Tan and Mui, are each to be given a pen from a box containing 3 red pens and 5 blue pens. One pen is chosen at random and given to Tan. A green pen is then put in the box. A second pen is chosen at random from the box and given to Mui.
(i) Draw a tree diagram to represent the possible outcomes. [2]
(ii) Write down the conditional probability that Mui’s pen is blue, given that Tan’s pen is red. [1]
(iii) Find the probability that Mui’s pen is red. [2]
(iv) Find the conditional probability that Tan’s pen is red, given that Mui’s pen is blue. [5]

Exercise 205. (8864 N2008/I/10. Answer on p. 403.) A consumer association is testing the lifetime of a particular type of battery that is claimed to have a lifetime of 150 hours. A random sample of 70 batteries of this type is tested and the lifetime, x hours, of each battery is measured. The results are summarised by ∑x = 10317, ∑x² = 1540231. The population mean lifetime is denoted by µ hours. The null hypothesis µ = 150 is to be tested against the alternative hypothesis µ < 150.
(i) Find the p-value of the test and state the meaning of this p-value in the context of the question. [5]
A second random sample of 50 batteries of this type is tested and the lifetime, y hours, of each battery is measured, with results summarised by ∑y = 7331, ∑y² = 1100565.
(ii) Combining the two samples into a single sample, carry out a test, at the 10% significance level, of the same null and alternative hypotheses. [6]

Exercise 206. (8864 N2008/I/11. Answer on p. 403.) An engineering company makes cranes. The numbers, x, sold in each three-month period for two years together with the profits, y thousand dollars, on the sale of these cranes are given in the following table.

x   15    17    13    21    16    22    14    18
y   290   350   270   430   340   410   300   360

(i) Give a sketch of the scatter diagram for the data as shown on your calculator.
[2]
(ii) Find x̄ and ȳ, and mark the point (x̄, ȳ) on your scatter diagram. [2]
(iii) Calculate the equation of the regression line of y on x, and draw this line on your scatter diagram. [2]
(iv) Calculate the product moment correlation coefficient, and comment on its value in relation to your scatter diagram. [2]
(v) For the next three-month period, the sales target is 20 cranes. Estimate the corresponding profit. [2]
(vi) The company’s sales director uses the regression line in part (iii) to predict the profit if 40 cranes were to be sold in a three-month period. Comment on the validity of this prediction. [2]

Exercise 207. (8864 N2008/I/12. Answer on p. 404.) A supermarket obtains a large supply of apples of a single variety. The mass of an apple has a normal distribution with mean 0.234 kg and standard deviation 0.025 kg. Some of the apples are packed, at random, into ‘small’ bags, each containing 5 apples, and others are packed, at random, into ‘large’ bags, each containing 10 apples.
(i) Find the probability that a randomly chosen small bag has a mass exceeding 1.2 kg. [4]
(ii) Find the probability that the total mass of two randomly chosen small bags is within ±0.2 kg of the mass of a randomly chosen large bag. [4]
Lee buys two small bags at $1.50 per kg, and Foo buys one large bag at $1.20 per kg.
(iii) Find the probability that Lee pays at least $0.50 more than Foo. [6]

Exercise 208. (8864 N2007/I/6. Answer on p. 404.) A manufacturer produces packets of margarine. The mass of margarine in a packet has a normal distribution with mean 502 g and standard deviation 0.8 g.
(i) Find the proportion of packets which contain less than 500 g of margarine. [2]
The manufacturer increases the mean amount of margarine in a packet to µ g. The standard deviation remains unchanged. Only 1 packet in 1000, on average, now contains less than 500 g.
(ii) Find µ, correct to 1 decimal place. [3]

Exercise 209. (8864 N2007/I/7.
Answer on p. 404.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. A school has a canteen where students can buy their lunch. Each day most, but not all, students buy their lunch in the canteen. The headteacher wants to find out what students think of the lunches provided in the canteen. On one particular day she selects a sample of students to interview from those buying their lunch by
• choosing at random one of the first 10 students to buy their lunch,
• then choosing every 10th student after the first student chosen.
(i) What is this type of sampling method called? [1]
(ii) State one advantage and one disadvantage of the sampling method used in this context. [2]
(iii) Describe an alternative sampling method which would be better in this case. [2]

Exercise 210. (8864 N2007/I/8. Answer on p. 404.) Seven cities in a certain country are linked by rail to the capital city. The table below shows the distance of each city from the capital and the rail fare from the city to the capital.

City             A     B    C    D     E    F     G
Distance, x km   124   44   76   148   16   180   104
Rail fare, $y    156   53   99   169   23   177   138

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Calculate the product moment correlation coefficient. [1]
You are given that the regression line of y on x has equation y = 16.7 + 1.01x, where the coefficients are given correct to 3 significant figures.
(iii) Calculate the equation of the regression line of x on y, giving your answer in the form x = a + by. [1]
(iv) Use the appropriate regression line to estimate
(a) the rail fare from a city that is 28 km from the capital, [2]
(b) the distance of a city from the capital if the rail fare is $198. [2]
(v) Comment briefly on the reliability of the estimates in part (iv). [2]

Exercise 211. (8864 N2007/I/9. Answer on p. 405.) A random variable X has a binomial distribution with n = 6 and probability of success p.
(i) Write down an expression, in terms of p, for P(X = 4). [1]
It is given that p = 1/4.
(ii) Find P(X = 4), giving your answer as a fraction. [1]
(iii) The mean and standard deviation of X are denoted by µ and σ respectively. Find P(µ − σ < X < µ + σ), correct to 2 decimal places. [5]

Exercise 212. (8864 N2007/I/10. Answer on p. 405.) Bottles of a particular brand of washing-up liquid are said to contain 500 ml. A random sample of 50 bottles is taken and the volumes of liquid in the bottles are measured. The volumes, x ml, are summarised by ∑(x − 500) = −35.8 and ∑(x − 500)² = 150.5.
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) Assuming a normal distribution, test at the 5% significance level whether the population mean volume is less than 500 ml. [4]
(iii) State, giving a reason, whether it is necessary to assume a normal distribution for the test to be valid. [1]

Exercise 213. (8864 N2007/I/11. Answer on p. 405.) The table below shows the results of a survey of the 120 cars in a car park, in which the colour of each car and the gender of the driver were recorded.

        Male   Female
Green   18     12
Blue    48     22
Red     6      14

One of the cars is selected at random.
M is the event that the car selected has a male owner.
G is the event that the car selected is green.
B is the event that the car selected is blue.
R is the event that the car selected is red.
(i) Find the following probabilities: (a) P(M), (b) P(M ∩ G), (c) P(M ∪ B), (d) P(M | R′). [1 mark each.]
(ii) Determine whether the events M and G are independent, justifying your answer. [2]
It is given that bicycle racks are fitted to 20% of the green cars, 30% of the blue cars and 5% of the red cars. One of the cars is selected at random and found to have a bicycle rack fitted.
(iii) What is the probability that it is a blue car? [5]

Exercise 214. (8864 N2007/I/12. Answer on p. 406.)
Men and women have masses, in kg, that are normally distributed with means and standard deviations as shown in the following table.

        Mean mass   Standard deviation
Men     75          12.5
Women   55          10.5

(i) Two men are chosen at random. Find the probability that one of the men has mass more than 90 kg and the other has mass less than 90 kg. [4]
(ii) One man and one woman are chosen at random. Find the probability that the woman’s mass is greater than the man’s. [4]
The safety limit for a hotel elevator is 530 kg.
(iii) Six men are chosen at random. Find the probability that their total mass is greater than 530 kg. [4]
(iv) Six male hotel guests enter the elevator, at a time when a large number of sumo wrestlers are staying at the hotel. Give two reasons why the probability that their total mass exceeds 530 kg may be different from the value calculated in part (iii). [2]

Exercise 215. (8174 N2006/II/8. Answer on p. 406.) A and B are independent events such that P(A) = 0.6 and P(A ∪ B) = 0.7. Find P(A ∩ B′). [6]

Exercise 216. (8174 N2006/II/9. Answer on p. 406.) This question is no longer in the 8865 (revised) syllabus, so you can skip it. Some students are conducting a survey at a sports club. They each question a sample of the club members.
(i) Anil decides to choose the first 20 men and the first 20 women he sees. What name is given to this type of sampling? [1]
(ii) Betty decides to choose every tenth person on the membership list. What name is given to this type of sampling? [1]
(iii) Calvin decides to use random sampling. Describe briefly one way in which he could select his sample. [2]
The club has 240 members and 3 sections: badminton, squash and tennis. The table shows the number of men and women in each section.

            Male   Female   TOTAL
Badminton   32     12       44
Squash      60     40       100
Tennis      48     48       96
TOTAL       140    100      240

(iv) Dennis decides to take a stratified sample of size 60 from the total membership.
(a) How many women does he select?
(b) How many men from the squash section does he select? [3]

Exercise 217. (8174 N2006/II/13. Answer on p. 406.) The probability that a resident of a certain town watches a particular television programme is 0.3.

(i) Find the probability that exactly 4 out of 12 residents watch the programme. [3]

(ii) Use a suitable approximation to find the probability that, out of 80 residents, more than 20 but less than 30 watch the programme. [7]

Exercise 218. (8174 N2006/II/14. Answer on p. 407.) A team either wins or loses each of their matches. If the team wins a match, the probability that it wins the next match is 0.8. If the team loses a match, the probability that it wins the next match is 0.4. The team plays 4 matches in total. The team wins the first match. Calculate the probability that the team wins

(i) both the second and third matches, [2]

(ii) the fourth match, [5]

(iii) at least 3 of the 4 matches played. [3]

Exercise 219. (8174 N2006/II/14-OR. Answer on p. 407.) The heights of male students in a college can be modelled using a normal distribution with mean 176 cm and standard deviation 4 cm.

(i) Calculate the probability that one of these students, chosen at random, is less than 170 cm tall. [2]

(ii) Find the height that is exceeded by 10% of these students. [2]

In another college there are 1000 female students. Of these, 6 are less than 150 cm tall and 883 of them are less than 175 cm tall.

(iii) Assuming the heights of these students can be modelled using a normal distribution with mean $m$ and standard deviation $s$, find the value of $m$ and of $s$. [6]

Part V Answers to Exercises

My answers here are often more verbose than what would be necessary for you to get full credit on an exam. The reason is to help you understand my answers better.
60 Answers to Exercises in Part I: Functions and Graphs

Answer to Exercise 1. The error is in Step #5. Since $x = y$, we have $x - y = 0$. Hence, we cannot divide both sides by $x - y$.

Answer to Exercise 2. Given $f(x) = 7x - 3$, we have $f(0) = 7 \cdot 0 - 3 = -3$, $f(1) = 7 \cdot 1 - 3 = 4$, and $f(2) = 7 \cdot 2 - 3 = 11$.

Answer to Exercise 3. Given $g$ (the function that maps each country to its capital), we have $g(\text{France}) = \text{Paris}$ and $g(\text{Japan}) = \text{Tokyo}$.

Answer to Exercise 4. (i) [Figure: graph for part (i).] (ii) [Figure: graph for part (ii).]

Answer to Exercise 5. (i) [Figure: graph for part (i).] (ii) [Figure: graph for part (ii).]

Answer to Exercise 6. The graphs of all three equations are below: (a) $y = 2x^2 + x + 1$ (red). (b) $y = -2x^2 + x + 1$ (blue). (c) $y = x^2 + 6x + 9$ (green).

(a) Since $b^2 - 4ac = 1^2 - 4(2)(1) = 1 - 8 = -7 < 0$, there are no horizontal intercepts. The vertical intercept is $c = 1$. The turning point is at $x = -\dfrac{b}{2a} = -\dfrac{1}{4} = -0.25$.

(b) Since $b^2 - 4ac = 1^2 - 4(-2)(1) = 1 + 8 = 9 > 0$, there are two horizontal intercepts, namely $\dfrac{-b \pm \sqrt{9}}{2a} = \dfrac{-1 \pm \sqrt{9}}{-4} = -0.5, 1$. The vertical intercept is $c = 1$. The turning point is at $x = -\dfrac{b}{2a} = \dfrac{-1}{-4} = 0.25$.

(c) Since $b^2 - 4ac = 6^2 - 4(1)(9) = 36 - 36 = 0$, there is one horizontal intercept, namely $-\dfrac{b}{2a} = -\dfrac{6}{2} = -3$. The vertical intercept is $c = 9$. The turning point is at $x = -\dfrac{b}{2a} = -3$.

Answer to Exercise 7. (i) The quadratic equation $y = ax^2 + bx + c$ has (a) two distinct real roots if and only if $b^2 - 4ac > 0$; (b) two equal roots if and only if $b^2 - 4ac = 0$; and (c) no real roots if and only if $b^2 - 4ac < 0$.

(ii) (a) $ax^2 + bx + c$ is positive for all possible values of $x$ if and only if $a > 0$ (so ∪-shaped) and $b^2 - 4ac < 0$ (so doesn’t touch x-axis).
(b) $ax^2 + bx + c$ is negative for all possible values of $x$ if and only if $a < 0$ (so ∩-shaped) and $b^2 - 4ac < 0$ (so doesn’t touch x-axis).

Answer to Exercise 8.

$\dfrac{5^{3x} \cdot 25^{1-x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})} = \dfrac{5^{3x} \cdot 5^{2(1-x)}}{5^{2x+1} + 3(25^x) + 17(5^{2x})}$ (since $25 = 5^2$)

$= \dfrac{5^{2+x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})}$ (add the exponents)

$= \dfrac{5^{2+x}}{5^{2x+1} + 3(5^{2x}) + 17(5^{2x})}$ (since $25 = 5^2$)

$= \dfrac{5^{2+x}}{5^{2x}(5^1 + 3 + 17)}$ (factorise out $5^{2x}$)

$= \dfrac{5^{2+x}}{5^{2x}(25)} = \dfrac{5^x}{5^{2x}} = 5^{-x}$.

$\dfrac{8^{x+2} - 34(2^{3x})}{(\sqrt{8})^{2x+1}} = \dfrac{8^{x+2} - 34(2^{3x})}{(\sqrt{8})^{2x}(\sqrt{8})^{1}}$ (splitting out the exponents)

$= \dfrac{8^{x+2} - 34(2^{3x})}{8^x \sqrt{8}} = \dfrac{8^{x+2} - 34(8^x)}{8^x \sqrt{8}}$

$= \dfrac{8^x (8^2 - 34)}{8^x \sqrt{8}}$ (factorise out the $8^x$)

$= \dfrac{8^2 - 34}{\sqrt{8}} = \dfrac{64 - 34}{\sqrt{8}} = \dfrac{30}{\sqrt{8}} = \dfrac{30}{2\sqrt{2}} = \dfrac{15}{\sqrt{2}}$.

Answer to Exercise 9. (i) $x^{(a^b)} = x^{ab}$ is false. Here’s a counter-example. Let $x = 2$, $a = 3$, $b = 4$. Then $x^{(a^b)} = 2^{(3^4)} = 2^{81}$, but $x^{ab} = 2^{3 \times 4} = 2^{12}$. The two are clearly not equal.

(ii) $(x^a)^b = x^{ab}$ is true, as we now prove:

$(x^a)^b = \underbrace{(x^a) \cdot (x^a) \cdots (x^a)}_{b \text{ times}} = \underbrace{\underbrace{(x \cdot x \cdots x)}_{a \text{ times}} \cdot \underbrace{(x \cdot x \cdots x)}_{a \text{ times}} \cdots \underbrace{(x \cdot x \cdots x)}_{a \text{ times}}}_{b \text{ times}} = x^{ab}$.

Answer to Exercise 10. [Figure: graph.]

Answer to Exercise 11. (i) $b_y = 4 \cdot 2^{(y-1990)/7}$.

(ii) [Figure: graph for part (ii).]

(iii) In 2025, there’ll be $b_{2025} = 128$ Singaporean billionaires.
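As a quick numerical check of the formula in part (i) of the Answer to Exercise 11, the sketch below evaluates $b_y$ for a given year. The function name `billionaires` is an illustrative choice and does not come from the textbook.

```python
def billionaires(year: int) -> float:
    """Evaluate the model b_y = 4 * 2 ** ((y - 1990) / 7)."""
    return 4 * 2 ** ((year - 1990) / 7)


# Base year: the exponent is 0, so the model gives 4.
print(billionaires(1990))  # 4.0

# 2025 is 35 years (five 7-year doubling periods) after 1990,
# so the model gives 4 * 2**5.
print(billionaires(2025))  # 128.0
```

The exponent $(y - 1990)/7$ counts how many 7-year doubling periods have passed since 1990, which is why the 2025 value is $4 \cdot 2^5 = 128$.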
Answer to Exercise 12. (i) $\ln(1/e^2) = -2$, $\log_5 0.008 = -3$, $\lg 100000 = 5$.

(ii) $\log_a 16 = 4 \implies a = 2$. $\log_b 0.25 = -1 \implies b = 4$. $\log_c 5 = 1 \implies c = 5$.

(iii) $y = 3^x \implies \log_3 y = x$. $5 = p^q \implies \log_p 5 = q$.

(iv) $\alpha = \log_4 \beta \implies 4^\alpha = \beta$. $\log_\gamma \delta = 17 \implies \gamma^{17} = \delta$.

Answer to Exercise 13. (i) $\log_3 3^x = x$.

(ii) $2\log_a 7 + 0.25\log_a 81 - \log_a 3 = \log_a 49 + \log_a 81^{1/4} - \log_a 3 = \log_a 49 + \log_a 3 - \log_a 3 = \log_a 49 = \log_a x$. So $x = 49$.

(iii) $\ln(y - 1) + \ln y = \ln[y(y - 1)] = 2 \iff y(y - 1) = e^2 \iff y^2 - y - e^2 = 0$. By the quadratic formula,

$y = \dfrac{1 \pm \sqrt{(-1)^2 - 4(1)(-e^2)}}{2(1)} = \dfrac{1 \pm \sqrt{1 + 4e^2}}{2}$.

We know that $y$ must be positive, so it must be that $y = \left(1 + \sqrt{1 + 4e^2}\right)/2$.

Answer to Exercise 14. [Figure: graph.]

Answer to Exercise 15 (a). Graphed below is the equation $y = e^x$. The vertical intercept is 1, i.e. the graph crosses the y-axis at the point $(0, 1)$. There are no turning points. The horizontal asymptote is $y = 0$: as $x$ decreases without bound (i.e. $x \to -\infty$), $y$ grows ever closer to (but does not equal) 0. There are no lines of symmetry.

[Figure: graph of $y = e^x$.]

Answer to Exercise 15 (b). Graphed below is the equation $y = 3x + 2$. The vertical intercept is 2, i.e. the graph crosses the y-axis at the point $(0, 2)$. The horizontal intercept is $-2/3$, i.e. the graph crosses the x-axis at the point $(-2/3, 0)$. There are no turning points. There are no asymptotes. There are infinitely many lines of symmetry: specifically, every line that is perpendicular to the graph is a line of symmetry.

[Figure: graph of $y = 3x + 2$.]

Answer to Exercise 15 (c). Graphed below is the equation $y = 2x^2 + 1$. The vertical intercept is 1, i.e. the graph crosses the y-axis at the point $(0, 1)$. The turning point is $(0, 1)$. There are no asymptotes.
There is one line of symmetry, namely $x = 0$ (this is also the vertical axis).

[Figure: graph of $y = 2x^2 + 1$.]

Answer to Exercise 16.

1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press the blue 2ND button and then $e^x$ (which corresponds to the LN button) to enter “e^”. Next press X,T,θ,n to enter “X”. Then press −, X,T,θ,n, $x^2$, + to enter “−X² +”. Now press the blue 2ND button and then √ (which corresponds to the $x^2$ button), then X,T,θ,n to enter “√(X”.
4. Now press GRAPH and the calculator will graph the equation entered in Step 3.

[Screenshots: after Step 1, after Step 2, after Step 3, after Step 4.]

Answer to Exercise 17. (i) Rearrange equation (1) into $y = 0.2x + 0.4$; call this equation (3). Rearrange equation (2) into $y = x^2 + 5x + 3.6$; call this equation (4).

Now plug (3) into (4) to get $0.2x + 0.4 = x^2 + 5x + 3.6$ or $x^2 + 4.8x + 3.2 = 0$ or $5x^2 + 24x + 16 = 0$. Now use the quadratic formula:

$x = \dfrac{-24 \pm \sqrt{24^2 - 4(5)(16)}}{2(5)} = \dfrac{-24 \pm \sqrt{256}}{10} = -4, -0.8$.

Correspondingly, $y = -0.4$ or $y = 0.24$. So there are two solutions to the given pair of simultaneous equations, namely $(x, y) = (-4, -0.4)$ and $(x, y) = (-0.8, 0.24)$.

[TI84 screenshots.]

(ii) Rearrange equation (1) into $y = 1 - 4x$; call this equation (3). Rearrange equation (2) into $y = -2x^2 + 5x - 3$; call this equation (4).

Now plug (3) into (4) to get $1 - 4x = -2x^2 + 5x - 3$ or $2x^2 - 9x + 4 = 0$ or $(2x - 1)(x - 4) = 0$. So $x = 0.5$ or $x = 4$. Correspondingly, $y = -1$ or $y = -15$. So there are two solutions to the given pair of simultaneous equations, namely $(x, y) = (0.5, -1)$ and $(x, y) = (4, -15)$.

[TI84 screenshots.]

Answer to Exercise 18 (a). The system of equations is $y = \dfrac{1}{1 + \sqrt{x}}$, $y = x^5 - x^3 + 2$.

Rewrite the two equations into a new equation $y = x^5 - x^3 + 2 - \dfrac{1}{1 + \sqrt{x}}$.

Our goal is to find the horizontal intercepts of this equation. These horizontal intercepts will give us the solutions to the above system of equations.

1. Graph the equation $y = x^5 - x^3 + 2 - \dfrac{1}{1 + \sqrt{x}}$.

It looks like there are no horizontal intercepts.

Conclusion: This system of equations has no solutions.

[Screenshot: after Step 1.]

Answer to Exercise 18 (b). The system of equations is $y = \dfrac{1}{1 - x^2}$, $y = x^3 + \sin x$.

Rewrite the two equations into a new equation $y = \dfrac{1}{1 - x^2} - x^3 - \sin x$.

Our goal is to find the horizontal intercepts of this equation. These horizontal intercepts will give us the solutions to the above system of equations.

1. Graph the equation $y = \dfrac{1}{1 - x^2} - x^3 - \sin x$.

It looks like there is only one horizontal intercept.

2. Find the horizontal intercept. It is $-1.1790$.

Conclusion: This system of equations has one solution and its x-coordinate is $-1.1790$.

To find the corresponding y-coordinate, we need merely plug this value of $x$ into either of the equations in the original system of equations: $y = \dfrac{1}{1 - x^2} = \dfrac{1}{1 - (-1.1790)^2} \approx -2.5633$.

Altogether, this system of equations has one solution: $(-1.1790, -2.5633)$.

[Screenshots: after Step 1, after Step 2.]

Answer to Exercise 19. (i) Rearrange the inequality $x^2 + 3x - 5 > 6 - 2x^2$ into $3x^2 + 3x - 11 > 0$. The expression $3x^2 + 3x - 11$ is a ∪-shaped quadratic that equals 0 when

$x = \dfrac{-3 \pm \sqrt{3^2 - 4(3)(-11)}}{2(3)} = \dfrac{-3 \pm \sqrt{141}}{6}$.

Thus, the inequality holds if $x < \left(-3 - \sqrt{141}\right)/6$ or $x > \left(-3 + \sqrt{141}\right)/6$.

(ii) Rearrange the inequality $(x - 3)(x + 5) < 1$ into $x^2 + 2x - 16 < 0$. The expression $x^2 + 2x - 16$ is a ∪-shaped quadratic that equals 0 when

$x = \dfrac{-2 \pm \sqrt{2^2 - 4(1)(-16)}}{2(1)} = \dfrac{-2 \pm \sqrt{68}}{2} = -1 \pm \sqrt{17}$.

Thus, the inequality holds if $-1 - \sqrt{17} < x < -1 + \sqrt{17}$.

Answer to Exercise 20 (a). Rewrite the inequality as $x^3 - x^2 + x - 1 - e^x > 0$. Graph $y = x^3 - x^2 + x - 1 - e^x$.

$x^3 - x^2 + x - 1 - e^x = 0 \iff x = 3.0472, 3.5040$.

Thus, $x^3 - x^2 + x - 1 > e^x \iff 3.0472 < x < 3.5040$.

[Screenshots: after graphing; zoom, adjust; left intercept; right intercept.]

Answer to Exercise 20 (b). Rewrite the inequality as $\sqrt{x} - \cos x > 0$. Graph $y = \sqrt{x} - \cos x$.

$\sqrt{x} - \cos x = 0 \iff x = 0.6417$.
Thus, $\sqrt{x} > \cos x \iff x > 0.6417$.

[Screenshots: after graphing; zoom in once; the only horizontal intercept.]

Answer to Exercise 20 (c). Rewrite the inequality as $1/(1 - x^2) - x^3 - \sin x > 0$. Graph $y = 1/(1 - x^2) - x^3 - \sin x$. By observation,

• $1/(1 - x^2) - x^3 - \sin x > 0$ if $-1 < x < 1$;
• $1/(1 - x^2) - x^3 - \sin x < 0$ if $x > 1$; and
• $1/(1 - x^2) - x^3 - \sin x > 0$ if $x$ is to the left of the only horizontal intercept.

[Screenshots: after graphing; zoom in once; the only horizontal intercept.]

$1/(1 - x^2) - x^3 - \sin x = 0 \iff x = -1.179$.

We conclude that $1/(1 - x^2) - x^3 - \sin x > 0 \iff x < -1.179$ or $-1 < x < 1$.

Answer to Exercise 21. Let $A$, $B$, and $C$ be the present-day ages of Apu, Beng, and Caleb. Let $k$ be the number of years ago when Apu was 40 years old. From the first sentence, we know that $A - k = 40$ (1) and $B - k = 2(C - k)$ (2). From the second sentence: $A = 2B$ (3) and $C = 28$ (4).

Sub (3) into (1) and (4) into (2) to get $2B - k = 40$ (5) and $B - k = 2(28 - k)$ (6).

From (5), $k = 2B - 40$ (7). Sub (7) into (6) to get:

$B - (2B - 40) = 2[28 - (2B - 40)]$
$40 - B = 2[68 - 2B] = 136 - 4B$
$3B = 96 \implies B = 32$.

Beng is 32 years old today. And from (3), Apu is 64 years old today.

Answer to Exercise 22. The given information provides this system of equations:

$a(1)^2 + b(1) + c = 2$ (1), $a(3)^2 + b(3) + c = 5$ (2), $a(6)^2 + b(6) + c = 9$ (3).

You can solve this system of equations either by calculator or by hand, as I do now:

Take (2) minus (1) to get $8a + 2b = 3$ or $b = 0.5(3 - 8a) = 1.5 - 4a$ (4).

Plug (4) into (1) to get $a + 1.5 - 4a + c = 2$ or $c = 0.5 + 3a$ (5).

Plug (4) and (5) into (3) to get $36a + 6(1.5 - 4a) + 0.5 + 3a = 9 \iff 15a + 9.5 = 9 \iff 15a = -0.5 \iff a = -1/30$.

Now from (4), $b = 49/30$ and from (5), $c = 0.4$.

Answer to Exercise 23. The turning point (which is a minimum turning point if $a$ is positive) of the equation is at $x = -\dfrac{b}{2a}$ and $y = a\left(-\dfrac{b}{2a}\right)^2 + b\left(-\dfrac{b}{2a}\right) + c = \dfrac{b^2}{4a} - \dfrac{b^2}{2a} + c = c - \dfrac{b^2}{4a}$. We know that at the minimum point, $x = 0$ and $y = 0$.
So $b = 0$ and $c = 0$. Since $(-1, 2)$ satisfies the equation $y = ax^2 + bx + c$, we also have $a(-1)^2 + b(-1) + c = 2$. Thus, $a = 2$. Altogether then, $a = 2$, $b = 0$, and $c = 0$.

61 Answers to Exercises in Part II: Calculus

Answer to Exercise 24. (i) (a) $f'(x) = 1/x + e^x + 2x$.

(i) (b) $f(1) = \ln 1 + e^1 + 1^2 = e + 1$ and $f'(1) = 1/1 + e^1 + 2(1) = 3 + e$. So the equation of the tangent at the point $(1, e + 1)$ is $y - (e + 1) = (3 + e)(x - 1)$. Or rearranging: $y = (3 + e)x - 2$.

$f(2) = \ln 2 + e^2 + 2^2 = \ln 2 + e^2 + 4$ and $f'(2) = 1/2 + e^2 + 2(2) = 4.5 + e^2$. So the equation of the tangent at the point $(2, \ln 2 + e^2 + 4)$ is $y - (\ln 2 + e^2 + 4) = (4.5 + e^2)(x - 2)$. Or rearranging: $y = (4.5 + e^2)x - 5 - e^2 + \ln 2$.

(ii) (a) $g'(x) = -1/x^2 + 3x^2 + 7e^x$.

(ii) (b) $g(1) = 1/1 + 1^3 + 7e^1 = 2 + 7e$ and $g'(1) = -1/1^2 + 3 \cdot 1^2 + 7e = 2 + 7e$. So the equation of the tangent at the point $(1, 2 + 7e)$ is $y - (2 + 7e) = (2 + 7e)(x - 1)$. Or rearranging: $y = (2 + 7e)x$.

$g(2) = 1/2 + 2^3 + 7e^2 = 8.5 + 7e^2$ and $g'(2) = -1/2^2 + 3 \cdot 2^2 + 7e^2 = 11.75 + 7e^2$. So the equation of the tangent at the point $(2, 8.5 + 7e^2)$ is $y - (8.5 + 7e^2) = (11.75 + 7e^2)(x - 2)$. Or rearranging: $y = (11.75 + 7e^2)x - 15 - 7e^2$.

Answer to Exercise 25. (i) (a) $\dfrac{dy}{dx} = 13\left(0.5x^{-0.5} + 2 \cdot \dfrac{3}{x^3}\right)$.

(i) (b) At $x = 1$, $y = 13\left(\sqrt{1} - \dfrac{3}{1^2}\right) = -26$ and $\dfrac{dy}{dx} = 13\left(0.5 \cdot 1^{-0.5} + 2 \cdot \dfrac{3}{1^3}\right) = 84.5$.

So the equation of the tangent at the point $(1, -26)$ is $y - (-26) = 84.5(x - 1)$. Or rearranging: $y = 84.5x - 110.5$.

And at $x = 2$: $y = 13\left(\sqrt{2} - \dfrac{3}{2^2}\right) = 13\left(\sqrt{2} - \dfrac{3}{4}\right)$ and $\dfrac{dy}{dx} = 13\left(0.5 \cdot 2^{-0.5} + 2 \cdot \dfrac{3}{2^3}\right) = 13\left(\dfrac{1}{2\sqrt{2}} + \dfrac{3}{4}\right)$.

So the equation of the tangent at the point $\left(2, 13\left(\sqrt{2} - 3/4\right)\right)$ is

$y - \left[13\left(\sqrt{2} - \dfrac{3}{4}\right)\right] = 13\left(\dfrac{1}{2\sqrt{2}} + \dfrac{3}{4}\right)(x - 2)$ or $y = 13\left(\dfrac{1}{2\sqrt{2}} + \dfrac{3}{4}\right)x + 13\left(\sqrt{2} - \dfrac{9}{4} - \dfrac{1}{\sqrt{2}}\right)$.

(ii) (a) $dy/dx = 9e^x - 5x^4$.

(ii) (b) At $x = 1$, $y = 9e^1 - 1^5 = 9e - 1$ and $dy/dx = 9e^1 - 5 \cdot 1^4 = 9e - 5$.
So the equation of the tangent at the point $(1, 9e - 1)$ is $y - (9e - 1) = (9e - 5)(x - 1)$ or $y = (9e - 5)x + 4$.

At $x = 2$, $y = 9e^2 - 2^5 = 9e^2 - 32$ and $dy/dx = 9e^2 - 5 \cdot 2^4 = 9e^2 - 80$. So the equation of the tangent at the point $(2, 9e^2 - 32)$ is $y - (9e^2 - 32) = (9e^2 - 80)(x - 2)$ or $y = (9e^2 - 80)x - 9e^2 + 128$.

Answer to Exercise 26. (a) $f'(x) = 2x$, so $f'(0) = 0$.

(b) $g'(x) = 2[x - \ln(x + 1)]\left(1 - \dfrac{1}{x + 1}\right)$. So $g'(0) = 0$.

(c) Observe that $h(x) = [g(x)]^3$. So $h'(x) = 3[g(x)]^2 g'(x)$. Since $g(0) = 1$ and $g'(0) = 0$, we have $h'(0) = 3[g(0)]^2 g'(0) = 0$.

Answer to Exercise 27. (i) [Figure: graph for part (i).]

(ii) $f'(x) = 6x - 4$ is negative for $x < 2/3$, equal to 0 at $x = 2/3$, and positive for $x > 2/3$.

(iii) There is one stationary point: $(2/3, -1/3)$.

Answer to Exercise 28. In order for $-1$ to be a minimum turning point of $g$, it must be that to its left, $g$ is decreasing; while to its right, $g$ is increasing. In other words, to the left of $-1$, $g'(x) \le 0$. While to the right of $-1$, $g'(x) \ge 0$. Altogether then, we must have $g'(-1) = 0$: at the minimum turning point, the slope of the function must be 0.

Answer to Exercise 29. (i) $f'(x) = 1$. So $f$ has no stationary points. Hence, it has no turning points.

(ii) $g'(x) = 0$. So every point of $g$ is a stationary point. However, no point of $g$ is a turning point. Indeed, the graph of $g$ is simply a horizontal line.

(iii) $h'(x) = 4x^3 - 4x = 4x(x^2 - 1) = 4x(x - 1)(x + 1)$. So the stationary points of $h$ are where $x = 0$, $x = 1$, or $x = -1$. From a graph sketch, we see that there are minimum turning points at $x = \pm 1$ and a maximum turning point at $x = 0$.

[Figure: graph sketch of $h$.]

(iv) $i'(x) = 3x^2$. So the only stationary point of $i$ is at $x = 0$. But this is not a turning point. (Indeed, as we’ll discover in the next chapter, this is an example of an inflexion point.)
(v) $j'(x) = 3x^2 + 2x - 1 = (3x - 1)(x + 1)$. So the only two stationary points are at $x = 1/3$ and $x = -1$. Since $j'$ changes sign from positive to negative at $x = -1$ and from negative to positive at $x = 1/3$, there is a maximum turning point at $x = -1$ and a minimum turning point at $x = 1/3$. A graph sketch confirms this.

Answer to Exercise 30. (i) $f'(x) = 3x^2 - 3 = 3(x^2 - 1) = 3(x - 1)(x + 1)$. So the only stationary points are at $x = -1$ and $x = 1$. From a graph sketch, the former (labelled A below) is a maximum turning point and the latter (labelled C below) is a minimum turning point. There are no stationary points of inflexion. (However, there is a non-stationary point of inflexion, namely B. But you need not know how to identify this for the A-Levels.)

(ii) $g'(x) = 3x^2 - 6x + 3 = 3(x^2 - 2x + 1) = 3(x - 1)^2$. The only stationary point is at $x = 1$. From a graph sketch, it is a point of inflexion.

Answer to Exercise 31. (a) The volume is fixed as $1 = \dfrac{1}{3}\pi r^2 h$. So $r = \sqrt{\dfrac{3}{\pi h}}$.

(b) By the Pythagorean Theorem, $l = \sqrt{r^2 + h^2} = \sqrt{\dfrac{3}{\pi h} + h^2}$.

(c) The total external surface area of the cone (including the base) is

$A = \pi r l = \pi\sqrt{\dfrac{3}{\pi h}}\sqrt{\dfrac{3}{\pi h} + h^2} = \pi\sqrt{\dfrac{9}{\pi^2 h^2} + \dfrac{3h}{\pi}} = \sqrt{\dfrac{9}{h^2} + 3\pi h}$.

(d) Compute $\dfrac{dA}{dh} = \dfrac{-18/h^3 + 3\pi}{2\sqrt{9/h^2 + 3\pi h}} = \dfrac{3}{2} \cdot \dfrac{\pi - 6/h^3}{\sqrt{9/h^2 + 3\pi h}} = \dfrac{3}{2} \cdot \dfrac{\pi - 6/h^3}{A}$.

So $\dfrac{dA}{dh} = 0 \iff h = \left(\dfrac{6}{\pi}\right)^{1/3} \approx 1.24$ m.

(e) Graph $A = \sqrt{9/h^2 + 3\pi h}$ on your graphing calculator. (This is simply the expression we found in part (c).) (Ignore the region where $h < 0$ since the height of the cone cannot be negative.) Zoom in to verify that the stationary point we found in part (d) is indeed a minimum turning point.

Answer to Exercise 32. The function $f$ defined by $f(x) = 2x$ has indefinite integrals $i$ defined by $i(x) = x^2 + 2$, and $k$ defined by $k(x) = x^2$. The function $g$ defined by $g(x) = 3x^2$ has indefinite integrals $h$ defined by $h(x) = x^3$, and $j$ defined by $j(x) = x^3 + 1$.

Answer to Exercise 33.

1. $\dfrac{d}{dx}(kx + C) = k$
$\implies \int k \, dx = kx + C$. ✓

2. $\dfrac{d}{dx}\left(\dfrac{x^{k+1}}{k + 1} + C\right) = x^k \implies \int x^k \, dx = \dfrac{x^{k+1}}{k + 1} + C$. ✓

4. $\dfrac{d}{dx}(e^x + C) = e^x \implies \int e^x \, dx = e^x + C$. ✓

5. $\dfrac{d}{dx}\left[\dfrac{(ax + b)^{k+1}}{a(k + 1)} + C\right] = (ax + b)^k \implies \int (ax + b)^k \, dx = \dfrac{(ax + b)^{k+1}}{a(k + 1)} + C$. ✓

6. $\dfrac{d}{dx}\left[\dfrac{1}{a}e^{ax+b} + C\right] = e^{ax+b} \implies \int e^{ax+b} \, dx = \dfrac{1}{a}e^{ax+b} + C$. ✓

7. $\dfrac{d}{dx}[f(x) \pm g(x) + C] = f'(x) \pm g'(x) \implies \int f'(x) \pm g'(x) \, dx = f(x) \pm g(x) + C$. ✓

8. $\dfrac{d}{dx}[kf(x) + C] = kf'(x) \implies \int kf'(x) \, dx = kf(x) + C$. ✓

Answer to Exercise 34. (i) $\int 7x^5 - 8x^4 + 3x^2 + 2 \, dx = 7x^6/6 - 8x^5/5 + x^3 + 2x + C$, where $C$ is the constant of integration.

(ii) $\int e^{5x+2} - (5x + 2)^2 \, dx = \dfrac{e^{5x+2}}{5} - \dfrac{(5x + 2)^3}{3 \cdot 5} + C$, where $C$ is the constant of integration.

(iii) $\int 16/x + 32x^3 \, dx = 16\ln|x| + 8x^4 + C$, where $C$ is the constant of integration.

Answer to Exercise 35. (i) $\int_1^2 y \, dx = \int_1^2 6 \, dx = [6x]_1^2 = 12 - 6 = 6$.

(ii) $\int_1^2 y \, dx = \int_1^2 x^2 + 5x + 10 \, dx = \left[\dfrac{x^3}{3} + \dfrac{5x^2}{2} + 10x\right]_1^2 = \dfrac{8}{3} + 10 + 20 - \left(\dfrac{1}{3} + \dfrac{5}{2} + 10\right) = 19\dfrac{5}{6}$.

(iii) $\int_1^2 y \, dx = \int_1^2 1/x \, dx = [\ln|x|]_1^2 = \ln 2 - \ln 1 = \ln 2$.

Answer to Exercise 36. Our desired area is labelled A below.

Method #1. The entire rectangle $A + B + C + D$ has area $2^{1/3} \times 2 = 2^{4/3}$. The rectangle $B + C$ has area $1 \times 1 = 1$. The region $D$ has area $\int_1^{2^{1/3}} x^3 \, dx = \left[\dfrac{x^4}{4}\right]_1^{2^{1/3}} = \dfrac{2^{4/3} - 1}{4}$. Hence,

$A = (A + B + C + D) - (B + C + D) = 2^{4/3} - \left(1 + \dfrac{2^{4/3} - 1}{4}\right) = \dfrac{3}{4}\left(2^{4/3} - 1\right)$.

Method #2. $y = x^3 \iff x = y^{1/3}$. So $A = \int_{y=1}^{y=2} x \, dy = \int_1^2 y^{1/3} \, dy = \dfrac{3}{4}\left[y^{4/3}\right]_1^2 = \dfrac{3}{4}\left(2^{4/3} - 1\right)$.

[Figure: regions A, B, C, D, with the lines $y = 1$ and $y = 2$.]

Answer to Exercise 37. The desired area is labelled A below. The area $A + B + C + D$ equals $3(\ln 3 - 0.5)$.

The area $C + D$ equals $\int_{\ln 2}^{\ln 3} y \, dx = \int_{\ln 2}^{\ln 3} e^x \, dx = [e^x]_{\ln 2}^{\ln 3} = e^{\ln 3} - e^{\ln 2} = 3 - 2 = 1$.

The area $B$ equals $2(\ln 2 - 0.5)$.

Hence, the desired area is $A = (A + B + C + D) - B - (C + D) = 3(\ln 3 - 0.5) - 2(\ln 2 - 0.5) - 1 = 3\ln 3 - 2\ln 2 - 1.5$.

Answer to Exercise 38.
The two curves intersect at $x = \pm\sqrt{2}/2$ (quadratic formula). So

$A = \int_{-\sqrt{2}/2}^{\sqrt{2}/2} 2 - x^2 - (x^2 + 1) \, dx = \left[x - \dfrac{2x^3}{3}\right]_{-\sqrt{2}/2}^{\sqrt{2}/2} = \left[\dfrac{\sqrt{2}}{2} - \dfrac{2\sqrt{2}}{12}\right] - \left[-\dfrac{\sqrt{2}}{2} + \dfrac{2\sqrt{2}}{12}\right] = \dfrac{2\sqrt{2}}{3}$.

[Figure: the region A between the two curves.]

62 Answers to Exercises in Part III: Probability and Statistics

Answer to Exercise 39. Taking the green path, there are 3 ways. Taking the red path, there are 2 ways. Hence, there are $3 + 2 = 5$ ways to get from the Starting Point to the River.

Answer to Exercise 40. The tree diagram below illustrates.

Case #1. First letter is a D.
Case #1(i). Second letter is a D. Then the last two letters must both be E’s. (1 permutation.)
Case #1(ii). Second letter is an E. Then the last two letters must be either DE or ED. (2 permutations.)

Case #2. First letter is an E.
Case #2(i). Second letter is an E. Then the last two letters must both be D’s. (1 permutation.)
Case #2(ii). Second letter is a D. Then the last two letters must be either DE or ED. (2 permutations.)

Altogether then, there are $1 + 2 + 1 + 2 = 6$ possible permutations of the letters in DEED.

Answer to Exercise 41. $3 \times 5 \times 10 = 150$.

Answer to Exercise 42. We must choose three 4D numbers. Choosing the first 4D number involves four decisions: what to put as the first, second, third and fourth digits, with the condition that no digit is repeated.

_ _ _ _ (digit positions 1, 2, 3, 4)

Thus, by the MP, there are $10 \times 9 \times 8 \times 7 = 5040$ ways to choose the first 4D number.

If we ignored the fact that we already chose the first 4D number, then there’d similarly be 5040 ways to choose the second 4D number (given the condition that this second 4D number does not have any repeated digits). However, there is an additional condition, namely that the second 4D number cannot be the same as the first. Thus, there are $5040 - 1 = 5039$ ways to choose the second 4D number.

By similar reasoning, we see that there are $5040 - 2 = 5038$ ways to choose the third 4D number.
Altogether then, by the MP, there are $5040 \times 5039 \times 5038 = 127{,}947{,}869{,}280$ ways to choose the three 4D numbers.

Answer to Exercise 43. Apply the IEP twice.

1. The food court and hawker centre share 2 types of cuisine (Chinese and Western) in common. And so together, the food court and the hawker centre have $4 + 3 - 2 = 5$ different types of cuisine.

2. Combine together the food court and the hawker centre (call this the “Low-Class Place”). The Low-Class Place has 5 types of cuisine and shares 2 types of cuisine (Chinese and Malay) with the restaurant. And so together, the Low-Class Place and restaurant have $5 + 3 - 2 = 6$ different types of cuisine (namely Chinese, Indonesian, Japanese, Korean, Malay, and Western).

Answer to Exercise 44. $10 - 3 = 7$. (Can you name them?)

Answer to Exercise 45. $6! = 720$, $7! = 5040$, and $8! = 40320$.

Answer to Exercise 46. $7!/(4!\,3!) = 35$.

Answer to Exercise 47. The problem of choosing a president and vice-president from a committee of 11 members is equivalent to the problem of filling 2 spaces with 11 distinct objects. The answer is thus $P(11, 2) = 11!/9! = 11 \times 10 = 110$.

Answer to Exercise 48. Let B and S stand for brother and sister, respectively.

(a) First consider the problem of permuting the seven letters in BBBBSSS, without any two B’s next to each other. There is only 1 possible arrangement, namely BSBSBSB.

There are $4!$ ways to permute the brothers and $3!$ ways to permute the sisters. Hence, there are in total $1 \times 4!\,3! = 144$ possible ways to arrange the siblings in a line, so that no two brothers are next to each other.

(b) First consider the problem of permuting the seven letters in BBBBSSS, without any two S’s next to each other. We’ll use the AP.

1. B in position #1.
   (a) B in position #2. Then the only way to fill the remaining five positions is SBSBS. Total: 1 possible arrangement.
   (b) S in position #2.
Then we must have B in position #3.
      i. B in position #4. Then the only way to fill the remaining three positions is SBS. Total: 1 possible arrangement.
      ii. S in position #4. Then we must have B in position #5. And there are two ways to fill the remaining two positions: either BS or SB. Total: 2 possible arrangements.
2. S in position #1. Then we must have B in position #2.
   (a) B in position #3. Then, like in 1(b), we are left with two B’s and two S’s to fill the remaining four positions. Hence, Total: 3 possible arrangements.
   (b) S in position #3. Then we must have B in position #4. There are three ways to fill the remaining three positions: SBB, BSB, and BBS. Total: 3 possible arrangements.

By the AP, there are $1 + 1 + 2 + 3 + 3 = 10$ possible arrangements.

Again, there are $4!$ ways to permute the brothers and $3!$ ways to permute the sisters. Hence, there are in total $10 \times 4!\,3! = 1440$ possible ways to arrange the siblings in a line, so that no two sisters are next to each other.

Answer to Exercise 49.

$\dbinom{n}{k} = \dfrac{n!}{k!(n - k)!} = \dfrac{n \times (n - 1) \times \cdots \times (n - k + 1) \times (n - k) \times (n - k - 1) \times \cdots \times 1}{k!\,(n - k) \times (n - k - 1) \times \cdots \times 1} = \dfrac{n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1)}{k!}$ (mass cancellation).

Answer to Exercise 50.

$C(4, 2) = \dfrac{4!}{2!(4 - 2)!} = \dfrac{4!}{2!\,2!} = \dfrac{4 \times 3}{2 \times 1} = 6$,

$C(6, 4) = \dfrac{6!}{4!(6 - 4)!} = \dfrac{6!}{4!\,2!} = \dfrac{6 \times 5}{2 \times 1} = 15$,

$C(7, 3) = \dfrac{7!}{3!(7 - 3)!} = \dfrac{7!}{3!\,4!} = \dfrac{7 \times 6 \times 5}{3 \times 2 \times 1} = 35$.

Answer to Exercise 51. $\dbinom{3}{1}\dbinom{7}{2}\dbinom{5}{2} = 630$.

Answer to Exercise 52. (a) $C(1, 0) + C(1, 1) = 1 + 1 = 2 = C(2, 1)$.

(b) $C(4, 2) + C(4, 3) = 6 + 4 = 10 = C(5, 3)$.

(c) $C(17, 2) + C(17, 3) = \dfrac{17!}{2!\,15!} + \dfrac{17!}{3!\,14!} = \dfrac{17 \times 16}{2 \times 1} + \dfrac{17 \times 16 \times 15}{3 \times 2 \times 1} = 17 \times 8 + 17 \times 8 \times 5 = 17 \times 8 \times 6 = \dfrac{18 \times 17 \times 16}{3 \times 2 \times 1} = C(18, 3)$.

Answer to Exercise 53.

$\dbinom{7}{0} = 1$, $\dbinom{7}{1} = 7$, $\dbinom{7}{2} = 21$, $\dbinom{7}{3} = 35$, $\dbinom{7}{4} = 35$, $\dbinom{7}{5} = 21$, $\dbinom{7}{6} = 7$, $\dbinom{7}{7} = 1$.

Answer to Exercise 54.
Expanding, we have

$(1 + x)^3 = (1 + x)(1 + x)(1 + x) = \underbrace{1 \cdot 1 \cdot 1}_{0\ x\text{'s}} + \underbrace{1 \cdot 1 \cdot x + 1 \cdot x \cdot 1 + x \cdot 1 \cdot 1}_{1\ x} + \underbrace{1 \cdot x \cdot x + x \cdot 1 \cdot x + x \cdot x \cdot 1}_{2\ x\text{'s}} + \underbrace{x \cdot x \cdot x}_{3\ x\text{'s}}$.

Consider the 8 terms on the right. There is $C(3, 0) = 1$ way to choose 0 of the x’s. Hence, the coefficient on $x^0$ is $C(3, 0)$; this corresponds to the term $1 \cdot 1 \cdot 1$ above.

There are $C(3, 1) = 3$ ways to choose 1 of the x’s. Hence, the coefficient on $x^1$ is $C(3, 1)$; this corresponds to the terms $1 \cdot 1 \cdot x$, $1 \cdot x \cdot 1$, and $x \cdot 1 \cdot 1$ above.

There are $C(3, 2) = 3$ ways to choose 2 of the x’s. Hence, the coefficient on $x^2$ is $C(3, 2)$; this corresponds to the terms $1 \cdot x \cdot x$, $x \cdot 1 \cdot x$, and $x \cdot x \cdot 1$ above.

There is $C(3, 3) = 1$ way to choose 3 of the x’s. Hence, the coefficient on $x^3$ is $C(3, 3)$; this corresponds to the term $x \cdot x \cdot x$ above.

Altogether then,

$(1 + x)^3 = \dbinom{3}{0}x^0 + \dbinom{3}{1}x^1 + \dbinom{3}{2}x^2 + \dbinom{3}{3}x^3 = 1 + 3x + 3x^2 + x^3$.

Answer to Exercise 55. $2^7 = 128$.

$\dbinom{7}{0} + \dbinom{7}{1} + \dbinom{7}{2} + \cdots + \dbinom{7}{7} = 1 + 7 + 21 + 35 + 35 + 21 + 7 + 1 = 128$.

So indeed, $2^7 = \dbinom{7}{0} + \dbinom{7}{1} + \dbinom{7}{2} + \cdots + \dbinom{7}{7}$.

Answer to Exercise 56.

$(3 + x)^4 = \dbinom{4}{0}3^4x^0 + \dbinom{4}{1}3^3x^1 + \dbinom{4}{2}3^2x^2 + \dbinom{4}{3}3^1x^3 + \dbinom{4}{4}3^0x^4 = 81 + 4 \cdot 27x + 6 \cdot 9x^2 + 4 \cdot 3x^3 + x^4 = 81 + 108x + 54x^2 + 12x^3 + x^4$.

Answer to Exercise 57. (a) There are $\dbinom{4}{2} = 6$ ways of choosing the two Tan sons and $\dbinom{3}{2} = 3$ ways of choosing the two Wong daughters.

Having chosen these sons and daughters, there are only $2! = 2 \times 1$ possible ways of matching them up. This is because for the first chosen Tan son, we have 2 possible choices of brides for him. And then for the second chosen Tan son, there is only 1 possible choice of bride left for him.

Altogether then, there are $\dbinom{4}{2}\dbinom{3}{2} \cdot 2 = 6 \cdot 3 \cdot 2 = 36$ ways of forming the two couples.

(b) There are $\dbinom{6}{5} = 6$ ways of choosing the five Lee sons and $\dbinom{9}{5} = 126$ ways of choosing the five Ho daughters.

Having chosen these sons and daughters, there are $5! = 5 \times 4 \times 3 \times 2 \times 1$ possible ways of matching them up. This is because for the first chosen Lee son, we have 5 possible choices of brides for him. And then for the second chosen Lee son, there are 4 possible choices of brides left for him. Etc.

Altogether then, there are $\dbinom{6}{5}\dbinom{9}{5} \cdot 5! = 6 \cdot 126 \cdot 5! = 90{,}720$ ways of forming the five couples.

Answer to Exercise 58. (a) The 52 possible outcomes are: A♠, K♠, Q♠, . . . , 2♠, A♥, K♥, Q♥, . . . , 2♥, A♦, K♦, Q♦, . . . , 2♦, A♣, K♣, Q♣, . . . , 2♣. $P(A) = 1/4$.

(b) The 4 possible outcomes are HH, HT, TH, and TT. $P(B) = 1/2$.

(c) The 36 possible outcomes are the ordered pairs of dice faces: (1, 1), (1, 2), . . . , (6, 6). There are 4 ways to roll a 9, namely (3, 6), (4, 5), (5, 4), or (6, 3). Hence $P(C) = 4/36$.

Answer to Exercise 59. A and B are not mutually exclusive. A and C are not mutually exclusive. But B and C are mutually exclusive.

Answer to Exercise 60. $B'$ is the event that the student has at least two phones. $C'$ is the event that the student has zero, one, or at least three phones.

Answer to Exercise 61. $P(A) = 0.5$, $P(B) = 3/36$, $P(C) = 0.5$, $P(A \cup B) = P(A) + P(\{11\}) = 0.5 + 1/12 = 7/12$, $P(A \cup C) = 1$, and $P(B \cup C) = P(C) + P(\{12\}) = 0.5 + 1/12 = 7/12$.

Answer to Exercise 62. $P(A \cap B) = P(\{12\}) = 1/12$, $P(A \cap C) = 0$, and $P(B \cap C) = P(\{11\}) = 1/12$.

Answer to Exercise 63. (a) [Figure for part (a).] (b) [Figure for part (b).]

Answer to Exercise 64.
Let $A$ be the event that we rolled at least one even number and $B$ be the event that the sum of the two dice was 8. We have $P(B) = 5/36$ (see Exercise 70). And $A \cap B$ can occur if and only if the two dice were (2, 6), (4, 4), or (6, 2). Hence, $P(A \cap B) = 3/36$.

Altogether then, $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{3/36}{5/36} = \dfrac{3}{5}$.

Answer to Exercise 65. By Fact 7, $A$, $B$ are independent events $\iff P(A \mid B) = P(A)$. Since $P(A \mid B) = P(A \cap B)/P(B)$, rearranging gives $P(B) = P(A \cap B)/P(A) = P(B \mid A)$, as desired.

Answer to Exercise 66. First, note that $P(H_1) = P(T_1) = P(H_2) = 0.5$.

(a) $P(H_1 \cap H_2) = 0.25 = 0.5 \times 0.5 = P(H_1)P(H_2)$, so that indeed $H_1$ and $H_2$ are independent.

(b) $P(H_2 \cap T_1) = 0.25 = 0.5 \times 0.5 = P(H_2)P(T_1)$, so that indeed $H_2$ and $T_1$ are independent.

(c) Observe that $H_1 \cap T_1 = \emptyset$ (it is impossible that “the first coin flip is heads” AND also “the first coin flip is tails”). Hence, $P(H_1 \cap T_1) = P(\emptyset) = 0 \ne 0.25 = 0.5 \times 0.5 = P(H_1)P(T_1)$, so that indeed $H_1$ and $T_1$ are not independent.

Answer to Exercise 67. No, the journalist is incorrectly assuming that the probability of one family member making the NBA is independent of another family member making the NBA. But such an assumption is almost certainly false. The same excellent genes that made Rick Barry a great basketball player probably also helped his three sons. Not to mention that having an NBA player as your father probably helps a lot too.

The two events “family member #1 in NBA” and “family member #2 in NBA” are probably not independent. So we cannot simply multiply probabilities together.

Answer to Exercise 68. The possible observed values of $X$ are 2, 3, 4, . . . , and 12.

Answer to Exercise 69.
The possible observed values of C are 8 (two aces), 7 (one ace and one king), 6 (one ace and one queen, or two kings), 5 (one ace and one jack, or one king and one queen), 4 (one ace, or one king and one jack, or two queens), 3 (one king, or one queen and one jack), 2 (one queen, or two jacks), 1 (one jack), 0 (no ace, king, queen, or jack).

Answer to Exercise 70. (a) For each k, the outcomes s such that \(X(s) = k\), and the probability \(P(X = k)\):

k = 2: (1,1); \(P(X = 2) = 1/36\).
k = 3: (1,2), (2,1); \(P(X = 3) = 2/36\).
k = 4: (1,3), (2,2), (3,1); \(P(X = 4) = 3/36\).
k = 5: (1,4), (2,3), (3,2), (4,1); \(P(X = 5) = 4/36\).
k = 6: (1,5), (2,4), (3,3), (4,2), (5,1); \(P(X = 6) = 5/36\).
k = 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1); \(P(X = 7) = 6/36\).
k = 8: (2,6), (3,5), (4,4), (5,3), (6,2); \(P(X = 8) = 5/36\).
k = 9: (3,6), (4,5), (5,4), (6,3); \(P(X = 9) = 4/36\).
k = 10: (4,6), (5,5), (6,4); \(P(X = 10) = 3/36\).
k = 11: (5,6), (6,5); \(P(X = 11) = 2/36\).
k = 12: (6,6); \(P(X = 12) = 1/36\).

(b) E is the event \(X \ge 10\).

(c) \(P(E) = P(X \ge 10) = P(X = 10) + P(X = 11) + P(X = 12) = \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{6}{36} = \frac{1}{6}\).

Answer to Exercise 71. No. For example, \(P(X = 0, Y = 0) = 0\), but \(P(X = 0)P(Y = 0) = 0.5 \times 0.25 = 0.125\).

Answer to Exercise 72. (a) \(P(X + Y = 2)\) is simply the probability of 2 heads and 0 sixes OR 1 head and 1 six OR 0 heads and 2 sixes. So
\[P(X + Y = 2) = \frac12 \cdot \frac12 \cdot \frac56 \cdot \frac56 + \binom{2}{1}\frac12 \cdot \frac12 \cdot \binom{2}{1}\frac56 \cdot \frac16 + \frac12 \cdot \frac12 \cdot \frac16 \cdot \frac16 = \frac{25}{144} + \frac{20}{144} + \frac{1}{144} = \frac{46}{144}.\]

(b) \(P(X + Y = 3)\) is simply the probability of 2 heads and 1 six OR 1 head and 2 sixes. So
\[P(X + Y = 3) = \frac12 \cdot \frac12 \cdot \binom{2}{1}\frac56 \cdot \frac16 + \binom{2}{1}\frac12 \cdot \frac12 \cdot \frac16 \cdot \frac16 = \frac{10}{144} + \frac{2}{144} = \frac{12}{144}.\]

(c) \(P(X + Y = 4)\) is simply the probability of 2 heads and 2 sixes. So
\[P(X + Y = 4) = \frac12 \cdot \frac12 \cdot \frac16 \cdot \frac16 = \frac{1}{144}.\]

(d)
\[E[X + Y] = P(X + Y = 0) \cdot 0 + P(X + Y = 1) \cdot 1 + P(X + Y = 2) \cdot 2 + P(X + Y = 3) \cdot 3 + P(X + Y = 4) \cdot 4\]
\[= \frac{25}{144} \cdot 0 + \frac{60}{144} \cdot 1 + \frac{46}{144} \cdot 2 + \frac{12}{144} \cdot 3 + \frac{1}{144} \cdot 4 = \frac{60 + 92 + 36 + 4}{144} = \frac{192}{144} = \frac{4}{3}.\]

Answer to Exercise 73(a). The possible observed values of X are 2000, 1000, 490, 250, 60, 0. (Don't forget to include 0.) Similarly, the possible observed values of Y are 3000, 2000, 800, 0.
(b) \(P(X = 2000) = P(X = 1000) = P(X = 490) = \frac{1}{10000}\), \(P(X = 250) = P(X = 60) = \frac{10}{10000}\), \(P(X = 0) = \frac{9977}{10000}\), \(P(Y = 3000) = P(Y = 2000) = P(Y = 800) = \frac{1}{10000}\), \(P(Y = 0) = \frac{9997}{10000}\).

(c)
\[E[X] = 2000P(X = 2000) + 1000P(X = 1000) + \dots + 0 \cdot P(X = 0) = \frac{2000 + 1000 + 490 + 250 \cdot 10 + 60 \cdot 10 + 9977 \cdot 0}{10000} = 0.659,\]
\[E[Y] = \frac{1}{10000} \cdot 3000 + \frac{1}{10000} \cdot 2000 + \frac{1}{10000} \cdot 800 + \frac{9997}{10000} \cdot 0 = 0.3 + 0.2 + 0.08 + 0 = 0.58.\]

(d) For every $1 staked, the "big" game is expected to lose you $0.341 and the "small" game is expected to lose you $0.42. Thus, the "big" game is expected to lose you less money.

Answer to Exercise 74. From our work earlier, we know that \(P(B = 1) = P(B = 2) = P(B = 3) = P(B = 4) = 4/52\) and \(P(B = 0) = 36/52\). Compute
\[E[B] = \frac{1 \cdot 4 + 2 \cdot 4 + 3 \cdot 4 + 4 \cdot 4 + 36 \cdot 0}{52} = \frac{10}{13}, \qquad E[B^2] = \frac{1^2 \cdot 4 + 2^2 \cdot 4 + 3^2 \cdot 4 + 4^2 \cdot 4 + 36 \cdot 0}{52} = \frac{30}{13}.\]
Hence,
\[V[B] = E[B^2] - (E[B])^2 = \frac{30}{13} - \left(\frac{10}{13}\right)^2 = \frac{290}{169}.\]

Answer to Exercise 75.
\[E[Y] = \frac{35}{100} \times 20 \text{ cm} + \frac{65}{100} \times 30 \text{ cm} = 26.5 \text{ cm},\]
\[V[Y] = \frac{35}{100} \times (20 \text{ cm} - 26.5 \text{ cm})^2 + \frac{65}{100} \times (30 \text{ cm} - 26.5 \text{ cm})^2 = 22.75 \text{ cm}^2,\]
\[SD[Y] = \sqrt{V[Y]} \approx 4.77 \text{ cm}.\]

Answer to Exercise 76. (a) \(2\mu\) kg, \(2\sigma^2\) kg². (b) \(2\mu\) kg, \(4\sigma^2\) kg². (c) The mean of the total weight of the two fish is \(2\mu\) kg. However, we do not know the variance, since the weights of the two fish are not independent.

Answer to Exercise 77. Let \(X \sim B(20, 0.01)\) be the number of components in engine #1 that fail. Let \(Y \sim B(35, 0.005)\) be the number of components in engine #2 that fail.

The probability that engine #1 fails is
\[P(X \ge 2) = 1 - P(X \le 1) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{20}{0} 0.01^0 \, 0.99^{20} - \binom{20}{1} 0.01^1 \, 0.99^{19} \approx 0.0169.\]

The probability that engine #2 fails is
\[P(Y \ge 2) = 1 - P(Y \le 1) = 1 - P(Y = 0) - P(Y = 1) = 1 - \binom{35}{0} 0.005^0 \, 0.995^{35} - \binom{35}{1} 0.005^1 \, 0.995^{34} \approx 0.0133.\]

Hence, the probability that both engines fail is \(P(X \ge 2)P(Y \ge 2) \approx 0.00022\).

Answer to Exercise 78. (a) \(F_Y(k) = 0\) if \(k < 3\); \(F_Y(k) = 0.5(k - 3)\) if \(k \in [3, 5]\); \(F_Y(k) = 1\) if \(k > 5\).

(b) \(f_Y(k) = 0.5\) if \(k \in [3, 5]\); \(f_Y(k) = 0\) otherwise.

(c) \(P(3.1 \le Y \le 4.6) = 0.75\) is in blue and \(P(4.8 \le Y \le 4.9) = 0.05\) is in red. [Figure not reproduced.]

Answer to Exercise 79. (a) From Z-tables, \(P(Z \ge 1.8) = 1 - P(Z \le 1.8) = 1 - \Phi(1.8) \approx 1 - 0.9641 = 0.0359\). [Graphing calculator screenshot omitted.]

(b) From Z-tables, \(P(-0.351 < Z < 1.2) = \Phi(1.2) - \Phi(-0.351) = \Phi(1.2) - [1 - \Phi(0.351)] \approx 0.8849 - (1 - 0.6372) = 0.8849 - 0.3628 = 0.5221\). [Graphing calculator screenshot omitted.]

Answer to Exercise 80. If \(\mu = 0\) and \(\sigma^2 = 1\), then
\[f_X(a) = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\left(\frac{a - \mu}{\sigma}\right)^2} = \frac{1}{1 \cdot \sqrt{2\pi}} e^{-0.5\left(\frac{a - 0}{1}\right)^2} = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} = \varphi(a).\]
We've just shown that the PDF of \(X \sim N(\mu, \sigma^2)\), when \(\mu = 0\) and \(\sigma^2 = 1\), is the same as the PDF of the SNRV \(Z \sim N(0, 1)\). Hence, the SNRV is indeed simply a normal random variable with mean \(\mu = 0\) and variance \(\sigma^2 = 1\).

Answer to Exercise 81. First observe that \(\frac{X - \mu}{\sigma} = \frac{X}{\sigma} + \frac{-\mu}{\sigma}\). Now simply use Fact 11, with \(a = \frac{1}{\sigma}\) and \(b = \frac{-\mu}{\sigma}\):
\[\frac{X - \mu}{\sigma} = \frac{X}{\sigma} + \frac{-\mu}{\sigma} \sim N\left(\frac{\mu}{\sigma} + \frac{-\mu}{\sigma}, \frac{1}{\sigma^2}\sigma^2\right) = N(0, 1).\]

Answer to Exercise 82. We are given that \(X \sim N(2.14, 5)\) and \(Y \sim N(-0.33, 2)\).

(a)
\[P(X \ge 1) = P\left(Z \ge \frac{1 - 2.14}{\sqrt{5}}\right) \approx P(Z \ge -0.5098) = P(Z \le 0.5098) = \Phi(0.5098) \approx 0.6949,\]
\[P(Y \ge 1) = P\left(Z \ge \frac{1 - (-0.33)}{\sqrt{2}}\right) \approx P(Z \ge 0.9405) = 1 - P(Z \le 0.9405) = 1 - \Phi(0.9405) \approx 0.1735.\]
[Figures omitted: \(P(X \ge 1)\) and \(P(Y \ge 1)\); \(P(-2 \le X \le -1.5)\) and \(P(-2 \le Y \le -1.5)\).]

(b)
\[P(-2 \le X \le -1.5) = P\left(\frac{(-2) - 2.14}{\sqrt{5}} \le Z \le \frac{(-1.5) - 2.14}{\sqrt{5}}\right) \approx P(-1.8515 \le Z \le -1.6279) = P(1.6279 \le Z \le 1.8515)\]
\[= \Phi(1.8515) - \Phi(1.6279) \approx 0.9679 - 0.9482 = 0.0197,\]
\[P(-2 \le Y \le -1.5) = P\left(\frac{(-2) - (-0.33)}{\sqrt{2}} \le Z \le \frac{(-1.5) - (-0.33)}{\sqrt{2}}\right) \approx P(-1.1809 \le Z \le -0.8273) = P(0.8273 \le Z \le 1.1809)\]
\[= \Phi(1.1809) - \Phi(0.8273) \approx 0.8812 - 0.7959 = 0.0853.\]

Answer to Exercise 83. (a) Let \(W \sim N(25000, 64000000)\) and \(E \sim N(200, 10000)\). Let \(B = 0.002W + 0.3E\) be the total bill in a given month. Then
\[B \sim N(0.002 \times 25000 + 0.3 \times 200, \; 0.002^2 \times 64000000 + 0.3^2 \times 10000) = N(50 + 60, 256 + 900) = N(110, 1156).\]
Thus, \(P(B > 100) \approx 0.6157\) (calculator).

(b) Let \(B_1 \sim N(110, 1156)\), \(B_2 \sim N(110, 1156)\), …, \(B_{12} \sim N(110, 1156)\) be the bills in each of the 12 months. Then the total bill in a year is \(T = B_1 + B_2 + \dots + B_{12} \sim N(12 \times 110, 12 \times 1156) = N(1320, 13872)\). Thus, \(P(T > 1000) \approx 0.9967\) (calculator).

(c) The total bill in a given month is \(B = 0.002W + xE\) and \(B \sim N(50 + 200x, 256 + 10000x^2)\). Our goal is to find the value of x for which \(P(B > 100) = 0.1\). We have
\[P(B > 100) = P\left(Z > \frac{100 - (50 + 200x)}{\sqrt{256 + 10000x^2}}\right) = P\left(Z > \frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 1 - \Phi\left(\frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 0.1.\]
From the Z-tables,
\[\Phi\left(\frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 0.9 \iff \frac{50 - 200x}{\sqrt{256 + 10000x^2}} \approx 1.2815.\]
One can rearrange, do the algebra (square both sides), and use the quadratic formula, discarding the root for which \(50 - 200x < 0\). Alternatively, one can simply use one's graphing calculator to find that \(x \approx 0.121\). (Check: at \(x = 0.121\), the left-hand side is \(25.8/\sqrt{402.41} \approx 1.286\).)

We conclude that x can be at most approximately 0.121 if the probability that the total utility bill in a given month exceeds $100 is to be 0.1 or less.

Answer to Exercise 84. From our earlier work, we know that each die roll has mean 3.5 and variance 35/12.

The CLT says that since \(n = 30 \ge 30\) is large enough and the distribution is "nice enough" (we are assuming this), X can be approximated by the normal random variable \(Y \sim N(30 \times 3.5, 30 \times 35/12) = N(105, 1050/12)\). Thus, using also the continuity correction, we have \(P(100 \le X \le 110) \approx P(99.5 \le Y \le 110.5) \approx 0.4435\) (calculator).

Answer to Exercise 85. Let X be the random variable that is the sum of the weights of the 5,000 Coco-Pops. The CLT says that since \(n = 5000 \ge 30\) is large enough and the distribution is "nice enough" (we are assuming this), X can be approximated by the normal random variable \(Y \sim N(5000 \times 0.1, 5000 \times 0.004) = N(500, 20)\). Thus, \(P(X \le 499) \approx P(Y \le 499) \approx 0.4115\) (calculator).

Answer to Exercise 86.
\[\bar{x} = \frac{3 + 14 + 2 + 8 + 8 + 6 + 0}{7} = \frac{41}{7} \quad \text{and} \quad s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{9 + 196 + 4 + 64 + 64 + 36 + 0 - 41^2/7}{6} = \frac{155}{7}.\]

Answer to Exercise 87. (a) The sample mean \(\bar{x}\) and variance \(s^2\) are
\[\bar{x} = \frac{\sum_{i=1}^n x_i}{n} = \frac{1885}{10} = 188.5, \qquad s^2 = \frac{\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2/n}{n - 1} = \frac{378{,}265 - 1885^2/10}{9} \approx 2550.\]

(b) The sample mean \(\bar{x}\) and variance \(s^2\) are
\[\bar{x} = \frac{\sum_{i=1}^n x_i}{n} = \frac{\sum_{i=1}^n (x_i - 50 + 50)}{n} = \frac{\sum_{i=1}^n (x_i - 50) + \sum_{i=1}^n 50}{n} = \frac{1885 + 50n}{n} = \frac{1885}{n} + 50 = 238.5,\]
\[s^2 = \frac{\sum_{i=1}^n (x_i - 50)^2 - \left[\sum_{i=1}^n (x_i - 50)\right]^2/n}{n - 1} = \frac{378{,}265 - 1885^2/10}{9} \approx 2550.\]

Answer to Exercise 88. (a) Assume that the weights of the five Singaporeans sampled are independently and identically distributed. Then unbiased estimates for the population mean \(\mu\) and variance \(\sigma^2\) of the weights of Singaporeans are, respectively, the observed sample mean \(\bar{x}\) and observed sample variance \(s^2\):
\[\bar{x} = \frac{\sum x_i}{n} = \frac{32 + 88 + 67 + 75 + 56}{5} = 63.6, \qquad s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{32^2 + 88^2 + 67^2 + 75^2 + 56^2 - 5 \times 63.6^2}{4} = 448.3.\]

(b) We don't know! And unless we literally gather and weigh every single Singaporean, we will never know what exactly the average weight of a Singaporean is.
All we've found in part (a) is an estimate (63.6 kg) for the average weight of a Singaporean. We know that on average, the estimator we use "gets it right". However, it could well be that we're unlucky (and got 5 unusually heavy or unusually light persons) and the estimate of 63.6 kg is thus way off.

Answer to Exercise 89.
\[E[\bar{X}] = E\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{E[X_1 + X_2 + \dots + X_n]}{n} = \frac{E[X_1] + E[X_2] + \dots + E[X_n]}{n} = \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu.\]
We have just shown that \(E[\bar{X}] = \mu\). In other words, we've just shown that \(\bar{X}\) is an unbiased estimator for \(\mu\).

Answer to Exercise 90. (a) The observed random sample is \((x_1, x_2, \dots, x_{10}) = (1, 1, 1, 1, 1, 1, 1, 0, 0, 0)\). The observed sample mean and observed sample variance are
\[\bar{x} = \frac{x_1 + x_2 + \dots + x_{10}}{n} = 0.7, \qquad s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{10} - \bar{x})^2}{n - 1} = \frac{7 \cdot 0.3^2 + 3 \cdot 0.7^2}{9} = 0.2\dot{3} \approx 0.233.\]

(b) Yes, the observed sample mean \(\bar{x} = 0.7\) is an unbiased estimate for the true population mean \(\mu\) (i.e. the true proportion of coin flips that are heads). And yes, the observed sample variance \(s^2 \approx 0.233\) is an unbiased estimate for the true population variance \(\sigma^2\).

(c) No, this is merely one observed random sample, from which we generated a single estimate ("guess"), namely \(\bar{x} = 0.7\), of the true population mean \(\mu\). All we know is that the sample mean \(\bar{X}\) is an unbiased estimator for the true population mean \(\mu\). That is, the average estimate generated by \(\bar{X}\) will equal \(\mu\). However, any particular estimate \(\bar{x}\) may or may not be equal to \(\mu\). Indeed, if we're unlucky, our particular estimate may be very far from the true \(\mu\).

Answer to Exercise 91.
\[V[\bar{X}] = V\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2} V[X_1 + X_2 + \dots + X_n] = \frac{1}{n^2}\left(V[X_1] + V[X_2] + \dots + V[X_n]\right) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.\]

Answer to Exercise 92. (a) The population mean \(\mu\) is the number defined by \(\mu = \sum_{i=1}^{k} x_i / k\). It is the average across all population values.

(b) The population variance \(\sigma^2\) is the number defined by \(\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 / k\). It measures the dispersion across the population values.

(c) The sample mean \(\bar{X}\) is a random variable defined by \(\bar{X} = \sum_{i=1}^{n} X_i / n\). It is the average of all values in a random sample.

(d) The sample variance \(S^2\) is a random variable defined by \(S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)\). It measures the dispersion across the values in a random sample.

(e) The mean of the sample mean, also called the expected value of the sample mean, is the number \(E[\bar{X}]\). The interpretation is that if we have infinitely many observed random samples of size n and calculate the observed sample mean for each, then \(E[\bar{X}]\) is equal to the average across the observed sample means. It can be shown that \(E[\bar{X}] = \mu\) and hence that the sample mean \(\bar{X}\) is an unbiased estimator for the population mean \(\mu\).

(f) The variance of the sample mean is the number \(V[\bar{X}]\). The interpretation is that if we have infinitely many observed random samples of size n and calculate the observed sample mean for each, then \(V[\bar{X}]\) measures the dispersion across the observed sample means.

(g) The mean of the sample variance, also called the expected value of the sample variance, is the number \(E[S^2]\). The interpretation is that if we have infinitely many observed random samples of size n and calculate the observed sample variance for each, then \(E[S^2]\) is equal to the average across the observed sample variances. It can be shown that \(E[S^2] = \sigma^2\) and hence that the sample variance \(S^2\) is an unbiased estimator for the population variance \(\sigma^2\).

(h) Given an observed random sample, e.g. \((x_1, x_2, x_3) = (1, 1, 0)\), we can calculate the corresponding observed sample mean as
\[\bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{1 + 1 + 0}{3} = \frac{2}{3}.\]
The observed sample mean is the average of all values in an observed random sample.

(i) Given an observed random sample, e.g.
\((x_1, x_2, x_3) = (1, 1, 0)\), we can calculate the corresponding observed sample variance as
\[s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{3 - 1} = \frac{1/9 + 1/9 + 4/9}{2} = \frac{1}{3}.\]
The observed sample variance measures the dispersion across the values in an observed random sample.

Answer to Exercise 93. Let \(\mu\) be the probability that a coin-flip is heads. The null and alternative hypotheses are \(H_0: \mu = 0.5\) and \(H_A: \mu > 0.5\).

Our random sample is 20 coin-flips: \((X_1, X_2, \dots, X_{20})\), where \(X_i\) takes on the value 1 if the ith coin-flip is heads and 0 otherwise. Our test statistic is the number of heads: \(T = X_1 + X_2 + \dots + X_{20}\).

In our observed random sample \((x_1, x_2, \dots, x_{20})\), there are 17 heads. So the observed test statistic is \(t = 17\).

Assuming \(H_0\) were true, we'd have \(T \sim B(20, 0.5)\). Thus, the p-value is
\[P(T \ge 17 \mid H_0) = P(T = 17 \mid H_0) + P(T = 18 \mid H_0) + P(T = 19 \mid H_0) + P(T = 20 \mid H_0)\]
\[= \binom{20}{17} 0.5^{17} 0.5^{3} + \binom{20}{18} 0.5^{18} 0.5^{2} + \binom{20}{19} 0.5^{19} 0.5^{1} + \binom{20}{20} 0.5^{20} 0.5^{0} \approx 0.0013.\]
Since \(p \approx 0.0013 < \alpha = 0.05\), we reject \(H_0\) at the 5% significance level.

Answer to Exercise 94. Let \(\mu\) be the true long-run proportion of coin-flips that are heads. The null and alternative hypotheses are \(H_0: \mu = 0.5\) and \(H_A: \mu \neq 0.5\).

Our random sample is 20 coin-flips: \((X_1, X_2, \dots, X_{20})\), where \(X_i\) takes on the value 1 if the ith coin-flip is heads and 0 otherwise. Our test statistic is the number of heads: \(T = X_1 + X_2 + \dots + X_{20}\).

In our observed random sample \((x_1, x_2, \dots, x_{20})\), there are 17 heads. So the observed test statistic is \(t = 17\).

Assuming \(H_0\) were true, we'd have \(T \sim B(20, 0.5)\). Thus, the p-value is
\[P(T \ge 17 \text{ or } T \le 3 \mid H_0) = P(T = 0 \mid H_0) + \dots + P(T = 3 \mid H_0) + P(T = 17 \mid H_0) + \dots + P(T = 20 \mid H_0)\]
\[= \binom{20}{0} 0.5^{0} 0.5^{20} + \binom{20}{1} 0.5^{1} 0.5^{19} + \dots + \binom{20}{17} 0.5^{17} 0.5^{3} + \dots + \binom{20}{20} 0.5^{20} 0.5^{0} \approx 0.0026.\]
Since \(p \approx 0.0026 < \alpha = 0.05\), we reject \(H_0\) at the 5% significance level.

Answer to Exercise 95. Let \(\mu\) be the probability that a coin-flip is heads.

(a) The competing hypotheses are \(H_0: \mu = 0.5\), \(H_A: \mu > 0.5\). The test statistic T is the number of heads (out of the 20 coin-flips).

For \(t = 14\), the corresponding p-value is
\[P(T \ge 14 \mid H_0) = P(T = 14 \mid H_0) + P(T = 15 \mid H_0) + \dots + P(T = 20 \mid H_0) = \binom{20}{14} 0.5^{14} 0.5^{6} + \binom{20}{15} 0.5^{15} 0.5^{5} + \dots + \binom{20}{20} 0.5^{20} 0.5^{0} \approx 0.05766.\]
For \(t = 15\), the corresponding p-value is
\[P(T \ge 15 \mid H_0) = P(T = 15 \mid H_0) + P(T = 16 \mid H_0) + \dots + P(T = 20 \mid H_0) = \binom{20}{15} 0.5^{15} 0.5^{5} + \binom{20}{16} 0.5^{16} 0.5^{4} + \dots + \binom{20}{20} 0.5^{20} 0.5^{0} \approx 0.02069.\]
Thus, the critical value is 15 (this is the value of t at which we are just able to reject \(H_0\) at the \(\alpha = 0.05\) significance level). And the critical region is \(\{15, 16, \dots, 20\}\) (this is the set of values of t at which we'd be able to reject \(H_0\) at the \(\alpha = 0.05\) significance level).

(b) The competing hypotheses are \(H_0: \mu = 0.5\), \(H_A: \mu \neq 0.5\). The test statistic T is the number of heads (out of the 20 coin-flips).

For \(t = 14\), the corresponding p-value is
\[P(T \ge 14 \text{ or } T \le 6 \mid H_0) = 1 - P(7 \le T \le 13 \mid H_0) = 1 - \left[\binom{20}{7} 0.5^{7} 0.5^{13} + \binom{20}{8} 0.5^{8} 0.5^{12} + \dots + \binom{20}{13} 0.5^{13} 0.5^{7}\right] \approx 0.1153.\]
For \(t = 15\), the corresponding p-value is
\[P(T \ge 15 \text{ or } T \le 5 \mid H_0) = 1 - P(6 \le T \le 14 \mid H_0) = 1 - \left[\binom{20}{6} 0.5^{6} 0.5^{14} + \binom{20}{7} 0.5^{7} 0.5^{13} + \dots + \binom{20}{14} 0.5^{14} 0.5^{6}\right] \approx 0.0414.\]
Thus, in the upper tail the critical value is 15 (by symmetry, it is 5 in the lower tail), and the critical region is \(\{0, 1, \dots, 5\} \cup \{15, 16, \dots, 20\}\).

Answer to Exercise 96.
The competing hypotheses are: \(H_0: \mu = 34\), \(H_A: \mu \neq 34\). The observed sample mean is
\[\bar{x} = \frac{35 + 35 + 31 + 32 + 33 + 34 + 31 + 34 + 35 + 34}{10} = 33.4.\]
The corresponding p-value is
\[p = P(\bar{X} \le 33.4 \text{ or } \bar{X} \ge 34.6 \mid H_0) = P(\bar{X} \le 33.4 \mid H_0) + P(\bar{X} \ge 34.6 \mid H_0) = P\left(Z \le \frac{33.4 - 34}{\sqrt{9/10}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{9/10}}\right) \approx 0.5271.\]
The large p-value does not cast doubt on or provide evidence against \(H_0\). We fail to reject \(H_0\) at the \(\alpha = 0.05\) significance level.

Answer to Exercise 97. The competing hypotheses are: \(H_0: \mu = 34\), \(H_A: \mu \neq 34\). The observed sample mean is \(\bar{x} = 33.4\). The corresponding p-value is
\[p = P(\bar{X} \le 33.4 \text{ or } \bar{X} \ge 34.6 \mid H_0) = P(\bar{X} \le 33.4 \mid H_0) + P(\bar{X} \ge 34.6 \mid H_0) \overset{\text{CLT}}{\approx} P\left(Z \le \frac{33.4 - 34}{\sqrt{9/100}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{9/100}}\right) \approx 0.04550.\]
The small p-value casts doubt on or provides evidence against \(H_0\). We reject \(H_0\) at the \(\alpha = 0.05\) significance level.

Answer to Exercise 98. The competing hypotheses are: \(H_0: \mu = 34\), \(H_A: \mu \neq 34\). The observed sample mean is \(\bar{x} = 33.4\). And the observed sample variance is \(s^2 = 11.2\). The corresponding p-value is
\[p = P(\bar{X} \le 33.4 \text{ or } \bar{X} \ge 34.6 \mid H_0) = P(\bar{X} \le 33.4 \mid H_0) + P(\bar{X} \ge 34.6 \mid H_0) \overset{\text{CLT}}{\approx} P\left(Z \le \frac{33.4 - 34}{\sqrt{11.2/100}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{11.2/100}}\right) \approx 0.07300.\]
The fairly small p-value casts some doubt on or provides some evidence against \(H_0\). But we fail to reject \(H_0\) at the \(\alpha = 0.05\) significance level.

Answer to Exercise 99. The observed sample mean is \(\bar{x} = 68\) and the observed sample variance (use Fact 13(a)) is
\[s^2 = \frac{\sum_{i=1}^n x_i^2 - \left[\sum_{i=1}^n x_i\right]^2/n}{n - 1} = \frac{50 \times 5000 - (68 \times 50)^2/50}{49} \approx 383.7.\]
Let \(\mu\) be the true average weight of a Singaporean. The competing hypotheses are \(H_0: \mu = 75\) and \(H_A: \mu < 75\). (This is a one-tailed test, because your friend's claim is that the average American is heavier than the average Singaporean. If the claim were instead that the average American's weight is different from the average Singaporean's, then we'd have a two-tailed test.)
Since the sample size \(n = 50\) is "large enough", we can appeal to the CLT. The p-value is
\[p = P(\bar{X} \le 68 \mid H_0) \overset{\text{CLT}}{\approx} P\left(Z \le \frac{68 - 75}{\sqrt{383.7/50}}\right) \approx 0.0058.\]
The small p-value casts doubt on or provides evidence against \(H_0\). We can reject \(H_0\) at any conventional significance level (\(\alpha = 0.1\), \(\alpha = 0.05\), or \(\alpha = 0.01\)).

Answer to Exercise 100. [Scatter diagram of q against p ($) omitted.]

Answer to Exercise 101. Compute \(\bar{p} = (8 + 9 + 4 + 10 + 8)/5 = 7.8\) and \(\bar{q} = (300 + 250 + 1000 + 400 + 400)/5 = 470\). Also,
\[\sum_{i=1}^n (p_i - \bar{p})(q_i - \bar{q}) = (8 - 7.8)(300 - 470) + (9 - 7.8)(250 - 470) + \dots + (8 - 7.8)(400 - 470) = -2480,\]
\[\sqrt{\sum_{i=1}^n (p_i - \bar{p})^2} = \sqrt{(8 - 7.8)^2 + (9 - 7.8)^2 + (4 - 7.8)^2 + (10 - 7.8)^2 + (8 - 7.8)^2} = \sqrt{20.8} \approx 4.56070170,\]
\[\sqrt{\sum_{i=1}^n (q_i - \bar{q})^2} = \sqrt{(300 - 470)^2 + (250 - 470)^2 + \dots + (400 - 470)^2} = \sqrt{368000} \approx 606.63003552.\]
Thus,
\[r = \frac{\sum_{i=1}^n (p_i - \bar{p})(q_i - \bar{q})}{\sqrt{\sum_{i=1}^n (p_i - \bar{p})^2}\sqrt{\sum_{i=1}^n (q_i - \bar{q})^2}} \approx \frac{-2480}{4.56070170 \times 606.63003552} \approx -0.8964.\]

Answer to Exercise 102. (a) We already computed (in the previous exercise) that \(\bar{p} = 7.8\), \(\bar{q} = 470\), \(\sum_{i=1}^n (p_i - \bar{p})(q_i - \bar{q}) = -2480\) and \(\sum_{i=1}^n (p_i - \bar{p})^2 = 20.8\). So,
\[\hat{b} = \frac{\sum_{i=1}^n (p_i - \bar{p})(q_i - \bar{q})}{\sum_{i=1}^n (p_i - \bar{p})^2} = \frac{-2480}{20.8} \approx -119.2.\]
Thus, the regression line of q on p is \(q - \bar{q} = \hat{b}(p - \bar{p})\) or \(q - 470 = -119.2(p - 7.8)\) or \(q = 1400 - 119.2p\).

(b) The fitted values and residuals (rounded to the nearest integer) are:

\(i\): 1, 2, 3, 4, 5
\(p_i\) ($): 8, 9, 4, 10, 8
\(q_i\): 300, 250, 1000, 400, 400
\(\hat{q}_i\): 446, 327, 923, 208, 446
\(\hat{u}_i = q_i - \hat{q}_i\): −146, −77, 77, 192, −46

(c) [Scatter diagram of q against p ($) with the regression line omitted.]

(d) The SSR is \(\sum_{i=1}^{5} \hat{u}_i^2 \approx (-146)^2 + (-77)^2 + 77^2 + 192^2 + (-46)^2 \approx 72308\), where the final figure uses the unrounded residuals (the rounded residuals shown give 72154).

Answer to Exercise 103. [Calculator screenshots after Steps 1 and 2 omitted.]
[Calculator screenshots after Steps 3 to 12 omitted.]

The TI84 tells us that \(r = -0.8963881445\) and the regression line is \(y = ax + b = -119.2307692x + 1400\). This is indeed consistent with the answers from the previous exercises.

Answer to Exercise 104. In the previous exercises, we already calculated that the OLS line of best fit is \(q = 1400 - 119.2p\). Thus,

(a) By interpolation, a barber who charged $7 per haircut would sell \(1400 - 119.2 \times 7 \approx 566\) haircuts.

(b) By extrapolation, a barber who charged $200 per haircut would sell \(1400 - 119.2 \times 200 = -22440\) haircuts.

The second prediction is plainly absurd and thus obviously less reliable than the first.

63 Answers to Exercises in Part IV (2006-2015 A-Level Exams)

63.1 Answers for Ch. 58: Pure Mathematics

Answer to Exercise 105 (8864 N2015/I/1). The given expression is a ∩-shaped quadratic. It is thus always negative if and only if its discriminant is negative. But this is impossible, because its discriminant is \(D = (k - 4)^2 - 4(-2)(2k) = k^2 + 8k + 16 = (k + 4)^2\), which is never negative.

Answer to Exercise 106 (8864 N2015/I/2). (i) \(\frac{d}{dx} \frac{3}{(2x - 1)^4} = -12(2x - 1)^{-5}(2) = \frac{-24}{(2x - 1)^5}\).

(ii)
\[\int_{0.5}^{1} \left(x + \frac{2}{x}\right)^2 dx = \int_{0.5}^{1} x^2 + 4 + \frac{4}{x^2} \, dx = \left[\frac{x^3}{3} + 4x - \frac{4}{x}\right]_{0.5}^{1} = \left(\frac{1}{3} + 4 - 4\right) - \left(\frac{1}{24} + 2 - 8\right) = 6\tfrac{7}{24}.\]

Answer to Exercise 107 (8864 N2015/I/3). (i) \(dy/dx = 12 - 16e^{-2x} = 0 \iff x = -0.5 \ln 0.75\).

(ii) \(\int_0^a 12x + 8e^{-2x} \, dx = \left[6x^2 - 4e^{-2x}\right]_0^a = (6a^2 - 4e^{-2a}) - (6 \cdot 0^2 - 4e^{-2 \cdot 0}) = 6a^2 - 4e^{-2a} + 4\).

Answer to Exercise 108 (8864 N2015/I/4). (i) The perimeter of PQRSTU is \(3x + 3(y - 2x) = 3(y - x) = 30\), so \(y = 10 + x\).

△DEF has area \(0.5y\sqrt{y^2 - (0.5y)^2} = \sqrt{3}y^2/4 = \sqrt{3}(10 + x)^2/4\).

Each of the three small cut-out triangles has area \(\sqrt{3}x^2/4\). So
\[A = \frac{\sqrt{3}}{4}(10 + x)^2 - 3 \cdot \frac{\sqrt{3}}{4}x^2 = \frac{\sqrt{3}}{4}(100 + x^2 + 20x - 3x^2) = \frac{\sqrt{3}}{4}(100 + 20x - 2x^2) = \frac{\sqrt{3}}{2}(50 + 10x - x^2).\]

(ii) \(\frac{dA}{dx} = \frac{\sqrt{3}}{2}(10 - 2x) = 0 \iff x = 5 \implies A_{\max} = \frac{\sqrt{3}}{2} \cdot 75\).

Answer to Exercise 109 (8864 N2015/I/5). (i) \(\frac{dy}{dx} = 0.5^x \ln 0.5 - \frac{1}{x + 1}\).

(ii) \(\left.\frac{dy}{dx}\right|_{x = 0.5} = 0.5^{0.5} \ln 0.5 - \frac{1}{0.5 + 1} = -\sqrt{0.5} \ln 2 - \frac{2}{3} \approx -1.157\).

(iii) The point P is \((0.5, \sqrt{0.5} - \ln 1.5)\). The normal to C at P has equation
\[y - (\sqrt{0.5} - \ln 1.5) = \frac{-1}{-\sqrt{0.5} \ln 2 - \frac{2}{3}}(x - 0.5) = \frac{1}{\sqrt{0.5} \ln 2 + \frac{2}{3}}(x - 0.5)\]
or \(y = 0.8644568499x - 0.13058675187\).

Its y- and x-intercepts are \(y_0 = 0.8644568499(0) - 0.13058675187 \approx -0.13058675187\) and \(x_0 = (0 + 0.13058675187)/0.8644568499 \approx 0.15106219805\). The length of AB is thus \(\sqrt{x_0^2 + y_0^2} \approx \sqrt{0.0399} \approx 0.200\).

Answer to Exercise 110 (8864 N2014/I/1).
\[\int_1^6 \frac{1}{\sqrt{1 + 4x}} \, dx = \left[\frac{2}{4}\sqrt{1 + 4x}\right]_1^6 = \frac{1}{2}\left(\sqrt{25} - \sqrt{5}\right) = 0.5\left(5 - \sqrt{5}\right).\]

Answer to Exercise 111 (8864 N2014/I/2). (i) \(2x(x^2 + 4)^{-1}\).

(ii) \(2x(x^2 + 4)^{-1} = k \iff 2x = k(x^2 + 4) \iff kx^2 - 2x + 4k = 0\).

(iii) This quadratic equation has equal roots if and only if its discriminant is zero. \(D = (-2)^2 - 4(k)(4k) = 4 - 16k^2 = 0 \iff k = \pm 0.5\).

Answer to Exercise 112 (8864 N2014/I/3). (i) [Figure not reproduced.]

(ii) \(dy/dx = 2e^{1 - 2x}\). The point is \((1, 1 - e^{-1})\). \(dy/dx|_{x = 1} = 2e^{-1}\). So the equation is \(y - (1 - e^{-1}) = 2e^{-1}(x - 1)\) or \(y = 2e^{-1}x + 1 - 3e^{-1}\).

Answer to Exercise 113 (8864 N2014/I/4). (i) By the Pythagorean Theorem, \((y - x)^2 + (2x)^2 = 4(65)\) or \(5x^2 + y^2 - 2xy = 260\).

(ii) The perimeter is \(2(3x + y) = 6x + 2y = 60\), so \(y = 30 - 3x\). Plug this into the equation from (i) to get
\[5x^2 + (30 - 3x)^2 - 2x(30 - 3x) = 260 \iff 5x^2 + 900 + 9x^2 - 180x - 60x + 6x^2 = 260 \iff 20x^2 - 240x + 640 = 0\]
\[\iff x^2 - 12x + 32 = (x - 8)(x - 4) = 0 \iff x = 4, 8.\]
Correspondingly, \(y = 18, 6\). We reject the latter because \(y > x\). Thus, \((x, y) = (4, 18)\).

Answer to Exercise 114 (8864 N2014/I/5). (i) \(dy/dx = 3x^2 + 2kx + 7 = 0\). If A is a stationary point, then \(3(1)^2 + 2k(1) + 7 = 0 \iff k = -5\).
And \(2 = 1^3 - 5 \cdot 1^2 + 7 \cdot 1 + c \iff c = -1\).

(ii) \(3x^2 - 10x + 7 = 0 \iff x = \frac{10 \pm \sqrt{(-10)^2 - 4(3)(7)}}{2(3)} = \frac{5 \pm \sqrt{25 - 21}}{3} = 1, \frac{7}{3}\).

\(y = (7/3)^3 - 5(7/3)^2 + 7(7/3) - 1 = 343/27 - 245/9 + 49/3 - 1 = 22/27\). So B is \((7/3, 22/27)\).

(iii) [Figure not reproduced.]

(iv)
\[\int_1^2 x^3 - 5x^2 + 7x - 1 \, dx = \left[\frac{x^4}{4} - \frac{5x^3}{3} + \frac{7x^2}{2} - x\right]_1^2 = \left(4 - \frac{40}{3} + 14 - 2\right) - \left(\frac{1}{4} - \frac{5}{3} + \frac{7}{2} - 1\right) = \frac{19}{12}.\]

Answer to Exercise 115 (8864 N2013/I/1). A quadratic equation has no real roots if and only if its discriminant is negative. In this case, the discriminant is \(D = [-(k - 2)]^2 - 4(1)(2k + 1) = k^2 - 4k + 4 - 8k - 4 = k^2 - 12k = k(k - 12)\). \(D < 0\) if and only if \(k \in (0, 12)\).

Answer to Exercise 116 (8864 N2013/I/2). (i) \(4x/(1 + 2x^2)\).

(ii) \(\int_{-1}^{0} \frac{1}{(1 - 3x)^4} \, dx = \left[\frac{1}{9}(1 - 3x)^{-3}\right]_{-1}^{0} = \frac{1}{9} - \frac{1}{9} \cdot 4^{-3} = \frac{7}{64}\).

Answer to Exercise 117 (8864 N2013/I/3). (i) The perimeter is \(4x + 2y + 3x + 5x = 12x + 2y = 20\), so \(y = 10 - 6x\). The area is \(S = 4xy + 6x^2 = 4x(10 - 6x) + 6x^2 = 40x - 18x^2\).

(ii) \(dS/dx = 40 - 36x = 0 \iff x = 10/9\). So \(S_{\max} = 40(10/9) - 18(10/9)^2 = 200/9\).

Answer to Exercise 118 (8864 N2013/I/4). (i) \(dy/dx = 3x^2 - 2ax + 3\). \(dy/dx|_{x = 1} = 3 - 2a + 3 = 6 - 2a\). So the gradient of the normal is \(1/(2a - 6)\).

(ii) The normal passes through the point \(P(1, 10 - a)\) and has equation \(y - 3 = (x + 5)/(2a - 6)\). Plugging the point into the equation,
\[10 - a - 3 = \frac{1 + 5}{2a - 6} \iff 2(7 - a)(a - 3) = 6 \iff -a^2 + 10a - 21 = 3 \iff a^2 - 10a + 24 = 0.\]
\(a^2 - 10a + 24 = (a - 6)(a - 4) = 0 \iff a = 4, 6\).

(iii) \(y - 3 = (y + 5)/(2 \cdot 4 - 6) \iff 2(y - 3) = (y + 5) \iff y = 11\). It is \((11, 11)\).

Answer to Exercise 119 (8864 N2013/I/5). (i) \(2 - 2x = \ln 2 - x \iff x = 2 - \ln 2\).

(ii) \(dy/dx = -2e^{2 - 2x} + 2e^{-x} = 0 \iff e^{x} = e^{2x - 2} \iff e^2 = e^x \iff x = 2\). And \(y = e^{2 - 2(2)} - 2e^{-2} = e^{-2} - 2e^{-2} = -e^{-2}\). So the stationary point is \((2, -e^{-2})\).

(iii) \(e^{2 - 2x} - 2e^{-x} = 0 \iff e^{2 - x} = 2 \iff x = 2 - \ln 2\).

(iv) Exact: \(\int_0^1 e^{2 - 2x} - 2e^{-x} \, dx = \left[-0.5e^{2 - 2x} + 2e^{-x}\right]_0^1 = (-0.5e^0 + 2e^{-1}) - (-0.5e^2 + 2e^0) = 2e^{-1} - 2.5 + 0.5e^2 \approx 1.930\).
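The exact value in Exercise 119(iv) can also be cross-checked numerically. The sketch below is not part of the original answer: it applies composite Simpson's rule (a helper written here purely for illustration, with an arbitrarily chosen number of subintervals) to the same integrand and compares the result with the closed form \(2e^{-1} - 2.5 + 0.5e^2\).

```python
import math


def integrand(x):
    # Integrand from Exercise 119(iv): e^(2 - 2x) - 2e^(-x)
    return math.exp(2 - 2 * x) - 2 * math.exp(-x)


def simpson(g, a, b, n=1000):
    # Composite Simpson's rule on [a, b] with n subintervals (n must be even).
    # This helper is illustrative; n = 1000 is an arbitrary choice.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


exact = 2 * math.exp(-1) - 2.5 + 0.5 * math.exp(2)  # closed form from the answer
approx = simpson(integrand, 0.0, 1.0)               # numerical estimate

print(f"exact  = {exact:.6f}")
print(f"approx = {approx:.6f}")
```

Both figures should agree with the stated value of about 1.930 to the precision shown in the answer.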
TI84: [Calculator screenshot omitted.]

Answer to Exercise 120 (8864 N2012/I/1). Let \(u = e^{2x}\). Then \(3e^{2x} = 4(e^{-2x} - 1) \iff 3u = 4(u^{-1} - 1) \iff 3u^2 = 4(1 - u) \iff 3u^2 + 4u - 4 = 0 \iff (3u - 2)(u + 2) = 0 \iff u = 2/3, -2\). Assuming that x is real, it cannot be that \(e^{2x} = -2\). Hence, \(e^{2x} = 2/3\) or \(x = 0.5 \ln(2/3)\).

Answer to Exercise 121 (8864 N2012/I/2). (i) The areas of HBCG and DEFG are \((x + 20)y\) and \(20x\), respectively. The former is thrice the latter:
\[(x + 20)y = 60x \iff y = 60x/(x + 20). \quad (1)\]
The total fencing is
\[x + y + x + x + 20 + x + y = 4x + 2y + 20 = 100. \quad (2)\]
Plug (1) into (2) to get
\[4x + \frac{120x}{x + 20} + 20 = 100 \iff 4x^2 + 100x + 400 + 120x = 100x + 2000 \iff 4x^2 + 120x - 1600 = 0.\]
Dividing by 4 yields the desired result.

(ii) \(x^2 + 30x - 400 = (x - 10)(x + 40) = 0 \iff x = 10, -40\). (Reject the negative value.) \(HF = x + y = 10 + 60 \cdot 10/(10 + 20) = 30\).

Answer to Exercise 122 (8864 N2012/I/3). (i) \(3k^2/4 = k^2 - x^2 \iff x = \pm 0.5k\).

(ii)
\[\int_{-0.5k}^{0.5k} k^2 - x^2 - \frac{3}{4}k^2 \, dx = \left[\frac{k^2}{4}x - \frac{x^3}{3}\right]_{-0.5k}^{0.5k} = 2\left[\frac{k^2}{4}x - \frac{x^3}{3}\right]_{0}^{0.5k} = 2\left[\frac{k^3}{8} - \frac{k^3}{24}\right] = \frac{k^3}{6}.\]

Answer to Exercise 123 (8864 N2012/I/4). (i) (a) \(6/(3x + 2)\). (b) \(-8/(2x + 1)^2\).

(ii)
\[\int_2^4 \left(\sqrt{x} - \frac{1}{\sqrt{x}}\right)^2 dx = \int_2^4 x + \frac{1}{x} - 2 \, dx = \left[\frac{x^2}{2} + \ln|x| - 2x\right]_2^4 = (8 + \ln 4 - 8) - (2 + \ln 2 - 4) = \ln 2 + 2.\]

Answer to Exercise 124 (8864 N2012/I/5). (i) \(2^x - x^2 = 0 \iff x = 2, 4\).

(ii) \(dy/dx = 2^x \ln 2 - 2x\). \(dy/dx|_{x = 1.5} = 2^{1.5} \ln 2 - 3 \approx -1.0395\).

(iii) The point is \((1.5, 2^{1.5} - 1.5^2)\). The equation of the tangent at the point is
\[y - (2^{1.5} - 1.5^2) = (2^{1.5} \ln 2 - 3)(x - 1.5) \iff y \approx -1.0395x + 2.1377.\]

(iv) The points A and B are \((0, 2.1377)\) (where the tangent has \(x = 0\)) and \((1.048, 1.048)\). So the length AB is \(\sqrt{1.048^2 + (2.1377 - 1.048)^2} \approx 1.51\).

Answer to Exercise 125 (8864 N2011/I/1). This is a ∪-shaped quadratic which is entirely above the x-axis if and only if its discriminant is negative. In this case, the discriminant is \(D = (k - 2)^2 - 4(1)(k + 1) = k^2 - 4k + 4 - 4k - 4 = k^2 - 8k = k(k - 8)\).
\(D < 0\) if and only if \(k \in (0, 8)\).

Answer to Exercise 126 (8864 N2011/I/2). (i) [Figure not reproduced.]

(ii) \(2 - 0.6^x = x^2 - 1 \iff x \approx -1.1116, 1.5995\).

(iii) \(\int_2^3 x^2 - 1 - (2 - 0.6^x) \, dx \approx 3.615\).

Answer to Exercise 127 (8864 N2011/I/3). (i) \(\int e^{3x + 2} \, dx = e^{3x + 2}/3 + C\), where C is the constant of integration.

(ii) \(\int_4^9 3\left(\sqrt{x} - 1/\sqrt{x}\right) dx = 3\left[2x^{1.5}/3 - 2\sqrt{x}\right]_4^9 = 3\left[2(27 - 8)/3 - 2 \cdot (3 - 2)\right] = 32\).

Answer to Exercise 128 (8864 N2011/I/4). (i) \(V = (2 - 2x)^2 x = 4x^3 - 8x^2 + 4x\).

(ii) \(dV/dx = 12x^2 - 16x + 4 = 0 \iff 3x^2 - 4x + 1 = (3x - 1)(x - 1) = 0 \iff x = 1/3, 1\). (These are the two stationary points.) \(dV/dx\) is decreasing at \(x = 1/3\) and increasing at \(x = 1\). Hence, the former is a maximum turning point. Thus, \(V_{\max} = 4(1/3)^3 - 8(1/3)^2 + 4(1/3) = 4/27 - 8/9 + 4/3 = 16/27\).

Answer to Exercise 129 (8864 N2011/I/5). (i) \(dy/dx = 1 - 2/(2x + 1) = 0 \iff x = 0.5\). \(y = 0.5 - \ln 2\). The minimum point is \((0.5, 0.5 - \ln 2)\).

(ii) Point P is \((2, 2 - \ln 5)\). \(dy/dx|_{x = 2} = 0.6\). So the normal to C at P has equation
\[y - (2 - \ln 5) = -\frac{1}{0.6}(x - 2) \quad \text{or} \quad y = -\frac{5}{3}x + \frac{16}{3} - \ln 5.\]
So \(A = \left(\frac{3}{5}\left(\frac{16}{3} - \ln 5\right), 0\right)\) and \(B = \left(0, \frac{16}{3} - \ln 5\right)\). So the area of △OAB is
\[0.5 \cdot \frac{3}{5}\left(\frac{16}{3} - \ln 5\right)\left(\frac{16}{3} - \ln 5\right) = \frac{3}{10}\left(\frac{16}{3} - \ln 5\right)^2 = \frac{1}{30}(16 - 3\ln 5)^2.\]

Answer to Exercise 130 (8864 N2010/I/1). A quadratic equation has two real roots if and only if its discriminant is positive. In this case, the discriminant is \(D = (-2k)^2 - 4(4)(9) = 4k^2 - 144 = 4(k - 6)(k + 6)\). \(D > 0\) if and only if \(k < -6\) or \(k > 6\).

Answer to Exercise 131 (8864 N2010/I/2). (i) \(\int e^{1 - 2x} \, dx = -0.5e^{1 - 2x} + C\), where C is the constant of integration.

(ii) \(\int 2/(x + 1)^3 \, dx = -(x + 1)^{-2} + C\), where C is the constant of integration.

Answer to Exercise 132 (8864 N2010/I/3). (i) [Figure not reproduced.]

(ii) \(dy/dx = 2/(2x - 3)\).

(iii) \(dy/dx|_{x = 3} = 2/3\). So at the point \((3, \ln 3)\), the normal has equation \(y - \ln 3 = -1.5(x - 3)\) or \(y + 1.5x = 4.5 + \ln 3\) or \(2y + 3x = 9 + 2\ln 3\).
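As a quick sanity check on Exercise 132(ii)-(iii), the sketch below estimates the gradient at \(x = 3\) by a central difference and confirms that the point \((3, \ln 3)\) satisfies \(2y + 3x = 9 + 2\ln 3\). It assumes the curve is \(y = \ln(2x - 3)\), which is inferred here from the stated derivative and point rather than quoted from the exam question; the helper names are illustrative only.

```python
import math


def curve(x):
    # Assumed curve, inferred from dy/dx = 2/(2x - 3) and the point (3, ln 3)
    return math.log(2 * x - 3)


def central_difference(g, x, h=1e-6):
    # Numerical approximation to g'(x); h is an arbitrary small step
    return (g(x + h) - g(x - h)) / (2 * h)


x0 = 3.0
tangent_gradient = central_difference(curve, x0)  # expected near 2/3
normal_gradient = -1 / tangent_gradient           # expected near -1.5


def normal_residual(x, y):
    # Zero exactly when (x, y) lies on the line 2y + 3x = 9 + 2 ln 3
    return 2 * y + 3 * x - (9 + 2 * math.log(3))


print(tangent_gradient, normal_gradient, normal_residual(x0, curve(x0)))
```

A residual of (essentially) zero and a normal gradient close to −1.5 are what the written answer predicts.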
Answer to Exercise 133 (8864 N2010/I/4). (i) The perimeter is \((5/4)2x + 2x + 2AD = 6\). Rearranging, \(AD = 3 - 9x/4\).

(ii) The area is \(A = 2x(3 - 9x/4) + x\sqrt{(1.25x)^2 - x^2} = 6x - 9x^2/2 + 3x^2/4 = 6x - 15x^2/4\).

(iii) \(dA/dx = 6 - 15x/2 = 0 \iff x = 0.8\). \(A_{\max} = 6 \cdot 0.8 - 15 \cdot 0.8^2/4 = 4.8 - 2.4 = 2.4\).

Answer to Exercise 134 (8864 N2010/I/5). (i) \(dy/dx = -12x^2 - 12x^3 = 0 \iff x = 0, -1\). So the stationary points are \((x, y) = (0, 6), (-1, 7)\).

(ii) [Figure not reproduced.]

(iii) \(6 - 4x^3 - 3x^4 = 0 \iff x = -1.72, 0.96\) (calculator).

(iv) \(\int 6 - 4x^3 - 3x^4 \, dx = 6x - x^4 - 0.6x^5 + C\), where C is a constant of integration.
\[\int_{-1}^{0.5} 6 - 4x^3 - 3x^4 \, dx = \left[6x - x^4 - 0.6x^5\right]_{-1}^{0.5} = \left(3 - \frac{1}{16} - \frac{0.6}{32}\right) - (-6 - 1 + 0.6) = 9\tfrac{51}{160}.\]

Answer to Exercise 135 (8863 N2009/I/1). We're given
\[x + 2y = 3 \quad (1) \qquad \text{and} \qquad x^2 + xy = 2. \quad (2)\]
From (1), \(x = 3 - 2y\) (3). Plug (3) into (2) to get \((3 - 2y)^2 + (3 - 2y)y = 2\) or \(2y^2 - 9y + 7 = 0\) or \((2y - 7)(y - 1) = 0\). Thus, \(y = 7/2, 1\). Correspondingly, \(x = -4, 1\). The solutions are thus \((x, y) = (-4, 7/2), (1, 1)\).

Answer to Exercise 136 (8863 N2009/I/2). (i) \(\sqrt{x} = 0.5x \implies x = 0.25x^2 \iff x(1 - 0.25x) = 0 \iff x = 0, 4\). So the points of intersection are \((0, 0)\) and \((4, 2)\).

(ii) \(\int \sqrt{x} \, dx = 2x^{1.5}/3 + C\) and \(\int 0.5x \, dx = x^2/4 + D\), where C and D are constants of integration.

(iii) \(\int_0^4 \sqrt{x} - 0.5x \, dx = \left[2x^{1.5}/3 - x^2/4\right]_0^4 = 16/3 - 4 = 4/3\).

Answer to Exercise 137 (8863 N2009/I/4). (i) The curve intersects the x-axis at \((1, 0)\) and does not intersect the y-axis.

(ii) \(dy/dx = 1 + 1/x^2\). \(dy/dx|_{x = 2} = 5/4\). So the gradient of the normal at P is \(-0.8\).

(iii) The point P is \((2, 1.5)\). So the equation of the normal is \(y - 1.5 = -0.8(x - 2)\) or \(4x + 5y - 15.5 = 0\).

(iv) N is \((0, 3.1)\). The equation of the tangent at P is \(y - 1.5 = (5/4)(x - 2)\). So the point T is \((0, -1)\). So the area of △PTN is \(0.5(4.1)(2) = 4.1\).

Answer to Exercise 138 (8863 N2009/I/5). (i) \(dy/dx = 6x^2 - 10x - 4 = 0 \iff 3x^2 - 5x - 2 = (3x + 1)(x - 2) = 0 \iff x = -1/3, 2\).
So the stationary points are (−1/3, 100/27) and (2, −9). (ii) See graph (not reproduced here). (iii) From the graph, 2x³ − 5x² − 4x + 3 > 0 ⇐⇒ x ∈ (−1, 0.5) ∪ (3, ∞). So 2e^(3x) − 5e^(2x) − 4e^x + 3 > 0 ⇐⇒ e^x ∈ (−1, 0.5) ∪ (3, ∞) ⇐⇒ x ∈ (−∞, ln 0.5) ∪ (ln 3, ∞).

Answer to Exercise 139 (8863 N2008/I/1). (i) sin(2π + α) = sin 2π cos α + sin α cos 2π = 0 ⋅ cos α + sin α ⋅ 1 = sin α = c. (ii) sin(3π + α) = sin 3π cos α + sin α cos 3π = 0 ⋅ cos α + sin α ⋅ (−1) = −sin α = −c. sin(π + α) = sin π cos α + sin α cos π = 0 ⋅ cos α + sin α ⋅ (−1) = −sin α = −c.

Answer to Exercise 140 (8863 N2008/I/2). We are given that x + y = 20 (1) and x² + y² = 300 (2). From (1), we have y = 20 − x (3). Plug (3) into (2) to get x² + (20 − x)² = 300 or 2x² − 40x + 100 = 0 or x² − 20x + 50 = 0. Solving the quadratic: x = [20 ± √((−20)² − 4(1)(50))]/(2(1)) = 10 ± √(100 − 50) = 10 ± √50. Correspondingly, y = 10 ∓ √50. So the two solutions are (x, y) = (10 + √50, 10 − √50), (10 − √50, 10 + √50).

Answer to Exercise 141 (8863 N2008/I/3). (i) 2x² = x² + k² ⇐⇒ x = ±k. (ii) The integral of x² + k² − 2x² from −k to k is [−x³/3 + k²x] evaluated from −k to k = 2[−x³/3 + k²x] evaluated from 0 to k = 2(−k³/3 + k³) = 4k³/3.

Answer to Exercise 142 (8863 N2008/I/5). (i) dx/dt = 3t² − 24t + k > 0 for all t ⇐⇒ D = (−24)² − 4(3)(k) < 0 ⇐⇒ k > 48. (ii) See graph (not reproduced here). (iii) t³ − 12t² + 36t = 375 ⇐⇒ t ≈ 11.7 (calculator).

Answer to Exercise 143 (8863 N2008/I/6). (i) dy/dx = 2/(2x + 4) = 1/(x + 2). At x = 1, dy/dx = 1/3. The equation of the tangent at P is y − ln 6 = (1/3)(x − 1). So the x-coordinate of T is 1 − 3 ln 6. (ii) The equation of the normal at P is y − ln 6 = −3(x − 1). So the x-coordinate of N is 1 + (ln 6)/3. (iii) 0.5[(ln 6)/3 + 3 ln 6] ln 6 = (5/3)(ln 6)².

Answer to Exercise 144 (8863 N2007/I/1). (i) d(3^x)/dx = 3^x ln 3. At x = 2, this equals 9 ln 3. (ii) The point is (2, 9). The equation is y − 9 = (9 ln 3)(x − 2) or y = (9 ln 3)x + 9(1 − 2 ln 3).
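The tangent found in Exercise 144 can be sanity-checked numerically. The sketch below is only an illustration, not part of the original answer: the helper names `f`, `central_difference` and `tangent` are made up for this check, and the step size `h` is an arbitrary choice. It compares the exact gradient 9 ln 3 with a central-difference estimate and confirms that the tangent line passes through (2, 9).

```python
import math


def f(x):
    """The curve y = 3^x from Exercise 144."""
    return 3.0 ** x


def central_difference(func, x, h=1e-6):
    """Numerical estimate of func'(x); h is an assumed, arbitrary step size."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 2.0
slope_exact = 9 * math.log(3)            # 3^x ln 3 evaluated at x = 2
slope_numeric = central_difference(f, x0)
intercept = 9 * (1 - 2 * math.log(3))    # from y = (9 ln 3)x + 9(1 - 2 ln 3)


def tangent(x):
    """Tangent line at (2, 9) as stated in the answer."""
    return slope_exact * x + intercept


print(slope_exact, slope_numeric, tangent(x0))
```

The two slope values should agree to several decimal places, and `tangent(2.0)` should return 9 up to floating-point rounding.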

Answer to Exercise 145 (8863 N2007/I/3). (i) See graph (not reproduced here). (ii) 20/(x + 2) = 10 − x² ⇐⇒ x ≈ 2.317 (calculator). (iii) ∫ 20/(x + 2) dx = 20 ln |x + 2| + C and ∫ (10 − x²) dx = 10x − x³/3 + D, where C and D are constants of integration. (iv) The integral of 10 − x² − 20/(x + 2) from 0 to 2.317 is [10x − x³/3 − 20 ln |x + 2|] evaluated from 0 to 2.317 ≈ 3.635.

Answer to Exercise 146 (8863 N2007/I/4). (i) Let a be the length of each side of the isosceles triangle. By the Pythagorean Theorem, a² + a² = x². Thus, the area of the triangle is 0.5a² = x²/4. (ii) Let b be the length of a side of the square. Then 2b + x = 100. And A = x²/4 + b² = x²/4 + [0.5(100 − x)]² = 2500 − 50x + 0.5x². (iii) dA/dx = −50 + x = 0 ⇐⇒ x = 50. So A_min = 2500 − 50 ⋅ 50 + 0.5 ⋅ 50² = 1250. This is the minimum because A is a ∪-shaped quadratic function of x. (iv) A is a ∪-shaped quadratic function of x. We know the minimum is at x = 50. So the maximum is at one of the endpoints (i.e. x = 10 or x = 80). A(10) = 2500 − 50 ⋅ 10 + 0.5 ⋅ 10² = 2050. A(80) = 2500 − 50 ⋅ 80 + 0.5 ⋅ 80² = 1700. So A_max = A(10) = 2050.

Answer to Exercise 147 (8863 N2007/I/5). (i) We are given y = 2x² + 3x + 2 (1) and y = 2x + 3 (2). Plug (2) into (1) to get 2x + 3 = 2x² + 3x + 2 or 2x² + x − 1 = 0 or (2x − 1)(x + 1) = 0. Thus, x = 0.5, −1. Correspondingly, y = 4, 1. The two solutions to the given simultaneous equations are thus (x, y) = (0.5, 4), (−1, 1). (ii) The inequality 2x² + 3x + 2 ≥ 2x + 3 is equivalent to (2x − 1)(x + 1) ≥ 0, which is true if and only if x ≤ −1 or x ≥ 0.5. (iii) From our work above, we know that the given inequality holds if and only if cos θ ≤ −1 or cos θ ≥ 0.5. Since cos θ ∈ [−1, 1], this is equivalent to cos θ = −1 or cos θ ∈ [0.5, 1], which in turn is true if and only if θ ∈ [0°, 60°] ∪ {180°} ∪ [300°, 420°] ∪ {540°}.

Answer to Exercise 148 (8174 N2006/I/6). (i) x = (ln 23 − 2)/5 ≈ 0.227. (ii) y = ±√(10^2.5 − 40) ≈ ±16.620.
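The endpoint argument in Exercise 146 (iv) can be checked by brute force. The following is a minimal sketch, assuming (as the answer does) that x is restricted to the interval [10, 80]; the function name `area` and the grid spacing of 0.01 are illustrative choices, not anything taken from the exam question.

```python
def area(x):
    """Total area from Exercise 146: triangle x^2/4 plus square [0.5(100 - x)]^2."""
    return x ** 2 / 4 + (0.5 * (100 - x)) ** 2


# Grid over the assumed interval [10, 80] in steps of 0.01.
xs = [i / 100 for i in range(1000, 8001)]

x_min = min(xs, key=area)   # location of the smallest area on the grid
x_max = max(xs, key=area)   # location of the largest area on the grid

print(x_min, area(x_min))   # expected to sit at the stationary point x = 50
print(x_max, area(x_max))   # expected to sit at an endpoint of the interval
print(area(10), area(80))
```

Because A is a ∪-shaped quadratic, the grid search should return the stationary point for the minimum and the left endpoint for the maximum, matching the values 1250 and 2050 in the answer.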

Answer to Exercise 149 (8174 N2006/I/7). Plug the equation of the line y = 1 − 3x (1) into the equation of the curve x² + y² + kx + 2y + 7 = 0 (2) to get x² + (1 − 3x)² + kx + 2(1 − 3x) + 7 = 0 ⇐⇒ 10x² + (k − 12)x + 10 = 0 (3). Apply d/dx to the equation of the curve: d/dx (x² + y² + kx + 2y + 7) = 0 ⇐⇒ 2x + 2y(dy/dx) + k + 2(dy/dx) = 0. The tangent line has slope −3. So at the point at which the line touches the curve, we have dy/dx = −3. Plugging this into the above, we have 2x + 2y(−3) + k + 2(−3) = 0 or 2x − 6y + k − 6 = 0 (4). Now plug (1) into (4) to get 20x − 12 + k = 0 or k = 12 − 20x (5). Now plug (5) into (3) to get 10x² − 20x² + 10 = 0 or 10x² = 10 or x² = 1 or x = ±1. Correspondingly, k = −8, 32.

Answer to Exercise 150 (8174 N2006/I/9). (i) ∫ (5x² − 8x) dx = 5x³/3 − 4x² + C, where C is the constant of integration. (ii) The integral of e^(−2x) from 0 to 1 is [−0.5e^(−2x)] evaluated from 0 to 1 = −0.5(e^(−2) − 1) = 0.5(1 − e^(−2)).

Answer to Exercise 151 (8174 N2006/I/16). (i) −4x + 19 = −2x² + 6x + 11 ⇐⇒ 2x² − 10x + 8 = 0 ⇐⇒ x² − 5x + 4 = (x − 1)(x − 4) = 0 ⇐⇒ x = 1, 4. So A = (1, 15) and B = (4, 3). (ii) The integral of −2x² + 6x + 11 − (−4x + 19) from 1 to 4 equals the integral of −2x² + 10x − 8 from 1 to 4, which is [−2x³/3 + 5x² − 8x] evaluated from 1 to 4 = (−2/3) ⋅ 63 + 5 ⋅ 15 − 8 ⋅ 3 = −42 + 75 − 24 = 9.

63.2 Answers for Ch. 59: Probability and Statistics

Answer to Exercise 152 (8864 N2015/I/6). Let X be the mass of a peach. We are given that P(X < 40) = P(Z < (40 − µ)/σ) = 0.2 (1) and P(X > 60) = P(Z > (60 − µ)/σ) = 0.25 (2). From (1) and (2), we have (40 − µ)/σ ≈ −0.841621234 (3) and (60 − µ)/σ ≈ 0.67448975 (4). So (60 − µ) − (40 − µ) = 20 ≈ 0.67448975σ + 0.841621234σ ⟹ σ ≈ 13.192. And µ ≈ 51.102.

Answer to Exercise 153 (8864 N2015/I/7). (i) Take the 12th, 24th, ..., 1200th students. (ii) There might be some strange period-12 pattern in the list of names, thus introducing bias to the sample. (iii) Stratified.

Answer to Exercise 154 (8864 N2015/I/8).
(i) 0.42 = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 3p − 0.03, so p = 0.15. (ii) P(A ∪ B′) = P(A) + P(B′) − P(A ∩ B′) = p + (1 − 2p) − 0.12 = 1 − p − 0.12 = 0.73. (iii) P(A)P(B′) = p(1 − 2p) = 0.15(0.7) = 0.105. P(A ∩ B′) = 0.12. Since P(A)P(B′) ≠ P(A ∩ B′), A and B′ are not independent.

Answer to Exercise 155 (8864 N2015/I/9). (i) Let X ∼ B(8, 1/6) be the number of sixes. P(X = 3) = C(8, 3)(1/6)³(5/6)⁵ = 56 ⋅ 5⁵/6⁸ ≈ 0.104. (ii) P(X ≤ 3) = C(8, 0)(1/6)⁰(5/6)⁸ + C(8, 1)(1/6)¹(5/6)⁷ + C(8, 2)(1/6)²(5/6)⁶ + C(8, 3)(1/6)³(5/6)⁵ ≈ 0.969. (iii) Let Y ∼ B(600, 1/6). Since the sample size is large, Y can be approximated by A ∼ N(100, 500/6). P(90 ≤ Y ≤ 100) ≈ P(89.5 < A < 100.5) = P((89.5 − 100)/√(500/6) < Z < (100.5 − 100)/√(500/6)) ≈ 0.52184005 − 0.12502718 ≈ 0.397.

Answer to Exercise 156 (8864 N2015/I/10). (i) [Scatter diagram omitted.] (ii) r ≈ 0.922. There is a fairly strong, positive linear correlation between h and w. (iii) w − 98.1 = 58.0(h − 1.799). (iv) ŵ at h = 1.66 is 58.0(1.66 − 1.799) + 98.1 ≈ 90.0. (1) Linear interpolation is somehow magically reliable. (2) Our linear model seems to fit the data pretty well.

Answer to Exercise 157 (8864 N2015/I/11). (i) Let M be the mass of a randomly chosen man. P(75 ≤ M ≤ 79) ≈ 0.162. (ii) Let W be the mass of a randomly chosen woman. M₁ + M₂ + M₃ − (W₁ + W₂ + W₃ + W₄) ∼ N(3 ⋅ 77 − 4 ⋅ 62, 3 ⋅ 9.8² + 4 ⋅ 10.6²) = N(−17, 737.56). The desired probability is P(M₁ + M₂ + M₃ − (W₁ + W₂ + W₃ + W₄) > 0) ≈ 0.266. (iii) M₁ + M₂ + M₃ + W₁ + W₂ + W₃ + W₄ ∼ N(3 ⋅ 77 + 4 ⋅ 62, 3 ⋅ 9.8² + 4 ⋅ 10.6²) = N(479, 737.56). The desired probability is P(M₁ + M₂ + M₃ + W₁ + W₂ + W₃ + W₄ ≤ 460) ≈ 0.242.

Answer to Exercise 158 (8864 N2015/I/12). (i) [Diagram omitted.] (ii) (3/4) × (2/5) + (1/4) × (3/4) × (2/5) = 3/8. (iii) (3/4) × (2/5) = 0.3. (iv) Let X ∼ B(5, 3/8) be the number of successes. P(X ≥ 2) ≈ 0.619.

Answer to Exercise 159 (8864 N2015/I/13).
(i) Let X be the length of a fish. We are given that X ∼ N(µ, 2.1²). The competing hypotheses are H₀: µ = 15.2 and H_A: µ ≠ 15.2. The sample mean is X̄₃₀ ∼ N(µ, 2.1²/30). The p-value is P(X̄₃₀ ≤ 14.5 or X̄₃₀ ≥ 15.9 | H₀) ≈ 0.068 > 0.05. We fail to reject H₀, which is the scientist's claim. (ii) Unbiased estimates of the population mean and variance are, respectively, x̄ = −32/40 + 18 = 17.2 and s² = [325 − (−32)²/40]/39 ≈ 7.677. (iii) An 'unbiased estimate' is generated by an unbiased estimator, which is a random variable whose expected value is equal to the parameter of interest. (iv) The p-value is P(X̄₄₀ ≤ 17.2 | H₀) ≈ 0.03391771. The null hypothesis would be rejected if α ≳ 3.40 (in %).

Answer to Exercise 160 (8864 N2014/I/6). (i) P(H < 146) ≈ 0.737. (ii) P(137.2 < H < 147.2) ≈ 0.595.

Answer to Exercise 161 (8864 N2014/I/7). (i) Order the 5000 households by name. Take the 50th, 100th, ..., 5000th households. (ii) The six strata are "under-25, supermarket", "under-25, online", "25-60, supermarket", "25-60, online", "over-60, supermarket", "over-60, online". From each, randomly pick, respectively, 10, 20, 18, 32, 16, and 4 households. (iii) Stratified sampling, because it usually results in a smaller sample variance.

Answer to Exercise 162 (8864 N2014/I/8). (i) [Scatter diagram omitted.] (ii) r ≈ −0.926. There is a fairly strong, negative correlation between x and y. (iii) y = −0.9021x + 16.15. (iv) ŷ at x = 13.2 is −0.9021(13.2) + 16.15 ≈ 4.2. We are supposed to say that this predicted value is unreliable because it involves extrapolation.

Answer to Exercise 163 (8864 N2014/I/9). (i) (a) Let X ∼ B(6, 0.4) be the number of cakes with fruit. P(X = 0) = 0.046656. (b) P(X ≤ 2) = 0.54432. (ii) Let Y ∼ B(8, 0.54432) be the number of packs with at most two cakes containing fruit. P(Y ≥ 4) ≈ 0.729. (iii) Let A ∼ B(150, 0.54432) be the number of packs with at most two cakes containing fruit.
Since the sample size is large, A is well-approximated by B ∼ N(81.648, 37.205). The desired probability is P(A > 75) ≈ P(B > 75.5) ≈ 0.843.

Answer to Exercise 164 (8864 N2014/I/10). (i) Let X ∼ (µ, 4.4) be the length of a leaf. The competing hypotheses are H₀: µ = 7 and H_A: µ < 7. The sample mean X̄₅₀ is, by the CLT, well-approximated by N(µ, 4.4/50). The p-value P(X̄₅₀ < 6.5 | H₀) ≈ 0.0459 is less than 5%, so we can reject the null hypothesis. (ii) Unbiased estimates of the population mean and variance are x̄ = 310.4/50 = 6.208 and s² = [2209.2 − 310.4²/50]/49 ≈ 5.76. (iii) The competing hypotheses are H₀: µ = 7 and H_A: µ ≠ 7. The sample mean X̄₅₀ is, by the CLT, well-approximated by N(µ, 5.76/50). The p-value is P(X̄₅₀ < 6.208 or X̄₅₀ > 7.792 | H₀) ≈ 0.01962372. To reject H₀, α ≳ 1.97 (in %).

Answer to Exercise 165 (8864 N2014/I/11). (i) The total number of students is 48 + 12 + 10 + 20 + 55 + 15 + 130 + x = 290 + x. So P(L) = (48 + 12 + 10 + 20)/(290 + x) = 90/(290 + x) and P(G) = (55 + 15 + 10 + 20)/(290 + x) = 100/(290 + x). Compute P(L ∩ G) = 30/(290 + x). If L and G are independent, then P(L)P(G) = [90/(290 + x)][100/(290 + x)] = P(L ∩ G) = 30/(290 + x). So 300 = 290 + x or x = 10. (ii) P(L ∪ T) = (48 + 12 + 10 + 20 + 130 + 15)/300 = 235/300 = 47/60. (iii) P(T ∩ G′) = (130 + 12)/300 = 142/300 = 71/150. (iv) P(L | G) = (10 + 20)/(10 + 20 + 55 + 15) = 30/100 = 0.3. (v) There are 10 + 12 + 15 = 37 students with exactly two items. So the probability that two randomly chosen students have exactly two items is (37/300) × (36/299) ≈ 0.0148.

Answer to Exercise 166 (8864 N2014/I/12). (i) P(A > 75) = P(Z > (75 − 50)/σ) = 0.0189 ⇐⇒ (75 − 50)/σ ≈ 2.077016894 ⇐⇒ σ ≈ 12.03649333 ⇐⇒ σ² ≈ 145. (ii) W_B = B₁ + B₂ + ⋯ + B₇ ∼ N(7 ⋅ 75, 7 ⋅ 64) = N(525, 448). P(W_B < 500) ≈ 0.119. (iii) W_A = A₁ + A₂ + ⋯ + A₅ ∼ N(5 ⋅ 50, 5 ⋅ 145) = N(250, 725), so W_B − 2W_A ∼ N(525 − 2 ⋅ 250, 448 + 2² ⋅ 725) = N(25, 3348) and P(W_B > 2W_A) = P(W_B − 2W_A > 0) ≈ 0.667.
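The normal-distribution figures in Exercise 166 are quoted "by calculator". As a hedged illustration (not the method the exam expects), they can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names below are invented for this check, and the variance 145 is the rounded value used in the answer.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # Z ~ N(0, 1)

# (i) P(Z > 25/sigma) = 0.0189, so 25/sigma is the 0.9811 quantile of Z.
z = standard_normal.inv_cdf(1 - 0.0189)
sigma_a = 25 / z
variance_a = sigma_a ** 2

# (ii) W_B ~ N(525, 448); NormalDist takes the standard deviation, not the variance.
w_b = NormalDist(mu=525, sigma=448 ** 0.5)
p_part_ii = w_b.cdf(500)

# (iii) W_B - 2 W_A ~ N(25, 3348), using the rounded variance 145 from part (i).
difference = NormalDist(mu=25, sigma=(448 + 2 ** 2 * 5 * 145) ** 0.5)
p_part_iii = 1 - difference.cdf(0)

print(round(variance_a, 2), round(p_part_ii, 3), round(p_part_iii, 3))
```

Note that `NormalDist` is parameterised by the standard deviation, whereas the book's N(µ, σ²) notation lists the variance, hence the square roots.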

Answer to Exercise 167 (8864 N2013/I/6). (i) Randomly pick 25, 50, and 75 people who bought the $X, $Y, $Z tickets, respectively. (ii) Results in lower sample variance (as compared to simple random sampling).

Answer to Exercise 168 (8864 N2013/I/7). (i) Unbiased estimates of the population mean and variance are t̄ = 305/250 + 75 = 76.22 and s² = [29555 − 305²/250]/(250 − 1) ≈ 117.2. (ii) Let T ∼ (µ, σ²) be the retention time. The competing hypotheses are H₀: µ = 75 and H_A: µ > 75. The sample mean retention time is T̄₂₅₀ ∼ (µ, σ²/250). By the CLT, T̄₂₅₀ is well-approximated by N(µ, s²/250). The p-value is P(T̄₂₅₀ ≥ 76.22 | H₀) = P(Z ≥ (76.22 − 75)/√(117.2/250)) ≈ 0.03738856 > 0.025. We fail to reject the null hypothesis.

Answer to Exercise 169 (8864 N2013/I/8). (i) Let X ∼ B(10, 0.2) be the number of batteries that have a lifetime less than 100 hours. P(X = 0) = 0.8¹⁰ ≈ 0.107. (ii) P(X ≤ 2) ≈ 0.678 (calculator). (iii) Let Y ∼ B(80, 0.678) be the number of packs that are satisfactory. By the CLT, Y is well-approximated by A ∼ N(80 ⋅ 0.678, 80 ⋅ 0.678 ⋅ 0.322) ≈ N(54.224, 17.471), where the figures use the unrounded probability 0.6778 from (ii). So P(Y ≥ 60) ≈ P(A ≥ 59.5) ≈ 0.103.

Answer to Exercise 170 (8864 N2013/I/9). (i) [Scatter diagram omitted.] (ii) r ≈ 0.9032560806 is positive and fairly large, suggesting a fairly strong linear correlation between age and height. (iii) y = 4.46x + 87.43. (iv) ŷ at x = 13.2 is 4.46(13.2) + 87.43 ≈ 146. We are supposed to say that this estimate is reliable because it involves interpolation.

Answer to Exercise 171 (8864 N2013/I/10). (i) Let M ∼ N(µ, 0.8²) be the mass of salt in a bottle. The sample mean is M̄₂₀ ∼ N(µ, 0.8²/20). The competing hypotheses are H₀: µ = 12 and H_A: µ ≠ 12. Let [m_L, m_U] be the range of possible values of m, in order for H₀ to not be rejected at the 5% significance level. So P(m_L ≤ M̄₂₀ ≤ m_U | H₀) = 0.95.
So by calculator, m_U ≈ 12.35060902 and m_L ≈ 11.64939098. So the set of possible values of m is [11.64939098, 12.35060902]. (ii) The sample mean is M̄₄₀ ∼ N(µ, 0.8²/40). The competing hypotheses are H₀: µ = 12 and H_A: µ < 12. The p-value P(M̄₄₀ ≤ 11.75 | H₀) = 0.02405341 is less than 0.05, so we can reject H₀. This is evidence in favour of the company's claim.

Answer to Exercise 172 (8864 N2013/I/11). (i) Let A ∼ N(1000, σ²) be the mass (in g) of a Type A packet of animal food. P(A < 990) = P(Z < (990 − 1000)/σ) = 0.2 ⇐⇒ (990 − 1000)/σ = −0.841621234 ⇐⇒ σ ≈ 11.9. (ii) Let P ∼ N(240, 10²) and Q ∼ N(145, 8²) be the masses (in g) of a scoop of P and a scoop of Q, respectively. Then B = P₁ + P₂ + P₃ + Q₁ + Q₂ ∼ N(3 ⋅ 240 + 2 ⋅ 145, 3 ⋅ 10² + 2 ⋅ 8²) = N(1010, 428). And so P(B < 1000) ≈ 0.314. (iii) B − A ∼ N(1010 − 1000, 428 + 11.9²) ≈ N(10, 569). So P(B > A) = P(B − A > 0) ≈ 0.662.

Answer to Exercise 173 (8864 N2013/I/12). (i) [Diagram omitted.] (ii) P(A) = P(5 or 6) + P(1)P(4 or 5) + P(2)P(3 or 4) = 1/3 + (1/6)(1/3) + (1/6)(1/3) = 4/9. (iii) P(A ∩ B) = P(1)P(4 or 5) + P(2)P(3 or 4) = 1/9. (iv) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 4/9 + 1/3 − 1/9 = 2/3. (v) P(A′) = 1 − P(A) = 1 − 4/9 = 5/9. P(B ∩ A′) = P(1)P(1 or 2 or 3) + P(2)P(1 or 2) = (1/6)(1/2) + (1/6)(1/3) = 5/36. So P(B | A′) = P(B ∩ A′)/P(A′) = (5/36)/(5/9) = 1/4.

Answer to Exercise 174 (8864 N2012/I/6). (i) Take an ordered list of the population. Select every kth object in the list, until we get our desired sample size. (ii) Advantage: He can expect the supermarket to be busy and get all the respondents he needs. Disadvantage: The adults who go to the main supermarket at midday may not be representative of the adult population. (iii) Get an ordered list of the adult population. Select every 91st person on the list, until he has his 100 respondents.

Answer to Exercise 175 (8864 N2012/I/7). (i) A, B independent ⇐⇒ P(A)P(B) = p² = P(A ∩ B).
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 2p − p². So p² − 2p + 5/9 = 0. (ii) p² − 2p + 5/9 = (p − 5/3)(p − 1/3) = 0 ⇐⇒ p = 1/3 (reject p = 5/3). P(A ∩ B) = p² = 1/9.

Answer to Exercise 176 (8864 N2012/I/8). (i) 50% × 60% = 0.3. (ii) Proportion of votes cast by males: 50% × 60% + 35% × 40% + 15% × 20% = 0.3 + 0.14 + 0.03 = 0.47. Thus P(Female) = 0.53. (iii) P(Male) = 0.47. P(C ∩ Male) = 0.03. So P(C | Male) = P(C ∩ Male)/P(Male) = 3/47.

Answer to Exercise 177 (8864 N2012/I/9). (i) [Scatter diagram of y against x omitted.] (ii) r ≈ −0.9840253445 is very large and negative. This suggests a very strong, negative linear correlation between x and y. (iii) y = −19.21x + 183.12. (iv) (a) ŷ at x = 4 is −19.21(4) + 183.12 ≈ 106.28. (iv) (b) ŷ at x = 9 is −19.21(9) + 183.12 ≈ 10.23. (v) We are supposed to say that the estimate at x = 4 is reliable because it involves interpolation and the estimate at x = 9 is not because it involves extrapolation.

Answer to Exercise 178 (8864 N2012/I/10). (i) Let A ∼ B(12, 0.8) be the number that flower. P(A = 10) ≈ 0.283 (calculator). (ii) P(A < 8) ≈ 0.0726 (calculator). (iii) Let B ∼ B(96, 0.8) be the number that flower. By the CLT, B is well-approximated by C ∼ N(96 ⋅ 0.8, 96 ⋅ 0.8 ⋅ 0.2) = N(76.8, 15.36). So, P(B > 75) ≈ P(C > 75.5) ≈ 0.630. [This is fairly close to the exact probability of P(B > 75) ≈ 0.638 (calculator).] (iv) Using the approximation in (iii), the answer is C(3, 2) ⋅ 0.630² ⋅ (1 − 0.630) + 0.630³ ≈ 0.691. [Using instead the exact probability, it is C(3, 2) ⋅ 0.638² ⋅ (1 − 0.638) + 0.638³ ≈ 0.702.]

Answer to Exercise 179 (8864 N2012/I/11). (i) Unbiased estimates of the population mean and variance are x̄ = −60/100 + 300 = 299.4 and s² = [1240 − (−60)²/100]/(100 − 1) = 1204/99. (ii) Let X̄₁₀₀ ∼ (µ, σ²/100) be the sample mean. By the CLT, it is approximately the case that X̄₁₀₀ ∼ N(µ, s²/100). The competing hypotheses are H₀: µ = 300 and H_A: µ > 300. The p-value is P(X̄₁₀₀ ≥ 299.4 | H₀) ≈ 0.957. We are unable to reject H₀.
This is evidence against the manager's claim. (iii) Let X̄₁₀₀ ∼ (µ, 12.1/100) be the sample mean. By the CLT, it is approximately the case that X̄₁₀₀ ∼ N(µ, 0.121). The competing hypotheses are H₀: µ = 300 and H_A: µ > 300. The minimum k_min at which we'd be able to reject H₀ at the 10% significance level is given by P(X̄₁₀₀ ≥ k_min | H₀) = 0.1. So by calculator, k_min ≈ 300.4457884.

Answer to Exercise 180 (8864 N2012/I/12). (i) A₁ + A₂ + ⋯ + A₁₀ ∼ N(10 ⋅ 0.25, 10 ⋅ 0.02²). So P(A₁ + A₂ + ⋯ + A₁₀ < 2.4) ≈ 0.0569. (ii) A₁ + A₂ + ⋯ + A₆ ∼ N(6 ⋅ 0.25, 6 ⋅ 0.02²). B₁ + B₂ + ⋯ + B₅ ∼ N(5 ⋅ 0.35, 5 ⋅ 0.03²). So A₁ + A₂ + ⋯ + A₆ − (B₁ + B₂ + ⋯ + B₅) ∼ N(6 ⋅ 0.25 − 5 ⋅ 0.35, 6 ⋅ 0.02² + 5 ⋅ 0.03²) ⟹ P(−0.2 < A₁ + A₂ + ⋯ + A₆ − (B₁ + B₂ + ⋯ + B₅) < 0.2) ≈ 0.274. (iii) Mrs Woo and Mr Tan pay, respectively, W = 1.5(A₁ + A₂ + A₃) + 2.4(B₁ + B₂ + B₃) ∼ N(1.5 ⋅ 3 ⋅ 0.25 + 2.4 ⋅ 3 ⋅ 0.35, 1.5² ⋅ 3 ⋅ 0.02² + 2.4² ⋅ 3 ⋅ 0.03²) = N(3.645, 0.018252) and T = 1.5(A₁ + A₂ + ⋯ + A₁₀) ∼ N(1.5 ⋅ 10 ⋅ 0.25, 1.5² ⋅ 10 ⋅ 0.02²) = N(3.75, 0.009). So W − T ∼ N(−0.105, 0.027252). And P(W − T > 0) ≈ 0.262.

Answer to Exercise 181 (8864 N2011/I/6). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) or 0.46 = a + b − ab (1). Moreover, A, B independent ⇐⇒ P(A)P(B) = P(A ∩ B) or ab = 0.04 (2). Plug (2) into (1) to get: 0.46 = a + 0.04/a − 0.04 or a² − 0.5a + 0.04 = 0. a² − 0.5a + 0.04 = (a − 0.1)(a − 0.4) = 0 ⇐⇒ a = 0.1, 0.4.

Answer to Exercise 182 (8864 N2011/I/7). (i) Every student is equally likely to be chosen. (ii) The three strata are "car", "bicycle", and "on foot". The totals for each stratum are 440, 760, and 800, for a grand total of 2000 students. So from each stratum, take 22, 38, and 40 students. (iii) Stratified sampling usually results in lower sample variance (than simple random sampling).
A better stratified sample of size 100 could have been achieved by using six strata instead of just three: namely "Year 1 car", "Year 1 bicycle", "Year 1 on foot", "Year 2 car", "Year 2 bicycle", and "Year 2 on foot".

Answer to Exercise 183 (8864 N2011/I/8). (i) [Scatter diagram of T against H omitted.] (ii) r ≈ −0.9670056283 is large and negative, which suggests there is a strong, negative linear correlation between H and T. (iii) T = −0.01472090021H + 27.00297934. (iv) T̂ at H = 1000 is −0.01472090021(1000) + 27.00297934 ≈ 12.28. We are supposed to say it's reliable because it involves interpolation.

Answer to Exercise 184 (8864 N2011/I/9). (i) The lifetime of a light bulb in this batch is L ∼ (µ, 1400²). The sample mean lifetime is L̄₅₀ ∼ (µ, 1400²/50). By the CLT, we have approximately L̄₅₀ ∼ N(µ, 1400²/50). The competing hypotheses are H₀: µ = 12000 and H_A: µ < 12000. The p-value is P(L̄₅₀ ≤ 11500 | H₀) ≈ 0.00577864 < 0.01, so we can reject H₀. This is evidence in favour of believing that this particular batch is substandard. (ii) P(L̄₅₀ ≤ T_min | H₀) = 0.05 ⇐⇒ T_min ≈ 11674.3356 (calculator).

Answer to Exercise 185 (8864 N2011/I/10). (i) (a) Let X ∼ B(7, 0.8) be the number of times Jon completes the puzzle. P(X = 3) = 0.028672 (calculator). (i) (b) P(X ≥ 5) = 0.851968 (calculator). (ii) 0.851968⁵ ≈ 0.449. (iii) Let Y ∼ B(70, 0.8) be the number of times Jon completes the puzzle. By the CLT, Y is well-approximated by A ∼ N(70 ⋅ 0.8, 70 ⋅ 0.8 ⋅ 0.2) = N(56, 11.2). So P(Y ≥ 50) ≈ P(A ≥ 49.5) ≈ 0.974. [This is fairly close to the exact probability P(Y ≥ 50) ≈ 0.970.]

Answer to Exercise 186 (8864 N2011/I/11). (i) (a) [Diagram omitted.] (i) (b) P(Red ball) = P(A ∩ Red ball) + P(B ∩ Red ball) = P(A)P(Red ball | A) + P(B)P(Red ball | B) = (1/4)(5/10) + (3/4)(6/8) = 11/16. (i) (c) P(A | Red ball) = P(A ∩ Red ball)/P(Red ball) = (1/8)/(11/16) = 2/11.
(ii) P(Same) = P(A ∩ Same) + P(B ∩ Same) = P(A)P(Same | A) + P(B)P(Same | B) = (1/4)[(5/10)(4/9) + (4/10)(3/9)] + (3/4)[(6/8)(5/7) + (2/8)(1/7)] = 8/90 + 3/7 = 163/315.

Answer to Exercise 187 (8864 N2011/I/12). Let B ∼ N(60, 12²) and G ∼ N(50, 10²) be the masses of a boy and a girl. (i) P(50 ≤ B ≤ 70) ≈ 0.595 (calculator). (ii) B − G ∼ N(60 − 50, 12² + 10²) = N(10, 244). So P(B > G) = P(B − G > 0) ≈ 0.739 (calculator). (iii) B₁ + B₂ + B₃ + G₁ + G₂ ∼ N(3 ⋅ 60 + 2 ⋅ 50, 3 ⋅ 12² + 2 ⋅ 10²) = N(280, 632). So P(B₁ + B₂ + B₃ + G₁ + G₂ < 300) ≈ 0.787. (iv) B₁ + B₂ + ⋯ + B₆ ∼ N(6 ⋅ 60, 6 ⋅ 12²) = N(360, 864). P(B₁ + B₂ + ⋯ + B₆ < L) = P(Z < (L − 360)/√864) = 0.95 ⇐⇒ (L − 360)/√864 ≈ 1.644853627 ⇐⇒ L ≈ 408.349.

Answer to Exercise 188 (8864 N2010/I/6). (i) P(A ∩ B) = P(A | B)P(B) = 0.2 ⋅ 0.3 = 0.06. (ii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.6 + 0.3 − 0.06 = 0.84. (iii) P(A ∪ B) − P(A ∩ B) = 0.84 − 0.06 = 0.78.

Answer to Exercise 189 (8864 N2010/I/7). (i) 0.3 × 0.1 = 0.03. (ii) 0.3 × 0.9/(0.7 + 0.3 × 0.9) = 27/97. (iii) C(3, 2) ⋅ 0.7² ⋅ (0.3 × 0.9) = 3 × 0.49 × 0.27 = 0.3969.

Answer to Exercise 190 (8864 N2010/I/8). There are 3,000 students in total. (i) Randomly pick 28, 18, and 14 students from Years One, Two, and Three respectively. (ii) Stratified sampling usually results in a lower sample variance. (iii) Unbiased estimates of µ and σ² are x̄ = 10450/50 = 209 and s² = [2235000 − 10450²/50]/(50 − 1) ≈ 1039.79591837. (iv) (1) The large sample size lets us use the CLT approximation. (2) What each student spends is independent of what any other student spends (this assumption is actually already implicit in the definition of a random sample).

Answer to Exercise 191 (8864 N2010/I/9). Let X ∼ B(8, 0.7) be the number that germinate. (i) P(X = 6) ≈ 0.296 (calculator). (ii) P(X ≥ 6) ≈ 0.552 (calculator). (iii) Let Y ∼ B(60, 0.7) be the number that germinate. By the CLT, Y is well-approximated by A ∼ N(42, 12.6).
So P(Y < 40) ≈ P(A < 39.5) ≈ 0.241. [This is fairly close to the exact probability of P(Y < 40) ≈ 0.238 (calculator).]

Answer to Exercise 192 (8864 N2010/I/10). Let X ∼ N(µ, 1.2²) be the mass of a component. (i) The sample mean is X̄₈₀ ∼ N(µ, 1.2²/80). The competing hypotheses are H₀: µ = 15 and H_A: µ ≠ 15. The p-value P(X̄₈₀ ≥ 15.25 or X̄₈₀ ≤ 14.75 | H₀) ≈ 0.06240742 is more than 5%, so we fail to reject H₀. This fails to cast doubt or provide evidence against the factory owner's claim. (ii) The sample mean is X̄₈₀ ∼ N(µ, 1.2²/80). The competing hypotheses are H₀: µ = 15 and H_A: µ < 15. The maximum observed sample mean k_max at which we'd reject H₀ (in favour of the owner's new claim) is given by: P(X̄₈₀ ≤ k_max | H₀) = 0.05. So by calculator, k_max ≈ 14.77931973. So the set of values of x̄₈₀ for which we'd reject H₀ (in favour of the owner's new claim) is [0, 14.77931973).

Answer to Exercise 193 (8864 N2010/I/11). (a) (i) [Scatter diagram of y against x omitted.] (a) (ii) [Scatter diagram of y against x omitted.] (b) (i) [Scatter diagram of y against x omitted.] (b) (ii) r ≈ 0.9872953317. (b) (iii) y = 0.1069x + 0.5615. (b) (iv) ŷ at x = 40 is 0.1069(40) + 0.5615 = 4.8375. We are supposed to say that this is a reliable estimate because it involves interpolation. (b) (v) m is unchanged and c increases by N.

Answer to Exercise 194 (8864 N2010/I/12). Let U ∼ N(40, 3²) be the mass of an unwrapped sweet. (i) P(U < 36) ≈ 0.0912 (calculator). (ii) Let W ∼ N(44, 3² + 0.5²) = N(44, 9.25) be the mass of a wrapped sweet. P(42 < W < 46) ≈ 0.489 (calculator). (iii) Let T ∼ N(12 ⋅ 44 + 50, 12 ⋅ 9.25 + 5²) = N(578, 136) be the mass of a tube. P(T > 600) ≈ 0.0296 (calculator). (iv) Let X ∼ N(µ, σ²) be the mass of a tube produced by the rival company. P(X < 450) = P(Z < (450 − µ)/σ) = 0.05 ⇐⇒ (450 − µ)/σ = −1.644853627 (1). P(X > 550) = P(Z > (550 − µ)/σ) = 0.08 ⇐⇒ (550 − µ)/σ = 1.40507156 (2). Subtracting (1) from (2): (550 − µ) − (450 − µ) = 100 = 1.40507156σ − (−1.644853627)σ = 3.049925187σ.
So σ ≈ 32.788 and σ² ≈ 1075.033. And µ ≈ 503.931.

Answer to Exercise 195 (8864 N2009/I/6). (i) 0.2 ⋅ 0.7 = 0.14. (ii) 0.2 ⋅ 0.7 + 0.3 ⋅ 0.6 + 0.5 ⋅ 0.8 = 0.14 + 0.18 + 0.4 = 0.72. (iii) (0.5 ⋅ 0.2)/(1 − 0.72) = 5/14.

Answer to Exercise 196 (8864 N2009/I/7). (i) P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 1/3 + 2/5 − 17/30 = 1/6. (ii) P(A)P(B) = 2/15 is not equal to P(A ∩ B) = 1/6, so A and B are not independent. (iii) P(A′ ∪ B) = 1 − [P(A) − P(A ∩ B)] = 1 − (1/3 − 1/6) = 5/6.

Answer to Exercise 197 (8864 N2009/I/8). Let X ∼ N(120, 18²) be the lifetime of a component. (i) P(X > 144) ≈ 0.09121122 (calculator). (ii) P(X₁ < 144)P(X₂ > 144) + P(X₁ > 144)P(X₂ < 144) ≈ 0.16578346669. (iii) Let X ∼ N(µ, 18²) be the new lifetime of a component. The sample mean is X̄₅₀ ∼ N(µ, 18²/50). The competing hypotheses are H₀: µ = 120 and H_A: µ > 120. The p-value is P(X̄₅₀ ≥ 124 | H₀) ≈ 0.05805087 > 0.05, so we fail to reject H₀. This fails to provide evidence in favour of the company's claim.

Answer to Exercise 198 (8864 N2009/I/9). (i) [Scatter diagram of y against x omitted.] (ii) r ≈ 0.9306540721 is fairly large and positive, suggesting a fairly strong positive linear correlation between x and y. (iii) y = 0.01232906764x + 15.48661792. (iv) ŷ at x = 135 is 0.01232906764(135) + 15.48661792 ≈ 17.15. (v) We are supposed to say that this involves extrapolation and is thus unreliable/unsuitable.

Answer to Exercise 199 (8864 N2009/I/10). (i) Let X ∼ B(10, 0.2) be the number (out of ten) who fail. P(X = 2) ≈ 0.302 (calculator). (ii) Let Y ∼ B(10, 0.8 ⋅ 0.15) be the number (out of ten) who get a distinction. P(Y < 2) ≈ 0.658 (calculator). (iii) Let A ∼ B(50, 0.2) be the number (out of 50) who fail. By the CLT, A is well-approximated by B ∼ N(10, 8). So P(A ≤ 12) ≈ P(B ≤ 12.5) ≈ 0.812. [This is fairly close to the exact probability P(A ≤ 12) ≈ 0.814 (calculator).]

Answer to Exercise 200 (8864 N2009/I/11).
(a) (i) Sort the claims in alphabetical order. Then take the 9th, 18th, ..., and 72nd claims in the list. (a) (ii) Probably. The first 8 submissions might not be representative of the 72 received that day. For example, it might be that those who wake up early and submit their insurance claims early are also the ones who make the most outrageous claims. (b) (i) Unbiased estimates of the population mean and variance are x̄ = 5320/120 + 1000 = 1044 1/3 and s² = [8282000 − 5320²/120]/(120 − 1) ≈ 67614.6778711. (b) (ii) An 'unbiased estimate' is generated by an unbiased estimator, which is a random variable whose expected value is equal to the parameter of interest. (b) (iii) The sample mean is X̄₁₂₀ ∼ (µ, σ²/120). By the CLT, we have approximately X̄₁₂₀ ∼ N(µ, s²/120). The p-value is P(X̄₁₂₀ > 1044 1/3 or X̄₁₂₀ < 955 2/3 | H₀) = 2P(X̄₁₂₀ < 955 2/3 | H₀) ≈ 0.06180786. So we'd reject H₀ if α ≳ 0.06180786.

Answer to Exercise 201 (8864 N2009/I/12). (a) Let X ∼ N(µ, σ²) be the mass of a plum. P(X < 22) = P(Z < (22 − µ)/σ) = 0.3 ⇐⇒ (22 − µ)/σ ≈ −0.524400513 (1). P(X > 29) = P(Z > (29 − µ)/σ) = 0.2 ⇐⇒ (29 − µ)/σ ≈ 0.841621234 (2). Subtracting (1) from (2): (29 − µ) − (22 − µ) = 7 = 0.841621234σ − (−0.524400513)σ = 1.366021747σ ⇐⇒ σ ≈ 5.124. And µ ≈ 24.687. (b) (i) Let A ∼ N(0.15, 0.03²) and N ∼ N(0.07, 0.02²) be the masses of an apple and a nectarine. A₁ + A₂ − (N₁ + N₂ + N₃ + N₄) ∼ N(2 ⋅ 0.15 − 4 ⋅ 0.07, 2 ⋅ 0.03² + 4 ⋅ 0.02²) = N(0.02, 0.0034). P(A₁ + A₂ > N₁ + N₂ + N₃ + N₄) = P(A₁ + A₂ − (N₁ + N₂ + N₃ + N₄) > 0) ≈ 0.634 (calculator). (b) (ii) 9(A₁ + A₂) + 12(N₁ + N₂ + N₃ + N₄) is the random variable with distribution N(9 ⋅ 2 ⋅ 0.15 + 12 ⋅ 4 ⋅ 0.07, 9² ⋅ 2 ⋅ 0.03² + 12² ⋅ 4 ⋅ 0.02²) = N(6.06, 0.3762). P(5 < 9(A₁ + A₂) + 12(N₁ + N₂ + N₃ + N₄) < 6) ≈ 0.419 (calculator).

Answer to Exercise 202 (8864 N2008/I/7). (i) The normal distribution would suggest that a non-trivial percentage of students get more than 100.
(ii) The sample mean is X̄₅₀ ∼ (72.1, 15.2²/50). By the CLT, we have approximately X̄₅₀ ∼ N(72.1, 15.2²/50). So by calculator, P(70.0 ≤ X̄₅₀ ≤ 75.0) ≈ 0.74704179.

Answer to Exercise 203 (8864 N2008/I/8). (i) C(6, 3) ⋅ 0.6³ ⋅ 0.4³ = 0.27648. (ii) Let X ∼ B(40, 0.6) be the number that are crusty. By the CLT, X is well-approximated by Y ∼ N(24, 9.6). So P(X ≥ 20) ≈ P(Y ≥ 19.5) ≈ 0.927. [This is fairly close to the exact probability P(X ≥ 20) ≈ 0.926 (calculator).] (iii) Let M ∼ N(1.24, σ²) be the mass of a loaf. P(M < 1) = P(Z < (1 − 1.24)/σ) = 0.04 ⇐⇒ (1 − 1.24)/σ = −1.750686071 ⇐⇒ σ ≈ 0.137.

Answer to Exercise 204 (8864 N2008/I/9). (i) [Diagram omitted.] (ii) If Tan's pen is red, then there are 2 red pens, 5 blue pens, and 1 green pen in the box when Mui gets a randomly-chosen pen. So the probability that Mui's pen is blue is 5/8. (iii) If Tan's pen is red, then there are 2 red pens, 5 blue pens, and 1 green pen in the box when Mui gets a randomly-chosen pen; and Mui gets a red pen with probability 2/8. If Tan's pen is blue, then there are 3 red pens, 4 blue pens, and 1 green pen in the box when Mui gets a randomly-chosen pen; and Mui gets a red pen with probability 3/8. Altogether then, her probability of getting a red pen is (3/8)(2/8) + (5/8)(3/8) = 21/64. (iv) Mui's pen is blue with probability 1 − 21/64 − 1/8 = 35/64. Tan's pen is red and Mui's pen is blue with probability (3/8)(5/8) = 15/64. Thus, the desired conditional probability is (15/64)/(35/64) = 3/7.

Answer to Exercise 205 (8864 N2008/I/10). (i) The sample mean is X̄₇₀ ∼ (µ, σ²/70). By the CLT, we have approximately X̄₇₀ ∼ N(µ, s²/70). The competing hypotheses are H₀: µ = 150 and H_A: µ < 150. The observed sample mean and observed sample variance are, respectively, x̄₇₀ = 10317/70 ≈ 147.385714286 and s² = [1540231 − 10317²/70]/(70 − 1) ≈ 284.820082816. The p-value is the probability of getting a test statistic that is at least as extreme as that actually observed.
It is: P(X̄₇₀ < x̄₇₀ | H₀) ≈ 0.09748170. (ii) The sample mean is W̄₁₂₀ ∼ (µ, σ²/120). By the CLT, we have approximately W̄₁₂₀ ∼ N(µ, s_w²/120). The observed sample mean and observed sample variance are, respectively, w̄₁₂₀ = (10317 + 7331)/(70 + 50) = 147 1/15 and s_w² = [1540231 + 1100565 − (10317 + 7331)²/(70 + 50)]/(70 + 50 − 1) ≈ 381.205602241. The p-value P(W̄₁₂₀ < w̄₁₂₀ | H₀) ≈ 0.04990429 is less than 10%, so we are able to reject H₀.

Answer to Exercise 206 (8864 N2008/I/11). (i) [Scatter diagram of y against x omitted; the point (17, 343.75) is marked.] (ii) (x̄, ȳ) ≈ (17, 343.75) is indicated in blue. (iii) y = 17.083̇x + 53.3̇. (iv) r ≈ 0.9688043135 is very large and positive, suggesting a strong positive linear correlation between x and y. (v) ŷ at x = 20 is 17.083̇(20) + 53.3̇ = 395. The estimated corresponding profit is $395,000. (vi) ŷ at x = 40 is 17.083̇(40) + 53.3̇ = 736 2/3. The predicted corresponding profit is $736,667. We are supposed to say that this prediction is unreliable.

Answer to Exercise 207 (8864 N2008/I/12). (i) Let S ∼ N(5 ⋅ 0.234, 5 ⋅ 0.025²) = N(1.17, 0.003125). P(S > 1.2) ≈ 0.296 (calculator). (ii) S₁ + S₂ ∼ N(2 ⋅ 1.17, 2 ⋅ 0.003125) = N(2.34, 0.00625). L ∼ N(10 ⋅ 0.234, 10 ⋅ 0.025²) = N(2.34, 0.00625). So S₁ + S₂ − L ∼ N(0, 0.0125). P(L − 0.2 < S₁ + S₂ < L + 0.2) = P(−0.2 < S₁ + S₂ − L < 0.2) ≈ 0.926. (iii) Lee pays 1.5(S₁ + S₂) ∼ N(1.5 ⋅ 2.34, 1.5² ⋅ 0.00625) = N(3.51, 0.0140625). Foo pays 1.2L ∼ N(1.2 ⋅ 2.34, 1.2² ⋅ 0.00625) = N(2.808, 0.009). So 1.5(S₁ + S₂) − 1.2L ∼ N(0.702, 0.0230625). And P(1.5(S₁ + S₂) − 1.2L ≥ 0.5) ≈ 0.908 (calculator).

Answer to Exercise 208 (8864 N2007/I/6). Let M ∼ N(502, 0.8²) be the mass of margarine in a packet. (i) P(M < 500) ≈ 0.00621 (calculator). (ii) The new mass of margarine in a packet is M ∼ N(µ, 0.8²). P(M < 500) = P(Z < (500 − µ)/0.8) = 0.001 ⇐⇒ (500 − µ)/0.8 ≈ −3.090232306 ⇐⇒ µ ≈ 502.4721858.

Answer to Exercise 209 (8864 N2007/I/7). (i) Systematic. (ii) Advantage: Simple to implement.
Disadvantage: The students who do not buy lunch have no possibility of being included in her sampling method.
(iii) Take an alphabetical list of all students. Select every kth student on the list.

Answer to Exercise 210 (8864 N2007/I/8).
(i) [Scatter diagram of y against x.]
(ii) r ≈ 0.9734793616.
(iii) x = 0.9397628752y − 10.55810619.
(iv) (a) At x = 28, ŷ = 16.7 + 1.01(28) = 44.98.
(iv) (b) At y = 198, x̂ = 0.9397628752(198) − 10.55810619 ≈ 175.5.
(v) We are supposed to say that the estimate of y at x = 28 is reliable because it involves interpolation, but the estimate of x at y = 198 is not because it involves extrapolation.

Answer to Exercise 211 (8864 N2007/I/9).
(i) P(X = 4) = C(6, 4)p⁴(1 − p)².
(ii) P(X = 4) = C(6, 4)p⁴(1 − p)² = 15(1/4)⁴(3/4)² = 15 ⋅ 9/4⁶ = 135/4096.
(iii) µ = np = 6(1/4) = 3/2 and σ = √(np(1 − p)) = √(6(1/4)(3/4)) = √(9/8). Since √(9/8) ≈ 1.06, the only integers strictly between µ − σ ≈ 0.44 and µ + σ ≈ 2.56 are 1 and 2. So
P(µ − σ < X < µ + σ) = P(3/2 − √(9/8) < X < 3/2 + √(9/8)) = P(X = 1 or X = 2)
= C(6, 1)(1/4)¹(3/4)⁵ + C(6, 2)(1/4)²(3/4)⁴ = (6 ⋅ 243 + 15 ⋅ 81)/4⁶ = 2673/4096 ≈ 0.653.

Answer to Exercise 212 (8864 N2007/I/10).
(i) Unbiased estimates of the population mean and variance are
x̄ = −35.8/50 + 500 = 499.284,
s² = (150.5 − (−35.8)²/50)/(50 − 1) ≈ 2.54831020408.
(ii) The sample mean is X̄₅₀ ∼ N(µ, σ²/50). We can use s² as an unbiased estimate for σ². The competing hypotheses are H₀: µ = 500 and H_A: µ < 500. And so the p-value is P(X̄₅₀ ≤ 499.284 ∣ H₀) ≈ 0.00075813 < 0.05, so we can reject H₀.
(iii) No, the sample size was large enough that we could have used the CLT.

Answer to Exercise 213 (8864 N2007/I/11).
(i) (a) P(M) = (18 + 48 + 6)/120 = 3/5.
(i) (b) P(M ∩ G) = 18/120 = 3/20.
(i) (c) P(M ∪ B) = (18 + 48 + 6 + 22)/120 = 47/60.
(i) (d) P(M ∣ R′) = (18 + 48)/(18 + 48 + 12 + 22) = 66/100 = 0.66.
(ii) P(M)P(G) = (3/5)(30/120) = 3/20 is equal to P(M ∩ G) = 3/20; thus M and G are indeed independent.
(iii) The number of blue cars with bicycle racks is 0.3 ⋅ 70 = 21.
The number of cars with bicycle racks is 0.2 ⋅ 30 + 0.3 ⋅ 70 + 0.05 ⋅ 20 = 6 + 21 + 1 = 28. So the desired probability is 21/28 = 3/4.

Answer to Exercise 214 (8864 N2007/I/12).
Let M ∼ N(75, 12.5²) and W ∼ N(55, 10.5²) be the masses of a man and a woman.
(i) P(M₁ > 90)P(M₂ < 90) + P(M₂ > 90)P(M₁ < 90) = 2(0.11506967)(0.88493033) ≈ 0.204 (calculator).
(ii) W − M ∼ N(−20, 10.5² + 12.5²). So P(W > M) = P(W − M > 0) ≈ 0.110 (calculator).
(iii) M₁ + M₂ + ⋅⋅⋅ + M₆ ∼ N(6 ⋅ 75, 6 ⋅ 12.5²) = N(450, 937.5). So P(M₁ + M₂ + ⋅⋅⋅ + M₆ > 530) ≈ 0.00449034 (calculator).
(iv) The weights of the hotel guests are probably not independent. The distribution of weights of the hotel guests may differ from that of the population.

Answer to Exercise 215 (8174 N2006/II/8).
Write b = P(B). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B), or 0.7 = 0.6 + b − 0.6b. So b = 0.25 and P(A ∩ B) = 0.6 ⋅ 0.25 = 0.15. P(A ∩ B′) = P(A) − P(A ∩ B) = 0.6 − 0.15 = 0.45.

Answer to Exercise 216 (8174 N2006/II/9).
(i) Quota.
(ii) Systematic.
(iii) Use a computer random number generator to generate, for each member, a number between 0 and 1. Take the x members with the largest numbers, where x is his desired sample size.
(iv) (a) 25 women.
(iv) (b) 15 men from squash.

Answer to Exercise 217 (8174 N2006/II/13).
(i) Let X ∼ B(12, 0.3) be the number of residents (out of 12) who watch the programme. P(X = 4) ≈ 0.231.
(ii) Let Y ∼ B(80, 0.3) be the number of residents (out of 80) who watch the programme. By the CLT, Y is well-approximated by A ∼ N(24, 16.8). So P(20 < Y < 30) ≈ P(20.5 < A < 29.5) ≈ 0.714. [This is fairly close to the exact probability P(20 < Y < 30) ≈ 0.711 (calculator).]

Answer to Exercise 218 (8174 N2006/II/14).
(i) P(WW) = 0.8² = 0.64.
(ii) P(WWW) + P(WLW) + P(LWW) + P(LLW) = 0.8³ + 0.8 ⋅ 0.2 ⋅ 0.4 + 0.2 ⋅ 0.4 ⋅ 0.8 + 0.2 ⋅ 0.6 ⋅ 0.4 = 0.512 + 0.128 + 0.048 = 0.688.
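The sums in parts (i) and (ii) of Exercise 218 can be checked by listing every win/loss sequence and adding up the probabilities of the ones that count. The sketch below is only a check, not part of the exam answer. It assumes the model implied by the products above: the first game is won with probability 0.8, a game after a win is won with probability 0.8, and a game after a loss is won with probability 0.4. The names `sequence_probability` and `event_probability` are made up for this illustration.

```python
from itertools import product

# Assumed model, read off the products in the answer above (not quoted from the exam paper).
P_WIN_FIRST = 0.8       # probability of winning the first game
P_WIN_AFTER_WIN = 0.8   # probability of winning straight after a win
P_WIN_AFTER_LOSS = 0.4  # probability of winning straight after a loss


def sequence_probability(seq):
    """Probability of a result sequence such as "WLW" under the assumed model."""
    prob = 1.0
    previous = None
    for result in seq:
        if previous is None:
            p_win = P_WIN_FIRST
        elif previous == "W":
            p_win = P_WIN_AFTER_WIN
        else:
            p_win = P_WIN_AFTER_LOSS
        prob *= p_win if result == "W" else 1.0 - p_win
        previous = result
    return prob


def event_probability(n_games, event):
    """Add up the probabilities of all n-game sequences for which event(seq) is true."""
    sequences = ("".join(s) for s in product("WL", repeat=n_games))
    return sum(sequence_probability(seq) for seq in sequences if event(seq))


# Part (i): both of the first two games are won.
p_first_two_won = event_probability(2, lambda seq: seq == "WW")
# Part (ii): the third game is won, whatever happened in the first two.
p_third_won = event_probability(3, lambda seq: seq[2] == "W")

print(round(p_first_two_won, 3), round(p_third_won, 3))
```

Under these assumptions the two printed values should agree with 0.64 and 0.688 above. Changing the `event` function lets the same listing handle other questions about the three games.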
(iii) P(WWW) + P(WLW) + P(LWW) + P(WWL) = 0.512 + 0.128 + 0.8² ⋅ 0.2 = 0.64 + 0.128 = 0.768.

Answer to Exercise 219 (8174 N2006/II/14-OR).
Let X ∼ N(176, 4²) be the height of a male student.
(i) P(X < 170) ≈ 0.0668 (calculator).
(ii) P(X > k) = 0.1 ⇐⇒ k ≈ 181.1262063 (calculator).
Let Y ∼ N(m, s²) be the height of a female student.
(iii) P(Y < 150) = P(Z < (150 − m)/s) = 0.006 ⇐⇒ (150 − m)/s ≈ −2.512144328.
P(Y < 175) = P(Z < (175 − m)/s) = 0.883 ⇐⇒ (175 − m)/s ≈ 1.190118042.
Subtracting the first equation from the second: (175 − m) − (150 − m) = 25 ≈ 1.190118042s − (−2.512144328s) = 3.70226237s ⇐⇒ s ≈ 6.753. And m ≈ 166.964.

(This is the last page of this textbook.)

I make educational YouTube videos too! Mostly on economics. Do me a favour by checking them out! I’m a newbie at this, so please feel free to leave me a comment if you have any feedback or suggestions.
YouTube.com/EconCow
EconCow.com

Tuition Ad
I give tuition for any of the following subjects:
• Economics
• Mathematics
• Writing, English, General Paper.
I have a PhD in economics (University of Michigan, 2015) and have been teaching and tutoring since 2010.
For more information, please visit: www.EconsPhDTutor.com
Or simply email: DrChooYanMin@gmail.com
